Browse 59 Model Evaluation jobs open now. Check out companies that are hiring, such as Oura, Gates Foundation, and OpenAI, in Fayetteville, Lincoln, and San Diego.
Lead evaluation and custom model development for Oura’s AI Advisor, combining production ML engineering with research to deliver reliable, actionable AI-driven health insights.
A 12-month AI Fellowship at the Gates Foundation to design, prototype, and deploy responsible AI solutions for global health and development while building capacity across program teams.
OpenAI is hiring a Research Engineer/Scientist to advance personality and model-behavior research and integrate novel methods into products used by hundreds of millions of users.
Join a research team building agentic capabilities for ChatGPT, contributing to research, large-scale training, evaluations, and production deployment in a hybrid San Francisco role.
MLabs is hiring a Data Scientist to develop and productionize statistical and machine learning models that detect insurance fraud across workers' compensation and personal injury domains.
Lead product and context engineering efforts to improve LLM-driven AI agent performance and user experience for advice-focused client intents within Vanguard's Discretionary Advice Platform.
Prime Time Consulting is hiring a Level 3 Data Scientist in Maryland to develop and evaluate NLP tokenization and POS annotation solutions for government-focused language datasets.
Handshake AI seeks an experienced Electrical Engineering specialist (contract) to refine and annotate AI model outputs across circuits, signal processing, and embedded-systems domains.
Work at the intersection of research and deployment to turn Twelve Labs’ video understanding models into scalable, production solutions for customers.
Build and ship mission-critical conversational AI agents at Decagon, working directly with enterprise customers to create scalable, high-impact solutions.
Dandy is hiring a Senior Machine Learning Engineer to advance 3D computer vision and generative ML models that automate and scale dental appliance manufacturing.
Mercor is hiring an Applied AI Engineer to convert real-world human datasets into production-ready signals, deploy and evaluate LLMs, and build integrations and tooling that improve customer outcomes.
OpenAI seeks a Research Engineer to design, build, and iterate frontier evaluations that quantify financial reasoning and related capabilities in large-scale AI models.
An entry-level AI engineering position at OCC focused on building data integrations, evaluating AI tools, and supporting responsible AI implementations across business and technology teams.
Lead the development of large-scale, auditable evaluations for frontier AI models to measure capabilities and steer safety decisions at OpenAI.
BetterUp seeks a product-focused Staff Machine Learning Engineer to design and deliver cutting-edge Generative AI coaching experiences and help scale ML systems in production.
TrustLab is hiring a Senior AI Engineer to develop, tune, and deploy LLM-based content moderation systems that operate at enterprise scale.
Lead development of LLM-driven systems at a mission-driven healthcare startup, focusing on prompt engineering, model optimization, and scalable AI product delivery.
PointClickCare seeks an experienced Principal AI Engineer to lead architecture and delivery of agentic AI systems that drive safe, scalable AI adoption across its healthcare platform.
Blue River Technology is hiring a CVML Engineer to drive data-centric workflows, dashboards, and model development for the See & Spray precision agriculture project.
MCI is hiring a Prompt Engineer to craft and refine prompts for generative AI models, improving output quality across product and customer-facing applications.
Handshake is seeking a Strategic Projects Lead to manage large-scale AI data projects, scale an expert workforce, and optimize cross-functional operations to support growth and customer success.
College Board's GenAI Studio is hiring a Data Scientist to prototype and evaluate generative AI solutions that support students, educators, and internal products in a fully remote, mission-driven environment.
MCI seeks a detail-oriented Prompt Engineer to craft and optimize prompts for generative AI models and integrate them into practical BPO and product workflows.
Eve is hiring an AI Engineer to build, optimize, and ship LLM-powered systems that transform legal workflows and improve outcomes for plaintiff attorneys.
Seeking licensed litigators to evaluate and train advanced legal AI models by testing realistic legal scenarios and documenting model reasoning gaps.
Zoox is hiring a Senior ML Engineer to design AutoML and evaluation systems that align autonomous driving software with expert human driving behavior.
Harvey seeks an Applied Legal Researcher who combines corporate law expertise and hands-on AI evaluation skills to design and validate legal workflows used by leading law firms.
College Board's GenAI Studio is hiring a Data Scientist to prototype, evaluate, and operationalize generative AI solutions that support students, educators, and internal teams.
Lead and scale new customer acquisition channels at Reprise Financial as Head of New Customer Strategy & Analytics, driving profitable growth via direct mail, affiliate, and search with a data-first approach.
Help Zoox design and productionize ML and AutoML systems that learn from expert human drivers to benchmark and tune autonomous vehicle behavior for safer, more natural driving.
Lead the development and scaling of large language model customization and adaptation as a Principal Machine Learning Engineer on Microsoft's CoreAI - PostTraining team.
Lead cross-functional research programs to discover, evaluate, and mitigate adversarial behaviors in large language models at OpenAI's San Francisco office.
Crux is hiring an AI Product Engineer to lead the build-out of production AI features and infrastructure that modernize financing for clean energy projects.
Lead engineering work to productize and deploy frontier AI models into edge and air-gapped defense systems while building evaluation and deployment pipelines for simulation-driven workflows.
Help design, train, and deploy state-of-the-art speech recognition and pronunciation models at Speak to power personalized language learning experiences worldwide.
Help build and productionize novel LLM-driven lesson experiences and assessment systems at Speak, a fast-growing Series C AI language learning company based in San Francisco.
Middesk is hiring a founding Data Scientist to design scalable ML analytics and operationalize model-backed identity and fraud products on a hybrid SF/NY team.
Yupp seeks an experienced Staff+ AI Engineer in Mountain View to architect and ship scalable LLM applications and lead ML lifecycle work across data, model development, evaluation, and production.
Join Daydream as a Data Scientist to design and deploy LLM-driven stylist features and lead model lifecycle work that reimagines fashion shopping.
Mercor seeks PhD-level STEM experts with scientific Python experience to evaluate and improve LLM-generated code and reasoning in an asynchronous, remote contractor role.
Mercor seeks PhD-level biological scientists to design and evaluate advanced biology problems for a top AI lab in a flexible, remote contractor role.
Mercor is recruiting PhD-level scientists and advanced STEM graduates to perform part-time, remote evaluations of LLM outputs in biology, physics, and chemistry for a high-impact AI research program.
Mercor seeks experienced MDs/DOs to work remotely, part-time, on a 5-week project evaluating AI systems for clinical tasks and medical workflow simulations.
Senior-level AI engineer needed to design, build, and scale production LLM applications at a fast-growing Silicon Valley startup focused on trustworthy model evaluation and GenAI products.
Kiddom seeks an AI Researcher to drive generative-AI research and build safe, practical AI features that personalize instruction and improve student outcomes.
Lead Kyivstar.Tech’s NLP efforts to design, train, and deploy Ukrainian-focused LLMs and NLP systems while mentoring a team and shaping the product roadmap.
Lead product strategy and execution for Yupp’s consumer and AI-builder platforms, shaping features that impact millions and improve model evaluation and adoption.
Guidewire seeks a Product Manager focused on GenAI/ML to lead development of an AI-native underwriting solution that delivers seamless data ingestion, model-driven risk assessment, and platform integrations for P&C insurers.
Yupp is hiring a Senior Product Manager to own consumer and AI-builder product strategy, model evaluation frameworks, and growth initiatives at scale.
Below 50k*: 2 | 50k-100k*: 3 | Over 100k*: 41