Browse 77 exciting AI Evaluation jobs hiring now. Check out companies that are hiring, such as Alignerr, Handshake, and The Mirage, in Chicago, Virginia Beach, and Louisville/Jefferson County.
Senior C# Full-Stack Engineer (contract, remote) to build high-performance C# systems and full-stack tooling for AI data, evaluation, and annotation pipelines.
Contribute your electronics engineering expertise to train and evaluate AI models as a remote contract AI Trainer for Handshake at $120/hr.
Join Mirage's NYC engineering team to build end-to-end applied AI systems that enable new creative experiences for short-form video at scale.
Provide technical leadership to ARPA-H and external performers by designing and validating healthcare datasets and evaluation frameworks for AI-driven rare disease diagnostics.
Handshake seeks Community Health Worker professionals to provide expert, asynchronous evaluations of AI outputs—no prior AI experience required.
Handshake seeks seasoned Advertising and Promotions Managers to perform flexible, remote contract work evaluating AI outputs and providing structured feedback to improve models.
Handshake seeks advertising sales professionals to work remotely as contract AI Trainers, reviewing model outputs and crafting prompts to improve AI understanding of advertising tasks.
Handshake seeks experienced electrical engineers to work remotely and asynchronously as contract AI trainers, evaluating model outputs and crafting domain-aligned prompts to improve AI understanding of electrical engineering tasks.
Contribute your business teaching or professional experience to a remote, flexible AI training program that evaluates model outputs and improves workplace-relevant AI understanding.
Boeing is looking for Software Engineers specializing in LLMs to design, implement, and integrate AI/ML capabilities into aerospace and autonomy systems at its Tukwila, WA site.
Join Fieldguide as an AI Engineer to design, build, and operate agentic systems and production-ready LLM-powered features for mission-critical audit workflows.
Work remotely as a contract Senior Python Full-Stack Engineer to build scalable evaluation and data infrastructure powering model training, benchmarking, and quality assurance at Alignerr.
Arcade seeks an experienced technical lead to build and run DataOps, annotation, and evaluation systems that power generative AI-driven product design.
Alignerr is hiring a Senior C++ Full-Stack Engineer to design and optimize high-performance C++ systems and full-stack tooling for AI data annotation, validation, and evaluation pipelines.
Flow Engineering is hiring an AI/ML Software Engineer in San Francisco to build agentic, LLM-driven features that help engineers author, review, and validate complex system requirements.
Experienced AI engineer needed to rapidly prototype and productionize LLM and ML-driven systems for healthcare-focused products and internal tools at a fast-growing startup.
Work remotely as a Senior C++ Full-Stack Engineer at Alignerr to build and optimize high-performance C++ services and tooling for AI data pipelines and evaluation workflows.
Alignerr is looking for a Senior Rust Full-Stack Engineer to design and optimize high-performance Rust systems and tooling that power AI data pipelines and model evaluation workflows.
Senior Rust Full-Stack Engineer needed to build and optimize production-grade AI data pipelines and tooling for model training and evaluation at a remote-first AI infrastructure firm.
Plasmidsaurus is hiring an AI Engineer to build production LLM-driven bioinformatics agents that turn rapid RNA-seq outputs into actionable biological insights for research teams.
A senior C++ full-stack systems engineer is needed to build reliable, high-performance infrastructure and tooling for AI data pipelines and evaluation workflows at a remote-first AI-focused company.
Work remotely as a senior C++ engineer building and optimizing high-performance systems and full-stack tooling for AI data pipelines and evaluation workflows.
A senior Rust engineer is needed to build and optimize high-performance ML data and evaluation infrastructure for Alignerr’s AI research and production workflows on a part‑time remote contract.
Lead and execute high-impact AI data programs at Handshake, coordinating large distributed teams and partnering with frontier AI labs to drive revenue, quality, and scalable delivery.
Agiloft seeks a Learning & Development Partner to design and deliver scalable, data-driven learning programs—especially AI-focused enablement—to accelerate employee development and organizational capability.
Experienced full-stack engineer needed to build and optimize C# systems and tooling that power large-scale AI data and evaluation workflows at Alignerr.
Handshake is hiring experienced commercial pilots to remotely evaluate and refine AI model outputs using real-world aviation expertise.
Pax Historia seeks a founding ML systems engineer in San Francisco to build production-grade infrastructure, evaluations, and model tuning that make their AI-driven game both higher-quality and more affordable.
Take2 AI seeks a hands-on Prompt Engineer to design and scale AI Interviewers and evaluation systems that automate and improve high-volume candidate screening.
Lead production-grade LLM and AI agent development at Everstar to accelerate nuclear deployment through rigorous evals, fine-tuning, and synthetic data pipelines.
Work as a contract Senior C++ Full-Stack Engineer building high-performance C++ systems and full-stack tooling to support large-scale AI data, annotation, and evaluation workflows for leading labs.
Experienced C# backend engineer needed to build and optimize high-performance services and full-stack tooling for AI data pipelines and evaluation workflows at Alignerr.
Alignerr is hiring a Senior C# Full-Stack Engineer to build high-performance backend services and tooling for AI data pipelines and evaluation systems on a remote, contract basis.
Alignerr is hiring a Senior Rust engineer to build high-performance backend services and developer tooling for AI data pipelines and evaluation workflows on a part-time contract basis.
Work on production-grade data ingestion, UI components, evaluation systems, and agent tooling at a small engineering company focused on automating pre-construction engineering workflows.
Cambium Assessment is hiring a Senior Software Engineer to design and implement responsible generative AI agents and integrate advanced LLM-driven capabilities into mission-critical EdTech products.
Experienced Python backend engineer needed to build and optimize scalable data and evaluation tooling for cutting-edge AI research workflows at Alignerr.
Senior Rust systems engineer needed to architect and optimize high-performance backend and tooling for AI data pipelines and evaluation workflows at Alignerr.
Alignerr seeks a senior C++ full-stack engineer to build high-performance backend and tooling for AI data pipelines and model evaluation on a flexible remote contract.
Work remotely as a contract Senior Rust Full-Stack Engineer to build and optimize distributed systems powering AI data pipelines, annotation, and evaluation workflows for Alignerr.
CloudFactory is seeking US-based, detail-focused Data Entry & Content Review Specialists for a full-time, fixed-term AI model evaluation project running Feb 1–Oct 1, 2026.
Red Hat's OpenShift AI team is hiring a Senior ML Engineer to architect and lead large-scale evaluation and safety infrastructure for LLMs and agentic systems in open-source and hybrid-cloud environments.
Handshake is seeking experienced supervisors in mechanics, installation, or repair to work remotely as contract AI Trainers evaluating and improving model outputs using their hands-on expertise.
Contract Health Education Specialists will use their public health experience to develop prompts and evaluate AI-generated health education content in a flexible, remote, asynchronous role.
Handshake seeks experienced instructional coordinators to assess AI outputs and provide structured, field-informed feedback on a flexible, remote contract basis.
Handshake is contracting mathematicians to remotely evaluate AI-generated math content and provide expert feedback to improve model accuracy and domain understanding.
Experienced event planners are sought for a remote, flexible contract to evaluate and train AI models using real-world event planning expertise.
Experienced entertainment and recreation supervisors are needed to evaluate AI responses, craft field-relevant prompts, and provide feedback in a remote, hourly contract role.
Handshake seeks aerospace engineering professionals to evaluate AI outputs and craft domain-specific prompts on a flexible, remote contract basis at $150/hr.
Handshake seeks seasoned talent agents and business managers to evaluate AI-generated content and craft prompts that reflect real entertainment and sports industry workflows in a flexible, contract role.
Below 50k*: 7 | 50k-100k*: 1 | Over 100k*: 9