LLM Evaluation Jobs

Browse 26 jobs in LLM evaluation. Companies hiring include Welocalize, WHOOP, and Flock Safety, with openings in Charlotte, Laredo, and Madison.

Multilingual Audio Personalization Evaluator - English (United States)

Welocalize Hybrid United States

Posted 7 hours ago

Welo Data is seeking native English annotators in the U.S. to produce high-quality ground truth and evaluate model outputs for personalized music, podcast, and audiobook experiences.

Senior AI/ML Engineer (AI Platform)

WHOOP Hybrid Boston, MA

Posted 2 days ago

WHOOP is hiring a Senior AI/ML Engineer to design, build, and operate production AI systems and LLM tooling that power personalized, member-facing experiences like WHOOP Coach and AI Support.

Staff AI Systems Engineer

Flock Safety Hybrid No location specified

Posted 5 days ago

Medical Insurance

Dental Insurance

Vision Insurance

Mental Health Resources

Learning & Development

Equity

Paid Holidays

Paid Time-Off

WFH Reimbursements

Child Care stipend

Maternity Leave

Paternity Leave

Lead the design and productionization of agentic AI systems and an evaluation platform to power Night Shift, Flock Safety’s investigator-facing LLM agent product.

Field Quality Assurance Compliance Auditor - Manufacturing

FM Hybrid MALVERN, Pennsylvania

Sponsored

FM Research Cybersecurity Co-op - Summer/Fall 2026

FM Hybrid NORWOOD, Massachusetts

Sponsored

FM IT/OT Infrastructure & Security Co‑op - Winter/Spring 2026

FM Hybrid NORWOOD, Massachusetts

Sponsored

Senior Machine Learning Engineer

Jobgether Hybrid US

Posted 7 days ago

Senior Machine Learning Engineer needed to build and deploy scalable, production ML systems that improve healthcare outcomes and operational efficiency.

Prompt Engineering Intern

Anduril Industries Hybrid Costa Mesa, California, United States

Posted 7 days ago

Anduril's Thunderforge team is hiring a Prompt Engineering Intern to develop prompts, agent graph architectures, and test/evaluation tooling for AI-enabled wargaming.

Data Scientist, AI Agent

Replit Hybrid Foster City

Posted 9 days ago

Inclusive & Diverse

Mission Driven

Work/Life Harmony

Diversity of Opinions

Friends Outside of Work

Empathetic

Collaboration over Competition

Fast-Paced

Transparent & Candid

Medical Insurance

Dental Insurance

Vision Insurance

Disability Insurance

Learning & Development

401K Matching

Paid Time-Off

WFH Reimbursements

Paid Holidays

Equity

Flex-Friendly

Lead experimentation, trace analysis, and metric design to measure and improve Replit's AI agent, converting agent traces into product-changing insights for engineering and leadership.

Data Scientist/AI Engineer (Remote)

YouGov Hybrid New York, United States of America

Posted 9 days ago

YouGov seeks a hands-on Data Scientist/AI Engineer to build and deploy LLM-based applications and advanced analytics for market research using survey, census, and behavioral datasets.

Staff Engineer, Applied AI

Zapier Hybrid San Francisco

Posted 9 days ago

Inclusive & Diverse

Rise from Within

Mission Driven

Diversity of Opinions

Work/Life Harmony

As a Staff Applied AI Engineer, lead the design and delivery of Zapier’s unified AI platform, shaping the runtime, orchestration, and evaluation systems that power the company’s AI products.

QA Engineer for Generative AI

Jump Hybrid Salt Lake City

Posted 9 days ago

Jump seeks a US-based QA Engineer to own AI evaluation, labeling campaigns, and QA processes that improve generative AI outputs for its meeting assistant product.

Senior Product Manager - AI Systems & Context

MagicSchool AI Hybrid Remote

Posted 10 days ago

Lead product strategy and execution for context, memory, and retrieval systems that power MagicSchool’s AI agents to deliver reliable, educator-focused assistance at scale.

Senior Research Scientist – Computational Wind Engineering

FM Hybrid NORWOOD, Massachusetts

Sponsored

Senior Research Scientist - Material Flammability, Fire Dynamics and Lithium-ion Battery Safety

FM Hybrid NORWOOD, Massachusetts

Sponsored

Sr. Research Engineer - Electrical/Power Generation - Design, operation, maintenance of electrical/power generation equipment

FM Hybrid NORWOOD, Massachusetts

Sponsored

Prompt Engineer

Mursion, Inc Hybrid No location specified

Posted 10 days ago

Mursion is hiring a Prompt Engineer to craft production-grade LLM prompts, manage RAG/JSON workflows, and translate learning objectives into reliable AI-driven simulation behavior.

Engineering Intern (AI Ethics)

sonyglobal Hybrid Remote - California

Posted 11 days ago

Sony AI's Research Ethics team is hiring a remote Engineering Intern (AI Ethics) to help build agentic AI infrastructure, run LLM evaluations, and develop tools for responsible AI in a research-driven environment.

Product Manager, AI

Arcade Hybrid Presidio

Posted 11 days ago

Lead development of Arcade’s conversational AI product creation agent as the company’s first dedicated Product Manager for AI, reporting directly to the CEO.

BAML Engineer

Vetcove Hybrid Remote

Posted 18 days ago

Mission Driven

Inclusive & Diverse

Growth & Learning

Transparent & Candid

Medical Insurance

Dental Insurance

Vision Insurance

401K Matching

Flex-Friendly

Equity

Vetcove seeks an AI-focused BAML Engineer to design, implement, and maintain BAML-driven LLM workflows and evaluation tooling for its veterinary software platform.

Developer Relations Engineer

Judgment Labs Hybrid San Francisco

Posted 18 days ago

Help developers adopt Judgment Labs' SDK and evaluation tools by building docs, demos, and sample agent setups as a Developer Relations Engineer in San Francisco.

Technical Writer

Judgment Labs Hybrid San Francisco

Posted 18 days ago

Join a venture-backed team in San Francisco as a Technical Writer, crafting in-depth content on agent evaluation, monitoring, and reward modeling for a technical audience.

Research Scientist

Oumi Hybrid New York

Posted 19 days ago

Oumi seeks a Research Scientist to advance open-source LLM and VLM research by developing models, datasets, benchmarks, and publishing results with the community.

Technical Product Manager, AI

webAI Hybrid Austin

Posted 19 days ago

Lead the strategy and delivery of distributed inference, LLM integrations, and on-device ML features at webAI to enable privacy-first, enterprise-grade AI on the edge.

Senior Software Engineer, AI Platform

Vanta Hybrid No location specified

Posted 22 days ago

Inclusive & Diverse

Growth & Learning

Customer-Centric

Collaboration over Competition

Medical Insurance

Maternity Leave

Flex-Friendly

401K Matching

Lead design and implementation of scalable AI infrastructure and developer tooling to accelerate Vanta’s AI-powered product initiatives.

Senior Software Engineer, AI Product

Vanta Hybrid No location specified

Posted 22 days ago

Inclusive & Diverse

Growth & Learning

Customer-Centric

Collaboration over Competition

Medical Insurance

Maternity Leave

Flex-Friendly

401K Matching

Lead applied AI product work at Vanta by designing, shipping, and scaling LLM-powered features that accelerate customer compliance and trust.

FM Approvals Research Campus Engineering Technician - Materials

FM Hybrid WEST GLOCESTER, Rhode Island

Sponsored

Senior Research Engineer – Mechanical - Rotating Machinery

FM Hybrid NORWOOD, Massachusetts

Sponsored

Member of Technical Staff (AI Engineering)

Awesome Motive Hybrid San Francisco

Posted 22 days ago

An AI engineering role focused on building and improving voice-first and omnichannel credit-servicing agents using Python and integrated language models at an early-stage fintech startup.

Senior Backend Engineer, Evals and AI Infra

Commure Hybrid Mountain View

Posted 24 days ago

Join Commure's Ambient Scribe team as a Senior Backend Engineer to build and scale eval and AI infrastructure that powers next-generation clinical AI products.

Staff AI Engineer

Awesome Motive Hybrid United States

Posted 27 days ago

Bond Studio AI is hiring a Staff AI Engineer to design and implement production AI systems and multi-agent LLM architectures that power agentic 3D design experiences for real-world spaces.

Solutions Engineer

Kilo Code Hybrid No location specified

Posted 28 days ago

Kilo Code seeks a hands-on Solutions Engineer to run high-leverage demos and POCs, bridge sales and engineering, and help shape the company’s pre- and post-sales technical motions.

Applied AI Engineer

MLabs Hybrid No location specified

Posted 29 days ago

MLabs seeks an Applied AI Engineer to build and ship LLM-powered production systems that transform healthcare and life-sciences workflows.

Senior Engineer, AI Evaluation & Reliability (Agentic AI)

Anomali Hybrid Redwood City, CA

Posted 29 days ago

Dental Insurance

Disability Insurance

Flexible Spending Account (FSA)

Vision Insurance

Family Medical Leave

Paid Holidays

Lead the design and execution of evaluation, reliability, and production-scale testing for Anomali’s agentic AI features that automate SOC workflows and improve analyst productivity.
