Browse 26 exciting LLM Evaluation jobs hiring now. Check out companies hiring such as Welocalize, WHOOP, and Flock Safety in Charlotte, Laredo, and Madison.
Welo Data is seeking native English annotators in the U.S. to produce high-quality ground truth and evaluate model outputs for personalized music, podcast, and audiobook experiences.
WHOOP is hiring a Senior AI/ML Engineer to design, build, and operate production AI systems and LLM tooling that power personalized, member-facing experiences like WHOOP Coach and AI Support.
Lead the design and productionization of agentic AI systems and an evaluation platform to power Night Shift, Flock Safety’s investigator-facing LLM agent product.
Senior Machine Learning Engineer needed to build and deploy scalable, production ML systems that improve healthcare outcomes and operational efficiency.
Anduril's Thunderforge team is hiring a Prompt Engineering Intern to develop prompts, agent graph architectures, and test/evaluation tooling for AI-enabled wargaming.
Lead experimentation, trace analysis, and metric design to measure and improve Replit's AI agent, converting agent traces into product-changing insights for engineering and leadership.
YouGov seeks a hands-on Data Scientist/AI Engineer to build and deploy LLM-based applications and advanced analytics for market research using survey, census, and behavioral datasets.
Lead the design and delivery of Zapier’s unified AI platform as a Staff Applied AI Engineer, shaping runtime, orchestration, and evaluation systems that power the company’s AI products.
Jump seeks a US-based QA Engineer to own AI evaluation, labeling campaigns, and QA processes that improve generative AI outputs for its meeting assistant product.
Lead product strategy and execution for context, memory, and retrieval systems that power MagicSchool’s AI agents to deliver reliable, educator-focused assistance at scale.
Mursion is hiring a Prompt Engineer to craft production-grade LLM prompts, manage RAG/JSON workflows, and translate learning objectives into reliable AI-driven simulation behavior.
Sony AI's Research Ethics team is hiring a remote Engineering Intern (AI Ethics) to help build agentic AI infrastructure, run LLM evaluations, and develop tools for responsible AI in a research-driven environment.
Lead development of Arcade’s conversational AI product creation agent as the company’s first dedicated Product Manager for AI, reporting directly to the CEO.
Vetcove seeks an AI-focused BAML Engineer to design, implement, and maintain BAML-driven LLM workflows and evaluation tooling for its veterinary software platform.
Help developers adopt Judgment Labs' SDK and evaluation tools by building docs, demos, and sample agent setups as a Developer Relations Engineer in San Francisco.
Be part of a San Francisco-based venture-backed team as a Technical Writer crafting deep technical content on agent evaluation, monitoring, and reward modeling for a technical audience.
Oumi seeks a Research Scientist to advance open-source LLM and VLM research by developing models, datasets, benchmarks, and publishing results with the community.
Lead the strategy and delivery of distributed inference, LLM integrations, and on-device ML features at webAI to enable privacy-first, enterprise-grade AI on the edge.
Lead design and implementation of scalable AI infrastructure and developer tooling to accelerate Vanta’s AI-powered product initiatives.
Lead applied AI product work at Vanta by designing, shipping, and scaling LLM-powered features that accelerate customer compliance and trust.
An AI engineering role focused on building and improving voice-first and omnichannel credit-servicing agents using Python and integrated language models at an early-stage fintech startup.
Join Commure's Ambient Scribe team as a Senior Backend Engineer to build and scale eval and AI infrastructure that powers next-generation clinical AI products.
Bond Studio AI is hiring a Staff AI Engineer to design and implement production AI systems and multi-agent LLM architectures that power agentic 3D design experiences for real-world spaces.
Kilo Code seeks a hands-on Solutions Engineer to run high-leverage demos and POCs, bridge sales and engineering, and help shape the company’s pre- and post-sales technical motions.
MLabs seeks an Applied AI Engineer to build and ship LLM-powered production systems that transform healthcare and life-sciences workflows.
Lead the design and execution of evaluation, reliability, and production-scale testing for Anomali’s agentic AI features that automate SOC workflows and improve analyst productivity.
Below 50k*: 0 | 50k-100k*: 1 | Over 100k*: 0