Browse 31 exciting AI Evaluation jobs hiring now. Check out companies hiring, such as Vanta, Handshake, and FM, in Lubbock, Modesto, and Garland.
Lead design and implementation of scalable AI infrastructure and developer tooling to accelerate Vanta’s AI-powered product initiatives.
Lead applied AI product work at Vanta by designing, shipping, and scaling LLM-powered features that accelerate customer compliance and trust.
Handshake AI is hiring a contract Red Teaming Domain Expert to craft adversarial prompts and stress-test LLMs for safety and robustness across real-world edge cases.
Lead cross-functional programs to scale Figma’s AI platform, driving annotation, evaluation, and cost/capacity readiness across engineering, infra, research, and product teams.
An AI engineering role focused on building and improving voice-first and omnichannel credit-servicing agents using Python and integrated language models at an early-stage fintech startup.
Join Commure's Ambient Scribe team as a Senior Backend Engineer to build and scale eval and AI infrastructure that powers next-generation clinical AI products.
Lead the architecture and delivery of large-scale, regulated AI systems, driving multi-agent, multi-modal solutions and engineering standards across cross-functional teams.
Vanta is seeking a GRC AI Subject Matter Expert to drive the accuracy, explainability, and compliance alignment of AI features across core GRC workflows.
Lead the next generation of AI-driven ranking and recommendation systems for LinkedIn's Feed to improve relevance, personalization, and member engagement at massive scale.
Lead a small engineering team to build and scale LinkedIn’s HALO model and agent evaluation platform, combining hands-on technical delivery with people and cross-functional leadership.
Lead the design, research, and deployment of novel AI systems at Campus to personalize and measurably improve the student learning experience.
Bond Studio AI is hiring a Staff AI Engineer to design and implement production AI systems and multi-agent LLM architectures that power agentic 3D design experiences for real-world spaces.
Kilo Code seeks a hands-on Solutions Engineer to run high-leverage demos and POCs, bridge sales and engineering, and help shape the company’s pre- and post-sales technical motions.
MLabs seeks an Applied AI Engineer to build and ship LLM-powered production systems that transform healthcare and life-sciences workflows.
Kilo seeks a technically fluent Senior Partnerships Manager to build and scale strategic relationships with model providers, infra partners, and devtool platforms for its open-source AI coding agent.
Lead the design and execution of evaluation, reliability, and production-scale testing for Anomali’s agentic AI features that automate SOC workflows and improve analyst productivity.
Open Philanthropy is hiring a Recruiter to lead sourcing, round management, and founder searches to bring top talent into mission-critical programs like AI safety and global health.
Help build and ship production AI agents at Sierra as a Software Engineer intern, contributing to the design, implementation, and real-world evaluation of agent features.
GC AI, a fast-growing legal AI startup, is hiring a licensed attorney as a Legal Engineer (Research) to lead evaluation, prompt design, and quality assurance of AI-generated legal outputs.
Work remotely as a Norwegian LLM Agentic Trainer creating realistic multi-turn dialogues and tool-usage examples to improve and benchmark healthcare AI assistants.
Lead the development and optimization of production LLM features, including RAG pipelines, prompt engineering, and evaluation frameworks, to deliver intelligent capabilities across a remote-first product platform.
Lead and shape novel AI safety research directions at OpenAI, driving scalable methods, evaluations, and cross-team delivery to reduce risks from misalignment and model failures.
Lead the GenAI Platform engineering team at Abridge to design, deliver, and operate LLM workflows, agentic systems, and retrieval/evaluation infrastructure for clinical AI products.
Design scalable, high-quality conversational interactions and evaluation systems for Perplexity's AI answer engine, combining strong communication instincts with Python and LLM fluency.
Notion is hiring an AI Applications Engineer to prototype and productize AI-driven solutions that solve cross-functional business problems and accelerate impact.
Lead Jump’s AI-native foundation as Principal AI Engineer, building the AI infrastructure, tooling, and practices that enable the company to deliver smarter fan experiences and internal automations.
Develop production-ready AI features using .NET/C# and LLM/RAG techniques at Thomson Reuters to power expert solutions across legal, tax, risk, and compliance.
Open Philanthropy is hiring Technical AI Safety grantmakers at the Senior Program Associate, Associate Program Officer, and Senior Program Officer levels to evaluate and recommend research grants that reduce catastrophic AI risk.
A remote Machine Learning Engineer role building and operating agentic AI systems and LLM-driven pipelines to deliver real-time, reliable insights for a fast-moving AI product team.
Applied Compute is hiring a Forward Deployed Researcher in San Francisco to build and scale production-grade AI agents that embed into enterprise workflows and drive revenue-critical outcomes.
Lead the development of Holtzbrinck’s AI Hub program by scouting AI opportunities, shaping strategic partnerships, and driving community and thought-leadership initiatives across media, science, and education.