Rise Jobs & Careers: LLM Evaluation Jobs

Browse 55 LLM evaluation jobs hiring now. Companies hiring include Beyondsoft Consulting, Cartesia, and Braintrust, with roles in Irvine, Garden Grove, and Orlando.

Beyondsoft Consulting Hybrid United States (Remote)
Posted 4 hours ago

Beyondsoft is hiring a Data Analyst to prepare training data, anonymize documents, and validate LLM/model outputs for AI projects in a remote US-based role.

Posted yesterday

Join Cartesia’s in-office SF research-engineering team to design and scale synthetic datasets and systems that power next-generation foundation models.

Braintrust Hybrid No location specified
Posted yesterday

Design and deliver developer-focused curriculum and hands-on programs that teach evals and agentic AI at Braintrust, working closely with engineers and product teams.

Posted 3 days ago

Work as a founding Backend Engineer to build scalable, secure backend infrastructure and data pipelines that power high-impact AI features at an early-stage startup in NYC.

Posted 3 days ago

MLabs, a fast-growing research lab supporting foundation model teams, is hiring a Senior Research Engineer to develop scalable RL recipes, modular environments, and production-ready data pipelines for post-training.

Posted 4 days ago
Inclusive & Diverse
Rise from Within
Mission Driven
Diversity of Opinions
Work/Life Harmony
Customer-Centric
Fast-Paced
Growth & Learning
Medical Insurance
Dental Insurance
401K Matching
Paid Time-Off
Maternity Leave
Paternity Leave
Mental Health Resources
Flex-Friendly

Lead the architecture and delivery of generative AI and multimodal systems that enable creative and contextual advertising capabilities across Netflix Ads.

Posted 4 days ago

WeRide seeks an AI Simulation Engineer to design AI-based simulation scenarios and agent behaviors that validate and accelerate autonomous vehicle algorithms.

Posted 4 days ago
Inclusive & Diverse
Diversity of Opinions
Passion for Exploration
Dare to be Different
Empathetic
Growth & Learning
Paid Holidays
Medical Insurance
Equity
401K Matching
Learning & Development
Social Gatherings
Flex-Friendly
Maternity Leave
Paternity Leave
Sabbatical

Canva is hiring a Senior Research Engineer to engineer agentic, multimodal evaluation systems that automatically assess and improve the quality and human alignment of generative design models.

Posted 5 days ago

Eigenplane is hiring a Founding AI Research Scientist to drive LLM and agent research into scalable, interpretable production systems at an early-stage AI startup.

Posted 5 days ago
Inclusive & Diverse
Rise from Within
Mission Driven
Diversity of Opinions
Work/Life Harmony

Lead the technical direction and hands-on engineering for Zapier Agents, building production-grade LLM-driven agent capabilities, integrations, and evaluation systems that scale across thousands of apps and real customers.

Posted 6 days ago

Ironclad is looking for a Staff Software Engineer - Applied AI to build and productionize LLMs, RAG systems, and document-understanding services that deliver actionable contract insights.

Unstructured seeks an experienced AI/ML Engineer to design, evaluate, and deploy secure ML solutions for Department of Defense and national security customers on government networks.

MLabs Hybrid No location specified
Posted 7 days ago

Work from the SoHo NYC office as an Applied AI Engineer building production LLMs and ML systems that accelerate bringing new therapies to market.

Tessera Labs seeks a Machine Learning Engineer Intern (Fall 2025, Hybrid in San Jose) to build and fine-tune LLM-driven multi-agent pipelines and enterprise tool integrations.

Posted 7 days ago

Lead the design and evaluation of long-term memory systems for LLMs at an early-stage AI startup focused on building self-improving agents.

Weekday AI Hybrid No location specified
Posted 7 days ago

Work with a top AI research lab to evaluate and improve LLM performance on advanced economics tasks by providing expert, written feedback.

Weekday AI Hybrid No location specified
Posted 7 days ago

Help shape next-generation AI by evaluating advanced physics solutions and guiding research teams to improve model performance as a contract Physics AI Trainer.

Oura Hybrid No location specified
Posted 9 days ago

Oura is hiring a Senior AI Engineer to design evaluation systems and build custom LLM and agentic models that power next-generation, actionable health recommendations.

Posted 9 days ago

A 12-month AI Fellowship at the Gates Foundation to design, prototype, and deploy responsible AI solutions for global health and development while building capacity across program teams.

Work with Khan Academy to design and deploy generative AI features that improve literacy learning in a 24-month fixed-term Senior AI Engineer role.

Inclusive & Diverse
Feedback Forward
Collaboration over Competition
Growth & Learning

OpenAI is hiring a Research Engineer/Scientist to advance personality and model-behavior research and integrate novel methods into products used by hundreds of millions of users.

Posted 11 days ago

Lead product and context engineering efforts to improve LLM-driven AI agent performance and user experience for advice-focused client intents within Vanguard's Discretionary Advice Platform.

Dental Insurance
Disability Insurance
Flexible Spending Account (FSA)
Health Savings Account (HSA)
Vision Insurance
Sabbatical
Paid Holidays

Handshake AI seeks an experienced Electrical Engineering specialist (contract) to refine and annotate AI model outputs across circuits, signal processing, and embedded-systems domains.

Decagon seeks an Agent Software Engineer intern to build and evaluate production-ready conversational AI agents that improve customer support, working onsite in San Francisco during Summer 2026.

Profound Hybrid New York City
Posted 13 days ago

Profound, an NYC AI startup backed by Sequoia, is hiring an AI/ML Engineer to build production-scale NLP and LLM systems for content classification, generation, and measurement.

NBCUniversal Hybrid 30 Rockefeller Plaza, New York, New York
Posted 15 days ago

NBCUniversal is hiring an Analyst, AI Strategy & Innovation to perform market and vendor analysis, build financial/business cases, and support cross-functional pilots and innovation programs across its media businesses.

Mercor Hybrid San Francisco
Posted 17 days ago

Mercor is hiring an early-career Data Scientist in San Francisco to drive experiments, metrics, and prototypes that improve hiring match quality and product metrics using SQL, Python, and causal thinking.

Mercor Hybrid San Francisco
Posted 18 days ago

Mercor is hiring an Applied AI Engineer to convert real-world human datasets into production-ready signals, deploy and evaluate LLMs, and build integrations and tooling that improve customer outcomes.

Posted 18 days ago
Inclusive & Diverse
Feedback Forward
Collaboration over Competition
Growth & Learning

OpenAI seeks a Research Engineer to design, build, and iterate frontier evaluations that quantify financial reasoning and related capabilities in large-scale AI models.

Posted 19 days ago
Inclusive & Diverse
Feedback Forward
Collaboration over Competition
Growth & Learning

Lead the development of large-scale, auditable evaluations for frontier AI models to measure capabilities and steer safety decisions at OpenAI.

BetterUp seeks a product-focused Staff Machine Learning Engineer to design and deliver cutting-edge Generative AI coaching experiences and help scale ML systems in production.

TrustLab is hiring a Senior AI Engineer to develop, tune, and deploy LLM-based content moderation systems that operate at enterprise scale.

Granted Consulting Hybrid No location specified
Posted 20 days ago

Lead development of LLM-driven systems at a mission-driven healthcare startup, focusing on prompt engineering, model optimization, and scalable AI product delivery.

Kiddom Hybrid No location specified
Posted 21 days ago
Dental Insurance
Disability Insurance
Flexible Spending Account (FSA)
Health Savings Account (HSA)
Vision Insurance
Paid Holidays

Kiddom is hiring a Research Engineer (GenAI) to design and deploy ML-powered search, personalization, and agentic assistant systems that support teachers and improve student learning.

Posted 21 days ago

PointClickCare seeks an experienced Principal AI Engineer to lead architecture and delivery of agentic AI systems that drive safe, scalable AI adoption across its healthcare platform.

Posted 22 days ago

MCI is hiring a Prompt Engineer to craft and refine prompts for generative AI models, improving output quality across product and customer-facing applications.

College Board's GenAI Studio is hiring a Data Scientist to prototype and evaluate generative AI solutions that support students, educators, and internal products in a fully remote, mission-driven environment.

Posted 22 days ago

MCI is hiring a Prompt Engineer to craft, test, and optimize prompts for generative AI models and integrate prompt engineering into practical BPO and product workflows.

Posted 23 days ago

MCI seeks a detail-oriented Prompt Engineer to craft and optimize prompts for generative AI models and integrate them into practical BPO and product workflows.

Eve Hybrid San Mateo, California
Posted 24 days ago

Eve is hiring an AI Engineer to build, optimize, and ship LLM-powered systems that transform legal workflows and improve outcomes for plaintiff attorneys.

Posted 28 days ago

College Board's GenAI Studio is hiring a Data Scientist to prototype, evaluate, and operationalize generative AI solutions that support students, educators, and internal teams.

Lead LMArena’s open-source research program, building reproducible benchmarks, datasets, and evaluation methods that advance transparent, human-centered AI evaluation.

Posted 29 days ago
Inclusive & Diverse
Mission Driven
Social Impact Driven
Passion for Exploration
Dare to be Different
Diversity of Opinions
Reward & Recognition
Empathetic
Feedback Forward
Work/Life Harmony
Collaboration over Competition
Growth & Learning
Transparent & Candid
Customer-Centric
Rise from Within
Friends Outside of Work
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
Learning & Development
Work Visa Sponsorship
Employee Resource Groups
401K Matching
Paid Time-Off
Maternity Leave
Social Gatherings
Company Retreats

Lead the development and scaling of large language model customization and adaptation as a Principal Machine Learning Engineer on Microsoft's CoreAI - PostTraining team.

Inclusive & Diverse
Feedback Forward
Collaboration over Competition
Growth & Learning

Lead cross-functional research programs to discover, evaluate, and mitigate adversarial behaviors in large language models at OpenAI's San Francisco office.

Anduril Industries Hybrid Costa Mesa, California, United States; Seattle, Washington, United States; Washington, District of Columbia, United States
Posted 29 days ago

Lead engineering work to productize and deploy frontier AI models into edge and air-gapped defense systems while building evaluation and deployment pipelines for simulation-driven workflows.

Posted 29 days ago

Help build and productionize novel LLM-driven lesson experiences and assessment systems at Speak, a fast-growing Series C AI language learning company based in San Francisco.

Yupp AI Hybrid Mountain View
Posted 30 days ago

Yupp seeks an experienced Staff+ AI Engineer in Mountain View to architect and ship scalable LLM applications and lead ML lifecycle work across data, model development, evaluation, and production.

Daydream Hybrid New York City
Posted 30 days ago

Join Daydream as a Data Scientist to design and deploy LLM-driven stylist features and lead model lifecycle work that reimagines fashion shopping.

Mercor seeks PhD-level STEM experts with scientific Python experience to evaluate and improve LLM-generated code and reasoning in an asynchronous, remote contractor role.

Posted 30 days ago

Mercor seeks PhD-level biological scientists to design and evaluate advanced biology problems for a top AI lab in a flexible, remote contractor role.

How much do LLM evaluation jobs pay?

Below 50k*: 3 (43%)
50k-100k*: 0 (0%)
Over 100k*: 4 (57%)
*average yearly salary (USD)
