LLM Evals Engineering Lead

About Grafton Sciences

We’re building AI systems with general physical ability — the capacity to experiment, engineer, or manufacture anything. We believe achieving this is a key step towards building superintelligence. With deep technical roots and real-world progress at scale (e.g., a $42M NIH project), we’re pushing the frontier of physical AI. Joining us means inventing from first principles, owning real systems end-to-end, and helping build a capability the world has never had before.

About the Role

We’re seeking a Senior LLM Evals Engineer to build the evaluation and verification layer for agentic LLM systems that act in complex environments and drive autonomous workflows. You’ll design eval suites, automated verifiers, and regression gates that measure real progress on long-horizon planning, agent execution, uncertainty retirement, and end-to-end build success. This role spans systems engineering, rigorous experimentation, and tight collaboration with LLM scientists, agent/toolchain engineers, and simulation teams.

Responsibilities

  • Build an eval harness for agentic LLM systems (offline, simulator-in-the-loop, and workflow-in-the-loop).

  • Design evals for long-horizon planning, specific agent-call correctness, recovery behavior, and safety/constraint adherence.

  • Help with verifier-driven scoring (symbolic checks, simulation/twin checks, surrogate checks) and automated self-correction of the execution pipeline.

  • Create regression gates and release criteria for model/prompt/toolchain changes; prevent capability and safety regressions.

  • Define metrics for outlier identification and efficient question-asking that reduces uncertainty per unit time.

  • Partner with training teams to turn eval failures into data (SFT/DPO/RL signals) and continuously improve the suite.

Qualifications

  • Strong experience building evaluation systems for ML models (LLMs preferred) with high engineering rigor.

  • Excellent software engineering skills (Python, data pipelines, test harnesses, distributed execution, reproducibility).

  • Deep understanding of agentic failure modes (tool misuse, hallucinated evidence, reward hacking, brittle formatting) and how to measure them.

  • Ability to work across research and production systems in a fast-moving environment.

Compensation

  • We offer a competitive salary, meaningful equity, and benefits.

Average salary estimate

$240,000 / year (est.)
Min: $180,000
Max: $300,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.


Grafton Sciences (formerly, Grafton Biosciences) is pioneering physical superintelligence — autonomous systems that merge machine learning, robotics, and scientific reasoning to explore and understand the universe. By designing AI-driven platform...

Employment type: Full-time, onsite
Date posted: January 1, 2026