About Grafton Sciences
We’re building AI systems with general physical ability — the capacity to experiment, engineer, or manufacture anything. We believe achieving this is a key step towards building superintelligence. With deep technical roots and real-world progress at scale (e.g., a $42M NIH project), we’re pushing the frontier of physical AI. Joining us means inventing from first principles, owning real systems end-to-end, and helping build a capability the world has never had before.
About the Role
We’re seeking a Senior LLM Evals Engineer to build the evaluation and verification layer for agentic LLM systems that act in complex environments and drive autonomous workflows. You’ll design eval suites, automated verifiers, and regression gates that measure real progress on long-horizon planning, agent execution, uncertainty retirement, and end-to-end build success. This role spans systems engineering, rigorous experimentation, and tight collaboration with LLM scientists, agent/toolchain engineers, and simulation teams.
Responsibilities
Build an eval harness for agentic LLM systems (offline, simulator-in-the-loop, and workflow-in-the-loop).
Design evals for long-horizon planning, specific agent-call correctness, recovery behavior, and safety/constraint adherence.
Help with verifier-driven scoring (symbolic checks, simulation/twin checks, surrogate checks) and automated self-correction of the execution pipeline.
Create regression gates and release criteria for model/prompt/toolchain changes; prevent capability and safety regressions.
Define metrics for outlier identification and for efficient question-asking that reduces uncertainty per unit time.
Partner with training teams to turn eval failures into data (SFT/DPO/RL signals) and continuously improve the suite.
Qualifications
Strong experience building evaluation systems for ML models (LLMs preferred) with high engineering rigor.
Excellent software engineering skills (Python, data pipelines, test harnesses, distributed execution, reproducibility).
Deep understanding of agentic failure modes (tool misuse, hallucinated evidence, reward hacking, brittle formatting) and how to measure them.
Ability to work across research and production systems in a fast-moving environment.
Compensation
We offer a competitive salary, meaningful equity, and benefits.
Grafton Sciences (formerly Grafton Biosciences) is pioneering physical superintelligence: autonomous systems that merge machine learning, robotics, and scientific reasoning to explore and understand the universe. By designing AI-driven platform...