Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). While traditional observability focuses on logging exceptions and latency, our ABM surfaces behavioral anomalies such as instruction drift and context retrieval loss in production environments at scale.
Hundreds of teams building autonomous agents rely on Judgment to understand how their systems are behaving post-deployment. Instead of reactive incident triage, they cluster patterns across conversations and workflows, correlate regressions to specific interaction types, and pinpoint where reliability breaks down in their usage context.
We’ve raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.
Forward Deployed AI Engineers at Judgment Labs embed our agent behavior monitoring (ABM) infrastructure directly into customer production systems. You will work inside customer codebases to integrate monitoring and evaluation into real agent workflows, diagnose failures in live environments, and drive deployments to reliable production use.
This role centers on deep technical execution and customer ownership. You will work directly with customer teams to reason about agent behavior, translate high-level goals into concrete ABM deployments, and own outcomes end-to-end across real production environments. The scope, judgment, and autonomy this role requires make it a training ground for founding or leading a technical company.
Deploy and embed Judgment Labs’ ABM platform and AI components directly into customer codebases and production AI systems
Work inside customer systems to integrate monitoring, evaluation, and agent-facing components into real workflows
Guide customers through technical decisions around agent monitoring, evaluation strategy, and integrating these capabilities into existing production systems
Own multiple customer engagements end-to-end, ensuring successful integration and sustained adoption of monitoring and evaluation systems within production agent workflows
You identify with at least one of the following:
Experience deploying AI or LLM-based systems into real production environments
Ability to quickly learn new tools and systems, and integrate AI infrastructure into existing customer workflows and codebases
Ability to translate ambiguous customer goals into concrete technical solutions and evaluation strategies
Strong customer-facing skills, including explaining complex technical concepts clearly and building trust with both technical and non-technical stakeholders
Comfort owning deployments end-to-end, from initial integration through successful production adoption
You want to be a technical founder in the future.
Agents can’t work without this. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.
We’re wired to win. We’re a team of fewer than 20, but we ship like 50+ on the daily. You’ll be working with olympiad medalists, debate champions, and competitive athletes who bring that same intensity to company building.
Fast track to founding. Our engineers interface directly with customers, ship code into their environments, and use their feedback to dictate what’s next on the roadmap. Everyone on the team is either an ex-founder or a founder-to-be.
We make sure our people do their best work. If you deserve a spot on the team, money will never get in the way of it. Full benefits, Equinox, and a private chef to take care of you. We sprint hard but we play hard. Ask us about our Smash/Mario Kart tournaments.
We work in person in San Francisco.