Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation.
Replit is redefining how software is built, and who gets to build it. Our mission is to achieve Autonomy for All: making programming accessible, collaborative, and powered by AI. To realize this vision, we are building a brand that is as iconic, inventive, and human as the product itself.
You'll directly impact Replit's AI agent—the core of our product strategy—by defining how we measure success, designing experiments that drive improvements, and turning agent trace data into actionable insights for the AI team and company leadership.
Design and analyze experiments to measure agent improvements—from model changes to UX variations—with statistical rigor and practical tradeoffs.
Define success metrics that connect agent trace data (prompts, responses, code changes, execution outcomes) to user outcomes like successful deploys, retention, and revenue.
Build the semantic layer for agent data in partnership with data engineering—defining the tables, metrics, and models that enable self-serve analysis across the AI team.
Surface insights from trace analysis that identify failure modes, successful patterns, and opportunities to improve agent effectiveness.
Partner with AI engineering, product, and leadership to translate data into roadmap decisions; you'll have a seat at the table for critical agent strategy discussions.
Create dashboards and reporting that surface agent performance metrics (task completion, latency, quality scores, user satisfaction) for the AI team and executives.
Design an experiment to measure whether a new model improves task completion rates, accounting for user heterogeneity and novelty effects.
Build outcome-linked data models that connect agent trajectories to downstream success (deployments, user satisfaction, retention).
Develop evaluation frameworks for agent quality that can be reused as benchmarks—similar to how LLMs have standard evals.
Investigate why agent performance varies across coding tasks, languages, or user segments—and recommend targeted improvements.
5+ years of experience in data science, analytics, or a quantitative role with a focus on product, growth, or experimentation.
Deep experimentation expertise: A/B testing, experiment design, power analysis, handling skewed data, interpreting results beyond p-values.
Strong SQL skills and experience designing data models for high-volume event data; experience with dbt or similar transformation tools.
Proficiency in Python and data science libraries (pandas, scipy, statsmodels, etc.).
Ability to translate ambiguous questions into structured analysis and communicate findings clearly to both technical and non-technical stakeholders.
Bias toward action: you ship insights that influence decisions, not just dashboards.
Experience with LLM or AI agent evaluation—understanding of prompt-response patterns, agent evaluation frameworks, or model quality measurement.
Background in high-growth SaaS or PLG companies with large-scale event data.
Experience with modern data stack (BigQuery, dbt, Fivetran, Segment, Hex).
Familiarity with experimentation platforms (LaunchDarkly, Statsig, Eppo, or similar).
Understanding of developer tools or software engineering workflows.
You've built agent or LLM evaluation frameworks from scratch.
Experience with causal inference methods (difference-in-differences, synthetic control, CUPED).
Familiarity with real-time data systems or operational analytics for monitoring agent performance.
Experience working with trace data, logging systems, or observability tooling.
This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday.
Full-Time Employee Benefits Include:
💰 Competitive Salary & Equity
💹 401(k) Program
⚕️ Health, Dental, Vision and Life Insurance
🩼 Short Term and Long Term Disability
🚼 Paid Parental, Medical, Caregiver Leave
🚗 Commuter Benefits
📱 Monthly Wellness Stipend
🧑‍💻 Autonomous Work Environment
🖥 In Office Set-Up Reimbursement
🏝 Flexible Time Off (FTO) + Holidays
🚀 Quarterly Team Gatherings
☕ In Office Amenities
Want to learn more about what we are up to?
Interviewing + Culture at Replit
To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.