Research Engineer (Infrastructure)

Research Engineer, Infrastructure (RL & Numerics)

About the Role

As a Research Engineer at humans&, you'll design, build, and optimize the core systems that power large-scale reinforcement learning and model training. You'll work at the intersection of research and infrastructure, building the foundation that enables our researchers to push the boundaries of AI capabilities.

Your work will directly enable breakthroughs in AI by making experimentation and training fast, reliable, and scalable. This role is ideal for someone who blends deep systems expertise with curiosity for machine learning at scale—a builder who understands both the math of optimization and the realities of distributed compute.

We're hiring multiple engineers for this team.

What You'll Do

Design and build scalable infrastructure for reinforcement learning workloads
Optimize the numerical foundations of our distributed training stack, including precision formats, kernel optimizations, and communication frameworks
Improve the performance, stability, and reproducibility of training large models
Debug complex issues at the intersection of ML and systems—from diagnosing cluster failures to fixing regressions in data pipelines
Collaborate closely with researchers to accelerate experiments, develop new capabilities, and ensure every GPU cycle drives scientific progress
Build tools and abstractions that improve research velocity and enable our team to focus on science rather than system bottlenecks

What We're Looking For

Strong software engineering skills with the ability to write performant, maintainable code and debug complex codebases
Proficiency in Python and deep understanding of deep learning frameworks (PyTorch, JAX) and their underlying system architectures
Experience with distributed systems and large-scale computing
A bias for action—comfortable working across different stacks and teams to make sure things ship

Highly valued:

Experience building production training systems on many GPUs
Experience with reinforcement learning infrastructure and training pipelines
Background in floating-point numerics, low-precision arithmetic, and numerical optimization
Familiarity with distributed training frameworks (DeepSpeed, Megatron-LM, XLA) and cluster orchestration (Kubernetes, SLURM, Ray)
Track record of improving research productivity through infrastructure design
Contributions to open-source ML infrastructure

Average salary estimate

$235,000 / year (est.)

Min: $170,000

Max: $300,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

humans& ai, inc

Funding: Private

Departments: Research & Development

Seniority level requirement: Senior Level

Team size: No info