Job details

Member of Technical Staff - Foundations

Tzafon is a foundation model lab building scalable compute systems and advancing machine intelligence, with offices in San Francisco, Zurich, and Tel Aviv. We’ve raised over $12M to expand the frontiers of machine intelligence.

We're a team of engineers and scientists with deep backgrounds in ML infrastructure and research. Founded by IOI and IMO medalists, PhDs, and alumni of leading tech companies such as Google DeepMind, Character, and NVIDIA, we train models and build infrastructure for swarms of agents to automate work across real-world environments.

You'll work between our product and post-training teams to ship Large Action Models that actually work. Build evals, benchmarks, and fine-tuning pipelines. Define what good model behavior means and make it happen at scale.

What you'll do

Design and execute large-scale training runs on our clusters
Build and optimize distributed training infrastructure across massive multi-node systems
Implement post-training pipelines at scale
Develop data pipelines that process and filter trillions of tokens for pre-training
Research and implement architectural improvements, scaling laws, and training optimizations
Debug training instabilities, loss spikes, and convergence issues in long-running jobs
Build tooling for cluster utilization, fault tolerance, and checkpoint management
Write custom CUDA/Triton kernels to optimize critical training operations (attention, normalization, activations)
Collaborate on research that advances the state of the art in foundation model training

We're looking for

Deep experience pre-training or post-training foundation models on large clusters
Expert-level proficiency in Python and ML frameworks (PyTorch, JAX, Torchtitan)
Strong systems skills: distributed training, FSDP/ZeRO, tensor parallelism, pipeline parallelism
Experience writing performant CUDA or Triton kernels for ML workloads
Track record of running stable multi-week training jobs and debugging distributed training failures
Understanding of cluster scheduling, networking bottlenecks, and GPU/TPU performance optimization

Preferred Experience

Trained foundation models at major AI labs (OpenAI, Anthropic, Google DeepMind, Meta, xAI, etc.)
Worked on large-scale RL runs
Optimized critical training kernels (FlashAttention, fused optimizers, custom kernels)
Published research at top ML conferences (NeurIPS, ICML, ICLR)
Contributed to open-source ML infrastructure (PyTorch, JAX, vLLM, etc.)
Experience with training data pipelines, data quality research, or synthetic data generation

Life at Tzafon

Full medical, dental, and vision coverage, plus 401(k)
Offices in San Francisco, Zurich, and Tel Aviv
Early-stage equity in a future-defining company

Visa sponsorship: We do sponsor visas, though we can't guarantee a successful outcome for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

Compensation ranges from $200k to $500k plus an equity package, depending on experience and location.

We also offer a $5k referral bonus for each successful hire (send referrals to [email protected]).



