Tzafon is a foundation model lab building scalable compute systems and advancing machine intelligence, with offices in San Francisco, Stockholm, and Tel Aviv. We recently raised $9.7m in pre-seed funding to advance our mission of expanding the frontiers of machine intelligence.
We're a team of engineers and scientists with deep backgrounds in ML infrastructure and research. Founded by IOI and IMO medalists, PhDs, and alumni from leading tech companies, we train models and build infrastructure for swarms of agents to automate work across real-world environments.
In this role, you'll work closely with our researchers to collect and prepare data for training our foundation models. You'll develop the data engine that powers our models, ensuring the data it produces is clean, diverse, and high-quality.
Responsibilities
Build and maintain scalable data pipelines for training and fine-tuning LLMs and agent models
Create and optimize distributed computing systems for processing web-scale datasets
Clean, deduplicate, normalize, and cluster diverse datasets across structured and unstructured sources
Design robust pipelines using tools like Spark, BigQuery, dbt, and Airflow
Collaborate with researchers and engineers to develop reproducible dataset curation workflows
Monitor data quality and build tools for versioning, observability, and auditing
Help define what “great data” looks like for real-world intelligent agents
Develop and maintain core processing primitives (e.g., tokenization, deduplication, chunking) with a focus on scalability
Requirements
3+ years of full-time experience as a data engineer and 6+ years of software engineering experience overall (data engineering included)
Proficiency in Python, Scala, or Java
Solid understanding of Spark and the ability to write, debug, and optimize Spark code
Familiarity with GCP, BigQuery, dbt, Trino, Hex, and other cloud-based data and analytics platforms
Experience with ML datasets and data preparation for model training
Excitement about joining a fast-moving research team to shape the quality of intelligence from the ground up
Example projects
Designing and implementing distributed computing architecture for web-scale data processing
Building scalable infrastructure for model training data preparation
Creating comprehensive monitoring and alerting systems
Optimizing tokenization infrastructure for improved throughput
Developing fault-tolerant distributed processing systems
Implementing new infrastructure components based on research requirements
Building automated testing frameworks for distributed systems
Benefits
Full medical, dental, and vision coverage, plus 401(k)
Offices in San Francisco and Tel Aviv
Early-stage equity in a future-defining company
Visa sponsorship: We do sponsor visas! That said, we aren't able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
Compensation
Compensation ranges from $150k to $425k, plus an equity package.
We also offer a $20k referral bonus for successful hires (send referrals to [email protected]).