Exa is building a search engine from scratch to serve every AI application. We build massive-scale infrastructure to crawl the web, train state-of-the-art embedding models to index it, and develop high-performance vector databases in Rust to search over it. We also own a $5M H200 GPU cluster that regularly lights up tens of thousands of machines.
As a Data Engineer, you'll architect and build the data infrastructure that powers everything we do—from crawling billions of pages to training our embedding models to serving real-time search. You'll have enormous autonomy in designing systems that scale to hundreds of petabytes. If you've ever wanted to build data pipelines at a scale that most companies only dream about, this is your chance.
Desired Experience
Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them
Experience building and operating large-scale distributed data processing pipelines
Hands-on experience with streaming data systems (Kafka, Flink, or similar)
Familiarity with Ray, Spark, or ClickHouse at production scale
An obsessive focus on reliability and building systems that don't page you at 3am
Bonus Points
Experience with Lance or other vector-native storage formats
Background in GPU-accelerated data processing (RAPIDS, cuDF)
Example Projects
Design a lakehouse architecture that handles 100+ PB of web crawl data
Build streaming pipelines that process billions of documents per day for real-time indexing
Architect the data layer for our embedding training infrastructure on Ray
Scale our ClickHouse deployment to handle analytical queries across petabytes of search logs
This is an in-person opportunity in San Francisco. We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H1B, O1, E3).