
Cluster Infrastructure Engineer

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

We’re looking for a Cluster Infrastructure Engineer to help build and scale the compute backbone that powers Cartesia’s research on real-time, multimodal intelligence. In this role, you’ll work at the intersection of distributed systems and infrastructure engineering, designing and operating the large-scale GPU clusters that train and serve Cartesia’s foundation models. You’ll own systems that need to be fast, reliable, and highly automated — ensuring our researchers and product teams can move at the speed of innovation. You’ll build the tooling, automation, and monitoring needed to keep clusters resilient under load, quickly diagnose and resolve issues, and continuously push the boundaries of scalability and efficiency.

Your Impact

  • Design and build large-scale GPU clusters for model training and low-latency inference

  • Develop automation for provisioning, scaling, and monitoring to ensure clusters are fast, resilient, and self-healing

  • Collaborate closely with research and product teams to enable distributed training at scale, optimizing for speed, reliability, and utilization

  • Implement robust observability and alerting systems to monitor GPU health, node stability, and job performance

  • Diagnose and triage hardware, networking, and distributed training issues across environments, coordinating with provider support as needed

  • Continuously improve cluster reliability, developer ergonomics, and overall system efficiency across Cartesia’s research and production workloads

What You Bring

  • Strong engineering fundamentals and experience building and operating large-scale distributed systems

  • Deep familiarity with GPU cluster management using Kubernetes and Slurm

  • A blend of developer empathy and raw performance engineering, designing systems and tools that are intuitive to use and fast

  • Ability to balance principled engineering with the urgency of keeping mission-critical systems alive

  • Proficiency with Infrastructure-as-Code tools (Terraform, Ansible, etc.) and observability tools (Prometheus, Grafana, etc.)

  • Strong debugging skills: comfortable diagnosing NCCL issues, CUDA errors, and network or driver-level faults

What Sets You Apart

  • Experience optimizing large-scale distributed training frameworks such as DeepSpeed, Megatron-LM, or similar

  • Familiarity with advanced parallelization techniques such as FSDP, context parallelism, or tensor parallelism

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Average salary estimate

$235,000 / year (est.)
Min: $170,000
Max: $300,000


Employment type: Full-time, onsite
Date posted: October 21, 2025