Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include global corporations across multiple industries, national labs, and top-tier healthcare systems. In January, we announced a multi-year, multi-million-dollar partnership with Mayo Clinic, underscoring our commitment to transforming AI applications across various fields. In August, we launched Cerebras Inference, the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services.
About The Role
We are seeking a highly skilled and experienced AI Infrastructure Operations Engineer to manage and operate our cutting-edge machine learning compute clusters. In this role, you will work with the world's largest computer chip, the Wafer-Scale Engine (WSE), and the systems that harness its unparalleled power.
You will play a critical role in ensuring the health, performance, and availability of our infrastructure, maximizing compute capacity, and supporting our growing AI initiatives. This role requires a deep understanding of Linux-based systems and containerization technologies, as well as experience monitoring and troubleshooting complex distributed systems. The ideal candidate is a dependable, proactive problem-solver with expertise in large-scale compute infrastructure and a strong advocate for customer success.
Responsibilities
Skills And Requirements
Preferred Skills And Requirements
Location
Why Join Cerebras
People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we've reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
Read our blog: Five Reasons to Join Cerebras in 2025.
Apply today and join us at the forefront of groundbreaking advancements in AI!
Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.