Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability.
Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.
Overview
Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.
Crusoe is building its next-generation orchestration platform to power GPU-accelerated and high-performance computing at scale. As a Staff Software Engineer on the Managed Orchestration team, you will shape the technical direction of our managed Kubernetes service, delivering systems that allow customers to run advanced workloads across CPUs, NVIDIA and AMD GPUs, and high-performance networking environments.
You’ll drive architecture and design for complex, distributed systems that integrate GPU operators, network operators, and CNI technologies (Cilium, Calico, Multus) with Kubernetes, while also supporting high-performance fabrics such as InfiniBand and RoCE. This role requires a blend of deep technical expertise, architectural leadership, and the ability to influence cross-functional teams to deliver reliable, scalable, and secure orchestration for mission-critical workloads.
Responsibilities
Lead architecture and design for core features of Crusoe’s Managed Kubernetes platform (multi-tenancy, control plane scalability, cluster lifecycle, and high availability).
Drive integration of GPU acceleration in Kubernetes, including device plugin architecture, GPU operators, scheduling, autoscaling, and monitoring.
Guide development of advanced container networking capabilities, including CNI plugins, network operators, service meshes, and high-performance fabrics (InfiniBand, RoCE).
Define and enforce best practices for security, multi-cluster deployments, and workload isolation across compute, GPU, and networking layers.
Partner with product and engineering leadership to set long-term technical strategy and roadmap for Crusoe Managed Kubernetes (CMK).
Mentor engineers across the organization, providing technical guidance and elevating standards for design, code quality, and operational excellence.
Troubleshoot and resolve complex distributed systems challenges spanning compute, networking, and GPU acceleration.
Contribute to and represent Crusoe in open-source communities (Kubernetes SIGs, CNCF projects, GPU and networking ecosystem).
Requirements
8+ years of software engineering experience in distributed systems, cloud, or HPC.
Proven track record of technical leadership and driving architecture in production systems.
Deep expertise in Kubernetes internals (control plane, operators, API machinery, scheduling).
Strong proficiency in Go (preferred) or another systems language (Rust, C++, Python for HPC tooling).
Extensive experience with GPU integration in Kubernetes (device plugins, GPU operators, resource allocation).
Strong knowledge of container networking (Cilium, Calico, Multus, service meshes) and Linux networking fundamentals.
Familiarity with high-performance networking technologies (InfiniBand, RoCE) and accelerator-aware scheduling.
Excellent debugging, systems design, and problem-solving skills in distributed systems.
Preferred Qualifications
Familiarity with both NVIDIA and AMD GPU stacks (CUDA, ROCm, NCCL).
Experience with Slurm, MPI, Ray, or distributed ML frameworks (TensorFlow, PyTorch, JAX).
Contributions to open-source projects in the Kubernetes, GPU, or networking ecosystems.
Experience scaling multi-cluster environments and managing interconnects across data centers.
Background in security for Kubernetes and GPU workloads (RBAC, PodSecurity, runtime scanning).
Compensation Range:
Compensation will be paid in the range of $204,000 - $247,000. Restricted Stock Units are included in all offers. Compensation is determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.