Browse 33 Slurm jobs hiring now. Companies hiring include Institute of Foundation Models, NVIDIA, and Anduril Industries, in locations such as Aurora, Austin, and Boston.
Lead the design and optimization of RDMA-based networking and GPUDirect integrations for world-class GPU supercomputing clusters at a research-focused AI institute.
NVIDIA seeks a seasoned technical leader to shape and standardize AI accelerator architectures across strategic accounts, combining hands-on engineering with cross-functional influence.
Lead the architecture, build, and accreditation of large-scale classified HPC and VDI infrastructure to support mission-critical DoD/IC workloads for a fast-moving defense technology company.
NVIDIA's MARS team is hiring a Senior AI/ML Storage Engineer to architect and operate exascale storage and orchestration systems powering frontier AI research and global-scale workloads.
Lead full-stack engineering to build and operationalize scalable GPU cluster platforms that empower researchers to run cutting-edge machine learning workloads with minimal operational overhead.
Cartesia is hiring a Cluster Infrastructure Engineer in San Francisco to build and operate large-scale GPU clusters and automation that power state-of-the-art multimodal model training and inference.
Experienced HPC Support Engineer needed to troubleshoot GPU/HPC clusters, mentor peers, and deliver high-quality customer support for Lambda’s deep learning cloud.
Lead technical engagements with NVIDIA Cloud Partners and customers to design, deploy, and scale production AI/HPC GPU infrastructure while influencing product and engineering strategy.
Drive technical integration and deployment of large-scale GPU AI/HPC infrastructure as a Solutions Architect partnering closely with NVIDIA Cloud Partners and strategic customers.
Lead and mentor a team of Super Intelligence HPC Support Engineers to deliver world-class support and incident leadership for hyperscale AI customers.
Lead product strategy for Pryon’s petabyte-scale, low-latency HPC RAG platform, aligning engineering, research, and federal stakeholders to deliver high-throughput inference and compliant deployments.
Remote role for an experienced Python engineer to automate testing, benchmarking, and deployment of large-scale deep learning models and AI services across GPU clusters.
The Pop Lab at UMIACS, University of Maryland seeks a Bioinformatics Engineer to build and maintain open-source genomic analysis software and pipelines for microbial communities.
Lead and mentor Lambda’s Tier III HPC Support Engineers to deliver rapid, high-quality support for enterprise Private Cloud GPU clusters while shaping product supportability and incident response.
Lead the scaling and operation of a production ML inference platform for biological models at an early-stage AI-for-drug-discovery startup.
Tamarind Bio is hiring a Full-Stack Engineer to design, build, and scale the web and API stack that enables AI-driven drug discovery for enterprise customers.
Tamarind Bio is hiring a Senior Software Engineer in San Francisco to scale the infrastructure and web/API platform that delivers AI-driven drug discovery tools to enterprise customers.
Experienced research engineers are wanted to build and scale multimodal AI and reinforcement learning training systems for autonomous vehicles at NVIDIA, combining publishable research with production-grade GPU infrastructure work.
Lead architecture and deployment of large-scale GPU-accelerated cloud solutions and AI Factory environments for NVIDIA, working with customers and engineering teams to deliver resilient, high-performance AI infrastructure.
Develop and maintain Python-based automation and tooling to benchmark, test, and deploy deep learning models and AI services at scale on enterprise GPU clusters.
Lead the reliability, scalability, and observability of research compute clusters to enable large-scale ML and HPC workloads for an innovative research-focused engineering team in California.
At NVIDIA, this Senior Software Engineer role will design and deliver scalable GPU cluster platforms and AIOps-driven automation to enable researchers to train and deploy advanced ML models with minimal operational overhead.
Lead the development of robust, high-performance deep learning training infrastructure for NVIDIA's Autonomous Vehicles group to enable multi-thousand-GPU training and rapid experimentation on massive datasets.
Tamarind Bio is hiring an AI/LLM Engineer in San Francisco to build scalable, production-grade workflows and enhance an ML copilot for computational biology.
Lead the University of Iowa's Research Services unit to shape and scale HPC, storage, and research-support services while building strategic partnerships and managing a $2.8M budget.
Founding Software Engineer at Tamarind Bio to build and scale the core infrastructure, web interface, and API products that power AI-driven drug discovery.
CesiumAstro is hiring a Senior DevOps Engineer I to manage on-premise RHEL build servers, GitLab CI pipelines, and containerized build environments that support FPGA compilation and hardware simulation workflows.
NVIDIA is hiring an EDA Workflow Optimization Engineer to investigate and optimize end-to-end chip-design workflows, build reliable metrics and infrastructure, and enable engineers to develop at high velocity.
NVIDIA is hiring a Senior Software Engineer to build and maintain scalable, high-performance GPU cluster platforms that accelerate AI research and reduce operational toil.
Voleon is seeking a Senior Cluster Site Reliability Engineer to ensure high-availability, observability, and scalable operations for our research compute clusters across on-prem and cloud environments.
NVIDIA seeks a Senior Solutions Architect (HPC & AI) to validate and debug GPU cluster performance, architect distributed AI infrastructure, and drive technical customer engagements across datacenter and cloud environments.
An early engineering hire to build and scale Tamarind Bio's DevOps/MLOps platform, web interface, and APIs while working closely with founders and customers in the SF Bay Area.
Lead the technical design and implementation of Crusoe's managed Slurm service to enable scalable, GPU-accelerated AI and HPC workloads on Crusoe Cloud.
| Below 50k* | 0 |
| 50k-100k* | 0 |
| Over 100k* | 32 |