NVIDIA is at the forefront of innovations in Artificial Intelligence, High-Performance Computing, and Visualization. Our invention—the GPU—functions as the visual cortex of modern computing and is central to groundbreaking applications from generative AI to autonomous vehicles. We are now looking for a Senior Software Engineer to help accelerate the next era of machine learning innovation.
In this role, you will propose and implement engineering solutions that deliver functional, reliable, secure, and performance-optimized GPU clusters to internal researchers. These solutions should let researchers focus on training and development by reducing operational disruption and overhead, and empower them to continuously improve reliability, operational excellence, and performance through self-service. Your work will empower scientists and engineers to train, fine-tune, and deploy the most advanced ML models on some of the world’s most powerful GPU systems.
What You'll Be Doing:
In this position, you will work with colleagues across the AI Platform organization to understand the pain points of validating, monitoring, and operating GPU clusters at scale. You will then design, develop, and maintain engineering solutions that address those pain points systematically.
You will also research traditional AIOps and emerging agentic AI techniques, and apply them to further reduce operational toil.
You will participate in on-call support for the systems and platforms built and owned by the team.
What We Need To See:
BS/MS in Computer Science, Engineering, or equivalent experience.
8+ years in software/platform engineering, including 3+ years in ML infrastructure or distributed systems.
Experience with the software development lifecycle on Linux-based platforms.
Strong coding skills in languages such as Python, C++ or Rust.
Experience with Docker, Kubernetes, GitLab CI, and automated deployments.
Experience applying AIOps or agentic AI successfully in a production environment.
Ways To Stand Out From The Crowd:
Proficiency with full-stack development: relational data modeling, database optimization, REST API semantics, JavaScript, CSS, and providing APIs as a service.
Passion for building developer-centric platforms with great UX and strong operational reliability.
Experience running Slurm or custom scheduling frameworks in production ML environments.
Familiarity with GPU computing, Linux systems internals, and performance tuning at scale.
You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.