NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence.
The team delivers NVIDIA Mission Control Software that runs on SuperPODs. The software we develop is shipped as an autonomous hardware recovery engine and is responsible for baseline validation tests, remedial actions (break/fix workflows), and periodic health checks for hardware components. We are looking for a Senior Software Engineer with experience in building highly scalable and robust enterprise software to join us. We are building and improving a powerful platform that will automate the diagnosis and repair of clusters of GPUs or CPUs across public clouds, private clouds, and virtual and physical hardware.
What you'll be doing:
Designing and implementing scalable and reliable software components that enable the core platform to maintain an inventory of resources (including hosts, GPUs, and switches), automate actions to diagnose failures, and repair them
Enabling Agentic AI within the core platform to create remedial workflows
Influencing the product roadmap in collaboration with teams across various departments with the goal of reducing SRE toil and improving hardware utilization
Collaborating with various organizations across NVIDIA to drive adoption of the platform in order to improve GPU utilization
Defining and running benchmarks for various subsystems
Leading and delivering high-impact projects with high quality, performance, and stability while minimizing resource consumption
Developing a robust feedback control system that analyzes signals about system health and automatically runs commands to fix discovered issues
Programming in modern languages like Go and Rust
What we need to see:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience)
Keen interest in driving Agentic AI projects
10 years of equivalent experience
Demonstrated ability in building scalable and robust distributed systems
Proven record of product rollouts and collaborating with early adopters
Proficiency in programming in C/C++, Java, Rust, or Go
Technical stewardship of projects across the organization
Ways to stand out from the crowd:
Deep understanding of multi-threading and distributed systems concepts
Excellent track record of delivering projects
Expertise in optimizing SQL queries
Expert-level knowledge of Go/Rust programming
With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and versatile people in the world working with us, and our engineering teams are growing fast in some of the most impactful fields of our generation: Cloud Engineering and Cloud Functions. If you're a creative engineer who enjoys autonomy and shares our passion for technology, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.