NVIDIA is looking for outstanding software and systems engineers to help us develop and operate our enterprise GPU infrastructure management systems across Clouds. In this role, you will work closely with the broader NVIDIA team to operate, design and build infrastructure management systems, Kubernetes operators, and end-to-end HPC integration solutions that combine GPUs with the rest of the datacenter software management ecosystem. We are focused on supporting NVIDIA products across HPC, Cloud, and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands. Your contributions will span many aspects of GPU systems management, including Cloud provisioning, observability, operations and incident response. The systems you operate will support single-node developer systems through large clusters with thousands of nodes deployed on multiple Cloud providers.
To succeed, you must have a strong systems and software development background, familiarity with modern distributed systems, especially the Cloud-native ecosystem, and a proven work ethic. This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot enterprise, cloud, and datacenter trends. Come join us as we craft the future of accelerated computing and AI.
What you'll be doing:
Enable GPU provisioning and life-cycle management with state-of-the-art Cloud-Native open-source ecosystem solutions, including Kubernetes, Docker, Prometheus, Terraform and Crossplane.
Develop, maintain and/or operate robust, scalable Go programs in a Kubernetes environment.
Develop the next-generation multi-cloud infrastructure management systems to support GenAI.
Support internal and external users through bug fixes, documentation, and feature improvements.
Maintain high-quality products through robust test coverage and Day 2 capabilities.
What we need to see:
BS or higher in Computer Science or equivalent experience.
8+ years of meaningful industry experience with a strong Kubernetes and SRE background.
Deep understanding of all aspects of the software development lifecycle, and the skills to execute on them.
Experience with OpenAPI and Kubernetes Custom Resource Definitions.
Business-level English, with outstanding written and verbal interpersonal skills.
Strong motivation and commitment to learn new skills.
Ability to manage time in a fast, heavily multitasked environment.
Ways to stand out from the crowd:
Open-Source contributions to the Cloud-Native community and an understanding of AI and LLM principles.
Strong experience with GitHub/GitLab CI/CD pipelines and application configuration.
Strong knowledge of container technologies, orchestration frameworks and observability systems.
Exposure to GPU programming with CUDA and familiarity with Kubernetes internals. Experience in developing Kubernetes operators.
Experience with managing and operating HPC schedulers and/or working across multiple Cloud providers.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.