
AI and ML Infra Software Engineer, GPU Clusters

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.


We are currently hiring an AI/ML Infrastructure Software Engineer at NVIDIA to join our Hardware Infrastructure team. As an Engineer, you will play a crucial role in boosting productivity for our researchers through implementing advancements across the entire stack. Your primary responsibility will involve working closely with customers to identify and resolve infrastructure gaps, enabling innovative AI and ML research on GPU Clusters. Together, we can create powerful, efficient, and scalable solutions as we shape the future of AI/ML technology!


What you will be doing:

  • Collaborate closely with our AI and ML research teams to understand their infrastructure needs and obstacles, translating those observations into actionable improvements.
  • Monitor and optimize the performance of our infrastructure, ensuring high availability, scalability, and efficient resource utilization.
  • Help define and improve important measures of AI researcher efficiency, ensuring that our actions are in line with measurable results.
  • Collaborate with diverse teams, including researchers, data engineers, and DevOps professionals, to build a seamless and coordinated AI/ML infrastructure ecosystem.
  • Stay on top of the latest advancements in AI/ML technologies, frameworks, and effective strategies, and promote their implementation within the company.


What we need to see:

  • BS or equivalent experience in Computer Science or related field, with 8+ years of proven experience in AI/ML and HPC workloads and infrastructure.
  • Hands-on experience using or operating High Performance Computing (HPC) grade infrastructure, as well as in-depth knowledge of accelerated computing (e.g., GPU, custom silicon), storage (e.g., Lustre, GPFS, BeeGFS), scheduling & orchestration (e.g., Slurm, Kubernetes, LSF), high-speed networking (e.g., InfiniBand, RoCE, Amazon EFA), and container technologies (e.g., Docker, Enroot).
  • Expertise in running and optimizing large-scale distributed training workloads using PyTorch (DDP, FSDP), NeMo, or JAX, along with a deep understanding of AI/ML workflows, encompassing data processing, model training, and inference pipelines.
  • Proficiency in programming and scripting languages such as Python, Go, and Bash, familiarity with cloud computing platforms (e.g., AWS, GCP, Azure), and experience with parallel computing frameworks and paradigms.
  • Passion for continual learning and keeping abreast of new technologies and effective approaches in the AI/ML infrastructure field.
  • Excellent communication and collaboration skills, with the ability to work effectively with teams and individuals of different backgrounds.


NVIDIA provides competitive salaries and a comprehensive benefits package. Our engineering teams are expanding rapidly due to exceptional growth. If you're a passionate and independent engineer with a love for technology, we want to hear from you.


Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.


You will also be eligible for equity and benefits.


Applications for this job will be accepted at least until July 31, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.



NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.

CULTURE VALUES
Customer-Centric
Mission Driven
Inclusive & Diverse
Rise from Within
Diversity of Opinions
Work/Life Harmony
Growth & Learning
Transparent & Candid
BENEFITS & PERKS
Medical Insurance
Paid Time-Off
Maternity Leave
Mental Health Resources
Equity
Child Care stipend
Paternity Leave
WFH Reimbursements
Flex-Friendly
Dental Insurance
Vision Insurance
Life insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
401K Matching
Military leave
EMPLOYMENT TYPE
Full-time
DATE POSTED
August 20, 2025