Senior Product Manager - Observability and Resilience

NVIDIA has become the platform upon which every new AI-powered application is built. From healthcare research applications to autonomous vehicles and voice-recognition systems, there is a need to simplify AI applications and workflows and make them predictable, and NVIDIA is right at the center of this revolution. Resiliency and observability are key to delivering customer value and an exhilarating customer experience. This product manager will lead the development of foundational tools dedicated to ensuring the resiliency and observability of large-scale accelerated computing platforms. By creating essential tools for system diagnostics, performance monitoring, and automated recovery, they will empower customers to confidently operate both complex AI training and demanding inference workloads with maximum uptime and efficiency.

What you will be doing:

  • Be a subject‑matter expert on resiliency and observability. Deeply understand failure modes across the GPU hardware, network, and software stack, the telemetry signals that reveal them, and how they correlate with workload health and SLOs. Master modern reliability architectures and keep up to date with industry trends.

  • Build for everyone who wants to use the platform. Drive joint project planning and define concrete deliverables, tasks, and work for resiliency and observability initiatives with external partners.

  • Fuel innovation in reliability tooling. Lead ideation sessions to propose novel approaches and shape new proof‑of‑concepts.

  • Bridge development, SRE, and partner teams. Facilitate clear communication, triage emergent issues rapidly, and ensure feedback loops between engineering and customer operations remain tight.

  • Coordinate execution across different functions. Work with engineering, design, operations, sales, and marketing to embed resiliency and observability requirements into every product launch, capacity expansion, and lifecycle transition.

What we need to see:

  • BS or MS in Computer Science, Computer Engineering, or a related field (or equivalent experience) and 12+ years of product‑management experience in enterprise technology.

  • Experience with GPU observability (DCGM, NVML, etc.) and integration into large‑scale telemetry systems.

  • Deep knowledge of AI/ML infrastructure, high‑performance computing (HPC), networking, and cloud technologies (IaaS, PaaS) including containerization, Kubernetes, and automation tools.

  • Familiarity with modern observability stacks: metrics, logs, traces, OpenTelemetry, Prometheus/Grafana, ELK/OpenSearch.

  • Experience building secure, compliance‑focused telemetry pipelines (SOC 2, FedRAMP), and ideally a deep understanding of such pipelines.

  • Ability to articulate trade‑offs among latency, throughput, cost, and reliability to both engineering and executive audiences.

  • Data-driven approach: defines SLIs/SLOs, manages error budgets, and develops value models.

  • Strong cross‑functional execution: writes clear specs and PRDs, produces GTM collateral, and leads agile processes.

Ways to stand out from the crowd:

  • A master's degree or PhD, or expertise in distributed systems, performance modeling, or fault‑tolerant computing.

  • Experience with MLOps and LLMOps ecosystems and integration with enterprise platforms; deployments at modern data‑center scale; and a track record of delivering ML/AI observability solutions for LLMOps, predictive incident detection, or anomaly classification.

  • Startup or 0-to-1 experience building cloud‑native observability or resilience tools; proven success bringing open‑source observability products to market and shaping GTM strategy.

  • Familiarity with MLOps toolchains and integrations with monitoring platforms such as Splunk, Datadog, and Grafana Cloud.

  • Expertise with containerization technologies like Docker and Kubernetes, plus virtualization. Proficiency in network architecture and high‑performance interconnects (InfiniBand, Ethernet, RoCE).

We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our elite engineering teams are growing fast. NVIDIA is widely considered to be one of the industry's most desirable employers. NVIDIA is at the center of deep learning, artificial intelligence, and autonomous vehicles. If you're looking for a challenge, thrive in an ambiguous environment, and share our passion for technology, we want to hear from you. We are looking for great people to help us accelerate the next wave of artificial intelligence.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 208,000 USD - 327,750 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 21, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA Glassdoor Company Review
4.6 stars
NVIDIA DE&I Review
No rating
CEO of NVIDIA
Jensen Huang

Average salary estimate

$267,875 / year (est.)
Min: $208,000
Max: $327,750

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.

BADGES
Changemaker
Diversity Champion
Family Friendly
Global Citizen
Work&Life Balance
CULTURE VALUES
Customer-Centric
Mission Driven
Inclusive & Diverse
Rise from Within
Diversity of Opinions
Work/Life Harmony
Growth & Learning
Transparent & Candid
BENEFITS & PERKS
Medical Insurance
Paid Time-Off
Maternity Leave
Mental Health Resources
Equity
Child Care stipend
Paternity Leave
WFH Reimbursements
Flex-Friendly
Dental Insurance
Vision Insurance
Life insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
401K Matching
Military leave
EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
August 18, 2025