Principal Software Engineer, AIOps and Observability

NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, generative AI, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

We are looking for a highly skilled Principal Software Engineer to design and develop AIOps & Observability platforms at NVIDIA. Internal teams use these platforms to monitor, diagnose, and optimize products and millions of assets and services across cloud, on-prem, data center, supply chain, and edge environments. You will work with a team of engineers, product managers, and partners to define the observability strategy, roadmap, and standard methodologies for NVIDIA. You will also mentor and coach other engineers on observability, machine learning, tools, and techniques.

What you will be doing: 

  • Lead the design, development, and deployment of AIOps & Observability platforms, including metrics, logs, traces, events, alerts, dashboards, and visualizations.

  • Drive the technical vision and roadmap for AIOps and Observability initiatives, aligning with business goals and industry best practices.

  • Collaborate with other teams and customers to understand their observability needs and provide solutions that meet their requirements and expectations.

  • Establish and implement observability standards, guidelines, and processes across NVIDIA. Research, evaluate, and adopt new observability technologies and frameworks that can enhance user experience.

  • Provide peer reviews for other engineers, including feedback on performance, scalability, security, and correctness.

  • Work with data scientists to implement machine learning models for anomaly detection, forecasting, and root cause analysis on logs, metrics, and events. Handle large volumes of data and ensure data quality, security, and compliance.

  • Develop and operate scalable, reliable, and distributed systems that can handle high traffic and complex workloads.

  • Find opportunities to automate remediation of commonly occurring issues to operate systems reliably and efficiently.

What we need to see: 

  • Bachelor’s degree in computer science and engineering, or related field, or equivalent experience.

  • 15+ years of experience in product development and full stack engineering, with 5+ years of experience in developing and operating observability platforms and solutions, preferably in a cloud-native environment.

  • Strong knowledge and experience with observability tools, such as Prometheus, VictoriaMetrics, Vector, Loki, Grafana, Alertmanager, ClickHouse, OpenTelemetry, etc.

  • Hands-on knowledge of AIOps tools such as BigPanda, PagerDuty, Datadog, etc.

  • Experience with Kubernetes, Nomad, Docker, and microservices architectures, as well as experience with streaming services to ingest billions of events using NATS, Kafka, etc.

  • Proficient in one or more programming languages, such as Go, Python, Java, C#, etc.

  • Passionate about observability and delivering high-quality internal platforms.

  • Experience developing observability solutions to monitor on-prem and public cloud environments.

  • Experience running large observability platforms on bare-metal infrastructure.

  • Experience establishing scalable data pipelines and instrumentation for collecting, aggregating, and visualizing telemetry and operational metrics.

Ways To Stand Out From The Crowd:

  • Deep understanding of implementing observability solutions for large-scale on-prem infrastructure and networking.

  • Hands-on experience managing large-scale observability platforms with LLMs and ML models, and building custom services to ingest billions of metrics and logs from a wide range of assets.

  • Experience developing a unified cloud observability platform to monitor network, compute, power, storage, operating systems, security, applications, and SaaS platforms.

  • Demonstrated experience and expertise in using machine learning and generative AI to develop solutions such as predictive monitoring, incident diagnosis, summarization, and correlation.

  • Demonstrated proficiency in AI/ML systems, generative AI, or agentic AI frameworks.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, self-motivated, and enjoy having fun, then what are you waiting for? Apply today!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 248,000 USD - 391,000 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until December 23, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Average salary estimate

$319,500 per year (est.); min $248,000, max $391,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.

Employment type: Full-time, hybrid
Date posted: December 22, 2025