At NVIDIA, continuous innovation in AI and accelerated computing demands robust, automated, and secure production environments. We are seeking a deeply skilled Senior Staff Site Reliability Engineer (SRE) to advance our enterprise security initiatives around identity and access, delivering zero trust outcomes by implementing, integrating, and scaling innovative technologies across cloud-native and hybrid infrastructures.
This position requires a strong software engineering background, but focuses on reliability, scalability, and operational excellence. A strong candidate excels in crafting secure systems, integrating internal and commercial products, and using sophisticated tools.
What You’ll Be Doing:
Architect, operationalize, and scale zero trust identity and access platforms—driving reliability, automation, and secure credential and policy management across on-premise and cloud environments.
Integrate and automate the deployment, monitoring, and lifecycle management of existing commercial and open-source products (SPIRE, Teleport, etc.), emphasizing ephemeral certificate-based authentication, mTLS, and SPIFFE protocols.
Advocate for operational guidelines for CI/CD, infrastructure as code (IaC), policy as code, and security observability, using tools like Kubernetes, Argo CD, GitLab CI, Terraform, Vault, Prometheus, and Grafana.
Apply AI-assisted and data-driven approaches to automate anomaly detection, incident response, and compliance reporting, driving continuous improvement in system uptime and threat mitigation.
Collaborate with engineering, DevSecOps, and security teams to minimize manual intervention, limit privileged access, and enforce policy compliance through scalable automation.
Lead incident management, triaging, and blameless postmortems with security context, ensuring rapid root-cause analysis and recovery.
Conduct ongoing risk assessments, proactively address emerging threats and vulnerabilities, and contribute to post-incident reviews focused on reliability and trust boundary breaches.
What We Need to See:
Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent experience.
10+ years of software engineering/DevOps/SRE experience, with a significant focus on operational security, automation, and identity management.
Proficiency in Linux administration, networking concepts, and security protocols.
Proven track record integrating and operating container platforms (Kubernetes, OpenShift, Nomad), with strong emphasis on automation and CI/CD (Argo CD, GitLab CI, Jenkins, Spinnaker, etc.).
Hands-on knowledge of zero trust security principles, including SPIFFE/SPIRE, mTLS, X.509 rotation, SSO, OAuth2/OIDC, LDAP, and cloud IAM services.
Experience with secrets management (Vault, AWS/Azure/Google Secret Manager, K8s Secrets) and infrastructure as code (Terraform, Pulumi, Ansible, CloudFormation).
Proficient in observability and monitoring tools (Prometheus, Grafana, ELK Stack, OpenTelemetry, or equivalent) and policy automation frameworks.
Proficient in automation using Python, Go, or similar languages.
Demonstrated ability to lead operational and incident response efforts at scale, developing runbooks and playbooks that leverage both automation and AI tools.
Ways to stand out from the crowd:
Direct experience operationalizing service mesh (Istio, Linkerd, Consul Connect), identity federation, or policy engines in reliability-focused environments.
Track record advancing zero trust architecture through automation and minimized human access, including ephemeral credentials and policy enforcement.
Background in integrating AI/ML-assisted tools for operational intelligence, anomaly detection, and reliability improvements.
Experience driving compliance, audit readiness, and operational security in cloud (AWS/GCP/Azure) and hybrid environments.
Relevant security/DevOps/SRE certifications and open-source contributions.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!
You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.