Browse 139 Site Reliability jobs hiring now. Companies hiring include Rackner, NVIDIA, and Hudl, in locations such as Stockton, Virginia Beach, and Sioux Falls.
Serve as the weekend guardian for DoD Kubernetes platforms, ensuring continuous availability and secure multi-cloud operations for Air Force and mobility missions.
Lead the DevOps effort at NVIDIA to improve availability and accelerate delivery for internal services using CI/CD, containers, and scalable infrastructure.
Hudl seeks a pragmatic Software Engineer II to help scale observability and site reliability practices across the platform while collaborating closely with product teams.
Experienced SRE leader needed to manage multiple teams and advance cloud reliability, automation, observability, and security for LexisNexis Risk Solutions.
Lead reliability and automation efforts for a fast-moving mortgage fintech, owning monitoring, on-call, tooling, and infrastructure to ensure safe, scalable production systems.
Terminal Velocity (Rocket Science Group) is hiring a Senior Platform Engineer to design and operate scalable, resilient backend and platform systems that support global game launches.
Technical and people leader needed to drive architecture, operations, and continual improvement of hybrid cloud and on-prem data infrastructure supporting AI, bioinformatics, and large-scale research at the University of Chicago.
Lead the design and automation of Linea's cloud-native infrastructure as a Senior DevOps Engineer at Consensys, focusing on AWS, Kubernetes, Terraform, and observability to support a fast-moving Layer-2 blockchain.
Visa seeks an experienced Sr. Site Reliability Engineer to enhance reliability, automation, and incident response for critical payment systems in a hybrid role based in Highlands Ranch, CO.
Visa is hiring a Site Reliability Engineer in Austin to help automate operations, resolve incidents, and drive reliability improvements across critical payment systems.
Onebrief is hiring a Senior Site Reliability Engineer to own reliability, observability, and secure operations for on-prem and cloud military deployments in Colorado Springs.
Visa is hiring a Site Reliability Engineer to support and enhance the reliability and automation of mission-critical payment systems within the Product Reliability Engineering team.
Lead Visa's Site Reliability Engineering efforts to deliver highly available, secure, cloud-native application platforms while driving automation and operational excellence.
Sierra is hiring a seasoned Site Reliability Engineer to own observability, scalability, and secure cloud infrastructure for its AI platform in San Francisco.
Canary seeks an experienced Lead Site Reliability Engineer to drive incident response, SLO frameworks, and platform reliability across its remote engineering organization.
Senior-level SRE role focused on automating infrastructure and security controls, maintaining observability and SLOs, and improving reliability across Sonar’s global platform.
Lead the architecture and operational strategy for Lightspark’s global production infrastructure to ensure secure, reliable, and scalable payment systems.
Cape is hiring a Site Reliability Engineer to build and operate privacy-focused telecommunications infrastructure, improve system reliability and monitoring, and own FedRAMP accreditation for a fast-growing, mission-driven startup.
As a Senior Site Reliability Engineer for a high-growth platform, you will design and operate large-scale AWS infrastructure, build automation and observability, and partner with engineering teams to improve reliability and deployment velocity.
Senior Site Reliability Engineer needed to own large-scale AWS infrastructure, automate CI/CD and observability, and drive platform reliability for a high-growth, remote-friendly US company.
ServiceNow is hiring a Staff Software Engineer to build and operate production-grade database provisioning tools that ensure reliable, scalable database operations across global data centers.
Experienced Site Reliability Engineer II needed to lead production reliability, observability, and automated cloud operations for a healthcare data platform.
Senior Site Reliability Engineer focused on core systems development, automation, and platform reliability for Quizlet's high-volume production environment in San Francisco.
Lead reliability and automation efforts for Crusoe's SDN stack, ensuring high-performance, fault-tolerant networking for an AI-first cloud platform.
Lead the reliability, performance, and automation of Visa's mission-critical databases across PostgreSQL, Oracle, and MySQL to ensure high availability and a great developer experience.
Lead the design and implementation of large-scale distributed systems and platform services at LinkedIn, working across teams to deliver high-performance, secure infrastructure and contribute to open-source ecosystems.
Serve as the primary production support engineer for a remote US team, managing incidents, optimizing system performance, and working with cross-functional teams to improve reliability and scalability.
Camunda is hiring a Senior Cloud Infrastructure Engineer to architect and operate its Kubernetes-based multi-cloud platform and help drive reliability, observability, and automation for a global production environment.
Palo Alto Networks seeks a Principal DevOps Engineer (Cloud) to architect, automate, and operate scalable, secure cloud infrastructure supporting large-scale GCP/AWS services.
Experienced SRE/DevOps engineer needed to lead reliability, automation, and CI/CD improvements for mission-critical federal cloud environments.
Visa is hiring a Site Reliability Engineering Intern for Summer 2026 in Austin to support reliability, monitoring, and automation of mission-critical payment systems during a 12-week immersive program.
Senior Site Reliability Engineer needed to run and harden Featurespace's ARIC Risk Hub SaaS on cloud infrastructure, driving automation, monitoring, and operational excellence.
Jerry.ai is hiring a Senior DevOps Engineer to own and scale cloud infrastructure, CI/CD pipelines, monitoring, and incident response for a rapidly growing pre-IPO startup.
SpaceX seeks a Site Reliability Engineer to manage and scale GNC mission-critical infrastructure and HPC systems supporting Falcon launch operations in Hawthorne, CA.
Lead architecture and operation of LinkedIn's edge infrastructure as a Sr. Staff Software Engineer driving reliability, automation, and large-scale traffic systems.
Visa is hiring a Site Database Reliability Engineer to design, operate, and automate high‑availability databases that power critical payment applications.
Work with engineering teams to design and operate scalable, secure AWS infrastructure and automation tooling that supports mission-critical enterprise applications.
Lead SRE and DevOps efforts for mission-critical federal cloud environments, driving automation, reliability, and compliance at scale.
Dataiku is hiring a Site Reliability Engineer to build and maintain scalable, secure cloud infrastructure for its SaaS platform with an automation-first approach in a hybrid NYC role.
Lead Shippo's platform SRE team to build and operate scalable Kubernetes-based infrastructure, observability, and deployment automation that empower product teams to deliver reliable shipping services.
Lead global site operations for a fast-growing, sustainability-focused AI cloud provider to drive uptime, operational excellence, and scalable processes across a worldwide fleet of GPU data centers.
Lead the reliability and automation of a large Microsoft 365 environment, driving monitoring, incident response, and team leadership to ensure secure, high‑performing services.
Experienced utility designer needed to deliver and support moderately complex overhead and underground power distribution projects for KCI Technologies in Greenville.
Peraton is hiring an experienced Site Reliability Engineer to maintain and troubleshoot a data-intensive cloud platform (Kubernetes, Hadoop, Accumulo) while providing Tier 1–3 operational support under an active TS/SCI clearance requirement.
A global SaaS provider is hiring a Site Reliability Engineer to architect, automate, and operate resilient cloud infrastructure for a multi-tenant production environment.
Lead a Site Reliability Engineering team at Coalfire to deliver managed cloud services, drive operational excellence, and support clients across AWS (and multi-cloud) environments with a focus on security and compliance.
ServiceNow is hiring a hands-on Shift Manager for its Federal SRE team to lead 3rd-shift operations, drive automation, and ensure high availability of mission-critical cloud platforms.
Lead DevOps and site reliability engineering for federal cloud environments, building scalable infrastructure and driving operational excellence in a security-conscious setting.
Lead SRE efforts as the founding Site Reliability Engineer at a fast-growing AI company, building scalable, secure, and observable AWS infrastructure and processes.
Lead the reliability and scalability of cloud data platforms and ML/GenAI workloads as the organization's senior SRE for data infrastructure, driving automation and performance improvements.
Below 50k*: 0 | 50k-100k*: 4 | Over 100k*: 73