Browse 58 Site Reliability jobs hiring now, at companies such as Jobgether, Rokt, and FM, in locations including Chesapeake, Des Moines, and Gilbert.
Lead a globally distributed SRE team to improve reliability, scalability, observability, and cost efficiency for cloud-based SaaS platforms across AWS and Azure.
Lead and scale global, mission-critical SaaS support operations as a Senior Manager focused on operational rigor, cross-functional collaboration, and customer excellence.
Rokt is hiring an experienced Engineering Manager (SRE) to lead production engineering, harden cloud infrastructure at scale, and develop a high-performing SRE team.
Experienced cloud-focused Senior Software Engineer wanted to build and operate scalable infrastructure and developer tools across AWS, Kubernetes, and Cloudflare for enterprise platforms.
Lead the architecture and operation of NVIDIA's global observability platform to ensure reliable, high-performance telemetry for large-scale AI and data systems.
Lead DevOps Engineer needed to architect and modernize CI/CD and cloud infrastructure for large-scale enterprise applications in Dallas, TX.
Kalshi is hiring a Site Reliability Engineer to strengthen observability, automate operations, and scale reliable production services for its fast-growing prediction markets platform.
At Campfire, this in-office DevOps Engineer role owns AWS infrastructure, Terraform automation, observability, and production reliability to support a fast-growing accounting SaaS product.
WEX seeks an experienced Senior Staff SRE to define and execute enterprise reliability strategy, build resilient systems, and lead cross-functional initiatives that improve scale, observability, and operational excellence.
SpaceX Starshield is hiring a Senior Site Reliability Engineer to build and operate secure, highly available infrastructure supporting national-security satellite and communications systems.
Seasoned platform engineering leader to define and deliver cloud-first platform strategy and modernization at T. Rowe Price, ensuring reliability, scalability, and operational excellence across global infrastructure.
Senior cloud engineering leader to oversee AWS-based platform, SRE, and systems teams, driving FinOps, observability, and large-scale infrastructure modernization in a remote-first setting.
Senior infrastructure engineer needed to drive resiliency, observability, and scalable real-time systems for Orb's billing platform in a hybrid San Francisco office environment.
NBCUniversal is hiring a Site Reliability Engineer to build, operate, and enhance monitoring and control systems for its IP video distribution and on-air broadcast environments.
Experienced SRE leader needed to architect, automate, and operate cloud-native infrastructure to deliver reliable, scalable services across regulated environments.
OnePay seeks an experienced Site Reliability Engineer to improve platform reliability and observability for a high-scale consumer fintech platform serving millions of users.
Lead the mobilization and operational readiness of new and transitioning data center sites for T5, ensuring seamless handoff to operations and full compliance with company standards.
Work on NVIDIA's DGX Cloud team to design and operate large-scale Kubernetes-based GPU clusters that power cutting-edge AI workloads.
Experienced engineering leader sought to manage and grow an SRE team that ensures reliability, scalability, and operational excellence for cloud-native production systems.
Build and scale the compute and infrastructure that powers Chai Discovery's next-generation AI drug design platform as a Software Engineer, Infrastructure.
Senior SRE leader needed to shape reliability practices, mentor engineers, and deliver resilient, scalable cloud infrastructure for a high-throughput fintech platform.
Help build the core compute delivery platform for a San Francisco startup creating a liquid market for GPU offtake as a Software Engineer focused on cloud and systems programming.
Valinor is looking for an Infrastructure & Security Engineer to design, operate, and secure CI/CD pipelines and cloud/edge infrastructure for defense-focused products across its portfolio.
Lead Peacock's SRE and DevSecOps efforts as Manager, guiding cloud architecture and engineering teams to deliver secure, scalable streaming services for millions of users.
Senior Software Engineer - Reliability (remote, CA) to help build foundational SRE practices, observability, and infrastructure automation for secure, compliant cloud production systems.
Help architect and operate cloud-native, AI-powered platforms as a Software Engineer (SRE) focused on reliability, automation, and scalable microservices.
Help operate and scale a high-performance GPU cluster used by cutting-edge ML research and production teams as a Senior Site Reliability Engineer.
Experienced cloud-native engineer needed to lead design and automation of scalable Kubernetes platforms across AWS and OCI, driving reliability, cost optimization, and developer experience.
Trunk is hiring a Forward Deployed Engineer to lead end-to-end private and on-premises deployments, collaborate with enterprise IT, and ensure secure, reliable operation of its CI Reliability Platform.
Lead reliability engineering for LinkedIn's massive streaming platform—designing, coding, and operating pub/sub infrastructure to ensure scalable, highly available data flow across the company.
Lead efforts to improve reliability and performance of Alpaca's streaming infrastructure (RabbitMQ/Redpanda) as a Staff Site Reliability Engineer on a remote, North-America-based team.
Lead and mentor a high-performing DevOps team to deliver secure, reliable cloud and monitoring solutions for Kaseya’s Remote Monitoring & Management product.
Lead the development of AI-driven, self-healing SaaS infrastructure as a Senior Site Reliability Engineer at a remote-friendly US company focused on operational excellence and scalable reliability.
Senior Site Reliability Engineer (remote, US) needed to drive automation and reliability at scale, collaborating with cross-functional teams and leading operational excellence initiatives.
Gong is hiring a Senior DevSecOps Engineer to architect and secure scalable AWS infrastructure and CI/CD pipelines for its Revenue AI platform.
Ciroos is hiring a Senior Forward Deployed Engineer to lead enterprise deployments of its AI SRE Teammate, ensuring reliable production outcomes and translating operational pain into product improvements.
Cohere is hiring a Staff Software Engineer to build and operate ML-optimized HPC infrastructure (Kubernetes-based GPU/TPU superclusters) that accelerates research and production training of large AI models.
Lead reliability engineering at Quizlet as a Senior Staff SRE—architect resilient, self-healing systems, modernize infrastructure, and mentor senior engineers for a global learning platform.
Netic seeks a founding Product Infrastructure Engineer to build and scale the cloud backbone that runs its autonomous AI agents and drives the next wave of agentic products in the physical services economy.
InfStones is hiring a Blockchain Site Reliability Engineer to maintain and scale mission-critical blockchain node infrastructure while driving automation and incident response for a multi-chain platform.
ServiceNow seeks a Site Reliability Engineer (Federal) for the 3rd shift to maintain and improve government cloud infrastructure reliability through automation, monitoring, and deep systems engineering.
Quizlet seeks a Staff Site Reliability Engineer to own platform-wide reliability, automation, and scaling for their San Francisco-based infrastructure team.
Quizlet is hiring a Senior Site Reliability Engineer to build automation, observability, and self-healing infrastructure that ensures reliable, scalable delivery of AI-driven learning services.
Help drive reliability and scalability for Palo Alto Networks' Advanced URL Filtering platform by building secure, automated cloud infrastructure and operational tooling.
Help Ashby scale reliably as a Staff Platform Engineer by building pragmatic, developer-friendly infrastructure, improving observability and reliability, and owning platform initiatives end-to-end.
Waabi is hiring a Senior/Staff Infrastructure Engineer to design and operate high-performance physical and cloud infrastructure that powers its self-driving vehicle research and deployments.
Senior Software Engineer (Site Reliability) to architect resilient, scalable services and mentor engineering teams in support of WGU’s mission to expand access to higher education.
Humana is hiring a Senior Tech Lead (SRE) to lead reliability engineering efforts, drive automation and incident response, and scale critical services across cloud and hybrid environments.
Lead architecture and technical strategy for LinkedIn’s Infrastructure SRE organization to build scalable, reliable, and AI/ML-enabled operational platforms.
Axiom is hiring a Site Reliability Engineer to build and operate scalable, highly available cloud infrastructure for its serverless data analytics platform.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 30