Browse 192 exciting jobs hiring in Reliability now. Check out companies hiring such as Image Associates Inc., Jobgether, InStride in Boise City, Long Beach, Wichita.
Lead maintenance and reliability for a high-speed glass bottle manufacturing plant, overseeing electrical, hydraulic, automation systems and a large technical team to maximize uptime and safety.
Experienced DevOps engineers are sought to architect and operate scalable cloud infrastructure, automate delivery pipelines, and elevate security and observability across a distributed platform.
Principal Site Reliability Engineer to lead AWS architecture, automation, and reliability practices for a remote-first engineering team focused on scalable, secure learning platforms.
Lead and grow a high-performing engineering team to deliver mission-critical government products that expand access to benefits and improve outcomes for vulnerable residents.
OceanaGold is hiring a Reliability Engineer at the Haile Gold Mine to lead RCFA, PM/CBM program development, and data-driven asset strategy improvements to maximize plant uptime and reliability.
Lead advanced data analytics and optimization modeling to improve grid reliability and resilience across Eversource’s service territories.
Lead the reliability, scalability, and observability of research compute clusters to enable large‑scale ML and HPC workloads for an innovative research-focused engineering team in California.
Experienced SRE needed to be a primary technical partner for customers, drive reliability and observability across cloud infrastructure, and lead incident management and automation efforts.
Senior Software Engineer - Infrastructure to design and operate scalable, multi-region AWS platform tooling and immutable infrastructure for Veeva's Vault CRM.
Lead a lean, globally distributed Technical Operations team to ensure reliability, security, and operational excellence for Netflix Customer Service technologies.
Provide expert field and remote technical support and training for advanced diagnostic imaging systems, acting as the primary technical liaison for field teams and customers across the U.S.
Netflix is hiring a Distributed Systems Engineer (L5) to design, operate, and improve backend delivery services (Spinnaker/Managed Delivery) that enable safe, frequent deployments across the company.
Lead facilities and maintenance at Base Factory 1 to build preventative maintenance systems, improve equipment availability, and support reliable production operations in Austin.
Lead site maintenance operations and a small technician team to deliver safe, reliable equipment performance and continuous operational improvements for a Cushman & Wakefield client site.
Provable is hiring a Senior Infrastructure Engineer to design, automate, and operate GCP and GKE infrastructure for a privacy-first Web3 platform.
C&W Services is looking for a Reliability Engineer Leader to lead reliability programs, manage maintenance planners/schedulers, and drive equipment performance improvements at a fulfillment center in Hesperia, CA.
Lead the scheduling function for LLNL’s complex maintenance portfolio, supervising schedulers and owning the integrated T-week look-ahead schedule while ensuring safety, compliance, and continuous improvement.
Experienced principal-level software engineer to design, build, and operate high-quality developer tooling and resilient infrastructure for Palo Alto Networks' Cortex platform in Santa Clara.
Lead the automation and operational lifecycle of hyper-scale production networks at ServiceNow, driving reliability through code, IaC, and robust incident response for federal and public sector environments.
Peraton is hiring an Azure Engineer to lead incident response, monitoring, and optimization for a multi-tenant Azure GovCloud environment while ensuring security and FedRAMP compliance.
At Hadrian, lead predictive analytics for manufacturing—building forecasting and reliability models that improve delivery, reduce cost, and inform operational decisions across aerospace production.
Experienced DevOps Manager needed to lead DevOps strategy, automation and engineering team development at a market-leading visual asset management SaaS company.
Sandisk is hiring a Summer 2026 Product Development Engineer intern to develop automation and quality-control software that enhances NAND product reliability at our Milpitas facility.
Datadog is looking for a Staff Software Engineer to evolve Atlas into a high-scale, durable workflow execution platform and become the orchestration backbone for internal AI and company-wide workflows.
SanDisk seeks a Senior Product Development Engineer in Milpitas to lead PCIe/NVMe SSD system test, qualification and production-readiness efforts.
AECOM is hiring a remote Data Center Section Manager – Power to lead technical delivery and business growth for data center energy projects, with an emphasis on natural gas generation and substations.
Experienced Quality Engineer wanted to lead QA processes and supplier quality for Etched’s advanced AI hardware production at our San Jose headquarters and partner factories.
ConductorOne is hiring a Site Reliability Engineer to build and run scalable, automated, and observable infrastructure that keeps their identity governance platform resilient and performant.
Micron is seeking a DRAM and Emerging Memory Test Development intern to develop qualification and reliability tests, build analysis scripts, and help characterize leading-edge memory devices at the Boise main site.
Novelis is hiring a Sr. Mechanical Engineer at its Terre Haute aluminum rolling facility to drive mechanical reliability, lead engineering projects, and optimize plant systems and assets.
Lead supplier quality and development for Starlink printed circuit boards, working cross-functionally to qualify suppliers, improve yields, and enable high-rate production.
Voleon is seeking a Senior Cluster Site Reliability Engineer to ensure high-availability, observability, and scalable operations for our research compute clusters across on-prem and cloud environments.
Mercor is hiring a Senior Software Engineer to design and scale production-grade systems that power workflows for leading AI labs and support rapid company growth.
Senior Mechanical Design Engineer to support sustaining and new-design activities for robotic laparoscopic stapler instruments at a leading medical-robotics company in Sunnyvale.
Veeam is seeking an Incident Manager to lead communications and coordination for SaaS incidents, improving reliability and customer trust through clear processes and tooling.
Anysource seeks a Staff Site Reliability Engineer to lead end-to-end enterprise deployments and run scalable, secure production infrastructure across Kubernetes and AWS.
Lead reliability and automation initiatives to keep critical enterprise SaaS systems performant and highly available in a remote-friendly, fast-paced environment.
Zapier is hiring an SRE to strengthen observability, incident response, and platform reliability across its cloud-native automation platform.
d-Matrix is hiring a contract Manufacturing Infrastructure Engineer in Santa Clara to build and maintain resilient Linux, PostgreSQL, and hybrid/cloud infrastructure for production manufacturing systems.
Catio is hiring a Senior SRE to design and build AWS infrastructure, IaC, and observability pipelines that power a fast-growing AI platform for technical leaders.
Campbell’s Pepperidge Farm is hiring a Process Specialist to drive ingredient system reliability and standardized recipe execution at its Lakeland bakery.
Experienced Site Reliability Engineer needed to drive reliability, automation, and cloud infrastructure improvements for Patreon's creator platform in a remote-capable role with optional NY or SF office attendance.
Lead and develop the Fremont facilities engineering and maintenance team to ensure reliable 24/7 wafer-fab operations, driving maintenance strategy, capital projects, and regulatory compliance for critical infrastructure.
As a Data Scientist on OpenAI’s Platform team, you will define and operationalize metrics, run experiments, and deliver insights that drive developer adoption and enterprise value for the API and platform.
Experienced Software Engineer II needed to develop Python microservices, automate infrastructure with Terraform, and maintain CI/CD pipelines in a cloud-native Kubernetes environment.
Lead serviceability and repair strategy for Palo Alto Networks hardware products, developing procedures, driving engineering changes, and improving repairability and reliability across the product lifecycle.
Experienced SRE needed to architect and run scalable, secure AWS infrastructure while driving observability, automation, and platform reliability across engineering teams.
A crypto-first CTO Advisor role to harden reliability, introduce pragmatic SRE/SDLC/MLOps cadence, and hand back operational runbooks and dashboards to the engineering team.
Developer Infrastructure Engineer role focused on building scalable cloud-native developer tooling and infrastructure to improve engineering productivity and reliability.
Below 50k*: 0 | 50k-100k*: 17 | Over 100k*: 80