We’re looking for an experienced Site Reliability Engineer (SRE) to take ownership of our production systems’ availability, latency, performance, and capacity. In this role, you’ll apply your expertise in automation, monitoring, and resilient system design to maintain and improve our critical, large-scale infrastructure.
You will:
Respond to customer support requests and participate in our 24/7 support rotation
Maintain internal documentation and deployment playbooks
Modify and test server configurations, then deploy to production
Monitor infrastructure and respond to alerts
Automate tasks using tools like Ansible, Terraform, and Nomad
Contribute to internal tooling and platform improvements
Stay current with changes in the protocols and tooling we support
Other duties as assigned
Although the focus of this role is SRE work, there’s room to grow into other areas depending on your strengths, whether that’s platform engineering, networking, or data system architecture.
6+ years of relevant work experience in systems or infrastructure roles
Strong experience with Ansible
Experience with Prometheus, Grafana, and related monitoring tools
Solid understanding of networking and Linux-based systems
Hardware knowledge and experience managing physical or cloud-based fleets
Knowledge of Kubernetes
Experience with blockchain is a plus
Familiarity with the HashiCorp stack: Nomad, Consul, Vault
Experience with HAProxy or similar load balancing software
Programming experience in Go, Rust, or Python is a plus
Don’t meet all the “preferred” criteria? Don’t let that stop you! Let us know in your application where you’d still need to get up to speed. The most important thing to us is that you love taking on big challenges and learning new skills while solving problems.