
Staff Software Engineer - Site Reliability and Observability

Who We Are

Teraswitch is on a mission to provide the highest-performance, lowest-latency bare metal servers in the world. With 20 datacenter locations worldwide, Teraswitch has served thousands of customers across 185 countries. Founded by Brendan Mannella, Teraswitch is one of the largest privately held infrastructure companies in the world.

The Job

The Site Reliability Engineer (SRE) is a software engineer responsible for ensuring the reliability, scalability, and performance of software systems. The role includes:

  • System Monitoring and Troubleshooting: Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents.

  • Automation and Infrastructure: Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring.

  • Performance Optimization: Analyzing system performance, identifying bottlenecks, and implementing optimizations to improve the efficiency and scalability of software systems.

  • Incident Response and Root Cause Analysis: Responding to incidents, conducting root cause analysis, and implementing corrective actions to prevent similar incidents in the future.

  • Collaboration with Development Teams: Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into the software design and implementation.

  • Continuous Improvement: Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems.

  • Develop Systems for Internal Developers: Identifying areas of the software development lifecycle that can be improved to reduce cognitive overhead for developers, and helping them stay on the happy path toward developing sustainable, reliable, and resilient software using industry-standard practices.

Additional Job Description

What You'll Do

  • Implement a scalable, reliable, and secure SRE and observability platform to monitor the health of our production system and provide a holistic view of the environment.

  • Deliver tools/software to improve the reliability, scalability and operability of services.

  • Collaborate with engineering teams to analyze and provide input on architecture, infrastructure resources, and observability to achieve reliability and scalability goals.

  • Serve as a technical leader for key initiatives across the organization, identify potential issues and opportunities, and lead teams to architect the next generation of reliability software.

  • Deliver impact by building software that helps maintain reliability on our backend and frontend systems.

  • Improve best practices through developing technical implementations that solve multiple developer and business needs.

  • Participate in a 24/7 on-call rotation for critical systems.

Your Skills & Abilities (Required Qualifications)

  • 7+ years of hands-on SRE experience (software development, systems monitoring), including software development experience (Java, Go, Python)

  • Experience building and operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, defining alerts, writing runbooks, establishing dashboards, etc.

  • Experience with monitoring and logging tools such as Grafana, Loki, Logstash, ClickHouse, etc.

  • Experience owning and maintaining software, including the SDLC and deployment.

  • Strong working knowledge of Docker, Kubernetes, Terraform, Chef or Ansible.

  • Experience troubleshooting production applications and driving mitigation and remediation.

  • BS/MS in Computer Science/Engineering preferred

Compensation and Benefits

Along with competitive pay, as a full-time Teraswitch employee, you are eligible for the following benefits from day one of hire:

  • Health, Dental and Vision Insurance

  • 401k with company profit sharing

  • Flex PTO and 11 Company Paid Holidays

Average salary estimate

$180,000 / year (est.)
Min: $140,000
Max: $220,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

EMPLOYMENT TYPE
Full-time, onsite
DATE POSTED
August 28, 2025