
Observability Engineer

Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.

About the role

We are looking for an Observability Engineer who is deeply obsessed with Grafana, Prometheus, and modern observability practices. This role exists to ensure our systems are measurable, understandable, and debuggable at all times.

You will own the observability stack end-to-end — from instrumentation standards to dashboards, alerts, and signal quality — and work closely with infrastructure, platform, and application teams to make sure nothing important fails silently.

If you think about metrics before features, believe bad alerts are worse than no alerts, and treat Grafana dashboards as first-class products, this role is for you.

Responsibilities

Own and evolve our observability and monitoring platform, with Grafana and Prometheus at its core
Design, build, and maintain high-quality metrics pipelines using Prometheus and related tooling
Create clear, actionable Grafana dashboards that tell a story — not just charts
Define and maintain alerts that are meaningful, actionable, and low-noise
Establish and enforce observability standards across services (metrics, logs, traces)
Partner with engineering teams to instrument applications correctly
Lead improvements to alerting strategies, SLOs, and SLIs
Support incident response by helping teams quickly understand what broke and why
Continuously evaluate and improve signal quality, cardinality, and cost
Identify observability gaps and eliminate blind spots before they become outages

You Are Obsessed With:

Grafana dashboards that instantly explain system health
Prometheus metrics that are intentionally designed, not accidental
Alerts that wake people up only when action is required
Monitoring that scales with system complexity
Observability as a product, not an afterthought

Required Experience

Strong hands-on experience with Grafana and Prometheus
Deep understanding of metrics-based observability
Experience designing monitoring and alerting systems at scale
Strong knowledge of alerting best practices (burn rates, SLO-based alerts, noise reduction)
Experience working with distributed systems and cloud or Kubernetes environments
Ability to reason about system behavior using telemetry
Comfortable working across teams to improve instrumentation and visibility

Preferred Experience

Experience with OpenTelemetry
Familiarity with logs and traces (Loki, Tempo, Jaeger, etc.)
Kubernetes observability experience
Experience operating observability systems in high-scale or production-critical environments
Infrastructure-as-Code experience (Terraform, Helm, etc.)

What We Bring

Mission-driven company
Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Life and Voluntary Supplemental Insurance
Short-Term Disability Insurance
Flexible Spending Account
401(k)
Flexible PTO
Paid Holidays
Parental Leave
Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at Exascale. Be prepared to be pushed daily, to learn a lot, and literally build the future.

TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.


Average salary estimate

$160,000 per year (est.)

Min: $140,000
Max: $180,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.
