Browse 184 exciting Prometheus jobs hiring now. Companies hiring include Elsevier, Anyscale, and GenBio AI, with roles in Huntington Beach, San Diego, and Buffalo.
Experienced SRE leader needed to manage multiple teams and advance cloud reliability, automation, observability, and security for LexisNexis Risk Solutions.
Help ensure production-grade reliability for Anyscale's distributed ML platform by building test automation, simulation, and observability tooling for Ray-based workloads.
Lead and shape the infrastructure powering GenBio AI's large biological models, focusing on Kubernetes GPU orchestration, MLOps pipelines, security, and cross-team operational excellence.
Work remotely on the team that operates and stabilizes detection content releases—managing deployments, runtime telemetry, first-level triage, and release communications for CrowdStrike's detection platform.
TheLoops is hiring a Senior Backend Software Engineer to build high-performance Java/Kafka-based distributed systems that power its enterprise AI Agent platform.
Nordstrom seeks an entry-level Platform Engineer to support and improve its Kubernetes compute platform using Terraform, GitOps, and modern observability and CI/CD practices.
Lead the design and delivery of scalable Python/Kafka data pipelines and orchestration patterns to integrate claims platforms into Brillio's Pisces hub.
Lead the design and automation of Linea's cloud-native infrastructure as a Senior DevOps Engineer at Consensys, focusing on AWS, Kubernetes, Terraform, and observability to support a fast-moving Layer-2 blockchain.
Boeing is hiring a Cloud Engineer to implement and operate CI/CD, IaC, secrets management, and observability tooling for AWS applications in a regulated, hybrid environment.
Onebrief is hiring a Senior Site Reliability Engineer to own reliability, observability, and secure operations for on-prem and cloud military deployments in Colorado Springs.
Lead Visa's Site Reliability Engineering efforts to deliver highly available, secure, cloud-native application platforms while driving automation and operational excellence.
Sierra is hiring a seasoned Site Reliability Engineer to own observability, scalability, and secure cloud infrastructure for its AI platform in San Francisco.
Lead the design and operation of Kubernetes-backed CI/CD and DevSecOps tooling for Boeing's E-7A program in Tukwila, ensuring secure, production-ready development environments and pipelines.
Lead GFiber's Network Reliability Engineering organization to define reliability strategy, run tier-2 incident response, and drive observability and automation across metro networks.
Canary seeks an experienced Lead Site Reliability Engineer to drive incident response, SLO frameworks, and platform reliability across its remote engineering organization.
ServiceNow is hiring a Senior Software Engineer - UI to design and deliver AI-powered observability user experiences, primarily focusing on frontend engineering in a remote role.
NetBox Labs is hiring a Senior DevOps Engineer to own infrastructure automation, CI/CD, and observability for their SaaS and self-managed products in a fast-paced, product-focused environment.
Senior-level SRE role focused on automating infrastructure and security controls, maintaining observability and SLOs, and improving reliability across Sonar’s global platform.
Lead the design and implementation of large-scale observability systems for GPU-powered AI and HPC workloads at NVIDIA's MARS team, enabling telemetry, analytics, and intelligent monitoring across world-class GPU infrastructure.
Lead development of secure, enterprise-grade developer tooling and integrations at Coder, focusing on Go-based distributed systems, IDE integrations, and AI-enabled developer experiences.
Cartesia is hiring a Cluster Infrastructure Engineer in San Francisco to build and operate large-scale GPU clusters and automation that power state-of-the-art multimodal model training and inference.
Provide Linux systems engineering and device-management expertise to maintain and enhance remote in-store digital menu board platforms.
Senior DevOps Engineer needed to design and operate cloud‑native, Kubernetes‑based infrastructure and CI/CD pipelines for NBCUniversal's local media and broadcast workflows.
Cape is hiring a Site Reliability Engineer to build and operate privacy-focused telecommunications infrastructure, improve system reliability and monitoring, and own FedRAMP accreditation for a fast-growing, mission-driven startup.
Exegy’s Managed Services Engineering team is looking for a hands-on DevOps Engineer to build automation, CI/CD, and observability for high-performance market data systems.
Senior DevOps Engineer role supporting AKS-based production systems, CI/CD automation, cost optimization, and 24x7 incident response for a mature marine transportation company in New Orleans.
Work as a DevOps Engineer supporting cloud, on‑prem, and containerized platforms to automate CI/CD, optimize platform performance, and improve operational reliability in a remote-first US role.
As a Senior Site Reliability Engineer for a high-growth platform, you will design and operate large-scale AWS infrastructure, build automation and observability, and partner with engineering teams to improve reliability and deployment velocity.
Poolside seeks an experienced Solutions Architect to help enterprise customers deploy and operate its scalable AI platform across cloud and hybrid infrastructures.
Senior Site Reliability Engineer needed to own large-scale AWS infrastructure, automate CI/CD and observability, and drive platform reliability for a high-growth, remote-friendly US company.
ServiceNow is hiring an AI-native Staff/Senior Staff Product Manager to lead the design and delivery of predictive, compliance-aware network observability and autonomous remediation for hyperscaler and sovereign cloud deployments.
An experienced Cloud Infrastructure Engineer is needed to architect, automate, and operate Kubernetes-based cloud platforms for a large-scale enterprise in a fully remote US role.
Lead the development of an AI-native, compliance-aware network observability platform that predicts degradations, automates compliance validation, and orchestrates autonomous remediation across hyperscaler and sovereign clouds.
Experienced backend engineer needed to design and operate high-scale Java/Spring microservices and event-driven systems powering VGS’s credential management and payment tokenization platforms.
Lead the architecture and engineering of next-generation LLM-driven agentic workflows for enterprise observability within ServiceNow's Global Cloud Services team.
SailPoint seeks a Software Engineer II to design, implement, and operate scalable microservices that ensure identity and account integrity across its Identity Security Cloud.
Experienced Site Reliability Engineer II needed to lead production reliability, observability, and automated cloud operations for a healthcare data platform.
Contribute to the VEN agent as an Engineering Intern focused on AI supportability, building diagnostics, observability, and system services that help ensure reliable AI-driven security in production.
Senior-level monitoring analyst to architect and operate observability and log-management solutions (Splunk and related tools) that keep a high-volume global payments platform running 24x7.
Experienced HPC Support Engineer needed to troubleshoot GPU/HPC clusters, mentor peers, and deliver high-quality customer support for Lambda’s deep learning cloud.
Voltage Park is hiring an Infrastructure Operations Engineer to operate and scale distributed bare-metal GPU infrastructure and platform services for enterprise AI workloads.
Experienced backend engineer sought to help design, operate, and scale Grafana Labs' telemetry databases (Loki, Mimir, Tempo) for Grafana Cloud in a remote-first, open-source environment.
Work remotely with a U.S. client as a DevOps Engineer focused on cloud infrastructure, CI/CD pipelines, container orchestration, and system reliability.
Lead the reliability, performance, and automation of Visa's mission-critical databases across PostgreSQL, Oracle, and MySQL to ensure high availability and a great developer experience.
Experienced backend engineer wanted to design and operate high-scale, reliable services for a leading email security platform in a remote US role.
ServiceNow is hiring a Senior Network Operations Engineer to operate and automate global Layer 4–7 networks (F5/NGINX) and support high-availability production environments for US Public Sector customers.
Fliff is hiring a Go Backend Engineer II to help deliver scalable, low-latency backend services powering its social sports gaming platform.
FIS is seeking a Cloud DevOps/Network Engineer Specialist to automate, secure, and manage AWS infrastructure and network services that support internal teams and client solutions.
Camunda is hiring a Senior Cloud Infrastructure Engineer to architect and operate its Kubernetes-based multi-cloud platform and help drive reliability, observability, and automation for a global production environment.
FreeWheel is hiring a Senior GoLang Engineer to design and deliver scalable, low-latency adtech systems and lead engineering best practices at our Chicago office (onsite 4x/week).
Below 50k*: 0 | 50k-100k*: 7 | Over 100k*: 184