
Engineering Manager - Observability

We're here to help the smartest minds on the planet build Superintelligence. The labs pushing the edge? They run on Lambda. Our gear trains and serves their models, our infrastructure scales with them, and we move fast to keep up. If you want to work on massive, world-changing AI deployments with people who love action and hard problems, we're the place to be.


If you'd like to build the world's best deep learning cloud, join us. 


*Note: This position requires presence in our San Francisco or Seattle office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.

The Lambda Observability team builds and operates large scale monitoring systems for our AI cloud product suite. We deploy observability solutions across the stack, from datacenter infrastructure to our in-house software stack. Keeping those offerings reliable and instantly detecting issues in the latest high-performance AI clusters is what makes us tick.

Along with the Platform Engineering organization, we help to build the foundations that unlock product excellence and a highly reliable experience for our customers.

Our expertise lies at the intersection of:

  • Scalable Observability Platforms: We build and operate mission-critical platforms for metrics, logs, and traces based on both open-source software and systems developed in-house.

  • AI Infrastructure Observability: We design observability solutions for large-scale AI clusters running the latest GPU, Networking, and Storage technologies.

  • Observability Practices: We engage across the company to promote best practices, help teams adopt our platforms, and enable applications that require observability data.

About the Role:

We are seeking a seasoned Observability Engineering Manager with deep experience in the development and operation of modern observability platforms. You will hire and guide a team of observability engineers in building out critical pillars of our internal observability stack. You will lead the team in building monitoring solutions for new products, and in measuring and reporting the availability of our products.

Your role is not just to manage people, but to coordinate the delivery of observability solutions to customers inside and outside Lambda. Your leadership will be pivotal in ensuring our ability to deliver a high-quality, reliable product experience.

This is a unique opportunity to work at the intersection of large-scale observability systems and the rapidly evolving field of artificial intelligence infrastructure. You will be building the systems that monitor some of the world’s most advanced AI solutions.

What You’ll Do

  • Team Leadership & Management:

    • Hire, grow, lead, and mentor a team of high-performing observability engineers and SREs.

    • Foster a culture of technical excellence, collaboration, and customer service.

    • Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members.

    • Drive outcomes by managing project priorities, deadlines, and deliverables.

  • Technical Strategy & Execution:

    • Work with the engineering team to drive strategy for Lambda internal and customer observability solutions.

    • Improve observability of AI infrastructure and develop new monitoring solutions as new products are introduced.

    • Lead the broader engineering organization in adoption of Observability and SRE practices.

    • Manage costs of both vendors and internally developed platforms.

    • Lead the team in the continued development of our existing Metrics solutions based on the Prometheus and OpenTelemetry ecosystems.

    • Lead the team in delivering new Logging and Tracing solutions based on ClickHouse.

    • Guide the team in problem identification, requirements gathering, solution ideation, and stakeholder alignment on engineering RFCs.

    • Participate in design of solutions for bringing observability data to our customers.

    • Identify gaps in our observability posture and drive resolution.

    • Lead the team in supporting internal customers from across Lambda engineering.

  • Cross-Functional Collaboration:

    • Collaborate with the infrastructure and HPC teams on infrastructure monitoring and alerting.

    • Work closely with Lambda product engineering teams on instrumentation and best-practice usage of our platforms.

    • Work to understand the needs of engineering teams and drive our Observability solutions towards self-service.

    • Manage a short list of vendors that provide SaaS solutions in the monitoring space.

You

  • Experience:

    • 10+ years of experience in observability systems or platform engineering, with at least 3 years in a management or lead role.

    • Demonstrated experience leading a team of engineers and SREs on complex, cross-functional projects in a fast-paced startup environment.

    • Significant experience in environments that require the monitoring of bare-metal infrastructure is preferred.

    • Experience with a wide variety of modern open-source observability software.

    • Strong background in software engineering and the SDLC.

    • Strong project management skills, leading planning, project execution, and delivery of team outcomes on schedule.

    • Extensive experience with site reliability engineering and ability to champion improved SRE practices.

    • Experience building a high-performance team through deliberate hiring, upskilling, performance-management, and expectation setting.

Nice to Have

  • Experience:

    • Experience driving cross-functional engineering management initiatives (coordinating events, strategic planning, coordinating large projects).

    • Experience driving organizational improvements (processes, systems, etc.).

    • Experience with Kubernetes and with designing scalable distributed systems.

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, ~400 employees (2025) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Wellness and Commuter stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Average salary estimate

$220,000 / year (est.)
Min: $180,000
Max: $260,000

Employment type: Full-time, hybrid
Date posted: August 27, 2025