Data Center Systems Operations Engineer

We're here to help the smartest minds on the planet build Superintelligence. The labs pushing the edge? They run on Lambda. Our gear trains and serves their models, our infrastructure scales with them, and we move fast to keep up. If you want to work on massive, world-changing AI deployments with people who love action and hard problems, we're the place to be.


If you'd like to build the world's best deep learning cloud, join us. 

*Note: This position prefers presence in our Bay Area office locations, but is open to remote presence for the right candidate.

About the Job

As Lambda continues to scale its AI platform and customer base, infrastructure decisions must be tightly aligned with product roadmaps, platform growth, and fiscal discipline. The Systems Operations Engineer will own availability analysis, long-term improvement of utilization, input into strategic design, and implementation of key programs across the entire Infrastructure Stack.

This role sits within the Data Center Infrastructure (DC Infra) team and will work cross-functionally with Product, Platform Engineering, and Observability to understand overall health, analyze ongoing and potential issues, recommend and make changes to our overall design, and own key programs that improve the overall business.

This position is a critical link between the HPC/HW systems and DC Infra, and will help ensure our designs and operations maximize availability and reliability across our entire Platform.

What You’ll Do

Availability Analysis

  • Own end-to-end unification of availability (number of 9s) calculations across Lambda's data center products and various data center footprints, from power/BMS/cooling down to the rack/GPU level, and provide adequate telemetry back to facilities, site operations, and the platform level

  • Work with the thermal/hardware team to understand the effect of AI workload characteristics on mechanical systems and the need for different BMS control methodologies as Direct-to-Chip Liquid Cooling (DLC) technologies improve and densities increase

  • Coordinate across DC Infra team to calculate estimated availabilities for new data center designs

  • Work with product teams and capacity forecasting to understand how design decisions affecting availability impact time to market and satisfy customer needs
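
As an illustration of the "number of 9s" arithmetic referenced above, here is a minimal Python sketch. It assumes the textbook reliability model (components in series multiply availabilities; redundant components fail only if all members fail, with independent failures). The function names and every figure are hypothetical and are not taken from Lambda's designs.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def series(*availabilities):
    """Every component must be up, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Redundant group: down only if every member is down.

    Assumes failures are independent, which real systems only approximate.
    """
    unavailability = 1.0
    for a in availabilities:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


def nines(availability):
    """Number of 9s; 0.999 gives approximately 3."""
    if availability >= 1.0:
        return math.inf
    return -math.log10(1.0 - availability)


def downtime_minutes_per_year(availability):
    """Expected downtime implied by a steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical figures, for illustration only.
power = parallel(0.999, 0.999)  # two redundant feeds
cooling = 0.9995
rack = 0.9999
site = series(power, cooling, rack)

print(f"site availability: {site:.6f}")
print(f"nines: {nines(site):.2f}")
print(f"downtime: {downtime_minutes_per_year(site):.0f} min/year")
```

In this model the least available element in a series chain dominates the result, which is one reason to calculate availability consistently from the facility level down to the rack.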

Utilization Analysis and Oversubscription Strategy

  • Own end-to-end utilization analysis across Lambda's entire data center infrastructure

  • Analyze DC designs to understand peak possible capacity under varying conditions

  • Build oversubscription strategy and lead the company workstream to maximize available MW without impacting GPU reliability and customer experience

  • Ensure appropriate availability considerations are included
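
To make the oversubscription idea concrete, here is a rough Python sketch. It assumes one common framing: racks are sized by an expected peak draw rather than by nameplate power, with a safety headroom held back. The function names, the headroom parameter, and all numbers are hypothetical and do not describe Lambda's actual method.

```python
def max_racks(capacity_kw, expected_peak_kw_per_rack, headroom=0.1):
    """Racks that fit when sizing by expected peak draw.

    `headroom` is the fraction of capacity held in reserve (an assumption).
    """
    usable_kw = capacity_kw * (1.0 - headroom)
    return int(usable_kw // expected_peak_kw_per_rack)


def oversubscription_ratio(nameplate_kw_per_rack, rack_count, capacity_kw):
    """Total nameplate power divided by provisioned capacity.

    A ratio above 1.0 means the site is oversubscribed on paper.
    """
    return nameplate_kw_per_rack * rack_count / capacity_kw


# Hypothetical figures, for illustration only.
capacity_kw = 1000.0     # provisioned IT power for one block
nameplate_kw = 50.0      # per-rack nameplate draw
observed_peak_kw = 40.0  # per-rack peak seen in telemetry

racks = max_racks(capacity_kw, observed_peak_kw, headroom=0.1)
ratio = oversubscription_ratio(nameplate_kw, racks, capacity_kw)
print(f"racks: {racks}, oversubscription ratio: {ratio:.2f}")
```

Sizing by observed peak only holds if the racks do not all reach nameplate draw at once, which is why the power capping work described under "Power Capping Strategy and Implementation" acts as the safety net.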

Observability and Analytics

  • Coordinate with the observability team to ensure the appropriate points are monitored to understand data center load characteristics, especially under AI workloads

  • Help the team understand where approximate warning/danger levels are

  • Use observations and warning/danger levels to inform the Basis of Design (BOD) for future data centers and suggest upgrades/modifications to current data centers

  • Develop strategy for a data center fleet health dashboard

  • Help provide structure so that overall day-to-day and long-term health can be understood from a 20,000-foot level, with the ability to drill down into the details

Power Capping Strategy and Implementation

  • Coordinate with Site Operations team to strategize and build out power capping capabilities, related to worst-case scenario response/protection as we start aggressively employing oversubscription

  • Identify appropriate IT blocks where real-time data is monitored

  • Analyze, propose, and implement a rigorous testing process that iteratively finds and eliminates stranded power and cooling capacity related to utilization

Site Selection Technical Review

  • Conduct end-to-end technical evaluations of prospective data center sites, including power sufficiency and stability, cooling infrastructure and mechanical systems, and network topology feasibility

  • Perform risk assessments and recommend sites based on infrastructure fit and growth capacity.

  • Coordinate with DC Infra, Legal, and Business Strategy teams to ensure site selections align with workload and deployment timelines.

Cluster-to-Facility Requirements Alignment

  • Collaborate with HPC Architecture team and Capacity Manager to translate cluster-level hardware and workload requirements into facility-level specifications.

  • Define infrastructure interface requirements (power, cooling, rack layouts, interconnects, monitoring) to ensure alignment between compute stack and facility capabilities.

  • Support long-term infrastructure roadmap development to accommodate future hardware designs, density shifts, and workload patterns.

  • Work with Capacity Manager to understand various levers that can be employed to accelerate growth during demand surges.

You

  • Self-starter with a proven ability to independently dive into the details to understand and solve hard problems across data center infrastructure and operations

  • Ability to provide world-class analysis, boiling complex issues down to the root cause or a few key drivers

  • 10+ years of experience working directly in or closely with data center infrastructure and HPC/HW operations

  • Deep familiarity with AI or compute workload patterns, scaling dynamics, and infrastructure cost drivers

  • Ability to synthesize complex technical and business inputs into clear, actionable strategic recommendations

  • Excellent communication and collaboration skills across technical, operational, and financial stakeholders

Preferred Experience

  • Prior experience in hyperscale or cloud infrastructure environments

  • Familiarity with GPU cluster sizing, workload forecasting, or energy-efficient compute architectures

  • Working knowledge of typical Data Center Infrastructure designs, topologies, systems and associated reliability/availability calculations

  • Knowledge of DCIM tools, telemetry systems, or utilization analytics platforms

  • Engineering degree from a university; Master's preferred

  • Experience working across multi-disciplinary and non-technical teams to explain findings

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, ~400 employees (2025) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Wellness and Commuter stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Average salary estimate

$200,000 per year (est.); range $160,000 to $240,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

Lambda provides Artificial Intelligence and Machine Learning infrastructure to companies like Apple, Intel, Microsoft, MIT, Harvard, the Federal Government, and the DOD. We're headquartered in the Dogpatch and are a short walk from the 22nd Street ...

Employment type: Full-time, hybrid
Date posted: September 1, 2025