Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU.
If you'd like to build the world's best deep learning cloud, join us.
Travel: 50%. Travel to various data center sites is required.
What You’ll Do:
We are seeking an accomplished Advanced Cooling Facilities Manager specializing in Direct Liquid Cooling (DLC) systems to lead the global strategy, implementation, and operational excellence of Lambda’s next-generation liquid cooling infrastructure. This role will define methodologies and standards for the deployment, optimization, and scaling of cooling systems that enable Lambda’s GPU Cloud to deliver industry-leading performance for AI and machine learning workloads.
With deep domain expertise in liquid cooling technologies and critical facilities management, you will drive the design and operation of complex cooling ecosystems, including Coolant Distribution Units (CDUs), hybrid loop architectures, and advanced heat-rejection systems, across colocation and owned data center environments. You will work cross-functionally with internal and external experts to establish best practices, evaluate emerging technologies, and ensure that Lambda’s cooling infrastructure scales reliably and efficiently to support extreme rack densities.
Key Responsibilities:
Liquid Cooling Systems Strategy & Management
CDU Operations & Optimization: Define and oversee operational standards and lifecycle management for all CDU systems (L2L and L2A), including performance optimization, reliability engineering, and capacity expansion strategies. Utilize advanced analytics to identify trends and implement predictive maintenance practices.
Technical Loop Governance: Lead the design and management of multi-stage cooling loops — from facility to rack level — ensuring precise control of temperature, pressure, and flow rate across variable load conditions. Establish system performance benchmarks and quality assurance protocols for coolant integrity and flow balancing.
System Integration Leadership: Coordinate and validate integration of CDUs with facility water systems (FWS), heat exchangers, and mechanical infrastructure. Develop standardized control sequences and commissioning procedures across multiple OEM platforms.
Performance Engineering & Monitoring: Architect the monitoring framework for coolant system telemetry (pressure, differential pressure, temperature, flow, and conductivity) and leverage analytics for continuous improvement in thermal performance, redundancy, and energy efficiency.
Predictive & Preventive Maintenance: Design and institutionalize maintenance methodologies, including condition-based maintenance schedules, failure-mode analysis, and reliability improvement plans for pumps, heat exchangers, and filtration systems.
Infrastructure Planning & Scaling
Capacity Planning & Design Leadership: Evaluate and forecast thermal capacity requirements for high-density GPU clusters, driving design and procurement of CDUs and loop systems to support rack densities exceeding 1 MW. Develop multi-year cooling capacity roadmaps aligned with corporate growth strategies.
Engineering Collaboration: Partner with data center design and mechanical engineering teams to co-develop cooling topologies, redundancy strategies, and modular infrastructure designs optimized for scalability and efficiency.
Vendor & Technology Strategy: Act as the primary technical authority for liquid cooling vendor engagement — influencing product roadmaps, negotiating technical specifications, and qualifying emerging solutions such as direct-to-chip and immersion cooling.
Innovation & Continuous Improvement: Evaluate and pilot next-generation cooling technologies and automation platforms to reduce PUE, enhance reliability, and support sustainability objectives.
Cost & Efficiency Optimization: Establish performance metrics for cooling energy efficiency, uptime, and total cost of ownership. Drive initiatives to reduce CapEx/OpEx through standardization, component reuse, and intelligent control strategies.
Operations & Reliability
Mission-Critical Operations: Oversee global operation of liquid cooling infrastructure with near-zero downtime objectives. Define escalation protocols, lead root-cause analysis for thermal incidents, and ensure resilience through redundancy and proactive risk management.
Incident Command & Response: Act as the senior technical lead for major cooling incidents, coordinating cross-functional response teams and developing long-term corrective action plans.
Documentation & Knowledge Management: Establish robust documentation standards, including P&IDs, SOPs, commissioning reports, and change logs, to ensure operational continuity and technical traceability.
Regulatory & Environmental Compliance: Ensure adherence to all applicable codes, environmental standards, and safety protocols. Champion safe handling practices for coolants and system fluids.
Team Leadership & Development: Mentor and develop specialized liquid cooling technicians and engineers, building a culture of technical excellence, safety, and continuous improvement across all facilities.
Colocation & Multi-Site Management
Global Coordination: Lead liquid cooling deployment and operational programs across colocation and owned facilities worldwide, ensuring alignment with Lambda’s technical standards and SLAs.
Standardization & Governance: Define and enforce standardized cooling system configurations, control sequences, and operating parameters across all sites to ensure uniform performance and maintainability.
Remote Monitoring & Analytics: Deploy and manage advanced remote monitoring and control systems (DCIM/BMS integrations) for multi-site visibility, predictive analytics, and fault detection.
Scalability & Future Growth: Architect the global cooling expansion framework to support rapid scaling of Lambda’s GPU cloud services, integrating modular and prefabricated cooling components for deployment speed and flexibility.
Ideal Candidate Profile:
Deep technical mastery of liquid cooling systems and their application in mission-critical environments.
Proven track record of architecting, deploying, and operating cooling infrastructure supporting multi-MW high-density computing environments.
Strategic thinker capable of aligning cooling design and operations with company-wide performance, reliability, and sustainability goals.
Adept at leading multidisciplinary teams and influencing technical direction across mechanical, electrical, and network domains.
Operates with minimal oversight and consistently delivers innovative solutions in complex, ambiguous environments.
Strong communicator and collaborator with the ability to influence senior stakeholders, vendors, and partners.
Committed to continuous learning, advancing sustainability, and driving operational excellence in next-generation data center design.
Required Qualifications:
Education & Certifications
Bachelor’s degree in Mechanical, Electrical, or Thermal Engineering (Master’s preferred).
Professional certifications such as DCCA, CompTIA Server+, or liquid cooling manufacturer certifications are strongly preferred.
Experience Requirements
10+ years of experience in data center or mission-critical facility operations.
7+ years managing advanced liquid cooling systems (CDUs, L2L/L2A loops, heat exchangers).
5+ years supporting GPU/AI infrastructure or high-density compute workloads (>300 W per rack).
3+ years managing technical teams in distributed, multi-site environments.
Proven success leading system design reviews, technology evaluations, and vendor negotiations.
Technical Expertise
Liquid Cooling Systems: Expert knowledge of CDU operation, coolant distribution, manifolds, and control systems.
Thermal Management: Deep understanding of thermodynamics, heat transfer modeling, and system efficiency optimization.
Critical Infrastructure: Comprehensive knowledge of UPS, emergency power, fire suppression, and mechanical systems integration.
Monitoring & Controls: Advanced proficiency with DCIM/BMS systems and real-time telemetry analytics.
Mechanical Systems: Expertise in pumps, chillers, cooling towers, and hybrid HVAC configurations.
Core Competencies
Strategic and analytical mindset for resolving complex thermal and operational challenges.
Exceptional project leadership and cross-functional coordination skills.
Demonstrated financial acumen in CapEx/OpEx optimization and vendor negotiation.
Strong communication and presentation abilities for executive and technical audiences.
Decisive leadership under pressure with robust incident response capability.
Passion for innovation, sustainability, and advancing high-efficiency data center cooling technologies.
Preferred Qualifications:
Advanced Degree: Master’s in Mechanical or Thermal Engineering.
AI/ML Infrastructure: Experience designing or supporting large-scale GPU clusters and AI cooling ecosystems.
Industry Experience: Background in hyperscale, HPC, or advanced colocation environments.
Automation: Experience with AI-driven control systems and thermal optimization algorithms.
Sustainability: Demonstrated success implementing energy-efficient and water-conservation cooling strategies.
Salary Range Information
The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
About Lambda
Founded in 2012, ~400 employees (2025) and growing fast
We offer generous cash & equity compensation
Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.
We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
Health, dental, and vision coverage for you and your dependents
Wellness and Commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible Paid Time Off Plan that we all actually use
A Final Note:
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.
Equal Opportunity Employer
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
Lambda provides Artificial Intelligence and Machine Learning infrastructure to companies like Apple, Intel, Microsoft, MIT, Harvard, the Federal Government, and the DOD. We're headquartered in the Dogpatch and are a short walk from the 22nd Street ...