Lead Network Engineer, Operations & Reliability

About the Role

Fluidstack is seeking a Lead Network Engineer to head our Network Operations & Reliability pillar. You'll build our network operations function from the ground up while staying hands-on with incident response, reliability engineering, and operational tooling. We are looking for someone who is hungry for the autonomy of building a team and the processes that keep our AI datacenter fabrics running with exceptional reliability at scale.

This role demands deep technical expertise in network operations combined with the vision to build scalable operational systems. You'll establish Tier 2+ incident response capabilities, build observability and automation frameworks, develop runbooks that enable operational excellence, and partner with a centralized NOC (Network Operations Center) on Tier 1 monitoring and triage. Success means creating an operations organization that maintains high availability across distributed datacenter fabrics while scaling to support concurrent multi-site deployments.

Focus

  • Operations Architecture: Define and build the operational model for network reliability at scale. Establish incident response workflows, escalation procedures, runbook frameworks, and operational handoff criteria. Design the systems and processes that enable 24/7 operations across distributed datacenter regions.

  • Incident Response & Reliability: Own Tier 2+ incident management for network infrastructure. Lead response to critical incidents, perform root cause analysis, drive permanent fixes, and build the reliability engineering practices that prevent recurrence. Partner with NOC on Tier 1 triage and escalation workflows.

  • Observability & Monitoring: Build comprehensive observability for network infrastructure including monitoring stack integration, alerting frameworks, telemetry collection, and performance analytics. Ensure operators have visibility into fabric health, traffic patterns, and failure conditions across all network layers.

  • Runbook Development: Author and maintain operational runbooks for common failure scenarios, maintenance procedures, and troubleshooting workflows. Build the knowledge base that enables NOC (Tier 1) and regional operations engineers to respond effectively to incidents.

  • Automation & Tooling: Drive operational automation initiatives including auto-remediation, failure classification, and runtime tooling. Partner with Network Automation Engineers on design-time automation while owning runtime operational tooling that improves MTTR and operational efficiency.

  • Cross-Functional Partnership: Collaborate with Deployment teams on production handover criteria, Engineering Core on design feedback from operational experience, Hardware teams on break-fix coordination, and NOC on escalation procedures. Build strong relationships that enable seamless coordination during incidents.

  • Team Building: Assist in hiring and development of regional operations engineers who will serve as datacenter campus leads for Tier 2 escalations and break-fix coordination. Establish onboarding programs, mentorship frameworks, and career development paths. Build an operations culture focused on reliability, accountability, and continuous improvement.

About You

  • Proven Operations Leadership: 7+ years in network engineering with significant focus on network operations, reliability engineering, or NOC/SOC leadership. You've built operational processes from scratch or significantly scaled existing operations. You understand what it takes to maintain high availability at scale.

  • Deep Technical Operations Expertise: Strong hands-on experience operating large-scale datacenter networks including EVPN/VXLAN, BGP, Clos architectures, and high-radix switching. You've responded to production incidents, debugged complex network failures, and driven root cause analysis to permanent fixes.

  • Reliability Engineering Mindset: You think in terms of MTTR, MTTD, and failure domains. You've built monitoring and alerting systems, developed runbooks, and implemented automation that improves operational efficiency. You understand the balance between manual intervention and automated remediation.

  • Incident Command Experience: You've led response to critical incidents involving multiple teams and stakeholders. You remain calm under pressure, communicate clearly during outages, and drive incidents to resolution while coordinating complex troubleshooting across teams.

  • Team Building & Leadership: You're hungry to build and lead a high-performing operations team. You've hired, mentored, and developed engineers before. You know how to establish operational standards, build on-call rotations, and create a culture of operational excellence. You see team building as core to the mission.

  • Process & Systems Thinking: You build repeatable processes that scale beyond yourself. You document as you go, identify operational gaps proactively, and continuously improve workflows. You understand how to balance operational rigor with startup speed.

Nice to Haves

  • AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking. You understand the operational precision required when network performance directly impacts workload completion.

  • Hyperscale Operations Background: Experience in network operations at hyperscale companies (Meta, Google, Microsoft, AWS) or large cloud providers. You've seen mature operational practices at scale and can adapt those lessons to a fast-growing startup.

  • NOC/SOC Leadership: Experience building or leading Network Operations Centers, including Tier 1/Tier 2/Tier 3 escalation models, shift scheduling, and on-call rotation management. You understand how to structure operations teams for 24/7 coverage.

  • Observability Stack Expertise: Deep familiarity with network monitoring and observability platforms (Prometheus, Grafana, ELK, Datadog, or similar). Experience designing telemetry collection, building dashboards, and tuning alerting to reduce noise.

  • Automation & Scripting: Comfortable with scripting languages (Python, Go) and automation frameworks (Ansible, Terraform). You can build operational tooling yourself or partner effectively with automation engineers to deliver runtime automation.

  • SRE Principles: Exposure to Site Reliability Engineering practices including SLO/SLI definition, error budgets, post-incident reviews, and operational readiness reviews. You understand how to apply SRE principles to network operations.

Salary & Benefits

  • Competitive total compensation package (salary + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

The base salary range for this position is $200,000 - $300,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Employment type: Full-time, hybrid
Date posted: November 23, 2025