Infrastructure Engineer

About FAR.AI

FAR.AI is a non-profit AI research institute dedicated to ensuring advanced AI is safe and beneficial for everyone. Our mission is to facilitate breakthrough AI safety research, advance global understanding of AI risks and solutions, and foster a coordinated global response.

Since our founding in July 2022, we've grown quickly to 30+ staff, produced over 40 influential academic papers, and established the leading AI safety events for research and international cooperation. Our work is recognized globally, with publications at premier venues such as NeurIPS, ICML, and ICLR, and coverage in the Financial Times, Nature News, and MIT Technology Review.

We drive practical change through red-teaming with frontier model developers and government institutes. Additionally, we help steer and grow the AI safety field through developing research roadmaps with renowned researchers such as Yoshua Bengio, running FAR.Labs, an AI safety-focused co-working space in Berkeley housing 40 members, and supporting the community through targeted grants to technical researchers.

About FAR.Research

Our research team likes to move fast. We explore promising research directions in AI safety and scale up only those showing a high potential for impact. Unlike other AI safety labs that take a bet on a single research direction, FAR.AI aims to pursue a diverse portfolio of projects.

Our current focus areas include:

We also put our research into practice through red-teaming engagements with frontier AI developers, and collaborations with government institutes.

About the Role

We’re seeking an Infrastructure Engineer to develop and manage scalable infrastructure for our research workloads. You will own our existing Kubernetes cluster, deployed on bare-metal H100 cloud instances, and oversee and enhance it to support 1) new workloads, such as multi-node LoRA training; 2) new users, as we double the size of our research team over the next twelve to eighteen months; and 3) new features, such as fine-grained tracking of compute usage per experiment.
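As an illustration of what fine-grained experiment compute usage tracking could involve, the sketch below sums GPU-hours per experiment from a list of finished pods. It is a minimal sketch under stated assumptions: the `PodUsage` record, its fields, and the idea that every pod carries an experiment label are hypothetical and do not describe FAR.AI's actual cluster or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PodUsage:
    """One finished pod (hypothetical schema, e.g. exported from cluster metrics)."""
    experiment: str  # hypothetical experiment label attached to the pod
    gpus: int        # number of GPUs the pod requested
    start: datetime
    end: datetime


def gpu_hours_by_experiment(pods: list[PodUsage]) -> dict[str, float]:
    """Sum GPU-hours (GPUs requested x wall-clock hours) per experiment label."""
    totals: dict[str, float] = defaultdict(float)
    for pod in pods:
        hours = (pod.end - pod.start).total_seconds() / 3600
        totals[pod.experiment] += pod.gpus * hours
    return dict(totals)


if __name__ == "__main__":
    pods = [
        PodUsage("lora-sweep", 8, datetime(2025, 1, 1, 0, 0), datetime(2025, 1, 1, 2, 30)),
        PodUsage("lora-sweep", 8, datetime(2025, 1, 1, 3, 0), datetime(2025, 1, 1, 4, 0)),
        PodUsage("eval", 1, datetime(2025, 1, 1, 0, 0), datetime(2025, 1, 1, 6, 0)),
    ]
    # lora-sweep: 8 x 2.5 h + 8 x 1 h = 28 GPU-hours; eval: 1 x 6 h = 6 GPU-hours
    print(gpu_hours_by_experiment(pods))
```

A production version would more likely read these records from the Kubernetes API or a metrics store than hard-code them; the sketch only shows the accounting step.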

You will be the point-person for cluster-related work. You will work on the Foundations team alongside experienced engineers, including those who built and designed the cluster, who can provide guidance and backup. However, as our first dedicated infrastructure hire, you will need to work autonomously, design solutions to varied and complex problems, and communicate with researchers who are technically skilled but less knowledgeable about our cluster and infrastructure.

This is an opportunity to build the technical foundations of the largest independent AI safety research institute, with one of the most varied research agendas. You will be working directly with both the Foundations team and researchers across the organization to enable bleeding-edge research workloads across our research portfolio.

Responsibilities

Build and Maintain

You will deliver a scalable, easy-to-use compute cluster to support impactful research by:

  • Empowering the research team to solve their own day-to-day compute problems, such as debugging simple issues and streamlining recurring tasks (e.g. running batch experiments or launching an interactive devbox).

  • Maintaining and developing in-cluster services, such as backups, experiment tracking, and our in-house LLM-based cluster support bot.

  • Maintaining adequate cluster stability to avoid interfering with research workloads (currently >95% uptime outside of planned maintenance windows).

  • Maintaining situational awareness of the cloud GPU market and assisting leadership with vendor comparisons to ensure we are using the most effective compute platforms.
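To put the >95% uptime figure in concrete terms, the helper below converts an uptime target into an unplanned-downtime budget, excluding planned maintenance from the measured window. The function name and the 30-day example period are illustrative assumptions, not an official FAR.AI SLO definition.

```python
def unplanned_downtime_budget_hours(target_uptime: float, period_hours: float,
                                    planned_maintenance_hours: float = 0.0) -> float:
    """Hours of unplanned downtime allowed in a period while still meeting target_uptime.

    Planned maintenance is removed from the measured window, matching the
    "outside of planned maintenance windows" wording.
    """
    if not 0.0 <= target_uptime <= 1.0:
        raise ValueError("target_uptime must be a fraction between 0 and 1")
    if not 0.0 <= planned_maintenance_hours <= period_hours:
        raise ValueError("planned maintenance must fit inside the period")
    measured_hours = period_hours - planned_maintenance_hours
    return (1.0 - target_uptime) * measured_hours


if __name__ == "__main__":
    # A 30-day month is 720 hours; round to hide floating-point noise.
    print(round(unplanned_downtime_budget_hours(0.95, 30 * 24), 2))
```

For a 30-day month (720 hours) with no planned maintenance, a 95% target leaves 0.05 × 720 = 36 hours of unplanned downtime; scheduling 8 hours of planned maintenance shrinks the measured window to 712 hours and the budget to 35.6 hours.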

Support Security

We often collaborate with partners with stringent security requirements (e.g. governments, frontier developers) and handle sensitive information (e.g. non-public exploits, CBRN datasets). You will implement security measures towards:

  • Securing the cluster against insider threats (architecting it to have adequate isolation to provide data confidentiality and integrity for sensitive workloads) and external threats (by minimizing the attack surface and ensuring security updates are promptly installed).

  • Making secure workflows the default, e.g. streamlining the deployment of internal web dashboards behind an OAuth reverse proxy.

  • Championing security across the FAR.AI team, including maintaining and extending our mobile device management (MDM) system.

Bleeding-edge Workloads

You will work with the Foundations team and specific research teams to support novel ML workloads (e.g. fine-tuning a new open-weight model release) by:

  • Architecting our Kubernetes cluster to flexibly support novel workloads.

  • Assisting projects with bespoke requirements, designing and implementing effective infrastructure solutions, and sharing your infrastructure wisdom with ML researchers.

  • Improving observability over cluster resources and GPU utilization to allow us to rapidly diagnose and work around hardware issues or software bugs that may only arise on novel workloads.

About You

It is essential that you

  • Have Kubernetes or other system administration experience.

  • Have a curiosity and willingness to rapidly learn the needs of a new space.

  • Are self-directed and comfortable with ambiguous or rapidly evolving requirements.

  • Are willing to be on-call during waking hours for cluster issues ahead of major deadlines (for a few weeks a quarter).

  • Are interested in improving our security posture through identifying, implementing and administering security policies.

It is preferable that you

  • Have experience supporting ML/AI workloads.

  • Have previously worked in research environments or startups.

  • Are experienced in administering compute or GPU clusters.

  • Are able to adopt a security mindset.

  • Are willing to be part of an eventual on-call rotation, if required.

Example Projects

  • Configure the cluster and user-space development environments to support InfiniBand nodes for high-performance multi-node training.

  • Improve our default devbox K8s pod template to incorporate best-practice workflows for our researchers.

  • Roll out a new mobile device management system to ensure corporate devices meet our security requirements.

  • Streamline onboarding to the cluster for new starters (possibly in different timezones), and candidates on time-limited work trials.

  • Be “holder of the keys”, managing permissions and access control for FAR.AI’s team members to technical systems, including streamlining/automating (e.g. via SAML, SCIM) where appropriate.

  • Analyze storage patterns and propose infrastructure improvements for backups, disaster recovery, and usability.
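For the devbox pod template project, a default template might start as small as the manifest generated below. This is a hypothetical sketch: the image, labels, and `sleep infinity` keep-alive command are placeholders rather than FAR.AI's real template; only the standard Pod fields and the `nvidia.com/gpu` resource name (exposed by NVIDIA's Kubernetes device plugin) follow Kubernetes conventions.

```python
import json


def devbox_pod_manifest(user: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest for an interactive devbox.

    `user` must already be a lowercase DNS-1123-safe name, since it becomes part
    of the Pod name. Image, labels and command are hypothetical placeholders.
    """
    name = f"devbox-{user}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "devbox", "user": user}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "devbox",
                    "image": image,
                    # Keep the container running so users can `kubectl exec` into it.
                    "command": ["sleep", "infinity"],
                    "resources": {
                        # GPUs from the NVIDIA device plugin are requested via limits.
                        "limits": {"nvidia.com/gpu": gpus},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = devbox_pod_manifest("alice", "example.registry/devbox:latest", gpus=1)
    print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML manifests, so the printed output could be saved to a file and applied with `kubectl apply -f`. A real template would likely add volume mounts, node selectors, and security context settings on top of this skeleton.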

Logistics

You will be a full-time employee of FAR.AI, a 501(c)(3) research non-profit.

  • Location: Both remote and in-person (Berkeley, CA) are possible, though 2 hours of overlap with the Berkeley time zone is required. We sponsor visas for in-person employees in CA, and can also hire remotely in most countries.

  • Hours: Full-time (40 hours/week).

  • Compensation: $100,000-$175,000/year depending on experience and location. We will also pay for work-related travel and equipment expenses. We offer catered lunch and dinner at our offices in Berkeley.

  • Application process: A programming assessment, a short screening call, two 1-hour interviews, and a 1-week paid work trial.

If you have any questions about the role, please reach out at [email protected]. If you don't have questions, the best way to ensure a proper review of your skills and qualifications is by applying directly via the application form. Please don't email us to share your resume (it won't have any impact on our decision). Thank you!

Employment type: Full-time, hybrid
Date posted: September 29, 2025