Software Engineer, Infrastructure

About Basis

Basis is a nonprofit applied AI research organization with two mutually reinforcing goals.

The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles.

The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future.

To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.

About the Role

Software Engineers on the Platform team at Basis build the infrastructure that accelerates research and enables commercial deployment of Basis innovations. You will create reliable training and evaluation infrastructure, manage compute resources as they scale to support medium-scale models, develop SaaS platform offerings, and build the technical foundation that supports both internal research and external customers.

We are looking for people who excel at infrastructure engineering and understand the unique demands of ML systems at scale. The ideal Software Engineer has experience with distributed systems, cloud infrastructure, and ML training pipelines, and brings a reliability-focused mindset that ensures researchers can trust the systems they depend on. You will work at the intersection of cutting-edge research and production-grade infrastructure.

This role is central to Basis’s commercial strategy and scaling objectives. The Platform team develops general-purpose infrastructure separate from individual design partner teams, enabling replication-based growth across multiple domains and clients.

We seek individuals who aspire to build rigorous, high-quality, robust systems, but are not afraid to iterate quickly, learn from production, and explore different architectural approaches to achieve excellence.

Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building infrastructure for problems larger than ones they can tackle alone.

We expect you to:

Have demonstrated significant technical achievements in infrastructure engineering. Examples include:
- Building ML training or inference infrastructure for distributed systems
- Developing cloud platforms or services used by multiple teams or customers
- Creating developer tools, CI/CD systems, or deployment automation at scale
- Contributing to infrastructure open-source projects or technical systems with high reliability requirements
Possess a deep understanding of distributed systems principles, including consistency, availability, fault tolerance, scalability patterns, and performance optimization for high-throughput, low-latency workloads.
Have hands-on experience with cloud platforms (AWS, GCP, Azure) including compute orchestration, storage systems, networking, and cost optimization strategies. Experience managing significant cloud budgets is valuable.
Be proficient in infrastructure technologies including Kubernetes, Docker, infrastructure as code (Terraform), CI/CD pipelines, monitoring and observability (Prometheus, Grafana), and modern DevOps practices.
Understand ML infrastructure requirements including GPU cluster management, distributed training frameworks (PyTorch Distributed, DeepSpeed, Ray), experiment tracking, model versioning, and reproducible research pipelines.
Have programming experience in Python (the primary language for ML), and familiarity with systems languages such as Go, Rust, or C++ for performance-critical components.
Value reliability and operational excellence. You design systems that fail gracefully, monitor proactively, and enable teams to debug and recover quickly when issues arise.
Progress with autonomy on complex technical challenges. You can scope infrastructure projects, make sound architectural decisions, and execute from design through deployment.
Be excited about enabling breakthrough research that advances society’s ability to solve intractable problems through robust, scalable infrastructure.

In addition, the following would be an advantage:

Experience at companies building ML infrastructure at scale (Anthropic, OpenAI, Google, Meta AI Research, Weights & Biases, HuggingFace).
Background in ML research or research engineering that provides an understanding of researcher workflows.
Experience with on-premise GPU cluster management or hybrid cloud architectures.
Contributions to infrastructure open-source projects (Kubernetes, PyTorch, Ray).
SRE background or experience with production ML systems serving external customers.
Understanding of AI safety and responsible AI deployment practices.

Responsibilities:

Design and build ML training infrastructure supporting medium-scale models with distributed training across GPU clusters, experiment tracking, checkpoint management, and reproducible pipelines.
Develop SaaS platform and API offerings that package Basis research innovations into commercial products, including backend services, API design, authentication, rate limiting, and customer-facing features.
Manage compute infrastructure as it scales, including capacity planning, resource allocation, cost optimization, cloud and on-premise orchestration, and efficiency monitoring.
Build developer tools and workflows that accelerate research velocity, including CI/CD pipelines, testing frameworks, deployment automation, and development environment management.
Implement monitoring and observability that provide comprehensive visibility into system health, performance, costs, and research progress through metrics, logging, alerting, and dashboards.
Ensure system reliability and scalability by designing fault-tolerant architectures, implementing graceful degradation, conducting load testing, and establishing SLAs appropriate for research and production workloads.
Collaborate with research teams to understand infrastructure needs, translate experimental techniques into scalable systems, and provide technical consultation on architecture and performance.
Maintain security and compliance by implementing access controls, encryption, and audit logging, and by adhering to data governance policies as Basis serves external customers.
Contribute to the culture and direction of Basis by modeling technical excellence, operational discipline, and a focus on enabling high-impact research and commercial applications.

Role Details

Exceptional candidates who may not meet all of the following criteria are still encouraged to apply.

FT/PT: Full-time.
In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
Location: New York City.
Salary range: Competitive salary.

Privacy Notice

By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.

Read our full Global Data Privacy Notice here.



