Basis is a nonprofit applied AI research organization with two mutually reinforcing goals.
The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles.
The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future.
To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.
Software Engineers on the Platform team at Basis build the infrastructure that accelerates research and enables commercial deployment of Basis innovations. You will create reliable training and evaluation infrastructure, manage compute resources as they scale to support medium-scale models, develop SaaS platform offerings, and build the technical foundation that supports both internal research and external customers.
We are looking for people who excel at infrastructure engineering and understand the unique demands of ML systems at scale. The ideal Software Engineer has experience with distributed systems, cloud infrastructure, and ML training pipelines, and brings a reliability-focused mindset that ensures researchers can trust the systems they depend on. You will work at the intersection of cutting-edge research and production-grade infrastructure.
This role is central to Basis’s commercial strategy and scaling objectives. The Platform team develops general-purpose infrastructure separate from individual design partner teams, enabling replication-based growth across multiple domains and clients.
We seek individuals who aspire to build rigorous, high-quality, robust systems, but are not afraid to iterate quickly, learn from production, and explore different architectural approaches to achieve excellence.
Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building infrastructure for problems larger than ones they can tackle alone.
Have demonstrated significant technical achievements in infrastructure engineering. Examples include:
Building ML training or inference infrastructure for distributed systems
Developing cloud platforms or services used by multiple teams or customers
Creating developer tools, CI/CD systems, or deployment automation at scale
Contributing to infrastructure open-source projects or technical systems with high reliability requirements
Possess deep understanding of distributed systems principles including consistency, availability, fault tolerance, scalability patterns, and performance optimization for high-throughput, low-latency workloads.
Have hands-on experience with cloud platforms (AWS, GCP, Azure) including compute orchestration, storage systems, networking, and cost optimization strategies. Experience managing significant cloud budgets is valuable.
Be proficient in infrastructure technologies including Kubernetes, Docker, infrastructure as code (Terraform), CI/CD pipelines, monitoring and observability (Prometheus, Grafana), and modern DevOps practices.
Understand ML infrastructure requirements including GPU cluster management, distributed training frameworks (PyTorch Distributed, DeepSpeed, Ray), experiment tracking, model versioning, and reproducible research pipelines.
Have strong programming experience in Python (the primary language for ML work) and familiarity with a systems language (Go, Rust, or C++) for performance-critical components.
Value reliability and operational excellence. You design systems that fail gracefully, monitor proactively, and enable teams to debug and recover quickly when issues arise.
Progress with autonomy on complex technical challenges. You can scope infrastructure projects, make sound architectural decisions, and execute from design through deployment.
Be excited about enabling breakthrough research that advances society’s ability to solve intractable problems through robust, scalable infrastructure.
In addition, the following would be an advantage:
Experience at companies building ML infrastructure at scale (Anthropic, OpenAI, Google, Meta AI Research, Weights & Biases, HuggingFace).
Background in ML research or research engineering that provides an understanding of researcher workflows.
Experience with on-premise GPU cluster management or hybrid cloud architectures.
Contributions to infrastructure open-source projects (Kubernetes, PyTorch, Ray).
SRE background or experience with production ML systems serving external customers.
Understanding of AI safety and responsible AI deployment practices.
Design and build ML training infrastructure supporting medium-scale models with distributed training across GPU clusters, experiment tracking, checkpoint management, and reproducible pipelines.
Develop SaaS platform and API offerings that package Basis research innovations into commercial products, including backend services, API design, authentication, rate limiting, and customer-facing features.
Manage compute infrastructure as it scales, including capacity planning, resource allocation, cost optimization, cloud and on-premise orchestration, and efficiency monitoring.
Build developer tools and workflows that accelerate research velocity, including CI/CD pipelines, testing frameworks, deployment automation, and development environment management.
Implement monitoring and observability that provide comprehensive visibility into system health, performance, costs, and research progress through metrics, logging, alerting, and dashboards.
Ensure system reliability and scalability by designing fault-tolerant architectures, implementing graceful degradation, conducting load testing, and establishing SLAs appropriate for research and production workloads.
Collaborate with research teams to understand infrastructure needs, translate experimental techniques into scalable systems, and provide technical consultation on architecture and performance.
Maintain security and compliance by implementing access controls, encryption, and audit logging, and by adhering to data governance policies as Basis serves external customers.
Contribute to the culture and direction of Basis by modeling technical excellence, operational discipline, and focus on enabling high-impact research and commercial applications.
Exceptional candidates who may not meet all of the criteria listed above are still encouraged to apply.
FT/PT: Full-time.
In-person Policy: We are in the office four days a week. Be prepared to attend multi-day Basis-wide in-person events.
Location: New York City.
Salary range: Competitive salary.
Privacy Notice
By submitting your application, you grant Basis permission to use your materials for both hiring evaluation and recruitment-related research and development purposes. Your information may be processed in different countries, including the US. You retain copyright while providing Basis a license to use these materials for the stated purposes.
Read our full Global Data Privacy Notice here.