Browse 59 exciting jobs hiring in ML Infrastructure now. Check out companies hiring, such as Pepr AI, LinkedIn, and Sciforium, in St. Petersburg, Columbus, and San Bernardino.
Senior backend infrastructure engineer needed to build and operate the reliable, scalable systems that run Pepr AI’s autonomous ad-spend platform in both cloud and customer VPCs.
Lead and scale Network Growth AI efforts at LinkedIn as a hands-on Senior Staff AI Engineer driving recommender, LLM, and GNN model development and productionization.
Lead the architecture and hands-on development of Sciforium’s high-performance model serving platform, spanning GPU kernels, runtimes, distributed scheduling, and Python APIs to deliver low-latency multimodal inference.
Contract Software Engineer to develop and maintain internal annotation, dataset-tracking, and visualization tools supporting Mach9's geospatial ML teams (remote with PST overlap preferred).
Abridge is hiring a Head of AI Platform to lead the team building scalable, secure ML infrastructure and model-serving systems that power its generative-AI healthcare products.
Own and expand Abacus’s core ML-powered document extraction engine and backend systems as the company’s first Founding Engineer, driving 0→1 development and scalable infrastructure.
Lead the design and delivery of Zapier’s foundational AI Capabilities platform to power Agents, Chatbots, and AI-driven integrations across the product.
High-impact Enterprise Account Executive role focused on selling cloud, cybersecurity, and AI/ML solutions to C-suite buyers across large US accounts with significant quota upside.
Sequen AI seeks a Staff Software Engineer (Infrastructure) to own and scale high‑performance cloud and ML infrastructure supporting training, research, and serving of frontier ranking models.
Lead discovery and early-stage AI partnerships to develop data and training initiatives that accelerate Colibri’s growth in the AI ecosystem.
Norm Ai seeks an experienced Engineering Manager to lead a hybrid software engineering team building AI and compliance solutions for enterprise customers.
Experienced engineering lead to own and scale Grid's ML infrastructure, data pipelines, and core backend services in an on-site Seattle role.
Senior ML Systems Engineer to own and evolve the training framework and tooling that enables reliable, high-performance large-scale LLM training.
Lead and build the ML cloud platform at an early-stage AI startup in San Francisco, owning end-to-end infrastructure for training and deploying large-scale physics models while remaining deeply technical and customer-facing.
Lead the design and scaling of high-performance ML infrastructure for large generative and predictive molecular AI models, working at the intersection of ML, physics, and computational chemistry.
Drive large-scale data pipeline design and dataset curation to enable efficient training of cutting-edge language models for a globally distributed AI research and engineering team.
Build and scale the compute and infrastructure that powers Chai Discovery's next-generation AI drug design platform as a Software Engineer, Infrastructure.
Serve as the technical bridge between customers and research at Oumi by designing, training, and deploying ML solutions on an open-source-first AI platform.
Senior Machine Learning Platform Engineer to design and optimize feature pipelines, distributed training, and low-latency inference systems for a remote US team building production ML infrastructure.
Lead the architecture and execution of a high-throughput, low-latency ML and simulations platform that enables large-scale model training, inference, and simulation-driven product development.
Lead the product direction for large-scale ML inference infrastructure, driving roadmap, customer-facing technical decisions, and delivery of reliable, high-throughput model serving solutions for a U.S.-remote team.
Build and operate robust ML training and SaaS infrastructure at Basis, scaling GPU clusters, cloud services, and developer workflows to support cutting-edge research and commercial products.
Lead development of high-performance, distributed LLM inference systems at Modular to enable fast, scalable, production-grade AI deployments.
Help design and operate scalable, multi-cloud LLM inference infrastructure at Modular as a Backend Engineer focused on distributed systems and ML inference.
Lead technical product strategy and execution for webAI’s distributed inference and on-device LLM platform, partnering closely with engineering and research to deliver enterprise-grade AI solutions.
Lead Cold Start's technical strategy and execution as a founder-style CTO, architecting shared systems, driving AI and automation initiatives, and partnering closely with early-stage founders.
Lead program strategy and delivery for platform and data engineering at Pfizer DP&TS, building secure, scalable cloud-native services that accelerate research and discovery.
Titan is hiring a Technical Sourcer to architect and execute proactive sourcing strategies that identify and engage elite technical talent for high-impact engineering and AI roles.
Lead design and implementation of scalable AI infrastructure and developer tooling to accelerate Vanta’s AI-powered product initiatives.
Lead the design and operation of production-grade infrastructure at Decagon to deliver low-latency, highly available systems that power conversational AI at scale.
Cohere seeks a Technical Program Manager to lead end-to-end infrastructure and capacity programs for GPU-heavy ML workloads across cloud providers and internal teams.
A rapidly evolving US-based company is hiring a Head of Platform Engineering to drive a modular, API-first, and event-driven modernization of its platform while leading and growing platform teams.
Early-career ML Operations / Full Stack engineer to help design, deploy, and optimize scalable model serving and training infrastructure for Abridge’s AI-driven healthcare platform.
Help architect and ship robust LLM integrations for Cohere’s North platform, collaborating closely with researchers and engineers to improve performance, latency, and reliability.
Lead the next generation of AI-driven ranking and recommendation systems for LinkedIn's Feed to improve relevance, personalization, and member engagement at massive scale.
Experienced data engineer needed to build scalable data and ML infrastructure for a mission-driven healthcare startup focused on supporting individuals with serious mental illness.
Lead the design, deployment, and scaling of ML systems for Spotify’s podcast and creator products, shaping roadmap decisions and building agentic, real-time experiences that boost engagement.
Work on the data foundations of Phare’s healthcare Revenue Operating System, building high-throughput ingestion, transformation, and serving systems for production ML and SaaS workloads in a hybrid NYC role.
Lead a remote engineering practice to architect, build, and deliver scalable, AI-enabled software solutions while mentoring teams and managing client-facing technical direction.
Lead the modernization and scaling of Bizee’s core platform—building an API-first, event-driven foundation and the platform engineering organization that enables fast, reliable, AI-ready product development.
Lead Pinterest’s Ads Modeling & Marketplace organization as Senior Director to define multi‑year technical strategy, scale modeling and serving platforms, and drive advertiser performance and marketplace health.
Help architect and operate Quizlet’s next-generation ML and data platform to enable scalable model training, deployment, and reliable data workflows for a global learning product.
Cohere is hiring a Staff Software Engineer to build and operate ML-optimized HPC infrastructure (Kubernetes-based GPU/TPU superclusters) that accelerates research and production training of large AI models.
Robinhood is seeking a Software Engineer on the Data Governance team to build backend services and automation that ensure compliant, auditable, and privacy-respecting use of customer data across analytics and AI systems.
Sciforium is hiring an onsite Systems Engineer to build and maintain Linux/GPU infrastructure that supports high-performance AI model training and serving.
Whatnot is hiring a Feature Platform Engineer to build and scale near-real-time feature ingestion and storage infrastructure that powers ML models and critical business systems.
OpenAI's Sora team is hiring a Software Engineer to design and scale distributed data infrastructure that powers large-scale multimodal training and evaluation.
Build high-performance front-end tooling and optimize GPU-level kernels at Sciforium to accelerate our AI serving platform and bridge the UI with low-level infrastructure.
Sciforium seeks a backend software engineer experienced in C++ and Python to build high-performance GPU-level kernels and scalable backend services for its AI model serving platform.
Decagon is seeking a Senior Infrastructure Engineer to own and scale high‑performance infrastructure and developer platforms that power its conversational AI products.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 38