Browse 20 exciting Distributed Inference jobs now. Check out companies hiring, such as Jobgether, webAI, and FM, in Virginia Beach, Ontario, and Atlanta.
Senior Machine Learning Platform Engineer to design and optimize feature pipelines, distributed training, and low-latency inference systems for a remote US team building production ML infrastructure.
Lead the strategy and delivery of distributed inference, LLM integrations, and on-device ML features at webAI to enable privacy-first, enterprise-grade AI on the edge.
Lead the product direction for large-scale ML inference infrastructure, driving roadmap, customer-facing technical decisions, and delivery of reliable, high-throughput model serving solutions for a U.S.-remote team.
Lead development of high-performance, distributed LLM inference systems at Modular to enable fast, scalable, production-grade AI deployments.
Help design and operate scalable, multi-cloud LLM inference infrastructure at Modular as a Backend Engineer focused on distributed systems and ML inference.
Lead technical product strategy and execution for webAI’s distributed inference and on-device LLM platform, partnering closely with engineering and research to deliver enterprise-grade AI solutions.
Samsara is hiring a Senior Machine Learning Engineer to build scalable ML infrastructure and end-to-end ML applications that power real-world IoT products and improve operational safety and efficiency.
Early-career ML Operations / Full Stack engineer to help design, deploy, and optimize scalable model serving and training infrastructure for Abridge’s AI-driven healthcare platform.
Build secure, scalable infrastructure and governance systems for enterprise AI agents as a Software Engineer on Rubrik's Agent Cloud team.
Coinbase is hiring a Machine Learning Platform Engineer to design and operate low-latency inference, streaming pipelines, and distributed training infrastructure that powers fraud detection, personalization, and blockchain analysis.
Senior backend engineer role at Sprig to own and evolve large-scale data processing and AI inference systems that power product insights for leading companies.
Contribute to Sprig’s AI-powered platform as a fullstack engineer focused on large-scale backend systems, distributed data workflows, and frontend integrations in a hybrid San Francisco role.
Lead end-to-end development of large-scale AI and deep learning solutions at Thomson Reuters Labs, driving production-grade LLM, retrieval, and data-pipeline capabilities across legal and news products.
Lead the Dynamo engineering team at NVIDIA to design, build, and operationalize high-performance, fault-tolerant LLM inference and GenAI serving infrastructure.
Lead the design and optimization of large-scale AI inference systems at NVIDIA, developing high-performance kernels, compilers, and orchestration for state-of-the-art models.
Lead a talented engineering team to design, build, and operate the large-scale LLM serving and model deployment infrastructure that powers personalized recommendations at scale.
Work on Etched's inference runtime to port transformer models and optimize multi-node, low-latency execution on purpose-built ASIC accelerators.
Lead the design and implementation of a production-grade network gateway/load balancer to route long-lived, low-latency inference traffic for cutting-edge AI models at OpenAI.
Relace is hiring a hands-on Machine Learning Engineer to optimize GPU kernels, performance tune large-scale ML systems, and productionize cutting-edge models from our SF FiDi office.
Palo Alto Networks is hiring a Principal Machine Learning Platform Engineer to architect and scale a high-performance ML inference platform for the Prisma AIRS AI security product.