Browse 24 jobs hiring in LLM Inference now. Check out companies hiring such as webAI, Jobgether, and Modular (CA) in Milwaukee, Anaheim, and Mesa.
Lead the strategy and delivery of distributed inference, LLM integrations, and on-device ML features at webAI to enable privacy-first, enterprise-grade AI on the edge.
Lead the product direction for large-scale ML inference infrastructure, driving roadmap, customer-facing technical decisions, and delivery of reliable, high-throughput model serving solutions for a U.S.-remote team.
Lead development of high-performance, distributed LLM inference systems at Modular to enable fast, scalable, production-grade AI deployments.
Help design and operate scalable, multi-cloud LLM inference infrastructure at Modular as a Backend Engineer focused on distributed systems and ML inference.
Lead technical product strategy and execution for webAI’s distributed inference and on-device LLM platform, partnering closely with engineering and research to deliver enterprise-grade AI solutions.
NVIDIA is hiring a Senior Software Developer to drive low-level, high-performance AI networking and inference infrastructure using C/C++, Rust, GPU kernels, and RDMA.
Build secure, scalable infrastructure and governance systems for enterprise AI agents as a Software Engineer on Rubrik's Agent Cloud team.
d-Matrix is hiring a Senior Staff ML Researcher to develop and implement algorithmic and numerical techniques that optimize LLM inference on next-generation DNN accelerators at its Santa Clara hybrid headquarters.
Coinbase is hiring a Machine Learning Platform Engineer to design and operate low‑latency inference, streaming pipelines, and distributed training infrastructure that powers fraud detection, personalization, and blockchain analysis.
Lead Developer Relations on the West Coast to grow Featherless’s open-model community, create technical demos and content, and represent the platform at events and hackathons.
Lead end-to-end development of large-scale AI and deep learning solutions at Thomson Reuters Labs, driving production-grade LLM, retrieval, and data-pipeline capabilities across legal and news products.
Lead the Dynamo engineering team at NVIDIA to design, build, and operationalize high-performance, fault-tolerant LLM inference and GenAI serving infrastructure.
Lead the design and optimization of large-scale AI inference systems at NVIDIA, developing high-performance kernels, compilers, and orchestration for state-of-the-art models.
Mercor is seeking an early-career Data Scientist to run experiments, build dashboards, and prototype models that improve matching and evaluation at its San Francisco headquarters.
Lead a talented engineering team to design, build, and operate large-scale LLM serving and model deployment infrastructure that powers personalized recommendations at scale.
Anduril is hiring a Software Engineer, AI in Reston to build, optimize, and deploy real-world ML/LLM systems that power mission-critical defense and intelligence capabilities.
Lead the GenAI Platform engineering team at Abridge to design, deliver, and operate LLM workflows, agentic systems, and retrieval/evaluation infrastructure for clinical AI products.
Capital One is hiring a Senior Lead AI Engineer to design and productionize foundational LLM, inference, and agentic AI systems that are scalable, cost-efficient, and responsible.
Help shape GPU-accelerated inference and AI infrastructure as a Spring intern working on CUDA, models, and scalable training/inference systems in San Francisco.
NVIDIA is hiring a Senior Software Development Engineer to build and optimize TensorRT-LLM inference software that powers large-scale generative AI on GPUs.
Work on cutting-edge production AI systems at Unify, building agents, retrieval, and inference infrastructure to power the next generation of go-to-market products.
Work as a hands-on engineering intern building GPU-optimized AI infrastructure and inference systems with a San Francisco-based team.
Relace is hiring a hands-on Machine Learning Engineer to optimize GPU kernels, performance-tune large-scale ML systems, and productionize cutting-edge models from its SF FiDi office.
Palo Alto Networks is hiring a Principal Machine Learning Platform Engineer to architect and scale a high-performance ML inference platform for the Prisma AIRS AI security product.
Salary ranges: Below 50k: 0 | 50k–100k: 0 | Over 100k: 2