Browse 30 exciting jobs hiring in LLM Inference now. Check out companies hiring such as Awesome Motive, Palo Alto Networks, and Airwallex in Greensboro, Shreveport, and Moreno Valley.
Tamarind Bio is hiring an AI/LLM Engineer in San Francisco to build scalable, production-grade workflows and enhance an ML copilot for computational biology.
Lead the design and delivery of scalable, secure enterprise AI/ML solutions and mentor engineers to translate advanced AI research into measurable business impact at a top cybersecurity company.
Lead and scale a growth-focused data science and AI organization to build experimentation, personalization, and ML-driven products that unlock step-change user and revenue growth at Airwallex.
Lead high-impact, product-aligned experiments on foundation models using PyTorch and distributed training to improve real-world customer outcomes at Liquid AI.
Help define and accelerate product-market fit for OpenAI’s Codex by measuring developer productivity, designing experiments, and informing model and product improvements.
Lead the design, deployment, and responsible evaluation of generative AI solutions at a mission-driven healthcare company serving millions of patients.
NVIDIA seeks a Senior AI Software Engineer to extend Megatron Core and NeMo frameworks through distributed training innovations, performance tuning, and scalable tooling for large-scale LLM and multimodal model workflows.
Lead product strategy and go-to-market for NVIDIA's AI Infrastructure, focusing on inference software, Kubernetes integrations, and customer-driven AI Factory solutions.
Lead the design and optimization of high-performance deep learning inference software on NVIDIA GPUs as a Senior Software Engineer on the TensorRT team.
BentoML seeks an Inference Optimization Engineer to accelerate LLM inference across GPUs and distributed serving stacks, reducing latency and GPU costs while contributing to open-source tooling.
Bjak seeks an MLOps Engineer to run and scale open-source LLMs into production, optimizing for cost, latency, and reliability while working in a flexible hybrid model.
Experienced SRE with distributed systems and LLM experience needed to design and operate scalable, reliable managed AI services for a mission-driven, sustainability-focused AI infrastructure company.
Lead product experimentation and LLM-powered evaluation to drive measurable product improvements across cross-functional teams.
Netflix is seeking a senior Software Engineer (L5) to build and operate next-generation, large-scale offline inference systems and developer tooling that accelerate ML practitioners across the company.
Work on the core intelligence at a seed-stage startup, designing experiments, optimizing inference, and building training and eval systems that turn messy UI and behavioral data into production-ready models.
Tessera Labs seeks an AI Agent Engineer to design and deploy scalable, secure multi-step agentic pipelines using LLMs and tool-calling integrations for enterprise applications.
Lead the Triton Inference Server engineering team at NVIDIA to deliver high-performance, scalable model-serving solutions for cloud and on-premises AI deployments.
At Haystack News, a Senior Data Scientist will apply advanced statistical methods, causal inference, and machine learning to improve recommendations and measure product impact across our news streaming experience.
Lead the design and deployment of agentic LLM-based systems at NVIDIA to accelerate and innovate chip architecture and engineering workflows.
Visa is hiring a Senior Machine Learning Engineer to develop and productionize LLM-based generative AI solutions (RAG, fine-tuning, inference) for global payments products.
Plaid is hiring a Senior Software Engineer to architect, build, and operate scalable ML infrastructure—feature stores, pipelines, deployment and inference tooling—to accelerate trustworthy AI across the company.
Virtue AI is hiring a Research Scientist to design and implement agent and LLM red‑teaming techniques and production-ready guardrail models to advance its AI security platform.
Lead and scale the NIM Factory engineering organization to deliver reliable, performant, and secure AI inference services from day‑0 launches through enterprise hardening.
Lead research and engineering-driven development of GenAI conversational assistants, guiding a cross-functional team to fine-tune, optimize, and deploy LLM-powered features that improve customer digital experiences.
Lead the development of production-ready machine learning and causal analytics to power personalization, experimentation, and optimization across NBCUniversal’s streaming products and ad experiences.
Lead enterprise sales for a public AI cloud provider by driving adoption of AI Studio’s GPU-accelerated infrastructure and GenAI services across large customers.
Ironclad is looking for a Staff Software Engineer - Applied AI to build and productionize LLMs, RAG systems, and document-understanding services that deliver actionable contract insights.
Ramp is hiring a Summer 2026 Applied Scientist Intern to develop and ship ML and LLM-based solutions that power underwriting, fraud detection, and smarter spend management.
Join Additive as a Member of Technical Staff — Backend to design and build core backend systems that integrate ML inference, monitoring, and business logic for tax-focused applications.
Lead the strategy and execution of Generative AI across Netflix Games, shaping infrastructure, prototypes, and AI-native gameplay to deliver novel player experiences at scale.