Browse 20 jobs hiring in vLLM now. Companies hiring include Prime Intellect, Fiddler AI, and NVIDIA, with roles in Miami, Baton Rouge, and Newport News.
Work at the intersection of RL, post-training evaluation, and production agent infrastructure to shape and deploy agentic AI systems used by real customers.
Fiddler AI is hiring a Staff Backend Engineer to architect and build scalable backend systems and observability pipelines for LLMs and agentic applications at an early-stage, mission-driven company.
Technical Marketing Engineer needed to produce developer-focused technical content, samples, and benchmarks that demonstrate and improve NVIDIA's AI platform software usability.
Lead the architecture and delivery of a scalable, secure AI infrastructure platform while building and mentoring a high-caliber engineering organization at the Texas Institute for Electronics.
Senior Software Engineer to join LinkedIn's AI Platform team to design and optimize large-scale training, feature-engineering, and serving infrastructure for LLMs and recommendation systems.
Lead community strategy for the PyTorch Foundation by building relationships across projects like PyTorch, vLLM, and DeepSpeed to grow a collaborative open-source AI developer ecosystem.
Lead the zero-to-one design and implementation of a high-throughput, low-latency LLM inference stack as an early engineering hire at an SF-based AI startup.
Join Nebius AI Studio to build and scale a high-performance inference platform that makes deploying foundation models fast, reliable, and effortless at massive scale.
Lead the design and implementation of cloud-native backend and AI model-serving infrastructure on GCP for a mission-driven cybersecurity team.
Lead development of scalable, high-performance GCP-based backend and model-serving infrastructure for the ATP Cloud team at Palo Alto Networks.
Senior technical leader partnering with sales to architect and deliver enterprise Red Hat solutions—spanning RHEL, OpenShift, Ansible, hybrid cloud, and AI/LLM GPU services—while mentoring teams and driving strategic customer outcomes.
NVIDIA seeks a Senior Research Engineer to design, implement, and scale open-source post-training and RL algorithms for Nemotron generative AI models.
Join Tonic.ai as an NLP-focused Machine Learning Engineer to design, fine-tune, and productionize LLM-based systems that detect and redact sensitive data and power synthetic data products.
Drive extreme-performance LLM inference and industry benchmarking at NVIDIA by optimizing vLLM and MLPerf workloads on cutting-edge NVIDIA GPUs.
Accelerate and scale OpenAI’s inference stack on AMD GPUs by driving kernel performance, distributed execution, and communication-library integration across large GPU clusters.
Lead the architecture and delivery of scalable, fault-tolerant systems for Crusoe's managed AI inference platform to serve LLMs at massive scale.
Work across frontend and backend systems at Compa to build scalable, production-grade software powering enterprise compensation intelligence.
Gimlet Labs is hiring a Software Engineer (AI Performance) in San Francisco to drive model- and GPU-level performance improvements for production-scale inference.
BentoML seeks an Inference Optimization Engineer to accelerate LLM inference across GPUs and distributed serving stacks, reducing latency and GPU costs while contributing to open-source tooling.
Deliver and scale safe, cost-efficient LLM inference for global AI products as an MLOps Engineer on a hybrid, high-impact team at Bjak.
Salary breakdown: Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 19