Browse 16 jobs hiring in vLLM now. Companies hiring include dottxt, Awesome Motive, and NVIDIA, with roles in Chesapeake, Virginia Beach, and Irvine.
Work on distributed Python and Rust systems at .txt to build and maintain products like dotjson that guarantee structured LLM output and enable reliable AI applications.
Mid-level AI Engineer needed to design, fine-tune, and deploy LLM-based systems and RAG pipelines for mission-critical cyber capabilities at Twenty's Arlington office.
Senior engineer role to optimize and extend NVIDIA's GPU-accelerated inference stacks (vLLM, SGLang, FlashInfer) for LLMs and generative AI across datacenter and edge accelerators.
Lead performance engineering for Vision Language Models at NVIDIA, optimizing end-to-end inference pipelines, CUDA kernels, and SDK integrations to deliver accelerated computer vision at scale.
Lead a high-impact team accelerating LLM inference performance at NVIDIA by combining deep systems expertise, GPU profiling, and cross-functional collaboration.
Lead the Dynamo engineering team at NVIDIA to architect and deliver a high-performance, scalable LLM inference platform for real-time and multi-node AI workloads.
Palo Alto Networks is hiring a Sr Principal Software Engineer to lead backend and model-serving infrastructure development for ATP Cloud services in Santa Clara, focusing on scalable, high-performance cloud-native systems.
Lead the design and deployment of production AI systems at VORTO, focusing on LLM fine-tuning, RAG-based retrieval, and low-latency inference to optimize supply-chain operations.
Contribute to aion's inference infrastructure as an ML Inference Platform Intern, learning and implementing high-performance optimization techniques for production GPU systems.
Lead development of enterprise-grade generative and conversational AI systems that power Valence’s AI-first leadership coaching platform and drive product innovation at scale.
NVIDIA is hiring a Systems Software Engineer to develop and evaluate cloud-native AI inference systems, agentic workflows, and developer-focused content that leverage GPU-accelerated frameworks.
Help engineer the inference backbone at Together AI, optimizing global request routing, autoscaling, and multi-tenant systems to serve cutting-edge generative models at scale.
Lead the design and operation of scalable, observable, and secure ML compute infrastructure to ensure reliable, reproducible, and auditable deployments at Zyphra.
Help accelerate production LLM inference at .txt by optimizing multi-GPU pipelines, kernel performance, and deployment reliability for structured generation workloads.
NVIDIA seeks a new-graduate Deep Learning Software Engineer to design and optimize inference kernels, compilers, and runtimes that accelerate LLMs and other high-impact AI workloads.
Build and optimize high-performance inference infrastructure for large foundation models at a fast-moving, well-funded AI startup in Menlo Park.
Salary breakdown: Below $50k: 0 | $50k–$100k: 1 | Over $100k: 14