Browse 15 exciting vLLM jobs hiring now. Check out companies hiring, such as Palo Alto Networks, Sciforium, and FM, in St. Petersburg, Miami, and Toledo.
Lead architecture and research for a high-scale ML inference and security platform at Palo Alto Networks' Prisma AIRS, driving MLOps standards and LLM-focused product innovations.
Sciforium is hiring a Distributed Training Engineer to own and optimize the full ML training stack — from drivers and kernels to JAX/PyTorch — enabling large-scale training and deployment of next-generation LLMs.
Lead the architecture and hands-on development of Sciforium’s high-performance model serving platform, spanning GPU kernels, runtimes, distributed scheduling, and Python APIs to deliver low-latency multimodal inference.
Lead the ML stack as a founding Machine Learning Engineer at a stealth, self-funded AI group, defining models, training pipelines, and scalable inference for a global consumer product.
A founding AI/ML research engineer role to design and build core model, data, and inference systems for a stealth, high-impact consumer AI product backed by a profitable US$2B group.
Lead the technical design and implementation of A1’s foundational LLM systems—training pipelines, inference stacks, and deployment architecture—for a global consumer AI product.
Virtue AI seeks a Research Scientist Intern in San Francisco to develop and integrate cutting-edge agent and LLM security techniques, including red-teaming, guardrail models, and efficient inference methods.
NVIDIA seeks an entry-level Deep Learning Software Engineer to help optimize and ship GPU-accelerated inference software for LLMs and generative AI.
Lead design and implementation of cloud-native, high-performance backend services and AI model-serving infrastructure for Palo Alto Networks' ATP Cloud team.
Work at the intersection of research and engineering to build scalable synthetic data pipelines that directly improve the quality and efficiency of Cohere's language models.
Senior delivery leader sought to lead hands‑on AI agent development teams and scale multi‑disciplinary programs that deliver LLM‑enabled products in regulated and high‑compliance environments.
Dun & Bradstreet seeks an AI Engineer II to architect and deploy agentic AI systems using LLMs, vector databases, and MLOps to power scalable, production-ready intelligent automation.
Lead the product direction for large-scale ML inference infrastructure, driving roadmap, customer-facing technical decisions, and delivery of reliable, high-throughput model serving solutions for a U.S.-remote team.
Lead development of high-performance, distributed LLM inference systems at Modular to enable fast, scalable, production-grade AI deployments.
LlamaIndex is seeking a Multimodal AI Engineer to develop and productionize vision-language and document-understanding models that power large-scale document parsing and RAG applications.
Below 50k*: 0 | 50k-100k*: 1 | Over 100k*: 13