We are seeking highly skilled and motivated software engineers to join our vLLM & MLPerf team. You will define and build benchmarks for MLPerf Inference, the industry-leading benchmark suite for system-level inference performance. You will also contribute to vLLM and push its performance on those benchmarks to the limit on NVIDIA's latest GPUs.
What you’ll be doing:
Design and implement highly efficient inference systems for large-scale deployments of generative AI models.
Define inference benchmarking methodologies and build tools that will be embraced across the industry.
Develop, profile, debug, and optimize low-level system components and algorithms to improve throughput and latency on the MLPerf Inference benchmarks on the newest NVIDIA GPUs.
Productionize inference systems with uncompromised software quality.
Collaborate with researchers and engineers to productionize trending model architectures, inference techniques and quantization methods.
Contribute to the design of APIs, abstractions, and UX that make it easier to scale model deployment while maintaining usability and flexibility.
Participate in design discussions, code reviews, and technical planning to ensure the product aligns with the business goals.
Stay current with the latest advances in system-level inference optimization, propose novel research ideas, and translate them into practical, robust systems. Exploration and academic publication are encouraged.
What we need to see:
Bachelor’s, Master’s, or PhD degree in Computer Science/Engineering, Software Engineering, a related field, or equivalent experience.
5+ years of experience in software development, preferably with Python and C++.
Deep understanding of deep learning algorithms, distributed systems, parallel computing, and high-performance computing principles.
Hands-on experience with ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
Experience optimizing compute, memory, and communication performance for large-model deployments.
Familiarity with GPU programming, CUDA, NCCL, and performance profiling tools.
Ability to work closely with both research and engineering teams, translating pioneering research ideas into concrete designs and robust code, and proposing novel research ideas of your own.
Excellent problem-solving skills, with the ability to debug sophisticated systems.
A passion for building high-impact software that pushes the boundaries of what’s possible with large-scale AI.
Ways to stand out from the crowd:
Background in building and optimizing LLM inference engines such as vLLM and SGLang.
Experience building ML compilers such as Triton or Torch Dynamo/Inductor.
Experience working with cloud platforms (e.g., AWS, GCP, or Azure), containerization tools (e.g., Docker), and orchestration infrastructures (e.g., Kubernetes, Slurm).
Exposure to DevOps practices, CI/CD pipelines, and infrastructure as code.
Contributions to open-source projects (please provide a list of the GitHub PRs you submitted).
At NVIDIA, we believe artificial intelligence (AI) will fundamentally transform how people live and work. Our mission is to advance AI research and development to create groundbreaking technologies that enable anyone to harness the power of AI and benefit from its potential. Our team consists of experts in AI, systems and performance optimization. Our leadership includes world-renowned experts in AI systems who have received multiple academic and industry research awards.
If you've hacked the inner workings of PyTorch, or if you've written many CUDA/HIP kernels, or if you've developed and optimized inference services or training workloads, or if you've built and maintained large-scale Kubernetes clusters, or if you simply enjoy solving hard problems, feel free to drop an application!
#LI-Hybrid
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.