NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and optimize the GPU-accelerated software that powers today’s most sophisticated AI applications. Our team is responsible for developing and maintaining high-performance deep learning frameworks, including SGLang and vLLM, which are at the forefront of efficient large-scale model serving and inference. You will play a central role in improving these platforms, facilitating smooth deployment and serving of groundbreaking language models.
You’ll work closely with the deep learning community to implement the latest algorithms for public release in frameworks like SGLang and vLLM, as well as other DL frameworks. Your work will focus on identifying and driving performance improvements for state-of-the-art LLM and Generative AI models across NVIDIA accelerators, from datacenter GPUs to edge SoCs. You'll bring to bear open-source tools and plugins—including CUTLASS, OAI Triton, NCCL, and CUDA kernels—to implement and optimize model serving pipelines.
What you'll be doing:
Performance optimization, analysis, and tuning of DL models in domains such as LLM, multimodal, and generative AI.
Scale performance of DL models across different architectures and types of NVIDIA accelerators.
Contribute features and code to NVIDIA’s inference libraries, vLLM, SGLang, FlashInfer, and other LLM software solutions.
Collaborate with teams across frameworks, NVIDIA libraries, and inference optimization solutions.
What we need to see:
Master's or PhD, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI).
5+ years of relevant software development experience.
Excellent C/C++ programming and software design skills. SW Agile skills are helpful and Python experience is a plus.
Prior experience with training, deploying or optimizing the inference of DL models in production is a plus.
Prior background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
Ways to stand out from the crowd:
Contributions to deep learning software projects, such as PyTorch, vLLM, and SGLang, that drive advancements in the field.
Experience with multi-GPU communications (NCCL, NVSHMEM).
Experience building and shipping products to enterprise customers.
GPU programming experience (CUDA, OAI Triton, or CUTLASS).
NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology.
#LI-Hybrid
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.