NVIDIA seeks a Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and optimize the GPU-accelerated software that powers today’s most sophisticated AI applications. Our team is responsible for developing and maintaining high-performance open-source frameworks, which are at the forefront of efficient large-scale model serving and inference. You will play a central role in improving these platforms, facilitating smooth deployment and serving of groundbreaking language models.
You’ll work closely with the deep learning community to implement the latest algorithms for public release in inference frameworks. Your work will focus on identifying and driving performance improvements for state-of-the-art LLM and Generative AI models across NVIDIA accelerators, from datacenter GPUs to edge SoCs. You'll bring to bear open-source tools and plugins—including CUTLASS, OAI Triton, NCCL, and CUDA kernels—to implement and optimize model serving pipelines.
What you'll be doing:
Performance optimization, analysis, and tuning of DL models in various domains like LLM, Multimodal and Generative AI.
Scale performance of DL models across different architectures and types of NVIDIA accelerators.
Contribute features and code to NVIDIA’s inference libraries, to vLLM, SGLang, and FlashInfer, and to LLM software solutions.
Collaborate with teams across frameworks, NVIDIA libraries, and inference optimization solutions.
What we need to see:
Pursuing a Master's or PhD, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI).
C/C++ programming and software design skills. SW Agile skills are helpful and Python experience is a plus.
Experience with training, deploying or optimizing the inference of DL models in production is a plus.
Modeling, profiling, debugging, and code optimization skills, or architectural knowledge of CPUs and GPUs, are a plus.
GPU programming experience (CUDA, OAI Triton, or CUTLASS) is a plus.
Ways to stand out from the crowd:
Contributions to deep learning software projects such as PyTorch, vLLM, and SGLang that drive advancements in the field.
Experience with multi-GPU communication (NCCL, NVSHMEM).
With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our special engineering teams are growing fast. If you're a creative and autonomous engineer with a genuine passion for technology, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 120,000 USD - 189,750 USD for Level 2, and 148,000 USD - 235,750 USD for Level 3. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.