NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
We are seeking a highly skilled Senior On-Device Model Inference Optimization Engineer to join our team and lead efforts to improve the performance and efficiency of the AI models that enable the next generation of autonomous vehicle technology at NVIDIA!
What you'll be doing:
Develop and implement strategies to optimize AI model inference for on-device deployment.
Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands.
Optimize performance-critical components using CUDA and C++.
Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.
Benchmark inference performance, identify bottlenecks, and implement solutions.
Research and apply innovative methods for inference optimization.
Adapt models for diverse hardware platforms and operating systems with varying capabilities.
Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.
Recommend and implement model architecture changes to improve the accuracy-latency balance.
What we need to see:
MSc or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
Over 10 years of proven experience specializing in model inference and optimization.
Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.
Proven experience in optimizing inference for transformer and convolutional architectures.
Strong programming proficiency in CUDA, Python, and C++.
In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.
Skilled in building and deploying scalable, cloud-based inference systems.
Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.
Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.
Strong collaboration and communication skills for working effectively across multidisciplinary teams.
Ways to stand out from the crowd:
Publications or industry experience in optimizing and deploying model inference at scale.
Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.
Active contributions to open-source projects focused on inference optimization or machine learning frameworks.
Experience in designing and deploying inference pipelines for real-time or autonomous systems.
You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.