We are looking for software engineers to contribute to the design and development of libraries and tools to simplify and accelerate computing for unstructured sparsity in DL and HPC. Around the world, leading commercial and academic organizations are revolutionizing AI, data analytics, and scientific and engineering simulations, using data centers powered by GPUs and high-performance linear algebra libraries. Applications of these technologies include LLMs, computer-aided engineering, quantum chemistry, autonomous vehicles, computer vision, and countless others. Did you know our team develops the GPU-accelerated libraries and SDKs that help make these possible?
In this role, you will work with other developers on solutions that involve generalizations to sparse tensor computations, domain-specific language (DSL) specifications of sparse storage formats, and on-demand code generation. Ideal candidates will not only have experience developing accelerated computing software, but also be motivated to advance the state of the art in a variety of accelerated computing domains and DL frameworks like PyTorch. If this sounds exciting, we would love to meet you!
What you will be doing:
Design and develop a C++-based system to simplify and accelerate computing for unstructured sparsity in DL and HPC on NVIDIA GPUs
Enable the system in languages and frameworks that are more commonly used in DL, such as Python and PyTorch
Evaluate and improve the performance of the system on real-life applications
Realize opportunities to improve library quality, performance and maintainability by writing effective and well-tested code for production use
Work closely with product management and other internal and external partners to understand feature and performance requirements and contribute to technical roadmaps
What we need to see:
BS, MS or PhD degree in Computer Science, Applied Math, or related field (or equivalent experience)
6+ years of overall experience developing, debugging, and optimizing high-performance software using C++ and parallel programming, ideally for sparse linear algebra applications and using CUDA, MPI, OpenMP, or equivalent technologies
Experience with domain-specific language design and compiler optimizations, in particular sparse compilers (MLIR or TACO)
Excellent C++, Python, and CUDA programming skills
Strong collaboration, communication, and documentation habits and ideally experience with working in a globally distributed organization
Ways to stand out from the crowd:
Strong understanding of sparse computations, in particular sparsity in AI and HPC
Good understanding of LLMs, Deep Learning methods and frameworks
Experience with low-level GPU performance optimization
Understanding of numerical linear algebra methods like direct and iterative solvers
Experience adopting and advancing software development practices such as CI/CD systems, and with project management tools such as JIRA
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing for science and engineering. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build our teams with the smartest people in the world! Join us at the forefront of technological advancement. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and talented people in the world working for us. If you're creative, autonomous and love a challenge, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.