We are looking for an experienced, highly motivated Senior Software Technical Program Manager to lead our efforts in developing pioneering compute software solutions for critically important environments. Our work has made a major impact in various fields and is used across leading academic institutions, start-ups, and industry! This is an outstanding opportunity to lead and manage our communication libraries, such as NCCL, NVSHMEM, and UCX, for Deep Learning and HPC. We need passionate, hard-working, and creative people to help us reach our engineering goals.
What you will be doing:
This GPU Communication Libraries role will collaborate closely with SW Development Managers, Engineers, Product Marketing, Customer Program Management, Quality Assurance, and other logistics personnel to establish and implement streamlined processes for the development of advanced Compute Software solutions for cloud service providers and OEM customers. In this role, you will collect requirements, help define priorities, remove blockers, and drive planning and scheduling for all phases of the software development lifecycle. Additionally, you'll be responsible for the continuous improvement and maintenance of all processes related to enterprise support, and you will establish a process for next-gen architecture and feature engagements so that opportunities to influence changes in HW architecture are not missed. You will have the opportunity to partner with diverse technical groups, spanning all organizational levels.
Lead status meetings, proactively address challenges and customer concerns, and serve as the primary POC for building and upholding prioritized release schedules and plans.
Strategically plan and partner across NVIDIA teams to drive software objectives while maintaining schedules and formulating risk management strategies across multiple parallel work streams.
Lead existing product development enhancements and software release processes, while collaborating with engineering management to optimize the development workflow and efficiency.
Translate customer requirements into actionable internal milestones and tasks, ensuring customers are continually informed of issue statuses.
Drive virtual reviews and establish continuous feedback loops by communicating benchmarking results and customer insights to product and engineering leadership.
Track and report large-scale performance benchmarking across all clusters. Build performance dashboards and reporting processes to monitor KPIs and surface performance trends.
Collaborate across internal teams and third-party partners across time zones, as necessary, to resolve customer issues and oversee customer releases.
Partner with Customer Program Managers addressing software issues, including technical feedback from OEMs, CSPs, and partners.
What we need to see:
12+ years of overall experience in the software industry with specialization in HPC networking or system software.
6+ years of program management experience in a similar or related role.
BS, MS, or Ph.D. in CS, CE, EE (related technical field) or equivalent experience.
Hands-on experience with software development for hardware platforms, communication runtimes, or high-performance networking, with demonstrated success in delivering these complex products to customers.
Proficiency in Agile software development methodologies.
Proven ability to creatively resolve technical and resource issues, and to think strategically and tactically while building consensus to ensure program success.
Comprehensive understanding of software engineering principles, including experience with widely adopted configuration management tools, productivity-enhancing tools, and automation processes.
Exceptional attention to detail and a demonstrated capacity for multitasking in a dynamic environment with shifting priorities and changing requirements.
Strong communication and technical presentation skills, and the ability to work independently and proactively with minimal guidance.
Previous experience coordinating activities between HW and SW organizations.
Ways to stand out from the crowd:
Solid understanding of the Deep Learning Framework ecosystem for Training and Inference.
Solid understanding of operating systems, datacenter servers, graphics principles and standards.
Background with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
Knowledge of a modern programming language is desired, as well as depth in HPC and ML/DL fundamentals.
Background with RDMA, high-performance networking technologies (InfiniBand, RoCE, Ethernet, EFA), network architecture and network topologies.
Our technology has no boundaries! NVIDIA is building the world’s most groundbreaking and innovative compute platforms for the world to use. At the center of NVIDIA's culture are our core values of innovation, excellence, determination, and teamwork, which guide us to be the best we can be.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 160,000 USD - 253,000 USD for Level 4, and 192,000 USD - 304,750 USD for Level 5. You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.