We are excited to announce an opening for a Cloud Solution Architect at NVIDIA and are seeking a passionate individual with a strong interest in large-scale GPU infrastructure and AI Factory deployments! If you are enthusiastic about contributing to projects that push the boundaries of cloud-based AI and resilience in large-scale environments, we invite you to read on. NVIDIA is renowned as one of the most sought-after employers in the technology world, offering highly competitive benefits. We are home to some of the most innovative and forward-thinking individuals globally. If you are creative, autonomous, and eager to apply your skills and knowledge in a dynamic environment, we want to hear from you!
What you'll be doing:
Working as a key member of our cloud solutions team, you will be the go-to technical expert on NVIDIA AI Factory solutions and large-scale GPU infrastructure, helping clients architect and deploy resilient, telemetry-driven AI compute environments at unprecedented scale.
Collaborating directly with engineering teams to secure design wins, address challenges, and deploy solutions into production, with a focus on developing robust tooling for observability, failure recovery, and infrastructure-level performance optimization.
Acting as a trusted advisor to our clients, understanding their cloud environment, translating requirements into technical solutions, and providing guidance on optimizing NVIDIA AI Factories for scalable, reliable, and high-performance workloads.
What we need to see:
2+ years of experience in large-scale cloud infrastructure engineering, distributed AI/ML systems, or GPU cluster deployment and management.
A BS in Computer Science, Electrical Engineering, Mathematics, or Physics, or equivalent experience.
Proven understanding of large-scale computing systems architecture, including multi-node GPU clusters, high-performance networking, and distributed storage.
Experience with infrastructure-as-code, automation, and configuration management for large-scale deployments.
A passion for machine learning and AI, and the drive to continually learn and apply new technologies.
Excellent interpersonal skills, including the ability to explain complex technical topics to non-experts.
Ways to stand out from the crowd:
Expertise with orchestration and workload management tools like Slurm, Kubernetes, Run:ai, or similar platforms for GPU resource scheduling.
Knowledge of AI training and inference performance optimization at scale, including distributed training frameworks and multi-node communication patterns.
Hands-on experience designing telemetry systems and failure recovery mechanisms for large-scale cloud infrastructure, including observability tools such as Grafana, Prometheus, and OpenTelemetry.
Proficiency in deploying and managing cloud-native solutions using platforms such as AWS, Azure, or Google Cloud, with a focus on GPU-accelerated workloads.
Deep expertise with high-performance networking technologies, particularly NVIDIA InfiniBand, NCCL, and GPUDirect RDMA for large-scale AI workloads.
You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.