NVIDIA is the world leader in GPU Computing. Our markets include gaming, automotive, vision, HPC, data centers, and networking, in addition to our traditional OEM business. NVIDIA is also well positioned as the ‘AI Computing Company’, and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and autonomous vehicles. We have some of the most experienced and dedicated people in the world working for us. If working with dedicated, forward-thinking, and hard-working technical people across countries sounds exciting, this job is for you. NVIDIA is looking for an outstanding individual who thrives in a diverse work environment, has outstanding interpersonal skills, and possesses a strong sense of engagement and continuous process improvement. To join our platform SWQA team, this candidate must have experience in enterprise server integration, strong Linux experience, reliability testing with various telemetries, scale-out clusters, and test plan development, a track record in developing AI tools and NLP, and DevOps and CI/CD experience.
What you’ll be doing:
Develop and execute NVIDIA HGX/DGX/MGX platform test plans covering servers, OS, FW, and the CUDA SW stack, based on design documents.
Install and test various system OSes, server firmware, and SW stacks.
Drive root cause analysis of reliability and validation test failures to identify root cause(s) and achieve mitigation.
Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.
Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
Work in an agile software development team with very high production quality standards.
Manage the bug lifecycle and collaborate across groups to drive solutions.
What we need to see:
Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
5+ years of proven experience, or a master’s degree.
Proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript
Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
Good knowledge of and hands-on experience in model testing, AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), NLP, and LLM benchmarking
Experience in using AI development tools for test plan creation, test case development, and test case automation
Strong experience in FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish is a huge plus
Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker is a huge plus
Ways to stand out from the crowd:
Experience with AI-related tools, LLMs, and NLP.
Experience working with NVIDIA GPU hardware is a strong plus.
A solid understanding of virtualization in Linux (KVM, Docker orchestrated with Kubernetes) is good to have.
Background in parallel programming, ideally CUDA/OpenCL, is a plus.
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
You will also be eligible for equity and benefits.
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.