Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures
Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
Production-grade expertise in Python
Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism
System-level mindset with a track record of tuning hardware–software interactions for maximum utilization