Staff Machine Learning Infrastructure Engineer

Company Overview:

Dyna Robotics is at the forefront of revolutionizing robotic manipulation with cutting-edge foundation models. Our mission is to empower businesses by automating repetitive, stationary tasks with affordable, intelligent robotic arms. Leveraging the latest advancements in foundation models, we're driving the future of general-purpose robotics—one manipulation skill at a time.

Dyna Robotics was founded by industry leaders who previously achieved a $350 million exit in grocery deep tech, together with top robotics researchers from DeepMind and Nvidia. Our team blends world-class research, engineering, and product innovation to drive the future of robotic manipulation. With more than $20 million in funding, we're positioned to redefine the landscape of robotic automation. Join us to shape the next frontier of AI-driven robotics.

Position Overview:

We are seeking an experienced Machine Learning Infrastructure Engineer to join our team and help scale our ML training platform. In this role, you will be responsible for designing, implementing, and maintaining large-scale ML infrastructure to accelerate model iteration and improve training performance across an expanding GPU ecosystem. You will work on cutting-edge high-performance computing systems, optimizing distributed training environments, and ensuring system reliability as we scale.

Key Responsibilities:

  • Infrastructure Design & Scalability:

    • Architect and implement large-scale ML training pipelines that leverage parallel GPU processing on platforms like GCP or AWS.

    • Enhance our existing infrastructure to fully exploit parallelism and design for future expansion, ensuring that our system is ready to support growth.

  • High-Performance ML Computing & Distributed Systems:

    • Manage and optimize high-performance computing resources.

    • Develop robust distributed computing solutions, addressing challenges like race conditions, memory optimization, and resource allocation.

    • Optimize model training with techniques like mixed precision, ZeRO, LoRA, etc.

  • Job Scheduling & Reliability:

    • Design systems for job rescheduling, automated retries, and failure recovery to maximize uptime and training efficiency.

    • Implement intelligent job queuing mechanisms to optimize training workloads and resource utilization.

  • Storage & Data Handling:

    • Evaluate tradeoffs between local and networked storage solutions, and implement the options that improve data throughput and access.

    • Develop strategies for caching training data to optimize performance.

  • Collaboration & Continuous Improvement:

    • Work closely with ML researchers and data scientists to understand training requirements and bottlenecks.

    • Continuously monitor system performance, identify areas for improvement, and implement best practices to enhance scalability and reliability.

Required Qualifications:

  • Bachelor’s degree or higher in Computer Science or a related field.

  • At least 7 years of professional experience in the software industry, with a minimum of 2 years in a tech lead role.

  • Proven experience with high-performance computing environments and distributed systems.

  • Demonstrated ability to scale ML training systems and optimize resource utilization.

  • Hands-on experience with job scheduling systems and managing cloud GPU environments (GCP, AWS, etc.).

  • Deep understanding of distributed computing concepts, including race conditions, memory optimization, and parallel processing.

  • Hands-on experience in ML model tuning for performance.

  • Experience with common ML training and inference tools including PyTorch, TensorRT, Triton, Accelerate, etc.

  • Strong analytical and problem-solving skills with the ability to troubleshoot complex system issues.

  • Excellent communication skills to collaborate effectively with cross-functional teams.

Preferred Qualifications:

  • Experience with container orchestration tools (e.g., Kubernetes) and infrastructure-as-code frameworks.

Benefits:

  • Competitive salary and equity in a seed-stage venture-backed startup

  • Comprehensive health, dental, and vision insurance

  • Professional growth and development through training, mentorship, and challenging projects

  • Daily catered lunches and dinners with a fully stocked kitchen

If you're passionate about building scalable ML systems and optimizing high-performance computing infrastructure, we'd love to hear from you.

Average salary estimate:

$220,000 / year (est.); min $180,000, max $260,000


Employment type: Full-time, onsite

Date posted: August 24, 2025