Browse 102 Inference jobs hiring now. Companies hiring include NVIDIA, Lyft, and ThoughtForge, with roles in Anchorage, Yonkers, and Sioux Falls.
Lead the strategy and delivery of inference benchmarking products at NVIDIA, translating technical performance insights into actionable products, partnerships, and GTM outcomes.
Lyft is hiring Master's and PhD interns for Summer 2026 in San Francisco to work on optimization, ML, and inference problems that support its mobility marketplace.
Help build the next generation of adaptive robotics software as a mid-level generalist engineer working on simulation and robot control at an early-stage, VC-backed startup.
Work on state-of-the-art, real-time perception and sensor-fusion pipelines and deploy them on constrained hardware for robust field performance.
Jerry.ai is hiring a Data Scientist to apply ML, experimentation, and analytics to accelerate customer growth and product decisions at a high-growth pre-IPO startup.
Senior engineer role to optimize and extend NVIDIA's GPU-accelerated inference stacks (vLLM, SGLang, FlashInfer) for LLMs and generative AI across datacenter and edge accelerators.
Deepgram is hiring a backend engineer skilled in Rust and distributed systems to design and optimize high-performance inference services for production voice AI products.
Launch Potato is hiring a Principal ML Engineer to architect and lead production personalization and real-time ML systems for its high-traffic digital media platforms.
The Energy & Environment Lab at the University of Chicago is seeking a Research Director to lead rigorous applied research in energy and environmental policy, manage research teams, and partner with faculty and policymakers to produce and disseminate evidence-driven solutions.
Lead the strategy and execution of Crusoe's next-generation Managed AI Services, owning the product lifecycle from roadmap to market adoption for inference and managed AI offerings.
Lead performance engineering for Vision Language Models at NVIDIA, optimizing end-to-end inference pipelines, CUDA kernels, and SDK integrations to deliver accelerated computer vision at scale.
Lead the engineering effort to convert state-of-the-art AI research into production-ready open-source components and platform integrations for Red Hat's AI offerings.
Lead the development and operation of Attentive’s ML platform to enable high-velocity, reliable training and low-latency serving for production ML applications.
Lead the technical experimentation function to design and operationalize robust statistical frameworks that enable fast, reliable decisions across product, marketing, and ad teams at NBCUniversal.
Lead a high-impact team accelerating LLM inference performance at NVIDIA by combining deep systems expertise, GPU profiling, and cross-functional collaboration.
Lead design, execution, and analysis of experiments and causal valuation for Netflix’s ads business to inform product strategy and drive measurable impact.
Lead and manage a portfolio of quantitative education research projects at CEPR, directing analytic design, supervising staff, and partnering with state agencies to produce rigorous, policy-relevant evidence.
Mercor is hiring an early-career Data Scientist in San Francisco to drive experiments, metrics, and prototypes that improve hiring match quality and product metrics using SQL, Python, and causal thinking.
Lead the Dynamo engineering team at NVIDIA to architect and deliver a high-performance, scalable LLM inference platform for real-time and multi-node AI workloads.
Oura is hiring a Senior Data Scientist to lead development of novel running-dynamics algorithms and production ML models that turn wearable sensor data into actionable performance coaching.
Lead the design and deployment of enterprise-grade MLOps, feature stores, and LLM-driven chatbot solutions at a fast-growing data product firm serving Fortune 500 clients.
Senior Machine Learning Engineer/Economist to apply auction theory, econometrics, and scalable engineering to optimize Pinterest's ads marketplace and long-term advertiser and user outcomes.
Help productionize cutting-edge video AI at Cantina by building scalable inference systems, video data pipelines, and production-grade ML infrastructure.
Kentro is hiring a remote Data Scientist to perform advanced clinical analytics and causal inference to support VA decision-makers and drive measurable health outcomes.
NVIDIA is hiring a Senior Field Applications Engineer to enable GPU-powered AI solutions by providing technical leadership, onsite customer support, and system design expertise for data center and edge deployments.
Lead the design and production deployment of advanced ML and LLM-driven systems at Jerry.ai to power large-scale consumer product features and drive strategic business initiatives.
MLabs is hiring an AI Engineer to develop multimodal LLM-based document extraction and scale ML infrastructure for freight back-office automation.
EvenUp is hiring an early-career Economist/Data Scientist to blend econometrics and applied data science in a hybrid role to drive pricing, growth, forecasting, and experimentation decisions.
An opportunity to apply computational social science expertise to editorial decision-making at Nature Communications, handling manuscripts, overseeing peer review, commissioning content, and representing the field across the journal.
Senior product leader needed to define and execute the vision, roadmap, and adoption of enterprise AI/ML platforms at Visa, spanning cloud and on-prem solutions to power payments-related ML products.
Kiddom is hiring a Research Engineer (GenAI) to design and deploy ML-powered search, personalization, and agentic assistant systems that support teachers and improve student learning.
Visa is hiring a Senior Data Scientist in Washington, DC to lead predictive modeling, experiment design, and data-product initiatives that inform payments and marketing strategies.
Lead full-stack hardware and software integrations to deploy vision AI systems in manufacturing environments as Maneva's field Mechatronics Engineer covering Arkansas and nearby regions.
Drive enterprise analytics strategy and lead cross-team delivery as a Senior Lead Data Analyst, building reusable pipelines, KPI frameworks, and executive-grade insights to accelerate data-driven decisions.
Be part of a capital-backed, high-growth team where you'll apply statistical rigor and machine learning to power product and growth decisions across a consumer super app for car ownership.
Lead the development of distributed runtime and orchestration systems (Rust, Kubernetes, Slurm) to enable large-scale, low-latency GPU inference for NVIDIA's Dynamo/Inference Server ecosystem.
A new-graduate software engineer role on NVIDIA's TensorRT team to help design and optimize high-performance deep learning inference software for specialized platforms.
Contribute to aion's inference infrastructure as an ML Inference Platform Intern, learning and implementing high-performance optimization techniques for production GPU systems.
Lead measurement strategy and causal analytics for top advertisers at Snap, driving ads efficacy, experimentation, and cross-functional playbooks from the Los Angeles office.
Work with NVIDIA's Data Center Infrastructure team to architect and deploy AI Factory solutions that optimize GPU-accelerated inference and training workloads across hybrid cloud and on-prem environments.
Serve Robotics is hiring an ML Performance Engineer to optimize and deploy real-time ML models on NVIDIA Jetson-based delivery robots in Los Angeles.
Lead technical architecture and mentoring for Chime's Experimentation Platform, building scalable systems and partnering with data teams to improve experiment design and velocity.
Lead and scale a specialized GPU kernels team at Modular to design, optimize, and ship high-performance compute kernels that power the MAX GenAI inference platform.
Superblocks is hiring an Infra Engineer to design and operate a real-time distributed execution engine and production infrastructure for hundreds of thousands of AI applications from their NYC HQ.
NVIDIA is hiring a Systems Software Engineer to develop and evaluate cloud-native AI inference systems, agentic workflows, and developer-focused content that leverage GPU-accelerated frameworks.
At Realtor.com, a Staff Machine Learning Engineer will lead the full ML lifecycle—building and deploying scalable models for pricing, forecasting, and recommendations that directly impact revenue and user experience.
Apply advanced statistics and machine learning at Intuitive to turn complex, messy healthcare and commercial data into actionable insights that inform strategic decisions and improve patient outcomes.
Help engineer the inference backbone at Together AI, optimizing global request routing, autoscaling, and multi-tenant systems to serve cutting-edge generative models at scale.
Lead research on LLM training and inference at Lila Sciences to advance scientific applications of large language models.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 35