Browse 17 Inference Optimization jobs hiring now. Companies hiring include Capital One, NBCUniversal, and TwelveLabs, with roles in Overland Park, Phoenix, and Austin.
Lead research and engineering-driven development of GenAI conversational assistants, guiding a cross-functional team to fine-tune, optimize, and deploy LLM-powered features that improve customer digital experiences.
Lead the development of production-ready machine learning and causal analytics to power personalization, experimentation, and optimization across NBCUniversal’s streaming products and ad experiences.
Build and scale mission-critical ML systems at TwelveLabs to power state-of-the-art multimodal video understanding models.
Lead marketing measurement and optimization at Kin by designing models, driving data integration, and translating analytic results into business decisions that grow and protect the customer base.
Nelo seeks a senior Data Scientist in NYC to drive underwriting, personalization, and pricing models, lead experimentation and MLOps, and directly impact product and portfolio performance.
Lyft is hiring Master's and PhD interns for Summer 2026 in San Francisco to work on optimization, ML, and inference problems that support its mobility marketplace.
Senior engineer role to optimize and extend NVIDIA's GPU-accelerated inference stacks (vLLM, SGLang, FlashInfer) for LLMs and generative AI across datacenter and edge accelerators.
Lead the technical direction for personalization at Launch Potato as a Principal ML Engineer, designing large-scale, real-time ML systems and driving cross-functional ML strategy.
Lead performance engineering for Vision Language Models at NVIDIA, optimizing end-to-end inference pipelines, CUDA kernels, and SDK integrations to deliver accelerated computer vision at scale.
Kiddom is hiring a Research Engineer (GenAI) to design and deploy ML-powered search, personalization, and agentic assistant systems that support teachers and improve student learning.
Contribute to aion's inference infrastructure as an ML Inference Platform Intern, learning and implementing high-performance optimization techniques for production GPU systems.
Serve Robotics is hiring an ML Performance Engineer to optimize and deploy real-time ML models on NVIDIA Jetson-based delivery robots in Los Angeles.
Help accelerate production LLM inference at .txt by optimizing multi-GPU pipelines, kernel performance, and deployment reliability for structured generation workloads.
Experienced data scientist sought to build models, run causal analyses and forecasts, and deliver measurable commercial impact for Pilot Company's retail and logistics operations.
Work on the core model-serving infrastructure at ByteDance to design and scale distributed inference systems that power ranking and recommendation across products.
Lead analytics-driven marketing strategies at American Express to optimize customer targeting, measure campaign impact, and drive incremental revenue across the US consumer portfolio.
1Kosmos is looking for an AI / Machine Learning Engineer to create and deploy high-performance computer vision and fraud-detection models for passwordless identity in a hybrid NY/NJ role.
Below 50k*: 0 | 50k-100k*: 1 | Over 100k*: 3