Browse 9 GPU Serving jobs hiring now. Companies hiring include NVIDIA, Attentive, and Coupang, with roles in Huntsville, St. Louis, and Pittsburgh.
Senior engineer role to optimize and extend NVIDIA's GPU-accelerated inference stacks (vLLM, SGLang, FlashInfer) for LLMs and generative AI across datacenter and edge accelerators.
Lead the development and operation of Attentive’s ML platform to enable high-velocity, reliable training and low-latency serving for production ML applications.
Lead end-to-end, production-scale ML and LLM initiatives at Coupang to improve search, recommendations and generative product experiences.
Monarch is hiring a hands-on Infrastructure & MLOps Engineer to build and operate scalable cloud and AI infrastructure that powers its personal finance platform.
Palo Alto Networks is hiring a Sr Principal Software Engineer to lead backend and model-serving infrastructure development for ATP Cloud services in Santa Clara, focusing on scalable, high-performance cloud-native systems.
Work on the core model-serving infrastructure at ByteDance to design and scale distributed inference systems that power ranking and recommendation across products.
Build and scale core ML infrastructure—data pipelines, training frameworks, and production model serving—to power David AI’s audio research and production products.
Build and optimize high-performance inference infrastructure for large foundation models at a fast-moving, well-funded AI startup in Menlo Park.
Vultron invites an experienced DevOps Engineer to build and operate secure AI infrastructure that transforms government contracting.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 1