Browse 19 exciting GPU Inference jobs hiring now. Check out hiring companies such as Ataraxis AI, Awesome Motive, and USAA in Buffalo, Madison, and Irving.
Contribute to cutting-edge AI research in oncology by building, optimizing, and deploying scalable machine learning models and evaluation frameworks at Ataraxis AI.
Tamarind Bio is hiring an AI/LLM Engineer in San Francisco to build scalable, production-grade workflows and enhance an ML copilot for computational biology.
Help architect and scale a production ML inference platform at Tamarind Bio to serve hundreds of biological models and support rapid customer growth.
NVIDIA is seeking a seasoned Technical Program Manager to lead Deep Learning Inference programs, coordinating cross-functional engineering teams to deliver scalable AI software and hardware integrations.
Lead technical marketing for NVIDIA GPU and rack-scale systems, communicating architecture, performance, and deployment value to hyperscalers, OEMs, and data center operators.
NVIDIA seeks a Senior AI Software Engineer to extend Megatron Core and NeMo frameworks through distributed training innovations, performance tuning, and scalable tooling for large-scale LLM and multimodal model workflows.
Lead product strategy and go-to-market for NVIDIA's AI Infrastructure, focusing on inference software, Kubernetes integrations, and customer-driven AI Factory solutions.
Lead the design and optimization of high-performance deep learning inference software on NVIDIA GPUs as a Senior Software Engineer on the TensorRT team.
Gimlet Labs is hiring a Software Engineer (AI Performance) to drive model and GPU-level performance improvements for production-scale inference in San Francisco.
BentoML seeks an Inference Optimization Engineer to accelerate LLM inference across GPUs and distributed serving stacks, reducing latency and GPU costs while contributing to open-source tooling.
Bjak seeks an MLOps Engineer to deploy and scale open-source LLMs in production, optimizing for cost, latency, and reliability while working in a flexible hybrid model.
NVIDIA is seeking an experienced Embedded Field Applications Engineer to support customers building AI-enabled embedded systems on the Jetson platform across the NALA region.
Contribute to Adobe Firefly’s GenAI Services by building optimized inference pipelines, integrating generative models into flagship products, and developing scalable ML systems for production.
Boson AI seeks an experienced research engineer to optimize training and inference pipelines on GPU clusters using CUDA/Triton, PyTorch, and distributed optimization techniques.
Lead and scale the NIM Factory engineering organization to deliver reliable, performant, and secure AI inference services from day‑0 launches through enterprise hardening.
Lead research and engineering-driven development of GenAI conversational assistants, guiding a cross-functional team to fine-tune, optimize, and deploy LLM-powered features that improve customer digital experiences.
Lead enterprise sales for a public AI cloud provider by driving adoption of AI Studio’s GPU-accelerated infrastructure and GenAI services across large customers.
Work with a top-tier research team in Seattle to optimize inference pipelines for large foundation models, improving latency, throughput, and efficiency at scale.
An experienced systems and ML-inference engineer is needed to lead development of low-latency, high-throughput inference pipelines spanning on-device and cluster deployments.