Browse 23 exciting AI Inference jobs hiring now. Check out companies such as NVIDIA, Lark Health, and USAA hiring in Yonkers, Houston, and Omaha.
Lead technical marketing for NVIDIA GPU and rack-scale systems, communicating architecture, performance, and deployment value to hyperscalers, OEMs, and data center operators.
Lead the design, deployment, and responsible evaluation of generative AI solutions at a mission-driven healthcare company serving millions of patients.
NVIDIA seeks a Senior AI Software Engineer to extend Megatron Core and NeMo frameworks through distributed training innovations, performance tuning, and scalable tooling for large-scale LLM and multimodal model workflows.
Lead product strategy and go-to-market for NVIDIA's AI Infrastructure, focusing on inference software, Kubernetes integrations, and customer-driven AI Factory solutions.
Sable is hiring an AI Engineer in San Francisco to develop and productionize multimodal deep-learning systems that power digital-human enterprise workers.
Gimlet Labs is hiring a Software Engineer (AI Performance) to drive model and GPU-level performance improvements for production-scale inference in San Francisco.
Drive the design and production deployment of secure, agent-based AI systems at 3E to deliver high-impact, customer-facing intelligence using LLMs and modern orchestration frameworks.
Intern with Arcade's backend and AI engineering team to build scalable model orchestration, inference, and production backend systems for generative product creation.
NVIDIA is seeking an experienced Embedded Field Applications Engineer to support customers building AI-enabled embedded systems on the Jetson platform across the NALA region.
Experienced SRE with distributed systems and LLM experience needed to design and operate scalable, reliable managed AI services for a mission-driven, sustainability-focused AI infrastructure company.
Contribute to Adobe Firefly’s GenAI Services by building optimized inference pipelines, integrating generative models into flagship products, and developing scalable ML systems for production.
Tessera Labs seeks an AI Agent Engineer to design and deploy scalable, secure multi-step agentic pipelines using LLMs and tool-calling integrations for enterprise applications.
Lead the design and deployment of agentic LLM-based systems at NVIDIA to accelerate and innovate chip architecture and engineering workflows.
Visa is hiring a Senior Machine Learning Engineer to develop and productionize LLM-based generative AI solutions (RAG, fine-tuning, inference) for global payments products.
Virtue AI is hiring a Research Scientist to design and implement agent and LLM red‑teaming techniques and production-ready guardrail models to advance its AI security platform.
Lead research and engineering-driven development of GenAI conversational assistants, guiding a cross-functional team to fine-tune, optimize, and deploy LLM-powered features that improve customer digital experiences.
Lead enterprise sales for a public AI cloud provider by driving adoption of AI Studio’s GPU-accelerated infrastructure and GenAI services across large customers.
Ironclad is looking for a Staff Software Engineer - Applied AI to build and productionize LLMs, RAG systems, and document-understanding services that deliver actionable contract insights.
Lead the strategy and execution of Generative AI across Netflix Games, shaping infrastructure, prototypes, and AI-native gameplay to deliver novel player experiences at scale.
NVIDIA seeks a Senior Product Manager to lead inference benchmarking products that clarify performance, scaling, and TCO for enterprise AI deployments.
Senior engineer role to optimize and extend NVIDIA's GPU-accelerated inference stacks (vLLM, SGLang, FlashInfer) for LLMs and generative AI across datacenter and edge accelerators.
Lead the strategy and execution of Crusoe's next-generation Managed AI Services, owning the product lifecycle from roadmap to market adoption for inference and managed AI offerings.
Lead the engineering effort to convert state‑of‑the‑art AI research into production‑ready open‑source components and platform integrations for Red Hat's AI offerings.
Salary filter: Below 50k* (0) | 50k-100k* (0) | Over 100k* (1)