Job details

Backend Engineer – Inference Optimization

About Us

We're a high-energy, impact-driven team with a long track record of academic excellence. Our team includes researchers whose work has shaped the field, earning best paper awards at top AI conferences and ranking among the most cited scientists in the history of science. We've built fundamental, transformative research that has redefined the community, and now we're here to change the world, one breakthrough at a time.

What We're Looking For & Why Join Us

We’re looking for a Backend Engineer – Inference Optimization who thrives on solving some of the hardest systems problems in AI. You’ll focus on pushing the limits of foundation model inference performance, working at the intersection of cutting-edge ML and high-performance systems engineering. This is your opportunity to set new benchmarks for latency, throughput, and efficiency at scale.

What is this role?

As a Backend Engineer, you’ll own the design and optimization of inference pipelines for large-scale models. You’ll work closely with researchers and infrastructure engineers to identify bottlenecks, implement advanced techniques like quantization and KV caching, and deploy high-performance serving systems in production. Your work will directly determine how fast and cost-effectively users can access next-generation AI.

What do we expect?

Must have:

Deep experience in optimizing model inference pipelines, model quantization, and KV caching
Proficiency in backend systems and high-performance programming (Python, C++, or Rust)
Familiarity with distributed serving, GPU acceleration, and large-scale systems
Ability to debug complex performance issues across model, runtime, and hardware layers
Comfort working in fast-moving environments with ambitious technical goals

Nice to have:

Hands-on experience with vLLM or similar inference frameworks
Background in GPU kernel optimization (CUDA, Triton, ROCm)
Experience scaling inference across multi-node or heterogeneous clusters
Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime)
Hands-on experience with model quantization

Compensation & Benefits

$150K – $250K + Equity

We offer health benefits, a 401(k) plan, and meaningful equity—because we believe top talent should be supported, secure, and fully invested in the future we’re building together.

Location: This role is in-office at our Seattle HQ.

Tags: inference, quantization, kv-cache, backend, Python, C++, Rust, CUDA, Triton, ONNX, TensorRT, vLLM, GPU, distributed-systems, low-latency, profiling, model-compilation, serving



Company: Vercept
Funding: Series B
Department: Software Engineering
Seniority level: Senior Level