Job details

Lead Software Engineer, Model Serving Platform

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.

We offer a fast-moving, collaborative environment where engineers have meaningful impact, learn quickly, and tackle deep technical challenges across the AI systems stack.

Role Overview

This is a rare chance to help architect and lead the development of Sciforium’s next-generation model serving platform, the high-performance engine that will bring a multimodal, highly efficient foundation model to market. As a senior technical leader, you’ll not only build core components yourself but also guide and mentor other engineers, influencing engineering direction, standards, and execution quality.

You will learn and shape the full AI stack: from GPU kernels and quantized execution paths to distributed serving, scheduling, and the APIs that power real-time AI applications. If you enjoy deep systems work, thrive on ownership, and want to lead engineers in building foundational AI infrastructure, this role puts you at the center of Sciforium’s mission and growth.

Key Responsibilities

Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems.
Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
Build Python APIs and services that expose model capabilities to downstream applications.
Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
Drive performance profiling, benchmarking, and observability across the inference stack.
Ensure high reliability and maintainability through testing, monitoring, and engineering best practices.
Troubleshoot and resolve complex issues across GPU, runtime, and service layers.

Must-Haves

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure.
Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache).
Experience with Kubernetes/Ray and containerization.
Strong proficiency in C++ and Python.
Strong debugging, profiling, and performance optimization skills at the system level.
Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems.
Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality.
Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.

Nice to Have

Experience with ML systems engineering, distributed GPU scheduling, or open-source inference engines such as vLLM, SGLang, or TRT-LLM.
Experience building large-scale ML/MLOps infrastructure.
Proficiency in CUDA or ROCm and experience with GPU profiling tools.
Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team.
Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques.
Contributions to open-source ML or HPC infrastructure.

Why Join Us

Opportunity to build frontier-scale AI infrastructure powering next-generation LLMs and multimodal models.
Work with top-tier engineers and researchers across systems, GPUs, and ML frameworks.
Tackle high-impact performance and scalability challenges in training and inference.
Access state-of-the-art GPU clusters, datasets, and tooling.
Opportunity to publish, patent, and push the boundaries of modern AI.
Join a culture of innovation, ownership, and fast execution in a rapidly scaling AI organization.

Benefits include

Medical, dental, and vision insurance
401k plan
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity

Equal opportunity

Sciforium is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

C++ Python CUDA ROCm Model Serving Inference Distributed Systems Kubernetes Ray vLLM ML Infrastructure GPU LLM Multimodal Profiling Performance

Average salary estimate

$220,000 / year (est.)

Min: $180,000
Max: $260,000

Funding: Series A
Departments: Software Engineering
Seniority level requirement: Senior Level
