EchoTwin AI is pioneering AI-driven infrastructure intelligence, redefining how cities are managed. Powered by a proprietary visual intelligence engine with full spatial reasoning, EchoTwin transforms municipal fleets into mobile urban sensors—creating living digital twins that provide real-time insights into infrastructure, compliance, and safety. By enabling municipalities to proactively monitor, predict, and resolve issues, EchoTwin helps build resilient, self-healing, and sustainable urban ecosystems. More than “smart cities,” EchoTwin is advancing the era of cognizant cities—urban environments with the awareness to see, think, and act on challenges in real time.
As a Vision Language Model Engineer, you will design, develop, and optimize advanced vision-language models that integrate visual and textual data to enable intelligent systems. You will work closely with cross-functional teams to build models that power applications such as image captioning, visual question answering, and multimodal AI at the edge.
Design and implement state-of-the-art vision-language models using deep learning frameworks.
Develop and fine-tune models that combine computer vision and natural language processing for tasks like image captioning, visual question answering, and text-to-image generation.
Collaborate with data scientists and software engineers to integrate models into production systems.
Optimize model performance for accuracy, latency, and scalability in real-world applications.
Conduct experiments to evaluate model performance and iterate on architectures and training pipelines.
Stay up-to-date with the latest research in vision-language models and incorporate advancements into projects.
Contribute to data preprocessing, augmentation, and annotation pipelines for multimodal datasets.
Document model development processes and present findings to technical and non-technical stakeholders.
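To make the vision-language work above concrete: the core of models like CLIP is contrastive image-text matching, where image and text embeddings are compared by cosine similarity and turned into a distribution over captions. The following is a minimal NumPy sketch of that scoring step only; the random vectors stand in for real encoder outputs, and the fixed `logit_scale` stands in for CLIP's learned temperature.

```python
import numpy as np

def cosine_sim(a, b):
    # Normalize rows, then take all pairwise dot products.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
dim = 8
image_embeds = rng.normal(size=(2, dim))  # stand-in image-encoder outputs
text_embeds = rng.normal(size=(3, dim))   # stand-in text-encoder outputs

logit_scale = 100.0  # fixed stand-in for CLIP's learned temperature
logits = logit_scale * cosine_sim(image_embeds, text_embeds)
probs = softmax(logits)  # per-image distribution over the candidate captions
```

In a real system the embeddings would come from trained vision and text encoders, and `logit_scale` would be a learned parameter; only the similarity-and-softmax scoring shown here carries over unchanged.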
Bachelor’s, Master’s, or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, or a related field (or equivalent experience).
3+ years of experience in machine learning, with a focus on vision-language models or multimodal AI.
Hands-on experience with deep learning frameworks such as PyTorch or TensorFlow.
Proven track record of building and deploying computer vision and/or NLP models.
Proficiency in Python and relevant ML libraries (e.g., Hugging Face, OpenCV, Transformers).
Experience with large-scale model training and optimization (e.g., distributed training, quantization).
Strong understanding of neural network architectures (e.g., CNNs, Transformers, CLIP, or similar).
Experience with multimodal datasets and preprocessing techniques for images and text.
Familiarity with cloud platforms (e.g., AWS, GCP, Azure) and model deployment workflows.
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
Excellent communication skills to explain complex technical concepts to diverse audiences.
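One of the optimization techniques the requirements call out is quantization. As a toy illustration only (not any particular framework's implementation), here is symmetric per-tensor post-training quantization of a weight matrix to int8, with the round-trip reconstruction error bounded by half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from int8 values.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(42)
w = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # at most scale / 2 from rounding
```

Production pipelines typically use per-channel scales, calibration data, or quantization-aware training; this sketch only shows the basic scale-round-clip mechanics.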
There are endless learning and development opportunities from a highly diverse and talented peer group, including experts in Computer Vision, GenAI, Digital Twin, Government Contracting, Systems and Device Engineering, Operations, Communications, and more!
Options for medical, dental, and vision coverage for employees and dependents (for US employees)
Flexible Spending Account (FSA) and Dependent Care Flexible Spending Account (DCFSA)
401(k) with 3% company matching
Unlimited PTO
Profit sharing
Please do not forward resumes to our jobs alias, EchoTwin AI employees, or any other company location. EchoTwin AI is not responsible for any fees related to unsolicited resumes.