Job details

AI/ML Data Engineer

College Board - Technology

Location: This is a fully remote role that requires working Eastern Time hours. Candidates who live near College Board offices can choose to be fully remote or hybrid (Tuesday and Wednesday in office).

Type: This is a full-time position.

About the Team

Aquifer is a small, highly collaborative team that implements data and analytics services powering higher‑education recruitment and student engagement for College Board’s BigFuture division. We experiment thoughtfully and ship durable, secure data products that personalize outreach and help partners execute strategic enrollment plans.

Our team is a mix of engineers and architects who blend expertise in data engineering, analytics, and product strategy to deliver scalable solutions that transform how students connect with colleges. We value curiosity, reliability, and clear communication, and we work closely across disciplines to ensure every product is impactful, maintainable, and user-focused.

About the Opportunity

As an AI/ML Data Engineer, you’ll design, build, and operate the data and ML plumbing that powers personalized student experiences at scale. You’ll create batch and streaming pipelines, ML‑ready datasets, feature/embedding stores, and the services that move models into production safely and compliantly. You’ll collaborate with Product, Data Science, and Analytics to turn raw events into reliable, privacy‑preserving features that drive real impact for students and higher‑ed partners.

In this role, you will:

ML Data Platform & Pipelines (40%)

Design, build, and own batch and streaming ETL (e.g., Kinesis/Kafka → Spark/Glue → Step Functions/Airflow) for training, evaluation, and inference use cases.

Stand up and maintain offline/online feature stores and embedding pipelines (e.g., S3/Parquet/Iceberg + vector index) with reproducible backfills.

Implement data contracts & validation (e.g., Great Expectations/Deequ), schema evolution, and metadata/lineage capture (e.g., OpenLineage/DataHub/Amundsen).

Optimize lakehouse/warehouse layouts and partitioning (e.g., Redshift/Athena/Iceberg) for scalable ML and analytics.

Model Enablement & LLM DataOps (30%)

Productionize training and evaluation datasets with versioning (e.g., DVC/LakeFS) and experiment tracking (e.g., MLflow).

Build RAG foundations: document ingestion, chunking, embeddings, retrieval indexing, and quality evaluation (precision@k, faithfulness, latency, and cost).

Collaborate with DS to ship models to serving (e.g., SageMaker/EKS/ECS), automate feature backfills, and capture inference data for continuous improvement.

Reliability, Security & Compliance (15%)

Define SLOs and instrument observability across data and model services (freshness, drift/skew, lineage, cost, and performance).

Embed security & privacy by design (PII minimization/redaction, secrets management, access controls), aligning with College Board standards and FERPA.

Build CI/CD for data and models with automated testing, quality gates, and safe rollouts (shadow/canary).

Documentation & Enablement (15%)

Maintain docs‑as‑code for pipelines, contracts, and runbooks; create internal guides and tech talks.

Mentor peers through design reviews, pair/mob sessions, and post‑incident learning.

About You

You have:

4+ years in data engineering (or 3+ with substantial ML productionization), with strong Python and distributed compute (Spark/Glue/Dask) skills.

Proven experience shipping ML data systems (training/eval datasets, feature or embedding pipelines, artifact/version management, experiment tracking).

MLOps/LLMOps: orchestration (Airflow/Step Functions), containerization (Docker), and deployment (SageMaker/EKS/ECS); CI/CD for data & models.

Expert SQL and data modeling for lakehouse/warehouse (Redshift/Athena/Iceberg), with performance tuning for large datasets.

Data quality & contracts (Great Expectations/Deequ), lineage/metadata (OpenLineage/DataHub/Amundsen), and drift/skew monitoring.

Cloud experience, preferably with AWS services such as S3, Glue, Lambda, Athena, Bedrock, OpenSearch, API Gateway, DynamoDB, SageMaker, Step Functions, Redshift, and Kinesis, plus BI tools like Tableau, QuickSight, or Looker for real-time analytics and dashboards.

Security and privacy mindset; ability to design compliant pipelines handling sensitive student data.

An ability to judiciously evaluate the feasibility, fairness, and effectiveness of AI solutions and to articulate considerations and concerns around implementing models in the context of specific business applications.

Excellent communication, collaboration, and documentation habits.

Preferred

RAG & vector search experience (OpenSearch KNN/pgvector/FAISS) and prompt/eval frameworks.

Real‑time feature engineering (Kinesis/Kafka) and low‑latency stores for online inference.

Testing strategies for ML systems (unit/contract tests, data fuzzing, offline/online parity checks).

Experience in higher‑ed/assessments data domains.

All roles at College Board require:

A passion for expanding educational and career opportunities and mission-driven work

Authorization to work in the United States for any employer

Curiosity and enthusiasm for emerging technologies, with a willingness to experiment with and adopt new AI-driven solutions and a comfort learning and applying new digital tools independently and proactively.

Clear and concise communication skills, written and verbal

A learner's mindset and a commitment to growth: welcoming diverse perspectives, giving and receiving timely, respectful feedback, and continuously improving through iterative learning and user input.

A drive for impact and excellence: solving complex problems, making data-informed decisions, prioritizing what matters most, and continuously improving through learning, user input, and external benchmarking.

A collaborative and empathetic approach: working across differences, fostering trust, and contributing to a culture of shared success.

About Our Process 

Application review will begin immediately and will continue until the position is filled. This role is expected to accept applications for a minimum of 5 business days.

While the hiring process may vary, it generally includes: resume and application submission, a recruiter phone/video screen, a hiring manager interview, a performance exercise such as live coding, a panel interview, a conversation with leadership, and reference checks.

What We Offer

At College Board, we offer more than just a paycheck—we provide a meaningful career, a supportive team, and a comprehensive package designed to help you thrive. We’re a self-sustaining nonprofit that believes in fair and competitive compensation, grounded in your qualifications, experience, impact, and the market.

A Thoughtful Approach to Compensation

The hiring range for this role is $137K–$148K.

Your exact salary will depend on your location, experience, and how your background compares to others in similar roles at College Board.

We aim to make our best offer upfront—rooted in fairness, transparency, and market data.

We adjust salaries by location to ensure fairness, no matter where you live.

You’ll have open, transparent conversations about compensation, benefits, and what it’s like to work at College Board throughout your hiring process. Check out our careers page for more.

College Board

Our mission is to clear a path for all students to own their future, with a focus on those too often overlooked and underrepresented.

Funding: Nonprofit

Departments: Data

Seniority level requirement: Mid-Level

INDUSTRY