
Data Engineer

About Kepler AI

Kepler AI is building a transparent and intelligent deep-research platform.

Financial professionals spend 60-70% of their time manually gathering and consolidating data in a $26.5 trillion industry where speed and accuracy directly impact outcomes. The research landscape has fragmented into dozens of specialized systems — analysts toggle between platforms for financials, transcripts, market data, and macro indicators, reviewing hundreds of documents across disconnected sources for a single investment thesis.
Generic AI tools promise efficiency but fail the trust test. They hallucinate data, confabulate reports, and provide insights without showing their work, forcing analysts back into manual verification. In an industry where being wrong costs millions, opacity isn't acceptable.

Kepler solves this by automating research while maintaining the accuracy and traceability financial decisions demand. The result: faster decisions, deeper analysis, and a competitive advantage where synthesizing information more thoroughly than competitors translates directly to performance.

Kepler AI was founded by two Palantir veterans with 20 years of combined experience building core parts of Palantir's Gotham and Foundry platforms. Between them, our founders created Palantir Quiver, the analytics engine behind $100M+ enterprise deals with BP and Airbus; architected core compute and data systems; led major Department of Defense projects; and served as Head of Business Engineering at Citadel.


We're backed by founders of OpenAI, Facebook AI, MotherDuck, DBT, and Outerbounds.

The Role

As a Data Engineer at Kepler AI, you'll be the architect of the data infrastructure that powers our AI-native research platform. You'll own the pipelines that ingest, transform, and deliver critical financial data, from SEC filings to proprietary vendor feeds, ensuring our platform has the reliable, high-quality data foundation that sophisticated financial research demands.

This role embodies our belief that exceptional AI requires exceptional data. Your pipelines will feed the research workflows of portfolio managers at firms managing billions in assets. Your data quality decisions directly impact million-dollar investment outcomes.

Within your first 90 days, you will:

  • Own and optimize our SEC data ingestion pipelines end-to-end

  • Build and maintain integrations with key data vendors

  • Develop deep expertise in financial data formats, taxonomies, and quality standards

  • Ship improvements that measurably increase data freshness and reliability

This is the right role if you want to build the data backbone of the future of financial research, with guidance from engineers who've scaled enterprise data platforms from zero to global adoption.

What You'll Do

  • Own critical data pipelines: Design, build, and maintain the pipelines that ingest SEC filings (EDGAR), vendor data feeds, and alternative data sources into our platform (see the illustrative sketch after this list).

  • Ensure data quality at scale: Implement validation, monitoring, and alerting systems that guarantee the accuracy and freshness our clients' research depends on.

  • Architect for reliability: Build fault-tolerant, self-healing pipelines that handle the unpredictable nature of external data sources and vendor APIs.

  • Optimize performance: Solve complex challenges around data freshness, processing latency, and storage efficiency for large-scale financial datasets.

  • Drive data infrastructure innovation: Identify opportunities to expand our data coverage, improve pipeline efficiency, and enhance data accessibility for our AI platform.

  • Collaborate across teams: Work closely with product and AI engineers to ensure our data infrastructure meets the evolving needs of the platform and our clients.
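
To make the pipeline work above concrete, here is a minimal, illustrative sketch of the fetch-then-validate pattern in Apache Airflow (part of our stack). The DAG id, task names, index URL, and validation rules are hypothetical placeholders, not our production pipeline.

```python
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical source: the public EDGAR quarterly company index.
EDGAR_INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/2024/QTR1/company.idx"


def fetch_filings_index(**context):
    """Download the quarterly company index (EDGAR expects a descriptive User-Agent)."""
    resp = requests.get(
        EDGAR_INDEX_URL,
        headers={"User-Agent": "example-pipeline contact@example.com"},
        timeout=30,
    )
    resp.raise_for_status()
    # Pass a bounded slice downstream via XCom; a real pipeline would land the
    # full file in object storage instead.
    context["ti"].xcom_push(key="raw_index", value=resp.text[:100_000])


def validate_index(**context):
    """Fail fast if the index looks empty or malformed, so bad data never lands."""
    raw = context["ti"].xcom_pull(key="raw_index", task_ids="fetch_filings_index")
    lines = [line for line in raw.splitlines() if line.strip()]
    if len(lines) < 10 or "CIK" not in raw:
        raise ValueError("EDGAR index failed basic validation; aborting downstream load")


with DAG(
    dag_id="edgar_company_index_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    fetch = PythonOperator(task_id="fetch_filings_index", python_callable=fetch_filings_index)
    validate = PythonOperator(task_id="validate_index", python_callable=validate_index)
    fetch >> validate
```

A production version would land the raw file in durable storage, alert on failures, and track freshness, but the basic shape of ingest, then validate, then load stays the same.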

What We're Looking For

Must-haves

  • 3–5 years of data engineering experience with a track record of building and maintaining production data pipelines.

  • ETL/ELT expertise: Deep experience designing and operating data ingestion, transformation, and orchestration systems.

  • Strong Python skills with experience in data processing frameworks and pipeline orchestration tools (Airflow, Temporal, or similar).

  • SQL proficiency: Advanced SQL skills and experience with analytical databases.

  • Data quality mindset: Experience implementing data validation, monitoring, and observability for critical pipelines.

  • API integration experience: Comfort working with external APIs, handling rate limits, authentication, and unreliable endpoints (a short sketch of this pattern follows this list).
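
As a purely illustrative example of the defensive API integration described above, the sketch below retries a vendor request with exponential backoff and honors HTTP 429 Retry-After headers. The URL, token, and function name are placeholders rather than any real vendor integration.

```python
import time

import requests


def fetch_with_retries(url: str, token: str, max_attempts: int = 5) -> dict:
    """GET a JSON payload, backing off on rate limits and transient server errors."""
    backoff = 1.0
    for _ in range(max_attempts):
        resp = requests.get(
            url,
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        if resp.status_code == 429:
            # Respect the vendor's Retry-After header when present.
            time.sleep(float(resp.headers.get("Retry-After", backoff)))
        elif resp.status_code >= 500:
            # Transient server error: wait, then retry with exponential backoff.
            time.sleep(backoff)
        else:
            resp.raise_for_status()  # surface 4xx client errors immediately
            return resp.json()
        backoff = min(backoff * 2, 60.0)
    raise RuntimeError(f"Gave up on {url} after {max_attempts} attempts")
```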

Nice-to-haves

  • Financial data experience: Familiarity with SEC EDGAR, XBRL, or financial data vendors (Bloomberg, FactSet, S&P, etc.).

  • Streaming experience: Background with real-time data processing (Kafka, Flink, or similar).

  • Rust or Node.js experience.

  • Startup experience where you owned data infrastructure end-to-end.

  • Cloud infrastructure experience: Hands-on with AWS data services, Kubernetes, or infrastructure-as-code.

Don't check every box? Apply anyway. We prioritize speed of learning, problem-solving skills, attention to detail, and drive to build world-class data infrastructure.

Mentorship & Growth

You'll be directly mentored by engineers who built Palantir's core data systems. Expect:

  • Weekly 1:1s with senior engineers who've architected enterprise-scale data platforms

  • Deep architectural reviews and guidance on pipeline design

  • Clear growth path toward technical leadership and data platform ownership

  • Hands-on learning by building production systems that power real financial research

At Kepler AI, mentorship accelerates strong data engineers into exceptional technical leaders.

Our Technical Stack

  • Backend: Python, Node.js, Rust, PostgreSQL, Redis

  • Data Infrastructure: Apache Airflow, Kafka, Temporal, dbt

  • AI/ML: OpenAI/Anthropic/OpenRouter, Vector Databases

  • Infrastructure: AWS, Docker, Kubernetes

  • Monitoring: Datadog

  • Tools: Git, GitHub Actions, Pulumi

Benefits

  • Comprehensive medical, dental, and vision insurance for employees and dependents, plus a 401(k)

  • Automatic coverage for basic life, AD&D, and disability insurance

  • Daily lunch in office

  • Development environment budget (latest MacBook Pro, multiple monitors, ergonomic setup, and any development tools you need)

  • Unlimited PTO policy

  • "Build anything" budget - dedicated funding for whatever tools, libraries, datasets, or infrastructure you need to solve technical challenges, no questions asked

  • Learning budget - attend any conference, course, or program that makes you better at what we're building

Our Operating Principles

  • Forward-Deployed with Product DNA: We own customer outcomes, while building a product company. We don't win if our customers don't win. That means embedding, iterating, and deploying where our customers are.

  • Extreme Ownership: We have a big vision and everyone owns it. If you notice a problem, you own it - diagnose, coordinate, and close the loop. Authority comes from initiative, not job titles, and once you step up, you're accountable for the outcome.

  • Production-First Engineering: We design for customers' most critical workloads from day one. The platform runs on durable execution paths, blue/green deploys, automated rollbacks, and a continuous-delivery pipeline with end-to-end observability, so every change lands safely and stays resilient under real-world load.

  • Trust as the Default: We operate on the simple premise that people do their best work when confidence is mutual and earned in the open. That means we show our work, keep our promises, and flag risks before they bite. Automated tests, uptime dashboards, and clear communication back our competence; predictable delivery proves our consistency; candid retros and honest trade-offs reveal our character. Put together, trust isn't an aspiration, it's the baseline everyone can count on.

  • Keep Raising the Bar: We block time for training, code-health sprints, and deep-dive tech talks, because a sharper team and a cleaner stack pay compounding dividends. Continuous learning isn't a perk, it's part of the job.

Kepler AI is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind. We are committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment.

Average salary estimate

$165,000 / year (est.)

Range: $140,000 (min) – $190,000 (max)


Employment type: Full-time, onsite

Date posted: January 16, 2026