Data Engineer (Founding Team)

Data/ETL Engineer (Founding Team)

Location: San Francisco Bay Area

Type: Full-Time

Compensation: Competitive salary + early-stage equity

Backed by 8VC, we're building a world-class team to tackle one of the industry’s most critical infrastructure problems.

About the Role

We’re building a multi-tenant, AI-native platform where enterprise data becomes actionable through semantic enrichment, intelligent agents, and governed interoperability. At the heart of this architecture lies our Data Fabric, an intelligent, governed layer that turns fragmented and siloed data into a connected ontology ready for model training, vector search, and insight-to-action workflows.

We're looking for engineers who enjoy hard data problems at scale: messy unstructured data, schema drift, multi-source joins, security models, and AI-ready semantic enrichment. You’ll build the backend systems, data pipelines, connector frameworks, and graph-based knowledge models that fuel agentic applications.

If you've worked on streaming unstructured pipelines, built connectors into ugly legacy systems, or mapped knowledge graphs that scale — this role will feel like home.

Responsibilities

  • Build highly reliable, scalable data ingestion and transformation pipelines across structured, semi-structured, and unstructured data sources

  • Develop and maintain a connector framework for ingesting from enterprise systems (ERPs, PLMs, CRMs, legacy data stores, email, Excel, docs, etc.)

  • Design and maintain the data fabric layer, including a knowledge graph (Neo4j or PuppyGraph) enriched with ontologies, metadata, and relationships

  • Normalize and vectorize data for downstream AI/LLM workflows — enabling retrieval-augmented generation (RAG), summarization, and alerting

  • Create and manage data contracts, access layers, lineage, and governance mechanisms

  • Build and expose secure APIs for downstream services, agents, and users to query enriched semantic data

  • Collaborate with ML/LLM teams to feed high-quality enterprise data into model training and tuning pipelines

What We’re Looking For

Core Experience:

  • 5+ years building large-scale data infrastructure in production environments

  • Deep experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran) and data pipeline orchestration (Airflow, Dagster, Prefect)

  • Comfortable processing unstructured data formats: PDFs, Excel, emails, logs, CSVs, web APIs

  • Experience working with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)

  • Strong background in knowledge graphs or semantic modeling (e.g. Neo4j, RDF, Gremlin, PuppyGraph)

  • Familiarity with GraphQL, RESTful APIs, and designing developer-friendly data access layers

  • Experience implementing data governance: RBAC, ABAC, data contracts, lineage, data quality checks

Mindset & Culture Fit:

  • You’re a systems thinker: you want to model the real world, not just process it

  • Comfortable navigating ambiguous data models and building from scratch

  • Passionate about enabling AI systems with real-world, messy enterprise data

  • Pragmatic about scalability, observability, and schema evolution

  • Value autonomy, high trust, and meaningful ownership over infrastructure

Bonus Skills

  • Prior work with vector DBs (e.g. Weaviate, Qdrant, Pinecone) and embedding pipelines

  • Experience building or contributing to enterprise connector ecosystems

  • Knowledge of ontology versioning, graph diffing, or semantic schema alignment

  • Familiarity with data fabric patterns (e.g. Palantir Ontology, Linked Data, W3C standards)

  • Familiarity with fine-tuning LLMs or enabling RAG pipelines using enterprise knowledge

  • Experience enforcing data access policy with tools like OPA, Keycloak, or Snowflake row-level security

Why This Role Matters

Agents are only as smart as the data they operate on. This role builds the foundation — the semantic, governed, connected substrate — that makes autonomous decision-making and agent action possible. From factory ERP records to geopolitical news alerts, the data fabric unifies it all.

If you're excited to tame complexity, unify chaos, and power intelligent systems with trusted data — we’d love to hear from you.


Employment type: Full-time, hybrid

Date posted: August 9, 2025