About Daydream
Daydream is the first chat-based shopping agent built exclusively for fashion. Designed to redefine how people search for and discover fashion, Daydream offers a personalized, conversational experience powered by advanced AI and natural language understanding.
Backed by top-tier investors including Forerunner Ventures, Index Ventures, Google Ventures, and True Ventures, our team is committed to shaping the future of shopping.
About the role
Are you passionate about the intersection of high fashion and cutting-edge artificial intelligence, and about building the data foundations that power truly intelligent systems? As a Data Engineering Lead at Daydream, you will be a foundational member of the team, responsible for designing and building the entire data ecosystem that fuels our AI Personal Stylist. This is a unique opportunity to solve complex technical challenges while directly shaping a product that will revolutionize how people shop online.
What you’ll do:
Design, build, and optimize scalable, parallel data processing pipelines on Google Cloud to handle massive volumes of offline data.
Implement and manage large-scale LLM batch inference jobs, processing millions of data points to enrich our product catalog with sophisticated, AI-generated attributes.
Architect and own the data infrastructure for our Fashion Knowledge Graph, leveraging BigQuery and parallel data processing frameworks.
Develop and maintain robust feature generation pipelines to craft high-quality signals for both the training and inference of our machine learning models.
Orchestrate complex workflows of data processing jobs, implementing robust monitoring, alerting, and data quality validation systems to ensure reliability and trust in our data (see the illustrative sketch after this list).
Collaborate closely with data science and machine learning teams to understand data requirements and deliver production-grade data solutions.
Champion engineering best practices, including writing clean, maintainable Python and SQL, and drive a culture of high-quality data and operational excellence.
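To give a concrete feel for this kind of work, here is a minimal sketch of an orchestrated BigQuery job gated behind a simple data-quality check, written as an Airflow DAG. This is illustrative only: the DAG id, dataset and table names, and SQL are hypothetical placeholders, not Daydream's actual pipelines, and it assumes Airflow 2.4+ with the Google provider installed and a GCP connection configured.

```python
# Illustrative sketch only: all names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryCheckOperator,
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="catalog_enrichment",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",            # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    # Rebuild an enriched catalog table from raw product data.
    enrich = BigQueryInsertJobOperator(
        task_id="enrich_catalog",
        configuration={
            "query": {
                "query": """
                    CREATE OR REPLACE TABLE analytics.catalog_enriched AS
                    SELECT product_id, title, attributes
                    FROM raw.product_catalog
                """,
                "useLegacySql": False,
            }
        },
    )

    # Data-quality gate: fail the run if the output table is empty,
    # so downstream consumers never see a bad or missing table.
    validate = BigQueryCheckOperator(
        task_id="validate_catalog",
        sql="SELECT COUNT(*) > 0 FROM analytics.catalog_enriched",
        use_legacy_sql=False,
    )

    enrich >> validate
```

Running the quality check as a first-class task after the transformation is one common pattern for surfacing data failures in monitoring rather than letting them propagate silently downstream.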
Who you are
You have extensive experience building and deploying data solutions on a major cloud platform (preferably Google Cloud Platform).
You are highly proficient with distributed or high-performance data processing frameworks such as Apache Spark, Flink, or Polars.
You possess exceptional Python coding skills, with a deep understanding of writing efficient, testable, and maintainable code for data applications.
You have expert-level SQL skills and deep experience with modern cloud data warehouses like BigQuery, Snowflake, or Redshift.
You have hands-on experience with workflow orchestration tools like Airflow, Argo, or Kubeflow.
You are a pragmatic and proactive builder who thrives in a fast-paced, autonomous startup environment, capable of driving projects from concept to production.
You are an empathetic and collaborative teammate, skilled at communicating complex technical ideas and passionate about building the reliable infrastructure that empowers your colleagues.
You are a natural leader who enjoys mentoring and developing teammates, shaping work to create growth opportunities while keeping priorities aligned with broader company goals.
What we offer
Competitive salary, equity, and benefits (medical, dental, vision, 401k, etc.)
Flexible vacation and remote working
The opportunity to be part of a groundbreaking, AI-focused company
Collaborative work environment with a team of talented, fun-loving individuals
Opportunity to learn and grow in your career while shaping the future of fashion, shopping, and technology
Commitment to Diversity
Daydream is proud to be an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees, regardless of race, color, religion, gender, sexual orientation, gender identity, age, national origin, or disability status.