The ideal candidate will have strong distributed systems knowledge and AI/ML experience to design, build, and optimize the scalable data pipelines and infrastructure that power advanced analytics and machine learning solutions. In this role, you will collaborate closely with software engineers, product owners, and business stakeholders to prepare and transform large datasets (including real-time pipelines), support end-to-end development and deployment, and ensure robust, efficient, and secure data flows. You will leverage your expertise in cloud platforms, big data tools, and machine learning frameworks to drive innovation and deliver actionable insights that advance our organization’s AI initiatives and business objectives.
A Little About Us:
Yahoo’s Central Data team manages massive-scale (100+ petabyte) data systems to glean insights into Yahoo products and improve the experience for its 1B+ users. The team provides the foundation for user engagement data collection and processing across all of Yahoo, along with operational excellence, anomaly detection, and governance across the organization. Your work will directly influence product changes, and you will work with a team of talented and motivated engineers to improve the user experience on popular Yahoo sites and apps such as Yahoo Mail and Homepage, Yahoo Sports, Yahoo Finance, Yahoo News, and many other new products.
A Lot About You:
Apply software engineering expertise to build high-performance, scalable data warehouses.
Be excited to learn and take ownership of large-scale projects spanning many tech stacks and environments.
Design, build, and launch efficient and reliable data pipelines to move and transform data at multi-petabyte scale using the latest technologies.
Build real-time analytics and ingestion pipelines capable of processing more than a million events per second and delivering insights at sub-second latency (see the illustrative pipeline sketch after this list).
Interact with product owners and end users to understand and solve new business requirements as they emerge.
Design and audit processes to ensure the delivery of high-quality data through rigorous QA checks.
Bring excellent data modeling skills and an understanding of the nuances of the various dimension and metric types in warehouses.
Design workflows to ingest, load, and present new datasets to users.
Provide active support and participate in the on-call rotation for production pipelines (typically a couple of times each quarter).
Define and manage SLAs for all datasets in allocated areas of ownership.
Work with the production engineering / infrastructure team to drive resolution of production issues.
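To give a flavor of this kind of work, below is a minimal sketch of a streaming ingestion pipeline written with Apache Beam in Python. It is illustrative only: the Pub/Sub topic, BigQuery table, and schema are hypothetical placeholders and do not describe Yahoo's actual systems.

```python
# Illustrative sketch only. Topic, project, table, and field names are
# hypothetical placeholders, not Yahoo's actual infrastructure.
import json

import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            # Read raw engagement events from a (hypothetical) Pub/Sub topic.
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/engagement-events")
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # Group events into 1-second fixed windows for near-real-time aggregates.
            | "Window" >> beam.WindowInto(window.FixedWindows(1))
            | "KeyByProduct" >> beam.Map(lambda e: (e.get("product", "unknown"), 1))
            | "CountPerProduct" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(lambda kv: {"product": kv[0], "event_count": kv[1]})
            # Append per-window counts to a (hypothetical) BigQuery table.
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.product_event_counts",
                schema="product:STRING,event_count:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```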
Required Skills:
BS/MS in Computer Science and/or Mathematics/Statistics
4+ years of experience in relevant software development, with at least 2 years of professional Java and/or Python experience
2+ years of experience in the big data pipeline and analytics space, across multiple technology stacks.
2+ years of experience in custom ETL design, implementation, and maintenance using big data environments (Hadoop, MapReduce, Pig, Hive, AWS EMR, Apache Beam, Google Cloud Dataflow, BigQuery); a minimal batch ETL sketch follows this list.
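For illustration, a small batch ETL job of the kind described above might look like the following PySpark sketch; the input path, columns, and output location are hypothetical placeholders rather than a description of any actual pipeline.

```python
# Illustrative sketch only. Paths, columns, and table layout are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-engagement-etl").getOrCreate()

# Extract: read one day of raw event data from a (hypothetical) HDFS path.
raw = spark.read.parquet("hdfs:///data/raw/events/dt=2024-01-01")

# Transform: drop malformed rows and aggregate engagement per user and product.
daily = (
    raw.filter(F.col("user_id").isNotNull())
       .groupBy("user_id", "product")
       .agg(
           F.count(F.lit(1)).alias("events"),
           F.sum("dwell_time_ms").alias("dwell_time_ms"),
       )
       .withColumn("dt", F.lit("2024-01-01"))
)

# Load: write the aggregates to a partitioned (hypothetical) warehouse location.
daily.write.mode("overwrite").partitionBy("dt").parquet(
    "hdfs:///data/warehouse/daily_user_engagement")
```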
Preferred Experience:
Experience or familiarity with some of the following tools: Kafka, Storm, stream processing frameworks (Spark Streaming, Dataflow), and Elasticsearch
Design, build, and maintain scalable data pipelines and ETL processes to support machine learning and AI initiatives on Google Cloud Platform (GCP).
Implement and optimize data storage solutions using GCP services such as BigQuery, Cloud Storage, and Dataflow.
Ensure data quality, integrity, and security throughout the data lifecycle.
Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver actionable insights.
Monitor, troubleshoot, and maintain the health and performance of cloud-based data infrastructure.
Automate manual processes and repetitive tasks to improve efficiency and reduce errors.
Apply data governance and compliance best practices to protect sensitive information and meet regulatory standards.
Stay current with new GCP features, tools, and best practices to continuously enhance data management capabilities.
Document solutions, processes, and architectural decisions to facilitate knowledge sharing and maintainability.
Experience working with MapReduce or other parallel data processing systems.
Experience with schema design and dimensional data modeling.
Comfortable writing complex SQL queries (an illustrative dimensional query follows this list).
Strong data mindset and a deep appreciation for analyzing data to identify product gaps and enhancements that improve user engagement and revenue growth.
Excellent communication skills, with the ability to tell insightful stories using data and to manage communication with internal teams and stakeholders.
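As a concrete illustration of the dimensional modeling and SQL points above, the following sketch queries a hypothetical star schema in BigQuery from Python; the project, dataset, table, and column names are placeholders, not an actual schema.

```python
# Illustrative sketch only. All project, dataset, table, and column names
# below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# Join a fact table to its date and product dimensions and roll up engagement.
sql = """
SELECT
  d.calendar_date,
  p.product_name,
  COUNT(*) AS events,
  SUM(f.dwell_time_ms) AS total_dwell_time_ms
FROM `example-project.analytics.fact_engagement` AS f
JOIN `example-project.analytics.dim_date` AS d ON f.date_key = d.date_key
JOIN `example-project.analytics.dim_product` AS p ON f.product_key = p.product_key
WHERE d.calendar_date BETWEEN '2024-01-01' AND '2024-01-07'
GROUP BY d.calendar_date, p.product_name
ORDER BY d.calendar_date, events DESC
"""

for row in client.query(sql).result():
    print(row.calendar_date, row.product_name, row.events, row.total_dwell_time_ms)
```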
#LI-FM1
The material job duties and responsibilities of this role include those listed above as well as adhering to Yahoo policies; exercising sound judgment; working effectively, safely and inclusively with others; exhibiting trustworthiness and meeting expectations; and safeguarding business operations and brand integrity.
At Yahoo, we offer flexible hybrid work options that our employees love! While most roles don’t require regular office attendance, you may occasionally be asked to attend in-person events or team sessions. You’ll always get notice to make arrangements. Your recruiter will let you know if a specific job requires regular attendance at a Yahoo office or facility. If you have any questions about how this applies to the role, just ask the recruiter!
Yahoo is proud to be an equal opportunity workplace. All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. Yahoo will consider for employment qualified applicants with criminal histories in a manner consistent with applicable law. Yahoo is dedicated to providing an accessible environment for all candidates during the application process and for employees during their employment. If you need accessibility assistance and/or a reasonable accommodation due to a disability, please submit a request via the Accommodation Request Form (www.yahooinc.com/careers/contact-us.html) or call +1.866.772.3182. Requests and calls received for non-disability related issues, such as following up on an application, will not receive a response.
We believe that a diverse and inclusive workplace strengthens Yahoo and deepens our relationships. When you support everyone to be their best selves, they spark discovery, innovation and creativity. Among other efforts, our 11 employee resource groups (ERGs) enhance a culture of belonging with programs, events and fellowship that help educate, support and create a workplace where all feel welcome.
The compensation for this position ranges from $143,625.00 - $299,375.00/yr and will vary depending on factors such as your location, skills and experience. The compensation package may also include incentive compensation opportunities in the form of discretionary annual bonus or commissions. Our comprehensive benefits include healthcare, a great 401k, backup childcare, education stipends and much (much) more. Currently work for Yahoo? Please apply on our internal career site.