Data Engineer - Bioinformatics

Primary Work Address: 19700 Helix Drive, Ashburn, VA, 20147

Current HHMI employees should apply via their Workday account.

Summary:

AI@HHMI: HHMI is investing $500 million over the next 10 years to support AI-driven projects and to embed AI systems throughout every stage of the scientific process in labs across HHMI. The AI initiative will be centered at HHMI’s Janelia Research Campus. Janelia has been at the forefront of AI-driven research in biology for more than 15 years. Its forward-thinking structure, centralized funding, and collaborative culture make it ideally suited to take this bold leap forward. To learn more about the initiative, visit here.  

Please include a cover letter with your application detailing your qualifications and experience as they relate to this position.

About the role:

We're seeking a skilled Data Engineer to drive scientific innovation through robust data infrastructure. In this role, you’ll design, develop, and optimize scalable data pipelines and tools for ingesting, transforming, and integrating large, heterogeneous datasets, with a focus on complex biological sequence data from diverse sources. This includes writing production-quality Python code to parse, validate, and transform sequence data from published research papers, public databases, and experimental outputs. The role requires both technical excellence in data engineering and a working understanding of biological research contexts to ensure data integrity and scientific validity. Your work will directly support computational research initiatives, including machine learning and AI applications. Collaborating closely with multidisciplinary teams of computational and experimental scientists, you’ll help define and implement data engineering best practices that ensure data quality, accessibility, and reproducibility. You’ll also maintain detailed documentation, automate workflows to streamline the path from raw data to scientific insight, and potentially mentor junior engineers.

What we provide:

A competitive compensation package, with comprehensive health and welfare benefits.
A supportive team environment that promotes collaboration and knowledge sharing.
The opportunity to engage with world-class researchers, software engineers and AI/ML experts, contribute to impactful science, and be part of a dynamic community committed to advancing humanity’s understanding of fundamental scientific questions.
Amenities that enhance work-life balance, such as on-site childcare, free gyms, available on-campus housing, social and dining spaces, and a convenient shuttle bus service to Janelia from the Washington, D.C., metro area.

What you’ll do:

Design and implement scalable, robust bioinformatics data pipelines using workflow managers (Snakemake, Nextflow, or similar), with data validation and quality control at every pipeline stage through tests and clear data visualization.
Stay up to date with scientific literature to understand data context and processing requirements.
Document data provenance and transformation steps comprehensively.
Apply statistical tools and programming languages (e.g., Python, R) to analyze large datasets, develop custom functions, and extract actionable insights through effective visualization.
Establish and maintain data standards, formats, workflows, and documentation to ensure data quality, accessibility, and reproducibility across projects.
Collaborate with interdisciplinary teams, potentially mentor junior engineers, and direct or assist in directing the work of others to meet project goals while advising stakeholders on data strategies and best practices.

What you bring:

Bachelor’s degree in Computer Science, Data Science, Statistics, Applied Mathematics, or a related field with 3+ years of experience applying and customizing data mining and data analysis methods and techniques. An equivalent combination of education and relevant experience will be considered.
Experience with data formats such as FASTA, FASTQ, and annotation files.
Experience with data validation and quality control techniques.
Strong technical documentation and communication skills.
Experience in building scalable data solutions, working with big data technologies, and ensuring data quality and accessibility.
Expertise in using data visualization libraries and software (e.g., Matplotlib, R, Jupyter notebooks).
Demonstrated expertise in statistical analysis.
Detail-oriented, creative, and organized team player with strong communication skills and a collaborative mindset.
Able to effectively manage time, prioritize tasks, and clearly convey complex data concepts to technical and non-technical audiences.

Physical Requirements:

Remaining in a normal seated or standing position for extended periods of time; reaching and grasping by extending hand(s) or arm(s); dexterity to manipulate objects with fingers, for example using a keyboard; communication skills using the spoken word; ability to see and hear within normal parameters; ability to move about workspace. The position requires mobility, including the ability to move materials weighing up to several pounds (such as a laptop computer or tablet).

Persons with disabilities may be able to perform the essential duties of this position with reasonable accommodation. Requests for reasonable accommodation will be evaluated on an individual basis.

Please Note:

This job description sets forth the job’s principal duties, responsibilities, and requirements; however, it should not be construed as an exhaustive statement. Unless they begin with the word “may,” the Essential Duties and Responsibilities described above are “essential functions” of the job, as defined by the Americans with Disabilities Act.

Compensation Range

Data Engineer II: $98,039.20 (minimum) - $122,549.00 (midpoint) - $159,313.70 (maximum)

Data Engineer III: $112,629.60 (minimum) - $140,787.00 (midpoint) - $183,023.10 (maximum)

Pay Type: Salary

HHMI’s salary structure is developed based on relevant job market data. HHMI considers a candidate's education, previous experiences, knowledge, skills and abilities, as well as internal consistency when making job offers. Typically, a new hire for this position in this location is compensated between the minimum and the midpoint of the salary range.

Compensation and Benefits

Our employees are compensated from a total rewards perspective in many ways for their contributions to our mission, including competitive pay, exceptional health benefits, retirement plans, time off, and a range of recognition and wellness programs. Visit our Benefits at HHMI site to learn more.

HHMI is an Equal Opportunity Employer

We use E-Verify to confirm the identity and employment eligibility of all new hires.

Howard Hughes Medical Institute

At the Howard Hughes Medical Institute, we believe in the power of individuals to advance science through research and science education, making discoveries that benefit humanity.