The Brain Tumor Institute (BTI) Bioinformatics Core at Children’s National Hospital is seeking a highly skilled Senior Bioinformatics Scientist/Engineer to join our team. This position will play a critical role in advancing the research of multiple PIs focused on uncovering oncogenic mechanisms in pediatric brain tumors and identifying novel therapeutic targets. The Senior Bioinformatics Scientist will engage in basic and translational research projects and contribute to tool development, such as interactive applications for visualizing complex genomic data (a brief sketch follows this introduction).
The role involves close collaboration with researchers and clinicians both within Children’s National and with external partners. The successful candidate will report to the Director of the BTI Bioinformatics Core and will lead workflow creation and implementation using CWL and/or NextFlow, benchmark new core pipelines, contribute bioinformatics analyses to focused projects based on PI needs, participate in collaborative activities in the BTI such as code review and/or workshop training, and contribute to grant applications and scientific manuscripts. In addition, this candidate will support core engineering needs such as database/API/UI development and automation.
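For a rough sense of the visualization tooling described above, the sketch below is a minimal Python/Dash app over a toy differential-expression table; the column names, genes, and values are hypothetical placeholders, and the core's actual applications may be built in RShiny or other frameworks.

    # Minimal interactive-visualization sketch (toy data; hypothetical column names).
    import pandas as pd
    import plotly.express as px
    from dash import Dash, dcc, html

    # Hypothetical differential-expression results table.
    df = pd.DataFrame({
        "gene": ["TP53", "MYC", "EGFR"],
        "log2_fold_change": [1.8, -2.3, 0.4],
        "neg_log10_p": [6.2, 4.1, 0.9],
    })

    fig = px.scatter(df, x="log2_fold_change", y="neg_log10_p",
                     hover_name="gene", title="Volcano plot (toy data)")

    app = Dash(__name__)
    app.layout = html.Div([html.H3("Expression browser (sketch)"), dcc.Graph(figure=fig)])

    if __name__ == "__main__":
        app.run(debug=True)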
Key Responsibilities:
● Collaborate with bioinformatics scientists and PIs to benchmark and optimize new production-scale analysis pipelines and workflows that generate outputs with high quality and data integrity.
● Support project-specific engineering needs, such as database/API/UI development.
● Collaborate with IT to ensure AWS IAM and bucket security and optimize resource use.
● Create and maintain clear documentation for data engineering workflows, including codebases, data pipelines, validation, testing, and CI/CD processes.
● Perform high-quality bioinformatics analyses on pediatric oncology datasets, including genomic, transcriptomic, and epigenomic data.
● Design and implement downstream analytical workflows for high-throughput data using GitHub, Docker, and AWS infrastructure, focusing on reproducibility, code efficiency, and scalability (see the sketch after this list).
● Utilize cloud-computing environments (e.g., AWS EC2) and/or high-performance computing (HPC) to support large-scale or memory-intensive analyses.
● Actively and positively participate in sprints and code reviews, ensuring high standards for reproducibility and documentation.
● Engage with multidisciplinary teams, providing bioinformatics expertise to support collaborative research initiatives.
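As one hedged sketch of the containerized, AWS-backed workflow style described in these responsibilities, the Python snippet below runs a single analysis step in a version-pinned Docker image and uploads the result to S3; the image tag, bucket name, and file paths are placeholders rather than the core's actual configuration.

    import subprocess
    import boto3

    IMAGE = "quay.io/biocontainers/samtools:1.19"   # placeholder pinned image tag
    BUCKET = "example-bti-results"                  # placeholder S3 bucket name

    def run_flagstat(bam_path: str, report_path: str) -> None:
        """Run samtools flagstat inside a pinned container, then upload the report."""
        with open(report_path, "w") as report:
            subprocess.run(
                ["docker", "run", "--rm", "-v", "/data:/data", IMAGE,
                 "samtools", "flagstat", bam_path],
                stdout=report, check=True,
            )
        boto3.client("s3").upload_file(report_path, BUCKET, report_path.lstrip("/"))

    if __name__ == "__main__":
        run_flagstat("/data/sample.bam", "/data/sample.flagstat.txt")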
Application Process:
This position will be remote. Candidates should be prepared to share their GitHub handle and present a recent project as part of the interview process.
-------------------------------------------------------------------------------------------------------------------------------------
Build scalable, production-ready machine learning and statistical models to reduce healthcare data latency through automation. This role will focus on advanced statistical and machine learning solutions: collecting, cleansing, and interpreting large volumes of data from varied sources; designing and delivering production-ready models; monitoring and maintaining model health in production; and communicating key findings to stakeholders.
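As a purely illustrative sketch (toy data, not this team's actual stack), the Python snippet below fits a simple production-style pipeline and reports a held-out metric of the kind that would be tracked to monitor model health once deployed.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for cleansed features; real inputs would come from ETL.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ])
    model.fit(X_train, y_train)

    # A held-out metric like this would be logged and monitored in production.
    print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))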
Preferred Skills:
● Ph.D. in Bioinformatics, Computational Biology, or a related field, or equivalent industry experience.
● At least ten years of experience in bioinformatics, including cancer research, with expertise in Bash, R or Python, and RShiny and/or Python GUI applications.
● Proficiency with cloud-based or high-performance computing environments for bioinformatics workflows.
● Strong experience with tools and best practices for reproducibility, including Git and Docker.
● Proven experience with genomic data types such as single nucleotide variants (SNVs), copy number variants, fusions, RNA expression, methylation, proteomics, splicing, and single cell datasets.
● Commitment to open science practices, including sharing and collaborating on code, data, and documentation.
● Extensive experience with current standard parallel computing and data processing workflows (e.g., Snakemake, NextFlow, CWL, WDL).
● Experience diagnosing and troubleshooting pipeline errors and unexpected behaviors, including taking the initiative to debug, search online, contact software authors, and otherwise seek assistance as needed.
● Experience with reproducible pipeline development, including software version control, use and creation of Docker and/or Singularity images, and collaborative code review.
● Demonstrated ability to develop and implement best practices for bioinformatics systems integration, testing, and deployment (required).
● Interest in learning AWS cloud architecture, design, and automation.
● Strong organizational and project management skills, with the ability to work on multiple projects and teams.
● Excellent communication skills, with the ability to work in cross-disciplinary teams.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Minimum Education
Bachelor’s degree in a quantitative/statistical or business field (e.g., Statistics, Mathematics, Engineering, Computer Science). (Required)
Master’s degree. (Preferred)
Minimum Work Experience
6+ years of related experience requiring deep functional knowledge, or equivalent experience acquired through accomplishments reflecting the knowledge, duties, scope, and skill of this position. (Required)
Required Skills/Knowledge
Experience working in a heavily regulated industry. Healthcare is a plus.
Advanced coursework in machine learning and programming.
Experience working with globally distributed, multicultural teams.
Experience with agile leadership.
Experience with building, delivering, and maintaining production-ready machine learning models.
Knowledge of statistical data analysis and machine learning, such as linear models, time-series forecasting, neural networks, random forests, and NLP models.
Expertise in Python and in using machine learning and statistical packages for modeling.
Database skills, including SQL, NoSQL, and coding for ETL.
In-depth understanding of machine learning algorithms such as random forests, neural networks, graph models, and NLP.
Familiarity with Spark, Azure, Databricks, MLflow, and AutoML (see the sketch after this list).
Experience and familiarity with backlog management tools and resources, ideally with JIRA and Confluence.
Seeks to acquire knowledge in area of specialty.
Ability to identify basic problems and procedural irregularities, collect data, establish facts, and draw valid conclusions.
Ability to work independently.
Demonstrated analytical skills.
Demonstrated project management skills.
Demonstrates a high level of accuracy, even under pressure.
Demonstrates excellent judgment and decision-making skills.
Ability to communicate and make recommendations to leadership.
Ability to drive multiple projects to successful completion.
Possesses technical aptitude.
Excellent verbal and written communication skills; able to communicate complex findings in a clear and understandable manner.
Excellent facilitation skills; able to host sessions, elicit ideas from others, understand their issues, and encourage group participation.
Attention to detail.
Collaborate effectively with cross-functional teams.
Adapt to changing priorities and thrive in a dynamic environment.
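As a brief, hedged example of the experiment-tracking tooling named in the skills list above, the Python snippet below logs parameters and a metric to MLflow; the experiment name and values are placeholders, not results from any actual project.

    import mlflow

    mlflow.set_experiment("demo-model-monitoring")  # placeholder experiment name

    with mlflow.start_run():
        # Log hyperparameters and a held-out metric so runs remain comparable over time.
        mlflow.log_param("model_type", "random_forest")
        mlflow.log_param("n_estimators", 200)
        mlflow.log_metric("test_auc", 0.91)  # toy value, not a real result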
Functional Accountabilities
Organizational Accountabilities (Staff): Organizational Commitment/Identification; Teamwork/Communication; Performance Improvement/Problem-solving; Cost Management/Financial Responsibility; Safety
Pay: Yearly based
Location: Washington, District of Columbia