Bari Ballew, PhD

Bioinformatician

About Me

I specialize in building up production bioinformatics teams and managing all aspects of pipeline development, testing, deployment (cloud or on-prem), data lifecycle management and strategy, and building and integrating systems for transparency, provenance, and coordination with other teams. I consider reproducibility in science to be a moral imperative, I view a healthy and supportive team culture as essential, and I am committed to assiduous advocacy for my reports.

Experience

Genomics and Data Science, 54gene, Inc.

Senior Bioinformatics Scientist

I lead our production bioinformatics efforts at 54gene, including deployment and administration of an on-demand virtual HPC, data lifecycle management, testing and development of production bioinformatics workflows, and benchmarking both for computational performance and for biological accuracy and precision. I coordinate data cleaning and reporting for patient consent forms, questionnaire results, and genomic data, and work closely with other stakeholders within the company, from our lab group to our epidemiologists to senior executives. I set team policy and best practices, write SOPs, and use ticket tracking for transparency and provenance.

Cancer Genomics Research Laboratory, NCI/Leidos Biomedical Research, Inc.

Manager, Bioinformatics Development and Analysis

I managed a team of six bioinformaticians at the Cancer Genomics Research Laboratory in support of the ~60 investigators in the Division of Cancer Epidemiology and Genetics, NCI, NIH. Our team conducted all pipeline development, including short- and long-read WES/WGS/targeted DNA-sequencing analysis, germline and somatic variant detection, structural variation analysis, microbiome (16S/ITS and metagenomics), GWAS (including association testing and meta-analysis), CNV analysis, and detection of germline mosaicism. We also supported ad-hoc downstream analysis needs. We emphasized personal accountability, clear and frequent communication, and reproducibility in our work.

Personal Genome Diagnostics

Scientist, Genome Sciences

I analyzed next-generation sequencing data to identify medically actionable somatic mutations in tumors, and generated reports for physicians, academic institutions, and pharmaceutical clients.

National Institutes of Health

Postdoctoral Fellow

I explored the molecular basis of cancer susceptibility through whole exome sequencing and SNP array genotyping of individuals with genomic instability syndromes, in both family- and population-based studies.

Education

University of California San Diego

Sept 2005 - Dec 2011

Doctor of Philosophy in Biological Sciences

I completed my thesis work in Dr. Vicki Lundblad's laboratory, where I explored mechanisms by which telomeres are protected from being misinterpreted by the cell as DNA breaks.

Johns Hopkins University

Sept 2001 - June 2005

Bachelor of Science in Molecular and Cellular Biology

Projects

process.phenotypes

R library that facilitates highly configurable, flexible, reproducible cleaning of arbitrary input phenotype data (e.g. health questionnaires, lab test results), including application of consent and age thresholds and generation of an output report describing every input variable.

View Project
View Documentation

WGS Calling Pipeline

Reproducible, scalable, and robust Snakemake pipeline for calling germline variants from WGS data. Handles dependency management, exposes critical parameters in a user-configurable YAML file, and can be deployed to any infrastructure that supports Snakemake. Multiple run modes permit different use cases, e.g. a quick turnaround of read quality metrics to the lab, a more thorough per-flowcell calling and subject-level QC, and joint calling from gVCFs across multiple flowcells.

View Project
View Documentation

Python pandas tutorial

Introduction to exploratory data analysis with the Python package pandas, presented in an interactive Jupyter notebook accessible via either Binder or Google Colab.

Feedback from this workshop: "Your workshop ... received the highest scores of all our events to date! You are a Cancer Data Science Star Instructor and we cannot thank you enough for your thoughtful, tailored, engaging presentation!"

View Project

Snakemake tutorial

Introduction to building genomics pipelines with Snakemake. Slides available as a PDF; interactive Jupyter notebook available via Binder.

View Project

MoCCA-SV: A Flexible Ensemble Framework for Structural Variant Analysis

A modular Snakemake-based workflow to coordinate germline, de novo, and somatic SV calling with multiple callers. Caller output is harmonized and annotated for genomic context (e.g. segmental duplications, proximity to telomere/centromere sequence, genes and transcripts) and compared to SVs in public databases including DGV, ClinVar, ClinGen, and 1000 Genomes.

View Project

QIIME2 Microbiome Analysis Pipeline

The production pipeline used at CGR (Cancer Genomics Research Lab, NCI) for analysis of 16S microbiome sequencing data.

View Project
