Bioinformatics @ NYGC

Our team of bioinformaticians develops, maintains, and improves our analysis pipelines by leveraging the large amounts of sequencing data we produce. We estimate the sources of error and variability in the data and define methods to correct them, both computationally and in the lab. We also continually evaluate and benchmark available tools, refine best practices for analyzing and combining results, and develop novel tools and methods.

We also support our CLEP lab by providing expertise in the clinical interpretation of constitutional and cancer genomics.


Our diverse team of bioinformatics scientists has expertise in:

  • Statistical and population genetics
  • Cancer genomics
  • Expression analysis
  • Epigenomics and functional genomics
  • de novo genome assembly
  • Metagenomics
  • Clinical interpretation


A typical project is initiated with one of the sequencing project managers. Our bioinformatics scientists are consulted to further refine the experimental design, analytic plan, and project deliverables.

The bioinformatics team performs standard and project-specific quality control and analysis of sequencing data (e.g., differential expression and functional enrichment for RNA-Seq, variant annotation and interpretation for genome and exome sequencing, and somatic variant detection, covering both SNVs and structural variants, for cancer). Results are delivered via our web interface or APIs and are stored and accessible for a period of time as part of NYGC’s Integrated Genomics.

Clinical Interpretation

As exome and genome sequencing data are processed, and genomic variants between the sample and a reference are defined, annotated, and compared to existing databases, our bioinformatics scientists contribute to the last step of the analysis: clinical interpretation.

This usually requires ranking and filtering of putative candidates, manual curation, and functional validation (when possible) of our findings. NYGC’s analysis alleviates the need for the investigator to perform the standard computationally intensive analysis steps, thus freeing up time to focus on the biology.

Toby Bloom

Senior Director, Strategic Genomic Analytics

Michael Zody

Senior Director, Computational Biology

Dayna Oschwald

Senior Director, Informatics Program Management

Nicolas Robine

Assistant Director, Computational Biology

Avinash Abhyankar

Manager, Clinical Informatics

Uday Evani

Senior Software Engineer

Kazimierz Wrzeszczynski

Bioinformatics Scientist

Giuseppe Narzisi

Senior Bioinformatics Scientist

Marta Byrska-Bishop

Bioinformatics Scientist

Will Liao

Senior Bioinformatics Scientist

Phaedra Agius

Senior Bioinformatics Scientist

Andre Corvelo

Senior Bioinformatics Scientist

Kanika Arora

Bioinformatics Scientist

Minita Shah

Bioinformatics Scientist

Caitlin McHugh

Senior Bioinformatics Analyst, Statistical Genetics

Wayne Clarke

Senior Bioinformatics Analyst

Dillon Maloney

Bioinformatics Analyst

Rajeeva Musunuri

Bioinformatics Programmer

Molly Johnson

Bioinformatics Analyst

Heather Geiger

Senior Bioinformatics Analyst

Jennifer Shelton

Bioinformatics Programmer

Alice Fang


Amrita Kar

Bioinformatics Analyst, Metagenomics

Sadia Rahman

Biocurator, Molecular Diagnostics

Whole Genome Sequencing-Based Discovery of Structural Variants in Glioblastoma.

Next-generation DNA sequencing (NGS) technologies are currently being applied in both research and clinical settings for the understanding and management of disease. The goal is to use high-throughput sequencing to identify specific variants that drive tumorigenesis within each individual's tumor...

Authors:  Kazimierz Wrzeszczynski, Minita Shah, Sadia Rahman

Machine learning integration of rheumatoid arthritis synovial histology and RNAseq data identifies three disease subtypes.

We sought to refine histologic scoring of rheumatoid arthritis synovial tissue by training with gene expression data and machine learning. METHODS: Twenty histologic features were assessed on 129 synovial tissue samples. Consensus clustering was performed on gene expression data from a subset...

Authors:  Nicolas Robine, Phaedra Agius, Heather Geiger

Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis

Somatic mosaicism in the human brain may alter function of individual neurons. We analyzed genomes of single cells from the forebrains of three human fetuses (15 to 21 weeks post-conception) using clonal cell populations. We detected 200-400 single nucleotide variations...


Disease variants in genomes of 44 centenarians

To identify previously reported disease mutations that are compatible with extraordinary longevity, we screened the coding regions of the genomes of 44 Ashkenazi Jewish centenarians. Individual genome sequences were generated with 30× coverage on the Illumina HiSeq 2000 and single-nucleotide...

Authors:  Avinash Abhyankar  

Deficiency of UBE2T, the E2 Ubiquitin Ligase Necessary for FANCD2 and FANCI Ubiquitination, Causes FA-T Subtype of Fanconi Anemia

Fanconi anemia (FA) is a rare bone marrow failure and cancer predisposition syndrome resulting from pathogenic mutations in genes encoding proteins participating in the repair of DNA interstrand crosslinks (ICLs). Mutations in 17 genes (FANCA-FANCS) have been identified in FA...

Authors:  Avinash Abhyankar  

A novel mutation in the POLE2 gene causing combined immunodeficiency

Early lymphocyte development requires the orchestrated interplay of pathways to maintain genomic integrity and accurate DNA repair during the proliferative bursts associated with antigen receptor rearrangement (1). Inborn errors in replication control or DNA repair can lead to primary immunodeficiency...

Authors:  Avinash Abhyankar