Bioinformatics @ NYGC

Our team of bioinformaticians aims to develop, maintain and improve our analysis pipelines by leveraging the large amounts of sequencing data we produce. We estimate the sources of error and variability in the data and define methods to correct them, both computationally and in the lab. We also continually evaluate and benchmark available tools, refine best practices for analyzing and combining results, and develop novel tools and methods.

We also support our CLEP lab by providing expertise in the clinical interpretation of constitutional and cancer genomics.


Our diverse team of bioinformatics scientists has expertise in:

  • Statistical and population genetics
  • Cancer genomics
  • Expression analysis
  • Epigenomics and functional genomics
  • de novo genome assembly
  • Metagenomics
  • Clinical interpretation


A typical project is initiated with one of the sequencing project managers. Our bioinformatics scientists are consulted to further refine the experimental design, analytic plan, and project deliverables.

The bioinformatics team performs standard and project-specific quality control and analysis of sequencing data (e.g., differential expression and functional enrichment for RNA-Seq; variant annotation and interpretation for genome and exome sequencing; and, for cancer, somatic variant analysis covering both SNVs and structural variants). Results are delivered via our web interface or APIs and remain stored and accessible for a period of time as part of NYGC’s Integrated Genomics.

Clinical Interpretation

As exome and genome sequencing data are processed, and genomic variation between the sample and a reference is defined, annotated, and compared to existing databases, our bioinformatics scientists contribute to the last step of the analysis: clinical interpretation.

This usually requires ranking and filtering of putative candidates, manual curation, and functional validation (when possible) of our findings. NYGC’s analysis alleviates the need for the investigator to perform the standard computationally intensive analysis steps, thus freeing up time to focus on the biology.

Toby Bloom

Senior Director, Strategic Genomic Analytics

Michael Zody

Senior Director, Computational Biology

Nicolas Robine

Assistant Director, Computational Biology

Avinash Abhyankar

Manager, Clinical Informatics

Uday Evani

Senior Software Engineer

Kazimierz Wrzeszczynski

Assistant Director, Clinical Oncology Informatics

Giuseppe Narzisi

Lead Bioinformatics Scientist

Marta Byrska-Bishop

Bioinformatics Scientist

Will Liao

Senior Bioinformatics Scientist

Phaedra Agius

Senior Bioinformatics Scientist

Andre Corvelo

Senior Bioinformatics Scientist

Kanika Arora

Bioinformatics Scientist

Minita Shah

Bioinformatics Scientist

Wayne Clarke

Bioinformatics Scientist

Dillon Maloney

Bioinformatics Analyst

Rajeeva Musunuri

Bioinformatics Programmer

Molly Johnson

Bioinformatics Analyst

Heather Geiger

Senior Bioinformatics Analyst

Jennifer Shelton

Bioinformatics Programmer

Alice Fang


YES1 amplification is a mechanism of acquired resistance to EGFR inhibitors identified by transposon mutagenesis and clinical genomics.

In ∼30% of patients with EGFR-mutant lung adenocarcinomas whose disease progresses on EGFR inhibitors, the basis for acquired resistance remains unclear. We have integrated transposon mutagenesis screening in an EGFR-mutant cell line and clinical genomic sequencing in cases of acquired...

Authors:  Nicolas Robine   Giuseppe Narzisi  

Genomic and Geographic Context for the Evolution of High-Risk Carbapenem-Resistant Enterobacter cloacae Complex Clones ST171 and ST78.

Recent reports have established the escalating threat of carbapenem-resistant Enterobacter cloacae complex (CREC). Here, we demonstrate that CREC has evolved as a highly antibiotic-resistant rather than highly virulent nosocomial pathogen. Applying genomics and Bayesian phylogenetic analyses to a 7-year collection of CREC...


taxMaps: Comprehensive and highly accurate taxonomic classification of short-read data in reasonable time.

High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive and fully scalable...

Authors:  Michael Zody   Nicolas Robine   Andre Corvelo   Wayne Clarke  

Genome-wide somatic variant calling using localized colored de Bruijn graphs

Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored...

Authors:  Michael Zody   Nicolas Robine   Giuseppe Narzisi   Andre Corvelo   Kanika Arora   Minita Shah   Rajeeva Musunuri  

Whole Genome Sequencing-Based Discovery of Structural Variants in Glioblastoma.

Next-generation DNA sequencing (NGS) technologies are currently being applied in both research and clinical settings for the understanding and management of disease. The goal is to use high-throughput sequencing to identify specific variants that drive tumorigenesis within each individual's tumor...

Authors:  Kazimierz Wrzeszczynski   Minita Shah  

Identification of Three Rheumatoid Arthritis Disease Subtypes by Machine Learning Integration of Synovial Histologic Features and RNA Sequencing Data

We sought to refine histologic scoring of rheumatoid arthritis synovial tissue by training with gene expression data and machine learning. METHODS: Twenty histologic features were assessed on 129 synovial tissue samples. Consensus clustering was performed on gene expression data from a subset...

Authors:  Nicolas Robine   Phaedra Agius   Heather Geiger