A key goal of whole-genome sequencing (WGS) for human genetics studies is to interrogate all forms of variation, including single-nucleotide variants (SNVs), small insertion/deletion (indel) variants and structural variants (SVs). However, tools and resources for the study of SVs have lagged behind those for smaller variants. Here, we used a scalable pipeline (ref. 22) to map and characterize SVs in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest WGS-based SV resource to date. On average, individuals carry 2.9 rare SVs that alter coding regions, affecting the dosage or structure of 4.2 genes and accounting for 4.0–11.2% of rare high-impact coding alleles. Based on a computational model, we estimate that SVs account for 17.2% of rare alleles genome-wide whose predicted deleterious effects are equivalent to those of loss-of-function coding alleles; approximately 90% of such SVs are non-coding deletions (mean 19.1 per genome). We report 158,991 ultra-rare SVs and show that around 2% of individuals carry ultra-rare megabase-scale SVs, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and non-coding elements, revealing trends related to element class and conservation. This work will help guide SV analysis and interpretation in the era of WGS.
PMID: 32460305 DOI: 10.1038/s41586-020-2371-0
Nature. 2020 May 27. doi: 10.1038/s41586-020-2371-0