1.2.1 From 0 to 60: Linking Human Genetic and Genomic Variation to Human Disease
In many ways the discovery of the human chromosome number in 19561 formed the cornerstone on which our present understanding of the relationship between individual genomes and their impact on the expression of disease would be built. Karyotype technology led to the elucidation of common chromosomal aneuploidies underlying previously clinically characterized conditions such as Down syndrome, first described in 18662 but not characterized cytogenetically until 1959.3 The first disease gene would be mapped in 1983,4 eventually linking trinucleotide repeat expansions in huntingtin (HTT) to the expression of Huntington disease.5 From that time, both single-gene disorders and genomic disorders6–8 resulting from submicroscopic structural variation of the human genome were increasingly described.
Linkage analysis and genome-wide association studies (GWAS) were used to identify common polymorphisms associated with a disease trait of interest, but they often fell short of directly identifying the disease gene and etiologic rare variant, necessitating additional techniques such as positional cloning for disease gene discovery. The development and implementation of tools that enable a direct genome-wide interrogation for rare structural and single nucleotide variants would revolutionize human disease gene discovery. In this regard, chromosomal microarray (CMA), exome sequencing (ES), and genome sequencing (GS) have truly accelerated gene discovery. Concurrent with technology development has been the elaboration of numerous variant annotation resources, including those that provide minor allele frequency data for populations of varying ethnicities, such as the Exome Aggregation Consortium (ExAC),9 the Genome Aggregation Database (gnomAD), the 1000 Genomes Project,10 the National Heart Lung and Blood Institute Exome Sequencing Project (http://evs.gs.washington.edu/EVS/), and the Atherosclerosis Risk in Communities11 databases, as well as catalogs of human structural variation such as the Database of Genomic Variants (DGV) and the Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources (DECIPHER). Measures of evolutionary conservation [GERP (Genomic Evolutionary Rate Profiling), phyloP] and estimates of protein functional impact [SIFT (Sorting Intolerant From Tolerant),12 PolyPhen2 (Polymorphism Phenotyping v2),13 MutationTaster,14 LRT (Likelihood Ratio Test),15 CADD (Combined Annotation Dependent Depletion),16 REVEL (Rare Exome Variant Ensemble Learner)17] for identified variants have also improved genome-wide analytic methods.
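As a rough illustration of how population frequency data and functional impact estimates can be combined during variant prioritization, the following minimal sketch filters a list of annotated variants. The record layout, the field names, the placeholder gene labels, and every threshold are assumptions made for this example only; they are not drawn from any particular pipeline or from the resources named above, and real analyses tune such cutoffs to the disease model under study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical annotated variant record (field names are illustrative)."""
    gene: str
    max_population_af: Optional[float]  # None means absent from the frequency database
    cadd_phred: Optional[float]         # None means no score available
    revel: Optional[float]              # None means no score available


def is_rare(v: Variant, af_threshold: float = 0.01) -> bool:
    """Treat a variant as rare if it is unobserved or below the frequency cutoff."""
    return v.max_population_af is None or v.max_population_af < af_threshold


def is_predicted_damaging(v: Variant, cadd_min: float = 20.0,
                          revel_min: float = 0.5) -> bool:
    """Flag a variant if either score meets its (assumed) cutoff."""
    cadd_ok = v.cadd_phred is not None and v.cadd_phred >= cadd_min
    revel_ok = v.revel is not None and v.revel >= revel_min
    return cadd_ok or revel_ok


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Keep variants that are both rare and predicted damaging."""
    return [v for v in variants if is_rare(v) and is_predicted_damaging(v)]


# Made-up example records; gene labels are placeholders, not real findings.
variants = [
    Variant("GENE_A", 0.0001, 25.0, None),  # rare, high CADD
    Variant("GENE_B", 0.05, 30.0, 0.9),     # damaging scores but too common
    Variant("GENE_C", None, 5.0, 0.1),      # unobserved but low scores
    Variant("GENE_D", None, None, 0.7),     # unobserved, high REVEL
]
selected = [v.gene for v in prioritize(variants)]
print(selected)
```

Treating a variant that is absent from the frequency database as rare is a design choice in this sketch; absence may also reflect poor coverage of the locus, which a production workflow would need to check separately.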
Bioinformatics approaches to genome-wide analyses have advanced our ability to interrogate large datasets, with rapid detection of copy number variants, de novo mutations, and absence of heterozygosity from exome and genome variant data,18–23 as well as prediction tools such as the probability of loss-of-function (LoF) intolerance9 and the likelihood that a truncating variant will escape from nonsense-mediated decay.24 Increasing use of expression catalogs, such as the Genotype-Tissue Expression database (GTEx), has also been harnessed to prioritize candidate disease genes through comparison of their tissue expression patterns to those of the disease trait of interest.25
The development of genome-wide assays and a tremendous toolkit for genomic variant analyses has led to a time of rapid growth in our understanding of genetic variation and its impact on human health. Below we highlight recent accomplishments in the field of human genetics and genomics.
1.2.2 Mendelian Conditions
Rare disease has been defined as a disease trait impacting fewer than 200,000 individuals in the US population. Traditionally, these conditions are associated with variants that are rare in the population (well below 1%) and convey a large effect on trait manifestation. Such traits are often referred to as "Mendelian conditions," as expression of the disease trait follows expected Mendelian modes of inheritance for a monogenic, or single locus, trait: autosomal dominant, autosomal recessive, X-linked, or mitochondrial. The diagnosis of a Mendelian condition can have immediate clinical impact, providing a molecular diagnosis and recurrence risk information for the affected family, and informing expectant medical management, and potentially therapeutic management, for the individual. This clinical value underscores the need for complete functional and phenotypic annotation of all ~20,000 genes in the human genome.
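The recurrence risk that accompanies a Mendelian diagnosis can be illustrated by enumerating the equally likely allele combinations a couple can transmit at a single autosomal locus. The sketch below is a simplification under stated assumptions: full penetrance, one autosomal locus, no de novo variation or mosaicism, and allele labels ("D" for the disease allele, "n" for the reference allele) and function names that are invented for this example.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str):
    """Enumerate the equally likely offspring genotypes at one autosomal locus.

    Each parent is a two-character string of alleles, e.g. "Dn".
    """
    return [tuple(sorted(pair)) for pair in product(parent1, parent2)]


def recurrence_risk(parent1: str, parent2: str, mode: str) -> Fraction:
    """Fraction of offspring expected to be affected, assuming full penetrance."""
    outcomes = offspring_genotypes(parent1, parent2)
    if mode == "dominant":
        # One copy of the disease allele is sufficient.
        affected = [g for g in outcomes if "D" in g]
    elif mode == "recessive":
        # Both alleles must be disease alleles.
        affected = [g for g in outcomes if g == ("D", "D")]
    else:
        raise ValueError("mode must be 'dominant' or 'recessive'")
    return Fraction(len(affected), len(outcomes))


# Affected heterozygous parent and unaffected partner, autosomal dominant trait.
dominant_risk = recurrence_risk("Dn", "nn", "dominant")
# Two unaffected carrier parents, autosomal recessive trait.
recessive_risk = recurrence_risk("Dn", "Dn", "recessive")
print(dominant_risk, recessive_risk)  # 1/2 1/4
```

X-linked and mitochondrial inheritance would need a different enumeration (sex-specific transmission and maternal transmission, respectively) and are deliberately left out of this sketch.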
The current pace of human disease gene discovery for Mendelian conditions has never been greater and shows no evidence of slowing down. Despite this, to date, only 4083 genes (representing ~20% of the genes in the human genome) are cataloged in the Online Mendelian Inheritance in Man (OMIM) database as having one or more high-penetrance disease traits (www.OMIM.org; May 3, 2019). These data underscore both the high proportion of human genes that remain to be phenotypically annotated, and the complexity of gene–phenotype relationships, which do not always follow a one-to-one ratio.
Beyond simply the identification of novel disease genes underlying Mendelian conditions, a number of key discoveries have elucidated this very relationship between genes and their associated phenotypes. Traditional thinking led to a one gene-one disease model whereby a single gene or locus was associated with a particular disease trait, with inheritance following either a dominant or recessive pattern. However, there are increasing examples of gene–phenotype relationships that break with this traditional mold, underscoring the degree of allelic and locus heterogeneity, variability in penetrance and expression of disease traits, and combinatorial effects of rare variants at more than one locus in human disease. Genes such as RET may be associated with more than one disease trait, with rare constitutional variants leading to autosomal dominant multiple endocrine neoplasia type 2A (OMIM #171400) or the autosomal dominant neurocristopathy Hirschsprung disease (OMIM #142623). There are also increasing examples of genes associated with both dominant and recessive inheritance of disease traits. For some such genes, the dominantly inherited (due to monoallelic variation) trait is more severe (GJB2, KIF1A, MAB21L2, NALCN), whereas for other such genes the recessively inherited (due to biallelic variation) trait is more severe (AARS, CLCN1, EGR2, ROR2).26 Monoallelic and biallelic variants in CLCN1 lead to the same disease trait: dominantly or recessively inherited myotonia congenita (OMIM #160800, #255700). In contrast, variants in ATAD3A consistently affect the neurologic system, but clinically observed phenotypes are distinct when the etiologic variant is monoallelic (developmental delay, axonal neuropathy, hypotonia, hypertrophic cardiomyopathy) or biallelic (developmental delay, hypotonia, ataxia, seizures, and congenital cataracts with cerebell...