Severe loss-of-function variants in the genomes of healthy humans
James Harraway, Genetic Pathologist
Sullivan Nicolaides Pathology/Mater Pathology

• Loss of Function mutations and Mendelian disease
• Severe LOF mutations in normal human genomes
• Implications of these findings

Loss of Function Mutations and Mendelian Disease

What is a LOF Variant?
The ‘classical’ definition of a gene is a DNA sequence that encodes RNA, which is translated into a protein product.
“Loss of Function” (LOF) variants are heritable changes in DNA sequence that reduce or abolish the function of the protein product encoded by the gene.
• The definition of a ‘gene’ is changing rapidly: the majority of transcribed RNA does not encode proteins, and gene expression is shaped by DNA regulatory elements (cis and trans, short and long range), the epigenetic ‘histone code’ regulating transcription, RNA editing, etc. → the range of possible LOF variants will keep widening

‘Severe’ LOF Variants
(MacArthur and Tyler-Smith, Hum Mol Gen 2010)
• Variants expected to abolish protein function outright, e.g. nonsense SNVs, splice-disrupting SNVs, frameshifting indels, and deletions of whole genes or exons

Mendelian disease: mutation(s) in one or both alleles of a single gene result in a clinically abnormal phenotype, with high penetrance and/or expressivity
• The current focus of most diagnostic laboratories that investigate inherited disease; this may change as understanding grows of oligo/polygenic contributors to the risk of ‘complex’ diseases, pharmacogenetics, etc.

Most mutations in known Mendelian diseases are LOF mutations
• There are more ways to reduce or abolish the function of a gene by random mutation than to increase or change its function (severe LOF mutations, missense (non-synonymous) mutations, regulatory region mutations, etc.)

LOF mutations can lead to recessively inherited disease, when loss of function of the protein product of one mutant allele can be ‘compensated’ by the normal protein product of the other allele
• i.e. compensated at least up to the ‘threshold’ at which no clinical phenotype is evident

LOF mutations may also lead to dominantly inherited disease via a number of mechanisms
• Haploinsufficiency; subunit imbalances and dominant negative effects (for specific, often missense, LOF mutations)

Severe Loss of Function Mutations in Healthy Human Genomes

Severe LOF Mutations: More Common Than Previously Thought…

Previously, the ‘mutational load’ of LOF variants in the population and in each individual could only be estimated, not directly measured
• e.g. based on the incidence of severe recessive diseases in the general population, or on perinatal mortality/severe disease resulting from consanguineous breeding

Most severe LOF mutations have been detected in the context of clinical diagnosis of Mendelian disorders → an implicit assumption that they are likely to be associated with disease
• ‘Benign’ LOF variants have been assumed to be in the minority, e.g. the O allele of the ABO blood group

Increasing amounts of genome and exome data (from groups and individuals) → the number of severe LOF variants per individual in different populations can now be measured directly:

Yngvadottir et al, Am J Hum Gen 2009: Sequenced 805 previously reported nonsense SNVs in 1151 people from 56 populations
• 169/805 were variable in the study group
• Each individual carried on average 32 nonsense SNVs (14 of them homozygous)

Conrad et al, Nat Gen 2010: Array-based testing for CNVs > 1 kb in 450 HapMap individuals
• Of 3811 deletions in total, 213 were complete gene deletions and 34 were exon deletions predicted to lead to frameshifts

Ng et al, Nature 2009: Exome sequencing of 12 HapMap individuals
• 100 stop SNVs, frameshifting indels or splice-disrupting SNVs per genome
• 30 of these, on average, in the homozygous state

Lupski et al, NEJM 2010: Whole-genome sequencing of Jim Lupski (Charcot-Marie-Tooth (CMT) neuropathy, causative gene unknown)
• 121 stop SNPs and 112 splice-disrupting SNVs (indels were harder to determine because of the short reads of SOLiD technology)
• Heterozygous for 16 HGMD recessive disease SNVs
• Homozygous for 4 HGMD recessive disease SNVs
• Hemizygous for 1 X-linked disease SNV (ABCD1; X-linked adrenoleukodystrophy)
• These were attributed to database error (or reduced penetrance?)
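The per-genome tallies reported in these studies come down to classifying each call as severe LOF or not, then counting by zygosity. A minimal Python sketch of that bookkeeping follows, assuming variants have already been called and annotated with Sequence Ontology consequence terms (as annotation tools such as Ensembl VEP do). The `Call` record, the function names and the chosen set of ‘severe’ terms are hypothetical illustrations, not the method used by any of the studies above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Sequence Ontology (SO) consequence terms treated as "severe" LOF in this sketch.
# The exact set is an illustrative assumption; studies differ in what they count.
SEVERE_LOF_TERMS = frozenset({
    "stop_gained",              # nonsense SNV
    "frameshift_variant",       # frameshifting indel
    "splice_donor_variant",     # disrupts a canonical splice donor site
    "splice_acceptor_variant",  # disrupts a canonical splice acceptor site
    "transcript_ablation",      # deletion removing the whole transcript
})


@dataclass(frozen=True)
class Call:
    """Hypothetical record: one annotated variant call in one individual."""
    gene: str
    consequence: str  # SO term assigned by an upstream annotation step
    genotype: str     # VCF-style diploid genotype, e.g. "0/1" or "1|1"


def zygosity(genotype: str) -> Optional[str]:
    """Return "het" or "hom" for a biallelic diploid call, else None."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None  # missing or non-diploid (e.g. hemizygous) call: not counted
    first, second = alleles
    if first == second:
        return None if first == "0" else "hom"
    if "0" in alleles:
        return "het"
    return None  # e.g. "1/2" (two different alternate alleles): ignored here


def tally_severe_lof(calls: Iterable[Call]) -> Dict[str, int]:
    """Count one individual's severe LOF calls by zygosity."""
    counts = {"het": 0, "hom": 0}
    for call in calls:
        if call.consequence not in SEVERE_LOF_TERMS:
            continue  # e.g. missense variants are putative, not severe, LOF
        state = zygosity(call.genotype)
        if state is not None:
            counts[state] += 1
    return counts


calls = [
    Call("GENE_A", "stop_gained", "0/1"),
    Call("GENE_B", "frameshift_variant", "1/1"),
    Call("GENE_C", "missense_variant", "1/1"),   # not severe LOF: skipped
    Call("GENE_D", "splice_donor_variant", "0|1"),
    Call("GENE_E", "stop_gained", "./."),        # missing call: skipped
]
print(tally_severe_lof(calls))  # {'het': 2, 'hom': 1}
```

The caveats discussed below apply directly to counts like these: a heterozygous call miscalled as homozygous moves a variant between the two bins, and annotation errors (paralogs, pseudogenes, alternative isoforms) change which calls are counted as severe LOF at all.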
Moore et al, Genet Med 2011: Survey of 10 published genomes for ‘clinically relevant’ variants
• Found on average 10 000 nonsynonymous SNVs in coding sequences per individual, of which approximately 50 were nonsense SNVs
• On average, each individual was heterozygous for 65 and homozygous for 42 alleles recorded as disease-associated in OMIM (although no individual was homozygous for a severe LOF allele in an OMIM gene)

Caveats
There are potential caveats to the accuracy of estimates of severe LOF mutations in healthy individuals from High Throughput Sequencing (HTS) data:
• Sequencing call errors, mapping errors and sequence-mediated errors; accumulating more data leads to more ‘false positive’ calls, and heterozygous → homozygous miscalls are relatively common
• Some level of ‘false negative’ calls is unavoidable, exacerbated by low read depth and, for exome approaches, by sequence capture effects
• Annotation issues for ‘genes’ (paralogs? pseudogenes?) and alternative isoforms
• Errors/gaps in the reference sequence (unalignable reads, problems with assigning SVs)

Despite these caveats, it is clear that healthy humans carry more severe LOF variants than might have been expected
• Somewhere between 100 and 300 severe LOF variants (nonsense SNVs, SVs causing gene deletions, and others) per healthy genome, with between 10 and 50 of these in the homozygous state

Beyond severe LOF variants, there is a (much) larger number of putative LOF variants: nonsynonymous/missense SNVs, regulatory region variants, etc.
• These are beyond the scope of this talk; it is harder to assign functional relevance to them, and the high number of nonsynonymous SNVs per genome is perhaps less surprising than the number of severe LOF variants
• However, as a class they are likely to be deleterious: nonsynonymous SNVs make up the majority of variants known to be responsible for Mendelian diseases, and are overrepresented as rare alleles at the population level, indicating selection pressure (e.g.
Li et al., Nat Gen, 2010)

Implications for clinical diagnostics
A common approach in molecular diagnostic laboratories is to report severe LOF mutations found in genes sequenced for Mendelian disease as “likely/definitely pathogenic”
This is reasonable if there is a high a priori probability that the gene is causative for the disorder, e.g. it is a ‘known’ disease gene for an identifiable Mendelian disorder
• This includes severe LOF mutations discovered during a survey of a relatively small ‘shortlist’ of candidate genes

However, we are moving toward an era of ‘genotype first’ discovery of the genetic basis of some Mendelian disorders (Mefford, Genet Med, 2009) → exome- or genome-sized data sets

Exome (or genome) sequencing is becoming commonplace in the literature for discovering causative mutations in Mendelian disorders where there is no clear candidate gene shortlist, or in small families where linkage studies are not possible
• Over 40 such studies have been published to date

Array-based testing for LOF structural variants has become standard of care for patients with developmental delay
• Exome and eventually whole-genome sequencing will likely become standard of care for this purpose, detecting small-scale variants as well as SVs (e.g. O’Roak et al, Nat Gen 2011)

Moving beyond simple Mendelian disorders, HTS is being applied in GWAS and family studies
• To discover risk alleles for non-Mendelian (oligo- or polygenic) diseases with significant heritability
• Pharmacogenomic applications

How can genome-wide data be used in clinical practice for inherited disease?
Sequencing the exome or whole genome of each patient will yield multiple severe LOF mutations per genome, and even more putative LOF mutations (e.g.
missense variants)
Studies published to date have managed to ‘pinpoint’ the clinically significant variant(s) for Mendelian disorders, but this may reflect publication bias
The time and expense that have gone into such publications may be difficult to reproduce in a high-throughput clinical laboratory with a relatively rapid turnaround time (TAT); how can this process be streamlined?

Improved Sequencing Technology
Improved sequencing technologies (and calling/mapping/filtering algorithms) should:
• Reduce sequencing cost
• Increase read depth and the accuracy of sequence data, and the speed/utility of variant filtering
• Possibly allow de novo assembly (reducing reliance on the reference, which will itself be further refined as data accumulate)
• Allow rapid validation of variants of interest on different platforms

Annotation of the genome will become more accurate with improved sequence data from more individuals
• Both at the DNA level (genes vs paralogs/pseudogenes) and through RNA sequencing (better understanding of multiple transcripts/isoforms and their relation to function in different tissues)
• Clinical labs will need to reach a consensus on which annotation set to use, e.g. GENCODE

Improved software to predict clinical significance
Software for predicting the clinical significance of novel severe LOF mutations will (hopefully) improve
• Algorithms to predict the likelihood of haploinsufficiency based on gene characteristics (e.g. Huang et al, PLOS Genetics, 2010)
• These combine gene characteristics (size, conservation, expression during development
and tissue specificity) with interaction network models that estimate whether a gene/protein is a network ‘node’

Refinements to software used to predict the functional effect of less severe LOF mutations
• Algorithms based on evolutionary conservation should improve with more complete alignments, as more non-human species are sequenced
• Algorithms based on protein structure/function should become more effective, combining the effects of substitutions on structure (folding/protein stability/free energy) with information on known ‘vital’ domains (in particular, domains in which other mutations are known to cause a clinical phenotype)

Improved Databases – Normal Variants
Large-scale projects such as the 1000 Genomes Project → a broader picture of the variability in ‘normal’ genomes
• Which LOF mutations can be tolerated without severe disease, and which genes carry common LOF mutations and might be regarded as ‘non-essential’ (with the caveat that phenotypic data are not available)
• This will help inform filtering algorithms for clinically significant variants

One goal of the 1000 Genomes Project is to catalogue, validate and annotate all putative coding LOF mutations present at a frequency above 0.1% in the populations genotyped
• Eventually, it will be useful to study the phenotypic attributes of ‘normal’ individuals with severe LOF mutations in various genes, as their population frequencies suggest that these mutations do reduce fitness on an evolutionary scale

Improved Databases – Clinically Significant Variants
The clinical diagnostic community needs to contribute by cataloguing LOF variants that do cause or contribute to disease
• Through publication, and contribution of single-nucleotide variants/small indels to locus-specific databases → HGMD (which, as noted previously, contains misattributed ‘pathogenic’ alleles and needs continued curation and improvement)
• Through publication, and contribution of structural variants to databases such as DECIPHER and the ISCA database → dbVar
• Ultimately, through a central curated repository of all disease-causing and risk-associated variants, e.g. the Human Variome Project

Patient-Specific, Clinical Approaches
Broad family studies will often be vital
• Segregation of variants with disease will need to be established whenever possible, especially when a number of possible disease-causing variant alleles remain after filtering, validation, database searches and in silico predictions of pathogenicity
• Sequencing multiple family members simultaneously will likely become the norm (e.g. parent-child trios for children with developmental delay, which will detect de novo as well as inherited mutations)

Close liaison between clinicians and the clinical lab
• Detailed phenotypic information will optimise selection of the clinically significant gene(s) from all those containing LOF variants, based on known gene function/linked phenotypes
• The lab should feed back to the clinician when new data on variants become available that refine the assignment of pathogenicity

Summary
Loss of function variants are important causes of Mendelian disease (and contribute to complex disease)
A number of recent studies have established that even severe LOF variants are relatively common in each human genome, and many are rare/novel
This has important implications for clinical diagnostic laboratories as we move toward exome/genome NGS, with increasing challenges in assigning pathogenicity to discovered variants
Hopefully these challenges can be met by a combination of technological and software improvements, improved databases of sequence from both normal and clinically affected individuals, and an optimal lab-clinical interface

Questions?
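The filtering steps described in this talk (removing variants that are common in ‘normal’ genomes, then checking segregation in a parent-child trio for de novo or recessive inheritance) can be sketched in Python as below. It is a minimal illustration under stated assumptions: the `TrioVariant` record and all function names are hypothetical, genotypes are assumed to be fully called and biallelic, and the 0.1% frequency cut-off simply echoes the 1000 Genomes cataloguing threshold mentioned above rather than any validated clinical rule.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Illustrative set of "severe" LOF consequence terms (Sequence Ontology names).
SEVERE_LOF_TERMS = frozenset({
    "stop_gained", "frameshift_variant", "splice_donor_variant",
    "splice_acceptor_variant", "transcript_ablation",
})

# Illustrative population-frequency cut-off (0.1%); an assumption, not a standard.
MAX_POP_FREQ = 0.001


@dataclass(frozen=True)
class TrioVariant:
    """Hypothetical record: one variant with genotypes for child and both parents."""
    gene: str
    consequence: str
    pop_freq: Optional[float]  # frequency in a normal-variant panel; None if unseen
    child: str                 # fully called biallelic genotypes: "0/0", "0/1", "1/1"
    mother: str
    father: str


def alt_count(genotype: str) -> int:
    """Number of alternate alleles in a fully called biallelic genotype."""
    first, second = genotype.replace("|", "/").split("/")
    return int(first != "0") + int(second != "0")


def inheritance_model(v: TrioVariant) -> Optional[str]:
    """Classify the two simple segregation patterns this sketch looks for."""
    child, mother, father = (alt_count(g) for g in (v.child, v.mother, v.father))
    if child == 1 and mother == 0 and father == 0:
        return "de novo"
    if child == 2 and mother == 1 and father == 1:
        return "recessive (homozygous)"
    return None  # inherited heterozygous, or not consistent with either model


def prioritise(variants: Iterable[TrioVariant]) -> List[Tuple[str, str]]:
    """Keep severe LOF variants that are rare in normal genomes and segregate."""
    hits = []
    for v in variants:
        if v.consequence not in SEVERE_LOF_TERMS:
            continue
        if v.pop_freq is not None and v.pop_freq > MAX_POP_FREQ:
            continue  # too common in 'normal' genomes to explain a rare disorder
        model = inheritance_model(v)
        if model is not None:
            hits.append((v.gene, model))
    return hits


variants = [
    TrioVariant("GENE_A", "stop_gained", None, "0/1", "0/0", "0/0"),
    TrioVariant("GENE_B", "frameshift_variant", 0.0004, "1/1", "0/1", "0/1"),
    TrioVariant("GENE_C", "stop_gained", 0.12, "1/1", "0/1", "0/1"),
    TrioVariant("GENE_D", "missense_variant", None, "0/1", "0/0", "0/0"),
    TrioVariant("GENE_E", "splice_donor_variant", None, "0/1", "0/1", "0/0"),
]
print(prioritise(variants))
# [('GENE_A', 'de novo'), ('GENE_B', 'recessive (homozygous)')]
```

The sketch deliberately ignores compound heterozygosity, X-linked inheritance, reduced penetrance and missense variants, all of which a real pipeline would need to handle; any variant it returns would still need validation, phenotype matching and, where possible, wider segregation studies.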