ELECTRONIC SUBMISSION FOR CONSIDERATION IN THE UNIVERSITY OF TORONTO MEDICAL JOURNAL

TITLE: Exome sequencing expedites rare disease gene discovery

AUTHOR NAMES: Kun Huang (1T3), MD candidate, University of Toronto

CORRESPONDING AUTHOR EMAIL ADDRESS:
Kun Huang
University of Toronto, Faculty of Medicine
1 King's College Circle, Medical Sciences Building, Room 2109
Toronto, ON, M5S 1A8
Tel: (647) 863-2715
email: kun.huang@utoronto.ca

UTMJ Perspectives SUBMISSION Page 1 of 8 Kun Huang

For the past several decades, genes underlying Mendelian disorders have been identified through traditional positional cloning strategies, which often have reduced power because of marked locus heterogeneity, small kindred sizes, or a substantial reproductive disadvantage. 1 The emergence of next-generation sequencing techniques (i.e., whole-genome, whole-exome and whole-transcriptome sequencing) has allowed substantial advances in identifying genetic alterations. In theory, whole-genome sequencing for the discovery of genetic variants could identify the gene underlying any given monogenic disease. However, the cost of sequencing a whole genome is daunting. An alternative approach, exome sequencing, involves the targeted resequencing of all protein-coding regions, which requires only ~5% as much sequencing as a whole human genome. 2-4 An increasing number of studies in the past two years have demonstrated that whole-exome sequencing is a powerful approach for identifying the causative genes underlying extremely rare Mendelian disorders. 5-12

To process the massive volume of sequencing data effectively, some assumptions, albeit arbitrary, are made about the causal mutations underlying Mendelian disease. 2 11 12 First, the disease is monogenic and caused by a single mutation. Second, the mutation has a significant effect on phenotype; it is therefore most likely coding and highly penetrant, i.e.
missense and nonsense substitutions, coding indels, and splice acceptor and donor site changes. Third, the mutation is rare or novel, and probably private to affected individuals. Where necessary, a further assumption is often made that the disease is genetically homogeneous, i.e., unrelated affected individuals have mutations in the same gene, at least among the individuals whose DNA was sequenced for the study.

Proof of concept for exome sequencing was first demonstrated in 2009 in a rare Mendelian disorder, Freeman–Sheldon syndrome. With only four cases, this study was able to identify MYH3 as the single causal gene. 2 Since then, more than 40 exome sequencing studies have applied various strategies to identify the causal variants for different disorders such as Miller syndrome 12, Kabuki syndrome 11, Fowler syndrome 13, Perrault syndrome 14 and Schinzel-Giedion syndrome 15. Some studies have also integrated exome sequencing data with traditional linkage and homozygosity analysis. 8 16

However, the avalanche of data from exome sequencing poses a statistical and computational challenge: how to separate the causative alterations from the noise of normal variants. Based on the aforementioned assumptions, the primary filter used to identify potentially causal mutations is variant function, on the rationale that mutations that disrupt proteins and/or fall at more conserved sites are more likely to be pathogenic. Non-coding and synonymous variants are therefore often ignored or greatly down-weighted. Tools such as SIFT 17, PolyPhen 18 19, CDPred 20, PhyloP 21 and GERP 22 23 have been developed to rank variants by their potential effect on protein structure and function, and by conservation scores. Although this strategy has been justified in many studies, it will certainly not hold in every case.
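
The assumptions and the function-based filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in any of the cited studies: the Variant record, the effect labels, the gene names and the toy data are all hypothetical, and in practice the annotations would come from a variant annotation tool and the set of known variants from a resource such as dbSNP.

```python
from dataclasses import dataclass

# Functional classes assumed to be potentially pathogenic (hypothetical labels
# standing in for missense, nonsense, coding indel and splice-site changes).
DAMAGING_CLASSES = {"missense", "nonsense", "coding_indel", "splice_site"}


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str  # hypothetical annotation label, e.g. "missense" or "synonymous"

    @property
    def key(self):
        """Identifier used to look the variant up in a set of known variants."""
        return (self.chrom, self.pos, self.ref, self.alt)


def candidate_genes(exomes, known_variants):
    """Return genes with a damaging, previously unseen variant in every exome.

    exomes: list of variant lists, one per unrelated affected individual.
    known_variants: set of (chrom, pos, ref, alt) keys treated as common.
    """
    per_individual = []
    for variants in exomes:
        genes = {
            v.gene
            for v in variants
            if v.effect in DAMAGING_CLASSES and v.key not in known_variants
        }
        per_individual.append(genes)
    # Genetic homogeneity assumption: the causal gene is hit in all individuals.
    return set.intersection(*per_individual) if per_individual else set()


# Toy data (entirely made up) for two unrelated affected individuals.
exome_1 = [
    Variant("GENE_A", "chr1", 100, "A", "G", "missense"),
    Variant("GENE_B", "chr2", 200, "C", "T", "synonymous"),
    Variant("GENE_C", "chr3", 300, "G", "A", "missense"),
]
exome_2 = [
    Variant("GENE_A", "chr1", 150, "C", "T", "nonsense"),
    Variant("GENE_C", "chr3", 300, "G", "A", "missense"),
    Variant("GENE_D", "chr4", 400, "T", "C", "missense"),
]
known = {("chr3", 300, "G", "A")}  # stands in for a polymorphism repository

print(sorted(candidate_genes([exome_1, exome_2], known)))  # ['GENE_A']
```

In this toy example GENE_B is dropped by the functional filter, GENE_C by the novelty filter, and GENE_D because it is not shared, leaving a single candidate gene. Requiring a hit in every individual is the strictest form of the homogeneity assumption and would need to be relaxed for genetically heterogeneous disorders.
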
Shortcomings of function-based filtering include the inability to capture regulatory or evolutionarily conserved sequences in non-coding regions. As more disorders are studied, there will be a growing need for functional annotation of non-coding regions and for tools to analyze them.

Empirical analysis of published exomes estimates about 20 000 single-nucleotide variants in a given exome. 2 11 12 Rare mutations that have a major effect and produce a distinctive phenotype are not expected to be found in the population at large, and hence should not appear in genome-wide scans for variants [e.g. the 1000 Genomes Project 24] or in polymorphism repositories [e.g. dbSNP 25]. Absence from these data sets is typically an important criterion in defining a rare, novel or private variant. This simple assumption and filtering strategy makes it possible to sift quickly through exome data for promising causal variants. A caveat, however, is that dbSNP has a considerable false-positive rate of 15–17%. 26 It is also possible that recessive disease-causing mutations from a healthy carrier have been deposited in the database.

As an illustration, a recent study by Haack et al 27 provides an elegant example of how exome sequencing, in combination with appropriate filtering strategies, can be effective in elucidating a human respiratory chain disease, mitochondrial complex I deficiency (figure 1). Discovering the molecular basis of this disease is challenging given the large number of both mitochondrial and nuclear genes involved. Using whole-exome sequencing followed by filtering that prioritized mitochondrial proteins (figure 1), Haack et al identified heterozygous mutations in ACAD9, a mitochondrial acyl-CoA dehydrogenase gene, in a single individual with severe, isolated complex I deficiency. The authors went on to screen 120 additional complex I-deficient index cases for ACAD9 mutations.
Two additional unrelated cases and a total of five pathogenic ACAD9 alleles were identified, further supporting the conclusion that mutations in ACAD9 are associated with a mitochondrial disorder dominated by severe and generalized complex I deficiency. Particularly exciting is that supplementation with riboflavin, whose metabolite fosters ACAD assembly and stability, resulted in a significant increase in complex I activity in mutant cell cultures from the patient. 27 This is the first exome sequencing study that also obtained a promising clinical response. A follow-up clinical trial is needed to establish the efficacy of supplementation with vitamins and cofactors in individuals with ACAD9 mutations.

Exome sequencing is revolutionizing the way the genetic bases of Mendelian disorders are studied. Studies have now begun to apply exome and whole-genome sequencing to common and genetically complex conditions such as mental retardation. 28 29 Although still an emerging technique, exome sequencing has already expedited disease gene discovery and is poised to help make personalized medicine a reality.

Figure 1. Exome sequencing and filtering strategy.

References

1. Collins FS. Positional cloning moves from perditional to traditional. Nat Genet 1995;9(4):347-50.
2. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 2009;461(7261):272-6.
3. Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 2009;106(45):19096-101.
4. Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet 2007;39(12):1522-7.
5. Johnson JO, Mandrioli J, Benatar M, Abramzon Y, Van Deerlin VM, Trojanowski JQ, et al.
Exome sequencing reveals VCP mutations as a cause of familial ALS. Neuron 2010;68(5):857-64.
6. Kalay E, Yigit G, Aslan Y, Brown KE, Pohl E, Bicknell LS, et al. CEP152 is a genome maintenance protein disrupted in Seckel syndrome. Nat Genet 2011;43(1):23-6.
7. Wang JL, Yang X, Xia K, Hu ZM, Weng L, Jin X, et al. TGM6 identified as a novel causative gene of spinocerebellar ataxias using exome sequencing. Brain 2010;133(Pt 12):3510-8.
8. Musunuru K, Pirruccello JP, Do R, Peloso GM, Guiducci C, Sougnez C, et al. Exome sequencing, ANGPTL3 mutations, and familial combined hypolipidemia. N Engl J Med 2010;363(23):2220-7.
9. Green P, Wiseman M, Crow YJ, Houlden H, Riphagen S, Lin JP, et al. Brown-Vialetto-Van Laere syndrome, a ponto-bulbar palsy with deafness, is caused by mutations in c20orf54. Am J Hum Genet 2010;86(3):485-9.
10. Bilguvar K, Ozturk AK, Louvi A, Kwan KY, Choi M, Tatli B, et al. Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Nature 2010;467(7312):207-10.
11. Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve HI, et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet 2010;42(9):790-3.
12. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 2010;42(1):30-5.
13. Lalonde E, Albrecht S, Ha KC, Jacob K, Bolduc N, Polychronakos C, et al. Unexpected allelic heterogeneity and spectrum of mutations in Fowler syndrome revealed by next-generation exome sequencing. Hum Mutat 2010;31(8):918-23.
14. Pierce SB, Walsh T, Chisholm KM, Lee MK, Thornton AM, Fiumara A, et al. Mutations in the DBP-deficiency protein HSD17B4 cause ovarian dysgenesis, hearing loss, and ataxia of Perrault Syndrome. Am J Hum Genet 2010;87(2):282-8.
15. Hoischen A, van Bon BW, Gilissen C, Arts P, van Lier B, Steehouwer M, et al.
De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nat Genet 2010;42(6):483-5.
16. Krawitz PM, Schweiger MR, Rodelsperger C, Marcelis C, Kolsch U, Meisel C, et al. Identity-by-descent filtering of exome sequence data identifies PIGV mutations in hyperphosphatasia mental retardation syndrome. Nat Genet 2010;42(10):827-9.
17. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31(13):3812-4.
18. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7(4):248-9.
19. Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002;30(17):3894-900.
20. Johnston JJ, Teer JK, Cherukuri PF, Hansen NF, Loftus SK, Chong K, et al. Massively parallel sequencing of exons on the X chromosome identifies RBM10 as the gene that causes a syndromic form of cleft palate. Am J Hum Genet 2010;86(5):743-8.
21. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2010;20(1):110-21.
22. Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005;15(7):901-13.
23. Cooper GM, Goode DL, Ng SB, Sidow A, Bamshad MJ, Shendure J, et al. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods 2010;7(4):250-1.
24. Pennisi E. Genomics. 1000 Genomes Project gives new map of genetic diversity. Science 2010;330(6004):574-5.
25. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29(1):308-11.
26. Day IN. dbSNP in the detail and copy number complexities. Hum Mutat 2010;31(1):2-4.
27. Haack TB, Danhauser K, Haberberger B, Hoser J, Strecker V, Boehm D, et al.
Exome sequencing identifies ACAD9 mutations as a cause of complex I deficiency. Nat Genet 2010;42(12):1131-4.
28. Vissers LE, de Ligt J, Gilissen C, Janssen I, Steehouwer M, de Vries P, et al. A de novo paradigm for mental retardation. Nat Genet 2010;42(12):1109-12.
29. Caliskan M, Chong JX, Uricchio L, Anderson R, Chen P, Sougnez C, et al. Exome sequencing reveals a novel mutation for autosomal recessive non-syndromic mental retardation in the TECR gene on chromosome 19p13. Hum Mol Genet 2011;20(7):1285-9.