Machine Learning in Complex Disease Gene Discovery This project is directed at applying machine learning algorithms to the detection of susceptibility genes underlying a variety of genetically complex disorders. BACKGROUND The identification of susceptibility genes contributing to the etiology of genetically complex disorders has been revolutionized by genome-wide association (GWA) studies. These studies evaluate the LINEAR association between disorders such as Asthma, Bipolar Disorder, Cancer, Diabetes (Type I, Type II), Hypertension, Inflammatory Bowel Disease, Multiple Sclerosis, Obesity, and Schizophrenia, and a huge number of single-nucleotide polymorphism (SNP) genetic markers, arrayed on a DNA microchip. Significant cost and effort have gone into the collection and phenotypic characterization of large patient samples as well as into the genotypic characterization of SNPs. Thus, DNA samples from each of thousands of individuals affected with a disorder are simultaneously compared with more than one million SNPs. Nonetheless, relatively few disease susceptibility genes have been discovered (< 10/disorder), despite these analyses comprising more than one billion data points each. COMMENT The relatively low statistical power of GWA studies may arise from two factors: 1) failure to evaluate NONLINEAR as well as linear relationships between disease and markers and 2) disregard of an enormous wealth of biological information which exists in gene interaction and gene ontology databases across many species, as well as in neuroanatomic, neurodevelopmental, and disease gene expression databases. THE PROJECT We are developing a statistical approach to the non-linear analysis of disease-gene relationships in which machine learning paradigms (such as Artificial Neural Networks, Bayesian Networks, “Genetic” Algorithms, and Support Vector Machines) are combined with Expert Knowledge derived from the aforementioned databases to arrive at significant sets of interacting and biologically plausible candidate genes which, acting individually or in concert, contribute to the pathogenesis of complex human diseases. Once developed, this approach will be applied to one well-studied, and one less well-studied, genetically complex disorder, namely age-related macular degeneration and schizophrenia, respectively. GOAL We hope that this project will lay the groundwork for a novel approach to understanding a wide variety of medical disorders, which, while etiologically complex, are also chronic, disabling, and extremely common. SELECTED REFERENCES Complex Disorders Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, Khoury MJ, Tanzi RE, Bertram L. Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet. 2008 Jul;40(7):827-34. Allikmets R, Dean M. Bringing age-related macular degeneration into focus. Nat Genet. 2008 Jul;40(7):820-1. No abstract available Becker KG The common variants/multiple disease hypothesis of common complex genetic disorders. Med Hypotheses. 2004;62(2):309-17. Carter CJ. Schizophrenia susceptibility genes converge on interlinked pathways related to glutamatergic transmission and long-term potentiation, oxidative stress and oligodendrocyte viability. Schizophr Res. 2006 Sep;86(1-3):1-14.. Ferreira MA, O'Donovan MC, Meng YA, Jones IR, Ruderfer DM, Jones L, Fan J, Kirov G, Perlis RH, Green EK, Smoller JW, Grozeva D, Stone J, Nikolov I, Chambert K, Hamshere ML, Nimgaonkar VL, Moskvina V, Thase ME, Caesar S, Sachs GS, Franklin J, Gordon-Smith K, Ardlie KG, Gabriel SB, Fraser C, Blumenstiel B, Defelice M, Breen G, Gill M, Morris DW, Elkin A, Muir WJ, McGhee KA, Williamson R, Macintyre DJ, Maclean AW, St Clair D, Robinson M, Van Beck M, Pereira AC, Kandaswamy R, McQuillin A, Collier DA, Bass NJ, Young AH, Lawrence J, Nicol Ferrier I, Anjorin A, Farmer A, Curtis D, Scolnick EM, McGuffin P, Daly MJ, Corvin AP, Holmans PA, Blackwood DH; Wellcome Trust Case Control Consortium, Gurling HM, Owen MJ, Purcell SM, Sklar P, Craddock N. Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder. Nat Genet. 2008 Aug 17. [Epub ahead of print] O'Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V, Nikolov I, Hamshere M, Carroll L, Georgieva L, Dwyer S, Holmans P, Marchini JL, Spencer CC, Howie B, Leung HT, Hartmann AM, Möller HJ, Morris DW, Shi Y, Feng G, Hoffmann P, Propping P, Vasilescu C, Maier W, Rietschel M, Zammit S, Schumacher J, Quinn EM, Schulze TG, Williams NM, Giegling I, Iwata N, Ikeda M, Darvasi A, Shifman S, He L, Duan J, Sanders AR, Levinson DF, Gejman PV; Molecular Genetics of Schizophrenia Collaboration, Gejman PV, Sanders AR, Duan J, Levinson DF, Buccola NG, Mowry BJ, Freedman R, Amin F, Black DW, Silverman JM, Byerley WF, Cloninger CR, Cichon S, Nöthen MM, Gill M, Corvin A, Rujescu D, Kirov G, Owen MJ. Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet. 2008 Jul 30. [Epub ahead of print] Databases Ala U, Piro RM, Grassi E, Damasco C, Silengo L. Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis. PLoS Comput Biol 4(3): e1000043, 2008 Bussemaker HJ, Ward LD, Boorsma A. Dissecting complex transcriptional responses using pathway-level scores based on prior information. BMC Bioinformatics. 2007 Sep 27;8 Suppl 6:S6. Higgs BW, Elashoff M, Richman S An online database for brain disease research BMC Genomics 2006, 7:70 doi:10.1186/1471-2164-7-70 Iossifov I, Zheng T, Baron M, Gilliam TC, Rzhetsky A Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network Genome Res. 18:1150-1162, 2008 Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D. The Stanford Microarray Database: data access and quality assessment tools Nucleic Acids Res. 31:94-96, 2003. Krauthammer M, Kaufmann CA, Gilliam TC, Rzhetsky A. Molecular triangulation: bridging linkage and molecular-network information for identifying candidate genes in Alzheimer's disease. Proc Natl Acad Sci U S A. 2004 Oct 19;101(42):15148-53. Genome-Wide Association Analysis Carlson CS, Eberle MA, Kruglyak L, Nickerson DA. Mapping complex disease loci in whole-genome association studies. Nature. 2004 May 27;429(6990):446-52. Eskin E Increasing power in association studies by using linkage disequilibrium structure and molecular function as prior information Genome Res. 2008 18: 653-660. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS. A genome-wide scalable SNP genotyping assay using microarray technology. Nat Genet. 2005 May;37(5):549-54. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005 Feb;6(2):95-108 Wang WY, Barratt BJ, Clayton DG, Todd JA. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet. 2005 Feb;6(2):109-18. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007 Jun 7;447(7145):661-78. Machine Learning Bishop CM A New Framework for Machine Learning Computational Intelligence: Research Frontiers, IEEE World Congress on Computational Intelligence, WCCI 2008, Hong Kong, June 2008 Lecture Notes in Computer Science LNCS 5050, 1–24. Curtis D. Comparison of artificial neural network analysis with other multimarker methods for detecting genetic association. BMC Genet. 2007 Jul 18;8:49. Curtis D, North BV, Sham PC Use of an artificial neural network to detect association between a disease and multiple marker genotypes Ann. Hum. Genet. (2001), 65, 95-107. McKinney BA, Reif DM, Ritchie MD, Moore JH. Machine learning for detecting gene-gene interactions: a review. Appl Bioinformatics. 2006;5(2):77-88. Moore JH, Gilbert JC, Tsaif C-T, Chiang F-T, Holden T, Barney N, White BC A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility Journal of Theoretical Biology 241 (2006) 252–261 Ziegler A, DeStefano AL, König IR, Bardel C, Brinza D, Bull S, Cai Z, Glaser B, Jiang W, Lee KE, Li CX, Li J, Li X, Majoram P, Meng Y, Nicodemus KK, Platt A, Schwarz DF, Shi W, Shugart YY, Stassen HH, Sun YV, Won S, Wang W, Wahba G, Zagaar UA, Zhao Z. Data mining, neural nets, trees--problems 2 and 3 of Genetic Analysis Workshop 15. Genet Epidemiol. 2007;31 Suppl 1:S51-60.