Machine Learning in Complex Disease Gene Discovery
This project is directed at applying machine learning algorithms to the detection of susceptibility genes
underlying a variety of genetically complex disorders.
The identification of susceptibility genes contributing to the etiology of genetically complex disorders has
been revolutionized by genome-wide association (GWA) studies. These studies evaluate the LINEAR
association between disorders such as Asthma, Bipolar Disorder, Cancer, Diabetes (Type I, Type II),
Hypertension, Inflammatory Bowel Disease, Multiple Sclerosis, Obesity, and Schizophrenia, and a huge
number of single-nucleotide polymorphism (SNP) genetic markers, arrayed on a DNA microchip.
Significant cost and effort have gone into the collection and phenotypic characterization of large patient
samples as well as into the genotypic characterization of SNPs. Thus, DNA samples from each of
thousands of individuals affected with a disorder are simultaneously compared with more than one million
SNPs. Nonetheless, relatively few disease susceptibility genes have been discovered (< 10/disorder),
despite these analyses comprising more than one billion data points each.
The relatively low statistical power of GWA studies may arise from two factors: 1) failure to evaluate NONLINEAR as well as linear relationships between disease and markers and 2) disregard of an enormous
wealth of biological information which exists in gene interaction and gene ontology databases across
many species, as well as in neuroanatomic, neurodevelopmental, and disease gene expression
We are developing a statistical approach to the non-linear analysis of disease-gene relationships in which
machine learning paradigms (such as Artificial Neural Networks, Bayesian Networks, “Genetic”
Algorithms, and Support Vector Machines) are combined with Expert Knowledge derived from the
aforementioned databases to arrive at significant sets of interacting and biologically plausible candidate
genes which, acting individually or in concert, contribute to the pathogenesis of complex human diseases.
Once developed, this approach will be applied to one well-studied, and one less well-studied, genetically
complex disorder, namely age-related macular degeneration and schizophrenia, respectively.
We hope that this project will lay the groundwork for a novel approach to understanding a wide variety of
medical disorders, which, while etiologically complex, are also chronic, disabling, and extremely common.
Complex Disorders
Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, Khoury MJ, Tanzi RE, Bertram L.
Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene
Nat Genet. 2008 Jul;40(7):827-34.
Allikmets R, Dean M.
Bringing age-related macular degeneration into focus.
Nat Genet. 2008 Jul;40(7):820-1. No abstract available
Becker KG
The common variants/multiple disease hypothesis of common complex genetic disorders.
Med Hypotheses. 2004;62(2):309-17.
Carter CJ.
Schizophrenia susceptibility genes converge on interlinked pathways related to glutamatergic
transmission and long-term potentiation, oxidative stress and oligodendrocyte viability.
Schizophr Res. 2006 Sep;86(1-3):1-14..
Ferreira MA, O'Donovan MC, Meng YA, Jones IR, Ruderfer DM, Jones L, Fan J, Kirov G, Perlis RH,
Green EK, Smoller JW, Grozeva D, Stone J, Nikolov I, Chambert K, Hamshere ML, Nimgaonkar VL,
Moskvina V, Thase ME, Caesar S, Sachs GS, Franklin J, Gordon-Smith K, Ardlie KG, Gabriel SB, Fraser
C, Blumenstiel B, Defelice M, Breen G, Gill M, Morris DW, Elkin A, Muir WJ, McGhee KA, Williamson R,
Macintyre DJ, Maclean AW, St Clair D, Robinson M, Van Beck M, Pereira AC, Kandaswamy R, McQuillin
A, Collier DA, Bass NJ, Young AH, Lawrence J, Nicol Ferrier I, Anjorin A, Farmer A, Curtis D, Scolnick
EM, McGuffin P, Daly MJ, Corvin AP, Holmans PA, Blackwood DH; Wellcome Trust Case Control
Consortium, Gurling HM, Owen MJ, Purcell SM, Sklar P, Craddock N.
Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar
Nat Genet. 2008 Aug 17. [Epub ahead of print]
O'Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V, Nikolov I, Hamshere M, Carroll
L, Georgieva L, Dwyer S, Holmans P, Marchini JL, Spencer CC, Howie B, Leung HT, Hartmann AM,
Möller HJ, Morris DW, Shi Y, Feng G, Hoffmann P, Propping P, Vasilescu C, Maier W, Rietschel M,
Zammit S, Schumacher J, Quinn EM, Schulze TG, Williams NM, Giegling I, Iwata N, Ikeda M, Darvasi A,
Shifman S, He L, Duan J, Sanders AR, Levinson DF, Gejman PV; Molecular Genetics of Schizophrenia
Collaboration, Gejman PV, Sanders AR, Duan J, Levinson DF, Buccola NG, Mowry BJ, Freedman R,
Amin F, Black DW, Silverman JM, Byerley WF, Cloninger CR, Cichon S, Nöthen MM, Gill M, Corvin A,
Rujescu D, Kirov G, Owen MJ.
Identification of loci associated with schizophrenia by genome-wide association and follow-up.
Nat Genet. 2008 Jul 30. [Epub ahead of print]
Ala U, Piro RM, Grassi E, Damasco C, Silengo L.
Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis.
PLoS Comput Biol 4(3): e1000043, 2008
Bussemaker HJ, Ward LD, Boorsma A.
Dissecting complex transcriptional responses using pathway-level scores based on prior information.
BMC Bioinformatics. 2007 Sep 27;8 Suppl 6:S6.
Higgs BW, Elashoff M, Richman S
An online database for brain disease research
BMC Genomics 2006, 7:70 doi:10.1186/1471-2164-7-70
Iossifov I, Zheng T, Baron M, Gilliam TC, Rzhetsky A
Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction
Genome Res. 18:1150-1162, 2008
Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D.
The Stanford Microarray Database: data access and quality assessment tools
Nucleic Acids Res. 31:94-96, 2003.
Krauthammer M, Kaufmann CA, Gilliam TC, Rzhetsky A.
Molecular triangulation: bridging linkage and molecular-network information for identifying candidate
genes in Alzheimer's disease.
Proc Natl Acad Sci U S A. 2004 Oct 19;101(42):15148-53.
Genome-Wide Association Analysis
Carlson CS, Eberle MA, Kruglyak L, Nickerson DA.
Mapping complex disease loci in whole-genome association studies.
Nature. 2004 May 27;429(6990):446-52.
Eskin E
Increasing power in association studies by using linkage disequilibrium structure and molecular function
as prior information
Genome Res. 2008 18: 653-660.
Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS.
A genome-wide scalable SNP genotyping assay using microarray technology.
Nat Genet. 2005 May;37(5):549-54.
Hirschhorn JN, Daly MJ.
Genome-wide association studies for common diseases and complex traits.
Nat Rev Genet. 2005 Feb;6(2):95-108
Wang WY, Barratt BJ, Clayton DG, Todd JA.
Genome-wide association studies: theoretical and practical concerns.
Nat Rev Genet. 2005 Feb;6(2):109-18.
Wellcome Trust Case Control Consortium.
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.
Nature. 2007 Jun 7;447(7145):661-78.
Machine Learning
Bishop CM
A New Framework for Machine Learning
Computational Intelligence: Research Frontiers, IEEE World Congress on Computational Intelligence,
WCCI 2008, Hong Kong, June 2008 Lecture Notes in Computer Science LNCS 5050, 1–24.
Curtis D.
Comparison of artificial neural network analysis with other multimarker methods for detecting genetic
BMC Genet. 2007 Jul 18;8:49.
Curtis D, North BV, Sham PC
Use of an artificial neural network to detect association between a disease and multiple marker genotypes
Ann. Hum. Genet. (2001), 65, 95-107.
McKinney BA, Reif DM, Ritchie MD, Moore JH.
Machine learning for detecting gene-gene interactions: a review.
Appl Bioinformatics. 2006;5(2):77-88.
Moore JH, Gilbert JC, Tsaif C-T, Chiang F-T, Holden T, Barney N, White BC
A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of
epistasis in genetic studies of human disease susceptibility
Journal of Theoretical Biology 241 (2006) 252–261
Ziegler A, DeStefano AL, König IR, Bardel C, Brinza D, Bull S, Cai Z, Glaser B, Jiang W, Lee KE, Li CX,
Li J, Li X, Majoram P, Meng Y, Nicodemus KK, Platt A, Schwarz DF, Shi W, Shugart YY, Stassen HH,
Sun YV, Won S, Wang W, Wahba G, Zagaar UA, Zhao Z.
Data mining, neural nets, trees--problems 2 and 3 of Genetic Analysis Workshop 15.
Genet Epidemiol. 2007;31 Suppl 1:S51-60.