New Approaches to Large-Scale Data Integration: Across Variant Types and Across -Omics
Nancy J. Cox, Ph.D.
The University of Chicago
http://genemed.bsd.uchicago.edu

Overview
• Why are we obtaining largely negative results in sequence studies of common disease?
  - Rationale for results
  - Integrating rare and common variants
• Integration across transcriptome and genome
  - Some integration
  - Better integration
  - Really cool integration

Evidence that Sequencing Will Yield Discoveries Accounting for Substantial Heritability for Common Diseases with Complex Transmission

Results of Sequencing Studies
• Lipids and cardiovascular phenotypes
  - Small number of new genes
  - Rediscover known genes
• Schizophrenia
  - No new genes
  - Some enrichment of rare variants in prespecified gene set
• Type 2 diabetes and related traits
  - Few new genes, some enrichment in sets

Relationship Between MAF and Effect Size
Lobo, I. (2008) Multifactorial inheritance and genetic disease. Nature Education 1(1):5
[Figure, repeated over several slides: axes labeled "Effect size" and "q"]
But WHY? What is wrong with the way we were thinking?

[Figure sequence: "Gene" and "Protein", then altered spellings "Gane Pretein" and "Gepe Proteen"; panels labeled "Selection", "T2D", and "Serious, early onset disease"]

Be optimistic about data integration!
Relating Variation to Phenotype
[Figure: panels labeled "Genotyping" and "Sequencing"; "Common Variants" and "Rare Variants"; "Genome Interrogation"]

Galton, 1889
[Figure, repeated over several slides: axes labeled "Polygenic Load" and "Rare Variant Burden"]

Implications of Inverse Axis of Risk
• Study design: families are rare, but subjects with GWAS are not
  - Sequence affecteds with low polygenic load and unaffecteds with high polygenic load
• Analysis and interpretation of existing sequencing data
  - Weighting polygenic load to distinguish contributory de novo from rest
  - Incorporating into general analysis of rare variants to improve power

Complex Traits
Type 1 Diabetes   Overall   0.48   0.06
Crohn's Disease   0.50   0.07

Concentration of Heritability
• Smaller numbers of eQTLs (3-30K) account for 30-60% of heritability estimated for all variants after QC (150-600K)
• Observed across autoimmune and inflammatory diseases, bipolar disorder (brain), T2D (muscle+adipose)
• Improved prediction?

Information from large-scale data

PrediXscan
• Build large-scale predictors of gene expression within and across tissues
• Validate predictors in independent data
• Apply to phenotype data with genome variation to identify genes with significant differences in predicted gene expression

Known Crohn's Disease Genes

Whole Blood Predictors
Gene       T-Stat
SLC22A5    -4.03
CARD9       3.71
SOX4        3.43
ZGPAT      -3.42
ERAP2       3.15
IL18RAP     2.95
GCKR        2.80
IL23R       2.78
TNFSF11     2.63
UBE2L3      2.44
KLF6       -2.39

Cerebellum Predictors
Gene       T-Stat
ATG16L1    -6.35
GSDMB      -3.39
PTPRK       3.23
C5orf56    -2.97
CCDC88B     2.96
TAGAP      -2.83
PTRF       -2.83
CCNY        2.65
ERAP2       2.58
SBNO2       2.55
DNMT3A     -2.55

Cox Lab
Eric Gamazon Lea Davis (Bridget) Anna Tikhomirov Jason Torres Keston AquinoMichaels Carolyn Jumper Anuar Konkashbaev Anna Pluzhnikov Vasily Trubetskoy

Colleagues & Collaborators
Bob Grossman Dan Nicolae M.
Eileen Dolan Haky Im Chun-yu Liu Andrey Rzhetsky

Acknowledgements
The GTEx Consortium Investigators (GTEx Pilot phase)

• cancer Human Biobank (caHUB)
• Biospecimen Source Sites (BSS)
  - John Lonsdale, Jeffrey Thomas, Mike Salvatore, Rebecca Phillips, Edmund Lo, Saboor Shad, National Disease Research Interchange, Philadelphia, PA
  - Richard Hasz, Gift of Life Donor Program, Philadelphia, PA
  - Gary Walters, LifeNet Health, Virginia Beach, VA
  - Nancy Young, Albert Einstein Medical Center, Philadelphia, PA
  - Laura Siminoff (ELSI Study), Heather Traino, Maghboeba Mosavel, Laura Barker, Virginia Commonwealth University, Richmond, VA
  - Barbara Foster, Mike Moser, Ellen Karasik, Bryan Gillard, Kimberley Ramsey, Roswell Park Cancer Institute, Buffalo, NY
  - Susan Sullivan, Jason Bridge, Upstate New York Transplant Service, Buffalo, NY
• Comprehensive Biospecimen Resource (CBR)
  - Scott Jewell, Dan Rohr, Dan Maxim, Dana Filkins, Philip Harbach, Eddie Cortadillo, Bree Berghuis, Lisa Turner, Melissa Hanson, Anthony Watkins, Brian Smith, Van Andel Institute, Grand Rapids, MI
• Pathology Resource Center (PRC)
  - Leslie Sobin, James Robb, SAIC-Frederick, Inc., Frederick, MD
  - Phillip Branton, National Cancer Institute, Bethesda, MD
  - John Madden, Duke University, Durham, NC
  - Jim Robb, Mary Kennedy, College of American Pathologists, Northfield, IL
• Comprehensive Data Resource (CDR)
  - Greg Korzeniewski, Charles Shive, Liqun Qi, David Tabor, Sreenath Nampally, SAIC-Frederick, Inc., Frederick, MD
• caHUB Operations Management
  - Steve Buia, Angela Britton, Anna Smith, Karna Robinson, Robin Burges, Kim Valentino, Deborah Bradbury, SAIC-Frederick, Inc., Frederick, MD
  - Kenyon Erickson, Sapient Government Services, Arlington, VA
• Brain Bank
• Laboratory, Data Analysis, and Coordinating Center (LDACC)
  - Kristin Ardlie, Gad Getz, co-PIs; David DeLuca, Taylor Young, Ellen Gelfand, Tim Sullivan, Yan Meng, Ayellet Segre, Jules Maller, Pouya Kheradpour, Luke Ward, Daniel
    MacArthur, Manolis Kellis, The Broad Institute of Harvard and MIT, Inc., Cambridge, MA
• Statistical Methods Development (R01)
  - Jun Liu, co-PI, Harvard University, Boston, MA, USA
  - Jun Zhu, co-PI; Zhidong Tu, Bin Zhang, Mt Sinai School of Medicine, New York, NY
  - Nancy Cox, Dan Nicolae, co-PIs; Eric Gamazon, Haky Im, Anuar Konkashbaev, University of Chicago, Chicago, IL
  - Jonathan Pritchard, PI; Matthew Stevens, Timothèe Flutre, Xiaoquan Wen, University of Chicago, Chicago, IL
  - Emmanouil T. Dermitzakis, co-PI; Tuuli Lappalainen, Pedro Ferreira, University of Geneva, Geneva, Switzerland
  - Roderic Guigo, co-PI; Jean Monlong, Michael Sammeth, Center for Genomic Regulation, Barcelona, Spain
  - Daphne Koller, co-PI; Alexis Battle, Sara Mostafavi, Stanford University, Palo Alto, CA
  - Mark McCarthy, co-PI; Manuel Rivas, Andrew Morris, Oxford University, Oxford, United Kingdom
  - Ivan Rusyn, Andrew Nobel, Fred Wright, co-PIs; Andrey Shabalin, University of North Carolina Chapel Hill, Chapel Hill, NC

US National Institutes of Health
• NCBI dbGaP
  - Mike Feolo, Steve Sherry, Jim Ostell, Nataliya Sharopova, Anne Sturcke, National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD
• Program Management
  - Leslie Derr, Office of Strategic Coordination (Common Fund), Office of the Director, National Institutes of Health, Bethesda, MD
  - Eric Green, Jeffery P. Struewing, Simona Volpi, Joy Boyer, Deborah Colantuoni, National Human Genome Research Institute, Bethesda, MD
  - Thomas Insel, Susan Koester, A. Roger Little, Patrick Bender, Thomas Lehner, National Institute of Mental Health, Bethesda, MD
  - Jim Vaught, Sherry Sawyer, Nicole Lockhart, Chana Rabiner, Joanne Demchok, National Cancer Institute, Bethesda, MD
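
Appendix sketch: the three PrediXscan steps listed in the talk (build predictors of gene expression, validate them in independent data, apply them to phenotype data) can be illustrated on simulated data. This is only a minimal sketch under stated assumptions, not the method used in the talk. The slides do not name a prediction model or a test, so ridge regression and Welch's t-statistic are stand-ins, and the genotypes, effect sizes, sample sizes, and function names below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SNPS = 20
# Hypothetical per-SNP effects on expression of a single gene.
TRUE_WEIGHTS = np.linspace(-1.0, 1.0, N_SNPS)

def simulate(n_samples):
    """Simulate allele counts (0/1/2) and one gene's expression level."""
    genotypes = rng.binomial(2, 0.3, size=(n_samples, N_SNPS)).astype(float)
    expression = genotypes @ TRUE_WEIGHTS + rng.normal(0.0, 1.0, n_samples)
    return genotypes, expression

def fit_ridge(genotypes, expression, penalty=1.0):
    """Closed-form ridge regression weights (assumed stand-in predictor)."""
    x = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    gram = x.T @ x + penalty * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)

def predict_expression(genotypes, weights):
    """Predicted (mean-centered) expression from genotype alone."""
    return (genotypes - genotypes.mean(axis=0)) @ weights

def welch_t(cases, controls):
    """Welch's two-sample t-statistic (cases minus controls)."""
    se = np.sqrt(cases.var(ddof=1) / len(cases)
                 + controls.var(ddof=1) / len(controls))
    return (cases.mean() - controls.mean()) / se

# Step 1: build a predictor in a reference panel with measured expression.
ref_geno, ref_expr = simulate(500)
weights = fit_ridge(ref_geno, ref_expr)

# Step 2: validate the predictor in independent data.
val_geno, val_expr = simulate(300)
validation_r = np.corrcoef(predict_expression(val_geno, weights), val_expr)[0, 1]

# Step 3: apply to a phenotype cohort that has genotypes but no expression data.
gwas_geno, gwas_expr = simulate(2000)
liability = gwas_expr + rng.normal(0.0, 2.0, 2000)  # hypothetical disease model
is_case = liability > liability.mean()
predicted = predict_expression(gwas_geno, weights)
t_stat = welch_t(predicted[is_case], predicted[~is_case])

print(f"validation correlation: {validation_r:.2f}")
print(f"case/control t-statistic for predicted expression: {t_stat:.2f}")
```

In this toy setup the sign of the t-statistic indicates whether predicted expression is higher (positive) or lower (negative) in cases, which is one plausible reading of the signed T-Stat columns in the tables; the slides themselves do not state how those values were computed.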