Big Data Opportunities and Challenges in Human Disease Genetics & Genomics
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory

Big data: Opportunities & Challenges in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
  – Case study: Schizophrenia, Alzheimer’s
  – Collaboration & sharing: personal & technological

Bridging the knowledge gap from genetics to disease
[Figure: path from genetic variant to disease. Labels: genetic variant (CATGACTG / CATGCCTG); tissue / cell type (heart, retina, cortex, lung, blood, skin, nerve); control regions and chromatin states (promoter, enhancer, insulator, silencer); target genes (protein, miRNA, ncRNA; TIMP3); intermediate effects (lipids, tension, eye drusen, metabolism, drug response); disease. Factors: circuitry, environment.]
Requires: systematic understanding of genome function

The most complete map of human gene regulation
• 2.3M regulatory elements across 127 tissue/cell types
• High-resolution map of individual regulatory motifs
• Circuitry: regulators → regions → motifs → target genes

Non-coding variants lie in tissue-specific regulatory regions
• Yield new insights on relevant tissues and pathways
• Enable linking non-coding elements to relevant target genes
• Provide a mechanistic basis for developing therapeutics

Control regions harbor 1000s of weak-effect disease SNPs
• GWAS top hits explain only a small fraction of trait heritability
• Functional enrichments extend well past genome-wide significance

Bayesian integration of weak effects → disease modules
[Figure: network of genes and SNPs. Legend: poorly ranked SNP nearby; highly ranked SNP nearby; disease gene; genetic association; disease SNP.]
• MAZ has no direct association, but clusters w/ many T1D hits
• MAZ is indeed a known regulator of insulin expression

Brain methylation changes in Alzheimer’s patients
[Figure: MAP (Memory and Aging Project) + ROS (Religious Order Study), dorsolateral PFC. Data: genotype (1M SNPs x 700 ind.); reference chromatin states; methylation (450k probes x 700 ind.).]
• Variation in methylation patterns is largely genotype driven
• Global signature of repression in 1000s of regulatory regions: hypermethylation, enhancer states, brain regulator targets

Scaling of QTL discovery power w/ sample size
• Number of meQTLs continues to increase linearly
• Weak-effect meQTLs: median R² < 0.1 after 400 indiv.

Inflection point in complex trait GWAS
[Figure: Schizophrenia GWAS, number of significant loci (y-axis, 0 to 120) by year (x-axis, 2006 to 2015). Annotated points: Incl. SWE + CLOZUK (~60K); WCPG Hamburg 2012 (~65K); Freeze Jan. 2013 (~70K); Freeze May 2013 (~80K); Incl. replication (~100K).]
• 3,500 cases: 0 loci
• 10,000 cases: 5 loci
• 35,000 cases: 62 loci!
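
One way to see why the locus count can sit near zero and then climb quickly is a rough power calculation for a single weak-effect variant. The sketch below is only an illustration under stated assumptions: it uses a normal approximation to the association test, the conventional genome-wide threshold of 5e-8, and a hypothetical variant explaining 0.05% of trait variance (that effect size is not taken from the slides). It treats n as total sample size and is not the analysis used in the schizophrenia studies.

```python
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def gwas_power(n_samples: int, variance_explained: float,
               alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power of a two-sided single-variant association test.

    Assumption: the test statistic is approximately normal with mean
    sqrt(n_samples * variance_explained) and unit variance.
    """
    std_normal = NormalDist()
    # Two-sided critical value (about 5.45 for alpha = 5e-8).
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Expected shift of the statistic for a true effect of this size.
    shift = (n_samples * variance_explained) ** 0.5
    # Probability of landing beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    r2 = 0.0005  # hypothetical: variant explains 0.05% of trait variance
    for n in (3_500, 10_000, 35_000, 100_000):
        print(f"n = {n:>7,}: power = {gwas_power(n, r2):.4f}")
```

Under these assumptions power is essentially zero at a few thousand samples and approaches one near a hundred thousand. If many variants have similarly weak effects, they would cross the threshold over a similar range of sample sizes, which is consistent with the inflection the slide describes.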

Similar inflection point found in every complex trait!
Significantly associated regions (p < 5e-08), by multiple of the base sample size:

Trait                             1x   2x   3x   9x   18x
Adult height (per 5000/5000)       0    2    7   68   180
Crohn’s (per 1000/1000)            2    4    5   51     -
Schizophrenia (per 3000/3000)      1    2    6   62     -

Same story in:
• Type 1 diabetes
• Type 2 diabetes
• Serum cholesterol level
• Every common chronic disease

Larger samples lead to new biological insights
• Proof that Schizophrenia is a heritable, medical disorder
• Genetic architecture similar to non-brain diseases and traits
• Many genes → recognition of key pathways and processes
  – Voltage-gated calcium channels (CACNA1C, CACNA1D, CACNA1I, CACNB2)
  – Proteins interacting with FMRP, fragile X gene
  – Neuron organization: Postsynaptic density, dendritic spine heads
  – Enhancers: brain (angular gyrus, inferior temporal lobe), immune

Big data: Opportunities & Challenges in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
  – Collaboration, consortia, sharing of datasets
  – Case study: Schizophrenia, Alzheimer’s