Genomic and epigenomic signatures for interpreting complex disease

Big Data Opportunities and Challenges
in Human Disease Genetics & Genomics
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
Big data Opportunities & Challenges
in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
– Epigenomics: Enhancers, networks, regulators, motifs
– Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
– Effects are very small, huge number of hypotheses
– Much larger cohorts are needed, consent limitations
– Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
– Case study: Schizophrenia, Alzheimer’s
– Collaboration & sharing: personal & technological
Bridging the knowledge gap from genetics to disease
[Figure: a genetic variant (CATGACTG → CATGCCTG) acts through control regions (promoter, enhancer, insulator, silencer) and chromatin states that differ by tissue and cell type (heart, retina, cortex, lung, blood, skin, nerve), on target genes (protein-coding genes such as TIMP3, miRNAs, ncRNAs), producing intermediate effects (lipids, tension, eye drusen, metabolism, drug response) that, together with circuitry, environment, and other factors, lead to disease]
Requires: systematic understanding of genome function
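The example variant (CATGACTG → CATGCCTG) shows how a single-base change inside a control region can weaken a regulatory motif. A minimal sketch of that idea, assuming a hypothetical 8-bp position weight matrix (PWM) in log-odds units whose consensus is CATGACTG; the matrix values and the name `pwm_score` are illustrative, not taken from the talk:

```python
# Hypothetical log-odds PWM for an 8-bp motif (consensus CATGACTG).
# Each position maps base -> score; the values are illustrative only.
def _position(best: str) -> dict:
    """Score +1.5 for the consensus base and -1.0 for the other three."""
    return {b: (1.5 if b == best else -1.0) for b in "ACGT"}

PWM = [_position(b) for b in "CATGACTG"]

def pwm_score(seq: str, pwm=PWM) -> float:
    """Sum the per-position log-odds scores of `seq` against `pwm`."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(pos[base] for pos, base in zip(pwm, seq.upper()))

ref, alt = "CATGACTG", "CATGCCTG"
delta = pwm_score(alt) - pwm_score(ref)
print(f"ref={pwm_score(ref):.1f} alt={pwm_score(alt):.1f} delta={delta:.1f}")
# prints: ref=12.0 alt=9.5 delta=-2.5
```

A negative `delta` indicates that the alternate allele matches the motif less well. Real analyses scan both strands and all offsets, and compare scores against a background model.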
The most complete map of human gene regulation
• 2.3M regulatory elements across 127 tissue/cell types
• High-resolution map of individual regulatory motifs
• Circuitry: regulatorsregionsmotifstarget genes
Non-coding variants lie in tissue-specific regulatory regions
• Yield new insights on relevant tissues and pathways
• Enable linking non-coding elements to relevant target genes
• Provide a mechanistic basis for developing therapeutics
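Asking which tissues' regulatory regions harbor disease variants starts with an overlap count per tissue. The following is a minimal sketch using binary search over sorted, non-overlapping half-open intervals on a single chromosome. The coordinates, SNP positions, and helper names are hypothetical. A real analysis would compare each fraction against a background rate, such as the share of all SNPs or of the genome covered by that tissue's regions:

```python
from bisect import bisect_right

def in_regions(pos, starts, ends):
    """True if pos lies in some [start, end); intervals sorted and non-overlapping."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def fraction_overlapping(variants, regions):
    """Fraction of variant positions that fall inside the given regions."""
    regions = sorted(regions)
    starts = [s for s, _ in regions]
    ends = [e for _, e in regions]
    hits = sum(in_regions(p, starts, ends) for p in variants)
    return hits / len(variants)

# Hypothetical single-chromosome coordinates, for illustration only.
tissue_regions = {"Blood": [(100, 200), (500, 650)], "Cortex": [(300, 350)]}
disease_snps = [150, 520, 600, 320, 900]
for tissue, regions in tissue_regions.items():
    print(tissue, fraction_overlapping(disease_snps, regions))
```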
Control regions harbor 1000s of weak-effect disease SNPs
• Top GWAS hits explain only a small fraction of trait heritability
• Functional enrichments persist for SNPs well below genome-wide significance
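Enrichment that persists below genome-wide significance can be checked by binning SNPs by association strength. Within each bin, compare the share of SNPs overlapping regulatory annotations with the share among all SNPs. The sketch below uses hypothetical SNPs. The top cutoff of 7.3 approximates −log10(5 × 10⁻⁸). Fold enrichment above 1 in the sub-threshold bins is the pattern this slide describes:

```python
import math

def enrichment_by_bin(snps, thresholds):
    """Fold enrichment of regulatory-region overlap per association-strength bin.

    snps: list of (p_value, in_regulatory_region) pairs.
    thresholds: descending -log10(p) cutoffs, e.g. [7.3, 5, 3]; each bin holds
    SNPs with cutoff <= -log10(p) < previous cutoff.
    """
    background = sum(flag for _, flag in snps) / len(snps)
    result, upper = {}, math.inf
    for cutoff in thresholds:
        in_bin = [flag for p, flag in snps if cutoff <= -math.log10(p) < upper]
        if in_bin:
            result[cutoff] = (sum(in_bin) / len(in_bin)) / background
        upper = cutoff
    return result

# Hypothetical SNPs: (p-value, overlaps a regulatory annotation?)
snps = [(1e-9, True), (1e-8, True), (1e-6, True), (1e-6, True), (1e-6, False),
        (1e-4, True), (1e-4, False), (0.3, False), (0.5, False), (0.8, False)]
print(enrichment_by_bin(snps, [7.3, 5, 3]))
```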
Bayesian integration of weak effects  disease modules
[Figure: disease genes in a network, each scored by the genetic association of nearby disease SNPs (highly ranked vs. poorly ranked SNP nearby)]
• MAZ has no direct association, but clusters with many T1D hits
• MAZ is indeed a known regulator of insulin expression
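The slide does not spell out the model. One simple way to illustrate a gene with no direct signal being prioritized by its neighbors is to raise that gene's prior probability when its network neighbors carry strong association evidence, then combine the prior with the gene's own Bayes factor (BF). Everything here is a hypothetical illustration: the gene names, the BFs, and the prior rule (`base`, `boost`, `cap`). It is not the method used in the talk:

```python
def posterior(prior: float, bayes_factor: float) -> float:
    """Posterior probability from a prior probability and a Bayes factor."""
    odds = prior / (1 - prior) * bayes_factor
    return odds / (1 + odds)

def network_prior(gene, neighbors, bf, base=0.01, boost=0.05, cap=0.5):
    """Hypothetical prior: base rate plus `boost` per neighbor with BF > 10, capped."""
    strong = sum(1 for n in neighbors.get(gene, []) if bf.get(n, 1.0) > 10)
    return min(base + boost * strong, cap)

# Toy data: GENE_M has no direct signal (BF = 1) but strongly associated neighbors.
bf = {"GENE_M": 1.0, "H1": 50.0, "H2": 30.0, "H3": 20.0, "H4": 2.0}
neighbors = {"GENE_M": ["H1", "H2", "H3", "H4"]}

flat = posterior(0.01, bf["GENE_M"])
networked = posterior(network_prior("GENE_M", neighbors, bf), bf["GENE_M"])
print(f"without network: {flat:.2f}, with network: {networked:.2f}")
```

With BF = 1 the posterior equals the prior. Any lift for GENE_M therefore comes entirely from the network context, which mirrors the MAZ example.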
Brain methylation changes in Alzheimer’s patients
MAP (Memory and Aging Project) + ROS (Religious Orders Study) cohorts, dorsolateral prefrontal cortex (PFC):
• Genotype: 1M SNPs × 700 individuals
• Methylation: 450k probes × 700 individuals
• Reference chromatin states
• Variation in methylation patterns is largely genotype-driven
• Global signature of repression in 1000s of regulatory regions:
hypermethylation, enhancer states, brain regulator targets
Scaling of QTL discovery power with sample size
• Number of meQTLs discovered continues to increase linearly with sample size
• Weak-effect meQTLs: median R² < 0.1 beyond 400 individuals
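Weaker meQTLs keep appearing as cohorts grow because detection depends on roughly n·R². For a variant explaining a fraction R² of variance, the test statistic has noncentrality λ ≈ n·R² / (1 − R²). Solving for the power target gives R² = λ / (n + λ). The sketch below applies this normal approximation. The per-test threshold (α = 1e-8) and the 80% power target are assumptions chosen for illustration; they are not the study's settings:

```python
from statistics import NormalDist

def min_detectable_r2(n: int, alpha: float = 1e-8, power: float = 0.8) -> float:
    """Smallest variance explained (R^2) detectable with the given power.

    Normal approximation: lam ~= n * R^2 / (1 - R^2), and detection at the
    target power requires sqrt(lam) = z_{1 - alpha/2} + z_{power}.
    """
    z = NormalDist()
    lam = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return lam / (n + lam)

for n in (100, 200, 400, 700):
    print(n, round(min_detectable_r2(n), 3))
```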
Inflection point in complex trait GWAS
[Figure: Schizophrenia GWAS, number of significant loci by year (2006 to 2015; y-axis 0 to 120), annotated with successive sample sizes: incl. SWE + CLOZUK (~60K), WCPG Hamburg 2012 (~65K), Freeze Jan. 2013 (~70K), Freeze May 2013 (~80K), incl. replication (~100K)]
3,500 cases  0 loci
10,000 cases  5 loci
35,000 cases  62 loci!
Similar inflection point found in every complex trait!
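An inflection point is what one expects when a trait has many loci, each with a small effect. Each locus's power rises steeply from near 0 to near 1 once n·R² crosses a level set by genome-wide significance (p < 5 × 10⁻⁸), so the expected number of discoveries climbs sharply past a certain sample size. The sketch below rests on several assumptions: a hypothetical architecture of 500 loci with geometrically decreasing variance explained, and a quantitative-trait normal approximation. It is not fit to the schizophrenia data:

```python
from statistics import NormalDist

N = NormalDist()
Z_GW = N.inv_cdf(1 - 5e-8 / 2)  # two-sided genome-wide significance threshold

def power(n: int, r2: float) -> float:
    """Approximate power to detect a locus explaining r2 of variance at p < 5e-8."""
    ncp = (n * r2 / (1 - r2)) ** 0.5
    return N.cdf(ncp - Z_GW) + N.cdf(-ncp - Z_GW)

# Hypothetical architecture: 500 loci with small, geometrically decreasing effects.
effects = [0.002 * 0.99 ** k for k in range(500)]

def expected_loci(n: int) -> float:
    """Expected number of genome-wide significant loci at sample size n."""
    return sum(power(n, r2) for r2 in effects)

for n in (5_000, 10_000, 20_000, 40_000, 80_000):
    print(n, round(expected_loci(n), 1))
```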
Significantly associated regions (p < 5e-08) by relative sample size:

Sample size   Adult height      Crohn's           Schizophrenia
              (per 5000/5000)   (per 1000/1000)   (per 3000/3000)
1x            0                 2                 1
2x            2                 4                 2
3x            7                 5                 6
9x            68                51                62
18x           180               -                 -

Same story in:
• Type 1 diabetes
• Type 2 diabetes
• Serum cholesterol level
• Every common chronic disease
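The p < 5e-08 cutoff is the conventional genome-wide significance level. It corresponds to a Bonferroni correction of a 0.05 family-wise error rate spread over roughly one million independent common-variant tests across the genome:

```latex
\alpha_{\text{GW}} = \frac{\alpha}{M} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```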
Larger samples lead to new biological insights
• Proof that schizophrenia is a heritable medical disorder
• Genetic architecture similar to non-brain diseases and traits
• Many genes  recognition of key pathways and processes
•
•
•
•
Voltage-gated calcium channels (CACNA1C, CACNA1D, CACNA1I, CACNB2)
Proteins interacting with FMRP, fragile X gene
Neuron organization: Postsynaptic density, dendritic spine heads
Enhancers: brain (angular gyrus, inferior temporal lobe), immune
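Pathway-level signals such as these are commonly assessed with a gene-set enrichment test. The question is whether more GWAS-implicated genes fall in a pathway than chance would predict. The sketch below is a one-sided hypergeometric test, written with exact integer arithmetic. All counts are hypothetical placeholders, not the schizophrenia results:

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K in the set, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

# Hypothetical: 20,000 genes, a 100-gene pathway, 300 GWAS-implicated genes, 8 overlap.
p = hypergeom_sf(k=8, N=20_000, K=100, n=300)
print(f"enrichment p = {p:.2e}")
```

The expected overlap here is 300 × 100 / 20,000 = 1.5 genes, so an observed overlap of 8 yields a small p-value.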
Big data Opportunities & Challenges
in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
– Epigenomics: Enhancers, networks, regulators, motifs
– Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
– Effects are very small, huge number of hypotheses
– Much larger cohorts are needed, consent limitations
– Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
– Collaboration, consortia, sharing of datasets
– Case study: Schizophrenia, Alzheimer’s