Epigenomics and `chromatin state`

Manolis Kellis: Research synopsis
• Why biology in a computer science group?
• Big biological questions (brief overview, 1 slide each):
1. Interpreting the human genome.
2. Revealing the logic of gene regulation.
3. Principles of evolutionary change.
• Underlying computational techniques:
– Comparative genomics: evolutionary signatures
– Regulatory genomics: motifs, networks, models
– Epigenomics: chromatin states, dynamics, disease (vignette)
– Phylogenomics: evolution at the genome scale
• Defining characteristics of research program:
– Genome-wide rules, exploit nature of problems, interdisciplinary collaborations, biology impact
(1) Comparative genomics: evolutionary signatures
Protein-coding signatures
• 1000s new coding exons
• Translational readthrough
• Overlapping constraints
Non-coding RNA signatures
• Novel structural families
• Targeting, editing, stability
• Structures in coding exons
microRNA signatures:
• Novel/expanded miR families
• miR/miR* arm cooperation
• Sense/anti-sense switches
Regulatory motif signatures
• Systematic motif discovery
• Regulatory motif instances
• TF/miRNA target networks
• Single binding-site resolution
(2) Regulatory genomics: circuits, predictive models
ENCODE/modENCODE
• 4-year effort, dozens of experimental labs
• Integrative analysis
• Systematic genome annotation
• Flagship NIH project
Predictive models of gene regulation
• Infer networks
• Predict function
• Predict regulators
• Predict gene expression
• Initial annotation of the non-coding genome, from 20% to 70%
• Systems biology of an animal genome possible for the first time
• Students and postdocs are co-first authors and hold leadership roles
(3) Phylogenomics: Bayesian gene-tree reconstruction
New phylogenomic pipeline
• Bayesian formulation
• Generative model
Two components of gene evolution
1. Family rate: Fj ~ gamma(α, β)
– Selective pressures on gene function
2. Species-specific rates: Si ~ normal(μi, σi)
– Population dynamics of the species
Length l, topology T, reconciliation R
Alignment data D, species-level parameters θ
• Sequence likelihood: HKY model (traditional)
• Branch length prior: learned Fj, Si distributions
• Topology prior: birth-death process
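The two rate components above can be illustrated with a small sampling sketch. This is not the pipeline's actual code. It assumes that a branch length is the product of the family rate, the species-specific rate, and a branch time, that β is a rate parameter, and that non-positive normal draws are redrawn. All function names and parameter values are hypothetical.

```python
import random

def sample_family_rate(alpha, beta, rng):
    """Draw a family rate F_j ~ gamma(alpha, beta), reading beta as a rate (assumption)."""
    # random.gammavariate expects (shape, scale), so the scale is 1 / beta.
    return rng.gammavariate(alpha, 1.0 / beta)

def sample_species_rate(mu, sigma, rng):
    """Draw a species-specific rate S_i ~ normal(mu_i, sigma_i)."""
    # Assumption: rates must be positive, so non-positive draws are redrawn.
    while True:
        rate = rng.gauss(mu, sigma)
        if rate > 0:
            return rate

def sample_branch_lengths(species_params, branch_times, alpha, beta, seed=0):
    """Sample one gene family's branch lengths on a fixed species tree.

    Assumption: length = family rate * species rate * branch time.
    """
    rng = random.Random(seed)
    family_rate = sample_family_rate(alpha, beta, rng)
    lengths = {}
    for species, (mu, sigma) in species_params.items():
        species_rate = sample_species_rate(mu, sigma, rng)
        lengths[species] = family_rate * species_rate * branch_times[species]
    return family_rate, lengths

# Hypothetical parameters for three species branches.
species_params = {"human": (1.0, 0.1), "mouse": (2.5, 0.3), "dog": (1.4, 0.2)}
branch_times = {"human": 0.9, "mouse": 0.9, "dog": 1.0}
family_rate, lengths = sample_branch_lengths(
    species_params, branch_times, alpha=2.0, beta=2.0
)
print(family_rate, lengths)
```

A single family rate scales every branch of that family, while each species contributes its own factor, which is what lets the two components be learned separately across many gene families.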
Vignette: Epigenomics
Jason Ernst, Pouya Kheradpour
Ernst and Kellis, Nature Biotech, 2010
Ernst, Kheradpour et al, Nature, 2011 (in press)
Epigenomics and ‘chromatin state’ signatures
[Figure labels: DNA, histone tails, chromatin ‘marks’; state groups: promoter states, transcribed states, active intergenic, repressed]
• Learn de novo combinations of chromatin marks
• Reveal functional elements
• Use for genome annotation
• Use for studying dynamics across many cell types
ChromHMM: learning ‘hidden’ chromatin states
[Figure: DNA divided into 200bp intervals. Top track: observed chromatin marks (K4me1, K27ac, K4me3, K36me3), called based on a Poisson distribution. Bottom track: most likely hidden state (1 to 6) for each interval. Annotated regions: enhancer, transcription start site, transcribed region.]
[Figure: high-probability chromatin marks in each state, for states 1 to 6, with emission probabilities between 0.7 and 0.9.]
Each state: vector of emissions, vector of transitions
All probabilities are learned de novo from chromatin data alone (Baum-Welch, a.k.a. EM)
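A minimal sketch of the two steps described on this slide: calling marks against a Poisson background, then decoding a hidden state per interval. It is not ChromHMM's code. The cutoff value, the independent per-mark emissions, and the use of Viterbi decoding are assumptions (the slide only says "most likely hidden state"), and the toy parameters are fixed by hand rather than learned with Baum-Welch.

```python
import math

def poisson_threshold(lam, p_cutoff=1e-4):
    """Smallest count k with P(X >= k) < p_cutoff for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)  # P(X = 0)
    tail = 1.0            # P(X >= 0)
    while tail >= p_cutoff:
        tail -= pmf       # now P(X >= k + 1)
        k += 1
        pmf *= lam / k    # now P(X = k)
    return k

def binarize(counts, lam, p_cutoff=1e-4):
    """Call a mark present (1) where the read count reaches the threshold."""
    k = poisson_threshold(lam, p_cutoff)
    return [1 if c >= k else 0 for c in counts]

def log_emission(obs, probs):
    """Log-probability of a binary mark vector, marks treated as independent."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, init, trans, emit):
    """Most likely state path; all probabilities must lie strictly in (0, 1)."""
    n = len(init)
    score = [math.log(init[s]) + log_emission(observations[0], emit[s])
             for s in range(n)]
    back = []
    for obs in observations[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + log_emission(obs, emit[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model over two marks (K4me3, K36me3).
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit = [[0.9, 0.1],   # state 0: promoter-like, mostly K4me3
        [0.1, 0.9]]   # state 1: transcribed-like, mostly K36me3

# Hypothetical read counts per 200bp interval, background rate of 2 reads.
k4me3 = binarize([15, 12, 1, 0], lam=2.0)
k36me3 = binarize([0, 2, 14, 11], lam=2.0)
path = viterbi(list(zip(k4me3, k36me3)), init, trans, emit)
print(path)  # [0, 0, 1, 1]
```

Each state here is exactly what the slide describes: one vector of emission probabilities (a row of `emit`) and one vector of transition probabilities (a row of `trans`).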
Chromatin states dynamics across nine cell types
• State definitions are cell-type invariant
– Same combinations consistently found
• State locations are cell-type specific
– Can study pair-wise or multi-way changes
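Because the state definitions are shared across cell types, per-interval state calls can be compared directly. The sketch below shows one simple way to tabulate pair-wise changes and to find intervals that vary in any cell type. The state labels, the helper names, and the toy data are hypothetical.

```python
from collections import Counter

def pairwise_state_changes(states_a, states_b):
    """Count (state in A, state in B) pairs over aligned genomic intervals."""
    if len(states_a) != len(states_b):
        raise ValueError("state tracks must cover the same intervals")
    return Counter(zip(states_a, states_b))

def variable_intervals(assignments):
    """Indices of intervals whose state differs in at least one cell type."""
    tracks = list(assignments.values())
    return [i for i, column in enumerate(zip(*tracks)) if len(set(column)) > 1]

# Hypothetical state calls for four intervals in three cell types.
assignments = {
    "GM12878": ["enhancer", "promoter", "repressed", "transcribed"],
    "K562":    ["repressed", "promoter", "repressed", "transcribed"],
    "H1":      ["repressed", "promoter", "enhancer", "transcribed"],
}

changes = pairwise_state_changes(assignments["GM12878"], assignments["K562"])
print(changes[("enhancer", "repressed")])  # 1
print(variable_intervals(assignments))     # [0, 2]
```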
Multi-cell activity profiles and their correlations
[Figure: activity profiles across nine cell types (HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1). Panels and legends: gene expression (ON / OFF); chromatin states (active enhancer / repressed); active TF motif enrichment (motif enrichment / motif depletion); TF regulator expression (TF On / TF Off); dip-aligned motif biases (motif aligned / flat profile).]
Chromatin state & gene expression → link enhancers and target genes
TF motif enrichment & TF expression → reveal activators / repressors
Coordinated activity reveals enhancer links
[Figure panels: enhancer activity, gene activity, predicted regulators; activity signatures for each TF]
• Enhancer networks: Regulator → enhancer → target gene
• Ex1: Oct4 predicted activator of embryonic stem (ES) cells
• Ex2: Ets activator of GM/HUVEC (but not either one alone)
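The linking idea above can be sketched with plain correlations across cell types. This is an illustration under stated assumptions, not the published method: it uses Pearson correlation, fixed cutoffs, and made-up profiles. The names `link_targets` and `classify_regulator` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def link_targets(enhancer_activity, gene_expression, min_r=0.7):
    """Pair each enhancer with genes whose expression tracks its activity."""
    links = []
    for enhancer, activity in enhancer_activity.items():
        for gene, expression in gene_expression.items():
            r = pearson(activity, expression)
            if r >= min_r:
                links.append((enhancer, gene, r))
    return sorted(links, key=lambda link: -link[2])

def classify_regulator(motif_enrichment, tf_expression, min_abs_r=0.7):
    """Activator if TF expression tracks motif enrichment, repressor if opposed."""
    r = pearson(motif_enrichment, tf_expression)
    if r >= min_abs_r:
        return "activator"
    if r <= -min_abs_r:
        return "repressor"
    return "unclear"

# Cell-type order: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1.
# All values below are hypothetical.
enhancer_activity = {"enh1": [1, 0, 1, 0, 0, 0, 0, 0, 0]}
gene_expression = {
    "geneA": [5.0, 0.2, 4.0, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1],
    "geneB": [0.1, 0.2, 0.1, 4.0, 0.2, 0.1, 3.5, 0.2, 0.1],
}
links = link_targets(enhancer_activity, gene_expression)
print([(enhancer, gene) for enhancer, gene, _ in links])  # [('enh1', 'geneA')]

# A TF expressed only in H1 whose motif is enriched only in H1 enhancers.
motif_enrichment = [0.0, 0.1, -0.1, 0.0, 0.1, 0.0, -0.1, 0.0, 2.0]
tf_expression = [0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 6.0]
print(classify_regulator(motif_enrichment, tf_expression))  # activator
```

A realistic analysis would likely restrict candidate targets to genes near each enhancer and assess significance against shuffled profiles. Both steps are omitted here.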
Revisiting disease-associated variants
• Disease-associated SNPs enriched for enhancers in relevant cell types
• E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator
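One simple way to quantify "enriched for enhancers" is a fold enrichment over a uniform background, with a binomial tail probability. The sketch below assumes SNPs fall uniformly at random along the genome under the null, which ignores linkage disequilibrium and genotyping-array coverage. The counts are hypothetical and are not results from the talk.

```python
from math import comb

def fold_enrichment(overlap, n_snps, enhancer_bp, genome_bp):
    """Observed SNPs in enhancers divided by the count expected by chance."""
    expected = n_snps * enhancer_bp / genome_bp
    return overlap / expected

def binomial_tail(overlap, n_snps, p):
    """P(X >= overlap) for X ~ Binomial(n_snps, p)."""
    return sum(
        comb(n_snps, k) * p**k * (1 - p) ** (n_snps - k)
        for k in range(overlap, n_snps + 1)
    )

# Hypothetical example: 100 disease-associated SNPs, 12 of them in enhancers
# of the relevant cell type, where those enhancers cover 2% of a 3 Gb genome.
n_snps, overlap = 100, 12
enhancer_bp, genome_bp = 60_000_000, 3_000_000_000

fold = fold_enrichment(overlap, n_snps, enhancer_bp, genome_bp)
p_value = binomial_tail(overlap, n_snps, enhancer_bp / genome_bp)
print(fold)  # 6.0
print(p_value)
```

Repeating the calculation with enhancers from each cell type would show whether the enrichment is specific to the relevant one.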
Contributions
We aim to further our understanding of the human genome by computational integration of large-scale functional and comparative genomics datasets.
• We use comparative genomics of multiple related species to recognize evolutionary signatures of protein-coding genes, RNA structures, microRNAs, regulatory motifs, and individual regulatory elements.
• We use combinations of epigenetic modifications to define chromatin states associated with distinct functions, including promoter, enhancer, transcribed, and repressed regions, each with distinct functional properties.
• We develop phylogenomic methods to study differences between species and to uncover evolutionary mechanisms for the emergence of new gene functions.
Our methods have led to numerous new insights on diverse regulatory mechanisms, uncovered evolutionary principles, and provided mechanistic insights for previously uncharacterized disease-associated SNPs.
[Figure: publication venues shown alongside the text: Science, Nature, Nature Biotech, Nature Gen, PLoS Genetics, MBE, Genome Research, PLoS Comp. Bio., Genes & Development, PNAS, BMC Evo. Bio., ACM TKDD, RECOMB, J. Comp. Bio., BioChem, WBpress; one entry marked "In review"]