Noonan

Regulomics I:
Methods to read out regulatory functions
Identifying regulatory functions in genomes
Chr5: 133,876,119 – 134,876,119
Genes
Transcription
• Regulatory elements are not easily detected by sequence analysis
• Examine biochemical correlates of RE activity in cells/tissues:
• Chromatin Immunoprecipitation (ChIP-seq)
• DNase-seq and FAIRE
• Methylated DNA immunoprecipitation (MeDIP)
Identifying regulatory functions in genomes
Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)
Biochemical indicators of regulatory function
1. TF binding
2. Histone modification (H3K27ac, H3K4me3)
3. Chromatin modifiers & coactivators (p300, MLL)
4. DNA looping factors (cohesin)
Regulatory functions are tissue/cell type/time point-specific
From Visel et al. (2009) Nature 461:199
Identifying regulatory functions in genomes
Chr5: 133,876,119 – 134,876,119
Genes
Transcription
Histone mods
TF binding
Methods
• ChIP-seq: TFs, histone mods
• Chromatin accessibility: DNase, FAIRE
From Furey (2012) Nat Rev Genet 13:840
ChIP-seq
[Figure: ChIP-seq workflow with PCR step; ChIP signal and input tracks with peak call]
Align reads to reference
Use peaks of mapped reads to identify binding events
Calling peaks in ChIP-seq data
[Figure: ChIP and input tracks; peak call marks enrichment relative to control]
ChIP-seq is an enrichment method
Requires a statistical framework for determining the significance of enrichment
ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
Input = sonicated chromatin collected prior to immunoprecipitation
There are many ChIP-seq peak callers available
Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)
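As a rough illustration of "enrichment relative to an input control", the sketch below compares per-window read counts in ChIP and input after scaling the input to the ChIP sequencing depth. The function names, the pseudocount, the fold threshold, and the toy counts are all assumptions made for illustration; published peak callers use more careful statistical models.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-window ChIP/input ratio after scaling input to ChIP depth."""
    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both tracks need at least one read")
    # Scale input so that both libraries have the same total read count.
    scale = chip_total / input_total
    return [
        (c + pseudocount) / (i * scale + pseudocount)
        for c, i in zip(chip_counts, input_counts)
    ]


def call_enriched_windows(chip_counts, input_counts, min_fold=4.0):
    """Return indices of windows whose fold enrichment is >= min_fold."""
    folds = fold_enrichment(chip_counts, input_counts)
    return [idx for idx, fold in enumerate(folds) if fold >= min_fold]


# Toy read counts in five consecutive windows (made-up numbers).
chip = [2, 3, 40, 2, 3]
inp = [4, 6, 2, 4, 4]
enriched = call_enriched_windows(chip, inp)
print(enriched)
```

A fixed fold cutoff is only a starting point; it says nothing about significance, which is why a statistical framework is needed.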
Generating ChIP-seq peak profiles
Artifacts:
• Repeats
• PCR duplicates
From Park (2009) Nat Rev Genet 10:669
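One common heuristic for the PCR duplicate artifact is to keep a single read per mapped position and strand. The sketch below assumes reads are already reduced to hypothetical `(chrom, start, strand)` tuples; it is not the procedure of any particular tool.

```python
def remove_pcr_duplicates(reads):
    """Keep the first read seen for each (chrom, start, strand) key."""
    seen = set()
    unique = []
    for chrom, start, strand in reads:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up reads: the first two are identical and count as duplicates.
reads = [
    ("chr5", 100, "+"),
    ("chr5", 100, "+"),
    ("chr5", 100, "-"),
    ("chr5", 250, "+"),
]
deduplicated = remove_pcr_duplicates(reads)
print(len(reads), "->", len(deduplicated))
```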
Assessing statistical significance
Assume read distribution follows a Poisson distribution
Many sites in input data will have some reads by chance
Some sites will have many reads
[Figure axis: # of reads at a site (S)]
Empirical FDR: Call peaks in input (using ChIP as control)
FDR = ratio of # of peaks of given enrichment value called in input vs ChIP
From Pepke et al (2009) Nat Meth 6:S22
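The two ideas on this slide can be sketched in a few lines: a Poisson tail probability for seeing at least S reads at a site, and an empirical FDR taken as the ratio of peaks called in input (with ChIP as control) to peaks called in ChIP at the same enrichment threshold. The function names, the background rate, and the example scores are assumptions for illustration only.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum."""
    if k <= 0:
        return 1.0
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def empirical_fdr(chip_scores, input_scores, threshold):
    """Ratio of input-called peaks to ChIP-called peaks at a threshold.

    chip_scores: enrichment values of peaks called in ChIP vs. input.
    input_scores: enrichment values of peaks called in input vs. ChIP.
    """
    chip_peaks = sum(score >= threshold for score in chip_scores)
    input_peaks = sum(score >= threshold for score in input_scores)
    if chip_peaks == 0:
        return None  # FDR is undefined when nothing is called in ChIP
    return input_peaks / chip_peaks


# Probability of >= 10 reads at a site when 2 are expected by chance.
p_value = poisson_sf(10, 2.0)
# Made-up enrichment values for peaks called in each direction.
fdr = empirical_fdr([10, 12, 3, 8], [2, 9, 1, 1], threshold=8)
print(p_value, fdr)
```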
Assessing statistical significance
Sequencing depth matters:
[Figure axis: # of reads at a site (S)]
From Park (2009) Nat Rev Genet 10:669
ChIP-seq signal profiles vary depending on factor
[Figure: signal profiles for transcription factors, Pol II, and histone mods]
From Park (2009) Nat Rev Genet 10:669
Quantitative analysis of ChIP-seq signal profiles
[Figure: ChIP-seq signal at 20,000 bound sites in HeLa and K562; clustering separates sites strongly marked in HeLa, sites strongly marked in both, and sites strongly marked in K562]
ChIP-seq analysis workflow
From Park (2009) Nat Rev Genet 10:669
Interpreting ChIP-seq datasets
Requires some prior knowledge
• TF function
• Histone modification
• Potential target genes
Exploit existing annotation
• Promoter locations
• Known binding sites
• Known histone modification maps
Example from PS1: CTCF and RAD21 (cohesin)
CTCF and cohesin co-occupy many sites
Promoters
Insulators
Enhancers
From Kagey et al (2010) Nature 467:430
Promoter
Enhancers?
CTCF: marks insulators and promoters
RAD21 (cohesin): marks insulators, promoters and enhancers
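Co-occupancy of two factors can be checked by intersecting their peak sets. The sketch below is a minimal version, assuming peaks are hypothetical `(chrom, start, end)` tuples with half-open coordinates; the example coordinates are made up and are not taken from the problem set data.

```python
def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def co_occupied(peaks_a, peaks_b):
    """Return peaks in peaks_a that overlap at least one peak in peaks_b."""
    return [a for a in peaks_a if any(overlaps(a, b) for b in peaks_b)]


# Made-up peak coordinates for illustration.
ctcf = [("chr5", 100, 300), ("chr5", 1000, 1200), ("chr7", 50, 150)]
rad21 = [("chr5", 250, 400), ("chr5", 5000, 5200), ("chr7", 150, 300)]
shared = co_occupied(ctcf, rad21)
print(shared)
```

The all-against-all comparison is quadratic in the number of peaks; for genome-scale peak sets, sorted intervals or a dedicated interval tool would be the more practical choice.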
Discovering regulatory functions specific to a biological state
Limb Brain
Function?
Assign enhancers to genes based on proximity (not ideal)
GREAT: bejerano.stanford.edu/great/
Gene ontology annotation assigned to regulatory sequences
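Proximity-based assignment can be sketched as picking the gene whose transcription start site (TSS) lies closest to the enhancer midpoint. The gene names, coordinates, and tuple layout below are hypothetical, and GREAT applies its own association rules, which this sketch does not reproduce.

```python
def nearest_gene(enhancer, genes):
    """Return the name of the gene with the TSS closest to the enhancer.

    enhancer: (chrom, start, end)
    genes: list of (name, chrom, tss)
    """
    chrom, start, end = enhancer
    midpoint = (start + end) // 2
    candidates = [
        (abs(tss - midpoint), name)
        for name, gene_chrom, tss in genes
        if gene_chrom == chrom
    ]
    if not candidates:
        return None  # no annotated gene on this chromosome
    return min(candidates)[1]


# Hypothetical gene annotation and enhancer.
genes = [("GENE_A", "chr5", 1000), ("GENE_B", "chr5", 9000), ("GENE_C", "chr7", 1200)]
assigned = nearest_gene(("chr5", 7000, 7400), genes)
print(assigned)
```

As the slide notes, proximity is not ideal: an enhancer may act on a gene that is not its nearest neighbor.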
TF motif elicitation from ChIP-seq data
CTCF
~20,000 binding sites identified by ChIP:
MEME suite:
http://meme.nbcr.net/meme/
From Furey (2012) Nat Rev Genet 13:840
Single TF binding events may not indicate regulatory function
• Many TFs are present at high concentrations in the nucleus
• TF motifs are abundant in the genome
• Single TF binding events may be incidental
[Figure label: Enhancer-associated histone modification]
Mapping chromatin accessibility
DNase I
FAIRE
From Furey (2012) Nat Rev Genet 13:840
DNase I hypersensitivity identifies TF binding events
From Furey (2012) Nat Rev Genet 13:840
DNase I hypersensitivity identifies regulatory elements
DNase I hypersensitive sites
Song et al., Genome Res 21:1757 (2011)
De novo TF motif discovery by DNase I hypersensitivity mapping
In human ES cells:
From Neph (2012) Nature 489:83
De novo TF motif discovery by DNase I hypersensitivity mapping
Across tissue types:
From Neph (2012) Nature 489:83
Capturing long-range regulatory interactions
From Visel et al. (2009) Nature 461:199
Chromosome Conformation Capture methods
• ChIP for specific factors, then sequence: ChIA-PET
• Sequence: Hi-C
Long-range regulatory interactions mediated by specific factors:
RNA PolII
From Kieffer-Kwon et al. (2013) Cell 155:1507
Long-range regulatory interactions mediated by specific factors:
Cohesin
Int – Intergenic or intronic
Pr – Promoter
Ex – Exonic
From DeMare et al. (2013) Genome Res. 23:1224
Summary
• Relevant overview papers on ChIP-seq and DNase-seq posted on class wiki
• Wednesday: Epigenetics and the histone code