Xiaole Shirley Liu
STAT115, STAT215
• Chromatin ImmunoPrecipitation + microarray or high throughput sequencing
• Detect genome-wide in vivo location of TF and other DNA-binding proteins
– Find all the DNA sequences bound by TF-X?
– Cook all the dishes with cinnamon
• Can learn the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster
Reverse crosslink and DNA purification
Sequence millions of 30-mer ends of fragments (ChIP-DNA plus noise)
Map 30-mers back to the genome
MACS: Model-based Analysis for ChIP-Seq
• Use confident peaks to model shift size
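A minimal sketch of the shift-size idea, not MACS's actual implementation. At a confident peak, plus-strand tags pile up on one side of the binding site and minus-strand tags on the other. Here the distance d between the two piles is estimated from their medians (an assumed simplification standing in for the modes of the two tag distributions), and each tag is then moved d/2 toward its 3' end. The function names and toy coordinates are hypothetical.

```python
from statistics import median

def estimate_shift(plus_tags, minus_tags):
    """Estimate d, the distance between the plus- and minus-strand tag piles.

    plus_tags / minus_tags: 5' end positions of tags around one confident peak.
    Medians are used as a simple stand-in for the modes of the two piles.
    """
    return median(minus_tags) - median(plus_tags)

def shift_tags(plus_tags, minus_tags, d):
    """Move every tag d/2 toward its 3' end so both strands pile up together."""
    half = d / 2
    return [p + half for p in plus_tags] + [m - half for m in minus_tags]

# Toy peak: plus-strand tags centre on 1000, minus-strand tags on 1100.
plus = [980, 990, 1000, 1010, 1020]
minus = [1080, 1090, 1100, 1110, 1120]
d = estimate_shift(plus, minus)
shifted = shift_tags(plus, minus, d)
print(d, median(shifted))
```

After shifting, tags from both strands centre on the same position, which is taken as the estimate of the binding site.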
• Tag distribution along the genome ~ Poisson distribution (λ_BG = total tag / genome size)
• ChIP-Seq shows local biases in the genome
– Chromatin and sequencing bias
– 200-300bp control windows have too few tags
– But can look further
[Figure: ChIP and control tags counted in 300bp, 1kb, 5kb and 10kb windows]
Dynamic λ_local = max(λ_BG, [λ_ctrl, λ_1k,] λ_5k, λ_10k)
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Bio, 2008
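A rough sketch of the dynamic λ idea, under stated assumptions: control and ChIP libraries have the same total tag count (no depth scaling), every control count is rescaled to the size of the tested window, and the p-value is the Poisson upper tail. The function names, the `ctrl_counts` layout and the toy numbers are hypothetical and are not taken from the MACS code.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - sum of pmf(i) for i < k."""
    pmf = math.exp(-lam)
    cdf = 0.0
    for i in range(k):
        cdf += pmf
        pmf *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

def dynamic_lambda(total_tags, genome_size, window, ctrl_counts):
    """lambda_local = max(lambda_BG, local control rates), all scaled to `window` bp.

    ctrl_counts maps a control window size in bp (e.g. 1000, 5000, 10000) to the
    number of control tags in a window of that size centred on the candidate peak.
    """
    lam_bg = total_tags / genome_size * window
    local = [count * window / size for size, count in ctrl_counts.items()]
    return max([lam_bg] + local)

# Toy candidate peak: 15 ChIP tags in a 300bp window.
lam = dynamic_lambda(
    total_tags=10_000_000,
    genome_size=2_700_000_000,
    window=300,
    ctrl_counts={1000: 10, 5000: 30, 10000: 80},
)
p_value = poisson_sf(15, lam)
print(lam, p_value)
```

Taking the maximum keeps the test conservative: a region with locally high control signal needs more ChIP tags to reach the same p-value.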
• P-value and FDR?
• Simulation: random sampling of reads?
• FDR = A / B, where A = peaks called with control over ChIP (all false positives) and B = peaks called with ChIP over control
• MAT: Quality Control
[Figure: score distribution of background and enriched DNA (<1% enriched), with regions A and B marked]
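The FDR = A / B estimate can be sketched as below, assuming each called peak comes with a p-value and that the same cutoff is applied to both peak lists. The function name and the toy p-values are hypothetical.

```python
def empirical_fdr(chip_pvalues, ctrl_pvalues, cutoff):
    """Empirical FDR = A / B at a given p-value cutoff.

    A: peaks passing the cutoff when control is treated as ChIP
       (assumed to be all false positives).
    B: peaks passing the cutoff when ChIP is compared against control.
    """
    a = sum(p <= cutoff for p in ctrl_pvalues)
    b = sum(p <= cutoff for p in chip_pvalues)
    return a / b if b else 0.0

# Toy p-values for peaks called in each direction.
chip = [1e-12, 1e-9, 1e-8, 1e-6, 1e-5, 1e-4]
ctrl = [1e-6, 1e-4, 1e-3]
fdr = empirical_fdr(chip, ctrl, cutoff=1e-5)
print(fdr)
```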
Yeast TF Regulatory Network
[Figure labels: Protein, Gene; Transcribe, Regulate]
• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10KB?
• Number of binding?
• Other knowledge?
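One simple way to combine the first three options, shown only as an illustration: assign each peak to the nearest TSS on the same chromosome, keep it only if it lies within 10 kb, and count the sites per gene. The input layout, gene names and coordinates are hypothetical.

```python
def assign_targets(peaks, tss, max_dist=10_000):
    """Assign each peak to its nearest TSS if that TSS is within max_dist bp.

    peaks: list of (chrom, summit_position)
    tss:   list of (gene, chrom, tss_position)
    Returns {gene: [distance of each assigned peak]}; the list length is the
    number of binding sites assigned to that gene.
    """
    targets = {}
    for chrom, summit in peaks:
        candidates = [(abs(summit - pos), gene) for gene, c, pos in tss if c == chrom]
        if not candidates:
            continue  # no annotated gene on this chromosome
        dist, gene = min(candidates)
        if dist <= max_dist:
            targets.setdefault(gene, []).append(dist)
    return targets

tss = [("GENE_A", "chr1", 10_000), ("GENE_B", "chr1", 50_000)]
peaks = [("chr1", 12_000), ("chr1", 9_500), ("chr1", 30_000),
         ("chr1", 48_000), ("chr2", 100)]
targets = assign_targets(peaks, tss)
print(targets)
```

The peak at 30,000 is 20 kb from both genes and is left unassigned, which is exactly the kind of distal site a hard distance cutoff misses.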
• Binding has an effect on up-regulated genes at all time points, but on down-regulated genes only at 12 hours
• Stronger sites are not closer to differentially regulated genes (not necessarily more functional)
Tang et al, Cancer Res 2011
• Evolutionary conservation
– Can be used for ChIP QC
• Conserved sites more functional?
– Majority of functional sites not conserved
Odom et al, Nat Genet 2007
Higher Order Chromatin Interactions
Chromatin conformation capture
Interactions follow exponential decay with distance
Lieberman-Aiden et al, Science 2009
• Binary decision?
• Rank product of
– Regulatory potential (default λ 100kb)
– Differential expression
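A sketch of the rank-product idea, with the formula as an assumption: each binding site at distance d from a gene's TSS contributes exp(-d / λ) with λ = 100 kb, and a gene's regulatory potential is the sum over its sites. The slides do not give the exact weighting, so this form is illustrative only. Gene names, distances and expression scores are made up.

```python
import math

def regulatory_potential(distances, lam=100_000):
    """Sum of exponentially decaying contributions from each binding site."""
    return sum(math.exp(-d / lam) for d in distances)

def ranks(scores):
    """Rank 1 = highest score; ties are broken by gene name (a simplification)."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i + 1 for i, g in enumerate(ordered)}

def rank_product(rp, de):
    """Multiply each gene's rank by regulatory potential and by expression change."""
    r_rp, r_de = ranks(rp), ranks(de)
    return {g: r_rp[g] * r_de[g] for g in rp if g in de}

# Distances (bp) from each gene's TSS to its nearby binding sites.
site_distances = {"GENE_A": [1_000, 50_000], "GENE_B": [200_000], "GENE_C": [5_000]}
rp_scores = {g: regulatory_potential(d) for g, d in site_distances.items()}
# Hypothetical differential expression scores (larger = stronger change).
de_scores = {"GENE_A": 2.0, "GENE_B": 3.5, "GENE_C": 0.4}
products = rank_product(rp_scores, de_scores)
print(sorted(products, key=products.get))
```

A small rank product means a gene ranks well on both binding and expression, which avoids a binary target / non-target call.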
• ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif?
– Raw data is like perfect clustering, plus enrichment values
• MDscan
– High ChIP ranking => true targets, contain more sites
– Search TF motif from highest ranking targets first (high signal / background ratio)
– Refine candidate motifs with all targets
For a given w-mer and any other random w-mer
TGTAACGT  8-mer
TGTAACGT  matched 8
AGTAACGT  matched 7
TGCAACAT  matched 6
TGACACGG  matched 5
AATAACAG  matched 4
m-matches for TGTAACGT
Pick a reasonable m to call two w-mers similar
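The match counts above can be reproduced with a position-by-position comparison. This is a small illustration; the function names are not from MDscan.

```python
def count_matches(a, b):
    """Number of positions at which two w-mers of equal length agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))

def is_m_match(a, b, m):
    """Two w-mers are m-matches if they agree at m or more positions."""
    return count_matches(a, b) >= m

reference = "TGTAACGT"
others = ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]
counts = [count_matches(reference, k) for k in others]
print(counts)
```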
A 9-mer
ATTGCAAAT
TTTGCGAAT
TTGCAAATC
ChIP-chip selected upstream sequences
Seed motif pattern
ATTGCAAAT
TTTGCGAAT
TTTGCAAAT
TTGCAAATC
TTGCGAATA
TTGCAAATT
TTGCCCATC
TTTGCAAAT
GCAAATCCA
CAAATCCAA
GCAAATTCG
CAAATCCAA
GAAATCCAC
GCAAATCCA
GGAAATCCA
GGAAATCCT
TGCAAATCC
TGCAAATTC
GCCACCGT
ACCACCGT
ACCACGGT
GCCACGGC
…
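A simplified sketch of the seeding step: enumerate every w-mer in the top-ranked sequences, collect its m-matches, and keep the w-mer with the largest group as a seed. MDscan scores candidate motifs with its own statistical function, which is not reproduced here; group size is used as an assumed stand-in. The toy input uses three 9-mers from the list above directly as sequences, and all function names are hypothetical.

```python
from collections import Counter

def wmers(seq, w):
    """All overlapping substrings of length w."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def count_matches(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    return sum(x == y for x, y in zip(a, b))

def best_seed(top_seqs, w, m):
    """Return (seed, group): the w-mer with the most m-matches, and those matches."""
    all_wmers = [k for s in top_seqs for k in wmers(s, w)]
    best = None
    for seed in dict.fromkeys(all_wmers):  # unique w-mers in first-seen order
        group = [k for k in all_wmers if count_matches(seed, k) >= m]
        if best is None or len(group) > len(best[1]):
            best = (seed, group)
    return best

def count_matrix(group):
    """Per-position base counts for the aligned w-mers of a seed group."""
    return [Counter(column) for column in zip(*group)]

top_seqs = ["ATTGCAAAT", "TTTGCGAAT", "TTTGCAAAT"]
seed, group = best_seed(top_seqs, w=9, m=8)
matrix = count_matrix(group)
print(seed, len(group), dict(matrix[0]))
```

The count matrix is the starting point that would then be updated with m-matches from the remaining, lower-ranked sequences.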
Update Motifs With Remaining Seqs
[Figure labels: m-matches; Seed1; extreme high rank; all ChIP-selected targets]
• Could also be used to examine known motif enrichment
• Is motif enrichment correlated with ChIP-seq enrichment?
• Is motif more enriched in peak summits than peak flanks?
• Motif analysis could identify transcription factor partners of ChIP-seq factors
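The summit-versus-flank question can be checked roughly as below. This sketch makes several assumptions: peak sequences are centred on the summit, the motif is matched as an exact consensus string on one strand only (a real analysis would score a position weight matrix on both strands), and hits spanning the core/flank boundary are ignored. Names and toy sequences are hypothetical.

```python
def motif_hits(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))

def summit_vs_flank(peak_seqs, motif, core=50):
    """Motif hits per bp in the central `core` bp vs. the rest of each peak."""
    core_hits = flank_hits = core_bp = flank_bp = 0
    for seq in peak_seqs:
        start = max(0, (len(seq) - core) // 2)
        end = min(len(seq), start + core)
        centre, flanks = seq[start:end], [seq[:start], seq[end:]]
        core_hits += motif_hits(centre, motif)
        flank_hits += sum(motif_hits(f, motif) for f in flanks)
        core_bp += len(centre)
        flank_bp += sum(len(f) for f in flanks)
    core_rate = core_hits / core_bp if core_bp else 0.0
    flank_rate = flank_hits / flank_bp if flank_bp else 0.0
    return core_rate, flank_rate

motif = "TGTAACGT"
peak_seqs = [
    "A" * 20 + motif + "A" * 20,                    # motif at the summit
    motif + "C" * 12 + "G" * 8 + "C" * 20,          # motif in the left flank
]
core_rate, flank_rate = summit_vs_flank(peak_seqs, motif, core=8)
print(core_rate, flank_rate)
```

A core rate well above the flank rate suggests the motif is positioned at the binding site rather than scattered across the region.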
• Carroll et al, Cell 2005
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
[Figure labels: ER, TF??]
Estrogen Receptor (ER) Cistrome in Breast Cancer
• Carroll et al, Nat Genet 2006
• ER may function far away (100-200KB) from genes
• Only 20% of ER sites have PhastCons > 0.2
• ER has different effects depending on its collaborators
[Figure labels: NRIP, ER, AP1]
• The same TF binds to very different locations in different tissues and conditions, why?
• TF concentration?
• Collaborating factors, esp pioneering factors
• Interesting observations about pioneering factors
• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
• ChIP-seq peak calling: shift reads, then calculate correct enrichment and FDR
• Functional analysis of ChIP-seq data:
– Strong vs weak binding, conserved vs non-conserved
– Target identification
– Motif analysis
• Cell type-specific binding
Epigenetics