Tiling arrays for ChIP-chip

Special Topics in Genomics
ChIP-chip and Tiling Arrays
Traditional Method for Understanding Transcription Regulation
1. Gene expression microarray analysis
2. Cluster genes by expression profile
3. Search for conserved sequence motifs in cluster promoters
• Very challenging for mammalian genomes
ChIP-chip Technology
• Chromatin ImmunoPrecipitation + microarray
• Detect genome-wide in vivo location of TF and other
DNA-binding proteins
• Can learn the regulatory mechanism of a transcription
factor or DNA-binding protein much better and faster
Chromatin ImmunoPrecipitation (ChIP)
By Richard Bourgon at UC Berkeley
TF/DNA Crosslinking in vivo
Sonication (~500bp)
TF-specific Antibody
Immunoprecipitation
Reverse Crosslink and DNA Purification
Amplification
Genome Tiling Arrays
Platform    # Arrays (human genome)  # Probes / Array  # Total Probes  Probe Length  Probe Resolution                       Price
Affymetrix  7                        6M                42.0M           25mer         35 bp                                  $2,000
Nimblegen   38                       390K              14.8M           50mer         110 bp                                 $30,000
Agilent     21                       244K              5.1M            60mer         300 bp in genes; 500 bp in intergenic  $11,000
By Xiaole Shirley Liu at Harvard
Genome Tiling Arrays
• Affymetrix genome tiling microarrays
– Tile the genome non-repeat regions
– Chr21/22 tiling (earlier version): 1 million probe pairs
(PM & MM) at 35 bp resolution on 3 arrays
– Whole genome: 42 million PM probes on 7 arrays
PM CGACATTGATTCAAGACTACATACA
MM CGACATTGATTCTAGACTACATACA
[Figure: probes tiled along the chromosome]
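The PM/MM pair shown above differs only at the middle (13th) base, which is replaced by its complement. A minimal sketch of that relationship, assuming a plain A/C/G/T probe string (the function name `make_mismatch` is hypothetical):

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def make_mismatch(pm: str) -> str:
    """Return the MM probe: the PM sequence with its middle base complemented."""
    mid = len(pm) // 2  # index 12 for a 25mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "CGACATTGATTCAAGACTACATACA"
mm = make_mismatch(pm)
print(mm)
```

Applied to the PM probe on this slide, the sketch reproduces the MM sequence listed next to it.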
ChIP-chip Array Hybridization
• Map high intensity probes back to the genome
• Locate TF binding location
[Figure: probe intensities along the chromosome, ChIP-DNA signal versus noise]
Identify ChIP-enriched Region
• Controls: sonicated genomic Input DNA
• Often 3 ChIP, 3 Ctrl replicates are needed
[Figure: ChIP versus Ctrl]
Mann-Whitney U-test for ChIP-region Detection
• Affy TAS, Cawley et al. (Cell 2004):
– For each probe: rank the probes (either PM-MM or PM) within a [-500 bp, +500 bp] window
– Check whether the sum of ChIP ranks is much smaller than expected under no enrichment
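The windowed rank-sum idea above can be sketched in plain Python. This is not the TAS implementation: the descending ranking (rank 1 = brightest probe), the normal approximation, the omitted tie correction, and the function names are all assumptions made for illustration.

```python
import math

def window_values(positions, values, center, half_width=500):
    """Collect probe values whose genomic position lies within +/- half_width of center."""
    return [v for p, v in zip(positions, values) if abs(p - center) <= half_width]

def rank_sum_z(chip, ctrl):
    """Z-score of the ChIP rank sum; strongly negative values suggest ChIP enrichment.

    Ranks are assigned in descending order of intensity, ties receive midranks,
    and the variance carries no tie correction (a simplification).
    """
    pooled = [(v, 0) for v in chip] + [(v, 1) for v in ctrl]
    pooled.sort(key=lambda p: -p[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(chip), len(ctrl)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - expected) / sd

# Toy window: ChIP probes are brighter than control probes.
z = rank_sum_z([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(round(z, 3))
```

In practice `window_values` would be called once for the ChIP arrays and once for the controls at each probe position before the rank-sum step.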
TileMap
(Ji and Wong, Bioinformatics 2005)
STEP 1:
Compute a test statistic for each probe to
summarize probe level information
STEP 2:
Combine probe level test statistics of
neighboring probes to help infer binding
regions
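Probe level test statistic: empirical Bayes approach
Probes: $i = 1, 2, 3, \ldots, I$
Sample variance (df): $s_1^2, s_2^2, s_3^2, \ldots, s_I^2$
Mean: $\bar{s^2} = \frac{1}{I} \sum_i s_i^2$
Sum of squares: $S = \sum_i \left[ s_i^2 - \bar{s^2} \right]^2$
Shrinkage factor: $\hat{B} = \frac{2}{df+2} \cdot \frac{I-1}{I} + \frac{2}{df+2} \cdot \frac{(I-1)\,(\bar{s^2})^2}{S}$
Variance shrinkage estimator: $\hat{\sigma}_i^2 = (1-\hat{B})\, s_i^2 + \hat{B}\, \bar{s^2}$
Variance estimates: $\hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\sigma}_3^2, \ldots, \hat{\sigma}_I^2$
A modified t-statistic: $\tilde{t}_i = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{\hat{\sigma}_i \sqrt{\frac{1}{K_1} + \frac{1}{K_2}}}$
Probe level test statistics: $\tilde{t}_1, \tilde{t}_2, \tilde{t}_3, \ldots, \tilde{t}_I$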
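A minimal numeric sketch of the variance shrinkage and modified t-statistic above. The formulas are restated in the comments; capping the shrinkage factor at 1 and the handling of S = 0 are assumptions added for numerical safety, and the function names are hypothetical.

```python
import math

def shrink_variances(s2, df):
    """Shrink per-probe sample variances toward their mean.

    B = 2/(df+2) * (I-1)/I + 2/(df+2) * (I-1) * mean(s2)^2 / S
    sigma_i^2 = (1 - B) * s_i^2 + B * mean(s2)
    """
    n = len(s2)
    mean_s2 = sum(s2) / n
    ss = sum((v - mean_s2) ** 2 for v in s2)  # sum of squares S
    if ss == 0:
        b = 1.0  # all variances identical: shrink fully (assumption)
    else:
        b = (2 / (df + 2)) * ((n - 1) / n) + (2 / (df + 2)) * (n - 1) * mean_s2 ** 2 / ss
        b = min(1.0, b)  # assumption: cap the shrinkage factor at 1
    return [(1 - b) * v + b * mean_s2 for v in s2], b

def modified_t(mean1, mean2, sigma2_hat, k1, k2):
    """t_i = (mean1 - mean2) / (sigma_i * sqrt(1/K1 + 1/K2))."""
    return (mean1 - mean2) / (math.sqrt(sigma2_hat) * math.sqrt(1 / k1 + 1 / k2))

shrunk, b = shrink_variances([1.0, 2.0, 3.0, 10.0], df=4)
print(round(b, 2), [round(v, 2) for v in shrunk])
```

Because every probe borrows information from all the others, an unusually small sample variance no longer inflates the t-statistic, which matters most when only a few replicates are available.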
Combining neighboring probes
TileMap (MA)
1. Compute the probe level test statistic
t for each probe;
2. Compute a moving average statistic
to measure enrichment;
3. Estimate FDR.
TileMap (HMM)
1. Compute the probe level test statistic
t for each probe;
2. Estimate the distribution of t under
H0 and H1;
3. Model t by a Hidden Markov Model,
and decode the HMM.
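Step 2 of TileMap (MA) can be illustrated with a simple moving average over the probe-level statistics. This sketch assumes the window is defined in probe counts rather than base pairs and is truncated at the ends of the array; neither detail is stated on the slide.

```python
def moving_average(t, half_window=2):
    """Average each probe's statistic with up to half_window neighbors on each side."""
    smoothed = []
    for i in range(len(t)):
        lo = max(0, i - half_window)
        hi = min(len(t), i + half_window + 1)
        smoothed.append(sum(t[lo:hi]) / (hi - lo))
    return smoothed

# A single high probe is spread across its neighbors.
print(moving_average([0, 0, 3, 0, 0], half_window=1))
```

Averaging rewards runs of consecutive high probes, which is the pattern a real binding region is expected to produce, over isolated spikes.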
Shrinking variance increases statistical power
[Figure: moving-average tracks for three probe-level statistics: t-statistic with variance shrinking, canonical t-statistic, and Mean(X1)-Mean(X2). Inset: Peak 2 (180 bp) transgenics, neural tube expression]
Comparisons between TileMap and previous
methods
cMyc ChIP-chip Data: 6 IP + 6 CT1 + 6 CT2
Gold Standard: Using GTRANS and Keles’ method to analyze all 18 arrays
Test data: 4 arrays, 2 IP vs 2 CT1 (s2r2)
TileMap-HMM (Ji & Wong, 2005)
GTRANS or TAS (Kampa et al., 2004)
1. Set a window;
2. Perform a Wilcoxon signed rank test for
each window.
Keles et al. (2004)
1. Compute a t-statistic t for each probe
(no shrinking, two sample only);
2. Rank probes by a moving average.
Shrinking variance saves money
Using non-shrinking method (Keles’ method) to analyze all probes
Using shrinking method to analyze half of the probes, i.e., reduce information by half
MAT
(Johnson W.E. et al. PNAS, 2006)
• Model-based Analysis of Tiling arrays for ChIP-chip
• Goal:
– Find ChIP-regions without replicates
– Find ChIP-regions without controls
– Find ChIP-regions without MM probes
– Can analyze data array by array
MAT
• Estimate probe behavior by checking other probes with similar sequence on the same array
• Probe sequence plays a big role in signal value
• Most of the probes in ChIP-chip measure non-specific hybridization
Probe Behavior Model
Model terms:
• Baseline on number of Ts
• A, C, G at each position of the 25mer
• A, C, G, T count squared
• 25mer copy number along the genome
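The four groups of model terms above can be turned into a per-probe feature vector for a linear fit. The sketch below only builds the features and does not fit the model; the function name `probe_features`, the ordering of the features, and the log transform of the copy number are assumptions.

```python
import math

def probe_features(seq, copy_number):
    """Build the regression features for one probe.

    Layout: [number of Ts] + [A/C/G indicators per position, T as baseline]
            + [squared counts of A, C, G, T] + [log copy number].
    """
    seq = seq.upper()
    feats = [float(seq.count("T"))]
    for base in seq:
        feats.extend(1.0 if base == b else 0.0 for b in "ACG")
    feats.extend(float(seq.count(b)) ** 2 for b in "ACGT")
    feats.append(math.log(copy_number))  # assumption: copy number enters on the log scale
    return feats

features = probe_features("CGACATTGATTCAAGACTACATACA", copy_number=1)
print(len(features))
```

For a 25mer this gives 1 + 75 + 4 + 1 = 81 features, so fitting on the millions of probes of a single array is heavily overdetermined.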
Probe Standardization
• Fit the probe model array by array
• Divide array probes into bins (3k probes/bin)
• Background subtraction and standardization (normalization) on a single array:
$t_i = \frac{\log(PM_i) - \hat{m}_i}{s_{i,\,\text{affinity bin}}}$
– $\log(PM_i)$: observed probe intensity
– $\hat{m}_i$: model-predicted probe intensity
– $s_{i,\,\text{affinity bin}}$: observed probe variance within each bin
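A small sketch of the per-bin standardization above, t_i = (log(PM_i) - m_i) / s_bin. It assumes that the affinity bins are formed by sorting probes on their model-predicted intensity and that s is the standard deviation of the observed log(PM) values in the bin; the slide does not spell out either detail.

```python
import math

def standardize(log_pm, predicted, bin_size=3000):
    """Return t-values: residual from the probe model divided by the bin's spread."""
    order = sorted(range(len(log_pm)), key=lambda i: predicted[i])
    t = [0.0] * len(log_pm)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        obs = [log_pm[i] for i in idx]
        mean = sum(obs) / len(obs)
        var = sum((o - mean) ** 2 for o in obs) / max(len(obs) - 1, 1)
        sd = math.sqrt(var) or 1.0  # guard against a zero spread in a degenerate bin
        for i in idx:
            t[i] = (log_pm[i] - predicted[i]) / sd
    return t

# Toy array: two affinity bins of two probes each.
t_values = standardize([1.0, 3.0, 11.0, 13.0], [2.0, 2.0, 12.0, 12.0], bin_size=2)
print([round(v, 3) for v in t_values])
```

After this step, probes with very different sequence-driven baselines sit on a common scale, which is what allows a single array to be analyzed without a control.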
Eliminate Normalization
• Probe log(PM) values before and after standardization
• If arrays are normalized before model fitting:
– The same ChIP-regions are predicted, although with less confidence
ChIP-region Detection
• Window-based MATscore
– ChIP without Ctrl:
$MAT(region) = TM(t\text{'s in region}) \, \sqrt{n_{ChIP}}$
– TM: trimmed mean
– Multiple ChIP with multiple Ctrl:
$MAT(region) = \frac{TM(t\text{'s in ChIP}) - TM(t\text{'s in Input})}{\sigma_{Input}} \, \sqrt{n_{ChIP}}$
– More probes, higher t values in ChIP, less variance (fluctuation) → more confident
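A hedged sketch of the window score, assuming the trimmed mean of the t-values is scaled by the square root of the number of ChIP probes in the window. The 10% trim per tail and the treatment of σ_Input as a user-supplied scale are assumptions, since the slide defines neither.

```python
import math

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k] if len(v) > 2 * k else v
    return sum(kept) / len(kept)

def mat_score(chip_t, input_t=None, sigma_input=1.0, trim=0.1):
    """Window score from ChIP t-values, optionally contrasted with Input t-values."""
    tm = trimmed_mean(chip_t, trim)
    if input_t is not None:
        tm = (tm - trimmed_mean(input_t, trim)) / sigma_input
    return tm * math.sqrt(len(chip_t))

print(mat_score([2.0] * 16))                               # ChIP only
print(mat_score([3.0] * 4, [1.0] * 4, sigma_input=2.0))    # ChIP with Input
```

The trimmed mean keeps one or two outlier probes from dominating a window, while the square-root factor gives more weight to windows supported by many probes.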
Raw probe values at two spike-in regions with concentration 2X
[Figure, panel 1: ChIP_1 Log(PM) and Input_1 Log(PM) at the two 2X regions]
Sequence-based probe behavior standardization
[Figure, panel 2: ChIP_1 t-value and Input_1 t-value]
Window-based neighboring probe combination for ChIP-region detection
[Figure, panel 3: ChIP_1 MATscore; ChIP_1/Input_1 MATscore; 3 Reps ChIP/Input MATscore]
MAT: Quality Control
Statistical Significance of Hits
[Figure: MATscore distribution, showing background and <1% enriched DNA]
• P-value and FDR cutoff:
– P-value from MATscore distribution
– Estimate negative peaks under the same P-value cutoff
– Regional FDR = #negative_peaks / #positive_peaks
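The regional FDR ratio above can be sketched as follows. The sketch assumes the background MATscore distribution is symmetric around zero, so the same P-value cutoff corresponds to `+cutoff` for positive peaks and `-cutoff` for negative peaks; the function name and the return value when there are no positive peaks are illustrative choices.

```python
def regional_fdr(region_scores, cutoff):
    """Estimate FDR as (#negative peaks) / (#positive peaks) at a symmetric cutoff."""
    positive = sum(1 for s in region_scores if s >= cutoff)
    negative = sum(1 for s in region_scores if s <= -cutoff)
    if positive == 0:
        return 0.0  # no calls made, so nothing to be falsely discovered (a convention)
    return negative / positive

scores = [5.0, 6.0, 7.0, 8.0, -5.5, 1.0, -1.0]
print(regional_fdr(scores, cutoff=5.0))
```

Negative peaks cannot come from real enrichment, so their count at a given cutoff serves as an estimate of how many positive peaks arise from background alone.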
MAT summary
• Open source, written in Python
http://chip.dfci.harvard.edu/~wli/MAT/
• Runs faster than array scanner
• Can work with single ChIP, multiple ChIP, and
multiple ChIP with controls with increasing
accuracy
– Use single ChIP on promoter arrays to test antibody
and protocol before going whole genome
• Can identify individual failed samples
Benchmark for ChIP-chip Target Detection
(Johnson D.S. et al. Genome Research, 2008)
• ENCODE spike-in experiment (both amplified and un-amplified):
– ChIP: 96 ENCODE clones at 2,4,8,...,256X enrichment + total chromatin DNA
– Input: total genomic DNA
• Blind test:
Samples hybridized to different tiling arrays,
predictions made before the key was released
Comparison of platforms
Comparison of algorithms
Combined Johnson D.S. et al. Genome Research 2008
with Ji H. et al. Nature Biotechnology 2008
MBR: Microarray Blob Remover
xMAN: eXtreme MApping of oligoNucleotides
• http://chip.dfci.harvard.edu/~wli/xMAN
• xMAN maps ~42 M Affymetrix tiling probes to the newest human genome assembly in less than 6 CPU hours
– BLAST needs 20 CPU years; BLAT needs 55 CPU days
– Probe TCCCAGCACTTTGGGAGGCTGAGGC maps 50,660 times in the genome
• Can map long oligos and paired-tag high-throughput sequencing fragments
• Stores the copy number information of every probe
• xMAN filters tiling array probes to ensure one unique probe measurement per 1 kb, which improves peak detection
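To make the notion of probe copy number concrete, here is a toy exact-match counter built on a k-mer hash index. It is not the xMAN algorithm, it only scans the forward strand, and it would not scale to a whole genome; the function names are hypothetical.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def copy_number(probe, index):
    """Number of exact forward-strand matches of the probe in the indexed genome."""
    return len(index.get(probe, []))

toy_genome = "ACGTACGTTTACGT"
toy_index = build_index(toy_genome, k=4)
print(copy_number("ACGT", toy_index))
```

A probe whose copy number is far above 1, such as the repeat-derived example on this slide, reports signal pooled from many loci, which is why copy number is stored for every probe.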
CEAS: Cis-regulatory Element Annotation
System
• Data Analysis Button for Biologists
http://ceas.cbi.pku.edu.cn
CisGenome
(Ji H. et al. Nature Biotechnology, 2008)
• Graphical user interface
• CisGenome Browser
• Core data analysis programs
Other applications of tiling arrays
• Transcriptome mapping
• MeDIP-chip
• DNase-chip
• Nucleosome localization
• Array CGH and copy number variation