Tiling Arrays

Ho-Ryun Chung
[Figure: from gene to protein. The gene (enhancer, promoter, exons, introns) is transcribed into pre-mRNA, which is capped, spliced and polyadenylated; the mRNA (cap, exons, poly(A) tail AAAAAAA) is exported from the nucleus and translated into protein.]
Where are the exons of a transcript?
[Figure: the same path from gene to protein]
Where do proteins bind?
The functional genomics toolbox:
Sequence alignment
PCR
Sequencing
Microarray
Pattern recognition
Chromatin immunoprecipitation
Gene prediction
Clustering
RNA immunoprecipitation
Mass spectrometry
Network inference
Yeast two-hybrid
Modeling and simulation
Tandem affinity purification
In situ hybridization
Immunofluorescence
... tiling array
[Figure: probes tiled along the genome (enhancer, promoter, exons, introns); RNA hybridized to the probes gives a signal over transcribed regions]
... tiling array
[Figure: the same probes hybridized with chromatin-immunoprecipitated DNA give a signal over protein-bound regions]
... Chromatin
[Figure: nucleosomes form "beads on a string", which fold into the 30 nm fiber]
... identify protein-bound regions
but how?
... 1 make sure the protein sticks to the DNA
crosslinking – use formaldehyde
... 2 break chromatin into small pieces
fragmentation – use sound (sonication)
... 3 fish for the protein of interest
immunoprecipitation – use an antibody coupled to magnetic beads
... 4 remove unbound material
washing
... 5 elute the bound material
... 6 reverse the crosslinks
... 7 isolate the DNA
... 8 amplify and label the DNA
... 9 hybridize to the tiling array
... Chromatin Immunoprecipitation = ChIP
an approach to enrich DNA fragments bound by the protein of interest
fold enrichment (some unbound DNA is also isolated) depends on
• frequency of protein binding at a site
• efficiency of the antibody
• specificity of the antibody
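... enrichment versus selection
selection: the isolated DNA consists 100% of bound regions
enrichment: bound regions make up e.g. 1% of the genome; with an enrichment factor of 100,
bound: 1% × 100 = 100
unbound: 99% × 1 = 99
so only about 50% of the enriched DNA comes from bound regions and 50% from unbound regions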
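The arithmetic above can be checked in a few lines of Python. The function name `bound_fraction` is hypothetical; as on the slide, it assumes unbound DNA is carried along at its genomic frequency (factor 1).

```python
def bound_fraction(genome_fraction_bound: float, enrichment_factor: float) -> float:
    """Fraction of the isolated DNA that comes from bound regions.

    Bound regions are over-represented by `enrichment_factor`; unbound
    regions are carried along at their genomic frequency (factor 1).
    """
    bound = genome_fraction_bound * enrichment_factor
    unbound = (1.0 - genome_fraction_bound) * 1.0
    return bound / (bound + unbound)


# The example from the slide: 1% of the genome bound, enrichment factor 100.
frac = bound_fraction(0.01, 100)
print(f"{frac:.1%} bound, {1 - frac:.1%} unbound")  # 50.3% bound, 49.7% unbound
```

Even a 100-fold enrichment leaves roughly half of the material unbound when the bound regions are rare.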
... enriched DNA fragments
model for the measured intensity Y_ik of probe k on array i:
$$Y_{ik} = a_i + \varepsilon_k + b_i b_k x_k \exp(\eta_{ik})$$
a_i: array-specific background
ε_k: probe-specific additive noise
b_i: array-specific gain factor
b_k: probe-specific gain factor
the gain factors reflect hybridization efficiency, efficiency of ChIP, and efficiency of amplification and labeling
x_k: DNA abundance
η_ik: probe- and array-specific multiplicative noise
x_k is more or less uniform for most of the probes -> can we fit b_i b_k?
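A minimal sketch of the model above, assuming for illustration that the background terms a_i and ε_k are fixed, known offsets and that η_ik is normally distributed. For a probe with uniform abundance x_k, the median of log(Y − a_i − ε_k) − log(x_k) over replicates estimates log(b_i b_k); only the product b_i b_k is identifiable this way. Function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_intensity(a_i, eps_k, b_i, b_k, x_k, sigma_eta, rng):
    """Draw one intensity Y_ik = a_i + eps_k + b_i * b_k * x_k * exp(eta_ik).

    eta_ik ~ Normal(0, sigma_eta) plays the role of the multiplicative noise;
    a_i and eps_k are treated as fixed, known offsets in this sketch.
    """
    eta_ik = rng.gauss(0.0, sigma_eta)
    return a_i + eps_k + b_i * b_k * x_k * math.exp(eta_ik)


def estimate_gain(intensities, a_i, eps_k, x_k):
    """Estimate b_i * b_k from replicate intensities of one probe.

    log(Y - a_i - eps_k) - log(x_k) = log(b_i * b_k) + eta, and the median of
    eta is 0, so the median of the left-hand side recovers log(b_i * b_k).
    """
    logs = [math.log(y - a_i - eps_k) - math.log(x_k) for y in intensities]
    return math.exp(statistics.median(logs))


rng = random.Random(0)
# Hypothetical parameters: background 100, offset 5, gains 2 and 3, uniform abundance 1.
draws = [simulate_intensity(100.0, 5.0, 2.0, 3.0, 1.0, 0.3, rng) for _ in range(2001)]
print(estimate_gain(draws, 100.0, 5.0, 1.0))  # close to b_i * b_k = 6
```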
... the nature of the signal
[Figure: intensities of tiling array probes 1 and 2]
Do the probes differ because the samples have different concentrations, or because the probes have different hybridization properties?
... probe-specific hybridization properties
remove bias due to amplification and labeling as well as probe-specific behavior by normalizing to a control:
$$S_{\mathrm{norm}}(i) = \frac{I_{\mathrm{experiment}}(i)}{I_{\mathrm{control}}(i)}$$
BUT
I(i) = specific hybridization + cross-hybridizations
[Figure: probe i hybridizing to its target in the genome and, by cross-hybridization, to other sites]
Most probes have constant specific hybridization, so the differences between probes come mainly from cross-hybridization.
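A minimal sketch of the ratio S_norm(i), assuming a purely multiplicative model in which experiment and control share the same probe-specific gain. The gain cancels in the ratio, whereas an additive background that is not removed pulls the ratio towards 1. The function name, the `background` parameter and the numbers are hypothetical.

```python
def normalized_signal(i_experiment: float, i_control: float, background: float = 0.0) -> float:
    """Ratio S_norm(i) = I_experiment(i) / I_control(i).

    `background` is a hypothetical additive offset subtracted from both
    channels before the ratio is taken.
    """
    return (i_experiment - background) / (i_control - background)


# Probe i with a probe-specific gain b = 4: experiment abundance 3, control abundance 1.
b = 4.0
i_exp, i_ctrl = b * 3.0, b * 1.0
print(normalized_signal(i_exp, i_ctrl))  # gain cancels: 3.0

# The same probe with an additive background of 50 on both channels.
print(normalized_signal(i_exp + 50, i_ctrl + 50))        # ratio compressed towards 1
print(normalized_signal(i_exp + 50, i_ctrl + 50, 50.0))  # background removed: 3.0
```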
... affinity measure K_U for cross-hybridizations
Calculate the contribution K_s of each duplex s between sample and probe:
$$K_s = \exp(-\Delta G_s)$$
Sum the contributions K_s of all duplexes:
$$K_U = \sum_s K_s \times f_s$$
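The sum can be written directly in Python. This is a sketch: it assumes the free energies ΔG_s are already dimensionless (in units of RT) and that the weights f_s are given; how the duplexes s are enumerated is not shown, and the function name is hypothetical.

```python
import math


def affinity_measure(delta_g, freq):
    """K_U = sum over duplexes s of K_s * f_s, with K_s = exp(-dG_s).

    delta_g[s]: free energy of duplex s (assumed dimensionless, in units of RT)
    freq[s]:    the weight f_s of duplex s
    """
    if len(delta_g) != len(freq):
        raise ValueError("need one f_s per duplex")
    return sum(math.exp(-dg) * f for dg, f in zip(delta_g, freq))


# The most stable duplex (most negative dG) dominates the sum.
print(affinity_measure([-2.0, 0.0, 1.0], [1.0, 1.0, 1.0]))
```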
... calculate K_U as a sum over all duplexes
Dynamic programming along the probe positions j:
$$R(j) = R(j-1) \times \mathrm{elongation}(j) + \mathrm{start}(j)$$
$$K_U(j) = K_U(j-1) + R(j) \times \mathrm{end}(j)$$
R(j) collects all duplexes that reach position j, either started at j or extended from j − 1; adding R(j) × end(j) closes them at j, so K_U is summed over all duplexes without listing each one.
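A sketch of the recursion, assuming a duplex spanning positions i to j has weight start(i), times elongation(m) for every position m from i+1 to j, times end(j). The per-position weights are hypothetical inputs; a brute-force enumeration is included to check that the linear-time recursion gives the same total.

```python
import math


def k_u_dynamic(start, elongation, end):
    """Sum the weights of all duplexes in O(n) with
    R(j) = R(j-1) * elongation(j) + start(j) and K_U(j) = K_U(j-1) + R(j) * end(j).
    """
    r = 0.0    # R(j-1): summed weight of all open duplexes reaching the previous position
    k_u = 0.0  # K_U(j-1): summed weight of all duplexes closed so far
    for s, e, t in zip(start, elongation, end):
        r = r * e + s  # extend every open duplex by one position, or open a new one here
        k_u += r * t   # close every open duplex at this position
    return k_u


def k_u_brute_force(start, elongation, end):
    """Enumerate every duplex i..j explicitly (O(n^2)) to check the recursion."""
    n = len(start)
    total = 0.0
    for i in range(n):
        w = start[i]
        for j in range(i, n):
            if j > i:
                w *= elongation[j]  # extend the duplex from i..j-1 to i..j
            total += w * end[j]
    return total


start, elongation, end = [1.0, 0.5, 2.0], [1.0, 0.8, 0.3], [0.2, 1.0, 0.5]
print(k_u_dynamic(start, elongation, end), k_u_brute_force(start, elongation, end))
```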
... measured intensities as a function of K_U
[Figure: ln(Intensity) versus ln(K_U); Pearson correlation r = 0.63]
linear regression with slope β and intercept α:
$$\ln(I) = \alpha + \beta \ln(K_U)$$
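A minimal least-squares sketch of this regression, also returning the Pearson correlation between ln(K_U) and ln(I). The function name is hypothetical and no real array data are included.

```python
import math


def fit_log_log(k_u, intensity):
    """Least-squares fit of ln(I) = alpha + beta * ln(K_U); also returns Pearson r."""
    xs = [math.log(k) for k in k_u]
    ys = [math.log(i) for i in intensity]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope
    alpha = my - beta * mx           # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation of ln(K_U) and ln(I)
    return alpha, beta, r
```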
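... intensity as score of evidence?
[Figure: ln(Intensity) versus ln(affinity measure), with a threshold on intensity]
Because intensity correlates with the affinity measure, a fixed threshold on intensity favors probes with high cross-hybridization affinity.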
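... alternative score of evidence
score of evidence: deviation of the measured intensity from the intensity predicted by the regression on the affinity measure
$$S(i) = \ln I(i) - \ln \hat{I}(i), \qquad \ln \hat{I}(i) = \alpha + \beta \ln K_U(i)$$
threshold on S(i): signal above background
[Figure: ln(Intensity) versus ln(affinity measure), comparing a threshold on intensity with a threshold on S]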
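A sketch of the alternative score, assuming α and β come from the regression of ln(I) on ln(K_U). In the example, a bright probe with high affinity gets S ≈ 0, while a dimmer probe with low affinity gets the larger score; the values and function names are made up for illustration.

```python
import math


def evidence_scores(k_u, intensity, alpha, beta):
    """S(i) = ln I(i) - ln Î(i), with ln Î(i) = alpha + beta * ln K_U(i) from the regression."""
    return [math.log(i) - (alpha + beta * math.log(k)) for k, i in zip(k_u, intensity)]


def call_probes(scores, threshold):
    """Indices of probes whose signal lies above the affinity-predicted background."""
    return [idx for idx, s in enumerate(scores) if s > threshold]


# Probe 0: high affinity, intensity ~27.2; probe 1: low affinity, intensity 10.
scores = evidence_scores([100.0, 1.0], [math.e * 10, 10.0], alpha=1.0, beta=0.5)
print(call_probes(scores, threshold=1.0))  # [1]
```

A threshold on intensity alone would pick probe 0 instead.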
... other approaches – MAT
Model-based analysis of tiling arrays, for Affymetrix tiling arrays
... other approaches – MA2C
Model-based analysis of two-color arrays
[Figure: model terms for each individual channel (mean, variance) and the covariance between channels]
... other approaches – SNN
standard normal normalization
$$\hat{x}_i = \frac{\log(I_i) - \overline{\log(I)}}{\mathrm{sd}(\log(I))}$$
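Standard normal normalization is a z-score of the log intensities on one array. A sketch, assuming the sample standard deviation is used for sd (the slide does not say which); the function name is hypothetical.

```python
import math
import statistics


def standard_normal_normalization(intensities):
    """x̂_i = (log I_i - mean(log I)) / sd(log I), computed over all probes on one array."""
    logs = [math.log(i) for i in intensities]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation (an assumption)
    return [(v - mean) / sd for v in logs]


print(standard_normal_normalization([100.0, 200.0, 400.0, 800.0]))
```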
... peak-calling
... smooth the data
running average
problem: outliers
-> trimmed mean: discard x % lowest and highest values
problem: variable number of probes in a window of given length
-> upweight windows with more probes:
$$\hat{x} = \sqrt{n_p}\,\mathrm{TM}(x)$$
n_p: number of probes in window, TM(x): trimmed mean
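A sketch of the windowed score, assuming a window of ±`half_width` bp around each probe and the weight sqrt(n_p) from the formula above; `trim_fraction` plays the role of x %. Names are hypothetical, and the quadratic scan over probes keeps the code short rather than fast.

```python
import math


def trimmed_mean(values, trim_fraction):
    """Mean after discarding `trim_fraction` of the lowest and of the highest values."""
    xs = sorted(values)
    k = int(len(xs) * trim_fraction)  # number of values discarded at each end
    kept = xs[k:len(xs) - k] if len(xs) - 2 * k > 0 else xs
    return sum(kept) / len(kept)


def window_scores(positions, values, half_width, trim_fraction=0.1):
    """For each probe, score = sqrt(n_p) * TM(x) over probes within +/- half_width bp."""
    scores = []
    for center in positions:
        window = [v for p, v in zip(positions, values) if abs(p - center) <= half_width]
        n_p = len(window)  # always >= 1: the center probe is in its own window
        scores.append(math.sqrt(n_p) * trimmed_mean(window, trim_fraction))
    return scores


print(trimmed_mean([1, 2, 3, 4, 100], 0.2))  # outlier 100 (and 1) discarded: 3.0
```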
... find the peak summits
how would you do this?
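One possible answer, not necessarily the one intended here: take the smoothed scores, group consecutive probes above a threshold into regions, and report the position of the highest score in each region. The function name and threshold are hypothetical, and gaps in genomic distance between probes are ignored.

```python
def peak_summits(positions, scores, threshold):
    """Group consecutive probes whose score exceeds `threshold` into regions
    and report the position of the highest score in each region."""
    summits = []
    best = None  # (score, position) of the current region's maximum
    for pos, s in zip(positions, scores):
        if s > threshold:
            if best is None or s > best[0]:
                best = (s, pos)
        elif best is not None:
            summits.append(best[1])  # region ended: keep its summit
            best = None
    if best is not None:
        summits.append(best[1])  # region running to the last probe
    return summits


print(peak_summits([0, 10, 20, 30, 40, 50, 60], [0, 2, 5, 3, 0, 4, 1], 1.5))  # [20, 50]
```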
... does it help to account for probe-specific biases?
[Figure: comparison on Affymetrix, Agilent and NimbleGen arrays]
not really
... the control removes probe-specific
biases, but only if properly scaled