Statistical Analysis of MeDIP-chip Data
Array design: MeDIP-chip data were collected on NimbleGen RN34 CpG Island
Plus RefSeq Promoter Microarrays. The array contained 15790 CpG islands
annotated by UCSC and all 15287 well-characterized RefSeq promoter regions.
The promoter regions spanned approximately -3880 bp to +970 bp relative to the
transcription start sites (TSSs).
Data pre-processing: Raw data were extracted as pair files by NimbleScan
software. We performed median-centering, quantile normalization, and linear
smoothing with the Bioconductor packages Ringo [1], limma [2], and MEDME [3].
After normalization, a normalized log2-ratio data set was created for each
sample. Although we considered alternative normalization schemes, this scheme
correlated best with the pyrosequencing data we had (Spearman rank
correlation of 0.45).
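As a rough illustration of two of the pre-processing steps named above, the following minimal sketch shows median-centering and quantile normalization in plain Python. It is not the Ringo, limma, or MEDME code that was actually used; the function names, the input layout (one list of values per sample), and the tie handling are assumptions made for the example.

```python
from statistics import median


def median_center(values):
    """Subtract the array-wide median so the values are centred on zero."""
    m = median(values)
    return [v - m for v in values]


def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    arrays: list of equal-length lists, one per sample (assumed layout).
    Ties are broken by original probe order, which is a simplification.
    """
    n = len(arrays[0])
    # For each array, the probe indices sorted by increasing value.
    orders = [sorted(range(n), key=lambda i: a[i]) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # The probe with this rank receives the reference value of that rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

For instance, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` replaces each value by the across-sample mean of the values sharing its rank, so both samples end up with the same set of values.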
Peak calling at the replicate level: We smoothed the normalized log2 ratios
within each array using a simple moving average with a window size of 3
probes. Then, for each replicate, we generated candidate peaks with the ACME
algorithm [4] using window = 700 and thresh = 0.95 (for do.aGFF.calc()), and
thresh = 5e-3 (for findRegions()). We filtered these candidate peaks by
requiring at least 2 consecutive probes within the peak to exceed the threshold
used by ACME. Overall, we kept the initial step of generating candidate peaks
liberal to obtain many peaks and chose to screen these by a downstream
statistical analysis for differential enrichment.
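The smoothing and the consecutive-probe filter can be sketched as follows. This is a simplified illustration, not the ACME implementation: the centered window, the handling of edge probes, and the `cutoff` parameter (standing in for whatever threshold ACME applied) are assumptions.

```python
def moving_average(values, window=3):
    """Centered simple moving average; edge probes average the neighbours available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def longest_run_above(values, cutoff):
    """Length of the longest stretch of consecutive probes exceeding cutoff."""
    best = run = 0
    for v in values:
        run = run + 1 if v > cutoff else 0
        best = max(best, run)
    return best


def keep_peak(probe_values, cutoff, min_consecutive=2):
    """Keep a candidate peak only if enough consecutive probes exceed cutoff."""
    return longest_run_above(probe_values, cutoff) >= min_consecutive
```

With `cutoff=0.5`, a peak with probe values `[0.1, 0.9, 0.8, 0.2]` would be kept, whereas `[0.1, 0.9, 0.2, 0.8]` would be dropped because its two high probes are not adjacent.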
Identifying differentially methylated regions (DMRs) by pattern generation
and filtering: Each identified peak was classified into one of the following 5
patterns, where the control, folate, and folate+TSA replicates are represented
by components 1-3, 4-6, and 7-9 of the pattern, respectively:
0.0.0.1.1.1.0.0.0: 327 DMRs
0.0.0.1.1.1.1.1.1: 450 DMRs
1.1.1.0.0.0.0.0.0: 490 DMRs
1.1.1.0.0.0.1.1.1: 187 DMRs
1.1.1.1.1.1.0.0.0: 2052 DMRs
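One way to picture the pattern encoding is the sketch below. It assumes that each component is 1 when a peak was called in that replicate and 0 otherwise, which the pattern notation suggests but the text does not state outright; the function names are hypothetical.

```python
from collections import Counter

# The five patterns retained as DMR candidates, as listed above.
DMR_PATTERNS = {
    "0.0.0.1.1.1.0.0.0",
    "0.0.0.1.1.1.1.1.1",
    "1.1.1.0.0.0.0.0.0",
    "1.1.1.0.0.0.1.1.1",
    "1.1.1.1.1.1.0.0.0",
}


def pattern_of(calls):
    """Encode 9 per-replicate peak calls as a dotted 0/1 string.

    calls: control (positions 1-3), folate (4-6), folate+TSA (7-9).
    """
    if len(calls) != 9:
        raise ValueError("expected 9 replicate calls")
    return ".".join("1" if c else "0" for c in calls)


def is_dmr_pattern(calls):
    """True if the calls match one of the five retained patterns."""
    return pattern_of(calls) in DMR_PATTERNS


def tally_patterns(all_calls):
    """Count how many regions fall into each retained pattern."""
    return Counter(pattern_of(c) for c in all_calls if is_dmr_pattern(c))
```

Under this reading, a region with peaks in only some replicates of a group, or with peaks in all nine replicates, matches none of the five patterns and is not counted.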
We further screened these DMRs based on a limma differential enrichment
analysis [2] at the probe level. We required at least 2 consecutive probes with an
adjusted p-value [5] smaller than 0.1 within each DMR. In addition, we required
the fold changes within the DMRs to confirm the ordering implied by the patterns.
For example, for the 0.0.0.1.1.1.0.0.0 pattern, we required that the average,
across replicates, of the average log2 ratio of the probes within the DMR for the
folate group be larger than those for both the control and the folate + TSA
groups. Many genes had multiple DMRs. Table 1 below lists a further classification
of the DMRs with respect to CpG island and gene annotations.
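The two screening criteria can be sketched as follows, assuming the probe-level p-values have already been obtained (in the analysis they came from limma, which is not reproduced here). The Benjamini-Hochberg adjustment [5] is written out by hand; the function names and the input layout (nine lists of probe log2 ratios, one per replicate, in pattern order) are assumptions for illustration.

```python
from statistics import mean


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_probe_filter(adjusted, alpha=0.1, min_consecutive=2):
    """True if at least min_consecutive adjacent probes have adjusted p < alpha."""
    run = 0
    for p in adjusted:
        run = run + 1 if p < alpha else 0
        if run >= min_consecutive:
            return True
    return False


def ordering_confirmed(pattern, replicate_probe_values):
    """Check that groups marked 1 have a higher mean log2 ratio than groups marked 0.

    replicate_probe_values: 9 lists of probe log2 ratios within the DMR.
    """
    bits = pattern.split(".")
    rep_means = [mean(v) for v in replicate_probe_values]
    # Average the three replicate means of each group (control, folate, folate+TSA).
    group_means = [mean(rep_means[g * 3:g * 3 + 3]) for g in range(3)]
    group_bits = [bits[0], bits[3], bits[6]]
    high = [m for b, m in zip(group_bits, group_means) if b == "1"]
    low = [m for b, m in zip(group_bits, group_means) if b == "0"]
    if not high or not low:
        return False
    return min(high) > max(low)
```

In this sketch a DMR would be retained only when both `passes_probe_filter(...)` and `ordering_confirmed(...)` return true. Whether the adjustment was applied across all probes on the array or within a subset is not stated in the text, so the sketch leaves the choice of input to the caller.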
Table 1: Summary of the differential methylation patterns. ***We can modify what
rows we report in this table***

Pattern                                       0.0.0.1.1.1.0.0.0  0.0.0.1.1.1.1.1.1  1.1.1.0.0.0.0.0.0  1.1.1.0.0.0.1.1.1  1.1.1.1.1.1.0.0.0
# of DMRs                                     327                450                490                187                2052
# of DMRs mapping to gene promoters           255                350                440                147                1689
# of genes with a DMR                         89                 72                 149                30                 306
# of genes with a highly significant DMR      52                 49                 81                 8                  216
# of unique genes with a DMR in a CpG island  34                 40                 9                  17                 88
# of DMRs in non-gene CpG islands             37                 25                 24                 17                 210
References:
1) Toedling J, Sklyar O, Huber W: Ringo – an R/Bioconductor package for
analyzing ChIP-chip readouts. BMC Bioinformatics 2007, 8:221.
2) Smyth, G. K. (2004). Linear models and empirical Bayes methods for
assessing differential expression in microarray experiments. Statistical
Applications in Genetics and Molecular Biology 3, No. 1, Article 3.
3) M. Pelizzola, Y. Koga, A. E. Urban, M. Krauthammer, S. Weissman, R.
Halaban, and A. M. Molinaro (2008). MEDME: an experimental and
analytical methodology for the estimation of DNA methylation levels based
on microarray derived MeDIP-enrichment. Genome Research,
18(10):1652–1659.
4) Scacheri, P.C., Crawford, G.E., Davis, S. (2006). Statistics for ChIP-chip
and DNase hypersensitivity experiments on NimbleGen arrays. Methods in
Enzymology 411: 270-282.
5) Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: a
practical and powerful approach to multiple testing. Journal of the Royal
Statistical Society, Series B (Methodological) 57 (1): 289–300.