doc format

advertisement
Supplementary material – Emanuelsson et al.
EXPERIMENTAL METHODS
Protocols
Here is a detailed description of all three experimental protocols (MAS-B, MAS-N, Affy). In
short, the MAS-N protocol yields in-vitro transcribed, biotin-labeled single-stranded cRNA,
which is fragmented to an average size of 50-200 bp before hybridization. The MAS-B protocol
yields Cy3-aminoallyl-labeled single-stranded cDNA (no fragmentation). The Affymetrix
protocol yields end-labeled (bio-ddATP) double-stranded cDNA which is fragmented to an
average size of 50-100 bp before hybridization.
MAS-N protocol
Preparation of labeled c-RNA targets
Labeled c-RNA was prepared by following the procedure described by Van Gelder et al. (1990).
RNA was converted to double stranded cDNA using an oligo (dT) primer containing the T7 RNA
polymerase promoter
[5’GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(dT)24-3’]. and superscript
reverse transcriptase (RTase) choice system (Invitrogen, CA). Briefly, 10µg total RNA or 2µg
poly(A)+ RNA was incubated with 5X first strand buffer, 0.1M DTT, 10mM each dNTPs, 5pmol
primer for 60 minutes at 420C. Second strand synthesis was accomplished by incubation with
40U DNA polymerase I, 2U of Escherichia coli RNase H, 10mM each dNTPs and 10U of
Escherichia coli DNA ligase in 5X second strand buffer for 2 hours at 160C. The double strand
cDNA synthesis was terminated by incubating with 10U of T4 DNA polymerase for 5 minutes at
160C. Double stranded cDNA was purified using phenol chloroform extraction, ethanol
precipitated, and resuspended in 3.25µL of water.
In vitro transcription (IVT) was used to produce biotin labeled cRNA from the cDNA using the
MEGA script T7 kit (Ambion, TX). Briefly, 1µg double stranded cDNA was incubated with 7.5
mM ATP and GTP, 5.625 mM CTP and UTP and 1.875 mM bio-11-CTP and bio-16-UTP
(Enzo or Perkin Elmer) in 1X transcription buffer and 1X T7 enzyme mix at 370C for 5 hours. In
vitro transcribed biotin labeled cRNA was purified on Qiagen RNeasy mini columns according to
manufacturer’s protocol.
Microarray hybridization with labeled c-RNA and washing
Before hybridization, cRNA was fragmented to an average size of 50-200bp by incubating in 5X
RNA fragmentation buffer (200mM Tris acetate, pH 8.1, 100mM KOAc and 150mM MgOAc) at
950C for 35 minutes. Fragmentation was confirmed by running an aliquot of the sample on an
agarose gel. Microarrays were hybridized with 10-12µg of cRNA in 55µL in the presence of 40%
formamide, 1mM Tris, 0.1mM EDTA, 5X SSC and 0.1% SDS for 18 hours at 420C. Before
application to the array, samples were heated to 950C for 5 minutes, then at 450C until ready for
hybridization (Max 5-30minutes). Hybridization was performed in a MAUI station.
After hybridization, arrays were washed in 0.2% SDS and 0.2X SSC for 2 minutes at 420C and
placed in non-stringent buffer (6X SSPE, 0.01% Tween 20) until ready for the next wash in 0.2X
SSC at room temperature for one minute. After washing arrays were stained with streptavidincy3 conjugate from Amersham Pharmacia for 25 minutes at room temperature followed by a
quick rinse in 0.2X SSC and signals were amplified by antibody amplification mix
(Antistreptavidin and goat IgG) for 25 minutes. Staining and amplification was repeated for 10
more minutes after a quick rinse in 0.2X SSC. This was followed by holding the arrays in nonstringent buffer until ready for wash in 0.2X SSC for one minute followed by 30 seconds wash in
0.05X SSC. The arrays were dried with air duster and were scanned on an Axon 4200B laser
scanner at 5µm resolution.
MAS-B protocol
Preparation of labeled cDNA targets
Labeled cDNA targets from RNA was prepared using aminoallyl cDNA labeling kit (Ambion,
TX). Briefly, poly(A)+ mRNA and oligo dT(1.7µM) were incubated at 700C for 10 minutes and
snap cooled on ice. RnaseH M-MLV reverse transcriptase and M-MLV RT reaction buffer with
0.5mM each dATP, dCTP, dGTP, 0.15mM dTTP and 0.15mM aminoallyl dUTP were added and
the mixture was incubated at 420C for 2 hours. RNA was degraded (sheared) by incubating with
0.2M NaOH at 650C for 15 minutes. The reaction was then neutralized by the addition of 0.3M
HEPES pH 7.0. The cDNAs were precipitated in ethanol with sodium acetate and resuspended in
0.1M NaHCO3 to facilitate coupling of the cy3 mono-amine dye (Amersham) to the aminoallyl
functional group. cDNAs were coupled to N- hydroxysuccinimidyl esters of cy3 dyes (Cyscribe,
Amersham Biosciences) for 2 hours in the dark. The labeled cDNAs were purified with cyscribe
GFX glass fiber spin columns (Amersham Bioscience, Piscataway NJ) and isopropanol
precipitated.
Microarray hybridization with labeled cDNA and washing
Microarrays were hybridized with 2-3µg labeled cDNA in 360µL hybridization buffer (50mM
MES, 0.5M NaCl, 10mM EDTA and 0.005% Tween 20) for 20 hours at 500C. Hybridizations
were performed in disposable adhesive chambers (Grace Biolabs, Bend, OR) in a hybridization
oven with constant agitation. After hybridization, the arrays were washed in non-stringent buffer
(6X SSPE, 0.01%[V/V] Tween 20) for 10 minutes at room temperature followed by washing in
stringent buffer (100mM MES, 0.1M NaCl, 0.01% Tween 20) for 30 minutes at 450C. This was
followed by a 5 minute wash in non-stringent buffer and a 4 minute wash in 0.2X SSC. The
arrays were dried with air duster. Fluorescence micrographs were acquired with an Axon 4200B
laser scanner at 5µm resolution.
Affymetrix protocol
Preparation of end labeled cDNA targets
Briefly, total RNA(10µg) and random primers (1.25µg) were incubated at 700C for 10 minutes
(Fast ramp) and 150C for 30 minutes (20minutes ramp) in presence of bacterial controls (Lys,
dap, phe etc). As soon as the reaction reaches 150C, 1X first strand buffer, 10mM DTT and
10mM dNTPs were added. This was followed by the addition of 2000U of superscript II
(Invitrogen) and the mixture was incubated at 420C for 60minutes(20minutes ramp) and 700C for
15minutes (fast ramp).
Second strand cDNA synthesis was carried out in a cycle of 160C for 120minutes and 700C for
15minutes by adding 45µL of RNase free water, 150µL 1X second strand buffer, 15µL 10mM
dNTPs, 50U Escherichia coli DNA ligase, 200U Escherichia coli DNA polymerase I and 10U
Escherichia coli RNase H. RNA was degraded by the addition of RNase cocktail [30U RNase
H(Epicentre) 15U and 60U of RNase A/T1(Ambion)] and incubate at 370C for 20minutes. The
double stranded cDNA was ethanol precipitated. cDNA fragmentation was done to get 50-100bp
size fragments by incubating at 370C for 8 minutes in presence of 1X one phor all
buffer(Pharmacia) and 5µL DNase mix (Epicentre) followed by DNase I inactivation by heating
to 990C for 10 minutes. Fragments were checked on 1% agarose gel. Fragmented cDNA was end
labeled with 1mM bio-ddATP (Perkin Elmer) using terminal deoxy transferase in presence of 5X
TDT buffer(Roche), 25mM COCl2 (Roche) and the reaction mixture was incubated at 370C for 2
hours. The labeled double stranded cDNA was used for hybridization to the array. Before
hybridization, the arrays were prehybridized with 1X MES triton (12X MES, 5M NaCl, 0.5M
EDTA, 1% Triton X 100) for 1hour at 450C in a hybridization oven with 45rpm rotation. Labeled
cDNA were added to hybridization buffer (1X MES, 3M TMAC, 30-50pM biotinylated oligo
948(B2), 1X Eukaryotic hybridization control, 100µg/mL Salmon sperm DNA, 0.02% Triton X),
denatured for 10 minutes, followed by 10 minutes incubation at 450C and spun at maximum
speed for 3 minutes. Hybridization was performed at 450C for 18 hours with 45rpm rotation.
Washing, staining and scanning were performed as described in the Affymetrix gene chip
expression analysis technical manual. In brief, the arrays were washed and stained in the
Affymetrix Fluidics Station 400 using an antibody amplification protocol and streptavidin
phycoerythrin. Arrays were scanned with the Affymetrix gene Array Scanner.
PCR validation
Poly (A)+ RNA from human placenta was purchased from Ambion and prepared as above. 400ng
of poly(a)+ RNA was primed with random hexamers and reverse transcribed to cDNA at 650C for
5 minutes using Thermoscript RT system from Invitrogen in presence of 5X cDNA synthesis
buffer, 10mM DTT and 40U RNase out in a reaction volume of 20µl. In the second step, PCR
reaction was performed using Taq DNA polymerase. 4µl of the cDNA reaction was used for
PCR in a 96 well plate format using gene specific primers. Each reaction was containing 10X
PCR buffer, 50mM MgCl2, 10mM dNTPs, 10µM forward primer and 10µm reverse primer in a
50µl reaction volume. After denaturation at 940 for 3 minutes, a 3 step PCR cycle was used as
follows, 940C for 1minute, annealing at 580C for 1minute and extension at 720C for 1 minute for
40 cycles, followed by a 3 minute final extension at 720C. An identical aliquot of each reaction
was used as a minus reverse transcriptase control. PCR products were electrophoressed on 2.5%
agarose gel.
CAPTIONS TO SUPPLEMENTARY FIGURES
General
Figure S1 -S6 are positive predictive value (PPV) vs. sensitivity plots visualizing the comparison
of TAR sets to the Gencode gene annotation, as detailed in the text body of the Results and
Methods sections of the main paper. The positive predictive value (PPV) is defined as the number
of nucleotides in TARs that overlap with exonic regions, divided by the total number of
nucleotides in the TAR set (sometimes this is called specificity (spec.)). The sensitivity (sens.) is
defined as the number of nucleotides in annotated exons that overlap with TARs, divided by the
total number of nucleotides in annotated exons. Figures S1-S4 represent the comparison of all
individual MAS slides and all the scoring schemes that were tested on the MAS data. Figures S5
and S6 represent the comparison of the best (in terms of agreement with annotation) sets of MAS
TARs with the Affymetrix TARs.
Figures S7 and S8 represent the unscaled length distribution and the probe number distribution of
the TAR sets that are analyzed in the main text.
Figure S1a
Number of nucleotides in placental TARs as a function of segmentation threshold (percentiles).
Affy, MAS-B, and MAS-N placenta TARs were generated with the maxgap/minrun algorithm
based on the scored hybridization intensity data using a genomic window and technical replicates
and one of the following scoring methods (as noted in the legend in the Figure): standard sign
test, Fwd-Rev scoring using Wilcoxon signed rank test pseudomedian point estimator on FwdRev differences, weighted sign test, Wilcoxon signed rank test on PM-MM differences ("PMMM P-median").
Figure S1b
Number of nucleotides in NB4 TARs as a function of segmentation threshold (percentiles). Affy
and MAS-N NB4 TARs were generated with the maxgap/minrun algorithm based on the scored
hybridization intensity data using a genomic window and technical replicates and one of the
following scoring methods (as noted in the legend in the Figure): standard sign test, weighted sign
test, Wilcoxon signed rank test on PM-MM differences ("PM-MM P-median").
Figure S2a
Positive predictive value (PPV) versus sensitivity for different ways of scoring and segmenting
the placenta MAS-B and MAS-N data, varying the segmentation threshold from 70th percentile
(to the right in the figure) to 99th percentile (to the left).
Figure S2b
Positive predictive value (PPV) versus sensitivity for different ways of scoring and segmenting
the NB4 MAS-N data, varying the segmentation threshold from 70th percentile (to the right in the
figure) to 99th percentile (to the left).
Figure S3
PPV versus sensitivity for MAS-N and Affy NB4 data, varying the segmentation threshold from
70th percentile (to the right in the figure) to 99th percentile (to the left). The average results of
TARs generated from raw intensities from single arrays for Affy (PM only, and PM-MM) and
MAS-N are plotted, as well as scored results for Affy and MAS-N.
Figure S4a
Positive predictive value (PPV) versus sensitivity for TARs generated from single arrays:
placenta MAS-B ("prot2") and MAS-N ("prot1"), varying the segmentation threshold from 75th
percentile (to the right in the figure) to 99th percentile (to the left). Strand information is kept.
Figure S4b
Identical to Figure S4a except that strand information is not kept ("merged strands").
Figure S5a
Positive predictive value (PPV) versus sensitivity for TARs generated from single arrays: NB4
MAS-N ("prot1"), varying the segmentation threshold from 75th percentile (to the right in the
figure) to 99th percentile (to the left). Strand information is kept.
Figure S5b
Identical to Figure S5a except that strand information is not kept ("merged strands").
Figure S6
(A) Length distribution of placental and NB4 TARs (all 5 sets). (B) Distribution of number of
probes per TAR. The measurement corresponding to a length of 16 probes actually contain all
TARs that are constructed from 16 or more probes ("16<").
Figure S7
Distribution (smoothed) of scores of exons (blue lines) and introns (red lines) for the 5
experiments, using the scoring schemes chosen for the comparison (standard sign test for MAS;
Wilcoxon signed rank test for Affy). The score of a particular exon or intron is defined as the
median score of all the probes that to at least 50% overlap the exon or intron. (A) Placenta MASB; (B) Placenta MAS-N; (C) Placenta Affy; (D) NB4 MAS-N; (E) NB4 Affy.
Figure S8
Intersection of TAR sets, measured in number of overlapping nucleotides. (A) All three placenta
TAR sets (MAS-B, MAS-N, Affy); (B) Both NB4 TAR sets (MAS-N, Affy); (C) All three MAS
TAR sets (placenta MAS-B, placenta MAS-N, NB4 MAS-N); (D) Both Affy TAR sets (placenta
and NB4). Each subfigure is divided into 4 panels: (I) total number of nucleotides in the sector;
(II) number of nucleotides overlapping with conserved regions; (III) number of nucleotides
overlapping with Gencode exons; (IV) number of nucleotides overlapping with both Gencode
exons and conserved regions. All numbers are in kbases.
Figure S9
Distribution of Gencode exon coverage by TARs: (A) all exons; (B) 5’ exons; (C), 3’ exons. Xaxis, the fraction to which an exon is covered by a TAR; 0.0-1.0 split up in 10 bins. Y-axis, the
number (A) or fraction (B, C) of exons covered to the fraction represented on the x-axis.
Figure S10
Distribution of genic and intergenic TAR overlap with conserved regions for placenta (A) and
NB4 (B) TAR sets. X-axis, the fraction to which a TAR is covered by a conserved region; 0.0-1.0
split up in 10 bins. Y-axis, the number of genic (solid bars) and intergenic (broken bars) TARs
covered to the fraction represented on the x-axis. Conserved regions were obtained from the
Consensus-Union Mlagan/TBA UCSC ENCODE Genome Browser track
(http://genome.ucsc.edu/ENCODE/).
SUPPLEMENTARY TABLES
Table S1
Overlap between TAR sets from technical replicates (T1, T2, T3) for placenta batch 1 (B1) and NB4 batch 3 (B3), in % of
total number of nucleotides within TARs from each array that overlaps with TARs from the other array. TARs were
generated from single array intensities using the maxgap/minrun/threshold parameters as specified for each experiment in
Table S1.
(A) Placenta MAS-B
B1
T1
T2
T1
60.0
T2
66.5
T3
66.6
64.5
(B) Placenta MAS-N
T3
64.4
69.1
-
(D) Placenta Affy
B1
T1
T1
T2
91.9
T3
88.5
T2
88.2
86.9
T3
89.8
91.9
-
B1
T1
T2
T3
T1
82.5
88.3
T2
86.9
84.6
T3
87.2
79.3
-
(C) NB4 MAS-N
B3
T1
T2
T1
85.7
T2
86.0
-
(E) NB4 Affy
B3
T1
T1
T2
83.9
T2
87.8
-
Table S2 [expansion of Table 3 in the manuscript]
Characteristics of TAR sets generated using different biological samples/platforms/scoring methods/segmentation
methods (data for ENCODE regions ENm001-ENm011). For the minrun/maxgap approaches, marked with an asterisk
(*), the TAR sets and parameter settings shown correspond to the sets as close as possible to a size of 680k nucleotides.
For the HMM approach, marked with a plus sign (+), the TAR sets corresponding to the optimal state path are shown
(Viterbi decoding).
--Experiment ID-- Scoring method and
Segmentation parameters
threshold/minrun/maxgap
Placenta MAS-B Sign test win.160 91/50/80
MAS-B Sign test weighted 92/50/80
MAS-B Fwd-Rev win. 160 92/50/80
MAS-B Sign test win. 160 HMM-v
MAS-N Sign test win.160 92/50/80
MAS-N Sign test weighted 92/50/80
MAS-N Fwd-Rev win. 160 92/50/80
MAS-N Sign test win. 160 HMM-v
Affy
PM-MM P-med. 90/50/50
--Number of TARs and nucleotides-- Mean/
----Stranded------ ---Unstranded--- Median
#TARs #bases #TARs #bases length
4079
955k
2545
684k
269/180
4439
824k
2875
622k
216/144
4294
655k
4294
655k
154/108
6945
3120k 4249
1872k 441/284
3853
768k
3248
701k
216/144
4273
840k
3576
763k
213/144
4397
670k
4397
670k
152/nc
6056
1888k 4610
1522k 330/194
3694
629k
170/105
Gencode cmp.
Sens.
PPV
(%)
(%)
24.6
35.9
21.8
35.1
23.3
35.3
53.2
28.5
22.3
31.7
23.9
31.4
12.2
18.3
41.8
27.5
37.0
58.6
(*)
(*)
(*)
(+)
(*)
(*)
(*)
(+)
(*)
NB4
3520
3711
4391
16810
-
697k
721k
671k
5735k
-
2936
3085
4391
11714
4674
632k
653k
671k
4097k
629k
216/144
212/144
153/nc
350/216
135/91
19.1
19.6
12.6
62.9
26.5
(*)
(*)
(*)
(+)
(*)
2563
1018k
2482
1001k
403]
MAS-N
MAS-N
MAS-N
MAS-N
Affy
[Gencode exonic
Sign test win.160
Sign test weighted
Fwd-Rev win. 160
Sign test win. 160
PM-MM P-med.
93/50/80
93/50/80
92/50/80
HMM-v
87/50/50
30.2
30.0
18.8
15.4
41.8
Download