Experiment Design

SUPPLEMENTARY METHODS
Experimental Design
We analysed gene expression profiles in two experimental models:
1. APL blasts derived from three patients bearing the t(15;17) and expressing
PML/RAR, before and after treatment with 10⁻⁶ M retinoic acid (RA) (Sigma-Aldrich, St. Louis, Missouri, USA) in vitro for four hours;
2. A U937 clone conditionally expressing PML/RAR (U937-PR). In these cells, the
cDNA encoding PML/RAR is under the transcriptional control of the zinc (Zn)-inducible mouse metallothionein (Mt) promoter. The U937-PR clone was treated with
100 µM ZnSO4 for 12 hours before 10⁻⁶ M RA was added to the culture for 4
hours. A U937 bulk population containing the empty cloning vector (U937-Mt), also
treated with 100 µM ZnSO4 for 12 hours and with 10⁻⁶ M RA for 4 hours, was
used as reference. The gene expression profile of the U937-PR clone prior to and after 4
hours of treatment with 10⁻⁶ M RA was analyzed and compared to that obtained
from the U937-Mt cells.
Samples used, Extract preparation and labelling
Leukemic blasts from peripheral blood were obtained at disease onset (prior to
any therapy and sensitive to RA therapy) from three patients with newly diagnosed APL
(AML-M3 according to the FAB classification) who showed ≥ 75% leukemic
infiltration. The collected blasts were isolated by centrifugation on a Ficoll-Hypaque
gradient as previously described and then treated in vitro for 4 hours with 10⁻⁶ M
retinoic acid (RA) prior to RNA extraction.
For the U937-PR and U937-Mt cell lines, three independent vials were thawed, and
induction (with ZnSO4 or RA) and RNA extraction were performed separately for each vial.
Prior to RNA extraction, a small aliquot of cells was lysed in Laemmli lysis buffer and all
experiments were controlled for PML/RAR fusion protein expression by Western
blotting.
Total RNA was extracted using TRIzol Reagent (Gibco), followed by clean up on
RNeasy mini/midi columns (RNeasy Mini/Midi Kit, Qiagen). For each cell line, an RNA
pool was obtained by mixing equal quantities of total RNA from each of the three
independent RNA extractions. Biotin-labelled cRNA targets were synthesized starting
from 5 µg of total RNA. Double-stranded cDNA synthesis was performed with the GIBCO
SuperScript Custom cDNA Synthesis Kit, and biotin-labelled antisense RNA was
transcribed in vitro using Ambion’s In Vitro Transcription System, including Bio-11-UTP and Bio-11-CTP (NEN Life Sciences, PerkinElmer Inc, Boston, Massachusetts,
USA) in the reaction. All steps of the labelling protocol were performed as suggested by
Affymetrix
(http://www.affymetrix.com/support/technical/manual/expression_manual.affx). The size
and the accuracy of quantitation of targets were checked by agarose gel electrophoresis of
2 µg aliquots, prior to and after fragmentation. After fragmentation, targets were diluted in
hybridisation buffer at a concentration of 150 µg/ml.
A scheme of the experimental strategy is shown below.
[Scheme: U937-PR9 cells yield the pooled PML/RAR target, hybridised to the PML/RAR chips; U937-MT cells yield the pooled control target, hybridised to the control chips.]
Hybridisation procedures and parameters
Hybridization mix for target dilution (100 mM MES, 1 M [Na+], 20 mM EDTA,
0.01% Tween 20) was prepared as indicated by Affymetrix, including pre-mixed biotin-labeled control oligo B2 and bioB, bioC, bioD and cre controls (Affymetrix cat# 900299)
at final concentrations of 50 pM, 1.5 pM, 5 pM, 25 pM and 100 pM, respectively. Targets
were diluted in hybridization buffer at a concentration of 150 µg/ml and denatured at
99°C prior to introduction into the GeneChip cartridge.
Targets were tested for quality by hybridization to Affymetrix Test3 Arrays, cat#
900341. Two copies of the complete GeneChip HG-U133 set (HG-U133A, HG-U133B)
were then hybridized with each biotin-labeled target.
Hybridizations were performed for 14-16 hours at 45°C in a rotisserie oven.
GeneChip cartridges were washed and stained in the Affymetrix fluidics station following
the EukGE-WS2 standard protocol (including Antibody Amplification):
1. Wash 10 cycles of 2 mixes/cycle with Wash Buffer A (6X SSPE, 0.01% Tween
20) at 25°C
2. Wash 4 cycles of 15 mixes/cycle with Wash Buffer B (100 mM MES, 0.1 M
[Na+], 0.01% Tween 20) at 50°C
3. Stain the probe array for 10 minutes in SAPE solution (10 µg/mL SAPE in 100
mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/mL BSA) at 25°C
4. Wash 10 cycles of 4 mixes/cycle with Wash Buffer A at 25°C
FIRST SCAN
5. Stain the probe array for 10 minutes in antibody solution (Normal Goat IgG 0.1
mg/mL, biotinylated antibody 3 µg/mL, 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2
mg/mL BSA) at 25°C
6. Stain the probe array for 10 minutes in SAPE solution at 25°C
7. Final Wash 15 cycles of 4 mixes/cycle with Wash Buffer A at 30°C
SECOND SCAN
Images were scanned using an Affymetrix GeneArray Scanner, using default
parameters. Each chip was scanned twice, to obtain two different images: the first scan
was performed after the first SAPE staining procedure (between steps 4 and 5 above), and
the second scan was performed after antibody amplification of the signal, at the end of
the washing procedure. The resulting images were analysed using Microarray Suite
version 5 (MASv5), Affymetrix cat# 690018. Data obtained from the two scans was
processed independently, and merged for each sample only at the end of all elaborations.
Measurement data and specifications
“Absolute analysis” was performed for each chip with MASv5 software using
default parameters, scaling all images to a value of 500. Report files were extracted for
each chip, and performance of labelled targets was evaluated on the basis of several
values (scaling factor, background and noise values, % present calls, average signal
value, etc).
Results derived from APL blasts after RA treatment (sample) were compared to
results from the APL blasts prior to RA treatment (reference) by “comparative analysis”,
using the reference chips as baseline. Each sample chip was compared to both reference
chips for identification of regulated genes. Furthermore, duplicate sample and reference
chips were compared to each other for calculation of noise (see scheme below, 1).
The same procedure was followed for U937-PR cells after
RA treatment (sample) compared to U937-PR cells prior to RA treatment
(reference) (see scheme below, 2); for U937-Mt cells after RA treatment
(sample) compared to U937-Mt cells prior to RA treatment (reference) (see scheme below,
3); and for U937-PR cells expressing PML/RAR (sample) compared to U937-Mt cells
(reference) (see scheme below, 4).
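[Scheme of chip comparisons. Panel 1: APL#1 + RA (chips _1 and _2) versus APL#1 (chips _1 and _2); similarly for patients APL#2 and APL#3. Panel 2: U937-PR RA (chips 1 and 2) versus U937-PR (chips 1 and 2). Panel 3: U937-Mt RA (chips 1 and 2) versus U937-Mt (chips 1 and 2). Panel 4: U937-PR (chips 1 and 2) versus U937-Mt (chips 1 and 2). In each panel, duplicate chips of the same target are compared to each other (NOISE) and sample chips are compared to reference chips (COMPARISON).]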
This procedure yielded four comparison files for each sample under analysis.
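The pairing logic described above (each sample chip against both reference chips, plus duplicate-versus-duplicate noise comparisons) can be sketched as follows. This is only an illustration: the function name `comparison_plan` is hypothetical, and the chip labels are taken from the scheme.

```python
from itertools import combinations, product


def comparison_plan(sample_chips, reference_chips):
    """Enumerate chip pairs for comparative analysis.

    Signal comparisons pair every sample chip with every reference chip;
    noise comparisons pair duplicate chips of the same target.
    """
    comparisons = list(product(sample_chips, reference_chips))
    noise = list(combinations(sample_chips, 2)) + list(combinations(reference_chips, 2))
    return comparisons, noise


# Duplicate chips for patient APL#1, labelled as in the scheme.
comparisons, noise = comparison_plan(
    ["APL#1+RA_1", "APL#1+RA_2"],
    ["APL#1_1", "APL#1_2"],
)
```

With two sample chips and two reference chips, this gives the four sample-versus-reference comparison files mentioned in the text, plus two noise comparisons.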
Data thus obtained was then subjected to further elaboration using the DCall-Fold
Change analysis procedure. DCall–Fold Change analysis is performed on Affymetrix
comparison files. The Affymetrix “Difference Call” (DCall) corresponds to the
qualitative information about the status of a Probe Set in the two conditions considered: it
indicates if the expression level of a Probe Set is decreased (D), mildly decreased (MD),
increased (I) or mildly increased (MI) in the sample as compared to the reference. “Fold
Change” gives the corresponding quantitative information: it is calculated from the Signal
Log ratio of Affymetrix comparison files.
\[
FC_i =
\begin{cases}
2^{SLR_i} & \text{if } SLR_i > 0 \\
-1 / 2^{SLR_i} & \text{if } SLR_i < 0
\end{cases}
\]
where \(SLR_i\) is the signal log ratio value for Probe Set i.
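The fold-change rule above can be written as a small function. This is a minimal sketch, not the code used in the study; the name `fold_change` is hypothetical, and the value returned for a signal log ratio of exactly zero is an assumption, since the text defines only the positive and negative cases.

```python
def fold_change(slr: float) -> float:
    """Convert a signal log ratio (base 2) into a signed fold change."""
    if slr > 0:
        return 2.0 ** slr
    if slr < 0:
        # 2 ** slr is below 1 here, so this gives a value below -1.
        return -1.0 / (2.0 ** slr)
    # Assumption: SLR == 0 is not covered by the text; treat it as no change.
    return 1.0
```

For example, a signal log ratio of 1 gives a fold change of 2, and a signal log ratio of -1 gives -2.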
The expression of a gene represented by a specific Probe Set is considered decreased
if, in each comparison file analyzed, it has a DCall of “D” or “MD” and its
Fold Change value is lower than a fixed cut-off. Conversely, the expression of a given
gene is considered increased if its representative Probe Set, in each comparison file
analyzed, has a DCall of “I” or “MI” and its Fold Change value is higher
than a fixed cut-off. For the purpose of finding common regulated target genes, the
analysis was performed at low stringency, and the fold change cut-off value was set to
1.3 (increased) or –1.3 (decreased).
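The DCall-Fold Change decision rule can be sketched as below, assuming each Probe Set comes with one (DCall, Fold Change) pair per comparison file. The function name, the tuple layout, and the "unchanged" label are illustrative assumptions and do not come from the MASv5 or GenePicker software.

```python
INCREASE_CALLS = {"I", "MI"}
DECREASE_CALLS = {"D", "MD"}


def classify_probe_set(comparisons, cutoff=1.3):
    """Classify one Probe Set from its (dcall, fold_change) pairs.

    The Probe Set is called regulated only if every comparison file agrees
    on both the Difference Call and the fold-change cut-off.
    """
    if not comparisons:
        return "unchanged"
    if all(call in INCREASE_CALLS and fc > cutoff for call, fc in comparisons):
        return "increased"
    if all(call in DECREASE_CALLS and fc < -cutoff for call, fc in comparisons):
        return "decreased"
    return "unchanged"
```

Requiring agreement across all comparison files means that a single discordant file is enough to leave a Probe Set out of the regulated lists.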
The t-statistic is well suited for finding differentially expressed genes because it
allows the selection of an expression pattern that has maximal difference in mean level of
expression between two groups, and minimal variation of expression within each group.
A two-sided t-test was performed on Signal values generated by MASv5, considering
that group 1, N(μ1, s1), and group 2, N(μ2, s2), follow Gaussian distributions. The parameters
of the distributions are unknown, and we assume that the standard deviations are equal.
\[
t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}} \sqrt{\frac{n_1 n_2 f}{n_1 + n_2}}
\]
Where:
•\(\bar{X}_1\), \(\bar{X}_2\) are the means of signal values for group 1 and group 2, respectively,
•\(n_1\), \(n_2\) are the number of signals in group 1 and group 2, respectively,
•\(s_1^2\), \(s_2^2\) are the variances of the signal values in group 1 and group 2, respectively,
•f is the number of degrees of freedom:
\[
f = n_1 + n_2 - 2
\]
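As an illustration of the equal-variance t-statistic defined above, the following sketch computes t and the degrees of freedom from two lists of Signal values. It is not the GenePicker implementation; the function name `pooled_t` is hypothetical, and the sketch assumes at least two values per group and a non-zero pooled sum of squares.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Return (t, f) for the two-sample t-test with equal variances."""
    n1, n2 = len(group1), len(group2)
    f = n1 + n2 - 2  # degrees of freedom
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_ss = (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    t = (mean(group2) - mean(group1)) / math.sqrt(pooled_ss)
    t *= math.sqrt(n1 * n2 * f / (n1 + n2))
    return t, f


t_value, dof = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Converting t into a two-sided P value additionally requires the Student t distribution with f degrees of freedom, which is left out of this sketch.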
The McNemar test was used to determine data quality and cut-off values (see Abell,
M.L., Braselton, J.P., and Rafter, J.A., 1999. Statistics with Mathematica. Academic
Press). The McNemar test is often used in clinical trials to assess the efficacy of drug
treatment versus placebo controls. The test compares lists of “yes/no” values. A P value
>0.01 indicates lack of efficacy (the lists are equivalent), whereas P values <0.01 suggest
the presence of a therapeutic effect. For our data, we used the results from pairwise
chip comparisons to obtain these lists, where yes means “gene regulated” and no
means “gene not regulated”.
Lists of regulated genes resulting from chip comparisons between two test chips or
between two control chips were used to determine the noise level of the experiments
(called noise lists), whereas chip comparisons between a test and a control chip were used
to determine the signal (called signal lists or data lists). Three parameters were evaluated:
1. The equivalence of samples and chip performances (noise list vs. noise list, P >
0.01).
2. The presence of differences in transcript levels (noise list vs. signal list, P < 0.01).
3. The reproducibility of measurement of such differences (signal list vs. signal list,
P > 0.01).
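The list comparisons above can be sketched with a McNemar test on two aligned yes/no lists. The text does not state which variant of the test was used, so the exact (binomial) form shown here is an assumption, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(list_a, list_b):
    """Two-sided exact McNemar P value for two aligned yes/no lists.

    Only discordant positions (yes in one list, no in the other) carry
    information; under the null hypothesis they split 50/50.
    """
    if len(list_a) != len(list_b):
        raise ValueError("lists must cover the same genes in the same order")
    b = sum(1 for x, y in zip(list_a, list_b) if x and not y)
    c = sum(1 for x, y in zip(list_a, list_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant genes: the lists are identical
    tail = sum(comb(n, k) for k in range(min(b, c) + 1))
    return min(1.0, 2 * tail / 2 ** n)
```

Under the criteria listed above, two noise lists, or two signal lists, would be expected to give P > 0.01, whereas a noise list compared with a signal list would be expected to give P < 0.01.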
Comparative analysis lists that resulted from single-chip comparisons were
combined into two duplicate lists using the logical AND operator (meaning that a gene
was called regulated only when it was found to be regulated in each of the composing
sub-lists). Noise lists and randomized signal lists combined the same way were used as
controls. Randomization of signal lists was carried out using a pseudorandom number
generator that reassigned a new position to each gene in the list before combining them.
Special-purpose software for replica analysis and t-test analysis, named GenePicker, was
developed by G. Finocchiaro and H. Muller (Finocchiaro, G., Parise, P., Minardi, S.P.,
Alcalay, M. & Muller, H. (2004). Bioinformatics, 20, 3670-2), and is available at
http://www.ifom-firc.it/RESEARCH/Appl_Bioinfo/tools.html.
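The AND combination and the randomized control described above can be sketched as follows, assuming each list is a vector of yes/no calls aligned to the same probe-set order. This is an illustrative sketch and not the GenePicker code; the function names and the `seed` parameter are hypothetical.

```python
import random


def and_combine(*call_lists):
    """A gene is called regulated only if it is regulated in every sub-list."""
    if len({len(calls) for calls in call_lists}) > 1:
        raise ValueError("all lists must have the same length")
    return [all(calls) for calls in zip(*call_lists)]


def randomized_and_combine(*call_lists, seed=None):
    """Control: give each gene a new random position in every list, then AND."""
    rng = random.Random(seed)
    shuffled = []
    for calls in call_lists:
        permuted = list(calls)  # copy, so the input lists are left untouched
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return and_combine(*shuffled)
```

Because shuffling keeps the number of regulated calls in each list but breaks their gene identity, the size of the randomized combination estimates how much overlap would arise by chance.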
Elaboration of results
Lists of regulated genes resulting from the analysis procedure described above were
imported into Access databases for further elaboration. First, regulated probe sets from
the two chips of the HG-U133 set (HG-U133A and HG-U133B) were combined into a single gene list for each experimental sample. Next, the
results deriving from the first and the second scans of each chip (see “Hybridisation
procedures and parameters”) were combined into a single list. Both fold change values
were maintained for reference.
Gene identity was assigned to Affymetrix probe sets using the “Automated Chip
Reannotation tool at IFOM” (http://bio.ifom-firc.it/ARRAY_ANNOT/index.html),
derived from UniGene release Hs.166. Probe sets were then converted into non-redundant
regulated genes, rather than regulated probe sets, using the UniGene ID as unique
identifier. Probe sets whose sequences were not assigned to a UniGene cluster were
further grouped according to Gene Symbol (derived from EMBL or dbEST) or, failing that, according to the
Accession number itself.
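The conversion from probe sets to non-redundant genes can be sketched as a grouping step with a fallback key. The field names (`probe_set_id`, `unigene_id`, `gene_symbol`, `accession`) and the example identifiers are hypothetical placeholders, not actual annotation records.

```python
def gene_key(probe):
    """Pick the grouping key: UniGene ID, else Gene Symbol, else accession."""
    for field in ("unigene_id", "gene_symbol", "accession"):
        value = probe.get(field)
        if value:
            return (field, value)
    raise ValueError("probe set has no usable identifier")


def collapse_to_genes(probes):
    """Group regulated probe sets into non-redundant genes."""
    genes = {}
    for probe in probes:
        genes.setdefault(gene_key(probe), []).append(probe["probe_set_id"])
    return genes


# Hypothetical records for illustration only.
example = [
    {"probe_set_id": "ps_1", "unigene_id": "Hs.X", "accession": "acc_1"},
    {"probe_set_id": "ps_2", "unigene_id": "Hs.X", "accession": "acc_2"},
    {"probe_set_id": "ps_3", "unigene_id": None, "gene_symbol": "GENE_Y", "accession": "acc_3"},
    {"probe_set_id": "ps_4", "unigene_id": None, "gene_symbol": None, "accession": "acc_4"},
]
genes = collapse_to_genes(example)
```

Here the first two probe sets collapse into one gene because they share a UniGene ID, while the remaining two fall back to the Gene Symbol and the accession number, respectively.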
Comparison of experimental strategies: RNA pools versus individual samples
To define the best experimental strategy, we compared the results obtained with the
above described protocol (i.e. experimental replicates followed by pooling of RNA
samples), with results obtained by labeling and hybridizing the three independent RNAs
and performing replica analysis.
Briefly, the experimental design was as follows: the same RNA samples used to
generate the RNA pools representative of the untreated U937-PR and U937-MT cells
described above were used for this test. We labeled the experimental replicates
separately, and hybridized each labeled target to one HG-U133A chip.
A scheme of the experimental strategy is shown below.
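[Scheme: the U937-PR9 replicates (U937-PR9_1, U937-PR9_2, U937-PR9_3) and the U937-MT replicates (U937-MT_1, U937-MT_2, U937-MT_3) were each labelled separately and hybridised to one HG-U133A chip.]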
Comparative analysis was performed as follows:
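[Scheme: U937-PR replicates 1, 2 and 3 were compared with one another, and U937-Mt replicates 1, 2 and 3 with one another, for NOISE; U937-PR replicates were compared with U937-Mt replicates for COMPARISON.]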
We then used the GenePicker software to perform replica analysis and statistical
tests, as described in detail above. We thus obtained 491 regulated genes (210 induced
and 281 repressed). Using RNA pools, we identified 1128 genes regulated on the HG-U133A chip (613 induced and 515 repressed). Of these, 274 were in common between
the two lists (56% of the genes identified using individual replicates and 24% of those
identified using RNA pools). These results suggest that, using the same RNA samples
and identical stringency of analysis, the use of RNA pools increases the number of
identified targets by approximately 2-fold, and are in agreement with previous
observations.
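The percentages and the approximately 2-fold figure quoted above follow directly from the reported gene counts, as this short check shows. The variable names are illustrative.

```python
# Gene counts reported in the text for the HG-U133A chip.
replicate_genes = 210 + 281   # individual replicates: induced + repressed
pooled_genes = 613 + 515      # RNA pools: induced + repressed
shared = 274                  # genes common to both lists

pct_of_replicates = 100 * shared / replicate_genes
pct_of_pools = 100 * shared / pooled_genes
pool_gain = pooled_genes / replicate_genes

print(f"{pct_of_replicates:.0f}% of replicate genes, "
      f"{pct_of_pools:.0f}% of pooled genes, {pool_gain:.1f}-fold more genes with pools")
```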
Considering the high degree of concordance between our microarray data and qPCR
data (Supplementary Table 9 and Alcalay, M., Meani, N., Gelmetti, V., et al (2003). J
Clin Invest, 112, 1751-61), we believe the use of RNA pools increases the sensitivity of
the method and is to be preferred in all cases where the measurement of individual or
technical variability is not relevant to the experimental model.
Primary data can be obtained from ArrayExpress, accession no. E-MEXP-149. All elaborated results are available in the Supplementary Data at the Oncogene
website.