SUPPLEMENTARY METHODS

Experimental Design

We analysed gene expression profiles in two experimental models:

1. APL blasts derived from three patients bearing the t(15;17) and expressing PML/RAR, before and after treatment with 10⁻⁶ M retinoic acid (RA) (Sigma-Aldrich, St. Louis, Missouri, USA) in vitro for four hours;

2. A U937 clone conditionally expressing PML/RAR (U937-PR). In these cells, the cDNA encoding PML/RAR is under the transcriptional control of the zinc (Zn)-inducible mouse metallothionein (Mt) promoter. The U937-PR clone was treated with 100 µM ZnSO4 for 12 hours before 10⁻⁶ M RA was added to the culture for 4 hours. A U937 bulk population containing the empty cloning vector (Mt), also treated with 100 µM ZnSO4 for 12 hours and with 10⁻⁶ M RA for 4 hours, was used as reference. The gene expression profile of the U937-PR clone prior to and after 4 hours of treatment with 10⁻⁶ M RA was analysed and compared to that obtained from the U937-Mt cells.

Samples used, Extract preparation and labelling

Leukemic blasts from peripheral blood were obtained at disease onset (prior to any therapy, and sensitive to RA therapy) from three patients with newly diagnosed APL (AML-M3 according to the FAB classification) who showed ≥ 75% leukemic infiltration. The collected blasts were isolated by centrifugation on a Ficoll-Hypaque gradient as previously described and then treated in vitro for 4 hours with 10⁻⁶ M RA prior to RNA extraction. For the U937-PR and U937-Mt cell lines, three independent vials were thawed, and induction with either ZnSO4 or RA and RNA extraction were performed separately for each. Prior to RNA extraction, a small aliquot of cells was lysed in Laemmli lysis buffer, and all experiments were controlled for PML/RAR fusion protein expression by Western blotting. Total RNA was extracted using TRIzol Reagent (Gibco), followed by clean-up on RNeasy mini/midi columns (RNeasy Mini/Midi Kit, Qiagen).
For each cell line, an RNA pool was obtained by mixing equal quantities of total RNA from each of the three independent RNA extractions. Biotin-labelled cRNA targets were synthesized starting from 5 µg of total RNA. Double-stranded cDNA synthesis was performed with the GIBCO SuperScript Custom cDNA Synthesis Kit, and biotin-labelled antisense RNA was transcribed in vitro using Ambion's In Vitro Transcription System, including Bio-11-UTP and Bio-11-CTP (NEN Life Sciences, PerkinElmer Inc, Boston, Massachusetts, USA) in the reaction. All steps of the labelling protocol were performed as suggested by Affymetrix (http://www.affymetrix.com/support/technical/manual/expression_manual.affx). The size of the targets and the accuracy of their quantitation were checked by agarose gel electrophoresis of 2 µg aliquots, prior to and after fragmentation. After fragmentation, targets were diluted in hybridisation buffer at a concentration of 150 µg/ml. A scheme of the experimental strategy is shown below.

[Scheme of the experimental strategy: the pooled PML/RAR target from U937-PR9 is hybridized to the PML/RAR chips; the pooled control target from U937-MT is hybridized to the control chips.]

Hybridisation procedures and parameters

Hybridization mix for target dilution (100 mM MES, 1 M [Na+], 20 mM EDTA, 0.01% Tween 20) was prepared as indicated by Affymetrix, including pre-mixed biotin-labeled control oligo B2 and bioB, bioC, bioD and cre controls (Affymetrix cat# 900299) at final concentrations of 50 pM, 1.5 pM, 5 pM, 25 pM and 100 pM, respectively. Targets were diluted in hybridization buffer at a concentration of 150 µg/ml and denatured at 99°C prior to introduction into the GeneChip cartridge. Targets were tested for quality by hybridization to Affymetrix Test3 Arrays (cat# 900341). Two copies of the complete GeneChip HG-U133 set (HG-U133A, HG-U133B) were then hybridized with each biotin-labeled target. Hybridizations were performed for 14-16 hours at 45°C in a rotisserie oven.
GeneChip cartridges were washed and stained in the Affymetrix fluidics station following the EukGE-WS2 standard protocol (including Antibody Amplification):

1. Wash 10 cycles of 2 mixes/cycle with Wash Buffer A (6X SSPE, 0.01% Tween 20) at 25°C
2. Wash 4 cycles of 15 mixes/cycle with Wash Buffer B (100 mM MES, 0.1 M [Na+], 0.01% Tween 20) at 50°C
3. Stain the probe array for 10 minutes in SAPE solution (10 µg/mL SAPE in 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/mL BSA) at 25°C
4. Wash 10 cycles of 4 mixes/cycle with Wash Buffer A at 25°C
FIRST SCAN
5. Stain the probe array for 10 minutes in antibody solution (Normal Goat IgG 0.1 mg/mL, Biotinylated antibody 3 µg/mL, 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/mL BSA) at 25°C
6. Stain the probe array for 10 minutes in SAPE solution at 25°C
7. Final Wash 15 cycles of 4 mixes/cycle with Wash Buffer A at 30°C
SECOND SCAN

Images were scanned using an Affymetrix GeneArray Scanner with default parameters. Each chip was scanned twice, to obtain two different images: the first scan was performed after the first SAPE staining procedure (between steps 4 and 5 above), and the second scan was performed after antibody amplification of the signal, at the end of the washing procedure. The resulting images were analysed using Microarray Suite version 5 (MASv5), Affymetrix cat# 690018. Data obtained from the two scans were processed independently, and merged for each sample only at the end of all elaborations.

Measurement data and specifications

"Absolute analysis" was performed for each chip with MASv5 software using default parameters, scaling all images to a value of 500. Report files were extracted for each chip, and the performance of labelled targets was evaluated on the basis of several values (scaling factor, background and noise values, % present calls, average signal value, etc.).
Results derived from APL blasts after RA treatment (sample) were compared to results from the APL blasts prior to RA treatment (reference) by "comparative analysis", using the reference chips as baseline. Each sample chip was compared to both reference chips for identification of regulated genes. Furthermore, duplicate sample chips and duplicate reference chips were compared to each other for calculation of noise (see scheme below, 1). The same procedure was followed for the results derived from U937-PR cells after RA treatment (sample) compared to U937-PR cells prior to RA treatment (reference) (see scheme below, 2); for the U937-Mt cells after RA treatment (sample) compared to the U937-Mt cells prior to RA treatment (reference) (see scheme below, 3); and for the U937-PR cells expressing PML/RAR (sample) compared to U937-Mt cells (reference) (see scheme below, 4).

[Scheme of comparative analysis. In each panel, the duplicate sample chips are compared to the duplicate reference chips (COMPARISON), and duplicate chips are compared to each other (NOISE). 1: APL#1 + RA (chips _1 and _2) versus APL#1 (chips _1 and _2); similarly for patients APL#2 and APL#3. 2: U937-PR RA (chips 1 and 2) versus U937-PR (chips 1 and 2). 3: U937-Mt RA (chips 1 and 2) versus U937-Mt (chips 1 and 2). 4: U937-PR (chips 1 and 2) versus U937-Mt (chips 1 and 2).]

This procedure yielded four comparison files for each sample under analysis. The data thus obtained were then subjected to further elaboration using the DCall-Fold Change analysis procedure.

DCall-Fold Change analysis is performed on Affymetrix comparison files. The Affymetrix "Difference Call" (DCall) gives qualitative information about the status of a Probe Set in the two conditions considered: it indicates whether the expression level of a Probe Set is decreased (D), mildly decreased (MD), increased (I) or mildly increased (MI) in the sample as compared to the reference. "Fold Change" gives the corresponding quantitative information: it is calculated from the Signal Log Ratio of Affymetrix comparison files.
$$FC_i = 2^{SLR_i} \quad \text{if } SLR_i > 0$$

$$FC_i = -\frac{1}{2^{SLR_i}} \quad \text{if } SLR_i < 0$$

where $SLR_i$ is the signal log ratio value for Probe Set $i$.

The expression of a gene represented by a specific Probe Set is considered decreased if, in each comparison file analyzed, it has a DCall of "D" or "MD" and its Fold Change value is lower than a fixed cut-off. Conversely, the expression of a given gene is considered increased if its representative Probe Set, in each comparison file analyzed, has a DCall of "I" or "MI" and its Fold Change value is higher than a fixed cut-off. For the purpose of finding common regulated target genes, the analysis was performed at low stringency, and the fold change cut-off value was set to 1.3 or -1.3.

The t-statistic is well suited for finding differentially expressed genes because it allows the selection of an expression pattern that has maximal difference in mean level of expression between two groups, and minimal variation of expression within each group. A two-sided t-test was performed on Signal values generated by MASv5, considering that group 1, $N(\mu_1, s_1)$, and group 2, $N(\mu_2, s_2)$, follow Gaussian distributions. The parameters of the distributions are unknown, and we assume that the standard deviations are equal.

$$t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}} \sqrt{\frac{n_1 n_2 f}{n_1 + n_2}}$$

Where:
• $\bar{X}_1$, $\bar{X}_2$ are the means of signal values for group 1 and group 2, respectively,
• $n_1$, $n_2$ are the number of signals in group 1 and group 2, respectively,
• $s_1^2$, $s_2^2$ are the variances of the signal values in group 1 and group 2, respectively,
• $f$ is the number of degrees of freedom, $f = n_1 + n_2 - 2$.

The McNemar test was used to determine data quality and cut-off values (see Abell, M.L., Braselton, J.P., and Rafter, J.A., 1999. Statistics with Mathematica. Academic Press). The McNemar test is often used in clinical trials to assess the efficacy of drug treatment versus placebo controls. The test compares lists of "yes/no" values.
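Setting the McNemar test aside for a moment, the DCall-Fold Change rule and the pooled-variance t-statistic defined above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names, the `(dcall, slr)` input layout and the handling of a signal log ratio of exactly zero (which the text does not cover) are all hypothetical choices.

```python
import math
from statistics import mean, variance

FC_CUTOFF = 1.3  # low-stringency cut-off quoted in the text


def fold_change(slr):
    """Signed fold change from a base-2 signal log ratio (SLR)."""
    if slr > 0:
        return 2.0 ** slr
    if slr < 0:
        return -1.0 / (2.0 ** slr)
    # SLR == 0 is not defined in the text; treated here as "no change" (assumption).
    return 1.0


def is_regulated(comparisons, direction, cutoff=FC_CUTOFF):
    """Apply the DCall-Fold Change rule to one probe set.

    comparisons: list of (dcall, slr) pairs, one per comparison file
                 (hypothetical input layout).
    direction:   "up" or "down".
    The probe set is called regulated only if EVERY comparison file agrees.
    """
    if direction == "up":
        return all(d in {"I", "MI"} and fold_change(s) > cutoff
                   for d, s in comparisons)
    if direction == "down":
        return all(d in {"D", "MD"} and fold_change(s) < -cutoff
                   for d, s in comparisons)
    raise ValueError("direction must be 'up' or 'down'")


def pooled_t(group1, group2):
    """Two-sample t-statistic assuming equal standard deviations.

    Returns (t, f), where f = n1 + n2 - 2 is the number of degrees of freedom.
    Each group needs at least two values so that a sample variance exists.
    """
    n1, n2 = len(group1), len(group2)
    f = n1 + n2 - 2
    pooled_ss = (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    t = ((mean(group2) - mean(group1)) / math.sqrt(pooled_ss)
         * math.sqrt(n1 * n2 * f / (n1 + n2)))
    return t, f


if __name__ == "__main__":
    print(fold_change(1.0), fold_change(-1.0))
    print(is_regulated([("I", 0.5), ("MI", 0.6)], "up"))
    print(pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

A two-sided P value would then be read from the t distribution with f degrees of freedom; that step is left out here because it needs a statistics library beyond the Python standard library. The description of the McNemar test resumes below.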
A P value >0.01 indicates lack of efficacy (the lists are equivalent), whereas P values <0.01 suggest the presence of a therapeutic effect. Applied to our data, the lists were obtained from the results of pairwise chip comparisons, where yes means "gene regulated" and no means "gene not regulated". Lists of regulated genes resulting from chip comparisons between two test chips or between two control chips were used to determine the noise level of the experiments (noise lists), whereas chip comparisons between a test and a control chip were used to determine the signal (signal lists or data lists). Three parameters were evaluated:

1. The equivalence of samples and chip performances (noise list vs. noise list, P > 0.01).
2. The presence of differences in transcript levels (noise list vs. signal list, P < 0.01).
3. The reproducibility of measurement of such differences (signal list vs. signal list, P > 0.01).

Comparative analysis lists that resulted from single-chip comparisons were combined into two duplicate lists using the logical AND operator (meaning that a gene was called regulated only when it was found to be regulated in each of the component sub-lists). Noise lists and randomized signal lists combined in the same way were used as controls. Randomization of signal lists was carried out using a pseudorandom number generator that assigned a new position to each gene in the list before the lists were combined. Special-purpose software for replica analysis and t-test analysis, named GenePicker, was developed by G. Finocchiaro and H. Muller (Finocchiaro, G., Parise, P., Minardi, S.P., Alcalay, M. & Muller, H. (2004). Bioinformatics, 20, 3670-2), and is available at http://www.ifom-firc.it/RESEARCH/Appl_Bioinfo/tools.html.

Elaboration of results

Lists of regulated genes resulting from the analysis procedure described above were imported into Access databases for further elaboration.
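Before continuing with the elaboration steps, the statistical checks described in the preceding section (McNemar comparison of yes/no lists, logical AND combination, and randomized controls) can be sketched in Python. This is an illustration only and is not taken from GenePicker: the function names are hypothetical, and the uncorrected chi-square form of the McNemar statistic with one degree of freedom is an assumption, since the text does not say which variant was used.

```python
import math
import random


def mcnemar_p(list_a, list_b):
    """P value of the McNemar test for two paired yes/no lists.

    Uses the uncorrected statistic (b - c)^2 / (b + c) with 1 degree of
    freedom (assumed variant), where b and c count the discordant pairs.
    """
    if len(list_a) != len(list_b):
        raise ValueError("lists must be paired (same length, same gene order)")
    b = sum(1 for x, y in zip(list_a, list_b) if x and not y)
    c = sum(1 for x, y in zip(list_a, list_b) if y and not x)
    if b + c == 0:
        return 1.0  # no discordant pairs: the lists are identical
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def combine_and(*lists):
    """Call a gene regulated only if it is regulated in every sub-list."""
    return [all(calls) for calls in zip(*lists)]


def randomized(calls, rng):
    """Return a copy of a yes/no list with gene positions reassigned at random."""
    shuffled = list(calls)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the control is reproducible
    signal_1 = [True] * 20 + [False] * 80
    signal_2 = [True] * 18 + [False] * 82
    noise = [False] * 100

    duplicate = combine_and(signal_1, signal_2)
    control = combine_and(randomized(signal_1, rng), randomized(signal_2, rng))

    print("noise vs signal P:", mcnemar_p(noise, duplicate))
    print("genes kept by AND:", sum(duplicate))
    print("genes kept after randomization:", sum(control))
```

In this toy input the AND-combined signal list keeps the 18 genes shared by both sub-lists, whereas shuffling the lists first is expected to leave far fewer, which is the contrast the randomized controls are meant to expose.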
First, regulated probe sets from the 2 chips were combined into a single gene list for each experimental sample. Next, the results deriving from the first and the second scans of each chip (see "Hybridisation procedures and parameters") were combined into a single list. Both fold change values were maintained for reference. Gene identity was assigned to Affymetrix probe sets using the "Automated Chip Reannotation tool at IFOM" (http://bio.ifom-firc.it/ARRAY_ANNOT/index.html), derived from UniGene release Hs.166. Regulated probe sets were then converted into non-redundant regulated genes, using the UniGene ID as unique identifier. Probe sets whose sequences are not assigned to a UniGene cluster were further grouped according to Gene Symbol (derived from EMBL or dbEST) or to the Accession number itself.

Comparison of experimental strategies: RNA pools versus individual samples

To define the best experimental strategy, we compared the results obtained with the protocol described above (i.e. experimental replicates followed by pooling of RNA samples) with results obtained by labeling and hybridizing the three independent RNAs and performing replica analysis. Briefly, the experimental design was as follows: the same RNA samples used to generate the RNA pools representative of the untreated U937-PR and U937-Mt cells described above were used for this test. We labeled the experimental replicates separately, and hybridized each labeled target to one HG-U133A chip. A scheme of the experimental strategy is shown below.

[Scheme of the experimental strategy: U937-PR9 replicates U937-PR9_1, U937-PR9_2 and U937-PR9_3, and U937-MT replicates U937-MT_1, U937-MT_2 and U937-MT_3, each labeled and hybridized separately.]

Comparative analysis was performed as follows:

[Scheme of comparative analysis: U937-PR chips 1, 2 and 3 are compared to U937-Mt chips 1, 2 and 3 (COMPARISON); replicate chips within each group are compared to each other (NOISE).]

We then used the GenePicker software to perform replica analysis and statistical tests, as described in detail above. We thus obtained 491 regulated genes (210 induced and 281 repressed).
Using RNA pools, we identified 1128 regulated genes on the HG-U133A chip (613 induced and 515 repressed). Of these, 274 were common to the two lists (56% of the genes identified using individual replicates and 24% of those identified using RNA pools). These results suggest that, using the same RNA samples and identical stringency of analysis, the use of RNA pools increases the number of identified targets by approximately 2-fold; they are in agreement with previous observations. Considering the high degree of concordance between our microarray data and qPCR data (Supplementary Table 9 and Alcalay, M., Meani, N., Gelmetti, V., et al. (2003). J Clin Invest, 112, 1751-61), we believe the use of RNA pools increases the sensitivity of the method and is to be preferred in all cases where the measurement of individual or technical variability is not relevant to the experimental model.

Primary data can be obtained from ArrayExpress, accession no. E-MEXP-149. All elaborated results are available in the Supplementary Data at the Oncogene website.
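The overlap figures quoted in the comparison of RNA pools versus individual samples can be rechecked with a few lines of Python. The gene counts are those reported in the text; the function name and the dictionary keys are hypothetical and serve only this sketch.

```python
def overlap_summary(n_replicates, n_pools, n_common):
    """Percentage overlap between two gene lists and the ratio of their sizes."""
    return {
        "pct_of_replicates": 100.0 * n_common / n_replicates,
        "pct_of_pools": 100.0 * n_common / n_pools,
        "pool_to_replicate_ratio": n_pools / n_replicates,
    }


if __name__ == "__main__":
    # Counts reported in the text (induced + repressed).
    replicates = 210 + 281   # individual replicates
    pools = 613 + 515        # RNA pools
    summary = overlap_summary(replicates, pools, n_common=274)
    for key, value in summary.items():
        print(f"{key}: {value:.1f}")
```

Rounded to whole numbers, the two percentages reproduce the 56% and 24% given above, and the ratio of list sizes (about 2.3) is the basis for the "approximately 2-fold" statement.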