Experiment Design

We analysed gene expression profiles of different U937 cell clones conditionally expressing PML/RAR, AML1/ETO, or PLZF/RAR. In these cells, the different cDNAs are under the transcriptional control of the zinc (Zn)-inducible mouse metallothionein (Mt) promoter. All cell clones were treated with 100 µM ZnSO4 for 8 hours before RNA extraction. We selected cell clones that showed comparable levels of expression of the fusion proteins after induction (see figure 1A). A U937 bulk population containing the empty cloning vector (Mt), also treated with 100 µM ZnSO4 for 8 hours, was used as reference for all samples.

Samples used, extract preparation and labelling

Total RNA was extracted using TRIzol Reagent (Gibco), followed by clean-up on RNeasy mini/midi columns (RNeasy Mini/Midi Kit, Qiagen). For each of the four U937 cell lines (PML/RAR, AML1/ETO, PLZF/RAR and Mt), three independent vials were thawed, and the ZnSO4 induction and RNA extraction were performed separately. Prior to RNA extraction, a small aliquot of cells was lysed in Laemmli lysis buffer, and all experiments were controlled for fusion protein expression by Western blotting. For each cell line, an RNA pool was obtained by mixing equal quantities of total RNA from each of the three independent RNA extractions.

Biotin-labelled cRNA targets were synthesized starting from 10 µg of pooled RNA. Double-stranded cDNA synthesis was performed with the GIBCO SuperScript Custom cDNA Synthesis Kit, and biotin-labelled antisense RNA was transcribed in vitro using Ambion’s In Vitro Transcription System, including Bio-16-UTP and Bio-11-CTP (Enzo) in the reaction. All steps of the labelling protocol were performed as suggested by Affymetrix (http://www.affymetrix.com/support/technical/manual/expression_manual.affx). The size and the accuracy of quantitation of targets were checked by agarose gel electrophoresis of 2 µg aliquots, prior to and after fragmentation (see below).
After fragmentation, targets were diluted in hybridisation buffer at a concentration of 150 µg/ml.

[Figure: agarose gel of cRNA targets before and after fragmentation. MK: marker (9.49 kb, 7.46 kb, 4.4 kb, 2.37 kb, 1.35 kb, 0.24 kb). Lanes:
1- U937-Mt
2- U937-Mt, fragmented
3- U937-PML/RAR
4- U937-PML/RAR, fragmented
5- U937-AML1/ETO
6- U937-AML1/ETO, fragmented
7- U937-PLZF/RAR
8- U937-PLZF/RAR, fragmented]

Due to a globally less efficient performance of the PML/RAR target (see Report files.xls), another triplicate of total RNA samples was extracted from the PML/RAR cell line under the same conditions described above. Biotinylated cRNA targets were then synthesized from the new PML/RAR RNA pool and from the previously obtained reference RNA pool. The entire procedure described below was repeated for the new PML/RAR and reference targets thus obtained, and the resulting data are referred to as 2PR and 2Mt in all analysis tables.

Hybridisation procedures and parameters

Hybridisation mix for target dilution (100 mM MES, 1 M [Na+], 20 mM EDTA, 0.01% Tween 20) was prepared as indicated by Affymetrix, including premixed biotin-labelled control oligo B2 and bioB, bioC, bioD and cre controls (Affymetrix cat# 900299) at a final concentration of 50 pM, 1.5 pM, 5 pM, 25 pM and 100 pM respectively. Targets were diluted in hybridisation buffer at a concentration of 150 µg/ml and denatured at 99°C prior to introduction into the GeneChip cartridge. Targets were tested for quality by hybridisation to Affymetrix Test2 Arrays (cat# 900271, now substituted by Test3 Arrays, cat# 900341). Two copies of the complete HG-U95 chip set (HG-U95Av2, HG-U95B, HG-U95C, HG-U95D, HG-U95E, Affymetrix cat# 900303, 900305, 900307, 900309, 900311) were then hybridised with each biotin-labelled target. Hybridisations were performed for 14-16 hours at 45°C in a rotisserie oven.

GeneChip cartridges were washed and stained in the Affymetrix fluidics station following the EukGE-WS2 standard protocol (including Antibody Amplification):
1. Wash 10 cycles of 2 mixes/cycle with Wash Buffer A (6X SSPE, 0.01% Tween 20) at 25°C
2. Wash 4 cycles of 15 mixes/cycle with Wash Buffer B (100 mM MES, 0.1 M [Na+], 0.01% Tween 20) at 50°C
3. Stain the probe array for 10 minutes in SAPE solution (10 µg/mL SAPE in 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/mL BSA) at 25°C
4. Wash 10 cycles of 4 mixes/cycle with Wash Buffer A at 25°C
FIRST SCAN
5. Stain the probe array for 10 minutes in antibody solution (Normal Goat IgG 0.1 mg/mL, Biotinylated antibody 3 µg/mL, 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/mL BSA) at 25°C
6. Stain the probe array for 10 minutes in SAPE solution at 25°C
7. Final Wash 15 cycles of 4 mixes/cycle with Wash Buffer A at 30°C
SECOND SCAN

Images were scanned using an Affymetrix GeneArray Scanner, using default parameters. Each chip was scanned twice, to obtain two different images: the first scan was performed after the first SAPE staining procedure (between steps 4 and 5 above), and the second scan was performed after antibody amplification of the signal, at the end of the washing procedure. The resulting images were analysed using Microarray Suite version 5 (MASv5), Affymetrix cat# 690018. Data obtained from the two scans was processed independently, and merged for each sample only at the end of all elaborations.

Measurement data and specifications

“Absolute analysis” was performed for each chip with MASv5 software using default parameters, scaling all images to a value of 500. Report files were extracted for each chip, and performance of labelled targets was evaluated on the basis of several values (scaling factor, background and noise values, % present calls, average signal value, etc). Homogeneity within the experiment was also taken into account; for this reason, the PML/RAR sample was repeated in a second experiment, together with a newly synthesized reference (Mt) target. A summary report file for all chips can be found in “Report files.xls”.
Results derived from PML/RAR, AML1/ETO, and PLZF/RAR targets (samples) were also compared to results from the Mt target (reference) by “comparative analysis”, using the reference chips as baseline. Each sample chip was compared to both reference chips for identification of regulated genes. Furthermore, duplicate sample and reference chips were compared to each other for calculation of noise (see scheme below).

[Scheme: COMPARISON FILES are obtained by comparing each duplicate sample chip (U937-PML/RAR 1 and 2, U937-AML1/ETO 1 and 2, U937-PLZF/RAR 1 and 2) with each reference chip (U937-Mt 1 and 2); NOISE FILES are obtained by comparing the two duplicate chips of the same target with each other.]

This procedure yielded four comparison files for each sample under analysis. All raw data deriving from absolute and comparative analyses of both scans for each chip are available as text files in the “Absolute analysis-scan1.zip”, “Absolute analysis-scan2.zip”, “Comparative analysis-scan1.zip” and “Comparative analysis-scan2.zip” directories, respectively.

Data thus obtained was then subjected to further elaboration using two sequential analysis procedures: DCall-Fold Change and T-test analyses.

DCall-Fold Change analysis is performed on Affymetrix comparison files. The Affymetrix “Difference Call” (DCall) corresponds to the qualitative information about the status of a Probe Set in the two conditions considered: it indicates if the expression level of a Probe Set is decreased (D), mildly decreased (MD), increased (I) or mildly increased (MI) in the sample as compared to the reference. “Fold Change” gives the corresponding quantitative information: it is calculated from the Signal Log Ratio of Affymetrix comparison files.

$$ FC_i = 2^{SLR_i} \quad \text{if } SLR_i > 0 $$
$$ FC_i = -\frac{1}{2^{SLR_i}} \quad \text{if } SLR_i < 0 $$

where SLR_i is the Signal Log Ratio value for Probe Set i.

The expression of a gene represented by a specific Probe Set is considered as decreased if, in each comparison file analysed, it has a DCall corresponding to “D” or “MD” and its Fold Change value is lower than a fixed cut-off.
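The conversion from Signal Log Ratio to signed Fold Change defined above can be sketched as follows. This is a minimal illustration and not the software used in the study; the function name `fold_change` is hypothetical, and the handling of SLR = 0, which the definition does not cover, is an assumption.

```python
def fold_change(slr: float) -> float:
    """Convert a Signal Log Ratio (log2 scale) into a signed fold change."""
    if slr > 0:
        return 2.0 ** slr
    if slr < 0:
        # -1 / 2^SLR with SLR < 0 gives a value <= -1 (e.g. SLR = -1 -> -2)
        return -1.0 / (2.0 ** slr)
    # SLR == 0 is not covered by the definition in the text;
    # returning 1.0 (no change) is an assumption of this sketch.
    return 1.0


if __name__ == "__main__":
    for slr in (1.0, -1.0, 2.0, -2.0):
        print(slr, fold_change(slr))
```

With this convention an increase and a decrease of the same magnitude give fold changes of equal absolute value and opposite sign, so a single symmetric cut-off can be applied to both directions.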
Conversely, the expression of a given gene is considered increased if its representative Probe Set, in each comparison file analysed, has a DCall corresponding to “I” or “MI” and its Fold Change value is higher than a fixed cut-off. For the purpose of finding common regulated target genes, the analysis was performed at low stringency, and the fold change cut-off value was set to 1.3 or -1.3.

The t-statistic is well suited for finding differentially expressed genes because it allows the selection of an expression pattern that has maximal difference in mean level of expression between two groups, and minimal variation of expression within each group. A two-sided t-test was performed on Signal values generated by MASv5, considering that group 1, N(μ1, s1), and group 2, N(μ2, s2), follow a Gaussian distribution. Parameters of the distributions are unknown and we assume the identity of standard deviations.

$$ t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}} \sqrt{\frac{n_1 n_2 f}{n_1 + n_2}} $$

where:
\bar{X}_1, \bar{X}_2 are the means of signal values for group 1 and group 2, respectively,
n_1, n_2 are the number of signals in group 1 and group 2, respectively,
s_1^2, s_2^2 are the variances of the signal values in group 1 and group 2, respectively,
f is the number of degrees of freedom, f = n_1 + n_2 - 2.

The McNemar test was used to determine data quality and cut-off values (see Abell, M.L., Braselton, J.P., and Rafter, J.A., 1999. Statistics with Mathematica. Academic Press). The McNemar test is often used in clinical trials to assess the efficacy of drug treatment versus placebo controls. The test compares lists of “yes/no” values. A P value >0.01 indicates lack of efficacy (the lists are equivalent), whereas P values <0.01 suggest the presence of a therapeutic effect. Applied to our data, we used the results from pair-wise chip comparisons to obtain these lists, where yes means “gene regulated” and no means “gene not regulated”. A gene was called regulated when both the DCall-Fold Change analysis and the t-test were positive for that gene.
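The two statistics described above can be sketched as follows. This is an illustration under stated assumptions, not the special-purpose software used in the study: the function names are hypothetical, the t function returns only the statistic and degrees of freedom (no P value), and the McNemar function uses the chi-square form with one degree of freedom and no continuity correction, since the text does not state which variant was applied.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Equal-variance two-sample t statistic; returns (t, f).

    Each group needs at least two values, and the pooled sum of
    squares must be non-zero.
    """
    n1, n2 = len(group1), len(group2)
    f = n1 + n2 - 2  # degrees of freedom
    # statistics.variance is the sample variance (n - 1 denominator)
    ss = (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    t = (mean(group2) - mean(group1)) / math.sqrt(ss)
    t *= math.sqrt(n1 * n2 * f / (n1 + n2))
    return t, f


def mcnemar_p(list_a, list_b):
    """McNemar P value for two paired yes/no lists (assumed variant:
    chi-square with 1 d.f., no continuity correction)."""
    # Discordant pairs: regulated in one list but not in the other
    b = sum(1 for x, y in zip(list_a, list_b) if x and not y)
    c = sum(1 for x, y in zip(list_a, list_b) if y and not x)
    if b + c == 0:
        return 1.0  # identical lists: no evidence of a difference
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    print(pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
    same = [True] * 5 + [False] * 5
    print(mcnemar_p(same, same))
```

In this reading, a P value above 0.01 from `mcnemar_p` would mean the two yes/no lists are treated as equivalent, and a value below 0.01 would mean they differ.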
Lists of regulated genes resulting from chip comparisons between two test chips or between two control chips were used to determine the noise level of the experiments (called noise lists), whereas chip comparisons between a test and a control chip were used to determine the signal (called signal lists or data lists). Three parameters were evaluated:
1. The equivalence of samples and chip performances (noise list vs. noise list, P > 0.01).
2. The presence of differences in transcript levels (noise list vs. signal list, P < 0.01).
3. The reproducibility of measurement of such differences (signal list vs. signal list, P > 0.01).

Comparative analysis lists that resulted from single-chip comparisons were combined into two duplicate lists using the logical AND operator (meaning that a gene was called regulated only when it was found to be regulated in each of the composing sub-lists). Noise lists and randomised signal lists combined the same way were used as controls. Randomisation of signal lists was carried out using a pseudorandom number generator that reassigned a new position to each gene in the list before combining them. Special-purpose software for replica analysis and t-test analysis was developed by Heiko Muller, and is available on request (muller@ifom-firc.it).

Elaboration of results

Lists of regulated genes resulting from the analysis procedure described above were imported into Access Databases for further elaboration. First, regulated probe sets from the 5 chips were combined into a single gene list for each experimental sample. Next, the results deriving from the first and the second scans of each chip (see “Hybridisation procedures and parameters”) were combined into a single list. Both fold change values were maintained for reference. Results of the two experiments with PML/RAR targets were then combined to obtain a single list.
For this sample, the fold changes considered were those relative to the experiment where regulation was found; in the case of genes found regulated in both experiments, or of genes that were not regulated in either experiment, an average value was calculated.

Gene identity was assigned to Affymetrix probe sets using the “Automated Chip Reannotation tool at IFOM” (http://bio.ifom-firc.it/ARRAY_ANNOT/index.html), derived from UniGene release Hs.159. The regulated probe sets thus annotated are visible in the table “1 Regulated ProbeSets.xls”.

These results were then converted into non-redundant regulated genes, rather than regulated probe sets, using the UniGene ID as unique identifier. All probe sets assigned to the same UniGene cluster were considered as redundant, and are represented once in the table “2 Regulated Genes Hs159.xls”. Probe sets that represent sequences not assigned to a UniGene cluster were further grouped according to Gene Symbol (derived from EMBL or dbEST) or to the Accession number itself. The fold change value indicated in the table is the average fold change of all regulated probe sets representing the gene. In the rare cases where probe sets representing the same gene displayed opposite regulations, the results were discarded from further analysis for the purpose of this study. Of the latter, only those probe sets that were concordantly regulated by fusion proteins were maintained in the final list (“3 Common Targets.xls”, see below), but they were discarded from all lists based on gene identity rather than probe set.

The table “3 Common Targets.xls”, the starting point of the results discussed in the manuscript, includes all those genes that are regulated concordantly by at least two fusion proteins, even when the third fusion protein under analysis regulates some of these in the opposite direction.
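The collapsing of probe sets into non-redundant genes and the selection of common targets described above can be sketched as follows. This is a simplified, gene-level illustration, not the Access-based procedure used in the study; the function names and the identifiers in the example (`g1`, `g2`, `g3`) are hypothetical, and the probe-set-level concordance search is not reproduced.

```python
from collections import defaultdict


def collapse_to_genes(probe_sets):
    """probe_sets: iterable of (gene_id, fold_change), one per regulated
    probe set. Returns {gene_id: average fold change}; genes whose probe
    sets show opposite regulation are discarded."""
    by_gene = defaultdict(list)
    for gene_id, fc in probe_sets:
        by_gene[gene_id].append(fc)
    genes = {}
    for gene_id, fcs in by_gene.items():
        if all(fc > 0 for fc in fcs) or all(fc < 0 for fc in fcs):
            genes[gene_id] = sum(fcs) / len(fcs)
    return genes


def common_targets(per_fusion):
    """per_fusion: {fusion_protein: {gene_id: fold_change}}.
    Returns {gene_id: 'induced' | 'repressed'} for genes regulated in the
    same direction by at least two fusion proteins."""
    votes = defaultdict(lambda: [0, 0])  # [times induced, times repressed]
    for genes in per_fusion.values():
        for gene_id, fc in genes.items():
            votes[gene_id][0 if fc > 0 else 1] += 1
    result = {}
    for gene_id, (up, down) in votes.items():
        if up >= 2:
            result[gene_id] = "induced"
        elif down >= 2:
            result[gene_id] = "repressed"
    return result


if __name__ == "__main__":
    probes = [("g1", 2.0), ("g1", 4.0), ("g2", 1.5), ("g2", -1.5), ("g3", -2.0)]
    print(collapse_to_genes(probes))
    lists = {
        "PML/RAR": {"g1": 3.0, "g3": -2.0},
        "AML1/ETO": {"g1": 1.5, "g3": 2.0},
        "PLZF/RAR": {"g1": -1.4, "g3": -1.6},
    }
    print(common_targets(lists))
```

In the example, `g1` counts as induced and `g3` as repressed even though the third fusion protein regulates each in the opposite direction, which mirrors the inclusion rule stated for “3 Common Targets.xls”.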
Common target genes were first identified by searching for concordantly regulated probe sets (1623 probe sets, corresponding to 1409 non-redundant genes). An additional 146 common target genes were identified by searching the table “2 Regulated Genes Hs159.xls”, and therefore derive from the results of different probe sets for the same gene. In particular, of the 1555 genes represented in the table:
50 are induced by all 3 fusion proteins
113 are repressed by all 3 fusion proteins
94 are induced only by PML/RAR and AML1/ETO; of these, 28 are repressed by PLZF/RAR
182 are repressed only by PML/RAR and AML1/ETO; of these, 71 are induced by PLZF/RAR
219 are induced only by PML/RAR and PLZF/RAR; of these, 23 are repressed by AML1/ETO
326 are repressed only by PML/RAR and PLZF/RAR; of these, 49 are induced by AML1/ETO
291 are induced only by PLZF/RAR and AML1/ETO; of these, 40 are repressed by PML/RAR
280 are repressed only by PLZF/RAR and AML1/ETO; of these, 34 are induced by PML/RAR.

In summary, of the 1555 genes considered as common targets of AML fusion proteins, 245 (approximately 16%) were concordantly regulated by two fusion proteins and discordantly regulated by the third.
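The category counts above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers given in the text; the dictionary keys are shorthand labels introduced for this illustration.

```python
# (genes in the category, of which regulated in the opposite direction
#  by the remaining fusion protein)
categories = {
    "induced by all three": (50, 0),
    "repressed by all three": (113, 0),
    "induced by PML/RAR + AML1/ETO only": (94, 28),
    "repressed by PML/RAR + AML1/ETO only": (182, 71),
    "induced by PML/RAR + PLZF/RAR only": (219, 23),
    "repressed by PML/RAR + PLZF/RAR only": (326, 49),
    "induced by PLZF/RAR + AML1/ETO only": (291, 40),
    "repressed by PLZF/RAR + AML1/ETO only": (280, 34),
}

total = sum(n for n, _ in categories.values())
discordant = sum(d for _, d in categories.values())

# prints: 1555 245 15.8%
print(total, discordant, f"{100 * discordant / total:.1f}%")
```

The total also agrees with the two sources of common targets reported above (1409 genes from concordant probe sets plus 146 genes from the gene-level table).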