SUPPLEMENTARY SECTION

a) Hierarchical clustering analysis

Hierarchical clustering analysis of all 81 samples of the study, as well as of specific subgroups of specimens, was used to explore the data obtained by gene expression profiling. Gene expression matrices were filtered to exclude rows with more than 10% missing values, and both genes and experiments were mean-centered. Hierarchical clustering, carried out using the Pearson metric and the average clustering method, revealed the aggregation of cancer specimens from the same patient and of OSE samples. Analysis of gene expression profiles using only tumor samples (excluding all cell line specimens) revealed several distinct clusters, which varied in number of genes and levels of expression (see Figure A).

Hierarchical clustering analysis indicated the following:

Cluster D and clusters N and O displayed opposite gene expression profiles, determining the major split of samples. These three clusters contained a very large number of genes, so that clear functional relationships were not evident. However, several growth factors, morphogens and signal transduction proteins were found in these groups (VEGF, WNT5A and PDGFB in cluster D; WNT4 and BMP4 in cluster N; MADH2, MADH5 and MADH6 in cluster O).

Clusters E and G included numerous genes involved in the host immune response, probably identifying tumor samples containing lymphocytes and immune cells. Cluster E contained several interferon-induced proteins (G1P2, G1P3, MX1, IFIT1, IFI16), while cluster G contained several MHC class II molecules.

Cluster F contained a number of genes previously proposed as EOC markers, such as KRT17, MEIS1, PAX8 and EPHB3 (Welsh et al, 2001; Schaner et al, 2003).

Cluster M was clearly enriched in ECM proteins and contained both FGF2 and FGFR4.
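The filtering, mean-centering and clustering procedure described at the start of this section can be illustrated with a minimal sketch. It assumes a genes-by-samples matrix of log2 ratios in which missing values are coded as None. All function names and the toy matrix are hypothetical, and the pure-Python code is for illustration only; it is not the software used in the study. Only gene centering is shown explicitly, because the Pearson correlation between two samples already removes each sample's own mean.

```python
from itertools import combinations
from math import sqrt

def filter_rows(matrix, max_missing=0.10):
    """Keep genes (rows) whose fraction of missing values (None) is <= max_missing."""
    return [row for row in matrix
            if sum(v is None for v in row) / len(row) <= max_missing]

def mean_center(row):
    """Subtract the mean of the observed values; missing values stay None."""
    observed = [v for v in row if v is not None]
    mu = sum(observed) / len(observed)
    return [None if v is None else v - mu for v in row]

def pearson_distance(x, y):
    """Return 1 - Pearson r, using only positions observed in both vectors."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return 1.0 - sxy / sqrt(sxx * syy)

def average_linkage(vectors):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of indices into `vectors`.
    """
    dist = {frozenset(p): pearson_distance(vectors[p[0]], vectors[p[1]])
            for p in combinations(range(len(vectors)), 2)}

    def linkage(pair):
        # Average of all pairwise distances between the two clusters.
        a, b = pair
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Toy matrix (made-up values): 5 genes x 4 samples.
toy = [
    [1.0, 1.2, -0.8, -1.0],
    [0.9, 1.1, -1.1, -0.9],
    [-1.0, -0.7, 0.9, 1.1],
    [0.5, None, None, None],   # 75% missing, so this gene is removed
    [0.2, 0.4, -0.3, -0.1],
]
kept = [mean_center(row) for row in filter_rows(toy)]
samples = [list(col) for col in zip(*kept)]   # cluster samples (columns)
merges = average_linkage(samples)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy example, samples 0 and 1 group together, as do samples 2 and 3, and the last merge joins the two pairs at a distance above 1 (negative correlation). For real data sets a dedicated library routine would normally be preferred over this quadratic-memory sketch.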
b) Unsupervised class discovery, associated gene list extraction, and GO-terms distribution comparison

Class discovery (unsupervised clustering) was carried out using ISIS v.2.0 software (von Heydebreck et al, 2001). Briefly, the software generates a large set of average gene expression profiles by standard hierarchical clustering and then, for each average profile, checks whether the clustering suggests one or more binary class distinctions of the set of samples. Finally, a statistical score (diagonal linear discriminant, DLD) is calculated for each candidate bipartition; it quantifies how strongly the two classes are separated by the expression levels of a suitable subset of genes. The procedure was carried out using the default ISIS parameters (p=50, poffs=0, and all possible candidate splits=149), and 24 candidate splits were obtained. All 24 partitions found in tumor samples were then compared with the available clinical data.

Because ISIS software does not provide the list of genes responsible for each partition, a two-sample univariate F-test (with randomized variance model and with FDR correction) was used to extract gene lists. A nominal P-value < 0.002 was chosen as the threshold in order to limit the number of recovered genes. The confidence level of false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The procedure was applied to all discovered partitions.

The number of retrieved genes varied among the lists associated with the binary partitions, and GO-terms distribution analysis using EASE software v.2.0 (Hosack et al, 2003) was applied to reveal functional associations. The distribution of GO-terms associated with each gene list was compared to the distribution of GO-terms associated with all genes present on the array.
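The comparison of a gene list's GO-term distribution with that of the whole array can be sketched as a simple over-representation test. The example below is a generic one-sided hypergeometric test with Bonferroni correction, swapped in for illustration; it is not the scoring implemented in EASE. The function names and all counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, array_size, term_on_array, list_size):
    """P(X >= k): chance of seeing at least k annotated genes in a list of
    list_size genes drawn without replacement from array_size genes, of which
    term_on_array carry the GO-term."""
    upper = min(term_on_array, list_size)
    favorable = sum(comb(term_on_array, i) * comb(array_size - term_on_array, list_size - i)
                    for i in range(k, upper + 1))
    return favorable / comb(array_size, list_size)

def enriched_terms(list_counts, array_counts, list_size, array_size, alpha=0.05):
    """Return {term: Bonferroni-adjusted P} for terms over-represented in the list."""
    n_tests = len(list_counts)
    significant = {}
    for term, k in list_counts.items():
        p = hypergeom_sf(k, array_size, array_counts[term], list_size)
        p_adj = min(1.0, p * n_tests)   # Bonferroni correction
        if p_adj < alpha:
            significant[term] = p_adj
    return significant

# Made-up counts: a 75-gene list drawn from a 2000-gene array.
array_counts = {"extracellular matrix": 60, "intracellular": 900}
list_counts = {"extracellular matrix": 15, "intracellular": 30}
result = enriched_terms(list_counts, array_counts, list_size=75, array_size=2000)
print(sorted(result))
```

With these invented numbers, about 2 extracellular matrix genes would be expected by chance, so 15 observed genes pass the corrected threshold, whereas 30 intracellular genes (about 34 expected) do not.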
Up- and downregulated genes within each list were analyzed separately, using only valid UniGene identifiers. Table A summarizes the results. Briefly, this analysis identified three gene lists (associated with ISIS classes 6, 20 and 24, respectively) that were enriched in GO-terms related to the ECM, and one list (associated with ISIS class 15) enriched in genes localized in the intracellular compartment. ECM and its remodeling emerged as the major relevant functional theme in our expression data. The three identified gene lists, as well as the three related partitions, largely overlapped, and ISIS class 20 was chosen for further analysis, since it was the most balanced in terms of number of samples in each group.

c) Gene list description

The ECM/FGF2 signaling-related classifier, as predicted using our dataset, contained genes related to the ECM and to elements functionally related to FGF2 signaling. The functional category related to ECM included genes encoding ECM structural components as well as genes involved in its remodeling and in cell adhesion processes: collagens (COL3A1, COL5A1, COL6A3, COL9A3), the proteoglycans fibulin 2 (FBLN2) and fibromodulin (FMOD), the fibronectin 1 gene (FN1), thrombospondin 2 (THBS2), Lutheran blood group (LU), fibroblast activation protein-alpha (FAP), lysyl oxidase-like 1 (LOXL1), latent transforming growth factor-beta binding protein 2 (LTBP2), SPARC-like 1 (SPARCL1), proteoglycan 1, secretory granule (PRG1), and osteoblast-specific factor 2 (OSF2). Along with FGF2, this classifier contained FGFR4, one clone corresponding to an immature form of FGFR2, as well as other ECM-related genes reported to be regulated by FGF2 in various cellular contexts: OB cadherin (CDH11), lumican (LUM) and biglycan (BGN).
Among the other genes of the top-DLD-classifier, several are associated with the immune/inflammatory response of the host: CXCL2 chemokine (also known as GRO2 oncogene), interleukin 16 (IL16), MHC class II transactivator (MHC2TA), the Fc fragment of IgG receptor and transporter-alpha (FCGRT), Fc fragment of IgG low-affinity IIIa receptor (FCGR3A), Duffy blood group (FY), lymphocyte-specific protein tyrosine kinase (LCK), tumor necrosis factor receptor superfamily member 6 (TNFRSF6) and Rhesus blood group-associated glycoprotein (RHAG). The top-DLD-classifier also included genes involved in transcription regulation, such as GATA binding protein 1 (GATA1), GA binding protein transcription factors alpha subunit (60 kDa) and beta subunit 2 (47 kDa) (GABPA, GABPB2), NGFI-A binding protein 2 (NAB2), AE binding protein 1 (AEBP1) and RING1 and YY1 binding protein (RYBP). Complete gene lists are available from the web sites of IFOM (http://www.ifom.it/) and LNCIB (http://www.lncib.it/).

d) Ascitic cell recovery

Cells present in ascitic fluid were collected by centrifugation, resuspended in RPMI 1640 medium, stratified over a 75-100% Ficoll-Hypaque discontinuous density gradient and centrifuged to harvest tumor-associated lymphocytes and tumor cells. Tumor cells were enriched over the 75% Ficoll density gradient. Contaminating monocytes were removed by plastic adherence for 1 h at 37°C. Purity of ovarian tumor cell populations was determined by flow-cytometric analysis of different tumor markers (Ca125, FR, Herb-B2, EGF-R) and leukocyte differentiation antigens (CD3, CD14, CD16, CD28, CD25).

e) Analysis of TP53

Genomic DNA was extracted from frozen specimens when available (12 cases). In the remaining 30 cases, methylene-blue stained sections from formalin-fixed, paraffin-embedded tissues were microdissected under the microscope to obtain malignant tissues. Genomic DNA was extracted as described (Birindelli et al, 2001).
Samples were screened by PCR-SSCP (single strand conformation polymorphism) (Donghi et al, 1993) or by DG-DGGE (double gradient-denaturing gradient gel electrophoresis) (Gelfi et al, 1997) for the presence of TP53 mutations in the most frequently affected exons (5 through 8) of the gene. Samples with mutations were identified by the presence of one or more new bands or a shift in position compared with a control wild-type cell line and control mutated samples. These cases were subjected to automated DNA sequencing (ABI Prism 377, Applied Biosystems), and each sequence reaction was performed at least twice on both sense and antisense strands.

f) cDNA microarrays

After printing, slides were cross-linked at 45 mJ/cm2 and stored in a desiccator. Before hybridization, all slides were treated with 50% formamide for 2 min at 70°C to remove excess DNA, followed by 1% SDS/H2O for 5 min at room temperature to reduce overall background. Pre-hybridization was performed in UltraHyb hybridization buffer (Ambion, Austin, TX) for 1 h at 42°C. All clones were annotated according to their GenBank accession number or their I.M.A.G.E. clone identifier, using the SOURCE (Diehn et al, 2003) and the IFOM EST Annotation Machine (Guffanti et al, 2002) resources.

g) Target cDNA preparation

The probe labeling reaction was carried out in a final volume of 40 µl (1X first-strand buffer; 0.01 M DTT; 0.1 mM dATP, dGTP, dTTP; 6.25 µM dCTP; 0.33 mM Cy3 or Cy5 dCTP; 1 µCi 32P-dCTP; 20 U RNase inhibitor from human placenta (Roche Applied Science, Indianapolis, IN); 300 U SuperScript II reverse transcriptase (Life Technologies, Fredrick, MA)). Samples were incubated at 42°C for 2 h and the reactions were stopped by addition of 4 µl of 0.5 M EDTA, pH 8. The starting RNA template was removed by alkali hydrolysis, adding 4 µl of 0.5 M NaOH, followed by incubation at 70°C for 15 min and finally neutralization with 4 µl of 0.5 M HCl.
Unincorporated nucleotides were removed from labeled probes using Microcon YM-50 columns (Millipore, Bedford, MA).

h) Slide scanning and image analysis

cDNA microarrays were scanned using the GenePix 4000A microarray scanner at a resolution of 10 µm and analyzed using GenePix Pro v.3.0 software (Axon Instruments, Union City, CA). Image scanning and acquisition processes were carried out as follows:
1. The Cy5 laser photomultiplier voltage (PMT) of each sample was modulated according to the fixed Cy3 PMT value of the reference to obtain a scatter plot of the fluorescent ratios with a regression ratio of 1, thereby indicating a balance between the two channels.
2. Removal of poor quality spots ("flagging") was carried out both automatically using GenePix Pro software and manually. Such spots were excluded from further analyses.

i) Raw data filtering, normalization and gene expression matrix construction

The GenePix Pro GPR raw data files containing the spot intensities were processed using the GenePix post-processing program GP3 (Fielden et al, 2002). The GPR file associated with each sample was processed to correct, filter and normalize the data in order to obtain reliable Cy3 and Cy5 ratios for each cDNA target. Raw data filtering procedures to remove low-quality data included:
1. Removal of failed PCR clones;
2. Removal of flagged spots identified during scanning;
3. Removal of spots showing either of the following:
   a. negative or saturated local background-corrected signal intensities in both channels;
   b. median signal intensity less than the median local background plus two standard deviations of the median local background;
4. Removal of clones showing only a single valid measurement in the three replicates.

After the filtering procedures, each slide was normalized to balance the Cy3 and Cy5 channels using a global trimmed mean obtained by eliminating the upper and lower 5% of the data. This value was subtracted from each data point.
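The global trimmed-mean normalization described above can be sketched as follows. The sketch assumes that the data points of a slide are log2 Cy5/Cy3 ratios, so that subtracting the trimmed mean centers the slide at zero; the function names and the example values are hypothetical, and this is not the GP3 implementation.

```python
def trimmed_mean(values, trim=0.05):
    """Mean of the values left after discarding the lowest and highest
    `trim` fraction of the data (5% from each end by default)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)          # number of points cut at each end
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)

def normalize_slide(log_ratios, trim=0.05):
    """Subtract the slide's global trimmed mean from every data point."""
    center = trimmed_mean(log_ratios, trim)
    return [v - center for v in log_ratios]

# Made-up slide: 18 spots with a dye bias of 0.5 plus two extreme outliers.
slide = [-8.0] + [0.5] * 18 + [9.0]
normalized = normalize_slide(slide)
print(trimmed_mean(slide))   # the outliers are ignored when estimating the bias
print(normalized[1])         # a typical spot after normalization
```

Because the extreme 5% at each end is discarded before averaging, a handful of saturated or near-background spots does not shift the estimated dye bias.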
The final output of each sample contained Cy3 and Cy5 local background-corrected signals, Cy5 to Cy3 ratios (log2) and a flag tag for each corresponding clone. The final gene expression matrix was obtained by collating log2 ratios and flag data from the 81 experiments. In all subsequent analyses, control spike genes were removed and a maximum of 10% of missing (invalid) values was allowed. The nearest neighbor method was chosen to estimate missing values. Mean centering was applied to all genes to standardize the dataset. The Pearson metric and average or complete linkage methods were used in hierarchical clustering. The gene expression matrix used for the ISIS class discovery procedure comprised 39 specimens of primary tumors from advanced disease (with the exclusion of the 2 clear cell cases) and was further filtered to about 2000 genes to reduce the noise, as suggested by the software authors (von Heydebreck et al, 2001), by keeping only genes showing a row standard deviation greater than 0.27.

j) Supervised learning

Supervised learning methods implemented in BRB-ArrayTools were used to select the best discriminating genes associated with the known phenotypes. Standard class comparison was done using a two-sample univariate F-test (with randomized variance model and with FDR correction, with a nominal P-value < 0.0025). The confidence level of false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The estimated probability of identifying at least 113 genes as significant (P < 0.0025) by chance, when no real differences exist between the classes, was 0.00606. All permutation tests used at least 1000 permutations.

k) Immunohistochemistry

Tumor sections (1-2 µm) were serially cut from formalin-fixed, paraffin-embedded tissue, mounted on poly-L-lysine (Sigma, St. Louis, MO)-coated slides, deparaffinized in xylene and hydrated in graded alcohols.
Endogenous peroxidase activity was inhibited by treating sections with 0.3% hydrogen peroxide in methanol for 30 min. Slides were washed three times in 0.05 M PBS-0.1% Triton, incubated with normal goat or human albumin diluted 1:50 and 1:100, respectively, in 1% PBS-0.1% BSA-sodium azide and incubated overnight with the following primary antibodies:

Antibody | Supplier | Epitope retrieval | Dilution (a) | Controls
---|---|---|---|---
FGF2 (147) sc-79, pAb | Santa Cruz | 6-min autoclave in citrate buffer (pH 6) + 5-min in protease XIV 2% in TBS-EDTA | 1:50 | Pos: pancreas; Neg: blocking peptide
FGFR4 (c16) sc-124, pAb | Santa Cruz | 6-min autoclave in citrate buffer (pH 6) | 1:200 | Pos: neuroendocrine carcinoma; Neg: blocking peptide
FN1 A 0245, pAb | Dako | 6-min autoclave in EDTA (pH 8) + 5-min in protease XIV 2% in TBS-EDTA | 1:800 | Pos: myofibroblastic sarcoma

(a) Antibodies were diluted in 1% PBS-0.1% BSA-sodium azide.

TABLE A. ISIS classes, associated gene lists and GO-terms distribution analysis.

ISIS class (a) | Group 0 (b) | Group 1 (b) | DLD score (c) | No. genes P<0.002 (d) | No. genes FP<10 genes (e) | No. genes FP<10% (f) | Probability (g) | UP group 0 (h) | UP group 1 (h) | GO-terms Bonferroni P<0.05 (i) | GO-term (j)
---|---|---|---|---|---|---|---|---|---|---|---
1 | 24 | 15 | 16.82 | 100 | 79 | 47 | 0 | 47 | 53 | NO |
2 | 10 | 29 | 15.2 | 25 | 20 | 3 | 0.051 | 24 | 1 | NO |
3 | 32 | 7 | 13.82 | 35 | 20 | 3 | 0.037 | 14 | 21 | NO |
4 | 12 | 27 | 13.46 | 298 | 263 | 482 | 0 | | | NO |
5 | 10 | 29 | 13.25 | 314 | 239 | 514 | 0 | | | NO |
6 | 12 | 27 | 13.17 | 181 | 147 | 196 | 0.001 | | | YES | ECM
7 | 8 | 31 | 13.07 | 92 | 59 | 33 | 0.005 | | | NO |
8 | 4 | 35 | 13.02 | 86 | 50 | 7 | 0.017 | | | NO |
9 | 7 | 32 | 12.99 | 159 | 115 | 118 | 0 | | | NO |
10 | 14 | 25 | 12.98 | 105 | 99 | 87 | 0.001 | | | NO |
11 | 5 | 34 | 12.97 | 38 | 21 | 3 | 0.06 | 33 | 5 | NO |
12 | 12 | 27 | 12.92 | 36 | 30 | 5 | 0.016 | 30 | 6 | NO |
13 | 9 | 30 | 12.64 | 9 | 10 | 1 | 0.23 | 7 | 3 | NO |
14 | 4 | 35 | 12.62 | 61 | 26 | 5 | 0.038 | | | NO |
15 | 10 | 29 | 12.46 | 301 | 250 | 478 | 0 | | | YES | Intracellular
16 | 33 | 6 | 12.41 | 16 | 11 | 3 | 0.154 | 2 | 14 | NO |
17 | 5 | 34 | 12.26 | 450 | 281 | 605 | 0 | | | NO |
18 | 9 | 30 | 12.07 | 90 | 72 | 38 | 0.002 | | | NO |
19 | 9 | 30 | 12.00 | 266 | 239 | 323 | 0 | | | NO |
20 | 21 | 18 | 11.63 | 75 | 67 | 57 | 0.003 | 10 | 65 | YES | ECM
21 | 33 | 6 | 11.42 | 70 | 38 | 6 | 0.012 | 9 | 61 | NO |
22 | 6 | 33 | 11.2 | 79 | 56 | 27 | 0.009 | 60 | 19 | NO |
23 | 9 | 30 | 10.95 | 106 | 79 | 56 | 0.003 | 77 | 29 | YES | ECM
24 | 6 | 33 | 9.493 | 31 | 16 | 5 | 0.062 | 29 | 2 | NO |

(a) Binary partitions obtained by analysis of 39 advanced EOC by automated class discovery with ISIS software (von Heydebreck et al, 2001).
(b) Number of samples within each group of samples.
(c) DLD score associated with each binary partition discovered with ISIS.
(d) Number of genes significant at P < 0.002 in univariate F-test obtained using BRB-ArrayTools (Radmacher et al, 2002; McShane et al, 2002).
(e) Number of genes containing less than 10 false positives with a 90% interval of confidence.
(f) Number of genes containing less than 10% false positives with a 90% interval of confidence.
(g) Probability that the predicted number of genes is significant (at P < 0.002) by chance and not indicative of differences between the classes.
(h) Number of upregulated genes in group 0 or group 1 of tumor samples.
(i) ISIS classes with associated gene lists significantly (P < 0.05 using Bonferroni correction) enriched in specific GO-terms in comparison to the distribution of GO-terms of all genes present on the array, as assessed using EASE software v.2.0 (Hosack et al, 2003).
(j) Significant retrieved GO-terms.

FIGURE A.
Hierarchical clustering of all analyzed samples except the cell lines. Clustering was done using 3309 genes, using the Pearson metric and the average clustering method. Expression levels are relative to a common reference, obtained by pooling RNA from ten human cell lines. Increased (orange) or decreased (blue) expression of the genes is shown for each sample. Major clusters and several of the genes they contain are shown in boxes A-O.

REFERENCES

Birindelli S, Perrone F, Oggionni M, Lavarino C, Pasini B, Vergani B, Ranzani GN, Pierotti MA and Pilotti S. (2001). Lab Invest, 81, 833-844.
Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, Hernandez-Boussard T, Rees CA, Cherry JM, Botstein D, Brown PO and Alizadeh AA. (2003). Nucleic Acids Res, 31, 219-223.
Donghi R, Longoni A, Pilotti S, Michieli P, Della Porta G and Pierotti MA. (1993). J Clin Invest, 91, 1753-1760.
Fielden MR, Halgren RG, Dere E and Zacharewski TR. (2002). Bioinformatics, 18, 771-773.
Gelfi C, Righetti SC, Zunino F, Della TG, Pierotti MA and Righetti PG. (1997). Electrophoresis, 18, 2921-2927.
Guffanti A, Reid JF, Alcalay M and Simon G. (2002). Trends Genet, 18, 589-592.
Hosack DA, Dennis G, Jr., Sherman BT, Lane HC and Lempicki RA. (2003). Genome Biol, 4, R70.
McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC and Simon R. (2002). Bioinformatics, 18, 1462-1469.
Radmacher MD, McShane LM and Simon R. (2002). J Comput Biol, 9, 505-511.
Schaner ME, Ross DT, Ciaravino G, Sorlie T, Troyanskaya O, Diehn M, Wang YC, Duran GE, Sikic TL, Caldeira S, Skomedal H, Tu IP, Hernandez-Boussard T, Johnson SW, O'Dwyer PJ, Fero MJ, Kristensen GB, Borresen-Dale AL, Hastie T, Tibshirani R, van de RM, Teng NN, Longacre TA, Botstein D, Brown PO and Sikic BI. (2003). Mol Biol Cell, 14, 4376-4386.
von Heydebreck A, Huber W, Poustka A and Vingron M. (2001). Bioinformatics, 17, S107-S114.
Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling CA, Monk BJ, Lockhart DJ, Burger RA and Hampton GM. (2001). Proc Natl Acad Sci U S A, 98, 1176-1181.