SUPPLEMENTARY SECTION
a) Hierarchical clustering analysis
Hierarchical clustering analysis of all 81 samples in the study, as well as of specific subgroups of specimens, was used to explore the data obtained by gene expression profiling. Gene expression matrices were filtered to exclude rows with more than 10% missing values, and both genes and experiments were mean-centered. Hierarchical clustering, carried out using the Pearson metric and the average linkage method, revealed the aggregation of cancer specimens from the same patient and of OSE samples. Analysis of gene expression profiles using only tumor samples (excluding all cell line specimens) revealed several distinct clusters, which varied in number of genes and levels of expression (see Figure A).
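The clustering step can be illustrated with a minimal, self-contained sketch of agglomerative clustering using Pearson correlation distance (1 - r) and average linkage. It is a naive implementation intended only for small examples, not the software used in the study; the function names are hypothetical.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 1.0 - num / den if den > 0 else 1.0


def average_linkage(profiles):
    """Agglomerative clustering of profiles (e.g. one list of values per sample).

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    n = len(profiles)
    base = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = {i: [i] for i in range(n)}
    # Distances between current clusters, keyed by (smaller_id, larger_id).
    dist = {(i, j): base[i][j] for i in range(n) for j in range(i + 1, n)}
    merges, next_id = [], n
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], d))
        merged = clusters.pop(a) + clusters.pop(b)
        # Forget distances involving the two clusters that were just merged.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c, members in clusters.items():
            # Average linkage: mean of all pairwise member-to-member distances.
            total = sum(base[p][q] for p in merged for q in members)
            dist[(c, next_id)] = total / (len(merged) * len(members))
        clusters[next_id] = merged
        next_id += 1
    return merges
```

Perfectly correlated samples merge first at distance 0, and anti-correlated groups join last at distance 2.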
Hierarchical clustering analysis indicated the following:

Cluster D and clusters N and O displayed opposite gene expression profiles, determining the major split of samples. These three clusters contained a very large number of genes, so clear functional relationships were not evident. However, several growth factors, morphogens and signal transduction proteins were found in these clusters (VEGF, WNT5A and PDGFB in cluster D; WNT4 and BMP4 in cluster N; and MADH2, MADH5 and MADH6 in cluster O).

Clusters E and G included numerous genes involved in the host immune response, probably identifying tumor samples containing lymphocytes and other immune cells. Cluster E contained several interferon-induced proteins (G1P2, G1P3, MX1, IFIT1, IFI16), while cluster G contained several MHC class II molecules.

Cluster F contained a number of genes previously proposed as EOC markers, such
as KRT17, MEIS1, PAX8 and EPHB3 (Welsh et al, 2001; Schaner et al, 2003).

Cluster M was clearly enriched in ECM proteins and contained both FGF2 and
FGFR4.
b) Unsupervised class discovery, associated gene list extraction, and GO-terms distribution comparison
Class discovery (unsupervised clustering) was carried out using ISIS v.2.0 software (von Heydebreck et al, 2001). Briefly, the software generates a large set of average gene expression profiles by standard hierarchical clustering and then, for each average profile, checks whether the clustering suggests one or more binary class distinctions within the set of samples. Finally, a statistical score (diagonal linear discriminant, DLD) is calculated for each candidate bipartition, quantifying how strongly the two classes are separated by the expression levels of a suitable subset of genes. The procedure was carried out using the default ISIS parameters (p=50, poffs=0, and all possible candidate splits=149), and 24 candidate splits were obtained. All 24 partitions identified in the tumor samples were further compared with the available clinical data.
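To make the idea of a DLD separation score concrete, here is a simplified sketch: for a candidate bipartition, keep the p genes with the largest absolute two-sample t statistics, project every sample onto the diagonal linear discriminant direction (each gene weighted by its class mean difference divided by its pooled within-class variance), and report the standardized distance between the two classes along that direction. This is an illustration of the concept under stated assumptions, not the exact score computed by ISIS (for example, the poffs parameter is not modeled); the function names are hypothetical, and each class needs at least two samples.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled within-class variance of two samples (each needs >= 2 values)."""
    return ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)


def dld_separation_score(X, labels, p=50):
    """Simplified DLD-style separation score for one bipartition (hypothetical).

    X: list of gene rows, each a list of expression values (one per sample).
    labels: 0/1 class label for each sample.
    """
    g0 = [i for i, lab in enumerate(labels) if lab == 0]
    g1 = [i for i, lab in enumerate(labels) if lab == 1]
    scored = []
    for row in X:
        a, b = [row[i] for i in g0], [row[i] for i in g1]
        var_g = pooled_variance(a, b)
        diff = mean(b) - mean(a)
        t = diff / math.sqrt(var_g * (1 / len(a) + 1 / len(b))) if var_g > 0 else 0.0
        scored.append((abs(t), diff, var_g, row))
    # Keep the p genes that individually separate the two classes best.
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:p]
    # DLD projection: weight each selected gene by diff / variance.
    proj = [sum(diff / var_g * row[i] for _, diff, var_g, row in top if var_g > 0)
            for i in range(len(labels))]
    pa, pb = [proj[i] for i in g0], [proj[i] for i in g1]
    spread = pooled_variance(pa, pb)
    # Standardized class separation along the discriminant direction.
    return (mean(pb) - mean(pa)) / math.sqrt(spread) if spread > 0 else float("inf")
```

A bipartition that follows a real expression difference scores far higher than an arbitrary split of the same samples, which is the property used to rank candidate splits.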
Because the ISIS software does not provide the list of genes responsible for each partition, a two-sample univariate F-test (with a randomized variance model and FDR correction) was used to extract gene lists. A nominal P-value < 0.002 was chosen as the threshold in order to limit the number of recovered genes. The confidence level of the false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The procedure was applied to all discovered partitions.
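The gene-list extraction step can be illustrated with the sketch below, which computes a two-group F statistic for each gene (for two groups this equals the square of the pooled-variance t statistic), estimates a per-gene P-value by permuting the class labels, and keeps genes below the nominal threshold. It deliberately omits the randomized variance model and the multivariate FDR control applied in BRB-ArrayTools, so it is a simplified stand-in rather than the procedure actually used; all function names are hypothetical.

```python
import random


def _avg(xs):
    return sum(xs) / len(xs)


def f_statistic(values, labels):
    """One-way ANOVA F statistic for two groups (df = 1 and n - 2)."""
    groups = [[v for v, lab in zip(values, labels) if lab == k] for k in (0, 1)]
    grand = _avg(values)
    means = [_avg(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return ss_between / (ss_within / (len(values) - 2))


def permutation_pvalues(X, labels, n_perm=1000, seed=0):
    """Per-gene P-values from label permutations, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = [f_statistic(row, labels) for row in X]
    exceed = [0] * len(X)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        for g, row in enumerate(X):
            if f_statistic(row, perm) >= observed[g]:
                exceed[g] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


def select_genes(X, labels, alpha=0.002, n_perm=1000, seed=0):
    """Indices of genes whose permutation P-value is below the nominal alpha."""
    pvals = permutation_pvalues(X, labels, n_perm, seed)
    return [g for g, p in enumerate(pvals) if p < alpha]
```

With this correction the smallest attainable P-value is 1/(n_perm + 1), so at least about 500 permutations are needed before any gene can pass a 0.002 threshold.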
The number of retrieved genes varied among the lists associated with the different binary partitions, and GO-terms distribution analysis using EASE software v.2.0 (Hosack et al, 2003) was applied to reveal functional associations. The distribution of GO-terms associated with each gene list was compared with the distribution of GO-terms associated with all genes present on the array. Up- and downregulated genes within each list were analyzed separately, using only valid UniGene identifiers. Table A summarizes the results. Briefly, this analysis identified three gene lists (associated with ISIS classes 6, 20 and 24, respectively) enriched in GO-terms related to the ECM, and one list (associated with ISIS class 15) enriched in genes localized in the intracellular compartment. The ECM and its remodeling emerged as the major functional theme in our expression data. The three ECM-related gene lists, as well as the three corresponding partitions, largely overlapped; ISIS class 20 was chosen for further analysis because it was the most balanced in terms of number of samples in each group.
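The GO-term comparison relies on an over-representation test. The sketch below computes a one-tailed Fisher exact (hypergeometric upper-tail) probability for one GO category, an EASE-style penalized version in which one list gene is removed from the category count before testing (our reading of the EASE score of Hosack et al, 2003; here the penalty is applied simply as k - 1 with all other counts unchanged, which may differ from EASE's exact bookkeeping), and a Bonferroni adjustment across categories. Function names are hypothetical.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """P(X >= k), X ~ Hypergeometric(N genes on array, K in category, n in list)."""
    total = comb(N, n)
    hits = sum(comb(K, x) * comb(N - K, n - x)
               for x in range(max(k, 0), min(n, K) + 1))
    return hits / total


def ease_score(k, n, K, N):
    """EASE-style conservative score: Fisher test after removing one category hit."""
    return fisher_upper_tail(k - 1, n, K, N)


def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capping the result at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

For example, with 10 genes on an array, 5 of them in a category, and a list of 5 genes that all fall in that category, the Fisher probability is 1/252, whereas the penalized score is 26/252, so categories supported by very few genes are discounted.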
c) Gene list description
The ECM/FGF2 signaling-related classifier predicted from our dataset contained genes related to the ECM and to elements functionally related to FGF2 signaling. The ECM-related functional category included genes encoding ECM structural components as well as genes involved in ECM remodeling and in cell adhesion: collagens (COL3A1, COL5A1, COL6A3, COL9A3), the proteoglycans fibulin 2 (FBLN2) and fibromodulin (FMOD), fibronectin 1 (FN1), thrombospondin 2 (THBS2), Lutheran blood group (LU), fibroblast activation protein alpha (FAP), lysyl oxidase-like 1 (LOXL1), latent transforming growth factor beta binding protein 2 (LTBP2), SPARC-like 1 (SPARCL1), proteoglycan 1, secretory granule (PRG1) and osteoblast-specific factor 2 (OSF2). Along with FGF2, this classifier contained FGFR4, one clone corresponding to an immature form of FGFR2, and other ECM-related genes reported to be regulated by FGF2 in various cellular contexts: OB-cadherin (CDH11), lumican (LUM) and biglycan (BGN).
Among the other genes of the top-DLD classifier, several are associated with the immune/inflammatory response of the host: the chemokine CXCL2 (also known as GRO2 oncogene), interleukin 16 (IL16), MHC class II transactivator (MHC2TA), Fc fragment of IgG receptor and transporter alpha (FCGRT), Fc fragment of IgG low-affinity IIIa receptor (FCGR3A), Duffy blood group (FY), lymphocyte-specific protein tyrosine kinase (LCK), tumor necrosis factor receptor superfamily member 6 (TNFRSF6) and Rhesus blood group-associated glycoprotein (RHAG). The top-DLD classifier also included genes involved in transcriptional regulation, such as GATA binding protein 1 (GATA1), GA binding protein transcription factor alpha subunit (60 kDa) and beta subunit 2 (47 kDa) (GABPA, GABPB2), NGFI-A binding protein 2 (NAB2), AE binding protein 1 (AEBP1) and RING1 and YY1 binding protein (RYBP).
Complete gene lists are available from the web sites of IFOM (http://www.ifom.it/) and
LNCIB (http://www.lncib.it/).
d) Ascitic cell recovery
Cells present in ascitic fluid were collected by centrifugation, resuspended in RPMI 1640 medium, layered over a 75-100% Ficoll-Hypaque discontinuous density gradient and centrifuged to harvest tumor-associated lymphocytes and tumor cells. Tumor cells were enriched over the 75% Ficoll layer. Contaminating monocytes were removed by plastic adherence for 1 h at 37°C. The purity of ovarian tumor cell populations was determined by flow-cytometric analysis of different tumor markers (CA125, FR, Herb-B2, EGF-R) and leukocyte differentiation antigens (CD3, CD14, CD16, CD28, CD25).
e) Analysis of TP53
Genomic DNA was extracted from frozen specimens when available (12 cases). In the remaining 30 cases, methylene blue-stained sections from formalin-fixed, paraffin-embedded tissues were microdissected under the microscope to obtain malignant tissue. Genomic DNA was extracted as described (Birindelli et al, 2001). Samples were screened for TP53 mutations in the most frequently affected exons of the gene (5 through 8) by PCR-SSCP (single-strand conformation polymorphism) (Donghi et al, 1993) or by DG-DGGE (double gradient denaturing gradient gel electrophoresis) (Gelfi et al, 1997). Samples with mutations were identified by the presence of one or more new bands, or a band shift, compared with a wild-type control cell line and mutated control samples. These cases were subjected to automated DNA sequencing (ABI Prism 377, Applied Biosystems), and each sequencing reaction was performed at least twice on both the sense and antisense strands.
f) cDNA microarrays
After printing, slides were cross-linked at 45 mJ/cm² and stored in a desiccator. Before hybridization, all slides were treated with 50% formamide for 2 min at 70°C to remove excess DNA, followed by 1% SDS in H2O for 5 min at room temperature to reduce overall background. Pre-hybridization was performed in UltraHyb hybridization buffer (Ambion, Austin, TX) for 1 h at 42°C. All clones were annotated according to their GenBank accession number or I.M.A.G.E. clone identifier, using the SOURCE (Diehn et al, 2003) and IFOM EST Annotation Machine (Guffanti et al, 2002) resources.
g) Target cDNA preparation
The probe labeling reaction was carried out in a final volume of 40 µl containing 1X first-strand buffer, 0.01 M DTT, 0.1 mM each of dATP, dGTP and dTTP, 6.25 µM dCTP, 0.33 mM Cy3- or Cy5-dCTP, 1 µCi [32P]-dCTP, 20 U RNase inhibitor from human placenta (Roche Applied Science, Indianapolis, IN) and 300 U SuperScript II reverse transcriptase (Life Technologies, Fredrick, MA). Samples were incubated at 42°C for 2 h, and the reactions were stopped by adding 4 µl of 0.5 M EDTA, pH 8. The starting RNA template was removed by alkaline hydrolysis with 4 µl of 0.5 M NaOH and incubation at 70°C for 15 min, followed by neutralization with 4 µl of 0.5 M HCl. Unincorporated nucleotides were removed from the labeled probes using Microcon YM-50 columns (Millipore, Bedford, MA).
h) Slide scanning and image analysis
cDNA microarrays were scanned using a GenePix 4000A microarray scanner at a resolution of 10 µm and analyzed using GenePix Pro v.3.0 software (Axon Instruments, Union City, CA). Image scanning and acquisition were carried out as follows:
1. The Cy5 photomultiplier tube (PMT) voltage for each sample was adjusted relative to the fixed Cy3 PMT voltage of the reference to obtain a scatter plot of the fluorescence ratios with a regression ratio of 1, indicating balance between the two channels.
2. Poor-quality spots were removed ("flagged") both automatically, using GenePix Pro software, and manually. Such spots were excluded from further analyses.
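The channel-balancing criterion in step 1 can be expressed as a least-squares slope of Cy5 against Cy3 feature intensities, with a slope close to 1 taken as balanced. This is only a sketch of the idea: GenePix computes its own regression ratio from pixel intensities, and the tolerance below is a hypothetical choice.

```python
def regression_slope(cy3, cy5):
    """Ordinary least-squares slope of Cy5 intensities regressed on Cy3 intensities."""
    n = len(cy3)
    m3, m5 = sum(cy3) / n, sum(cy5) / n
    sxy = sum((a - m3) * (b - m5) for a, b in zip(cy3, cy5))
    sxx = sum((a - m3) ** 2 for a in cy3)
    return sxy / sxx


def channels_balanced(cy3, cy5, tol=0.1):
    """True if the regression slope lies within tol of 1 (tol is hypothetical)."""
    return abs(regression_slope(cy3, cy5) - 1.0) <= tol
```

In this framing, the Cy5 PMT voltage would be raised or lowered and the slide rescanned until the check passes, while the Cy3 setting stays fixed.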
i) Raw data filtering, normalization and gene expression matrix construction
The GenePix Pro GPR raw data files containing the spot intensities were processed using the GenePix post-processing program GP3 (Fielden et al, 2002). The GPR file of each sample was processed to correct, filter and normalize the data in order to obtain reliable Cy3 and Cy5 ratios for each cDNA target.
Raw data filtering procedures to remove low-quality data included:
1. Removal of failed PCR clones;
2. Removal of flagged spots identified during scanning;
3. Removal of spots meeting either of the following criteria:
a. negative or saturated local background-corrected signal intensities in both channels;
b. median signal intensity lower than the median local background plus two standard deviations of the median local background;
4. Removal of clones with only one valid measurement among the three replicates.
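The filtering rules above can be sketched as follows. The record layout, the 16-bit saturation ceiling of 65535 and the choice to apply criterion b only when both channels fail (mirroring criterion a) are assumptions made for illustration; GP3 itself may implement these rules differently.

```python
from dataclasses import dataclass

SATURATION = 65535  # assumed ceiling of a 16-bit scanner image


@dataclass
class Channel:
    median: float     # median foreground intensity
    bg_median: float  # median local background
    bg_sd: float      # standard deviation of the local background

    @property
    def corrected(self):
        return self.median - self.bg_median


@dataclass
class Spot:
    clone_id: str
    pcr_failed: bool
    flagged: bool
    cy3: Channel
    cy5: Channel


def spot_is_valid(s):
    """Apply rules 1-3 to a single spot."""
    if s.pcr_failed or s.flagged:
        return False

    def neg_or_sat(c):  # criterion a
        return c.corrected < 0 or c.median >= SATURATION

    def below_bg(c):  # criterion b
        return c.median < c.bg_median + 2 * c.bg_sd

    if neg_or_sat(s.cy3) and neg_or_sat(s.cy5):
        return False
    if below_bg(s.cy3) and below_bg(s.cy5):
        return False
    return True


def valid_clones(spots, min_valid=2):
    """Rule 4: keep clones with at least two valid replicate spots."""
    counts = {}
    for s in spots:
        counts.setdefault(s.clone_id, 0)
        if spot_is_valid(s):
            counts[s.clone_id] += 1
    return {c for c, n in counts.items() if n >= min_valid}
```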
After filtering, each slide was normalized to balance the Cy3 and Cy5 channels using a global trimmed mean obtained by eliminating the upper and lower 5% of the data; this value was subtracted from each data point. The final output for each sample contained Cy3 and Cy5 local background-corrected signals, Cy5/Cy3 ratios (log2) and a flag tag for each clone. The final gene expression matrix was obtained by collating log2 ratios and flag data from the 81 experiments. In all subsequent analyses, control spike genes were removed and a maximum of 10% missing (invalid) values was allowed. Missing values were estimated using the nearest neighbor method. Mean centering was applied to all genes to standardize the dataset. The Pearson metric and average or complete linkage methods were used for hierarchical clustering. The gene expression matrix used for the ISIS class discovery procedure comprised 39 specimens of primary tumors from advanced disease (excluding the 2 clear cell cases) and was further filtered to about 2000 genes to reduce noise, as suggested by the software authors (von Heydebreck et al, 2001), by keeping only genes with a row standard deviation greater than 0.27.
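The post-filtering steps (trimmed-mean centering of each slide's log2 ratios, the 10% missing-value filter, nearest-neighbor imputation, gene mean-centering and the standard-deviation filter used before ISIS) can be sketched in plain Python. Assumptions: missing values are represented as None, ratios are already log2-transformed, imputation uses a simple unweighted k-nearest-neighbor average with a hypothetical k = 3, and the row standard deviation is the sample standard deviation; the software used in the study may differ on each point.

```python
import math
from statistics import mean, stdev


def trimmed_mean(values, trim=0.05):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    cut = int(len(v) * trim)
    return mean(v[cut:len(v) - cut])


def normalize_slide(log_ratios, trim=0.05):
    """Subtract the global trimmed mean from every valid log2 ratio (None = invalid)."""
    center = trimmed_mean([x for x in log_ratios if x is not None], trim)
    return [None if x is None else x - center for x in log_ratios]


def drop_sparse_rows(matrix, max_missing=0.10):
    """Keep genes (rows) with at most a max_missing fraction of missing values."""
    return [row for row in matrix if sum(x is None for x in row) <= max_missing * len(row)]


def knn_impute(matrix, k=3):
    """Replace each missing value with the mean of that column in the k nearest genes."""
    out = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            if x is not None:
                continue
            candidates = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    # Root-mean-square difference over columns observed in both genes.
                    d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            observed = [v for v in row if v is not None]
            # Fall back to the gene's own mean if no neighbor has this column.
            out[i][j] = mean(nearest) if nearest else mean(observed)
    return out


def mean_center(matrix):
    """Subtract each gene's mean so that every row averages to zero."""
    centered = []
    for row in matrix:
        m = mean(row)
        centered.append([x - m for x in row])
    return centered


def sd_filter(matrix, threshold=0.27):
    """Keep genes whose row standard deviation exceeds the threshold."""
    return [row for row in matrix if stdev(row) > threshold]
```

The functions are meant to be chained in the order described in the text: normalize each slide, collate, drop sparse rows, impute, center and, for ISIS only, apply the standard-deviation filter.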
j) Supervised learning
Supervised learning methods implemented in BRB-ArrayTools were used to select the genes that best discriminated the known phenotypes. Standard class comparison was done using a two-sample univariate F-test (with a randomized variance model and FDR correction, nominal P-value < 0.0025). The confidence level of the false discovery rate assessment was 90%, the maximum allowed number of false-positive genes was 10, and the maximum allowed proportion of false-positive genes was 0.1. The estimated probability of identifying at least 113 genes as significant (P < 0.0025) by chance, when no real differences exist between the classes, was 0.00606. All permutation tests used at least 1000 permutations.
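The global statement above (the probability of finding at least 113 significant genes by chance) comes from a permutation test on the number of significant genes. A minimal sketch under simplifying assumptions: a gene is called significant when its two-group F statistic exceeds a cutoff supplied by the caller (standing in for the critical value at P < 0.0025, since the Python standard library has no F-distribution quantile function), and class labels are shuffled to build the null distribution of that count. Function names and the cutoff are hypothetical, and the computation in BRB-ArrayTools may differ.

```python
import random


def _avg(xs):
    return sum(xs) / len(xs)


def f_statistic(values, labels):
    """One-way ANOVA F statistic for two groups (df = 1 and n - 2)."""
    groups = [[v for v, lab in zip(values, labels) if lab == k] for k in (0, 1)]
    grand = _avg(values)
    means = [_avg(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return ss_between / (ss_within / (len(values) - 2))


def count_significant(X, labels, f_cutoff):
    """Number of genes whose F statistic exceeds the cutoff."""
    return sum(f_statistic(row, labels) > f_cutoff for row in X)


def global_permutation_test(X, labels, f_cutoff, n_perm=1000, seed=0):
    """Observed count, and the fraction of permutations giving at least as many."""
    rng = random.Random(seed)
    observed = count_significant(X, labels, f_cutoff)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if count_significant(X, perm, f_cutoff) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Because the genes are counted jointly within each permutation, correlation between genes is preserved, which is why a count-based permutation test is preferred here over multiplying per-gene probabilities.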
k) Immunohistochemistry
Tumor sections (1-2 m) were serially cut from formalin-fixed, paraffin-embedded tissue
mounted in poly-L-lysine (Sigma, St. Louis, MO)-coated slides, deparaffinized in xylene
and hydrated in graded alcohols. Endogenous peroxidase activity was inhibited by
treating sections with 0.3% hydrogen peroxide in methanol for 30 min. Slides were
washed three times in 0.05 M PBS-0.1% Triton, incubated with normal goat or human
albumin diluted 1:50 and 1:100, respectively, in 1% PBS 0.1%-BSA-sodium azide and
incubated overnight with the following primary antibodies:
Antibody | Supplier | Epitope retrieval | Dilution | Controls
FGF2 (147), sc-79, pAb | Santa Cruz | 6-min autoclave in citrate buffer (pH 6) + 5-min in protease XIV 2% in TBS-EDTA | 1:50 | Pos: pancreas; Neg: blocking peptide
FGFR4 (c16), sc-124, pAb | Santa Cruz | 6-min autoclave in citrate buffer (pH 6) | 1:200 | Pos: neuroendocrine carcinoma; Neg: blocking peptide
FN1, A 0245, pAb | Dako | 6-min autoclave in EDTA (pH 8) + 5-min in protease XIV 2% in TBS-EDTA | 1:800 | Pos: myofibroblastic sarcoma

Antibodies were diluted in 1% PBS-0.1% BSA-sodium azide.
TABLE A. ISIS classes, associated gene lists and GO-terms distribution analysis.

ISIS class (a) | Group 0 (b) | Group 1 (b) | DLD score (c) | No. genes P<0.002 (d) | No. genes FP<10 genes (e) | No. genes FP<10% (f) | Probability (g) | GO-terms Bonferroni P<0.05 (h) | GO-term (i)
1 | 24 | 15 | 16.82 | 100 | 79 | 47 | 0 | NO | -
2 | 10 | 29 | 15.2 | 25 | 20 | 3 | 0.051 | NO | -
3 | 32 | 7 | 13.82 | 35 | 20 | 3 | 0.037 | NO | -
4 | 12 | 27 | 13.46 | 298 | 263 | 482 | 0 | NO | -
5 | 10 | 29 | 13.25 | 314 | 239 | 514 | 0 | NO | -
6 | 12 | 27 | 13.17 | 181 | 147 | 196 | 0.001 | YES | ECM
7 | 8 | 31 | 13.07 | 92 | 59 | 33 | 0.005 | NO | -
8 | 4 | 35 | 13.02 | 86 | 50 | 7 | 0.017 | NO | -
9 | 7 | 32 | 12.99 | 159 | 115 | 118 | 0 | NO | -
10 | 14 | 25 | 12.98 | 105 | 99 | 87 | 0.001 | NO | -
11 | 5 | 34 | 12.97 | 38 | 21 | 3 | 0.06 | NO | -
12 | 12 | 27 | 12.92 | 36 | 30 | 5 | 0.016 | NO | -
13 | 9 | 30 | 12.64 | 9 | 10 | 1 | 0.23 | NO | -
14 | 4 | 35 | 12.62 | 61 | 26 | 5 | 0.038 | NO | -
15 | 10 | 29 | 12.46 | 301 | 250 | 478 | 0 | YES | Intracellular
16 | 33 | 6 | 12.41 | 16 | 11 | 3 | 0.154 | NO | -
17 | 5 | 34 | 12.26 | 450 | 281 | 605 | 0 | NO | -
18 | 9 | 30 | 12.07 | 90 | 72 | 38 | 0.002 | NO | -
19 | 9 | 30 | 12.00 | 266 | 239 | 323 | 0 | NO | -
20 | 21 | 18 | 11.63 | 75 | 67 | 57 | 0.003 | YES | ECM
21 | 33 | 6 | 11.42 | 70 | 38 | 6 | 0.012 | NO | -
22 | 6 | 33 | 11.2 | 79 | 56 | 27 | 0.009 | NO | -
23 | 9 | 30 | 10.95 | 106 | 79 | 56 | 0.003 | YES | ECM
24 | 6 | 33 | 9.493 | 31 | 16 | 5 | 0.062 | NO | -

(a) Binary partitions obtained by automated class discovery with ISIS software (von Heydebreck et al, 2001) in 39 advanced EOC.
(b) Number of samples in each group.
(c) DLD score associated with each binary partition discovered with ISIS.
(d) Number of genes significant at P < 0.002 in the univariate F-test, obtained using BRB-ArrayTools (Radmacher et al, 2002; McShane et al, 2002).
(e) Number of genes in the list containing fewer than 10 false positives, with 90% confidence.
(f) Number of genes in the list containing less than 10% false positives, with 90% confidence.
(g) Probability of obtaining the observed number of genes significant at P < 0.002 by chance, when there are no real differences between the classes.
(h) ISIS classes whose associated gene lists were significantly (P < 0.05, Bonferroni correction) enriched in specific GO-terms compared with the distribution of GO-terms of all genes present on the array, as assessed using EASE software v.2.0 (Hosack et al, 2003).
(i) Significant GO-terms retrieved.
FIGURE A. Hierarchical clustering of all analyzed samples except the cell lines. Clustering was performed on 3309 genes using the Pearson metric and the average linkage method. Expression levels are relative to a common reference obtained by pooling RNA from ten human cell lines. Increased (orange) or decreased (blue) expression of each gene is shown for each sample. Major clusters and several of the genes they contain are shown in boxes A-O.
REFERENCES
Birindelli S, Perrone F, Oggionni M, Lavarino C, Pasini B, Vergani B, Ranzani GN,
Pierotti MA and Pilotti S. (2001). Lab Invest, 81, 833-844.
Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, Hernandez-Boussard T, Rees CA,
Cherry JM, Botstein D, Brown PO and Alizadeh AA. (2003). Nucleic Acids Res, 31, 219-223.
Donghi R, Longoni A, Pilotti S, Michieli P, Della Porta G and Pierotti MA. (1993). J
Clin Invest, 91, 1753-1760.
Fielden MR, Halgren RG, Dere E and Zacharewski TR. (2002). Bioinformatics, 18, 771-773.
Gelfi C, Righetti SC, Zunino F, Della TG, Pierotti MA and Righetti PG. (1997).
Electrophoresis, 18, 2921-2927.
Guffanti A, Reid JF, Alcalay M and Simon G. (2002). Trends Genet, 18, 589-592.
Hosack DA, Dennis G, Jr., Sherman BT, Lane HC and Lempicki RA. (2003). Genome
Biol, 4, R70.
McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC and Simon R. (2002).
Bioinformatics, 18, 1462-1469.
Radmacher MD, McShane LM and Simon R. (2002). J Comput Biol, 9, 505-511.
Schaner ME, Ross DT, Ciaravino G, Sorlie T, Troyanskaya O, Diehn M, Wang YC,
Duran GE, Sikic TL, Caldeira S, Skomedal H, Tu IP, Hernandez-Boussard T, Johnson
SW, O'Dwyer PJ, Fero MJ, Kristensen GB, Borresen-Dale AL, Hastie T, Tibshirani R,
van de RM, Teng NN, Longacre TA, Botstein D, Brown PO and Sikic BI. (2003). Mol
Biol Cell, 14, 4376-4386.
von Heydebreck A, Huber W, Poustka A and Vingron M. (2001). Bioinformatics, 17,
S107-S114.
Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling CA, Monk BJ, Lockhart DJ,
Burger RA and Hampton GM. (2001). Proc Natl Acad Sci U S A, 98, 1176-1181.