Supplemental Data
A Transcriptional Profiling Meta-Analysis Reveals a
Core EWS-FLI Gene Expression Signature
Jeffrey D. Hancock1,2 & Stephen L. Lessnick1,2,3
1 The Division of Pediatric Hematology/Oncology, University of Utah School of
Medicine, Salt Lake City, Utah 84112.
2 The Center for Children, Huntsman Cancer Institute, University of Utah School of
Medicine, Salt Lake City, Utah 84112.
3 Department of Oncological Sciences, University of Utah School of Medicine, Salt Lake
City, Utah 84112.
Supplemental methods: (below) Detailed description of the methods used for the
comparative microarray analyses.
Supplemental Figure 1: Venn diagram of genes represented in the human tumor data
sets
Supplemental Figure 2: ASSESS heatmaps comparing model gene sets across
individual samples in human tumor data sets
Supplemental Table 1: Microsoft Word file containing the descriptions of the
phenotypes observed in the various Ewing’s sarcoma models
Supplemental Table 2: Microsoft Word file containing a table outlining the various
human tumors and tissues present in Ewing’s tumor data sets
Supplemental Table 3: Microsoft Excel file containing the gene symbols comprising
the upregulated and downregulated gene sets from the Ewing’s sarcoma model systems.
The spreadsheets are in the .gmx file format for use in the GenePattern and GSEA
programs.
Supplemental Table 4: Microsoft Excel file containing tables outlining the gene
symbols shared between the data sets. The first worksheet lists the gene symbols shared
between each gene set and the human data set. The second and third worksheets outline
the genes shared amongst the individual upregulated and downregulated gene sets as
calculated by the VennMaster program.
Supplemental Table 5: Microsoft Excel file containing GSEA comparison of Ewing’s
sarcoma model systems to human data sets
Supplemental Table 6: Microsoft Excel file containing ASSESS comparisons of
Ewing’s sarcoma model gene sets across individual samples in human data sets.
Spreadsheets are in the .gct file format for use in the GenePattern Suite.
Supplemental Table 7: Microsoft Excel file containing a list of the upregulated
leading edge gene symbols identified across all Ewing's models enriched at a FDR
<0.25 within the human data sets.
Supplemental Table 8: Microsoft Excel file containing a list of the downregulated
leading edge gene symbols identified across all Ewing's models enriched at a FDR
<0.25 within the individual human data set.
Supplemental Table 9: Microsoft Excel file containing a list of the leading edge gene
symbols identified across all models as well as when limited to the EWS-FLI models
enriched at a FDR <0.25 within the mesenchymal stem cell samples.
Ewing’s Sarcoma Microarray Data
To identify all Ewing’s sarcoma model system and tumor expression profiles, we
performed a systematic survey of the literature and microarray data repositories. This
was accomplished through iterative searches of PubMed, NCBI GEO, and ArrayExpress
using both the canonical Medical Subject Heading Terms (MeSH) for Ewing’s sarcoma
and microarray data ("Sarcoma, Ewing's"[MeSH] AND "Microarray Analysis"[MeSH])
as well as commonly used variations (ewing, ewings, sarcoma, EWS/FLI, EWS-FLI,
EWS, FLI, etc.).
Ewing’s Sarcoma Model Systems:
Braunreiter NIH3T3 EWS-ETS (ref. 1)
NIH3T3 murine fibroblasts were infected with retroviral constructs encoding one of the
following 1X FLAG-tagged EWS-ETS fusions: EWS-FLI, EWS-ERG, EWS-FEV, EWS-ETV1, and
EWS-ETV4. RNA from these cells was competitively hybridized on the array against RNA
harvested from uninfected NIH3T3 cells. Experiments were performed using the HCI mouse
cDNA array. Slides consisted of 19200 murine cDNA clones representing 13590 unique
genes, encompassing UniGene release 10/2/2005. Clones were generated from the National
Institute on Aging (NIA) 15K and NIA 7.4K mouse clone sets. Data are available at
http://www.hci.utah.edu/publicweb/content/lessnick/mscSupplementalBraunreiter2006_files/mscSupplemental-Braunreiter-2006.html
Deneen NIH3T3 EWS-ETS (ref. 2)
This data set was generated from polyclonal NIH3T3 cell populations expressing one of
three EWS-ETS fusion genes (EWS-FLI, EWS-ERG, EWS-ETV1). Five Affymetrix
arrays (Mu11kSubA, Mu11kSubB, Mu19kSubA, Mu19kSubB, and Mu19kSubC
GeneChips) were used to generate these data. The complete data set is not publicly
available. Therefore, we used the reported sets of EWS-ETS upregulated and
downregulated genes. These gene sets were downloaded from
http://mcb.asm.org/cgi/content/full/23/11/3897/T1
Hu RMS EWS-FLI (ref. 3)
This data set was generated from an embryonal rhabdomyosarcoma cell line infected
with an inducible retroviral construct expressing EWS-FLI. The infected lines were
induced to express EWS-FLI and RNA was harvested at periodic intervals for 3 days.
The controls and tetracycline induced samples were hybridized with several replicates
to Affymetrix U95Av2 arrays and were used to generate these data (36 total samples).
The complete data set is not yet publicly available, but was generously provided by S.
Hu-Lieskovan.
Kinsey EWS-FLI KD (ref. 4)
This study comprises 16 total samples. The data set was generated from polyclonal
luc-RNAi infected (i.e. EWS-FLI expressing) and EWS-FLI-RNAi (EF-2-RNAi, EF-4-RNAi)
infected Ewing’s sarcoma lines TC71 (5 samples) and EWS502 (11 samples).
Experiments were performed on Affymetrix U133 Plus 2.0 arrays. The raw data are
available at
http://www.hci.utah.edu/publicweb/content/lessnick/molecularcancerResearch2006/mscSupplemental2006.html
Lessnick HFF EWS-FLI (ref. 5)
This data set was generated from human neonatal foreskin fibroblasts infected with an
inducible retroviral construct expressing EWS-FLI. The infected lines were induced to
express EWS-FLI and RNA was harvested daily for 4 days. Duplicate samples were
prepared in a subsequent week. The control (pre-induction) and induced samples were
hybridized to Affymetrix U95Av2 arrays (10 total samples). This data set can be
accessed in an Excel spreadsheet “Appendix 2” at:
http://www.hci.utah.edu/publicweb/content/lessnick/mscSupplementalLessnick2002_files/mscSupplemental-Lessnick-2002.html
Prieur EWS-FLI KD (ref. 6)
This model represents the microarray analysis of A673 cell lines transfected with
control siRNA (siCT - EWS-FLI expressing) and siRNA against EWS-FLI (siEF1) and
is comprised of 4 total samples. The transfected lines were hybridized in duplicate
to Affymetrix U133A arrays. Though the raw data were kindly made available to
us by the authors, we were unable to extract gene sets using our uniform SAM
parameters. Therefore we used the gene lists as published by the authors. These were
derived using a 2-class ANOVA, but statistical significance was not reported (likely due
to insufficient samples – the reason for the failure of our SAM analysis).
Riggi ES-D3 EWS-FLI (ref. 7)
This model is derived from the microarray analysis of hEWS-FLI-1V5 (cloned from
SK-N-MC) expression in embryonic stem (line ES-D3) fibroblasts. Expression of
hEWS-FLI-1V5 in ES-D3 cells was achieved using the Retroviral Gene Transfer and
Expression (BD Biosciences Clontech). Expression analysis was done using the NIA-17k clone set cDNA arrays (Tanaka TS, Jaradat SA, Lim MK, et al.
http://www.unil.ch/dafl/page5509_en.html) and Quantifoil support array. Fluorescence
ratios for array elements were extracted by using ScanAlyze software. For each time
point and cell line, five m17k microarrays (among which two were dye swaps) were
done comparing hEWS-FLI-1-V5 expressing with empty vector control cells. Raw data
were not available for these experiments. However the gene lists generated by the
authors using one-sample, one-sided t tests (FDR <0.2) were available as supplementary
data. The upregulated and downregulated gene lists from all time points generated in
the ES-D3 experiments were compiled into a comprehensive gene set.
Riggi MPC EWS-FLI (ref. 7)
This model is derived from the microarray analysis of hEWS-FLI-1V5 (cloned from
SK-N-MC) expression in mesenchymal progenitor cells (MPC). MPCs were isolated
from bone marrow of wild-type adult C57BL/6 mice and cultured and then tested by
fluorescence-activated cell sorting for mesenchymal stem cell marker expression before
and after infection and selection. Expression of hEWS-FLI-1V5 in MPCs was achieved
using the Retroviral Gene Transfer and Expression (BD Biosciences Clontech).
Expression analysis was done using the NIA-17k clone set cDNA arrays (Tanaka TS,
Jaradat SA, Lim MK, et al. http://www.unil.ch/dafl/page5509_en.html) and Quantifoil
support array. Fluorescence ratios for array elements were extracted by using ScanAlyze
software. For each time point and cell line, five m17k microarrays (among which two
were dye swaps) were done comparing hEWS-FLI-1-V5 expressing with empty vector
control cells. Raw data were not available for these experiments. However the gene
lists generated by the authors using one-sample, one-sided t tests (FDR <0.2) were
available as supplementary data. The upregulated and downregulated gene lists from all
time points generated in the MPC experiments were compiled into a comprehensive
gene set.
Riggi STO EWS-FLI (ref. 7)
This model is derived from the microarray analysis of hEWS-FLI-1V5 (cloned from
SK-N-MC) expression in spontaneously immortalized embryonic (STO) fibroblasts
(MEF cell line). Expression of hEWS-FLI-1V5 in STOs was achieved using the
Retroviral Gene Transfer and Expression (BD Biosciences Clontech). Expression
analysis was done using the NIA-17k clone set cDNA arrays (Tanaka TS, Jaradat SA,
Lim MK, et al. http://www.unil.ch/dafl/page5509_en.html) and Quantifoil support
array. Fluorescence ratios for array elements were extracted by using ScanAlyze
software. For each time point and cell line, five m17k microarrays (among which two
were dye swaps) were done comparing hEWS-FLI-1-V5 expressing with empty vector
control cells. Raw data were not available for these experiments. However the gene
lists generated by the authors using one-sample, one-sided t tests (FDR <0.2) were
available as supplementary data. The upregulated and downregulated gene lists from all
time points generated in the STO experiments were compiled into a comprehensive
gene set.
Rorie NBL EWS-FLI (ref. 8)
This data set was generated from pooled EWS-FLI infected and uninfected control
neuroblastoma cell lines LAN 5 and NGP9A Tr1. Duplicate Affymetrix U95Av2 arrays
were used to generate these data (8 total samples). The relative expression values were
then computed using GeneSpring 5.0 (Silicon Genetics). The values were further
normalized and gene lists generated. These data were used for SAM analysis. The
complete data set is not yet publicly available, but was generously provided by B.
Weissman.
Siligan EWS-FLI KD (ref. 9)
These data were derived from the microarray analysis of polyclonal mismatch-RNAi
infected (i.e. EWS-FLI expressing) and EWS-FLI-RNAi (shEF22 and shEF4) infected
STA-ET-7.2 Ewing's sarcoma cells. After appropriate selection for stably infected
knockdown cells, replicates of each EWS-FLI knockdown line and mismatch-RNAi samples
were hybridized to Affymetrix U133A arrays.
Smith EWS-FLI KD (ref. 10)
This gene set was derived from the microarray analysis of polyclonal luc-RNAi infected
(i.e. EWS-FLI expressing) and EWS-FLI-RNAi (EF-2-RNAi, EF-4-RNAi) infected
Ewing’s sarcoma A673 cells. After appropriate selection for stably infected knockdown
cells, two replicates of each EWS-FLI knockdown line and four replicate luc-RNAi
samples were hybridized to Affymetrix U133A arrays (8 total samples). This data set is
available at
http://www.hci.utah.edu/publicweb/content/lessnick/mscSupplementalSmith2006_files/mscSupplemental-Smith-2006.html
Smith EWS-FLI inducible rescue (ref. 10)
This gene set is derived from the microarray analysis of a clonal A673 cell line (TetA673)
that contained the FLAG-tagged EWS-FLI cDNA under the control of a
tetracycline-repressible promoter and was subsequently infected with retroviral
RNAi against EWS-FLI (EF-2-RNAi). After appropriate selection for stably
infected knockdown cells, tetracycline was withdrawn and the cells were allowed to
express exogenous EWS-FLI at levels comparable to endogenous expression. Total
RNA was collected at time points preceding and following EWS-FLI induction. A total
of 10 experimental and 5 control samples were hybridized to Affymetrix U133A arrays.
This data set is available at
http://www.hci.utah.edu/publicweb/content/lessnick/mscSupplementalSmith2006_files/mscSupplemental-Smith-2006.html
Human Tumor Data sets:
Human Ewing’s sarcoma and rhabdomyosarcoma data set (Baer et al., 2004; ref. 11)
This data set is comprised of the microarray analysis of 23 human sarcoma patient
samples from the University Children’s Hospital, Heidelberg, Germany. It contains the
profile of 2 human sarcoma tumor types: 11 Ewing’s sarcomas and 12 primary pediatric
rhabdomyosarcomas (9 alveolar and 3 embryonal). HE-stain confirmed each sample to
have >80% tumor cells. All hybridizations were performed on Affymetrix U95Av2
arrays. The data set is available at:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE967
Human sarcoma data set (Baird et al., 2005; ref. 12)
This publicly available data set was generated from microarray analysis of 181 human
sarcoma patient samples at the National Human Genome Research Institute (NHGRI).
The data set includes 16 human sarcoma tumor types: 1 alveolar soft part sarcoma, 1
chondrosarcoma, 1 clear cell sarcoma, 5 dermatofibrosarcomas, 20 Ewing’s sarcomas, 7
fibrosarcomas, 5 gastrointestinal stromal tumors, 6 leiomyosarcomas, 33 liposarcomas,
38 malignant fibrous histiocytomas, 6 malignant hemangiopericytomas, 6 malignant
peripheral nerve sheath tumors, 2 mixed Mullerian tumors, 6 osteosarcomas, 6
rhabdomyosarcomas, 10 sarcomas (NOS), 3 benign schwannomas, and 18 synovial cell
sarcomas. The microarray platform used was a cDNA array containing 12601 cDNA
clones annotated with IMAGE CloneIDs. The complete data set was downloaded from
http://www.ncbi.nlm.nih.gov/geo/gds/gds_browse.cgi?gds=1268
Human mesenchymal tumor data set (Henderson et al., 2005; ref. 13)
This publicly available data set was generated from microarray analysis of 96
mesenchymal tumors, representing 19 different sub-types from specimens resected at
the London Bone and Soft Tissue Tumour Service (Royal National Orthopaedic
Hospital, Stanmore and University College London Hospitals, London), Great Ormond
Street Hospital, London, or the Nuffield Orthopaedic Center, Headington, Oxford, in the
UK. The data set includes 4 alveolar rhabdomyosarcomas (3 PAX3-FKHR, 1 NA), 4
chondroblastomas, 4 chondromyxoid fibromas, 7 chondrosarcomas, 4 chordomas, 3
dedifferentiated chondrosarcomas, 3 embryonal rhabdomyosarcomas, 5 Ewing's
Sarcomas (all EWS-FLI), 5 fibromatoses, 8 leiomyosarcomas, 3 lipomas, 4 malignant
peripheral nerve sheath tumors, 10 monophasic synovial sarcomas (1 SYT-SSX NOS, 1
SYT-SSX2, 2 SYT-SSX1, 6 NA), 7 myxoid liposarcomas (4 FUS-CHOP, 3 NA), 4
neurofibromas, 11 osteosarcomas, 3 undifferentiated sarcomas, 4 schwannomas, and 3
well-differentiated liposarcomas. The profiling experiments were performed on
Affymetrix U133A Human GeneChips. The RMA algorithm was used for preprocessing, normalizing and calculation of expression values. The complete data set
was downloaded from http://www.ebi.ac.uk/aerep/dataselection?expid=484703006.
Human Small Round Blue Cell Tumor data set (Khan et al., 2001; ref. 14)
This publicly available data set was also generated at the NHGRI. It contains 63 human
small round blue cell tumor samples with 4 distinct tumor types represented: 23
Ewing’s sarcomas, 8 Burkitt’s lymphomas, 12 neuroblastomas, and 20
rhabdomyosarcomas. This data set was generated using a cDNA array containing 6567
clones annotated with IMAGE CloneIDs. This data set was downloaded from:
http://home.ccr.cancer.gov/oncology/oncogenomics/Data/rri_used_NatureMed_Alldata.txt
Risk stratified and metastatic Ewing’s sarcoma data set (Ohali et al., 2004; ref. 15)
This data set is comprised of the microarray analysis of 14 primary tumor specimens
and 6 metastases. Samples were obtained from 18 patients admitted to the Pediatric
Hematology Oncology Department at Schneider Children's Medical Center. All
patients were treated with a combination of aggressive chemotherapy, radiotherapy and
surgery. The median age at diagnosis was 15 years (range 7-27). Five patients were
female and 13 were male subjects. Response to therapy was defined by
histopathological response and assessed by percentage of tumor necrosis at the time of
surgery (limb salvage procedure) following neoadjuvant chemotherapy and
radiotherapy. The median follow-up was 72.5 months (range 7-171). All samples were
hybridized to Affymetrix U95Av2 arrays. The complete data set is not yet publicly
available, but was generously provided by S. Avigad.
PEPR normal human tissue data set (Chen et al., 2004; ref. 16)
This data set is derived from several normal human tissue samples processed and made
available at the Public Expression Profiling Resource (PEPR). We queried the PEPR
data repository for all normal human bone and skeletal muscle samples which were
hybridized to Affymetrix U95Av2 arrays. This query resulted in the identification of 2
bone samples and 16 skeletal muscle samples. The 2 bone samples were technical
replicates derived from the Skeletal Genome Anatomy project. These were pooled from
4 individuals who healed normally from fractures ages 35-81 (SGAP-NormalIIP1aAv2-s2). The 16 skeletal muscle samples were derived from normal skeletal muscle
controls from the Acute Quadriplegic Myopathy (ref. 17; 2 samples), DMD temporal
profiling (ref. 18; 10 samples), and Duchenne (ref. 19; 4 samples) data sets. These
samples were used exclusively as comparators to the Ohali et al. samples. Public access
to these data was supported by
grants from the NIH (National Center for Medical Rehabilitation Research
5R24HD050846, and Wellstone Muscular Dystrophy Center 1U54HD053177).
Human Ewing’s sarcoma and neuroblastoma data set (Staege et al., 2004; ref. 20)
This data set represents the microarray analysis of 10 human tumor samples.
Primary Ewing’s sarcoma samples were from C. Poremba and K-L. Schäfer
(Düsseldorf, Germany). Primary neuroblastoma samples were from F. Berthold
(Cologne, Germany). The 2 tumor types comprised 5 Ewing’s
sarcomas and 5 neuroblastomas (Stages I, III and IV). RNA from native tumor samples
was processed for DNA-microarray analysis using Affymetrix U133A arrays. This data
set is available at: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1825
Human Ewing’s sarcoma, normal tissue and mesenchymal stem cell data set
(Tirode et al., 2007; ref. 21)
This dataset is comprised of the 27 human Ewing’s sarcoma samples as well as the
freshly isolated (P1) BMSCs processed by Tirode et al. We also included the Tirode et
al. compilation of CEL files from E-AFMX-5 (Su et al., ref. 22) and from E-MEXP-167 and
E-MEXP-168 (Boquest et al., ref. 23) as reported in their supplementary data. We excluded
the Ewing’s cell line samples and the EWS-FLI knockdown samples from the overall
data set. All microarray experiments were performed on Affymetrix U133A arrays. The
Su et al. and Boquest et al. data sets are available at EBI’s ArrayExpress repository.
The data original to Tirode et al. may be accessed at:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7007
Human tumors and tissues data sets (Whiteford et al., 2007; ref. 24)
The full experiment is comprised of 182 human pediatric xenograft, primary tumor, cell
lines and normal patient samples gathered by the Oncogenomics Section, Pediatric
Oncology Branch, National Cancer Institute, NIH, from the Children's Oncology Group
(COG) Preclinical Protein-Tissue Array Project (POPP-TAP), available at:
http://home.ccr.cancer.gov/oncology/oncogenomics/
The “Whiteford et al. human tissues” data set includes 70 primary tumor samples:
30 Neuroblastomas, 21 Rhabdomyosarcomas, and 19 Ewing's sarcomas as well as 19
normal human tissue samples. The “Whiteford et al. tumors” and “Whiteford et al.
Ewing's vs normal” sets are simply subdivisions of these samples. The “Whiteford et
al. tumors” set is limited exclusively to the tumor samples. The “Whiteford et al.
Ewing's vs normal” set is limited to the Ewing’s sarcoma and normal tissues samples.
These samples were competitively hybridized on the cDNA arrays against reference
RNA derived from a pooled group of seven human cell lines: CHP212, RD, HeLa,
A204, K562, RDES, and CA46. All experiments were performed on the NCI human
cDNA microarray platform. This is a cDNA array containing 42,578 cDNA clones,
representing 13,606 unique genes and 12,327 expressed sequence tags, annotated with
IMAGE CloneIDs.
Preprocessing and normalization of data sets
All data sets for which we had obtained the raw .CEL files were processed using the
Expression File Creator module in GenePattern (ref. 25), with the exception of the
Tirode et al. data. The files were processed using the MAS5 algorithm (ref. 26).
Normalization was performed using median scaling. Absent and present calls were
ignored. For the Tirode et al. data, we mirrored the procedure outlined in their
publication, using the GCRMA algorithm (ref. 27) as implemented in R Bioconductor to
process and normalize the .CEL files.
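The median scaling step can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the GenePattern implementation: the function name `median_scale`, the choice of the mean of the per-array medians as the common target, and the toy intensities are all hypothetical.

```python
from statistics import median


def median_scale(arrays, target=None):
    """Rescale each array (a list of intensities) so its median equals a common target.

    If no target is given, the mean of the per-array medians is used
    (an assumption made for this sketch only).
    """
    medians = [median(a) for a in arrays]
    if any(m == 0 for m in medians):
        raise ValueError("cannot scale an array whose median is zero")
    if target is None:
        target = sum(medians) / len(medians)
    # Multiply every value by target / (array median).
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]


# Toy example: two arrays that differ only in overall brightness.
scaled = median_scale([[10.0, 20.0, 30.0], [20.0, 40.0, 60.0]])
```

After scaling, both toy arrays share the same median, so differences in overall signal intensity between hybridizations no longer dominate the comparison.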
Probe matching across data sets
Because the data sets used in our analyses were generated using different microarray
platforms, we converted the annotation to HUGO gene symbols to facilitate direct
comparison. For comparing between human and mouse data sets, human HUGO gene
symbols were used as the common identifier. Details of the conversion of each
individual data set and gene list follow below.
Conversion of Affymetrix accession number to HUGO gene symbol
The majority of the data sets were annotated by Affymetrix accession numbers. For the
model gene sets we matched the Affymetrix accession numbers to their corresponding
human HUGO symbols via the GeneCruiser ver.4 module available in GenePattern. For
the human data sets used as comparators in GSEA and ASSESS we used the “Collapse
Max Probes to Symbols” option to process the annotations and match each Affymetrix
accession number to its corresponding gene symbol.
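As a rough sketch of what collapsing probes to gene symbols by maximum value involves, the following Python function keeps, for each sample, the largest value among all probes that map to the same symbol. The function name, the data layout, and the probe and gene identifiers are hypothetical; the actual collapsing was done inside GSEA, whose implementation may differ in detail.

```python
def collapse_max_probe(expression, probe_to_symbol):
    """Collapse probe-level rows to gene symbols.

    expression: dict mapping probe ID -> list of per-sample values
    probe_to_symbol: dict mapping probe ID -> gene symbol
    Probes without a symbol are dropped. For each symbol and each sample,
    the maximum value across that symbol's probes is kept.
    """
    collapsed = {}
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue  # unannotated probe
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


# Toy input: two probes for GENE1, one for GENE2, one unannotated probe.
expr = {"probe_a": [1.0, 5.0], "probe_b": [3.0, 2.0],
        "probe_c": [7.0, 7.0], "probe_d": [9.0, 9.0]}
mapping = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
by_symbol = collapse_max_probe(expr, mapping)
```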
Conversion of human IMAGE ID to HUGO gene symbol
Several data sets were annotated by IMAGE IDs. We matched these IMAGE IDs to
their corresponding HUGO gene symbol via the SOURCE batch unification tool
available at http://source.stanford.edu.
Annotation of mouse microarray data with human HUGO gene symbols
To convert the mouse annotation of gene sets derived from our SAM analyses to human
UniGeneIDs we made use of both the mouse and human UniGene databases. We first
used the mouse database
(ftp://ftp.ncbi.nih.gov/repository/UniGene/Mus_musculus/Mm.data.gz build #162, 27
Feb 2007) to match the NCBI accession numbers from our gene sets to their mouse
UniGeneID and corresponding most homologous human ProteinID. Conveniently, the
mouse UniGene database contains a “best match” homologous human proteinID for
each mouse UniGene entry. We made use of this homologous human ProteinID to then
find the appropriate human HUGO gene symbol from the human UniGene database
(ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/Hs.data.gz build #201, 01
Mar 2007).
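The two-step lookup described above (accession number to mouse UniGene entry and best-match human protein, then human protein to HUGO symbol) can be sketched as chained dictionary lookups. The three lookup tables are assumed to have been parsed from the UniGene flat files beforehand; that parsing is not shown, and every identifier in the example is a made-up placeholder.

```python
def map_mouse_to_human(accessions, acc_to_mouse_cluster,
                       mouse_cluster_to_human_protein, human_protein_to_symbol):
    """Map mouse accession numbers to human gene symbols via UniGene-style tables.

    Each argument after `accessions` is a dict standing in for one lookup
    table. Accessions that fail at any step are omitted from the result.
    """
    mapped = {}
    for acc in accessions:
        cluster = acc_to_mouse_cluster.get(acc)                  # step 1: mouse UniGene ID
        protein = mouse_cluster_to_human_protein.get(cluster)    # step 2: best-match human protein
        symbol = human_protein_to_symbol.get(protein)            # step 3: human gene symbol
        if symbol is not None:
            mapped[acc] = symbol
    return mapped


# Placeholder identifiers, for illustration only.
result = map_mouse_to_human(
    ["ACC_0001", "ACC_0002", "ACC_0003"],
    {"ACC_0001": "Mm.X1", "ACC_0002": "Mm.X2"},
    {"Mm.X1": "PROT_H1"},
    {"PROT_H1": "GENE_A"},
)
```

In this toy run only the first accession survives all three steps; the others are dropped because a link in the chain is missing, which mirrors how genes without a homologous human entry fall out of a cross-species comparison.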
Extraction of EWS-FLI induced model gene sets
For the majority of the data sets, representative EWS-FLI induced gene lists have been
published in support of their relevance to Ewing’s sarcoma. However, these lists were
not extracted in a uniform manner. In order to eliminate bias when performing GSEA,
we used uniform parameters to extract dysregulated gene sets from all model data sets
for which we had obtained the raw data. We used SAM (ref. 28) as implemented in the TM4
MeV software (ref. 29) to identify the probes differentially regulated between Ewing’s
models and the experimental controls for each microarray experiment. We first limited
the data to those genes showing a fold change of at least 2. We then chose delta (tuning)
values for each analysis to identify clones at a false discovery rate (FDR) of <0.05
against 1000 random permutations. The genes identified as being upregulated or
downregulated between the Ewing’s models and their comparators were designated as the
upregulated and downregulated gene sets, respectively, and are available in
Supplemental Table 3.
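To make the two filtering ideas concrete, the sketch below shows a fold-change filter and a SAM-style relative difference statistic, d = (mean difference) / (s + s0), for a single gene. It is a simplified illustration, not the TM4 MeV implementation: the permutation-based FDR estimate and the choice of delta are not reproduced, the function names are hypothetical, and the fudge factor `s0` defaults to zero here purely for simplicity.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(case, control):
    """Ratio of mean expression (case / control); assumes positive, unlogged values."""
    return mean(case) / mean(control)


def passes_fold_filter(case, control, min_fold=2.0):
    """True if the gene changes at least min_fold in either direction."""
    fc = fold_change(case, control)
    return fc >= min_fold or fc <= 1.0 / min_fold


def relative_difference(case, control, s0=0.0):
    """SAM-style statistic: difference in means over (pooled standard error + s0)."""
    n1, n2 = len(case), len(control)
    pooled_var = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    s = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(case) - mean(control)) / (s + s0)


# Toy gene: clearly higher in the model than in the control.
model_values, control_values = [8.0, 10.0, 12.0], [2.0, 3.0, 4.0]
keep = passes_fold_filter(model_values, control_values)
d = relative_difference(model_values, control_values)
```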
Venn Diagrams
To produce two-dimensional representations of the overlapping genes between sets we
used VennMaster (ref. 30). This tool draws area-proportional Venn/Euler diagrams and
optimizes the areas in space to represent their relations. This software is available at
http://www.informatik.uni-ulm.de/ni/staff/HKestler/vennm/. We used the gene lists
available in Supplemental Table 3 as input data for the diagrams. The following
parameters were used to generate the diagrams. Global options: Size factor 0.7, Number
of edges 16, Seed 173, Update interval 10, Max intersections 12. Error function: Max
intersections 12, and remainder at default. Optimization: Particle swarm with default
values. The full list of overlaps is contained in Supplemental Table 4.
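The overlaps that such a diagram represents are plain set intersections. The following Python sketch computes all pairwise overlaps between named gene sets; it does not reproduce VennMaster's layout optimization, and the set names and gene symbols are placeholders.

```python
from itertools import combinations


def pairwise_overlaps(gene_sets):
    """Return the shared genes for every pair of named gene sets.

    gene_sets: dict mapping set name -> set of gene symbols
    """
    return {
        (a, b): gene_sets[a] & gene_sets[b]
        for a, b in combinations(sorted(gene_sets), 2)
    }


# Placeholder gene sets.
overlaps = pairwise_overlaps({
    "model_a_up": {"GENE1", "GENE2", "GENE3"},
    "model_b_up": {"GENE2", "GENE3", "GENE4"},
    "model_c_up": {"GENE5"},
})
```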
Gene Set Enrichment Analysis (GSEA)
GSEA has been used previously to compare microarray data sets, including data sets
generated on different microarray platforms, and data sets obtained from different
organisms (refs. 10, 31, 32). GSEA measures the “enrichment” of one gene set near the
top of a second ordered gene list. Enrichment is quantified using a running-sum statistic
called the enrichment score (ES). The best possible ES is 1 (indicating perfect
correlation), while the worst possible ES is -1 (indicating perfect inverse correlation).
In GSEA the
null hypothesis is that the genes in the gene set are randomly distributed through the
rank-ordered list. Rejection of the null hypothesis indicates that the gene set is
preferentially enriched near the top of the rank-ordered list, indicating significant
similarity between the gene set and the ordering of the rank-ordered list. All analyses
were performed using GSEA version 2.0.1, available at http://www.broad.mit.edu/gsea/.
A unique advantage of GSEA is that it allows us to directly compare the
enrichment of all the Ewing’s model gene sets across all the human tumor data sets. A
normalization procedure is used to correct for gene set size differences across analyses,
and it outputs a normalized enrichment score (NES). Statistical significance and
correction for multiple comparison testing are determined by calculating a false
discovery rate (FDR) q value by permutation testing.
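The running-sum idea behind the enrichment score can be sketched in a few lines of Python. This is a simplified illustration rather than the GSEA program itself: the normalization to NES and the permutation-based FDR are omitted, the weighting exponent `p` is an assumption of this sketch, and the gene names and scores are toy values.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most to least correlated with the phenotype
    scores: ranking metric for each gene, in the same order
    gene_set: set of genes to test
    Returns (ES, index at which the running sum deviates most from zero).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, es, peak_index = 0.0, 0.0, -1
    for i, (s, h) in enumerate(zip(scores, hits)):
        if h:
            running += abs(s) ** p / hit_total   # step up at a gene set member
        else:
            running -= 1.0 / n_miss              # step down otherwise
        if abs(running) > abs(es):
            es, peak_index = running, i
    return es, peak_index


ranked = ["G1", "G2", "G3", "G4"]
metric = [2.0, 1.0, -1.0, -2.0]
top_es, top_peak = enrichment_score(ranked, metric, {"G1", "G2"})        # set at the top
bottom_es, bottom_peak = enrichment_score(ranked, metric, {"G3", "G4"})  # set at the bottom
```

In the toy example, a gene set sitting entirely at the top of the list reaches the maximum score of 1, and a set sitting entirely at the bottom reaches the minimum of -1, matching the bounds stated above.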
To control for the potential confounding effects of using data sets with different
control samples we subdivided the Whiteford et al. data set into two subsets. With few
exceptions, the exclusion of the different controls did not seem to influence the
enrichment of the model gene sets in the Ewing’s sarcoma samples. These results
underline the robustness of gene set enrichment analysis in correctly identifying
correlations between expression profiles.
Analysis of Sample Set Enrichment ScoreS (ASSESS)
ASSESS is an extension of the statistical approach used in GSEA (ref. 33). The first step of
traditional GSEA is to rank order the genes in a data set according to their correlation to
a particular class (e.g. Ewing’s sarcoma vs other tumors) using a signal-to-noise (SNR)
algorithm. Following the completion of the SNR analysis, ASSESS will then compare
the individual gene expression values in a single sample to the expression values across
all of the samples (as ranked by the SNR). This secondary analysis generates a
likelihood ratio metric that represents the correlation of each gene to one class versus
the other. Thereafter the individual sample is individually rank-ordered according to the
likelihood ratio metric. This process is performed for all samples within the data set
such that in the end each sample is uniquely ordered. The enrichment of each gene set
is then calculated individually within each sample in the data set using a running sum
statistic similar to that employed in GSEA. Statistical significance is determined by
permutation testing and multiple testing is corrected for using FDR as in GSEA. All
analyses were performed using ASSESS, available at http://people.genome.duke.edu/assess/.
Comparison of Ewing’s sarcoma model gene sets to human tumor rank ordered
lists
We used GSEA and ASSESS to test for enrichment of the Ewing’s sarcoma model
signatures in each of the separate tumor types represented within the human tumor data
sets. We used a signal-to-noise (SNR) analysis with 1000 random permutations of the
human data set as implemented in javaGSEA v2.0.1 (ref. 31) and ASSESS (ref. 33) to
generate the rank-ordered list. The GSEA rank-list analysis was classed to compare the
samples of the tumor phenotype of interest vs. all others. The ASSESS analysis first
performed a similar two-class analysis and rank ordering. Subsequently the ASSESS
algorithm re-ranked the genes in each separate sample according to their correlation to
the initial rank list as determined by a non-parametric test. The previously described
SAM-derived sets of differentially upregulated genes from the Ewing’s models were used
as the comparator gene sets in the enrichment analyses.
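The two-class signal-to-noise ranking can be illustrated with the short Python sketch below, which scores each gene as (mean of class A - mean of class B) / (SD of class A + SD of class B) and sorts genes from highest to lowest score. The function names, class labels, and expression values are hypothetical, and the GSEA and ASSESS programs may apply additional adjustments (for example, a floor on the standard deviations) that are not reproduced here.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Signal-to-noise ratio: difference in means over the sum of standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expression, labels, target):
    """Rank genes by SNR for `target` samples versus all other samples.

    expression: dict mapping gene -> list of per-sample values
    labels: class label for each sample, in the same order as the values
    Returns a list of (gene, SNR) pairs, highest SNR first.
    """
    scored = {}
    for gene, values in expression.items():
        in_class = [v for v, lab in zip(values, labels) if lab == target]
        others = [v for v, lab in zip(values, labels) if lab != target]
        scored[gene] = signal_to_noise(in_class, others)
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Toy data: three samples of the phenotype of interest, three of everything else.
labels = ["ewing", "ewing", "ewing", "other", "other", "other"]
expression = {
    "GENE_UP": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],
    "GENE_DOWN": [1.0, 2.0, 3.0, 9.0, 10.0, 11.0],
}
ranking = rank_genes(expression, labels, "ewing")
```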
Comparison of Ewing’s sarcoma models
Once normalized enrichment scores were obtained from all experiments we performed a
simple summation of these scores to compare the model systems. To determine the
models which were most like human Ewing’s tumors as determined by GSEA, we
added the normalized enrichment scores for each model across all human tumor data
sets. The models were then rank ordered in a descending manner from the largest
composite NES to the smallest. To compare the models via ASSESS, we added the
NES for each model across all the individual Ewing’s tumor samples within a data set.
The models were then rank ordered within the individual data sets in descending order,
again with the largest sum NES at the top, and the smallest composite NES at the
bottom.
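The summation and ranking described above amounts to the following Python sketch. The model names, data set names, and NES values are invented for illustration and are not results from this study.

```python
def rank_models_by_composite_nes(nes_table):
    """Sum each model's NES across data sets and rank models in descending order.

    nes_table: dict mapping model name -> dict mapping data set name -> NES
    Returns a list of (model, composite NES) pairs, largest first.
    """
    totals = {model: sum(per_set.values()) for model, per_set in nes_table.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented values, for illustration only.
ranking = rank_models_by_composite_nes({
    "model_a": {"data_set_1": 2.0, "data_set_2": 1.5},
    "model_b": {"data_set_1": 1.0, "data_set_2": -0.5},
})
```

The same function applies to the ASSESS comparison if the inner dictionary holds per-sample NES values for the Ewing's tumor samples of a single data set.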
Leading edge analysis
To identify core EWS-FLI targets we analyzed the “leading-edges” from our GSEA
results. In a gene set enrichment analysis the leading-edge subset is comprised of those
genes that appear in the ranked list at or before the point at which the running sum
reaches its maximum deviation from zero. The leading-edge subset can be interpreted as
the core that accounts for the gene set’s enrichment signal (ref. 31). We first identified all
model gene sets enriched in the human data sets. For discovery purposes we limited the
model gene sets to those enriched at a FDR < 0.25.
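A minimal Python sketch of the leading-edge definition given above follows. It assumes a positively enriched gene set and that the index of the running-sum maximum (`peak_index`) is supplied by the caller; negatively enriched sets are not handled. The second helper, which tallies how often each gene recurs across several leading-edge subsets, is one hypothetical way to look for genes shared between models. All names and genes are placeholders.

```python
from collections import Counter


def leading_edge(ranked_genes, gene_set, peak_index):
    """Gene set members that appear at or before the running-sum peak."""
    return [g for g in ranked_genes[:peak_index + 1] if g in gene_set]


def leading_edge_frequency(leading_edges):
    """Count in how many leading-edge subsets each gene appears.

    leading_edges: dict mapping analysis name -> list of leading-edge genes
    """
    counts = Counter()
    for genes in leading_edges.values():
        counts.update(set(genes))  # count each gene once per analysis
    return counts


ranked = ["G1", "G2", "G3", "G4", "G5"]
subset = leading_edge(ranked, {"G1", "G3", "G5"}, peak_index=2)
frequency = leading_edge_frequency({"model_a": ["G1", "G3"], "model_b": ["G3"]})
```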
References
1. Braunreiter, C.L., Hancock, J.D., Coffin, C.M., Boucher, K.M. & Lessnick, S.L.
Expression of EWS-ETS fusions in NIH3T3 cells reveals significant differences to
Ewing's sarcoma. Cell Cycle 5, 2753-9 (2006).
2. Deneen, B. et al. PIM3 proto-oncogene kinase is a common transcriptional target
of divergent EWS/ETS oncoproteins. Mol Cell Biol 23, 3897-908 (2003).
3. Hu-Lieskovan, S. et al. EWS-FLI1 fusion protein up-regulates critical genes in
neural crest development and is responsible for the observed phenotype of Ewing's
family of tumors. Cancer Res 65, 4633-44 (2005).
4. Kinsey, M., Smith, R. & Lessnick, S.L. NR0B1 is required for the oncogenic
phenotype mediated by EWS/FLI in Ewing's sarcoma. Mol Cancer Res 4, 851-9 (2006).
5. Lessnick, S.L., Dacwag, C.S. & Golub, T.R. The Ewing's sarcoma oncoprotein
EWS/FLI induces a p53-dependent growth arrest in primary human fibroblasts. Cancer
Cell 1, 393-401 (2002).
6. Prieur, A., Tirode, F., Cohen, P. & Delattre, O. EWS/FLI-1 silencing and gene
profiling of Ewing cells reveal downstream oncogenic pathways and a crucial role for
repression of insulin-like growth factor binding protein 3. Mol Cell Biol 24, 7275-83
(2004).
7. Riggi, N. et al. Development of Ewing's sarcoma from primary bone marrow-derived
mesenchymal progenitor cells. Cancer Res 65, 11459-68 (2005).
8. Rorie, C.J. et al. The Ews/Fli-1 fusion gene switches the differentiation program
of neuroblastomas to Ewing sarcoma/peripheral primitive neuroectodermal tumors.
Cancer Res 64, 1266-77 (2004).
9. Siligan, C. et al. EWS-FLI1 target genes recovered from Ewing's sarcoma
chromatin. Oncogene 24, 2512-24 (2005).
10. Smith, R. et al. Expression Profiling of EWS/FLI Identifies NKX2.2 as a
Critical Target Gene in Ewing’s Sarcoma. Cancer Cell 9 (2006).
11. Baer, C. et al. Profiling and functional annotation of mRNA gene expression in
pediatric rhabdomyosarcoma and Ewing's sarcoma. Int J Cancer 110, 687-94 (2004).
12. Baird, K. et al. Gene expression profiling of human sarcomas: insights into
sarcoma biology. Cancer Res 65, 9226-35 (2005).
13. Henderson, S.R. et al. A molecular map of mesenchymal tumors. Genome Biol
6, R76 (2005).
14. Khan, J. et al. Classification and diagnostic prediction of cancers using gene
expression profiling and artificial neural networks. Nat Med 7, 673-9 (2001).
15. Ohali, A. et al. Prediction of high risk Ewing's sarcoma by gene expression
profiling. Oncogene 23, 8997-9006 (2004).
16. Chen, J. et al. The PEPR GeneChip data warehouse, and implementation of a
dynamic time series query tool (SGQT) with graphical interface. Nucleic Acids Res 32,
D578-81 (2004).
17. Di Giovanni, S. et al. Constitutive activation of MAPK cascade in acute
quadriplegic myopathy. Ann Neurol 55, 195-206 (2004).
18. Chen, Y.W. et al. Early onset of inflammation and later involvement of TGFbeta
in Duchenne muscular dystrophy. Neurology 65, 826-34 (2005).
19. Chen, Y.W., Zhao, P., Borup, R. & Hoffman, E.P. Expression profiling in the
muscular dystrophies: identification of novel aspects of molecular pathophysiology. J
Cell Biol 151, 1321-36 (2000).
20. Staege, M.S. et al. DNA microarrays reveal relationship of Ewing family tumors
to both endothelial and fetal neural crest-derived cells and define novel targets. Cancer
Res 64, 8213-21 (2004).
21. Tirode, F. et al. Mesenchymal stem cell features of Ewing tumors. Cancer Cell
11, 421-9 (2007).
22. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding
transcriptomes. Proc Natl Acad Sci U S A 101, 6062-7 (2004).
23. Boquest, A.C. et al. Isolation and transcription profiling of purified uncultured
human stromal stem cells: alteration of gene expression after in vitro cell culture. Mol
Biol Cell 16, 1131-41 (2005).
24. Whiteford, C.C. et al. Credentialing preclinical pediatric xenograft models using
gene expression and tissue microarray analysis. Cancer Res 67, 32-40 (2007).
25. Reich, M. et al. GenePattern 2.0. Nat Genet 38, 500-1 (2006).
26. Affymetrix. Affymetrix Microarray Suite User Guide, (Affymetrix, Santa Clara,
CA, 2001).
27. Wu, Z. & Irizarry, R.A. Preprocessing of oligonucleotide array data. Nat
Biotechnol 22, 656-8; author reply 658 (2004).
28. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays
applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98, 5116-21
(2001).
29. Saeed, A.I. et al. TM4: a free, open-source system for microarray data
management and analysis. Biotechniques 34, 374-8 (2003).
30. Kestler, H.A., Muller, A., Gress, T.M. & Buchholz, M. Generalized Venn
diagrams: a new method of visualizing complex genetic set relations. Bioinformatics 21,
1592-5 (2005).
31. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based
approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A
102, 15545-50 (2005).
32. Sweet-Cordero, A. et al. An oncogenic KRAS2 expression signature identified
by cross-species gene-expression analysis. Nat Genet 37, 48-55 (2005).
33. Edelman, E. et al. Analysis of sample set enrichment scores: assaying the
enrichment of sets of genes for individual samples in genome-wide expression profiles.
Bioinformatics 22, e108-16 (2006).