SUPPLEMENTARY METHODS
Fish husbandry
Zebrafish were spawned and reared in a temperature-controlled room at 27 ± 2 °C with a 14-hour
light/10-hour dark cycle. Conditioned water (CW) for fish rearing and maintenance was
produced by passing well water through an ultraviolet sterilization unit, a degassing column, and
sand and activated carbon filters, and was then buffered to pH 7.2-7.4. Water flow to aerated tanks
was controlled by a timer that activated the flow several times per day. Larvae were initially fed,
3-5 times daily, equal parts of Microfeast (Salt Creek, Inc., Salt Lake City, UT), a powdered
complete diet, and Encapsulon (Argent Laboratories, Redmond, WA), a microencapsulated larval
fish diet. At two
weeks of age, Microfeast was discontinued and brine shrimp nauplii (Argentemia, Argent
Laboratories) were added to the diet. At six weeks of age, Encapsulon was discontinued, and fish
were fed Oregon Test Diet twice daily ad libitum and brine shrimp once daily.
Carcinogen exposures
This carcinogenesis work was carried out at Oregon State University to identify lines of
zebrafish highly sensitive to carcinogen-induced neoplasia, and different genetic lines,
carcinogens and dosages were used (Supplementary Table 1 online). Static fry immersion
exposures with 7,12-dimethylbenz(a)anthracene (DMBA) and dibenzo(a,l)pyrene (DBP) were
conducted separately in glass beakers for 24 hours in the dark. Aldrich Chemical Co.
(Milwaukee, WI) supplied both DMBA and DBP. The fry were exposed to carcinogen or DMSO
at 3 weeks post-fertilization. Typically we used treatment groups of 100-150 fry. When
exposures were completed, fish were rinsed in 3 changes of water, and placed into polypropylene
tubs for rearing until 6 weeks of age when they were placed into fish tanks. Fish treated with
carcinogens were typically sampled for histology 6-12 months following the onset of carcinogen
exposure. Tumors sampled for the microarray study were all over 3 mm in diameter so that gross
dissection would leave sufficient tissue for histological diagnosis. The normal male livers were
also from a variety of wild-type and mutant lines sampled at 7 months to 14 months old for the
microarray study.
Histology procedures and analysis
Fish were anesthetized in tricaine methane sulfonate (MS 222; Argent Laboratories) and the belly
was slit from heart to anus. A syringe was used to flush buffered zinc formalin fixative over the gills
and throughout the abdomen. Fish were fixed in buffered zinc formalin for 24 hr, decalcified for
48 hr in Cal X II (formic acid/formalin; Fisher Scientific), then dehydrated in a graded series of
ethanol solutions, and embedded in paraffin. Sagittal step sections were cut from the left side.
Nine 4-6 µm step sections were saved between the middle of the lens of the left eye and the
middle of the lens of the right eye. Three sections were placed onto each of three slides, and
stained routinely with hematoxylin and eosin.
Hepatocellular and cholangiocellular neoplasms of zebrafish are histologically quite similar to
the liver neoplasms of humans. We used the criteria developed for classifying rodent liver
neoplasms and foci of hepatocellular alteration to categorize most of our liver neoplasms and
altered foci in zebrafish (Goodman et al., 1994; In: Guides for Toxicologic Pathology.
STP/ARP/AFIP. Washington, D.C. 24 p.). However, the criteria for grading of hepatocellular
carcinomas in humans do not precisely fit the zebrafish. The nuclear-to-cytoplasm ratio, which
is used in the human hepatocellular carcinoma grading system, is not appropriate for
zebrafish because the cytoplasmic volume of normal zebrafish liver varies considerably with
factors including sex, diet and toxicant exposure. Nevertheless, if only nuclear factors are
considered, the grading system can be applied to the zebrafish: the most
anaplastic or embryonal tumors of zebrafish have the greatest nuclear irregularity and
hyperchromatism, and the most prominent nucleoli. Based on this criterion, we observed that the
zebrafish tumors ZFL T1+ and T10+ were highly anaplastic (i.e. having a high component of anaplastic
embryonal cells reminiscent of high-grade tumors) while ZFL T9+ was the least anaplastic (i.e.
consisting primarily of hepatocellular adenoma and well-differentiated hepatocellular carcinoma
reminiscent of lower-grade tumors); thus they were at either end of the spectrum among the ten
tumors. The remaining seven tumors fell between these extremes, with ZFL T2+ and T8+ (less
anaplastic and better differentiated) closer to ZFL T9+. The zebrafish genetic background,
carcinogen treatment, tumor size and tumor histological description of the liver tumor samples used
in this study are summarized in the table below.
Zebrafish genetic background, carcinogen treatment, tumor size and tumor histological
description of liver tumor samples used in this study.

| Tumor Sample | Genetic Background | Carcinogen Treatment | Tumor Size and Description |
|---|---|---|---|
| ZFL T1+ | AB (Wild-Type) | 2.5 ppm DMBA | 6 x 4 x 4 mm liver tumor. Anaplastic cholangiocellular carcinoma, with high component of hepatoblastoma, and much necrosis. |
| ZFL T2+ | Cologne (Wild-Type) | 2.5 ppm DBP | 4 mm soft, tan liver tumor. Mixed carcinoma with medium differentiation level. |
| ZFL T3+ | TL (Uma) | 1.25 ppm DBP | 5 x 2 x 2 mm tan mass in liver. Anaplastic hepatocellular carcinoma (bulk of tumor), also anaplastic mixed carcinoma arising in wall of gall bladder. |
| ZFL T4+ | TL (Uma) | 0.6 ppm DMBA | 5 x 2 x 2 mm tan mass in liver. Anaplastic hepatocellular carcinoma, hepatocellular adenoma and hepatoblastoma evident histologically. |
| ZFL T5+ | TL (Uma) | 1.25 ppm DMBA | 5 mm tan liver tumor. Anaplastic mixed carcinoma with hepatoblastoma component. Hepatocellular adenoma also present on histology. |
| ZFL T6+ | TL (Uma) | 0.6 ppm DBP | 7 x 6 x 4 mm multilobulated tan mass in liver. Collision tumor. Cholangiocellular carcinoma with intestinal differentiation. Anaplastic mixed carcinoma with hepatoblastoma component. |
| ZFL T7+ | TU (Wild-Type) | 1.25 ppm DBP | 7 x 5 x 4 mm tan mass in liver. Anaplastic mixed carcinoma. Hepatocellular carcinoma component has spindloid sarcomatous pattern. |
| ZFL T8+ | TU (Wild-Type) | 5 ppm DBP | 5 x 4 x 4 mm multilobulated soft tan mass in liver. Mixed carcinoma with relatively well differentiated hepatocellular component. |
| ZFL T9+ | TU x AB (Alf) | 2.5 ppm DMBA | Liver 10X normal size with 1-2 mm white foci present. Both hepatocellular adenoma and well differentiated hepatocellular carcinoma present, colliding, in liver. |
| ZFL T10+ | TU x AB (Wild-Type) | 5 ppm DMBA | 7 x 6 x 4 mm soft tan to white mass with 5 mm cyst filled with clear fluid in liver. Myelocytic sarcoma present histologically in liver. |
Liver tissue sampling and Total RNA extraction
Instruments were cleaned with RNaseZAP (Ambion, Austin, Texas). Half of large grossly visible
liver tumors (>3 mm diameter) or normal liver was removed and immediately placed into
RNAlater (Ambion, Austin, Texas). These samples were shipped in coolers containing 4 °C
cold packs. Total RNA was extracted from tissue samples using Trizol reagent (Invitrogen)
according to the manufacturer’s instructions. Reference RNA was obtained by pooling equal
amounts of male and female total RNA extracted from normal-looking liver tissues of wild-type
zebrafish. This pooled RNA was used as the common reference for hybridization with normal
and tumor samples so that normal and tumor samples are comparable in our two-color
microarray system. The integrity of RNA samples was verified by gel electrophoresis and the
concentrations were determined by UV spectrophotometry. RNA samples were stored at -80 °C
until used.
Zebrafish oligonucleotide microarray construction and hybridization
Zebrafish oligonucleotide probes for this array were designed by Compugen (USA) and
synthesized by Sigma Genesis (USA). For each gene feature in the array, one 65-mer
oligonucleotide probe was designed from the 3’ region sequences. Each probe was selected from
a sequence segment that is common to a maximum number of splice variants predicted for each
gene. The arrays contained 16,416 oligonucleotide probes representing ~15,800 unique genes
(more information can be obtained from http://www.labonweb.com/chips/libraries.html), which
is about 1/3 of the zebrafish genome. The array also contains 172 spots representing β-actin
probes as controls. Oligonucleotide probes were resuspended in 3X SSC at 20 µM concentration
and spotted onto poly-L-lysine-coated microscope slides using a custom-built DNA microarrayer
(DeRisi, communication) at the Genome Institute of Singapore (GIS).
For fluorescence labeling of cDNAs, 20 µg of total RNA from the reference and experimental
samples were reverse transcribed in the presence of Cy3-dUTP and Cy5-dUTP (Amersham Inc.),
respectively. Labeled cDNAs were pooled, concentrated, and resuspended in DIG EasyHyb
(Roche Applied Science) buffer for hybridization at 42 °C for 16 hours in the MAUI® system
(BioMicro, USA). After hybridization, the slides were washed in a series of washing solutions
(2X SSC with 0.1% SDS; 1X SSC with 0.1% SDS; 0.2X SSC; and 0.05X SSC; 30 seconds each),
dried using low-speed centrifugation and scanned for fluorescence detection.
Acquisition and Statistical Filtering for Zebrafish Liver Tumor Data
The arrays were scanned using the GenePix 4000B microarray scanner (Axon Instruments, USA)
and the generated images with their fluorescence signal intensities were analyzed using GenePix
Pro 4.0 image analysis software (Axon Instruments, USA). All the data were uploaded into the
GIS Microarray Database where normalization (median centered normalization), statistical
filtering and analyses were carried out. Only gene features that were not flagged and that had a
signal-to-background ratio above 1.5 were extracted for analysis. The raw microarray data
have been submitted to the National Center for Biotechnology Information (NCBI) Gene
Expression Omnibus database (GEO accession number GSE3519) and are compliant with the
MIAME standard.
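The filtering and median-centered normalization described above can be illustrated with a minimal
Python sketch. It is not the GIS Microarray Database implementation: the field names (flagged,
signal, background, log2_ratio) are hypothetical, and applying the filter before the centering is
an assumption, since the order of the two steps is not stated here.

```python
from statistics import median


def filter_and_center(features, min_ratio=1.5):
    """Keep unflagged features whose signal/background exceeds min_ratio,
    then subtract the array median from their log2 ratios."""
    kept = [
        f for f in features
        if not f["flagged"]
        and f["background"] > 0
        and f["signal"] / f["background"] > min_ratio
    ]
    if not kept:
        return []
    # Median centering: after this step the median log2 ratio is zero.
    center = median(f["log2_ratio"] for f in kept)
    return [{**f, "log2_ratio": f["log2_ratio"] - center} for f in kept]
```

Each returned feature is a copy, so the raw values remain available for re-analysis with a
different threshold.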
Statistical comparison of genes between the 10 liver tumor and 10 normal liver (control) samples was
performed using the non-parametric Wilcoxon rank-sum test, and the resulting p-values were adjusted
using the Benjamini and Hochberg false discovery rate (FDR) procedure (Benjamini & Hochberg, 1995).
As a result, 2,315 gene features (~14% of total gene features), representing 1,861 unique
zebrafish UniGene clusters (Build 85), were found to differ significantly (p-value < 0.05 after
FDR adjustment) between tumor and non-tumor liver samples.
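As an illustration of the adjustment step, the sketch below applies the standard Benjamini and
Hochberg step-up procedure to a list of p-values. It assumes the rank-sum p-values have already
been computed elsewhere (for instance with a library routine such as scipy.stats.ranksums); the
function name is ours and is not taken from the original analysis pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene feature would then be called significant when its adjusted value falls below the chosen
cutoff (0.05 for the zebrafish data).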
Human Homology Mapping for Zebrafish Liver Tumor Data
National Center for Biotechnology Information (NCBI, USA) HomoloGene and UniGene
databases were used for human homology mapping of the zebrafish genes. Information on how the
HomoloGene and UniGene databases were built can be obtained from the following links,
http://www.ncbi.nlm.nih.gov/HomoloGene/HTML/homologene_buildproc.html and
http://www.ncbi.nlm.nih.gov/UniGene/FAQ.html, respectively. The latest UniGene and
HomoloGene Build files were downloaded from the following NCBI FTP sites
ftp://ftp.ncbi.nih.gov/repository/UniGene/Danio_rerio/ and
ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/, respectively. HomoloGene allows for detection of
putative homologs among the annotated genes of several eukaryotic genomes and has links to
UniGene clusters established by tblastn search of the UniGene database. A PERL script was
written to map all zebrafish UniGene clusters to the human UniGene clusters that are identified as
their homologs by the HomoloGene database. Another PERL script was written to
automate the mapping of the GenBank identifiers (accession numbers) of the
zebrafish probes on the array to their respective UniGene clusters, which are then mapped to the
human UniGene cluster(s) identified as homolog(s) by the HomoloGene database. This
automated procedure is part of the Genome Institute of Singapore Zebrafish Microarray
Annotation Database ( http://giscompute.gis.a-star.edu.sg/~govind/zebrafish/version2/ ; see ref. 5)
and is updated periodically from several resource databases. In this study, using the NCBI
HomoloGene (Build 43.1) and UniGene (Build 85 for zebrafish and Build 186 for human)
databases, we were able to map 1,334 unique zebrafish UniGene clusters (representing 1,404 gene
features) to 1,942 unique human UniGene clusters (see Supplementary Data 1 online for details).
Some zebrafish UniGene clusters were mapped to more than one human UniGene cluster (usually
from the same family of proteins). We designated this Zebrafish Liver Tumor Differentially
Expressed Gene Set as ZLTDEGS and used it for subsequent comparative analysis with the
human cancer microarray data. Functional characterization of genes was based on Gene Ontology
annotations, which can be obtained from Stanford’s SOURCE database.
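The chained lookup described above (probe accession number to zebrafish UniGene cluster, to
HomoloGene group, to human UniGene cluster) can be sketched as follows. This is a Python
illustration rather than the PERL scripts used in the study; the three dictionaries stand in for
tables parsed from the UniGene and HomoloGene build files, whose formats are not reproduced here,
and all identifiers in the usage example are made up.

```python
def map_probes_to_human(probe_to_zf, zf_to_groups, group_to_human):
    """Map each probe accession to (zebrafish cluster, sorted human clusters).

    probe_to_zf:    accession -> zebrafish UniGene cluster
    zf_to_groups:   zebrafish cluster -> iterable of HomoloGene group ids
    group_to_human: HomoloGene group id -> iterable of human UniGene clusters
    Probes without any human homolog are left out of the result.
    """
    mapped = {}
    for accession, zf_cluster in probe_to_zf.items():
        human = set()
        for group in zf_to_groups.get(zf_cluster, ()):
            human.update(group_to_human.get(group, ()))
        if human:
            # One zebrafish cluster may map to several human clusters.
            mapped[accession] = (zf_cluster, sorted(human))
    return mapped


# Hypothetical identifiers, for illustration only.
example = map_probes_to_human(
    {"AF000001": "Dr.1", "AF000002": "Dr.2"},
    {"Dr.1": ["H1"]},
    {"H1": ["Hs.10", "Hs.11"]},
)
```

Keeping the result keyed by probe makes it straightforward to count both unique clusters and gene
features after mapping.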
Source of Human Microarray Datasets
With the exception of Neo et al., 2004 (ref. 7), Nam et al. (unpublished) and Miller et al.
(unpublished), all human cancer microarray datasets used in this study can be downloaded from
the publicly accessible databases listed in the online version of the respective paper at the
publisher’s website. The human liver cancer dataset (Neo et al., 2004) was obtained directly from the
first author. The human gastric (Nam et al., unpublished) and liver cancer progression (Miller et
al., unpublished) datasets were used with consent, as they are part of another collaborative cancer
study between GIS, the Catholic University of Korea and Sungkyunkwan University School of
Medicine, Korea. The human liver samples for the liver cancer progression dataset and the gastric
samples were obtained with consent from patients who underwent surgical treatment at the
Sungkyunkwan University School of Medicine and Yonsei University, Korea, respectively. The
liver samples were histologically graded by two pathologists (Jung Young Lee from the Catholic
University of Korea and Cheol Guen Park from Sungkyunkwan University School of Medicine)
using the Edmondson and Steiner method and according to the guidelines of the International
Working Party. Microarray hybridization for both the liver and gastric samples was performed
on oligoarrays manufactured at GIS using the Compugen/Sigma Oligolibrary (60-mers)
representing ~17,260 unique genes, followed by data acquisition and normalization as described
above. A one-way ANOVA or a one-versus-all (OVA) unpooled t-test was applied to the
human liver cancer progression data, and 3,084 unique genes associated with tumor grade
(p-value < 0.001) were used for analysis in this study. For all other human cancer datasets
involving tumor versus normal analysis, the Wilcoxon rank-sum test was used to determine the
p-value, which was subsequently adjusted using the Benjamini and Hochberg false discovery rate
procedure [Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful
approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289-300.] to
indicate the statistical significance of the differential expression of each gene between tumor and
normal samples. The difference between the mean log2 (signal/reference) ratios of tumor and normal
samples was calculated to indicate the relative fold change and direction of expression of a gene.
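The fold-change calculation in the last sentence amounts to the following arithmetic, shown here
as a short Python sketch with hypothetical function and variable names. Because the inputs are
log2 ratios, raising 2 to the difference gives the ratio of the geometric mean expression in tumor
versus normal samples.

```python
from statistics import mean


def expression_change(tumor_log2, normal_log2):
    """Return (difference of mean log2 ratios, fold change, direction)."""
    diff = mean(tumor_log2) - mean(normal_log2)
    if diff > 0:
        direction = "up"
    elif diff < 0:
        direction = "down"
    else:
        direction = "unchanged"
    # 2 ** diff converts the log2 difference back to a linear fold change.
    return diff, 2.0 ** diff, direction
```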
Statistical Tests used for Comparative Analysis of ZLTDEGS with Human Tumor
Microarray Datasets
Eight human cancer microarray datasets representing four tumor types were used in this analysis:
liver (refs. 6, 7), gastric (ref. 8; Nam et al., unpublished), prostate (refs. 9, 10) and lung
(refs. 11, 12). Genes that were not present or detectable in at
least three different tumor type datasets were excluded from the analysis. To qualify as
differentially expressed for a given human tumor type, a gene had to be significant
(FDR < 1%) in both human datasets for that tumor type. The key question asked in
this analysis is whether the human homologs of ZLTDEGS are overrepresented among, and
correlated with the ranking of, the human tumor differentially expressed genes or enriched genes (see
below for the gene set enrichment strategy). To assess the statistical significance of the
overrepresentation, we modeled the problem as a Bernoulli trial experiment in which the
cardinality of the total set of ZLTDEGS human homologs was the number of trials (n) and the number
of successes (s) was the number of tumor genes that were “successfully” identified among the
ZLTDEGS human homologs (i.e. the overlapping genes between ZLTDEGS and the filtered human
datasets). The (random) probability of a success (p) was therefore the fraction of human tumor
genes among the total valid human genes being considered. In other words, we can view the
ZLTDEGS as a selection process of human genes, and ask whether the selection process is
indeed associated with human tumor genes rather than arising by random chance alone. Under this
model, the significance of the overrepresentation of human tumor genes can be assessed by
calculating the probability that a randomly selected human gene set of size n contains at
least s human tumor genes, i.e. the p-value of observing at least s human tumor genes among n random
human genes. This statistic follows the binomial distribution, and the p-value can be calculated
using the formula:
\[
\Pr(X \ge s;\, n, p) = \sum_{y=s}^{n} \binom{n}{y}\, p^{y} (1 - p)^{n - y}
\]
The smaller the P-value, the more unlikely that the observed degree of overlap between the
human tumor gene sets and ZLTDEGS human homologs would arise by chance and therefore
suggests a stronger association or a greater commonality between the intersecting zebrafish and
human tumor gene sets.
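The binomial tail probability above can be evaluated numerically as in the following Python
sketch. The function names are ours. The terms are computed in log space (an implementation
choice, not something stated in the methods) so that the binomial coefficients do not overflow
when n is in the thousands.

```python
from math import exp, lgamma, log


def log_binomial_pmf(y, n, p):
    """Natural log of C(n, y) * p**y * (1 - p)**(n - y), for 0 < p < 1."""
    log_coeff = lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
    return log_coeff + y * log(p) + (n - y) * log(1.0 - p)


def binomial_upper_tail(s, n, p):
    """Pr(X >= s) for X ~ Binomial(n, p)."""
    if s <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if s <= n else 0.0
    total = sum(exp(log_binomial_pmf(y, n, p)) for y in range(s, n + 1))
    return min(1.0, total)
```

Here n would be the number of ZLTDEGS human homologs considered, s the number of those that are
also human tumor genes, and p the fraction of tumor genes among all valid human genes.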
The question of whether the ZLTDEGS human homolog gene set was correlated with the rank
order stemming from the human tumor analysis was assessed by employing the GSEA
methodology (see ref. 4 and http://www.broad.mit.edu/gsea/ ). The genes were ranked based on
the Geo-Mean FDR value of the gene in both the human datasets for a particular tumor type.
Therefore the upper-ranked genes are relatively more significant, hence more consistently
associated with the tumor type compared to the lower-ranked genes in a rank list of genes in a
human dataset. The GSEA framework provides an Enrichment Score (ES) which indicates the
association of a gene set with a ranked list of genes, with higher ES denoting that the gene set is
concentrated among the top ranked genes of the list. A Normalized Enrichment Score (NES) is
used when multiple datasets are compared, as in this study. The nominal p-value is calculated by a
series of Monte Carlo simulations, permuting the ranked list and computing the ES for each
permuted list. A total of 1 million iterations were performed, and the fraction of randomly
generated ranked lists that produced an ES greater than or equal to the observed ES was reported
as the p-value. This test measures the association of a ranked gene list with a given set of genes and
complements the binomial test described earlier, which evaluates the degree of
overrepresentation of one gene set in another.
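The ranking and permutation steps can be illustrated with the simplified Python sketch below. It
is not the GSEA software: it uses an unweighted running sum (equal steps up for genes in the set,
equal steps down for the others) as a stand-in for the GSEA enrichment statistic, it omits the
NES normalization, and all names and the default iteration count are hypothetical. FDR values are
assumed to be strictly positive so that the geometric mean is defined.

```python
import random
from math import exp, log


def geo_mean(values):
    """Geometric mean of strictly positive values."""
    return exp(sum(log(v) for v in values) / len(values))


def rank_by_geo_mean_fdr(fdr_by_gene):
    """Order genes from most to least significant by geometric-mean FDR."""
    return sorted(fdr_by_gene, key=lambda g: geo_mean(fdr_by_gene[g]))


def enrichment_score(ranked, gene_set):
    """Maximum of an unweighted running sum along the ranked list."""
    hits = [g in gene_set for g in ranked]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        best = max(best, running)
    return best


def permutation_pvalue(ranked, gene_set, iterations=1000, seed=0):
    """Fraction of shuffled lists whose ES is >= the observed ES."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked, gene_set)
    shuffled = list(ranked)
    count = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if enrichment_score(shuffled, gene_set) >= observed:
            count += 1
    return observed, count / iterations
```

With a fixed seed the result is reproducible; the study itself reports 1 million iterations, which
this sketch would perform if iterations were set accordingly.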
Gene Set Enrichment Strategy
As some genes are not present or detectable across all datasets and tumor types, we devised the
following strategy for enriching a set of genes for a particular tumor type:
1. The gene has to be significant (FDR<1%) in both the human datasets for the tumor type
intended for enrichment. This criterion will ensure that the genes are significantly
differentially expressed in the tumor type intended for gene enrichment.
2. The gene’s geometric mean of the FDR values in other tumor types (not intended for
enrichment) has to be more than 1% (Geo-Mean FDR >1%) and the gene has to be not
significant (FDR>1%) in at least two other tumor types not intended for enrichment. This
criterion will increase the likelihood of the gene being not significant in the tumor types
not intended for enrichment even though the gene may not be present or detectable in all
datasets.
Using this strategy, the set of genes enriched for a tumor type will be significantly differentially
expressed for the tumor type intended for enrichment, and is likely less significant in other tumor
types not intended for enrichment (although some individual genes in the gene set may still be
significantly differentially expressed in one of the other tumor types not intended for
enrichment). The entire gene set, taken together, represents an expression signature that is more
consistently associated with a particular tumor type. Each set of genes enriched for a tumor type
was assessed for intersection with ZLTDEGS as described above.
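The two enrichment criteria can be expressed as a small Python predicate, shown below as a sketch
under stated assumptions. Each gene is represented by a hypothetical dictionary mapping a tumor
type to the list of FDR values from the datasets in which the gene was detected. Two readings are
our assumptions rather than details given in the text: the geometric mean is taken over all FDR
values pooled across the other tumor types, and a gene counts as not significant in another tumor
type when at least one of its FDR values there exceeds the threshold.

```python
from math import exp, log


def geo_mean(values):
    """Geometric mean of strictly positive values."""
    return exp(sum(log(v) for v in values) / len(values))


def is_enriched(fdr_by_type, target, threshold=0.01):
    """Apply criteria 1 and 2 to one gene for the given target tumor type."""
    # Criterion 1: significant in both datasets of the target tumor type.
    target_fdrs = fdr_by_type.get(target, [])
    if len(target_fdrs) < 2 or any(v >= threshold for v in target_fdrs):
        return False
    # Other tumor types in which the gene was detected at least once.
    others = {t: v for t, v in fdr_by_type.items() if t != target and v}
    pooled = [v for values in others.values() for v in values]
    # Criterion 2a: geometric-mean FDR in the other tumor types above threshold.
    if not pooled or geo_mean(pooled) <= threshold:
        return False
    # Criterion 2b: not significant in at least two other tumor types.
    not_significant = sum(
        1 for values in others.values() if any(v > threshold for v in values)
    )
    return not_significant >= 2
```

Genes that pass this predicate for a tumor type would form the enriched set that is then
intersected with ZLTDEGS.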