Statistical analysis

Supplemental Methods
Microarray procedure. Total RNA extraction from microdissected or sorted cells was
performed with the Purescript RNA Isolation Kit (Gentra, Minneapolis, USA) with addition
of 80 g glycogen (Roche, Mannheim, Germany) as a carrier. All reagents were reduced to
one-tenth of the standard protocol. RNA was stored at -80°C. The RiboAmpTM RNA
Amplification Kit, version C (Arcturus, Mountain View, CA, USA) was used for the first
round of in vitro transcription (IVT). 1.5 g T4 gene 32 protein (Ambion, Austin, USA) per
sample were added to the first cDNA synthesis. For the second round of IVT and concomitant
labelling, the BioArray High Yield Transcription Labeling Kit (ENZO Life Science,
Farmingdale, NY, USA) was used with the following modifications: the incubation time was
prolonged to 8 hours and the ENZO T7 RNA polymerase was replaced with 300 U
Stratagene T7 RNA polymerase (Stratagene, La Jolla, CA, USA). cRNA yield and length
distribution were analysed on a 2100 Bioanalyzer (Agilent). Fragmentation of cRNA,
microarray hybridisation to HGU133 Plus 2.0 arrays, washing steps and scanning of the
microarrays were performed according to the Affymetrix protocol (Affymetrix). This method
has been shown to yield reliable results and is restricted only by a loss of information for
some probe sets due to cDNA shortening during the two-round IVT process.1, 2 The gene
expression profiles of 12 typical cHL and of normal B cells were available through an earlier
study.1
Quantitative real-time RT-PCR of lymphoma cases and activated T cells. Total RNA was
isolated from microdissected lymphoma cells (1500 cells per case) and FACS-sorted CD4+
and CD8+ activated tonsillar T cells (2000 cells each) according to the protocol of the
PureScript RNA Purification Kit (Gentra), with the following modifications: cells were lysed
in 150 µl cell lysis solution, 37.5 µl protein-DNA precipitation solution was added, RNA was
precipitated in 120 µl isopropanol with 4 µl glycogen (20 mg/ml, Roche), and pellets were
washed in 200 µl 70% ethanol. cDNA
synthesis (SensiScript Kit, Qiagen, Hilden, Germany) was done with random hexamer
primers (ABI, Darmstadt, Germany). The real-time PCR was performed on a 7900HT
TaqMan (ABI). All PCRs (10 µl: 1x TaqMan universal master mix (ABI), 1x TaqMan®
Gene Expression Assays of the corresponding transcripts (ABI, #331182: GAPDH, FN1,
NNMT, IGFBP7) and diluted cDNA) were measured in duplicate under standard conditions:
2 min 50°C, 10 min 95°C, followed by 45 cycles of 15 sec 95°C and 1 min 60°C. Fold change
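(FC) of expression was calculated as $\mathrm{FC} = 2^{-\Delta\Delta\mathrm{Ct}}$ with
$\Delta\Delta\mathrm{Ct} = (\mathrm{Ct}_{\mathrm{ALCL,GOI}} - \mathrm{Ct}_{\mathrm{ALCL,RG}}) - (\mathrm{Ct}_{\mathrm{Tcell,GOI}} - \mathrm{Ct}_{\mathrm{Tcell,RG}})$,
where GOI = gene of interest and RG = reference gene.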
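The fold-change calculation can be illustrated with a minimal Python sketch. The function name and the Ct values are hypothetical and are not data from the study; the sketch assumes that each Ct value is already the mean of the duplicate measurements.

```python
def fold_change(ct_alcl_goi, ct_alcl_rg, ct_tcell_goi, ct_tcell_rg):
    """Fold change of a gene of interest (GOI) in ALCL relative to T cells.

    Each argument is a Ct value; RG denotes the reference gene.
    """
    delta_ct_alcl = ct_alcl_goi - ct_alcl_rg      # GOI normalised to RG in lymphoma cells
    delta_ct_tcell = ct_tcell_goi - ct_tcell_rg   # GOI normalised to RG in T cells
    delta_delta_ct = delta_ct_alcl - delta_ct_tcell
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values chosen only to illustrate the arithmetic.
fc = fold_change(ct_alcl_goi=24.0, ct_alcl_rg=20.0,
                 ct_tcell_goi=27.0, ct_tcell_rg=20.0)
print(fc)  # ΔΔCt = 4 - 7 = -3, so FC = 2^3 = 8.0
```

A fold change above 1 indicates higher expression of the gene of interest in the lymphoma cells than in the T cells; a value below 1 indicates lower expression.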
Immunohistochemical analysis. The following antibodies were used for
immunohistochemical stainings: anti-EZH2 (Zymed, 187395), anti-ITK (Abcam, ab32113),
anti-LGALS1 (Abcam, ab25138), anti-MEOX1 (ProSci, XAV-8523), anti-SATB1 (BD
Biosciences, 611182), and anti-SNFT (Abnova, H0005559-A01). Antigen retrieval by
pressure cooking was applied to the sections for all stainings. For detection of antibody
binding, the REAL-HRP system (Dako REAL™ Envision™ Detection System,
Peroxidase/DAB+, Rabbit/Mouse, K5007; for EZH2, LGALS1, MEOX1 and SNFT) or the
REAL-AP system (Dako REAL™ Detection System, Alkaline Phosphatase/RED,
Rabbit/Mouse, K5005; for ITK and SATB1) was used. All primary antibodies were
incubated for 30 minutes at room temperature.
Statistical analysis. GCOS 1.4 software (Affymetrix) was used to analyze microarray
hybridization quality with default parameters. Signal intensities of each probe set were scaled
to a target value of 2000 based on 100 control transcripts, as provided by Affymetrix
[http://www.affymetrix.com/Auth/support/downloads/mask_files/hgu133_plus_2norm.zip].
Additionally, the “.DAT” files were visually examined for possible artifacts. The percentages
of present (P) calls and the scaling factors (SF) were largely comparable across the samples
with similar variation among samples of the various groups, although sorted cells often gave
higher frequencies of P calls (on average 22.3% (range 13.6–30.1%) for microdissected cells,
30.4% (range 20.2–38.6%) for live sorted cells; SF average of microdissected cells 3.9 (1.8–
7.8), SF average of live sorted cells 3.7 (1.3–7.2)). This indicates that different isolation
methods (LMPC or FACS) or slight differences in the RNA quality of the microdissected or
live sorted cells influenced the amplification and hybridization procedure only modestly.
Statistical analysis was performed using the computing environment R (R Development Core
Team, 2005). Additional software packages (affy, geneplotter, multtest, vsn) were taken from
the Bioconductor project.3 For microarray pre-processing, the variance stabilization method of
Huber et al. was applied for probe level normalization.4 This method renders the variance of
probe intensities independent of their expected expression levels. Assuming that the
majority of genes are not differentially expressed across the samples, parameters (an offset and a
scaling factor) were estimated for each microarray. Because of the computational
complexity of the algorithm, the parameters were estimated on a random subset of probes and
then used to transform the complete arrays. Probe set summarization was calculated by
applying the robust median polish method to the normalized data. To account for the
different probe affinities via the probe effect, an additive robust model on the logarithmic
scale (base 2) was fitted across the arrays for each probe set.5, 6
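The median polish summarization can be sketched in Python as follows. This is an illustrative re-implementation of Tukey's procedure, not the Bioconductor code used in the study. The function name, the iteration limit and the toy matrix (rows = probes of one probe set, columns = arrays, values on the log2 scale) are assumptions.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Fit table[i][j] = overall + row_eff[i] + col_eff[j] + resid[i][j]
    by alternately sweeping out row and column medians."""
    rows, cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    overall, row_eff, col_eff = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(max_iter):
        # Sweep row medians (probe effects) out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep column medians (array effects) out of the residuals.
        col_med = [median(resid[i][j] for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Toy probe set: 3 probes x 3 arrays, additive except for one outlier (13.0).
probes = [[13.0, 10.0, 9.0],
          [9.0, 11.0, 10.0],
          [7.0, 9.0, 8.0]]
overall, row_eff, col_eff, resid = median_polish(probes)
summary = [overall + c for c in col_eff]  # one expression value per array
print(summary)  # [8.0, 10.0, 9.0]
```

In this toy example the outlying probe value ends up in the residuals rather than in the per-array summary, which is the robustness property the method is chosen for.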
Unsupervised hierarchical clustering was performed for the 100 probe sets with the
highest standard deviation across all samples using the Manhattan distance and the average
linkage method. The stability of the resulting dendrogram was tested with Pvclust.7
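The clustering step can be sketched as follows: selecting the most variable probe sets, computing Manhattan distances between samples, and merging clusters by average linkage. This is a naive pure-Python illustration, not the software used in the study, and it does not cover the Pvclust stability test. All function names and the toy data are hypothetical.

```python
from statistics import stdev


def top_variable_rows(matrix, n):
    """Return the n rows (probe sets) with the highest standard deviation."""
    ranked = sorted(range(len(matrix)), key=lambda i: stdev(matrix[i]), reverse=True)
    return [matrix[i] for i in ranked[:n]]


def manhattan(a, b):
    """Manhattan (city-block) distance between two sample profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(samples):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(manhattan(samples[i], samples[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Toy data: rows = probe sets, columns = 4 samples (log2 scale).
expression = [[8.0, 8.5, 12.0, 12.0],
              [5.0, 5.0, 5.5, 5.0],    # nearly flat, dropped by the SD filter
              [10.0, 10.0, 6.0, 7.0]]
selected = top_variable_rows(expression, n=2)
samples = [list(col) for col in zip(*selected)]  # transpose to sample profiles
merges = average_linkage(samples)
for cluster_a, cluster_b, distance in merges:
    print(cluster_a, cluster_b, distance)
```

For the analysis described above, `n` would be 100 and the matrix would hold all samples; the quadratic search is only suitable for small illustrations.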
Heat maps were generated with the Spotfire software (Spotfire DecisionSite 9.1,
1996-2007). A heat map is a colour-coded visualization of the normalized signal values.
Differential gene expression. To reduce the dimension of the microarray data before
conducting pairwise comparisons, an intensity filter (the signal intensity of a probe had to be
≥100 in at least 25% of the samples, if the group sizes were equal) and a variance filter (the
interquartile range of log2 intensities had to be ≥0.5, if the group sizes were equal) were
applied. In the case of unequal group sizes, the signal intensity of a probe set had to be above
100 in at least a fraction of the samples equal to the smaller group size minus one, divided by
the total sample size of the two groups. The interquartile range of log2 intensities had to be
at least 0.1 for unequal group sizes. After the global filtering, a two-sample t-test was applied to identify
genes that were differentially expressed between two groups. The false discovery rate (FDR)
according to Benjamini and Hochberg was used to account for the multiple testing.7 Fold
changes (FC) between the two groups of each supervised analysis were calculated for each
gene.
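The filtering rules and the Benjamini-Hochberg adjustment can be sketched in Python as follows. The function names and the example values are hypothetical. The sketch uses a threshold of ≥100 for both equal and unequal group sizes, which is a simplification, and computes quartiles with the "inclusive" method as an assumption about the quantile definition. The t-test itself is omitted, so the p-values are taken as given.

```python
from math import log2
from statistics import quantiles


def passes_filters(signals, n_a, n_b):
    """Apply the intensity and variance filters to one probe set.

    signals: signal intensities across all samples of both groups.
    n_a, n_b: sizes of the two groups being compared.
    """
    equal = n_a == n_b
    min_fraction = 0.25 if equal else (min(n_a, n_b) - 1) / (n_a + n_b)
    min_iqr = 0.5 if equal else 0.1
    expressed = sum(s >= 100 for s in signals) / len(signals)
    q1, _, q3 = quantiles([log2(s) for s in signals], n=4, method="inclusive")
    return expressed >= min_fraction and (q3 - q1) >= min_iqr


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe set measured in two groups of two samples each.
print(passes_filters([50, 60, 400, 800], n_a=2, n_b=2))
# Hypothetical t-test p-values for four probe sets.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(fdr)
```

Probe sets would then be reported as differentially expressed if their adjusted p-value falls below the chosen FDR cutoff.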
Principal Component Analysis (PCA) was performed using pre-selected gene sets
and pre-decided groups of samples. The first principal component was used as an expression
signature for the given gene set, which was then applied to the samples of all groups. A
Wilcoxon test was performed to detect significant differences between the groups after PCA.
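The idea of using the first principal component as a per-sample signature score can be sketched in Python as follows. Power iteration is used here only as a simple stand-in for a full PCA routine. The sketch fits the component on all supplied samples, which is an assumption, and computes only the rank-sum U statistic rather than a p-value. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def first_pc_scores(matrix, n_iter=200):
    """Score each sample on the first principal component of a gene set.

    matrix: rows = genes of the pre-selected set, columns = samples.
    The sign of the returned scores is arbitrary, as in any PCA.
    """
    centered = []
    for row in matrix:
        m = mean(row)
        centered.append([v - m for v in row])  # centre each gene
    n_genes, n_samples = len(centered), len(centered[0])

    def project(weights):
        return [sum(centered[g][s] * weights[g] for g in range(n_genes))
                for s in range(n_samples)]

    weights = [1.0] * n_genes  # starting vector for the power iteration
    for _ in range(n_iter):
        scores = project(weights)
        weights = [sum(centered[g][s] * scores[s] for s in range(n_samples))
                   for g in range(n_genes)]
        norm = sqrt(sum(w * w for w in weights))
        weights = [w / norm for w in weights]
    return project(weights)


def rank_sum_u(group_a, group_b):
    """Mann-Whitney U: number of pairs with a < b, counting ties as 0.5."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0
               for a in group_a for b in group_b)


# Hypothetical gene set: 3 genes x 6 samples (samples 0-2 vs samples 3-5).
gene_set = [[1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
            [2.0, 2.1, 1.9, 4.2, 4.0, 4.1],
            [5.0, 5.1, 4.9, 5.0, 5.2, 4.8]]
scores = first_pc_scores(gene_set)
group_a, group_b = scores[:3], scores[3:]
print(scores)
print(rank_sum_u(group_a, group_b))
```

A U statistic at either extreme (0, or the product of the two group sizes) indicates that the signature scores separate the groups completely.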
Generation of a list of T cell-characteristic genes. A comprehensive list of T cell-characteristic
genes was generated by performing supervised comparisons of CD4+ and CD8+ T
cells, respectively, with B cells (naive, N; memory, M; centrocytes, CC; and centroblasts, CB; data
generated on the same platform 1) and selecting for transcripts up-regulated in
CD4+ and/or CD8+ T cells (280 probe sets, FDR ≤ 0.05, FC ≥ 2). This list was complemented
with an independent list of genes selectively expressed in T cells, generated by comparing
gene expression of numerous distinct leukocyte subpopulations (412 probe sets).8 After
filtering the latter list for informative probe sets (expression above background in at least two
of all samples of ALCL and normal cells) this list comprised 258 individual probe sets. The
combined final list comprised 530 individual probe sets corresponding to 459 genes.
References
1. Brune V, Tiacci E, Pfeil I, Döring C, Eckerle S, van Noesel C, et al. Origin and
pathogenesis of nodular lymphocyte-predominant Hodgkin lymphoma as revealed by
global gene expression analysis. J Exp Med 2008; 205: 2251-68.
2. Kenzelmann M, Klaren R, Hergenhahn M, Bonrouhi M, Grone HJ, Schmid W, et al.
High-accuracy amplification of nanogram total RNA amounts for gene profiling.
Genomics 2004; 83: 550-8.
3. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al.
Bioconductor: open software development for computational biology and
bioinformatics. Genome Biol 2004; 5: R80.
4. Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M. Variance
stabilization applied to microarray data calibration and to the quantification of
differential expression. Bioinformatics 2002; 18 Suppl 1: S96-104.
5. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of
Affymetrix GeneChip probe level data. Nucleic Acids Res 2003; 31: e15.
6. Tukey JW. Exploratory data analysis. Reading, MA, USA: Addison-Wesley
Publishing Co. Inc., 1977.
7. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and
powerful approach to multiple testing. J R Statist Soc B 1995; 57: 289-300.
8. Chtanova T, Newton R, Liu SM, Weininger L, Young TR, Silva DG, et al.
Identification of T cell-restricted genes, and signatures for different T cell responses,
using a comprehensive collection of microarray datasets. J Immunol 2005; 175: 7837-47.