Supplemental Methods

Microarray procedure. Total RNA extraction from microdissected or sorted cells was performed with the Purescript RNA Isolation Kit (Gentra, Minneapolis, USA) with the addition of 80 µg glycogen (Roche, Mannheim, Germany) as a carrier. All reagents were reduced to one-tenth of the amounts in the standard protocol. RNA was stored at -80°C. The RiboAmp™ RNA Amplification Kit, version C (Arcturus, Mountain View, CA, USA) was used for the first round of in vitro transcription (IVT). 1.5 µg T4 gene 32 protein (Ambion, Austin, USA) per sample was added to the first cDNA synthesis. For the second round of IVT and concomitant labelling, the BioArray High Yield Transcription Labeling Kit (ENZO Life Science, Farmingdale, NY, USA) was used with the following modifications: the incubation time was prolonged to 8 hours and the ENZO T7 RNA polymerase was substituted by 300 U Stratagene T7 RNA polymerase (Stratagene, La Jolla, CA, USA). cRNA yield and length distribution were analysed on a 2100 Bioanalyzer (Agilent). Fragmentation of cRNA, microarray hybridisation to HGU133 Plus 2.0 arrays, washing steps and scanning of the microarrays were performed according to the Affymetrix protocol (Affymetrix). This method has been shown to yield reliable results and is restricted only by a loss of information for some probe sets due to cDNA shortening during the two-round IVT process.1, 2 The gene expression profiles of 12 typical cHL and of normal B cells were available through an earlier study.1

Quantitative real-time RT-PCR of lymphoma cases and activated T cells.
Total RNA was isolated from microdissected lymphoma cells (1500 cells per case) and FACS-sorted CD4+ and CD8+ activated tonsillar T cells (2000 cells each) according to the protocol of the PureScript RNA Purification Kit (Gentra), with the following exceptions: lysis of cells in 150 µl cell lysis solution, addition of 37.5 µl protein-DNA precipitation solution, precipitation in 120 µl isopropanol with 4 µl glycogen (20 mg/ml, Roche) and washing of pellets in 200 µl 70% ethanol. cDNA synthesis (SensiScript Kit, Qiagen, Hilden, Germany) was done with random hexamer primers (ABI, Darmstadt, Germany). The real-time PCR was performed on a 7900HT TaqMan (ABI). All PCRs (10 µl: 1x TaqMan universal master mix (ABI), 1x TaqMan® Gene Expression Assays of the corresponding transcripts (ABI, #331182: GAPDH, FN1, NNMT, IGFBP7) and diluted cDNA) were measured as duplicates under standard conditions: 2 min 50°C, 10 min 95°C, followed by 45 cycles of 15 sec 95°C and 1 min 60°C. Fold change (FC) of expression was calculated as $FC = 2^{-\Delta\Delta Ct}$, with $\Delta\Delta Ct = (Ct_{\mathrm{ALCL,GOI}} - Ct_{\mathrm{ALCL,RG}}) - (Ct_{\mathrm{Tcell,GOI}} - Ct_{\mathrm{Tcell,RG}})$, where GOI = gene of interest and RG = reference gene.

Immunohistochemical analysis. The following antibodies were used for immunohistochemical stainings: anti-EZH2 (Zymed, 187395), anti-ITK (Abcam, ab32113), anti-LGALS1 (Abcam, ab25138), anti-MEOX1 (ProSci, XAV-8523), anti-SATB1 (BD Biosciences, 611182), and anti-SNFT (Abnova, H0005559-A01). Antigen retrieval by pressure cooking was applied to the sections for all stainings. For detection of antibody binding, the REAL-HRP system (Dako REAL™ Envision™ Detection System, Peroxidase/DAB+, Rabbit/Mouse K5007) was used for EZH2, LGALS1, MEOX1 and SNFT, and the REAL-AP system (Dako REAL™ Detection System, Alkaline Phosphatase/RED, Rabbit/Mouse K5005) for ITK and SATB1. Incubation with the primary antibodies was 30 minutes at room temperature for all antibodies.

Statistical analysis.
GCOS 1.4 software (Affymetrix) was used with default parameters to analyze microarray hybridization quality. Signal intensities of each probe set were scaled to a target value of 2000 based on 100 control transcripts, as provided by Affymetrix [http://www.affymetrix.com/Auth/support/downloads/mask_files/hgu133_plus_2norm.zip]. Additionally, the “.DAT” files were visually examined for possible artifacts. The percentages of present (P) calls and the scaling factors (SF) were largely comparable across the samples, with similar variation among samples of the various groups, although sorted cells often gave higher frequencies of P calls (on average 22.3% (range 13.6–30.1%) for microdissected cells, 30.4% (range 20.2–38.6%) for live sorted cells; SF average of microdissected cells 3.9 (1.8–7.8), SF average of live sorted cells 3.7 (1.3–7.2)). This indicates that different isolation methods (LMPC or FACS) or slight differences in the RNA quality of the microdissected or live sorted cells influenced the amplification and hybridization procedure only modestly. Statistical analysis was performed using the computing environment R (R Development Core Team, 2005). Additional software packages (affy, geneplotter, multtest, vsn) were taken from the Bioconductor project.3 For microarray pre-processing, the variance stabilization method of Huber et al. was applied for probe level normalization.4 This method renders the variance of probe intensities independent of their expected expression levels. Assuming that the majority of genes are not differentially expressed across the samples, parameters (an offset and a scaling factor) were estimated for each microarray. Because of the computational complexity of the algorithm, the parameters are estimated on a random subset of probes and are then used to transform the complete arrays. Probe set summarization was calculated by applying the robust median polish method to the normalized data.
To account for the different probe affinities via the probe effect, an additive robust model on the logarithmic scale (base 2) was fitted across the arrays for each probe set.5, 6 Unsupervised hierarchical clustering was performed for the 100 probe sets with the highest standard deviation across all samples using the Manhattan distance and the average linkage method. The stability of the resulting dendrogram was tested with Pvclust.7 Heat maps were generated with the Spotfire software (Spotfire DecisionSite 9.1, 1996-2007). A heat map is a colour-coded visualization of the normalized signal values.

Differential gene expression. To reduce the dimension of the microarray data before conducting pairwise comparisons, an intensity filter and a variance filter were applied. For equal group sizes, the signal intensity of a probe set had to be ≥100 in at least 25% of the samples, and the interquartile range of log2 intensities had to be ≥0.5. For unequal group sizes, the signal intensity of a probe set had to be above 100 in at least a fraction of the samples equal to the smaller group size minus one, divided by the total sample size of the two groups, and the interquartile range of log2 intensities had to be at least 0.1. After this global filtering, a two-sample t-test was applied to identify genes that were differentially expressed between two groups. The false discovery rate (FDR) according to Benjamini and Hochberg was used to account for multiple testing.7 Fold changes (FC) between the two groups of each supervised analysis were calculated for each gene. Principal Component Analysis (PCA) was performed using pre-selected gene sets and pre-decided groups of samples. The first principal component is used as an expression signature for the given gene set, which is then applied to the samples of all groups. A Wilcoxon test was performed to detect significant differences between the groups after PCA.
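
The filtering rules and the Benjamini-Hochberg adjustment described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which was carried out in R with Bioconductor packages. The function names are hypothetical, the quartile definition (`method="inclusive"`) is an assumption, and the intensity cutoff is treated as ≥100 for both equal and unequal group sizes for simplicity.

```python
import math
import statistics


def min_fraction_expressed(n_a, n_b):
    """Minimum fraction of samples that must reach the intensity cutoff."""
    if n_a == n_b:
        return 0.25
    # Unequal groups: (smaller group size - 1) / total sample size.
    return (min(n_a, n_b) - 1) / (n_a + n_b)


def passes_filters(intensities, n_a, n_b, cutoff=100.0):
    """Apply the intensity and variance filters to one probe set.

    `intensities` holds the positive, original-scale signal values of the
    probe set across all n_a + n_b samples of the two groups.
    """
    expressed = sum(1 for x in intensities if x >= cutoff) / len(intensities)
    if expressed < min_fraction_expressed(n_a, n_b):
        return False
    log2_values = [math.log2(x) for x in intensities]
    # Assumption: quartiles by linear interpolation over the observed range.
    q1, _, q3 = statistics.quantiles(log2_values, n=4, method="inclusive")
    min_iqr = 0.5 if n_a == n_b else 0.1
    return (q3 - q1) >= min_iqr


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In such a sketch, only probe sets for which `passes_filters` returns `True` would be tested with the two-sample t-test, and the resulting p-values would be passed to `benjamini_hochberg`; a probe set would then be called differentially expressed if its adjusted value falls below the chosen FDR threshold.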
Generation of a list of T cell-characteristic genes. A comprehensive list of T cell-characteristic genes was generated by performing supervised analyses of CD4+ and CD8+ T cells, respectively, versus B cells (naive, N; memory, M; centrocytes, CC; and centroblasts, CB; data generated on the same platform 1) and selecting for transcripts up-regulated in CD4+ and/or CD8+ T cells (280 probe sets, FDR ≤ 0.05, FC ≥ 2). This list was complemented with an independent list of genes selectively expressed in T cells, generated by comparing gene expression of numerous distinct leukocyte subpopulations (412 probe sets).8 After filtering the latter list for informative probe sets (expression above background in at least two of all samples of ALCL and normal cells), this list comprised 258 individual probe sets. The combined final list comprised 530 individual probe sets corresponding to 459 genes.

References
1. Brune V, Tiacci E, Pfeil I, Döring C, Eckerle S, van Noesel C, et al. Origin and pathogenesis of nodular lymphocyte-predominant Hodgkin lymphoma as revealed by global gene expression analysis. J Exp Med 2008; 205: 2251-68.
2. Kenzelmann M, Klaren R, Hergenhahn M, Bonrouhi M, Grone HJ, Schmid W, et al. High-accuracy amplification of nanogram total RNA amounts for gene profiling. Genomics 2004; 83: 550-8.
3. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004; 5: R80.
4. Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 2002; 18 Suppl 1: S96-104.
5. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003; 31: e15.
6. Tukey JW. Exploratory data analysis. Reading, MA, USA: Addison-Wesley Publishing Co. Inc., 1977.
7.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B 1995; 57: 289-300.
8. Chtanova T, Newton R, Liu SM, Weininger L, Young TR, Silva DG, et al. Identification of T cell-restricted genes, and signatures for different T cell responses, using a comprehensive collection of microarray datasets. J Immunol 2005; 175: 7837-47.