Supplementary Figure Legends

Figure S1 Overview of the experimental strategy
The transcriptomes of CD3-purified cells from the peripheral blood of five inv(14)-positive T-PLL cases were compared with those of eight immunomagnetically purified peripheral blood T-cell samples from normal donors. Mononuclear cells were isolated from fresh peripheral blood samples (Lymphoprep, Invitrogen, Karlsruhe, Germany). CD3-positive T cells were enriched with anti-CD3 magnetic microbeads (MidiMacs, Miltenyi Biotec, Bergisch Gladbach, Germany), resulting in a CD3+ T-cell purity of >90% by flow cytometry. Gene expression profiling was performed on RNA from 1-2 x 10^8 CD3+ cells (RNeasy Midi Kit, Qiagen, Hilden, Germany) using the Affymetrix U133A microarray platform as recently described (1). Differentially expressed genes (Table S2) were identified with the non-parametric Mann-Whitney U test in a supervised approach, as described in the legend to Table S2. The panel of differentially expressed genes was then analysed for enrichment in the gene ontology (GO) categories "biological process" and "molecular function" using the GOstat tool (Table S3) (2). Furthermore, differentially expressed genes were tested for non-random distribution across individual chromosome arms using the hypergeometric distribution method (3) in order to identify candidate regions of genomic aberration (Table 1). Gene expression changes were then correlated with chromosomal imbalances detected by FISH and by Affymetrix GeneChip 50K SNP array analysis. The raw experimental data can be accessed through the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/).

Figure S2 and Table S5 Summary of the GeneChip analyses
GeneChip analyses were conducted as recently described (6).
Genomic DNA (extracted with the QIAamp DNA Blood Midi Kit, Qiagen, Hilden, Germany) was subjected to Affymetrix GeneChip 50K SNP XbaI mapping array analysis following the standard protocol for Affymetrix GeneChip Mapping 100K arrays (Affymetrix Inc., Santa Clara, CA, USA). Arrays were evaluated with the Affymetrix software tools (GDAS v3.0; CNAT 2.0) and according to criteria we established on the basis of FISH-validated copy number aberrations. The difficulty in establishing reliable cut-offs from SNP-Chip data is that the quality of the individual SNPs, and consequently the single point analysis copy number (SPA_CN) values, fluctuates throughout the genome. The cut-off for copy number changes was therefore calculated by correlating SNP-Chip data with chromosomal regions containing FISH-proven imbalances in a stepwise approach (Figure S2). Because cut-offs are calculated as the mean plus or minus three times the standard deviation, the intrinsic heterogeneity of SPA_CN values introduces a size-dependent bias in the standard deviation: selecting regions that are too small or too large for cut-off determination yields high or low standard deviations (and cut-offs), respectively. The initial cut-offs were therefore calculated from a region spanning 150 consecutive SNPs. For deletions, the cut-off was first estimated from the SPA_CN values of 150 consecutive SNPs, 75 upstream and 75 downstream of the chromosomal location of the FISH probe in 8p21.3, a region shown by FISH to be deleted in four T-PLL cases. A deletion was defined as a region spanning at least 18 consecutive SNPs (to reach a resolution of 1 Mb) with a mean value below 1.5. The performance of this estimate was tested on a second set of 20 FISH-proven deletions: mean SPA_CN values of 18 consecutive SNPs were calculated for these regions, and 15/20 showed mean SPA_CN values below 1.5.
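The two quantities used in this stepwise approach can be illustrated with a minimal Python sketch. It assumes that the SPA_CN values of one chromosome are available as a plain list ordered by genomic position; the function names are hypothetical and are not part of the Affymetrix GDAS or CNAT tools. The sketch computes a mean plus/minus 3 SD interval over a 150-SNP window and the means of all runs of 18 consecutive SNPs, flagging runs whose mean falls below a deletion cut-off (1.5 in the initial estimate). Which bound of the interval serves as the deletion or gain threshold is deliberately left to the caller.

```python
from statistics import mean, stdev


def cutoff_bounds(spa_cn, center, half_width=75, k=3.0):
    """Return (mean - k*SD, mean + k*SD) over the SNPs around index `center`.

    Hypothetical helper. With the defaults the window covers 150 consecutive
    SNPs (75 upstream, 75 downstream); which bound is used is up to the caller.
    """
    window = spa_cn[max(0, center - half_width):center + half_width]
    m, s = mean(window), stdev(window)
    return m - k * s, m + k * s


def window_means(spa_cn, width=18):
    """Mean SPA_CN value of every run of `width` consecutive SNPs (moving average)."""
    return [mean(spa_cn[i:i + width]) for i in range(len(spa_cn) - width + 1)]


def deleted_window_starts(spa_cn, cutoff=1.5, width=18):
    """Start indices of `width`-SNP runs whose mean SPA_CN lies below `cutoff`."""
    return [i for i, m in enumerate(window_means(spa_cn, width)) if m < cutoff]


if __name__ == "__main__":
    # Toy profile (not real data): two copies with a 20-SNP single-copy stretch.
    profile = [2.0] * 50 + [1.0] * 20 + [2.0] * 50
    starts = deleted_window_starts(profile)
    print(starts[0], starts[-1])  # 42 60
    print(cutoff_bounds(profile, center=60))
```

On the toy profile, only the 18-SNP runs containing at least 10 of the 20 single-copy SNPs (start positions 42 to 60) have a mean below 1.5, which shows how the 18-SNP rule trades positional precision for robustness against individual low-quality SNPs.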
One deleted region had a mean SPA_CN value between 1.5 and 1.6, whereas 4 regions had mean SPA_CN values above 1.75 (see details in Table S5). The final cut-off was therefore adjusted to 1.6. With this criterion, 4/20 FISH-proven deletions could not be detected, which is explained by the presence of low-quality SNPs in these particular regions. The initial cut-off for gains was estimated from 150 consecutive SNPs in three cases with FISH-proven 6p21 gains and was 2.8. This estimate was verified on 12 additional FISH-proven gained regions, all of which had mean values above 2.8 for 18 consecutive SNPs; the final cut-off for gains was therefore set to 2.8. In addition, all 11 cases with FISH-proven balanced copy numbers on chromosome arm 9q had mean values above 1.6 and below 2.8, which confirmed their balanced genomic status and demonstrated the validity of the calculated cut-offs. To determine copy number alterations throughout the SNP-Chip data, the SPA_CN data had to be smoothed to compensate for regions represented by low-quality SNPs. Means of 18 consecutive single point analysis copy numbers were therefore calculated and, for a clearer overview, colour-coded as gains or losses using the established cut-offs. We identified 39 deleted and 21 gained regions (see Table 2). Because the smoothing algorithm blurs the boundaries of gained and deleted regions, their exact locations were taken from the raw SPA_CN data.

Figure S1
Figure S2

Table S1 Clinical and cytogenetic data of the T-PLL patients included in the study
T-PLL | Age (years) | Gender | Karyotype of the tumour cells
1 | 49 | m | 45,X,Y,i(6)(p10),del(7)(p15),der(8)t(8;8)(p21;q21),-10,der(11)del(11)(q22q23)dup(11)(q23q24),dic(12;15)(p12;p13),add(13)(p13),inv(14)(q11q32),add(15)(q24),der(21)t(10;21)(q11;p13),add(22)(p13),+der(?)t(?;8)(?;q21)[cp21]
2 | 54 | f | 45,XX,der(5)t(5;11)(p14;q13),i(8)(q10),-11,t(12;14)(p12;q12~13),-14,inv(14)(q11q32),add(17)(p12),add(18)(p11),+mar[15]
3 | 72 | m | 44,X,-Y,t(4;9)(q22~24;q34),add(6)(q12),+8,der(8)t(8;8)(p21;q21)x2,-11,inv(14)(q11q32),del(17)(q24),add(19)(p13),der(21;22)(q10;q10)[3]
4 | 90 | f | 45,X,-X,+10,der(10;18)(q10;q10),inv(14)(q11q32)[1]/46,idem,der(9)t(9;14)(p21;q31),inv(14)(q11q32),der(14)inv(14)(q11q32),t(9;14)(p21;q31),+16[12]
5 | 74 | f | FISH only: TCRAD break, TCL1 break, nuc ish see Table S5
6 | 59 | m | FISH only: TCRAD break, TCL1 break, nuc ish see Table S5
7 | 70 | f | 46,XX,der(6)t(3;6)(p14;q21),inv(14)(q11.2;q32.1),inv(16)(pter->p13::q11->p13::q24->q11::q24->qter),der(21)t(21;21)(p11;q21)[6]/46,idem,der(8)t(8;8)(p21;q11)[17]
8 | 62 | m | 44,X,der(Y)t(Y;?1)(q12;q31),der(8)t(8;8)(p11;q22)hsr(8)(p11),der(9)t(9;9)(p23;q34),der(10)t(5;10)(q31;q26),del(10)(q23),-11,dup(11)(p12p14),dup(13)(q21q14),inv(14)(q11q32.1),dic(20;22)(p12;p11),-22[17]
9 | 51 | m | 46,XY,der(8)t(8;8)(p21;q21),add(12)(p13),+14,der(14)(14pter->14q10::14q22->14q11::14q32->14qter)x2,-18,add(19)(p13),der(22)t(?12;22)(q14;q11)[29]
10 | 76 | m | 46,XY,t(1;2)(p32;p21),t(3;20)(q27;q11),+8,t(8;20)(p12;q13),inv(14)(q11q32),-22[15]
11 | 68 | f | 44,X,add(X)(q25),del(3)(p11),der(4)t(3;4)(p12;p15),der(8)?t(X;8)(q25;p22),der(8)t(8;8)(p22;q23),-11,-13,inv(14)(q11q32),r(17)(p11q24)[cp9]/44,XX,inv(1)(p12q25~31),-11,13,inv(14)(q11q32),add(16)(q11),r(17)(p11q24)[13]
12 | 55 | m | 43,X,-Y,t(1;3)(q22;p23),der(4)t(4;?22)(p15;q13),t(6;6)(p12;p22~24),i(8)(q10),-9,-10,der(11)add(11)(p12)add(11)(q23),+13,der(13;14)(q10;q10)x2,inv(14)(q11q32),add(17)(p12),add(17)(q24),22,+mar[25]
A, T-PLL without inv(14)/t(14;14) | 67 | m | 45,X,add(Y)(p11),add(5)(q34),i(8)(q10),del(11)(q23),dic(13;22)(p13;p13)
Included in GEP analysis / Included in FISH analysis / Included in SNP-Chip analysis: x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
m = male, f = female, GEP = gene expression profiling. A: Results of this case were not included in the evaluations due to the
lack of inv(14)/t(14;14) but served for control purposes. Karyotypes of the tumour cells are described according to ISCN 1995 (4). Breakpoints in the TCRAD and TCL1 loci were confirmed by FISH in all T-PLL samples (Table S4).

Table S2 Differentially expressed genes in T-PLL (N=5) vs. normal donor (ND, N=8) derived CD3+ T cells
To eliminate genes with low absolute expression intensity, only genes called present by the Affymetrix algorithm in at least 50% of the T-PLL samples (for genes designated up-regulated) or of the normal control samples (for genes designated down-regulated) were selected (N=11028). These were further filtered by comparing the median signal intensities of T-PLL vs. normal control samples with a fold-change cut-off of ±2.0, yielding N=1302 probe sets (N=668 down-regulated and N=634 up-regulated). This set was further analysed with the non-parametric Mann-Whitney U (MWU) test at a significance level of 0.05, resulting in the identification of 830 differentially expressed genes (termed "subgroup distinction genes"; A). To correct for multiple testing, the 1302 reliably measured probe sets with a fold-change difference of ±2.0 between the two groups (see above) were also analysed by multiclass supervised comparative analysis using the significance analysis of microarrays (SAM) method. At a false discovery rate of q ≤ 5%, N=1052 differentially expressed probe sets were identified, which showed a 99.8% overlap (828/830 probe sets) with the set of subgroup distinction genes defined by the MWU test (see Venn diagram and raw data in B).

Table S3 Statistically over-represented Gene Ontologies (GO) within the genes differentially expressed between T-PLL and normal peripheral blood derived CD3+ T cells
Annotated GO term | Genes | No. of annotated genes in target list (388 in total) | No. of annotated genes in reference list (5132 in total) | p-value
GO Biological Process
GO:0007166: Cell surface receptor linked signal transduction | OPN3 TLE2 CALM3 GPR65 BLR1 ENPP2 CD2 KIAA1128 JAG2 CELSR2 GPR56 DOCK1 TSHB GPR171 TGFBR3 CXCL1 ACVR2B GNG11 CD8A F2R PPAP2A KLRB1 P2RY5 DKK1 TNFRSF1B CD160 HRMT1L2 CD59 TCF7L2 GPR27 IFNGR2 ERBB3 FZD6 PIG8 ADRB2 RRH KLRF1 CD3D P2RY10 ITGA10 INHA | 41 | 246 | 0.0012
GO:0050874: Organismal physiological process | OPN3 SLC1A1 LST1 CST7 MBP GPR65 BLR1 MYBPC1 CTSW CD2 JAG2 ALPL GCH1 IGJ RIPK2 ICOS NR3C2 SNTA1 IL2RB CXCL1 CLECSF2 PITPNA GBP2 KLRC3 XCL2 SORD CD8A F2R SIX6 KLRB1 CHST4 PDE7B CD244 CLCN5 CD160 CKMT2 CCL5 CD59 SCN10A CCL4 CTLA4 PPBP PLUNC GLMN GNLY LGALS3B | 53 | 396 | 0.0231
GO:0006952: Defense response | LST1 CST7 MBP GPR65 BLR1 CTSW CD2 JAG2 IGJ RIPK2 ICOS CD48 IL2RB CXCL1 CLECSF2 GBP2 KLRC3 XCL2 CD8A KLRB1 CHST4 CD244 CD160 CCL5 HRMT1L2 CD59 CCL4 CTLA4 PPBP PLUNC GLMN GNLY LGALS3BP KCNN4 IK KLRF1 CD3D INHA GZMA | 39 | 275 | 0.0427
GO:0007186: G-protein coupled receptor protein signaling pathway | OPN3 CALM3 GPR65 P2RY5 BLR1 ENPP2 KIAA1128 CELSR2 GPR56 TSHB GPR171 CXCL1 GPR27 FZD6 ADRB2 GNG11 RRH F2R P2RY10 PPAP2A | 20 | 108 | 0.0427
GO:0006955: Immune response | LST1 CST7 MBP GPR65 BLR1 CTSW CD2 JAG2 IGJ RIPK2 ICOS IL2RB CXCL1 CLECSF2 GBP2 XCL2 KLRC3 CD8A KLRB1 CHST4 CD244 CD160 CCL5 CD59 CCL4 CTLA4 PPBP PLUNC GLMN GNLY LGALS3BP IK KLRF1 CD3D GZMA INHA | 36 | 250 | 0.0427
GO Molecular Function
GO:0004872: Receptor activity | OPN3 NEO1 GPR65 BLR1 CD2 KIAA1128 LAIR2 GRM6 CELSR2 GPR56 NR3C2 IL2RB GPR171 TGFBR3 IL18R1 ACVR2B KLRC3 NR2C1 CR1 CD8A F2R KLRB1 FKBP1A P2RY5 CD244 TNFRSF1B CUL5 CD160 PLA2R1 EPS15 RORA PTPRM TRIP13 GPR27 IFNGR2 ERBB3 FZD6 LGALS3BP ADRB2 CRSP6 RRH KLRF1 C | 45 | 316 | 0.0222
GO:0004888: Transmembrane receptor activity | OPN3 NEO1 GPR65 P2RY5 BLR1 TNFRSF1B KIAA1128 GRM6 PTPRM CELSR2 GPR56 IL2RB GPR171 ACVR2B IFNGR2 GPR27 KLRC3 FZD6 ERBB3 LGALS3BP ADRB2 RRH KLRF1 CD3D P2RY10 F2R KLRB1 | 27 | 163 | 0.0351
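The over-representation statistics summarized in Table S3 can be illustrated with a short Python sketch. It assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) per GO term, followed by the standard Benjamini-Hochberg step-up adjustment; GOstat's own implementation may apply a different test, and the p-values in the table were corrected across all GO terms tested, so running the sketch on a few rows is not expected to reproduce them. The same upper-tail function also illustrates the hypergeometric test used for the chromosome-arm analysis in Figure S1 (3), with chromosome arms in place of GO terms. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn).

    N = annotated genes in the reference list, K = reference genes carrying the
    term, n = annotated genes in the target list, k = target genes carrying the
    term. Assumes the target genes are a subset of the reference list.
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # (target hits, reference hits) for two Table S3 rows; 388 target and
    # 5132 reference genes. These raw p-values are uncorrected.
    rows = {"GO:0007166": (41, 246), "GO:0004888": (27, 163)}
    raw = {t: hypergeom_upper_tail(k, 388, K, 5132) for t, (k, K) in rows.items()}
    print(raw)
    print(benjamini_hochberg(list(raw.values())))
```

Using exact integer binomial coefficients avoids the overflow that a naive floating-point product of factorials would hit at a reference size of 5132.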
The 830 probe set identifiers found to be significantly deregulated in T-PLL (Table S2) were annotated and analysed for over-represented "biological processes" and "molecular functions" using the GOstat tool (2). As the reference, we used a list of 11028 probe sets of the HG-U133A array that showed present (P) detection calls in at least 50% of the T-PLL (n=5) and/or normal T-cell (n=8) array analyses. Significant GO terms are listed together with the associated genes, the number of associated genes in the T-PLL target gene list and the number of associated genes in the reference list. Computed p-values were corrected for multiple testing using the Benjamini and Hochberg method (5).

Table S4 Summary of the applied probes and FISH results

Table S5 Overview of the FISH-proven regions used to define the cut-offs for chromosomal deletions and gains in the GeneChip analysis
Deletions
Chromosomal region | Number of cases with deletion as detected by FISH | Number of cases with deletion as detected by GeneChip | Median SPA_CN value of cases
6q21 | 3 | 3 | 1.50, 1.39, 1.30
7q35 | 1 | 0 | 2.12
10p15~14 | 2 | 2 | 1.37, 1.24
11q22~23 | 4 | 4 | 1.35, 1.34, 1.36, 1.36
18p11.32~22 | 3 | 3 | 1.03, 0.94, 1.54
21q22 | 1 | 0 | 3.00
22q11.21 | 3 | 2 | 1.29, 1.41, 1.79
22q11.23 | 3 | 2 | 1.37, 1.34, 1.76
Gains
Chromosomal region | Number of cases with gain as detected by FISH | Number of cases with gain as detected by GeneChip | Median SPA_CN value of cases
5p15 | 2 | 2 | 5.30, 3.17
8p11.21 | 7 | 7 | 3.66, 5.34, 3.38, 3.71, 3.26, 3.13, 3.48
14q32.1 | 2 | 2 | 3.35, 3.50
17p13 | 1 | 1 | 2.87

Reference List
(1) Schroers R, Griesinger F, Trümper L, Haase D, Kulle B, Klein-Hitpass L, Sellmann L, Dührsen U, Dürig J. Combined analysis of ZAP-70 and CD38 expression as a predictor of disease progression in B-cell chronic lymphocytic leukemia. Leukemia 2005;19(5):750-758.
(2) Beissbarth T, Speed TP. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 2004;20(9):1464-1465.
(3) Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet 1999;22(3):281-285.
(4) Mitelman F. ISCN: An International System for Human Cytogenetic Nomenclature. Basel: Karger; 1995:94-104.
(5) Hochberg Y, Benjamini Y. More powerful procedures for multiple significance testing. Stat Med 1990;9(7):811-818.
(6) Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E, Law J, Berntsen T, Chadha M, Hui H, Yang G, Kennedy GC, Webster TA, Cawley S, Walsh PS, Jones KW, Fodor SP, Mei R. Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat Methods 2004;1(2):109-111.