Legends to Supplementary Material

Supplementary Figure 1. Similarity of samples of distinct tissues within each normal tissue expression data set as shown by hierarchical clustering.
The basis of these clustered heatmaps is an expression compendium of 105 genes selected without knowledge of class/tissue membership, using only statistics of each gene's expression across samples. A) Hierarchical clustering of 10 samples of the Su et al. data set. Samples cluster by tissue type. B) Hierarchical clustering of 25 samples of the Roth et al. data set. Samples cluster by tissue type.

Supplementary Figure 2. Similarity of samples of different tissues between normal tissue expression data sets as investigated by correspondence analysis.
The Pearson correlation coefficients r of a cross-comparison between all samples from the Su et al. and Roth et al. data sets were calculated. Each cell of the resulting matrix of correlation coefficients was transformed (x = (1+r) / (1-r)) and subjected to standard correspondence analysis as implemented in the R package ca. In the correspondence analysis biplot shown here, the coordinates of the row and column objects of the cross-correlation matrix (i.e. the samples in the Su et al. and Roth et al. data sets) are transformed into the same 3D space. Note that samples of each data set that originate from matching tissue sources are located at similar positions in the plot.

Supplementary Figure 3. Clustered heatmap of the combined normal tissue data set.
Here we show a heatmap, clustered on rows and columns by hierarchical clustering, that is based on the expression data of 105 genes in the 35 normal tissue samples. For between-study normalization we mean centered the log-transformed expression values. Note that samples cluster by tissue type and that distinct clusters of genes emerge that show characteristic expression in distinct tissues or tissue combinations.
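
The between-study normalization mentioned for Supplementary Figure 3 can be illustrated with a minimal Python sketch. It rests on assumptions that the legend does not state: a base-2 logarithm, centering of each gene on its mean within each study separately, and concatenation of the samples afterwards. The function names and the toy values are hypothetical and are not taken from either data set.

```python
import math


def mean_center_log(study):
    """Log2-transform and mean-center each gene within one study.

    `study` maps a gene name to a list of positive expression values,
    one per sample. Log base and per-gene centering are assumptions.
    """
    centered = {}
    for gene, values in study.items():
        logs = [math.log2(v) for v in values]
        mean = sum(logs) / len(logs)
        centered[gene] = [x - mean for x in logs]
    return centered


def combine_studies(*studies):
    """Center each study separately, then concatenate samples of shared genes."""
    centered = [mean_center_log(s) for s in studies]
    shared = set(centered[0]).intersection(*centered[1:])
    return {g: [x for c in centered for x in c[g]] for g in sorted(shared)}


# Toy values, purely illustrative.
study_a = {"GENE1": [2.0, 8.0], "GENE2": [4.0, 4.0]}
study_b = {"GENE1": [16.0, 64.0, 32.0], "GENE2": [1.0, 4.0, 16.0]}
combined = combine_studies(study_a, study_b)
```

In this toy example GENE1 is measured on very different scales in the two studies, yet after centering both studies contribute values distributed around zero, so samples can be compared across studies.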

Supplementary Figure 4. Correspondence analysis to select genes for classifier training.
Here we show the result of a correspondence analysis of the exponentially transformed 105-gene-by-5-normal-tissue-centroids data matrix. Note that some genes cluster closely with some tissues while others are not associated with distinct tissues. Only the 70 genes closest to distinct tissues were used for training of the normal tissue classifier.

Supplementary Figure 5. Heatmap of normal tissue expression data for the 70 classifier genes.
Here we show a heatmap of the combined normal tissue expression data (log transformed, mean centered) of the 70 classifier genes that were used for training of the classifier. Samples are ordered by tissue type. This is an excerpt from Figure 2.

Supplementary Figure 6. Heatmap of primary cancer expression data for the 70 classifier genes.
Here we show a heatmap of the expression data of the 70 classifier genes in 652 tumors extracted from the expO expression compendium (log transformed, mean centered). Samples are ordered by cancer type. This is an excerpt from Figure 2.

Supplementary Table 1. Over-represented literature sub-networks among the 70 classifier genes.
The networks were derived from a large literature-based gene-to-gene and gene-to-compound network that underlies the MetaCore software from GeneGo (CA, USA). Over-representation of these sub-networks, and of functional categories, in the 70-gene list was assessed by hypergeometric testing.
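
The hypergeometric test named for Supplementary Table 1 can be illustrated with a generic Python sketch of the upper-tail probability. This is not the MetaCore implementation: the function name, the argument names, and the toy numbers are hypothetical, and the size of the background gene universe is not given in the legend.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background universe (assumed, not stated in the legend)
    K: genes in the universe that belong to the sub-network or category
    n: genes in the list under test (e.g. a classifier gene list)
    k: genes in the list that belong to the sub-network or category
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers, purely illustrative: 2 of 4 listed genes fall into a
# category that covers 5 of 20 background genes.
p_value = hypergeom_upper_tail(k=2, K=5, n=4, N=20)
```

A small p-value indicates that the list contains more members of the sub-network or category than expected by chance. When many sub-networks are tested, a multiple-testing correction would normally be applied as well.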