Legends to Supplementary Material

Supplementary Figure 1. Similarity of samples of distinct tissues within each normal tissue expression data set, as shown by hierarchical clustering.
These clustered heatmaps are based on an expression compendium of 105 genes that were selected without knowledge of class (tissue) membership, using only statistics of each gene's expression across samples. A) Hierarchical clustering of 10 samples of the Su et al. data set. Samples cluster by tissue type. B) Hierarchical clustering of 25 samples of the Roth et al. data set. Samples cluster by tissue type.
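The unsupervised gene selection and sample clustering described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the legend does not name the per-gene statistic, so variance across samples is used here as a hypothetical stand-in, and average linkage on correlation distance (1 - Pearson r) is likewise our own choice. The function names and the toy data are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def select_genes_unsupervised(log_expr, n_genes):
    """Return indices of the n_genes rows (genes) with the largest variance.

    Hypothetical choice: variance stands in for the unspecified per-gene
    statistic. Tissue labels are never used, mirroring the legend.
    """
    variances = log_expr.var(axis=1)
    return np.argsort(variances)[::-1][:n_genes]


def cluster_samples(log_expr, n_clusters):
    """Average-linkage clustering of samples (columns) on 1 - Pearson r."""
    # linkage() clusters the rows of its input, so samples are put in rows
    tree = linkage(log_expr.T, method="average", metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: 200 genes x 10 samples from two hypothetical tissues (5 each).
rng = np.random.default_rng(0)
log_expr = rng.normal(0.0, 0.1, size=(200, 10))
log_expr[:20, :5] += 3.0   # genes 0-19 high in tissue A
log_expr[20:40, 5:] += 3.0  # genes 20-39 high in tissue B

top = select_genes_unsupervised(log_expr, n_genes=40)
labels = cluster_samples(log_expr[top], n_clusters=2)
```

In this toy example the 40 most variable genes are exactly the tissue-specific ones, and samples 0-4 receive one cluster label while samples 5-9 receive the other, i.e. samples cluster by tissue type.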
Supplementary Figure 2. Similarity of samples of different tissues between the two normal tissue expression data sets, as investigated by correspondence analysis.
Pearson correlation coefficients r were calculated for every pair of samples in a cross-comparison between the Su et al. and Roth et al. data sets. Each entry of the resulting correlation matrix was transformed as x = (1 + r) / (1 - r), which maps r in (-1, 1) onto positive values suitable for correspondence analysis, and the transformed matrix was subjected to standard correspondence analysis as implemented in the R package ca. In the correspondence analysis biplot shown here, the row and column objects of the cross-correlation matrix (i.e. the samples of the Su et al. and Roth et al. data sets) are projected into the same 3D space. Note that samples of the two data sets that originate from matching tissue sources are located at similar positions in the plot.
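A minimal Python sketch of this procedure is given below, assuming genes-by-samples matrices that share the same gene rows. Note that (1 + r) / (1 - r) = exp(2 artanh r), i.e. the exponential of twice Fisher's z. The correspondence analysis here is a plain SVD of the standardized residuals returning principal coordinates for rows and columns (as in a symmetric map); it is our own illustrative re-implementation, not the R package ca, and the clipping constant eps and all function names are hypothetical additions.

```python
import numpy as np


def cross_correlation(a, b):
    """Pearson r between every column of a and every column of b.

    Both matrices are genes x samples and must share the same gene rows.
    """
    za = (a - a.mean(axis=0)) / a.std(axis=0)
    zb = (b - b.mean(axis=0)) / b.std(axis=0)
    return za.T @ zb / a.shape[0]


def transform_correlation(r, eps=1e-9):
    """x = (1 + r) / (1 - r), mapping r in (-1, 1) onto (0, inf).

    Clipping by eps is our own guard against r == 1; it is not in the legend.
    """
    r = np.clip(r, -1.0 + eps, 1.0 - eps)
    return (1.0 + r) / (1.0 - r)


def correspondence_analysis(x, n_dims=3):
    """Simple CA via SVD; returns row and column principal coordinates."""
    p = x / x.sum()
    r_mass, c_mass = p.sum(axis=1), p.sum(axis=0)
    expected = np.outer(r_mass, c_mass)
    resid = (p - expected) / np.sqrt(expected)
    u, sv, vt = np.linalg.svd(resid, full_matrices=False)
    rows = u[:, :n_dims] * sv[:n_dims] / np.sqrt(r_mass)[:, None]
    cols = vt.T[:, :n_dims] * sv[:n_dims] / np.sqrt(c_mass)[:, None]
    return rows, cols


# Toy data: 50 hypothetical genes, two tissues (A, B) in each data set.
rng = np.random.default_rng(1)
profile_a = np.r_[np.full(25, 3.0), np.zeros(25)]
profile_b = profile_a[::-1]


def simulate(profile, n_samples, offset):
    """Samples sharing a tissue profile plus a study offset and noise."""
    return profile[:, None] + offset + rng.normal(0.0, 0.1, (profile.size, n_samples))


su = np.hstack([simulate(profile_a, 3, 0.0), simulate(profile_b, 3, 0.0)])
roth = np.hstack([simulate(profile_a, 4, 1.0), simulate(profile_b, 4, 1.0)])

r = cross_correlation(su, roth)  # 6 x 8 cross-correlation matrix
su_xyz, roth_xyz = correspondence_analysis(transform_correlation(r), n_dims=3)
```

In this toy example the simulated tissue A samples of both data sets fall on the same side of the first CA axis, opposite the tissue B samples, which is the kind of co-location of matching tissues the figure shows.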
Supplementary Figure 3. Clustered heatmap of the combined normal tissue data set.
Here we show a heatmap of the expression data of the 105 genes in the 35 normal tissue samples, clustered on rows and columns by hierarchical clustering. For between-study normalization, we mean centered the log-transformed expression values. Note that samples cluster by tissue type and that distinct gene clusters emerge that show characteristic expression in particular tissues or tissue combinations.
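The between-study normalization can be sketched as below. The legend states only that log-transformed values were mean centered; log base 2 and centering each gene within each study before concatenating the studies are our assumptions, and the two study matrices are hypothetical toy values.

```python
import numpy as np


def combine_studies(raw_by_study):
    """Log2-transform each study, mean-center every gene within its study,
    then concatenate the samples column-wise (genes x all samples).

    Assumptions: log base 2 and per-gene, per-study centering.
    """
    centered = []
    for raw in raw_by_study:
        log_expr = np.log2(raw)
        centered.append(log_expr - log_expr.mean(axis=1, keepdims=True))
    return np.hstack(centered)


# Two hypothetical studies measuring the same 2 genes on different scales.
su = np.array([[256.0, 1024.0], [32.0, 128.0]])
roth = np.array([[2048.0, 8192.0, 4096.0], [4.0, 16.0, 8.0]])
combined = combine_studies([su, roth])
print(combined)  # every row becomes [-1, 1, -1, 1, 0]
```

Centering within each study removes the study-wide level of each gene (here, the very different scales of the two toy studies), so the combined samples can be clustered on relative expression.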
Supplementary Figure 4. Correspondence analysis to select genes for classifier training.
Here we show the result of a correspondence analysis of the exponentially transformed data matrix of 105 genes by 5 normal tissue centroids. Note that some genes cluster closely with particular tissues, while others are not associated with any distinct tissue. Only the 70 genes closest to distinct tissues were used to train the normal tissue classifier.
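The gene-selection step can be sketched as below, with several assumptions that are ours and not the legend's: tissue centroids are per-tissue means of the (log-transformed, mean-centered) expression values; "exponentially transformed" is read as the natural exponential, which makes all entries positive for correspondence analysis; and "closest to distinct tissues" is interpreted as the smallest Euclidean distance from a gene to its nearest tissue point in the first two dimensions of a symmetric CA map. Row-to-column distances in a symmetric map are only approximately interpretable, so this is a heuristic, and all helper names are hypothetical.

```python
import numpy as np


def correspondence_analysis(x, n_dims):
    """Simple CA via SVD of standardized residuals; principal coordinates."""
    p = x / x.sum()
    r_mass, c_mass = p.sum(axis=1), p.sum(axis=0)
    expected = np.outer(r_mass, c_mass)
    resid = (p - expected) / np.sqrt(expected)
    u, sv, vt = np.linalg.svd(resid, full_matrices=False)
    rows = u[:, :n_dims] * sv[:n_dims] / np.sqrt(r_mass)[:, None]
    cols = vt.T[:, :n_dims] * sv[:n_dims] / np.sqrt(c_mass)[:, None]
    return rows, cols


def tissue_centroids(log_expr, tissue_labels):
    """Mean expression of each gene per tissue (genes x tissues)."""
    labels = np.asarray(tissue_labels)
    tissues = sorted(set(tissue_labels))
    cents = np.column_stack([log_expr[:, labels == t].mean(axis=1) for t in tissues])
    return cents, tissues


def select_tissue_genes(centroids, n_keep, n_dims=2):
    """Rank genes by distance to their nearest tissue point in the CA map."""
    # exp() makes every entry positive, as CA requires
    gene_xy, tissue_xy = correspondence_analysis(np.exp(centroids), n_dims)
    # pairwise gene-to-tissue distances, shape (n_genes, n_tissues)
    dists = np.linalg.norm(gene_xy[:, None, :] - tissue_xy[None, :, :], axis=2)
    return np.argsort(dists.min(axis=1))[:n_keep]


# Toy centroids: genes 0-2 are each specific to one of three hypothetical
# tissues, genes 3-5 are flat (the paper used 105 genes x 5 tissues, kept 70).
centroids = np.array([
    [2.0, -1.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, -1.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
keep = select_tissue_genes(centroids, n_keep=3)
print(sorted(keep.tolist()))  # [0, 1, 2]
```

The flat genes have the average profile and therefore sit at the origin of the map, far from every tissue point, while each tissue-specific gene lies on the same ray as its tissue; ranking by nearest-tissue distance thus keeps the tissue-associated genes.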
Supplementary Figure 5. Heatmap of normal tissue expression data for the 70 classifier genes.
Here we show a heatmap of the combined normal tissue expression data (log transformed, mean centered) of the 70 classifier genes, which was used to train the classifier. Samples are ordered by tissue type. This is an excerpt from Figure 2.

Supplementary Figure 6. Heatmap of primary cancer expression data for the 70 classifier genes.
Here we show a heatmap of the expression data (log transformed, mean centered) of the 70 classifier genes in 652 tumors extracted from the expO expression compendium. Samples are ordered by cancer type. This is an excerpt from Figure 2.

Supplementary Table 1. Over-represented literature sub-networks among the 70 classifier genes.
The networks were derived from a large literature-based gene-to-gene and gene-to-compound network that underlies the MetaCore software from GeneGo (CA, USA). Over-representation of these networks in the 70-gene list, and of functional categories, was assessed by hypergeometric testing.
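For readers unfamiliar with the test, a one-sided hypergeometric p-value for over-representation can be computed exactly with Python's standard library, as sketched below. Whether MetaCore uses exactly this upper-tail form is an assumption, and the background size, network size and overlap in the example are hypothetical numbers, not values from the study.

```python
from math import comb


def overrepresentation_p(k, n_drawn, n_category, n_total):
    """One-sided hypergeometric p-value P(X >= k).

    X counts category members among n_drawn genes sampled without replacement
    from n_total background genes, n_category of which belong to the category.
    """
    upper = min(n_drawn, n_category)
    # math.comb returns 0 when the lower index exceeds the upper one
    tail = sum(
        comb(n_category, i) * comb(n_total - n_category, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_drawn)


# Hypothetical example: 5 of the 70 classifier genes fall into a 150-gene
# network, against a background of 20,000 genes (about 0.5 expected by chance).
p = overrepresentation_p(k=5, n_drawn=70, n_category=150, n_total=20000)
print(p)
```

If SciPy is available, the same value is given by scipy.stats.hypergeom.sf(k - 1, n_total, n_category, n_drawn); the pure-integer version above simply avoids the dependency.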