Supplementary File 5 (doc 40K)

The PAM (Prediction Analysis of Microarrays) method is used to find the genes in a set of DNA
chips that best classify the samples, and to validate a set of genes as a classifier. The goal is
to find the smallest subset of genes that gives the best classification.
PAM can be divided into 2 main parts:
- The first step is to calculate a parameter characterizing each gene. This parameter is
  used to generate the various subsets of genes tested in the second step. Starting with
  all genes, subsets of genes are produced by selecting the genes whose parameter (in
  absolute value), reduced by Δ, is greater than or equal to zero. Δ starts at zero for
  the first group (all genes) and increases up to the highest parameter value (Nearest
  Shrunken Centroids method).
- The second step is a K-fold cross-validation. The samples are divided into K equal
  parts. Using K-1 parts, the selected genes are used to predict the classes of the Kth
  part, and an error rate is computed. This prediction is repeated K times, so that each
  part is predicted from the gene expression of the other K-1 parts. An average error
  rate is then calculated.
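The K-fold step can be sketched as follows. This is a minimal illustration, not the PAM software itself: the function names are hypothetical, samples are assigned to folds by interleaving (an assumption; the text only says the parts are equal), and a plain nearest-centroid classifier stands in for the shrunken-centroid classifier.

```python
def fit_centroids(xs, ys):
    """Stand-in classifier (assumption): one mean expression profile per class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in groups.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def k_fold_error(samples, labels, k, fit, predict):
    """Average misclassification rate over k folds.

    samples[j] is the expression profile of sample j (selected genes only),
    labels[j] its known class.
    """
    n = len(samples)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    # Sample j goes to fold j % k, so fold sizes differ by at most one.
    folds = [[j for j in range(n) if j % k == f] for f in range(k)]
    rates = []
    for held_out in folds:
        held = set(held_out)
        # Train on the other k-1 parts.
        train_x = [samples[j] for j in range(n) if j not in held]
        train_y = [labels[j] for j in range(n) if j not in held]
        model = fit(train_x, train_y)
        # Predict the held-out part and record its error rate.
        wrong = sum(predict(model, samples[j]) != labels[j] for j in held_out)
        rates.append(wrong / len(held_out))
    return sum(rates) / k
```

In practice the folds would usually be shuffled or balanced by class before use; that refinement is left out here to keep the sketch short.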
The evolution of the average prediction error as Δ increases is used to find the limiting value
of Δ that produces the smallest group of genes with the lowest average error rate. This group of
genes allows the class of each sample to be predicted with the lowest probability of error. To
validate a set of genes, the average error rate must be zero for Δ = 0 and increase as Δ increases.
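The choice of Δ described above can be sketched as a simple selection over cross-validation results. The function name, the tuple layout and the tolerance parameter are assumptions made for illustration, and the numbers in the usage example are invented.

```python
def choose_delta(results, tol=1e-12):
    """Pick the threshold giving the fewest genes at the lowest average error.

    results: list of (delta, n_genes, avg_error) tuples, one per tested threshold.
    tol: error rates within tol of the minimum are treated as ties (assumption).
    """
    best_error = min(err for _, _, err in results)
    candidates = [r for r in results if r[2] - best_error <= tol]
    # Among the ties, prefer the smallest gene set, then the largest delta.
    return min(candidates, key=lambda r: (r[1], -r[0]))


# Hypothetical cross-validation results: (delta, genes kept, average error rate).
example = [(0.0, 100, 0.0), (1.0, 40, 0.0), (2.0, 12, 0.0), (3.0, 5, 0.1), (4.0, 1, 0.4)]
selected = choose_delta(example)
```

With these made-up values, `selected` is the entry for Δ = 2.0: the error rate is still zero, and any larger Δ raises it.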
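There are n samples in K classes. For each gene of these samples, $x_{ij}$ is the expression
of gene i in sample j, $\bar{x}_{ik}$ is the average expression of gene i in class k (the
class centroid), and $\bar{x}_i$ is the overall centroid for gene i:

\[
\bar{x}_{ik} = \sum_{j \in C_k} \frac{x_{ij}}{n_k},
\qquad
\bar{x}_i = \sum_{j=1}^{n} \frac{x_{ij}}{n}
\]

where $C_k$ is the set of samples in class k and $n_k$ is the number of samples in that class.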
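As an illustration of the two centroid definitions, the sketch below computes the class centroid and the overall centroid of every gene from a small expression table. The function name and the data layout (one row per gene, one column per sample) are assumptions chosen for the example.

```python
def centroids(x, classes):
    """Compute class centroids and overall centroids for every gene.

    x[i][j]: expression of gene i in sample j.
    classes[j]: class label of sample j.
    Returns (per_class, overall), where per_class[k][i] is the mean of gene i
    over the samples of class k and overall[i] is its mean over all samples.
    """
    n = len(classes)
    overall = [sum(row) / n for row in x]
    # Group sample indices by class.
    members = {}
    for j, k in enumerate(classes):
        members.setdefault(k, []).append(j)
    per_class = {
        k: [sum(row[j] for j in js) / len(js) for row in x]
        for k, js in members.items()
    }
    return per_class, overall


# Two genes measured in four samples, two samples per class (invented values).
per_class, overall = centroids(
    [[1.0, 3.0, 5.0, 7.0], [2.0, 2.0, 4.0, 4.0]],
    ["A", "A", "B", "B"],
)
```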
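The parameter characterizing each gene, $d_{ik}$, is calculated from $\bar{x}_{ik}$ and
$\bar{x}_i$:

\[
d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{S_i}
\]

where $S_i$ is the pooled within-class standard deviation of gene i.
The subsets of genes are produced by calculating the shrunken parameter $d'_{ik}$:

\[
d'_{ik} = \mathrm{sign}(d_{ik}) \, (|d_{ik}| - \Delta)_+
\]

where $(\cdot)_+$ denotes the positive part. If $|d_{ik}| - \Delta < 0$ then $d'_{ik} = 0$ and
the gene no longer contributes to class k; a gene whose $d'_{ik}$ is zero for every class is
eliminated.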
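The shrinkage rule can be sketched in a few lines. The function names are hypothetical, and the convention that a gene is kept as long as at least one class retains a nonzero shrunken value follows the cited PNAS paper rather than anything spelled out in this note.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; return 0.0 if |d| does not exceed delta."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude


def surviving_genes(d, delta):
    """Return the indices of genes that survive the threshold delta.

    d[i][k]: parameter of gene i for class k.
    A gene survives if its shrunken parameter is nonzero for at least one class.
    """
    return [
        i for i, row in enumerate(d)
        if any(soft_threshold(v, delta) != 0.0 for v in row)
    ]
```

For example, with Δ = 1.0 a parameter of 2.5 shrinks to 1.5, a parameter of -2.5 shrinks to -1.5, and a parameter of 0.5 becomes 0.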
References:
- The Stanford website: http://www-stat.stanford.edu/%7Etibs/PAM/
- "Diagnosis of multiple cancer types by shrunken centroids of gene expression"
PNAS 2002 99:6567-6572 (May 14).