Supplemental Material

Supplementary Methods

SBIME first performs an ANOVA (analysis of variance) for each gene and computes the associated p-value. The variables used are the expression data for each gene, such as the log(ratio), the signal intensity or any other value that represents expression levels. The factor studied is a biological criterion defined by the user, for instance the time points of a kinetics study or the subtypes of a particular cancer. Genes showing the lowest variance within a biological group and the most significant difference of means between the groups will have the lowest ANOVA p-values.

Second, using annotation files from Gene Ontology, regulatory networks from BioCarta, metabolic pathways from KEGG, chromosomal localization downloaded from [ftp://ftp1.nci.nih.gov/pub/CGAP/] and protein domains from PFAM, all of which are automatically updated on a regular basis, SBIME recovers each gene present in the data set and stores its p-value. Any category (pathway, GO annotation, domain, etc.) that has no corresponding gene on the chip is eliminated from the rest of the study. For each category, SBIME then counts the number of genes found on the chip and the number of those genes with a p-value lower than the threshold fixed by the user, and expresses the latter as a percentage of the former (Ps).

With the proportion of differentially expressed genes calculated for each category, the next step is to assess the significance of these results. There are two ways of doing this.

The Z-score approach: For each functional category containing X genes on the chip, SBIME randomly selects X genes from the entire data set and computes the percentage of these X genes with a p-value lower than the established threshold (Pr).
This operation is repeated N times (the number of iterations N is determined by the user), and a Z-score is finally computed as follows:

\[ Z = \frac{P_s - \bar{P}_r}{\sigma_{P_r}} \]

Under the null hypothesis H0, Z ~ N(0, 1). Here Ps is the percentage of significant genes found in the data set for a given functional category; Pr is the percentage of significant genes found randomly; \bar{P}_r is the mean of the N values of Pr and \sigma_{P_r} is the square root of the variance of the N values of Pr. A Z-score is thus computed for each category. Categories displaying the highest Z-scores can be considered to be of special interest in the data set studied (typically Z > 3). Finally, a p-value is associated with the Z-score in order to facilitate comparison with results obtained from the second option, described below.

Fisher's exact test: Each proportion of significant genes found by functional category is compared to the proportion of significant genes on the array. The corresponding p-value is calculated using the following formula:

\[ p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!} \]

where a is the number of significant genes on the array; b is the number of all the genes on the array; c is the number of significant genes in a given functional category and d is the number of all the genes in the same category.
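The two significance measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SBIME's actual code: the function names, the dictionary mapping gene identifiers to ANOVA p-values, the use of the population standard deviation for the N values of Pr, and the one-sided (upper-tail) conversion of Z to a p-value are all choices made here for illustration, since the text does not specify them. The last function only evaluates the factorial expression as printed for four counts a, b, c and d.

```python
import math
import random
from statistics import mean, pstdev


def category_percentage(pvalues, genes, threshold):
    """Percentage of `genes` whose p-value is below `threshold` (Ps or Pr)."""
    hits = sum(1 for g in genes if pvalues[g] < threshold)
    return 100.0 * hits / len(genes)


def category_z_score(pvalues, category_genes, threshold, n_iter, rng):
    """Z = (Ps - mean(Pr)) / sd(Pr), with Pr taken from n_iter random gene sets.

    Each random set has the same size X as the category and is drawn
    without replacement from every gene in the data set.
    """
    all_genes = sorted(pvalues)  # sorted so that a seeded rng is reproducible
    x = len(category_genes)
    ps = category_percentage(pvalues, category_genes, threshold)
    pr = [
        category_percentage(pvalues, rng.sample(all_genes, x), threshold)
        for _ in range(n_iter)
    ]
    sd = pstdev(pr)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("all random draws gave the same Pr; Z is undefined")
    return (ps - mean(pr)) / sd


def z_to_p(z):
    """Upper-tail p-value of Z under N(0, 1) (assumption: one-sided)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def table_probability(a, b, c, d):
    """Evaluate (a+b)!(c+d)!(a+c)!(b+d)! / ((a+b+c+d)! a! b! c! d!)."""
    f = math.factorial
    numerator = f(a + b) * f(c + d) * f(a + c) * f(b + d)
    denominator = f(a + b + c + d) * f(a) * f(b) * f(c) * f(d)
    return numerator / denominator
```

For example, with a hypothetical dictionary `pvalues` and a category given as a list of gene identifiers, `category_z_score(pvalues, category, 0.05, 1000, random.Random(0))` returns the Z-score for that category at a threshold of 0.05 with N = 1000 iterations. Exact integer factorials are used so that the ratio stays accurate for large counts.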
