Gene Ontology Based Prediction and Analysis of Microarray Data, GO-PAM
Narinder Singh Sahni
Centre for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University

DNA microarray technology permits us to analyze the behaviour of several thousand genes simultaneously. The routine case of analysis involves looking for differentially expressed genes in two-class problems, e.g. {disease/non-disease}, {stress/control}, {knock-out/wild type}. The "traditional" approach to analyzing gene expression data is to use data mining algorithms to detect differentially expressed genes, and then to relate these genes to biological pathways. The biggest drawback of this approach is that the hypothesis follows the actual analysis of the data. In other words, one first mines the data and only then forms some kind of biological hypothesis. Moreover, different data mining methods produce different lists of differentially expressed genes, leading to the discovery of different GO terms for the same data set. These sets of differentially expressed genes may have different biological interpretations, which are hard to decipher. It therefore becomes pertinent to take the biological background into account when inferring differences in gene expression. Gene Ontology (GO) provides a structured vocabulary, organized as a hierarchy in the form of a directed acyclic graph (DAG), for the annotation of genes and proteins. In this presentation, we describe a hypothesis-based approach in which we first select the biological attributes (as described in the GO database) that are of interest. Next, we select only those genes that are related to the biological attribute of interest. Finally, we build a model using only the chosen genes to test whether the chosen biological attribute shows a significant difference in the condition under study, thereby accepting or refuting the hypothesis. In this way, one can readily go through the entire list of over 22,000 attributes described in the GO database.
This approach has several advantages over the traditional method described above. Issues such as the poor signal-to-noise ratio caused by the sheer number of genes involved, validation (both statistical and biological), and, more importantly, gene selection become much more manageable. Our results from re-analyzing several published human cancer data sets show a significant improvement in error rates compared with those reported in the original articles. The proposed methodology also provides a direct link to the relevant biological attributes and pathways, thus reducing the overall effort required to analyze gene expression data.
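
The three steps of the hypothesis-based approach (choose a GO attribute, keep only the genes annotated to it, build and validate a model on those genes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term identifiers, gene names and expression values are toy data, the helper names are hypothetical, and the nearest-centroid classifier with leave-one-out cross-validation is a simple stand-in, since the abstract does not state which model or validation scheme GO-PAM uses.

```python
from collections import defaultdict

def descendants(term, parents):
    """Return `term` plus every term below it in the DAG (child -> parents map)."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen, stack = {term}, [term]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def genes_for_term(term, parents, annotations):
    """Genes annotated to `term` or to any of its descendants."""
    terms = descendants(term, parents)
    return sorted(g for g, ts in annotations.items() if ts & terms)

def loo_error(expr, labels, genes):
    """Leave-one-out error rate of a nearest-centroid classifier.

    Assumes every class has at least two samples, so a centroid can
    still be computed when one sample is held out.
    """
    n = len(labels)
    errors = 0
    for i in range(n):
        centroids = {}
        for cls in sorted(set(labels)):
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            centroids[cls] = [sum(expr[g][j] for j in idx) / len(idx) for g in genes]
        sample = [expr[g][i] for g in genes]
        pred = min(
            centroids,
            key=lambda c: sum((s - m) ** 2 for s, m in zip(sample, centroids[c])),
        )
        errors += pred != labels[i]
    return errors / n

# Toy ontology (hypothetical identifiers): TOY:3 is a child of TOY:2,
# and TOY:2 and TOY:4 are children of the root TOY:1.
parents = {"TOY:2": {"TOY:1"}, "TOY:3": {"TOY:2"}, "TOY:4": {"TOY:1"}}
annotations = {"g1": {"TOY:3"}, "g2": {"TOY:2"}, "g3": {"TOY:4"}}

# Toy expression matrix: gene -> values across six samples.
expr = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "g2": [3.0, 3.1, 2.9, 1.0, 0.8, 1.2],
    "g3": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
}
labels = ["disease"] * 3 + ["control"] * 3

# Step 1: pick the attribute; step 2: select its genes; step 3: validate.
selected = genes_for_term("TOY:2", parents, annotations)
error = loo_error(expr, labels, selected)
print(selected, error)
```

In this toy setting the genes under the chosen attribute separate the two classes cleanly, so the cross-validated error is zero. Looping the same three steps over every term of interest, and adding a significance test such as label permutation, would be the natural extension; neither is shown here.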