Secondary databases consist of sequences of gene

Secondary databases consist of sequences of gene products organized under functional categories. Such databases tend to cluster, under well-defined strings, those sequences that represent the same functional product in diverse organisms. One example is KEGG Orthology (KO). We analyzed the performance of KO as a source for annotation using all entries in KO for seven prominent organisms: C. familiaris (Cfa), M. musculus (Mmu), R. norvegicus (Rno), A. thaliana (Ath), C. elegans (Cel), D. melanogaster (Dme) and H. sapiens (Hsa), totaling 25,060 proteins clustered under 17,056 distinct KO entries.
These proteins were used to annotate ESTs from four model organisms: Ath, Cel, Dme and Hsa. Each annotation was classified as correct, changed or speculated, using as control the alignment of an EST to its cognate protein. We found a small number of changed annotations (under 10%), but speculation was high (20–37%). This problem was minimized when proteins in the KEGG GENES database that were not in KO were used to assign ESTs to their cognate proteins, putatively increasing correct annotation to over 90% and reducing speculation to under 5%. The results indicate that the KEGG GENES database still contains candidates to be incorporated into the KO database.

K-EST is a web-based application that shows the annotation of large sets of ESTs from four model organisms (A. thaliana, C. elegans, D. melanogaster and H. sapiens) with the KOG database. K-EST may be used as a tool to predict EST sampling or, roughly, gene expression in novel transcriptome projects by comparison with the expression/sampling database of the four model organisms, calculated with the KOG dataset. K-EST uses statistical methods to analyze differential expression between organisms and internal cDNA libraries. Expression/sampling may be normalized by constitutively expressed transcripts: actin and GAPDH. Another important feature of K-EST is that it shows the conservation between genes of the four organisms used. Availability: http://www.biodados.icb.ufmg.br/K-EST/
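
The normalization by constitutively expressed transcripts mentioned above can be illustrated with a minimal sketch. It assumes that normalization means dividing the EST count of each KOG in a library by the EST count of the chosen reference transcript (actin or GAPDH) in the same library. The text does not give K-EST's actual formula, so this is only one plausible reading. The identifiers `KOG_ACTIN`, `KOG_GAPDH` and `KOG_X`, the function name and the counts are hypothetical placeholders, not values taken from K-EST.

```python
from typing import Dict

# Hypothetical KOG identifiers for the two reference transcripts;
# the real identifiers used by K-EST are not given in the text.
REFERENCE_KOGS = {"actin": "KOG_ACTIN", "GAPDH": "KOG_GAPDH"}


def normalize_counts(est_counts: Dict[str, int], reference: str = "actin") -> Dict[str, float]:
    """Divide each KOG's EST count by the EST count of a reference transcript.

    Assumption: normalization is a simple ratio to the reference count
    within the same cDNA library.
    """
    ref_id = REFERENCE_KOGS[reference]
    ref_count = est_counts.get(ref_id, 0)
    if ref_count == 0:
        # Without reference ESTs the ratio is undefined.
        raise ValueError(f"no ESTs found for reference transcript {reference!r}")
    return {kog: count / ref_count for kog, count in est_counts.items()}


# Made-up EST counts per KOG for one cDNA library.
library = {"KOG_ACTIN": 40, "KOG_GAPDH": 20, "KOG_X": 10}

by_actin = normalize_counts(library, reference="actin")
by_gapdh = normalize_counts(library, reference="GAPDH")
```

Expressing counts relative to a reference transcript makes libraries of different sizes roughly comparable, which is what a comparison between organisms or between cDNA libraries would need.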
