dictyBase – March 2006 Progress Report.
1. GO Staff.
GO Annotators: Petra Fey, Pascale Gaudet, Karen Pilcher
GO Developer: Sohel Merchant
Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, phenotypes, and sequences. All three curators work to annotate gene products of the Dictyostelium genome.
We are now providing a gp2protein file to the GO Consortium. This allows other users to see and BLAST against the sequences being annotated at dictyBase.
To conform to the weekly checking script, we have implemented internal quality control before the GO association file is committed. We have adopted the GO checking script, which checks the association files generated by dictyBase and reports any errors. In addition to checking for errors, the quality control script fixes any errors that can be modified automatically, such as replacing secondary GO IDs with primary
GO IDs.
We have created an automated process to generate IEAs using the
InterPro2GO mappings from EBI. These electronic annotations were previously provided by the Welcome Trust Sanger Institute annotation pipeline. We are using the WSInterProScan remote web service running at
EBI to generate our annotations. We plan to update these IEAs quarterly to ensure that we are in sync with the most current
InterPro2GO mappings.
Sohel Merchant is currently working on a redesign of the GO curation tools. The design changes will allow us to capture different types of physical and genetic interactions. This will include a complete redesign of the underlying database schema to capture more complicated sets of GO data. For example, we will be able to differentiate gene product X interacting with Y and Z simultaneously from independent interactions of X with Y and X with Z. In addition, we will be implementing a way to store context information such as cell type and developmental stage.
We have implemented a generic OBO file parser to parse any ontology file in OBO format. This will be available to users in the next
BioPerl release.
Pascale Gaudet is a member of the AmiGO working group.
Petra Fey is a member of the OBO-Edit working group.
2. Annotation Progress: Current GO Stats. a. Number of annotations to various GO aspects.
Apr-05 Mar-06 Increase
Total number of annotations: 18223 26229 44%
Function
Process
Component
Total number of non-IEA annotations:
8108
6283
3832
10898
9378
5953
6905 10911
34%
49%
55%
58%
Function
Process
Component
Total number of distinct non-IEA annotations:
Total number of annotations:
2440
2576
1889
3888
3946
3077
1116 1521
18223 26229
59%
53%
62%
37%
44%
IMP
IGI
IPI
ISS
IDA
IEP
TAS
NAS
NR
IEA
ND
IC
RCA
399
33
57
4296
454
4
476
6
0
11318
1091
89
0
682
58
110
6190
797
38
514
51
0
15318
2314
157
0
71%
76%
93%
44%
76%
850%
79%
750%
0%
35%
112%
76%
0%
b. Number of genes annotated to various GO aspects.
Apr-05 Mar-06 Increase
Total number of genes:
Total number of curated gene models:
Total number of genes with annotations:
13640
2274
5392
13620
3181
6299
-0.14%
40%
17%
Function
Process
Component
Total number of genes with non-IEA annotations:
4725
3986
2871
1689
5716
5194
4082
2669
21%
30%
42%
58%
Function
Process
Component
1606
1602
1505
2547
2548
2447
59%
59%
63%
3. Methods of Annotation. a. Literature curation.
In addition to gene product and phenotype annotation, dictyBase curators extract GO from Dictyostelium publications. To date, we are current with the literature since January of 2004 and are working our way backwards chronologically while staying up-to-date with new publications. b. Curation of previously unidentified genes and gene products.
Because only a small fraction of the predicted proteins have been characterized in the literature, dictyBase curators are annotating gene products that have EST coverage and/or contain conserved functional domains. Gene products of this type are annotated with the ISS evidence code as there is no published data available. c. Automated methods. i. IEAs via the BLAST method. All Dictyostelium protein sequences are analyzed by BLAST against GO gene association sequence files
(http://www.geneontology.org/index.shtml#downloads), identifying proteins in other organisms that align with Dictyostelium proteins with an E value ≤ e -50 . GO annotations that have been manually assigned to these proteins from other species are then imported and attached to the corresponding gene product in dictyBase. The proteins from which the annotations are derived are displayed in the 'Evidence' column on the
Gene Ontology evidence and references page. ii. IEAs via InterPro2GO. Protein sequences in dictyBase are scanned for conserved functional domains that exist in InterPro. Domains are mapped to GO annotations using the InterPro2GO mapping file generated
by Nicola Mulder at EBI. These GO annotations are then assigned to the respective gene products. d. Quality control measures.
We work closely at dictyBase to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://dictybase.org/SOPs) to which curators adhere. The three curators meet monthly with the goal of maintaining consistency. In these meetings, we examine a variety of publications with the intent of covering all possible issues in annotation. Annotations are typically conservative, in particular when making ISS annotations for which there is no published information about the Dictyostelium gene product.
4. Ontology Development.
Annotators have requested several additions and changes to the ontologies necessary to annotate Dictyostelium development. These requests focus on, but are not limited to, process terms to describe developmental events such as cell type differentiation and formation of developmental structures. Non-developmental terms have also been requested, including terms related to metabolism, cytoskeleton, DNA modification, and glycosylation. Additionally, we frequently deal with
InterPro2GO mappings and request deletions or additions of terms to
InterPro records. Since April 2005, we have submitted 51 items to
Curator requests and 47 items to the Annotation issues at
SourceForge.net.
5. Publications, tutorials, and presentations.
Chisholm, R.L. (2005). dictyBase: Using the Genome to Organize
Dictyostelium Biology. In Dictyostelium Genomics. W. Loomis and A.
Kuspa, editors. Horizon Press. Pp. 23-39.
Chisholm, R. L., Gaudet, P., Just, E. M., Pilcher, K. E., Fey, P.,
Merchant, S. N., Kibbe, W. A. (2006). dictyBase, the model organism database for Dictyostelium discoideum. Nucleic Acids Res 34:D423-7.
Fey. P., Gaudet, P., Pilcher, K. E., Franke, J., Chisholm, R.L. (2005). dictyBase and the Dicty Stock Center. In Dictyostelium discoideum
Protocols. L. Eichinger and F. Rivero-Crespo, editors. Humana Press. In press.
Gaudet, P., Pilcher, K. E., Fey, P., Kibbe, W. A., Chisholm, R. L.
(2005). Curation at dictyBase [poster]. The First International
Biocurator Meeting, Pacific Grove, CA.
Gaudet, P., Pilcher, K. E., Fey, P., Kibbe, W. A., Chisholm, R. L.
(2005). Curation at dictyBase [poster]. International Dictyostelium
Conference, Autrans, France.
Just, E. M., Merchant, S. N., Kibbe, W. A., Chisholm, R. L. (2005). An
Overview of the Architecture of dictyBase [poster]. International
Dictyostelium Conference, Autrans, France.
Just, E. M., Merchant, S. N., Kibbe, W. A., Chisholm, R. L. (2005). dictyBase: A Case Study of an Existing Genome Resource Adopting GMOD
Tools. Genome Informatics 2005, Cold Spring Harbor Laboratory, NY.
Just, E. M., Merchant, S. N., Kibbe, W. A., Chisholm, R. L. (2005). dictyBase: A Case Study of an Existing Genome Resource Adopting GMOD
Tools [poster]. GMOD Spring 2005, Menlo Park, CA.
Merchant, S. N., Just, E. M., Kibbe, W. A., Chisholm, R. L. (2005). An
Architectural Overview Of A Secure, Robust And Cross-platform
Integration Of The Apollo Genome Annotation Tool Into A Perl Web
Application [poster]. Genome Informatics 2005, Cold Spring Harbor
Laboratory, NY.
Pilcher, K. E., Gaudet, P., Fey, P., Just, E. M., Merchant, S. N.,
Kibbe, W. A., Chisholm, R. L. (2005). dictyBase Workshop 2005.
International Dictyostelium Conference, Autrans, France.
6. Other highlights. i. dictyMart. dictyBase developers implemented a Dictyostelium specific version of BioMart (http://www.biomart.org). This powerful new tool allows our users to do more advanced queries in the database and retrieve comprehensive, custom search results, including FASTA sequence files. ii. Downloads. dictyBase now offers a comprehensive data download site. Items available for download include various sequence datasets, functional annotations, GFF3 files, Gene Ontology association file,
Dictyostelium mutant phenotypes, the Franke Reference Library, third party tools sent to us by users, and protein domains as predicted by
InterPro. iii. Phenotype ontology. The curators at dictyBase have been working to create mutant type and phenotype ontologies. The ontologies will be used for phenotype curation in our next software release planned for
April 2006, and we plan to comply with PATO guidelines once they have been revised. iv. Dicty Stock Center. To improve integration of the Dicty Stock
Center into dictyBase, we have linked the strains in the Stock Center with their corresponding genes. The Stock Center is maintained and curated by Jakob Franke at Columbia University, New York. v. Genome Version 2.5. The latest version of the genome from the
Sequencing Centers has been integrated into dictyBase. Changes in this version include gap closures and base call corrections; these changes are being fed back into GenBank. vi. Chado. dictyBase developers have implemented the Chado database schema (www.gmod.org/schema/index.shtml). We have used the schema to create improved relationships between genes and features. The next step
is to utilize Chado for display of non-protein-coding features such as tRNAs, rRNAs, and microRNAs. vii. STKE. We have created reciprocal links between the Gene Pages in dictyBase and the pathways pages in the AAAS Signal Transduction
Knowledge Environment (STKE).