The SOFG Anatomy Entry List (SAEL) as an annotation tool for functional genomics experiments

Authors:
Stuart Aitken, University of Edinburgh, stuart@inf.ed.ac.uk
Richard Baldock, MRC-HGU, Richard.Baldock@hgu.mrc.ac.uk
Jonathan Bard, University of Edinburgh, jbard@staffmail.ed.ac.uk
Albert Burger, Heriot-Watt University, ab@macs.hw.ac.uk
Duncan Davidson*, MRC-HGU, Duncan.Davidson@hgu.mrc.ac.uk
Terry Hayamizu, The Jackson Laboratory, terryh@informatics.jax.org
Helen Parkinson**, EBI, parkinson@ebi.ac.uk
Alan Rector, University of Manchester, rector@cs.man.ac.uk
Martin Ringwald, The Jackson Laboratory, ringwald@informatics.jax.org
Jeremy Rogers, University of Manchester, jrogers@cs.man.ac.uk
Cornelius Rosse, University of Washington, rosse@u.washington.edu
Christian J. Stoeckert, University of Pennsylvania, stoeckrt@pcbi.upenn.edu

* Communicating Author
** Presenting Author

Introduction

The long study of anatomy and the need for common annotation in biology and medicine have resulted in a proliferation of biomedical ontologies built for different purposes, using different knowledge representation tools, and often very rich in terms, structure and relationship types. Anatomy components in biomedical ontologies serve varied purposes, for example the description of medical procedures in GALEN (Rector et al. 1999), or the description of traits or phenotypes. Because there are multiple anatomy ontologies, they often contain overlapping (non-orthogonal) concepts, though these are defined and structured differently within each ontology. For example, the Foundational Model of Anatomy, or FMA (Rosse and Mejino 2003), which takes a structural view of anatomy, contains the concept “liver”, which is defined in free text as “Lobular organ the parenchyma of which consists of lobules which communicate with the biliary tree”. Liver is also described formally by various attributes, including member-of, bounded-by, component-of and adjacency. If we consider liver in the Mouse Anatomical Dictionary (Hunter A. et al.
2003), which has a developmental view, the “liver” is part-of the “liver and biliary system” and developmental stage information is provided. The level of detail provided by these ontologies is variable and their purposes are clearly different, though both contain the concept liver. Merging ontologies is a complex process (Rector, 2004), as the different structures and relationship types must be reconciled, and this process may not be necessary for functional genomics applications.

Selecting anatomical terms for the annotation of functional genomics data therefore requires some knowledge of where to look, some information on the purpose and scope of the ontology queried, and an ability to critically assess whether the term returned is accurate. These tasks may be intuitive for the average scientist who has some notion of the concept behind each term. However, in a high throughput situation it is time consuming to query large and complex ontologies directly, and in our experience many scientists simply annotate with free text. This causes data exchange and query problems for those who manage functional genomics data.

With these points in mind, Standards and Ontologies for Functional Genomics (SOFG, www.sofg.org) has set up an international effort to integrate human and mouse anatomy ontologies. As part of this effort, a workshop group was formed comprising representatives from GALEN (Rector et al. 1999), the FMA (Rosse and Mejino 2003), the Mouse Anatomical Dictionary for Mouse Development (Bard et al. 1998), the Human Developmental Anatomy ontology (Hunter et al. 2003), the Anatomical Dictionary for the Adult Mouse (http://www.informatics.jax.org/searches/AMA_form.shtml), the RNA Abundance Database (RAD, Manduchi et al. 2004) and ArrayExpress (Brazma et al. 2003). The group had the following aims:

1. Explore whether an entry list to existing ontologies would be useful in a functional genomics context, specifically microarrays
2. Determine the use cases, limitations and criteria for building such a list
3. Produce an anatomy entry list of terms
4. Test the draft set of terms against existing ontologies and functional genomics data repositories
5. Consider implementation issues and a web services model for querying mapped ontologies.

The result of the workshop is the SOFG Anatomy Entry List (SAEL), an unstructured list of approximately 100 vertebrate/mammalian anatomy terms representing the major body substances (e.g. blood) and dissectable parts (e.g. liver) likely to be used in microarray or other functional genomics experiments. SAEL was drawn up by looking at various source ontologies and was subsequently tested against user-supplied vertebrate anatomy annotation in the following gene expression databases: ArrayExpress (Brazma et al. 2003), RAD (Manduchi et al. 2004), GXD (Hill et al. 2004), and SMD (Sherlock et al. 2001). The list is intended for use by both biologists and curators, as an annotation resource and as an entry point to mapped ontologies.

The SAEL terms are uniquely identified and are available as an OBO format file from www.sofg.org/sael. The terms are purposely not defined and are presented as an unstructured alphabetical list, as they are intended for simple annotation and mapping rather than as an independent anatomy ontology. The participating databases and ontologies are preparing to map SAEL terms to their resources so that users can move easily from SAEL to these more sophisticated resources. The SAEL will be made available at the SOFG web site (www.sofg.org), as will the mappings from SAEL entries to participating source ontologies. The knowledge acquisition tool COBrA, developed as part of the XSPAN project (www.xspan.org), is used to record mappings and relevant provenance data. We invite interested communities to provide feedback on SAEL.
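
As an illustration of how a flat, uniquely identified term list could replace free-text annotation, the following is a minimal Python sketch. It assumes the list is distributed as simple OBO [Term] stanzas with id and name tags; the sample content, the identifiers SAEL:1 and SAEL:2, and the function names parse_obo_terms and annotate are all hypothetical and are not taken from the actual SAEL file.

```python
from typing import Dict, Optional

# Hypothetical OBO-style content. The identifiers are invented for
# illustration and are NOT the real SAEL accession numbers.
SAMPLE_OBO = """\
format-version: 1.0

[Term]
id: SAEL:1
name: blood

[Term]
id: SAEL:2
name: liver
"""


def parse_obo_terms(text: str) -> Dict[str, str]:
    """Return a mapping from lower-cased term name to term id.

    Only [Term] stanzas with both an id and a name tag are kept;
    the header stanza and any other stanza types are ignored.
    """
    name_to_id: Dict[str, str] = {}
    for stanza in text.split("\n\n"):
        lines = [ln.strip() for ln in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        fields: Dict[str, str] = {}
        for ln in lines[1:]:
            # Split on the first colon only, since ids contain colons too.
            key, sep, value = ln.partition(":")
            if sep:
                fields.setdefault(key.strip(), value.strip())
        if "id" in fields and "name" in fields:
            name_to_id[fields["name"].lower()] = fields["id"]
    return name_to_id


def annotate(free_text: str, index: Dict[str, str]) -> Optional[str]:
    """Look up a free-text annotation; return its term id or None."""
    return index.get(free_text.strip().lower())


if __name__ == "__main__":
    index = parse_obo_terms(SAMPLE_OBO)
    for label in ["Liver", " blood ", "hepatic tissue"]:
        print(repr(label), "->", annotate(label, index))
```

Because the list is unstructured, an exact name lookup is sufficient in this sketch; labels that do not match any entry return None and would be left for a curator to resolve.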
We will also use XSPAN's web service and user interfaces to provide program-level and additional user access to SAEL and its mappings. In the first phase, this web service will simply return the accession numbers and names of mapped anatomical structures in the source ontologies. In the second phase, the central web service will interoperate with additional web services, to be deployed at the source ontology sites, which will provide access to a standardised set of properties of the anatomical structures in the source ontologies. The corresponding WSDL descriptions will be posted on the SOFG web site.

References:

Bard JBL, Kaufman MH, Dubreuil C, Brune RM, Burger A, Baldock RA, Davidson DR. 1998. An internet-accessible database of mouse developmental anatomy based on a systematic nomenclature. Mech. Dev. 74:111-120
Brazma A. et al. 2003. ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31(1):68-71
Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE, Ringwald M. 2004. The Mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res. 32 Database issue:D568-571
Hunter A, Kaufman MH, McKay A, Baldock R, Simmen MW, Bard JB. 2003. An ontology of human developmental anatomy. J. Anat. 203:347-355
Manduchi E, Grant GR, He H, Liu J, Mailman MD, Pizarro AD, Whetzel PL, Stoeckert CJ Jr. 2004. RAD and the RAD Study-Annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies. Bioinformatics 20(4):452-459
Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM. 2001. The Stanford Microarray Database. Nucleic Acids Res. 29:152-5
Rector AL, Zanstra PE, Solomon WD, Rogers JE, Baud R, Ceusters W, Claassen W, Kirby J, Rodrigues J-M, Mori AR, Haring E.v.d., Wagner J. Reconciling Users' Needs and Formal Requirements: Issues in developing a Re-Usable Ontology for Medicine. IEEE Transactions on Information Technology in BioMedicine 2(4):229-242
Rector AL. 2004. Defaults, context and knowledge: Alternatives for OWL-Indexed Knowledge bases. In Pacific Symposium on Biocomputing (PSB-2004), Kona, Hawaii. World Scientific, 226-238
Rosse C, Mejino JVL. 2003. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. J Biomed Inform. 36:478-500