Ontology Exchange Languages for Bioinformatics

An Evaluation of Ontology Exchange Languages for Bioinformatics Authors: Robin McEntire (SB) Peter Karp (Pangea Systems) Neil Abernethy (InGenuity) Frank Olken (LBNL) Robert E. Kent (WSU) Matt DeJongh (NetGenics) Peter Tarczy-Hornoch (U of Washington, Seattle) David Benton (SB) Dhiraj Pathak (GW) Gregg Helt (UC Berkeley) Suzanna Lewis (UC Berkeley) Anthony Kosky (GeneLogic) Eric Neumann (NetGenics) Dan Hodnett (NetGenics) Luca Tolda (Merck KGA) Thodoros Topaloglou (GeneLogic) August 1, 1999 Abstract Ontologies are specifications of the concepts in a given field and the relationships among those concepts. The development of ontologies for molecular-biology information and the sharing of those ontologies within the bioinformatics community are central problems in bioinformatics. If the bioinformatics community is to share ontologies effectively, ontologies must be exchanged in a form that uses standardized syntax and semantics. This paper reports on an effort among the authors to evaluate a number of alternative ontology-exchange languages, and to recommend one or more languages for use within the larger bioinformatics community. The study selected a set of candidate languages, and defined a set of capabilities that the ideal ontology-exchange language should satisfy. The study scored the languages according to the degree to which they provided each capability. In addition, the authors performed several ontologyexchange experiments with the two languages that received the highest scores: OML and Ontolingua. The result of those experiments, and the main conclusions of this study, was that the frame-based semantic model of Ontolingua is preferable to the conceptual graph model of OML, but that the XML-based syntax of OML is preferable to the Lisp-based syntax of Ontolingua. 1. Introduction Ontologies, as specifications of the concepts in a given field and the relationships among those concepts, provide insight into the nature of information produced by that field and are an essential ingredient for any attempts to arrive at a shared understanding of concepts in a field. Thus the development of ontologies for molecular-biology information and the sharing of those ontologies within the bioinformatics community are central problems in bioinformatics. If the bioinformatics community is to share ontologies effectively, the ontologies must be exchanged in some standardized form, such as using a file with a welldefined syntax and semantics. Exchange of bioinformatics ontologies will be simplified if the community can agree on a relatively small number of such exchange forms --- ideally, on one form. This paper reports on an effort among the authors to evaluate a number of alternative ontology-exchange languages, and to recommend one or more languages for use within the larger bioinformatics community. The evaluation effort involved three separate meetings in 1998 and 1999 by the authors, as well as experiments with the proposed ontology languages. In phase I of the evaluation, the authors selected a set of candidate languages, and a set of capabilities that the ideal ontology-exchange language should satisfy. The authors then scored the languages according to the degree to which they provided each capability. In phase II of the evaluation, the authors performed several ontology-exchange experiments with the two languages that rated the highest during phase I, which were OML and Ontolingua. This paper describes the evaluation process and its results in more detail. A web site maintained by the authors can be found at http://wwwsmi.stanford.edu/projects/bio-ontology/. 2. Motivations This section discusses the motivations for this work in more detail. Ontology development is important because every biological database employs an ontology, either implicitly or explicitly, to model its data. The more fine-grained the ontology, the more precisely the database will be able to model the nuances of the data that it tries to capture. A coarse-grained ontology will model only superficial aspects of the data, and therefore may not capture data elements that are important for some problem-solving task. For example, a genome-sequence database that fails to record which genetic code is used to encode a given DNA sequence does not provide the information that users of the database will need to reliably translate each DNA sequence into the corresponding protein sequence. A semantically malformed ontology is one that incorrectly models the semantics of its application domain, and therefore yields a database whose structure corrupts or restricts the information that it is intended to hold. For example, a metabolic database that defines a one-to-one relationships between enzymes and the reactions they catalyze cannot reliably model the fact that a bifunctional enzyme catalyzes two separate reactions. Ontology sharing is important for a number of reasons. First, ontology development is time consuming. Different bioinformatics groups who wish to develop ontologies for the same types of biological information will often arrive at a solution faster by adopting an existing ontology than by developing a new ontology de novo. For example, a group that wishes to define an ontology for microarray gene-expression data will almost certainly accomplish this task more quickly by consulting one or more existing microarray ontologies. Second, if different bioinformatics databases that cover the same types of data (e.g., protein sequences) employ the same ontology, they simplify the problem of database integration, i.e., of processing queries across multiple biological databases. Different ontologies for the same types of data produce a semantic mismatch that complicates the multidatabase query problem. Third, bioinformatics databases must make their schemas available to their user communities if the users are to have a full understanding of the semantics of these databases. Fourth, ontology sharing is important because ontologies themselves constitute a form of biological knowledge that is quite valuable when shared within the bioinformatics community. For example, the taxonomy of enzymatic reactions developed by the Enzyme Commission {EC}, and the taxonomy of gene function developed by Riley {RileyOntol} are valuable bioinformatics ontologies. Fifth, differences between ontologies puporting to represent the same biological process may lead to important insights into ways of improving those representations, and/or new insights into the underlying biology. 3. Terminology Ontologies are defined in the literature in a number of ways with varying degrees of formality. One prevailing definition of an ontology is a specification of a conceptualization that is designed for reuse across multiple applications. By conceptualization, we mean a set of concepts, relations, objects, and constraints that define some domain of interest. One can argue at length about what is and is not an ontology {Gruber,Guarino}. Our view is that ontologies exist at several levels of complexity:  A controlled vocabulary is an ontology that simply lists a set of terms.  A taxonomy is a set of terms that are arranged into a generalizationspecialization hierarchy. A taxonomy does not define attributes of these terms, nor does it define relationships between the terms.  An object-oriented database schema defines a hierarchy of classes, and attributes and relationships of those classes.  A knowledge-representation system based on first-order logic can express all of the preceding relationships, as well as negation and disjunction. The GeneClinics experiment (see www.geneclinics.org) illustrates this range of complexity among different ontologies. One of the first steps of the experiment was to augment the object-oriented schema with a richer set of capabilities including disjunction, role restriction and other constraints. In the GeneClinics object database much of this information was in fact represented in the Java software interacting with the database but was hidden from the end user. 4. Candidate Languages In this section we will discuss the candidate ontology-exchange languages that were evaluated by the authors. We discuss the reasons each language was selected for consideration as a bioinformatics ontology exchange language, we list the developers of each language and the reasons for its development, and provide references for each language. 4.1. Ontolingua The Ontolingua language was developed by a group at Stanford University for the exchange of ontologies, and was originally funded by the DARPA Knowledge Sharing Effort (Ref). Ontolingua is one of the most significant efforts to come out of the knowledge representation community and is based on the Knowledge Interchange Format (KIF), a language specifically built for the sharing of knowledge among different knowledge representation systems. The authors believed that any evaluation of languages for the exchange of ontologies must include this project. The semantics of Ontolingua are based on the frame knowledge representation systems developed by knowledge-representation researchers {Fikes,KarpReview}. 4.2. CycL Cyc is perhaps the best-known of the knowledge representation systems and is significant in its scope and its longevity. Cyc was developed by Doug Lenat at MCC but has since spun-off as a commercial entity, Cycorp. The underlying representation language for Cyc is called CycL, which derives from first-order predicate calculus but with extensions for additional expressivity. Cyc is one of the most significant commercial products, if not the most significant, in the marketplace currently. For this reason, as well as it's significance within the knowledge representation community and the rich expressive abilities, it was selected for evaluation. 4.3. OML/CKML Ontology Markup Language/Conceptual Knowledge Markup Language (OML/CKML) is a relatively new effort coming out of Washington State University that is attempting to base a system for the expression of ontologies on an XML-based syntax. The OML effort was begun in the 1990's and, though relatively young and untested, the authors believed it to have a significant representational power. This representational power combined with the interoperable nature of an XML-based language was believed to be a combination worth investigating. In addition, since OML/CKML is currently under development there is a potential for co-development to allow the bioinformatics community to influence features and expressive power of the language. There is, though, a possible disadvantage in that the language may evolve in ways that are not to the advantage of the community or is perhaps not stable or standardized. 4.4. OPM OPM was interesting to the authors as a candidate language for exchange of ontologies because of the significance of the OPM system, a product from GeneLogic used in a number of Pharmaceutical/BioTech organizations. OPM, as a product, is used for the integration of multiple information sources, and uses an underlying object-oriented federated schema for this purpose. 4.5. XML/RDF Extensible Markup Language/ Resource Description Format (XML/RDF) were developed by the W3C. The current standard for the XML Schema Language is controlled by the XML Schema Working Group of the W3C. (RDF) is intended to encode metadata concerning web documents. XML/RDF were investigated as a part of the evaluation effort because of the significance of the web and web-based applications. It is clear that the web is rapidly becoming the primary method for the exchange of information and data, and that XML is currently the leading candidate for a generic language for the exchange of semi-structured objects. XML/RDF as is, without a higher level formalism that encompasses the expressivity present in frame-based languages does not go far enough to allow the kind of modeling needed in the bioinformatics community. 4.6. UML The Unified Modeling Language (UML) provides a set of notational conventions that can be used by software application designers/developers to model their software system. UML was developed by Rational Software and is currently backed by Rational, Microsoft and the OMG. UML was selected for evaluation because it is another widely-used system for the representation of objects and their relationships. 4.7. OKBC The Open Knowledge Base Connectivity (OKBC) is an API for accessing and modifying multiple, heterogeneous knowledge bases. OKBC is not actually an ontology exchange language – it is a programmatic API. This group considered it because its knowledge model was designed to capture ontologies. The OKBC effort began as a part of the recent DARPA High Performance Knowledge Base (HPKB) program, and is the successor of Generic Frame Protocol (GFP), a frame representation system developed at the Artificial Intelligence Center at SRI. OKBC was created because it provides a uniform model that can be understood across a number of knowledge representation systems. The work on OKBC is currently being overseen by a working group lead by Richard Fikes at Stanford. Voting members in this group are; ISI, Stanford KSL, SRI International, Cycorp, SAIC and Teknowledge. 4.8. ASN.1 ASN.1 was included in this evaluation because of it's historical significance as an early language for the exchange of datatypes and simple objects. The ASN.1 standard was developed as part of the OSI networking stack. It has been, and still is, being used in a number of bioinformatics applications from the National Center for Biotechnology Information. ASN.1 was also used in conjunction with the Unified Medical Language System (UMLS) project at the National Library of Medicine (NLM), however, production of ASN.1 encodings of the UMLS has been discontinued because of low demand for ASN.1 by UMLS users. 4.9. ODL The Object Definition Language (ODL) is a relatively new standard coming out of the Object Database Management Group (ODMG) in the early 1990's. ODL was selected for evaluation because it is currently a de facto standard for a common representation of objects for object-oriented databases and programming languages and so has the potential to become a standard supported widely throughout the industry. The ODMG member companies include almost all organizations in the ODBMS/ODM industry and is very closely aligned with the OMG. 5. Evaluation 5.1. Evaluation Part I: Initial Evaluation Selection of Candidate Languages The evaluation process began with the selection of known languages for expressing ontologies. Our selection process relied on an informal review of current literature and prior knowledge of participants, but, we believe, covers the most viable candidate languages for the exchange of ontologies. The languages, once selected, were then divided among the authors for evaluation. Selection of Evaluation Criteria In order to evaluate the languages in a consistent fashion the authors arrived at a set of questions over which each candidate language would be evaluated. The questions that were distributed to members of the working group is included in Appendix A. The questions were divided into the following five major categories; 1. Language Support and Standardization: This section includes general questions about the depth of support for the language, including technical support and relationship with standards efforts 2. Data model/capabilities: This section asks about the richness of the expressive capabilities of the language. 3. Querying: This section poses questions about the capabilities of query languages available for a representation language. 4. Performance: Though not related to issues of the expressiveness of the language, the authors wanted to capture some notion of what might be expected in terms of performance if we were to use a given language. 5. Other Issues: This section is more concerned with pragmatics, such as current use of the language and representation of, or connectivity to, non-ontology sources. Evaluation Matrix The final judgement of the authors for the initial evaluation phase was guided by a matrix of the aspects of an exchange language that were considered key to it's use by members of the Bio-Ontology Consortium (http://wwwsmi.stanford.edu/projects/bio-ontology/) and other groups who may want to build ontologies in the area of molecular biology. This evaluation matrix is included in Appendix B. Selection of Languages for further Evaluation The authors decided that there was not a single language that stood out as the only appropriate candidate for recommendation as a language for the exchange of ontologies. It was clear that representational expressiveness was not adequate in some languages, and so they were eliminated from consideration. For example, some languages were unable to encode ground facts (instance objects). Also, some languages were in part or in whole proprietary, or had a significant cost associated with them. This was considered prohibitive to the successful adoption and use of the languages and so these languages were also eliminated . It was decided that two languages, Ontolingua and OML/CKML, provided enough expressivity to warrant a more in-depth evaluation. 5.2. Evaluation Part II: OML and Ontolingua The second phase of the evaluation process focused on the two candidate languages that were deemed most interesting from the initial evaluation: Ontolingua and OML/CKML. The authors decided that it would be useful to create a small model in each language in order to judge the utility and the representational richness of each language. A set of experiments were developed to perform this detailed evaluation. Three sets of experimenters were undertaken. The three experiments and their results are discussed below. OML Representation of the EcoCyc Gene Ontology Experiment: Peter Karp's group at Pangea Systems performed an experiment to better understand the OML language by translating the EcoCyc gene ontology into OML. The gene ontology is a taxonomy of 150 classes that classify microbial genes according to their functions, and that was developed by Dr. Monica Riley as part of the EcoCyc project.{Riley,EcoCyc} Within EcoCyc, the ontology can be accessed at http://ecocyc.panbio.com:1555/class-subs?object=Genes The OML encoding of the ontology can be accessed at: http://ecocyc.panbio.com/~pkarp/omlgenes.txt Results: Our findings were that OML was able to capture most aspects of the gene ontology. However, we identified what we consider to be a number of limitations of OML during the course of this experiment. 1. A number of aspects of the terminology used in the tags in OML files are not at all intuitive, and are not consistent with the terminology used in the more mainstream ontology community. This terminology will interfere with the acceptance and understanding of the language in the bioinformatics community. We suggested that OML could allow several alternatives for each tag to allow the language to be accepted by different communities that use different terminology. 2. The OML definitions are not modular in the sense that the OML definition of a given Class is spread out into several parts of the file, making OML files less human readable. 3. OML has a number of limitations in its expressive power: a) It cannot express facets directly (attributes of attributes), but R. Kent suggested that N-ary relations can be used to express facets. b) It cannot express annotations. c) It cannot handle multiple collection types -- sets only. d) It cannot express cardinality or numeric-range constraints. Ontolingua Representation of the EcoCyc Gene Ontology Experiment: Results: Representation of GeneClinics data model as an ontology Experiment: Peter Tarczy-Hornoch at the University of Washington in collaboration with Luca Toldo and Robert Kent performed an experiment with the general goal of using the existing GeneClinics OODB model as the basis for an ontology to assess OML/CKML and Ontolingua for ontology creation/exchange. The specific goal was to develop a small representative ontology in both Ontolingua and OML/CKML that represents key clinical and molecular entities and their linkages. design of the experiment was: 1. A distributed e-mail based experiment involving three investigators at three sites 2. The GeneClinics investigator developed 5 page document outlining subset of high level (coarse grain) GeneClinics OODB model. The scope of this model was to represent key clinical entities (clinical diagnoses, tests), key molecular entities (genes, loci, products, alleles, mutations), and their inter-relationships (causality maps to diagnoses, clinical tests for molecular entities) 3. The whole group clarified points including disjunctions, restrictions, other constraints not in OODB model 4. The developer of OML/CKML (one of the three participants) implemented/refined OML/CKML ontology 5. A specific instance (Charcot Marie Tooth type 1A) was represented 6. The same ontology was represented in parallel in Ontolingua 7. A specific instance (CMT 1A) was represented in Ontolingua 8. The OML/CKML and Ontolingua experiences were compared and contrasted 9. A handful of very granular elements were implemented (chosen to “stress” each language and compare robustness) Results: 1. The underlying paradigms of Ontolingua and OML/CKML are subtly different – frames based vs. conceptual graph based (formal concept analysis, information flow theory). Both require effort to learn the paradigm if you are not familiar with it. 2. Ontolingua concepts mapped more closely to object databases and object oriented programming paradigms -- thus might be easier for typical bioinformaticist to learn. 3. Minor difference in namespace -- Ontolingua requires name to be a unique identifier. 4. OML/CKML’s XML syntax makes it easier to learn than Ontolingua with its LISP syntax. 5. Neither language has the type of documentation of its syntax and semantics that would be needed for a tutorial for a bioinformaticist. Ideally the tutorial/documentation would need to include both formal representation of syntax with modified BNF format as well as selected examples drawn from biology building in complexity. For example, how do you represent a biological entity like a protein, how do you express the concept that a sequence of DNA codes for that protein, how do you express that proteins have one or more of the following list of functions, etc. 6. Both languages very expressive – Ontolingua’s expressivity is easier to see in both LISP and in the Ontolingua ontology-development tool because it is exposed even in simple case examples. OML/CKML expressivity is rich but harder to determine since a) it is not apparent is simpler examples, b) things like local theories and other concepts are powerful but harder to understand (documentation in conceptual graph paradigm, documentation and specification both evolving). In principle the OML/CKML conceptual graph model may be richer and more expressive than the frame model; an exact comparison of the two models would be useful. 7. Both languages able to handle needs of GeneClinics sample ontology (not a complex ontology). 8. Conceptual graph paradigm dense but very powerful (see document Designator-Facet.doc for examples). 9. Though not per se an attribute of the languages themselves it is important to note that software tools and applications, such as editors, browsers, parsers, translators, and query systems, exist for Ontolingua but not for OML/CKML. 10. OML/CKML is "an uninstantiated formalism" at some level. 11. The availability of the developer of OML/CKML (R. Kent) for collaboration on this project was immensely helpful. Conclusions: The expressive power of the two languages is similar and more than adequate for the purposes of expressing a part of the GeneClinics data model as an ontology. OML/CKML is however theoretically more powerful being based on a conceptual graph methodology. The Ontolingua frames semantics/paradigm on the other hand may be easier to learn since it is less of a leap from object database and object programming paradigms. The LISP syntax of Ontolingua could present a challenge to many bioinformaticians and the XML syntax of OML/CKML is likely to be more intuitive. Ideally an ontology exchange language would have an easy to learn basic semantics and syntax (like XML) but be very expressive (like OML/CKML and Ontolingua). Neither language as it stands quite achieves this ideal though a more frame-based version of OML/CKML or an XML encoding of Ontolingua might come closer. Finally, for the general bioinformatics community (not versed in ontology representation) it might be helpful to create documentation and tutorials that use biological examples. 5.3. Evaluation Part III: Recommendations At its last meeting, the BioOntology Core Group reached the following conclusions and recommendations. The core group reached two major decisions for the selection of a language for the exchange of ontologies for molecular biology: 1) A traditional frame-based approach for representation of biological entities is sufficient for current needs. In addition, frame-based systems have been in use for a significant period of time and are, in general, stable representation systems. Among frame-based systems Ontolingua is clearly one of the most prominent and has had extensive use for many years. 2) XML has tremendous momentum with significant interest from commercial organisations and a serious standardisation effort. We anticipate that XMLbased tools and web servers supporting XML are beginning to appear and more are on the horizon. The belief of the group was that the language that the bioinformatics community needs for the exchange of ontologies should be based on frame-based semantics with an XML expression. However, the group also believed that we did not have such a language before us since Ontolingua is frame-based but without an XML expression and OML does have an XML expression, but is based on conceptual graphs, not frames. At the meeting Peter Karp presented preliminary work that he and Vinay Chaudhri, from SRI, had done on producing an XML expression based on the OKBC knowledge model, which in turn is very closely related to Ontolingua (the Ontolingua developers were also involved in the development of OKBC). The consensus of the group was that we recommend the use of a frame-based language with an XML syntax for the exchange of ontologies, and, to that end, the group requested that Karp and Chaudhri complete their work on the XML expression of Ontolingua, so that the group could complete its evaluation of exchange languages. 6. Summary Over the last two decades, the knowledge representation and object-oriented database communities have developed a number of languages that may be used for the expression of semantic database models. These languages share many elements in common, and are exemplified by the frame knowledge representation systems used in the knowledge representation community. Frame systems have been used in many different bioinformatics projects, and the authors believe that frame systems provide the necessary representational constructs to model ontologies for molecular biology. Furthermore, frame systems have a significant amount of history and use, so that they provide a stable representational paradigm. The authors also believe that the explosion of the web and the languages associated with it simply cannot be ignored. Acceptance of an exchange language that is expressed in a Lisp syntax will be limited within the bioinformatics community, even though the underlying representational system may be identical to that expressed in a web-based language. For this reason the authors believe that an XML-based syntax must be used for a bioinformatics ontology exchange language to increase the likelihood that the language will see widespread acceptance. In summary, the results of this evaluation suggest two directions for future work: development of an XML expression for the Ontolingua model, or adapting OML/CKML to include a frame-based semantic model. 7. Future Directions The authors support the use of a frame-based exchange language using an XML syntax. Several researchers on the evaluation team are currently developing a specification of XML expression of Ontolingua using OKBC. A separate set of researchers on the team are pursuing a frame-based version of OML. The exchange language evaluation team will meet again to consider the question of whether either, or both, of these efforts provide an acceptable exchange language that meets the groups requirements. References EC Edwin C. Webb, "Enzyme Nomenclature, 1992: Recommendations of the nomenclature committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzymes", Eur. J. Biochem., Academic Press, 1992. Fikes Fikes, R. and Kehler, T., "The Role of Frame-Based Representation in Reasoning", Communications of the Association for Computing Machinery, 1985, 28(9):904-920. Gruber Gruber, T.R., "A translation approach to portable ontology specifications", Knowledge Acquisition, 1993 5:199-220. Guarino Guarino, N. and Giaretta, P., "Ontologies and knowledge bases towards a terminological clarification", in Towards very large knowledge bases, IOS Press, Amsterdam, 1995, N.J.I. Mars, pp25-32. KarpReview Karp, P.D., "The design space of frame knowledge representation systems", SRI International AI Center, 1992, #520, URL ftp://www.ai.sri.com/pub/papers/ karp-freview.ps.Z. RileyOntol Riley, M., "Functions of the gene products of Escherichia coli", Microbiological Reviews, 1993, 57:862-952. 8. Appendix A - Evaluation Questions The following questions were asked of each candidate language during the Phase I evaluation process. Language Support and Standardization: 1. Is a formal specification of the syntax of the ontology language available? How complex is its syntax? Please present that formal specification of the language at the meeting. 2. What parsers are available for the language? What translators are available to convert between language L and other ontology-description languages? How complete are those translators? 3. What other software is available that operates on the language, such as for web-based publishing of ontologies or browsing/editing of ontologies? 4. What support (documentation, training, tutorials, e-mail) is available for the language? 5. Does it have any development/usage standards? Who controls this standard? 6. Does a stable release of the language exist (i.e. one that will not fundamentally change in 6 months)? Data model/capabilities: 1. What assumptions does the language make about the ontology to be represented? 2. Which of the following does the language support:  negation  conjunction  disjunction  recursion  relations  multiple inheritance  multi-valued slots  number restrictions on roles  role hierarchies  transitive roles  axioms  template/default values 3. 4. 5. 6.  method slots (calculated values?)  constraints If the language supports constraints, how rich is the constraint language? Is the constraint language formally defined? What are the primitive data types in the language? What database data model(s) does the language support? Does the language encode instances as well as classes (data as well as schema?) Querying: 1. Are there tools for query an ontology expressed in this language? If so, ... 2. How are queries expressed? 3. Which of the following queries can be expressed in the query language:  what are the parents of concept C?  what are the children of concept C?  what could I say about concept C (e.g. what roles are legally applicable to C)?  is concept C satisfiable?  what role-fillers can a role have for a concept C?  what English expression does C have?  is C a kind of D?  what is the least common parent of C and D  what is the greatest common child of C and D  are C and D equivalent?  Can queries be translated/compiled into a standard programming/query language? Performance: [These questions are more about ontology tools (editors, viewers, ...) than language.] 1. Are there any limits (or the limits of available translators/parsers) in the size of the ontology, the length of names/values, etc. (theoretical or practical). 2. What is the overhead (bytes) for a language parser? interpreter? 3. For resources which depend on an information service for support (such as Ontolingua), does the service have the capacity to support all of the users of the technology? Other Issues: 1. What example applications exist which utilize the language? How many of these are from or representative of the bioinformatics domain? [The two questions below are asking about the ability to express non-domain relevant information in the ontology, so that, for example, one could include user model information (preferences for viewers, etc) or database access information (for access to persistent instance-level information) in the domain model.] 2. Can the ontology be partitioned, for example, into biology and bioinformatics (e.g. a protein has an accession number)? 3. Can the core ontology be extended to include other information, e.g. mappings to functions in databases, control information for showing the ontology through interfaces 9. Appendix B - Evaluation Matrix, Part I The table below shows the evaluation of candidate languages over general information. Property ASN.1 ODL Onto OML/ OPM XML/ UML CKML RDF Formal Yes No Yes Yes Yes Yes Yes Syntax? Translato No Yes Loom,I No Relatio No No rs DL,KIF nal,AS ,CLIPS, N.1,XM etc L,HTM L,ER Software Parser Parsers WWW No Yes XML Rationa Tools s browser toolkits l Rose s,editor s,compa rison tools Support ?? yes WWW WWW Docum WWW Formal docume gramma entation sites, courses, ntation, rs, ,trainin mailing books, FAX,tut WWW g,tutori lists, tutorials orial,su exampl als books pport es staff Controlli ng Org Stability Users Bioinfo Users Develope rs ISO ODMG Stable Stable Yes OO Vendor s Yes NCBI ?? OO Vendor s Stanfor dU Stable WWW users WSU Evolvin g Intel apps SB, Stanfor d RoboW eb Yes Stanfor d WSU GeneLo gic Inc Stable W3C OMG Stable Yes, Bix and others PDB Evolvin g WWW develop ers No GeneLo gic many, many many parts of industry SB, (probab ly many other pharma s) Rationa l Rose 10. Appendix C - Evaluation Matrix, Part II The table below shows the results of evaluation over detailed properties of the representational expressiveness of candidate languages. Property ASN.1 ODL Onto OML/ OPM XML/ UML CKML RDF Negation No No Yes Yes ?? No No Conjuncti No No Yes Yes ?? No No on Disjuncti No No Yes Yes ?? No No on Relations Yes Yes Yes Yes Yes Yes Yes Multiple No Yes Yes Yes Yes Yes No Inheritan ce Inverses No Yes Yes Yes Yes No No MultiYes Yes Yes Yes Yes Yes No valued slots Multiple Yes Yes No No ?? Yes No collection types Number restrictio ns Slot hierarchie s Facets Default Values Other slot constraint s Primitive Datatypes Data Model Instances and classes No No Yes Yes ?? No Yes No No Yes Yes ?? No No No No No No Yes No Yes Yes ?? ?? No Yes No No No No Yes Yes No No No Standar d Object/ Logic Standar d Object/ Logic Standar d Object None N/A SemiStr uctured data Object Yes Yes No Yes No Standa Standar rd d Object Object w/o inherit ance No No Comparison of the expressive power of the ontology-exchange languages. The meanings of the rows are: 1. Negation: Does the language allow the assertion that a relation does not hold between x and y. 2. Conjunction: Does the language allow the assertion that a relation holds both between (x, y) and between (x, z). 3. Disjunction: Does the language allow the assertion that a relation holds both between (x, y) or between (x, z), but not both. 4. Relations: Does the language allow the mapping of the elements of a set A to the elements of a set B. 5. Multiple inheritance: Can the language describe inheritance of a child class from multiple parent classes? 6. Inverses: Can the language encode that slot X and slot Y are inverses of one another? 7. Multi-valued slots: Can the language encode slots that may have multiple values? 8. Multiple collection types: Can the language encode slots with different collection types such as bags, sets, and sequences? 9. Number restrictions on slots: Can the language encode constraints on the number of values a slot may have? 10. Slot hierarchies: Can the language encode taxonomic hierarchies of slots? 11. Facets: Can the language encode facets (facets encode properties of slots)? 12. Default values: Can the language encode default slot values? 13. Other slot constraints: Can the language encode other types of constraints on slot values, such as numeric ranges? 14. Primitive datatypes: What primitive datatypes does the language support? ``Standard'' indicates standard datatypes such as numbers, strings, Boolean. 15. Data model: What database data model does the language support? 16. Instances and classes: Can the language encode information about instance objects as well as class objects? Appendix D - Evaluation Matrix The table below was used by the authors to evaluate the initial candidate languages after our evaluations over the questions was complete. This table shows the desired attributes of an exchange language, and how each language can be rated along those aspects. A plus sign, '+', indicates a positive. More than one plus sign indicates more significant positives. The minus sign, '-', indicates a negative evaluation of a criteria. Also, AF indicates that the language/product is free to academic organizations, and classes & instances multiple inheritance constraints defaults expressive power tools available* Onto XML OML OKBC /RDF + + + + + + + + ++ ++ + + + + + +++ + +++ ++ lisp Java lisp, Java, (AF) C stability + + + support + ++ + + translators ++ + ? + many applications + + + + open language + + + + OPM CycL - + UML /XMI + + + + + + ++ Java, C++ + + + + + + + +++ lisp, Java, C (AF) + + KIF. Loom + + + + + simplicity: human good low low simplicity: formal good good good open to + ++ ++ collaboration good good good good low + STATUS out out out out out * By "tools available" the authors mean browsers and editors for the language.

Ontology Exchange Languages for Bioinformatics

Related documents

Products

Support

Ontology Exchange Languages for Bioinformatics

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib