What is EDAM?
EMBRACE Data and Methods
Ontology for bioinformatics tools and data
• A set of defined terms, relationships between terms and rules that govern the terms and relations
• Glorified glossary, with terms organised by is_a relations (class/subclass) into a hierarchy
• Controlled vocabulary for describing:
   • Web services, e.g. WSDL files
   • Standalone tools
   • Web servers
   • Databases
   • Data, e.g. XSD data schema associated with a WSDL file
   • Data syntax and file formats
• Aims to describe, at a coarse level, all major bioinformatics databases, data and tools in use
• The "beta" release covers tools (and associated data) in the EMBRACE Registry: http://www.embraceregistry.net/

Scope
EDAM includes 7 sub-ontologies (branches of terms in their own namespace) in the domain of "bioinformatics tool and data description":
• biological entity: "Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence"
• topic: "A general field of bioinformatics study, data, processing and analysis or technology."
• operation: "A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context."
• data resource: "A category of content of a data source including databases and ontologies."
• data: "A semantic description of a data entity (datum) commonly used in bioinformatics."
• format: "A reference (typically a URL) of a data format specification."
Required terms not specific to this domain might (eventually) be removed, including the entity branch (which provides biological context for other branches).
Conceptual model
• Bold text within a box indicates a namespace (top-level term)
• Non-bold text within a box indicates a minor branch
• Text next to lines indicates a relation between two terms

Design Principles
It wasn't just thrown together (honestly) ...
• Clearly defined scope
• A purpose-independent design, not tied to a particular use case
• Relevant to annotation of current:
   • WSDL files
   • XSD schema
   • Standalone databases, servers and tools
• Comprehensive, with enough terms to be useful
• Comprehensible, with terms and relations that are simple and intuitive
• Uncluttered, including only commonly used terms and with as few relation types as possible
• Navigable, with a simple class (is_a) hierarchy
• General, including terms of general use and excluding fine-grained specialised concepts
• Complementary to (not duplicating) other established ontologies
• Compatible (e.g. cross-referenced) with existing resources
• Integrity, compatible (so far as possible) with "upper level" ontologies
• Extensible, with clear guidelines for developers
• Convenient, with clear guidelines for annotators
• Ideally, supports automated logical inference (reasoning software)
• Validatable
There is a compromise between "ontological correctness" and usability: a pragmatic approach is essential!

Limitations
EDAM is not / does not:
• Describe syntax or file formats in detail (syntax namespace will provide references)
• Define data structures. Although has_part / is_part_of relations are defined, they are not currently used.
• Include terms for every conceptual part of things. Typically a datatype is only listed if it is known to be in common use.
• A catalogue of individual data structures, databases etc. Terms correspond to classes; specific instances are not included.
• A full-strength ontology. Many relations and other domain features that could be expressed, e.g. in OWL format, are not modelled.
• A way (in itself) to identify or unify all services and data (but it might help).
• Complete (and arguably never can be).

Sources (current version)
Software collections and registries:
• EMBRACE Web Services
• EBI Web Services
• EBI databases and retrievable fields known to the EB-eye web services
• EMBOSS including EMBASSY packages (>200 applications)
• WHAT-IF data and services (see also WHAT-IF help)
• Lists of tools from the Web
Domain ontologies:
• myGrid ontology
• NAR Databases
• NAR web servers
• Sequence (sequence-related terms)
• Sequence service (sequence service terms)
Database-related terms:
• dbxref.txt (databases cross-referenced in UniProtKB/Swiss-Prot)
• List of databases collated by the ELIXIR project
• Lists of databases from the web
Other (not used as source of terms):
• MI (molecular interactions)
• MIRIAM Resources
• bio2rdf

Sources (to consider)
1. BioMoby:
   • BioMoby Object Ontology (datatypes)
   • BioMoby Namespace Ontology (namespaces)
   • BioMoby service types (analysis types)
   • BioMoby web service registry (Moby-compliant services)
2. Tool collections and registries:
   • PSICQUIC services
   • Web services lists and registries
   • Services supported by the bio* projects
3. Domain ontologies:
   • PDBML Schema (Protein Data Bank Markup Language)
   • Sequence Ontology (sequence annotation and annotation exchange)
   • BioPAX ontology (biological pathway data)
   • Ondex ontology
   • DAS (sequence annotation)
   • Map (biological map-related terms from Gramene database)
4. XML formats: BSML MACSIM HSAML BEAST MSAML PHYLIP JalView 2 Project AlignmentML EBI Application XML UniProtKB RDF
5. Other:
   • MSD/PDBe API
   • OMG LSR documents

Download
"Beta" version in OBO (Open Biomedical Ontologies) format: http://sourceforge.net/projects/edamontology/files/

Status
• "Beta" version intended primarily for testing and feedback
• Starting point for service nomenclature
• Coverage is quite broad in general and quite deep for sequence analysis:
   • ~2000 terms with definitions
   • 8 basic types of relation (plus inverse relations)
• Relations are defined but not used in many term definitions.
Relations will be added in the future depending on requirements.
Maturing nicely through iterative cycles of development:
• Term names, definitions and hierarchy (is_a relations) in all branches are reasonably stable
• Future versions will not be a fundamental departure
EDAM is being actively developed:
• OBO uses IDs to uniquely identify terms. EDAM IDs will persist between versions: a given ID is guaranteed to identify the same concept. This does *not* imply that term names, definitions and other fields will remain constant, but they will remain true to the concept.
• Obsolete terms will also persist (they will not be removed and will keep their ID).
Suggestions, requirements and collaborations welcome!

License
EDAM is made available to all without any constraint or license on its use or redistribution other than:
• EDAM is clearly acknowledged as the source of the product.
• EDAM files displayed publicly include the publication date and/or version number.
• EDAM files are not altered and subsequently redistributed under their original name or with the same term identifiers.
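The points made under "EDAM is being actively developed" (persistent IDs, obsolete terms that keep their ID) and the is_a hierarchy can be checked mechanically on an OBO file. The following is a minimal sketch only: it assumes the standard OBO tags id, name, namespace, is_a and is_obsolete, it ignores most of the OBO format, and the EXAMPLE: identifiers and term names are invented for illustration (they are not EDAM terms).

```python
from collections import defaultdict

# Hypothetical OBO excerpt: IDs and names are invented, not real EDAM terms.
SAMPLE_OBO = """\
[Term]
id: EXAMPLE:0000001
name: Example parent
namespace: operation

[Term]
id: EXAMPLE:0000002
name: Example child
namespace: operation
is_a: EXAMPLE:0000001 ! Example parent

[Term]
id: EXAMPLE:0000003
name: Example retired term
namespace: operation
is_obsolete: true
"""


def parse_obo_terms(text):
    """Return {term id: {tag: [values]}} for every [Term] stanza."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            in_term = line == "[Term]"
            current = defaultdict(list) if in_term else None
            continue
        if not in_term or not line or line.startswith("!"):
            continue
        tag, _, value = line.partition(":")
        # Drop a trailing " ! comment" (simplification: assumes the value
        # itself does not contain " ! ").
        value = value.split(" ! ")[0].strip()
        current[tag.strip()].append(value)
        if tag.strip() == "id":
            terms[value] = current
    return terms


def is_obsolete(term):
    """True if the stanza carries 'is_obsolete: true'."""
    return term.get("is_obsolete", ["false"])[0] == "true"


def ancestors(terms, term_id):
    """Collect all terms reachable from term_id via is_a relations."""
    seen = []
    stack = list(terms[term_id].get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


if __name__ == "__main__":
    terms = parse_obo_terms(SAMPLE_OBO)
    for term_id, term in terms.items():
        status = "obsolete" if is_obsolete(term) else "current"
        print(term_id, term["name"][0], status, ancestors(terms, term_id))
```

Comparing the set of IDs parsed from two releases in this way would show whether any ID has disappeared, which the persistence rule says should not happen.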
Documentation
Documentation at: http://edamontology.sourceforge.net/
Including a clear statement of:
• Branches of terms (namespaces / sub-ontologies)
• Relations
• Rules (governing rules and relations)
• Guidelines for Developers
• Guidelines for Annotators (basic)
• And more ...

Viewing
EDAM may be viewed in:
• Any text editor
• Ontology editor: OBO Ontology Editor (OBOEdit) Version 2, http://oboedit.org
• Web-based browsers:
   • NCBO Ontology Browser: http://bioportal.bioontology.org/visualize/42800
   • EBI Ontology Look-up Service (coming soon): http://www.ebi.ac.uk/ontology-lookup/
• SRS: EBI SRS server, http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM

Viewing in SRS
EDAM is in the EBI SRS server (URL above) and is also available from the EBI dbfetch: http://wwwdev.ebi.ac.uk/Tools/dbfetch/
This allows individual terms to be addressed:
• http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352 (plain text view)
• http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352?style=html (HTML view)
These views are the term "end-points".

Guidelines for Annotators
Which EDAM branch to use?
• "topic" for coarse-grained annotation of tools, databases, servers and so on
• "operation" for fine-grained annotation of tool functions
• "data resource" for annotating data resources such as databases and servers into broad categories based on content type
• "data" and "format" for annotating data in semantic and syntactic terms respectively
Picking terms
• Familiarise yourself with EDAM (use a text editor or OBOEdit)
• Identify the correct branch/namespace ("operation", "data" etc.
see above)
• Search EDAM using keywords to find candidate terms. Use synonyms, alternative spellings etc.
• Pick the most specific term(s) available (some concepts are necessarily overlapping or general!)
• Only pick a correct term (if it doesn't exist it can be added)

Use other ontologies
Use EDAM alongside other ontologies where possible and desirable. For example, an operation that predicts specific features of a molecular sequence could be annotated with GO terms for the features.

Annotation of Web Services
Model of a Web Service
A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations.
Operation
• Discrete unit of functionality performing (typically) one or more definite functions
• Reads an input
• Writes an output
• Uses zero or more data resources
Input
• Payload of the SOAP message passed in the operation call
• Name and (ideally) description are given in the WSDL file
• Input has one or more XML elements which must be set (input values)
Output
• Payload of the SOAP message returned from the operation call
• Name and (ideally) description are given in the WSDL file
• Output has one or more XML elements which are written (output values)
XML elements
• Simple or complex XSD types given in the XSD schema associated with a WSDL file
• Correspond to values that are input or output by a service
• Name and (ideally) description of each element are given in the schema
• Element values are instances of a particular datatype with a semantic type and a specific syntax
• Most element values have a syntax fully specified by the schema
• Some element values correspond to text in a specific file format which is not specified by the schema. Such reports may be a composite of different semantic types.
Data resources
• Databases or ontologies used in the background
• Not passed in a WS call
• Might be specified indirectly via a parameter.
For example, an operation reads a database, the name of which is specified.

Levels of annotation
Annotation of a WSDL file or associated XSD schema is possible at several levels. Assuming SAWSDL annotation, the XML elements that may be annotated are:
1. Service (<wsdl:portType>)
   • Ideally one "Topic" term for the service as a whole
2. Operation (<wsdl:operation>)
   • Ideally one "Operation" term for each WSDL operation (more than one in exceptional circumstances)
3. Input (parameter) values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)
   • One "Data" term
   • One "Format" term
4. Output values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)
   • One "Data" term
   • One "Format" term
The expectation is for annotation of operation inputs and outputs to go into the XSD schema, although the WSDL file (<input> and <output> elements) might also be used.
The following annotations might be useful but are not supported by SAWSDL:
1. Web service (<wsdl:service>)
   • One or more "Topic" terms to describe the general area(s) the service operates in
   • One or more "Data resource" terms to describe the data resources used by the service
2. Operation input (<input>)
   • One or more "Data" terms for the input(s) of each operation (if needed)
3. Operation output (<output>)
   • One or more "Data" terms for the output(s) of each operation (if needed)

Annotation of EMBOSS
EMBOSS (European Molecular Biology Open Software Suite)
• >200 applications for (mostly) molecular sequence analysis
• Application descriptions are kept in ACD (Application Command Definition) files
• An ACD file includes:
   • 1 "Application definition"
   • 1 or more "Data definitions"
• ACD files are annotated with EDAM terms:
   • Application definition: >=1 "topic" term and >=1 "operation" term
   • Data definition: >=1 "data" term

EMBOSS Service Annotation
Annotated WSDL files (and associated XSD data schema) are available from: http://wwwdev.ebi.ac.uk/soaplab/typed/services/list
You will see a list of
service end-points with WSDL URLs. For example:
http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?wsdl
To see the data schema associated with a WSDL, replace "?wsdl" with "?xsd=1", "?xsd=2" or "?xsd=3". For example:
http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?xsd=1

SAWSDL annotation
The proposed format of SAWSDL annotation combines the term namespace and unique identifier in a URI pointing to the term definition:
<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id">
Where:
* element is the XML element being annotated
* elementName is the name of the XML element
* namespace is the namespace of the EDAM term, e.g. "operation"
* id is the unique identifier of the term, e.g. "0000295"
The term name, if required, could be given as an XML comment after the annotated element:
<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id"> <!-- term_name -->
This is not recommended, however, as term names are not guaranteed to remain constant.
The value of the sawsdl:modelReference attribute is a URI pointing to the term definition. The proposal is to use PURLs (Persistent Uniform Resource Locators) which include the term namespace.

EDAM term end-points
When pasted into a browser, the PURLs:
http://purl.org/edam/topic/0000182
http://purl.org/edam/operation/0000292
http://purl.org/edam/data/0000863
... will (eventually) resolve to:
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863
These are complete OBO term statements in plain text (OBO format).
PURLs support text extensions allowing a format specifier to be added. For example these PURLs:
http://purl.org/edam/topic/0000182?style=html
http://purl.org/edam/operation/0000292?style=html
http://purl.org/edam/data/0000863?style=html
...
will resolve to OBO term statements in HTML, in which terms referred to in the statements (via relations) are clickable to allow navigation:
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863?style=html

The eventual final list of end-points will provide other formats/views:
• Plain text in OBO format (default)
• HTML
• XML
• JSON
• The term in a web browser, e.g. NCBO Ontology Browser.
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=xml
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=txt
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=json
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=browser (default)
For now, you can see this in action for this term:
http://purl.org/edam/entity/0000002
http://purl.org/edam/entity/0000002?style=html

Parallel Developments (and other applications)
These include:
• BioXSD
• EMBRACE Registry / BioCatalogue
• Taverna
• BioNEMUS
• Ondex
• ELIXIR

BioNemus

Thanks
• Peter Rice (boss)
• Alan Bleasby (PURL handling)
• Mahmut Uludag (EMBOSS WS)
• Hamish McWilliam (SRS, discussions)
• Matus Kalas (BioXSD, discussions)
• James Malone (SWO + discussions)
• Steve Pettifer (publications + discussions)
• The Forgotten ... (sorry)

All enquiries to Jon Ison (jison@ebi.ac.uk)
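Appendix sketch: the PURL scheme described under "SAWSDL annotation" and "EDAM term end-points" amounts to simple string handling, illustrated below. This is an assumption-laden sketch, not the actual PURL configuration: the function names are hypothetical, the 7-digit ID pattern is inferred from the examples in the slides, the namespace pattern (lowercase letters and underscores) is a guess, and nothing is fetched over the network since the slides say the PURLs will only "eventually" resolve.

```python
import re

PURL_BASE = "http://purl.org/edam"
# dbfetch base as quoted in the slides (a development server, so it may change).
DBFETCH_BASE = "http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam"

# Assumption: IDs are 7 digits, as in the examples "0000295" and "0000182".
_ID_PATTERN = re.compile(r"^\d{7}$")
# Assumption: namespaces in PURLs are lowercase letters/underscores.
_PURL_PATTERN = re.compile(
    r"^http://purl\.org/edam/([a-z_]+)/(\d{7})(\?style=\w+)?$"
)


def edam_purl(namespace, term_id, style=None):
    """Build a PURL of the form http://purl.org/edam/<namespace>/<id>."""
    if not _ID_PATTERN.match(term_id):
        raise ValueError(f"unexpected term id: {term_id!r}")
    url = f"{PURL_BASE}/{namespace}/{term_id}"
    return f"{url}?style={style}" if style else url


def purl_to_dbfetch(purl):
    """Map a PURL to the dbfetch end-point the slides say it should resolve to."""
    match = _PURL_PATTERN.match(purl)
    if match is None:
        raise ValueError(f"not a recognised EDAM PURL: {purl!r}")
    _namespace, term_id, query = match.groups()
    # The namespace is dropped: the dbfetch URLs in the slides use only the ID.
    return f"{DBFETCH_BASE}/{term_id}{query or ''}"


def sawsdl_model_reference(namespace, term_id):
    """Return the sawsdl:modelReference attribute text for a term."""
    return f'sawsdl:modelReference="{edam_purl(namespace, term_id)}"'


if __name__ == "__main__":
    purl = edam_purl("operation", "0000292", style="html")
    print(purl)
    print(purl_to_dbfetch(purl))
    print(sawsdl_model_reference("operation", "0000292"))
```

The term name is deliberately left out of the generated attribute, in line with the recommendation that annotations rely on the persistent ID rather than on names that may change.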