BioSphere: A Grid Portal for Collaborative Ontology Creation
Stuart Aitken¹, Kemian Dang¹ and Jonathan Bard²
¹ Informatics, University of Edinburgh
² Weatherall Institute of Molecular Medicine, Oxford
14 Sept 2009   www.biosphere-portal.org

The BioSphere Ontology Portal
BioSphere: a BBSRC tools and resources project developing tools for the end user
   – Collaborative ontology development tools
   – Aimed at biologists and bioinformaticians
1. Ontologies
   – Bio-ontology languages (OBO)
   – W3C standards: Web Ontology Language (OWL)
2. Archiving XML data
   – Keys for XML
3. BioSphere Portal
   – GridSphere and OGSA-DAI

Bio-ontologies
• Taxonomies of organisms (NCBI)
   Cellular Organisms
     Eukaryota
       …
         Rodentia
           …
             Mus musculus (house mouse)
• Anatomy ontologies
   – Mouse, fly, worm and plant anatomies used for indexing spatial and temporal gene expression data
   – Tissues and regions are named and given IDs
• Generally in bio-ontologies
   – IDs are used in many databases
   – Mapped to other ID schemes
   – Providing interoperability

Annotations
Annotations to terms (classes) are very important:
• Definitions
   – Textual definition of a term
• Cross-references to databases
   – ‘DbXRef’, e.g. [TAIR:ki]
[Diagram: leaf primordium is_a leaf; leaf primordium part_of shoot apex]
An OBO language definition:
   [Term]
   id: PO:0000017
   name: leaf primordium
   def: "An organized group of cells that will differentiate into leaf that are emerging as an outgrowth in the shoot apex (flanking the meristem)." [TAIR:ki]
   is_a: PO:0009025 ! leaf
   relationship: part_of PO:0000037 ! shoot apex

Annotations
Annotations form nested structures:
• Definition <text, DbXRef list>
• DbXRef <name, accession, [description]>, e.g.
  ISBN:0471245208
• Synonym <{exact, broad, narrow, related}, text, DbXRef list>
• Editors need to support these features of OBO, as well as allow class hierarchies to be edited

Tools: OBO-Edit

Web Ontology Language (OWL)
In parallel with the development of these biomedical ontologies, standards for a web-compatible ontology language emerged from the W3C:
• OWL is a general-purpose ontology language
   – Based on Description Logic
      » With well-defined semantics
      » Tractable reasoning algorithms
   – Web-compatible
      » Concepts and relations referred to through URIs
      » RDF/XML syntax
   – Compatible with other W3C standards

OBO into OWL
• The aims for converting OBO to OWL (NCBO/GO effort with Moreira, Mungall and Shah) were:
   – to translate OBO to OWL, and back,
   – staying within OWL-DL, and
   – to round-trip files without error.
• IDs remain the primary index → local name in the URI
   CARO:0000063 → namespace#CARO_0000063
• Semantics
   – OBO is less expressive than OWL, so it uses only a fraction of OWL's representational power
• Ontologies can now be handled at the XML document level
   – XML methods

Archiving XML data
XML documents and XML keys
• XML documents, serving as databases, often contain unique keys (in the database sense)
• Top-level keys and secondary keys identify elements in hierarchically structured databases [Buneman 01]
   – If name and course in <name>joe</name><course>maths-1</course>… constitute a key, then all elements with this key must agree everywhere
   – Keys are defined by:
      » path expressions
      » value equality ⇒ node equality
   – Relative keys, motivated by scientific databases

Archiving XML data
Versioning XML documents using timestamps
• Changes to scientific databases are accretive [Buneman 01]
   – Documents have a key structure
   – Additions/deletions are infrequent
   – diff-based approaches
     may be inefficient
• ‘Keys’ can be used to merge versions of documents
   – Keys that are the same in two versions indicate that no change has taken place, which supports merging
   – In fact, the element need only be stored once, along with a timestamp representing a time interval
   – The document structure can be exploited: timestamps are stored only at child nodes that are changed

BioSphere: An ontology portal
BioSphere provides:
• Shared access to ontologies
   – For loosely organised groups of ontology developers
• Version management for multiple users
• Visualisation of users' viewpoints
• Discussion and collaboration
Why a portal?
• The architecture is ideal for managing data resources centrally
• Details of formats, plug-ins, etc. can be hidden
   – But off-line working may need to be supported

Portal and Grid Technologies
The BioSphere portal integrates several existing technologies:
• GridSphere
   – An environment for accessing portlets
      » Java enterprise-style designs
      » GridSphere and portlets are deployed in Tomcat
   – Provides user account management
   – Integration with Grid security where needed
• OGSA-DAI / DAIX
   – Middleware providing read and read/write/update access to data resources
   – The DAIX version is designed to operate with XML databases
• eXist XML database

Portal and Grid Technologies
In addition to GridSphere:
• Spring Framework
   – Portlet event handling
• Dojo JavaScript toolkit
   – Provides a drag-and-drop tree library
   – Plus other GUI components
[Architecture diagram: BioSphere Portlet, OGSA-DAIX, eXist, GridSphere, Tomcat]

Versioning Ontologies
OWL XML:
   <SubClassOf>
     <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
     <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
   </SubClassOf>
New tags are added to assertions when ontologies are loaded into the system:
   <A>
     <SubClassOf>
       <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
       <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
     </SubClassOf>
     <V begin-Group="0" end-Group="FUTURE"/>
   </A>

Versioning Ontologies
Edits are recorded as changes to version tags:
   <A>
     <SubClassOf>
       <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
       <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
     </SubClassOf>
     <V begin-Group="0" end-Group="FUTURE"/>
     <V begin-User1="0" end-User1="V1"/> <!-- User1 deletes assertion -->
   </A>
• Changes to the ontology are recorded in version tags
• Nothing is actually deleted, and duplication of content is minimised
• User1's view is $(\text{Assertions}_{\text{Group}} \setminus \text{Deletions}_{\text{User1}}) \cup \text{Assertions}_{\text{User1}}$

Collaborative editing
• Users inherit the group view
• Users make edits and comments, and can reject the group view
• User edits are copied to the group when agreement is established
[Diagram: Group, User-1 and User-2 views of caro.owl (OWL 2 ontology). Example group content:]
   <EntityAnnotation>
     <OWLClass URI="&oboContent;CARO#CARO_0000063"/>
     <Annotation annotationURI="&rdfs;label">
       <Constant>portion of cell substance</Constant>
     </Annotation>
   </EntityAnnotation>
XPath queries taking version number
and user ID return views of the OWL 2 XML document.

BioSphere Portal
GridSphere provides the user with login and layout customisation.

OBO editor
The portlet currently has two components: a form for entering information, and a tree visualisation of the ontology.

Visualisations from database views
[Figure: Group view (excludes anatomical plane), SubClassOf structure shown in blue; User-1 view, SubClassOf structure shown in green]

Community organisation

Related work
• The NCBO has a BioPortal that allows OBO ontologies to be browsed
   – New ontologies can be submitted, and will be reviewed
   – Future versions will support discussion

Related work
• KAON ontology management [Gabel 04]
• SemVersion [Volkel 06]
• Protégé
   – PROMPT: change tracking [Noy 04]
   – Web / Collaborative Protégé [Tudorache 07]
• Ontology views
   – Sub-ontology extraction [Bhatt 04]
   – Segmentation [Seidenberg 06]
   – Views [Noy 04] [Volz 03]

Future Work
• Sign more users up; actual usage includes:
   – Ontology editing
   – Commenting
• Database access
   – Speed up the user interaction by eliminating XML data transfer
• Visualisation
   – Investigate alternatives to trees for presenting the ontology structure

BBSRC grant BB/F015976/1
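The view computation on the Versioning Ontologies and Collaborative editing slides, where User1's view is the group's assertions minus User1's deletions plus User1's own assertions, can be sketched in Python. This is a minimal sketch under stated assumptions, not the BioSphere implementation (the slides describe XPath queries evaluated against eXist via OGSA-DAIX). It assumes a root element named Ontology with the OWL 2 namespace omitted, version labels that are integers optionally prefixed with "V", FUTURE meaning an interval that is still open, and a user deletion being recorded by closing that user's interval. The functions `view`, `intervals`, `parse_version` and `subclass_id`, and the second sample assertion, are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Assumption: FUTURE marks an interval that is still open (not yet deleted).
FUTURE = math.inf


def parse_version(label: str) -> float:
    """Parse a version label such as "0", "V1" or "FUTURE" (assumed formats)."""
    if label == "FUTURE":
        return FUTURE
    return int(label.lstrip("V"))


def intervals(assertion: ET.Element, principal: str) -> list[tuple[float, float]]:
    """Collect the [begin, end) intervals that <V> tags record for one principal."""
    found = []
    for tag in assertion.findall("V"):
        begin = tag.get(f"begin-{principal}")
        end = tag.get(f"end-{principal}")
        if begin is not None and end is not None:
            found.append((parse_version(begin), parse_version(end)))
    return found


def view(root: ET.Element, user: str, version: int) -> list[ET.Element]:
    """Hypothetical view: (group assertions minus the user's deletions)
    plus the user's own assertions, evaluated at the given version."""
    visible = []
    for assertion in root.findall("A"):
        group = intervals(assertion, "Group")
        mine = intervals(assertion, user)
        in_group = any(b <= version < e for b, e in group)
        # Assumption: a user deletion is a user interval that has already closed.
        deleted_by_user = any(e <= version for _, e in mine)
        added_by_user = any(b <= version < e for b, e in mine)
        if (in_group and not deleted_by_user) or added_by_user:
            visible.append(assertion)
    return visible


def subclass_id(assertion: ET.Element) -> str:
    """Local name of the first class in a SubClassOf axiom, e.g. GO_0000001."""
    return assertion.find("SubClassOf/OWLClass").get("URI").split("#")[1]


# The versioned assertion from the slides, plus a hypothetical User1 addition.
SAMPLE = """<Ontology>
  <A>
    <SubClassOf>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
    </SubClassOf>
    <V begin-Group="0" end-Group="FUTURE"/>
    <V begin-User1="0" end-User1="V1"/>
  </A>
  <A>
    <SubClassOf>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000002"/>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
    </SubClassOf>
    <V begin-User1="V1" end-User1="FUTURE"/>
  </A>
</Ontology>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print([subclass_id(a) for a in view(root, "User1", 0)])  # ['GO_0000001']
    print([subclass_id(a) for a in view(root, "User1", 1)])  # ['GO_0000002']
    print([subclass_id(a) for a in view(root, "User2", 1)])  # ['GO_0000001']
```

Because the deletion touches only User1's version tag, the group interval stays intact and User2 still sees the assertion at version 1. Each assertion is stored once and read by every view, which matches "nothing is actually deleted". In the portal itself, this filtering would be done by the XPath queries over the stored document rather than by a Python loop.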