BioSphere : A Grid Portal for Collaborative Ontology Creation

BioSphere:
A Grid Portal for
Collaborative Ontology Creation
Stuart Aitken¹, Kemian Dang¹, and Jonathan Bard²
¹ Informatics, University of Edinburgh
² Weatherall Institute of Molecular Medicine, Oxford
14 Sept
2009
www.biosphere-portal.org
1
The BioSphere Ontology Portal
BioSphere: A BBSRC tools and resources project
developing tools for the end user
– Collaborative ontology development tools
– Aimed at biologists and bio-informaticians
1. Ontologies
– Bio-ontology languages (OBO)
– W3C standards: Web Ontology Language (OWL)
2. Archiving XML data
– Keys for XML
3. BioSphere Portal
– GridSphere and OGSA-DAI
Bio-ontologies
• Taxonomies of organisms (NCBI)
  Cellular Organisms
    Eukaryota …
      Rodentia …
        Mus musculus (house mouse)
• Anatomy ontologies
– Mouse, fly, worm and plant anatomies used for indexing spatial and temporal gene expression data
– Tissues and regions are named and given IDs
• Generally in bio-ontologies
– IDs are used in many databases
– Mapped to other ID schemes
– Providing interoperability
Annotations
Annotations to terms (classes) are very important:
• Definitions
– Textual definition of a term
• Cross references to databases
– ‘DbXRef’, e.g. [TAIR:ki]
[Figure: leaf primordium is_a leaf; leaf primordium part_of shoot apex]
• An OBO language definition:
[Term]
id: PO:0000017
name: leaf primordium
def: "An organized group of cells that will differentiate into leaf that are emerging as an outgrowth in the shoot apex (flanking the meristem)." [TAIR:ki]
is_a: PO:0009025 ! leaf
relationship: part_of PO:0000037 ! shoot apex
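As an illustration of how tools read the [Term] stanza above, here is a minimal Python sketch that splits a stanza into tag/value pairs. It is not the parser used by OBO-Edit or BioSphere; the function name parse_obo_term is hypothetical, and the sketch assumes one tag per line and ignores quoted def strings that might themselves contain " ! ".

```python
def parse_obo_term(stanza: str) -> dict:
    """Parse one [Term] stanza into a dict mapping each tag to a list of values."""
    term = {}
    for line in stanza.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        # Split on the first colon only, so IDs such as PO:0000017 stay intact.
        tag, _, value = line.partition(":")
        # A trailing " ! text" is a human-readable comment in OBO; drop it.
        value = value.split(" ! ")[0].strip()
        term.setdefault(tag.strip(), []).append(value)
    return term


STANZA = """
[Term]
id: PO:0000017
name: leaf primordium
is_a: PO:0009025 ! leaf
relationship: part_of PO:0000037 ! shoot apex
"""

term = parse_obo_term(STANZA)
print(term["id"], term["is_a"], term["relationship"])
```

Values are kept as lists because tags such as is_a and relationship may occur several times in one stanza.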
Annotations
Annotations form nested structures:
• Definition: <text, DbXRef list>
• DbXRef: <name, accession, [description]>
  e.g. ISBN:0471245208
• Synonym: <{exact, broad, narrow, related}, text, DbXRef list>
• Editors need to support these features of OBO, as well as allow class hierarchies to be edited
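The nested annotation structures listed above could be modelled as follows. This is a sketch only: the class and field names are assumptions chosen to mirror the tuples on the slide, not the data model of any particular editor.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four synonym scopes named on the slide.
SCOPES = {"exact", "broad", "narrow", "related"}


@dataclass
class DbXRef:
    name: str
    accession: str
    description: Optional[str] = None  # the optional third component

    @classmethod
    def parse(cls, text: str) -> "DbXRef":
        # "ISBN:0471245208" -> name "ISBN", accession "0471245208"
        name, _, accession = text.partition(":")
        return cls(name, accession)


@dataclass
class Definition:
    text: str
    xrefs: List[DbXRef] = field(default_factory=list)


@dataclass
class Synonym:
    scope: str
    text: str
    xrefs: List[DbXRef] = field(default_factory=list)

    def __post_init__(self):
        if self.scope not in SCOPES:
            raise ValueError(f"unknown synonym scope: {self.scope}")


xref = DbXRef.parse("ISBN:0471245208")
definition = Definition("Textual definition of a term", [xref])
synonym = Synonym("exact", "hypothetical synonym text", [xref])
print(definition, synonym)
```

An editor that supports OBO has to expose all three levels of nesting: the annotation, its text, and the list of cross references attached to it.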
Tools: OBO Edit
Web Ontology Language (OWL)
In parallel with the development of these biomedical
ontologies, standards for a web-compatible ontology
language emerged from the W3C:
• OWL is a general-purpose ontology language
– Based on Description Logic
» With well-defined semantics
» Tractable reasoning algorithms
– Web-compatible
» Concepts and relations referred to through URIs
» RDF/XML syntax
– Compatible with other W3C standards
OBO into OWL
• Aims for converting OBO to OWL (NCBO/GO effort with Moreira, Mungall and Shah) were:
– to translate OBO to OWL, and back,
– to stay within OWL-DL, and
– to round-trip files without error.
• IDs remain the primary index → local name in URL
  CARO:0000063 → namespace#CARO_0000063
• Semantics
– OBO is less expressive than OWL, so it uses only a fraction of OWL's representational power
• Can now consider working with ontologies at the XML document level
– XML methods
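The ID mapping above (CARO:0000063 → namespace#CARO_0000063) can be sketched in Python as a pair of functions that round-trip without loss. The base URL is an assumption taken from the pattern of the GO URIs shown on the Versioning Ontologies slides (http://purl.org/obo/owl/GO#GO_0000001); the function names are hypothetical.

```python
# Assumed namespace base, inferred from the GO URIs used elsewhere in these slides.
BASE = "http://purl.org/obo/owl/"


def obo_id_to_uri(obo_id: str, base: str = BASE) -> str:
    """Turn an OBO ID such as CARO:0000063 into base + CARO#CARO_0000063."""
    prefix, local = obo_id.split(":", 1)
    return f"{base}{prefix}#{prefix}_{local}"


def uri_to_obo_id(uri: str) -> str:
    """Recover the OBO ID from the fragment (local name) of the URI."""
    fragment = uri.rsplit("#", 1)[1]
    prefix, local = fragment.rsplit("_", 1)
    return f"{prefix}:{local}"


uri = obo_id_to_uri("CARO:0000063")
print(uri)
print(uri_to_obo_id(uri))
```

Because the OBO ID survives as the local name, it can still serve as the primary index after conversion to OWL.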
Archiving XML data
XML documents and XML keys
• XML documents, serving as databases, often contain unique keys (in the database sense)
• Top-level keys and secondary keys identify elements in hierarchically-structured databases [Buneman 01]
– If name and course in
  <name>joe</name><course>maths-1</course>…
  constitute a key, then all elements with this key must agree everywhere
– This requires definitions of:
» path expressions
» value equality ⇒ node equality
– Relative keys, motivated by scientific databases
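To make the key constraint above concrete, the following Python sketch reports key values that occur on more than one node, which would violate "value equality ⇒ node equality". It is a simplified illustration, not the formalism of [Buneman 01]: the element names db, student and grade are hypothetical, and key paths are limited to the path subset that xml.etree.ElementTree supports.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_keys(root, target_path, key_paths):
    """Return key values shared by more than one target element."""
    counts = Counter(
        tuple(elem.findtext(path) for path in key_paths)
        for elem in root.findall(target_path)
    )
    return [key for key, n in counts.items() if n > 1]


# Hypothetical document: (name, course) is meant to be a key for student.
DOC = """
<db>
  <student><name>joe</name><course>maths-1</course><grade>A</grade></student>
  <student><name>joe</name><course>physics-1</course><grade>B</grade></student>
  <student><name>joe</name><course>maths-1</course><grade>C</grade></student>
</db>
"""

root = ET.fromstring(DOC.strip())
violations = duplicate_keys(root, "student", ["name", "course"])
print(violations)
```

Here two distinct student nodes share the key (joe, maths-1), so the document does not satisfy the key.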
Archiving XML data
Versioning XML documents using timestamps
• Changes to scientific databases are accretive [Buneman 01]
– Documents have a key structure
– Additions/deletions are infrequent
– diff-based approaches may be inefficient
• ‘Keys’ can be used to merge versions of documents
– Keys that are the same in two versions indicate that no change has taken place, which supports merging
– In fact, the element need only be stored once, along with a timestamp representing a time interval
– The document structure can be exploited: timestamps are stored only at child nodes that are changed
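The key-based merging described above can be sketched as follows. This is a flat, simplified illustration under stated assumptions, not the archiving algorithm of [Buneman 01]: each version is reduced to a dict from key to value, intervals are half-open [begin, end), and the names merge_version and FUTURE are hypothetical.

```python
FUTURE = None  # open-ended interval: the element is still present


def merge_version(archive, version, snapshot):
    """Merge a keyed snapshot into the archive.

    archive:  {key: [[value, begin, end], ...]}
    snapshot: {key: value} for the new version
    """
    for key, value in snapshot.items():
        history = archive.setdefault(key, [])
        if history and history[-1][2] is FUTURE and history[-1][0] == value:
            continue  # unchanged: the single stored copy stays open-ended
        if history and history[-1][2] is FUTURE:
            history[-1][2] = version  # the old value ends where the new one begins
        history.append([value, version, FUTURE])
    for key, history in archive.items():
        if key not in snapshot and history[-1][2] is FUTURE:
            history[-1][2] = version  # the element was deleted in this version
    return archive


archive = {}
merge_version(archive, 1, {"k1": "a", "k2": "b"})
merge_version(archive, 2, {"k1": "a", "k2": "c"})
merge_version(archive, 3, {"k1": "a"})
print(archive)
```

The unchanged element k1 is stored once with a single open interval, while k2 accumulates one entry per change. That is the saving over storing each version in full.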
BioSphere: An ontology portal
BioSphere provides:
• Shared access to ontologies
– For loosely-organised groups of ontology developers
• Version management for multiple users
• Visualisation of users’ viewpoints
• Discussion and collaboration
Why a portal?
• The architecture is ideal for managing data resources centrally
• Details of formats, plug-ins etc. can be hidden
– But off-line working may need to be supported
Portal and Grid Technologies
The BioSphere portal integrates several existing
technologies:
• GridSphere
– An environment for accessing portlets
» Java enterprise-style designs
» GridSphere and portlets are deployed in Tomcat
– Provides user account management
– Integration with Grid security where needed
• OGSA-DAI / DAIX
– Middleware providing read and read/write/update access to data resources
– DAIX version is designed to operate with XML databases
• eXist XML database
Portal and Grid Technologies
In addition to GridSphere:
• Spring Framework
– Portlet event handling
• Dojo JavaScript toolkit
– Provides a drag-and-drop tree library
– Plus other GUI components
[Figure: architecture stack with labels BioSphere Portlet, OGSA-DAIX, eXist, GridSphere, Tomcat]
Versioning Ontologies
OWL XML
<SubClassOf>
<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
</SubClassOf>
New tags are added to Assertions when loading ontologies into the
system:
<A>
<SubClassOf>
<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
</SubClassOf>
<V begin-Group="0" end-Group="FUTURE"/>
</A>
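The loading step above, in which each assertion is wrapped in an <A> element and given a <V> tag, might look like this in Python. It is a sketch under assumptions: the root element name Ontology and the function name wrap_assertions are hypothetical, and every direct child of the root is treated as one assertion.

```python
import xml.etree.ElementTree as ET


def wrap_assertions(ontology_xml: str, owner: str = "Group") -> ET.Element:
    """Wrap every top-level assertion in <A> and add an open-ended <V> tag."""
    root = ET.fromstring(ontology_xml)
    for index, assertion in enumerate(list(root)):
        wrapper = ET.Element("A")
        root.remove(assertion)
        wrapper.append(assertion)
        # The assertion is valid for the owner from version 0 onwards.
        ET.SubElement(
            wrapper, "V", {f"begin-{owner}": "0", f"end-{owner}": "FUTURE"}
        )
        root.insert(index, wrapper)
    return root


SAMPLE = (
    "<Ontology>"
    "<SubClassOf>"
    '<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>'
    '<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>'
    "</SubClassOf>"
    "</Ontology>"
)

loaded = wrap_assertions(SAMPLE)
print(ET.tostring(loaded, encoding="unicode"))
```

The original assertion is left untouched inside the wrapper, so removing the <A> and <V> elements recovers the plain OWL XML.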
Versioning Ontologies
Edits are recorded as changes to Version tags:
<A>
<SubClassOf>
<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
<OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
</SubClassOf>
<V begin-Group="0" end-Group="FUTURE"/>
<V begin-User1="0" end-User1="V1"/> <!-- User1 deletes assertion -->
</A>
Changes to the ontology are recorded in Version tags
Nothing is actually deleted, and duplication of content is minimised
User1’s view is {Assertions(Group) \ Deletions(User1)} + Assertions(User1)
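The view computation above can be sketched in Python over the <A>/<V> structure. The semantics are an assumption made for illustration: a user's own <V> tag overrides the group's, an assertion is live when its end attribute is FUTURE, and intermediate version numbers are ignored, so only the current view is computed. The root element Ontology, the class GO_0000002 and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_live(assertion, owner):
    """True/False if the owner has a version tag on this assertion, else None."""
    for tag in assertion.findall("V"):
        if f"begin-{owner}" in tag.attrib:
            return tag.get(f"end-{owner}") == "FUTURE"
    return None


def user_view(root, user, group="Group"):
    """Group assertions minus the user's deletions, plus the user's additions."""
    view = []
    for assertion in root.iter("A"):
        live = is_live(assertion, user)
        if live is None:
            live = is_live(assertion, group) or False  # inherit the group view
        if live:
            view.append(assertion[0])  # the axiom is the first child of <A>
    return view


DOC = """
<Ontology>
  <A>
    <SubClassOf>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000001"/>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
    </SubClassOf>
    <V begin-Group="0" end-Group="FUTURE"/>
    <V begin-User1="0" end-User1="V1"/>
  </A>
  <A>
    <SubClassOf>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000002"/>
      <OWLClass URI="http://purl.org/obo/owl/GO#GO_0000000"/>
    </SubClassOf>
    <V begin-User1="V1" end-User1="FUTURE"/>
  </A>
</Ontology>
"""

root = ET.fromstring(DOC.strip())
for who in ("Group", "User1", "User2"):
    print(who, [axiom[0].get("URI") for axiom in user_view(root, who)])
```

User2 has made no edits and therefore sees the group view, while User1 sees the added assertion but not the deleted one.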
Collaborative editing
Users inherit the group view
Users make edits and comments, and can reject the group view
User edits are copied to the group when agreement is established
[Figure: Group, User-1 and User-2 views over caro.owl (OWL 2 ontology)]
<EntityAnnotation>
<OWLClass URI="&oboContent;CARO#CARO_0000063"/>
<Annotation annotationURI="&rdfs;label">
<Constant>portion of cell substance</Constant>
</Annotation>
</EntityAnnotation>
XPath queries taking a version number and user ID return views of the OWL 2 XML document
BioSphere Portal
GridSphere provides the user with login and layout customisation
OBO editor
The portlet currently has two components:
– a form for entering information; and
– a tree visualisation of the ontology.
Visualisations from database views
Group View: SubClassOf structure shown in blue; excludes anatomical plane
User-1 View: SubClassOf structure shown in green
Community organisation
Related work
• The NCBO has a BioPortal that allows OBO ontologies to be browsed
– New ontologies can be submitted, and will be reviewed
– Future versions will support discussion
Related work
• KAON ontology management [Gabel 04]
• SemVersion [Volkel 06]
• Protégé
– Prompt - change tracking [Noy 04]
– Web / Collaborative Protégé [Tudorache 07]
• Ontology views
– Sub-ontology extraction [Bhatt 04]
– Segmentation [Seidenberg 06]
– Views [Noy 04] [Volz 03]
Future Work
• Sign more users up … actual usage includes:
– Ontology editing
– Commenting
• Database access
– Speed up the user interaction by eliminating XML data transfer
• Visualisation
– Investigate alternatives to trees to present the ontology structure
BBSRC grant BB/F015976/1