OIL: A Slick Way to Represent Knowledge for Bioinformatics
and the Web
Carole Goble(1), Robert Stevens(1), Ian Horrocks(1), Dieter Fensel (2), Frank van
Harmelen (2,3), Stefan Decker (4), Michel Klein (2)
(1) The Department of Computer Science
The University of Manchester
Manchester, M13 9PL, UK
Email: <carole/stevensr/horrocks>@cs.man.ac.uk
http://img.cs.man.ac.uk/
(2) Vrije Universiteit Amsterdam, Holland
dieter@cs.vu.nl / frankh@cs.vu.nl / mcaklein@cs.vu.nl
(3) Aidministrator Nederland, B.V., Amersfoort, Holland
(4) Stanford University, USA
stefan@db.stanford.edu
1 Introduction
The web has proved to be an excellent mechanism for the ready publication and
availability of information. It has been an effective technology for supporting the
biologists’ culture of co-ordinated research, and the sharing and rapid dissemination of
information. What is missing is the next level of interoperability—not just making
information available, but understanding what the information means so that it can be
linked in appropriate and insightful ways, and providing automated support for this
process.
The integration and sharing of information requires a consistent shared understanding of
the meaning of that information. The biologist’s knowledge of molecular biology and
bioinformatics, and their interpretation of the resources with respect to this knowledge, is
essential to the task of combining resources to answer queries. A shared understanding
requires three things: metadata, terminologies and ontologies.
Sharing data means publishing, agreeing and sharing metadata, which means shared
schemas and shared controlled vocabularies. This means the use of, and adherence to,
standard vocabularies such as GO [Ashburner00]. The questions now become (a) how do we
describe and exchange such vocabularies, and (b) how do we build better vocabularies?
A prerequisite for a widespread use of ontologies is a joint standard for their description
and exchange. Ontologies provide a way of capturing a shared understanding of terms
that can be used by humans and programs to aid information exchange. The ontology
needs to be specified in some language.
In 1999, the BioOntology Core Group developed a frame-based language with an XML
syntax for the exchange of ontologies for molecular biology. The proposed language was
XOL [Karp99] (later extended to iXOL), based on the modelling primitives of OKBC-Lite
[Chaudhri97]. However, this language is only suitable for ontologies that are frame-based:
it allows only necessary but not sufficient class definitions (i.e. a new class is
always a sub-class of and not exactly equal to its specification), and only class names but
not class expressions (except for the limited form of expression provided by slots and
frames) can be used in defining classes. Ontologies such as the TAMBIS ontology
[Baker99] come from the Description Logic [Borgida95] school of knowledge
representation, relying on and encouraging reasoning over the class definitions to infer
subsumption relationships between classes and to check the consistency and coherency of
the classification and its concepts. What would be ideal would be a language that unified
frames and description logics with a web standard syntax.
The Ontology Inference Layer (OIL) is a new language proposed as a knowledge
representation language for the web [Fensel00]. OIL is an effort to produce a layered
architecture for specifying and exchanging ontologies. The language has been designed
so that:
• It provides the modelling primitives commonly used in frame-based ontologies;
• It has simple, clean and well-defined semantics based on description logics;
• Automated reasoning support (i.e. consistency and subsumption checking) may be
specified and provided in a computationally efficient manner.
OIL is a layered language in order to be very clear about the reasoning support possible at
each level of expressivity:
• OIL-Core captures the necessary mainstream modelling primitives that both provide
adequate expressive power and are well understood, thereby allowing the semantics
to be precisely defined and complete inference to be viable;
• Standard-OIL includes the integration of instances as well as concepts;
• Heavy-OIL includes additional representational (and reasoning) capabilities.
Although OIL is a powerful language it isn’t necessary to use all the expressive power –
it is light-weight enough to represent simple taxonomies or frame-like ontologies such as
GO without obstruction. It is also expressive enough to present logic-based models such
as the TAMBIS Ontology [Baker99], and all points in between. Some parts of the
ontology can be simple, others complex; moreover, the language offers the ontologist an
evolutionary development path whereby they can progressively introduce more
expressive constructs. Thus OIL is the confluence of the modelling primitives of frames
with the classification reasoning capabilities of description logics and encodings in the
metadata languages of the web.
For more discussion and motivation, and for a formal description and grammar
specification, see the papers at http://www.ontoknowledge.org/oil/. OIL’s
machine-readable syntax is defined as an XML DTD, an XML Schema definition and an
RDF Schema definition. Because of OIL’s antecedents in the logic community, it has a
formal denotational semantics, available from the same URI. OIL also has a compact,
more human-readable syntax, which is used in this paper.
1.1 An Informal Description of OIL-Core
An OIL ontology is a structure made up of several components, some of which may
themselves be structures, some of which are optional (component?) and some of which
may be repeated one or more times (component+) or zero or more times (component*).
An OIL ontology is delineated by the keywords begin-ontology and end-ontology, and
consists of two major parts:
• The ontology container, which is concerned with describing the metadata of the
ontology, based on the Dublin Core Metadata Element Set (Version 1.1) standard
[Dublin Core]. Some of the elements can be specialised with a qualifier, which
refines the meaning of the element (written element.qualifier). Elements include title,
creator, subject, description, publisher, contributor, date, type, format, identifier,
source (URI), language, relation (to other OIL ontologies) and rights. Figure 1
gives an example of an ontology container;
ontology-container
title “macromolecule fragment”
creator “robert stevens”
subject “macromolecule generic ontology”
description “example for a tutorial”
description.release “1.0”
publisher “R Stevens”
type “ontology”
format “pseudo-xml”
identifier “http://www.ontoknowledge.org/oil/oil.pdf”
source “http://img.cs.man.ac.uk/ismb00/mmexample.pdf”
language “OIL”
language “en-uk”
Figure 1: An example of an ontology container

• The ontology definition, defining a particular ontological vocabulary, containing
descriptions of classes, slots and individuals (also known as instances). Classes are
collections of individuals that share properties; slots describe binary relations between
individuals. The definition is made up of two parts: (a) an import directive giving a
list of one or more references to other OIL modules that are to be included in this
ontology and (b) a definition of zero or more of the following:
i. class definitions (class-def). Classes are unary predicates, such as person, and
can be arranged into classification lattices; for example person is a subclass of
mammal, thus anything that is an instance of person must also be an instance of
mammal. Classes also contain information about how their members relate to
other individuals;
ii. slot definitions (slot-def). Slots are binary relations that define how members of
classes relate to other objects. Slots are also known as roles or attributes. Slots
can also be arranged into slot lattices (subrelations). They may be constrained
through the use of slot constraints, limiting the type of instance that could fill
the slot (the slot filler), the number of fillers a slot may have, enumerating a set
of possible fillers etc. Particular fillers may also be specified; for example, a
white wine may be explicitly defined as something that is a wine whose colour
slot is filled with the colour white;
iii. axioms (disjoint, covered, disjoint-covered, equivalent). An axiom asserts
some additional facts about the classes in the ontology.
The rest of the paper concentrates on an informal discussion of the definition part of OIL,
with illustrative examples that don’t claim to be great biology but serve to illustrate the
expressivity of OIL.
1.2 Class Definitions
A class definition associates a class name with a class description. A class-def includes
the name of a class and some optional documentation describing the class.
Class expressions are descriptions of classes, which are either:
• A class name, including built-in class names such as top and bottom. Top is the top
of the lattice, so is the most general class; bottom is the bottom of the lattice, and is a
sub-class of every class. Every individual is an instance of top; no individual is an
instance of bottom;
• An enumerated class, defined by enumerating its instances, for example (one-of Iron
Sulphur) defines the class whose instances are Iron and Sulphur;
• A slot-constraint, a list of one or more constraints or restrictions applied to a slot. A
slot is a binary relation (i.e. its instances are pairs of individuals). A slot-constraint is
a class definition, i.e. its instances are those individuals that satisfy the constraints.
We will go into this in more detail below;
• A combination of class expressions using the Boolean connectives (and, or, not). And
takes two or more expressions and conjoins them, or takes two or more expressions
and treats them as a disjunction, and not takes a single expression that is negated. For
example, the class (dna or messenger-rna) defines the class whose instances are
all those individuals that are instances of either the class dna or the class
messenger-rna. Expressions may be recursive and thus arbitrarily complex.
Concrete type expressions define a range over some data type, currently integer and
string. Sub-ranges can be defined using expressions (min x), (max x), (greater-than x),
(less-than x), (equal x) and (range x y). Expressions of the same type can be combined
using the boolean operators.
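The set-theoretic reading of the Boolean connectives can be sketched in Python. This is only an illustration over one small, made-up domain of individuals; the individual names and the helper functions and_, or_ and not_ are assumptions introduced here and are not part of OIL.

```python
# Illustrative only: a tiny finite domain of individuals (hypothetical names).
DOMAIN = frozenset({"d1", "m1", "t1", "p1"})

# Extensions (sets of instances) of two named classes in this toy interpretation.
dna = frozenset({"d1"})
messenger_rna = frozenset({"m1"})

def and_(*exprs):
    """Conjunction: individuals that are instances of every expression."""
    result = DOMAIN
    for e in exprs:
        result = result & e
    return result

def or_(*exprs):
    """Disjunction: individuals that are instances of at least one expression."""
    result = frozenset()
    for e in exprs:
        result = result | e
    return result

def not_(expr):
    """Negation: individuals of the domain that are not instances of expr."""
    return DOMAIN - expr

top = DOMAIN          # every individual is an instance of top
bottom = frozenset()  # no individual is an instance of bottom

# The class (dna or messenger-rna) from the text.
dna_or_mrna = or_(dna, messenger_rna)
```

Because the helpers take and return sets, they nest freely, which mirrors the remark that expressions may be recursive and arbitrarily complex.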
A class-def consists of the following components:
type? The type of definition, which is either primitive or defined, defaulting to primitive.
A primitive class’s definition (the combination of subclass-of and slot-constraint
components) is taken to be a necessary but not sufficient condition for membership in the
class. For example, in (1) the primitive class dna is asserted to be a sub-class of
nucleic-acid. In (2) deoxyribophosphate is asserted to be a sub-class of the class whose
instances are all those individuals that are not instances of the class ribophosphate. This
is part of the assertions required to state that an individual cannot be an instance of both
deoxyribophosphate and ribophosphate, i.e. that the classes are disjoint. As this is
such a common thing to say, OIL has a special axiom for it, as we will see later.
(1) class-def primitive dna
subclass-of nucleic-acid
(2) class-def primitive deoxyribophosphate
subclass-of not ribophosphate
Definition (3) declares dna as a sub-class of nucleic-acid with a slot-constraint (or
role) stating that the backbone must be deoxyribophosphate. This says that all
instances of dna must necessarily be nucleic-acids with a deoxyribophosphate backbone, but
there may be nucleic acids with a deoxyribophosphate backbone that are not dna.
(3) class-def primitive dna
subclass-of nucleic-acid
slot-constraint has-backbone
has-value deoxyribophosphate
(4) class-def defined dna
subclass-of nucleic-acid
slot-constraint has-backbone
has-value deoxyribophosphate
When a class is defined, its definition is taken to be a necessary and sufficient condition
for membership of the class. Thus in (4) all instances of dna are necessarily
deoxyribophosphate backboned nucleic acids, and every deoxyribophosphate backboned
nucleic acid is also an instance of dna.
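The difference between (3) and (4) can be sketched in Python as follows. The Individual record and the three functions are hypothetical helpers invented for this illustration; a result of False here should be read as "membership cannot be inferred from what is stated", not as "definitely not a member".

```python
from dataclasses import dataclass, field

@dataclass
class Individual:
    """Hypothetical record of what has been stated about one individual."""
    name: str
    asserted: set = field(default_factory=set)   # classes stated explicitly
    backbone: set = field(default_factory=set)   # classes of its has-backbone fillers

def meets_dna_conditions(ind):
    """The conditions written in (3) and (4): a nucleic-acid with some
    deoxyribophosphate backbone."""
    return "nucleic-acid" in ind.asserted and "deoxyribophosphate" in ind.backbone

def is_dna_primitive(ind):
    # Primitive, as in (3): the conditions are only necessary,
    # so membership of dna has to be asserted explicitly.
    return "dna" in ind.asserted

def is_dna_defined(ind):
    # Defined, as in (4): the conditions are necessary and sufficient,
    # so membership of dna follows from meeting them.
    return meets_dna_conditions(ind)

# A nucleic acid with a deoxyribophosphate backbone that was never called dna.
x = Individual("x", asserted={"nucleic-acid"}, backbone={"deoxyribophosphate"})
```

For x, is_dna_primitive(x) is False while is_dna_defined(x) is True, which is the distinction the text draws between the two kinds of definition.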
subclass-of? The class being defined in this class-def must be a sub-class of each of the
class-expressions listed. In (5) and (6) enzyme has two super-classes: protein and
catalyst.
(5) class-def defined catalyst
subclass-of macromolecule
slot-constraint promotes
has-value reaction
(6) class-def enzyme
subclass-of protein, catalyst
slot-constraint* Zero or more slot-constraints, which are special kinds of class-expression
(as described above). A slot-constraint defines a class as shown in (3) and (4)
above. A slot-constraint consists of the following components:
• name A slot can be defined in the ontology through a slot-def; if it isn’t then it is
assumed to have no globally applicable constraints, so any pair of individuals could be
an instance of the slot.
• has-value? A list of one or more class or concrete-type expressions. This is
equivalent to the existential quantifier of predicate logic. (3) and (4) define the class
each instance of which is related via has-backbone to some instance of the class
deoxyribophosphate. This doesn’t mean that has-backbone can only be filled by
instances of deoxyribophosphate.
• value-type? A list of one or more class or concrete-type expressions. This is
equivalent to the universal (for-all) quantifier of predicate logic. (7) defines the class
such that if an instance of dna is related via has-backbone to some individual x,
then x must be an instance of the class deoxyribophosphate. This doesn’t mean that
instances of dna have to have a backbone at all.
(7) class-def defined dna
subclass-of nucleic-acid
slot-constraint has-backbone
value-type deoxyribophosphate
(8) class-def defined dna
subclass-of nucleic-acid
slot-constraint has-backbone
has-value deoxyribophosphate
value-type deoxyribophosphate
To define dna as a class where there must be a backbone and it must only be filled by
deoxyribophosphate we must make a definition as in (8).

• has-filler? A list of one or more individual names or data values (integers or strings).
Every instance of the defined class must be related via the slot to each individual and
data value listed (e.g. (9)).
• max-cardinality?, min-cardinality? and cardinality? A non-negative number n
followed by a class or concrete-type expression. An instance of the defined class can
be related to at most n, at least n or exactly n distinct instances or data values of the
expression defined by the slot constraint. If the expression is missing then the type of
the instances the slot is related to is irrelevant; all that matters is the number. (10)
defines the class each instance of which has exactly 7 transmembrane-helices for
its scaffold.
(9) slot-constraint atomic-number
has-filler 19
(10) slot-constraint has-scaffold
cardinality 7 transmembrane-helix
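The slot-constraint components described above can be read as simple checks over the fillers of a slot for one individual. The following Python sketch assumes the fillers are fully known, which a real OIL reasoner does not assume; the function names mirror the OIL keywords but the code and the sample data are illustrative assumptions only.

```python
def has_value(fillers, cls):
    """Existential reading: at least one filler is an instance of cls."""
    return any(f in cls for f in fillers)

def value_type(fillers, cls):
    """Universal reading: every filler is an instance of cls.
    Vacuously true when there are no fillers at all."""
    return all(f in cls for f in fillers)

def has_filler(fillers, required):
    """Every listed individual or data value must be among the fillers."""
    return all(r in fillers for r in required)

def cardinality(fillers, n, cls=None):
    """Exactly n distinct fillers, optionally counting only instances of cls."""
    distinct = set(fillers)
    if cls is not None:
        distinct = {f for f in distinct if f in cls}
    return len(distinct) == n

# Hypothetical class extensions and filler lists.
deoxyribophosphate = {"b1", "b2"}
transmembrane_helix = {f"h{i}" for i in range(1, 8)}

no_backbone = []            # satisfies (7) but not (3), (4) or (8)
mixed_backbone = ["b1", "r1"]   # satisfies (3)/(4) but not (7) or (8)
only_deoxy = ["b1"]         # satisfies (8)
seven_helices = [f"h{i}" for i in range(1, 8)]
```

Definition (8) corresponds to requiring both has_value and value_type to hold for the same fillers, which is why it rules out both the empty and the mixed backbone.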
1.2.1 Slot definitions
A slot definition associates a slot name and some optional documentation with a slot
description. A slot description specifies global constraints that apply to the slot relation.
Examples are given in (11). A slot-def consists of the following components:
(11) slot-def part-of
inverse is-part-of
properties transitive
slot-def has-polarity
range polarity
slot-def cellularlocation
properties functional
slot-def has_backbone
inverse is_backbone_of
slot-def loosely-binds
domain electrical-charge
subslot-of binds
inverse loosely-bound-by
• subslot-of? A list of one or more slots. Slots may also be arranged into slot
hierarchies.
• domain? If the pair (x,y) is an instance of the slot relation, x must be an instance of
each class-expression listed.
• range? If the pair (x,y) is an instance of the slot relation, y must be an instance or data
value of each class or concrete type expression listed.
• inverse? The name of the slot that is the inverse of the slot being defined.
• properties? A list of one or more properties of the slot, which include:
- transitive: if both (x,y) and (y,z) are instances then (x,z) must be an instance;
- symmetric: if (x,y) is an instance then (y,x) must be an instance;
- functional: if (x,y) is an instance then there is no z such that (x,z) is an instance
and y ≠ z.
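Treating a slot as a set of pairs, the slot properties above can be sketched in Python. In OIL these properties are constraints that every model must satisfy; the functions below merely test one given set of pairs, and all names and sample pairs are assumptions made for illustration.

```python
def is_transitive(pairs):
    """If (x,y) and (y,z) are in the relation then (x,z) must be too."""
    return all((x, z) in pairs
               for (x, y) in pairs
               for (y2, z) in pairs
               if y == y2)

def is_symmetric(pairs):
    """If (x,y) is in the relation then (y,x) must be too."""
    return all((y, x) in pairs for (x, y) in pairs)

def is_functional(pairs):
    """No x is related to two different fillers."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

def inverse(pairs):
    """The inverse slot swaps the two positions of every pair."""
    return {(y, x) for (x, y) in pairs}

# Hypothetical slot extensions.
part_of = {("helix", "domain"), ("domain", "protein"), ("helix", "protein")}
cellular_location = {("enzyme1", "mitochondrion")}
binds = {("a", "b"), ("b", "a")}
```

Here part_of passes the transitivity test but is neither symmetric nor functional, since helix has two fillers.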
1.2.2 Axioms
Current valid axioms defined on lists of class expressions are:
• disjoint All the class expressions listed are pairwise disjoint (have no instances in
common), for example (12);
• covered Every instance of the class is also an instance of at least one of a list of class
expressions;
• disjoint-covered A combination of the two above;
• equivalent All the class expressions listed are equivalent (have the same instances).
(12) class-def ribophosphate
class-def deoxyribophosphate
disjoint ribophosphate deoxyribophosphate
disjoint rna dna
These can be alternatively defined using class definitions with negation and disjunction;
however, they are common enough to warrant their own primitives in the language.
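Read over sets of instances, the four axioms amount to the following checks. This Python sketch is an illustration only: the function names follow the OIL keywords, while the class extensions (including the backbone class) are hypothetical.

```python
from itertools import combinations

def disjoint(*classes):
    """All listed classes are pairwise disjoint."""
    return all(a.isdisjoint(b) for a, b in combinations(classes, 2))

def covered(cls, *covering):
    """Every instance of cls is an instance of at least one covering class."""
    return cls <= set().union(*covering)

def disjoint_covered(cls, *covering):
    """cls is covered by the listed classes, which are pairwise disjoint."""
    return covered(cls, *covering) and disjoint(*covering)

def equivalent(*classes):
    """All listed classes have exactly the same instances."""
    return all(c == classes[0] for c in classes)

# Hypothetical extensions.
ribophosphate = {"r1"}
deoxyribophosphate = {"d1"}
backbone = {"r1", "d1"}
```

With these extensions, disjoint(ribophosphate, deoxyribophosphate) holds, as asserted by the first axiom in (12).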
1.3
Reasoning and OIL
Linking OIL with Description Logics gives the language the well-defined semantics often
lacking in frame-based and web-based language specifications such as OKBC and RDFS.
In addition it considerably extends what the ontologist is able to express. OIL extends
frame languages such as OKBC as follows:
• Arbitrary Boolean class expressions can be used wherever a class name can appear;
• A slot definition can be treated as a class and can be used in class expressions;
• Class definitions can be specified as primitive or defined;
• Slot fillers can be restricted to instances of arbitrarily complex class expressions;
• Global slots allow the specification of parent slot and relation properties;
• There is no restriction on the ordering of class and slot definitions.
Moreover, the reasoning services offered by a Description Logic support the development
and incremental maintenance of an ontology [Bechhofer, Stevens]. Highly optimised
implementations of sound and complete tableaux subsumption algorithms for very
expressive DLs such as SHIQ [Horrocks99] can be used in spite of the high worst-case
complexity. The key services are:
• Subsumption checking between two concept descriptions, C and D: C subsumes D
when the set of individuals that are instances of D is always a subset of the
individuals that are instances of C.
• Classification organises a collection of concept expressions into a partial order based
on the subsumption check. This provides a lattice of definitions, ranging from the
general to the specific. Composed definitions have their position implicitly
determined automatically. Thus classification is a dynamic process where new
compositional expressions can be added to an existing hierarchy.
• Concept satisfiability checks whether a concept description can never have instances
because of inconsistencies, contradictions or some other reason in the model.
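The shape of the first two services can be sketched in Python. A reasoner such as FaCT decides subsumption over all possible interpretations with a tableaux algorithm; the sketch below does not do that. It only compares class extensions in one fixed, made-up interpretation and then orders the classes by that comparison, and every name and extension in it is an assumption for illustration.

```python
def subsumes(c, d):
    """C subsumes D when every instance of D is an instance of C
    (checked here in a single toy interpretation only)."""
    return d <= c

def classify(concepts):
    """Map each concept name to its direct (most specific) strict subsumers."""
    parents = {}
    for name, ext in concepts.items():
        # All strictly more general concepts.
        ancestors = {n for n, e in concepts.items()
                     if n != name and subsumes(e, ext) and not subsumes(ext, e)}
        # Keep an ancestor only if no other ancestor lies strictly below it.
        parents[name] = {a for a in ancestors
                         if not any(concepts[b] < concepts[a]
                                    for b in ancestors if b != a)}
    return parents

# Hypothetical extensions for some of the classes used in the examples.
concepts = {
    "macromolecule": {"e1", "e2", "p1", "c1"},
    "protein": {"e1", "e2", "p1"},
    "catalyst": {"e1", "e2", "c1"},
    "enzyme": {"e1", "e2"},
    "succinate_dehydrogenase": {"e1"},
}

hierarchy = classify(concepts)
```

In this toy hierarchy enzyme is placed directly under both protein and catalyst, and adding a new entry to concepts and re-running classify positions it automatically, which is the dynamic behaviour described above.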
What this means to the ontologist is that (a) there is automated support for building and
evolving the classification lattice and (b) the classification scheme is coherent and
consistent. For example, consider (13) together with the definitions above:
(13) class-def defined mitochondrial
slot-constraint cellular_location
has-value (mitochondrion or (slot-constraint part-of has-value
mitochondrion))
class-def defined succinate_dehydrogenase
subclass-of enzyme
slot-constraint promotes value-type oxidation
slot-constraint cellular_location
has-value (slot-constraint part-of has-value mitochondrion)

• The succinate_dehydrogenase class is recognised as a sub-class of
mitochondrial because of its definition;
• If we introduced the class expression (catalyst and mitochondrial),
succinate_dehydrogenase would be recognised as a sub-class because of its
definition.
1.4 Standard OIL and Heavy OIL
Our starting point was to define a core language with the intention that additional features
be defined as a set of extensions, still with clearly defined semantics. OIL-Core is an
expressive language that still enjoys reasoning support: every construct can be serviced
by a state-of-the-art description logic reasoning engine such as FaCT
[Horrocks99].
Standard OIL is a strict superset of OIL-Core, extending it to include the definition of
instances of classes and roles using instance-of and related statements respectively.
These statements allow the modeller to express instances and distribute them along with
their class definitions using XML syntax. This was an important requirement in the
exchanging of ontologies. However, the instances cannot be used in reasoning at the
present time. Special instances known as nominals are supported by using primitive
disjoint concepts. Nominals include notions such as Italy – a modeller might want to refer
to Italy in a model but there is only one Italy, so although it is really an instance of
country its behaviour is similar to a concept.
Heavy OIL is used to refer to other features not yet in OIL that can be found in other
ontology modelling languages including: further relation properties such as composite
relations, and relation (ir)reflexivity etc; rules/axioms; limited second order expressivity;
default reasoning and modules. These are explored further in papers in [Horrocks00], and
form a line of future work.
1.5 OIL Status
The issues to be addressed by the biology e-Science community are a version of issues to
be addressed by the whole web community: specifically, how to move the web from one
where information is readable by humans to one where information is machine
processable by intelligent services such as information brokers, search agents and
information filters. OIL has been developed as an international collaboration as part of an
effort to realize the W3C vision of a semantic web through the explicit representation of
the meaning of information [Berners-Lee99]. Current OIL activity is focused on:
Example applications: Manchester has already produced a version of the Gene Ontology
[Ashburner00] in OIL and has developed the second generation TAMBIS ontology in
OIL.
Tools: The FaCT reasoner is available with a logic sufficient to reason over OIL, an
implementation efficient enough to make reasoning empirically tractable and a CORBA
IDL with a clean API [Bechhofer99]. Other tools under development include OIL editors
(OILEd at Manchester, OntoEdit at Karlsruhe and an extension of Protégé 2000 at
Stanford), and the adaptation of ontology integration tools, such as Chimaera at Stanford.
EU project proposals: A Network of Excellence and an IST FET-O proposal have been
submitted, with over 20 commercial partners as supporters including major
pharmaceutical companies and biotech web portal information providers.
WWW metadata language standardisation: The OIL effort is a major contributor to the
DARPA Agent Markup Language initiative (DAML) in their attempt to define an
ontology language, DAML-ONT [DAML]. In addition, the W3C have been developing
an extension to RDF to include logical inferencing; OIL’s RDFS mapping gives it a
semantics that is otherwise absent. At the time of writing, the three communities (OIL,
W3C and DAML) have come together in a joint DAML Language Committee to
produce a universal, scalable yet technically sound language for describing knowledge on
the web.
Acknowledgements
The authors would like to acknowledge the entire OIL Consortium
(http://www.ontoknowledge.org/oil/misc.shtml#ackn) and Sean Bechhofer for his
invaluable help with the examples.
References
[Ashburner00] M. Ashburner et al Gene Ontology: Tool for the Unification of Biology,
Nature Genetics Vol 25 pages 25-29, (2000)
[Baker99] P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A Brass.
An Ontology for Bioinformatics Applications Bioinformatics, 15(6):510-520, (1999).
[Bechhofer99] S. Bechhofer, I. Horrocks, P.F. Patel-Schneider and S. Tessaris. A Proposal
for a Description Logic Interface. DL99, International Workshop on Description Logics,
Linköping, Sweden, (1999).
[Bechhofer] S. Bechhofer and C.A. Goble Thesaurus Construction through Knowledge
Representation In: Data and Knowledge Engineering accepted for publication
[Berners-Lee99] T. Berners-Lee Weaving the Web Orion Business, ISBN: 0752820907
(1999)
[Borgida95] A. Borgida Description logics in data management. IEEE Trans on
Knowledge and Data Engineering 7(5) pp: 671-682 (1995)
[Chaudhri97] V.K. Chaudhri, A. Farquhar, R. Fikes, P.D. Karp, and J.P. Rice: Open
knowledge base connectivity 2.0 Technical Report KSL-98-06, Knowledge Systems
Laboratory, Stanford (1997)
[DAML] http://www.daml.org
[Dublin Core] http://purl.ocl.org/dc/
[Fensel00] D. Fensel, I. Horrocks, F. van Harmelen, S. Decker, M. Erdmann, and
M. Klein OIL in a nutshell In: Knowledge Acquisition, Modeling, and Management,
Proceedings of the European Knowledge Acquisition Conference (EKAW-2000), Lecture
Notes in Artificial Intelligence, Springer-Verlag, (2000).
[Horrocks99] I. Horrocks, U. Sattler A Description Logic with Transitive and Inverse
Roles and Role Hierarchies. Journal of Logic and Computation, 9(3): 385-410, (1999).
[Horrocks00] I. Horrocks, D. Fensel, S. Bechhofer, J. Broekstra, S. Decker, M. Erdmann,
C. Goble, F. van Harmelen, M. Klein, S. Staab, R. Studer, The Ontology Inference Layer
OIL, available from http://www.ontoknowledge.org/oil/oilhome.shtml
[Karp99] P.D. Karp, V.K. Chaudhri and J. Thomere: XOL: An XML-based ontology
exchange language: Version 0.3, (1999).
[Stevens] R. Stevens, C.A. Goble and S. Bechhofer, Ontology-based Knowledge
Representation for Bioinformatics to appear in Briefings in Bioinformatics