LinKFactory® : an Advanced Formal Ontology Management System

Werner Ceusters
Peter Martens
Language and Computing (L&C)
Het Moorhof, Hazenakkerstraat 20 A
9520 Zonnegem, Belgium
werner@landc.be
Language and Computing (L&C)
Het Moorhof, Hazenakkerstraat 20 A
9520 Zonnegem, Belgium
peter@landc.be
Abstract
As the web becomes more and more a worldwide platform for
e-commerce, the creation of formal ontologies in all
business sectors becomes crucial. It will become
increasingly important for computers to understand the
real meaning of the content of web pages, and of the data
in the databases lying behind them. The real challenge
will be to create formal ontologies that are processable
and exchangeable by machines.
This paper describes LinKFactory®, a formal terminology
and ontology management system that makes the creation
and management of large-scale, complex, multilingual and
formal ontologies possible. We explain the possibilities
of LinKFactory® based on our experience in creating a
formal representation of the medical world, named
LinKBase®, and linking it to several third-party
ontologies.
Keywords
Formal ontologies, semantic/linguistic knowledge base,
ontology management system, terminology management
system
INTRODUCTION
For many years, numerous teams, mainly of academic
origin, have been working on systems that can handle
terminology and the complex relations between individual
terms. All those systems suffer from at least one of the
following major setbacks:
• Insufficiently formalised (designed for human use, not
machine use)
• Not capable of handling the large numbers of knowledge
objects required to form an adequate ontology
• Not designed to handle linguistic aspects, sometimes
not even multiple language entries
To make the semantic web a success, these three setbacks
will have to be overcome. Knowledge will have to be
formalized so that machines worldwide have a shared and
common understanding of the information provided. The
systems developed will have to be able to handle enormous
amounts of information very fast. As the web is a
universal system, different languages will have to be
supported, i.e. the system and the ontologies developed
will have to be language-independent, yet linkable to all
languages.
The existing problem for researchers and industry is to
build and maintain an environment that makes it possible
to create the large formal ontologies needed while
keeping processing time to a minimum. Formal ontology has
recently been defined as the systematic, formal,
axiomatic development of the logic of all forms and modes
of being [3]. Management systems for smaller ontologies
have been developed (ODE [1], WebOnto [4], Ontolingua
[5], HoZo [6], JOE [7], Protégé [8], OntoSaurus [9], …),
but none of these is capable of dealing with the enormous
and complex ontologies that will be needed to support the
semantic web.
To resolve these problems (initially for the medical
environment), L&C worked on creating a formal ontology
management system called LinKFactory®. The intent of the
project was to implement a knowledge representation and a
compatible reasoning mechanism in a database structure.
Among the objectives set for developing the data model
were:
• The ability to fully model a classification (ontology)
of concepts with all their relevant relationships and
definitions.
• The ability to connect languages with this conceptual
model and use it for natural language understanding.
• The ability to connect the resulting association of
terms and concepts with third-party terminology systems
such as SNOMED or ICD-9.
• All entities in the database should be versioned so
that references can be made to older versions of objects
without losing that information.
During the course of the project, several extra
capabilities were added to these base requirements,
enriching the structure and allowing for even more
sophisticated features.
All this had to be modelled as efficiently as possible,
and in such a way that it would allow easy manipulation
from an application layer.
THE TOOL : LinKFactory®
General Description
LinKFactory® is the formal ontology management system,
developed by L&C, used to build and manage the medical
linguistic knowledge base LinKBase®.
LinKFactory® is a tool that can be used to build large
and complex language-independent formal ontologies.
“Language-independent” has to be understood as
independence from any specific language (such as English,
French, Dutch, …), but not from language as a medium of
communication. Unlike most existing ontology editors, it
is not limited to small ontologies.
The fact that the ontologies are language-independent has
major consequences for the type of applications that can
run on top of them. It will, for example, be much easier
to search for relevant information on the web (or in a
thesaurus): the search can be done in one language, in
free text. This free-text search will be linked to
language-independent concepts (based on the semantics)
that form the basis for the information retrieval. Since
terms in several languages are attached to the concepts
using a linguistic ontology [2], relevant information in
other languages can also be retrieved, while semantically
irrelevant information will not appear in the list of
results.
System Architecture of LinKFactory®
LinKFactory® stores the data in a relational database
(we currently use Oracle). Access to the database is
abstracted away by a set of functions that are “natural”
when dealing with ontologies: get-children, find-path,
join concepts, get terms for concept X, …
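To make the idea of such “natural” functions concrete, the following is a minimal sketch in Java. The class name, method signatures and concept names are illustrative assumptions, not the actual LinKFactory® API, and an in-memory map stands in for the relational database.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical access layer: names and signatures are assumptions for illustration.
final class OntologyStore {
    // parent concept -> direct children
    private final Map<String, Set<String>> children = new HashMap<>();
    // concept -> terms attached to it
    private final Map<String, List<String>> terms = new HashMap<>();

    void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    void addTerm(String concept, String term) {
        terms.computeIfAbsent(concept, k -> new ArrayList<>()).add(term);
    }

    // "get-children": direct children of a concept.
    Set<String> getChildren(String concept) {
        return children.getOrDefault(concept, Collections.emptySet());
    }

    // "get terms for concept X".
    List<String> getTerms(String concept) {
        return terms.getOrDefault(concept, Collections.emptyList());
    }

    // "find-path": breadth-first search downwards from ancestor to descendant.
    // Returns the concepts on a shortest path, or an empty list if there is none.
    List<String> findPath(String ancestor, String descendant) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(ancestor);
        cameFrom.put(ancestor, null);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(descendant)) {
                List<String> path = new ArrayList<>();
                for (String c = current; c != null; c = cameFrom.get(c)) {
                    path.add(0, c);
                }
                return path;
            }
            for (String child : getChildren(current)) {
                if (!cameFrom.containsKey(child)) {
                    cameFrom.put(child, current);
                    queue.add(child);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

An application written against such a layer only sees concepts, paths and terms; the table layout behind it can change without affecting client code.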
One of the main requirements of the project was that a
server-side component should be developed that would
allow developers to use a standardized API to program
applications on top of the semantic database without
requiring intimate knowledge of the internal structure of
the database.
This component also had to be database-independent
(Oracle, Sybase and SQL Server have been tested), capable
of dealing with multiple concurrent users, and stable.
LinKFactory® is also platform-independent (tested on
Windows, Solaris, Unix and Linux). Combining all these
requirements made it clear that Java was to be the
platform of choice, since it supports all of the above
and has become a stable and mature technology in the last
year.
We finally settled on RMI (Remote Method Invocation) as
our technology of choice because of its simplicity and
proven robustness. This means that our server-side
component is a Java application that exposes a remote
interface extending java.rmi.Remote. The application
requires an RMI registry (a sort of Domain Name Server
for RMI servers) to be running so that it can register
itself and clients can connect to the RMI server.
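The RMI set-up described above can be sketched as follows. The interface OntologyService, its single method and the toy hierarchy are hypothetical and only illustrate the pattern; the paper does not show the real remote interface. For simplicity the sketch creates the registry inside the server JVM, whereas a separately started registry would be reached with LocateRegistry.getRegistry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical remote interface (an assumption, not the LinKFactory® API).
interface OntologyService extends Remote {
    List<String> getChildren(String concept) throws RemoteException;
}

// Server-side implementation with a hard-coded toy hierarchy.
final class OntologyServiceImpl implements OntologyService {
    @Override
    public List<String> getChildren(String concept) {
        if ("DISEASE".equals(concept)) {
            return Arrays.asList("INFECTION", "TUMOR");
        }
        return Collections.emptyList();
    }
}

final class RmiServerSketch {
    // Strong reference so the exported object is not garbage collected.
    private static OntologyService exported;

    // Exports the service and binds its stub under the given name in a
    // registry created in this JVM on the given port.
    static Registry start(OntologyService service, int port, String name)
            throws RemoteException {
        exported = service;
        OntologyService stub =
                (OntologyService) UnicastRemoteObject.exportObject(service, 0);
        Registry registry = LocateRegistry.createRegistry(port);
        registry.rebind(name, stub);
        return registry;
    }
}
```

A client would obtain the registry with LocateRegistry.getRegistry(host), look up the bound name, cast the result to OntologyService and call getChildren as if it were a local object.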
The LinKFactory® system consists of two major components
(see figure 1): the LinKFactory® Server and the
LinKFactory® Workbench (the client-side component). The
LinKFactory® Workbench allows the user to browse and
model the LinKBase® data.
Figure 1 : LinKFactory® components
The workbench is a dynamic framework for the
LinKFactory® Beans. Each bean has its own specific
functionality and a limited view of the underlying formal
ontology, but combining a set of beans in the workbench
can provide the user with a powerful tool to view and
manage the data stored in the semantic database. The
workbench gives the user the flexibility to create a
customized tool to view and manage the data in the
ontology.
Different views on the semantic network are
implemented as Java beans. Examples are: Concept tree,
Concept criteria and full definitions, Linktype tree,
Criteria list, Term list, Search pane, Properties panel,
Reverse relations, … The LinKFactory® framework is
implemented in 100% pure Java code. The modular
design is done using Java beans organized and linked in
a freely configurable workspace.
Each user can create multiple views on the semantic
network using the beans available. The beans are
organized in several workspaces designed by the user.
Each workspace can contain multiple frames upon which
the beans are laid out. Once the layout work is finished,
links can be established between the beans used.
Each of the layouts defined can be saved as Java code
and stored in the database. Layouts can be defined on
different levels: Organization, Group, User.
Each bean can have multiple incoming and outgoing links
where appropriate. Beans can also be linked across
frames. Each bean has specific properties, which can be
set at runtime. This approach allows each type of task to
be performed using the optimal layout for the task at
hand.
Several quality assurance mechanisms are built in:
versioning, user tracking, user hierarchies, formal
sanctioning with the possibility to overrule, sibling
detection, linktype hierarchy, etc.
Specifications of the Available Beans :
General
The different beans provide information on, and a view
of, different parts of the ontologies built. Any two of
these beans can be linked to each other when an outgoing
link from bean 1 matches an incoming link from bean 2. A
bar on top of each bean shows the other beans it has been
linked to and the direction of each link. Other items in
the bean bar are the bean label, the button to
display/edit the bean properties and the option to
refresh the bean contents. Optional items (dependent on
the kind of bean) are a shortcut to the linktype filter
property and a draggable item.
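One way to picture the matching of an outgoing link to an incoming link is the standard Java beans bound-property mechanism. The sketch below is an assumption about how such a link could be wired, using java.beans.PropertyChangeSupport; the bean classes and the property name selectedConcept are hypothetical and not taken from LinKFactory®.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical source bean: publishes the selected concept as a bound property.
final class ConceptTreeBeanSketch {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String selectedConcept;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    void setSelectedConcept(String concept) {
        String old = selectedConcept;
        selectedConcept = concept;
        // Outgoing link: notify every linked bean of the new selection.
        support.firePropertyChange("selectedConcept", old, concept);
    }
}

// Hypothetical target bean: its incoming link accepts a concept.
final class TermListBeanSketch {
    private String shownConcept;

    void showTermsFor(String concept) {
        shownConcept = concept;
    }

    String getShownConcept() {
        return shownConcept;
    }
}

final class BeanLinker {
    // Connects the tree's outgoing "selectedConcept" link to the term list's input.
    static void link(ConceptTreeBeanSketch source, TermListBeanSketch target) {
        source.addPropertyChangeListener(evt -> {
            if ("selectedConcept".equals(evt.getPropertyName())) {
                target.showTermsFor((String) evt.getNewValue());
            }
        });
    }
}
```

With this wiring, selecting a concept in the tree bean updates every bean linked to it, which is the behaviour the bean bar makes visible to the user.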
Most important beans
The ConceptTree bean (see figure 2) provides the user
with a view of the hierarchical relations in the semantic
network of concepts. As concepts can have multiple
parents (network structure) and the representation is a
tree view, the network structure is split up into the
matching tree representation. Modifications to the
structure can be made by means of drag and drop.
The functionalities of the ConceptTree bean include
search by knowledge name, search by terms, modification
of the hierarchy and a history of searches. The bean
properties provide a way to specify the number of
siblings to display, the font, the child depth, the
number of children to display, the preferred language,
the parent depth and the leaf-node child depth.
Figure 2 : the ConceptTree bean
A second important bean is the Full Definition bean (see
figure 3). This bean shows the user the hierarchical and
non-hierarchical relations a concept has with other
concepts. These relations are subdivided into the
relations explicitly specified for this concept (beneath
the node labeled CRITERIA) and the implicit relations
(beneath the node labeled INHERITED CRITERIA) (figure 2).
It also shows the full definitions of this concept, i.e.
the sufficient criteria to uniquely identify it. Explicit
relations and full definitions can be added, removed or
modified (by drag and drop).
Figure 3 : the Full Definition bean
We introduced the notions of concept-definition and
concept-criteria, which allow us to group a number of
concept-criteria (essentially relationships) to form a
full definition. In this way a concept can not only have
multiple full definitions and loose concept-criteria, but
the definitions can also overlap.
Concept-criteria are the equivalent of what used to be
complex-concepts; they represent a relationship between
two concepts by use of a linktype.
The following diagram illustrates how full definitions
can be constructed:
[Diagram: concept C with concept-criteria L1 to L5,
concepts C1 to C5, and full definitions FD1 and FD2]
The hypothetical concept C has 5 relationships
(CONCEPT_CRITERIA) and 2 full definitions (FD1 and FD2).
FD1 consists of 3 concept-criteria (L1, L2 and L3) and
FD2 consists of L3 and L4. L5 is simply a loose
concept-criterion not belonging to any full definition.
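The example of the hypothetical concept C, with criteria L1 to L5 and full definitions FD1 and FD2, can be written down as a small data structure. This is a sketch under our own naming assumptions (ConceptDefinitions and its methods are not part of LinKFactory®); it only shows that overlapping definitions and loose criteria fall out of plain set operations.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative model of one concept; class and method names are assumptions.
final class ConceptDefinitions {
    private final Set<String> criteria = new LinkedHashSet<>();
    private final Map<String, Set<String>> fullDefinitions = new LinkedHashMap<>();

    void addCriterion(String id) {
        criteria.add(id);
    }

    // A full definition groups existing concept-criteria of this concept.
    void addFullDefinition(String name, String... criterionIds) {
        Set<String> members = new LinkedHashSet<>(Arrays.asList(criterionIds));
        if (!criteria.containsAll(members)) {
            throw new IllegalArgumentException("unknown concept-criterion in " + name);
        }
        fullDefinitions.put(name, members);
    }

    // Criteria that belong to no full definition.
    Set<String> looseCriteria() {
        Set<String> loose = new LinkedHashSet<>(criteria);
        for (Set<String> members : fullDefinitions.values()) {
            loose.removeAll(members);
        }
        return loose;
    }

    // Criteria shared by two full definitions.
    Set<String> overlap(String first, String second) {
        Set<String> shared = new LinkedHashSet<>(fullDefinitions.get(first));
        shared.retainAll(fullDefinitions.get(second));
        return shared;
    }

    // The concept C from the text: FD1 = {L1, L2, L3}, FD2 = {L3, L4}.
    static ConceptDefinitions exampleFromText() {
        ConceptDefinitions c = new ConceptDefinitions();
        for (String id : new String[] {"L1", "L2", "L3", "L4", "L5"}) {
            c.addCriterion(id);
        }
        c.addFullDefinition("FD1", "L1", "L2", "L3");
        c.addFullDefinition("FD2", "L3", "L4");
        return c;
    }
}
```

For the concept built by exampleFromText, looseCriteria() contains only L5 and overlap("FD1", "FD2") contains only L3, matching the description in the text.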
The ReverseConcept bean (see figure 4) shows the
relations other concepts have with the selected concept.
The node labeled Reverse ConceptCriteria shows the
explicit relations other concepts have with the selected
concept. The node labeled Inherited Reverse
ConceptCriteria shows the implicit relations other
concepts have with this concept, i.e. the explicit
relations other concepts have with a concept that is an
explicit child of the selected concept, so that those
concepts have an implicit relation with the selected
concept. The inherited reverse relations are not shown by
default.
Figure 4 : the Reverse Concept bean
The LinkType bean (see figure 5) provides the user with
a view to the hierarchical relations in the semantic
network of linktypes.
As linktypes can have multiple parents (network
structure) and the representation is a tree-view, the
network structure is split up into the matching tree
representation. Modifications to the structure can be
made by means of drag and drop.
Figure 5 : LinkType Tree bean
Linktypes were deemed to have a hierarchy just like
concepts, so we added the LINKTYPE_TREE to represent
this; this simple construct suffices because there is
only a hierarchical parent-child relationship between
linktypes.
This hierarchy has an effect on the constraints (see
below), because when a linktype is used in a
concept-criterion it automatically implies that all its
parent linktypes are used as well.
E.g.: when there is a link HAS-BONAFIDE-BOUNDARY and it
has HAS-BOUNDARY as parent, then that parent is also
implied when the child is used in a relationship.
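The implication rule for linktypes amounts to collecting all ancestors of a linktype in the linktype hierarchy. The following minimal sketch assumes an in-memory parent map; the class LinktypeTree and its methods are illustrative names, not the LINKTYPE_TREE table itself.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative in-memory linktype hierarchy; names are assumptions.
final class LinktypeTree {
    // child linktype -> direct parent linktypes
    private final Map<String, Set<String>> parents = new HashMap<>();

    void addParent(String child, String parent) {
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    // Returns the linktype itself plus every linktype it implies,
    // i.e. all of its ancestors in the hierarchy.
    Set<String> impliedBy(String linktype) {
        Set<String> implied = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(linktype);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (implied.add(current)) {
                for (String parent : parents.getOrDefault(current, Collections.emptySet())) {
                    todo.push(parent);
                }
            }
        }
        return implied;
    }
}
```

After addParent("HAS-BONAFIDE-BOUNDARY", "HAS-BOUNDARY"), a criterion that uses HAS-BONAFIDE-BOUNDARY is also treated as using HAS-BOUNDARY, so a constraint stated on the parent applies to the child.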
The Translate bean (see figure 6) shows the list of terms
related to the selected concept, the selected linktype or
the selected criterion in a certain language. Terms can
be added, modified or removed. Several Translate beans
can be viewed simultaneously, giving terms in different
languages, all linked to the language-independent
concept. This construction makes it possible, for
example, to place an application on top of the ontology.
Figure 6 : Translate bean
Other available beans include Concept Properties bean,
LinkType Properties bean, Criteria bean, Bookmark
bean and others.
All of these beans can be selected by the user and linked
to each other, creating a powerful environment for
browsing and editing large ontologies. One ontology that
has been built with LinKFactory® is LinKBase®.
LinKBase® : A LARGE FORMAL ONTOLOGY
BUILT WITH LinKFactory®
Since the initial focus of L&C was the medical world,
we started to construct a formal representation of the
medical world. We used LinKFactory® to do this.
LinKBase® is a large multi-lingual medical formal
terminology system covering most parts of healthcare.
The fact that LinKBase® currently contains over
1,000,000 medical concepts and over 350 linktypes (with
over 3,000,000 linktype instantiations) gives a good
indication of the size of the ontologies LinKFactory®
can handle.
The medical concepts, themselves language-independent,
are linked to about 3,000,000 terms in various languages.
Terms can be stored in different languages and can be
linked to concepts, criteria and linktypes through an
intersection table, allowing us to define both homonyms
(one term that has several different meanings, i.e.
several linked concepts/criteria/linktypes) and synonyms
(multiple terms associated with one
concept/criterion/linktype).
Closely related to this intersection table (TERM_CCL) is
the SOURCE and SOURCE_OBJECT construction. SOURCE is a
table that stores a number of medical classification
systems, such as SNOMED or ICD-9-CM, that classify
medical concepts according to their own hierarchy and are
used throughout the medical world. By using SOURCE_OBJECT
we can link TERM_CCL records with certain sources and
assign a code to that combination. This makes it possible
to translate from existing formal medical hierarchies to
our own conceptual structure and back, which is a very
powerful feature.
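The role of TERM_CCL and SOURCE_OBJECT can be illustrated with a small in-memory model. The table names come from the text; the field layout, the method names, the source name SOURCE-A and the code X1 are made up for this sketch and do not reflect the real schema or any real classification code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model of TERM_CCL and SOURCE_OBJECT; the field layout is assumed.
final class TermLinks {
    // One TERM_CCL row: a term linked to a concept, criterion or linktype.
    static final class TermCcl {
        final int id;
        final String term;
        final String language;
        final String target;
        TermCcl(int id, String term, String language, String target) {
            this.id = id; this.term = term; this.language = language; this.target = target;
        }
    }

    // One SOURCE_OBJECT row: a TERM_CCL record with a code in an external source.
    static final class SourceObject {
        final int termCclId;
        final String source;
        final String code;
        SourceObject(int termCclId, String source, String code) {
            this.termCclId = termCclId; this.source = source; this.code = code;
        }
    }

    private final List<TermCcl> termCcl = new ArrayList<>();
    private final List<SourceObject> sourceObjects = new ArrayList<>();

    int link(String term, String language, String target) {
        int id = termCcl.size() + 1;
        termCcl.add(new TermCcl(id, term, language, target));
        return id;
    }

    void assignCode(int termCclId, String source, String code) {
        sourceObjects.add(new SourceObject(termCclId, source, code));
    }

    // Homonyms: every target a single term is linked to.
    Set<String> targetsOf(String term) {
        Set<String> result = new LinkedHashSet<>();
        for (TermCcl row : termCcl) {
            if (row.term.equals(term)) result.add(row.target);
        }
        return result;
    }

    // Synonyms: every term linked to a single target.
    Set<String> termsOf(String target) {
        Set<String> result = new LinkedHashSet<>();
        for (TermCcl row : termCcl) {
            if (row.target.equals(target)) result.add(row.term);
        }
        return result;
    }

    // External code -> targets in our own conceptual structure.
    Set<String> targetsForCode(String source, String code) {
        Set<String> result = new LinkedHashSet<>();
        for (SourceObject so : sourceObjects) {
            if (!so.source.equals(source) || !so.code.equals(code)) continue;
            for (TermCcl row : termCcl) {
                if (row.id == so.termCclId) result.add(row.target);
            }
        }
        return result;
    }

    // Target in our own structure -> codes in an external source.
    Set<String> codesFor(String source, String target) {
        Set<String> result = new LinkedHashSet<>();
        for (TermCcl row : termCcl) {
            if (!row.target.equals(target)) continue;
            for (SourceObject so : sourceObjects) {
                if (so.termCclId == row.id && so.source.equals(source)) result.add(so.code);
            }
        }
        return result;
    }
}
```

Because the code is attached to the term-to-concept link rather than to the term alone, a homonymous term can carry different codes for each of its meanings.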
LinKBase® is an IS_A hierarchy without loops. This is
called a ‘directed acyclic graph’. It means that no
concept can be a child (of a child of a child…) of itself.
In this hierarchy, it is not presumed that the children
of one parent are mutually exclusive. In some cases they
are, which can be made explicit by using the DISJOINT
link. In most cases, they are not.
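Keeping an IS_A hierarchy free of loops comes down to one check when a link is added: the new parent must not already be a descendant of the child. The sketch below shows that check on an in-memory structure; it is our own illustration with assumed names, not a description of how LinKFactory® enforces the constraint.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative IS_A hierarchy that refuses links which would create a loop.
final class IsAHierarchy {
    // concept -> direct parents (multiple parents are allowed)
    private final Map<String, Set<String>> parents = new HashMap<>();

    private Set<String> parentsOf(String concept) {
        return parents.getOrDefault(concept, Collections.emptySet());
    }

    // True if 'ancestor' can be reached from 'concept' by following IS_A links upwards.
    boolean isAncestor(String ancestor, String concept) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(parentsOf(concept));
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (current.equals(ancestor)) return true;
            if (seen.add(current)) todo.addAll(parentsOf(current));
        }
        return false;
    }

    // Adds "child IS_A parent" unless the child is the parent itself or is
    // already an ancestor of the parent, which would close a loop.
    boolean addIsA(String child, String parent) {
        if (child.equals(parent) || isAncestor(child, parent)) return false;
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
        return true;
    }
}
```

Multiple parents remain possible, as the text requires; only links that would make a concept its own descendant are rejected.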
LinKBase® makes a distinction between the domain
knowledge and the linguistic knowledge. The linguistic
ontology is a subset of the global ontology. It contains
elements (medical knowledge and other) of the ontology
that influence the grammar of a language. Part of it is
present in a specific sub-ontology. Another part is
present in the global ontology in a scattered manner. In
this way, every piece of linguistic knowledge has to get a
(referred-to) place in the domain ontology.
The linguistic ontology is on the level of language, not
on the level of formal domain knowledge. Whether
something truly happens or not is not part of the
linguistic configuration. The distinctions in linguistic
configuration are made according to the kinds of roles
that are needed in a sentence. E.g. the ‘HAS-THEME’ is
used for things that are displaced in a movement
predicate.
E.g. “The nurse removes a tumor.”
What is the relation between ‘nurse’, ‘removing’ and
‘tumor’?
• Domain ontologically
How can I think, abstractly, about removing? What are
relevant questions? What are fixed elements, such as:
there is movement (what kind of movement?);
something/someone moves (who or what?);
something/someone provokes movement (who or what?); the
movement has a goal (which goal?).
• Linguistically
That something/someone removes is necessary within this
sentence. It does not matter who removes. It does not
matter whether the nurse really can remove things at all.
This medical ontology has been the basis for a lot of
related products that each deal with a specific problem in
medical environments.
DERIVED APPLICATIONS
Fastcode® is a state of the art coding tool to transform
narrative expressions (diagnosis, clinical findings, etc.)
in natural language into a classification system. Medical
code tables and international classification systems can
contain several thousand codes. Searching for a code is
unpleasant and time-consuming. Fastcode® offers a very
fast and accurate solution using semantic technology.
The semantic database (LinKBase®) contains the
medical terms (see above) linked to the different
classification systems (SNOMED-RT, ICD-9-CM, ICD-10,
MedDRA, ICPC, UMLS, MeSH, …) on the basis of their
conceptual meaning. Fastcode® analyses the meaning of
the input words and performs a search based on the
related concepts as stored in the LinKBase® medical
ontology. Using this approach increases speed and
accuracy while solving problems related to synonyms,
homonyms, compound words, orthography and multiple
spelling possibilities.
TeSSI® is a fast indexer to index documents on the basis
of their content (deep meaning) rather than on the actual
words contained in the text. This makes it possible to
search a textual database using semantics rather than
string matching. It will be clear that the results of
semantic searches will be superior to those of string
matches.
FUTURE DIRECTIONS OF LinKFactory
LinKFactory® is currently used in-house by ten medical
knowledge engineers simultaneously. There are over
7,000,000 knowledge elements, and around 2000
modifications are made on a daily basis. A number of
issues still have to be dealt with.
First, developments have started to make the system
DAML+OIL compatible. This is not an easy task, mainly
because the DAML+OIL conventions themselves are not
mature enough to be unambiguously understood. The fact
that the reasoning mechanism behind LinKFactory® is
based on description logic makes it feasible, however,
and today we can claim to be 90% DAML+OIL compatible.
Another future goal is to integrate unsupervised learning
capacities into LinKFactory®. Using maximum-entropy
models on large amounts of free text, we are currently
able to infer head-modifier relationships automatically
from huge text corpora. The goal is now to find out how
the head-modifier relationships can be “named” by using
information from LinKBase®. This may lead to an optimal
collaboration between statistical and symbolic methods.
REFERENCES
[1] Blázquez, M., Fernández, M., García-Pinar, J.M.,
Gómez, A. Building Ontologies at the Knowledge
Level using the Ontology Design Environment.
KAW98,
http://ksi.cpsc.ucalgary.ca/KAW/KAW98/blazquez/
[2] Ceusters W. et al. The distinction between linguistic
and conceptual semantics in medical terminology
and its implication for NLP-based knowledge
acquisition. In C Chute (ed): Proceedings of
IMIAWG6 Conference on Natural Language and
Medical Concept Representation (IMIA WG6,
Jacksonville,1997) 71-80.
[3] Cocchiarella, N. B. 1991. Formal Ontology. In H.
Burkhardt and B. Smith (eds.), Handbook of
Metaphysics and Ontology. Philosophia Verlag,
Munich: 640-647.
[4] Domingue, J., Tadzebao and WebOnto :
Discussing, browsing, and editing ontologies on the
Web, Proceedings of the 11th Banff Knowledge
Acquisition Workshop, (1998).
[5] Farquhar, A., and Rice, J., The Ontolingua Server :
a tool for collaborative ontology construction,
Proceedings of the 10th Banff Knowledge
Acquisition Workshop, (1996).
[6] Kozaki, K. et al. “Development of an Environment
for Building Ontologies which is based on a
Fundamental Consideration of Relationship and
Role", Proceedings of The Sixth Pacific Knowledge
Acquisition Workshop (PKAW2000), pp.205-221
,Sydney, Australia, December 11-13, 2000.
[7] Mahalingam, K., Huhns, M., An ontology tool for
query formulation in an agent-based context,
Proceedings of the 2nd IFCIS International
Conference on Cooperative Information Systems
(CoopIS '97)
[8] Musen, M.A., The Knowledge Model of
Protégé-2000: Combining Interoperability and Flexibility,
Proceedings of EKAW 2000 International
Conference on Knowledge Engineering and
Knowledge Management. Methods, Models and
Tools, Juan-les-Pins, France, (October 2000).
[9] Preece, A. et al. Better Knowledge Management
through Knowledge Engineering. In IEEE
Intelligent Systems 16:1, Jan-Feb, 2001.