Ontology Learning using WordNet Lexicon*
He Hu1, Xiaoyong Du1, Dayou Liu2 and Ji Hong Ouyang2
1 School of Information, Renmin University of China, Beijing 100872, P.R. China
2 Open Symbol Computation and Knowledge Engineering Laboratory of State Education Committee,
College of Computer Science and Technology, Jilin University, Changchun 130012, P.R. China
E-MAIL: luckh2@163.com
Abstract:
The ontology-based approach has been popularized by
current Semantic Web research. However, building
ontologies by hand has proven to be a very hard and
error-prone task and has become the bottleneck of the
ontology acquisition process. WordNet, an electronic lexical
database, is considered to be the most important resource
available to researchers in computational linguistics. This
paper proposes an ontology learning approach that uses
WordNet lexical resources to build a standard OWL ontology
model. The approach will help the automation of ontology
building and be very useful in ontology-based applications.
Keywords:
Ontology Learning, WordNet, OWL
1. Introduction
The Semantic Web has been regarded as the next
version of the current Web; it aims to add semantics and
better structure to the information available on the Web.
Underlying this is the goal of making the Web more
effective not only for humans but also for automatic
software agents. The basic idea is to create an environment
in which intelligent programs can carry out tasks
independently on behalf of the user. Ontologies turn out to
be the backbone technology for the Semantic Web; Tim
Berners-Lee [1] has portrayed the Semantic Web as a layered
architecture in which the ontology layer lies in the middle
of the other layers.
Fig 1: Semantic Web Layers (layer labels recovered from the
figure: Unicode, URI, Namespaces, XML, XML Schema, RDF M&S,
RDF Schema, Ontology, Rules, Logic, Proof, Sig)
With the development of Semantic Web research,
people are more and more concerned with ontologies.
Various ontology editing tools and environments have been
implemented, which human experts can use to define and
manage ontologies. However, building ontologies by hand
has proven to be a very hard job and has become the
bottleneck of ontology acquisition. The lack of ontologies
has become a great obstacle to the further development of
the Semantic Web.
Research on ontologies to date has mainly addressed
basic principles, such as knowledge representation
formalisms, and has devoted only limited attention to more
practical issues such as techniques and tools for the
actual construction and acquisition of ontologies. In this
paper we propose an ontology learning approach which uses
WordNet lexical resources to build a standard OWL
ontology. The approach will help the automation of
ontology building and will be very useful in
ontology-based applications.
* This research was supported by NSFC of China (project number: 604963205, 60373098)
The paper is organized as follows. Section 2 gives some
preliminary knowledge about ontologies, OWL and the
WordNet system. Section 3 describes how an ontology
mapping approach can be used to obtain OWL ontologies
with the help of the WordNet lexicon. Section 4 discusses
our overall framework for ontology learning. After an
analysis of related work on the subject in Section 5, we
conclude our research work and give some ideas for future
work in the last section.

2. Preliminary

OWL language
A number of ontology definition languages have been
developed over the past years. Among them, the Web
Ontology Language (OWL) [3] is the newly emerging
standard proposed and supported by W3C for defining
ontologies in the Semantic Web. It is based on description
logic, a subset of first-order logic that provides sound and
decidable reasoning support. The OWL Web Ontology
Language is designed for use by applications that need to
process the content of information instead of just
presenting information to humans. OWL facilitates greater
machine interpretability of Web content than that
supported by XML, RDF, and RDF Schema (RDF-S) by
providing additional vocabulary along with a formal
semantics. OWL has three increasingly expressive
sublanguages: OWL Lite, OWL DL, and OWL Full.

Ontology Approach
Tom Gruber [2] has defined an ontology as "a
specification of a conceptualization". Ontologies provide a
deeper level of meaning by providing equivalence
relations between concepts; they standardize the meaning,
description and representation of the concepts, terms and
attributes involved; and they capture the semantics of a
domain, resulting in semantic metadata and an
"ontological commitment" which forms the basis for
knowledge sharing and reuse. Ontologies can provide a
domain theory using an expressive language for capturing
the domain. One property of ontologies is that all
relevant knowledge is made explicit; this requires
specifying many relationships that would otherwise be left
implicit and only made explicit in the applications
developed for working with the ontology. The benefits of
the ontology approach are listed below:
1. Ontologies provide a common vocabulary and
   definition of rules for use by independently developed
   services;
2. Agreements among companies and organizations
   sharing common services can be made with regard to
   their usage, and the meaning of relevant concepts can
   be expressed unambiguously;
3. By composing component ontologies, mapping
   ontologies to one another and brokering terminology
   among participating resources and services,
   independently developed systems, agents and services
   can work together to share information and processes
   consistently, accurately, and completely;
4. Ontologies also facilitate conversations among agents
   to collect, process, fuse, and exchange information;
5. Ontologies can improve search accuracy by enabling
   contextual search using concept definitions and
   relations among them instead of (or in addition to)
   statistical relevance of keywords.

WordNet System
WordNet is an on-line lexical database developed at the
Cognitive Science Laboratory at Princeton University
under the direction of George Miller [4]. Its design is
inspired by current psycholinguistic and computational
theories of human lexical memory. WordNet is considered
to be the most important resource available to researchers
in computational linguistics, text analysis, and many
related areas. English nouns, verbs, adjectives, and
adverbs are organized into synonym sets, each
representing one underlying lexicalized concept. Different
relations link the synonym sets, including antonymy,
hypernymy, hyponymy, holonymy, meronymy, synonymy
and troponymy. The most current version of the WordNet
system is version 2.0. A web interface to WordNet is
available at: www.cogsci.princeton.edu/cgi-bin/webwn.
The WordNet system has had great influence on the
development of lexical databases around the world.
EuroWordNet is a multilingual database with wordnets for
several European languages; in China, the Chinese
Concept Dictionary (CCD) [5] is a WordNet-like semantic
lexicon of contemporary Chinese.
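The organization into synonym sets linked by typed relations can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the `Synset` class, the identifiers, the glosses and the helpers `senses_of` and `related` are hypothetical names and toy data introduced here, not the WordNet API or its actual entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    """A synonym set: one lexicalized concept plus its typed links."""
    sid: str                 # hypothetical identifier, not a WordNet offset
    pos: str                 # part of speech: noun, verb, adjective or adverb
    words: Tuple[str, ...]   # synonymous word forms
    gloss: str               # textual definition (illustrative wording)
    relations: Dict[str, List[str]] = field(default_factory=dict)


# Toy lexicon; the entries are illustrative, not copied from WordNet.
LEXICON: Dict[str, Synset] = {
    s.sid: s
    for s in [
        Synset("transport", "noun", ("transport", "conveyance"),
               "something that carries people or goods",
               {"hyponym": ["bus"]}),
        Synset("bus", "noun", ("bus", "autobus", "coach"),
               "a vehicle carrying many passengers",
               {"hypernym": ["transport"], "hyponym": ["trolleybus"]}),
        Synset("trolleybus", "noun", ("trolleybus", "trolley coach"),
               "a bus powered by overhead wires",
               {"hypernym": ["bus"]}),
    ]
}


def senses_of(word: str) -> List[str]:
    """Return the ids of all synsets that contain the given word form."""
    return [sid for sid, s in LEXICON.items() if word in s.words]


def related(sid: str, relation: str) -> List[str]:
    """Follow one relation link (e.g. 'hypernym') from a synset."""
    return LEXICON[sid].relations.get(relation, [])


print(senses_of("coach"))          # ['bus']
print(related("bus", "hypernym"))  # ['transport']
```

Storing relations as named lists of synset identifiers keeps the sketch close to the idea of relations holding between synsets rather than between individual words.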
3. A Mapping from WordNet to OWL
The WordNet system uniquely identifies a word sense in
two ways: with a set of terms called a synset and with a
textual definition called a gloss. For example, for the third
sense of "transport", the synset consists of the words
"transportation", "shipping", and "transport", and the gloss
of that third sense is "the commercial enterprise of
transporting goods and materials".
WordNet encodes other types of semantic relations as well,
such as kind-of, part-of, and several types of similarity
relations. Table 1 lists some of the most important
WordNet semantic relations and their interpretations:
Tab 1: WordNet Semantic Relations

Symbol  Relation  Meaning
!       Antonym   Y is an antonym of X if X is opposite in meaning to Y.
@       Hypernym  Y is a hypernym of X if X is a (kind of) Y.
~       Hyponym   X is a hyponym of Y if X is a (kind of) Y.
#       Holonym   Y is a holonym of X if X is a part of Y.
%       Meronym   X is a meronym of Y if X is a part of Y.
*       Troponym  X is a troponym of Y if to X is to Y in some manner.

Based on the analysis above, we give an illustration of
the meta-model of the WordNet system in figure 2; an OWL
representation of the meta-model is given below the figure.
The mapping from WordNet to OWL is based on this
meta-model.

[Figure: Word --hasSense--> Synset; Synset --gloss--> rdfs:Literal;
Synset --example--> rdfs:Literal; Synset --synsetRelation--> Synset]
Fig 2: Meta-Model of WordNet

<!-- Classes of the meta-model -->
<owl:Class rdf:ID="Synset">
</owl:Class>
<owl:Class rdf:ID="TypeOfSynset">
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="#Noun"/>
    <owl:Thing rdf:about="#Verb"/>
    <owl:Thing rdf:about="#Adjective"/>
    <owl:Thing rdf:about="#Adverb"/>
  </owl:oneOf>
</owl:Class>
<owl:Class rdf:ID="Word">
</owl:Class>

<!-- Links between words and synsets -->
<owl:ObjectProperty rdf:ID="hasSense">
  <rdfs:domain rdf:resource="#Word"/>
  <rdfs:range rdf:resource="#Synset"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="wordHasTheSense">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Word"/>
</owl:ObjectProperty>

<!-- Each synset has exactly one part of speech -->
<owl:FunctionalProperty rdf:ID="typeOfSynset">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#TypeOfSynset"/>
</owl:FunctionalProperty>

<!-- Textual annotations of a synset -->
<owl:DatatypeProperty rdf:ID="gloss">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:ID="example">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</owl:DatatypeProperty>

<!-- Semantic relations between synsets; the &owl; entity is assumed
     to be declared as http://www.w3.org/2002/07/owl# -->
<owl:ObjectProperty rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Synset"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="antonym">
  <rdf:type rdf:resource="&owl;SymmetricProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hypernym">
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="hyponym">
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="troponym">
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>
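The listing above declares antonym as a symmetric property and hypernym as a transitive one. What such declarations allow a reasoner to derive can be sketched in a few lines of Python. This is a plain set-based illustration, not an OWL reasoner; the function names and the hot/cold antonym pair are assumptions introduced here, and the hypernym links follow the transport/bus/trolleybus example.

```python
from typing import Dict, Set, Tuple

# Direct hypernym links: term -> set of its immediate hypernyms.
HYPERNYM: Dict[str, Set[str]] = {
    "trolleybus": {"bus"},
    "bus": {"transport"},
}


def all_hypernyms(term: str, direct: Dict[str, Set[str]]) -> Set[str]:
    """Collect every hypernym reachable through the transitive relation."""
    seen: Set[str] = set()
    stack = list(direct.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(direct.get(parent, ()))
    return seen


def symmetric_closure(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Add (y, x) for every (x, y), as a symmetric property requires."""
    return pairs | {(y, x) for x, y in pairs}


print(sorted(all_hypernyms("trolleybus", HYPERNYM)))
# ['bus', 'transport']
print(sorted(symmetric_closure({("hot", "cold")})))
# [('cold', 'hot'), ('hot', 'cold')]
```

Only the direct links need to be stored; the fact that "transport" is a hypernym of "trolleybus" is derived rather than asserted.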
Synset, TypeOfSynset and Word of WordNet are
defined as OWL concepts; antonymy, hypernymy,
hyponymy, holonymy, meronymy etc. are defined as OWL
properties between synsets. These properties have different
characteristics: antonymy is symmetric; other properties
such as hypernymy, hyponymy, holonymy and meronymy
are transitive. These characteristics, encoded in OWL, can
support ontology reasoning tasks. For example, "transport"
is a hypernym of "bus" and "bus" is a hypernym of
"trolleybus"; as the hypernym property is transitive, we
can automatically infer that "transport" is a hypernym of
"trolleybus".
The starting elements of the mapping process are
words. For every word in the input word set, we look up
the WordNet lexicon through its API and obtain the
semantic resources of the word, including its synsets and
their antonymy, hypernymy, hyponymy, holonymy and
meronymy relations; an OWL ontology definition is then
generated based on the meta-model above. The overall
architecture is presented in the next section. An example
ontology is presented below. It was created in the Protégé
environment with the OWL plugin; Protégé [6] is a
general-purpose knowledge acquisition framework that is
widely used by groups in various fields. Figure 3 illustrates
an example ontology definition for "bus"; a detailed
discussion of the ontology is omitted here:
Fig 3: An Example Ontology in Protégé
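The generation step of the mapping can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' implementation: the namespace `http://example.org/wordnet#`, the function `synset_to_rdf` and the sample gloss are assumptions introduced here, and the element names simply reuse the class and property names of the meta-model.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WN = "http://example.org/wordnet#"   # hypothetical namespace for the meta-model

ET.register_namespace("rdf", RDF)
ET.register_namespace("wn", WN)


def synset_to_rdf(sid, gloss, hypernyms):
    """Build an rdf:RDF element describing one Synset individual."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # A typed node: an individual of the class Synset.
    node = ET.SubElement(root, f"{{{WN}}}Synset", {f"{{{RDF}}}ID": sid})
    ET.SubElement(node, f"{{{WN}}}gloss").text = gloss
    # One hypernym statement per parent synset.
    for parent in hypernyms:
        ET.SubElement(node, f"{{{WN}}}hypernym",
                      {f"{{{RDF}}}resource": f"#{parent}"})
    return root


# Illustrative input; in the approach described above these values
# would come from a WordNet lookup.
doc = synset_to_rdf("bus", "a vehicle carrying many passengers", ["transport"])
print(ET.tostring(doc, encoding="unicode"))
```

Building the document as an element tree rather than by string concatenation leaves escaping and namespace prefixes to the library.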
4. An Ontology Learning Framework
[Figure components: Domain or Application Free Texts Corpus; Language
Analysis; Words or Lexical Tokens; WordNet Lexicon; OWL Ontologies;
Reasoners (FaCT, ...); Editors (Protege, ...); Ontology Applications]
Fig 4: Ontology Learning Framework
The overall architecture of our ontology learning
framework is presented in figure 4. The input to the
framework is a free-text corpus from a domain or
application. The framework uses a language analyzer to
extract terminology from the corpus; linguistic knowledge
such as grammar, morphological rules and some syntactic
and semantic templates is used for the natural language
processing in this step, which produces words or lexical
tokens. Next, we use the WordNet lexical knowledge base
to retrieve the semantic concepts and relations of the terms.
Based on the mapping described in the last section, the
words or lexical tokens are mapped into OWL ontologies
with the help of the WordNet lexicon. The framework is
domain/application independent and can learn lexical and
ontological knowledge for both general and specific
domains.
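The flow from corpus to ontology statements can be sketched end to end. The fragment below is a deliberately simplified Python illustration under stated assumptions: `TOY_LEXICON` stands in for the WordNet lexicon, `normalize` is a naive plural rule standing in for real morphological analysis, and the output is a set of (subject, property, object) triples rather than serialized OWL. All names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the WordNet lexicon: term -> direct hypernyms.
TOY_LEXICON: Dict[str, List[str]] = {
    "trolleybus": ["bus"],
    "bus": ["transport"],
}


def normalize(token: str) -> str:
    """Naive morphological rule: map a plural to a known singular form."""
    for suffix in ("es", "s"):
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem in TOY_LEXICON:
            return stem
    return token


def learn(text: str) -> Set[Tuple[str, str, str]]:
    """Map the known terms of a text to (term, 'hypernym', parent) triples."""
    tokens = re.findall(r"[a-z]+", text.lower())      # language analysis
    terms = {normalize(tok) for tok in tokens}        # words / lexical tokens
    triples: Set[Tuple[str, str, str]] = set()
    for term in terms:                                # lexicon lookup
        for parent in TOY_LEXICON.get(term, []):
            triples.add((term, "hypernym", parent))   # ontology statement
    return triples


print(sorted(learn("Trolleybuses and buses run in the city.")))
# [('bus', 'hypernym', 'transport'), ('trolleybus', 'hypernym', 'bus')]
```

Words that the lexicon does not know, such as "city" in the sample sentence, simply produce no statements in this sketch.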
5. Related Works
Alessandro Lenci et al. [7] have worked on formalizing the
EuroWordNet synsets and Top Ontology in RDF and on
writing semantic frames in RDF/S as a basis for
interlingua representations. However, RDFS has unclear
semantics and no clean separation between instances,
ontologies and meta-ontologies (e.g. the RDFS language
itself); moreover, RDFS has no inference model, which is
of crucial importance for automatic tasks. Our mapping
targets OWL, which has clear semantics provided by
description logic systems; OWL can distinguish between
instances and ontologies; OWL also enjoys a well-founded
inference model from a particular description logic
(SHOQ(D) [8]).
There are many other research works intended to
extend WordNet or to achieve a formal specification of
WordNet. Martin [9] presents the transformation of the
noun-related part of WordNet into a genuine "lexical
ontology" to support knowledge representation, sharing
and retrieval within a knowledge base or on the Web. Aldo
Gangemi et al. [10,11] developed a research program which
aims to achieve a formal specification of WordNet. Within
this program, they developed a hybrid bottom-up top-down
methodology to automatically extract association relations
from WordNet and to interpret those associations in terms
of a set of conceptual relations formally defined in the
DOLCE foundational ontology. The focus of [9] is to guide
and ease the representation, retrieval and sharing of
general knowledge; the focus of [10,11] is the extension
and axiomatization of conceptual relations in WordNet.
Neither of them uses a mapping approach as illustrated in
this paper, and their ontologies are not targeted at OWL.
6. Conclusion and Future Works
Research on ontology is becoming increasingly
widespread in the computer science community.
Ontologies are used in a wide range of fields such as the
Semantic Web, search engines, e-commerce, natural
language processing, knowledge engineering, and
information extraction and retrieval. The major problems
in building ontologies are the bottleneck of ontology
acquisition and the time-consuming construction of
separate ontologies for different domains and applications.
Automating ontology construction through ontology
learning is one way to address these problems.
In this paper, we propose an ontology learning
approach and framework based on the mapping from the
WordNet lexicon to OWL ontologies. Synset,
TypeOfSynset and Word of WordNet are defined as OWL
concepts; antonymy, hypernymy, hyponymy, holonymy,
meronymy etc. are defined as OWL properties between
synsets. This approach will help the automation of
ontology building and be very useful in ontology-based
applications. We plan to study Chinese ontology learning
with the support of the Chinese Concept Dictionary (CCD)
in our future research work.
References
[1] Tim Berners-Lee, James Hendler and Ora Lassila,
The Semantic Web, Scientific American, May 2001
[2] Gruber T. R., Toward Principles for the Design of
Ontologies Used for Knowledge Sharing,
International Journal of Human and Computer
Studies, 43(5/6): 907-928, 1995
[3] W3C, OWL Web Ontology Language Overview,
http://www.w3.org/TR/owl-features/, 2003
[4] George Miller, WordNet: An On-line Lexical Database,
International Journal of Lexicography, 3(4),
pp235-312, 1990
[5] Y. Liu, J. S. Yu and S. W. Yu, A Treestructure
Solution for the Development of Chinese WordNet,
The First Global WordNet Conference, Mysore, India,
pp51-56, 2002
[6] Musen M. A., Fergerson R. W., Grosso W. E., et al.,
Component-Based Support for Building
Knowledge-Acquisition Systems, Conference on
Intelligent Information Processing (IIP 2000) of the
International Federation for Information Processing
World Computer Congress (WCC 2000), Beijing, 2000
[7] Nicoletta Calzolari, Antonio Zampolli, Alessandro
Lenci, Towards a Standard for a Multilingual Lexical
Entry: The EAGLES/ISLE Initiative, CICLing 2002,
pp264-279, 2002
[8] Horrocks I. and Sattler U., Ontology Reasoning in the
SHOQ(D) Description Logic, Proceedings of the
Seventeenth International Joint Conference on
Artificial Intelligence, 2001
[9] Martin Ph., Correction and Extension of WordNet 1.7,
11th International Conference on Conceptual
Structures (ICCS), LNAI 2746, pp160-173, 2003
[10] Aldo Gangemi, Roberto Navigli, Paola Velardi, The
OntoWordNet Project: Extension and Axiomatization
of Conceptual Relations in WordNet, LNCS 2888,
pp820-838, 2003
[11] Gangemi A., Navigli R., Velardi P., Axiomatizing
WordNet Glosses in the OntoWordNet Project, 2nd
International Semantic Web Conference (ISWC),
2003