Ontology Learning using WordNet Lexicon*

He Hu1, Xiaoyong Du1, Dayou Liu2 and Ji Hong Ouyang2
1 School of Information, Renmin University of China, Beijing 100872, P.R. China
2 Open Symbol Computation and Knowledge Engineering Laboratory of State Education Committee, College of Computer Science and Technology, Jilin University, Changchun 130012, P.R. China
E-MAIL: luckh2@163.com

* This research was supported by NSFC of China (project number: 604963205,60373098)

Abstract: The ontology-based approach has been popularized by current Semantic Web research. However, building ontologies by hand has proven to be a very hard and error-prone task and has become the bottleneck of the ontology acquisition process. WordNet, an electronic lexical database, is considered to be the most important resource available to researchers in computational linguistics. This paper proposes an ontology learning approach that uses WordNet lexicon resources to build a standard OWL ontology model. The approach will help automate ontology building and will be very useful in ontology-based applications.

Keywords: Ontology Learning, WordNet, OWL

1. Introduction

The Semantic Web has been regarded as the next version of the current Web; it aims to add semantics and better structure to the information available on the Web. Underlying this is the goal of making the Web more effective not only for humans but also for automatic software agents. The basic idea is to create an environment in which intelligent programs can carry out tasks independently on behalf of the user.

Ontologies in fact turn out to be the backbone technology for the Semantic Web; Tim Berners-Lee [1] has portrayed the Semantic Web as a layered architecture in which the ontology layer lies in the middle of the other layers. With the development of Semantic Web research, people are more and more concerned with ontologies, and various ontology editing tools and environments have been implemented with which human experts can define and manage ontologies. However, building ontologies by hand has proven to be a very hard job and has become the bottleneck of ontology acquisition. The lack of ontologies has become a great obstacle to the further development of the Semantic Web.

Fig 1: Semantic Web Layers (labels recoverable from the figure: Unicode, URI, XML, Namespaces, XML Schema, Sig, RDF M&S, RDF Schema, Ontology, Rules, Logic, Proof)

Research in ontologies to date has mainly addressed basic principles, such as knowledge representation formalisms, and has devoted only limited attention to more practical issues such as techniques and tools for the actual construction and acquisition of ontologies. In this paper we propose an ontology learning approach that uses WordNet lexicon resources to build a standard OWL ontology. The approach will help automate ontology building and will be very useful in ontology-based applications.

The paper is organized as follows. Section 2 gives some preliminary knowledge about ontologies, OWL and the WordNet system. Section 3 describes how we use the ontology mapping approach to obtain OWL ontologies with the help of the WordNet lexicon. Section 4 discusses our overall framework for ontology learning. After an analysis of current related work on the subject, we conclude our research work and give some ideas for future work in the last section.

2. Preliminary

Ontology Approach

Tom Gruber [2] has defined an ontology as "a specification of a conceptualization". Ontologies provide a deeper level of meaning by providing equivalence relations between concepts; they can standardize the meaning, description and representation of the concepts, terms and attributes involved; and they capture the semantics involved via domain characteristics, resulting in semantic metadata and an "ontological commitment" that forms the basis for knowledge sharing and reuse.

Ontologies can provide a domain theory using an expressive language for capturing the domain. One of the properties of ontologies is that all relevant knowledge is made explicit; this makes it necessary to specify many relationships that are otherwise left implicit and are only made explicit in the applications developed for working with the ontology.

A list of benefits of the ontology approach is presented below:

1. Ontologies provide a common vocabulary and definition of rules for use by independently developed services;
2. Agreements among companies and organizations sharing common services can be made with regard to their usage, and the meaning of relevant concepts can be expressed unambiguously;
3. By composing component ontologies, mapping ontologies to one another and brokering terminology among participating resources and services, independently developed systems, agents and services can work together to share information and processes consistently, accurately, and completely;
4. Ontologies also facilitate conversations among agents to collect, process, fuse, and exchange information;
5. Ontologies can improve search accuracy by enabling contextual search using concept definitions and the relations among them instead of (or in addition to) the statistical relevance of keywords.

OWL language

A number of ontology definition languages have been developed over the past years. Among them, the Web Ontology Language (OWL) [3] is the newly emerging standard proposed and supported by W3C for defining ontologies in the Semantic Web. It is based on description logic, a subset of first-order logic that provides sound and decidable reasoning support.

OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly expressive sublanguages: OWL Lite, OWL DL, and OWL Full.

WordNet System

WordNet is an on-line lexical database developed at the Cognitive Science Laboratory at Princeton University under the direction of George Miller [4]. WordNet is considered to be the most important resource available to researchers in computational linguistics, text analysis, and many related areas. Its design is inspired by current psycholinguistic and computational theories of human lexical memory. English nouns, verbs, adjectives, and adverbs are organized into synonym sets, each representing one underlying lexicalized concept. Different relations link the synonym sets, including antonymy, hypernymy, hyponymy, holonymy, meronymy, synonymy and troponymy. The most current version of the WordNet system is version 2.0. A web interface to WordNet is available at: www.cogsci.princeton.edu/cgi-bin/webwn.

The WordNet system has had great influence on the development of lexical databases around the world. EuroWordNet is a multilingual database with wordnets for several European languages; in China, the Chinese Concept Dictionary (CCD) [5] is a WordNet-like semantic lexicon of contemporary Chinese.

3. A Mapping from WordNet to OWL

The WordNet system uniquely identifies a word sense in two ways: with a set of terms called a synset and with a textual definition called a gloss. For example, for the third sense of "transport", the synset consists of the words "transportation", "shipping", and "transport". The gloss of that third sense is "the commercial enterprise of transporting goods and materials". WordNet encodes other types of semantic relations as well, such as kind-of, part-of, and several types of similarity relations.
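The two identifiers of a word sense described above, the synset and the gloss, can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions: `Synset`, `LEXICON` and `lookup_sense` are hypothetical names introduced for illustration and are not part of the WordNet API, and the in-memory dictionary stands in for the real lexicon. Only the sense quoted in the text is filled in.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Synset:
    """One word sense: the set of synonymous terms plus its textual gloss."""
    terms: Tuple[str, ...]
    gloss: str


# Hypothetical stand-in for the lexicon, keyed by (word, sense number).
# Only the sense quoted in the text is present; a real lexicon holds many more.
LEXICON: Dict[Tuple[str, int], Synset] = {
    ("transport", 3): Synset(
        terms=("transportation", "shipping", "transport"),
        gloss="the commercial enterprise of transporting goods and materials",
    ),
}


def lookup_sense(word: str, sense: int) -> Optional[Synset]:
    """Return the synset for the given word sense, or None if it is not known."""
    return LEXICON.get((word.lower(), sense))


if __name__ == "__main__":
    found = lookup_sense("transport", 3)
    if found is not None:
        print(", ".join(found.terms))
        print(found.gloss)
```

Keying the table by word and sense number reflects that one word may have several senses, each with its own synset and gloss.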
Table 1 lists some of the most important WordNet semantic relations and their interpretations:

Tab 1: WordNet Semantic Relations

Symbol   Relation   Meaning
!        Antonym    Y is an antonym of X if X is opposite in meaning to Y.
@        Hypernym   Y is a hypernym of X if X is a (kind of) Y.
~        Hyponym    X is a hyponym of Y if X is a (kind of) Y.
#        Holonym    Y is a holonym of X if X is a part of Y.
%        Meronym    X is a meronym of Y if X is a part of Y.
*        Troponym   X is a troponym of Y if to X is to Y in some manner.

Based on the analysis above, we give an illustration of the meta-model of the WordNet system in Figure 2; an OWL representation of the meta-model is given below the figure. The mapping from WordNet to OWL is based on this meta-model.

Fig 2: Meta-Model of WordNet (labels recoverable from the figure: Word, hasSense, Synset, gloss, example, synsetRelation, rdfs:Literal)

<owl:Class rdf:ID="Synset">
</owl:Class>

<owl:DatatypeProperty rdf:ID="gloss">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</owl:DatatypeProperty>

<owl:DatatypeProperty rdf:ID="example">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</owl:DatatypeProperty>

<owl:ObjectProperty rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Synset"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="antonym">
  <rdf:type rdf:resource="&owl;SymmetricProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="hypernym">
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="hyponym">
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="troponym">
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</owl:ObjectProperty>

<owl:FunctionalProperty rdf:ID="typeOfSynset">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#TypeOfSynset"/>
</owl:FunctionalProperty>

<owl:Class rdf:ID="TypeOfSynset">
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="#Noun"/>
    <owl:Thing rdf:about="#Verb"/>
    <owl:Thing rdf:about="#Adjective"/>
    <owl:Thing rdf:about="#Adverb"/>
  </owl:oneOf>
</owl:Class>

<owl:Class rdf:ID="Word">
</owl:Class>

<owl:ObjectProperty rdf:ID="hasSense">
  <rdfs:domain rdf:resource="#Word"/>
  <rdfs:range rdf:resource="#Synset"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="wordHasTheSense">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Word"/>
</owl:ObjectProperty>

Synset, TypeOfSynset and Word of WordNet are defined as OWL concepts; antonymy, hypernymy, hyponymy, holonymy, meronymy etc. are defined as OWL properties between synsets. These properties have different characteristics: antonymy is symmetric; other properties such as hypernymy, hyponymy, holonymy and meronymy are transitive. These characteristics, once encoded in OWL, can support ontology reasoning tasks. For example, "transport" is a hypernym of "bus" and "bus" is a hypernym of "trolleybus"; as the hypernym property is transitive, we can automatically infer that "transport" is a hypernym of "trolleybus".
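The inference in the example above can be illustrated with a few lines of code. This is a minimal sketch, not an OWL reasoner: it assumes the direct hypernym assertions are held in a plain dictionary and computes what transitivity allows us to conclude by graph traversal. The names `DIRECT_HYPERNYMS` and `inferred_hypernyms` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Set

# Direct hypernym assertions from the example in the text:
# each term maps to the set of its directly asserted hypernyms.
DIRECT_HYPERNYMS: Dict[str, Set[str]] = {
    "trolleybus": {"bus"},
    "bus": {"transport"},
}


def inferred_hypernyms(term: str, direct: Dict[str, Set[str]]) -> Set[str]:
    """Return every hypernym reachable from `term` by following the relation transitively."""
    found: Set[str] = set()
    stack = list(direct.get(term, ()))
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already visited; this also guards against cycles
        found.add(current)
        stack.extend(direct.get(current, ()))
    return found


if __name__ == "__main__":
    # "transport" is never asserted directly for "trolleybus"; it follows from transitivity.
    print(sorted(inferred_hypernyms("trolleybus", DIRECT_HYPERNYMS)))
```

A description logic reasoner would derive the same facts from the TransitiveProperty declaration; the traversal here only makes the consequence of that declaration visible.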
The initial elements of the mapping process are words. For every word in the input word set, we look up the WordNet lexicon by calling its API and get the semantic resources of the word, including its synsets and its antonymy, hypernymy, hyponymy, holonymy and meronymy relations; an OWL ontology definition is then generated based on the meta-model above. The overall architecture will be presented in the next section.

An example ontology is presented below. The ontology was created in the Protégé environment with the OWL plugin; Protégé [6] is a general-purpose knowledge acquisition framework that is widely used by groups in various fields. Figure 3 illustrates an example ontology definition about "bus"; a detailed discussion of the ontology is omitted here.

Fig 3: An Example Ontology in Protégé

4. An Ontology Learning Framework

Fig 4: Ontology Learning Framework (labels recoverable from the figure: Domain or Application Free Texts Corpus; Language Analysis; Words or Lexical Tokens; WordNet Lexicon; OWL Ontologies; Reasoners: FaCT,...; Editors: Protege, ...; Ontology Applications)

The overall architecture of our ontology learning framework is presented in Figure 4. The input to the framework is a corpus of free texts from a domain or application. The framework uses a language analyzer to extract terminology from the corpus; linguistic knowledge such as grammar, morphological rules and some syntactic and semantic templates is used for the natural language processing in this step, which produces words or lexical tokens. Next, we use the WordNet lexical knowledge base to retrieve the semantic concepts and relations of the terms. Based on the mapping described in the last section, the words or lexical tokens are mapped into OWL ontologies with the help of the WordNet lexicon. The framework is domain and application independent and can learn lexical and ontological knowledge for both general and specific domains.
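The flow through the framework can be sketched end to end on a toy scale. The code below is only an illustration under strong assumptions: the language analysis step is reduced to a regular-expression tokenizer, the WordNet lookup is replaced by a hypothetical in-memory table (`TOY_LEXICON`) that holds only the hypernym facts from the "bus" example, and the output is a fragment of RDF/XML individuals typed by the Synset class of the meta-model rather than a complete OWL document. All function names are hypothetical.

```python
import re
from typing import Dict, List
from xml.sax.saxutils import quoteattr

# Hypothetical stand-in for the WordNet lookup step: word -> direct hypernyms.
TOY_LEXICON: Dict[str, List[str]] = {
    "trolleybus": ["bus"],
    "bus": ["transport"],
}


def extract_tokens(text: str) -> List[str]:
    """Language-analysis stand-in: lowercase alphabetic tokens, duplicates dropped, order kept."""
    seen: List[str] = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token not in seen:
            seen.append(token)
    return seen


def to_owl_individual(word: str, hypernyms: List[str]) -> str:
    """Render one word as a Synset individual with its hypernym links."""
    lines = [f"<Synset rdf:ID={quoteattr(word)}>"]
    for hypernym in hypernyms:
        lines.append(f"  <hypernym rdf:resource={quoteattr('#' + hypernym)}/>")
    lines.append("</Synset>")
    return "\n".join(lines)


def learn_ontology(text: str, lexicon: Dict[str, List[str]]) -> str:
    """Corpus text -> tokens -> lexicon lookup -> RDF/XML fragment."""
    fragments = [
        to_owl_individual(token, lexicon[token])
        for token in extract_tokens(text)
        if token in lexicon  # tokens unknown to the lexicon are skipped
    ]
    return "\n".join(fragments)


if __name__ == "__main__":
    print(learn_ontology("The trolleybus replaced the bus.", TOY_LEXICON))
```

Keeping the three steps as separate functions mirrors the separation of language analysis, lexicon lookup and ontology generation in the framework, so that any one of them could be swapped for a real component.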
5. Related Works

Alessandro Lenci et al. [7] have worked on formalizing the EuroWordNet synsets and Top Ontology in RDF and on writing semantic frames in RDF/S as a basis for interlingua representations. However, RDFS has unclear semantics and no clean separation between instances, ontologies and meta-ontologies (e.g. the RDFS language itself); moreover, RDFS has no inference model, which is of crucial importance for automatic tasks. Our mapping targets OWL, which has clear semantics provided by description logic systems; OWL can distinguish between instances, ontologies, etc.; OWL also enjoys a well-founded inference model from a particular description logic (SHOQ(D) [8]).

There are many other research works that aim to extend WordNet or to achieve a formal specification of WordNet. Martin [9] presents the transformation of the noun-related part of WordNet into a genuine "lexical ontology" to support knowledge representation, sharing and retrieval within a knowledge base or on the Web. Aldo Gangemi et al. [10,11] develop a research program that aims to achieve a formal specification of WordNet. Within this program, they developed a hybrid bottom-up top-down methodology to automatically extract association relations from WordNet and to interpret those associations in terms of a set of conceptual relations formally defined in the DOLCE foundational ontology.

The focus of [9] is to guide and ease the representation, retrieval and sharing of general knowledge; the focus of [10,11] is the extension and axiomatization of conceptual relations in WordNet. Neither of them uses a mapping approach as illustrated in this paper, and their ontologies are not targeted at OWL.

6. Conclusion and Future Works

Research on ontology is becoming increasingly widespread in the computer science community. Ontologies are used in a wide range of fields such as the Semantic Web, search engines, e-commerce, natural language processing, knowledge engineering, and information extraction and retrieval. The major problems in building ontologies are the bottleneck of ontology acquisition and the time-consuming construction of separate ontologies for different domains and applications. Automating ontology construction through ontology learning is a solution to these problems.

In this paper, we propose an ontology learning approach and framework based on the mapping from the WordNet lexicon to OWL ontologies. Synset, TypeOfSynset and Word of WordNet are defined as OWL concepts; antonymy, hypernymy, hyponymy, holonymy, meronymy etc. are defined as OWL properties between synsets. This approach will help automate ontology building and will be very useful in ontology-based applications. We plan to study Chinese ontology learning with the support of the Chinese Concept Dictionary (CCD) in our future research work.

References

[1] Tim Berners-Lee, James Hendler and Ora Lassila, The Semantic Web, Scientific American, May 2001
[2] Gruber T. R., Toward Principles for the Design of Ontologies Used for Knowledge Sharing, International Journal of Human-Computer Studies, 43(5/6): 907-928, 1995
[3] W3C, OWL Web Ontology Language Overview, http://www.w3.org/TR/owl-features/, 2003
[4] George Miller, WordNet: An On-line Lexical Database, International Journal of Lexicography, 3(4), pp. 235-312, 1990
[5] Y. Liu, J. S. Yu and S. W. Yu, A Tree-structure Solution for the Development of Chinese WordNet, The First Global WordNet Conference, Mysore, India, pp. 51-56, 2002
[6] Musen M. A., Fergerson R. W., Grosso W. E., et al., Component-Based Support for Building Knowledge-Acquisition Systems, Conference on Intelligent Information Processing (IIP 2000) of the International Federation for Information Processing World Computer Congress (WCC 2000), Beijing, 2000
[7] Nicoletta Calzolari, Antonio Zampolli, Alessandro Lenci, Towards a Standard for a Multilingual Lexical Entry: The EAGLES/ISLE Initiative, CICLing 2002: 264-279
[8] Horrocks I. and Sattler U., Ontology Reasoning in the SHOQ(D) Description Logic, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001
[9] Martin Ph., Correction and Extension of WordNet 1.7, 11th International Conference on Conceptual Structures (ICCS), LNAI 2746, pp. 160-173, 2003
[10] Aldo Gangemi, Roberto Navigli, Paola Velardi, The OntoWordNet Project: Extension and Axiomatization of Conceptual Relations in WordNet, LNCS 2888, pp. 820-838, 2003
[11] Gangemi A., Navigli R., Velardi P., Axiomatizing WordNet Glosses in the OntoWordNet Project, 2nd International Semantic Web Conference (ISWC), 2003