Evaluating Fuzzy and Gene Ontology on Different Ontology Levels

Parth Mehta
Dept. of Computer Engineering
Dwarkadas J. Sanghvi COE
Mumbai, India
parthpmehta93@gmail.com

Umang Gala
Dept. of Computer Engineering
Dwarkadas J. Sanghvi COE
Mumbai, India
umanggala9@gmail.com

ABSTRACT
An ontology is an explicit formal conceptualization of a particular domain of interest. It necessarily entails or embodies some world view with respect to that domain. Ontologies are increasingly being used in fields such as knowledge management, communication between organizations, systems engineering, information extraction, and the semantic web. Ontology evaluation is the task of assessing a given ontology against a specific application criterion, generally to determine which of several ontologies would best suit a particular purpose. In this paper we review different levels of evaluation vis-à-vis different ontology approaches.

Keywords
Fuzzy ontology, Gene ontology, Ontology Evaluation, Semantic web

INTRODUCTION
'Ontology' is the term used to refer to the shared understanding of some domain of interest, which may be used as a unifying framework to solve problems [1]. An ontology is usually employed as a structure to capture knowledge about a specific area by providing relevant concepts and the relations between them. Ontologies are agreements about shared conceptualizations [1]. These shared conceptualizations comprise conceptual frameworks for modelling domain knowledge, content-specific protocols for communication among inter-operating agents, and agreements regarding the representation of particular domain theories [1]. In a knowledge-sharing context, ontologies are described in the form of definitions of representational vocabulary.

Simply put, ontologies are a fundamental data structure for conceptualizing knowledge. However, we can generally construct several different ontologies conceptualizing the same body of knowledge, and we should therefore be in a position to determine which of them best suits a predefined criterion. Thus, ontology evaluation is a crucial issue that must be addressed if ontologies are to be widely adopted in the semantic web and in other semantics-aware applications. Further, people constructing an ontology also require some mechanism to evaluate the result, possibly to guide the construction process and any refinement steps. Moreover, automated or semi-automated ontology learning techniques also require effective evaluation measures, which can be used to select the “best matching” ontology out of many plausible candidates and to choose values for the tunable parameters of the learning algorithm.

Fuzzy Ontology
OWL DL, the description-logic variant of the Web Ontology Language, becomes less suitable in domains in which the concepts to be represented do not have precise definitions. Unfortunately, this scenario is likely the rule rather than the exception. Classical ontology languages are not appropriate for dealing with imprecision or vagueness in knowledge, and fuzzy ontologies offer a solution to this problem. In the setting considered here, the model comprises distinct sources of information whose databases reside on separate servers, each of which can be provided with its own search engine, i.e. information retrieval system. A fuzzy ontology is simply an ontology that uses fuzzy logic to provide a natural representation of imprecise and vague knowledge and to ease reasoning over it. It thereby helps improve information search and retrieval, as it simplifies the theories used in most scientific fields.
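To make the idea of imprecise concepts concrete, the following is a minimal sketch of how fuzzy membership degrees and fuzzy logical operators can be expressed. It is not taken from any fuzzy ontology language: the concepts "tall" and "heavy", their thresholds, and all function names are hypothetical, and the min/max/complement operators are only one common choice of fuzzy connectives.

```python
def ramp(x: float, low: float, high: float) -> float:
    """Piecewise-linear membership: 0 at or below `low`, 1 at or above `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Hypothetical vague concepts; the thresholds are illustrative assumptions.
def tall(height_cm: float) -> float:
    return ramp(height_cm, 160.0, 190.0)


def heavy(weight_kg: float) -> float:
    return ramp(weight_kg, 60.0, 100.0)


# One common choice of fuzzy connectives (min, max, complement).
def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)


def fuzzy_not(a: float) -> float:
    return 1.0 - a


if __name__ == "__main__":
    # Degree to which an individual is both "tall" and "heavy".
    print(fuzzy_and(tall(175.0), heavy(90.0)))
```

Instead of a crisp yes/no answer, each individual belongs to a concept to a degree between 0 and 1, which is what lets a retrieval system return approximate matches for an imprecise query.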
Gene Ontology
The Gene Ontology (GO) provides an ontology of defined terms representing the properties of gene products (biochemical products that are either proteins or RNA). The ontology covers three domains:
1) cellular component: the parts of a cell or its extracellular environment;
2) molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis;
3) biological process: operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.
Each term is assigned to one of these three ontologies. Many GO terms have synonyms, or relations to terms in any one of the three ontologies. Genome annotation is the method of capturing data about a gene product and then classifying it into one of the domains mentioned above.

LITERATURE REVIEW

Categorization Approaches of Ontology Evaluation
In a broad sense, most evaluation approaches belong to one of the following categories [1]:
• those that are based on comparing the ontology to a “golden standard”, which may in turn be an ontology (e.g. [2]);
• those that are based on using the ontology in a particular application and evaluating the results (e.g. [3]);
• those that involve comparisons with a source of data about the domain to be covered by the ontology (e.g. [4]);
• those where evaluation is performed by humans who try to assess the degree to which the ontology meets a set of predefined criteria, standards, requirements, etc. (e.g. [5]).

An ontology is a fairly complex structure, and it is often more feasible to evaluate its different levels separately than to evaluate the ontology directly as a whole. This is especially true if we wish the evaluation to be predominantly automated. Though the individual levels have been defined in detail by several different authors, these definitions tend to be broadly similar and involve the following levels [1]:

Lexical, vocabulary, or data layer. Here the focus is on which concepts, instances, facts, etc. have been covered in the ontology, and on the vocabulary employed to represent or identify these concepts.

Hierarchy or taxonomy. An ontology generally contains a hierarchical is-a relation between concepts. Though various other relations between concepts may also be defined, the is-a relationship is often especially important and may be the focus of a particular evaluation effort.

Context or application level. An ontology may be part of a larger collection of ontologies and may reference, or be referenced by, definitions in other ontologies. It then becomes imperative to take this context into account when evaluating it.

Syntactic level. Evaluation at this level may be of particular interest for ontologies that have been for the most part constructed manually. The ontology is commonly described in a specific formal language and must match the syntactic requirements of that language [6].

Structure, architecture, design. This is, again, primarily important for manually constructed ontologies. It comes to the fore when we want the ontology to meet some predefined design principles or criteria. Structural concerns include the organization of the ontology as well as its suitability for further development [6].

Evaluation of Lexical/Vocabulary and Data Level
Similarity between two given strings is measured based on the Levenshtein edit distance, normalized to give scores in the range [0, 1]. A string matching measure between two sets of strings is then defined by considering each string in the first set, finding its similarity to the most similar string in the second set, and averaging this over all strings of the first set [1].
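The string matching measure just described can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the text only says the edit distance is normalized into [0, 1], so dividing by the length of the longer string is an assumption made here, and the function names are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance mapped into [0, 1]; 1.0 means identical strings.

    Assumption: normalize by the length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def string_matching(first: Iterable[str], second: Iterable[str]) -> float:
    """Average, over `first`, of the best similarity to any string in `second`."""
    first, second = list(first), list(second)
    if not first or not second:
        return 0.0
    return sum(max(similarity(s, t) for t in second) for s in first) / len(first)


if __name__ == "__main__":
    # Hypothetical concept labels compared against a hypothetical reference set.
    print(string_matching(["protein", "cel"], ["protein", "cell", "tissue"]))
```

Note that the measure is asymmetric: swapping the two sets generally gives a different score, because the average is taken over the first set only.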
One may then take the set of all strings used in the ontology and compare it to a “golden standard” set of strings that are considered a good representation of the concepts of the problem domain under consideration. The golden standard could in fact be another ontology.

The lexical content of an ontology can also be evaluated using the concepts of precision and recall, as known from information retrieval. Here, precision is the percentage of the ontology's lexical entries (i.e. strings used as concept identifiers) that also appear in the golden standard, relative to the total number of ontology lexical entries. Recall is the percentage of the golden standard's lexical entries that also appear as concept identifiers in the ontology, relative to the total number of golden standard lexical entries. A way to attain more tolerant matching criteria (allowing synonyms, etc.) is to augment each lexical entry with its hypernyms from WordNet or some similar resource [4]; then, instead of testing whether two lexical entries match, one can test for overlap between their corresponding sets of words [1].

Evaluation of Taxonomic and Other Semantic Relations
It has been suggested [4] to use a data-driven approach to determine the degree of structural fit between an ontology and a corpus of documents:
(1) Given a corpus of documents belonging to the domain of interest, a clustering algorithm based on expectation-maximization (EM) is used to determine, in an unsupervised way, a probabilistic mixture model of hidden “topics” such that each document can be modelled as having been generated by a mixture of topics.
(2) Each concept c of the ontology is characterised by a set of terms that includes its name in the ontology.
(3) The probabilistic models acquired during clustering can be used to assess, for every topic identified by the clustering algorithm, how well the concept c fits that topic.
(4) At this stage, if we require that each concept fits at least some topic reasonably well, we obtain a technique for lexical-level evaluation of the ontology.

A particular drawback of this technique as a way to evaluate relations, however, is that it does not readily take the directionality of relations into account. Evaluation can also be based on precision and recall measures, by comparing the ontology with a human-generated golden standard or with a list of statistically relevant terms. Unfortunately, preparing the golden standard requires a great deal of manual work.

Evaluation on Context-Level
In some cases the ontology is part of a larger collection of ontologies that may reference one another (e.g. one ontology may make use of a concept defined in another ontology). This context can be used to evaluate an ontology in several ways. For example, the Swoogle search engine [7] uses the cross-references among semantic-web documents to define a graph and then calculates a score for each ontology in a way similar to PageRank, which is used by the Google web search engine. According to [8], not all “links” or references among ontologies are treated the same. For example, if an ontology defines a subclass of a class from another ontology, this reference may be treated as more important than if the ontology only uses a class from another ontology as the domain or range of some relation. An ontology may also be enriched with metadata such as its design policy, how it is being employed by others, and “peer reviews” provided by its users [1].

Evaluation Based on Application
The outputs of an application, or its performance on a given task, may vary considerably depending in part on the ontology it uses. Thus one may argue that a good ontology is one which helps the application under consideration to produce good results on the given task.
Ontologies may thus be evaluated simply by embedding them into an application and analyzing the application's results. For example, [3] describes a case where the ontology, together with its relations (including is-a and others), is used mainly to ascertain how closely related the meanings of two concepts are. The task in question is a speech recognition problem, for which evaluating the final output is relatively straightforward. This approach to ontology evaluation also has several drawbacks: (1) it is easy to see that an ontology is good or bad when used in a particular way for a particular task, but it is difficult to generalize this observation; (2) the ontology may be only a small component of the application, and its effect on the outcome may be relatively small and indirect; (3) comparing several ontologies is only possible if they can all be plugged into the same application.

Data-Driven Evaluation
Another way to evaluate an ontology is to compare it with existing data (generally a collection of textual documents) about the problem domain to which the ontology refers. For example, [8] shows how to determine whether the ontology refers to a particular topic, and how to categorize the ontology into a known directory of topics: one extracts textual data from the ontology (such as the names of concepts and relations) and uses this as input to a text classification model [1]. In the case of extensive ontologies incorporating a large amount of factual information, the documents could also be used as a source of “facts” about the external world, and the evaluation checks whether these facts can also be obtained from the ontology.

Multiple Criteria Approaches
These approaches assess an ontology against several criteria. In effect, the general problem of ontology evaluation is thereby reduced to the question of how to evaluate the ontology with respect to each individual criterion. On the positive side, these approaches allow us to combine criteria from most of the levels discussed above.
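As one concrete per-criterion score, the lexical precision and recall defined earlier can be computed against a golden standard and then combined with other criteria. The sketch below is illustrative only: the text does not prescribe how criteria are combined, so the weighted average is an assumption, as are the case-insensitive matching, the function names, the weights, and the sample terms.

```python
from typing import Dict, Iterable, Tuple


def lexical_precision_recall(ontology_terms: Iterable[str],
                             golden_terms: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of ontology lexical entries against a golden standard.

    Assumption: entries match if they are equal after lowercasing.
    """
    ontology = {t.lower() for t in ontology_terms}
    golden = {t.lower() for t in golden_terms}
    shared = ontology & golden
    precision = len(shared) / len(ontology) if ontology else 0.0
    recall = len(shared) / len(golden) if golden else 0.0
    return precision, recall


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (an assumed combination rule)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # Hypothetical lexical entries and golden standard.
    precision, recall = lexical_precision_recall(
        ["Gene", "Protein", "Cell", "Tissue"],
        ["gene", "protein", "organ"],
    )
    # Hypothetical weights expressing how much each criterion matters.
    overall = weighted_score(
        {"precision": precision, "recall": recall},
        {"precision": 1.0, "recall": 2.0},
    )
    print(precision, recall, overall)
```

Scores from other levels (for example a context-level ranking or an application-based result) could be added to the `scores` dictionary in the same way, provided they are first scaled to a comparable range.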
ONTOLOGY EVALUATION AND COMPARISON
Having reviewed the literature, we compare the two ontologies mentioned earlier, i.e. fuzzy ontology and Gene Ontology, at each evaluation level.

LEXICAL/VOCABULARY AND CONCEPT/DATA LEVEL
Fuzzy ontology: As it is based on information retrieval, an approximation of the query is possible. Hence it is better suited to the lexical level.
Gene ontology: On the contrary, the Gene Ontology needs an exact term, or part of a term, in order to retrieve data. It is not suited to this level.

EVALUATION OF TAXONOMIC AND OTHER SEMANTIC RELATIONS
Fuzzy ontology: Here the query passed need not be precise. Hence the intervention of an expert to deduce the data is not required.
Gene ontology: Here the query passed has to be precise. Hence the intervention of an expert is needed in order to evaluate.

CONTEXT-LEVEL EVALUATION
Fuzzy ontology: Fuzzy ontology eases the process of information retrieval, but it uses a number of references or concepts to do so.
Gene ontology: On the contrary, the Gene Ontology classifies data into three domains only. Hence no extra relations or concepts are needed to classify a particular gene product.

APPLICATION-BASED EVALUATION
Fuzzy ontology: A fuzzy ontology is simply an ontology that uses fuzzy logic to provide a natural representation of imprecise and vague knowledge and to ease reasoning over it. It overcomes the limitation of classical ontology languages, which are not appropriate for dealing with imprecision or vagueness in knowledge.
Gene ontology: Many Gene Ontology (GO) terms have synonyms. GO uses 'synonym' in a loose sense, as the names within the synonyms field may not mean exactly the same as the term they are attached to.

DATA-DRIVEN EVALUATION
Fuzzy ontology: The data is not predefined in a fuzzy ontology. The query may change, and there is no way to evaluate based on past data.
Gene ontology: The data is predefined, and hence evaluation can be made on the basis of past data.

MULTIPLE-CRITERIA APPROACHES
Fuzzy ontology: Fuzzy ontology can be evaluated on various factors such as multiply located terms, query expansion, intermediate queries for grouping, storage required, and knowledge representation.
Gene ontology: Gene Ontology can be evaluated on factors such as whether the classification into one of the specified processes is appropriate, past data, results based on association rules, etc.

CONCLUSION
We have attempted to evaluate fuzzy ontology and Gene Ontology using different evaluation methods. A generalized conclusion was difficult to formulate, as both ontologies have their pros and cons at different levels. Ontology evaluation remains an important open problem in the area of ontology-supported computing and the semantic web. A single best or preferred approach to ontology evaluation has not yet been defined. In our opinion, future work in this area should focus particularly on defining a single approach for evaluating ontologies that are yet to be defined.

REFERENCES
[1] Mike Uschold, Michael G. 1996. Ontologies: Principles, Methods and Applications. Knowledge Engineering Review, Vol. 11, No. 2.
[2] Maedche, A., Staab, S. 2002. Measuring similarity between ontologies. Proc. CIKM. LNAI vol. 2473.
[3] Porzel, R., Malaka, R. A task-based approach for ontology evaluation. ECAI 2004 Workshop Ont. Learning and Population.
[4] Brewster, C. et al. 2004. Data driven ontology evaluation. Proceedings of Int. Conf. Language Resources and Evaluation, Lisbon.
[5] Lozano-Tello, A., Gómez-Pérez, A. 2004. Ontometric: A method to choose the appropriate ontology. J. Datab. Mgmt., 15(2):1–18.
[6] Gómez-Pérez, A. 1996. Towards a framework to verify knowledge sharing technology. Expert Systems with Applications, 11(4):519–529.
[7] Ding, L., et al. 2004. Swoogle: A search and metadata engine for the semantic web. Proc. CIKM, pp. 652–659.
[8] Patel, C., et al. 2004. OntoKhoj: a semantic web portal for ontology searching, ranking and classification. ACM Web Inf. & Data Mgmt.