Evaluating Fuzzy and Gene Ontology on Different Ontology Levels

Parth Mehta
Dept. of Computer Engineering
Dwarkadas J. Sanghvi COE
Mumbai, India
parthpmehta93@gmail.com

Umang Gala
Dept. of Computer Engineering
Dwarkadas J. Sanghvi COE
Mumbai, India
umanggala9@gmail.com
ABSTRACT
An ontology is an explicit formal conceptualization of a
particular domain of interest. It necessarily entails or
embodies some sort of world view with respect to that
domain. Ontologies are increasingly being used in
several fields such as knowledge management,
communication between organizations, systems
engineering, information extraction, and the semantic
web. Ontology evaluation is the task of assessing a
given ontology against a specific criterion of
application, generally to determine which of several
ontologies would best suit a particular purpose.
In this paper we review different levels of
evaluation vis-à-vis different ontology approaches.
Keywords
Fuzzy ontology, Gene ontology, Ontology Evaluation,
Semantic web
INTRODUCTION
'Ontology' is the term used to refer to the shared
understanding of some domain of interest, which may be
used as a unifying framework to solve problems [1].
An ontology is usually employed as a structure to capture
knowledge about a specific area by providing relevant
concepts and the relations between them. Ontologies are
agreements about shared conceptualizations [1]. These
shared conceptualizations comprise conceptual
frameworks for modelling domain knowledge,
content-specific protocols for communication among
inter-operating agents, and agreements regarding the
representation of particular domain theories [1]. In
the context of knowledge sharing, ontologies are
described in the form of definitions of representational
vocabulary.
Simply put, ontologies are a fundamental data
structure for conceptualizing knowledge. However, we
can generally construct several different ontologies
that conceptualize the same body of knowledge, and we
should therefore be able to determine which of them
best suits a predefined criterion. Thus, ontology
evaluation is a crucial issue that must be addressed if
ontologies are to be widely adopted in the semantic web
and in other semantics-aware applications. Further,
people constructing an ontology also need some mechanism
to evaluate the result, possibly to guide the
construction process and any refinement steps.
Moreover, automated or semi-automated ontology
learning techniques require efficient evaluation
measures, which can be used to select the “best matching”
ontology out of many plausible candidates and to
choose values of the tunable parameters of the learning
algorithm.
Fuzzy Ontology
The Description Logic variant of the Web Ontology
Language (OWL DL) becomes less suitable in domains in
which the concepts to be represented do not have precise
definitions. Unfortunately, this scenario is likely to be
the rule rather than the exception. Classical ontology
languages are not appropriate for dealing with
imprecision or vagueness in knowledge, and fuzzy
ontologies offer a solution to this problem. In the
setting considered here, the information comes from
distinct sources and the databases reside on distinct
servers, each of which can be provided with its own
search engine, i.e. information retrieval system.
A fuzzy ontology helps improve information search and
retrieval, as it simplifies the theories used in most
scientific fields. A fuzzy ontology is simply an ontology
that uses fuzzy logic to provide a natural
representation of imprecise and vague knowledge and
to ease reasoning over it.
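To make the idea of a vague concept concrete, the following is a minimal sketch of graded membership, assuming a simple linear ramp as the membership function and the common min/max definitions of fuzzy conjunction and disjunction. The function names and the example thresholds are illustrative assumptions and are not taken from any particular fuzzy ontology language.

```python
def ramp_membership(value: float, low: float, high: float) -> float:
    """Degree in [0, 1]: 0 at or below `low`, 1 at or above `high`, linear in between.

    The ramp shape is an assumption; other membership functions are possible.
    """
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_and(*degrees: float) -> float:
    """Conjunction of membership degrees using the minimum."""
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    """Disjunction of membership degrees using the maximum."""
    return max(degrees)


# Hypothetical vague concept "HotDay": not hot at 20 degrees, fully hot at 40.
hot = ramp_membership(30.0, low=20.0, high=40.0)
```

A crisp ontology would have to classify a 30-degree day as either hot or not hot, whereas here it belongs to the hypothetical concept to degree 0.5.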
Gene Ontology
The Gene Ontology (GO) provides an ontology of defined
terms representing the properties of gene products (a
gene product is a biochemical product, either a protein
or an RNA). The ontology covers three domains:
1) cellular component: the parts of a cell or its
extracellular environment;
2) molecular function: the elemental activities of a gene
product at the molecular level, such as binding or
catalysis;
3) biological process: operations or sets of molecular
events with a defined beginning and end, pertinent to
the functioning of integrated living units: cells, tissues,
organs, and organisms.
Each term is assigned to one of these three
ontologies. Many GO terms have synonyms
or relations to terms in any of the three ontologies.
Genome annotation is the process of capturing data about
a gene product and then classifying it into one of the
domains mentioned above.
LITERATURE REVIEW
Categorization of Ontology Evaluation Approaches
In a broad sense, most evaluation approaches belong to
one of the following categories [1]:
• those that are based on comparing the ontology to a
“golden standard”, which may itself be an ontology
(e.g. [2]);
• those that are based on using the ontology in a
particular application and evaluating the results (e.g.
[3]);
• those that involve comparisons with a source of data
about the domain to be covered by the ontology
(e.g. [4]);
• those where evaluation is performed by humans who
try to assess the degree to which the ontology meets
a set of predefined criteria, standards, requirements,
etc. (e.g. [5]).
Ontologies are complex structures, and it is
often more feasible to evaluate the different levels of
an ontology separately than to attempt to evaluate the
ontology directly as a whole.
This is especially true if we wish to have a
predominantly automated evaluation. Though the
individual levels have been discussed and defined in
detail by several different authors, these definitions tend
to be broadly similar and involve the following levels
[1]:
Lexical, vocabulary, or data layer. Here the focus
is on which concepts, instances, facts, etc. have
been covered in the ontology, and on the vocabulary
employed to represent or identify these concepts.
Hierarchy or taxonomy. An ontology generally
includes a hierarchical is-a relation between
concepts. Though various other relations between the
concepts may also be present and defined, the is-a
relationship is often especially important and may be
the focus of a particular evaluation effort.
Context or application level. An ontology may in turn
be part of a larger collection of ontologies and may
reference or be referenced by multiple definitions in
other ontologies. Thus it becomes important to take
this context into account when evaluating it.
Syntactic level. Evaluation at this level may be of
particular interest for ontologies that have been, for the
most part, constructed manually. The ontology is
commonly described in a specific formal language and
must match the syntactic requirements of that language
[6].
Structure, architecture, design. This is, again, primarily
of interest for manually constructed ontologies. It comes
to the fore when we want the ontology to meet some
pre-defined design principles or criteria. Structural
concerns include the organization of the ontology as
well as its suitability for further development [6].
Evaluation of Lexical/Vocabulary and Data Level
The similarity between two given strings is measured
using the Levenshtein edit distance, normalized to give
scores in the range [0, 1]. A string matching
measure between two sets of strings is then defined by
considering each string in the first set, finding its
similarity to the most similar string in the second set,
and averaging this over all strings of the first set [1].
One may then take the set of all strings used as concept
identifiers in the ontology and compare it to a “golden
standard” set of strings that are considered a good
representation of the concepts belonging to the problem
domain under consideration. The golden standard may in
fact be another ontology.
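The string matching measure described above can be sketched as follows. This is a minimal illustration rather than the exact formulation used in [2]: in particular, normalizing the edit distance by the length of the longer string is our assumption, and the function names are hypothetical.

```python
from typing import Collection


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def string_matching(first: Collection[str], second: Collection[str]) -> float:
    """Average, over `first`, of each string's similarity to its best match in `second`."""
    if not first or not second:
        return 0.0
    best = [max(string_similarity(s, t) for t in second) for s in first]
    return sum(best) / len(best)
```

Note that the measure is asymmetric: comparing the ontology's strings to the golden standard generally gives a different value than comparing the golden standard to the ontology.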
The lexical content of an ontology can also be
evaluated using the concepts of precision and recall,
as known from information retrieval. Here, precision
is the percentage of the ontology's lexical entries
(i.e. strings used as concept identifiers) that also
appear in the golden standard, relative to the total
number of lexical entries in the ontology. Recall, on the
other hand, is the percentage of the golden standard's
lexical entries that also appear as concept identifiers
in the ontology, relative to the total number of golden
standard lexical entries. A more tolerant matching
criterion, which allows for synonyms etc., can be obtained
by augmenting each lexical entry with its hypernyms from
WordNet or some similar resource [4]; then, instead
of testing whether two lexical entries match, one tests
for overlap between their corresponding sets of words [1].
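As a rough sketch of the two ideas in this paragraph, the code below computes lexical precision and recall over sets of concept identifiers, and tests two entries for overlap after expansion. The lowercasing step and the `expansions` dictionary are our assumptions; the dictionary is a hand-written stand-in for a resource such as WordNet, and no real WordNet API is used.

```python
from typing import Dict, Iterable, Set, Tuple


def lexical_precision_recall(ontology_terms: Iterable[str],
                             gold_terms: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) of the ontology's entries against the golden standard."""
    ontology = {t.strip().lower() for t in ontology_terms}
    gold = {t.strip().lower() for t in gold_terms}
    shared = ontology & gold
    precision = len(shared) / len(ontology) if ontology else 0.0
    recall = len(shared) / len(gold) if gold else 0.0
    return precision, recall


def entries_match(a: str, b: str, expansions: Dict[str, Set[str]]) -> bool:
    """Relaxed match: the entries match if their expanded word sets overlap.

    `expansions` maps an entry to extra words (e.g. hypernyms); it is a
    hypothetical stand-in for a lexical resource.
    """
    words_a = {a} | expansions.get(a, set())
    words_b = {b} | expansions.get(b, set())
    return bool(words_a & words_b)


precision, recall = lexical_precision_recall(
    ["Hotel", "Room", "Spa"],
    ["hotel", "room", "guest", "booking"],
)
```

In this toy example two of the three ontology entries appear in the golden standard, and two of the four golden standard entries appear in the ontology.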
Evaluation of Taxonomic and Other Semantic Relations
It has been suggested [4] that a data-driven approach
can be used to determine the degree of structural fit
between an ontology and a corpus of documents. (1) Given
a corpus of documents belonging to the domain of
interest, a clustering algorithm based on expectation
maximization (EM) is used to determine, in an
unsupervised way, a probabilistic mixture model of hidden
“topics” such that each document can be modelled as
having been generated by a mixture of topics. (2) Each
concept c of the ontology is characterised by a set of
terms that includes its name in the ontology. (3) The
probabilistic models acquired during clustering can be
used to assess, for every topic identified by the
clustering algorithm, how well the concept c fits that
topic. (4) At this stage, if we require that each concept
fits at least some topic reasonably well, we obtain a
technique for lexical-level evaluation of the ontology.
However, a particular drawback of this technique as a way
to evaluate relations is that it is difficult to take the
directionality of relations into account.
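Steps (2) to (4) can be illustrated with a small sketch. It assumes that step (1) has already been carried out and that each topic is available as a dictionary of term probabilities; the EM clustering itself is not implemented here. Scoring a concept by the probability mass a topic assigns to its terms is one possible choice of fit measure, not necessarily the one used in [4], and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def concept_topic_fit(concept_terms: Set[str], topic: Dict[str, float]) -> float:
    """Probability mass that the topic assigns to the concept's terms (assumed fit score)."""
    return sum(topic.get(term, 0.0) for term in concept_terms)


def best_fits(concepts: Dict[str, Set[str]],
              topics: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[str, float]]:
    """For each concept, return the best-fitting topic and its score."""
    result = {}
    for name, terms in concepts.items():
        scores = {label: concept_topic_fit(terms, dist) for label, dist in topics.items()}
        best = max(scores, key=scores.get)
        result[name] = (best, scores[best])
    return result


def poorly_covered(concepts: Dict[str, Set[str]],
                   topics: Dict[str, Dict[str, float]],
                   threshold: float) -> List[str]:
    """Concepts whose best fit stays below `threshold`, i.e. that match no topic well."""
    fits = best_fits(concepts, topics)
    return sorted(name for name, (_, score) in fits.items() if score < threshold)


# Toy topic models and concepts, invented purely for illustration.
topics = {
    "lodging": {"hotel": 0.4, "room": 0.3, "guest": 0.3},
    "transport": {"train": 0.5, "ticket": 0.5},
}
concepts = {
    "Hotel": {"hotel", "room"},
    "Ticket": {"ticket"},
    "Volcano": {"volcano", "lava"},
}
unmatched = poorly_covered(concepts, topics, threshold=0.3)
```

In this toy setting the concept that fits no topic is flagged, which corresponds to the lexical-level reading of step (4).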
The ontology can also be evaluated in terms of precision
and recall, by comparing it with a human-generated
golden standard or with a list of statistically relevant
terms. Unfortunately, preparing the golden standard
requires a great deal of manual work.
Evaluation on Context Level
In some cases the ontology is part of a larger
collection of ontologies that may reference one
another (i.e. one ontology may make use of a concept
defined in another ontology). This context can be used
to evaluate an ontology in several ways. For
example, the Swoogle search engine [7] makes use
of the cross-references among semantic-web
documents to define a graph and then calculates
a score for each ontology in a way similar to PageRank,
which is used by the Google web search engine.
According to [8], not all “links” or references among
ontologies are treated the same. For example, if a
particular ontology defines a subclass of a class from
another ontology, this reference may be treated as
more important than if the ontology only uses a class
from another ontology as the domain or range of
some relation. An ontology may also be enriched with
metadata such as its design policy and how it is being
used by others, along with “peer reviews”
provided by users of the ontology [1].
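The following sketch shows a PageRank-style score over a graph of cross-references in which different kinds of reference carry different weights. It is a generic weighted PageRank iteration written for illustration, not the actual algorithm of Swoogle [7] or OntoKhoj [8]; the weights, the damping factor, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: a subclass reference counts more than a domain/range use.
SUBCLASS_WEIGHT = 2.0
PROPERTY_USE_WEIGHT = 1.0


def rank_ontologies(references: List[Tuple[str, str, float]],
                    damping: float = 0.85,
                    iterations: int = 50) -> Dict[str, float]:
    """Score ontologies from weighted (source, target, weight) references."""
    nodes = sorted({n for source, target, _ in references for n in (source, target)})
    out_weight = {n: 0.0 for n in nodes}
    for source, _, weight in references:
        out_weight[source] += weight
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Ontologies that reference nothing spread their score evenly over all nodes.
        dangling = sum(score[n] for n in nodes if out_weight[n] == 0.0)
        updated = {n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
                   for n in nodes}
        for source, target, weight in references:
            updated[target] += damping * score[source] * weight / out_weight[source]
        score = updated
    return score


# Toy graph: A and B both subclass a class from C; A also uses a class from B.
scores = rank_ontologies([
    ("A", "C", SUBCLASS_WEIGHT),
    ("B", "C", SUBCLASS_WEIGHT),
    ("A", "B", PROPERTY_USE_WEIGHT),
])
```

In the toy graph the ontology that is subclassed by both others receives the highest score, and the scores sum to one.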
Evaluation Based on Application
The outputs of an application, or its performance on a
given task, may vary considerably depending in part on
the ontology used. Thus one may argue that a good
ontology is one that helps the application under
consideration to produce good results on the given
task. Ontologies may therefore be evaluated simply by
embedding them in an application and then
analyzing the results of the application. For example,
[3] describes a case where the ontology, together with
its relations (is-a and others), is used mainly to
determine how closely related the meanings of two
concepts are. The task in question is a speech
recognition problem, for which evaluating the final
output is relatively straightforward.
This approach to ontology evaluation has several
drawbacks: (1) one can observe that an ontology is good
or bad when used in a particular way for a particular
task, but it is difficult to generalize this
observation; (2) the ontology may be only a small
component of the application, and its effect on the
outcome may be relatively small and indirect; (3)
several ontologies can be compared only if they can all
be plugged into the same application.
Data-Driven Evaluation
Another way to evaluate an ontology is to
compare it to existing data (generally a collection of
textual documents) about the problem domain
that the ontology covers. For example, [8] shows how to
determine whether the ontology refers to a particular
topic, and how to classify the ontology into a known
directory of topics: one extracts textual data from the
ontology (such as names of concepts and relations) and
then uses this as input to a text classification model
[1]. In the case of extensive ontologies that contain a
large amount of factual information, the documents could
also be used as a source of “facts” about the external
world, and the evaluation then checks whether these
facts can also be obtained from the ontology.
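The classification step can be sketched as follows: names of concepts and relations are split into words, and the ontology is assigned to the topic whose keyword list they overlap most. The keyword-overlap scorer is a deliberately simple stand-in for the text classification model mentioned above, and the topic directory, names, and functions are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def ontology_tokens(names: Iterable[str]) -> List[str]:
    """Split identifiers such as 'hasRoomType' into lowercase words."""
    tokens = []
    for name in names:
        tokens.extend(word.lower() for word in re.findall(r"[A-Za-z][a-z]*", name))
    return tokens


def classify_ontology(names: Iterable[str],
                      topic_keywords: Dict[str, Set[str]]) -> Tuple[str, Dict[str, int]]:
    """Return the best-matching topic and the per-topic overlap counts."""
    tokens = ontology_tokens(names)
    scores = {topic: sum(1 for token in tokens if token in keywords)
              for topic, keywords in topic_keywords.items()}
    return max(scores, key=scores.get), scores


# Toy topic directory, invented for illustration.
directory = {
    "travel": {"hotel", "room", "booking", "guest"},
    "biology": {"gene", "protein", "cell"},
}
topic, counts = classify_ontology(["Hotel", "hasRoomType", "GuestBooking"], directory)
```

A real system would replace the overlap count with a trained classifier, but the flow from ontology text to topic label is the same.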
Multiple Criteria Approaches
These approaches select several evaluation criteria,
score the ontology on each of them, and then combine the
scores. In effect, the general problem of ontology
evaluation has been pushed back to the question of how to
evaluate the ontology with respect to the individual
evaluation criteria. On the positive side, these
approaches allow us to combine criteria from most of the
levels discussed above.
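One simple way to combine criteria from several levels is a weighted average of per-criterion scores. The sketch below assumes that each criterion has already been scored in [0, 1] by some other method; the criterion names and weights are invented for illustration.

```python
from typing import Dict


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores; every weighted criterion needs a score."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"no score for criteria: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical scores from two levels, with the taxonomy weighted three times as much.
overall = combined_score(
    {"lexical": 0.8, "taxonomy": 0.6},
    {"lexical": 1.0, "taxonomy": 3.0},
)
```

The choice of weights encodes which levels matter most for the intended use, so two users may rank the same ontologies differently.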
ONTOLOGY EVALUATION AND COMPARISON
Based on the literature reviewed above, we compare the
two ontologies introduced earlier, i.e. fuzzy ontology
and gene ontology, at each evaluation level.
LEXICAL/VOCABULARY AND CONCEPT/DATA LEVEL
Fuzzy ontology: As it is based on information retrieval,
an approximate match for the query is possible. Hence it
is better suited to lexical-level evaluation.
Gene ontology: In contrast, the gene ontology needs an
exact term, or a part of a term, in order to retrieve
data. It is not suited to this level.

EVALUATION OF TAXONOMIC AND OTHER SEMANTIC RELATIONS
Fuzzy ontology: The query passed need not be precise.
Hence the intervention of an expert to deduce the data
is not required.
Gene ontology: The query passed has to be precise. Hence
the intervention of an expert is needed in order to
evaluate.

CONTEXT-LEVEL EVALUATION
Fuzzy ontology: A fuzzy ontology eases the process of
information retrieval, but it uses a number of
references or concepts to do so.
Gene ontology: In contrast, the gene ontology classifies
data into three domains only. Hence no extra relations
or concepts are needed to classify a particular gene
product.

APPLICATION-BASED EVALUATION
Fuzzy ontology: A fuzzy ontology is simply an ontology
that uses fuzzy logic to provide a natural
representation of imprecise and vague knowledge and to
ease reasoning over it. It thereby overcomes the
limitation of classical ontology languages, which are
not appropriate for dealing with imprecision or
vagueness in knowledge.
Gene ontology: Many Gene Ontology (GO) terms have
synonyms; GO uses 'synonym' in a loose sense, as the
names within the synonyms field may not mean exactly
the same as the term they are attached to.

DATA-DRIVEN EVALUATION
Fuzzy ontology: The data is not predefined in a fuzzy
ontology. The query may change, and there is no way to
evaluate based on past data.
Gene ontology: The data is predefined, and hence
evaluation can be based on past data.

MULTIPLE-CRITERIA APPROACHES
Fuzzy ontology: A fuzzy ontology can be evaluated on
various factors such as multiply-located terms, query
expansion, intermediate queries for grouping, storage
required, and knowledge representation.
Gene ontology: The gene ontology can be evaluated on
factors such as whether the classification into one of
the specified processes is appropriate, past data,
results based on association rules, etc.
CONCLUSION
We have attempted to evaluate fuzzy ontology and
gene ontology using different evaluation methods. A
general conclusion was difficult to formulate, as
both ontologies have their pros and cons at
different levels. Ontology evaluation remains an
important open problem in the area of
ontology-supported computing and the semantic web. A
single best or preferred approach to ontology
evaluation has not yet been defined. In our opinion,
future work in this area should focus particularly on
defining a single approach that can be used to
evaluate ontologies defined in the future.
REFERENCES
[1] Uschold, M., Gruninger, M. 1996. Ontologies:
Principles, methods and applications. Knowledge
Engineering Review, 11(2).
[2] Maedche, A., Staab, S. 2002. Measuring
similarity between ontologies. Proc. CIKM. LNAI
vol. 2473.
[3] Porzel, R., Malaka, R. 2004. A task-based approach
for ontology evaluation. ECAI 2004 Workshop Ont.
Learning and Population.
[4] Brewster, C. et al. 2004. Data driven ontology
evaluation. Proceedings of Int. Conf. on
Language Resources and Evaluation, Lisbon.
[5] Lozano-Tello, A., Gómez-Pérez, A. 2004.
Ontometric: A method to choose the appropriate
ontology. J. Datab. Mgmt., 15(2):1–18.
[6] Gómez-Pérez, A. 1996. Towards a framework
to verify knowledge sharing technology.
Expert Systems with Applications, 11(4):519–529.
[7] Ding, L., et al. 2004. Swoogle: A search and
metadata engine for the semantic web. Proc.
CIKM, pp. 652–659.
[8] Patel, C., et al. 2004. OntoKhoj: A semantic
web portal for ontology searching, ranking and
classification. ACM Web Inf. & Data Mgmt.