1 - NAZOU

A ConCom – Concept Comparer
A.1 Basic Information
Many approaches acquire user characteristics in order to populate a user model or keep it
up to date, and in this way provide a basis for successful personalization of the visible
aspects of adaptive web-based applications.
Such information can be acquired directly from the user (e.g., the user is asked
a question or fills in a form), by observing the user's behavior while working with the
application, by analyzing logs on the web server, or by analyzing the presented content.
We focus on the analysis of the presented content, in particular on evaluating
similarity in order to find common or different aspects of the content.
A.1.1 Basic Terms
Ontological concept: A set of properties that are connected to related concepts. Concepts
can be ordered in a hierarchy.
Instance: Reflects an object from the real world.
Datatype property: Expresses a relation between a concept instance and an RDF literal or
XML Schema datatype value.
Object property: Expresses a relation between two instances.
A.1.2 Method Description
The main idea of the method used in the ConCom tool is the evaluation of
common property pairs present in both instances. Every instance of a concept can
consist of object and datatype properties, which need to be treated differently. When
a datatype property is evaluated, the method ends after applying a metric intended for
comparing strings. Object properties are processed recursively using the respective
metrics until literals are reached or until there are no properties left.
The total similarity is accumulated step by step as the different evaluation
metrics are applied, i.e. each metric contributes to the total similarity with its own partial
similarity computed for the respective instances (literals).
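The datatype-property step can be illustrated with a plain string metric. The following is a minimal sketch of a normalized Levenshtein similarity; ConCom itself delegates string comparison to SimMetrics, so the class and method names here (StringSimilaritySketch, levenshteinSimilarity) and the normalization by the longer string are assumptions made for illustration only.

```java
final class StringSimilaritySketch {

    /** Classic dynamic-programming edit distance, keeping only two rows. */
    static int levenshteinDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Maps the distance to [0, 1]; 1.0 means identical literals (assumed normalization). */
    static double levenshteinSimilarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0; // two empty literals are treated as identical
        }
        return 1.0 - (double) levenshteinDistance(a, b) / longer;
    }
}
```

A value computed this way would be one partial similarity contributing to the total.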
Knowing the user's rating of the displayed concepts, we can use the similarity measure computed
for each property to investigate common and different properties of the compared
instances and to find useful information about the user's interests for
personalization purposes.
A.1.3 Scenarios of Use
ConCom can be used in the following scenarios:
• The input is two instances of ontological concepts and the result is their similarity.
• The input is one instance and the result is a set of the instances most similar to the
given one.
ConCom should not be used in the following case:
• Instances of ontological concepts that do not belong to the same ontology.
A.1.4 External Links and Publications
ANDREJKO, A., BIELIKOVÁ, M.: Investigating Similarity of Ontology Instances and Its
Causes. In V. Kurkova, R. Neruda, J. Koutnik (eds.): Artificial Neural Networks
– ICANN 2008, Prague, Czech Republic: LNCS. Springer, 2008. (to appear)
ANDREJKO, A., BIELIKOVÁ, M.: Estimating similarity of the ontological concepts
instances for the adaptive applications based on Semantic Web. [in Slovak] In:
Václav Snášel (ed.): Znalosti 2008: Proceedings of the 7th annual conference,
Bratislava, February 13-15, 2008, pp. 30-41.
ANDREJKO, A., BIELIKOVÁ, M.: Estimating similarity of the ontological concepts
instances for personalization purposes. [in Slovak] In: František Babič, Ján
Paralič (eds.): 2nd Workshop on Intelligent and Knowledge Oriented
Technologies, WIKT 2007 Proceedings, Košice, November 15-16, 2007, pp. 46-49.
ANDREJKO, A., BIELIKOVÁ, M.: Comparing Instances of the Ontological Concepts. In:
Tools for Acquisition, Organisation and Presenting of Information and
Knowledge (2): Research Project Workshop Horský hotel Poľana, Slovakia
September 22-23, 2007, pp. 26-35
ANDREJKO, A., BARLA, M., TVAROŽEK, M.: Comparing Ontological Concepts to
Evaluate Similarity. In: Tools for Acquisition, Organisation and Presenting of
Information and Knowledge : Research Project Workshop Bystrá dolina, Nízke
Tatry, Slovakia, September 29-30, 2006, pp. 71-78
LOG4J. Java-based logging utility, Apache Software Foundation.
(http://logging.apache.org/log4j)
SIMMETRICS. Open source Similarity Measure Library.
(http://sourceforge.net/projects/simmetrics/)
A.2 Integration Manual
ConCom is developed in Java (Standard Edition 6) and is distributed as a jar archive.
Its functionality is accessed from the command line using the call:
java -jar concom.jar [common-options] URI1 URI2
where:
URI1, URI2 – unique identifiers of the instances to be compared.
and [common-options]:
-help – shows help
-server <url> – ontology server
-ontology <name> – ontology name
-username <username> – username used for repository connection
-password <password> – password used for repository connection
-use-uncommon <true|false> – whether to use 'uncommon predicates' for data nodes, default is false
-strong <filename> – filename of file containing URIs of 'strongly' filtered predicates
-weak <filename> – filename of file containing URIs of 'weakly' filtered predicates
-metric-data <M|L|D> – strings comparison metric used for data nodes, default is D
-metric-labels <M|L|D> – strings comparison metric used for labels, default is D
where [metric]:
• M – Monge-Elkan
• L – Levenshtein
• D – Dummy (internal)
ConCom is not a stand-alone application; it is intended to be included in another
application or tool, which calls its interface methods.
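When a host application prefers to go through the command-line call rather than the interface methods, the argument list can be assembled programmatically. The sketch below only uses options listed above; the class name ConComCommandSketch, the method buildCommand and its parameter selection are hypothetical and not part of ConCom.

```java
import java.util.ArrayList;
import java.util.List;

final class ConComCommandSketch {

    /** Builds the argument list for the documented call (hypothetical helper). */
    static List<String> buildCommand(String jarPath, String serverUrl, String ontology,
                                     char dataMetric, String uri1, String uri2) {
        // Only the three documented metric codes are accepted.
        if ("MLD".indexOf(dataMetric) < 0) {
            throw new IllegalArgumentException("metric must be M, L or D");
        }
        List<String> command = new ArrayList<String>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.add("-server");
        command.add(serverUrl);
        command.add("-ontology");
        command.add(ontology);
        command.add("-metric-data");
        command.add(String.valueOf(dataMetric));
        // The two instance identifiers come last, after the common options.
        command.add(uri1);
        command.add(uri2);
        return command;
    }
}
```

The resulting list can be passed to new ProcessBuilder(command).inheritIO().start(). How ConCom prints its result is not described here, so parsing the output is left out.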
A.2.1 Dependencies
ConCom uses:
• Log4J logging utility,
• SimMetrics open source Similarity Measure Library.
A.2.2 Installation
Deploying ConCom into another application requires three jar archives to be
included in the existing project: the archives containing ConCom, Log4J and
SimMetrics.
A.2.3 Configuration
ConCom takes its configuration from the command line as described above; alternatively,
configuration parameters can be set in the configuration file.
A.2.4 Integration Guide
ConCom computes a similarity measure for two instances of ontological concepts given
on the command line. The result is a similarity measure computed using the respective
similarity metrics. Furthermore, ConCom provides an interface for searching for
the instances most similar to a given one.
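The search for the most similar instances can be pictured as ranking candidates by their pairwise similarity to the given instance. The following sketch assumes instances are identified by URI strings and that some pairwise similarity function is available; the names MostSimilarSketch, Similarity and mostSimilar are hypothetical and do not describe ConCom's actual interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MostSimilarSketch {

    /** Any pairwise similarity in [0, 1]; stands in for the tool's measure. */
    interface Similarity {
        double between(String uriA, String uriB);
    }

    /** Returns up to k candidate URIs, ordered from most to least similar to the target. */
    static List<String> mostSimilar(String target, List<String> candidates, Similarity sim, int k) {
        List<Scored> scored = new ArrayList<>();
        for (String candidate : candidates) {
            if (!candidate.equals(target)) { // never report the target itself
                scored.add(new Scored(candidate, sim.between(target, candidate)));
            }
        }
        // Stable sort: candidates with equal scores keep their input order.
        scored.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            result.add(scored.get(i).uri);
        }
        return result;
    }

    private static final class Scored {
        final String uri;
        final double score;

        Scored(String uri, double score) {
            this.uri = uri;
            this.score = score;
        }
    }
}
```

Scoring every candidate is linear in the number of candidates, each with one full instance comparison, so caching repository data matters for larger sets.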
A.3 Development Manual
A.3.1 Tool Structure
ConCom consists of the following packages:
• Provides a set of applications that compute various similarities
(sk.fiit.nazou.concom.applications);
• Provides classes and interfaces for handling instances of ontological concepts
(sk.fiit.nazou.concom.concept);
• Provides classes and interfaces to compute a similarity measure between two
instances (sk.fiit.nazou.concom.similarity).
A.3.2 Method Implementation
To evaluate the similarity measure between instances, we proposed a method based on
recursive evaluation of the properties that the compared instances consist of. The rough
principle of the method, illustrated on the comparison of two instances instanceA and
instanceB, is as follows.
function getSimilarity(instanceA, instanceB)
    set similarity to 0.0
    set counter to 0
    store properties of instanceA and instanceB to properties
    foreach property in properties do
        increment counter
        if property is in both instances then
            store connected elements to elementX and elementY
            add computeSimilarity(property, elementX, elementY) to similarity
        else
            add 0.0 to similarity
        end if
    end foreach
    return similarity/counter
end function
function computeSimilarity(property, elementX, elementY)
    if property is datatype then
        return getDatatypeSimilarity(elementX, elementY)
    else
        set similarity to 0.0
        add getObjectSimilarity(elementX, elementY) to similarity
        add getSimilarity(elementX, elementY) to similarity
        return similarity/2
    end if
end function
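The pseudocode above can be sketched in Java on a simplified in-memory model. Everything in this sketch is an assumption for illustration: the Instance class, single-valued properties, exact string equality as the datatype metric, and concept equality as a stand-in for an object metric such as taxonomy distance. It also assumes an acyclic instance graph and returns 0.0 when neither instance has any property, a case the pseudocode leaves undefined.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RecursiveSimilaritySketch {

    /** Hypothetical in-memory instance: a concept name plus single-valued properties. */
    static final class Instance {
        final String concept;
        // A value is either a String (literal) or an Instance (object property target).
        final Map<String, Object> properties = new HashMap<String, Object>();

        Instance(String concept) {
            this.concept = concept;
        }

        Instance with(String property, Object value) {
            properties.put(property, value);
            return this;
        }
    }

    static double getSimilarity(Instance a, Instance b) {
        // Union of the properties of both instances.
        Set<String> all = new TreeSet<String>(a.properties.keySet());
        all.addAll(b.properties.keySet());
        if (all.isEmpty()) {
            return 0.0; // assumption: nothing to compare means no evidence of similarity
        }
        double similarity = 0.0;
        for (String property : all) {
            if (a.properties.containsKey(property) && b.properties.containsKey(property)) {
                similarity += computeSimilarity(a.properties.get(property), b.properties.get(property));
            }
            // A property present in one instance only contributes 0.0.
        }
        return similarity / all.size();
    }

    static double computeSimilarity(Object x, Object y) {
        if (x instanceof String && y instanceof String) {
            return x.equals(y) ? 1.0 : 0.0; // placeholder datatype metric
        }
        if (x instanceof Instance && y instanceof Instance) {
            Instance ix = (Instance) x;
            Instance iy = (Instance) y;
            double objectSimilarity = ix.concept.equals(iy.concept) ? 1.0 : 0.0; // placeholder object metric
            return (objectSimilarity + getSimilarity(ix, iy)) / 2;
        }
        return 0.0; // a literal compared with an instance
    }
}
```

In this model, two job offers that share a title and an employer but differ in city would score 2/3, since one of the three properties contributes zero.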
When comparing two instances, properties can appear in different cardinalities:
• single in both instances,
• multiple in both instances,
• single/multiple in one instance only.
When the property has a single occurrence in both instances then the similarity of
related elements (instances in the case of object properties or literals in the case of
datatype properties) is evaluated using different similarity metrics. The comparison of
datatype properties ends after a metric is used to compute the similarity measure
between the related literals. For object properties a metric for related instances is
computed (e.g., taxonomy distance) and further comparison is performed recursively on
the respective instances until literals are reached or until there are no properties left.
When an instance is being traversed recursively, an inverse property can connect it to an
already traversed instance. If inverse and symmetric properties were not taken into account,
the algorithm would follow them and enter an infinite loop. Therefore, we filter out the inverse
and symmetric properties of the examined property. However, loops can still occur, for
example, if two different properties lead to the same instance. In such cases, the already
traversed instances are omitted and further traversing stops.
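Both safeguards, filtering listed properties and skipping already traversed instances, can be sketched with a visited set. The graph representation (node URI mapped to property URI mapped to target URI) and the names CycleSafeTraversalSketch and traverse are assumptions for illustration, not ConCom's data structures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleSafeTraversalSketch {

    /** Returns the nodes reached from start, in visiting order, without looping. */
    static List<String> traverse(String start, Map<String, Map<String, String>> graph,
                                 Set<String> filteredProperties) {
        List<String> order = new ArrayList<>();
        visit(start, graph, filteredProperties, new HashSet<String>(), order);
        return order;
    }

    private static void visit(String node, Map<String, Map<String, String>> graph,
                              Set<String> filteredProperties, Set<String> visited, List<String> order) {
        if (!visited.add(node)) {
            return; // already traversed: omit it and stop descending here
        }
        order.add(node);
        Map<String, String> edges = graph.get(node);
        if (edges == null) {
            return; // no outgoing properties
        }
        for (Map.Entry<String, String> edge : edges.entrySet()) {
            if (filteredProperties.contains(edge.getKey())) {
                continue; // inverse or symmetric property, filtered out
            }
            visit(edge.getValue(), graph, filteredProperties, visited, order);
        }
    }
}
```

The visited set alone already guarantees termination; filtering the inverse properties additionally keeps them from being followed at all.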
Multiple occurrences of a property in an instance are the most complex case we have to
address. In this case, two sets are constructed, containing the elements
connected by the examined property in the first and second instance respectively. These
two sets can have different cardinalities; the problem is to identify (i.e., to match)
similar elements between them. We use our similarity measure to identify such
element pairs, which are then compared, and the computed similarity contributes to the
total similarity between the two instances.
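One simple way to pair elements of the two sets is a greedy best match. The text does not specify the matching strategy, so the greedy pairing below, the averaging over the larger set (unmatched elements count as zero), and the names SetMatchingSketch and matchAndScore are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class SetMatchingSketch {

    /** Any pairwise similarity in [0, 1]; stands in for the tool's measure. */
    interface Similarity {
        double between(String x, String y);
    }

    /** Greedily pairs each element of xs with its best remaining partner in ys. */
    static double matchAndScore(List<String> xs, List<String> ys, Similarity sim) {
        if (xs.isEmpty() || ys.isEmpty()) {
            return 0.0; // nothing can be matched
        }
        List<String> remaining = new ArrayList<>(ys);
        double total = 0.0;
        for (String x : xs) {
            if (remaining.isEmpty()) {
                break; // xs is larger: its leftover elements stay unmatched
            }
            int best = 0;
            double bestScore = -1.0;
            for (int i = 0; i < remaining.size(); i++) {
                double score = sim.between(x, remaining.get(i));
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            total += bestScore;
            remaining.remove(best); // each element is used in at most one pair
        }
        // Unmatched elements of the larger set contribute zero to the average.
        return total / Math.max(xs.size(), ys.size());
    }
}
```

Greedy pairing depends on iteration order and is not guaranteed to find the globally best assignment; an optimal assignment algorithm could be substituted at higher cost.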
If a property occurs, once or multiple times, in one instance only, we estimate the
similarity of the values attached to the property as zero. This follows from the similarity
definition, i.e. the similarity equals zero if two objects are entirely different. Here, we
assume that the instances are entirely different in the property, since a value is assigned to
the property in one instance only.
Furthermore, we investigate the reasons (properties) that influenced the user's evaluation of
content (e.g., interest). We introduce two threshold values used to discover a user's likes
and dislikes. From the personalization perspective we are only interested in the two
outer sets, the positive and negative items. The identified properties can be used by other
tools for updating characteristics in the user model or for acquiring new
ones.
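The role of the two thresholds can be sketched as a split of rated items into a positive and a negative set, with everything in between ignored. The rating scale, the threshold values, the inclusive boundaries and the names RatingThresholdSketch and split are assumptions for illustration; the text does not fix any of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RatingThresholdSketch {

    /** Splits rated items into "positive" and "negative"; items between the thresholds are dropped. */
    static Map<String, List<String>> split(Map<String, Double> ratings, double lower, double upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower threshold must not exceed upper threshold");
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("positive", new ArrayList<String>());
        result.put("negative", new ArrayList<String>());
        for (Map.Entry<String, Double> entry : ratings.entrySet()) {
            if (entry.getValue() >= upper) {
                result.get("positive").add(entry.getKey()); // assumed: at or above upper means liked
            } else if (entry.getValue() <= lower) {
                result.get("negative").add(entry.getKey()); // assumed: at or below lower means disliked
            }
        }
        return result;
    }
}
```

Instances within the positive set (or within the negative set) could then be compared to see which properties they share.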
A.3.3 Enhancements and Optimizing
Each instance of the concept is represented as a tree (possibly a graph) consisting of
nodes (instances) and edges (properties). Each element (node or edge) is represented by
its URI in the repository. The object representation used to create a node from a given URI is
provided by NodeFactory:
final NodeFactory factory = Utils.getNodeFactory();
Afterwards, the node for a given URI (xUri) is acquired as follows:
final Node x = nodes.get(xUri);
The nodes and edges are acquired asynchronously in threads and afterwards stored
in a cache so that they are not acquired repeatedly. This avoids querying the repository
multiple times for the same data.
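The combination of asynchronous acquisition and caching can be sketched with standard concurrency classes. This is not ConCom's implementation: the class NodeCacheSketch and its loader function are hypothetical, and CompletableFuture requires Java 8 or later, which is newer than the Java SE 6 the tool targets.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class NodeCacheSketch<N> {

    private final ConcurrentMap<String, CompletableFuture<N>> cache =
            new ConcurrentHashMap<String, CompletableFuture<N>>();
    private final Function<String, N> loader; // stands in for the repository query

    NodeCacheSketch(Function<String, N> loader) {
        this.loader = loader;
    }

    /** Starts loading the node in a background thread on first request; later requests share the result. */
    CompletableFuture<N> get(String uri) {
        // computeIfAbsent is atomic, so the loader is started at most once per URI.
        return cache.computeIfAbsent(uri, u -> CompletableFuture.supplyAsync(() -> loader.apply(u)));
    }
}
```

A caller obtains the node with cache.get(uri).join(). Note that a failed load stays cached in this sketch, so a real implementation would need to evict or retry such entries.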
A.4 Manual for Adaptation to Other Domains
The recursive evaluation method implemented in the ConCom tool is universal
and exploits the ontological structure of the concept. It is based on acquiring properties and
the instances (literals) they connect. Therefore, it can also be used in other
application domains. However, in some cases it might be desirable to add further
metrics to achieve better results or to deal with particularities of the processed
domain.
A.4.1 Configuring to Other Domain
When using ConCom in another application domain, particular attention should be paid
to inverse properties because they cause circular references. Inverse
properties can be identified through the owl:inverseOf property of the OWL language.
However, the query returns both the base property and its inverse. Therefore, the
inverse properties to be ignored for the given domain must be listed in the configuration
file.
A.4.2 Dependencies
ConCom depends on Log4J and SimMetrics regardless of the application domain.