

A ConCom – Concept Comparer

A.1 Basic Information

Many approaches acquire user characteristics in order to fill a user model or keep it up to date, and in this way support personalization of the visible aspects of adaptive web-based applications.

Some information can be acquired directly from the user (the user is asked a question, fills in a form, etc.); other information comes from observing the user's behavior while working with the application, from analyzing logs on the web server, or from analyzing the presented content.

We focus on the analysis of the presented content, especially on evaluating similarity to find common or different aspects of the content.

A.1.1 Basic Terms

Ontological concept: A set of properties (also called attributes) that are connected to related concepts. Concepts can be ordered in a hierarchy.

Instance: Reflects an object from the real world.

A.1.2 Method Description

The method used in the ConCom tool is based on recursive evaluation of attributes. Every instance of a concept can consist of object type and data type attributes, which need to be treated differently. When a data type attribute is evaluated, the method ends by applying a strategy intended for comparing strings. Object type attributes are processed recursively using their assigned strategies until the data type level is reached, which is evaluated as described above.

The total similarity is computed incrementally as the different evaluation strategies are applied, i.e. each strategy contributes to the total similarity with its own partial similarity computed for the two given objects. More than one strategy can be used to evaluate the similarity between two objects.

Knowing the user's rating of the displayed concepts, we can use the similarity measure computed for each attribute to further investigate the common and different attributes of the concepts, and derive useful information about the user's interests for personalization purposes.

A.1.3 Scenarios of Use

ConCom can be used in the following scenario:

The input is two instances of ontological concepts.

ConCom should not be used in the following case:

The instances of ontological concepts do not belong to the same ontology.

A.1.4 External Links and Publications

ANDREJKO, A., BIELIKOVÁ, M.: Estimating similarity of the ontological concepts instances for the adaptive applications based on Semantic Web. [in Slovak] In: Václav Snášel (ed.): Znalosti 2008: Proceedings of the 7th annual conference, Bratislava, February 13-15, 2008, pp. 30-41.

ANDREJKO, A., BIELIKOVÁ, M.: Estimating similarity of the ontological concepts instances for personalization purposes. [in Slovak] In: František Babič, Ján Paralič (eds.): 2nd Workshop on Intelligent and Knowledge Oriented Technologies, WIKT 2007 Proceedings, Košice, November 15-16, 2007, pp. 46-49.

ANDREJKO, A., BIELIKOVÁ, M.: Comparing Instances of the Ontological Concepts. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge (2): Research Project Workshop Horský hotel Poľana, Slovakia, September 22-23, 2007, pp. 26-35.

ANDREJKO, A., BARLA, M., TVAROŽEK, M.: Comparing Ontological Concepts to Evaluate Similarity. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge: Research Project Workshop Bystrá dolina, Nízke Tatry, Slovakia, September 29-30, 2006, pp. 71-78.

LOG4J. Java-based logging utility, Apache Software Foundation. (http://logging.apache.org/log4j)

SIMMETRICS. Open source Similarity Measure Library. (http://sourceforge.net/projects/simmetrics/)

A.2 Integration Manual

ConCom is developed in Java (Standard Edition 6) and distributed as a jar archive.

Access to the functionality of the tool is provided through the command line using the call:

java -jar concom.jar [common-options] uri1 uri2

where:

uri1, uri2 – unique identifiers of the instances to be compared,

and [common-options] are:

-help – shows help

-server <url> – ontology server

-ontology <name> – ontology name

-username <username> – username used for repository connection

-password <password> – password used for repository connection

-use-uncommon <true|false> – whether to use 'uncommon predicates' for data nodes, default is false

-strong <filename> – name of the file containing URIs of 'strongly' filtered predicates

-weak <filename> – name of the file containing URIs of 'weakly' filtered predicates

-metric-data <M|L|D> – string comparison metric used for data nodes, default is D

-metric-labels <M|L|D> – string comparison metric used for labels, default is D

where the metric is one of:

M – Monge-Elkan

L – Levenshtein

D – Dummy (internal)

ConCom is not a stand-alone application; it is intended to be included in another application or tool, which calls its interface methods.
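As an illustration of calling the tool from a host application, the following minimal sketch assembles the documented command line in Java. The class ConComCommand, the server URL, the ontology name and the instance URIs are hypothetical placeholders; only the jar name and the option names are taken from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ConCom: builds the documented command line.
class ConComCommand {
    static List<String> build(String server, String ontology, String metric,
                              String uri1, String uri2) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("concom.jar");
        cmd.add("-server");
        cmd.add(server);
        cmd.add("-ontology");
        cmd.add(ontology);
        cmd.add("-metric-data");   // M, L or D as listed in the options
        cmd.add(metric);
        cmd.add(uri1);             // the two instance URIs come last
        cmd.add(uri2);
        return cmd;
    }

    public static void main(String[] args) {
        // All values below are made-up examples.
        List<String> cmd = build("http://localhost:8080/repository", "jobs", "L",
                "http://example.org/jo#offer1", "http://example.org/jo#offer2");
        System.out.println(String.join(" ", cmd));
        // To actually launch the tool: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Building the arguments as a list rather than a single string avoids quoting problems when a URI contains characters that a shell would interpret.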

A.2.1 Dependencies

ConCom uses:

Log4J logging utility,

SimMetrics open source Similarity Measure Library.

A.2.2 Installation

Deploying ConCom into another application requires the following step (any Java Integrated Development Environment can be used):

1. Three external jar archives must be included in the existing project: the jar archives containing ConCom, Log4J and SimMetrics.

A.2.3 Configuration

ConCom takes its configuration from the command line as described above.

A.2.4 Integration Guide

ConCom evaluates the similarity of the two instances given on the command line. During the evaluation, the total similarity measure is computed.

A.3 Development Manual

A.3.1 Tool Structure

ConCom consists of the following packages:

Provides a set of applications that compute various similarities (sk.fiit.nazou.concom.applications);

Provides classes and interfaces for handling instances of ontological concepts (sk.fiit.nazou.concom.concept);

Provides classes and interfaces for computing the similarity measure between two nodes (sk.fiit.nazou.concom.similarity).

A.3.2 Method Implementation

To evaluate similarity we have proposed a method based on recursive evaluation of the attributes and objects an instance consists of. The main idea of the method is to look for pairs of attributes common to both instances and to process them sequentially. The principle of the method is depicted in Fig. 1.

[Figure: activity diagram. Actions: Get all attributes; Get connected objects; Use object type strategy; Use data type strategy; Add weights; Adjust total similarity; Get total similarity. Guards: Object contains additional attribute; Attributes list isn't empty; User model isn't present; Attribute occurs in both instances; Object type attribute; Data type attribute.]

Fig. 1. Principle of the method using recursive traversal of an instance.

The process of comparison begins with acquiring all the attributes from both instances.

An attribute can have several kinds of occurrence:

• single in both instances,

• multiple in both instances,

• single/multiple in one instance only.

When the attribute has a single occurrence in both instances, the objects (literals) it refers to are evaluated for their similarity. A variety of similarity metrics can be used. If the attribute is a data type attribute, its comparison ends once a strategy has been used to evaluate the similarity between the connected literals; the computed similarity measure is then aggregated into the total similarity measure. In the case of an object type attribute, a strategy for the connected object is used, and the comparison is launched recursively on that object until literals are reached.
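The recursion for attributes with a single occurrence in both instances can be sketched as follows. This is only an illustration under stated assumptions, not the ConCom implementation: instances are modelled as nested maps (attribute name to either a String literal or another map), the data type strategy is a hand-written normalized Levenshtein similarity rather than a SimMetrics call, and partial similarities are aggregated by a plain average, which the text does not prescribe. The class name RecursiveSimilarity is hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch; names and the averaging rule are assumptions.
class RecursiveSimilarity {
    // Data type strategy: Levenshtein distance normalized to the range [0, 1].
    static double literalSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return 1.0 - (double) prev[b.length()] / maxLen;
    }

    // Object type strategy: recurse over the attributes present in both
    // objects until literals are reached, then average the partial results.
    @SuppressWarnings("unchecked")
    static double similarity(Object x, Object y) {
        if (x instanceof String && y instanceof String) {
            return literalSimilarity((String) x, (String) y);
        }
        if (!(x instanceof Map) || !(y instanceof Map)) return 0.0;
        Map<String, Object> a = (Map<String, Object>) x;
        Map<String, Object> b = (Map<String, Object>) y;
        Set<String> common = new HashSet<>(a.keySet());
        common.retainAll(b.keySet());
        if (common.isEmpty()) return 0.0;
        double total = 0.0;
        for (String attr : common) {
            total += similarity(a.get(attr), b.get(attr));
        }
        return total / common.size();
    }
}
```

For two job offers that share a title literal and a nested location object, similarity descends into the location object and compares its literals with literalSimilarity.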

When an instance is traversed recursively, an attribute that is the inverse of an already traversed attribute can occur. For instance, Washington, D.C. is connected to a job offer by the attribute jo:hasDutyLocation. Since more than one job offer can be located in Washington, D.C., it is desirable for all of them to refer to the same object (e.g. Washington, D.C. jo:isDutyLocationOf other job offers). If we do not take the inverse of the traversed attribute into account, the traversal algorithm will continue through that inverse attribute to other instances. Symmetric attributes cause a similar problem. Therefore, we filter out the attributes that are inverse or symmetric to the examined attribute.
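A minimal sketch of this filtering, assuming a simple in-memory graph (node to attribute to target nodes) and a hand-filled inverse table. The class InverseFilter and the node names are hypothetical; only the two jo: attributes come from the example above. The traversal never follows the inverse of the attribute through which the current node was reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch; the graph model and inverse table are assumptions.
class InverseFilter {
    // A symmetric attribute would simply map to itself in this table.
    static final Map<String, String> INVERSE = Map.of(
            "jo:hasDutyLocation", "jo:isDutyLocationOf",
            "jo:isDutyLocationOf", "jo:hasDutyLocation");

    // Records the nodes visited from `node`, skipping the inverse of the
    // attribute by which `node` was reached (`arrivedBy` is null at the root).
    static void traverse(Map<String, Map<String, List<String>>> graph, String node,
                         String arrivedBy, List<String> visited) {
        visited.add(node);
        Map<String, List<String>> attrs = graph.getOrDefault(node, Map.of());
        for (Map.Entry<String, List<String>> e : attrs.entrySet()) {
            if (arrivedBy != null && e.getKey().equals(INVERSE.get(arrivedBy))) {
                continue; // would lead back to the start and on to sibling instances
            }
            for (String target : e.getValue()) {
                traverse(graph, target, e.getKey(), visited);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> graph = Map.of(
                "offer1", Map.of("jo:hasDutyLocation", List.of("dc")),
                "dc", Map.of("jo:isDutyLocationOf", List.of("offer1", "offer2")),
                "offer2", Map.of("jo:hasDutyLocation", List.of("dc")));
        List<String> visited = new ArrayList<>();
        traverse(graph, "offer1", null, visited);
        System.out.println(visited);
    }
}
```

Without the check, the traversal from offer1 would pass through dc back to offer1 and to offer2 and never terminate. The sketch guards only against inverse and symmetric attributes, not against arbitrary cycles in the graph.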

A multiple occurrence is the most specific case we have to cope with. We move the solution of this problem to a lower level. Whenever a multiple attribute is acquired, only one of its occurrences in the instance is considered. Afterwards, all objects (literals) connected to that attribute are acquired from both instances. Instead of dealing with attributes, we now have to deal with two sets of objects (or literals), possibly with different cardinalities.

Here the problem emerges of deciding which object from the first set should be compared with which object from the second set and contribute to the total similarity. We use our similarity measure to identify the pairs.
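One possible way to identify pairs with a similarity measure is greedy matching: repeatedly take the most similar remaining pair until one set is exhausted. The text does not state which pairing procedure ConCom uses, so the sketch below is an assumption, as are the class name SetPairing and the choice to divide by the size of the larger set so that unpaired objects contribute zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Illustrative sketch; greedy matching and the normalization are assumptions.
class SetPairing {
    static <T> double pairedSimilarity(List<T> first, List<T> second,
                                       ToDoubleBiFunction<T, T> sim) {
        if (first.isEmpty() || second.isEmpty()) return 0.0;
        List<T> a = new ArrayList<>(first);
        List<T> b = new ArrayList<>(second);
        double sum = 0.0;
        while (!a.isEmpty() && !b.isEmpty()) {
            int bestI = 0;
            int bestJ = 0;
            double best = -1.0;
            // Find the most similar pair among the objects not yet paired.
            for (int i = 0; i < a.size(); i++) {
                for (int j = 0; j < b.size(); j++) {
                    double s = sim.applyAsDouble(a.get(i), b.get(j));
                    if (s > best) {
                        best = s;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            sum += best;
            a.remove(bestI); // remove by index
            b.remove(bestJ);
        }
        // Objects left without a partner contribute zero.
        return sum / Math.max(first.size(), second.size());
    }
}
```

Greedy matching is simple but not guaranteed to maximize the summed similarity; an optimal assignment algorithm could be substituted if that matters for the domain.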

When a single or multiple occurrence of an attribute is present in one instance only, we assume that the instances are entirely different in that attribute. By the definition of similarity, the similarity equals zero if two objects have nothing in common. In this case we therefore set the similarity for such an occurrence of the attribute to zero.
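The effect of this rule on the total can be illustrated with a small sketch. It assumes that partial similarities are averaged over the union of the attributes of both instances, which the text does not prescribe; the class name OneSidedAttributes and the attribute names in the usage note are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch; averaging over the attribute union is an assumption.
class OneSidedAttributes {
    static double total(Set<String> attrsA, Set<String> attrsB,
                        Map<String, Double> partial) {
        Set<String> union = new HashSet<>(attrsA);
        union.addAll(attrsB);
        if (union.isEmpty()) return 0.0;
        double sum = 0.0;
        for (String attr : union) {
            boolean inBoth = attrsA.contains(attr) && attrsB.contains(attr);
            // An attribute present in one instance only contributes zero.
            sum += inBoth ? partial.getOrDefault(attr, 0.0) : 0.0;
        }
        return sum / union.size();
    }
}
```

With attributes {title, salary} in one instance, {title, location} in the other, and a partial similarity of 0.9 for title, the three attributes in the union contribute 0.9, 0 and 0, so the total is about 0.3.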

We extend the resulting similarity measure with the content attributes that caused a different rating, selected according to defined threshold values. We suppose that important attributes influence the user's interest towards positive values and vice versa. Of all attributes, we are interested in the most positive and the most negative ones. Therefore, we propose a positive threshold of 0.85 (attributes with higher similarity are assigned to the positive set) and a negative threshold of 0.15 (attributes with lower similarity are assigned to the negative set).
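The assignment of attributes to the two sets can be sketched as follows. The thresholds 0.85 and 0.15 are taken from the text; the class name AttributeThresholds, the map-based input and the alphabetical ordering of the result are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch; only the threshold values come from the text.
class AttributeThresholds {
    static final double POSITIVE = 0.85;
    static final double NEGATIVE = 0.15;

    // Attributes whose similarity is higher than the positive threshold.
    static List<String> positiveSet(Map<String, Double> similarityByAttribute) {
        return similarityByAttribute.entrySet().stream()
                .filter(e -> e.getValue() > POSITIVE)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // Attributes whose similarity is lower than the negative threshold.
    static List<String> negativeSet(Map<String, Double> similarityByAttribute) {
        return similarityByAttribute.entrySet().stream()
                .filter(e -> e.getValue() < NEGATIVE)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Attributes whose similarity falls between the two thresholds end up in neither set.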

A.3.3 Enhancements and Optimizing

Each instance of a concept is represented as a tree consisting of nodes and predicates. Methods for building such a tree are provided. Each element (node or predicate) is represented by its URI in the repository. The object representation used to create a node from a given URI is provided by NodeFactory:

final NodeFactory factory = Utils.getNodeFactory();

Afterwards, the node for a given URI (xUri) is acquired as follows:

final Node x = nodes.get(xUri);

Nodes and predicates are acquired asynchronously in threads and are then stored in a cache so that they are not acquired repeatedly. This avoids querying the repository for the same data.
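The combination of asynchronous acquisition and caching can be sketched with standard Java concurrency classes. This is not ConCom's actual cache: the class NodeCache and its loader function, which stands in for a repository query, are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch; NodeCache and the loader are assumptions.
class NodeCache<V> {
    private final ConcurrentHashMap<String, CompletableFuture<V>> cache =
            new ConcurrentHashMap<>();
    private final Function<String, V> loader; // stands in for a repository query

    NodeCache(Function<String, V> loader) {
        this.loader = loader;
    }

    // The first request for a URI starts loading in a background thread;
    // later requests for the same URI reuse the cached future.
    CompletableFuture<V> get(String uri) {
        return cache.computeIfAbsent(uri,
                u -> CompletableFuture.supplyAsync(() -> loader.apply(u)));
    }
}
```

Caching the future rather than the loaded value means that concurrent requests for a URI that is still loading share one repository query.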

A.4 Manual for Adaptation to Other Domains

The method of recursive evaluation implemented in the ConCom tool is universal and exploits the ontological structure of the concept. It is based on acquiring attributes and the objects connected to them. Therefore, it can also be used in other application domains. However, in some cases it might be desirable to add further strategies to achieve better results or to deal with particularities typical of the processed domain.

A.4.1 Configuring to Other Domain

When using ConCom in another application domain, particular attention should be paid to inverse attributes because they cause circular references. Inverse properties can be identified through the owl:inverseOf property of the OWL language. However, the query returns both the base property and its inverse property. Therefore, it is necessary to list the inverse properties to be ignored for the given domain in the configuration file.

A.4.2 Dependencies

Both Log4J and SimMetrics are used by ConCom independently of the application domain.
