Ontology Alignment

Semantic Web - Spring 2006

Computer Engineering Department

Sharif University of Technology

The Problem

Like the Web, the Semantic Web by design will be distributed and heterogeneous.

Ontologies are used in it to support interoperability and common understanding between different parties.

Ontologies themselves may have some heterogeneities.

Ontology Alignment is needed to find semantic relationships among entities of ontologies.

[Figure: four ontologies labeled a, b, c, d, surrounded by question marks. Caption: How should I use them?]

Terminology

Mapping: a formal expression that states the semantic relation between two entities belonging to different ontologies.

Ontology Alignment: a set of correspondences between two or more (in the case of multi-alignment) ontologies. These correspondences are expressed as mappings.

Ontology Coordination: the broadest term, which applies whenever knowledge from two or more ontologies must be used at the same time in a meaningful way (e.g. to achieve a single goal).

Ontology Transformation: a general term for any process that leads to a new ontology o’ from an ontology o by using a transformation function t.

An Example of Alignment

[Figure: aligning Car (Ontology A) with Automobile (Ontology B). Nodes shown for Ontology A include Object, Vehicle, Car, Boat, Has Owner, Has Speed, Ali, Peugeot 405, 250 km/h. Nodes shown for Ontology B include Thing, Vehicle, Automobile, Has Specification, Speed, Ali’s Peugeot, Fast. Similarity labels on the links: 1.0, 0.6, 0.6, 0.8. Legend: Concept, Property, Instance, Type, Similarity.]

Car – Automobile

Label Similarity = 0.0

Super Similarity = 1.0

Instance Similarity = 0.6

Relation Similarity = 0.8

Total Similarity = 0.6
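
The Car – Automobile scores in this example are consistent with an unweighted average of the four component similarities: (0.0 + 1.0 + 0.6 + 0.8) / 4 = 0.6. The slides do not state the aggregation rule, so the sketch below assumes a weighted mean with equal weights by default; the function and variable names are illustrative.

```python
def total_similarity(scores, weights=None):
    """Weighted mean of component similarities (equal weights by default)."""
    if weights is None:
        # Assumption: every component counts equally.
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Component scores taken from the Car - Automobile example.
car_vs_automobile = {
    "label": 0.0,
    "super": 1.0,
    "instance": 0.6,
    "relation": 0.8,
}

print(round(total_similarity(car_vs_automobile), 2))
```

Passing a `weights` dictionary lets one component (for instance the label similarity) count more than the others.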

Terminology cont.

Ontology Translation: an ontology transformation function t for translating an ontology o written in some language L into another ontology o’ written in a distinct language L’.

Ontology Merging: the creation of a new ontology from two (possibly overlapping) source ontologies. This concept is closely related to that of integration in the database community.

Ontology Reconciliation: a process that harmonizes the content of two (or more) ontologies, typically requiring changes on one of the two sides or even on both sides.

An Example of Ontology Merging

[Figure: two source ontologies to be merged, with nodes Object, Thing, Vehicle, Bus, Car, Sport Car, Luxury Car, Family Car, BMW, Automobile, Sport Car, Family Car, Porsche.]

An Example of Ontology Merging

[Figure: the merged ontology, with nodes Object/Thing, Vehicle, Bus, Car/Automobile, Sport Car, Luxury Car, Family Car, BMW, Porsche.]

Forms of Heterogeneity in Ontologies

Syntactic: mismatches that depend on the choice of the representation language.

OWL, RDFS, DAML, N3, DATALOG, PROLOG, …

Terminological: all forms of mismatch related to the process of naming the entities (e.g. individuals, classes, properties, relations) that occur in an ontology.

Typical Examples:

different words are used to name the same entity (synonymy);

the same word is used to name different entities (polysemy);

words from different languages (English, French, etc.) are used to name entities;

syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.).

Mismatches at the terminological level are not as deep as those occurring at the conceptual level. However, most real cases have to do with the terminological level (e.g., with the way different people name the same entities), so this level is at least as crucial as the conceptual one.

Heterogeneity in Ontologies, cont.

Conceptual: mismatches that have to do with the content of an ontology.

Metaphysical differences: how the world is “broken into pieces”.

Coverage: the ontologies cover different, possibly overlapping, portions of the world.

Granularity: one ontology provides a more (or less) detailed description of the same entities.

Perspective: an ontology may provide a viewpoint that differs from the viewpoint adopted in another ontology.

Heterogeneity in Ontologies, cont.

Metaphysical differences:

Overcoming Heterogeneity

One common approach to the problems of heterogeneity is to define relations across the heterogeneous representations.

These relations can be used to transform expressions of one ontology into a form compatible with that of the other.

This may happen at any level:

syntactic: through semantics-preserving transducers;

terminological: through functions mapping lexical information;

conceptual: through general transformation of the representations (sometimes requiring a complete prover for some languages).

Structure of Mapping

Alignment: a process that starts from two representations o and o’ and produces a set of mappings between pairs of (simple or complex) entities <e, e’> belonging to o and o’ respectively.

Intuitively, we will assume that in general a mapping can be described as a quadruple:

<e, e’, n, R>

e and e’ are the entities between which a relation is asserted by the mapping.

n is a degree of trust (confidence) in that mapping.

R is the relation associated with the mapping, i.e. the relation holding between e and e’. It may be:

a simple set-theoretic relation

a fuzzy relation

a probabilistic distribution over a complete set of relations

a similarity measure
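
The quadruple <e, e’, n, R> maps naturally onto a small record type. The sketch below is one possible Python representation, not a standard API: the class name, the field names, the string encoding of R, and the assumption that n lies in [0, 1] are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str        # e, an entity of ontology o
    target: str        # e', an entity of ontology o'
    confidence: float  # n, degree of trust (assumed here to lie in [0, 1])
    relation: str      # R, e.g. "equivalent" or "subsumes" (hypothetical labels)

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# An alignment is then simply a set of mappings.
alignment = {
    Mapping("A:Car", "B:Automobile", 0.6, "equivalent"),
}

for m in alignment:
    print(m.source, m.relation, m.target, m.confidence)
```

Making the record immutable (`frozen=True`) keeps mappings hashable, so an alignment can be stored as a set.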

Similarity

There are many ways to assess the similarity between two entities. The most common way amounts to defining a measure of this similarity.

The characteristics which can be asked from these measures:

Overcoming Heterogeneity Using Similarity

Local Methods

Terminological Methods

String Based Methods

Token Based Methods

Language Based Methods

Structural Methods

Internal Structure

External Structure

Extensional (based on instances) Methods

When the classes share the same instances

When they do not

Terminological Methods

Terminological methods compare strings.

Can be applied to:

 name, label

 comments concerning entities

URI

Take advantage of the structure of the string (as a sequence of letters).

The main idea in using such measures is the fact that usually similar entities have similar names and descriptions in different ontologies.

Terminological M., cont. (Normalization)

A number of normalization procedures help improve the results of subsequent comparisons:

Case normalization: converting each alphabetic character in the strings to its lowercase counterpart;

Diacritics suppression: replacing characters bearing diacritic signs with their most frequent replacement (e.g. replacing Montréal with Montreal);

Blank normalization: normalizing all blank characters (blank, tabulation, carriage return) into a single blank character;

Link stripping: normalizing some links between words (such as replacing apostrophes and underscores with dashes);

Stopword elimination: eliminating words that can be found in a list (usually words like “to”, “a”, …).
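
The normalization steps above can be chained into a single function. This is a minimal Python sketch using only the standard library; the stopword list and the function name `normalize` are assumptions made for illustration, and a real tool would use a language-specific stopword list.

```python
import re
import unicodedata

STOPWORDS = {"to", "a", "the", "of"}  # illustrative list, not from the slides


def normalize(label, stopwords=STOPWORDS):
    # Case normalization.
    text = label.lower()
    # Diacritics suppression: decompose characters, drop combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Link stripping: apostrophes and underscores become dashes.
    text = re.sub(r"['’_]", "-", text)
    # Blank normalization: runs of whitespace become a single blank.
    text = re.sub(r"\s+", " ", text).strip()
    # Stopword elimination.
    return " ".join(word for word in text.split(" ") if word not in stopwords)


print(normalize("Montréal"))
print(normalize("  Road\tto   Montréal "))
print(normalize("has_owner"))
```

The order of the steps matters slightly: lowercasing first means the stopword list only needs lowercase entries.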

Terminological M., cont. (String Based)

Substring Similarity

Hamming Distance

N-Gram Distance

Edit Distance

Jaro Similarity

Token Based Distances

Term Frequency Inverse Document Frequency (TF/IDF)

Path Distance: compares not only the labels of objects but also the sequence of labels of the entities to which the labeled objects are related.
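
As an illustration of the n-gram comparison listed above, the sketch below counts the character trigrams two strings share. The slides give no formula, so the normalization (dividing by the number of n-grams in the shorter string) and the function names are assumptions; other normalizations are also in use.

```python
def ngrams(s, n=3):
    """Set of character n-grams of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(s, t, n=3):
    """Shared n-grams, normalized by the n-gram count of the shorter string."""
    shared = ngrams(s, n) & ngrams(t, n)
    denominator = min(len(s), len(t)) - n + 1
    if denominator <= 0:
        # Strings shorter than n have no n-grams to compare.
        return 0.0
    return len(shared) / denominator


print(ngram_similarity("article", "aricle"))
```

Because n-grams are collected in sets, a repeated n-gram is counted once; the score therefore stays between 0 and 1.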

Terminological M., cont (String Methods)

In string edit distance, the operations usually considered are insertion of a character, replacement of a character by another, and deletion of a character.

Levenshtein Distance is an Edit Distance with all costs set to 1.
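
The unit-cost edit distance described above can be computed with the classic dynamic-programming recurrence. The sketch below keeps only two rows of the table; the helper `edit_similarity`, which rescales the distance into [0, 1] by the length of the longer string, is one common convention and an assumption here.

```python
def levenshtein(s, t):
    """Edit distance with insertion, deletion and replacement all costing 1."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # delete cs
                current[j - 1] + 1,            # insert ct
                previous[j - 1] + (cs != ct),  # replace (free if equal)
            ))
        previous = current
    return previous[-1]


def edit_similarity(s, t):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


print(levenshtein("kitten", "sitting"))
print(edit_similarity("car", "cart"))
```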

Terminological M., cont. (Language Based)

Rely on NLP techniques to find associations between instances of concepts or classes.

Intrinsic methods: perform the terminological matching with the help of morphological and syntactic analysis to normalize terms (e.g. stemming: going → go).

Extrinsic methods: make use of external resources such as dictionaries and lexicons (WordNet).

Resnik Semantic Similarity

Structural Methods

The structure of the entities found in an ontology can be compared, instead of comparing their names or identifiers.

Internal Structure: use criteria such as the range of their properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between them.

External Structure: the similarity comparison between two entities from two ontologies can be based on the position of the entities within their hierarchies.

Structural Methods (External)

If two entities from two ontologies are similar, their neighbors might also be somehow similar.

Criteria for deciding that the two entities are similar include:

Their direct super-entities are already similar.

Their sibling-entities are already similar.

Their direct sub-entities are already similar.

All (or most) of their descendant-entities (entities in the sub tree rooted at the entity in question) are already similar.

All (or most) of their leaf-entities are already similar.

All (or most) of entities in the paths from the root to the entities in question are already similar.

Structural Methods (External), cont.

Existing Approaches:

Structural topological dissimilarity on hierarchies

Upward Cotopic Distance
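
The slides name the upward cotopic distance without giving its formula. The sketch below follows the definition commonly found in the ontology matching literature, stated here as an assumption: the upward cotopy of a class is the class together with all its ancestors, and the distance between two classes of the same hierarchy is one minus the overlap ratio of their upward cotopies. The hierarchy and names are illustrative.

```python
def upward_cotopy(cls, parents):
    """The class itself plus all of its ancestors."""
    seen, stack = set(), [cls]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def upward_cotopic_distance(c1, c2, parents):
    u1 = upward_cotopy(c1, parents)
    u2 = upward_cotopy(c2, parents)
    return 1.0 - len(u1 & u2) / len(u1 | u2)


# Child -> direct parents, for a small illustrative hierarchy.
parents = {
    "Vehicle": ["Object"],
    "Car": ["Vehicle"],
    "Bus": ["Vehicle"],
    "Sport Car": ["Car"],
    "Family Car": ["Car"],
}

print(upward_cotopic_distance("Sport Car", "Family Car", parents))
print(upward_cotopic_distance("Car", "Bus", parents))
```

Two classes that share most of their ancestors get a small distance; in this form the measure applies within a single hierarchy.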

Extensional (based on instances) Methods

Compare the extensions of classes, i.e. their sets of instances, rather than their interpretation.

Conditions in which such techniques can be used:

When the classes share the same instances

When they do not
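
For the case where the classes share the same instances, one simple way to compare extensions is the Jaccard coefficient of the two instance sets. The slides do not name a specific measure, so this choice and the instance names below are assumptions for illustration.

```python
def jaccard_similarity(instances_a, instances_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        # Assumption: two empty extensions give no evidence of similarity.
        return 0.0
    return len(a & b) / len(a | b)


car_instances = {"peugeot_405", "bmw_320", "porsche_911"}
automobile_instances = {"bmw_320", "porsche_911", "fiat_500"}

print(jaccard_similarity(car_instances, automobile_instances))
```

When the two ontologies do not share instances, the sets cannot be intersected directly and the instances themselves must first be compared.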

Global Methods

Once local similarities have been calculated, it remains to compute the alignment. This involves more global treatments, including:

aggregating the results of these base methods in order to compute the similarity between compound entities;

developing a strategy for computing these similarities in spite of cycles and non-linearity in the constraints governing similarities;

organizing the combination of various similarity / alignment algorithms;

involving the user in the loop;

finally, extracting the alignments from the resulting (dis)similarity.

Compound similarity

Global similarity computation

The computation of compound similarity is still local because it only provides similarity considering the neighborhood of a node.

Similarity may involve the ontologies as a whole and the final similarity values may ultimately depend on all the ontologies.

The distance defined by local methods can be defined in a circular way (for instance, if the distance between two classes depends on the distances between their instances, which themselves depend on the distance between their classes, or if there are cycles in the ontology).

Strategies must be defined in order to compute this global similarity.

Similarity Flooding

Similarity equation fix point

Global similarity (Similarity Flooding)

Two ontologies are first translated into directed labeled graphs.

Another graph G is then created whose nodes are pairs of nodes of the initial graphs; there is an edge between (o1, o’1) and (o2, o’2) labeled by p whenever there are edges (o1, p, o2) in the first graph and (o’1, p, o’2) in the second one.

The algorithm computes initial similarity values between nodes (based on their labels, for instance) and then iteratively recomputes the similarity of each node as a function of the similarities of its adjacent nodes at the previous step.

It stops when no similarity changes by more than a particular threshold or after a predetermined number of steps.

It uses a weighted linear aggregation in which the weight of an edge is the inverse of the number of other edges with the same label reaching the same couple of entities.
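
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the published algorithm in full: it assumes graphs given as (source, label, target) triples, propagates similarity in both directions along each edge of the pair graph, splits weight evenly among same-label edges, and normalizes by the maximum value after each step. All names and default parameters are hypothetical.

```python
from collections import defaultdict


def similarity_flooding(edges1, edges2, initial, max_steps=100, eps=1e-4):
    # Pair graph: one edge per pair of same-label edges in the two graphs.
    pair_edges = [((a1, b1), p, (a2, b2))
                  for (a1, p, a2) in edges1
                  for (b1, q, b2) in edges2 if p == q]
    out_count, in_count = defaultdict(int), defaultdict(int)
    for x, p, y in pair_edges:
        out_count[(x, p)] += 1
        in_count[(y, p)] += 1
    # flows[n] lists (neighbor, weight): similarity flowing into n.
    flows = defaultdict(list)
    for x, p, y in pair_edges:
        flows[y].append((x, 1.0 / out_count[(x, p)]))
        flows[x].append((y, 1.0 / in_count[(y, p)]))
    nodes = set(flows)
    sigma = {n: initial.get(n, 0.0) for n in nodes}
    for _ in range(max_steps):
        raw = {n: sigma[n] + sum(w * sigma[m] for m, w in flows[n])
               for n in nodes}
        top = max(raw.values(), default=0.0)
        if top == 0.0:
            break  # nothing to propagate
        new = {n: value / top for n, value in raw.items()}
        change = max(abs(new[n] - sigma[n]) for n in nodes)
        sigma = new
        if change < eps:
            break  # fixed point reached within tolerance
    return sigma


edges1 = [("Car", "subClassOf", "Vehicle"), ("Boat", "subClassOf", "Vehicle")]
edges2 = [("Automobile", "subClassOf", "Vehicle")]
initial = {("Vehicle", "Vehicle"): 1.0}  # e.g. from label comparison

for pair, score in sorted(similarity_flooding(edges1, edges2, initial).items()):
    print(pair, round(score, 3))
```

Here only the (Vehicle, Vehicle) pair starts with a nonzero score, and the iteration spreads part of it to the pairs connected to it in the pair graph.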

Similarity Flooding Algorithm

Learning Methods

As in many other fields, methods developed in machine learning prove useful in ontology alignment.

Two particular areas:

Supervised learning, in which the ontology alignment algorithm learns how to work through the presentation of many good alignments (positive examples) and bad alignments (negative examples).

It is difficult to know which techniques work well for which ontology features.

An ontology alignment algorithm trained on several ontology pairs might not necessarily work well for a new ontology pair.

Learning from data, in which a population of instances is communicated to the algorithm together with their relations and the classes they belong to.

User Feedback

Supporting effective interaction between the user and the system components is one concern of ontology alignment.

User input can take place in many areas of alignment:

Assessing initial similarity between some terms;

Invoking and composing alignment methods;

Accepting or refusing similarity or alignment provided by the various methods.

Alignment Extraction

The ultimate alignment goal is a satisfactory set of correspondences between ontologies.

Manual Extraction: displaying the entity pairs with their similarity scores and/or ranks and leaving the choice of the appropriate pairs up to the user of the alignment tool.

Automatic Extraction:

Using Thresholds

Hard threshold: retains all the correspondences above a threshold n;

Delta method: uses as a threshold the highest similarity value minus a particular constant value d;

Proportional method: uses as a threshold a percentage of the highest similarity value;

Percentage: retains the top n% of correspondences.
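
The four threshold strategies above differ only in how the cut-off is chosen. The sketch below assumes similarities are stored in a dictionary keyed by entity pair and treats "above the threshold" as "greater than or equal to"; both choices, and the sample scores, are assumptions for illustration.

```python
def hard_threshold(sims, n):
    """Keep every pair whose similarity is at least n."""
    return {pair: s for pair, s in sims.items() if s >= n}


def delta_threshold(sims, d):
    """Threshold = highest similarity minus the constant d."""
    return hard_threshold(sims, max(sims.values()) - d) if sims else {}


def proportional_threshold(sims, p):
    """Threshold = fraction p (0..1) of the highest similarity."""
    return hard_threshold(sims, p * max(sims.values())) if sims else {}


def top_percentage(sims, n):
    """Keep the best n percent of the pairs (rounded down)."""
    ranked = sorted(sims.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:int(len(ranked) * n / 100)])


sims = {
    ("Object", "Thing"): 1.0,
    ("Car", "Automobile"): 0.75,
    ("Car", "Truck"): 0.5,
    ("Bus", "Automobile"): 0.25,
}

print(sorted(hard_threshold(sims, 0.5)))
print(sorted(delta_threshold(sims, 0.25)))
print(sorted(proportional_threshold(sims, 0.5)))
print(sorted(top_percentage(sims, 50)))
```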

Alignment Extraction, cont.

Automatic Extraction

Using Optimization of the result

If an injective mapping is required, then some choices need to be made in order to maximize the “quality” of the alignment, which is typically measured by the total similarity of the aligned entity pairs.

A greedy alignment algorithm could construct the correspondences step-wise, at each step selecting the most similar pair and deleting its members from the table. The algorithm will then stop whenever no pair remains whose similarity is above the threshold. (Not Optimal)

Optimal Solution: Stable Marriage
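
The greedy strategy described above can be sketched as follows. The input format (a dictionary from entity pairs to similarity scores) and the sample values are assumptions; the example is chosen so that the greedy choice of the best pair first blocks a combination with a higher total similarity.

```python
def greedy_extraction(sims, threshold):
    """Repeatedly take the most similar pair whose members are both unused."""
    used_left, used_right, alignment = set(), set(), []
    ranked = sorted(sims.items(), key=lambda item: item[1], reverse=True)
    for (left, right), score in ranked:
        if score < threshold:
            break  # every remaining pair is below the threshold
        if left not in used_left and right not in used_right:
            alignment.append((left, right, score))
            used_left.add(left)
            used_right.add(right)
    return alignment


sims = {
    ("a", "x"): 0.9,
    ("a", "y"): 0.8,
    ("b", "x"): 0.8,
    ("b", "y"): 0.1,
}

# Greedy takes (a, x) first, which leaves only (b, y) for b.
print(greedy_extraction(sims, 0.0))
# With a higher threshold, b stays unmatched.
print(greedy_extraction(sims, 0.5))
```

In this example the pairs (a, y) and (b, x) together would have a larger total similarity than the greedy result, which illustrates why the greedy strategy is not optimal.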

An Example: Anchor Prompt Method

Anchor-PROMPT (an extension of PROMPT) is an ontology merging and alignment tool that finds possible matching terms.

Implemented in Protégé http://protege.stanford.edu

Incremental algorithm

Takes as input two ontologies and a set of anchors: pairs of related terms.

Anchors are identified with the help of string-based techniques, or defined by a user.

It then refines them based on the ontology structures and user feedback.

The PROMPT Algorithm

Make initial suggestions

Select the next operation

Perform automatic updates

Find conflicts

Make suggestions

After a User Performs an Operation

For each operation

perform the operation

consider possible conflicts

identify conflicts

propose solutions

analyze local context

create new suggestions

reinforce or downgrade existing suggestions

Conflicts

Conflicts that PROMPT identifies

name conflicts

dangling references

redundancy in a class hierarchy

slot-value restrictions that violate class inheritance

Anchor-PROMPT: Using Non-Local Contexts

[Figure: Ontology 1 and Ontology 2]

Input:

A set of anchor pairs

Output:

A set of related terms with similarity scores

Where do anchors come from?

Lexical matching

Interactive tools

User-specified

Generating Paths in the Graph

Existing Works

[Table: existing alignment tools. Columns: Method, Year, Organization, Project Leader, automation level (Automatic or Semi), and feature columns (Lexical and others) marked with T. Column values are listed below in the order they appear in the source.]

Method: OntoMorph, U.S. Army, Smart, Chimaera, Prompt, InfoSleuth, A. Prompt, Glue, IF Map, NOM, QOM, CROSI

Year: 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2005, 1997, 1999, 1999, 1999

Organization: S. California, DARPA, Stanford, Stanford, Stanford, Amsterdam, Stanford, Illinois, Southampton, Karlsruhe, Karlsruhe, Southampton

Project Leader: Fridman, Noy; McGuinness; Noy, Musen; Ding; Noy, Musen; Doan; Kalfoglou; Ehrig; Ehrig; Kalfoglou; Chalupsky

Automation: Automatic, Semi, Semi, Semi, Semi, Semi, Semi, Semi, Automatic, Automatic, Automatic, Automatic, Automatic

The End
