D2.R1 - DBGroup

WISDOM
Research programme (co-funded by MIUR, year 2004)
Ricerca Intelligente su Web basata su Ontologie di Dominio
Web Intelligent Search based on DOMain ontologies

PAOLO BOUQUET, FAUSTO GIUNCHIGLIA, MIKALAI YATSKEVICH,
STEFANO ZANOBINI

D2.R1
15 June 2005

Summary

Theme: Theme 1: Critical analysis of mapping languages and techniques
Code: D2.R1
Date: 15 June 2005
Product type: Report
Number of pages: 35
Responsible unit: TN
Units involved: TN, MO
Authors: Paolo Bouquet, Fausto Giunchiglia, Mikalai Yatskevich, Stefano Zanobini
Contact author: Paolo Bouquet
Critical analysis of languages and mapping techniques
1. Introduction
Like the Web, the semantic Web will be distributed and heterogeneous. As such, the integration of the resources found on the semantic Web is one of its main problems. To address this problem, data will be expressed in the framework of ontologies. However, ontologies themselves can be heterogeneous, and some work will have to be done to restore interoperability.
Even with emerging web and ontology standards, coordinating ontology development – whether manual or automatic – will prove to be a challenging task. In evolving
domains, it is expected that ontologies will not remain static and various versions of
ontologies will have to be tracked. Interdisciplinary ontologies may need to be created
from existing domain-specific ontologies, domain specific ontologies may need to be
merged with more general ontologies, different versions of a single-domain ontology
may need to be merged, and new information may need to be merged with existing
ontologies. Furthermore, new ontologies may be built by merging information from
heterogeneous databases or other information sources. Hence, these ontologies will
have to be reconciled.
Semantic interoperability can be grounded in ontology reconciliation: finding relationships between entities belonging to different ontologies. We call this process “ontology alignment” or “ontology matching”.
Alignment results can be used for various purposes, such as displaying the correspondences, transforming one source into another, creating a set of bridge axioms or rules between the ontologies, or generating query wrappers (query rewriting instead of transformation).
The ontology alignment problem can be described in one sentence: given two ontologies, each describing a set of discrete entities (which can be classes, properties, rules, predicates, etc.), find the relationships (e.g., equivalence or subsumption) holding between these entities.
The purpose of this deliverable is to present existing and ongoing work in these various contexts, so that research in ontology alignment can progress towards more complete, more integrated and, above all, more efficient means of solving heterogeneity problems through alignment.
In the following, we first present the terminology (Section 2). In Section 3, the major approaches to mapping languages are presented. In Section 4, a classification of schema-based matching techniques is presented and specific matching techniques are discussed. Section 5 is devoted to the systems implementing the matching techniques.
2. Terminology
The framework presented in this document builds on a large body of recent work on the problem of semantic interoperability. In this area, different authors use different words to refer to similar concepts and, vice versa, different concepts are sometimes referred to by the same name. In this section, we provide a tentative and partial glossary with the definitions of terms as they will be used in the rest of the document. The glossary is based on a technical deliverable (D2.2.1) of the EU-funded Network of Excellence Knowledge Web.
Mapping: a formal expression that states the semantic relation between two entities
belonging to different ontologies. When this relation is oriented, this corresponds to a
restriction of the usual mathematical meaning of mapping: a function (whose domain
is a singleton). Mappings are discussed at length in Chapter 4.
Ontology Alignment: a set of correspondences between two or more (in case of multi-alignment) ontologies (by analogy with DNA sequence alignment). These correspondences are expressed as mappings. Alignments are detailed in Chapter 3.
Ontology Coordination: the broadest term, which applies whenever knowledge from two or more ontologies must be used at the same time in a meaningful way (e.g., to achieve a single goal).
Ontology Transformation: a general term for any process which produces a new ontology o′ from an ontology o by applying a transformation function t. Transformations and the like are the subject of further work in this work package.
Ontology Translation: an ontology transformation function t for translating an ontology o written in some language L into another ontology o′ written in a distinct language L′.
Ontology Merging: the creation of a new ontology om from two (possibly overlapping) source ontologies o′ and o″. This concept is closely related to that of integration in the database community.
Ontology Reconciliation: a process that harmonizes the content of two (or more) ontologies, typically requiring changes on one of the two sides or even on both sides
[Hameed et al., 2004].
Meaning Negotiation: the protocol through which two agents (either human or artificial) agree on the changes required to reconcile their ontologies.
3. Mapping languages
Most state-of-the-art mapping languages represent mappings in one of two ways. In the first family of approaches, mappings are represented as tuples, usually containing the nodes from both ontologies plus some additional information defining the strength and type of the mapping. In the second family of approaches, mappings are represented as logical statements (most often rules) which connect the entities of both ontologies. Let us review these two categories in more detail.
3.1 Mappings as tuples
This category of mapping languages relies on the very basic definition of a correspondence as a pair of discrete entities of the language. The mapping element in this case is a tuple, whose general definition is roughly the following:
– entity1: the first aligned entity. It corresponds to some discrete entity of the representation language.
– entity2: the second aligned entity, with the same constraint as entity1.
– relation: the relation holding between the two entities. It is not restricted to the equivalence relation, but can be a more sophisticated operator (e.g., subsumption, incompatibility [40], or even some fuzzy relation).
– strength: denotes the confidence held in the correspondence. Since many alignment methods compute a strength of the relation between entities, this strength can be provided as a normalized measure. The measure should belong to an ordered set M with a maximum and a minimum element. Most often, this value is restricted to a float between 0 and 1. In some approaches, the measure could be generalized to a lattice domain. The measure does not characterize the relationship (e.g., as a fuzzy relation, which should be expressed in the relation attribute), but reflects the confidence of the alignment provider in the relation holding between the entities, or the probability that the given relation holds between the entities.
– id: an identifier for the correspondence.
In various approaches, the aligned entities may be classes, properties or individuals. But they can also be any kind of complex term used by the target language. For instance, an entity can be the concatenation of firstname and lastname considered in [62], or a path algebra expression such as:
hasSoftCopy.softCopyURI = hasURL
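To make the tuple form concrete, the following is a minimal sketch, not drawn from any particular system, of how such a correspondence and an alignment could be represented; all names are illustrative.

from dataclasses import dataclass

@dataclass
class Correspondence:
    """One mapping element expressed as a tuple (illustrative sketch)."""
    id: str          # identifier of the correspondence
    entity1: str     # discrete entity from the first ontology
    entity2: str     # discrete entity from the second ontology
    relation: str    # e.g. "=", "subsumption", "incompatibility"
    strength: float  # normalized confidence in [0, 1]

# An alignment is then simply a collection of such tuples, e.g.:
alignment = [
    Correspondence("m1", "hasSoftCopy.softCopyURI", "hasURL", "=", 0.9),
]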
3.2 Mappings as rules
This class of mapping representation languages considers sets of expressions of a particular language (L) with variables in these expressions. Correspondences are thus directional and correspond to a clause of the form

∀x (φ(x) → ∃y ψ(x, y))

where x and y stand for tuples of variables, the variables of the left-hand side are universally quantified over the whole formula, and those of the right-hand side which do not occur in the left-hand side are existentially quantified. This level can express correspondences such as the one sketched after this paragraph. This kind of rule (or restriction) is commonly used in logic-based languages and in the database world for defining views in "global-as-view" or "local-as-view" approaches [15]. It also resembles the SWRL rule language [43] when used with OWL. These rules can also be generalized to any relation, dropping the orientation constraint.
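As a purely illustrative example (not taken from the cited works), such a rule could relate a concatenation of attributes on one side to a single attribute on the other:

∀x, y, z (Employee(x) ∧ firstname(x, y) ∧ lastname(x, z)) → ∃w (Person(x) ∧ name(x, w) ∧ w = concat(y, z))

Here x, y and z are universally quantified, while w, which occurs only on the right-hand side, is existentially quantified.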
4. Matching techniques
4.1 Problem Statement
The matching operation determines an alignment (or mapping) A′ for a pair of schemas/ontologies o and o′. Some other parameters can extend the definition of the matching process, namely: (i) the use of an input alignment A which is to be completed by the process; (ii) the matching parameters p (e.g., weights, thresholds); and (iii) the external resources used by the matching process r (e.g., thesauri); see Figure 1.
Fig. 1. The matching process
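Viewed as an operation, the process of Figure 1 can be summarized by a function signature. The following Python sketch is purely illustrative and does not come from any specific system.

def match(o, o_prime, A=None, params=None, resources=None):
    """Compute an alignment A' between ontologies o and o'.

    A          -- optional input alignment to be completed by the process
    params     -- matching parameters, e.g. weights and thresholds
    resources  -- external resources, e.g. thesauri
    """
    A_prime = set(A) if A else set()
    # matcher-specific computation of correspondences would go here
    return A_prime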
4.2 The Matching Dimensions
There are many independent dimensions along which matching algorithms can be classified. As shown in Figure 1, we may classify them according to (i) the input of the algorithms, (ii) the characteristics of the matching process, and (iii) the output of the algorithms.
Input dimensions. These dimensions concern the kind of input on which the algorithms operate. As a first dimension, algorithms can be classified according to the conceptual models in which ontologies or schemas are expressed. For example, the Artemis [16] system supports the relational, OO, and ER models; Cupid [47] supports the XML and relational models; QOM [24] supports the RDF and OWL models. A second possible dimension depends on the kind of data the algorithms exploit: different approaches exploit different information from the input conceptual models; some rely only on schema-level information (e.g., Cupid [47], COMA [26]), others rely only on instance data (e.g., GLUE [27]), and others exploit both schema- and instance-level information (e.g., QOM [32]). Even with the same data models, matching systems do not always use all available constructs. In general, some algorithms focus on the labels assigned to the entities, some consider their internal structure and the types of their attributes, and others consider their relations with other entities.
Process dimensions. A classification of the matching process could be based on its general properties, once we restrict ourselves to formal algorithms. In particular, it depends on the approximate or exact nature of its computation. Exact algorithms compute the absolute solution to a problem; approximate algorithms sacrifice exactness for performance (e.g., [32]). All the techniques discussed in the remainder of the paper can be either approximate or exact. Another dimension for analyzing matching algorithms is based on the way they interpret the input data. Three large classes are often distinguished, based on the intrinsic input, external resources, or some semantic theory of the considered entities. We call these three classes syntactic, external, and semantic respectively, and discuss them in detail in the next section.
Output dimensions. Apart from the information that matching systems exploit and how they manipulate it, the other important class of dimensions concerns the form of the result they produce. The form of the alignment might be of importance: is it a one-to-one correspondence between the ontology entities? Does it have to be a final mapping? Is any relation suitable?
Other significant distinctions in the output results have been indicated in [40]. One dimension concerns whether systems deliver a graded answer, e.g., that the correspondence holds with 98% confidence or 4/5 probability, or an all-or-nothing answer, e.g., that the correspondence definitely holds or not. In some approaches, correspondences between schema/ontology entities are determined using distance measures. These are used to provide an alignment expressing equivalence between the entities, in which the actual distance is the ground for generating a confidence measure for each correspondence, usually in the [0,1] range [35, 47]. Another dimension concerns the kind of relations between aligned entities a system can provide. Most systems focus on equivalence (=), while a few others are able to provide more expressive results (e.g., equivalence, subsumption (⊑), incompatibility (⊥)); see [41] for details. In conclusion, there are many dimensions that can be taken into account when attempting to classify matching methods. In the next section we present a classification of elementary techniques that draws simultaneously on several such criteria.
4.3 A retained classification of elementary schema-based matching approaches
In this section, mainly schema-based elementary matchers are discussed. Therefore, only schema/ontology information is considered, not instance data. The exact/approximate opposition has not been used because each of the methods described below can be implemented as an exact or an approximate algorithm, depending on the goals of the matching system. To ground our classification and ensure comprehensive coverage, we have analyzed state-of-the-art approaches to schema-based matching. The references section reports a partial list of the works which have been scrutinized, pointing to (some of) the most important contributions.
We have used the following guidelines for building our classification:
– Exhaustivity. The extension of the categories dividing a particular category must cover its extension (i.e., their aggregation should give the complete extension of the category).
– Disjointness. In order to have a proper tree, the categories dividing one category should be pairwise disjoint by construction.
– Homogeneity. In addition, the criterion used for further dividing one category should be of the same nature (i.e., should come from the same dimension). This usually helps in guaranteeing disjointness.
– Saturation. Classes of concrete matching techniques should be as specific and discriminative as possible in order to provide fine-grained distinctions between possible alternatives. These classes have been identified following a saturation principle: they have been added or modified until saturation was reached, namely until taking new techniques into account did not require introducing new classes or modifying existing ones.
Notice that the disjointness and exhaustivity of the categories ensure the stability of the classification, namely that new techniques will not fall in between two categories. The classes of matching techniques represent the state of the art; obviously, with the appearance of new techniques, they might be extended and further detailed.
The classification presented here extends the previous work on classifying automated schema matching approaches of [62]. The classification of [62] distinguishes between elementary (individual) matchers and combinations of matchers. Elementary matchers comprise instance-based and schema-based, element-level and structure-level, linguistic and constraint-based matching techniques. Cardinality and auxiliary information (e.g., thesauri, global schemas) can also be taken into account.
For classifying elementary schema-based matching techniques, two synthetic classifications (see Figure 2), based on what we found to be the most salient properties of the matching dimensions, are used. These two classifications are presented as two trees sharing their leaves. The leaves represent classes of elementary matching techniques and their concrete examples. The two synthetic classifications are:
– the Granularity/Input Interpretation classification, based on (i) the granularity of the match, i.e., element- or structure-level, and (ii) how the techniques generally interpret the input information;
– the Kind of Input classification, based on the kind of input used by the elementary matching techniques.
The overall classification of Figure 2 can be read both in a descending manner (focusing on how the techniques interpret the input information) and in an ascending manner (focusing on the kind of manipulated objects) in order to reach the layer of basic techniques. Below we discuss in turn the Granularity/Input Interpretation, Basic Techniques, and Kind of Input layers, together with supporting arguments for the categories/classes introduced at each layer.
Elementary matchers are distinguished by the upper layer according to the following classification criteria:
– Element-level vs structure-level. Element-level matching techniques compute mapping elements by analyzing entities in isolation, ignoring their relations with other entities. Structure-level techniques compute mapping elements by analyzing how entities appear together in a structure. This criterion is the same as the one first introduced in [62].
– Syntactic vs external vs semantic. The key characteristic of syntactic techniques is that they interpret the input solely as a function of its structure, following some clearly stated algorithm. External techniques exploit auxiliary (external) resources of a domain and common knowledge in order to interpret the input.
Fig. 2. A retained classification of elementary schema-based matching approaches
These resources might be human input or some thesaurus expressing the relationships between terms. The key characteristic of semantic techniques is that they use some formal semantics (e.g., model-theoretic semantics) to interpret the input.
To make the distinctions between the proposed categories clearer, we mark in bold type the innovations with respect to the initial classification of [62]. In particular, in the upper level we further detail (with respect to [62]) the element- and structure-level of matching by introducing the syntactic vs external vs semantic criteria. The reasons for having these three categories are as follows. Our initial criterion was to distinguish between internal and external techniques. By internal we mean techniques exploiting only the information which comes with the input schemas/ontologies; external techniques are as defined above. Internal techniques can be further detailed by distinguishing between syntactic and semantic interpretation of the input, also as defined above. However, the same distinction can be introduced for external techniques only to a limited extent. In fact, we can qualify some oracles (e.g., WordNet [55], DOLCE [38]) as syntactic or semantic, but not a user's input. Thus, we do not detail external techniques any further, and we omit in Figure 2 the theoretical category of internal techniques (as opposed to external ones).
Notice that in further discussions we also omit element-level semantic techniques: since the semantics is usually given in a structure, there are no element-level semantic techniques.
Distinctions between classes of elementary matching techniques in the middle layer of our classification are motivated by the way a matching technique interprets the input information in each concrete case. In particular, a label can be interpreted as a string (a sequence of letters from an alphabet) or as a word or a phrase in some natural language; a hierarchy can be considered as a graph (a set of nodes related by edges) or as a taxonomy (a set of concepts having a set-theoretic interpretation, organized by a relation which preserves inclusion). Thus, at the element level we introduce the following classes of elementary schema/ontology matching techniques: string-based, language-based, based on linguistic resources, constraint-based, alignment reuse, and based on upper level ontologies. At the structure level we distinguish between graph-based, taxonomy-based, based on repositories of structures, and model-based techniques. The lower layer classification is concerned with the type of input considered by a particular technique:
– The first level is categorized depending on which kind of data the algorithms work on: strings (terminological), structure (structural) or models (semantic). The first two are found in the ontology descriptions; the last one requires some semantic interpretation of the ontology and usually uses a semantically compliant reasoner to deduce the correspondences.
– The second level of this classification decomposes these categories further where necessary: terminological methods can be string-based (considering the terms as sequences of characters) or based on the interpretation of these terms as linguistic objects (linguistic). The structural methods category is split into two types of methods: those which consider the internal structure of entities (e.g., attributes and their types) and those which consider the relations of entities with other entities (relational).
Notice that, following the above mentioned guidelines for building a classification, the terminological category should be divided into linguistic and non-linguistic techniques. However, since non-linguistic techniques are all string-based, this category has been discarded.
Below we discuss the main alternatives in more detail according to the above overall classification, also indicating in which matching systems they have been exploited. The order follows that of the Granularity/Input Interpretation classification. Finally, in Figure 2, the techniques marked in italics (techniques based on upper level ontologies and DL-based techniques) have not been implemented in any matching system yet; however, we argue why their appearance seems reasonable in the near future.
4.4 Matching techniques
String-based techniques are often used in order to match names and name descriptions of schema/ontology entities. These techniques consider strings as sequences of letters in an alphabet. A comparison of different string matching techniques, from distance-like functions to token-based distance functions, can be found in [17]. Usually, distance functions map a pair of strings to a real number, where a smaller value indicates greater similarity between the strings. Some examples of string-based techniques which are extensively used in matching systems are prefix, suffix, edit distance, and n-gram tests; a small sketch of these tests follows the list below.
– Prefix. This test takes as input two strings and checks whether the first string starts with the second one. Prefix is efficient in matching cognate strings and similar acronyms (e.g., int and integer); see, for example, [32, 35, 42, 46]. This test can be transformed into a smoother distance by measuring the relative size of the prefix and the ratio.
– Suffix. This test takes as input two strings and checks whether the first string ends with the second one (e.g., phone and telephone); see, for example, [42, 26, 47, 53].
– Edit distance. This distance takes as input two strings and computes the edit distance between them, that is, the number of insertions, deletions, and substitutions of characters required to transform one string into the other, normalized by max(length(string1), length(string2)). For example, the edit distance between NKN and Nikon is 0.4. Some of the matching systems exploiting this test are [42, 26, 46].
– N-gram. This test takes as input two strings and calculates the number of common n-grams (i.e., sequences of n characters) between them. For example, the trigrams (n = 3) of the string nikon are nik, iko, and kon, so the distance between nkon and nikon would be 1/3. Some of the matching systems exploiting this test are [42, 26].
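As announced above, a minimal Python sketch of the four tests; the normalizations simply mirror the examples given in the text and are not taken from any particular system.

def prefix(s1, s2):
    """True if s1 starts with s2 (e.g. 'integer' vs 'int')."""
    return s1.lower().startswith(s2.lower())

def suffix(s1, s2):
    """True if s1 ends with s2 (e.g. 'telephone' vs 'phone')."""
    return s1.lower().endswith(s2.lower())

def edit_distance(s1, s2):
    """Levenshtein distance normalized by the length of the longer string."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1].lower() == s2[j - 1].lower() else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n] / max(m, n)

def ngram_overlap(s1, s2, n=3):
    """Fraction of s2's n-grams that also occur in s1."""
    grams1 = {s1[i:i + n] for i in range(len(s1) - n + 1)}
    grams2 = [s2[i:i + n] for i in range(len(s2) - n + 1)]
    return sum(1 for g in grams2 if g in grams1) / len(grams2)

print(edit_distance("NKN", "Nikon"))    # 0.4, as in the example above
print(ngram_overlap("nkon", "nikon"))   # 1/3: only 'kon' is shared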
Language-based techniques consider names as words in some natural language (e.g., English). They are based on Natural Language Processing (NLP) techniques exploiting morphological properties of the input words.
– Tokenization. Names of entities are parsed into sequences of tokens by a tokenizer which recognizes punctuation, cases, blank characters, digits, etc. (e.g., Hands-Free Kits → ⟨hands, free, kits⟩); see, for example, [41].
– Lemmatization. The strings underlying tokens are morphologically analyzed in order to find all their possible basic forms (e.g., Kits → Kit); see, for example, [41].
– Elimination. Tokens that are articles, prepositions, conjunctions, and so on, are marked (by some matching algorithms, e.g., [47]) to be discarded.
Usually, the above mentioned techniques are applied to the names of entities before running string-based or lexicon-based techniques in order to improve their results. However, we consider these language-based techniques as a separate class of matching techniques, since they can naturally be extended into a distance computation (by comparing the resulting strings or sets of strings).
Constraint-based techniques are algorithms which deal with the internal constraints applied to the definitions of entities, such as types, cardinality of attributes, and keys. We omit here a discussion of matching keys, as these techniques appear in our classification unchanged with respect to the original publication [62]. However, we provide a different perspective on matching datatypes and cardinalities; a sketch of both comparisons is given after this list.
– Datatype comparison involves comparing the various attributes of a class with regard to the datatypes of their values. Contrary to objects, which require interpretation, datatypes can be compared objectively and it is possible to determine how close one datatype is to another (ideally this can be based on the interpretation of datatypes as sets of values and the set-theoretic comparison of these sets, see [67, 68]). For instance, the datatype day can be considered closer to the datatype working day than to the datatype integer. This technique is used in [36].
– Multiplicity comparison. Attribute values can be collected by a particular construction (set, list, multiset) on which cardinality constraints are applied. Again, it is possible to compare the datatypes so constructed by comparing (i) the datatypes on which they are built and (ii) the cardinalities that are applied to them. For instance, a set of between 2 and 3 children is closer to a set of 3 people than to a set of 10-12 flowers (if children are people). This technique is used in [36].
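A possible sketch of both comparisons, treating datatypes as (partially overlapping) sets of values and cardinality ranges as intervals; the closeness table and the aggregation are invented purely for illustration.

# Illustrative closeness between datatypes, seen as (partially) overlapping value sets.
DATATYPE_CLOSENESS = {
    frozenset(["workingDay", "day"]): 0.8,
    frozenset(["day", "integer"]): 0.3,
    frozenset(["workingDay", "integer"]): 0.2,
}

def datatype_similarity(t1, t2):
    """1.0 for identical datatypes, otherwise a tabulated closeness (default 0.0)."""
    if t1 == t2:
        return 1.0
    return DATATYPE_CLOSENESS.get(frozenset([t1, t2]), 0.0)

def cardinality_similarity(r1, r2):
    """Overlap of two cardinality intervals (min, max), e.g. (2, 3) vs (3, 3)."""
    lo, hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    if hi < lo:
        return 0.0
    return (hi - lo + 1) / (max(r1[1], r2[1]) - min(r1[0], r2[0]) + 1)

# A set of 2-3 children vs a set of 3 people: close datatypes, overlapping cardinalities.
print(datatype_similarity("workingDay", "day"), cardinality_similarity((2, 3), (3, 3)))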
Linguistic resources, such as common knowledge or domain-specific thesauri, are used in order to match words (in this case the names of schema/ontology entities are considered as words of a natural language) based on the linguistic relations between them (e.g., synonyms, hyponyms).
– Common knowledge thesauri. The approach is to use common knowledge thesauri to obtain the meaning of the terms used in schemas/ontologies. For example, WordNet [49] is an electronic lexical database for English (and other languages), where the various senses (possible meanings of a word or expression) of words are grouped into sets of synonyms. Relations between schema/ontology entities can be computed in terms of bindings between WordNet senses, see, for instance, [13, 41]. Another type of matcher exploiting thesauri is based on their structural properties, e.g., the WordNet hierarchies. In particular, hierarchy-based matchers measure the distance between two concepts in a given hierarchy, for example by counting the number of arcs traversed, see [42] (a small sketch of such a measure follows this list). Several other distance measures for thesauri have been proposed in the literature, e.g., [61, 64].
– Domain-specific thesauri. These thesauri usually store specific domain knowledge which is not available in common knowledge thesauri (e.g., proper names), as entries with synonym, hypernym and other relations, see for instance [47].
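As an example of the hierarchy-based measures mentioned above, the sketch below counts arcs between two concepts in a toy hierarchy; in a real matcher the hierarchy would come from WordNet or a domain thesaurus, and the toy entries are purely illustrative.

TOY_HIERARCHY = {            # child -> parent, a tiny stand-in for a thesaurus
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
}

def ancestors(concept):
    """Path from a concept up to the root, the concept included."""
    path = [concept]
    while concept in TOY_HIERARCHY:
        concept = TOY_HIERARCHY[concept]
        path.append(concept)
    return path

def arc_distance(c1, c2):
    """Number of arcs traversed between c1 and c2 through their lowest common ancestor."""
    p1, p2 = ancestors(c1), ancestors(c2)
    common = next(c for c in p1 if c in p2)
    return p1.index(common) + p2.index(common)

print(arc_distance("car", "bicycle"))   # 2: car -> vehicle <- bicycle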
Alignment reuse techniques represent an alternative way of exploiting external resources, which in this case contain alignments of previously matched schemas/ontologies. For instance, when we need to match schemas/ontologies o1 and o3, we can reuse the alignments between o1 and o2 and between o2 and o3 available in an external resource storing the results of previous match operations. Alignment reuse is motivated by the intuition that many schemas/ontologies to be matched are similar to already matched schemas/ontologies, especially if they describe the same application domain. These techniques are particularly promising when dealing with large schemas/ontologies consisting of hundreds or thousands of entities. In these cases, large match problems are first decomposed into smaller sub-problems, thus generating a set of schema/ontology fragment matching problems. Then, reusing previous match results can be applied more effectively at the level of schema/ontology fragments than at the level of entire schemas/ontologies. The approach was first introduced in [62], and later implemented as two matchers which reuse alignments of (i) entire schemas/ontologies or (ii) their fragments; see [1, 26, 63] for details.
Upper level formal ontologies can also be used as external sources of common knowledge. Examples are the Suggested Upper Merged Ontology (SUMO) [57] and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [38]. The key characteristic of these ontologies is that they are logic-based systems, and therefore matching techniques exploiting them will be based on the analysis of interpretations. Thus, these are semantic techniques. For the moment, we are not aware of any matching system which uses this kind of technique. However, it is quite reasonable to assume that this will happen in the near future. In fact, the DOLCE ontology, for example, aims at providing a formal specification (an axiomatic theory) for the top-level part of WordNet. Therefore, systems which currently exploit WordNet in their matching process will most probably also consider DOLCE as a potential extension.
Graph-based techniques are graph algorithms which consider the input as labeled graphs. The applications (e.g., database schemas, taxonomies, or ontologies) are viewed as graph-like structures containing terms and their inter-relationships. Usually, the similarity comparison between a pair of nodes from the two schemas/ontologies is based on the analysis of their positions within the graphs. The intuition is that, if two nodes from two schemas/ontologies are similar, their neighbors might also be somehow similar. Below we present some particular matchers representing this intuition; a small sketch of the children and leaves matchers follows the list.
– Graph matching. There has been a lot of work on graph (tree) matching in graph theory and also with respect to schema/ontology matching applications, see [65, 73]. Matching graphs is a combinatorial problem that can be computationally expensive, and it is usually solved by approximate methods. In schema/ontology matching, the problem is encoded as an optimization problem (finding the graph matching minimizing some distance, such as the dissimilarity between matched objects), which is then resolved with the help of a graph matching algorithm. This optimization problem is typically solved through a fixed-point algorithm (gradually improving an approximate solution until no improvement is made). Examples of such algorithms are [53] and [36]. Some other (particular) matchers handling DAGs and trees are children, leaves, and relations.
– Children. The (structural) similarity between inner nodes of the graphs is computed based on the similarity of their children nodes; that is, two non-leaf schema elements are structurally similar if their immediate children sets are highly similar. A more complex version of this matcher is implemented in [26].
– Leaves. The (structural) similarity between inner nodes of the graphs is computed based on the similarity of their leaf nodes; that is, two non-leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not, see, for example, [26, 47].
– Relations. The similarity computation between nodes can also be based on their relations, see [48].
Taxonomy-based techniques are also graph algorithms, but they consider only the specialization relation. The intuition behind taxonomic techniques is that is-a links connect terms which are already similar (being a subset or superset of each other), and therefore their neighbors may also be somehow similar. This intuition can be exploited in several different ways:
– Bounded path matching. Bounded path matchers take two paths with links between classes defined by the hierarchical relations, compare terms and their positions along these paths, and identify similar terms, see, for instance, [58].
– Super(sub)-concept rules. These matchers are based on rules capturing the above stated intuition. For example, if the super-concepts are the same, the actual concepts are similar to each other; if the sub-concepts are the same, the compared concepts are also similar, see, for example, [23, 32].
A repository of structures stores schemas/ontologies and their fragments, together with pairwise similarities (e.g., coefficients in the [0,1] range) between them. Notice that, unlike alignment reuse, a repository of structures stores only similarities between schemas/ontologies, not alignments. In the following, to simplify the presentation, we refer to schemas/ontologies or their fragments as structures. When new structures are to be matched, they are first checked for similarity against the structures which are already available in the repository.
The goal is to identify structures which are sufficiently similar to be worth matching in more detail, or for which already existing alignments can be reused, thus avoiding the match operation over dissimilar structures. Obviously, determining the similarity between structures should be computationally cheaper than matching them in full detail.
In order to match two structures, the approach of [63] proposes to use some metadata describing these structures, such as fragment name, root name, number of nodes, maximal path length, etc. These indicators are then analyzed and aggregated into a single coefficient which estimates the similarity between the structures. For example, schema S1 might be found to be an appropriate match for schema S2 because they both have the same number of nodes.
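A sketch of the metadata-based pre-selection step just described; the indicators and their (equal-weight) aggregation are illustrative rather than those of [63].

def structure_metadata(name, root, n_nodes, max_path):
    """Cheap descriptors of a schema/ontology fragment stored in the repository."""
    return {"name": name, "root": root, "nodes": n_nodes, "max_path": max_path}

def metadata_similarity(m1, m2):
    """Estimate how similar two structures are before attempting a full match."""
    name_sim = 1.0 if m1["root"].lower() == m2["root"].lower() else 0.0
    size_sim = min(m1["nodes"], m2["nodes"]) / max(m1["nodes"], m2["nodes"])
    depth_sim = min(m1["max_path"], m2["max_path"]) / max(m1["max_path"], m2["max_path"])
    return (name_sim + size_sim + depth_sim) / 3

s1 = structure_metadata("PO1", "PurchaseOrder", 40, 5)
s2 = structure_metadata("PO2", "PurchaseOrder", 40, 6)
print(metadata_similarity(s1, s2) > 0.7)   # True: worth matching in full detail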
Model-based algorithms start from the consideration that each element of a real-world ontology represents a meaning and that, consequently, mappings between such elements have to represent relations between meanings. The allowed relations between ontology elements are those that can express relations between concepts, namely ≡, ⊑, ⊒, ⊓ and ⊥, where ⊥ means 'disjoint'. Following this intuition, these approaches (i) try to build a formal representation of the meaning to be associated to each ontology node, and (ii) compute the relations between such meanings by comparing them. Two examples of such methods have been proposed, which differ in the logic used for encoding the meaning: propositional logic (PL) and description logic (DL). A toy sketch of the propositional check is given after the list below.
– Propositional logic-based (PL). In [13, 40, 41, 76, 77], the authors propose to encode the meaning of each node as a propositional formula. As an example, a node labeled SIVIGLIA, lying in the two different paths IMAGES/EUROPE/SIVIGLIA and PHOTOS/SPAIN/SIVIGLIA, is translated into the two formulas φ = (IMAGES ∧ EUROPE ∧ SIVIGLIA) and ψ = (PHOTOS ∧ SPAIN ∧ SIVIGLIA). Mappings are then computed by comparing such meanings using a standard SAT solver. For example, in order to determine whether the two SIVIGLIA nodes are equivalent, the following satisfiability problem is built: Axioms |= φ ≡ ψ, where Axioms is a possibly empty set of axioms used for improving the results of the comparison. Such a set can contain, for example, the following axioms: PHOTOS ≡ IMAGES, SIVIGLIA ⊑ EUROPE, SIVIGLIA ⊑ SPAIN. In [13, 40, 41], WordNet is used as a source of external knowledge. Notice that SAT deciders are correct and complete decision procedures for propositional satisfiability and can therefore be used for an exhaustive check of all the possible mappings.
– DL-based techniques. Propositional languages have limited expressivity for encoding element meanings, as they cannot express, for example, properties or roles between concepts. For instance, they cannot distinguish the different meanings expressed by the two paths ART/EUROPE/WEST/SPAIN/SOUTH and ART/EUROPE/SOUTH/SPAIN/WEST, which intuitively represent the arts of southern and of western Spain respectively. Using the PL-based approaches described in the previous paragraph, an undesired equivalence relation is obtained. In order to overcome these limitations, a new system for encoding such meanings using Description Logics has been proposed in [74], as an extension of [13]. In this case, the two paths are encoded as ART ⊓ ∃producedIn.((SPAIN ⊓ ∃portion.SOUTH) ⊓ ∃partOf.(EUROPE ⊓ ∃portion.WEST)) and ART ⊓ ∃producedIn.((SPAIN ⊓ ∃portion.WEST) ⊓ ∃partOf.(EUROPE ⊓ ∃portion.SOUTH)) respectively. Such DL terms respect the difference between the node meanings, so that no wrong entailment can be derived and, consequently, no mapping is computed. Furthermore, [74] offers the possibility to add any consistent OWL ontology as further external knowledge.
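As announced above, here is a toy sketch of the propositional encoding and check; a brute-force enumeration of truth assignments stands in for a real SAT solver, and the axioms are the ones mentioned in the SIVIGLIA example.

from itertools import product

VARS = ["IMAGES", "PHOTOS", "EUROPE", "SPAIN", "SIVIGLIA"]

def phi(v):  # IMAGES/EUROPE/SIVIGLIA
    return v["IMAGES"] and v["EUROPE"] and v["SIVIGLIA"]

def psi(v):  # PHOTOS/SPAIN/SIVIGLIA
    return v["PHOTOS"] and v["SPAIN"] and v["SIVIGLIA"]

def axioms(v):
    """PHOTOS equivalent to IMAGES; SIVIGLIA implies EUROPE and SPAIN."""
    return (v["PHOTOS"] == v["IMAGES"]) and \
           (not v["SIVIGLIA"] or v["EUROPE"]) and \
           (not v["SIVIGLIA"] or v["SPAIN"])

def entails_equivalence():
    """Axioms |= phi <-> psi iff Axioms AND NOT(phi <-> psi) is unsatisfiable."""
    for values in product([True, False], repeat=len(VARS)):
        v = dict(zip(VARS, values))
        if axioms(v) and (phi(v) != psi(v)):
            return False          # counter-model found
    return True

print(entails_equivalence())      # True: the two SIVIGLIA nodes are equivalent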
There are other relevant examples in the literature of DL-based techniques used in schema/ontology integration applications. For example, in a spatio-temporal database integration scenario, as first motivated in [60] and later developed in [66], the inter-schema mappings are initially proposed by the integrated schema designer and are encoded, together with the input schemas, in the ALCRP language. DL reasoning services are then used to check the satisfiability of the two source schemas and of the set of inter-schema mappings. If some objects are found to be unsatisfiable, the inter-schema mappings should be reconsidered.
Another example is the use of DL-based techniques in a query processing scenario [54]. The approach assumes that mappings between pre-existing domain ontologies are already specified in a declarative manner (e.g., manually). User queries are rewritten in terms of the pre-existing ontologies, expressed in Classic [12], and further evaluated against real-world repositories which are also subscribed to the pre-existing ontologies. An earlier approach to query answering by terminological reasoning is described in [7].
Finally, a problem very similar to schema/ontology matching is addressed by a system developed for matchmaking in electronic marketplaces [22]. Demand (D) and supply (S) requests are translated from natural language sentences into Classic [9]. The approach assumes the existence of a pre-defined domain ontology T, which is also encoded in Classic. Matchmaking between a supply S and a demand D is performed with respect to the pre-defined domain ontology T. Reasoning is performed with the help of the NeoClassic reasoner in order to determine an exact match (T |= D ⊑ S and T |= S ⊑ D), a potential match (if D ⊓ S is satisfiable in T), and a near miss (if D ⊓ S is unsatisfiable in T). The system also provides a logic-based ranking of the matching results.
The main difference between these approaches and [74] is that they seem to presuppose the existence of inter-schema/inter-ontology mappings, and do not support
their discovery. DL reasoners are then used only to enable integration using reasoning
on these mappings.
5. Systems descriptions
The various methods presented above in isolation have been combined in order to implement ontology alignment or schema matching systems. There are a number of available systems that can be seen as addressing ontology alignment. We present some of them below through their principles and availability. Some of the following systems are developed by the project partners and will thus be usable for benchmarking in the future. There have been some comparisons of these systems, in particular in [25, 62, 44, 60]. Our purpose here is not really to compare them, but rather to show their variety.
5.1 Prompt and Anchor-Prompt (Stanford SMI)
Anchor-PROMPT [58] (an extension of PROMPT, also formerly known as SMART) is an ontology merging and alignment tool with a sophisticated prompt mechanism for possible matching terms. The Anchor-PROMPT alignment algorithm takes as input two ontologies and a set of anchors: pairs of related terms which are identified with the help of string-based techniques, defined by a user, or computed by another matcher assessing the linguistic (dis)similarity between frame names (labels at nodes), for example [51]. It then refines them based on the ontology structures and user feedback.
It constructs a directed labeled graph representing the ontology from the hierarchy of concepts (called classes in the algorithm) and the hierarchy of relations (called slots in the algorithm), where the nodes in the graph are concepts and the arcs are relations denoting relationships between concepts (the labels are the names of the relations). An initial list of anchors (pairs of related concepts defined by the users or automatically identified by lexical matching) is the input to the algorithm. Anchor-PROMPT then analyzes the paths in the sub-graph delimited by the anchors and determines which concepts frequently appear in similar positions on similar paths. Based on these frequencies, the algorithm decides whether these concepts are semantically similar.
The PROMPT and Anchor-PROMPT systems have also contributed to the design of other algorithms, such as PROMPTDiff, which finds differences between two ontologies and provides the editing operations for transforming one ontology into another.
5.2 Chimaera (Stanford KSL)
Chimaera is an environment for merging and testing (diagnosing) large ontologies [51]. Matching in the system is performed as one of the major subtasks of the merge operator. Chimaera searches for merging candidates as pairs of matching terms, considering term names, term definitions, possible acronym and expanded forms, and names that appear as suffixes of other names. It also has techniques to identify terms that should be related by subsumption, disjointness, etc.
5.3 Rondo (Stanford U./U. Leipzig)
Rondo [52] is an environment for model (e.g., database schema) engineering which provides many unit primitives for manipulating models (extract, restrict, delete) and ways to compose them. Among the unit primitives is an implementation of Similarity Flooding (a fixed-point graph matching algorithm, see Section 4.4). It converts schemas (SQL DDL, XML) into directed labeled graphs whose nodes are candidate aligned pairs and whose arcs are shared properties. Arcs are weighted by their relevance to the nodes.
5.4 MoA (ETRI)
MoA is an environment for merging ontologies developed by the Electronics and Telecommunications Research Institute (ETRI) in South Korea. It is a library of methods and a shell for using them. It can work on OWL (though it is not specified which flavor) and contains methods for importing, aligning, modifying and merging ontologies. Unfortunately, the methods are not documented beyond the fact that they are based on (dis)similarity. The system uses Jena and WordNet.
5.5 Cupid (Microsoft research)
The Cupid system [47] implements a generic schema matching algorithm combining linguistic and structural schema matching techniques, and computes normalized similarity coefficients with the assistance of a precompiled thesaurus. Input schemas are encoded as graphs. Nodes represent schema elements and are traversed in a combined bottom-up and top-down manner. The matching algorithm consists of three phases and operates only on tree structures, to which non-tree cases are reduced. The first phase (linguistic matching) computes linguistic similarity coefficients between schema element names (labels) based on morphological normalization, categorization, string-based techniques and a thesaurus look-up. The second phase (structural matching) computes structural similarity coefficients which measure the similarity between the contexts in which individual schema elements occur in the schemas under consideration. The main idea behind the structural matching algorithm is to rely more on leaf-level matches than on matches of the immediate descendants or intermediate substructures when computing the similarity between non-leaf elements. The third phase (mapping generation) computes weighted similarity coefficients and generates the final mappings by choosing pairs of schema elements with weighted similarity coefficients higher than a threshold. In comparison with other hybrid matchers, e.g., Dike [59] and Artemis (see Section 5.9), according to [47] Cupid performs better in terms of mapping quality.
5.6 Glue (U. of Washington)
Glue [28] is an evolved version of LSD [29] whose goal is to semi-automatically find schema mappings for data integration. Like its ancestor LSD, Glue uses machine learning techniques to find mappings [30]. It first applies statistical analysis to the available data (computation of joint probability distributions). It then generates a similarity matrix for the data considered, based on the probability distributions, and uses "constraint relaxation" in order to obtain an alignment from the similarities. The algorithm works in three steps:
– distribution learning: the first phase, described above, learns the joint probability distributions of the classes of each ontology;
– similarity estimation: the system estimates the similarity between two classes as a function of their joint probability distributions;
– relaxation: an alignment is produced from the similarity matrix by using heuristic rules to choose the most likely correspondences.
5.7 FCA-merge (U. Karlsruhe)
FCA-merge [68] uses formal concept analysis techniques to merge two ontologies
sharing the same set of instances. The overall process of merging two ontologies consists of three steps:
1. instance extraction,
2. concept lattice computation, and
3. interactive generation of the final merged ontology.
The algorithm theoretically merges two ontologies sharing the same set of instances. However, the authors provide, as a first step, methods for extracting the instances from documents. The extraction of instances from text documents circumvents the problem that in most applications there are no individuals which are simultaneously instances of the source ontologies and which could be used as a basis for identifying similar concepts.
The computation of the lattice starts with the two ontologies and the instances belonging to both of them. From these, it computes two formal contexts, i.e., boolean tables indicating which instance belongs to which concept of the ontology. It then merges both contexts (by renaming the concepts and adding both contexts). Using classical formal concept analysis (i.e., the closure of an instances×properties Galois connection [39]) on contexts made of instances×concepts, the method generates a pruned concept lattice. The lattice is pruned of all the concepts which are not more general than a concept of one of the ontologies. The last step consists in helping a user to further simplify the lattice and generate the taxonomy of an ontology. The produced result is explored and transformed into a merged ontology by the ontology engineer. The final step of deriving the merged ontology from the concept lattice requires human interaction. The result is a merge rather than an alignment. However, the concepts that are merged can be considered as exactly aligned, and those which are not can be considered to be in a subsumption relation with their ancestors or siblings.
5.8 IF-Map
Another system inspired by formal concept analysis is IF-Map [44]. It is an automatic method for ontology mapping based on the Barwise-Seligman theory of information flow [3]. The basic principle of IF-Map is to align two local ontologies by looking at how these are mapped from a common reference ontology. It is assumed that such a reference ontology is not populated with instances, while local ontologies usually are. IF-Map generates possible mappings between an unpopulated reference ontology and a populated local ontology by taking into account how local communities classify instances with respect to their local ontologies.
5.9 Artemis (U. Milano/U.Modena and Reggio Emilia)
Artemis (Analysis of Requirements: Tool Environment for Multiple Information Systems) [16] was designed as a module of the MOMIS mediator system [6, 7] for creating global views. Artemis does not cover all the issues of matching because of its original function of schema integration. The matching algorithm performs affinity-based analysis and hierarchical clustering of the source schema elements. The affinity-based analysis is carried out through the computation of name, structural and global affinity coefficients by exploiting a common thesaurus.
The common thesaurus provides a set of terminological and extensional relationships expressing intra- and inter-schema knowledge about the classes and attributes of the input schemas; it is built with the help of WordNet [55] and ODB-Tools [5].
A hierarchical clustering technique exploiting the global affinity coefficients categorizes classes into groups at different levels of affinity. For each cluster it creates a set of global attributes and a global class. The logical correspondence between the attributes of a global class and the source attributes is determined through a mapping table.
5.10 T-tree (INRIA Rhône-Alpes)
Troeps [50] was a knowledge representation system enabling several class taxonomies (called viewpoints) over the same set of objects, and bridges between these classes expressing equivalence or subsumption. T-tree [33] is an environment for generating taxonomies and classes from objects (instances). It can, in particular, infer dependencies between classes (bridges) of different ontologies sharing the same set of instances, based only on the "extension" of the classes.
An algorithm has been developed which is able to infer bridges. Given a set of source viewpoints and a destination viewpoint (built by T-tree or by any other means), the bridge inference algorithm returns all the bridges (in a minimal fashion) which are satisfied by the available data, that is, the set of bridges for which the objects in every source class are indeed in the destination class. The algorithm compares the extension (set of instances) of the presumed destination to the intersection of the extensions of the presumed source classes. If the latter is not included in the former, the algorithm is re-iterated on all the sets of source classes which contain at least one class which is a sub-class of the tested source classes. If the intersection of the extensions of the presumed source classes is included in that of the presumed destination class, then a bridge can be established from the former (and also from any set of sub-classes of the source classes) to the latter (and also to any super-class of the destination class). But other bridges can exist on the sub-classes of the destination, so the algorithm is re-iterated on them. It stops when the bridge is trivial, i.e., when the source is empty. The algorithm is extension-correct (only valid bridges are inferred), extension-complete (all valid bridges are inferred) and extension-minimal (only the more general bridges are inferred). The proof is carried out in the classification scheme framework, and the "extension-" prefix just indicates that only the extension of the classes is considered (the algorithm tests set inclusion on classes). Thus these results are not semantically grounded. For instance, is it a coincidence that all directors have formerly been at the same university? Maybe, maybe not. Hence the user has to decide on the validation of inferred bridges. This has to be contrasted with a stronger kind of bridge inference based on the structural constraints on classes. Indeed, any possible bridge compliant with the current set of objects and the semantics must be a restriction of one of the bridges provided by the algorithm.
Bridge inference is nothing other than the search for correlations between two sets of variables. This correlation is peculiar from a data analysis point of view, since it does not need to be valid on the whole set of individuals (the algorithm looks for subsets under which the correlation is valid) and it is based on strict set equality (not similarity). However, even though the bridge inference algorithm has been described with set inclusion, it can be helped by other measures which narrow or broaden the search. More generally, the inclusion and emptiness tests can be replaced by tests based on the similarity of two sets of objects (as is usual in data analysis). In fact, many parameters can be taken into account when inferring bridges; for that purpose, the algorithm is parameterized by the meaning of the operators ⊆, ∩ and the =∅ test. A second version of the algorithm (with the same properties) was made available which uses structural comparison: ⊆ is subtyping, ∩ is type intersection and the =∅ test is a subtyping test.
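A compact sketch of the extension-based test at the heart of bridge inference; the viewpoints and instance sets are invented for illustration.

def bridge_holds(source_extensions, destination_extension):
    """A bridge from a set of source classes to a destination class is satisfied by
    the data iff the intersection of the source extensions is (non-trivially)
    included in the destination extension."""
    inter = set.intersection(*source_extensions)
    return bool(inter) and inter <= destination_extension

# Two source classes from different viewpoints and a candidate destination class.
professors  = {"anna", "marco", "li"}
phd_holders = {"anna", "li", "sara"}
researchers = {"anna", "li", "sara", "marco"}

print(bridge_holds([professors, phd_holders], researchers))   # True: bridge inferred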
5.11 CtxMatch (U. Trento / ITC-IRST)
CtxMatch is a model-based schema/ontology matcher which computes a mapping (namely a set of point-to-point relations) between schema/ontology elements. CtxMatch (i) encodes the meaning of each node in a formal language, and (ii) computes a mapping between elements by comparing their meanings. The accepted relations between schema/ontology elements are ≡, ⊑, ⊒, ⊓ and ⊥ (where ⊥ means 'disjoint'). In the first version of CtxMatch [13], the algorithm used propositional logic for approximating the element meanings. In order to overcome the limitations of propositional expressivity, described in Section 4.4 in connection with model-based techniques, a new encoding is proposed in [74], where the meaning of the nodes is encoded into Description Logics terms.
Figure 3 shows the architecture of the CtxMatch system. It has two main modules:
Context Creator and Context Matcher.
The first takes as input a schema with attributes and returns a context. Currently accepted formats are XML, XML Schema, OWL, and plain text. The main steps of the transformation into a context are:
Syntactic normalization: the different input formats are transformed into a unique (internal) format (a syntactically normalized schema).
Semantic normalization: essentially, a syntactically normalized schema is transformed into a rich schema (called a context) where each node is associated with a Description Logics term representing its contextual meaning (its meaning in the context of that schema). For example, a (path in a) schema S1 composed of the nodes VACATION[location='Italy']/HONEYMOON/INSURANCES is associated with the three DL terms VACATION ⊓ ∃location.ITALY, HONEYMOON ⊓ ∃location.ITALY, and INSURANCES ⊓ ∃against.(HONEYMOON ⊓ ∃location.ITALY), representing the meanings of the three nodes VACATION, HONEYMOON and INSURANCES respectively.
Fig. 3. The CtxMatch architecture.
Semantic normalization invokes the following services:
– Lexical Knowledge Extractor: this service takes as input a lexicon repository and a word, and returns the concepts possibly denoted by the word with respect to the lexicon repository. Currently, WordNet is used as the only source of lexical knowledge.
– Background Knowledge Extractor: it takes as input an ontology repository and a pair of concepts, and returns the relation that holds between the concepts with respect to the ontology repository. Currently, the ontological part of WordNet and some OWL ontologies are used as repositories; in any case, any consistent OWL ontology can be added as a further resource.
– Label Parser: this service takes as input a label and returns a DL term expressing its meaning. The label parser preprocesses the label elements (words) using a standard lemmatizer and a POS tagger. It then creates a label-specific parse tree expressing the logical dependencies between the label elements.
– Path Composer: it takes as input the set of labels belonging to the different nodes along a path and combines them in order to express the meanings of the nodes.
The Context Matcher takes two contexts (semantically normalized schemas) and returns a mapping between them (i.e., a collection of pointwise relations between elements of the two schemas). In the current version of the system, the RACER DL reasoner [75] is used to compute the mappings. The Background Knowledge Extractor is further exploited in order to possibly refine the meaning comparison. As an example, imagine a schema S2 containing the three nodes POLICIES/HONEYMOON/ITALY. The Context Creator returns the DL terms POLICIES, POLICIES ⊓ ∃against.HONEYMOON, and POLICIES ⊓ ∃against.(HONEYMOON ⊓ ∃location.ITALY). The Context Matcher then returns an equivalence relation between the node INSURANCES of the previous schema and the node ITALY of S2, as the DL terms INSURANCES ⊓ ∃against.(HONEYMOON ⊓ ∃location.ITALY) and POLICIES ⊓ ∃against.(HONEYMOON ⊓ ∃location.ITALY) are provably equivalent (modulo lexical knowledge).
5.12 S-MATCH (U. Trento)
S-Match is a schema/ontology matching system that implements the semantic
matching approach [41, 13]. It takes two graph-like structures (e.g., database schemas
or ontologies) as input and returns as output the semantic relations holding between those nodes of the graphs that correspond semantically to each other. Possible semantic relations are: equivalence (=), more general (⊒), less general (⊑), mismatch (⊥) and overlapping (⊓).
The current version of S-Match is a rationalized re-implementation of the CtxMatch system [13] with a few added functionalities. S-Match is schema based and, as such, does not exploit the information encoded in data instances. It is a hybrid system performing a composition of element level techniques. At present, S-Match handles only tree-like structures (e.g., taxonomies or concept hierarchies).
S-Match was designed and developed as a platform for semantic matching, namely a highly modular system whose core computes semantic relations and into which single components can be plugged, unplugged or suitably customized. The logical architecture of the system is depicted in Figure 4.
Fig. 4. The logical architecture.
The input schemas (trees) are codified in a standard internal XML format. This internal format can be loaded from a manually edited file or produced by an input-format-dependent translator. The module taking the input schemas/ontologies in this format performs the preprocessing. In particular, it computes in a top-down manner, for every label in a tree, the meaning captured by that label in the schema or ontology, using the techniques described in [49]. The preprocessing module has access to the set of
the techniques described in [49]. The preprocessing module has access to the set of
oracles which provide the necessary a priori lexical and domain knowledge. In the
current version WordNet [55] is the only oracle. The output of the module is an enriched tree. These enriched trees are stored in an internal database (PTrees) where
they can be browsed, edited and manipulated.
The Matching Manager coordinates the matching process using three extensible libraries. The first library consists of what are called in [41] weak semantics element level matchers. They perform string manipulations (e.g., prefix and n-gram analysis, edit distance, soundex, data types, and so on) and try to guess the semantic relation implicitly encoded in similar words; the current version of S-Match contains 13 of them. The second library consists of strong semantics element level matchers, namely oracles; currently, WordNet is the only oracle. The third library consists of structure level strong semantics matchers, namely SAT solvers (among others, the SAT deciders currently being tested are JSAT [11] and Open4J by Daniel Le Berre).
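As an illustration of the first library, the following is a minimal sketch of two weak semantics element level matchers; it is not the actual S-Match matcher code. Each matcher guesses a semantic relation from string evidence alone; the relations returned and the 0.85 threshold are assumptions made for the example.

```python
import difflib

def prefix_matcher(a, b):
    """Guess a relation when one label is a prefix of the other."""
    a, b = a.lower(), b.lower()
    if a == b:
        return "="
    if b.startswith(a):
        return "⊒"   # assumed reading: the shorter label is taken as more general
    if a.startswith(b):
        return "⊑"
    return None

def edit_distance_matcher(a, b, threshold=0.85):
    """Guess equivalence when the normalized string similarity is high."""
    ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return "=" if ratio >= threshold else None

print(prefix_matcher("phone", "phones"))         # ⊒
print(edit_distance_matcher("colour", "color"))  # =
```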
5.13 COMA (U. Leipzig)
The COMA system [26] is a generic schema matching tool which implements composite generic matchers. COMA provides an extensible library of matching algorithms, a framework for combining the obtained results, and a platform for evaluating the effectiveness of the different matchers. The matching library is extensible; as reported in [26], it contains 6 individual matchers, 5 hybrid matchers, and one "reuse-oriented" matcher. Most of them implement string-based techniques as a background idea; others share techniques with Cupid (see § 5.5), while the reuse-oriented matcher is a completely novel one which tries to reuse previously obtained results for entire new schemas or for their fragments. Schemas are internally encoded as rooted directed acyclic graphs, where elements are the paths; this aims at capturing the contexts in which the elements occur. One of the distinctive features of the COMA tool is the capability to perform iterations in the matching process. It presumes interaction with a user who approves obtained matches and mismatches in order to gradually refine and improve the accuracy of the match.
Based on the comparative evaluations conducted in [25], COMA dominates the Autoplex and Automatch [8, 9], LSD [29], Glue [27], SF [53] and SemInt [46] matching tools.
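The composite nature of COMA can be illustrated with a minimal sketch (not COMA code): several matchers each score element pairs, the scores are aggregated (here by a plain average), and a threshold selects the candidate correspondences. The matcher names, the aggregation strategy and the 0.7 threshold are illustrative assumptions.

```python
from statistics import mean

def name_sim(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0

def prefix_sim(a, b):
    a, b = a.lower(), b.lower()
    return 1.0 if a.startswith(b) or b.startswith(a) else 0.0

MATCHERS = [name_sim, prefix_sim]   # stand-ins for the individual/hybrid matchers

def composite_match(elems1, elems2, threshold=0.7):
    """Aggregate the similarity matrices of all matchers and keep pairs above threshold."""
    result = []
    for e1 in elems1:
        for e2 in elems2:
            score = mean(m(e1, e2) for m in MATCHERS)
            if score >= threshold:
                result.append((e1, e2, score))
    return result

print(composite_match(["Customer", "CustName"], ["Customer", "CustomerName"]))
```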
5.14 Buster (U. Bremen)
The Bremen University Semantic Translator for Enhanced Retrieval (BUSTER) [71]
is an information broker middleware that was built to access heterogeneous and distributed information sources and to assess their conceptual, spatial, and temporal relevance with respect to a specific information request. BUSTER can also be used to integrate heterogeneous information through the resolution of structural, syntactical,
and semantic heterogeneities. To be more precise, the BUSTER system provides two
subsystems, one for information filtering and one for information integration.
The BUSTER search module supports the specification of queries of the type concept @ location in time [72]. In addition to the conceptual semantics, the system evaluates the spatial as well as the temporal relevance of an information source. In order
to be able to reason about conceptual, spatial, and temporal relevance, BUSTER utilizes metadata that provide formal descriptions of the respective context of an information source. In principle, the main difference with respect to other systems for query processing and information integration lies in the fact that the user commits to a basic vocabulary that is used to define the concepts in all the source ontologies. The basic vocabulary ensures that different source ontologies are comparable to each other. By formulating the query in terms of this basic vocabulary, the query can be interpreted with respect to all source ontologies in the system. In particular, each concept is based on the shared vocabulary and can be constructed with the help of construction operators such as ⊓ and ⊔, well known from description logics. Because concepts, even from different source ontologies, can be flattened to terms which consist only of elements of the shared vocabulary combined with construction operators, they can easily be compared with respect to equality (=), subsumption (⊑), overlap (C ⊓ D is consistent), and inconsistency (C ⊓ D is inconsistent). In other words, BUSTER can automatically determine those concepts in a source ontology that are most similar to the concept asked for.
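Under the assumption, stated above, that every concept can be flattened to a conjunction of primitives from the shared basic vocabulary, the four checks reduce to simple operations on sets of primitives. The following sketch illustrates this; it is not BUSTER code, and the vocabulary, concepts and disjointness axiom are invented for the example.

```python
def equal(c, d):
    return c == d                        # C = D

def subsumed_by(c, d):
    """C ⊑ D: C is more specific, i.e. it contains every conjunct of D."""
    return d <= c

def overlap(c, d, disjoint_pairs=frozenset()):
    """C ⊓ D consistent: no pair of primitives from C and D is declared disjoint."""
    return not any((p, q) in disjoint_pairs or (q, p) in disjoint_pairs
                   for p in c for q in d)

# Hypothetical flattened concepts over a shared vocabulary of primitives
river  = {"WaterBody", "Flowing"}
stream = {"WaterBody", "Flowing", "Small"}
lake   = {"WaterBody", "Standing"}
DISJOINT = {("Flowing", "Standing")}

print(subsumed_by(stream, river))       # True: stream ⊑ river
print(overlap(river, lake, DISJOINT))   # False: river ⊓ lake is inconsistent
```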
5.15 MULTIKAT (INRIA Sophia Antipolis)
MULTIKAT [23, 24] is a tool enabling comparison and merging of two ontologies,
represented in Sowa’s conceptual graph formalism [67]. In this formalism, an ontology is represented through a support (i.e. a hierarchy of concept types, a hierarchy of
relation types, a set of markers for identifying the instances and a conformity relation
that determines which types are compatible with a given marker). The building
of an integrated ontology from two ontologies relies on the following steps:
1. Comparison and merging of the two concept type hierarchies: this step makes it possible to solve name conflicts and, if needed, to add new concept types and to adapt concept type definitions.
2. Comparison and merging of the two relation type hierarchies: this step makes it possible to solve name conflicts and, if needed, to add new relation types, to adapt relation type definitions and to adapt relation type signatures.
3. Comparison and merging of the two sets of markers: this phase helps to solve name conflicts and to adapt the conformity relation.
MULTIKAT relies on a cooperative approach: the knowledge engineer can use
the MULTIKAT editor to tune the parameters and weights used in the mapping and
merging algorithms.
5.15.1 Mapping of two types in both hierarchies
The mapping algorithm aims at determining, in two concept (resp. relation) type hierarchies, which types are identical. It relies on two phases.
Phase 1: terminology-based mapping. During this first phase, the MULTIKAT algorithm tries to identify which types of both hierarchies are similar, according to their main names and their synonyms. The knowledge engineer can combine several criteria and assign them different weights so as to privilege some of them:
 • t1 and t2 have the same main name,
 • the number of common synonyms of t1 and t2 is greater than a given threshold,
 • the main name of one type belongs to the list of synonyms of the other type.
A similarity function Sim1 : H1 × H2 → R computes the similarity measure Sim1(t1, t2) between t1, a type of H1, and t2, a type of H2, according to this first identification phase, and its results are stored in a similarity matrix. After this phase, two types t1 and t2 are 1-similar iff Sim1(t1, t2) is greater than a threshold Tsimilar.
Phase 2: context-based mapping. In this second phase, the mapping algorithm considers the contexts of the types to be compared. The context of a type consists of its relatives (i.e. its direct supertypes and its direct subtypes) in the type hierarchy. The algorithm tries to identify which types of both hierarchies are the same, according to their contexts. Three mapping cases are distinguished:
 • the number of 1-similar direct supertypes (resp. direct subtypes) of t1 and t2 is greater than a threshold Tpred (resp. Tsucc);
 • all the direct supertypes (resp. direct subtypes) of t1 and t2 are 1-similar;
 • the set of relatives of t1 (resp. t2) is included in the set of relatives of t2 (resp. t1) w.r.t. 1-similarity.
The knowledge engineer can associate different weights to these three cases, and a second similarity function Sim2(t1, t2) is computed. If t1 is the type numbered i in H1 and t2 the type numbered j in H2, then, in the final similarity matrix SimMatr:
SimMatr(i, j) = Sim1(t1, t2) + Sim2(t1, t2)
The couples of identical types are computed from this similarity matrix. After the second phase, the types t1 and t2 are considered identical iff SimMatr(i, j) is the maximum value in the i-th row and the j-th column of the matrix, and this value is greater than a threshold Tsame.
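The decision rule above can be summarised with a short sketch (not MULTIKAT code): Sim1 and Sim2 are assumed to be given as matrices, their sum is computed, and a pair of types is declared identical when its entry is maximal in its row and in its column and exceeds Tsame. The toy values and the threshold are illustrative assumptions.

```python
T_SAME = 1.0  # hypothetical threshold

def identical_pairs(sim1, sim2, t_same=T_SAME):
    """sim1, sim2: dense matrices (lists of lists) indexed by type numbers i, j."""
    n, m = len(sim1), len(sim1[0])
    matrix = [[sim1[i][j] + sim2[i][j] for j in range(m)] for i in range(n)]
    pairs = []
    for i in range(n):
        for j in range(m):
            row_max = max(matrix[i])
            col_max = max(matrix[k][j] for k in range(n))
            if matrix[i][j] == row_max == col_max and matrix[i][j] > t_same:
                pairs.append((i, j, matrix[i][j]))
    return pairs

sim1 = [[0.9, 0.1], [0.2, 0.6]]     # phase 1: terminology-based similarities
sim2 = [[0.8, 0.0], [0.1, 0.3]]     # phase 2: context-based similarities
print(identical_pairs(sim1, sim2))  # [(0, 0, 1.7)] -- only the (0, 0) pair qualifies
```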
Two comparison strategies can be applied:
One-to-one algorithm. For each cycle of comparison of the two previous identification phases, the algorithm compares each type of H1 to each type of H2.
Hierarchy-match algorithm. This algorithm takes the hierarchical structure into account in its comparison strategy. It relies on a depth-first search in both hierarchies and proceeds as follows: once two identical types have been found, a search for further mappings in their sub-hierarchies is performed. In this algorithm, the thresholds Tsimilar and Tsame have the same values. In both previous phases, after each evaluation of a couple of types (t1, t2), the corresponding value SimMatr(i, j) is compared to Tsame; as soon as SimMatr(i, j) > Tsame, the pair (t1, t2) is included in the set IdenticalConceptTypes.
5.15.2 Merging of concept type hierarchies
The knowledge engineer can initialize the set IdenticalConceptTypes by indicating which types of both hierarchies are already known to be identical. The mapping algorithm is applied (either with the one-to-one match strategy or with the hierarchy match strategy). Then, before the merging, the partial ordering relation of identical types is checked, and the couples responsible for violating the merging precondition are eliminated from IdenticalConceptTypes. The integrated hierarchy Tccom is then built by representing each couple of identical types by a single type in Tccom and by adding the types appearing in only one hierarchy. If a type is present in only one ontology and cannot be mapped to any type of the second ontology, it is kept in the integrated hierarchy, with a prefix in its name indicating which expert it comes from. If a type is present in both hierarchies, the experts can choose its final main name stored in Tccom. In all cases, the associated synonyms are also stored in Tccom. The algorithm tries to detect terminology conflicts, topology conflicts and conflicts specific to the conceptual graph formalism.
5.15.3 Comparison and merging of the relation type hierarchies
The mapping algorithm for relation type hierarchies is similar to the two-phase algorithm previously presented. Once the set IdenticalRelationTypes of pairs of identical relation types has been obtained, the precondition for merging the two hierarchies must also be checked. When two relation types are considered identical, their signature compatibility must be verified.
The signatures in the integrated relation type hierarchy Trcom must be adapted according to the integrated concept type hierarchy Tccom.
If a relation type is present in only one ontology, its signature is preserved in the integrated ontology. The signature of the integrated relation type obtained from two identical relation types relies on the supremum of the concept types appearing in their signatures.
5.15.4 Comparison and merging of the marker sets
The terminology-based mapping algorithm is also used for the sets of markers. When two markers are identical, their conformity relations must be compatible; otherwise they are eliminated from the set IdenticalMarkers.
5.15.5 Implementation
MULTIKAT was implemented in C/C++ and Java on top of the conceptual graph platform COGITO (developed by the LIRMM), and was applied to traffic accident analysis.
5.16 ASCO (INRIA Sophia-Antipolis)
The ASCO prototype relies on an algorithm that identifies pairs of corresponding elements in two different ontologies [2]. These pairs may be pairs of concepts (classes)
in the two ontologies or pairs of relations, or even pairs of a concept in one ontology
and a relation in the other ontology.
ASCO tries to use as much of the available information contained in the ontologies as possible for the process of matching them. This information consists of the identifiers (names), labels and comments of concepts; the identifiers, labels, comments, domains and ranges of relations; the structure of the taxonomy of concepts or of relations; the data instances of the ontology; and annotations, axioms and rules. So far, in its matching process, ASCO takes into account some of the above information, namely the identifiers, labels and comments of concepts, the identifiers, labels, comments, domains and ranges of relations, and the structure of the taxonomy of concepts or of relations.
The matching process of ASCO is composed of several phases. The linguistic phase applies linguistic processing techniques and uses string comparison metrics and lexical databases such as WordNet to compute the similarity of two concepts or two relations. In the linguistic processing step, ASCO first normalizes terms and expressions, using punctuation, upper case, special symbols and digits to obtain a set of tokens. These tokens are then compared using string comparison metrics such as Jaro-Winkler, Levenshtein or Monge-Elkan. Based on the token similarities, the similarity of sets of tokens is computed. To increase accuracy and to avoid problems of term conflicts, a lexical database such as WordNet is integrated. To compute the similarity between long texts (for example, between the comments or descriptions of classes or of relations), ASCO uses term frequency/inverse document frequency metrics, after applying a linguistic processing step that eliminates all stopwords from the texts.
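A minimal sketch of such token-based linguistic similarity is given below; it is not the ASCO implementation. Labels are normalized into tokens, token pairs are scored with a generic string metric (difflib standing in for Jaro-Winkler or Levenshtein), and the label similarity averages the best score of each token. The normalization rules and the averaging scheme are assumptions made for the example.

```python
import difflib
import re

def tokenize(label):
    """Split on punctuation, digits and case changes, lower-case the result."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return [t.lower() for t in re.split(r"[\W_\d]+", spaced) if t]

def token_sim(a, b):
    return difflib.SequenceMatcher(None, a, b).ratio()

def label_sim(l1, l2):
    """Average of the best token-to-token similarity for each token of l1."""
    t1, t2 = tokenize(l1), tokenize(l2)
    if not t1 or not t2:
        return 0.0
    best = [max(token_sim(a, b) for b in t2) for a in t1]
    return sum(best) / len(best)

print(tokenize("hasFamilyName"))                          # ['has', 'family', 'name']
print(round(label_sim("hasFamilyName", "family_name"), 2))
```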
The computed linguistic similarities are the input for the structural phase. In this phase, ASCO tries to exploit the structure of the ontology taxonomy to modify or confirm the similarity of two concepts or relations. The similarities of classes or of relations are iteratively propagated to their neighbors in the ontology tree, which is built from the hierarchy of classes and the hierarchy of relations. When the propagation terminates (i.e. the class and relation similarities do not change after an iteration, or a certain number of iterations is reached), the classes or relations whose similarities exceed a threshold are considered similar. ASCO currently implements only these two phases.
The ASCO algorithm was implemented in Java. It is built on Corese (Conceptual Resource Search Engine), the semantic search engine developed by the ACACIA team [18, 19, 20]. Corese loads ontologies from RDF(S) files into memory; these ontologies are then supplied to ASCO. ASCO was tested with two real-world ontologies: O'COMMA, which has 472 concepts and 77 relations [37], and O'Aprobatiom, which has 460 concepts and 92 relations.
5.17 OLA (INRIA Rhône-Alpes & U. of Montréal)
OLA [35, 36] is a class of algorithms for ontology alignment which targets the following characteristics:
 • covering all the possible characteristics of ontologies (i.e., terminological, structural and extensional);
 • taking care of collection structures (lists, sets) and accounting for them during matching;
 • making all recursive relationships explicit and finding the best matching through iteration.
OLA is currently implemented for ontologies described in OWL-Lite [36]. It uses the Alignment API and implementation that was recently developed [34].
The algorithm first compiles the OWL ontologies into graph structures unveiling all the relationships between entities. These graph structures produce the constraints for expressing a similarity between the elements of the ontologies. The similarity between nodes of the graphs follows two principles: (i) it depends on the category of the node considered (e.g., class, property), and (ii) it takes into account all the features of this category (e.g., superclasses, properties). This similarity is a weighted linear aggregation of the similarity measures between all the entities to which a couple of entities is related, and thus accounts for all the relationships between entities. However, since these features (like subclasses) are sets of entities, the similarity between such sets depends on a local matching between their members. A matching of both sets is considered which is: (i) of maximal total similarity, (ii) exclusive, and (iii) of maximal size [69].
Similarity between labels can be produced by any kind of terminological method (e.g., string distance, linguistic evaluation). Similarity between data values and data types can be provided by specialised external similarity measures (e.g., Euclidean distance, symmetric difference distance) [69].
The definition of this similarity provides a set of equations whose variables are the similarity values between the entities of the ontologies. This set of equations cannot be solved directly because of the local matching: depending on the currently computed similarity, the matching as defined above can be different. We thus developed an iterative algorithm which computes a first approximation of the similarity (without the local matching), then computes the local matching, and reiterates. We proved that this algorithm converges towards a solution, mainly because the similarity always improves over the iterations. This solution may not be the global optimum, so the algorithm should be launched several times.
From this solution, it is possible to extract an alignment between the two ontologies (by retaining the correspondences whose similarity is over a certain threshold, or by optimising the selection of couples).
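The iterative scheme can be sketched as follows (this is not the OLA code): node similarities are repeatedly recomputed as a weighted combination of a fixed label similarity and the average similarity of the currently best-matched neighbours, until the values stabilise. The weights, tolerance and toy graphs are illustrative assumptions.

```python
def iterate_similarity(label_sim, neigh1, neigh2, w_label=0.6, w_struct=0.4,
                       tol=1e-4, max_iter=50):
    """label_sim: dict[(a, b)] -> float, assumed to cover every cross pair of
    neighbours; neigh1/neigh2: dict[node] -> list of neighbours."""
    sim = dict(label_sim)                       # first approximation: labels only
    for _ in range(max_iter):
        new_sim = {}
        for (a, b), ls in label_sim.items():
            # local matching: each neighbour of a is paired with its best neighbour of b
            if neigh1[a] and neigh2[b]:
                struct = sum(max(sim[(x, y)] for y in neigh2[b]) for x in neigh1[a])
                struct /= len(neigh1[a])
            else:
                struct = 0.0
            new_sim[(a, b)] = w_label * ls + w_struct * struct
        if max(abs(new_sim[k] - sim[k]) for k in sim) < tol:
            return new_sim
        sim = new_sim
    return sim

labels = {("Car", "Auto"): 0.5, ("Car", "Wheel"): 0.1,
          ("Engine", "Auto"): 0.1, ("Engine", "Wheel"): 0.2}
n1 = {"Car": ["Engine"], "Engine": ["Car"]}
n2 = {"Auto": ["Wheel"], "Wheel": ["Auto"]}
print(iterate_similarity(labels, n1, n2))
```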
5.18 ArtGen (Stanford U.)
In [56] the authors propose a semi-automated algorithm for resolving terminological
heterogeneity among the ontologies and establishing the articulation rules necessary
for meaningful interoperation. This algorithm forms the basis of the articulation generator for the ONION (ONtology compositION) system. The automated articulation generator (ArtGen) of ONION suggests articulation rules to the user performing the matching process. A human expert can either accept, modify or delete the suggestions.
The expert can also indicate new matches that the articulation generator might have
missed.
The authors distinguish two types of articulation rules:
Linguistic matching rules. Concept names are represented as strings of words. The linguistic matcher compares all possible pairs of words from any two concepts of both ontologies and assigns a similarity score to each pair. The matcher uses a word similarity table generated by a word relator (a thesaurus-based or a corpus-based word relator) to look up the similarity between all possible pairs of words. The similarity score between two concepts is the average of the non-zero similarity scores of all possible pairs of words in their names (a sketch is given below). A linguistic matching rule does not indicate the exact semantic relationship between the two concepts, for example whether they have a class-subclass relationship, are equivalent, etc.
Inference rules. An inference engine in Datalog is capable of making logical derivations based on the inference rules available in the engine.
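The linguistic matching rule mentioned above can be sketched as follows (this is not ArtGen code): word-pair similarities are looked up in a precomputed word similarity table, which stands in for the thesaurus-based or corpus-based word relator, and the concept score is the average of the non-zero pair scores. The table entries are invented for the example.

```python
# Hypothetical word relator output
WORD_SIM = {("car", "automobile"): 0.9, ("price", "cost"): 0.8}

def word_sim(w1, w2):
    """Look up the similarity of a word pair, falling back to exact equality."""
    return WORD_SIM.get((w1, w2)) or WORD_SIM.get((w2, w1)) or (1.0 if w1 == w2 else 0.0)

def concept_sim(name1, name2):
    """Average of the non-zero word-pair similarity scores between two concept names."""
    scores = [word_sim(a, b) for a in name1.lower().split() for b in name2.lower().split()]
    nonzero = [s for s in scores if s > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

print(concept_sim("car price", "automobile cost"))   # (0.9 + 0.8) / 2 = 0.85
```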
The ontologies used for the experiments were represented in RDF and contained 30 and 50 nodes respectively, which makes them very small ontologies. The authors demonstrate how the articulation rules are generated by the ONION system. The tool was evaluated by computing precision and recall measures for the corpus-based and thesaurus-based word relators. Accuracy was measured by comparing the results of the automated articulation generator with those expected by the expert.
If the expert deleted a match proposed by the articulation generator, precision is lowered; if the expert added a match that was not found by the articulation generator, recall is lowered.
The thesaurus-based method produced very poor results, whereas the corpus-based method produced better ones. However, scalability was extremely low and the quality of the results was very dependent on the quality of the available corpus; when everything was pre-computed, the corpus-based method scaled very well.
5.19 Alimo (ITI-CERTH)
The development and maintenance of large multimedia databases has attracted much attention from companies and organizations that hold multimedia content (archives, broadcasters, radio and TV channels, etc.). The goal is to bypass the ineffective and time-consuming process of manually searching and retrieving multimedia content, and to use computers to make the content easy to find and accessible to other parties. Two critical points are identified in making this goal a reality: effective representation, and effective retrieval and exploration of multimedia content. To accomplish this, researchers have started to use ontologies in the field of multimedia in order to construct machine-understandable, descriptive versions of the multimedia content. Four different levels of information are represented in multimedia ontologies: signal information, featural information, symbolic information and semantic information. With the aid of multimedia ontologies, the vision of querying and retrieving multimedia content from distributed databases has started to become more feasible. But in order to be able to use all the levels of information, from the semantic level down to the raw audiovisual one, a proper alignment framework must be provided. For this reason ITI-CERTH is constructing ALIMO (Alignment of Multimedia Ontologies), an ontology alignment system that pays special attention to each of the subparts of a multimedia ontology and to the attributes with special meaning and structure. Semantic descriptions will be aligned using methods from hybrid alignment systems (terminological, structural, etc.). The signal description parts will be compared using visual matching algorithms from the field of digital image and video processing, the feature descriptions by examining the XML Schema of the MPEG-7 visual part, and finally the symbolic descriptions by referring to the definitions of the concepts that the labels are instances of, and also by examining the datatypes of the attributes assigned to those instances.
5.20 Bibster (U. Karlsruhe)
Bibster [14] addresses a typical problem in the daily life of a computer scientist,
where one regularly has to search for publications or their correct bibliographic
metadata.
The scenario that we support here is that researchers in a community share bibliographic metadata in a Peer-to-Peer fashion. Bibster is a Peer-to-Peer system based on the SWAP architecture, which allows one to easily integrate, share and search bibliographic metadata, using semantic technologies for the representation of the bibliographic instances and of the peers' expertise in order to route queries effectively. Semantic similarity measures identifying duplicates make it possible to visualize and to integrate the heterogeneous search results coming from the peers. Bibliographic entries are extracted from BibTeX into an ontology.
The query results themselves represent small ontologies, containing duplicates. Finding duplicates is closely related to finding corresponding mappings: in both cases it is necessary to recognize identical objects despite their different identifiers. In the given scenario, duplicates are bibliographic entries which refer to the same publication or person in the real world, but are modelled as different resources. The similarity function is based on different features of the respective instances: for persons one can refer to the name; for publications, to the title, authors, editors, journal, address, type of publication, etc. The function returns a value between 0 and 1 by applying specific heuristics to every feature: strings are compared using the Levenshtein distance [45], and the authors of publications are compared by comparing the two author sets. Some domain-specific features require special heuristics: if the type of one publication is "Misc", this only means that no further information about the type was available, so if another publication is, e.g., of type "Article", the similarity is set to 0.5 rather than 0. Besides the individual functions, our approach focuses on applying an aggregation function to achieve an overall similarity. Through transitive closure we receive a set of "identical" entities. Instead of presenting all instances of the query result, duplicates are visualized as one merged resource; these merged resources consist of the union of the properties of the individuals identified as duplicates.
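A minimal sketch of such a duplicate-detection similarity is given below; it is not the Bibster implementation. Per-feature heuristics (a Levenshtein-style string similarity for titles, set overlap for authors, the special treatment of the "Misc" type) are aggregated here by a weighted average; the weights and the aggregation itself are illustrative assumptions.

```python
import difflib

def string_sim(a, b):
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def set_sim(s1, s2):
    return len(s1 & s2) / len(s1 | s2) if (s1 or s2) else 1.0

def type_sim(t1, t2):
    if t1 == t2:
        return 1.0
    return 0.5 if "Misc" in (t1, t2) else 0.0   # "Misc" only means the type is unknown

def entry_sim(e1, e2, weights=(0.5, 0.3, 0.2)):
    """Weighted aggregation of the per-feature similarities."""
    sims = (string_sim(e1["title"], e2["title"]),
            set_sim(set(e1["authors"]), set(e2["authors"])),
            type_sim(e1["type"], e2["type"]))
    return sum(w * s for w, s in zip(weights, sims))

a = {"title": "Semantic Matching", "authors": ["Giunchiglia", "Shvaiko"], "type": "Article"}
b = {"title": "Semantic matching", "authors": ["Giunchiglia", "Shvaiko"], "type": "Misc"}
print(round(entry_sim(a, b), 2))   # high score: the two entries are likely duplicates
```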
After several rounds of testing, Bibster is now openly available, with the alignment-based component working in the background of the system.
5.21 QOM (U. Karlsruhe)
QOM considers both the quality of the mapping results and the run-time complexity. The hypothesis is that mapping algorithms may be streamlined such that the loss of
quality (compared to a standard baseline) is marginal, but the improvement of efficiency is so tremendous that it allows for the ad-hoc mapping of large-size, lightweight ontologies. To substantiate the hypothesis, a number of practical experiments
were performed.
The outcome is QOM — Quick Ontology Mapping [31]. It is defined by the steps
of a process model as shown in Figure 5. Mapping one ontology onto another means
that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find
a corresponding entity, which has the same intended meaning, in ontology O2.
Fig. 5. QOM Mapping Process
1. First, QOM uses RDF triples as features.
2. Second, instead of comparing all entities of the first ontology with all entities of the second ontology, QOM uses heuristics to lower the number of candidate mappings, since this number is a major factor in the run-time complexity. In this dynamic programming approach we only choose promising candidate mappings.
3. The actual similarity computation is done using a wide range of similarity functions [31]. An entity is described by the kind of appearance that is found to hold for it with respect to characteristics like: identifiers such as URIs, RDF/S primitives such as subclass and instance relations, or domain specific features, e.g. a hashcode-of-file in a file sharing domain. These features of ontological entities are compared using string similarity and SimSet for set comparisons. For efficiency reasons, the similarity computation was lightened by removing extremely costly feature-measure combinations, such as the comparison of all subclasses of two concepts.
4. These individual measures are all input to the similarity aggregation. Instead of applying linear aggregation functions, QOM applies a sigmoid function, which emphasizes high individual similarities and de-emphasizes low individual similarities (see the sketch below).
5. From the similarity values we derive the actual mappings. A threshold is applied to discard spurious evidence of similarity; further mappings are assigned based on a greedy strategy that starts with the largest similarity values first.
6. Through several iteration rounds the quality of the results rises considerably.
Eventually, the output returned is a mapping table representing the relation map_O1,O2. The evaluation was very promising. Depending on the scenario, QOM reaches high-quality mapping levels very quickly. QOM is on a par with other good state-of-the-art algorithms concerning the quality of the proposed mappings, while outperforming them with respect to efficiency, both in terms of run-time complexity (O(n) instead of O(n^2)) and in the experiments we have performed (by a factor of 10 to 100).
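Steps 4 and 5 can be sketched as follows (this is not the QOM implementation): individual feature similarities are aggregated with a sigmoid that boosts high values and damps low ones, and mappings are then selected greedily above a threshold. The sigmoid steepness, the threshold and the toy data are assumptions made for the example.

```python
import math

def sigmoid_aggregate(sims, weights, steepness=10.0, midpoint=0.5):
    """Weighted average of sigmoid-transformed individual similarities."""
    transformed = [1.0 / (1.0 + math.exp(-steepness * (s - midpoint))) for s in sims]
    return sum(w * t for w, t in zip(weights, transformed)) / sum(weights)

def greedy_mappings(candidates, threshold=0.6):
    """candidates: (entity1, entity2, similarity); pick best first, one mapping per entity."""
    used1, used2, result = set(), set(), []
    for e1, e2, s in sorted(candidates, key=lambda c: c[2], reverse=True):
        if s >= threshold and e1 not in used1 and e2 not in used2:
            result.append((e1, e2, s))
            used1.add(e1)
            used2.add(e2)
    return result

agg = sigmoid_aggregate([0.9, 0.8, 0.2], [1.0, 1.0, 1.0])
print(round(agg, 2))                                    # high similarities dominate the low one
print(greedy_mappings([("Car", "Auto", agg), ("Car", "Wheel", 0.55)]))
```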
5.22 KILT (INRIA Lorraine)
This section gives a short description of KILT, a maintenance tool for comparing knowledge base versions within the KASIMIR system (see [21]).
The KASIMIR system is a knowledge-based system aimed at supporting the decision process when searching for an adequate treatment for patients ill with cancer. During an update (or a revision) of a KASIMIR knowledge base, the need for automatically comparing the old base KBold (before the update) and the new base KBnew (after the update) has appeared, and is rather important for controlling the evolution of a knowledge base. A module comparing versions has to indicate what has actually been updated, and to check whether the modifications are in accordance with the intentions of the knowledge engineer. This is the role of the module called KILT, which has been implemented and integrated into the PROTEGE knowledge editor. KILT partitions the problems (a problem is described by a concept denoting a set of patients, and is possibly associated with a solution or a treatment) represented in KBold and/or KBnew into four parts:
1. The problems that appear in the two bases, with the same solutions;
2. The problems that appear in the two bases, with different solutions;
3. The obsolete problems, appearing in KBold but not in KBnew;
4. The new problems, appearing in KBnew but not in KBold.
The above partitioning is based on the use of the KASIMIR reasoner. For example, the new problems in category (4) can be found in the following way: each problem PBnew of KBnew is classified in the hierarchy of KBold, which makes it possible to check whether there is a problem PBold of KBold equivalent to PBnew, i.e. such that PBold subsumes and is subsumed by PBnew. If this is not the case, then PBnew is a new problem. The three other categories of problems (1), (2) and (3) can be detected and checked in a similar way. This shows that the implementation of KILT is rather simple once the connection with the KASIMIR reasoner is in place.
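The four-way partition can be sketched as follows (this is not the KILT code): each problem carries a solution, and problem equivalence is reduced here to equality of a frozen description, standing in for the mutual subsumption test performed by the KASIMIR reasoner. The data structures and example entries are illustrative assumptions.

```python
def partition(kb_old, kb_new):
    """kb_old, kb_new: dict mapping a problem description to its solution."""
    same, changed = [], []
    for problem, solution in kb_new.items():
        if problem in kb_old:
            (same if kb_old[problem] == solution else changed).append(problem)
    obsolete = [p for p in kb_old if p not in kb_new]   # category (3)
    new = [p for p in kb_new if p not in kb_old]        # category (4)
    return same, changed, obsolete, new

kb_old = {"breast-cancer-stage-I": "surgery", "stage-II": "chemotherapy"}
kb_new = {"breast-cancer-stage-I": "surgery+radiotherapy", "stage-III": "chemotherapy"}
print(partition(kb_old, kb_new))
# ([], ['breast-cancer-stage-I'], ['stage-II'], ['stage-III'])
```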
KILT is integrated into PROTEGE in the following way. During a session, KBold corresponds to the state of the knowledge base at the beginning of the session, and KBnew to its current state. The KILT module therefore makes it possible to visualize, at any time during the session, the editing modifications, i.e. the addition or removal of a problem and the association of another solution with an already known problem. KILT makes comparisons at a semantic level: two concepts match when they have equivalent definitions, based on their attribute values and on the subsumption relation between classes. One main drawback is the assumption that the attributes – and their names – do not change from one knowledge base version to another.
References
1. D. Aumüller, H.H. Do, S. Massmann, and E. Rahm. Schema and ontology matching with
COMA++. In Proceedings of International Conference on Management of Data (SIGMOD),
Software Demonstration, 2005.
2. T. L. Bach, R. Dieng-Kuntz, and F. Gandon. On ontology matching problems (for building a
corporate semantic web in a multi-communities organization). In Proc. of ICEIS 2004, Porto (PT), 2004.
3. J. Barwise and J. Seligman. Information flow: the logic of distributed systems, volume 44 of
Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge (UK), 1997.
4. H.W. Beck, S.K. Gala, and S.B. Navathe. Classification as a query processing technique in
the CANDIDE semantic data model. In Proceedings of International Conference on Data
Engineering (ICDE), pages 572–581, 1989.
5. D. Beneventano, S. Bergamaschi, S. Lodi, and C. Sartori. Consistency checking in complex
object database schemata with integrity constraints. IEEE Transactions on Knowledge and
Data Engineering, 10:576–598, 1998.
6. S. Bergamaschi, D. Beneventano, S. Castano, and M. Vincini. Momis: An intelligent system
for the integration of semistructured and structured data. Technical Report T3-R07,
Università di Modena e Reggio Emilia, Modena (IT), 1998.
7. S. Bergamaschi, S. Castano, and M. Vincini. Semantic integration of semistructured and
structured data sources. SIGMOD Record, 28(1):54–59, 1999.
8. J. Berlin and A. Motro. Autoplex: Automated discovery of content for virtual databases. In
Proceeding of CoopIS, pages 108–122, 2001.
9. J. Berlin and A. Motro. Database schema matching using machine learning with feature selection. In Proc. conference on advanced information system engineering (CAiSE), 2002.
10. Philip Bernstein, Alon Halevy, and Rachel Pottinger. A vision for managing complex models. SIGMOD Records, 29(4):55–63, 2000.
11. Daniel Le Berre. Jsat: The java satisfiability library, 2001.
12. A. Borgida, R. Brachman, D. McGuinness, and L. Resnick. Classic: A structural data model
for objects. SIGMOD Record, 18(2):58–67, 1989.
13. P. Bouquet, L. Serafini, and S. Zanobini. Semantic coordination: A new approach and an
application. In Proceedings of International Semantic Web Conference (ISWC), pages 130–
145, 2003.
14. J. Broekstra, M. Ehrig, P. Haase, F. van Harmelen, M. Menken, P. Mika, B. Schnizler, and
R. Siebes. Bibster - a semantics-based bibliographic peer-to-peer system. In Proceedings of
the SemPGrid 04 Workshop, New York, May 2004.
15. D. Calvanese, G. De Giacomo, and M. Lenzerini. A framework for ontology integration. In
Isabel Cruz, Stefan Decker, Jérôme Euzenat, and Deborah McGuinness, editors, The
emerging semantic web, pages 201–214. IOS Press, Amsterdam (NL), 2002.
16. S. Castano, V. De Antonellis, and S. De Capitani di Vimercati. Global viewing of heterogeneous data sources. IEEE Transactions on Knowledge and Data Engineering, (13(2)):277–
297, 2001.
17. W. Cohen, P. Ravikumar, and S. Fienberg. A comparison of string metrics for matching
names and records. In Proceedings of workshop on Data Cleaning and Object Consolidation
at International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
18. O. Corby, R. Dieng, and C. Hébert. A conceptual graph model for w3c resource description
framework. In Proc. 8th International Conference on Conceptual Structures Logical, Linguistic, and Computational Issues, Darmstadt (DE), August 2000. Springer-Verlag.
19. O. Corby, R. Dieng-Kuntz, and C. Faron-Zucker. Querying the semantic web with the
corese search engine. In Proc. 15th ECAI/PAIS, Valencia (ES), August 2004. IOS Press.
20. O. Corby and C. Faron. Corese: A corporate semantic web engine. In Proc. WWW International Workshop on Real World RDF and Semantic Web Applications, Hawai (HA US),
May 2002.
21. M. d’Aquin, C. Bouthier, S. Brachais, J. Lieber, and A. Napoli. Knowledge editing and
maintenance tools for a semantic portal in oncology. Rapport de recherche A03-R-162,
LORIA, 2003.
22. T. Di Noia, E. Di Sciascio, F. M. Donini, and M. Mongiello. A system for principled
matchmaking in an electronic marketplace. In Proceedings of World Wide Web Conference
(WWW), pages 321–330, 2003.
23. R. Dieng and S. Hug. Comparison of ”personal ontologies” represented through conceptual
graphs. In Proceedings of European Conference on Artificial Intelligence, pages 341–345,
1998.
24. R. Dieng and S. Hug. Multikat, a tool for comparing knowledge from multiple experts. In
Conceptual Structures: Theory, Tools and Applications, Proc. of the 6th Int. Conference on
Conceptual Structures (ICCS’98), Montpellier (FR), August 10-12 1998. Springer-Verlag,
LNAI 1453.
25. H.-H. Do, S. Melnik, and E. Rahm. Comparison of schema matching evaluations. In Proc.
GI-Workshop "Web and Databases", Erfurt (DE), 2002. http://dol.uni-leipzig.de/pub/200228.
26. H.H. Do and E. Rahm. COMA - a system for flexible combination of schema matching approaches. In Proceedings of Very Large Data Bases Conference (VLDB), pages 610–621,
2001.
27. A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to map ontologies on the
semantic web. In Proceedings of International World Wide Web Conference (WWW), pages
662–673, 2003.
28. A. Doan. Learning to map between structured representations of data. PhD thesis, University of Washington, Seattle (WA US), 2002.
29. A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: A
machine-learning approach. In Proceeding of SIGMOD, 2001.
30. A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Ontology matching: a machine learning approach. In Steffen Staab and Rudi Studer, editors, Handbook of ontologies, International handbooks on information systems, chapter 18, pages 385–404. Springer Verlag, Berlin (DE), 2004.
31. M. Ehrig and S. Staab. QOM - quick ontology mapping. In Proc. 3rd ISWC, Hiroshima (JP), November 2004. To appear.
32. M. Ehrig and Y. Sure. Ontology mapping - an integrated approach. In Proceedings of European Semantic Web Symposium (ESWS), pages 76–91, 2004.
33. J. Euzenat. Brief overview of T-tree: the Tropes taxonomy building tool. In Proc. 4th ASIS
SIG/CR workshop on classification research, Columbus (OH US), pages 69–87, 1994.
34. J. Euzenat. An api for ontology alignment. In Proc. 3rd international semantic web conference, Hiroshima (JP), 2004.
35. J. Euzenat and P. Valtchev. An integrative proximity measure for ontology alignment. In
Proceedings of Semantic Integration workshop at International Semantic Web Conference
(ISWC), 2003.
36. J. Euzenat and P. Valtchev. Similarity-based ontology alignment in OWL-lite. In Proceedings of European Conference on Artificial Intelligence (ECAI), pages 333–337, 2004.
37. F. Gandon. Distributed Artificial Intelligence and Knowledge Management: ontologies and
multi-agent systems for a corporate semantic web. Scientific philosopher doctorate thesis in
informatics, INRIA and University of Nice - Sophia Antipolis, November 2002.
38. A. Gangemi, N. Guarino, C. Masolo, and A. Oltramari. Sweetening WordNet with DOLCE.
AI Magazine, (24(3)):13–24, 2003.
39. B. Ganter and R. Wille. Formal concept analysis: mathematical foundations. Springer Verlag, Berlin (DE), 1999.
40. F. Giunchiglia and P. Shvaiko. Semantic matching. The Knowledge Engineering Review
Journal, (18(3)):265–280, 2003.
41. F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-Match: an algorithm and an implementation of semantic matching. In Proceedings of European Semantic Web Symposium (ESWS),
pages 61–75, 2004.
42. F. Giunchiglia and M. Yatskevich. Element level semantic matching. In Proceedings of
Meaning Coordination and Negotiation workshop at International Semantic Web Conference (ISWC), 2004.
43. I. Horrocks, P. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. SWRL: a semantic web rule language combining OWL and RuleML, 2003. www.daml.org/2003/11/swrl/.
44. Y. Kalfoglou and M. Schorlemmer. Ontology mapping: the state of the art. The Knowledge
Engineering Review, 18(1):1–31, 2003.
45. V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals.
Cybernetics and Control Theory, 1966.
46. W. Li and C. Clifton. Semantic integration in heterogeneous databases using neural networks. In Proceedings of the 20th International Conference on Very Large Data Bases,
pages 1–12. Morgan Kaufmann Publishers Inc., 1994.
47. J. Madhavan, P. Bernstein, and E. Rahm. Generic schema matching with cupid. In Proceedings of Very Large Data Bases Conference (VLDB), pages 49–58, 2001.
48. A. Maedche and S. Staab. Measuring similarity between ontologies. In Proceedings of International Conference on Knowledge Engineering and Knowledge Management (EKAW),
pages 251–263, 2002.
49. B. Magnini, L. Serafini, and M. Speranza. Making explicit the semantics hidden in schema
models. In Proceedings of ISWC workshop on Human Language Technology for the Semantic Web and Web Services, 2003.
50. O. Mariño, F. Rechenmann, and P. Uvietta. Multiple perspectives and classification mechanism in object-oriented representation. In 9th European Conference on Artificial Intelligence, pages 425–430, Stockholm (SE), August 1990.
51. D.L. McGuinness, R. Fikes, J. Rice, and S. Wilder. An environment for merging and testing
large ontologies. In Proceeding of KR, pages 483–493, 2000.
52. S. Melnik, E. Rahm, and P. Bernstein. Rondo: A programming platform for model management. In Proc. ACM SIGMOD, San Diego (CA US), 2003.
53. S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching
algorithm. In Proceedings of International Conference on Data Engineering (ICDE), pages
117–128, 2002.
54. E. Mena, V. Kashyap, A. Sheth, and A. Illarramendi. Observer: An approach for query processing in global information systems based on interoperability between pre-existing ontologies. In Proceedings of International Conference on Cooperative Information Systems
(CoopIS), pages 14–25, 1996.
55. A.G. Miller. Wordnet: A lexical database for english. Communications of the ACM,
(38(11)):39–41, 1995.
56. P. Mitra and G. Wiederhold. Resolving terminological heterogeneity in ontologies. In
Workshop on Ontologies and Semantic Interoperability at the 15th European Conference on
Artificial Intelligence (ECAI), 2002.
57. I. Niles and A. Pease. Towards a standard upper ontology. In Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS), pages 2–9, 2001.
58. N. Noy and M. A. Musen. Anchor-prompt: Using non-local context for semantic matching.
In Proceedings of workshop on Ontologies and Information Sharing at International Joint
Conference on Artificial Intelligence (IJCAI), pages 63–70, 2001.
59. L. Palopoli, G. Terracina, and D. Ursino. The system dike: Towards the semi-automatic
synthesis of cooperative information systems and data warehouses. In Proceedings of ADBIS-DASFAA, pages 108–117, 2000.
60. C. Parent and S. Spaccapietra. Database integration: the key to data interoperability. In M.
P. Papazoglou, S. Spaccapietra, and Z. Tari, editors, Advances in Object-Oriented Data
Modeling. The MIT Press, 2000.
61. R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on
semantic nets. IEEE Transactions on Systems, Man and Cybernetics, (19(1)):17–30, 1989.
62. E. Rahm and P. Bernstein. A survey of approaches to automatic schema matching. The International Journal on Very Large Data Bases (VLDB), (10(4)):334–350, 2001.
63. E. Rahm, H.H. Do, and S. Maßmann. Matching large XML schemas. SIGMOD Record,
33(4):26–31, 2004.
64. P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pages 448–
453, 1995.
65. D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph
searching. In Proceedings of Symposium on Principles of Database Systems (PODS), pages
39–52, 2002.
66. A. Sotnykova, M-A. Aufaure, N. Bennacer, N. Cullot, and C. Vangenot. Semantic mappings in description logics for database schema integration. Technical report, Swiss Federal
Institute of Technology in Lausanne, 2004.
67. J. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, 1984.
68. Gerd Stumme and Alexander Mädche. FCA-merge: bottom-up merging of ontologies. In
Proc. 17th IJCAI, Seattle (WA US), pages 225–230, 2001.
69. P. Valtchev. Construction automatique de taxonomies pour l'aide à la représentation de connaissances par objets. Thèse d'informatique, Université Grenoble 1, 1999.
70. P. Valtchev and J. Euzenat. Dissimilarity measure for collections of objects and values.
Lecture notes in computer science, 1280:259–272, 1997.
71. U. Visser, T. Vögele, and C. Schlieder. Spatio terminological information retrieval using
the BUSTER system. In Proceedings of EnviroInfo, pages 93–100, Wien (AT), 2002.
72. T. Vögele, S. Hübner, and G. Schuster. Buster - an information broker for the semantic
web. Künstliche Intelligenz, 3:31–34, July 2003.
73. K. Zhang and D. Shasha. Approximate tree pattern matching. In A. Apostolico and Z. Galil,
editors, Pattern matching in strings, trees, and arrays, pages 341–371. Oxford University,
1997.
74. S. Sceffer, L. Serafini, and S. Zanobini. Semantic coordination of hierarchical classifications with attributes. Technical Report 706, University of Trento, December 2004,
http://eprints.biblio.unitn.it/archive/00000706/.
75. RACER. Available at: www.sts.tu-harburg.de/~r.f.moeller/racer
76. M. Yatskevich. Schema-based semantic matching: algorithms, a system and a testing methodology. Technical Report DIT-05-047, University of Trento, 2005.
77. P. Shvaiko. Iterative schema-based semantic matching. Technical Report DIT-04-020, University of Trento, 2004.