WISDOM
Research programme (co-funded by MIUR, fiscal year 2004)
Ricerca Intelligente su Web basata su Ontologie di Dominio
Web Intelligent Search based on DOMain ontologies

Critical analysis of languages and mapping techniques
Paolo Bouquet, Fausto Giunchiglia, Mikalai Yatskevich, Stefano Zanobini
D2.R1, 15 June 2005

Theme: Theme 1: Critical analysis of mapping languages and techniques
Code: D2.R1
Date: 15 June 2005
Product type: Report
Number of pages: 35
Responsible unit: TN
Units involved: TN, MO
Authors: Paolo Bouquet, Fausto Giunchiglia, Mikalai Yatskevich, Stefano Zanobini
Contact author: Paolo Bouquet

1. Introduction

Like the Web, the semantic Web will be distributed and heterogeneous. As such, the integration of resources found on the semantic Web is one of its main problems. To develop a solution to this problem, data will be expressed in the framework of ontologies. However, ontologies themselves can be heterogeneous, and some work will have to be done to restore interoperability. Even with emerging web and ontology standards, coordinating ontology development, whether manual or automatic, will prove to be a challenging task. In evolving domains, ontologies are not expected to remain static, and various versions of ontologies will have to be tracked. Interdisciplinary ontologies may need to be created from existing domain-specific ontologies, domain-specific ontologies may need to be merged with more general ontologies, different versions of a single-domain ontology may need to be merged, and new information may need to be merged with existing ontologies. Furthermore, new ontologies may be built by merging information from heterogeneous databases or other information sources. Hence, these ontologies will have to be reconciled. Semantic interoperability can be grounded in ontology reconciliation: finding relationships between entities belonging to different ontologies.
We call this process “ontology alignment” or “ontology matching”. Alignment results can be used for various purposes, such as displaying the correspondences, transforming one source into another, creating a set of bridge axioms or rules between the ontologies, or generating query wrappers (query rewriting instead of transformation). The ontology alignment problem can be described in one sentence: given two ontologies, each of which describes a set of discrete entities (which can be classes, properties, rules, predicates, etc.), find the relationships (e.g., equivalence or subsumption) holding between these entities. The purpose of this deliverable is to present existing and ongoing work in these various contexts, so that research in ontology alignment can progress towards more complete, more integrated and especially more efficient means of solving heterogeneity problems through alignment.

In the following, we first present the terminology (Section 2). Section 3 presents the major approaches to mapping languages. Section 4 presents a classification of schema-based matching techniques and discusses specific matching techniques. Section 5 is devoted to the systems implementing the matching techniques.

2. Terminology

The framework presented in this document builds on a large body of recent work on the problem of semantic interoperability. In this area, different authors use different words to refer to similar concepts and, vice versa, different concepts are sometimes referred to by the same name. In this section, we provide a tentative and partial glossary with the definitions of terms as they will be used in the rest of the document. The glossary is based on a technical deliverable (D2.2.1) of the EU-funded Network of Excellence called Knowledge Web.

Mapping: a formal expression that states the semantic relation between two entities belonging to different ontologies.
When this relation is oriented, this corresponds to a restriction of the usual mathematical meaning of mapping: a function (whose domain is a singleton). Mappings are discussed at length in Chapter 4.

Ontology Alignment: a set of correspondences between two or more (in case of multi-alignment) ontologies (by analogy with DNA sequence alignment). These correspondences are expressed as mappings. Alignments are detailed in Chapter 3.

Ontology Coordination: the broadest term, which applies whenever knowledge from two or more ontologies must be used at the same time in a meaningful way (e.g., to achieve a single goal).

Ontology Transformation: a general term for any process which leads to a new ontology o’ from an ontology o by using a transformation function t. Transformations and the like are the subject of further work in this work package.

Ontology Translation: an ontology transformation function t for translating an ontology o written in some language L into another ontology o’ written in a distinct language L’.

Ontology Merging: the creation of a new ontology om from two (possibly overlapping) source ontologies o’ and o’’. This concept is closely related to that of integration in the database community.

Ontology Reconciliation: a process that harmonizes the content of two (or more) ontologies, typically requiring changes on one of the two sides or even on both sides [Hameed et al., 2004].

Meaning Negotiation: the protocol through which two agents (either human or artificial) agree on the changes required to reconcile their ontologies.

3. Mapping languages

Most state-of-the-art mapping languages represent mappings in one of two ways. In the first family of approaches, mappings are represented as tuples, usually containing the nodes from both ontologies plus some additional information defining the strength and type of the mapping.
In the second family of approaches, mappings are represented as logical statements (most often rules) which connect the entities of both ontologies. Let us review these two categories in more detail.

3.1 Mappings as tuples

This category of mapping languages relies on the very basic definition of a correspondence as a pair of discrete entities in the language. The mapping element in this case is a tuple. Its general definition is roughly the following:

– entity1: the first aligned entity. It corresponds to some discrete entity of the representation language.
– entity2: the second aligned entity, with the same constraint as entity1.
– relation: the relation holding between the two entities. It is not restricted to the equivalence relation, but can be a more sophisticated operator (e.g., subsumption, incompatibility [40], or even some fuzzy relation).
– strength: denotes the confidence held in this correspondence. Since many alignment methods compute a strength of the relation between entities, this strength can be provided as a normalized measure. The measure should belong to an ordered set M including a maximum element and a minimum element. Most often, this value is restricted to be a float between 0 and 1. In some approaches, this measure could be generalised to a lattice domain. The measure by no means characterizes the relationship itself (e.g., a fuzzy relation should be expressed in the relation attribute); it reflects the confidence of the alignment provider that the relation holds between the entities, or the probability that the given relation holds between them.
– id: an identifier for the correspondence.

In various approaches, the aligned entities may be classes, properties or individuals. But they can also be any kind of complex term that is used by the target language.
For instance, it can use the concatenation of firstname and lastname considered in [62] if this is an entity, or it can use a path algebra as in:

hasSoftCopy.softCopyURI = hasURL

3.2 Mappings as rules

This class of mapping representation languages considers sets of expressions of a particular language (L) with variables in these expressions. Correspondences are thus directional and correspond to a clause:

∀x (f(x) ⇒ ∃y g(x, y))

in which f and g are expressions of L, the variables of the left-hand side (x) are universally quantified over the whole formula, and those of the right-hand side which do not occur in the left-hand side (y) are existentially quantified. This level can express correspondences like:

This kind of rule (or restriction) is commonly used in logic-based languages or in the database world for defining the views in “global-as-view” or “local-as-view” approaches [15]. It also resembles the SWRL rule language [43] when used with OWL. These rules can also be generalized to any relation, dropping the orientation constraint.

4. Matching techniques

4.1 Problem Statement

The matching operation determines the alignment (or mapping) (A’) for a pair of schemas/ontologies (o and o’). Some other parameters can extend the definition of the matching process, namely: (i) the use of an input mapping (A) which is to be completed by the process; (ii) the matching parameters, p (e.g., weights, thresholds); and (iii) external resources used by the matching process, r (e.g., thesauri); see Figure 1.

Fig. 1. The matching process

4.2 The Matching Dimensions

There are many independent dimensions along which algorithms can be classified. Following Figure 1, we may classify them according to (i) the input of the algorithms, (ii) the characteristics of the matching process, and (iii) the output of the algorithms.

Input dimensions. These dimensions concern the kind of input on which algorithms operate.
As a first dimension, algorithms can be classified depending on the conceptual models in which ontologies or schemas are expressed. For example, the Artemis [16] system supports the relational, OO, and ER models; Cupid [47] supports XML and relational models; QOM [24] supports RDF and OWL models. A second possible dimension depends on the kind of data that the algorithms exploit: different approaches exploit different information from the input conceptual models. Some of them rely only on schema-level information (e.g., Cupid [47], COMA [26]), others rely only on instance data (e.g., GLUE [27]), and others exploit both schema- and instance-level information (e.g., QOM [32]). Even with the same data models, matching systems do not always use all available constructs. In general, some algorithms focus on the labels assigned to the entities, some consider their internal structure and the type of their attributes, and some others consider their relations with other entities.

Process dimensions. A classification of the matching process can be based on its general properties, provided that we restrict ourselves to formal algorithms. In particular, it depends on the approximate or exact nature of its computation. Exact algorithms compute the absolute solution to a problem; approximate algorithms sacrifice exactness for performance (e.g., [32]). All the techniques discussed in the remainder of the paper can be either approximate or exact. Another dimension for analyzing the matching algorithms is based on the way they interpret the input data. Three large classes are often distinguished, based respectively on the intrinsic input, on external resources, or on some semantic theory of the considered entities. We call these three classes syntactic, external, and semantic respectively, and discuss them in detail in the next section.

Output dimensions.
Apart from the information that matching systems exploit and how they manipulate it, the other important class of dimensions concerns the form of the result they produce. The form of the alignment might be of importance: is it a one-to-one correspondence between the ontology entities? Does it have to be a final mapping? Is any relation suitable? Other significant distinctions in the output results have been indicated in [40]. One dimension concerns whether systems deliver a graded answer, e.g., that the correspondence holds with 98% confidence or 4/5 probability, or an all-or-nothing answer, e.g., that the correspondence definitely holds or not. In some approaches, correspondences between schema/ontology entities are determined using distance measures. The result is an alignment expressing equivalence between these entities, in which the actual distance is the basis for generating a confidence measure for each correspondence, usually in the [0,1] range [35, 47]. Another dimension concerns the kind of relations between aligned entities a system can provide. Most systems focus on equivalence (=), while a few others are able to provide a more expressive result (e.g., equivalence, subsumption (⊑), incompatibility (⊥); see [41] for details).

In conclusion, there are many dimensions that can be taken into account when attempting to classify matching methods. In the next section we present a classification of elementary techniques that draws simultaneously on several such criteria.

4.3 A retained classification of elementary schema-based matching approaches

This section mainly discusses schema-based elementary matchers. Therefore, only schema/ontology information is considered, not instance data. The exact/approximate opposition has not been used because each of the methods described below can be implemented as an exact or an approximate algorithm, depending on the goals of the matching system.
To ground our classification and ensure comprehensive coverage, we have analyzed state-of-the-art approaches used for schema-based matching. The references section reports a partial list of the works which have been scrutinized, pointing to (some of) the most important contributions. We have used the following guidelines for building our classification:

– Exhaustivity. The extension of categories dividing a particular category must cover its extension (i.e., their aggregation should give the complete extension of the category);
– Disjointness. In order to have a proper tree, the categories dividing one category should be pairwise disjoint by construction;
– Homogeneity. In addition, the criterion used for further dividing one category should be of the same nature (i.e., should come from the same dimension). This usually helps to guarantee disjointness;
– Saturation. Classes of concrete matching techniques should be as specific and discriminative as possible in order to provide fine-grained distinctions between possible alternatives. These classes have been identified following a saturation principle: they have been added/modified until saturation was reached, namely until taking into account new techniques did not require introducing new classes or modifying existing ones.

Notice that disjointness and exhaustivity of the categories ensure stability of the classification, namely new techniques will not fall in between two categories. Classes of matching techniques represent the state of the art. Obviously, as new techniques appear, they might be extended and further detailed.

The classification presented here extends the previous work on classifying automated schema matching approaches of [62]. The classification of [62] distinguishes between elementary (individual) matchers and combinations of matchers. Elementary matchers comprise instance-based and schema-based, element- and structure-level, linguistic and constraint-based matching techniques.
Cardinality and auxiliary information (e.g., thesauri, global schemas) can also be taken into account.

For classifying elementary schema-based matching techniques, we reuse two synthetic classifications (see Figure 2), based on what we have found to be the most salient properties of the matching dimensions. These two classifications are presented as two trees sharing their leaves. The leaves represent classes of elementary matching techniques and their concrete examples. The two synthetic classifications are:

– The Granularity/Input Interpretation classification is based on (i) the granularity of the match, i.e., element- or structure-level, and then (ii) on how the techniques generally interpret the input information;
– The Kind of Input classification is based on the kind of input which is used by elementary matching techniques.

The overall classification of Figure 2 can be read both in a descending manner (focusing on how the techniques interpret the input information) and in an ascending manner (focusing on the kind of manipulated objects) in order to reach the layer of basic techniques. We discuss in turn the Granularity/Input Interpretation, Basic Techniques, and Kind of Input layers, together with supporting arguments for the categories/classes introduced at each layer.

Elementary matchers are distinguished by the upper layer according to the following classification criteria:

– Element-level vs structure-level. Element-level matching techniques compute mapping elements by analyzing entities in isolation, ignoring their relations with other entities. Structure-level techniques compute mapping elements by analyzing how entities appear together in a structure. This criterion is the same as first introduced in [62].
– Syntactic vs external vs semantic. The key characteristic of the syntactic techniques is that they interpret the input solely as a function of its structure, following some clearly stated algorithm.
External techniques are those that exploit auxiliary (external) resources of a domain and common knowledge in order to interpret the input. These resources might be human input or some thesaurus expressing the relationships between terms. The key characteristic of the semantic techniques is that they use some formal semantics (e.g., model-theoretic semantics) to interpret the input.

Fig. 2. A retained classification of elementary schema-based matching approaches

To make the distinctions between the proposed categories clearer, we mark in bold type the innovations with regard to the initial classification of [62]. In particular, in the upper level we further detail (with respect to [62]) the element- and structure-level of matching by introducing the syntactic vs semantic vs external criteria. The reasons for having these three categories are as follows. Our initial criterion was to distinguish between internal and external techniques. By internal we mean techniques exploiting information which comes only with the input schemas/ontologies. External techniques are as defined above. Internal techniques can be further detailed by distinguishing between syntactic and semantic interpretation of the input, also as defined above. The same distinction can be introduced for the external techniques only to a limited extent. In fact, we can qualify some oracles (e.g., WordNet [55], DOLCE [38]) as syntactic or semantic, but not a user’s input. Thus, we do not detail external techniques any further, and we omit in Figure 2 the theoretical category of internal techniques (as opposed to external). Notice that we also omit element-level semantic techniques in further discussions: since the semantics is usually given in a structure, there are no element-level semantic techniques.

Distinctions between classes of elementary matching techniques in the middle layer of our classification are motivated by the way a matching technique interprets the input information in each concrete case.
In particular, a label can be interpreted as a string (a sequence of letters from an alphabet) or as a word or a phrase in some natural language; a hierarchy can be considered as a graph (a set of nodes related by edges) or as a taxonomy (a set of concepts having a set-theoretic interpretation, organized by a relation which preserves inclusion). Thus, we introduce the following classes of elementary schema/ontology matching techniques at the element-level: string-based, language-based, based on linguistic resources, constraint-based, alignment reuse, and based on upper level ontologies. At the structure-level we distinguish between: graph-based, taxonomy-based, based on repositories of structures, and model-based techniques.

The lower layer classification is concerned with the type of input considered by a particular technique:

– The first level is categorized depending on which kind of data the algorithms work on: strings (terminological), structure (structural) or models (semantics). The first two are found in the ontology descriptions; the last one requires some semantic interpretation of the ontology and usually uses some semantically compliant reasoner to deduce the correspondences.
– The second level of this classification further decomposes these categories where necessary: terminological methods can be string-based (considering the terms as sequences of characters) or based on the interpretation of these terms as linguistic objects (linguistic). The structural methods category is split into two types of methods: those which consider the internal structure of entities (e.g., attributes and their types) and those which consider the relation of entities with other entities (relational).

Notice that, following the above-mentioned guidelines for building a classification, the terminological category should be divided into linguistic and non-linguistic techniques. However, since non-linguistic techniques are all string-based, this category has been discarded.
Below, we discuss the main alternatives in more detail according to the above overall classification, also indicating in which matching systems they were exploited. The order follows that of the Granularity/Input Interpretation classification. Finally, in Figure 2, the techniques marked in italic (techniques based on upper level ontologies and DL-based techniques) have not been implemented in any matching system yet. However, we argue why their appearance seems reasonable in the near future.

4.4 Matching techniques

String-based techniques are often used in order to match names and name descriptions of schema/ontology entities. These techniques consider strings as sequences of letters in an alphabet. A comparison of different string matching techniques, from distance-like functions to token-based distance functions, can be found in [17]. Usually, distance functions map a pair of strings to a real number, where a smaller value indicates a greater similarity between the strings. Some examples of string-based techniques which are extensively used in matching systems are prefix, suffix, edit distance, and n-gram tests.

– Prefix. This test takes as input two strings and checks whether the first string starts with the second one. Prefix is efficient in matching cognate strings and similar acronyms (e.g., int and integer); see, for example, [32, 35, 42, 46]. This test can be turned into a smoother distance by measuring the size of the common prefix relative to the length of the strings.
– Suffix. This test takes as input two strings and checks whether the first string ends with the second one (e.g., phone and telephone); see, for example, [42, 26, 47, 53].
– Edit distance. This test takes as input two strings and computes the edit distance between them, that is, the number of insertions, deletions, and substitutions of characters required to transform one string into the other, normalized by max(length(string1), length(string2)).
For example, the edit distance between NKN and Nikon is 0.4 (ignoring case, two insertions over a maximum length of five). Some of the matching systems exploiting this test are [42, 26, 46].
– N-gram. This test takes as input two strings and calculates the number of n-grams (i.e., sequences of n characters) they have in common. For example, the trigrams (n = 3) of the string nikon are nik, iko, and kon, so the distance between nkon and nikon would be 1/3. Some of the matching systems exploiting this test are [42, 26].

Language-based techniques consider names as words in some natural language (e.g., English). They are based on Natural Language Processing (NLP) techniques exploiting morphological properties of the input words.

– Tokenization. Names of entities are parsed into sequences of tokens by a tokenizer which recognizes punctuation, cases, blank characters, digits, etc. (e.g., Hands-Free Kits → <hands, free, kits>); see, for example, [41].
– Lemmatization. The strings underlying tokens are morphologically analyzed in order to find all their possible basic forms (e.g., Kits → Kit); see, for example, [41].
– Elimination. The tokens that are articles, prepositions, conjunctions, and so on, are marked (by some matching algorithms, e.g., [47]) to be discarded.

Usually, the above-mentioned techniques are applied to names of entities before running string-based or lexicon-based techniques in order to improve their results. However, we consider these language-based techniques as a separate class of matching techniques, since they extend naturally to a distance computation (by comparing the resulting strings or sets of strings).

Constraint-based techniques are algorithms which deal with the internal constraints applied to the definitions of entities, such as types, cardinality of attributes, and keys. We omit here a discussion of matching keys, as these techniques appear in our classification without changes with respect to the original publication [62]. However, we provide a different perspective on matching datatypes and cardinalities.
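As a brief aside on the string-based tests described earlier in this section (prefix, suffix, edit distance, n-gram), the following is a minimal Python sketch, not the implementation of any of the cited systems. The function names are hypothetical. Two choices are assumptions made here so that the sketch reproduces the worked values in the text: strings are lower-cased before the edit distance is computed, and the n-gram score is the number of shared n-grams divided by the larger of the two n-gram counts.

```python
def prefix_test(first, second):
    """True if `first` starts with `second` (e.g. integer / int)."""
    return first.startswith(second)


def suffix_test(first, second):
    """True if `first` ends with `second` (e.g. telephone / phone)."""
    return first.endswith(second)


def normalized_edit_distance(s1, s2):
    """Levenshtein distance divided by the length of the longer string.

    Assumption: comparison is case-insensitive.
    """
    a, b = s1.lower(), s2.lower()
    if not a and not b:
        return 0.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1] / max(len(a), len(b))


def ngrams(text, n=3):
    """Set of all substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_score(s1, s2, n=3):
    """Shared n-grams divided by the larger n-gram count (assumed normalization)."""
    grams1, grams2 = ngrams(s1, n), ngrams(s2, n)
    if not grams1 or not grams2:
        return 0.0
    return len(grams1 & grams2) / max(len(grams1), len(grams2))
```

With these definitions, `normalized_edit_distance("NKN", "Nikon")` gives 0.4 and `ngram_score("nkon", "nikon")` gives 1/3, matching the examples in the text. Note that the n-gram value computed this way grows with the overlap between the strings, so it behaves as a similarity rather than a distance.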
– Datatypes comparison involves comparing the various attributes of a class with regard to the datatypes of their values. Contrary to objects, which require interpretations, datatypes can be considered objectively, and it is possible to determine how close one datatype is to another (ideally this can be based on the interpretation of datatypes as sets of values and the set-theoretic comparison of these datatypes; see [67, 68]). For instance, the datatype day can be considered closer to the datatype working day than to the datatype integer. This technique is used in [36].
– Multiplicity comparison. Attribute values can be collected by a particular construction (set, list, multiset) to which cardinality constraints are applied. Again, it is possible to compare the datatypes so constructed by comparing (i) the datatypes on which they are constructed and (ii) the cardinalities that are applied to them. For instance, a set of between 2 and 3 children is closer to a set of 3 people than to a set of 10-12 flowers (if children are people). This technique is used in [36].

Linguistic resources such as common knowledge or domain specific thesauri are used in order to match words (in this case names of schema/ontology entities are considered as words of a natural language) based on linguistic relations between them (e.g., synonyms, hyponyms).

– Common knowledge thesauri. The approach is to use common knowledge thesauri to obtain the meaning of terms used in schemas/ontologies. For example, WordNet [49] is an electronic lexical database for English (and other languages), where various senses (possible meanings of a word or expression) of words are put together into sets of synonyms. Relations between schema/ontology entities can be computed in terms of bindings between WordNet senses; see, for instance, [13, 41]. Another type of matcher exploiting thesauri is based on their structural properties, e.g., WordNet hierarchies.
In particular, hierarchy-based matchers measure the distance between two concepts in a given hierarchy, for example, by counting the number of arcs traversed; see [42]. Several other distance measures for thesauri have been proposed in the literature, e.g., [61, 64].
– Domain specific thesauri. This kind of thesaurus usually stores some specific domain knowledge which is not available in the common knowledge thesauri (e.g., proper names) as entries with synonym, hypernym and other relations; see, for instance, [47].

Alignment reuse techniques represent an alternative way of exploiting external resources, which in this case contain alignments of previously matched schemas/ontologies. For instance, when we need to match schemas/ontologies o1 and o3, we can reuse the alignments between o1 and o2 and between o2 and o3 from an external resource which stores the results of previous match operations. Alignment reuse is motivated by the intuition that many schemas/ontologies to be matched are similar to already matched schemas/ontologies, especially if they describe the same application domain. These techniques are particularly promising when dealing with large schemas/ontologies consisting of hundreds or thousands of entities. In these cases, large match problems are first decomposed into smaller sub-problems, thus generating a set of problems of matching schema/ontology fragments. Then, reusing previous match results can be applied more effectively at the level of schema/ontology fragments than at the level of entire schemas/ontologies. The approach was first introduced in [62], and was later implemented as two matchers, which reuse alignments of (i) entire schemas/ontologies, or (ii) their fragments; see [1, 26, 63] for details.

Upper level formal ontologies can also be used as external sources of common knowledge. Examples are the Suggested Upper Merged Ontology (SUMO) [57] and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [38].
The key characteristic of these ontologies is that they are logic-based systems, and therefore matching techniques exploiting them will be based on the analysis of interpretations. Thus, these are semantic techniques. For the moment, we are not aware of any matching systems which use this kind of technique. However, it is quite reasonable to assume that this will happen in the near future. In fact, for example, the DOLCE ontology aims at providing a formal specification (axiomatic theory) for the top level part of WordNet. Therefore, systems which currently exploit WordNet resources in their matching process will most probably also consider DOLCE resources as potential extensions.

Graph-based techniques are graph algorithms which consider the input as labeled graphs. The applications (e.g., database schemas, taxonomies, or ontologies) are viewed as graph-like structures containing terms and their inter-relationships. Usually, the similarity comparison between a pair of nodes from the two schemas/ontologies is based on the analysis of their positions within the graphs. The intuition behind this is that, if two nodes from two schemas/ontologies are similar, their neighbors might also be somehow similar. Below, we present some particular matchers representing this intuition.

– Graph matching. A lot of work has been done on graph (tree) matching in graph theory and also with respect to schema/ontology matching applications; see [65, 73]. Matching graphs is a combinatorial problem that can be computationally expensive. It is usually solved by approximate methods. In schema/ontology matching, the problem is encoded as an optimization problem (finding the graph matching that minimizes some distance, such as the dissimilarity between matched objects), which is then solved with the help of a graph matching algorithm.
This optimization problem is solved through a fix-point algorithm (gradually improving an approximate solution until no further improvement is made). Examples of such algorithms are [53] and [36]. Some other (particular) matchers handling DAGs and trees are children, leaves, and relations.
– Children. The (structural) similarity between inner nodes of the graphs is computed based on the similarity of their children nodes, that is, two non-leaf schema elements are structurally similar if their immediate children sets are highly similar. A more complex version of this matcher is implemented in [26].
– Leaves. The (structural) similarity between inner nodes of the graphs is computed based on the similarity of leaf nodes, that is, two non-leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not; see, for example, [26, 47].
– Relations. The similarity computation between nodes can also be based on their relations; see [48].

Taxonomy-based techniques are also graph algorithms, which consider only the specialization relation. The intuition behind taxonomic techniques is that is-a links connect terms that are already similar (being a subset or superset of each other), and therefore their neighbors may also be somehow similar. This intuition can be exploited in several different ways:

– Bounded path matching. Bounded path matchers take two paths with links between classes defined by the hierarchical relations, compare terms and their positions along these paths, and identify similar terms; see, for instance, [58].
– Super(sub)-concepts rules. These matchers are based on rules capturing the above-stated intuition. For example, if super-concepts are the same, the actual concepts are similar to each other. If sub-concepts are the same, the compared concepts are also similar; see, for example, [23, 32].
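To illustrate the children and leaves matchers described above, here is a minimal Python sketch. It is not the matcher of [26] or [47]. The dictionary-based tree representation, the function names, the two toy schemas, and the use of the Jaccard coefficient over exactly matching labels are all assumptions made for illustration; real systems would compare children and leaves with an underlying element-level similarity instead of label equality.

```python
def leaf_set(tree, node):
    """Labels of all leaves below `node`; a node without children is its own leaf."""
    children = tree.get(node, [])
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= leaf_set(tree, child)
    return leaves


def jaccard(first, second):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = first | second
    if not union:
        return 0.0
    return len(first & second) / len(union)


def children_similarity(tree1, node1, tree2, node2):
    """Compare two inner nodes through their immediate children."""
    return jaccard(set(tree1.get(node1, [])), set(tree2.get(node2, [])))


def leaves_similarity(tree1, node1, tree2, node2):
    """Compare two inner nodes through the leaves they dominate."""
    return jaccard(leaf_set(tree1, node1), leaf_set(tree2, node2))


# Hypothetical schemas: each key maps a node to the list of its children.
schema1 = {
    "Order": ["Customer", "Item"],
    "Customer": ["name", "address"],
    "Item": ["price", "quantity"],
}
schema2 = {
    "PurchaseOrder": ["name", "address", "price", "quantity"],
}

by_children = children_similarity(schema1, "Order", schema2, "PurchaseOrder")
by_leaves = leaves_similarity(schema1, "Order", schema2, "PurchaseOrder")
```

In this toy case `by_children` is 0.0 while `by_leaves` is 1.0: the two elements group the same leaves differently, which is the situation in which the leaves matcher finds a similarity that the children matcher misses.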
Repository of structures stores schemas/ontologies and their fragments together with pairwise similarities (e.g., coefficients in the [0,1] range) between them. Notice that, unlike alignment reuse, a repository of structures stores only similarities between schemas/ontologies, not alignments. In the following, to simplify the presentation, we refer to schemas/ontologies or their fragments as structures. When new structures are to be matched, they are first checked for similarity to the structures which are already available in the repository. The goal is to identify structures which are sufficiently similar to be worth matching in more detail, or for which already existing alignments can be reused, thus avoiding the match operation over dissimilar structures. Obviously, the determination of similarity between structures should be computationally cheaper than matching them in full detail. In order to match two structures, the approach of [63] proposes to use some metadata describing these structures, such as fragment name, root name, number of nodes, maximal path length, etc. These indicators are then analyzed and aggregated into a single coefficient, which estimates the similarity between the structures. For example, schema S1 might be found to be an appropriate match for schema S2 since they both have the same number of nodes.

Model-based algorithms start from the consideration that each element belonging to real-world ontologies represents a meaning, and that, consequently, mappings between such elements have to represent relations between meanings. Allowed relations between ontology elements are those that can express relations between concepts; one of them is ‘disjoint’. Following this intuition, these approaches (i) try to build a formal representation of the meaning to be associated with each ontology node, and (ii) compute the relations between such meanings by comparing them.
Two examples of such methods have been proposed, which differ in the logic used for encoding the meaning: propositional logic (PL) and description logics (DL).

Propositional logic-based (PL). In [13, 40, 41, 76, 77], the authors propose to encode the meaning of each node as a propositional formula. As an example, a node labeled with SIVIGLIA, lying in the two different paths IMAGES/EUROPE/SIVIGLIA and PHOTOS/SPAIN/SIVIGLIA, is translated into the two formulas $\varphi_1 = (\mathsf{IMAGES} \wedge \mathsf{EUROPE} \wedge \mathsf{SIVIGLIA})$ and $\varphi_2 = (\mathsf{PHOTOS} \wedge \mathsf{SPAIN} \wedge \mathsf{SIVIGLIA})$. Mappings are then computed by comparing such meanings using a standard SAT solver. As an example, in order to determine whether the two nodes SIVIGLIA are equivalent, the following problem is built: $\mathit{Axioms} \models \varphi_1 \leftrightarrow \varphi_2$, where $\mathit{Axioms}$ is a possibly empty set of axioms used for improving the results of the comparison. As an example, such a set can contain the following axioms: $\mathsf{PHOTOS} \leftrightarrow \mathsf{IMAGES}$, $\mathsf{SIVIGLIA} \rightarrow \mathsf{EUROPE}$, $\mathsf{SIVIGLIA} \rightarrow \mathsf{SPAIN}$. In [13, 40, 41], WordNet is used as a source of external knowledge. Notice that SAT deciders are correct and complete decision procedures for propositional satisfiability, and therefore they can be used for an exhaustive check of all the possible mappings.

DL-based techniques. The propositional language is limited in its expressivity for codifying element meanings, as it cannot express, for example, properties or roles between concepts. As an example, it cannot codify the different meanings expressed by the two following paths: ART/EUROPE/WEST/SPAIN/SOUTH and ART/EUROPE/SOUTH/SPAIN/WEST, which intuitively represent the arts of southern and western Spain respectively. Using the PL-based approaches described in the previous paragraph, an undesired equivalence relation is obtained, since both paths translate into a conjunction of the same five atoms. In order to overcome these limitations, a new system for encoding such meanings using description logics has been proposed in [74], as an extension of [13].
In this case, the two paths are encoded as $\mathsf{ART} \sqcap \exists \mathsf{producedIn}.((\mathsf{SPAIN} \sqcap \exists \mathsf{portion}.\mathsf{SOUTH}) \sqcap \exists \mathsf{partOf}.(\mathsf{EUROPE} \sqcap \exists \mathsf{portion}.\mathsf{WEST}))$ and $\mathsf{ART} \sqcap \exists \mathsf{producedIn}.((\mathsf{SPAIN} \sqcap \exists \mathsf{portion}.\mathsf{WEST}) \sqcap \exists \mathsf{partOf}.(\mathsf{EUROPE} \sqcap \exists \mathsf{portion}.\mathsf{SOUTH}))$ respectively. Such DL terms respect the difference between node meanings, so that no wrong entailment can be reached and, consequently, no mapping is computed. Furthermore, [74] offers the possibility to add any consistent OWL ontology as further external knowledge.

There are other relevant examples in the literature of DL-based techniques used in schema/ontology integration applications. For example, in the spatio-temporal database integration scenario, first motivated in [60] and later developed in [66], the inter-schema mappings are initially proposed by the integrated schema designer and are encoded together with the input schemas in the ALCRP language. Then, DL reasoning services are used to check the satisfiability of the two source schemas and the set of inter-schema mappings. If some objects are found to be unsatisfiable, then the inter-schema mappings should be reconsidered.

Another example is the use of DL-based techniques in the query processing scenario [54]. The approach assumes that mappings between pre-existing domain ontologies are already specified in a declarative manner (e.g., manually). User queries are rewritten in terms of the pre-existing ontologies, expressed in Classic [12], and further evaluated against real-world repositories, which also subscribe to the pre-existing ontologies. An earlier approach for query answering by terminological reasoning is described in [7].

Finally, a problem very similar to schema/ontology matching is addressed within the system developed for matchmaking in electronic marketplaces [22]. Demand D and supply S requests are translated from natural language sentences into Classic [9]. The approach assumes the existence of a pre-defined domain ontology T, which is also encoded in Classic.
Matchmaking between a supply S and a demand D is performed with respect to the pre-defined domain ontology T. Reasoning is performed with the help of the NeoClassic reasoner in order to determine an exact match ($T \models (D \sqsubseteq S)$ and $T \models (S \sqsubseteq D)$), a potential match (if $D \sqcap S$ is satisfiable in T), and a near miss (if $D \sqcap S$ is unsatisfiable in T). The system also provides a logic-based operation for ranking the matching results.

The main difference between these approaches and [74] is that they seem to presuppose the existence of inter-schema/inter-ontology mappings, and do not support their discovery. DL reasoners are then used only to enable integration by reasoning on these mappings.

5. Systems descriptions

The various methods presented above in isolation have been put together in order to implement ontology alignment or schema matching systems. There are a number of available systems that can be seen as addressing ontology alignment. We present some of them below through their principles and availability. Some of the following systems are developed by the project partners and will thus be available for benchmarking in the future. Some comparisons of these systems have been made, in particular in [25, 62, 44, 60]. Our purpose here is not really to compare them, but rather to show their variety.

5.1 Prompt and Anchor-Prompt (Stanford SMI)

Anchor-PROMPT [58] (an extension of PROMPT, also formerly known as SMART) is an ontology merging and alignment tool with a sophisticated prompt mechanism for possible matching terms. The Anchor-PROMPT alignment algorithm takes as input two ontologies and a set of anchors, i.e., pairs of related terms, which are identified with the help of string-based techniques, defined by a user, or produced by another matcher computing linguistic (dis)similarity between frame names (labels at nodes), for example [51]. Then it refines them based on the ontology structures and user feedback.
It constructs a directed labeled graph representing the ontology from the hierarchy of concepts (called classes in the algorithm) and the hierarchy of relations (called slots in the algorithm), where nodes in the graph are concepts and arcs are relations denoting relationships between concepts (the labels are the names of the relations). An initial list of anchors (pairs of related concepts), defined by the users or automatically identified by lexical matching, is the input for the algorithm. Anchor-PROMPT then analyzes the paths in the sub-graph limited by the anchors and determines which concepts frequently appear in similar positions on similar paths. Based on these frequencies, the algorithm decides whether these concepts are semantically similar. The PROMPT and Anchor-PROMPT systems have also contributed to the design of other algorithms such as PROMPTDiff, which finds differences between two ontologies and provides the editing operations for transforming one ontology into another.

5.2 Chimaera (Stanford KSL)

Chimaera is an environment for merging and testing (diagnosing) large ontologies [51]. Matching in the system is performed as one of the major subtasks of a merge operator. Chimaera searches for merging candidates as pairs of matching terms, involving term names, term definitions, possible acronyms and expanded forms, and names that appear as suffixes of other names. It also has techniques to identify terms that should be related by subsumption, disjointness, etc.

5.3 Rondo (Stanford U./U. Leipzig)

Rondo [52] is an environment for model (e.g., database schema) engineering which provides many unit primitives for manipulating models (extract, restrict, delete) and ways to compose them. Among the unit primitives is the implementation of Similarity flooding (see § 4.2.1). It converts schemas (SQL DDL, XML) into directed labeled graphs whose nodes are candidate aligned pairs and arcs are shared properties. Arcs are weighted by their relevance to the nodes.
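The graph of candidate pairs and the fix-point computation performed on it can be sketched as follows. This is a simplified illustration, not Rondo's code: schemas are assumed to be given as lists of (source, label, target) triples, all function names are hypothetical, the weight of an arc is assumed to be shared equally among the same-label arcs leaving a pair, and the update rule (add the weighted scores of the neighbours, then normalize by the maximum) is only one of the possible fix-point formulas.

```python
from collections import defaultdict


def pairwise_connectivity_graph(g1, g2):
    """Arcs between candidate pairs: (a, b) -p-> (a2, b2) if a -p-> a2 and b -p-> b2."""
    return [((a, b), p, (a2, b2))
            for (a, p, a2) in g1
            for (b, q, b2) in g2
            if p == q]


def propagation_weights(pcg):
    """Weight arcs (and their reverses) by 1 / number of same-label arcs leaving a pair."""
    out = defaultdict(list)
    for src, p, dst in pcg:
        out[(src, p, "forward")].append(dst)
        out[(dst, p, "backward")].append(src)
    weights = defaultdict(float)
    for (node, _label, _direction), targets in out.items():
        for target in targets:
            weights[(node, target)] += 1.0 / len(targets)
    return weights


def similarity_flooding(g1, g2, initial=None, max_iter=1000, eps=1e-6):
    """Iterate sigma <- normalize(sigma + propagated sigma) until it stabilizes."""
    pcg = pairwise_connectivity_graph(g1, g2)
    nodes = {n for src, _p, dst in pcg for n in (src, dst)}
    if not nodes:
        return {}
    weights = propagation_weights(pcg)
    sigma = {n: (initial or {}).get(n, 1.0) for n in nodes}
    for _ in range(max_iter):
        nxt = dict(sigma)
        for (src, dst), w in weights.items():
            nxt[dst] += w * sigma[src]
        top = max(nxt.values())
        nxt = {n: v / top for n, v in nxt.items()}
        delta = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if delta < eps:
            break
    return sigma


# Two small hypothetical models sharing the arc labels l1 and l2.
model_a = [("a", "l1", "a1"), ("a", "l1", "a2"), ("a1", "l2", "a2")]
model_b = [("b", "l1", "b1"), ("b", "l2", "b2"), ("b2", "l2", "b1")]

scores = similarity_flooding(model_a, model_b)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

On these two models the pair (a, b) ends up with the highest score. An alignment would then be extracted from the scores by a separate filtering step, which is not shown here.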
5.4 MoA (ETRI)

MoA is an environment for merging ontologies developed by the Electronics and Telecommunications Research Institute (ETRI) in South Korea. It is a library of methods and a shell for using them. It can work on OWL (the documentation does not state which flavor) and contains methods for importing, aligning, modifying and merging ontologies. Unfortunately, the methods are not documented, beyond the fact that they are based on (dis)similarity. The system uses Jena and WordNet.

5.5 Cupid (Microsoft Research)

The Cupid system [47] implements a generic schema matching algorithm combining linguistic and structural schema matching techniques, and computes normalized similarity coefficients with the assistance of a precompiled thesaurus. Input schemas are encoded as graphs. Nodes represent schema elements and are traversed in a combined bottom-up and top-down manner. The matching algorithm consists of three phases and operates only on tree structures, to which non-tree cases are reduced. The first phase (linguistic matching) computes linguistic similarity coefficients between schema element names (labels) based on morphological normalization, categorization, string-based techniques and a thesaurus look-up. The second phase (structural matching) computes structural similarity coefficients which measure the similarity between the contexts in which individual schema elements occur in the schemas under consideration. The main idea behind the structural matching algorithm is to rely more on leaf-level matches than on the immediate descendants or intermediate substructures when computing the similarity between non-leaf elements. The third phase (mapping generation) computes weighted similarity coefficients and generates the final mappings by choosing pairs of schema elements whose weighted similarity coefficients are higher than a threshold. According to [47], Cupid performs better, in terms of mapping quality, than the other hybrid matchers, e.g., Dike [59] and Artemis (see 5.9).

5.6 Glue (U. of Washington)

Glue [28] is an evolved version of LSD [29] whose goal is to semi-automatically find schema mappings for data integration. Like its ancestor LSD, Glue uses machine learning techniques to find mappings [30]. It first applies statistical analysis to the available data (joint probability distribution computation). Then it generates a similarity matrix, based on the probability distributions, for the data considered and uses “constraint relaxation” in order to obtain an alignment from the similarity. The algorithm works in three steps:

• learning distributions: this first phase, described above, learns the joint probability distributions of the classes of each ontology;
• similarity estimation: the system estimates the similarity between two classes as a function of their joint probability distributions;
• relaxation: this step produces an alignment from the similarity matrix by using heuristic rules for choosing the most likely correspondences.

5.7 FCA-merge (U. Karlsruhe)

FCA-merge [68] uses formal concept analysis techniques to merge two ontologies sharing the same set of instances. The overall process of merging two ontologies consists of three steps: 1. instance extraction, 2. concept lattice computation, and 3. interactive generation of the final merged ontology. In theory, the algorithm merges two ontologies sharing the same set of instances. However, the authors provide, as a first step, methods for extracting the instances from documents. The extraction of instances from text documents circumvents the problem that in most applications there are no individuals which are simultaneously instances of the source ontologies, and which could be used as a basis for identifying similar concepts. The computation of the lattice starts with two ontologies and instances belonging to both ontologies. From these, it computes two formal contexts, i.e., boolean tables indicating which instance belongs to which concept of the ontology.
It then merges both contexts (by renaming the concepts and adding both contexts). Using classical formal concept analysis (i.e., the closure of an instances×properties Galois connection [39]) on contexts made of instances×concepts, the method generates a pruned concept lattice. The lattice is pruned of all the concepts which are not more general than a concept of one of the ontologies. The last step consists in helping a user to further simplify the lattice and generate the taxonomy of an ontology. The produced result is explored and transformed into a merged ontology by the ontology engineer. This final step of deriving the merged ontology from the concept lattice requires human interaction. The result is a merge rather than an alignment. However, the concepts that are merged can be considered as exactly aligned, and those which are not can be considered to be in a subsumption relation with their ancestors or siblings.

5.8 IF-Map

Another system inspired by formal concept analysis is IF-Map [44]. It is an automatic method for ontology mapping based on the Barwise-Seligman theory of information flow [3]. The basic principle of IF-Map is to align two local ontologies by looking at how these are mapped from a common reference ontology. It is assumed that such a reference ontology is not populated with instances, while local ontologies usually are. IF-Map generates possible mappings between an unpopulated reference ontology and a populated local ontology by taking into account how local communities classify instances with respect to their local ontologies.

5.9 Artemis (U. Milano/U. Modena and Reggio Emilia)

Artemis (Analysis of Requirements: Tool Environment for Multiple Information Systems) [16] was designed as a module of the MOMIS mediator system [6, 7] for creating global views. Artemis does not cover all the issues of matching, because its original function was schema integration.
The matching algorithm performs affinity-based analysis and hierarchical clustering of source schema elements. Affinity-based analysis is carried out through the computation of name, structural and global affinity coefficients by exploiting a common thesaurus. The common thesaurus presents a set of terminological and extensional relationships which depict intra- and inter-schema knowledge about classes and attributes of the input schemas, and it is built with the help of WordNet [55] and ODB-Tools [5]. A hierarchical clustering technique exploiting global affinity coefficients categorizes classes into groups at different levels of affinity. For each cluster it creates a global class with a set of global attributes. The logical correspondence between the attributes of a global class and the source attributes is determined through a mapping table.

5.10 T-tree (INRIA Rhône-Alpes)

Troeps [50] was a knowledge representation system enabling several class taxonomies (called viewpoints) over the same set of objects, and bridges between these classes expressing equivalence or subsumption. T-tree [33] is an environment for generating taxonomies and classes from objects (instances). It can, in particular, infer dependencies between classes (bridges) of different ontologies sharing the same set of instances, based only on the “extension” of classes. An algorithm has been developed which is able to infer bridges. The bridge inference algorithm, given a set of source viewpoints and a destination viewpoint (built by T-tree or by any other means), returns all the bridges (in a minimal fashion) which are satisfied by the available data, that is, the set of bridges for which the objects in every source class are indeed in the destination class. The algorithm compares the extension (set of instances) of the presumed destination to the intersection of those of the presumed source classes.
If there is no inclusion of the latter in the former, the algorithm is re-iterated on all the sets of source classes which contain at least one class which is a sub-class of the tested source classes. If the intersection of the extensions of the presumed source classes is included in that of the presumed destination class, then a bridge can be established from the source classes (and also from any set of sub-classes of the source classes) to the destination class (and also to any super-class of the destination class). But other bridges can exist on the sub-classes of the destination. The algorithm is thus re-iterated on them. It stops when the bridge is trivial, i.e., when the source is empty. The algorithm is extension-correct (only valid bridges are inferred), extension-complete (all valid bridges are inferred) and extension-minimal (only more general bridges are inferred). The proof is carried out in the classification scheme framework, and the “extension-” prefix just indicates that only the extension of the classes is considered (the algorithm tests set inclusion on classes). Thus these results are not semantically grounded. For instance, is it a coincidence that all directors have formerly been at the same university? Maybe, maybe not. Hence the user has to decide on the validity of the inferred bridges. This has to be contrasted with a stronger kind of bridge inference based on the structural constraints on classes. But indeed, any possible bridge compliant with the current set of objects and the semantics must be a restriction of one of the bridges provided by the algorithm. Bridge inference is nothing else than the search for correlation between two sets of variables. This correlation is particular from a data analysis point of view, since it does not need to be valid on the whole set of individuals (the algorithm looks for subsets under which the correlation is valid) and it is based on strict set equality (not similarity).
However, even if the bridge inference algorithm has been described with set inclusion, it can be helped by other measurements which will narrow or broaden the search. More generally, the inclusion and emptiness tests can be replaced by tests based on the similarity of two sets of objects (as is usual in data analysis). In fact, many parameters can be taken into account when inferring bridges; for that purpose, the algorithm is parameterized by the meaning of the operators $\vee$, $\cap$ and the $= \emptyset$ test. A second version of the algorithm (with the same properties) was made available and used structural comparison: $\vee$ is subtyping, $\cap$ is type intersection and the $= \emptyset$ test is a sub-typing test.

5.11 CtxMatch (U. Trento / ITC-IRST)

CtxMatch is a model-based schema/ontology matcher which computes a mapping (namely a set of point-to-point relations) between schema/ontology elements. CtxMatch (i) encodes the meaning of each node in a formal language, and (ii) computes a mapping between elements by comparing their meanings. The accepted relations between schema/ontology elements are those between concepts introduced for model-based algorithms in section 4.4, one of which is ‘disjoint’. In the first version of CtxMatch [13], the algorithm used propositional logic for approximating the element meanings. In order to overcome the limitations inherent in the expressivity of the propositional language, as described in section 4.4 on model-based systems, a new encoding is proposed in [74], where the meaning of the nodes is encoded into description logics terms.

Figure 3 shows the architecture of the CtxMatch system. It has two main modules: Context Creator and Context Matcher. The first takes as input a schema with attributes and returns a context. Currently accepted formats are: XML, XMLSchema, OWL, plain text.
The main steps for the transformation into a context are:

• Syntactic normalization: different input formats are transformed into a unique (internal) format (the syntactically normalized schema).
• Semantic normalization: essentially, a syntactically normalized schema is transformed into a rich schema (called context) where each node is associated with a description logics term representing its contextual meaning (its meaning in the context of that schema). For example, a (path in a) schema S1 composed of the nodes VACATION[location=’Italy’]/HONEYMOON/INSURANCES is associated with three DL terms, $\mathsf{VACATION} \sqcap \exists \mathsf{location}.\mathsf{ITALY}$, $\mathsf{HONEYMOON} \sqcap \exists \mathsf{location}.\mathsf{ITALY}$ and $\mathsf{INSURANCES} \sqcap \exists \mathsf{against}.(\mathsf{HONEYMOON} \sqcap \exists \mathsf{location}.\mathsf{ITALY})$, representing the meanings of the three nodes VACATION, HONEYMOON and INSURANCES respectively.

Fig. 3. The CtxMatch architecture.

Semantic normalization invokes the following services:

• Lexical Knowledge Extractor: this service takes as input a lexicon repository and a word, and returns the concepts possibly denoted by the word with respect to the lexicon repository. Currently, WordNet is used as the only source of lexical knowledge.
• Background Knowledge Extractor: this service takes as input an ontology repository and a pair of concepts, and returns the relation that holds between the concepts with respect to the ontology repository. Currently, the ontological part of WordNet and some OWL ontologies are used as repository. In any case, any consistent OWL ontology can be added as a further resource.
• Label Parser: this service takes as input a label and returns a DL term expressing its meaning. The label parser preprocesses the label elements (words) using a standard lemmatizer and a POS tagger. Then, it creates a label-specific parse tree, expressing the logical dependencies between label elements.
• Path Composer: this service takes as input a set of labels belonging to different nodes and combines them in order to express the meaning of the node.
The Context Matcher takes two contexts (semantically normalized schemas) and returns a mapping between them (i.e., a collection of pointwise relations between elements of the two schemas). In the current version of the system, the RACER DL reasoner [75] is used to compute mappings. The Background Knowledge Extractor is further exploited in order to possibly improve the meaning comparison. As an example, imagine a schema S2 containing the following three nodes, POLICIES/HONEYMOON/ITALY. The Context Creator returns the DL terms $\mathsf{POLICIES}$, $\mathsf{POLICIES} \sqcap \exists \mathsf{against}.\mathsf{HONEYMOON}$ and $\mathsf{POLICIES} \sqcap \exists \mathsf{against}.(\mathsf{HONEYMOON} \sqcap \exists \mathsf{location}.\mathsf{ITALY})$. The Context Matcher returns an equivalence relation between the node INSURANCES of the previous schema and the node ITALY, as the DL terms $\mathsf{INSURANCES} \sqcap \exists \mathsf{against}.(\mathsf{HONEYMOON} \sqcap \exists \mathsf{location}.\mathsf{ITALY})$ and $\mathsf{POLICIES} \sqcap \exists \mathsf{against}.(\mathsf{HONEYMOON} \sqcap \exists \mathsf{location}.\mathsf{ITALY})$ are provably equivalent (modulo lexical knowledge).

5.12 S-MATCH (U. Trento)

S-Match is a schema/ontology matching system that implements the semantic matching approach [41, 13]. It takes two graph-like structures (e.g., database schemas or ontologies) as input and returns, as output, semantic relations between the nodes of the graphs that correspond semantically to each other. Possible semantic relations are: equivalence ($=$), more general ($\sqsupseteq$), less general ($\sqsubseteq$), mismatch ($\perp$) and overlapping ($\sqcap$). The current version of S-Match is a rationalized re-implementation of the CtxMatch system [13] with a few added functionalities. S-Match is schema-based and, as such, it does not exploit the information encoded in data instances. S-Match is a hybrid system performing composition of element-level techniques. At present, S-Match can handle only tree-like structures (e.g., taxonomies or concept hierarchies). S-Match was designed and developed as a platform for semantic matching, namely a highly modular system whose core computes semantic relations and in which single components can be plugged, unplugged or suitably customized.
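The way a semantic relation between two nodes can be decided by propositional reasoning, as in the PL-based approach of section 4.4, can be illustrated with a minimal sketch. It is not the S-Match or CtxMatch code: a brute-force truth-table check stands in for a SAT solver, formulas are encoded as Python functions over a truth assignment, and the function names and the order in which relations are tested are assumptions made for the example.

```python
from itertools import product


def entails(atoms, axioms, formula):
    """True iff every assignment satisfying all axioms also satisfies `formula`."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(axiom(v) for axiom in axioms) and not formula(v):
            return False
    return True


def semantic_relation(atoms, axioms, c1, c2):
    """Return the strongest relation holding between the formulas c1 and c2."""
    less = entails(atoms, axioms, lambda v: (not c1(v)) or c2(v))  # c1 -> c2
    more = entails(atoms, axioms, lambda v: (not c2(v)) or c1(v))  # c2 -> c1
    if less and more:
        return "equivalence"
    if less:
        return "less general"
    if more:
        return "more general"
    if entails(atoms, axioms, lambda v: not (c1(v) and c2(v))):
        return "mismatch"
    return "overlapping"


# The two SIVIGLIA nodes of the paths IMAGES/EUROPE/SIVIGLIA and PHOTOS/SPAIN/SIVIGLIA.
atoms = ["IMAGES", "PHOTOS", "EUROPE", "SPAIN", "SIVIGLIA"]


def phi1(v):
    return v["IMAGES"] and v["EUROPE"] and v["SIVIGLIA"]


def phi2(v):
    return v["PHOTOS"] and v["SPAIN"] and v["SIVIGLIA"]


# Background axioms: PHOTOS <-> IMAGES, SIVIGLIA -> EUROPE, SIVIGLIA -> SPAIN.
axioms = [
    lambda v: v["PHOTOS"] == v["IMAGES"],
    lambda v: (not v["SIVIGLIA"]) or v["EUROPE"],
    lambda v: (not v["SIVIGLIA"]) or v["SPAIN"],
]

print(semantic_relation(atoms, axioms, phi1, phi2))  # equivalence
print(semantic_relation(atoms, [], phi1, phi2))      # overlapping
```

The second call shows the role of the background knowledge: without the axioms, neither formula entails the other, so only the weakest relation is returned. A truth table is exponential in the number of atoms, which is why a SAT solver is used in practice.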
The logical architecture of the system is depicted in Figure 4.

Fig. 4. The logical architecture.

The input schemas (trees) are codified in a standard internal XML format. This internal format can be loaded from a file that is manually edited, or can be produced by an input-format-dependent translator. The module taking the input schemas/ontologies does the preprocessing. In particular, it computes, in a top-down manner and for every label in a tree, the meaning captured by the given label in a schema or ontology, using the techniques described in [49]. The preprocessing module has access to the set of oracles which provide the necessary a priori lexical and domain knowledge. In the current version, WordNet [55] is the only oracle. The output of the module is an enriched tree. These enriched trees are stored in an internal database (PTrees) where they can be browsed, edited and manipulated. The Matching Manager coordinates the matching process using three extensible libraries. The first library consists of what are called in [41] weak semantics element-level matchers. They perform string manipulations (e.g., prefix, n-grams analysis, edit distance, soundex, data types, and so on) and try to guess the semantic relation implicitly encoded in similar words. The current version of S-Match contains 13 weak semantics element-level matchers. The second library consists of strong semantics element-level matchers, namely oracles. Currently, WordNet is the only oracle. The third library consists of structure-level strong semantics matchers, namely SAT solvers (among others, the SAT deciders that we are currently testing are JSAT [11] and Open4J by Daniel Le Berre).

5.13 Coma (U. Leipzig)

The COMA system [26] is a generic schema matching tool which implements composite generic matchers. COMA provides an extensible library of matching algorithms, a framework for combining the obtained results, and a platform for the evaluation of the effectiveness of the different matchers.
The matching library is extensible and, according to [Do and Rahm, 2002], it contains 6 individual matchers, 5 hybrid matchers, and one “reuse-oriented” matcher. Most of them implement string-based techniques as a background idea; others share techniques with Cupid (see § 5.5). The reuse-oriented matcher, however, is completely novel: it tries to reuse previously obtained results for entire new schemas or for their fragments. Schemas are internally encoded as rooted directed acyclic graphs, where elements are the paths. This aims at capturing the contexts in which the elements occur. One of the distinctive features of the COMA tool is the capability to perform iterations in the matching process. It presumes interaction with a user who approves the obtained matches and mismatches in order to gradually refine and improve the accuracy of the match. Based on the comparative evaluations conducted in [25], COMA dominates the Autoplex and Automatch [8, 9], LSD [29], Glue [27], SF [53] and SemInt [46] matching tools.

5.14 Buster (U. Bremen)

The Bremen University Semantic Translator for Enhanced Retrieval (BUSTER) [71] is an information broker middleware that was built to access heterogeneous and distributed information sources and to assess their conceptual, spatial, and temporal relevance with respect to a specific information request. BUSTER can also be used to integrate heterogeneous information through the resolution of structural, syntactical, and semantic heterogeneities. To be more precise, the BUSTER system provides two subsystems, one for information filtering and one for information integration. The BUSTER search module supports the specification of queries of the type concept @ location in time [72]. In addition to the conceptual semantics, the system evaluates the spatial as well as the temporal relevance of an information source.
In order to be able to reason about conceptual, spatial, and temporal relevance, BUSTER utilizes metadata that provide formal descriptions of the respective context of an information source. In principle, the main difference with respect to other system for query processing and information integration lies in the fact that the user does commit to a basic vocabulary that is used to define concepts in all the source ontologies. The basic vocabulary ensures that different source ontologies are comparable to each other. By formulating the query in terms of this basic vocabulary we the query can be interpreted with respect to all source ontologies in the system. In particular, each concept base on the shared vocabulary and can be constructed with the help of some construction operators like u,t well-known from description logics. Because each concepts also from different source ontologies can be flatten to terms which only consists of elements of the shared vocabulary combined with some construction operators, they can easily compared with respect to equality (=), subsumption (), overlap (C D is consistent), and inconsistence (C D is inconsistent). In other words, BUSTER can automatically determine these concepts in a source ontology that are most similar to the concept we asked for. 5.15 MULTIKAT (INRIA Sophia Antipolis) MULTIKAT [23, 24] is a tool enabling comparison and merging of two ontologies, represented in Sowa’s conceptual graph formalism [67]. In this formalism, an ontology is represented through a support (i.e. a hierarchy of concept types, a hierarchy of relation types, a set of markers for identifying the instances and a conformity relation enabling to determine which types are compatible with a given marker). The building of an integrated ontology from two ontologies relies on the following steps: 1. 
Comparison and merging of the two concept type hierarchies: this step enables to solve name conflicts and in case of need, to add new concept types and to adapt concept type definitions. 2. Comparison and merging of the two relation hierarchies: this step enables to solve name conflicts, and in case of need, to add new relation types, to adapt relation type definitions and to adapt relation type signatures. 3. Comparison and merging of the two sets of markers: this phase helps to solve name conflicts and to adapt the conformity relation. MULTIKAT relies on a cooperative approach: the knowledge engineer can use MULTIKAT editor to tune the parameters and weights used in the mapping and merging algorithms. 5.15.1 Mapping of two types in both hierarchies The mapping algorithm aims at determining, in two concept (resp. relation) type hierarchies, which types are identical. It relies on two phases: Phase 1: terminology-based mapping During this first phase, MULTIKAT algorithm tries to identify which types of both hierarchies are similar, according to their main names and their synonyms. The knowledge engineer can combine several criteria and assign them different weights so as to privilege some criteria: • t1 and t2 have the same main name, • the number of common synonyms of t1 and t2 is greater than a given threshold, • the main name of one type belongs to the list of synonyms of the other type. This similarity function Sim1 : H1×H2 ! R computes the similarity measure Sim1(t1, t2) between t1, a type of H1 and t2, a type of H2, according to this first identification phase, and its results are stored in a similarity matrix. After this phase, two types t1 and t2 are 1-similar iff Sim1 (t1, t2) is greater than a threshold Tsimilar. Phase 2: context-based mapping In this second phase, the mapping algorithm now considers the contexts of the types to be compared. The context of a type consists of its relatives (i.e. 
its direct supertypes and its direct subtypes) in the type hierarchy. The algorithm tries to identify which types of both hierarchies are the same according to their contexts. Three mapping cases are distinguished:
• The number of 1-similar direct supertypes (resp. direct subtypes) of t1 and t2 is greater than a threshold Tpred (resp. Tsucc).
• All the direct supertypes (resp. direct subtypes) of t1 and t2 are 1-similar.
• The set of relatives of t1 (resp. t2) is included in the set of relatives of t2 (resp. t1) w.r.t. 1-similarity.
The knowledge engineer can associate different weights with these three cases. A second similarity function Sim2(t1, t2) is computed. If t1 is the type numbered i in H1 and t2 the type numbered j in H2, then the entry of the final similarity matrix SimMatr is:

SimMatr_ij = Sim1(t1, t2) + Sim2(t1, t2)

The pairs of identical types are computed from this similarity matrix. After the second phase, the types t1 and t2 are considered identical iff SimMatr_ij is the maximum value in the i-th row and the j-th column of the matrix, and this value is greater than a threshold Tsame. Two comparison strategies can be applied:

One-to-one algorithm. In each comparison cycle of the two identification phases above, the algorithm compares each type of H1 to each type of H2.

Hierarchy-match algorithm. This algorithm takes the hierarchical structure into account in its comparison strategy. It relies on a depth-first search in both hierarchies and proceeds as follows: once two identical types have been found, a search for further mappings is performed in their sub-hierarchies. In this algorithm, the thresholds Tsimilar and Tsame have the same value. In both phases, after each evaluation of a pair of types (t1, t2), the corresponding value SimMatr_ij is compared to Tsame. As soon as SimMatr_ij > Tsame, the pair (t1, t2) is included in the set IdenticalConceptTypes.
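As an illustration of the selection rule described above, the following Python sketch adds the two similarity matrices and keeps the pairs whose combined score is both a row maximum and a column maximum and exceeds Tsame. The function name identical_pairs, the nested-list representation of the matrices, and the sample values are assumptions made for this example; they are not taken from the MULTIKAT implementation.

```python
def identical_pairs(sim1, sim2, t_same):
    """Return the index pairs (i, j) judged identical.

    sim1 and sim2 are assumed to be rectangular nested lists holding
    Sim1(t1, t2) and Sim2(t1, t2), with rows indexed by the types of H1
    and columns by the types of H2.
    """
    rows, cols = len(sim1), len(sim1[0])
    # Final matrix: SimMatr_ij = Sim1(t1, t2) + Sim2(t1, t2)
    sim = [[sim1[i][j] + sim2[i][j] for j in range(cols)] for i in range(rows)]
    pairs = []
    for i in range(rows):
        row_max = max(sim[i])
        for j in range(cols):
            col_max = max(sim[k][j] for k in range(rows))
            value = sim[i][j]
            # Keep (i, j) only if it is maximal in its row and its column
            # and strictly above the threshold Tsame.
            if value == row_max and value == col_max and value > t_same:
                pairs.append((i, j))
    return pairs


# Hypothetical scores for two hierarchies with two types each.
sim1 = [[1.0, 0.2], [0.1, 0.5]]
sim2 = [[0.5, 0.0], [0.0, 0.25]]
print(identical_pairs(sim1, sim2, t_same=0.6))
```

In this sketch, tied maxima in a row or column are all retained; how ties are actually resolved is not stated in the description above.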
5.15.2 Merging of concept type hierarchies

The knowledge engineer can initialize the set IdenticalConceptTypes by indicating which types of both hierarchies are already known to be identical. The mapping algorithm is then applied (with either the one-to-one or the hierarchy-match strategy). Before merging, the partial ordering relation on identical types is checked: the pairs responsible for a violation of the merging precondition are eliminated from IdenticalConceptTypes. The integrated hierarchy Tccom is then built by representing each pair of identical types by a single type in Tccom and by adding the types that appear in only one hierarchy. If a type is present in only one ontology and cannot be mapped to any type of the second ontology, it is kept in the integrated hierarchy, with a prefix in its name indicating which expert it comes from. If a type is present in both hierarchies, the experts can choose its final main name stored in Tccom. In all cases, the associated synonyms are also stored in Tccom. The algorithm tries to detect terminology conflicts, topology conflicts, and conflicts specific to the conceptual graph formalism.

5.15.3 Comparison and merging of the relation type hierarchies

The mapping algorithm for relation type hierarchies is similar to the two-phase algorithm presented above. Once the set IdenticalRelationTypes of pairs of identical relation types has been obtained, the precondition for merging the two hierarchies must also be checked. When two relation types are considered identical, the compatibility of their signatures must be verified. The signatures in the integrated relation type hierarchy Trcom must be adapted according to the integrated concept type hierarchy Tccom. If a relation type is present in only one ontology, its signature is preserved in the integrated ontology.
The signature of the integrated relation type obtained from two identical relation types relies on the supremum of the concept types appearing in their signatures.

5.15.4 Comparison and merging of the marker sets

The terminology-based mapping algorithm is used for the sets of markers. When two markers are identical, their conformity relations must be compatible; otherwise the pair is eliminated from the set IdenticalMarkers.

5.15.5 Implementation

MULTIKAT was implemented in C/C++ and Java on top of the conceptual graph platform COGITO (developed by the LIRMM) and was applied to traffic accident analysis.

5.16 ASCO (INRIA Sophia-Antipolis)

The ASCO prototype relies on an algorithm that identifies the pairs of corresponding elements in two different ontologies [2]. These pairs may be pairs of concepts (classes) in the two ontologies, pairs of relations, or even pairs of a concept in one ontology and a relation in the other. ASCO tries to use as much as possible of the information available in the ontology for the matching process. This information consists of the identifiers (names), labels, and comments of concepts; the identifiers, labels, comments, domains, and ranges of relations; the structure of the taxonomy of concepts or relations; the data instances of the ontology; and annotations, axioms, and rules. So far, the matching process of ASCO takes into account the identifiers, labels, and comments of concepts, the identifiers, labels, comments, domains, and ranges of relations, and the structure of the taxonomy of concepts or relations.

The matching process of ASCO is composed of several phases. The linguistic phase applies linguistic processing techniques and uses string comparison metrics and lexical databases such as WordNet to compute the similarity of two concepts or two relations. In the linguistic processing step, ASCO first normalizes terms and expressions on the basis of punctuation, upper case, special symbols, and digits to obtain a set of tokens.
These tokens are then compared using string comparison metrics such as Jaro-Winkler, Levenshtein, or Monge-Elkan. Based on the token similarities, the similarity of sets of tokens is computed. To increase accuracy and to avoid term conflicts, a lexical database such as WordNet is integrated. To compute the similarity between long texts (for example, between the comments or descriptions of classes or relations), ASCO uses term frequency/inverse document frequency (TF/IDF) metrics after a linguistic processing step that eliminates all stopwords from the texts.

The computed linguistic similarities are the input to the structural phase. In this phase, ASCO tries to exploit the structure of the ontology taxonomy to modify or confirm the similarity of two concepts or relations. The similarities of classes or relations are iteratively propagated to their neighbours in the ontology tree, which is built from the hierarchy of classes and the hierarchy of relations. When the propagation terminates (the class and relation similarities no longer change after an iteration, or a given number of iterations is reached), classes or relations whose similarity exceeds a threshold are considered similar. ASCO currently runs these two phases.

The ASCO algorithm was implemented in Java. It is built on Corese (Conceptual Resource Search Engine), the semantic search engine developed by the ACACIA team [18, 19, 20]. Corese loads ontologies from RDF(S) files into memory; these ontologies are then supplied to ASCO. ASCO was tested with two real-world ontologies: O’COMMA, which has 472 concepts and 77 relations [37], and O’Aprobatiom, which has 460 concepts and 92 relations.
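The linguistic phase of ASCO described above (normalization into tokens, then token comparison with a string metric, then a similarity between token sets) can be sketched in Python as follows. This is only an illustration under stated assumptions: the splitting rules, the normalization of the Levenshtein distance by the longer token, and the averaging of best token matches in term_sim are choices made for this example, not ASCO's actual procedure, and all function names are hypothetical. WordNet lookup and TF/IDF are left out.

```python
import re


def tokenize(term):
    """Split a term on case changes, punctuation, symbols, and digits."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", term)  # camelCase boundary
    return [t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t]


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def token_sim(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def term_sim(t1, t2):
    """Assumed aggregation: average, over the tokens of t1, of the best match in t2."""
    tok1, tok2 = tokenize(t1), tokenize(t2)
    if not tok1 or not tok2:
        return 0.0
    best = [max(token_sim(a, b) for b in tok2) for a in tok1]
    return sum(best) / len(best)


print(tokenize("hasAuthor_2"))
print(term_sim("hasAuthor_2", "has-author"))
```

Note that term_sim as written is not symmetric in its arguments; a symmetric variant could average the two directions.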
5.17 OLA (INRIA Rhône-Alpes & UoMontréal)

OLA [35, 36] is a class of algorithms for ontology alignment which targets the following characteristics:
• covering all the possible characteristics of ontologies (i.e., terminological, structural, and extensional);
• taking care of collection structures (lists, sets) and accounting for them during matching;
• making all recursive relationships explicit and finding the best matching through iteration.
OLA is currently implemented for ontologies described in OWL-Lite [36]. It uses the recently developed Alignment API and its implementation [34].

The algorithm first compiles the OWL ontologies into graph structures that unveil all relationships between entities. These graph structures produce the constraints for expressing a similarity between the elements of the ontologies. The similarity between nodes of the graphs follows two principles: (i) it depends on the category of the node considered (e.g., class, property), and (ii) it takes into account all the features of this category (e.g., superclasses, properties). The similarity of a pair of entities is a weighted linear aggregation of the similarity measures between the entities to which the members of the pair are related. This accounts for all the relationships between entities. However, since some of these features (such as subclasses) are sets of entities, the similarity between such sets depends on a local matching between their members. The matching considered between two sets is one that is (i) of maximal total similarity, (ii) exclusive, and (iii) of maximal size [69]. Similarity between labels can be produced by any terminological method (e.g., string distance, linguistic evaluation). Similarity between data values and data types can be provided by specialised external similarity measures (e.g., Euclidean distance, symmetric difference distance) [69].
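The local matching between two sets of entities can be illustrated with a small Python sketch that searches for an exclusive (one-to-one) matching of maximal total similarity in which every member of the smaller set is matched. The exhaustive search over permutations is only practical for small sets, and the normalization by the size of the larger set is an assumption made for this example; neither is claimed to be what OLA does. The function name set_similarity and the sample scores are hypothetical.

```python
from itertools import permutations


def set_similarity(sim, left, right):
    """Similarity of two lists of entities under the best exclusive matching.

    sim(a, b) is any pairwise similarity with a taken from `left` and b
    from `right`. Every element of the smaller list is matched to a
    distinct element of the larger one.
    """
    if not left or not right:
        return 0.0
    if len(left) > len(right):
        # Make `left` the smaller list, swapping the arguments of sim as well.
        return set_similarity(lambda a, b: sim(b, a), right, left)
    best = max(
        sum(sim(a, b) for a, b in zip(left, chosen))
        for chosen in permutations(right, len(left))
    )
    # Assumed normalization: divide by the size of the larger list.
    return best / len(right)


# Hypothetical pairwise scores between subclasses of two classes.
scores = {
    ("Car", "Automobile"): 0.9, ("Car", "Bicycle"): 0.2,
    ("Bike", "Automobile"): 0.3, ("Bike", "Bicycle"): 0.8,
}


def pair_sim(a, b):
    return scores.get((a, b), 0.0)


print(set_similarity(pair_sim, ["Car", "Bike"], ["Automobile", "Bicycle", "Boat"]))
```

For larger sets, an assignment algorithm (for example the Hungarian method) would replace the exhaustive search.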
The definition of this similarity provides a set of equations whose variables are the similarity values between entities of the ontologies. This set of equations cannot be solved directly because of the local matching: depending on the currently computed similarity, the matching defined above can be different. An iterative algorithm was therefore developed which computes a first approximation of the similarity (without the local matching), then computes the local matching, and reiterates. This algorithm has been proved to converge towards a solution, mainly because the similarity always improves over the iterations. Since this solution may not be the global optimum, the algorithm should be launched several times. From the solution, it is possible to extract an alignment between the two ontologies (by retaining the correspondences whose similarity is over a certain threshold, or by optimising the selection of pairs).

5.18 ArtGen (Stanford U.)

In [56] the authors propose a semi-automated algorithm for resolving terminological heterogeneity among ontologies and establishing the articulation rules necessary for meaningful interoperation. This algorithm forms the basis of the articulation generator for the ONION (ONtology compositION) system. The automated articulation generator (ArtGen) of ONION suggests articulation rules to the user performing the matching process. A human expert can accept, modify, or delete the suggestions. The expert can also indicate new matches that the articulation generator might have missed. The authors distinguish two types of articulation rules:

linguistic matching rules. Concept names are represented as strings of words. The linguistic matcher compares all possible pairs of words from any two concepts of both ontologies and assigns a similarity score to each pair.
The matcher uses a word similarity table generated by a word relator (Thesaurus-Based word relator or Corpus-Based word relator) to look up the similarity between all possible pairs of words. The similarity score between two concepts is the average of the non-zero similarity scores of all possible pairs of words in their names. A linguistic matching rule does not indicate the exact semantic relationship between the two concepts, for example whether they are in a class-subclass relationship or are equivalent.

inference rules. An inference engine in Datalog is capable of making logical derivations based on the inference rules available in the engine.

The ontologies used for the experiments were represented in RDF and contained 30 and 50 nodes respectively, which makes them very small ontologies. The authors demonstrate how the articulation rules are generated by the ONION system. The tool was evaluated by computing precision and recall for the corpus-based and thesaurus-based word relators. Accuracy was measured by comparing the results of the automated articulation generator with those expected by the expert: if the expert deleted a match proposed by the articulation generator, precision is lowered; if the expert added a match that the articulation generator had not found, recall is lowered. The thesaurus-based method gave very poor results, whereas the corpus-based method produced better ones. However, its scalability was extremely low, and the quality of the results depended heavily on the quality of the available corpus. When everything was pre-computed, the corpus-based method scaled very well.

5.19 Alimo (ITI-CERTH)

The development and maintenance of large multimedia databases has recently attracted much attention from companies and organizations that hold multimedia content (archives, broadcast companies, radio and TV channels, etc.).
The goal is to bypass the ineffective and time-consuming process of manually searching and retrieving multimedia content, and to use computers to make the content easy to find and accessible to other parties. Two critical points are identified in making this goal a reality: effective representation of multimedia content, and effective retrieval and exploration of it. To this end, researchers have started to use ontologies in the multimedia field in order to construct machine-understandable, descriptive versions of multimedia content based on multimedia ontologies. Four different levels of information are represented in multimedia ontologies: signal information, featural information, symbolic information, and semantic information. With the aid of multimedia ontologies, the vision of querying and retrieving multimedia content from distributed databases has become more feasible. However, in order to use all the levels of information, from the semantic to the raw audiovisual one, a proper alignment framework must be provided. For this reason ITI-CERTH is constructing ALIMO (Alignment of Multimedia Ontologies), an ontology alignment system that pays special attention to each of the subparts of a multimedia ontology and to the attributes with special meaning and structure. Semantic descriptions will be aligned using methods from hybrid alignment systems (terminological, structural, etc.). The signal descriptions will be compared using visual matching algorithms from the field of digital image and video processing. The feature descriptions will be compared by examining the XML schema of the MPEG-7 visual part, and the symbolic descriptions by referring to the definitions of the concepts of which the labels are instances and by examining the datatypes of the attributes assigned to those instances.

5.20 Bibster (U.
Karlsruhe)

Bibster [14] addresses a typical problem in the daily life of a computer scientist, who regularly has to search for publications or their correct bibliographic metadata. The scenario supported here is that researchers in a community share bibliographic metadata in a peer-to-peer fashion. Bibster is a peer-to-peer system based on the SWAP architecture, which makes it easy to integrate, share, and search bibliographic metadata, using semantic technologies to represent the bibliographic instances and the peers’ expertise so that queries can be routed effectively. Semantic similarity measures that identify duplicates make it possible to visualize and integrate the heterogeneous search results from the peers. Bibliographic entries are extracted from BibTeX into an ontology. The query results themselves are small ontologies containing duplicates. Finding duplicates is closely related to finding corresponding mappings: in both cases it is necessary to recognize identical objects despite their different identifiers. In the given scenario, duplicates are bibliographic entries which refer to the same publication or person in the real world but are modelled as different resources. The similarity function is based on different features of the respective instances: for persons, the name; for publications, the title, authors, editors, journal, address, type of publication, etc. The function returns a value between 0 and 1 by applying specific heuristics to every feature: strings are compared using the Levenshtein distance [45], and the authors of publications are compared by comparing the two sets. Some domain-specific features require special heuristics: if the type of one publication is “Misc”, this only means that no further information about the type was available. If another publication is, e.g., of type “Article”, the similarity is set to 0.5 rather than 0.
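Two of the feature heuristics just described can be sketched in Python as follows. The publication-type rule follows the text (an uninformative “Misc” type yields 0.5 instead of 0). The text only says that author sets are compared, so the use of the Jaccard coefficient here is an assumption of this example, as are the function names; string features such as titles would be handled by a Levenshtein-based measure, which is not shown.

```python
def type_similarity(type1, type2):
    """Similarity of two publication types, a value in [0, 1]."""
    if type1 == type2:
        return 1.0
    # "Misc" only means that no further type information was available,
    # so a mismatch involving it is treated as inconclusive.
    if "Misc" in (type1, type2):
        return 0.5
    return 0.0


def author_similarity(authors1, authors2):
    """Assumed set comparison: Jaccard coefficient of the two author sets."""
    a, b = set(authors1), set(authors2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(type_similarity("Misc", "Article"))
print(author_similarity(["M. Ehrig", "S. Staab"], ["S. Staab", "Y. Sure"]))
```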
Besides the individual functions, the approach focuses on applying an aggregation function to obtain an overall similarity. Through transitive closure, a set of “identical” entities is obtained. Instead of presenting all instances of the query result, duplicates are visualized as one merged resource. These merged resources consist of the union of the properties of the individuals identified as duplicates. After several rounds of testing, Bibster is now openly available, with the alignment-based component working in the background of the system.

5.21 QOM (U. Karlsruhe)

QOM considers both the quality of mapping results and the run-time complexity. The hypothesis is that mapping algorithms may be streamlined such that the loss of quality (compared to a standard baseline) is marginal, while the gain in efficiency is so large that it allows for the ad-hoc mapping of large, lightweight ontologies. To substantiate the hypothesis, a number of practical experiments were performed. The outcome is QOM (Quick Ontology Mapping) [31]. It is defined by the steps of a process model, as shown in Figure 5. Mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, the aim is to find a corresponding entity with the same intended meaning in ontology O2.

Fig. 5. QOM Mapping Process

1. First, QOM uses RDF triples as features.
2. Second, instead of comparing all entities of the first ontology with all entities of the second ontology, QOM uses heuristics to lower the number of candidate mappings, which is a major problem for run-time complexity. In this dynamic programming approach, only promising candidate mappings are chosen.
3. The actual similarity computation is done using a wide range of similarity functions [31].
An entity is described by the way in which it appears with respect to characteristics such as identifiers (e.g., URIs), RDF/S primitives (e.g., subclass and instance relations), or domain-specific features (e.g., a hashcode-of-file in a file-sharing domain). These features of ontological entities are compared using String Similarity and, for set comparisons, SimSet. For efficiency reasons, the similarity computation was lightened by removing extremely costly feature-measure combinations, such as the comparison of all subclasses of two concepts.
4. These individual measures are all input to the similarity aggregation. Instead of applying linear aggregation functions, QOM applies a sigmoid function, which emphasizes high individual similarities and de-emphasizes low ones.
5. The actual mappings are derived from the similarity values. A threshold is applied to discard spurious evidence of similarity. Further mappings are assigned using a greedy strategy that starts with the largest similarity values.
6. Over several iteration rounds the quality of the results rises considerably. Eventually, the output returned is a mapping table representing the relation map_{O1,O2}.

The evaluation was very promising. Depending on the scenario, QOM reaches high-quality mapping levels very quickly. QOM is on a par with other good state-of-the-art algorithms concerning the quality of the proposed mappings, while outperforming them with respect to efficiency, both in terms of run-time complexity (O(n) instead of O(n²)) and in terms of the experiments performed (by a factor of 10 to 100).

5.22 KILT (INRIA Lorraine)

KILT is a maintenance tool for comparing knowledge base versions within the KASIMIR system (see [21]). The KASIMIR system is a knowledge-based system aimed at supporting the decision process when searching for an adequate treatment for patients with cancer.
During an update (or a revision) of a KASIMIR knowledge base, the need arose to automatically compare the old base KBold (before the update) with the new base KBnew (after the update); such a comparison is important for controlling the evolution of a knowledge base. A module comparing versions has to indicate what has actually been updated and to check whether the modifications are in accordance with the intentions of the knowledge engineer. This is the role of the module called KILT, which has been implemented and integrated into the PROTEGE knowledge editor. KILT partitions the problems represented in KBold and/or KBnew (a problem is described by a concept denoting a set of patients, and is possibly associated with a solution or a treatment) into four parts:
1. The problems that appear in the two bases, with the same solutions;
2. The problems that appear in the two bases, with different solutions;
3. The obsolete problems, appearing in KBold but not in KBnew;
4. The new problems, appearing in KBnew but not in KBold.
This partitioning is based on the use of the KASIMIR reasoner. For example, the new problems in category (4) can be found in the following way. Each problem PBnew of KBnew is classified in the hierarchy of KBold, which makes it possible to check whether there is a problem PBold of KBold equivalent to PBnew, i.e. such that PBold subsumes and is subsumed by PBnew. If this is not the case, then PBnew is a new problem. The three other categories of problems, (1), (2), and (3), can be detected and checked in a similar way. This shows that the implementation of KILT is rather simple once the connection with the KASIMIR reasoner is established. KILT is integrated in PROTEGE in the following way. During a session, KBold corresponds to the state of the knowledge base at the beginning of the session, and KBnew to its current state. Therefore, the KILT module makes it possible to visualize the editing modifications, i.e.
addition or removal of a problem, and association of another solution with an already known problem, at any time during the session. KILT makes comparisons at a semantic level: two concepts match when they have equivalent definitions, based on their attribute values and on the subsumption relation between classes. One main drawback is the assumption that the attributes, and their names, do not change from one knowledge base version to another.

References

1. D. Aumüller, H.H. Do, S. Massmann, and E. Rahm. Schema and ontology matching with COMA++. In Proceedings of International Conference on Management of Data (SIGMOD), Software Demonstration, 2005.
2. T. L. Bach, R. Dieng-Kuntz, and F. Gandon. On ontology matching problems (for building a corporate semantic web in a multi-communities organization). In Proc. of ICEIS 2004, Porto (PT), 2004.
3. J. Barwise and J. Seligman. Information flow: the logic of distributed systems, volume 44 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge (UK), 1997.
4. H.W. Beck, S.K. Gala, and S.B. Navathe. Classification as a query processing technique in the CANDIDE semantic data model. In Proceedings of International Conference on Data Engineering (ICDE), pages 572–581, 1989.
5. D. Beneventano, S. Bergamaschi, S. Lodi, and C. Sartori. Consistency checking in complex object database schemata with integrity constraints. IEEE Transactions on Knowledge and Data Engineering, 10:576–598, 1998.
6. S. Bergamaschi, D. Beneventano, S. Castano, and M. Vincini. Momis: An intelligent system for the integration of semistructured and structured data. Technical Report T3-R07, Università di Modena e Reggio Emilia, Modena (IT), 1998.
7. S. Bergamaschi, S. Castano, and M. Vincini. Semantic integration of semistructured and structured data sources. SIGMOD Record, 28(1):54–59, 1999.
8. J. Berlin and A. Motro. Autoplex: Automated discovery of content for virtual databases. In Proceedings of CoopIS, pages 108–122, 2001.
9. J. Berlin and A. Motro. Database schema matching using machine learning with feature selection. In Proc. conference on advanced information system engineering (CAiSE), 2002.
10. Philip Bernstein, Alon Halevy, and Rachel Pottinger. A vision for managing complex models. SIGMOD Record, 29(4):55–63, 2000.
11. Daniel Le Berre. Jsat: The java satisfiability library, 2001.
12. A. Borgida, R. Brachman, D. McGuinness, and L. Resnick. Classic: A structural data model for objects. SIGMOD Record, 18(2):58–67, 1989.
13. P. Bouquet, L. Serafini, and S. Zanobini. Semantic coordination: A new approach and an application. In Proceedings of International Semantic Web Conference (ISWC), pages 130–145, 2003.
14. J. Broekstra, M. Ehrig, P. Haase, F. van Harmelen, M. Menken, P. Mika, B. Schnizler, and R. Siebes. Bibster - a semantics-based bibliographic peer-to-peer system. In Proceedings of the SemPGrid 04 Workshop, New York, May 2004.
15. D. Calvanese, G. De Giacomo, and M. Lenzerini. A framework for ontology integration. In Isabel Cruz, Stefan Decker, Jérôme Euzenat, and Deborah McGuinness, editors, The emerging semantic web, pages 201–214. IOS Press, Amsterdam (NL), 2002.
16. S. Castano, V. De Antonellis, and S. De Capitani di Vimercati. Global viewing of heterogeneous data sources. IEEE Transactions on Knowledge and Data Engineering, 13(2):277–297, 2001.
17. W. Cohen, P. Ravikumar, and S. Fienberg. A comparison of string metrics for matching names and records. In Proceedings of workshop on Data Cleaning and Object Consolidation at International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
18. O. Corby, R. Dieng, and C. Hébert. A conceptual graph model for w3c resource description framework. In Proc. 8th International Conference on Conceptual Structures Logical, Linguistic, and Computational Issues, Darmstadt (DE), August 2000. Springer-Verlag.
19. O. Corby, R. Dieng-Kuntz, and C. Faron-Zucker. Querying the semantic web with the corese search engine.
In Proc. 15th ECAI/PAIS, Valencia (ES), August 2004. IOS Press.
20. O. Corby and C. Faron. Corese: A corporate semantic web engine. In Proc. WWW International Workshop on Real World RDF and Semantic Web Applications, Hawaii (HI US), May 2002.
21. M. d’Aquin, C. Bouthier, S. Brachais, J. Lieber, and A. Napoli. Knowledge editing and maintenance tools for a semantic portal in oncology. Rapport de recherche A03-R-162, LORIA, 2003.
22. T. Di Noia, E. Di Sciascio, F. M. Donini, and M. Mongiello. A system for principled matchmaking in an electronic marketplace. In Proceedings of World Wide Web Conference (WWW), pages 321–330, 2003.
23. R. Dieng and S. Hug. Comparison of “personal ontologies” represented through conceptual graphs. In Proceedings of European Conference on Artificial Intelligence, pages 341–345, 1998.
24. R. Dieng and S. Hug. Multikat, a tool for comparing knowledge from multiple experts. In Conceptual Structures: Theory, Tools and Applications, Proc. of the 6th Int. Conference on Conceptual Structures (ICCS’98), Montpellier (FR), August 10-12 1998. Springer-Verlag, LNAI 1453.
25. H.-H. Do, S. Melnik, and E. Rahm. Comparison of schema matching evaluations. In Proc. GI-Workshop "Web and Databases", Erfurt (DE), 2002. http://dol.unileipzig.de/pub/200228.
26. H.H. Do and E. Rahm. COMA - a system for flexible combination of schema matching approaches. In Proceedings of Very Large Data Bases Conference (VLDB), pages 610–621, 2001.
27. A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to map ontologies on the semantic web. In Proceedings of International World Wide Web Conference (WWW), pages 662–673, 2003.
28. A. Doan. Learning to map between structured representations of data. PhD thesis, University of Washington, Seattle (WA US), 2002.
29. A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In Proceedings of SIGMOD, 2001.
30. A. Doan, J. Madhavan, P. Domingos, and A. Halevy.
Ontology matching: a machine learning approach. In Steffen Staab and Rudi Studer, editors, Handbook of ontologies, International handbooks on information systems, chapter 18, pages 385–404. Springer Verlag, Berlin (DE), 2004.
31. M. Ehrig and S. Staab. QOM - quick ontology mapping. In Proc. 3rd ISWC, Hiroshima (JP), November 2004. to appear.
32. M. Ehrig and Y. Sure. Ontology mapping - an integrated approach. In Proceedings of European Semantic Web Symposium (ESWS), pages 76–91, 2004.
33. J. Euzenat. Brief overview of T-tree: the Tropes taxonomy building tool. In Proc. 4th ASIS SIG/CR workshop on classification research, Columbus (OH US), pages 69–87, 1994.
34. J. Euzenat. An api for ontology alignment. In Proc. 3rd international semantic web conference, Hiroshima (JP), 2004.
35. J. Euzenat and P. Valtchev. An integrative proximity measure for ontology alignment. In Proceedings of Semantic Integration workshop at International Semantic Web Conference (ISWC), 2003.
36. J. Euzenat and P. Valtchev. Similarity-based ontology alignment in OWL-lite. In Proceedings of European Conference on Artificial Intelligence (ECAI), pages 333–337, 2004.
37. F. Gandon. Distributed Artificial Intelligence and Knowledge Management: ontologies and multi-agent systems for a corporate semantic web. Scientific philosopher doctorate thesis in informatics, INRIA and University of Nice - Sophia Antipolis, November 2002.
38. A. Gangemi, N. Guarino, C. Masolo, and A. Oltramari. Sweetening WordNet with DOLCE. AI Magazine, 24(3):13–24, 2003.
39. B. Ganter and R. Wille. Formal concept analysis: mathematical foundations. Springer Verlag, Berlin (DE), 1999.
40. F. Giunchiglia and P. Shvaiko. Semantic matching. The Knowledge Engineering Review Journal, 18(3):265–280, 2003.
41. F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-Match: an algorithm and an implementation of semantic matching. In Proceedings of European Semantic Web Symposium (ESWS), pages 61–75, 2004.
42. F. Giunchiglia and M.
Yatskevich. Element level semantic matching. In Proceedings of Meaning Coordination and Negotiation workshop at International Semantic Web Conference (ISWC), 2004.
43. I. Horrocks, P. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. SWRL: a semantic web rule language combining OWL and RuleML, 2003. www.daml.org/2003/11/swrl/.
44. Y. Kalfoglou and M. Schorlemmer. Ontology mapping: the state of the art. The Knowledge Engineering Review, 18(1):1–31, 2003.
45. V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 1966.
46. W. Li and C. Clifton. Semantic integration in heterogeneous databases using neural networks. In Proceedings of the 20th International Conference on Very Large Data Bases, pages 1–12. Morgan Kaufmann Publishers Inc., 1994.
47. J. Madhavan, P. Bernstein, and E. Rahm. Generic schema matching with cupid. In Proceedings of Very Large Data Bases Conference (VLDB), pages 49–58, 2001.
48. A. Maedche and S. Staab. Measuring similarity between ontologies. In Proceedings of International Conference on Knowledge Engineering and Knowledge Management (EKAW), pages 251–263, 2002.
49. B. Magnini, L. Serafini, and M. Speranza. Making explicit the semantics hidden in schema models. In Proceedings of ISWC workshop on Human Language Technology for the Semantic Web and Web Services, 2003.
50. O. Mariño, F. Rechenmann, and P. Uvietta. Multiple perspectives and classification mechanism in object-oriented representation. In 9th European Conference on Artificial Intelligence, pages 425–430, Stockholm, Sweden, August 1990.
51. D.L. McGuinness, R. Fikes, J. Rice, and S. Wilder. An environment for merging and testing large ontologies. In Proceedings of KR, pages 483–493, 2000.
52. S. Melnik, E. Rahm, and P. Bernstein. Rondo: A programming platform for model management. In Proc. ACM SIGMOD, San Diego (CA US), 2003.
53. S. Melnik, H. Garcia-Molina, and E. Rahm.
Similarity flooding: A versatile graph matching algorithm. In Proceedings of International Conference on Data Engineering (ICDE), pages 117–128, 2002. 54. E. Mena, V. Kashyap, A. Sheth, and A. Illarramendi. Observer: An approach for query processing in global information systems based on interoperability between pre-existing ontologies. In Proceedings of International Conference on Cooperative Information Systems (CoopIS), pages 14–25, 1996. 55. A.G. Miller. Wordnet: A lexical database for english. Communications of the ACM, (38(11)):39–41, 1995. 56. P. Mitra and G. Wiederhold. Resolving terminological heterogeneity in ontologies. In Workshop on Ontologies and Semantic Interoperability at the 15th European Conference on Artificial Intelligence (ECAI), 2002. 57. I. Niles and A. Pease. Towards a standard upper ontology. In Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS), pages 2–9, 2001. 58. N. Noy and M. A. Musen. Anchor-prompt: Using non-local context for semantic matching. In Proceedings of workshop on Ontologies and Information Sharing at International Joint Conference on Artificial Intelligence (IJCAI), pages 63–70, 2001. 59. L. Palopoli, G. Terracina, and D. Ursino. The system dike: Towards the semi-automatic synthesis of cooperative information systems and data warehouses. In Proceeding of ADBISDASFAA, pages 108–117, 2000. 60. C. Parent and S. Spaccapietra. Database integration: the key to data interoperability. In M. P. Papazoglou, S. Spaccapietra, and Z. Tari, editors, Advances in Object-Oriented Data Modeling. The MIT Press, 2000. 61. R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, (19(1)):17–30, 1989. 62. E. Rahm and P. Bernstein. A survey of approaches to automatic schema matching. The International Journal on Very Large Data Bases (VLDB), (10(4)):334–350, 2001. 63. E. Rahm, H.H. Do, and S. Maßmann. 
Matching large XML schemas. SIGMOD Record, 33(4):26–31, 2004. 64. P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pages 448– 453, 1995. 65. D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In Proceedings of Symposium on Principles of Database Systems (PODS), pages 39–52, 2002. 66. A. Sotnykova, M-A. Aufaure, N. Bennacer, N. Cullot, and C. Vangenot. Semantic mappings in description logics for database schema integration. Technical report, Swiss Federal Institute of Technology in Lausanne, 2004. 67. J. Sowa. Conceptual Structures: Information Processing In Mind and Machine. AddisonWesley, 1984. 68. Gerd Stumme and Alexander Mädche. FCA-merge: bottom-up merging of ontologies. In Proc. 17th IJCAI, Seattle (WA US), pages 225–230, 2001. 69. P. Valtchev. Construction automatique de taxonomies pour l’aide `a la repr´esentation de connaissances par objets. Th`ese d’informatique, Universit´e Grenoble 1, 1999. 70. P. Valtchev and J. Euzenat. Dissimilarity measure for collections of objects and values. Lecture notes in computer science, 1280:259–272, 1997. 71. U. Visser, T. Vögele, and C. Schlieder. Spatio terminological information retrieval using the buster system. In Proceedings of the EnviroInfo, pages 93–100, Wien (AT), 2002. 72. T. Vögele, S. Hübner, and G. Schuster. Buster - an information broker for the semantic web. Künstliche Intelligenz, 3:31–34, July 2003. 73. K. Zhang and D. Shasha. Approximate tree pattern matching. In A. Apostolico and Z. Galil, editors, Pattern matching in strings, trees, and arrays, pages 341–371. Oxford University, 1997. 74. S. Sceffer and L. Serafini and S. Zanobini. Semantic Coordination of Hierarchical Classifications with Attributes, TechRep n 706, University of Trento, December 2004, http://eprints.biblio.unitn.it/archive/00000706/. 75. RACER. 
Available at: www.sts.tu-harburg.de/~r.f.moeller/racer 76. M. Yatskevich: Schema-based Semantic Matching: Algorithms, a System and a Testing Methodology. Technical Report DIT-05-047, University of Trento, 2005 77. P. Shvaiko: Iterative Schema-based Semantic Matching. Technical Report DIT-04-020, University of Trento, 2004.