PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment
Natalya F. Noy
Stanford Medical Informatics
Stanford University

Outline
- Definitions and motivation
- The PROMPT ontology-merging algorithm
  - Incremental algorithm (PROMPT)
  - Statistical algorithm (Anchor-PROMPT)
- The tools
- Evaluation
- Future work

Ontologies
- Characterize concepts and relationships in an application area, providing a domain of discourse
- Enumerate concepts, attributes of concepts, and relationships among concepts
- Define constraints on relationships among concepts

Why do we need ontologies?
- An ontology provides a shared vocabulary for different applications in a domain
- An ontology enables interoperation among applications using disparate data sources from the same domain

Ontologies Are Everywhere
- Ontologies have been used in academic projects for a long time
  - Knowledge sharing and reuse
  - Reuse of problem-solving methods
- Ontologies are becoming widely used outside of academia
  - Categorization of Web sites (e.g. Yahoo!)
  - Product catalogs

Need for Ontology Merging
- There is significant overlap in existing ontologies
  - Yahoo!
    and DMOZ Open Directory
  - Product catalogs for similar domains

Need for Ontology Merging and Integration
- Need to merge or align overlapping ontologies
  - Chemdex™: a portal for accessing life-science supply catalogs
- Workshop on “Ontologies and Information Sharing” at IJCAI’2001
  - 6 out of 18 papers (1/3) are about ontology merging and integration

What Is Ontology Merging

Existing Approaches
- Ontology design and integration
  - term matching (Stanford SKC, ISI)
  - graph-based analysis (Stanford SKC)
  - transformation operators (Ontomorph at ISI)
  - merging tools (Chimaera at Stanford KSL)
- Object-oriented programming
  - subject-oriented programming (IBM)
    - “subjective” views of classes
    - transformation operations
    - concentrates on methods rather than relations

Existing Approaches (II)
- Databases
  - develop mediators and provide wrappers
  - define a common data model and mappings
  - define matching rules to translate directly
- Most of these approaches do not provide any guidance to the user and do not use structural information

PROMPT
- Our approach is:
  - Partial automation
  - Algorithms based on
    - concept-representation structure
    - relations between concepts
    - user’s actions
- Our approach is not:
  - Complete automation
  - Algorithm for matching concept names

Knowledge Model
- A generic knowledge model of OKBC (Open Knowledge Base Connectivity protocol)
- Classes
  - Collections of objects with similar properties
  - Arranged in a subclass–superclass hierarchy
- Instances
- Slots
  - First-class objects in a knowledge base
  - Binary relations describing properties of classes and instances
- Facets
  - Constraints on slot values (cardinality, min, max)

The PROMPT Algorithm
- Make initial suggestions
- Select the next operation
- Perform automatic updates
- Find conflicts
- Make suggestions

Example: merge-classes
[Figure: two class hierarchies with classes Agency employee, Employee, Agent, Customer, and Traveler, linked by subclass of, has client, and agent for.]

Example: merge-classes (II)
[Figure: continuation of the merge example with classes Agency employee, Employee, Agent, Customer, and Traveler, linked by subclass of, has client, and agent for.]

Analyzing Global Properties Locally
- Global properties
  - classes that have the same sets of slots
  - classes that refer to the same set of classes
  - slots that are attached to the same classes
- Local context
  - incremental analysis
  - consider only the concepts that were affected by the last operation

The PROMPT Operation Set
- Extends the OKBC operation set with ontology-merging operations
  - merge classes
  - merge slots
  - merge instances
  - copy of a class
    - deep or shallow
    - with or without subclasses
    - with or without instances
  - …

After a User Performs an Operation
- For each operation
  - perform the operation
  - consider possible conflicts
    - identify conflicts
    - propose solutions
  - analyze local context
    - create new suggestions
    - reinforce or downgrade existing suggestions

Conflicts
- Conflicts that PROMPT identifies
  - name conflicts
  - dangling references
  - redundancy in a class hierarchy
  - slot-value restrictions that violate class inheritance

Example: merge-classes
[Figure: class Agent in each source ontology and in the merged ontology.]

Operation Steps: merge-classes
- Own slots and their values for the new class
  - ask the user in case of conflicts or use preferences
- Template slots for the new class
  - union of template slots of the original classes
- Subclasses and superclasses for the new class
- Conflicts
- Suggestions

Template Slots
- Copy template slots that don’t exist in the merged ontology
[Figure: class Agent with slot agent for.]

Template Slots
- Attach the slots that have already been mapped
[Figure: class Agent with slots has client and client.]

Subclasses And Superclasses
- If a superclass (subclass) exists, re-establish the links
[Figure: Agency employee and Employee, each connected to Agent by a superclass link.]

Dangling References
- For example, allowed class
[Figure: slot agent for at class Agent with facet value Customer; in the merged ontology the facet value is a dummy frame, Customer_temp.]

Additional Suggestions: Merge Slots
- If slot names at the merged class are similar, suggest merging the slots
[Figure: class Agent with slots client and has client.]

Additional Suggestions: Merge Classes
- If the set of classes referenced by the merged class is the same as the set of classes referenced by another class, suggest a merge
[Figure: Agency employee and Agent; has clients refers to Client, handles reservations refers to Reservation.]

Additional Suggestions: Merge Classes
- If names of superclasses (subclasses) of the merged class are similar, suggest merging the classes
[Figure: Employee, Agent, and Agency employee connected by superclass links.]

Check for Cycles
- If there is a cycle, suggest removing one of the parents
[Figure: Person, Employee, Agency employee, and Agent connected by superclass links.]

To Summarize
- Perform the actual operation
- For the concepts (classes, slots, and instances) directly attached to the operation arguments
  - perform global analysis for new suggestions
  - perform global analysis for new conflicts

Context
[Figure labels: non-local context; classes directly referenced by C; slots in C; C.]

Anchor-PROMPT: Using Non-Local Contexts
- Input: A set of anchor pairs
- Output: A set of related terms with similarity scores
[Figure: Ontology 1 and Ontology 2.]

Where do anchors come from?
- Lexical matching
- Interactive tools
- User-specified

Generating Paths in the Graph

Similarity Score
- Generate a set of all paths (of length < L)
- Generate a set of all possible pairs of paths of equal length
- For each pair of paths and for each pair of nodes in the identical positions in the paths, increment the similarity score
- Combine the similarity scores for all the paths

Equivalence Groups

Anchor-PROMPT: Initial Results

  Ontology 1 term         Ontology 2 term
  TRIAL                   Trial
  PERSON                  Person
  CROSSOVER               Crossover

  PROTOCOL                Design
  TRIAL-SUBJECT           Person
  INVESTIGATORS           Person
  POPULATION              Action_Spec
  PERSON                  Character
  TREATMENT-POPULATION    Crossover_arm

Knowledge Model Assumptions
- The only assumption: an OKBC-compliant knowledge model

Protégé-2000
- An environment for
  - Ontology development
  - Knowledge acquisition
- Intuitive direct-manipulation interface
- Extensibility
  - Ability to plug in new components

Ontologies in Protégé-2000

Protégé-2000 plugins
- Domain-specific user-interface plugins
- Alternative back ends for archival storage
- Utility programs for knowledge-acquisition tasks
- End-user applications

Protégé-based PROMPT tool
- Protégé-2000
  - has an OKBC-compatible knowledge model
  - allows building extensions through a plug-in mechanism
  - can work as a knowledge-base server for the plugins

The PROMPT tool

The PROMPT tool features
- Setting a preferred ontology
- Maintaining the user’s focus
- Providing feedback to the user
- Preserving original relations
  - subclass-superclass relations
  - slot attachment
  - facet values
- Linking to the direct-manipulation ontology editor
- Logging operations

Evaluation
- Knowledge-based systems are rarely evaluated
- We can use software-engineering approaches to empirical evaluation of tools
- We need to develop additional knowledge-base measurements

Questions we asked
- How good are PROMPT’s suggestions and conflict-resolution strategies?
- Does PROMPT provide any benefit when compared to a generic ontology-editing tool (Protégé-2000)?

What we were trying to find out
- The benefit that the tool provides
  - Productivity benefit
  - Quality improvement in the resulting ontologies
  - User satisfaction
- Precision and recall of the tool’s suggestions

Source ontologies for the experiments
- Two ontologies of problem-solving methods
  - the ontology for the Unified Problem-solving Method Development Language (UPML)
  - the ontology for the Method-Description Language (MDL)

Experiment 1: Evaluate the quality of PROMPT’s suggestions
- Metrics
  - Precision
  - Recall
- Method
  - Automatic logging
  - Automatic data reporting
[Figure labels: suggestions that the tool produced; suggestions that the user followed; operations that the user performed.]

Results: the quality of PROMPT’s suggestions
[Chart: suggestions that users followed and conflict-resolution strategies that users followed (values 75% and 90%, in the order extracted); knowledge-base operations generated automatically, 74%.]

Experiment 2: PROMPT versus generic Protégé-2000
- Metrics
  - content of the resulting ontologies
  - number of explicit knowledge-base operations

Results: PROMPT versus generic Protégé-2000
- The resulting ontologies had only one difference
- Specifying operations explicitly
  - PROMPT: 16
  - Protégé: 60
[Bar chart: operations specified explicitly, axis from 0 to 60.]

Results
- Experts followed most of PROMPT’s suggestions
- Using PROMPT has improved the efficiency of ontology merging

Anchor-PROMPT Evaluation
- Experiment setup
  - Two ontologies from the DAML ontology library
  - Varying parameters
    - maximum path length
    - number of anchor pairs
- Experiment results
  - Ratio of correct results above the median similarity score

Anchor-PROMPT: Evaluation Results

  Max path length    Number of anchors    Result precision
  4                  4                    67%
  4                  3                    67%
  4                  2                    61%
  3                  4                    67%
  3                  3                    61%
  3                  2                    56%
  2                  4                    100%
  2                  3                    100%

Anchor-PROMPT Evaluation Results
- Equivalence groups of size <= 2 are required
- A maximum path length of 2 provides extremely high precision (but low recall)
- 75% precision with maximum path lengths 3 and 4

Future work
- Extend the set of heuristics that PROMPT uses for guiding the experts
- Extend the techniques to ontology alignment and ontology refactoring
- Develop protocols and metrics for a more detailed evaluation of the tools

http://protege.stanford.edu
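
Sketch: Anchor-PROMPT similarity score

The steps on the Similarity Score slide can be illustrated with a small Python sketch. It is not the Anchor-PROMPT implementation: the graph representation (adjacency dicts), the function names, the choice to score only the interior nodes of each path, and the toy class names are all assumptions made for illustration. Paths run between the terms of two anchor pairs, paths of equal length are compared position by position, and the counts are summed as the way of combining scores.

```python
from collections import Counter
from itertools import permutations


def simple_paths(graph, start, goal, max_len):
    """Yield simple paths from start to goal that have fewer than max_len edges."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal and len(path) > 1:
            yield path
            continue
        # A path with k nodes has k - 1 edges; extending it gives k edges.
        if len(path) < max_len:
            for nxt in graph.get(path[-1], ()):
                if nxt not in path:
                    stack.append(path + [nxt])


def anchor_scores(g1, g2, anchors, max_len):
    """Count how often two terms occupy the same position on equal-length paths."""
    scores = Counter()
    for (a1, b1), (a2, b2) in permutations(anchors, 2):
        paths1 = list(simple_paths(g1, a1, a2, max_len))
        paths2 = list(simple_paths(g2, b1, b2, max_len))
        for p1 in paths1:
            for p2 in paths2:
                if len(p1) != len(p2):
                    continue
                # Assumption: endpoints are anchors already, so score interior nodes only.
                for n1, n2 in zip(p1[1:-1], p2[1:-1]):
                    scores[(n1, n2)] += 1
    return scores


# Toy ontologies (hypothetical): each dict maps a class to the classes it references.
g1 = {"Agent": ["Reservation"], "Reservation": ["Customer"]}
g2 = {"Agency employee": ["Booking"], "Booking": ["Traveler"]}
anchors = [("Agent", "Agency employee"), ("Customer", "Traveler")]

scores = anchor_scores(g1, g2, anchors, max_len=3)
print(scores)
```

In this toy input the only equal-length paths are Agent, Reservation, Customer and Agency employee, Booking, Traveler, so the pair (Reservation, Booking) receives a score of 1.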
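
Sketch: template slots and merge-slot suggestions

Two of the merge-classes steps, taking the union of template slots and suggesting a slot merge when slot names at the merged class are similar, can be sketched as below. The slides do not say how PROMPT measures name similarity; the use of `difflib.SequenceMatcher`, the 0.7 threshold, and the function names are assumptions chosen only to make the example runnable.

```python
from difflib import SequenceMatcher
from itertools import combinations


def merge_template_slots(slots_a, slots_b):
    """Template slots of the merged class: union of the originals, first-seen order."""
    return list(dict.fromkeys([*slots_a, *slots_b]))


def suggest_slot_merges(slots, threshold=0.7):
    """Suggest merging slot pairs whose names are similar (assumed string metric)."""
    suggestions = []
    for s1, s2 in combinations(slots, 2):
        ratio = SequenceMatcher(None, s1.lower(), s2.lower()).ratio()
        if ratio >= threshold:
            suggestions.append((s1, s2, round(ratio, 2)))
    return suggestions


# Slot names taken from the Agent example on the slides.
merged = merge_template_slots(["agent for", "client"], ["agent for", "has client"])
suggestions = suggest_slot_merges(merged)
print(merged)
print(suggestions)
```

With these inputs the merged class keeps three slots, and only the pair client / has client clears the assumed threshold.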
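
Sketch: precision and recall of suggestions

One plausible reading of the Experiment 1 metrics, based on the figure labels (suggestions the tool produced, suggestions the user followed, operations the user performed), is that precision is the share of produced suggestions that were followed and recall is the share of performed operations that came from suggestions. That reading, the function name, and the operation identifiers below are assumptions, and the data is made up for illustration rather than taken from the experiments.

```python
def suggestion_quality(produced, performed):
    """Return (precision, recall) for sets of suggested and performed operations."""
    followed = produced & performed
    precision = len(followed) / len(produced) if produced else 0.0
    recall = len(followed) / len(performed) if performed else 0.0
    return precision, recall


# Hypothetical operation identifiers: s* were suggested by the tool, m* were manual.
produced = {"s1", "s2", "s3", "s4", "s5"}
performed = {"s1", "s2", "s3", "s4", "m1", "m2", "m3", "m4"}

precision, recall = suggestion_quality(produced, performed)
print(precision, recall)
```

Here four of five suggestions were followed and four of eight performed operations were suggested, giving a precision of 0.8 and a recall of 0.5.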