PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment Natalya F. Noy

PROMPT:
Algorithm and Tool for Automated
Ontology Merging and Alignment
Natalya F. Noy
Stanford Medical Informatics
Stanford University
Outline

Definitions and motivation
The PROMPT ontology-merging algorithm
 Incremental algorithm (PROMPT)
 Statistical algorithm (Anchor-PROMPT)
The tools
Evaluation
Future work
Ontologies



Characterize concepts and relationships in
an application area, providing a domain of
discourse
Enumerate concepts, attributes of
concepts, and relationships among
concepts
Define constraints on relationships among
concepts
Why do we need ontologies


An ontology provides a shared vocabulary
for different applications in a domain
An ontology enables interoperation among
applications using disparate data sources
from the same domain
Ontologies Are Everywhere

Ontologies have been used in academic
projects for a long time



Knowledge sharing and reuse
Reuse of problem-solving methods
Ontologies are becoming widely used
outside of academia


Categorization of Web sites (e.g. Yahoo!)
Product catalogs
Need for Ontology Merging

There is significant overlap in existing
ontologies


Yahoo! and DMOZ Open Directory
Product catalogs for similar domains
Need for Ontology Merging and
Integration

Need to merge or align overlapping
ontologies


Chemdex™: a portal for accessing life-science supply catalogs
Workshop on “Ontologies and Information
Sharing” at IJCAI’2001

6 out of 18 papers (1/3) are about ontology
merging and integration
What Is Ontology Merging
Existing Approaches

Ontology design and integration





term matching (Stanford SKC, ISI)
graph-based analysis (Stanford SKC)
transformation operators (Ontomorph at ISI)
merging tools (Chimaera at Stanford KSL)
Object-oriented Programming

subject-oriented programming (IBM)



“subjective” views of classes
transformation operations
concentrates on methods rather than relations
Existing Approaches (II)

Databases



develop mediators and provide wrappers
define a common data model and mappings
define matching rules to translate directly
Most of these approaches
do not provide any guidance to the user,
do not use structural information
PROMPT

Our approach is:


Partial automation
Algorithms based on
 concept-representation structure
 relations between concepts
 user’s actions


Our approach is not:


Complete automation
Algorithm for matching concept names
Knowledge Model

A generic knowledge model of OKBC (Open Knowledge Base Connectivity protocol)
 Classes
  Collections of objects with similar properties
  Arranged in a subclass–superclass hierarchy
 Instances
 Slots
  First-class objects in a knowledge base
  Binary relations describing properties of classes and instances
 Facets
  Constraints on slot values (cardinality, min, max)
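
The knowledge model above can be pictured as a handful of record types. The following is a minimal Python sketch, not the OKBC API: the names `Facets`, `Slot`, `OntClass`, `Instance`, `ancestors`, and `inherited_slots` are assumptions made for illustration, and so is the `salary` slot in the sample data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Facets:
    """Constraints on slot values (hypothetical field names)."""
    min_cardinality: int = 0
    max_cardinality: Optional[int] = None      # None means unbounded
    allowed_classes: set = field(default_factory=set)


@dataclass
class Slot:
    """A first-class object: a named binary relation with its own facets."""
    name: str
    facets: Facets = field(default_factory=Facets)


@dataclass
class OntClass:
    """A class in a subclass-superclass hierarchy."""
    name: str
    superclasses: set = field(default_factory=set)     # names of direct superclasses
    template_slots: set = field(default_factory=set)   # names of attached slots


@dataclass
class Instance:
    name: str
    direct_type: str
    own_slot_values: dict = field(default_factory=dict)


def ancestors(classes, name):
    """All direct and indirect superclasses of the named class."""
    seen, stack = set(), list(classes[name].superclasses)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(classes[current].superclasses)
    return seen


def inherited_slots(classes, name):
    """Template slots attached to a class or to any of its superclasses."""
    slots = set(classes[name].template_slots)
    for ancestor in ancestors(classes, name):
        slots |= classes[ancestor].template_slots
    return slots


# Illustrative data only; "salary" is an invented slot.
classes = {
    "Employee": OntClass("Employee", template_slots={"salary"}),
    "Agent": OntClass("Agent", superclasses={"Employee"},
                      template_slots={"has client"}),
}
```

Slots are kept as separate objects rather than as fields of a class because the model treats them as first-class: the same slot can be attached to several classes, which is what later makes "slots attached to the same classes" a usable signal.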
The PROMPT Algorithm
Make initial suggestions
Select the next operation
Perform automatic updates
Find conflicts
Make suggestions
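
The five steps above form a loop that is driven by the user's choices. A minimal Python sketch of that loop follows; the function name `run_merge_session`, its callback parameters, and the tuple encoding of operations are all assumptions for illustration, not the tool's actual interface.

```python
def run_merge_session(initial_suggestions, choose, apply_operation,
                      find_conflicts, make_suggestions):
    """Run the suggest / select / update / check / suggest cycle.

    All callbacks are hypothetical:
      choose(suggestions, conflicts) -> an operation, or None to stop
      apply_operation(op)            -> set of concepts the operation touched
      find_conflicts(affected)       -> list of new conflicts
      make_suggestions(affected)     -> list of new suggestions
    """
    suggestions = list(initial_suggestions)          # make initial suggestions
    conflicts = []
    log = []
    while True:
        operation = choose(suggestions, conflicts)   # select the next operation
        if operation is None:
            return log, suggestions, conflicts
        if operation in suggestions:
            suggestions.remove(operation)
        affected = apply_operation(operation)        # perform automatic updates
        log.append(operation)
        for conflict in find_conflicts(affected):    # find conflicts
            if conflict not in conflicts:
                conflicts.append(conflict)
        for suggestion in make_suggestions(affected):    # make suggestions
            if suggestion not in suggestions and suggestion not in log:
                suggestions.append(suggestion)


# Toy callbacks: merging Agent triggers one follow-up suggestion.
def apply_operation(operation):
    kind, name = operation
    return {name}


def make_suggestions(affected):
    return [("merge-slots", "has client")] if "Agent" in affected else []


log, suggestions, conflicts = run_merge_session(
    [("merge-classes", "Agent")],
    choose=lambda s, c: s[0] if s else None,   # always take the first suggestion
    apply_operation=apply_operation,
    find_conflicts=lambda affected: [],
    make_suggestions=make_suggestions,
)
```

In this toy run the simulated user accepts every suggestion, so the session ends when the suggestion list is empty. A real user could also pick an operation that was never suggested.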
Example: merge-classes
[Figure: class diagrams of the source ontologies, with the classes Agency employee, Employee, Agent, Customer, and Traveler, subclass-of links, and the slots has client and agent for.]
Example: merge-classes (II)
[Figure, continued: the classes Agency employee, Employee, Agent, Customer, and Traveler, with subclass-of links and the slots has client and agent for.]
Analyzing Global Properties Locally

Global properties




classes that have the same sets of slots
classes that refer to the same set of classes
slots that are attached to the same classes
Local context


incremental analysis
consider only the concepts that were affected
by the last operation
The PROMPT Operation Set

Extends the OKBC operation set with ontology-merging operations
 merge classes
 merge slots
 merge instances
 copy of a class
  deep or shallow
  with or without subclasses
  with or without instances
 …
After a User Performs an Operation

For each operation
 perform the operation
 consider possible conflicts
  identify conflicts
  propose solutions
 analyze local context
  create new suggestions
  reinforce or downgrade existing suggestions
Conflicts

Conflicts that PROMPT identifies




name conflicts
dangling references
redundancy in a class hierarchy
slot-value restrictions that violate class
inheritance
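
Two of the conflict types listed above are easy to illustrate. The Python sketch below detects dangling references and redundant superclass links on plain dictionaries; the function names and the data layout are assumptions for illustration and do not come from the tool.

```python
def find_dangling_references(merged_classes, allowed_classes_by_slot):
    """Return (slot, class) pairs where a slot's allowed-class facet names
    a class that is not present in the merged ontology."""
    dangling = []
    for slot, allowed in allowed_classes_by_slot.items():
        for cls in allowed:
            if cls not in merged_classes:
                dangling.append((slot, cls))
    return sorted(dangling)


def redundant_superclass_links(superclasses, cls):
    """Return direct superclasses of cls that are also reachable through
    another direct superclass, which makes the direct link redundant."""
    def ancestors(name):
        seen, stack = set(), list(superclasses.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(superclasses.get(current, ()))
        return seen

    direct = superclasses.get(cls, set())
    return sorted(parent for parent in direct
                  if any(parent in ancestors(other)
                         for other in direct if other != parent))


# Agent has been merged, but the class that "agent for" points to has not.
dangling = find_dangling_references(
    merged_classes={"Agent"},
    allowed_classes_by_slot={"agent for": {"Customer"}},
)

# Illustrative hierarchy: Agent -> Employee -> Person, plus Agent -> Person.
redundant = redundant_superclass_links(
    {"Agent": {"Employee", "Person"}, "Employee": {"Person"}},
    "Agent",
)
```

Both checks only need the concepts touched by the last operation as their starting point, which fits the incremental analysis described earlier.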
Example: merge-classes
[Figure: three class boxes labeled Agent.]
Operation Steps: merge-classes

Own slots and their values for the new class
 ask the user in case of conflicts or use preferences
Template slots for the new class
 union of template slots of the original classes
Subclasses and superclasses for the new class
Conflicts
Suggestions
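
The merge-classes steps above can be sketched on plain dictionaries. This is a hedged illustration in Python, not the tool's code: the function name, the dictionary layout, the `preferred` parameter, the sample documentation strings, and the choice to take the union of superclasses and subclasses are all assumptions.

```python
def merge_classes(first, second, preferred=None):
    """Merge two class descriptions.

    Each description is a dict with 'own_slots' (slot name -> value) and
    'template_slots', 'superclasses', 'subclasses' (sets of names).
    `preferred` is None, "first", or "second" (a stand-in for the
    preferred-ontology setting). Returns (merged, conflicts); conflicts
    lists own slots whose values differ and no preference settled them.
    """
    own_slots, conflicts = {}, []
    for slot in sorted(set(first["own_slots"]) | set(second["own_slots"])):
        values = [c["own_slots"][slot] for c in (first, second)
                  if slot in c["own_slots"]]
        if len(set(values)) == 1:
            own_slots[slot] = values[0]            # no disagreement
        elif preferred == "first":
            own_slots[slot] = first["own_slots"][slot]
        elif preferred == "second":
            own_slots[slot] = second["own_slots"][slot]
        else:
            conflicts.append(slot)                 # left for the user to decide
    merged = {
        "own_slots": own_slots,
        # Template slots: union of the template slots of the originals.
        "template_slots": first["template_slots"] | second["template_slots"],
        "superclasses": first["superclasses"] | second["superclasses"],
        "subclasses": first["subclasses"] | second["subclasses"],
    }
    return merged, conflicts


agent_1 = {"own_slots": {"documentation": "A travel agent"},
           "template_slots": {"agent for"},
           "superclasses": {"Employee"}, "subclasses": set()}
agent_2 = {"own_slots": {"documentation": "An agent of the agency"},
           "template_slots": {"has client"},
           "superclasses": {"Agency employee"}, "subclasses": set()}

merged, conflicts = merge_classes(agent_1, agent_2)
merged_pref, conflicts_pref = merge_classes(agent_1, agent_2, preferred="first")
```

Without a preference the differing documentation values are reported back as a conflict; with `preferred="first"` the first ontology's value wins and nothing is left for the user.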
Template Slots
Copy template slots that don’t exist in the merged ontology
[Figure: Agent classes with the template slot agent for.]
Template Slots
Attach the slots that have already been mapped
[Figure: Agent classes with the slots has client and client.]
Subclasses And Superclasses
If a superclass (subclass) exists, re-establish the links
[Figure: Agent with superclass links to Agency employee and Employee.]
Dangling References
For example:
[Figure: the slot agent for at class Agent has the class Customer as an allowed-class facet value; in the merged ontology the facet value refers to a dummy frame, Customer_temp.]
Additional Suggestions: Merge Slots
If slot names at the merged class are similar, suggest merging the slots
[Figure: class Agent with the slots client and has client.]
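
The slide does not say how name similarity is measured. As one possible stand-in, the Python sketch below compares slot names with `difflib.SequenceMatcher` from the standard library; the function name `suggest_slot_merges` and the 0.7 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def suggest_slot_merges(slot_names, threshold=0.7):
    """Return pairs of slot names whose similarity ratio reaches the threshold.

    The ratio and the threshold are illustrative choices, not PROMPT's
    actual similarity measure.
    """
    suggestions = []
    for first, second in combinations(sorted(slot_names), 2):
        score = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if score >= threshold:
            suggestions.append((first, second))
    return suggestions


# Slots attached to the merged Agent class in the running example.
slot_suggestions = suggest_slot_merges({"client", "has client", "agent for"})
```

Only the pair `client` and `has client` clears the threshold here. The result is a suggestion for the user to confirm, in line with the partial-automation approach.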
Additional Suggestions: Merge Classes
If the set of classes referenced by the merged class is the
same as the set of classes referenced by another class,
suggest a merge
[Figure: Agency employee and Agent, with the slots has clients (referring to Client) and handles reservations (referring to Reservation).]
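
The rule above compares sets of referenced classes. A minimal Python sketch, assuming the references have already been collected into a dictionary from class name to the set of class names it refers to (the function name and data layout are illustrative):

```python
def suggest_class_merges(references, merged_class):
    """Return classes that reference exactly the same non-empty set of
    classes as the merged class."""
    target = references[merged_class]
    return sorted(name for name, refs in references.items()
                  if name != merged_class and refs and refs == target)


# Sample data in the spirit of the slide's example.
references = {
    "Agent": {"Client", "Reservation"},
    "Agency employee": {"Client", "Reservation"},
    "Client": {"Reservation"},
    "Reservation": set(),
}

class_suggestions = suggest_class_merges(references, "Agent")
```

Classes that reference nothing are skipped, since two empty reference sets being equal says little about whether the classes describe the same concept.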
Additional Suggestions: Merge Classes
If names of superclasses (subclasses) of the merged class are similar, suggest merging the classes
[Figure: Agent with superclasses Employee and Agency employee.]
Check for Cycles
If there is a cycle, suggest removing one of the parents
[Figure: superclass links among Person, Employee, Agency employee, and Agent.]
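
A cycle check on the superclass links can be written as a depth-first search from the class that was just changed. The Python sketch below assumes the hierarchy is a dictionary from class name to the set of its direct superclasses; the function name and the sample hierarchy are illustrative, not taken from the tool.

```python
def find_cycle(superclasses, start):
    """Return a list of classes forming a superclass cycle through `start`
    (first and last element are both `start`), or None if there is none."""
    path = [start]
    visited = set()

    def visit(name):
        for parent in sorted(superclasses.get(name, ())):
            if parent == start:
                return True                 # walked back to where we began
            if parent not in visited:
                visited.add(parent)
                path.append(parent)
                if visit(parent):
                    return True
                path.pop()                  # dead end, backtrack
        return False

    return path + [start] if visit(start) else None


# Illustrative hierarchy in which a merge has introduced a cycle.
cyclic = {
    "Agent": {"Agency employee"},
    "Agency employee": {"Employee"},
    "Employee": {"Person", "Agent"},
}
cycle = find_cycle(cyclic, "Agent")
```

Every link on the returned path is a candidate parent to remove; which one to drop is left to the user.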
To Summarize


Perform the actual operation
For the concepts (classes, slots, and instances) directly attached to the operation arguments
 perform global analysis for new suggestions
 perform global analysis for new conflicts
Context
[Figure: a class C, the slots in C, the classes directly referenced by C, and the non-local context.]
Anchor-PROMPT: Using Non-Local Contexts

[Figure: Ontology 1 and Ontology 2]

Input:
 A set of anchor pairs
Output:
 A set of related terms with similarity scores
Where do anchors come from?

Lexical matching
Interactive tools
User-specified
Generating Paths in the Graph
Similarity Score




Generate a set of all paths (of length < L)
Generate a set of all possible pairs of paths of equal length
For each pair of paths and for each pair of nodes in the identical positions in the paths, increment the similarity score
Combine the similarity scores for all the paths
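
The four steps above can be sketched in Python as follows. Several details are assumptions that the slide does not state: ontologies are treated as directed graphs (dictionaries of adjacency lists), paths are taken to run from one anchor to another, path length is counted in edges, the anchors themselves are not scored, and scores are combined by simple summation. The function names and the sample graphs are illustrative.

```python
from collections import defaultdict
from itertools import permutations


def simple_paths(graph, start, goal, max_length):
    """All simple paths from start to goal with fewer than max_length edges."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) >= max_length:        # one more edge would reach max_length
            continue
        for neighbour in graph.get(path[-1], ()):
            if neighbour not in path:
                stack.append(path + [neighbour])
    return paths


def anchor_scores(graph1, graph2, anchors, max_length):
    """Score term pairs that sit at identical positions on equal-length
    paths between the same two anchor pairs."""
    scores = defaultdict(int)
    for (a1, a2), (b1, b2) in permutations(anchors, 2):
        for p1 in simple_paths(graph1, a1, b1, max_length):
            for p2 in simple_paths(graph2, a2, b2, max_length):
                if len(p1) != len(p2):
                    continue               # only pair paths of equal length
                # Skip the endpoints: they are the anchors themselves.
                for n1, n2 in zip(p1[1:-1], p2[1:-1]):
                    scores[(n1, n2)] += 1
    return dict(scores)


# Illustrative graphs; the edges are invented for this sketch.
graph1 = {"TRIAL": ["PROTOCOL"], "PROTOCOL": ["STUDY-SITE"],
          "STUDY-SITE": ["PERSON"]}
graph2 = {"Trial": ["Design"], "Design": ["Blinding"],
          "Blinding": ["Person"]}
anchors = [("TRIAL", "Trial"), ("PERSON", "Person")]

scores = anchor_scores(graph1, graph2, anchors, max_length=4)
```

Terms that keep turning up in the same positions across many path pairs accumulate higher scores, which is why the output is a ranked set of related terms rather than a yes or no answer.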
Equivalence Groups
Anchor-PROMPT: Initial Results
Ontology 1            Ontology 2
TRIAL                 Trial
PERSON                Person
CROSSOVER             Crossover

PROTOCOL              Design
TRIAL-SUBJECT         Person
INVESTIGATORS         Person
POPULATION            Action_Spec
PERSON                Character
TREATMENT-POPULATION  Crossover_arm
Knowledge Model Assumptions
The only assumption:
An OKBC-compliant knowledge model
Protégé-2000

An environment for




Ontology development
Knowledge acquisition
Intuitive direct-manipulation interface
Extensibility

Ability to plug in new components
Ontologies in Protégé-2000
Protégé-2000 plugins




Domain-specific user-interface plugins
Alternative back ends for archival storage
Utility programs for knowledge-acquisition
tasks
End-user applications
Protégé-based PROMPT tool

Protégé-2000


has an OKBC-compatible knowledge model
allows building extensions through a plug-in
mechanism

can work as a knowledge-base server for the plugins
The PROMPT tool
The PROMPT tool features






Setting a preferred ontology
Maintaining the user’s focus
Providing feedback to the user
Preserving original relations
 subclass-superclass relations
 slot attachment
 facet values
Linking to the direct-manipulation ontology editor
Logging operations
Evaluation



Knowledge-based systems are rarely
evaluated
We can use software-engineering
approaches to empirical evaluation of
tools
We need to develop additional knowledge-base measurements
Questions we asked


How good are PROMPT’s suggestions and
conflict-resolution strategies?
Does PROMPT provide any benefit when
compared to a generic ontology-editing tool
(Protégé-2000)?
What we were trying to find out

The benefit that the tool provides




Productivity benefit
Quality improvement in the resulting
ontologies
User satisfaction
Precision and recall of the tool’s
suggestions
Source ontologies for the
experiments

Two ontologies of problem-solving
methods


the ontology for the Unified Problem-solving
Method Development Language (UPML)
the ontology for the Method-Description
Language (MDL)
Experiment 1: Evaluate the quality of PROMPT’s suggestions

Metrics
 Precision
 Recall
Method
 Automatic logging
 Automatic data reporting

[Figure: suggestions that the tool produced, suggestions that the user followed, and operations that the user performed.]
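
The two metrics can be computed from the three quantities named in the figure. The Python sketch below assumes that precision is the share of the tool's suggestions that the user followed, and that recall is the share of the user's operations that had been suggested by the tool; the slide does not spell out these definitions, and the operation names are hypothetical sample data.

```python
def precision_recall(produced, followed, performed):
    """Compute (precision, recall) from sets of operations.

    produced  : suggestions that the tool produced
    followed  : suggestions that the user followed
    performed : all operations that the user performed
    """
    followed = set(followed) & set(produced)      # only count real suggestions
    precision = len(followed) / len(produced) if produced else 0.0
    recall = (len(followed & set(performed)) / len(performed)
              if performed else 0.0)
    return precision, recall


# Hypothetical log of one merging session.
produced = {"merge Agent", "merge Employee", "merge client", "merge Traveler"}
followed = {"merge Agent", "merge Employee", "merge client"}
performed = followed | {"copy Customer"}

precision, recall = precision_recall(produced, followed, performed)
```

With automatic logging in place, all three sets can be read straight from the log, so the user does not have to report anything by hand.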
Results: the quality of PROMPT’s suggestions

Suggestions that users followed                       90%
Conflict-resolution strategies that users followed    75%
Knowledge-base operations generated automatically     74%
Experiment 2: PROMPT versus generic Protégé-2000

Metrics
 content of the resulting ontologies
 number of explicit knowledge-base operations
Results: PROMPT versus generic Protégé-2000

The resulting ontologies had only one difference
Specifying operations explicitly
 [Bar chart, vertical axis from 0 to 60: 16 explicit operations with PROMPT, 60 with Protégé]
Results


Experts followed most of PROMPT’s suggestions
Using PROMPT has improved the
efficiency of ontology merging
Anchor-PROMPT Evaluation

Experiment setup


Two ontologies from the DAML ontology
library
Varying parameters
maximum path length
 number of anchor pairs


Experiment results

Ratio of correct results above the median
similarity score
Anchor-PROMPT: Evaluation Results
Max path length   Number of anchors   Result precision
4                 4                   67%
4                 3                   67%
4                 2                   61%
3                 4                   67%
3                 3                   61%
3                 2                   56%
2                 4                   100%
2                 3                   100%
Anchor-PROMPT Evaluation Results



Equivalence groups of size <= 2 are required
A maximum path length of 2 provides extremely high precision (but low recall)
75% precision with maximum path lengths of 3 and 4
Future work



Extend the set of heuristics that PROMPT
uses for guiding the experts
Extend the techniques to ontology
alignment and ontology refactoring
Develop protocols and metrics for a more
detailed evaluation of the tools
http://protege.stanford.edu