Ontology Mapping and link discovery Kunal Narsinghani Ashwini Lahane

Agenda
 Introduction
 Levels of heterogeneity
 Previous work in the field
 PROMPT Suite of Tools
 Prompt on Protégé
 The Web of Data
 CRS : Managing Co-references
 Silk – A link discovery framework
Introduction
 Can a single ontology suffice for various applications?
 Definition – The task of relating the vocabulary of two Ontologies
that share the same domain of discourse
 It’s a morphism that consists of a collection of functions assigning
symbols used in one vocabulary to the symbols in the other[1]
 This would provide a common layer through which ontologies can be
accessed and can exchange information.
 Translation is different from mapping
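The definition above treats a mapping as a set of functions from the symbols of one vocabulary to the symbols of another. A minimal sketch of that idea follows; the vocabulary terms and the `apply_mapping` helper are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]

def apply_mapping(triple: Triple, symbol_map: Dict[str, str]) -> Triple:
    """Rewrite each symbol of a triple; unmapped symbols pass through unchanged."""
    return tuple(symbol_map.get(term, term) for term in triple)

# Hypothetical symbols of a source vocabulary and their counterparts in a target one.
symbol_map = {"Author": "Writer", "wrote": "authorOf", "Paper": "Publication"}

mapped = apply_mapping(("Author", "wrote", "Paper"), symbol_map)
```

Only the symbols are related here; the statements themselves are not rewritten in any deeper sense, which is one way to read the distinction between mapping and translation.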
Introduction
 An analogy to the problem – Clocks
Levels of Heterogeneity in Ontologies
 Syntactic
 Structural
 Semantic
Mapping discovery
 First approach is to use a reference ontology
 Example – the upper Ontologies SUMO and DOLCE
 What if a shared ontology is not available?
 Structural & definitional information can be used to discover
mappings
 Example tools – IF-Map, QOM, MAFRA & Prompt
IF-MAP architecture
Fig: The steps in IF-MAP
PROMPT Suite of Tools
 Interactive tools for ontology merging and mapping
 Ontology
 formal specification of domain information
 facilitate knowledge sharing and reuse
 Different ontologies may overlap and need to be reconciled
 Determine correlation
 Find all concepts
 Determine similarities
 Change source ontologies or remove overlap
 Record mapping for future reference
Ontology Management
 Tasks
 Finding correlations
 Merging ontologies
 Version management
 Factoring ontologies
 Tools
 Benefit from being tightly integrated into single framework
 Uniform user interface
 Same interaction paradigms
 Easy access from one tool to another
PROMPT Knowledge Model
 Based on knowledge model of Protégé
 Frame based
 Types of frames
 Class
 Set of entities specifying a concept
 Slots
 Attributes of class
 Has domain and range
 Must have unique names
 Instances
 Elements of class
PROMPT Framework
 Tools for multiple-ontology management
 Extension to the Protégé ontology-editing environment
 Open architecture allows easy extension with plugins
 Tools in PROMPT
 IPROMPT – Interactive ontology merging tool
 ANCHORPROMPT – a graph-based tool for finding
similarities between ontologies
 PROMPTDIFF – a tool for finding a diff between two versions of the
same ontology
 PROMPTFACTOR – a tool for extracting a part of an ontology
IPROMPT
 Interactive ontology merging tool
 Leads user through merging process
 Suggestions for merging
 Identifies inconsistencies and potential problems
 Suggests strategies for resolving
 Uses structure of concepts and their relation along with user
input
 Decision based on local context
 Iterative
IPROMPT Algorithm
 Creates initial suggestion based on lexical similarity of names
 Merged ontology contains frames which are similar to frames
in input ontologies
 Two ontologies, O1 and O2, are merged to form Om
 Merging decisions are designer and task dependent
 Set of knowledge-based operations defined
 For each operation:
 Changes performed automatically
 New merging suggestions
 Inconsistencies and potential problems
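The first step, initial suggestions from lexical similarity of names, can be sketched as below. This is not the PROMPT implementation: the similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, and the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product

def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def initial_suggestions(names1, names2, threshold=0.8):
    """Return (name1, name2, score) merge candidates, best first."""
    pairs = []
    for n1, n2 in product(names1, names2):
        score = name_similarity(n1, n2)
        if score >= threshold:
            pairs.append((n1, n2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Frame names loosely based on the clinical-trial example used later in the slides.
suggestions = initial_suggestions(
    ["TRIAL", "PERSON", "PROTOCOL"],
    ["Trial", "Person", "Design"],
)
```

Each accepted suggestion would then trigger the per-operation steps listed above: automatic changes, new suggestions, and a check for conflicts.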
Class hierarchies
Suggestion for merging
IPROMPT Operations
 Merge classes
 Merge slots
 Merge instances
 Shallow copy of a class
 Copy class from source ontology to merged
 Deep copy of a class
 Also copies all the parents of the class up to the root of the hierarchy
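The difference between the two copy operations can be sketched on a toy class hierarchy. The dictionary representation (class name to list of parent names) and both function names are assumptions for illustration, not Protégé's frame model.

```python
def shallow_copy(cls, source, merged):
    """Copy only the class itself; its parent links may point at missing classes."""
    merged.setdefault(cls, list(source[cls]))

def deep_copy(cls, source, merged):
    """Copy the class and, recursively, every ancestor up to the root."""
    if cls in merged:
        return
    merged[cls] = list(source[cls])
    for parent in source[cls]:
        deep_copy(parent, source, merged)

# Hypothetical source ontology: class -> parents.
source = {"Thing": [], "Agent": ["Thing"], "Person": ["Agent"]}

shallow_result = {}
shallow_copy("Person", source, shallow_result)

deep_result = {}
deep_copy("Person", source, deep_result)
```

In this sketch the shallow copy leaves `Person` pointing at an `Agent` class that is absent from the merged ontology, while the deep copy brings `Agent` and `Thing` along.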
Inconsistencies & Potential Problems
 Name conflicts
 Dangling references
 Redundancy in the class hierarchy
 Slot values violating slot-value restrictions
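As an illustration of one of these checks, a dangling reference can be detected by looking for references to frames that are not present in the merged ontology. The data layout (frame name to list of referenced frame names) is an assumption of this sketch.

```python
def dangling_references(frames):
    """Return sorted (frame, missing_target) pairs for references to absent frames."""
    return sorted(
        (name, target)
        for name, targets in frames.items()
        for target in targets
        if target not in frames
    )

# Hypothetical merged ontology in which "Agent" was never copied over.
merged = {"Person": ["Agent"], "Thing": []}
problems = dangling_references(merged)
```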
Additional features
 Setting up preferred ontology
 Maintaining user focus
 Providing feedback to user
 Logging of ontology merging and editing operations
ANCHORPROMPT
 Graph based tool for finding similarities
 Compares larger portions of the ontologies
 Goal : Augment IPROMPT by determining additional points
of similarity
 Input : Anchors - Set of pairs of related terms
 Anchor identification – Manual / Automatic
 Each ontology is viewed as a directed labeled graph
ANCHORPROMPT representation
ANCHORPROMPT algorithm
Algorithm
 Begins with anchor pair
 TRIAL, Trial
 PERSON, Person
 Path 1: TRIAL -> PROTOCOL -> STUDY-SITE -> PERSON
 Path 2: Trial -> Design -> Blinding -> Person
 Determine similarity score for pair of related terms
 If two pairs of terms from the source ontologies are similar
and there are paths connecting the terms, then the elements
in those paths are often similar as well
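The path heuristic can be sketched as follows: enumerate paths between the two anchors in each graph and, for paths of equal length, increase the score of terms that sit at the same position. This is a simplified sketch under assumptions (edge labels are ignored, only equal-length paths are compared, and all function names are hypothetical), not the published ANCHORPROMPT algorithm in full.

```python
from collections import Counter

def simple_paths(graph, start, goal, max_len):
    """Yield every cycle-free path from start to goal with at most max_len nodes."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            yield path
            continue
        if len(path) < max_len:
            for nxt in graph.get(node, ()):
                if nxt not in path:
                    stack.append(path + [nxt])

def anchor_scores(g1, g2, anchor_a, anchor_b, max_len=5):
    """Score term pairs that occupy the same position on equal-length paths."""
    (a1, a2), (b1, b2) = anchor_a, anchor_b
    scores = Counter()
    paths2 = list(simple_paths(g2, a2, b2, max_len))
    for p1 in simple_paths(g1, a1, b1, max_len):
        for p2 in paths2:
            if len(p1) == len(p2):
                # Skip the endpoints: they are the anchors, already known to match.
                for t1, t2 in zip(p1[1:-1], p2[1:-1]):
                    scores[(t1, t2)] += 1
    return scores

# The two paths from the slide, encoded as adjacency lists.
g1 = {"TRIAL": ["PROTOCOL"], "PROTOCOL": ["STUDY-SITE"], "STUDY-SITE": ["PERSON"]}
g2 = {"Trial": ["Design"], "Design": ["Blinding"], "Blinding": ["Person"]}

scores = anchor_scores(g1, g2, ("TRIAL", "Trial"), ("PERSON", "Person"))
```

With the slide's example, the pairs (PROTOCOL, Design) and (STUDY-SITE, Blinding) each receive one point; over many anchor pairs and paths, high-scoring pairs become candidate matches.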
PROMPTDIFF
 Tool for comparing ontology versions
 Version comparison in software code is based on comparing
text files
 Ontologies can have different text representations
 Heuristic algorithm that produces a structural diff between
two versions
 Compares the structure of the two ontology versions
 Identifies which frames changed and what changes were made
PromptDiff Algorithm
 An extensible set of heuristic matchers
 Fixed-point algorithm to combine the results of the matchers
to produce a structural diff between two versions
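The fixed-point combination can be sketched as a loop that keeps running every matcher until none of them produces a new match. The two matchers below and the data layout (class name to list of subclass names) are illustrative assumptions, not the actual PromptDiff matcher set.

```python
def same_name_matcher(v1, v2, matches):
    """Match frames that carry the same name in both versions."""
    return {(f, f) for f in v1 if f in v2}

def single_unmatched_sibling_matcher(v1, v2, matches):
    """If matched parents each have exactly one unmatched child, pair those children."""
    matched1 = {a for a, _ in matches}
    matched2 = {b for _, b in matches}
    found = set()
    for p1, p2 in matches:
        rest1 = [c for c in v1.get(p1, ()) if c not in matched1]
        rest2 = [c for c in v2.get(p2, ()) if c not in matched2]
        if len(rest1) == 1 and len(rest2) == 1:
            found.add((rest1[0], rest2[0]))
    return found

def structural_diff(v1, v2, matchers):
    """Run all matchers repeatedly until the set of matches stops growing."""
    matches = set()
    changed = True
    while changed:
        changed = False
        for matcher in matchers:
            new = matcher(v1, v2, matches) - matches
            if new:
                matches |= new
                changed = True
    return matches

# Hypothetical versions: "Wine" was renamed to "Vino" in the second one.
v1 = {"Thing": ["Person", "Wine"], "Person": [], "Wine": []}
v2 = {"Thing": ["Person", "Vino"], "Person": [], "Vino": []}

diff = structural_diff(v1, v2, [same_name_matcher, single_unmatched_sibling_matcher])
```

The second matcher can only fire after the first has matched the parents, which is why the results are combined to a fixed point rather than in a single pass.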
PROMPTFACTOR
 Tool for factoring out a semantically independent part of a
large ontology into a new sub-ontology
 Ensures that severed links do not introduce ill-defined
concepts in the sub-ontology
 User can specify concepts of interest
 Computes the transitive closure of the superclass relation and
all the relations defined by slots
 Target ontology works as stand-alone
PromptFactor Algorithm
 User specifies the concept of interest
 PromptFactor traverses the ontology starting from that term
 Determines transitive closure of all relations including
subclass-of relation
 Determines all the parents of selected term in hierarchy
 User interactive
 Determines inconsistencies
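The traversal can be sketched as a breadth-first transitive closure over all outgoing relations of the selected concepts. The representation (term to list of related terms, mixing superclasses and slot targets) and the `factor` function are assumptions of this sketch; the interactive and consistency-checking steps are left out.

```python
from collections import deque

def factor(ontology, concepts_of_interest):
    """Return the sub-ontology reachable from the given concepts."""
    keep = set(concepts_of_interest)
    queue = deque(keep)
    while queue:
        term = queue.popleft()
        for related in ontology.get(term, ()):
            if related not in keep:
                keep.add(related)
                queue.append(related)
    return {term: list(ontology.get(term, ())) for term in keep}

# Hypothetical ontology: each term lists its superclasses and slot targets.
ontology = {
    "Thing": [],
    "Person": ["Thing", "Address"],
    "Address": ["Thing"],
    "Wine": ["Thing"],
}

sub_ontology = factor(ontology, ["Person"])
```

Because every referenced term is pulled in, the extracted part contains no references to terms outside itself, which is what lets it stand alone.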
Prompt Demo
 It is available as a plug-in for Protégé 3.4
 Uses linguistic similarity matches between concepts
 Also matches slot names and slot value types
 In cases where automation is not possible, user intervention is needed; possible
actions are suggested
 Alignment is followed by merging
 Alignment is establishing links between the ontologies
 Merging is the creation of a single coherent ontology
The Web of Data
 Data sources span a large range of domains
 RDF data model is used to publish structured data on the
web
 Explicit RDF links exist between entities in different data
sources
 However, there is a lack of tools to set RDF links to other
data sources
Silk
 It is a link specification language
 Allows specification of the links that should be discovered
between data sources, as well as conditions to be fulfilled to
be linked
 Link conditions are specified using similarity metrics; they
can use aggregation functions to combine similarity scores
 Data access performed using SPARQL
Silk Features
 Support for owl:sameAs links and other types of RDF links
 Provides a declarative language to specify link conditions
 Datasets need not be replicated locally
 Caching, indexing and entity pre-selection are used to
enhance performance
Silk LSL example
Silk LSL example (contd.)
Silk similarity metrics
 Similarity metrics can be combined using aggregation functions
 Sets of resources can be selected using Silk RDF path selector language
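Combining similarity scores with an aggregation function can be sketched in plain Python. This is not Silk LSL syntax or the Silk API: the metrics, the weights, the 0.9 threshold, and the entity fields are all assumptions for illustration.

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def weighted_average(scored):
    """Aggregate (score, weight) pairs into one score."""
    scored = list(scored)
    total_weight = sum(w for _, w in scored)
    return sum(s * w for s, w in scored) / total_weight

def should_link(source, target, threshold=0.9):
    """Decide whether two entities satisfy the (hypothetical) link condition."""
    score = weighted_average([
        (string_similarity(source["label"], target["label"]), 2),
        (1.0 if source["year"] == target["year"] else 0.0, 1),
    ])
    return score >= threshold

# Hypothetical entity descriptions from two data sources.
a = {"label": "Berlin", "year": 1237}
b = {"label": "berlin", "year": 1237}
c = {"label": "Bern", "year": 1191}
```

Here `should_link(a, b)` holds because both metrics agree, while `should_link(a, c)` does not; a declarative link specification expresses the same kind of condition without procedural code.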
Silk Pre-Matching
 Comparing every entity in source S with every entity in target T
would need O(|S|*|T|) comparisons
 Pre-matching finds a limited set of target entities that are
likely to match a given source entity
 Performed by indexing the target resources based on their
property values
 Using this scheme reduces runtime to O(|S| + |T|)
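The indexing idea can be sketched with an inverted index over the tokens of one property value. The tokenisation, the property name, and the URIs are assumptions; Silk's actual index structure is not described in the slides.

```python
from collections import defaultdict

def build_index(targets, prop):
    """Index target URIs by the lowercase tokens of one property value."""
    index = defaultdict(set)
    for uri, entity in targets.items():
        for token in entity[prop].lower().split():
            index[token].add(uri)
    return index

def candidates(source_entity, index, prop):
    """Return the target URIs that share at least one token with the source."""
    found = set()
    for token in source_entity[prop].lower().split():
        found |= index.get(token, set())
    return found

# Hypothetical target data set.
targets = {
    "t:1": {"label": "Hugh Glaser"},
    "t:2": {"label": "Ian Millard"},
}
index = build_index(targets, "label")
likely = candidates({"label": "H. Glaser"}, index, "label")
```

The index is built in one pass over T and consulted once per entity in S, so the expensive similarity metrics run only on the small candidate sets, assuming those sets stay small.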
Silk Implementation
Managing coreferences
 Semantic web vision - Large quantities of information
 Readily available
 Interlinked
 Machine readable
 Fragmented web
 Significant overlap
 Need to identify ‘duplicates’
 Co-reference resolution – determining “equivalent” URIs
Co-reference Resolution Service (CRS)
 Systematic analysis and heuristic based approach :
 Identifying
 Publishing
 Managing
 Using co-reference information
 Most prevalent way – owl:sameAs
 Equivalence – context dependent
CRSes
 Maintain sets of equivalent URIs
 Storing co-reference data separately
 URI definition and synonyms are kept separate
 Management techniques - history, rollback, annotation
 Use of multiple CRSes that applications can use
 Core functionality in PHP – easy integration
 Backed by MySQL
Data representation in CRS
 Equivalent URIs are stored in bundles
 One URI in each bundle is designated the canon (the preferred URI)
 Formation of bundles:
 Check if URI already exists in any bundle
 If not, create a ‘singleton’ bundle for new URIs
 Perform merge – union of bundles with “equivalent” URIs
 Constituent bundles that were merged are marked inactive
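The bundle-formation steps above can be sketched as a small in-memory class. The real service is described as PHP backed by MySQL; this Python version, its class name, and its choice of canon after a merge (the canon of the first bundle) are assumptions for illustration only.

```python
class CRS:
    """Toy co-reference store: bundles of equivalent URIs with one canon each."""

    def __init__(self):
        self.bundles = {}    # bundle id -> {"uris": set, "canon": str, "active": bool}
        self.bundle_of = {}  # URI -> id of its active bundle
        self._next_id = 0

    def _new_bundle(self, uris, canon):
        bid = self._next_id
        self._next_id += 1
        self.bundles[bid] = {"uris": set(uris), "canon": canon, "active": True}
        for uri in uris:
            self.bundle_of[uri] = bid
        return bid

    def add(self, uri):
        """Create a singleton bundle unless the URI is already known."""
        if uri not in self.bundle_of:
            self._new_bundle({uri}, uri)
        return self.bundle_of[uri]

    def assert_equivalent(self, uri_a, uri_b):
        """Merge the bundles of two URIs; the old bundles are kept but inactive."""
        a, b = self.add(uri_a), self.add(uri_b)
        if a == b:
            return a
        merged = self.bundles[a]["uris"] | self.bundles[b]["uris"]
        for old in (a, b):
            self.bundles[old]["active"] = False
        return self._new_bundle(merged, self.bundles[a]["canon"])

    def canon(self, uri):
        return self.bundles[self.bundle_of[uri]]["canon"]

    def equivalents(self, uri):
        return set(self.bundles[self.bundle_of[uri]]["uris"])

crs = CRS()
crs.assert_equivalent("a", "b")
crs.assert_equivalent("b", "c")
```

Keeping the merged-away bundles around as inactive records is what would make history and rollback possible in such a design.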
Examples of bundle formation
Data representation
 Data storage – Indexed tables of hashed URIs
 Permits fast lookup to find:
 Canon of given URI
 All URIs in a bundle
 Deprecate URIs by flags
 Finding all equivalences – follow the coref:coreferenceData link to
the bundle for that URI, then recursively repeat the process for
each URI in that bundle
<rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <coref:Bundle>
    <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898"/>
    <coref:duplicate rdf:resource="http://citeseer.rkbexplorer.com/id/resource-CSP109002"/>
    <coref:duplicate rdf:resource="http://dblp.rkbexplorer.com/id/people-27aedbcb"/>
    <coref:duplicate rdf:resource="http://eprints.rkbexplorer.com/id/kfupm/person-27aed0c1"/>
    <coref:duplicate rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser"/>
    <coref:lastUpdated>2009-01-16 11:11:40</coref:lastUpdated>
  </coref:Bundle>
</rdf:RDF>
RDF description of equivalent URIs in a bundle
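A client could read such a bundle description with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the `read_bundle` helper is hypothetical, and the sample string is an abbreviated copy of the bundle shown above.

```python
import xml.etree.ElementTree as ET

NS = {
    "coref": "http://www.rkbexplorer.com/ontologies/coref#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
RESOURCE = "{%s}resource" % NS["rdf"]  # qualified name of the rdf:resource attribute

def read_bundle(rdf_xml):
    """Return (canon URI, list of duplicate URIs) from one coref:Bundle."""
    root = ET.fromstring(rdf_xml)
    bundle = root.find("coref:Bundle", NS)
    canon = bundle.find("coref:canon", NS).get(RESOURCE)
    duplicates = [d.get(RESOURCE) for d in bundle.findall("coref:duplicate", NS)]
    return canon, duplicates

SAMPLE = """<rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <coref:Bundle>
    <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898"/>
    <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser"/>
  </coref:Bundle>
</rdf:RDF>"""

canon, duplicates = read_bundle(SAMPLE)
```

Reading only the `coref:canon` element is enough when an application just needs one preferred URI per entity.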
 Ways to speed up
 Look up only 1 URI from each CRS
 Follow only coref:canon predicate
 Lookup would need O(log|S|+ log|T|)
References
[1] Natalya F. Noy and Mark A. Musen. The PROMPT Suite: Interactive Tools for
Ontology Merging and Mapping. Stanford Medical Informatics, Stanford University.
[2] Hugh Glaser, Afraz Jaffri, and Ian C. Millard. Managing Co-reference on the
Semantic Web. School of Electronics and Computer Science, University of
Southampton, Southampton, Hampshire, UK.
[3] Yannis Kalfoglou and Marco Schorlemmer. Ontology Mapping: The State of the Art.
[4] Y. Kalfoglou and M. Schorlemmer (2003a). IF-Map: An Ontology Mapping Method
Based on Information Flow Theory. Journal on Data Semantics, 1(1):98–127.
[5] Julius Volz, Christian Bizer, et al. Silk – A Link Discovery Framework for
the Web of Data.