Increasing the Precision of Semantic Interoperation Gio Wiederhold

Stanford Computer Forum
Increasing the Precision of
Semantic Interoperation
Gio Wiederhold
Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.
Stanford University
March 2000
report: www-db.stanford.edu/pub/gio/1999/miti.htm
Supported by AFOSR - New World Vistas Program
March 2000
Gio XIT 1
Heterogeneity among Domains
If interoperation involves distinct domains, mismatch ensues
• Autonomy conflicts with consistency,
– Local Needs have Priority,
– Outside uses are a Byproduct
Heterogeneity must be addressed
• Platform and Operating Systems
• Representation and Access Conventions
• Naming and Ontology
Semantic Mismatches
Information comes from many autonomous sources
• Differing viewpoints (by source)
– differing terms for similar items: { lorry, truck }
– same terms for dissimilar items: trunk(luggage, car)
– differing coverage: vehicles (DMV, AIA)
– differing granularity: trucks (shipper, manuf.)
– different scope: student museum fee, Stanford
• Hinders use of information from disjoint sources
– missed linkages: loss of information, opportunities
– irrelevant linkages: overload on user or application program
• Poor precision when merged: ok for web browsing, poor for business
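The two failure modes named on this slide, missed linkages and irrelevant linkages, can be illustrated with a small sketch. It assumes each source is simply a mapping from a term to the concept that source means by it. The source names, terms, and concept labels are hypothetical, chosen only to echo the lorry/truck and trunk examples.

```python
# Hypothetical sources: term -> concept the source actually means by that term.
shipper = {"lorry": "road freight vehicle", "trunk": "luggage container"}
garage = {"truck": "road freight vehicle", "trunk": "car storage compartment"}


def naive_links(a, b):
    """Link terms purely by identical spelling, ignoring meaning."""
    return {term for term in a if term in b}


def true_links(a, b):
    """Link term pairs whose underlying concepts agree (the ideal result)."""
    return {(ta, tb) for ta, ca in a.items() for tb, cb in b.items() if ca == cb}


linked = naive_links(shipper, garage)
ideal = true_links(shipper, garage)

# Irrelevant linkage: same spelling, different concept.
irrelevant = {t for t in linked if shipper[t] != garage[t]}
# Missed linkage: same concept, but the spellings differ, so the naive join skips it.
missed = {(ta, tb) for ta, tb in ideal if ta != tb}

print("irrelevant:", irrelevant)
print("missed:", missed)
```

In a real setting the "concept" column is not available, which is why the later slides rely on an expert to confirm or reject suggested matches.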
Solutions
Specify and standardize terminology usage: ontology
• Globally: all interacting sources
– wonderful for users and their programs
– long time to achieve: 2 sources (UAL, BA), 3 (+ trucks), 4, … all?
– costly maintenance, since all sources evolve
– who has the authority to dictate conformance?
• Domain-specific: XML DTD assumption
– small, focused, cooperating groups
– high quality; some examples: genomics, arthritis, Shakespeare plays
– allows sharable, formal tools
– ongoing, local maintenance affecting users: annual updates
– poor interoperation: users still face inter-domain mismatches
• solves only part of the problem
Domains and Consistency
• a domain will contain many objects
• the object configuration is consistent
• within a domain all terms are consistent &
• relationships among objects are consistent
Domain Ontology
• context is implicit
No committee is needed to forge compromises * within a domain
* Compromises hide valuable details
Objective: Scalable Knowledge Composition (SKC)
Provide for Maintainable Ontologies
• devolve maintenance onto many domain-specific experts / authorities
• provide an algebra to compute composed ontologies that are limited to their articulation terms
• enable interpretation within the source contexts
Sample Operation: INTERSECTION
Articulation: result contains shared terms, useful for purchasing
Source Domain 1: owned and maintained by Store
Source Domain 2: owned and maintained by Factory
Tools to create articulations
Graph matcher for the articulation-creating expert
• Transport ontology
• Vehicle ontology
• Suggestions for articulations
Continue from initial point
Also suggest similar terms for further articulation:
• by spelling similarity
• by graph position
• by term match repository
Expert response:
1. Okay
2. False
3. Irrelevant to this articulation
All results are recorded.
Okays are converted into articulation rules.
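The suggest-and-review loop on this slide can be sketched as follows. This is a minimal illustration, not the tool shown in the talk: it covers only the spelling-similarity suggestion (using Python's standard `difflib`), and the term lists, the `cutoff` value, and the `demo_expert` stand-in for the human expert are all assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical term lists from two source ontologies.
transport_terms = ["truck", "trailer", "cargo", "carrier"]
vehicle_terms = ["trucks", "trailers", "car", "engine"]


def suggest_by_spelling(terms_a, terms_b, cutoff=0.8):
    """Propose (a, b) candidate pairs whose spellings are similar."""
    candidates = []
    for a in terms_a:
        for b in get_close_matches(a, terms_b, n=3, cutoff=cutoff):
            candidates.append((a, b))
    return candidates


def review(candidates, expert):
    """Record every expert response; turn only the 'okay' ones into rules."""
    log, rules = [], []
    for pair in candidates:
        verdict = expert(pair)  # one of: "okay", "false", "irrelevant"
        log.append((pair, verdict))
        if verdict == "okay":
            rules.append(pair)
    return log, rules


def demo_expert(pair):
    """Stand-in for the human expert (hypothetical fixed answers)."""
    return "okay" if pair == ("truck", "trucks") else "irrelevant"


candidates = suggest_by_spelling(transport_terms, vehicle_terms)
log, rules = review(candidates, demo_expert)
print(rules)
```

Keeping the full log, including the "false" and "irrelevant" verdicts, means a rejected suggestion does not have to be shown to the expert again.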
Candidate Match Repository
Term linkages automatically extracted from 1912 Webster’s dictionary *
* free; other sources being processed.
Based on processing headwords → definitions using algebra primitives.
Notice presence of 2 domains: chemistry, transport
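One simple way to picture headword-to-definition linkage is sketched below. It is a simplified stand-in, not the extraction method used for the repository (the slide says that was built with the algebra primitives). The miniature dictionary is invented for illustration and is not quoted from Webster's; `link_headwords` and `clusters` are hypothetical helper names.

```python
import re

# Hypothetical miniature dictionary: headword -> definition text.
entries = {
    "truck": "a wheeled vehicle for moving heavy cargo",
    "vehicle": "that in which anything is carried; a carriage",
    "cargo": "the lading or freight of a ship or other vehicle",
    "acid": "a sour substance that reddens litmus",
    "litmus": "a dye that is turned red by an acid",
}


def link_headwords(entries):
    """Return headword -> set of other headwords mentioned in its definition."""
    headwords = set(entries)
    links = {}
    for head, definition in entries.items():
        words = set(re.findall(r"[a-z]+", definition.lower()))
        links[head] = (words & headwords) - {head}
    return links


def clusters(links):
    """Group headwords connected by links in either direction."""
    neighbours = {h: set(t) for h, t in links.items()}
    for h, targets in links.items():
        for t in targets:
            neighbours[t].add(h)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


links = link_headwords(entries)
print(clusters(links))
```

With this toy input the linked terms fall into two groups, one about chemistry and one about transport, which mirrors the observation on the slide.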
Using the Match Repository
An Ontology Algebra
A knowledge-based algebra for ontologies
• Intersection: create a subset ontology, keep sharable entries
• Union: create a joint ontology, merge entries
• Difference: create a distinct ontology, remove shared entries
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
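The three operations can be sketched over a deliberately reduced model. The assumptions here are mine, not the talk's: an ontology is just a set of terms, and the articulation rules are directional (store term, factory term) pairs already approved by an expert. The terms and rules are hypothetical.

```python
# Hypothetical ontologies, reduced to sets of terms.
store = {"shoe", "price", "shelf", "size"}
factory = {"footwear", "cost", "lathe", "size"}

# Hypothetical articulation rules: (store term, factory term) pairs judged equivalent.
rules = {("shoe", "footwear"), ("price", "cost"), ("size", "size")}


def intersection(a, b, rules):
    """Keep only sharable entries: rule pairs whose terms exist in both sources."""
    return {(x, y) for x, y in rules if x in a and y in b}


def difference(a, b, rules):
    """Remove shared entries: terms of `a` that no rule links to `b`."""
    shared = {x for x, _ in intersection(a, b, rules)}
    return a - shared


def union(a, b, rules):
    """Merge entries: linked terms become one entry, the rest are carried over."""
    linked = intersection(a, b, rules)
    merged = {frozenset(pair) for pair in linked}
    merged |= {frozenset({x}) for x in difference(a, b, rules)}
    linked_b = {y for _, y in linked}
    merged |= {frozenset({y}) for y in b - linked_b}
    return merged


print(intersection(store, factory, rules))
print(difference(store, factory, rules))
print(union(store, factory, rules))
```

Because the rules are written from the store's side, `difference(factory, store, rules)` would need the pairs reversed; a fuller model would carry the direction of each rule explicitly.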
INTERSECTION support
Articulation ontology: terms useful for purchasing
Matching rules that use terms from the 2 source domains
• Store Ontology
• Factory Ontology
Other Basic Operations
DIFFERENCE: material fully under local control
UNION: merging entire ontologies
Articulation ontology: typically prior intersections
Features of an algebra
Operations can be composed
Operations can be rearranged
Alternate arrangements can be evaluated
Optimization is enabled
The record of past operations can be kept and reused when sources change
Knowledge Composition
Composed knowledge for applications using A, B, C, E:
(A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E)
Articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ E), and (C ∩ D)
Knowledge resources: A, B, C, D, E
Legend: ∪ : union, ∩ : intersection
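The composition for applications using A, B, C, and E, together with the earlier point that a record of operations can be replayed when sources change, can be sketched as below. This is a simplification under stated assumptions: each knowledge resource is a set of already-normalized term identifiers, so plain set intersection stands in for articulation. The identifiers and the `plan` structure are hypothetical.

```python
from functools import reduce

# Hypothetical knowledge resources, as sets of already-normalized term identifiers.
resources = {
    "A": {"t1", "t2", "t3"},
    "B": {"t2", "t3", "t4"},
    "C": {"t4", "t5", "t6"},
    "D": {"t6", "t9"},
    "E": {"t5", "t7"},
}

# The recorded composition: a union over pairwise articulations.
plan = [("A", "B"), ("B", "C"), ("C", "E")]


def compose(resources, plan):
    """Evaluate the union of the pairwise intersections named in `plan`."""
    parts = [resources[x] & resources[y] for x, y in plan]
    return reduce(set.union, parts, set())


before = compose(resources, plan)

# A source evolves; replaying the stored plan refreshes the result.
resources["C"] = resources["C"] | {"t7"}
after = compose(resources, plan)

print(sorted(before))
print(sorted(after))
```

Resource D does not appear in the plan, so changes to D never force this composition to be recomputed.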
Primitive Operations
Model and Instance
Unary
• Summarize - abstract
• Glossarize - list terms
• Filter - reduce instances
• Extract - move into context
Binary
• Match - data corroboration
• Difference - distance measure
• Intersect - use of articulation
• Union - search broadening
Constructors
• create object
• create set
Connectors
• match object
• match set
Editors
• insert value
• edit value
• move value
• delete value
Converters
• object - value
• object indirection
• reference indirection
Exploiting the result
Result has links to source.
Avoid the n² problem of interpreter mapping [Swartout HPKB year 1]
Processing & query evaluation is best performed within Source Domains & by their engines
Domain Specialization
• Knowledge Acquisition (20% effort) &
• Knowledge Maintenance (80% effort *)
to be performed by
• Domain specialists
• Professional organizations
• Field teams
of modest size, autonomously maintainable
Empowerment
* based on experience with software
Summary
To sustain the trend
1. The value of the results has to keep increasing: precision, relevance, not volume
2. Value is provided by experts, encoded as models of diverse resources, customers
Problems to be addressed
• mismatches
• quality
• temporal extensions
• maintenance
} Clear models
Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.