Stanford Computer Forum

Increasing the Precision of Semantic Interoperation

Gio Wiederhold
Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker
Stanford University
March 2000
Report: www-db.stanford.edu/pub/gio/1999/miti.htm
Supported by the AFOSR New World Vistas Program

March 2000 Gio XIT 1


Heterogeneity among Domains

If interoperation involves distinct domains, mismatch ensues
• Autonomy conflicts with consistency
  – Local needs have priority
  – Outside uses are a byproduct

Heterogeneity must be addressed
• Platform and Operating Systems ✔ ✔
• Representation and Access Conventions ✔
• Naming and Ontology


Semantic Mismatches

Information comes from many autonomous sources
• Differing viewpoints (by source)
  – differing terms for similar items: { lorry, truck }
  – same terms for dissimilar items: trunk (luggage, car)
  – differing coverage: vehicles (DMV, AIA)
  – differing granularity: trucks (shipper, manuf.)
  – different scope: student museum fee, Stanford
• Hinders use of information from disjoint sources
  – missed linkages: loss of information, opportunities
  – irrelevant linkages: overload on user or application program
• Poor precision when merged
  – ok for web browsing, poor for business


Solutions

Specify and standardize terminology usage: ontology
• Globally, for all interacting sources
  – wonderful for users and their programs
  – long time to achieve: 2 sources (UAL, BA), 3 (+ trucks), 4, … all?
  – costly maintenance, since all sources evolve
  – who has the authority to dictate conformance?
• Domain-specific (the XML DTD assumption)
  – small, focused, cooperating groups
  – high quality; some examples: genomics, arthritis, Shakespeare plays
  – allows sharable, formal tools
  – ongoing, local maintenance affecting users (annual updates)
  – poor interoperation: users still face inter-domain mismatches
• solves only part of the problem


Domains and Consistency
• a domain will contain many objects
• the object configuration is consistent
• within a domain all terms are consistent, and
• relationships among objects are consistent
• context is implicit

[Figure label: Domain Ontology]

No committee is needed to forge compromises* within a domain
* Compromises hide valuable details


Objective: Scalable Knowledge Composition (SKC)

Provide for maintainable ontologies
• devolve maintenance onto many domain-specific experts / authorities
• provide an algebra to compute composed ontologies that are limited to their articulation terms
• enable interpretation within the source contexts


Sample Operation: INTERSECTION

[Figure: two source ontologies and their articulation]
• Source Domain 1: owned and maintained by Store
• Source Domain 2: owned and maintained by Factory
• Articulation: result contains shared terms, useful for purchasing


Tools to create articulations

[Figure: a graph matcher for the articulation-creating expert takes the Transport ontology and the Vehicle ontology and produces suggestions for articulations]


Continue from initial point

Also suggest similar terms for further articulation:
• by spelling similarity
• by graph position
• by term match repository

Expert response:
1. Okay
2. False
3. Irrelevant to this articulation

All results are recorded
Okays are converted into articulation rules


Candidate Match Repository

Term linkages automatically extracted from the 1912 Webster's dictionary*
* free; other sources being processed
Based on processing headwords and definitions using algebra primitives
Notice the presence of 2 domains: chemistry, transport


Using the Match Repository

[Two figure-only slides]


An Ontology Algebra

A knowledge-based algebra for ontologies

Operation      Result                        Treatment of entries
Intersection   create a subset ontology      keep sharable entries
Union          create a joint ontology       merge entries
Difference     create a distinct ontology    remove shared entries

The Articulation Ontology (AO) consists of matching rules that link domain ontologies


INTERSECTION support

[Figure: Store Ontology and Factory Ontology linked by an articulation ontology]
• Articulation ontology: terms useful for purchasing
• Matching rules that use terms from the 2 source domains


Other Basic Operations

DIFFERENCE: material fully under local control
UNION: merging entire ontologies
Articulation ontology: typically prior intersections


Features of an algebra

• Operations can be composed
• Operations can be rearranged
• Alternate arrangements can be evaluated
• Optimization is enabled
• The record of past operations can be kept and reused when sources change


Knowledge Composition

[Figure: composition of knowledge resources A, B, C, D, E]
Legend: ∪ = union, ∩ = intersection
• Articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ E), and (C ∩ D)
• Composed knowledge for applications using A, B, C, E:
  (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E)


Primitive Operations

Model and Instance

Unary
• Summarize - abstract
• Glossarize - list terms
• Filter - reduce instances
• Extract - move into context

Binary
• Match - data corroboration
• Difference - distance measure
• Intersect - use of articulation
• Union - search broadening

Constructors
• create object
• create set

Connectors
• match object
• match set

Editors
• insert value
• edit value
• move value
• delete value

Converters
• object - value
• object indirection
• reference indirection
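
The three algebra operations named on the Ontology Algebra slides (intersection, union, difference) can be illustrated with a minimal sketch. It assumes an ontology is just a flat set of terms and an articulation rule is just a pair of matched terms; the slides describe graph-structured ontologies and richer matching rules, so this is a simplification and not the SKC implementation. The class, the function names, and the Store and Factory terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    """A toy ontology: a named set of terms (relationships are not modeled)."""
    name: str
    terms: frozenset


def intersection(a, b, rules):
    """Articulation ontology: rule pairs whose terms exist in both sources."""
    return frozenset((s, t) for s, t in rules if s in a.terms and t in b.terms)


def difference(a, articulation, side):
    """Terms of `a` not linked by the articulation (side 0 = first source)."""
    linked = {pair[side] for pair in articulation}
    return a.terms - linked


def union(a, b, articulation):
    """Joint ontology: all terms, with each linked b-term merged into its a-term."""
    b_to_a = {t: s for s, t in articulation}
    return a.terms | frozenset(b_to_a.get(t, t) for t in b.terms)


# Hypothetical example data; the slides name the domains but not their terms.
store = Ontology("Store", frozenset({"shoe", "price", "shelf"}))
factory = Ontology("Factory", frozenset({"footwear", "cost", "lathe"}))
# Expert-approved matches; the last rule applies to neither source and is ignored.
rules = [("shoe", "footwear"), ("price", "cost"), ("receipt", "invoice")]

articulation = intersection(store, factory, rules)
store_only = difference(store, articulation, side=0)
factory_only = difference(factory, articulation, side=1)
joint = union(store, factory, articulation)
```

In this sketch the intersection keeps only the linked term pairs, which mirrors the slides' point that composed ontologies are limited to their articulation terms, while the difference leaves each source with the material that stays fully under local control.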


Exploiting the result

Result has links to source
Avoid the n² problem of interpreter mapping [Swartout HPKB year 1]
Processing and query evaluation are best performed within the source domains and by their engines


Domain Specialization

• Knowledge Acquisition (20% effort) and
• Knowledge Maintenance (80% effort*)
to be performed by
• Domain specialists
• Professional organizations
• Field teams of modest size

Autonomously maintainable
Empowerment

* based on experience with software


Summary

To sustain the trend
1. The value of the results has to keep increasing: precision, relevance, not volume
2. Value is provided by experts, encoded as models of diverse resources, customers

Problems to be addressed
• mismatches
• quality
• clear models
• temporal extensions
• maintenance

Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.