7/26/2016
Scalable Knowledge Composition
November 1997
Jan Jannink, Danladi Verheijen, Gio Wiederhold
Stanford University
An abstract concept is like a valise with a false bottom; you may put in what you please, and take them out again, without being observed.
Alexis de Tocqueville, Democracy in America, 1838.
Gio Wiederhold SKC 1
• Goal: Reliable answers using heterogeneous data sources
– General sources: factbook ‘96, UN
– Topical sources: EIA, OECD, OPEC
• Approach: Bottom up from data
– Python scripts implement rule-based operations on source data to answer challenge problems
• Theory: Rule-based algebra
– Mapping primitives & Intersection operation
• We extract portions of ontology implicit in sites
– factbook ‘96: www.odci.gov/cia
– UN: www.un.org & www.globalpolicy.org
– EIA: www.eia.doe.gov
– OECD: www.oecd.org
– OPEC: www.opec.org
• What is the most recent year an OPEC member nation was on the UN Security Council?
– Related to CP # 72
– Sources
» factbook ‘96 (nation)
» OPEC (members, dates)
» UN (SC members, years)
– Correct Answer
» 1996 (Indonesia)
– Problems
» different country names *
• Gambia => The Gambia
» historical country names *
• Yugoslavia
» factbook has out of date OPEC & UN SC lists *
• Gabon (left OPEC 1994)
» UN lists future security council members
• Gabon 1998
» intent of original question
• Temporal variants
Source: OPEC Pages (members, dates)
Iran                                         1960
Iraq                                         1960
Kuwait                                       1960
Saudi_Arabia                                 1960
Venezuela                                    1960
Qatar                                        1961
Indonesia                                    1962
Socialist_Peoples_Libyan_Arab_Jamahiriya *   1962
United_Arab_Emirates                         1967
Algeria                                      1969
Nigeria                                      1971
Ecuador                                      1973  1992
Gabon *                                      1975  1994

Source: UN Pages (SC members, years)
Bahrain        1999  1998
Brazil         1999  1998
Gabon *        1999  1998
Gambia *       1999  1998
Slovenia       1999  1998
Costa_Rica     1998  1997
…
Indonesia      1996  1995
…
Yugoslavia *   1989  1988

* Problems handled using SKC articulation rules
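The challenge problem can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual scripts: the dictionaries copy a subset of the OPEC and UN entries listed above (pairing each UN entry with its years is an assumption about the slide layout), and the names `ALIASES`, `canonical`, `was_opec_member`, and `most_recent_opec_on_council` are hypothetical. The alias table stands in for an articulation rule such as Gambia => The Gambia, and the `as_of` cutoff discards future council terms that the UN pages list.

```python
# Hypothetical sketch of the OPEC / UN Security Council challenge problem.
# Data: a subset of the entries on the OPEC and UN pages (assumed pairing).

# OPEC membership as (year joined, year left); None = still a member.
OPEC = {
    "Iran": (1960, None),
    "Indonesia": (1962, None),
    "Ecuador": (1973, 1992),
    "Gabon": (1975, 1994),
}

# UN Security Council membership years, keyed by the name the UN pages use.
UN_SC = {
    "Gabon": [1999, 1998],
    "Gambia": [1999, 1998],
    "Indonesia": [1996, 1995],
    "Yugoslavia": [1989, 1988],
}

# Articulation rule for differing country names (Gambia => The Gambia).
ALIASES = {"Gambia": "The_Gambia"}


def canonical(name):
    """Map a source-specific country name to the shared name."""
    return ALIASES.get(name, name)


def was_opec_member(name, year):
    """True if `name` was in OPEC in `year` (departure year excluded, by assumption)."""
    joined, left = OPEC[name]
    return joined <= year and (left is None or year < left)


def most_recent_opec_on_council(as_of):
    """Latest (year, nation) with the nation in OPEC and on the council, up to `as_of`."""
    hits = [
        (year, canonical(name))
        for name, years in UN_SC.items()
        if canonical(name) in OPEC
        for year in years
        if year <= as_of and was_opec_member(canonical(name), year)
    ]
    return max(hits, default=None)


print(most_recent_opec_on_council(1997))  # (1996, 'Indonesia')
```

Gabon is rejected on two grounds in this sketch: its council term lies after the 1997 cutoff, and it had left OPEC in 1994.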
• Experience w/ real data confirming validity of our approach
– Expert sources are better maintained than general sources
– We generate successive approximations with increasing levels of confidence
• Manual processing of sources is our first step in providing an algebra that truly accounts for the complexity of real data sources
Ontologies list the terms and their relationships that allow communication among partners in enterprises (in machine-readable form).
Relationships determine meaning: parent, school, company.
Databases use ontologies (implicitly) during design in their E-R diagrams, and represent the leaf nodes in their schemas.
Knowledge bases use ontologies (often implicitly), and add class definitions (to hold instances), constraints, and operations among the terms.
• Define Terms used in System Construction to enable Correctness in Understanding
– system = designers, implementors, users, maintainers
– designers = implementors = users = maintainers
• Define Higher-level Abstractions needed to communicate in larger contexts
– managers, decision-makers, systems in own, other domains
• Share the Cost of Knowledge Acquisition & Maintenance
– reuse encoded knowledge, remain up-to-date as domains change
Lexicons: collect terms used in information systems
Taxonomies: categorize, abstract, classify terms
Schemas of databases: attributes, ranges filed
Data dictionaries: integration of files, attributes
Object libraries: grouped attributes, methods
Symbol tables: collect terms used in a program
Domain object models: re-engineering terms
. . .
More Knowledge
Top-down:
– Commonly acceptable UPPER layers
Domain-specific:
– Sharing tools
– Object based
Bottom-up:
– Pragmatic, TASK-specific collections
– Database schemas and models
Implicit ontologies are a prerequisite for communication among humans and organizations.
Knowledge is explicitly represented in AI systems; sometimes the ontology is explicit as well.
Database schemas are partial explicit ontologies:
• Relational schemas contain only terms & 1:1 dependencies.
• E-R designs contain 1:n, m:n cardinalities.
• Structural schemas contain semantic dependency types.
Conceptual graphs define terms of discourse and a modest number of relationship types.
Variables in software represent ontologies poorly.
Three Alternatives
• Create a committee to define everybody’s terms
– Takes many years, until people are worn out
– Ignored when changes make deviation necessary
• Get all terms and put them into large model [ Cyc, UMLS, Federated Schemas, . . . ]
– Can be rapid
– Provides broad integration
– Ignores conflicts
– Hard to maintain (requires committee)
• Keep all Terms distinct, except where sharing
– Requires initial effort
– Complex system view
– Empowers participants
– Scalable with many participants
Provide for Maintainable Ontologies
• devolve maintenance onto many domain-specific experts / authorities
• provide an algebra (SKC) to compute composed ontologies that are limited to their articulation terms
• enable interpretation within the source contexts
• Ontology:
a set of terms and their relationships
• Term:
a reference to real-world and abstract objects
• Relationship:
a named and typed set of links between objects
• Reference:
a label that names objects
• Real-world object:
an entity instance with a physical manifestation
• Abstract object:
a concept which refers to other objects
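These definitions map naturally onto a small data structure. The sketch below is one possible encoding, assuming Python dataclasses; the class and field names (`Term`, `Relationship`, `Ontology`, `links`) and the shoe example are illustrative choices, not part of SKC.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A reference (label) to a real-world or abstract object."""
    label: str
    abstract: bool = False  # True for a concept that refers to other objects


@dataclass
class Relationship:
    """A named and typed set of links between objects."""
    name: str
    kind: str  # e.g. "IS-A" or "Part-of"
    links: set = field(default_factory=set)  # (source Term, target Term) pairs

    def link(self, source, target):
        self.links.add((source, target))


@dataclass
class Ontology:
    """A set of terms and their relationships."""
    terms: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def related(self, term, kind):
        """Terms directly linked from `term` by relationships of the given kind."""
        return {
            target
            for rel in self.relationships
            if rel.kind == kind
            for source, target in rel.links
            if source == term
        }


# Toy example: "shoe" IS-A "footwear".
shoe = Term("shoe")
footwear = Term("footwear", abstract=True)
is_a = Relationship("shoe-is-footwear", "IS-A")
is_a.link(shoe, footwear)
store = Ontology(terms={shoe, footwear}, relationships=[is_a])
print(store.related(shoe, "IS-A"))
```

Terms are frozen so that they can be placed in sets and used as link endpoints.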
• Object oriented class hierarchies (snapshots of executing programs capture object instances)
• Database schemas (via their E-R or structural models)
• Semi-structured databases (OEM
• Definitional thesauri (UMLS: see http://www.lexical.com)
• Knowledge bases (CYC, Ontolingua)
SKC specifically does not restrict its applicability to a purely extensional (object) or intensional (schema) definition of ontology, since its purpose is to support useful processing of extensions using intensional knowledge for all parties. To that end it is important that the intensional specifications include predicates or methods that permit the collection of extensional access to real-world objects. We do not require ontologies to be complete specifications of a domain, but rather that usage of an ontology provide results complete with respect to the ontology.
• The mapping of terms to objects differs between autonomous domains.
• The collections of real-world objects provide a grounding for the definitions, and an opportunity for validation of the meaning of the terms being employed.
• Relationships have semantic and, derived from that, structural significance. Multiple relationship types, such as IS-A, Ownership, Part-of, and Reference, may share structural characteristics.
• We will keep the number of primitive relationships limited.
• The mapping of relationship types differs between autonomous domains.
• a domain will contain many objects
• the object configuration is consistent
• within a domain all terms are consistent & relationships among objects are consistent
Domain Ontology
• context is implicit
No committee is needed to forge compromises * within a domain
* Compromises hide valuable details
If interoperation involves distinct domains, mismatch ensues
• Autonomy conflicts with consistency
– Local Needs have Priority
– Outside uses are a Byproduct
Heterogeneity must be addressed
• Platform and Operating Systems ✔ ✔
• Representation and Access Conventions ✔
• Naming and Ontology :
A knowledge-based algebra for ontologies
Intersection: create a subset ontology, keep sharable entries
Union: create a joint ontology, merge entries
Difference: create a distinct ontology, remove shared entries
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
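To make the three operations concrete, here is a minimal sketch that treats an ontology as a set of terms and an articulation ontology as a set of matching rules. Everything in it is an assumption made for illustration: the rule format (pairs of matching terms), the function names, and the toy store and factory terms. The slides do not specify an implementation.

```python
# Hypothetical sketch: ontologies as sets of terms, articulation rules as
# (term_in_a, term_in_b) pairs that declare two terms to match.

def intersection(a, b, rules):
    """Subset ontology: keep only entries that a matching rule makes sharable."""
    return {(ta, tb) for ta, tb in rules if ta in a and tb in b}


def union(a, b, rules):
    """Joint ontology: all entries of both sources, with matched entries merged."""
    shared = intersection(a, b, rules)
    matched_a = {ta for ta, _ in shared}
    matched_b = {tb for _, tb in shared}
    only_a = {(t, None) for t in a - matched_a}
    only_b = {(None, t) for t in b - matched_b}
    return shared | only_a | only_b


def difference(a, b, rules):
    """Distinct ontology: entries of `a` with the shared entries removed."""
    return a - {ta for ta, _ in intersection(a, b, rules)}


store = {"size", "color", "style", "customer"}
factory = {"size", "colcode", "style", "machinery"}
rules = {("size", "size"), ("color", "colcode"), ("style", "style")}

print(sorted(intersection(store, factory, rules)))
# [('color', 'colcode'), ('size', 'size'), ('style', 'style')]
print(sorted(difference(store, factory, rules)))
# ['customer']
```

In this encoding the intersection is driven entirely by the rules, so terms that happen to share a spelling are not matched unless a rule says so.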
• Operations can be composed
• Operations can be rearranged
• Alternate arrangements can be evaluated
• Optimization is enabled
• The record of past operations can be kept and reused
Result contains shared terms
Terms useful for purchasing
Source Domain 1: Owned and maintained by Store
Source Domain 2: Owned and maintained by Factory
Articulation ontology: matching rules that use terms from the 2 source domains
Terms useful for purchasing
Store Ontology
Factory Ontology
Articulation ontology matching rules:
size = size
color = table(colcode)
style = style
Anatomy { . . . }
Shoe Store
• Shoes { . . . }
• Customers { . . . }
• Employees { . . . }
Shoe Factory
• Material inventory { . . . }
• Employees { . . . }
• Machinery { . . . }
• Processes { . . . }
• Shoes { . . . }
Hardware
foot = foot
Employees Employees
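The matching rules on this slide can be read as small translation functions between the two sources. The sketch below is an illustration under stated assumptions: the attribute names on the factory side, the contents of `COLOR_TABLE`, and the function `to_store_terms` are invented here; only the rule shapes (size = size, color = table(colcode), style = style) come from the slide.

```python
# Hypothetical lookup table behind the rule color = table(colcode).
COLOR_TABLE = {"01": "black", "02": "brown", "03": "white"}

# One articulation rule per shared store term, as a function of a factory record.
RULES = {
    "size": lambda rec: rec["size"],                    # size = size
    "color": lambda rec: COLOR_TABLE[rec["colcode"]],   # color = table(colcode)
    "style": lambda rec: rec["style"],                  # style = style
}


def to_store_terms(factory_record):
    """Express a factory shoe record using only the articulated store terms."""
    return {term: rule(factory_record) for term, rule in RULES.items()}


factory_shoe = {"size": 9, "colcode": "02", "style": "loafer", "machine": "M7"}
print(to_store_terms(factory_shoe))
# {'size': 9, 'color': 'brown', 'style': 'loafer'}
```

Factory-only attributes such as the hypothetical `machine` field never reach the articulation, so they stay under the factory's local control.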
UNION: merging entire ontologies
Articulation ontology
DIFFERENCE: material fully under local control
typically prior intersections
Legend: ∪ : union, ∩ : intersection
Composed knowledge for applications using A, B, C, E
Articulation knowledge for pairs of resources: (A, B), (B, C), (C, D), (C, E)
Knowledge resources A, B, C, D, E
Result has links to source
Processing and evaluation is best performed within Source Domains
• No need to harmonize full ontologies
• Focus on what is critical for interoperation
• Rules specific for articulation
• Potentially many sets of articulation rules
• Maintenance is distributed
– to n sources
– to m articulation agents
Is m < n², depending on architecture density? A research question.
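As a rough illustration of how m can depend on architecture density, the sketch below counts articulations under the assumption, made here and not stated in the slides, that each articulation agent links exactly two sources. The three topologies and the function name `articulation_count` are hypothetical.

```python
from itertools import combinations


def articulation_count(n, topology):
    """Number of pairwise articulations m among n sources (assumed topologies)."""
    if topology == "chain":    # each source linked to its neighbor
        pairs = [(i, i + 1) for i in range(n - 1)]
    elif topology == "star":   # every source linked to source 0
        pairs = [(0, i) for i in range(1, n)]
    elif topology == "full":   # every pair of sources linked
        pairs = list(combinations(range(n), 2))
    else:
        raise ValueError(f"unknown topology: {topology}")
    return len(pairs)


for n in (5, 10):
    counts = {t: articulation_count(n, t) for t in ("chain", "star", "full")}
    print(n, counts, "n^2 =", n * n)
```

Under this pairwise assumption even the densest arrangement needs n(n-1)/2 articulations, which stays below n²; sparse arrangements need far fewer. Articulations that span more than two sources would change the count.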
• Knowledge Acquisition (20% effort) &
• Knowledge Maintenance (80% effort *)
• Performed by:
– Domain specialists
– Professional organizations
– Modest sized field teams
autonomously maintainable
* based on software maintenance experience
• Algebra enables Interoperation by
– dealing explicitly with differences by knowledge
– identifying maintenance domains
– keeping sources autonomous
• Assumes domain has a common ontology
– composing domain ontologies requires the algebra to manage the linkages where articulation occurs
– processes are best executed within the domains
• Articulation knowledge is distributed
– allows specialists to work independently
– supports multiple intersections and views
• Maintenance is structured and partitioned