Document 11036119

advertisement
'>'^"'^*4-X
(^
'
:
HD28
.M414
Context Interchange: Representing
and Reasoning about Data Semantics
in Heterogeneous Systems
Cheng Hian Goh, Stephane
Bressan,
Stuart E. Madnick, Michael D. Siegel
Sloan
CISL WP# 96-06
December 1996
WP# 3928
The Sloan School of Management
Massachusetts Institute of Technology
02142
Cambridge,
MA
*^'—
w
.'i^y
Context Interchange: Representing and Reasoning about Data
Semantics
Cheng Hian Goh^
Heterogeneous Systems*
in
Stuart E. Madnick
Stephane Bressan
Michael D. Siegel
Sloan School of Management
Massachusetts Institute of Technology
Email: {chgoh,sbressan,smadnick,msiegel}@mit.edu
December
1996
5,
Abstract
The context iNterchange (coin)
strategy [44; 48] presents a novel perspective for medi-
ated data access in which semantic conflicts
a
priori,
among heterogeneous systems
are not identified
but are detected and reconciled by a Context Mediator through comparison of con-
texts associated
with any two systems engaged
in
data exchajige. In this paper, we present
a formal characterization and reconstruction of this strategy in a COIN framework, based on
a deductive object-oriented data model and language called COIN. The COIN framework provides a logical formalism for representing data semantics in distinct contexts.
this presents
a well-founded basis
systems. In addition,
in defining
it
for
made
that
reasoning about semantic disparities in heterogeneous
combines the best features of
an integration strategy that
features are
We show
is
loose-
and
scalable, extensible
tight- coupling
and
accessible.
approaches
These
latter
possible by allowing complexity of the system to be harnessed in small
chunks, by enabling sources and receivers to remaiin loosely-coupled to one another, and by
sustaining an infrastructure for data integration.
Keywords: Context, heterogeneous
databfises, logic
and databases, mediators, semantic
interoperability.
'This work is supported in part by ARPA and USAF/Rorae Laboratory under contra£t F30602-93-C-0160, the
InternationaJ Financial Services Research Center (IFSRC), cind the PROductivity From Information Technology
(PROFIT)
project at
MIT.
'Financial support from the National University of Singapore
is
gratefully acknowledged.
Introduction
1
The last few
years have witnessed an unprecedented growth in the
(traditional database systems, data feeds, web-sites,
tions that
make data
requests). This
of access to the Internet, which
and the
rise of
new
is
users,
data warehouses, and applica-
spurred on by a number of factors: for example, ease
is
emerging as the de facto global information infrastructure,
organizational forms
adhocracies and networked organizations
(e.g.,
which mandated new ways of sharing and managing information. The advances
and telecommunications technologies have
to
information meaningfully). This problem
in
[36]),
networking
led to increased physical connectivity (the ability
exchange bits and bytes), but not necessarily
interoperability [47]
sources
and applications providing structured or
(human
semi-structured data on requests) and receivers
number of information
is
among autonomous and
logical connectivity (the ability to
exchange
traditionally referred to as the need for semantic
heterogeneous systems.
Context Interchange: Background
1.1
The
goal of this paper
for achieving
is
to describe a novel approach, called Context Interchange (coin),
semantic interoperability
strategy described in this paper drew
Specifically,
we share the basic
• the detection
and
among heterogeneous
its
sources and receivers.
The COIN
inspiration from earlier work reported in [48; 44].
tenets that
reconciliation of semantic conflicts axe system services
which are pro-
vided by a Context Mediator, and should be transparent to a user; and
mediation service requires only that the user furnish a logical
• the provision of such a
(declarative) specification of
conflicts,
when
how
data are interpreted in sources and receivers, and
detected, should be resolved,
but not what conflicts
exists a priori
how
between
any two systems.
These insights axe novel because they depaxt from
classical integration strategies
which
ei-
ther require users to engage in the detection and reconciliation of conflicts (in the case of
loosely-coupled systems; e.g.,
MRDSM
be identified and reconciled, a
schemas
priori,
[33],
VIP-MDBMS
Multibase
Unfortunately, as interesting as these ideas
may
in the absence of a formal conceptual foundation.
augmented with a
is
or insist that conflicts should
by some system administrator,
(as in tightly- coupled systems; e.g.,
basis for Context Interchange
[29]),
[30],
Mermaid
be, they could
One attempt
the semantic-value model
property-list which defines
its
context.
[44],
in
one or more shared
[49]).
remain as vague musings
at identifying a conceptual
where each data element
is
This model, however, continues to
be fraught with ambiguity. For example,
for different attributes are, as well as
kinds of conflicts, and
distinct contexts.
is
silent
on how
it
on implicit agreement on what the modifiers
relied
what conversion functions axe applicable
different conversion definitions
for different
can be associated with
Defining the semantics of data through annotations attached to individual
data elements tend also to be cumbersome, and there
is
no systematic way of promoting the
At the same time, the representational
sharing and reusing of the semantic representations.
formalism remains somewhat detached from the underlying conflict detection algorithm (the
subsumption algorithm
to
[48]).
Among
be done on a pairwise basis
and
time),
on the
is
(i.e.,
by comparing the context definitions
non-committal on how a query plan
sets of pair-wise conflicts.
single-level
other problems, this algorithm requires conflict detection
(i.e.,
property
lists
for multiple sources
for
two systems at a
can be constructed based
Furthermore, the algorithm limits meta-attributes to only a
cannot be nested), and are not able to take advantage of known
constraints for pruning off conflicts which are guaranteed never to occur.
Summary
1.2
We have two
for the
of Contributions
(intertwined) objectives in this paper. First,
Context Interchange strategy that
will not
we aim
to provide
a formal foundation
only rectify the problems described earlier,
but also provide for an integration of the underlying representational and reasoning formalisms.
Second, the deconstruction and subsequent reconstruction^ of the Context Interchange approach
described in
[48; 44]
provides us with the opportunity to address the concern for integration
strategies that are scalable, extensible
Our formal characterization
and
accessible.
of the Context Interchange strategy takes the form of a COIN
framework, based on the COIN data model, which
object-oriented model called Gulog^
[15].
coin
is
among data
translated to a normal program
is
well-defined,
there
(i.e.,
is
and
coin provide us with a well-founded basis
disparities that exist
a customized subset of the deductive
a "logical" data model in the sense that
uses logic as a formalism for representing knowledge
features of
is
for expressing operations.
for
logical
making inferences about semantic
in different contexts. In particular, a
[34] (equivalently,
The
it
coin framework can be
a Datalog^^^ program) for which the semantics
and where sound computational procedures
no real distinction between factual statements
for
(i.e.,
query answering
data
in sources)
exist.
Since
and knowledge
statements encoding data semantics) in this logical framework, both queries on data
sources [data-level queries) as well as queries on data semantics [knowledge-level queries) can
In this dixonstmction, we tease apcirt different elements of the Context Interchange strategy with the gocil
of understanding their contributions individually and collectively. The reMnstntction examines how the same
features (and more) can be accomplished differently within the formalism
^Gulog
is itself
a variant of F-logic
[28).
we have invented.
be processed
we
in
an
identical
manner.
As an
investigate the adoption of an abductive
alternative to the classic deductive framework,
framework
[27] for
query processing. Interestingly,
although abduction and deduction are "mirror-images" of each other
computed using a simple extension
[14],
the abductive answers,
to classic SLD-resolution leads to intensional answers as
opposed to extensional answers that would be obtained via deduction. Intensional answers are
useful in our
query
framework
for
a number of conceptual and practical reasons. In particular,
if
the
issued by a "naive" user under the assumption that there are no conflicts whatsoever,
is
the intensional answer obtained can be interpreted as the corresponding mediated query in which
database accesses are interleaved with data transformations required for mediating potential
conflicts. Finally,
constraints,
by checking the consistency of the abducted answers against known
we show that the abducted answer can be
connection to what
As much
cause
it
as
it
is
traditionally
known
greatly simplified, demonstrating a clear
as semantic query optimization
a logical data model, COIN
is
is
also
object- identity, type- hierarchy, inheritance, and overriding)
The standard
many
commonly
of the features
(e.g.,
associated with object-
use of abstraction, inheritance, as well as structural and behavioral
many
inheritance [28] present
[7].
an "object-oriented" data model be-
adopts an "object-centric" view of the world and supports
orientation.
integrity
opportunities for sharing and reuse of semantic encodings. Con-
version functions (for transforming the representation of data between contexts) can be modeled
as methods attached to types in a natural fashion.
formalisms,
we make some adjustments
we introduce the notion of
common
sical
for
our problem domain. In partic-
context-objects, described in [37], as reified representations for
reference point and
is
instrumental in providing a structuring mechanism for
theories which
making inferences across multiple
The
model by distinguishing between
about particular contexts. This allows context knowledge to be defined
collections of statements
with a
to the structure of our
which have particular significance
different kinds of objects
ular,
Unlike "general purpose" object-oriented
may be mutually
inconsistent.
reconstruction of the Context Interchange strategy allows us to go beyond the clas-
concern of "non-intrusion", and provides a formulation that
accessible [21].
By
scalability,
we
is
scalable, extensible
require that the complexity of creating
and administering
(maintaining) the interoperation services should not increase exponentially with the
participating sources
and
terms of
its
local
and
The above concerns
of
changes should not have adverse effects on other parts
Finally, accessibility refers to
ease-of-use
number
receivers. Extensibility refers to the ability to incorporate changes in
a graceful manner; in particular,
of the larger system.
and
flexibility in
how the system
is
perceived by a user in
supporting different kinds of queries.
are addressed in two ways in the reconstructed Context Interchange
strategy^.
Provisions for msiking sources more accessible to users
is
accomplished by shifting
the burden for conflict detection and mediation to the system; supporting multiple paradigms
data access by supporting queries formulated directly on sources as well as queries medi-
for
ated by views; by making knowledge of disparate semantics accessible to users by supporting
knowledge- level queries and answering with intensional answers; and by providing feedback in
the form of mediated queries.
distinction
Scalability
and
extensibility are addressed
between the representation of data semantics as
and the detection of potential
conflicts that
may
arise
is
known
when data
by maintaining the
in individual contexts,
are exchanged between two
systems; by the structural arrangement that allow data semantics to be specified with reference
to
complex object types
domain model
in the
as opposed to annotations tightly-coupled in the
individual database schemas; by allowing multiple systems with distinct schemas to bind to the
same contexts; by the judicious use of object-oriented
overriding in the type system present in the
domain model; and by sustaining an
ture for data integration that combines these features.
infrastructure,
we introduce a
of context axioms to
features, in particular, inheritance
As a special
meta-logical extension of the
be "objectified" and placed
effort in
infrastruc-
providing such an
COIN framework which allows
in a hierarchy,
[5].
This mechanism,
coupled with type inheritance, constitutes a powerful approach for incorporating changes
Finally,
in
new system,
or changes to the
we remark that the
feasibility
domain model)
sets
such that new and more com-
plex contexts can be derived tlirough a hierarchical composition operator
the addition of a
and
in
(e.g.,
a graceful manner.
and features of this approach have been demonstrated
a prototype implementation which provides mediated access to traditional database systems
(e.g.,
Oracle databases) as well as semi-structured data
Web-sites)^.
Organization of this Paper
1.3
The
rest of this
tional
We
(e.g..
paper
example which
is
is
organized as follows. Following this introduction, we present a motivaused to highlight selected features of the Context Interchange strategy.
demonstrate in Section 3 the novelty of our approach by comparing
other classical and contemporary approaches.
of the
COIN data model, which forms the basis
it
with a variety of
Section 4 describes the structure and language
for the
formal characterization of the Context
Interchange strategy in a COIN framework. Section 5 introduces the abduction fram,ework and
illustrates
how abductive
in particular,
inference
is
used as the basis for obtaining intensional answers, and
mediated answers corresponding to naive queries
For the sake of brevity, further references
made
(i.e.,
those formulated while
to the Context Interchange strategy from this point on refers
to this recon.struction unless otherwise specified.
This prototype
request.
is
accessible from
any WWW-client
(e.g.,
Netscape Browser) and can be demonstrated upon
assuming that sources are homogeneous). Section 6 describes a meta-logical extension
COIN framework which allows changes
scribes the prototype system
be incorporated
to
to the
manner. Section 7 de-
in a graceful
which provides integrated access to a variety of heterogeneous
sources while adopting the framework which
we have
described.
We
conclude
in Section 8
with
a number of suggestions for further work.
Motivational Example
2
Consider the scenario shown
on "revenue" and "expenses"
in
Figure
1,
deliberately kept simple for didactical reasons.
(respectively) for
some
companies are available
collection of
two autonomously-administered data sources, each comprised of a
a user
is
interested in
knowing which companies have been
Data
single relation^.
and
"profitable"
in
Suppose
their respective
revenue: this query can be formulated directly on the (export) schemas of the two sources as
follows^:
SELECT rl.cname, rl .revenue FROM rl
Ql:
,
r2
WHERE r 1 cname = r2.cname AND rl .revenue > r2. expenses;
.
In the absence of any mediation, this query will return the
the extensional data set shown in Figure
The above
empty answer
if it is
executed over
1.
query, however, does not take into account the fact that data sources
istered independently
and have
different contexts:
i.e.,
they
may embody
different
cire
admin-
assumptions
on how information contained therein should be interpreted. To simplify the ensuing discussion,
we assume that the data reported
scale-factors of
"company
financials"
in the
(i.e.,
two sources
using the currency shown and a scale- factor of
Japanese Yen (JPY);
reports
all
"company
in
1;
USD using a
1, all
"company
the only exception
which case the scale-factor
financials" in
only in the currencies and
financial figures pertaining to the companies,
include revenue and expenses). Specifically, in Source
in
differ
is
1000.
scale-factor of
1.
is
which
financials" are reported
when they
Source
2,
are reported
on the other hand,
In the light of these remarks,
we make the assumption that the relationcil data model is adopted to be the canonical
assume
that the database schemas exported by the sources are relational and that queries
i.e., we
are formulated using SQL (or some extension thereof). This simplifies the discussion by allowing us to focus on
semantic conflicts in disparate systems without being detracted by conflicts over data model constructs. The
choice of the relational data model is one of convenience rather than necessity, and is not to be construed as a
'Throughout
data model [47]:
this paper,
constraint of the integration strategy being proposed.
®We assume, without loss of generality, that relation ncimes are unique across all data sources. This can
always be accomplished via some renaming scheme: say, by prefixing relation name with the name of the data
source
(e.g.,
<ibl#rl).
Context c.
Context
c,
2
All "company ftnancials " are reported in
USD,
using a scale-factor of 1
Alt "company ftnancials " ("revenue " inclusive) are reported
in the currency
shown for each company
Alt "company ftnancials" are reported using
Query
a scale-factor of 1
except for items reported in JPY, where the
1000
scale-factor is
select rl. cname, rl. revenue
from rl, r2
where rl cname = r2 cname
and rl revenue > r2. expenses
.
.
rl
cname
revenue
IBM
1
NTT
1
currency
USD
JPY
000 000
000 000
r2
cname
Source
I
.
User
necessarily being precise at all times; formal definitions of the underlying representational and
inference formalisms are postponed to Sections 4,
5,
and 6 of
this paper.
A
Functional Features of Context Interchange:
2.1
User Perspective
Unlike classical and contemporary approaches, the Context Interchange approach provides users
with a wide array of options on how and what queries can be asked and the kinds of answers
which can be returned. These features work
in
tandem
to allow greater flexibility
and
effective-
ness in gaining access to information present in multiple, heterogeneous systems.
Query Mediation: Automatic Detection and Reconciliation of
In a Context Interchange system, the
mediator
among
[54],
sites
and access
Conflicts
same query (Ql) can be submitted
to a specialized
which rewrites the query so that data exchange
called a Context Mediator,
having disparate contexts axe interleaved with appropriate data transformations
to ancillary
We
data sources (when needed).
refer to this
transformation as query
mediation and the resulting query as the corresponding mediated query.
For example, the mediated query
MQl:
MQl
corresponding to Ql
SELECT rl.cname, rl .revenue FROM rl
WHERE rl. currency
=
'USD' AND
is
given by:
r2
,
rl.cname
=
r2.cnaine
AND rl. revenue > r2. expenses;
UNION
SELECT rl.cname, rl. revenue
WHERE rl. currency
AND rS.fromCur
AND rl. revenue
=
=
*
1000
= r2.cname
rS.toCur = 'USD'
'JPY' AND rl.cname
rl. currency AND
1000
*
r3. rate FROM rl, r2, r3
*
*
rS.rate > r2. expenses
UNION
SELECT rl.cname, rl. revenue
WHERE rl. currency
AND r3.fromCur
AND rl.cname
=
=
<>
*
rl. currency
'USD' AND
rl .currency AND rS.toCur
r2.cname AND rl.revenue
This mediated query considers
all
,
r2, r3
<>
'JPY'
rS.rate FROM rl
=
* r3.
'USD'
rate
>
r2. expenses;
potential conflicts between relations rl and r2
when com-
paring values of "revenue" and "expenses" as reported in the two different contexts. Moreover,
the answers returned
receiver.
1
Thus
in
000 000. More
may be
further transformed so that they conform to the context of the
our example, the revenue of NTT
specifically, the three-part
will
be reported as 9 600 000 as opposed to
query shown above can be understood as follows.
8
The
1;
first
subquery takes care of tuples
in this case, there
is
no
for
which revenue
reported in USD using scale-factor
is
The second subquery handles
conflict.
tuples for which revenue
is
reported in JPY, implying a scale-factor of 1000. Finally, the last subquery considers the case
where the currency
Conversion
among
is
neither JPY nor USD, in which case only currency conversion
different currencies
is
is
needed.
aided by the ancillary data source r3 which provides
currency conversion rates.
This second query, when executed, returns the "correct" answer
consisting only of the tuple
< 'NTT'
Support
for
9 600 000>.
,
Views
In the preceding example, the query
various sources.
While
formulated directly on the export schema for the
is
this provides a great deal of flexibility,
what data are present where and be
(so as to construct
Ql
a query).
A
sufficiently familiar
it
also requires users to
know
with the attributes in different schemas
simple and yet effective solution to these problems
is
to allow
views to be defined on the source schemas and have users fornmlate queries based on the view
instead. For example,
we might
define a view
on
relations rl
and r2, given by
CREATE VIEW vl (cname, profit) AS
SELECT rl. cname, rl. revenue
-
r2. expenses
FROM rl, r2
WHERE rl. cname = r2. cname;
In which case, query
VQl:
Ql can be equivalently formulated on the view vl
as
SELECT cname, profit FROM vl
WHERE profit > 0;
While achieving
essentially the
same
functionalities as tightly-coupled systems, notice that
view definitions in our case are no longer concerned with semantic heterogeneity and mcike no
attempts at identifying or resolving
conflicts.
In fact, any query on a view (say,
can be trivially rewritten to a query on the source schema
(e.g.,
VQl on
vl)
Ql). This means that query
mediation can be undertaken by the Context Mediator as before.
Knowledge-Level versus Data-Level Queries
Instead of inquiring about stored data,
it is
sometimes useful
of data which are implicit in different systems.
to
be able to query the semantics
Consider, for instance, the query based on a
superset of SQL*^:
Sciore et
aJ.
[43]
have described a similar (but not identical) extension of SQL in which context is treated
We are not concern with the exact syntax of such a language here; the issue at hand is
as a "first-class object".
SELECT rl.cname, rl .revenue. scaleFactor IN cl,
Q2:
rl. revenue. scaleFactor IN c2 FROM rl
WHERE rl .revenue. scaleFactor IN cl <> rl .revenue. scaleFactor IN c2;
Intuitively, this
(in
query asks for companies
for
which
scale- factors for reporting "revenue" in rl
context cl) differ from that which the user assumes (in context c2).
such as
Q2
as knowledge-level queries, as
We
refer to queries
opposed to data-level queries which are enquires on
factual data present in data sources. Knowledge-level queries have received little attention in
the database literature and certainly have not been addressed by the data integration
This, in our opinion,
nity.
arises primarily
If so,
a significant gap since heterogeneity in disparate data sources
from incompatible assumptions about how data are interpreted. Our
to integrate access to
differences
is
commu-
among
ability
both data and semantics can be exploited by users to gain insights into
particular systems ("Do sources
how?"), or by a query optimizer which
A and B
may want
report a piece of data differently?
to identify sites with
minimal conflicting
interpretations (to minimize costs associated with data transformations).
Interestingly, knowledge-level queries
anism
for
can be answered using the exact same inference mech-
mediating data-level queries. Hence, submitting query
Q2
to the Context Mediator
will yield the result:
MQ2:
SELECT rl.cname, 1000,
1
FROM rl
WHERE rl. currency = 'JPY';
which indicates that the answer consists of companies
for
which the currency attribute has
value 'JPY', in which case the scale-factors in context cl and c2 are 1000 and
If desired,
the mediated query
MQ2
1,
we would obtain the
respectively.
can be evaluated on the extensional data set to return an
answer grounded in actual data elements. Hence,
Figure
1
if
MQ2
is
evaluated on the data set shown in
singleton answer <'NTT', 1000, 1>.
Extensional versus Intensional Answers
Yet another feature of Context Interchange
is
that answers to queries can be both intensional
and extensional. Extensional answers correspond to
fact-sets
which one normally expects of
a database retrieval. Intensional answers, on the other hand, provide only a characterization
of the extensional answers without actually retrieving data from the data sources.
preceding example,
MQ2
In the
can in fact be understood as an intensional answer for Q2, while the
tuple obtained by the evaluation of
how we might support the underlying
MQ2
constitutes the extensional answer for Q2.
inferences needed to answer such queries.
10
In the
names of
COIN framework, intensional answers are grounded
in extensional predicates
relations), evaluable predicates (e.g., arithmetic operators or "relational" operators),
and external functions which can be directly evaluated through system
answer
is
(i.e.,
calls.
The
intensional
thus no different from a query which can normally be evaluated on a conventional
query subsystem of a
DBMS. Query
step process: an intensional answer
answering in a Context Interchange system
is
thus a two-
returned in response to a user query; this can then
is first
be executed on a conventional query subsystem to obtain the extensional answer.
The intermediary
intensional answer serves a
number
of purposes
[24].
Conceptually,
it
constitutes the mediated query corresponding to the user query
and can be used to confirm the
More
often than not, the intensional
user's
understanding of what the query actually
entails.
answer can be more informative and easier to comprehend compared to the extensional answer
it
derives. (For
example, the intensional answer
MQ2
actually conveys
merely returning the single tuple satisfying the query.)
computation of extensional answers are
compared
likely to
more information than
From an operational standpoint,
be many orders of magnitude more expensive
makes good
to the evaluation of the corresponding intensional answer. It therefore
sense not to continue with query evaluation
if
the
the intensional answer satisfies the user.
From
a
practical standpoint, this two-stage process allows us to separate query mediation from query
optimization and execution. As we will illustrate later in this paper, query mediation
is
driven
by logical inferences which do not bond well with the (predominantly cost-based) optimization
techniques that have been developed
[40; 45].
The advantage
of keeping the two tasks apart
is
thus not merely a conceptual convenience, but allows us to take advantage of mature techniques
for
query optimization in determining how best a query can be evaluated.
Query Pruning
Finally, observe that consistency checking
is
process, allowing intensional answers to be
performed as an integral activity of the mediation
pruned
(in
answers which are better comprehensible and more
some
(MQl) would have only
third) since the other
For example,
efficient.
modified to include the additional condition "rl. currency =
returned
cases, significantly) to aurive at
'
if
Ql had been
JPY'", the intensional answer
the second SELECT statement (but not the
first
and the
two would have been inconsistent with the newly imposed condition.
This pruning of the intensional answer, accomplished by taking into consideration integrity
constraints (present as part of a query, or those defined on sources) and knowledge of data
semantics in distinct systems, constitutes a form of semantic query optimization
[7].
Consistency
checking however can be an expensive operation and the gains from a more efficient execution
must be balanced against the
cost of performing the consistency check during query mediation.
11
In our case, however, the benefits are ampUfied since spurious conflicts that remain undetected
could result in an additional conjunctive query involving multiple sources.
2.2
It is
Functional Features of Context Interchange:
System Perspective
natural to assume the internal complexity of any system will increase in commensuration
with the external functionalities
We make
is
A
it
offers.
no claim that our approach
is
The Context Interchange system
"simple"; however,
we submit that
is
no exception.
this
complexity
decomposable and well-founded. Decomposability has obvious benefits from a system engi-
neering perspective, allowing complexity to be harnessed into small chunks, thus making our
more endurable, even when the number of sources and
integration approach
ponentially large and
because
it
is
when changes
are rampant.
The complexity
is
receivers are ex-
said to
be well-founded
possible to characterize the behavior of the system in an abstract mathematical
framework. This allows us to understand the potential (or limits) of the strategy apart from
the idiosyncrasies of the implementation, and
how improvements can be made. We
is
is
useful for providing insights into where
describe below
some of the ways
in
and
which complexity
decoupled in a Context Interchange system, as well as tangible benefits which result from
formalization of the integration strategy.
Representation of 'Meaning' as opposed to Conflicts
As mentioned
earlier,
a key insight of the Context Interchange strategy
is
that
we can
represent
the meaning of data in the underlying sources and receivers without identifying and reconciling
all
potential conflicts which exist between any two systems. Thus, query mediation can be per-
formed dynamically
(as
when
a query
is
submitted) or
it
can be used to produce a query plan
(the mediated query) that constitutes a locally-defined view. In the latter case, this view
ilar
to the shared
changes do occur
schemas
(say,
in tightly-coupled systems
when
sim-
with one important exception: whenever
the semantics of data encoded in some context
Context Mediator can be triggered to reconstruct the
is
local
in tightly-coupled systems, this reconstruction requires
is
changed^), the
view automatically. Unlike the case
no manual intervention from the system
administrators. In both of the above scenarios, changes in local systems are well-contained and
do not mandate human intervention
This represents a
in other parts of the larger system.
significant gain over tightly-coupled systems
where maintenance of shared schemas constitute
a major system bottleneck.
*For an account of
why
this seemingly strange
phenomenon may be more common than
be, see [52].
12
is
widely believed to
Contexts versus Schemas
conceptualization
Unlike the semantic-value model, the COIN data model adopts a very different
of contexts: instead of a property-list associated with individual data elements,
as consisting of a collection of
receiver
axioms which describes a particular "situation" a source or
is in.
Domain Model
.....7(8)-=:::,-..
semanticNu mber
scaleFaclor
currency
moneyAmt
1^1
currency TVpe
companyFinan c
i
al s
semantic-relation
r2'
we view context
semantics tnDg
Legend
object for which the object-id
a Skolem function
is
relation r present in a source, there
Among
semantic-objects.
defined on
[8]
an isomorphic relation
is
r'
some key
defined
values.
on the corresponding
other features, this structure allows us to encode assumptions of the
underlying context independently of structure imposed on data in underlying sources
schemes).
For each
For example, the constraint:
all
"company
financials" are reported in
(i.e.,
US
the
Dollars
within context C2 can be described using the clause^
X'-.companyFinancials, currency {c2, X'):currencyType h
curre/Jcy(c2,X')[va/ue(c2) ->'USD'].
The method currency
object
is
reported.
A
is
said to be a modifier because
it
modifies
how the
value of a semantic-
detailed presentation of the details of this framework will be presented
later in Section 4.
The dichoton y between schemas and
contexts present a
number
of opportunities for sys-
tematic sharing and reuse of semantic encodings. For example, different sources and receivers
same context may now bind
in the
which correlate with one another
the same semantic-type
(e.g.,
to the
(e.g.,
same
set of context axioms;
and
distinct attributes
revenue, expenses) may be mapped to instances of
companyFinancials). These circumvent the need to define a new
property-list for all attributes in each schema.
Unlike the semantic value model, there
ambiguity on what modifiers can be introduced as "meta-attributes" since
semantic- types are explicitly defined in the domain model.
section, views
As pointed out
all
is
no
properties of
in the previous
can be (more simply) defined on the extensional relation independently of the
context descriptions. This constitutes yet another benefit of teasing apart the structure of a
and semantics which are
source,
implicit in
its
context.
Inheritance and Overriding in Semantic- Types
Not
in
surprisingly, the various features frequently associated with "object-orientation" are useful
our representation scheme as
archy,
well.
Semantic-types
fall
naturally into a generalization hier-
which allow us to take advantage of structural and behavioral inheritance
economy of expression. Structural inheritance allows a semantic-type
tions defined for
from
moneyAmt
scaleFactor.
well.
its
Hence,
®The
supertypes.
in achieving
to inherit the declara-
For example, the semantic-type companyFinancials inherits
the declarations concerning the existence of the "methods" currency and
Behavioral inheritance allows the definitions of these methods to be inherited as
if
we had defined
syntcLx of this language
is
earlier that instances of
moneyAmt
has a scale- factor of
defined in Section 4 and corresponds to that of Gulog
14
[15].
1,
instances of companyFinancials would inherit the
all
companyFinancials
is
same
scale-factor since every instance of
an instance of moneyAmt.
Inheritance need not be monotonic. Non-monotonic inheritance means that the declaration
or definition of a
viewed
cis
defaults
method can be overridden
which can always be changed to
Value Conversion
Among
Thus, inherited definitions can be
in a subtype.
reflect
the specificities at hand.
Different Contexts
Yet another benefit of adopting an object-oriented model in our framework
is
that
it
allows
conversion functions on values to be explicitly defined in the form of methods defined on various
moneyAmt
semantic types. For example, the conversion function for converting an instance of
from one scale-factor (F in context C) to another (Fi in context
Ci)
can be defined as follows:
X' -.moneyAmt h
X'[cvt(ci)@scaleFactor, C,
U^V]
<r-
X'[scaleFactor(ci)^-[value{ci)^Fi]],
X'[sca}eFactor{C)^.[vahieici)^F]],
This conversion function, unless explicitly overridden,
request for scale-factor conversion on an object which
the conversion
is
to
is
will
may
phenomenon
is
ci
,
Overriding can take place along
defined with reference to context
or not) will have to be defined explicitly. This
a powerful mechanism for different users to introduce their
The apparent redundancy
definition in diff'erent contexts)
is
(in
own
interpretations of disparate
having multiple instances of the same
addressed through the adoption of a context hierarchy which
described next.
Hierairchical composition of
By
Contexts
"objectifying" sets of axioms associated with contexts,
relationship
among
contexts.
If c
is
a subcontext of
said to apply in c unless they are "overridden".
is
C\.
allows diff"erent conversion functions to be associated with different contexts and
data in a localized fashion.
is
a
only: in order for scale-factor conversion to take place in a different context, the conversion
function (which could be identical to the one in
is
is
introduce a different conversion function for a
subtype of moneyAmt. Notice that this conversion function
Ci
be invoked whenever there
F/F^.
an instance of moneyAmt and when
be performed with reference to context
the generalization hierarchy: as before, we
V = U*
to
make
all
c',
then
we can introduce a
all
hierarchical
the axioms defined in
An immediate
(/
are
application of this concept
"functional" contexts subcontexts of a default context Cq, which contains the
default declarations
and method
definitions.
Under
15
this scheme,
new contexts introduced need
only to identify
method
how
definitions
it is
different
from the default context and introduce the declarations and
which need to be changed (overridden). This
is
formulated as a meta-logical
extension of the COIN framework and will be described in further details in Section
Another advantage of having
the
domain model
in
this hierarchy of context
is
6.
the ability to introduce changes to
an incremental fashion without having adverse
effects
on existing systems.
For example, suppose we need to add a new source for which currency units take on a different
representation
'Japanese Yen' versus
(e.g.,
'
JPY'). This distinction has not been previously
captured in our domain model, which has hitherto assumed currency units have a homogeneous
representation.
To accommodate the new data
source,
it is
necesscU-y to
add a new modifier
for
currencyType, say format, in the domain model:
cuTrencyType[forinat(ctx) => semanticString].
Rather than making changes to
all
existing contexts,
we can
assign a default value to this
modifier in cq, and at the same time, introduce a conversion function for mapping between
currency representations of different formats
'Japanese Yen' and 'JPY'):
(e.g.,
X: currencyType, format{co, X):semanticString h
format(co, A')[vaJue(co)—) 'abbreviated'].
X -.currencyType h
The
X -.currencyType,
3
U^V]
add
to the
process
last step in this
which distinguishes
X[cvt(co)@C,
it
is
to
format{c'
,
<r-
.
.
.
(body)
new context
X) .semanticString h
.
.
(cf)
the following context axiom:
format(c',X)[vaJue(c')-> 'full'].
as having a different format.
Context Interchange
vis-a-vis Traditional
and Contemporary
Integration Approaches
In the preceding section,
we have made
detailed
comments on the many
features that the
Context Interchange approach has over traditional loose- and tight-coupling approaches.
summary, although tightly-coupled systems may provide better support
for
In
data access to
heterogeneous systems (compared to loosely-coupled systems), they do not scale-up effectively
given the complexity involved in constructing a shared schema for a large
and are generally unresponsive
other hand, require
little
to changes for the
number
of systems
same reason. Loosely-coupled systems, on the
central administration but are equally non-viable since they require
users to have intimate knowledge of the data sources being accessed; this assumption
16
is
generally
non-tenable
when the number
of systems involved
The Context Interchange approach
is
and when changes are
large
frequent^°.
provides a novel middle ground between the two:
allows
it
queries to sources to be mediated in a transparent manner, provides systematic support for
elucidating the semantics of data in disparate sources and receivers, and at the
same
time, does
not succumb to the complexities inherent in maintenance of shared schemas.
At a cursory
level,
the Context Interchange approach
may appear
similar to
many contem-
porary integration approaches. Examples of these commonalities include:
•
framing the problem
in
McCarthy's theory of contexts
[37] (as in
Carnot
[11],
and more
recently, [18]);
•
encapsulation
[3]
of semantic knowledge in a hierarchy of rich data types which are refined
via sub-typing (as in several object-oriented multidatabase systems, the archetype of which
is
•
Pegasus
[2]);
adoption of a deductive or object-oriented formalism
database System
• provision of
We
[26]
and
DISCO
[28;
15]
(as in the
ECRC
Multi-
[50]);
value-added services through the use of mediators
[54] (as in
TSIMMIS
[20]);
posit that despite these superficial similarities, our approach represents a radical departure
from these contemporary integration strategies.
To begin with, a number of contemporary
integration approaches are in fact attempts aimed
at rejuvenating the loose- or tight-coupling approach.
These are often characterized by the
adoption of an object-oriented formalism which provides support for more effective data transformation
(e.g.,
0*SQL
[32])
or to mitigate the effects of complexity in schema creation and
change management through the use of abstraction and encapsulation mechanisms. To some
extent,
and
contemporary approaches such as Pegasus
DISCO
[50]
[2],
the
can be seen as examples of the latter strategy.
text Interchange strategy since they continue to rely on
conflicts a priori
ECRC
and
in the
human
Multidatabase Project
These
differ
from the Con-
intervention in reconciling
maintenance of shared schemas. Yet another difference
though a deductive object-oriented formalism
is
also used in the
is
that
is
al-
Context Interchange approach,
"semantic-objects" in our case exist only conceptually and are never actually materialized.
implication of this
[26],
One
that mediated queries obtained from the Context Mediator can be further
We
have drawn a sharp distinction between the two here to provide a contrast of their relative features. In
one is most likely to encounter a hybrid of the two strategies. It should however be noted that the two
strategies are incongruent in their outlook and are not able to easily take advantage of each other's resources.
For instance, data semantics encapsulated in a shared .schema cannot be ea-sily extracted by a u.ser to assist in
formulating a query which seeks to reference the source schemas directly.
practice,
17
optimized using traditional query optimizers or be executed by the query subsystem of
(relational)
query subsystems without changes.
In the Carnot system
[11],
semantic interoperability
axioms which translate "statements" which are true
are meaningful in the
it
Cyc knowledge base
A
[31].
suggested that domain-specific ontologies
is
classical
[22],
is
accomplished by writing articulation
in individual sources to
similar approach
statements which
adopted
is
in [18],
where
which may provide additional leverage by
allowing the ontologies to be shared and reused, can be used in place of Cyc. While we
like
the explicit treatment of contexts in these efforts and share their concern for sustaining an
infrastructure for data integration, our realization of these differ significantly.
axioms
map
[23] in
our case operate at a finer
level of granularity: rather
"statements" present in a data source to a
First,
lifting
than writing cixioms which
common knowledge
base, they are used for
describing "properties" of individual "data objects". Second, instead of having an "ontology"
which captures
aF.
structural relationships
we have a domain model which
is
These differences account largely
Finally,
we remark
that the
a
much
among data
less
elaborate collection of complex semantic-types.
for the scalability
TSIMMIS
objects (much like a "global schema"),
[41; 42]
and
extensibility of our approach.
approach stems from the premise that
formation integration could not, and should not, be fully automated.
TSIMMIS
mans
of a "light-weight" object
and integration
model which
this translates to the strategy of
tics
is
making sure that attribute
for free-text descriptions
when the data
achieved. However, there are also
many
We
labels are as descriptive as possible
may be
effective
when consensus on a shared vocabulary cannot be
(e.g.,
where data sources are
relatively
some consensus can be reached) where human intervention
is
is
not
primarily responsible for the different approaches
our strategy.
The COIN Data Model:
4
This motivated the invention
concur that this approach
other situations
appropriate or necessary: this distinction
TSIMMIS and
assist hu-
("man-pages") which provide elaborations on the seman-
sources are ill-structured and
well-structured and where
activities.
mind,
intended to be self-describing. For practical purposes,
of information encapsulated in each object.
taken in
this in
opted in favor of providing both a framework and a collection of tools to
in their information processing
and opting
With
in-
Structure, Language, and Frame-
work
In
[37],
McCarthy pointed out that statements about the world
are never always true or
false:
the truth or falsity of a statement can only be understood with reference to a given context.
This
is
formalized using assertions of the form:
18
ist{c,a)
c:
which suggests that the statement a
an outer context
asserted in
is
true ("«<") in the context c^\ this statement itself being
Lifting axioms^^ are used to describe the relationship
c.
between
statements in different contexts. These statements are of the form
ist{c,a)
c:
which suggests that
"cr
<=>
ist{c',a')
in c states the
same thing
McCarthy's notion of "contexts" and
"lifting
eling statements in heterogeneous databases
From
facts
this perspective, factual
as a' in c"'.
axioms" provide a useful framework
which are seemingly
in conflict
for
mod-
with one another.
statements present in a data source are no longer "universal"
about the world, but are true relative to the context associated with the source but not
necessarily so in a different context. Thus,
with sources
1
and 2
in Figure
1,
if
we
we may now
c:
ist(ci,
rl('NTT', 1000 000, 'JPY'))-
c:
«s<(c2,
r2(' NTT', 5 000 000)).
The context
c
assign the labels
may be omitted
and
C2 to contexts associated
write:
above refers to the ubiquitous context
the integration context) and
c\
in the
in
which our discourse
is
conducted
ensuing discussion whenever there
(i.e.,
is
no
ambiguity.
In the Context Interchange approach, the semantics of data are captured explicitly in a
collection of statements asserted in the context associated with each source, while allowing
conflict detection
and reconciliation to be deferred to the time when a query
Building on the ideas developed in
[48; 44],
we would
like to
is
submitted.
be able to represent the semantics
of data at the level of individual data elements (as opposed to the predicate or sentential level),
which allows us to identify and deal with
individual data elements
the value
1
000 000
may be present
in relation
"in the words of Guha
[23],
in a relation
rl (as shown in Figure
IBM and NTT while being reported
associated with the context."
conflicts at a finer level of granularity. Unfortunately,
without a unique denotation. For instance,
1)
in different currencies
simultaneously describes the revenue of
and
scale-factors.
Thus, the statements
contexts represents "the reification of the context dependencies of the sentences
They
are said to be "rich-objects" in that "they cannot be defined or completely
Consider, for instance, the context associated with the statement; "There are fovir billion people
hving on Earth". To fully qualify the sentence, we might add that it assumes that the time is 1991, However,
described"
[38].
not the only relevant assumption in the underlying context, since there are implicit assumptions
considered a "live person" (are fetuses in the womb alive?), or what it means to be "on earth"
include people who are in orbit around the earth?)
this certainly
about who
(does
it
is
is
aJso called articulation
axioms
in
Cyc/Carnot
[11].
19
ist{ci,
currency(R,Y) f- t1{N,R,Y)).
ist{ci,
scaIeFactor{R, 1000) <- currency(R,Y),
ist{cu scaleFactor{R,
\)
Y
<- currency {R,Y),
F
='JPY').
t^'JPY')-
intending to represent the currencies and scale-factors of revenue amounts will result in multiple
To circumvent
inconsistent values.
this
problem, we introduce semantic-objects, which can
be referenced unambiguously through their object-ids.
Semantic-objects are complex terms
constructed from the corresponding data values (also called primitive- objects) and are used as
a basis for inferring about conflicts, but are never materialized in an object-store. This will be
described in further details in the next section.
The data model underlying our
integration approach, called
COIN
(for
context INterchange),
both a structural component describing how data objects are organized, and a
consists of
lan-
guage which provides the basis for niciking formal assertions and inferences about a universe
of discourse. In the remainder of this section,
we provide a description of both of these com-
ponents, followed by a formal characterization of the Context Interchange strategy in the form
COIN framework. The
of a
introduced in Section
The
4.1
latter will
be illustrated with reference to the motivational example
2.
Structural Elements of
The COIN data model
is
COIN
a deductive object-oriented data model designed to provide explicit
support for Context Interchange. Consistent with object- orientation
modeled as
in
objects,
having unique and immutable object-ids
a generalization hierarchy with provision
between two kinds of data objects
types,
oid
information units are
and corresponding to
non-monotonic inheritance.
in coin: primitive objects,
distinguish
in
COIN have both an
value: these are identical in the case of primitive-objects, but different for semantic-
objects. This
an important distinction which
is
Primitive-types correspond to data types
to sources
We
types
which are instances of primitive
and semantic-objects which are instances of semantic-types. Objects
and a
and
receivers.
will
become apparent
(e.g., strings, integers,
shortly.
and
reals)
which are native
Semantic-types, on the other hand, are complex types introduced to
support the underlying integration strategy.
ties
for
(oids),
[3],
Specifically, semantic-objects
may have
proper-
which are either attributes or modifiers. Attributes represent structural properties of the
semantic-object under investigation: for instance, an object of the semantic-type companyFinancials must, by definition, describes
by defining the attribute company
other hand,
how
Eire
some company; we capture
for the semantic-type
this structural
dependency
companyFinancials. Modifiers, on the
used as the basis for capturing "orthogonal" sources of variations concerning
the value of a semantic-object
may be
interpreted. Consider the semantic-type
20
money Ami:
the modifiers currency and scaleFactor defined for
in
how
the value corresponding to an instance of
money Ami
suggests two sources of variations
moneyAmt may be
interpreted. "Orthogonal-
here refers to the fact that the value which can be assigned to one modifier
ity"
of other modifiers, as
the case with scaleFactor and ciu-rency. This
is
is
is
independent
not a limitation on the
expressiveness of the model since two sources of variations which are correlated can always be
modeled as a single modifier. As we
in dealing
shall see later, this simplification allows greater flexibility
with conversions of values across
diff"erent contexts.
may be
Unlike primitive-objects, the value of a semantic-object
For example,
texts.
of NTT,
to
it is
if
the (Skolem) term sko
the oid for the object representing the revenue
is
perfectly legitimate for both
(1)
ist{ci,
value{skQ,1000 000)); and
(2)
ist{c2,
value(sko,9 600 000)),
be true since contexts
c\
and
C2
embody
different
assumptions on what currencies and
factors are used to report the value of a revenue amount^''. For our
the case that the value of a semantic-object
the case in the example above, where (1)
by a special
lifting
Definition 1 Let
t
in
/cvt,
some
known, but not
often
context, but not others. This
(2).
t
is
given in context
The
derivation of (2)
is
is
aided
^ X)
<^
it
is
ist{cs, value{t,X')).
derived from context
and say that
X
is
the value of
it
is
t
Q
Cs-
shall see later, the conversion function referenced
dependent on the type of the object to which
r,
For any context represented by C, we have
Cg.
as the conversion function for t in context C,
context C, and that
As we
in
it is
be an oid-term corresponding to a semantic-object of the semantic-type
ist{C, value{t,X) <- fcvt{t,Cs,X')
refer to
is
known
is
problem domain,
scale-
axiom defined below.
and suppose the value of
We
different in different con-
above
applied,
is
polymorphically defined, being
and may be
different in distinct
contexts.
Since modifiers of a semantic-type are orthogonal by definition, the conversion function
referenced in the preceding definition can in fact be composed from other simpler conversion
methods defined with reference to each modifier. To distinguish between the two, we
first
as a composite conversion function,
A
and the
latter as
predicate-calculu.s language
refer to the
atomic conversion functions. Suppose
is used in the discussion here since it provides better intuition for most readers.
which properties are modeled a-s "methods" (allowing us to write sA:o[vaJue->l 000 000]
as opposed to value{skoA 000 000)), will be formaJly defined in Section 4.2.
The COIN language,
for
21
modifiers of a semantic-type r are mi,
T.
It
follows that
fcvt{t,
if t is
=
Cs,X')
X
.
.
,
.
an object of type
3X\,
if
.
..
,
mj^,
r,
and
/cvt is
a composite conversion function for
then
Xk-] such that
{f^,'2it^c,,X')
=
X,) A--- A {£lit^Cs,Xk-,)
= X)
where f^H corresponds to the atomic conversion function with respect to modifier
rrij.
Notice
that the order in which the conversions are eventually effected need not correspond to the
ordering of the atomic conversions imposed here, since the actual conversions are carried out
in a lazy fashion
Finally,
ments
and depends on the propagation of variable bindings.
we note that value-based comparisons
We
here.
in the relational
say that two semantic-objects are distinct
may be
distinct semantic-objects
Definition 2 Let
©
if
model requires some adjust-
their oids are different. However,
semantically-equivalent as defined below.
be a relational operator from the
set
{=,
<,>,<,>,
7^,
.
.}.
If
i
and
t'
are
oid-terms corresponding to semantic-objects, then
O
(t®t')
In particular,
(value{t,X)
we say
that
t
A
vahie{t',X')
and
t'
A
X ® X')
are semantically-equivalent in context c
if
ist{c,t=t').
We sometimes abuse the notation slightly by allowing primitive-objects to participate in semanticcomparisons. Recall that we do not distinguish between the oid and the value of a primitive
object; thus, ist{C, va/ue(l
we know
000 000,
1
1000
that ist{Ci, value{skQ,
000000))
is
true irregardless of
what
C
may
000)), where sko refers to the revenue of
be.
NTT
Suppose
as before.
The expression
sko
<
5 000
000
will therefore evaluate to
in context c\
but not context
C2,
This latter fact can be derived from the value of sko
9 600 000)).
a priori in
"true"
rj and
since ist{c2,
in ci
(which
value{sko,
is
reported
the conversion function associated with the type companyFinancials (see
Section 5.3).
4.2
The Language
We describe
of
COIN
in this section the
inspired largely by Gulog
syntax and informal semantics of the language of COIN, which
[15]^''.
Rather than making inferences using a context
is
logic (see, for
[28] in that method rules cire bound to the underlying types, which leads to different
non-monotomc inheritance. Specifically, in the case of F-logic, it is not rales but
ground expressions that axe handed down the generalization hierarchy. Since we are interested in reasoning at
the intensional level, the former model is more appropriate for us.
^"Gulog
differs
from F-logic
approciches for dealing with
22
example,
[6]),
we introduce "context"
contexts through the use of parameterized methods.
value{sko,
000 000)) can be equivalently written as sko[value{c\)^
1
common
"vocabulary"
(i.e.,
what types
in different
For example, the context-formula
represents a (single- valued) method. This simplification
to a
and capture variations
as first-class objects
is
1
is/(ci,
000 000] where va]ue(ci)
possible because of our
commitment
and what methods are applicable) and the
exists
remains immutable across different contexts. By writing statements which
fact that object ids
are fully decontextualized
the integration context),
we
from the individual source and receiver contexts
"lifted"
(i.e.,
into
are able to leverage on semantics and proof procedures developed
without provision for contexts.
Following
[34],
we
define an alphabet as consisting of (1) a set of type symbols which are
partitioned into symbols representing semantic-types and primitive-types: each of which have
a distinguished type symbol, denoted by
T^ and
T-p respectively; (2)
an
infinite set of constant
symbols which represents the oids (or identically, values) of primitive-objects;
function symbols and predicate symbols;
and
modifiers,
built-in
methods
a set of method symbols corresponding to attributes,
(4)
va/(ie
(e.g.,
and
cvt);
usual logical connectives and quantifiers A, V, V, 3,
:,
—>,
::,
A
term
a function symbol and
is
oids, they are called oid-terms.
n-ary
if
of
T
•
T
and
if i
declaration
r' are
if
also
is
p
if
is
ii
,
.
.
is
infinite set of variables; (6) the
etc; (7) auxiliary
symbols such as
(,
.
,
either a constant, a variable, or the token /(ii,
i,j
),
],
[,
.
.
.
,
tn)
are terms. Since terms in our model refer to (logical)
Finally, a predicate, function, or
method symbol
said to be
is
a subtype of
a term and r
is
in
is
defined as follows:
type symbols, then r
is
r' is
::
a type declaration.
r' is
We
say that r
is
a subtype
a supertype of t. For any type symbol r" such that
r'
::
r",
t".
a type symbol, then
r. If r' is
We
t
t
:
a supertype of
an n-ary predicate symbol, and
predicate declaration.
•
is
and conversely, that
t',
is
A
an instance of type
•
an
expects n arguments.
if it
Definition 3
•
->,
(5)
^ and so forth; and finally, (8) a set of context symbols, of the distinguished object-type
called ctx, denoting contexts.
where /
a set of
(3)
ti
,
.
t,
,
.
.
is
an object
then
t
is
an attribute declaration.
We
t, r'
r^ are type symbols, then p(ti
is ri
x
•
•
•
x
,
.
m.
23
is
,
.
.
t„)
is
/,
.
is
a
r„.
are symbols denoting semantic-types, then
say that the signature of the attribute
the semantic-type r has attribute
say that
said to be of inherited type t'
say that the signature of predicate p
an attribute symbol and
We
declaratiori.
t—>t',
T[m^T']
and that
•
if
is
m
a modifier symbol, and
is
r, r'
We
a modifier declaration.
m
say that
Without any
signature t—^t'.
are symbols denoting semantic-types, then T[m{ctx)=i-T']
is
a modifier of the semantic-type
m
we assume that
loss of generality,
is
which has
r,
unique across
all
semantic-types.
•
if
r
a semantic-type, and t\,T2 are primitive types, then T[cvt(ctx)@ctx,Ti=^T2\
is
compound conversion
for r
•
if
T
is
declaration.
m
T[cvt(ctx)@'m,ctx, Ti=>T2\
if
is
r
a semantic-type,
is
a value declaration.
and
is
a modifier defined on
IS
a atomic conversion declaration.
of the atomic conversion of
•
compound
say that the signature of the
a
conversion
n—>r2.
r X
a semantic- type,
is
We
is
m for
r, is
r,
are primitive types, then
ri, T2
We
say that the signature
t x ti—>r2.
ti is
a primitive-type and c
We
say that the signature of the value for t
is
a context symbol, then T[value(ctx)—>Ti]
r—>ti.
given by
is
Declarations for attributes, modifiers, conversions, and the built-in method value are collectively
referred to as
method
An atom
Definition 4
•
if
p
•
if
defined as follows:
is
an n-ary predicate symbol with signature
is
type
declarations.
Ti,
m
is
.
,
.
.
T„ respectively, then p{ti,
if
m
is
if
the
•
is
t, t'
T, Ti
,
T2 respectively, c
is
is
.
.
,tn are of (inherited)
are of (inherited) types
a context symbol, and
r, r'
t, t'
are of
tc is
(in-
a context term, then
a compound conversion atom.
atom of the modifier
m
has signature r x tj—>r2, c
^1,^2 are of (inherited) types r, ti, T2 respectively,
a atomic conversion atom for m.
the value signature
is
given by t-^t', c
T, Ti,
.
a modifier atom.
is
types
t\,
a context symbol, and
t[cvt{c)@Tn,tc,ti-^t2]
if
and
conversion function for r has signature t x ti^T2, t,ti,t2 are of
the atomic conversion
t,
t„
an attribute atom.
is
compound
symbol,
•
a predicate atom.
respectively, then i[m(c)—>i']
r, r'
t[cvt{c)@tc,t\^t2]
if
•
is
herited) types
•
•
a modifier symbol with signature t—>t', c
(inherited) types
•
is
x
an attribute symbol with signature t^t' and
respectively, then ([m-^i']
•
,tn)
ri
then t[vahie{c)-^t']
is
is
tc is
a context symbol, and
a value atom.
24
and
is
a context
a context term, then
t, t'
are of (inherited)
As
atoms corresponding
before, the
and
to attributes, modifiers, conversions,
built-in
method
value are referred to collectively as method atoms.
Atoms can be combined
form molecules
to
(or
compound atoms): these are
"syntactic sugar"
which are notationally convenient, but by themselves do not increase the expressive power of
we may
the language. For example,
write
,mk-^tk] as a shorthand for the conjunct t[mi-^t\] A
• t[mi—^ti;
•
t[m^ti[mi-^t2]] as a shorthand
•
t
T[m^t'] as a shorthand
:
for
<
for
:
i[m—>ii] A
A
t
ii[mi->i2];
•
•
•
A t[mk^tk];
and
t[77i^t'].
Well formed formulas can be defined inductively in the same manner as in first-order lan-
guages
[34]; specifically,
•
an atom
•
[{(f)
and
•
if
is
(/)
a formula;
is
are formulas, then
if
a formula and
X
is
-'(p, cp
Aip and (py
ip
are
all
formulas;
a variable occurring in 0, then both (VX(/)) and
{3X (p)
are
formulas.
Instead of dealing with the complexity of full-blown first-order logic,
it is
customary to
restrict
well-formed formulas to only clauses.
Definition 5
A Horn
clause in the
COIN language
is
a statement of the form
rhA^Bi,...,B„
where
is
A
can either be an atom or a declaration, and B\,.
called the head,
the form t[rn@
.
.
.
and Bi,.
->i']
,
.
where
.
B^
is
t
is
.
.
,Bn
is
called the body of the clause.
a conjunction of atoms.
If
A
is
a
method atom of
a term denoting a semantic-object, then the predeclaration
must contain the object declarations
for all oid-terms in the head.
Otherwise, F
F
may be omitted
Q
altogether.
4.3
A
The COIN Framework
The COIN framework
builds on the COIN data model to provide a formal charcicterization of the
Context Interchange strategy
for the integration of
Definition 6 A coin framework J^
is
heterogeneous data sources.
a quintuple <S,^,£,'D,C> where
25
• 5, the source set,
name
a labeled multi-set {s\ := S\,...,Sjn :— S„i}.
is
•
fj,,
which are known to hold on those predicates. The
constitute a relation
Si is in
V, the domain model,
domain model
corresponding to
S
to C. If
fj,{si)
=
Cj,
we
a set consisting of declarations. Intuitively, declarations in the
and predicates which are known.
,
.
.
E,„} where Ei
the set of elevation axioms
is
G
there
Si,
is
a clause which defines a corresponding semantic-
which every primitive object
in Vij
is
replaced by a Skolem term in
•
for
every oid-term in
attributes);
for every
r[,,
we
identify its type via the introduction of
and define the values which are assigned
declcu-ation,
-
atoms of
S. Ei consists of three parts:
for ear'i relation r^j
r'
set of
Cj.
a multi-set {Ei,.
is
Sj in
relation r[- in
-
is
context
identify the types, methods,
• £, the elevation set,
-
the
is
r^j in Sj.
the source-to-context mapping, defines a (total) function from
say that the source
•
label Si
of a source, and Si consists of ground predicate atoms rij{ai,...) as well as the
integrity constraints
Tij
The
an object
to structural properties
(i.e.,
and
oid-term in
r[,,
we
define
its
value in context c(= n{st)) with reference to
rij.
• C, the context multi-set,
is
a labeled multi-set {c\ := C\,...,Cn '=
context symbol, and Ci, called the context set for
Cj,
description of the relevant data semantics in context
We provide the
shown
in Figure
1
can be represented in a coin framework
The contents
sources.
We
we
of the source set
S
is
simply the
place no limitation on the
rules following the
•
The
function
/i is
—
>
a
which provides a
Qi
Cj.
T=
is
how the
<S,
£,
/i,
integration scenario
V, C>. Figures 3 and 4
it
set of
number
ground atoms present
of relations which
in the
may be
data
present in
happens that each source has only one
relation.
ground atoms are functional dependencies which are known to be
true in the respective relation.
dependency cname
Cj
will elaborate briefly below:
each source; in the current example,
The
set of clauses
intuition for the above definition by demonstrating
present a partial codification which
•
is
Cn} where
For instance, the two rules in
{re venue, currency} on the attributes in
defined as a relation
on
S
x C: thus, source
whereas S2 and S3 are both mapped to context
26
C2-
si
S\ defines
the functional
r\.
is
mapped
to context ci,
Source
5
set
si := {
riClBM', 1000000, 'USD')- ri('NTT', 1000000, 'JPY').
=^2
fii
52 :=
{
^ri(/V,fli,.),n(7V,fl2,-).
Yi=Y2i-ri{N,.,y\),r,(N,.,Y2).
}
r2('IBM', 1500 000). raCNTT', 5000 000).
Ei=E2^ro{N,Ei),r2(N,E-2).
53 :=
{
,
Source-to-Context Mapping
{/x(Si
,
}
rsCUSD', 'JPY', 104.0). rsCJPY' 'USD', 0.0096).
Ti = T2^r3{X,Y,Ti),r3{X,Y,T2).
}
fj,
Ci), fl{s2,C2), ^l{S3,C2)}
Domain model
V
/* type declarations */
T5.
semanticNumber
T5.
semajiticString
moneyAmt semanticNumber.
::
::
number
::
varchar
::
integer
::
companyFinanciaJs
moneyAmt.
currencyType
semanticString.
real
::
::
Tp.
Tp.
::
number,
number.
::
companyName
::
semanticString.
/* attribute declaration */
companyFinancials[company
/* modifier
=>
companyName]
declarations */
moneyAmt[currencyfctxj=> currencyType;
/* value
scaleFactor(ctx)--
semanticNumber]
declarations */
semanticString\va\ue{ctx)^
varciiar].
semanticJVumber[va/uefctxj=> number].
/* conversion
declarations */
semanticString[cvt(ctx)@ctx,varcbar => varchar].
semanticNumber[cvt(ctx)(§ctx,number => number].
moneyAmt[cvt('ctx)@ctx,number => number].
money Amt[cvt(ctx)@ctx,scaleFactor,number => number].
moneyAmt[cvt('ctxj@ctx,currency,number => number].
Figure
3:
/* predicate declarations */
r[ (company iVame,companyFinanciaJs, currency Type)
r2(companyName,companyFinanciaJs).
ri (v'archar,integer, varchar),
r!^{currencyType,currencyType,semanticNumber).
rs ( varchar, varchar,real).
The source
set, source-to-coiitext
r2 ( varchar,integer )
mapping, and domain model
corresponding to the motivational example.
27
for the
COIN framework
•
V
The domain model
consists of two parts.
the semantic-types which are
(1)
tions for
The
known and the
methods which are applicable
left-half (as seen in
Figure 3) identifies
generalization hierarchy; (2) the declara-
to the semantic-types;
predicates corresponding to the semantic relations
(r,).
The
and the signatures
for the
same
right-half does the
for
primitive-types and predicates for the exteiisional relations.
•
The
first
clause in each
£?,
sponding to the relation
of the elevation set
rf,
£
defines the semantic relation r[ corre-
the semantic relations are defined on semantic-objects {as
opposed to primitive-objects), which axe instantiated as Skolem terms. The Skolem function (e.g., /r2#expenses) ^r^ chosen in the
of a tuple in the corresponding relation
way such that when applied
(e.g.,
to the key-value
'NTT'), the resulting Skolem term
/r2#expenses('NTT')) would in fact identify a unique "cell" in the relation as
Figure 2 (in this case, the expenses of NTT as reported in relation
•
shown
in
r2).
Object declarations and attribute atoms in the elevation set provide a way of specifying
the types of corresponding Skolem terms introduced in the semantic relation.
stance, any
Skolem term
companyFinancials.
is
•
(i.e.,
assigned to the
The
The
/ri#revenue(-)
attribute
company
following this declaration defines the object that
attribute for this semantic-object
values of the Skolem terms introduced in the semantic relation are defined through
the clauses
shown
The
last.
primitive-objects assigned are obtained directly from the
extensional relation. Clearly, the value assignment
source as identified by
/i;
is
valid only within the context of the
the values of the Skolem terms in a different context can be
derived through the use of conversion functions, which
The context
shown
Figure
in
in-
asserted to be an instance of the semantic- type
is
atom
For
multi-set
5.
C
is
we
will define later.
given by {ci := Ci,C2 := C2} and
There are two kinds of axioms: modifier value
is
defined by the axioms
definitions
and conversion
definitions.
Consistent with our data model, modifiers can be assigned
mechanism
texts: this constitutes the principle
for describing the
For example, the fact that in context
contexts.
scale-factor of 1
000 whenever
it is
diflferent
Ci,
values in distinct con-
meaning of data
companyFinancials are reported using a
reported in JPY, and
1
otherwise, can be represented by the
formula:
VX'
:
(
companyFinancials 3F'
:
number h
X'[sca/eFactor(ci )^F'] )A
(F'[vaiue(ci)-> 1000] <- X'[curreJicy(ci)^y'] A Y' ='3PY>
28
in disparate
)
A
Elevation
Axioms Ei
of
^
'"l(/rl#ciiame(-'^l),/rl#revenue(-^l)./rl#currency(-^'l)) <~ ^l('^l)-.
compajiyNanie.
/ri#cnan.e(-)
companyFinancials.
/ri#revenue(-)
/rl# revenue (^l) [company -> /rl#cname(A'i
currencyType.
/ri#currency{-)
-)•
:
:
)].
:
/rl#cname(A'i)[vaJue(C)^,Yi] f-
Ti (.Yi ,.,.), /x(Si
,
C).
/rl#revenue(-Vi)[vaJue(C)^A'o] <- Tj (Xi -Va, -), m(S1 C)(Xi _, X3), ^(Si , C).
/rl#currency(A'l)[va/ue(C)->A'3] <,
H
Elevation Axioms Eo of
,
,
£^
'"2(/r2#cname(A'l),/r2#expenses(-'^l)) <" T^iXi,.).
/r2#cname(-)
:
conjpauyiVanje.
/r2#expeiises(-)
companyFinancials.
:
/r2#expenses(Ai)[cOnipany -> /r2#ciiame(-'^l
/r2#cname(A',)[vaiue(C)->Xi]
<r- r-iC-Yi
,
)]•
_), /l(so,
C).
/r2#expenses(A'i)[vaiue(C)-^A'2] <- r^iXi, X2), n{S2,C).
El evation
Axioms £3
^
of
'3(/r3#fromCur(-^l A'a), /r3#toCur(-^I
,
/r3#fromCur(-,-)
/r3#toCur(-,-)
:
:
,
A'2), /r3#exchangeRate(A"i
,
Xi)) <- r3(Xi X2,-).
Currency Type.
curreiicyType.
semanticNumber.
r3(Xi X2, .), M«3 C)
/r3#fron,Cur(Ai,X2)Mue(C)-^A,]
r3(Xi,X2,-),/i(s3,C).
/r3#toCur(A,,X,)[vaJue(C)^A2]
/r3#exchai,geRate(Ai A2)[vailie(C)^ A'3] <r- r3(Xi X2, X3) ^l{s3,C).
/r3#exchai.geRate(-, -)
:
^
,
,
e
,
,
Figure
4:
,
Elevation set corresponding to the motivational example
29
,
Context
Ci:
/* modifier value assignments */
X' companyFinajicials \- X'[scaJeFactor(ci)->- scaJeFactor{ci,X')].
X' companyFinanciaJs, scaleFactor{ci,X') number h
:
:
:
X'
:
sca/eFactor(ci,X')[vaJue{ci)^ l] <- X'[currency(ci)^Y'],Y' ^'JPY'.
companyFinanciaJs, scaJeFactor(c\,X') number h
:
scaleFactorici, X')[value{ci)^ 1000] <- X'[currency{ci)^Y'],Y' ='JPY'.
X'
X'
:
companyFinanciaJs h X' [currency (ci)^ currency {ci,X')].
companyFinanciaJs, currency {ci,X') currencyType h
:
:
currency(cuX')[value{ci)^Y]
^
X'[company -^NI,],r\(N[,R',Y'),NI,
i
N[,
Y'[value{ci)^Y].
/* conversion function definitions */
X' moneyAmt \X'[cvt{ci)@C,U^V] <- X'[cvt{ci)mscaJeFactor,C,U^W],
X'[cvt{ci)@currency,C, W-¥V].
X' moneyAmt \:
:
X'[cvt{ci)@scaJeFactor,C,
U^V]
<- A"[scaJeFactor(ci)->-[va/ije(ci)->Fi]],
X'[scaleFactor{C)^.[value{ci)^F]],
X'
:
moneyAmt
h
.Y'[cvt(ci) ©currency, C,
X'
:
moneyAmt
U-^V]
<- X'[currency{ci)-^Y{], X'[currency{C)-^Y'],
I-
X'[cvt{ci)®currency,C,U^V]
<-
X'[currency{ci)^Y(],X'[currency{C)^Y'],
y;
% y,
r'.iY}, f/,
R'[value{ci)^R],V
Context
V = U*
R), y}
=U
*
i
r/, y;
i
Y',
R.
C2:
/* modifier value assignments */
X' companyFinancials h X'[scaJeFactor(c2)->- scaJeFactor(co,X')].
:
X'
X'
X'
:
:
:
moneyAmt, sca}eFactor{c2,X')
:
number h
scayeFactor(c2, A")[vaiue(c2)^
l).
companyFinanciaJs h X'[currency(c2)^ currency (c2,-X'')].
moneyAmt, currency {c2,X') currencyType h
:
currency(c2,A")[vaJue(c2)-> 'USD'].
/* conversion
definitions are similar to ci
Figure
5:
Context sets
for
C
and omitted for
brevity */
for the motivational
30
example
at hand.
F/F^
^
(F'[va7ue(ci)^
The above formula
is
l]
^
X'[ciirrency(ci)^Y']AY' ^'JPY'
).
not in clausal form, but can be transformed to definite
Horn
clauses by
Skolemizing the existentially quantified variable F'. For example, the above formulas can be
reduced to the following clauses:
X'
:
company Financials h
X'[scaleFactor{ci )^fscaleFactor{ci}(^')]-
X'
:
company Financials,
Aca/eFacMc,)(^')[va/»e(ci)^
A''
:
companyFinancials,
fscaieFactorici)
fscaieFactor{c-2)(-^')
's
a
1
OOO]
^
l]
X'[currency(ci)->r'],y'^'JPY'.
number h
'
fscaleFactor{ci)(^')
fscaieFactoH.cM^')[y^iiie{cj)^
where
number h
JscaieFactor(c^){^')
^ X [currency (c,)^Y'],Y'
'
unique Skolem function;
notational simplicity,
for
with the term scaleFactor(c2, X').
^MPY'.
we
Currency values corresponding to
stances of companyFinancials are obtained directly from the extensional relation r\ as
in Figure 5. In this instance,
it is
as context C2), the modifier values for ciu-rency
sive to capture
intensional
It is
shown
In a "better-behaved" situation (such
and scaleFactor can be defined independently
worthwhile to note that our framework
both types of scenario, although the
and extensional knowledge more
first
tends to
is
sufliciently expres-
make the boundary between
fuzzy.
Conversion functions define how the value of a given semantic-object can be derived
the current context, given that
shown
in
Figure
moneyAmt
The
5,
the
first
its
value
is
known with
respect to a different context.
is
The currency conversion
which
is
defined by identifying the respective scale-factors in the source
is
ratio of the
obtained by multiplying the source value by a conversion rate
obtained via a lookup on yet another data source
are defined with respect to
As
and currency.
and target contexts and multiplying the value of the moneyAmt object by the
two.
in
clause in the gi'oup (for context ci) defines the conversion for
via the composition of atomic conversion functions for scaleFactor
scaleFactor conversion
in-
necessary to reference an extensional relation because "meta-
data" are represented along with "data" in a source.
of the underlying schema.
replace
moneyAmt but
(r:^).
Notice that these conversions
are applicable to companyFinancials via behavioral
inheritance of the methods. In general, the repertoire of conversion functions can be extended
arbitrarily
built-in
by defining the conversion externally and invoking the external functions using the
system
predicate which serves as an escape hatch to the operating system. However,
encapsulating the conversion in external functions makes
it
harder to reason about the properties
of the conversion; for examj^le, the explicit treatment of arithmetic operators
31
and table-lookups
conversion functions) allow us to exploit opportunities for optimization, say, by rewriting
(ill
the arithmetic expression to reduce the size of intermediary tables during query execution^^.
Query Answering
5
Abductive Inferences
as
Following the same algorithm outlined in
[1],
any collection of COIN clauses can be translated to
Datalog with negation (Datalog"'^) (or equivalently, normal Horn program
semantics as well as computation procedures have been widely studied
we explore an
alternative approach based
[34]), for
which the
In this section,
[51]^^.
on abductive reasoning. The abductive framework
We
provides us with intensional (as opposed to extensional) answers to a query^'^.
describe this
abductive framework below and the relationship between query mediation in a coin framework
and query answering
familiarity with logic
part, shall
remain
in
an abductive framework.
programming
we assume some
In the interest of space,
ensuing discussion, and for most
at the level of [o4] in the
faithful to the notations therein.
The Abductive Framework
5.1
Abduction refers to a particular kind of hypothetical reasoning which, in the simplest case, takes
the form:
From observing A and the axiom
Infer
5
abductive reasoning.
is
-^
A
as a possible "explanation" of A.
Abductive logic programming (ALP)
T
B
a theory,
I
is
Specifically,
an extension of
logic
an abductive framework
[17]
[27] is
a set of integrity constraints, and
abducible predicates. Given an abductive framework
observation)
and a
,
is
is
a triple
[34] to
support
<T,A,I>
where
a set of predicate symbols, called
<T,A,I>
and a sentence 3Xq{X)
(the
the abductive task can be characterized as the problem of finding a substitution 6
set of abducibles
(1)
TuA\=q{X)9,
(2)
TUA
^^
^
programming
satisfies I;
A, called the abductive explanation
for the given observation,
and
Details of query optimization strategies that taJce into eiccount conversion functions are
the work reported here.
such that
A more
detailed discussion can be found in
beyond the scope of
[13].
'^The fact that "object-based logics" can be encoded in classicfil predicate logic has been known for a long
time (see for example, [9]). This however should not cause us to "lose faith" in our data model, since the syntax
of the language plays a pivotaJ role in shaping our conceptualization of the problem amd in finding solutions at
the appropriate levels of abstraction.
'^This change in perspective
is
beneficial for a variety of reasons (see Section 2),
32
and
will
not be repeated here.
A
(3)
has some properties that make
Requirement
(1
)
A, together with T, must be capable of providing an explanation
states that
The
the observation q{X)9.
we
ambiguity,
for
consistency requirement in (2) distinguishes abductive explanations
in the characterization of
from inductive generalizations. Finally,
primarily that literals in
"interesting".
it
A
A
are atoms formed from abducible predicates:
refer to these
atoms
also as abducibles.
means
in (3), "interesting"
where there
In most instances, we would
like
is
no
A
to
be minimal or non-redundant.
also
Semantics and proof procedures
ALP
for
have been active research topics recently (see
[27]
We describe in this section an abduction procedure based on extensions
called SLD+ Abduction. The underlying idea is first reported in [12], and
and references therein).
to
SLD
resolution,
The account we
has inspired various different extensions.
We
consider
first
SLD
resolution.
Given a theory
give here follows that in
T
consisting of (definite)
and a goal clause <— q{X), and SLD-refutation of <— q{X)
q(X)); <—
Gi;-;<— Gn where
<—
from <— Gt by resolving one of
may be many
Since there
of possible refutations
is
in a
which terminates
in
clauses A, such that
g.
()
is
some <-
can be resolved with the selected
of ways
G^,
it
an empty
TU A
|=
will not
clause).
(e.g., in
is
Go(=
obtained
The
literal,
search space defined by an
literal
g will not resolve with any
is
not worth
contain any branch that leads to a refutation
Given however that we are searching
<— Gj+i, which
a space
a depth-first manner).
whose selected
G, then clearly by letting
we can continue the search with
clauses
(the selected literal) with one of the clauses in T.
T which
number
and each <— Gi+i
This means that the part of the subtree with <— G, at the root
exploring any further, since
g,
the empty clause
Horn
a sequence of goal clauses <—
defined (in the form of an SLD-tree).
Suppose now that there
with
is
its literals
clauses in
SLD-tree may be searched
clause in T.
G„
is
[46].
A
is
(i.e.,
one
for a set of unit
include a unit clause which resolves
obtained from <— Gj minus the
literal
This observation forms the basis for the SLD+Abduction procedure which we proceed to
describe below.
Given an abductive framework
<T,A,1> and
the abductive query q{X), consider the
sequence given by
<— Go,
Ao where Go = q{X) and Ao
is
the
empty
set
<-Gn,A„
such that Gt+i,
•
if
g,
step
At_|_i
is
derived from Gj, A, as follows:
the selected literal of <— Gj, can be resolved with a clause in T, then a single resolution
is
taken to yield Gj+i, and Aj^-i
=
A^;
33
•
g
if
is
abducible,
{5' <— }
is
empty
g with
is
said to
derivation, as
this refutation,
variables
all
just defined,
complexity since
it
due to the
"3' <— "
[16], it is
becomes necessary
fact that a
may have
(say in <— Gi)
A„
said to be a refutation
is
to
V ( q{X)6
is
),
where 6
not done, the abducted fact "g'
is
the substitution
explicitly
The
to deal with equality constraints
is
to
(where
g'
=
g)
on Skolem constants.
{sk) introduced earlier in the SLD-derivation
[t]
later
be unified with a term
and the consistency of sk
[17].
=
on
(in <—
Gj, where j
>
i)-
t
when
the equality fact sk
with other abduced facts and
Suppose that the selected
usual negation-as-failure mechanism
the refutation
i,
[10] as integrity constraints.
FEQ
is
— Ik
checked.
described can be extended to cope with negation through
sources of complications in this scheme. First,
new
^"
to be
it
This Skolemization, however, introduces additional
is
used:
theory (augmented with the current residue), then not g
so that
the
suggested that this can be dealt with by introducing the equality predicate as an
the use of negation-as-failure
g.
is
selected literal g to be Skolemized. This
to be unified with a specific term
The procedure which we have just
not
Gn
needs only be existentially quantified for
Skolem constant
Thus, when a Skolem constant sk
is
<—
if
said to be the residue corresponding to
abducible predicate and to add the theory of Free Equality (FEQ)
abduced
{g' <— }.
with respect to the abductive framework
is
we require that the
would have been unnecessarily strong.
In
Aj U
X.
resolvable with g. If the Skolemization
is
G
=
and Aj+i
TU A,U
substitutions leading to the refutation, restricted to the
because variables in the unit clause
This
less p,
and constitutes the abductive answer
In the abduction step above,
is
G,
Gj-i-i is
set of unit clauses
obtained from the composition of
Skolem constants, and
variables replaced by
be a derivation of
we have
The accumulated
clause.
all its
consistent with I, then
The sequence obtained
<T,A,I>. A
g' is
it
i.e., if
is
the current goal clause
3 cannot be proven from the
assumed to be
may happen
additional facts are abducted.
literal of
To avoid
true.
There are two
that g becomes provable later in
this,
not g needs to be recorded
clauses which are subsequently added do not violate this implicit assumption.
Second, negation
may be
nested.
Suppose there
is
a clause given by g <— not
not provable from the current residue.
Then an attempt
with negation-as-failure (SLDNF) will
fail
because
it
is
/i,
abduction
This procedure can be generalized to any
and abduction
at
odd
is
not possible to prove h.
used instead and
level of nesting,
levels.
34
and that h
is
to prove not g using SLD-resolution
might be rendered provable by adding further clauses to the residue.
SLD-resolution to try to show
h,
with
is
However, h
So rather than using
allowed to add to the residue.
SLD
being used at even
levels,
Query Answering
5.2
Figure 6 illustrates
how
perspective, queries
COIN Framework
queries are evaluated in a Context Interchange system.
and answers are couched
knowledge-level) query
thereof),
in the
is
satisfying the query).
(a
a user
data model: a (data-level or
in the relational
formulated using a relational query language
and answers can either be intensional
From
(SQL
or
some extension
mediated query) or extensional (actual tuples
Examples of these queries and answers have been presented
earlier in
Section 2.1.
COIN Query
Data/Knowledge-Level
Query
$R
Abductive Query
G
©
Intentional
Answer
~ "
~
Domain Model
+ Clark's
Theory
FEQ
Context Set
^ '
Extensional
Answer
- -
-1
„
Integrity
Constramts
^
Elevation Set
Abducibles
Source-to-Context
Mediated
Query
T
mapping
-
evaluable-predicates
Source Set
extensional predicates
1
G
Query Optimizer/Executioner
Abductive Answer
COIN Framework
System Perspective
Figure
6:
A summary
of
how
Abductive Framework
queries are processed within the Context Interchange strategy:
query to a well-formed COIN query; @ performs the COIN to
Datalog"'^6 translation; @ is the abduction computation which generates an abductive answer
corresponding to the given query; and ® transforms the answer from clausal form back to SQL.
(D
transforms a (extended)
SQL
35
Transformation to the
COIN Framework
Within the COIN framework, the SQL-like queries originating from users are translated
coin language. For example, queries Ql and Q2
clausal representation in the
to a
in Section 2 can
be mapped to the following clausal representations:
CQl: (- answer {N,R).
answer{N, R) <- ri(N, R, .),r2{N,
E),R> E
and correspondingly,
CQ2: <^ answer {N,FuF2)
answer{N, Fi, F2)
<r- ri
Fi
The above
(N, R,
/
_),
R[scaleFactor{ci)^Fi], R[scaleFactor{c2)^F2],
F2.
queries however do not capture the real intent of the user. For example, there
is
no
recognition that "revenue" and "expenses" have different currencies and scale-factors associated
with them and should not be compared "as
the method scaleFactor
C2
is
that
R
in
CQ2
is
a primitive-object for which
not defined, or the fact that both queries originate from context
which may be interpreted
"naive",
is",
differently in a different context.
We
say that these queries are
and thus must be translated to corresponding "well-formed"
Definition 7 Let <Q, c> be a naive query
in
a COIN framework
.F,
queries.
where
c
denotes the context
from which the query originates. The well-formed query Q' corresponding to <Q, c>
is
obtained
by the following transformations:
• replace all relational operators with their "semantic" counterpart; for example,
X>y
is
c
replaced with
•
make
all
X
> Y.
relational "joins" explicit
by replacing shared variables with
explicit equality
using the semantic-operator =; for example, ri{X,Y),r2{X,Z) would be replaced with
n{XuY),r2{X2,Z),X,^X2.
• similarly,
make
relational "selections" explicit; thus, ri {X, a) will
be replaced by ri{X, Y),
c
Y=
a.
• replace all references to extensional relations
for
•
example, ri{X,Y)
will
with the corresponding semantic-relations;
be replaced with r[{X,Y).
append to the query constructed
so far, value
elements that are requested in the query.
36
atoms that return the value of the data
Q
Based on the above transformation, the well-formed query corresponding to naive queries
<CQ1,C2> and <CQ2,C2>,
CQl':
<r-
are given by
answer(N,R).
answer{N, R)
^
r[{N[, R',
_),
r'^iN^, E'),
N[
1
N^, R'
> E\
N[[value(c2)^N],R'[value{c2)^R]
and
CQ2':
<r-
answer {N,FuF2).
answer{N, Fi, F2) <- r[(N',
F{
R',J), R'[scaleFactor{ci)^F[], R'[sca}eFactor{c2)^F^],
% Fi,N'[value(c2)->NlF[[value{c2)^FilF^[vaIue{c2)^F2].
respectively.
Transformations to
The
cin
Abductive Framework
COIN framework and an abduction framework can now be
relationship between a
Definition 8 Given the COIN framework
corresponding abductive framework
•
T
•
I
is
given by
consists of the integrity constraints defined in S,
[10];
and
fi;
and the system predicate which
calls.
a well-formed query in the COIN framework .Fc, the corresponding abductive
framework of which
is
WD UC U
the built-in predicates corresponding
relational (comparison) operators,
provides the interface for system
that <— q(X)
£
and
to arithmetic
is
can be mapped to a
augmented with Clark's Free Equality
A consists of the extensional predicates defined in 5,
Suppose <— q(X)
this
<T, X, A> where
the Datalog"^^ translation of the set of clauses given hy
Axioms
•
T^
Tc = <S, fi,£,'DX>,
is
denoted by
identical in
Ta — <T,T,A>. Without any
both J^c and
COIN, and any COIN query <—
Q{X) can
J^a-
This
is
is
a sublanguage of
always be transformed to a Datalog"^^ query <— q{X)
•(—
Q{X)
into the theory
Given an abductive framework <T,X,>1>, and the query 3Xq(X). Suppose
an abductive answer
for
q(X)9, then
we assume
loss of generality,
because Datalog"*'^
by adding the Datalog^'^^-translation of the clause q(X)
is
stated.
it
follows that
T\={q{X)e<-pu...,Pm)
37
T.
A=
{pi,...,Pm}
This result follows from the fact that
Suppose
q[X)6.
and
step,
<p is
=
{sko,...}
pi
for
A
i
•
we say
3X q{X){6ip).
that the tuple
This fact
is
•
=
•
1,
.
,
.
.
Apm
m,
so a set of ground atoms
constitutes a precondition for
the set of Skolem constants introduced by the abduction
is
a "reverse" substitution {ski/Yi} where
not in X. Then,
query
K
ground
The conjunct
represents their conjunction.
in fact
pj's are
(
3Y (pi,
.
.
.
sfc,
€
K
and
Vj
a distinct variable
is
an intensional answer
,Pm)</', ^V') is
not surprising given that Motro and
Yuan
[39]
for the
suggested that
intensional answers can be obtained from the "dead-ends" of "derivation trees" corresponding
to a query.
fact a naive
Although
it
was not recognized as such, the procedure described
implementation of
SLD+ Abduction
in [39]
is
From
(without any consistency checking).
in
the
perspective of the user issuing a naive query, the intensional answer can also be interpreted as
the corresponding mediated answer:
An an
illustration of the preceding
comments, the evaluation of CQ2'
in the
abductive
framework yields the following abductive answer:
A=
The
{ri{sko,skusk2),sk2 ='3PY'},e
reverse substitution
(equivalently, the
(p is
=
{7V/sfco,
Fi/1 000, F2/I}
given by {sfco/Vo, s/^i/Vi, 5^2/^2}, and thus the intensional answer
mediated query)
is:
{3Yo,YuY2iri{Yo,YuY2),Y2 ='JPY'), {TV/Fo, Fi/l000,F2/l}
which translates to
answer
for the
MQ2
shown
in Section 2.
If
)
{Vo/'NTT', Vi /I 000 000, Ya/'iPY'}
above mediated query, then the answer
for the original user
query
is
is
an
given by
{A^/'NTT',Fi/l000,F2/l}.
Illustrative
5.3
In this section,
Example
we provide an example
illustrative of the
computation involved
in
query medi-
ation (equivalently, obtaining the intensional answer to a query).
Consider the query
Q3
(a simplified variant of
Q2) which
queries relation ri for the scale-factors of revenues in context
Q3:
issued from context ci
is
c\
:
SELECT rl.cname, rl .revenue. scaleFactor IN cl
FROM rl;
The
(well-formed) clausal representation for this query
CQ3:
is
given by
<- aiiswer(Ar, F).
answer(iV, F) <- rV{N',
/?',.),
N' [value (ci)^N], R' [scaleFactor {ci)-^F'],
F'[value{ci)^F].
38
,
which
SLD+Abduction algorithm
Figure 7 shows one possible refutation of this query using the
For better
scribed earlier.
and
the refutation
shown using COIN
is
clauses used for resolving the goal clauses are those
The
Catalog.
clarity,
shown
de-
clauses rather than
earlier in Figure 3, 4
5.
we
aid in appreciating the chain of reasoning,
To
on the
offer the following highlights
refu-
tation:
refutation begins with the query as given, with
•
The
•
At step
(3),
form, r\(sko,ski,sk2),
At step
(6),
the
is
ri
is
set.
an extensional
Skolemized
its
added to A.
literal scaleFactor(ci
two different clauses (where
F
Since
removed from the goal clause and
is
it
empty
initialized to the
the literal ri{N,.,.) cannot be further resolved.
predicate (and hence abducible),
•
A
F =1
,
be resolved with
/ri^revenue(sfco))[vaiue(ci)—>F] can
F
and
=1000). One
chosen arbitrarily
is
(in this case,
=1); the other will be selected on backtracking and will eventually lead to another
refutation.
•
To
not be
case,
'
JPY' when evaluated
is
it
in context c\
To determine
(see step (8)).
the expansion of the goal clause as
shown
• In step (12), the extensional relation
tion,
we are not allowed
we
will
• In step (15),
to
is
the
(see
r\
This eventually leads to
referenced again. In the absence of other informa-
assume that
it
is
the
fact,
same
"fact"
which
hcis
been abducted:
r\(sk3,sk4,sk5) to A.
the equality constraint on the objects /ri#cname(sfco) and /ri#cname(sfc3) leads
added to A. At
=
sk^.
Since '='
merge the two
is
abducible
this point, the functional
=
generates further the constraints ski
facts ri(sA;o,sA;i,.sA;2)
Finally, in step (17), the literal sk2
=
and
The
-"^k^
and sk2
is
an evaluable predicate),
=
sk^,
which
in
it
is
turn allow us to
abducted, which leads to a refutation.
substitution, restricted to variables
is
is
ri (5^3, 5^4,5^5).
'JPY'
This intensional answer, translated to SQL,
(it
dependency cname -^ {revenue, currency}
abductive answer corresponding to this refutation
'JPY'}.
5).
is
in step (10).
need to add a new Skolemized
to the constraint skg
•
this
if
necessary to identify the currency value from the extensional relation
corresponding axiom for assigning currency values in Figure
i.e.,
hand must
arrive at a successful refutation, the currency for the revenue-object at
is
{N,F},
given by:
39
given by
is
A=
{r\ (sko, sk\
,
.sA;2),
The
sk2
given by {N/skQ, F/l}.
—
(1)
^as{N,F).
1
Ao =
{}
SELECT rl.cname,
On
1
FROM rl WHERE rl. currency <> 'JPY';
F =1 000
backtracking, the other solution corresponding to
answer returned to the user
MQ3:
will
be obtained. The complete
thus given by:
is
SELECT rl.cname,
1
FROM rl WHERE rl. currency <> 'JPY'
UNION
SELECT rl.cname, 1000 FROM rl WHERE rl. currency = 'JPY';
The correspondences between
clearly seen in the
constraint (sko
=
above example. At step
^fca) to
and semantic query optimization can be
integrity checking
(15), the functional
be propagated and eventually allow
from the abductive answer.
SELECT rell.cname,
were not
r^ allows
ri(sA;3, sA;4,5/j5) to
the
initial
be eliminated
the intensional answer obtained would instead be:
If
it
1
FROM rl rell, rl rel2
so,
dependencies
WHERE rel 1 cname = rel2 cname
.
.
which would include a redundant second reference
obviously would lead to suboptimal performance
This second answer
to r\.
if
unintuitive,
is
and
executed without further optimization. In
the more general scenario, constraints can be useful in pruning an entire refutation altogether.
For instance,
Q3 had
if
been:
SELECT rl.cname IN cl, rl .revenue. scaleFactor IN cl
Q3':
FROM rl WHERE rl. currency = 'JPY';
we
in
will eventually
end up trying to abduct
A, thus resulting
in
sk-^
an unsuccessful refutation. In
consist of only the second select-statement in
6
A
(i.e.,
C
=
{c\
Two
7^
'JPY'
is
already present
mediated query
MQ3'
will
MQ3.
COIN Framework
COIN framework
We
:= Ci,...,c„ :— Cn})-
framework which allows new contexts to be defined
fashion.
this case, the
Meta-Logical Extension to the
In Section 4.3, context knowledge in a
theories
=' JPY' where sk2
basic mechanisms underly this
is
represented by a set of separate
describe here an extension to this basic
terms of existing ones
in
move
to
such an extension:
in
an incremental
the treatment of
context as a set of parameterized statements and the introduction of the hierarchical operator
which defines a subcontext relation on the
set {ci
,
.
.
,
.
-<,
c^i }.
Recall that the relative truth or falsity of a statement can be represented using McCarthy's
ist,
so that
41
ist(ct,a)
is
mean
taken to
a
that the statement
is
true in context
make incremental refinements
to statements which describe
enclosing context. Thus,
a subcontext of
if c, is
a differential context denoted by
ist(Ct,a) <-
cr
ist(ci,a) <—
c,
G
Cj
^
c,
is
already
Cj, this
-<
allows us to
known about an
allows us to introduce
such that:
(Jc^,
^
not-overridden{6c,,cr)-
Cj, ist{cj,a),
indicates that the statement
a obtained from the more general
not explicitly overridden by the differential context.
is
context theory of
in
denoted by
what
relation
(Jci
The predicate not-overridden
context Cj
Cj,
The
Cj.
from
Cj
and 6^
is
similar to that accomplished
The composition
of a
new
by the isa operator defined
[4].
In the
COIN data model, statements
references to
in
a context are "decontextualized" by making explicit
the form of a context-object. For example, the statement
its reification in
ist(cj,t[mi-^t'] <- t[m2^t']).
can be equivalently stated as
t[mi{cj)^t'] <- t[m2(cj)^t'].
This second form simplifies the inferences which are undertaken to support context mediation,
but requires some adjustment to allow statements to be inherited.
statement
This
is
inherited by context
is
c,
accomplished by requiring
6c,iX)
=
(^
all
Cj),
we
Specifically,
if
need to replace the references to
will
statements in
Sc^
to
be parameterized;
the above
Cj
with
c,.
i.e.,
{a^(X),...,ai[X)}
For instance, the earlier statement would have been asserted as
aiX)
=
t[in^{X)^t']
^
t[m2{X)^t'].
The statement a{X)
in the set 6c
is
said to
axioms forms an uninstantiated context
Definition 9 Let 6C
said to
be the
contexts {ci,
relation
-<.
.
=
{cq :=
.
,
c„}. Let {c,,
Then the
from Ci{X) as before:
,
set.
Co{X)JcA^)^-
differential for context
.
be uninstantiated. The collection of uninstantiated
.
,
.
.
Ci
Ct^ }
^^c^i^)}^ for which
with respect to
-<,
i.e.,
42
Cj^,
(i
=
l,...,7i)
is
which defines a partial order on the
be the predecessors of
uninstantiated context set for
(^^IJ^-)
q
with respect to the subcontext
denoted by Cij(X), can be obtained
•
a{X)GQ^(X)^a(X)E6,,^iX)
•
a(X) G QjiX)
The
context co
is
<-
a(X) e Q{X), not-overridden{6c {X),a{X)).
said to
be the default context and forms the basis
other differentials.
for the
Q
Definition 10 Given 6C
Suppose Ct(X)
The
{cq '—
Co{X),6c^{X),
.
.
the uninstantiated context set for
context set for
Notice that
is
is
=
c, is
,^c„(^)} and the subcontext relation
.
obtained inductively using Definition
c,
Q
to determine whether or not a given statement
is
being overridden in a specific context. The simplest approach
to this
in the
method) defined
in
head
any of
the type companyFinancials
is
in
a differential context
its
set,
be used
in the
V
assume that whenever a
supercontext applies. For example,
if
rules (pertaining
the scaleFactor for
given in two distinct context differentials along a given path in
is
said to take precedence and
COIN framework.
leads to the following extended formulation of a
Definition 11 The extended coin framework
and
to
corresponding context.
The above scheme
<S,£,
is
none of the other
the hierarchy, then the statement in the more specific context
will
9.
given by the set Ci(ci).
we have not described how one
method atom appears
-<.
are defined as before in Definition
is
a sextuple given by <<S,
6,
subcontext relation defined on the set of contexts
6C
P, 6C,
as defined in Definition
is
{ci,
/x, £",
.
.
.
,
7
The Context Interchange Prototype
The
feasibility
9,
^
>, where
and
the
-< is
c„} induced by 6C.
and features of the proposed strategy have been demonstrated
in a
prototype
which provides mediated access to both traditional structured databases and semi-structured
data sources (web-sites). Our implementation leverages on the world- wide- web
number
in
of ways:
for
(WWW)
in
providing physical connectivity across different networks and platform,
adopting a universal addressing scheme to different types of geographically-distributed
sources,
and
a
for providing us
re-
with a wealth of heterogeneous data sources. As shown in Figure
8,
queries submitted to the system are intercepted by a Context Mediator, which rewrites the user
query to a mediated query. The Optimizer transforms this to an optimized query plan, which
takes into account a variety of cost information.
The optimized query plan
is
executed by an
Executioner which dispatches subqueries to individual systems, collates the results, undertakes
conversions which
may be
necessary
when data
are exchanged between two systems, and returns
the answers to the receiver.
43
CONTEXT MEDIATION SERVICES
LxjcaJ
DBMS
supporting intermediate
processing
subquery //
subquery
answers
Non-traditional
Data Sources
Local
(e.g.,
DBMS
Figure
8:
Architectural overview of the Context Interchange Prototype
The Context Mediator
is
implemented
implementation distributed by the
interpreter
in
ECLiPSe^^, which
ECRC. At
is
an
efficient
and robust Prolog
the heart of the Context Mediator
which implements the extended SLD+Abduction algorithm described
Since computation of the abductive answer
we need
to translate
web-pages)
both the user-query
is
is
a meta-
in Section 5.1.
performed within a Horn-clause (HC) framework,
as well as
COIN clauses
to statements in Datalog"''^^
and
on obtaining the answer, perform the reverse translation to SQL. In the absence of aggregation
operators, the
SQL-to-HC and HC-to-SQL compilers
of these languages shares a
8
We
common grounding
are relatively straight-forward since both
in predicate calculus.
Conclusion
have presented a tightly-woven tapestry of ideas derived from different threads
in the
lit-
erature in artificial intelligence (on "contexts"), databases (on "heterogeneous databases" and
"semantic query optimization"), logic programming (on "abductive logic programming" and
^^ECLiPSe:
The ECRC Constraint
http //www ecrc de/eclipse/.
.
Logic
Parallel
.
44
System,
More information can be obtained
at
"meta- logic"), and others which are already present at the confluence of different scholarly
traditions
"deductive object-oriented data models" and "intensional answers").
(e.g.,
The
vari-
ous results and insights integrated together in a formal framework for the Context Interchange
and provide a well-founded basis
strategy,
for representing
disparate sources and receivers. Specifically,
tics in
we have described how data semantics
disparate systems can be articulated using a "object-logic"
ticular,
and reasoning about data seman-
,
and how
logical inferences (in par-
abduction) can be used to provide mediated access to both data and data- semantics.
At the same time, we showed that the COIN framework presents a viable alternative to
and contemporary integration approaches by by allowing
more
in
easily accessed,
classical
different kinds of information to
be
by making possible the sustenance of an infrastructure that mitigates the
complexity in the creation and maintenance of large-scale systems, and by isolating changes in
different
components which are only loosely-coupled together.
This paper
are
many
is
by no means the
interesting issues which
last
we
word on Context Interchange.
are only beginning to explore.
On
We
the contrary, there
mention below two of
these undertakings.
As noted
in [35], the
autonomy and heterogeneity
of sources present
new
challenges for
query processing and optimization which are not the same as those in distributed database
systems. These differences stem from constraints which are characteristic of the underlying ensources
may
differ in their
not be known, and data conversions
may
incur large hidden costs which are not accounted
vironment; for example,
may
for previously.
difl'erent
As we have shown
earlier,
query-handling
ability, cost
the detection of unsatisfiable answers in the abductive
framework constitute a form of semantic query optimization which presents huge
recently
[19],
We
and have found much synergy between the abduction framework, semantic
query optimization, and constraint
make
payoffs.
embarked on a re-implementation of the abduction procedure using Constraint Han-
dling Rules
tions
models
logic
programming
which motivated the work recently presented
[25]
in [53].
on the premise of similar observa-
To
this end,
we have been
able to
use of the existing prototype as a testbed on which theoretical insights can be rapidly
implemented and experimented with.
The
richness of the representational formalism
greater scope for abuse.
While
it is
is
a two-edged sword since
unlikely that there will ever
be a
it
presents also
"definitive guide" to context
modeling, case studies, evaluation criteria, prescriptive guidelines, and tools are in dire need.
At
this
moment, we are working with
several industry information-providers in applying this
mediation technology to the "real world" problems encountered by them.
We
are hopeful that
these experiences will be instrumental in developing and validating integration methodologies
that are grounded in practice.
45
References
[1]
Abiteboul,
of the
[2]
ACM SIGMOD
Ahmed,
A.,
Lausen, G., Uphoff, H., and Walker, E. Methods and
S.,
Conference (Washington, DC,
Smedt, P.
R.,
D.,
May
rules. In Proceedings
1993), pp. 32 41.
Du, W., Kent, W., Ketabchi, M. A., Litwin, W.
and Shan, M.-C. The Pegasus heterogeneous muitidatabase
system.
A., Raffi,
IEEE Computer
24, 12
(1991), 19-27.
[3]
Atkinson, M.
S.
P.,
Bancilhon,
F.,
DeWitt,
D., Dittrich, K., Maier, D.,
The object-oriented database system manifesto
and Zdonik,
In Proc. First Int'l. Conf. on
(invited paper).
Deductive and Object-Oriented Databases (Kyoto, Japan, Dec. 4-6 1989).
[4]
BORGi, A., Mancarella, p., Pedreschi, D., and Turini, F.
Computational Logic: Symposium Proceedings,
logic theories. In
Brussels,
[5]
Nov
J.
Compositional operators
W.
for
Lloyd, Ed. Springer-Verlag,
1990, pp. 117 134.
Brogi, a., and Turini, F. Metalogic
for
knowledge representation. In Proceedings of the 2nd
International Conference on Principles of Knowledge Representation and Reasoning (KR-91) (Cambridge,
[6]
MA, Apr 22-25
Buvac,
in
1991), pp. 61-69.
S. Quantificational logic of context. In Proceedings of the
Knowledge Representation and Reasoning (held
Workshop on Modeling Context
at the Thirteenth International Joint
Conference
on Artificial Intelligence) (1995).
[7]
Chakravarthy, U.
optimization.
[8]
Chang,
ACM
C.-L.,
S.,
Grant,
J.,
and Minker,
J.
Transactions on Database Systems
and Lee, R. C.-T. Symbolic
Logic-based approach to semantic query
15,
2 (1990), 162-207.
Logic and Mechanical Theorem Proving. Academic
Press, 1973.
[9]
Chen, W., and Warren, D.
Symposium on
[10]
S. C-logic for
complex objects. In
Principles of Database Systems
Clark, K. Negation
as failure. In Logic and
ACM SIGACT-SIGMOD-SIGART
(PODS) (March
1989), pp. 369-378.
Data Bases, H. Gallaire and
J.
Minker, Eds. Plenum
Press, 1978, pp. 292-322.
[11]
Collet, C, Huhns, M. N., and Shen, W.-M. Resource
base in Carnot.
[12]
Cox, P.
T.,
Proceedings
[13]
Daruwala,
IEEE Computer
24, 12
(Dec 1991), 55-63.
and Pietrzykowski, T. Causes
CADE-SO
a.,
for events: their
computation and applications. In
(1986), J. H. Siekmann, Ed., Springer-Verlag, pp. 608-621.
Goh, C.
H.,
Hofmeister,
The context interchange network prototype.
on Database Semantics (DS-6) (Atlanta,
Verlag)
integration using a large knowledge
S.,
Hussein, K., Madnick,
In Proc of the
G A, May
.
46
S.,
and Siegel, M.
IFIP WG2.6 Sixth Working Conference
10 - Jun 2 1995).
To appear
in
LNCS
(Springer-
[14]
Denecker, M., and Schreye, D. D. On
Int
[15]
Conf on
Fifth Generation
the duality of abduction and model generation. In Proc
Computer Systems
(1992), pp. 650-657.
DOBBIE, G., AND TOPOR, R. On the declarative and procedural semantics of deductive objectoriented systems. Journal of Intelligent Information Systems 4 (1995), 193-219.
[16]
Programming (1988), pp. 562-579.
Logic
[17]
Abductive planning with event calculus. In Proc. 5th International Conference on
ESHGHi, K.
EsHGHi, K., AND KowALSKi, R. A. Abduction compared with negation by
failure.
In Proc. 6th
International Conference on Logic Programming (Lisbon, 1989), pp. 234-255.
[18]
Faquhar,
A.,
Dappert,
context logic. In
A., Fikes, R.,
and Pratt, W.
Integrating information sources using
AAAI-95 Spring Symposium on Information Gathering from
Distributed Hetero-
geneous Environments (1995). To appear.
[19]
Fruhwirt,
Garcia-Molina,
MAN,
J.,
H.,
p. Terminological reasoning with constraint handUng rules. In
Papakonstantinou, Y., Quas.s,
and Widom,
NGITS
Proc
J.
The TSIMMIS approach
GOH, C.
[22]
A., Sagiv, Y., Ull-
to mediation: data models and languages. In
H.,
Madnick,
S. E.,
Israel,
June 27-29
and Siegel, M. D. Context interchange: overcoming the challenges
dynamic environment.
International Conference on Information and Knowledge
1
Rajaraman,
(Next Generation Information Technologies and Systems) (Naharia,
of large-scale interoperable databcise systems in a
Dec
D.,
To appear.
1995).
[21]
and Hanschke,
Workshop on Principles and Practice of Constraint Programming (1993).
Proc.
[20]
T.,
In Proceedings of the Third
Management (Gaithersburg, MD, Nov 29-
1994), pp. 337-346.
Gruber, T. R. The
Principles of
role of
common
ontology in achieving sharable, reusable knowledge bases. In
Knowledge Representation and Reasoning: Proceedings of
ference (Cambridge,
MA,
1991),
J.
A. Allen, R. Files, and
the
2nd International Con-
E. Sandewall, Eds.,
Morgan Kaufmann,
pp. 601 602.
[23]
GuHA, R. V.
Thesis,
[24]
Department of Computer Science, Stanford University, 1991.
Jaffar,
to
J.,
AND Maher, M.
Constraint logic programming:
A
survey. Journal of Logic Program-
appear (1996).
Jonker, W., and ScHiJTZ, H. The
p.
[27]
STAN-CS-9 1-1399-
3 (Sep 1987), 229-257.
ming
[26]
Tech. Rep.
Imielinski, T. Intelligent query answering in rule based systems. Journal of Logic Programming
4,
[25]
Contexts: a formalization and some applications.
ECRC
multidatabase system. In Proc
ACM SIGMOD
(1995),
490.
Kakas, A. C, KowALSKi, R.
and Computation
2,
A.,
and Toni,
F. Abductive logic programming. Journal of Logic
6 (1993), 719-770.
47
[28]
[29]
KlFER, M., Lausen, G., and
Wu,
J.
Logical foundations of object-oriented and frame-based
languages.
JACM
KuHN,
AND LuDWiG, T. VIP-MDBMS: A
E.,
4 (1995), 741 843.
logic multidatabase system. In Proc Fnt'l
Symp. on
Databases in Parallel and Distributed Systems (1988).
[30]
Landers, T., and Rosenberg, R. An overview
Symposium for Distributed Databases
[31]
Lenat, D.
B.,
and Guha, R.
of Multibase. In Proceedings
2nd International
(1982), pp. 153-183.
Building large knowledge-based systems: representation and infer-
ence in the Cyc project. Addison- Wesley Publishing Co., Inc., 1989.
[32]
W. 0*SQL: A
LiTWiN,
of the
language for object oriented multidatabase interoperability. In Proceedings
IFIP WG2.6 Database Semantics Conference on
(Lome, Victoria, Australis, Nov 1992), D. K. Hsiao, E.
Interoperable Database Systems (DS-5)
J.
Neuhold, and R. Sacks-Davis, Eds.,
North-Holland, pp. 119-138.
[33]
Litwin, W., and Abdellatif, A.
IEEE
MDSL.
Proceedings of the
[34]
Lloyd,
J.
[35]
Lu, H., Ooi, B.-C, AND GOH, C. H.
W.
5 (1987), 621-632.
-75,
logic
of the multi-database manipulation language
programming, 2nd, extended ed. Springer- Verlag, 1987.
On
global multidatabase query optimization.
ACM SIGMOD
4(1992), 6-11.
fiecorrffO,
[36]
Foundations of
An overview
Malone, T. W., and Rockart,
J.
F.
Computers, networks, and the corporation.
Scientific
American 265, 3 (September 1991), 128-136.
[37]
McCarthy,
J.
Generality in
J.,
and Hayes,
artificial intelligence.
Communications of
the
ACM
30, 12 (1987),
1030-1035.
[38]
McCarthy,
P. J.
Some
philosophical problems from the standpoint of AI. In
Readings in Nonmonotonic Reasoning, M. Ginsberg, Ed. Morgan Kaufrnann, 1987.
[39]
Motro,
a.,
and Yuan, Q. Querying database
knowledge. In Proceedings of the
ACM SIGMOD
Conference (1990), pp. 173-183.
[40]
MuMiCK,
In Proc.
[41]
I.
S.,
and Pirahesh, H. Implementation
ACM SIGMOD
(Minneapolis,
MN, May
Papakonstantinou, Y., Garcia-Molina,
geneous information sources. In Proc
IEEE
H.,
of magic-sets in a relational database system.
1994), pp. 103-114.
and Widom,
J.
Object exchange across hetero-
International Conference on Data Engineering (March
1995).
[42]
Quass, D., Rajaraman, a., Sagiv, Y., Ullman,
heterogeneous information.
J.,
and Widom,
In Proc International Conference
Databases (1995).
48
J.
Querying semistructured
on Deductive and Object-Oriented
[43]
SciORE, E., SlEGEL, M., AND ROSENTHAL, A.
Context interchange using meta-attributes.
Proceedings of the 1st International Conference on Information and Knowledge
Management
In
(1992),
pp. 377-386.
[44]
SciORE, E., SiEGEL, M., AND ROSENTHAL, A. Using Semantic values to
among heterogeneous information
systems.
ACM
facilitate interoperability
Transactions on Database Systems
19,
2 (June
1994), 254-290.
[45]
Seshadri, p., Hellerstein,
VASTAVA, D., Stuckey, P.
M., Pirahesh, H., Leung, T.
J.
J.,
AND SUDARSHAN,
ACM SIGMOD
and implementation. In Proc.
C, Ramakrishnan,
R., Sri-
S. Cost-based optimization for magic: algebra
(Montreal, Quebec, Canada, June 1996), pp. 435-
446.
[46]
Shanahan, M.
Prediction
deduction but explanation
is
is
abduction. In Proc. of the IJCAI-89
(1989), pp. 1055-1060.
[47]
Sheth, a.
p.,
and Larson,
J.
A. Federated database systems
geneous, and autonomous databases.
[48]
SiEGEL, M.,
AND Madnick,
S.
ACM
of the 17th International Conference on Very Large
[49]
Templeton, M., Brill,
Gregor, R. Mermaid
IEEE
[50]
75, 5 (1987),
TOMASic,
of
DISCO.
A.,
D.,
Dao,
S. K.,
Lund,
managing
distributed, hetero-
22, 3 (1990), 183-236.
Computing Surveys
A metadata approach
for
to solving semantic conflicts. In Proceedings
Data Bases (1991), pp. 133-145.
E.,
Ward,
P.,
Chen, A.
— a front end to distributed heterogeneous databases.
L. P.,
and Mac-
Proceedings of the
695-708.
Raschid,
L.,
and Valduriez,
p. Scaling heterogeneous databases and the design
In Proceedings of the 16th International Conference
on Distributed Computing Systems
(Hong Kong, 1995).
[51]
Ullman,
J.
Principles of database
Computer Science
Press,
a result of domain evolution.
ACM
and knowledge-base systems, Vol
II.
1991.
[52]
[53]
Ventrone,
v.,
SIGMOD
Record
Wetzel,
G.,
20,
JICSLP
R.,
Semantic heterogeneity
cis
and Toni,
F.
Procalog:
Programming with
constraints and ab-
Poster session, Sept. 1996.
Wiederhold, G. Mediators
25, 3
S.
4 (Dec 1991), 16-20.
Kowalski,
ducibles in logic.
[54]
and Heiler,
in the architecture of future information systems.
(March 1992), 38-49.
Soon
49
~-
ri r^
IEEE Computer
Date Due
Lib-26-67
MIT LIBRARIES
3 9080 01428 1940
Download