WHERE DOES THE DATA COME FROM:
MANAGING DATA INTEGRATION WITH
SOURCE TAGGING CAPABILITIES

Y. Richard Wang
Stuart E. Madnick

August 1990

CISR WP No. 214
Sloan WP No. 3191-90

Center for Information Systems Research
Massachusetts Institute of Technology
Sloan School of Management
77 Massachusetts Avenue
Cambridge, Massachusetts, 02139
© 1990 Y.R. Wang, S.E. Madnick

An earlier version of this paper entitled, "A Source Tagging Theory for Heterogeneous Database Systems," has been accepted for presentation and publication at the 11th International Conference on Information Systems, 1990.
1. Introduction
1.1 Source Tagging Example
1.2 Research Issues and Goals
1.3 Research Background and Assumptions
2. The Polygen Model
2.1 The Polygen Algebra
3. Polygen Query Translation
4. Example Source Tagging in the PQP
5. The Necessary and Sufficient Condition of Source Tagging
6. Summary and Conclusions
Appendix: The Operations that Generate Table 5
References

ACKNOWLEDGEMENTS

Work reported herein has been supported, in part, by MIT's International Financial Service Research Center, MIT's Center for Information Systems Research, and MIT's IXS Digital Library Project.
Where Does the Data Come From:
Managing Data Integration with Source Tagging Capabilities
ABSTRACT

Many important Management Support Systems require access to and seamless integration of multiple heterogeneous database systems. This paper studies heterogeneous database systems from the source perspective. It aims at addressing issues such as the following: (1) Where is the data from? (2) Which intermediate sources were used to arrive at that data? Specifically, it presents a polygen model for resolving these data source and intermediate source problems. The polygen model provides a precise characterization of the source tagging problem and a solution including a polygen algebra, a data-driven query translation mechanism, and the necessary and sufficient condition for source tagging. This model has been developed as a direct extension of the relational model to the multiple database setting with source tagging capabilities, thus it enjoys all of the strengths of the traditional relational model. Source knowledge is important for many reasons. It enables users to apply their own judgment to the credibility of the information. It enables users to rationalize and reconcile data inconsistencies. It enables system designers to develop access charge systems. It enables an application user to adjust data. And it enables a system to interpret data semantics more accurately. In sum, source tagging capabilities should be a required functionality for future heterogeneous database systems.
1. Introduction

The rapidly increasing complexity, interdependence, and competition in the global market has profoundly changed how corporations operate and how they align their information technology for competitive advantage in the marketplace. It has been argued (Madnick, 1989) that improved communications capability and data accessibility will lead to integration of systems both within and across organizational boundaries in the 1990s. This will lead to vastly improved group communications and, more importantly, the integration of business processes across traditional functional, product, and geographic lines. The integration of business processes, in turn, will accelerate demands for more effective Management Support Systems for product development, product delivery, and customer service and management (Rockart & Short, 1989). Increasingly, many important Management Support Systems require access to and seamless integration of multiple heterogeneous database systems. These types of heterogeneous database systems have been referred to as Federated Database Systems (Czejdo, Rusinkiewicz, & Embley, 1987; Elmasri, Larson, & Navathe, 1987; Heimbigner & McLeod, 1985; Lyngbaek & McLeod, 1983), Multidatabases (Ferrier & Strangret, 1982; Litwin & Abdellatif, 1986; Litwin, et al., 1982), or Composite Information Systems (Madnick, Siegel, & Wang, 1990; Wang & Madnick, 1988).
In this paper, we study heterogeneous database systems from the multiple source perspective. In particular, we address the following two issues:

(1) Where is the data from?
(2) Which intermediate data sources were used to arrive at that data?

It is interesting to note that these issues have not been directly addressed to date. Contemporary heterogeneous database systems strive to encapsulate the heterogeneity of the underlying databases in order to produce an illusion that all information originates from a single anonymous source. This illusion has been referred to as location transparency or location independence (Date, 1990).

In our field studies of actual needs, we have found that although users want the simplicity of location transparency for query formulation, they also want to know the source of each piece of data retrieved (e.g., Source: Corporate Customer Database). Most managers, be they responsible for marketing, production, or finance, would not be much concerned about how independent the data is from physical storage or how distributed and heterogeneous the physical database systems are. Their primary concern about data is whether it could facilitate their decision making processes. As such, knowing the source of each piece of data may be important to them for many reasons, for example:
• Source knowledge enables managers to apply their own judgment to the credibility of the information. In our discussions with managers, several exclaimed that data would be totally useless to them unless they know its source.

• Source knowledge enables managers to rationalize and reconcile data inconsistencies. For example, the attribute "Return on Equity" for Reuters Holdings PLC has different values when retrieved from I.P. Sharp's Disclosure database (based in Toronto) compared with Finsbury's Dataline database (based in London). It is likely that different accounting practices in Canada and the United Kingdom would explain the difference in values, thus knowing the countries from which the financial data was compiled would help. Furthermore, since Reuters is a UK-based company, the Dataline database may be more appropriate. In short, knowing the data sources helps managers to reconcile the data inconsistencies as well as make their own judgment.
• Source knowledge enables a production manager to adjust data. In a manufacturing firm that we interviewed, production data was extracted from plants across the country in order to produce production reports. On the days when standard time is switched to daylight saving time and vice versa, the production volume could be less than the regular volume because the aggregates were based on 23 hours instead of the regular 24 hours. Note that this does not necessarily apply to all plants; Arizona, for instance, does not participate in the daylight saving time program. With source knowledge, the production data due to time zone differences could be adjusted appropriately.
• Source knowledge enables system designers to develop access charge systems. In a major financial institution, analysts have access to multiple external commercial databases. With data source knowledge, system designers could develop systems needed for internal charge back schemes. For example, different charges could be associated with data actually returned to the user versus intermediate data used in the query process.
• Source knowledge enables a system to interpret data semantics more accurately and completely. For example, it is typical for country-specific databases to omit explicit indication of currency used. Thus, a product may have a price of 2416.95. Is it in U.S. dollars? Japanese yen? Or U.K. pounds? Knowing the data source can often provide the necessary clarification. In this case, if the data source is Japan, then the system can assume that the currency is in yen unless it is an exception. This information can be used in conjunction with other rules to determine the data semantics correctly.
Indeed, it has been suggested* that knowing the data source is so important that it should be a required feature for heterogeneous database systems. Providing source tagging capabilities for heterogeneous database systems requires an understanding of the constraints involved both organizationally and technically.

* One of our technical colleagues has suggested that data source tagging capabilities be added as Date's thirteenth rule for distributed database systems (Date, 1990).

Most organizations must deal with pre-existing information systems which have been developed and administered independently, and are likely to remain so. Many of these systems are controlled by autonomous subsidiaries or even separate corporations (e.g., Dow Jones financial services) that are reluctant or unwilling to change their systems. This implies that one should not require data to be augmented in a pre-existing information system in order to allow for data integration. We resolve this problem by tagging data source after the data has been retrieved from a local database.

Technically, it is important to develop source tagging capabilities based on previous database research results. It would enable us to enjoy all of the strengths of the conventional database systems, such as the capabilities to allow for data sharing by multiple users concurrently and to remove many file handling details from the concern of application programmers. In order to accomplish this, we have to understand the trade-offs of different data models, the mechanisms used in these models to perform data definition and data manipulation, and develop a new algebra and a query processing mechanism for facilitating source tagging capabilities.
Our research contributions can be summarized as follows:

(1) We have developed a polygen model* to study heterogeneous database systems from the multiple (poly) source (gen) perspective. The polygen model provides a precise characterization of the source tagging problem and a solution including a polygen algebra, a data-driven query translation mechanism, and the necessary and sufficient condition for source tagging. A concrete example is also provided to illustrate the basic mechanism.

(2) We have developed the polygen model as a direct extension of the relational model to the multiple database setting with source tagging capabilities, thus the polygen model enjoys all of the strengths of the traditional relational model.

(3) We have established a theoretical foundation for resolving many other critical research issues. For example, the polygen algebra can be extended to address other basic attributes associated with data, such as the temporal aspect of data. Users normally want to know not only where the data is from but also when the data was collected and how it was collected. Furthermore, as we motivated earlier, knowing the data source will enable a user or a query processor to interpret the data semantics more accurately; knowing data source credibility will enable the user or the query processor to further resolve potential conflicts amongst the data retrieved from different sources; and knowing data access cost will enable system designers to develop access charge systems.

* To highlight the source tagging problems, the phrase "polygen model" will be used in the paper instead of the conventional "global model." By the same token, "polygen query" will be used instead of "global query," and so on, and so forth.
1.1 SOURCE TAGGING EXAMPLE

In preparing a special report on the top ten graduate programs in Information Systems, a member of the ComputerWorld staff called one of the schools to get the names of CEO's who graduated from the school with an MBA degree. Suppose that the following SQL polygen query

    SELECT ONAME, CEO
    FROM PORGANIZATION, PALUMNUS
    WHERE CEO = ANAME AND DEGREE = "MBA"

was created given a polygen schema derived from the Alumni Database and Company Database below. For expository purposes, the prefix "P" is used to denote a polygen scheme in the polygen schema.

[Polygen Schema: tables of the Alumni Database and the Company Database]

[...] order to select those CEOs who received an MBA degree. Moreover, the query processor needs to "know" that it has to merge the BUSINESS and the FIRM relations first before joining the CEO attribute with the ANAME attribute. As such, the challenge is to develop not only a polygen model but also a polygen algebra and the algorithms for a polygen query processor capable of resolving the data and intermediate source tagging problems for any arbitrary polygen query. Tagging the Company Database name accurately to the result is referred to as the Data Source Tagging problem. Tagging the intermediate use of the Alumni Database accurately is referred to as the Intermediate Source Tagging problem.
1.2 RESEARCH ISSUES AND GOALS

We have reviewed a broad range of literature and examined various research prototypes of heterogeneous distributed database systems. The systems that we studied (Gupta, 1989; Wang & Madnick, 1988) included MULTIBASE in the United States (Smith, et al., 1981), PRECI* in England (Deen, Amin, & C, 1987; Deen, Amin, & Taylor, 1987), and MRDSM in France† (Litwin & Abdellatif, 1986; Litwin, et al., 1982). In addition, we have surveyed more than forty U.S. commercial systems offering partial solutions to the heterogeneous distributed database problem, including Data Integration's MERMAID, Cincom's SUPRA, Metaphor's DIS, and TRW's Data Integration Engine (Gupta, et al., 1989). To the best of our knowledge, none of these systems have dealt with these source tagging problems.

† In MRDSM, an administrator may define for any collection of databases a collective name called a multidatabase name. For instance, the databases Michelin, Kleber, and Gault_M may collectively get the name Rest_guides. However, the focus of such names is to simplify the expression of some commands; otherwise, these commands may require an enumeration of the corresponding databases.

Two related issues, among others, need to be addressed in source tagging: (1) What kind of polygen model should be created in order to tag multiple sources explicitly? (2) What is the relationship between the polygen model and the polygen query processing facility?

Most heterogeneous distributed database systems adopt one of the following four data models (Hull & King, 1987; Peckham & Maryanski, 1988): the Relational Model, the Functional Data Model, the Semantic Database Model, or the Entity Relationship Model. Each data model has merits for its intended purposes. For example, both the Functional Data Model and the Semantic Database Model are rich in semantics. The Entity Relationship Model is also rich in semantics and is widely accepted as the leading database design tool. The relational model lends itself to a simple structure and an elegant theoretical foundation, and is implemented in operational systems. In addition, Relational Data Base Management Systems dominate the database market today. Moreover, the relational model has been extended (Codd, 1979) to capture semantics such as generalization and aggregation. In order to consider both of the rigorous and pragmatic aspects, we selected the relational model. Based on the relational model, we define, in this paper, a polygen model for resolving the data and intermediate source tagging problems.
One of the key activities in formulating composite information is to translate a polygen query into a set of local queries, which in turn are routed to the corresponding local databases. Query translation has been approached through view definition in most heterogeneous distributed database systems (Breitbart, Olson, & Thompson, 1986; Brill, Templeton, & Yu, 1984; Dayal & Hwang, 1984; Deen, Amin, & C, 1987; Deen, Amin, & Taylor, 1987; Ferrier & Strangret, 1982; Katz & Goodman, 1981; Litwin & Abdellatif, 1986; Litwin, et al., 1982; Templeton, et al., 1983). A symbolic query transformation technique has also been proposed (Rusinkiewicz & Czejdo, 1985; Rusinkiewicz, et al., 1988) in which a syntax-directed parser converts a polygen query and transformation rules* into multiway trees. Through subtree matching, these multiway trees are further translated into local queries, given the specific source and target language syntax descriptions.

* Each transformation rule contains a source part and a target part. For example,
    Source: SELECT attribute-1 FROM relation-1 WHERE condition;
    Target: Projection ((attribute-1), Selection (condition, (relation-1));

As we will discuss later, our query translation mechanism differs from the above mentioned techniques in two important aspects: (1) Instead of the view definition approach which encodes the procedure for translating a polygen query into the corresponding local queries, our mechanism separates the mapping algorithm from the mapping data. As a result, adding a new database to the existing system does not require modifying the existing procedural view definitions. (2) Instead of the symbolic query transformation technique which tackles a broad range of nodal query languages at a higher level, our mechanism focuses on the mapping between a polygen algebraic expression and the corresponding local operations, permitting entities (and attributes) in local databases to overlap one another.

1.3 RESEARCH BACKGROUND AND ASSUMPTIONS
We have developed a heterogeneous database system which currently has access to three internal databases (the Alumni Database, the Placement Database, and the Student Database) and three external commercial databases (Finsbury's Dataline and I.P. Sharp's Disclosure and Currency). The query processor architecture is depicted in Figure 1. Briefly, the Application Query Processor translates an end-user query into a polygen query for the Polygen Query Processor (PQP) based on the user's application schema. The word "polygen" is used here to signify that the query processor is equipped with source tagging capabilities. The PQP in turn translates the polygen query into a set of local queries based on the corresponding polygen schema, and routes them to the Local Query Processors (LQP). The details of the mapping and communication mechanisms between an LQP and its local databases are encapsulated in the LQP. To the PQP, each LQP behaves as a local relational system. Upon return from the LQPs, the retrieved data are further processed by the PQP in order to produce the desired composite information.
Many critical problems need to be resolved in order to provide a seamless solution to the end-user. These problems include source tagging, query translation, schema integration (Batini, Lenzerini, & Navathe, 1986; Elmasri, Larson, & Navathe, 1987), inter-database instance matching (Wang & Madnick, 1989b), domain mapping (DeMichiel, 1989; Shin, 1988), and semantic reconciliation (Wang & Madnick, 1989a). We focus on the first two problems and make the following assumptions in this paper:

• The data source is tagged after the data has been retrieved from a local database.

• The local schemata and the polygen schema are all based on the relational model.

• Schema integration has been performed, and the attribute mapping information is stored in the polygen schema.

• The inter-database instance identifier mismatching problem (e.g., IBM vs. I.B.M or social security identification number vs. employee identification number) has been resolved and the information is available for the PQP to use. A discussion of this issue has been presented elsewhere (Wang & Madnick, 1989b).

• The domain mismatch problem such as unit ($ vs. ¥), scale (in billions vs. in millions), and description interpretation ("expensive" vs. "$$$", "Chinese Cuisine" vs. "Hunan or Cantonese") has been resolved during schema integration and the information is also available to the PQP.

[Figure 1: The Query Processor Architecture. Labels: Application Query, Composite Answer, Application Schema, Metadata Dictionary, DBMS, Query, Result]

Section 2 defines the polygen model. Polygen query translation is presented in Section 3. Section 4 provides a detailed example of the basic polygen query processing mechanism. The necessary and sufficient condition of source tagging is presented in Section 5. Finally, concluding remarks are made in Section 6.
2. The Polygen Model

To present the polygen model more concretely, we first exemplify the attribute mapping relationships between the polygen schema and their corresponding local schemata in the form {(database, relation, attribute), ...} for the source tagging example described in Section 1.

The PORGANIZATION Polygen Scheme

    ONAME          {(AD, BUSINESS, BNAME), (CD, FIRM, FNAME)}
    INDUSTRY       {(AD, BUSINESS, IND)}
    CEO            {(CD, FIRM, CEO)}
    HEADQUARTERS   {(CD, FIRM, HQ)}

The polygen schema in this example is the set of polygen schemes {PORGANIZATION, PFINANCE, PALUMNUS, PCAREER}.

A polygen domain is defined as a set of ordered triplets. Each triplet consists of three elements: The first is a datum drawn from a simple domain in an LQP. The second is a set of LDs denoting the local databases from which the datum originates. The third is a set of LDs denoting the intermediate local databases whose data led to the selection of the datum.

A polygen relation p of degree n is a finite set of time-varying n-tuples, each n-tuple having the same set of attributes drawing values from the corresponding polygen domains. A cell in a polygen relation is an ordered triplet c = (c<d>, c<o>, c<i>) where c<d> denotes the datum portion, c<o> the originating portion, and c<i> the intermediate source portion. Two polygen relations are union-compatible if their corresponding attributes are defined on the same polygen domain.

Note that P contains the mapping information between a polygen scheme and the corresponding local relational schemes. In contrast, p contains the actual time-varying data and their originating sources. A polygen scheme P and a polygen relation p may be used synonymously without confusion. The data and intermediate source tags for p are updated along the way as polygen algebraic operations are performed.
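The cell triplet and the attribute mapping described above can be pictured as plain data structures. The following is only a minimal sketch under stated assumptions: the names `Cell` and `tag_local_rows` and the sample row are hypothetical and do not come from the paper; only the PORGANIZATION mapping is copied from the text. It tags each datum with its originating database after retrieval, as the paper assumes, and starts with an empty intermediate source set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One polygen cell c = (c<d>, c<o>, c<i>). Hypothetical helper type."""
    datum: object
    origins: frozenset = frozenset()        # c<o>: local databases the datum came from
    intermediates: frozenset = frozenset()  # c<i>: local databases that led to its selection


# Attribute mapping of the PORGANIZATION polygen scheme, copied from the text:
# polygen attribute -> {(database, relation, attribute), ...}
PORGANIZATION = {
    "ONAME": {("AD", "BUSINESS", "BNAME"), ("CD", "FIRM", "FNAME")},
    "INDUSTRY": {("AD", "BUSINESS", "IND")},
    "CEO": {("CD", "FIRM", "CEO")},
    "HEADQUARTERS": {("CD", "FIRM", "HQ")},
}


def tag_local_rows(database, relation, rows, scheme):
    """Turn plain rows retrieved from one local relation into polygen tuples."""
    # local attribute name -> polygen attribute name for this (database, relation)
    local_to_polygen = {
        local_attr: polygen_attr
        for polygen_attr, sources in scheme.items()
        for (db, rel, local_attr) in sources
        if db == database and rel == relation
    }
    return [
        {
            local_to_polygen[attr]: Cell(value, frozenset({database}))
            for attr, value in row.items()
            if attr in local_to_polygen
        }
        for row in rows
    ]


# Made-up sample row standing in for data retrieved from the Company Database.
firm_rows = [{"FNAME": "Acme", "CEO": "Pat Lee", "HQ": "Boston"}]
polygen_rows = tag_local_rows("CD", "FIRM", firm_rows, PORGANIZATION)
print(polygen_rows[0]["ONAME"])
```

Keeping the mapping as data rather than code mirrors the paper's stated aim of separating the mapping algorithm from the mapping data.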
2.1 THE POLYGEN ALGEBRA

Let attrs(p) denote the set of attributes in p. For each tuple t in a polygen relation p, let t<d> denote the data portion, t<o> the originating source portion, and t<i> the intermediate source portion. If x ∈ attrs(p), X = {x1, ..., xi, ..., xj} is a sublist of attrs(p), then let p(x) be the column in p corresponding to attribute x, let p(X) be the columns in p corresponding to the sublist of attributes X, let t(x) be the cell in t corresponding to attribute x, and let t(X) be the cells in t corresponding to the sublist of attributes X. As such, p(x)<o> denotes the originating source portion of the column corresponding to attribute x in polygen relation p while t(X)<i> denotes the intermediate source portion of the cells corresponding to the sublist of attributes X in tuple t. On the other hand, p(x) denotes the column corresponding to attribute x in polygen relation p inclusive of the data, originating source, and intermediate source portions while t(X) denotes the cells corresponding to the sublist of attributes X in tuple t inclusive of the data, originating source, and intermediate source portions. Note that the "( )" notation in project p(X) should not be confused with the operation p(x = y).

It has been shown (Maier, 1983) that in a conventional relational system, a relational algebra can be defined through five orthogonal algebraic operators. Here we define the five orthogonal algebraic operators in the context of our polygen model:
Project. If p is a polygen relation, and X = {x1, ..., xi, ..., xj} is a sublist of attrs(p), then

\[
\begin{aligned}
p(X) = \{\, t' \mid\ & t' = t(X) \ \text{if } t \in p \wedge t(X)\langle d\rangle \text{ is unique};\\
& t'\langle d\rangle = t_i(X)\langle d\rangle,\ 
t'(x_j)\langle o\rangle = t_i(x_j)\langle o\rangle \cup \dots \cup t_k(x_j)\langle o\rangle,\\
& t'(x_j)\langle i\rangle = t_i(x_j)\langle i\rangle \cup \dots \cup t_k(x_j)\langle i\rangle \ \ \forall x_j \in X,\\
& \text{if } t_i, \dots, t_k \in p \wedge t_i(X)\langle d\rangle = \dots = t_k(X)\langle d\rangle \,\}.
\end{aligned}
\]

The above expression specifies that if the data portion of a projected tuple is unique, then the originating source portion and the intermediate source portion are identical to those before the projection. This is correct because in this case, only one piece of data is involved. On the other hand, if k tuples have the same projected data portion, then the project operator will take any one of the data portions of these tuples to be the projected data portion (since they are the same), and take the union of the originating source portions as the originating source portion for each of the cells in the projected relation. By the same token, the project operator will take the union of the intermediate source portions as the new intermediate source portion for each of the cells in the projected relation. This is also correct because the projected data have been drawn from all the originating sources and have been derived with the involvement of all the intermediate sources.

Cartesian product. If p1 and p2 are two polygen relations, then

\[
(p_1 \times p_2) = \{\, t_1 \circ t_2 \mid t_1 \in p_1 \text{ and } t_2 \in p_2, \text{ where } \circ \text{ denotes concatenation} \,\}.
\]

The above expression specifies that each tuple in p1 is concatenated with every tuple in p2 following the definition of the Cartesian product. Since no data items are merged in this case, the originating source and intermediate source portions remain the same.

Restrict. If p is a polygen relation, x ∈ attrs(p), y ∈ attrs(p), and θ is a binary relation, then

\[
\begin{aligned}
p(x\ \theta\ y) = \{\, t' \mid\ & t'\langle d\rangle = t\langle d\rangle,\ t'\langle o\rangle = t\langle o\rangle,\\
& t'(w)\langle i\rangle = t(w)\langle i\rangle \cup t(x)\langle o\rangle \cup t(y)\langle o\rangle \ \ \forall w \in \mathrm{attrs}(p),\\
& \text{if } t \in p \wedge t(x)\langle d\rangle\ \theta\ t(y)\langle d\rangle \,\}.
\end{aligned}
\]

The above expression specifies that for each of the tuples in p that satisfies the θ relation, the restrict operator will update the intermediate source portion to include the originating sources of t(x) and t(y) because they are used to produce the new polygen relation. The originating source portion is not affected because the data portion is unique (we assume that duplicate tuples are treated as separate tuples). Since Select and Join are defined through Restrict, they also update t<i>.

Union. If p1 and p2 are two polygen relations and both have degree n, t1 ∈ p1, t2 ∈ p2, then

\[
\begin{aligned}
(p_1 \cup p_2) = \{\, t' \mid\ & t' = t_1 \ \text{if } t_1\langle d\rangle \in p_1 \wedge t_1\langle d\rangle \notin p_2;\\
& t' = t_2 \ \text{if } t_2\langle d\rangle \notin p_1 \wedge t_2\langle d\rangle \in p_2;\\
& t'\langle d\rangle = t_1\langle d\rangle,\ t'\langle o\rangle = t_1\langle o\rangle \cup t_2\langle o\rangle,\ t'\langle i\rangle = t_1\langle i\rangle \cup t_2\langle i\rangle \ \text{if } t_1\langle d\rangle = t_2\langle d\rangle \,\}.
\end{aligned}
\]

The above expression specifies that for the tuples that exist in only one polygen relation, the union operator will copy it over to be the new tuple. On the other hand, if the data portions of two tuples, t1<d> and t2<d>, are identical, then the operator will copy the data portion over to the data portion of the new tuple, and take the union of the originating sources to be the originating sources for each of the cells in the new tuple. By the same token, the operator will take the union of the intermediate source portions as the new intermediate source portion for each of the cells in the new tuple.
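The tag-merging behavior of Project and Union can be illustrated with a short sketch. It is not the paper's implementation: cells are modeled as plain `(datum, origins, intermediates)` triples, tuples as dictionaries keyed by attribute name, and the helper names (`cell`, `data_key`, `merge`) and sample data are assumptions made here for illustration.

```python
def cell(datum, origins=(), intermediates=()):
    """Build a polygen cell (datum, originating sources, intermediate sources)."""
    return (datum, frozenset(origins), frozenset(intermediates))


def data_key(t, attrs=None):
    """Data portion of tuple t over attrs (all attributes, sorted, by default)."""
    attrs = sorted(t) if attrs is None else attrs
    return tuple(t[x][0] for x in attrs)


def merge(t1, t2):
    """Merge two tuples with equal data portions by taking the union of their tags."""
    return {x: (t1[x][0], t1[x][1] | t2[x][1], t1[x][2] | t2[x][2]) for x in t1}


def project(p, attrs):
    """p(X): keep attrs; tuples whose projected data coincide are merged."""
    merged = {}
    for t in p:
        projected = {x: t[x] for x in attrs}
        key = data_key(projected, attrs)
        merged[key] = merge(merged[key], projected) if key in merged else projected
    return list(merged.values())


def union(p1, p2):
    """p1 U p2 for union-compatible relations that share attribute names."""
    merged = {data_key(t): dict(t) for t in p1}
    for t in p2:
        key = data_key(t)
        merged[key] = merge(merged[key], t) if key in merged else dict(t)
    return list(merged.values())


# Made-up data: the same organization name is known to both AD and CD.
p = [
    {"ONAME": cell("Acme", {"AD"}), "INDUSTRY": cell("Tools", {"AD"})},
    {"ONAME": cell("Acme", {"CD"}), "INDUSTRY": cell("Hardware", {"CD"})},
]
names = project(p, ["ONAME"])

p1 = [{"ONAME": cell("Acme", {"AD"})}]
p2 = [{"ONAME": cell("Acme", {"CD"}, {"AD"})}, {"ONAME": cell("Zenith", {"CD"})}]
both = union(p1, p2)
print(names)
print(both)
```

In both operators a datum that appears more than once survives as a single cell whose originating and intermediate source sets are the unions of the contributing cells' sets.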
Difference. Let p<o> denote the union of all the t<o> sets in p, and p<i> denote the union of all the t<i> sets in p. If p1 and p2 are two polygen relations and both have degree n, then

\[
\begin{aligned}
(p_1 - p_2) = \{\, t' \mid\ & t'\langle d\rangle = t\langle d\rangle,\ t'\langle o\rangle = t\langle o\rangle,\\
& t'(w)\langle i\rangle = t(w)\langle i\rangle \cup p_2\langle o\rangle \cup p_2\langle i\rangle \ \ \forall w \in \mathrm{attrs}(p_1),\\
& \text{if } t \in p_1 \wedge t\langle d\rangle \notin p_2 \,\}.
\end{aligned}
\]

Difference selects a tuple in p1 to be a tuple in (p1 - p2) if the data portion of the tuple in p1 is not identical to those of the tuples in p2. Since each tuple in p1 needs to be compared with all the tuples in p2, it follows that all the originating sources of the data in p2 should be included in the intermediate source set of (p1 - p2), as t'<i> = t<i> ∪ p2<o> ∪ p2<i> denotes.

Other traditional operators can be defined in terms of the above five operators. The most common are Join, Select, and Intersection. Join and Select are defined as the restriction of a Cartesian product. Intersection is defined as the project of a join over all the attributes in each of the relations involved in the Intersection.
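A minimal sketch of the Difference operator follows, under the same illustrative assumptions as before: cells are `(datum, origins, intermediates)` triples, tuples are dictionaries, and the database abbreviations PD and SD as well as the sample names are made up. It shows how every source behind p2 ends up in the intermediate tags of the surviving tuples of p1.

```python
def cell(datum, origins=(), intermediates=()):
    """Build a polygen cell (datum, originating sources, intermediate sources)."""
    return (datum, frozenset(origins), frozenset(intermediates))


def data_key(t):
    """Data portion of a tuple, with attributes taken in sorted order."""
    return tuple(t[x][0] for x in sorted(t))


def difference(p1, p2):
    """p1 - p2 for union-compatible relations that share attribute names."""
    # p2<o> and p2<i>: unions of all originating / intermediate source sets in p2
    p2_origins = frozenset().union(*(c[1] for t in p2 for c in t.values()))
    p2_intermediates = frozenset().union(*(c[2] for t in p2 for c in t.values()))
    p2_data = {data_key(t) for t in p2}
    result = []
    for t in p1:
        if data_key(t) not in p2_data:
            # data and origins are kept; intermediates absorb everything behind p2
            result.append({
                x: (d, o, i | p2_origins | p2_intermediates)
                for x, (d, o, i) in t.items()
            })
    return result


p1 = [{"NAME": cell("Ann", {"AD"})}, {"NAME": cell("Bob", {"AD"})}]
p2 = [{"NAME": cell("Bob", {"PD"}, {"SD"})}]
remaining = difference(p1, p2)
print(remaining)
```

Even though "Ann" never appears in p2, her tuple is tagged with PD and SD as intermediate sources, because p2 had to be consulted to decide that she stays in the result.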
In order to process a polygen query, we also need to introduce the following new operators to the polygen model: Retrieve, Coalesce, Outer Natural Primary Join, Outer Natural Total Join, and Merge.

A local database relation needs to be retrieved from a local database to the PQP first before it is considered as a PQP base relation. This is required in the polygen model because a polygen operation may require data from multiple local databases. Although a PQP base relation can be materialized dynamically like a view in the conventional database system, for conceptual purposes, we define it to reside physically in the PQP.* The Retrieve operation is defined as an LQP Restrict operation without any restricting condition.

* This approach simplifies the Polygen Operation Interpreter, to be presented in Section 3.

Coalesce and Outer Natural Join have been informally introduced by Date to handle a surprising number of practical applications. Coalesce takes two columns as input, and coalesces them into one column. An Outer Natural Join is an outer join with the join attributes coalesced (Date, 1983). We define an Outer Natural Primary Join as an Outer Natural Join on the primary key of a polygen relation. For example, the Outer Natural Primary Join for PORGANIZATION is an Outer Natural Join on ONAME. An Outer Natural Total Join is an Outer Natural Primary Join with all the other polygen attributes in the polygen relation coalesced as well. In the PORGANIZATION example, an Outer Natural Total Join would perform an Outer Natural Primary Join on ONAME followed by a number of Coalesce operations on INDUSTRY, CEO, and HEADQUARTERS. Merge extends Outer Natural Total Join to include more than two polygen relations. It can be shown that the order in which Outer Natural Total Joins are performed over a set of polygen relations in a Merge is immaterial.

Since Coalesce can be used in conjunction with the other polygen algebraic operators to define the Outer Natural Primary Join, Outer Natural Total Join, and Merge, we define Coalesce as the sixth orthogonal primitive of the polygen model.
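Coalescing two columns into one can be sketched as follows. This is an illustration only, following the three cases of the formal Coalesce definition in this section (equal data, missing y, missing x); the triple representation of cells, the use of `None` for nil, and the sample rows are assumptions made here. Inconsistent values raise an error, reflecting the paper's assumption that instance mismatches are resolved beforehand.

```python
NIL = None  # stand-in for a missing (nil) datum, e.g. after an outer join


def cell(datum, origins=(), intermediates=()):
    """Build a polygen cell (datum, originating sources, intermediate sources)."""
    return (datum, frozenset(origins), frozenset(intermediates))


def coalesce(p, x, y, w):
    """Coalesce attributes x and y of relation p into a new attribute w."""
    result = []
    for t in p:
        new = {a: c for a, c in t.items() if a not in (x, y)}  # t'(z) = t(z)
        (xd, xo, xi), (yd, yo, yi) = t[x], t[y]
        if xd == yd:
            new[w] = (xd, xo | yo, xi | yi)   # same datum: union both tag sets
        elif yd is NIL:
            new[w] = (xd, xo, xi)             # only x has data: copy its cell
        elif xd is NIL:
            new[w] = (yd, yo, yi)             # only y has data: copy its cell
        else:
            raise ValueError(f"inconsistent values for {w}: {xd!r} vs {yd!r}")
        result.append(new)
    return result


# Made-up rows shaped like the result of an outer join of BUSINESS and FIRM.
joined = [
    {"BNAME": cell("Acme", {"AD"}), "FNAME": cell("Acme", {"CD"})},
    {"BNAME": cell(NIL), "FNAME": cell("Zenith", {"CD"})},
]
organizations = coalesce(joined, "BNAME", "FNAME", "ONAME")
print(organizations)
```

The first row keeps one copy of "Acme" tagged with both AD and CD, which is the behavior the merge of BUSINESS and FIRM relies on.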
Coalesce
attrs(p)
.
-
Let
© denote
(x, y},
and
w
is
the coalesce operator.
If
p
the coalesced attribute of x
is
a polygen relation, x e attrs(p),
and
y,
then
This approach simplifies the Polygen Operation Interpreter, to
14
y €
b>e
presented in Section
III.
attrs(p), z
=
® y:w)
p(x
(f
I
=
f(z)=t(z), f(w)<d>=t(x)(d>, t'(w)<o> =t(xKo) ut(y)(o>
f(z)=t(z), f(wKd>=t(x)<d>, f(w)<o> =t(x)<o)
f(z)=t(z), t'(w)<d>=t(yKd>, f(w)<o> =t(y)(o>
The above expression
attribute called w.
of coalesce, they
must have
new
tuple.
source portions as the
coalesced
&
new
may be
,
specifies that attribute x
new
same
the
t'(wKi> =t(y)<i>
and
and take the union of the originating sources
By the same token, the operator
intermediate source portion of the
tuple.
Note
inconsistent.
does not
is
assumed
Section 4 to
it is
is
compose information with
know
w^ill
to
(and by the definition
copy the data portion of
be the originating sources of
union of the intermediate
new
tuple. For those tuples that
cell in
the
exist, the of)erator will
is
copy the
and
cell
environment, the data values
that inter-database instance
data source tags
new
with
to
be
mismatching problems (Wang
performed.
will
be used in
intermediate source tags. In order to
do
that,
This
presented below.
For the
Polygen Query Translation
SQL polygen query
SQL polygen query
is
= "MBA")
In this expression, those
The
result
is
presented in Section
a
corresponding polygen algebraic expression
(ANAME =CEO) PORGANIZATION (ONAME, CEO)
alumni with an
joined with the
followed by a projection on
1,
as follows:
PALUMNUS (DEGREE
relation.
exist
the process of translating a polygen query into a query execution plan.
3.
for the
t(x)<d)=nil}.
if
have presented the polygen model and the polygen algebra. The algebra
necessary to
process
t(x)<d>=t(y)(d);
will take the
that in a heterogeneous distributed
It
if
attribute y will be coalesced into a
Madnick, 1989b) will be resolved before the coalesce operation
We
,
,
t(yKd)=niI;
if
,
value), then the coalesce operator
either the data portion of attribute x or attribute y
data over to the
f(w)<i) =t(x)(i)
,
both of the data portion of attribute x and attribute y
If
the cell over to attribute w,
the cell in the
f(w)<i) =t{x)<i> ut(y)<i>
,
MBA
degree are selected from the
PORGANIZATION
ONAME and CEO.
15
relation
where an alumnus
PALUMNUS
is
also a
CEO,
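The three cases of the coalesce definition in Section 2.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Cell` class, the use of `None` for nil, the attribute names BNAME/FNAME/ONAME as coalesce inputs and output, and the sample value "Acme" are choices made for the example, not part of the polygen model's specification.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Cell:
    d: Optional[str]                  # data portion; None stands in for nil (assumption)
    o: FrozenSet[str] = frozenset()   # originating source tags
    i: FrozenSet[str] = frozenset()   # intermediate source tags


Row = Dict[str, Cell]


def coalesce(p: List[Row], x: str, y: str, w: str) -> List[Row]:
    """Coalesce attributes x and y of polygen relation p into a new attribute w."""
    result = []
    for t in p:
        cx, cy = t[x], t[y]
        if cx.d == cy.d:
            # Data portions agree: keep the data, take the union of both tag sets.
            cw = Cell(cx.d, cx.o | cy.o, cx.i | cy.i)
        elif cy.d is None:
            cw = cx                   # only x exists: copy the x cell
        elif cx.d is None:
            cw = cy                   # only y exists: copy the y cell
        else:
            # The definition assumes instance mismatches were resolved beforehand.
            raise ValueError(f"inconsistent values: {cx.d!r} vs {cy.d!r}")
        new_t = {a: c for a, c in t.items() if a not in (x, y)}
        new_t[w] = cw
        result.append(new_t)
    return result


# Hypothetical outer-join rows; the tags name the Alumni (AD) and Company (CD) databases.
rows = [
    {"BNAME": Cell("Genentech", frozenset({"AD"})),
     "FNAME": Cell("Genentech", frozenset({"CD"}))},
    {"BNAME": Cell("Acme", frozenset({"AD"})),
     "FNAME": Cell(None)},
]
merged = coalesce(rows, "BNAME", "FNAME", "ONAME")
```

In this sketch the first row ends up with the originating sources {AD, CD} on ONAME, while the second row keeps only {AD} because its FNAME cell is nil.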
In general, the PQP takes a polygen algebraic expression as an input and produces a query execution plan for retrieving data from the local databases and formulating composite information. Three components are involved in this process: the Algebraic Analyzer, the Polygen Operation Interpreter, and the Query Optimizer, as shown in Figure 2.

[Figure 2: The Polygen Query Translation Process. Polygen Algebraic Expression -> Algebraic Analyzer -> Polygen Operation Matrix -> Polygen Operation Interpreter -> Intermediate Operation Matrix -> Query Optimizer -> Query Execution Plan]

The Algebraic Analyzer parses a polygen algebraic expression and generates a Polygen Operation Matrix. For example, the Polygen Operation Matrix for the example polygen algebraic expression is presented in Table 1 below. The first row indicates that a Select operation should be performed on the Left-Hand Relation (LHR) PALUMNUS using the θ relation "=" between the Left-Hand Attribute (LHA) DEGREE and the Right-Hand Attribute (RHA) "MBA." In this case, there is no need for a Right-Hand Relation (RHR). The result is denoted by R(1), a Polygen Relation (PR). Details of the Algebraic Analyzer are beyond the scope of this paper.

[Table 1: The Polygen Operation Matrix for the Example Polygen Algebraic Expression. Table body not recoverable.]
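As a rough illustration of what a Polygen Operation Matrix might hold, the sketch below encodes one row per operation using the column abbreviations introduced above (PR, LHR, LHA, θ, RHA, RHR). Only the first row is described in the text; the second and third rows, the `OperationRow` class, and the `describe` helper are assumptions derived from the example algebraic expression, not a reproduction of Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationRow:
    pr: str                 # resulting Polygen Relation, e.g. R(1)
    op: str                 # operation name
    lhr: str                # Left-Hand Relation
    lha: Optional[str]      # Left-Hand Attribute(s)
    theta: Optional[str]    # comparison relation, if any
    rha: Optional[str]      # Right-Hand Attribute or constant
    rhr: Optional[str]      # Right-Hand Relation, if any


# Row 1 follows the text; rows 2 and 3 are assumed from the example expression.
POM = [
    OperationRow("R(1)", "Select", "PALUMNUS", "DEGREE", "=", '"MBA"', None),
    OperationRow("R(2)", "Join", "R(1)", "ANAME", "=", "CEO", "PORGANIZATION"),
    OperationRow("R(3)", "Project", "R(2)", "ONAME, CEO", None, None, None),
]


def describe(row: OperationRow) -> str:
    """Render one matrix row as a readable sentence fragment."""
    text = f"{row.pr} = {row.op} {row.lhr}"
    if row.rhr is not None:
        text += f" with {row.rhr}"
    if row.theta is not None:
        text += f" where {row.lha} {row.theta} {row.rha}"
    elif row.lha is not None:
        text += f" on {row.lha}"
    return text


descriptions = [describe(r) for r in POM]
```

Each row refers to earlier results by name (R(1), R(2)), which is what lets a later pass decide where each operation should execute.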
The input to pass one of the Polygen Operation Interpreter (POI) is a Polygen Operation Matrix, as Table 1 exemplifies, and an empty Intermediate Operation Matrix (IOM). The output from pass one (and input to pass two) is a half-processed Intermediate Operation Matrix, as shown in Table 2. The execution location (EL) of an operation depends on where the data resides. Note that when the execution location is an LQP (e.g., AD in the first row of Table 2), it is also used as the originating source tag for each of the cells, c(o), of the polygen base relation (R(1) in this case).

[Table 2: A Half-Processed IOM Generated by Pass One of the POI Algorithm. Table body not recoverable.]

[Table 3: An Intermediate Operation Matrix. Caption truncated in the source; table body not recoverable.]
The Query Optimizer considers alternative execution plans, factors the differences in speeds into its cost evaluation, and ensures that all queries sent to a local DBMS can be processed there (Dayal, 1983). In this example, the first two rows of Table 3 are routed to the Alumni Database (AD) LQP simultaneously and the third row to the Company Database (CD) LQP. The returned relations are further processed by the PQP in order to produce a composite answer. Note also that the local database systems will most likely have their own high-level query languages, such as SQL, with their own optimization methods. As such, the algebraic expressions could be synthesized before being sent to the corresponding local database systems.
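The routing step described above can be sketched as grouping the rows of an Intermediate Operation Matrix by execution location, then tagging each cell of a returned base relation with the LQP it came from. The row contents, the `route` and `tag_base_relation` helpers, and the tuple layout `(data, originating, intermediate)` are hypothetical; only the AD/AD/CD routing of the first three rows comes from the text.

```python
from collections import defaultdict

# Hypothetical IOM rows; EL is the execution location (an LQP name or the PQP).
IOM = [
    {"PR": "R(1)", "OP": "Retrieve", "EL": "AD"},
    {"PR": "R(2)", "OP": "Retrieve", "EL": "AD"},
    {"PR": "R(3)", "OP": "Retrieve", "EL": "CD"},
    {"PR": "R(4)", "OP": "Join", "EL": "PQP"},
]


def route(iom):
    """Group the result names of IOM rows by execution location."""
    batches = defaultdict(list)
    for row in iom:
        batches[row["EL"]].append(row["PR"])
    return dict(batches)


def tag_base_relation(rows, lqp):
    """Attach the LQP as originating source to every cell; intermediate set is empty."""
    return [
        {attr: (value, frozenset({lqp}), frozenset()) for attr, value in r.items()}
        for r in rows
    ]


batches = route(IOM)
tagged = tag_base_relation([{"BNAME": "Genentech"}], "AD")
```

Rows that share an execution location form one batch, so the two AD rows could be dispatched together while the CD row goes to the other LQP.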
4. Example Source Tagging in the PQP

We now illustrate the processing of the example polygen query assuming the following local relations using Table 3 as a query execution plan.

[Tables: The Alumnus Relation (AD); The Career Relation (AD); The Business Relation (AD). Table bodies not recoverable; AID# is the only surviving column header.]
The polygen query processor can derive, from the polygen schema, that the information about Genentech comes from BNAME in the Alumni Database and from FNAME in the Company Database (the schema maps ONAME to both AD and CD). This information can be shown to the user upon request with a simple mapping.
In this simple example, the data source information can be obtained by inspection from the polygen schema. The intermediate source information is not observable from the polygen schema. In a federated database system with hundreds of databases in which a polygen query is optimized to select only the relevant databases for information retrieval, the data source information observed from the polygen schema is a superset of the result obtained by the PQP.

We now turn our attention to other theoretical issues of source tagging.
5. The Necessary and Sufficient Condition of Source Tagging

The polygen model presented in Section 2 is based on the assumption that the source is tagged at the cell level after the data has been retrieved from a local database. Two fundamental issues are addressed in this section: (1) How many other potential approaches exist for source tagging? (2) Does the closure property hold for the polygen algebra? (That is, does a polygen operation over a set of polygen relations always produce a polygen relation?) We address these two issues through the following lemma and theorem. Specifically, we show that although there are four conceivable ways to tag sources, the closure property holds if and only if sources are tagged by cell.

(Lemma) In extending the Relational Model to a polygen model, there exist four ways to tag sources: by cell, by tuple, by attribute, and by relation.

Since the polygen model is based on the Relational Model, the granularity of a data object to be tagged cannot be coarser than a relation because a relation is the basic unit of an algebraic operation. On the other hand, the granularity cannot be finer than a cell because a cell is the smallest unit of a relation. In addition, source tags are deleted or updated by algebraic operators, and all of them perform operations either by tuple (Cartesian product, union, difference, and restrict) or by attribute (project, coalesce). It follows that sources may be tagged by cell, by tuple, by attribute, or by relation.
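To make the difference between these granularities concrete, the sketch below forms the Cartesian product of two single-source relations whose cells carry their own originating source tags. It is an informal illustration, not part of the formal argument: the `product` helper, the `(data, originating sources)` cell layout, and the sample names are assumptions, and attribute names are assumed to be disjoint.

```python
def product(e1, e2):
    """Cartesian product of two cell-tagged relations (tuples are concatenated)."""
    return [{**t1, **t2} for t1 in e1 for t2 in e2]


# Each cell is (data, originating sources); the values are hypothetical.
alumni = [{"ANAME": ("Alice", frozenset({"AD"}))}]
firms = [
    {"CEO": ("Alice", frozenset({"CD"}))},
    {"CEO": ("Bob", frozenset({"CD"}))},
]

result = product(alumni, firms)

# Collect the distinct originating-source sets that occur inside each result tuple.
tags_per_tuple = [{cell[1] for cell in t.values()} for t in result]

# A single tag per tuple (or per relation) would need every cell to share one tag set.
single_tag_suffices = all(len(tags) == 1 for tags in tags_per_tuple)
```

Here every result tuple mixes a cell from AD with a cell from CD, so `single_tag_suffices` is false: a single tuple-level or relation-level tag could only record {AD, CD} collectively and would lose which cell came from which source, whereas the per-cell tags carry over unchanged.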
(Theorem) The closure property holds if and only if source tagging is by cell.

(Proof) Part 1: The closure property holds => Source tagging is by cell.

Suppose that there exists a polygen model in which the closure property holds and source tagging is not by cell. It follows, by the Lemma, that source tagging is by relation, by attribute, or by tuple.

If source tagging is by relation, then a relation in this polygen model can be expressed as $(e, e\langle o\rangle)$. Consider the Cartesian product of $(e_1, e_1\langle o\rangle)$ and $(e_2, e_2\langle o\rangle)$. By definition, the operation yields $\{t_1 \circ t_2 : t_1 \in e_1,\ t_2 \in e_2\}$ (where $\circ$ denotes concatenation). However, the result cannot be expressed in the form of $(e, e\langle o\rangle)$ because the originating source tags from $e_1$ may be different from those from $e_2$. It follows that source tagging by relation is not feasible.

If source tagging is by attribute, then an attribute in this polygen model can be expressed as $(e(x), e(x)\langle o\rangle)$. Consider union. By definition, $t'\langle d\rangle = t_1\langle d\rangle$ and $t'(x)\langle o\rangle = t_1(x)\langle o\rangle \cup t_2(x)\langle o\rangle$ if $t_1\langle d\rangle = t_2\langle d\rangle$. However, the result cannot be expressed in the form of $(e(x), e(x)\langle o\rangle)$ because $t_1(x)\langle o\rangle$ may be different from $t_2(x)\langle o\rangle$. It follows that source tagging by attribute is not feasible.

If source tagging is by tuple, then a tuple in this polygen model can be expressed as $(t, t\langle o\rangle)$. Consider Cartesian product. By the similar argument, the result cannot be expressed in the form of $(t, t\langle o\rangle)$ because the originating source tags of $t_1$ may be different from those of $t_2$. It follows that source tagging by tuple is not feasible.

By contradiction, we conclude that the proposition is true.

Part 2: Source tagging is by cell => The closure property holds.

The premise that source tagging is by cell justifies the usage of the polygen model presented in Section 2 in the following proof. For consistency, we use the notations developed in Section 2. Only the originating source portion is shown below; the intermediate source portion can be shown by the same token.

Let $e_1$ and $e_1'$ denote two base polygen relations. Let $E$ denote the set of results obtained from all possible combinations of algebraic operations defined in a polygen model. Let $f$ denote an algebraic operation, i.e., $e_{k+1} = f(e_k, e_k')$ for some $e_k \in E$, $e_k' \in E$, if $f$ is Cartesian product, union, or difference. Similarly, let $e_{k+1} = f(e_k)$ for some $e_k \in E$ if $f$ is project, restrict, or coalesce. We show, by induction, that the closure property holds $\forall e \in E$.

If $e_2 = f(e_1, e_1')$ where $f$ is Cartesian product, union, or difference, or $e_2 = f(e_1)$ where $f$ is project, restrict, or coalesce, then the closure property holds by definition.

Assuming the closure property holds $\forall e_j \in E$, $j \le k$, we show that the closure property also holds for $e_{k+1} \in E$. By the model's definition, $t\langle o\rangle$ is the set of the originating sources.

For projection, $e_{k+1} = e_k(X)$. Two cases need to be considered: (1) $t(X)\langle d\rangle$ is unique and (2) $t_1(X)\langle d\rangle = \dots = t_k(X)\langle d\rangle$. In the first case, $t'(x_j)\langle o\rangle = t(x_j)\langle o\rangle\ \forall x_j \in X$. It follows that the closure property holds for $e_{k+1} = e_k(X)$. In the second case, $t'(x_j)\langle o\rangle = t_1(x_j)\langle o\rangle \cup \dots \cup t_k(x_j)\langle o\rangle\ \forall x_j \in X$. Since the closure property holds for $t_1(X)\langle o\rangle, \dots, t_k(X)\langle o\rangle$, the closure property also holds for $t'(x_j)\langle o\rangle\ \forall x_j \in X$.

For Cartesian product, $e_{k+1} = (e_k \times e_k') = \{t_1 \circ t_2 : t_1 \in e_k \text{ and } t_2 \in e_k'\}$, where $\circ$ denotes concatenation. For difference, $e_{k+1} = (e_k - e_k') = \{t : t \in e_k,\ t\langle d\rangle \notin e_k'\}$. For restrict, $e_{k+1} = e_k(x\,\theta\,y) = \{t : t \in e_k \wedge (t(x)\langle d\rangle\ \theta\ t(y)\langle d\rangle)\}$. Since $t\langle o\rangle$ remains the same in Cartesian product, difference, and restrict, it follows that the closure property holds for $e_{k+1} = (e_k \times e_k')$, $e_{k+1} = (e_k - e_k')$, and $e_{k+1} = e_k(x\,\theta\,y)$.

We conclude that the closure property holds for union and coalesce following the similar arguments. From the Principle of Mathematical Induction, we conclude that the proposition is true.

6. Summary and Conclusions

We have presented a polygen model for resolving the Data Source Tagging and Intermediate Source Tagging problems. The polygen model research addresses issues in data integration from the "where" perspective, a perspective that, to the best of our knowledge, has not been studied to date. Furthermore, we have presented a data-driven query translation mechanism for mapping a polygen algebraic expression into a set of intermediate polygen operations dynamically. A prototype, called System P, has been implemented (Yuan, 1990) to demonstrate the feasibility of the polygen model and the polygen query processing capability presented in this paper.
This research has also provided us with a theoretical foundation for further investigation of many other critical research issues in heterogeneous distributed systems, for example the cardinality inconsistency problem which is inherent in heterogeneous database systems. (Under the relational assumption, the cardinality inconsistency problem exists in heterogeneous database systems because referential integrity is not enforceable over multiple pre-existing databases which have been developed and administered independently and are likely to remain so.) It also enables us to interpret information from different sources more accurately. By storing the metadata about each of the data sources in the PQP, many domain mismatch, semantic reconciliation, and data conflict problems could be resolved systematically using the data and intermediate source tags. Furthermore, other polygen models can be developed for heterogeneous distributed database systems based on the Entity Relationship Model, the Functional Data Model, and the more recent object-oriented models (Manola & Dayal, 1986; Shaw & Zdonik, 1990).

The data source and intermediate source information can be very valuable to the user as well as to the polygen query processor in formulating cost-effective, customized, and credible composite information in a federated database environment. As more and more important applications require seamless access to and integration of data from multiple heterogeneous database systems both within and across organizational boundaries, these capabilities will also become increasingly critical.
Appendix: The Operations that Generate Table 5

The second and third rows of Table 3 indicate that the BUSINESS and FIRM relations should be retrieved from the Alumni Database and the Company Database respectively. As such, the data source sets of the corresponding cells are {AD} and {CD} respectively, as shown in Table A1 and Table A2 below. The intermediate source set is an empty set because no other data sources have been involved in obtaining these relations.

[Table A1: The Business Relation. Table body not recoverable.]

[Table A3: The Outer Join of Table A1 and Table A2. Table body not recoverable; BNAME is the only surviving column header.]
References
[1] Batini, C., Lenzerini, M., & Navathe, S. (1986). A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4), pp. 323-364.
[2] Bernstein, P. A., et al. (1981). Query Processing in a System for Distributed Databases (SDD-1). ACM Transactions on Database Systems, 6(4), pp. 602-623.
[3] Breitbart, Y., Olson, P. L., & Thompson, G. R. (1986). Database integration in a distributed heterogeneous database system. Los Angeles, CA. February 1986. pp. 301-310.
[4] Brill, D., Templeton, M., & Yu, C. (1984). Distributed query processing strategies in MERMAID, a frontend to data management systems. First International Conference on Data Engineering. Los Angeles, CA. February 1984. pp. 301-310.
[5] Ceri, S. & Pelagatti, G. (1984). Distributed Databases: Principles & Systems (1st ed.). McGraw-Hill.
[6] Codd, E. F. (1979). Extending the relational database model to capture more meaning. ACM Transactions on Database Systems, 4(4), pp. 397-434.
[7] Czejdo, B., Rusinkiewicz, M., & Embley, D. (1987). An approach to schema integration and query formulation in federated database systems. The 3rd International Conference on Data Engineering. Los Angeles, CA. 1987. pp. 477-484.
[8] Date, C. J. (1983). The outer join. The 2nd International Conference on Databases. Cambridge, England. 1983. pp. 76-106.
[9] Date, C. J. (1990). An Introduction to Database Systems (5th ed.). Addison Wesley.
[10] Dayal, U. (1983). Processing Queries Over Generalization Hierarchies in a Multidatabase System. The 9th International Conference on Very Large Data Bases. August 1983. pp. 342-353.
[11] Dayal, U. & Hwang, K. (1984). View definition and generalization for database integration in a multidatabase system. IEEE Transactions on Software Engineering, SE-10. pp. 628-644.
[12] Deen, S. M., Amin, R. R., & Taylor, M. C. (1987). Data integration in distributed databases. IEEE Transactions on Software Engineering, SE-13. pp. 860-864.
[13] Deen, S. M., Amin, R. R., & Taylor, M. C. (1987). Implementation of a prototype for PRECI*. Computer Journal, 30(2), pp. 157-162.
[14] DeMichiel, L. G. (1989). Performing operations over mismatched domains. The Fifth International Conference on Data Engineering. Los Angeles, CA. February 1989. pp. 36-45.
[15] Elmasri, R., Larson, J., & Navathe, S. (1987). Schema integration algorithms for federated databases and logical database design. Honeywell Inc., Submitted for Publication. 1987.
[16] Epstein, R., Stonebraker, M., & Wong, E. (1978). Distributed Query Processing in a Relational Database System. ACM-SIGMOD Conference. May 1978.
[17] Ferrier, A. & Strangret, C. (1982). Heterogeneity in the distributed database management systems Sirius-Delta. The 8th International Conference on Very Large Data Bases. Mexico City, Mexico. 1982.
[18] Gupta, A. (1989). Integration of Information Systems: Bridging Heterogeneous Databases (pp. 334). New York, N.Y.: IEEE Press.
[19] Gupta, A., et al. (1989). An architecture comparison of contemporary approaches and products for integrating heterogeneous information systems. Sloan School of Management, MIT, Cambridge, MA 02139. 1989.
[20] Heimbigner, D. & McLeod, D. (1985). A Federated architecture for information management. ACM Transactions on Office Information Systems, 3, pp. 253-278.
[21] Hevner, A. R. & Yao, S. B. (1979). Query Processing in Distributed Database Systems. IEEE Transactions on Software Engineering, SE-5(3). pp. 177-187.
[22] Hull, R. & King, R. (1987). Semantic database modeling: survey, applications, and research issues. ACM Computing Surveys, 19(3), pp. 201-260.
[23] Katz, R. H. & Goodman, N. (1981). View processing in MULTIBASE, a heterogeneous database system. In P. P. Chen (Ed.), View processing in MULTIBASE, a heterogeneous database system (pp. 259-279). ER Institute.
[24] Litwin, W. & Abdellatif, A. (1986). Multidatabase interoperability. IEEE Computer, pp. 10-18.
[25] Litwin, W., et al. (1982). SIRIUS system for distributed data management. Symposium on Distributed Databases. Berlin, West Germany. 1982. pp. 311-366.
[26] Lyngbaek, P. & McLeod, D. (1983). An approach to object sharing in distributed database systems. The 9th International Conference on Very Large Data Bases. October 1983. pp. 364-374.
[27] Madnick, S. E. (1989). Information Technology Platform for the 1990s. In M. S. S. Morton (Ed.), Information Technology Platform for the 1990s (pp. 19-48). Cambridge, MA: Sloan School of Management, MIT.
[28] Madnick, S. E., Siegel, M., & Wang, Y. R. (1990). The Composite Information Systems Laboratory (CISL) project at MIT. IEEE Data Engineering, 13(2), pp. 10-15.
[29] Maier, D. (1983). The Theory of Relational Databases (1st ed.). Computer Science Press.
[30] Manola, F. & Dayal, U. (1986). PDM: An object-oriented data model. The International Workshop on Object-Oriented Database Systems. Pacific Grove, CA. September 1986. pp. 18-25.
[31] Peckham, J. & Maryanski, F. (1988). Semantic data models. ACM Computing Surveys, 20(3), pp. 153-189.
[32] Rockart, J. F. & Short, J. E. (1989). IT in the 1990s: Managing Organizational Interdependence. Sloan Management Review, Sloan School of Management, MIT, 30(2), pp. 7-17.
[33] Rusinkiewicz, M. & Czejdo, B. (1985). Query transformation in heterogeneous distributed database systems. The 5th International Conference on Distributed Computing Systems. Denver, CO. 1985. pp. 300-307.
[34] Rusinkiewicz, M., et al. (1988). Query processing in omnibase - a loosely coupled multi-database system. University of Houston. 1988.
[35] Shaw, G. & Zdonik, S. B. (1990). A query algebra for object-oriented databases. The Sixth International Conference on Data Engineering. Los Angeles, CA. February 1990.
[36] Shin, D. G. (1988). Semantics for handling queries with missing information. The Ninth International Conference on Information Systems. 1988. pp. 161-167.
[37] Smith, J. M., et al. (1981). Multibase - Integrating Heterogeneous Distributed Database Systems. 1981 National Computer Conference. 1981. pp. 487-499.
[38] Templeton, M., et al. (1983). An Overview of the MERMAID system - a frontend to heterogeneous databases. IEEE EASCON. Washington, DC. 1983.
[39] Wang, Y. R. & Madnick, S. E. (1988). Connectivity among information systems. Composite Information Systems (CIS) Project (pp. 141). MIT Sloan School of Management.
[40] Wang, Y. R. & Madnick, S. E. (1989a). Facilitating connectivity in composite information systems. ACM Data Base, 20(3), pp. 38-46.
[41] Wang, Y. R. & Madnick, S. E. (1989b). The inter-database instance identification problem in integrating autonomous systems. The Fifth International Conference on Data Engineering. Los Angeles, CA. February 1989. pp. 46-55.
[42] Wang, Y. R. & Madnick, S. E. (1990). A Polygen Model for Heterogeneous Database Systems: The Source Tagging Perspective. To Appear in the 16th International Conference on Very Large Data Bases. Brisbane, Australia. August 1990.
[43] Yuan, Y. (1990). The design and implementation of system P: A polygen database management system. (CIS-90-07) Composite Information Systems Laboratory, Sloan School of Management, MIT, Cambridge, MA. May 1990.