Document 11048920

advertisement
..-^
4
Dewfv
HD28
M4 1
.
')w.
i'^^-&7
ALFRED
P.
WORKING PAPER
SLOAN SCHOOL OF MANAGEMENT
INTEGRATING DISPARATE DATABASES
FOR COMPOS ite-ansv:frs
Stuart E. Madnick
Y.
September 1987
Richard Wang
#WP 1936-87
MASSACHUSETTS
INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
INTEGRATING DISPARATE DATABASES
FOR COMPOSITE-ANSWERS
Stuart E. Madnick
Y.
Seotember 1987
Richard Wang
#WP 1936-87
'
"IT. LIBRARIES
m
Proceed i ngs or the Hawaii International Conrerence
To appear
on System Sciences January 1988.
,
Integrating Disparate Databases
For Composite- Answers
Stuart E. Madnick
Y.
Richard Wang*
E53-320, Sloan School of Management
MIT, Cambridge. ^LA. 02139
(617)253-6671
*
On
leave from the
MIS Department,
University of Arizona, Tucson,
ABSTRACT Many
databases
to
to
important information systems require multiple independent
work together within and'or
increase productivity.
Systems (CIS).
A
College of BPA,
AZ 85721
We
across organizational boundaries in order
refer to this type of systems as
key area of research
in
CIS
is
Composite Information
logical connectiuity
which deals with
the process of accessing disparate databases in concert for composite-answers.
major problems that need
to
contradiction, inconsistency,
be overcome
to
The
attain logical connectivity include
and ambiguity.
This paper presents an approach
to resolve these
problems through enhancing
the semantic power of the database integration. This approach exploits concepts
drawn from frame-based knowledge representation and rule-based inferencing. An
object-oriented prototype
is
also presented to illustrate the process involved in
formulating composite-answers where different levels of abstraction are required.
KEY WORDS AND PHRASES
:
Distributed Database Systems, Object-Oriented
Programming, Organizational Information Systems, Strategic Computing.
ACKNOWLEDGEMENTS Work reported herein has been supported in part by the
Department of Transportation's Transportation System Center, the U.S. Air Force,
the Center for
Citibank.
Management
of Information at the University of Arizona,
and
1.
Introduction
Significant advances in the price, speed-performance, capacity, and capabilities
of
new database technology have created
a
wide range of opportunities
for
commercial applications. These opportunities are widely recognized now as
strategically important to corporations. It
is
also increasingly evident that the
identification of strategic applications alone does not result in success for an
A
organization.
careful and delicate interplay between choice of strategic
applications, appropriate technology,
attain success,
as depicted in Figure
and organizational responses must be made
1 [2].
An
effective information
system
is
to
one
that successfully align the problems and opportunities across these three domains.
Strategic Applications
Organization
Figure
1
A
Technology
Strategic Applications. Technology,
and
OrganizationalResearch Initiative (SATOKI)
One important category
linkage
(e.g.,
integration
tying into supplier and, or buyer systems) and. or intra-corporate
(e.g.,
tying together disparate functional areas within a firm). This
category of systems
hereinafter.
of strategic applications involve inter-corporate
is
referred to as Composite Information Sv.s/em.s (CIS)
[2, 6, 14]
This paper addresses issues involved in the evolution of separate systems
more
fully integrated
CIS
in order to
to a
gain strategic advantage. Benefits of CIS and
recent research activities are presented in the remainder of this section. Section 2
provides a case study of multiple tour-guide databases
to illustrate
the strategy for
semantic integration of disparate database systems. Section 3 presents a solution
that combines Data Base
Management System (DBMS) and
Expert Systems (ALES) techniques
Artificial Intelligence
/
to facilitate logical connectivity. In particular,
concepts drawn from frame-based knowledge representation and ruie-based
inferencing are exploited. Furthermore, an object-oriented prototype system was
developed
to test this
approach. Finally, concluding remarks are
made
in section 4.
Benefits of CIS
A
key idea of CIS
is
the integration of disparate databases for composite
answers. Without integration,
prone
to
it is
difficult,
obtain key information which
expensive, time-consuming, and error-
may
be distributed in databases located in
difTerent divisions of different organizations.
Consider the following case study of a major international bank
Figure
2.
as
shown
in
Currently, three separate database systems are being used for loan
management, cash management, and
letters-of-credit processing.
requests that $50,000 be transferred to another account.
in the
[2],
If
that client's cash balances
funds transfer system can not cover that transaction,
though that client
may have
a 520,000,000
Suppose a client
it
will be rejected
- even
active letter-of-credit! This rejection,
besides being annoying to the client, will require significant effort to correct by
manually drawing on the
letter-of-credit to cover the funds transfer. If the
connect the three separate database systems together
concert, so that funds can be automatioallv
to
drawn on the
bank can
access information in
letter-of-credit,
then
®
/ £ESSX \
LOAN
CASH
MANAGEMENT
MANAGEMENT
SYSTEM
SYSTEM
I
X
DATABASE
DATABASE
DATABASE
9
3
1
Figure 2
An
Electronic
LETTER OF
CREDIT
SYSTEM
Banking Svstem Without Integration
product-differentiation will be achieved via the enhanced quality of service,
reprocessing costs can be reduced since special
and
manual intervention can be avoided.
Recent Re s earch Activities
Researchers in the
DBMS
and
ALES
DBMS
AI'ES community has begun
to
have been striving
to
enhance their
The database community seeks additional
systems in a converging direction.
semantic capability in their
fields
to
"understand" more of the real world, while the
address the increasing
This converging trend has prompted researchers
to
demand
for
database access.
address the connectivity problem
from different perspectives. Furthermore, twelve methodologies
for
database schema
integration have been analyzed and compared by Batini, Lenzerini. and Navathe
We
summarize some recent research
The open-systems group
It
is
[5]
activities that benefited our
[1].
work below.
deals with highly parallel, distributed, open systems.
based on the belief that future computer applications will involve the
interaction of subsystems that have been independently developed at disparate
geographical locations. Message Passing Semantics
developed
The
is
a
methodology being
to tackle connectivity.
MULTIBASE research project [3,
15] attempts to provide a
through a single query language and database schema
to
uniform interface
data in pre-existing,
heterogeneous, distributed databases. The Functional Data Model and the data
language
DAPLEX
are used as the
execute queries that
may
common
data model and language
to efficiently
require data from different databases with different
schemata, data models, and query languages.
The federated architecture research
[4, 9]
aims at uniting a collection of
independent database systems into a loosely-coupled federation. The export-schema
specifies the information that a
component
will share
with other components, while
the import-schema specifies the non-loca! information that a
component wishes
to
manipulate.
Each
of these research efforts has addressed certain aspects of connectivity. In
comparison, the research goals of the CIS research project
[2, 6, 14]
is
twofold: (a) to
develop a methodology for integrating disparate database systems with a particular
focus on logical connectivity; and (b) to demonstrate the feasibility of a system
capable of attaining logical connectivity in a real environment with satisfactory
performance.
2
Developing CIS
Recent advances
in
DBMS
and
AlES
have provided a new perspective
design of CIS. At the external level, query languages such as
Forms (QBF)
in
INGRES
have been employed
Although these interfaces are powerful, the user
to
statement-of-objective which
(IFEP).
is
A CIS
and Query-By-
needs
is to
to follow certain formats.
formulate composite answers
even more powerful
if
the user can
should allow the user
to
query by a
in order to achieve the user's objective, it wil be
simply "query by objective (QBO)."
the
provide user-friendly interfaces.
still
Since the purpose of accessing multiple databases
QUEL
to
processed by an Intelligent Front-End Processor
IFEP eventually produces
a composite-query executable by the underlying
system.
At the conceptual
level, a
composite-query will be processed
may
required information which
to
produce the
be distributed in disparate databases located in
different divisions of different organizations. Integrating these disparate databases
requires
knowing where
all
the data are stored, facilitating all the necessary
interfaces, constructing queries to retrieve the data,
accumulating
local results,
resolving local differences, and integrating the results into composite-answers.
MULTIBASE
[15], for
Database Interfaces (LDI)
for
database integration
to
example, used a Global Data Manager (GDM) and Local
provide a uniform interface.
have been proposed
categorized into two contexts":
(a)
in the past decade.
They are
view integration which produces a global
conceptual description of a proposed database; and
il)
Many other methodologies
(b)
database integration which
.Mulli-database interoperability has al^o been proposea Ja an alternative approach. It assumes
may access basicall> have no silolial -chema The system should provide
that the databases the user
the user with functions for manipulating data that
mutually nonintcgrated
I3i.
may
be in vi-ibly distinct :^chemata and
may
be
produces the global schema of a collection of databases. This global schema
integrated view of all databases in a distributed database environment
To develop
is
an
[1].
strategic applications of databases quickly in order
tangible benefits before a full-scale heterogeneous database integration
to
is
obtain
launched,
the concept of virtual-drivers' -^ can be applied, as delineated below. Note that the
virtual-driver concept
component
is
is
equally applicable in an environment where each
a distributed database
management system such
as R* or
INGRES".
Virtual-Drivers
In the virtual-driver concept [14], users can
their traditional functions, as
shown
still
in Figure 3.
use real terminals
to
perform
Meanwhile, virtual-drivers are
created which are indistinguishable to the system from users of real terminals.
A
user interested in obtaining composite-answers from multiple databases invokes the
CIS-Executive which mediates the virtual-drivers. The CIS-Executive invokes each
system (via
among
its
virtual-driver) to obtain the necessary information. Incompatibilities
the database systems are resolved by the CIS-Executive which then presents
a composite-answer to the user.
when using the
levels of connectivity
need
to be
considered
virtual-driver approach, as discussed below.
Physical connectivity refers
disparate databases. Although
connectivity
Two
(e.g.,
to the process of
many R&D
actual communication
among
issues need to be addressed in physical
bandwidths, security, availability, and reliability) in order
to
insure an effective CIS. we assume that communication solutions are available, and
focus on
Lomcal
connectivity which
is
even more challenging. Logical connectivity
imporlanl not. to confuse the concept of virtual-driver with \ irtual-lerminal protocols The
virtual-terminal protocoU ha\e iiecn iiucnieci to try to hide teimmal idiosyncracie> from
application prok^rams throuyih mappiiiii of i-cal terminal- onto a hypothetical network virtual
terniinaL In contrast to the narrow mappin:^. the \ irtual-ciriver concept aims at accessing separate
databases in concert to tormulate composite answers
(2) It is
PROCESSING
PROCESSING
PROCESSING
COMPONENT
COMPONENT
COMPONENT
J
K
DATABASE
DATABASE
DATABASE
1
J
K
I
fC
Figure 3
Integrating Disparate Databases
refers to the process of accessing disparate databases
answers. The major problems that need
to
in concert
for
composite-
be overcome to attain logical connectivity
include syntax and semantic contradiction, inconsistency, and ambiguity due to
different assumptions
made
in disparate databases.
instead of logical connectivity hereinafter.
A
For brevity, connectivity
travel-guides case
is
used
is
used
to illustrate
connectivity strategy.
Connectivity Strategy
Travel-guides are easy
to
understand, abundant in data-semantics, and
representative of the situation involved in CIS. Litwin and Abdellatif reported a
prototype multi-database system using tour-guides for Paris as an example
conventional
DDL/DML
is
extended
in
[8].
The
their prototype to incorporate the
schema
complexities introduced by the explicit recognition that a global
available. Similar to
MULTIBASE,
Abdellatifs prototype via
DBA or application
DDL/DML.
In other words, connectivity
programmers; therefore, opaque
rule-based inferencing as a strategy
is
that questions such as
of the end-user,
making
Other advantages
to
in
is
Litwin and
encoded by the
the end-user. In contrast,
for
An advantage
"why and how" can be answered upon
of this
the request
way.
to illustrate
Massachusetts. 1987 (abbr.
connectivity strategy.
AAA
hereinafter), the
They
FODOR's
England. 1987 (abbr. FODOR), and The Spirit of Massachusetts. 1987 (abbr.
MASS). One might envisage
that
AAA
implemented
is
INGRES* where
in
implemented
organizations.
in distributed
Oar
focus
ORACLE, and MASS
in
R* by competing
on the semantics of each database, not the
is
each
FODOR
state has a local physical database transparent across the states. Similarly,
is
we
the connectivity-process m.ore transparent to the end-user.
will be delineated along the
AAA Tour-book
New
to
attain connectivity.
Three tour-guides are presented below
are
attempted
is
not
drawn from frame-based knowledge representation scheme and
exploit concepts
approach
connectivity
is
DBMS
used
to
implement the database.
Interacting with a workstation which uses an CIS-Executive to drive these
distributed databases, a travel agent
may
seek
to
meet
established by a client in her office. Suppose that the clieni
is to
maximize the quality
the composite-query produced by
facilities of
see
of
Logan Airport Hilton
how we may formulate
Logan Airport Hilton
schemata shown
in
IFEP
in
is
to
cost.
Boston"
o
statement-of-objective
Suppose also that part of
obtain a composite-answer of the
Boston from the tour-guide databases.
a composite-answer for the query
in
Figure
minimum
of facility with
tement-of-objective
th
"what are the
Let us
facilities
from the tour-guide databases with database
4.
8
AAA
AAA-Iafo:
.•V.^
Relations
(Name*. Address, Rate-Code. Lodg'.ng-Type. Claisification,
Phoned, Other)
(Address*, Direction)
A- Direction:
(Name*, Facility*)
(Name*, Credit-Card*)
(Name*. Season*, IPL, IPH. 2P1BL. 2P1BH. 2P2BL. 2P2BH, XP, F-code)
AAA-Facility:
A-AA-Credit:
AAA-Rate;
FODOR
Relations
(ID#*, Name. Address, Comment, Location, Package, Category)
(ID**, Phone^i')
FODOR-Info:
FODOR-Phone:
FODOR-Facility:
FODOR-Service:
(ID***, Facility')
MASS-Info:
(Name*. Address, Facility-Tytpe, Rating, #-of-Rooms, Other)
(Name*, Phone**)
(Name*,
(Name*. Amenity-code*)
(Name*, Package-Name*, Package-Descript)
(ID**. Service*)
MASS
MASS-Phone;
M.-\SS-Package:
Relational Schemata for
Figure 4
In ortier to retrieve the data
and data dictionaries need
MASS
to
(i.e.,
"What
are the facilities of
AAA, FODOR. and MASS. As
be accumulated, as
shown
necessary
that amenity in
to realize
in
addition, the amenity-code in
construct the
Table
automatic on most current
COLUMNS
Logan Airport Hilton
in
DBMS,
2.
and MASS.
facilities as the
It is
to
decompose
in Boston?") into
the
a result, data of Logan Airport Hilton can
The reader may have recognized that
has
to
Table
be converted
2.
it is
(e.g.,
6
means
pool) in order
Although such mappings are not
the^e transformations are fairly straightforward
as
Pursuing the case further, we observe that
AAA
data dictionary of
SELECT. JOIN, and PROJECT on
and are performed by experimental systems such
than
in the
schemata
MASS equates to /ac:7ify in FODOR and AAA. In
MASS
MASS-column
of the facilities, the
The CIS-Executive can be designed
i.
subqueries which perform operations such as
relations in
AAA. FODOR. and MASS
and the data-formats
be accessed. The
are exemplified in Table
the query
Relations
CO
MASS-CC:
MASS-Amenity
to
si-of-L'nits.
possible that
MULTIBASE.
FODOR
FODOR
reports less information
simply does not emphasize the
other two guides, but specializes in other aspects such as decor.
TNAME
AAA
AAA
meanings. Example sources of difference include
guides has very different
whether
•
tax, breakfast, service charge,
and granuities are included
or not.
Inconsistency. The name, address, and phone number are reported as follows:
AAA:
Logan Airport Hilton; Lo^an International Airport. East Boston, 02123
(617) 569-9300
FODOR:
MASS:
Hilton Inn at Logan. Logan
The Logan Airport
Hilton.
Airport, 569-9300
Int'l
Logan International Airport, Boston, 02123
(617)569-9300
The twelve methodologies reported by Batini et
al.
dealt with
[1] also
some
subset of these issues. This tour-guides case presented above provides a cohesive
setting to discuss these issues
and the corresponding connectivity strategy, as
elaborated below.
To resolve these
issues,
it is
necessary
to
map synonyms and
convert different
data formats. This can be accomplished, in part, through data dictionaries. View
definition
and integration techniques can be applied
comprehensive view from the
may
provide a more
partial, incomplete information in the local databases.
Through comparison, conforming, and integration
conceptual schema
to
of the schemata, a global
be proposed and tested against the following criteria: (a)
completeness and correctness;
This process requires
(b)
(a)
minimality; and
(c)
extensive manual
understandability
effort,
and
(b)
[1].
the resulting
integrated schema does not contain explicit knowledge of the assumptions made. To
resolve these two problems,
the data.
it is
With these concepts
in
necessary
to
understand the concepts that underlie
mind, reasonable connections
based on the content of the database(s).
We
refer to this type of
reconciliation. Connectivity strategy can be
employed
may
problem as semantic
to
reconciliation problems in order to formulate composite-answers.
13
be established
solve semantic-
Two
parts of this
strategy are illustrated below:
identifier;
(A)
and
(b)
identifying the
making a judgment call based on
Identifying the
A
(a)
same object without
unique global key identifier
may
same
object without a global
credibility.
a global identifier
when multiple databases are
not exist
involved. For example, there does not exist a primary key identifier
which
is
defined
consistently and completely across the tour-guide databases. In order for the
subqueries (decomposed by the CIS-Executive discussed earlier)
necessary database operations, a unique ID for "Logan Airport Hilton"
it
in
perform the
to
is
needed, be
an ID-number, a name, an address, or a phone-number. Note that the name used
FODOR
is
"Hilton Inn at Logan."
identifier to retrieve facility from the
A moment
numbers and a
of thought
hotel. Let us
name
retrieved from
AAA
reference, the
"Logan Airport Hilton"
is
used as the
FODOR database, nothing will be returned.
would lead one
assume that
does not share the same phone
Hilton" as the
If
to
make
a hotel
the connection between phone
may have more than
number with another
for the hotel, the
Using "Logan Airport
phone number (617) 569-9300 could be
given the relations in Figure
phone numbers 569-9300
hotel.
one phone, but
for
4.
Using (617) 569-9300 as a
FODOR. and
(617) 569-9300 for
MASS
could also be retrieved.
The remaining question
FODOR. From FODOR,
is
whether "617"
is
also the area code for 569-9300 in
the hotel which has the phone
number 569-9300
is
classified
as located in the Boston area, which has an area-code 617. Consequently, the three
hotels have exactly the
same phone number,
i.e.,
(617) 569-9300. Since a hotel does
not share the same phone with another by assumption, we conclude that
same
hotel.
As
a result, facilitv data can be retrieved from the databases.
14
it is
the
Thus,
to identify the
same
global identifier requires us to
object in disparate databases without
make
an explicit
extensive use of inference with knowledge in
the application domain.
(B)
Making
a
judgment call based on
credibility
Contradiction, granularity, inconsistency, and ambiguity are unavoiaable
when integrating disparate
How
information disagree?
perspectives?
An
databases.
What should we
when
do
make our system
can we
different sources of
sensitive to different
informed approach based on the credibility of the
can help the delivery process. The fact that
specializes in decor
and
and
location,
MASS
example o^ credibility which can be used
to
AAA
databases
FODOR
specializes in room-rate,
specializes in
make
local
handicapped service
a judgment call
when
an
is
needed.
We
give another concrete example below.
As mentioned
without cable but
A
closer
AAA
indicates that
for color
TV.
only indicates
if
AAA
AAA
color
TV
is
more
specific. It
has three categories
for
CATV for cable TV, and C/CATV for color cable TV. Whereas
cable
TV
is
available. Therefore,
more detailed; therefore, more likely
In addition,
Logan Airport Hilton has
MASS reports that cable TV is available, contradicting each other.
examination reveals that
TV: C/T\'
MASS
earlier,
offers fee for
to
it is
fair to
be more credible in reporting
say that
AAA
is
TV information.
movies (assuming that the CIS-Executive
is
able to
interpret the data accumulated in Table 3). If the CIS-Executive also "knows" that
cable
TV means
fee,
cable movies such as
then
it
HBO." That
is. it
m.ovies from special stations such as
has color
for
TV
without cable, but offers fee
for
in
HBO.
composite-answer
is
Logan Airport Hilton. The reader may wish
to
Based on the analysis presented
formulated below for the
MASS means "fee
would conclude that "Cable TV"
facilities of
in
propose variations of the composite-answer.
15
this section, a
TV without cable, but with fee for movies, e.g. HBO. (3) air
phone in room. i.5i pool. (6) airport transportation available. <T) restaurant. iS)
non-smokers' room, and (9) pets allowed. In addition, the following facilities have been reported
suites, smoke detectors, entertainments, dancing weekends, bath or shower, attractive furnishing,
cocktail bar lounge, near public transportation, and handicapped accessible."
"(1) free parking; (2l color
conditioning,
(4)
The tour-guide case has shed additional
connectivity.
light on the challenge involved in
We now turn our attention to connectivity facilitatation.
3 Facilitating Connectivity
In speculating the Society of
we are
"All natural languages,
cases
..
One
possibility
is
told,
Mind (SOM), Minsky
states that [11]
have much the same kinds of nouns, verbs, adjectives, and
that detailed syntactic restrictions are genetically encoded, directly,
into our brains ... It would seem inevitable that some early high-level representations,
developing in the first year or so, would be concerned with ... the 'things' that natural languages
later represent as nouns. Here we indeed suspect generic prestructuring, within sensory systems
designed to partition and aggregate their inputs into data for representing 'things." Further, we
would need systems with elementary operations for constructing, and comparing descriptions."
It is, of
course, not our intent here to debate the speculation
SOM. The important
point
is
and evidences of
that each local database can be perceived as genetically
prestructured with certain syntactic restrictions, and designed to partition and
aggregate data for representing "things." Further, we would need systems with
elementary operations
each
local
database
is
for constructing,
and comparing descriptions.
In this way,
regarded as a "simple mind" with the "deep structures" [11]
along with some means
for
manipulating them. The "simpTe mind" here, of course,
refers to the "prestructured" local
schema, and
local
DML
is
used
to
manipulate the
"simple mind."
If
we accept
be applied
this analogy, then the transient case-shift
i3)
new
to
[11] can also
where the most highly developed case has the best-developed description
structure. Furthermore, Pappert's Principle
minds"
mechanism
^'
can be applied on top of these "simple-
accumulate new knowledge and organize the
e.xisting
knowledge into
States that some of the most crucial step> in mental growth are based not simply on acquiring
skills,
but on acquiring
new administrative ways
16
to
use what one already
knows
[12|.
different layers, with the top layer being an "intelligent global
(a)
schema" capable of
activating the lower-level agents to retrieve the needed information from the
local databases,
and
case-shift to the upper layers for the
semantic conflicts
cases; (b) reconciliating
Principle of Noncompromise'"*'; and
and information
(c)
among
most highly developed
different agents through the
transforming different sources of knowledge
into composite-answers.
In light of this discussion,
it is
natural
augmented with rule-based inference
to
apply frame-based representation [10]
capabilities to facilitate connectivity.
Object-Oriented Approach
It is
important
to
facilitate connectivity.
hotel has
observe that certain basic principles can be applied to
For example, although tour-guides are idiocyncratic, each
some rooms, and a room has some kind
of facilities
and
services.
As such,
it
can be represented as a composite-object with properties specialized in different
granularities.
These basic principles can serve as a reference of disparate databases,
guiding the reconciliation.
The
object.
object-oriented approach enables us to encapsulate each database as an
Messages can be sent among objects
to
obtain the information requested by a
composite-query. In addition, object inheritance networks can be constructed
declare properties shared by different classes of objects.
methods
to be
triggered
(e.g.,
interpretation of a hotel rating
rules
makes
it
convenient
for
send a message
to
when information on
Active-values permit
another object
rating
to
is
to
obtain the
requested). Heuristic
describing flexible responses to a wide range of events,
including situation where a unique global key identifier does not exist across
Stales that the longer an internal conil'd pci-i:-'.- anv iiu Ar[ agent's subordinates, the weaker
becomes that agent's status among its own ci>n-.pft;iori It' such interna! problems aren't settled
soon, other agents will take control and the agents tbrmerly iiuolved will be "dismissed" [12|.
'4)
.
databases
(e.g.,
tour-guide databases use different key identifiers for hotels). In this
way, knowledge of object attributes, inheritance properties, and heuristic rules can
be accumulated in the CIS-Executive
There are
many
to reconcile
semantic differences.
LOOPS,
object-oriented languages, such as
that offer excellent
features such as composite-objects and perspectives which are very useful in
we wish
facilitating connectivity [16]. Since
to
experiment with various novel
concepts involving connectivity to multiple databases,
we implemented
a specialized
frame-based knowledge representation scheme and rule-based inference capability
using
COMMON-LISP,
as discussed below.
Implementation Strategy
An
Abstract Data Base
CIS-Executive
to
integrate disparate databases for composite-answers.
higher-level conceptual
actual
to
DBMSs
Management System (ADBMS) was implemented
DBMS
(e.g.,
DBMS
It
a
sends queries (via messages)
AAA. FODOR, and MASS)
dictionaries, retrieve information, convert formats,
Adding anew
is
which conceals the implementation details of the
from other objects in the community.
multiple databases
ADBMS
as a
will not result in
any change
to
access the local data
and integrate different views.
to the
existing applications.
Also implemented was a set of commands which provide the basic features of an
object-oriented language with extensions to simplify constraint
representation.
Mechanisms are provided
building, relating,
and showing
objects.
database objects, and the actual
for interfaces
and knowledge
with databases as well as
The functional relationship among ADBMS,
DBMS
is
illustrated in Figure 5.
The reader
is
referred to Levine [7] for a detailed description of the underlying Knowledge-Object
REpresentation Language (KOREL).
A simplified example
18
is
presented below.
Figure 5
AAA
FODOR
MASS
OBJECT
OBJECT
OBJECT
DATABASE
DATABASE
DATABASE
Functional Relationship
Among ADBMS and
DBMS
the Actual
The Tour-Guide Case Revisited
Using the notation and the schemata
realized in Figure 6 as a primitive
FODOR
object
is
shown
simplified scenario of
ADBMS.
to
respond
(HCIS) object
to the
is
Similarly, a partial representation of the
These objects are used below
in Figure 7.
how
in Figure 4, a Hotel-CIS
query "what are the
to
present a
facilities of
Logan
Airport Hilton" submitted by the user as discussed in the case study.
At the conceptual
Airport Hilton)"
object given the
is
level, the
executed
to
command
"get-object
'HCIS
'facility
get a composite-answer of facilities from the
argument "Logan Airport Hilton."
At the HCIS
'(Logan
level, the following
commands
•;end-mes^agi>
FODOR
''ti!-\ic:i:r\
scnd-messa^e
send-measa^e
'AAA
''lil-faciiit}
'MASS
'all -facility
19
are e.xecuted.
'
Logan Airport Hiltom
Logan Airport Hilton
Logan Airport Hilton)
>
'f
HCIS
IHCIS
(NAME: (VALUE-TYPE string))
(ADDRESS, (VALUE-TYPE string))
(RATE-CODE: (VALUE-TYPE string))
(LODGL\G-TYPE: (VALUE-TYPE string))
(RATING (VALUE-TYPE stnngU
(#-OF-UNITS iVALUE-TYPE integer))
(PHONED iVALUE-TYPE string)
MULTIPLE-VALUE-FUNCTION
(
(FODOR
'MESSAGE: (VALUE
'FACILITY:
(QUERY
all-facilitv))
Hnd-facility))
(CATEGORY: (CHOICES
true))
super-deluxe,
deluxe, expensive,
(DIRECTION: (VALUE-TYPE stringU
moderate, inexpensi\e)
(FACILITY: (IF-NEEDED
(QUERY
find facility))
(CREDIT-CARD (VALUE-TYPE siring)
(MULTIPLE- VALUE-FUNCTION true))
(SEASON-RATE (VALUE-TYPE string)'
(MULTIPLE- VALUE-FUNCTION true))
(LOCATION: (VALUE-TYPE string')
(COMMENT: (VALUE-TYPE: string!)
(PACKAGE: (VALUE-TYPE stringi)
(SERVICE: (IF-NEEDED Hnd-services)))
Figure 6
,
An
Figure
rind-category)l)
A Partial Representation
of the FODOR Object
7
Object Representation of
the ComDosite-Model
These messages trigger the AAA,
return
all facilities
HCIS,
it
that they
"know
of."
FODOR, and MASS database
After
ail
objects to
the facilities have been returned to
reconciles all the difTerences from the local results, formulate a composite-
answer, and then return the composite-answer
to
IFEP
for the user.
In
reconciliating the differences, heuristic rules are used to identify a hotel or resolve
contradictory information.
We
illustrate the
FODOR
message from HCIS, the
'facility
case at the database object level. In response to the
FODOR object executes
'(Logan Airport Hilton)."
This
the
command
command
''^*?/-o6;c'cf
'FODOR
triggers the following
asynchronous events:
•
Establish the necessary connection and send queries to the
to retrieve the facilities in
•
Execute the
command
to establish the
FODOR-Facility shown
"pei-o6_/tT?
FODOR
in
Figure
database
4.
'category '"^Logan Airport Hilton/"
necessary connection and send queries
20
FODOR
to the
FODOR database
the category of "Logan Airport Hilton." (The category turns out
to to retrieve
to
•
be "expensive.")
command "send-message
Execute the
'facility" to infer
CATEGORY
"expensive"
additional facilities.
'FODOR -CATEGORY 'expensive
The command triggers the FODOR-
object to retrieve the facilities implied by the category
bath or shower,
(i.e.,
TV and
phone
room,
in
restaurant, and bar), and return the implied facilities to the
•
Concatenate the
CATEGORY;
facilities
returned from the
FODOR
then return the concatenated results
Finally at the actual
retrieve the facilities in
FODOR DBMS
level, the
to
heating, A/C,
FODOR object.
database and
FODOR-
HCIS.
subqueries are executed
FODOR-Facility and the category
in
to
FODOR-Info given
"Logan Airport Hilton."
The implementation
of this prototype has provided us with experience
and
feedback on the effectiveness of using an object-oriented approach augmented with
rule-based inference engine to handle semantic reconciliation, and an opportunity to
sec
how we may
devise a layered approach
to
organize the existing knowledge and
accumulate new knowledge, as Pappert's Principle suggested.
We
conclude with the
following remarks.
4
Concluding Remarks
The major problems that need
to
be overcome in order to integrate disparate
databases for composite answers include contradiction, inconsistency, and
ambiguity.
We
have presented an approach
to resolve
these problems through
enhancing the semantic power of the database integration. This approach exploits
concepts
drawn from frame-based knowledge representation scheme and rule-based
iriferencing.
21
We
are actively researching connectivity in the data engineering context.
Features crucial to connectivity in CIS are being designed and implemented on an
AT&T 3B2
used
machine under the
to dial directly into
feasibility of a
UNIX
environment. The enhanced version will be
multiple on-line
DBMSs simultaneously
to
demonstrate the
system capable of attaining connectivity in a real environment with
satisfactory performance.
State-of-the-art approaches in formulating composite-answers are mostly
application dependent and ad-hoc in nature. This research
is
intended
to
provide a
theoretical foundation of connectivity in order to reconcile different assumptions
perspectives due to different mental models
be integrated.
22
embedded
in the different
and
databases
to
References
1.
Batini, C. Lenzerini,
M. and Navathe. S.B. "A Comparative Analysis of
Methodologies for Database Schema Integration," AC^I Computing Survevs
Vol. 18, No. 4, December 1986, pp. 323 - 363.
2.
.
Frank, W.F., Madnick, S.E., and Wang, Y.R. "A Conceptual Model for
Integrated Autonomous Processing: An International Bank's Experience with
Large Databases," to appear in the 8th Annual International Conference on
Information Svstems (ICIS) December 1987.
,
3.
Goldhirsch, D.. Landers, T., Rosenberg, R.. and Yedwab, L.
"MULTIBASE:
Svstem Administrator's Guide," Computer Corporation of America, November
1984.
4.
Heimbigner. D. and Mcleod D. "A Federated Architecture for Information
ACM Transactions on Office Information Systems Vol. 3, No. 3,
July 1985, pp. 253-278.
Management,"
5.
.
ACM
Hewitt, C. E. Office Are Open Systems.
Transactions on Office
Information Svstems Vol. 4, No. 3, (July 1986), pp. 271-287.
.
6.
Lam, C.Y. and Madnick,
in
information systems.
Composite Information Systems - a new concept
Sloan School of Management, MIT, May
S.E..
CISR
''»VT# 35,
1978.
7.
Levine, S., 'Interfacing Objects and Database," Master's Thesis,
Engineering and Computer Science, MIT, May 1987.
8.
Litwin,
9.
Lyngbaek. P. and McLeod D.. "An Approach to Object Sharing in Distributed
Database Systems," 9th International Conf. on VLDB October, 1983.
W. and
December 1986,
Electrical
"Multidatabase Interoperability." Computer
Abdellatif, A.
pp. 10-18.
.
.
10.
Minsky, M. "A Framework for Representing Knowledge," in Readings in
Knowledge Representation Bachman and Levesque (Eds.) Morgan Kaufman
.
Publishers. 1985.
11.
Minsky, M. "The Society Theory of Thinking." Proceedings of the Fifth
International Joint Conference on Artificial Intelligence Cambrdige, Mass.,
August 22-25, 1977.
.
New York, 1986
INFOPLEX database computer:
12.
Minsky, M. The Society of mind Simon and Schuster,
13.
Madnick,
14.
Madnick, S.E. and Wang, Y.R. "Evolution Towards Strategic Applications of
Databases Through Composite Information Systems." Working Paper #136687, Sloan School of Management, MIT, Submitted for publication.
15.
Integrating Heterogeneous Distributed
Smith, J.M. et al. "ML'LTIBASE
Database Systems," National Computer Conference 1981. pp. 487-499.
16.
Stefik.
.
S. E. and Wang, Y. R., Modeling the
Proceedings of the 6-th
a multiprocessor systems with unbalanced flows.
Advanced Database Symposium (August 1986), pp. 85-93.
-
M. and Bobrow. D.G. "Object-Oriented Programming: Themes and
Variations." The AI Magazine Winter 1936. Vol. 6. No. 4. pp. 40 62.
-
.
23
1313^09^
Date Due
P.
Lib-26-6';
MIT
1
lUliAKIES
lllllllllllllllllllllllllllllllllllll
3
TOflD ODM
T3b 172
Download