DISTRIBUTED DBMS - School of Information Technology

WHY NOT USE FEDERATED APPROACH FOR
DATABASE MANAGEMENT SYSTEM (DBMS)?
Yan Cui
ITK478
Position paper
CRUCIAL ISSUES IN ENTERPRISES
“…organizations merge or takeover since the
existing systems have been designed for different
corporate needs, the resulting enterprise will
have to face information inconsistency,
heterogeneity and incompatible overlap”.
Wijegunaratne, Fernandez and Valtoudis in [1]
 “…a large modern enterprise, it is also inevitable
that …use different database systems to store
and search their critical data. Competition,
evolving technology, mergers, acquisitions,
geographic distribution, and … decentralization
of growth…” Haas and Lin in [2]

DBMS APPROACHES
Compare two major approaches:
Federated database system approach
Distributed database system approach
Comparison of their architectures/designs,
transparency, integration, autonomy, and
other features.
DISTRIBUTED DBMS
Definition of distributed database (DDBS)
and distributed database management
system (distributed DBMS)
Conversion between centralized and distributed
databases
Distributed DBMS design

DISTRIBUTED DBMS (CONT)

Definition of distributed database (DDBS)
and distributed database management
system (distributed DBMS)
Distributed database – “a collection of multiple,
logically interrelated databases distributed over a
computer network” by M. Özsu and P. Valduriez in
[3].
Distributed DBMS – “the software system that
permits the management of the DDBS and makes the
distribution transparent to the users” by M. Özsu
and P. Valduriez in [3].

DISTRIBUTED DBMS (CONT)

Conversion between centralized and distributed
databases

A distributed DBMS offers “local autonomy,
improved performance, improved
reliability/availability, economics, expandability, and
shareability” [3].
Fig. 1 - Central Database on a Network [3]
Fig. 2 - DDBS Environment [3]
DISTRIBUTED DBMS (CONT)

Distributed DBMS design - F. A. Baião, M. Mattoso
and G. Zaverucha state in [4] that
“Distribution design involves making decisions on
the fragmentation and placement of data across
the sites of a computer network”.


Fragmentation
Allocation
DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation
Defined as “clustering fragments the information
accessed simultaneously by applications” [4].
 vertical fragmentation
 horizontal fragmentation
 mixed fragmentation

DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation

horizontal fragmentation - class instances are
distributed across fragments, so a horizontal
fragment of a class contains a subset of the whole
class extension [4]
Primary (round-robin, hash-partition, and range-partition)
Derived fragment

Fig.3 - Round-robin [5]
Fig. 4 - Hash-partition [5]
Fig. 5 - Range partition [5]
DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation

horizontal fragmentation

Derived fragment
Fig. 5 - Range partition [5]
DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Fragmentation
vertical fragmentation - attributes and methods are
distributed across fragments, such as fragment 1 (name,
GPA) and fragment 2 (address, bDate, picture) from
the student class in Fig. 7
mixed fragmentation – a combination of vertical and
horizontal fragmentation

Fig. 7 – Vertical fragmentation [5]
Fig. 8 – Mixed fragmentation [5]
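As a rough illustration of vertical fragmentation, the sketch below splits a student class into the two attribute groups mentioned on this slide. The `sid` key, the sample records, and the function names are assumptions added for the example; copying a key into every fragment is a common way to make the fragments rejoinable, not something the slide states.

```python
def vertical_fragment(rows, key, attribute_groups):
    """Split each row into one projection per attribute group.

    The key is copied into every fragment so rows can be rejoined later.
    """
    return [
        [{key: row[key], **{a: row[a] for a in group}} for row in rows]
        for group in attribute_groups
    ]


def reconstruct(fragments, key):
    """Rejoin vertical fragments on the shared key, ordered by key."""
    merged = {}
    for fragment in fragments:
        for part in fragment:
            merged.setdefault(part[key], {}).update(part)
    return [merged[k] for k in sorted(merged)]


# Hypothetical student objects; "sid" is an assumed identifier.
students = [
    {"sid": 1, "name": "Ann", "GPA": 3.9,
     "address": "1 Elm St", "bDate": "1985-02-01", "picture": "ann.png"},
    {"sid": 2, "name": "Bob", "GPA": 2.7,
     "address": "9 Oak Ave", "bDate": "1986-07-15", "picture": "bob.png"},
]

frag1, frag2 = vertical_fragment(
    students, "sid", [["name", "GPA"], ["address", "bDate", "picture"]]
)
restored = reconstruct([frag1, frag2], "sid")
```

A mixed fragmentation would apply a horizontal split to the rows of one of these vertical fragments, or the reverse.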
DISTRIBUTED DBMS (CONT)

Distributed DBMS design – Allocation

Allocation, as described by M. Özsu and P. Valduriez in [3],
distributes all resources/fragments across the nodes/sites of a
computer network.
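One simple way to picture allocation is a greedy rule that places each fragment at the site that uses it most. This is only an illustrative heuristic, not the allocation method of [3]; the fragment names, site names, and access counts are hypothetical.

```python
def allocate(access_counts):
    """Place each fragment at the site that accesses it most often.

    access_counts maps fragment name -> {site name: number of accesses}.
    Ties are broken by site name so the result is deterministic.
    """
    placement = {}
    for fragment, by_site in access_counts.items():
        placement[fragment] = min(by_site, key=lambda s: (-by_site[s], s))
    return placement


# Hypothetical access statistics (not from the source).
access_counts = {
    "student_grades": {"registrar": 120, "housing": 5},
    "student_contact": {"registrar": 30, "housing": 90},
}

placement = allocate(access_counts)
```

A realistic allocation would also weigh storage limits, update costs, and replication, which this sketch ignores.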
FEDERATED DBMS
Definition
A federated DBMS links data sources into a unified
system, even when they come from heterogeneous DBMSs,
sit at different locations, and hold related or unrelated,
structured or unstructured data (L.M. Haas, E.T.
Lin and M.A. Roth in [6]).
Characteristics of federated DBMS
Transparency, heterogeneity, a high degree of
function, extensibility, openness, autonomy, and
optimized performance [2,6].
FEDERATED DBMS

DB2 architecture for database federation
user-defined function (UDF) (Scalar and Table
UDFs)
 Wrapper

Fig. 9 – DB2 architecture of database federation [6]
FEDERATED DBMS

DB2 architecture for database federation

UDF - takes input parameters and returns either a
scalar result or a table of data.
Scalar UDF - is invoked from an SQL statement and returns a
scalar result.
Table UDF - the other kind, which returns a table that can be
referenced in an SQL statement.

SELECT db2mq.mqsend(a.headline)
FROM Articles a
WHERE a.article_timestamp >= CURRENT TIMESTAMP
Example 1 - Scalar UDF [6]
SELECT a.first, a.last, a.phone, a.email
FROM TABLE(addressbook()) AS a,
     Company_Profiles c
WHERE c.industry = 'FINANCIAL' AND
      c.revenue > 50000000 AND
      c.name = a.company_name
Example 2 - Table UDF [6]
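The scalar UDF idea can be tried outside DB2 with Python's built-in sqlite3 module, which lets an application register its own scalar function. This is only an analogy: the `mqsend` function below is a hypothetical stand-in that records messages in a list, not the DB2 `db2mq.mqsend` function, and the table contents are invented.

```python
import sqlite3

sent = []  # stands in for a message queue (assumption for this sketch)


def mqsend(message):
    """Hypothetical stand-in for a send function: record the message, return 1."""
    sent.append(message)
    return 1


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument scalar SQL function.
conn.create_function("mqsend", 1, mqsend)

conn.execute("CREATE TABLE articles (headline TEXT, article_timestamp TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("Old news", "2007-01-01 00:00:00"),
        ("Fresh news", "2007-10-01 00:00:00"),
    ],
)

# The UDF is evaluated once for each row that passes the WHERE clause.
results = conn.execute(
    "SELECT mqsend(a.headline) FROM articles a WHERE a.article_timestamp >= ?",
    ("2007-06-01 00:00:00",),
).fetchall()
conn.close()
```

The query returns one scalar per qualifying row, and the side effect (the recorded message) happens inside the database engine's evaluation of the statement.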
FEDERATED DBMS

DB2 architecture for database federation

Wrapper - described in [6] as a “powerful and flexible infrastructure
for federation” because it integrates both scalar
UDF functions and table UDF data.
SELECT c.name, a.URL
FROM Compounds c, Experiments e, Articles a
WHERE e.result < 1.1e-p AND e.id = c.id AND search(a.subject, c.name) > 0
Example 3 – Wrapper [6]
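The wrapper idea, one uniform interface over unlike sources, can be sketched in a few lines of Python. This is not the DB2 wrapper architecture of [6]: the class names, the `rows()` interface, the sample data, and the join helper are assumptions chosen to show how a query can span a relational table and a flat file without the caller knowing the difference.

```python
import csv
import io
import sqlite3


class SqliteWrapper:
    """Exposes one table of a SQLite database as a list of dict rows."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def rows(self):
        cursor = self.conn.execute(f"SELECT * FROM {self.table}")
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, r)) for r in cursor]


class CsvWrapper:
    """Exposes CSV text through the same rows() interface."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return list(csv.DictReader(io.StringIO(self.text)))


def federated_join(left, right, left_key, right_key):
    """Join rows from two wrapped sources on equal key values."""
    index = {}
    for r in right.rows():
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left.rows() for r in index.get(l[left_key], [])]


# Source 1: a relational table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_profiles (name TEXT, industry TEXT)")
conn.executemany(
    "INSERT INTO company_profiles VALUES (?, ?)",
    [("Acme", "FINANCIAL"), ("Bolt", "RETAIL")],
)

# Source 2: a flat file with a different layout (hypothetical data).
contacts_csv = (
    "company_name,email\n"
    "Acme,ann@example.com\n"
    "Bolt,bob@example.com\n"
    "Acme,cho@example.com\n"
)

joined = federated_join(
    SqliteWrapper(conn, "company_profiles"),
    CsvWrapper(contacts_csv),
    "name",
    "company_name",
)
financial_emails = sorted(
    r["email"] for r in joined if r["industry"] == "FINANCIAL"
)
```

Adding another kind of source only requires another class with a `rows()` method; the join code stays unchanged, which is the extensibility a wrapper layer is meant to provide.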
COMPARISON TABLE
Transparency
Distributed DBMS: Less transparent, because the distributed
databases must be interrelated through a communication
network and each site holds its own database.
Users or applications therefore need to know how
to interact with the database system.
Federated DBMS: Very transparent, because it masks from the
user the differences, idiosyncrasies, and
implementations of the underlying data
sources [2]. Users therefore do not need
to be aware of location, invocation, dialect,
fragmentation, etc.
Heterogeneity
Distributed DBMS: Heterogeneity is very hard to handle if multiple
databases are not interrelated and sit on different networks.
Federated DBMS: Can handle different hardware, network
protocols, software, query languages, and data
models.
Autonomy
Distributed DBMS: Local autonomy, because each department has
the authority to manage its own data.
Federated DBMS: Does not disturb local operations; data need not
be moved or modified, and existing
applications/interfaces remain in place.
Data integration
Distributed DBMS: Hard if the sites do not share network protocols,
run multiple DBMSs, or are not interrelated. It also increases
the cost and traffic of queries.
Federated DBMS: Can easily integrate data from different
protocols and DBMSs using wrappers.
Database access
Distributed DBMS: Can be accessed using ODBC, JDBC, etc., as adapters.
Each adapter may differ by database
system: Oracle uses Oracleadapter, SQL uses
SQLadapter, and Access uses OLEadapter. Each
programming language has its own embedded SQL.
Federated DBMS: Uses Xperanto as a middleware layer to
access any DBMS with a simple
programming model. An application can push
XML as a standard SQL statement for
various query executions.
Other features
Distributed DBMS: Economical; reflects organizational structure.
Federated DBMS: A high degree of function, extensibility and
openness of the federation, optimized
performance.
CONCLUSION/POSITION
The disadvantages of a distributed DBMS are
complexity, cost, and difficulty in maintaining data
integration and database access [3].
A federated database system provides
transparency, autonomy, optimized performance,
accessibility, and a standard query interface across
multiple DBMSs.
It is an efficient way to integrate multiple DBMSs when
enterprises merge or use different DBMSs,
and it provides efficient data sharing and processing
throughout the enterprise.

REFERENCES

[1] I. Wijegunaratne, G. Fernandez, J. Valtoudis. 2000. “A Federated Architecture
for Enterprise Data Integration”. 2000 Australian Software Engineering Conference.
Retrieved September 12, 2007 from
http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=787253&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344.
[2] L. Haas, E. Lin. 2002. “IBM Federated Database Technology”. IBM.
Retrieved September 10, 2007 from
http://www.ibm.com/developerworks/db2/library/techarticle/0203haas/0203haas.html.
[3] M. Özsu and P. Valduriez. Principles of Distributed Database Systems, 2nd
edition (1st edition 1991). New Jersey, Prentice-Hall, 1999.
[4] F.A. Baião, M. Mattoso, G. Zaverucha. 1998. “Towards an Inductive Design of
Distributed Object Oriented Databases”. Proceedings of the 3rd IFCIS International
Conference on Cooperative Information Systems, p. 188-197, August 20-22. Retrieved
September 28, 2007 from
http://csdl.computer.org/dl/proceedings/coopis/1998/8380/00/83800188.pdf.
[5] F. Baião, M. Mattoso, G. Zaverucha. “An Algorithm for the Design of Distributed
Object Databases”. PowerPoint. Retrieved September 14, 2007 from
http://wwwdb.cs.wisc.edu/dbseminar/spring00/talks/fernanda_slides.pdf.
[6] L.M. Haas, E.T. Lin, M.A. Roth. 2002. “Data integration through database
federation”. IBM Systems Journal, Volume 41, Issue 4. Retrieved October 1, 2007
from http://www.research.ibm.com/journal/sj/414/haas.pdf.
QUESTIONS?