WHY NOT USE A FEDERATED APPROACH FOR DATABASE MANAGEMENT SYSTEMS (DBMS)?
Yan Cui
ITK478 Position paper

CRUCIAL ISSUES IN ENTERPRISES
“…organizations merge or takeover since the existing systems have been designed for different corporate needs, the resulting enterprise will have to face information inconsistency, heterogeneity and incompatible overlap”. Wijegunaratne, Fernandez and Valtoudis in [1]
“…a large modern enterprise, it is also inevitable that …use different database systems to store and search their critical data. Competition, evolving technology, mergers, acquisitions, geographic distribution, and … decentralization of growth…” Haas and Lin in [2]

DBMS APPROACHES
Compare two major approaches
  Federated database system approach
  Distributed database system approach
Comparison of their architectures/designs, transparency, integration, autonomy, and others.

DISTRIBUTED DBMS
Definition of distributed database (DDBS) and distributed database management system (distributed DBMS)
Centralized and distributed databases conversion
Distributed DBMS design

DISTRIBUTED DBMS (CONT)
Definition of distributed database (DDBS) and distributed database management system (distributed DBMS)
  Distributed database - “a collection of multiple, logically interrelated databases distributed over a computer network”, M. Özsu and P. Valduriez in [3]
  Distributed DBMS - “the software system that permits the management of the DDBS and makes the distribution transparent to the users”, M. Özsu and P. Valduriez in [3]

DISTRIBUTED DBMS (CONT)
Centralized and distributed databases conversion
  Compared with a centralized database, a distributed DBMS offers “local autonomy, improved performance, improved reliability/availability, economics, expandability, and shareability” [3].
Fig. 1 - Central Database on a Network [3]
Fig. 2 - DDBS Environment [3]

DISTRIBUTED DBMS (CONT)
Distributed DBMS design
  F. A. Baião, M. Mattoso and G. Zaverucha in [4]: “Distribution design involves making decisions on the fragmentation and placement of data across the sites of a computer network”
  Fragmentation
  Allocation

DISTRIBUTED DBMS (CONT)
Distributed DBMS design - Fragmentation
  Defined as “clustering fragments the information accessed simultaneously by applications” [4].
  vertical fragmentation
  horizontal fragmentation
  mixed fragmentation

DISTRIBUTED DBMS (CONT)
Distributed DBMS design - Fragmentation
  horizontal fragmentation - class instances are distributed across fragments, so each horizontal fragment of a class contains a subset of the whole class extension [4]
    Primary (round-robin, hash-partition, and range-partition)
    Derived fragment
Fig. 3 - Round-robin [5]
Fig. 4 - Hash-partition [5]
Fig. 5 - Range partition [5]

DISTRIBUTED DBMS (CONT)
Distributed DBMS design - Fragmentation
  horizontal fragmentation
    Derived fragment
Fig. 5 - Range partition [5]

DISTRIBUTED DBMS (CONT)
Distributed DBMS design - Fragmentation
  vertical fragmentation - attributes and methods are distributed across fragments, as in fragment 1 (name, GPA) and fragment 2 (address, bDate, picture) from the student class in Fig. 7
  mixed fragmentation - combination of vertical and horizontal fragmentation
Fig. 7 - Vertical fragmentation [5]
Fig. 8 - Mixed fragmentation [5]

DISTRIBUTED DBMS (CONT)
Distributed DBMS design - Allocation
  Allocation distributes all resources/fragments across the nodes/sites of a computer network (M. Özsu and P. Valduriez in [3]).

FEDERATED DBMS
Definition
  Data sources from heterogeneous DBMSs, different locations, and relational/non-relational, structured/unstructured data are federated and linked together into a unified system by the DBMS (L.M. Haas, E.T. Lin and M.A. Roth in [6]).
Characteristics of federated DBMS
  transparency, heterogeneity, a high degree of function, extensibility, openness, autonomy, and optimized performance [2,6].
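
As a rough illustration of the federated definition above, the sketch below puts two heterogeneous sources (an in-memory SQLite table and a plain list of records) behind one common interface and joins them with a single function, so the caller never sees where the rows live. This is a minimal sketch under stated assumptions, not the architecture described in [6]: the names `SourceWrapper`, `SqliteWrapper`, `RecordListWrapper`, `federated_join`, and the `customers`/`orders` data are all hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Protocol


class SourceWrapper(Protocol):
    """Hypothetical common interface: each source yields its rows as dicts."""

    def rows(self) -> Iterable[Dict[str, object]]: ...


class SqliteWrapper:
    """Wraps a relational source (here, one table in a SQLite database)."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table  # assumed to be a trusted, hard-coded table name

    def rows(self) -> Iterator[Dict[str, object]]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        names = [col[0] for col in cur.description]
        for record in cur:
            yield dict(zip(names, record))


class RecordListWrapper:
    """Wraps a non-relational source (here, a plain list of records)."""

    def __init__(self, records: List[Dict[str, object]]) -> None:
        self.records = records

    def rows(self) -> Iterator[Dict[str, object]]:
        return iter(self.records)


def federated_join(left: SourceWrapper, right: SourceWrapper,
                   left_key: str, right_key: str) -> List[Dict[str, object]]:
    """Hash join over two wrapped sources, independent of where rows are stored."""
    index: Dict[object, List[Dict[str, object]]] = {}
    for r in right.rows():
        index.setdefault(r[right_key], []).append(r)
    joined = []
    for l in left.rows():
        for r in index.get(l[left_key], []):
            joined.append({**l, **r})
    return joined


def demo() -> List[str]:
    # Source 1: a relational table (hypothetical sample data).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [("Acme", "EAST"), ("Bolt", "WEST")])
    # Source 2: records held outside any DBMS (hypothetical sample data).
    orders = [
        {"order_id": "A-1", "customer_name": "Acme"},
        {"order_id": "B-7", "customer_name": "Bolt"},
    ]
    joined = federated_join(SqliteWrapper(conn, "customers"),
                            RecordListWrapper(orders),
                            "name", "customer_name")
    return [row["order_id"] for row in joined if row["region"] == "EAST"]
```

Calling `demo()` joins the two sources and keeps only the orders of customers in the EAST region. A real federated DBMS would additionally push predicates down to each source and optimize the join order, which this sketch does not attempt.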

FEDERATED DBMS
DB2 architecture for database federation
  User-defined function (UDF) (scalar and table UDFs)
  Wrapper
Fig. 9 - DB2 architecture of database federation [6]

FEDERATED DBMS
DB2 architecture for database federation
  UDF - takes input parameters and returns either a scalar result or a table of data.
  Scalar UDF - takes input values and returns a single scalar result.
  Table UDF - returns a table as output, which an SQL statement can reference in its FROM clause.

    SELECT db2mq.mqsend(a.headline)
    FROM Articles a
    WHERE a.article_timestamp >= CURRENT TIMESTAMP
  Example 1 - Scalar UDF [6]

    SELECT a.first, a.last, a.phone, a.email
    FROM TABLE(addressbook()) AS a, Company_Profiles c
    WHERE c.industry = 'FINANCIAL' AND c.revenue > 50000000 AND c.name = a.company_name
  Example 2 - Table UDF [6]

FEDERATED DBMS
DB2 architecture for database federation
  Wrapper - described as a “powerful and flexible infrastructure for federation” in [6] because it integrates both scalar UDF functions and table UDF data.

    SELECT c.name, a.URL
    FROM Compounds c, Experiments e, Articles a
    WHERE e.result < 1.1e-p AND e.id = c.id AND search(a.subject, c.name) > 0
  Example 3 - Wrapper [6]

COMPARISON TABLE
Comparison: Transparency
  Distributed DBMS: Less transparent. The distributed databases must be interrelated through a communication network and each site holds its own database, so users or applications need to know how to interact with the database system.
  Federated DBMS: Transparent. It masks from the user the differences, idiosyncrasies, and implementations of the underlying data sources [2], so users need not be aware of location, invocation, dialect, fragmentation, etc.
Comparison: Heterogeneity
  Distributed DBMS: Very hard to handle heterogeneity if multiple databases are not interrelated and are on different networks.
  Federated DBMS: Can handle different hardware, network protocols, software, query languages, and data models.
Comparison: Autonomy
  Distributed DBMS: Local autonomy, because each department has authority to manage its own data.
  Federated DBMS: Does not disturb local operation; data is not moved or modified, and applications/interfaces remain the same.
Comparison: Data integration
  Distributed DBMS: Hard if the network protocols differ, multiple DBMSs are involved, and the databases are not interrelated. It also increases query cost and traffic.
  Federated DBMS: Easy to integrate data from different protocols and DBMSs using wrappers.
Comparison: Database access
  Distributed DBMS: Accessed through adapters such as ODBC, JDBC, etc. Each adapter may differ by database system: Oracle uses Oracleadapter, SQL uses SQLadapter, and Access uses OLEadapter. Each programming language has its own embedded SQL.
  Federated DBMS: Uses Xperanto as a middleware layer to access any DBMS with a simple programming model. Applications can push XML as a standard SQL statement for query execution.
Comparison: Other features
  Distributed DBMS: Economic; reflects organizational structure.
  Federated DBMS: A high degree of function, extensibility and openness of the federation, optimized performance.

CONCLUSION/POSITION
  The disadvantages of a distributed DBMS are complexity, cost, and difficulty in maintaining data integration and database access [3].
  A federated database system provides transparency, autonomy, optimized performance, accessibility, and a standard way to query across multiple DBMSs.
  It is an efficient way to integrate multiple DBMSs when enterprises merge or use different DBMSs, and it provides efficient data sharing and processing throughout the enterprise.

REFERENCES
[1] I. Wijegunaratne, G. Fernandez, J. Valtoudis. 2000. “A Federated Architecture for Enterprise Data Integration”, 2000 Australian Software Engineering Conference. Retrieved September 12, 2007. (http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=787253&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344)
[2] Laura Haas, Eileen Lin. 2002. “IBM Federated Database Technology”, IBM. Retrieved September 10, 2007. (http://www.ibm.com/developerworks/db2/library/techarticle/0203haas/0203haas.html)
[3] M. Özsu and P. Valduriez, Principles of Distributed Database Systems, 2nd edition (1st edition 1991), New Jersey, Prentice-Hall, 1999.
[4] F.A. Baião, M. Mattoso, G. Zaverucha. 1998. “Towards an Inductive Design of Distributed Object Oriented Databases”. Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems, p. 188-197, August 20-22. Retrieved September 28, 2007 from http://csdl.computer.org/dl/proceedings/coopis/1998/8380/00/83800188.pdf.
[5] F. Baião, M. Mattoso, G. Zaverucha. “An Algorithm for the Design of Distributed Object Databases”, PowerPoint. Retrieved September 14, 2007 from http://wwwdb.cs.wisc.edu/dbseminar/spring00/talks/fernanda_slides.pdf.
[6] L.M. Haas, E.T. Lin, M.A. Roth. 2002. “Data integration through database federation”. IBM Systems Journal, Volume 41, Issue 4. Retrieved October 1, 2007 from http://www.research.ibm.com/journal/sj/414/haas.pdf.

QUESTIONS?