Why Not Use a Federated Approach for Database Management Systems (DBMS)?
Position paper
Yan Cui
ITK478

Introduction:
Across information technology today, database management systems (DBMS) are increasingly popular and are used everywhere from small shops to large corporations. The main purpose of adopting database technology is to store and retrieve data quickly and efficiently. Most relational database management systems can now hold large volumes of data with meaningful relationships, and they provide well-structured methods for managing confidential business information.

However, several crucial issues arise when enterprises use a DBMS. As Wijegunaratne, Fernandez and Valtoudis point out in [1], “…organizations merge or takeover since the existing systems have been designed for different corporate needs, the resulting enterprise will have to face information inconsistency, heterogeneity and incompatible overlap”. A second issue, discussed by Haas and Lin in [2], is that “…a large modern enterprise, it is also inevitable that …use different database systems to store and search their critical data. Competition, evolving technology, mergers, acquisitions, geographic distribution, and … decentralization of growth…”

Based on these problems, this paper compares two major approaches to database systems, federated and distributed, and argues that the federated approach is the best solution and of greatest benefit to large enterprises.

Distributed database system:
Distributed and parallel processing in database management systems is an efficient way of improving the performance of applications that manipulate large volumes of data [3]. New applications can therefore run efficiently even when they must process a huge number of data transactions. Parallel techniques also benefit from data partitioning.
This may be accomplished by eliminating access to irrelevant data during query execution and by reducing the data exchanged among sites, which are the two main goals of distributed database design [4]. Distribution design involves making decisions about the fragmentation and placement of data across the sites of a computer network [4]. In a top-down approach, distribution design has two phases: fragmentation and allocation [3].

Fragmentation of OODBs:
The fragmentation process in object-oriented databases (OODBs) – To fragment a class, two basic techniques can be used: horizontal fragmentation and vertical fragmentation [3]. In object databases, horizontal fragmentation distributes class instances across the fragments, so a horizontal fragment of a class contains a subset of the whole class extension. Vertical fragmentation (VF), on the other hand, breaks up the logical structure of the class (its attributes and methods) and distributes the pieces across the fragments [3].
Heuristics for the fragmentation process in OODBs – A set of heuristics helps the distribution designer fragment an object-oriented database, thereby contributing to better performance of data access operations in the Cooperative Information System [6].

Allocation of OODBs:
In a distributed database system, a query requires data to be accessed from one or more sites. The cost of executing the query depends on the location of the query as well as the location of the data. Specifically, the data locality of a query determines the amount of data transfer incurred in processing it: the higher the data locality, the lower the data transfer cost. This leads to the data allocation problem, whose aim is to increase data locality. Given a set of queries accessing a set of data fragments, a data allocation algorithm assigns the fragments to the sites of the distributed database system so as to minimize the total data transfer cost of processing the queries [8].
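The allocation problem described above can be illustrated with a minimal Python sketch. It is not the evolutionary algorithm of [8]. It assumes a simplified cost model (query frequency times fragment size times a per-link cost) and no storage limits at any site, in which case each fragment can be placed independently at its cheapest site. The site names, fragment names, sizes, and costs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    site: str          # site where the query is issued
    frequency: int     # executions per unit of time
    fragments: tuple   # names of the fragments the query reads


def transfer_cost(fragment, site, queries, sizes, link_cost):
    """Cost of shipping `fragment` from `site` to every query that reads it."""
    return sum(
        q.frequency * sizes[fragment] * link_cost[(site, q.site)]
        for q in queries
        if fragment in q.fragments
    )


def allocate(sizes, sites, queries, link_cost):
    """Place each fragment at the site with the lowest total transfer cost.

    Assumption: sites have unlimited capacity, so fragments do not compete
    for space and can be placed one at a time.
    """
    return {
        f: min(sites, key=lambda s: transfer_cost(f, s, queries, sizes, link_cost))
        for f in sizes
    }


# Hypothetical two-site network; local access is free, remote access costs 1.
SITES = ["A", "B"]
SIZES = {"customers_eu": 10, "customers_us": 20}
LINK_COST = {("A", "A"): 0, ("B", "B"): 0, ("A", "B"): 1, ("B", "A"): 1}
QUERIES = [
    Query("A", 5, ("customers_eu",)),
    Query("B", 1, ("customers_eu", "customers_us")),
    Query("B", 3, ("customers_us",)),
]

placement = allocate(SIZES, SITES, QUERIES, LINK_COST)
total_cost = sum(
    transfer_cost(f, s, QUERIES, SIZES, LINK_COST) for f, s in placement.items()
)
print(placement, total_cost)
```

Each fragment ends up at the site that queries it most heavily, which is the data locality argument in miniature. With capacity limits or replication the placements interact, and search methods such as those in [8] become relevant.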
Federated database system:
The federated architectural form is a hybrid architecture, which clusters highly interdependent modules and applications (possibly using request/reply and conversational means of inter-module communication via method invocations) into domains, while attempting to minimize the strength of the interconnection between the domains [1]. Its capabilities enable unified access to any digital information, in any format, structured or unstructured, in any information store [2]. Reviewing the characteristics of the federated solution (transparency, heterogeneity, a high degree of function, extensibility and openness of the federation, autonomy for data sources, and optimized performance) gives a better understanding of this kind of database system [2]. Two case studies then demonstrate the effective adoption of federated systems.

Characteristics of federated solution
Transparency – Users should not need to know where the data is stored, what language or programming interface each data source supports, what dialect of SQL it uses, whether the data is partitioned and/or replicated, or what networking protocols are used. They should see only a single set of error codes.
Heterogeneity – The federated database accommodates the differences among its data sources, bringing dissimilar systems together in a seamless, transparent federation.
A high degree of function – IBM's federated database provides the full function of its rich, standards-compliant DB2 SQL against all the data in the federation, as well as all the function of the underlying data sources.
Extensibility and openness of the federation – The federated database engine accesses sources via a software component known as a wrapper. Access to a new type of data source is added by acquiring or creating a wrapper for that source. IBM also supports the ANSI SQL/MED standard, so any wrapper written to the SQL/MED interface can be used with IBM’s federated database.
Autonomy for data sources – The federated database does not disturb the local operation of an existing data source: no data is moved or modified, and existing interfaces remain unchanged.
Optimized performance – The optimizer is the component of a relational database management system that determines the best way to execute each query. It decides whether each operation in a query should be performed by the federated server or by the source where the data is stored, in what order the operations should run, and which implementations to use for the local portions of the query.

Case study: Xperanto
The goal of the Xperanto project is to serve as a middleware layer that supports publishing XML data to users who want to work with XML rather than SQL. It provides a uniform, XML-based query interface over an object-relational database that allows users to query and restructure the contents of the database as XML data, ignoring the underlying SQL tables and query language [5]. Fig. 1 shows how Xperanto uses XML (Extensible Markup Language), which is emerging as the universal format for publishing and exchanging data over the World Wide Web.
(Fig. 1)

Xperanto architecture
Xperanto is organized into four major software components, which are further broken down into smaller logical subcomponents. As shown in Fig. 2, the major components are Query Translation, XML View Services, the XML Schema Generator, and the XML Tagger. The core of Xperanto, and the primary focus of [5], is the Query Translation component. This component translates from the XML query language used by clients (XML-QL at the time [5] was written) into the appropriate dialect of SQL for the underlying object-relational DBMS.
(Fig. 2)

Xperanto XML schema mapping
One of the goals of Xperanto is to allow XML developers to publish object-relational data in XML form without having to deal with the database system’s native schema or SQL query dialect.
XML Schema was designed to supplant the XML DTD, adding important features such as data types, value constraints, inheritance, and foreign key information. The schema mapping starts from the object-relational database schema, in which database designers can define new data types and complex object structures. A fragment of XML Schema then describes the default XML view of that object-relational schema.

Xperanto query processing and XML document construction
Once Xperanto publishes a default XML view of an object-relational database, users can pose queries and define more complex views using an XML query language. Processing consists of XML query rewriting, SQL generation, and XML document construction.

Case study: Australian Gas & Light Company (AGL)
The federated architecture established at AGL is an example of a successful architecture for data integration: it makes enterprise-wide access possible while minimizing disruption to existing components. In this architecture, the interaction between modules of a distributed system is characterized by the type of dependency between them. The considerations include processing dependency, information dependency, types of events, inter-domain traffic, and abnormal conditions.
(Fig. 3)

The AGL data integration:
Under this integration, the federation ignores the particulars within a domain. It knows nothing beyond the business services made available to it, and a domain does not know of the existence of other fragments of the same entity in other domains of the federation. An enterprise software architecture such as the federation described here makes possible the flow of information between applications belonging to different business units. When implementing enterprise-wide computing, a federated approach minimizes disruption to existing systems, since it isolates application clusters in terms of processing and provides them with consistent, stable interfaces to the rest of the organization [1].
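The federated idea running through this section (sources reached only through wrappers, and a federated layer that decides where each operation runs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IBM's or AGL's implementation: the class names `SqlWrapper`, `ListWrapper`, and `Federation`, the `can_filter` flag, and the customer and meter data are all hypothetical. An in-memory SQLite database stands in for a relational source and a plain list for a non-relational one.

```python
import sqlite3


class SqlWrapper:
    """Hypothetical wrapper for a relational source that can filter rows itself."""

    can_filter = True

    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def fetch(self, column=None, value=None):
        # Table and column names are trusted here; only the value is a parameter.
        sql = f"SELECT * FROM {self.table}"
        params = ()
        if column is not None:
            sql += f" WHERE {column} = ?"
            params = (value,)
        cur = self.conn.execute(sql, params)
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur]


class ListWrapper:
    """Hypothetical wrapper for a source with no query capability of its own."""

    can_filter = False

    def __init__(self, records):
        self.records = records

    def fetch(self, column=None, value=None):
        return [dict(r) for r in self.records]  # always a full scan


class Federation:
    """Single access point; callers never see which kind of source answers."""

    def __init__(self):
        self.sources = {}

    def register(self, nickname, wrapper):
        self.sources[nickname] = wrapper

    def select(self, nickname, column, value):
        wrapper = self.sources[nickname]
        if wrapper.can_filter:
            # Push the filter down to the source.
            return wrapper.fetch(column, value)
        # Otherwise compensate by filtering in the federated layer.
        return [r for r in wrapper.fetch() if r[column] == value]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EU"), (2, "US")])

fed = Federation()
fed.register("customers", SqlWrapper(conn, "customers"))
fed.register("meters", ListWrapper([
    {"customer_id": 1, "reading": 42},
    {"customer_id": 2, "reading": 7},
]))

eu_customers = fed.select("customers", "region", "EU")
readings = fed.select("meters", "customer_id", eu_customers[0]["id"])
print(eu_customers, readings)
```

Neither source is copied or altered, and adding a third source only requires registering another wrapper. A real optimizer would choose between pushdown and local evaluation by estimated cost rather than by a single flag.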
Conclusion:
As discussed above, a distributed database is under the control of a central DBMS, and its storage devices are not all attached to a common CPU. That central DBMS is a single, distinct product, such as Oracle or another SQL-based DBMS. Its disadvantages are complexity, cost, the difficulty of maintaining data integrity, and database access. Complexity arises because extra work is required to make the distributed database transparent, to maintain multiple database systems, and to account for the disconnected nature of the database. Cost is higher because the database is spread across different locations and infrastructure, which requires extra labor. Data integrity is difficult to maintain because enforcing integrity over a network may require too much of the network's resources to be feasible. Database access is harder because different databases require different drivers, such as JDBC or ODBC, before applications can reach them.

By contrast, a federated database system provides transparency, autonomy, optimized performance, accessibility, and a standard query interface across multiple DBMSs. It is also an efficient way to solve the data sharing and processing problems across the enterprise that were raised at the beginning of this paper.

References:
[1] Inji Wijegunaratne, George Fernandez, John Valtoudis. 2000. “A Federated Architecture for Enterprise Data Integration”, 2000 Australian Software Engineering Conference. Retrieved September 12, 2007. (http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=787253&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344)
[2] Laura Haas, Eileen Lin. 2002. “IBM Federated Database Technology”, IBM. Retrieved September 10, 2007. (http://www.ibm.com/developerworks/db2/library/techarticle/0203haas/0203haas.html)
[3] F. Baião, M. Mattoso, G. Zaverucha. 2002. “A Framework for the Design of Distributed Databases”, Proceedings in Informatics 14, Distributed Data & Structures 4: Records of the 4th International Meeting, W. Litwin and G. Lévy (Eds.), Carleton Scientific, pp. 29-36.
[4] M. Özsu, P. Valduriez. 1999. Principles of Distributed Database Systems, 2nd edition (1st edition 1991), Prentice-Hall, New Jersey.
[5] Michael J. Carey, Jerry Kiernan. 2000. “XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents”, 26th International Conference on Very Large Data Bases (VLDB). Retrieved September 13, 2007. (http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=671862&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344)
[6] Fernanda Araujo Baião, Marta Mattoso, Gerson Zaverucha. 1998. “Towards an Inductive Design of Distributed Object Oriented Databases”, Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems, pp. 188-197, August 20-22.
[7] Ishfaq Ahmad, Kamalakar Karlapalem, Yu-Kwong Kwok, Siu-Kai So. 2002. “Evolutionary Algorithms for Allocating Data in Distributed Database Systems”, Distributed and Parallel Databases, vol. 11, no. 1, pp. 5-32, January.
[8] Ishfaq Ahmad, Kamalakar Karlapalem, Yu-Kwong Kwok, Siu-Kai So. 2002. “Evolutionary Algorithms for Allocating Data in Distributed Database Systems”, Distributed and Parallel Databases, vol. 11, pp. 5-32.