Why not use a federated approach for database management systems (DBMSs)?
Position paper
Yan Cui
ITK478
Introduction:
In today's information technology landscape, database management systems (DBMSs) have become increasingly widespread, used by everything from small shops to large corporations. The main purpose of adopting database technology is to store and retrieve data quickly and efficiently. Most current relational database management systems can hold large volumes of data with meaningful relationships and provide well-structured methods for managing confidential business information. However, several crucial issues arise when DBMSs are used across an enterprise. As Wijegunaratne, Fernandez and Valtoudis [1] point out, “…organizations merge or takeover since the existing systems have been designed for different corporate needs, the resulting enterprise will have to face information inconsistency, heterogeneity and incompatible overlap”. A second issue is described by Haas and Lin [2]: “…a large modern enterprise, it is also inevitable that …use different database systems to store and search their critical data. Competition, evolving technology, mergers, acquisitions, geographic distribution, and … decentralization of growth…”
In light of these problems, this paper compares two major approaches to database systems, distributed and federated, to show why the federated approach is the best solution and how it benefits large enterprises.
Distributed database system:
Distributed and parallel processing on database management systems (DBMSs) is an efficient way of improving the performance of applications that manipulate large volumes of data [3]. In other words, applications can handle a very large volume of data transactions efficiently. Parallel techniques also benefit from data partitioning. Performance gains come from reducing the irrelevant data accessed during query execution and from reducing data exchange among sites, which are the two main goals of distributed database design [4]. Distribution design involves deciding how data is fragmented and where the fragments are placed across the sites of a computer network [4]. In a top-down approach, distribution design has two phases: fragmentation and allocation [3].
Fragmentation of object-oriented databases (OODBs):
• The Fragmentation Process in OODBs – To fragment a class, it is possible to use two basic techniques: horizontal fragmentation and vertical fragmentation [3]. In object databases, horizontal fragmentation distributes class instances across the fragments. Thus, a horizontal fragment of a class contains a subset of the whole class extension. On the other hand, vertical fragmentation (VF) breaks the class logical structure (its attributes and methods) and distributes them across the fragments [3].
• Heuristics for the Fragmentation Process in OODBs – A set of heuristics helps the distribution designer perform the fragmentation of an object-oriented database, contributing to better performance of data access operations in a cooperative information system [6].
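The two fragmentation techniques can be illustrated with a minimal sketch. The Employee class, its attributes, the department predicates and the attribute groups are all hypothetical; only attributes (not methods) are fragmented, and the heuristics of [6] for choosing fragments are not implemented. The vertical fragments each repeat the object key so that the original objects can be rebuilt by joining them.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    # Hypothetical class; only its attributes are fragmented here, not its methods.
    emp_id: int
    name: str
    dept: str
    salary: float

# Hypothetical class extension: the set of all Employee instances.
EXTENSION = [
    Employee(1, "Ana", "Sales", 52000.0),
    Employee(2, "Bo", "R&D", 71000.0),
    Employee(3, "Cy", "Sales", 48000.0),
    Employee(4, "Di", "R&D", 69000.0),
]

def horizontal_fragments(extension, predicates):
    """Horizontal fragmentation: each fragment holds a subset of the class instances.

    An instance goes to the first fragment whose predicate it satisfies, so the
    fragments are disjoint; the predicates are assumed to cover every instance.
    """
    fragments = {name: [] for name in predicates}
    for obj in extension:
        for name, pred in predicates.items():
            if pred(obj):
                fragments[name].append(obj)
                break
    return fragments

def vertical_fragments(extension, attribute_groups, key="emp_id"):
    """Vertical fragmentation: each fragment holds the key plus a subset of the attributes."""
    return {
        name: [{key: getattr(obj, key), **{a: getattr(obj, a) for a in attrs}}
               for obj in extension]
        for name, attrs in attribute_groups.items()
    }

def rebuild_from_vertical(fragments, key="emp_id"):
    """Reconstruct the original objects by joining the vertical fragments on the key."""
    merged = {}
    for rows in fragments.values():
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return [Employee(**merged[k]) for k in sorted(merged)]

by_dept = horizontal_fragments(EXTENSION, {
    "sales": lambda e: e.dept == "Sales",
    "rnd": lambda e: e.dept == "R&D",
})
by_use = vertical_fragments(EXTENSION, {"public": ["name", "dept"], "payroll": ["salary"]})
```

Here `by_dept` places employees 1 and 3 in the "sales" fragment and 2 and 4 in "rnd", while `by_use` separates the salary attribute from the name and department; `rebuild_from_vertical(by_use)` returns the original four objects.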
The allocation of OODBs:
In a distributed database system, a query requires data to be accessed from one or more sites. The cost of executing
the query depends on the location of the query as well as the data. Specifically, the data locality of a query
determines the amount of data transfer incurred in processing the query; the higher the data locality the lower the
data transfer costs. Thus, one is faced with the data allocation problem, the aim of which is to increase the data
locality. Given a set of queries accessing a set of data fragments, a data allocation algorithm allocates these
fragments to the sites of the distributed database system so as to minimize the total data transfer cost for processing
the queries [8].
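The allocation problem can be made concrete with a small sketch. The fragment sizes, sites and query workload are invented for illustration, and the cost model is a simplifying assumption: each remote fragment a query reads is shipped in full once per execution, with no replication, network distances or update costs. The search is plain brute force, not the evolutionary algorithms proposed in [8], which exist precisely because exhaustive search does not scale.

```python
from itertools import product

# Hypothetical fragment sizes in arbitrary data units.
FRAGMENT_SIZE = {"F1": 100, "F2": 40, "F3": 250}
SITES = ["S1", "S2"]
# Hypothetical workload: (site issuing the query, execution frequency, fragments read).
QUERIES = [
    ("S1", 10, {"F1", "F2"}),
    ("S2", 3, {"F2", "F3"}),
    ("S2", 5, {"F3"}),
]

def transfer_cost(allocation, queries, sizes):
    """Data shipped to answer the workload: every fragment a query reads that is
    stored at another site is transferred in full once per execution."""
    cost = 0
    for site, freq, frags in queries:
        cost += freq * sum(sizes[f] for f in frags if allocation[f] != site)
    return cost

def best_allocation(queries, sizes, sites):
    """Try every non-replicated placement (one site per fragment) and keep the cheapest.

    Brute force grows as len(sites) ** len(sizes), so it only suits tiny inputs.
    """
    names = sorted(sizes)
    best_alloc, best_cost = None, None
    for placement in product(sites, repeat=len(names)):
        alloc = dict(zip(names, placement))
        cost = transfer_cost(alloc, queries, sizes)
        if best_cost is None or cost < best_cost:
            best_alloc, best_cost = alloc, cost
    return best_alloc, best_cost

allocation, cost = best_allocation(QUERIES, FRAGMENT_SIZE, SITES)
```

With these numbers the cheapest placement keeps F1 and F2 at S1 and F3 at S2, for a total of 120 units: F2 is read 10 times from S1 but only 3 times from S2, so S2 pays the remote cost. Raising data locality in this way is exactly what lowers the transfer cost.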
Federated database system:
The federated architectural form is a hybrid architecture that clusters highly interdependent modules and applications (possibly using request/reply and conversational inter-module communication via method invocations) into domains, while attempting to minimize the strength of the interconnection between domains [1]. Federated database technology, in turn, enables unified access to any digital information, in any format, structured or unstructured, in any information store [2]. Reviewing the characteristics of a federated solution, namely transparency, heterogeneity, a high degree of function, extensibility and openness, autonomy for data sources, and optimized performance [2], gives a better understanding of this approach. Two case studies then illustrate how federated systems have been adopted effectively.
Characteristics of a federated solution
• Transparency – Users should not need to know where the data is stored, which languages and programming interfaces the data sources support, which SQL dialect each source uses, whether the data is partitioned and/or replicated, or which networking protocols are involved; they see only a single set of error codes.
• Heterogeneity – The federated database can accommodate all of the differences among its data sources, encompassing diverse systems in a seamless, transparent federation.
• A high degree of function – The federation provides the full function of IBM's rich, standards-compliant DB2 SQL against all the data in the federation, as well as all the function of the underlying data sources.
• Extensibility and openness of the federation – The federated database engine accesses sources through a software component known as a wrapper. A new type of data source is supported by acquiring or creating a wrapper for it. IBM also supports the ANSI SQL/MED standard, so any wrapper written to the SQL/MED interface can be used with IBM's federated database.
• Autonomy for data sources – The federated database does not interfere with its sources: it does not disturb the local operation of an existing data source, and it neither moves nor modifies the source's data or interfaces.
• Optimized performance – The optimizer is the component of a relational database management system that determines the best way to execute each query. It decides whether each operation in a query should be performed by the federated server or by the source where the data is stored, in what order the operations run, and which implementations to use for the local portions of the query.
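The wrapper and optimizer characteristics can be sketched together. Everything below is hypothetical and much simpler than a real product: the wrapper classes, the single `supports_filter` capability flag and the `rows_shipped` counter are inventions for illustration, not IBM's wrapper or SQL/MED interfaces, and the server follows a fixed pushdown rule where a real optimizer would compare estimated costs.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]

class Wrapper:
    """Hypothetical wrapper interface: adapts one data source to the federated server."""
    supports_filter = False  # can the source evaluate predicates itself?

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self.rows_shipped = 0  # rows sent from the source to the federated server

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        raise NotImplementedError

class RelationalWrapper(Wrapper):
    """Stands in for a source, such as an RDBMS, that can filter rows before shipping them."""
    supports_filter = True

    def scan(self, predicate=None):
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row

class FlatFileWrapper(Wrapper):
    """Stands in for a source with no query capability: it can only return every row."""

    def scan(self, predicate=None):
        for row in self.rows:
            self.rows_shipped += 1
            yield row

class FederatedServer:
    """Gives users one catalog of names, hiding which source and wrapper hold each table."""

    def __init__(self):
        self.catalog: Dict[str, Wrapper] = {}

    def register(self, name: str, wrapper: Wrapper) -> None:
        self.catalog[name] = wrapper

    def select(self, name: str, predicate: Predicate) -> List[Row]:
        wrapper = self.catalog[name]
        if wrapper.supports_filter:
            # Push the predicate down: the source filters, so fewer rows cross the network.
            return list(wrapper.scan(predicate))
        # The source cannot filter, so the federated server filters after fetching all rows.
        return [row for row in wrapper.scan() if predicate(row)]

orders = RelationalWrapper([{"id": i, "amount": i * 10} for i in range(1, 11)])
logs = FlatFileWrapper([{"id": i, "amount": i * 10} for i in range(1, 11)])
server = FederatedServer()
server.register("orders", orders)
server.register("logs", logs)
big_orders = server.select("orders", lambda r: r["amount"] > 80)
big_logs = server.select("logs", lambda r: r["amount"] > 80)
```

Both calls return the same two rows, so the user sees no difference between the sources (transparency), and adding a new kind of source only requires a new wrapper subclass (extensibility). The difference is in the work done: the relational source ships 2 rows, while the flat-file source ships all 10 for the server to filter.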
Case study: Xperanto
The Xperanto project serves as a middleware layer that supports publishing XML data to users who work with XML rather than SQL. It provides a uniform, XML-based query interface over an object-relational database that allows users to query and restructure the contents of the database as XML data, ignoring the underlying SQL tables and query language [5]. Fig. 1 illustrates Xperanto's use of XML (Extensible Markup Language), which has emerged as the universal format for publishing and exchanging data over the World Wide Web.
(Fig. 1)
Xperanto architecture
XPERANTO is organized into four major software components, which are further broken down into smaller logical subcomponents. As shown in Figure 2, the major components of XPERANTO are Query Translation, XML View Services, the XML Schema Generator, and the XML Tagger. The core of XPERANTO, and the primary focus of [5], is the Query Translation component. This component translates queries from the XML query language used by clients (currently XML-QL) into the appropriate dialect of SQL for the underlying object-relational (O-R) DBMS.
(Fig. 2)
Xperanto XML schema mapping
One of the goals of Xperanto is to allow XML developers to publish object-relational data in XML form without having to deal with the database system's native schema or SQL query dialect. XML Schema was designed to supplant the XML DTD, adding important features such as data types, value constraints, inheritance, and foreign key information. The schema mapping covers object-relational database schemas, which enable database designers to define new data types and complex object structures; a fragment of the XML Schema then describes the default XML view of the object-relational schema.
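The idea of a default XML view can be sketched as follows. This is an assumption-laden illustration, not Xperanto's actual mapping: Python's built-in sqlite3 (a purely relational engine) stands in for the object-relational DBMS, the `customer` table is hypothetical, and the element layout (one element per table, a `row` element per tuple, one child element per column) is chosen for simplicity.

```python
import sqlite3
import xml.etree.ElementTree as ET

def default_xml_view(conn, tables):
    """Publish tables as XML: one element per table, a <row> per tuple, a child per column.

    Table names are interpolated into SQL, so they must come from a trusted list.
    """
    root = ET.Element("db")
    for table in tables:
        table_el = ET.SubElement(root, table)
        cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
        columns = [d[0] for d in cursor.description]
        for values in cursor:
            row_el = ET.SubElement(table_el, "row")
            for col, val in zip(columns, values):
                ET.SubElement(row_el, col).text = "" if val is None else str(val)
    return root

def sample_db():
    # Hypothetical data used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Bo")])
    return conn

xml_text = ET.tostring(default_xml_view(sample_db(), ["customer"]), encoding="unicode")
```

For the two sample rows, `xml_text` is `<db><customer><row><id>1</id><name>Ana</name></row><row><id>2</id><name>Bo</name></row></customer></db>`. An XML developer can work with this document without seeing the SQL table definition behind it.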
Xperanto query processing and XML document construction
Once Xperanto publishes a default XML view of an object-relational database, users can pose queries and define more complex views using an XML query language. Query processing consists of XML query rewriting, SQL generation, and XML document construction.
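The last two stages, SQL generation and XML document construction, can be sketched for a single hard-coded request. This is a simplified illustration under stated assumptions: the request "customer with a given name, with its orders" stands in for an XML-QL query (no XML-QL parser is implemented), `translate` and `tag` are hypothetical helper names, and sqlite3 again stands in for the O-R DBMS. The design point it shows is that the whole request becomes one SQL statement whose sorted result lets the tagger build nested XML in a single pass.

```python
import sqlite3
import xml.etree.ElementTree as ET

def sample_db():
    # Hypothetical schema and data used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, total REAL);
        INSERT INTO customer VALUES (1, 'Ana'), (2, 'Bo');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """)
    return conn

def translate(name):
    """SQL generation for the hypothetical request 'customer named <name>, with orders'."""
    sql = ("SELECT c.id, c.name, o.id, o.total "
           "FROM customer c LEFT JOIN orders o ON o.cust_id = c.id "
           "WHERE c.name = ? ORDER BY c.id, o.id")
    return sql, (name,)

def tag(rows):
    """Build nested XML in one pass; ORDER BY c.id keeps each customer's rows adjacent."""
    root = ET.Element("result")
    current_id, cust_el = None, None
    for cust_id, cust_name, order_id, total in rows:
        if cust_id != current_id:
            cust_el = ET.SubElement(root, "customer", id=str(cust_id))
            ET.SubElement(cust_el, "name").text = cust_name
            current_id = cust_id
        if order_id is not None:  # LEFT JOIN yields NULLs for customers without orders
            ET.SubElement(cust_el, "order", id=str(order_id)).text = str(total)
    return root

def run(conn, name):
    sql, params = translate(name)
    return tag(conn.execute(sql, params))

ana = ET.tostring(run(sample_db(), "Ana"), encoding="unicode")
```

For "Ana" the result nests her two orders under one `customer` element: `<result><customer id="1"><name>Ana</name><order id="10">25.0</order><order id="11">40.0</order></customer></result>`.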
Case study: Australian Gas & Light Company (AGL)
The federated architecture established by AGL is an example of a successful architecture for data integration: it makes overall accessibility possible while minimizing disruption to existing components. In this architecture, how the interacting modules of a distributed system are grouped depends on the types of dependency between them: processing dependency, information dependency, the types of events exchanged, inter-domain traffic, and abnormal conditions.
(Fig. 3)
AGL data integration:
In this integration, the federation ignores the particulars within each domain: it knows nothing beyond the business services made available to it, and a domain does not know of the existence of other fragments of the same entity in other domains of the federation.
An enterprise software architecture such as the federation described makes possible the flow of information between applications belonging to different business units. When implementing enterprise-wide computing, a federated approach minimizes disruption to existing systems, since it isolates application clusters in terms of processing and provides them with consistent, stable interfaces to the rest of the organization [1].
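The separation between domains and the federation can be sketched in a few classes. The domain names ("billing", "networks"), the service names and the customer data are hypothetical; the text does not describe AGL's actual domains or interfaces. The sketch shows only the structural rule: each domain keeps its fragment of an entity private and publishes business services, the federation knows nothing beyond those services, and no domain refers to another.

```python
from typing import Any, Callable, Dict, Set

class Domain:
    """A cluster of interdependent applications that exposes only named business services."""

    def __init__(self, name: str):
        self.name = name
        self._services: Dict[str, Callable[..., Any]] = {}

    def publish(self, service: str, handler: Callable[..., Any]) -> None:
        self._services[service] = handler

    def services(self) -> Set[str]:
        return set(self._services)

    def call(self, service: str, *args: Any) -> Any:
        return self._services[service](*args)

class Federation:
    """Routes requests to the domain that publishes a service; knows nothing else about domains."""

    def __init__(self):
        self._registry: Dict[str, Domain] = {}

    def join(self, domain: Domain) -> None:
        for service in domain.services():
            self._registry[service] = domain

    def request(self, service: str, *args: Any) -> Any:
        return self._registry[service].call(service, *args)

def build_federation() -> Federation:
    # Each domain keeps its own fragment of the customer entity private.
    balances = {"C1": 120.5}
    addresses = {"C1": "1 Example St"}
    billing = Domain("billing")
    billing.publish("get_balance", lambda cid: balances[cid])
    networks = Domain("networks")
    networks.publish("get_supply_address", lambda cid: addresses[cid])
    fed = Federation()
    fed.join(billing)
    fed.join(networks)
    return fed

def customer_view(fed: Federation, customer_id: str) -> Dict[str, Any]:
    """Assemble one enterprise-wide view of a customer from fragments held by different domains."""
    return {
        "id": customer_id,
        "balance": fed.request("get_balance", customer_id),
        "supply_address": fed.request("get_supply_address", customer_id),
    }
```

Because the only coupling is the published service name, a domain can change its internal applications or storage without disturbing the others, which is the "consistent, stable interfaces" property described above.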
Conclusion:
As discussed above, a distributed database is under the control of a central DBMS, although its storage devices are not all attached to a common CPU. The central DBMS is a single product, such as Oracle, SQL Server, or others. Its disadvantages are complexity, cost, difficulty maintaining data integrity, and database access. Complexity means that extra work is required to ensure transparency in the distributed database system, to maintain multiple database systems, and to account for the disconnected nature of the database. Cost rises because the system is spread across different locations and infrastructure, which requires extra labor. Difficulty maintaining data integrity means that enforcing integrity over a network may consume too much of the network's resources to be feasible. Database access is also a problem, because different databases require different drivers, such as JDBC or ODBC, to be accessed from applications. A federated database system, by contrast, provides transparency, autonomy, optimized performance, accessibility, and a standard query interface across multiple DBMSs. It is also an efficient way to solve the enterprise-wide data sharing and processing problems raised in the introduction.
References:
[1] Inji Wijegunaratne, George Fernandez, John Valtoudis, “A Federated Architecture for Enterprise Data Integration,” 2000 Australian Software Engineering Conference, 2000. Retrieved September 12, 2007 (http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=787253&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344).
[2] Laura Haas, Eileen Lin, “IBM Federated Database Technology,” IBM, 2002. Retrieved September 10, 2007 (http://www.ibm.com/developerworks/db2/library/techarticle/0203haas/0203haas.html).
[3] F. Baião, M. Mattoso, G. Zaverucha, “A Framework for the Design of Distributed Databases,” in Proceedings in Informatics 14, Distributed Data & Structures 4: Records of the 4th International Meeting, W. Litwin and G. Lévy (Eds.), Carleton Scientific, 2002, pp. 29-36.
[4] M. Özsu, P. Valduriez, Principles of Distributed Database Systems, 2nd edition (1st edition 1991), Prentice-Hall, New Jersey, 1999.
[5] Michael J. Carey, Jerry Kiernan, “XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents,” 26th International Conference, 2000. Retrieved September 13, 2007 (http://portal.acm.org.proxy.lib.ilstu.edu:2048/citation.cfm?id=671862&coll=Portal&dl=GUIDE&CFID=5277637&CFTOKEN=95867344).
[6] Fernanda Araujo Baião, Marta Mattoso, Gerson Zaverucha, “Towards an Inductive Design of Distributed Object Oriented Databases,” Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems, pp. 188-197, August 20-22, 1998.
[7] Ishfaq Ahmad, Kamalakar Karlapalem, Yu-Kwong Kwok, Siu-Kai So, “Evolutionary Algorithms for Allocating Data in Distributed Database Systems,” Distributed and Parallel Databases, vol. 11, no. 1, pp. 5-32, January 2002.
[8] Ishfaq Ahmad, Yu-Kwong Kwok, Siu-Kai So, “Evolutionary Algorithms for Allocating Data in Distributed Database Systems,” Distributed and Parallel Databases, vol. 11, pp. 5-32, 2002.