Chapter 17 Client-Server Processing and Distributed Databases Outline Overview of Distributed Processing and Distributed Data Client-Server Database Architectures Web Database Connectivity Architectures for Distributed Database Management Systems Transparency for Distributed Database Processing Distributed Database Processing McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Evolution of Distributed Processing and Distributed Data Need to share resources across a network Timesharing (1970s) Remote procedure calls (1980s) Client-server computing (1990s) McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Timesharing Network Terminal Terminal Database Mainframe computer Terminal McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Resource Sharing with a Network of Personal Computers (a) Remote procedural call Procedural call Return results Database (b) File sharing File request File returned Database McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Client-Server Processing with Distributed processing only Server Client Client Client McGraw-Hill/Irwin Database © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Distributed processing and data Server Server Client Client Client Client Database McGraw-Hill/Irwin Database © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Motivation for Distributed Processing Flexibility: the ease of maintaining and adapting a system Scalability: the ability to support scalable growth of hardware and software capacity Interoperability: open standards that allow two or more systems to exchange and use software and data McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Motivation for Distributed Data Data control: locate data to match an organization’s structure Communication costs: locate data close to data usage to lower communication cost and improve performance Reliability: increase data availability by replicating data at more than one site McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Summary of Distributed Processing and Data Distributed Processing Flexibility, interoperability, scalability Distributed Data Local control of data, improved performance, reduced communication costs, improved reliability Disadvantages High complexity, high High complexity, additional development cost, security concerns possible interoperability problems Advantages McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Client-Server Database Architectures Client-Server Architecture is an arrangement of components (clients and servers) among computers connected by a network. A client-server architecture supports efficient processing of messages (requests for service) between clients and servers. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Design Issues Division of processing: the allocation of tasks to clients and servers. Process management: interoperability among clients and servers and efficiently processing messages between clients and servers. Middleware: software for process management • McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Tasks to Distribute Presentation: code to maintain the graphical user interface Validation: code to ensure the consistency of the database and user inputs Business logic: code to perform business functions Workflow: code to ensure completion of business processes Data access: code to extract data to answer queries and modify a database McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Middleware A software component that performs process management. Allow clients and servers to exist on different platforms. Allows servers to efficiently process messages from a large number of clients. Often located on a dedicated computer. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Client-Server Computing with Middleware Middleware McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Types of Middleware Transaction-processing monitors: relieve the operating system of managing database processes Message-oriented middleware: maintain a queue of messages Object-request brokers: provide a high level of interoperability and message intelligence Data access middleware: provide a uniform interface to relational and non relational data using SQL McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Two-Tier Architecture SQL statements Database server Database Query results McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Two-Tier Client-Server Architecture A PC client and a database server interact directly to request and transfer data. The PC client contains the user interface code. The server contains the data access logic. The PC client and the server share the validation and business logic. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Three-Tier Architecture (Middleware Server) SQL statements Database Middleware server Database server Query Results McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Three-Tier Architecture (Application Server) (b) Application server Database SQL statements Database server Application server Query results McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Three-Tier Architecture • • To improve performance, the three-tier architecture adds another server layer either by a middleware server or an application server. The additional server software can reside on a separate computer. Alternatively, the additional server software can be distributed between the database server and PC clients. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Multiple-Tier Architecture A client-server architecture with more than three layers: a PC client, a backend database server, an intervening middleware server, and application servers. Provides more flexibility on division of processing The application servers perform business logic and manage specialized kinds of data such as images. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Multiple-Tier Architecture Application server Database Middleware server Database server Application server McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Multiple-Tier Architecture with Software Bus Database Application server Database server Application server Software bus McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Web Database Connectivity Internet commerce depends heavily on database access for websites. Web database connectivity allows a database to be manipulated through a Web page. A user may use a Web form to change a database or view a report generated from a database. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Internet Basics Network of networks Uses standard protocols: TCP/IP – TCP: splits messages into packets – IP: routes messages Each computer on the Internet has a unique numeric address known as an IP address. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Internet and Intranet Relationship TCP/IP TCP/IP TCP/IP Intranet Firewall TCP/IP TCP/IP TCP/IP Internet McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. World Wide Web Most popular application on the Internet Supports browsing pages located on any computer on the Internet Hypertext Transport Protocol (HTTP) establishes a session between a browser and a Web server. Each page has a unique address known as a URL. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Web Page Request Cycle 5 1 User clicks hyperlink. 2 Browser sends request to web server. Browser displays file. 4 Web server sends file. 3 McGraw-Hill/Irwin Web server locates page. © 2004 The McGraw-Hill Companies, Inc. All rights reserved. XML/XSL Solutions to HTML limitations eXtensible Markup Language (XML) – Separates content and structure of a document – Use document type declaration or schema to specify document structure eXtensible Style Language (XSL): supports transformation into display languages Both are extensible languages McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. The Common Gateway Interface (CGI) CGI is an interface that allows a Web server to invoke an external program on the same computer. The external program uses the parameters passed by the Web server to produce output that is sent back to the browser. Usually, the output contains HTML/XML so that the browser can display it properly. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Straight CGI Parameters SQL External program Web server HTML/XML Database server Results Database McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Hybrid CGI Parameters Parameters External program Web server HTML/XML SQL Partner program HTML/XML Database server Results Database McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Server-side connectivity Server-side connectivity bypasses the external program needed with the CGI approaches. Specialized Web server or middleware server is needed SQL statements and database logic are kept in a web page or external file. The database code can execute stored procedures on the database server. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Server-Side Connectivity Approach SQL Web server with middleware extension Database server HTML/XML SQL statements and formatting requirements McGraw-Hill/Irwin Database © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Server-Side Connectivity with a Middleware Server Database request SQL Middleware server with listener Web server HTML/XML Database server Results SQL statements and formatting requirements McGraw-Hill/Irwin Database © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Client-Side Connectivity Client computing capacity can be more fully utilized without storing code on the client. Provides a more customized interface than permitted by HTML Supports data buffering by the client to improve performance McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Web Page Request Cycle with Client-Side Connectivity Browser sends request to web server. Web server sends file containing HTML/XML and embedded applet (Java) or binary object (ActiveX). McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Summary of Web Connectivity Approach Architecture Example Product Comments Straight CGI Two-tier Inexpensive; portability only limited by external program; limited scalability More expensive than straight CGI; portability depends on partner program; scalable to modest loads Expensive; Web server dependent; highly scalable Apache Web server with external PERL program Hybrid CGI Two-tier Apache Web server with Cold Fusion server extensions Server-side Three-tier Internet Information connectivity and multiple- Server with Microsoft tier Transaction Server Server-side Three-tier Oracle Application connectivity and multiple- Server (Middleware) tier Client-side Two- and Microsoft Remote connectivity multiple-tier Data Service McGraw-Hill/Irwin Expensive; Web server independent; highly scalable Customized client interface; efficient data buffering; usually works with server-side connectivity approaches © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Architectures for Distributed Database Management Systems DBMSs need fundamental extensions. Underlying the extensions are a different component architecture and a different schema architecture. Component Architecture manages distributed database requests. Schema Architecture provides additional layers of data description. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Global Requests Product data Customer-order data Product data Customer-order data McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Component Architecture GD GD DDM GD Site 2 DDM Site 1 LDM DB McGraw-Hill/Irwin DDM Site 3 LDM DB © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Schema Architecture I External schema 1 External schema 2 ... External schema n Conceptual schema Fragmentation schema Allocation schema m Sites Internal schema 1 McGraw-Hill/Irwin Internal schema 2 ... Internal schema m © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Schema Architecture II Global external schema 1 Global external schema 2 ... Global external schema n Global conceptual schema m Sites McGraw-Hill/Irwin Site 1 local mapping schema Site 2 local mapping schema Site 1 local schemas (conceptual, internal, external) Site 2 local schemas (conceptual, internal, external) ... ... Site m local mapping schema Site m local schemas (conceptual, internal, external) © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Transparency for Distributed Database Processing Transparency is related to data independence. With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions. Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Motivating Example Customer CustNo CustName CustCity CustState CustZip CustRegion Product ProdNo 1 1 ProdName ProdColor ProdPrice 8 1 ProdNo OrdNo Inventory OrdCity StockNo OrdDate OrdAmt CustNo McGraw-Hill/Irwin 1 8 OrdNo 8 OrderLine 8 Order QOH WarehouseNo ProdNo © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Fragments Based on the CustRegion Field CREATE FRAGMENT Western-Customers AS SELECT * FROM Customer WHERE CustRegion = 'West' CREATE FRAGMENT Western-Orders AS SELECT Order.* FROM Order, Customer WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'West' CREATE FRAGMENT Western-OrderLines AS SELECT OrderLine.* FROM Customer, OrderLine, Order WHERE OrderLine.OrdNo = Order.OrdNo AND Order.CustNo = Customer.CustNo AND CustRegion = 'West' CREATE FRAGMENT Eastern-Customers AS SELECT * FROM Customer WHERE CustRegion = 'East' CREATE FRAGMENT Eastern-Orders AS SELECT Order.* FROM Order, Customer WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'East' CREATE FRAGMENT Eastern-OrderLines AS SELECT OrderLine.* FROM Customer, OrderLine, Order WHERE OrderLine.OrdNo = Order.OrdNo AND Order.CustNo = Customer.CustNo AND CustRegion = 'East' McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Fragments Based on the WareHouseNo Field CREATE FRAGMENT Denver-Inventory AS SELECT * FROM Inventory WHERE WareHouseNo = 1 CREATE FRAGMENT Seattle-Inventory AS SELECT * FROM Inventory WHERE WareHouseNo = 2 McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Fragmentation Transparency Fragmentation transparency provides the highest level of data independence. Users formulate queries and transactions without knowledge of fragments (locations, or local formats). If fragments change, queries and transactions are not affected. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Location Transparency Location transparency provides a lesser level of data independence than fragmentation transparency. Users need to reference fragments in formulating queries and transactions. However, knowledge of locations and local formats is not necessary. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Local Mapping Transparency Local mapping transparency provides a lesser level of data independence than location transparency. Users need to reference fragments at sites in formulating queries and transactions. However, knowledge of local formats is not necessary. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Distributed Database Processing Distributed data adds considerable complexity to query processing and transaction processing. Distributed database processing involves movement of data, remote processing, and site coordination. Performance implications sometimes cannot be hidden. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Distributed query processing Involves both local (intra site) and global (inter site) optimization. Multiple optimization objectives The weighting of communication costs versus local processing costs depends on network characteristics. There are many more possible access plans for a distributed query. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Distributed Transaction Processing Distributed DBMS provides concurrency and recovery transparency. Independently operating sites must be coordinated. New kinds of failures exist because of the communication network. New protocols are necessary. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Distributed Concurrency Control The simplest scheme involves centralized coordination. Centralized coordination involves the fewest messages and the simplest deadlock detection. The number of messages can be twice as much in distributed coordination. Primary Copy Protocol is used to reduce overhead with locking multiple copies. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Centralized Coordination Lock status Lock request Subtransaction 1 at Site x McGraw-Hill/Irwin Central coordinator Subtransaction 2 ... at Site y ... Subtransaction n at Site z © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Distributed Recovery Management Distributed DBMSs must contend with failures of communication links and sites. Detecting failures involves coordination among sites. The recovery manager must ensure that different parts of a partitioned network act in unison. The protocol for distributed recovery is the two phase commit protocol (2PC). McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Voting and Decision Phases Coordinator Participant 1 Write Begin-Commit to log. Send Ready messages. Wait for responses. Voting phase 3 If all sites vote ready before timeout, Write Global Commit record. Decision phase Send Commit messages. Wait for Acknowledgments. Else send Abort messages. 5 2 Force updates to disk. If no failure, Write Ready-Commit to log. Send Ready vote. Else send Abort vote. 4 Write Commit to log. Release locks. Send acknowledgment. Wait for acknowledgments. Resend Commit messages if necessary. Write global end of transaction. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Summary Utilizing distributed processing and data can significantly improve DBMS services but at the cost of new design challenges. Several client-server architectures provide alternatives among cost, complexity, and benefit levels. Architectures for distributed DBMSs differ in the integration of the local databases and level of data independence. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.