Chapter 12
Distributed Database Management Systems
Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel

In this chapter, you will learn:
• What a distributed database management system (DDBMS) is and what its components are
• How database implementation is affected by different levels of data and process distribution
• How transactions are managed in a distributed database environment
• How database design is affected by the distributed database environment

The Evolution of Distributed Database Management Systems
• Distributed database management system (DDBMS)
  – Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
• Centralized database required that corporate data be stored in a single central site
• Dynamic business environment and centralized database’s shortcomings spawned a demand for applications based on data access from different sources at multiple locations

DDBMS Advantages and Disadvantages
• Advantages include:
  – Data are located near “greatest demand” site
  – Faster data access
  – Faster data processing
  – Growth facilitation
  – Improved communications
  – Reduced operating costs
  – User-friendly interface
  – Less danger of a single-point failure
  – Processor independence
• Disadvantages include:
  – Complexity of management and control
  – Security
  – Lack of standards
  – Increased storage requirements
  – Increased training cost

Characteristics of Distributed Management Systems
• Application interface
• Validation
• Transformation
• Query optimization
• Mapping
• I/O interface
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management
• Must perform all the functions of centralized DBMS
• Must handle all necessary functions imposed by distribution of data and processing
  – Must perform these additional functions transparently to the end user

DDBMS Components
• Must include (at least) the following components:
  – Computer workstations
  – Network hardware and software
  – Communications media
  – Transaction processor (application processor, transaction manager)
    • Software component found in each computer that requests data
  – Data processor or data manager
    • Software component residing on each computer that stores and retrieves data located at the site
    • May be a centralized DBMS

Levels of Data and Process Distribution

Single-Site Processing, Single-Site Data (SPSD)
• All processing is done on single CPU or host computer (mainframe, midrange, or PC)
• All data are stored on host computer’s local disk
• Processing cannot be done on end user’s side of system
• Typical of most mainframe and midrange computer DBMSs
• DBMS is located on host computer, which is accessed by dumb terminals connected to it
• Also typical of first generation of single-user microcomputer databases

Multiple-Site Processing, Single-Site Data (MPSD)
• Multiple processes run on different computers sharing single data repository
• MPSD scenario requires network file server running conventional applications that are accessed through LAN
• Many multiuser accounting applications, running under personal computer network, fit such a description
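
As a rough illustration of the transaction processor (TP) and data processor (DP) roles listed under DDBMS Components, the sketch below models a TP that forwards a request to whichever DP stores the data. The class names, the catalog dictionary, and the sample tables are assumptions made for illustration only; a real DDBMS would add networking, security, and concurrency control.

```python
from dataclasses import dataclass, field


@dataclass
class DataProcessor:
    """DP: stores and retrieves the data located at its own site."""
    site: str
    tables: dict = field(default_factory=dict)  # table name -> list of rows

    def read(self, table):
        return list(self.tables.get(table, []))


@dataclass
class TransactionProcessor:
    """TP: receives an application's request and forwards it to the right DP."""
    catalog: dict  # table name -> DataProcessor holding it (hypothetical catalog)

    def request(self, table):
        dp = self.catalog[table]
        return dp.site, dp.read(table)


# Two sites, each with its own DP; one TP on the computer that requests data.
dp_ny = DataProcessor("NY", {"CUSTOMER": [{"id": 1, "name": "Ann"}]})
dp_la = DataProcessor("LA", {"INVOICE": [{"id": 10, "cust": 1}]})
tp = TransactionProcessor({"CUSTOMER": dp_ny, "INVOICE": dp_la})

site, rows = tp.request("INVOICE")
print(site, rows)
```

The application only talks to the TP; which site answers is decided by the catalog lookup, which is the separation of roles the component list describes.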

Multiple-Site Processing, Multiple-Site Data (MPMD)
• Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
  – Integrate only one type of centralized DBMS over a network
• Heterogeneous DDBMSs
  – Integrate different types of centralized DBMSs over a network
• Fully heterogeneous DDBMS
  – Support different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers

Distributed Database Transparency Features
• Allow end user to feel like database’s only user
• Features include:
  – Distribution transparency
  – Transaction transparency
  – Failure transparency
  – Performance transparency
  – Heterogeneity transparency

Distribution Transparency
• Allows management of physically dispersed database as though it were a centralized database
• Following three levels of distribution transparency are recognized:
  – Fragmentation transparency
  – Location transparency
  – Local mapping transparency
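
The three levels of distribution transparency differ in how much the requester must know: with local mapping transparency both the fragment name and its site must be given, with location transparency only the fragment name, and with fragmentation transparency neither. The sketch below is a minimal in-memory illustration of that difference. The EMPLOYEE fragments E1 to E3, the site codes, and the function names are hypothetical, not part of any DBMS API.

```python
# Hypothetical layout: EMPLOYEE is split into three fragments, one per site.
FRAGMENT_SITE = {"E1": "NY", "E2": "ATL", "E3": "MIA"}  # fragment -> site
SITE_DATA = {
    "NY": {"E1": [{"name": "Ann", "born": 1938}]},
    "ATL": {"E2": [{"name": "Bob", "born": 1955}]},
    "MIA": {"E3": [{"name": "Cy", "born": 1936}]},
}


def local_mapping_query(fragment, site, predicate):
    """Local mapping transparency: caller names both the fragment and its site."""
    return [row for row in SITE_DATA[site][fragment] if predicate(row)]


def location_transparent_query(fragment, predicate):
    """Location transparency: caller names the fragment; the site is looked up."""
    return local_mapping_query(fragment, FRAGMENT_SITE[fragment], predicate)


def fragmentation_transparent_query(predicate):
    """Fragmentation transparency: caller names neither fragments nor sites."""
    rows = []
    for fragment in FRAGMENT_SITE:
        rows.extend(location_transparent_query(fragment, predicate))
    return rows


def born_before_1940(row):
    return row["born"] < 1940


# Same question asked at the highest level of transparency.
print(fragmentation_transparent_query(born_before_1940))
```

Each function wraps the one below it, which mirrors how a higher level of transparency hides the details that the lower level exposes.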

Transaction Transparency
• Ensures database transactions will maintain distributed database’s integrity and consistency

Distributed Requests and Distributed Transactions
• Remote request
  – Lets single SQL statement access data to be processed by single remote database processor
• Remote transaction
  – Accesses data at single remote site
• Distributed transaction
  – Allows transaction to reference several different (local or remote) DP sites
  – Can update or request data from several different remote sites on network
• Distributed request
  – Lets single SQL statement reference data located at several different local or remote DP sites

Distributed Concurrency Control
• Multisite, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions than are single-site systems

Two-Phase Commit Protocol
• Distributed databases make it possible for transaction to access data at several sites
• Final COMMIT must not be issued until all sites have committed their parts of transaction
• Two-phase commit protocol requires each individual DP’s transaction log entry be written before database fragment is actually updated

Performance Transparency and Query Optimization
• Objective of query optimization routine is to minimize total cost associated with execution of request
• Costs associated with request are function of:
  – Access time (I/O) cost
  – Communication cost
  – CPU time cost
• Must provide distribution transparency as well as replica transparency
• Replica transparency
  – DDBMS’s ability to hide existence of multiple copies of data from user
• Query optimization techniques include:
  – Manual or automatic
  – Static or dynamic
  – Statistically based or rule-based algorithms

Distributed Database Design
• Data fragmentation
  – How to partition database into fragments
• Data replication
  – Which fragments to replicate
• Data allocation
  – Where to locate those fragments and replicas

Data Fragmentation
• Breaks single object into two or more segments or fragments
• Each fragment can be stored at any site over computer network
• Information about data fragmentation is stored in distributed data catalog (DDC), from which it is accessed by TP to process user requests
• Strategies
  – Horizontal fragmentation
    • Division of a relation into subsets (fragments) of tuples (rows)
  – Vertical fragmentation
    • Division of a relation into attribute (column) subsets
  – Mixed fragmentation
    • Combination of horizontal and vertical strategies

Data Replication
• Storage of data copies at multiple sites served by computer network
• Fragment copies can be stored at several sites to serve specific information requirements
  – Can enhance data availability and response time
  – Can help to reduce communication and total query costs
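
A minimal sketch tying together the fragmentation strategies and the idea of fragment copies: a small relation is fragmented horizontally, vertically, and in mixed fashion, and one fragment is then stored at two sites. The CUSTOMER relation, its column names, the site names, and the helper functions are all made up for illustration.

```python
# Hypothetical CUSTOMER relation; column names and values are invented.
CUSTOMER = [
    {"id": 1, "name": "Ann", "state": "TN", "balance": 120.0},
    {"id": 2, "name": "Bob", "state": "FL", "balance": 0.0},
    {"id": 3, "name": "Cy", "state": "TN", "balance": 45.5},
]


def horizontal_fragment(rows, predicate):
    """Subset of tuples (rows); every fragment keeps all attributes."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Subset of attributes (columns); the key is repeated in every fragment."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Horizontal: one fragment per state.
tn = horizontal_fragment(CUSTOMER, lambda r: r["state"] == "TN")
fl = horizontal_fragment(CUSTOMER, lambda r: r["state"] == "FL")

# Vertical: service attributes versus collection attributes.
service = vertical_fragment(CUSTOMER, "id", ["name", "state"])
collect = vertical_fragment(CUSTOMER, "id", ["balance"])

# Mixed: a horizontal fragment that is then split vertically.
tn_collect = vertical_fragment(tn, "id", ["balance"])

# Replication: the TN fragment is stored at two sites, the FL fragment at one.
SITES = {"Nashville": {"tn": tn}, "Miami": {"fl": fl, "tn": list(tn)}}

# The relation is recoverable: union for horizontal, key join for vertical.
rebuilt_rows = sorted(tn + fl, key=lambda r: r["id"])
by_id = {row["id"]: row for row in collect}
rebuilt_cols = [{**row, **by_id[row["id"]]} for row in service]
print(rebuilt_rows == CUSTOMER, rebuilt_cols == CUSTOMER)
```

Repeating the key in each vertical fragment is what makes the join back to the full relation possible; horizontal fragments need only a union.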
• Replication scenarios
  – Fully replicated database
    • Stores multiple copies of each database fragment at multiple sites
    • Can be impractical due to amount of overhead
  – Partially replicated database
    • Stores multiple copies of some database fragments at multiple sites
    • Most DDBMSs are able to handle the partially replicated database well
  – Unreplicated database
    • Stores each database fragment at single site
    • No duplicate database fragments

Data Allocation
• Deciding where to locate data
• Allocation strategies
  – Centralized data allocation
    • Entire database is stored at one site
  – Partitioned data allocation
    • Database is divided into several disjoint parts (fragments) and stored at several sites
  – Replicated data allocation
    • Copies of one or more database fragments are stored at several sites
• Data distribution over computer network is achieved through data partition, data replication, or combination of both

Client/Server vs. DDBMS
• Client/server architecture
  – Way in which computers interact to form system
  – Features user of resources, or client, and provider of resources, or server
  – Can be used to implement a DBMS in which client is the TP and server is the DP
• Client/server advantages
  – Less expensive than alternate minicomputer or mainframe solutions
  – Allow end user to use microcomputer’s GUI, thereby improving functionality and simplicity
  – More people in job market have PC skills than mainframe skills
  – PC is well established in workplace
  – Numerous data analysis and query tools exist to facilitate interaction with DBMSs available in PC market
  – Considerable cost advantage to offloading applications development from mainframe to powerful PCs
• Client/server disadvantages
  – Creates more complex environment
    • Different platforms (LANs, operating systems, and so on) are often difficult to manage
  – An increase in number of users and processing sites often paves the way for security problems
  – Possible to spread data access to much wider circle of users
    • Increases demand for people with broad knowledge of computers and software
    • Increases burden of training and cost of maintaining the environment

C. J. Date’s Twelve Commandments for Distributed Databases
• Local site independence
• Central site independence
• Failure independence
• Location transparency
• Fragmentation transparency
• Replication transparency
• Distributed query processing
• Distributed transaction processing
• Hardware independence
• Operating system independence
• Network independence
• Database independence

Summary
• Distributed database stores logically related data in two or more physically independent sites connected via computer network
• Distributed processing is division of logical database processing among two or more network nodes
• Distributed databases require distributed processing
• Main components of DDBMS are transaction processor and data processor
• Current database systems can be classified by extent to which they support processing and data distribution
• Homogeneous distributed database system integrates only one particular type of DBMS over computer network
• Heterogeneous distributed database system integrates several different types of DBMSs over computer network
• DDBMS characteristics are best described as set of transparencies
• Transaction is formed by one or more database requests
• Distributed concurrency control is required in network of distributed databases
• Distributed DBMS evaluates every data request to find optimum access path in distributed database
• The design of distributed database must consider fragmentation and replication of data
• Database can be replicated over several different sites on computer network
• Client/server architecture refers to way in which two computers interact over computer network to form a system
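
Looking back at the Two-Phase Commit Protocol section, the rule that no final COMMIT is issued until every site is ready, and that each DP writes its log entry before changing its fragment, can be sketched as below. This is a simplified, single-process illustration under stated assumptions: the class and function names are hypothetical, the `can_commit` flag merely simulates a local failure, and message loss, timeouts, and recovery are left out.

```python
class DataProcessorSite:
    """One participating DP; `can_commit` simulates local success or failure."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []   # transaction log, written before any data change
        self.data = {}  # the local database fragment

    def prepare(self, txn_id, updates):
        # Write-ahead: record the intended updates in the log first.
        self.log.append(("PREPARE", txn_id, dict(updates)))
        return self.can_commit  # vote: True = ready, False = not ready

    def commit(self, txn_id, updates):
        self.log.append(("COMMIT", txn_id))
        self.data.update(updates)  # the fragment is changed only now

    def abort(self, txn_id):
        self.log.append(("ABORT", txn_id))


def two_phase_commit(txn_id, work):
    """`work` maps each site to the updates it must apply for this transaction."""
    # Phase 1: ask every site to prepare and collect the votes.
    votes = [site.prepare(txn_id, updates) for site, updates in work.items()]
    # Phase 2: commit everywhere only if every site voted yes; otherwise abort.
    if all(votes):
        for site, updates in work.items():
            site.commit(txn_id, updates)
        return "COMMIT"
    for site in work:
        site.abort(txn_id)
    return "ABORT"


ny = DataProcessorSite("NY")
la = DataProcessorSite("LA", can_commit=False)
first = two_phase_commit("T1", {ny: {"stock": 9}, la: {"invoice": 1}})

la.can_commit = True
second = two_phase_commit("T2", {ny: {"stock": 9}, la: {"invoice": 1}})
print(first, second)
```

In the first run one site votes no, so neither fragment changes; in the second run both vote yes and both fragments are updated together.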