Concepts of Database Management, Fifth Edition

Chapter 9
Database Management Approaches

At a Glance

Table of Contents
Overview
Objectives
Instructor Notes
Quick Quizzes
Key Terms

Lecture Notes

Overview

In this chapter, students examine several database management topics, most of which are applicable to relational systems. Students learn about the issues involved in distributed processing and distributed databases. They study client/server systems and data warehouses. Students examine object-oriented systems, which treat data as objects and include the actions that operate on the objects. They also study the impact of the Web on database access. Finally, students investigate the history of DBMSs and the network and hierarchical data models.

Chapter Objectives

Describe distributed database management systems (DBMSs).
Discuss client/server systems.
Define data warehouses and explain their structure and access.
Discuss the general concepts of object-oriented DBMSs.
Summarize the impact of Web access to databases.
Provide a brief history of database management.
Describe the network and hierarchical data models.

Instructor Notes

Distributed Databases

A distributed database is a single logical database that is physically divided among computers at several sites on a network. A distributed database management system (DDBMS) is a DBMS capable of supporting and manipulating distributed databases.

Use Figure 9.1 to explain the terms distributed database and distributed database management system (DDBMS).

Computers in a network communicate through messages. Accessing data using messages over a network is substantially slower than accessing data on a disk. In general, to access data rapidly in a distributed database, you must attempt to minimize the number of messages. It is usually preferable to send a small number of lengthy messages rather than a large number of short messages.
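To make the messaging point concrete, a short calculation may help in class. The sketch below assumes the common model in which the time to send one message is a fixed access delay plus the data volume divided by the transmission rate. The function names and the sample figures (a 2-second access delay and a 50,000 bits-per-second line) are illustrative assumptions, not values taken from the text.

```python
def message_time(access_delay_s, data_volume_bits, rate_bits_per_s):
    """Time for one message: fixed access delay plus transfer time (assumed model)."""
    return access_delay_s + data_volume_bits / rate_bits_per_s


def total_time(message_count, bits_per_message,
               access_delay_s=2.0, rate_bits_per_s=50_000):
    """Total time to send message_count messages of equal size.

    The default delay and rate are hypothetical classroom figures.
    """
    one_message = message_time(access_delay_s, bits_per_message, rate_bits_per_s)
    return message_count * one_message


if __name__ == "__main__":
    # The same 100,000 bits sent two ways.
    many_short = total_time(message_count=100, bits_per_message=1_000)
    one_long = total_time(message_count=1, bits_per_message=100_000)
    print(f"100 short messages: about {many_short:.0f} seconds")
    print(f"1 long message:     about {one_long:.0f} seconds")
```

Because every message pays the access delay, the 100 short messages are dominated by delay time, while the single long message pays it only once.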
Use the formula for message transmission time and the illustration on page 288 to explain the messaging concept.

Characteristics of Distributed DBMSs

A DDBMS can be homogeneous (the same local DBMS at each site) or heterogeneous (different local DBMSs). Heterogeneous DDBMSs are more complex and more difficult to manage. All DDBMSs share the characteristics of location transparency, replication transparency, and fragmentation transparency.

Location Transparency

Location transparency is the characteristic that users do not need to be aware of the location of data in a distributed database.

Replication Transparency

Replication lets users at different sites use and update copies of a database and then share their updates with other users. Replication transparency is the characteristic that a DDBMS updates the various copies of data behind the scenes; users should be unaware of the steps.

Fragmentation Transparency

A DDBMS supports data fragmentation if the DDBMS can divide and manage a logical object, such as the records in a table, among the various locations under its control. If users are unaware of fragmentation, the DDBMS has fragmentation transparency.

Use Figures 9.2 and 9.3 to illustrate fragmentation transparency.

Advantages of Distributed Databases

When compared with a single centralized database, distributed databases offer the following advantages:

Local control of data: When each location retains its own data, it can exercise greater control.
Increasing database capacity: If the size of the disk at a single site becomes inadequate for its database, only the disk capacity at that site needs to be increased. To increase the capacity of the entire database, add a new site.
System availability: If one of the local databases in a distributed database becomes unavailable, only users who need data in that particular database are affected.
Also, if the data has been replicated, potentially all users can continue processing.
Added efficiency: When data is available locally, you eliminate network communication delays and can retrieve data faster than with a remote centralized database.

Disadvantages of Distributed Databases

Distributed databases have the following disadvantages:

Update of replicated data: Extra time is needed to update all the copies at the various sites.
More complex query processing: Complexity arises from the difference between the time it takes to send messages between sites and the time it takes to access a disk.
More complex treatment of concurrent update: Concurrent update in a distributed database is treated in basically the same way as it is treated in nondistributed databases. There is, however, an additional level of complexity created in a distributed environment.
More complex recovery: The basic recovery process is the same as that for nondistributed databases. However, to make sure that the database remains consistent, each database update should be either made permanent or aborted and undone.
More difficult management of data dictionary: The location of the data dictionary entries is a concern. There are three possibilities: choose one site and store the complete data dictionary at this site; store a complete copy of the data dictionary at each site; or distribute, possibly with replication, the data dictionary entries among the various sites.
More complex database design: Distributing data does not affect the information-level design. During the physical ...

Rules for Distributed Databases

C. J. Date formulated 12 rules that distributed databases should follow. These rules are:

1. Local autonomy: No site should depend on another site to perform its functions.
2. No reliance on a central site: A DDBMS should not need to rely on one site more than any other site.
3.
Continuous operation: Performing any function should not shut down the entire distributed database.
4. Location transparency: Users should feel as if the entire database is stored at their location.
5. Fragmentation transparency: Users should feel as if they are using a single central database.
6. Replication transparency: Users should not be aware of any data replication.
7. Distributed query processing: A DDBMS must process queries as rapidly as possible even though the data is distributed.
8. Distributed transaction management: A DDBMS must effectively manage transaction updates at multiple sites.
9. Hardware independence: A DDBMS must be able to run on different types of hardware.
10. Operating system independence: A DDBMS must be able to run on different operating systems.
11. Network independence: A DDBMS must be able to run on different types of networks.
12. DBMS independence: A DDBMS must be heterogeneous; that is, it must support different local DBMSs.

Client/Server Systems

In a network environment, a file server stores files required by users on the network. When users need data from a file, the entire file is sent.

Use Figure 9.4 to describe file server architecture.

In client/server architecture, the server is a computer that provides data to the clients, which are the computers that are connected to the network and that people use to access data stored on the server. The DBMS runs on the server, and a client sends a request for specific data to the server. Only the necessary data, and not the entire file or files, is sent.

A client/server architecture may be either two-tier or three-tier. In a two-tier architecture, the server performs database functions and the clients perform the presentation (user interface) functions. Either the server or the clients may perform the business functions. The term fat client refers to an arrangement in which the clients perform the business functions. If the business functions reside on the server, each client is called a thin client.
Use Figure 9.5 to illustrate two-tier client/server architecture.

In a three-tier architecture, the clients perform the presentation functions, a database server performs the database functions, and separate computers, called application servers, perform the business functions and act as an interface between the clients and the database server.

Use Figure 9.6 to illustrate three-tier client/server architecture.

Advantages of Client/Server Systems

The advantages of using a client/server system instead of a file server are:

Lower network traffic: Transmits only the necessary data, rather than entire files, across the network.
Improved processing distribution: Can distribute processing functions among multiple computers.
Thinner clients: Because application and database servers handle most of the processing, clients do not need to be as powerful or as expensive as in a file-server environment.
Greater processing transparency: Users do not need to learn any special commands or techniques.
Increased network, hardware, and software transparency: Because SQL is the common language, it is easier for users to access data from a variety of sources. A single operation could access data from different networks, different computers, and different operating systems.
Improved security: Can place additional security features on the application servers and on the network.
Decreased costs: Can replace mainframe applications and mainframe databases with PC applications and databases at a considerable cost savings.
Increased scalability: Can upgrade the appropriate server or add additional processors to share the processing load.

Triggers and Stored Procedures

Triggers, which are actions that occur automatically in response to associated database operations, provide additional integrity support.

Review the SQL example to create a trigger.
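For instructors who want a runnable demonstration alongside the text's SQL example, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the example from the textbook: the part and order_line tables, their columns, the trigger name, and the sample values are all hypothetical, and SQLite's trigger syntax differs in places from other DBMSs. The trigger automatically reduces the units on hand for a part whenever an order line for that part is inserted.

```python
import sqlite3

# Hypothetical tables and trigger, written in SQLite's dialect of SQL.
SCHEMA = """
CREATE TABLE part (
    part_num TEXT PRIMARY KEY,
    on_hand  INTEGER NOT NULL
);
CREATE TABLE order_line (
    order_num   TEXT NOT NULL,
    part_num    TEXT NOT NULL REFERENCES part(part_num),
    num_ordered INTEGER NOT NULL
);
-- Fires automatically after each insert into order_line.
CREATE TRIGGER reduce_on_hand AFTER INSERT ON order_line
BEGIN
    UPDATE part
       SET on_hand = on_hand - NEW.num_ordered
     WHERE part_num = NEW.part_num;
END;
"""


def open_demo_db():
    """Create an in-memory database with one part that has 50 units on hand."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO part VALUES ('AT94', 50)")
    return conn


def on_hand(conn, part_num):
    """Return the current units on hand for a part."""
    row = conn.execute(
        "SELECT on_hand FROM part WHERE part_num = ?", (part_num,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_demo_db()
    # The application only inserts the order line; the trigger updates part.
    conn.execute("INSERT INTO order_line VALUES ('21608', 'AT94', 11)")
    print(on_hand(conn, "AT94"))  # prints 39
```

The point to stress is that the application never issues the UPDATE itself; the DBMS enforces the rule on the server, whichever client inserts the row.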
If users execute a trigger or other collection of SQL statements repeatedly, you can improve the performance of a client/server system by placing the statements in a special file, called a stored procedure.

Data Warehouses

For routine update and retrieval operations, users typically interact with an RDBMS using online transaction processing (OLTP) systems. These systems are ideal for operational needs but suffer from severe performance problems when used for data analysis. Consequently, organizations have turned to data warehouses for the analysis of data.

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.

Subject-oriented means that data is organized by entity rather than by application.
Integrated means that data is stored in one place in the data warehouse.
Time-variant means that data in a data warehouse represents snapshots of data at various points in time in the past.
Nonvolatile means that data is read-only.

Use Figure 9.7 to illustrate data warehouse architecture.

Spend some time on the subject of a data warehouse. Show the relationship between the type of information being extracted and marketing decisions. Use the examples in the text to ask students what marketing strategies Premiere Products might develop. Organizations use the results of these analyses to target specific consumers.

Data Warehouse Structure and Access

The typical data warehouse data structure is a star schema, consisting of a central fact table surrounded by dimension tables. A fact table consists of rows that contain consolidated and summarized data. The fact table has a multipart primary key, each part of which is a foreign key to one of the surrounding dimension tables. Each dimension table has a single-part primary key that serves as an index for the fact table; the dimension table also contains other fields associated with the primary key value.

Use Figure 9.8 to illustrate a star schema.
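The star schema description above can be sketched in a few lines with Python's built-in sqlite3 module. This is an illustrative sketch only: the table names, columns, and sample rows are hypothetical and are not the tables shown in the text's figure. It builds three dimension tables with single-part primary keys and a fact table whose multipart primary key is made up of foreign keys to those dimensions, then runs one aggregate query that joins the fact table to a dimension table.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by three dimension tables.
STAR_SCHEMA = """
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, description   TEXT);
CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY,
                           sale_year   INTEGER, sale_month INTEGER);
CREATE TABLE sales_fact (
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    product_id  INTEGER REFERENCES product_dim(product_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    total_sales INTEGER NOT NULL,          -- summarized measure
    PRIMARY KEY (customer_id, product_id, time_id)
);
"""


def build_warehouse():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                     [(1, "Customer A"), (2, "Customer B")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [(10, "Iron"), (20, "Dryer")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                     [(100, 2007, 1), (101, 2007, 2)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                     [(1, 10, 100, 250), (2, 10, 100, 150), (1, 20, 101, 300)])
    return conn


def sales_by_month(conn):
    """Total sales per month, joining the fact table to the time dimension."""
    return conn.execute("""
        SELECT t.sale_year, t.sale_month, SUM(f.total_sales)
          FROM sales_fact AS f
          JOIN time_dim   AS t ON t.time_id = f.time_id
         GROUP BY t.sale_year, t.sale_month
         ORDER BY t.sale_year, t.sale_month
    """).fetchall()


if __name__ == "__main__":
    print(sales_by_month(build_warehouse()))  # [(2007, 1, 400), (2007, 2, 300)]
```

Swapping time_dim for customer_dim or product_dim in the join gives totals along a different dimension, which is a useful way to show students why the fact table carries one foreign key per dimension.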
Access to a data warehouse is accomplished through the use of online analytical processing (OLAP) software. When users access a data warehouse, their queries usually involve aggregate data, such as total sales by month and average sales by customer. Users often need to perform further analysis on the aggregate results. The most common types of analyses are slice and dice, drill down, and roll up.

Use Figures 9.9 through 9.15 to illustrate these analyses.

Data mining consists of uncovering new knowledge, patterns, trends, and rules from the data stored in a data warehouse.

Rules for OLAP Systems

E. F. Codd formulated 12 rules that OLAP systems should follow. These 12 rules are:

1. Multidimensional conceptual view: Users should be able to view data in a multidimensional way.
2. Transparency: Users should not have to know they are using a multidimensional database.
3. Accessibility: Users should perceive data as a single user view.
4. Consistent reporting performance: The size and complexity of the warehouse should not affect reporting performance.
5. Client/server architecture: The server portion of the OLAP software should allow the use of different types of clients.
6. Generic dimensionality: Each data dimension should have the same structural and operational capabilities.
7. Dynamic sparse matrix handling: Missing data should be handled correctly and efficiently.
8. Multiuser support: OLAP should provide secure, concurrent access.
9. Unrestricted, cross-dimensional operations: Users should be able to perform the same operations across any number of dimensions.
10. Intuitive data manipulation: Users should not need to use special interfaces to make their requests.
11. Flexible reporting: Users should be able to report data results any way they want.
12. Unlimited dimensions and aggregation levels: OLAP software should allow at least 15 data dimensions and an unlimited number of summary levels.

Quick Quiz

1.
A(n) _____ is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.
Answer: data warehouse
2. A data warehouse structure that contains a fact table and surrounding dimension tables is called a(n) _____ schema.
Answer: star
3. _____ consists of uncovering new knowledge, patterns, trends, and rules from the data stored in a data warehouse.
Answer: Data mining

Object-Oriented DBMSs

Relational databases store and access data consisting of text and numbers. Graphics, drawings, photographs, video, sound, voice mail, spreadsheets, and other complex objects can be stored in relational databases using special data types known as binary large objects (BLOBs). When the primary focus is the storage and management of complex objects, many companies use object-oriented DBMSs.

Focus on a specific database that students are familiar with or that has been discussed in class, and ask questions about it to make sure students understand the underlying concepts. Use the embedded questions to test students’ understanding of object-oriented concepts.

What Is an Object-Oriented DBMS?

An object-oriented database management system (OODBMS) is one in which data and the actions that operate on the data are encapsulated into objects. An object is a set of related attributes along with the actions that are associated with the set of attributes. The concepts of object class, method, message, and inheritance are fundamental to all object-oriented systems.

Objects and Classes

Use Figure 9.16 to explain the distinction between objects and classes.

The differences between a collection of objects and the relational model representation are:

Entities are represented as objects rather than as relations.
Attributes are listed vertically below the object names.
Each attribute is followed by the name of the domain associated with the attribute.
A domain is the set of values that are permitted for an attribute.
Objects can contain other objects.
An object can contain a portion of another object.

A class is a generalized category that describes a group of objects that can exist within it. For any class, you can define a subclass.

Methods and Messages

Methods are the actions defined for a class.

Use Figure 9.17 to explain methods.

You define methods during the data definition process. To cause a particular method to be executed, you send a message to the object. A message is a request to execute a method.

Inheritance

Inheritance is a key feature of object-oriented systems. For any class, you can define a subclass. Every occurrence of the subclass is also considered to be an occurrence of the class. The subclass inherits the structure of the class as well as its methods.

Unified Modeling Language

Unified Modeling Language (UML) is an approach to modeling all the various aspects of software development for object-oriented systems. UML includes several types of diagrams, each with its own special purpose, that can be used to represent database designs. The type of diagram most relevant to database design is the class diagram.

Use Figures 9.18 through 9.21 to illustrate UML. Stress that UML is rapidly becoming the industry standard in object-oriented software development. To learn more about UML, access the IBM Rational Rose Web site, http://www306.ibm.com/software/rational/uml/.

Rules for OODBMSs

There are 14 rules to use as benchmarks against which to measure object-oriented systems. These rules are:

1. Complex objects: Must support the creation of complex objects from simple objects.
2. Object identity: Must provide a way to identify objects; that is, must be able to distinguish one object from another.
3. Encapsulation: Must encapsulate data and associated methods together in the database.
4. Information hiding: Must hide the details concerning the way data is stored and the actual implementation of methods.
5.
Types or classes: Must support either abstract types or classes.
6. Inheritance: Must support inheritance.
7. Late binding: Must be able to use the same name for different operations (late binding allows this). In object-oriented systems, this is called polymorphism.
8. Computational completeness: Can use functions in the language of the OODBMS to perform computations.
9. Extensibility: Must be able to define new data types.
10. Persistence: Must have the ability to have a program remember its data from one execution to the next.
11. Performance: Should be able to manage very large databases.
12. Concurrent update support: Must support concurrent update.
13. Recovery support: Must provide recovery services.
14. Query facility: Must provide query facilities.

Web Access to Databases

Many organizations now use the Internet and a Web browser to conduct commercial activities. Collectively, these activities are called electronic commerce (e-commerce). Users access databases via Web browsers. A three-tier client/server architecture makes this access possible. If a company uses this architecture, a Web server often replaces the application server to process client requests. A more flexible architecture positions Web servers as an additional tier.

Use Figure 9.22 to explain access to a database via a Web browser.

Many different software languages, products, and standards support e-commerce. One of these languages, XML (Extensible Markup Language), is particularly suited to the exchange of data between different programs.

Mention that Access provides the ability to import and export data in XML. Access also supports data access pages. A data access page is an HTML document that can be bound directly to data in the database.

History of Database Management

The beginnings of database management coincided with the Apollo project of the 1960s.
IBM developed the Generalized Update Access Method (GUAM) to handle the coordination of the vast amounts of data required for the space project. Other DBMSs, such as Integrated Data Store (IDS), also were developed during the 1960s. The relational model had its beginnings in 1970 with the publication of a paper by E. F. Codd. Commercial relational DBMSs did not appear until the 1980s. Microsoft Access is currently the dominant PC-based relational DBMS. Other large relational DBMSs include Oracle, Sybase, MySQL, and SQL Server.

Hierarchical and Network Databases

You can categorize a DBMS by the data model the DBMS follows. There are four data models, each of which has two components: structure and operations. The relational model and the object-oriented model have been discussed previously in this text. The network and hierarchical models are in declining use.

Point out that although the use of network and hierarchical model DBMSs is declining, some of their concepts show up in current database models.

Network Model

Users perceive a network model database as a collection of record types and relationships between these record types. Such a structure is called a network.

Use Figure 9.23 to explain the network model.

Hierarchical Model

Users perceive a hierarchical model database as a collection of hierarchies, or trees.

Use Figure 9.24 to explain the hierarchical model.

Key Terms

All key terms are defined in the Glossary section of the textbook.
Access
access delay
American National Standards Institute (ANSI)
application server
association
back-end machine
back-end processor
binary large object (BLOB)
binding
business to business (B2B)
business to consumer (B2C)
class
class diagram
client
client/server system
communications network
COnference on DAta SYstems Languages (CODASYL)
coordinator
data cube
data fragmentation
Data Language/I (DL/I)
data mining
data model
data warehouse
database navigation
DataBase Task Group (DBTG)
DB2
dBASE
dimension table
distributed database
distributed database management system (DDBMS)
domain
drill down
electronic commerce (e-commerce)
encapsulated
extensible
Extensible Markup Language (XML)
fact table
fat client
file server
fragmentation transparency
front-end machine
front-end processor
Gemstone
generalization
Generalized Update Access Method (GUAM)
global deadlock
heterogeneous DDBMS
hierarchical model
hierarchy
homogeneous DDBMS
Information Management System (IMS)
inheritance
Integrated Data Management System (IDMS)
Integrated Data Store (IDS)
local deadlock
local site
location transparency
message
method
multidimensional database
multiplicity
MySQL
network
network model
object
Objectivity/DB
object-oriented database management system (OODBMS)
object-relational DBMS (ORDBMS)
on-line analytical processing (OLAP)
on-line transaction processing (OLTP)
operation
Oracle
Paradox
persistence
polymorphism
primary copy
private visibility
protected visibility
public visibility
relational DBMS (RDBMS)
remote site
replication transparency
roll up
scalability
server
slice and dice
SQL Server
star schema
stored procedure
structure
subclass
superclass
Sybase
System R
thin client
three-tier architecture
trigger
two-phase commit
two-tier architecture
Unified Modeling Language (UML)
Versant
visibility symbol