Chapter 28: IBM DB2 Universal Database Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Database System Concepts Chapter 1: Introduction Part 1: Relational databases Chapter 2: Relational Model Chapter 3: SQL Chapter 4: Advanced SQL Chapter 5: Other Relational Languages Part 2: Database Design Chapter 6: Database Design and the E-R Model Chapter 7: Relational Database Design Chapter 8: Application Design and Development Part 3: Object-based databases and XML Chapter 9: Object-Based Databases Chapter 10: XML Part 4: Data storage and querying Chapter 11: Storage and File Structure Chapter 12: Indexing and Hashing Chapter 13: Query Processing Chapter 14: Query Optimization Part 5: Transaction management Chapter 15: Transactions Chapter 16: Concurrency control Chapter 17: Recovery System Database System Concepts - 5th Edition, May 23, 2005 Part 6: Data Mining and Information Retrieval Chapter 18: Data Analysis and Mining Chapter 19: Information Retreival Part 7: Database system architecture Chapter 20: Database-System Architecture Chapter 21: Parallel Databases Chapter 22: Distributed Databases Part 8: Other topics Chapter 23: Advanced Application Development Chapter 24: Advanced Data Types and New Applications Chapter 25: Advanced Transaction Processing Part 9: Case studies Chapter 26: PostgreSQL Chapter 27: Oracle Chapter 28: IBM DB2 Chapter 29: Microsoft SQL Server Online Appendices Appendix A: Network Model Appendix B: Hierarchical Model Appendix C: Advanced Relational Database Model 27.2 ©Silberschatz, Korth and Sudarshan Part 9: Case studies (Chapters 26 through 29). Chapter 26: PostgreSQL Chapter 27: Oracle Chapter 28: IBM DB2 Chapter 29: Microsoft SQL Server. These chapters outline unique features of each of these systems, and describe their internal structure. They provide a wealth of interesting information about the respective products, and help you see how the various implementation techniques described in earlier parts are used in real systems. They also cover several interesting practical aspects in the design of real systems. Database System Concepts - 5th Edition, May 23, 2005 27.3 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.4 ©Silberschatz, Korth and Sudarshan Overview The origin of DB2 The System R project at IBM’s Almaden Research Center in 1976 The first DB2 product was released in 1984 IBM continually enhanced the DB2 product in areas Transaction processing Query processing and optimization Parallel processing Active database support Advanced query and warehousing techniques Object-relational support Database System Concepts - 5th Edition, May 23, 2005 27.5 ©Silberschatz, Korth and Sudarshan Overview (cont.) DB2 database engine consists of four code base types Linux, Unix, and Windows z/OS VM OS/400 In this chapter, the focus is on the DB2 Universal Database (UDB) engine that supports Linux, Unix, and Windows The latest version of DB2 is version 8.2, which improves the scalability, availability, and general robustness of the DB2 engine Database System Concepts - 5th Edition, May 23, 2005 27.6 ©Silberschatz, Korth and Sudarshan Overview (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.7 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.8 ©Silberschatz, Korth and Sudarshan Database-Design Tools DB2 provides support for many logical and physical database features using SQL The feature includes constraints, triggers, and recursion using SQL constructs Physical database features such as tablespaces, bufferpools, and partitioning are also supported by using SQL statements DB2 Control Center Includes various design- and administration-related tools Provides a tree-view of a server and its database Allows users to define new objects, create ad-hoc SQL queries, and view query results Database System Concepts - 5th Edition, May 23, 2005 27.9 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.10 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions DB2 supports a rich set of SQL features for database processing XML features A rich set of XML functions have been included in DB2 xmlelement xmlattributes xmlforest xmlconcat xmlserialize xmlagg xml2clob Fig. 28.2 shows an SQL query with XML extensions Fig. 28.3 shows its resultant output Database System Concepts - 5th Edition, May 23, 2005 27.11 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.12 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (cont.) Support for Data Types DB2 supports user-defined data types Users can define distinct or structured data types User-defined data types: distinct create distinct type us_dollar as decimal(9,2) User can create a field in a table with type us_dollar select product from us_sales where price > us_dollar(1000) Database System Concepts - 5th Edition, May 23, 2005 27.13 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (Cont.) Support for Data Types (cont.) User-defined data types: structured Structured data types are complex objects that usually consist of two or more attributes. create type department_t as (deptname varchar(32), depthead varchar(32), faculty_count integer) mode db2sql Structured types can be used to define typed tables create table dept of department_t Database System Concepts - 5th Edition, May 23, 2005 27.14 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (Cont.) User-Defined Functions (UDFs) and Methods Users can define their own functions and methods Functions can generate scalars (single attribute) or tables (multiattribute row) as their result User can register functions using the create function statement User-defined functions can operate in fenced or unfenced modes. In fenced mode: the functions are executed by a separate thread in its own address space In unfenced mode: the database processing agent is allowed to execute the function in the server’s address space Database System Concepts - 5th Edition, May 23, 2005 27.15 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (Cont.) User-Defined Functions (UDFs) and Methods (cont.) Fig 28.4 shows a definition of UDF, db2gse.GsegeFilterDist Database System Concepts - 5th Edition, May 23, 2005 27.16 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (Cont.) Large Objects New database applications require the manipulation of text, images, video, and other types of data that are typically quite large DB2 supports these requirements by providing three different large object (LOB) types Binary Large Objects (blobs) Single Byte Character Large Objects (clobs) Double Byte Character Large Objects (dbclobs) Database System Concepts - 5th Edition, May 23, 2005 27.17 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (Cont.) Indexing Extensions and Constraints Users can create index extensions to generate keys from structured data types by using create index extension statement Database System Concepts - 5th Edition, May 23, 2005 27.18 ©Silberschatz, Korth and Sudarshan SQL Variations and Extensions (Cont.) Web Services DB2 can integrate Web services as producer or consumer For example, a Web service invokes the following SQL select trn_id, amount, date from transactions where cust_id = <input> order by date fetch first 1 row only; The following SQL shows DB2 acting as a consumer of a Web service select ticker_id, GetQuote(ticker_id) from portfolio Database System Concepts - 5th Edition, May 23, 2005 27.19 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.20 ©Silberschatz, Korth and Sudarshan Storage and Indexing Storage Architecture DB2 provides storage abstraction for managing logical database tables usefully in multinode and multidisk environment Nodegroups can be defined to support table partitioning across a specific set of nodes in a multinode system → Flexibility in allocating table partitions ex) Large tables may be partitioned across all nodes in a system Small tables may reside on a single node Database System Concepts - 5th Edition, May 23, 2005 27.21 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Storage Architecture (cont.) DB2 uses tablespaces to organize tables A tablespace consists of one or more containers, which are references to directories, devices, or files A tablespace may contain zero or more database objects such as tables, indices, or LOBs Fig 28.6 shows that Two tablespaces are defined for a nodegroup Humanres tablespace is assigned four containers Sched tablespace has only one container Employee and department tables are assigned to the humanres tablespace Project table is in the sched tablespace Database System Concepts - 5th Edition, May 23, 2005 27.22 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.23 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Storage Architecture (cont.) Two kinds of tablespaces System-managed spaces (SMS) : Directories or file systems that the underlying operating system maintains Data-managed spaces (DMS) : Raw devices or preallocated files that are then controlled by DB2 DB2 support stripping across the different containers One-by-one extent page in round-robin fashion Benefits are parallel I/O and load balancing Database System Concepts - 5th Edition, May 23, 2005 27.24 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Buffer Pools A buffer pool is a common shared data area that maintains memory copies of objects One or more buffer pools may be associated each tablespace for managing different objects DB2 allows buffer pools to be defined by SQL statements create bufferpool <buffer-pool> …. alter bufferpool <buffer-pool> size <n> Database System Concepts - 5th Edition, May 23, 2005 27.25 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Tables, Records, and Indices A table consists of a set of pages, and each page consists of a set of records user data records or special system records DB2 uses a space map record called Free Space Control Record (FSCR) to find free space in table Fig 28.7 shows the logical view of a table and an associated index Database System Concepts - 5th Edition, May 23, 2005 27.26 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.27 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Tables, Records, and indices (cont.) Indices are organized as pages containing index records and pointers to child and sibling pages DB2 supports B+-tree index mechanisms DB2 contains internal pages and leaf pages Leaf pages contain index entries that point to records in the table Each record in a table can be uniquely identified by using its page and slot information, called a record ID (RID) DB2 supports “include columns” in the index definition create unique index I1 on T1(C1) include (C2) Specifies that C2 is to be included as an extra column in an index on column C1 This enables DB2 use “index-only” query processing techniques Database System Concepts - 5th Edition, May 23, 2005 27.28 ©Silberschatz, Korth and Sudarshan Storage and Indexing (Cont.) Tables, Records, and indexes (cont.) Data page Fig. 28.8 shows the typical data page format in DB2 Each data page contains a header and a slot directory The slot directory is an array of 255 entries that points to record offsets in the page Page 1056 contains record 1 at offset 3700, which is a forward pointer to the record <473, 2> Record <473,2> is an overflow record created as a result of an update operation of the original record <1056,1> DB2 supports different page sizes such as 4,8,16 and 32 KB Database System Concepts - 5th Edition, May 23, 2005 27.29 ©Silberschatz, Korth and Sudarshan Storage and Indexing (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.30 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.31 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering Multidimensional Clustering (MDC) DB2 table may be created by specifying one or more keys as dimensions along which to cluster the table’s data A new clause called organize by dimensions For example, the following DDL describes a sales table organized by storedId, year(orderDate), and itemId as dimensions create table sales(storeId int, orderDate date, shipDate date, receiptDate date, region int, itemId int, price float, yearOd int generated always as year(orderDate)) organize by dimensions (region, yearOd, itemId) Database System Concepts - 5th Edition, May 23, 2005 27.32 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering (cont.) Multidimensional Clustering (MDC) (cont.) Each of dimensions may consist of one or more columns A ‘dimension block index’ is automatically created for each of the specified dimensions used to access data quickly and efficiently A composite block index is used to maintain the clustering of data over insert and update activity Every unique combination of dimension values forms a logical “cell”, which is physically organized as block of pages Fig 28.9 shows a simple logical cube with only two values for each dimension attribute Database System Concepts - 5th Edition, May 23, 2005 27.33 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.34 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering (cont.) Block Indices A dimension block index is created on each of the year(orderDate), region, and itemId attributes Each is structured in the same manner as a traditional B-tree index except that, at the leaf level, the keys point to a block identifier (BID) instead of a RID Fig. 28. 10 illustrates slices of blocks for specific values of region and itemId dimensions, respectively Database System Concepts - 5th Edition, May 23, 2005 27.35 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.36 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering (cont.) Block Map A block map Is also associated with the table Records the state of each block belonging to the table May be in such states as – In use – Free – Loaded – Requiring constant enforcement Database System Concepts - 5th Edition, May 23, 2005 27.37 ©Silberschatz, Korth and Sudarshan Multidimensional Clustering (cont.) Design Considerations A crucial aspect of MDC is to choose The right set of dimensions for clustering a table The right block size parameter to minimize the space utilization If chosen appropriately, If chosen incorrectly, Significant performance and maintenance advantages Performance degradation and worse space utilization There are a number of tuning knobs that can be exploited to organize the table Database System Concepts - 5th Edition, May 23, 2005 27.38 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.39 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization DB2 queries are transformed into a tree of operations by the query compiler The query operator tree is used at execution time for processing DB2 supports a rich set of query operators Figure 28.12 and 28.13 show a query and its associated query plan The query is a representative complex query from the TPC-H benchmark and contains several joins and aggregations The query plan in this example is rather simple DB2 provides powerful visual explain feature that can help users understand the details of a query execution plan The query plan in the figure is based on the visual explain for the query Database System Concepts - 5th Edition, May 23, 2005 27.40 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.41 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) # TPC- H benchmark The TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry The TPC Benchmark (TPC-H) is a decision support benchmark It consists of a suite of business oriented ad-hoc queries and concurrent data modifications The queries and the data populating the database have been chosen to have broad industry-wide relevance This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions Database System Concepts - 5th Edition, May 23, 2005 27.42 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Access Methods DB2 supports a comprehensive set of access methods Table scan Is the most basic method and performs a page-by-page access of all records in the table Index scan DB2 accesses the qualifying records using the RIDs in the index Block index scan A new The access method for MDC tables qualifying blocks are accessed and processed Database System Concepts - 5th Edition, May 23, 2005 27.43 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Access Methods (cont.) Index only Is used when the index contains all the attributes required A scan of the index entries is sufficient, and there is no need to fetch the records List prefetch This access method is chosen for an unclustered index scan with a significant number of RIDs DB2 collects the RIDs of relevant records using an index scan Then sorts the RIDs by page number Finally performs a fetch of the records in sorted order from the data pages Database System Concepts - 5th Edition, May 23, 2005 27.44 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Access Methods (cont.) Block and record index ANDing DB2 uses this method when it determines that more than one index can be used to constrain the number of satisfying records in a base table It processes the most selective index to generate a list of BIDs or RIDs It then processes the next selective index to return the BIDs or RIDs A BID or RID qualifies for further processing only if it is present in the intersection (AND operation) Block and record index ORing Two or more block or record indices can be used to satisfy query predicates that are combined by using the OR operator DB2 eliminates duplicate BIDs or RIDs by performing a sort and then fetches the resulting set of records Database System Concepts - 5th Edition, May 23, 2005 27.45 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Join, Aggregation, and Set Operations DB2 can choose between nested-loop, sort-merge, and hash-join techniques In describing the join and set binary operation, we use the notation of “outer” and “inner” tables to distinguish the two input streams Nested-loop technique is useful if the inner table – Is very small – Or can be accessed by using an index on a join predicate Sort-merge-join and hash-join techniques are used for joins involving large outer and inner tables Merging technique eliminates duplicates in the case of union DB2 implements set operations by using sorting and merging techniques Database System Concepts - 5th Edition, May 23, 2005 27.46 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Support for Complex SQL Processing One of the most important aspects of DB2 is that it uses the query processing infrastructure in an extensible fashion to support complex SQL operation The complex SQL operations include support for deeply nested and correlated subqueries as well as constraints, referential integrity, and triggers DB2 can provide better efficiency and scalability by performing most of the constraint checking and triggering actions as part of the query plan Database System Concepts - 5th Edition, May 23, 2005 27.47 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Multiprocessor Query-Processing Features DB2 extends the base set of query operation with control and data exchange primitives to support SMP, MPP, and SMP-cluster modes of query processing DB2 uses a “tablequeue” abstraction for data exchange between threads on different nodes or on the same node The table queue is a buffer that redirects data to appropriate receivers by using broadcast, one-to-one, or directed multicast methods In all these modes, DB2 employs a coordinator process to control the query operations and final result gathering Coordinator processes can also perform some global databaseprocessing actions Database System Concepts - 5th Edition, May 23, 2005 27.48 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Multiprocessor Query-Processing Features (cont.) Fig 28.14 shows a simple query executing in a 4-node MPP system Database System Concepts - 5th Edition, May 23, 2005 27.49 ©Silberschatz, Korth and Sudarshan Query Processing and Optimization (cont.) Query Optimization Procedures Parsing the SQL statement to QGM (Query Graph Model) Performing semantic transformations on the QGM to enforce constraints, referential integrity, and triggers → Enhanced QGM Performing query rewrite transformations Decorrelation of correlated subqueries Transformation of subqueries into joins using early-out processing Pushing the group by operation below joins Using materialized view for portions of the original query The query optimizer component uses this enhanced and transformed QGM as its input for optimization Database System Concepts - 5th Edition, May 23, 2005 27.50 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.51 ©Silberschatz, Korth and Sudarshan Materialized Query Tables Materialized Query Tables (MQTs) In DB2, the materialized views are called materialized query tables MQTs are specified by using create table statement as shown by Fig. 28.15 Database System Concepts - 5th Edition, May 23, 2005 27.52 ©Silberschatz, Korth and Sudarshan Materialized Query Tables (cont.) Query Routing to MQTs The internal QGM model allows the compiler to match the input query against the available MQT definitions choose appropriate MQTs for consideration Fig. 28.16 shows the entire flow of the reroute and optimization Database System Concepts - 5th Edition, May 23, 2005 27.53 ©Silberschatz, Korth and Sudarshan Materialized Query Tables (cont.) Maintenance of MQTs Two dimensions to maintenance: time and cost Time dimension – Immediate – Deferred Size dimension – Incremental – Full Fig. 28.17 shows the two dimensions and the options that are the most useful Database System Concepts - 5th Edition, May 23, 2005 27.54 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.55 ©Silberschatz, Korth and Sudarshan Autonomic Features in DB2 Autonomic computing includes a set of techniques that allow the computing environment to manage itself and reduce the external dependencies Configuration DB2 supports autonomic tuning of various memory and system configuration parameters For instance, parameters such as buffer pool sizes and sort heap sizes can be specified as autonomic Database System Concepts - 5th Edition, May 23, 2005 27.56 ©Silberschatz, Korth and Sudarshan Autonomic Features in DB2 (cont.) Optimization Important aspects of improving the performance Auxiliary Data data structures (indices, MQTs) organization features (partitioning, clustering) Design Advisor Provides workload-based advice Automatically analyze a workload, using optimization techniques ex) db2advis -d <DB name> -i <workloadfile> -m MICP – M: Materialized query tables – I: Indices – C: Clustering, namely, MDC – P: Partitioning key selection Database System Concepts - 5th Edition, May 23, 2005 27.57 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.58 ©Silberschatz, Korth and Sudarshan Tools and Utilities The DB2 control center is the primary tool for use and administration of DB2 databases It is organized from data objects such as servers, databases, tables, and indices It contains task-oriented interfaces to perform commands allows users to generate SQL scripts Fig. 28.18 shows a screen shot of the main panel of the Control Center Database System Concepts - 5th Edition, May 23, 2005 27.59 ©Silberschatz, Korth and Sudarshan Tools and Utilities (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.60 ©Silberschatz, Korth and Sudarshan Tools and Utilities (cont.) Utilities DB2 provides comprehensive support for load, import, export, reorg, redistribute, and other data-related utilities DB2 supports a number of tools such as Audit facility Governor facility Query patroller facility Trace and diagnostic facilities Event monitoring facilities Database System Concepts - 5th Edition, May 23, 2005 27.61 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.62 ©Silberschatz, Korth and Sudarshan Concurrency Control and Recovery Concurrency and Isolation DB2 supports various modes of concurrency control and isolation Repeatable read (RR) mode Corresponds closely to the serializable level Ensures that range scans will find the same set of tuples if repeated Read stability (RS) mode Cursor stability (CS) mode Locks only the rows that an application retrieves in a unit of work Default isolation level Uncommitted read (UR) mode Database System Concepts - 5th Edition, May 23, 2005 27.63 ©Silberschatz, Korth and Sudarshan Concurrency Control and Recovery (cont.) Concurrency and Isolation (cont.) The various isolation modes are implemented by using locks DB2 supports record-level and table-level locks DB2 escalates from record-level to table-level locks if the space in the lock table becomes tight DB2 supports a variety of lock modes in order to maximize concurrency Fig. 28.19 shows the different lock modes and their descriptions Database System Concepts - 5th Edition, May 23, 2005 27.64 ©Silberschatz, Korth and Sudarshan Concurrency Control and Recovery (cont.) Database System Concepts - 5th Edition, May 23, 2005 27.65 ©Silberschatz, Korth and Sudarshan Concurrency Control and Recovery (cont.) Logging and Recovery DB2 supports two types of log modes Circular logging : useful for crash recovery or application failure recovery Archival logging : required for roll-forward recovery from an archival backup DB2 supports transaction rollback and crash recovery as well as point-in-time or roll-forward recovery In crash recovery, DB2 performs the standard phases of undo and redo processing For point-in-time recovery, database can be restored from a backup and can be rolled forward to a specific point in time using the archived logs The roll-forward recovery command supports both database and tablespace levels Database System Concepts - 5th Edition, May 23, 2005 27.66 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.67 ©Silberschatz, Korth and Sudarshan System Architecture DB2 Process Model Fig. 28.20 shows some of the different process or threads in DB2 Remote client applications connect to the database server through communication agents such as db2tcpcm Each application is assigned an agent called the db2agent thread A dedicated server process that serves for each active connection This agent and its subordinate agents perform the applicationrelated tasks There are a set of agents at the level of the server to perform tasks such as crash detection, license server, process creation and control of system resources Database System Concepts - 5th Edition, May 23, 2005 27.68 ©Silberschatz, Korth and Sudarshan System Architecture (Cont.) Database System Concepts - 5th Edition, May 23, 2005 27.69 ©Silberschatz, Korth and Sudarshan System Architecture (Cont.) DB2 Memory Model Fig. 28.21 shows the different types of memory segment in DB2 Private memory in agents or threads is mainly used for local variables and data structures that are relevant only for the current activity Shared memory is partitioned into database shared memory – contains useful data structures such as buffer pool, lock lists, application package caches, and shared sort areas server shared memory and application shared memory – are primarily used for common data structures and communication buffers Database System Concepts - 5th Edition, May 23, 2005 27.70 ©Silberschatz, Korth and Sudarshan System Architecture (Cont.) Database System Concepts - 5th Edition, May 23, 2005 27.71 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.72 ©Silberschatz, Korth and Sudarshan Replication, Distribution, and External Data DB2 Replication A product in the DB2 family that provides replication capabilities among DB2 other relational data sources such as Oracle, MS SQL Server, and other nonrelational data sources Queue Replication Creates a queue transport mechanism using IBM’s messagequeue product DB2 Information-integrator product Provides federation, replication, and search capabilities User-defined table functions Enable access to nonrelational and external data sources create function statement with the clause returns table Two-phase commit protocol Provides full supports for distributed transaction processing Database System Concepts - 5th Edition, May 23, 2005 27.73 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.74 ©Silberschatz, Korth and Sudarshan Business Intelligence Features DB2 Data Warehouse Edition An offering that incorporates business intelligence features Enhances the DB2 engine with features for ETL, OLAP, mining, and on-line reporting On-line Analytical Processing (OLAP) Cube views provides a mechanism to construct appropriate data structures and MQTs DB2 also provides multidimensional OLAP support using DB2 OLAP server DB2 Alphablox is a new feature that provides on-line, interactive, reporting, and analysis capabilities DB2 Intelligent Miner provides various components for modeling, scoring, and visualizing data Database System Concepts - 5th Edition, May 23, 2005 27.75 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Summary & Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.76 ©Silberschatz, Korth and Sudarshan Summary (1) This chapter provides a very brief synopsis of the features that are available in the DB2 Universal Database System In short, the DB2 Universal Database is multi-platform, scalable, object-relational database server DB2 Control Center includes various design- and administration- related tools DB2 provides support for a rich set of SQL features for various aspects of database processing The storage and indexing architecture in DB2 consists of the file- system or disk-management layer, the services to manage the buffer pools, data objects, and concurrency and recovery managers Database System Concepts - 5th Edition, May 23, 2005 27.77 ©Silberschatz, Korth and Sudarshan Summary (2) DB2 provides multidimensional clustering (MDC) Materialized views are supported in DB2 DB2 UDB version 8.2 provides features for simplifying the design and manageability of databases DB2 provides a number of tools for ease of use and administration DB2 supports a comprehensive set of concurrency-control, isolation, and recovery techniques DB2 Data Warehouse Edition is an offering in the DB2 family that incorporates business intelligence features Database System Concepts - 5th Edition, May 23, 2005 27.78 ©Silberschatz, Korth and Sudarshan Bibliographical Notes (1) The origin of DB2 can be traced back to the System R Project (Chamberlin et al.[1981]) IBM Research contributions include areas such as transaction processing (write-ahead logging and ARIES recovery algorithms) (Mohan et al.[1992]), query processing optimization (Starbust) (Haas et al.[1990]), parallel processing (DB2 Parallel Edition) (Baru et al.[1995]), active database support (constraints, triggers) (Cochrane et al.[1996]), advanced query and warehousing techniques such as materialized views (Zaharioudakis et al.[2000], Lehner et al.[2000]), multidimensional clustering (Padmanabahn et al.[2003], Bhattacharjee et al.[2003]), autonomic features (Zilio et al.[2004]) and object-relational support (ADTs, UDFs) (Carey et al.[1999]) Multiprocessor-query-processing details can be found in Baru et al.[1995] or in the DB2 Administration and Performance guides DB2 Online documentation Database System Concepts - 5th Edition, May 23, 2005 27.79 ©Silberschatz, Korth and Sudarshan Bibliographical Notes (2) Don Chamberlin’s books provide a good review of the SQL and programming features (Chamberlin[1996] and Chamberlin[1998]) Earlier books by C. J. Date and others provide a good review of the features of DB2 Universal Database for OS/390 (Date[1989] and Martin et al.[1989]) Recent book such as DB2 for Dummies (Zikopoulos et al.[2000]), DB2 SQL developer’s guide (Sanders[2000]), and the DB2 Administration Certification Guides (Cook et al.[1999]) provide hand-on training for using and administering DB2 Chamberlin[1998], Zikopoulos et al.[2004] and the DB2 documentation library provide a complete description of the SQL support Database System Concepts - 5th Edition, May 23, 2005 27.80 ©Silberschatz, Korth and Sudarshan Chapter 28: IBM DB2 Universal Database 28.1 Overview 28.2 Database-Design Tools 28.3 SQL Variations and Extensions 28.4 Storage and Indexing 28.5 Multidimensional Clustering 28.6 Query Processing and Optimization 28.7 Materialized Query Tables 28.8 Autonomic Features in DB2 28.9 Tools and Utilities 28.10 Concurrency Control and Recovery 28.11 System Architecture 28.12 Replication, Distribution, and External Data 28.13 Business Intelligence Features Summary & Bibliographical Notes Database System Concepts - 5th Edition, May 23, 2005 27.81 ©Silberschatz, Korth and Sudarshan End of Chapter Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan