CHAPTER 16: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING
Answers to Selected Exercises

16.34. Consider a disk with the following characteristics (these are not parameters of any particular disk unit): block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track = 20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.
a. What is the total capacity of a track, and what is its useful capacity (excluding interblock gaps)?
b. How many cylinders are there?
c. What are the total capacity and the useful capacity of a cylinder?
d. What are the total capacity and the useful capacity of a disk pack?
e. Suppose that the disk drive rotates the disk pack at a speed of 2,400 rpm (revolutions per minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec? What is the average rotational delay (rd) in msec? What is the bulk transfer rate (btr)?
f. Suppose that the average seek time is s = 30 msec. How much time does it take (on the average) in msec to locate and transfer a single block, given its block address?
g. Calculate the average time it would take to transfer 20 random blocks, and compare this with the time it would take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.
Answer:
(a) Total track size = 20 * (512 + 128) = 12800 bytes = 12.8 Kbytes
Useful capacity of a track = 20 * 512 = 10240 bytes = 10.24 Kbytes
(b) Number of cylinders = number of tracks per surface = 400
(c) Total cylinder capacity = 15 * 2 * 20 * (512 + 128) = 384000 bytes
Useful cylinder capacity = 15 * 2 * 20 * 512 = 307200 bytes
(d) Total capacity of a disk pack = 15 * 2 * 400 * 20 * (512 + 128) = 153600000 bytes = 153.6 Mbytes
Useful capacity of a disk pack = 15 * 2 * 400 * 20 * 512 = 122880000 bytes = 122.88 Mbytes
(e) Transfer rate tr = (total track size in bytes) / (time for one disk revolution in msec)
tr = 12800 / ((60 * 1000) / 2400) = 12800 / 25 = 512 bytes/msec
Block transfer time btt = B / tr = 512 / 512 = 1 msec
Average rotational delay rd = (time for one disk revolution in msec) / 2 = 25 / 2 = 12.5 msec
Bulk transfer rate btr = tr * (B / (B + G)) = 512 * (512/640) = 409.6 bytes/msec
(f) Average time to locate and transfer a block = s + rd + btt = 30 + 12.5 + 1 = 43.5 msec
(g) Time to transfer 20 random blocks = 20 * (s + rd + btt) = 20 * 43.5 = 870 msec
Time to transfer 20 consecutive blocks using double buffering = s + rd + 20 * btt = 30 + 12.5 + (20 * 1) = 62.5 msec
(A more accurate estimate of the latter can be calculated using the bulk transfer rate as follows:
time to transfer 20 consecutive blocks using double buffering = s + rd + ((20 * B) / btr) = 30 + 12.5 + (10240/409.6) = 42.5 + 25 = 67.5 msec.)
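The calculations in parts (e) through (g) reuse the same few formulas, so they are easy to check mechanically. Below is a minimal Python sketch that reproduces them; the variable names (rev_time, random_20, sequential_20, and so on) are ours, not the book's.

# Disk parameters from Exercise 16.34; variable names are ours.
B, G = 512, 128                            # block size and interblock gap size (bytes)
blocks_per_track, rpm, s = 20, 2400, 30    # blocks per track, rotation speed, average seek time (msec)

rev_time = 60_000 / rpm                    # one revolution = 25 msec
tr = blocks_per_track * (B + G) / rev_time # transfer rate = 512 bytes/msec
btt = B / tr                               # block transfer time = 1 msec
rd = rev_time / 2                          # average rotational delay = 12.5 msec
btr = tr * B / (B + G)                     # bulk transfer rate = 409.6 bytes/msec

random_20 = 20 * (s + rd + btt)            # 20 random blocks: 870 msec
sequential_20 = s + rd + 20 * B / btr      # 20 consecutive blocks with double buffering: 67.5 msec
print(tr, btt, rd, btr, random_20, sequential_20)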
16.35. A file has r = 20,000 STUDENT records of fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), Phone (9 bytes), Birth_date (8 bytes), Sex (1 byte), Major_dept_code (4 bytes), Minor_dept_code (4 bytes), Class_code (4 bytes, integer), and Degree_program (3 bytes). An additional byte is used as a deletion marker. The file is stored on the disk whose parameters are given in Exercise 16.34.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored contiguously.
d. Assume that the file is ordered by Ssn; by doing a binary search, calculate the time it takes to search for a record given its Ssn value.
Answer:
(a) R = (30 + 9 + 40 + 9 + 8 + 1 + 4 + 4 + 4 + 3) + 1 = 113 bytes
(b) bfr = floor(B / R) = floor(512 / 113) = 4 records per block
b = ceiling(r / bfr) = ceiling(20000 / 4) = 5000 blocks
(c) For linear search we search on average half the file blocks = 5000/2 = 2500 blocks.
i. If the blocks are stored consecutively, and double buffering is used, the time to read 2500 consecutive blocks
= s + rd + (2500 * (B/btr)) = 30 + 12.5 + (2500 * (512/409.6)) = 3167.5 msec = 3.1675 sec
(A less accurate estimate is s + rd + (2500 * btt) = 30 + 12.5 + 2500 * 1 = 2542.5 msec.)
ii. If the blocks are scattered over the disk, a seek is needed for each block, so the time is:
2500 * (s + rd + btt) = 2500 * (30 + 12.5 + 1) = 108750 msec
(d) For binary search, the time to search for a record is estimated as:
ceiling(log2 b) * (s + rd + btt) = ceiling(log2 5000) * (30 + 12.5 + 1) = 13 * 43.5 = 565.5 msec

16.36. Suppose that only 80% of the STUDENT records from Exercise 16.35 have a value for Phone, 85% for Major_dept_code, 15% for Minor_dept_code, and 90% for Degree_program; and suppose that we use a variable-length record file. Each record has a 1-byte field type for each field in the record, plus the 1-byte deletion marker and a 1-byte end-of-record marker. Suppose that we use a spanned record organization, where each block has a 5-byte pointer to the next block (this space is not used for record storage).
a. Calculate the average record length R in bytes.
b. Calculate the number of blocks needed for the file.
Answer:
(a) Assuming that every field has a 1-byte field type, and that the fields not mentioned above (Name, Ssn, Address, Birth_date, Sex, Class_code) have values in every record, we need the following number of bytes for these fields in each record, plus 1 byte for the deletion marker and 1 byte for the end-of-record marker:
R_fixed = (30+1) + (9+1) + (40+1) + (8+1) + (1+1) + (4+1) + 1 + 1 = 100 bytes
For the optional fields (Phone, Major_dept_code, Minor_dept_code, Degree_program), the average number of bytes per record is:
R_variable = ((9+1)*0.8) + ((4+1)*0.85) + ((4+1)*0.15) + ((3+1)*0.9) = 8 + 4.25 + 0.75 + 3.6 = 16.6 bytes
The average record size R = R_fixed + R_variable = 100 + 16.6 = 116.6 bytes
The total bytes needed for the whole file = r * R = 20000 * 116.6 = 2332000 bytes
(b) Using a spanned record organization with a 5-byte pointer at the end of each block, the bytes available in each block are (B - 5) = (512 - 5) = 507 bytes.
The number of blocks needed for the file is:
b = ceiling((r * R) / (B - 5)) = ceiling(2332000 / 507) = 4600 blocks
(Compare this with the 5000 blocks needed for fixed-length, unspanned records in Exercise 16.35(b).)
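The averaging over the optional fields in 16.36 is easy to slip on by hand, so a short check helps. The Python sketch below (the dictionary layout and variable names are ours) recomputes the average record length and the block count under the spanned organization.

import math

# Variable-length STUDENT file of Exercise 16.36; field sizes in bytes, presence as fractions.
B, r, block_pointer = 512, 20_000, 5
always_present = {"Name": 30, "Ssn": 9, "Address": 40, "Birth_date": 8,
                  "Sex": 1, "Class_code": 4}
optional = {"Phone": (9, 0.80), "Major_dept_code": (4, 0.85),
            "Minor_dept_code": (4, 0.15), "Degree_program": (3, 0.90)}

# A 1-byte field type per stored field, plus deletion and end-of-record markers.
R_fixed = sum(size + 1 for size in always_present.values()) + 1 + 1       # 100 bytes
R_variable = sum((size + 1) * p for size, p in optional.values())         # 16.6 bytes
R = R_fixed + R_variable                                                  # 116.6 bytes

blocks = math.ceil(r * R / (B - block_pointer))   # spanned organization: 4600 blocks
print(R_fixed, R_variable, R, blocks)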
16.37. Suppose that a disk unit has the following parameters: seek time s = 20 msec; rotational delay rd = 10 msec; block transfer time btt = 1 msec; block size B = 2400 bytes; interblock gap size G = 600 bytes. An EMPLOYEE file has the following fields: Ssn, 9 bytes; Last_name, 20 bytes; First_name, 20 bytes; Middle_init, 1 byte; Birth_date, 10 bytes; Address, 35 bytes; Phone, 12 bytes; Supervisor_ssn, 9 bytes; Department, 4 bytes; Job_code, 4 bytes; deletion marker, 1 byte. The EMPLOYEE file has r = 30,000 records, fixed-length format, and unspanned blocking. Write appropriate formulas and calculate the following values for the above EMPLOYEE file:
a. Calculate the record size R (including the deletion marker), the blocking factor bfr, and the number of disk blocks b.
b. Calculate the wasted space in each disk block because of the unspanned organization.
c. Calculate the transfer rate tr and the bulk transfer rate btr for this disk unit (see Appendix B for definitions of tr and btr).
d. Calculate the average number of block accesses needed to search for an arbitrary record in the file, using linear search.
e. Calculate in msec the average time needed to search for an arbitrary record in the file, using linear search, if the file blocks are stored on consecutive disk blocks and double buffering is used.
f. Calculate in msec the average time needed to search for an arbitrary record in the file, using linear search, if the file blocks are not stored on consecutive disk blocks.
g. Assume that the records are ordered via some key field. Calculate the average number of block accesses and the average time needed to search for an arbitrary record in the file, using binary search.
Answer:
(a) R = (9 + 20 + 20 + 1 + 10 + 35 + 12 + 9 + 4 + 4) + 1 = 125 bytes
bfr = floor(B / R) = floor(2400 / 125) = 19 records per block
b = ceiling(r / bfr) = ceiling(30000 / 19) = 1579 blocks
(b) Wasted space per block = B - (R * bfr) = 2400 - (125 * 19) = 25 bytes
(c) Transfer rate tr = B / btt = 2400 / 1 = 2400 bytes/msec
Bulk transfer rate btr = tr * (B / (B + G)) = 2400 * (2400 / (2400 + 600)) = 1920 bytes/msec
(d) For linear search we have the following cases:
i. search on a key field: if the record is found, half the file blocks are searched on average: b/2 = 1579/2 blocks;
if the record is not found, all file blocks are searched: b = 1579 blocks
ii. search on a non-key field: all file blocks must be searched: b = 1579 blocks
(e) If the blocks are stored consecutively, and double buffering is used, the time to read n consecutive blocks = s + rd + (n * (B/btr))
i. if n = b/2: time = 20 + 10 + ((1579/2) * (2400/1920)) = 1016.9 msec = 1.017 sec
(A less accurate estimate is s + rd + (n * btt) = 20 + 10 + (1579/2) * 1 = 819.5 msec.)
ii. if n = b: time = 20 + 10 + (1579 * (2400/1920)) = 2003.75 msec = 2.004 sec
(A less accurate estimate is s + rd + (n * btt) = 20 + 10 + 1579 * 1 = 1609 msec.)
(f) If the blocks are scattered over the disk, a seek is needed for each block, so the time to search n blocks is n * (s + rd + btt):
i. if n = b/2: time = (1579/2) * (20 + 10 + 1) = 24474.5 msec = 24.475 sec
ii. if n = b: time = 1579 * (20 + 10 + 1) = 48949 msec = 48.949 sec
(g) For binary search, the time to search for a record is estimated as:
ceiling(log2 b) * (s + rd + btt) = ceiling(log2 1579) * (20 + 10 + 1) = 11 * 31 = 341 msec = 0.341 sec

16.38. A PARTS file with Part# as the hash key includes records with the following Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, and 9208. The file uses eight buckets, numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in the given order, using the hash function h(K) = K mod 8. Calculate the average number of block accesses for a random retrieval on Part#.
Answer:
The records will hash to the following buckets:
K     h(K) (bucket number)
2369  1
3760  0
4692  4
4871  7
5659  3
1821  5
1074  2
7115  3
1620  4
2428  4 (overflow)
3943  7
4750  6
6975  7 (overflow)
4981  5
9208  0
Two records out of 15 are in overflow, which will require an additional block access. The other records require only one block access. Hence, the average time to retrieve a random record is:
(1 * (13/15)) + (2 * (2/15)) = 17/15 = 1.133 block accesses
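Because the loading order determines which records overflow, 16.38 is also easy to verify by simulation. Below is a minimal Python sketch (the bucket representation and variable names are ours) that loads the records and recomputes the average number of block accesses.

# Static hash file of Exercise 16.38: 8 buckets, 2 records per bucket, h(K) = K mod 8.
parts = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
         1620, 2428, 3943, 4750, 6975, 4981, 9208]
BUCKETS, CAPACITY = 8, 2

buckets = {i: [] for i in range(BUCKETS)}
overflow = []                                   # records that did not fit in their home bucket
for k in parts:
    home = k % BUCKETS                          # h(K) = K mod 8
    if len(buckets[home]) < CAPACITY:
        buckets[home].append(k)
    else:
        overflow.append(k)

# One block access for a record in its home bucket, two if it sits in overflow.
avg_accesses = (len(parts) - len(overflow) + 2 * len(overflow)) / len(parts)
print(overflow, round(avg_accesses, 3))         # [2428, 6975] 1.133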
16.39. Load the records of Exercise 16.38 into expandable hash files based on extendible hashing. Show the structure of the directory at each step, and the global and local depths. Use the hash function h(K) = K mod 32.
Answer:
Hashing the records gives the following result (5-bit hash values shown in binary):
K     h(K) = K mod 32
2369  1  (00001)
3760  16 (10000)
4692  20 (10100)
4871  7  (00111)
5659  27 (11011)
1821  29 (11101)
1074  18 (10010)
7115  11 (01011)
1620  20 (10100)
2428  28 (11100)
3943  7  (00111)
4750  14 (01110)
6975  31 (11111)
4981  21 (10101)
9208  24 (11000)

16.40. Load the records of Exercise 16.38 into an expandable hash file, using linear hashing. Start with a single disk block, using the hash function h0 = K mod 2^0, and show how the file grows and how the hash functions change as the records are inserted. Assume that blocks are split whenever an overflow occurs, and show the value of n at each stage.
Answer:
Note: It is more common to specify a certain load factor for the file for triggering the splitting of buckets (rather than triggering a split whenever a new record being inserted is placed in overflow). The load factor lf could be defined as:
lf = r / (b * bfr)
where r is the current number of records, b is the current number of buckets, and bfr is the maximum number of records per bucket. Whenever lf gets to be larger than some threshold, say 0.8, a split is triggered. It is also possible to merge buckets in the reverse order in which they were created; a merge operation would be triggered whenever lf becomes less than another threshold, say 0.6.

16.49. Suppose that a file initially contains 120,000 records of 200 bytes each in an unsorted (heap) file. The block size B = 2,400 bytes, the average seek time s = 16 ms, the average rotational latency rd = 8.3 ms, and the block transfer time btt = 0.8 ms. Assume that 1 record is deleted for every 2 records added until the total number of active records is 240,000.
a. How many block transfers are needed to reorganize the file?
b. How long does it take to find a record right before reorganization?
c. How long does it take to find a record right after reorganization?
Answer:
Let X = number of records deleted; hence 2X = number of records added.
Total active records = 240,000 = 120,000 - X + 2X. Hence, X = 120,000.
Records before reorganization (i.e., before deleting any records physically) = 360,000.
(a) Number of block transfers for reorganization = blocks read + blocks written.
200 bytes/record and 2400 bytes/block gives us 12 records per block.
Reading involves 360,000 records, i.e., 360,000/12 = 30K blocks; writing involves 240,000 records, i.e., 240,000/12 = 20K blocks.
Total blocks transferred during reorganization = 30K + 20K = 50K blocks.
(b) Time to locate a record before reorganization: on average we assume that half the file will be read.
Hence, time = (b/2) * btt = 15000 * 0.8 ms = 12000 ms = 12 sec.
(c) Time to locate a record after reorganization = (b/2) * btt = 10000 * 0.8 = 8000 ms = 8 sec.

16.50. Suppose we have a sequential (ordered) file of 100,000 records where each record is 240 bytes. Assume that B = 2,400 bytes, s = 16 ms, rd = 8.3 ms, and btt = 0.8 ms. Suppose we want to make X independent random record reads from the file. We could make X random block reads or we could perform one exhaustive read of the entire file looking for those X records. The question is to decide when it would be more efficient to perform one exhaustive read of the entire file than to perform X individual random reads. That is, what is the value for X when an exhaustive read of the file is more efficient than random reads? Develop this as a function of X.
Answer:
Total blocks in file = (100,000 records * 240 bytes/record) / (2400 bytes/block) = 10,000 blocks.
Time for exhaustive read = s + rd + b * btt = 16 + 8.3 + (10000 * 0.8) = 8024.3 msec
Let X be the number of records searched randomly that takes more time than the exhaustive read time. Hence,
X * (s + rd + btt) > 8024.3
X * (16 + 8.3 + 0.8) > 8024.3
X > 8024.3 / 25.1
Thus, X > 319.69; i.e., if at least 320 random reads are to be made, it is better to search the file exhaustively.
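The break-even point in 16.50 comes straight from comparing the two cost formulas. Here is a small Python sketch of that comparison; the symbols follow the text (B, s, rd, btt), while the helper names are ours.

import math

# Exercise 16.50: X random block reads vs. one exhaustive sequential scan.
B, s, rd, btt = 2400, 16, 8.3, 0.8
num_records, record_size = 100_000, 240

b = num_records * record_size // B         # 10,000 blocks in the file
exhaustive = s + rd + b * btt              # 8024.3 msec for one full scan
per_random_read = s + rd + btt             # 25.1 msec per random block read

break_even = math.ceil(exhaustive / per_random_read)
print(b, exhaustive, break_even)           # 10000 8024.3 320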
16.51. Suppose that a static hash file initially has 600 buckets in the primary area and that records are inserted that create an overflow area of 600 buckets. If we reorganize the hash file, we can assume that most of the overflow is eliminated. If the cost of reorganizing the file is the cost of the bucket transfers (reading and writing all of the buckets) and the only periodic file operation is the fetch operation, then how many times would we have to perform a fetch (successfully) to make the reorganization cost effective? That is, the reorganization cost and subsequent search cost are less than the search cost before reorganization. Support your answer. Assume s = 16 msec, rd = 8.3 msec, and btt = 1 msec.
Answer:
Primary area = 600 buckets; overflow area = 600 buckets.
Total reorganization cost = buckets read + buckets written = (600 + 600) + 1200 = 2400 bucket transfers = 2400 * (1 ms) = 2400 ms
Let X = number of random fetches from the file.
Average search time per fetch = time to access (1 + 1/2) buckets, since 50% of the time we need to access an overflow bucket.
Access time for one bucket access = (s + rd + btt) = 16 + 8.3 + 1 = 25.3 ms
Time with reorganization for the X fetches = 2400 + X * (25.3) ms
Time without reorganization for X fetches = X * (25.3) * (1 + 1/2) ms = 1.5 * X * (25.3) ms
Hence, 2400 + X * (25.3) < 1.5 * X * (25.3)
2400 / 12.65 < X
Hence, 189.7 < X
If we make at least 190 fetches, the reorganization is worthwhile.

16.52. Suppose we want to create a linear hash file with a file load factor of 0.7 and a blocking factor of 20 records per bucket, which is to contain 112,000 records initially.
a. How many buckets should we allocate in the primary area?
b. What should be the number of bits used for bucket addresses?
Answer:
(a) Number of buckets in the primary area = 112000 / (20 * 0.7) = 8000.
(b) K = the number of bits used for bucket addresses:
2^K <= 8000 <= 2^(K+1)
2^12 = 4096, 2^13 = 8192, so K = 12
Boundary value = 8000 - 2^12 = 8000 - 4096 = 3904
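The sizing in 16.52 can be reproduced in a few lines. Below is a minimal Python sketch (variable names are ours), assuming the convention used above that the number of address bits K satisfies 2^K <= number of buckets < 2^(K+1).

import math

# Linear hash file of Exercise 16.52.
num_records, bfr, load_factor = 112_000, 20, 0.7

primary_buckets = round(num_records / (bfr * load_factor))   # 8000 buckets
k = int(math.log2(primary_buckets))                          # 12 address bits (2^12 <= 8000 < 2^13)
boundary_value = primary_buckets - 2**k                      # 8000 - 4096 = 3904
print(primary_buckets, k, boundary_value)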
CHAPTER 9: Storing Data: Disks and Files

Exercise 9.5 Consider a disk with a sector size of 512 bytes, 2000 tracks per surface, 50 sectors per track, five double-sided platters, and average seek time of 10 msec.
1. What is the capacity of a track in bytes? What is the capacity of each surface? What is the capacity of the disk?
2. How many cylinders does the disk have?
3. Give examples of valid block sizes. Is 256 bytes a valid block size? 2048? 51,200?
4. If the disk platters rotate at 5400 rpm (revolutions per minute), what is the maximum rotational delay?
5. If one track of data can be transferred per revolution, what is the transfer rate?
Answer:
1. bytes/track = bytes/sector × sectors/track = 512 × 50 = 25K
bytes/surface = bytes/track × tracks/surface = 25K × 2000 = 50,000K
bytes/disk = bytes/surface × surfaces/disk = 50,000K × 5 × 2 = 500,000K
2. The number of cylinders is the same as the number of tracks on each platter, which is 2000.
3. The block size should be a multiple of the sector size. We can see that 256 is not a valid block size while 2048 is. 51,200 is not a valid block size in this case because a block cannot exceed the size of a track, which is 25,600 bytes.
4. If the disk platters rotate at 5400 rpm, the time required for one complete rotation, which is the maximum rotational delay, is 1/5400 × 60 = 0.011 seconds. The average rotational delay is half of the rotation time, 0.006 seconds.
5. The capacity of a track is 25K bytes. Since one track of data can be transferred per revolution, the data transfer rate is 25K/0.011 = 2,250 Kbytes/second.

Exercise 9.6 Consider again the disk specifications from Exercise 9.5 and suppose that a block size of 1024 bytes is chosen. Suppose that a file containing 100,000 records of 100 bytes each is to be stored on such a disk and that no record is allowed to span two blocks.
1. How many records fit onto a block?
2. How many blocks are required to store the entire file? If the file is arranged sequentially on disk, how many surfaces are needed?
3. How many records of 100 bytes each can be stored using this disk?
4. If pages are stored sequentially on disk, with page 1 on block 1 of track 1, what page is stored on block 1 of track 1 on the next disk surface? How would your answer change if the disk were capable of reading and writing from all heads in parallel?
5. What time is required to read a file containing 100,000 records of 100 bytes each sequentially? Again, how would your answer change if the disk were capable of reading/writing from all heads in parallel (and the data was arranged optimally)?
6. What is the time required to read a file containing 100,000 records of 100 bytes each in a random order? To read a record, the block containing the record has to be fetched from disk. Assume that each block request incurs the average seek time and rotational delay.
Answer:
1. floor(1024/100) = 10, so we can have at most 10 records in a block.
2. There are 100,000 records altogether, and each block holds 10 records; thus, we need 10,000 blocks to store the file. One track has 25 blocks and one cylinder has 250 blocks, so the file occupies 10,000/250 = 40 cylinders. Since a sequentially arranged file fills whole cylinders, all 10 surfaces are used to store the file.
3. The capacity of the disk is 500,000K, which is 500,000 blocks. Each block holds 10 records; therefore, the disk can store no more than 5,000,000 records.
4. There are 25K bytes, that is, 25 blocks, in each track, so page 26 is stored on block 1 of track 1 on the next disk surface. If the disk were capable of reading/writing from all heads in parallel, we could put the first 10 pages on block 1 of track 1 of all 10 surfaces; in that case, page 2 would be stored on block 1 of track 1 on the next disk surface.
5. A file containing 100,000 records of 100 bytes needs 40 cylinders, or 400 tracks, on this disk. The transfer time of one track of data is 0.011 seconds, so it takes 400 × 0.011 = 4.4 seconds to transfer 400 tracks. This access seeks the track 40 times, and the seek time is 40 × 0.01 = 0.4 seconds. Therefore, the total access time is 4.4 + 0.4 = 4.8 seconds. If the disk were capable of reading/writing from all heads in parallel, it can read 10 tracks at a time; the transfer time is 10 times less, which is 0.44 seconds, so the total access time is 0.44 + 0.4 = 0.84 seconds.
6. For any block of data, average access time = seek time + rotational delay + transfer time.
seek time = 10 msec
rotational delay = 6 msec
transfer time = 1K / 2,250K/sec = 0.44 msec
The average access time for a block of data is therefore about 16.44 msec. For a file containing 100,000 records of 100 bytes, reading one block per record in random order, the total access time would be about 16.44 msec × 100,000 ≈ 1,644 seconds.
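The disk-geometry arithmetic in Exercises 9.5 and 9.6 is equally mechanical. Below is a minimal Python sketch (variable names are ours; like the answer above, it rounds the average rotational delay to 6 msec) covering the capacities, the transfer rate, and the random-read estimate of 9.6(6).

# Disk parameters from Exercise 9.5.
sector_bytes, sectors_per_track, tracks_per_surface = 512, 50, 2000
surfaces, rpm, avg_seek_ms = 5 * 2, 5400, 10

track_bytes = sector_bytes * sectors_per_track        # 25,600 bytes = 25K per track
surface_bytes = track_bytes * tracks_per_surface      # 50,000K per surface
disk_bytes = surface_bytes * surfaces                 # 500,000K for the whole disk
rotation_ms = 60_000 / rpm                            # ~11.1 ms = maximum rotational delay
transfer_rate = track_bytes / rotation_ms             # ~2304 bytes/ms, i.e. ~2,250 Kbytes/second

# Exercise 9.6(6): 100,000 random reads of one 1024-byte block each.
block_bytes = 1024
block_transfer_ms = block_bytes / transfer_rate       # ~0.44 ms
avg_access_ms = avg_seek_ms + 6 + block_transfer_ms   # seek + rotational delay (rounded) + transfer = ~16.44 ms
total_seconds = avg_access_ms * 100_000 / 1000        # ~1,644 seconds
print(track_bytes, surface_bytes, disk_bytes, round(avg_access_ms, 2), round(total_seconds))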
CHAPTER 17: INDEXING STRUCTURES FOR FILES
Answers to Selected Exercises

17.18. Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record pointer is P_R = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes), Address (40 bytes), Phone (9 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes), and Salary (4 bytes, real number). An additional byte is used as a deletion marker.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
c. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index on Ssn. Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve a record from the file, given its Ssn value, using the primary index.
d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with the primary index.
e. Suppose that the file is not ordered by the nonkey field Department_code and we want to construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra level of indirection that stores record pointers. Assume there are 1,000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (ii) the number of blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level index entries and the number of first-level index blocks; (iv) the number of levels needed if we make it into a multilevel index; (v) the total number of blocks required by the multilevel index and the blocks used in the extra level of indirection; and (vi) the approximate number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the index.
f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct a clustering index on Department_code that uses block anchors (every new value of Department_code starts at the beginning of a new block). Assume there are 1,000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the clustering index (assume that multiple blocks in a cluster are contiguous).
g. Suppose that the file is ordered by the key field Ssn and we want to construct a B+-tree access structure (index) on Ssn. Calculate (i) the orders p and p_leaf of the B+-tree; (ii) the number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block accesses needed to search for and retrieve a record from the file, given its Ssn value, using the B+-tree.
h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree and for the B+-tree.
Answer:
(a) Record length R = (30 + 9 + 9 + 40 + 9 + 8 + 1 + 4 + 4) + 1 = 115 bytes
(b) Blocking factor bfr = floor(B/R) = floor(512/115) = 4 records per block
Number of blocks needed for the file b = ceiling(r/bfr) = ceiling(30000/4) = 7500
(c) i. Index record size R_i = (V_SSN + P) = (9 + 6) = 15 bytes
Index blocking factor bfr_i = fo = floor(B/R_i) = floor(512/15) = 34
ii. Number of first-level index entries r1 = number of file blocks b = 7500 entries
Number of first-level index blocks b1 = ceiling(r1/bfr_i) = ceiling(7500/34) = 221 blocks
iii. We can calculate the number of levels as follows:
Number of second-level index entries r2 = number of first-level index blocks b1 = 221 entries
Number of second-level index blocks b2 = ceiling(r2/bfr_i) = ceiling(221/34) = 7 blocks
Number of third-level index entries r3 = number of second-level index blocks b2 = 7 entries
Number of third-level index blocks b3 = ceiling(r3/bfr_i) = ceiling(7/34) = 1
Since the third level has only one block, it is the top index level. Hence, the index has x = 3 levels.
iv. Total number of blocks for the index b_i = b1 + b2 + b3 = 221 + 7 + 1 = 229 blocks
v. Number of block accesses to search for a record = x + 1 = 3 + 1 = 4
(d) i. Index record size R_i = (V_SSN + P) = (9 + 6) = 15 bytes
Index blocking factor bfr_i = (fan-out) fo = floor(B/R_i) = floor(512/15) = 34 index records per block
(This has not changed from part (c) above.)
(Alternative solution: The previous solution assumes that leaf-level index blocks contain block pointers; it is also possible to assume that they contain record pointers, in which case the index record size would be V_SSN + P_R = 9 + 7 = 16 bytes. In this case, the calculations for leaf nodes in (i) would then have to use R_i = 16 bytes rather than R_i = 15 bytes, so we get:
Index record size R_i = (V_SSN + P_R) = (9 + 7) = 16 bytes
Leaf-level index blocking factor bfr_i = floor(B/R_i) = floor(512/16) = 32 index records per block
However, for internal nodes, block pointers are always used, so the fan-out for internal nodes fo would still be 34.)
ii. Number of first-level index entries r1 = number of file records r = 30000
Number of first-level index blocks b1 = ceiling(r1/bfr_i) = ceiling(30000/34) = 883 blocks
(Alternative solution: Number of first-level index entries r1 = number of file records r = 30000
Number of first-level index blocks b1 = ceiling(r1/bfr_i) = ceiling(30000/32) = 938 blocks)
iii. We can calculate the number of levels as follows:
Number of second-level index entries r2 = number of first-level index blocks b1 = 883 entries
Number of second-level index blocks b2 = ceiling(r2/bfr_i) = ceiling(883/34) = 26 blocks
Number of third-level index entries r3 = number of second-level index blocks b2 = 26 entries
Number of third-level index blocks b3 = ceiling(r3/bfr_i) = ceiling(26/34) = 1
Since the third level has only one block, it is the top index level.
Hence, the index has x = 3 levels (Alternative solution: Number of second-level index entries r 2 = number of first-level index blocks b 1 = 938 entries Number of second-level index blocks b 2 = ceiling(r 2 /bfr i ) = ceiling(938/34) = 28 blocks Number of third-level index entries r 3 = number of second-level index blocks b 2 = 28 entries Number of third-level index blocks b 3 = ceiling(r 3 /bfr i ) = ceiling(28/34) = 1 Since the third level has only one block, it is the top index level. Hence, the index has x = 3 levels) iv. Total number of blocks for the index b i = b 1 + b 2 + b 3 = 883 + 26 + 1 = 910 (Alternative solution: Total number of blocks for the index b i = b 1 + b 2 + b 3 = 938 + 28 + 1 = 987) v. Number of block accesses to search for a record = x + 1 = 3+1=4 (e) i. Index record size R i = (V DEPARTMENTCODE + P) = (9 + 6) = 15 bytes Index blocking factor bfr i = (fan-out) fo = floor(B/R i ) = floor(512/15) = 34 index records per block ii. There are 1000 distinct values of DEPARTMENTCODE, so the average number of records for each value is (r/1000) = (30000/1000) = 30 Since a record pointer size P R = 7 bytes, the number of bytes needed at the level of indirection for each value of DEPARTMENTCODE is 7 * 30 =210 bytes, which fits in one block. Hence, 1000 blocks are needed for the level of indirection. iii. Number of first-level index entries r 1 = number of distinct values of DEPARTMENTCODE = 1000 entries Number of first-level index blocks b 1 = ceiling(r 1 /bfr i ) = ceiling(1000/34) = 30 blocks iv. We can calculate the number of levels as follows: Number of second-level index entries r 2 = number of first-level index blocks b 1 = 30 entries Number of second-level index blocks b 2 = ceiling(r 2 /bfr i ) = ceiling(30/34) = 1 Hence, the index has x = 2 levels v. total number of blocks for the index b i = b 1 + b 2 + b indirection = 30 + 1 + 1000 = 1031 blocks vi. Number of block accesses to search for and retrieve the block containing the record pointers at the level of indirection = x + 1 = 2 + 1 = 3 block accesses If we assume that the 30 records are distributed over 30 distinct blocks, we need an additional 30 block accesses to retrieve all 30 records. Hence, total block accesses needed on average to retrieve all the records with a given value for DEPARTMENTCODE = x + 1 + 30 = 33 (f) i. Index record size R i = (V DEPARTMENTCODE + P) = (9 + 6) = 15 bytes Index blocking factor bfr i = (fan-out) fo = floor(B/R i ) = floor(512/15) = 34 index records per block ii. Number of first-level index entries r 1 = number of distinct DEPARTMENTCODE values= 1000 entries Number of first-level index blocks b 1 = ceiling(r 1 /bfr i ) = ceiling(1000/34) = 30 blocks iii. We can calculate the number of levels as follows: Number of second-level index entries r 2 = number of first-level index blocks b 1 = 30 entries Number of second-level index blocks b 2 = ceiling(r 2 /bfr i ) = ceiling(30/34) = 1 Since the second level has one block, it is the top index level. Hence, the index has x = 2 levels iv. Total number of blocks for the index b i = b 1 + b 2 = 30 + 1 = 31 blocks v. Number of block accesses to search for the first block in the cluster of blocks = x + 1 = 2 + 1 = 3 The 30 records are clustered in ceiling(30/bfr) = ceiling(30/4) = 8 blocks. Hence, total block accesses needed on average to retrieve all the records with a given DEPARTMENTCODE = x + 8 = 2 + 8 = 10 block accesses (g) i. 
For a B + -tree of order p, the following inequality must be satisfied for each internal tree node: (p * P) + ((p - 1) * V SSN ) < B, or (p * 6) + ((p - 1) * 9) < 512, which gives 15p < 521, so p=34 For leaf nodes, assuming that record pointers are included in the leaf nodes, the following inequality must be satisfied: (p leaf * (V SSN +P R )) + P < B, or (p leaf * (9+7)) + 6 < 512, which gives 16p leaf < 506, so p leaf =31 ii. Assuming that nodes are 69% full on the average, the average number of key values in a leaf node is 0.69*p leaf = 0.69*31 = 21.39. If we round this up for convenience, we get 22 key values (and 22 record pointers) per leaf node. Since the file has 30000 records and hence 30000 values of SSN, the number of leaf-level nodes (blocks) needed is b 1 = ceiling(30000/22) = 1364 blocks iii. We can calculate the number of levels as follows: The average fan-out for the internal nodes (rounded up for convenience) is fo = ceiling(0.69*p) = ceiling(0.69*34) = ceiling(23.46) = 24 number of second-level tree blocks b 2 = ceiling(b 1 /fo) = ceiling(1364/24) = 57 blocks number of third-level tree blocks b 3 = ceiling(b 2 /fo) = ceiling(57/24)= 3 number of fourth-level tree blocks b 4 = ceiling(b 3 /fo) = ceiling(3/24) = 1 Since the fourth level has only one block, the tree has x = 4 levels (counting the leaf level). Note: We could use the formula: x = ceiling(log fo (b 1 )) + 1 = ceiling(log 24 1364) + 1 = 3 + 1 = 4 levels iv. total number of blocks for the tree b i = b 1 + b 2 + b 3 + b 4 = 1364 + 57 + 3 + 1 = 1425 blocks v. number of block accesses to search for a record = x + 1 = 4+1=5 17.19. A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order = 4 and leaf = 3; show how the tree will expand and what the final tree will look like. Answer: A B + -tree of order p=4 implies that each internal node in the tree (except possibly the root) should have at least 2 keys (3 pointers) and at most 4 pointers. For p leaf =3, leaf nodes must have at least 2 keys and at most 3 keys. The figure on page 50 shows how the tree progresses as the keys are inserted. We will only show a new tree when insertion causes a split of one of the leaf nodes, and then show how the split propagates up the tree. Hence, step 1 below shows the tree after insertion of the first 3 keys 23, 65, and 37, and before inserting 60 which causes overflow and splitting. The trees given below show how the keys are inserted in order. Below, we give the keys inserted for each tree: 1 :23, 65, 37; 2:60; 3:46; 4:92; 5:48, 71; 6:56; 7:59, 18; 8:21; 9:10; 10:7 4 ; 11:78; 12:15; 13:16; 14:20; 15:24; 16:28, 39; 17:43, 47; 18:50, 69; 19:7 5 ; 20:8, 49, 33, 38; 17.20. Repeat Exercise 17.19, but use a B-tree of order = 4 instead of a B+-tree. 17.21. Suppose that the following search field values are deleted, in the given order, from the B+-tree of Exercise 17.19; show how the tree will shrink and show the final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37. 
Answer: An important note about a deletion algorithm for a B + -tree is that deletion of a key value from a leaf node will result in a reorganization of the tree if: (i) The leaf node is less than half full; in this case, we will combine it with the next leaf node (other algorithms combine it with either the next or the previous leaf nodes, or both), (ii) If the key value deleted is the rightmost (last) value in the leaf node, in which case its value will appear in an internal node; in this case, the key value to the left of the deleted key in the left node replaces the deleted key value in the internal node. Following is what happens to the tree number 19 after the specified deletions (not tree number 20): Deleting 65 will only affect the leaf node. Deleting 75 will cause a leaf node to be less than half full, so it is combined with the next node; also, 75 is removed from the internal node leading to the following tree: Deleting 43 causes a leaf node to be less than half full, and it is combined with the next node. Since the next node has 3 entries, its rightmost (first) entry 46 can replace 43 in both the leaf and internal nodes, leading to the following tree: Next, we delete 18, which is a rightmost entry in a leaf node and hence appears in an internal node of the B + -tree. The leaf node is now less than half full, and is combined with the next node. The value 18 must also be removed from the internal node, causing underflow in the internal node. One approach for dealing with underflow in internal nodes is to reorganize the values of the underflow node with its child nodes, so 21 is moved up into the underflow node leading to the following tree: Deleting 20 and 92 will not cause underflow. Deleting 59 causes underflow, and the remaining value 60 is combined with the next leaf node. Hence, 60 is no longer a rightmost entry in a leaf node and must be removed from the internal node. This is normally done by moving 56 up to replace 60 in the internal node, but since this leads to underflow in the node that used to contain 56, the nodes can be reorganized as follows: Finally, removing 37 causes serious underflow, leding to a reorganization of the whole tree. One approach to deleting the value in the root node is to use the rightmost value in the next leaf node (the first leaf node in the right subtree) to replace the root, and move this leaf node to the left subtree. In this case, the resulting tree may look as follows: 17.22. Repeat Exercise 17.21, but for the B-tree of Exercise 17.20. B+-tree insertion with left redistribution. CHAPTER 10: TREE-STRUCTURED INDEXING Exercise 10.1 Consider the B+ tree index of order d = 2 shown in Figure 10.1. 1. Show the tree that would result from inserting a data entry with key 9 into this tree. 2. Show the B+ tree that would result from inserting a data entry with key 3 into the original tree. How many page reads and page writes does the insertion require? 3. Show the B+ tree that would result from deleting the data entry with key 8 from the original tree, assuming that the left sibling is checked for possible redistribution. 4. Show the B+ tree that would result from deleting the data entry with key 8 from the original tree, assuming that the right sibling is checked for possible redistribution. 5. Show the B+ tree that would result from starting with the original tree, inserting a data entry with key 46 and then deleting the data entry with key 52. 6. Show the B+ tree that would result from deleting the data entry with key 91 from the original tree. 7. 
Show the B+ tree that would result from starting with the original tree, inserting a data entry with key 59, and then deleting the data entry with key 91. 8. Show the B+ tree that would result from successively deleting the data entries with keys 32, 39, 41, 45, and 73 from the original tree. Answer 10.1 1. The data entry with key 9 is inserted on the second leaf page. The resulting tree is shown in figure 10.2. 2. The data entry with key 3 goes on the first leaf page F. Since F can accommodate at most four data entries (d = 2), F splits. The lowest data entry of the new leaf is given up to the ancestor which also splits. The result can be seen in figure 10.3. The insertion will require 5 page writes, 4 page reads and allocation of 2 new pages. 3. The data entry with key 8 is deleted, resulting in a leaf page N with less than two data entries. The left sibling L is checked for redistribution. Since L has more than two data entries, the remaining keys are redistributed between L and N, resulting in the tree in figure 10.4. 4. As is part 3, the data entry with key 8 is deleted from the leaf page N. N’s right sibling R is checked for redistribution, but R has the minimum number of keys. Therefore the two siblings merge. The key in the ancestor which distinguished between the newly merged leaves is deleted. The resulting tree is shown in figure 10.5. 5. The data entry with key 46 can be inserted without any structural changes in the tree. But the removal of the data entry with key 52 causes its leaf page L to merge with a sibling (we chose the right sibling). This results in the removal of a key in the ancestor A of L and thereby lowering the number of keys on A below the minimum number of keys. Since the left sibling B of A has more than the minimum number of keys, redistribution between A and B takes place. The final tree is depicted in figure 10.6. 6. Deleting the data entry with key 91 causes a scenario similar to part 5. The result can be seen in figure 10.7. 7. The data entry with key 59 can be inserted without any structural changes in the tree. No sibling of the leaf page with the data entry with key 91 is affected by the insert. Therefore deleting the data entry with key 91 changes the tree in a way very similar to part 6. The result is depicted in figure 10.8. 8. Considering checking the right sibling for possible merging first, the successive deletion of the data entries with keys 32, 39, 41, 45 and 73 results in the tree shown in figure 10.9. Exercise 10.3 Answer the following questions: 1. What is the minimum space utilization for a B+ tree index? Answer 10.3 1. By the definition of a B+ tree, each index page, except for the root, has at least d and at most 2d key entries. Therefore—with the exception of the root—the minimum space utilization guaranteed by a B+ tree index is 50 percent. Exercise 10.4 Suppose that a page can contain at most four data values and that all data values are integers. Using only B+ trees of order 2, give examples of each of the following: 1. A B+ tree whose height changes from 2 to 3 when the value 25 is inserted. Show your structure before and after the insertion. 2. A B+ tree in which the deletion of the value 25 leads to a redistribution. Show your structure before and after the deletion. 3. A B+ tree in which the deletion of the value 25 causes a merge of two nodes but without altering the height of the tree. 4. An ISAM structure with four buckets, none of which has an overflow page. Further, every bucket has space for exactly one more entry. 
Show your structure before and after inserting two additional values, chosen so that an overflow page is created.
Answer 10.4 For these answers, two illustrations are given, one showing the tree before the specified change and one showing it after.
1. See Figures 10.13 and 10.14.
2. See Figures 10.15 and 10.16.
3. See Figures 10.17 and 10.18.
4. See Figures 10.19 and 10.20 (inserted 27 and 29).
Exercise 10.5 Consider the B+ tree shown in Figure 10.21.
1. Identify a list of five data entries such that:
(a) Inserting the entries in the order shown and then deleting them in the opposite order (e.g., insert a, insert b, delete b, delete a) results in the original tree.
(b) Inserting the entries in the order shown and then deleting them in the opposite order (e.g., insert a, insert b, delete b, delete a) results in a different tree.
2. What is the minimum number of insertions of data entries with distinct keys that will cause the height of the (original) tree to change from its current value (of 1) to 3?
3. Would the minimum number of insertions that will cause the original tree to increase to height 3 change if you were allowed to insert duplicates (multiple data entries with the same key), assuming that overflow pages are not used for handling duplicates?
Answer 10.5 The answer to each question is given below.
1. The answer to each part is given below.
(a) One example is the set of five data entries with keys 17, 18, 13, 15, and 25. Inserting 17 and 18 will cause the tree to split and gain a level. Inserting 13, 15, and 25 does not change the tree structure any further, so deleting them in reverse order causes no structural change. When 18 is deleted, redistribution will be possible from an adjacent node, since one node will contain only the value 17 and its right neighbor will contain 19, 20, and 22. Finally, when 17 is deleted, no redistribution will be possible, so the tree will lose a level and will return to the original tree.
(b) Inserting and deleting the set 13, 15, 18, 25, and 4 will cause a change in the tree structure. When 4 is inserted, the rightmost leaf will split, causing the tree to gain a level. When it is deleted, the tree will not shrink in size. Since inserts 13, 15, 18, and 25 did not affect the rightmost node, their deletion will not change the altered structure either.
2. Let us call the current tree depicted in Figure 10.21 T. T has 16 data entries. The smallest tree S of height 3 which is created exclusively through inserts has (1 * 2 * 3 * 3) * 2 + 1 = 37 data entries in its leaf pages: S has 18 leaf pages, 17 with two data entries each and one with three data entries. T already has four leaf pages which have more than two data entries; they can be filled and made to split, but after each split, one of the two pages will still have three data entries remaining. Therefore the smallest tree of height 3 which can possibly be created from T only through inserts has (1 * 2 * 3 * 3) * 2 + 4 = 40 data entries. Therefore the minimum number of insertions that will cause the height of T to change to 3 is 40 - 16 = 24.
3. The argument in part 2 does not assume anything about the data entries to be inserted; it is valid if duplicates can be inserted as well. Therefore the solution does not change.
CHAPTER 11: HASH-BASED INDEXING
Exercise 11.1 Consider the Extendible Hashing index shown in Figure 11.1. Answer the following questions about this index:
1. What can you say about the last entry that was inserted into the index?
2.
What can you say about the last entry that was inserted into the index if you know that there have been no deletions from this index so far? 3. Suppose you are told that there have been no deletions from this index so far. What can you say about the last entry whose insertion into the index caused a split? 4. Show the index after inserting an entry with hash value 68. 5. Show the index after inserting entries with hash values 17 and 69 into the original tree. 6. Show the index after deleting the entry with hash value 21 into the original tree. (Assume that the full deletion algorithm is used.) 7. Show the index after deleting the entry with hash value 10 into the original tree. Is a merge triggered by this deletion? If not, explain why. (Assume that the full deletion algorithm is used.) Answer 11.1 The answer to each question is given below. 1. It could be any one of the data entries in the index. We can always find a sequence of insertions and deletions with a particular key value, among the key values shown in the index as the last insertion. For example, consider the data entry 16 and the following sequence: 1 5 21 10 15 7 51 4 12 36 64 8 24 56 16 56D 24D 8D The last insertion is the data entry 16 and it also causes a split. But the sequence of deletions following this insertion cause a merge leading to the index structure shown in Fig 11.1. 2. The last insertion could not have caused a split because the total number of data entries in the buckets A and A2 is 6. If the last entry caused a split the total would have been 5. 3. The last insertion which caused a split cannot be in bucket C. Buckets B and C or C and D could have made a possible bucket-split image combination but the total number of data entries in these combinations is 4 and the absence of deletions demands a sum of at least 5 data entries for such combinations. Buckets B and D can form a possible bucket-split image combination because they have a total of 6 data entries between themselves. So do A and A2. But for the B and D to be split images the starting global depth should have been 1. If the starting global depth is 2, then the last insertion causing a split would be in A or A2. 4. See Fig 11.2. 5. See Fig 11.3. 6. See Fig 11.4. 7. The deletion of the data entry 10 which is the only data entry in bucket C doesn’t trigger a merge because bucket C is a primary page and it is left as a place holder. Right now, directory element 010 and its split image 110 already point to the same bucket C. We can’t do a further merge. See Fig 11.5. Exercise 11.2 Consider the Linear Hashing index shown in Figure 11.6. Assume that we split whenever an overflow page is created. Answer the following questions about this index: 1. What can you say about the last entry that was inserted into the index? 2. What can you say about the last entry that was inserted into the index if you know that there have been no deletions from this index so far? 3. Suppose you know that there have been no deletions from this index so far. What can you say about the last entry whose insertion into the index caused a split? 4. Show the index after inserting an entry with hash value 4. 5. Show the index after inserting an entry with hash value 15 into the original tree. 6. Show the index after deleting the entries with hash values 36 and 44 into the original tree. (Assume that the full deletion algorithm is used.) 7. Find a list of entries whose insertion into the original index would lead to a bucket with two overflow pages. 
Use as few entries as possible to accomplish this. What is the maximum number of entries that can be inserted into this bucket before a split occurs that reduces the length of this overflow chain?
Answer 11.2 The answer to each question is given below.
1. Nothing can be said about the last entry inserted into the index: it can be any of the data entries in the index.
2. If the last item that was inserted had a hash code with h0(key value) = 00, then it caused a split; otherwise, any value could have been inserted.
3. The last data entry which caused a split satisfies the condition h0(key value) = 00, as there are no overflow pages for any of the other buckets.
4. See Fig 11.7.
5. See Fig 11.8.
6. See Fig 11.9.
7. The following constitutes the minimum list of entries to cause two overflow pages in the index: 63, 127, 255, 511, 1023. The first insertion causes a split and causes an update of Next to 2. The insertion of 1023 causes a subsequent split, and Next is updated to 3, which points to this bucket. This overflow chain will not be redistributed until three more insertions (a total of 8 entries) are made. In principle, if we choose data entries with key values of the form 2^k + 3 with sufficiently large k, we can take the maximum number of entries that can be inserted to reduce the length of the overflow chain to be greater than any arbitrary number. This is so because the initial index has 31 (binary 11111), 35 (binary 100011), 7 (binary 111), and 11 (binary 1011). So by an appropriate choice of data entries as mentioned above we can make a split of this bucket cause just two values (7 and 31) to be redistributed to the new bucket. By choosing a sufficiently large k we can delay the reduction of the length of the overflow chain until any number of splits of this bucket.

CHAPTER 20: INTRODUCTION TO TRANSACTION PROCESSING CONCEPTS AND THEORY
Answers to Selected Exercises

20.22. Which of the following schedules is (conflict) serializable? For each serializable schedule, determine the equivalent serial schedules.
a. r1(X); r3(X); w1(X); r2(X); w3(X);
b. r1(X); r3(X); w3(X); w1(X); r2(X);
c. r3(X); r2(X); w3(X); r1(X); w1(X);
d. r3(X); r2(X); r1(X); w3(X); w1(X);
20.23. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given below. Draw the serializability (precedence) graphs for S1 and S2, and state whether each schedule is serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).
T1: r1(X); r1(Z); w1(X);
T2: r2(Z); r2(Y); w2(Z); w2(Y);
T3: r3(X); r3(Y); w3(Y);
S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y);
S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y);
20.24. Consider schedules S3, S4, and S5 below. Determine whether each schedule is strict, cascadeless, recoverable, or nonrecoverable. (Determine the strictest recoverability condition that each schedule satisfies.)
S3: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); c1; w3(Y); c3; r2(Y); w2(Z); w2(Y); c2;
S4: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y); c1; c2; c3;
S5: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); c1; w2(Z); w3(Y); w2(Y); c3; c2;
3. Given the following two transactions:
T1: r1(A); w1(A); r1(B); w1(B);
T2: r2(A); w2(A); r2(B); w2(B);
Prove that the schedule
S: r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
is conflict-serializable. (Hint: reorder the nonconflicting operations in S until you form the equivalent serial schedule.)
4. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given below.
Draw the serializability graph for S1 and S2, and state whether each schedule is conflict-serializable or not. If a schedule is conflict-serializable, write down the equivalent serial schedule. T1: r1(B); w1(B); T2: r2(A); w2(A); r2(B); w2(B); T3: r3(A);w3(A); S1: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B); S2: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B); CHAPTER 21: Concurrency Control Techniques 21.20. Prove that the basic two-phase locking protocol guarantees conflict serializability of schedules. ( : Show that if a serializability graph for a schedule has a cycle, then at least one of the transactions participating in the schedule does not obey the two-phase locking protocol.) 21.22. Prove that strict two-phase locking guarantees strict schedules. 21.23. Prove that the wait-die and wound-wait protocols avoid deadlock and starvation. 21.24. Prove that cautious waiting avoids deadlock. Exercise 17.2 Consider the following classes of schedules: serializable, conflict-serializable, view-serializable, recoverable, avoids-cascadingaborts, and strict. For each of the following schedules, state which of the preceding classes it belongs to. If you cannot decide whether a schedule belongs in a certain class based on the listed actions, explain briefly. The actions are listed in the order they are scheduled and prefixed with the transaction name. If a commit or abort is not shown, the schedule is incomplete; assume that abort or commit must follow all the listed actions. 1. T1:R(X), T2:R(X), T1:W(X), T2:W(X) 2. T1:W(X), T2:R(Y), T1:R(Y), T2:R(X) 3. T1:R(X), T2:R(Y), T3:W(X), T2:R(X), T1:R(Y) 4. T1:R(X), T1:R(Y), T1:W(X), T2:R(Y), T3:W(Y), T1:W(X), T2:R(Y) 5. T1:R(X), T2:W(X), T1:W(X), T2:Abort, T1:Commit 6. T1:R(X), T2:W(X), T1:W(X), T2:Commit, T1:Commit 7. T1:W(X), T2:R(X), T1:W(X), T2:Abort, T1:Commit 8. T1:W(X), T2:R(X), T1:W(X), T2:Commit, T1:Commit 9. T1:W(X), T2:R(X), T1:W(X), T2:Commit, T1:Abort 10. T2: R(X), T3:W(X), T3:Commit, T1:W(Y), T1:Commit, T2:R(Y), T2:W(Z), T2:Commit 11. T1:R(X), T2:W(X), T2:Commit, T1:W(X), T1:Commit, T3:R(X), T3:Commit 12. T1:R(X), T2:W(X), T1:W(X), T3:R(X), T1:Commit, T2:Commit, T3:Commit Answer 17.2 For simplicity, we assume the listed transactions are the only ones active currently in the database and if a commit or abort is not shown for a transaction, we’ll assume a commit will follow all the listed actions. 1. Not serializable, not conflict-serializable, not view-serializable; It is recoverable and avoid cascading aborts; not strict. 2. It is serializable, conflict-serializable, and view-serializable; It does NOT avoid cascading aborts, is not strict; We can not decide whether it’s recoverable or not, since the abort/commit sequence of these two transactions are not specified. 3. It is the same with number 2 above. 4. It is NOT serializable, NOT conflict-serializable, NOT viewserializable; It is NOT avoid cascading aborts, not strict; We can not decide whether it’s recoverable or not, since the abort/commit sequence of these transactions are not specified. 5. It is serializable, conflict-serializable, and view-serializable; It is recoverable and avoid cascading aborts; It is not strict. 6. It is serializable and view-serializable, not conflict-serializable; It is recoverable and avoid cascading aborts; It is not strict. 7. It is not serializable, not view-serializable, not conflict-serializable; It is not recoverable, therefore not avoid cascading aborts, not strict. 8. 
It is not serializable, not view-serializable, not conflict-serializable; It is not recoverable, therefore not avoid cascading aborts, not strict. 9. It is serializable, view-serializable, and conflict-serializable; It is not recoverable, therefore not avoid cascading aborts, not strict. 10. It belongs to all above classes. 11. (assume the 2nd T2:Commit is instead T1:Commit). It is serializable and view-serializable, not conflict-serializable; It is recoverable, avoid cascading aborts and strict. 12. It is serializable and view-serializable, not conflict-serializable; It is recoverable, but not avoid cascading aborts, not strict. Exercise 17.3 Consider the following concurrency control protocols: 2PL, Strict 2PL, Conservative 2PL, Optimistic, Timestamp without the Thomas Write Rule, Timestamp with the Thomas Write Rule, and Multiversion. For each of the schedules in Exercise 17.2, state which of these protocols allows it, that is, allows the actions to occur in exactly the order shown. For the timestamp-based protocols, assume that the timestamp for transaction Ti is I and that a version of the protocol that ensures recoverability is used. Further, if the Thomas Write Rule is used, show the equivalent serial schedule. Answer 17.3 See the table 17.1. Note the following abbreviations. S-2PL: Strict 2PL; C-2PL: Conservative 2PL; Opt cc: Optimistic; TS W/O THR: Timestamp without Thomas Write Rule; TS With THR: Timestamp without Thomas Write Rule. Thomas Write Rule is used in the following schedules, and the equivalent serial schedules are shown below: 5. T1:R(X), T1:W(X), T2:Abort, T1:Commit 6. T1:R(X), T1:W(X), T2:Commit, T1:Commit 11. T1:R(X), T2:Commit, T1:W(X), T2:Commit, T3:R(X), T3:Commit Exercise 17.4 Consider the following sequences of actions, listed in the order they are submitted to the DBMS: Sequence S1: T1:R(X), T2:W(X), T2:W(Y), T3:W(Y), T1:W(Y), T1:Commit, T2:Commit, T3:Commit Sequence S2: T1:R(X), T2:W(Y), T2:W(X), T3:W(Y), T1:W(Y), T1:Commit, T2:Commit, T3:Commit For each sequence and for each of the following concurrency control mechanisms, describe how the concurrency control mechanism handles the sequence. Assume that the timestamp of transaction Ti is i. For lock-based concurrency control mechanisms, add lock and unlock requests to the previous sequence of actions as per the locking protocol. The DBMS processes actions in the order shown. If a transaction is blocked, assume that all its actions are queued until it is resumed; the DBMS continues with the next action (according to the listed sequence) of an unblocked transaction. 1. Strict 2PL with timestamps used for deadlock prevention. 2. Strict 2PL with deadlock detection. (Show the waits-for graph in case of deadlock.) 3. Conservative (and Strict, i.e., with locks held until end-of-transaction) 2PL. 4. Optimistic concurrency control. 5. Timestamp concurrency control with buffering of reads and writes (to ensure recoverability) and the Thomas Write Rule. 6. Multiversion concurrency control. Answer 17.4 The answer to each question is given below. 1. Assume we use Wait-Die policy. Sequence S1: T1 acquires shared-lock on X; When T2 asks for an exclusive lock on X, since T2 has a lower priority, it will be aborted; T3 now gets exclusive-lock on Y; When T1 also asks for an exclusive-lock on Y which is still held by T3, since T1 has higher priority, T1 will be blocked waiting; T3 now finishes write, commits and releases all the lock; T1 wakes up, acquires the lock, proceeds and finishes; T2 now can be restarted successfully. 
Sequence S2: The sequence and its outcome are the same as for Sequence S1, except that T2 is able to advance a little further before it is aborted.
2. With deadlock detection, transactions are allowed to wait; they are not aborted until a deadlock has been detected. (Under the prevention scheme, by contrast, some transactions may be aborted prematurely.)
Sequence S1: T1 gets a shared lock on X; T2 blocks waiting for an exclusive lock on X; T3 gets an exclusive lock on Y; T1 blocks waiting for an exclusive lock on Y; T3 finishes, commits, and releases its locks; T1 wakes up, gets an exclusive lock on Y, finishes, and releases its locks on X and Y; T2 now gets exclusive locks on both X and Y and proceeds to finish. No deadlock.
Sequence S2: There is a deadlock: T1 waits for T2, while T2 waits for T1.
3. Sequence S1: With conservative and strict 2PL, the sequence is easy. T1 acquires locks on both X and Y, commits, and releases its locks; then T2; then T3.
Sequence S2: Same as Sequence S1.
4. Optimistic concurrency control. For both S1 and S2, each transaction executes, reads values from the database, and writes to a private workspace; it then acquires a timestamp to enter the validation phase. The timestamp of transaction Ti is i.
Sequence S1: Since T1 gets the earliest timestamp, it commits without problem; but when validating T2 against T1, none of the three conditions holds, so T2 is aborted and restarted later; the same happens to T3.
Sequence S2: The outcome is the same as in Sequence S1.
5. Timestamp concurrency control with buffering of reads and writes and the Thomas Write Rule.
Sequence S1: This sequence is allowed exactly as shown.
Sequence S2: Same as above.
6. Multiversion concurrency control.
Sequence S1: T1 reads X, so RTS(X) = 1. T2 is able to write X, since TS(T2) >= RTS(X); RTS(X) and WTS(X) are set to 2. T2 writes Y, so RTS(Y) and WTS(Y) are set to 2. T3 is able to write Y as well, so RTS(Y) and WTS(Y) are set to 3. Now when T1 tries to write Y, since TS(T1) < RTS(Y), T1 must be aborted and restarted later.
Sequence S2: The outcome is similar to the one in Sequence S1.

Exercise 17.5 For each of the following locking protocols, assuming that every transaction follows that locking protocol, state which of these desirable properties are ensured: serializability, conflict-serializability, recoverability, avoidance of cascading aborts.
1. Always obtain an exclusive lock before writing; hold exclusive locks until end-of-transaction. No shared locks are ever obtained.
2. In addition to (1), obtain a shared lock before reading; shared locks can be released at any time.
3. As in (2), and in addition, locking is two-phase.
4. As in (2), and in addition, all locks are held until end-of-transaction.
Answer 17.5 See Table 17.2.

CHAPTER 22: Database Recovery Techniques
Answers to Selected Exercises
22.23. Figure 22.6 shows the log corresponding to a particular schedule at the point of a system crash for four transactions T1, T2, T3, and T4. Suppose that we use the immediate update protocol with checkpointing. Describe the recovery process from the system crash. Specify which transactions are rolled back, which operations in the log are redone and which (if any) are undone, and whether any cascading rollback takes place.
Answer: First, we note that this schedule is not recoverable, since transaction T4 has read the value of B written by T2, and then T4 committed before T2 committed. Similarly, transaction T4 has read the value of A written by T3, and then T4 committed before T3 committed.
The [commit, T4] entry should not be allowed in the schedule if a recoverable protocol is used; it should be postponed until after T2 and T3 commit. For this problem, let us assume that we can roll back a committed transaction in a non-recoverable schedule, such as the one shown in Figure 22.6. By using the procedure RIU_M (recovery using immediate updates for a multiuser environment), the following result is obtained:
From Step 1 of procedure RIU_M, T2 and T3 are the active transactions. T1 was committed before the checkpoint and hence is not involved in the recovery.
From Step 2, the operations that are to be undone are:
[write_item,T2,D,25]
[write_item,T3,A,30]
[write_item,T2,B,12]
Note that the operations should be undone in the reverse of the order in which they were written into the log. Now, since T4 read item B that was written by T2 and read item A that was written by T3, and since T2 and T3 will be rolled back, by the cascading rollback rule T4 must also be rolled back. Hence, the following T4 operations must also be undone:
[write_item,T4,A,20]
[write_item,T4,B,15]
(Note that if the schedule had been recoverable and T4 had committed, then from Step 3 the operations to be redone would have been:
[write_item,T4,B,15]
[write_item,T4,A,20]
In our case of a non-recoverable schedule, no operations need to be redone in this example.)
At the point of the system crash, transactions T2 and T3 are not yet committed. Hence, when T2 is rolled back, transaction T4 must also be rolled back, since T4 reads the values of items B and A that were written by transactions T2 and T3. The write operations of T4 have to be undone in their correct order. Hence, the operations are undone in the following order:
[write_item,T2,D,25]
[write_item,T4,A,20]
[write_item,T3,A,30]
[write_item,T4,B,15]
[write_item,T2,B,12]
22.24. Suppose that we use the deferred update protocol for the example in Figure 22.6. Show how the log would be different in the case of deferred update by removing the unnecessary log entries; then describe the recovery process, using your modified log. Assume that only REDO operations are applied, and specify which operations in the log are redone and which are ignored.
Answer: In the case of deferred update, the write operations of uncommitted transactions are not recorded in the database until the transactions commit. Hence, the write operations of T2 and T3 would not have been applied to the database, and so T4 would have read the previous (committed) values of items A and B, leading to a recoverable schedule. By using the procedure RDU_M (deferred update with concurrent execution in a multiuser environment), the following result is obtained:
The list of committed transactions T since the last checkpoint contains only transaction T4. The list of active transactions T' contains transactions T2 and T3. Only the WRITE operations of the committed transactions are to be redone. Hence, REDO is applied to:
[write_item,T4,B,15]
[write_item,T4,A,20]
The transactions that are active and did not commit, i.e., transactions T2 and T3, are canceled and must be resubmitted. Their operations do not have to be undone, since they were never applied to the database.
22.25. How does checkpointing in ARIES differ from checkpointing as described in Section 22.1.4?
Answer: The main difference is that with ARIES, main memory buffers that have been modified are not flushed to disk when a checkpoint occurs. ARIES, however, writes additional information to the log in the form of a Transaction Table and a Dirty Page Table when a checkpoint is taken.
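To make the redo-only recovery of 22.24 concrete, the following Python fragment is a minimal sketch of deferred-update (NO-UNDO/REDO) recovery in the spirit of procedure RDU_M. The tuple-based log-record layout and the function name are invented for this illustration, and the sketch simplifies by assuming that every transaction that commits after the checkpoint also started after it.

# Minimal sketch of deferred-update recovery (NO-UNDO/REDO), in the spirit of RDU_M.
# Log records are plain tuples, e.g. ("checkpoint",), ("write", "T4", "B", 15),
# ("commit", "T4"); this record layout is invented for the illustration.

def deferred_update_recovery(log, database):
    # 1. Find the most recent checkpoint; only records after it matter here.
    start = 0
    for i, rec in enumerate(log):
        if rec[0] == "checkpoint":
            start = i + 1

    # 2. Classify the transactions that appear after the checkpoint.
    committed, active = set(), set()
    for rec in log[start:]:
        if rec[0] == "commit":
            committed.add(rec[1])
            active.discard(rec[1])
        elif rec[0] in ("start", "write"):
            if rec[1] not in committed:
                active.add(rec[1])

    # 3. REDO the writes of committed transactions, in log order.
    #    Writes of active (uncommitted) transactions were never applied to the
    #    database under deferred update, so they are simply ignored.
    for rec in log[start:]:
        if rec[0] == "write" and rec[1] in committed:
            _, tid, item, new_value = rec
            database[item] = new_value      # REDO is idempotent: safe to repeat

    return committed, active                # active transactions must be resubmitted

Applied to the deferred-update version of the log in Figure 22.6, only T4 would end up in the committed set, so only [write_item,T4,B,15] and [write_item,T4,A,20] would be redone, while T2 and T3 would be returned as transactions to resubmit, matching the answer to 22.24 above.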
22.28. Incremental logging with deferred updates implies that the recovery system must
a. store the old value of the updated item in the log
b. store the new value of the updated item in the log
c. store both the old and new value of the updated item in the log
d. store only the Begin Transaction and Commit Transaction records in the log
22.29. The write-ahead logging (WAL) protocol simply means that
a. writing of a data item should be done ahead of any logging operation
b. the log record for an operation should be written before the actual data is written
c. all log records should be written before a new transaction begins execution
d. the log never needs to be written to disk
22.30. In case of transaction failure under a deferred update incremental logging scheme, which of the following will be needed?
a. an undo operation
b. a redo operation
c. an undo and redo operation
d. none of the above
22.31. For incremental logging with immediate updates, a log record for a transaction would contain
a. a transaction name, a data item name, and the old and new value of the item
b. a transaction name, a data item name, and the old value of the item
c. a transaction name, a data item name, and the new value of the item
d. a transaction name and a data item name
22.32. For correct behavior during recovery, undo and redo operations must be
a. commutative
b. associative
c. idempotent
d. distributive
22.33. When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because
a. searching the entire log is time consuming
b. many redos are unnecessary
c. both (a) and (b)
d. none of the above
22.34. Using a log-based recovery scheme might improve performance as well as provide a recovery mechanism by
a. writing the log records to disk when each transaction commits
b. writing the appropriate log records to disk during the transaction's execution
c. waiting to write the log records until multiple transactions commit and writing them as a batch
d. never writing the log records to disk
22.35. There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed transaction
b. a transaction writes an item that is previously written by an uncommitted transaction
c. a transaction reads an item that is previously written by an uncommitted transaction
d. both (b) and (c)
22.36. To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single user environment
b. to keep a redundant copy of the database
c. to never abort a transaction
d. all of the above
22.37. If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits
b. the item is written to a different location on disk
c. the item is written to disk before the transaction commits
d. the item is written to the same disk location from which it was read

Exercise 18.2 Briefly answer the following questions:
1. What are the properties required of LSNs?
2. What are the fields in an update log record? Explain the use of each field.
3. What are redoable log records?
4. What are the differences between update log records and CLRs?
Answer 18.2 The answer to each question is given below.
1. As with any record id, it should be possible to fetch a log record with one disk access, given its log sequence number (LSN). Further, LSNs should be assigned in monotonically increasing order; this property is required by the ARIES recovery algorithm.
2. An update log record consists of two sets of fields: (a) fields common to all log records – prevLSN, transID, and type; and (b) fields unique to update log records – pageID, length, offset, before-image, and after-image.
prevLSN – the LSN of the previous log record written by the same transaction.
transID – the id of the transaction generating the log record.
type – indicates the type of the log record.
pageID – the pageID of the modified page.
length – the length in bytes of the change.
offset – the offset into the page of the change.
before-image – the value of the changed bytes before the change.
after-image – the value of the changed bytes after the change.
3. Redoable log records are update log records and compensation log records; executing the actions indicated by these records several times is equivalent to executing them once.
4. A compensation log record (CLR) C describes the action taken to undo the actions recorded in the corresponding update log record U. (This can happen during normal system execution when a transaction is aborted, or during recovery from a crash.) The compensation log record C also contains a field called undonextLSN, which is the LSN of the next log record that is to be undone for the transaction that wrote update record U; this field in C is set to the value of prevLSN in U. Unlike an update log record, a CLR describes an action that will never be undone: an aborted transaction will never be revived, so once a CLR has properly returned the data to its previous state, both the update record and its CLR can be forgotten.

Exercise 18.3 Briefly answer the following questions:
1. What are the roles of the Analysis, Redo, and Undo phases in ARIES?
2. Consider the execution shown in Figure 18.1.
(a) What is done during Analysis? (Be precise about the points at which Analysis begins and ends and describe the contents of any tables constructed in this phase.)
(b) What is done during Redo? (Be precise about the points at which Redo begins and ends.)
(c) What is done during Undo? (Be precise about the points at which Undo begins and ends.)
Answer 18.3 The answer to each question is given below.
1. The Analysis phase starts with the most recent begin checkpoint record and proceeds forward in the log until the last log record. It determines (a) the point in the log at which to start the Redo pass, (b) the dirty pages in the buffer pool at the time of the crash, and (c) the transactions that were active at the time of the crash, which need to be undone. The Redo phase follows Analysis and redoes all changes to any page that might have been dirty at the time of the crash. The Undo phase follows Redo and undoes the changes of all transactions that were active at the time of the crash.
2. (a) For this example, we will assume that the Dirty Page Table and Transaction Table were empty before the start of the log. Analysis determines that the last begin checkpoint was at LSN 00 and starts at the corresponding end checkpoint (LSN 10). We denote Transaction Table records as (transID, lastLSN) pairs and Dirty Page Table records as (pageID, recLSN) pairs. The Analysis phase then runs until LSN 70 and does the following:
LSN 20: Adds (T1, 20) to the TT and (P5, 20) to the DPT.
LSN 30: Adds (T2, 30) to the TT and (P3, 30) to the DPT.
LSN 40: Changes the status of T2 from "U" to "C".
LSN 50: Deletes the entry for T2 from the Transaction Table.
LSN 60: Adds (T3, 60) to the TT. Does not change the P3 entry in the DPT.
LSN 70: Changes (T1, 20) to (T1, 70).
The final Transaction Table has two entries: (T1, 70) and (T3, 60). The final Dirty Page Table has two entries: (P5, 20) and (P3, 30).
(b) Redo Phase: Redo starts at LSN 20 (the smallest recLSN in the DPT).
LSN 20: Changes to P5 are redone.
LSN 30: P3 is retrieved and its pageLSN is checked. If the page had been written to disk before the crash (i.e., if pageLSN >= 30), nothing is redone; otherwise the changes are redone.
LSN 40, 50: No action.
LSN 60: Changes to P3 are redone.
LSN 70: No action.
(c) Undo Phase: Undo starts at LSN 70 (the highest lastLSN in the TT). The Loser Set consists of LSNs 70 and 60.
LSN 70: Adds LSN 20 to the Loser Set. Loser Set = (60, 20).
LSN 60: Undoes the change on P3 and adds a CLR indicating this Undo. Loser Set = (20).
LSN 20: Undoes the change on P5 and adds a CLR indicating this Undo.

Exercise 18.4 Consider the execution shown in Figure 18.2.
1. Extend the figure to show prevLSN and undonextLSN values.
2. Describe the actions taken to roll back transaction T2.
3. Show the log after T2 is rolled back, including all prevLSN and undonextLSN values in log records.
Answer 18.4 The answer to each question is given below.
1. The extended figure is shown below:
2. Step i) Restore P3 to the before-image stored in LSN 60.
Step ii) Restore P5 to the before-image stored in LSN 50.
Step iii) Restore P5 to the before-image stored in LSN 20.
3. The log tail should look something like this:

Exercise 18.5 Consider the execution shown in Figure 18.3. In addition, the system crashes during recovery after writing two log records to stable storage, and again after writing another two log records.
1. What is the value of the LSN stored in the master log record?
2. What is done during Analysis?
3. What is done during Redo?
4. What is done during Undo?
5. Show the log when recovery is complete, including all non-null prevLSN and undonextLSN values in log records.
Answer 18.5 The answer to each question is given below.
1. LSN 00 is stored in the master log record, as it is the LSN of the begin checkpoint record.
2. During Analysis the following happens:
LSN 20: Add (T1, 20) to TT and (P1, 20) to DPT.
LSN 30: Add (T2, 30) to TT and (P2, 30) to DPT.
LSN 40: Add (T3, 40) to TT and (P3, 40) to DPT.
LSN 50: Change the status of T2 to C.
LSN 60: Change (T3, 40) to (T3, 60).
LSN 70: Remove T2 from TT.
LSN 80: Change (T1, 20) to (T1, 80) and add (P5, 80) to DPT.
LSN 90: No action.
At the end of Analysis, the Transaction Table contains the following entries: (T1, 80) and (T3, 60). The Dirty Page Table has the following entries: (P1, 20), (P2, 30), (P3, 40), and (P5, 80).
3. Redo starts from LSN 20 (the minimum recLSN in the DPT).
LSN 20: Check whether P1 has a pageLSN greater than 10 or not. Since it is a committed change, we probably need not redo this update.
LSN 30: Redo the change in P2.
LSN 40: Redo the change in P3.
LSN 50: No action.
LSN 60: Redo the changes on P2.
LSN 70: No action.
LSN 80: Redo the changes on P5.
LSN 90: No action.
4. ToUndo consists of (80, 60).
LSN 80: Undo the changes in P5. Append a CLR: Undo T1 LSN 80, set undonextLSN = 20. Add 20 to ToUndo. ToUndo consists of (60, 20).
LSN 60: Undo the changes on P2. Append a CLR: Undo T3 LSN 60, set undonextLSN = 40. Add 40 to ToUndo. ToUndo consists of (40, 20).
LSN 40: Undo the changes on P3. Append a CLR: Undo T3 LSN 40, and a T3 end record. ToUndo consists of (20).
LSN 20: Undo the changes on P1. Append a CLR: Undo T1 LSN 20, and a T1 end record.
5. The log looks like the following after recovery:
LSN 00: begin checkpoint
LSN 10: end checkpoint
LSN 20: update: T1 writes P1
LSN 30: update: T2 writes P2
LSN 40: update: T3 writes P3
LSN 50: T2 commit (prevLSN = 30)
LSN 60: update: T3 writes P2 (prevLSN = 40)
LSN 70: T2 end (prevLSN = 50)
LSN 80: update: T1 writes P5 (prevLSN = 20)
LSN 90: T3 abort (prevLSN = 60)
LSN 100: CLR: Undo T1 LSN 80 (undonextLSN = 20)
LSN 110: CLR: Undo T3 LSN 60 (undonextLSN = 40)
LSN 120, 125: CLR: Undo T3 LSN 40; T3 end
LSN 130, 135: CLR: Undo T1 LSN 20; T1 end
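To make the Undo bookkeeping in answers 18.3 and 18.5 concrete, here is a minimal Python sketch of the ARIES undo pass. The dictionary-based record layout, the field names, and the function name are invented for this illustration (ARIES itself is defined in terms of log records, not this API), and the sketch ignores the intermediate crashes described in Exercise 18.5.

# Minimal sketch of the ARIES undo pass (illustrative record layout, not a real API).
# Each log record is a dict with keys: "lsn", "type", "tid", "prevLSN",
# "undonextLSN", and, for updates, "page" and "before" (the before-image).

def aries_undo(log_by_lsn, losers, database):
    """losers maps each loser transaction to its lastLSN (taken from the Transaction Table)."""
    to_undo = set(losers.values())
    next_lsn = max(log_by_lsn) + 10        # where newly appended records go (spacing is arbitrary)

    while to_undo:
        lsn = max(to_undo)                 # always undo the most recent remaining action
        to_undo.discard(lsn)
        rec = log_by_lsn[lsn]

        if rec["type"] == "CLR":
            # CLRs are never undone; just continue with the record they point to.
            if rec["undonextLSN"] is not None:
                to_undo.add(rec["undonextLSN"])
            continue

        if rec["type"] == "update":
            # Restore the before-image and append a compensation log record.
            database[rec["page"]] = rec["before"]
            log_by_lsn[next_lsn] = {"lsn": next_lsn, "type": "CLR", "tid": rec["tid"],
                                    "prevLSN": lsn, "undonextLSN": rec["prevLSN"]}
            next_lsn += 10

        if rec["prevLSN"] is not None:
            to_undo.add(rec["prevLSN"])    # keep following this transaction's prevLSN chain
        else:
            # No earlier record for this transaction: write its end record.
            log_by_lsn[next_lsn] = {"lsn": next_lsn, "type": "end", "tid": rec["tid"],
                                    "prevLSN": None, "undonextLSN": None}
            next_lsn += 10

Running this over the loser set {T1: 80, T3: 60} from answer 18.5 undoes LSNs 80, 60, 40, and 20 in that order and appends four CLRs plus the T3 end and T1 end records, matching the sequence of records in the recovered log shown above (the exact LSNs assigned to the new records differ, since the sketch spaces them arbitrarily).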