CMPE 272 HW #1 KEY SHEET

16.34. Consider a disk with the following characteristics (these are not parameters of any particular disk unit): block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track = 20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.

a. What is the total capacity of a track, and what is its useful capacity (excluding interblock gaps)?

Given:
Block size B = 512 bytes
Interblock gap size G = 128 bytes
Number of blocks per track = 20
Number of tracks per surface = 400

Total capacity of a track (TC) = (block size + interblock gap size) * number of blocks per track = (512 + 128) * 20 = 12800 bytes = 12.8 KB
Useful capacity of a track (UC) = block size * number of blocks per track = 512 * 20 = 10240 bytes = 10.24 KB

b. How many cylinders are there?

Total number of cylinders = number of tracks per surface = 400

c. What are the total capacity and the useful capacity of a cylinder?

Total capacity of a cylinder = number of double-sided disks * 2 (since double-sided) * TC = 15 * 2 * 12800 = 384000 bytes = 384 KB
Useful capacity of a cylinder = number of double-sided disks * 2 (since double-sided) * UC = 15 * 2 * 10240 = 307200 bytes = 307.2 KB

d. What are the total capacity and the useful capacity of a disk pack?

Total capacity of a disk pack = number of cylinders * total cylinder capacity = 400 * 384000 = 153600000 bytes = 153.6 MB
Useful capacity of a disk pack = number of cylinders * useful cylinder capacity = 400 * 307200 = 122880000 bytes = 122.88 MB

e. Suppose that the disk drive rotates the disk pack at a speed of 2,400 rpm (revolutions per minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec? What is the average rotational delay (rd) in msec? What is the bulk transfer rate? (See Appendix B.)
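The capacity figures in parts (a) through (d) depend only on the parameters given in the exercise, so they can be checked mechanically. The following is a minimal Python sketch of that arithmetic; the function name `disk_capacities` and the dictionary keys are illustrative choices, not notation from the textbook.

```python
def disk_capacities(block_size, gap_size, blocks_per_track,
                    tracks_per_surface, double_sided_disks):
    """Return track, cylinder, and pack capacities in bytes (total and useful)."""
    surfaces = double_sided_disks * 2            # two recording surfaces per disk
    track_total = (block_size + gap_size) * blocks_per_track
    track_useful = block_size * blocks_per_track
    cylinders = tracks_per_surface               # one cylinder per track position
    cylinder_total = surfaces * track_total
    cylinder_useful = surfaces * track_useful
    return {
        "track_total": track_total,
        "track_useful": track_useful,
        "cylinders": cylinders,
        "cylinder_total": cylinder_total,
        "cylinder_useful": cylinder_useful,
        "pack_total": cylinders * cylinder_total,
        "pack_useful": cylinders * cylinder_useful,
    }


# Parameters of Exercise 16.34.
caps = disk_capacities(512, 128, 20, 400, 15)
for name, value in caps.items():
    print(f"{name}: {value}")
```

With the parameters of this exercise, the pack totals come out in the hundred-million-byte range, which is why they are reported in MB rather than KB.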
Speed = 2400 rpm
Time for one disk revolution = (60 * 1000) / rpm = 60000 / 2400 = 25 ms
Transfer rate (TR) = total track capacity / time for one disk revolution = 12800 / 25 = 512 bytes/msec
Block transfer time (BTT) = B / TR = 512 / 512 = 1 ms
Average rotational delay (rd) = time for one disk revolution / 2 = 25 / 2 = 12.5 ms
Bulk transfer rate (BTR) = TR * B / (B + G) = 512 * 512 / (512 + 128) = 409.6 bytes/msec

f. Suppose that the average seek time is 30 msec. How much time does it take (on the average) in msec to locate and transfer a single block, given its block address?

Seek time (S) = 30 ms
Time to locate and transfer a single block = S + rd + BTT = 30 + 12.5 + 1 = 43.5 ms

g. Calculate the average time it would take to transfer 20 random blocks, and compare this with the time it would take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.

Time to transfer 20 random blocks = 20 * (S + rd + BTT) = 20 * 43.5 = 870 ms
Time to transfer 20 consecutive blocks = S + rd + (20 * BTT) = 30 + 12.5 + (20 * 1) = 62.5 ms
Transferring the blocks consecutively is about 14 times faster (870 / 62.5 = 13.92), because the seek time and rotational delay are paid only once.

16.35. A file has r = 20,000 STUDENT records of fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Major_dept_code (4 bytes), Minor_dept_code (4 bytes), Class_code (4 bytes, integer), and Degree_program (3 bytes). An additional byte is used as a deletion marker. The file is stored on the disk whose parameters are given in Exercise 16.27.

Assuming block size = 512 bytes:

a. Calculate the record size R in bytes.

Record size R = sum of all field sizes + deletion marker = 30 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 4 + 3 + 1 = 114 bytes

b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.

BFR = floor(B / R) = floor(512 / 114) = floor(4.49) = 4 records/block
Number of blocks (b) = ceil(number of records / BFR) = 20000 / 4 = 5000

c.
Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored contiguously.

A linear search reads, on average, half of the file blocks = 5000 / 2 = 2500 blocks.
BTR = TR * (B / (B + G)) = 512 * (512 / (512 + 128)) = 409.6 bytes/msec

(i) If the blocks are stored contiguously and double buffering is used, the time to read 2500 consecutive blocks
= S + rd + 2500 * (B / BTR) = 30 + 12.5 + 2500 * (512 / 409.6) = 3167.5 ms = 3.1675 sec

(ii) For scattered blocks, each block needs its own seek and rotational delay, so the time is
= (S + rd + BTT) * 2500 = (30 + 12.5 + 1) * 2500 = 108750 ms = 108.75 sec

d. Assume that the file is ordered by Ssn; by doing a binary search, calculate the time it takes to search for a record given its Ssn value.

Average time to find a record by binary search on the file ordered by Ssn
= ceil(log2(b)) * (S + rd + BTT) = ceil(log2(5000)) * (30 + 12.5 + 1) = 13 * 43.5 = 565.5 ms = 0.5655 sec

Assuming block size = 2400 bytes (with S = 20 ms, rd = 10 ms, BTT = 1 ms, G = 600 bytes, and TR = 2400 bytes/msec, as used in the calculations below):

a) Record size R = sum of all field sizes + deletion marker = 30 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 4 + 3 + 1 = 114 bytes

b) BFR = floor(B / R) = floor(2400 / 114) = floor(21.05) = 21 records/block
Number of blocks (b) = ceil(number of records / BFR) = ceil(20000 / 21) = ceil(952.38) = 953

c) A linear search reads, on average, half of the file blocks = 953 / 2 = 476.5, taken as 476 blocks here.
BTR = TR * (B / (B + G)) = 2400 * (2400 / (2400 + 600)) = 1920 bytes/msec

(i) If the blocks are stored contiguously and double buffering is used, the time to read 476 consecutive blocks
= S + rd + 476 * (B / BTR) = 20 + 10 + 476 * (2400 / 1920) = 625 ms = 0.625 sec

(ii) For scattered blocks, each block needs its own seek and rotational delay, so the time is
= (S + rd + BTT) * 476 = (20 + 10 + 1) * 476 = 14756 ms = 14.756 sec

d) Average time to find a record by binary search on the file ordered by Ssn
= ceil(log2(b)) * (S + rd + BTT) = ceil(log2(953)) * (20 + 10 + 1) = 10 * 31 = 310 ms

16.36.
Suppose that only 80% of the STUDENT records from Exercise 16.28 have a value for Phone, 85% for Major_dept_code, 15% for Minor_dept_code, and 90% for Degree_program; and suppose that we use a variable-length record file. Each record has a 1-byte field type for each field in the record, plus the 1-byte deletion marker and a 1-byte end-of-record marker. Suppose that we use a spanned record organization, where each block has a 5-byte pointer to the next block (this space is not used for record storage).

Assuming block size = 512 bytes:

a. Calculate the average record length R in bytes.

Fixed part (Name, Ssn, Address, Birth_date, Sex, and Class_code, each with a 1-byte field type, plus the deletion marker and the end-of-record marker)
= (30+1) + (9+1) + (40+1) + (8+1) + (1+1) + (4+1) + 1 + 1 = 100 bytes
Variable part (Phone, Major_dept_code, Minor_dept_code, and Degree_program, each weighted by the fraction of records that have a value)
= ((10+1) * 0.8) + ((4+1) * 0.85) + ((4+1) * 0.15) + ((3+1) * 0.9) = 8.8 + 4.25 + 0.75 + 3.6 = 17.4 bytes
Average record size R = R(fixed) + R(variable) = 100 + 17.4 = 117.4 bytes

b. Calculate the number of blocks needed for the file.

Using a spanned record organization, where each block has a 5-byte pointer to the next block, the bytes available in each block = (B - 5) = (512 - 5) = 507 bytes.
Number of blocks needed for the file: b = ceiling((r * R) / (B - 5)) = ceiling(20000 * 117.4 / 507) = ceiling(2348000 / 507) = ceiling(4631.16) = 4632 blocks

Assuming block size = 2400 bytes:

a) Fixed part = (30+1) + (9+1) + (40+1) + (8+1) + (1+1) + (4+1) + 1 + 1 = 100 bytes
Variable part = ((10+1) * 0.8) + ((4+1) * 0.85) + ((4+1) * 0.15) + ((3+1) * 0.9) = 8.8 + 4.25 + 0.75 + 3.6 = 17.4 bytes
Average record size R = R(fixed) + R(variable) = 100 + 17.4 = 117.4 bytes

b) Using a spanned record organization, where each block has a 5-byte pointer to the next block, the bytes available in each block = (B - 5) = (2400 - 5) = 2395 bytes.
Number of blocks needed for the file: b = ceiling((r * R) / (B - 5)) = ceiling(20000 * 117.4 / 2395) = ceiling(980.38) = 981 blocks

18.13. Consider SQL queries Q1, Q8, Q1B, and Q4 in Chapter 6 and Q27 in Chapter 7.

a. Draw at least two query trees that can represent each of these queries. Under what circumstances would you use each of your query trees?

b.
Draw the initial query tree for each of these queries, and then show how the query tree is optimized by the algorithm outlined in Section 18.7.

c. For each query, compare your own query trees of part (a) and the initial and final query trees of part (b).

Q8:
select E.Fname, E.Lname, S.Fname, S.Lname
from Employee E, Employee S
where E.Superssn = S.Ssn

Q8's tree 1:
  Project E.Fname, E.Lname, S.Fname, S.Lname
    Join E.Superssn = S.Ssn
      Employee E
      Employee S

Q8's tree 2:
  Project E.Fname, E.Lname, S.Fname, S.Lname
    Select E.Superssn = S.Ssn
      Cartesian Product
        Employee E
        Employee S

The initial query tree is the same as tree 2. Replacing the selection and Cartesian product with a join gives tree 1, so tree 1 is the result after optimization.

Q27:
select Fname, Lname, 1.1*Salary
from Employee, Works_on, Project
where Ssn = Essn and Pno = Pnumber and Pname = 'ProductX'

Q27's tree 1 (node labels in the order given in the original figure):
  Project Fname, Lname, Salary
  Join Pno = Pnumber
  Employee
  Project
  Join Ssn = Essn
  Select Pname = 'ProductX'
  Works_on

Q27's tree 2:
  Project Fname, Lname, Salary
    Select Pno = Pnumber and Ssn = Essn and Pname = 'ProductX'
      Cartesian Product
        Cartesian Product
          Employee
          Project
        Works_on

The initial query tree of Q27 is Q27's tree 2. The result of the heuristic optimization process is not the same as tree 1; the tree can be optimized further as follows (node labels in the order given in the original figure):
  Project Fname, Lname, Salary
  Join Pno = Pnumber
  Employee
  Project
  Join Ssn = Essn
  Select Pname = 'ProductX'
  Works_on

18.14. A file of 4,096 blocks is to be sorted with an available buffer space of 64 blocks. How many passes will be needed in the merge phase of the external sort-merge algorithm?

Let nR = number of initial runs, b = number of file blocks, nB = available buffer space, dM = degree of merging.
b = 4096
nB = 64
Sorting phase: nR = ceil(b / nB) = 4096 / 64 = 64
Merging phase: dM = min(nB - 1, nR) = min(63, 64) = 63
Number of merge passes = ceil(log_dM(nR)) = ceil(log_63(64)) = ceil(1.004) = 2

20.14.
Change transaction T2 in Figure 20.2(b) to read

read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);

Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N = 2, with respect to the following questions: Does adding the above condition change the final outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?

If M = 2, then with an initial value of X = 88 we have X = X + M = 88 + 2 = 90, so the condition X > 90 is false and write_item(X) is executed. The if condition evaluates to true, and the transaction exits, only when the initial value of X is greater than 88. In other words, write_item(X) is skipped only when the initial value of X is greater than 88.

20.22. Which of the following schedules is (conflict) serializable? For each serializable schedule, determine the equivalent serial schedules.

a. r1(X); r3(X); w1(X); r2(X); w3(X)
b. r1(X); r3(X); w3(X); w1(X); r2(X)
c. r3(X); r2(X); w3(X); r1(X); w1(X)
d. r3(X); r2(X); r1(X); w3(X); w1(X)

a) Given schedule: r1(X); r3(X); w1(X); r2(X); w3(X)
The conflict graph contains the cycle T1 -> T2 -> T3 -> T1, so the schedule is not serializable.

b) Given schedule: r1(X); r3(X); w3(X); w1(X); r2(X)
The conflict graph contains the cycle T1 -> T3 -> T1, so the schedule is not serializable.

c) Given schedule: r3(X); r2(X); w3(X); r1(X); w1(X)
The conflict graph does not contain any cycle, so the schedule is serializable. The equivalent serial schedule is T2 -> T3 -> T1, that is, r2(X); r3(X); w3(X); r1(X); w1(X).

d) Given schedule: r3(X); r2(X); r1(X); w3(X); w1(X)
The conflict graph contains the cycle T1 -> T3 -> T1, so the schedule is not serializable.

20.23. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given below. Draw the serializability (precedence) graphs for S1 and S2, and state whether each schedule is serializable or not.
If a schedule is serializable, write down the equivalent serial schedule(s).

T1: r1(X); r1(Z); w1(X);
T2: r2(Z); r2(Y); w2(Z); w2(Y);
T3: r3(X); r3(Y); w3(Y);
S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y);
S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y);

Schedule: S1

Time   T1      T2      T3
t0     r1(X)
t1             r2(Z)
t2     r1(Z)
t3                     r3(X)
t4                     r3(Y)
t5     w1(X)
t6                     w3(Y)
t7             r2(Y)
t8             w2(Z)
t9             w2(Y)

From the above schedule table, we can determine the conflicting operations and dependencies:

Conflicting operations    Dependency between the transactions
r1(Z), w2(Z)              T1 -> T2
r3(X), w1(X)              T3 -> T1
r3(Y), w2(Y)              T3 -> T2
w3(Y), w2(Y)              T3 -> T2

Using the two tables above, we can draw the precedence graph, which has the edges T1 -> T2, T3 -> T1, and T3 -> T2. Since there are no cycles in the precedence graph, schedule S1 is serializable. The equivalent serial schedule is T3 -> T1 -> T2.

Schedule: S2

Time   T1      T2      T3
t0     r1(X)
t1             r2(Z)
t2                     r3(X)
t3     r1(Z)
t4             r2(Y)
t5                     r3(Y)
t6     w1(X)
t7             w2(Z)
t8                     w3(Y)
t9             w2(Y)

From the above schedule table, we can determine the conflicting operations and dependencies:

Conflicting operations    Dependency between the transactions
r3(X), w1(X)              T3 -> T1
r1(Z), w2(Z)              T1 -> T2
r2(Y), w3(Y)              T2 -> T3
r3(Y), w2(Y)              T3 -> T2

Using the two tables above, we can draw the precedence graph, which has the edges T3 -> T1, T1 -> T2, T2 -> T3, and T3 -> T2. The graph contains a cycle between T2 and T3 (T2 -> T3 -> T2). Hence, schedule S2 is not serializable.
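
The precedence-graph test used in 20.22 and 20.23 can also be carried out mechanically. The following is a minimal Python sketch, not part of the textbook solution: it assumes schedules are written as strings such as "r1(X); w2(Y)", and the helper names `precedence_edges` and `serial_order` are illustrative. Two operations are treated as conflicting when they belong to different transactions, access the same item, and at least one of them is a write.

```python
import re
from itertools import combinations


def precedence_edges(schedule):
    """Return the set of precedence-graph edges (i, j), meaning Ti -> Tj."""
    # Each operation is parsed as (action, transaction number, item).
    ops = re.findall(r"([rw])(\d+)\s*\((\w+)\)", schedule)
    edges = set()
    # combinations() keeps schedule order, so the first op precedes the second.
    for (a1, t1, x1), (a2, t2, x2) in combinations(ops, 2):
        if t1 != t2 and x1 == x2 and "w" in (a1, a2):
            edges.add((int(t1), int(t2)))
    return edges


def serial_order(edges):
    """Return one equivalent serial order, or None if the graph has a cycle.

    Only transactions that appear in at least one edge are considered.
    """
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    order = []
    while nodes:
        # Transactions with no incoming edge can be scheduled next.
        free = sorted(n for n in nodes
                      if not any(j == n for _, j in remaining))
        if not free:
            return None  # every remaining transaction has a predecessor
        first = free[0]
        order.append(first)
        nodes.remove(first)
        remaining = {(i, j) for i, j in remaining if i != first}
    return order


S1 = "r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y)"
S2 = "r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y)"

for name, sched in (("S1", S1), ("S2", S2)):
    order = serial_order(precedence_edges(sched))
    if order is None:
        print(name, "is not conflict serializable")
    else:
        print(name, "is equivalent to", " -> ".join(f"T{t}" for t in order))
```

When the graph is acyclic, `serial_order` returns only one topological order; other equivalent serial schedules may exist if the graph admits more than one ordering.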