COSC 671 Final WINTER 2014

Name:

NOTE: YOU MUST ANSWER 6, 7, 8. Questions 1 – 5 are optional and count for extra credit only.

Open books, open notes, open Internet. Return your answer electronically (good formats are: pdf, odt, sxw, rtf, txt, doc, png, jpg).

1. For each of the following schedules, state (1) whether the schedule is serializable and (2) whether the schedule is conflict serializable.

   r: read   w: write   a: abort   c: commit

   a. r(T1,x), r(T2,x), w(T1,x), w(T2,x), c(T2), c(T1)
   b. r(T1,x), r(T2,x), r(T2,y), w(T2,y), c(T2), w(T1,y), c(T1)

2. Give the precedence graph for each of the schedules given in #1. The graph can be given as a picture (nodes and arcs), an adjacency matrix, or an adjacency list.

3. For each of the following sets of transactions, give a schedule that maximizes interleaving while avoiding conflicts by using the 2PL protocol (there may be more than one correct answer; give only one answer for each).

   a. T1: r(x), w(x), c
      T2: r(x), w(x), c
   b. T1: r(x), r(y), c
      T2: r(x), w(x), c, r(y), w(y), c

4. For the following schedule (there is only one schedule given in the problem), give the schedule that results from using timestamping with wait-die.

   r(T1,x), r(T2,x), r(T2,y), w(T2,y), r(T3,x), w(T3,x), c(T3), c(T2), w(T1,y), c(T1)

5. A database has two tables, X and Y. X has schema (a, b), and the current instance has tuples x1, x2, … xm. Y has schema (c, d), and the current instance has tuples y1, y2, … yn. Recall the multi-granularity locking protocol described in class (see also http://en.wikipedia.org/wiki/Multiple_granularity_locking ). The hierarchy of database objects is shown here.

   DB
     X
       x1: a, b
       x2: a, b
       …
     Y
       y1: c, d
       y2: c, d
       …

   For each of the following schedules, give all locks held over the database hierarchy at the end of the schedule's operations (two separate answers).

   a. r(T1,x1.a), r(T2,y1.c)
   b. r(T1,x1.a), w(T1,x1.a), r(T2,x2.b)

6. Suppose you have two interleaved transactions, T1 and T2.
   T1: r(x), r(y), x += 1, y += 1, w(x), w(y), c
   T2: r(x), x += 10, w(x), c

   The trace of actual execution is as follows (the subscript gives the transaction number):

   r1(x), r2(x), r1(y), x1+=1, y1+=1, w1(x), x2+=10, FAIL!

   Starting values: x = 5, y = 50

   At the point of FAIL!, the entire DB system fails; a few minutes later, recovery starts and completes successfully.

   (a) Give the log file just before the FAIL! occurs.
   (b) Give the values of x and y after recovery completes.
   (c) We can suppose the DB system failed because of some error in data protection (that is, in implementing correct locking). Identify the statement that should not have been allowed to execute.
   (d) Continuing with (c), how would 2PL prevent the statement identified in (c) from causing the DB failure?

7. In a distributed database system, suppose tables R1, R2, R3 are created at node1. Then R2 and R3 are migrated to node2. Give the distributed catalog information (global catalog and local catalogs) that reflects the current distribution of R1, R2, and R3.

8. This is a ‘thinking’ question. We did not discuss this in class. You need to consider the proposals below and discuss the advantages/disadvantages of each (if any). I expect approximately one page of text/figures to present your basic opinions.

   In a distributed database system, the table R is horizontally partitioned (into R1, R2) between node1 and node2: R1 is put on node1, R2 is put on node2. You also have a secondary index, I, on R, attribute A. Consider three possibilities:

   1. I is stored on node1 and replicated on node2.
   2. I is stored on node3.
   3. I is partitioned into I1 and I2, where I1 is the secondary index on R1.A and I2 is the secondary index on R2.A.

   A query wishes to do a join using R.A as one of the join attributes. Which is the best proposal, and what are its advantages and disadvantages compared to the other two proposals? Consider storage requirements, communication costs, amount of parallelism, and processing time.
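
Note on notation: the r/w/c schedule notation used in questions 1, 2, and 4 can be read mechanically. The following is a minimal Python sketch of building a precedence graph from such a schedule and checking it for a cycle. It assumes every transaction in the schedule commits. The function names and the two example schedules are illustrative only; they are not taken from the questions above and are not an answer key.

```python
import re
from collections import defaultdict

def parse_schedule(text):
    """Parse 'r(T1,a), w(T2,a), c(T1)' into (op, txn, item) tuples.
    item is None for commit/abort operations."""
    pattern = r"([rwca])\s*\(\s*(T\d+)\s*(?:,\s*(\w+)\s*)?\)"
    return [(op, txn, item or None) for op, txn, item in re.findall(pattern, text)]

def precedence_graph(ops):
    """Add an edge Ti -> Tj when an operation of Ti precedes a conflicting
    operation of Tj (same item, different transactions, at least one write)."""
    edges = defaultdict(set)
    for i, (op1, t1, x1) in enumerate(ops):
        if op1 not in ("r", "w"):
            continue
        for op2, t2, x2 in ops[i + 1:]:
            if op2 in ("r", "w") and t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges

def has_cycle(edges):
    """Depth-first search; a grey (in-progress) node reached again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, ()):
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

# Hypothetical schedules on an item named 'a' (not from the exam).
interleaved = parse_schedule("r(T1,a), w(T2,a), w(T1,a), c(T1), c(T2)")
serial = parse_schedule("r(T1,a), w(T1,a), r(T2,a), w(T2,a), c(T1), c(T2)")

graph_interleaved = precedence_graph(interleaved)
graph_serial = precedence_graph(serial)
print(dict(graph_interleaved), has_cycle(graph_interleaved))
print(dict(graph_serial), has_cycle(graph_serial))
```

Under these assumptions, a schedule is conflict serializable exactly when its precedence graph has no cycle, so `has_cycle` returning False corresponds to a conflict-serializable schedule.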