91.4902 – Advanced Database System: Assignment 2

QUESTION 1

The implementation of the join operation (R ⋈A=B S) using sort-merge is outlined by the following algorithm, where n and m are the numbers of records in R and S respectively.

set i ← 1; j ← 1;
while (i ≤ n) and (j ≤ m) do
{ if R(i)[A] > S(j)[B] then set j ← j + 1
  else if R(i)[A] < S(j)[B] then set i ← i + 1
  else { /* R(i)[A] = S(j)[B], so we output a matched tuple */
         set k ← i;
         while (k ≤ n) and (R(k)[A] = S(j)[B]) do
         { set l ← j;
           while (l ≤ m) and (R(i)[A] = S(l)[B]) do
           { output; l ← l + 1; }
           set k ← k + 1; }
         set i ← k, j ← l; }
}

The two tables below show the relations R and S before any step of the above sort-merge algorithm has been executed. Both relations are already sorted on the join attributes A and B, so the sort phase that normally precedes the steps above can be skipped.

R            S
A  C         B  D
3  a         1  j
3  b         2  j
5  c         3  h
6  e         3  k
7  f         8  f
8  g         8  g

The merge phase begins by creating two access indexes, i and j, and initializing both to 1 so that they refer to the first records of R and S. Since the value of A in R(1) is greater than the value of B in S(1), the index j is incremented by 1 until the two values are equal. When i = 1 and j = 3, R(1)[A] = S(3)[B] = 3, so the combination of the two tuples referenced by i and j becomes the first record of the output relation T. The state at this point is illustrated in the first snapshot.
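The pseudocode at the start of this question can also be checked mechanically. The following is a minimal Python sketch of the merge phase, not a definitive implementation: the function name `sort_merge_join`, the representation of records as tuples, and the 0-based indexes (the pseudocode is 1-based) are assumptions made for illustration. Both inputs are assumed to be sorted on the join column already.

```python
def sort_merge_join(r, s, a=0, b=0):
    """Merge phase of a sort-merge join.

    r and s are lists of tuples already sorted on columns a and b.
    Indexes are 0-based here, unlike the 1-based pseudocode.
    """
    out = []
    i, j = 0, 0
    n, m = len(r), len(s)
    while i < n and j < m:
        if r[i][a] > s[j][b]:
            j += 1
        elif r[i][a] < s[j][b]:
            i += 1
        else:
            # r[i][a] == s[j][b]: pair every matching record of r
            # with every matching record of s.
            k = i
            while k < n and r[k][a] == s[j][b]:
                l = j
                while l < m and r[i][a] == s[l][b]:
                    out.append(r[k] + s[l])
                    l += 1
                k += 1
            i, j = k, l
    return out


# The relations R and S from the tables above.
R = [(3, "a"), (3, "b"), (5, "c"), (6, "e"), (7, "f"), (8, "g")]
S = [(1, "j"), (2, "j"), (3, "h"), (3, "k"), (8, "f"), (8, "g")]
T = sort_merge_join(R, S)
for row in T:
    print(row)
```

Each printed row is one record of T in the order the loops produce it, so the output can be compared with the hand trace step by step.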
Snapshot: i = 1 points to R(1) = <3, a>; j = 3 points to S(3) = <3, h>.

T
A  C  B  D
3  a  3  h

Inside the matching branch, the algorithm sets k to i = 1 and l to j = 3, and then searches S for further records with a matching value of B. After the first output, the inner while loop increments l to 4 (that is, j + 1). The value of B in the 4th record of S is equal to the value of A in the record referenced by i (S(4)[B] = R(1)[A]). Hence, the combination of the two records referenced by the current indexes is added as a new record of relation T.

Snapshot: i = 1 points to R(1) = <3, a>; l = 4 points to S(4) = <3, k>.

T
A  C  B  D
3  a  3  h
3  a  3  k

After this output tuple is created, l is incremented by 1. The next record of S, S(5), no longer satisfies the loop condition, so the inner while loop terminates with l = 5. The algorithm then continues with the enclosing while loop over the records of R, which is controlled by the index k. This index is incremented to 2 (that is, i + 1) and l is reset to j = 3. Since R(2)[A] = S(3)[B] = 3, a new output tuple is generated in T by combining the two matched records. The following snapshot illustrates this step.
Snapshot: k = 2 points to R(2) = <3, b>; j = 3 points to S(3) = <3, h>.

T
A  C  B  D
3  a  3  h
3  a  3  k
3  b  3  h

With k still equal to 2, the inner while loop increments l to 4. Since the value of A in the second record of R is equal to the value of B in the fourth record of S, a new record of T is created from the combination of the two records currently pointed to, as illustrated in the following.

Snapshot: k = 2 points to R(2) = <3, b>; l = 4 points to S(4) = <3, k>.

T
A  C  B  D
3  a  3  h
3  a  3  k
3  b  3  h
3  b  3  k

Following this output, the index l is incremented to 5. It now points to the next record in S, <8, f>, whose value of B fails the second condition of the inner while loop, so that loop terminates. The index k is then set to 3 and the loop condition is applied to the record it points to in R, <5, c>. Since R(3)[A] = 5 does not match the value B = 3 of the matched records of S, the while loop over k terminates as well. The last assignment line then sets i to 3 (the value of k) and j to 5 (the value of l).
This last line ends the current iteration, and the algorithm proceeds by checking the values of i and j against n and m respectively. Since i = 3 is less than n = 6 and j = 5 is less than m = 6, the next iteration of the main while loop is executed. The iteration starts by comparing the value of A in the current record of R with the value of B in the current record of S. When i = 3 and j = 5, the value of A (5) is less than the value of B (8), so i is incremented by 1. The same happens when i = 4 (A = 6) and again when i = 5 (A = 7). When i = 6, the index refers to the record whose value of A is equal to the value of B in the current record of S (R(6)[A] = S(5)[B] = 8). The current records of R and S are combined to generate a new record of relation T, as illustrated in the following.

Snapshot: i = 6 points to R(6) = <8, g>; j = 5 points to S(5) = <8, f>.

T
A  C  B  D
3  a  3  h
3  a  3  k
3  b  3  h
3  b  3  k
8  g  8  f

Following the creation of this output record, the index l is incremented to 6 (the value of j + 1). Since l is equal to m and the values of A and B are equal for the records currently pointed to (R(6)[A] = S(6)[B]), those two records are merged to form a new record of relation T. This step is illustrated as follows.
Snapshot: i = 6 points to R(6) = <8, g>; l = 6 points to S(6) = <8, g>.

T
A  C  B  D
3  a  3  h
3  a  3  k
3  b  3  h
3  b  3  k
8  g  8  f
8  g  8  g

After this output, the index l is incremented by 1. With l = 7, the inner while loop stops because l has exceeded m. The algorithm then sets k to i + 1 = 7. Since k now exceeds n, the while loop over k stops as well. The next line to be executed is the assignment that sets the indexes i and j to the values of k and l respectively, so both become 7. Since i has exceeded n, the main while loop terminates and the sort-merge algorithm finishes with the following result.

T
A  C  B  D
3  a  3  h
3  a  3  k
3  b  3  h
3  b  3  k
8  g  8  f
8  g  8  g

QUESTION 2

π S.Name, S.StudentNumber
  ⋈ G.StuNumber = S.StudentNumber
    ⋈ E.SectionIdentifier = G.SectionIdentifier
      ⋈ C.CourseNumber = E.CourseNumber
        ⋈ C.CourseNumber = P.CourseNumber
          σ C.Department = "BC"
            COURSE
          PREREQUISITE
        SECTION
      GRADE_REPORT
    STUDENT

π Name, StudentNumber ((((σ Dept = 'BC' (COURSE) ⋈ CourseNum = CourseNum PREREQUISITE) ⋈ CourseNum = CourseNum SECTION) ⋈ SecIdentifier = SecIdentifier GRADE_REPORT) ⋈ StuNumber = StuNumber STUDENT)

The tree above is the initial query tree that corresponds to the given relational algebra expression. Applying the heuristic algebraic optimization algorithm to the initial query tree gives the optimized query tree illustrated in the following.
Although the complete algorithm involves six transformation steps, only one of them is applicable to the relational algebra expression under discussion. Hence, the fifth step has been applied to the initial query tree, and several PROJECT operations have been created as a result. The purpose of the PROJECT operations here is to limit the attributes produced by each subtree to those which are required in the query result and in the subsequent operations of the query tree.

[Figure: optimized query tree. Node labels recovered from the figure: π S.Name, S.StudentNumber; ⋈ G.StudentNumber = S.StudentNumber; π G.StudentNumber; ⋈ E.SectionIdentifier = G.SectionIdentifier; π E.SectionIdentifier; ⋈ C.CourseNumber = E.CourseNumber; ⋈ C.CourseNumber = P.CourseNumber; π C.CourseNumber; σ C.Department = "BC"; π G.SectionIdentifier, G.StudentNumber; π E.CourseNumber, E.SectionIdentifier; π P.CourseNumber; leaves COURSE, PREREQUISITE, SECTION, GRADE_REPORT, STUDENT.]

QUESTION 3

Given the initial values of X and Y, the following table shows how the values of X and Y change at each point in time. Note that the values of N and M used for the modifications in the two transactions are 12 and 8 respectively.

Time  Transaction 1  Transaction 2  T1                      T2
1     READ(X)                       T1.X = 90
2     X := X - N                    T1.X = 90 - 12 = 78
3                    READ(X)                                T2.X = 90
4                    X := X + M                             T2.X = 90 + 8 = 98
5     WRITE(X)                      X = T1.X = 78
6     READ(Y)                       T1.Y = 100
7                    WRITE(X)                               X = T2.X = 98 (overwrites the previously updated value of X)
8     Y := Y + N                    T1.Y = 100 + 12 = 112
9     WRITE(Y)                      Y = T1.Y = 112

In the above computation of T1 and T2, the final value of the data item X is 98 (at time = 7, X = T2.X = 98), while the final value of the other data item, Y, is 112 (at time = 9, Y = T1.Y = 112). These values result from the last write operation on X, performed by T2, and the only write operation on Y, performed by T1.
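The interleaving in the table can be replayed step by step. The following is a minimal Python sketch under stated assumptions: the initial values X = 90 and Y = 100 are read off the table, and the function name `replay` and the tuple encoding of operations are hypothetical choices made only for this illustration. Each transaction works on its own private copy of an item between its read and its write, which corresponds to the T1 and T2 columns of the table.

```python
def replay(schedule, db):
    """Replay a list of (txn, action, item[, delta]) steps against db.

    Every transaction keeps a private workspace: 'read' copies an item
    from the database, 'add' changes the private copy, and 'write'
    stores the private copy back into the database.
    """
    db = dict(db)            # leave the caller's initial values untouched
    local = {}               # txn -> private copies of items
    for txn, action, item, *delta in schedule:
        ws = local.setdefault(txn, {})
        if action == "read":
            ws[item] = db[item]
        elif action == "add":
            ws[item] += delta[0]
        elif action == "write":
            db[item] = ws[item]
    return db


N, M = 12, 8
T1 = [("T1", "read", "X"), ("T1", "add", "X", -N), ("T1", "write", "X"),
      ("T1", "read", "Y"), ("T1", "add", "Y", N), ("T1", "write", "Y")]
T2 = [("T2", "read", "X"), ("T2", "add", "X", M), ("T2", "write", "X")]

# Times 1 to 9 of the table, in order.
interleaved = [T1[0], T1[1], T2[0], T2[1], T1[2], T1[3], T2[2], T1[4], T1[5]]

start = {"X": 90, "Y": 100}
print(replay(interleaved, start))   # {'X': 98, 'Y': 112}
```

Any other ordering of the same operations can be passed to `replay` in the same way, for example `T1 + T2` for running T1 to completion before T2.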
However, the above interleaved execution yields an incorrect value of X. The final value of X is incorrect because the interleaved write operation on X by T2 overwrote the value of X previously written by T1. That is, X was first updated to 78 and written back to the database, and later, at time = 7, the interleaved write operation overwrote that value. This update problem is known as the lost update problem. The following computations show the correct values of X and Y, and therefore demonstrate why the above interleaved transactions caused an incorrect update of X.

T1 followed by T2:

T1                      T2
T1.X = 90
T1.X = 90 - 12 = 78
X = T1.X = 78
T1.Y = 100
T1.Y = 100 + 12 = 112
Y = T1.Y = 112
                        T2.X = 78
                        T2.X = 78 + 8 = 86
                        X = T2.X = 86

T2 followed by T1:

T2                      T1
T2.X = 90
T2.X = 90 + 8 = 98
X = T2.X = 98
                        T1.X = 98
                        T1.X = 98 - 12 = 86
                        X = T1.X = 86
                        T1.Y = 100
                        T1.Y = 100 + 12 = 112
                        Y = T1.Y = 112

The above computations execute the transactions one after another. The first executes T1 and then T2, while the second executes T1 after T2. Either way, both computations yield the same values of X and Y, namely 86 and 112 respectively. The serial value of X (86) differs from the value produced by the interleaved execution (98), while Y is 112 in every case, and hence the interleaved transactions are shown to yield an incorrect update of X.

QUESTION 4

S1: R1(X), R2(X), R1(Y), R2(Z), W1(X), R1(Z), W2(Z), C2, W1(Y), C1.

In the above representation, the schedule S1 consists of two transactions. The operations belonging to the first transaction (subscript 1) are highlighted in yellow, while the operations belonging to the second transaction (subscript 2) are highlighted in light blue.
By definition, a strict schedule is a schedule in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). In the above schedule, there are three write operations, applied to the data items X, Z and Y. Identifying each operation by its color and the data item it accesses, the first write operation, W1(X), performed by the first transaction, is not followed by any read or write operation on the same data item by a different transaction. Similarly, the second and third write operations, W2(Z) and W1(Y), are not followed by any read or write operation on the same data items by a different transaction. Hence, no violation of strictness occurs before both transactions commit, and the above schedule is a strict schedule. Since S1 is a strict schedule, it is also cascadeless and recoverable.

S2: R1(X), W1(X), R2(Z), R2(Y), R2(X), W2(Y), C2, A1

Using the same representation, the first transaction is identified by the operations highlighted in yellow, whereas the second transaction is identified by the operations highlighted in blue. In this schedule, the first transaction writes the data item X in the second operation of the schedule, W1(X). Following this write operation, the second transaction reads the data item X that was written by the first transaction, R2(X). Since the second transaction has read the value of X written by the first transaction and commits before the first transaction completes (the first transaction in fact aborts afterwards), this schedule is nonrecoverable. Because S2 is nonrecoverable, it is neither cascadeless nor strict.
S3: R1(X), W1(X), R1(Y), R2(Y), W2(Y), W1(Y), C1, R2(X), W2(X), C2

In the schedule S3, the definition of a strict schedule is violated at the sixth operation, W1(Y). The preceding operation, W2(Y), is a write on Y performed by the second transaction. Since the second transaction has not committed when the first transaction writes the same data item, the above schedule is not strict. Checked against the definition of a cascadeless schedule, the above schedule is cascadeless, because every read operation in the schedule reads only data items written by committed transactions: R2(Y) occurs before W1(Y), and R2(X) occurs after C1.
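The three checks applied to S1, S2 and S3 can be sketched as a small program. This is a minimal illustration rather than a complete schedule analyzer: the function name `classify` and the string encoding of schedules are assumptions, and an abort is modeled simply by discarding the aborting transaction's writes. The function reports whether a schedule is recoverable, cascadeless and strict, in that order.

```python
import re


def classify(schedule):
    """Return (recoverable, cascadeless, strict) for a schedule string
    such as 'R1(X), W1(X), C1'."""
    writers = {}      # item -> transactions with a live write, oldest first
    reads_from = {}   # txn -> transactions whose writes it has read
    committed = set()
    recoverable = cascadeless = strict = True
    ops = re.findall(r"([RWCA])(\d+)(?:\((\w+)\))?", schedule)
    for kind, txn, item in ops:
        if kind in "RW":
            last = writers.get(item, [])[-1:]   # [] or [last writer]
            other = bool(last) and last[0] != txn
            if other and last[0] not in committed:
                strict = False                  # touched an uncommitted write
                if kind == "R":
                    cascadeless = False         # read an uncommitted write
            if kind == "R" and other:
                reads_from.setdefault(txn, set()).add(last[0])
            if kind == "W":
                writers.setdefault(item, []).append(txn)
        elif kind == "C":
            # Committing before a transaction it read from has committed.
            if not reads_from.get(txn, set()) <= committed:
                recoverable = False
            committed.add(txn)
        else:
            # Abort: discard the writes of the aborted transaction.
            for w in writers.values():
                w[:] = [t for t in w if t != txn]
    return recoverable, cascadeless, strict


S1 = "R1(X), R2(X), R1(Y), R2(Z), W1(X), R1(Z), W2(Z), C2, W1(Y), C1"
S2 = "R1(X), W1(X), R2(Z), R2(Y), R2(X), W2(Y), C2, A1"
S3 = "R1(X), W1(X), R1(Y), R2(Y), W2(Y), W1(Y), C1, R2(X), W2(X), C2"
for name, sched in (("S1", S1), ("S2", S2), ("S3", S3)):
    print(name, classify(sched))
```

For the three schedules above this prints `(True, True, True)` for S1, `(False, False, False)` for S2 and `(True, True, False)` for S3, which agrees with the analysis in the text.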