Exercises to Chapter 15 – Transactions

15.1. List the ACID properties and explain the usefulness of each.

A – Atomicity: a transaction consists of several actions, and after any intermediate action the database may be in an inconsistent state; hence either all of the actions must take effect or none of them.
C – Consistency: a transaction must take the database from one consistent state to another, i.e. preserve consistency.
I – Isolation: to improve throughput, transactions may run concurrently, but each of them must produce the same result as it would in exclusive (isolated) execution.
D – Durability: the changes made by a successful transaction must remain in the database even if the system subsequently fails.

15.2. Suppose that there is a database system that never fails. Is a recovery manager required for this system?

Yes. Even without system failures, transactions may be aborted because of software errors in the transaction itself or because some data is missing, and aborted transactions must be rolled back.

15.3. Consider a file system. What are the steps involved in creating and deleting files, and in writing data to a file? Explain how the issues of atomicity and durability are relevant to creating and deleting files and to writing data to files.

To create or delete a file, a directory entry for the file must be created or deleted. On creation, disk clusters are allocated to the file; on deletion, the allocated clusters are released. Data is written to a buffer in RAM, from which it is transferred to disk when the buffer fills up, when the file is closed, or when the buffer is flushed by an explicit command. Durability matters for files: we expect saved data to persist on disk. Atomicity is not guaranteed for application writes, but it must be provided for the creation and deletion operations themselves; otherwise disk space is lost, for example when clusters remain marked as allocated with no directory entry referring to them.

15.4. Database-system implementers have paid much attention to the ACID properties, but file-system implementers have not. Why?
Because database applications critically depend on atomicity and durability: a violation can cause real-world problems, so the work of providing ACID guarantees was worth paying for.

15.5. List the possible sequences of states through which a transaction may pass.

Active => partially committed => committed
Active => partially committed => failed => aborted
Active => failed => aborted

15.6. Why is concurrent execution of transactions important for long transactions or transactions that work with (slow) disks, and not important for short transactions?

Concurrent execution involves overhead for transaction management and context switching. If the transactions are short, this overhead may be comparable to their execution time, and concurrent execution brings no benefit. For transactions that work with disks, however, while one transaction waits for an I/O operation to complete, another can use the processor. In addition, when long and short transactions are mixed, concurrent execution reduces the response time of the short transactions and increases the throughput of the system.

15.7. Explain the distinction between serial and serializable schedules.

A serial schedule executes the transactions one after another, each running to completion before the next one starts. A serializable schedule is a concurrent (interleaved) schedule that is equivalent, in the conflict or view sense, to some serial schedule, i.e. it produces the same result.

15.8. Consider the following two transactions:

    T1: read(A); read(B); if (A == 0) B++; write(B);
    T2: read(B); read(A); if (B == 0) A++; write(A);

Let the consistency requirement be A = 0 ∨ B = 0, with A = B = 0 as the initial values.
a) Show that every serial execution of these transactions preserves consistency.
b) Show a concurrent execution that produces a non-serializable schedule.
c) Is there a concurrent execution that produces a serializable schedule?

a) There are only two serial schedules: T1 T2 and T2 T1. Consider the first one, T1 T2. T1 reads the initial values of A and B into memory (0, 0); A satisfies the condition of the if statement, so B becomes 1, and B = 1 is written back to disk.
After T1 executes, A = 0 and B = 1, so the consistency condition holds. T2 then reads B and A (1, 0); B does not satisfy the condition of the if statement, so A is not modified, and A = 0 is written back unchanged. Consistency is again preserved. The second schedule, T2 T1, is treated symmetrically (it ends with A = 1, B = 0), so consistency is preserved there as well.

b, c) In a concurrent execution, at least one operation of one transaction must start before the other transaction terminates. The last operation of each transaction is a write (write(B) in T1, write(A) in T2), and the first operation of the other transaction is a read of the same item (read(B) in T2, read(A) in T1), which conflicts with that write. In any interleaved schedule each of these reads precedes the conflicting write, so the precedence graph contains both an edge from T1 to T2 and an edge from T2 to T1. Hence no concurrent schedule is conflict serializable. Nor is any of them view serializable: since the write is the last operation of each transaction, in every concurrent schedule both transactions read the initial values of A and B, whereas in every serial schedule one transaction reads a value written by the other. Because both transactions read the initial values (0, 0), the if condition is true in both, and each performs its increment. After such an execution (A, B) = (1, 1), so the consistency condition is violated. The answer to c) is therefore no.

15.9. Since every conflict-serializable schedule is also view serializable, why do we emphasize conflict serializability rather than view serializability?

Because conflict serializability can be tested with a simple, efficient algorithm (checking the precedence graph for cycles), whereas testing view serializability is an NP-complete problem.

15.10. Consider the following precedence graph.

[Figure: precedence graph on the nodes T1, T2, T4, T3, T5]

Is the corresponding schedule conflict serializable?
A precedence graph is built from a schedule S. Its nodes correspond to transactions, and there is an edge directed from T1 to T2 if the two transactions contain a pair of conflicting instructions and, in S, the instruction of T1 in that pair executes before the corresponding instruction of T2. A schedule is conflict serializable if its precedence graph has no cycles. A cycle means that every transaction on it has a conflicting instruction that must execute before a conflicting instruction of the next transaction on the cycle, so any serial schedule would violate the required order of at least one such pair.

Our precedence graph has no cycles, so the corresponding schedule is conflict serializable. To find a serial schedule that is conflict equivalent to it, we must determine an order of execution of the transactions that complies with the precedence graph. T5 cannot be executed first, since conflicting instructions of T4 and T3 must execute before it. From the graph we conclude that only T1 can be chosen first, as it has no predecessors. Similarly, only T2 can be second, its only predecessor being T1. Next we may choose either T3 or T4; each of them must execute after both T1 and T2. So the possible serial schedules are T1, T2, T3, T4, T5 and T1, T2, T4, T3, T5, each of which is a topological order of the precedence graph.

For machine processing, the graph may be represented by an adjacency matrix:

         1   2   3   4   5
    1        1   1   1
    2            1   1
    3                    1
    4                    1
    5

The matrix has n rows and n columns, where n is the number of nodes. Element (i, j) is 1 if there is an edge directed from node Ti to node Tj and 0 otherwise; zero elements are shown as empty cells. The number of 1s equals the number of edges.
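A minimal sketch of this representation in Python (the choice of language and all function names are ours, for illustration only). It builds the matrix from the seven edges read off the table above, looks up successors and predecessors by scanning a row or a column, and derives a serial order by repeatedly picking a node whose column contains no 1s and then clearing that node's row. Per-column counts of 1s are kept so that the whole procedure needs O(n^2) work.

```python
def build_matrix(n, edges):
    """Return an n x n adjacency matrix; nodes are numbered 1..n."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i - 1][j - 1] = 1  # edge Ti -> Tj
    return m


def successors(m, i):
    """Nodes Tj with an edge Ti -> Tj (scan row i)."""
    return [j + 1 for j, v in enumerate(m[i - 1]) if v]


def predecessors(m, j):
    """Nodes Ti with an edge Ti -> Tj (scan column j)."""
    return [i + 1 for i, row in enumerate(m) if row[j - 1]]


def serial_order(m):
    """Return one topological order of the nodes, or None if there is a cycle."""
    n = len(m)
    work = [row[:] for row in m]  # do not modify the caller's matrix
    col_count = [sum(work[i][j] for i in range(n)) for j in range(n)]
    done = [False] * n
    order = []
    for _ in range(n):
        # Pick the lowest-numbered unselected node whose column is all zeros.
        pick = next((j for j in range(n)
                     if not done[j] and col_count[j] == 0), None)
        if pick is None:
            return None  # every remaining node has a predecessor: cycle
        done[pick] = True
        order.append(pick + 1)
        # Zero the selected node's row, updating the column counts.
        for j in range(n):
            if work[pick][j]:
                work[pick][j] = 0
                col_count[j] -= 1
    return order


# Edges of the precedence graph of Exercise 15.10, as given by the matrix.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 5), (4, 5)]
M = build_matrix(5, EDGES)
print(serial_order(M))
```

Because ties are broken by the lowest node number, this sketch returns the order T1, T2, T3, T4, T5; choosing T4 before T3 is equally valid.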
To find the successors of a node we examine its row: for example, the successors of T1 are T2, T3, T4, because the 1s in row 1 are in columns 2, 3, 4. Similarly, the predecessors of a node are found from its column: T3 has two predecessors, T1 and T2, because column 3 has 1s in rows 1 and 2. To find a node with no predecessors, we look for a column of all zeros; initially only column 1 qualifies, so the first transaction in the serial schedule is T1. We then zero row 1 and look for new all-zero columns; only column 2 becomes zero, so T2 is second in the schedule. We zero row 2, and columns 3 and 4 become zero, so either T3 or T4 may be chosen next, say T4. Zeroing its row produces no new zero columns, so T3 is next. After row 3 is zeroed, column 5 becomes zero, and T5 is the last transaction in the schedule. The same procedure also detects cycles: if the whole matrix has been zeroed at the end, there are no cycles; otherwise 1s remain in the rows of nodes that lie on a cycle or are reachable from one. The complexity of this procedure is O(n^2), provided the number of 1s in each column is maintained incrementally rather than recounted at every step.

15.11. What is a recoverable schedule? Why is recoverability of schedules desirable? Are there any circumstances under which it would be desirable to allow non-recoverable schedules?

Recoverability means the ability to restore the previous consistent state of the database when a transaction fails, so it is an important property. In a concurrent environment, where multiple transactions execute simultaneously, a non-recoverable situation arises if a transaction T1 that has used results produced by another transaction T2 commits, and T2 then continues executing and fails. The results of T2 have already been used by T1, and since T1 has committed, it cannot be rolled back.
To ensure recoverability, a transaction that uses results produced by other transactions must not commit before all of those transactions have committed. This may lead to long response times for transactions that depend on the results of long-running transactions. In such circumstances, if the probability of failure of the transactions supplying the data is small, non-recoverable schedules may be allowed.

15.12. What is a cascadeless schedule? Why is cascadelessness of schedules desirable? Are there any circumstances under which it would be desirable to allow non-cascadeless schedules?

A cascadeless schedule is one in which the failure of a transaction whose results are used by other transactions does not force those other transactions to be rolled back. This is achieved by allowing a transaction to read results produced by another transaction only after the latter has committed. This restricts the concurrency of transaction execution, so if the probability of transaction failure is small, it may be desirable to allow non-cascadeless schedules.
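The commit-order rules of Exercises 15.11 and 15.12 can be checked mechanically. The following Python sketch is our own illustration, not part of the textbook solution. It assumes that a schedule is a list of (transaction, operation, item) tuples with the operations "r" (read), "w" (write), "c" (commit) and "a" (abort), and that a read takes its value from the most recent writer of the item; as a simplification, the effect of an abort on previously written values is ignored.

```python
def reads_from(schedule):
    """Yield (reader, writer, position) for every read of a value
    last written by a different transaction."""
    last_writer = {}
    for pos, (txn, op, item) in enumerate(schedule):
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                yield txn, writer, pos


def commit_positions(schedule):
    """Map each committed transaction to the position of its commit."""
    return {txn: pos for pos, (txn, op, _) in enumerate(schedule) if op == "c"}


def is_recoverable(schedule):
    """A reader that commits must commit after every transaction it read from."""
    commits = commit_positions(schedule)
    for reader, writer, _ in reads_from(schedule):
        if reader in commits:
            if writer not in commits or commits[writer] > commits[reader]:
                return False
    return True


def is_cascadeless(schedule):
    """Every read of another transaction's data must follow that
    transaction's commit."""
    commits = commit_positions(schedule)
    return all(writer in commits and commits[writer] < pos
               for _, writer, pos in reads_from(schedule))


# T1 reads from T2 and commits; T2 then fails: not recoverable.
S1 = [("T2", "w", "A"), ("T1", "r", "A"), ("T1", "c", None), ("T2", "a", None)]
# T1 reads uncommitted data but commits after T2: recoverable, not cascadeless.
S2 = [("T2", "w", "A"), ("T1", "r", "A"), ("T2", "c", None), ("T1", "c", None)]
# T1 reads only after T2 has committed: recoverable and cascadeless.
S3 = [("T2", "w", "A"), ("T2", "c", None), ("T1", "r", "A"), ("T1", "c", None)]

for name, s in (("S1", S1), ("S2", S2), ("S3", S3)):
    print(name, is_recoverable(s), is_cascadeless(s))
```

S1 is the non-recoverable situation described in 15.11, S2 shows the delayed commit that restores recoverability, and S3 shows the stricter read rule of 15.12.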