Temple University – CIS Dept.
CIS661 – Principles of Data Management
V. Megalooikonomou

Transactions
(based on slides by C. Faloutsos at CMU)

General Overview
- Relational model - SQL
- Functional Dependencies & Normalization
- Physical Design & Indexing
- Query optimization
- Transaction processing
  - concurrency control
  - recovery

Transactions
- definition: a unit of work, e.g., move $10 from savings to checking
- Atomicity (all or none)
- Consistency (consistent state before and after)
- Isolation (as if alone)
- Durability (changes persist)
- handled by: recovery (for A and D); concurrency control (for I)

Operational details
- 'read(x)': fetches 'x' from disk into main memory (= buffer)
- 'write(x)': writes 'x' to disk (sometime later)
- a power failure causes trouble, and can also leave the DB inconsistent...

Durability
- transactions should survive failures (after a transaction completes successfully, its changes to the DB persist)

Atomicity
- straightforward: Checking = Checking + 10; Savings = Savings - 10 (all of it happens, or none of it)

Consistency
- e.g., the total sum of $ is the same before and after the transaction (but not necessarily during it)

Isolation
- other transactions should not affect us
- counter-example, the 'lost update' problem:
    T1: Read(N)
    T2: Read(N)
    T1: N = N - 1
    T2: N = N - 1
    T1: Write(N)
    T2: Write(N)

Transaction states
- active, partially committed, committed, failed, aborted
  (state-transition diagram)

Outline
- concurrency control (-> isolation)
  - 'correct' interleavings
  - how to achieve them
- recovery (-> durability, atomicity)

Concurrency
- why do we want it?
- example of interleaving:
  - T1: moves $10 from savings (X) to checking (Y)
  - T2: adds 10% interest to everything

Interleaved execution
    T1: Read(X); X = X - 10; Write(X)
    T2: Read(X); X = X * 1.1; Write(X)
    T1: Read(Y); Y = Y + 10; Write(Y)
    T2: Read(Y); Y = Y * 1.1; Write(Y)
- 'correct'?

How to define correctness?
- Let's start from something definitely correct: serial executions

Serial execution
    T1: Read(X); X = X - 10; Write(X); Read(Y); Y = Y + 10; Write(Y)
    T2: Read(X); X = X * 1.1; Write(X); Read(Y); Y = Y * 1.1; Write(Y)
- 'correct' by definition

How to define correctness?
- A: Serializability: a schedule (= interleaving) is 'correct' if it is serializable, i.e., equivalent to some serial interleaving (regardless of the exact nature of the updates)
- examples and counter-examples follow

Example: the 'lost update' problem
    T1: Read(N)
    T2: Read(N)
    T1: N = N - 1
    T2: N = N - 1
    T1: Write(N)
    T2: Write(N)
- not equivalent to any serial execution (why not?) -> incorrect!

More details: 'conflict serializability'
- consider again the lost-update schedule above

Conflict serializability
- conflicts between operations of Ti and Tj on the same object:
  - r/w: e.g., object X read by Ti and written by Tj
  - w/w: object X written by Ti and written by Tj
- PRECEDENCE GRAPH:
  - Nodes: transactions
  - Arcs: r/w or w/w conflicts

Precedence graph
- for the lost-update schedule: T1 -> T2 (on N) and T2 -> T1 (on N)
- cycle -> not serializable

Example
    T1: Read(A) ... Write(A)
    T3: Read(A) ... Write(A)
    T2: Read(B) ... Write(B)
    T3: Read(B) ... Write(B)

Example (precedence graph)
- T1 -> T3 (on A), T2 -> T3 (on B)
- serial execution?

Example
- A: T2, T1, T3 (notice that T3 should go after T2, although it starts before it!)
- Q: an algorithm for generating a serial execution from an (acyclic) precedence graph?

Example
- A: topological sorting
- a topological sort of a DAG G = (V, E) is a linear ordering of all its vertices such that if G contains an edge (u, v), then u appears before v in the ordering
- ...it is an ordering of the vertices along a horizontal line so that all directed edges go from left to right
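As an illustration of the answer above, here is a minimal Python sketch (not from the slides) of DFS-based topological sorting over a precedence graph. The dictionary representation and the name topological_sort are assumptions for this example; the graph maps each transaction to the transactions it must precede.

```python
# Minimal sketch (not from the slides): topological sort of a precedence
# graph by DFS.  Assumes the graph is a dict mapping every transaction to
# the list of transactions it must precede (its outgoing conflict arcs).

def topological_sort(graph):
    """Return an equivalent serial order, or raise ValueError on a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2              # unvisited / in progress / done
    color = {t: WHITE for t in graph}
    finished = []                             # transactions, by DFS finishing time

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:              # back edge -> cycle -> not serializable
                raise ValueError("precedence graph has a cycle")
            if color[v] == WHITE:
                dfs(v)
        color[u] = BLACK
        finished.append(u)

    for t in graph:
        if color[t] == WHITE:
            dfs(t)

    # topologically sorted vertices come out in reverse order of their
    # DFS finishing times
    return list(reversed(finished))


# Precedence graph of the three-transaction example above:
# T1 -> T3 (conflict on A) and T2 -> T3 (conflict on B).
graph = {"T1": ["T3"], "T2": ["T3"], "T3": []}
print(topological_sort(graph))                # ['T2', 'T1', 'T3']
```

Note that the cycle test comes for free: a back edge to a transaction still on the DFS stack (GRAY) means the precedence graph is cyclic, so the same routine doubles as a serializability check.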
- note: the topologically sorted vertices appear in reverse order of their finishing times in a depth-first search (DFS)

Serializability
- we ignore 'view serializability'
- we assume 'no blind writes', i.e., a transaction reads an item before writing it

(Counter-)example: 'inconsistent analysis'
    T1: Read(A)
    T1: A = A - 10
    T1: Write(A)
    T2: Read(A)
    T2: Sum = A
    T2: Read(B)
    T2: Sum += B
    T1: Read(B)
    T1: B = B + 10
    T1: Write(B)
- Precedence graph?

Conclusions
- 'ACID' properties of transactions
- recovery for 'A', 'D'
- concurrency control for 'I'
- a schedule is 'correct' if it is serializable
- precedence graph acyclic -> serializable
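To tie the precedence-graph material together, here is a companion sketch (again not from the slides; the step format and function names are assumptions) that builds the precedence graph from a schedule given as a time-ordered list of (transaction, action, item) steps. It adds an arc Ti -> Tj whenever two operations on the same item from different transactions conflict, i.e., at least one of them is a write (the r/w and w/w cases above, plus the symmetric w/r case), and then checks for a cycle. Run on the lost-update schedule it reports the T1 <-> T2 cycle noted earlier.

```python
# Minimal sketch (not from the slides): build the precedence graph of a
# schedule and test it for a cycle.  A schedule is a time-ordered list of
# (transaction, action, item) steps, with action in {"R", "W"}.

from collections import defaultdict

def precedence_graph(schedule):
    """Return {Ti: set of Tj} with an arc Ti -> Tj for every conflict."""
    graph = defaultdict(set)
    for t, _, _ in schedule:
        _ = graph[t]                           # touch every transaction so each becomes a node
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            conflict = (xi == xj) and (ti != tj) and (ai == "W" or aj == "W")
            if conflict:                       # r/w, w/r, or w/w on the same item
                graph[ti].add(tj)              # earlier operation's transaction goes first
    return graph

def has_cycle(graph):
    """True iff the precedence graph has a cycle (not conflict-serializable)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in graph}

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True                    # back edge found
        color[u] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in graph)

# The lost-update schedule from these notes: both transactions read N,
# then both decrement it and write it back.
lost_update = [("T1", "R", "N"), ("T2", "R", "N"),
               ("T1", "W", "N"), ("T2", "W", "N")]
g = precedence_graph(lost_update)
print(dict(g))            # {'T1': {'T2'}, 'T2': {'T1'}}  -> cycle
print(has_cycle(g))       # True: not conflict-serializable
```

For an acyclic graph (e.g., the inconsistent-analysis exercise, once you have worked out its arcs), the output of precedence_graph can be fed straight into the topological_sort sketch shown earlier to obtain an equivalent serial order.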