Synchronization
Chapter 5

Clock Synchronization

Example: the “make” program.
When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
Fact: Different machines have different clocks.
Problem: How to synchronize (distributed) clocks?
Answer: clock synchronization algorithms.

Cristian's Algorithm

Getting the current time from a time server.
Main idea: Use a time server. This server has its own clock, and all other machines use its time as reference time.
Time server: passive; it only responds to “what time is it?” requests.
Problem: The client gets the “current” time only after a delay D!
Simple estimates for D, where T0 is the client's time when it sends the request and T1 is its time when the reply arrives:
  D_est = (T1 - T0) / 2
or, if the server's handling time I is known,
  D_est = (T1 - T0 - I) / 2
If T is the time reported by the time server, the client sets its current time to T + D_est.

The Berkeley Algorithm

Ave = 1/3 * (0 + (-10) + 25) = 5
a) The time daemon asks all the other machines for their clock values.
b) The machines answer.
c) The time daemon tells everyone how to adjust their clock.
Characteristics: Also based on a time server. The time server is active (unlike in Cristian's algorithm).

Lamport Timestamps

Situation: Three machines, each with its own clock. Each clock has its own rate. A clock must be corrected (forward) if the sending time of a message turns out to be later than its receiving time.
S: sending time, R: receiving time.

Message   S    R    Action
m1        0    2    Nothing
m2        4    9    Nothing
m3        12   10   Correction
m4        12   7    Correction

[Figure: clock values of Machine 1, Machine 2, and Machine 3, with messages m1 to m4 and the corrections]

Idea: Why synchronize with “real” time at all?
Approach: Processes that interact with each other only need to agree on the ordering of events. Actual event times are not important (in many applications). This leads to logical time.

Example: Totally-Ordered Multicasting

Problem: Updating a replicated database and leaving it in an inconsistent state.
Solution: Use timestamps. These deliver a global notion of time.
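
The timestamps meant here are Lamport timestamps. As a minimal sketch, the correction rule from the Lamport Timestamps slide can be written as a single function. The function name is hypothetical, and moving the clock to S + 1 is the standard Lamport rule rather than something stated on the slide. The (S, R) pairs are the ones from the table.

```python
def lamport_receive(local_clock: int, sent_time: int) -> int:
    """Return the receiver's clock value after a message arrives.

    If the receiver's clock is not already past the sending time,
    it is moved forward to sent_time + 1; otherwise it is unchanged.
    """
    if local_clock <= sent_time:
        return sent_time + 1
    return local_clock


# (S, R) pairs taken from the Lamport Timestamps table.
messages = {"m1": (0, 2), "m2": (4, 9), "m3": (12, 10), "m4": (12, 7)}

for name, (s, r) in messages.items():
    corrected = lamport_receive(r, s)
    action = "Nothing" if corrected == r else f"Correction to {corrected}"
    print(name, s, r, action)
```

For m1 and m2 the receiving time is already later than the sending time, so nothing happens; for m3 and m4 the receiver's clock is moved forward.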
All messages are multicast (to all replicas), and all replicas multicast acknowledgements back to the other ones. Each replica process holds a queue ordered by timestamp. Thus, the queues will be the same in all processes. An operation (e.g. an update) is performed only after all processes have acknowledged it and only if it has the smallest timestamp (i.e. it is at the head of the queue).

Global State (1)

a) A consistent cut: For each receive event in the cut, the corresponding send event is also included in the cut.
b) An inconsistent cut: The receive event of m2 has no corresponding send event!

Global State (2)

a) Organization of a process and channels for a distributed snapshot.
Snapshot: Models a (consistent) global state of the (distributed) system, including the local states of processes/machines and the states of communication channels (e.g. sent/received messages).

Global State (3)

b) Process Q receives a marker for the first time and records its local state.
c) Q records all incoming messages (new marker received).
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel.
Useful for distributed termination detection.

The Bully Algorithm (1)

The bully election algorithm:
a) Process 4 holds an election.
b) Processes 5 and 6 respond, telling 4 to stop.
c) Now 5 and 6 each hold an election.

The Bully Algorithm (2)

d) Process 6 tells 5 to stop.
e) Process 6 wins and tells everyone.

A Ring Algorithm

Election algorithm using a ring.
Processes 2 and 5 detect simultaneously that the coordinator (Process 7) is down.
[Figure: election messages initiated from Process 2 and from Process 5 circulating on the ring]

Mutual Exclusion: A Centralized Algorithm: Algorithm 1

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
Main problems: The coordinator must always be up.
Scalability problems when many processes are involved.

A Distributed Algorithm: Algorithm 2

a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it also sends an OK, so 2 can now enter the critical region.
-: Every process does almost the same work (performance degradation).
-: Too much message traffic.
-: Worse yet, ANY process crash means that ALL processes will hang; solution: use additional “denial” messages (sent when the receiver is in the critical region).
+: ? Or: it is a distributed algorithm.

A Token Ring Algorithm: Algorithm 3

a) An unordered group of processes on a network.
b) A logical ring constructed in software.
The token circulates in a defined order.
Only the process holding the token can access the resource.
Problems: The token may be lost; a process may crash.

Comparison

Algorithm 1 (Centralized)
  Messages per entry/exit: 3 (Request, Grant, Release)
  Delay before entry (in message times): 2 (Request + (delayed) Grant)
  Problems: Coordinator crash

Algorithm 2 (Distributed)
  Messages per entry/exit: 2(n–1) (Request + Grant to/from each process)
  Delay before entry (in message times): 2(n–1) (n Requests and n (delayed) Grants)
  Problems: Crash of any process (unusual case: the centralized algorithm is more reliable than the decentralized ones!)

Algorithm 3 (Token ring, also distributed)
  Messages per entry/exit: 1 to ∞ (1 token pass when every process is using the resource; an unlimited number of token passes when none of the processes is using the resource)
  Delay before entry (in message times): 0 to n–1 (0: token is now here; n–1: token just left)
  Problems: Lost token, process crash

A comparison of three mutual exclusion algorithms.

The Transaction Model (1)

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

Examples of primitives for transactions.
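
To make the five primitives concrete, here is a minimal sketch of an in-memory store that supports them. The class `TinyStore` and its method names are hypothetical and only mirror the primitive names in the table; a real system would also need persistence and concurrency control. ABORT_TRANSACTION is implemented by saving each old value on the first write and restoring it on abort.

```python
_MISSING = object()  # marks keys that did not exist before the transaction


class TinyStore:
    """Hypothetical in-memory store; not a real database API."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self._old = None  # old values saved while a transaction is open

    def begin_transaction(self):
        self._old = {}

    def read(self, key):
        return self.data[key]

    def write(self, key, value):
        if self._old is None:
            raise RuntimeError("write outside a transaction")
        # Remember only the first old value so that abort can restore it.
        if key not in self._old:
            self._old[key] = self.data.get(key, _MISSING)
        self.data[key] = value

    def end_transaction(self):
        self._old = None  # commit: the saved old values are dropped

    def abort_transaction(self):
        for key, old in self._old.items():
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self._old = None


store = TinyStore({"x": 0})
store.begin_transaction()
store.write("x", 1)
store.abort_transaction()
print(store.read("x"))  # the old value of x is back
```

Keeping the old values next to the live data is only one possible design; a private copy of the data per transaction would work as well.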
Transactions are data(base)-oriented operations that have the so-called ACID properties:
A: Atomicity, or all-or-nothing: The operations of a transaction are indivisible.
C: Consistency: Data are left consistent after transactions.
I: Isolation: Concurrent transactions do not interfere with each other.
D: Durability: Changes of a committed transaction are permanent.

The Transaction Model (2)

(a)
  BEGIN_TRANSACTION
    reserve Lethbridge Calgary;
    reserve Calgary Toronto;
    reserve Toronto New York;
  END_TRANSACTION

(b)
  BEGIN_TRANSACTION
    reserve Lethbridge Calgary;
    reserve Calgary Toronto;
    reserve Toronto New York: full
  ABORT_TRANSACTION

a) Transaction to reserve three flights commits.
b) Transaction aborts when the third flight is unavailable.

Distributed Transactions

[Figure: nested and distributed transactions; one label reads “Hotel database”]
a) A nested transaction: Made of subtransactions operating on logically different databases.
b) A distributed transaction: Made of subtransactions operating on logically the same but distributed database.

Private Workspace

a) The file index and disk blocks for a three-block file.
b) The situation after a transaction has modified block 0 and appended block 3.
c) After committing.
On failure: The private index and the new (shadow) blocks are deleted.

Writeahead Log

(a)
  x = 0;
  y = 0;
  BEGIN_TRANSACTION;
    x = x + 1;
    y = y + 2;
    x = y * y;
  END_TRANSACTION;

(b) Log: [x = 0/1]
(c) Log: [x = 0/1] [y = 0/2]
(d) Log: [x = 0/1] [y = 0/2] [x = 1/4]

a) A transaction.
b) – d) The log before each statement is executed.
On failure: The file is reconstructed by means of the log; rollback recovery (the log is read backwards).

Concurrency Control

a) General organization of managers for handling transactions.
b) Distributed transactions.
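
Returning to the Writeahead Log slide, the log and its rollback can be sketched in a few lines. This is an illustration under simplifying assumptions: the state is a plain dictionary instead of a file, and the helper names `logged_write` and `rollback` are hypothetical. Each log record holds the variable, its old value, and its new value, matching the [x = 0/1] notation.

```python
state = {"x": 0, "y": 0}
log = []  # records: (variable, old value, new value)


def logged_write(var, new_value):
    """Append the log record first, then apply the change."""
    log.append((var, state[var], new_value))
    state[var] = new_value


def rollback():
    """Undo all logged changes by reading the log backwards."""
    while log:
        var, old_value, _new_value = log.pop()
        state[var] = old_value


# The three statements of the transaction from the slide.
logged_write("x", state["x"] + 1)           # log: [x = 0/1]
logged_write("y", state["y"] + 2)           # log: [x = 0/1] [y = 0/2]
logged_write("x", state["y"] * state["y"])  # log: [x = 0/1] [y = 0/2] [x = 1/4]

snapshot = list(log)  # copy of the log just before the simulated failure
rollback()            # restores x = 0 and y = 0
```

Reading the log backwards matters because x is written twice: undoing [x = 1/4] first and [x = 0/1] last leaves x at its original value 0.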
Serializability

(a) BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
(b) BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION
(c) BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION

(d)
Schedule 1   x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 2   x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 3   x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal

a) – c) Three transactions T1, T2, and T3.
d) Possible schedules.

Two-Phase Locking

Two-phase locking.
Strict two-phase locking.

Pessimistic Timestamp Ordering

Concurrency control using timestamps.
(a) and (b): T2 can proceed, since only earlier transactions (T1) have used item x.
(c) and (d): T3 has already used item x, so T2 is aborted.
(e): T2 can read.
(f): T2 has to wait until T3 has done its work.
(g) and (h): T2 is aborted, since the item was later changed by T3.
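
The accept/abort decisions above can be sketched as two checks on per-item timestamps. This is a simplified illustration, not the full scheme: it assumes, as in the usual textbook presentation, that cases (a) to (d) are writes by T2 and cases (e) to (h) are reads, it uses the transaction numbers 1, 2, 3 as timestamps, and it leaves out the waiting of case (f). The names `Item`, `try_write`, and `try_read` are hypothetical.

```python
class Item:
    """A data item with the timestamps of its latest read and write."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def try_write(item, ts):
    """Return True if the write is accepted, False if the writer must abort."""
    if ts < item.read_ts or ts < item.write_ts:
        return False  # a later transaction has already used the item
    item.write_ts = ts
    return True


def try_read(item, ts):
    """Return True if the read is accepted, False if the reader must abort."""
    if ts < item.write_ts:
        return False  # a later transaction has already changed the item
    item.read_ts = max(item.read_ts, ts)
    return True


# Only T1 has used x, so a write by T2 can proceed.
x = Item()
try_write(x, 1)
print(try_write(x, 2))  # True

# T3 has already read y, so a write by T2 is rejected.
y = Item()
try_write(y, 1)
try_read(y, 3)
print(try_write(y, 2))  # False
```

A rejected operation means the transaction is aborted and would typically be restarted with a new, larger timestamp.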