TDDB47 Real Time Systems
Lecture 6: Distributed systems

Calin Curescu
Real-Time Systems Laboratory
Department of Computer and Information Science
Linköping University, Sweden

• These lecture notes are partly based on lecture notes by Simin Nadjm-Tehrani, Jörgen Hansson, and Anders Törne. They also loosely follow Burns and Wellings' book "Real-Time Systems and Programming Languages". These lecture notes should only be used for internal teaching purposes at Linköping University.

Distributed Computer System
• Definition: a system of multiple autonomous processing elements, cooperating in a common purpose or to achieve a common goal.
– Excludes computer networks with no common purpose, e.g., the Internet – although Internet computers jointly computing genome information are a distributed system
• Tightly coupled: access to common memory
– Synchronization possible by the use of shared variables
• Loosely coupled: no common memory
– Synchronization by the use of message passing
• Homogeneous vs. heterogeneous systems

Reasons for distribution
• Exploitation of parallelism
– Improved performance
• Blue Gene/L – 33000 CPUs, 70.72 teraflops
• Heavy-duty computation, e.g. weather forecasting
• Exploitation of redundancy
– Increased availability and reliability!
– But also more faults in the system!
• Banking, communications
• Dispersion of computing power to the locations where it is used
• Engine control, brake system, gearbox control, airbag …
• Addition or enhancement of processors and communication links
– Scalability, load balancing
• Web server farms

Issues
• Language support
– Support for partitioning, configuration, allocation, reconfiguration
– Distribution transparency?
• RPC, Real-time CORBA
• Dependability & reliability
– Possibility for more reliability
– Must deal with partial failures
• Distributed control algorithms
– Distributed process synchronisation
– Communication system support
• Scheduling
– Ensure end-to-end deadlines
– Single-processor scheduling results are no longer optimal

Dependability & Distribution
• Making systems fault-tolerant typically uses redundancy
• Brake-by-wire
– Redundancy: having distributed sensors and actuators makes brake control more fault-tolerant
• Justifying safety
– Redundancy in space leads to distribution
– But distributed systems are not necessarily fault-tolerant!
• Distributed decision
– Has more information
– May impose a sub-optimal action with respect to a local decision
– What if one node is acting differently from the others?
• Local decision
– May take the best action for local conditions
– What if there is a reading error?

Justifying availability
• Active replication
– Group membership
• Passive replication
– Primary–backup

Distributed systems & FT
• Introduce new complications
– No global clock
– Richer failure models: node failures, communication failures
• Provide replication and group mechanisms
– Transparency in the treatment of faults
– Like N-version programming, or even better

Failure models
• Node failures
– Crash
– Omission
– Byzantine (arbitrary)
• Channel failures
– Crash (and potential partitions)
– Message loss
– Erroneous/arbitrary messages

"Chicken and egg" problem
• Replication is useful in the presence of failures only if there is a consistent common state among the replicas
– What happens when a replica fails?
• Active?
• Passive?
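The passive (primary–backup) replication scheme from the "Justifying availability" slide can be sketched in a few lines. This is a minimal sketch, not from the course material: the "network" is plain method calls, all names are illustrative, and a real system would also need failure detection and group membership.

```python
# Minimal sketch of passive (primary-backup) replication. The primary
# executes requests and checkpoints its state to the backup; after a
# primary crash, the backup continues from the last checkpoint.

class Replica:
    def __init__(self):
        self.state = 0          # replicated state: a running sum

    def apply(self, update):
        self.state += update
        return self.state

class PrimaryBackup:
    def __init__(self):
        self.primary = Replica()
        self.backup = Replica()
        self.primary_alive = True

    def request(self, update):
        if self.primary_alive:
            result = self.primary.apply(update)      # primary computes
            self.backup.state = self.primary.state   # checkpoint to backup
            return result
        return self.backup.apply(update)             # backup has taken over

    def crash_primary(self):
        self.primary_alive = False                   # fail-over point

pb = PrimaryBackup()
pb.request(5)
pb.request(3)
pb.crash_primary()
print(pb.request(2))   # backup continues from the checkpointed state: 10
```

Note how the sketch dodges the hard part: deciding that the primary has crashed, and keeping the checkpoint consistent, are exactly the distributed problems the following slides address.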
• To get consistency, the processes need to communicate their state via broadcast
• But broadcast algorithms are distributed algorithms that run on every node – and are themselves affected by failures…

A useful broadcast
• Reliable broadcast
– All non-crashed processes agree on the messages delivered
• i.e., for any message m, if a correct process delivers m, then every correct process delivers m
• Agreement property
– No spurious messages
• i.e., no erroneous, duplicated or invented messages
• Integrity property
– All messages broadcast by non-crashed processes are delivered
• Validity property

How to implement?
• The first step is to separate the underlying network (transport) from the broadcast mechanism
• Distinguish between receipt and delivery of a message

Common channel assumptions
• No link failures that lead to a partition
• Send does not duplicate or change messages
• Receive does not "invent" messages

Reliable broadcast
• Within every process p:
– Execute broadcast(m) of message m by:
• adding sender(m) and a unique ID as a header to the message m
• send(m) to all neighbours, including p itself
– When receive(m):
• if deliver(m) has not previously been executed, then
• if sender(m) ≠ p then send(m) to all neighbours
• deliver(m)

Failures
• What happens if p fails
– directly after a receipt?
– while relaying?
– before sending the message?
– after sending to some, but not all, neighbours?
• Prove the correctness of the algorithm by proving the necessary properties: Validity, Integrity, Agreement, Order

The consensus problem
• Processes p1, …, pn take part in a decision
– Each pi proposes a value vi
– All correct processes decide on a common value v that is equal to one of the proposed values
• Desired properties
– Every correct process eventually decides
• Termination property
– No two correct processes decide differently
• Agreement property
– If a process decides v, then the value v was proposed by some process
• Validity property

Basic impossibility result
• [Fischer, Lynch and Paterson 1985] There is no deterministic algorithm solving the consensus problem in an asynchronous distributed system with a single crash failure.

Assume synchrony
• Distributed computations proceed in rounds initiated by pulses
• Pulses are implemented using local physical clocks, synchronised under the assumption of bounded message delays
• Why? With synchrony, a missing message can be detected by a timeout, which circumvents the impossibility result above

Byzantine generals
• A difficult problem, solved in 1980 by Pease, Shostak and Lamport
• Consensus in the presence of arbitrary (node) failures
– Each process may fail in an arbitrary way (may even be malicious)
• Theorem: there is an upper bound t on the number of Byzantine failures relative to the size N of the network
– N ≥ 3t + 1
• This gives a (t+1)-round algorithm for solving consensus in a synchronous network

Scenario 1
• G and L1 are correct, L2 is faulty

Scenario 2
• G and L2 are correct, L1 is faulty

Scenario 3
• L1 and L2 are correct, G is faulty

2-round algorithm
• … does not work with t = 1, N = 3!
• Seen from L1, scenarios 1 and 3 are identical, so if L1 decides 1 in scenario 1 it will decide 1 in scenario 3
• Similarly for L2: if it decides 0 in scenario 2, it decides 0 in scenario 3
• L1 and L2 do not agree in scenario 3!
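The indistinguishability argument above can be checked mechanically. A sketch, with illustrative message values: in round 1 the general G sends an order to lieutenants L1 and L2; in round 2 each lieutenant reports to the other what it heard; a faulty node lies.

```python
# t = 1, N = 3 Byzantine scenarios. A correct lieutenant's "view" after two
# rounds is the pair (value G sent me, value the other lieutenant claims
# G sent it) -- this view is all it has to decide on.

def lieutenant_views(g_to_l1, g_to_l2, l1_report, l2_report):
    """Views of L1 and L2 after the two rounds."""
    return (g_to_l1, l2_report), (g_to_l2, l1_report)

# Scenario 1: G, L1 correct, G sends 1; faulty L2 claims G sent it 0
s1_L1, _ = lieutenant_views(g_to_l1=1, g_to_l2=1, l1_report=1, l2_report=0)

# Scenario 2: G, L2 correct, G sends 0; faulty L1 claims G sent it 1
_, s2_L2 = lieutenant_views(g_to_l1=0, g_to_l2=0, l1_report=1, l2_report=0)

# Scenario 3: G faulty, sends 1 to L1 and 0 to L2; both lieutenants honest
s3_L1, s3_L2 = lieutenant_views(g_to_l1=1, g_to_l2=0, l1_report=1, l2_report=0)

print(s1_L1 == s3_L1)   # True: L1 cannot tell scenario 1 from scenario 3
print(s2_L2 == s3_L2)   # True: L2 cannot tell scenario 2 from scenario 3
```

Since any deterministic rule that decides correctly in scenarios 1 and 2 must make L1 decide 1 and L2 decide 0 in scenario 3, agreement fails, which is why N = 3 nodes cannot tolerate even t = 1 Byzantine failure.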
Distributed Scheduling
• Characteristics of a synchronous distributed system
– Upper bound on communication delays
– Local clocks available, with bounded drift
– Each node makes progress at a minimum rate
• Dynamic processor allocation
– Anomalies: response times might increase if
• WCET is decreased;
• priority is increased; or
• the number of nodes is increased.

Allocation problem
• P1, P2: WCET = 25, Period = 50; P3: WCET = 80, Period = 100
• P1 -> CPU1; P2 -> CPU2; P3 -> CPU1 or CPU2
– Not feasible
• P1 & P2 -> CPU1; P3 -> CPU2
– Feasible schedule

Allocation
• Static allocation of processes is more reliable than dynamic reallocation
– Low utilisation
– Schedulability analysis for each processor
• Remote blocking
– A difficult problem
– Replicate data to other nodes in order to ensure local access
• Practical approach:
– Static allocation for safety-critical (periodic and sporadic) processes; let aperiodic processes migrate.

Bidding algorithm
• Aperiodic processes arrive at some node
– Admission control is performed to see whether the process can be guaranteed locally
• If yes, then admit the process
• If no, initiate bidding
• Bidding: request information about the current state (processing surplus) of the other nodes
– Migrate the process to the node with the highest surplus (if the deadline is still feasible)
– Admission of the process is performed at the new node
• If no, initiate bidding again…
• Improvement to bidding: focused addressing

Scheduling of communication
• A different problem from CPU scheduling
– Non-preemptive by nature
– A distributed protocol is required
– Additional deadlines at the buffers of communication nodes
• TDMA – Time Division Multiple Access
– Time-Triggered Architecture (TTA)
• Priority-based CAN
– Event-based
• Token passing
– Sending is only allowed if the node is holding a token
– Token hold time is bounded
• In principle, any communication system offering bounded message-passing delay should do

TTP
• TDMA
– Allocates pre-defined slots within which pre-defined nodes can send their pre-defined messages
– Periodic architecture
• Temporal firewall
• Replication & failure detection

Reading
• The Time-Triggered Architecture [Kopetz et al.]
• Chapter 14 of Burns & Wellings
• Chapter 8 of Hermann Kopetz's book, Real-Time Systems: Design Principles for Distributed Embedded Applications
• Article by Ramamritham, Stankovic, and Zhao, IEEE Transactions on Computers, Volume 38(8), August 1989
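The allocation example above can be verified with a quick utilization test. A minimal sketch, assuming EDF scheduling on each CPU, where a task set is feasible iff the total utilization sum(WCET/Period) is at most 1 (under fixed-priority scheduling the bound would differ):

```python
# Utilization check for the allocation example: P1, P2 with WCET 25 and
# period 50 (utilization 0.5 each), P3 with WCET 80 and period 100 (0.8).

tasks = {"P1": (25, 50), "P2": (25, 50), "P3": (80, 100)}

def feasible(allocation):
    """allocation maps a CPU name to the list of tasks placed on it.
    Feasible iff every CPU's total utilization is <= 1 (EDF bound)."""
    return all(
        sum(tasks[t][0] / tasks[t][1] for t in assigned) <= 1.0
        for assigned in allocation.values()
    )

# P1 -> CPU1, P2 -> CPU2: P3 fits on neither CPU (0.5 + 0.8 = 1.3 > 1)
print(feasible({"CPU1": ["P1", "P3"], "CPU2": ["P2"]}))   # False

# P1 & P2 -> CPU1 (exactly 1.0), P3 -> CPU2 (0.8): feasible
print(feasible({"CPU1": ["P1", "P2"], "CPU2": ["P3"]}))   # True
```

This is the point of the slide: spreading the small tasks across CPUs looks balanced but leaves no processor with room for P3, so allocation and schedulability analysis cannot be done independently.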
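The TDMA scheme of the TTP/TTA slides can be illustrated with a toy slot table. This is a sketch with made-up node names, slot order, and slot length; it assumes fixed-length slots repeating in a cyclic round, as TDMA prescribes.

```python
# Toy TDMA round: each node owns a pre-defined slot in a cyclic schedule
# and may send only inside its own slot. All concrete values are
# illustrative, not taken from TTP.

SLOT_MS = 2                                         # fixed slot length (ms)
ROUND = ["brake", "engine", "gearbox", "airbag"]    # slot owners, in order

def may_send(node, t_ms):
    """True if `node` owns the slot that time t_ms falls into."""
    slot = (t_ms // SLOT_MS) % len(ROUND)
    return ROUND[slot] == node

print(may_send("brake", 0))    # True: slot 0 belongs to "brake"
print(may_send("engine", 0))   # False: it must wait for slot 1
print(may_send("engine", 3))   # True: t = 3 ms lies in slot 1
```

Because slot ownership is fixed in advance, a node's worst-case access delay is bounded by one round (here 8 ms), which is exactly the bounded message-passing property the communication-scheduling slide asks for.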