Outline
• Announcement
• Distributed scheduling – continued
• Quiz at the end of today's class

Announcement
• Schedule for the rest of the semester
  – 4/10: Recovery
  – 4/15: Fault tolerance
  – 4/17: Class evaluation; Protection and security
  – 4/22: Protection and security – continued; Quiz #3
  – 4/24: Existing distributed systems and review
• Final exam
  – 5:30-7:30PM, April 29, 2003
  – Cumulative
May 29, 2016 COP 5611 - Operating Systems

Motivations

Motivations – cont.

Distributed Scheduling
• A distributed scheduler is a resource management component of a distributed operating system that focuses on judiciously and transparently redistributing the load of the system among the computers to maximize overall performance

Components of a Load Distributing Algorithm
• Four components
  – Transfer policy
    • Determines when a node needs to send tasks to other nodes or can receive tasks from other nodes
  – Selection policy
    • Determines which task(s) to transfer
  – Location policy
    • Finds suitable nodes to share load with
  – Information policy
    • Determines when information about the states of other nodes is collected, from where, and what information is gathered

Stability
• The queuing-theoretic perspective
  – The CPU queues grow without bound if the arrival rate is greater than the rate at which the system can perform work
  – A load distributing algorithm is effective under a given set of conditions if it improves performance relative to that of a system not using load distribution
• Algorithmic stability
  – An algorithm is unstable if it can perform fruitless actions indefinitely with finite probability
    • Example: processor thrashing, where tasks are transferred among nodes repeatedly without making progress

Sender-Initiated Algorithms

Receiver-Initiated Algorithms

Empirical Comparison of Sender-Initiated and Receiver-Initiated Algorithms
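As a concrete illustration of the transfer and selection policies above, here is a minimal sketch. It is not from the slides: the threshold values and the queue-length classification rule are illustrative assumptions, as is the preference for newly arrived tasks in the selection policy.

```python
# Illustrative sketch: a threshold-based transfer policy that classifies a
# node by its CPU queue length. The thresholds t_low and t_high are
# hypothetical tuning parameters, not values from the slides.

def classify_node(queue_length, t_low=2, t_high=4):
    """Transfer policy: decide whether this node should send or receive load."""
    if queue_length > t_high:
        return "sender"      # overloaded: try to transfer a task elsewhere
    if queue_length < t_low:
        return "receiver"    # underloaded: willing to accept remote tasks
    return "ok"              # load is acceptable; do nothing

def select_task(tasks):
    """Selection policy: prefer a newly arrived task, which is cheap to
    transfer because no execution state has accumulated yet."""
    new_tasks = [t for t in tasks if t.get("started") is False]
    return new_tasks[0] if new_tasks else None
```

A location policy would then poll other nodes for a `"receiver"` classification, and an information policy would decide how often such queue-length information is exchanged.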
Symmetrically Initiated Algorithms
• Sender-initiated component
  – A sender broadcasts a TooHigh message, sets a TooHigh timeout alarm, and listens for an Accept message
  – A receiver that receives a TooHigh message cancels its TooLow timeout, sends an Accept message to the sender, and increases its load value
  – On receiving an Accept message, if the site is still a sender, it chooses the best task to transfer and transfers it
  – If no Accept message has been received before the timeout, the sender broadcasts a ChangeAverage message to increase the average load estimates at the other nodes

Symmetrically Initiated Algorithms – cont.
• Receiver-initiated component
  – A receiver broadcasts a TooLow message, sets a TooLow timeout alarm, and starts listening for a TooHigh message
  – If a TooHigh message is received, the receiver cancels its TooLow timeout, sends an Accept message to the sender, and increases its load value
  – If no TooHigh message is received before the timeout, the receiver broadcasts a ChangeAverage message to decrease the average load estimates at the other nodes

Comparison

Adaptive Algorithms
• A stable symmetrically initiated algorithm
  – Each node keeps a senders list, a receivers list, and an OK list
    • Nodes in the system are classified as Sender/overloaded, Receiver/underloaded, or OK using the information gathered through polling

A Stable Symmetrically Initiated Algorithm – cont.
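The symmetrically initiated message handling described above can be sketched for a single node as follows. Only the message names (TooHigh, TooLow, Accept, ChangeAverage) come from the slides; the `send`/`broadcast` callbacks and the load-estimate bookkeeping are illustrative assumptions.

```python
# Illustrative sketch of one node's handlers in the symmetrically initiated
# protocol. `send` and `broadcast` are hypothetical stand-ins for the real
# communication layer.

class Node:
    def __init__(self, load, estimated_avg):
        self.load = load
        self.estimated_avg = estimated_avg  # this node's estimate of the average load

    def is_receiver(self):
        return self.load < self.estimated_avg

    def on_too_high(self, sender, send):
        # A receiver accepts an overloaded sender's offer and reserves
        # capacity for the incoming task.
        if self.is_receiver():
            send(sender, "Accept")
            self.load += 1

    def on_too_high_timeout(self, broadcast):
        # No receiver answered the TooHigh: the system-wide average
        # estimate is too low, so ask everyone to raise it.
        broadcast("ChangeAverage", direction="increase")

    def on_too_low_timeout(self, broadcast):
        # No sender answered the TooLow: the average estimate is too
        # high, so ask everyone to lower it.
        broadcast("ChangeAverage", direction="decrease")
```

Adjusting the average estimate on timeout is what keeps both components quiet when the system is uniformly loaded, instead of polling fruitlessly.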
• Sender-initiated component
  – The sender polls the node at the head of its receivers list
  – The polled node moves the sender to the head of its own senders list and replies with a message indicating whether it is a receiver, a sender, or an OK node
  – The sender updates its lists for the polled node based on the reply
  – If the polled node is a receiver, the sender transfers a task to it
  – The polling process stops when the sender's receivers list becomes empty or the number of polls reaches a PollLimit

A Stable Symmetrically Initiated Algorithm – cont.
• Receiver-initiated component
  – The nodes are polled in the following order
    • Head to tail of the senders list
    • Tail to head of the OK list
    • Tail to head of the receivers list

A Stable Sender-Initiated Algorithm
• This algorithm uses the sender-initiated component of the stable symmetrically initiated algorithm
  – Each node is augmented with an array called the statevector
    • It keeps track of this node's status as recorded at all the other nodes in the system
    • It is updated based on the information exchanged during polling
  – The receiver-initiated component is replaced by the following protocol
    • When a node becomes a receiver, it informs all the nodes whose statevector entries about it are out of date

Comparison

Performance Under Heterogeneous Workloads

Selecting a Suitable Load Sharing Algorithm
• The best algorithm depends on the system under consideration
  – For example, if the system never attains high loads, sender-initiated algorithms give improved performance
  – Stable scheduling algorithms should be used for systems that can reach high loads
  – For systems with heterogeneous workloads, adaptive stable algorithms are preferable

Other Requirements of Load Distributing
• Scalability
  – The algorithm should work well in large distributed systems
• Location transparency
• Determinism
• Preemption
• Heterogeneity

Case Studies
• The V-System
• The Sprite system
• The Condor system
• The Stealth distributed scheduler

Task Placement vs. Task Migration
• Task placement refers to transferring a task that has yet to begin execution to a new location and starting its execution there
• Task migration refers to transferring a task that has already begun execution to a new location and continuing its execution there

Task Migration
• State transfer
  – The task's state includes the contents of its registers, the task stack, the task's status, its virtual memory address space, file descriptors, any temporary files, and buffered messages
  – In addition: the current working directory, signal masks and handlers, resource usage statistics, and references to child and parent processes
• Unfreeze
  – The task is installed at the new machine and placed in the ready queue

Issues in Task Migration
• State transfer
  – The cost of supporting remote execution, including delays due to freezing the task, obtaining and transferring its state, and unfreezing it
  – Residual dependencies
    • Transferring pages of the virtual memory space on demand
    • Redirection of messages
    • Location-dependent system calls
    • Residual dependencies are undesirable because they keep the migrated task dependent on its former host

State Transfer Mechanisms

Location Transparency
• Location transparency is essential to support task migration
  – Task migration should hide the locations of tasks
  – Remote execution of tasks should not require any special provisions in programs
  – These requirements demand that names be independent of their locations
    • Addresses are maintained as hints
    • An object can be accessed through pointers

Task Migration Performance
• Cost of process migration in Sprite

Task Migration Performance – cont.
• Cost of process migration in Charlotte

Summary
• Load distributing algorithms try to improve overall system performance by transferring load from heavily loaded nodes to lightly loaded or idle nodes
  – Many different load distributing algorithms have been developed
  – To be effective, these algorithms must collect the necessary information efficiently and minimize both the overhead of task transfer and the delays due to task transfer
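The freeze / state-transfer / unfreeze steps of task migration discussed earlier can be sketched as follows. This is a simplified illustration, not any real system's mechanism: the `TaskState` fields are a small subset of the state listed in the slides, and all names are hypothetical.

```python
# Illustrative sketch of task migration: freeze the task, copy its state
# to the destination, and unfreeze it there. A real system would also move
# the virtual address space, temporary files, and buffered messages, and
# would deal with residual dependencies on the source node.

from dataclasses import dataclass, field

@dataclass
class TaskState:
    registers: dict          # register contents at the moment of freezing
    stack: bytes             # the task stack
    status: str              # e.g. "running", "frozen", "ready"
    cwd: str                 # current working directory
    open_fds: list = field(default_factory=list)  # file descriptors

def migrate(task_state, dest_ready_queue):
    """Freeze the task, ship its state, and unfreeze it at the destination."""
    task_state.status = "frozen"                 # freeze: stop execution
    transferred = TaskState(**vars(task_state))  # transfer: copy state over
    transferred.status = "ready"                 # unfreeze: make it schedulable
    dest_ready_queue.append(transferred)         # place in the ready queue
    return transferred
```

The dominant migration costs measured in systems such as Sprite and Charlotte come from the transfer step, which is why mechanisms that ship the address space lazily (at the price of residual dependencies) were explored.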