TDDB68 Concurrent programming and operating systems
CPU Scheduling
[SGG7/8/9] Chapter 5
Christoph Kessler, IDA, Linköpings universitet

Overview:
- CPU bursts and I/O bursts
- Scheduling criteria
- Scheduling algorithms
- Priority inversion problem
- Multiprocessor scheduling
- Appendix: Scheduling algorithm evaluation
- Appendix: Short introduction to real-time scheduling

Copyright notice: The lecture notes are mainly based on Silberschatz's, Galvin's and Gagne's book ("Operating System Concepts", 7th ed., Wiley, 2005). No part of the lecture notes may be reproduced in any form, due to the copyrights reserved by Wiley. These lecture notes should only be used for internal teaching purposes at Linköping University.

TDDB68, C. Kessler, IDA, Linköpings universitet – Silberschatz, Galvin and Gagne ©2005

Basic Concepts
- Maximum CPU utilization is obtained with multiprogramming.
- CPU–I/O burst cycle: process execution consists of a cycle of CPU execution and I/O wait, alternating. The CPU burst distribution (histogram) typically shows many short bursts and few long ones.

CPU Scheduler
- Selects from among the processes in memory that are ready to execute, and allocates the CPU to one of them.
- CPU scheduling decisions may take place when a process:
  0. is admitted
  1. switches from running to waiting state
  2. switches from running to ready state
  3. switches from waiting to ready state
  4. terminates
- Scheduling under 1 and 4 only is nonpreemptive; all other scheduling is preemptive.
Scheduling Criteria
- CPU utilization – keep the CPU as busy as possible
- Throughput – number of processes that complete their execution per time unit
- Turnaround time – amount of time to execute a particular process
- Waiting time – amount of time a process has been waiting in the ready queue
- Response time – amount of time from when a request was submitted until the first response is produced, not including output (for time-sharing environments)
- Deadlines met? – in real-time systems (later)

Optimization Criteria
- Max CPU utilization
- Max throughput
- Min turnaround time
- Min waiting time
- Min response time

First-Come, First-Served (FCFS, FIFO) Scheduling

  Process   Burst Time
  P1        24
  P2        3
  P3        3

Suppose that the processes arrive in the order P1, P2, P3. The Gantt chart for the schedule is:

  | P1 | P2 | P3 |
  0    24   27   30

Waiting time for P1 = 0, P2 = 24, P3 = 27. Average waiting time: (0 + 24 + 27) / 3 = 17.

FCFS Scheduling (cont.)
Suppose instead that the processes arrive in the order P2, P3, P1. The Gantt chart for the schedule is:

  | P2 | P3 | P1 |
  0    3    6    30

Waiting time for P1 = 6, P2 = 0, P3 = 3. Average waiting time: (6 + 0 + 3) / 3 = 3 – much better!

- Waiting time for Pi = start time of Pi – arrival time of Pi.
- FCFS is normally used for non-preemptive batch scheduling, e.g. printer queues (i.e., burst time = job size).
- Convoy effect: short processes get stuck behind a long process.
- Idea: shortest job first?

Shortest-Job-First (SJF) Scheduling
- Associate with each process the length of its next CPU burst; use these lengths to schedule the shortest ready process.
- Two schemes:
  - Nonpreemptive SJF – once the CPU is given to a process, it cannot be preempted until it completes its CPU burst.
  - Preemptive SJF – preempt if a new process arrives with a CPU burst length less than the remaining time of the currently executing process. Also known as Shortest-Remaining-Time-First (SRTF).
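The two FCFS runs above can be checked with a few lines of Python. This is a minimal sketch; the burst times are taken from the example, and the helper name is ours:

```python
def fcfs_waiting_times(bursts):
    """FCFS with all jobs arriving at time 0, served in list order.
    Waiting time of each job = its start time."""
    start, waits = 0, []
    for b in bursts:
        waits.append(start)
        start += b
    return waits

# Arrival order P1, P2, P3 (bursts 24, 3, 3): convoy effect
w = fcfs_waiting_times([24, 3, 3])
print(w, sum(w) / len(w))        # [0, 24, 27] 17.0

# Arrival order P2, P3, P1: the short jobs no longer wait behind the long one
w = fcfs_waiting_times([3, 3, 24])
print(w, sum(w) / len(w))        # [0, 3, 6] 3.0
```

The two averages (17 vs. 3) for the same workload are exactly the convoy effect the slide describes.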
Example of Non-Preemptive SJF

  Process   Arrival Time   Burst Time
  P1        0.0            7
  P2        2.0            4
  P3        4.0            1
  P4        5.0            4

With non-preemptive SJF:

  | P1 | P3 | P2 | P4 |
  0    7    8    12   16

Average waiting time = (0 + 6 + 3 + 7) / 4 = 4.

Example of Preemptive SJF

Same processes; with preemptive SJF (SRTF):

  | P1 | P2 | P3 | P2 | P4 | P1 |
  0    2    4    5    7    11   16

Average waiting time = (9 + 1 + 0 + 2) / 4 = 3.

SJF is optimal – it gives the minimum average waiting time for a given set of processes.

Predicting the Length of the Next CPU Burst
- We can only estimate the length, based on the lengths of previous CPU bursts, using exponential averaging:
  1. t_n = actual length of the n-th CPU burst
  2. τ_{n+1} = predicted value for the next CPU burst
  3. α, 0 ≤ α ≤ 1
  4. Define: τ_{n+1} = α·t_n + (1 − α)·τ_n
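The exponential-averaging recurrence above translates directly into code. A minimal sketch: α and the initial guess τ0 are free parameters, and the burst list is made up for illustration:

```python
def predict_next_burst(bursts, alpha=0.5, tau0=10.0):
    """Exponential averaging: tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n."""
    tau = tau0
    for t in bursts:
        tau = alpha * t + (1 - alpha) * tau
    return tau

# alpha = 0: only history counts, the measured bursts are ignored
print(predict_next_burst([6, 4, 6, 4], alpha=0.0))   # 10.0
# alpha = 1: only the latest burst counts
print(predict_next_burst([6, 4, 6, 4], alpha=1.0))   # 4.0
# alpha = 0.5: a weighted mix, recent bursts weigh more
print(predict_next_burst([6, 4, 6, 4], alpha=0.5))   # 5.0
```

The two extreme cases of α reproduce exactly the special cases discussed on the next slide.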
Examples of Exponential Averaging
- Extreme cases:
  - α = 0: τ_{n+1} = τ_n – recent history does not count
  - α = 1: τ_{n+1} = t_n – only the latest CPU burst counts
- Otherwise, expand the formula:
  τ_{n+1} = α·t_n + (1 − α)·α·t_{n−1} + … + (1 − α)^j·α·t_{n−j} + … + (1 − α)^{n+1}·τ_0
  Since both α and (1 − α) are less than 1, each successive term has less weight than its predecessor.

Priority Scheduling
- A priority value (integer) is associated with each process.
- The CPU is allocated to the process with the highest priority (smallest integer = highest priority).
- Can be preemptive or nonpreemptive.
- SJF is a priority scheduling where the priority is the predicted next CPU burst time.
- Problem: starvation – low-priority processes may never execute.
- Solution: aging – as time progresses, increase the priority of the process.

Round Robin (RR)
- Each process gets a small unit of CPU time, a time quantum, usually 10–100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue.
- Given n processes in the ready queue and time quantum q, each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n − 1)·q time units.
- Performance:
  - q very large: behaves like FCFS.
  - q very small: many context switches; q must be large w.r.t. the context-switch time, otherwise the overhead is too high.

Example: RR with Time Quantum q = 20

  Process   Burst Time
  P1        53
  P2        17
  P3        68
  P4        24

The Gantt chart is:

  | P1 | P2 | P3 | P4 | P1 | P3 | P4  | P1  | P3  | P3  |
  0    20   37   57   77   97   117   121   134   154   162

Typically, RR gives a higher average turnaround time than SJF, but better response time.

Time Quantum and Context Switches
- A smaller time quantum means more context switches.

RR: Turnaround Time Varies with the Time Quantum
- With time quantum q = 1, the four processes P1–P4 have turnaround times 3, 9, 15, and 17; average turnaround time = (3 + 9 + 15 + 17) / 4 = 11.
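The RR schedule above is easy to simulate. The sketch below reproduces the q = 20 Gantt chart's completion times, and then shows how the average turnaround time varies with the quantum for a small made-up four-job set (burst times 6, 3, 1, 7 are our assumption, not from the slides):

```python
from collections import deque

def round_robin(bursts, q):
    """Simulate RR for jobs that all arrive at time 0, in list order.
    bursts: list of (name, burst_time); returns {name: completion_time}."""
    ready = deque(bursts)
    t, done = 0, {}
    while ready:
        name, rem = ready.popleft()
        run = min(q, rem)
        t += run
        if rem > run:
            ready.append((name, rem - run))   # quantum expired: requeue at tail
        else:
            done[name] = t
    return done

jobs = [("P1", 53), ("P2", 17), ("P3", 68), ("P4", 24)]
print(round_robin(jobs, q=20))
# completion times: P2 at 37, P4 at 121, P1 at 134, P3 at 162 (as in the Gantt chart)

# Turnaround (= completion time here) vs. quantum, for an assumed job set:
for q in (1, 2, 20):
    c = round_robin([("P1", 6), ("P2", 3), ("P3", 1), ("P4", 7)], q)
    print(q, sum(c.values()) / 4)   # q=1: 11.0   q=2: 11.5   q=20: 10.5 (= FCFS)
```

Note that a larger quantum does not automatically give a better average turnaround time; q = 2 is worse than q = 1 for this set.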
- With time quantum q = 2, the turnaround times are 5, 10, 14, and 17; average turnaround time = (5 + 10 + 14 + 17) / 4 = 11.5.
- The average turnaround time does not necessarily improve with a larger quantum; it improves if most processes finish their next CPU burst within a single time quantum.

Problems with RR and Priority Schedulers
- Priority-based scheduling may cause starvation for some processes.
- Round-robin-based schedulers are maybe too "fair" ... we sometimes want to prioritize some processes.
- Solution: multilevel queue scheduling ...?

Multilevel Queue
- The ready queue is partitioned into separate queues, e.g.:
  - foreground (interactive)
  - background (batch)
- Each queue has its own scheduling algorithm, e.g.:
  - foreground – RR
  - background – FCFS
- Useful when processes are easily classified into different groups with different characteristics.
- Scheduling must also be done between the queues:
  - Fixed-priority scheduling: serve all from the foreground queue, then from the background queue. Possibility of starvation.
  - Time slice: each queue gets a certain share of CPU time, which it can schedule among its processes.

Multilevel Queue Scheduling
Example: 80% to foreground in RR, 20% to background in FCFS.

Multilevel Feedback Queue
- A process can move between the various queues; aging can be implemented this way.
- Time-sharing among the queues in priority order: processes in lower queues get the CPU only if the higher queues are empty.
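The time-slice variant of multilevel queue scheduling can be sketched as below. This is a minimal illustration, not the book's algorithm: we approximate the 80%/20% split with a 4:1 quantum ratio per cycle, and the job names and burst times are made up:

```python
from collections import deque

def multilevel_queue(fg, bg, q=1, share=(4, 1)):
    """Time-sliced multilevel queue: per cycle, run `share[0]` quanta of RR on
    the foreground queue, then `share[1]` quanta of FCFS on the background
    queue. Jobs are (name, remaining_time); returns the dispatch order."""
    fg, bg = deque(fg), deque(bg)
    log = []
    while fg or bg:
        for _ in range(share[0]):            # ~80% of the cycle: foreground, RR
            if not fg:
                break
            name, rem = fg.popleft()
            log.append(name)
            if rem > q:
                fg.append((name, rem - q))   # RR: requeue at the tail
        for _ in range(share[1]):            # ~20% of the cycle: background, FCFS
            if not bg:
                break
            name, rem = bg[0]
            log.append(name)
            if rem > q:
                bg[0] = (name, rem - q)      # FCFS: same job keeps the head slot
            else:
                bg.popleft()
    return log

print(multilevel_queue([("I1", 2), ("I2", 2)], [("B1", 2)]))
# ['I1', 'I2', 'I1', 'I2', 'B1', 'B1']
```

The interactive jobs I1 and I2 alternate under RR and finish first; the batch job B1 only gets its guaranteed share.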
Example of Multilevel Feedback Queue
Three queues, from high to low priority:
- Q0 – RR with q = 8 ms
- Q1 – RR with q = 16 ms
- Q2 – FCFS

Scheduling: A new job enters queue Q0, which is served RR. When it gains the CPU, the job receives 8 milliseconds. If it does not finish in 8 milliseconds, it is moved to Q1. At Q1 the job is again served RR and receives 16 additional milliseconds. If it still does not complete, it is preempted and moved to Q2.

A multilevel-feedback-queue scheduler is defined by the following parameters:
- number of queues
- scheduling algorithm for each queue
- priority level of each queue
- method used to determine when to upgrade a process
- method used to determine when to demote a process
- method used to determine which queue a process will enter when that process needs service

Synchronization Might Interfere with Priority Scheduling: the Priority Inversion Problem

Scenario:
- A low-priority thread L takes resource R and executes ...
- Then a mid-priority thread M preempts L and executes ...
- Then a high-priority thread H preempts M and executes ...
- ... until H needs resource R. But R is occupied (owned by L), and H waits.
- Now M is awake and running, and H has to wait for the unrelated thread M to finish ... ?

Multiprocessor Scheduling
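The three-queue feedback example above can be sketched as a small simulation. This is a simplification (all jobs arrive at time 0, so preemption of lower queues by new arrivals is not modeled); the job names and burst times are made up:

```python
from collections import deque

def mlfq(jobs):
    """Three-level feedback queue: Q0 (RR, q=8), Q1 (RR, q=16), Q2 (FCFS).
    jobs: list of (name, burst_time), all arriving at time 0.
    Returns [(name, completion_time)] in completion order."""
    quanta = [8, 16, None]                  # None = run to completion (FCFS)
    queues = [deque(jobs), deque(), deque()]
    t, done = 0, []
    while any(queues):
        level = next(i for i, qu in enumerate(queues) if qu)  # highest nonempty
        name, rem = queues[level].popleft()
        q = quanta[level]
        run = rem if q is None else min(q, rem)
        t += run
        if rem > run:                       # quantum expired: demote one level
            queues[level + 1].append((name, rem - run))
        else:
            done.append((name, t))
    return done

print(mlfq([("A", 30), ("B", 5)]))
# [('B', 13), ('A', 35)]
```

The long job A is demoted from Q0 to Q1 to Q2, while the short job B finishes within its first 8 ms quantum; this is exactly the bias toward short, interactive jobs that the feedback mechanism is designed to give.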
Priority inversion can be resolved by priority inheritance: L temporarily inherits H's priority while working on R.

Multiprocessor Scheduling
- CPU scheduling becomes more complex if multiple CPUs are available. Multi-CPU scheduling approaches for sequential and parallel tasks are supported by Linux, Solaris, Windows XP, and Mac OS X.
- Multiprocessor (SMP): homogeneous processors, shared memory.
- Multi-core processors: homogeneous cores that share the L2 cache and memory. Example: Intel Xeon dual-core (2005) – cores P0 and P1 each with private L1 instruction and data caches, sharing the L2 cache and memory controller.
- Multithreaded processors: HW threads / logical CPUs share basically everything but the register sets.
  - HW context switch on load
  - cycle-by-cycle interleaving
  - simultaneous multithreading

Ready queues and scheduling approaches:
- Common ready queue (SMP only) – a critical section!
- Processor-local ready queues.
- Job-blind scheduling (FCFS, SJF, RR – as above): schedule and dispatch processes one by one as any CPU becomes available.
- Affinity-based scheduling: guided by data locality (cache contents, loaded pages).
- Co-scheduling / gang scheduling for parallel jobs.
- Parallel jobs: work sharing; centralized vs. local task queues, load balancing.
- Load balancing by task migration: push migration vs. pull migration (work stealing). Linux: push load balancing every 200 ms, pull load balancing whenever the local task queue is empty.
- Supercomputing applications often use a fixed-size process configuration ("SPMD", 1 thread per processor) and implement their own methods for scheduling and resource management (goal: load balancing).
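The pull-migration (work-stealing) idea above can be sketched in a few lines. This is a conceptual illustration only, not the Linux implementation; class and task names are made up:

```python
from collections import deque

class CPU:
    """Per-CPU ready queue with pull migration: an idle CPU steals a task
    from the busiest peer instead of sitting idle."""
    def __init__(self, tasks):
        self.queue = deque(tasks)

    def next_task(self, peers):
        if self.queue:
            return self.queue.popleft()      # local work first (preserves affinity)
        victim = max(peers, key=lambda c: len(c.queue), default=None)
        if victim and victim.queue:
            return victim.queue.pop()        # steal from the tail of the busiest peer
        return None

cpu0, cpu1 = CPU(["t1", "t2", "t3"]), CPU([])
print(cpu1.next_task([cpu0]))    # cpu1 is idle and steals "t3" from cpu0
print(cpu0.next_task([cpu1]))    # cpu0 still has local work: runs "t1"
```

Stealing from the tail of the victim's queue is a common design choice: the tasks at the head are the ones most likely to have warm cache state on the victim CPU.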
Affinity-based Scheduling
- A cache contains copies of data recently accessed by its CPU.
- If a process is rescheduled to a different CPU (and cache):
  - the old cache contents are invalidated by new accesses, and
  - many cache misses occur when restarting on the new CPU, causing much bus traffic and many slow main-memory accesses.
- Policy: try to avoid migration to another CPU if possible. A process has affinity for the processor on which it is currently running.
  - Hard affinity (e.g. Linux): migration to another CPU is forbidden.
  - Soft affinity (e.g. Solaris): migration is possible but undesirable.

Scheduling Communicating Threads
- Frequently communicating threads/processes (e.g., in a parallel program) should be scheduled simultaneously on different processors to avoid idle times.
- Example with two two-threaded tasks A and B on two CPUs: if the time quanta are not coordinated, A0 on CPU 0 repeatedly idles waiting for A1 on CPU 1 (and B0 for B1); co-scheduling A0 with A1, and B0 with B1, removes these idle times.

Co-Scheduling / Gang Scheduling
- Execute processes/threads from the same job simultaneously, rather than maximizing processor affinity. Tasks can be parallel (have more than one process/thread).
- Example: the Undivided co-scheduling algorithm:
  - Global, shared RR ready queue; place threads from the same task in adjacent entries: A1 A2 A3 A4 B1 B2 B3 C1 C2 C3 C4 C5 …
  - A window on the ready queue of size #processors: all threads within the same window execute in parallel for at most one time quantum.
  - RR-fair (no indefinite postponement).
  - Programs designed to run in parallel profit from the multiprocessor environment.
  - May reduce processor affinity.

Summary: CPU Scheduling
- Goals: enable multiprogramming; CPU utilization, throughput, ...
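The Undivided co-scheduling algorithm above can be sketched as a windowed round-robin over the global queue. A minimal illustration with the queue contents from the example (the function name and the window-rotation detail are our assumptions):

```python
from collections import deque

def undivided_coschedule(queue, procs, steps):
    """Undivided co-scheduling sketch: a window of `procs` entries at the head
    of the global RR queue runs in parallel for one time quantum, then the
    window is requeued at the tail (threads of one task stay adjacent)."""
    queue = deque(queue)
    schedule = []
    for _ in range(steps):
        window = [queue.popleft() for _ in range(min(procs, len(queue)))]
        schedule.append(window)            # these threads run simultaneously
        queue.extend(window)               # RR-fair: back to the tail
    return schedule

threads = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "C1", "C2", "C3", "C4", "C5"]
print(undivided_coschedule(threads, procs=4, steps=3))
# [['A1', 'A2', 'A3', 'A4'], ['B1', 'B2', 'B3', 'C1'], ['C2', 'C3', 'C4', 'C5']]
```

Note that task A's four threads run as a perfect gang, while the second window mixes B's threads with C1: keeping threads adjacent makes co-scheduling likely, but windows can still straddle task boundaries.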
- Scheduling algorithms:
  - preemptive vs. non-preemptive scheduling
  - RR, FCFS, SJF
  - priority scheduling
  - multilevel queue and multilevel feedback queue
- Priority inversion problem
- Multiprocessor scheduling
- Appendix: scheduling algorithm evaluation, real-time scheduling
- In the book (Chapter 5): scheduling in Solaris, Windows, Linux

Appendix: Scheduling Algorithm Evaluation
(Material for further reading / self-study)

- Analytical modeling:
  - Deterministic modeling – takes a particular predetermined workload and defines the performance of each algorithm for that workload.
  - Queueing models
- Simulation models
- Implementation

Appendix: Real-Time Scheduling

Real-Time Systems
- Example of an embedded system with concurrent activities: network access (search, maintain link, talk), display (clock, browse web, …), camera (take photo, optional), keyboard (button pressed, …).
- Event-driven and/or time-driven.
- Deadlines.

Real-Time Scheduling
- Hard real-time systems: missing a deadline can have catastrophic consequences. A critical task is required to complete within a guaranteed amount of time; critical processes must receive priority over less important ones. Requires special scheduling algorithms: RM, EDF, ...
- Soft real-time computing: missing a deadline leads to degradation of service, e.g., lost frames or pixelized images in digital TV.
- Often periodic tasks or reactive computations.

Utility Function (value of the result relative to time)
- Each task i has a release time r_i and a deadline d_i; its utility v_i(t) is full if it completes between r_i and d_i, and drops after the deadline (utility loss / time penalty). For a hard real-time task the loss is catastrophic; for a soft real-time task the utility degrades gradually.
Real-Time CPU Scheduling
- Tasks are time-triggered or event-triggered (e.g., by a sensor measurement):
  - Periodic tasks: each task i has a period p_i, a (relative) deadline d_i, and a computation time t_i. CPU utilization of task i: t_i / p_i.
  - Sporadic tasks (no period, but a relative deadline).
  - Aperiodic tasks (no period, no deadline).

Rate-Monotonic Scheduling (RM)
- Use strict priority-driven preemptive scheduling.
- Priority = inverse of the period: the task with the shortest period has the highest priority.
- Schedulability analysis: the worst case is to start all tasks simultaneously. Simulate an execution of the system until the lowest-priority task has finished for the first time; if it meets its deadline, the task set is schedulable ("all ok").
- Example: P1 (period 50, time 20) has higher priority than P2 (period 100, time 35); the task set is schedulable.

Missed Deadlines with Rate-Monotonic Scheduling
- Example: P1 (period 50, time 25), P2 (period 80, time 35). P1 preempts P2, and P2 misses its deadline at 80.

Earliest-Deadline-First (EDF) Scheduling
- Priorities are assigned according to deadlines: the earlier the deadline, the higher the priority; the later the deadline, the lower the priority.
- In the example above, P2 keeps the CPU when P1 is released again, as P2's deadline is closer.
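The "simulate from a simultaneous start until the lowest-priority task first completes" analysis above can be carried out with the standard response-time fixed-point iteration R = C + Σ ⌈R/P_i⌉·C_i over the higher-priority tasks, which computes the same first completion time. A sketch, assuming implicit deadlines (d_i = p_i):

```python
import math

def rm_response_time(tasks):
    """Worst-case response time of the lowest-priority task under RM,
    starting all tasks simultaneously (the critical instant).
    tasks: list of (period, exec_time); shorter period = higher priority.
    Returns the first completion time, or None if the deadline (= period)
    is missed."""
    tasks = sorted(tasks)                      # RM priority order
    period, c = tasks[-1]                      # the lowest-priority task
    r = c
    while True:
        # our own execution plus all higher-priority releases in [0, r):
        r_next = c + sum(math.ceil(r / p) * e for p, e in tasks[:-1])
        if r_next == r:
            return r                           # fixpoint: first completion time
        if r_next > period:
            return None                        # deadline missed
        r = r_next

print(rm_response_time([(50, 20), (100, 35)]))   # 75 <= 100: schedulable
print(rm_response_time([(50, 25), (80, 35)]))    # None: P2 misses its deadline
```

The two calls reproduce the two slide examples: in the first, P2 finishes at time 75, well before its deadline of 100; in the second, P2 cannot finish by time 80.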
SJF becomes EDF
- Earliest Deadline First (EDF) follows the same principle as the preemptive version of SJF, with the remaining time replaced by the deadline.
- Schedulability: EDF is an optimal algorithm – if the CPU utilization is below 100%, the task set is schedulable!

RM vs. EDF
- RM:
  - simpler algorithm
  - on overload: always the lowest-priority tasks fail first
  - harder schedulability analysis; not an optimal algorithm
  - utilization-based schedulability test: schedulability is guaranteed only for utilization below ca. 69%
  - simulation-based schedulability test
- EDF:
  - optimal algorithm, simple schedulability analysis (total utilization < 100% is always OK)
  - on overload: delays may propagate all over ...

Dynamic Scheduling with Dynamic Admission Control
- On activation, a task first passes an acceptance test: if the answer is NO it is rejected; if YES it enters the Ready state, from which it is scheduled to Run. A running task may be preempted (back to Ready), wait on a busy resource (to Waiting, returning to Ready on a signal from the resource), or terminate.
- The acceptance test is also known as performing admission control.
- Perform admission control at run time when:
  - tasks have dynamic activations,
  - the arrival times are not known a priori;
  - execute the test every time a new task comes into the system.

Minimizing Latency
- Event latency is the amount of time from when an event occurs to when it is serviced: event latency = interrupt latency + dispatch latency.
- Interrupt latency is the period of time from when an interrupt arrives at the CPU to when it is serviced (the interrupt service routine starts).
- Dispatch latency is the amount of time required for the scheduler to stop one process and start another. It includes a period of conflict, e.g., due to priority inversion.
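The two utilization-based tests contrasted above (the RM bound, ca. 69% for many tasks, versus the exact EDF bound of 100%) can be stated in a few lines. A sketch, using the Liu–Layland RM bound n(2^(1/n) − 1) and assuming implicit deadlines:

```python
def utilization(tasks):
    """Total CPU utilization of periodic tasks given as (period, exec_time)."""
    return sum(e / p for p, e in tasks)

def rm_utilization_test(tasks):
    """Sufficient (not necessary) RM test: U <= n * (2^(1/n) - 1).
    The bound is ~0.83 for n = 2 and tends to ln 2 ~ 0.69 for large n."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

def edf_utilization_test(tasks):
    """Exact EDF test for implicit deadlines: schedulable iff U <= 1."""
    return utilization(tasks) <= 1

a = [(50, 20), (100, 35)]     # U = 0.75
b = [(50, 25), (80, 35)]      # U = 0.9375
print(rm_utilization_test(a), edf_utilization_test(a))   # True True
print(rm_utilization_test(b), edf_utilization_test(b))   # False True
```

Task set b illustrates the difference: it passes the EDF test (and EDF does schedule it, as the earlier example shows), while it fails the RM utilization test; since that test is only sufficient, failing it does not by itself prove RM-unschedulability, but here the RM simulation confirms the missed deadline at 80.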