Process Scheduling
Chapter 5

Introduction
- Policy and implementation
- Objectives:
  - Fast response time
  - High throughput (short turnaround time)
  - Avoidance of process starvation
- Context switching is expensive
- The context is a snapshot of the values of the general-purpose, memory management, and other special registers.

Types of Scheduling
- Long-term
  - Performed when a new process is created.
  - The decision to add to the pool of processes to be executed.
- Medium-term
  - Swapping
  - The decision to add to the number of processes that are partially or fully in main memory.
- Short-term
  - Which ready process to execute next.
  - The decision as to which available process will be executed by the processor.
  - FCFS, round-robin, shortest process next, shortest remaining time
- I/O
  - The decision as to which process's pending I/O request shall be handled by an available I/O device.

Scheduling and Process State Transition
[Figure: process state transition diagram. Long-term scheduling admits a New process to Ready or to Ready/Suspend; medium-term scheduling moves processes between Ready/Suspend and Ready and between Blocked/Suspend and Blocked; short-term scheduling moves a process from Ready to Running; a Running process eventually reaches Exit.]

Queuing Diagram for Scheduling
[Figure: queuing diagram. Batch jobs enter through long-term scheduling and interactive users enter directly; short-term scheduling dispatches from the ready queue to the processor; a process leaves the processor by release, by time-out (back to the ready queue), or by event wait (to the blocked queue); an event occurrence unblocks a waiting process; medium-term scheduling moves processes to and from the Ready/Suspend and Blocked/Suspend queues.]

5.2 Clock Interrupt Handling
- The clock interrupt is second in priority only to the power-failure interrupt.
- Tasks:
  - Rearms the hardware clock if necessary
  - Updates CPU usage statistics
  - Performs scheduler-related functions
  - Sends a SIGXCPU signal to the current process if it has exceeded its CPU usage quota
  - Updates the time-of-day and other related clocks
  - Handles callouts
  - Wakes up system processes
  - Handles alarms

5.2.1 Callouts
- A callout records a function that the kernel must invoke at a later time.
- int to_ID = timeout(void (*fn)(), caddr_t arg, long delta);
- void untimeout(int to_ID);
- Tasks:
  - Retransmission of network packets
  - Certain scheduler and memory management functions
  - Monitoring devices to avoid losing interrupts
  - Polling devices that do not support interrupts

[Figure: Callout in BSD UNIX]

5.2.2 Alarms
- Real-time: relates to the actual elapsed time, and notifies the process via a SIGALRM signal.
- Profiling: measures the amount of time the process has been executing, and uses the SIGPROF signal for notification.
- Virtual-time: monitors only the time spent by the process in user mode, and sends the SIGVTALRM signal.

5.3 Scheduler Goals
- The scheduler must ensure that the system delivers acceptable performance to each application.
- Different applications:
  - Interactive: response within 50-150 ms
  - Batch: scientific computation
  - Real-time: time-critical

5.4 Traditional UNIX Scheduling
- Goal: to improve response times of interactive users while ensuring that low-priority background jobs do not starve.
- Priority-based:
  - A user process can be preempted
  - The kernel is strictly non-preemptive

Priority
- Kernel: 0-49; user: 50-127 (a lower value means a higher priority)
- proc fields:
  - p_pri: current scheduling priority
  - p_usrpri: user mode priority
  - p_cpu: measure of recent CPU usage
  - p_nice: user-controllable nice factor
- Kernel priority: the sleep priority, determined by the reason for sleeping
- User mode priority depends on two factors:
  - Nice: 0-39
  - CPU usage
- Time-sharing: equal opportunity
- Decay factor: 1/2 for SVR3; for 4.3BSD:
    decay = (2 * load_average) / (2 * load_average + 1)
    p_cpu = p_cpu * decay
    p_usrpri = PUSER + (p_cpu / 4) + (2 * p_nice)

Example: PUSER = 50, decay = 1/2

          P1 (nice = 20)        P2 (nice = 25)
  Time    p_cpu   p_usrpri      p_cpu   p_usrpri
  T1        80      110           80      120
  T2       100      115           40      110
  T3        50      102           60      115

  The process with the lower p_usrpri value runs next: P1 after T1, P2 after T2, P1 again after T3.

Scheduler Implementation
- 32 run queues: each is a doubly linked list of proc structures for runnable processes.
- whichqs: a bitmask with one bit per queue; "1" means that the queue holds a runnable process.
- swtch(): performs the context switch, using p_addr
  - Saves the current context in the pcb (part of the u area)
  - Loads the saved context of the next process
- VAX: special instructions support the context switch; ffs and ffc (find first set / find first clear) speed up scanning of whichqs

Run Queue Manipulation
- roundrobin(): rotates among the processes with the same priority.
- schedcpu(): recomputes the priority of each process once per second
  - Removes the process from the run queue
  - Recomputes the priority
  - Puts it back

When to Switch Context
- The current process blocks on a resource or exits.
- The priority recomputation procedure results in the priority of another process becoming greater than that of the current one (the runrun flag is set).
- The current process, or an interrupt handler, wakes up a higher-priority process.

Analysis
- Does not scale well
- No way to dedicate the CPU to a specific process
- No guarantees for real-time applications
- Little control over priorities
- Because the kernel is non-preemptive, high-priority runnable processes may have to wait for the kernel to relinquish the CPU

5.5 The SVR4 Scheduler
- Support a diverse range of applications, including those requiring real-time response
- Separate the scheduling policy from the mechanisms that implement it
- Provide applications with greater control over their priority and scheduling
- Define a scheduling framework with a well-defined interface to the kernel
- Allow new scheduling policies to be added in a modular manner, including dynamic loading of scheduler implementations
- Limit the dispatch latency for time-critical applications

The Class-Independent Layer
- Responsible for context switching, run queue management, and preemption.

Preemption Points
- Places in the code where the kernel data is in a consistent state and the kernel is about to begin a long computation:
  - In the pathname parsing routine lookuppn()
  - In the open system call, before file creation
  - In the memory subsystem, before freeing the pages of a process
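A preemption point can be pictured as a cheap flag test placed between chunks of a long kernel operation. The following is a minimal user-space sketch, not SVR4 code: the PREEMPT() name and the kprunrun flag come from the slides, while preempt(), preempt_count, and free_pages() are hypothetical stand-ins introduced only for illustration.

```c
#include <assert.h>

/* Flag set when a real-time process becomes runnable (name from the slides). */
static volatile int kprunrun;

/* Hypothetical counter so the effect of a preemption can be observed. */
static int preempt_count;

/* Hypothetical stand-in for the real context switch: here it only
 * clears the flag and records that a preemption took place. */
static void preempt(void)
{
    kprunrun = 0;
    preempt_count++;
}

/* A preemption point: test the flag and switch only if it is set. */
#define PREEMPT() do { if (kprunrun) preempt(); } while (0)

/* Hypothetical long-running operation, e.g. freeing the pages of a process.
 * Kernel data is consistent at the top of each iteration, so that is
 * where the preemption point is placed. */
int free_pages(int npages)
{
    int freed = 0;
    for (int i = 0; i < npages; i++) {
        PREEMPT();
        freed++;            /* stands in for the actual work on one page */
    }
    return freed;
}
```

Setting kprunrun to 1 before calling free_pages(3) causes exactly one preemption at the first check, after which the remaining iterations run undisturbed. The point of the design is that the check is nearly free when the flag is clear.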
- At a preemption point the kernel calls PREEMPT(), which checks the kprunrun flag.

Interface to the Scheduling Classes
- Three fields of proc:
  - p_cid: class ID, an index into the global class table
  - p_clfuncs: pointer to the classfuncs vector for the class
  - p_clproc: pointer to a class-dependent private data structure
- #define CL_SLEEP(procp, clprocp, …) (*(procp)->p_clfuncs->cl_sleep)(clprocp, …)

Interface (cont'd)
- Entry points and their callers:
  - CL_TICK: the clock interrupt handler
  - CL_FORK, CL_FORKRET: fork
  - CL_ENTERCLASS, CL_EXITCLASS: entering and exiting a class
  - CL_SLEEP: sleep()
  - CL_WAKEUP: wakeprocs()
- Priorities:
  - 0-59: time-sharing class
  - 60-99: system priority
  - 100-159: real-time class

The Time-Sharing Class
- The default class for a process.
- Round-robin scheduling among processes of the same priority
- Event-driven scheduling
- tsproc:
  - ts_timeleft: time remaining in the quantum
  - ts_cpupri: system part of the priority
  - ts_upri: user part of the priority (nice value)
  - ts_umdpri: user mode priority (ts_cpupri + ts_upri)
  - ts_dispwait: seconds since the start of the quantum

Dispatcher Parameter Table
- ts_tqexp: new ts_cpupri to set when the quantum expires.
- ts_slpret: new ts_cpupri to set when returning to user mode after sleeping.
- ts_maxwait: number of seconds to wait for quantum expiry before using ts_lwait.
- ts_lwait: used instead of ts_tqexp if the process took longer than ts_maxwait to use up its quantum.

The Real-Time Class
- Priorities 100-159: higher than any time-sharing process.
- If the current process is executing kernel code, a real-time process must wait until the current process is about to return to user mode or until it reaches a kernel preemption point.
- Real-time processes require bounded dispatch latency and bounded response time.
- Response time = interrupt handling time + dispatch latency.
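The way the time-sharing class uses its dispatcher parameter table, described earlier in this section, can be sketched as a table lookup keyed by the current ts_cpupri. This is a simplified illustration that follows the field descriptions in the slides, not the SVR4 source: the name ts_slpret is taken from the SVR4 ts_dptbl layout rather than from the slides, the struct layouts are reduced to the fields discussed, and the helper names (ts_quantum_expired, ts_sleep_return, clamp_pri) are hypothetical.

```c
#include <assert.h>

#define TS_MAXPRI 59   /* top of the time-sharing range 0-59 */

/* One row of the dispatcher parameter table, indexed by current ts_cpupri. */
struct ts_dp_row {
    int ts_tqexp;    /* new ts_cpupri when the quantum expires */
    int ts_slpret;   /* new ts_cpupri on return to user mode after sleeping */
    int ts_maxwait;  /* seconds to wait for quantum expiry before ts_lwait */
    int ts_lwait;    /* used instead of ts_tqexp after a long wait */
};

/* Subset of the per-process tsproc fields listed in the slides. */
struct tsproc {
    int ts_cpupri;    /* system part of the priority */
    int ts_upri;      /* user part of the priority */
    int ts_umdpri;    /* user mode priority */
    int ts_dispwait;  /* seconds since the start of the quantum */
};

/* Keep a priority inside the time-sharing range. */
static int clamp_pri(int p)
{
    if (p < 0) return 0;
    if (p > TS_MAXPRI) return TS_MAXPRI;
    return p;
}

/* Recompute the user mode priority from its two parts. */
static void ts_set_umdpri(struct tsproc *ts)
{
    ts->ts_umdpri = clamp_pri(ts->ts_cpupri + ts->ts_upri);
}

/* Quantum expired: normally use ts_tqexp, but use ts_lwait if the process
 * needed longer than ts_maxwait seconds to consume its quantum. */
void ts_quantum_expired(struct tsproc *ts, const struct ts_dp_row *table)
{
    const struct ts_dp_row *row = &table[ts->ts_cpupri];
    ts->ts_cpupri = (ts->ts_dispwait > row->ts_maxwait) ? row->ts_lwait
                                                        : row->ts_tqexp;
    ts->ts_dispwait = 0;
    ts_set_umdpri(ts);
}

/* Returning to user mode after sleeping: use ts_slpret. */
void ts_sleep_return(struct tsproc *ts, const struct ts_dp_row *table)
{
    ts->ts_cpupri = table[ts->ts_cpupri].ts_slpret;
    ts->ts_dispwait = 0;
    ts_set_umdpri(ts);
}
```

With a table whose ts_tqexp values are lower than the row index and whose ts_slpret values are higher, a process that burns its whole quantum drifts down in priority while one that sleeps frequently drifts up, which is the event-driven behavior the slides describe.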
The priocntl System Call
- Basic operations:
  - Changing the priority class of the process
  - Setting ts_upri for time-sharing processes
  - Resetting priority and quantum for real-time processes
  - Obtaining the current value of several scheduling parameters
- priocntlset: performs the same operations on a set of processes: all processes in the system, a process group, a session, a scheduling class, the processes of a particular user, or the processes having the same parent.

Adding a Scheduling Class
- Provide an implementation of each class-dependent scheduling function
- Initialize a classfuncs vector to point to these functions
- Provide an initialization function to perform setup tasks such as allocating internal data structures
- Add an entry for this class in the class table
- Rebuild the kernel

Analysis
- Provides a flexible approach that allows the addition of scheduling classes to a system.
- Event-driven scheduling favors I/O-bound and interactive jobs over CPU-bound ones.
- No good way for a time-sharing class process to switch to a different class.
- Use of priocntl is largely restricted to the superuser.
- It is difficult to tune the system properly for a mixed set of applications.
- Solaris 2.x improves on SVR4.

5.6 Solaris 2.x Enhancements
- Multithreaded, symmetric-multiprocessing OS
- Preemptive kernel
  - Fully preemptive
  - Interrupts are implemented by special kernel threads
  - Interrupt threads always run at the highest priority in the system

Multiprocessor Support
- Processors can communicate by cross-processor interrupts
- Per-processor data structure:
  - cpu_thread: currently running thread
  - cpu_dispthread: thread last selected to run
  - cpu_idle: idle thread
  - cpu_runrun: preemption flag used for time-sharing threads
  - cpu_kprunrun: preemption flag set by real-time threads
  - cpu_chosen_level: priority of the thread that is going to preempt the current thread

Multiprocessor Scheduling
[Figure: T6 becomes runnable and preempts T3]

Hidden Scheduling
- The kernel schedules the work without considering the priority of the thread for which it is doing the work.
- E.g., STREAMS services.
- Remedy: move STREAMS processing into kernel threads.
- Callouts are handled by a special callout thread (which has the maximum system priority).

Priority Inversion
- A situation where a lower-priority thread holds a resource needed by a higher-priority thread, thereby blocking that higher-priority thread.

Solution
- Solved by priority inheritance (also called priority lending).
- Priority inheritance must be transitive.

Implementation of Priority Inheritance
- Extra state is needed to implement priority inheritance: a global priority and an inherited priority for each thread.
- pi_willto(): traverses the synchronization chain and passes on the inherited priority of the calling thread.
- pi_waive(): surrenders the thread's inherited priority.

Limitations of Priority Inheritance
- Can be implemented only when it is known which thread is going to free the resource, i.e. when the resource is held by a single, known thread.
- For mutexes the owner is always known, so priority inheritance can be used.
- For semaphores and condition variables the owner is usually indeterminate, so priority inheritance is not used.
- When a reader/writer lock is held for writing, there is a single, known owner. It may, however, be held by multiple readers, in which case there is no single owner.
- Solaris defines an owner-of-record, which is the first thread that obtained the read lock. If a higher-priority writer blocks on this object, the owner-of-record thread inherits its priority.
- Any other readers cannot inherit the writer's priority, so the solution is limited.
- Priority inheritance reduces the time a high-priority thread must block, but in the worst case this time is still much greater than what is acceptable for many real-time applications.
- Alternative solution: the ceiling protocol. It requires, however, a priori knowledge of all processes in the system and their resource requirements, which is feasible in embedded applications.
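The transitive inheritance performed along the synchronization chain can be sketched for the mutex case, where each lock has a single, known owner. This is an illustrative model rather than the Solaris implementation: will_priority() and waive_priority() are hypothetical stand-ins for pi_willto() and pi_waive(), the struct layouts are assumptions, and a larger number is assumed to mean a higher priority. The real pi_waive() has more to do when a thread still holds other contended locks; the sketch simply drops the inherited value.

```c
#include <assert.h>
#include <stddef.h>

struct mutex;

struct thread {
    int global_pri;            /* the thread's own priority */
    int inherited_pri;         /* 0 when nothing is inherited */
    struct mutex *blocked_on;  /* mutex this thread is waiting for, or NULL */
};

struct mutex {
    struct thread *owner;      /* single, known owner, or NULL if free */
};

/* Priority the scheduler would use: the higher of own and inherited. */
static int effective_pri(const struct thread *t)
{
    return t->inherited_pri > t->global_pri ? t->inherited_pri
                                            : t->global_pri;
}

/* Pass the waiter's priority along the chain of owners. The walk stops at
 * a free mutex, at an owner that is not blocked, or at an owner that
 * already runs at least at this priority (which also bounds the walk). */
void will_priority(struct thread *waiter)
{
    int pri = effective_pri(waiter);
    struct mutex *m = waiter->blocked_on;

    while (m != NULL && m->owner != NULL) {
        struct thread *owner = m->owner;
        if (effective_pri(owner) >= pri)
            break;
        owner->inherited_pri = pri;
        m = owner->blocked_on;   /* follow the chain: transitivity */
    }
}

/* Give up the inherited priority, e.g. when releasing the mutex. */
void waive_priority(struct thread *t)
{
    t->inherited_pri = 0;
}
```

For example, if T3 (priority 50) blocks on a mutex held by T1 (priority 10), and T1 is itself blocked on a mutex held by T2 (priority 5), one call to will_priority() for T3 raises both T1 and T2 to 50. Without the transitive step T2 would keep running at 5 and T3 would still be delayed by medium-priority threads.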
Turnstiles
- Restrict the sleep queue to threads blocked on a particular resource, limiting the time taken to process the queue.
- Threads are queued in order of their priority.
- To unlock a turnstile: signal wakes the single highest-priority thread; broadcast wakes all blocked threads.

Solaris Scheduling Evaluation
- Suitable for multithreaded and many real-time applications on uni- and multiprocessors.
- Still missing other desirable real-time features such as gang scheduling and deadline-driven scheduling.

Linux Scheduling
- Scheduling classes:
  - SCHED_FIFO: first-in-first-out real-time threads
  - SCHED_RR: round-robin real-time threads
  - SCHED_OTHER: other, non-real-time threads
- Within each class multiple priorities may be used.

Non-Real-Time Scheduling
- Linux 2.6 uses a new scheduler, the O(1) scheduler.
- The time to select the appropriate process and assign it to a processor is constant regardless of the load on the system or the number of processors.
- Separate queue for each priority.
- A higher priority is assigned a lower number.

Non-Real-Time Scheduling (cont'd)
- Two queue structures: one for active queues and one for expired queues.
- All scheduling is done from the active queue structure; when it becomes empty, it is swapped with the expired queue structure and scheduling continues.

Calculating Priorities
- For non-real-time tasks, priority changes dynamically as a function of the task's static priority and its execution behavior.
- For real-time tasks, priority is fixed.
- SCHED_FIFO tasks do not have assigned time slices.
- SCHED_RR tasks have assigned time slices, but they are never moved to the expired queue structure.
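The active/expired arrangement of the O(1) scheduler can be sketched with one FIFO queue per priority, a bitmap of non-empty queues, and a pointer swap when the active structure runs dry. This is a simplified model, not kernel code: the names (prio_array, runqueue, enqueue, pick_next) and the 140 priority levels are modeled on Linux 2.6 as assumptions rather than taken from the slides, unsigned int is assumed to be 32 bits wide, and a portable bit scan replaces the hardware find-first-bit instruction a kernel would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPRIO  140                    /* assumed number of priority levels */
#define NWORDS ((NPRIO + 31) / 32)    /* bitmap words, 32 bits each */

struct task {
    int prio;                         /* lower number = higher priority */
    struct task *next;
};

struct prio_array {
    unsigned int bitmap[NWORDS];      /* bit set = queue is non-empty */
    struct task *head[NPRIO];
    struct task *tail[NPRIO];
    int nr_active;
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active;
    struct prio_array *expired;
};

void rq_init(struct runqueue *rq)
{
    memset(rq, 0, sizeof *rq);
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* Append a task to the FIFO queue of its priority. */
void enqueue(struct prio_array *a, struct task *t)
{
    assert(t->prio >= 0 && t->prio < NPRIO);
    t->next = NULL;
    if (a->tail[t->prio] != NULL)
        a->tail[t->prio]->next = t;
    else
        a->head[t->prio] = t;
    a->tail[t->prio] = t;
    a->bitmap[t->prio / 32] |= 1u << (t->prio % 32);
    a->nr_active++;
}

/* Index of the lowest set bit; the caller guarantees word != 0. */
static int lowest_set_bit(unsigned int word)
{
    int bit = 0;
    while ((word & 1u) == 0) {
        word >>= 1;
        bit++;
    }
    return bit;
}

/* Remove and return the highest-priority task. The work is bounded by the
 * fixed bitmap size, independent of how many tasks are queued. */
struct task *pick_next(struct runqueue *rq)
{
    if (rq->active->nr_active == 0) {         /* active empty: swap */
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    struct prio_array *a = rq->active;
    for (int w = 0; w < NWORDS; w++) {
        if (a->bitmap[w] == 0)
            continue;
        int prio = w * 32 + lowest_set_bit(a->bitmap[w]);
        struct task *t = a->head[prio];
        a->head[prio] = t->next;
        if (a->head[prio] == NULL) {          /* queue drained */
            a->tail[prio] = NULL;
            a->bitmap[w] &= ~(1u << (prio % 32));
        }
        a->nr_active--;
        return t;
    }
    return NULL;                              /* nothing runnable */
}
```

A caller that models time-slice expiry would enqueue a non-real-time task on rq->expired after it has run. A SCHED_RR task would instead be put back on rq->active, matching the rule above that such tasks are never moved to the expired queue structure.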