Michael Jugan, Clay Taylor, Xuejuan Zhang

Abstract

A CPU scheduling simulator is implemented in C++ for use on Unix-based systems. The simulator supports a variety of advanced features, such as blocking processes, preemption, and context switching, to accurately model real-world conditions. The simulator works by applying a scheduling algorithm to a process script. Process scripts are specially formatted files that describe computer workloads. Six scheduling algorithms are included with the simulator: first-in first-out (FIFO), shortest job first (SJF), round robin (RR), priority queue (PQ), earliest deadline first (EDF), and multilevel feedback queue (MLFQ). Furthermore, users may quickly create and test novel scheduling methods. Two additional programs are provided to aid in using the scheduler. The first allows users to run the simulator from a custom command prompt. The second helps with the creation of process scripts. These two tools are used to test the six scheduling algorithms on three different workloads. It is found that MLFQ has the best overall performance; however, SJF and PQ also give respectable results.

Introduction

The fundamental purpose of an operating system is to allot a computer's resources to the various processes running on it. One important resource managed by the operating system is clock cycles of execution on the processor. The operating system must utilize a scheduling algorithm to decide when, and for how long, each process is allowed to execute. There are many different scheduling algorithms available. Some attempt to optimize for a desirable trait such as high throughput or low latency, while others try to strike a balance. Choosing the best scheduling algorithm for a given system is not intuitive. This problem motivates the creation of a process-driven simulator. The simulator is used to evaluate an array of common scheduling algorithms in the context of various synthetic workloads.
The first challenge is that of creating a realistic simulator. In order to perform good evaluations, the simulator must be able to handle a wide variety of processes. Furthermore, the processes must be highly configurable. For example, in real-time systems, tasks often have deadlines they must meet. Therefore, the simulator must handle this attribute in order to realistically evaluate scheduling algorithms for real-time systems. This is just one of many examples of how adding realism to the simulator also adds complexity. The second challenge associated with this project is determining which scheduling algorithm performs best. The performance of a scheduling algorithm can be evaluated in multiple ways, so one generally cannot say that a specific algorithm is superior to all others. Therefore, each algorithm must be evaluated using a variety of metrics and workloads. Thorough testing will indicate which algorithms are best for certain situations.

Related Work

Prior to designing a system, recent CPU scheduling work was explored. Substantial academic research has addressed instruction-set architecture (ISA) simulation [1-4]. These simulators are tasked with the challenge of efficiently modeling complex computer architectures. The simulator designed in this project operates at a much higher level, and its operation is independent of the system's hardware. Nevertheless, reading these research papers made it apparent that a good simulator is customizable, efficient, and easy to use. In addition to research papers, a few scheduling simulation programs were also discovered. One project, LinSched [5], is particularly impressive. It was developed by a PhD candidate at the University of North Carolina and is used by Google [6]. LinSched works by running the scheduling portions of Linux's kernel in user space. The simulation's results are nearly identical to those obtained by running the actual kernel.
Moving the code into user space greatly aids system developers with the testing of scheduling routines. For example, one benefit of using LinSched is that scheduler bugs will simply crash the simulator instead of the entire operating system. This greatly reduces development time. Although LinSched works at a higher level than the ISA simulators, it is designed to be used by systems programmers. A slightly higher-level scheduling simulator named CPU Scheduling Simulator (CPUSS) was also found. CPUSS is an open-source project developed from 2007 to 2008, and it has amassed over 6,300 downloads. It is written in C# using Visual Studio on Windows. The project's homepage describes CPUSS as "a framework that allows you to quickly and easily design and gather metrics for custom CPU scheduling strategies" [7]. It analyzes scheduling algorithms using more than 20 different metrics. Additionally, it can generate graphs showing simulation results. Furthermore, new scheduling algorithms may be created by implementing a C# interface. CPUSS appears to provide Windows users with a fully featured method for performing scheduling evaluations. No Unix equivalent of CPUSS could be found.

Design

Figure I: A high-level illustration of the simulator's design

As shown in Figure I, the scheduling simulator is responsible for reading input, tracking processes' statuses, and outputting results. The system begins with the simulator reading a process script. This type of file contains a list of processes to run and information about each process. Processes are described by values such as their issue and execution times. A process's issue time is simply the time during the simulation at which the process first requests to use the CPU's resources. Execution time is the amount of time that a process must run on the CPU before it finishes execution. The simulator is responsible for determining when new processes need to be issued.
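To make the issue mechanism concrete, the check the simulator performs each cycle can be sketched as follows. This is only an illustrative sketch: the ProcessRecord type, its field names, and the IssueDueProcesses function are assumptions made for this example, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record read from a process script (names assumed for illustration).
struct ProcessRecord {
    std::string name;
    unsigned long issueTime;  // cycle at which the process first requests the CPU
    unsigned long execTime;   // cycles of CPU time needed before the process finishes
};

// Returns the processes that become eligible at the given cycle, assuming
// `pending` is sorted by ascending issue time and `next` indexes the first
// not-yet-issued record. Advances `next` past everything it issues.
std::vector<ProcessRecord> IssueDueProcesses(const std::vector<ProcessRecord>& pending,
                                             std::size_t& next,
                                             unsigned long cycle) {
    std::vector<ProcessRecord> issued;
    while (next < pending.size() && pending[next].issueTime <= cycle) {
        issued.push_back(pending[next]);
        ++next;
    }
    return issued;
}
```

Keeping the pending list sorted by issue time means each simulated cycle only has to inspect the front of the list, which matters given how many cycles a simulation runs.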
After a process has been issued, it may not immediately gain access to the CPU. Issued processes are initially inactive, but they are eligible to be scheduled. The simulator periodically consults another system component, the scheduler. The scheduler tells the simulator which inactive process should be executed by the CPU and for how much time. Only one process may be executed at any given time, and this process is referred to as the active process.

Scheduler

In this project, six different scheduling algorithms may be used to select the active process. The simplest, first-in first-out (FIFO), simply executes processes in the order in which they are issued. The second algorithm, shortest job first (SJF), runs whichever process has the least amount of remaining time. SJF uses FIFO to handle processes of the same length. Thirdly, round robin (RR) cycles through the processes and allots each a fixed amount of time known as a time-slice. The next algorithm, priority queuing (PQ), runs higher-priority processes before lower-priority processes. Processes with the same priority are handled with FIFO. Earliest deadline first (EDF) is implemented similarly to PQ; however, instead of running the highest-priority processes, EDF runs those with the lowest-valued deadlines. Like PQ, EDF handles ties using FIFO. Lastly, multi-level feedback queuing (MLFQ) is the most complicated algorithm studied. MLFQ uses several queues, and each is assigned its own time-slice. The highest-priority queue typically has the shortest time-slice, and all processes start in this queue. The MLFQ algorithm selects the first process in the highest-priority queue to be the active process. The active process is run for the length specified by its queue's time-slice. If the process does not finish running, it is reinserted into a different queue level. Normally the process is inserted into a lower-priority queue.
However, if it ran for less than its allocated time, it either remains in its original queue or is inserted into a higher-priority queue. This decision is based upon how much time was used relative to the next-highest queue's time-slice.

Advanced Features

In addition to supporting multiple scheduling algorithms, the system includes features that increase the simulation's realism. For example, processes may be set up to periodically block. This feature may be used to more realistically simulate processes containing blocking I/O function calls. Moreover, the simulator supports both regular and blocking preemption. With regular preemption, the scheduler may notify the simulator to swap the active process at any time. For example, a PQ scheduler may preempt the active process when a higher-priority process is issued. When blocking preemption is enabled, the scheduler automatically deactivates the active process when it begins to block. The final advanced feature is that the simulator can account for the overhead of context switches. It is assumed that a context switch occurs whenever the active process changes. For a configurable number of cycles, the simulator assumes that no process is active. This feature is needed to ensure that the simulator's results model those observed in real systems.

Outputs

Running the simulator produces two kinds of output: a detailed trace and a results summary. The trace describes all important actions performed by the system. For example, it shows when each process becomes active. Reading the trace is useful for debugging the system and better understanding how each scheduling algorithm works. The results summary includes a variety of metrics used to evaluate a scheduler's performance.
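To give a sense of what the trace records, a hypothetical excerpt is shown below. The line format and the process names are invented for this illustration; they are not the simulator's actual output.

```
[cycle 0]    issued process "editor"
[cycle 0]    "editor" becomes the active process
[cycle 40]   "editor" blocks; context switch begins
[cycle 45]   "compiler" becomes the active process
[cycle 145]  "compiler" finishes execution (turnaround: 145 cycles)
```

An entry of this kind for every issue, activation, block, and completion is what makes it practical to verify a scheduler's behavior by hand on small workloads.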
The following statistics are calculated by the simulator: CPU utilization, time the active process blocks, time spent process switching, worst-case and average queuing time, worst-case and average turnaround time, percent of deadlines met, and total runtime. First, CPU utilization is the percentage of runtime during which the CPU is actively running a process. Ideally, this value will be 100%; however, it may be lower due to an active process blocking, process-switching overhead, or simply having no process that needs to run. Queuing time refers to the length of time that a process waits to be run, and turnaround time is the amount of time that it takes a specific process to complete. The percent of deadlines met shows how many processes met their individual deadlines. Finally, the total runtime is the sum of each process's turnaround time. These values should give a comprehensive summary of an algorithm's performance.

Implementation

The entire system is written in C++ using object-oriented design principles. The programs were developed and tested on Unix systems; however, porting the system to Windows would not be difficult. One important implementation decision to note is that the simulator's time unit is referred to as a cycle. However, this is not always meant to represent the actual length of a CPU's cycle. CPU speeds vary considerably, and processes running on high-frequency CPUs quickly execute billions of cycles. The simulator performs a considerable amount of computation each cycle. Therefore, users should uniformly scale down their cycle counts to reasonable values.

Process Scripts

The format of process scripts is name based. At the top of a script file, each unique process is named and described. In this section of the file, each line starts with the name of a process. Variables such as execution time and priority are then listed to describe each process. The next section of the file, the issue section, is indicated by a line containing the word BEGIN_ISSUE.
Each line in this section contains an issue time and a process name. Splitting process scripts into these two sections allows a process's settings to be edited on one line while all of its issue lines remain unchanged. Furthermore, associating actual names with processes makes interpreting the simulator's results easier.

Simulator Class

The project's main class, named Simulator, contains two public methods: LoadScript and Run. The LoadScript method accepts one argument, the name of a process script. The process script is opened and its contents are read into the Simulator. A Process object is created for each unique process in the file. The Run method is the most crucial element of the Simulator. In this method, the Simulator interacts closely with a Scheduler object. Scheduler is the name of a base class with two pure virtual methods: AddProcess and GetNextProcess. The scheduling algorithms are implemented as derived classes of this base class. Each derived class adds its own functionality to the Scheduler's virtual methods. The Run method calls AddProcess whenever a process is issued. The Scheduler then stores this process in an internal data structure such as a queue. When a scheduler's GetNextProcess routine is called, a process is removed from the data structure and returned to the Simulator. Three optional parameters may be included in calls to Run: traceModeEnabled, preemptBlocksEnabled, and procSwitchCycles. The first two arguments are boolean values. When traceModeEnabled is true, the scheduler outputs the detailed trace described in the Outputs section. The second option, preemptBlocksEnabled, enables the blocking preemption feature. Finally, the last parameter is an unsigned long giving the number of cycles associated with the overhead of a process switch.

Schedule_sim Program

A program named schedule_sim serves as a command-line interface to the Simulator class.
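Based on the description of the Scheduler base class above, its interface and a FIFO specialization might be sketched as follows. The method names AddProcess and GetNextProcess come from the text; the Process type and every other detail here are assumptions made for illustration, not the actual source.

```cpp
#include <cassert>
#include <queue>

// Minimal stand-in for the simulator's Process objects (fields assumed).
struct Process {
    int id;
    unsigned long remainingCycles;
};

// Base class with the two pure virtual methods described in the text.
class Scheduler {
public:
    virtual ~Scheduler() = default;
    virtual void AddProcess(Process* p) = 0;   // called when a process is issued
    virtual Process* GetNextProcess() = 0;     // picks the next active process
};

// FIFO: run processes strictly in the order they were issued.
class FIFOScheduler : public Scheduler {
public:
    void AddProcess(Process* p) override { queue_.push(p); }
    Process* GetNextProcess() override {
        if (queue_.empty()) return nullptr;    // no process is ready to run
        Process* next = queue_.front();
        queue_.pop();
        return next;
    }
private:
    std::queue<Process*> queue_;
};
```

Under this structure, swapping scheduling algorithms only means constructing a different derived class and handing it to the Simulator, which is what makes new scheduling methods quick to create and test.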
This allows users to quickly and easily apply different scheduling algorithms to process scripts. A single optional parameter allows users to set the program's prompt text. After starting the program, the user can enter one of four commands: LOAD_SCRIPT, SET_PARAMS, RUN, and HELP. The first two commands, LOAD_SCRIPT and SET_PARAMS, are called before RUN. LOAD_SCRIPT simply takes the name of a process script as an argument. SET_PARAMS allows users to set the optional arguments of the Simulator's Run method. The RUN command takes one or more arguments. The first argument is the name of the scheduling algorithm to use. All other parameters are specific to each scheduler. Table I shows the possible arguments for the RUN command. As shown, FIFO and SJF do not require any additional parameters; however, the more complicated schedulers must be given time-slice values to use. Finally, the HELP command simply displays a list of possible commands. The schedule_sim program performs basic error checking on the user's input. For example, it notifies users when invalid arguments or commands are used. These measures were taken to ensure that the program is user friendly.

Table I: Parameters for schedule_sim's RUN command

  Scheduler Name                  RUN Command's Arguments
  first-in first-out              FIFO
  round robin                     RR timeSliceSize
  shortest job first              SJF
  priority queuing                PQ timeSliceSize
  earliest deadline first         EDF timeSliceSize
  multi-level feedback queuing    MLFQ timeSliceSize1, ..., timeSliceSizeN

Script_gen Program

A second program, script_gen, was written to aid with the task of creating process scripts. This program accepts two arguments: a random number seed and an output filename. The program prompts the user to enter a process name. Then, it asks the user to supply the variables describing the process, such as its priority and deadline. Three additional values are also requested: static issue period, random issue period, and quantity.
These values allow the program to create the requested quantity of issue-time entries for the process. Consecutive issue times are separated by at least the static issue period. Additionally, up to the random issue period may be added to each issue time. After acquiring these details for a single process, the program asks whether there are any more processes. It repeats this routine until the user indicates that all processes have been created. This program makes the creation of large scripts practical.

Evaluation

The first step we took in evaluating our simulator was to verify the correctness of its results. We did this by running it in trace mode with small workloads consisting of ten or fewer processes and then checking the results by hand. This proved valuable, as we were able to catch some errors we had made in implementing our design. After we were satisfied that the simulator was behaving as expected, we proceeded to evaluate how the various scheduling algorithms performed for three workloads.

Basic Workload

The first workload we evaluated was a basic one consisting of an even mixture of short and long processes that did not block and had no deadlines. The short processes had an execution time of 100 cycles and the long processes 1000 cycles, with 100 of each issued. We gave the short processes a higher priority than the long processes, as we felt a typical user would want these to complete faster and would notice increased turnaround times for them more than for the long processes. Finally, we chose an issue period of two to four times the execution time of each process, as this yielded a good number of active processes without overwhelming the CPU. We ran this workload under the FIFO, RR, SJF, PQ, and MLFQ schedulers but chose to exclude the EDF scheduler since this workload featured no deadlines. For the RR scheduler we evaluated time slices of 10, 100, 500, and 1000 cycles.
For the MLFQ scheduler we used a two-level queue with the top level having a time slice of 100 cycles and the bottom level a time slice of 200 cycles. For the first experiment, we ran the simulator with no overhead and no preemption on blocking. The results of this test can be seen in Figure II.

Figure II: Basic workload simulation results

From these charts we can see that the FIFO scheduler and the RR schedulers with a large time slice favor the long processes, while the SJF, PQ, MLFQ, and RR schedulers with a small time slice favor the short processes. Unsurprisingly, as we increase the size of the RR time slice, its results begin to resemble those of FIFO. For the SJF, PQ, and MLFQ schedulers, the worst-case queuing time for the long processes is much worse than for any of the other schedulers. Finally, the MLFQ scheduler achieves much better queuing time for the long processes than the SJF and PQ schedulers by sacrificing just a small amount of extra queuing time for the short processes. We also evaluated how the same set of schedulers responded to the addition of process-switching overhead with the same workload. For this experiment, we ran the simulator with 100 and 200 cycles of overhead; the results can be seen in Figure III. The first thing to note from these charts is that the RR schedulers with smaller time slices suffer much more significant drops in performance because they switch processes much more often. Another interesting trend is that most schedulers do not suffer a performance loss when the overhead is doubled. This is because the workload has an ideal utilization of roughly 55%, so the idle time effectively absorbs the switching time. However, the MLFQ scheduler does suffer such a drop because it performs enough switching that doubling the overhead exceeds the ceiling of this absorption effect, and performance starts to suffer.
Figure III: Basic workload simulation results with process switching overhead

Blocking Workload

The next workload we evaluated featured blocking behavior so that we could explore how the scheduling algorithms handle blocking and how preemption on blocking affects the results. The two types of processes used in this workload both had an execution time of 100 cycles, an issue period of 1000 to 2000 cycles, and a blocking time of 1000 cycles, but one blocked once after 50 cycles of execution while the other blocked four times at 20-cycle intervals. We gave the processes that block more often a higher priority under the assumption that it is more important for them to reach the point of starting to block, since blocking occurs more often. Finally, neither type of process featured deadlines. For the scheduling algorithms we chose to evaluate FIFO, RR, SJF, PQ, and MLFQ. We excluded EDF once again, as this workload did not include deadlines. The RR scheduler was evaluated with time slices of 20, 50, and 100 cycles. The MLFQ scheduler had a time slice of 21 cycles for the top queue and 51 cycles for the bottom queue. These values were chosen to be slightly larger than the blocking periods so that processes would incur the blocking penalty and be placed in the appropriate queue level. The results of this test can be seen in Figure IV.

Figure IV: Blocking workload simulation results

One can see from these charts that the RR and MLFQ schedulers achieve a utilization and runtime without preemption that is equivalent to having preemption, while the FIFO, SJF, and PQ schedulers do not. This is because the latter place no restriction on how long a process can sit on the CPU, while the former place a guaranteed upper bound on it. Furthermore, the ideal utilization of this workload is only around 19%, so it features the same absorption of blocking time by idle time discussed in the previous section for overhead.
One can see that increasing the RR time slice does indeed increase the number of cycles the CPU spends blocking as well as the average queuing time, but not significantly enough to hit the ceiling. With preemption, all schedulers achieve similar results except for slight variations in queuing.

Deadline Workload

The final workload we evaluated featured deadlines that the processes were expected to meet. The first type of process had an execution time of 10 cycles, a deadline of 100 cycles, and an issue period of 50 cycles. The second type had an execution time of 50 cycles, a deadline of 500 cycles, and an issue period of 100 cycles. The third type had an execution time of 100 cycles, a deadline of 1000 cycles, and an issue period of 250 cycles. The processes were given increasing priority for decreasing deadlines. These values were chosen to be theoretically solvable but to give a good spread of behavior among the schedulers. The schedulers evaluated under this workload were FIFO, RR, SJF, PQ, MLFQ, and EDF. For RR we tested time slices of 1, 10, and 50 cycles, and for MLFQ we tested time slices of 10/90, 10/40/50, and 10/20/40 cycles, with the first number of each series being the time slice for the highest-priority queue. The results of this final test can be seen in Figure V.

Figure V: Deadline workload simulation results

Unsurprisingly, the EDF scheduler meets all deadlines, since doing so is theoretically possible. The only other schedulers that come reasonably close are SJF and the 10/40/50 MLFQ configuration. The PQ and remaining MLFQ schedulers do reasonably well at meeting the deadlines. The FIFO and RR schedulers do a poor job of meeting the deadlines on average. However, the FIFO scheduler and the RR scheduler with a time slice of 50 cycles do a good job for the medium and long processes.
On the other hand, the other RR schedulers don't favor any of the process types, with all having a deadline-meeting percentage roughly equivalent to the average.

Conclusion

In this paper we presented the design and implementation of a scheduling simulator and its use in evaluating several basic scheduling algorithms: first-in first-out (FIFO), shortest job first (SJF), round robin (RR), priority queue (PQ), earliest deadline first (EDF), and multilevel feedback queue (MLFQ). Our simulation environment provides the capacity for processes to block for I/O and to be preempted for various reasons. The simulator provided the means to implement various workloads, which we used to test these algorithms under a basic workload, a workload that featured blocking, and a workload that featured deadlines. From our evaluation of the results, we found that MLFQ seems to be the most well-rounded scheduling algorithm. It has good average queuing and turnaround times, isn't extremely susceptible to performance drops due to switching overhead or blocking, and manages to do a good job of meeting deadlines when tuned properly. SJF and PQ are close to it in performance, but they are susceptible to performance drops due to blocking without preemption. Furthermore, they also have the unrealistic requirement of knowing the properties of the workload ahead of time. RR does a good job of minimizing time wasted on blocking with a low enough time slice, but it is also the most susceptible to performance drops from switching overhead. FIFO doesn't favor processes of any length like the other scheduling algorithms do, so it is a good choice if that is one's definition of fairness. However, just like SJF and PQ, it suffers from performance drops due to blocking without preemption. Finally, EDF is the only scheduler that managed to meet all of the deadlines, so if this is a requirement, it is the only scheduling algorithm up to the task.
References

[1] J. Zhu et al. A Retargetable, Ultra-fast Instruction Set Simulator. DATE, 1999.
[2] E. Schnarr et al. Facile: A Language and Compiler for High-Performance Processor Simulators. PLDI, Jun. 2001.
[3] A. Nohl et al. A Universal Technique for Fast and Flexible Instruction-Set Architecture Simulation. DAC, 2002.
[4] M. Reshadi, N. Bansal, P. Mishra, N. Dutt. An Efficient Retargetable Framework for Instruction-Set Simulation. ISSS'03, Oct. 1-3, 2003.
[5] "LinSched: The Linux Scheduler Simulator," http://www.cs.unc.edu/~jmc/linsched/. [Accessed May 4, 2012]
[6] "Linux Scheduler Simulation," http://www.ibm.com/developerworks/linux/library/l-linuxscheduler-simulator/. [Accessed May 4, 2012]
[7] "CPU Scheduling Simulator," http://cpuss.codeplex.com/. [Accessed May 4, 2012]