Michael Jugan
Clay Taylor
Xuejuan Zhang
Abstract
A CPU scheduling simulator is implemented in C++ for use on Unix-based systems. The
simulator supports a variety of advanced features such as blocking processes, preemption, and
context switching to accurately model real-world conditions. The simulator works by applying a
scheduling algorithm to a process script. Process scripts are specially formatted files which
describe computer workloads. Six scheduling algorithms are included with the simulator: first-in
first-out (FIFO), shortest job first (SJF), round robin (RR), priority queue (PQ), earliest deadline
first (EDF), and multilevel feedback queue (MLFQ). Furthermore, users may quickly create and
test novel scheduling methods. Two additional programs are provided to aid in using the simulator. The first allows users to run the simulator using a custom command prompt. A second
program helps with the creation of process scripts. These two tools are used to test the six
scheduling algorithms on three different workloads. It is found that MLFQ has the best overall
performance. However, SJF and PQ also give respectable results.
Introduction
The fundamental purpose of an operating system is to allot a computer's resources to the
various processes running on it. One important resource managed by the operating system is
clock cycles of execution on the processor. The operating system must utilize a scheduling
algorithm to decide when, and for how long, each process is allowed to execute. There are many
different scheduling algorithm choices available. Some algorithms attempt to optimize for a
desirable trait such as high throughput or low latency, while others try to obtain a balance. The
act of choosing the best scheduling algorithm for a given system is not intuitive. This problem
motivates the creation of a process-driven simulator. The simulator is used to evaluate an array
of common scheduling algorithms in the context of various synthetic workloads.
The first challenge is that of creating a realistic simulator. In order to perform good
evaluations, the simulator must be able to handle a wide variety of processes. Furthermore, the
processes must be highly configurable. For example, in real-time systems, tasks often have
deadlines they must meet. Therefore, the simulator must handle this attribute in order to
realistically evaluate scheduling algorithms for real-time systems. This is just one of many
possible examples of how adding realism to the simulator also adds complexity. The second
challenge associated with this project is determining which scheduling algorithm performs the
best. The performance of a scheduling algorithm can be evaluated in multiple ways, and thus one generally cannot say that a specific algorithm is superior in every respect. Therefore, each algorithm must be evaluated using a variety of metrics and workloads. Thorough testing will
indicate which algorithms are best for certain situations.
Related Work
Prior to designing the system, recent CPU scheduling work was explored. It was found that
substantial academic research has been performed addressing instruction-set architecture (ISA)
simulation [1-4]. These simulators are tasked with the challenge of efficiently modeling complex
computer architectures. The simulator being designed in this project operates at a much higher
level, and its operation is independent of the system’s hardware. Nevertheless, reading these
research papers made it apparent that a good simulator is customizable, efficient, and easy to use.
In addition to finding research papers, a few scheduling simulation programs were also
discovered. One project, LinSched [5], is particularly impressive. It was developed by a PhD
candidate at the University of North Carolina, and is used by Google [6]. LinSched works by
running the scheduling portions of Linux’s kernel in user space. The simulation’s results are
nearly identical to those obtained by running the actual kernel. Moving the code into user space
greatly aids system developers with the testing of scheduling routines. For example, one benefit
of using LinSched is that scheduler bugs will simply crash the simulator instead of the entire
operating system. This greatly reduces development time.
Although LinSched works at a higher level than the ISA simulators, it is designed to be
used by systems programmers. A slightly higher level scheduling simulator named CPU
Scheduling Simulator (CPUSS) was also found. CPUSS is an open source project developed
from 2007 to 2008, and it has amassed over 6,300 downloads. It is written in C# using Microsoft
Visual Studio. The project’s homepage describes CPUSS as “a framework that allows you to
quickly and easily design and gather metrics for custom CPU scheduling strategies” [7]. It
analyzes scheduling algorithms using more than 20 different metrics. Additionally, it can
generate graphs showing simulation results. Furthermore, new scheduling algorithms may be
created by implementing a C# interface. CPUSS appears to provide Windows users with a fully
featured method for performing scheduling evaluations. No Unix equivalent of CPUSS could be
found.
Design
Figure I: A high-level illustration of the simulator's design
As shown in Figure I, the scheduling simulator is responsible for reading input, tracking
processes' statuses, and outputting results. The system begins with the simulator reading a process
script. This type of file contains a list of processes to run and information about each process.
Processes are described by values such as their issue and execution times. A process’ issue time
is simply the time during the simulation at which the process first requests to use the CPU’s
resources. Execution time is the amount of time that a process must run on the CPU before it
finishes execution.
The simulator is responsible for determining when new processes need to be issued. After
a process has been issued, it may not immediately gain access to the CPU. Issued processes are
initially inactive, but they are eligible to be scheduled. The simulator periodically consults
another system component, the scheduler. The scheduler tells the simulator which inactive
process should be executed by the CPU and for how much time. Only one process may be
executed at any given time, and this process is referred to as the active process.
Scheduler
In this project, six different scheduling algorithms may be used to select the active
process. The simplest, first-in first-out (FIFO), executes processes in the order in which
they are issued. The second algorithm, shortest job first (SJF), runs whichever process has the
least amount of remaining time. SJF uses FIFO to handle processes of the same length. Thirdly,
round robin (RR) cycles through the processes and allots each a fixed amount of time known as a
time-slice. The next algorithm, priority queuing (PQ), runs higher priority processes before lower
priority processes. Processes with the same priority are handled with FIFO. The algorithm
earliest deadline first (EDF) is implemented similarly to PQ. However, instead of running the
highest priority processes, EDF runs those with the lowest valued deadlines. Similarly to PQ,
EDF handles ties using FIFO. Lastly, multi-level feedback queuing (MLFQ) is the most complicated algorithm studied. MLFQ uses several queues, and each is assigned its own time-slice. The highest priority queue typically has the shortest time-slice, and all processes start in
this queue. The MLFQ algorithm selects the first process in the highest priority queue to be the
active process. The active process is run for the length specified by its queue’s time-slice. If the
process does not finish running, it is reinserted into a different queue level. Normally the process
is inserted into a lower priority queue. However, if it was run for less than its allocated time, it
either remains in its original queue, or it is inserted into a higher priority queue. This decision is
made based upon how much time was used relative to the next highest queue’s time-slice.
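
To make the requeuing rule concrete, the following minimal C++ sketch shows one way the decision could be implemented. It is an illustration of the algorithm as described above, not the simulator's actual code; all type and variable names are assumptions.

    #include <cstddef>
    #include <deque>
    #include <vector>

    struct Process { unsigned long remainingCycles; };

    struct Level {
        unsigned long timeSlice;       // this queue's time-slice
        std::deque<Process*> queue;    // processes waiting at this level
    };

    // levels[0] is the highest priority queue (shortest time-slice).
    void Requeue(std::vector<Level>& levels, std::size_t level,
                 Process* p, unsigned long cyclesUsed) {
        if (p->remainingCycles == 0) return;  // finished: do not reinsert
        if (cyclesUsed >= levels[level].timeSlice) {
            // Used its entire slice: demote to a lower priority queue.
            if (level + 1 < levels.size()) ++level;
        } else if (level > 0 && cyclesUsed <= levels[level - 1].timeSlice) {
            // Used less than the next highest queue's slice: promote.
            --level;
        }
        // Otherwise the process remains at its current level.
        levels[level].queue.push_back(p);
    }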
Advanced Features
In addition to supporting multiple scheduling algorithms, the system also includes
features that increase the simulation’s realism. For example, processes may be set up to
periodically block. This feature may be used to more realistically simulate processes containing
blocking I/O function calls.
Moreover, the simulator supports both regular and blocking preemption. With regular
preemption, the scheduler may notify the simulator to swap the active process at any time. For
example, a PQ scheduler may preempt the active process when a higher priority process is
issued. When blocking preemption is enabled, the scheduler automatically deactivates the active
process if it begins to block.
The final advanced feature is that the simulator can account for the overhead of context
switches. It is assumed that a context switch occurs whenever the active process changes. For a
configurable number of cycles, the simulator assumes that no process is active. This feature is
needed to ensure that the simulator’s results model those observed in real systems.
Outputs
Running the simulator results in two kinds of outputs: a detailed trace and a results
summary. The trace describes all important actions performed by the system. For example, it
shows when each process becomes active. Reading the trace is useful for debugging the system
and better understanding how each scheduling algorithm works. The results summary includes a
variety of metrics used to evaluate a scheduler’s performance.
The following statistics are calculated by the simulator: CPU utilization, time the active
process blocks, time spent process switching, worst case and average queuing time, worst case
and average turnaround time, percent of deadlines met, and total runtime. First, CPU utilization
is the percentage of runtime during which the CPU is actively running a process. Ideally, this
value will be 100%; however, it may be less due to the active process blocking, process switching overhead, or simply not having a process that needs to run. Queuing time refers to the
length of time that a process waits to be run, and turnaround time is the amount of time that it
takes a specific process to complete. The percent of deadlines met shows how many processes
met their individual deadlines. Finally, the total runtime is the sum of each process' turnaround
time. These values should give a comprehensive summary of an algorithm's performance.
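
As one simple reading of the per-process metrics above, the following C++ sketch assumes queuing time is measured from issue to first activation; the field and function names are assumptions, not the simulator's actual code.

    struct ProcessTimes {
        unsigned long issueCycle;   // when the process first requests the CPU
        unsigned long startCycle;   // when it first becomes the active process
        unsigned long finishCycle;  // when it finishes execution
    };

    // Time spent waiting before first being run.
    unsigned long QueuingTime(const ProcessTimes& t) {
        return t.startCycle - t.issueCycle;
    }

    // Time from issue to completion.
    unsigned long TurnaroundTime(const ProcessTimes& t) {
        return t.finishCycle - t.issueCycle;
    }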
Implementation
The entire system is written in C++ using object-oriented design principles. The
programs were developed and tested on Unix systems. However, porting the system to Windows
would not be difficult. One important implementation decision to note is that the simulator’s
time unit is referred to as a cycle. However, this is not meant to always represent the actual
length of a CPU’s cycle. CPU speeds vary considerably, and processes running on high
frequency CPUs quickly execute billions of cycles. The simulator performs a considerable
amount of computation each cycle. Therefore, users should uniformly scale down their cycle counts to
reasonable values.
Process Scripts
The format of process scripts is name based. At the top of a script file, each unique
process is named and described. In this section of the file, each line starts with the name of a
process. Variables such as execution time and priority are then listed to describe each process.
The next section of the file, the issue section, is indicated by a line containing the word
BEGIN_ISSUE. Each line in this section contains an issue time and process name. Splitting up
the process scripts into these two sections allows a process' settings to be edited on one line while all of its issue lines remain unchanged. Furthermore, associating actual names with
processes makes interpreting the simulator’s results easier.
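
For illustration, a tiny script in this format might look as follows. The description above fixes only the overall layout; the particular fields shown after each name (here, execution time followed by priority) and their ordering are assumptions.

    editor 100 2
    backup 1000 1
    BEGIN_ISSUE
    0 editor
    50 backup
    300 editor

In this hypothetical example, the process named editor executes for 100 cycles at priority 2, and each line after BEGIN_ISSUE issues the named process at the given time.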
Simulator Class
The project’s main class, named Simulator, contains two public methods. These functions
are LoadScript and Run. The LoadScript method accepts one argument, the name of a process
script. The process script is opened and its contents are read into the Simulator. A Process object
is created for each unique process in the file.
The Run method is the most crucial element of the Simulator. In this method, the
Simulator interacts closely with a Scheduler object. Scheduler is the name of a base class with
two pure virtual methods: AddProcess and GetNextProcess. The scheduling
algorithms are implemented as derived classes of the base class. Each derived class adds its own
functionality to the Scheduler’s virtual methods. The Run method calls AddProcess whenever a
process is issued. The Scheduler then stores this process in an internal data structure such as a
queue. When a scheduler’s GetNextProcess routine is called, a process is removed from the data
structure and returned to the Simulator.
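
The relationship between the base class and a derived scheduler can be sketched as follows. Only the names Scheduler, AddProcess, and GetNextProcess come from the text; the signatures, the time-slice out-parameter, and the FifoScheduler class are assumptions.

    #include <queue>

    class Process;

    class Scheduler {
    public:
        virtual ~Scheduler() = default;
        // Called by Simulator::Run whenever a process is issued.
        virtual void AddProcess(Process* p) = 0;
        // Returns the process to activate next and, through timeSlice,
        // how long it may run (0 here meaning "until it finishes or blocks").
        virtual Process* GetNextProcess(unsigned long& timeSlice) = 0;
    };

    // FIFO stores issued processes in a queue and returns them in order.
    class FifoScheduler : public Scheduler {
    public:
        void AddProcess(Process* p) override { ready.push(p); }
        Process* GetNextProcess(unsigned long& timeSlice) override {
            timeSlice = 0;
            if (ready.empty()) return nullptr;
            Process* p = ready.front();
            ready.pop();
            return p;
        }
    private:
        std::queue<Process*> ready;
    };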
Three optional parameters may be included in calls to Run. These are traceModeEnabled,
preemptBlocksEnabled, and procSwitchCycles. The first two arguments are boolean values.
When traceModeEnabled is true, the simulator outputs the detailed trace described in the Outputs section. The second option, preemptBlocksEnabled, is used to enable the blocking preemption
feature. Finally, the last parameter is an unsigned long. This is the number of cycles associated
with the overhead of a process switch.
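
Taken together, the declaration might resemble the following sketch; the parameter names and types come from the text, while the default values are assumptions.

    // Hypothetical declaration of Simulator::Run with its optional parameters.
    void Run(bool traceModeEnabled = false,
             bool preemptBlocksEnabled = false,
             unsigned long procSwitchCycles = 0);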
Schedule_sim Program
A program named schedule_sim serves as a command line based interface to the
Simulator class. This allows users to quickly and easily apply different scheduling algorithms to
process scripts. A single optional parameter allows users to set the program’s prompt text. After
starting the program, the user can enter one of four commands: LOAD_SCRIPT,
SET_PARAMS, RUN, and HELP.
The first two commands, LOAD_SCRIPT and SET_PARAMS, are called before RUN.
LOAD_SCRIPT simply takes the name of a process script as an argument. SET_PARAMS
allows users to set the optional arguments of the Simulator's Run method. The RUN command
takes one or more arguments. The first argument is the name of the scheduling algorithm to use.
All other parameters are specific to each scheduler. Table I shows the possible arguments for the
RUN command. As shown, FIFO and SJF do not require any additional parameters; however,
the more complicated schedulers must be given time-slice values to use. Finally, the HELP command
simply displays a list of possible commands. The schedule_sim program performs basic error
checking on the user’s input. For example, it notifies users when invalid arguments or commands
are used. These measures were taken to ensure that the program is user friendly.
Table I: Parameters for schedule_sim’s RUN command
Scheduler Name                  RUN Command's Arguments
first-in first-out              FIFO
round robin                     RR timeSliceSize
shortest job first              SJF
priority queuing                PQ timeSliceSize
earliest deadline first         EDF timeSliceSize
multi-level feedback queuing    MLFQ timeSliceSize1, ... , timeSliceSizeN
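
A hypothetical schedule_sim session might look like the following; the prompt text and script filename are invented for illustration, while the command names and RUN arguments follow Table I.

    sim> LOAD_SCRIPT basic_workload.txt
    sim> RUN FIFO
    sim> RUN RR 100
    sim> RUN MLFQ 100 200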
Script_gen Program
A second program, script_gen, was written to aid with the task of creating process scripts.
This program accepts two arguments: a random number seed and an output filename. The
program prompts the user to enter a process name. Then, it asks the user to supply the variables
describing a process such as its priority and deadline. Three additional values are also requested:
static issue period, random issue period, and quantity. These values allow the program to create the specified quantity of issue-time entries for the process, as sketched below. Consecutive issue times are separated by at least the static issue period. Additionally, a random amount of time, up to the random issue period, may be added to each issue time. After acquiring these details for a single process, the program then asks if there are
any more processes. It repeats this routine until the user indicates that all processes have been
created. This program makes the creation of large scripts practical.
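
The issue-time spacing rule can be summarized in a few lines of C++. This is a sketch of the described behavior, not script_gen's actual code; the function and variable names are assumptions.

    #include <cstdlib>
    #include <iostream>
    #include <string>

    // Emit `quantity` issue lines for one process. Consecutive issue times
    // are separated by at least staticIssuePeriod, plus up to
    // randomIssuePeriod additional cycles chosen at random.
    void EmitIssueLines(const std::string& name,
                        unsigned long staticIssuePeriod,
                        unsigned long randomIssuePeriod,
                        unsigned long quantity) {
        unsigned long t = 0;
        for (unsigned long i = 0; i < quantity; ++i) {
            t += staticIssuePeriod;
            if (randomIssuePeriod > 0)
                t += std::rand() % (randomIssuePeriod + 1);
            std::cout << t << " " << name << "\n";  // one issue-section line
        }
    }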
Evaluation
The first step we took in evaluating our simulator was to verify the correctness of its
results. We did this by running it in trace mode with small workloads consisting of ten or fewer
processes and then checking the results by hand. This proved to be valuable as we were able to
catch some errors we had made in implementing our design. After we were satisfied that the
simulator was behaving as expected, we proceeded to evaluate how the various scheduling
algorithms performed for three workloads.
Basic Workload
The first workload we evaluated was a basic workload consisting of an even mixture of
short and long processes that did not block and had no deadlines. The short processes had a
runtime of 100 cycles and the long processes 1000 cycles, with 100 of each issued. We
gave the short processes a higher priority than the long processes as we felt a typical user would
want these to complete faster and would notice increased turnaround times for them more than for the long processes. Finally, we chose an issue period of two to four times each process's execution time, as this yielded a good number of active processes without overwhelming the CPU.
We ran this workload under the FIFO, RR, SJF, PQ, and MLFQ schedulers but chose to
exclude the EDF scheduler since this workload featured no deadlines. For the RR scheduler we
evaluated time slices of 10, 100, 500, and 1000 cycles. For the MLFQ scheduler we used a two-level queue with the top level having a time slice of 100 cycles and the bottom level having a
time slice of 200 cycles. For the first experiment, we ran the simulator with no overhead and no
preemption on blocking. The results of this test can be seen in Figure II.
Figure II: Basic workload simulation results
From these charts we can see that the FIFO and RR schedulers with a large time slice
favor the long processes while the SJF, PQ, MLFQ, and RR schedulers with a small time slice
favor the short processes. Unsurprisingly, as we increase the size of the time slice for RR it
begins to resemble the FIFO results. For the SJF, PQ, and MLFQ schedulers the worst case
queuing time for the long processes is much worse than for any of the other schedulers. Finally,
the MLFQ scheduler achieves much better queuing time for the long processes than the SJF and
PQ schedulers by sacrificing just a small amount of extra queuing time for the short processes.
We also evaluated how the same set of schedulers responded to the addition of overhead
for switching processes with the same workload. For this experiment, we ran the simulator with
100 and 200 cycles of overhead, the results of which can be seen in Figure III. The first thing to
take note of from these charts is that the RR schedulers with smaller time slices suffer significantly larger drops in performance because they switch processes far more often. Another
interesting trend is that most schedulers don't feature a performance loss when the overhead is
doubled. This is because the workload used has an ideal utilization of roughly 55%, so the idle
time effectively absorbs the switching time. However, the MLFQ scheduler does show such a drop: it switches often enough that doubling the overhead exceeds what the idle time can absorb, and performance begins to suffer.
Figure III: Basic workload simulation results with process switching overhead
Blocking Workload
The next workload that we evaluated was one that featured blocking behavior so that we
could explore how the scheduling algorithms behave for blocking as well as how preemption on
blocking affects these results. The two types of processes used in this workload both had an
execution time of 100 cycles, issue period of 1000 to 2000 cycles, and a blocking time of 1000
cycles, but one blocked once after 50 cycles and the other blocked four times after 20 cycles. We
gave the processes that block more often a higher priority, under the assumption that it was more important for them to quickly reach the point at which blocking begins, since it occurs more often. Finally, neither type of process featured deadlines.
For the scheduling algorithms we chose to evaluate FIFO, RR, SJF, PQ, and MLFQ. We
excluded EDF once again as this workload did not include deadlines. The RR scheduler was
evaluated with time slices of 20, 50, and 100 cycles. The MLFQ scheduler had a time slice of 21
cycles for the top queue and 51 cycles for the bottom queue. These values were chosen to be
slightly larger than the blocking periods so that they would incur the blocking penalty and be
placed in the appropriate queue level. The results of this test can be seen in Figure IV.
Figure IV: Blocking workload simulation results
One can see from these charts that the RR and MLFQ schedulers achieve a utilization and
runtime without preemption that is equivalent to having preemption while the FIFO, SJF, and PQ
schedulers do not. This is because the latter place no restriction on how long a process can occupy the CPU, while the former place a guaranteed upper bound on it. Furthermore, the ideal
utilization of this workload is only around 19%, so it features the same absorption of blocking
time by idle time discussed in the previous section for overhead. One can see that increasing the
RR time slice size does indeed also increase the number of cycles the CPU spends blocking as
well as the average queuing time, but it is not significant enough to hit the ceiling. With
preemption all schedulers achieve similar results except for slight variations in queuing.
Deadline Workload
The final workload we evaluated was one that featured deadlines that the processes were
expected to meet. The first type of process had an execution time of 10 cycles, a deadline of 100
cycles, and an issue period of 50 cycles. The second type had an execution time of 50 cycles, a
deadline of 500 cycles, and an issue period of 100 cycles. The third type had an execution time
of 100 cycles, a deadline of 1000 cycles, and an issue period of 250 cycles. The processes were
given increasing priority for decreasing deadlines. These values were chosen so that the workload was theoretically schedulable while still giving a good spread of behavior among the schedulers.
The schedulers evaluated under this workload were FIFO, RR, SJF, PQ, MLFQ, and
EDF. For RR we tested with time slices of 1, 10, and 50 cycles, and for MLFQ we tested with
time slices of 10/90, 10/40/50, and 10/20/40 cycles with the first number of each series being the
time slice for the highest priority queue. The results of this final test can be seen in Figure V.
Figure V: Deadline workload simulation results
Unsurprisingly, the EDF scheduler meets all deadlines, since meeting them all is theoretically possible.
The only other schedulers that come reasonably close are the SJF and the 10/40/50 MLFQ
schedulers. The PQ and other MLFQ schedulers do reasonably well at meeting the deadlines.
The FIFO and RR schedulers do a poor job of meeting the deadlines on average. However, the
FIFO scheduler and the RR scheduler with a time slice of 50 cycles do a good job for the
medium and long processes. On the other hand, the other RR schedulers don't favor any of the
process types with all having a deadline meeting percentage roughly equivalent to the average.
Conclusion
In this paper we presented the design and implementation of a scheduling simulator and
its use in evaluating several basic scheduling algorithms. These algorithms consisted of first-in
first-out (FIFO), shortest job first (SJF), round robin (RR), priority queue (PQ), earliest deadline
first (EDF), and finally the multilevel feedback queue (MLFQ). Our simulation environment
provides the capacity for processes to block for I/O and to be preempted for various reasons. The
simulator provided the means to construct various workloads, which we utilized to test these
algorithms under a basic workload, a workload that featured blocking, and a workload that
featured deadlines.
From our evaluation of these results, we found that MLFQ seems to be the most
well-rounded scheduling algorithm. It has good average queuing and turnaround times, isn't
extremely susceptible to performance drops due to switch overhead or blocking, and manages to
do a good job of meeting deadlines when tuned properly. SJF and PQ are close to it in
performance, but they are susceptible to performance drops due to blocking without preemption.
Furthermore, they also have the unrealistic requirement of having to know the properties of the
workload ahead of time.
RR does a good job of minimizing time wasted on blocking with a low enough time slice,
but it is also the most susceptible to performance drops from switching overhead. FIFO doesn't
favor processes of any length like the other scheduling algorithms do, so it is a good choice if this is one's definition of fairness. However, just like SJF and PQ, it suffers from performance drops due to
blocking without preemption. Finally, EDF is the only one that managed to meet all of the
deadlines, so if meeting deadlines is a requirement, it is the only scheduling algorithm up to the task.
References
[1] J. Zhu et al., "A Retargetable, Ultra-fast Instruction Set Simulator," DATE, 1999.
[2] E. Schnarr et al., "Facile: A Language and Compiler for High-Performance Processor Simulators," PLDI, Jun. 2001.
[3] A. Nohl et al., "A Universal Technique for Fast and Flexible Instruction-Set Architecture Simulation," DAC, 2002.
[4] M. Reshadi, N. Bansal, P. Mishra, and N. Dutt, "An Efficient Retargetable Framework for Instruction-Set Simulation," ISSS'03, Oct. 2003.
[5] "LinSched: The Linux Scheduler Simulator," http://www.cs.unc.edu/~jmc/linsched/. [Accessed May 4, 2012]
[6] "Linux Scheduler Simulation," http://www.ibm.com/developerworks/linux/library/l-linux-scheduler-simulator/. [Accessed May 4, 2012]
[7] "CPU Scheduling Simulator," http://cpuss.codeplex.com/. [Accessed May 4, 2012]