Process Scheduling 5..

advertisement
Unix Internals
Process Scheduling 5
Unix scheduler
 policy
 implementation
context switch
 hardware execution context of current process is saved in it’s
process control block (PCB)
which is traditionally part of the
u area of the process
 context of the next process is retrieved from it’s PCB,
loaded into the hardware registers
 architecture-specific tasks
o flush data cache after context switch
o flush instruction cache after context switch
o flush address translation cache after context switch
o pipelined architectures (e.g., RISC)
flush instruction pipeline prior to context switch
hardware clock
 fixed time interval interrupt
clock tick 10 milliseconds
 major tick n clock ticks
n depends upon specific Unix variant
 clock frequency
o number of ticks per second
o constant HZ defined in param.h file
 kernel functions measure time in number of ticks
clock interrupt handler
 highly system-dependent
 priority second only to power-failure interrupt
 updates CPU usage statistics for current process
 performs scheduler-related functions
o priority recomputation
o time-slice expiration handling
 sends SIGXCPU signal to current process
if it has exceeded its CPU usage quota
C. Robert Putnam
106747319
Page 1
2/16/2016
11:33 AM




updates
o time-of-day clock
o SVR4 variable lbolt – number of ticks elapsed since system last booted
handles callouts
wakes system processes when required
o swapper
o pagedaemon
handles alarms
callouts
 records a function that the kernel will invoke after a specific number of ticks
 int to_ID = timeout(void(*fn)(), caddr_t arg, long delta);
o fn() – kernel function to be invoked
o arg – argument to be passed to the function
o delta – time interval measured in number of ticks
 kernel invokes callout in system context
o function cannot sleep
o function cannot access the process context
 void untimeout( int to_ID);
 periodic tasks
o retransmission of network packets
o scheduler functions
o memory management functions
o monitoring devices to avoid losing interrupts
o polling devices
 normal kernel operation – do not execute at interrupt priority
o clock handler checks if any callouts are due
o sets flag indicating that callout handler must run
o when system returns from base interrupt priority
 checks flag
 if flag is set, system invokes callout handler
 handler invokes each callout that is due to run
o once due, callouts run,
but only after all pending interrupts have been serviced
pending callout list
 checking time
o every CPU tick
o high interrupt priority
o optimize checking time
 insertion time
o low priority
o much less frequently than one per tick
C. Robert Putnam
106747319
Page 2
2/16/2016
11:33 AM

implementation
o 4.3BSD sort list in order of “time to fire”
 stores differential time of expiration for each entry
difference between its time to fire and that of the previous callout
 decrements time difference in first entry at each clock
o sort list in order of “time to fire”
 stores absolute time of expiration for each entry
 compares current absolute time with time in first entry
o timing wheel
 fixed-size circular array of callout queues
 at each tick, clock interrupt handler advances current time pointer
to the next element in the array,
wrapping around at the end of the array
 if any callouts are in the queue, their expiration time is checked
 new callouts are inserted into the queue which is N elements away
from current queue, where N is the time to fire measured in ticks
 timing wheel hashes callouts based on the expiration time
 within each queue, callouts may be sorted or unsorted
 sorting callouts
 reduces time required to process non-empty queues but
 increases insertion time
alarms
 real time alarm
o measures actual elapsed time
o notifies process via SIGALRM signal
 profiling alarm
o measures amount of time process has been executing
o notifies process via SIGPROF signal
 virtual-time alarm
 monitors time spent in user mode
 notifies process via SIGVALRM signal
BSD Unix
setitimer() system call
 allows process to request any type of alarm
 specify time interval in microseconds
SVRx
alarm() system call
 requests real time alarm
 specify time interval in whole number of seconds
C. Robert Putnam
106747319
Page 3
2/16/2016
11:33 AM
SVR4
hrtsys() system call
 allows time to be specified in microseconds
high resolution of real time alarms does not imply high accuracy
kernel delivers SIGALRM signal to process,
process will not respond to signal until it is next scheduled to run
high-resolution timers used by high priority processes are less likely to have
scheduling delays
but if current process is executing in kernel mode and does not reach an
preemption point
profiling & virtual alarms
clock interrupt handler charges whole tick to process even though it uses only a
portion of it
time measured by these alarms reflect number of clock interrupts that have
occurred while process was running
long run – average out good indicator of time used
for any single alarm, significant inaccuracy
scheduler goals
interactive process
reduce average time and variance between user action and application
response sufficiently so that users cannot readily detect the delay
batch process
measure of scheduling efficiency is the task completion time in the presence
of other activity
real time process -- time critical
predictable scheduling behavior with guaranteed bounds on response times
kernel functions – paging, interrupt handling, process management
execute promptly when required
mix of processes
all application processes must continue to progress
no application should prevent another from progressing unless user has
explicitly permitted it
system should always be able to receive and process interactive user input
C. Robert Putnam
106747319
Page 4
2/16/2016
11:33 AM
Traditional Unix Scheduling
SVR3, 4.3BSD





time-sharing, interactive environments
multiple users
several batch & foreground processes simultaneously
improve response times of interactive users
ensuring that low-priority, background jobs do not starve







priority-based scheduling
each process has a scheduling priority that changes with time
scheduler selects highest-priority runnable process
uses preemptive time-slicing to schedule processes of equal priority
dynamically varies process priority based on their CPU usage patterns
kernel is nonpreemptible
running process
o may voluntarily relinquish CPU when blocking on a resource
o can be preempted when it returns to user mode
process priorities
kernel priority numbers 0 – 49
user mode priority numbers 50 – 127
proc structure
p_pri
current scheduling priority
p_usrpri
user-mode priority
p_cpu
measure of recent CPU usage
p_nice
user-controllable nice factor




scheduler uses p_pri to decide which process to schedule
process in user mode  p_pri == p_usrpri
process wakes up after blocking in a system call
scheduler uses p_pri to store the temporary kernel priority
sleep priority
 terminal input 28
 disk I/O 20
 process wakes up after blocking
o kernel sets p_pri  sleep priority of the event or resource
o process scheduled ahead of other user processes
 process completes system call, about to return to user mode
 kernel sets scheduling priority p_pri  p_usrpri
C. Robert Putnam
106747319
Page 5
2/16/2016
11:33 AM
user-mode priority p_usrpri depends upon
 recent CPU usage p_cpu
 user-controllable nice factor p_nice range 0 – 39 , default 20
o user mode:
Δ↑ p_nice  Δ↓ priority
o superuser:
Δ↓ p_nice  Δ↑ priority
process priority algorithm
 process creation  p_cpu = 0
 clock tick  clock handler increments p_cpu for current process (max 127)
 every second  kernel invokes schedcpu() -- scheduled by callout
o reduces p_cpu of each process by decay factor
o SVR3
 fixed decay factor ½
 undesirable side effect – system load Δ↑  Δ↑ priorities
o 4.3BSD computes
 decay = (2 * load_average) / (2 * load_average + 1);
 load_average == average number of runnable processes over last second
 p_usrpri = PUSER + ( p_cpu / 4 ) + ( 2 * p_nice );
 PUSER is the baseline user priority of 50
C. Robert Putnam
106747319
Page 6
2/16/2016
11:33 AM
scheduler implementation
whichgs
(global variable – bitmask 32 cells – one bit for each queue)
0 0 0 1 0 1 0 …
qs (32 rows)
0 – 3
4 -- 7
8 -- 11
12--15
16--19
20--23
proc structures
P
P
P
P
P
runnable processes
…
124-127
context switch - swtch() function
 examines whichgs to find index of the first set bit
 bit identifies scheduler queue containing highest priority runnable process
 removes process from head of queue & switches context to it
 proc structure -- p_addr field points to page table entries of the u area
 process control block is a part of the u area
 VAX-11
o special instructions – 32 bit fields
 FFS – find First Set
 FFC – Find First Clear
o special instructions – double linked lists
 INSQHI – insert elements into doubly linked lists
 REMQHI– remove elements from doubly linked lists
o special instructions – process context
 LDPCTX – load process context
 SVPCTX – save process context
C. Robert Putnam
106747319
Page 7
2/16/2016
11:33 AM
run queue manipulation
 highest priority process always runs
unless current process is executing in kernel mode
 fixed time quantum -- 4.3BSD 100 milliseconds
 every 100 ms, kernel invokes roundrobin() function
to schedule next process from same queue
 if higher priority process were runnable,
preferentially scheduled without waiting for roundrobin() function
 if other runnable processes are on a lower priority queue,
the current process continues to run even though
its quantum has expired
 every second, schedcpu() recomputes the priority of each process
o removes the process from the queue
o recomputed process priority
o inserts process into a (possibly) different run queue
 every four ticks, clock interrupt handler computes
priority of the current process
 context switch
o current process (voluntary context switch)
 blocks on a resource
 exits
o priority recomputation  priority of another process > current priority
o current process or interrupt handler wakes up a higher priority process

voluntary context switch
o kernel directly calls swtch() from sleep() or exit() functions

involuntary switches
o system is in kernel mode, cannot directly preempt the process
o kernel sets runrun flag
o when process is about to return to user mode,
kernel checks the runrun flag
o if runrun flag is set, kernel transfers control to swtch() routine
limitations
 large number of processes  inefficient to recompute priorities every second
 no way to guarantee CPU resources to specific group of processes
 no guarantee of response time
 little control over priorities by applications
 nonpreemptive kernel -- priority inversion -- higher-priority process may
have to wait significant amount of time for lower-priority process to complete
C. Robert Putnam
106747319
Page 8
2/16/2016
11:33 AM
SVR4 Scheduler
scheduling class defines scheduling policy for set of processes
SVR4 scheduling classes
 time-sharing
 real-time
class-independent routines -- implement common services
 context switching
 run queue manipulation
 preemption
 procedural interface
class-dependent functions – pure virtual functions
 priority recomputation
o real-time class – fixed priorities
o time-sharing class -- varies process priorities dynamically
 inheritance
class independent layer
dqactmap
(global variable – bitmask 160 cells – one bit for each queue)
0 0 1 0 0 1 0 …
dispq (160 rows)
160
159
158
P
157
156
155
P
proc structures
P
P
P
runnable processes
…
0 – 59 time-sharing class
60 – 99 system priorities
100 – 159 real-time class
setfrontdq() – insert process at front of queue,
e.g., process preempted before time quantum had expired
setbackdq() – insert process at back of queue, e.g., newly runnable process
dispdeq() – remove process from queue
0
C. Robert Putnam
106747319
Page 9
2/16/2016
11:33 AM
dispatch latency
delay between the
time that process becomes runnable
and
time that process actually begins running
SVR4 kernel preemption points
preemption point -- places in kernel code where
 all kernel data structures are in stable state
 kernel is about to embark on a lengthy computation
at preemption point
 kernel macro PREEMPT() checks kprunrun flag
 if kprunrun flag is set,
o real-time process is requesting access to run
o PREEMPT() macro calls preempt() kernel routine to preempt process
 bounds the amount of time that real-time process must wait before being
scheduled
preemption points
 pathname parsing routine lookuppn()
before beginning to parse
each individual pathname component
 open() system call
before creating a file if it does not exist
 memory subsystem
before freeing process pages

preempt() kernel routine
o invokes CL_PREEMPT operation
to perform class-dependent processing
then
o calls swtch() to initiate context switch
C. Robert Putnam
106747319
Page 10
2/16/2016
11:33 AM

swtch() function
o calls pswtch() to perform machine-independent part of context switch
o pswtch()
 clears runrun and kprunrun flags,
 selects highest priority runnable process
 removes highest priority runnable process from the dispatch queue
 updates dqactmap
 sets state of process to SONPROC (running on processor)
 updates memory management registers to map the u area
 updates virtual address translation registers of the new process
o invokes assembly language code to
 manipulate register content
 flush translation buffers
 etc
C. Robert Putnam
106747319
Page 11
2/16/2016
11:33 AM
interface to scheduling classes
generic interface provides virtual functions that are implemented by each scheduling class
rt_classfuncs
rt_init
real-time
sys_classfunc
sys_init
system
ts_init
time-sharing
ts_classfunc
proc structures
proc structures
p_cid
p_cid
p_cid
p_cid
p_clfuncs
p_clfuncs
p_clfuncs
p_clfuncs
p_clproc
p_clproc
p_clproc
p_clproc
classdependent
data
classdependent
data
classdependent
data
classdependent
data
classfuncs structure
vector of pointers to the functions
that implement the class-dependent interface for any class
global class table
contains one entry <init fn ptr, class name , classfuncs ptr > for each class
 class name
 pointer to an initialization function
 pointer to the classfuncs vector for that class
p_cid
p_clfuncs
p_clproc
C. Robert Putnam
106747319
index into global class table
class ID
pointer to the classfuncs vector for the process class
pointer to class-dependent private data structure
Page 12
2/16/2016
11:33 AM
accessing class-dependent functions
 set of macros are used to
o resolve generic interface function calls and to
o invoke the correct class-dependent function

#define CL_SLEEP(procp, clproc, … ) \
(*(procp)->p_clfuncs->cl_sleep)(clprocp, … )
scheduling classes
determine the
 policies for priority computation
o range of priorities for the processes
o if and under what conditions the priority can change
o size of the time slice each time a process runs
o whether time slice size will depend upon the priority class
e.g., infinite time quantum for real-time tasks
 scheduling of the processes in the class
entry points of the class-dependent interface
CL_TICK
CL_FORK
CL_FORKRET
CL_ENTERCLASS
CL_EXITCLASS
clock interrupt handler initiates call
monitors time slice, recomputes priorities,
handles time quantum expiration, etc.
fork() initiates call
CL_FORK initializes child-specific data structure
CL_FORKRET may set runrun, allowing child process
to run before parent
call is initiated when a process enters or exits
a scheduling class; responsible for allocating &
deallocating the class-dependent data structures
CL_SLEEP
sleep() initiates call; may recompute process priority
CL_WAKEUP
wakeprocs() initiates call; places the process on the
appropriate run queue; may set runrun or kprunrun
scheduling class determines the specific actions to be accomplished by
a particular function, e.g., clock interrupt handler calls CL_TICK routine
for the class to which the process belongs; the routine determines the
exact actions to be performed
C. Robert Putnam
106747319
Page 13
2/16/2016
11:33 AM
Time-Sharing Class
 default class
 dynamic priorities
 round-robin scheduling is used for processes with equal priorities
 static dispatcher parameter table controls process priorities and time slices
 time slice depends upon scheduling priority
 ↓Δ priority  ↑Δ time slice
index
0
1
…
15
…
40
…
59
globpri
0
1
…
15
…
40
…
59
ts_globpri
ts_quantum
ts_tqexp
ts_slpret
ts_maxwait
ts_lwait


quantum
100
100
…
80
…
20
…
10
tqexp
0
0
…
7
…
30
…
49
slpret
10
11
…
25
…
50
…
59
maxwait
5
5
…
5
…
5
…
5
lwait
10
11
…
25
…
50
…
59
global priority
time quantum
ts_cpupri value when time quantum expires
before using ts_maxwait seconds
ts_cpupri value when returning to user mode after sleeping
seconds to wait for time quantum expiry
ts_cpupri value when time quantum does not expire
before using ts_maxwait seconds
event-driven scheduling ,i.e., process priority changed in response
to specific event related to process
scheduler
o reduces priority each time it uses its entire time slice
o boosts priority if
 process blocks on event or resource
 takes a long time to use its time slice
struct tsproc class-dependent data
0 ≤ ts_umdpri ≤ 59
ts_timeleft time remaining in quantum
ts_cpupri system part of priority
ts_upri
user part of priority (nice value), range [ -20, +19 ], default 0
ts_umdpri user mode priority == min { max { ts_cpupri + ts_upri, 0 } 59 }
ts_diswait clock time (seconds) since start of quantum
C. Robert Putnam
106747319
Page 14
2/16/2016
11:33 AM
process returns from sleeping in kernel
 priority is kernel priority determined by sleep condition
 return to user mode  priority restored from ts_umdpri
 ts_umdpri may be changed by priocntl(), but only a superuser increase it
 ts_cpupri is adjusted according to dispatcher parameter table
dispatcher parameter table access
 current ts_cpupri  ts_tqexp, to_slpret, ts_lwait  new ts_cpupri
 ts__umdpri  ts_globpri, ts_quantum, ts_maxwait
ts_upri = 14, ts_cpupri = 1, ts_globpri = 15, ts_umdpri = 15
time quantum expires prior to ts_maxwait time 
ts_cpupri = 0, ts_umdpri = 14
time quantum does not expire prior to ts_maxwait time 
ts_cpupri = 11, ts_umdpri = 25
process makes a system call and blocks on resource;
resumes and returns to user mode 
ts_cpupri = 11 ( ts_cpupri  ts_slpret ), ts_umdpri = 25
C. Robert Putnam
106747319
Page 15
2/16/2016
11:33 AM
Real-Time Class






priority range [100 – 159]
fixed priority
fixed quantum
bounded dispatch latency
bounded response time
real-time processes are scheduled before any kernel services
process executing in kernel mode; real-time process becomes runnable;
real-time process must wait until current process is about to
return to user mode or until it reaches a kernel preemption point
process explicitly makes a priocntl call to change priority or time quantum
only super processes can enter real-time class;
priocntl – specify priority and time quantum
default parameter table
↓Δ priority  ↑Δ time slice
struct rtproc
 current time quantum
 time remaining in quantum
 current priority
dispatch latency
time between
 process becomes runnable
 process begins to run
response time
time between
 occurrence of event
 response to event
time required by
 interrupt handler to process the event
 dispatch latency
 real-time process to respond to the event
C. Robert Putnam
106747319
Page 16
2/16/2016
11:33 AM
preemption points
divide kernel algorithms into small bounded work units
real-time process becomes runnable 
rt_wakeup()
[ class-dependent wakeup processing routine]
 sets kernel flag kprunrun
current process executing kernel code
 reaches preemption point
 checks flag
 initiates context switch to real-time process
latency bounds guaranteed only when
real-time process is highest-priority runnable process on system




real-time process is running on system
higher-priority real-time process becomes available
higher-priority real-time process is preferentially scheduled
latency calculation of lower-priority process
is recalculated after process regains CPU
priocntl system call
-- single process
 change priority class of process
 set ts_upri for time-sharing process
 reset priority and quantum for real-time process
 obtaining current values of scheduling parameters
priocntlset system call -- set of related process
 change priority class of processes
 set ts_upri for time-sharing processes
 reset priority and quantum for real-time processes
 obtaining current values of scheduling parameters
applies to all processes
 in system
 in process group or session
 in scheduling class
 owned by particular user
 having same parent
C. Robert Putnam
106747319
Page 17
2/16/2016
11:33 AM
Analysis
 allows addition of scheduling classes to system
 time-sharing class changes priorities based on events related to process
 fast and scalable dynamic priority calculation
 event-driven scheduling
favors I/O bound and interactive jobs over CPU-bound jobs
 interactive job with large computations  system will be unresponsive
 total system load and job mix characteristics determines system response
 system administer can alter system behavior by changing dispatcher table
settings and rebuilding kernel; such retuning may be required to keep
system efficient and responsive

adding a scheduling class does not require access to kernel source code
o provide implementation of each class-dependent scheduling function
o initialize classfuncs vector to point to these functions
o provide initialization function to perform setup tasks, e.g., allocating
internal data structures
o add entry for this class in the class table in a master configuration file,
typically located in the master.d directory of the kernel build directory
o entry contains pointers to the initialization function and classfuncs vector
o rebuild kernel







priocntl system call restricted to superuser
no good way for a time-sharing class process to switch to a different class
no provision for deadline-driven scheduling
code path between preemption points is too long for many time-critical processes
hard real-time systems require a fully preemptible kernel
extremely difficult to tune system for a mixed set of applications
system load varies constantly hence it is not reasonable to require careful
manual tuning each time the load changes
C. Robert Putnam
106747319
Page 18
2/16/2016
11:33 AM
Solaris 2.x Scheduling
multithreaded, symmetric multiprocessing operating system
preemptive kernel
fully preemptive kernel
global kernel data structures are protected by synchronization objects
 mutual exclusion locks (mutexes)
 semaphores
implementation of interrupts using special kernel threads
use the standard synchronization primitives of the kernel



rarely needs to raise the interrupt level to protect critical regions
few nonpreemptible code segments
higher-priority process is scheduled as soon as it is runnable
interrupt threads run at the highest priority
scheduling classes can be dynamically loaded
priorities of the interrupt threads are recomputed to ensure that they remain at
the highest possible level
blocked thread can only be restarted on the same processor
multiprocessor support
single dispatch queue for all processors
slelected threads can be restricted to a single, specific processor
processors communicate by sending cross-processor interrupts
per-processor data structure -- scheduling variables
cpu_thread
thread currently running on this processor
cpu_dispthread
thread last selected to run on this processor
cpu_idle
idle thread for this processor
cpu_runrun
preemption flag used for time-sharing threads
cpu_kprunrun
kernel preemption flag set by real-time threads
cpu_chosen_level
priority of thread that is going to preempt
the current thread on this processor
C. Robert Putnam
106747319
Page 19
2/16/2016
11:33 AM
Thread T1, priority 120
Thread T2, priority 130
Thread T3, priority 100
Thread T4, priority 132
Thread T5, priority 135
Processor P1
Processor P2
Processor P3
Processor P4
Processor P5
Thread T6, priority 130
Blocked
Thread T7, priority 115
Blocked


event on processor P1 makes thread T6 runnable
kernel
o places T6 on dispatch queue
o calls cpu_choose() to find processor with the lowest priority thread
cpu_choose()
 marks selected processor for preemption
 sets cpu_chosen_level to priority T6 value130
 send cross-processor interrupt


event on processor P2 makes thread T7 runnable
kernel
o places T7 on dispatch queue
o calls cpu_choose() to find processor with the lowest priority thread
cpu_choose()
 compares cpu_chosen_level to priority T7 value115
 does not mark selected processor for preemption

processor P3
o handles the interrupt
o preempts thread T3
hidden scheduling
priority inversion
C. Robert Putnam
106747319
Page 20
2/16/2016
11:33 AM
implementation of priority inheritance
limitations of priority inheritance
turnstiles
analysis
Mach Scheduling
C. Robert Putnam
106747319
Page 21
2/16/2016
11:33 AM
C. Robert Putnam
106747319
Page 22
2/16/2016
11:33 AM
C. Robert Putnam
106747319
Page 23
2/16/2016
11:33 AM
C. Robert Putnam
106747319
Page 24
2/16/2016
11:33 AM
Download