Priority Scheduling: an Application for the Permutahedron (ppt)

Priority Scheduling:
An Application for the Permutahedron
Ethan Bolker
UMass-Boston
BMC Software
AMS Toronto meeting
September 24, 2000
Plan
• Brief introduction to queueing theory
• Priority scheduling
• Conservation laws and the permutahedron
• Specifying CPU shares
• Interesting pictures and open questions
References: www.cs.umb.edu/~eb/goalmode
Acknowledgements: Jeff Buzen, Yiping Ding, Dan Keefe, Oliver Chen, Aaron Ball, Tom Larard
Queueing theory
• Workload: stream of jobs visiting a server
(ATM, time shared CPU, printer, …)
• Jobs queue when server is busy
• Input:
– Arrival rate: $\lambda$ job/sec
– Service demand: $s$ sec/job
• Performance metrics:
– Utilization: $u = \lambda s$ (must be $< 1$)
– Response time: $r = ???$
– Degradation: $d = r/s$
– Queue length: $q = \lambda r$ (Little's law)
Response time computations
• r, d, q measure queueing delay
$r \ge s$ ($d \ge 1$), unless parallel processing is possible
• Randomness really matters
r = s (d = 1) if arrivals scheduled (best case, no waiting)
r >> s for bulk arrivals (worst case, maximum delays)
• Theorem. $d = 1/(1-u)$ if arrivals are Poisson and service is exponentially distributed (M/M/1).
$\Rightarrow r = s/(1-u)$ (think of a virtual server with speed $1-u$)
$\Rightarrow q = u/(1-u)$ (convention: the job in service counts as on the queue)
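The M/M/1 formulas above fit in a few lines. This is a minimal Python sketch; `mm1_metrics` is a hypothetical helper name, and it simply evaluates $u = \lambda s$, $r = s/(1-u)$, $d = r/s$ and Little's law $q = \lambda r$, which agrees with $q = u/(1-u)$.

```python
def mm1_metrics(lam, s):
    """M/M/1 performance metrics.

    lam: arrival rate (jobs/sec); s: mean service demand (sec/job).
    """
    u = lam * s                    # utilization
    if not 0.0 <= u < 1.0:
        raise ValueError("need 0 <= u < 1 for a stable queue")
    r = s / (1.0 - u)              # response time: virtual server with speed 1 - u
    d = r / s                      # degradation, equal to 1 / (1 - u)
    q = lam * r                    # Little's law, equal to u / (1 - u)
    return {"u": u, "r": r, "d": d, "q": q}


# The u = 90% case: 9 jobs/sec, 0.1 sec/job.
print(mm1_metrics(lam=9.0, s=0.1))
```

With lam = 9 jobs/sec and s = 0.1 sec/job (u = 0.9) this gives q ≈ 9 and r ≈ 1 sec = 10s, the numbers used on the next slide.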
M/M/1
• Essential nonlinearity often counterintuitive
– at u = 90% average queue length is 0.9/(1-0.9) = 9,
– average response time is s/(1-0.9) = 10s,
– but 1 customer in 10 has no wait at all (10% idle time)
• A useful guide even when hypotheses fail
– accurate enough (±20%) for real computer systems
– $d$ depends only on $u$: many small jobs have the same impact as a few large jobs
– faster system $\Rightarrow$ smaller $s$ $\Rightarrow$ smaller $u$
$r = s/(1-u)$ $\Rightarrow$ double win: less service, less wait
– waiting costly, server cheap (telephones): want $u \approx 0$
– server costly (doctors): want $u \approx 1$, but with scheduled arrivals
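The claim that at u = 90% one customer in ten does not wait can be checked by simulation. The sketch below is not from the talk; `simulate_mm1` and its parameters are hypothetical. It runs a first-come-first-served M/M/1 queue with the Lindley recursion $W_{n+1} = \max(0, W_n + S_n - A_{n+1})$, where $W_n$ is the wait of job n, $S_n$ its service time and $A_{n+1}$ the next interarrival gap. Because Poisson arrivals see time averages, the fraction of jobs with zero wait should approach the idle fraction $1-u$.

```python
import random


def simulate_mm1(lam, s, n_jobs, seed=1):
    """Simulate a FCFS M/M/1 queue with the Lindley recursion.

    Returns (fraction of jobs that do not wait, mean response time).
    """
    rng = random.Random(seed)
    wait = 0.0            # queueing delay of the current job (excludes service)
    prev_service = 0.0
    no_wait = 0
    total_response = 0.0
    for i in range(n_jobs):
        if i > 0:
            gap = rng.expovariate(lam)                   # Poisson arrivals
            wait = max(0.0, wait + prev_service - gap)   # Lindley recursion
        service = rng.expovariate(1.0 / s)               # exponential, mean s
        if wait == 0.0:
            no_wait += 1
        total_response += wait + service
        prev_service = service
    return no_wait / n_jobs, total_response / n_jobs


# u = 0.9 with s = 1: expect roughly 10% of jobs with no wait and r near 10.
idle_frac, mean_r = simulate_mm1(lam=0.9, s=1.0, n_jobs=400_000)
print(idle_frac, mean_r)
```

Near u = 0.9 successive waits are strongly correlated, so many jobs are needed before the estimates settle.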
Multiple Job Streams
• Multiple workloads, utilizations u1, u2, …
• $U = \sum_i u_i < 1$
With no priorities, all degradations are equal: $d_i = 1/(1-U)$
• Suppose priority scheduling is possible
Study the degradation vector $V = (d_1, d_2, \ldots)$
Priority Scheduling
• Priority state: order workloads by priority (ties OK)
– two workloads, 3 states: 12, 21, [12]
– three workloads, 13 states:
  • 123 (6 = 3! of these ordered states)
  • [12]3 (3 of these)
  • 1[23] (3 of these)
  • [123] (1 state with no priorities)
– n wkls, f(n) states, n! ordered (simplex lock combos)
• p(s) = prob( state = s ) = fraction of time in state s
• V(s) = degradation vector when state = s
(measure this, or compute it using queueing theory)
• $V = \sum_s p(s)\, V(s)$ (time average is a convex combination)
• Achievable region is the convex hull of the vectors $V(s)$
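The priority states are the ordered partitions of the workloads into tied blocks, so f(n) can be counted by brute force. A minimal sketch, assuming a state is represented as a list of tuples, highest priority first; `priority_states` and `label` are illustrative names.

```python
from itertools import combinations


def priority_states(workloads):
    """All priority states of the given workloads.

    A state is a list of blocks, highest priority first; workloads in the
    same block are tied. These are the ordered set partitions.
    """
    if not workloads:
        return [[]]
    states = []
    for size in range(1, len(workloads) + 1):
        for top in combinations(workloads, size):      # top-priority block
            rest = tuple(w for w in workloads if w not in top)
            for tail in priority_states(rest):          # order the remainder
                states.append([top] + tail)
    return states


def label(state):
    """Write a state in the slides' notation, e.g. [(1, 2), (3,)] -> '[12]3'."""
    return "".join(
        str(b[0]) if len(b) == 1 else "[" + "".join(map(str, b)) + "]"
        for b in state
    )


for n in range(1, 5):
    print(n, len(priority_states(tuple(range(1, n + 1)))))
print(sorted(label(s) for s in priority_states((1, 2))))
```

The counts for n = 1..4 are 1, 3, 13 and 75, matching the 3 and 13 states listed above; exactly n! of the states have only singleton blocks.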
Two workloads
[Figure: the (d1, d2) plane with the diagonal $d_1 = d_2$, the points V(12) (wkl 1 high prio), V(21) and V([12]) (no priorities), and the achievable region]
note: $u_1 < u_2 \Rightarrow$ wkl 2 effect on wkl 1 large
Conservation
• No Free Lunch Theorem. Weighted average
degradation is constant, independent of priority
scheduling scheme:
$\sum_i (u_i/U)\, d_i = 1/(1-U)$
• Provable from some hypotheses
• Observable in some real systems
• Sometimes false: shortest job first minimizes
average response time (printer queues,
supermarket express checkout lines)
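Under one set of hypotheses where the law is provable, the vertex degradations have a closed form and the law can be checked numerically. This sketch assumes preemptive priority on a single server, Poisson arrivals, and the same exponential service distribution for every workload; then the workload in priority position k has $d = 1/((1-\sigma_{k-1})(1-\sigma_k))$, where $\sigma_k$ is the total utilization of the k highest-priority workloads. Since $u_k d_k = 1/(1-\sigma_k) - 1/(1-\sigma_{k-1})$, the sum $\sum_i u_i d_i$ telescopes to $U/(1-U)$ for every order. Function names are hypothetical and workloads are indexed from 0.

```python
from itertools import permutations


def vertex_degradations(u, order):
    """Degradation vector V(order) for a strict priority order of workloads 0..n-1.

    Assumption: preemptive priority, Poisson arrivals, and a common exponential
    service distribution. Position k gets d = 1 / ((1 - sigma_{k-1}) (1 - sigma_k)).
    """
    d = [0.0] * len(u)
    above = 0.0                      # utilization of higher-priority workloads
    for w in order:
        d[w] = 1.0 / ((1.0 - above) * (1.0 - above - u[w]))
        above += u[w]
    return d


def weighted_average_degradation(u, d):
    """Left side of the No Free Lunch law: sum_i (u_i / U) d_i."""
    U = sum(u)
    return sum(ui / U * di for ui, di in zip(u, d))


u = [0.2, 0.3, 0.4]                  # U = 0.9, so 1 / (1 - U) = 10
for order in permutations(range(len(u))):
    d = vertex_degradations(u, order)
    print(order, [round(x, 3) for x in d], round(weighted_average_degradation(u, d), 9))
```

Every order gives a different degradation vector but the same weighted average, $1/(1-U) = 10$.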
Conservation
• For any proper nonempty subset A of the workloads, with $U(A) = \sum_{i \in A} u_i$:
Imagine giving those workloads top priority.
Then we can pretend the other wkls don't exist. In that case
$\sum_{i \in A} (u_i/U(A))\, d_i = 1/(1-U(A))$
When wkls in A have lower priorities they have higher degradations, so in general
$\sum_{i \in A} (u_i/U(A))\, d_i \ge 1/(1-U(A))$
• These $2^n - 2$ linear inequalities, together with the conservation law for all n workloads, determine the convex achievable region R
• R is a permutahedron: only $n!$ vertices
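The inequalities translate directly into a membership test for R. In this sketch (`in_achievable_region` is a hypothetical name) each inequality is multiplied through by U(A), and the full-set conservation law is checked as an equality up to a small tolerance.

```python
from itertools import combinations


def in_achievable_region(u, d, tol=1e-9):
    """Test a degradation vector d against the conservation constraints.

    Checks sum_{i in A} u_i d_i >= U(A) / (1 - U(A)) for each proper
    nonempty subset A, with equality when A is the set of all workloads.
    """
    n = len(u)
    U = sum(u)
    if abs(sum(ui * di for ui, di in zip(u, d)) - U / (1.0 - U)) > tol:
        return False                                  # violates conservation
    for size in range(1, n):                          # the 2**n - 2 proper subsets
        for A in combinations(range(n), size):
            UA = sum(u[i] for i in A)
            if sum(u[i] * d[i] for i in A) < UA / (1.0 - UA) - tol:
                return False
    return True


u = [0.2, 0.3]
print(in_achievable_region(u, [1.25, 2.5]))        # endpoint with wkl 1 on top
print(in_achievable_region(u, [2.0, 2.0]))         # no priorities: d_i = 1/(1-U)
print(in_achievable_region(u, [1.0, 0.8 / 0.3]))   # on the line, but d1 too small
```

For u = (0.2, 0.3), the point (1.25, 2.5) attains equality for A = {workload 1} and passes, as does the no-priority point (2, 2); a point on the conservation line with d1 = 1 fails, because no schedule can give workload 1 less delay than top priority does.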
Two workload permutahedron
[Figure: the (d1, d2) plane. The achievable region is the segment of the line $u_1 d_1 + u_2 d_2 = U/(1-U)$ cut off by $d_1 \ge 1/(1-u_1)$ and $d_2 \ge 1/(1-u_2)$; its endpoints are V(12), on $d_1 = 1/(1-u_1)$, and V(21), on $d_2 = 1/(1-u_2)$.]
Three workload permutahedron
[Figure: (d1, d2, d3) space, showing the plane $u_1 d_1 + u_2 d_2 + u_3 d_3 = U/(1-U)$ with the vertices V(123) and V(213) marked]
Experimental evidence
Four workload permutahedron
4! = 24 vertices (ordered states)
$2^4 - 2 = 14$ facets (proper subsets)
(conservation constraints)
74 proper faces (states other than [1234])
Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
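The face counts follow from the correspondence between faces and priority states: a state with k ordered blocks is a face of dimension n − k, and there are $k!\,S(n,k)$ such states, where S(n, k) is a Stirling number of the second kind. A short sketch with hypothetical helper names:

```python
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind S(n, k), by inclusion-exclusion."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1)) // factorial(k)


def face_counts(n):
    """Map face dimension -> number of faces of the n-workload permutahedron.

    A priority state with k ordered blocks is a face of dimension n - k.
    """
    return {n - k: factorial(k) * stirling2(n, k) for k in range(1, n + 1)}


counts = face_counts(4)
print(counts)
print(sum(c for dim, c in counts.items() if dim < 3))   # proper faces
```

For n = 4 this gives 24 vertices, 36 edges, 14 facets and the polytope itself: 74 proper faces plus one, for the 75 states of four workloads.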
Scheduling for performance
• Administrator specifies performance goals
– desired degradations (IBM OS/390) (not today)
– CPU shares (UNIX offerings from HP, IBM, Sun)
• Operating system dispatches jobs in an attempt to
meet goals
• Model predicts degradations by constructing a map
workload performance goals $\to$ permutahedron
Specifying CPU shares
• Administrator specifies workload CPU shares
• Share $f$ ($0 < f < 1$) means the workload is guaranteed a fraction $f$ of the CPU whenever at least one of its jobs is queued for service; it can get more if some competition is absent
• share $\ne$ utilization
• share $\ne$ cap
• share should be renamed guarantee
Map shares to degradations
- two workloads
• Suppose $f_1, f_2 > 0$, $f_1 + f_2 = 1$
• Model: system operates in state
– 12 with probability $f_1$
– 21 with probability $f_2$
(independent of who is on queue)
• Average degradation vector:
$V = f_1 V(12) + f_2 V(21)$
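For two workloads the average is a linear interpolation between the two vertices. This sketch assumes, beyond what the slide states, the vertex values of preemptive priority with a common exponential service distribution: the high-priority workload gets $d = 1/(1-u_{\text{high}})$ and the other gets $1/((1-u_{\text{high}})(1-U))$. `two_workload_degradations` is a hypothetical name.

```python
def two_workload_degradations(u1, u2, f1):
    """V = f1 V(12) + f2 V(21) for CPU shares f1 and f2 = 1 - f1.

    Assumed vertex model: preemptive priority, common exponential service.
    """
    U = u1 + u2
    v12 = (1 / (1 - u1), 1 / ((1 - u1) * (1 - U)))   # wkl 1 high priority
    v21 = (1 / ((1 - u2) * (1 - U)), 1 / (1 - u2))   # wkl 2 high priority
    f2 = 1 - f1
    return (f1 * v12[0] + f2 * v21[0], f1 * v12[1] + f2 * v21[1])


for f1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f1, two_workload_degradations(0.2, 0.3, f1))
```

As f1 runs from 0 to 1 the point slides along the conservation line $u_1 d_1 + u_2 d_2 = U/(1-U)$ from V(21) to V(12); the slide requires $f_1, f_2 > 0$, so the two endpoints are limits.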
Model validation
Map shares to degradations
- three (n) workloads
• $\mathrm{prob}(123) = \frac{f_1}{f_1 + f_2 + f_3} \cdot \frac{f_2}{f_2 + f_3} \cdot \frac{f_3}{f_3}$
• Theorem: These n! probabilities sum to 1
– interesting identity generalizing adding fractions
– prove by induction, or by coupon collecting
• $V = \sum_{\text{ordered states } s} \mathrm{prob}(s)\, V(s)$
• $O(n!)$ and $\Omega(n!)$: good enough for $n \le 9$ (12)
• Searching for fast (approximate) algorithm ...
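The product formula and the n!-term sum are easy to check with exact arithmetic. In this sketch (names hypothetical) `state_probability` evaluates the formula for any ordered state using `fractions.Fraction`, so the theorem that the probabilities sum to 1 holds exactly. The formula is the probability of drawing workloads one at a time without replacement, each with probability proportional to its share, which is one way to see the identity. The vertex degradations V(s) assume preemptive priority with a common exponential service distribution, an assumption made only for illustration.

```python
from fractions import Fraction
from itertools import permutations


def state_probability(f, order):
    """prob(order) = f[w1]/(f[w1]+...+f[wn]) * f[w2]/(f[w2]+...+f[wn]) * ... * 1."""
    p = Fraction(1)
    remaining = sum(f[w] for w in order)     # shares of workloads not yet placed
    for w in order:
        p *= f[w] / remaining
        remaining -= f[w]
    return p


def vertex_degradations(u, order):
    """Assumed vertex model: preemptive priority, common exponential service."""
    d = [0.0] * len(u)
    above = 0.0
    for w in order:
        d[w] = 1.0 / ((1.0 - above) * (1.0 - above - u[w]))
        above += u[w]
    return d


def average_degradations(u, f):
    """V = sum over ordered states s of prob(s) V(s); loops over all n! states."""
    n = len(u)
    V = [0.0] * n
    for order in permutations(range(n)):
        p = float(state_probability(f, order))
        d = vertex_degradations(u, order)
        for i in range(n):
            V[i] += p * d[i]
    return V


f = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # CPU shares
u = [0.2, 0.3, 0.4]
print(sum(state_probability(f, o) for o in permutations(range(3))))
print(average_degradations(u, f))
```

The loop over all n! permutations is the factorial cost noted on the slide.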
Model validation
Map shares to degradations
(geometry)
• Interpret shares as barycentric coordinates in the $(n-1)$-simplex
• Study the geometry of the map from the simplex to the $(n-1)$-dimensional permutahedron
• Easy when $n = 2$: each is a line segment and the map is linear
Mapping a triangle to a hexagon
[Figures: the share triangle, with labels $f_1 = 1$, $f_1 = 0$ and $f_3 = 1$, mapped onto the hexagonal permutahedron with vertices 123, 132, 312, 321, 231, 213 and an interior point M; the parts where wkl 1 has high and low priority are marked, and a later build marks {23}]
Implementing fair share scheduling
• Actual Sun/Solaris implementation is subtle
• HP and IBM are black boxes (for me)
• Stochastic solution: randomly choose a queued job to dispatch (implement the model rather than model an implementation)
• May require prior computation of priodist(w, p) = prob(wkl w runs at prio p)
• workload priority probabilities, not state probabilities
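A stochastic dispatcher can be sketched in a few lines. Under the share model, the highest-priority workload among any set Q of workloads is w with probability $f_w / \sum_{v \in Q} f_v$ (a property of the product formula for prob(s)), so at a single dispatch decision the rule below agrees with the model. The dispatcher, its name, and its arguments are assumptions for illustration, not the Solaris, HP, or IBM mechanism.

```python
import random


def choose_workload(queued, shares, rng):
    """Hypothetical stochastic dispatcher.

    Among workloads that currently have queued jobs, pick workload w with
    probability shares[w] / (sum of the shares of the queued workloads).
    """
    candidates = sorted(queued)
    weights = [shares[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(42)
shares = {1: 0.5, 2: 0.3, 3: 0.2}
picks = [choose_workload({2, 3}, shares, rng) for _ in range(100_000)]
print(picks.count(2) / len(picks))   # close to 0.3 / (0.3 + 0.2) = 0.6
```

When workload 1 has nothing queued, its share is simply redistributed in proportion to the others, which is how a workload "can get more if some competition is absent".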
Priority distributions
• Given degradations, compute a priodist
• A priodist is an $n \times n$ matrix with nonnegative entries and row sums 1
• {priodists} = Cartesian product of n copies of the $(n-1)$-simplex
• Map: priodist space (dim $n(n-1)$) $\to$ permutahedron (dim $n-1$)
• Map is surjective, not injective
• Look for a well behaved inverse image
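One way to obtain a priodist from the share model is to accumulate the ordered-state probabilities: priodist(w, p) is the total probability of the ordered states that place w at priority p. A sketch, with p = 0 the highest priority and hypothetical function names; it covers only the ordered states produced by the share model.

```python
from fractions import Fraction
from itertools import permutations


def state_probability(f, order):
    """Share model: prob(order) = prod_k f[order[k]] / (sum of shares from position k on)."""
    p = Fraction(1)
    remaining = sum(f[w] for w in order)
    for w in order:
        p *= f[w] / remaining
        remaining -= f[w]
    return p


def priodist_from_shares(f):
    """priodist[w][p] = prob(wkl w runs at prio p), with p = 0 the highest priority."""
    n = len(f)
    P = [[Fraction(0)] * n for _ in range(n)]
    for order in permutations(range(n)):
        p = state_probability(f, order)
        for prio, w in enumerate(order):
            P[w][prio] += p
    return P


f = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for row in priodist_from_shares(f):
    print([str(x) for x in row])
```

Here the columns also sum to 1, because every ordered state assigns each priority level to exactly one workload; a general priodist only needs row sums 1.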
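Three workload permutahedron
[Figure: the hexagonal permutahedron drawn in the (d1, d2) plane; vertices 123, 132, 312, 321, 231, 213; edges [12]3, 1[23], [13]2, 3[12], [23]1, 2[13]; center [123]; lines $d_1 = d_2$, $d_2 = d_3$, $d_1 = d_3$]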
… dissected into 3! quadrilaterals
[Figure: one quadrilateral, with corners 123, 1[23], [123] and [12]3, bounded in part by the lines $d_1 = d_2$ and $d_2 = d_3$]
… each mapped to from a
skew quadrilateral of priodists
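$$P_{123} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad P_{1[23]} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & .5 & .5 \\ 0 & .5 & .5 \end{pmatrix}$$
$$P_{[12]3} = \begin{pmatrix} .5 & .5 & 0 \\ .5 & .5 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad P_{[123]} = \begin{pmatrix} .33 & .33 & .33 \\ .33 & .33 & .33 \\ .33 & .33 & .33 \end{pmatrix}$$
$$(x, y) \mapsto xy\,P_{123} + x(1-y)\,P_{1[23]} + (1-x)y\,P_{[12]3} + (1-x)(1-y)\,P_{[123]}$$
$\mapsto$ degradation vector in this corner of the permutahedron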
Skew quadrilaterals
• Given 4 points $P_{00}, P_{01}, P_{10}, P_{11} \in \mathbb{R}^m$, map the unit square:
$(x, y) \mapsto xy\,P_{00} + x(1-y)\,P_{01} + (1-x)y\,P_{10} + (1-x)(1-y)\,P_{11}$
• Easy to generalize to $2^k$ points (unit $k$-cube)
• Analogous to convex hull, which maps barycentric coordinates on a simplex
• Reference for this construction?
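The skew quadrilateral map, and its generalization to $2^k$ corners on the unit k-cube, is a multilinear interpolation. This sketch (hypothetical names) follows the slide's labeling, in which $P_{00}$ gets weight xy, and applies it to the four 3 × 3 priodists of the previous slide, flattened row by row, with 1/3 in place of .33.

```python
from itertools import product


def multilinear(point, corners):
    """Skew-polytope map from the unit k-cube into R^m.

    point: (x_1, ..., x_k) in [0, 1]^k.
    corners: dict from a 0/1 tuple b of length k to a vector in R^m.
    Slide convention: coordinate x_i contributes x_i when b_i == 0 and
    1 - x_i when b_i == 1, so P00 gets weight xy.
    """
    k = len(point)
    m = len(next(iter(corners.values())))
    result = [0.0] * m
    for bits in product((0, 1), repeat=k):
        weight = 1.0
        for x, b in zip(point, bits):
            weight *= x if b == 0 else 1.0 - x
        for j, value in enumerate(corners[bits]):
            result[j] += weight * value
    return result


third = 1.0 / 3.0
corners = {                                      # priodists flattened row by row
    (0, 0): [1, 0, 0, 0, 1, 0, 0, 0, 1],         # P123
    (0, 1): [1, 0, 0, 0, .5, .5, 0, .5, .5],     # P1[23]
    (1, 0): [.5, .5, 0, .5, .5, 0, 0, 0, 1],     # P[12]3
    (1, 1): [third] * 9,                         # P[123]
}
M = multilinear((0.3, 0.8), corners)
print([round(sum(M[3 * r:3 * r + 3]), 12) for r in range(3)])   # row sums
```

The weights are nonnegative and sum to 1, so every image point is again a priodist with row sums 1.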
Inversion
Try to locate the target * = $(d_1, d_2)$ on the coordinate grid
[Figure: the target * in the (d1, d2) plane]
Sequential bisection
[Figure sequence (five build steps) in the (d1, d2) plane]
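The slides give sequential bisection only as pictures, so the following is a hypothetical reading: alternate one-dimensional bisections, first in x with y held fixed, then in y with x held fixed, to find a point of the unit square whose image is the target *. It assumes the first component of F increases with x and the second with y; the `F` here is a toy map, not the real degradation map, and all names are illustrative. This is a Gauss-Seidel-style iteration, so in general it need not converge; the tempered variant is not reproduced here.

```python
def bisect_increasing(g, target, lo=0.0, hi=1.0, iters=60):
    """Solve g(t) = target for increasing g on [lo, hi].

    If target is out of range, the result converges to the nearer endpoint.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sequential_bisection(F, target, start=(0.5, 0.5), sweeps=50, tol=1e-12):
    """Alternately solve F(x, y)[0] = target[0] for x and F(x, y)[1] = target[1] for y."""
    x, y = start
    for _ in range(sweeps):
        x_new = bisect_increasing(lambda t: F(t, y)[0], target[0])
        y_new = bisect_increasing(lambda t: F(x_new, t)[1], target[1])
        done = abs(x_new - x) < tol and abs(y_new - y) < tol
        x, y = x_new, y_new
        if done:
            break
    return x, y


def F(x, y):
    # Toy stand-in for the map (x, y) -> (d1, d2); increasing in each coordinate.
    return (1.0 + x + 0.2 * y, 1.5 + y + 0.3 * x)


print(sequential_bisection(F, F(0.4, 0.7)))   # approximately (0.4, 0.7)
```

Here the coupling between the coordinates is weak, so each sweep shrinks the error by a factor of about 0.06; with strong coupling the alternation can stall or oscillate.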
… may fail to converge
[Figure in the (d1, d2) plane]
Tempered sequential bisection
[Figure sequence (three build steps) in the (d1, d2) plane]
prove that this converges...