Scheduling in Server Farms Mor Harchol-Balter Computer Science Dept Carnegie Mellon University

advertisement
Scheduling in Server Farms
Mor Harchol-Balter
Computer Science Dept
Carnegie Mellon University
harchol@cs.cmu.edu
1
= today
Outline
I.
= tomorrow
Review of scheduling
in single-server
II. Supercomputing
III. Web server farm model
FCFS
Router
PS
Router
FCFS
PS
IV. Towards Optimality …
SRPT
SRPT
&
Router
SRPT
Metric:
Mean
Response
Time,
E[T]
2
Single Server Model (M/G/1)
Load r = lE[X]<1
Poisson
arrival
process
w/rate l
X: job size
(service requirement)
Bounded Pareto
Job sizes with huge variance are everywhere in CS:
Var ( X )
C =
E[ X ]2
2
X
Pr{Job size  x} ~ 1
1
x Downey 96]
• CPU Lifetimes of UNIX jobs [Harchol-Balter,
• Supercomputing job sizes [Schroeder, Harchol-Balter 00]
• Web file sizes [Crovella, Bestavros 98, Barford, Crovella 98]
½
Huge Shin 99]
• IP Flow durations [Shaikh, Rexford,
D.F.R.
Variability
¼
8  C X2  50 is typical
Top-heavy:
kx p
top 1% jobs
make up half
load
3
Scheduling Single Server (M/G/1)
Poisson
arrival
process
Load r <1
Huge
Variance
Question: Order these scheduling policies for
mean response time, E[T]:
1. FCFS (First-Come-First-Served, non-preemptive)
2. PS (Processor-Sharing, preemptive)
3. SJF (Shortest-Job-First, a.k.a., SPT, non-preemptive)
4. SRPT (Shortest-Remaining-Processing-Time, preemptive)
5. LAS (Least Attained Service, a.k.a., FB, preemptive)
4
Scheduling Single Server (M/G/1)
Load r <1
Poisson
arrival
process
LOW E[T]
SRPT
OPT for all
arrival
sequences
[Schrage 67]
Huge
Variance
HIGH E[T]
<
LAS
<
Requires D.F.R.
[Righter,
Shanthikumar89]
No “Starvation!”
PS
<
Insensitive
to E[X2]
SJF <
FCFS
Surprisingly bad:
(E[X2] term)
Even the biggest jobs prefer SRPT to PS:
~E[X2]
(shorts
caught
behind
longs)
[Bansal, Harchol-Balter 01], [Wierman, Harchol-Balter 03]:
THM:
E[T(x)]SRPT < E[T(x)]PS for all x, for Bounded Pareto, r < .9.
5
Effect of Variability
FCFS
FCFS
FCFS
FCFS
SJF
SJF
SJF
SJF
FCFS
SJF
E[T]
PS
LAS
LAS
LAS
LAS
LAS
SRPT
r
C2 = 24
8
16
12
20
Bounded Pareto job sizes
6
Closed vs. Open Systems
Open System
Closed System
Think
Send
Receive
QUESTION:
What’s the
effect of
scheduling?
7
Closed vs. Open Systems
FCFS
E[T]
E[T]
FCFS
SRPT
r
Open System Results
SRPT
r
Closed System Results
Closed & open systems analyzed under same job size distribution,
with same average load.
[Schroeder, Wierman, Harchol-Balter, NSDI 06]
8
Summary Single-Server
Single-server system
r <1
X: job size -- highly-variable
LESSONS LEARNED:
 Smart scheduling greatly
improves mean response time.
 Variability of job size
distribution is key.
 Closed system sees much
reduced effect.
9
Multiserver Model
Server farms:
+ Cheap
+ Scalable capacity
Incoming
jobs:
Poisson
Process
Routing
(assignment)
policy
Sched. policy
Sched. policy
Router
Sched. policy
2 Policy Decisions
(Sometimes scheduling policy is fixed – legacy system)
10
Outline
I.
Review of scheduling
in single-server
II. Supercomputing
III. Web server farm model
FCFS
Router
PS
Router
FCFS
PS
IV. Towards Optimality …
SRPT
SRPT
&
Router
SRPT
Metric:
Mean
Response
Time,
E[T]
11
Supercomputing Model
Poisson
Process
Routing
(assignment)
policy
FCFS
FCFS
Router
FCFS
 Jobs are not preemptible.
 Jobs processed in FCFS order.
 Assume hosts are identical.
 Jobs i.i.d. ~ G: highly variable size distribution.
 Size may or may not be known. Initially assume known.
12
Q: Compare Routing Policies for E[T]?
Supercomputing
1. Round-Robin
2. Join-Shortest-Queue
Go to host w/ fewest # jobs.
3. Least-Work-Left,
equivalent to M/G/k/FCFS
Go to host with least total work.
4.
FCFS
Poisson
Process
Routing
policy
FCFS
Router
FCFS
 Jobs i.i.d. ~ G: highly variable
Central-Queue-Shortest-Job
(M/G/k/SJF)
Host grabs shortest job when free.
5.
Size-Interval Splitting
Jobs are split up by size among hosts.
13
A: Size-Interval Splitting: best so far
High
E[T] 1. Round-Robin
2. Join-Shortest-Queue
Go to host w/ fewest # jobs.
3. Least-Work-Left,
equivalent to M/G/k/FCFS
Go to host with least total work.
4.
Supercomputing
FCFS
Routing
policy
FCFS
Router
FCFS
 Highly variable job sizes
Central-Queue-Shortest-Job
(M/G/k/SJF)
Host grabs shortest job when free.
Low
E[T]
5.
Size-Interval Splitting
Jobs are split up by size among hosts.
[Harchol-Balter, Crovella,
Murta, JPDC 99]
14
Routing Policies: Remarks
High
E[T] 1. Round-Robin
2. Join-Shortest-Queue
Go to host w/ fewest # jobs.
Central-Queue:
+ Good utilization
of servers.
+ Some isolation
for smalls
3. Least-Work-Left,
equivalent to M/G/k/FCFS
Go to host with least total work.
4.
Central-Queue-Shortest-Job
(M/G/k/SJF)
Host grabs shortest job when free.
Low
E[T]
5.
Size-Interval WAY Better!
- Worse utilization
of servers.
+ Great isolation
for smalls!
Size-Interval Splitting
Jobs are split up by size among hosts.
[Harchol-Balter, Crovella,
15
Murta, JPDC 99].
Size-Interval Splitting
S
x  f ( x)
M
L
SizeInterval
Routing
XL
job size x
Question: How to choose the size cutoffs?
“To Balance Load or Not to Balance Load?”
?
?
?
 xf ( x)dx =  xf ( x)dx =  xf ( x)dx =
 xf ( x)dx
S
XL
M
L
16
Size-Interval Splitting
x  f ( x)
FCFS
ssss S
FCFS
L LL L
SizeInterval
Routing
job size x
Answer: Recent Research for case of Bounded Pareto
job size: Pr{X>x} ~ x-a
a<1
UNBALANCE
favor smalls
a=1
BALANCE LOAD
a>1
UNBALANCE
favor larges
[Harchol-Balter,Vesilo, 06+], [Glynn, Harchol-Balter, Ramanan, 06+]
17
Beyond Size-Interval Splitting
FCFS
ssss S
x  f ( x)
FCFS
L LL L
SizeInterval
Routing
job size x
Q: Is Size-Interval Splitting as good as it gets?
18
Size-Interval Splitting with Stealing
Answer: Allow Cycle Stealing!
FCFS
S
FCFS
L
SizeInterval
Routing
with
Cycle
Stealing
Send Shorts Here
Send Longs Here.
But, if idle, send Short.
 Gain to Shorts is high
 Pain to Longs is very small.
Cycle Stealing analysis very hard: Fayolle, Iasnogorodski, Konheim,
Meilijson, Melkman, Cohen, Boxma, van Uitert, Jelenkovic, Foley, McDonald,
Harrison, Borst, Williams …
New easy approach: Dimensionality Reduction 2D1D
[Harchol-Balter, Osogami, Scheller-Wolf, Squillante SPAA03]
19
What if Don’t Know Job Size?
1. Round-Robin
2. Join-Shortest-Queue
Go to host w/ fewest # jobs.
3. Least-Work-Left,
equivalent to M/G/k/FCFS
FCFS
Routing
policy
FCFS
Router
FCFS
 Highly variable job sizes
Go to host with least total work.
4.
Central-Queue-Shortest-Job
(M/G/k/SJF)
Host grabs shortest job when free.
5.
Size-Interval Splitting
Jobs are split up by size among hosts.
Q: What can we do to
minimize E[T] when
don’t know job size?
20
The TAGS algorithm
“Task Assignment by Guessing Size”
Outside
Arrivals
Host 1
s
Host 2
m
Host 3
Answer:
When job reaches size limit for host, then it
is killed and restarted from scratch at next host.
[Harchol-Balter, JACM 02]
21
Results of Analysis
Random
Least-Work-Left
TAGS
High
variability
Lower
variability
22
Summary – Part I
Single-server system
r <1
X: job size -- highly-variable
Supercomputing
FCFS
Router
FCFS
LESSONS LEARNED:
 Smart scheduling greatly
improves mean response time.
 Variability of job size
distribution is key.
 Closed system sees much
reduced effect.
LESSONS LEARNED:
 Greedy routing policies,
like JSQ, LWL are poor.
 To combat variability, need
size-interval splitting.
 By isolating smalls, can
achieve effects of smart
single-server policies
 Load UN-balancing
 Don’t need to know size.
23
Tomorrow …
I.
Review of scheduling in single-server
M/GI/1
II. Supercomputing/Manufacturing
III. Web server farm model
FCFS
Router
PS
Router
FCFS
IV. Towards Optimality …
PS
Homework
SRPT
SRPT
&
Router
SRPT
24
Scheduling in Multiserver Systems
PART II
Mor Harchol-Balter
Computer Science Dept
Carnegie Mellon University
harchol@cs.cmu.edu
25
Outline
I.
Review of scheduling
in single-server
II. Supercomputing
III. Web server farm model
FCFS
Router
PS
Router
FCFS
PS
IV. Towards Optimality …
SRPT
SRPT
&
Router
SRPT
26
Outline
I.
Review of scheduling
in single-server
II. Supercomputing
III. Web server farm model
FCFS
Router
PS
Router
FCFS
PS
IV. Towards Optimality …
SRPT
SRPT
&
Router
SRPT
27
Web Server Farm Model
PS
Poisson
Process
Routing
policy
PS
Router
PS




 Cisco Local
Director
 IBM Network
Dispatcher
 Microsoft
SharePoint
 F5 Labs BIG/IP
HTTP requests are immediately dispatched to server.
Requests are fully preemptible.
Commodity servers utilized  Do Processor-Sharing.
Jobs i.i.d. ~ G: highly variable size distribution,
7 orders magnitude difference in job size
[Crovella, Bestavros 98].
28
Q: Compare Routing Policies for E[T]?
Web Server Farm
High
E[T]FCFS
PS
1. Random
Router
PS
2. Join-Shortest-Queue
Go to host w/ fewest # jobs.
3. Least-Work-Left
PS
?
 High variance job size
?
Go to host with least total
work.
4. Size-Interval Splitting
Low
E[T]FCFS
Jobs are split up by size
among hosts.
(Central-Queue policies aren’t
possible for PS farms)
29
Q: Compare Routing Policies for E[T]?
PS
1. Random
PS
Router
PS
2. Join-Shortest-Queue
Go to host w/ fewest # jobs.
3. Least-Work-Left
Go to host with least total
work.
4. Size-Interval Splitting
Jobs are split up by size
among hosts.
Answer:
ShortestQueue is
greedier
& better.
Answer:
Same for E[T],
but not great.
Also, want to
8 servers, r = .9, C2=50
balance load!
E[T]
SIZE
RAND
E[T ]M/G/1/PS =
LWL
JSQ
PS farm
E[T ]
1

r
l 1 r
 1
r 
=  pi 


l
p
1

r
i
 i

30
Prior Analysis of JSQ Routing
All prior JSQ analysis assumes FCFS servers
FCFS
JSQ
FCFS
2-server:
>2-server approximations:
[Kingman 61] , [Flatto, McKean 77],
[Wessels, Adan, Zijm 91]
[Nelson, Philips, Sigmetrics 89]
[Nelson, Philips, Perf.Eval. 93]
[Foschini, Salz 78],
[Knessl, Makkowsky, Schuss, Tier 87]
[Lin, Raghavendra, TPDS 96]
[Conolly 84], [Rao, Posner 87], [Blanc 87],
[Grassmann 80], [Muntz, Lui, Towsley 95]
[Cohen, Boxma 83]
31
[Nelson, Philips] Idea
FCFS
JSQ
FCFS
E[Waiting Time] =
E[ JobSize ] Pr{n total in all}E[length shortest queue | n total]
n
Assume this is:
M /M /k
n
p
Assume this is:
 n
 k 
k: #servers
32
First Analysis of JSQ for PS
[Gupta, Harchol-Balter, Sigman, Whitt, 06+]
PS
Poisson
Process
PS
JSQ
PS
Near insensitivity to C2:
PS server farm with General service
 PS server farm w/Exponential service
= FCFS server farm w/Exponential service
Single-queue equivalence:
For PS server farm w/Exponential service,
Multiserver system = Single queue w/ contingent arrival rates
33
Summary so far
Supercomputing
FCFS
Router
FCFS
LESSONS LEARNED:
 Greedy routing policies,
like JSQ, LWL are poor.
 To combat variability, need
size-interval splitting.
 By isolating smalls, can
achieve effects of smart
single-server policies
 Load UN-balancing
 Don’t need to know size.
Web server farm
PS
Router
PS
PS
LESSONS LEARNED:
 JSQ routing is good!
 Job size variability
not a problem.
 Load Balancing
34
Outline
I.
Review of scheduling in single-server
M/GI/1
II. Supercomputing/Manufacturing
III. Web server farm model
FCFS
Router
PS
Router
FCFS
PS
IV. Towards Optimality …
SRPT
SRPT
&
Router
SRPT
35
What is Optimal Routing/Scheduling?
Sched. policy
Routing
policy
Incoming
jobs
Sched. policy
Router
Sched. policy
2 Policy Decisions
Assume no restrictions:
 Jobs are fully preemptible.
 Can have central queue if want it, or not.
 Know job size
(of course don’t know future jobs ...)
36
What is Optimal Routing/Scheduling?
Central-Queue-SRPT
SRPT
Recall:
SRPT
minimizes E[T] on every sample path!
[Schrage 67]
Question: Central-Queue-SRPT looks pretty good!
Does it minimize E[T]?
37
Central-Queue-SRPT
SRPT
Answer: This does not minimize E[T] on every arrival sequence.
Bad Arrival Sequence:
@time 0: 2 jobs size 29, 1 job size 210
@time 210: 2 jobs size 28, 1 job size 29
@time 210 + 29: 2 jobs size 27, 1 job size 28, etc.
Central-Queue-SRPT
OPT
28 28 29
29
29
210
28
29
29 28 210
29
preempted
38
Central-Queue-SRPT
SRPT
Adversarial (Worst-Case) Guarantees:
THM: [Leonardi, Raz, STOC 97]: Central-Queue-SRPT is
  
O log min
biggest size
smallest size
jobs
, ##servers

competitive for E[T], and no online policy can improve upon this
by more than constant factor.
Remarks:
 log(biggest/smallest) could be factor 7 in practice!
 Closest stochastic result analyzes only central-queue w/priorities:
[Harchol-Balter, Wierman, Osogami, Scheller-Wolf, QUESTA 05]
39
What is Optimal Routing/Scheduling
with Immediate Dispatch?
Sched. policy
Routing
policy
Incoming
jobs
Sched. policy
Router
Sched. policy
2 Policy Decisions
Practical Assumption: jobs must be immediately dispatched!
 Jobs are fully preemptible within queue.
 Know job size.
40
What is Optimal Routing/Scheduling
when Immediately Dispatch?
Incoming
jobs
Immediately
Dispatch Jobs
SRPT
Router
SRPT
Claim:
The optimal routing/sched.
pair given immed. dispatch
uses SRPT at the hosts.
(Assuming an opt pair exists.)
PROOF:
 Let A: optimal routing/scheduling pair wrt E[T].
 Suppose by contradiction: A does not use SRPT at the hosts.
 Let policy pair B mimic A with respect to A’s dispatching of jobs to hosts.
 I.e., policy B may be different from A, but sends the same jobs to the
same hosts at the same times as A.
 But after the dispatching, B does SRPT scheduling at the hosts.
 Thus B improves upon A with respect to E[T]. Contradiction!
IMPACT: Claim  narrow search to policies with SRPT at hosts.
41
In search of good
Immediate Dispatch Routing
Immediately
Dispatch Jobs
Incoming
jobs
Router
SRPT
SRPT
SRPT
Q: What should immediate dispatch routing
policy be, given SRPT sched. at hosts?
42
In search of good
Immediate Dispatch Routing
… why not obvious
Immediately
Dispatch Jobs
SRPT
JSQ
SRPT
OPT
1000 10
10
1 1000
Bad Arrival Sequence:
@time 0: job of size 10 arrives.
@time 0+: job of size 1000 arrives.
@time 0++: job of size 10 arrives.
@time 0+++: job of size 1 arrives.
@time 1+++: job of size 1000 arrives.
JSQ/SRPT
10
10
1000
1 1000
43
Smart Immediate Dispatch Policy
Immediately
Dispatch
Incoming
jobs
Router
SRPT
SRPT
SRPT
Answer: IMD Algorithm due to [Avrahami,Azar 03]:
 Split jobs into size classes
 Assign each incoming job to server w/ fewest
#jobs in that class
Remarks:
biggest
 IMD is O log min smallest , # jobs competitive for E[T].
 Immediate Dispatching is “as good as” Central-Queue-SRPT
 Similar policy proposed by [Wu,Down 06] for heavy-traffic setting.
  

44
Some Key Points
Supercomputing
Web server farm model
FCFS
Router
PS
Router
FCFS
• Need Size-interval splitting
to combat job size variability
and enable good performance.
• Job size variability is
not an issue.
• Greedy, JSQ, performs well.
Towards Optimality …
SRPT
SRPT
&
Router
PS
SRPT
• Both these have
similar worst-case E[T].
• Almost exclusively
worst-case analysis,
so hard to compare
with above results.
• Need stochastic
45
research here!
If you want to know more …
My class lectures are all available online.
15-849 Performance Modeling
** Highly-recommended for CS theory, Math, TEPPER, and ACO doctoral students
Queueing theory is an old area of mathematics which has recently become very hot. The goal of
queueing theory has always been to improve the design/performance of systems, e.g. networks, servers,
memory, disks, distributed systems, etc., by finding smarter schemes for allocating resources to jobs.
In this class we will study the beautiful mathematical techniques used in queueing theory, including
stochastic analysis, discrete-time and continuous-time Markov chains, renewal theory, product-forms,
transforms, supplementary random variables, fluid theory, scheduling theory, matrix-analytic methods,
and more. Throughout we will emphasize realistic workloads, in particular heavy-tailed workloads.
This course is packed with open problems -- problems which if solved are not just interesting
theoretically, but which have huge applicability to the design of computer systems today.
Instructor: Mor Harchol-Balter (harchol@cs.cmu.edu)
www.cs.cmu.edu/~harchol/
46
References
 N. Avrahami and Y. Azar, “Minimizing Total flow time and total completion
time with immediate dispatching.” SPAA 2003, pp. 11-18.
 N. Bansal and M. Harchol-Balter, "Analysis of SRPT scheduling: Investigating
unfairness," Proceedings of ACM Sigmetrics 2001.
 P. Barford and M. Crovella, “Generating representative web workloads for
network and server performance evaluation,” ACM Sigmetrics 1998, pp. 151-160.
 J. Blanc, “A note on waiting times in systems with queues in parallel,” J. Appl.
Prob., Vol. 24, 1987 pp 540-546.
 S. Borst, O. Boxma, and P. Jelenkovic, “Reduced load equivalence and induced
burstiness in GPS queues with long-tailed traffic flows,” Queueing Systems, Vol.
43, 2003, pp. 274-285.
 S. Borst, O. Boxma, and M. van Uitert, “The asymptotic workload behavior of
two coupled queues,” Queueing Systems, Vol. 43, 2003, pp. 81-102.
 J.W. Cohen and O. Boxma, Boundary Value Problems in Queueing System
Analysis, North Holland, 1983
 B. Conolly, “The Autostrada queueing problem,” J. Appl. Prob.: Vol. 21., 1984, pp.
394-403.
47
References, cont.
 M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic:
evidence and possible causes,” Proceedings of the 1996 ACM Sigmetrics
International Conference on Measurement and Modeling of Computer Systems,
May 1996, pp. 160-169.
D. Down and R. Wu, “Multi-layered round robin scheduling for parallel servers,”
Queueing Systems: Theory and Applications, Vol. 53, No. 4, 2006, pp. 177-188.
G. Fayole and R. Iasnogorodski, “Two coupled processors: the reduction to a
Riemann-Hilbert problem,” Zeitschrift fur Wahrscheinlichkeistheorie und
vervandte Gebiete, vol. 47, 1979, pp. 325-351.
 L. Flatto and H.P. McKean, “Two queues in parallel,” Communication on Pure and
Applied Mathematics, Vol. 30, 1977, pp. 255-263.
 R. Foley and D. McDonald, “Exact asymptotics of a queueing network with a
cross-trained server,” Proceedings of INFORMS Annual Meeting, October 2003,
pp. MD-062.
 G. Foschini and J. Salz, “A basic dynamic routing problem and diffusion,” IEEE
Transactions on Communications, Vol. Com-26, No. 3, March 1978.
48
References, cont.
 P. Glynn, M. Harchol-Balter, K. Ramanan, “Heavy-traffic approach to optimizing
size-interval task assignment,” Work in progress, 2006.
 W. Grassmann, "Transient and steady state results for two parallel queues,"
Omega, vol. 8, 1980, pp. 105-112.
 V. Gupta, M. Harchol-Balter, K. Sigman, and W. Whitt, “Analysis of join-theshortest-queue policy for web server farms.” In submission, 2006.
 M. Harchol-Balter and A. Downey. "Exploiting process lifetime distributions for
dynamic load balancing," Proceedings of ACM Sigmetrics '96 Conference on
Measurement and Modeling of Computer Systems , May 1996, pp. 13-24.
 M. Harchol-Balter, M. Crovella, and C. Murta, "On choosing a task assignment
policy for a distributed server system," Journal of Parallel and Distributed
Computing , vol. 59, no. 2, Nov. 1999, pp. 204-228.
 M. Harchol-Balter, C. Li, T. Osogami, and A. Scheller-Wolf, and M. Squillante,
“Cycle stealing under immediate dispatch task assignment,” Proceedings of the
Annual ACM Symposium on Parellel Algorithms and Architectures (SPAA), June
2003, pp. 274-285.
49
References, cont.
 M. Harchol-Balter, B. Schroeder, N. Bansal, M. Agrawal. "Size-based scheduling to
improve web performance." ACM Transactions on Computer Systems , Vol. 21, No. 2,
May 2003, pp. 207-233.
 M. Harchol-Balter and R.Vesilo, “Optimal cutoffs for size-interval task assignment,”
Work in progress, 2006.
 M. Harchol-Balter, A. Wierman, T. Osogami, and A. Scheller-Wolf, "Multi-server
queueing systems with multiple priority classes," Queueing Systems: Theory and
Applications (QUESTA), vol. 51, no. 3-4, 2005, pp. 331-360.
 J. Kingman, “Two similar queues in parallel,” Biometrika, Vol. 48, 1961, pp. 1316-1323.
A. Konheim, I. Meilijson, and A. Melkman, “Processor-sharing of two parallel lines,” J.
Appl. Prob., Vol. 18, 1981, pp. 952-956.
 C. Knessl, B. Matkowsky, Z. Schuss, and C. Tier, “Two parallel M/G/1 queues where
arrivals join the system with the smaller buffer content,” IEEE Transactions on
Communications, Vol. Com-35, No. 11,1987, pp. 1153-1158.
 S. Leonardi and D. Raz, “Approximating total flow time on parallel machines,”
ACM Symposium on Theory of Computing (STOC), 1997.
50
References, cont.
 H. Lin, and C. Raghavendra, “An approximate analysis of the join the shortest
queue (JSQ) policy”, IEEE Transactions on Parallel and Distributed Systems, vol.
7, no. 3, March 1996.
 J. Lui, R. Muntz, D. Towsley, “Bounding the mean response time of the
minimum expected delay routing policy: an algorithmic approach,” IEEE
Transactions on Computers, Vol. 44, No. 12, Dec 1995.
 S. Muthukrishnan, R. Rajaraman, A. Shaheen, and J. Gehrke, “Online scheduling
to minimize average stretch,” Proceedings of the 40th Annual Symposium on
Foundations of Computer Science, October 1999, pp. 433.
 R. Nelson and T. Philips, “An approximation to the response time for shortest
queue routing,” ACM SIGMETRICS Performance Evaluation Review, Vol. 17 No. 1,
May 1989, pp. 181-189.
 R. Nelson and T. Philips, “An approximation for the mean response time for
shortest queue routing with general interarrival and service times,” Performance
Evaluation, Vol. 17 No. 2, March 1993 pp. 123-139.
51
References, cont.
 T. Osogami, M. Harchol-Balter, and A. Scheller-Wolf, “Analysis of cycle stealing
with switching cost,” Proceedings of the ACM Sigmetrics, June 2003, pp. 184-195.
 T. Osogami, M. Harchol-Balter, and A. Scheller-Wolf. "Analysis of cycle stealing
with switching times and thresholds" Performance Evaluation, Vol. 61, No. 4, 2005,
pp. 347-369.
 B. Rao and M. Posner, “Algorithmic and approximate analysis of the shorter
queue,” Model Naval Research Logistics, Vol. 34, 1987, pp. 381-398.
 R. Righter and J. Shanthikumar, “Scheduling multiclass single server queueing
systems to stochastically maximize the number of successful departures,"
Probability in the Engineering and Informational Sciences, Vol. 3, 1989, pp. 323-333.
 L.E. Schrage, “A proof of the optimality of the shortest processing remaining
time discipline,” Operations Research, Vol. 16, 1968, pp. 678-690.
 B. Schroeder and M. Harchol-Balter, "Evaluation of task assignment policies for
supercomputing servers: The case for load unbalancing and fairness," 9th IEEE
Symposium on High Performance Distributed Computing (HPDC '00) , August 2000.
52
References, cont.
 B. Schroeder, A. Wierman, and M. Harchol-Balter. "Closed versus open system
models: A cautionary tale,” Proceedings of NSDI , 2006.
A. Shaikh, J. Rexford, and K. Shin, “Load-sensitive routing of long-lived IP
flows,” Proceedings of SIGCOMM, September, 1999.
 J. Wessels, I. Adan, and W. Zijm, “Analysis of the asymmetric shortest queue
problem,” Queueing Systems, Vol. 8, 1991, pp. 1-58.
 A. Wierman and M. Harchol-Balter. "Classifying scheduling policies with respect
to higher moments of conditional response time." Proceedings of ACM Sigmetrics
2005 Conference on Measurement and Modeling of Computer Systems.
 A. Wierman and M. Harchol-Balter. "Nearly insensitive bounds on SMART
scheduling." Proceedings of ACM Sigmetrics 2005 Conference on Measurement
and Modeling of Computer Systems.
 A. Wierman and M. Harchol-Balter, "Classifying scheduling policies with
respect to unfairness in an M/GI/1," Proceedings of ACM Sigmetrics 2003
Conference on Measurement and Modeling of Computer Systems , June 2003.
53
Download