Chapter 14 - Queueing Models

14.1 Introduction
A basic fact of life is that we all spend a great deal of time waiting in lines (queues). We wait in
line at a bank, at a supermarket, at a fast-food restaurant, at a stoplight, and so on. Of course,
people are not the only entities that wait in queues. Televisions at a television repair shop,
other than the one(s) being repaired, are essentially waiting in line to be repaired. Also, when
messages are sent through a computer network, they often must wait in a queue before being
processed.
The same type of analysis applies to all of these. The purpose of such an analysis is generally
twofold.
1. First, we want to examine an existing system to quantify its operating characteristics.
For example, if a fast-food restaurant currently employs 12 people in various jobs, the
manager might be interested in determining the amount of time a typical customer
must wait in line or how many customers are typically waiting in line.
2. Second, we want to learn how to make a system better. The manager might find, for
example, that the fast-food restaurant would do better, from an economic standpoint,
by employing only 10 workers and deploying them in a different manner.
The first objective, analyzing the characteristics of a given system, is difficult from a
mathematical point of view. The two basic modelling approaches are analytical and simulation.
With the analytical approach, we search for mathematical formulas that describe the operating
characteristics of the system, usually in “steady state.” The mathematical models are typically
too complex to solve unless we make simplifying (and sometimes unrealistic) assumptions. For
example, at a supermarket, customers typically join one of several lines (probably the shortest),
possibly switch lines if they see that another line is moving faster, and eventually get served by
one of the checkout people. Although this behaviour is common—and is simple to describe in
words—it is very difficult to analyze mathematically.
The second approach, simulation, allows us to analyze much more complex systems, without
making many simplifying assumptions. However, the drawback to queueing simulation is that it
usually requires specialized software packages or trained computer programmers to
implement.
In this chapter, we employ both the analytical approach and simulation. For the former, we
discuss several well-known queueing models that describe some—but certainly not all—
queueing situations in the real world. These models illustrate how to calculate such operating
characteristics as the average waiting time per customer, the average number of customers in
line, and the fraction of time servers are busy. These analytical models generally require
simplifying assumptions, and even then they can be difficult to understand.
The inputs are typically mean customer arrival rates and mean service times. The required
outputs are typically mean waiting times in queues, mean queue lengths, the fraction of time
servers are busy, and possibly others. Deriving the formulas that relate the inputs to the
outputs is mathematically very difficult, well beyond the level of this book. Therefore, many
times in this chapter you have to take our word for it. Nevertheless, the models we illustrate
are very valuable for the important insights they provide.
14.2 ELEMENTS OF QUEUEING MODELS
Almost all queueing systems are alike in that customers enter a system, possibly wait in one or
more queues, get served, and then depart.
Characteristics of Arrivals
First, we must specify the customer arrival process. This includes the timing of arrivals as well as
the types of arrivals. Regarding timing, specifying the probability distribution of interarrival
times, the times between successive customer arrivals, is most common. These interarrival
times might be known—that is, nonrandom. For example, the arrivals at some doctors’ offices
are scheduled fairly precisely. Much more commonly, however, interarrival times are random
with a probability distribution. In real applications, this probability distribution must be
estimated from observed customer arrival times.
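To make the estimation step concrete, here is a minimal sketch of fitting an exponential interarrival-time model to recorded arrival times. The timestamps are hypothetical; in a real application they would come from observed data.

```python
# Estimate the interarrival-time distribution from observed arrival times.
# The timestamps below are hypothetical stand-ins for recorded data (minutes).
arrival_times = [0.0, 1.3, 2.1, 4.6, 5.0, 7.8, 9.9]

# Interarrival times are the gaps between successive arrivals.
interarrivals = [b - a for a, b in zip(arrival_times, arrival_times[1:])]

# For an exponential model, the single parameter is the arrival rate,
# estimated as the reciprocal of the mean interarrival time.
mean_gap = sum(interarrivals) / len(interarrivals)
arrival_rate = 1 / mean_gap  # customers per minute

print(f"mean interarrival time: {mean_gap:.2f} min")
print(f"estimated arrival rate: {arrival_rate:.2f} per min")
```

A fuller analysis would also check whether the exponential shape actually fits the observed gaps, for example with a histogram or goodness-of-fit test.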
Regarding the types of arrivals, there are at least two issues.
1. First, do customers arrive one at a time or in batches—carloads, for example? The
simplest system is when customers arrive one at a time, as we assume in all of the
models in this chapter.
2. Second, are all customers essentially alike, or can they be separated into priority
classes? At a computer centre, for example, certain jobs might receive higher priority
and run first, whereas the lower-priority jobs might be sent to the back of the line and
run only after midnight. We assume throughout this chapter that all customers have the
same priority.
Another issue is whether (or how long) customers will wait in line. A customer might arrive at
the system, see that too many customers are waiting in line, and decide not to enter the system
at all. This is called balking. A variation of balking occurs when the choice is made by the
system, not the customer. In this case, we assume there is a waiting room size so that if the
number of customers in the system equals the waiting room size, newly arriving customers are
not allowed to enter the system. We call this a limited waiting room system. Another type of
behaviour, called reneging, occurs when a customer already in line becomes impatient and
leaves the system before starting service. Systems with balking and reneging are difficult to
analyze, so we do not consider any such systems in this chapter. However, we do discuss
limited waiting room systems.
Service Discipline
When customers enter the system, they might have to wait in line until a server becomes
available. In this case, we must specify the service discipline. The service discipline is the rule
that states which customer, from all who are waiting, goes into service next. The most common
service discipline is first-come-first-served (FCFS), where customers are served in the order of
their arrival. All of the models we discuss use the FCFS discipline. However, other service
disciplines are possible, including service-in-random-order (SRO), last-come-first-served
(LCFS), and various priority disciplines (if there are customer classes with different priorities).
For example, a type of priority discipline used in some manufacturing plants is called the
shortest-processing-time (SPT) discipline. In this case, the jobs that are waiting to be processed
are ranked according to their eventual processing (service) times, which are assumed to be
known. Then the job with the shortest processing time is processed next.
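Because SPT ranks jobs by a single known key, it maps naturally onto a priority queue. The sketch below (with hypothetical job names and processing times) uses a min-heap keyed on processing time to produce the SPT service order.

```python
import heapq

# Shortest-processing-time (SPT) discipline: among the waiting jobs, always
# serve the one with the smallest (known) processing time next.
# Job names and processing times here are hypothetical.
waiting_jobs = [("job_a", 7.0), ("job_b", 2.5), ("job_c", 4.0)]

# A min-heap keyed on processing time yields the SPT order directly.
heap = [(time, name) for name, time in waiting_jobs]
heapq.heapify(heap)

service_order = []
while heap:
    time, name = heapq.heappop(heap)
    service_order.append(name)

print(service_order)  # job_b goes first: it has the shortest processing time
```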
One other aspect of the waiting process is whether there is a single line or multiple lines. For
example, most banks now have a single line. An arriving customer joins the end of the line.
When any teller finishes service, the customer at the head of the line goes to that teller. In
contrast, most supermarkets have multiple lines. When a customer goes to a checkout counter,
she must choose which of several lines to enter. Presumably, she will choose the shortest line,
but she might use other criteria in her decision. After she joins a line—inevitably the
slowest-moving one, from our experience!—she might decide to move to another line that
seems to be moving faster.
Service Characteristics
In the simplest systems, each customer is served by exactly one server, even when the system
contains multiple servers. For example, when you enter a bank, you are eventually served by a
single teller, even though several tellers are working. The service times typically vary in some
random manner, although constant (nonrandom) service times are sometimes possible. When
service times are random, we must specify the probability distribution of a typical service time.
This probability distribution can be the same for all customers and servers, or it can depend on
the server and/or the customer. As with interarrival times, service time distributions must
typically be estimated from service time data in real applications.
In a situation like the typical bank, where customers join a single line and are then served by
the first available teller, we say the servers (tellers) are in parallel (see Figure 14.1).
A different type of service process is found in many manufacturing settings. For example,
various types of parts (the “customers”) enter a system with several types of machines (the
“servers”). Each part type then follows a certain machine routing, such as machine 1, then
machine 4, and then machine 2. Each machine has its own service time distribution, and a
typical part might have to wait in line behind any or all of the machines on its routing. This type
of system is called a queueing network. The simplest type of queueing network is a series
system, where all parts go through the machines in numerical order: first machine 1, then
machine 2, then machine 3, and so on (see Figure 14.2). We examine mostly parallel systems in
this chapter.
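A series system can be sketched with a simple recursion: at each FCFS machine, a part starts service only when it has arrived there and the previous part has departed, and the output times of one machine become the input times of the next. The arrival and service times below are hypothetical.

```python
# Sketch: a two-machine series system, one server per machine, FCFS order.
# Part i starts at a machine when it has arrived AND the machine is free;
# its departure time is its start time plus its service time.
def stage_departures(arrivals, services):
    """Departure times from a single FCFS machine."""
    departures = []
    prev_departure = 0.0
    for arrive, serve in zip(arrivals, services):
        start = max(arrive, prev_departure)  # wait if the machine is still busy
        prev_departure = start + serve
        departures.append(prev_departure)
    return departures

arrivals = [0.0, 2.0, 3.0]   # arrival times at machine 1 (hypothetical)
svc1 = [1.5, 1.5, 1.5]       # service times at machine 1
svc2 = [1.0, 2.0, 1.0]       # service times at machine 2

d1 = stage_departures(arrivals, svc1)  # departures from machine 1
d2 = stage_departures(d1, svc2)        # those departures feed machine 2
print(d1, d2)
```

Chaining `stage_departures` once per machine extends this to any series routing; general queueing networks with part-specific routings require a full discrete-event simulation.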
Short-Run versus Steady-State Behaviour
If you run a fast-food restaurant, you are particularly interested in the queueing behaviour
during your peak lunchtime period. The customer arrival rate during this period increases
sharply, and you probably employ more workers to meet the increased customer load. In this
case, your primary interest is in the short-run behaviour of the system—the next hour or two.
Unfortunately, short-run behaviour is the most difficult to analyze.
But how do we draw the line between the short run and the long run? The answer depends on
how long the effects of initial conditions persist.
Analytical models are best suited for studying long-run behaviour. This type of analysis is called
steady-state analysis and is the focus of much of the chapter. One requirement for steady-state
analysis is that the parameters of the system remain constant for the entire time period.
Another requirement for steady-state analysis is that the system must be stable. This means
that the servers must serve fast enough to keep up with arrivals—otherwise, the queue can
theoretically grow without limit. For example, in a single-server system where all arriving
customers join the system, the requirement for system stability is that the arrival rate must be
less than the service rate. If the system is not stable, the analytical models discussed in this
chapter cannot be used.
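The stability condition is easy to state in code. The sketch below checks it for a single-server and a multi-server system; the rates are hypothetical.

```python
# Stability check: a queueing system is stable only if the arrival rate is
# strictly less than the total service capacity. Rates are hypothetical
# (customers per hour).
def is_stable(arrival_rate, service_rate, servers=1):
    return arrival_rate < servers * service_rate

print(is_stable(8, 10))               # True: one server keeps up
print(is_stable(12, 10))              # False: the queue grows without bound
print(is_stable(12, 10, servers=2))   # True: two servers handle up to 20/hour
```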
14.4 IMPORTANT QUEUEING RELATIONSHIPS
We typically calculate two general types of outputs in a queueing model: time averages and
customer averages. Typical time averages are
- L, the expected number of customers in the system
- LQ, the expected number of customers in the queue
- LS, the expected number of customers in service
- P(all idle), the probability that all servers are idle
- P(all busy), the probability that all servers are busy
If you were going to estimate the quantity LQ, for example, you might observe the system at
many time points, record the number of customers in the queue at each time point, and then
average these numbers. In other words, you would average this measure over time. Similarly,
to estimate a probability such as P(all busy), you would observe the system at many time
points, record a 1 each time all servers are busy and a 0 each time at least one server is idle,
and then average these 0’s and 1’s.
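The estimation procedure just described can be sketched in a few lines. The sampled observations below are hypothetical; in practice they would come from watching the system at many time points.

```python
# Time averages: observe the system at many time points and average the
# recorded values. The sampled observations below are hypothetical.
queue_lengths = [0, 1, 2, 2, 1, 0, 0, 3, 1, 0]  # number in queue at each sample
all_busy =      [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]  # 1 if every server was busy

# Estimate of LQ: the average number in queue over the sampled time points.
LQ_estimate = sum(queue_lengths) / len(queue_lengths)

# Estimate of P(all busy): the fraction of samples with all servers busy.
p_all_busy = sum(all_busy) / len(all_busy)

print(LQ_estimate, p_all_busy)
```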
In contrast, typical customer averages are
- W, the expected time spent in the system (waiting in line or being served)
- WQ, the expected time spent in the queue
- WS, the expected time spent in service
Little’s Formula
Three basic quantities relate the inputs to the outputs:
λ = arrival rate (mean number of arrivals per time period)
μ = service rate (mean number of people or items served per time period)
U = server utilization (the long-run fraction of time the server is busy)
Little’s formula links the customer averages to the time averages: L = λW, and likewise LQ = λWQ and LS = λWS. It holds under very general conditions, which is why it is so useful in practice.
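Little’s formula, L = λW (and likewise LQ = λWQ, LS = λWS), converts between the two kinds of averages. The sketch below applies it with hypothetical rates.

```python
# Little's formula ties customer averages to time averages: L = lam * W.
# The values below are hypothetical.
lam = 4.0   # arrival rate, customers per hour
WQ = 0.25   # mean time in queue, hours
WS = 0.10   # mean time in service, hours

W = WQ + WS    # mean time in system
L = lam * W    # expected number in system  (Little's formula)
LQ = lam * WQ  # expected number in queue
LS = lam * WS  # expected number in service

print(L, LQ, LS)
```

Note that the decomposition is consistent: L = LQ + LS follows directly because W = WQ + WS.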
14.5 ANALYTICAL STEADY-STATE QUEUEING MODELS
We will illustrate only the most basic models, and even for these, we provide only the key formulas. In
some cases, we even automate these formulas with behind-the-scenes macros. This enables you to focus
on the aspects of practical concern: (1) the meaning of the assumptions and whether they are realistic,
(2) the relevant input parameters, (3) interpretation of the outputs, and possibly (4) how to use the
models for economic optimization.
The Basic Single-Server Model (M/M/1)
We begin by discussing the most basic single-server model, labelled the M/M/1 model. This shorthand
notation, developed by Kendall, implies three things. The first M implies that the distribution of
interarrival times is exponential. The second M implies that the distribution of service times is also
exponential. Finally, the “1” implies that there is a single server.
Mean time between arrivals = 1/λ
Mean service time per customer = 1/μ
ρ = traffic intensity = λ/μ
This is called the traffic intensity, which is a very useful measure of the congestion of the system. In fact,
the system is stable only if ρ < 1. If ρ ≥ 1, so that λ ≥ μ, then arrivals occur at least as fast as the server
can handle them; in the long run, the queue becomes infinitely large—that is, it is unstable. Therefore,
we must assume that ρ < 1 to obtain steady-state results.
Assuming that the system is stable, let pn be the steady-state probability that there are exactly n
customers in the system (waiting in line or being served) at any point in time. This probability can be
interpreted as the long-run fraction of time when there are n customers in the system. For example, p0
is the long-run fraction of time when there are no customers in the system, p1 is the long-run fraction of
time when there is exactly one customer in the system, and so on.
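The standard M/M/1 steady-state formulas can be computed directly; the sketch below uses hypothetical rates and states the usual textbook results (pn = (1 − ρ)ρ^n, L = ρ/(1 − ρ), W = 1/(μ − λ), and so on).

```python
# Steady-state measures for the M/M/1 queue (standard textbook formulas).
# Rates are hypothetical: lam arrivals/hour, mu services/hour, with lam < mu.
lam, mu = 4.0, 5.0
rho = lam / mu  # traffic intensity; must be < 1 for stability
assert rho < 1, "system is unstable"

p0 = 1 - rho  # long-run fraction of time the system is empty

def prob_n(n):
    """P(exactly n customers in system): (1 - rho) * rho**n."""
    return (1 - rho) * rho**n

L = rho / (1 - rho)    # expected number in system
W = 1 / (mu - lam)     # expected time in system
WQ = rho / (mu - lam)  # expected time in queue
LQ = lam * WQ          # expected number in queue, via Little's formula

print(p0, L, W, WQ, LQ)
```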
The Basic Multi-Server Model (M/M/s)
Many service facilities such as banks and postal branches employ multiple servers. Usually, these servers
work in parallel, so that each customer goes to exactly one server for service and then departs. In this
section, we analyze the simplest version of this multiple-server parallel system, labelled the M/M/s
model. Again, the first M means that interarrival times are exponentially distributed. The second M
means that the service times for each server are exponentially distributed. (We also assume that each
server is identical to the others, in the sense that each has the same mean service time.) Finally, the s in
M/M/s denotes the number of servers. (If s = 1, the M/M/s and M/M/1 models are identical. In other
words, the M/M/1 system is a special case of the M/M/s system.)
If you think about the multiple-server facilities you typically enter, such as banks, post offices, and
supermarkets, you recognize that there are two types of waiting line configurations.
1. The first, usually seen at supermarkets, is where each server has a separate line. Each customer
must decide which line to join (and then either stay in that line or switch later on).
2. The second, seen at most banks and post offices, is where there is a single waiting line, from
which customers are served in FCFS order. We examine only the second type because it is
arguably the more common system in real-world situations and is much easier to analyze
mathematically.
There are three inputs to this system: the arrival rate λ, the service rate (per server) μ, and the number
of servers s. To ensure that the system is stable, we must also assume that the traffic intensity, now
given by ρ = λ /(sμ), is less than 1. In words, we require that the arrival rate λ be less than the maximum
service rate sμ (which is achieved when all s servers are busy). If the traffic intensity is not less than 1,
the length of the queue eventually increases without bound.
The steady-state analysis for the M/M/s system is more complex than for the M/M/1 system. As before,
let pn be the probability that there are exactly n customers in the system, waiting or in service. Then it
turns out that all of the steady-state quantities depend on p0, which can be calculated from the rather
complex formula in equation (14.12). Then the other quantities can be calculated from p0, as indicated
in equations (14.13) to (14.17).
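Since equations (14.12) to (14.17) are not reproduced here, the sketch below uses the usual textbook versions of the M/M/s steady-state formulas (the normalizing sum for p0 and the Erlang-style expression for LQ), with hypothetical inputs.

```python
from math import factorial

# Steady-state measures for the M/M/s queue, using the standard textbook
# formulas. Inputs (lam, mu, s) are hypothetical.
def mms_measures(lam, mu, s):
    rho = lam / (s * mu)  # traffic intensity; must be < 1 for stability
    assert rho < 1, "system is unstable"
    a = lam / mu  # offered load
    # p0 comes from the normalizing sum over all states.
    p0 = 1 / (sum(a**n / factorial(n) for n in range(s))
              + a**s / (factorial(s) * (1 - rho)))
    # Expected queue length, then the other measures via Little's formula.
    LQ = p0 * a**s * rho / (factorial(s) * (1 - rho) ** 2)
    WQ = LQ / lam
    W = WQ + 1 / mu
    L = lam * W
    return p0, LQ, WQ, W, L

p0, LQ, WQ, W, L = mms_measures(lam=8.0, mu=5.0, s=2)
print(round(p0, 4), round(LQ, 4))
```

Setting s = 1 reduces these formulas to the M/M/1 results, which is one way to check an implementation.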