Queueing System Topologies with Limited Flexibility
John N. Tsitsiklis
LIDS, Massachusetts Institute of Technology, Cambridge, MA, 02139, [email protected]
Kuang Xu
LIDS, Massachusetts Institute of Technology, Cambridge, MA, 02139, [email protected]
We study a multi-server model with n flexible servers and rn queues, connected through a fixed bipartite
graph, where the level of flexibility is captured by the average degree, d(n), of the queues. Applications in
skill-based routing in call centers, flexible supply chains, and content replication in data centers are among
our main motivations.
We focus on the scaling regime where the system size n tends to infinity, while the overall traffic intensity
stays fixed. We show that a large capacity region (robustness) and diminishing queueing delay (performance)
are jointly achievable even under very limited flexibility (d(n) ≪ n). In particular, when d(n) ≫ ln n, a
family of random-graph-based interconnection topologies is (with high probability) capable of stabilizing
all admissible arrival rate vectors (under a bounded support assumption), while simultaneously ensuring
a diminishing queueing delay, of order ln n/d(n), as n → ∞. Our analysis is centered around a new class
of virtual-queue-based scheduling policies that rely on dynamically constructed partial matchings on the
connectivity graph.
Key words: queueing system, flexibility, partial resource pooling, random graph, expander graph, asymptotics
1. Introduction
At the heart of many modern queueing networks lies the problem of allocating processing resources
(e.g., manufacturing plants, web servers, or call-center staff) to meet multiple types of demands
that arrive dynamically over time (e.g., orders, data queries, or customer inquiries). It is often
the case that a fully flexible or completely resource-pooled system, where every unit of processing
resource is capable of serving all types of demands, delivers the best possible performance. Our
inquiry is, however, motivated by the unfortunate reality that such full flexibility is often infeasible
due to overwhelming implementation costs (in the case of supply chains or data centers) or human
skill limitations (in the case of a skill-based call center).
What are the key benefits of flexibility and resource pooling in such queueing networks? Can we
harness the same benefits even when the degree of flexibility is limited, and how should the network
be designed and operated? These are the main questions that we address in this paper. While
these questions can be approached from several different angles, we will focus on the metrics of
expected queueing delay and capacity region; the former is a direct reflection of performance, while
the latter measures the system’s robustness against demand uncertainties, i.e., when the arrival
rates for different demand types are unknown or likely to fluctuate over time. Our main message
is positive: in the regime where the system size is large, improvements in both the capacity region
and delay are jointly achievable even under very limited flexibility, given a proper choice of the
interconnection topology.
Figure 1. Extreme cases of (in)flexibility: d(n) = n vs. d(n) = 1.
Benefits of Full Flexibility. We begin by illustrating the benefits of flexibility and resource
pooling using two simple examples.1 Consider a system of n servers, each running at rate 1, and n
queues, where each queue stores jobs of a particular demand type. For each i ∈ {1, . . . , n}, queue i
receives a Poisson arrival stream of rate λi = ρ ∈ (0, 1), independently from all other queues. The
job sizes are independent and exponentially distributed with mean 1.
For the remainder of this paper, we will use, as a measure of flexibility, the average number of servers from which a demand type can receive service, denoted by d(n). Let us consider
the two extreme cases: a fully flexible system, with d(n) = n (Figure 1(a)), and an inflexible system,
with d(n) = 1 (Figure 1(b)). Fixing the traffic intensity ρ, and letting the system size, n, tend to
infinity, we observe the following qualitative benefits of full flexibility.
1. Diminishing Delay: Let W be the steady-state expected time in queue. In the fully flexible
case and under any work-conserving scheduling policy,2 the collection of all jobs in the system
evolves as an M/M/n queue with arrival rate ρn. It is not difficult to verify that the expected total
number of jobs in queue is bounded by a constant independent of n, and hence the expected waiting
Footnote 1. The lack of mathematical rigor in this section is intended to make the results easier to state. Formal definitions will be given in later sections.
Footnote 2. Under a work-conserving policy, a server is always busy whenever there is at least one job in the queues to which it is connected.
time in queue (the time from entering the queue to the initiation of service) satisfies E(W) → 0, as n → ∞.3 In contrast, the inflexible system is simply a collection of n unrelated M/M/1 queues, and hence the expected waiting time is E(W) = ρ/(1 − ρ) > 0, for all n. In other words, expected delay diminishes in a fully flexible system as the system size increases, but stays bounded away from zero in the inflexible case.
2. Large Capacity Region: Suppose that we now allow the arrival rate λi to queue i to vary
with i. For the fully flexible case, and treating it again as an M/M/n queue, it is easy to see that the system is stable for all arrival rate vectors that satisfy ∑_{i=1}^{n} λi < n, whereas in the inflexible system, since all M/M/1 queues operate independently, we must have λi < 1, for all i, in order to achieve stability. Comparing the two, we see that the fully flexible system attains a much larger capacity region, and is hence more robust to uncertainties or changes in the arrival rates.
3. Joint Achievability: Finally, it is remarkable, though perhaps obvious, that a fully flexible
system can achieve both benefits (diminishing delay and large capacity region) simultaneously,
without sacrificing one for the other. In particular, for any ρ ∈ (0, 1), the condition ∑_{i=1}^{n} λi < ρn for all n implies that E(W) → 0, as n → ∞.
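The contrast between the two extreme cases can also be checked numerically. The sketch below is our own illustration (the helper name erlang_c_wait is not from the paper): it evaluates the steady-state expected waiting time of the M/M/n system with arrival rate ρn via the standard Erlang-C formula, and compares it against the M/M/1 value ρ/(1 − ρ).

```python
import math

def erlang_c_wait(n, rho):
    """Expected steady-state waiting time E(W) of an M/M/n queue with
    arrival rate rho*n and n unit-rate servers (Erlang-C formula)."""
    a = rho * n                          # offered load
    head = sum(a**k / math.factorial(k) for k in range(n))
    tail = a**n / (math.factorial(n) * (1 - rho))
    p_wait = tail / (head + tail)        # probability that an arrival waits
    return p_wait / (n * (1 - rho))      # E(W) = P(wait) / (n*mu - lambda)

rho = 0.9
for n in (1, 10, 100):
    print(n, erlang_c_wait(n, rho))
# n = 1 recovers the M/M/1 value rho/(1 - rho) = 9; as n grows, E(W) -> 0.
```

For ρ = 0.9, the waiting time drops from 9 at n = 1 to a few hundredths at n = 100, matching the diminishing-delay behavior described above.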
Preview of Main Result. Will the above benefits of flexibility continue to be present if the
system is no longer fully flexible (i.e., if d(n) ≪ n)? The main result of the paper (Theorem 1) shows
that the objectives of diminishing delay and a large capacity region can still be jointly achieved,
even when the amount of flexibility in the system is limited (ln n ≪ d(n) ≪ n), as long as the
arrival rates are appropriately bounded. However, when flexibility is limited, the interconnection
topology and scheduling policy need to be chosen with care: our solution is based on a family of
connectivity graphs generated by the Erdös-Rényi random bipartite graph construction, combined
with a new class of scheduling policies that rely on dynamically constructed partial matchings.
Furthermore, the scheduling policies are completely oblivious to the exact values of λi , and adapt
to them automatically.
1.1. Motivating Applications
We describe here some motivating applications for our model, which share a common overall
architecture illustrated in Figure 2. Content replication is commonly used in data centers for
bandwidth intensive operations such as database queries (2) or video streaming (9), by hosting
the same piece of content on multiple servers. Here, a server corresponds to a physical machine in
a data center, and each queue stores incoming demands for a particular piece of content (e.g., a
Footnote 3. The diminishing expected waiting time follows from the bounded expected total number of jobs in steady state, Little's Law, and the fact that the total arrival rate is ρn, which goes to infinity as n → ∞.
video clip). A server j is connected to queue i if there is a copy of content i on server j, and d(n)
corresponds to the average number of replicas per content across the network. Similar structures
also arise in skill-based routing (SBR) in call centers, where agents (servers) are assigned
to answer calls from different categories (queues) based on their domains of expertise (15), and
in process-flexible supply chains (3)-(8), where each plant (server) is capable of producing
multiple product types (queues). In many of these applications, demand rates can be unpredictable
and may change significantly over time (for instance, unexpected “spikes” in demand traffic are
common in modern data centers (1)). The demand uncertainties make robustness an important
criterion for system design. These practical concerns have been our primary motivation for studying
the joint trade-off between robustness, performance, and the level of flexibility.
1.2. Related Work
Bipartite graphs serve as a natural model of the relationships between demand types and service
resources. It is well known in the supply chain literature that limited flexibility, corresponding to a
sparse bipartite graph, can be surprisingly effective in resource allocation even when compared to
a fully flexible system (3, 4, 5, 6, 7). The use of sparse random graphs or expanders as flexibility
structures that improve robustness has recently been studied in (8) in the context of supply chains,
and in (9) for content replication. Similar to the robustness results reported in this paper, these
works show that the use of expanders or random graphs can accommodate a large set of demand
rate vectors. However, in contrast to our work, nearly all analytical results in this literature focus
on static allocation problems, where one tries to match supply with demand in a single slot, as
opposed to the queueing context, where resource allocation decisions need to be made dynamically
over time.
Figure 2
A processing network with rn queues and n servers.
In the queueing theory literature, the models we consider fall under the umbrella of so-called
multi-class multi-server systems, where multiple queues and servers are connected through a bipartite graph. Under these (and similar) settings, complete resource pooling (full flexibility) is known
to improve system performance (12, 13, 14), but much less is known when only limited flexibility is
available, because systems with a non-trivial connectivity graph are very difficult to analyze, even
under seemingly simple scheduling policies (e.g., first-come-first-served) (10, 11). Simulations in (15)
show empirically that limited cross-training can be highly effective in a large call center under a
skill-based routing algorithm. Using a very different set of modeling assumptions, (16) proposes a
specific chaining structure with limited flexibility, which is shown to perform well under heavy traffic. Closer to the spirit of the current work is (17), which studies a partially flexible system where
a fraction p > 0 of all processing resources are fully flexible, while the remaining fraction, 1 − p, is
dedicated to specific demand types, and shows an exponential improvement in delay scaling under
heavy-traffic. However, both (16) and (17) focus on the heavy-traffic regime, which is different from
the current setting where traffic intensity is assumed to be fixed while the system size tends to
infinity, and provide analytical results for the special case of uniform demand rates. Furthermore,
with a constant fraction of fully flexible resources, the average degree in (17) scales linearly with
the system size n, whereas we are interested in the case of a much slower (super-logarithmic) degree
scaling. Finally, the expected delay in the architecture considered in (17) does not tend to zero as
the system size increases.
At a higher level, our work is focused on the joint trade-off between robustness, delay, and the
degree of flexibility in a queueing network, which is little understood in the existing literature, especially for networks with a non-trivial interconnection topology.
On the technical end, we build on several existing ideas. The techniques of batching (cf. (18, 19))
and the use of virtual queues (cf. (20, 21)) have appeared in many contexts in queueing theory,
but the specific models considered in the literature bear little resemblance to ours. The study of
perfect matchings on a random bipartite graph dates back to the seminal work in (22); while it
has become a rich topic in combinatorics, we will refrain from giving a thorough literature survey
because only some elementary and standard properties of random graphs are used in the current
paper.
Organization of the Paper. We describe the model in Section 2 along with the notation to be
used throughout. The main result is stated in Section 3, with a discussion of potential improvements
postponed till Section 7. Section 3.2 studies a specific flexibility structure and compares it against
the architecture proposed in this paper. The rest of the paper is devoted to the proof of our main
result (Theorem 1), with an overview of the proof strategy given in Section 3.3.
2. Model and Notation
Notation. To avoid clutter, we will minimize the use of floor and ceiling notation throughout the paper (e.g., writing T_rn instead of T_⌊rn⌋). We will assume that all values of interest are appropriately rounded up or down to an integer, whenever doing so does not cause ambiguity or confusion. We denote by Gn the set of all rn × n bipartite graphs. For gn ∈ Gn, let deg(gn) be the average degree
among the rn left vertices. An m × n Erdös-Rényi random bipartite graph G = (E, I ∪ J),4 where each of the mn edges is present with probability p, is referred to as an (m, n, p) random graph. We will use Pn,p(·) to denote the probability measure on Gn induced by an (rn, n, p) random graph,

Pn,p(g) = p^|E| (1 − p)^(rn² − |E|),  ∀g ∈ Gn,  (1)

where |E| is the cardinality of the set of edges, E. For a subset of vertices, M ⊂ I ∪ J, we will denote by g|M the graph induced by g on the vertices in M. Throughout, we will use the letter K to denote a generic constant.
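As an illustration of this construction (our own sketch; the function names are not from the paper), an (m, n, p) random bipartite graph can be sampled edge by edge, and with p = d/n the average degree of the left (queue) vertices concentrates around d.

```python
import random

def random_bipartite(m, n, p, rng):
    """Sample an (m, n, p) random bipartite graph: each of the m*n possible
    edges between left vertices {0..m-1} and right vertices {0..n-1} is
    present independently with probability p."""
    return {(i, j) for i in range(m) for j in range(n) if rng.random() < p}

def avg_left_degree(edges, m):
    """Average degree among the m left (queue) vertices."""
    return len(edges) / m

# With r = 1 and p = d/n, each queue is connected to d servers on average.
n, r, d = 200, 1, 10
edges = random_bipartite(r * n, n, d / n, random.Random(0))
print(avg_left_degree(edges, r * n))   # concentrates around d = 10
```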
The following shorthand notation for asymptotic comparisons will be used; here f and g are positive functions:
1. f(x) ≲ g(x) for f(x) = O(g(x)), and f(x) ≳ g(x) for f(x) = Ω(g(x)).
2. f(x) ≫ g(x) for lim inf_{x→∞} f(x)/g(x) = ∞, and f(x) ≪ g(x) for lim sup_{x→∞} f(x)/g(x) = 0.
3. f(x) ∼ g(x) for lim_{x→∞} f(x)/g(x) = 1.
When labeling a sequence, we will use the notation a(n) if the value of a has a meaningful dependence on n, or an if n serves merely as an index. We will use Expo(λ), Geo(p), and Bino(n, p) as shorthands for the exponential, geometric, and binomial distributions with the standard parameters, respectively. The expression X =d Y means that X and Y have the same distribution. Whenever possible, we will use upper-case letters for random variables, and lower-case letters for deterministic values.
The Model. We consider a sequence of systems operating in continuous time, indexed by an
integer n, where the nth system consists of n servers and rn queues, where r is a positive constant
that is fixed as n varies (Figure 2). The connectivity of the system is encoded by an rn × n
undirected bipartite graph gn = (E, I ∪ J), where I and J represent the set of queues and servers,
respectively, and E the set of edges between them.5 A server j ∈ J is capable of serving a queue
i ∈ I if and only if (i, j) ∈ E. We will denote by N (i) the set of servers in J connected to queue i,
and similarly, by N (j) the set of queues in I connected to server j.
Footnote 4. Throughout, we denote by E, I, and J the set of edges and left and right vertices, respectively.
Footnote 5. For notational simplicity, we omit the dependence of E, I, and J on n.
In the nth system, each queue i receives a stream of incoming jobs according to a Poisson
process of rate λn,i , independent of all other streams. We will denote by λn the arrival rate vector,
i.e., λn = (λn,1 , λn,2 , . . . , λn,rn ). The sizes of the jobs are exponentially distributed with mean 1,
independent from each other and from the arrival processes. All servers are assumed to run at a
constant rate of 1. The system is assumed to be empty at time t = 0.
Jobs arriving at queue i can be assigned to any server j ∈ N (i) to receive service. The assignment
is binding, in the sense that once the assignment is made, the job cannot be transferred to, or
simultaneously receive service from, any other server. Moreover, service is non-preemptive, in the
sense that once service is initiated for a job, the assigned server has to dedicate its full capacity to
that job until its completion.6 Formally, if a server j has just completed the service of a previous job at time t, its available actions are:
1. Serve a new job: Server j can choose to fetch a job from any queue in N(j) and immediately start service. The server will remain occupied and take no other actions until the current job is completed.
2. Remain idle: Server j can choose to remain idle. While in the idling state, it will be allowed to initiate a service (Action 1) at any point in time.
Given the limited set of actions available to the servers, the performance of the system is fully determined by a scheduling policy, π, which specifies, for each server j ∈ J, when to remain idle, when to serve a new job, and from which queue in N(j) to fetch a job when initiating a new service.
We only allow policies that are causal, in the sense that the decision at time t depends only
on the history of the system (arrivals and service completions) up to t. We allow the scheduling
policy to be centralized (i.e., to have full control over all servers’ actions), and to be based on the
knowledge of the graph gn and the past history of all queues and servers. On the other hand, a
policy does not observe the actual sizes of jobs before they are served, and does not know the
arrival rate vector λn .
Performance metric: Let Wk be the waiting time in queue of the kth job, defined as the time from the job's arrival to a queue until it starts to receive service from some server. With a slight abuse of notation, we will define the expected waiting time, E(W), as

E(W) ≜ lim sup_{k→∞} E(Wk).  (2)
Footnote 6. While we restrict ourselves to binding and non-preemptive scheduling policies in this paper, other common architectures, where (a) a server can serve multiple jobs concurrently (processor sharing), (b) a job can be served by multiple servers concurrently, or (c) job sizes are revealed upon entering the system, are clearly more powerful than the current setting, and are therefore capable of implementing the scheduling policy we consider here. Hence the performance upper bounds developed in this paper also apply to these more powerful variants.
Note that E (W ) captures the worst-case expected waiting time across all jobs in the long run, and
is always well defined, even under scheduling policies that do not induce a steady-state distribution.
We are primarily interested in the scaling of E (W ) in the regime where the total traffic intensity
(i.e., the ratio between the total arrival rate and the total service capacity) stays fixed, while the
size of the system, n, tends to infinity.
3. Main Theorem
Before stating our main theorem, we first motivate a condition on the arrival rate vector, λn . Since
we will allow the arrival rates λn,i to vary with i, and since on average each queue is connected
to d(n) servers, the range of fluctuations of the λn,i with respect to i should not be too large
compared to d(n), or else the system would become unstable. We will therefore let the amount of
rate fluctuation or uncertainty be bounded by some u(n), an upper bound on λn,i for all i. The
following condition summarizes these points.
Condition 1 (Rate Condition). We fix a constant ρ ∈ (0, 1) (referred to as the traffic intensity) and a sequence {u(n)}n≥1 of positive reals with inf_n u(n) > 0. For any n ≥ 1, the arrival rate vector λn satisfies:
1. max_{1≤i≤rn} λn,i ≤ u(n).
2. ∑_{i=1}^{rn} λn,i ≤ ρn.
We will denote by Λn ⊂ R_+^{rn} the set of all arrival rate vectors in the nth system that satisfy the above conditions.
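Condition 1 amounts to two inequalities on the vector λn; for concreteness, a hypothetical checker (ours, not from the paper) might look as follows.

```python
def satisfies_rate_condition(lam, n, rho, u_n):
    """Check Condition 1 for an arrival rate vector lam of length r*n:
    (1) every individual rate is at most u(n);
    (2) the total arrival rate is at most rho*n."""
    return max(lam) <= u_n and sum(lam) <= rho * n

# Uniform rates lam_i = rho/r satisfy the condition (when u(n) >= rho/r);
# a single overloaded queue violates requirement (1).
n, r, rho, u_n = 100, 2, 0.6, 5.0
uniform = [rho / r] * (r * n)
spiked = [0.0] * (r * n - 1) + [6.0]
print(satisfies_rate_condition(uniform, n, rho, u_n))  # True
print(satisfies_rate_condition(spiked, n, rho, u_n))   # False
```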
We denote by Eπ (W | gn , λn ) the expected waiting time (cf. Eq. (2)) under a scheduling policy
π, a given graph gn , and an arrival rate vector λn . The following is the main result of the paper.
Theorem 1. Suppose that d(n) ≫ ln n and u(n) ≪ √(d(n)/ln n), and fix ρ ∈ (0, 1) and γ > 0. Then, for every n ≥ 1 and every arrival rate vector λn that satisfies Condition 1, the following hold.
1. There exist a bipartite graph gn ∈ Gn (possibly depending on λn) and a scheduling policy πn, such that deg(gn) ≤ (1 + γ)d(n), and

Eπn(W | gn, λn) ≤ K u²(n) ln n / d(n) = o(1),  (3)

where K > 0 is a constant independent of n, gn, and λn.
2. There exists Hn ⊂ Gn, with

P_{n, d(n)/n}(Hn) ≥ 1 − δn,7  (4)

such that the statements in Part 1 hold whenever gn ∈ Hn, where {δn}n≥1 is a sequence of non-increasing constants with lim_{n→∞} δn = 0. In particular, a random graph Gn drawn according to the Erdös-Rényi measure P_{n, d(n)/n} (cf. Eq. (1)) has the property in Part 1 with high probability, uniformly over all λn ∈ Λn.
The reader is referred to Section 7 for a discussion on potential improvements of the result.
3.1. Remarks on Theorem 1
Note that Part 1 of the theorem follows from Part 2, which states that for large values of n,
and for a graph chosen according to the Erdös-Rényi model, the scheduling policy will have a large
probability (with respect to the random graph generation) of achieving diminishing delay, as in
Eq. (3). At a higher level, Eq. (3) relates three key characteristics of the system: delay E (W ), level
of robustness u(n), and degree of flexibility d(n). Furthermore, the scheduling policy only requires
knowledge of ρ, not of the arrival rate vector; this is important in practice because this vector need
not be known a priori or may change significantly over time.
The fact that a large capacity region (as given by Condition 1) is achievable when d(n) ≪ n should not be too surprising: expander graphs (such as those obtained through a random graph construction) are known to have excellent max-flow performance. The more difficult portion of the argument
is to show that a diminishing delay (a temporal property) can also be achieved without significantly
sacrificing the range of admissible arrival rates, through appropriately designed scheduling policies.
In our proof, the achievability of a large capacity region is not shown explicitly, but comes as a
by-product of the delay analysis; in fact, the scheduling policy automatically finds, over time, a
feasible flow decomposition over the connectivity graph (see Section 6).
Dependence on ρ. It is possible to incorporate the traffic intensity ρ in the leading constant in Eq. (3), to obtain a bound of the form

Eπn(W | gn, λn) ≤ (K′ / (1 − ρ)) · (u²(n) ln n / d(n)),  (5)

where K′ is a constant independent of ρ. This captures a trade-off between the traffic intensity and the degree of flexibility in the system. A proof of Eq. (5) is given in Appendix A.7.
Footnote 7. Pn,p is the probability measure induced by an (rn, n, p) random bipartite graph, defined in Eq. (1). The choice of Hn can depend on λn.
Adversarial Interpretation of Rate Robustness It is important to note that the delay
guarantee in Theorem 1 holds for any choice of λn that satisfies Condition 1. To better understand
the power and limitations of the “rate robustness” entailed by Theorem 1, it is useful to view
the system as a game between two players: an Adversary who chooses the rate vector λn , and
a Designer who chooses the interconnection topology gn and the scheduling policy πn . The goal
of the Designer is to achieve small average waiting time (which would also imply stability of all
queues), while the goal of the Adversary is the opposite. The following definition will facilitate our
discussion.
Definition 1. (Good Graphs) Let us fix n, d(n), u(n), ρ, γ, and K, as in Theorem 1. We
define the set Good (λn ) of good graphs for λn as the subset of Gn for which deg(gn ) ≤ (1 + γ)d(n)
and the inequality in Eq. (3) holds, for some policy πn .
We now examine the robustness implications of Theorem 1 under different orderings of the actions in the game (i.e., who gets to act first), listed in order of increasing difficulty for the Designer.
Level 1: Adversary acts first (weak adversary). The Designer gets to pick gn and πn after
observing the realization of λn . Note that this weak form of adversary fails to capture the rate
uncertainty arising in many applications, where the arrival rates are unknown at the time when
the system is designed. For this setting, it is not hard to come up with a simple deterministic
construction of a good graph and a corresponding policy. Formally, the set Good (λn ) is nonempty
for every λn ∈ Λn . However, Part 1 of Theorem 1 actually makes a stronger statement. While the
graph is chosen after observing λn , the policy is designed without knowledge of λn , albeit at the
expense of a much more complex policy.
Level 2: Adversary and Designer act independently (intermediate adversary). Suppose that both the Adversary and the Designer make their decisions independently, without knowing each other’s choice. This case models most practical applications, where λn is generated by
some exogenous mechanism, unrelated to the design process, and is not revealed to the Designer
ahead of time. Here, our randomization in the construction of the graph offers protection against
the choice of the Adversary. Part 2 of the theorem can be rephrased into the statement that there
exists a policy for which
inf Pn, d(n) Gn ∈ Good (λn ) → 1.
λn ∈Λn
(6)
n
As an extension, we may consider the case where the Adversary chooses λn at random, according to some probability measure µn on Λn, but still independently from Gn. However, this additional freedom granted to the Adversary does not make a difference, and it can be shown, through an easy application of Fubini's Theorem, that

inf_{µn} (P_{n, d(n)/n} × µn)(Gn ∈ Good(λn)) → 1,  (7)

where × denotes a product measure. The proof is given in Appendix A.8.
Level 3: Designer acts first (strong adversary). In this case, the Adversary chooses λn
after observing the gn and πn picked by the Designer. While such an adversary may be too strong
for most practical applications, it is still interesting to ask whether there exist fixed gn and πn that
will work well for any λn that satisfies Condition 1 (or even weaker conditions). We conjecture that this is the case.
Figure 3. Diminishing delay: simulations with n × n networks (r = 1), λn,i = ρ = 0.95, and d(n) = √n, showing average queue length (log scale) versus system size n, for the matching-based and greedy policies.
On Practical Scheduling Policies The scheduling policy that we will use in the proof of
Theorem 1 was mainly designed for the purpose of analytical tractability, rather than practical
efficiency. For instance, simulations suggest that a seemingly naive greedy heuristic, where any free
server chooses a job from a longest queue to which it is connected, can achieve a superior delay
scaling (see Figure 3).8 Unfortunately, deriving an explicit delay upper bound for the greedy policy
or other similar heuristics appears to be challenging. See also the discussion in Section 7.
3.2. A Comparison with Modular Architectures
Assume for simplicity that there is an equal number of queues and servers (i.e., r = 1). In a
modular architecture, the designer partitions the system into n/d(n) separate sub-networks. Each
sub-network consists of d(n) queues and servers that are fully connected within the sub-network
(Figure 4). Note that this architecture guarantees a degree of exactly d(n), by construction. Assume
Footnote 8. The scheduling policy being simulated is a discrete-time version of the continuous-time policy analyzed in this paper.
Figure 4. A modular architecture, consisting of n/d(n) disconnected clusters, each of size d(n). Within each cluster, all servers are connected to all queues.
that all queues have the same arrival rate ρ ∈ (0, 1) and that each server uses an arbitrary work-conserving service policy. Since each sub-network is fully connected, it is not difficult to see that
the modular architecture achieves a diminishing queueing delay as n → ∞, as long as d(n) → ∞.
This seems even better than the random-graph-based architecture, which requires d(n) to scale at
least as fast as ln n. However, this architecture fares much worse in terms of robustness, as we now
proceed to discuss.
Since the sub-networks are completely disconnected from each other, the set of arrival rates that
are admissible in a modular network is severely reduced due to the requirement that the total
arrival rate in each sub-network be less than d(n), which is much more restrictive than the rate
constraint given in Condition 1. Thus, the modular design may achieve a better delay scaling when
the arrival rates are (almost) uniform, but at the risk of instability when the arrival rates are
random or chosen adversarially, because the processing resources are not shared across different
sub-networks.
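The robustness gap can be made concrete with a back-of-the-envelope stability check (our illustration; the flexible-side condition is simplified to the total-rate constraint, ignoring the per-queue bound of Condition 1):

```python
def modular_stable(lam, d):
    """A modular design with clusters of size d (r = 1) is stable only if
    the total arrival rate inside every cluster is below d."""
    return all(sum(lam[k:k + d]) < d for k in range(0, len(lam), d))

def flexible_stable(lam, n):
    """A well-connected flexible design only needs the global condition
    sum(lam) < n."""
    return sum(lam) < n

n, d = 16, 4
uniform = [0.9] * n                       # both designs are stable
skewed = [3.6] * d + [0.0] * (n - d)      # same total load, one hot cluster
print(modular_stable(uniform, d), flexible_stable(uniform, n))  # True True
print(modular_stable(skewed, d), flexible_stable(skewed, n))    # False True
```

The skewed vector carries the same total load as the uniform one, yet overloads a single cluster: the modular design destabilizes while a flexible design can still absorb it.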
3.3. Proof Overview
Sections 4 through 6 contain the proof of Theorem 1. We will provide an explicit construction of
scheduling policies πn , and then show that they achieve the delay scaling in Eq. (3). At the core
of the construction is the use of virtual queues, whose operation is described in detail in Section 4.
The proof of Theorem 1 is completed in two steps. In Section 5, we first prove the result in the
special case where all λi are equal to some λ < 1/r. The symmetry brought about by the uniform
arrival rates will significantly simplify the notation, while almost all intuition developed here will
carry over to the proof for the general case. We then show in Section 6 that, perhaps surprisingly,
the original scheduling policies designed for the uniform arrival rate case can be applied directly
to achieve the delay scaling in Eq. (3) for any λn satisfying Condition 1.
4. Virtual Queue and the Scheduling Policy
This section provides a detailed construction of a virtual-queue-based scheduling policy that
achieves the delay scaling in Theorem 1. We begin by describing some high-level ideas behind our
design.
Regularity vs. Discrepancies. Setting aside computational issues, an efficient scheduling policy is difficult to design because the future is unpredictable and random: one does not know a priori which part of the network will become more loaded, and hence current resource allocation decisions must take into account all possibilities of future arrivals and job sizes, which is difficult to carry out or analyze.
However, as the size of the system, n, becomes large, certain regularities in the arrival processes
begin to emerge. To see this, consider the case where r = 1, λn,i = λ < 1 for all n and i, and
assume that at time t > 0, all servers are busy serving some job. Now, during the interval [t, t + γn ),
“roughly” λnγn new jobs will arrive, and nγn servers will become available. For this [t, t + γn )
interval, denote by Γ the set of queues that received a job, and by ∆ the set of roughly nγn
servers that became available. If γn is chosen so that λnγn ≪ n, these incoming jobs are likely to
spread out across the queues, so that most queues receive at most one job. Assuming that this is
indeed the case, we see that the connectivity graph gn restricted to Γ ∪ ∆, denoted gn |Γ∪∆ , is a
subgraph sampled uniformly at random among all (λnγn × nγn )-sized subgraphs of gn . When nγn
is sufficiently large, and gn is well connected (as in an Erdös-Rényi random graph with appropriate
edge probability), we may expect that, with high probability, gn |Γ∪∆ admits a matching (Definition
4) that includes the entire set Γ, in which case all λnγn jobs can start receiving service by the end
of the interval.
Note that when n is sufficiently large, despite the randomness in the arrivals, the symmetry
in the system makes delay performance at a short time scale insensitive to the exact locations
of the arrivals. Treated collectively, the structure of the set of arrivals and available servers in
a small interval becomes less random and more “regular,” as n → ∞. Of course, for any finite
n, the presence of randomness means that discrepancies (events that deviate from the expected
regularity) will be present. For instance, the following two types of events will occur with small,
but non-negligible, probability.
1. Arrivals may be located in a poorly connected subset of gn .
2. Arrivals may concentrate on a small number of queues.
We need to take care of these discrepancies, and show that their negative impact on performance is insignificant.
Following this line of thought, our scheduling policy aims to use most of the resources to dynamically target the regular portion of the traffic (via matchings on subgraphs), while ensuring that
the impact of the discrepancies is well contained. In particular, we will create two classes of virtual
queues to serve these two objectives:
1. A single Matching queue that targets regularity in arrival and service times, and discrepancies
of the first type.
2. A collection of rn Residual queues, one for each (physical) queue, which targets the discrepancies
in arrival and service times of the second type.
The queues are “virtual,” as opposed to “physical,” in the sense that they merely serve to conceptually simplify the description of the scheduling policy.
Good Graphs: The management of the virtual queues must comply with the underlying connectivity graph, gn , which is fixed over time. We informally describe here some desired properties
of gn ; more detailed definitions and performance implications will be given in subsequent sections,
as a part of the queueing analysis. We are interested in graphs that belong to a set Hn , defined as
the intersection of the following three subsets of Gn :
1. Ĝn (cf. Lemma 1): For the case r = 1, gn ∈ Ĝn if it admits a full matching. For the more general
case, a suitable generalization is provided in the context of Lemma 1. This property will be
used in both Matching and Residual queues to handle discrepancies.
2. G̃n (cf. Lemmas 5–7): We have gn ∈ G̃n if a randomly sampled sublinear-sized subgraph admits a
full matching, with high probability. This property will be used in the Matching queue to take
advantage of regularity.
3. Ln (cf. Section 5.3): We have gn ∈ Ln if gn has an average degree approximately equal to d(n).
This property ensures that our degree constraint is met.
Once the description of the policy is completed, the proof will consist of establishing the following:
(i) If gn ∈ Hn , then the claimed delay bound holds.
(ii) The probability that a random bipartite graph belongs to Hn tends to 1, as n → ∞.
Inputs to the Scheduling Policy: Besides n and r, the scheduling policy uses the following
inputs:
1. ρ, the traffic intensity as defined in Condition 1,
2. ε, a constant in (0, 1 − ρ),
3. b(n), a batch size function,
4. gn, the interconnection topology.
4.1. Arrivals to Virtual Queues
The arrivals to both the Matching and Residual queues are arranged in batches. Roughly speaking,
a batch is a set of jobs that are treated collectively as a single entity that arrives to a virtual queue.
We define a sequence of random times {TB(k)}k∈Z+ by letting TB(0) = 0 and, for all k ≥ 1,
TB(k) ≜ time of the (k(ρ/r)b(n))th arrival to the system,
where b(n) ∈ Z+ is a design parameter, and will be referred to as the batch parameter.9 We will
refer to the time period (TB(k − 1), TB(k)] as the kth batch period.
Definition 2. (Arrival Times to Virtual Queues) The time of arrival of the kth batch to
all virtual queues is TB(k), and the corresponding interarrival time is A(k) ≜ TB(k + 1) − TB(k).
While all virtual queues share the same arrival times, the contents of their respective batches
are very different. We will refer to them as the Matching batch and Residual batch, respectively.
The kth Matching batch is the set of jobs that first arrive to their respective queues during
the kth batch period. Since for each queue only the first job belongs to the Matching batch, it is
convenient to represent the Matching batch as the set of queues that receive at least one job during
the batch period.
The kth batch arriving to the ith Residual queue is the set of all jobs, except for the first one,
that arrive to queue i during the kth batch period.
Definition 3. (Size of a Residual Batch). Let Hi (k) be the total number of jobs arriving
to queue i during the kth batch period. The size of the ith Residual batch is given by
Ri(k) ≜ (Hi(k) − 1)+ .    (8)
A graphical illustration of the Matching and Residual batches is given in Figure 5.
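As a concrete illustration of this decomposition (our own sketch, not part of the paper's model), the following splits the arrivals of one batch period into the Matching batch and the Residual batch sizes Ri(k) = (Hi(k) − 1)+ of Eq. (8):

```python
from collections import Counter

def split_batch(arrivals):
    """arrivals: list of queue indices receiving jobs during one batch period.

    Returns (matching_batch, residual_sizes): the set of queues whose first
    job joins the Matching batch, and R_i = (H_i - 1)^+ for each queue seen.
    """
    counts = Counter(arrivals)                              # H_i(k)
    matching_batch = set(counts)                            # queues with >= 1 arrival
    residual_sizes = {i: h - 1 for i, h in counts.items()}  # (H_i - 1)^+, since h >= 1
    return matching_batch, residual_sizes

# Queue 2 receives three jobs: one joins the Matching batch, two are Residual.
mb, rs = split_batch([0, 2, 2, 5, 2])
assert mb == {0, 2, 5}
assert rs == {0: 0, 2: 2, 5: 0}
```

Note how the Matching batch is naturally represented as a set of queues, exactly as described above.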
4.2. State Transitions and Service Rules
This section describes how the batches will be served by the physical servers. Before getting into
the details, we first describe the general ideas.
1. The sizes of Residual batches are typically quite small, and the physical servers will process
them using a (non-adaptive) randomized scheduling rule.
2. For each Matching batch, we will first “collect” a number of available servers, equal to the size
of the batch:
9. We chose not to absorb the constant ρ/r into b(n) at this point, because this will allow for simpler notation in subsequent sections.
Figure 5: An illustration of the arrivals during a single batch period.
(a) With high probability, all jobs in the Matching batch can be simultaneously processed by
these servers through a matching over gn .
(b) With small probability, some jobs in the Matching batch are located in a poorly connected
subset of gn and cannot be efficiently matched with the available servers. In this case, all jobs
in the batch will be served, one at a time, according to a fixed server-to-queue assignment (a
“clear” phase).
To implement the above queueing dynamics, we will specify the evolution of states and actions
of all physical servers and virtual queues, to be described in detail in the remainder of this section.
4.2.1. States and Actions of (Physical) Servers and Residual Queues
We first introduce the notion of an assignment function, which will be used to schedule the physical servers. An
assignment function L maps each queue i ∈ I to a server L(i) ∈ J. We say that L is an efficient
assignment function for the connectivity graph gn if (i, L(i)) ∈ E for all i ∈ I, and
|L−1(j)| ≤ r + 1,    (9)
for all j ∈ J. As will become clear in the sequel, our scheduling policy will use an assignment
function L to ensure that every Residual queue receives at least a “minimum service rate” from
some server. Let Ĝn be the set of all gn ∈ Gn such that gn has an efficient assignment function. With
our random graph construction, an efficient assignment function exists with high probability. The
proof can be found in Appendix A.1.
Lemma 1. Let p(n) = d(n)/n and assume that d(n) ≫ ln n. Then,
limn→∞ Pn,p(n) (Ĝn) = 1.
Figure 6: State evolution of a physical server.
A physical server can be, at any time, in one of two states: “BUSY” and “STANDBY”; cf. Figure 6. The end of a period in the BUSY state will be referred to as an event point, and the time
interval between two consecutive event points an event period. At each event point, a new decision
is to be made as to which virtual queue the server will choose to serve, during the next event period.
All servers are initialized in a BUSY state, with the time until the first event point distributed as
Expo (1), independently across all servers.
Recall that ε is a parameter in (0, 1 − ρ). The state transitions defined below also involve the
current state of the Matching queue, whose evolution will be given in Section 4.2.2. When a server
j ∈ J is at the kth event point:
1. With probability ρ + ε, the kth event period is of type “matching”:
(a) If the Matching queue is in state COLLECT, server j enters state STANDBY.
(b) If the Matching queue is in state CLEAR, let B′ be the Matching batch at the front of the
Matching queue, and let M′ ⊂ I be the set of queues that still contain an unserved job from
batch B′. Let i∗ = min {i : i ∈ M′}.
i) If L (i∗ ) = j, then server j starts processing the job in queue i∗ that belongs to B′, entering
state BUSY.
ii) Else, server j goes on a vacation of length Expo (1), entering state BUSY.
(c) If the Matching queue is in state IDLE, server j goes on a vacation of length Expo (1),
entering state BUSY.
2. With probability 1 − (ρ + ε), the kth event period is of type “residual.” Let i be an index drawn
uniformly at random from the set L−1 (j).
(a) If the ith Residual queue is non-empty, server j starts processing a job from the Residual
batch that is currently at the front of the ith Residual queue, entering state BUSY.10
(b) Else, server j goes on a vacation of length Expo (1), entering state BUSY.
10. The Residual batch at the front of the Residual queue is defined to be the oldest Residual batch that contains any as yet unserved job.
The above procedure describes all the state transitions for a single server, except for one case:
when in state STANDBY, any server can be ordered by the Matching queue to start processing a
job, or initiate a vacation period. The transition out of the STANDBY state will be specified in
the next subsection.
4.2.2. States and Actions of the Matching Queue
As mentioned earlier, the Matching queue is not a physical entity, but a mechanism that coordinates the actions of the physical
servers. The name “Matching” reflects the fact that this virtual queue is primarily focused on using
matchings of the connectivity structure of gn to schedule batches, where a matching is defined as
follows.
Definition 4. (Matching in a Bipartite Graph) Let F ⊂ E be a subset of the edges of
g = (E, I ∪ J). We say that F is a matching if |{j′ : (i, j′) ∈ F}| ≤ 1 and |{i′ : (i′, j) ∈ F}| ≤ 1, for
all i ∈ I, j ∈ J. A matching F is said to contain S ⊂ I ∪ J if for all s ∈ S, there exists some edge in
F that is incident on s, and is said to be full if it contains I or J.
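Whether a subgraph gn|Γ∪∆ admits a matching containing Γ can be checked algorithmically. The sketch below (our own illustration; the function names are not from the paper) uses Kuhn's augmenting-path algorithm for maximum bipartite matching:

```python
def max_matching(adj, n_left, n_right):
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm.

    adj[i] lists the right-side vertices (servers) connected to left
    vertex (queue) i. Returns match_right, where match_right[j] is the
    left vertex matched to right vertex j, or -1 if j is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # j is free, or its current partner can be re-matched elsewhere.
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    for i in range(n_left):
        try_augment(i, set())
    return match_right

def contains_full_matching(adj, n_left, n_right):
    """True iff some matching covers every left vertex (every queue in Γ)."""
    return sum(m != -1 for m in max_matching(adj, n_left, n_right)) == n_left

# A 3x3 subgraph that admits a matching containing the whole left side:
assert contains_full_matching([[0, 1], [1, 2], [0]], 3, 3)
# Two queues connected only to the same single server: no full matching.
assert not contains_full_matching([[0], [0]], 2, 1)
```

For the subgraph sizes arising here (of order b(n)), a faster Hopcroft–Karp implementation would be preferable, but the logic is the same.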
Figure 7: State evolution of the Matching queue.
The Matching queue can be in one of three states: IDLE, COLLECT, and CLEAR. It is initialized
to state IDLE.
1. If the Matching queue is in state IDLE, it remains so until the arrival of a Matching batch,
upon which the Matching queue enters state COLLECT.
2. If the Matching queue is in state COLLECT, it remains so until there are exactly (ρ/r)b(n)
servers in state STANDBY.11 Denote by Γ ⊂ I the Matching batch currently at the front of the
Matching queue, and by ∆ ⊂ J the set of servers in STANDBY at the end of the COLLECT
period. The Matching queue makes the following decisions:
(a) If gn |Γ∪∆ admits a full matching F :
i) Let all matched servers in ∆ start processing a job in the current Matching batch according
to the matching F , entering state BUSY. If there are unmatched servers in ∆ (which will
occur if |Γ| < |∆| = (ρ/r)b(n)), let all unmatched servers enter state BUSY by initiating a
vacation, with a length independently distributed according to Expo (1).
11. This particular number is equal to the total number of jobs arriving in a single batch period.
ii) This completes the departure of a Matching batch from the Matching queue. If the Matching queue remains non-empty at this point, it returns to state COLLECT (to deal with
the rest of the Matching batch); else, it enters state IDLE.
(b) If gn |Γ∪∆ does not admit a full matching, the Matching queue enters state CLEAR. All
servers in state STANDBY enter state BUSY by initiating a vacation, with a length independently distributed according to Expo (1).
3. If the Matching queue is in state CLEAR, it remains so until all jobs in the current Matching
batch have started receiving processing from one of the servers, upon which it enters state
COLLECT if the Matching queue remains non-empty, or state IDLE, otherwise.
By now, we have described how the batches are formed (Section 4.1), and how each physical
server operates (Section 4.2). The scheduling policy is hence fully specified, provided that the
underlying connectivity graph gn admits an efficient assignment function.
5. Dynamics of Virtual Queues – Uniform Arrival Rates
We now analyze the queueing dynamics induced by the virtual-queue-based scheduling policy
introduced in Section 4. In this section, we will prove Theorem 1 for the special case of uniform
arrival rates, where
λn,i = λ < 1/r,    (10)
for all n ≥ 1 and i ∈ I. In particular, we have ρ = λr < 1. Starting with this section, we will focus
on a specific batch size function of the form
b(n) = K · n ln n / d(n) = K · n / y(n),    (11)
for a suitably chosen constant K, where y(n) is defined by
y(n) = d(n) / ln n.    (12)
The specifics of the choice of K will be given later. We have assumed that d(n) ≫ ln n, and hence
y(n) → ∞ as n → ∞. Note that this implies that b(n) ≪ n. As will be seen soon, this guarantees
that only a small fraction of the jobs will enter the Residual queues.
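As a quick numerical sanity check (our own, with an arbitrary constant K and the admissible choice d(n) = ln² n ≫ ln n), the batch fraction b(n)/n = K/y(n) indeed vanishes as n grows:

```python
import math

K = 10  # placeholder constant; the paper only requires K > 8/λ

def y(n, d):
    # y(n) = d(n) / ln n, as in Eq. (12)
    return d(n) / math.log(n)

def b(n, d):
    # b(n) = K n / y(n) = K n ln n / d(n), as in Eq. (11)
    return K * n / y(n, d)

d = lambda n: math.log(n) ** 2          # an admissible degree function: d(n) >> ln n
ratios = [b(n, d) / n for n in (10**3, 10**6, 10**9)]
# b(n)/n = K / y(n) is strictly decreasing, so batches occupy a
# vanishing fraction of the system as n grows.
assert ratios[0] > ratios[1] > ratios[2]
```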
The main idea behind the delay analysis is rather simple: we will treat each virtual queue as
a GI/GI/1 queue, and use Kingman’s bound to derive an upper bound on the expected waiting
time in queue. The combination of a batching policy with Kingman’s bound is a fairly standard
technique in queueing theory for deriving delay upper bounds (see, e.g., (19)). Our main effort will
go into characterizing the various queueing primitives associated with the virtual queues (arrival
rates, traffic intensity, and variances of inter-arrival and processing times), as well as in resolving
the statistical dependence induced by the interactions among the physical servers. We begin by
stating a simple fact on the inter-arrival-time distribution for the virtual queues. Even though in
this section we assume that r = 1, we state the lemma for the general case.
Lemma 2. The inter-arrival times of batches to the virtual queues, {A(k)}k≥1 , are i.i.d., with
E (A(k)) = b(n)/(rn), and Var (A(k)) ≲ b(n)/n².
Proof. By definition, A(k) is equal in distribution to the time until a Poisson process with
rate λrn records λb(n) arrivals. Therefore, E (A(k)) = λb(n)/(λrn) = b(n)/(rn), and Var (A(k))
= λb(n) · (λrn)⁻² ≲ b(n)/n².
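Lemma 2 is easy to check by simulation. The sketch below (our own; the parameter values are arbitrary) draws each batch inter-arrival time as the sum of λb(n) exponentials of rate λrn, i.e., the time for the aggregate Poisson process to record one batch worth of arrivals:

```python
import random

random.seed(0)
lam, r, n, b_n = 0.5, 1, 1000, 100   # arbitrary illustrative parameters
batch_jobs = int(lam * b_n)          # jobs per batch: λ b(n)
rate = lam * r * n                   # aggregate Poisson arrival rate: λ r n

def interarrival():
    # Time for a Poisson(rate) process to record batch_jobs arrivals:
    # a sum of batch_jobs i.i.d. Expo(rate) variables.
    return sum(random.expovariate(rate) for _ in range(batch_jobs))

samples = [interarrival() for _ in range(2000)]
mean = sum(samples) / len(samples)
expected = b_n / (r * n)             # Lemma 2: E(A(k)) = b(n)/(rn)
assert abs(mean - expected) / expected < 0.05
```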
Definition 5. (Service Times in Virtual Queues) Consider the kth batch arriving at a
virtual queue (Matching or Residual). Define the time of service initiation, Ek , to be when the
batch first reaches the front of the queue, and the time of departure, Dk , to be when the last job
in the batch starts receiving service from one of the physical servers. Let Dk = Ek if the batch
contains no jobs. The service time for the kth batch is defined to be S(k) = Dk − Ek .
The interval [Ek , Dk ] is referred to as the service period of the kth batch.
We will denote by S M (k) and SjR (k) the service time for the kth batch in the Matching queue
and the jth Residual queue, respectively. Note that our scheduling policy (Section 4.2) induces
interactions among the physical servers, and hence in general, the service times in a Residual queue
are neither independent across batches or queues nor identically distributed.
5.1. Delays in the Matching Queue
We first turn our attention to the service times S M (k) of the batches in the Matching queue. Based
on our construction, it is not difficult to verify that the S M (k) are i.i.d. for any fixed graph, gn .
Furthermore, the value of S M (k) is equal to the length of a COLLECT period plus possibly the
length of a CLEAR period, if the subgraph formed during the COLLECT period, gn |Γ∪∆ , fails to
contain a matching that includes all queues in the current batch. We will therefore write
S M (k) =d Scol + Xq(gn ) · Scle ,    (13)
where “=d” denotes equality in distribution, Scol and Scle correspond to the lengths of a COLLECT and a CLEAR period, respectively, and
Xq is a Bernoulli random variable with P (Xq = 1) = q. The value of q(gn ) is defined by
q(gn ) = P ({gn |Γ∪∆ does not admit a full matching}) ,    (14)
where the probability is taken over the randomness in Γ and ∆, but is conditional on Gn = gn .
We begin by analyzing the properties of Scle . Recall from Section 4.2.1 that, when the Matching
queue is in state CLEAR, the jobs in the current Matching batch are to be served in a sequential
fashion (Step 1-(b)) by a designated server. In particular, letting i∗ be the smallest index of the
queues that contain an unserved job from the current Matching batch, then the job in queue i∗ ,
denoted by b, will be served by server L(i∗ ) whenever that server enters an event period of type
“matching.” No other unprocessed jobs in the Matching batch can be served until job b begins
to receive service, and the amount of time it takes until this occurs is exponentially distributed
with mean 1/(ρ + ε), since the probability for an event period at a physical server to be of type
“matching” is (ρ + ε) (Step 1). Since there are at most λb (n) jobs in a Matching batch, the length
of a CLEAR period, Scle , is no greater than the amount of time it takes for a Poisson process
of rate ρ + ε to record λb (n) arrivals.12 Arguing similarly to the proof of Lemma 2, we have the
following lemma. Note that the distribution of Scle does not depend on the structure of gn , as long
as gn admits an efficient assignment function (cf. Lemma 1).
Lemma 3. E (Scle ) = (λ/(ρ + ε)) b (n), and E (Scle²) ≲ b²(n).
To analyze Scol , we argue as follows. A COLLECT period consists of the time until a first
server in ∆ completes service, followed by the time until one of the remaining servers completes
service, etc., until we have a total of λb(n) service completions. Thus, the length of the period is the
sum of λb(n) independent exponential random variables with parameters (ρ + ε)n, (ρ + ε)(n −
1), . . . , (ρ + ε)(n − λb(n) + 1). Using the fact that b(n) ≪ n, this is essentially the same situation
as in our analysis of Scle , and we have the following result.
Lemma 4. E (Scol ) ∼ (λ/(ρ + ε)) · b(n)/n , and E (Scol²) ≲ b²(n)/n² .
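The asymptotics of Lemma 4 can be verified directly, since the exact mean of Scol is a partial harmonic sum. A small numerical sketch (our own, with arbitrary parameter values):

```python
lam, rho, eps = 0.5, 0.5, 0.1        # illustrative values; ρ = λr with r = 1

def mean_collect(n, b_n):
    """Exact E(Scol): sum of 1/((ρ+ε)k) for k = n, n-1, ..., n - λb(n) + 1."""
    m = int(lam * b_n)               # number of standby servers to collect
    return sum(1.0 / ((rho + eps) * k) for k in range(n - m + 1, n + 1))

n, b_n = 10**6, 10**3                # b(n) << n, as assumed in the paper
exact = mean_collect(n, b_n)
asymptotic = (lam / (rho + eps)) * b_n / n   # Lemma 4
# With b(n) << n the partial harmonic sum is essentially flat,
# so the exact mean is within a fraction of a percent of the asymptotic.
assert abs(exact - asymptotic) / asymptotic < 0.01
```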
Finally, we want to focus our attention on a set G̃n ⊂ Gn of graphs that possess the following two
properties.
1. The value of E (Xq(gn )) = q(gn ) is small, for all gn ∈ G̃n . This property will help us upper bound
the service time S M (k), using Eq. (13) and the moment bounds for Scle and Scol developed
earlier.
2. The set G̃n has high probability under the Erdős–Rényi random graph model.
12. While the size of the Matching batch is at most λb (n), Lemma 9 indicates that its expected value is ∼ λb (n), as n → ∞.
The next lemma is the main technical result of this sub-section, which formalizes the above statements.
We start with some definitions. For m ≤ n, let Mm,n be the family of all m × m subsets of I ∪ J,
that is, h ∈ Mm,n if and only if |h ∩ I | = |h ∩ J | = m. Let m(n) be such that m(n) → ∞ as n → ∞.
Let PMm(n),n be a probability measure on Mm(n),n . For each g ∈ Gn , let
l(g) = PMm(n),n ({h ∈ Mm(n),n : g |h does not admit a full matching}) ,
and p(n) = d(n)/n. We now define G̃n as the set of graphs in Gn that satisfy
l(g) ≤ m²(n) (1 − p(n))^m(n) .    (15)
(15)
Informally, this is a set of graphs which, for the given measure PMm(n),n on subgraphs, have a high
probability that a random subgraph will have a full matching. Consistent with the general outline
of the proof given in Section 4, we will show that (a) graphs in G̃n have favorable delay guarantees
(Proposition 1), and (b) random graphs are highly likely to belong to G̃n (Lemma 5).
Lemma 5. (Probability of Full Matching on Random Subgraphs) With p(n) = d(n)/n,
we have
limn→∞ Pn,p(n) (G̃n ) = 1.    (16)
Remark on Lemma 5: Eq. (16) states that graphs in G̃n can be found using the Erdős–Rényi
model, with high probability. This probabilistic statement is not to be confused with Eq. (15),
which is a deterministic property that holds for all g ∈ G̃n . For the latter property, the randomness
lies only in the sampling of subgraphs (via PMm(n),n ). This distinction is important for our analysis,
because the interconnection topology, gn , stays fixed over time, while the random subgraph, gn |Γ∪∆ ,
is drawn independently for different batches.
Before proving Lemma 5, we state the following useful fact. The proof consists of a simple
argument using Hall’s marriage theorem and a union bound (cf. Lemma 2.1 in (23) for a proof of
the lemma).
Lemma 6. Let G be an (n, n, p) random bipartite graph. Then
Pn,p (G does not admit a full matching) ≤ 3n (1 − p)^n .    (17)
Proof. (Lemma 5) Define
q(g, h) = I (g |h does not admit a full matching) ,
where g ∈ Gn and h ∈ Mm(n),n . Let H be a random element of Mm(n),n , distributed according to
PMm(n),n , and let l(g) = E (q(g, H)). Note that
l(g) = PMm(n),n ({h ∈ Mm(n),n : g |h does not admit a full matching}) .
Let G be an (rn, n, p (n)) random bipartite graph over I ∪ J, generated independently of H.
Note that the distribution of G restricted to any m(n)-by-m(n) subset of I ∪ J is that of an
(m(n), m(n), p(n)) random bipartite graph. Therefore, by Lemma 6, we have
E (l(G)) = P (G|H does not admit a full matching) ≤ 3m(n) (1 − p(n))^m(n) ,    (18)
where P(·) represents the distributions of both G and H. Recall that G̃n = {g ∈ Gn : l(g) ≤ m²(n) (1 − p(n))^m(n)}.
Eq. (18), combined with Markov’s inequality, yields
Pn,p(n) (G̃n ) = 1 − P (l(G) > m²(n) (1 − p(n))^m(n)) ≥ 1 − E (l(G)) / (m²(n) (1 − p(n))^m(n)) ≥ 1 − 3/m(n) ,    (19)
which converges to 1 as n → ∞, because m(n) → ∞. This completes the proof of Lemma 5.
Using Lemma 5, we can now obtain an upper bound on the value of q(gn ). The proof is given in
Appendix A.3.
Lemma 7. Let b (n) = Kn/y(n), as in Eq. (11), for some constant K > 8/λ. If gn ∈ G̃n , then
q(gn ) ≤ (1/y²(n)) · n^{−(λK−4)/2} .
We are now ready to bound the mean and variance of the service time distribution for the
Matching queue, using Eq. (13) and Lemmas 3, 4, and 7. The proof is given in Appendix A.4.
Lemma 8. (Service Time Statistics for the Matching Queue) Assume that gn ∈ G̃n , and
let b (n) = Kn/y(n), for some constant K > 8/λ. The service times of the Matching batches, S M (k),
are i.i.d., with
E (S M (k)) ∼ (λ/(ρ + ε)) · 1/y(n) , and Var (S M (k)) ≲ 1/y²(n) .
Let WM be the steady-state waiting time of a batch in the Matching queue, where waiting
time is defined to be the time from when the batch is formed until when all jobs in the batch have
started to receive service from a physical server. The following is the main result of this subsection.
Proposition 1. (Delays in the Matching Queue) Assume gn ∈ Ĝn ∩ G̃n , where Ĝn and G̃n
were defined before Lemmas 1 and 5, respectively. We have
E (WM ) ≲ 1/y(n) .    (20)
Proof. We use Kingman’s bound, which states that the expected waiting time in
a GI/GI/1 queue, W , satisfies E (W ) ≤ λ̃ (σa² + σs²)/(2(1 − ρ̃)), where λ̃ is the arrival rate, ρ̃ is the traffic
intensity, and σa² and σs² are the variances of the interarrival times and the service times, respectively.
Using Lemmas 2 and 8, the claim follows by substituting
λ̃ = 1/E (A(k)) = n/b(n) = y(n)/K ,
ρ̃ = E (S M (k)) / E (A(k)) = [λ/((ρ + ε) y(n))] / [K/(r y(n))] ≤ ρ/(ρ + ε) < 1 ,
σa² = Var (A(k)) ∼ (1/λ) · b(n)/n² ≲ 1/(n y(n)) ,
σs² = Var (S M (k)) ≲ 1/y²(n) ,    (21)
for all sufficiently large values of n. We obtain
E (WM ) ≲ y(n) · ( 1/(n y(n)) + 1/y²(n) ) ≲ 1/y(n) .
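The substitution in Eq. (21) can be mirrored numerically. The sketch below (our own; K, ρ, ε and d(n) are arbitrary illustrative choices) evaluates Kingman's bound with the orders of magnitude from Lemmas 2 and 8, and confirms that the resulting bound on E(WM) decreases with n, consistent with the 1/y(n) scaling:

```python
import math

def kingman(lam_tilde, rho_tilde, var_a, var_s):
    """Kingman's bound for a GI/GI/1 queue: E(W) <= λ̃ (σa² + σs²) / (2(1 − ρ̃))."""
    assert 0 <= rho_tilde < 1
    return lam_tilde * (var_a + var_s) / (2.0 * (1.0 - rho_tilde))

def wm_bound(n, d):
    """Plug in λ̃ ~ y(n)/K, ρ̃ <= ρ/(ρ+ε), σa² ~ 1/(n y(n)), σs² ~ 1/y(n)²."""
    K, rho, eps = 10.0, 0.5, 0.1     # illustrative constants
    yn = d(n) / math.log(n)          # y(n) = d(n)/ln n
    return kingman(yn / K, rho / (rho + eps), 1.0 / (n * yn), 1.0 / yn**2)

d = lambda n: math.log(n) ** 2       # d(n) >> ln n, so y(n) -> ∞
bounds = [wm_bound(n, d) for n in (10**3, 10**6, 10**9)]
# The bound decays roughly like 1/y(n) = ln n / d(n).
assert bounds[0] > bounds[1] > bounds[2]
```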
5.2. Delays in a Residual Queue
The delay analysis for the Residual queues is conceptually simpler compared to that of the Matching
queue, but requires more care on the technical end. The main difficulty comes from the fact
that, due to the physical servers’ interactions with the Matching queue, the service times in a
Residual queue i, {SiR (k)}k≥0 , are neither identically distributed nor independent over k (Section
4.2.1). To overcome this difficulty, we will use a trick to restore the i.i.d. nature of the service
times. In particular, we will create a per-sample-path coupling of the SiR (k)’s to an i.i.d. sequence
{S̃iR (k)}k≥0 , such that SiR (k) ≤ S̃iR (k) for all k, almost surely. If we pretend that the S̃iR (k)’s are
indeed the true service times, an upper bound on the expected waiting time can be established
using Kingman’s bound. We finish the argument by showing that this upper bound applies to the
expected value of the original waiting time. The following proposition is the major technical result
of this section.
Proposition 2. Assume gn ∈ Ĝn , where Ĝn was defined in Lemma 1. Fix any i ∈ I. Let k be the
index for batches. Denote by {TiR (k)}k≥0 and {SiR (k)}k≥0 the sequences of interarrival and service
times at the ith Residual queue, respectively, defined on a common probability space. There exists
a sequence {S̃iR (k)}k≥0 , defined on the same probability space, that satisfies:
1. P (S̃iR (k) ≥ SiR (k), ∀k ∈ N) = 1.
2. Elements of {S̃iR (k)} are i.i.d. over k, and independent of {TiR (k)}.
3. E (S̃iR (1)) ≲ b²(n)/n² , and E ((S̃iR (1))²) ≲ b²(n)/n² .
Proof. The proof is given in Appendix A.5. It involves an explicit coupling construction of the
sequence {S̃iR (k)}, which is then shown to satisfy all three claims. Technicalities aside, the proof
makes use of the following simple observations: (1) the event periods can be “prolonged” via
coupling to be made independent, without qualitatively changing the scaling of the service time,
S̃iR (k), and (2) the first and second moments of SiR (k) are mainly influenced by the size of the
Residual batch, Ri (k), which is small (≲ b(n)/n) by design.
Denote by WiR the steady-state waiting time in the ith Residual queue, where waiting
time is defined to be the time from when the batch is formed until when all jobs in the batch have
started to receive service. The next proposition is the main result of this sub-section; it is
proved using the stochastic dominance result of Proposition 2, and the same Kingman’s bound as
in Proposition 1. The proof is given in Appendix A.6.
Proposition 3. (Delay in a Residual Queue) Assume gn ∈ Ĝn ∩ G̃n , where Ĝn and G̃n were
defined before Lemmas 1 and 5, respectively. We have13
maxi∈I E (WiR ) ≲ 1/y(n) .    (22)
5.3. Proof of Theorem 1 Under Uniform Arrival Rates
Proof. Let Hn′ = Ĝn ∩ G̃n for all n ≥ 0, where Ĝn and G̃n were defined before Lemmas 1 and 5,
respectively. Since each job is served either through the Matching queue or a Residual queue, the
total queueing delay is no more than the waiting time in a virtual queue plus the time to
form a batch (A(k)). Hence, letting b(n) = Kn/y(n), by Lemma 2, and Propositions 1 and 3, we
have
Eπ (W |gn , λn ) ≤ E (A(1)) + max{E (W M ), E (W1R )} ≤ K′/y(n) = K′ · ln n/d(n) ,    (23)
13. The expectation is defined in the sense of Eq. (2).
if gn ∈ Hn′ , where K′ is a constant independent of n and gn . It is not difficult to show, by the weak
law of large numbers, that there exist εn ↓ 0, such that
P (1 − εn ≤ deg (Gn )/d(n) ≤ 1 + εn ) ≥ 1 − εn ,    (24)
for all n, where Gn is an (rn, n, d(n)/n) random graph. Let Ln = {gn ∈ Gn : deg(gn )/d(n) ∈ [1 − εn , 1 + εn ]},
and Hn = Hn′ ∩ Ln . By Eq. (24), and Lemmas 1 and 7, we have
Pn,d(n)/n (Hn ) ≥ 1 − δn ,    (25)
for all n, for some δn ↓ 0. Note that the definitions of Ĝn , G̃n and Ln do not depend on the arrival
rates λn , and hence the value of δn also does not depend on λn .
Eqs. (23) and (25) combined prove the first two claims of the theorem. Since all λi are equal in
this case, the scheduling policy is oblivious to the arrival rates by definition. This completes the
proof of Theorem 1 when λn,i = λ < 1/r for all n and i.
6. General Case and Arrival-rate Oblivious Scheduling
We now complete the proof of Theorem 1 for the general case, for λn satisfying Condition 1. In
particular, we will show that the original virtual-queue-based scheduling policy given in Section 4.2
automatically achieves the u²(n) ln n/d(n) delay scaling, while being fully oblivious to the values of
the λn,i . We will use a definition of Hn identical to that in the uniform case, so that the probability
of this set will still converge to 1. To avoid replicating our earlier arguments, and since the delays
in each of the virtual queues are completely characterized by the statistics of arrival times and job
sizes, we will examine the changes in each of these queueing primitives as we move to the general
case. Throughout the section, we will analyze a system where the arrival rate vector, λn , satisfies
Condition 1, and the scheduling rules are the same as before, with b(n) = Kn/y(n).
1. Inter-arrival times of batches. The inter-arrival times to all virtual queues are equal to
the time it takes to collect enough jobs to form a single batch. By the merging property of Poisson
processes, the aggregate stream of arrivals to the system is a Poisson process of rate Σi∈I λn,i , with
Σi∈I λn,i ≤ ρn.    (26)
We will make a further assumption, that there exists ρ̃, with 0 < ρ̃ < ρ, such that
Σi∈I λn,i ≥ ρ̃n.    (27)
This assumption will be removed by the end of this section by a small modification to the scheduling
policy. Given Eqs. (26) and (27), a result analogous to Lemma 2 holds:
E (A(k)) ≍ b(n)/n , and Var (A(k)) ≲ b(n)/n² .    (28)
2. Service times for Residual queues. The distributions of the times at which a server tries
to fetch a job from a Residual queue can vary due to the different values of the λn,i ’s. However, it is
not difficult to show that the upper bounds on the service times at the Residual queues (Proposition
2) depend only on the batch size (ρ/r)b(n), and are independent of the specific values of the λn,i ’s.
Therefore, the only factor that may significantly change the delay distribution at the ith Residual
queue is its associated Residual batch size, Ri (k) (Eq. (8)). Since the arrival rates are no longer
uniform, this distribution will in principle be different from the uniform setting. However, we will
show that the difference is small and does not change the scaling of delay.
Denote by pi the probability that an incoming job to the system happens to arrive at queue i.
We have that
pi = λn,i / Σi∈I λn,i ≤ u(n)/(ρ̃n) = (1/ρ̃) · u(n)/n ,    (29)
where u(n) is the upper bound on λn,i defined in Condition 1, and the inequality follows from the
assumptions that λn,i ≤ u(n) and that Σi∈I λn,i ≥ ρ̃n (Eq. (27)). Using Eq. (29), and following the
same steps as in the proof of Proposition 2, one can show that14
maxi∈I E (S̃iR (k)) ≲ b²(n)u²(n)/n² , and maxi∈I Var (S̃iR (k)) ≲ b²(n)u²(n)/n² .
We repeat the arguments in Proposition 3 by replacing, e.g., the bounds on Var (S̃iR (k)) with
maxi∈I Var (S̃iR (k)) ≲ b²(n)u²(n)/n² . Letting b(n) = K n ln n/d(n), we have that, whenever
u(n) ≲ √(d(n)/ln n),
maxi∈I E (WiR ) ≲ (n/b(n)) · ( b(n)/n² + b²(n)u²(n)/n² ) ≲ b(n)u²(n)/n ≲ (ln n/d(n)) · u²(n) .    (30)
3. Service times for the Matching queue. Surprisingly, it turns out that the bounds on the
service time statistics for the Matching queue in the uniform arrival-rate case (Lemma 8) will carry
14. In particular, we observe, similar to Eq. (66) in (?), that now Ri (k) ≼ R1 (λb (n), u(n)/(ρ̃n)), where R1 (·, ·) is defined in Lemma 11 in (?). Here, “≼” is in terms of stochastic dominance.
over to the general setting, unchanged. Recall from Eq. (13) that the service time of a Matching
batch is composed of two elements: the length of a COLLECT phase, Scol , and, with probability
q(gn ), the length of a clearing phase, Scle . Since the size of the Matching batch is at most (ρ/r)b(n) by
definition, the characterizations of Scol and Scle given by Lemmas 3 and 4 continue to hold, with λ
being replaced by λ = ρ/r. It remains to verify that the bound on q(gn ) stays unchanged. While the
λn,i ’s are now different and the subgraph induced by ∆ and Γ is no longer uniformly distributed,
the upper bound on q(gn ) given in Lemma 7 still holds, because Lemma 5 applies to arbitrary
distributions of subgraphs. Therefore, the scaling in Proposition 1,
E (WM ) ≲ 1/y(n) = ln n/d(n) ,    (31)
continues to hold under non-uniform arrival rates.
Combining Eqs. (26), (30) and (31), we have, analogous to Eq. (23), that for all n ≥ 1,

$$\mathbb{E}_\pi(W \mid g_n, \lambda_n) \le \mathbb{E}\big(A(1)\big) + \max\big\{\mathbb{E}(W^M),\, \mathbb{E}(W^R_1)\big\} \le \frac{K_1 \ln n}{d(n)} + \frac{K_2\, u^2(n) \ln n}{d(n)} \le \frac{K\, u^2(n) \ln n}{d(n)}, \qquad (32)$$

if g_n ∈ H_n, where K_1, K_2 and K are positive constants independent of n, g_n and λ_n.
Finally, we justify the assumption made in Eq. (27) by showing that the scheduling policy can easily be modified so that Eq. (27) always holds: we simply inject into each queue an independent Poisson stream of "dummy packets" at rate δ, where δ is chosen to be any constant in (0, (1 − ρ)/r). The dummy packets are merely place-holders: when a server begins serving a dummy packet, it simply goes on a vacation of duration Expo(1). This trivially guarantees the validity of Eq. (27), with λ̃ = δ, and since Σ_{i∈I} λ_{n,i} ≤ ρn, doing so is equivalent to having a new system with ρ replaced by ρ′ = ρ + δr < 1. Hence, all results continue to hold.15 This concludes the proof of Theorem 1.
7. Conclusions and Future Work
The main message of this paper is that the benefits of diminishing delay and large capacity region
can be jointly achieved in a system where the level of processing flexibility of each server is small
compared to the system size. The main result, Theorem 1, proves this using a randomly constructed
interconnection topology and a virtual-queue-based scheduling policy. As a by-product, it also
provides an explicit upper bound (Eq. (3)) that captures a trade-off between the delay (E (W )),
level of robustness (u(n)), and degree of processing flexibility (d(n)).
15. This maneuver of inserting dummy packets is of course a mere technical device for the proof. In reality, delay and capacity would both become much less of a concern when the traffic intensity, ρ, is small, at which point less sophisticated scheduling policies may suffice.
The scaling regime considered in this paper assumes that the traffic intensity is fixed as n
increases, which fails to capture system performance in the heavy-traffic regime (ρ ≈ 1). It would be
interesting to consider a scaling regime in which ρ and n scale simultaneously (e.g., the celebrated
Halfin-Whitt regime (25)), but it is unclear at this stage what exact formulations and analytical
techniques are appropriate. At a higher level, it would be interesting to understand how long-term
input characteristics in a queueing network (e.g., how arrival rates are drawn or evolve over time)
impact its temporal performance (e.g., delay), and what role the network’s intrinsic structure (e.g.,
flexibility) has to play here.
Theorem 1 also leaves open several promising directions for future improvements and extensions,
which we discuss briefly. It is currently necessary to have d(n) ≫ ln n in order to achieve a diminishing delay. The ln n factor is essentially tight if a random-graph-based connectivity structure is to be used, because d(n) = Ω(ln n) is necessary for an Erdős–Rényi-type random graph to be connected (i.e., not to have isolated queues). We suspect that the ln n factor can be reduced using a different graph generation procedure.
Fixing u(n) to a constant, we observe that when d(n) = n (fully connected graph), the system
behaves approximately as an M/M/n queue with total arrival rate nρ; if ρ is held fixed, then the
expected delay is known to vanish exponentially fast in n. Thus, there is a gap between this fact
and the polynomial scaling of O (ln n/d(n)) given in Theorem 1. We believe that this gap is due
to an intrinsic inefficiency of the batching procedure used in our scheduling policy. The batching
procedure provides an analytical convenience by allowing us to leverage the regularity in the arrival
traffic in obtaining delay upper bounds. However, the use of batching also mandates that all jobs
wait for a prescribed period of time before receiving any service. This can be highly inefficient in
the scaling regime that we consider, where the traffic intensity is fixed at ρ, because a 1 − ρ fraction
of all servers (approximately) are actually idle at any moment in time. We therefore conjecture
that a scheduling policy that tries to direct a job to an available server immediately upon its arrival
(such as a greedy policy) can achieve a significantly better delay scaling, but the induced queueing
dynamics, and the appropriate analytical techniques, may be very different from those presented
in this paper.
Finally, the parameter u(n) captures the level of robustness against the uncertainties in the
arrival rates. In terms of inducing a diminishing waiting time as n → ∞, there is still a gap between the current requirement that u(n) ≪ √(d(n)/ln n) and the upper bound u(n) = O(d(n)) (since each queue is connected to only d(n) servers on average). The u²(n) factor in the scaling of E(W)
is due to the impact on delay by the arrival rate fluctuations in the Residual queues (cf. Eq. (30)).
It is unclear what the optimal dependence of E(W) on u(n) is. It is also possible that the condition u(n) ≪ d(n) alone is sufficient for achieving a diminishing delay; we conjecture that this is the case.
References
[1] S. Kandula, S. Sengupta, A. Greenberg, P. Patel, and R. Chaiken, “Nature of datacenter traffic: measurements and analysis,” IMC, 2009.
[2] G. Soundararajan, C. Amza, and A. Goel, “Database replication policies for dynamic content applications,” Proc. of EuroSys, 2006.
[3] W. Jordan and S. C. Graves, “Principles on the benefits of manufacturing process flexibility,” Management Science, 41(4):577–594, 1995.
[4] S. Gurumurthi and S. Benjaafar, “Modeling and analysis of flexible queueing systems,” Management Science, 49:289–328, 2003.
[5] S. M. Iravani, M. P. Van Oyen, and K. T. Sims, “Structural flexibility: A new perspective on the design of manufacturing and service operations,” Management Science, 51(2):151–166, 2005.
[6] M. Chou, G. A. Chua, C-P. Teo, and H. Zheng, “Design for process flexibility: efficiency of the long chain and sparse structure,” Operations Research, 58(1):43–58, 2010.
[7] D. Simchi-Levi and Y. Wei, “Understanding the performance of the long chain and sparse designs in process flexibility,” submitted, 2011.
[8] M. Chou, C-P. Teo, and H. Zheng, “Process flexibility revisited: the graph expander and its applications,” Operations Research, 59:1090–1105, 2011.
[9] M. Leconte, M. Lelarge, and L. Massoulié, “Bipartite graph structures for efficient balancing of heterogeneous loads,” ACM Sigmetrics, London, 2012.
[10] R. Talreja and W. Whitt, “Fluid models for overloaded multiclass many-server queueing systems with FCFS routing,” Management Sci., 54(8):1513–1527, 2008.
[11] J. Visschers, I. Adan, and G. Weiss, “A product form solution to a system with multi-type jobs and multi-type servers,” Queueing Systems, 70:269–298, 2012.
[12] A. Mandelbaum and M. I. Reiman, “On pooling in queueing networks,” Management Science, 44(7):971–981, 1998.
[13] J. M. Harrison and M. J. López, “Heavy traffic resource pooling in parallel-server systems,” Queueing Systems, 33:339–368, 1999.
[14] S. L. Bell and R. J. Williams, “Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: asymptotic optimality of a threshold policy,” Ann. Appl. Probab., 11(3):608–649, 2001.
[15] R. Wallace and W. Whitt, “A staffing algorithm for call centers with skill-based routing,” Manufacturing and Service Operations Management, 7:276–294, 2005.
[16] A. Bassamboo, R. S. Randhawa, and J. A. Van Mieghem, “A little flexibility is all you need: on the asymptotic value of flexible capacity in parallel queuing systems,” submitted, 2011.
[17] J. N. Tsitsiklis and K. Xu, “On the power of (even a little) resource pooling,” Stochastic Systems, 2:1–66 (electronic), 2012.
[18] M. Neely, E. Modiano, and Y. Cheng, “Logarithmic delay for n×n packet switches under the crossbar constraint,” IEEE/ACM Trans. Netw., 15(3):657–668, 2007.
[19] J. N. Tsitsiklis and D. Shah, “Bin packing with queues,” J. Appl. Prob., 45(4):922–939, 2008.
[20] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, “Achieving 100% throughput in an input-queued switch,” IEEE Trans. on Comm., 47(8):1260–1267, 1999.
[21] S. Kunniyur and R. Srikant, “Analysis and design of an adaptive virtual queue,” ACM SIGCOMM, 2001.
[22] P. Erdős and A. Rényi, “On random matrices,” Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:455–461, 1964.
[23] S. R. Bodas, “High-performance scheduling algorithms for wireless networks,” Ph.D. dissertation, University of Texas at Austin, Dec. 2010.
[24] V. F. Kolchin, B. A. Sevast’yanov, and V. P. Chistyakov, Random Allocations, John Wiley & Sons, 1978.
[25] S. Halfin and W. Whitt, “Heavy-traffic limits for queues with many exponential servers,” Operations Research, 29:567–588, 1981.
[26] J. N. Tsitsiklis and K. Xu, “Queueing system topologies with limited flexibility,” ACM Sigmetrics, Pittsburgh, PA, June 2013.
Appendix A: Additional Proofs

A.1. Proof of Lemma 1
Proof. (Lemma 1) Let Gn be an (rn, n, p(n)) random bipartite graph. We first assume that r ≤ 1. Since d(n) ≫ ln n, it is well known (cf. (22); see also Lemma 6) that

P(Gn admits a full matching) → 1,

as n → ∞. Let Ĝn be the set of all graphs that admit a full matching. For each g ∈ Ĝn, choose one full matching M, and let L(i) = j such that (i, j) ∈ M. Then L is an efficient assignment function by the definition of a full matching, since each j is matched with at most one i through L.
We now consider the case r > 1. For each n, partition I into ⌈r⌉ disjoint subsets, I_1, ..., I_⌈r⌉, where each subset has size at most n. Using again Lemma 6, we have that

P(Gn|_{Ik∪J} admits a full matching for all k = 1, ..., ⌈r⌉) → 1, (33)

as n → ∞, where Gn|_{Ik∪J} is the graph Gn restricted to the vertex set Ik ∪ J. As in the previous case, we let Ĝn be the set of graphs that satisfy the event in Eq. (33). For each n and k, let Mk be a full matching in Gn|_{Ik∪J}, and for all i ∈ Ik, let L(i) = j where (i, j) ∈ Mk. We then have |L^{-1}(j)| ≤ ⌈r⌉ ≤ r + 1, and L is an efficient assignment function.
A.2. Proof of Lemma 4

The following technical lemma is useful (see (24), Chapter 1, §2).

Lemma 9. Consider throwing balls uniformly into n bins. Let N_m be the number of balls required so that at least m bins are non-empty. If m = o(n), then E(N_m) ∼ m, and Var(N_m) = o(m).
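Lemma 9 can be checked directly by simulation. A minimal Monte Carlo sketch (our own illustration; the parameters m = 100 and n = 10^5 are arbitrary, chosen so that m = o(n)):

```python
import random

def balls_until_m_nonempty(m, n, rng):
    """Throw balls uniformly into n bins; return N_m, the number of balls
    needed until at least m bins are non-empty."""
    occupied = set()
    balls = 0
    while len(occupied) < m:
        occupied.add(rng.randrange(n))
        balls += 1
    return balls

rng = random.Random(1)
m, n = 100, 10**5          # m = o(n): collisions are rare
samples = [balls_until_m_nonempty(m, n, rng) for _ in range(200)]
mean = sum(samples) / len(samples)
print(mean / m)  # very close to 1, consistent with E(N_m) ~ m
```

Since collisions are rare when m = o(n), almost every ball opens a new bin, which is exactly why E(N_m) ∼ m.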
Proof. (Lemma 4) The length of S_col is equal to the time it takes for λb(n) servers to enter the state STANDBY. Denote by T_λ(j) the time of the jth jump in a rate-λ Poisson process. We have that

$$\mathbb{E}(S_{col}) = \mathbb{E}\big(T_{(\rho+\epsilon)n}(N_{\lambda b(n)})\big) = \sum_{j=1}^{\infty} \mathbb{E}\big(T_{(\rho+\epsilon)n}(j)\big)\, \mathbb{P}\big(N_{\lambda b(n)} = j\big) \stackrel{(a)}{=} \frac{1}{(\rho+\epsilon)n} \sum_{j=1}^{\infty} j\, \mathbb{P}\big(N_{\lambda b(n)} = j\big) = \frac{1}{(\rho+\epsilon)n}\, \mathbb{E}\big(N_{\lambda b(n)}\big), \qquad (34)$$

where the equality (a) follows from the fact that $T_{(\rho+\epsilon)n}(j) \stackrel{d}{=} \sum_{i=1}^{j} E_i$, where the $E_i$'s are i.i.d., with distribution $E_i \stackrel{d}{=} \mathrm{Expo}\big((\rho+\epsilon)n\big)$. By Lemma 9, and since b(n) ≪ n, we have that

$$\mathbb{E}(S_{col}) \sim \frac{1}{(\rho+\epsilon)n}\, \lambda b(n) = \frac{\lambda}{\rho+\epsilon} \cdot \frac{b(n)}{n}. \qquad (35)$$

To compute the second moment, by Lemma 14 in Appendix B, we have that

$$\mathbb{E}\big(T_{(\rho+\epsilon)n}(j)^2\big) = \frac{j(j+1)}{(\rho+\epsilon)^2 n^2}. \qquad (36)$$
Using a similar argument to that in the calculation of E(S_col), we have

$$\begin{aligned}
\mathbb{E}\big(S_{col}^2\big) &= \mathbb{E}\big(T_{(\rho+\epsilon)n}(N_{\lambda b(n)})^2\big) = \sum_{j=1}^{\infty} \mathbb{E}\big(T_{(\rho+\epsilon)n}(j)^2\big)\, \mathbb{P}\big(N_{\lambda b(n)} = j\big) \\
&\stackrel{(a)}{=} \frac{1}{(\rho+\epsilon)^2 n^2} \sum_{j=1}^{\infty} j(j+1)\, \mathbb{P}\big(N_{\lambda b(n)} = j\big) \\
&= \frac{1}{(\rho+\epsilon)^2 n^2} \Big( \mathbb{E}\big(N_{\lambda b(n)}^2\big) + \mathbb{E}\big(N_{\lambda b(n)}\big) \Big) \\
&= \frac{1}{(\rho+\epsilon)^2 n^2} \Big[ \big(\mathbb{E}(N_{\lambda b(n)})\big)^2 + \operatorname{Var}\big(N_{\lambda b(n)}\big) + \mathbb{E}\big(N_{\lambda b(n)}\big) \Big] \\
&\lesssim \frac{b^2(n)}{n^2}, \qquad (37)
\end{aligned}$$

where the equality (a) follows from Eq. (36), and the last step follows from Lemma 9.
A.3. Proof of Lemma 7

Proof. (Lemma 7) Let G̃n be as in Lemma 5. The size of the random subset Γ ⊂ I induced by a Matching batch is at most λb(n), and the size of the random subset ∆ ⊂ J induced by the STANDBY servers is exactly λb(n). Hence, let m(n) = λb(n); there exists a distribution P_{M_{m(n),n}} over M_{m(n),n}, with M_{m(n),n} defined as in Lemma 5, such that Γ ∪ ∆ is covered by the random subset generated by P_{M_{m(n),n}} almost surely. Note that Γ and ∆ are generated independently of g_n. We now apply Lemma 5 to obtain, for all g_n ∈ G̃n,

$$\begin{aligned}
q(g_n) \le l(g_n) &\lesssim b^2(n)\, \big(1 - p(n)\big)^{\lambda b(n)} \\
&= b^2(n) \exp\big[\lambda b(n) \cdot \ln\big(1 - p(n)\big)\big] \\
&\stackrel{(a)}{=} b^2(n) \exp\Big[-\lambda b(n)\, p(n) - \lambda b(n)\, \frac{p(n)^2}{2} + O\big(b(n)\, p(n)^3\big)\Big] \\
&\stackrel{(b)}{\lesssim} \frac{n^2}{y(n)^2} \exp\Big(-\frac{\lambda K}{2} \ln n\Big) = \frac{1}{y(n)^2}\, n^{-(\lambda K - 4)/2}, \qquad (38)
\end{aligned}$$

where step (a) is based on the Taylor expansion ln(1 − x) = −x − x²/2 + O(x³), and (b) is based on the substitution b(n) = K n/y(n) and p(n) = d(n)/n = y(n) ln n / n. The constant 1/2 in (b) can be replaced by any other positive value less than 1.

A.4. Proof of Lemma 8

Proof. (Lemma 8) Combining Eq. (13) with Lemmas 3 (for S_cle), 4 (for S_col), and 7 (for q(g_n)), we have

$$\begin{aligned}
\mathbb{E}\big(S^M(k)\big) &\stackrel{(a)}{=} \mathbb{E}(S_{col}) + q(g_n)\, \mathbb{E}(S_{cle}) \\
&\le \frac{\lambda}{\rho+\epsilon} \cdot \frac{K}{y(n)} + C\, \frac{1}{y(n)^2}\, n^{-(\lambda K - 4)/2} \cdot \frac{n}{y(n)} \\
&= \frac{\lambda}{\rho+\epsilon} \cdot \frac{K}{y(n)} + \frac{C}{y(n)^3}\, n^{-(\lambda K - 6)/2} \\
&\sim \frac{\lambda}{\rho+\epsilon} \cdot \frac{K}{y(n)}, \qquad (39)
\end{aligned}$$

whenever K > 6/λ, where C is some positive constant. Note that in (a) we have used the fact that X_{q(g_n)} and S_cle are independent. Similarly,
$$\begin{aligned}
\mathbb{E}\big(S^M(k)^2\big) &= \big(1 - q(g_n)\big)\, \mathbb{E}\big(S_{col}^2\big) + q(g_n)\, \mathbb{E}\big(S_{col}^2 + S_{cle}^2 + 2\, S_{col} S_{cle}\big) \\
&= \mathbb{E}\big(S_{col}^2\big) + 2\, q(g_n)\, \mathbb{E}(S_{col})\, \mathbb{E}(S_{cle}) + q(g_n)\, \mathbb{E}\big(S_{cle}^2\big) \\
&\lesssim \frac{1}{y(n)^2} + \frac{n^{-(\lambda K - 4)/2}}{y(n)^2} \left( \frac{1}{y(n)} \cdot \frac{n}{y(n)} + \frac{n^2}{y(n)^2} \right) \\
&\lesssim \frac{1}{y(n)^2} + \frac{1}{y(n)^4}\, n^{-(\lambda K - 8)/2} \\
&\lesssim \frac{1}{y(n)^2}, \qquad (40)
\end{aligned}$$

whenever K > 8/λ. Note that Var(S^M(k)) ≤ E(S^M(k)²) ≲ 1/y(n)². This completes the proof.
A.5. Proof of Proposition 2

Proof. (Proposition 2) We will explicitly construct the sequence {S̃_i^R(k)} and show that it satisfies all three claims. Throughout the proof, we omit the superscript R and the subscript i in our notation (e.g., we write S(k) in place of S_i^R(k)).
Before describing the coupling procedure in detail, we explain in words what is to be accomplished. Recall from Section 4.2.1 that the type of an event point for the server can be either "matching" or "residual". The times between adjacent event points (i.e., event periods) are i.i.d. exponential random variables, with one exception: when the event point is of type "matching" and the Matching queue is in state COLLECT. In this case, the server has to remain in the STANDBY state until the end of the COLLECT period, and this induces dependency among the event periods.

The main idea of the coupling procedure is to simply "prolong" the lengths of all event periods, so that the resulting lengths are i.i.d. (Step 1). As a technical maneuver, we will also append an additional event period if the first event point in a Residual batch is not of the matching type (Step 2). This guarantees that the number of event periods, and hence the total service time for a Residual batch, is i.i.d. across different batches.
Let L be an efficient assignment function for g_n (Section 4.2.1), and let j = L(i) be the index of the server assigned to queue i. Fix any k ≥ 1. The service period (Definition 5) of the kth batch, [E_k, D_k], is composed of a sequence of event periods for server j (Section 4.2.1). Let {e_m}_{1≤m≤M_k} be the times of all event points that fall within [E_k, D_k), and let e_0 be the event point that immediately precedes E_k. We have

$$e_0 \le E_k \le e_1 \le e_2 \le \cdots \le e_{M_k} = D_k. \qquad (41)$$

We will define {S̃(k)}_{k≥1}, for every sample path of the system, as follows.
A coupled construction of S̃(k)

For all k ≥ 1:

• Step 1: For all m = 0, 1, ..., M_k − 1:
(a) If e_m is of type "matching," and the Matching queue is in state COLLECT at time e_m, we let

$$p_m = (e_{m+1} - e_m) + h - (l - e_m), \qquad (42)$$

where l is the first time after e_m when the Matching queue leaves state COLLECT, and h is the total amount of time it has spent in COLLECT before exiting. Note that, by definition, h ≥ l − e_m almost surely, and hence p_m ≥ e_{m+1} − e_m almost surely.
(b) Otherwise, let

$$p_m = (e_{m+1} - e_m) + \tilde h, \qquad (43)$$

where h̃ is drawn according to the distribution of the length of a COLLECT period in the Matching queue (see Lemma 4),

$$\tilde h \stackrel{d}{=} \sum_{i=1}^{N_{\lambda b(n)}} \mathrm{Expo}\big(n(\rho+\epsilon)\big), \qquad (44)$$

where N_m was defined in Lemma 9.

• Step 2: Draw p̃ from the distribution

$$\tilde p \stackrel{d}{=} \begin{cases} 0, & \text{if } e_0 \text{ is of type "matching";} \\ \mathrm{Expo}(1) + \sum_{i=1}^{N_{\lambda b(n)}} \mathrm{Expo}\big(n(\rho+\epsilon)\big), & \text{otherwise.} \end{cases} \qquad (45)$$

• Step 3: Set

$$\tilde S(k) = \tilde p \cdot \mathbb{I}\big(R(k) > 0\big) + \sum_{m=0}^{M_k - 1} p_m, \qquad (46)$$

where R(k), defined in Eq. (8), is the size of the kth Residual batch. The indicator function ensures that the service time is zero if the batch is empty.
By construction, p̃ ≥ 0, and p_m ≥ e_{m+1} − e_m for all 0 ≤ m ≤ M_k − 1, almost surely. Therefore, the first claim of Proposition 2, that S̃(k) ≥ S(k) for all k almost surely, holds. We now show that the elements of {S̃(k)} are i.i.d. and independent of the arrival times {T(k)}. The following lemma is useful.

Lemma 10. For any k, the random variables p_0, p_1, ..., p_{M_k−1}, p̃ are i.i.d., with distribution

$$p \stackrel{d}{=} \mathrm{Expo}(1) + \sum_{i=1}^{N_{\lambda b(n)}} \mathrm{Expo}\big(n(\rho+\epsilon)\big). \qquad (47)$$

Furthermore,

$$\mathbb{E}(p) \to 1, \qquad (48)$$

as n → ∞.
Proof. At server j, no two event points of type "matching" can occur during the same COLLECT period of the Matching queue, since after the first such event point, server j will remain in state STANDBY until the end of the COLLECT period. Therefore, Step 1 of the coupling does not introduce any dependency among the p_m's, and the definitions of p̃ and of all the p_m's involve only independent random variables. The specific distribution of p is immediate from the construction, and from the fact that all job sizes and vacation lengths are distributed according to Expo(1).

To show Eq. (48), by the same argument as in the proof of Lemma 4, we have that

$$\mathbb{E}\left( \sum_{i=1}^{N_{\lambda b(n)}} \mathrm{Expo}\big(n(\rho+\epsilon)\big) \right) = \mathbb{E}\big(N_{\lambda b(n)}\big) \cdot \mathbb{E}\Big(\mathrm{Expo}\big(n(\rho+\epsilon)\big)\Big) \lesssim \frac{b(n)}{n}, \qquad (49)$$

and hence E(p) → E(Expo(1)) = 1 as n → ∞.
We say that an event point at server j is of type "residual-i" if
1. the event point is of type "residual"; and
2. a job from queue i is selected to be processed.

By the definition of a Residual batch, if R(k) > 0, the departure time of the kth Residual batch, D_k, coincides with the R_i(k)th event point of type "residual-i" after E_k. Recall that an event period is of type "residual" with probability 1 − (ρ + ε), and, when this occurs, queue i is selected with probability 1/|L^{-1}(j)|. Denote by M̃_k the number of non-zero terms in S̃(k) (Eq. (46)). The preceding arguments show that the M̃_k's are independent, and

$$\tilde M_k \stackrel{d}{=} \mathbb{I}\big(R(k) > 0\big) + \sum_{u=1}^{R(k)} G_{k,u}, \quad \forall k \ge 0, \qquad (50)$$

where the G_{k,u}'s are i.i.d., with distribution

$$G_{k,u} \stackrel{d}{=} \mathrm{Geo}\left( \frac{1 - (\rho+\epsilon)}{|L^{-1}(j)|} \right). \qquad (51)$$

Finally, by Lemma 10, we have that

$$\tilde S(k) \stackrel{d}{=} p \cdot \mathbb{I}\big(R(k) > 0\big) + \sum_{u=1}^{R(k)} \sum_{v=1}^{G_{k,u}} p_{k,u,v}, \qquad (52)$$

where p and the p_{k,u,v}'s are i.i.d., with the distribution given in Eq. (47). Since the sizes of the Residual batches (R(k)) are i.i.d., and independent of the arrival times, we have established the second claim of Proposition 2.
To prove the last claim of Proposition 2, which concerns the first two moments of S̃(k), we begin with the following fact about the balls-into-bins problem.

Lemma 11. Denote by H_i(m, n) the number of balls in bin i after throwing m balls uniformly into n bins. Define

$$R_i(m, n) = \big(H_i(m, n) - 1\big)^+. \qquad (53)$$

If m = o(n), then

$$\mathbb{E}\big(R_i(m, n)\big) \lesssim \frac{m^2}{n^2}, \qquad (54)$$
$$\mathbb{E}\big(R_i^2(m, n)\big) \lesssim \frac{m^2}{n^2}, \qquad (55)$$

as n → ∞.

Proof. By the definition of R_i, we have the following two cases.
1. Conditioned on H_i(m, n) ≥ 1,

$$R_i(m, n) \stackrel{d}{=} H_i(m-1, n) \stackrel{d}{=} \mathrm{Bino}\Big(m-1, \frac{1}{n}\Big), \qquad (56)$$

where Bino(n, p) denotes the binomial distribution with parameters n and p.
2. Conditioned on H_i(m, n) = 0, R_i(m, n) = 0.

Therefore, we have that

$$\mathbb{E}\big(R_i(m, n)\big) = \mathbb{P}\big(H_i(m, n) \ge 1\big) \cdot \frac{m-1}{n}, \qquad (57)$$

and

$$\mathbb{E}\big(R_i^2(m, n)\big) = \mathbb{P}\big(H_i(m, n) \ge 1\big) \cdot \left[ \left(\frac{m-1}{n}\right)^2 + \frac{m-1}{n}\left(1 - \frac{1}{n}\right) \right]. \qquad (58)$$

Since H_i(m, n) =d Bino(m, 1/n), we have

$$\mathbb{P}\big(H_i(m, n) \ge 1\big) = 1 - \Big(1 - \frac{1}{n}\Big)^m = 1 - \exp\Big(m \ln\Big(1 - \frac{1}{n}\Big)\Big) \stackrel{(a)}{\lesssim} 1 - \exp\Big(-\frac{m}{n}\Big) \stackrel{(b)}{\lesssim} 1 - \Big(1 - \frac{m}{n}\Big) = \frac{m}{n}, \qquad (59)$$

where steps (a) and (b) follow from the Taylor expansions ln(1 − x) ≳ −x and exp(−x) ≳ 1 − x as x ↓ 0, respectively. The claim is proved by substituting Eq. (59) into Eqs. (57) and (58).
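The m²/n² scaling in Eqs. (54)–(55) can be checked by direct simulation. In the sketch below (our own illustration; the parameters m = 50, n = 1000 are arbitrary, chosen so that m = o(n)), both empirical moments come out on the order of (m/n)²:

```python
import random

def overflow_moments(m, n, trials, rng):
    """Estimate E(R_1) and E(R_1^2), where R_1 = (H_1 - 1)^+ and H_1 is the
    number of balls in bin 1 after m balls are thrown uniformly into n bins."""
    s1 = s2 = 0.0
    for _ in range(trials):
        h = sum(1 for _ in range(m) if rng.randrange(n) == 0)  # balls in bin 1
        r = max(h - 1, 0)                                      # the "overflow"
        s1 += r
        s2 += r * r
    return s1 / trials, s2 / trials

rng = random.Random(2)
m, n = 50, 1000
e1, e2 = overflow_moments(m, n, 50_000, rng)
print(e1, e2, (m / n) ** 2)  # both moments are of order (m/n)^2 = 0.0025
```

Intuitively, bin 1 receives more than one ball only with probability of order (m/n)², which is where both moment bounds come from.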
Recall that R(k) is the number of jobs in the kth Residual batch, defined in Eq. (8). We have that

$$R(k) \stackrel{d}{=} R_1\big(\lambda b(n), \lambda n\big), \qquad (60)$$

where R_1(·, ·) on the right-hand side is defined in Lemma 11. By Lemma 11, we have

$$\mathbb{E}\big(R(k)\big) \lesssim \frac{b^2(n)}{n^2}, \qquad (61)$$
$$\mathbb{E}\big(R(k)^2\big) \lesssim \frac{b^2(n)}{n^2}. \qquad (62)$$
By Eq. (52), we have

$$\begin{aligned}
\mathbb{E}\big(\tilde S(k)\big) &= \mathbb{E}\left( p \cdot \mathbb{I}\big(R(k) > 0\big) + \sum_{u=1}^{R(k)} \sum_{v=1}^{G_{k,u}} p_{k,u,v} \right) \\
&\stackrel{(a)}{=} \mathbb{E}(p)\, \mathbb{P}\big(R(k) > 0\big) + \mathbb{E}(p)\, \mathbb{E}\big(R(k)\big)\, \mathbb{E}\big(G_{k,1}\big) \\
&\stackrel{(b)}{\le} \mathbb{E}(p)\, \mathbb{E}\big(R(k)\big) + \mathbb{E}(p)\, \mathbb{E}\big(R(k)\big)\, \mathbb{E}\big(G_{k,1}\big) \\
&= \Big[\mathbb{E}(p)\, \big(\mathbb{E}(G_{k,1}) + 1\big)\Big]\, \mathbb{E}\big(R(k)\big) \\
&\stackrel{(c)}{\lesssim} \mathbb{E}\big(R(k)\big) \stackrel{(d)}{\lesssim} \frac{b^2(n)}{n^2}, \qquad (63)
\end{aligned}$$

where step (a) is based on Lemma 13 in Appendix B, step (b) on the fact that P(X > 0) ≤ E(X) for any random variable X taking values in Z₊, step (c) on Eqs. (48) and (51), and step (d) on Eq. (61).
Similarly,

$$\begin{aligned}
\mathbb{E}\big(\tilde S(k)^2\big) &= \mathbb{E}\left( \Big( p \cdot \mathbb{I}\big(R(k) > 0\big) + \sum_{u=1}^{R(k)} \sum_{v=1}^{G_{k,u}} p_{k,u,v} \Big)^2 \right) \\
&= \mathbb{E}\big(p^2\big)\, \mathbb{P}\big(R(k) > 0\big) + 2\,\mathbb{E}(p)\, \mathbb{P}\big(R(k) > 0\big)\, \mathbb{E}\left( \sum_{u=1}^{R(k)} \sum_{v=1}^{G_{k,u}} p_{k,u,v} \right) + \mathbb{E}\left( \Big( \sum_{u=1}^{R(k)} \sum_{v=1}^{G_{k,u}} p_{k,u,v} \Big)^2 \right) \\
&\stackrel{(a)}{\lesssim} \mathbb{E}\big(R(k)\big) + \big(\mathbb{E}(R(k))\big)^2 + \mathbb{E}\big(R(k)\big) + \mathbb{E}\big(R(k)^2\big) \\
&\lesssim \frac{b^2(n)}{n^2}, \qquad (64)
\end{aligned}$$

where step (a) follows from Lemma 13 in Appendix B, and the fact that P(R(k) > 0) ≤ E(R(k)). With Eqs. (63) and (64), we have completed the proof of Proposition 2.
A.6. Proof of Proposition 3

Proof. (Proposition 3) We begin by stating a technical lemma which formalizes the obvious fact that, on a per-sample-path basis, delays are monotone with respect to the service times. The proof is elementary and omitted.

Lemma 12. Consider an initially empty first-in-first-out (FIFO) queue with a single server. Let {T(k)}_{k≥1} be a sequence of arrival times, and let {S(k)}_{k≥1} and {S̃(k)}_{k≥1} be sequences of service times, all defined on a common probability space. Denote by {W(k)}_{k≥1} and {W̃(k)}_{k≥1} the resulting waiting times induced by ({T(k)}, {S(k)}) and ({T(k)}, {S̃(k)}), respectively. If P(S̃(k) ≥ S(k), ∀k ∈ ℕ) = 1, then

$$\mathbb{P}\big(\tilde W(k) \ge W(k),\ \forall k \in \mathbb{N}\big) = 1, \qquad (65)$$

which also implies that

$$\mathbb{E}\big(\tilde W(k)\big) \ge \mathbb{E}\big(W(k)\big), \quad \forall k \ge 1. \qquad (66)$$
Fix i ∈ I. By Lemma 12 and Proposition 2, E(W_i^R) is upper bounded by the steady-state expected waiting time in a GI/GI/1 queue with arrival times {A(k)} and service times {S̃_i^R(k)}. Using Kingman's formula, Lemma 2, and Proposition 2, we have $\mathbb{E}(W_i^R) \le \tilde\lambda\, \frac{\sigma_a^2 + \sigma_s^2}{2(1 - \tilde\rho)}$, where

$$\begin{aligned}
\tilde\lambda &= \frac{1}{\mathbb{E}(A(k))} \sim \frac{n}{b(n)}, & \sigma_a^2 &= \operatorname{Var}\big(A(k)\big) \lesssim \frac{b(n)}{n^2}, \qquad (67) \\
\tilde\rho &= \frac{\mathbb{E}\big(\tilde S^R_1(k)\big)}{\mathbb{E}(A(k))} \lesssim \frac{b^2(n)/n^2}{b(n)/n} = \frac{b(n)}{n}, & \sigma_s^2 &= \operatorname{Var}\big(\tilde S^R_1(k)\big) \lesssim \frac{b^2(n)}{n^2}, \qquad (68)
\end{aligned}$$

which yield

$$\mathbb{E}\big(W^R_i\big) \lesssim \frac{n}{b(n)} \left( \frac{b(n)}{n^2} + \frac{b^2(n)}{n^2} \right) \lesssim \frac{b(n)}{n}. \qquad (69)$$

In particular, if b(n) = K n/y(n), then E(W_i^R) ≲ 1/y(n), ∀i ∈ I. It is also clear that the scaling in the above equation holds uniformly over all i ∈ I. This completes the proof.
A.7. Proof of Eq. (5)

We show in this section that the constant in Eq. (3) can be further expanded to incorporate the traffic intensity, ρ. In particular, there exists a constant K′, independent of ρ, such that

$$K \le \frac{K'}{1 - \rho}. \qquad (70)$$

To see why this is true, we go back to Propositions 1 and 3, which characterize the expected delays in the Matching queue and the Residual queues, respectively. It is easy to verify, from the proof of Proposition 3, that the bounds on the delays in the Residual queues can be made independent of ρ (this is intuitively consistent with the fact that the Residual queues are used only to absorb the "discrepancies"). By Proposition 1, the expected delay in the Matching queue does depend on ρ, but only through the value of the effective traffic intensity, ρ̃. In particular, by Eq. (21), we have

$$\tilde\rho \le \frac{\rho}{\rho + \epsilon} \le \frac{1}{1 + \epsilon}, \quad \forall \rho \in (0, 1). \qquad (71)$$

Setting ε = (1 − ρ)/2, we have

$$\frac{1}{1 - \tilde\rho} \le \frac{1}{1 - \frac{1}{1 + \frac{1}{2}(1 - \rho)}} = 1 + \frac{2}{1 - \rho} \lesssim \frac{1}{1 - \rho}. \qquad (72)$$

Eq. (72), combined with Kingman's formula, proves our claim for the case of uniform traffic rates (λ_{n,i} = ρ/r for all i). The validity of Eq. (5) in the general case (λ satisfying Condition 1) can be established by an identical argument.
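The algebra behind Eq. (72) can be sanity-checked numerically (a quick sketch of our own, not part of the proof; the sample values of ρ are arbitrary):

```python
# With eps = (1 - rho)/2, the middle expression in Eq. (72) simplifies to
# 1 + 2/(1 - rho), which is bounded above by 3/(1 - rho) for all rho in (0, 1).
checks = []
for rho in [0.1, 0.5, 0.9, 0.99]:
    eps = (1 - rho) / 2
    lhs = 1 / (1 - 1 / (1 + eps))
    checks.append(abs(lhs - (1 + 2 / (1 - rho))) < 1e-9 and lhs <= 3 / (1 - rho))
print(all(checks))  # True
```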
A.8. Proof of Eq. (7)

Define

$$f_{n,K'}(g_n, \lambda_n) = \mathbb{I}\left( \mathbb{E}_{\pi_n}(W \mid g_n, \lambda_n) \le K'\, \frac{u^2(n) \ln n}{d(n)} \right), \qquad (73)$$

where π_n is the scheduling policy given in Theorem 1. Let G_n be an (rn, n, d(n)/n) random bipartite graph. By the first two claims of Theorem 1, there exists K > 0 such that

$$\mathbb{P}_{n, \frac{d(n)}{n}}\big(f_{n,K}(G_n, \lambda_n) = 1\big) \ge 1 - \delta_n, \qquad (74)$$

for all n and λ_n ∈ Λ_n, where δ_n ↓ 0 and does not depend on λ_n. Now let λ_n be distributed according to μ_n, and G_n according to P_{n, d(n)/n}. We have, by Eq. (74), that

$$\mathbb{P}\big(f_{n,K}(G_n, \lambda_n) = 1\big) = \int_{\lambda_n} \mathbb{P}_{n, \frac{d(n)}{n}}\big(f_{n,K}(G_n, \lambda_n) = 1\big)\, d\mu_n \ge \int_{\lambda_n} (1 - \delta_n)\, d\mu_n = 1 - \delta_n. \qquad (75)$$

This proves Eq. (7).
Appendix B: Technical Lemmas

Lemma 13. Let X, Y and Z be random variables taking values in Z₊. Let the random variables X, {Y_i}_{i≥1} and {Z_{i,j}}_{i,j≥1} be drawn independently according to the distributions of X, Y and Z, respectively. Define

$$W = \sum_{i=1}^{X} \sum_{j=1}^{Y_i} Z_{i,j}.$$

We have

$$\mathbb{E}(W) = \mu_X \mu_Y \mu_Z, \qquad \mathbb{E}\big(W^2\big) \le \mu_X \mu_Y s_Z + \mu_X s_Y \mu_Z^2 + s_X \mu_Y^2 \mu_Z^2,$$

where μ_X = E(X), s_X = E(X²), and similarly for Y and Z.
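The moment formulas in Lemma 13 can be spot-checked by simulation. In the sketch below (our own illustration; the uniform distributions chosen for X, Y and Z are arbitrary), the empirical first moment matches μ_X μ_Y μ_Z = 3 exactly as predicted by a double application of Wald's identity, and the empirical second moment respects the stated upper bound (which evaluates to 28 for these distributions):

```python
import random

def sample_W(rng):
    """One draw of W = sum_{i=1}^X sum_{j=1}^{Y_i} Z_{i,j}, with X, the Y_i's
    and the Z_{i,j}'s independent: X ~ Unif{0..4}, Y ~ Unif{0..2}, Z ~ Unif{0..3}."""
    total = 0
    for _ in range(rng.randint(0, 4)):                       # X
        total += sum(rng.randint(0, 3)                       # Z_{i,j}
                     for _ in range(rng.randint(0, 2)))      # Y_i
    return total

rng = random.Random(4)
N = 200_000
ws = [sample_W(rng) for _ in range(N)]
m1 = sum(ws) / N
m2 = sum(w * w for w in ws) / N

muX, sX = 2.0, 6.0        # E(X), E(X^2) for Unif{0..4}
muY, sY = 1.0, 5.0 / 3.0  # Unif{0..2}
muZ, sZ = 1.5, 3.5        # Unif{0..3}
print(abs(m1 - muX * muY * muZ) < 0.05)                                       # True
print(m2 <= muX * muY * sZ + muX * sY * muZ ** 2 + sX * muY ** 2 * muZ ** 2)  # True
```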
Lemma 14. Let {X_i} be a sequence of i.i.d. random variables with E(X) = μ and E(X²) = s. Then

$$\mathbb{E}\left( \Big( \sum_{i=1}^{n} X_i \Big)^2 \right) = n s + n(n-1)\, \mu^2. \qquad (76)$$

In the special case where X_i =d Expo(λ), we have

$$\mathbb{E}\left( \Big( \sum_{i=1}^{n} X_i \Big)^2 \right) = \frac{2}{\lambda^2}\, n + \frac{1}{\lambda^2}\, n(n-1) = \frac{n(n+1)}{\lambda^2}. \qquad (77)$$
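Eq. (77) can likewise be spot-checked by Monte Carlo (our own sketch; the values λ = 2 and n = 10 are arbitrary, for which n(n+1)/λ² = 27.5):

```python
import random

rng = random.Random(5)
lam, n, trials = 2.0, 10, 100_000
acc = 0.0
for _ in range(trials):
    s = sum(rng.expovariate(lam) for _ in range(n))  # sum of n Expo(lam) draws
    acc += s * s

estimate = acc / trials
exact = n * (n + 1) / lam ** 2  # Eq. (77)
print(estimate, exact)          # estimate is close to 27.5
```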