PERFORMANCE ANALYSIS
OF COMMUNICATIONS
NETWORKS AND SYSTEMS
PIET VAN MIEGHEM
Delft University of Technology
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521855150
© Cambridge University Press 2006
This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published in print format 2006
ISBN-13 978-0-511-16917-5 eBook (NetLibrary)
ISBN-10 0-511-16917-5 eBook (NetLibrary)
ISBN-13 978-0-521-85515-0 hardback
ISBN-10 0-521-85515-2 hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Waar een wil is, is een weg.
to my father
to my wife Saskia
and my sons Vincent, Nathan and Laurens
Contents
Preface
1 Introduction

Part I Probability theory
2 Random variables
2.1 Probability theory and set theory
2.2 Discrete random variables
2.3 Continuous random variables
2.4 The conditional probability
2.5 Several random variables and independence
2.6 Conditional expectation
3 Basic distributions
3.1 Discrete random variables
3.2 Continuous random variables
3.3 Derived distributions
3.4 Functions of random variables
3.5 Examples of other distributions
3.6 Summary tables of probability distributions
3.7 Problems
4 Correlation
4.1 Generation of correlated Gaussian random variables
4.2 Generation of correlated random variables
4.3 The non-linear transformation method
4.4 Examples of the non-linear transformation method
4.5 Linear combination of independent auxiliary random variables
4.6 Problem
5 Inequalities
5.1 The minimum (maximum) and infimum (supremum)
5.2 Continuous convex functions
5.3 Inequalities deduced from the Mean Value Theorem
5.4 The Markov and Chebyshev inequalities
5.5 The Hölder, Minkowski and Young inequalities
5.6 The Gauss inequality
5.7 The dominant pole approximation and large deviations
6 Limit laws
6.1 General theorems from analysis
6.2 Law of Large Numbers
6.3 Central Limit Theorem
6.4 Extremal distributions

Part II Stochastic processes
7 The Poisson process
7.1 A stochastic process
7.2 The Poisson process
7.3 Properties of the Poisson process
7.4 The nonhomogeneous Poisson process
7.5 The failure rate function
7.6 Problems
8 Renewal theory
8.1 Basic notions
8.2 Limit theorems
8.3 The residual waiting time
8.4 The renewal reward process
8.5 Problems
9 Discrete-time Markov chains
9.1 Definition
9.2 Discrete-time Markov chain
9.3 The steady-state of a Markov chain
9.4 Problems
10 Continuous-time Markov chains
10.1 Definition
10.2 Properties of continuous-time Markov processes
10.3 Steady-state
10.4 The embedded Markov chain
10.5 The transitions in a continuous-time Markov chain
10.6 Example: the two-state Markov chain in continuous-time
10.7 Time reversibility
10.8 Problems
11 Applications of Markov chains
11.1 Discrete Markov chains and independent random variables
11.2 The general random walk
11.3 Birth and death process
11.4 A random walk on a graph
11.5 Slotted Aloha
11.6 Ranking of webpages
11.7 Problems
12 Branching processes
12.1 The probability generating function
12.2 The limit $W$ of the scaled random variables $W_k$
12.3 The Probability of Extinction of a Branching Process
12.4 Asymptotic behavior of $W$
12.5 A geometric branching process
13 General queueing theory
13.1 A queueing system
13.2 The waiting process: Lindley’s approach
13.3 The Beneš approach to the unfinished work
13.4 The counting process
13.5 PASTA
13.6 Little’s Law
14 Queueing models
14.1 The M/M/1 queue
14.2 Variants of the M/M/1 queue
14.3 The M/G/1 queue
14.4 The GI/D/m queue
14.5 The M/D/1/K queue
14.6 The N*D/D/1 queue
14.7 The AMS queue
14.8 The cell loss ratio
14.9 Problems

Part III Physics of networks
15 General characteristics of graphs
15.1 Introduction
15.2 The number of paths with $j$ hops
15.3 The degree of a node in a graph
15.4 Connectivity and robustness
15.5 Graph metrics
15.6 Random graphs
15.7 The hopcount in a large, sparse graph with unit link weights
15.8 Problems
16 The Shortest Path Problem
16.1 The shortest path and the link weight structure
16.2 The shortest path tree in $K_N$ with exponential link weights
16.3 The hopcount $h_N$ in the URT
16.4 The weight of the shortest path
16.5 The flooding time $T_N$
16.6 The degree of a node in the URT
16.7 The minimum spanning tree
16.8 The proof of the degree Theorem 16.6.1 of the URT
16.9 Problems
17 The efficiency of multicast
17.1 General results for $g_N(m)$
17.2 The random graph $G_p(N)$
17.3 The $k$-ary tree
17.4 The Chuang-Sirbu law
17.5 Stability of a multicast shortest path tree
17.6 Proof of (17.16): $g_N(m)$ for random graphs
17.7 Proof of Theorem 17.3.1: $g_N(m)$ for $k$-ary trees
17.8 Problem
18 The hopcount to an anycast group
18.1 Introduction
18.2 General analysis
18.3 The $k$-ary tree
18.4 The uniform recursive tree (URT)
18.5 Approximate analysis
18.6 The performance measure in exponentially growing trees

Appendix A Stochastic matrices
Appendix B Algebraic graph theory
Appendix C Solutions of problems
Bibliography
Index
Preface
Performance analysis belongs to the domain of applied mathematics. The
major domain of application in this book concerns telecommunications systems and networks. We will mainly use stochastic analysis and probability
theory to address problems in the performance evaluation of telecommunications systems and networks. The first chapter will provide a motivation
and a statement of several problems.
This book aims to present methods rigorously, hence mathematically, with
minimal resorting to intuition. It is my belief that intuition is often gained
after the result is known and rarely before the problem is solved, unless the
problem is simple. Techniques and terminologies of axiomatic probability
(such as definitions of probability spaces, filtration, measures, etc.) have
been omitted and a more direct, less abstract approach has been adopted.
In addition, most of the important formulas are interpreted in the sense of
“What does this mathematical expression teach me?” This last step justifies
the word “applied”, since most mathematical treatises avoid interpretation
because it risks being imprecise and incomplete.
The field of stochastic processes is much too large to be covered in a single
book and only a selected number of topics has been chosen. Most of the topics are considered as classical. Perhaps the largest omission is a treatment
of Brownian processes and the many related applications. A weak excuse
for this omission (besides the considerable mathematical complexity) is that
Brownian theory applies more to physics (analogue fields) than to system
theory (discrete components). The list of omissions is rather long and only
the most noteworthy are summarized: recent concepts such as martingales
and the coupling theory of stochastic variables, queueing networks, scheduling rules, and the theory of long-range dependent random variables that currently prevails in the Internet. The confinement to stochastic analysis also
excludes the recent new framework, called Network Calculus by Le Boudec
and Thiran (2001). Network calculus is based on min-plus algebra and has
been applied to (Inter)network problems in a deterministic setting.
As prerequisites, familiarity with elementary probability and the knowledge of the theory of functions of a complex variable are assumed. Parts in
the text in small font refer to more advanced topics or to computations that
can be skipped at first reading. Part I (Chapters 2-6) reviews probability
theory and it is included to make the remainder self-contained. The book
essentially starts with Chapter 7 (Part II) on Poisson processes. The Poisson process (independent increments and discontinuous sample paths) and
Brownian motion (independent increments but continuous sample paths)
are considered to be the most important basic stochastic processes. We
briefly touch upon renewal theory to move to Markov processes. The theory
of Markov processes is regarded as a foundation for many applications in
telecommunications systems, in particular queueing theory. A large part
of the book is devoted to Markov processes and their applications. The
last chapters of Part II dive into queueing theory. Inspired by intriguing
problems in telephony at the beginning of the twentieth century, Erlang
brought queueing theory onto the scientific scene. Since his investigations, queueing theory has grown considerably. Especially during the last
decade with the advent of the Asynchronous Transfer Mode (ATM) and the
worldwide Internet, many early ideas have been refined (e.g. discrete-time
queueing theory, large deviation theory, scheduling control of prioritized
flows of packets) and new concepts (self-similar or fractal processes) have
been proposed. Part III covers current research on the physics of networks.
This Part III is undoubtedly the least mature and complete. In contrast to
most books, I have chosen to include the solutions to the problems in an
Appendix to support self-study.
I am grateful to colleagues and students whose input has greatly improved
this text. Fernando Kuipers and Stijn van Langen have corrected a large
number of misprints. Together with Fernando, Milena Janic and Almerima Jamakovic have supplied me with exercises. Gerard Hooghiemstra has
made valuable comments and was always available for discussions about
my viewpoints. Bart Steyaert eagerly gave the finer details of the generating function approach to the GI/D/m queue. Jan Van Mieghem has given
overall comments and suggestions besides his input on the computation of
correlations. Finally, I thank David Hemsley for his scrupulous corrections
in the original manuscript.
Although this book is intended to be of practical use, in the course of
writing it, I became more and more persuaded that mathematical rigor has
ample virtues of its own.
Per aspera ad astra
January 2006
Piet Van Mieghem
1
Introduction
The aim of this first chapter is to motivate why stochastic processes and
probability theory are useful to solve problems in the domain of telecommunications systems and networks.
In any system, or for any transmission of information, there is always a
non-zero probability of failure or of error penetration. A lot of problems in
quantifying the failure rate, bit error rate or the computation of redundancy
to recover from hazards are successfully treated by probability theory. Often
we deal in communications with a large variety of signals, calls, source-destination pairs, messages, the number of customers per region, and so on.
And, most often, precise information at any time is not available or, if it
is available, deterministic studies or simulations are simply not feasible due
to the large number of different parameters involved. For such problems, a
stochastic approach is often a powerful vehicle, as has been demonstrated
in the field of physics.
Perhaps the first impressive result of a stochastic approach was Boltzmann’s and Maxwell’s statistical theory. They studied the behavior of particles in an ideal gas and described how macroscopic quantities such as pressure and
temperature can be related to the microscopic motion of the huge number
of individual particles. Boltzmann also introduced the stochastic notion of
the thermodynamic concept of entropy $S$,
$$S = k \log W$$
where $W$ denotes the total number of ways in which the ensembles of particles can be distributed in thermal equilibrium and where $k$ is a proportionality factor, afterwards attributed to Boltzmann as the Boltzmann constant.
The pioneering work of these early physicists such as Boltzmann, Maxwell
and others was the germ of a large number of breakthroughs in science.
Shortly after their introduction of stochastic theory in classical physics, the
theory of quantum mechanics (see e.g. Cohen-Tannoudji et al., 1977) was
established. This theory proposes that the elementary building blocks of
nature, the atom and electrons, can only be described in a probabilistic
sense. The conceptually difficult notion of a wave function whose squared
modulus expresses the probability that a set of particles is in a certain state
and Heisenberg’s uncertainty relation exclude in a dramatic way our
deterministic, macroscopic view on nature at the fine atomic scale.
At about the same time as the theory of quantum mechanics was being
created, Erlang applied probability theory to the field of telecommunications. Erlang succeeded in determining the number of telephone input lines
$m$ of a switch needed to serve $N_S$ customers with a certain probability $p$.
Perhaps his most used formula is the Erlang B formula (14.17), derived in
Section 14.2.2,
$$\Pr[N_S = m] = \frac{\rho^m / m!}{\sum_{j=0}^{m} \rho^j / j!}$$
where the load or traffic intensity $\rho$ is the ratio of the arrival rate of calls to
the telephone local exchange or switch over the processing rate of the switch
per line. By equating the desired blocking probability $p = \Pr[N_S = m]$, say
$p = 10^{-4}$, the number of input lines $m$ can be computed for each load $\rho$.
Due to its importance, books with tables relating $p$, $\rho$ and $m$ were published.
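As an illustration of how such a table entry can be obtained, the following minimal Python sketch evaluates the Erlang B formula directly and searches for the smallest number of lines $m$ that meets a target blocking probability. The function names `erlang_b` and `lines_needed` are chosen here for illustration only; the direct evaluation with factorials is adequate for moderate $m$ but is not meant as a numerically robust implementation.

```python
from math import factorial


def erlang_b(rho: float, m: int) -> float:
    """Blocking probability Pr[N_S = m] for load rho and m input lines."""
    numerator = rho**m / factorial(m)
    denominator = sum(rho**j / factorial(j) for j in range(m + 1))
    return numerator / denominator


def lines_needed(rho: float, p: float) -> int:
    """Smallest number of lines m with blocking probability at most p."""
    m = 0
    # The blocking probability decreases in m, so a linear search terminates.
    while erlang_b(rho, m) > p:
        m += 1
    return m


if __name__ == "__main__":
    load = 10.0      # illustrative load (assumption, not from the text)
    target = 1e-4    # the blocking probability used as example in the text
    m = lines_needed(load, target)
    print(m, erlang_b(load, m))
```

Repeating the search over a grid of loads reproduces the kind of table relating $p$, $\rho$ and $m$ mentioned above.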
Another pioneer in the field of communications who deserves to be mentioned is Shannon. Shannon explored the concept of entropy $S$. He introduced (see e.g. Walrand, 1998) the notion of the Shannon capacity of a
channel, the maximum rate at which bits can be transmitted with arbitrarily
small (but non-zero) probability of errors, and the concept of the entropy
rate of a source which is the minimum average number of bits per symbol required to encode the output of a source. Many others have extended
his basic ideas and so it is fair to say that Shannon founded the field of
information theory.
A recent important driver in telecommunication is the concept of quality of service (QoS). Customers can use the network to transmit different
types of information such as pictures, files, voice, etc. by requiring a specific level of service depending on the type of transmitted information. For
example, a telephone conversation requires that the voice packets arrive at
the receiver $D$ ms later, while a file transfer is mostly not time critical but
requires an extremely low information loss probability. The value of the
mouth-to-ear delay $D$ is clearly related to the perceived quality of the voice
conversation. As long as $D < 150$ ms, the voice conversation has toll quality, which is, roughly speaking, the quality that we are used to in classical
telephony. When $D$ exceeds 150 ms, rapid degradation is experienced and
when $D > 300$ ms, most of the test persons have great difficulty in understanding the conversation. However, perceived quality may change from
person to person and is difficult to determine, even for telephony. For example, if the test person knows a priori that the conversation is transmitted
over a mobile or wireless channel as in GSM, he or she is willing to tolerate
a lower quality. Therefore, quality of service is both related to the nature
of the information and to the individual desire and perception. In future
Internetworking, it is believed that customers may request a certain QoS
for each type of information. Depending on the level of stringency, the network may either allow or refuse the customer. Since customers will also pay
an amount related to this QoS stringency, the network function that determines to either accept or refuse a call for service will be of crucial interest
to any network operator. Let us now state the connection admission control
(CAC) problem for a voice conversation to illustrate the relation to stochastic analysis: “How many customers $m$ are allowed in order to guarantee that
the ensemble of all voice packets reaches the destination within $D$ ms with
probability $p$?” This problem is exceptionally difficult because it depends on
the voice codecs used, the specifics of the network topology, the capacity of
the individual network elements, the arrival process of calls from the customers, the duration of the conversation and other details. Therefore, we
will simplify the question. Let us first assume that the delay is only caused
by the waiting time of a voice packet in the queue of a router (or switch).
As we will see in Chapter 13, this waiting time $T$ of voice packets in a single
queueing system depends on (a) the arrival process: the way voice packets
arrive, and (b) the service process: how they are processed. Let us assume
that the arrival process specified by the average arrival rate $\lambda$ and the service process specified by the average service rate $\mu$ are known. Clearly, the
arrival rate $\lambda$ is connected to the number of customers $m$. A simplified
statement of the CAC problem is, “What is the maximum $\lambda$ allowed such
that $\Pr[T > D] < \epsilon$?” In essence, the CAC problem consists in computing
the tail probability of a quantity that depends on parameters of interest. We
have elaborated on the CAC problem because it is a basic design problem
that appears under several disguises. A related dimensioning problem is the
determination of the buffer size in a router in order not to lose more than a
certain number of packets with probability $p$, given the arrival and service
process. The above mentioned problem of Erlang is a third example. Another example treated in Chapter 18 is the server placement problem: “How
many replicated servers $m$ are needed to guarantee that any user can access
the information within $k$ hops with probability $\Pr[h_N(m) > k] \le \epsilon$”, where
$\epsilon$ is a certain level of stringency and $h_N(m)$ is the number of hops towards the
nearest of the $m$ servers in a network with $N$ routers.
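To make the simplified CAC question concrete, the sketch below solves it for one particular model that is an assumption of this example and is not stated in the text above: an M/M/1 queue with first-in-first-out service, for which the time $T$ that a packet spends in the system is exponentially distributed with rate $\mu - \lambda$, so that $\Pr[T > D] = e^{-(\mu - \lambda) D}$. The function names and the numerical values are illustrative only.

```python
from math import exp, log


def tail_probability(lam: float, mu: float, delay: float) -> float:
    """Pr[T > delay] for the M/M/1 system time (modeling assumption)."""
    if not 0 <= lam < mu:
        raise ValueError("a stable queue requires 0 <= lam < mu")
    return exp(-(mu - lam) * delay)


def max_arrival_rate(mu: float, delay: float, eps: float) -> float:
    """Arrival rate at which Pr[T > delay] equals eps (0 if unattainable).

    Any arrival rate strictly below the returned value satisfies
    Pr[T > delay] < eps under the M/M/1 assumption.
    """
    return max(0.0, mu + log(eps) / delay)


if __name__ == "__main__":
    mu = 100.0     # service rate in packets per second (illustrative)
    delay = 0.15   # delay bound D in seconds (150 ms, as in the text)
    eps = 1e-4     # stringency level (illustrative)
    lam = max_arrival_rate(mu, delay, eps)
    print(lam, tail_probability(lam, mu, delay))
```

For other arrival or service processes the tail probability has a different form, which is precisely what the queueing chapters address.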
The popularity of the Internet results in a number of new challenges. The
traditional mathematical models such as the Erlang B formula assume “smooth”
traffic flows (small correlation and Markovian in nature). However, TCP/IP
traffic has been shown to be “bursty” (long-range dependent, self-similar and
even chaotic, non-Markovian (Veres and Boda, 2000)). As a consequence,
many traditional dimensioning and control problems ask for a new solution. The self-similar and long-range dependent TCP/IP traffic is mainly
caused by new complex interactions between protocols and technologies (e.g.
TCP/IP/ATM/SDH) and by the transport of information other than voice. It
is observed that the size of content in the Internet varies considerably, causing the “Noah effect”: although immense floods are
extremely rare, their occurrence significantly impacts Internet behavior on
a global scale. Unfortunately, the mathematics to cope with the self-similar
and long-range dependent processes turns out to be fairly complex and beyond the scope of this book.
Finally, we mention the current interest in understanding and modeling
complex networks such as the Internet, biological networks, social networks
and utility infrastructures for water, gas, electricity and transport (cars,
goods, trains). Since these networks consist of a huge number of nodes $N$
and links $L$, classical and algebraic graph theory is often not suited to produce even approximate results. The beginning of probabilistic graph theory
is commonly attributed to the appearance of papers by Erdős and Rényi in
the late 1950s. They investigated a particularly simple growing model for a
graph: start from $N$ nodes and connect in each step an arbitrary random,
not yet connected pair of nodes until all $L$ links are used. After about $N/2$
steps, as shown in Section 16.7.1, they observed the birth of a giant component that, in subsequent steps, swallows the smaller ones at a high rate.
This phenomenon is called a phase transition and often occurs in nature.
In physics it is studied in, for example, percolation theory. To some extent,
the Internet’s graph bears some resemblance to the Erdős-Rényi random
graph. The Internet is best regarded as a dynamic and growing network,
whose graph is continuously changing. Yet, in order to deploy services over
the Internet, an accurate graph model that captures the relevant structural
properties is desirable. As shown in Part III, a probabilistic approach based
on random graphs seems an efficient way to learn about the Internet’s intriguing behavior. Although the Internet’s topology is not a simple Erdős-Rényi random graph, results such as the hopcount of the shortest path and
the size of a multicast tree deduced from the simple random graphs provide
a first order estimate for the Internet. Moreover, analytic formulas based
on other classes of graphs than the simple random graph prove difficult to
obtain. This observation is similar to queueing theory, where, besides the
M/G/x class of queues, hardly any closed-form expressions exist.
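The growing model described above can be explored numerically. The following Python sketch is an illustration written for this purpose, not code from the book: it adds distinct random links one at a time to $N$ isolated nodes and records the size of the largest connected component after each step, using a union-find structure. The function name, the seed and the problem sizes are arbitrary choices.

```python
import random


def largest_component_growth(num_nodes: int, num_links: int, seed: int = 0):
    """Add num_links distinct random links one by one to num_nodes nodes.

    Returns a list whose entry i is the size of the largest connected
    component after link i + 1 has been added.
    """
    if num_nodes < 2 or num_links > num_nodes * (num_nodes - 1) // 2:
        raise ValueError("too many links for this number of nodes")
    rng = random.Random(seed)
    parent = list(range(num_nodes))   # union-find forest
    size = [1] * num_nodes            # component size stored at each root

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    used = set()
    largest = 1
    history = []
    while len(history) < num_links:
        u, v = rng.sample(range(num_nodes), 2)
        link = (min(u, v), max(u, v))
        if link in used:              # only not yet connected pairs count
            continue
        used.add(link)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # merge the smaller into the larger
            if size[root_u] < size[root_v]:
                root_u, root_v = root_v, root_u
            parent[root_v] = root_u
            size[root_u] += size[root_v]
            largest = max(largest, size[root_u])
        history.append(largest)
    return history


if __name__ == "__main__":
    n = 10_000
    history = largest_component_growth(n, 2 * n, seed=1)
    for steps in (n // 4, n // 2, n, 2 * n):
        print(steps, history[steps - 1])
```

Printing the largest component size at a few step counts around $N/2$ gives a feel for how abruptly the giant component emerges.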
We hope that this brief overview sufficiently motivates the reader to surmount the
mathematical barriers. Skill with probability theory is deemed necessary
to understand complex phenomena in telecommunications. Once mastered,
the power and beauty of mathematics will be appreciated.
Part I
Probability theory
2
Random variables
This chapter reviews basic concepts from probability theory. A random variable (rv) is a variable that takes certain values by chance. Throughout this
book, this imprecise and intuitive definition suffices. The precise definition
involves axiomatic probability theory (Billingsley, 1995).
Here, a distinction between discrete and continuous random variables is
made, although a unified approach that also includes mixed cases via the Stieltjes integral (Hardy et al., 1999, pp. 152-157), $\int g(x)\,df(x)$, is possible. In
general, the definition of the distribution $F_X(x) = \Pr[X \le x]$ holds in both cases, and
$$\begin{aligned}
\int g(x)\,dF_X(x) &= \sum_{k} g(k) \Pr[X = k] && \text{where $X$ is a discrete rv} \\
&= \int g(x)\,\frac{dF_X(x)}{dx}\,dx && \text{where $X$ is a continuous rv}
\end{aligned}$$
In most practical situations, the Stieltjes integral reduces to the Riemann
integral; otherwise, Lebesgue’s theory of integration and measure theory (Royden,
1988) is required.
2.1 Probability theory and set theory
Pascal (1623-1662) is commonly regarded as one of the founders of probability theory. In his days, there was much interest in games of chance^1 and
the likelihood of winning a game. In most of these games, there was a finite
number $n$ of possible outcomes and each of them was equally likely. The
probability of the event $A$ of interest was defined as
$$\Pr[A] = \frac{n_A}{n}$$
where $n_A$ is the number of favorable outcomes (sample points of $A$). If the
number of outcomes of an experiment is not finite, this classical definition
of probability does not suffice anymore. In order to establish a coherent and
precise theory, probability theory employs concepts of group or set theory.

^1 “La règle des partis”, a chapter in Pascal’s mathematical work (Pascal, 1954), consists of a
series of letters to Fermat that discuss the following problem (together with a more complex
question that is essentially a variant of the probability of gambler’s ruin treated in Section
11.2.1): Consider the game in which 2 dice are thrown $n$ times. How many times $n$ do we have
to throw the 2 dice to throw double six with probability $p = \frac{1}{2}$?
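The dice problem from Pascal's correspondence can be answered with the classical definition alone. A single throw of two dice has 36 equally likely outcomes of which one is double six, so no double six occurs in $n$ throws with probability $(35/36)^n$, assuming independent throws. The short Python sketch below (the function name is illustrative) finds the smallest $n$ for which the probability of at least one double six reaches a given target, using exact rational arithmetic.

```python
from fractions import Fraction


def throws_needed(target: Fraction = Fraction(1, 2)) -> int:
    """Smallest n with Pr[at least one double six in n throws] >= target."""
    if not 0 <= target < 1:
        raise ValueError("target must lie in [0, 1)")
    no_double_six = Fraction(35, 36)   # classical definition: 35 of 36 outcomes
    none_so_far = Fraction(1)          # Pr[no double six in 0 throws]
    n = 0
    while 1 - none_so_far < target:
        none_so_far *= no_double_six
        n += 1
    return n


if __name__ == "__main__":
    n = throws_needed()
    print(n, float(1 - Fraction(35, 36) ** n))
```

For the target $p = \frac{1}{2}$ the search stops at $n = 25$ throws.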
The set of all possible outcomes of an experiment is called the sample
space $\Omega$. A possible outcome of an experiment is called a sample point $\omega$
that is an element of the sample space $\Omega$. An event $A$ consists of a set of
sample points. An event $A$ is thus a subset of the sample space $\Omega$. The
complement $A^c$ of an event $A$ consists of all sample points of the sample
space $\Omega$ that are not in (the set) $A$, thus $A^c = \Omega \setminus A$. Clearly, $(A^c)^c = A$
and the complement of the sample space is the empty set, $\Omega^c = \emptyset$ or, vice
versa, $\emptyset^c = \Omega$. A family $\mathcal{F}$ of events is a set of events and thus a collection of subsets of the
sample space that possesses particular events as elements. More precisely,
a family $\mathcal{F}$ of events satisfies the three conditions that define a $\sigma$-field^2: (a)
$\emptyset \in \mathcal{F}$, (b) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\cup_{j=1}^{\infty} A_j \in \mathcal{F}$ and (c) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
These conditions guarantee that $\mathcal{F}$ is closed under countable unions and
intersections of events.

^2 A field $\mathcal{F}$ possesses the properties:
(i) $\emptyset \in \mathcal{F}$;
(ii) if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$ and $A \cap B \in \mathcal{F}$;
(iii) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
This definition is redundant. For, we have by (ii) and (iii) that $(A \cup B)^c \in \mathcal{F}$. Further, by De
Morgan’s law $(A \cup B)^c = A^c \cap B^c$, which can be deduced from Figure 2.1 and again by (iii),
the argument shows that the reduced statement (ii), if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$, is sufficient
to also imply that $A \cap B \in \mathcal{F}$.

Events and the probability of these events are connected by a probability
measure $\Pr[\cdot]$ that assigns to each event of the family $\mathcal{F}$ of events of a sample
space $\Omega$ a real number in the interval $[0, 1]$. As Axiom 1, we require that
$\Pr[\Omega] = 1$. If $\Pr[A] = 0$, the occurrence of the event $A$ is not possible, while
$\Pr[A] = 1$ means that the event $A$ is certain to occur. If $\Pr[A] = p$ with
$0 < p < 1$, the event $A$ has probability $p$ to occur.
If the events $A$ and $B$ have no sample points in common, $A \cap B = \emptyset$,
the events $A$ and $B$ are called mutually exclusive events. As an example,
the event and its complement are mutually exclusive because $A \cap A^c = \emptyset$.
Axiom 2 of a probability measure is that for mutually exclusive events $A$
and $B$ holds that $\Pr[A \cup B] = \Pr[A] + \Pr[B]$. The definition of a probability
measure and the two axioms are sufficient to build a consistent framework
on which probability theory is founded. Since $\Pr[\emptyset] = 0$ (which follows from
Axiom 2 because $A \cap \emptyset = \emptyset$ and $A = A \cup \emptyset$), for mutually exclusive events
$A$ and $B$ holds that $\Pr[A \cap B] = 0$.
As a classical example that explains the formal definitions, let us consider the experiment of throwing a fair die. The sample space consists of
all possible outcomes: $\Omega = \{1, 2, 3, 4, 5, 6\}$. A particular outcome of the
experiment, say $\omega = 3$, is a sample point $\omega \in \Omega$. One may be interested in
the event $A$ where the outcome is even, in which case $A = \{2, 4, 6\}$ and
$A^c = \{1, 3, 5\}$.
If $A$ and $B$ are events, the union of these events $A \cup B$ can be written
using set theory as
$$A \cup B = (A \cap B) \cup (A^c \cap B) \cup (A \cap B^c)$$
where $A \cap B$, $A^c \cap B$ and $A \cap B^c$ are mutually exclusive events. The relation
is immediately understood by drawing a Venn diagram as in Fig. 2.1.

[Fig. 2.1. A Venn diagram illustrating the union $A \cup B$, with the regions $A \cap B^c$, $A \cap B$ and $A^c \cap B$ inside the sample space $\Omega$.]

Taking the probability measure of the union yields
$$\begin{aligned}
\Pr[A \cup B] &= \Pr[(A \cap B) \cup (A^c \cap B) \cup (A \cap B^c)] \\
&= \Pr[A \cap B] + \Pr[A^c \cap B] + \Pr[A \cap B^c]
\end{aligned} \qquad (2.1)$$
where the last relation follows from Axiom 2. Figure 2.1 shows that $A =
(A \cap B) \cup (A \cap B^c)$ and $B = (A \cap B) \cup (A^c \cap B)$. Since the events are
mutually exclusive, Axiom 2 states that
$$\Pr[A] = \Pr[A \cap B] + \Pr[A \cap B^c]$$
$$\Pr[B] = \Pr[A \cap B] + \Pr[A^c \cap B]$$
Substitution into (2.1) yields the important relation
$$\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B] \qquad (2.2)$$
Although derived for the measure $\Pr[\cdot]$, relation (2.2) also holds for other
measures, for example, the cardinality (the number of elements) of a set.
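The set relations above are easy to check on the fair die example. In the Python sketch below, events are represented as sets and the probability measure is the classical one (number of sample points in the event divided by the size of the sample space). The second event `B`, "the outcome exceeds 3", is a hypothetical choice made for this illustration and does not appear in the text.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))            # sample space of a fair die


def pr(event: frozenset) -> Fraction:
    """Classical probability measure on subsets of OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


A = frozenset({2, 4, 6})                  # the outcome is even
B = frozenset({4, 5, 6})                  # hypothetical: the outcome exceeds 3
A_c = OMEGA - A                           # complement of A
B_c = OMEGA - B                           # complement of B

# Relation (2.1): the union split into three mutually exclusive parts.
union_by_parts = pr(A & B) + pr(A_c & B) + pr(A & B_c)

# Relation (2.2): Pr[A u B] = Pr[A] + Pr[B] - Pr[A n B].
union_by_formula = pr(A) + pr(B) - pr(A & B)

if __name__ == "__main__":
    print(sorted(A_c), pr(A | B), union_by_parts, union_by_formula)
```

The same check with `len` in place of `pr` illustrates that (2.2) also holds for the cardinality of sets.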
2.1.1 The inclusion-exclusion formula
A generalization of the relation (2.2) is the inclusion-exclusion formula,
$$\begin{aligned}
\Pr\left[\cup_{k=1}^{n} A_k\right] ={}& \sum_{k_1=1}^{n} \Pr[A_{k_1}] - \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2}] \\
&+ \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \sum_{k_3=k_2+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] \\
&+ \cdots + (-1)^{n-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\cap_{j=1}^{n} A_{k_j}\right]
\end{aligned} \qquad (2.3)$$
The formula shows that the probability of the union consists of the sum of
probabilities of the individual events (first term). Since sample points can
belong to more than one event $A_k$, the first term possesses double countings.
The second term removes all probabilities of sample points that belong to
precisely two event sets. However, by doing so (draw a Venn diagram), we
also subtract the probabilities of sample points that belong to three event
sets more than needed. The third term adds these again, and so on. The
inclusion-exclusion formula can be written more compactly as,
$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_j=k_{j-1}+1}^{n} \Pr\left[\cap_{m=1}^{j} A_{k_m}\right] \qquad (2.4)$$
or with
$$S_j = \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \Pr\left[\cap_{m=1}^{j} A_{k_m}\right]$$
as
$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} S_j \qquad (2.5)$$
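The compact form (2.5) translates almost literally into code. The sketch below is a small illustration with names chosen here (`s_j`, `union_probability`), not an implementation from the book: it computes the sums $S_j$ by enumerating all index sets $1 \le k_1 < \cdots < k_j \le n$ and compares the alternating sum with the probability of the union computed directly. The example events (multiples of 2, 3 and 5 among the integers 1 to 30, each integer equally likely) are an assumption made for the illustration.

```python
from fractions import Fraction
from itertools import combinations


def s_j(events, j, pr):
    """S_j of (2.5): sum of Pr over all intersections of j distinct events."""
    total = Fraction(0)
    for subset in combinations(events, j):   # all k_1 < k_2 < ... < k_j
        total += pr(frozenset.intersection(*subset))
    return total


def union_probability(events, pr):
    """Pr of the union of the events via the inclusion-exclusion formula."""
    n = len(events)
    return sum((-1) ** (j - 1) * s_j(events, j, pr) for j in range(1, n + 1))


OMEGA = frozenset(range(1, 31))              # illustrative sample space


def pr(event):
    """Classical probability: every integer in OMEGA is equally likely."""
    return Fraction(len(event), len(OMEGA))


EVENTS = [frozenset(x for x in OMEGA if x % d == 0) for d in (2, 3, 5)]

if __name__ == "__main__":
    direct = pr(frozenset().union(*EVENTS))
    print(union_probability(EVENTS, pr), direct)
```

The enumeration visits $2^n - 1$ intersections, so this direct evaluation is only practical for a small number of events.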
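Proof of the inclusion-exclusion formula^3: Let $A = \cup_{k=1}^{n-1} A_k$ and $B = A_n$ such that
$A \cup B = \cup_{k=1}^{n} A_k$ and $A \cap B = A_n \cap \left(\cup_{k=1}^{n-1} A_k\right) = \cup_{k=1}^{n-1} (A_k \cap A_n)$ by the distributive law in set
theory, then application of (2.2) yields the recursion in $n$
$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \Pr\left[\cup_{k=1}^{n-1} A_k\right] + \Pr[A_n] - \Pr\left[\cup_{k=1}^{n-1} (A_k \cap A_n)\right] \qquad (2.6)$$
By direct substitution of $n \to n-1$, we have
$$\Pr\left[\cup_{k=1}^{n-1} A_k\right] = \Pr\left[\cup_{k=1}^{n-2} A_k\right] + \Pr[A_{n-1}] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_{n-1})\right]$$
while substitution in this formula of $A_k \to A_k \cap A_n$ gives
$$\Pr\left[\cup_{k=1}^{n-1} (A_k \cap A_n)\right] = \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n)\right] + \Pr[A_{n-1} \cap A_n] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right]$$
Substitution of the last two relations into (2.6) yields
$$\begin{aligned}
\Pr\left[\cup_{k=1}^{n} A_k\right] ={}& \Pr[A_{n-1}] + \Pr[A_n] - \Pr[A_{n-1} \cap A_n] + \Pr\left[\cup_{k=1}^{n-2} A_k\right] \\
&- \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_{n-1})\right] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n)\right] + \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right]
\end{aligned} \qquad (2.7)$$
Similarly, in a next iteration we use (2.6) after suitable modification in the right-hand side of (2.7)
to lower the upper index in the union,
$$\begin{aligned}
\Pr\left[\cup_{k=1}^{n-2} A_k\right] ={}& \Pr\left[\cup_{k=1}^{n-3} A_k\right] + \Pr[A_{n-2}] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-2})\right] \\
\Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_{n-1})\right] ={}& \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1})\right] + \Pr[A_{n-2} \cap A_{n-1}] \\
&- \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1} \cap A_{n-2})\right] \\
\Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n)\right] ={}& \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n)\right] + \Pr[A_{n-2} \cap A_n] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-2})\right] \\
\Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right] ={}& \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1})\right] + \Pr[A_{n-2} \cap A_n \cap A_{n-1}] \\
&- \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1} \cap A_{n-2})\right]
\end{aligned}$$
The result is
$$\begin{aligned}
\Pr\left[\cup_{k=1}^{n} A_k\right] ={}& \Pr[A_{n-2}] + \Pr[A_{n-1}] + \Pr[A_n] - \Pr[A_{n-2} \cap A_{n-1}] - \Pr[A_{n-2} \cap A_n] \\
&- \Pr[A_{n-1} \cap A_n] + \Pr[A_{n-2} \cap A_{n-1} \cap A_n] + \Pr\left[\cup_{k=1}^{n-3} A_k\right] \\
&- \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-2})\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1})\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n)\right] \\
&+ \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1} \cap A_{n-2})\right] + \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-2})\right] \\
&+ \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1})\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1} \cap A_{n-2})\right]
\end{aligned}$$
which starts revealing the structure of (2.3). Rather than continuing the iterations, we prove the
validity of the inclusion-exclusion formula (2.3) via induction. In case $n = 2$, the basic expression
(2.2) is found. Assume that (2.3) holds for $n$, then the case for $n + 1$ must obey (2.6) where
$n \to n + 1$,
$$\Pr\left[\cup_{k=1}^{n+1} A_k\right] = \Pr\left[\cup_{k=1}^{n} A_k\right] + \Pr[A_{n+1}] - \Pr\left[\cup_{k=1}^{n} (A_k \cap A_{n+1})\right]$$
Substitution of (2.3) into the above expression yields, after suitable grouping of the terms,
$$\begin{aligned}
\Pr\left[\cup_{k=1}^{n+1} A_k\right] ={}& \Pr[A_{n+1}] + \sum_{k_1=1}^{n} \Pr[A_{k_1}] - \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2}] - \sum_{k_1=1}^{n} \Pr[A_{k_1} \cap A_{n+1}] \\
&+ \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \sum_{k_3=k_2+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] + \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{n+1}] \\
&+ \cdots + (-1)^{n-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\cap_{j=1}^{n} A_{k_j}\right] \\
&+ \cdots + (-1)^{n} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\cap_{j=1}^{n} A_{k_j} \cap A_{n+1}\right] \\
={}& \sum_{k_1=1}^{n+1} \Pr[A_{k_1}] - \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \Pr[A_{k_1} \cap A_{k_2}] \\
&+ \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \sum_{k_3=k_2+1}^{n+1} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] \\
&+ \cdots + (-1)^{n} \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \cdots \sum_{k_{n+1}=k_n+1}^{n+1} \Pr\left[\cap_{j=1}^{n+1} A_{k_j}\right]
\end{aligned}$$
which proves (2.3). □

^3 Another proof (Grimmett and Stirzaker, 2001, p. 56) uses the indicator function defined in
Section 2.2.1. Useful indicator function relations are
$$\begin{aligned}
1_{A \cap B} &= 1_A 1_B \\
1_{A^c} &= 1 - 1_A \\
1_{A \cup B} &= 1 - 1_{(A \cup B)^c} = 1 - 1_{A^c \cap B^c} = 1 - 1_{A^c} 1_{B^c} \\
&= 1 - (1 - 1_A)(1 - 1_B) = 1_A + 1_B - 1_A 1_B = 1_A + 1_B - 1_{A \cap B}
\end{aligned}$$
Generalizing the last relation yields
$$1_{\cup_{k=1}^{n} A_k} = 1 - \prod_{k=1}^{n} (1 - 1_{A_k})$$
Multiplying out and taking the expectations using (2.13) leads to (2.3).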
Although it looks imposing, the inclusion-exclusion formula is useful when dealing with dependent random variables because of its general nature. In particular, if $\Pr\left[\bigcap_{m=1}^{j} A_{k_m}\right] = a_j$ is not a function of the specific indices $k_m$, the inclusion-exclusion formula (2.4) becomes more attractive,
\[
\Pr\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} a_j \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} 1 = \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} a_j
\]
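An application of the latter formula to multicast can be found in Chapter 17 and many others are in Feller (1970, Chapter IV). Sometimes it is useful to reason with the complement of the union $\left(\bigcup_{k=1}^{n} A_k\right)^c = \Omega \setminus \bigcup_{k=1}^{n} A_k = \bigcap_{k=1}^{n} A_k^c$. Applying Axiom 2 to $\left(\bigcup_{k=1}^{n} A_k\right)^c \cup \left(\bigcup_{k=1}^{n} A_k\right) = \Omega$,
\[
\Pr\left[\left(\bigcup_{k=1}^{n} A_k\right)^c\right] = \Pr[\Omega] - \Pr\left[\bigcup_{k=1}^{n} A_k\right]
\]
and using Axiom 1 and the inclusion-exclusion formula (2.5), we obtain
\[
\Pr\left[\left(\bigcup_{k=1}^{n} A_k\right)^c\right] = 1 - \sum_{j=1}^{n} (-1)^{j-1} S_j = \sum_{j=0}^{n} (-1)^{j} S_j
\tag{2.8}
\]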
with the convention that $S_0 = 1$. Boole's inequalities
\[
\Pr\left[\bigcup_{k=1}^{n} A_k\right] \le \sum_{k=1}^{n} \Pr[A_k]
\tag{2.9}
\]
\[
\Pr\left[\bigcap_{k=1}^{n} A_k\right] \ge 1 - \sum_{k=1}^{n} \Pr[A_k^c]
\]
are derived as consequences of the inclusion-exclusion formula (2.3). The equality sign in (2.9) holds only if all events are mutually exclusive, whilst the inequality sign follows from the fact that possible overlaps in events are, in contrast to the inclusion-exclusion formula (2.3), not subtracted.
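The inclusion-exclusion formula and Boole's inequality (2.9) can be checked numerically on a small example. The sketch below assumes a uniform sample space of 12 equally likely outcomes and three hypothetical events chosen only for illustration; it sums the alternating terms $S_j$ over all $j$-fold intersections and compares the result with the directly counted probability of the union.

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, omega_size):
    """Pr[A_1 u ... u A_n] by inclusion-exclusion on a uniform sample space."""
    n = len(events)
    total = Fraction(0)
    for j in range(1, n + 1):
        # S_j: sum of the probabilities of all j-fold intersections
        s_j = sum(
            Fraction(len(set.intersection(*subset)), omega_size)
            for subset in combinations(events, j)
        )
        total += (-1) ** (j - 1) * s_j
    return total


# Hypothetical events on a sample space with 12 equally likely outcomes
omega_size = 12
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {6, 7, 8}]

p_union = union_probability(events, omega_size)
p_direct = Fraction(len(set.union(*events)), omega_size)
boole_bound = sum(Fraction(len(a), omega_size) for a in events)
```

Here the overlaps $\{3, 4\}$ and $\{6\}$ are subtracted by the formula, whereas the Boole bound counts them twice and is therefore strictly larger than the probability of the union.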
The inclusion-exclusion formula is of a more general nature and also applies to other measures on sets than $\Pr[\cdot]$, for example to the cardinality as mentioned above. For the cardinality of a set $A$, which is usually denoted by $|A|$, the inclusion-exclusion variant of (2.8) is
\[
\left|\left(\bigcup_{k=1}^{n} A_k\right)^c\right| = \sum_{j=0}^{n} (-1)^{j} |S_j|
\tag{2.10}
\]
where the total number of elements in the sample space is $|S_0| = N$ and
\[
|S_j| = \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \left|\bigcap_{m=1}^{j} A_{k_m}\right|
\]
A nice illustration of the above formula (2.10) applies to the sieve of Eratosthenes (Hardy and Wright, 1968, p. 4), a procedure to construct the table of prime numbers$^{4}$ up to $N$. Consider the increasing sequence of integers
\[
\Omega = \{2, 3, 4, \ldots, N\}
\]
and remove successively all multiples of 2 (even numbers starting from 4, 6, ...), all multiples of 3 (starting from $3^2$ and not yet removed previously), all multiples of 5, all multiples of the next number larger than 5 and still in the list (which is the prime 7) and so on, up to all multiples of the largest possible prime divisor that is equal to or smaller than $\left[\sqrt{N}\right]$. Here $[x]$ is the largest integer smaller than or equal to $x$. The remaining numbers in the list are prime numbers. Let us now compute the number of primes $\pi(N)$ smaller than or equal to $N$ by using the inclusion-exclusion formula (2.10).

4 An integer number $p$ is prime if $p > 1$ and $p$ has no other integer divisors than 1 and itself $p$. The sequence of the first primes is 2, 3, 5, 7, 11, 13, etc. If $a$ and $b$ are divisors of $n$, then $n = ab$ from which it follows that $a$ and $b$ cannot both exceed $\sqrt{n}$. Hence, any composite number $n$ is divisible by a prime $p$ that does not exceed $\sqrt{n}$.
The number of primes smaller than or equal to a real number $x$ is $\pi(x)$ and, evidently, if $p_n$ denotes the $n$-th prime, then $\pi(p_n) = n$. Let $A_k$ denote the set of the multiples of the $k$-th prime $p_k$ that belong to $\Omega$. The number of such sets $A_k$ in the sieve of Eratosthenes is equal to the index $n$ of the largest prime number $p_n$ smaller than or equal to $\left[\sqrt{N}\right]$, hence, $n = \pi\left(\sqrt{N}\right)$. If $q \in \left(\bigcup_{k=1}^{n} A_k\right)^c$, this means that $q$ is not divisible by any prime number smaller than or equal to $p_n$ and that $q$ is a prime number lying between $\sqrt{N} < q \le N$. The cardinality of the set $\left(\bigcup_{k=1}^{n} A_k\right)^c$, the number of primes between $\sqrt{N} < q \le N$, is
\[
\left|\left(\bigcup_{k=1}^{n} A_k\right)^c\right| = \pi(N) - \pi\left(\sqrt{N}\right)
\]
On the other hand, if $r \in \bigcap_{m=1}^{j} A_{k_m}$ for $1 \le k_1 < k_2 < \cdots < k_j \le n$, then $r$ is a multiple of $p_{k_1} p_{k_2} \cdots p_{k_j}$ and the number of multiples of the integer $p_{k_1} p_{k_2} \cdots p_{k_j}$ in $\Omega$ is
\[
\left[\frac{N}{p_{k_1} p_{k_2} \cdots p_{k_j}}\right] = \left|\bigcap_{m=1}^{j} A_{k_m}\right|
\]
Applying the inclusion-exclusion formula (2.10) with $|\Omega| = S_0 = N - 1$ and $n = \pi\left(\sqrt{N}\right)$ gives
\[
\pi(N) - \pi\left(\sqrt{N}\right) = N - 1 + \sum_{j=1}^{n} (-1)^{j} \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \left[\frac{N}{p_{k_1} p_{k_2} \cdots p_{k_j}}\right]
\]
The knowledge of the prime numbers smaller than or equal to $\left[\sqrt{N}\right]$, i.e. the first $n = \pi\left(\sqrt{N}\right)$ primes, suffices to compute the number of primes $\pi(N)$ smaller than or equal to $N$ without explicitly knowing the primes $q$ lying between $\sqrt{N} < q \le N$.
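The counting argument above translates almost literally into code. The following sketch (function names are illustrative, not from the text) sieves only the primes up to $\left[\sqrt{N}\right]$ and then evaluates the alternating sum over all products of distinct small primes. Since the number of products grows as $2^n$, it is meant as a check of the formula for modest $N$, not as an efficient prime-counting method.

```python
from itertools import combinations
from math import isqrt, prod


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [k for k, flag in enumerate(is_prime) if flag]


def prime_count(N):
    """pi(N) via inclusion-exclusion, using only the primes <= sqrt(N)."""
    small_primes = primes_up_to(isqrt(N))
    n = len(small_primes)            # n = pi(sqrt(N))
    total = N - 1                    # |Omega| = |{2, ..., N}|
    for j in range(1, n + 1):
        for subset in combinations(small_primes, j):
            # [N / (p_k1 ... p_kj)] multiples of the product lie in Omega
            total += (-1) ** j * (N // prod(subset))
    return total + n                 # add pi(sqrt(N)) back to obtain pi(N)
```

For $N = 100$ the small primes are 2, 3, 5 and 7, and the alternating sum evaluates to $99 - 117 + 45 - 6 + 0 = 21$ primes between 10 and 100, so that $\pi(100) = 21 + 4 = 25$.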
2.2 Discrete random variables
Discrete random variables are real functions $X$ defined on a discrete probability space as $X : \Omega \to \mathbb{R}$ with the property that the event
\[
\{\omega \in \Omega : X(\omega) = x\} \in \mathcal{F}
\]
for each $x \in \mathbb{R}$. The event $\{\omega \in \Omega : X(\omega) = x\}$ is further abbreviated as $\{X = x\}$. A discrete probability density function (pdf) $\Pr[X = x]$ has the following properties:
(i) $0 \le \Pr[X = x] \le 1$ for real $x$ that are possible outcomes of an experiment. The set of values $x$ can be finite or countably infinite and constitutes the discrete probability space.
(ii) $\sum_x \Pr[X = x] = 1$.
In the classical example of throwing a die, the discrete probability space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and, since each of the six faces of the (fair) die is equally possible as outcome, $\Pr[X = x] = \frac{1}{6}$ for each $x \in \Omega$.
2.2.1 The expectation
An important operator acting on a discrete random variable $X$ is the expectation, defined as
\[
E[X] = \sum_x x \Pr[X = x]
\tag{2.11}
\]
The expectation $E[X]$ is also called the mean or average or first moment of $X$. More generally, if $X$ is a discrete random variable and $g$ is a function, then $Y = g(X)$ is also a discrete random variable with expectation $E[Y]$ equal to
\[
E[g(X)] = \sum_x g(x) \Pr[X = x]
\tag{2.12}
\]
A special and often used function in probability theory is the indicator function $1_y$ defined as 1 if the condition $y$ is true and otherwise it is zero. For example,
\[
\begin{aligned}
E[1_{X > a}] &= \sum_x 1_{x > a} \Pr[X = x] = \sum_{x > a} \Pr[X = x] = \Pr[X > a] \\
E[1_{X = a}] &= \Pr[X = a]
\end{aligned}
\tag{2.13}
\]
The higher moments of a random variable are defined as the case where $g(x) = x^n$,
\[
E[X^n] = \sum_x x^n \Pr[X = x]
\tag{2.14}
\]
From the definition (2.11), it follows that the expectation is a linear operator,
\[
E\left[\sum_{k=1}^{n} a_k X_k\right] = \sum_{k=1}^{n} a_k E[X_k]
\]
The variance of $X$ is defined as
\[
\mathrm{Var}[X] = E\left[(X - E[X])^2\right]
\tag{2.15}
\]
The variance is always non-negative. Using the linearity of the expectation operator and $\mu = E[X]$, we rewrite (2.15) as
\[
\mathrm{Var}[X] = E\left[X^2\right] - \mu^2
\tag{2.16}
\]
Since $\mathrm{Var}[X] \ge 0$, relation (2.16) indicates that $E\left[X^2\right] \ge (E[X])^2$. Often the standard deviation, defined as $\sigma = \sqrt{\mathrm{Var}[X]}$, is used. An interesting variational principle of the variance follows, for the variable $u$, from
\[
E\left[(X - u)^2\right] = E\left[(X - \mu)^2\right] + (u - \mu)^2
\]
which is minimized at $u = \mu = E[X]$ with value $\mathrm{Var}[X]$. Hence, the best least square approximation of the random variable $X$ is the number $E[X]$.
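As a small worked illustration, the definitions (2.11), (2.12) and (2.16) can be evaluated exactly for the fair die. The sketch below uses exact rational arithmetic; the helper names `expectation` and `mean_square_error` are illustrative only.

```python
from fractions import Fraction

# pdf of a fair die: Pr[X = x] = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(g, pmf):
    """E[g(X)] = sum_x g(x) Pr[X = x], as in (2.12)."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x, pmf)                # E[X] = 7/2
second_moment = expectation(lambda x: x ** 2, pmf)  # E[X^2] = 91/6
variance = second_moment - mean ** 2                # (2.16): 35/12


def mean_square_error(u):
    """E[(X - u)^2], which is smallest at u = E[X]."""
    return expectation(lambda x: (x - u) ** 2, pmf)
```

Evaluating `mean_square_error` at any $u$ other than $7/2$ returns the variance $35/12$ increased by $(u - 7/2)^2$, in agreement with the variational principle.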
2.2.2 The probability generating function
The probability generating function (pgf) of a discrete random variable $X$ is defined, for complex $z$, as
\[
\varphi_X(z) = E\left[z^X\right] = \sum_x z^x \Pr[X = x]
\tag{2.17}
\]
where the last equality follows from (2.12). If $X$ is integer-valued and non-negative, then the pgf is the Taylor expansion of the complex function $\varphi_X(z)$. Commonly the latter restriction applies, otherwise the substitution $z = e^{it}$ is used such that (2.17) expresses the Fourier series of $\varphi_X\left(e^{it}\right)$. The importance of the pgf mainly lies in the fact that the theory of functions can be applied. Numerous examples of the power of analysis will be illustrated. Concentrating on non-negative integer random variables $X$,
\[
\varphi_X(z) = \sum_{k=0}^{\infty} \Pr[X = k] z^k
\tag{2.18}
\]
and the Taylor coefficients obey
\[
\Pr[X = k] = \frac{1}{k!} \left. \frac{d^k \varphi_X(z)}{dz^k} \right|_{z=0}
\tag{2.19}
\]
\[
\Pr[X = k] = \frac{1}{2\pi i} \int_{C(0)} \frac{\varphi_X(z)}{z^{k+1}} \, dz
\tag{2.20}
\]
where $C(0)$ denotes a contour around $z = 0$. Both are inversion formulae$^{5}$.

5 A similar inversion formula for Fourier series exists (see e.g. Titchmarsh (1948)).

Since the general form $E[g(X)]$ is completely defined when $\Pr[X = x]$ is known, the knowledge of the pgf results in a complete alternative description,
\[
E[g(X)] = \sum_{k=0}^{\infty} \frac{g(k)}{k!} \left. \frac{d^k \varphi_X(z)}{dz^k} \right|_{z=0}
\tag{2.21}
\]
Sometimes it is more convenient to compute values of interest directly from (2.17) rather than from (2.21). For example, $n$-fold differentiation of $\varphi_X(z) = E\left[z^X\right]$ yields
\[
\frac{d^n \varphi_X(z)}{dz^n} = E\left[X (X - 1) \cdots (X - n + 1) z^{X-n}\right] = n! \, E\left[\binom{X}{n} z^{X-n}\right]
\]
such that
\[
E\left[\binom{X}{n}\right] = \frac{1}{n!} \left. \frac{d^n \varphi_X(z)}{dz^n} \right|_{z=1}
\tag{2.22}
\]
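Similarly, let $z = e^t$, then
\[
\frac{d^n \varphi_X\left(e^t\right)}{dt^n} = E\left[X^n e^{tX}\right]
\]
from which the moments follow as
\[
E[X^n] = \left. \frac{d^n \varphi_X\left(e^t\right)}{dt^n} \right|_{t=0}
\tag{2.23}
\]
and, more generally,
\[
E[(X - a)^n] = \left. \frac{d^n \left(e^{-ta} \varphi_X\left(e^t\right)\right)}{dt^n} \right|_{t=0}
\tag{2.24}
\]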
2.2.3 The logarithm of the probability generating function
The logarithm of the probability generating function is defined as
\[
L_X(z) = \log\left(\varphi_X(z)\right) = \log\left(E\left[z^X\right]\right)
\tag{2.25}
\]
from which $L_X(1) = 0$ because $\varphi_X(1) = 1$. The derivative $L_X'(z) = \frac{\varphi_X'(z)}{\varphi_X(z)}$ shows that $L_X'(1) = \varphi_X'(1)$, while from $L_X''(z) = \frac{\varphi_X''(z)}{\varphi_X(z)} - \left(\frac{\varphi_X'(z)}{\varphi_X(z)}\right)^2$, it follows that $L_X''(1) = \varphi_X''(1) - \left(\varphi_X'(1)\right)^2$. These first few derivatives are interesting because they are related directly to probabilistic quantities. Indeed, from (2.23), we observe that
\[
E[X] = \varphi_X'(1) = L_X'(1)
\tag{2.26}
\]
and from $E[X^2] = \varphi_X''(1) + \varphi_X'(1)$,
\[
\mathrm{Var}[X] = \varphi_X''(1) + \varphi_X'(1) - \left(\varphi_X'(1)\right)^2 = L_X''(1) + L_X'(1)
\tag{2.27}
\]
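The relations (2.26) and (2.27) can be verified exactly when the pgf is a polynomial. The sketch below, again for the fair die and with illustrative helper names, stores the pgf as its list of Taylor coefficients $\Pr[X = k]$ from (2.18) and evaluates the derivatives at $z = 1$ term by term.

```python
from fractions import Fraction


def derivative_at_one(coeffs, order):
    """order-th derivative at z = 1 of the polynomial sum_k coeffs[k] z^k."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        falling = 1
        for i in range(order):
            falling *= k - i         # k (k - 1) ... (k - order + 1)
        total += falling * c
    return total


# pgf of a fair die: coefficient k equals Pr[X = k], with Pr[X = 0] = 0
coeffs = [Fraction(0)] + [Fraction(1, 6)] * 6

d1 = derivative_at_one(coeffs, 1)    # phi'(1) = 7/2
d2 = derivative_at_one(coeffs, 2)    # phi''(1) = 35/3

mean = d1                            # (2.26)
variance = d2 + d1 - d1 ** 2         # (2.27), first form
log_d1 = d1                          # L'(1) = phi'(1)
log_d2 = d2 - d1 ** 2                # L''(1) = phi''(1) - (phi'(1))^2
```

Both forms of (2.27) give $35/12$, the variance of the die obtained directly from the definition (2.15).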
2.3 Continuous random variables
Although most of the concepts defined above for discrete random variables are readily transferred to continuous random variables, the calculus is in general more difficult. Indeed, instead of reasoning on the pdf, it is more convenient to work with the probability distribution function defined for both discrete and continuous random variables as
\[
F_X(x) = \Pr[X \le x]
\tag{2.28}
\]
Clearly, we have $\lim_{x \to -\infty} F_X(x) = 0$, while $\lim_{x \to +\infty} F_X(x) = 1$. Further, $F_X(x)$ is non-decreasing in $x$ and
\[
\Pr[a < X \le b] = F_X(b) - F_X(a)
\tag{2.29}
\]
This relation follows from the observations $\{X \le a\} \cup \{a < X \le b\} = \{X \le b\}$ and $\{X \le a\} \cap \{a < X \le b\} = \emptyset$. For mutually exclusive events $A \cap B = \emptyset$, Axiom 2 in Section 2.1 states that $\Pr[A \cup B] = \Pr[A] + \Pr[B]$ which proves (2.29). As a corollary of (2.29), $F_X(x)$ is continuous at the right which follows from (2.29) by denoting $a = b - \epsilon$ for any $\epsilon > 0$. Less precisely, it follows from the equality sign at the right, $X \le b$, and inequality at the left, $a < X$. Hence, $F_X(x)$ is not necessarily continuous at the left which implies that $F_X(x)$ is not necessarily continuous and that $F_X(x)$ may possess jumps. But even if $F_X(x)$ is continuous, the pdf is not necessarily continuous$^{6}$.

6 Weierstrass was the first to present a continuous non-differentiable function,
\[
f(x) = \sum_{n=0}^{\infty} b^n \cos\left(a^n \pi x\right)
\]
where $0 < b < 1$ and $a$ is an odd positive integer. Since the series is uniformly convergent for any $x$, $f(x)$ is continuous everywhere. Titchmarsh (1964, Chapter IX) demonstrates for $ab > 1 + \frac{3\pi}{2}$ that $\frac{f(x+h) - f(x)}{h}$ takes arbitrarily large values such that $f'(x)$ does not exist. Another class of continuous non-differentiable functions are the sample paths of a Brownian motion. The Cantor function which is discussed in (Berger, 1993, p. 21) and (Billingsley, 1995, p. 407) is another classical, noteworthy function with peculiar properties.

The pdf of a continuous random variable $X$ is defined as
\[
f_X(x) = \frac{dF_X(x)}{dx}
\tag{2.30}
\]
Assuming that $F_X(x)$ is differentiable at $x$, from (2.29), we have for small, positive $\Delta x$
\[
\Pr[x < X \le x + \Delta x] = F_X(x + \Delta x) - F_X(x) = \frac{dF_X(x)}{dx} \Delta x + O\left((\Delta x)^2\right)
\]
Using the definition (2.30) indicates that, if $F_X(x)$ is differentiable at $x$,
\[
f_X(x) = \lim_{\Delta x \to 0} \frac{\Pr[x < X \le x + \Delta x]}{\Delta x}
\tag{2.31}
\]
If $f_X(x)$ is finite, then $\lim_{\Delta x \to 0} \Pr[x < X \le x + \Delta x] = \Pr[X = x] = 0$, which means that for well-behaved (i.e. $F_X(x)$ is differentiable for most $x$) continuous random variables $X$, the probability of the event that $X$ precisely equals $x$ is zero$^{7}$. Hence, for well-behaved continuous random variables where $\Pr[X = x] = 0$ for all $x$, the inequality signs in the general formula (2.29) can be relaxed,
\[
\Pr[a < X \le b] = \Pr[a \le X \le b] = \Pr[a \le X < b] = \Pr[a < X < b]
\]

7 In Lebesgue measure theory (Titchmarsh, 1964; Billingsley, 1995), it is said that a countable, finite or enumerable (i.e. function evaluations at individual points) set is measurable, but its measure is zero.

If $f_X(x)$ is not finite, then $F_X(x)$ is not differentiable at $x$ such that
\[
\lim_{\Delta x \to 0} F_X(x + \Delta x) - F_X(x) = \Delta F_X(x) \ne 0
\]
This means that $F_X(x)$ jumps upwards at $x$ over $\Delta F_X(x)$. In that case, there is a probability mass with magnitude $\Delta F_X(x)$ at the point $x$. Although the second definition (2.31) is strictly speaking not valid in that case, one sometimes denotes the pdf at $y = x$ by $f_X(y) = \Delta F_X(x) \, \delta(y - x)$ where $\delta(x)$ is the Dirac impulse or delta function with basic property that $\int_{-\infty}^{+\infty} \delta(y - x) \, dx = 1$. Even apart from the above-mentioned difficulties for certain classes of non-differentiable, but continuous functions, the fact that probabilities are always confined to the region $[0, 1]$ may suggest that $0 \le f_X(x) \le 1$. However, the second definition (2.31) shows that $f_X(x)$ can be much larger than 1. For example, if $X$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$ (see Section 3.2.3) then $f_X(\mu) = \frac{1}{\sigma \sqrt{2\pi}}$ can be made arbitrarily large. In fact,
\[
\lim_{\sigma \to 0} \frac{\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)}{\sigma \sqrt{2\pi}} = \delta(x - \mu)
\]
2.3.1 Transformation of random variables
It frequently appears useful to know how to compute $F_Y(x)$ for $Y = g(X)$. Only if the inverse function $g^{-1}$ exists, the event $\{g(X) \le x\}$ is equivalent to $\left\{X \le g^{-1}(x)\right\}$ if $\frac{dg}{dx} > 0$ and to $\left\{X > g^{-1}(x)\right\}$ if $\frac{dg}{dx} < 0$. Hence,
\[
F_Y(x) = \Pr[g(X) \le x] =
\begin{cases}
F_X\left(g^{-1}(x)\right), & \frac{dg}{dx} > 0 \\
1 - F_X\left(g^{-1}(x)\right), & \frac{dg}{dx} < 0
\end{cases}
\tag{2.32}
\]
For well-behaved continuous random variables, we may rewrite (2.31) in terms of differentials,
\[
f_X(x) \, dx = \Pr[x \le X \le x + dx]
\]
and, similarly for $f_Y(y)$,
\[
f_Y(y) \, dy = \Pr[y \le Y = g(X) \le y + dy]
\]
If $g$ is increasing, then the event $\{y \le g(X) \le y + dy\}$ is equivalent to $\left\{g^{-1}(y) \le X \le g^{-1}(y + dy)\right\} = \{x \le X \le x + dx\}$ such that
\[
f_Y(y) \, dy = f_X(x) \, dx
\]
If $g$ is decreasing, we find that $f_Y(y) \, dy = -f_X(x) \, dx$. Thus, if $g^{-1}$ and $g'$ exist, then the relation between the pdf of a well-behaved continuous random variable $X$ and that of the transformed random variable $Y = g(X)$ is
\[
f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right| = \frac{f_X(x)}{|g'(x)|}
\]
This expression also follows by straightforward differentiation of (2.32). The chi-square distribution introduced in Section 3.3.3 is a nice example of the transformation of random variables.
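A minimal numerical sketch of the transformation rule follows. It assumes, purely for illustration, that $X$ is uniform on $[0, 1]$ and that $g(x) = x^2$, which is increasing on that interval; neither choice comes from the text. The distribution function of $Y$ follows from the increasing branch of (2.32) and the density from $f_X(x) / |g'(x)|$ with $x = g^{-1}(y)$.

```python
import math


def F_X(x):
    """Distribution function of a uniform random variable on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def f_X(x):
    """pdf of a uniform random variable on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def g_inverse(y):
    """Inverse of the assumed transform g(x) = x^2 on [0, 1]."""
    return math.sqrt(y)


def g_prime(x):
    """Derivative of g(x) = x^2."""
    return 2.0 * x


def F_Y(y):
    """Increasing branch of (2.32): F_Y(y) = F_X(g^{-1}(y))."""
    return F_X(g_inverse(y))


def f_Y(y):
    """f_Y(y) = f_X(x) / |g'(x)| evaluated at x = g^{-1}(y)."""
    x = g_inverse(y)
    return f_X(x) / abs(g_prime(x))
```

A central difference of `F_Y` around a point $0 < y < 1$ reproduces `f_Y(y)` $= 1/(2\sqrt{y})$, which illustrates that the density formula is the derivative of (2.32). Note that this density is unbounded near $y = 0$, another example of a pdf exceeding 1.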
2.3.2 The expectation
Analogously to the discrete case, we define the expectation of a continuous random variable as
\[
E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx
\tag{2.33}
\]
In addition, for the expectation to exist$^{8}$, we require $\int_{-\infty}^{\infty} |x| f_X(x) \, dx < \infty$.

8 This requirement is borrowed from measure theory and Lebesgue integration (Titchmarsh, 1964, Chapter X)(Royden, 1988, Chapter 4), where a measurable function is said to be integrable (in the Lebesgue sense) over $A$ if $f^+ = \max(f(x), 0)$ and $f^- = \max(-f(x), 0)$ are both integrable over $A$. Although this restriction seems only of theoretical interest, in some applications (see the Cauchy distribution defined in (3.38)) the Riemann integral may exist where the Lebesgue integral does not. For example, $\int_0^{\infty} \frac{\sin x}{x} \, dx$ equals, in the Riemann sense, $\frac{\pi}{2}$ (which is a standard exercise in contour integration), but this integral does not exist in the Lebesgue sense. Only for improper integrals (integration interval is infinite), Riemann integration may exist where Lebesgue does not. However, in most other cases (integration over a finite interval), Lebesgue integration is more general. For instance, if $f(x) = 1_{\{x \text{ is rational}\}}$, then $\int_0^1 f(u) \, du$ does not exist in the Riemann sense (since upper and lower sums do not converge to each other). However, $\int_0^1 f(u) \, du = 0$ in the Lebesgue sense (since there is only a set of measure zero different from 0, namely all rational numbers in $[0, 1]$). In probability theory and measure theory, Lebesgue integration is assumed.

If $X$ is a continuous random variable and $g$ is a continuous function, then $Y = g(X)$ is also a continuous random variable with expectation $E[Y]$ equal to
\[
E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx
\tag{2.34}
\]
It is often useful to express the expectation $E[X]$ of a non-negative random variable $X$ in tail probabilities. Upon integration by parts,
\[
\begin{aligned}
E[X] &= \int_0^{\infty} x f_X(x) \, dx = \left. -x \int_x^{\infty} f_X(u) \, du \right|_0^{\infty} + \int_0^{\infty} dx \int_x^{\infty} f_X(u) \, du \\
&= \int_0^{\infty} \left(1 - F_X(x)\right) dx
\end{aligned}
\tag{2.35}
\]
The case for a non-positive random variable $X$ is derived analogously,
\[
\begin{aligned}
E[X] &= \int_{-\infty}^{0} x f_X(x) \, dx = \left. x \int_{-\infty}^{x} f_X(u) \, du \right|_{-\infty}^{0} - \int_{-\infty}^{0} dx \int_{-\infty}^{x} f_X(u) \, du \\
&= -\int_{-\infty}^{0} F_X(x) \, dx
\end{aligned}
\]
The general case follows by addition:
\[
E[X] = \int_0^{\infty} \left(1 - F_X(x)\right) dx - \int_{-\infty}^{0} F_X(x) \, dx
\]
A similar expression exists for discrete random variables. In general for any discrete random variable $X$, we can write
\[
\begin{aligned}
E[X] &= \sum_{k=-\infty}^{\infty} k \Pr[X = k] = \sum_{k=-\infty}^{-1} k \Pr[X = k] + \sum_{k=0}^{\infty} k \Pr[X = k] \\
&= \sum_{k=-\infty}^{-1} k \left(\Pr[X \le k] - \Pr[X \le k - 1]\right) + \sum_{k=0}^{\infty} k \left(\Pr[X \ge k] - \Pr[X \ge k + 1]\right) \\
&= \sum_{k=-\infty}^{-1} k \Pr[X \le k] - \sum_{k=-\infty}^{-2} (k + 1) \Pr[X \le k] + \sum_{k=1}^{\infty} k \Pr[X \ge k] - \sum_{k=1}^{\infty} (k - 1) \Pr[X \ge k] \\
&= -\Pr[X \le -1] - \sum_{k=-\infty}^{-2} \Pr[X \le k] + \sum_{k=1}^{\infty} \Pr[X \ge k]
\end{aligned}
\]
or the mean of a discrete random variable $X$ expressed in tail probabilities is$^{9}$
\[
E[X] = \sum_{k=1}^{\infty} \Pr[X \ge k] - \sum_{k=-\infty}^{-1} \Pr[X \le k]
\tag{2.36}
\]
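The tail-probability form (2.36) is easy to confirm on a small example. The sketch below uses a hypothetical integer-valued random variable with both negative and positive outcomes (the probabilities are chosen only for illustration) and compares the direct mean with the difference of the two tail sums.

```python
from fractions import Fraction

# Hypothetical integer-valued pdf with negative and positive outcomes
pmf = {
    -3: Fraction(1, 10),
    -1: Fraction(2, 10),
    0: Fraction(1, 10),
    2: Fraction(4, 10),
    4: Fraction(2, 10),
}

# Direct definition: E[X] = sum_k k Pr[X = k]
mean_direct = sum(k * p for k, p in pmf.items())


def pr_at_least(k):
    """Upper tail Pr[X >= k]."""
    return sum(p for x, p in pmf.items() if x >= k)


def pr_at_most(k):
    """Lower tail Pr[X <= k]."""
    return sum(p for x, p in pmf.items() if x <= k)


k_min, k_max = min(pmf), max(pmf)

# (2.36): both sums are finite here because the support is bounded
mean_tail = sum(pr_at_least(k) for k in range(1, k_max + 1)) - sum(
    pr_at_most(k) for k in range(k_min, 0)
)
```

The upper tails sum to $16/10$ and the lower tails to $5/10$, so that both computations give $E[X] = 11/10$.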
2.3.3 The probability generating function
The probability generating function (pgf) of a continuous random variable $X$ is defined, for complex $z$, as the Laplace transform
\[
\varphi_X(z) = E\left[e^{-zX}\right] = \int_{-\infty}^{\infty} e^{-zt} f_X(t) \, dt
\tag{2.37}
\]
Again, in some cases, it may be more convenient to use $z = iu$ in which case the double sided Laplace transform reduces to a Fourier transform. The strength of these transforms is based on the numerous properties, especially the inverse transform,
\[
f_X(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \varphi_X(z) e^{zt} \, dz
\tag{2.38}
\]
where $c$ is the smallest real variable $\mathrm{Re}(z)$ for which the integral in (2.37) converges. Similarly as for discrete random variables, we have $e^{za} \varphi_X(z) = E\left[e^{-z(X - a)}\right]$, from which
\[
E[(X - a)^n] = (-1)^n \left. \frac{d^n \left(e^{za} \varphi_X(z)\right)}{dz^n} \right|_{z=0}
\tag{2.39}
\]
The main difference with the discrete case lies in the definition $E\left[e^{-zX}\right]$ (continuous) versus $E\left[z^X\right]$ (discrete). Since the exponential is an entire function$^{10}$ with power series around $z = 0$, $e^{-zX} = \sum_{k=0}^{\infty} \frac{(-1)^k X^k}{k!} z^k$, the expectation and summation can be reversed leading to
\[
E\left[e^{-zX}\right] = \sum_{k=0}^{\infty} \frac{(-1)^k E\left[X^k\right]}{k!} z^k
\tag{2.40}
\]
provided$^{11}$ $E\left[X^k\right] = O(k!)$ which is a necessary condition for the summation to converge for $z \ne 0$. Assuming convergence$^{12}$, the Taylor series of $E\left[e^{-zX}\right]$ around $z = 0$ is expressed as a function of the moments of $X$, whereas in the discrete case, the Taylor series of $E\left[z^X\right]$ around $z = 0$ given by (2.18) is expressed in terms of probabilities of $X$. This observation has led to sometimes calling $E\left[e^{-zX}\right]$ the moment generating function, while $E\left[z^X\right]$ is the probability generating function of the random variable $X$. On the other hand, series expansion of $E\left[z^X\right]$ around $z = 1$,
\[
\begin{aligned}
\varphi_X(z) &= \sum_{k=0}^{\infty} \Pr[X = k] (z + 1 - 1)^k = \sum_{k=0}^{\infty} \Pr[X = k] \sum_{j=0}^{k} \binom{k}{j} (z - 1)^j \\
&= \sum_{j=0}^{\infty} \left[\sum_{k=j}^{\infty} \binom{k}{j} \Pr[X = k]\right] (z - 1)^j
\end{aligned}
\]
shows with (2.22) that
\[
E\left[\binom{X}{j}\right] = \sum_{k=j}^{\infty} \binom{k}{j} \Pr[X = k]
\]
If moments are desired, the substitution $z \to e^{-u}$ in $E\left[z^X\right]$ is appropriate.

9 We remark that
\[
\begin{aligned}
E[X] = \sum_{k=-\infty}^{\infty} k \Pr[X = k] &= \sum_{k=-\infty}^{\infty} k \left(\Pr[X \ge k] - \Pr[X \ge k + 1]\right) \\
&\ne \sum_{k=-\infty}^{\infty} k \Pr[X \ge k] - \sum_{k=-\infty}^{\infty} k \Pr[X \ge k + 1] = \sum_{k=-\infty}^{\infty} \Pr[X \ge k]
\end{aligned}
\]
because the series in the second line are diverging. In fact, for any arbitrarily small real $\epsilon > 0$, there exists a finite integer $k_\epsilon$ such that $\Pr[X \ge k_\epsilon] \ge 1 - \epsilon$ and $\Pr[X \ge k_\epsilon] \le \Pr[X \ge k]$ for all $k < k_\epsilon$. Hence,
\[
\sum_{k=-\infty}^{\infty} \Pr[X \ge k] = \sum_{k=-\infty}^{k_\epsilon - 1} \Pr[X \ge k] + \sum_{k=k_\epsilon}^{\infty} \Pr[X \ge k] \ge (1 - \epsilon) \sum_{k=-\infty}^{k_\epsilon - 1} 1 + c \to \infty
\]
where $\sum_{k=k_\epsilon}^{\infty} \Pr[X \ge k] = c$ is finite. Also, even for negative $X$, $\sum_{k=-\infty}^{\infty} \Pr[X \ge k]$ is always positive.
10 An entire (or integral) function is a complex function without singularities in the finite complex plane. Hence, a power series around any finite point has infinite radius of convergence. In other words, it exists for all finite complex values.
11 The Landau big $O$-notation specifies the "order of a function" when the argument tends to some limit. Most often the limit is to infinity, but the $O$-notation can also be used to characterize the behavior of a function around some finite point. Formally, $f(x) = O(g(x))$ for $x \to \infty$ means that there exist positive numbers $c$ and $x_0$ for which $|f(x)| \le c |g(x)|$ for $x > x_0$.
12 The lognormal distribution defined by (3.43) is an example where the summation (2.40) diverges for any $z \ne 0$.

2.3.4 The logarithm of the probability generating function
The logarithm of the probability generating function is defined as
\[
L_X(z) = \log\left(\varphi_X(z)\right) = \log\left(E\left[e^{-zX}\right]\right)
\tag{2.41}
\]
from which $L_X(0) = 0$ because $\varphi_X(0) = 1$. Further, analogous to the discrete case, we see that $L_X'(0) = \varphi_X'(0)$, $L_X''(0) = \varphi_X''(0) - \left(\varphi_X'(0)\right)^2$ and
\[
E[X] = -\varphi_X'(0) = -L_X'(0)
\]
However, the difference with the discrete case lies in the higher moments,
\[
E[X^n] = (-1)^n \left. \frac{d^n \varphi_X(z)}{dz^n} \right|_{z=0}
\tag{2.42}
\]
because with $E[X^2] = \varphi_X''(0)$,
\[
\mathrm{Var}[X] = \varphi_X''(0) - \left(\varphi_X'(0)\right)^2 = L_X''(0)
\tag{2.43}
\]
The latter expression makes $L_X(z)$ for a continuous random variable particularly useful. Since the variance is always positive, it demonstrates that $L_X(z)$ is convex (see Section 5.5) around $z = 0$. Finally, we mention that
\[
E\left[(X - E[X])^3\right] = -L_X'''(0)
\]
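As a numerical illustration of $E[X] = -L_X'(0)$ and (2.43), the sketch below assumes an exponential random variable with rate 2, whose Laplace transform is the standard result $\varphi_X(z) = \lambda / (\lambda + z)$; the rate and the finite-difference step are arbitrary choices made for this example. The derivatives of $L_X(z)$ at $z = 0$ are approximated by central differences.

```python
import math

RATE = 2.0  # assumed rate (lambda) of the exponential distribution


def log_pgf(z):
    """L_X(z) = log E[exp(-z X)] = log(RATE / (RATE + z)) for exponential X."""
    return math.log(RATE / (RATE + z))


def first_derivative(f, x, h=1e-4):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


mean = -first_derivative(log_pgf, 0.0)       # E[X] = -L'(0) = 1 / RATE
variance = second_derivative(log_pgf, 0.0)   # Var[X] = L''(0) = 1 / RATE^2
```

Up to the discretization error of the finite differences, the results agree with the known mean $1/\lambda = 0.5$ and variance $1/\lambda^2 = 0.25$ of the exponential distribution.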
2.4 The conditional probability
The conditional probability of the event $A$ given the event $B$ (or on the hypothesis $B$) is defined as
\[
\Pr[A|B] = \frac{\Pr[A \cap B]}{\Pr[B]}
\tag{2.44}
\]
The definition implicitly assumes that the event $B$ has positive probability, otherwise the conditional probability remains undefined. We quote Feller (1970, p. 116):
Taking conditional probabilities of various events with respect to a particular hypothesis $B$ amounts to choosing $B$ as a new sample space with probabilities proportional to the original ones; the proportionality factor $\Pr[B]$ is necessary in order to reduce the total probability of the new sample space to unity. This formulation shows that all general theorems on probabilities are valid for conditional probabilities with respect to any particular hypothesis. For example, the law $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$ takes the form
\[
\Pr[A \cup B|C] = \Pr[A|C] + \Pr[B|C] - \Pr[A \cap B|C]
\]
The formula (2.44) is often rewritten in the form
\[
\Pr[A \cap B] = \Pr[A|B] \Pr[B]
\tag{2.45}
\]
which easily generalizes to more events. For example, denote $A = A_1$ and $B = A_2 \cap A_3$, then
\[
\begin{aligned}
\Pr[A_1 \cap A_2 \cap A_3] &= \Pr[A_1|A_2 \cap A_3] \Pr[A_2 \cap A_3] \\
&= \Pr[A_1|A_2 \cap A_3] \Pr[A_2|A_3] \Pr[A_3]
\end{aligned}
\]
Another application of the conditional probability occurs when a partitioning of the sample space $\Omega$ is known: $\Omega = \bigcup_k B_k$ and all $B_k$ are mutually exclusive, which means that $B_k \cap B_j = \emptyset$ for any $k$ and $j \ne k$. Then, with (2.45),
\[
\sum_k \Pr[A \cap B_k] = \sum_k \Pr[A|B_k] \Pr[B_k]
\]
The event $A_k = \{A \cap B_k\}$ is a decomposition (or projection) of the event $A$ in the basis event $B_k$, analogous to the decomposition of a vector in terms of a set of orthogonal basis vectors that span the total state space. Indeed, using the associative property $A \cap \{B \cap C\} = A \cap B \cap C$ and $A \cap A = A$, the intersection $A_k \cap A_j = \{A \cap B_k\} \cap \{A \cap B_j\} = A \cap \{B_k \cap B_j\} = \emptyset$, which implies mutual exclusivity (or orthogonality). Using the distributive property $A \cap \{B_k \cup B_j\} = \{A \cap B_k\} \cup \{A \cap B_j\}$, we observe that
\[
A = A \cap \Omega = A \cap \left\{\bigcup_k B_k\right\} = \bigcup_k \{A \cap B_k\} = \bigcup_k A_k
\]
Finally, since all events $A_k$ are mutually exclusive, $\Pr[A] = \sum_k \Pr[A_k] = \sum_k \Pr[A \cap B_k]$. Thus, if $\Omega = \bigcup_k B_k$ and, in addition, $B_k \cap B_j = \emptyset$ holds for any pair $j \ne k$, we have proved the law of total probability or decomposability,
\[
\Pr[A] = \sum_k \Pr[A|B_k] \Pr[B_k]
\tag{2.46}
\]
Conditioning on events is a powerful tool that will be used frequently. If the conditional probability $\Pr[A|B_k]$ is known as a function $g(B_k)$, the law of total probability can also be written in terms of the expectation operator defined in (2.12) as
\[
\Pr[A] = E[g(B_k)]
\tag{2.47}
\]
Also the important memoryless property of the exponential distribution (see Section 3.2.2) is an example of the application of the conditional probability. Another classical example is Bayes' rule. Consider again the events $B_k$ defined above. Using the definition (2.44) followed by (2.45),
\[
\Pr[B_k|A] = \frac{\Pr[B_k \cap A]}{\Pr[A]} = \frac{\Pr[A \cap B_k]}{\Pr[A]} = \frac{\Pr[A|B_k] \Pr[B_k]}{\Pr[A]}
\tag{2.48}
\]
Using (2.46), we arrive at Bayes' rule
\[
\Pr[B_k|A] = \frac{\Pr[A|B_k] \Pr[B_k]}{\sum_j \Pr[A|B_j] \Pr[B_j]}
\tag{2.49}
\]
where $\Pr[B_k]$ are called the a-priori probabilities, while $\Pr[B_k|A]$ are the a-posteriori probabilities.
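The law of total probability (2.46) and Bayes' rule (2.49) can be sketched on a hypothetical example that is not taken from the text: a packet is sent over one of three links $B_1, B_2, B_3$ with the a-priori probabilities below, and $A$ is the event that the packet is lost. All numbers are assumptions chosen only to make the arithmetic transparent.

```python
from fractions import Fraction

# Assumed a-priori probabilities Pr[B_k] of the three mutually exclusive links
prior = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

# Assumed conditional loss probabilities Pr[A | B_k]
loss_given_link = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

# Law of total probability (2.46): Pr[A] = sum_k Pr[A | B_k] Pr[B_k]
pr_loss = sum(l * p for l, p in zip(loss_given_link, prior))

# Bayes' rule (2.49): a-posteriori probabilities Pr[B_k | A]
posterior = [l * p / pr_loss for l, p in zip(loss_given_link, prior)]
```

With these numbers $\Pr[A] = 21/1000$ and the a-posteriori probabilities are $5/21$, $6/21$ and $10/21$: the least used link becomes the most likely origin of a lost packet, because its conditional loss probability is the largest.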
The conditional distribution function of the random variable $Y$ given $X$ is defined by
\[
F_{Y|X}(y|x) = \Pr[Y \le y | X = x]
\tag{2.50}
\]
for any $x$ provided $\Pr[X = x] > 0$. This condition follows from the definition (2.44) of the conditional probability. The conditional probability density function of $Y$ given $X$ is defined by
\[
f_{Y|X}(y|x) = \Pr[Y = y | X = x] = \frac{\Pr[X = x, Y = y]}{\Pr[X = x]} = \frac{f_{XY}(x, y)}{f_X(x)}
\tag{2.51}
\]
for any $x$ such that $\Pr[X = x] > 0$ (and similarly for continuous random variables $f_X(x) > 0$) and where $f_{XY}(x, y)$ is the joint probability density function defined below in (2.59).
2.5 Several random variables and independence
2.5.1 Discrete random variables
Two events $A$ and $B$ are independent if
\[
\Pr[A \cap B] = \Pr[A] \Pr[B]
\tag{2.52}
\]
Similarly, we define two discrete random variables to be independent if
\[
\Pr[X = x, Y = y] = \Pr[X = x] \Pr[Y = y]
\tag{2.53}
\]
If $Z = f(X, Y)$, then $Z$ is a discrete random variable with
\[
\Pr[Z = z] = \sum_{f(x, y) = z} \Pr[X = x, Y = y]
\]
Applying the expectation operator (2.11) to both sides yields
\[
E[f(X, Y)] = \sum_{x, y} f(x, y) \Pr[X = x, Y = y]
\tag{2.54}
\]
If $X$ and $Y$ are independent and $f$ is separable, $f(x, y) = f_1(x) f_2(y)$, then the expectation (2.54) reduces to
\[
E[f(X, Y)] = \sum_x f_1(x) \Pr[X = x] \sum_y f_2(y) \Pr[Y = y] = E[f_1(X)] \, E[f_2(Y)]
\tag{2.55}
\]
The simplest example of the general function is $Z = X + Y$. In that case, the sum is over all $x$ and $y$ that satisfy $x + y = z$. Thus,
\[
\Pr[X + Y = z] = \sum_x \Pr[X = x, Y = z - x] = \sum_y \Pr[X = z - y, Y = y]
\]
If $X$ and $Y$ are independent, we obtain the convolution,
\[
\Pr[X + Y = z] = \sum_x \Pr[X = x] \Pr[Y = z - x] = \sum_y \Pr[X = z - y] \Pr[Y = y]
\]
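The discrete convolution can be illustrated with the sum of two independent fair dice. In the sketch below (the function name `convolve` is illustrative), every pair $(x, y)$ contributes $\Pr[X = x] \Pr[Y = y]$ to the outcome $z = x + y$, which is the convolution sum rearranged by outcome.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(pmf_x, pmf_y):
    """pdf of Z = X + Y for independent discrete X and Y."""
    pmf_z = defaultdict(Fraction)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            # Independence (2.53): Pr[X = x, Y = y] = Pr[X = x] Pr[Y = y]
            pmf_z[x + y] += px * py
    return dict(pmf_z)


die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)
```

The result is the familiar triangular distribution on $\{2, \ldots, 12\}$, with $\Pr[X + Y = 7] = 1/6$ as the most likely outcome.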
2.5.2 The covariance
The covariance of $X$ and $Y$ is defined as
\[
\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y
\tag{2.56}
\]
If $\mathrm{Cov}[X, Y] = 0$, then the variables $X$ and $Y$ are uncorrelated. If $X$ and $Y$ are independent, then $\mathrm{Cov}[X, Y] = 0$. Hence, independence implies uncorrelation, but the converse is not necessarily true. The classical example$^{13}$ is $Y = X^2$ where $X$ has a normal distribution $N(0, 1)$ (Section 3.2.3) because $\mu_X = 0$ and $E[XY] = E\left[X^3\right] = 0$ as follows from (3.23). Although $X$ and $Y$ are perfectly dependent, they are uncorrelated. Thus, independence is a stronger property than uncorrelation. The covariance $\mathrm{Cov}[X, Y]$ measures the degree of dependence between two (or generally more) random variables. If $X$ and $Y$ are positively (negatively) correlated, the large values of $X$ tend to be associated with large (small) values of $Y$.
13 Another example: let $U$ be uniform on $[0, 1]$ and $X = \cos(2\pi U)$ and $Y = \sin(2\pi U)$. Using (2.34),
\[
E[XY] = \int_0^1 \cos(2\pi u) \sin(2\pi u) \, du = 0
\]
as well as $E[X] = E[Y] = 0$. Thus, $\mathrm{Cov}[X, Y] = 0$, but $X$ and $Y$ are perfectly dependent because $X = \cos(\arcsin Y) = \pm\sqrt{1 - Y^2}$.

As an application of the covariance, consider the problem of computing the variance of a sum $S_n$ of random variables $X_1, X_2, \ldots, X_n$. Let $\mu_k = E[X_k]$, then $E[S_n] = \sum_{k=1}^{n} \mu_k$ and
\[
\begin{aligned}
\mathrm{Var}[S_n] &= E\left[(S_n - E[S_n])^2\right] = E\left[\left(\sum_{k=1}^{n} (X_k - \mu_k)\right)^2\right] \\
&= E\left[\sum_{k=1}^{n} \sum_{j=1}^{n} (X_k - \mu_k)(X_j - \mu_j)\right] \\
&= E\left[\sum_{k=1}^{n} (X_k - \mu_k)^2 + 2 \sum_{k=1}^{n} \sum_{j=k+1}^{n} (X_k - \mu_k)(X_j - \mu_j)\right]
\end{aligned}
\]
Using the linearity of the expectation operator and the definition of the covariance (2.56) yields
\[
\mathrm{Var}[S_n] = \sum_{k=1}^{n} \mathrm{Var}[X_k] + 2 \sum_{k=1}^{n} \sum_{j=k+1}^{n} \mathrm{Cov}[X_k, X_j]
\tag{2.57}
\]
Observe that for a set of independent random variables $\{X_k\}$ the double sum with covariances vanishes.
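To see the covariance term of (2.57) at work, the sketch below evaluates both sides for $n = 2$ on a small, hypothetical joint distribution of two dependent binary random variables (the probabilities are assumptions made for the example). All expectations are computed from the joint pdf as in (2.54).

```python
from fractions import Fraction

# Hypothetical joint pdf Pr[X = x, Y = y] of two dependent binary variables
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}


def expect(f):
    """E[f(X, Y)] = sum_{x,y} f(x, y) Pr[X = x, Y = y], as in (2.54)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))   # (2.56)

# Left-hand side of (2.57): variance of the sum computed directly
var_sum_direct = expect(lambda x, y: (x + y - mu_x - mu_y) ** 2)

# Right-hand side of (2.57) for n = 2
var_sum_formula = var_x + var_y + 2 * cov_xy
```

Here $\mathrm{Cov}[X, Y] = 3/20$, so the variance of the sum, $4/5$, exceeds the value $1/2$ that would be obtained if the covariance term were ignored.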
The Cauchy-Schwarz inequality (5.17) derived in Chapter 5 indicates that
\[
\left(E[(X - \mu_X)(Y - \mu_Y)]\right)^2 \le E\left[(X - \mu_X)^2\right] E\left[(Y - \mu_Y)^2\right]
\]
such that the covariance is always bounded by
\[
|\mathrm{Cov}[X, Y]| \le \sigma_X \sigma_Y
\]
2.5.3 The linear correlation coefficient
Since the covariance is not dimensionless, the linear correlation coefficient defined as
\[
\rho(X, Y) = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}
\tag{2.58}
\]
is often convenient to relate two (or more) different physical quantities expressed in different units. The linear correlation coefficient remains invariant (possibly apart from the sign) under a linear transformation because
\[
\rho(aX + b, cY + d) = \mathrm{sign}(ac) \, \rho(X, Y)
\]
This transform shows that the linear correlation coefficient $\rho(X, Y)$ is independent of the value of the mean $\mu_X$ and the variance $\sigma_X^2$ provided $\sigma_X^2 > 0$. Therefore, many computations simplify if we normalize the random variable properly. Let us introduce the concept of a normalized random variable $X^* = \frac{X - \mu_X}{\sigma_X}$. The normalized random variable has a zero mean and a variance equal to one. By the invariance under a linear transform, the correlation coefficient $\rho(X, Y) = \rho(X^*, Y^*)$ and also $\rho(X, Y) = \mathrm{Cov}[X^*, Y^*]$. The variance of $X^* \pm Y^*$ follows from (2.57) as
\[
\mathrm{Var}[X^* \pm Y^*] = \mathrm{Var}[X^*] + \mathrm{Var}[Y^*] \pm 2 \, \mathrm{Cov}[X^*, Y^*] = 2 (1 \pm \rho(X, Y))
\]
Since the variance is always non-negative, it follows that $-1 \le \rho(X, Y) \le 1$. The extremes $\rho(X, Y) = \pm 1$ imply a linear relation between $X$ and $Y$. Indeed, $\rho(X, Y) = 1$ implies that $\mathrm{Var}[X^* - Y^*] = 0$, which is only possible if $X^* = Y^* + c$, where $c$ is a constant. Hence, $X = \frac{\sigma_X}{\sigma_Y} Y + c'$. A similar argument applies for the case $\rho(X, Y) = -1$. For example, in curve fitting, the goodness of the fit is often expressed in terms of the correlation coefficient. A perfect fit has correlation coefficient equal to 1. In particular, in linear regression where $Y = aX + b$, the regression coefficients $a_R$ and $b_R$ are the minimizers of the square distance $E\left[(Y - (aX + b))^2\right]$ and given by
\[
a_R = \frac{\mathrm{Cov}[X, Y]}{\sigma_X^2}, \qquad b_R = E[Y] - a_R E[X]
\]
Since a correlation coefficient $\rho(X, Y) = 1$ implies $\mathrm{Cov}[X, Y] = \sigma_X \sigma_Y$, we see that $a_R = \frac{\sigma_Y}{\sigma_X}$ as derived above with normalized random variables.
Although the linear correlation coefficient is a natural measure of the dependence between random variables, it has some disadvantages. First, the variances of $X$ and $Y$ must exist, which may cause problems with heavy-tailed distributions. Second, as illustrated above, dependence can lead to uncorrelation, which is awkward. Third, linear correlation is not invariant under non-linear strictly increasing transformations $T$ such that $\rho(T(X), T(Y)) \ne \rho(X, Y)$. Common intuition expects that dependence measures should be invariant under these transforms $T$. This leads to the definition of rank correlation which satisfies that invariance property. Here, we merely mention Spearman's rank correlation coefficient, which is defined as
\[
\rho_S(X, Y) = \rho(F_X(X), F_Y(Y))
\]
where $\rho$ is the linear correlation coefficient and where the non-linear strictly increasing transform is the probability distribution. More details are found in Embrechts et al. (2001b) and in Chapter 4.
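The regression coefficients and the correlation coefficient (2.58) can be computed exactly for a small example. The sketch below assumes a hypothetical joint distribution of four equally likely points $(x, y)$, chosen for illustration only, and checks that perturbing $a_R$ or $b_R$ increases the square distance $E\left[(Y - (aX + b))^2\right]$.

```python
import math
from fractions import Fraction

# Hypothetical joint pdf: four equally likely points (x, y)
joint = {
    (0, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
    (2, 2): Fraction(1, 4),
    (3, 4): Fraction(1, 4),
}


def expect(f):
    """E[f(X, Y)] over the joint pdf, as in (2.54)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: x * y) - mu_x * mu_y       # (2.56)

a_R = cov_xy / var_x              # regression slope
b_R = mu_y - a_R * mu_x           # regression intercept
rho = float(cov_xy) / math.sqrt(float(var_x * var_y))   # (2.58)


def square_distance(a, b):
    """E[(Y - (a X + b))^2], minimized at (a_R, b_R)."""
    return expect(lambda x, y: (y - (a * x + b)) ** 2)
```

For these points $a_R = b_R = 9/10$ and $\rho \approx 0.92$, a good but not perfect linear fit.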
2.5.4 Continuous random variables
We define the joint distribution function by I[\ ({> |) = Pr [[ {> \ |]
and the joint probability density function by
i[\ ({> |) =
C 2 I[\ ({> |)
C{C|
Hence,
(2.59)
Z { Z |
I[\ ({> |) = Pr [[ {> \ |] =
The analogon of (2.54) is
3"
3"
i[\ (x> y)gxgy
(2.60)
Z "Z "
H [j([> \ )] =
3"
3"
j({> |)i[\ ({> |)g{g|
(2.61)
Most of the di!culties occur in the evaluation of the multiple integrals. The
change of variables in multiple dimensions involves the Jacobian. Consider
the transformed random variables X = j1 ([> \ ) and Y = j2 ([> \ ) and
denote the inverse transform by { = k1 (x> y) and | = k2 (x> y), then
iX Y (x> y) = i[\ (k1 (x> y)> k1 (x> y)) M (x> y)
where the Jacobian M (x> y) is
C{
M (x> y) = det
Cx
C|
Cx
C{
Cy
C|
Cy
¸
If $X$ and $Y$ are independent and $Z = X + Y$, we obtain the convolution,
$$ f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z-x) \, dx = \int_{-\infty}^{\infty} f_X(z-y) f_Y(y) \, dy \tag{2.62} $$
which is often denoted by $f_Z(z) = (f_X * f_Y)(z)$. If both $f_X(x) = 0$ and $f_Y(x) = 0$ for $x < 0$, then the definition (2.62) of the convolution reduces to
$$ (f_X * f_Y)(z) = \int_{0}^{z} f_X(x) f_Y(z-x) \, dx $$
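The one-sided convolution integral can be checked numerically. The sketch below is only an illustration: the trapezoidal rule, the helper names `convolve_at` and `exp_density`, and the rates 1 and 3 are assumptions chosen for the example. It compares the integral for two exponential densities with the known closed form for two different rates.

```python
import math

def convolve_at(f, g, z, steps=2000):
    """Trapezoidal approximation of (f*g)(z) = int_0^z f(x) g(z-x) dx
    for densities that vanish on the negative axis."""
    if z <= 0:
        return 0.0
    h = z / steps
    total = 0.5 * (f(0.0) * g(z) + f(z) * g(0.0))
    for i in range(1, steps):
        x = i * h
        total += f(x) * g(z - x)
    return total * h

def exp_density(rate):
    """Density rate * exp(-rate * x) for x >= 0, zero otherwise."""
    return lambda x: rate * math.exp(-rate * x) if x >= 0 else 0.0

a, b = 1.0, 3.0                      # illustrative rates (assumed values)
f, g = exp_density(a), exp_density(b)
z = 0.7
numeric = convolve_at(f, g, z)
# Closed form of the convolution of two exponential densities with rates a != b
closed = a * b * (math.exp(-a * z) - math.exp(-b * z)) / (b - a)
print(numeric, closed)
```

The two printed values should agree to several decimals; a finer grid tightens the agreement.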
2.5.5 The sum of independent random variables
Let $S_N = \sum_{k=1}^{N} X_k$, where the random variables $X_k$ are all independent. We first concentrate on the case where $N = n$ is a (fixed) integer. Since $S_N = S_{N-1} + X_N$, direct application of (2.62) yields the recursion
$$ f_{S_N}(z) = \int_{-\infty}^{\infty} f_{S_{N-1}}(z-y) f_{X_N}(y) \, dy \tag{2.63} $$
which, when written out explicitly, leads to the $(N-1)$-fold integral
$$ f_{S_N}(z) = \int_{-\infty}^{\infty} f_{X_N}(y_N) \, dy_N \cdots \int_{-\infty}^{\infty} f_{X_2}(y_2) f_{X_1}(z - y_N - \cdots - y_2) \, dy_2 \tag{2.64} $$
In many cases, convolutions are more efficiently computed via generating functions. The generating function of $S_n$ equals
$$ \varphi_{S_n}(z) = E\left[z^{S_n}\right] = E\left[z^{\sum_{k=1}^{n} X_k}\right] = E\left[\prod_{k=1}^{n} z^{X_k}\right] $$
Since all $X_k$ are independent, (2.55) can be applied,
$$ \varphi_{S_n}(z) = \prod_{k=1}^{n} E\left[z^{X_k}\right] $$
or, in terms of generating functions,
$$ \varphi_{S_n}(z) = \prod_{k=1}^{n} \varphi_{X_k}(z) \tag{2.65} $$
Hence, we arrive at the important result that the generating function of a sum of independent random variables equals the product of the generating functions of the individual random variables. We also note that the condition of independence is crucial in that it allows the product and expectation operator to be reversed, leading to the useful result (2.65). Often, the random variables $X_k$ all possess the same distribution. In this case of independent identically distributed (i.i.d.) random variables with generating function $\varphi_X(z)$, the relation (2.65) further simplifies to
$$ \varphi_{S_n}(z) = (\varphi_X(z))^n \tag{2.66} $$
In the case where the number of terms $N$ in the sum $S_N$ is a random variable with generating function $\varphi_N(z)$, independent of the $X_k$, we use the general definition of expectation (2.54) for two random variables,
$$ \varphi_{S_N}(z) = E\left[z^{S_N}\right] = \sum_{k=0}^{\infty} \sum_{x} z^x \Pr[S_N = x, N = k] = \sum_{k=0}^{\infty} \sum_{x} z^x \Pr[S_N = x | N = k] \Pr[N = k] $$
where the conditional probability (2.45) is used. Since the value of $S_N$ depends on the number of terms $N$ in the sum, we have $\Pr[S_N = x | N = k] = \Pr[S_k = x]$. Further, with
$$ \sum_{x} z^x \Pr[S_N = x | N = k] = \varphi_{S_k}(z) $$
we have
$$ \varphi_{S_N}(z) = \sum_{k=0}^{\infty} \varphi_{S_k}(z) \Pr[N = k] \tag{2.67} $$
The average $E[S_N]$ follows from (2.26) as
$$ E[S_N] = \sum_{k=0}^{\infty} \varphi_{S_k}'(1) \Pr[N = k] = \sum_{k=0}^{\infty} E[S_k] \Pr[N = k] \tag{2.68} $$
Since $E[S_k] = E\left[\sum_{j=1}^{k} X_j\right] = \sum_{j=1}^{k} E[X_j]$ and assuming that all random variables $X_j$ have equal mean $E[X_j] = E[X]$, we have
$$ E[S_N] = \sum_{k=0}^{\infty} k E[X] \Pr[N = k] $$
or
$$ E[S_N] = E[X] \, E[N] \tag{2.69} $$
This relation (2.69) is commonly called Wald's identity. Wald's identity holds for any random sum of (possibly dependent) random variables $X_j$ provided the number $N$ of those random variables is independent of the $X_j$.
In the case of i.i.d. random variables, we apply (2.66) in (2.67) so that
$$ \varphi_{S_N}(z) = \sum_{k=0}^{\infty} (\varphi_X(z))^k \Pr[N = k] = \varphi_N(\varphi_X(z)) \tag{2.70} $$
This expression is a generalization of (2.66).
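For random variables on the non-negative integers, a generating function is a polynomial whose coefficients are the probabilities, so the composition $\varphi_N(\varphi_X(z))$ can be expanded exactly. The following sketch is only an illustration: the helper names and the two small probability vectors are assumptions. It computes the distribution of $S_N$ this way and compares its mean with Wald's identity.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of z)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, k):
    """k-th power of a polynomial; the 0-th power is the constant 1."""
    out = [1.0]
    for _ in range(k):
        out = poly_mul(out, p)
    return out

def random_sum_pmf(pmf_n, pmf_x):
    """Coefficients of sum_k Pr[N = k] * (phi_X(z))^k, i.e. phi_N(phi_X(z))."""
    max_len = (len(pmf_n) - 1) * (len(pmf_x) - 1) + 1
    out = [0.0] * max_len
    for k, prob_k in enumerate(pmf_n):
        for i, c in enumerate(poly_pow(pmf_x, k)):
            out[i] += prob_k * c
    return out

def mean(pmf):
    return sum(i * p for i, p in enumerate(pmf))

pmf_n = [0.2, 0.5, 0.3]   # assumed Pr[N = 0], Pr[N = 1], Pr[N = 2]
pmf_x = [0.1, 0.6, 0.3]   # assumed Pr[X = 0], Pr[X = 1], Pr[X = 2]
pmf_s = random_sum_pmf(pmf_n, pmf_x)
print(pmf_s, mean(pmf_s), mean(pmf_n) * mean(pmf_x))
```

The last two printed numbers are $E[S_N]$ computed from the expanded distribution and the product $E[X]E[N]$.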
2.6 Conditional expectation
The generating function (2.67) of a random sum of independent random variables can be derived using the conditional expectation $E[Y | X = x]$ of two random variables $X$ and $Y$. We will first define the conditional expectation and derive an interesting property.
Suppose that we know that $X = x$. The conditional density function $f_{Y|X}(y|x)$, defined by (2.51), of the random variable $Y_c = Y|X$ can then be regarded as a function of $y$ only. Using the definition of the expectation (2.33) for continuous random variables (the discrete case is analogous), we have
$$ E[Y | X = x] = \int_{-\infty}^{\infty} y f_{Y|X}(y|x) \, dy \tag{2.71} $$
Since this expression holds for any value of $x$ that the random variable $X$ can take, we see that $E[Y | X = x] = g(x)$ is a function of $x$ and, in addition since $X = x$, $E[Y | X = x] = g(X)$ can be regarded as a random variable that is a function of the random variable $X$. Having identified the conditional expectation $E[Y | X = x]$ as a random variable, let us compute its expectation or the expectation of the slightly more general random variable $h(X) g(X)$ with $g(X) = E[Y | X = x]$. From the general definition (2.34) of the expectation, it follows that
$$ E[h(X) g(X)] = \int_{-\infty}^{\infty} h(x) g(x) f_X(x) \, dx = \int_{-\infty}^{\infty} h(x) E[Y | X = x] f_X(x) \, dx $$
Substituting (2.71) yields
$$ E[h(X) g(X)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x) \, y \, f_{Y|X}(y|x) f_X(x) \, dy \, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x) \, y \, f_{XY}(x,y) \, dy \, dx = E[h(X) Y] $$
where we have used (2.51) and (2.61). Thus, we find the interesting relation
$$ E[h(X) E[Y | X = x]] = E[h(X) Y] \tag{2.72} $$
As a special case where $h(x) = 1$, the expectation of the conditional expectation follows as
$$ E[Y] = E_X[E_Y[Y | X = x]] $$
where the index in $E_Z$ clarifies that the expectation is over the random variable $Z$. Applying this relation to $Y = z^{S_N}$ where $S_N = \sum_{k=1}^{N} X_k$ and all $X_k$ are independent yields
$$ \varphi_{S_N}(z) = E\left[z^{S_N}\right] = E_N\left[E_S\left[z^{S_N} | N = n\right]\right] $$
Since $E_S\left[z^{S_N} | N = n\right] = \varphi_{S_n}(z)$ as specified in (2.65), we end up with
$$ \varphi_{S_N}(z) = E_N[\varphi_{S_N}(z)] = \sum_{k=0}^{\infty} \varphi_{S_k}(z) \Pr[N = k] $$
which is (2.67).
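Relation (2.72) can be verified directly on a small discrete example. In the sketch below the joint probabilities and the function `h` are assumed values chosen only for illustration; sums replace the integrals of the continuous case.

```python
# Assumed joint probabilities Pr[X = x, Y = y] of a small discrete example
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 3): 0.2}

def marginal_x(joint):
    """Pr[X = x] obtained by summing the joint probabilities over y."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def cond_exp_y_given_x(joint):
    """g(x) = E[Y | X = x] for every x with positive probability."""
    px = marginal_x(joint)
    g = {x: 0.0 for x in px}
    for (x, y), p in joint.items():
        g[x] += y * p / px[x]
    return g

def h(x):
    """An arbitrary function of X (assumed for the example)."""
    return 2 * x + 1

px = marginal_x(joint)
g = cond_exp_y_given_x(joint)
lhs = sum(h(x) * g[x] * px[x] for x in px)              # E[h(X) E[Y|X]]
rhs = sum(h(x) * y * p for (x, y), p in joint.items())  # E[h(X) Y]
print(lhs, rhs)
```

Choosing `h` identically equal to 1 reduces the check to $E[Y] = E_X[E_Y[Y|X=x]]$.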
3
Basic distributions
This chapter concentrates on the most basic probability distributions and
their properties. From these basic distributions, other useful distributions
are derived.
3.1 Discrete random variables
3.1.1 The Bernoulli distribution
A Bernoulli random variable $X$ can only take two values: either 1 with probability $p$ or 0 with probability $q = 1 - p$. The standard example of a Bernoulli random variable is the outcome of tossing a biased coin, and, more generally, the outcome of a trial with only two possibilities, either success or failure. The sample space is $\Omega = \{0, 1\}$ and $\Pr[X = 1] = p$, while $\Pr[X = 0] = q$. From this definition, the pgf follows from (2.17) as
$$ \varphi_X(z) = E\left[z^X\right] = z^0 \Pr[X = 0] + z^1 \Pr[X = 1] $$
or
$$ \varphi_X(z) = q + pz \tag{3.1} $$
From (2.23) or (2.14), the $n$-th moment is
$$ E[X^n] = p $$
which shows that $\mu = E[X] = p$. From (2.24), we find $E[(X - a)^n] = p(1-a)^n + q(-a)^n$ such that the moments centered around the mean are
$$ E[(X - \mu)^n] = p q^n + (-1)^n q p^n = pq\left(q^{n-1} + (-1)^n p^{n-1}\right) $$
Explicitly, with $p + q = 1$, $\mathrm{Var}[X] = pq$ and $E\left[(X - \mu)^3\right] = pq(q - p)$.
3.1.2 The binomial distribution
A binomial random variable $X$ is the sum of $n$ independent Bernoulli random variables. The sample space is $\Omega = \{0, 1, \ldots, n\}$. For example, $X$ may represent the number of successes in $n$ independent Bernoulli trials such as the number of heads after tossing a (biased) coin $n$ times. Application of (2.66) with (3.1) gives
$$ \varphi_X(z) = (q + pz)^n \tag{3.2} $$
Expanding the binomial pgf in powers of $z$, which justifies the name "binomial",
$$ \varphi_X(z) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} z^k $$
and comparing to (2.18) yields
$$ \Pr[X = k] = \binom{n}{k} p^k q^{n-k} \tag{3.3} $$
The alternative, probabilistic approach starts with (3.3). Indeed, the probability that $X$ has $k$ successes out of $n$ trials consists of precisely $k$ successes (an event with probability $p^k$) and $n - k$ failures (with probability equal to $q^{n-k}$). The total number of ways in which $k$ successes out of $n$ trials can be obtained is precisely $\binom{n}{k}$.
The mean follows from (2.23) or from the definition $X = \sum_{j=1}^{n} X_{\text{Bernoulli}}$ and the linearity of the expectation as $E[X] = np$. Higher order moments around the mean can be derived from (2.24) as
$$ E[(X - \mu)^m] = \left. \frac{d^m}{dt^m} \left( e^{-tnp} \left(q + p e^t\right)^n \right) \right|_{t=0} = \left. \frac{d^m}{dt^m} \sum_{k=0}^{n} \binom{n}{k} q^k p^{n-k} e^{t(qn-k)} \right|_{t=0} = \sum_{k=0}^{n} \binom{n}{k} q^k p^{n-k} (qn - k)^m $$
In general, this form seems difficult to express more elegantly. It illustrates that, even for simple random variables, computations may rapidly become unattractive. For $m = 2$, the above differentiation leads to $\mathrm{Var}[X] = npq$. But this result is more economically obtained from (2.27), since $L_X(z) = n \log(q + pz)$, $L_X'(z) = \frac{np}{q + pz}$ and $L_X''(z) = -\frac{np^2}{(q + pz)^2}$. Thus,
$$ \mathrm{Var}[X] = -np^2 + np = npq \tag{3.4} $$
3.1.3 The geometric distribution
The geometric random variable $X$ returns the number of independent Bernoulli trials needed to achieve the first success. Here the sample space $\Omega$ is the infinite set of integers. The probability density function is
$$ \Pr[X = k] = p q^{k-1} \tag{3.5} $$
because a first success (with probability $p$) obtained in the $k$-th trial is preceded by $k - 1$ failures (each having probability $q = 1 - p$). Clearly, $\Pr[X = 0] = 0$. The series expansion of the probability generating function,
$$ \varphi_X(z) = pz \sum_{k=0}^{\infty} q^k z^k = \frac{pz}{1 - qz} \tag{3.6} $$
justifies the name "geometric".
The mean $E[X] = \varphi_X'(1)$ equals $E[X] = \frac{1}{p}$. The higher-order moments can be deduced from (2.24) as
$$ E[(X - \mu)^n] = p \left. \frac{d^n}{dt^n} \left( \frac{e^{-tq/p}}{1 - q e^t} \right) \right|_{t=0} = p \sum_{k=0}^{\infty} q^k \left(k - \frac{q}{p}\right)^n $$
Similarly as for the binomial random variable, the variance most easily follows from (2.27) with $L_X(z) = \log p + \log z - \log(1 - qz)$, $L_X'(z) = \frac{1}{z} + \frac{q}{1 - qz}$, $L_X''(z) = -\frac{1}{z^2} + \frac{q^2}{(1 - qz)^2}$. Thus,
$$ \mathrm{Var}[X] = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2} \tag{3.7} $$
The distribution function $F_X(k) = \Pr[X \le k] = \sum_{j=1}^{k} \Pr[X = j]$ is obtained as
$$ \Pr[X \le k] = p \sum_{j=0}^{k-1} q^j = p \, \frac{1 - q^k}{1 - q} = 1 - q^k $$
The tail probability is
$$ \Pr[X > k] = q^k \tag{3.8} $$
Hence, the probability that the number of trials until the first success is larger than $k$ decreases geometrically in $k$ with rate $q$. Let us now consider an important application of the conditional probability. The probability that, given the success is not found in the first $k$ trials, success does not occur within the next $m$ trials, is with (2.44)
$$ \Pr[X > k + m | X > k] = \frac{\Pr[\{X > k + m\} \cap \{X > k\}]}{\Pr[X > k]} = \frac{\Pr[X > k + m]}{\Pr[X > k]} $$
and with (3.8)
$$ \Pr[X > k + m | X > k] = q^m = \Pr[X > m] $$
This conditional probability turns out to be independent of the hypothesis, the event $\{X > k\}$, and reflects the famous memoryless property. Only because $\Pr[X > k]$ obeys the functional equation $f(x + y) = f(x) f(y)$, the hypothesis or initial knowledge does not matter. It is precisely as if past failures have never occurred or are forgotten and as if, after a failure, the number of trials is reset to 0. Furthermore, the only solution to the functional equation is an exponential function. Thus, the geometric distribution is the only discrete distribution that possesses the memoryless property.
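The memoryless property can also be observed empirically. The sketch below is only an illustration: the seed, the success probability 0.3 and the values of $k$ and $m$ are assumptions. It simulates geometric random variables by repeated Bernoulli trials and compares the conditional tail with the unconditional one and with $q^m$.

```python
import random

def geometric_sample(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:   # a failure occurs with probability q = 1 - p
        trials += 1
    return trials

def conditional_tail(samples, k, m):
    """Empirical estimate of Pr[X > k + m | X > k]."""
    survivors = [x for x in samples if x > k]
    return sum(1 for x in survivors if x > k + m) / len(survivors)

rng = random.Random(7)          # fixed seed, an arbitrary choice
p, k, m = 0.3, 3, 2             # assumed illustrative values
samples = [geometric_sample(p, rng) for _ in range(100000)]
lhs = conditional_tail(samples, k, m)                   # Pr[X > k+m | X > k]
rhs = sum(1 for x in samples if x > m) / len(samples)   # Pr[X > m]
exact = (1 - p) ** m                                    # q^m from (3.8)
print(lhs, rhs, exact)
```

All three numbers should be close; the remaining differences are sampling noise.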
3.1.4 The Poisson distribution
Often we are interested in counting the number of occurrences of an event in a certain time interval, such as, for example, the number of IP packets during a time slot or the number of telephony calls that arrive at a telephone exchange per unit time. The Poisson random variable $X$ with probability density function
$$ \Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!} \tag{3.9} $$
turns out to model many of these counting phenomena well as shown in Chapter 7. The corresponding generating function is
$$ \varphi_X(z) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} z^k = e^{\lambda(z-1)} \tag{3.10} $$
and the average number of occurrences in that time interval is
$$ E[X] = \lambda \tag{3.11} $$
This average determines the complete distribution. In applications it is convenient to replace the unit interval by an interval of arbitrary length $t$ such that
$$ \Pr[X = k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!} $$
equals the probability that precisely $k$ events occur in the interval with duration $t$. The probability that no events occur during $t$ time units is $\Pr[X = 0] = e^{-\lambda t}$ and the probability that at least one event (i.e. one or more) occurs is $\Pr[X > 0] = 1 - e^{-\lambda t}$. The latter is equal to the exponential distribution. We will also see later in Theorem 7.3.2 that the Poisson counting process and the exponential distribution are intimately connected. The sum of $n$ independent Poisson random variables each with mean $\lambda_k$ is again a Poisson random variable with mean $\sum_{k=1}^{n} \lambda_k$ as follows from (2.65) and (3.10).
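The higher-order moments can be deduced from (2.24) as
$$ E[(X - \lambda)^n] = e^{-\lambda} \left. \frac{d^n}{dt^n} \left( e^{-\lambda(t - e^t)} \right) \right|_{t=0} $$
from which
$$ E[X] = \mathrm{Var}[X] = E\left[(X - \lambda)^3\right] = \lambda $$
The Poisson tail distribution equals
$$ \Pr[X > m] = 1 - \sum_{k=0}^{m} \frac{\lambda^k e^{-\lambda}}{k!} $$
which, with $\lambda = \alpha x$, precisely equals the distribution function (3.28), evaluated at $x$, of the sum of $m + 1$ i.i.d. exponentially distributed random variables with rate $\alpha$, as demonstrated below in Section 3.3.1.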
The Poisson density approximates the binomial density (3.3) if $n \to \infty$ but the mean $np = \lambda$. This phenomenon is often referred to as the law of rare events: in an arbitrarily large number $n$ of independent trials each with arbitrarily small success probability $p = \frac{\lambda}{n}$, the total number of successes will approximately be Poisson distributed.
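The law of rare events can be illustrated by tabulating the largest gap between the binomial density (3.3) with $p = \lambda/n$ and the Poisson density (3.9) for growing $n$. The sketch below is only an illustration: the value $\lambda = 3$, the cut-off `kmax` and the function names are assumptions.

```python
import math

def binomial_pmf(n, p, k):
    """Binomial density (3.3)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """Poisson density (3.9)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def max_abs_gap(n, lam, kmax=30):
    """Largest |binomial - Poisson| over k = 0..min(n, kmax) with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k))
        for k in range(min(n, kmax) + 1)
    )

lam = 3.0                                   # assumed mean
gaps = {n: max_abs_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap shrinks roughly in proportion to $1/n$.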
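The classical argument is to consider the binomial density (3.3) with $p = \frac{\lambda}{n}$,
$$ \Pr[X = k] = \frac{n!}{k!(n-k)!} \frac{\lambda^k}{n^k} \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \prod_{j=1}^{k-1} \left(1 - \frac{j}{n}\right) $$
or
$$ \log(\Pr[X = k]) = \log\frac{\lambda^k}{k!} + \sum_{j=1}^{k-1} \log\left(1 - \frac{j}{n}\right) - k \log\left(1 - \frac{\lambda}{n}\right) + n \log\left(1 - \frac{\lambda}{n}\right) $$
For large $n$, we use the Taylor expansion $\log\left(1 - \frac{x}{n}\right) = -\frac{x}{n} - \frac{x^2}{2n^2} + O\left(n^{-3}\right)$ to obtain up to order $O\left(n^{-2}\right)$
$$ \log(\Pr[X = k]) = \log\frac{\lambda^k}{k!} - \frac{k(k-1)}{2n} + O\left(n^{-2}\right) + \frac{k\lambda}{n} + O\left(n^{-2}\right) - \lambda - \frac{\lambda^2}{2n} + O\left(n^{-2}\right) $$
$$ = \log\frac{\lambda^k}{k!} - \lambda - \frac{1}{2n}\left((k - \lambda)^2 - k\right) + O\left(n^{-2}\right) $$
With $e^x = 1 + x + O(x^2)$, we finally obtain the approximation for large $n$,
$$ \Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!} \left(1 - \frac{1}{2n}\left((k - \lambda)^2 - k\right) + O\left(n^{-2}\right)\right) $$
The factor $(k - \lambda)^2 - k$ that multiplies $-\frac{1}{2n}$ is negative if $k \in \left[\lambda + \frac{1}{2} - \sqrt{\lambda + \frac{1}{4}}, \; \lambda + \frac{1}{2} + \sqrt{\lambda + \frac{1}{4}}\right]$. In that $k$-interval, the Poisson density is a lower bound for the binomial density for large $n$ and $np = \lambda$. The reverse holds for values of $k$ outside that interval. Since for the Poisson density $\frac{\Pr[X = k]}{\Pr[X = k-1]} = \frac{\lambda}{k}$, we see that $\Pr[X = k]$ increases as long as $\lambda > k$ and decreases when $\lambda < k$. Thus, the maximum of the Poisson density lies around $k = \lambda = E[X]$. In conclusion, we can say that the Poisson density approximates the binomial density for large $n$ and $np = \lambda$ from below in the region of about the standard deviation $\sqrt{\lambda}$ around the mean $E[X] = \lambda$ and from above outside this region (in the tails of the distribution).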
A much shorter derivation anticipates results of Chapter 6 and starts from the probability generating function (3.2) of the binomial distribution after substitution of $p = \frac{\lambda}{n}$,
$$ \lim_{n \to \infty} \varphi_X(z) = \lim_{n \to \infty} \left(1 + \frac{\lambda(z - 1)}{n}\right)^n = e^{\lambda(z-1)} $$
Invoking the Continuity Theorem 6.1.3, comparison with (3.10) shows that the limit probability generating function corresponds to a Poisson distribution. The Stein-Chen (1975) Theorem$^1$ generalizes the law of rare events: this law even holds when the Bernoulli trials are weakly dependent.
As a final remark, let $S_n$ be the sum of i.i.d. Bernoulli trials each with mean $p$, then $S_n$ is binomially distributed as shown in Section 3.1.2. If $p$ is a constant and independent of the number of trials $n$, the Central Limit Theorem 6.3.1 states that $\frac{S_n - np}{\sqrt{np(1-p)}}$ tends to a Gaussian distribution. In summary, the limit distribution of a sum $S_n$ of Bernoulli trials depends on how the mean $p$ varies with the number of trials $n$ when $n \to \infty$:
$$ \text{if } p = \frac{\lambda}{n}, \text{ then } S_n \xrightarrow{d} \frac{\lambda^k e^{-\lambda}}{k!} $$
$$ \text{if } p \text{ is constant, then } \frac{S_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} \frac{e^{-x^2/2}}{\sqrt{2\pi}} $$
$^1$ The proof (see e.g. Grimmett and Stirzaker (2001, pp. 130-132)) involves coupling theory of stochastic random variables. The degree of dependence is expressed in terms of the total variation distance. The total variation distance between two discrete random variables $X$ and $Y$ is defined as
$$ d_{TV}(X, Y) = \sum_{k} \left| \Pr[X = k] - \Pr[Y = k] \right| $$
and satisfies
$$ d_{TV}(X, Y) = 2 \sup_{A \subset \mathbb{Z}} \left| \Pr[X \in A] - \Pr[Y \in A] \right| $$
3.2 Continuous random variables
3.2.1 The uniform distribution
A uniform random variable $X$ has equal probability to attain any value in the interval $[a, b]$ such that the probability density function is a constant. Since $\Pr[a \le X \le b] = \int_a^b f_X(x) \, dx = 1$, the constant value equals
$$ f_X(x) = \frac{1}{b - a} 1_{x \in [a,b]} \tag{3.12} $$
where $1_y$ is the indicator function defined in Section 2.2.1. The distribution function then follows as
$$ \Pr[a \le X \le x] = \frac{x - a}{b - a} 1_{x \in [a,b]} + 1_{x > b} $$
The Laplace transform (2.37) is$^2$
$$ \varphi_X(z) = \int_{-\infty}^{\infty} e^{-zt} f_X(t) \, dt = \frac{e^{-za} - e^{-zb}}{z(b - a)} \tag{3.13} $$
while the mean $\mu = E[X]$ most easily follows from
$$ E[X] = \int_{-\infty}^{\infty} \frac{x \, dx}{b - a} 1_{x \in [a,b]} = \frac{a + b}{2} $$
The centered moments are obtained from (2.39) as
$$ E[(X - \mu)^n] = \frac{(-1)^n}{b - a} \left. \frac{d^n}{dz^n} \left( \frac{e^{\frac{z}{2}(b-a)} - e^{-\frac{z}{2}(b-a)}}{z} \right) \right|_{z=0} = \frac{2(-1)^n}{b - a} \left. \frac{d^n}{dz^n} \left( \frac{\sinh\left(\frac{b-a}{2} z\right)}{z} \right) \right|_{z=0} $$
Using the power series
$$ \frac{\sinh\left(\frac{b-a}{2} z\right)}{z} = \sum_{k=0}^{\infty} \frac{\left(\frac{b-a}{2}\right)^{2k+1}}{(2k+1)!} z^{2k} $$
leads to
$$ E\left[(X - \mu)^{2n}\right] = \frac{(b - a)^{2n}}{(2n + 1) 2^{2n}}, \qquad E\left[(X - \mu)^{2n+1}\right] = 0 \tag{3.14} $$
$^2$ Notice that $z \varphi_X(z) = \frac{(f * g)(z)}{ab}$, where $f * g$ is the convolution of two exponential densities $f$ and $g$ with rates $a$ and $b$, respectively.
Let us define $U$ as the uniform random variable in the interval $[0, 1]$. If $W = 1 - U$, then $W$ is also a uniform random variable on $[0, 1]$, and $W$ and $U$ have the same distribution, denoted as $W \stackrel{d}{=} U$, because $\Pr[W \le x] = \Pr[1 - U \le x] = \Pr[U \ge 1 - x] = 1 - (1 - x) = x = \Pr[U \le x]$.
The probability distribution function $F_X(x) = \Pr[X \le x] = g(x)$ whose inverse exists can be written as a function of $F_U(x) = x 1_{x \in [0,1]}$. Let $X = g^{-1}(U)$. Since the distribution function is non-decreasing, this also holds for the inverse $g^{-1}(\cdot)$. Applying (2.32) yields with $X = g^{-1}(U)$
$$ F_X(x) = \Pr\left[g^{-1}(U) \le x\right] = \Pr[U \le g(x)] = F_U(g(x)) = g(x) $$
For instance, $g^{-1}(U) = -\frac{\ln(1 - U)}{\alpha} \stackrel{d}{=} -\frac{\ln U}{\alpha}$ are exponential random variables (3.17) with parameter $\alpha$; $g^{-1}(U) = U^{1/\alpha}$ are polynomially distributed random variables with distribution $\Pr[X \le x] = x^{\alpha}$; $g^{-1}(U) = \cot(\pi U)$ is a Cauchy random variable defined in (3.38) below. In addition, we observe that $U = g(X) = F_X(X)$, which means that any random variable $X$ is transformed into a uniform random variable $U$ on $[0, 1]$ by its own distribution function.
The numbers $a_k$ that satisfy congruent recursions of the form $a_{k+1} = (\alpha a_k + \beta) \bmod M$, where $M$ is a large prime number (e.g. $M = 2^{31} - 1$) and where $\alpha$ and $\beta$ are integers (e.g. $\alpha = 397\,204\,094$ and $\beta = 0$), are to a good approximation uniformly distributed. The scaled numbers $y_k = \frac{a_k}{M - 1}$ are nearly uniformly distributed on $[0, 1]$. Since these recursions with initial value or seed $a_0 \in [0, M - 1]$ are easy to generate with computers (Press et al., 1992), the above property is very useful to generate arbitrary random variables $X = g^{-1}(U)$ from the uniform random variable $U$.
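The congruential recursion and the inverse transform combine into a short sampler. The sketch below is only an illustration: it uses the modulus, multiplier and increment quoted in the text, while the seed 12345, the rate 2 and the function names are assumptions. It is not meant as a production-quality generator.

```python
import math

M = 2**31 - 1          # prime modulus quoted in the text
ALPHA = 397204094      # multiplier quoted in the text
BETA = 0               # increment quoted in the text

def lcg(seed, count):
    """Yield `count` scaled numbers y_k = a_k / (M - 1) of the recursion."""
    a = seed
    for _ in range(count):
        a = (ALPHA * a + BETA) % M
        yield a / (M - 1)

def exponential_from_uniform(u, rate):
    """Inverse transform -ln(U)/rate; U and 1 - U have the same distribution."""
    return -math.log(u) / rate

uniforms = list(lcg(seed=12345, count=20000))   # seed is an arbitrary choice
rate = 2.0                                      # assumed rate
samples = [exponential_from_uniform(u, rate) for u in uniforms]
mean_u = sum(uniforms) / len(uniforms)
mean_x = sum(samples) / len(samples)
print(mean_u, mean_x)
```

With a zero increment and a prime modulus, a non-zero seed never produces $a_k = 0$, so the logarithm is always defined. The sample means should lie near $\frac{1}{2}$ and $\frac{1}{\alpha}$ respectively.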
3.2.2 The exponential distribution
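An exponential random variable $X$ satisfies the probability density function
$$ f_X(x) = \alpha e^{-\alpha x}, \quad x \ge 0 \tag{3.15} $$
where $\alpha$ is the rate at which events occur. The corresponding Laplace transform is
$$ \varphi_X(z) = \int_0^{\infty} \alpha e^{-\alpha t} e^{-zt} \, dt = \frac{\alpha}{z + \alpha} \tag{3.16} $$
and the probability distribution is, for $x \ge 0$,
$$ F_X(x) = 1 - e^{-\alpha x} \tag{3.17} $$
The mean or average follows from (2.33) or from $E[X] = -\varphi_X'(0)$ as $\mu = E[X] = \frac{1}{\alpha}$. The centered moments are obtained from (2.39) as
$$ E\left[\left(X - \frac{1}{\alpha}\right)^n\right] = (-1)^n \left. \frac{d^n}{dz^n} \left( \frac{\alpha e^{z/\alpha}}{z + \alpha} \right) \right|_{z=0} $$
Since the Taylor expansion of $\frac{\alpha e^{z/\alpha}}{z + \alpha}$ around $z = 0$ is
$$ \frac{\alpha e^{z/\alpha}}{z + \alpha} = \sum_{k=0}^{\infty} \frac{1}{k!} \left(\frac{z}{\alpha}\right)^k \sum_{k=0}^{\infty} (-1)^k \left(\frac{z}{\alpha}\right)^k = \sum_{n=0}^{\infty} \left( \left(-\frac{1}{\alpha}\right)^n \sum_{k=0}^{n} \frac{(-1)^k}{k!} \right) z^n $$
we find that
$$ E\left[\left(X - \frac{1}{\alpha}\right)^n\right] = \frac{n!}{\alpha^n} \sum_{k=0}^{n} \frac{(-1)^k}{k!} \tag{3.18} $$
For large $n$, the centered moments are well approximated by
$$ E\left[\left(X - \frac{1}{\alpha}\right)^n\right] \simeq \frac{n!}{e \, \alpha^n} $$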
The exponential random variable possesses, just as its discrete counterpart, the geometric random variable, the memoryless property. Indeed, analogous to Section 3.1.3, consider
$$ \Pr[X \ge t + T | X > t] = \frac{\Pr[\{X \ge t + T\} \cap \{X > t\}]}{\Pr[X > t]} = \frac{\Pr[X \ge t + T]}{\Pr[X > t]} $$
and since $\Pr[X > t] = e^{-\alpha t}$, the memoryless property
$$ \Pr[X \ge t + T | X > t] = \Pr[X > T] $$
is established. Since the only non-zero solution (proved in Feller (1970, p. 459)) to the functional equation $f(x + y) = f(x) f(y)$, which implies the memoryless property, is of the form $c^x$, it shows that the exponential distribution is the only continuous distribution that has the memoryless property. As we will see later, this memoryless property is a fundamental property in Markov processes.
It is instructive to show the close relation between the geometric and exponential random variable (see Feller (1971, p. 1)). Consider the waiting time $T$ (measured in integer units of $\Delta t$) for the first success in a sequence of Bernoulli trials where only one trial occurs in a timeslot $\Delta t$. Hence, $X = \frac{T}{\Delta t}$ is a (dimensionless) geometric random variable. From (3.8), $\Pr[T > k \Delta t] = (1 - p)^k$ and the average waiting time is $E[T] = \Delta t \, E[X] = \frac{\Delta t}{p}$. The transition from the discrete to continuous space involves the limit process $\Delta t \to 0$ subject to a fixed average waiting time $E[T]$. Let $t = k \Delta t$, then
$$ \lim_{\Delta t \to 0} \Pr[T > t] = \lim_{\Delta t \to 0} \left(1 - \frac{\Delta t}{E[T]}\right)^{t / \Delta t} = e^{-t / E[T]} $$
For arbitrarily small time units, the waiting time for the first success with average $E[T]$ turns out to be an exponential random variable.
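The limit from the geometric to the exponential tail can be followed numerically. The sketch below is only an illustration: the mean waiting time 2, the time 3 and the list of slot lengths are assumptions. It evaluates $(1 - p)^k$ with $p = \Delta t / E[T]$ and $k = t / \Delta t$ for shrinking $\Delta t$.

```python
import math

def geometric_tail(t, dt, mean_wait):
    """Pr[T > t] = (1 - p)^k with p = dt / E[T] and k = t / dt trials."""
    p = dt / mean_wait
    k = round(t / dt)
    return (1.0 - p) ** k

mean_wait, t = 2.0, 3.0                      # assumed E[T] and evaluation time
limit = math.exp(-t / mean_wait)             # exponential tail exp(-t / E[T])
slot_lengths = (0.5, 0.05, 0.005)
errors = [abs(geometric_tail(t, dt, mean_wait) - limit) for dt in slot_lengths]
for dt, err in zip(slot_lengths, errors):
    print(dt, err)
```

Each tenfold reduction of the slot length reduces the error by roughly a factor of ten.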
3.2.3 The Gaussian or normal distribution
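The Gaussian random variable $X$ is defined for all $x$ by the probability density function
$$ f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{3.19} $$
which explicitly shows its dependence on the average $\mu$ and variance $\sigma^2$. The importance of the Gaussian random variables stems from the Central Limit Theorem 6.3.1. Often a Gaussian (also called normal) random variable with average $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$. The distribution function is
$$ F_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt = \Phi\left(\frac{x - \mu}{\sigma}\right) \tag{3.20} $$
where$^3$ $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}} \, dt$ is the normalized Gaussian distribution corresponding to $\mu = 0$ and $\sigma = 1$. The double-sided Laplace transform is
$$ \varphi_X(z) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-zt} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt = e^{\frac{\sigma^2 z^2}{2} - \mu z} \tag{3.22} $$
and the centered moments (2.39) are
$$ E\left[(X - \mu)^{2n}\right] = \left. \frac{d^{2n}}{dz^{2n}} e^{\frac{\sigma^2 z^2}{2}} \right|_{z=0} = \frac{(2n)!}{n!} \left(\frac{\sigma^2}{2}\right)^n, \qquad E\left[(X - \mu)^{2n+1}\right] = 0 \tag{3.23} $$
$^3$ Abramowitz and Stegun (1968, Section 7.1.1) define the error function as
$$ \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2} \, dt $$
such that (Abramowitz and Stegun, 1968, Section 7.1.22)
$$ \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt = \frac{1}{2} \left(1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma \sqrt{2}}\right)\right) \tag{3.21} $$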
We note from (2.65) that a sum of independent Gaussian random variables $N(\mu_k, \sigma_k^2)$ is again a Gaussian random variable $N\left(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \sigma_k^2\right)$. If $X = N(\mu, \sigma^2)$, then the scaled random variable $Y = aX$ is a $N\left(a\mu, (a\sigma)^2\right)$ random variable, which is verified by computing $\Pr[Y \le y] = \Pr\left[X \le \frac{y}{a}\right]$. Similarly for translation, if $Y = X + b$, then $Y = N(\mu + b, \sigma^2)$. Hence, a linear combination of independent Gaussian random variables is again a Gaussian random variable,
$$ \sum_{k=1}^{n} a_k N(\mu_k, \sigma_k^2) + b = N\left(\sum_{k=1}^{n} a_k \mu_k + b, \; \sum_{k=1}^{n} a_k^2 \sigma_k^2\right) $$
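Relation (3.21) gives a direct way to evaluate the Gaussian distribution function with a standard error function, and the moment formula (3.23) can be checked by numerical integration. The sketch below is only an illustration: the midpoint rule, the integration width and the parameters $\mu = 1$, $\sigma = 2$ are assumptions.

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function via the error function, relation (3.21)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density (3.19)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))

def central_moment(order, mu, sigma, steps=20000, width=12.0):
    """Midpoint-rule estimate of E[(X - mu)^order] over mu +/- width * sigma."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += (x - mu) ** order * gaussian_pdf(x, mu, sigma)
    return total * h

mu, sigma = 1.0, 2.0                          # assumed parameters
m4_numeric = central_moment(4, mu, sigma)
# (3.23) with n = 2: (4!/2!) * (sigma^2 / 2)^2
m4_formula = math.factorial(4) / math.factorial(2) * (sigma**2 / 2.0) ** 2
m3_numeric = central_moment(3, mu, sigma)     # odd centered moments vanish
print(m4_numeric, m4_formula, m3_numeric, gaussian_cdf(mu, mu, sigma))
```

The fourth centered moment from (3.23) equals $3\sigma^4$, and the distribution function evaluated at the mean is $\frac{1}{2}$.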
3.3 Derived distributions
From the basic distributions, a large number of other distributions can be
derived as illustrated here.
3.3.1 The sum of independent exponential random variables
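By applying (2.65) and (2.38) a substantial number of practical problems can be solved. For example, the sum of $n$ independent exponential random variables, each with different rate $\alpha_k > 0$, has the generating function
$$ \varphi_{S_n}(z) = \prod_{k=1}^{n} \frac{\alpha_k}{z + \alpha_k} $$
and probability density function
$$ f_{S_n}(t) = \frac{\prod_{k=1}^{n} \alpha_k}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{\prod_{k=1}^{n} (z + \alpha_k)} \, dz $$
The contour can be closed over the negative half plane for $t > 0$, where the integral has simple poles at $z = -\alpha_k$. From the Cauchy integral theorem, we obtain
$$ f_{S_n}(t) = \left(\prod_{k=1}^{n} \alpha_k\right) \sum_{j=1}^{n} \frac{e^{-\alpha_j t}}{\prod_{k=1; k \ne j}^{n} (\alpha_k - \alpha_j)} $$
If all rates are equal, $\alpha_k = \alpha$, the case reduces to $\varphi_{S_n}(z) = \left(\frac{\alpha}{z + \alpha}\right)^n$ with $E[S_n] = \frac{n}{\alpha}$ and with probability density
$$ f_{S_n}(t) = \frac{\alpha^n}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{(z + \alpha)^n} \, dz $$
Again, the contour can be closed over the negative half plane and the contribution of the $n$-th order pole is deduced from Cauchy's relation for the $k$-th derivative of a complex function
$$ \frac{1}{k!} \left. \frac{d^k f(z)}{dz^k} \right|_{z = z_0} = \frac{1}{2\pi i} \int_{C(z_0)} \frac{f(\omega) \, d\omega}{(\omega - z_0)^{k+1}} $$
as
$$ f_{S_n}(t) = \frac{\alpha^n}{(n-1)!} \left. \frac{d^{n-1} e^{zt}}{dz^{n-1}} \right|_{z = -\alpha} = \frac{\alpha (\alpha t)^{n-1}}{(n-1)!} e^{-\alpha t} \tag{3.24} $$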
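For integer $n$, this density corresponds to the $n$-Erlang random variable. When extended to real values of $n = \beta$,
$$ f_X(t; \alpha, \beta) = \frac{\alpha (\alpha t)^{\beta - 1}}{\Gamma(\beta)} e^{-\alpha t} \tag{3.25} $$
it is called the Gamma probability density function, with corresponding pgf
$$ \varphi_X(z; \alpha, \beta) = \left(\frac{\alpha}{z + \alpha}\right)^{\beta} = \left(1 + \frac{z}{\alpha}\right)^{-\beta} \tag{3.26} $$
and distribution
$$ F_X(x; \alpha, \beta) = \frac{\alpha^{\beta}}{\Gamma(\beta)} \int_0^x t^{\beta - 1} e^{-\alpha t} \, dt \tag{3.27} $$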
This integral, the incomplete Gamma-function, can only be expressed in closed analytic form if $\beta$ is an integer. Hence, for the $n$-Erlang random variable $X$, the distribution follows after repeated partial integration as
$$ F_X(x; \alpha, n) = \frac{\alpha^n}{(n-1)!} \int_0^x t^{n-1} e^{-\alpha t} \, dt = 1 - e^{-\alpha x} \sum_{k=0}^{n-1} \frac{(\alpha x)^k}{k!} \tag{3.28} $$
We observe that $\Pr[X > x] = \sum_{k=0}^{n-1} \frac{(\alpha x)^k}{k!} e^{-\alpha x}$, which equals $\Pr[Y \le n - 1]$ where $Y$ is a Poisson random variable with mean $\lambda = \alpha x$. Further, $\Pr[X > x] = \Pr[X^* > \alpha x]$, where $E[X^*] = n$, or the distribution of the sum of i.i.d. exponential random variables each with rate $\alpha$ follows by scaling $x \to \alpha x$ from the distribution of the sum of i.i.d. exponential random variables each with unit rate (or mean 1). Moreover, (2.65) and (3.26) show that a sum of $n$ independent Gamma random variables specified by $\beta_k$ (but with same $\alpha$) is again a Gamma random variable with $\beta = \sum_{k=1}^{n} \beta_k$.
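The link between the Erlang tail and the Poisson distribution function can be checked numerically. The sketch below is only an illustration: the midpoint rule, the function names and the values of the rate, $n$ and $x$ are assumptions. It integrates the density (3.24) and compares the resulting tail with $\Pr[Y \le n - 1]$ for a Poisson random variable $Y$ with mean $\alpha x$.

```python
import math

def erlang_pdf(t, rate, n):
    """Density (3.24) of the sum of n i.i.d. exponentials with the given rate."""
    return rate * (rate * t) ** (n - 1) * math.exp(-rate * t) / math.factorial(n - 1)

def erlang_cdf_numeric(x, rate, n, steps=20000):
    """Midpoint-rule integral of the Erlang density over [0, x]."""
    h = x / steps
    return h * sum(erlang_pdf((i + 0.5) * h, rate, n) for i in range(steps))

def poisson_cdf(m, lam):
    """Pr[Y <= m] for a Poisson random variable Y with mean lam."""
    return sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(m + 1))

rate, n, x = 1.5, 4, 2.0                      # assumed illustrative values
tail_numeric = 1.0 - erlang_cdf_numeric(x, rate, n)
tail_poisson = poisson_cdf(n - 1, rate * x)
print(tail_numeric, tail_poisson)
```

Both numbers estimate $\Pr[X > x]$; they differ only by the integration error.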
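Finally, all centered moments follow from (2.39) by series expansion around $z = 0$ as
$$ E[(X - a)^n] = (-1)^n \left. \frac{d^n}{dz^n} \left( e^{za} \left(1 + \frac{z}{\alpha}\right)^{-\beta} \right) \right|_{z=0} = (-1)^n n! \, a^n \sum_{m=0}^{n} \binom{-\beta}{m} \frac{(a\alpha)^{-m}}{(n - m)!} $$
In particular, since $E[X] = \mu = \frac{\beta}{\alpha}$, we find with $\binom{-z}{m} = (-1)^m \frac{\Gamma(z + m)}{m! \, \Gamma(z)}$
$$ E[(X - \mu)^n] = (-1)^n n! \frac{\beta^n}{\alpha^n} \sum_{m=0}^{n} \binom{-\beta}{m} \frac{\beta^{-m}}{(n - m)!} $$
$$ = (-1)^n \frac{\beta^n}{\alpha^n} \sum_{m=0}^{n} \binom{n}{m} (-1)^m \frac{\Gamma(\beta + m)}{\Gamma(\beta) \beta^m} $$
$$ = (-1)^n \frac{\beta^n}{\alpha^n} (-\beta)^{\beta} \, U(\beta, \beta + 1 + n, -\beta) $$
where $U(a, b, z)$ is the confluent hypergeometric function (Abramowitz and Stegun, 1968, Chapter 13). For example, if $n = 2$, the variance equals $\sigma^2 = \frac{\beta}{\alpha^2}$ and further, $E\left[(X - \mu)^3\right] = \frac{2\beta}{\alpha^3}$, $E\left[(X - \mu)^4\right] = \frac{3\beta(\beta + 2)}{\alpha^4}$ and $E\left[(X - \mu)^5\right] = \frac{4\beta(5\beta + 6)}{\alpha^5}$.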
3.3.2 The sum of independent uniform random variables
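The sum $S_k = \sum_{j=1}^{k} U_j$ of $k$ i.i.d. uniform random variables $U_j$ has as density function the $k$-fold convolution of the uniform density function $f_U(x) = 1_{0 \le x \le 1}$ on $[0, 1]$, denoted by $f_U^{(k*)}(x)$. The distribution function equals, for $0 \le x \le k$ and with $[x]$ the integer part of $x$,
$$ \Pr[S_k \le x] = \sum_{j=0}^{[x]} (-1)^j \frac{(x - j)^k}{j! (k - j)!} \tag{3.29} $$
Indeed, from (2.66) and (3.13) the Laplace transform of $S_k$ is
$$ \varphi_{S_k}(z) = \left(\frac{1 - e^{-z}}{z}\right)^k $$
The inverse Laplace transform determines, for $c > 0$,
$$ f_U^{(k*)}(x) = \frac{d}{dx} \Pr[S_k \le x] = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \left(\frac{1 - e^{-z}}{z}\right)^k e^{zx} \, dz $$
Using $\left(1 - e^{-z}\right)^k = \sum_{j=0}^{k} \binom{k}{j} (-1)^j e^{-jz}$ and the integral $\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{sa}}{s^{n+1}} \, ds = \frac{a^n}{n!} 1_{\mathrm{Re}(a) > 0}$, yields
$$ f_U^{(k*)}(x) = \sum_{j=0}^{k} \binom{k}{j} (-1)^j \frac{(x - j)^{k-1}}{(k-1)!} 1_{(x - j) \ge 0} \tag{3.30} $$
from which (3.29) follows by integration.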
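The distribution function (3.29) is straightforward to evaluate. The sketch below is only an illustration: the function name and the clamping outside $[0, k]$ are assumptions added for the example. It uses the equivalent form $\frac{1}{k!} \sum_j (-1)^j \binom{k}{j} (x - j)^k$ and checks it on values where symmetry fixes the answer.

```python
import math

def uniform_sum_cdf(x, k):
    """Pr[S_k <= x] for the sum of k i.i.d. uniform [0,1] variables, from (3.29)."""
    if x <= 0:
        return 0.0
    if x >= k:
        return 1.0
    total = 0.0
    for j in range(int(math.floor(x)) + 1):
        total += (-1) ** j * math.comb(k, j) * (x - j) ** k
    return total / math.factorial(k)

# The density of S_k is symmetric around k/2, so the distribution function is 1/2 there
print(uniform_sum_cdf(0.5, 1), uniform_sum_cdf(1.0, 2), uniform_sum_cdf(1.5, 3))
```

For $k = 1$ the function reduces to the uniform distribution function $x$ on $[0, 1]$.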
3.3.3 The chi-square distribution
Suppose that the total error of $n$ independent measurements $X_k$, each perturbed by Gaussian noise, has to be determined. In order to prevent that errors may cancel out, the sum of the squared errors $S = \sum_{k=1}^{n} e_k^2$ is preferred rather than $\sum_{k=1}^{n} |e_k|$. For simplicity, we assume that all errors $e_k = X_k - x_k$, where $x_k$ is the exact value of quantity $k$, have zero mean and unit variance. The corresponding distribution of $S$ is known as the chi-square distribution. From the $\chi^2$-distribution, the $\chi^2$-test in statistics is deduced, which determines the goodness of fit of a model of a distribution to a set of measurements. We refer for a discussion of the $\chi^2$-test to Leon-Garcia (1994, Section 3.8) or Allen (1978, Section 8.4).
We first deduce the distribution of the square $Y = X^2$ of a random variable $X$ and note that if $U$ and $V$ are independent so are the random variables $g(U)$ and $h(V)$. The event $\{Y \le y\}$ or $\{X^2 \le y\}$ is equivalent to $\{-\sqrt{y} \le X \le \sqrt{y}\}$ and non-existent if $y < 0$. With (2.29) and $y \ge 0$,
$$ \Pr[Y \le y] = \Pr\left[-\sqrt{y} \le X \le \sqrt{y}\right] = F_X(\sqrt{y}) - F_X(-\sqrt{y}) $$
and, after differentiation,
$$ f_{X^2}(x) = \frac{f_X(\sqrt{x}) + f_X(-\sqrt{x})}{2\sqrt{x}} $$
If $X$ is a Gaussian random variable $N(\mu, \sigma^2)$, then the density is, for $x \ge 0$,
$$ f_{X^2}(x) = \frac{\exp\left[-\frac{x + \mu^2}{2\sigma^2}\right]}{\sigma \sqrt{2\pi x}} \cosh\left(\frac{\mu \sqrt{x}}{\sigma^2}\right) $$
In particular, for $N(0, 1)$ random variables where $\mu = 0$ and $\sigma = 1$, $f_{X^2}(x) = \frac{e^{-x/2}}{\sqrt{2\pi x}}$ reduces to a Gamma distribution (3.25) with $\beta = \frac{1}{2}$ and $\alpha = \frac{1}{2}$. Since the sum of $n$ independent Gamma random variables with $(\alpha, \beta)$ is again a Gamma random variable $(\alpha, n\beta)$, we arrive at the chi-square $\chi^2$ probability density function,
$$ f_{\chi^2}(x) = \frac{x^{\frac{n}{2} - 1}}{2^{\frac{n}{2}} \Gamma\left(\frac{n}{2}\right)} e^{-\frac{x}{2}} \tag{3.31} $$
3.4 Functions of random variables
3.4.1 The maximum and minimum of a set of independent
random variables
The minimum of $m$ independent random variables $\{X_k\}_{1 \le k \le m}$ possesses the distribution$^4$
$$ \Pr\left[\min_{1 \le k \le m} X_k \le x\right] = \Pr[\text{at least one } X_k \le x] = \Pr[\text{not all } X_k > x] $$
or
$$ \Pr\left[\min_{1 \le k \le m} X_k \le x\right] = 1 - \prod_{k=1}^{m} \Pr[X_k > x] \tag{3.32} $$
whereas for the maximum,
$$ \Pr\left[\max_{1 \le k \le m} X_k > x\right] = \Pr[\text{not all } X_k \le x] = 1 - \prod_{k=1}^{m} \Pr[X_k \le x] $$
or
$$ \Pr\left[\max_{1 \le k \le m} X_k \le x\right] = \prod_{k=1}^{m} \Pr[X_k \le x] \tag{3.33} $$
For example, the distribution function for the minimum of $m$ independent exponential random variables follows from (3.17) as
$$ \Pr\left[\min_{1 \le k \le m} X_k \le x\right] = 1 - \prod_{k=1}^{m} e^{-\alpha_k x} = 1 - \exp\left(-x \sum_{k=1}^{m} \alpha_k\right) $$
or, the minimum of $m$ independent exponential random variables each with rate $\alpha_k$ is again an exponential random variable with rate $\sum_{k=1}^{m} \alpha_k$. In addition to the memoryless property, this property of the exponential distribution will determine the fundamentals of Markov chains.
$^4$ An alternative argument for independent random variables is that the event $\{\min_{1 \le k \le m} X_k > x\}$ is possible if and only if $\{X_k > x\}$ for each $1 \le k \le m$. Similarly, the event $\{\max_{1 \le k \le m} X_k \le x\}$ is possible if and only if $\{X_k \le x\}$ for each $1 \le k \le m$.
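The statement that the minimum of independent exponential random variables is exponential with the summed rate can be illustrated by simulation. The sketch below is only an illustration: the three rates, the seed and the number of trials are assumptions. It uses Python's `random.expovariate` and compares the sample mean of the minimum with the reciprocal of the summed rate.

```python
import random

def simulate_min_mean(rates, trials, seed=1):
    """Sample mean of min_k X_k for independent exponentials with the given rates."""
    rng = random.Random(seed)       # fixed seed, an arbitrary choice
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(r) for r in rates)
    return total / trials

rates = [0.5, 1.0, 2.5]             # assumed rates alpha_k
estimate = simulate_min_mean(rates, trials=50000)
expected = 1.0 / sum(rates)         # mean of an exponential with rate sum(alpha_k)
print(estimate, expected)
```

A fuller check would compare the whole empirical distribution with $1 - \exp\left(-x \sum_k \alpha_k\right)$; the mean is used here only to keep the example short.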
3.4.2 Order statistics
The set $X_{(1)}, X_{(2)}, \ldots, X_{(m)}$ are called the order statistics of the set of random variables $\{X_k\}_{1 \le k \le m}$ if $X_{(k)}$ is the $k$-th smallest value of the set $\{X_k\}_{1 \le k \le m}$. Clearly, $X_{(1)} = \min_{1 \le k \le m} X_k$ while $X_{(m)} = \max_{1 \le k \le m} X_k$. If the set $\{X_k\}_{1 \le k \le m}$ consists of i.i.d. random variables with pdf $f_X$, the joint density function of the order statistics is, for only $x_1 < x_2 < \cdots < x_m$,
$$ f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \frac{\partial^m}{\partial x_1 \cdots \partial x_m} \Pr\left[X_{(1)} \le x_1, \ldots, X_{(m)} \le x_m\right] = m! \prod_{j=1}^{m} f_X(x_j) \tag{3.34} $$
Indeed, confining to discrete random variables for simplicity, if $x_1 < x_2 < \cdots < x_m$, then
$$ \Pr\left[X_{(1)} = x_1, \ldots, X_{(m)} = x_m\right] = m! \Pr[X_1 = x_1, \ldots, X_m = x_m] $$
else
$$ \Pr\left[X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(m)} = x_m\right] = 0 $$
because there are precisely $m!$ permutations of the set $\{X_k\}_{1 \le k \le m}$ onto the given ordered sequence $\{x_1, x_2, \ldots, x_m\}$. If the sequence is not ordered such that $x_k > x_l$ for at least one couple of indices $k < l$, then the probability is zero because the event $\left\{X_{(k)} > X_{(l)}\right\}$ is, by definition, impossible. Finally, the product in (3.34) follows by independence.
If the set $\{X_k\}_{1 \le k \le m}$ is uniformly distributed over $[0, t]$, then
$$ f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \begin{cases} \frac{m!}{t^m} & 0 \le x_1 < x_2 < \cdots < x_m \le t \\ 0 & \text{elsewhere} \end{cases} $$
while for exponential random variables with $f_X(x) = \alpha e^{-\alpha x}$
$$ f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \begin{cases} m! \, \alpha^m e^{-\alpha \sum_{j=1}^{m} x_j} & 0 \le x_1 < x_2 < \cdots < x_m \\ 0 & \text{elsewhere} \end{cases} $$
The order relation between the set $X_{(1)} \le X_{(2)} \le \cdots \le X_{(m)}$ is preserved after a continuous, non-decreasing transform $g$, i.e. $g\left(X_{(1)}\right) \le g\left(X_{(2)}\right) \le \cdots \le g\left(X_{(m)}\right)$. If the distribution function $F_X$ is continuous (it is always non-decreasing), the argument shows that the order statistics of a general set of i.i.d. random variables $\{X_k\}_{1 \le k \le m}$ can be reduced to a study of the order statistics of the set of i.i.d. uniform random variables $\{U_k\}_{1 \le k \le m}$ on $[0, 1]$ because $U = F_X(X)$.
The event $\left\{X_{(k)} \le x\right\}$ means that at least $k$ among the $m$ random variables $\{X_j\}_{1 \le j \le m}$ are smaller than $x$. Since each of the $m$ random variables is chosen independently from a same distribution $F_X$, the probability that precisely $n$ of the $m$ random variables are smaller than $x$ is binomially distributed with parameter $p = \Pr[X \le x]$. Hence,
$$ \Pr\left[X_{(k)} \le x\right] = \sum_{n=k}^{m} \binom{m}{n} (\Pr[X \le x])^n (1 - \Pr[X \le x])^{m-n} \tag{3.35} $$
The probability density function can be obtained in the usual, though cumbersome, way by differentiation,
$$ f_{X_{(k)}}(x) = \frac{d \Pr\left[X_{(k)} \le x\right]}{dx} = \sum_{n=k}^{m} \binom{m}{n} \frac{d}{dx} \left\{ (\Pr[X \le x])^n (1 - \Pr[X \le x])^{m-n} \right\} $$
$$ = f_X(x) \sum_{n=k}^{m} \binom{m}{n} n \, (\Pr[X \le x])^{n-1} (1 - \Pr[X \le x])^{m-n} - f_X(x) \sum_{n=k}^{m} \binom{m}{n} (m - n) (\Pr[X \le x])^{n} (1 - \Pr[X \le x])^{m-n-1} $$
Using $n \binom{m}{n} = m \binom{m-1}{n-1}$, $(m - n) \binom{m}{n} = m \binom{m-1}{n}$ and lowering the upper index in the last summation, we have
$$ f_{X_{(k)}}(x) = m f_X(x) \sum_{n=k}^{m} \binom{m-1}{n-1} (\Pr[X \le x])^{n-1} (1 - \Pr[X \le x])^{m-n} - m f_X(x) \sum_{n=k}^{m-1} \binom{m-1}{n} (\Pr[X \le x])^{n} (1 - \Pr[X \le x])^{m-n-1} $$
$$ = m f_X(x) \sum_{n=k}^{m} \binom{m-1}{n-1} (\Pr[X \le x])^{n-1} (1 - \Pr[X \le x])^{m-n} - m f_X(x) \sum_{n=k+1}^{m} \binom{m-1}{n-1} (\Pr[X \le x])^{n-1} (1 - \Pr[X \le x])^{m-n} $$
or, with $F_X(x) = \Pr[X \le x]$,
$$ f_{X_{(k)}}(x) = m f_X(x) \binom{m-1}{k-1} (F_X(x))^{k-1} (1 - F_X(x))^{m-k} \tag{3.36} $$
The more elegant and faster argument is as follows: in order for $X_{(k)}$ to be equal to $x$, exactly $k - 1$ of the $m$ random variables $\{X_j\}_{1 \le j \le m}$ must be less than $x$, one equal to $x$ and the other $m - k$ must all be greater than $x$. Abusing the notation $f_{X_{(k)}}(x) = \Pr\left[X_{(k)} = x\right]$ and observing that $m \binom{m-1}{k-1} = \frac{m!}{1! (k-1)! (m-k)!}$ is an instance of the multinomial coefficient $\frac{m!}{n_1! n_2! \cdots n_k!}$, which gives the number of ways of putting $m = n_1 + n_2 + \cdots + n_k$ different objects into $k$ different boxes with $n_j$ in the $j$-th box, leads alternatively to (3.36).
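The consistency of (3.35) and (3.36) can be checked for uniform random variables on $[0, 1]$, where $F_X(x) = x$ and $f_X(x) = 1$. The sketch below is only an illustration: the midpoint rule, the helper names and the values $k = 2$, $m = 5$ are assumptions. It also compares the numerical mean with $\frac{k}{m+1}$, a standard result for uniform order statistics that is not derived in the text.

```python
import math

def order_stat_cdf(x, k, m):
    """(3.35) for uniform [0,1] parents, where Pr[X <= x] = x."""
    return sum(math.comb(m, n) * x**n * (1 - x) ** (m - n) for n in range(k, m + 1))

def order_stat_pdf(x, k, m):
    """(3.36) for uniform [0,1] parents, where f_X(x) = 1."""
    return m * math.comb(m - 1, k - 1) * x ** (k - 1) * (1 - x) ** (m - k)

def integrate(func, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

k, m, x = 2, 5, 0.3                                   # assumed illustrative values
cdf_direct = order_stat_cdf(x, k, m)
cdf_from_pdf = integrate(lambda u: order_stat_pdf(u, k, m), 0.0, x)
mean_numeric = integrate(lambda u: u * order_stat_pdf(u, k, m), 0.0, 1.0)
print(cdf_direct, cdf_from_pdf, mean_numeric, k / (m + 1))
```

Integrating the density (3.36) up to $x$ reproduces the binomial sum (3.35).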
3.5 Examples of other distributions
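1. The Gumbel distribution appears in the theory of extremes (see Section 6.4) and is defined by the distribution function
$$ F_{\text{Gumbel}}(x) = e^{-e^{-a(x - b)}} \tag{3.37} $$
The corresponding Laplace transform is
$$ \varphi_{\text{Gumbel}}(z) = \int_{-\infty}^{\infty} e^{-zt} e^{-e^{-a(t - b)}} a e^{-a(t - b)} \, dt = e^{-bz} \, \Gamma\left(1 + \frac{z}{a}\right) $$
from which the mean follows as $E[X] = -\left. \frac{d}{dz} \left( e^{-bz} \, \Gamma\left(1 + \frac{z}{a}\right) \right) \right|_{z=0} = b + \frac{\gamma}{a}$, where $\gamma = 0.57721\ldots$ is the Euler constant. The variance is best computed with (2.43), resulting in $\mathrm{Var}[X] = \frac{\pi^2}{6a^2}$.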
2. The Cauchy distribution has the probability density function
\[
f_{\text{Cauchy}}(x) = \frac{1}{\pi (1 + x^2)} \tag{3.38}
\]
and corresponding distribution,
\[
F_{\text{Cauchy}}(x) = \frac{1}{\pi}\left(\frac{\pi}{2} + \arctan x\right)
\]
The Laplace transform
\[
\varphi_{\text{Cauchy}}(z) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{-zx}\, dx}{1 + x^2}
\]
only converges for purely imaginary $z = i\omega$, in which case it reduces to a Fourier transform,
\[
\varphi_{\text{Cauchy}}(i\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{-i\omega x}\, dx}{1 + x^2}
\]
This integral is best evaluated by contour integration. If $\omega \ge 0$, we consider a contour $C$ consisting of the real axis and the semi-circle that encloses the negative $\text{Im}(x)$-plane,
\[
\oint_C \frac{e^{-i\omega x}\, dx}{1 + x^2} = \int_{-\infty}^{\infty} \frac{e^{-i\omega x}\, dx}{1 + x^2} + \lim_{r \to \infty} \int_0^{\pi} \frac{e^{-i\omega r e^{-i\theta}}\, d\!\left(r e^{-i\theta}\right)}{1 + r^2 e^{-2i\theta}}
\]
Since $\left| e^{-i\omega r e^{-i\theta}} \right| = e^{-\omega r \sin\theta} = e^{-|\omega| r \sin\theta}$ and $\sin\theta \ge 0$ for $0 \le \theta \le \pi$, the limit of the last integral vanishes. The contour encloses the simple pole (zero of $x^2 + 1 = (x - i)(x + i)$) at $x = -i$. Applying Cauchy's residue theorem, we obtain
\[
\int_{-\infty}^{\infty} \frac{e^{-i\omega x}\, dx}{1 + x^2} = -2\pi i \lim_{x \to -i} \frac{e^{-i\omega x}(x + i)}{1 + x^2} = \pi e^{-\omega}
\]
If $\omega \le 0$, we close the contour over the positive $\text{Im}(x)$-plane such that the contribution of the semi-circle to the contour $C$ again vanishes. The resulting contour then encloses the simple pole at $x = i$ and
\[
\int_{-\infty}^{\infty} \frac{e^{-i\omega x}\, dx}{1 + x^2} = 2\pi i \lim_{x \to i} \frac{e^{-i\omega x}(x - i)}{1 + x^2} = \pi e^{\omega}
\]
Combining both expressions results in
\[
\varphi_{\text{Cauchy}}(i\omega) = E\left[e^{-i\omega X}\right] = e^{-|\omega|}
\]
Since $|\omega|$ is not analytic around $\omega = 0$, none of the moments of the Cauchy distribution exists! Hence, the Cauchy distribution is an example of a distribution without mean (see the requirement for the existence of the expectation in Section 2.3.2), although the improper integral $\int_{-\infty}^{\infty} \frac{x\, dx}{1 + x^2} = 0$ due to symmetry (in the Riemann sense), but both $\int_{-\infty}^{0} \frac{x\, dx}{1 + x^2}$ and $\int_{0}^{\infty} \frac{x\, dx}{1 + x^2}$ diverge.
In addition, if $S_n = \sum_{k=1}^{n} X_k$ is the sum of i.i.d. Cauchy random variables $X_k$, the sample mean $\frac{S_n}{n}$ has the Fourier transform,
\[
E\left[e^{-i\omega \frac{S_n}{n}}\right] = E\left[e^{-i\frac{\omega}{n} \sum_{k=1}^{n} X_k}\right] = \prod_{k=1}^{n} E\left[e^{-i\frac{\omega}{n} X_k}\right] = \left(E\left[e^{-i\frac{\omega}{n} X}\right]\right)^{n} = e^{-|\omega|}
\]
Hence, the sample mean $\frac{S_n}{n}$ of i.i.d. Cauchy random variables is again a Cauchy random variable independent of $n$. This means that the law of large numbers (see Section 6.2) does not hold for the Cauchy random variable, as a consequence of the non-existence of the mean. Also, the sum $S_n$ has Fourier transform $e^{-|n\omega|}$ and the pdf equals $f_{S_n}(x) = \frac{1}{n\pi\left(1 + (x/n)^2\right)}$.
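The failure of the law of large numbers can be observed numerically. The sketch below is an illustration under assumed parameters (function name, number of runs and seed are not from the text): it estimates the interquartile range of the sample mean $S_n/n$ for several $n$. Since the quartiles of a standard Cauchy random variable lie at $\pm 1$, the interquartile range should stay near 2 for every $n$ instead of shrinking.

```python
import numpy as np


def sample_mean_iqr(n, runs=4000, seed=7):
    """Interquartile range of the sample mean of n i.i.d. standard Cauchy draws."""
    rng = np.random.default_rng(seed)
    means = rng.standard_cauchy((runs, n)).mean(axis=1)
    q1, q3 = np.percentile(means, [25, 75])
    return q3 - q1


for n in (1, 10, 1000):
    # The spread does not decrease with n, in contrast to finite-variance cases.
    print(n, sample_mean_iqr(n))
```

The interquartile range is used here instead of the sample variance because the variance of a Cauchy random variable does not exist.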
3. The Weibull distribution with pdf defined for $x \ge 0$ and $a, b > 0$
\[
f_{\text{Weibull}}(x) = \frac{\exp\left(-\left(\frac{x}{a}\right)^{b}\right)}{a\, \Gamma\!\left(1 + \frac{1}{b}\right)} \tag{3.39}
\]
generalizes the exponential distribution (3.17) corresponding to $b = 1$ and $a = \frac{1}{\alpha}$. It is related to the Gaussian distribution if $b = 2$. Let $X$ be a Weibull random variable. All higher moments can be computed from (2.34) as
\[
E\left[X^k\right] = \frac{1}{a\, \Gamma\!\left(1 + \frac{1}{b}\right)} \int_0^{\infty} x^k \exp\left(-\left(\frac{x}{a}\right)^{b}\right) dx = \frac{a^k\, \Gamma\!\left(\frac{k+1}{b}\right)}{\Gamma\!\left(\frac{1}{b}\right)}
\]
The generating function possesses the expansion
\[
\varphi_X(z) = E\left[e^{-zX}\right] = \sum_{k=0}^{\infty} \frac{(-z)^k}{k!} E\left[X^k\right] = \frac{1}{\Gamma\!\left(\frac{1}{b}\right)} \sum_{k=0}^{\infty} \Gamma\!\left(\frac{k+1}{b}\right) \frac{(-za)^k}{k!}
\]
which cannot be summed in explicit form for general $b$.
Sometimes an alternative definition of the Weibull distribution appears
\[
f_{\text{Weibull}}(x) = a b\, x^{b-1} e^{-a x^{b}} \tag{3.40}
\]
\[
F_{\text{Weibull}}(x) = 1 - e^{-a x^{b}}
\]
with the advantage of a simpler expression for the distribution function $F_{\text{Weibull}}(x)$. If $X$ possesses this probability density (3.40), the moments and variance are
\[
E\left[X^k\right] = \int_0^{\infty} x^k f_{\text{Weibull}}(x)\, dx = \frac{\Gamma\!\left(\frac{k}{b} + 1\right)}{a^{k/b}}
\]
\[
\text{Var}[X] = \int_0^{\infty} (x - E[X])^2 f_{\text{Weibull}}(x)\, dx = \frac{\Gamma\!\left(1 + \frac{2}{b}\right) - \Gamma^2\!\left(1 + \frac{1}{b}\right)}{a^{2/b}}
\]
The interest of the Weibull distribution in the Internet stems from the self-similar and long-range dependence of observables (i.e. quantities that can be measured such as the delay, the interarrival times of packets, etc.). Especially if the shape factor $b \in (0, 1)$, the Weibull has a sub-exponential tail that decays more slowly than an exponential, but still faster than any power law.
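4. Power law behavior is often described via the Pareto distribution with pdf for $x \ge 0$ and $\tau > 0$,
\[
f_{\text{Pareto}}(x) = \frac{\alpha}{\tau} \left(1 + \frac{x}{\tau}\right)^{-\alpha-1} \tag{3.41}
\]
and with distribution function
\[
F_{\text{Pareto}}(x) = \frac{\alpha}{\tau} \int_0^{x} \left(1 + \frac{t}{\tau}\right)^{-\alpha-1} dt = 1 - \left(1 + \frac{x}{\tau}\right)^{-\alpha} \tag{3.42}
\]
Since $\lim_{x \to \infty} F(x) = 1$, the power $\alpha$ must exceed 0. The higher moments are Beta-functions (Abramowitz and Stegun, 1968, Section 6.2.1)
\[
E\left[X^k\right] = \frac{\alpha}{\tau} \int_0^{\infty} \frac{x^k\, dx}{\left(1 + \frac{x}{\tau}\right)^{\alpha+1}} = k!\, \tau^k\, \frac{\Gamma(\alpha - k)}{\Gamma(\alpha)}
\]
and show that $E\left[X^k\right]$ only exists if $\alpha > k$. Hence, the mean $E[X]$ only exists if $\alpha > 1$. The deep tail asymptotic for large $x$ is $f_{\text{Pareto}}(x) = O\left(x^{-\alpha-1}\right)$ and $\Pr[X > x] = O\left(x^{-\alpha}\right)$. For example, the distribution of the nodal degree in the Internet has an exponent around $\alpha = 2.4$ (see Section 15.3).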
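5. Another distribution with heavy tails is the lognormal distribution defined as the random variable $X = e^{Y}$ where $Y = N\left(\mu, \sigma^2\right)$ is a Gaussian or normal random variable. From (2.32), it follows that $\Pr\left[e^{Y} \le x\right] = \Pr[Y \le \log x]$ for $x \ge 0$, and with (3.20)
\[
F_{\text{lognormal}}(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\log x} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt \tag{3.43}
\]
and, for $x > 0$,
\[
f_{\text{lognormal}}(x) = \frac{\exp\left[-\frac{(\log x - \mu)^2}{2\sigma^2}\right]}{\sigma x \sqrt{2\pi}} \tag{3.44}
\]
The moments are
\[
E\left[X^k\right] = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{\infty} x^{k-1} \exp\left[-\frac{(\log x - \mu)^2}{2\sigma^2}\right] dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ku} \exp\left[-\frac{(u - \mu)^2}{2\sigma^2}\right] du
\]
or, explicitly,
\[
E\left[X^k\right] = \exp(k\mu) \exp\left(\frac{k^2\sigma^2}{2}\right) \tag{3.45}
\]
and
\[
\text{Var}[X] = e^{2\mu}\left(e^{2\sigma^2} - e^{\sigma^2}\right) \tag{3.46}
\]
The probability generating function is by definition (2.37)
\[
\varphi_X\left(z; \mu, \sigma^2\right) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{\infty} e^{-zt}\, \frac{e^{-\frac{(\log t - \mu)^2}{2\sigma^2}}}{t}\, dt = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z e^{x}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx \tag{3.47}
\]
The integral (3.47) indicates that $\varphi_X\left(z; \mu, \sigma^2\right)$ only exists for $\text{Re}(z) \ge 0$. This means that $\varphi_X\left(z; \mu, \sigma^2\right)$ is not analytic at any point $z = it$ on the imaginary axis because the circle with arbitrary small but non-zero radius around $z = it$ necessarily encircles points with $\text{Re}(z) < 0$ where $\varphi_X\left(z; \mu, \sigma^2\right)$ does not exist. Hence, the Taylor expansion (2.40) of the generating function around $z = 0$ does not exist, although all moments or derivatives at $z = 0$ exist. Indeed, the series
\[
\sum_{k=0}^{\infty} \frac{(-1)^k E\left[X^k\right]}{k!} z^k = \sum_{k=0}^{\infty} \frac{\left(-z e^{\mu}\right)^k}{k!} \exp\left(\frac{k^2\sigma^2}{2}\right)
\]
is a divergent series (except for $\sigma = 0$ or $z = 0$). The fact that the pgf (3.47) is not available in closed form complicates the computation of the sum of i.i.d. lognormal random variables via (2.66). This sum appears in radio communications with several transmitters and receivers.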
In radio communications, the received signal levels decrease with the distance between the transmitter and the receiver. This phenomenon is called pathloss. Attenuation of radio signals due to pathloss has been modeled by averaging the measured signal powers over long times and over various locations with the same distances to the transmitter. The mean value of the signal power found in this way is referred to as the area mean power $P_a$ (in Watts) and is well-modeled as $P_a(r) = c \cdot r^{-\eta}$ where $c$ is a constant and $\eta$ is the pathloss exponent (footnote 5). In reality the received power levels may vary significantly around the mean power $P_a(r)$ due to irregularities in the surroundings of the receiving and transmitting antennas. Measurements have revealed that the logarithm of the mean power $P(r)$ at different locations on a circle with radius $r$ around the transmitter is approximately normally distributed with mean equal to the logarithm of the area mean power $P_a(r)$. The lognormal shadowing model assumes that the logarithm of $P(r)$ is precisely normally distributed around the logarithmic value of the area mean power: $\log_{10}(P(r)) = \log_{10}(P_a(r)) + X$, where $X = N(0, \sigma)$ is a zero-mean normal distributed random variable (in dB) with standard deviation $\sigma$ (also in dB and for severe fluctuations up to 12 dB). Hence, the random variable $P(r) = P_a(r)\, 10^{X}$ has a lognormal distribution (3.43) equal to
\[
\Pr[P(r) \le x] = \Pr\left[X \le \log_{10} \frac{x}{P_a(r)}\right] = \frac{1}{\sigma\sqrt{2\pi}\, \log 10} \int_0^{x} \exp\left[-\frac{\left(\log_{10} u - \log_{10}(P_a(r))\right)^2}{2\sigma^2}\right] \frac{du}{u}
\]
3.6 Summary tables of probability distributions
3.6.1 Discrete random variables
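| Name | $\Pr[X = k]$ | $E[X]$ | $\text{Var}[X]$ | $\varphi_X(z) = E\left[z^X\right]$ |
|---|---|---|---|---|
| Bernoulli | $\Pr[X = 1] = p$ | $p$ | $p(1-p)$ | $1 - p + pz$ |
| Binomial | $\binom{n}{k} p^k (1-p)^{n-k}$ | $np$ | $np(1-p)$ | $((1-p) + pz)^n$ |
| Geometric | $p(1-p)^{k-1}$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | $\frac{pz}{1-(1-p)z}$ |
| Poisson | $\frac{\lambda^k}{k!} e^{-\lambda}$ | $\lambda$ | $\lambda$ | $e^{\lambda(z-1)}$ |

Footnote 5: The constant $c$ depends on the transmitted power, the receiver and the transmitter antenna gains and the wavelength. The pathloss exponent $\eta$ depends on the environment and terrain structure and can vary between 2 in free space to 6 in urban areas.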
3.6.2 Continuous random variables
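| Name | $f_X(x)$ | $E[X]$ | $\text{Var}[X]$ | $\varphi_X(z) = E\left[e^{-zX}\right]$ |
|---|---|---|---|---|
| Uniform | $\frac{1_{a \le x \le b}}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | $\frac{e^{-za} - e^{-zb}}{z(b-a)}$ |
| Exponential | $\alpha e^{-\alpha x}$ | $\frac{1}{\alpha}$ | $\frac{1}{\alpha^2}$ | $\frac{\alpha}{z+\alpha}$ |
| Gaussian | $\frac{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]}{\sigma\sqrt{2\pi}}$ | $\mu$ | $\sigma^2$ | $\exp\left[\frac{\sigma^2 z^2}{2} - \mu z\right]$ |
| Gamma | $\frac{\alpha(\alpha x)^{\beta-1}}{\Gamma(\beta)} e^{-\alpha x}$ | $\frac{\beta}{\alpha}$ | $\frac{\beta}{\alpha^2}$ | $\left(\frac{\alpha}{z+\alpha}\right)^{\beta}$ |
| Gumbel | $e^{-x} e^{-e^{-x}}$ | $\gamma = 0.5772\ldots$ | $\frac{\pi^2}{6}$ | $\Gamma(z+1)$ |
| Cauchy | $\frac{1}{\pi(1+x^2)}$ | does not exist | does not exist | $e^{-\lvert \text{Im}(z) \rvert}$ ($\text{Re}(z) = 0$) |
| Weibull | $a b\, x^{b-1} e^{-a x^{b}}$ | $\frac{\Gamma\left(1+\frac{1}{b}\right)}{a^{1/b}}$ | $\frac{\Gamma\left(1+\frac{2}{b}\right) - \Gamma^2\left(1+\frac{1}{b}\right)}{a^{2/b}}$ | |
| Pareto | $\frac{\alpha}{\tau}\left(1+\frac{x}{\tau}\right)^{-\alpha-1}$ | $\frac{\tau\, 1_{\{\alpha>1\}}}{\alpha-1}$ | $\frac{\alpha\tau^2\, 1_{\{\alpha>2\}}}{(\alpha-1)^2(\alpha-2)}$ | |
| Lognormal | $\frac{\exp\left[-\frac{(\log x-\mu)^2}{2\sigma^2}\right]}{\sigma x\sqrt{2\pi}}$ | $\exp(\mu)\exp\left(\frac{\sigma^2}{2}\right)$ | $e^{2\mu}\left(e^{2\sigma^2} - e^{\sigma^2}\right)$ | |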
3.7 Problems
(i) If $\varphi_X(z)$ is the probability generating function of a non-zero discrete random variable $X$, find an expression of $E[\log X]$ in terms of $\varphi_X(z)$.
(ii) Compute the mean value of the $k$-th order statistic in an ensemble of (a) $m$ i.i.d. exponentially distributed random variables with mean $\frac{1}{\alpha}$ and (b) $m$ i.i.d. polynomially distributed random variables on [0,1].
(iii) Discuss how a probability density function of a continuous random variable $X$ can be approximated from a set $\{x_1, x_2, \ldots, x_n\}$ of $n$ measurements or simulations.
(iv) In a circle with radius $r$ around a sending mobile node, there are $N - 1$ other mobile nodes uniformly distributed over that circle. The possible interference caused by these other mobile nodes depends on their distance to the sending node at the center. Derive for large $N$ but constant density of mobile nodes the pdf of the distance of the $m$-th nearest node to the center.
(v) Let X and Y be two independent random variables. What is the probability that the one is larger than the other?
4
Correlation
In this chapter methods to compute bi-variate correlated random variables are discussed. As a measure for the correlation, the linear correlation coefficient defined in (2.58) is used. First, the generation of $n$ correlated Gaussian random variables is explained. The sequel is devoted to the construction of two correlated random variables with arbitrary distribution.
4.1 Generation of correlated Gaussian random variables
Due to the importance of Gaussian correlated random variables as an underlying system for generating arbitrary correlated random variables, as will be demonstrated in Section 4.3, we discuss how they can be generated in multiple dimensions. With the notation of Section 3.2.3, a Gaussian (normal) random variable with average $\mu$ and variance $\sigma^2$ is denoted by $N\left(\mu, \sigma^2\right)$. By linearly combining Gaussian random variables, we can create a new Gaussian random variable with a desired mean $\mu$ and variance $\sigma^2$.
4.1.1 Generation of two independent Gaussian random variables
The fact that a linear combination of Gaussian random variables is again a Gaussian random variable allows us to concentrate on normalized Gaussian random variables $N(0, 1)$. Let $X_1$ and $X_2$ be two independent normalized Gaussian random variables. Independent random variables are not correlated and the linear correlation coefficient $\rho = 0$. The resulting joint probability distribution is $f_{X_1X_2}(x, y; \rho) = f_{X_1}(x) f_{X_2}(y)$ and with (3.19),
\[
f_{X_1X_2}(x, y; 0) = \frac{e^{-\frac{x^2 + y^2}{2}}}{2\pi}
\]
It is natural to consider a polar transformation and the transformed random variables $R = X_1^2 + X_2^2$ and $\Theta = \arctan\left(\frac{X_2}{X_1}\right)$. The inverse transform is $x = \sqrt{r}\cos\theta$ and $y = \sqrt{r}\sin\theta$, which differs slightly from the usual polar transformation in that we now define $r = x^2 + y^2$ instead of $r^2 = x^2 + y^2$. The reason is that the Jacobian is simpler for our purposes,
\[
J(r, \theta) = \det \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix} = \det \begin{bmatrix} \frac{\cos\theta}{2\sqrt{r}} & -\sqrt{r}\sin\theta \\ \frac{\sin\theta}{2\sqrt{r}} & \sqrt{r}\cos\theta \end{bmatrix} = \frac{1}{2}
\]
whereas the usual polar transformation has the Jacobian equal to the variable $r$. Using the transformation rules in Section 2.5.4,
\[
f_{R\Theta}(r, \theta) = \frac{e^{-\frac{r}{2}}}{4\pi}
\]
which shows that $f_{R\Theta}(r, \theta)$ does not depend on $\theta$. Hence, we can write $f_{R\Theta}(r, \theta) = f_R(r) f_\Theta(\theta)$ with $f_\Theta(\theta) = c$, where $c$ is a constant and $f_R(r) = \frac{e^{-\frac{r}{2}}}{4\pi c}$. This implies that $\Theta$ is a uniform random variable over an interval of length $1/c$. We also recognize from (3.15) that $f_R(r)$ is close to an exponential random variable with rate $\alpha = \frac{1}{2}$. Therefore, it is instructive to choose the constant $c$ such that $R$ is precisely an exponential random variable with rate $\alpha = \frac{1}{2}$. Thus, choosing $c = \frac{1}{2\pi}$, we end up with $f_R(r) = \frac{1}{2} e^{-\frac{r}{2}}$ and $f_\Theta(\theta) = \frac{1}{2\pi}$.
These two independent random variables $R$ and $\Theta$ can each be generated separately from a uniform random variable $U$ on [0,1], as discussed in Section 3.2.1, leading to
\[
R = -2\ln(U_1), \qquad \Theta = 2\pi U_2
\]
and, finally, to the independent Gaussian random variables
\[
X_1 = \sqrt{-2\ln(U_1)}\, \cos 2\pi U_2, \qquad X_2 = \sqrt{-2\ln(U_1)}\, \sin 2\pi U_2
\]
The procedure can be used to generate a single Gaussian random variable, but also more independent Gaussians by repeating the generation procedure.
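The generation procedure of this section can be sketched as follows. This is a minimal illustration, assuming NumPy's uniform generator as the source of $U_1$ and $U_2$; the function name, the seed and the sample size are choices made for this example.

```python
import numpy as np


def box_muller(count, seed=3):
    """Generate two arrays of independent N(0,1) draws from uniform draws."""
    rng = np.random.default_rng(seed)
    u1 = 1.0 - rng.random(count)   # shifted to (0, 1] so that log(u1) is finite
    u2 = rng.random(count)
    r = -2.0 * np.log(u1)          # R: exponential with rate 1/2
    theta = 2.0 * np.pi * u2       # Theta: uniform on [0, 2*pi)
    x1 = np.sqrt(r) * np.cos(theta)
    x2 = np.sqrt(r) * np.sin(theta)
    return x1, x2


x1, x2 = box_muller(200_000)
# Sample mean, variances and correlation should be near 0, 1, 1 and 0.
print(x1.mean(), x1.var(), x2.var(), np.corrcoef(x1, x2)[0, 1])
```

In practice a library routine for normal variates would normally be preferred; the sketch only mirrors the derivation above.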
4.1.2 The n-joint Gaussian probability distribution function
A collection of $n$ random variables $X_i$ is called a random vector $X = (X_1, X_2, \ldots, X_n)^T$, a matrix with dimension $n \times 1$. The average of a random vector is a vector with components $E[X_i]$ for $1 \le i \le n$. The variance of a random vector
\[
\text{Var}[X] = E\left[(X - E[X])(X - E[X])^T\right] = E\left[XX^T\right] - E[X]\,(E[X])^T
\]
is a matrix $\Sigma_X$ with elements $(\Sigma_X)_{i,j} = \text{Cov}[X_i, X_j]$. Since $\text{Cov}[X_i, X_j] = \text{Cov}[X_j, X_i]$, the covariance matrix $\Sigma_X$ is real and symmetric, $\Sigma_X = \Sigma_X^T$. The importance of real, symmetric matrices is that they have real eigenvalues and real eigenvectors (see Appendix A.2). Moreover, $\Sigma_X$ is non-negative definite because, using vector norms defined in Section A.3,
\[
\begin{aligned}
x^T \Sigma_X x &= E\left[x^T (X - E[X])(X - E[X])^T x\right] \\
&= E\left[\left((X - E[X])^T x\right)^T (X - E[X])^T x\right] \\
&= E\left[\left\| (X - E[X])^T x \right\|_2^2\right] \ge 0
\end{aligned}
\]
which implies that all real eigenvalues $\lambda_i$ are non-negative. Hence, there exists an orthogonal matrix $U$ such that
\[
\Sigma_X = U \text{diag}(\lambda_i)\, U^T \tag{4.1}
\]
If all random variables $X_i$ are independent, $\text{Cov}[X_i, X_j] = 0$ for $i \ne j$ and $\text{Cov}[X_i, X_i] = \text{Var}[X_i] \ge 0$, then $\Sigma_X = \text{diag}(\text{Var}[X_i])$.
Gaussian random variables are completely determined by the mean and the variance, i.e. by the first two moments. We will now show that the existence of an orthogonal transformation for any probability distribution such that $U^T \Sigma_X U = \text{diag}(\lambda_i)$ implies that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Also the reverse holds, which will be used below to generate $n$ joint correlated Gaussian random variables. The multi-dimensional generating function of an $n$-joint Gaussian or $n$-joint normal random vector $X$ is defined for the vector $z = (z_1, z_2, \ldots, z_n)^T$ as
\[
\varphi_X(z) = E\left[e^{-z^T X}\right] = \exp\left(\frac{1}{2} z^T \Sigma_X z - E[X]^T z\right) \tag{4.2}
\]
Using (4.1), and the fact that $U$ is an orthogonal matrix such that $U^{-1} = U^T$ and $UU^T = I$,
\[
\varphi_X(z) = \exp\left(\frac{1}{2} \left(U^T z\right)^T \text{diag}(\lambda_i)\, U^T z - \left(U^T E[X]\right)^T U^T z\right)
\]
Denote the vectors $w = U^T z$ and $m = U^T E[X]$. Then we have
\[
\varphi_X(z) = \exp\left(\frac{1}{2} w^T \text{diag}(\lambda_i)\, w - m^T w\right) = \exp\left(\sum_{j=1}^{n} \left(\frac{\lambda_j w_j^2}{2} - m_j w_j\right)\right) = \prod_{j=1}^{n} e^{\frac{\lambda_j w_j^2}{2} - m_j w_j}
\]
and $e^{\frac{\lambda_j w_j^2}{2} - m_j w_j} = \varphi_{X_j}(w_j)$ is the Laplace transform (3.22) of a Gaussian random variable $X_j$ because all $\lambda_j$ are real and non-negative. With (2.65), this shows that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Reversing the order of the manipulations also justifies that (4.2) indeed defines a general $n$-joint Gaussian probability generating function. If $X_1, X_2, \ldots, X_n$ are joint normal and not correlated, then $\Sigma_X$ is a diagonal matrix, which implies that $X_1, X_2, \ldots, X_n$ are independent. As discussed in Section 2.5.2, independence implies non-correlation, but the converse is generally not true. These properties make Gaussian random variables particularly suited to deal with correlations.
Fig. 4.1. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$ and $\rho = 0$.
The corresponding $n$-joint Gaussian probability density function of the vector $X$ can be derived after inverse Laplace transform for the vector $x = (x_1, x_2, \ldots, x_n)^T$ as
\[
f_X(x) = \frac{1}{\left(\sqrt{2\pi}\right)^{n} \sqrt{\det \Sigma_X}} \exp\left(-\frac{1}{2} (x - E[X])^T\, \Sigma_X^{-1}\, (x - E[X])\right) \tag{4.3}
\]
The inverse Laplace transform for $n = 2$ is computed in Section C.2.
After computing the inverse matrix and the determinant in (4.3) explicitly, the two-dimensional ($n = 2$) or bi-variate Gaussian probability density function is
\[
f_{XY}(x, y; \rho) = \frac{\exp\left[-\frac{\frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2}}{2(1 - \rho^2)}\right]}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \tag{4.4}
\]
Figures 4.1 to 4.3 plot $f_{XY}(x, y; \rho)$ for various correlation coefficients $\rho$. If $\rho = 0$, we observe that $f_{XY}(x, y; 0) = f_X(x) f_Y(y)$, which indicates that uncorrelated Gaussian random variables are also independent.
Fig. 4.2. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$ and $\rho = 0.8$.
Fig. 4.3. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$ and $\rho = -0.8$.
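If we denote $x^{*} = \frac{x - \mu_X}{\sigma_X}$ and $y^{*} = \frac{y - \mu_Y}{\sigma_Y}$, the bi-variate normal density (4.4) reduces to
\[
f_{XY}(x^{*}, y^{*}; \rho) = \frac{\exp\left[-\frac{(x^{*})^2 - 2\rho x^{*} y^{*} + (y^{*})^2}{2(1 - \rho^2)}\right]}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}
\]
from which we can verify the partial differential equation
\[
\frac{\partial f_{XY}(x^{*}, y^{*}; \rho)}{\partial \rho} = \frac{\partial^2 f_{XY}(x^{*}, y^{*}; \rho)}{\partial x^{*}\, \partial y^{*}} \tag{4.5}
\]
and the symmetry relations
\[
f_{XY}(x^{*}, y^{*}; \rho) = f_{XY}(-x^{*}, y^{*}; -\rho) = f_{XY}(x^{*}, -y^{*}; -\rho) \tag{4.6}
\]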
4.1.3 Generation of n correlated Gaussian random variables
Let $\{X_i\}_{1 \le i \le n}$ be a set of $n$ independent normal random variables, where each $X_i$ is distributed as $N(0, 1)$. The vector $X$ is rather easily simulated. The analysis above shows that $E[X] = 0$ (the null-vector) and $\Sigma_X = \text{diag}(\text{Var}[X_i]) = \text{diag}(1) = I$, the identity matrix. We want to generate the correlated normal vector $Y$ with a given mean vector $E[Y]$ and a given covariance matrix $\Sigma_Y$. Since linear combinations of normal random variables are normal random variables, we consider the linear transformation $Y = AX + B$ where $A$ and $B$ are constant matrices. We will now determine $A$ and $B$. First,
\[
E[Y] = E[AX] + E[B] = A E[X] + E[B] = E[B]
\]
Hence, the matrix $B$ is a vector with components equal to the given components $E[Y_i]$ of the mean vector $E[Y]$. Second,
\[
\begin{aligned}
\Sigma_Y &= E\left[(Y - E[Y])(Y - E[Y])^T\right] \\
&= E\left[AX (AX)^T\right] = E\left[A X X^T A^T\right] = A E\left[X X^T\right] A^T \\
&= A \Sigma_X A^T = A A^T
\end{aligned}
\]
From the eigenvalue decomposition of $\Sigma_Y = U \text{diag}(\lambda_i)\, U^T$ with real eigenvalues $\lambda_i \ge 0$ and the fact that $\text{diag}(\lambda_i) = \text{diag}\left(\sqrt{\lambda_i}\right) \left(\text{diag}\left(\sqrt{\lambda_i}\right)\right)^T$ such that
\[
A A^T = U \text{diag}\left(\sqrt{\lambda_i}\right) \left(\text{diag}\left(\sqrt{\lambda_i}\right)\right)^T U^T
\]
we obtain
\[
A = U \text{diag}\left(\sqrt{\lambda_i}\right)
\]
The matrix $A$ is also called the square root matrix of $\Sigma_Y$ and can be found from the singular value decomposition of $\Sigma_Y$ or from Cholesky factorization (Press et al., 1992).
Example Generate a normal vector $Y$ with $E[Y] = (300, 300)^T$, with standard deviations $\sigma_1 = 106.066$, $\sigma_2 = 35.355$ and correlation $\rho_Y = 0.8$.
Solution: The covariance matrix of $Y$ is obtained using the definition of the linear correlation coefficient (2.58),
\[
\Sigma_Y = \begin{pmatrix} \sigma_1^2 & \rho_Y \sigma_1 \sigma_2 \\ \rho_Y \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 11250 & 3000 \\ 3000 & 1250 \end{pmatrix}
\]
The square root matrix $A$ of $\Sigma_Y$ is
\[
A = \begin{pmatrix} 63.640 & 84.853 \\ 0 & 35.355 \end{pmatrix}
\]
which is readily checked by computing $A A^T = \Sigma_Y$. It remains to generate $m$ independent draws for $X_1$ and $X_2$ from a normal distribution with zero mean and unit variance as explained in Section 4.1.1. Each pair $(X_1, X_2)$ out of the $m$ pairs is transformed as $Y = AX + E[Y]$. The result, component $Y_2$ versus $Y_1$, is shown in Fig. 4.4.
Fig. 4.4. The scatter diagram of the simulated vector $Y$ (component $Y_2$ versus $Y_1$).
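The example can be reproduced with a short sketch. It is an illustration under stated assumptions: the function names, the seed and the number of draws are chosen here, and the square root matrix is computed from the eigenvalue decomposition $A = U \text{diag}(\sqrt{\lambda_i})$ via NumPy's `numpy.linalg.eigh`. This $A$ differs from the triangular matrix quoted in the solution, but it satisfies the same relation $A A^T = \Sigma_Y$.

```python
import numpy as np


def sqrt_matrix(cov):
    """Return A = U diag(sqrt(lambda_i)) such that A @ A.T equals cov."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return eigvecs * np.sqrt(eigvals)      # scales column j by sqrt(lambda_j)


def correlated_normals(mean, cov, count, seed=5):
    """Draw `count` samples of Y = A X + E[Y] with X i.i.d. N(0,1)."""
    rng = np.random.default_rng(seed)
    a = sqrt_matrix(cov)
    x = rng.standard_normal((len(mean), count))
    return a @ x + np.asarray(mean, dtype=float)[:, None]


mean = [300.0, 300.0]
cov = np.array([[11250.0, 3000.0], [3000.0, 1250.0]])
y = correlated_normals(mean, cov, 100_000)

print(np.cov(y))                 # should be close to cov
print(np.corrcoef(y)[0, 1])      # should be close to 0.8
```

A triangular square root can be obtained instead with `numpy.linalg.cholesky`, which returns a lower-triangular factor.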
4.2 Generation of correlated random variables
Let us consider the problem of generating two correlated random variables $X$ and $Y$ with given distribution functions $F_X$ and $F_Y$. The correlation is expressed in terms of the linear correlation coefficient $\rho(X, Y) = \rho$ defined in (2.58). The need to generate correlated random variables often occurs in simulations. For example, as shown in Kuipers and Van Mieghem (2003), correlations in the link weight structure may significantly increase the computational complexity of multi-constrained routing, called in brief QoS routing. The importance of measures of dependence between quantities in risk and finance is discussed in Embrechts et al. (2001b).
In general, given the distribution functions $F_X$ and $F_Y$, not all linear correlations from $-1 \le \rho \le 1$ are possible. Indeed, let $X$ and $Y$ be positive real random variables with infinite range, which means that $F_X(x) < 1$ for all finite $x$ (with $F_X(x) \to 1$ if $x \to \infty$) and that $F_X(x) = F_Y(x) = 0$ for $x < 0$. Consider $Y = aX + b$ with $a < 0$ and $b \ge 0$. For all finite $y < 0$,
\[
F_Y(y) = \Pr[Y \le y] = \Pr[aX + b \le y] = \Pr\left[X \ge \frac{y - b}{a}\right] \ge \Pr\left[X > \frac{y - b}{a}\right] = 1 - F_X\left(\frac{y - b}{a}\right) > 0
\]
which contradicts the fact that $F_Y(y) = 0$ for $y < 0$. Hence, positive random variables with infinite range cannot be correlated with $\rho = -1$. The requirement that the range needs to be unbounded is necessary because two uniform random variables on [0, 1], $U_1$ and $U_2$, are negatively correlated with $\rho = -1$ if $U_1 = 1 - U_2$.
In summary, the set of all possible correlations is a closed interval $[\rho_{\min}, \rho_{\max}]$ for which $\rho_{\min} < 0 < \rho_{\max}$. The precise computation of $\rho_{\min}$ and $\rho_{\max}$ is, in general, difficult, as shown below.
4.3 The non-linear transformation method
The non-linear transformation approach starts from a given set of two random variables $X_1$ and $X_2$ that have a correlation coefficient $\rho_X \in [-1, 1]$. If the joint distribution function
\[
F_{X_1X_2}(x_1, x_2; \rho_X) = \Pr[X_1 \le x_1, X_2 \le x_2] = \int_{-\infty}^{x_1} du \int_{-\infty}^{x_2} dv\, f_{X_1X_2}(u, v; \rho_X)
\]
is known, the marginal distribution follows from (2.60) as
\[
\Pr[X_1 \le x_1] = \int_{-\infty}^{x_1} du \int_{-\infty}^{\infty} dv\, f_{X_1X_2}(u, v; \rho_X)
\]
Since for any random variable $X$ holds that $F_X(X) = U$ where $U$ is a uniform random variable on [0, 1], it follows that $U_1 = F_{X_1}(X_1)$ and $U_2 = F_{X_2}(X_2)$ are correlated uniform random variables with correlation coefficient $\rho_U$. As shown in Section 3.2.1, if $U$ is a uniform random variable on [0,1], any other random variable $Y$ with distribution function $g(x)$ can be constructed as $g^{-1}(U)$. By combining the two transforms, we can generate $Y_1 = g_1^{-1}(F_{X_1}(X_1))$ and $Y_2 = g_2^{-1}(F_{X_2}(X_2))$ that are correlated because $X_1$ and $X_2$ are correlated. It may be possible to construct directly the correlated random variables $Y_1 = T_1(X_1)$ and $Y_2 = T_2(X_2)$ if the transforms $T_1$ and $T_2$ are known.
The goal is to determine the linear correlation coefficient $\rho_Y$ defined in (2.58),
\[
\rho_Y = \frac{E[Y_1 Y_2] - E[Y_1]\, E[Y_2]}{\sqrt{\text{Var}[Y_1]}\, \sqrt{\text{Var}[Y_2]}}
\]
as a function of $\rho_X$. Using (2.61),
\[
E[Y_1 Y_2] = E\left[g_1^{-1}(F_{X_1}(X_1))\, g_2^{-1}(F_{X_2}(X_2))\right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g_1^{-1}(F_{X_1}(u))\, g_2^{-1}(F_{X_2}(v))\, f_{X_1X_2}(u, v; \rho_X)\, du\, dv \tag{4.7}
\]
This relation shows that $\rho_Y$ is a continuous function in $\rho_X$ and that the joint distribution function of $X_1$ and $X_2$ is needed. The main difficulty lies now in the computation of the integral appearing in $E[Y_1 Y_2]$. For $X_1$ and $X_2$, Gaussian correlated random variables are most often chosen because an exact analytic expression (4.4) exists for the joint probability density function $f_{X_1X_2}(x_1, x_2; \rho_X)$.
4.3.1 Properties of $\rho_Y$ as a function of $\rho_X$
From now on, we choose Gaussian correlated random variables for $X_1$ and $X_2$.
Theorem 4.3.1 The correlation coefficient $\rho_Y$ is a differentiable and increasing function of $\rho_X$.
Proof: From the partial differential equation (4.5) of $f_{X_1X_2}(u, v; \rho_X)$, it follows that
\[
\frac{\partial E[Y_1 Y_2]}{\partial \rho_X} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g_1^{-1}(F_{X_1}(u))\, g_2^{-1}(F_{X_2}(v))\, \frac{\partial^2 f_{X_1X_2}(u, v; \rho_X)}{\partial u\, \partial v}\, du\, dv
\]
Partial integration with respect to $u$ and $v$ yields
\[
\frac{\partial E[Y_1 Y_2]}{\partial \rho_X} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{d g_1^{-1}(F_{X_1}(u))}{du}\, \frac{d g_2^{-1}(F_{X_2}(v))}{dv}\, f_{X_1X_2}(u, v; \rho_X)\, du\, dv
\]
Applying the chain rule for differentiation and $\frac{d g^{-1}(x)}{dx} = \frac{1}{g'(g^{-1}(x))}$ gives
\[
\frac{d g^{-1}(F_X(u))}{du} = \left.\frac{d g^{-1}(x)}{dx}\right|_{x = F_X(u)} \frac{d F_X(u)}{du} = \frac{f_X(u)}{g'\left(g^{-1}(F_X(u))\right)}
\]
Since $g'(x)$ and $f_X(u)$ are probability density functions and positive, $\frac{\partial E[Y_1 Y_2]}{\partial \rho_X} > 0$. Because the means and variances of $Y_1$ and $Y_2$ do not depend on $\rho_X$, also $\frac{\partial \rho_Y}{\partial \rho_X} > 0$. Hence, we have shown that $\rho_Y$ is a differentiable, increasing function of $\rho_X$. □
Since $\rho_X \in [-1, 1]$, $\rho_Y$ increases from $\rho_{Y\min}$ at $\rho_X = -1$ to $\rho_{Y\max}$ corresponding to $\rho_X = 1$. In the sequel, we will derive expressions to compute the boundary cases $\rho_X = -1$ and $\rho_X = 1$.
Theorem 4.3.2 (of Lancaster) For any two strictly increasing real functions $T_1$ and $T_2$ that transform the correlated Gaussian random variables $X_1$ and $X_2$ to the correlated random variables $Y_1 = T_1(X_1)$ and $Y_2 = T_2(X_2)$, it holds that
\[
|\rho_Y| \le |\rho_X|
\]
If two correlated random variables $Y_1$ and $Y_2$ can be obtained by separate transformations from a bi-variate normal distribution with correlation coefficient $\rho_X$, the correlation coefficient $\rho_Y$ of the transformed random variables cannot in absolute value exceed $|\rho_X|$. The interest of the proof is that it uses powerful properties of orthogonal polynomials and that $\rho_Y$ is expanded in a power series in $\rho_X$ in (4.12).
Proof: The proof is based on the orthogonal Hermite polynomials $H_n(x)$ (see e.g. Rainville (1960) and Abramowitz and Stegun (1968, Chapter 22)) defined by the generating function
\[
\exp\left(2xt - t^2\right) = \sum_{n=0}^{\infty} \frac{H_n(x)\, t^n}{n!} \tag{4.8}
\]
After expanding $\exp\left(2xt - t^2\right)$ in a Taylor series and equating corresponding powers in $t$, we find that
\[
H_n(x) = n! \sum_{k=0}^{[n/2]} \frac{(-1)^k (2x)^{n-2k}}{k!\, (n - 2k)!} \tag{4.9}
\]
with $H_0(x) = 1$. The Hermite polynomials satisfy the orthogonality relations
\[
\int_{-\infty}^{\infty} e^{-x^2} H_n(x) H_m(x)\, dx = 0 \quad (m \ne n), \qquad \int_{-\infty}^{\infty} e^{-x^2} H_n^2(x)\, dx = 2^n n! \sqrt{\pi}
\]
These orthogonality relations enable us to expand functions in terms of Hermite polynomials (similar to Fourier analysis). If the expansion of a function $f(x)$,
\[
f(x) = \sum_{k=0}^{\infty} a_k H_k(x)
\]
converges for all $x$, then it follows from the orthogonality relations that
\[
a_k = \frac{1}{2^k k! \sqrt{\pi}} \int_{-\infty}^{\infty} e^{-x^2} f(x) H_k(x)\, dx
\]
The joint normalized Gaussian density function can be expanded (Rainville, 1960, pp. 197-198) in terms of Hermite polynomials
\[
\frac{\exp\left[-\frac{x^2 - 2\rho xy + y^2}{1 - \rho^2}\right]}{\sqrt{1 - \rho^2}} = e^{-x^2 - y^2} \sum_{n=0}^{\infty} H_n(x) H_n(y)\, \frac{\rho^n}{2^n n!} \tag{4.10}
\]
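In order for the covariance $\text{Cov}[Y_1 Y_2]$ to exist, both $E\left[Y_1^2\right]$ and $E\left[Y_2^2\right]$ must be finite. Since $Y_j = T_j(X_j)$ for $j = 1, 2$, the mean is
\[
E[Y_j] = \int_{-\infty}^{\infty} T_j(x) f_{X_j}(x)\, dx = \frac{1}{\sigma_X \sqrt{2\pi}} \int_{-\infty}^{\infty} T_j(x) \exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right] dx = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} T_j\left(\mu_X + \sqrt{2}\sigma_X u\right) e^{-u^2}\, du
\]
Let
\[
T_j\left(\mu_X + \sqrt{2}\sigma_X u\right) = \sum_{k=0}^{\infty} a_{k;j} H_k(u) \tag{4.11}
\]
with
\[
a_{k;j} = \frac{1}{2^k k! \sqrt{\pi}} \int_{-\infty}^{\infty} e^{-x^2}\, T_j\left(\mu_X + \sqrt{2}\sigma_X x\right) H_k(x)\, dx
\]
then, since $H_0(x) = 1$,
\[
E[Y_j] = a_{0;j}
\]
The second moment $E\left[Y_j^2\right]$ follows from (2.34) as
\[
E\left[Y_j^2\right] = \int_{-\infty}^{\infty} (T_j(x))^2 f_{X_j}(x)\, dx = \frac{1}{\sigma_X \sqrt{2\pi}} \int_{-\infty}^{\infty} T_j^2(x) \exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right] dx = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} T_j^2\left(\mu_X + \sqrt{2}\sigma_X u\right) e^{-u^2}\, du
\]
Substituting (4.11) gives
\[
E\left[Y_j^2\right] = \sum_{k=0}^{\infty} \sum_{m=0}^{\infty} a_{m;j}\, a_{k;j}\, \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} H_k(u) H_m(u) e^{-u^2}\, du = \sum_{k=0}^{\infty} a_{k;j}^2\, 2^k k!
\]
which is convergent. Similarly, using (4.4),
\[
\begin{aligned}
E[Y_1 Y_2] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} T_1(u) T_2(v)\, f_{X_1X_2}(u, v; \rho_X)\, du\, dv \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} T_1(u) T_2(v)\, \frac{\exp\left[-\frac{\frac{(u - \mu_{X_1})^2}{\sigma_{X_1}^2} - \frac{2\rho_X (u - \mu_{X_1})(v - \mu_{X_2})}{\sigma_{X_1}\sigma_{X_2}} + \frac{(v - \mu_{X_2})^2}{\sigma_{X_2}^2}}{2(1 - \rho_X^2)}\right]}{2\pi \sigma_{X_1} \sigma_{X_2} \sqrt{1 - \rho_X^2}}\, du\, dv \\
&= \frac{1}{\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} T_1\left(\mu_{X_1} + \sqrt{2}\sigma_{X_1} x\right) T_2\left(\mu_{X_2} + \sqrt{2}\sigma_{X_2} y\right) \frac{\exp\left[-\frac{x^2 - 2\rho_X xy + y^2}{1 - \rho_X^2}\right]}{\sqrt{1 - \rho_X^2}}\, dx\, dy \\
&= \frac{1}{\pi \sqrt{1 - \rho_X^2}} \sum_{k=0}^{\infty} a_{k;1} \sum_{m=0}^{\infty} a_{m;2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} H_k(x) H_m(y) \exp\left[-\frac{x^2 - 2\rho_X xy + y^2}{1 - \rho_X^2}\right] dx\, dy
\end{aligned}
\]
Using (4.10),
\[
E[Y_1 Y_2] = \frac{1}{\pi} \sum_{k=0}^{\infty} a_{k;1} \sum_{m=0}^{\infty} a_{m;2} \sum_{n=0}^{\infty} \frac{\rho_X^n}{2^n n!} \int_{-\infty}^{\infty} e^{-x^2} H_k(x) H_n(x)\, dx \int_{-\infty}^{\infty} e^{-y^2} H_m(y) H_n(y)\, dy
\]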
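Introducing the orthogonality relations for Hermite polynomials leads to
\[
E[Y_1 Y_2] = \sum_{n=0}^{\infty} a_{n;1}\, a_{n;2}\, 2^n n!\, \rho_X^n
\]
The correlation coefficient becomes
\[
\rho_Y = \frac{\sum_{n=0}^{\infty} a_{n;1} a_{n;2} 2^n n!\, \rho_X^n - a_{0;1} a_{0;2}}{\sqrt{\sum_{k=0}^{\infty} a_{k;1}^2 2^k k! - a_{0;1}^2}\, \sqrt{\sum_{k=0}^{\infty} a_{k;2}^2 2^k k! - a_{0;2}^2}} = \frac{\sum_{n=1}^{\infty} a_{n;1} a_{n;2} 2^n n!\, \rho_X^n}{\sqrt{\sum_{k=1}^{\infty} a_{k;1}^2 2^k k!}\, \sqrt{\sum_{k=1}^{\infty} a_{k;2}^2 2^k k!}}
\]
Denote $\alpha_n = a_{n;1} \sqrt{2^n n!}$ and $\beta_n = a_{n;2} \sqrt{2^n n!}$, then $\text{Var}[Y_1] = \sum_{k=1}^{\infty} \alpha_k^2$ and $\text{Var}[Y_2] = \sum_{k=1}^{\infty} \beta_k^2$. Since the linear correlation coefficient $\rho(X, Y)$ equals the correlation coefficient of the corresponding normalized random variable with mean zero and variance 1, as shown in Section 2.5.3, we may choose $\sum_{k=1}^{\infty} \alpha_k^2 = \sum_{k=1}^{\infty} \beta_k^2 = 1$ such that
\[
\rho_Y = \sum_{n=1}^{\infty} \alpha_n \beta_n \rho_X^n \tag{4.12}
\]
If $\alpha_1^2 = 1$ and $\beta_1^2 = 1$, then $|\rho_Y| = |\rho_X|$ because all other $\alpha_k$ and $\beta_k$ must then vanish. In all other cases, either $\alpha_1^2 < 1$ or $\beta_1^2 < 1$ or both, such that
\[
\rho_Y = \alpha_1 \beta_1 \rho_X + \sum_{n=2}^{\infty} \alpha_n \beta_n \rho_X^n
\]
and
\[
\left| \sum_{n=2}^{\infty} \alpha_n \beta_n \rho_X^n \right| \le \sum_{n=2}^{\infty} |\alpha_n \beta_n|\, |\rho_X|^n \le \sqrt{\sum_{n=2}^{\infty} \alpha_n^2 |\rho_X|^n}\, \sqrt{\sum_{n=2}^{\infty} \beta_n^2 |\rho_X|^n}
\]
where we have used the Cauchy-Schwarz inequality $\sum ab \le \sqrt{\sum a^2} \sqrt{\sum b^2}$ (see Section 5.5). By partial summation,
\[
\sum_{n=2}^{\infty} \alpha_n^2 |\rho_X|^n = (1 - |\rho_X|) \sum_{n=2}^{\infty} \sum_{k=2}^{n} \alpha_k^2\, |\rho_X|^n \le (1 - |\rho_X|) \left(1 - \alpha_1^2\right) \frac{|\rho_X|^2}{1 - |\rho_X|} = \left(1 - \alpha_1^2\right) |\rho_X|^2
\]
because $\sum_{k=2}^{n} \alpha_k^2 < \sum_{k=2}^{\infty} \alpha_k^2 = 1 - \alpha_1^2$. Thus
\[
|\rho_Y| \le |\rho_X| \left( |\alpha_1 \beta_1| + \sqrt{1 - \alpha_1^2}\, \sqrt{1 - \beta_1^2}\, |\rho_X| \right)
\]
Finally, for $\alpha_1^2 \le 1$ and $\beta_1^2 \le 1$, the inequality $|\alpha_1 \beta_1| + \sqrt{1 - \alpha_1^2}\, \sqrt{1 - \beta_1^2} \le 1$ holds. This proves Lancaster's theorem because $|\rho_X| \le 1$. □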
4.3.2 Boundary cases
Let us investigate some cases for special values of $\rho_X$.
1. $\rho_X = 0$. Since uncorrelated Gaussian random variables ($\rho_X = 0$) are independent, also $Y_1 = g_1^{-1}(F_{X_1}(X_1))$ and $Y_2 = g_2^{-1}(F_{X_2}(X_2))$ are independent such that $\rho_Y = 0$. Hence, uncorrelated Gaussian random variables with $\rho_X = 0$ lead to uncorrelated random variables $Y_1$ and $Y_2$ with $\rho_Y = 0$.
2. $\rho_X = 1$. Perfect positively correlated Gaussian random variables $X_1 = X_2 = X$ have joint distribution
\[
f_{X_1X_2}(u, v; 1) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{(u - \mu_X)^2}{2\sigma_X^2}\right] \delta(u - v)
\]
which follows from $\Pr[X_1 \le x_1, X_2 \le x_2] = \Pr[X \le x_1, X \le x_2] = \Pr[X \le x]$ with $x = \min(x_1, x_2)$. In that case,
\[
E[Y_1 Y_2] = \int_{-\infty}^{\infty} g_1^{-1}(F_X(u))\, g_2^{-1}(F_X(u))\, dF_X(u) = \int_0^1 g_1^{-1}(x)\, g_2^{-1}(x)\, dx \tag{4.13}
\]
which may lead to $\rho_{Y\max} < 1$ depending on the specifics of $g_1$ and $g_2$. By transforming $x = g_1(u)$, we obtain
\[
E[Y_1 Y_2] = \int_{g_1^{-1}(0)}^{g_1^{-1}(1)} u\, g_2^{-1}(g_1(u))\, g_1'(u)\, du
\]
which shows that, if $g_1 = g_2 = g$,
\[
E[Y_1 Y_2] = \int_{g^{-1}(0)}^{g^{-1}(1)} u^2 g'(u)\, du = E\left[Y^2\right]
\]
Hence, if $Y_1$ and $Y_2$ have the same distribution function $g$ as $Y$, the case $\rho_X = 1$ leads to
\[
\rho_Y = \frac{E\left[Y^2\right] - (E[Y])^2}{\text{Var}[Y]} = 1
\]
3. $\rho_X = -1$. Perfect negatively correlated Gaussian random variables $X_1 = -X_2 = X$ have joint distribution
\[
f_{X_1X_2}(u, v; -1) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{(u - \mu_X)^2}{2\sigma_X^2}\right] \delta(u + v)
\]
which follows from the symmetry relations (4.6). In that case,
\[
\begin{aligned}
E[Y_1 Y_2] &= \int_{-\infty}^{\infty} g_1^{-1}(F_X(u))\, g_2^{-1}(F_X(-u))\, dF_X(u) \\
&= \int_{-\infty}^{\infty} g_1^{-1}(F_X(u))\, g_2^{-1}(1 - F_X(u))\, dF_X(u) \\
&= \int_0^1 g_1^{-1}(x)\, g_2^{-1}(1 - x)\, dx
\end{aligned} \tag{4.14}
\]
which may lead to $\rho_{Y\min} > -1$, depending on the specifics of $g_1$ and $g_2$.
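The boundary integrals (4.13) and (4.14) are one-dimensional and easy to evaluate numerically. The sketch below is an illustration under assumptions made here: both $Y_1$ and $Y_2$ are taken to be exponential with mean 1, so that $g_1^{-1}(x) = g_2^{-1}(x) = -\log x$ and both variances equal 1, and the integrals are approximated by a midpoint rule with an arbitrarily chosen number of points. Under these assumptions the upper bound should come out close to 1 and the lower bound close to $1 - \pi^2/6 \approx -0.645$, which shows that $\rho_{Y\min} > -1$ can indeed occur.

```python
import numpy as np


def rho_bounds_exponential(points=400_000):
    """Approximate rho_Ymax from (4.13) and rho_Ymin from (4.14)
    for two exponential random variables with mean 1 (an assumed example)."""
    x = (np.arange(points) + 0.5) / points   # midpoints on (0, 1)
    inv = -np.log(x)                         # g^{-1}(x)
    inv_mirror = -np.log(1.0 - x)            # g^{-1}(1 - x)
    e_max = np.mean(inv * inv)               # midpoint rule for (4.13)
    e_min = np.mean(inv * inv_mirror)        # midpoint rule for (4.14)
    # rho = (E[Y1 Y2] - E[Y1] E[Y2]) / sqrt(Var[Y1] Var[Y2]) with mean 1, variance 1
    return e_max - 1.0, e_min - 1.0


rho_max, rho_min = rho_bounds_exponential()
print(rho_max, rho_min)
```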
4.4 Examples of the non-linear transformation method
4.4.1 Correlated uniform random variables
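Let us first focus on the relation between $\rho_X$ and $\rho_U$. Since $E[U] = \frac{1}{2}$ and $\sigma_U^2 = \frac{1}{12}$, the definition of the linear correlation coefficient (2.58) gives
\[
\rho_U = \frac{E[U_1 U_2] - \frac{1}{4}}{\frac{1}{12}}
\]
where, using (2.61),
\[
E[U_1 U_2] = E[F_{X_1}(X_1) F_{X_2}(X_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F_{X_1}(u) F_{X_2}(v)\, f_{X_1X_2}(u, v; \rho_X)\, du\, dv
\]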
In the case of Gaussian correlated random variables specified by (3.20) and
(4.4), we must evaluate the integral
] "
] "
H [X1 X2 ] =
gx
3"
] x
3
gy
gw h
3"
(w[ )2
1
22
[1
] y
3"
3
g h
( [ )2
2
22
[2
3"
2
2 6
x[
y[
2[
1
2
3
x3[1 )(y3[2 )+
(
2
2
:
9
[1 [2
[1
[2
:
exp 9
8
73
2(132
[)
5
×
2 2
(2)2 [
1 [2
Substituting successively x0 =
H [X1 X2 ] =
1
] "
(2)2
3"
gx0
t
1 3 2[
x3[1
w3
y3
3
, w0 = [1 , y0 = [2 , 0 = [2 , we obtain
[1
[1
[2
[2
] "
gy 0
] x0
3"
w02
h3 2 gw0
3"
] y0
02
h3 2 g 0
3"
02
0 0
02
[ x y +y
exp 3 x 32
2(132
[)
t
1 3 2[
We now use the partial dierential equation (4.5),
4
3
02
0 0
02
x02 32[ x0 y 0 +y02
C2
exp 3 x 32[ x 2y +y
exp
3
Cx0 Cy 0
2(13[ )
2(132
F
C E
[)
E
F=
t
t
D
C C
1 3 2[
1 3 2[
such that
CH [X1 X2 ]
=
C
] "
gx0
] "
3"
gy0
] x0
3"
w02
h3 2 gw0
3"
] y0
3"
02
h3 2 g 0
C2
Cx0 Cy 0
02
0 0
02
exp 3 x 32[ x 2y +y
2(13[ )
t
2
(2)
1 3 2[
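Partial integration in the last integral over $v'$,
$$
I_2 = \int_{-\infty}^{\infty} dv' \int_{-\infty}^{v'} e^{-\frac{\tau'^2}{2}} d\tau'\, \frac{\partial^2}{\partial u'\, \partial v'} \exp\left[-\frac{u'^2 - 2\rho_X u' v' + v'^2}{2(1-\rho_X^2)}\right]
= -\int_{-\infty}^{\infty} dv\, e^{-\frac{v^2}{2}}\, \frac{\partial}{\partial u'} \exp\left[-\frac{u'^2 - 2\rho_X u' v + v^2}{2(1-\rho_X^2)}\right]
$$
where the boundary term vanishes, yields
$$
\frac{\partial E[U_1 U_2]}{\partial \rho_X} = -\int_{-\infty}^{\infty} dv\, e^{-\frac{v^2}{2}} \int_{-\infty}^{\infty} du' \int_{-\infty}^{u'} e^{-\frac{t'^2}{2}} dt'\, \frac{\partial}{\partial u'} \frac{\exp\left[-\frac{u'^2 - 2\rho_X u' v + v^2}{2(1-\rho_X^2)}\right]}{(2\pi)^2 \sqrt{1-\rho_X^2}}
$$
and similarly in the $u'$ integral,
$$
\frac{\partial E[U_1 U_2]}{\partial \rho_X} = \frac{1}{(2\pi)^2 \sqrt{1-\rho_X^2}} \int_{-\infty}^{\infty} dv \int_{-\infty}^{\infty} du\, \exp\left[-\frac{u^2 (2-\rho_X^2) - 2\rho_X u v + v^2 (2-\rho_X^2)}{2(1-\rho_X^2)}\right]
$$
$$
= \int_{-\infty}^{\infty} dv\, \frac{\exp\left[-\frac{v^2}{2}\, \frac{4-\rho_X^2}{2-\rho_X^2}\right]}{(2\pi)^2 \sqrt{1-\rho_X^2}} \int_{-\infty}^{\infty} du\, \exp\left[-\frac{2-\rho_X^2}{2(1-\rho_X^2)} \left(u - \frac{\rho_X v}{2-\rho_X^2}\right)^2\right]
$$
$$
= \frac{1}{(2\pi)^2 \sqrt{1-\rho_X^2}}\, \sqrt{\frac{2\pi (2-\rho_X^2)}{4-\rho_X^2}}\, \sqrt{\frac{2\pi (1-\rho_X^2)}{2-\rho_X^2}} = \frac{1}{2\pi \sqrt{4-\rho_X^2}}
$$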
Thus, we find that
$$
\frac{\partial \rho_U}{\partial \rho_X} = \frac{6}{\pi}\, \frac{1}{\sqrt{4-\rho_X^2}}
$$
or that
$$
\rho_U = \frac{6}{\pi} \int \frac{d\rho_X}{\sqrt{4-\rho_X^2}} + c = \frac{6}{\pi} \arcsin\left(\frac{\rho_X}{2}\right) + c
$$
It remains to determine the constant $c$. We have shown in Section 4.3.2 that random variables generated from uncorrelated Gaussian random variables are also uncorrelated, implying that $\rho_U = 0$ if $\rho_X = 0$ and, hence, that the constant $c = 0$. This finally results in
$$
\rho_U = \frac{6}{\pi} \arcsin\left(\frac{\rho_X}{2}\right) \tag{4.15}
$$
In summary, two uniform correlated random variables $U_1$ and $U_2$ with correlation coefficient $\rho_U$ are found by transforming two Gaussian correlated random variables $X_1$ and $X_2$ with correlation coefficient $\rho_X = 2 \sin\left(\frac{\pi \rho_U}{6}\right)$. Equation (4.15) further shows that $\rho_U = \pm 1$ if $\rho_X = \pm 1$, which indicates that the whole range of the correlation coefficient $\rho_U$ is possible.
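The construction summarized by (4.15) can be sketched numerically. The following is a minimal illustration, not part of the original text: the function names, the sample size and the seed are assumptions, and the standard Gaussian distribution function is evaluated with `math.erf`. Two standard Gaussians with correlation $2\sin(\pi\rho_U/6)$ are passed through their distribution function, after which the sample correlation of the resulting uniforms should be close to the target $\rho_U$.

```python
import math
import random


def gaussian_cdf(x):
    """Distribution function of a standard Gaussian random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rho_x_from_rho_u(rho_u):
    """Invert (4.15): Gaussian correlation that yields uniform correlation rho_u."""
    return 2.0 * math.sin(math.pi * rho_u / 6.0)


def correlated_uniforms(rho_u, n, seed=1):
    """Return two lists of n uniform samples with target correlation rho_u."""
    rng = random.Random(seed)
    rho_x = rho_x_from_rho_u(rho_u)
    scale = math.sqrt(max(0.0, 1.0 - rho_x * rho_x))
    u1, u2 = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        # x2 is a standard Gaussian with correlation rho_x to x1
        x2 = rho_x * x1 + scale * rng.gauss(0.0, 1.0)
        u1.append(gaussian_cdf(x1))
        u2.append(gaussian_cdf(x2))
    return u1, u2


def sample_correlation(xs, ys):
    """Plain sample estimate of the linear correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    u1, u2 = correlated_uniforms(0.5, 50000)
    print("target 0.5, sample estimate:", sample_correlation(u1, u2))
```

The estimate fluctuates around the target with a standard error of roughly $1/\sqrt{n}$, so only a few digits should be expected to agree.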
4.4.2 Correlated exponential random variables
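In Section 3.2.1, we have seen that, if $U$ is a uniform random variable on $[0,1]$, $g^{-1}(U) = -\frac{1}{\alpha} \log U$ is an exponential random variable with mean $\frac{1}{\alpha}$. The correlation coefficient for two exponential random variables, $Y_1$ and $Y_2$, with mean $\frac{1}{\alpha_1}$ and $\frac{1}{\alpha_2}$ respectively, is
$$
\rho_Y = \frac{E[Y_1 Y_2] - \frac{1}{\alpha_1 \alpha_2}}{\frac{1}{\alpha_1 \alpha_2}} = E[\alpha_1 \alpha_2 Y_1 Y_2] - 1
$$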
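As above, we generate $Y_1 = -\frac{1}{\alpha_1} \log F_{X_1}(X_1)$ and $Y_2 = -\frac{1}{\alpha_2} \log F_{X_2}(X_2)$, where $X_1$ and $X_2$ are correlated Gaussian random variables with correlation coefficient $\rho_X$. Then,
$$
E[\alpha_1 \alpha_2 Y_1 Y_2] = \frac{\alpha_1 \alpha_2}{\alpha_1 \alpha_2}\, E[\log F_{X_1}(X_1) \log F_{X_2}(X_2)]
= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \log F_{X_1}(u) \log F_{X_2}(v)\, f_{X_1 X_2}(u, v; \rho_X)\, du\, dv
$$
In the general case for $\rho_X \neq 0$, the previous method can be followed, which yields after substitution towards normalized variables,
$$
E[\alpha_1 \alpha_2 Y_1 Y_2] = \int_{-\infty}^{\infty} du \int_{-\infty}^{\infty} dv\, \log\left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{t^2}{2}} dt\right) \log\left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{v} e^{-\frac{\tau^2}{2}} d\tau\right) \frac{\exp\left[-\frac{u^2 - 2\rho_X u v + v^2}{2(1-\rho_X^2)}\right]}{2\pi \sqrt{1-\rho_X^2}}
$$
Unfortunately, we cannot evaluate this integral analytically.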
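Let us compute the upper bound $\rho_{Y\max}$ from (4.13) with $g_1^{-1}(x) = -\frac{1}{\alpha_1} \log x$ and $g_2^{-1}(x) = -\frac{1}{\alpha_2} \log x$,
$$
E[\alpha_1 \alpha_2 Y_1 Y_2; \rho_X = 1] = \int_0^1 \log^2 x\, dx = 2
$$
and thus $\rho_{Y\max} = 1$. The lower boundary $\rho_{Y\min}$ follows from (4.14) as$^1$,
$$
E[\alpha_1 \alpha_2 Y_1 Y_2; \rho_X = -1] = \int_0^1 \log x \log(1-x)\, dx = 2 - \frac{\pi^2}{6}
$$
Here, we find $\rho_{Y\min} = 1 - \frac{\pi^2}{6} = -0.644934\ldots$.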
In summary, exponential correlated random variables can be generated from Gaussian correlated random variables, but the correlation coefficient $\rho_Y$ is limited to the interval $\left[1 - \frac{\pi^2}{6}, 1\right]$. As explained in the introduction of Section 4.2, the exponential random variables are positive with infinite range for which not all negative correlations are possible. The analysis demonstrates that it is not possible to construct two exponential random variables with correlation coefficient smaller than $\rho_{Y\min} = 1 - \frac{\pi^2}{6} \simeq -0.645$.

$^1$ Substituting the Taylor expansion $\log(1-x) = -\sum_{k=1}^{\infty} \frac{x^k}{k}$ gives
$$
\int_0^1 \log x \log(1-x)\, dx = -\sum_{k=1}^{\infty} \frac{1}{k} \int_0^1 x^k \log x\, dx
$$
and
$$
\int_0^1 x^k \log x\, dx = -\int_0^{\infty} e^{-(k+1)u}\, u\, du = -\frac{1}{(k+1)^2}
$$
Thus,
$$
\int_0^1 \log x \log(1-x)\, dx = \sum_{k=1}^{\infty} \frac{1}{k (k+1)^2}
$$
Since $\frac{1}{k(k+1)^2} = \frac{1}{k} - \frac{1}{k+1} - \frac{1}{(k+1)^2}$,
$$
\int_0^1 \log x \log(1-x)\, dx = \sum_{k=1}^{\infty} \frac{1}{k} - \sum_{k=2}^{\infty} \frac{1}{k} - \sum_{k=2}^{\infty} \frac{1}{k^2} = 1 - \sum_{k=2}^{\infty} \frac{1}{k^2} = 2 - \sum_{k=1}^{\infty} \frac{1}{k^2} = 2 - \zeta(2) = 2 - \frac{\pi^2}{6}
$$
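The lower bound can be illustrated with a short sketch, given here under stated assumptions rather than as the author's procedure. For $\rho_X = -1$ the two Gaussians satisfy $X_2 = -X_1$, so the two uniforms are $U$ and $1-U$; the code below (function names, rates and sample size are illustrative choices) generates such pairs and also sums the footnote series $\sum_k 1/(k(k+1)^2)$ directly.

```python
import math
import random


def sample_correlation(xs, ys):
    """Plain sample estimate of the linear correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def extreme_negative_exponentials(n, alpha1=1.0, alpha2=1.0, seed=1):
    """Exponential pairs built from U and 1 - U (the case rho_X = -1)."""
    rng = random.Random(seed)
    y1, y2 = [], []
    while len(y1) < n:
        u = rng.random()          # in [0, 1); skip 0 to avoid log(0)
        if u <= 0.0:
            continue
        y1.append(-math.log(u) / alpha1)
        y2.append(-math.log(1.0 - u) / alpha2)
    return y1, y2


def series_lower_bound(terms=100000):
    """rho_Ymin = sum_{k>=1} 1/(k (k+1)^2) - 1, truncated after `terms` terms."""
    return sum(1.0 / (k * (k + 1) ** 2) for k in range(1, terms + 1)) - 1.0


if __name__ == "__main__":
    y1, y2 = extreme_negative_exponentials(200000, alpha1=1.0, alpha2=3.0)
    print("sample correlation:", sample_correlation(y1, y2))
    print("series value      :", series_lower_bound())
    print("1 - pi^2/6        :", 1.0 - math.pi ** 2 / 6.0)
```

The correlation coefficient does not depend on the rates $\alpha_1$ and $\alpha_2$, which is why different rates can be used in the example.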
4.4.3 Correlated lognormal random variables
Two correlated lognormal random variables $Y_1$ and $Y_2$ with distribution specified in (3.43) can be constructed directly from two correlated Gaussian random variables $X_1$ and $X_2$. In particular, let $Y_1 = e^{a_1 X_1}$ and $Y_2 = e^{a_2 X_2}$. The explicit scaling parameters can be used to determine the desired mean. From (4.7),
$$
E[Y_1 Y_2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{a_1 u} e^{a_2 v} f_{X_1 X_2}(u, v; \rho_X)\, du\, dv
= \exp\left[a_1 \mu_1 + a_2 \mu_2 + \frac{\sigma_1^2}{2} a_1^2 + a_1 a_2 \rho_X \sigma_1 \sigma_2 + \frac{\sigma_2^2}{2} a_2^2\right]
$$
where the Laplace transform (4.2) for $n = 2$ has been used. Invoking (3.45) and (3.46) with $\mu_j \to a_j \mu_j$ and $\sigma_j^2 \to a_j^2 \sigma_j^2$, the correlation coefficient $\rho_Y$ is
$$
\rho_Y = \frac{e^{a_1 \sigma_1 a_2 \sigma_2 \rho_X} - 1}{\sqrt{\left(e^{a_1^2 \sigma_1^2} - 1\right)\left(e^{a_2^2 \sigma_2^2} - 1\right)}} \tag{4.16}
$$
If at least one (but not all) of the quantities $\sigma_1$, $\sigma_2$, $a_1$ or $a_2$ grows large, $\rho_Y$ tends to zero irrespective of $\rho_X$. Thus even if $X_1$ and $X_2$ and, hence also $Y_1$ and $Y_2$, have the strongest kind of dependence possible, i.e. $\rho_X = \pm 1$, the correlation coefficient $\rho_Y$ can be made arbitrarily small. In case $a_1 \sigma_1 = a_2 \sigma_2 = \sigma$, (4.16) reduces to
$$
\rho_Y = \frac{e^{\sigma^2 \rho_X} - 1}{e^{\sigma^2} - 1}
$$
We observe that $\rho_{Y\max} = 1$, while $\rho_{Y\min} = -e^{-\sigma^2} > -1$ for $\sigma > 0$; again a manifestation that for positive random variables with infinite range not all negative correlations are possible.
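Formula (4.16) is easy to evaluate, and a small sketch makes the shrinking range of $\rho_Y$ visible. The function name and the chosen parameter values are illustrative assumptions; `math.expm1` is used only to compute $e^x - 1$ accurately for small $x$.

```python
import math


def lognormal_correlation(a1, sigma1, a2, sigma2, rho_x):
    """Correlation coefficient (4.16) of Y1 = exp(a1 X1) and Y2 = exp(a2 X2)."""
    s1 = a1 * sigma1
    s2 = a2 * sigma2
    numerator = math.expm1(s1 * s2 * rho_x)
    denominator = math.sqrt(math.expm1(s1 * s1) * math.expm1(s2 * s2))
    return numerator / denominator


if __name__ == "__main__":
    # Equal scaling a1*sigma1 = a2*sigma2 = sigma: the attainable range is
    # [-exp(-sigma^2), 1], which collapses towards [0, 1] as sigma grows.
    for sigma in (0.5, 1.0, 2.0, 3.0):
        low = lognormal_correlation(1.0, sigma, 1.0, sigma, -1.0)
        high = lognormal_correlation(1.0, sigma, 1.0, sigma, 1.0)
        print(sigma, low, high)
```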
4.5 Linear combination of independent auxiliary random variables
In spite of the generality of the non-linear transformation method, the computational difficulty involved prompts us to investigate simpler methods of construction. It is instructive to consider two independent random variables $V$ and $W$ with known probability generating functions $\varphi_V(z)$ and $\varphi_W(z)$ respectively. In the discussion of the uniform random variable in Section 3.2.1, it was shown how to generate by computer an arbitrary random variable from a uniform random variable. We thus assume that $V$ and $W$ can be constructed. Let us now write $X$ and $Y$ as a linear combination of $V$ and $W$,
$$
X = a_{11} V + a_{12} W + b_1
$$
$$
Y = a_{21} V + a_{22} W + b_2
$$
which is specified by the matrix
$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
$$
and compute the covariance defined in (2.56),
$$
\operatorname{Cov}[X, Y] = E[XY] - \mu_X \mu_Y
= a_{11} a_{21} \left(E[V^2] - (E[V])^2\right) + (a_{11} a_{22} + a_{12} a_{21})\, E[VW] - (a_{11} a_{22} + a_{12} a_{21})\, E[V] E[W] + a_{12} a_{22} \left(E[W^2] - (E[W])^2\right)
$$
Since $V$ and $W$ are independent, $E[VW] = E[V] E[W]$, and with the definition of the variance (2.16) and denoting $\sigma_V^2 = \operatorname{Var}[V]$ and similarly for $W$, we obtain
$$
\operatorname{Cov}[X, Y] = a_{11} a_{21} \sigma_V^2 + a_{12} a_{22} \sigma_W^2
$$
In the same way, we find
$$
\sigma_X^2 = a_{11}^2 \sigma_V^2 + a_{12}^2 \sigma_W^2
$$
$$
\sigma_Y^2 = a_{21}^2 \sigma_V^2 + a_{22}^2 \sigma_W^2 \tag{4.17}
$$
such that the correlation coefficient, in general, becomes
$$
\rho = \frac{a_{11} a_{21} \sigma_V^2 + a_{12} a_{22} \sigma_W^2}{\sqrt{a_{11}^2 \sigma_V^2 + a_{12}^2 \sigma_W^2}\, \sqrt{a_{21}^2 \sigma_V^2 + a_{22}^2 \sigma_W^2}}
$$
which is independent of the constants $b_1$ and $b_2$ since for a centered moment $E\left[(X - E[X])^2\right] = E\left[(X + b - E[X + b])^2\right]$.
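In order to achieve our goal of constructing two correlated random variables $X$ and $Y$, we can choose the coefficients of the matrix $A$ to obtain an expression as simple as possible. If we choose $X = V$ or $a_{11} = 1$, $a_{12} = b_1 = 0$, the correlation coefficient reduces to
$$
\rho = \frac{a_{21} \sigma_X^2}{\sqrt{\sigma_X^2}\, \sqrt{a_{21}^2 \sigma_X^2 + a_{22}^2 \sigma_W^2}} = \frac{1}{\sqrt{1 + \frac{a_{22}^2 \sigma_W^2}{a_{21}^2 \sigma_X^2}}}
$$
By rewriting this relation, we obtain
$$
a_{21} = \pm \frac{a_{22}\, \sigma_W}{\sigma_X}\, \frac{\rho}{\sqrt{1-\rho^2}}
$$
If we choose $a_{22} = \sqrt{1-\rho^2}$, the random variables $X$ and $Y$ are specified as
$$
X = V
$$
$$
Y = \pm \rho \frac{\sigma_W}{\sigma_X} V + \sqrt{1-\rho^2}\, W + b_2
$$
and the corresponding variances (4.17) are $\sigma_X^2 = \sigma_V^2$ and $\sigma_Y^2 = \sigma_W^2$. Finally, we require that $E[W] = \mu_W = 0$, which specifies
$$
b_2 = E[Y] \mp \rho \frac{\sigma_Y}{\sigma_X} E[X]
$$
If $W$ is a zero mean random variable with standard deviation $\sigma_W = \sigma_Y$, the random variables $X$ and $Y$ are correlated with correlation coefficient $\rho$,
$$
Y = \pm \rho \frac{\sigma_Y}{\sigma_X} X + \sqrt{1-\rho^2}\, W + \mu_Y \mp \rho \frac{\sigma_Y}{\sigma_X} \mu_X \tag{4.18}
$$
In the sequel, we take the positive sign for $\rho$.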
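Let us now investigate what happens with the distribution functions of $X$ and $Y$. Using the pgfs for continuous random variables $\varphi_X(z) = E[e^{-zX}]$ and $\varphi_Y(z) = E[e^{-zY}]$, the last relation (4.18) becomes
$$
E[e^{-zY}] = e^{-z\left(\mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X\right)}\, E\left[e^{-z \rho \frac{\sigma_Y}{\sigma_X} X}\right] E\left[e^{-z \sqrt{1-\rho^2}\, W}\right]
$$
because $V = X$ and $W$ are independent, or
$$
\varphi_Y(z) = e^{-z\left(\mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X\right)}\, \varphi_X\left(\rho \frac{\sigma_Y}{\sigma_X} z\right) \varphi_W\left(\sqrt{1-\rho^2}\, z\right) \tag{4.19}
$$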
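In order to produce two random variables $X$ and $Y$ that are correlated with correlation coefficient $\rho$, the pgf of the zero mean random variable $W$ with variance $\sigma_Y^2$ must obey,
$$
\varphi_W(z) = e^{\left(\mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X\right) \frac{z}{\sqrt{1-\rho^2}}}\, \frac{\varphi_Y\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_X\left(\rho \frac{\sigma_Y}{\sigma_X} \frac{z}{\sqrt{1-\rho^2}}\right)} \tag{4.20}
$$
which can be written in terms of the translated random variables $Y' = Y - \mu_Y$ and $X' = X - \mu_X$,
$$
\varphi_W(z) = \frac{\varphi_{Y'}\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_{X'}\left(\rho \frac{\sigma_Y}{\sigma_X} \frac{z}{\sqrt{1-\rho^2}}\right)}
$$
This form shows that, if $X'$ and $Y'$ have the same distribution, $W$ possesses, in general, a different distribution. Only the pgf of a Gaussian (with zero mean) obeys the functional equation
$$
f(z) = \frac{f\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{f\left(\frac{\rho z}{\sqrt{1-\rho^2}}\right)}
$$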
The joint probability generating function follows from (2.61) as
$$
\varphi_{XY}(z_1, z_2) = E\left[e^{-z_1 X - z_2 Y}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-z_1 x - z_2 y} f_{XY}(x, y)\, dx\, dy
$$
and the inverse is
$$
f_{XY}(x, y) = \frac{1}{(2\pi i)^2} \int_{c_1 - i\infty}^{c_1 + i\infty} \int_{c_2 - i\infty}^{c_2 + i\infty} e^{z_1 x + z_2 y}\, \varphi_{XY}(z_1, z_2)\, dz_1\, dz_2 \tag{4.21}
$$
Using (4.18), we have
$$
\varphi_{XY}(z_1, z_2) = e^{-z_2 \left(\mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X\right)}\, E\left[e^{-\left(z_1 + z_2 \rho \frac{\sigma_Y}{\sigma_X}\right) X}\right] E\left[e^{-z_2 \sqrt{1-\rho^2}\, W}\right]
= e^{-z_2 \left(\mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X\right)}\, \varphi_X\left(z_1 + z_2 \rho \frac{\sigma_Y}{\sigma_X}\right) \varphi_W\left(z_2 \sqrt{1-\rho^2}\right) \tag{4.22}
$$
Introduced into the complex double integral (4.21), the joint probability density function of the two correlated random variables can be computed.
The main deficiency of the linear combination method is the implicit assumption that any joint distribution function $f_{XY}(x, y)$ can be constructed from two independent random variables $X$ and $W$. The corresponding joint pgf (4.22) possesses a product form that cannot always be made compatible with the form of an arbitrary pgf $\varphi_{XY}(z_1, z_2)$. The examples below illustrate this deficiency.
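4.5.1 Correlated Gaussian random variables
If $X$ and $Y$ are Gaussian random variables with Laplace transform $E[e^{-zY}]$ given in (3.22), the expression (4.20) for $\varphi_W(z)$ becomes
$$
\varphi_W(z) = \exp\left[\frac{\sigma_Y^2}{2} z^2\right]
$$
which shows that $W$ is also a Gaussian random variable with mean $\mu_W = 0$ and standard deviation $\sigma_W = \sigma_Y$. Further, the joint pgf follows from (4.22) as
$$
E\left[e^{-z_1 X - z_2 Y}\right] = \exp\left[-z_2 \mu_Y - \mu_X z_1 + \frac{\sigma_X^2}{2} z_1^2 + z_1 z_2 \rho \sigma_Y \sigma_X + \frac{\sigma_Y^2}{2} z_2^2\right]
$$
Since
$$
\frac{\sigma_X^2}{2} z_1^2 + z_1 z_2 \rho \sigma_Y \sigma_X + \frac{\sigma_Y^2}{2} z_2^2 = \frac{1}{2} \begin{bmatrix} z_1 & z_2 \end{bmatrix} \begin{bmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
$$
formula (4.2) indicates that $E\left[e^{-z_1 X - z_2 Y}\right]$ is the two dimensional pgf of a joint Gaussian with pdf (4.4). The linear combination method thus provides the exact results for correlated Gaussian random variables.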
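The Gaussian case of (4.18) translates directly into a sampler. The sketch below is illustrative only: the function names, parameter values and seed are assumptions, and the positive sign of $\rho$ in (4.18) is used with $\rho$ itself allowed to be negative. It draws $X$ and a zero mean Gaussian $W$ with standard deviation $\sigma_Y$ and combines them as $Y = \rho \frac{\sigma_Y}{\sigma_X} X + \sqrt{1-\rho^2}\, W + \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X$.

```python
import math
import random


def gaussian_pairs(rho, mu_x, sigma_x, mu_y, sigma_y, n, seed=1):
    """Draw n pairs (X, Y) via the linear combination (4.18) with Gaussian W."""
    rng = random.Random(seed)
    slope = rho * sigma_y / sigma_x
    offset = mu_y - slope * mu_x
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        w = rng.gauss(0.0, sigma_y)   # zero mean, standard deviation sigma_Y
        xs.append(x)
        ys.append(slope * x + math.sqrt(1.0 - rho * rho) * w + offset)
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def std(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def sample_correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std(xs) * std(ys))


if __name__ == "__main__":
    xs, ys = gaussian_pairs(-0.7, 1.0, 0.5, 3.0, 2.0, 50000)
    print("mean Y:", mean(ys), "std Y:", std(ys))
    print("correlation:", sample_correlation(xs, ys))
```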
4.5.2 Correlated exponential random variables
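Let $X$ and $Y$ be two correlated, exponential random variables with rate $\alpha_x$ and $\alpha_y$. Recall that $E[X] = \sigma_X = \frac{1}{\alpha_x}$. Using the Laplace transform (3.16) in (4.20), we obtain
$$
\varphi_W(z) = e^{\frac{(1-\rho)\, z}{\alpha_y \sqrt{1-\rho^2}}}\, \frac{1 + \frac{\rho z}{\alpha_y \sqrt{1-\rho^2}}}{1 + \frac{z}{\alpha_y \sqrt{1-\rho^2}}}
$$
The corresponding probability distribution function follows from (2.38) as
$$
F_W(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{1 + \frac{\rho z}{\alpha_y \sqrt{1-\rho^2}}}{1 + \frac{z}{\alpha_y \sqrt{1-\rho^2}}}\, e^{z\left(t + \frac{1-\rho}{\alpha_y \sqrt{1-\rho^2}}\right)}\, \frac{dz}{z}, \qquad c > 0
$$
Define the normalized time $T = \frac{1}{\alpha_y \sqrt{1-\rho^2}}$, then
$$
F_W(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{1 + \rho T z}{1 + zT}\, e^{z(t + (1-\rho) T)}\, \frac{dz}{z}
$$
Since $t + (1-\rho) T > 0$, the contour can be closed over the negative $\operatorname{Re}(z)$ plane encircling the poles at $z = -\frac{1}{T}$ and $z = 0$. By Cauchy's residue theorem,
$$
F_W(t) = \lim_{z \to 0} \frac{(1 + \rho T z)\, z}{(1 + zT)\, z}\, e^{z(t + (1-\rho) T)} + \lim_{z \to -\frac{1}{T}} \frac{(1 + \rho T z)\left(z + \frac{1}{T}\right)}{(1 + zT)\, z}\, e^{z(t + (1-\rho) T)}
= 1 - (1-\rho)\, e^{-(1-\rho)}\, e^{-\frac{t}{T}}
$$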
Hence, for the generation of two exponential, correlated random variables, the auxiliary random variable $W$ has an exponential tail combined with an atom: at the lower end of its range, $t = -(1-\rho) T$, the distribution function $F_W$ jumps from 0 to $\rho$, so that $W$ has an atom of size $\rho$ at $t = -(1-\rho) T$ and is exponentially distributed beyond that point, which is fortunately easy to generate with a computer. It appears that only for $\rho \geq 0$ does the linear combination method lead to correct results for exponential random variables. Moreover, the method does not give an indication of the validity in the range of $\rho$. We have shown above that $\rho \in \left[1 - \frac{\pi^2}{6}, 1\right]$.
While the linear combination method applied to generate two exponential random variables still correctly treats a range of $\rho$, the application to correlated uniform random variables leads to bizarre results and definitely shows the deficiency of the method. The difficulties already encountered in this chapter in generating $n = 2$ correlated random variables with arbitrary distribution suggest that the case for $n > 2$ must be even more intractable.
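The exponential case can be sketched as follows, as an illustration under explicit assumptions rather than the author's algorithm. The sketch reads the transform $\varphi_W(z)\, e^{-(1-\rho) T z} = \frac{1 + \rho T z}{1 + T z} = \rho + \frac{1-\rho}{1 + T z}$ as a mixture: the shifted variable $W + (1-\rho) T$ is zero with probability $\rho$ and exponential with mean $T$ otherwise. Inserted in (4.18), the shift cancels against $\mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X = \frac{1-\rho}{\alpha_y}$, leaving $Y = \rho \frac{\alpha_x}{\alpha_y} X + \xi$ with $\xi$ zero with probability $\rho$ and exponential with rate $\alpha_y$ otherwise. Function names, parameters and the seed are hypothetical, and the construction is only valid for $0 \le \rho \le 1$.

```python
import math
import random


def correlated_exponentials(rho, alpha_x, alpha_y, n, seed=1):
    """Draw n exponential pairs (X, Y) with rates alpha_x, alpha_y, correlation rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this construction needs 0 <= rho <= 1")
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.expovariate(alpha_x)
        # sqrt(1 - rho^2) * (W + (1 - rho) T): zero with probability rho,
        # otherwise exponential with rate alpha_y
        aux = 0.0 if rng.random() < rho else rng.expovariate(alpha_y)
        xs.append(x)
        ys.append(rho * (alpha_x / alpha_y) * x + aux)
    return xs, ys


def sample_correlation(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    xs, ys = correlated_exponentials(0.6, 1.0, 2.0, 100000)
    print("mean Y (expect 0.5):", sum(ys) / len(ys))
    print("correlation (expect 0.6):", sample_correlation(xs, ys))
```

The product of the transforms $\frac{1}{1 + \rho z / \alpha_y}$ and $\rho + \frac{1-\rho}{1 + z/\alpha_y}$ equals $\frac{1}{1 + z/\alpha_y}$, so $Y$ is again exponential with rate $\alpha_y$ in this sketch.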
4.6 Problem
(i) Show in two dimensions that (4.3), or in explicit form (4.4), is indeed
the joint pdf corresponding to (4.2).
5
Inequalities
Hardy et al. (1999) view the best-known inequalities from various angles, provide several different proofs and relate the nature of these inequalities. For example, starting from the most basic inequality between geometric and arithmetic mean$^1$,
$$
\min(x, y) \leq \sqrt{xy} \leq \frac{x + y}{2} \leq \max(x, y) \tag{5.1}
$$
which directly follows from $\left(\sqrt{x} - \sqrt{y}\right)^2 \geq 0$, they masterfully extend this relation to the theorem of the arithmetic and geometric mean in several real variables $x_k$,
$$
\prod_{k=1}^{n} x_k^{q_k} \leq \sum_{k=1}^{n} q_k x_k \tag{5.2}
$$
where $\sum_{k=1}^{n} q_k = 1$. They further move to the inequalities of Cauchy-Schwarz, of Hölder, of Minkowski and many more. Only a few inequalities are reviewed here and we recommend the classic treatise on inequalities by Hardy, Littlewood and Polya for those who search for more depth, elegance and insight.
5.1 The minimum (maximum) and infimum (supremum)
Since these concepts will be frequently used, we explain the difference by concentrating on the minimum and infimum (the maximum and the supremum follow analogously). Let $\Omega$ be a non-empty subset of $\mathbb{R}$. The subset $\Omega$ is said to be bounded from below by $M$ if there exists a number $M$ such that, for all $x \in \Omega$ holds that $x \geq M$. The largest lower bound (largest number $M$) is called the infimum and is denoted by $\inf(\Omega)$. Further, if there exists an element $m \in \Omega$ such that $m \leq x$ for all $x \in \Omega$, then this element $m$ is called the minimum and is denoted by $\min(\Omega)$. If the minimum $\min(\Omega)$ exists, then $\min(\Omega) = \inf(\Omega)$. However, the minimum does not always exist. The classical example is the open interval $(a, b)$, where $\inf((a, b)) = a$, but the minimum does not exist because $a \notin (a, b)$. On the other hand, for the closed interval $[a, b]$, we have that $\inf([a, b]) = \min([a, b]) = a$. This example also illustrates that every finite non-empty subset of $\mathbb{R}$ has a minimum.

$^1$ The arithmetic-geometric mean $M(x, y)$ is the limit for $n \to \infty$ of the recursion $x_n = \frac{1}{2}(x_{n-1} + y_{n-1})$, which is an arithmetic mean, and $y_n = \sqrt{x_{n-1} y_{n-1}}$, which is a geometric mean, with initial values $x_0 = x$ and $y_0 = y$. Gauss's famous discovery on intriguing properties of $M(x, y)$ (which lead e.g. to very fast converging series for computing $\pi$) is narrated in a paper by Almkvist and Berndt (1988).
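The recursion for the arithmetic-geometric mean $M(x, y)$ in footnote 1 converges very quickly and is simple to sketch. The function name, tolerance and iteration cap below are illustrative assumptions; the example also prints the chain (5.1) for one pair of numbers.

```python
import math


def agm(x, y, tol=1e-15, max_iter=100):
    """Arithmetic-geometric mean M(x, y) of two positive numbers."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    for _ in range(max_iter):
        if abs(x - y) <= tol * max(x, y):
            break
        # simultaneous update: arithmetic mean and geometric mean
        x, y = 0.5 * (x + y), math.sqrt(x * y)
    return 0.5 * (x + y)


if __name__ == "__main__":
    x, y = 1.0, 2.0
    print("min            :", min(x, y))
    print("geometric mean :", math.sqrt(x * y))
    print("M(x, y)        :", agm(x, y))
    print("arithmetic mean:", 0.5 * (x + y))
    print("max            :", max(x, y))
```

Because every iterate pair consists of a geometric and an arithmetic mean of the previous pair, $M(x, y)$ always lies between $\sqrt{xy}$ and $\frac{x+y}{2}$.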
5.2 Continuous convex functions
A continuous function $f(x)$ that satisfies for $u$ and $v$ belonging to an interval $I$,
$$
f\left(\frac{u + v}{2}\right) \leq \frac{f(u) + f(v)}{2}
$$
is called convex in that interval $I$. If $f$ is convex, $-f$ is concave. Hardy et al. (1999, Section 3.6) demonstrate that this condition is fundamental from which the more general condition$^2$
$$
f\left(\sum_{k=1}^{n} q_k x_k\right) \leq \sum_{k=1}^{n} q_k f(x_k) \tag{5.3}
$$
where $\sum_{k=1}^{n} q_k = 1$, can be deduced. Moreover, they show that a convex function is either very regular or very irregular and that a convex function that is not "entirely irregular" is necessarily continuous. Current textbooks, in particular the book by Boyd and Vandenberghe (2004), usually start with the definition of convexity from (5.3) in case $n = 2$ where $q_1 = 1 - q_2 = q$ and $0 \leq q \leq 1$ as
$$
f(qu + (1-q)v) \leq q f(u) + (1-q) f(v) \tag{5.4}
$$
where $u$ and $v$ can be vectors in an $m$-dimensional space.
Geometrically with $m = 1$ as illustrated in Fig. 5.1, relation (5.4) shows that each point on the chord between $(u, f(u))$ and $(v, f(v))$ lies above the curve $f$ in the interval $I$. The more general form (5.3) asserts that the centre of gravity of any number of arbitrarily weighted points of the curve lies above or on the curve. Figure 5.1 illustrates that for any convex function $f$ and points $a, a', b, b' \in [u, v]$ such that $a \leq a' \leq b'$ and $a < b \leq b'$, the chord $c_1$ over $(a, b)$ has a smaller slope than the chord $c_2$ over $(a', b')$ or, equivalently,
$$
\frac{f(b) - f(a)}{b - a} \leq \frac{f(b') - f(a')}{b' - a'}
$$
Suppose that $f(x)$ is twice differentiable in the interval $I$, then a necessary and sufficient condition for convexity is $f''(x) \geq 0$ for each $x \in I$. This theorem is proved in Hardy et al. (1999, pp. 76-77). Moreover, they prove that the equality in (5.3) can only occur if $f(x)$ is linear.

[Fig. 5.1. The function $f$ is convex between $u$ and $v$. The sketch shows the curve $f(x)$ with the values $f(u)$ and $f(v)$, the points $u$, $a$, $a'$, $b$, $b'$, $v$ on the $x$-axis and the chords $c_1$ and $c_2$.]

$^2$ The convexity concept can be generalized (Hardy et al., 1999, Section 98) to several variables in which case the condition (5.3) becomes
$$
f\left(\sum_k q_k x_k, \sum_k q_k y_k\right) \leq \sum_k q_k f(x_k, y_k)
$$
Applied to probability, relation (5.3) with $q_k = \Pr[X = k]$ and $x_k = k$ is written with (2.12) as
$$
f(E[X]) \leq E[f(X)] \tag{5.5}
$$
and is known as Jensen's inequality. Jensen's inequality (5.5) also holds for continuous random variables. Indeed, if $f$ is differentiable and convex, then $f(x) - f(y) \geq f'(y)(x - y)$. Substitute $x$ by the random variable $X$ and $y = E[X]$, then
$$
f(X) - f(E[X]) \geq f'(E[X])(X - E[X])
$$
After applying the expectation operator to both sides, we obtain (5.5). An important application of Jensen's inequality is obtained for $f(x) = e^{-zx}$ with real $z$ as
$$
e^{-zE[X]} \leq E\left[e^{-zX}\right] = \varphi_X(z)
$$
Any probability generating function $\varphi_X(z)$ is, for real $z$, bounded from below by $e^{-zE[X]}$.
A continuous analog of (5.3) with $f(x) = e^x$ (and similarly for $f(x) = \log x$)
$$
\exp\left[\frac{1}{v - u} \int_u^v f(x)\, dx\right] \leq \frac{1}{v - u} \int_u^v e^{f(x)}\, dx
$$
can be regarded as a generalization of the inequality between arithmetic and geometric mean.
5.3 Inequalities deduced from the Mean Value Theorem
The mean value theorem (Whittaker and Watson, 1996, p. 65) states that if $g(x)$ is continuous on $x \in [a, b]$, there exists a number $\xi \in [a, b]$ such that
$$
\int_a^b g(u)\, du = (b - a)\, g(\xi)
$$
or, alternatively, if $f(x)$ is differentiable on $[a, b]$, then
$$
f(b) - f(a) = (b - a)\, f'(\xi) \tag{5.6}
$$
The equivalence follows by putting $f(x) = \int_a^x g(u)\, du$. It is convenient to rewrite this relation for $0 \leq \theta \leq 1$ as
$$
f(x + h) - f(x) = h f'(x + \theta h)
$$
In this form, the mean value theorem is nothing else than a special case for $n = 1$ of Taylor's theorem (Whittaker and Watson, 1996, p. 96),
$$
f(x + h) - f(x) = \sum_{k=1}^{n-1} \frac{f^{(k)}(x)}{k!} h^k + \frac{h^n}{n!} f^{(n)}(x + \theta h) \tag{5.7}
$$
An important application of Taylor's Theorem (or of the mean value theorem) to the exponential function gives a list of inequalities. First,
$$
e^x = 1 + x + \frac{x^2}{2} e^{\theta x}
$$
and, since $e^{\theta x} > 0$ for any finite $x$, we have for any $x \neq 0$,
$$
e^x > 1 + x \tag{5.8}
$$
A direct generalization follows from Taylor's Theorem (5.7),
$$
e^x = \sum_{k=0}^{n-1} \frac{x^k}{k!} + \frac{x^n}{n!} e^{\theta x}
$$
such that, for $n = 2m$ and any $x \neq 0$,
$$
e^x > \sum_{k=0}^{2m-1} \frac{x^k}{k!}
$$
and, for $n = 2m + 1$,
$$
e^x > \sum_{k=0}^{2m} \frac{x^k}{k!} \qquad (x > 0)
$$
$$
e^x < \sum_{k=0}^{2m} \frac{x^k}{k!} \qquad (x < 0)
$$
Second, estimates of the product $\prod_{k=0}^{n} (1 + a_k x)$ where $a_k x \neq 0$ are obtained from (5.8) as$^3$
$$
\prod_{k=0}^{n} (1 + a_k x) < \exp\left(\sum_{k=0}^{n} a_k x\right)
$$
5.4 The Markov and Chebyshev inequalities
Consider first a non-negative random variable $X$. The expectation reads
$$
E[X] = \int_0^{\infty} x f_X(x)\, dx = \int_0^a x f_X(x)\, dx + \int_a^{\infty} x f_X(x)\, dx \geq \int_a^{\infty} x f_X(x)\, dx \geq a \int_a^{\infty} f_X(x)\, dx = a \Pr[X \geq a]
$$
Hence, we obtain the Markov inequality
$$
\Pr[X \geq a] \leq \frac{E[X]}{a} \tag{5.9}
$$

$^3$ A tighter bound is obtained if all $a_k > 0$ (e.g. $a_k$ is a probability). The above relation indicates that $g(x) = \prod_{k=0}^{n} (1 + a_k x)$ is smaller than $f(x) = \exp\left(x \sum_{k=0}^{n} a_k\right)$ for any $x \neq 0$ and $g(0) = f(0) = 1$. Further, from $(1 + a_k x) < e^{a_k x}$ it can be verified that, for all Taylor coefficients $1 < k \leq n$ holds that $0 < g_k \leq f_k$ and $g_2 < f_2$ such that $g(x) = \sum_{k=0}^{n} g_k x^k < \sum_{k=0}^{n} f_k x^k < \sum_{k=0}^{\infty} f_k x^k$ for $x > 0$. Thus, for $x = 1$, we have $g(1) < \sum_{k=0}^{n} f_k$ or
$$
\prod_{k=0}^{n} (1 + a_k) < \sum_{k=0}^{n} \frac{1}{k!} \left(\sum_{j=0}^{n} a_j\right)^k
$$
Another proof of the Markov inequality follows after taking the expectation of the inequality $a 1_{X \geq a} \leq X$ for $X \geq 0$. The restriction to non-negative random variables can be circumvented by considering the random variable $X = (Y - E[Y])^2$ and $a = t^2$ in (5.9),
$$
\Pr\left[(Y - E[Y])^2 \geq t^2\right] \leq \frac{E\left[(Y - E[Y])^2\right]}{t^2} = \frac{\operatorname{Var}[Y]}{t^2}
$$
From this, the Chebyshev inequality follows as
$$
\Pr\left[|X - E[X]| \geq t\right] \leq \frac{\sigma^2}{t^2} \tag{5.10}
$$
The Chebyshev inequality quantifies the spread of $X$ around the mean $E[X]$. The smaller $\sigma$, the more concentrated $X$ is around the mean.
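To get a feeling for how loose (5.9) and (5.10) can be, the bounds can be compared with exact probabilities. The sketch below does so for an exponential random variable, whose tail $\Pr[X \geq a] = e^{-\alpha a}$ is known in closed form; the choice of distribution, the function names and the grid of values are illustrative assumptions.

```python
import math


def exp_tail(a, rate=1.0):
    """Exact Pr[X >= a] for an exponential random variable with the given rate."""
    return math.exp(-rate * a) if a > 0 else 1.0


def markov_bound(a, mean):
    """Markov bound (5.9), capped at 1."""
    return min(1.0, mean / a)


def exp_two_sided(t, rate=1.0):
    """Exact Pr[|X - E[X]| >= t] for an exponential random variable."""
    mean = 1.0 / rate
    upper = math.exp(-rate * (mean + t))
    lower = 1.0 - math.exp(-rate * (mean - t)) if t < mean else 0.0
    return upper + lower


def chebyshev_bound(t, variance):
    """Chebyshev bound (5.10), capped at 1."""
    return min(1.0, variance / t ** 2)


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(a, exp_tail(a), markov_bound(a, 1.0),
              exp_two_sided(a), chebyshev_bound(a, 1.0))
```

For the exponential distribution both bounds decay only polynomially, while the exact probabilities decay exponentially.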
Further extensions of the Markov inequality use the equivalence between the events $\{X \geq a\} \Leftrightarrow \{g(X) \geq g(a)\}$ where $g$ is a monotonously increasing function. Hence, (5.9) becomes
$$
\Pr[X \geq a] \leq \frac{E[g(X)]}{g(a)}
$$
For example, if $g(x) = x^k$, then $\Pr[X \geq a] \leq \frac{E[X^k]}{a^k}$. An interesting application of this idea is based on the equivalence of the events $\{X \geq E[X] + t\} \Leftrightarrow \{e^{uX} \geq e^{u(E[X] + t)}\}$ provided $u \geq 0$. For $u \geq 0$,
$$
\Pr[X \geq E[X] + t] = \Pr\left[e^{uX} \geq e^{u(E[X] + t)}\right] \leq e^{-u(E[X] + t)}\, E\left[e^{uX}\right] \tag{5.11}
$$
where in the last step Markov's inequality (5.9) has been used. If the generating function or Laplace transform $E[e^{uX}]$ is known, the sharpest bound is obtained by the minimizer $u^*$ in $u$ of the right-hand side because (5.11) holds for any $u > 0$. In Section 5.7, we show that this minimizer $u^*$ obeying $\operatorname{Re} u > 0$ indeed exists for probability generating functions. The resulting inequality
$$
\Pr[X \geq E[X] + t] \leq e^{-u^*(E[X] + t)}\, E\left[e^{u^* X}\right] \tag{5.12}
$$
is called the Chernoff bound.
The Chernoff bound of the binomial distribution. Let $X$ denote a binomial random variable with probability generating function given by (3.2) such that $E[e^{uX}] = E\left[(e^u)^X\right] = (q + p e^u)^n$. Then, with $E[X] = np$,
$$
e^{-u(E[X] + t)}\, E\left[e^{uX}\right] = e^{-u(np + t) + n \log(q + p e^u)}
$$
Provided $\left.\frac{d^2}{du^2}\, e^{-u(E[X] + t)} E\left[e^{uX}\right]\right|_{u = u^*} > 0$, the minimum $u^*$ is solution of
$$
\frac{d}{du}\, e^{-u(E[X] + t)}\, E\left[e^{uX}\right] = 0
$$
Explicitly,
$$
\frac{d}{du}\, e^{-u(E[X] + t)}\, E\left[e^{uX}\right] = e^{-u(np + t) + n \log(q + p e^u)} \left(-(np + t) + \frac{n p e^u}{q + p e^u}\right)
$$
from which $u^*$ follows using $q = 1 - p$ as
$$
u^* = \log\left(\frac{npq + qt}{npq - pt}\right)
$$
Hence,
$$
e^{-u^*(E[X] + t)}\, E\left[e^{u^* X}\right] = \frac{\left(1 - \frac{t}{nq}\right)^{t - nq}}{\left(1 + \frac{t}{np}\right)^{t + np}}
$$
For large $n$, expanding the logarithm of this expression$^4$ gives, to leading order, $e^{-u^*(E[X] + t)}\, E\left[e^{u^* X}\right] \simeq e^{-\frac{t^2}{2npq}}$. Since $\operatorname{Var}[X] = npq$ and by denoting $y^2 = \frac{t^2}{\operatorname{Var}[X]}$, we find that the asymptotic regime for large $n$,
$$
\Pr\left[\frac{|X - E[X]|}{\sqrt{\operatorname{Var}[X]}} \geq y\right] \leq e^{-\frac{y^2}{2}} \tag{5.13}
$$
is in agreement with the Central Limit Theorem 6.3.1. The corresponding Chebyshev inequality,
$$
\Pr\left[\frac{|X - E[X]|}{\sqrt{\operatorname{Var}[X]}} \geq y\right] \leq \frac{1}{y^2}
$$
is considerably less tight for the binomial distribution than the Chernoff bound (5.13). More advanced and sharper inequalities than that of Chebyshev are surveyed by Janson (2002).

$^4$ Write
$$
e^{-u^*(E[X] + t)}\, E\left[e^{u^* X}\right] = \exp\left[(t - nq) \log\left(1 - \frac{t}{nq}\right) - (t + np) \log\left(1 + \frac{t}{np}\right)\right]
$$
and use the Taylor expansion of $\log(1 \pm x)$ around $x = 0$.
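The binomial Chernoff bound can be checked against the exact tail. The following sketch is an illustration only: function names and the parameters $n = 100$, $p = 0.25$, $t = 10$ are assumed, and `math.comb` requires Python 3.8 or later. It evaluates the closed form derived above, the optimizer $u^*$, the right-hand side of (5.11) and the exact upper tail $\Pr[X \geq np + t]$.

```python
import math


def binomial_upper_tail(n, p, k):
    """Exact Pr[X >= k] for a binomial(n, p) random variable."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def chernoff_bound(n, p, t):
    """Closed-form Chernoff bound on Pr[X >= np + t], valid for 0 < t < nq."""
    q = 1.0 - p
    log_bound = ((t - n * q) * math.log(1 - t / (n * q))
                 - (t + n * p) * math.log(1 + t / (n * p)))
    return math.exp(log_bound)


def chernoff_objective(n, p, t, u):
    """Right-hand side of (5.11) for the binomial distribution."""
    q = 1.0 - p
    return math.exp(-u * (n * p + t) + n * math.log(q + p * math.exp(u)))


def optimal_u(n, p, t):
    """Minimizer u* of the right-hand side of (5.11)."""
    q = 1.0 - p
    return math.log((n * p * q + q * t) / (n * p * q - p * t))


if __name__ == "__main__":
    n, p, t = 100, 0.25, 10
    print("exact tail Pr[X >= 35] :", binomial_upper_tail(n, p, 35))
    print("Chernoff bound         :", chernoff_bound(n, p, t))
    print("Gaussian-type estimate :", math.exp(-t ** 2 / (2 * n * p * (1 - p))))
    print("Chebyshev bound        :", n * p * (1 - p) / t ** 2)
```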
5.5 The Hölder, Minkowski and Young inequalities
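The Hölder inequality is
$$
\sum a^{\alpha} b^{\beta} \cdots z^{\zeta} \leq \left(\sum a\right)^{\alpha} \left(\sum b\right)^{\beta} \cdots \left(\sum z\right)^{\zeta}, \qquad \alpha + \beta + \cdots + \zeta = 1
$$
Let $a^{\alpha} = X$ and $b^{\beta} = Y$, and further $p = \frac{1}{\alpha} > 1$ and $q = \frac{1}{\beta} > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$, then we obtain as a frequently used application,
$$
E[XY] \leq \left(E[X^p]\right)^{1/p} \left(E[Y^q]\right)^{1/q} \tag{5.14}
$$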
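The Hölder inequality can be deduced from the basic convexity inequality (5.4). Since $-\log x$ is a convex function for real $x > 0$, the basic convexity inequality (5.4) is with $0 \leq \theta \leq 1$,
$$
\log(\theta u + (1-\theta) v) \geq \theta \log(u) + (1-\theta) \log(v)
$$
After exponentiation, we obtain for $u, v > 0$ a more general inequality than (5.1), which corresponds to $\theta = \frac{1}{2}$,
$$
u^{\theta} v^{1-\theta} \leq \theta u + (1-\theta) v
$$
Substitute $u = \frac{|x_j|^p}{\sum_{i=1}^{n} |x_i|^p}$ and $v = \frac{|y_j|^q}{\sum_{i=1}^{n} |y_i|^q}$, then
$$
\left(\frac{|x_j|^p}{\sum_{i=1}^{n} |x_i|^p}\right)^{\theta} \left(\frac{|y_j|^q}{\sum_{i=1}^{n} |y_i|^q}\right)^{1-\theta} \leq \theta \frac{|x_j|^p}{\sum_{i=1}^{n} |x_i|^p} + (1-\theta) \frac{|y_j|^q}{\sum_{i=1}^{n} |y_i|^q}
$$
and summing over all $j$ yields
$$
\sum_{j=1}^{n} |x_j|^{p\theta}\, |y_j|^{q(1-\theta)} \leq \left(\sum_{j=1}^{n} |x_j|^p\right)^{\theta} \left(\sum_{j=1}^{n} |y_j|^q\right)^{1-\theta} \tag{5.15}
$$
By choosing $p = \frac{1}{\theta}$ and $q = \frac{1}{1-\theta}$, we arrive at the Hölder inequality with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,
$$
\sum_{j=1}^{n} |x_j y_j| \leq \left(\sum_{j=1}^{n} |x_j|^p\right)^{\frac{1}{p}} \left(\sum_{j=1}^{n} |y_j|^q\right)^{\frac{1}{q}} \tag{5.16}
$$
A special important case of the Hölder inequality (5.14) for $p = q = 2$ is the Cauchy-Schwarz inequality,
$$
\left(E[XY]\right)^2 \leq E\left[X^2\right] E\left[Y^2\right] \tag{5.17}
$$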
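The algebraic form (5.16) can be verified numerically for arbitrary vectors. The sketch below is a small check with hypothetical helper names and randomly chosen test vectors: it computes the difference between the right-hand and left-hand side of (5.16) for a given $p$, with $q = \frac{p}{p-1}$, and the case $p = 2$ with identical vectors shows that equality can occur in the Cauchy-Schwarz inequality.

```python
import random


def p_norm(values, p):
    """(sum |v|^p)^(1/p) for a finite sequence."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def holder_gap(xs, ys, p):
    """Right-hand side minus left-hand side of (5.16); non-negative by Hölder."""
    q = p / (p - 1.0)          # conjugate exponent, 1/p + 1/q = 1
    lhs = sum(abs(x * y) for x, y in zip(xs, ys))
    return p_norm(xs, p) * p_norm(ys, q) - lhs


if __name__ == "__main__":
    rng = random.Random(7)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    ys = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    for p in (1.5, 2.0, 3.0):
        print("p =", p, "gap =", holder_gap(xs, ys, p))
    print("Cauchy-Schwarz equality case:", holder_gap(xs, xs, 2.0))
```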
It is of interest to mention that the Hölder inequality is of a general type in the following sense (Hardy et al., 1999, Theorem 101 (p. 82)). Suppose that $f(x)$ is convex (such that the inverse $g(x) = f^{-1}(x)$ is also convex) and that $f(0) = 0$. If $F(x) = \int_0^x f(u)\, du$ and $G(x) = \int_0^x g(u)\, du$, and if
$$
\sum_{k=1}^{n} q_k a_k b_k \leq F^{-1}\left(\sum_{k=1}^{n} q_k F(a_k)\right) G^{-1}\left(\sum_{k=1}^{n} q_k G(b_k)\right)
$$
with $\sum_{k=1}^{n} q_k = 1$ holds for all positive $a_k$ and $b_k$, then $f(x) = x^r$ and the above inequality is Hölder's inequality.
The next inequalities are of a different type. For $p > 1$, the Minkowski inequality is
$$
\left(E\left[|X + Y|^p\right]\right)^{1/p} \leq \left(E\left[|X|^p\right]\right)^{1/p} + \left(E\left[|Y|^p\right]\right)^{1/p} \tag{5.18}
$$
or, written algebraically,
$$
\left(\sum_{j=1}^{n} |x_j + y_j|^p\right)^{\frac{1}{p}} \leq \left(\sum_{j=1}^{n} |x_j|^p\right)^{\frac{1}{p}} + \left(\sum_{j=1}^{n} |y_j|^p\right)^{\frac{1}{p}}
$$
Suppose that $f(x)$ is continuous and strictly increasing for $x \geq 0$ and $f(0) = 0$. Then the inverse function $g(x) = f^{-1}(x)$ satisfies the same conditions. The Young inequality states that for $a \geq 0$ and $b \geq 0$ holds that
$$
ab \leq \int_0^a f(u)\, du + \int_0^b g(u)\, du \tag{5.19}
$$
with equality only if $b = f(a)$. The Young inequality follows by geometrical consideration. The first integral is the area under the curve $y = f(x)$ from $[0, a]$, while the second is the area under the curve $x = g(y) = f^{-1}(y)$ from $[0, b]$.
Applications of the Cauchy-Schwarz inequality
1. We will demonstrate that both the generating function $\varphi_X(z) = E\left[e^{-zX}\right]$ and its logarithm $L_X(z) = \log(\varphi_X(z))$ are convex functions of $z$. First, the second derivative is continuous and non-negative because $\varphi_X''(z) = E\left[X^2 e^{-zX}\right] \geq 0$. Further, since
$$
L_X''(z) = \frac{\varphi_X(z)\, \varphi_X''(z) - \left(\varphi_X'(z)\right)^2}{\varphi_X^2(z)}
$$
it remains to show that $\varphi_X(z)\, \varphi_X''(z) - \left(\varphi_X'(z)\right)^2 \geq 0$. From the Cauchy-Schwarz inequality (5.17) applied to $\varphi_X'(z) = -E\left[X e^{-zX}\right]$ with $X \to e^{-\frac{z}{2} X}$ and $Y = X e^{-\frac{z}{2} X}$, we obtain $\left(\varphi_X'(z)\right)^2 = \left(E\left[X e^{-zX}\right]\right)^2 \leq E\left[e^{-zX}\right] E\left[X^2 e^{-zX}\right] = \varphi_X(z)\, \varphi_X''(z)$. Hence, $L_X''(z) \geq 0$.
2. Let $Y = 1_{X > 0}$ in (5.17) while $X$ is a non-negative random variable, then with (2.13),
$$
\left(E[X]\right)^2 \leq E\left[X^2\right] E\left[1_{X > 0}\right] = E\left[X^2\right] \left(1 - \Pr[X = 0]\right)
$$
such that an upper bound for $\Pr[X = 0]$ is obtained,
$$
\Pr[X = 0] \leq 1 - \frac{\left(E[X]\right)^2}{E\left[X^2\right]} \tag{5.20}
$$
5.6 The Gauss inequality
In this section, we consider a continuous random variable $X$ with even probability density function, i.e. $f_X(-x) = f_X(x)$, which is not increasing for $x > 0$. A typical example of such random variables are measurement errors due to statistical fluctuations.
In his epoch-making paper, Gauss (1821) established the method of the least squares (see e.g. Section 2.2.1 and 2.5.3). In that same paper, Gauss (1821, pp. 10-11) also stated and proved Theorem 5.6.1, which is appealing because of its generality.
We define the probability $m$ as
$$
m = \Pr[-\lambda\sigma \leq X \leq \lambda\sigma] = \int_{-\lambda\sigma}^{\lambda\sigma} f_X(u)\, du \tag{5.21}
$$
where $\sigma = \sqrt{\operatorname{Var}[X]}$ is the standard deviation.
Theorem 5.6.1 (Gauss) If $X$ is a continuous random variable with even probability density function, i.e. $f_X(-x) = f_X(x)$, which is not increasing for $x > 0$, then
if $m < \frac{2}{3}$ then $\lambda \leq m\sqrt{3}$
if $m = \frac{2}{3}$ then $\lambda \leq \sqrt{\frac{4}{3}}$
if $m > \frac{2}{3}$ then $\lambda < \frac{2}{3\sqrt{1-m}}$
and, conversely,
if $\lambda < \sqrt{\frac{4}{3}}$ then $m \geq \frac{\lambda}{\sqrt{3}}$
if $\lambda > \sqrt{\frac{4}{3}}$ then $m \geq 1 - \frac{4}{9\lambda^2}$
Given a bound on the probability $m$, Gauss's Theorem 5.6.1 bounds the extent of the error $X$ around its mean zero in units of the standard deviation $\sigma$ or, equivalently, it provides bounds for the normalized random variable $X^* = \frac{X - E[X]}{\sigma}$. The proof of this theorem only uses real function theory and is characteristic for the genius of Gauss.
Proof: Consider the inverse function $x = g(y)$ of the integral $y = \int_{-x}^{x} f_X(u)\, du = F_X(x) - F_X(-x)$. An interesting general property of the inverse function is
$$
\int_0^1 g^2(u)\, du = \int_{-\infty}^{\infty} x^2 f_X(x)\, dx
$$
which is verified by the substitution $x = g(u)$. Since $E[X] = 0$ and $\operatorname{Var}[X] = E\left[X^2\right]$, we have
$$
\int_0^1 g^2(u)\, du = \sigma^2 = \operatorname{Var}[X] \tag{5.22}
$$
Beside $g(0) = 0$, the derivative
$$
g'(y) = \frac{1}{\frac{d}{dx}\left[F_X(x) - F_X(-x)\right]} = \frac{1}{f_X(x) + f_X(-x)}
$$
is increasing from $y = 0$ until $y = 1$ because $f_X(x)$ attains a maximum at $x = 0$ and is not increasing for $x > 0$. Hence, $g''(y) \geq 0$. From the differential
$$
d\left(y g'(y)\right) = g'(y)\, dy + y g''(y)\, dy
$$
we obtain by integration
$$
y g'(y) - g(y) = \int_0^y u g''(u)\, du
$$
Since $g''(y) \geq 0$, we have that $y g'(y) - g(y) \geq 0$ and since $y g'(y) > 0$ (for $y > 0$) that
$$
h(y) = 1 - \frac{g(y)}{y g'(y)}
$$
lies in the interval $[0, 1]$. From (5.21), it follows that $\lambda\sigma = g(m)$ and that $h(m) = 1 - \frac{\lambda\sigma}{m g'(m)}$ or
$$
g'(m) = \frac{\lambda\sigma}{m (1 - h(m))}
$$
With this preparation, consider now the following linear function
$$
G(y) = \frac{\lambda\sigma}{m (1 - h(m))}\, (y - m h(m)) \tag{5.23}
$$
Clearly, we have that $G(m) = \lambda\sigma$ and that $G'(y) = \frac{\lambda\sigma}{m(1 - h(m))} = g'(m)$ is independent of $y$. Since $g'(y)$ is non-decreasing, which is the basic assumption of the theorem, the difference $g'(y) - G'(y)$ is negative if $y < m$, but positive if $y > m$. Since $g'(y) - G'(y) = \frac{d}{dy}\left(g(y) - G(y)\right)$, the function $g(y) - G(y)$ is convex with minimum at $y = m$ for which $g(m) - G(m) = 0$. Hence, $g(y) - G(y) \geq 0$ for all $y \in [0, 1]$. Further, $G(y)$ is positive for $y \in (m h(m), 1]$. Especially in this interval, the inequality $g(y) \geq G(y)$ is sharp because $g(y)$ is positive in $(0, 1]$. Thus,
$$
\int_{m h(m)}^1 G^2(y)\, dy \leq \int_{m h(m)}^1 g^2(y)\, dy < \int_0^1 g^2(y)\, dy
$$
Using (5.22) and with (5.23), we have
$$
\frac{\lambda^2 \sigma^2}{m^2 (1 - h(m))^2}\, \frac{(1 - m h(m))^3}{3} < \sigma^2
$$
from which we arrive at the inequality
$$
\lambda^2 < \frac{3 m^2 (1 - z)^2}{(1 - m z)^3} \tag{5.24}
$$
where $z = h(m) \in [0, 1]$. The derivative of the right-hand side with respect to $z$,
$$
\frac{d}{dz}\left(\frac{3 (1 - z)^2 m^2}{(1 - m z)^3}\right) = -3 m^2\, \frac{(1 - z)}{(1 - m z)^4}\, (2 - 3m + mz)
$$
shows that $\frac{3 m^2 (1 - z)^2}{(1 - m z)^3}$ is monotonously decreasing for all $z \in [0, 1]$ if $m < \frac{2}{3}$ with maximum at $z = 0$. Thus, if $m < \frac{2}{3}$, evaluating (5.24) at $z = 0$ yields $\lambda < \sqrt{3}\, m$. On the other hand, if $m > \frac{2}{3}$, then $\frac{3 m^2 (1 - z)^2}{(1 - m z)^3}$ is maximal provided $2 - 3m + mz = 0$ or for $z = 3 - \frac{2}{m}$. With that value of $z$, the inequality (5.24) yields $\lambda < \frac{2}{3\sqrt{1 - m}}$. Both regimes $m > \frac{2}{3}$ and $m < \frac{2}{3}$ tend to the same bound $\lambda < \frac{2}{\sqrt{3}}$ if $m \to \frac{2}{3}$. The converse is similarly derived from (5.24). $\square$
If $X$ has a symmetric uniform distribution with $f_X(x) = \frac{1}{2a} 1_{x \in [-a, a]}$, then $m = \frac{\lambda\sigma}{a}$ and $\sigma = \frac{a}{\sqrt{3}}$ from which $m = \frac{\lambda}{\sqrt{3}}$. This example shows that Gauss's Theorem 5.6.1 is sharp for $m \leq \frac{2}{3}$ in the sense that equality can occur in the first condition $\lambda \leq \sqrt{3}\, m$.
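The converse part of Theorem 5.6.1 gives a lower bound on $m = \Pr[|X| \leq \lambda\sigma]$ that can be compared with Chebyshev's $1 - \frac{1}{\lambda^2}$. The sketch below is an illustration with assumed function names: it evaluates both bounds together with the exact $m$ for a zero mean Gaussian (via `math.erf`) and for the symmetric uniform density of the example.

```python
import math


def gauss_lower_bound(lam):
    """Lower bound of Theorem 5.6.1 on m = Pr[|X| <= lam * sigma]."""
    if lam <= math.sqrt(4.0 / 3.0):
        return lam / math.sqrt(3.0)
    return 1.0 - 4.0 / (9.0 * lam * lam)


def chebyshev_lower_bound(lam):
    """Lower bound on m that follows from the Chebyshev inequality (5.10)."""
    return max(0.0, 1.0 - 1.0 / (lam * lam))


def normal_mass(lam):
    """Exact m for a zero mean Gaussian random variable."""
    return math.erf(lam / math.sqrt(2.0))


def uniform_mass(lam):
    """Exact m for the uniform density on [-a, a], where sigma = a / sqrt(3)."""
    return min(1.0, lam / math.sqrt(3.0))


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(lam, chebyshev_lower_bound(lam), gauss_lower_bound(lam),
              normal_mass(lam), uniform_mass(lam))
```

For $\lambda \leq \sqrt{4/3}$ the uniform distribution attains the Gauss bound exactly, in line with the sharpness remark above.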
5.7 The dominant pole approximation and large deviations
In this section, we relate asymptotic results of generating functions to the theory of large deviations. An asymptotic expansion in discrete-time is compared to established large deviations results.
The first approach using the generating function $\varphi_X(z)$ of the random variable $X$ is an immediate consequence of Lemma 5.7.1.
Lemma 5.7.1 If $\varphi_X(z)$ is meromorphic with residues $r_k$ at the (simple) poles $p_k$ ordered as $0 < |p_0| \leq |p_1| \leq |p_2| \leq \cdots$ and if $\varphi_X(z) = o(z^{N+1})$ as $z \to \infty$, then holds
$$
\varphi_X(z) = \sum_{k=0}^{N} \Pr[X = k]\, z^k + \sum_{k=0}^{\infty} \frac{r_k\, z^{N+1}}{p_k^{N+1} (z - p_k)} \tag{5.25}
$$
$$
= \sum_{k=0}^{N} \Pr[X = k]\, z^k + \sum_{k=0}^{\infty} r_k \left(\frac{1}{z - p_k} + \sum_{m=0}^{N} \frac{z^m}{p_k^{m+1}}\right) \tag{5.26}
$$
The normalization condition $\varphi_X(1) = 1$ implies that
$$
\Pr[X > N] = 1 - \sum_{k=0}^{N} \Pr[X = k] = \sum_{k=0}^{\infty} \frac{r_k}{p_k^{N+1} (1 - p_k)} \tag{5.27}
$$
The Lemma follows from Titchmarsh (1964, Section 3.21). Rewriting (5.26) gives,
$$
\varphi_X(z) = \sum_{k=0}^{N} \Pr[X = k]\, z^k - \sum_{j=N+1}^{\infty} \left(\sum_{k=0}^{\infty} \frac{r_k}{p_k^{j+1}}\right) z^j \tag{5.28}
$$
and hence,
$$
\Pr[X = j] = -\sum_{k=0}^{\infty} \frac{r_k}{p_k^{j+1}} \qquad (j > N) \tag{5.29}
$$
The tail probability for $K > N$ follows from (5.29) as
$$
\Pr[X > K] = \sum_{j=K+1}^{\infty} \Pr[X = j] = \sum_{k=0}^{\infty} \frac{r_k}{p_k^{K+1} (1 - p_k)} \qquad (K > N) \tag{5.30}
$$
Lemma 5.7.1 means that, if the plot $\Pr[X = j]$ versus $j$ exhibits a kink at $j = N$, then $\varphi_X(z) = O\left(z^N\right)$ as $z \to \infty$. Alternatively$^5$, the asymptotic regime does not start earlier than $j \geq N$. For large $K$, only the pole with smallest modulus, $p_0$, will dominate. Hence,
$$
\Pr[X > K] \approx \frac{r_0}{p_0^{K+1} (1 - p_0)} \tag{5.31}
$$
This approximation is called the dominant pole approximation with the residue at the simple pole $p_0$ equal to $r_0 = \lim_{z \to p_0} \varphi_X(z)(z - p_0)$.
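A toy example, not taken from the text, shows the dominant pole approximation (5.31) at work. Assume $X$ is a mixture of two geometric distributions, $\Pr[X = j] = w(1-a)a^j + (1-w)(1-b)b^j$ with $0 < b < a < 1$, so that $\varphi_X(z) = E[z^X] = \frac{w(1-a)}{1 - az} + \frac{(1-w)(1-b)}{1 - bz}$ has simple poles at $\frac{1}{a}$ and $\frac{1}{b}$. The pole closest to the origin is $p_0 = \frac{1}{a}$ with residue $r_0 = -\frac{w(1-a)}{a}$; all names and parameter values below are hypothetical.

```python
def exact_tail(K, w, a, b):
    """Exact Pr[X > K] for the geometric mixture with weights w and 1 - w."""
    return w * a ** (K + 1) + (1.0 - w) * b ** (K + 1)


def dominant_pole_tail(K, w, a):
    """Approximation (5.31) using only the pole p0 = 1/a closest to the origin."""
    p0 = 1.0 / a
    r0 = -w * (1.0 - a) / a       # residue of w(1-a)/(1-az) at z = p0
    return r0 / (p0 ** (K + 1) * (1.0 - p0))


if __name__ == "__main__":
    w, a, b = 0.3, 0.9, 0.5
    for K in (0, 5, 10, 20, 40):
        exact = exact_tail(K, w, a, b)
        approx = dominant_pole_tail(K, w, a)
        print(K, exact, approx, approx / exact)
```

The ratio of approximation to exact tail tends to 1 geometrically fast in $K$, at the rate $(b/a)^{K+1}$ set by the second pole.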
The second approach is a large deviations approximation in discrete-time. We have
$$
\log \Pr[X > K] = \log \sum_{j=K+1}^{\infty} \Pr[X = j]
\leq \log \sum_{j=K+1}^{\infty} x^{j-K-1} \Pr[X = j] \qquad (x \in \mathbb{R} \text{ and } x \geq 1)
$$
$$
\leq \log\left(x^{-K-1} \sum_{j=0}^{\infty} x^j \Pr[X = j]\right) = -\left[(K+1) \log x - \log \varphi_X(x)\right] \tag{5.32}
$$
This inequality holds for all real $x \geq 1$. To get the tightest bound, we determine the maximizer $x_{\max}$ of the bracketed expression in (5.32), thus $I(K) = \sup_{x \geq 1}\left[(K+1) \log x - \log \varphi_X(x)\right]$. There exists such a supremum on account of the convexity of $I(K)$ because $\varphi_X(x)$ and $\log \varphi_X(x)$ are convex for $x \geq 1$ as shown in Section 5.5. Assuming that the maximum, say $x_{\max}$, exists, then it is solution of $x_{\max} = (K+1) \frac{\varphi_X(x_{\max})}{\varphi_X'(x_{\max})}$ and the large deviations estimate becomes
$$
\Pr[X > K] \leq e^{-\left[(K+1) \log x_{\max} - \log \varphi_X(x_{\max})\right]} = \varphi_X(x_{\max})\, x_{\max}^{-(K+1)} \tag{5.33}
$$
Observe that (5.33) can be obtained directly from (5.11) with $K = t + E[X]$. Comparing (5.33) and (5.31) indicates, for large $K$, that $x_{\max} = p_0$ because
$$
\lim_{K \to \infty} -\frac{\log \Pr[X > K]}{K} = \log p_0 = \log x_{\max}
$$

$^5$ In terms of the queue occupancy in ATM, the initial $\Pr[X = j]$-regime for $j < N$ reflects the cell scale, while the asymptotic regime $j \geq N$ refers to the burst scale.
Example A frequently appearing “dominant pole” (see, for example, the extinction probability of a Poisson branching process in Section 12.3, the M/D/1 queue in Section 14.5 and the size of the giant component in the random graph in Section 15.6.4) is the real zero $\zeta$ different from 1 of $e^{\lambda(z-1)} - z$. The trivial zero is $z = 1$. The non-trivial solution of $e^{\lambda(\zeta - 1)} = \zeta$ can be expressed as a Lagrange series (Markushevich, 1985, p. 94) for⁶ $\lambda > 1$,
$$\zeta = e^{-\lambda} \sum_{n=0}^{\infty} \frac{(n+1)^{n-1}}{n!} \left(\lambda e^{-\lambda}\right)^{n} \tag{5.34}$$
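The Lagrange series (5.34) can be checked numerically. The following is a minimal sketch, assuming the symbols $\lambda$ for the parameter and $\zeta$ for the zero; the function name and the truncation at 400 terms are illustrative choices. Terms are evaluated through logarithms so that $(n+1)^{n-1}/n!$ does not overflow.

```python
import math

def lagrange_zero(lam, terms=400):
    """Truncated Lagrange series (5.34) for the zero of exp(lam*(z-1)) - z."""
    x = lam * math.exp(-lam)
    total = 0.0
    for n in range(terms):
        # log of (n+1)^(n-1) / n! * x^n, computed in the log domain
        log_term = (n - 1) * math.log(n + 1) - math.lgamma(n + 1) + n * math.log(x)
        total += math.exp(log_term)
    return math.exp(-lam) * total

if __name__ == "__main__":
    zeta = lagrange_zero(2.0)
    print(zeta, math.exp(2.0 * (zeta - 1.0)) - zeta)  # residual should be close to 0
    print(lagrange_zero(0.5))                          # the series returns 1 below lam = 1
```

The series converges slowly near $\lambda = 1$, where $\lambda e^{-\lambda}$ reaches the radius of convergence $1/e$, so the truncation length would have to grow there.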
An exact and fast converging expansion for $\zeta$ around $\lambda = 1$,
"
(1 )2 2 (1 )3 22 (1 )4 52 (1 )5
2
(1 ) +
=1+
+
+
+
3
9
135
405
20 (1 )6 3824 (1 )7 1424 (1 )8 15856 (1 )9
+
+
+
189
42525
18225
229635
#
10
11
¢
¡
44536288 (1 )
11714672 (1 )
11
+
+ r (1 )
+
189448875
795685275
+
(5.35)
is derived in Van Mieghem (1996) as the zero of $e^{\lambda(z-1)} - z$. The numerical data show that the approximation $\zeta \approx \frac{1}{\lambda^{2}}$, which can be deduced from the series, is accurate to within 1% for $0.84 \le \lambda < 1$.
6 From (14.43) we observe for $a = b = 1$ and $z = \zeta$ that in (5.34) $\zeta = 1$ for all $0 \le \lambda < 1$.
6
Limit laws
Limit laws lie at the heart of analysis and probability theory. Solutions of
problems often considerably simplify in limit cases. For example, in Section
16.5.1, the flooding time in the complete graph with $N$ nodes and exponentially distributed link weights can be computed exactly. The exact expression is unattractive but, fortunately, the limit result for $N \to \infty$ is appealing. Many more results and deep discussions are found in the books of Feller (1970, 1971). In this chapter, we will mainly be concerned with sums of independent random variables, $S_n = \sum_{k=1}^{n} X_k$.
6.1 General theorems from analysis
In this section, we define modes of convergence of sequences of random
variables and state (without proof) some general theorems that will be used
later on.
6.1.1 Summability
We will need results from analysis on summability¹. First the discrete case is presented and then the continuous case.
Lemma 6.1.1 Let $\{a_n\}_{n \ge 1}$ be a sequence of numbers with $\lim_{n \to \infty} a_n = a$, then the average of the first $n$ terms converges to $a$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} a_m = a \tag{6.1}$$
1 In his classical treatise on Divergent Series, Hardy (1948) discusses Cesàro, Abel, Euler and Borel summability in depth.
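Proof: The demonstration of (6.1) is short enough to include here. The fact that there is a limit $a$ of the sequence $a_1, a_2, \ldots$ implies that, for an arbitrary $\varepsilon > 0$, there exists a finite number $n_0$ such that, for all $n > n_0$, it holds that $|a_n - a| < \varepsilon$. Consider the average $s_n = \frac{1}{n}\sum_{m=1}^{n} a_m$ or, rewritten,
$$s_n - a = \frac{1}{n}\sum_{m=1}^{n_0} (a_m - a) + \frac{1}{n}\sum_{m=n_0+1}^{n} (a_m - a)$$
Hence, with the constant $c = \sum_{m=1}^{n_0} |a_m - a|$,
$$|s_n - a| \le \frac{1}{n}\sum_{m=1}^{n_0} |a_m - a| + \frac{1}{n}\sum_{m=n_0+1}^{n} |a_m - a| < \frac{c}{n} + \frac{n - n_0}{n}\,\varepsilon < \frac{c}{n} + \varepsilon$$
Since $c$ is a constant, $\frac{c}{n}$ can be made smaller than $\varepsilon$ for $n$ large enough, such that $|s_n - a| < 2\varepsilon$. Because $\varepsilon$ is arbitrary, this is equivalent to (6.1). □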
In fact, as illustrated by many examples in Hardy (1948, Chapters I and II), the limit in (6.1) exists in more cases than $\lim_{n \to \infty} a_n = a$ does. For example, if $a_{2n} = 1$ and $a_{2n+1} = 0$, the limit $\lim_{n \to \infty} a_n$ does not exist, but (6.1) tends to $\frac{1}{2}$. Probabilistically, Lemma 6.1.1 is closely related to the sample mean and the Law of Large Numbers (Section 6.2).
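The alternating example can be reproduced in a few lines. This is only an illustrative sketch (the function name is hypothetical): it computes the running averages of (6.1) for the sequence with $a_{2n} = 1$ and $a_{2n+1} = 0$, which has no limit itself, and for a convergent sequence.

```python
def cesaro_means(seq):
    """Running averages (1/n) * sum_{m=1}^{n} a_m of a finite sequence."""
    total = 0.0
    means = []
    for n, a in enumerate(seq, start=1):
        total += a
        means.append(total / n)
    return means

# a_m = 1 for even m and 0 for odd m: no pointwise limit, Cesaro limit 1/2
alternating = [1 if m % 2 == 0 else 0 for m in range(1, 10001)]
# a_m = 1 + 1/m converges to 1, and so do its running averages
convergent = [1.0 + 1.0 / m for m in range(1, 10001)]

if __name__ == "__main__":
    print(cesaro_means(alternating)[-1])
    print(cesaro_means(convergent)[-1])
```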
The continuous case distinguishes between $\lim_{t \to \infty} g(t)$, which is called the pointwise limit (for sufficiently large $t$, all values $g(t)$ will be arbitrarily close to that limit), and the limit $\lim_{t \to \infty} \frac{1}{t}\int_0^t g(u)\,du$, which is called the time average² of $g$.
Lemma 6.1.2 If the pointwise limit $\lim_{t \to \infty} g(t) = g_\infty$ exists, then the time average
$$\lim_{t \to \infty} \frac{1}{t}\int_0^t g(u)\,du = g_\infty.$$
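Proof: The proof is analogous to that of Lemma 6.1.1 in the discrete case since $\lim_{t \to \infty} g(t) = g_\infty$ means that, for an arbitrary $\varepsilon > 0$, there exists a finite number $T$ such that, for all $t > T$, it holds that $|g(t) - g_\infty| < \varepsilon$. For any $t > T$,
$$\frac{1}{t}\int_0^t g(u)\,du - g_\infty = \frac{1}{t}\int_0^T \left(g(u) - g_\infty\right)du + \frac{1}{t}\int_T^t \left(g(u) - g_\infty\right)du$$
and, with the constant $c = \left|\int_0^T \left(g(u) - g_\infty\right)du\right|$,
$$\left|\frac{1}{t}\int_0^t g(u)\,du - g_\infty\right| \le \frac{1}{t}\left|\int_0^T \left(g(u) - g_\infty\right)du\right| + \frac{1}{t}\int_T^t \left|g(u) - g_\infty\right|du < \frac{c}{t} + \frac{t - T}{t}\,\varepsilon$$
Since $c$ is a constant, the lemma follows by letting $t \to \infty$. □
2 In summability theory, it is also known as the Cesàro limit of $g$.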
Both in Markov theory (Section 9.3.2) and in Little’s Law (Section 13.6)
these Lemmas will be used.
6.1.2 Convergence of a sequence of random variables
A sequence $\{X_k\}_{k \ge 0}$ of random variables may converge to a random variable $X$ in several ways. If
$$\Pr\left[\lim_{k \to \infty} |X_k - X| = 0\right] = 1$$
then the sequence $\{X_k\}_{k \ge 0}$ converges to $X$ with probability 1 (w.p. 1) or almost surely (a.s.). This mode of convergence is denoted by $X_k \to X$ w.p. 1 or a.s. as $k \to \infty$. If, for any $\varepsilon > 0$,
$$\lim_{k \to \infty} \Pr\left[|X_k - X| > \varepsilon\right] = 0$$
then it is said that the sequence $\{X_k\}_{k \ge 0}$ converges in probability or in measure to $X$. This mode of convergence is denoted by $X_k \xrightarrow{p} X$ as $k \to \infty$. Convergence in probability is a weaker notion of convergence than almost sure convergence. Almost sure convergence implies convergence in probability, whereas convergence in probability only guarantees that there exists a subsequence of $\{X_k\}_{k \ge 0}$ that converges almost surely. An equivalent criterion for almost sure convergence is, for any $\varepsilon > 0$,
$$\Pr\left[|X_k - X| > \varepsilon \ \text{i.o.}\right] = 0$$
where “i.o.” stands for “infinitely often”, thus for an infinite number of $k$.
If, for all $x$ with the possible exception of a set of measure zero where $F_X(x)$ is discontinuous, the distributions satisfy
$$\lim_{k \to \infty} F_{X_k}(x) = F_X(x)$$
then the sequence $\{X_k\}_{k \ge 0}$ converges in distribution to $X$, denoted as $X_k \xrightarrow{d} X$ as $k \to \infty$ or, sometimes, in mixed form as $X_k \xrightarrow{d} F_X$ as $k \to \infty$. If, for $1 \le q$,
$$\lim_{k \to \infty} E\left[|X_k - X|^{q}\right] = 0$$
then the sequence $\{X_k\}_{k \ge 0}$ converges to $X$ in $L^{q}$, the space of all functions $f$ for which $\int_{-\infty}^{\infty} |f(x)|^{q}\,dx < \infty$. The most common values of $q$ are 1, 2 and $q = \infty$. This convergence is also called convergence in norm (see Appendix A.3). The Markov inequality (5.9)
$$\Pr\left[|X_k - X| \ge \varepsilon\right] \le \frac{E\left[|X_k - X|\right]}{\varepsilon}$$
shows that convergence in mean ($q = 1$) implies convergence in probability.
In general, it is fair to say that the convergence of sequences belongs to the most complicated topics in both analysis and probability theory. In many limit theorems, for example, the Law of Large Numbers in Section 6.2 and Little’s Law in Section 13.6, the art consists in proving the theorem with the least possible number of assumptions or in its most widely applicable form.
6.1.3 List of general theorems
Theorem 6.1.3 (Continuity Theorem) Let $\{F_n\}_{n \ge 1}$ be a sequence of distribution functions with corresponding probability generating functions $\{\varphi_n\}_{n \ge 1}$. If $\lim_{n \to \infty} \varphi_n(z) = \varphi(z)$ exists for all $z$ and, in addition, if $\varphi$ is continuous at $z = 0$, then there exists a limiting distribution function $F$ with generating function $\varphi$ for which $F_n \xrightarrow{d} F$.
Proof: See e.g. Berger (1993, p. 51). □
Theorem 6.1.4 (Dominated Convergence Theorem) Let $\{f_n\}_{n \ge 1}$ and $f$ be real functions and suppose that for each $x$
$$\lim_{n \to \infty} f_n(x) = f(x)$$
If there exists a real function $g(x)$ such that $|f_n(x)| < g(x)$ and for which the random variable $g(X)$ has finite expectation, then
$$\lim_{n \to \infty} E\left[f_n(X)\right] = E\left[f(X)\right]$$
Proof: See e.g. Royden (1988, Chapter 4). □
6.2 Law of Large Numbers
Theorem 6.2.1 (Weak Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If the expectation $\mu = E[X]$ exists, then, for any $\varepsilon > 0$,
$$\lim_{n \to \infty} \Pr\left[\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right] = 0 \tag{6.2}$$
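Proof³: Replacing $X_k$ by $X_k - \mu$ demonstrates that, without loss of generality, we may assume that $\mu = 0$. Denote $U_n = \frac{S_n}{n}$, then $\varphi_{U_n}(z) = E\left[e^{-zU_n}\right] = E\left[e^{-zS_n/n}\right]$. Since the set $\{X_k\}$ is independent with common distribution, applying relation (2.66) yields
$$\varphi_{U_n}(z) = \left(\varphi_X\left(\frac{z}{n}\right)\right)^{n}$$
Since the expectation exists ($\mu = 0$), the Taylor expansion (2.40) of $\varphi_X$ around $z = 0$ is $\varphi_X(z) = 1 + o(z)$ and $\varphi_{U_n}(z) = \left(1 + o\left(\frac{z}{n}\right)\right)^{n}$. Taking the logarithm, $\log \varphi_{U_n}(z) = n \log\left(1 + o\left(\frac{z}{n}\right)\right) = n\, o\left(\frac{z}{n}\right)$, which tends to zero for large $n$ at fixed $z$, such that $\lim_{n \to \infty} \varphi_{U_n}(z) = 1$. By the Continuity Theorem 6.1.3, $\varphi_U(z) = E\left[e^{-zU}\right] = 1$, which implies that $U_n \xrightarrow{d} 0$. Hence, the sequence $\frac{S_n}{n}$ converges in distribution to $\mu = 0$, which, because the limit is a constant, is equivalent to (6.2). □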
The Weak Law of Large Numbers is a general result on the behavior of the sample mean $\frac{S_n}{n}$ of independent random variables with the same existing expectation $\mu$. It is weak in the sense that only convergence in probability is established. For large $n$, the Weak Law of Large Numbers states that the sample mean $\frac{S_n}{n}$ will be close (within an arbitrary $\varepsilon$) to the expectation $\mu$ with high probability. It does not imply that $\left|\frac{S_n}{n} - \mu\right|$ remains small for all large $n$. In fact, large fluctuations in $\left|\frac{S_n}{n} - \mu\right|$ can happen; the Weak Law of Large Numbers only concludes that large values of $\left|\frac{S_n}{n} - \mu\right|$ occur with (very) small probability. For example, in a coin-tossing experiment with a fair coin such that $\Pr[X_k = 1] = \Pr[X_k = 0] = \mu = \frac{1}{2}$ in $n$ trials, the sequence of always heads $\{X_k = 1\}_{1 \le k \le n}$ is possible with probability $2^{-n}$ and gives $\frac{S_n}{n} = 1 > \mu$. Only for $n \to \infty$ does the probability of this “always heads” sequence vanish ($\lim_{n \to \infty} 2^{-n} = 0$). For all finite $n$ there is a non-zero probability of having a large deviation from the mean.
If we assume, in addition to the existence of the expectation, that also the variance $\mathrm{Var}[X]$ exists, the Weak Law follows from the Chebyshev inequality (5.10). This exemplifies how the complexity increases when fewer restrictions are assumed in the theorems. Indeed, using (2.57) for independent random variables, $\mathrm{Var}\left[\frac{S_n}{n}\right] = \frac{\mathrm{Var}[X]}{n}$ and the Chebyshev inequality (5.10) gives
$$\Pr\left[\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right] \le \frac{\mathrm{Var}[X]}{n\,\varepsilon^{2}}$$
which tends to zero for any fixed $\varepsilon$ and finite $\mathrm{Var}[X]$. In fact, with the additional assumption of a finite variance $\mathrm{Var}[X]$, a much more precise result can be proved, known as the Central Limit Theorem (Section 6.3). We remark that the Weak Law of Large Numbers also holds in the case $\mathrm{Var}[X]$ does not exist.
3 An alternative proof is given in Feller (1970, pp. 247-248).
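For the fair coin, both sides of the Chebyshev bound can be evaluated exactly. The sketch below is an illustration only; the function names are hypothetical and $\varepsilon$ is passed as a `Fraction` (an implementation choice made here) so that boundary cases such as $|k/n - 1/2| = \varepsilon$ are not lost to floating-point rounding.

```python
import math
from fractions import Fraction

def coin_deviation_probability(n, eps):
    """Exact Pr[|S_n/n - 1/2| >= eps] for n fair coin tosses; eps is a Fraction."""
    favourable = 0
    for k in range(n + 1):
        if abs(Fraction(k, n) - Fraction(1, 2)) >= eps:
            favourable += math.comb(n, k)
    return favourable / 2 ** n

def chebyshev_bound(n, eps, variance=0.25):
    """Right-hand side Var[X] / (n eps^2); Var[X] = 1/4 for a fair coin."""
    return variance / (n * float(eps) ** 2)

if __name__ == "__main__":
    eps = Fraction(1, 10)
    for n in (10, 100, 1000):
        print(n, coin_deviation_probability(n, eps), chebyshev_bound(n, eps))
```

The exact probability decays much faster than the $1/n$ Chebyshev bound, yet it stays strictly positive for every finite $n$, in line with the remark on the "always heads" sequence.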
Theorem 6.2.2 (Strong Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If the expectation $\mu = E[X]$ and variance $\mathrm{Var}[X]$ exist, then,
$$\Pr\left[\lim_{n \to \infty} \frac{S_n}{n} = \mu\right] = 1 \tag{6.3}$$
Proof: See e.g. Feller (1970, pp. 259-261), Berger (1993, pp. 46-48) or Wolff (1989, pp. 40-41). Their proof is based on the Kolmogorov criterion: the convergence of $\sum_{k=1}^{\infty} \frac{\mathrm{Var}[X_k]}{k^{2}}$ is a sufficient condition for the Strong Law of Large Numbers for independent random variables $X_k$ with mean $E[X_k]$ and variance $\mathrm{Var}[X_k]$. If the existence of $E\left[X^{4}\right]$ is assumed, Ross (1996, pp. 56-58) and Billingsley (1995, p. 85) provide a different proof. Wolff (1989, pp. 41-42) remarks that both the Weak and Strong Laws hold under much weaker conditions: it is only needed that the $X_k$ are not correlated. In other words, $\frac{S_n}{n} \to \mu$ w.p. 1 implies $E[X_k] = \mu$ even if $\mathrm{Var}[X_k] = \infty$. □
The Strong Law of Large Numbers roughly states that $\left|\frac{S_n}{n} - \mu\right|$ remains small for sufficiently large $n$ with overwhelming probability. The importance of the Law of Large Numbers is that it provides the mathematical foundation of the intuition that the sample mean is the best estimator.
Theorem 6.2.3 (Law of the Iterated Logarithm) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If $\sigma^{2} = \mathrm{Var}[X]$ exists, then,
$$\Pr\left[\limsup_{n \to \infty} \frac{S_n - n\mu}{\sigma\sqrt{2n \log\log n}} = 1\right] = 1 \tag{6.4}$$
Proof: See e.g. Billingsley (1995, pp. 154-156) or Feller (1970, Section VIII.5)⁴. □
In addition to the Weak and Strong Laws of Large Numbers, the Law of the Iterated Logarithm provides information about large values of $\left|\frac{S_n}{n} - \mu\right|$. Specifically, it states that the bound $\left|\frac{S_n}{n} - \mu\right| \le \sigma\sqrt{\frac{2 \log\log n}{n}}$ holds almost surely. The latter means that it is satisfied infinitely often and that the converse, $\left|\frac{S_n}{n} - \mu\right| > \sigma\sqrt{\frac{2 \log\log n}{n}}$, may occur only for a finite number of values of $n$.
6.3 Central Limit Theorem
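Theorem 6.3.1 (Central Limit Theorem) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with finite $\mu = E[X]$ and $\sigma^{2} = \mathrm{Var}[X]$. Then $\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1)$ or, explicitly,
$$\Pr\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right] \to \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^{2}}{2}}\,dt$$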
Proof: Without loss of generality, we may confine ourselves to normalized random variables (replace $X_k$ by $\frac{X_k - \mu}{\sigma}$) such that $\mu = 0$ and $\sigma = 1$. Consider the scaled random variable $U_n = \alpha_n S_n$, where $\alpha_n$ is a real number depending on $n$ and to be determined later. Similarly as in the proof of the Weak Law of Large Numbers, we find that $\varphi_{U_n}(z) = \left(\varphi_X(\alpha_n z)\right)^{n}$. Due to the existence of the variance, the Taylor expansion (2.40) of $\varphi_X$ around $z = 0$ is known with higher precision as $\varphi_X(z) = 1 + \frac{z^{2}}{2} + o\left(z^{2}\right)$. For sufficiently small $z$, the logarithm
$$\log \varphi_{U_n}(z) = n \log\left(1 + \frac{\alpha_n^{2} z^{2}}{2} + o\left(\alpha_n^{2} z^{2}\right)\right) = n\,\frac{\alpha_n^{2} z^{2}}{2} + o\left(n \alpha_n^{2} z^{2}\right)$$
only converges to a finite (non-zero) number if $\alpha_n = O\left(\frac{1}{\sqrt{n}}\right)$. Choosing the simplest function that satisfies this condition, $\alpha_n = \frac{1}{\sqrt{n}}$, leads to $\lim_{n \to \infty} \log \varphi_{U_n}(z) = \frac{z^{2}}{2}$ or, since the logarithm is a continuous, increasing function, $\lim_{n \to \infty} \varphi_{U_n}(z) = \exp\left(\frac{z^{2}}{2}\right)$. The transform (3.22) shows that the corresponding limit random variable is a Gaussian $N(0, 1)$. The theorem then follows by virtue of the Continuity Theorem 6.1.3. □
4
Feller also mentions sharper bounds.
An alternative formulation of the Central Limit Theorem is that the $k$-fold convolution of any probability density function (with finite variance) converges to a Gaussian probability density, $f_X^{(k*)}(x) \to \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right]$ with $\mu = kE[X]$ and $\sigma^{2} = k\,\mathrm{Var}[X]$. Both the Law of Large Numbers and the Central Limit Theorem can be shown to be valid for a surprisingly large class of sequences $\{X_k\}$ where each random variable may have a different distribution. The conditions for the extension of the Central Limit Theorem are summarized in the Lindeberg conditions (Feller, 1971, p. 263). An example where the sum of independent random variables tends to a different limit distribution than the Gaussian appears in Section 16.5.1.
If higher moments are known, the convergence to the Gaussian distribution can be bounded. Feller (1971, Chapter XVI) devotes a chapter to expansions related to the Central Limit Theorem, culminating in the Berry-Esseen Theorem.
Theorem 6.3.2 (Berry-Esseen Theorem) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with finite $\mu = E[X]$, $\sigma^{2} = \mathrm{Var}[X]$ and $\rho = E\left[|X - \mu|^{3}\right]$. Then, with $C = 3$,
$$\sup_{x} \left|\Pr\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right] - \Phi(x)\right| \le \frac{C\rho}{\sigma^{3}\sqrt{n}} \tag{6.5}$$
Proof: See e.g. Feller (1971, Section XVI.5). The constant $C$ can be slightly improved to $C \le 2.05$. □
As an example of the rate of convergence towards the Gaussian distribution, the $k$-fold convolution of the uniform density given by (3.30) is plotted in Fig. 6.1 together with the Gaussian approximation (3.19).
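The comparison shown in Fig. 6.1 can be approximated numerically. The sketch below is an assumption-laden illustration, not the book's computation: it evaluates the density of a sum of $k$ independent uniform random variables on $[0, 1]$ with the classical alternating-sum formula and compares it at the mean with a Gaussian density of mean $k/2$ and variance $k/12$. The function names are hypothetical.

```python
import math

def uniform_sum_pdf(k, x):
    """Density at x of the sum of k >= 2 independent uniform(0, 1) random variables."""
    if x < 0 or x > k:
        return 0.0
    total = 0.0
    for j in range(int(math.floor(x)) + 1):
        total += (-1) ** j * math.comb(k, j) * (x - j) ** (k - 1)
    return total / math.factorial(k - 1)

def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gap_at_mean(k):
    """Absolute difference between exact and Gaussian density at x = k/2."""
    return abs(uniform_sum_pdf(k, k / 2) - gaussian_pdf(k / 2, k / 2, k / 12))

if __name__ == "__main__":
    for k in (2, 3, 4, 8, 16):
        print(k, uniform_sum_pdf(k, k / 2), gaussian_pdf(k / 2, k / 2, k / 12))
```

Comparing at the mean only is a design shortcut; a fuller check would scan $x$ over $[0, k]$ and report the largest difference.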
[Figure 6.1: pdf of $k$-convolved uniform random variables versus $x$ for $k$ = 2, 3, 4, 8 and 16; exact convolution and Gaussian approximation.]
Fig. 6.1. Both the exact $f_U^{(k*)}(x)$ with $f_U(x) = 1_{0 \le x \le 1}$ and the Gaussian approximation for several values of $k$.
6.4 Extremal distributions
6.4.1 Scaling laws
In this section, limit properties of the maximum and minimum of a set $\{X_k\}$ of independent random variables are discussed. For simplicity, we assume that all random variables $X_k$ have identical distribution $F(x) = \Pr[X \le x]$ such that (3.33) and (3.32) simplify to
$$\Pr\left[\max_{1 \le k \le m} X_k \le x\right] = F^{m}(x)$$
$$\Pr\left[\min_{1 \le k \le m} X_k > x\right] = \left(1 - F(x)\right)^{m}$$
Consider the limit process when $m \to \infty$. Let $\{x_m\}$ be a sequence of real numbers. Then, confining ourselves to the maximum first,
$$\log\left(\Pr\left[\max_{1 \le k \le m} X_k \le x_m\right]\right) = m \log F(x_m)$$
Since $0 \le F(x_m) \le 1$ and since the logarithm has a Taylor expansion $\log(1 - x) = -\sum_{k=1}^{\infty} \frac{x^{k}}{k}$ around $x = 0$, convergent for $|x| < 1$, we rewrite the right-hand side as $\log F(x_m) = \log\left[1 - \left(1 - F(x_m)\right)\right]$ and, after expansion,
$$\log\left(\Pr\left[\max_{1 \le k \le m} X_k \le x_m\right]\right) = -m\left(1 - F(x_m)\right) + o\left[m\left(1 - F(x_m)\right)\right]$$
If $\lim_{m \to \infty} m\left(1 - F(x_m)\right) = \xi$, we arrive at
$$\lim_{m \to \infty} \Pr\left[\max_{1 \le k \le m} X_k \le x_m\right] = e^{-\xi} \tag{6.6}$$
Hence, by choosing an appropriate sequence $\{x_m\}$ such that $\xi$ is finite (and preferably non-zero), a scaling law for the maximum of a sequence can be obtained and, similarly, for the minimum, if $\lim_{m \to \infty} m F(x_m) = \xi$,
$$\lim_{m \to \infty} \Pr\left[\min_{1 \le k \le m} X_k > x_m\right] = e^{-\xi} \tag{6.7}$$
The distributions of $\lim_{m \to \infty} \min_{1 \le k \le m} X_k$ and $\lim_{m \to \infty} \max_{1 \le k \le m} X_k$ are called extremal distributions.
[Figure 6.2: probability density function versus $x$ (from -6 to 6) for the Gumbel, Fréchet and Weibull distributions; the curves are labelled with parameter values 0.5, 1 and 2.]
Fig. 6.2. The probability density function of the three types of extremal distributions.
6.4.2 The Law of Extremal Types
Two distribution functions $F$ and $G$ are said to be of the same type if there exist constants $a > 0$ and $b$ for which $F(ax + b) = G(x)$ for all $x$.
Theorem 6.4.1 (Law of Extremal Types) Any extremal distribution of a sequence of i.i.d. random variables can only have one of the three types
1. Gumbel $F(x) = e^{-e^{-x}}$
2. Fréchet $F(x) = e^{-x^{-\alpha}}\, 1_{x \ge 0}$
3. Weibull $F(x) = e^{-(-x)^{\alpha}}\, 1_{x < 0} + 1_{x \ge 0}$
where $\alpha > 0$.
Proof: See e.g. Berger (1993, pp. 65-69). □
The generality of this theorem is appealing: any maximum or minimum of a set of i.i.d. random variables has (apart from the scaling constants $a$ and $b$) one of the above three types. The corresponding probability density functions are plotted in Fig. 6.2.
6.4.3 Examples
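1. Consider the set $\{X_k\}$ of exponentially distributed random variables with $F(x) = 1 - e^{-\alpha x}$. The condition for the maximum is $m e^{-\alpha x_m} \to \xi$ or, equivalently, $x_m = \frac{1}{\alpha}\left(\log m - \log \xi\right)$ and (6.6) becomes, after putting $x = -\log \xi$,
$$\lim_{m \to \infty} \Pr\left[\max_{1 \le k \le m} X_k \le \frac{1}{\alpha}\left(\log m + x\right)\right] = e^{-e^{-x}}$$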
The minimum p (1 h3{p ) $ ³ is equivalent to ´h3{p log p log log or {p 1 log(log p log ) 1 log log p log
p . Hence, after putting
{ = log , the limit law for the minimum of exponential random variables
is
µ
¶¸
1
{
3{
lim Pr min [n A
log log p +
= h3h
p<"
1$n$p
log p
For both the maximum and the minimum of exponential random variables,
3{
a scaling law exists that leads to a Gumbel distribution IJxpeho ({) = h3h .
In other words, for large p, the random variables P = max1$n$p [n log p and Q = log p (min1$n$p [n ) log p log log p have an identical
distribution equal to the Gumbel distribution.
2. Another example is the maximum of a set of i.i.d. uniform random variables $\{U_k\}$ in $[0, 1]$ with $F(x) = x$ for $0 \le x \le 1$. Since $m(1 - x_m) \to \xi$ with $0 \le x_m \le 1$ or, equivalently, $x_m \to 1 - \frac{\xi}{m}$, we have, after putting $x = \xi$ with $x \ge 0$,
$$\lim_{m \to \infty} \Pr\left[\max_{1 \le k \le m} U_k \le 1 - \frac{x}{m}\right] = e^{-x}$$
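Both scaling laws for the maximum can be checked against the exact finite-$m$ expressions $F^{m}(x_m)$. The sketch below is illustrative only; the function names and the rate parameter name `alpha` are assumptions. It compares the exact probabilities with the Gumbel limit for exponential random variables and with the exponential limit for uniform random variables.

```python
import math

def max_exponential_cdf(m, x, alpha=1.0):
    """Exact Pr[max X_k <= (log m + x)/alpha] for m i.i.d. exponential(alpha) variables."""
    x_m = (math.log(m) + x) / alpha
    return (1.0 - math.exp(-alpha * x_m)) ** m

def gumbel_cdf(x):
    """Limit law exp(-exp(-x)) of the scaled maximum."""
    return math.exp(-math.exp(-x))

def max_uniform_cdf(m, x):
    """Exact Pr[max U_k <= 1 - x/m] for m i.i.d. uniform(0, 1) variables, 0 <= x <= m."""
    return (1.0 - x / m) ** m

if __name__ == "__main__":
    for m in (10, 1000, 10 ** 6):
        print(m, max_exponential_cdf(m, 0.5), gumbel_cdf(0.5),
              max_uniform_cdf(m, 1.0), math.exp(-1.0))
```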
3. Consider a rectangular lattice with size $z_1$ and $z_2$ and with independent and identical, uniformly distributed link weights on $(0, 1]$ between each lattice point. The number of lattice points (nodes) equals $N = (z_1 + 1)(z_2 + 1)$ and the number of links is $L = 2z_1z_2 + (z_1 + z_2)$. The shortest hop path between two diagonal corner points consists of $h = z_1 + z_2$ hops. The weight $W_h$ of such an $h$ hop path is the sum of $h$ independent uniform random variables with distribution specified in (3.29),
$$F(x) = \Pr[W_h \le x] = \frac{1}{h!} \sum_{j=0}^{h} \binom{h}{j} (-1)^{j} (x - j)^{h}\, 1_{j \le x}$$
In particular, $\Pr[W_h \le h] = 1$ and for small $x < 1$ it holds that $F(x) = \frac{x^{h}}{h!}$. The precise computation of the minimum weight of an $h$ hop path in a lattice is difficult due to dependence among those $h$ hop paths and we content ourselves here with an approximate estimate. If we neglect the dependence of the $h$ hop paths due to possible overlap, then the minimum weight among all $h$ hop paths can be approximated by (6.7) because the number⁵ $m = \binom{z_1 + z_2}{z_1} = \frac{h!}{z_1!\,z_2!}$ of those $h$ hop paths is large. The limit sequence must obey $m F(x_m) \to \xi$ for sufficiently large $m$, which implies that $F(x_m)$ must be small or, equivalently, $x_m$ must be small. Hence, $m \frac{x_m^{h}}{h!} = \xi$ or $x_m = \left(\frac{h!\,\xi}{m}\right)^{\frac{1}{h}}$. The limit law (6.7) for the minimum weight $W = \min_{1 \le k \le m} W_{h,k}$ of the shortest hop path in a rectangular lattice is
$$\lim_{m \to \infty} \Pr\left[\min_{1 \le k \le m} W_{h,k} > \left(\frac{h!\,x}{m}\right)^{\frac{1}{h}}\right] = e^{-x}$$
In other words, the random variable $\frac{m W^{h}}{h!}$ tends to an exponential random variable with mean 1 for large $m = \frac{h!}{z_1!\,z_2!}$ or
$$\Pr[W \le y] \approx 1 - \exp\left(-m \frac{y^{h}}{h!}\right)$$
5 Any path in a rectangular lattice can be represented by a sequence of r(ight), l(eft), u(p) and d(own), which is called an encoded path word. The encoded path word of the shortest hop path between diagonal corner points consists of $z_1$ r’s (or l’s) and $z_2$ d’s (or u’s). The total number of these paths equals $\binom{z_1 + z_2}{z_1}$. Two paths coincide in a same lattice point at $g \le h$ hops from the source node if their encoded path word has the same sum of r’s and d’s in the first $g$ letters. The number of overlapping links between two paths equals the number of the same consecutive letters (r or d) in a block after a same sum of r’s and d’s in the encoded path words. Checking for overlap between $h$ hop paths requires a comparison of $\binom{z_1 + z_2}{z_1}!$ permutations in the encoded path words.
From (2.35), the mean shortest weight of an $h$ hop path equals
$$E[W] = \int_0^{\infty} \left(1 - F_W(x)\right)dx \approx \int_0^{\infty} \exp\left(-m \frac{x^{h}}{h!}\right)dx = \Gamma\left(1 + \frac{1}{h}\right)\left(z_1!\,z_2!\right)^{\frac{1}{h}}$$
For a square lattice where $z_1 = z_2 = \frac{h}{2}$, we have
$$E[W] = \Gamma\left(1 + \frac{1}{h}\right)\left(\left(\frac{h}{2}\right)!\right)^{\frac{2}{h}}$$
Using Stirling’s formula (Abramowitz and Stegun, 1968, Section 6.1.38) for the factorial, $h! = \sqrt{2\pi}\, h^{h + \frac{1}{2}} e^{-h + \frac{\theta}{12h}}$ where $0 < \theta < 1$, for large $h$, the mean $E[W]$ increases about linearly in the number of hops $h$,
$$E[W] \simeq \frac{h}{2e} \left(\sqrt{\pi h}\; e^{\frac{\theta}{6h}}\right)^{\frac{2}{h}} \approx \frac{h}{2e}$$
The average weight of a link (or 1 hop) of the shortest $h$ hop path is roughly $\frac{1}{2e} \approx 0.184$.
In spite of the fact that path dependence (overlap) has been ignored in the computation of the minimum weight, $E[W] = O(h) = O\left(\sqrt{N}\right)$ is correct. However, the approximate analysis does not give the correct prefactor in $E[W]$ nor the correct limit pdf, which turns out to be Gaussian. Hence, if random variables are not independent, Theorem 6.4.1 does not apply. Finally, a shortest $h$ hop path is not necessarily the overall shortest path because it is possible (though with small probability) that the overall shortest path has $h + 2j$ hops with $j > 0$.
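The slow approach of the per-hop weight to $\frac{1}{2e}$ can be made concrete. The sketch below evaluates the approximate mean $\Gamma\left(1 + \frac{1}{h}\right)(z_1!\,z_2!)^{1/h}$ derived above for square lattices with an even number of hops; the function names are hypothetical, and factorials are handled through `math.lgamma` to avoid overflow. It inherits the independence approximation of the text, so it does not give the true prefactor.

```python
import math

def mean_min_weight(z1, z2):
    """Approximate E[W] = Gamma(1 + 1/h) * (z1! z2!)**(1/h) with h = z1 + z2."""
    h = z1 + z2
    log_factorials = math.lgamma(z1 + 1) + math.lgamma(z2 + 1)
    return math.gamma(1.0 + 1.0 / h) * math.exp(log_factorials / h)

def per_hop_weight(h):
    """Approximate mean weight per hop in a square lattice with z1 = z2 = h/2 (h even)."""
    return mean_min_weight(h // 2, h // 2) / h

if __name__ == "__main__":
    for h in (20, 200, 2000):
        print(h, per_hop_weight(h), 1.0 / (2.0 * math.e))
```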
4. The probability density function of the longest shortest path. The most commonly used process that informs each node about changes in a network topology (e.g. an autonomous domain) is called flooding: every router forwards the packet on all interfaces except for the incoming one, and duplicate packets are discarded. Flooding is particularly simple and robust since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overhead in routers is ignored). Therefore, the interesting problem lies in the determination of the flooding time $T_N$, which is the minimum time needed to inform the last node in a network with $N$ nodes. Only after $T_N$ are all topology databases at each router in the network again synchronized, i.e. all routers possess the same topology information. Rather than investigating the flooding time $T_N$ (for which we refer to Section 16.5), the largest number of traversed routers (hops) or the longest shortest path from the emitting node to the furthermost node in its shortest path tree is computed. The number of hops, in short the hopcount $H_N$, along the shortest path between two arbitrary nodes in a network containing $N$ nodes is modeled subject to the following assumptions: (a) the
hopcount KQ is a Poisson random variable with mean H [KQ ] = = log Q with A 0, which is
motivated in Section 16.3.1; (b) the number of nodes Q is very large6 ; (c) all shortest paths from
the emitting node towards any other node in the network are independent. The problem reduces
to compute the pdf of the random variable max1$n$Q 31 Kn . The distribution function follows
from (3.9) as
IKQ ({) =
{
[
n h3
n=0
6
n!
=13
"
[
n h3
n!
n={+1
The size of the Internet is currently estimated at about Q ; 105 .
The condition limp<" p (1 3 I ({p )) = becomes
"
[
lim Q 13
Q <"
n={Q
n
=
n!
+1
from which we must choose the appropriate {Q as function of Q. Observe that the maximum
term in the series has index n = [], where the latter denotes the largest integer smaller or equal
to . For, the ratio between two consecutive (positive) terms in the n-sum equals d dn = such
n
n1
that, if A n, then dn A dn31 , implying that the terms increase, while, if ? n, the terms
dn ? dn31 form a decreasing sequence. The series is rewritten as
"
[
"
{Q +1 [ ({Q + 1)!n
n
=
n!
({Q + 1)! n=0 ({Q + 1 + n)!
n={Q +1
2
{Q +1
1+
+
+···
=
({Q + 1)!
{Q + 2
({Q + 2)({Q + 3)
We choose {Q = [] + [] ; (1 + ) for large Q and, thus large , where must be related to
. The series then consists of decreasing terms. Moreover, for large ,
"
[
(1+)+1
1
1
n
=
1+
+
+···
n!
((1 + ) + 1)!
(1 + ) + 2@
((1 + ) + 2@)((1 + ) + 3@)
n=[]+[]+1
?
1+
(1+)+1
((1 + ) + 1)!
1
1
+
+ ···
(1 + )
(1 + )2
and thus,
"
[
n=[]+[]+1
(1+)+1 1 + n
=
n!
((1 + ) + 1)! 1
1+R
Using Stirling’s formula (Abramowitz and Stegun, 1968, Section 6.1.38), {! ;
large { yields
I
1
2{{+ 2 h3{ , for
(1+)+1 1 + (1+)+1 h(1+)
1+
;
I
1
1
((1 + ) + 1)! ((1 + ) + 1) 2(1+)+ 2 (1 + )(1+)+ 2 ;
h(1+)[13log(1+)] 1
s
2(1 + )
For large Q, the condition becomes
;
Q (1+)[13log(1+)]+13 1
s
2(1 + ) log Q
1+R
1
log Q
and, after taking the logarithm of both sides,
1
1
log ; ((1 + ) [1 3 log (1 + )] + 1 3 ) log Q3 log log Q3 log(2(1+))3log +R
2
2
1
log Q
or
log + ( 3 1) log Q +
1
log log Q + R
2
1
log Q
; ((1 + ) [1 3 log (1 + )]) log Q
3
1
log(2(1 + )) 3 log 2
(6.8)
At this point, we will assume that ? 1, which justifies the expansion log (1 + ) = + R( 2 ).
This assumption will be checked later. Thus,
(1 + ) [1 3 log (1 + )] ; (1 + ) [1 3 ] ; (1 3 2 )
and must be solved from
U ; (1 3 2 ) log Q 3
3 log 2
I
with U = log
2 + ( 3 1) log Q + 12 log log Q . The Newton—Raphson iteration can be
applied with starting value 0 to find the solution of the equation up to the leading order in log Q,
i.e. U ; (1 3 2 ) log Q. Hence,
v
log
U
U
31
13
;13
;13
3
log Q
2 log Q
2
0 =
I
2 + 12 log log Q
2 log Q
which demonstrates that, for D 1, the assumption ? 1 is correct for large Q. The case ? 1
requires the application of Newton—Raphson’s method on (6.8), which we omit here. The second
iteration in Newton—Raphson’s method leads to
1 = 0 3
0
+ log 0
2
20 log Q + 12 + 1
0
and shows that the n-th iteration improves the previous with a quantity of order R log3n Q .
31 Since (6.8) is only accurate up to R log Q , a second iteration is superfluous and we obtain
the choice {Q = (1 + ), or
{Q =
1
I
3 + 1
1
2 3 log log Q
log Q 3 log
2
2
4
After substituting { = 3 log , we finally arrive for D 1 and large Q at
Pr
max
1$n$Q 31
Kn $
{
3 + 1
1
1
{
+
log Q 3 log log Q 3 log (2) = h3h
2
2
4
4
from which the pdf of the hopcount of the longest shortest path (lsp) follows as
iovs ({) = 2h3h
2({f) 32({3f)
h
(6.9)
with
3 + 1
1
1
log Q 3 log log Q 3 log (2)
2
4
4
1
1
3
1
=
H [KQ ] 3 log H [KQ ] 3 log (2)
+
2
2
4
4
f=
and
H [ovs] = f +
ydu [ovs] =
E
2
1
3
+
2
2
H [KQ ] 3
1
log H [KQ ] 3 0=170
4
2
' 0=4112
24
Observe that the average longest shortest path is about twice the average hopcount if = 1
while the variance is small, constant and independent of the scaling parameter f or . Figure 6.3
compares the above approximate analysis with simulations.
[Figure 6.3: $\Pr[\mathrm{lsp} = k]$ versus the number of hops $k$ (from 7 to 13); theory and simulation.]
Fig. 6.3. The hopcount of the shortest path for $N = 4000$. Both simulations based on an Internet-like topology generator (with unit link weights) and theory $f_{\mathrm{lsp}}(k)$ with parameter value 0.4786 are shown.
Notes
(i) The classical theory of extremes, extremal properties of dependent
sequences and extreme values in continuous-time are treated in detail
in the book by Leadbetter et al. (1983).
(ii) A more recent book by Embrechts et al. (2001a) applies the theory
of extremal events to problems in insurance and finance.
Part II
Stochastic processes
7
The Poisson process
The Poisson process is a prominent stochastic process, mainly because it
frequently appears in a wealth of physical phenomena and because it is
relatively simple to analyze. Therefore, we will first treat the Poisson process
before considering the more general Markov processes.
7.1 A stochastic process
7.1.1 Introduction and definitions
A stochastic¹ process, formally denoted as $\{X(t), t \in T\}$, is a sequence of random variables $X(t)$, where the parameter $t$ (most often the time) runs over an index set $T$. The state space of the stochastic process is the set of all possible values for the random variables $X(t)$ and each of these possible values is called the state of the process. If the index set $T$ is a countable set, $X[k]$ is a discrete stochastic process. Often $k$ is the discrete time or a time slot in computer systems. If $T$ is a continuum, $X(t)$ is a continuous stochastic process. For example, the outcome of $n$ tosses of a coin is a discrete stochastic process with state space {heads, tails} and the index set $T = \{0, 1, 2, \ldots, n\}$. The number of arrivals of packets in a router during a certain time interval $[a, b]$ is a continuous stochastic process because $t \in [a, b]$. Any realization of a stochastic process is called a sample path. For example, a sample path of the outcome of $n$ tosses of a coin is {heads, tails, tails, ..., heads}, while a sample path of the number of arrivals in $[a, b]$ is $1_{a \le t < a+h},\ 3 \times 1_{a+h \le t < a+4h},\ 8 \times 1_{a+4h \le t < a+5h},\ \ldots,\ 13 \times 1_{a+(k-1)h \le t < b}$, where $h = \frac{b-a}{k}$. Other examples are the measurement of the temperature each day, the notation of the value of a stock each minute or rolling a die and recording its value, which is illustrated in Fig. 7.1.
1 The word “stochastic” is derived from στοχάζεσθαι in Greek, which means “to aim at, try to hit”.
Especially in continuous stochastic processes, it is convenient to define increments as the difference $X(t) - X(u)$. The continuous time stochastic process $X(t)$ has independent increments if changes in the value of the process in different time intervals are independent, or, if for all $t_0 < t_1 < \cdots < t_n$, the random variables $X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent. The continuous (time) stochastic process has stationary increments if $X(t + s) - X(s)$ possesses the same distribution for all $s$. Hence, changes in the value of the process only depend on the distance $t$ between process events, not on the time point $s$.
[Figure 7.1: two sample paths of a die outcome (values 1 to 6) versus time $t$.]
Fig. 7.1. Two different sample paths of the experiment: roll a die and record the outcome. The total number of different sample paths is $6^{T}$ where $T$ is the number of times an outcome is recorded. The state space only contains 6 possible outcomes $\{1, 2, 3, 4, 5, 6\}$.
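The die experiment is small enough to enumerate. As a minimal sketch (the names are hypothetical and the seed is only there to make the random path reproducible), the code below lists all sample paths for a short recording length and draws one random sample path.

```python
import itertools
import random

STATE_SPACE = (1, 2, 3, 4, 5, 6)

def all_sample_paths(T):
    """All possible sample paths when the die outcome is recorded T times."""
    return list(itertools.product(STATE_SPACE, repeat=T))

def random_sample_path(T, seed=None):
    """One realization of the process: T independent die rolls."""
    rng = random.Random(seed)
    return [rng.choice(STATE_SPACE) for _ in range(T)]

if __name__ == "__main__":
    print(len(all_sample_paths(3)))        # 6**3 sample paths
    print(random_sample_path(10, seed=1))
```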
Stochastic processes are distinguished by (a) their state space, (b) the index set $T$ and (c) the dependence relations between random variables $X(t)$. For example, a standard Brownian motion (or Wiener process)² is defined as a stochastic process $X(t)$ having continuous sample paths, stationary independent increments and $X(t)$ has a normal distribution $N(0, t)$. A Poisson process, defined in more detail in Section 7.2, is a stochastic process $X(t)$ having discontinuous sample paths, stationary independent increments and $X(t)$ has a Poisson distribution. A generalization of the Poisson process is a counting process. A counting process is defined as a stochastic process $N(t) \ge 0$ with discontinuous sample paths, stationary independent increments, but with arbitrary distribution. A counting process $N(t)$ represents the total number of events that have occurred in a time interval $[0, t]$. Examples of a counting process are the number of telephone calls at a local exchange during an interval, the number of failures in a telecommunication network, the number of corrupted bits after transmission due to channel errors, etc.
2 Harrison (1990) shows that the converse is also true: if $Y$ is a continuous process with stationary independent increments, then $Y$ is a Brownian motion.
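A counting process can be represented directly from a list of event times. The sketch below is illustrative only (the helper names and the example event times are assumptions): $N(t)$ counts the events in $[0, t]$ and an increment $N(t) - N(u)$ counts the events in $(u, t]$.

```python
import bisect

def counting_process(event_times):
    """Return N, where N(t) is the number of events that occurred in [0, t]."""
    times = sorted(event_times)

    def N(t):
        # bisect_right counts all event times <= t
        return bisect.bisect_right(times, t)

    return N

def increment(N, u, t):
    """Increment N(t) - N(u): the number of events in the interval (u, t]."""
    return N(t) - N(u)

if __name__ == "__main__":
    N = counting_process([0.5, 1.2, 1.2, 3.0])  # hypothetical arrival times
    print([N(t) for t in (0.0, 1.0, 1.2, 5.0)])
    print(increment(N, 0.5, 3.0))
```

The resulting sample path is a non-decreasing step function with jumps at the event times, which is the discontinuous behavior mentioned in the definition.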
7.1.2 Modeling a stochastic process from measurements
In practice, understanding observed phenomena often asks for a stochastic model that captures the main characteristics of the studied phenomena and that enables computations of diverse quantities of interest. Examples in the field of data communications networks are the determination of the arrival process at a switch or router in order to dimension the number of buffer (memory) places, the modeling of the graph of the Internet, the distribution of the duration of a telephone call or web browsing session, the number of visits to certain websites, the number of links that refer to a web page, the amount of downloaded information, the number of traversed routers by an email, etc. Accurate modeling is in general difficult and often trades off complexity against accuracy of the model.
Let us illustrate some aspects of modeling by considering Internet delay measurements. A motivation for obtaining an end-to-end delay model for (a part of) the Internet is the question whether massive service deployment of voice over IP (VoIP) can substitute classical telephony with a comparable quality. Specifically, classical telephony requires that the end-to-end delay of an arbitrary telephone conversation hardly exceeds 100 ms.
[Figure 7.2: end-to-end delay D [ms] versus time from 5:00 a.m. until 8:30 a.m., measured on 21/11/2002. Legend: interdeparture time = 12 s; hopcount of the IP path = 13; number of measurement points = 1006; E[D] = 35.03 ms; V[D] = 1.36 ms; min[D] = 34.18 ms; max[D] = 53.95 ms.]
Fig. 7.2. The raw data of the end-to-end delay of IP test packets along the same path of 13 hops in the Internet measured during 3.5 hours.
The end-to-end delay along a fixed path between source and destination measured during some interval is an example of a continuous-time stochastic process. We have received data of the delay measured at RIPE-NCC as illustrated in Fig. 7.2. Figure 7.2 shows a sample path of this continuous stochastic process. The precise details of the measurement configuration are not relevant for the present purpose. It suffices to add that Figure 7.2 shows the time difference between the departure of an IP test packet of 100 bytes at the sending box and its arrival at the destination box, accurate within 10 µs. The average sending rate of IP test packets is $\frac{1}{12}$ packets per second. Each IP test packet is assumed to follow the same path from sending to receiving box. The steadiness of the path is checked by trace-route measurements every 6 minutes.
Usually, in the next step, the histogram of the raw data is made. A histogram counts the number of data points that lie in an interval of $\Delta D$ ms, which is often called the bin size. Most graphical packages allow the bin size to be chosen. Figure 7.3 shows two different histograms with bin size $\Delta D = 0.5$ ms and $\Delta D = 0.1$ ms. In general, there is no universal rule to choose the bin size $\Delta D$. Clearly, the bin size is bounded below by the measurement accuracy, in our case $\Delta D > 10$ µs. A finer bin size provides more detail, but the resulting histogram also exhibits more stochastic variations because there are fewer data points in a small bin and adjacent bins may contain significantly different numbers of data points. Hence, compared to one larger bin that covers the same interval, less averaging or smoothing occurs in a set of smaller bins. The normalized histogram obtained by dividing the counts per bin by the total number of data points provides a first approximation to the probability density function of $D$. However, it is still discrete and approximates $\Pr[k < D \le k + \Delta D]$. A more precise description of constructing a histogram is given in Section C.1.
The histogram is generally better suited to decide whether outliers in the data points may be due to measurement errors or not. Figure 7.3 suggests either neglecting the data points with $D > 40$ ms or measuring at a higher sending rate of IP test packets in order to have more details in the intervals exceeding 38 or 40 ms. If there existed a good³ stochastic model for the end-to-end delay along fixed Internet paths, a normal procedure⁴ in engineering and physics would be to fit the histogram with that stochastic model to obtain the parameters of that stochastic model. The accuracy of the fit can be expressed in terms of the correlation coefficient $\rho$ explained in Section 2.5.3. The closer $\rho$ tends to 1, the better the fit, which gives confidence that the stochastic model corresponds with the real phenomenon.

³ Which is still lacking at the time of writing.
⁴ Other more difficult methods in the realm of statistics must be invoked in case the measurement data are so precious and rare that any additional measurement point has a far larger cost than the cost of extensive additional computations.
[Figure 7.3: histogram of the number of data points versus D in ms, with an inset histogram.]
Fig. 7.3. The histogram of the end-to-end delay with a bin size of 0.1 ms (the inset has a bin size of 0.5 ms).
Assuming that the presented measurement is a typical measurement along a fixed Internet path (which is true for about 80% of the investigated different paths), it demonstrates that there is a clear minimum at about 34 ms due to the propagation delay of electromagnetic waves. In addition, the end-to-end delay lies between 34 and 38 ms for 99% of the measurements. However, there is insufficient data to make claims about the tail behavior ($\Pr[D > x]$ for $x > 40$ ms). Precisely this region is of interest to compute the quality of service, expressed as the requirement that the probability that the end-to-end delay exceeds $x$ ms is smaller than $10^{-a}$, where $a$ specifies the stringency of the quality requirement. Toll quality in classical telephony sets $x$ at 100 ms and $a$ in the range of 4 to 5. The existence of a good stochastic model covering the whole possible range of the end-to-end delay $D$ would enable us to compute tail probabilities based on the parameters that can be fitted from the measurements.
The histogram is in fact a projection of the raw measurement data onto the ordinate (end-to-end delay axis). All time information (the abscissa in Fig. 7.2) is lost. Usually, the time evolution and the dependencies or correlations over time of a stochastic phenomenon are difficult to analyze and most analyses are only tractable under certain simplifying conditions. For example, often only a steady state analysis is possible and the increments $X(t_k) - X(t_{k-1})$ of the process for all $t_0 < \cdots < t_{k-1} < t_k < \cdots < t_n$ are assumed to be independent or weakly dependent. The study of Markov processes (Chapters 9-11) basically tries to compute and analyze the process in steady state. Figure 7.2 is measured over a relatively long period of time and indicates that after 8:00 a.m. the background traffic increases. The background traffic interferes with the IP test packets and causes them to queue longer in routers such that larger variations are observed. However, it is in general difficult to ascertain that (a part of) the measurement is performed while the system operates in a certain stable regime (or steady state).
We have touched upon some aspects in the art of modeling to motivate the
importance of studying stochastic processes. In the sequel of this chapter,
one of the most basic and simplest stochastic processes is investigated.
7.2 The Poisson process
A Poisson process with parameter or rate $\lambda > 0$ is an integer-valued, continuous-time stochastic process $\{X(t), t \ge 0\}$ satisfying
(i) $X(0) = 0$
(ii) for all $t_0 = 0 < t_1 < \cdots < t_n$, the increments $X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent random variables
(iii) for $t \ge 0$, $s > 0$ and non-negative integers $k$, the increments have the Poisson distribution
$$\Pr[X(t+s) - X(s) = k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \tag{7.1}$$
It is convenient to view the Poisson process $X(t)$ as a special counting process, where the number of events in any interval of length $t$ is specified via condition (iii). From this definition, a number of properties can be derived:
(a) Condition (iii) implies that the increments are stationary because the right-hand side does not depend on $s$. In other words, the increments only depend on the length of the interval $t$ and not on the time $s$ when the interval begins. Further, with (3.11), the mean $E[X(t+s) - X(s)] = \lambda t$ and because the increments are stationary, this holds for any value of $s$. In particular with $s = 0$ and condition (i), the expected number of events in a time interval with length $t$ is
$$E[X(t)] = \lambda t \tag{7.2}$$
Relation (7.2) explains why $\lambda$ is called the rate of the Poisson process, namely, the derivative over time $t$ or the number of events per time unit.
(b) The probability that exactly one event occurs in an arbitrarily small time interval of length $h$ follows from condition (iii) as
$$\Pr[X(h+s) - X(s) = 1] = \lambda h e^{-\lambda h} = \lambda h + o(h)$$
while the probability that no event occurs in an arbitrarily small time interval of length $h$ is
$$\Pr[X(h+s) - X(s) = 0] = e^{-\lambda h} = 1 - \lambda h + o(h)$$
Similarly, the probability that more than one event occurs in an arbitrarily small time interval of length $h$ is
$$\Pr[X(h+s) - X(s) > 1] = o(h)$$
Example 1 A conversation in a wireless ad-hoc network is severely disturbed by interference signals according to a Poisson process of rate $\lambda = 0.1$ per minute. (a) What is the probability that no interference signals occur within the first two minutes of the conversation? (b) Given that the first two minutes are free of disturbing effects, what is the probability that in the next minute precisely 1 interfering signal disturbs the conversation?
(a) Let $X(t)$ denote the Poisson interference process, then $\Pr[X(2) = 0]$ needs to be computed. Since $X(0) = 0$ and with (7.1), we can write $\Pr[X(2) = 0] = \Pr[X(2) - X(0) = 0] = e^{-2\lambda}$, which equals $\Pr[X(2) = 0] = e^{-0.2} = 0.8187$.
(b) The events during two non-overlapping intervals of a Poisson process are independent. Thus the event $\{X(2) - X(0) = 0\}$ is independent of the event $\{X(3) - X(2) = 1\}$, which means that the requested conditional probability $\Pr[X(3) - X(2) = 1 | X(2) - X(0) = 0] = \Pr[X(3) - X(2) = 1]$. From (7.1), we obtain $\Pr[X(3) - X(2) = 1] = 0.1 e^{-0.1} = 0.0905$.
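The two numbers in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the helper name `poisson_pmf` is an assumption introduced here, and it simply evaluates the right-hand side of (7.1).

```python
import math

def poisson_pmf(k: int, rate: float, length: float) -> float:
    """Pr[X(t+s) - X(s) = k] from (7.1) for an interval of the given length."""
    mean = rate * length
    return mean ** k * math.exp(-mean) / math.factorial(k)

rate = 0.1  # interference signals per minute (Example 1)

# (a) no interference during the first two minutes
p_a = poisson_pmf(0, rate, 2.0)

# (b) exactly one interference in the third minute; by the independence
# of increments, conditioning on the first two minutes drops out
p_b = poisson_pmf(1, rate, 1.0)

print(f"(a) {p_a:.4f}")  # 0.8187
print(f"(b) {p_b:.4f}")  # 0.0905
```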
Example 2 During a certain time interval $[t_1, t_1 + 10\ \text{s}]$, the number of IP packets that arrive at a router is on average 40/s. A service provider asks us to compute the probability that 20 packets arrive in the period $[t_1, t_1 + 1\ \text{s}]$ and 30 IP packets in $[t_1, t_1 + 3\ \text{s}]$. We may regard the arrival process as a Poisson process.
We are asked to compute $\Pr[X(1) = 20, X(3) = 30]$ knowing that $\lambda = 40\ \text{s}^{-1}$. Using the independence of increments and (7.1), we rewrite
$$\begin{aligned}
\Pr[X(1) = 20, X(3) = 30] &= \Pr[X(1) - X(0) = 20, X(3) - X(1) = 10] \\
&= \Pr[X(1) - X(0) = 20] \Pr[X(3) - X(1) = 10] \\
&= \frac{(\lambda)^{20} e^{-\lambda}}{20!} \frac{(2\lambda)^{10} e^{-2\lambda}}{10!} \approx 10^{-26} \approx 0
\end{aligned}$$
which means that the request of the service provider does not occur in practice.
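As a numerical cross-check of Example 2, the product of the two Poisson probabilities can be evaluated in the log domain, which avoids any concern about very small intermediate values. The function name `log_poisson_pmf` is an assumption made for this sketch.

```python
import math

def log_poisson_pmf(k: int, mean: float) -> float:
    """Natural logarithm of the Poisson probability mean^k e^{-mean} / k!."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

rate = 40.0  # packets per second (Example 2)

# Pr[X(1) = 20] * Pr[X(3) - X(1) = 10], using independent increments:
# the first interval has length 1 s, the second has length 2 s.
log_p = log_poisson_pmf(20, rate * 1.0) + log_poisson_pmf(10, rate * 2.0)
p = math.exp(log_p)

print(f"Pr[X(1) = 20, X(3) = 30] = {p:.2e}")
```

The result is on the order of $10^{-26}$, in line with the value quoted in the example.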
7.3 Properties of the Poisson process
The first theorem is the converse of the above property (b) that immediately followed from the definition. The theorems presented here reveal the methodology of how stochastic processes are studied.
Theorem 7.3.1 A counting process $N(t)$ that satisfies the conditions (i) $N(0) = 0$, (ii) the process $N(t)$ has stationary and independent increments, (iii) $\Pr[N(h) = 1] = \lambda h + o(h)$ and (iv) $\Pr[N(h) > 1] = o(h)$ is a Poisson process with rate $\lambda > 0$.
Proof: We must show that conditions (iii) and (iv) are equivalent to condition (iii) in the definition of the Poisson process. Denote $P_n(t) = \Pr[N(t) = n]$ and consider first the case $n = 0$, then
$$P_0(t+h) = \Pr[N(t+h) = 0] = \Pr[N(t+h) - N(t) = 0, N(t) = 0]$$
Invoking independence via (ii)
$$P_0(t+h) = \Pr[N(t+h) - N(t) = 0] \Pr[N(t) = 0]$$
By definition, $P_0(t) = \Pr[N(t) = 0]$ and from (iii), (iv) and the fact that $\sum_{k=0}^{\infty} \Pr[N(h) = k] = 1$, it follows that
$$\Pr[N(h) = 0] = 1 - \lambda h + o(h) \tag{v}$$
Combining these with the stationarity in (ii), we obtain
$$P_0(t+h) = P_0(t)(1 - \lambda h + o(h))$$
or
$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda P_0(t) + \frac{o(h)}{h}$$
from which, in the limit $h \to 0$, the differential equation
$$P_0'(t) = -\lambda P_0(t)$$
is immediate. The solution is $P_0(t) = C e^{-\lambda t}$ and the integration constant $C$ follows from (i) and $P_0(0) = \Pr[N(0) = 0] = 1$ as $C = 1$. This establishes condition (iii) in the definition of the Poisson process for $k = 0$.
The verification for $n > 0$ is more involved. Applying the law of total probability (2.46),
$$\begin{aligned}
P_n(t+h) &= \Pr[N(t+h) = n] \\
&= \sum_{j=0}^{n} \Pr[N(t+h) - N(t) = j | N(t) = n - j] \Pr[N(t) = n - j]
\end{aligned}$$
By independence (ii),
$$\Pr[N(t+h) - N(t) = j | N(t) = n - j] = \Pr[N(t+h) - N(t) = j]$$
and by definition $\Pr[N(t) = n - j] = P_{n-j}(t)$, we have
$$P_n(t+h) = \sum_{j=0}^{n} \Pr[N(t+h) - N(t) = j] P_{n-j}(t)$$
By the stationarity (ii)
$$\Pr[N(t+h) - N(t) = j] = \Pr[N(h) - N(0) = j]$$
we obtain using (i)
$$P_n(t+h) = \sum_{j=0}^{n} \Pr[N(h) = j] P_{n-j}(t)$$
while (v) and (iii) suggest writing the sum as
$$P_n(t+h) = P_n(t) \Pr[N(h) = 0] + P_{n-1}(t) \Pr[N(h) = 1] + \sum_{j=2}^{n} P_{n-j}(t) \Pr[N(h) = j]$$
Since $P_n(t) \le 1$ and using (iv),
$$\sum_{j=2}^{n} P_{n-j}(t) \Pr[N(h) = j] \le \sum_{j=2}^{n} \Pr[N(h) = j] \le \Pr[N(h) > 1] = o(h)$$
we arrive with (v), (iii) at
$$P_n(t+h) = P_n(t)(1 - \lambda h + o(h)) + P_{n-1}(t)(\lambda h + o(h)) + o(h)$$
or
$$\frac{P_n(t+h) - P_n(t)}{h} = -\lambda P_n(t) + \lambda P_{n-1}(t) + \frac{o(h)}{h}$$
which leads, after taking the limit $h \to 0$, to the differential equation
$$P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t)$$
with initial condition $P_n(0) = \Pr[N(0) = n] = 1_{\{n=0\}}$. This differential equation is rewritten as
$$\frac{d}{dt}\left(e^{\lambda t} P_n(t)\right) = \lambda e^{\lambda t} P_{n-1}(t) \tag{7.3}$$
In case $n = 1$, the differential equation reduces with $P_0(t) = e^{-\lambda t}$ to $\frac{d}{dt}\left(e^{\lambda t} P_1(t)\right) = \lambda$. The general solution is $e^{\lambda t} P_1(t) = \lambda t + C$ and, from the initial condition $P_1(0) = 0$, we have $C = 0$ and $P_1(t) = \lambda t e^{-\lambda t}$. The general solution to (7.3) is proved by induction. Assume that $P_n(t) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$ holds for $n$, then the case $n + 1$ follows from (7.3) as
$$\frac{d}{dt}\left(e^{\lambda t} P_{n+1}(t)\right) = \lambda \frac{(\lambda t)^n}{n!}$$
and integrating from 0 to $t$ using $P_{n+1}(0) = 0$ yields $P_{n+1}(t) = \frac{(\lambda t)^{n+1} e^{-\lambda t}}{(n+1)!}$, which establishes the induction and finalizes the proof of the theorem. □
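The proof reduces the small-interval conditions to the chain of differential equations $P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t)$. As an illustration only, the following sketch integrates that chain with a plain forward-Euler scheme (a choice made here, not taken from the text) and compares the result with the Poisson probabilities (7.1). The parameter values are arbitrary.

```python
import math

def integrate_poisson_odes(rate, t_end, n_max, steps=20000):
    """Forward-Euler integration of P_n'(t) = -rate*P_n(t) + rate*P_{n-1}(t)."""
    h = t_end / steps
    p = [1.0] + [0.0] * n_max          # initial condition P_n(0) = 1_{n=0}
    for _ in range(steps):
        new = p[:]
        for n in range(n_max + 1):
            inflow = rate * p[n - 1] if n > 0 else 0.0
            new[n] = p[n] + h * (-rate * p[n] + inflow)
        p = new
    return p

def poisson_pmf(n, mean):
    """Right-hand side of (7.1) with mean = rate * t."""
    return mean ** n * math.exp(-mean) / math.factorial(n)

rate, t_end, n_max = 2.0, 1.5, 6       # illustrative values, not from the text
numeric = integrate_poisson_odes(rate, t_end, n_max)
exact = [poisson_pmf(n, rate * t_end) for n in range(n_max + 1)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(f"largest deviation from (7.1): {max_error:.2e}")
```

The deviation shrinks as `steps` grows, which mirrors the limit $h \to 0$ taken in the proof.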
The second theorem has very important applications since it relates the number of events in non-overlapping intervals to the interarrival time between these events.
Theorem 7.3.2 Let $\{X(t); t \ge 0\}$ be a Poisson process with rate $\lambda > 0$ and denote by $t_0 = 0 < t_1 < t_2 < \cdots$ the successive occurrence times of events. Then the interarrival times $\tau_n = t_n - t_{n-1}$ are independent identically distributed exponential random variables with mean $\frac{1}{\lambda}$.
Proof: For any $s \ge 0$ and any $n \ge 1$, the event $\{\tau_n > s\}$ is equivalent to the event $\{X(t_{n-1} + s) - X(t_{n-1}) = 0\}$. Indeed, the $n$-th interarrival time $\tau_n$ is longer than $s$ time units if and only if the $n$-th event has not yet occurred $s$ time units after the occurrence of the $(n-1)$-th event at $t_{n-1}$. Since the Poisson process has independent increments (condition (ii) in the definition of the Poisson process), changes in the value of the process in non-overlapping time intervals are independent. By the equivalence in events, this implies that the interarrival times $\tau_n$ are independent random variables. Further, by the stationarity of the Poisson process (deduced from condition (iii) in the definition of the Poisson process),
$$\Pr[\tau_n > s] = \Pr[X(t_{n-1} + s) - X(t_{n-1}) = 0] = e^{-\lambda s}$$
which implies that any interarrival time has an identical, exponential distribution,
$$F_{\tau_n}(x) = \Pr[\tau_n \le x] = 1 - e^{-\lambda x}$$
This proves the theorem. □
The converse of Theorem 7.3.2 also holds: if the interarrival times $\{\tau_n\}$ of a counting process $\{N(t), t \ge 0\}$ are i.i.d. exponential random variables with mean $\frac{1}{\lambda}$, then $\{N(t), t \ge 0\}$ is a Poisson process with rate $\lambda$.
An association to the exponential distribution is the memoryless property,
$$\Pr[\tau_n > s + t | \tau_n > s] = \Pr[\tau_n > t]$$
By the equivalence of the events, for any $t, s \ge 0$,
$$\begin{aligned}
\Pr[\tau_n > s + t | \tau_n > s] &= \Pr[X(t_{n-1} + s + t) - X(t_{n-1}) = 0 | X(t_{n-1} + s) - X(t_{n-1}) = 0] \\
&= \Pr[X(t_{n-1} + s + t) - X(t_{n-1} + s) = 0 | X(t_{n-1} + s) - X(t_{n-1}) = 0]
\end{aligned}$$
By the independence of increments (in non-overlapping intervals),
$$\Pr[\tau_n > s + t | \tau_n > s] = \Pr[X(t_{n-1} + s + t) - X(t_{n-1} + s) = 0]$$
and by the stationarity of the increments, the memoryless property is established,
$$\Pr[\tau_n > s + t | \tau_n > s] = \Pr[X(t_{n-1} + t) - X(t_{n-1}) = 0] = \Pr[\tau_n > t]$$
Hence, the assumption of stationary and independent increments is equivalent to asserting that, at any time $s$, the process probabilistically restarts with the same distribution and is independent of occurrences in the past (before $s$). Thus, the process has no memory and, since the only continuous distribution that satisfies the memoryless property is the exponential distribution, exponential interarrival times $\tau_n$ are a natural consequence.
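A small simulation can make the link between the small-interval conditions of Theorem 7.3.1 and the exponential, memoryless interarrival times tangible. The sketch below is an approximation under stated assumptions: time is cut into short slots of length `slot`, each slot holds one event with probability `rate * slot`, and multiple events per slot are ignored. All names and parameter values are illustrative.

```python
import math
import random

def interarrival_times(rate, horizon, slot=1e-3, seed=1):
    """Approximate a Poisson process by one Bernoulli trial per short slot.

    Each slot contains an event with probability rate*slot, mirroring
    Pr[N(h) = 1] = rate*h + o(h); returns the gaps between events.
    """
    rng = random.Random(seed)
    gaps, last = [], 0.0
    for i in range(1, int(horizon / slot) + 1):
        if rng.random() < rate * slot:
            now = i * slot
            gaps.append(now - last)
            last = now
    return gaps

rate = 2.0                                   # illustrative value
gaps = interarrival_times(rate, horizon=2000.0)
mean_gap = sum(gaps) / len(gaps)

# Memoryless check: Pr[tau > s + t | tau > s] versus Pr[tau > t]
s, t = 0.4, 0.3
longer_than_s = [g for g in gaps if g > s]
conditional = sum(g > s + t for g in longer_than_s) / len(longer_than_s)
unconditional = sum(g > t for g in gaps) / len(gaps)

print(f"mean gap {mean_gap:.3f} vs 1/rate {1 / rate:.3f}")
print(f"conditional {conditional:.3f}, unconditional {unconditional:.3f}, "
      f"exp(-rate*t) {math.exp(-rate * t):.3f}")
```

Both empirical tail probabilities should lie close to $e^{-\lambda t}$, up to sampling noise and the discretization error of the slot approximation.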
The arrival time of the $n$-th event or the waiting time until the $n$-th event is $W_n = \sum_{k=1}^{n} \tau_k$. In Section 3.3.1, it is shown that the sum of independent exponential random variables has a Gamma distribution or Erlang distribution (3.24). Alternatively, the equivalence of the events, $\{W_n \le t\} \Longleftrightarrow \{N(t) \ge n\}$, directly leads to the Erlang distribution,
$$F_{W_n}(t) = \Pr[W_n \le t] = \Pr[N(t) \ge n] = \sum_{k=n}^{\infty} \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$
The equivalence of the events, $\{W_n \le t\} \Longleftrightarrow \{N(t) \ge n\}$, is a general relation and a fundamental part of the theory of renewal processes, which we will study in Chapter 8.
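The equivalence $\{W_n \le t\} \Longleftrightarrow \{N(t) \ge n\}$ can be checked numerically. In this sketch the Poisson tail sum is evaluated as one minus the finite sum over $k < n$, and it is compared with a midpoint-rule integral of the standard Erlang density $\lambda (\lambda x)^{n-1} e^{-\lambda x} / (n-1)!$. The function names and parameter values are assumptions made for illustration.

```python
import math

def poisson_pmf(k, mean):
    return mean ** k * math.exp(-mean) / math.factorial(k)

def erlang_cdf(n, rate, t):
    """Pr[W_n <= t] = Pr[N(t) >= n], computed as 1 - sum_{k<n} of (7.1)."""
    return 1.0 - sum(poisson_pmf(k, rate * t) for k in range(n))

def erlang_cdf_by_integration(n, rate, t, steps=100000):
    """Midpoint-rule integral of the Erlang density over [0, t]."""
    h = t / steps
    norm = math.factorial(n - 1)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += rate * (rate * x) ** (n - 1) * math.exp(-rate * x) / norm
    return total * h

rate, n, t = 3.0, 4, 1.2                  # illustrative values
from_counts = erlang_cdf(n, rate, t)
from_density = erlang_cdf_by_integration(n, rate, t)
print(from_counts, from_density)
```

For $n = 1$ the expression collapses to $1 - e^{-\lambda t}$, the exponential distribution of a single interarrival time.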
Theorem 7.3.3 Given that exactly one event of a Poisson process $\{X(t); t \ge 0\}$ has occurred during the interval $[0, t]$, the time of occurrence of this event is uniformly distributed over $[0, t]$.
Proof: Immediate application of the conditional probability (2.44) yields for $0 \le s \le t$,
$$\Pr[\tau_1 \le s | X(t) = 1] = \frac{\Pr[\{\tau_1 \le s\} \cap \{X(t) = 1\}]}{\Pr[X(t) = 1]}$$
Using the equivalence $\{\tau_1 \le s\} \Longleftrightarrow \{X(t_0 + s) - X(t_0) = 1\}$ and the fact that $\{X(t_0 + s) - X(t_0) = 1\} = \{X(s) = 1\}$ by the stationarity of the Poisson process gives
$$\begin{aligned}
\{\tau_1 \le s\} \cap \{X(t) = 1\} &= \{X(s) = 1\} \cap \{X(t) = 1\} \\
&= \{X(s) = 1\} \cap \{X(t) - X(s) = 0\}
\end{aligned}$$
Applying the independence of increments over non-overlapping intervals and (7.1) yields
$$\begin{aligned}
\Pr[\tau_1 \le s | X(t) = 1] &= \frac{\Pr[X(s) = 1] \Pr[X(t) - X(s) = 0]}{\Pr[X(t) = 1]} \\
&= \frac{(\lambda s) e^{-\lambda s} e^{-\lambda(t-s)}}{(\lambda t) e^{-\lambda t}} = \frac{s}{t}
\end{aligned}$$
which completes the proof. □
Theorem 7.3.3 is immediately generalized to $n$ events. For any set of real variables $s_j$ satisfying $0 = s_0 < s_1 < s_2 < \cdots < s_n < t$ and given that $n$ events of a Poisson process $\{X(t); t \ge 0\}$ have occurred during the interval $[0, t]$, the probability of the successive occurrence times $0 < t_1 < t_2 < \cdots < t_n < t$ of these $n$ Poisson events is
$$\Pr[t_1 \le s_1, \ldots, t_n < s_n | X(t) = n] = \frac{\Pr[\{t_1 \le s_1, \ldots, t_n < s_n\} \cap \{X(t) = n\}]}{\Pr[X(t) = n]}$$
Using a similar argument as in the proof of Theorem 7.3.3,
$$\begin{aligned}
p &= \Pr[\{t_1 \le s_1, t_2 \le s_2, \ldots, t_n < s_n\} \cap \{X(t) = n\}] \\
&= \Pr[X(s_1) - X(s_0) = 1, \ldots, X(s_n) - X(s_{n-1}) = 1, X(t) - X(s_n) = 0] \\
&= \left(\prod_{j=1}^{n} \Pr[X(s_j) - X(s_{j-1}) = 1]\right) \Pr[X(t) - X(s_n) = 0] \\
&= \left(\prod_{j=1}^{n} \lambda e^{-\lambda(s_j - s_{j-1})} (s_j - s_{j-1})\right) e^{-\lambda(t - s_n)} \\
&= \lambda^n \prod_{j=1}^{n} (s_j - s_{j-1})\, e^{-\lambda \sum_{j=1}^{n} (s_j - s_{j-1}) - \lambda(t - s_n)} = \lambda^n \prod_{j=1}^{n} (s_j - s_{j-1})\, e^{-\lambda t}
\end{aligned}$$
Thus,
$$\Pr[t_1 \le s_1, t_2 \le s_2, \ldots, t_n < s_n | X(t) = n] = \frac{\lambda^n \prod_{j=1}^{n} (s_j - s_{j-1})\, e^{-\lambda t}}{\frac{(\lambda t)^n e^{-\lambda t}}{n!}} = \frac{n!}{t^n} \prod_{j=1}^{n} (s_j - s_{j-1})$$
from which the density function
$$f_{\{t_j\}}(s_1, \ldots, s_n | X(t) = n) = \frac{\partial^n}{\partial s_1 \cdots \partial s_n} \Pr[t_1 \le s_1, \ldots, t_n < s_n | X(t) = n]$$
follows as
$$f_{\{t_j\}}(s_1, s_2, \ldots, s_n | X(t) = n) = \frac{n!}{t^n}$$
which is independent of the rate $\lambda$. If $0 < t_1 < t_2 < \cdots < t_n < t$ are the successive occurrence times of $n$ Poisson events in the interval $[0, t]$, then the random variables $t_1, t_2, \ldots, t_n$ are distributed as a set of order statistics, defined in Section 3.4.2, of $n$ uniform random variables in $[0, t]$. In other words, if $n$ i.i.d. uniform random variables on $[0, t]$ are sorted in increasing order, they may represent $n$ successive occurrence times of a Poisson process.
The average spacing between these $n$ ordered i.i.d. uniform random variables is computed in Problem (ii) of Section 3.7.
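The order-statistics property lends itself to a Monte Carlo comparison. The sketch below simulates Poisson sample paths from exponential gaps, keeps only the paths with exactly $n$ events in $[0, t]$, and compares the average position of each ordered arrival with that of $n$ sorted uniform samples. Function names, the seed and all parameter values are assumptions chosen for illustration.

```python
import random

def conditioned_arrivals(rate, t, n, runs, rng):
    """Arrival times of Poisson sample paths on [0, t] with exactly n events."""
    kept = []
    while len(kept) < runs:
        times, clock = [], rng.expovariate(rate)
        while clock <= t:
            times.append(clock)
            clock += rng.expovariate(rate)
        if len(times) == n:
            kept.append(times)
    return kept

def sorted_uniforms(t, n, runs, rng):
    """n i.i.d. uniform samples on [0, t], sorted in increasing order."""
    return [sorted(rng.uniform(0.0, t) for _ in range(n)) for _ in range(runs)]

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

rng = random.Random(7)
rate, t, n, runs = 1.5, 2.0, 3, 4000       # illustrative values
poisson_means = column_means(conditioned_arrivals(rate, t, n, runs, rng))
uniform_means = column_means(sorted_uniforms(t, n, runs, rng))
print([round(m, 3) for m in poisson_means])
print([round(m, 3) for m in uniform_means])
```

Changing `rate` alters how many sample paths are rejected but should leave the averages unchanged, in line with the density being independent of $\lambda$.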
A related example is the conditional probability where $0 < s < t$ and $0 \le k \le n$,
$$\begin{aligned}
\Pr[X(s) = k | X(t) = n] &= \frac{\Pr[\{X(s) = k\} \cap \{X(t) = n\}]}{\Pr[X(t) = n]} \\
&= \frac{\Pr[\{X(s) = k\} \cap \{X(t) - X(s) = n - k\}]}{\Pr[X(t) = n]} \\
&= \frac{\Pr[X(s) = k] \Pr[X(t) - X(s) = n - k]}{\Pr[X(t) = n]} \\
&= \frac{n! (\lambda s)^k e^{-\lambda s}}{k! (\lambda t)^n e^{-\lambda t}} \frac{(\lambda(t - s))^{n-k} e^{-\lambda(t-s)}}{(n-k)!} \\
&= \binom{n}{k} \frac{s^k}{t^n} (t - s)^{n-k}
\end{aligned}$$
Hence, if $p = \frac{s}{t}$, the conditional probability becomes
$$\Pr[X(s) = k | X(t) = n] = \binom{n}{k} p^k (1 - p)^{n-k}$$
Given that a total number of $n$ Poisson events have occurred in the time interval $[0, t]$, the chance that precisely $k$ events have taken place in the sub-interval $[0, s]$ is binomially distributed with parameters $n$ and $p = \frac{s}{t}$. Observe that this conditional probability is also independent of the rate $\lambda$. In addition, since $\lim_{t \to \infty} X(t) = \infty$ such that $n \to \infty$, applying the law of rare events results in
$$\lim_{t \to \infty} \Pr[X(s) = k | X(t) = n] = \frac{s^k}{k!} e^{-s}$$
Given an everlasting Poisson process, the chance that precisely $k$ events occur in the interval $[0, s]$ is Poisson distributed with mean equal to the length of the interval.
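The binomial form of the conditional probability, and its independence of the rate, can be verified directly from (7.1). The following sketch uses hypothetical helper names and arbitrary values of $s$, $t$ and $n$.

```python
import math

def poisson_pmf(k, mean):
    return mean ** k * math.exp(-mean) / math.factorial(k)

def conditional_count(k, n, s, t, rate):
    """Pr[X(s) = k | X(t) = n] built from (7.1) and independent increments."""
    joint = poisson_pmf(k, rate * s) * poisson_pmf(n - k, rate * (t - s))
    return joint / poisson_pmf(n, rate * t)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

s, t, n = 1.0, 4.0, 6                      # illustrative values
for rate in (0.5, 3.0):                    # two different rates, same answer
    row = [conditional_count(k, n, s, t, rate) for k in range(n + 1)]
    print(rate, [round(x, 4) for x in row])
print("binomial", [round(binomial_pmf(k, n, s / t), 4) for k in range(n + 1)])
```

Both rates produce the same row, and that row coincides with the binomial probabilities for $p = s/t$.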
Application The arrival process of most real-time applications (such as telephony calls, interactive-video, ...) in a network is well approximated by a Poisson process. Suppose a measurement configuration is built to collect statistics of the arrival process of telephony calls in some region. During a period $[0, T]$, precisely 1 telephony call has been measured. What can be said of the time $x \in [0, T]$ at which the telephony call has arrived at the measurement device? Theorem 7.3.3 tells us that any time in that interval is equally probable.
Theorem 7.3.4 If $X(t)$ and $Y(t)$ are two independent Poisson processes with rates $\lambda_x$ and $\lambda_y$, then $Z(t) = X(t) + Y(t)$ is also a Poisson process with rate $\lambda_x + \lambda_y$.
Proof: It suffices to demonstrate that the counting process $N_Z(t) = N_X(t) + N_Y(t)$ has exponentially distributed interarrival times $\tau_Z$. Suppose that $N_Z(t_n) = n$, it remains to compute the next arrival at time $t_{n+1} = t_n + s$ for which $N_Z(t_n + s) = n + 1$. Due to the memoryless property of the Poisson process, the occurrence of an event from $t_n$ on for each random variable $\tau_X$ and $\tau_Y$ is again exponentially distributed with parameter $\lambda_x$ and $\lambda_y$, respectively. In other words, it is irrelevant which process $X$ or $Y$ has previously caused the arrival at time $t_n$. Further, the event $\{\tau_Z > s\}$ for the interarrival time of the sum process is equivalent to $\{\tau_X > s\} \cap \{\tau_Y > s\}$ or
$$\Pr[\tau_Z > s] = \Pr[\tau_X > s, \tau_Y > s] = \Pr[\tau_X > s] \Pr[\tau_Y > s] = e^{-(\lambda_x + \lambda_y)s}$$
where the independence of $X(t)$ and $Y(t)$ has been used. This proves the theorem. □
A direct consequence is that any sum of independent Poisson processes is
also a Poisson process with aggregate rate equal to the sum of the individual
rates. This theorem is in correspondence with the sum property of the
Poisson distribution.
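The sum property of the Poisson distribution mentioned above can be checked by convolving the probabilities of two independent Poisson counts at a fixed time $t$. This sketch verifies only the count distribution at one time instant, not the full process statement of Theorem 7.3.4. Names and values are illustrative assumptions.

```python
import math

def poisson_pmf(k, mean):
    return mean ** k * math.exp(-mean) / math.factorial(k)

def sum_pmf(k, mean_x, mean_y):
    """Pr[X(t) + Y(t) = k] by convolving two independent Poisson counts."""
    return sum(poisson_pmf(j, mean_x) * poisson_pmf(k - j, mean_y)
               for j in range(k + 1))

rate_x, rate_y, t = 0.7, 1.8, 2.0          # illustrative values
for k in range(5):
    convolved = sum_pmf(k, rate_x * t, rate_y * t)
    direct = poisson_pmf(k, (rate_x + rate_y) * t)
    print(k, round(convolved, 6), round(direct, 6))
```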
7.4 The nonhomogeneous Poisson process
As will be shown later in Section 11.3.2, the Poisson process is a special case
of a birth-and-death process, which is in turn a special case of a Markov
process. Hence, it seems more instructive to discuss these special processes
as applications of the Markov process. Therefore, only associations to the
Poisson process are treated here. In many cases, the rate is a time-variant function $\lambda(t)$ and such a process is termed a nonhomogeneous or nonstationary Poisson process. For example, the arrival rate of a large number $m$ of individual IP-flows at a router is well approximated by a nonhomogeneous Poisson process, where the rate $\lambda(t)$ varies over the day depending on the number $m$ and the individual rate of each flow of packets. Since the sum of independent Poisson random variables is again a Poisson random variable, we have $\lambda(t) = \sum_{j=1}^{m(t)} \lambda_j(t)$.
If $X(t)$ is a nonhomogeneous Poisson process with rate $\lambda(t)$, the increment $X(t) - X(s)$ reflects the number of events in an interval $(s, t]$ and increments of non-overlapping intervals are still independent.
Theorem 7.4.1 If $\Lambda(t) = \int_0^t \lambda(u)\,du$ and $s < t$, then $X(t) - X(s)$ is Poisson distributed with mean $\Lambda(t) - \Lambda(s)$.
The demonstration is analogous to the proof of Theorem 7.3.1.
Proof (partly): Denote by $P_n(t) = \Pr[N(t) - N(s) = n]$, then
$$\begin{aligned}
P_0(t+h) &= \Pr[N(t+h) - N(s) = 0] \\
&= \Pr[N(t+h) - N(t) = 0, N(t) - N(s) = 0]
\end{aligned}$$
Invoking independence of the increments,
$$\begin{aligned}
P_0(t+h) &= \Pr[N(t+h) - N(t) = 0] \Pr[N(t) - N(s) = 0] \\
&= P_0(t)(1 - \lambda(t)h + o(h))
\end{aligned}$$
or
$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda(t) P_0(t) + \frac{o(h)}{h}$$
from which, in the limit $h \to 0$, the differential equation
$$P_0'(t) = -\lambda(t) P_0(t)$$
is immediate. Rewritten as $\frac{d}{dt} \log P_0(t) = -\lambda(t)$, after integration over $(s, t]$, we find $\log P_0(t) = -(\Lambda(t) - \Lambda(s))$ since $P_0(s) = \Pr[N(s) - N(s) = 0] = 1$. Thus, for the case $n = 0$, we find $P_0(t) = \exp[-(\Lambda(t) - \Lambda(s))]$, which proves the theorem for $n = 0$.
The remainder of the proof ($n > 0$) uses the same ingredients as the proof of Theorem 7.3.1 and is omitted. □
A nonhomogeneous Poisson process $X(t)$ with rate $\lambda(t)$ can be transformed to a homogeneous Poisson process $Y(u)$ with rate 1 by the time transform $u = \Lambda(t)$. For, $Y(u) = Y(\Lambda(t)) = X(t)$, and $Y(u + \Delta u) = Y(\Lambda(t) + \Delta\Lambda(t)) = X(t + \Delta t)$ because $\Delta\Lambda(t) = \lambda(t)\Delta t$ for small $\Delta t$ such that
$$\begin{aligned}
\Pr[Y(u + \Delta u) - Y(u) = 1] &= \Pr[X(t + \Delta t) - X(t) = 1] \\
&= \lambda(t)\Delta t + o(\Delta t) \\
&= \Delta u + o(\Delta u)
\end{aligned}$$
because $\Delta u = \lambda(t)\Delta t + o(\Delta t)$. Hence, all problems concerning nonhomogeneous Poisson processes can be reduced to the homogeneous case treated above.
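The time transform also suggests a way to simulate a nonhomogeneous Poisson process: generate arrivals $u_i$ of a unit-rate Poisson process and map them back through $t_i = \Lambda^{-1}(u_i)$. The sketch below assumes the example rate $\lambda(t) = 2t$, so that $\Lambda(t) = t^2$ and $\Lambda^{-1}(u) = \sqrt{u}$. This rate function, the seed and the other values are chosen here for illustration only.

```python
import math
import random

def nonhomogeneous_arrivals(inverse_cumulative, cumulative_end, rng):
    """Map unit-rate Poisson arrivals u_i to t_i = Lambda^{-1}(u_i).

    Arrivals are generated until u exceeds Lambda(T) = cumulative_end.
    """
    arrivals = []
    u = rng.expovariate(1.0)
    while u <= cumulative_end:
        arrivals.append(inverse_cumulative(u))
        u += rng.expovariate(1.0)
    return arrivals

# Assumed example: lambda(t) = 2t, Lambda(t) = t**2, Lambda^{-1}(u) = sqrt(u)
T = 3.0
rng = random.Random(11)
runs = 5000
total = window = 0
for _ in range(runs):
    arrivals = nonhomogeneous_arrivals(math.sqrt, T ** 2, rng)
    total += len(arrivals)
    window += sum(1.0 < a <= 2.0 for a in arrivals)

mean_total = total / runs      # compare with Lambda(3) - Lambda(0) = 9
mean_window = window / runs    # compare with Lambda(2) - Lambda(1) = 3
print(mean_total, mean_window)
```

The empirical means agree with Theorem 7.4.1 up to sampling noise: the number of events in $(s, t]$ has mean $\Lambda(t) - \Lambda(s)$.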
7.5 The failure rate function
Previous sections have shown that the Poisson process is specified by a rate function $\lambda(t)$. In this section, we consider the failure rate function of some object or system. Often it is interesting to know the probability that an object will fail in the interval $[t, t + \Delta t]$ given that the object was still functioning well up to time $t$. Let $X$ denote the lifetime of an object⁵, then this probability can be written with (2.44) as
$$\begin{aligned}
\Pr[t \le X \le t + \Delta t | X > t] &= \frac{\Pr[\{t \le X \le t + \Delta t\} \cap \{X > t\}]}{\Pr[X > t]} \\
&= \frac{\Pr[t < X \le t + \Delta t]}{\Pr[X > t]}
\end{aligned}$$
If $f_X(t)$ is the probability density function of $X$ and $F_X(t) = \Pr[X \le t]$, then for small $\Delta t$ and assuming that $f_X(t)$ is well behaved⁶ such that $\Pr[t < X \le t + \Delta t] = f_X(t)\Delta t$,
$$\Pr[t \le X \le t + \Delta t | X > t] = \frac{f_X(t)}{1 - F_X(t)} \Delta t$$

⁵ In medical sciences, $X$ can represent in general the time for a certain event to occur. For example, the time it takes for an organism to die, the time to recover from illness, the time for a patient to respond to a therapy and so on.
⁶ Recall the discussion in Section 2.3.
This expression shows that
$$r(t) = \frac{f_X(t)}{1 - F_X(t)} \tag{7.4}$$
can be interpreted as the intensity or rate that a $t$-year old object will fail. It is called the failure rate $r(t)$ and
$$R(t) = 1 - F_X(t) = \Pr[X > t] \tag{7.5}$$
is usually termed⁷ the reliability function. Since $r(t) = \frac{\Pr[t \le X \le t + \Delta t | X > t]}{\Delta t}$ for small $\Delta t$, the failure rate $r(t) > 0$ because $r(t) = 0$ would imply an infinite lifetime $X$. Using the definition (2.30) of a probability density function, we observe that
$$r(t) = -\frac{\frac{dR(t)}{dt}}{R(t)} = -\frac{d \ln R(t)}{dt} \tag{7.6}$$
Or, since $R(0) = 1$, the corresponding integrated relation is
$$R(t) = \exp\left[-\int_0^t r(u)\,du\right] \tag{7.7}$$
The expressions (7.6) and (7.7) are inverse relations that specify $r(t)$ as a function of $R(t)$ and vice versa. The reliability function $R(t)$ is non-increasing with maximum at $t = 0$ since it is the complement of the probability distribution function $F_X(t)$. On the other hand, the failure rate $r(t)$, being a probability density function, can take any positive real value. From (7.4) we obtain the density function of the lifetime $X$ in terms of the failure rate $r(t)$ as
$$f_X(t) = r(t) R(t) = r(t) \exp\left[-\int_0^t r(u)\,du\right]$$
with $f_X(0) = r(0)$. Using the tail relation (2.35) for the expectation of the lifetime $X$ immediately gives the mean time to failure,
$$E[X] = \int_0^{\infty} R(t)\,dt \tag{7.8}$$
In case $F_X(T) = 1$ and $f_X(T) \ne 0$ for a finite time $T$, which is the maximum lifetime, the definition (7.4) demonstrates that $r(t)$ has a pole at $t = T$. In practice, the failure rate $r(t)$ is relatively high for small $t$ due to initial imperfections that cause a number of objects to fail early and $r(t)$ is increasing towards the maximum life time $T$ due to aging or wear and tear. This shape of $r(t)$ as illustrated in Fig. 7.4 is called a “bath-tub” curve, which is convex.

⁷ In biology, medical sciences and physics, $R(t)$ is called the survival function and $r(t)$ is the corresponding mortality rate or hazard rate.
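[Figure 7.4: failure rate r(t) versus t on the interval from 0 to T, with the value f_X(0) marked on the vertical axis.]
Fig. 7.4. Example of a “bath-tub” shaped failure rate function $r(t)$.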
An often used model for the failure rate is $r(t) = \lambda a t^{a-1}$ with corresponding reliability function $R(t) = \exp[-\lambda t^a]$ and where the lifetime $X$ has a Weibull distribution function $F_X(t) = 1 - R(t)$ as in (3.40). In case $a = 1$, the failure rate $r(t) = \lambda$ is constant over time, while $a > 1$ ($a < 1$) reflects an increasing (decreasing) failure rate over time. Hence, a “bath-tub” shaped (realistic) failure function as in Fig. 7.4 can be modeled by a Weibull model for $r(t)$ with $a < 1$ in the beginning, $a = 1$ in the middle and $a > 1$ at the end of the life time.
For an exponential lifetime where $f_X(t) = \lambda e^{-\lambda t}$, the failure rate (7.4) equals $r(t) = \lambda$ and is independent of time. This means that the failure rate for a $t$-year-old object is the same as for a new object, which is a manifestation of the memoryless property of the exponential distribution. It also explains why $\lambda$, in both the exponential distribution and the Poisson process, is often called a 'rate'.
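The relations (7.7) and (7.8) are easy to exercise numerically. The sketch below integrates the Weibull-type failure rate $r(t) = \lambda a t^{a-1}$ with a midpoint rule to recover $R(t)$, and evaluates the mean time to failure of an exponential lifetime by truncating the integral in (7.8) at a finite upper limit. The function names, the truncation point and the parameter values are assumptions made for this illustration.

```python
import math

def weibull_failure_rate(t, rate, a):
    """r(t) = rate * a * t^(a-1), the failure rate model quoted in the text."""
    return rate * a * t ** (a - 1)

def reliability_from_rate(r, t, steps=20000):
    """R(t) = exp(-integral_0^t r(u) du) as in (7.7), midpoint rule."""
    h = t / steps
    integral = sum(r((i + 0.5) * h) for i in range(steps)) * h
    return math.exp(-integral)

def mean_time_to_failure(reliability, upper, steps=20000):
    """E[X] = integral_0^inf R(t) dt as in (7.8), truncated at `upper`."""
    h = upper / steps
    return sum(reliability((i + 0.5) * h) for i in range(steps)) * h

rate, a, t = 0.5, 2.0, 1.3                 # illustrative values
numeric = reliability_from_rate(lambda u: weibull_failure_rate(u, rate, a), t)
closed_form = math.exp(-rate * t ** a)

# Exponential lifetime (a = 1): constant failure rate, mean lifetime 1/rate
mttf_exponential = mean_time_to_failure(lambda u: math.exp(-rate * u), upper=60.0)

print(numeric, closed_form)
print(mttf_exponential, 1 / rate)
```

For $a < 1$ the failure rate is unbounded near $t = 0$, so a finer grid or a substitution would be needed there; the example uses $a = 2$ to keep the quadrature straightforward.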
7.6 Problems
(i) A series of test strings each with a variable number $N$ of bits all equal to 1 are transmitted over a channel. Due to transmission errors, each 1-bit can be affected independently from the others and only arrives non-corrupted with probability $p$. The length $N$ of the test strings (words) is a Poisson random variable with mean length $\lambda$ bits. In this test, the sum $Y$ of the bits in the arriving words is investigated to determine the channel quality via $p$. Compute the pdf of $Y$.
(ii) At a router, four QoS classes are supported and for each class packets arrive according to a Poisson process with rate $\lambda_j$ for $j = 1, 2, 3, 4$. Suppose that the router had a failure at time $t_1$ that lasted $T$ time units. What is the probability density function of the total number of packets of the four classes that has arrived during that period?
(iii) Let $N(t) = N_1(t) + N_2(t)$ be the sum of two independent Poisson processes with rates $\lambda_1$ and $\lambda_2$. Given that the process $N(t)$ had an arrival, what is the probability that that arrival came from the process $N_1(t)$?
(iv) Peter has been monitoring the highway for nearly his entire life and found that the cars pass his house according to a Poisson process. Moreover, he discovered that the Poisson process in one lane is independent from that in the other lanes. The rate of these independent processes differs per lane and is denoted by $\lambda_1, \lambda_2, \lambda_3$, where $\lambda_j$ is expressed in the number of cars on lane $j$ per hour.
(a) Given that one car passed Peter, what is the probability that it passed in lane 1?
(b) What is the probability that $n$ cars pass Peter in 1 hour?
(c) What is the probability that in 1 hour $n$ cars have passed and that they all have used lane 1?
(v) In a game, audio signals arrive in the interval $(0, T)$ according to a Poisson process with rate $\lambda$, where $T > 1/\lambda$. The player wins only if at least one audio signal arrives in that interval, and if he or she pushes a button (only one push allowed) upon the last of the signals. The player uses the following strategy: he or she pushes the button upon the arrival of the first signal (if any) after a fixed time $s \le T$.
(a) What is the probability that the player wins?
(b) Which value of $s$ maximizes the probability of winning, and what is the probability in that case?
(vi) The arrival process of voice over IP (VoIP) packets at a router is close to a Poisson process with rate $\lambda = 0.1$ packets per minute. Due to an upgrade to install weighted fair queueing as priority scheduling rule, the router is switched off for 10 minutes.
(a) What is the probability of receiving no VoIP packets when switched off?
(b) What is the probability that more than ten VoIP packets will arrive during this upgrade?
(c) If there was one VoIP packet in the meantime, what is the most probable minute of the arrival?
(vii) A link of a packet network carries on average ten packets per second.
The packets arrive according to a Poisson process. A packet has a
probability of 30 % to be an acknowledgment (ACK) packet independent of the others. The link is monitored during an interval of 1
second.
(a) What is the probability that at least one ACK packet has been
observed?
(b) What is the expected number of all packets given that five
ACK packets have been spotted on the link?
(c) Given that eight packets have been observed in total, what is
the probability that two of them are ACK packets?
(viii) An ADSL helpdesk treats exclusively customer requests of one of
three types: (i) login-problems, (ii) ADSL hardware and (iii) ADSL
software problems. The opening hours of the helpdesk are from 8:00
until 16:00. All requests arrive at the helpdesk according to
a Poisson process with different rates: $\lambda_1 = 8$ requests with login
problems/hour, $\lambda_2 = 6$ requests with hardware problems/hour, and
$\lambda_3 = 6$ requests with software problems/hour. The Poisson arrival
processes for different types of requests are independent.
(a) What is the expected number of requests in one day?
(b) What is the probability that in 20 minutes exactly three requests arrive, and that all of them have hardware problems?
(c) What is the probability that no requests will arrive in the last
15 minutes of the opening hours?
(d) What is the probability that one request arrives between 10:00
and 10:12 and two requests arrive between 10:06 and 10:30?
(e) If at the moment $t + s$ there are $k + m$ requests, what is the
probability that there were $k$ requests at the moment $t$?
(ix) Arrival of virus attacks to a PC can be modeled by a Poisson process
with rate $\lambda = 6$ attacks per hour.
(a) What is the probability that exactly one attack will arrive
between 1 p.m. and 2 p.m.?
(b) Suppose that at the moment the PC is turned on there were no
attacks on the PC, but at the shut-down time precisely 60 attacks
have been observed. What is the expected amount of time
that the PC has been on?
(c) Given that six attacks arrive between 1 p.m. and 2 p.m., what
is the probability that the fifth attack will arrive between 1:30
p.m. and 2 p.m.?
(d) What is the expected arrival time of that fifth attack?
(x) Consider a system $S$ consisting of $n$ subsystems in series as shown
in Fig. 7.5. The system $S$ operates correctly only if all subsystems
operate correctly. Assume that the probability that a failure in a
subsystem $S_i$ occurs is independent of that in subsystem $S_j$. Given
the reliability functions $R_j(t)$ of each subsystem $S_j$, compute the
reliability function $R(t)$ of the system $S$.
Fig. 7.5. A system consisting of $n$ subsystems $S_1, S_2, \ldots, S_n$ in series.
(xi) Same question as in the previous exercise but applied to a system $S$
consisting of $n$ subsystems in parallel as shown in Fig. 7.6.
Fig. 7.6. A system consisting of $n$ subsystems $S_1, S_2, \ldots, S_n$ in parallel.
8
Renewal theory
A renewal process is a counting process for which the interarrival times $\tau_n$
are i.i.d. random variables with distribution $F_\tau(t)$. Hence, a renewal process
generalizes the exponential interarrival times in the Poisson process (see
Theorem 7.3.2) to an arbitrary distribution. Since the interarrival times are
i.i.d. random variables, at each event (or renewal) the process probabilistically restarts. The classical example of a renewal process is the successive
replacement of light bulbs: the first bulb is installed at time $W_0$, fails at
time $W_1 = \tau_1$, and is immediately exchanged for a new bulb, which in turn
fails at $W_2 = \tau_1 + \tau_2$, and is thereafter replaced by a third bulb, and so on.
How many light bulbs are replaced in a period of $t$ time units given the life
time distribution $F_\tau(t)$?
Fig. 8.1. The relation between the renewal counting process $N(t)$, the interarrival
time $\tau_n$ and the waiting time $W_n$.
8.1 Basic notions
As illustrated in Fig. 8.1, the waiting time $W_n = \sum_{k=1}^{n} \tau_k$ (for $n \geq 1$, with
$W_0 = 0$ by convention) is related to the counting process $\{N(t), t \geq 0\}$ by the
equivalence $\{N(t) \geq n\} \Longleftrightarrow \{W_n \leq t\}$: the number of events (renewals) up
to time $t$ is at least $n$ if and only if the $n$-th renewal occurred on or before
time $t$. Alternatively, the number of events by time $t$ equals the largest
value of $n$ for which the $n$-th event occurs before or at time $t$, $N(t) = \max[n : W_n \leq t]$. The convention that $W_0 = 0$ implies that $N(0) = 0$: the
counting process starts counting from zero at time 0. The main objective
of renewal theory is to deduce properties of the process $\{N(t), t \geq 0\}$ as a
function of the interarrival distribution $F_\tau(t) = \Pr[\tau \leq t]$.
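The relation between waiting times and the counting process can be made concrete with a small simulation. The sketch below is only illustrative: the uniform lifetimes, the seed and the helper names `waiting_times` and `count_renewals` are assumptions and do not come from the text. It builds the waiting times as partial sums of the interarrival times and checks the equivalence between the events "at least $n$ renewals by time $t$" and "the $n$-th waiting time is at most $t$".

```python
import random
from itertools import accumulate

def waiting_times(interarrivals):
    """Return [W_1, W_2, ...], the partial sums of the interarrival times."""
    return list(accumulate(interarrivals))

def count_renewals(waits, t):
    """N(t) = max{n : W_n <= t}, with N(t) = 0 if W_1 > t."""
    n = 0
    for w in waits:
        if w <= t:
            n += 1
        else:
            break
    return n

rng = random.Random(1)
# Hypothetical bulb lifetimes, uniform on (0.5, 1.5); any positive law would do.
taus = [rng.uniform(0.5, 1.5) for _ in range(50)]
W = waiting_times(taus)
t = 10.0
N_t = count_renewals(W, t)

# Check {N(t) >= n} <=> {W_n <= t} for every n covered by the sample.
equivalence_holds = all((N_t >= n) == (W[n - 1] <= t) for n in range(1, len(W) + 1))
print(N_t, equivalence_holds)
```

Because the interarrival times are positive, the waiting times are increasing, which is what allows the counting loop to stop at the first waiting time beyond $t$.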
8.1.1 The distribution of the waiting time $W_n$
If we assume that the interarrival times are i.i.d. having a Laplace transform
$$\varphi_\tau(z) = \int_0^\infty e^{-zt}\, dF_\tau(t) = \int_0^\infty e^{-zt} f_\tau(t)\, dt$$
the waiting time $W_n$ is the sum of $n$ i.i.d. random variables specified by
(2.66) as
$$\varphi_{W_n}(z) = \int_0^\infty e^{-zt} f_{W_n}(t)\, dt = \varphi_\tau^n(z) \tag{8.1}$$
By partial integration, we find the Laplace transform of the distribution
$F_{W_n}(t) = \Pr[W_n \leq t] = \int_0^t f_{W_n}(u)\, du$,
$$\int_0^\infty e^{-zt} F_{W_n}(t)\, dt = \frac{\varphi_{W_n}(z)}{z} = \frac{\varphi_\tau^n(z)}{z} \tag{8.2}$$
The inverse Laplace transform follows¹ with (2.38) as
$$\Pr[W_n \leq t] = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\varphi_\tau^n(z)}{z}\, e^{zt}\, dz \tag{8.3}$$
As an alternative to the approach with probability generating functions,
¹ In general, by integration of (2.38), we find
$$F_X(t) = \int_0^t f_X(u)\, du = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \varphi_X(z)\, \frac{e^{zt} - 1}{z}\, dz$$
whose form seems different from (8.3). However, $\frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\varphi_X(z)}{z}\, dz = 0$ because the contour
can be closed over the positive $\mathrm{Re}(z) > c$ plane where $\varphi_X(z)$ is analytic and because $\lim_{R \to \infty} \varphi_X(R e^{i\theta}) = 0$ for $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$, which follows from the existence of the Laplace integral
$\int_0^\infty e^{-zt} f_X(t)\, dt$.
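we can resort to the $n$-th convolution, which follows from (2.63) as
$$f_{W_1}(t) = f_\tau(t)$$
$$f_{W_n}(t) = \int_{-\infty}^{\infty} f_{W_{n-1}}(t-y)\, f_\tau(y)\, dy = \int_0^t f_{W_{n-1}}(t-y)\, f_\tau(y)\, dy$$
Integrated,
$$\begin{aligned}
\Pr[W_n \leq t] &= \int_{-\infty}^{t} du \int_{-\infty}^{\infty} f_{W_{n-1}}(u-y)\, f_\tau(y)\, dy \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{t-y} f_{W_{n-1}}(u)\, du \right) f_\tau(y)\, dy \\
&= \int_0^t \Pr[W_{n-1} \leq t-y]\, f_\tau(y)\, dy
\end{aligned}$$
By denoting $\Pr[W_n \leq t] = F_\tau^{(n\ast)}(t)$, we have
$$F_\tau^{(1\ast)}(t) = F_\tau(t)$$
$$F_\tau^{(n\ast)}(t) = \int_0^t F_\tau^{((n-1)\ast)}(t-y)\, f_\tau(y)\, dy$$
These equations also show that we can define $F_\tau^{(0\ast)}(t) = 1$. Let us define
$U_n(t) = \sum_{k=1}^{n} F_\tau^{(k\ast)}(t)$. By summing both sides in the last equation, we
obtain
$$U_n(t) = \int_0^t \sum_{k=1}^{n} F_\tau^{((k-1)\ast)}(t-y)\, f_\tau(y)\, dy = \int_0^t \sum_{k=0}^{n-1} F_\tau^{(k\ast)}(t-y)\, f_\tau(y)\, dy$$
With the definition $F_\tau^{(0\ast)}(t) = 1$, we arrive at
$$U_n(t) = \int_0^t U_{n-1}(t-y)\, dF_\tau(y) + F_\tau(t) \tag{8.4}$$
or, written in terms of convolutions,
$$U_n(t) = (U_{n-1} \ast F_\tau)(t) + F_\tau(t)$$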
Finally, we mention the interesting bound on the convolution $F_\tau^{(n\ast)}(t)$ for
a non-negative random variable $\tau$,
$$F_\tau^{(n\ast)}(t) = \int_0^t F_\tau^{((n-1)\ast)}(t-y)\, dF_\tau(y) \leq F_\tau^{((n-1)\ast)}(t) \int_0^t dF_\tau(y) = F_\tau^{((n-1)\ast)}(t)\, F_\tau(t)$$
which follows from the monotone increasing nature of any distribution function. By iteration on $n$ starting from $F_\tau^{(0\ast)}(t) = 1$, it is immediate that
$$F_\tau^{(n\ast)}(t) \leq (F_\tau(t))^n \tag{8.5}$$
Since $(F_\tau(t))^n$ is the distribution of the maximum (3.33) of a set of $n$
i.i.d. random variables $\{\tau_k\}_{1 \leq k \leq n}$, the bound (8.5) means that, for $\tau_k \geq 0$,
$$\Pr\left[ \sum_{k=1}^{n} \tau_k \leq x \right] \leq \Pr\left[ \max_{1 \leq k \leq n} \tau_k \leq x \right]$$
which is rather obvious because $\sum_{k=1}^{n} \tau_k \geq \max_{1 \leq k \leq n} \tau_k$. The equality sign
is only possible if $n - 1$ of the $\tau_k$ are zero.
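The bound (8.5) can be checked numerically in a case where the convolution is known in closed form. The sketch below assumes exponential interarrival times with rate `lam` (an illustrative choice, not prescribed by the text), for which the $n$-fold convolution is the Erlang distribution; the function names are hypothetical.

```python
import math

def erlang_cdf(n, lam, t):
    """Pr[W_n <= t] when W_n is a sum of n i.i.d. exponential(lam) variables."""
    if t <= 0:
        return 0.0
    x = lam * t
    term, total = 1.0, 1.0              # running x^j / j! and its partial sum
    for j in range(1, n):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total

def exp_cdf(lam, t):
    """Distribution function of a single exponential(lam) interarrival time."""
    return 1.0 - math.exp(-lam * t) if t > 0 else 0.0

lam = 2.0
for n in (1, 2, 5):
    for t in (0.1, 1.0, 3.0):
        lhs = erlang_cdf(n, lam, t)      # n-fold convolution evaluated at t
        rhs = exp_cdf(lam, t) ** n       # distribution of the maximum
        print(n, t, lhs, rhs, lhs <= rhs + 1e-12)
```

For $n = 1$ both sides coincide, and the gap widens quickly with $n$ because the sum of $n$ positive lifetimes is much less likely to stay below $t$ than their maximum.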
8.1.2 The renewal function $m(t) = E[N(t)]$
From the equivalence $\{N(t) \geq n\} \Longleftrightarrow \{W_n \leq t\}$, we directly have
$$\Pr[N(t) \geq n] = \Pr[W_n \leq t] = F_\tau^{(n\ast)}(t) \tag{8.6}$$
$$\Pr[N(t) = n] = \Pr[N(t) \geq n] - \Pr[N(t) \geq n+1] = F_\tau^{(n\ast)}(t) - F_\tau^{((n+1)\ast)}(t)$$
The expected number of events in $(0, t]$ expressed via the tail probabilities
(2.36) follows with (8.6) as
$$m(t) = E[N(t)] = \sum_{k=1}^{\infty} F_\tau^{(k\ast)}(t) \tag{8.7}$$
and $m(t)$ is called the renewal function. According to a property of the
counting process, $N(0) = 0$, the number of events in $(0, t]$ when $t \to 0$ is
assumed to be zero such that $m(0) = 0$. From (8.5), it follows at each point
$t$ for which $F_\tau(t) < 1$ that
$$m(t) \leq \sum_{k=1}^{\infty} (F_\tau(t))^k = \frac{1}{1 - F_\tau(t)} - 1$$
Hence, for finite $t$ where $F_\tau(t) < 1$, the renewal function $m(t)$ converges at
least as fast as a geometric series and is bounded. In the limit $t \to \infty$, where
$\lim_{t \to \infty} F_\tau(t) = 1$, we see that $m(t)$ is not bounded anymore. Intuitively, the
number of repeated events (renewals) in an infinite time interval is clearly
infinite.
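As an illustration of the series (8.7) and of the geometric bound, the sketch below evaluates both for Erlang-2 interarrival times (the sum of two exponential phases with rate `lam`). This distribution is an assumption chosen for convenience: its $k$-fold convolution is again an Erlang law, and the renewal function has the standard closed form $m(t) = \frac{\lambda t}{2} - \frac{1}{4} + \frac{1}{4} e^{-2\lambda t}$, which is not derived in the text and is used here only as a reference value.

```python
import math

def erlang_cdf(n, lam, t):
    """Pr[W_n <= t] when W_n is a sum of n i.i.d. exponential(lam) variables."""
    if t <= 0:
        return 0.0
    x = lam * t
    term, total = 1.0, 1.0              # running x^j / j! and its partial sum
    for j in range(1, n):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total

def renewal_function(lam, t, terms=60):
    """Series (8.7) for Erlang-2 interarrival times, truncated after `terms` terms.

    The k-fold convolution of an Erlang-2 law is an Erlang-2k law.
    """
    return sum(erlang_cdf(2 * k, lam, t) for k in range(1, terms + 1))

def closed_form(lam, t):
    """Standard closed form of m(t) for Erlang-2 interarrival times."""
    return lam * t / 2 - 0.25 + 0.25 * math.exp(-2 * lam * t)

def geometric_bound(lam, t):
    """Upper bound 1 / (1 - F(t)) - 1 that follows from (8.5)."""
    F = erlang_cdf(2, lam, t)
    return 1.0 / (1.0 - F) - 1.0

lam = 1.0
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, renewal_function(lam, t), closed_form(lam, t), geometric_bound(lam, t))
```

The truncation at 60 terms is ample for the values of $t$ shown; for much larger $\lambda t$ more terms would be needed.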
The renewal function $m(t)$ completely characterizes the renewal process.
Indeed, if $\varphi_m(z)$ is the Laplace transform of $m(t)$, then after taking the
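Laplace transform of both sides in (8.7) and using the definition $\Pr[W_n \leq t] = F_\tau^{(n\ast)}(t)$ together with (8.2), we obtain
$$\varphi_m(z) = \frac{1}{z} \sum_{k=1}^{\infty} \varphi_\tau^k(z) = \frac{1}{z}\, \frac{\varphi_\tau(z)}{1 - \varphi_\tau(z)} \tag{8.8}$$
provided $|\varphi_\tau(z)| < 1$. From this expression, the interarrival time can be
found from
$$\varphi_\tau(z) = \frac{z \varphi_m(z)}{1 + z \varphi_m(z)}$$
after inverse Laplace transform. By taking the inverse Laplace transform
(2.38), $m(t)$ is written as a complex integral
$$m(t) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\varphi_\tau(z)}{1 - \varphi_\tau(z)}\, \frac{e^{zt}}{z}\, dz$$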
8.1.3 The renewal equation
After taking the inverse Laplace transform of $\varphi_m(z) = \varphi_m(z) \varphi_\tau(z) + \frac{\varphi_\tau(z)}{z}$,
which is deduced from (8.8), a third relation for $m(t)$ that often occurs is
$$m(t) = \int_0^t m(t-u)\, dF_\tau(u) + F_\tau(t) = \int_0^t F_\tau(t-u)\, dm(u) + F_\tau(t) \tag{8.9}$$
and is called the renewal equation. Taking the limit $n \to \infty$ in (8.4) also
leads to the renewal equation. Since $m(0) = 0$, the renewal equation implies
that $F_\tau(0) = \Pr[\tau \leq 0] = 0$ or that processes where a zero interarrival time
is possible (e.g. in simultaneous events) are ruled out. For a Poisson process,
Theorem 7.3.1 states that the probability of simultaneous events ($h \to 0$) is
zero. The requirement $m(0) = 0$ generalizes the exclusion of simultaneous
events in any renewal process.
The probabilistic argument that leads to the renewal equation is as follows.
By conditioning on the first renewal for $k > 0$,
$$\Pr[N(t) = k \mid W_1 = s] = \begin{cases} 0 & t < s \\ \Pr[N(t-s) = k-1] & t \geq s \end{cases}$$
where in the last case for $t \geq s$ the event $\{N(t) = k\}$ is only possible if $k - 1$
renewals occur in time interval $(s, t]$, which is, due to the stationarity of the
renewal process, equal to $k - 1$ renewals in $(0, t-s]$. By the law of total
probability (2.46), we uncondition to find for $k \geq 1$,
$$\Pr[N(t) = k] = \int_0^\infty \Pr[N(t) = k \mid W_1 = s]\, \frac{d \Pr[W_1 \leq s]}{ds}\, ds = \int_0^t \Pr[N(t-s) = k-1]\, f_\tau(s)\, ds \tag{8.10}$$
Multiplying both sides by $k$ and summing over all $k \geq 1$ gives the average
at the left-hand side,
$$E[N(t)] = \sum_{k=1}^{\infty} k \Pr[N(t) = k]$$
The sum at the right-hand side is
$$\sum_{k=1}^{\infty} k \Pr[N(t-s) = k-1] = \sum_{k=0}^{\infty} (k+1) \Pr[N(t-s) = k] = E[N(t-s)] + 1$$
Combining both sides yields
$$E[N(t)] = F_\tau(t) + \int_0^t E[N(t-s)]\, dF_\tau(s)$$
which is again the renewal equation (8.9) since $m(t) = E[N(t)]$.
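The renewal equation (8.9) also lends itself to direct numerical solution. The sketch below is a first-order discretization on a uniform grid, not a method given in the text: the Stieltjes integral is replaced by a sum over increments of the distribution function, and the grid size, the rate `lam` and the function name are assumptions. For exponential interarrival times the result can be compared with the linear renewal function $\lambda t$ of the Poisson process.

```python
import math

def solve_renewal_equation(F, t_max, steps):
    """Approximate m(t) on a uniform grid from m(t) = F(t) + int_0^t m(t-s) dF(s).

    Returns the list [m(0), m(h), ..., m(t_max)] with h = t_max / steps.
    The scheme is first-order accurate in h.
    """
    h = t_max / steps
    Fv = [F(i * h) for i in range(steps + 1)]
    dF = [0.0] + [Fv[j] - Fv[j - 1] for j in range(1, steps + 1)]
    m = [0.0] * (steps + 1)             # m(0) = 0
    for i in range(1, steps + 1):
        # m[i - j] is already known for j >= 1; the j = i term uses m(0) = 0.
        conv = sum(m[i - j] * dF[j] for j in range(1, i + 1))
        m[i] = Fv[i] + conv
    return m

lam = 1.5
m_grid = solve_renewal_equation(lambda t: 1.0 - math.exp(-lam * t), 2.0, 1000)
print(m_grid[500], lam * 1.0)   # approximation of m(1) versus lam * 1
print(m_grid[-1], lam * 2.0)    # approximation of m(2) versus lam * 2
```

The double loop costs a number of operations quadratic in the number of grid points, which is acceptable for a sketch; the small downward bias shrinks proportionally to the grid step.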
8.1.4 A generalization of the renewal equation
The renewal equation (8.9) is a special case of the more general class of
integral equations
$$Y(t) = h(t) + \int_0^t Y(t-u)\, dF(u), \qquad t \geq 0 \tag{8.11}$$
in the unknown function $Y(t)$, where $h(t)$ is a known function and $F(t)$ is
a distribution function. This equation can be written using the convolution
notation as
$$Y(t) = h(t) + Y \ast F(t)$$
By conditioning on the first renewal as shown above, many renewal problems
can be recast into the form of the general renewal equation (8.11). An
example is the derivation of the residual life or waiting time given in Section
8.3. Therefore, it is convenient to present the solution to the general renewal
equation (8.11).
Lemma 8.1.1 If $h(t)$ is bounded for all $t$, then the unique solution of the
general renewal equation (8.11) is
$$Y(t) = h(t) + \int_0^t h(t-u)\, dm(u) \tag{8.12}$$
where $m(t) = \sum_{k=1}^{\infty} F^{(k\ast)}(t)$ is the renewal function.
Proof: Let us first concentrate on the formal solution. In general, convolutions are best treated in the transformed domain. After taking the Laplace
transform of the general renewal equation (8.11), we obtain
$$\varphi_Y(z) = \varphi_h(z) + \varphi_Y(z)\, \varphi_F(z)$$
such that
$$\varphi_Y(z) = \frac{\varphi_h(z)}{1 - \varphi_F(z)}$$
There always exists a region in the $z$-domain where $|\varphi_F(z)| < 1$ such that
the geometric series applies,
$$\varphi_Y(z) = \varphi_h(z) \sum_{k=0}^{\infty} (\varphi_F(z))^k = \varphi_h(z) + \varphi_h(z) \sum_{k=1}^{\infty} (\varphi_F(z))^k$$
Back transforming and taking into account that $(\varphi_F(z))^k$ is the transform
of a $k$-fold convolution yields
$$Y(t) = h(t) + h \ast \sum_{k=1}^{\infty} F^{(k\ast)}(t) = h(t) + h \ast m(t)$$
This formal manipulation demonstrates² that (8.12) is a solution of the
general renewal equation (8.11).
Suppose now that there are two solutions $Y_1(t)$ and $Y_2(t)$. Their difference
$V(t) = Y_1(t) - Y_2(t)$ obeys
$$V(t) = \int_0^t V(t-u)\, dF(u) = V \ast F(t)$$
² Alternatively, by substituting the solution into the equation, a check is
$$\begin{aligned}
Y(t) &= h(t) + Y \ast F(t) = h(t) + h \ast F(t) + h \ast m \ast F(t) \\
&= h(t) + h \ast \left[ F(t) + \sum_{k=1}^{\infty} F^{(k\ast)}(t) \ast F(t) \right] \\
&= h(t) + h \ast \left[ F(t) + \sum_{k=2}^{\infty} F^{(k\ast)}(t) \right] = h(t) + h \ast \left[ \sum_{k=1}^{\infty} F^{(k\ast)}(t) \right] \\
&= h(t) + h \ast m(t)
\end{aligned}$$
By convolving both sides with $F$ and using the original equation, we deduce
that $V(t) = V \ast F \ast F(t)$. Continuing this process, for each $k$, we have
that $V(t) = V \ast F^{(k\ast)}(t)$. Since $F^{(k\ast)}(t) \to 0$ for all finite $t$ and $k \to \infty$
(because $m(t)$ exists for all finite $t$), and if $V(t)$ is bounded, this implies that
$V(t) = 0$ for all finite $t$. This demonstrates the uniqueness and motivates
the requirement that $h(t)$ should be bounded.
□
8.1.5 The renewal function for a Poisson process
Before showing below that the renewal function $m(t)$ can be specified in
detail as $t \to \infty$, we consider first the Poisson process where the interarrival
times $\{\tau_n\}_{n \geq 1}$ are i.i.d. exponentially distributed with rate $\lambda$. Since $\varphi_\tau(z) = \frac{\lambda}{z + \lambda}$,
$$m(t) = \frac{\lambda}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{e^{zt}}{z^2}\, dz, \qquad c > 0$$
The contour can be closed over the negative $\mathrm{Re}(z)$-plane (because $t \geq 0$).
The only singularity of the integrand is a double pole at $z = 0$ with residue
$m(t) = \lambda \left. \frac{d e^{zt}}{dz} \right|_{z=0} = \lambda t$. This result, of course, follows directly from the
definition of the Poisson process given in (7.2). We see that the renewal
function $m(t)$ for the Poisson process is linear for all $t$. Moreover, the Poisson
process is the only continuous time renewal process with a linear renewal
function $m(t)$. Indeed, if³ $m(t) = \alpha t$, the renewal equation is
$$\alpha t = \int_0^t \alpha (t-u)\, dF_\tau(u) + F_\tau(t) = \alpha \int_0^t F_\tau(u)\, du - \alpha t F_\tau(0) + F_\tau(t)$$
By differentiation with respect to $t$ and assuming non-zero interarrival times
such that $F_\tau(0) = \Pr[\tau \leq 0] = 0$, we obtain a differential equation
$$\alpha = \alpha F_\tau(t) + \frac{d F_\tau(t)}{dt}$$
whose solution is $F_\tau(t) = 1 - e^{-\alpha t}$. By Theorem 7.3.2, exponential interarrival times characterize a Poisson process with rate $\lambda = \alpha$.
8.2 Limit theorems
In the limit $t \to \infty$, the equivalence relation (8.6) indicates that, for any
fixed value of $n$, $\Pr[N(t) \geq n] = 1$, which means that the number of events
³ A linear form $m(t) = \alpha t + \beta$ with $\beta \neq 0$ is impossible because $m(0) = 0$.
$N(t) \to \infty$ as $t \to \infty$. Let us consider $\frac{W_{N(t)}}{N(t)}$, which is the sample mean
of the first $N(t)$ interarrival times in the interval $(0, t]$. The Strong Law of
Large Numbers (6.3) indicates that $\Pr\left[ \lim_{n \to \infty} \frac{W_n}{n} = \mu \right] = 1$ and, because
$N(t) \to \infty$ as $t \to \infty$, we have that $\frac{W_{N(t)}}{N(t)} \to \mu = E[\tau]$ as $t \to \infty$. Since
$W_{N(t)} \leq t < W_{N(t)+1}$, we obtain the inequality
$$\frac{W_{N(t)}}{N(t)} \leq \frac{t}{N(t)} < \frac{W_{N(t)+1}}{N(t)}$$
Since both lower and upper bound tend to $\mu$, we arrive at the important
result that $\lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{\mu}$. The random variable counting the number
of events in $(0, t]$ per interval length $t$ converges to the reciprocal $\frac{1}{\mu}$ of the average interarrival time $\mu = E[\tau]$. Unfortunately⁴, we cannot simply deduce the intuitive result that also the expectation, $E\left[ \frac{N(t)}{t} \right]$, tends to $\frac{1}{\mu}$. On the other
hand, the expectation of $W_{N(t)}$ is obtained from Wald’s identity (2.69)
as $E\left[ W_{N(t)} \right] = E[N(t)]\, E[\tau]$. Taking the expectation in the inequality
$W_{N(t)} \leq t < W_{N(t)+1}$ leads to
$$\frac{E[N(t)]}{t} \leq \frac{1}{E[\tau]} < \frac{E[N(t)]}{t} + \frac{1}{t}$$
from which, after
the limit $t \to \infty$, the intuitive result follows. Thus, we have proved⁵ the
following theorem:
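Theorem 8.2.1 (Elementary Renewal Theorem) If $\mu = E[\tau]$ is the
average interarrival time of events in the renewal process, then
$$\lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{\mu}, \qquad \lim_{t \to \infty} \frac{E[N(t)]}{t} = \lim_{t \to \infty} \frac{m(t)}{t} = \frac{1}{\mu} \tag{8.13}$$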
The left-hand side in (8.13) describes the long run average number of
events (renewals) per unit time. The right-hand side is the reciprocal of
the average interarrival time (or life time). For example, if in the light bulb
replacement process a bulb lasts on average $\mu$ time units, then, in the long
run or steady state, the light bulbs must be replaced at rate $\frac{1}{\mu}$ per time unit.
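A short simulation makes the Elementary Renewal Theorem tangible. The sketch below assumes interarrival times uniform on $(0, 2\mu)$ with $\mu = 1$, so that the long run renewal rate should approach 1; the distribution, the seed and the function name are illustrative choices only.

```python
import random

def renewals_up_to(t, draw):
    """Count renewals in (0, t] when interarrival times come from draw()."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += draw()
        if elapsed > t:
            return count
        count += 1

rng = random.Random(42)
mu = 1.0

def draw():
    # Hypothetical lifetime law: uniform on (0, 2*mu), which has mean mu.
    return rng.uniform(0.0, 2.0 * mu)

for t in (10.0, 100.0, 10000.0):
    print(t, renewals_up_to(t, draw) / t, 1.0 / mu)
```

Each call simulates a fresh sample path; the printed ratio fluctuates for small $t$ and settles near $1/\mu$ as $t$ grows.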
⁴ As remarked by Ross (1996, p. 108), if $U$ is uniformly distributed on $(0, 1)$, consider the random
variables $Y_n$ defined as $Y_n = n 1_{\{U \leq 1/n\}}$. For large $n$, $U > 0$ with probability 1, whence $Y_n \to 0$
if $n \to \infty$. However, $E[Y_n] = n E\left[ 1_{\{U \leq 1/n\}} \right] = n \cdot \frac{1}{n} = 1$, for all $n$. The sequence of random
variables $Y_n$ converges to 0, although the expected values of $Y_n$ are all precisely 1.
⁵ The elementary renewal theorem can be proved only by resorting to complex function theory
and using Laplace-Stieltjes transforms (Cohen, 1969, p. 100). The limit argument provided
by the Strong Law of Large Numbers follows then from a Tauberian theorem.
The extension⁶ of the Elementary Renewal Theorem is the Key Renewal
Theorem. The Key Renewal Theorem gives the limit $t \to \infty$ of the solution
(8.12) of the general renewal equation (8.11).
Theorem 8.2.2 (Key Renewal Theorem) If $g(t)$ is directly⁷ Riemann
integrable over $[0, \infty)$, then
$$\lim_{t \to \infty} \int_0^t g(t-u)\, dm(u) = \frac{1}{\mu} \int_0^\infty g(u)\, du \tag{8.14}$$
The proof⁸ is more complicated, based on analysis and found in Feller
(1971, Section XI.1). The essential difficulty is demonstrating that the limit
at the right-hand side indeed exists. An application of the Key Renewal
Theorem is presented in Section 8.3 and here we consider Blackwell’s Theorem.
Blackwell’s Theorem follows from the Key Renewal Theorem when choosing $h(t) = 1_{\{t \in [0, T)\}}$ in the general renewal equation (8.11). The corresponding
solution (8.12) for $t > T$ is
$$Y(t) = \int_0^t 1_{\{t-u \in [0, T)\}}\, dm(u) = \int_{t-T}^{t} dm(u) = m(t) - m(t-T)$$
while the Key Renewal Theorem states that $\lim_{t \to \infty} Y(t) = \frac{1}{\mu} \int_0^\infty h(u)\, du = \frac{T}{\mu}$. Hence, we arrive at Blackwell’s Theorem, for any fixed $T > 0$,
$$\lim_{t \to \infty} \frac{m(t) - m(t-T)}{T} = \frac{1}{\mu}$$
The interpretation of Blackwell’s Theorem is that the number of expected
renewals in an interval with length $T$ sufficiently far from the origin (or in
steady-state regime) is approximately equal to $\frac{T}{\mu}$. It can be shown that
the reverse, i.e. the Key Renewal Theorem can be deduced from Blackwell’s
theorem, also holds. Hence, the Key Renewal Theorem is equivalent to
Blackwell’s Theorem.
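Blackwell’s Theorem can be observed directly whenever the renewal function is known explicitly. The sketch below assumes Erlang-2 interarrival times with phase rate `lam`, for which the standard closed form $m(t) = \frac{\lambda t}{2} - \frac{1}{4} + \frac{1}{4} e^{-2\lambda t}$ holds and the mean interarrival time is $\mu = 2/\lambda$; this example and its names are not part of the text.

```python
import math

def m_erlang2(lam, t):
    """Renewal function for Erlang-2 interarrival times (standard closed form)."""
    return lam * t / 2 - 0.25 + 0.25 * math.exp(-2 * lam * t)

lam, T = 1.0, 0.5
mu = 2 / lam                      # mean of an Erlang-2 interarrival time
for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    expected_in_window = m_erlang2(lam, t) - m_erlang2(lam, t - T)
    print(t, expected_in_window, T / mu)
```

Close to the origin the expected number of renewals in the window is below $T/\mu$, because a fresh Erlang-2 lifetime rarely ends early; the difference decays exponentially in $t$.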
Similarly to the Key Renewal Theorem, the difficulty in Blackwell’s Theorem is the proof that the limit exists. If the existence of the limit is proved,
⁶ In the sequel we assume that the distribution of the interarrival times $F_\tau(t)$ is not periodic in
the sense that there exists no integer $d$ such that $\sum_{n=0}^{\infty} \Pr[\tau = nd] = 1$ or, the random variable
$\tau$ does not only take integer units of some integer $d$.
⁷ The concept is introduced to avoid wildly oscillating functions that are still integrable over
$[0, \infty)$, such as $g(t) = t\, 1_{\{|t-n| < 1/n^2\}}$. The precise definition is given in Feller (1971). A
sufficient condition for direct Riemann integrability is (a) $g(t) \geq 0$ for all $t \geq 0$, (b) $g(t)$ is
non-increasing and (c) $\int_0^\infty g(u)\, du < \infty$.
⁸ Based on the relatively new probabilistic concept of “coupling”, alternative proofs of the Key
Renewal Theorem exist (see e.g. Grimmett and Stirzacker (2001, pp. 429-430)).
which means that $\lim_{t \to \infty} \left[ m(t) - m(t-T) \right] = a(T)$ exists, the Elementary
Renewal Theorem suffices to prove that the limit in Blackwell’s Theorem has value $\frac{1}{\mu}$. Following the
argument of Ross (1996, p. 110), we can write, for finite $x$ and $y$,
$$\begin{aligned}
a(x + y) &= \lim_{t \to \infty} \left[ m(t) - m(t - x - y) \right] \\
&= \lim_{t \to \infty} \left[ m(t) - m(t - x) \right] + \lim_{t \to \infty} \left[ m(t - x) - m(t - x - y) \right] \\
&= a(x) + a(y)
\end{aligned}$$
Apart from the trivial solution $a(x) = 0$, the only other⁹ solution of $a(x + y) = a(x) + a(y)$ is $a(x) = cx$, where $c$ is a constant. Hence, given that
$\lim_{t \to \infty} \left[ m(t) - m(t-T) \right] = a(T)$ exists, this is equivalent to the fact that
the sequence $\{b_n\}_{n \geq 0}$ where $b_n = \frac{m(t_n) - m(t_n - T)}{T}$ and $t_n > t_{n-1}$ converges
to a constant $c$. The simplest sequence with this property is $\{b_n^\ast\}_{n \geq 0}$ where
$b_n^\ast = m(n) - m(n-1)$ and $T = 1$. Lemma 6.1.1 states that
$$c = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} b_k^\ast = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \left[ m(k) - m(k-1) \right] = \lim_{n \to \infty} \frac{m(n)}{n} = \frac{1}{\mu}$$
where the last equality follows from the Elementary Renewal Theorem (8.13).
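Theorem 8.2.3 (Asymptotic Renewal Distribution) If the average
$\mu = E[\tau]$ and variance $\sigma^2 = \mathrm{Var}[\tau]$ of the interarrival time of the events in
a renewal process exist, then
$$\lim_{t \to \infty} \Pr\left[ \frac{N(t) - \frac{t}{\mu}}{\sigma \sqrt{\frac{t}{\mu^3}}} < x \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du \tag{8.15}$$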
Proof: The Elementary Renewal Theorem states that $N(t) \sim \frac{t}{\mu}$ for large
$t$, which suggests to consider the random variable $U(t) = N(t) - \frac{t}{\mu}$ with
$E[U(t)] \to 0$. From the equivalence $\{N(t) < n\} \Longleftrightarrow \{W_n > t\}$, we have
$\{U(t) < x_t\} \Longleftrightarrow \{W_{x_t + t/\mu} > t\}$ where $x_t$ is such that $x_t + \frac{t}{\mu}$ is a positive
⁹ The proof is as follows: (i) if $y = 0$, we see that $a(x + 0) = a(x) + a(0)$ or $a(0) = 0$. (ii)
$a(nx) = n a(x)$ for integer $n$. (iii) Using (ii), we have that $a(nx + my) = n a(x) + m a(y)$. By
choosing $nx + my = 0$ it follows from (i) that $a\left( -\frac{m}{n} y \right) = -\frac{m}{n} a(y)$ such that (ii) holds for
rational numbers. Thus, $a(q_1 x + q_2 y) = q_1 a(x) + q_2 a(y)$ for rational numbers $q_1$ and $q_2$. (iv)
Recalling the definition $f\left( \frac{u + v}{2} \right) \leq \frac{f(u) + f(v)}{2}$ of a convex function in Section 5.2 and the fact
that a function that is both concave and convex is a linear function, it follows that $a(x)$ is
linear and with (i) that $a(x) = cx$.
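integer. Then,
$$\Pr[U(t) < x_t] = \Pr\left[ W_{x_t + t/\mu} > t \right] = \Pr\left[ \frac{W_{x_t + t/\mu} - \mu \left( x_t + \frac{t}{\mu} \right)}{\sigma \sqrt{x_t + \frac{t}{\mu}}} > \frac{t - \mu \left( x_t + \frac{t}{\mu} \right)}{\sigma \sqrt{x_t + \frac{t}{\mu}}} \right]$$
The waiting time $W_n$ consists of a sum of i.i.d. random variables with mean
$\mu$ and variance $\sigma^2$. By the Central Limit Theorem 6.3.1, there holds that
$$\lim_{n \to \infty} \Pr\left[ \frac{W_n - n\mu}{\sigma \sqrt{n}} > x \right] = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-u^2/2}\, du$$
which implies that
$$\lim_{t \to \infty} \Pr\left[ \frac{W_{x_t + t/\mu} - \mu \left( x_t + \frac{t}{\mu} \right)}{\sigma \sqrt{x_t + \frac{t}{\mu}}} > \frac{t - \mu \left( x_t + \frac{t}{\mu} \right)}{\sigma \sqrt{x_t + \frac{t}{\mu}}} \right] = \frac{1}{\sqrt{2\pi}} \int_y^\infty e^{-u^2/2}\, du$$
provided $\lim_{t \to \infty} \frac{t - \mu \left( x_t + \frac{t}{\mu} \right)}{\sigma \sqrt{x_t + \frac{t}{\mu}}} = y$.
Hence, we must determine $x_t$ such that,
for large $t$,
$$-\frac{\mu x_t}{\sigma \sqrt{x_t + \frac{t}{\mu}}} = y$$
which is satisfied if $x_t = \frac{y^2 \sigma^2}{2 \mu^2} \left( 1 \pm \sqrt{1 + 4 \left( \frac{\mu}{y \sigma} \right)^2 \frac{t}{\mu}} \right)$ and provided the negative sign is chosen. For large $t$, we see that $x_t \sim -\frac{y \sigma}{\mu} \sqrt{\frac{t}{\mu}} + O(1)$. Thus,
$$\lim_{t \to \infty} \Pr\left[ U(t) < -\frac{y \sigma}{\mu} \sqrt{\frac{t}{\mu}} \right] = \frac{1}{\sqrt{2\pi}} \int_y^\infty e^{-u^2/2}\, du$$
which is equivalent, with $x = -y$, to
$$\lim_{t \to \infty} \Pr\left[ \frac{N(t) - \frac{t}{\mu}}{\sigma \sqrt{\frac{t}{\mu^3}}} < x \right] = \frac{1}{\sqrt{2\pi}} \int_{-x}^{\infty} e^{-u^2/2}\, du$$
Noting that $\int_{-x}^{\infty} e^{-u^2/2}\, du = \int_{-\infty}^{x} e^{-u^2/2}\, du$ finally proves (8.15).
□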
Comparing Theorem 8.2.3 to the Central Limit Theorem 6.3.1 shows that
the asymptotic variance of $N(t)$ behaves as
$$\lim_{t \to \infty} \frac{\mathrm{Var}[N(t)]}{t} = \frac{\sigma^2}{\mu^3} \tag{8.16}$$
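The variance growth (8.16) can be estimated by repeated simulation. The sketch below assumes interarrival times uniform on $(0, 2)$, so that $\mu = 1$, $\sigma^2 = 1/3$ and the predicted limit is $1/3$; the distribution, horizon, number of runs, seed and function names are all illustrative assumptions.

```python
import random
import statistics

def count_renewals(t, rng):
    """Number of renewals in (0, t] for uniform(0, 2) interarrival times."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.uniform(0.0, 2.0)    # mean 1, variance 1/3
        if elapsed > t:
            return count
        count += 1

def variance_rate(t, runs, seed=0):
    """Sample variance of N(t) over independent runs, divided by t."""
    rng = random.Random(seed)
    counts = [count_renewals(t, rng) for _ in range(runs)]
    return statistics.variance(counts) / t

print(variance_rate(100.0, 3000), 1.0 / 3.0)
```

The estimate carries sampling noise of a few percent at this number of runs and a small bias at finite $t$, so only rough agreement with $\sigma^2/\mu^3$ should be expected.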
Moreover, Theorem 8.2.3 is a central limit theorem for the dependent random
variables $N(W_n)$ where dependence is obvious from $N(W_n) = N(W_{n-1}) + 1$.
8.3 The residual waiting time
Suppose we inspect a renewal process at time $t$ and ask the question “How
long do we have to wait on average to see the next renewal?” This question
frequently arises in renewal problems. For instance, the arrival of taxis at
a station is a renewal process and, often, we are interested to know how
long we have to wait until the next taxi. Also, packets arriving at a router
may find an earlier packet that is partially served. In order to compute the
total time spent in the system, it is desirable to know the residual service
time of that packet. In addition, this problem belongs to one of the classical
examples to demonstrate how misleading intuition in probability problems
can be. There are two different arguments to the question above leading to
two different answers:
(i) since my inspection of the process does not alter or influence the
process, the distribution of my waiting time should not depend on the
time w; hence, my average waiting time equals the average interarrival
time of the renewal process.
(ii) the time w of the inspection is chosen at random in (i.e. uniformly distributed over) the interval between two consecutive renewals; hence
my expected waiting time should be half of the average interarrival
time.
Both arguments seem reasonable although it is plain that one of them
must be wrong. Let us try to sort out the correct answer to this apparent
paradox, which, according to Feller (1971, pp. 12-13), has puzzled many
before its solution was properly understood.
Fig. 8.2. Definition of the random variables the age $A(t)$, the lifetime $L(t)$ and the
residual life (or waiting time) $R(t)$.
Figure 8.2 defines the setting of the renewal problem and the quantities of
interest: $A(t)$ is the age at time $t$, which is the total time elapsed since the
last renewal before $t$ at time $W_{N(t)}$, the residual waiting time (or residual
life or excess life) $R(t)$ is the remaining time at $t$ until the next renewal at
time $W_{N(t)+1}$ and $L(t)$ is the total waiting time (or life time). From Fig. 8.2,
we verify that
$$A(t) = t - W_{N(t)}$$
$$R(t) = W_{N(t)+1} - t$$
$$L(t) = W_{N(t)+1} - W_{N(t)} = A(t) + R(t)$$
The distribution of the residual waiting time, $F_{R(t)}(x) = \Pr[R(t) \leq x]$ will
be derived. Similar to the probabilistic argument before, we condition on
the first renewal. If $W_1 = s \leq t$, then the first renewal occurs before time
$t$ and the event $\{R(t) > x \mid W_1 = s\}$ has the same probability as the event
$\{R(t-s) > x\}$ because the renewal process restarts from scratch at time $s$.
If $s > t$, the residual waiting time $R(t)$ lies in the first renewal interval $[0, s]$.
In this case, we have either that the residual waiting time $R(t)$ is certainly
shorter than $x$ if $s$ is contained in the interval $[t, t+x]$, else the residual
waiting time $R(t)$ is surely larger than $x$. In summary,
$$\Pr[R(t) > x \mid W_1 = s] = \begin{cases} \Pr[R(t-s) > x] & \text{if } 0 \leq s \leq t \\ 0 & \text{if } t < s \leq t + x \\ 1 & \text{if } s > x + t \end{cases}$$
Using the total law of probability (2.46),
$$\begin{aligned}
\Pr[R(t) > x] &= \int_0^\infty \Pr[R(t) > x \mid W_1 = s]\, \frac{d \Pr[W_1 \leq s]}{ds}\, ds \\
&= \int_0^t \Pr[R(t-s) > x]\, f_\tau(s)\, ds + \int_{x+t}^{\infty} \frac{d \Pr[\tau \leq s]}{ds}\, ds \\
&= \int_0^t \Pr[R(t-s) > x]\, dF_\tau(s) + 1 - F_\tau(x+t)
\end{aligned}$$
This relation is an instance of the general renewal equation (8.11). Since
$1 - F_\tau(x+t)$ is monotonously decreasing, for all $x$, it holds with (2.35) that
$$\int_0^\infty (1 - F_\tau(x+t))\, dt \leq \int_0^\infty (1 - F_\tau(t))\, dt = E[\tau] < \infty$$
which also implies that $\lim_{t \to \infty} 1 - F_\tau(x+t) = 0$. Hence, $h(t) = 1 - F_\tau(x+t)$
is bounded for all $t \geq 0$ and Lemma 8.1.1 is applicable, yielding
$$\Pr[R(t) > x] = 1 - F_\tau(x+t) + \int_0^t \left[ 1 - F_\tau(x+t-s) \right] dm(s)$$
Also, the conditions for direct Riemann integrability in the Key Renewal
Theorem 8.2.2 for $g(t) = 1 - F_\tau(x+t)$ are satisfied such that
$$\begin{aligned}
\lim_{t \to \infty} \Pr[R(t) > x] &= \lim_{t \to \infty} \int_0^t \left[ 1 - F_\tau(x+t-s) \right] dm(s) \\
&= \frac{1}{E[\tau]} \int_0^\infty (1 - F_\tau(x+t))\, dt \qquad \text{with (8.14)} \\
&= \frac{1}{E[\tau]} \int_x^\infty (1 - F_\tau(t))\, dt
\end{aligned}$$
In other words, the steady-state or equilibrium distribution function for the
residual waiting time equals
$$\lim_{t \to \infty} \Pr[R(t) \leq x] = \Pr[R \leq x] = F_R(x) = \frac{1}{E[\tau]} \int_0^x (1 - F_\tau(t))\, dt \tag{8.17}$$
Similarly, for $t > y$, the event $\{A(t) > y\}$ is equivalent to the event $\{$no
renewals in $[t-y, t]\}$, which is equivalent to $\{R(t-y) > y\}$. Hence,
$$\lim_{t \to \infty} \Pr[A(t) > y] = \lim_{t \to \infty} \Pr[R(t-y) > y] = \lim_{t \to \infty} \Pr[R(t) > y] = \frac{1}{E[\tau]} \int_y^\infty (1 - F_\tau(t))\, dt$$
or, both the residual waiting time $R$ and the age $A$ have the same distribution in steady state ($t \to \infty$). Intuitively, when reversing the time axis in
steady state or looking backward in time, an identically distributed renewal
process is observed in which the role of the age $A$ and the residual life $R$
are interchanged. Thus, by a time symmetry argument, both distributions
must be the same in steady state.
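Two worked instances of the equilibrium distribution (8.17) may help to fix ideas. Both are standard special cases added here for illustration and are not taken from the text: exponential interarrival times with rate $\lambda$, and deterministic interarrival times of fixed length $d$ (the symbol $d$ is introduced only for this example).

```latex
% Exponential interarrival times: 1 - F_\tau(t) = e^{-\lambda t}, E[\tau] = 1/\lambda
F_R(x) = \lambda \int_0^x e^{-\lambda t}\, dt = 1 - e^{-\lambda x}

% Deterministic interarrival times \tau = d: 1 - F_\tau(t) = 1 for t < d, and 0 otherwise
F_R(x) = \frac{1}{d} \int_0^{\min(x, d)} dt = \frac{\min(x, d)}{d}
```

In the exponential case the residual waiting time has the same distribution as a full interarrival time, which reflects the memoryless property; in the deterministic case it is uniform on $[0, d]$.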
It is instructive to compute the average residual waiting time $E[R] = E[A]$ in steady state. Using the expression of the average in terms of tail
probabilities (2.35), we have
$$E[R] = \int_0^\infty (1 - F_R(x))\, dx = \frac{1}{E[\tau]} \int_0^\infty dx \int_x^\infty (1 - F_\tau(t))\, dt$$
Reversing the order of the $x$- and $t$-integration yields
$$E[R] = \frac{1}{E[\tau]} \int_0^\infty dt\, (1 - F_\tau(t)) \int_0^t dx = \frac{1}{E[\tau]} \int_0^\infty t\, (1 - F_\tau(t))\, dt$$
After partial integration, we end up with
$$E[R] = \frac{1}{2 E[\tau]} \int_0^\infty t^2 f_\tau(t)\, dt = \frac{E\left[ \tau^2 \right]}{2 E[\tau]} = \frac{\mathrm{Var}[\tau] + (E[\tau])^2}{2 E[\tau]}$$
or
$$E[R] = \frac{E[\tau]}{2} + \frac{\mathrm{Var}[\tau]}{2 E[\tau]} \tag{8.18}$$
This expression shows that the average remaining waiting time equals half of
the average interarrival time plus half the ratio of the variance over the mean of
the interarrival time. The last term is never negative. Since $E[A] = E[R]$
and $E[L] = E[A] + E[R]$, we observe the curious result that
$$E[L] = E[\tau] + \frac{\mathrm{Var}[\tau]}{E[\tau]} \geq E[\tau]$$
or that the average total waiting time $E[L]$ is longer than the average interarrival time $E[\tau]$, contrary to intuition. This fact is referred to as the inspection paradox: the steady-state interrenewal time, $L(t) = W_{N(t)+1} - W_{N(t)}$,
containing the inspection point at time $t$, exceeds on average the generic
interarrival time, say $W_1$. The explanation is that the inspection point at
time $t$ is uniformly chosen over the time axis and every inspection point is
thus equally likely. The chance that the inspection point $t$ lies in a renewal
interval is proportional to the length of that interval. Hence, it has higher
probability to fall in a long interval, which explains¹⁰ why $E[L] \geq E[\tau]$.
The equality sign, $E[L] = E[\tau]$, holds only for deterministic interarrival times where $\mathrm{Var}[\tau] = 0$. For exponential interarrival times, application of
(3.18) gives $\mathrm{Var}[\tau] = (E[\tau])^2$ and $E[R] = E[\tau]$ while $E[L] = 2 E[\tau]$: the
fact of being inspected at time $t$ changes the lifetime distribution and even
doubles the expected total life time for exponentially distributed failure or
interoccurrence times.
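The inspection paradox is easy to reproduce by simulation. The sketch below assumes exponential interarrival times with mean 1, for which (8.18) predicts an average residual life of 1 and an average length of 2 for the interval containing the inspection point; the inspection time, number of runs, seed and function names are illustrative assumptions.

```python
import random

def mean_residual_life(mean, var):
    """E[R] according to (8.18)."""
    return mean / 2 + var / (2 * mean)

def mean_inspected_interval(mean, var):
    """E[L] = E[A] + E[R] = 2 E[R]."""
    return mean + var / mean

def simulate(draw, t_inspect, runs):
    """Average residual life and covering interval length at time t_inspect."""
    r_sum = l_sum = 0.0
    for _ in range(runs):
        prev = 0.0                  # last renewal at or before t_inspect
        nxt = draw()                # first renewal after prev
        while nxt <= t_inspect:
            prev, nxt = nxt, nxt + draw()
        r_sum += nxt - t_inspect    # residual life R(t)
        l_sum += nxt - prev         # covering interval L(t)
    return r_sum / runs, l_sum / runs

rng = random.Random(5)
r_hat, l_hat = simulate(lambda: rng.expovariate(1.0), 50.0, 5000)
print(r_hat, mean_residual_life(1.0, 1.0))
print(l_hat, mean_inspected_interval(1.0, 1.0))
```

The inspection time is taken far from the origin so that the process is close to steady state; with deterministic lifetimes the same formulas give half the lifetime for the residual life and exactly one lifetime for the covering interval.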
Returning to the initial question, we observe that the intuitive result that
my waiting time $E[R] = \frac{E[\tau]}{2}$ is only correct for deterministic processes.
Thus, the variability in the interarrival process causes the paradox. We will
see later, in queueing theory in Section 14.3.1, that also in queueing systems the variability in service discipline causes the average waiting time to
increase. At last, Feller (1971, p. 187) remarks that an apparently unbiased
inspection plan may lead to false conclusions because the actual observations are not typical of the population as a whole. When people complain
that buses or trains start running irregularly, the inspection paradox shows
¹⁰ A similar type of reasoning is used in the computation of the waiting time of the GI/D/m queueing
system in Section 14.4.2.
that above-average interarrival times are experienced more often. The inspection paradox thus implies that complaints may be erroneously based on
an overestimation of the real deviations from the regular time schedule of
buses or trains.
By separating each renewal interval into two non-overlapping subintervals
$A(t)$ and $R(t)$, we have described an alternating renewal process. An alternating renewal process models a system that can be in on- or off-period with
a repeating pattern $X_1, Y_1, X_2, Y_2, \ldots$ where each on-period $X_n$ has a same
distribution $F_{\text{on}}$ and is followed by an off-period $Y_n$. Each off-period has also
a same distribution $F_{\text{off}}$. The off-period $Y_n$ may depend on the on-period
$X_n$, but the $n$-th renewal cycle with duration $X_n + Y_n$ is independent of any
other cycle. An alternating renewal process can be used to model a data
stream of packets, where the on-period reflects the time to store or process
an arriving packet and the off-period a (random) delay between two packets.
Another example is the modeling of the end-to-end delay from a source $s$
to a destination $d$ in the Internet, where the off-period describes a queueing
in a router due to other interfering traffic along that path. During the on-period, a packet is not blocked by other packets. The on-period equals the
propagation delay to travel from the output port of one router to the output
port of the next-hop router. The end-to-end delay along a path with $h$ hops
equals the sum of $h$ consecutive off-periods augmented by the propagation
time from $s$ to $d$.
8.4 The renewal reward process
The renewal reward process associates at each renewal at time $W_n$ a certain
cost or reward $R_n$, which may vary over time and can be negative. For
example, each time a light bulb fails, it must be replaced at a certain cost
(negative reward) or each customer in a restaurant pays for his meal (positive reward). The reward $R_n$ may depend on the interarrival time $\tau_n$ or
length of the $n$-th renewal interval, but it is independent of other renewal
epochs (different from the $n$-th). Thus, the pairs $(R_n, \tau_n)$ are assumed to be
independent and identically distributed. Most often one is interested in the
total reward $R(t)$ over a period $t$ (not to be confused with the residual life
time) defined as
$$R(t) = \sum_{n=1}^{N(t)} R_n \tag{8.19}$$
In this setting, the renewal reward process is a generalization of the counting
process where $R_n = 1$.
By slightly rewriting the total reward $U(w)$ earned over an interval $w$ as
\[
\frac{U(w)}{w} = \frac{\sum_{q=1}^{Q(w)} U_q}{Q(w)} \cdot \frac{Q(w)}{w}
\]
and taking the limit $w \to \infty$, the first fraction tends with probability one
to the average reward $H[U]$ per renewal period by the Strong Law of Large
Numbers (Theorem 6.2.2), while the second fraction tends to $\frac{1}{H[\tau]}$ by
the Elementary Renewal Theorem 8.2.1. Hence, with probability one, it holds
that
\[
\lim_{w \to \infty} \frac{U(w)}{w} = \frac{H[U]}{H[\tau]} \tag{8.20}
\]
which means that the time average reward rate equals the average reward
per renewal period multiplied by the interarrival rate of renewals (or divided
by the average length of a renewal interval).
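The limit (8.20) can be checked numerically. The following is a minimal simulation sketch, not taken from the text: the exponential interarrival times with mean 2, the reward rule $-(3 + \tau)$, the horizon and the seed are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import random

def simulate_reward_rate(horizon, draw_cycle, seed=1):
    """Return U(w)/w for one sample path observed up to time w = horizon."""
    rng = random.Random(seed)
    clock = 0.0          # time of the most recent renewal
    total_reward = 0.0   # rewards collected at renewals inside [0, horizon]
    while True:
        tau, reward = draw_cycle(rng)
        if clock + tau > horizon:
            break        # the next renewal falls outside the interval
        clock += tau
        total_reward += reward
    return total_reward / horizon

def bulb_cycle(rng):
    """Assumed example: one (interarrival time, reward) pair per renewal."""
    tau = rng.expovariate(0.5)   # exponential lifetime with mean 2
    reward = -(3.0 + tau)        # cost depends only on its own interval
    return tau, reward

observed_rate = simulate_reward_rate(200000.0, bulb_cycle)
predicted_rate = -(3.0 + 2.0) / 2.0   # H[U] / H[tau] as in (8.20)
print(observed_rate, predicted_rate)
```

With a long horizon the observed rate should lie close to the predicted ratio; the reward is allowed to depend on its own interval, as the pairs $(U_q, \tau_q)$ only need to be i.i.d. across cycles.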
Similarly as in the proof of the Elementary Renewal Theorem 8.2.1, the
inequality for any $w$,
\[
\sum_{q=1}^{Q(w)} U_q \le U(w) \le \sum_{q=1}^{Q(w)} U_q + U_{Q(w)+1}
\]
leads, after taking the expectations and using Wald's identity (2.69), to an
inequality for the averages,
\[
H[Q(w)]\, H[U] \le H[U(w)] \le H[Q(w)]\, H[U] + H\left[U_{Q(w)+1}\right]
\]
Dividing by $w$, the limit $w \to \infty$ becomes
\[
H[U] \lim_{w \to \infty} \frac{H[Q(w)]}{w} \le \lim_{w \to \infty} \frac{H[U(w)]}{w} \le H[U] \lim_{w \to \infty} \frac{H[Q(w)]}{w} + \lim_{w \to \infty} \frac{H\left[U_{Q(w)+1}\right]}{w}
\]
Since the average reward per renewal period is finite and $H\left[U_{Q(w)+1}\right] = H[U]$,
we obtain by the Elementary Renewal Theorem 8.2.1 that
\[
\lim_{w \to \infty} \frac{H[U(w)]}{w} = \frac{H[U]}{H[\tau]} \tag{8.21}
\]
Hence, by comparing (8.21) and (8.20), the time average of the average
reward rate equals the time average of the reward rate.
Example The hard disc in a network server is replaced at cost F1 at time
W . The lifetime or age of this mass storage has pdf iD . If the hard disc fails
earlier, the cost of the repair and the penalties for service disruption is F2 .
What is the long run cost of the hard disc in the server per unit time?
Consider the replacement of hard discs as a renewal process with i.i.d.
interarrival times $\tau$ and with distribution
\[
\Pr[\tau < w] = \begin{cases} \int_0^w i_D(x)\,dx & \text{if } w < W \\ 1 & \text{if } w \ge W \end{cases}
\]
The average replacement time follows from the tail expression (2.35),
\[
H[\tau] = \int_0^\infty \left(1 - \Pr[\tau < w]\right) dw = \int_0^W \left(1 - \Pr[\tau < w]\right) dw
\]
The replacement cost $F(\tau)$ equals $F(\tau) = F_1 1_{\tau < W} + F_2 1_{\tau \ge W}$ and the average
cost is, with (2.13),
\[
H[F] = F_1 \Pr[\tau < W] + F_2 \left(1 - \Pr[\tau < W]\right)
\]
The Elementary Renewal Reward Theorem (8.21) and (8.20) states that the
long-run cost of replacements equals
\[
\frac{H[F]}{H[\tau]} = \frac{F_1 \Pr[\tau < W] + F_2 \left(1 - \Pr[\tau < W]\right)}{\int_0^W \left(1 - \Pr[\tau < w]\right) dw}
\]
Usually the replacement time $W$ is chosen to minimize this long-run cost.
8.5 Problems
More worked examples can be found in Karlin and Taylor (1975, Chapter
5).
(i) Calculate $\Pr\left[Z_{Q(w)} \le x\right]$.
(ii) Derive a recursion equation for the generating function $\varphi_{Q(w)}(z) = H\left[z^{Q(w)}\right]$
of the number of renewals in the interval $[0, w]$ and deduce from that equation the renewal equation (8.9) and a relation
for Var$[Q(w)]$.
(iii) In a TCP session from A to B, IP data packets and IP acknowledgement packets travel a distance of 2000 km over precisely the same
bi-directional path. In case of congestion, the average speed is 40000
km/s and without congestion the speed is three times higher. Congestion only occurs in 20% of the travels. What is the average speed
of IP packets in the TCP session?
(iv) The production of digitalized speech samples depends primarily on
the codec, with an effective average rate $u$ (bits/s). Since this rate is
low compared to the ATM capacity $F$ (bits/s), UMTS will use AAL2
mini-cells in which 1 ATM cell is occupied by $Q$ users. The financial
cost of a UMTS operator increases at $qf$ euro per unit time whenever
there are $q < Q$ speech samples waiting for transmission, with an
additional cost of $N$ euro each time an ATM cell is transmitted. What
is the average cost per unit time for the UMTS operator?
(v) The cost of replacing a router that has failed is $D$ euro. However,
one can decide to replace a router that has been in service for a
period of time $W$. The advantage of this approach is that the cost of
replacing a working router is only $E$ euro, where $E < D$. The policy
ChangeRouter consists of replacing a router either upon failure or
upon reaching the age $W$, whichever occurs first. Replacement of the
current router by a new one occurs instantaneously and at each time
there can only be one router in the network. Let $\{X_m\}$ be a sequence
of i.i.d. random variables, where $X_m$ is the lifetime of router $m$.
(a) Find the time average cost rate $F$ of the policy ChangeRouter.
(b) Compute $F$ if $W = 5$ years, the cost of replacing the failed
router is $D = 10000$ euro and the cost of replacing a working
router is $E = 7000$ euro. The independent random variables
$X_m$ are exponentially distributed and the average lifetime of a
router is 10 years.
9
Discrete-time Markov chains
A large number of stochastic processes belong to the important class of
Markov processes. The theory of Markov chains and Markov processes is well
established and furnishes powerful tools to solve practical problems. This
chapter will be mainly devoted to the theory of discrete-time Markov chains,
while the next chapter concentrates on continuous time Markov chains. The
theory of Markov processes will be applied in later chapters to compute or
formulate queueing and routing problems.
9.1 Definition
A stochastic process $\{X(w), w \in W\}$ is a Markov process if the future state
of the process only depends on the current state of the process and not on
its past history. Formally, a stochastic process $\{X(w), w \in W\}$ is a continuous
time Markov process if for all $w_0 < w_1 < \cdots < w_{q+1}$ of the index set $W$ and
for any set $\{x_0, x_1, \ldots, x_{q+1}\}$ of the state space it holds that
\[
\Pr[X(w_{q+1}) = x_{q+1} | X(w_0) = x_0, \ldots, X(w_q) = x_q] = \Pr[X(w_{q+1}) = x_{q+1} | X(w_q) = x_q] \tag{9.1}
\]
Similarly, a discrete-time Markov chain $\{X_n, n \in W\}$ is a stochastic process
whose state space is a finite or countably infinite set with index set $W = \{0, 1, 2, \ldots\}$ obeying
\[
\Pr[X_{n+1} = x_{n+1} | X_0 = x_0, \ldots, X_n = x_n] = \Pr[X_{n+1} = x_{n+1} | X_n = x_n] \tag{9.2}
\]
A Markov process is called a Markov chain if its state space is discrete. The
conditional probabilities $\Pr[X_{n+1} = m | X_n = l]$ are called the transition probabilities
of the Markov chain. In general, these transition probabilities can
depend on the (discrete) time $n$. A Markov chain is entirely defined by the
transition probabilities (9.2) and the initial distribution of the Markov chain
$\Pr[X_0 = x_0]$. Indeed, by the definition of conditional probability (2.45), we
obtain
\[
\Pr[X_0 = x_0, \ldots, X_n = x_n] = \Pr[X_n = x_n | X_0 = x_0, \ldots, X_{n-1} = x_{n-1}] \times \Pr[X_0 = x_0, \ldots, X_{n-1} = x_{n-1}]
\]
and, by the definition of the Markov chain (9.2),
\[
\Pr[X_0 = x_0, \ldots, X_n = x_n] = \Pr[X_n = x_n | X_{n-1} = x_{n-1}] \times \Pr[X_0 = x_0, \ldots, X_{n-1} = x_{n-1}]
\]
This recursion relation can be iterated resulting in
\[
\Pr[X_0 = x_0, \ldots, X_n = x_n] = \prod_{m=1}^{n} \Pr[X_m = x_m | X_{m-1} = x_{m-1}] \; \Pr[X_0 = x_0] \tag{9.3}
\]
which demonstrates that the complete information of the Markov chain is
obtained if, apart from the initial distribution, all time-dependent transition
probabilities are known.
9.2 Discrete-time Markov chain
If the transition probabilities are independent of time $n$,
\[
S_{lm} = \Pr[X_{n+1} = m | X_n = l] \tag{9.4}
\]
the Markov chain is called stationary. In the sequel, we will confine ourselves
to stationary Markov chains. Since the discrete-time Markov chain is conceptually
simpler than the continuous counterpart, we start the discussion
with the discrete case.
Let us consider a state space $V$ with $Q$ states (where $Q = \dim V$ can be
infinite). It is convenient to introduce a vector notation$^1$. Since $X_n$ can
only take $Q$ possible values, we denote the corresponding state vector at
discrete-time $n$ by $v[n] = [v_1[n] \; v_2[n] \cdots v_Q[n]]$ with $v_l[n] = \Pr[X_n = l]$.
Hence, $v[n]$ is a $1 \times Q$ vector. Since the state $X_n$ at discrete-time $n$ must
be in one of the $Q$ possible states, we have that $\sum_{l=1}^{Q} \Pr[X_n = l] = 1$ or, in
vector notation, $v[n] \cdot x = \sum_{l=1}^{Q} 1 \cdot v_l[n] = 1$, where $x^T = [1 \; 1 \cdots 1]$. This
fact is also written as $\|v[n]\|_1 = 1$, where $\|d\|_1$ is the $t = 1$ norm of vector $d$
1
Unfortunately, a vector in Markov theory is represented as a single row matrix which deviates
from the general theory in linear algebra, followed in Appendix A, where a vector is represented
as a single column matrix. In order to be consistent with the literature on Markov processes,
we have chosen to follow the notation of Markov theory here, but elsewhere we adhere to the
general convention of linear algebra.
defined in the Appendix A.3. In a stationary Markov chain, the states $X_{n+1}$
and $X_n$ are connected via the law of total probability (2.46),
\[
\Pr[X_{n+1} = m] = \sum_{l=1}^{Q} \Pr[X_{n+1} = m | X_n = l] \Pr[X_n = l] = \sum_{l=1}^{Q} S_{lm} \Pr[X_n = l] \tag{9.5}
\]
which holds for all $m$, or, in vector notation,
\[
v[n+1] = v[n] S \tag{9.6}
\]
where the transition probability matrix $S$ is
\[
S = \begin{bmatrix}
S_{11} & S_{12} & S_{13} & \cdots & S_{1,Q-1} & S_{1Q} \\
S_{21} & S_{22} & S_{23} & \cdots & S_{2,Q-1} & S_{2Q} \\
S_{31} & S_{32} & S_{33} & \cdots & S_{3,Q-1} & S_{3Q} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
S_{Q-1,1} & S_{Q-1,2} & S_{Q-1,3} & \cdots & S_{Q-1,Q-1} & S_{Q-1,Q} \\
S_{Q1} & S_{Q2} & S_{Q3} & \cdots & S_{Q,Q-1} & S_{QQ}
\end{bmatrix} \tag{9.7}
\]
Since (9.6) must hold for any initial state vector $v[0]$, by choosing $v[0]$ equal
to a base vector $[0 \cdots 0 \; 1 \; 0 \cdots 0]$ (all columns zero except for column $l$),
which expresses that the Markov chain starts from one of the possible states,
say state $l$, we find $v[1] = [S_{l1} \; S_{l2} \cdots S_{lQ}]$. Furthermore, since $\|v[n]\|_1 = 1$ for
any $n$, it must hold that $\sum_{m=1}^{Q} S_{lm} = 1$ for any state $l$. The relation
\[
\sum_{m=1}^{Q} S_{lm} = 1 \tag{9.8}
\]
means that, at discrete-time $n$, there certainly occurs a transition in the
Markov chain, possibly to the same state as at time $n - 1$. The $Q \times Q$
transition probability matrix $S$ thus consists of $Q^2 - Q$ transition probabilities
$S_{lm}$ and at each row, one transition probability can be expressed in
terms of the others, e.g. $S_{ln} = 1 - \sum_{m=1; m \ne n}^{Q} S_{lm}$. A matrix with elements
$0 \le S_{lm} \le 1$ obeying (9.8) is called a stochastic matrix whose properties are
investigated in Appendix A. Apart from the matrix representation, Markov
chains are often described by a directed graph (as illustrated in the figure below),
where $S_{lm}$ is represented by an edge from state $l$ to $m$ provided
$S_{lm} > 0$. In particular, this representation makes it possible to deduce structural
properties of the Markov chain (such as communicating states) elegantly.
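[Figure: directed graph of a seven-state Markov chain with edges 1→2, 1→6, 2→2, 3→2, 3→4, 4→1, 4→5, 4→7, 5→5, 5→6, 6→3, 6→7, 7→5 and 7→6, shown next to its transition probability matrix.]
\[
S = \begin{bmatrix}
0 & S_{12} & 0 & 0 & 0 & S_{16} & 0 \\
0 & S_{22} & 0 & 0 & 0 & 0 & 0 \\
0 & S_{32} & 0 & S_{34} & 0 & 0 & 0 \\
S_{41} & 0 & 0 & 0 & S_{45} & 0 & S_{47} \\
0 & 0 & 0 & 0 & S_{55} & S_{56} & 0 \\
0 & 0 & S_{63} & 0 & 0 & 0 & S_{67} \\
0 & 0 & 0 & 0 & S_{75} & S_{76} & 0
\end{bmatrix}
\]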
Given the initial state vector $v[0]$, the general solution of (9.6) is
\[
v[n] = v[0] S^n \tag{9.9}
\]
Similarly, when knowledge of the Markov chain at discrete-time $n$ is available, we obtain from (9.6) that
\[
v[n+q] = v[n] S^q
\]
The elements of the matrix $S^q$ are called the $q$-step transition probabilities,
\[
S^q_{lm} = \Pr[X_{n+q} = m | X_n = l] \tag{9.10}
\]
for $n \ge 0$ and $q \ge 0$. Since the discrete Markov chain must be surely in one
of the $Q$ states $q$ time units later given that it started at time $n$ in state $l$,
we obtain an extension of (9.8), for all $q \ge 1$,
\[
\sum_{m=1}^{Q} S^q_{lm} = 1 \tag{9.11}
\]
The demonstration of (9.11) is by induction. If $q = 1$, (9.8) justifies (9.11).
Assume that (9.11) holds. Any matrix element of $S^{q+1}$ can be written as
$S^{q+1}_{lm} = \sum_{n=1}^{Q} S_{ln} S^q_{nm}$. Summing over all $m$ yields, for $q \ge 0$,
\begin{align*}
\sum_{m=1}^{Q} S^{q+1}_{lm} &= \sum_{n=1}^{Q} S_{ln} \sum_{m=1}^{Q} S^q_{nm} \\
&= \sum_{n=1}^{Q} S_{ln} && \text{(induction argument)} \\
&= 1 && \text{($q = 1$ case)}
\end{align*}
This proves (9.11).
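Relations (9.9) and (9.11) are easy to verify numerically. The sketch below is illustrative only: the three-state transition matrix is an assumed example, and the helper names `mat_mul`, `mat_pow` and `propagate` are hypothetical, written in plain Python to stay self-contained.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(p, steps):
    """Return the steps-th power of a square matrix (identity for steps = 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result

def propagate(v0, p, steps):
    """State vector after `steps` transitions, i.e. v[0] times the matrix power, cf. (9.9)."""
    return mat_mul([v0], mat_pow(p, steps))[0]

# Assumed three-state example; each row sums to one as required by (9.8).
S = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

v2 = propagate([1.0, 0.0, 0.0], S, 2)             # start in the first state
row_sums = [sum(row) for row in mat_pow(S, 5)]    # each should equal 1, cf. (9.11)
print(v2, row_sums)
```

Starting from a base vector, the propagated state vector is simply the corresponding row of the matrix power, which is the observation used in the text to derive (9.8).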
9.2.1 Definitions and classification
9.2.1.1 Irreducible Markov chains
A state $m$ in a Markov chain is said to be reachable from state $l$ if it is possible
to proceed from state $l$ to state $m$ in a finite number of transitions, which is
equivalent to $S^q_{lm} > 0$ for finite $q$. If every state is reachable from every other
state, the Markov chain is said to be irreducible. The example of the Markov
graph above is not irreducible because state 2 is absorbing. Markov theory
simplifies considerably if we know that the chain is irreducible, which
justifies investigating methods to determine irreducibility.
An equivalent requirement for the Markov chain to be irreducible is that
the associated directed graph is strongly connected, i.e. if there is a path
from node $l$ to node $m$ for any pair of distinct nodes $(l, m)$. Let us review
some basic notions from graph theory (see Appendix B.1). Denote by $D$ the
adjacency matrix of $S$ where all non-zero elements in $S$ are replaced by 1.
A walk of length $n$ from state $l$ to state $m$ is a succession of $n$ arcs of the form
$(q_0 \to q_1)(q_1 \to q_2) \cdots (q_{n-1} \to q_n)$, where $q_0 = l$ and $q_n = m$. A path is
a walk in which all nodes are different, i.e. $q_o \ne q_p$ for all $0 \le o \ne p \le n$.
Lemma B.1.1 (proved in Appendix B.1, art. 5) states that the number of
walks of length $n$ from state $l$ to state $m$ is equal to the element $(D^n)_{lm}$.
A directed graph is strongly connected if and only if each non-diagonal
element of the matrix $\sum_{n=1}^{Q-1} S^n$ or, equivalently, of $E = \sum_{n=1}^{Q-1} D^n$ is positive.
Since $S$ has $Q$ states, the longest possible path between two states consists
of $Q - 1$ hops. By summing over all powers $1 \le n \le Q - 1$, the element
$e_{lm}$ of the matrix $E$ equals the number of all possible walks (of any length
up to $Q - 1$) between $l$ and $m$. Hence, if $e_{lm} > 0$ for all $l \ne m$, there exist walks
from any state $l$ to any other state $m$. The converse is readily verified.
Another way to determine irreducibility follows from the definition of
reducibility in the Appendix A.4. However, the methods for strong connectivity
or irreducibility are still algebraic in that they require matrix operations.
A computationally more efficient method consists of applying all-pair
shortest path algorithms on the Markov graph. Examples of all-pair
shortest path algorithms are that of Floyd-Warshall (with computational
complexity $F_{\rm Floyd-Warshall} = O(Q^3)$) or the algorithm of Johnson (complexity
$F_{\rm Johnson} = O(Q^2 \log Q + QL)$, where $L$ is the total number of links in
the Markov graph). These algorithms are nicely discussed in Cormen et al.
(1991).
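As a simple illustration of a graph-based irreducibility test, the sketch below runs a breadth-first search from every state instead of the Floyd-Warshall or Johnson algorithms named above; this substitution is a deliberate simplification. The numerical probabilities in the seven-state example are assumed placeholders (only the zero pattern follows the example graph of Section 9.2), and the function names are hypothetical.

```python
from collections import deque

def reachable_from(p, start):
    """Set of states reachable from `start` (including `start` itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, prob in enumerate(p[u]):
            if prob > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_irreducible(p):
    """True if every state is reachable from every other state."""
    size = len(p)
    return all(len(reachable_from(p, s)) == size for s in range(size))

# Zero pattern of the seven-state example; the values are assumed.
third = 1.0 / 3.0
seven_state = [
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # state 2 is absorbing
    [0.0, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [third, 0.0, 0.0, 0.0, third, 0.0, third],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
]
ring = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

print(is_irreducible(seven_state), is_irreducible(ring))
```

The absorbing state can reach only itself, so the first chain is reported as reducible, while the three-state ring is irreducible.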
9.2.1.2 Communicating states
If two states $l$ and $m$ are reachable from each other, they are said to
communicate, which is often denoted by $l \leftrightarrow m$.
The concept of communication is an equivalence relation: (a) reflexivity:
$l \leftrightarrow l$ since $S^0 = I$ or $S^0_{lm} = \delta_{lm}$; (b) symmetry: if $l \leftrightarrow m$ then $m \leftrightarrow l$,
which follows from the definition of communication; and (c) transitivity: if
$l \leftrightarrow m$ and $m \leftrightarrow n$ then $l \leftrightarrow n$. The transitivity follows from the
non-negativity of $S$ and $S^{q+p} = S^q S^p$ such that the matrix element
$S^{q+p}_{ln} = \sum_{o=1}^{Q} S^q_{lo} S^p_{on} \ge S^q_{lm} S^p_{mn}$. By definition of $l \leftrightarrow m$ and $m \leftrightarrow n$, we have
$S^q_{lm} > 0$ and $S^p_{mn} > 0$ for some finite $q$ and $p$. Hence, $S^{q+p}_{ln} > 0$, which
implies $l \leftrightarrow n$. As an application, the total state space can be partitioned
into equivalence classes. States in one equivalence class communicate with
each other. If it is possible to start in one class and to enter another
class, in which case there is no return possible to the first class (otherwise
the two classes would form one class), the Markov chain is reducible. In
other words, a Markov chain is irreducible if the equivalence relation results
in a single class.
9.2.1.3 Periodic and aperiodic Markov chains
Consider a state $m$ in a Markov chain with $S^q_{mm} > 0$ for some $q \ge 1$. The
period $g_m$ is defined as the greatest common divisor of those $q$ for which
$S^q_{mm} > 0$. The figure below illustrates a Markov chain with period $g = Q$.
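[Figure: a Markov chain whose $Q$ states are arranged in a single directed cycle $1 \to 2 \to 3 \to \cdots \to Q \to 1$, shown next to its transition probability matrix.]
\[
S = \begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\]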
Since the greatest common divisor of a set is the largest integer $g$ that
divides every integer in the set, it cannot exceed the minimum element of the
set. Thus,
\[
1 \le g_m \le \min\{q : S^q_{mm} > 0\}
\]
The relation $S^{q+p}_{mm} = S^q_{mm} S^p_{mm} + \sum_{o=1; o \ne m}^{Q} S^q_{mo} S^p_{om}$, deduced from matrix
multiplication, and the fact that all elements in $S$ are non-negative show that
$S^{q+p}_{mm} \ge S^q_{mm} S^p_{mm}$. Hence, if $S_{mm} > 0$, then all
$S^n_{mm} > 0$ for $n > 1$ and thus $g_m = 1$.
Lemma 9.2.1 If two states $l$ and $m$ communicate ($l \leftrightarrow m$), then $g_l = g_m$.
Proof: Let $q$ and $p$ be integers such that $S^q_{lm} > 0$ and $S^p_{ml} > 0$. From
$S^{q+p} = S^q S^p$, the matrix element $S^{q+p}_{ll} = \sum_{n=1}^{Q} S^q_{ln} S^p_{nl} \ge S^q_{lm} S^p_{ml}$. By
definition of $q$ and $p$, $S^{q+p}_{ll} > 0$, and, by definition of a period, $g_l | (q + p)$.
Similarly, from $S^{q+o+p} = S^q S^{o+p}$, the matrix element
\[
S^{q+o+p}_{ll} = \sum_{u=1}^{Q} S^q_{lu} \sum_{n=1}^{Q} S^o_{un} S^p_{nl} \ge S^q_{lm} S^o_{mm} S^p_{ml} \tag{9.12}
\]
Now, if $S^o_{mm} > 0$, which implies by definition that $g_m | o$, then we also have
that $S^{q+o+p}_{ll} > 0$ from which $g_l | (q + p + o)$. Both conditions $g_l | (q + p)$
and $g_l | (q + p + o)$ imply that $g_l | o$. But since $g_m$ is the largest such divisor,
$g_m \ge g_l$. By symmetry of the communication relation (replace $l \to m$ and
$m \to l$), $g_l \ge g_m$, which proves the lemma. $\Box$
The consequence of Lemma 9.2.1 is that all the states in an irreducible
Markov chain have common period $g$. The irreducible Markov chain is
periodic with period $g$ if $g > 1$, else it is aperiodic ($g = 1$). A simple
sufficient condition for an irreducible chain to be aperiodic is that $S_{ll} > 0$
for some state $l$. Most Markov chains of practical interest are aperiodic.
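The common period of an irreducible chain can be computed directly from the Markov graph. The sketch below does not enumerate return times as in the definition; it uses a standard graph shortcut instead: after a breadth-first search from one state, the period equals the greatest common divisor of $\text{level}(u) + 1 - \text{level}(v)$ over all edges $u \to v$. It assumes the chain is irreducible, and the function name and example matrices are hypothetical.

```python
from collections import deque
from math import gcd

def chain_period(p):
    """Common period of an irreducible chain with transition matrix p."""
    size = len(p)
    level = [None] * size     # breadth-first distance from state 0
    level[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(size):
            if p[u][v] > 0 and level[v] is None:
                level[v] = level[u] + 1
                queue.append(v)
    if any(x is None for x in level):
        raise ValueError("some state is unreachable from state 0")
    g = 0
    for u in range(size):
        for v in range(size):
            if p[u][v] > 0:
                # every edge closes a walk whose length is a multiple of the period
                g = gcd(g, level[u] + 1 - level[v])
    return g

ring4 = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
lazy_ring4 = [[0.5, 0.5, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
print(chain_period(ring4), chain_period(lazy_ring4))
```

The pure ring has period equal to its number of states, in line with the cyclic example above, while a single self-loop makes the chain aperiodic, in line with the sufficient condition $S_{ll} > 0$.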
9.2.2 The hitting time
Let $D$ be a subset of states, $D \subset V$. The hitting time $W_D$ is the first positive
time the Markov chain is in a state of the set $D$, thus for $n \ge 0$, $W_D = \min(n : X_n \in D)$.
The hitting time$^2$ of a state $m$ follows from the definition
if $D = \{m\}$. For irreducible Markov chains, the hitting time $W_m$ is finite, for
any state $m$.
$^2$ The hitting time $W_m$ is also called the first passage time into a state $m$.
From the definition of the hitting time, the recursion
\[
\Pr[W_m = p | X_0 = l] = \sum_{n \ne m} \Pr[X_1 = n | X_0 = l] \Pr[W_m = p - 1 | X_0 = n]
\]
is immediate. Indeed, in order to have the transition from state $l$ to state $m$
at discrete-time $p$, it is necessary to have first a transition from state $l$ to
some other state $n$ and to pass from that state $n$ to state $m$ for the first time
after $p - 1$ time units. For a stationary Markov chain, we have for $p > 0$
that
\[
\Pr[W_m = p | X_0 = l] = \sum_{n \ne m} S_{ln} \Pr[W_m = p - 1 | X_0 = n] \tag{9.13}
\]
and, by definition for $p \ge 0$,
\[
\Pr[W_m = p | X_0 = m] = \delta_{0p} \tag{9.14}
\]
The event $\{X_q = m\}$ can be decomposed in terms of the hitting time $W_m$.
Indeed, since the events $\{W_m = p, X_q = m\}$ are disjoint for $1 \le p \le q$,
\[
\{X_q = m\} = \cup_{p=1}^{q} \{W_m = p, X_q = m\}
\]
Applied to the $q$-step transition probabilities,
\begin{align*}
\Pr[X_q = m | X_0 = l] &= \Pr\left[\cup_{p=1}^{q} \{W_m = p, X_q = m\} | X_0 = l\right] \\
&= \sum_{p=1}^{q} \Pr[W_m = p, X_q = m | X_0 = l] \\
&= \sum_{p=1}^{q} \Pr[W_m = p | X_0 = l] \Pr[X_q = m | X_0 = l, W_m = p]
\end{align*}
By definition of the hitting time, $\{W_m = p\} = \cap_{n=1}^{p-1} \{X_n \ne m\} \cap \{X_p = m\}$
such that
\begin{align*}
\Pr[X_q = m | X_0 = l, W_m = p] &= \Pr\left[X_q = m | X_0 = l, \cap_{n=1}^{p-1} \{X_n \ne m\} \cap \{X_p = m\}\right] \\
&= \Pr[X_q = m | X_p = m]
\end{align*}
where the last step follows from the Markov property (9.2). Thus we obtain
\[
\Pr[X_q = m | X_0 = l] = \sum_{p=1}^{q} \Pr[W_m = p | X_0 = l] \Pr[X_q = m | X_p = m]
\]
or, written in terms of $q$-step transition probabilities with (9.10),
\[
S^q_{lm} = \sum_{p=1}^{q} \Pr[W_m = p | X_0 = l] \, S^{q-p}_{mm} \tag{9.15}
\]
For an absorbing state $m$ where $S_{mm} = 1$, relation (9.15) simplifies to
\[
S^q_{lm} = \Pr[W_m \le q | X_0 = l] \tag{9.16}
\]
9.2.3 Transient and recurrent states
The probability that a Markov chain initiated at state $l$ will ever come into
state $m$ is denoted as
\[
u_{lm} = \Pr[W_m < \infty | X_0 = l] \tag{9.17}
\]
If the starting state $l$ equals the target state $m$, then $u_{ll}$ is the probability of
ever returning to state $l$. If $u_{ll} = 1$, the state $l$ is a recurrent state, while,
if $u_{ll} < 1$, state $l$ is a transient state. If $l$ is a recurrent state, the Markov
chain started at $l$ will definitely (i.e. with probability 1) return to state $l$
after some time. On the other hand, if $l$ is a transient state, the Markov
chain started at $l$ has probability $1 - u_{ll}$ of never returning to state $l$. For
an absorbing state $l$ defined by $S_{ll} = 1$, we have by (9.16) that $u_{ll} = 1$,
implying that an absorbing state is a recurrent state. Further, the mean
return time to state $m$ when the chain started in $m$ is denoted by
\[
p_m = H[W_m | X_0 = m] \tag{9.18}
\]
Concepts of renewal theory will now be applied to a Markov process.
Let $Q_n(m)$ denote the number of times that the Markov chain is in state $m$
during the time interval $[1, n]$ given the chain started in state $l$ or
$Q_n(m) = \sum_{q=1}^{n} 1_{\{X_q = m | X_0 = l\}}$. Using (2.13), the average number of visits to state $m$ in
the time interval $[1, n]$ is
\begin{align*}
H[Q_n(m) | X_0 = l] &= H\left[\sum_{q=1}^{n} 1_{\{X_q = m\}} | X_0 = l\right] = \sum_{q=1}^{n} H\left[1_{\{X_q = m\}} | X_0 = l\right] \\
&= \sum_{q=1}^{n} \Pr[X_q = m | X_0 = l]
\end{align*}
or, in terms of $q$-step transition probabilities (9.10),
\[
H[Q_n(m) | X_0 = l] = \sum_{q=1}^{n} S^q_{lm} \tag{9.19}
\]
The average number of times that the Markov chain is ever in state $m$
given that it started from state $l$, is with $Q(m) = \lim_{n \to \infty} Q_n(m)$,
\[
H[Q(m) | X_0 = l] = \sum_{q=1}^{\infty} \Pr[X_q = m | X_0 = l] = \sum_{q=1}^{\infty} S^q_{lm}
\]
Hence, if state $m$ is reachable from state $l$, by definition, there is some $n$
for which $S^n_{lm} > 0$, which implies that $H[Q(m) | X_0 = l] > 0$. Further, consider
the probability $\Pr[Q(m) \ge q | X_0 = l]$ that the number of visits to
state $m$ is at least $q$, given the Markov chain started from state $l$. The event
$\{Q(m) \ge q\}$ is equivalent to the occurrence of the event $\{Q(m) \ge q - 1\}$
and the event that the Markov chain will return to $m$ again given that it
started from $m$. The probability of the latter event is precisely $u_{mm}$. Thus, we
obtain the recursion
\[
\Pr[Q(m) \ge q | X_0 = l] = u_{mm} \Pr[Q(m) \ge q - 1 | X_0 = l]
\]
with solution for $q \ge 1$,
\[
\Pr[Q(m) \ge q | X_0 = l] = (u_{mm})^{q-1} \Pr[Q(m) \ge 1 | X_0 = l]
\]
Now, $\Pr[Q(m) \ge 1 | X_0 = l] = \Pr[W_m < \infty | X_0 = l] = u_{lm}$, such that
\[
\Pr[Q(m) \ge q | X_0 = l] = (u_{mm})^{q-1} u_{lm} \tag{9.20}
\]
The average computed with (2.36) yields,
\[
H[Q(m) | X_0 = l] = \frac{u_{lm}}{1 - u_{mm}} = \sum_{q=1}^{\infty} S^q_{lm} \tag{9.21}
\]
provided $u_{lm} > 0$. If $u_{lm} = 0$ then (9.20) vanishes for every $q$ and thus
$H[Q(m) | X_0 = l] = 0$, which means that state $m$ is not reachable from state
$l$. In summary:
• For a recurrent state $m$ for which $u_{mm} = 1$, we obtain from (9.21) that
$H[Q(m) | X_0 = l] \to \infty$ (if $u_{lm} \ne 0$, else $H[Q(m) | X_0 = l] = 0$) and from
(9.20),
\[
\Pr[Q(m) = \infty | X_0 = l] = \lim_{q \to \infty} \Pr[Q(m) \ge q | X_0 = l] = u_{lm}
\]
A state $m$ is recurrent if and only if $\sum_{q=1}^{\infty} S^q_{mm} \to \infty$.
• For a transient state $m$ for which $u_{mm} < 1$, there holds that $H[Q(m) | X_0 = l]$
will be finite and $\Pr[Q(m) = \infty | X_0 = l] = 0$ or, equivalently,
\[
\Pr[Q(m) < \infty | X_0 = l] = 1
\]
A state $m$ is transient if and only if $\sum_{q=1}^{\infty} S^q_{mm}$ is finite.
These relations explain the difference between a recurrent and a transient
state. When the Markov chain starts at a recurrent state, it returns infinitely
often to that state because $u_{mm} = \Pr[Q(m) = \infty | X_0 = m] = 1$. If the chain
starts at some other state $l$ that is reachable from state $m$ ($u_{lm} > 0$), then the
chain will visit state $m$ infinitely often. From this analysis, some consequences
arise.
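Relation (9.21) can be illustrated on a small chain with one transient and one absorbing state. The sketch below is an assumed example, not taken from the text: it compares a truncated sum of the $q$-step return probabilities with the ratio $u_{mm}/(1 - u_{mm})$, where the return probability is accumulated from first-return probabilities. The function names and the truncation horizon are hypothetical choices.

```python
def expected_visits(p, start, target, horizon):
    """Partial sum over q = 1..horizon of the q-step probability start -> target, cf. (9.19)."""
    size = len(p)
    row = [1.0 if k == start else 0.0 for k in range(size)]
    total = 0.0
    for _ in range(horizon):
        row = [sum(row[i] * p[i][k] for i in range(size)) for k in range(size)]
        total += row[target]
    return total

def return_probability(p, state, horizon):
    """Probability of returning to `state` within `horizon` steps."""
    size = len(p)
    first = [p[i][state] for i in range(size)]   # first visit exactly at step 1
    total = first[state]
    for _ in range(horizon - 1):
        # avoid `state` on the first move, then hit it for the first time later
        first = [sum(p[i][k] * first[k] for k in range(size) if k != state)
                 for i in range(size)]
        total += first[state]
    return total

# Assumed example: state 0 is transient, state 1 is absorbing.
S = [[0.5, 0.5],
     [0.0, 1.0]]
u_00 = return_probability(S, 0, 60)
visits = expected_visits(S, 0, 0, 60)
predicted = u_00 / (1.0 - u_00)
print(u_00, visits, predicted)
```

Here the chain can return to state 0 only through its self-loop, so the return probability is one half and the expected number of returns is finite, which is the signature of a transient state.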
Corollary 9.2.2 A finite-state Markov chain must have at least one recurrent state.
Proof: Suppose the contrary that, if the state space $V$ is finite, all states
are transient states. For a transient state $m$ it follows from (9.21) that
$\sum_{n=1}^{\infty} S^n_{lm}$ is finite, which implies that $\lim_{n \to \infty} S^n_{lm} = 0$ for any other state $l$.
If the state space is finite and all states are transient states, then
\[
\sum_{m \in V} \lim_{n \to \infty} S^n_{lm} = 0
\]
Since the summation has a finite number of terms, the limit and summation
operator can be reversed,
\[
\lim_{n \to \infty} \sum_{m \in V} S^n_{lm} = 0
\]
But the total law of probability (9.11) requires that $\sum_{m \in V} S^n_{lm} = 1$ (for any
time $n$), which leads to a contradiction. $\Box$
Theorem 9.2.3 If $l$ is a recurrent state that leads to a state $m$, then the
state $m$ is also a recurrent state and $u_{lm} = u_{ml} = 1$.
Proof: Clearly, the theorem is true if $l = m$. Suppose that, for $l \ne m$,
$\Pr[W_l < \infty | X_0 = m] = u_{ml} < 1$. This implies that the Markov chain starting
from state $m$ has probability $1 - u_{ml} > 0$ of never hitting state $l$, which is
impossible because $l$ is a recurrent state that will be visited infinitely often.
Hence $u_{ml} = 1$.
Since state $m \ne l$ is reachable from state $l$, by definition $u_{lm} > 0$ and there
is a minimum discrete-time $q$ such that $S^q_{lm} > 0$ and $\Pr[X_n = m | X_0 = l] = 0$
for $n < q$. Similarly, since $u_{ml} = 1$, there exists a minimum discrete-time
$p$ to have a transition from $m \to l$ given the chain started in state $m$, thus,
$\Pr[X_p = l | X_0 = m] = S^p_{ml} > 0$. From (9.12) and the fact that $S^q_{lm} > 0$ and
$S^p_{ml} > 0$ and state $l$ is a recurrent state such that $S^o_{ll} > 0$, we have, for any
$o \ge 0$,
\[
S^{q+o+p}_{mm} \ge S^p_{ml} S^o_{ll} S^q_{lm} > 0
\]
Summing over all $o$,
\[
\sum_{o=1}^{\infty} S^{q+o+p}_{mm} \ge S^q_{lm} S^p_{ml} \sum_{o=1}^{\infty} S^o_{ll}
\]
or
\[
\sum_{o=1}^{\infty} S^o_{mm} > \sum_{o=q+p+1}^{\infty} S^o_{mm} = \sum_{o=1}^{\infty} S^{q+o+p}_{mm} \ge S^q_{lm} S^p_{ml} \sum_{o=1}^{\infty} S^o_{ll}
\]
It follows from (9.21) that the right-hand side diverges. Hence, $\sum_{o=1}^{\infty} S^o_{mm}$
diverges and relation (9.21) indicates that $u_{mm} = 1$ or, that $m$ must be a
recurrent state. $\Box$
A non-empty set $F \subset V$ of states is said to be closed if no state $l \in F$
leads to a state $m \notin F$. Thus, $u_{lm} = 0$ for any $l \in F$ and $m \notin F$, which is
equivalent to $S_{lm} = 0$. If the set $F$ is closed, the Markov chain starting in
$F$ will remain, with probability 1, in $F$ all the time. For example, if $l$ is
an absorbing state, $F = \{l\}$ is closed. A closed set $F$ is irreducible if state
$l$ is reachable from state $m$ for all $l, m \in F$. Theorem 9.2.3 together with
Corollary 9.2.2 implies that, if $F$ is a finite, irreducible closed set, all states
are recurrent.
9.3 The steady-state of a Markov chain
9.3.1 The irreducible Markov chain
The steady-state vector $\pi = \lim_{n \to \infty} v[n]$ follows, after taking the limit $n \to \infty$
in (9.6), as
\[
\pi = \pi S \tag{9.22}
\]
or, for each component $\pi_m$,
\[
\pi_m = \sum_{n=1}^{Q} S_{nm} \pi_n \tag{9.23}
\]
with $\pi \cdot x = 1$ or $\|\pi\|_1 = 1$. Equation (9.22) shows that the steady-state
vector $\pi$ does not depend on the initial state $v[0]$.
Alternatively, in view of (9.9), we trivially write $S^n - S^{n-1} S = 0$ or
$S S^{n-1} - S^n = 0$ and if $D = \lim_{n \to \infty} S^n$ exists, then $D - DS = 0$ or $SD - D = 0$.
This implies that $D(I - S) = (I - S)D = 0$ or $D = T(1)$, where $T(\lambda)$ is
the adjoint matrix of $S$ (see Appendix A.1, art. 7). The non-zero columns
(or rows) of the adjoint matrix $T(\lambda)$ consist of the (unscaled) eigenvector(s)
belonging to eigenvalue $\lambda$. By (9.11) the rows of $S^n$ for any $n$ are normalized
and so must $D = T(1)$. Since there is only one eigenvector belonging to
$\lambda = 1$ (Frobenius Theorem A.4.2), the rows of $D = \lim_{n \to \infty} S^n$ must all be
the same and equal to $\pi = [d_{11} \; d_{12} \cdots d_{1Q}]$. Furthermore, only if all rows
of $D$ are equal to the steady-state vector $\pi$, then the dependence on the
initial state $v[0]$ vanishes since relation (9.9) becomes $\pi_m = \sum_{l=1}^{Q} v_l[0] d_{lm} = d_{1m} \sum_{l=1}^{Q} v_l[0] = d_{1m}$.
Hence, $D = \lim_{n \to \infty} S^n = x \cdot \pi$ or, componentwise, for
all $1 \le m \le Q$,
\[
\lim_{n \to \infty} S^n_{lm} = \pi_m \tag{9.24}
\]
The sequence of matrices $S, S^2, S^3, \ldots, S^n$ thus converges to $D = x \cdot \pi$ for
sufficiently large $n$. Instead of multiplying the last matrix $S^n$ in the sequence
by $S$ to obtain the next one $S^{n+1}$, with the same computational effort, the
sequence $S, S^2, S^4, \ldots, S^{2^n}$, obtained by successively squaring, converges
considerably faster to $D = x \cdot \pi$ and may be useful for sparse $S$.
On the other hand, relation (9.22) is an eigenvalue equation with eigenvalue
$\lambda = 1$ and eigenvector $\pi$. The Frobenius Theorem A.4.2 states that
the transition probability matrix $S$ has one eigenvalue $\lambda = 1$ with corresponding
eigenvector $\pi$. Since in (9.22) the set $(S - I)^T \pi^T = 0$ has rank
$Q - 1$, the normalization condition $\|\pi\|_1 = 1$ furnishes the (last) remaining
equation. Except for the trivial case where $S$ is the identity matrix $I$, the
solution of $\pi$ is obtained from
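\[
\begin{bmatrix}
S_{11} - 1 & S_{21} & S_{31} & \cdots & S_{Q-1,1} & S_{Q1} \\
S_{12} & S_{22} - 1 & S_{32} & \cdots & S_{Q-1,2} & S_{Q2} \\
S_{13} & S_{23} & S_{33} - 1 & \cdots & S_{Q-1,3} & S_{Q3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
S_{1,Q-1} & S_{2,Q-1} & S_{3,Q-1} & \cdots & S_{Q-1,Q-1} - 1 & S_{Q,Q-1} \\
1 & 1 & 1 & \cdots & 1 & 1
\end{bmatrix}
\begin{bmatrix}
\pi_1 \\ \pi_2 \\ \pi_3 \\ \vdots \\ \pi_{Q-1} \\ \pi_Q
\end{bmatrix}
=
\begin{bmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1
\end{bmatrix}
\tag{9.25}
\]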
In practice, this method is used, especially if the number of states Q is large
and the transition probability matrix S does not exhibit a special matrix
structure.
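Both routes to the steady-state vector can be sketched in a few lines. The code below is an illustration under stated assumptions: the three-state matrix is an assumed example, the linear system follows the layout of (9.25) and is solved with a plain Gauss-Jordan elimination on exact fractions, and the limit (9.24) is approximated by repeated squaring in floating point. All function names are hypothetical, and the solver assumes an irreducible chain so that the system is non-singular.

```python
from fractions import Fraction

def steady_state_linear(p):
    """Solve the system (9.25): the first Q - 1 balance equations plus normalization."""
    size = len(p)
    # row j holds the coefficients of the j-th balance equation (transposed S minus I)
    a = [[Fraction(p[i][j]) - (1 if i == j else 0) for i in range(size)]
         for j in range(size - 1)]
    a.append([Fraction(1)] * size)                 # normalization row
    b = [Fraction(0)] * (size - 1) + [Fraction(1)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(size):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(size)]

def steady_state_squaring(p, squarings):
    """Approximate the limit (9.24) by squaring the matrix `squarings` times."""
    size = len(p)
    m = [[float(x) for x in row] for row in p]
    for _ in range(squarings):
        m = [[sum(m[i][k] * m[k][j] for k in range(size)) for j in range(size)]
             for i in range(size)]
    return m[0]                                    # all rows converge to the same vector

half, quarter = Fraction(1, 2), Fraction(1, 4)
S = [[half, half, 0],
     [quarter, half, quarter],
     [0, half, half]]
pi_exact = steady_state_linear(S)
pi_limit = steady_state_squaring(S, 6)             # first row of the 64th power
print(pi_exact, pi_limit)
```

For this example the exact solution is (1/4, 1/2, 1/4), and six squarings already reproduce it to machine precision because the second-largest eigenvalue of the assumed matrix is 1/2.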
In summary, for irreducible Markov chains, there are in general two ways$^3$
of computing the state distribution $\pi$: via the limiting process (9.24) or via
solving the set of linear equations (9.25). Recall that we have invoked the
Frobenius Theorem A.4.2, which is only applicable for irreducible Markov
chains. There exist cases of practical interest where (9.24) fails to hold. For
example, in the two-state Markov chain studied in Section 9.3.3, there is a
chain where the limit state bounces back and forth between state 0 and state
1. It is of importance to know whether the steady-state distribution exists
in the sense that $\pi_m \ne 0$ for at least one $m$. If $\pi_m = 0$ for all $m$, then there is
no stationary (or equilibrium or steady-state) probability distribution.
3
A third method consists of a directed graph solution of linear, algebraic equations discussed by
Chen (1971, Chapter 3) and applied to the steady state equation (9.22) by Hooghiemstra and
Koole (2000).
9.3.2 The average number of visits to a recurrent state
A direct application of Lemma 6.1.1 to the steady-state of a Markov chain
is that, if (9.24) holds, then
\[
\lim_{q \to \infty} \frac{1}{q} \sum_{p=1}^{q} S^p_{lm} = \pi_m
\]
Invoking (9.19) where $\frac{Q_q(m)}{q}$ is the fraction of the time the chain is in state
$m$ during the interval $[1, q]$, the relation is equivalent to
\[
\lim_{q \to \infty} \frac{H[Q_q(m) | X_0 = l]}{q} = \pi_m \tag{9.26}
\]
The time average of the average number of visits to state $m$ given the Markov
chain started in state $l$ converges to the steady-state distribution. In other
words, the long run mean fraction of time that the chain spends in state $m$
equals $\pi_m$ and is independent of the initial state $l$. From (9.21), it immediately
follows that, if $m$ is a transient state, $\pi_m = 0$. Only recurrent states
$m$ have a non-zero probability $\pi_m$ that the steady-state is in recurrent state
$m$. Lemma 6.1.1 and its consequence (9.26) suggest investigating $\frac{Q_q(m)}{q}$ for
recurrent states $m$.
If the Markov chain starts in a recurrent state $m$, we know from (9.21)
that the chain returns to state $m$ infinitely often. Let $Z_n(m)$ denote the time
of the $n$-th visit of the Markov chain to state $m$. Then,
\[
Z_n(m) = \min_{p \ge 1} \left(Q_p(m) = n\right)
\]
The interarrival time between the $n$-th and $(n-1)$-th visit is $\tau_n(m) = Z_n(m) - Z_{n-1}(m)$.
The interarrival times $\{\tau_n(m)\}_{n \ge 1}$ are independent and identically
distributed random variables as follows from the Markov property. Indeed,
every time $w$ the Markov chain returns to state $m$, it behaves from that time
onwards as if the Markov process would have started from state $m$, ignoring
the past before time $w$. Moreover, they have a common mean $H[\tau(m)] = H[\tau_1(m)]$
equal to the mean return time to $m$ given by $H[W_m | X_0 = m] = p_m$
because the hitting time is $W_m = \tau_1(m)$. In other words, just as in renewal
theory in Chapter 8, we have a counting process $\{Q_p(m), p \ge 1\}$ with
associated waiting times $Z_n(m)$ and i.i.d. interarrival times $\tau_n(m)$, specified
by the equivalence
\[
\{Q_p(m) < n\} \iff \{Z_n(m) > p\}
\]
Invoking the Elementary Renewal Theorem (8.13), we obtain with $p_m = H[W_m | X_0 = m]$
\[
\lim_{q \to \infty} \frac{Q_q(m)}{q} = \frac{1}{p_m} \tag{9.27}
\]
Thus, the chain returns to state $m$ on average every $p_m$ time units and, hence,
the fraction of time the chain is in state $m$ is roughly $\frac{1}{p_m}$. These results are
summarized as follows:
Theorem 9.3.1 (Limit Law of Markov Chains) If $m$ is a recurrent state
and the Markov chain starts in state $l$, then, with probability 1,
\[
\lim_{q \to \infty} \frac{Q_q(m)}{q} = \frac{1_{\{W_m < \infty\}}}{p_m} \tag{9.28}
\]
and
\[
\lim_{q \to \infty} \frac{H[Q_q(m) | X_0 = l]}{q} = \frac{u_{lm}}{p_m} \tag{9.29}
\]
Proof: Above we have proved the case (9.27) where the initial state
$X_0 = m$. In that case $1_{\{W_m < \infty\}} = 1$. For an arbitrary initial distribution,
it is possible that the chain will never reach the recurrent state $m$. In that
case $1_{\{W_m < \infty\}} = 0$ given $X_0 = l$. It remains to prove (9.29). By definition,
$0 \le Q_q(m) \le q$ or, $0 \le \frac{Q_q(m)}{q} \le 1$, which demonstrates that, for any $q$, $\frac{Q_q(m)}{q}$
is bounded. From the Dominated Convergence Theorem 6.1.4, we have
\begin{align*}
\lim_{q \to \infty} H\left[\frac{Q_q(m)}{q} \Big| X_0 = l\right] &= H\left[\lim_{q \to \infty} \frac{Q_q(m)}{q} \Big| X_0 = l\right] \\
&= H\left[\frac{1_{\{W_m < \infty\}}}{p_m} \Big| X_0 = l\right] \\
&= \frac{\Pr[W_m < \infty | X_0 = l]}{p_m} = \frac{u_{lm}}{p_m}
\end{align*}
which completes the proof. $\Box$
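The statement that the long-run fraction of time in a positive recurrent state equals the reciprocal of its mean return time can be observed in a simulation. The sketch below is illustrative only: the two-state matrix, the number of steps and the seed are assumptions, and the function names are hypothetical. For the assumed matrix the steady-state probability of the first state is 2/3, so its mean return time should be 3/2.

```python
import random

def next_state(rng, row):
    """Draw the next state from one row of the transition matrix."""
    u = rng.random()
    acc = 0.0
    for k, prob in enumerate(row):
        acc += prob
        if u < acc:
            return k
    return len(row) - 1            # guard against rounding in the row sum

def visit_statistics(p, state, steps, seed=7):
    """Simulate a chain started in `state`; return the fraction of time spent in
    `state` during [1, steps] and the average time between successive visits."""
    rng = random.Random(seed)
    current = state
    visits = 0
    last_visit = 0                 # time of the most recent visit
    for t in range(1, steps + 1):
        current = next_state(rng, p[current])
        if current == state:
            visits += 1
            last_visit = t
    fraction = visits / steps
    mean_return = last_visit / visits   # time of the last visit divided by the visit count
    return fraction, mean_return

S = [[0.9, 0.1],
     [0.2, 0.8]]
fraction, mean_return = visit_statistics(S, 0, 200000)
print(fraction, mean_return, 1.0 / mean_return)
```

Because the chain starts in the observed state, the time of the last visit is the sum of the observed return times, so dividing it by the number of visits estimates the mean return time.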
Theorem 9.3.1 introduces the need for an additional definition. A recurrent
state $m$ is called null recurrent if $p_m = \infty$, in which case (9.29) reduces
to
\[
\lim_{q \to \infty} \frac{H[Q_q(m) | X_0 = l]}{q} = 0 \tag{9.30}
\]
By Tauberian theorems (which investigate conditions for the converse of
Lemma 6.1.1 but which are far more difficult, as illustrated in the book by
Hardy (1948)), it can be shown that, for null recurrent states, the stronger
result $\lim_{q \to \infty} S^q_{lm} = 0$ also holds. A recurrent state $m$ is called positive
recurrent if $p_m < \infty$. The difference between a transient and a null recurrent
state that both obey (9.30) lies in the fact that, for a transient state,
the limit $\lim_{q \to \infty} H[Q_q(m) | X_0 = m]$ is finite while, for a null recurrent state,
$\lim_{q \to \infty} H[Q_q(m) | X_0 = m] = \infty$. Relation (9.30) indicates that for a null
recurrent state
\[
H[Q_q(m) | X_0 = m] = O(q^d)
\]
where $0 < d < 1$, while for a positive recurrent state
\[
H[Q_q(m) | X_0 = m] = \pi_m q + o(q)
\]
The strength of the increase of $H[Q_q(m) | X_0 = m]$ leads one to also call positive
recurrent states strongly ergodic states, while null recurrent states are
called weakly ergodic. Figure 9.1 sketches the classification of states in a
Markov process.
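[Figure 9.1: classification tree. A state $j$ is either transient ($\pi_j$ does not exist) or recurrent. A recurrent state is either null recurrent ($\pi_j = 0$) or positive recurrent ($\pi_j > 0$). A positive recurrent state is either aperiodic ($\lim_{k\to\infty} P^k_{ij} = \pi_j$) or periodic ($\lim_{k\to\infty} P^{kd}_{ij} = d\,\pi_j$).]
Fig. 9.1. Classification of the states in a Markov process with the corresponding steady state vector $\pi_j$.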
With these additional definitions, Corollary 9.2.2 can be sharpened as
follows:
Corollary 9.3.2 A finite-state Markov chain must have at least one positive
recurrent state.
Proof: By summing (9.11) over the steps $m = 1, \ldots, n$ and dividing by $n$, we find
\[ \frac{1}{n} \sum_{m=1}^{n} \sum_{j=1}^{N} P^m_{ij} = 1 \]
Using (9.19) yields
\[ \sum_{j=1}^{N} \frac{E[N_n(j) \mid X_0 = i]}{n} = 1 \]
When taking the limit $n \to \infty$ of both sides, the summation and limit operator can be reversed because the summation involves a finite number of terms. Hence,
\[ \sum_{j=1}^{N} \lim_{n\to\infty} \frac{E[N_n(j) \mid X_0 = i]}{n} = 1 \]
which is only possible if at least one state $j$ is positive recurrent because transient and null recurrent states obey (9.30). $\square$
Similarly, Theorem 9.2.3 and the combined consequence can be sharpened:
Theorem 9.3.3 If l is a positive recurrent state that leads to a state m, then
the state m is also a positive recurrent state.
Theorem 9.3.4 An irreducible Markov chain with finite-state space is positive recurrent.
Alternatively, a Markov chain with a finite number of states has no null
recurrent states. Thus, finite-state Markov chains appear to have simpler
behavior than infinite-state Markov chains.
Theorem 9.3.5 For an irreducible, positive recurrent Markov chain (even
with an infinite-state space), the steady-state is unique.
Proof: The steady-state $\pi$ of a positive recurrent irreducible Markov chain satisfies both (9.23) and (9.24), even for an infinite-state Markov chain. Suppose that $a \neq \pi$ is a second steady-state vector which satisfies $\|a\|_1 = 1$ and
\[ a_j = \sum_{k=1}^{N} P_{kj}\, a_k \tag{9.31} \]
Multiplying both sides by $P_{ji}$ and summing over all $j$
\[ \sum_{j=1}^{N} P_{ji}\, a_j = \sum_{j=1}^{N} P_{ji} \sum_{k=1}^{N} P_{kj}\, a_k = \sum_{k=1}^{N} a_k \sum_{j=1}^{N} P_{kj} P_{ji} = \sum_{k=1}^{N} a_k P^2_{ki} \]
The reversal in $j$- and $k$-summation is always allowed (even for $N \to \infty$) by absolute convergence. Using (9.31),
\[ a_i = \sum_{k=1}^{N} a_k P^2_{ki} \]
Repeating this process leads, for any $n \geq 1$ and $i \geq 1$, to
\[ a_i = \sum_{k=1}^{N} a_k P^n_{ki} \]
In the limit for $n \to \infty$, application of (9.24) yields
\[ a_i = \pi_i \sum_{k=1}^{N} a_k = \pi_i \]
which demonstrates uniqueness. $\square$
Theorem 9.3.6 For an irreducible, positive recurrent Markov chain, it holds that
\[ \lim_{n\to\infty} \frac{N_n(j)}{n} = \lim_{n\to\infty} \frac{E[N_n(j) \mid X_0 = i]}{n} = \pi_j = \frac{1}{m_j} \tag{9.32} \]
and
\[ \frac{N_n(j) - n\pi_j}{\sigma_j\, \pi_j^{3/2} \sqrt{n}} \xrightarrow{d} N(0,1) \tag{9.33} \]
where $\sigma_j^2 = \mathrm{Var}[T_j \mid X_0 = j]$.
Proof: For an irreducible, finite-state Markov chain where $r_{ij} = 1$, Theorem 9.3.1 and Theorem 9.3.4 together with (9.26) lead to the fundamental relation (9.32). Relation (9.33) is an application of the Asymptotic Renewal Distribution Theorem 8.2.3. We have shown that the interarrival times $\{\tau_k(j)\}_{k \geq 1}$ are i.i.d. with mean $E[\tau(j)] = E[T_j \mid X_0 = j] = m_j$ and (assumed finite) variance $\mathrm{Var}[\tau(j)] = \mathrm{Var}[T_j \mid X_0 = j] = \sigma_j^2$. $\square$
As a corollary, from (8.16), we have
\[ \lim_{n\to\infty} \frac{\mathrm{Var}[N_n(j) \mid X_0 = i]}{n} = \sigma_j^2\, \pi_j^3 \]
Moreover, since $\|\pi\|_1 = 1$, it must hold from (9.32) that
\[ \sum_{j=1}^{N} \frac{1}{m_j} = 1 \tag{9.34} \]
and, from (9.22), that
\[ \sum_{i=1}^{N} \frac{P_{ij}}{m_i} = \frac{1}{m_j} \]
A Markov chain that is irreducible and for which all states are positive recurrent is said to be ergodic. Ergodicity implies that the steady-state distribution and the long-run probability distribution $\lim_{k\to\infty} s[k]$ are the same. Ergodic Markov chains are basic stochastic processes in the study of queueing theory.
9.3.3 Example: the two-state Markov chain
The two-state Markov chain is defined by
\[ P = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix} \]
and illustrated in Fig. 9.2.
[Figure 9.2: states 0 and 1; the transition from 0 to 1 has probability $p$, the transition from 1 to 0 has probability $q$, and the self-transitions have probabilities $1-p$ and $1-q$.]
Fig. 9.2. A two-state Markov chain.
A matrix computation of the two-state Markov chain is presented in Appendix A.4.2. Here, we follow a probabilistic approach. Since there are only two states, at any discrete-time $k$, it holds that $\Pr[X_k = 0] = 1 - \Pr[X_k = 1]$. Hence, it suffices to compute $\Pr[X_k = 0]$. By the law of total probability and the Markov property (9.2), we have
\[ \Pr[X_{k+1} = 0] = \Pr[X_{k+1} = 0 \mid X_k = 1] \Pr[X_k = 1] + \Pr[X_{k+1} = 0 \mid X_k = 0] \Pr[X_k = 0] \]
or, from Fig. 9.2, the Markov chain can only be in state 0 at time $k+1$, if it is in state 0 at time $k$ and the next event at time $k+1$ brings it back to that same state 0, or if it is in state 1 at time $k$ and the next event at time $k+1$ induces a transfer to state 0. Introducing the transition probabilities,
\[ \Pr[X_{k+1} = 0] = q \Pr[X_k = 1] + (1-p) \Pr[X_k = 0] \]
\[ = q\,(1 - \Pr[X_k = 0]) + (1-p) \Pr[X_k = 0] \]
\[ = (1-p-q) \Pr[X_k = 0] + q \]
This recursion can be iterated back to $k = 0$,
\[ \Pr[X_k = 0] = (1-p-q)^k \Pr[X_0 = 0] + q \sum_{j=0}^{k-1} (1-p-q)^j \]
Using the finite geometric series $\sum_{j=0}^{k-1} x^j = \frac{1-x^k}{1-x}$ for any $x \neq 1$ (else $\sum_{j=0}^{k-1} x^j = k$),
\[ \Pr[X_k = 0] = \frac{q}{p+q} + (1-p-q)^k \left( \Pr[X_0 = 0] - \frac{q}{p+q} \right) \tag{9.35} \]
With $\Pr[X_k = 1] = 1 - \Pr[X_k = 0]$,
\[ \Pr[X_k = 1] = \frac{p}{p+q} + (1-p-q)^k \left( \Pr[X_0 = 1] - \frac{p}{p+q} \right) \tag{9.36} \]
If $|1-p-q| < 1$, the steady-state $\pi = \left[ \Pr[X_\infty = 0] \;\; \Pr[X_\infty = 1] \right]$ directly follows as
\[ \pi = \left[ \frac{q}{p+q} \;\; \frac{p}{p+q} \right] \tag{9.37} \]
Observe from (9.35) and (9.36) that, if $\Pr[X_0 = 0] = \frac{q}{p+q} = \Pr[X_\infty = 0]$ and $\Pr[X_0 = 1] = \frac{p}{p+q} = \Pr[X_\infty = 1]$, the Markov chain starts and remains the whole time (for all $k$) in the steady-state. In addition, the probability of a particular sequence of states can be computed from (9.3) or directly from Fig. 9.2. For example,
\[ \Pr[X_0 = 1, X_1 = 0, X_2 = 1, X_3 = 1] = q\,p\,(1-q) \Pr[X_0 = 1] \]
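The closed form (9.35) can be checked numerically against the one-step recursion it was derived from. The sketch below is illustrative only: the function names and the values $p = 0.3$, $q = 0.1$ are assumptions chosen for the example, not taken from the text.

```python
def iterate_prob0(p, q, prob0_start, k):
    """Apply Pr[X_{k+1}=0] = (1-p-q) Pr[X_k=0] + q for k steps."""
    prob0 = prob0_start
    for _ in range(k):
        prob0 = (1.0 - p - q) * prob0 + q
    return prob0


def closed_form_prob0(p, q, prob0_start, k):
    """Closed form (9.35) for Pr[X_k = 0]."""
    steady = q / (p + q)
    return steady + (1.0 - p - q) ** k * (prob0_start - steady)


# Hypothetical transition probabilities; the chain starts in state 0.
p, q = 0.3, 0.1
for k in (0, 1, 5, 50):
    print(k, iterate_prob0(p, q, 1.0, k), closed_form_prob0(p, q, 1.0, k))
```

Both columns should agree up to rounding and approach $q/(p+q)$ as $k$ grows, because $|1-p-q| < 1$ for these values.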
We distinguish three cases:
(i) $p = q = 0$: The Markov chain consists of two separate states that do not communicate. Each state can be considered as a single-state, irreducible Markov chain. Any real number belonging to $[0,1]$ is a steady-state solution of each separate set. Also, $P = I$ and, hence, $P^\infty = \lim_{k\to\infty} P^k = I$.
(ii) $0 < p + q < 2$: The Markov chain is aperiodic irreducible positive recurrent with steady-state $\pi$ given in (9.37). This is the regular case.
(iii) $p = q = 1$: The Markov chain is periodic with period 2, but still irreducible positive recurrent with steady-state $\pi = \left[ \frac{1}{2} \;\; \frac{1}{2} \right]$ given above. However, $P^{2n} = I$ and $P^{2n+1} = P$ such that $\lim_{k\to\infty} P^k$ does not exist, but
\[ \lim_{k\to\infty} \frac{1}{k} \sum_{j=1}^{k} P^j = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix} \]
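For the periodic case $p = q = 1$, a short numerical sketch can show that the powers of the transition matrix keep alternating while their running average settles at the matrix with all entries $\frac{1}{2}$. The helper names below are assumptions made for this example.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(P, k):
    """Return (1/k) * (P^1 + P^2 + ... + P^k)."""
    n = len(P)
    power = [row[:] for row in P]            # current power, starts at P^1
    total = [[0.0] * n for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = mat_mul(power, P)
    return [[x / k for x in row] for row in total]


P = [[0.0, 1.0], [1.0, 0.0]]                 # two-state chain with p = q = 1
print(mat_mul(P, P))                         # P^2 is the identity matrix
print(cesaro_average(P, 1000))               # every entry equals 0.5
```

For an odd number of terms the average is not exactly $\frac{1}{2}$, but the deviation shrinks like $1/k$.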
9.4 Problems
(i) Given the transition probability matrix $P$,
\[ P = \begin{bmatrix} 0.8 & 0.2 & 0.0 \\ 0.8 & 0.0 & 0.2 \\ 0.0 & 0.8 & 0.2 \end{bmatrix} \]
(a) draw the Markov chain, (b) compute the steady-state vector in three different ways.
(ii) Consider the discrete-time Markov chain with $N$ states and with transition probabilities at each state $j$,
\[ P_{j,j+1} = 1 - \frac{1}{j} \qquad P_{j1} = \frac{1}{j} \]
(a) draw the Markov chain, (b) show that the drift is positive, but that the Markov chain is nevertheless recurrent.
(iii) Assume that trees in a forest fall into four age groups. Let $b[k]$, $y[k]$, $m[k]$ and $u[k]$ denote the number of baby trees, young trees, middle-aged trees and old trees, respectively, in the forest at a given time period $k$. A time period lasts 15 years. During a time period, the total number of trees remains constant, but a certain percentage of trees in each age group dies and is replaced with baby trees. All surviving trees in the baby, young and middle-aged group enter into the next age group. Surviving old trees remain old. Let $0 < p_b, p_y, p_m, p_o < 1$ denote the loss rates in each age group in percent.
(a) Make a discrete Markov chain presentation of the process of aging and replacement in the forest.
(b) The distribution of tree population amongst different age categories in time period $k$ is represented by
\[ x[k] = \begin{bmatrix} b[k] & y[k] & m[k] & u[k] \end{bmatrix}^T \]
If $x[k+1] = P\,x[k]$, what is the transition probability matrix $P$?
(c) Let $p_b = 0.1$, $p_y = 0.2$, $p_m = 0.3$, $p_o = 0.4$ and suppose that $x[k] = \begin{bmatrix} 5000 & 0 & 0 & 0 \end{bmatrix}^T$. What is the number of trees in each category after 15 and after 30 years?
(d) What is the steady-state situation?
(iv) A faulty digital video conferencing system shows a clustered error
pattern. If a bit is received correctly, then the chance to receive the
next bit correctly is 0.999. If a bit is received incorrectly, then the
next bit is incorrect with probability 0.95.
(a) Model the error pattern of this system using the discrete-time
Markov chain.
(b) How many communicating classes does the Markov chain have?
Is it irreducible?
(c) In the long run, what is the fraction of correctly received bits
and the fraction of incorrectly received bits?
(d) After the system is repaired, it works properly for 99.9% of
the time. A test sequence after repair shows that, when always starting with a correctly received bit, the next 10 bits
are correctly received with probability 0.9999. What is the
probability now that a correctly (and analogously incorrectly)
received bit is followed by another correct (incorrect) bit?
10
Continuous-time Markov chains
Just as it was convenient in Chapter 2 to treat discrete and continuous
random variables distinctly, the same recipe is advised for discrete-time and
continuous-time Markov chains. Here also, it appears that the continuous
case is more intricate than the discrete counterpart.
10.1 Definition
For the continuous-time Markov chain $\{X(t), t \geq 0\}$ with $N$ states, the Markov property (9.1) can be written as
\[ \Pr[X(t+\tau) = j \mid X(\tau) = i, X(u) = x(u), 0 \leq u < \tau] = \Pr[X(t+\tau) = j \mid X(\tau) = i] \]
and reflects the fact that the future state at time $t + \tau$ only depends on the current state at time $\tau$. As for the discrete-time Markov chain, we assume that the transition probabilities for the continuous-time Markov chain $\{X(t), t \geq 0\}$ are stationary, i.e. independent of a point $\tau$ in time,
\[ P_{ij}(t) = \Pr[X(t+\tau) = j \mid X(\tau) = i] = \Pr[X(t) = j \mid X(0) = i] \tag{10.1} \]
Analogous to (9.5) and (9.6), the state vector $s(t)$ in continuous-time with components $s_k(t) = \Pr[X(t) = k]$ obeys
\[ s(t+\tau) = s(\tau) P(t) \tag{10.2} \]
Immediately, it follows from (10.2) that
\[ s(t+u+\tau) = s(\tau) P(t+u) \]
\[ s(t+u+\tau) = s(\tau+u) P(t) = s(\tau) P(u) P(t) \]
\[ = s(\tau+t) P(u) = s(\tau) P(t) P(u) \]
such that, for all $t, u \geq 0$, the $N \times N$ transition probability matrix $P(t)$ satisfies
\[ P(t+u) = P(u) P(t) = P(t) P(u) \tag{10.3} \]
This fundamental relation$^1$ (10.3) is called the Chapman-Kolmogorov equation. Furthermore, since the Markov chain must be at any time in one of the $N$ states, the analogue of (9.8) is, for any state $i$,
\[ \sum_{j=1}^{N} P_{ij}(t) = 1 \tag{10.4} \]
For continuous-time Markov chains, it is convenient to postulate the initial condition of the transition probability matrix
\[ P(0) = I \tag{10.5} \]
where $P(0) = \lim_{t \downarrow 0} P(t)$. The relations (10.1), (10.3), (10.4) and (10.5) are sufficient to describe the continuous-time Markov process completely.
10.2 Properties of continuous-time Markov processes
We will now concentrate on typical properties of a continuous-time Markov
process.
10.2.1 The infinitesimal generator T
Lemma 10.2.1 The transition probability matrix $P(t)$ is continuous for all $t \geq 0$.
Proof: Continuity is proved if $\lim_{h \to 0} P(t+h) = \lim_{h \to 0} P(t-h) = P(t)$. From (10.3) and (10.5), we have for $h > 0$,
\[ \lim_{h \to 0} P(t+h) = P(t) \lim_{h \to 0} P(h) = P(t) I = P(t) \]
Similarly, the other limit follows for $t > 0$ and $0 < h < t$ from $P(t) = P(t-h) P(h)$. $\square$
$^1$ On a higher level of abstraction, $P(t)$ can be viewed as a linear operator acting upon the vector space defined by all possible state vectors $s(t)$. Relation $P(t+u) = P(u)P(t)$ is known as the semigroup property. The family of these commuting operators possesses an interesting algebraic structure (see e.g. Schoutens (2000)).
If a function is differentiable, it is continuous. However, the converse is not generally true. Therefore, we include the additional assumption that the matrix
\[ \lim_{h \downarrow 0} \frac{P(h) - I}{h} = P'(0) = Q \tag{10.6} \]
exists. This matrix $Q$ is called the infinitesimal generator of the continuous-time Markov process and it plays an important role as shown below. The infinitesimal generator $Q$ corresponds to $P - I$ in discrete-time. From (10.4),
\[ \sum_{j=1; j \neq i}^{N} P_{ij}(h) = 1 - P_{ii}(h) \]
and, dividing both sides by $h$ and letting $h$ approach zero, we find for each $i$ with the definition of $Q$ that
\[ \sum_{j=1; j \neq i}^{N} q_{ij} = -q_{ii} \geq 0 \tag{10.7} \]
Hence, each row of $Q$ sums to zero, $q_{ij} = \lim_{h \downarrow 0} \frac{P_{ij}(h)}{h} \geq 0$ for $i \neq j$ and $q_{ii} \leq 0$. The elements $q_{ij}$ of $Q$ are derivatives of probabilities and reflect a change in transition probability from state $i$ towards state $j$, which suggests calling them "rates". Usually, one defines $q_i = -q_{ii} \geq 0$. Then, $\sum_{j=1}^{N} |q_{ij}| = 2 q_i$, which demonstrates that $Q$ is bounded if and only if the rates $q_i$ are bounded. Karlin and Taylor (1981, p. 140) show that $q_{ij}$ is always finite. For finite-state Markov processes, $q_j$ are finite (since $q_{ij}$ are finite), but, in general, $q_j$ can be infinite. If $q_j = \infty$, the state is called instantaneous because when the process enters this state, it immediately leaves the state. In the sequel, we confine the discussion to non-instantaneous states, thus $0 \leq q_j < \infty$. Continuous-time Markov chains with all states non-instantaneous are called conservative.
Probabilistically, (10.1) indicates that, for small $h$,
\[ \Pr[X(t+h) = j \mid X(t) = i] = q_{ij} h + o(h) \qquad (i \neq j) \]
\[ \Pr[X(t+h) = i \mid X(t) = i] = 1 - q_i h + o(h) \tag{10.8} \]
which clearly generalizes the Poisson process (see Theorem 7.3.1) and motivates us to call $q_i$ the rate corresponding to state $i$.
Lemma 10.2.2 Given the infinitesimal generator $Q$, the transition probability matrix $P(t)$ is differentiable for all $t \geq 0$,
\[ P'(t) = P(t) Q \tag{10.9} \]
\[ = Q P(t) \tag{10.10} \]
These equations are called the forward (10.9) and backward (10.10) equation.
Proof: For $t = 0$, the lemma follows from the existence of $Q = P'(0)$. The derivative $P'(t)$ is defined, for $t > 0$, as
\[ P'(t) = \lim_{h \to 0} \frac{P(t+h) - P(t)}{h} \]
where the derivative of the matrix has elements $P'_{ij}(t) = \frac{dP_{ij}(t)}{dt}$. Using (10.3),
\[ P(t+h) - P(t) = P(t) P(h) - P(t) = P(t) \left( P(h) - I \right) \]
\[ = P(h) P(t) - P(t) = \left( P(h) - I \right) P(t) \]
we obtain
\[ P'(t) = P(t) \lim_{h \to 0} \frac{P(h) - I}{h} = P(t) Q \]
\[ = \lim_{h \to 0} \frac{P(h) - I}{h}\, P(t) = Q P(t) \]
which proves the lemma. $\square$
Suppose we are interested in the probabilities $s_k(t) = \Pr[X(t) = k]$ of finding the system in state $k$ at time $t$. Each component of the state vector $s(t)$ is determined by (10.2) as
\[ s_k(t+h) = \sum_{j=1}^{N} s_j(t) P_{jk}(h) \]
from which
\[ \frac{s_k(t+h) - s_k(t)}{h} = s_k(t) \frac{P_{kk}(h) - 1}{h} + \sum_{j=1; j \neq k}^{N} s_j(t) \frac{P_{jk}(h)}{h} \]
In the limit $h \downarrow 0$, we find with $q_{jk} = \lim_{h \downarrow 0} \frac{P_{jk}(h)}{h}$ and $q_k = \lim_{h \downarrow 0} \frac{1 - P_{kk}(h)}{h}$ the differential equation for $s_k(t)$,
\[ s'_k(t) = -q_k s_k(t) + \sum_{j=1; j \neq k}^{N} q_{jk} s_j(t) \tag{10.11} \]
which, together with the initial condition $s_k(0)$, completely determines the probability $s_k(t)$ that the Markov process is in state $k$ at time $t$.
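The differential equation (10.11) can be integrated numerically. The following is a minimal sketch using a plain forward Euler scheme, which is an illustration choice and not a method from the text. The two-state generator with rates $a$ and $b$ and its well-known closed-form solution are assumptions introduced for the comparison.

```python
import math


def euler_state_vector(Q, s0, t, steps):
    """Integrate s'(t) = s(t) Q, i.e. (10.11) for every component, by forward Euler."""
    n = len(Q)
    h = t / steps
    s = list(s0)
    for _ in range(steps):
        # ds_k = sum_j s_j q_jk; the term j = k contributes -q_k s_k.
        ds = [sum(s[j] * Q[j][k] for j in range(n)) for k in range(n)]
        s = [s[k] + h * ds[k] for k in range(n)]
    return s


# Hypothetical two-state generator: rate a from state 0 to 1, rate b back.
a, b = 2.0, 1.0
Q = [[-a, a], [b, -b]]
s = euler_state_vector(Q, [1.0, 0.0], t=1.0, steps=10000)

# Known closed form for this generator when the process starts in state 0.
exact_s0 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 1.0)
print(s[0], exact_s0)
```

The Euler error shrinks proportionally to the step size, so the two printed values should agree to roughly three or four decimals here.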
10.2.2 Algebraic properties of the infinitesimal generator T
Equation (10.10) is a matrix differential equation in $t$ that can be solved in the same way as the scalar differential equation $f'(t) = q f(t)$. With the initial condition (10.5), the solution is
\[ P(t) = e^{Qt} \tag{10.12} \]
which demonstrates the importance of the infinitesimal generator $Q$, explicitly given by
\[ Q = \begin{bmatrix}
-q_1 & q_{12} & q_{13} & \cdots & q_{1,N-1} & q_{1N} \\
q_{21} & -q_2 & q_{23} & \cdots & q_{2,N-1} & q_{2N} \\
q_{31} & q_{32} & -q_3 & \cdots & q_{3,N-1} & q_{3N} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
q_{N-1,1} & q_{N-1,2} & q_{N-1,3} & \cdots & -q_{N-1} & q_{N-1,N} \\
q_{N1} & q_{N2} & q_{N3} & \cdots & q_{N,N-1} & -q_N
\end{bmatrix} \tag{10.13} \]
Moreover, if all eigenvalues $\lambda_k$ of $Q$ are distinct, art. 4 and art. 8 in Appendix A.1 indicate that
\[ P(t) = e^{Qt} = X\, \mathrm{diag}(e^{\lambda_k t})\, Y^T \tag{10.14} \]
where $X$ and $Y$ contain as columns the right- and left-eigenvectors of $Q$ respectively. Written explicitly in terms of the right-eigenvectors $x_k$ and left-eigenvectors $y_k$ (which both are $N \times 1$ matrices or column vectors as common in vector algebra), (10.14) reads
\[ P(t) = \sum_{k=1}^{N} e^{\lambda_k t} x_k y_k^T \]
where the inner or scalar vector product $y_k^T x_k = 1$ while the outer product $x_k y_k^T$ is an $N \times N$ matrix,
\[ x_k y_k^T = \begin{bmatrix}
x_{k1} y_{k1} & x_{k1} y_{k2} & x_{k1} y_{k3} & \cdots & x_{k1} y_{kN} \\
x_{k2} y_{k1} & x_{k2} y_{k2} & x_{k2} y_{k3} & \cdots & x_{k2} y_{kN} \\
x_{k3} y_{k1} & x_{k3} y_{k2} & x_{k3} y_{k3} & \cdots & x_{k3} y_{kN} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{kN} y_{k1} & x_{kN} y_{k2} & x_{kN} y_{k3} & \cdots & x_{kN} y_{kN}
\end{bmatrix} \]
If we further assume (thus omitting pathological cases) that $P(t)$ is a stochastic, irreducible matrix for any time $t$, Frobenius' Theorem A.4.2 indicates that all eigenvalues satisfy $|e^{\lambda_k t}| \leq 1$ and that only the largest one is precisely equal to 1, say $e^{\lambda_1 t} = 1$, which corresponds to the steady-state eigenvector $y_1^T = \pi$ and $x_1 = u$, where $u^T = [1 \; 1 \; \cdots \; 1]$. Frobenius' Theorem A.4.2 implies that all eigenvalues of $Q$ have a negative real part, except for the steady-state eigenvalue $\lambda_1 = 0$. Hence, we may write
\[ P(t) = u\pi + \sum_{k=2}^{N} e^{-|\mathrm{Re}\,\lambda_k| t + i\, \mathrm{Im}\,\lambda_k t}\, x_k y_k^T \tag{10.15} \]
where $P_\infty = u\pi$ is the $N \times N$ matrix with each row containing the steady-state vector $\pi$. The expression (10.15) is called the spectral or eigen decomposition of the transition probability matrix $P(t)$.
Apart from the eigen decomposition method and the Taylor expansion
\[ e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^k}{k!} \tag{10.16} \]
the matrix equivalent of $e^x = \lim_{n\to\infty} (1 + x/n)^n$ can be used,
\[ P(t) = e^{Qt} = \lim_{n\to\infty} \left( I + \frac{Qt}{n} \right)^n \tag{10.17} \]
Since $Q$ has negative diagonal elements and positive off-diagonal elements, computing the powers $Q^k$ as required in (10.16) suffers from numerical rounding-off error propagation. Relation (10.17) circumvents this problem by choosing $n$ sufficiently high, $\max_i q_i t < n$, such that $I + \frac{Qt}{n}$ has non-negative elements smaller than 1 everywhere. For stochastic matrices $P$, the sequence $P, P^2, P^4, \ldots, P^{2^k}$ rapidly converges. Yet another useful representation (10.24) of $P(t)$ is discussed in Section 10.4.1.
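Relation (10.17) combined with repeated squaring can be sketched as follows. Choosing $n = 2^m$ lets the power be computed with only $m$ matrix multiplications. The function names, the default of 20 squarings and the two-state test generator with its closed-form $P_{00}(t)$ are assumptions made for this illustration.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix_by_squaring(Q, t, squarings=20):
    """Approximate P(t) = exp(Qt) by (I + Qt/n)^n with n = 2**squarings."""
    size = len(Q)
    n = 2 ** squarings
    if max(-Q[i][i] for i in range(size)) * t >= n:
        raise ValueError("need max_i q_i * t < n; increase squarings")
    # I + Qt/n is a stochastic matrix under the condition checked above.
    M = [[(1.0 if i == j else 0.0) + Q[i][j] * t / n for j in range(size)]
         for i in range(size)]
    for _ in range(squarings):
        M = mat_mul(M, M)          # M, M^2, M^4, ..., M^(2**squarings)
    return M


# Hypothetical two-state generator with a known closed-form P_00(t).
a, b = 2.0, 1.0
Q = [[-a, a], [b, -b]]
P = transition_matrix_by_squaring(Q, 1.0)
exact_p00 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 1.0)
print(P[0][0], exact_p00)
```

Every intermediate matrix has non-negative entries, so no cancellation between large positive and negative terms occurs, which is the numerical advantage the text attributes to (10.17) over (10.16).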
10.2.3 Exponential sojourn times
We end this section on properties by proving a remarkable and important
characteristic of continuous-time Markov processes.
Theorem 10.2.3 The sojourn times $\tau_j$ of a continuous-time Markov process in a state $j$ are independent, exponential random variables with mean $\frac{1}{q_j}$.
Proof: The independence of the sojourn times follows from the Markov property (see the renewal argument in Section 9.3.2). The exponential sojourn time is proved in two different ways.
1. The proof consists in demonstrating that the sojourn times $\tau_j$ satisfy the memoryless property. In Section 3.2.2, it has been shown that the only continuous distribution that satisfies the memoryless property is the exponential distribution.
The event $\{\tau_j \geq t + T \mid \tau_j > T\}$ for any $T \geq 0$ and $t \geq 0$ is equivalent to the event $\{X(t+T+u) = j \mid X(T+u) = j, X(u) = j\}$. According to the Markov property (9.1) and with (10.1),
\[ \Pr[\tau_j \geq t + T \mid \tau_j > T] = \Pr[X(t+T+u) = j \mid X(T+u) = j, X(u) = j] \]
\[ = \Pr[X(t+T+u) = j \mid X(T+u) = j] \]
\[ = P_{jj}(t) \]
which is independent of $T$, illustrating the memoryless property. Using the definition of conditional probability (2.44),
\[ \Pr[\tau_j \geq t + T \mid \tau_j > T] = \frac{\Pr[\tau_j \geq t + T]}{\Pr[\tau_j > T]} = P_{jj}(t) \]
which holds for any $T$ and thus also for $T = 0$, where $\Pr[\tau_j > 0] = 1$. The distribution of the sojourn time at state $j$ satisfies
\[ \Pr[\tau_j \geq t] = e^{-\alpha_j t} = P_{jj}(t) \]
After differentiation evaluated at $t = 0$, we find $\alpha_j = q_j$.
2. An alternative demonstration of the exponential sojourn times starts by considering, for an initial state $j$, the probability $H_n$ that the process remains in state $j$ during an interval $[0, t]$. The idea is to first sample the continuous-time interval with step $\frac{t}{n}$ and afterwards proceed to the limit $n \to \infty$, which corresponds to a sampling with infinitesimally small step,
\[ H_n = \Pr\left[ X(0) = j, X\left(\tfrac{t}{n}\right) = j, X\left(\tfrac{2t}{n}\right) = j, \ldots, X(t) = j \right] \]
\[ = \prod_{m=0}^{n-1} \Pr\left[ \left. X\left(\tfrac{(m+1)t}{n}\right) = j \right| X\left(\tfrac{mt}{n}\right) = j \right] \Pr[X(0) = j] \]
\[ = \left( \Pr\left[ \left. X\left(\tfrac{t}{n}\right) = j \right| X(0) = j \right] \right)^n \Pr[X(0) = j] \]
\[ = \left[ P_{jj}\left(\tfrac{t}{n}\right) \right]^n \Pr[X(0) = j] \]
where (9.3) and (10.1) are used. For large $n$, $P_{jj}\left(\frac{t}{n}\right)$ can be expanded in a Taylor series around the origin,
\[ P_{jj}\left(\tfrac{t}{n}\right) = P_{jj}(0) + P'_{jj}(0) \frac{t}{n} + O\left(\frac{1}{n^2}\right) = 1 - q_j \frac{t}{n} + O\left(\frac{1}{n^2}\right) \]
such that
\[ \left[ P_{jj}\left(\tfrac{t}{n}\right) \right]^n = \exp\left[ n \log\left( 1 - q_j \frac{t}{n} + O\left(\frac{1}{n^2}\right) \right) \right] \]
For large $n$ the logarithm can be expanded to first order as
\[ \log\left( 1 - q_j \frac{t}{n} + O\left(\frac{1}{n^2}\right) \right) = -q_j \frac{t}{n} + O\left(\frac{1}{n^2}\right) \]
which shows that
\[ \lim_{n\to\infty} \left[ P_{jj}\left(\tfrac{t}{n}\right) \right]^n = e^{-q_j t} \]
On the other hand,
\[ \lim_{n\to\infty} H_n = \Pr[X(u) = j, 0 \leq u \leq t] \]
Hence, the probability that the process remains in state $j$ at least for a duration $t$ equals
\[ \Pr[X(u) = j, 0 \leq u \leq t] = e^{-q_j t} \Pr[X(0) = j] \]
Conditioned on the initial state with (2.44),
\[ \Pr[X(u) = j, 0 \leq u \leq t \mid X(0) = j] = \Pr[\tau_j \geq t] = e^{-q_j t} \tag{10.18} \]
Without resorting to the memoryless property, Theorem 10.2.3 has been proved. $\square$
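The limit used in the second proof can be observed numerically. The sketch below assumes a two-state generator with rate $a$ out of state 0 and rate $b$ out of state 1, for which $P_{00}(t)$ has a well-known closed form. These choices, and the function names, are illustrative assumptions and not part of the proof.

```python
import math


def p00(t, a, b):
    """Closed-form P_00(t) for the two-state generator Q = [[-a, a], [b, -b]]."""
    return b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)


def stay_probability(t, n, a, b):
    """[P_00(t/n)]^n: probability of being found in state 0 at all n sampling instants."""
    return p00(t / n, a, b) ** n


a, b, t = 2.0, 1.0, 1.0
for n in (1, 10, 1000, 10**6):
    print(n, stay_probability(t, n, a, b))
print("limit exp(-q_0 t):", math.exp(-a * t))
```

For $n = 1$ the value is simply $P_{00}(t)$, which also counts paths that left state 0 and came back. As the sampling becomes finer, such excursions are excluded and the value decreases towards $e^{-q_0 t}$ with $q_0 = a$.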
In summary, the continuous-time Markov process $\{X(t), t \in T\}$ can be described in two equivalent ways, either by the transition probability matrix $P(t)$ or by the infinitesimal generator $Q$. In the first description, the process starts at time $t = t_0 = 0$ in state $x_0$, where it stays until a transition occurs at $t = t_1$, which makes the process jump to state $x_1$. In state $x_1$, the process stays until $t = t_2$ at which time it jumps to state $x_2$, and so on. The sequence of states $x_0, x_1, x_2, \ldots$ is a discrete Markov process and is called the embedded Markov chain. The embedded Markov chain is further discussed in Section 10.4. The infinitesimal description based on $Q$ formulates the evolution of the process in terms of rates. The process waits in a state $j$ until a jump or trigger occurs with rate $q_j$ and the average waiting time in state $j$ is $\frac{1}{q_j}$. If $q_j = 0$, the Markov process stays infinitely long in state $j$, implying that state $j$ is an absorbing state.
10.3 Steady-state
Theorems 9.3.4 and 9.3.6 demonstrate that, when a finite-state Markov chain is irreducible (all states communicate and $P_{ij}(t) > 0$), the steady-state $\pi$ exists. Since, by definition, the steady-state does not change over time, or $\lim_{t\to\infty} P'(t) = 0$, it follows from (10.9) and (10.10) that
\[ Q P_\infty = P_\infty Q = 0 \]
where $\lim_{t\to\infty} P(t) = P_\infty$. This relation implies that $P_\infty$ is the adjoint matrix of $Q$ belonging to eigenvalue $\lambda = 0$, which plays a role analogous to $\lambda = 1$ in the discrete case. By the same arguments as in the discrete case and as shown in Section 10.2.2, all rows of $P_\infty$ are proportional to the eigenvector of $Q$ belonging to $\lambda = 0$. Thus, the steady-state (row) vector $\pi$ is the solution of
\[ \pi Q = 0 \tag{10.19} \]
which means that $\pi$ is orthogonal to any column vector of $Q$ such that necessarily $\det Q = 0$ in order for a non-zero solution to exist. A single component of $\pi$ in (10.19) obeys, using (10.7),
\[ \pi_i q_i = \sum_{j=1; j \neq i}^{N} \pi_j q_{ji} \tag{10.20} \]
This equation has a continuity or conservation law interpretation. The left-hand side reflects the long-run rate at which the process leaves state $i$. The right-hand side is the sum of the long-run rates of transitions towards the state $i$ from other states $j \neq i$ or the aggregate long-run rate towards state $i$. In steady-state, the inward and outward flux at any state $i$ are precisely in balance. Therefore relations (10.20) are called the balance equations. The balance equation (10.20) directly follows from the differential equation (10.11) of the state probabilities $s_k(t)$ since $\lim_{t\to\infty} s_k(t) = \pi_k$ and $\lim_{t\to\infty} s'_k(t) = 0$.
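As a worked illustration of (10.19) and (10.20), the sketch below solves the steady-state equations for a small generator by replacing one (redundant) balance equation with the normalization and applying Gaussian elimination. The three-state generator is a hypothetical example chosen so that the exact answer is $[4/15, 10/15, 1/15]$. The solver is a generic textbook routine, not a method prescribed by the text.

```python
def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible generator Q."""
    n = len(Q)
    # Row j of A holds the coefficients of sum_i pi_i q_ij = 0 (column j of Q).
    A = [[Q[i][j] for i in range(n)] for j in range(n)]
    A[n - 1] = [1.0] * n                  # replace the last equation by normalization
    rhs = [0.0] * (n - 1) + [1.0]
    for col in range(n):                  # forward elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):        # back substitution
        pi[r] = (rhs[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))) / A[r][r]
    return pi


# Hypothetical irreducible generator; every row sums to zero.
Q = [[-3.0, 2.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 2.0, -4.0]]
pi = steady_state(Q)
print(pi)

# Balance equation (10.20) for state 0: outflow equals inflow.
outflow = pi[0] * (-Q[0][0])
inflow = pi[1] * Q[1][0] + pi[2] * Q[2][0]
print(outflow, inflow)
```

Dropping one balance equation is allowed because the columns of $Q$ are linearly dependent ($\det Q = 0$), as noted above.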
Alternatively, the steady-state vector obeys (10.2) or
\[ \pi = s(0) P_\infty = \lim_{t\to\infty} s(0)\, e^{Qt} \]
which, together with (10.14), implies that all other eigenvalues of $Q$ must have negative real part such that only $\lambda = 0$ determines the steady-state. This stability condition on the eigenvalues corresponds to that in a linear, time-invariant system. Since all rows in $P_\infty$ are equal (see also (10.15)), the dependence of the steady-state vector on the initial state drops out. Indeed, analogous to the discrete-time case and recalling the normalization $\|s(0)\|_1 = 1$, a single component becomes
\[ \pi_j = \sum_{k=1}^{N} s_k(0) (P_\infty)_{kj} = (P_\infty)_{1j} \sum_{k=1}^{N} s_k(0) = (P_\infty)_{1j} \]
10.4 The embedded Markov chain
The main difference between discrete and continuous-time Markov chains lies, apart from the concept of time, in the determination of the number of transitions. The sojourn time in a discrete chain is deterministic and all times are equal to 1. In other words, if $F_{ij}(t)$ denotes the distribution function of the time until a transition from state $i$ to state $j$ occurs, then it is plain that, for a discrete-time process,
\[ F_{ij}(t) = 1_{\{t \geq 1\}} \]
Even though the process remains in state $j$ with probability $P_{jj}$, there has been a transition precisely after $t = 1$ units.
On the other hand, Theorem 10.2.3 demonstrates that the sojourn times in state $j$ are exponentially distributed with mean $\frac{1}{q_j}$. After, on average, $\frac{1}{q_j}$ time units, a transition from state $j$ to another state occurs. In contrast to discrete-time Markov chains, after the exponentially distributed sojourn time in state $j$ the process makes a transition to other states $i \neq j$. Let us investigate this fact in more detail. Let us denote
\[ V_{ij}(h) = \Pr[X(h) = j \mid X(h) \neq i, X(0) = i] \]
which describes the probability that, if a transition occurs, the process moves from state $i$ to a different state $j \neq i$. Using the definition of conditional probability (2.44),
\[ V_{ij}(h) = \frac{\Pr[\{X(h) = j\} \cap \{X(h) \neq i\} \mid X(0) = i]}{\Pr[X(h) \neq i \mid X(0) = i]} = \frac{P_{ij}(h)}{1 - P_{ii}(h)} \]
In the limit $h \downarrow 0$, we have
\[ V_{ij} = \lim_{h \downarrow 0} V_{ij}(h) = \lim_{h \downarrow 0} \frac{P_{ij}(h)/h}{(1 - P_{ii}(h))/h} = \frac{q_{ij}}{q_i} \]
By (10.7), we see that $\sum_{j=1; j \neq i}^{N} V_{ij} = 1$, demonstrating that, given a transition, it is a transition out of state $i$ to another state $j$. The quantities $V_{ij}$ correspond to the transition probabilities of the embedded Markov chain.
Alternatively, we can write the rate $q_{ij}$ in terms of the transition probabilities $V_{ij}$ of the embedded Markov chain as
\[ q_{ij} = q_i V_{ij} \tag{10.21} \]
Since $q_i$ is the rate (i.e. the number of transitions per unit time) of the process in state $i$, relation (10.21) shows that the transition rate $q_{ij}$ from state $i$ to state $j$ equals the rate of transitions in state $i$ multiplied by the probability that a transition from state $i$ to state $j$ occurs. By definition, $V_{ii} = 0$. For, if we assume that $V_{ii} > 0$, relation (10.21) would result in $q_{ii} = V_{ii} q_i > 0$ which contradicts the definition $q_{ii} = -q_i$. Hence, in the embedded Markov chain specified by the transition probability matrix $V$, there are no self-transitions ($V_{ii} = 0$), which is equivalent to the fact that the sum of the eigenvalues of $V$ is zero (A.7), since $\mathrm{trace}(V) = 0$.
From the steady-state equation or balance equation (10.20), (10.21) and $V_{ii} = 0$, we observe that
\[ \pi_i q_i = \sum_{j=1}^{N} \pi_j q_j V_{ji} \]
On the other hand, the embedded Markov chain has a steady-state vector $v$ that obeys (9.22) or (9.23)
\[ v_i = \sum_{j=1}^{N} v_j V_{ji} \]
and $\|v\|_1 = 1$. The relations between the steady-state vector $\pi$ of the continuous-time Markov chain and the steady-state vector $v$ of its corresponding embedded discrete-time Markov chain are
\[ v_i = \frac{\pi_i q_i}{\sum_{j=1}^{N} \pi_j q_j} \tag{10.22} \]
\[ \pi_i = \frac{v_i / q_i}{\sum_{j=1}^{N} v_j / q_j} \tag{10.23} \]
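Relations (10.21) to (10.23) can be traced on a small example. In the sketch below, the three-state generator and its steady-state vector $[4/15, 10/15, 1/15]$ are assumed for illustration (the vector can be verified to satisfy the balance equations), and all rates $q_i$ are assumed strictly positive, i.e. no absorbing states.

```python
def embedded_chain(Q):
    """Return the rates q_i and the embedded matrix V with V_ij = q_ij / q_i, V_ii = 0."""
    n = len(Q)
    rates = [-Q[i][i] for i in range(n)]
    V = [[0.0 if i == j else Q[i][j] / rates[i] for j in range(n)]
         for i in range(n)]
    return rates, V


def embedded_from_continuous(pi, rates):
    """Relation (10.22): v_i proportional to pi_i * q_i."""
    weights = [p * r for p, r in zip(pi, rates)]
    total = sum(weights)
    return [w / total for w in weights]


def continuous_from_embedded(v, rates):
    """Relation (10.23): pi_i proportional to v_i / q_i."""
    weights = [x / r for x, r in zip(v, rates)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical generator and its steady-state vector.
Q = [[-3.0, 2.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 2.0, -4.0]]
pi = [4.0 / 15.0, 10.0 / 15.0, 1.0 / 15.0]

rates, V = embedded_chain(Q)
v = embedded_from_continuous(pi, rates)
# v should be invariant under V: v_i = sum_j v_j V_ji.
v_next = [sum(v[j] * V[j][i] for j in range(3)) for i in range(3)]
print(v, v_next)
print(continuous_from_embedded(v, rates))    # recovers pi
```

The embedded chain visits state 1 less often than the continuous chain occupies it, because state 1 has the longest mean sojourn time; the weighting by $q_i$ in (10.22) accounts for exactly this.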
The classification in the discrete-time case into transient and recurrent
can be transferred via the embedded Markov chain to continuous Markov
processes.
10.4.1 Uniformization
The restriction $V_{ii} = 0$ or $q_{ii} = 0$, which means that there are no self-transitions from a state into itself, can be removed. Indeed, we can rewrite the basic relation (10.12) between the transition probability matrix $P(t)$ and the infinitesimal generator $Q$ for all $\beta > 0$ as
\[ P(t) = \exp\left[ -\beta I t + \beta t \left( I + \frac{Q}{\beta} \right) \right] = e^{-\beta t} \exp\left[ \beta t \left( I + \frac{Q}{\beta} \right) \right] \]
Defining $T(\beta) = I + \frac{Q}{\beta}$ and $\beta \geq \max_i q_i$, a description, alternative to (10.15), (10.16) and (10.17), appears
\[ P_{ij}(t) = e^{-\beta t} \sum_{k=0}^{\infty} \frac{(\beta t)^k}{k!}\, T^k_{ij}(\beta) \tag{10.24} \]
where $T(\beta)$ is a stationary transition probability matrix and, hence, a stochastic matrix.
We also observe that $\beta T(\beta) = Q + \beta I$ can be regarded as a rate matrix, with the property that, for each state $i$,
\[ \beta \sum_{j=1}^{N} T_{ij}(\beta) = \sum_{j=1}^{N} Q_{ij} + \beta \sum_{j=1}^{N} \delta_{ij} = \beta \]
so that the transition rate in any state $i$ is precisely the same, equal to $\beta$. Whereas the embedded Markov chain defined by (10.21) has no self-transitions ($V_{ii} = 0$), we see for any $i$ and $j \neq i$ that $T_{ii}(\beta) = 1 - \frac{1}{\beta} \sum_{j=1; j \neq i}^{N} q_{ij} = 1 - \frac{q_i}{\beta} \geq 0$ while $T_{ij}(\beta) = \frac{q_{ij}}{\beta}$. Hence, $T(\beta)$ can be interpreted as an embedded Markov chain that allows self-transitions. In view of (10.21), the embedded structure of $T(\beta)$ is summarized as
tlm = Wlm ()
for l 6= m
tll = 1 Wll ()
where the constant rate $q_i = \beta$ for any state $i$ is, besides self-transitions $q_{ii} \neq 0$, the characterizing property. These properties also reveal that, starting from the embedded chain $V$ where $V_{ii} = 0$, we can add self-transitions $q_{ii} > 0$ with the effect that, on (10.7), the transition rate $q_i \to q_i + q_{ii}$. The opposite figure illustrates an embedded Markov chain with self-transitions and the corresponding transition probability matrix, where the transition rates $q_i$ follow from $\sum_{j=1}^{6} V_{ij} = 1$ with $V_{ij} = \frac{q_{ij}}{q_i}$. This change in transition rate will change the steady-state vector since the balance equations (10.20) change. However, the Markov process $\{X(t), t \geq 0\}$ is not modified because a self-transition does not change $X(t)$ nor the distribution of the time until the next transition to a different state. But self-transitions clearly change the number of transitions during some period of time. When the transition rates $q_j$ at each state $j$ are the same, the embedded Markov chain $T(\beta)$ is called a uniformized chain.
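[Figure: an embedded Markov chain on six states with self-transitions; the arcs are labelled $q_{12}$, $q_{22}$, $q_{24}$, $q_{32}$, $q_{33}$, $q_{36}$, $q_{43}$, $q_{46}$, $q_{51}$, $q_{52}$, $q_{53}$, $q_{56}$, $q_{61}$ and $q_{65}$.]
\[ V = \begin{bmatrix}
0 & \frac{q_{12}}{q_1} & 0 & 0 & 0 & 0 \\
0 & \frac{q_{22}}{q_2} & 0 & \frac{q_{24}}{q_2} & 0 & 0 \\
0 & \frac{q_{32}}{q_3} & \frac{q_{33}}{q_3} & 0 & 0 & \frac{q_{36}}{q_3} \\
0 & 0 & \frac{q_{43}}{q_4} & 0 & 0 & \frac{q_{46}}{q_4} \\
\frac{q_{51}}{q_5} & \frac{q_{52}}{q_5} & \frac{q_{53}}{q_5} & 0 & 0 & \frac{q_{56}}{q_5} \\
\frac{q_{61}}{q_6} & 0 & 0 & 0 & \frac{q_{65}}{q_6} & 0
\end{bmatrix} \]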
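In addition, in a uniformized chain, the steady-state vector $t(\beta)$ of $T(\beta)$ is the same as the steady-state vector $\pi$. Indeed, from (9.23),
\[ t_j(\beta) = \sum_{k=1}^{N} T_{kj}(\beta)\, t_k(\beta) \]
we have, with $T(\beta) = I + \frac{Q}{\beta}$,
\[ t_j(\beta) = \sum_{k=1}^{N} \left( \delta_{kj} + \frac{q_{kj}}{\beta} \right) t_k(\beta) = t_j(\beta) + \frac{1}{\beta} \sum_{k=1}^{N} t_k(\beta)\, q_{kj} \]
or, since $q_{jj} = -q_j$,
\[ t_j(\beta)\, q_j = \sum_{k=1; k \neq j}^{N} t_k(\beta)\, q_{kj} \]
where $t_k(\beta) = \pi_k$ (independent of $\beta$) since it satisfies the balance equation (10.20) and Theorem 9.3.5 assures that the steady-state of a positive recurrent chain is unique.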
We will now interpret (10.24) probabilistically. Let $N(t)$ denote the total number of transitions in $[0, t]$ in the uniformized (discrete) process $\{X_k(\beta)\}$. Since the transition rates $q_i = \beta$ are all the same, $N(t)$ is a Poisson process with rate $\beta$ because, for any continuous-time Markov chain, the inter-transition or sojourn times are i.i.d. exponential random variables. Thus, $\Pr[N(t) = k] = e^{-\beta t} \frac{(\beta t)^k}{k!}$ is recognized as the probability that the number of transitions that occur in $[0, t]$ in the uniformized Markov chain with rate $\beta$ equals $k$. With (9.10), $T^k_{ij}(\beta) = \Pr[X_k(\beta) = j \mid X_0(\beta) = i]$ is the $k$-step transition probability of that discrete $\{X_k(\beta)\}$ uniformized Markov process. Relation (10.24) can be interpreted as
\[ P_{ij}(t) = \sum_{k=0}^{\infty} \Pr[X_k(\beta) = j \mid X_0(\beta) = i, N(t) = k] \Pr[N(t) = k] \]
or, the probability that the continuous Markov process moves from state $i$ to state $j$ in a time interval of length $t$ can be decomposed into an infinite sum of probabilities. Each probability corresponds to a transition from state $i$ to state $j$ in $k$ steps, where the number of intermediate transitions $k$ is a Poisson counting process with rate $\beta$.
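The series (10.24) translates almost directly into code: accumulate Poisson-weighted powers of $T(\beta)$ until the Poisson weights used so far sum to almost 1. The sketch below is one possible implementation under stated assumptions: $\beta$ is taken as $\max_i q_i$ (any larger value also works), the truncation tolerance and the names are illustrative, and the two-state generator with its closed-form $P_{00}(t)$ serves only as a check.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def uniformization(Q, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) via (10.24) with beta = max_i q_i (assumed positive)."""
    n = len(Q)
    beta = max(-Q[i][i] for i in range(n))
    T = [[(1.0 if i == j else 0.0) + Q[i][j] / beta for j in range(n)]
         for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    weight = math.exp(-beta * t)           # Pr[N(t) = 0]
    covered = 0.0                          # Poisson mass accounted for so far
    for k in range(max_terms):
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
        covered += weight
        if covered >= 1.0 - tol:
            break
        weight *= beta * t / (k + 1)       # Pr[N(t) = k + 1]
        power = mat_mul(power, T)          # T^(k + 1)
    return P


# Hypothetical two-state generator with a known closed-form P_00(t).
a, b = 2.0, 1.0
Q = [[-a, a], [b, -b]]
P = uniformization(Q, 1.0)
exact_p00 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 1.0)
print(P[0][0], exact_p00)
```

All terms in the sum are non-negative, so the truncation error is bounded by the Poisson mass that was left out. For large $\beta t$ the starting weight $e^{-\beta t}$ can underflow, which this simple sketch does not guard against.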
10.4.2 A sampled-time Markov chain
The sampled-time Markov chain approximates the continuous Markov process in that the transition probabilities $P_{ij}(t)$ are expanded to first order as in (10.8) with fixed step $h = \Delta t$. The transition probabilities of the sampled-time Markov chain are
\[ P_{ij} = q_{ij} \Delta t \qquad (i \neq j) \]
\[ P_{ii} = 1 - q_i \Delta t \]
Clearly, the sampled-time Markov chain also allows self-transitions, as illustrated in Fig. 10.1.
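[Figure 10.1: left, a continuous-time Markov process on five states with rates $q_{12}$, $q_{23}$, $q_{34}$, $q_{45}$, $q_{51}$, $q_{52}$ and $q_{53}$; right, the sampled-time Markov chain with transition probabilities $q_{ij} \Delta t$ and self-transition probabilities $1 - q_{12} \Delta t$, $1 - q_{23} \Delta t$, $1 - q_{34} \Delta t$, $1 - q_{45} \Delta t$ and $1 - (q_{51} + q_{52} + q_{53}) \Delta t$.]
Fig. 10.1. A continuous-time Markov process and its corresponding sampled-time Markov chain.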
From (10.8), we observe that the approximation lies in two facts: (a) $\Delta t$ is fixed such that $q_{ij}\Delta t \approx \Pr[X(t + \Delta t) = j \mid X(t) = i]$ is increasingly accurate as $\Delta t \to 0$ and (b) transitions occur at discrete times every $\Delta t$ time units. The sampling step $\Delta t$ should be chosen such that the transition probabilities obey $0 \leq P_{ij} \leq 1$, from which we find that $\Delta t \leq \frac{1}{\max_i q_i}$.
Let $v$ denote the steady-state vector of the sampled-time Markov chain with $\|v\|_1 = 1$. Being a discrete Markov chain, the steady-state vector components $v_j$ satisfy (9.23) for each component $j$,
$$v_j = \sum_{k=1}^{N} P_{kj} v_k = \Delta t \sum_{k=1;k\neq j}^{N} q_{kj} v_k + (1 - q_j \Delta t)\, v_j$$
or
$$q_j v_j = \sum_{k=1;k\neq j}^{N} q_{kj} v_k$$
By comparing with the balance equation (10.20) and on the uniqueness of the steady-state (Theorem 9.3.5), we observe that $v = \pi$ or, the steady-state of the sampled-time Markov chain is exactly (not approximately) equal to the steady-state of the continuous Markov chain for any sampling step $\Delta t \leq \frac{1}{\max_i q_i}$. Although we can possibly miss by sampling every $\Delta t$ time units the smaller-scale dynamics of the continuous Markov chain, the long-run behavior or steady-state is exactly captured!
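The exactness of the sampled-time steady-state can be illustrated with a small numerical sketch, assuming NumPy. The three-state generator below and the helper name `stationary_from_generator` are hypothetical and chosen only for illustration.

```python
import numpy as np

def stationary_from_generator(Q):
    """Solve pi Q = 0 together with sum(pi) = 1 in the least-squares sense."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])    # balance equations plus normalization
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

# Hypothetical irreducible three-state generator (each row sums to zero)
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -1.5, 0.5],
              [0.5, 2.5, -3.0]])
pi = stationary_from_generator(Q)

dt_max = 1.0 / np.max(-np.diag(Q))      # largest admissible sampling step
for dt in (dt_max, dt_max / 10.0):
    P = np.eye(3) + Q * dt              # sampled-time transition matrix
    print(dt, np.allclose(pi @ P, pi))  # pi is stationary for every admissible step
```

The check works for any admissible step because the stationary vector times the sampled matrix equals the vector itself plus the step times the balance equations, and the latter vanish.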
10.5 The transitions in a continuous-time Markov chain
Based on the embedded Markov chain, there exists a framework that deduces all properties of the continuous Markov chain. In particular, the exponential sojourn times of a continuous-time Markov chain (Theorem 10.2.3) are postulated as a defining characteristic.
Theorem 10.5.1 Let $V_{ij}$ denote the transition probabilities of the embedded Markov chain and $q_{ij}$ the rates of the infinitesimal generator. The transition probabilities of the corresponding continuous-time Markov chain are found as
$$P_{ij}(t) = \delta_{ij} e^{-q_i t} + q_i \sum_{k \neq i} V_{ik} \int_0^t e^{-q_i u} P_{kj}(t - u)\, du \qquad (10.25)$$
Proof: If $i$ is an absorbing state ($q_i = 0$), then, by definition, $P_{ij}(t) = \delta_{ij}$ for all $t \geq 0$. For a non-absorbing state $i$ and a process starting from state $i$, the event $\{\tau \leq t, X(\tau) = k\} \cap \{X(t) = j\}$ is possible if and only if the first transition from $i$ to $k$ occurs at some time $u \in [0, t]$ and the next transition from $k$ to $j$ takes place in the remaining time $t - u$. The probability density function of the sojourn time is $f_{\tau_i}(t) = \frac{d}{dt}\Pr[\tau_i \leq t] = q_i e^{-q_i t}$ for $t \geq 0$ and for infinitesimally small $\epsilon$, we have
$$p = \Pr[\tau \leq t, X(\tau) = k, X(t) = j \mid X(0) = i]$$
$$= \int_0^t du\, \Pr[\tau = u, X(u - \epsilon) = i \mid X(0) = i] \times \Pr[X(u) = k \mid X(u - \epsilon) = i]\, \Pr[X(t) = j \mid X(u) = k]$$
$$= \int_0^t du\, f_{\tau_i}(u)\, V_{ik}\, P_{kj}(t - u)$$
$$= V_{ik} \int_0^t q_i e^{-q_i u} P_{kj}(t - u)\, du$$
Furthermore,
$$\Pr[\tau \leq t \text{ and } X(t) = j \mid X(0) = i] = \sum_{k \neq i} \Pr[\tau \leq t, X(\tau) = k, X(t) = j \mid X(0) = i]$$
and
$$\Pr[\tau > t \text{ and } X(t) = j \mid X(0) = i] = \delta_{ij} \Pr[\tau_i > t] = \delta_{ij} e^{-q_i t}$$
Finally,
$$P_{ij}(t) = \Pr[X(t) = j \mid X(0) = i]$$
$$= \Pr[\tau \leq t \text{ and } X(t) = j \mid X(0) = i] + \Pr[\tau > t \text{ and } X(t) = j \mid X(0) = i]$$
Combining all above relations into the last one proves the theorem.
By a change of variable $s = t - u$ in (10.25), we have
$$P_{ij}(t) = \delta_{ij} e^{-q_i t} + q_i \sum_{k \neq i} V_{ik}\, e^{-q_i t} \int_0^t e^{q_i s} P_{kj}(s)\, ds$$
and, after differentiation with respect to $t$, we find for $t \geq 0$,
$$P'_{ij}(t) = -q_i \delta_{ij} e^{-q_i t} - q_i^2 \sum_{k \neq i} V_{ik}\, e^{-q_i t} \int_0^t e^{q_i s} P_{kj}(s)\, ds + q_i \sum_{k \neq i} V_{ik} P_{kj}(t)$$
$$= -q_i \delta_{ij} e^{-q_i t} - q_i \left( P_{ij}(t) - \delta_{ij} e^{-q_i t} \right) + q_i \sum_{k \neq i} V_{ik} P_{kj}(t)$$
$$= -q_i P_{ij}(t) + q_i \sum_{k \neq i} V_{ik} P_{kj}(t)$$
□
Evaluated at $t = 0$, recalling that $P'(0) = Q$ and $P(0) = I$,
$$P'_{ij}(0) = -q_i P_{ij}(0) + q_i \sum_{k \neq i} V_{ik} P_{kj}(0)$$
$$q_{ij} = -q_i \delta_{ij} + q_i \sum_{k \neq i} V_{ik} \delta_{kj} = -q_i \delta_{ij} + q_i V_{ij}$$
which is precisely relation (10.21). With $q_i = -q_{ii}$ and (10.21), we arrive at
$$P'_{ij}(t) = \sum_{k=1}^{N} q_{ik} P_{kj}(t)$$
which is precisely the backward equation (10.10). Hence, (10.25) can be interpreted as an integrated form of the backward equation and thus of the entire continuous-time Markov process.
10.6 Example: the two-state Markov chain in continuous-time
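The continuous-time two-state Markov chain is defined by the infinitesimal generator
$$Q = \begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}$$
where $\alpha, \beta \geq 0$. We will solve $P(t)$ from the forward equation (10.9),
$$\begin{bmatrix} P'_{11}(t) & P'_{12}(t) \\ P'_{21}(t) & P'_{22}(t) \end{bmatrix} = \begin{bmatrix} P_{11}(t) & P_{12}(t) \\ P_{21}(t) & P_{22}(t) \end{bmatrix} \begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}$$
which actually contains two independent transition probabilities because $P_{12}(t) = 1 - P_{11}(t)$ and $P_{21}(t) = 1 - P_{22}(t)$. The forward equation simplifies to
$$P'_{11}(t) = -(\alpha + \beta) P_{11}(t) + \beta$$
$$P'_{22}(t) = -(\alpha + \beta) P_{22}(t) + \alpha$$
Only the first equation needs to be solved since, by symmetry, the solution of $P_{22}(t)$ follows from that of $P_{11}(t)$ after changing the role of $\alpha \to \beta$ and $\beta \to \alpha$.
The solution of this linear, first-order, non-homogeneous differential equation consists of the solution to the corresponding homogeneous differential equation and a particular solution. The solution of the homogeneous differential equation, $P'_{11}(t) = -(\alpha + \beta) P_{11}(t)$, is $P_{11}(t) = C e^{-(\alpha + \beta) t}$. The particular solution is generally found by variation of the constant $C$, which proposes $P_{11}(t) = C(t) e^{-(\alpha + \beta) t}$ as general solution, where $C(t)$ needs to satisfy the original differential equation. Hence,
$$C'(t) = \beta e^{(\alpha + \beta) t}$$
or, after integration, $C(t) = \frac{\beta}{\alpha + \beta} e^{(\alpha + \beta) t} + c$. The integration constant $c$ follows from the initial condition (10.5), $P_{11}(0) = 1$, as $c = \frac{\alpha}{\alpha + \beta}$. Finally, we arrive at
$$P_{11}(t) = \frac{\beta}{\alpha + \beta} + \frac{\alpha}{\alpha + \beta} e^{-(\alpha + \beta) t}$$
$$P_{22}(t) = \frac{\alpha}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} e^{-(\alpha + \beta) t}$$
from which the steady-state vector is immediate,
$$\pi = \begin{bmatrix} \frac{\beta}{\alpha + \beta} & \frac{\alpha}{\alpha + \beta} \end{bmatrix}$$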
10.7 Time reversibility
In this section, we consider only ergodic Markov chains that have a non-zero steady-state distribution $\pi$. Suppose the Markov process operates already in the steady-state, or, in other words, the Markov process is stationary. We are interested in the time-reversed process defined by the sequence $X_n, X_{n-1}, \ldots$ We will show that this reversed time sequence again constitutes a Markov process.
Theorem 10.7.1 The time-reversed Markov process is a Markov chain.
Proof: It suffices to demonstrate that the time-reversed process satisfies the Markov property
$$\Pr[X_n = x_n \mid X_{n+1} = x_{n+1}, \ldots, X_{n+k} = x_{n+k}] = \Pr[X_n = x_n \mid X_{n+1} = x_{n+1}]$$
By definition of the conditional probability (2.44),
$$R = \Pr[X_n = x_n \mid X_{n+1} = x_{n+1}, X_{n+2} = x_{n+2}, \ldots, X_{n+k} = x_{n+k}]$$
$$= \Pr\left[X_n = x_n \,\Big|\, \bigcap_{m=1}^{k} \{X_{n+m} = x_{n+m}\}\right] = \frac{\Pr\left[\bigcap_{m=0}^{k} \{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=1}^{k} \{X_{n+m} = x_{n+m}\}\right]}$$
Since the intersection is commutative, $A \cap B = B \cap A$, the indices can be reversed,
$$R = \frac{\Pr\left[\bigcap_{m=k}^{0} \{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=k}^{1} \{X_{n+m} = x_{n+m}\}\right]}$$
$$= \frac{\Pr\left[X_{n+k} = x_{n+k} \,\Big|\, \bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right] \Pr\left[\bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=k}^{1} \{X_{n+m} = x_{n+m}\}\right]}$$
The original stationary process is a Markov process that satisfies (9.2). Using (9.2) and (9.3) we have
$$\Pr\left[X_{n+k} = x_{n+k} \,\Big|\, \bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right] = \Pr[X_{n+k} = x_{n+k} \mid X_{n+k-1} = x_{n+k-1}]$$
and
$$\Pr\left[\bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right] = \Pr[X_n = x_n] \prod_{m=1}^{k-1} \Pr[X_{n+m} = x_{n+m} \mid X_{n+m-1} = x_{n+m-1}]$$
and, similarly,
$$\Pr\left[\bigcap_{m=k}^{1} \{X_{n+m} = x_{n+m}\}\right] = \Pr[X_{n+1} = x_{n+1}] \prod_{m=2}^{k} \Pr[X_{n+m} = x_{n+m} \mid X_{n+m-1} = x_{n+m-1}]$$
Hence,
$$R = \frac{\Pr[X_{n+k} = x_{n+k} \mid X_{n+k-1} = x_{n+k-1}]\, \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]\, \Pr[X_n = x_n]}{\Pr[X_{n+k} = x_{n+k} \mid X_{n+k-1} = x_{n+k-1}]\, \Pr[X_{n+1} = x_{n+1}]}$$
$$= \frac{\Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]\, \Pr[X_n = x_n]}{\Pr[X_{n+1} = x_{n+1}]}$$
Applying Bayes' rule (2.48) to the last relation finally proves the theorem. □
Consider the transition probability of the time-reversed Markov process
$$R_{ij} = \Pr[X_n = j \mid X_{n+1} = i]$$
With Bayes' rule (2.48),
$$\Pr[X_n = j \mid X_{n+1} = i] = \frac{\Pr[X_{n+1} = i \mid X_n = j]\, \Pr[X_n = j]}{\Pr[X_{n+1} = i]}$$
and, since the process is stationary,
$$\Pr[X_n = j] = \pi_j, \qquad \Pr[X_{n+1} = i] = \pi_i$$
the transition probability of the time-reversed process is
$$R_{ij} = \frac{\pi_j P_{ji}}{\pi_i} \qquad (10.26)$$
A Markov chain is said to be time reversible if, for all $i$ and $j$, $P_{ij} = R_{ij}$. From (10.26), the condition for time reversibility is
$$\pi_i P_{ij} = \pi_j P_{ji} \qquad (10.27)$$
This condition means that, for all states $i$ and $j$, the rate $\pi_i P_{ij}$ from state $i \to j$ equals the rate $\pi_j P_{ji}$ from state $j \to i$. An interesting property of time reversible Markov chains is that any vector $x$ satisfying $\|x\|_1 = 1$ and $x_i P_{ij} = x_j P_{ji}$ is a steady-state vector of a time-reversible Markov chain. Indeed, summing over all $i$,
$$\sum_i x_i P_{ij} = x_j \sum_i P_{ji} = x_j$$
Theorem 9.3.5 indicates that the steady-state is unique and, thus, $x = \pi$. As a side remark, we note that a transition matrix is only equal to its transpose, $P = P^T$, if the Markov process is time reversible and doubly stochastic (i.e. $\pi_i = \frac{1}{N}$ for all $i$, as shown in Appendix A.5.1).
The continuous-time analogue can be immediately deduced from the discrete-time embedded Markov chain defined by the transition probabilities $V_{ij}$. Let $U_{ij}$ denote the transition probabilities of the time-reversed embedded Markov chain and $r_{ij}$ the rates of the corresponding continuous Markov chain, then by (10.21)
$$r_{ij} = r_i U_{ij} \qquad (10.28)$$
We will now show that the sojourn times of the time-reversed continuous Markov process are indeed exponential random variables with rates $r_i$. Assume that the time-reversed process is in state $i$ at time $t$. The probability that the process is still in state $i$ at reversed time $t - u$ is, using Theorem 10.2.3,
$$\Pr[X(\tau) = i, t - u \leq \tau \leq t \mid X(t) = i] = \frac{\Pr[X(\tau) = i, t - u \leq \tau \leq t]}{\Pr[X(t) = i]} = \frac{\Pr[X(t - u) = i]\, e^{-q_i u}}{\Pr[X(t) = i]} = e^{-q_i u}$$
because, in steady-state $t \to \infty$, $\Pr[X(t - u) = i] = \Pr[X(t) = i] = \pi_i$ for any finite $u$. Thus, the sojourn time in state $i$ of the time-reversed process is exponentially distributed with precisely the same rate $r_i = q_i$ as the forward time process. The steady-state vector $v$ of the embedded Markov chain can be written in terms of the steady-state vector $\pi$ of the continuous Markov chain via (10.22). By (10.26), we obtain
$$U_{ij} = \frac{v_j V_{ji}}{v_i} = \frac{\pi_j q_j V_{ji}}{\pi_i q_i}$$
With (10.21) and (10.28)
$$\frac{\pi_i r_{ij}}{r_i} = \frac{\pi_j q_{ji}}{q_i}$$
but, since $r_i = q_i$, we finally arrive at
$$\pi_i r_{ij} = \pi_j q_{ji} \qquad (10.29)$$
Comparing (10.29) with the discrete case (10.26), we see that the transition probabilities $P_{ij}$ and $R_{ij}$ are changed for the rates $q_{ij}$ and $r_{ij}$. We know that $\pi_j$ is the portion of time the process (both forward and reversed) spends in state $j$ and that $q_{ij}$ is the rate at which the process makes transitions from state $i$ to state $j$. Equation (10.29) has again a balance interpretation: $\pi_j q_{ji}$ is the rate at which the forward process moves from state $j$ to $i$, while $\pi_i r_{ij}$ is the rate of the time-reversed process from state $i$ to $j$ and both rates are equal. Intuitively, when a process jumps from state $i \to j$ in forward time, it is plain that the process makes, in reversed time, just the opposite transition from $j \to i$. Similarly as above, a continuous-time Markov chain is time reversible if, for all $i$ and $j$, it holds that $r_{ij} = q_{ij}$. For these processes (which occur often in practice, as demonstrated in the chapters on queueing), the rate from $i \to j$ is equal to the rate from $j \to i$ since $\pi_i q_{ij} = \pi_j q_{ji}$.
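The reversibility condition can be tested numerically by building the generator of the time-reversed process from (10.29) and comparing it with the forward generator. The sketch below assumes NumPy; both example generators (a three-state birth and death chain and a one-directional cycle) and the helper names are hypothetical illustrations.

```python
import numpy as np

def stationary(Q):
    """Steady-state vector: pi Q = 0 with sum(pi) = 1 (least squares)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def reversed_generator(Q):
    """Rates of the time-reversed chain, r_ij = pi_j q_ji / pi_i."""
    pi = stationary(Q)
    return (Q.T * pi[np.newaxis, :]) / pi[:, np.newaxis]

# Birth and death type chain: expected to be time reversible
Q_bd = np.array([[-1.0, 1.0, 0.0],
                 [2.0, -3.0, 1.0],
                 [0.0, 2.0, -2.0]])
# One-directional cycle 1 -> 2 -> 3 -> 1: not time reversible
Q_cyc = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0],
                  [1.0, 0.0, -1.0]])

print(np.allclose(reversed_generator(Q_bd), Q_bd))    # reversed rates equal forward rates
print(np.allclose(reversed_generator(Q_cyc), Q_cyc))  # the reversed cycle runs the other way
```

For the cycle, the steady-state is uniform, so the reversed generator is simply the transpose of the forward one.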
10.8 Problems
(i) Consider a computer that has two identical and independent processors. The time between failures has an exponential distribution. The
mean value of this distribution is 1000 hours. The repair time for a
damaged processor is exponentially distributed as well, with a mean
value of 100 hours. We assume damaged processors can be repaired
in parallel. There are clearly three states for this computer: (1) both
processors work, (2) one processor is damaged and (3) both processors are damaged.
(a) Make a continuous Markov chain presentation of these states.
(b) What is the infinitesimal generator matrix $Q$ for this Markov
chain? Give the relation between the state probability at time
$t$ and its derivative.
(c) Calculate the steady-state of this process.
(d) What is the availability of the computer if (i) both processors
are required to work, or (ii) at least one processor should work.
(ii) Consider two identical servers that are working in parallel. When one server fails, the other has to do the whole job alone under a higher load. The failure times of servers are exponentially distributed: $\lambda_E = 3 \times 10^{-4}\ \mathrm{h}^{-1}$, when the servers are equally loaded and $\lambda_C = 7 \times 10^{-4}\ \mathrm{h}^{-1}$, when one of the servers works under the "full load". In addition, both servers may fail at the same time with a failure rate of $\lambda_B = 6 \times 10^{-5}\ \mathrm{h}^{-1}$.
As soon as one of the servers fails, the repair is initiated. The average downtime of a server is $\mu^{-1} = 10$ hours. However, if both servers are damaged, the whole system must be shut down. The average time needed to repair both damaged servers is $\mu_B^{-1} = 20$ hours.
(a) Draw the Markov chain for this system.
(b) Determine the infinitesimal generator matrix $Q$.
(c) Determine the steady-state probabilities.
(d) Determine the average lifetime of different states.
(e) What is the average number of server repairs needed during a
period of one year?
11
Applications of Markov chains
This chapter illustrates the theory of Markov chains with several examples.
Examples of queueing problems are deferred to later chapters. Generally,
Markov processes can be solved explicitly provided the transition probability
matrix $P$ or the infinitesimal generator $Q$ has a special structure. Only in a
very small number of problems is the entire time dependence of the process
available in analytic form.
11.1 Discrete Markov chains and independent random variables
This section illustrates examples of some simple Markov chains. Consider a set $\{Y_n\}_{n \geq 1}$ of positive integer, independent random variables that are identically distributed with $\Pr[Y = k] = a_k$.
The discrete-time Markov process, defined by $X_n = Y_n$ for $n \geq 1$, possesses the (infinite) transition probability matrix,
$$P = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
All rows are identical and $\Pr[X_{n+1} = j \mid X_n = i] = a_j$ shows that the states $X_{n+1}$ and $X_n$ are independent from each other.
Another, more interesting, discrete-time Markov process is defined by
$$X_n = \max[Y_1, Y_2, Y_3, \ldots, Y_n].$$
Hence, the process $X_n$ mirrors the maxima of the first $n$ random variables. Clearly, $X_{n+1} = \max[X_n, Y_{n+1}]$ reflects the Markov property: the next state is only dependent on the previous state of the process. From $X_{n+1} = \max[X_n, Y_{n+1}]$, we observe that $\Pr[X_{n+1} = j \mid X_n = i] = 0$ if $j < i$ because the maximum does not decrease by adding a new random variable in the list. If $j > i$, the state $j$ is determined by $Y_{n+1} = j$ with probability $a_j$. If $j = i$, then $Y_{n+1} \leq j$, which has probability
$$\Pr[Y_{n+1} \leq j] = \sum_{k=1}^{j} \Pr[Y_{n+1} = k] = \sum_{k=1}^{j} a_k = A_j$$
The corresponding probability matrix is
$$P = \begin{bmatrix} A_1 & a_2 & a_3 & a_4 & \cdots \\ 0 & A_2 & a_3 & a_4 & \cdots \\ 0 & 0 & A_3 & a_4 & \cdots \\ 0 & 0 & 0 & A_4 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
A related discrete-time Markov chain is $X_n = \sum_{k=1}^{n} Y_k$ which obeys $X_{n+1} = X_n + Y_{n+1}$. Furthermore, if $j \leq i$, $\Pr[X_{n+1} = j \mid X_n = i] = 0$ because the random variables $Y_n$ are positive such that the sum strictly increases by adding a new member. If $j > i$, then
$$\Pr[X_{n+1} = j \mid X_n = i] = \Pr[X_n + Y_{n+1} = j \mid X_n = i] = \Pr[Y_{n+1} = j - i] = a_{j-i}$$
The corresponding probability matrix has a Toeplitz structure possessing the same elements on diagonal lines
$$P = \begin{bmatrix} 0 & a_1 & a_2 & a_3 & \cdots \\ 0 & 0 & a_1 & a_2 & \cdots \\ 0 & 0 & 0 & a_1 & \cdots \\ 0 & 0 & 0 & 0 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
This list can be extended by considering other integer functions of the set $\{Y_n\}_{n \geq 1}$ (such as $X_{n+1} = \min[X_n, Y_{n+1}]$ or $X_n = \prod_{k=1}^{n} Y_k$ etc.).
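As a small illustration of the chain of maxima, the sketch below builds its transition matrix for a distribution with finite support, assuming NumPy. The function name `max_chain_matrix` and the probabilities 0.5, 0.3 and 0.2 are hypothetical choices; with finite support the largest value becomes an absorbing state.

```python
import numpy as np

def max_chain_matrix(a):
    """Transition matrix of X_n = max(Y_1, ..., Y_n) when Pr[Y = k] = a[k-1], k = 1..K."""
    a = np.asarray(a, dtype=float)
    K = len(a)
    A = np.cumsum(a)                        # A_j = a_1 + ... + a_j
    P = np.triu(np.tile(a, (K, 1)), k=1)    # entry a_j for j > i, zero for j < i
    P[np.diag_indices(K)] = A               # staying put requires Y_{n+1} <= j
    return P

a = [0.5, 0.3, 0.2]                         # illustrative distribution of Y
P = max_chain_matrix(a)
print(P)
print(P.sum(axis=1))                        # every row is a probability distribution
```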
11.2 The general random walk
The general random walk is an important model that describes the motion of an item that is constrained to either move one step forwards, stay at the position where it currently is or move one step backwards. In general, this "three-possibility motion" has transition probabilities that depend on the position $j$ as depicted in Fig. 11.1. Figure 11.1 illustrates that, if the process is in state $j$, it has three possible choices: remain in state $j$ with probability $r_j = \Pr[X_{k+1} = j \mid X_k = j]$, move to the next state $j + 1$ with probability $p_j = \Pr[X_{k+1} = j + 1 \mid X_k = j]$ or jump back to state $j - 1$ with probability $q_j = \Pr[X_{k+1} = j - 1 \mid X_k = j]$. A general random walk is defined by the $(N + 1) \times (N + 1)$ band matrix
$$P = \begin{bmatrix} r_0 & p_0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ q_1 & r_1 & p_1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & q_2 & r_2 & p_2 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & q_{N-1} & r_{N-1} & p_{N-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & q_N & r_N \end{bmatrix} \qquad (11.1)$$
where $p_j > 0$, $q_j > 0$, $r_j \geq 0$ and $q_j + r_j + p_j = 1$ for all $0 < j < N$. The bordering states zero and $N$ are special: $p_0 \geq 0$, $r_0 \geq 0$ and $p_0 + r_0 = 1$ and $q_N \geq 0$, $r_N \geq 0$ and $q_N + r_N = 1$.
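[Figure 11.1 (image not reproduced): states $0, \ldots, j-1, j, j+1, \ldots, N$ on a line, with transition probability $p_j$ from $j$ to $j+1$, $q_j$ from $j$ to $j-1$ and self-transition probability $r_j$.]
Fig. 11.1. A transition graph of the general random walk.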
The general random walk serves as a model for a number of practical phenomena:
• The one-dimensional motion of physical particles, such as electrons that hop from one atom to another. In this case, the number of states $N$ can be very large.
• The gambler's ruin problem: a state $j$ reflects the capital of a gambler, whereas $p_j$ is the chance that the gambler wins while $q_j$ is the probability that he loses. The gambler achieves his target when he reaches state $N$, but he is ruined at state 0. In that case, both states are absorbing states with $r_0 = r_N = 1$. In games, most often the probabilities are independent of the state and simplify to $p_j = p$, $q_j = q$ and $r_j = 1 - p - q$.
• The continuous-time counterpart, the birth and death process (Section 11.3), has applications to queueing processes. For a wealth of examples and applications of the random walk, we refer to the classical treatise of Feller (1970, Chapter III, XIV).
11.2.1 The probability of gambler’s ruin
The probability of gambler's ruin is defined as $u_j = \Pr[X_T = 0 \mid X_0 = j]$ where $T = \min_k \{X_k = 0\}$ is the hitting time to state 0, which is equivalent to $u_j = \Pr[T_0 < \infty \mid X_0 = j]$. By definition, $u_0 = 1$ and since the gambler achieves his goal at state $N$, he stops and never gets ruined, $u_N = 0$. The law of total probability (2.46) gives the situation after the first transition,
$$\Pr[X_T = 0 \mid X_0 = j] = \sum_{k=0}^{N} \Pr[X_T = 0 \mid X_0 = j, X_1 = k]\, \Pr[X_1 = k \mid X_0 = j]$$
$$= \sum_{k=0}^{N} \Pr[X_T = 0 \mid X_1 = k]\, \Pr[X_1 = k \mid X_0 = j] \qquad \text{(Markov property)}$$
$$= \sum_{k=0}^{N} P_{jk} \Pr[X_T = 0 \mid X_1 = k]$$
$$= q_j \Pr[X_T = 0 \mid X_1 = j - 1] + r_j \Pr[X_T = 0 \mid X_1 = j] + p_j \Pr[X_T = 0 \mid X_1 = j + 1]$$
After the first transition, the probability $\Pr[X_T = 0 \mid X_1 = j] = u_j$ remains the same as the initial $\Pr[X_T = 0 \mid X_0 = j]$ because $T$ is a random variable depending on the state and not on the discrete-time. Hence, we obtain the equations
$$u_0 = 1$$
$$u_j = q_j u_{j-1} + r_j u_j + p_j u_{j+1} \qquad (1 \leq j < N)$$
which are different from the corresponding steady-state equations (11.5). The difference lies in left-multiplication of $P$, yielding $u = P u$ instead of right-multiplication in $\pi = \pi P$. Substituting $r_j = 1 - p_j - q_j$ gives after some modification
$$u_{j+1} = -\frac{q_j}{p_j} u_{j-1} + \left(1 + \frac{q_j}{p_j}\right) u_j \qquad (11.2)$$
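Iteration on $j$ for the first few values using $u_0 = 1$ yields
$$u_2 = -\frac{q_1}{p_1} + \left(1 + \frac{q_1}{p_1}\right) u_1$$
$$u_3 = -\frac{q_2}{p_2} u_1 + \left(1 + \frac{q_2}{p_2}\right) u_2 = -\left(\frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right) u_1$$
which suggests
$$u_j = -\sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m} + \left(1 + \sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}\right) u_1$$
as readily verified by substitution in (11.2). The unknown $u_1$ is determined by the last relation, $u_N = 0$. Finally, the probability of gambler's ruin is
$$\Pr[X_T = 0 \mid X_0 = j] = \frac{\sum_{k=j}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}}{1 + \sum_{k=1}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}} \qquad (11.3)$$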
Similarly, the mean hitting time $\eta_j = E[T \mid X_0 = j]$ follows by reasoning on the possible transitions. From state $j$, there is a transition to state $j - 1$ with probability $q_j$. In case of a transition from state $j$ to state $j - 1$, the hitting time $T$ consists of the first transition plus the remaining time from state $j - 1$ on, which is $\eta_{j-1}$. Using the law of total probability or reading all possible transitions from the transition graph (see Fig. 11.1), we find
$$\eta_j = 1 + q_j \eta_{j-1} + r_j \eta_j + p_j \eta_{j+1}$$
with the boundary equations for the state 0 and state $N$ (which in the gambler's ruin problem are absorbing states), $\eta_0 = \eta_N = 0$. With $r_j = 1 - p_j - q_j$,
$$\eta_{j+1} = -\frac{q_j}{p_j} \eta_{j-1} + \left(1 + \frac{q_j}{p_j}\right) \eta_j - \frac{1}{p_j}$$
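By iteration,
$$\eta_2 = -\frac{1}{p_1} + \left(1 + \frac{q_1}{p_1}\right) \eta_1$$
$$\eta_3 = -\frac{q_2}{p_2} \eta_1 + \left(1 + \frac{q_2}{p_2}\right) \eta_2 - \frac{1}{p_2} = -\frac{1}{p_2} - \frac{1}{p_1}\left(1 + \frac{q_2}{p_2}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right) \eta_1$$
$$\eta_4 = -\frac{q_3}{p_3} \eta_2 + \left(1 + \frac{q_3}{p_3}\right) \eta_3 - \frac{1}{p_3}$$
$$= -\frac{1}{p_3} - \frac{1}{p_2}\left(1 + \frac{q_3}{p_3}\right) - \frac{1}{p_1}\left(1 + \frac{q_2}{p_2} + \frac{q_2 q_3}{p_2 p_3}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2} + \frac{q_1 q_2 q_3}{p_1 p_2 p_3}\right) \eta_1$$
or
$$\eta_j = -\sum_{k=1}^{j-1} \frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1} \prod_{m=1}^{n} \frac{q_{k-m+1}}{p_{k-m}}\right) + \left(1 + \sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}\right) \eta_1$$
Eliminating $\eta_1$ from $\eta_N = 0$ finally leads to the mean hitting time to ruin or the mean duration of the game
$$\eta_j = -\sum_{k=1}^{j-1} \frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1} \prod_{m=1}^{n} \frac{q_{k-m+1}}{p_{k-m}}\right) + \left(1 + \sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}\right) \frac{\sum_{k=1}^{N-1} \frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1} \prod_{m=1}^{n} \frac{q_{k-m+1}}{p_{k-m}}\right)}{1 + \sum_{k=1}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}} \qquad (11.4)$$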
In spite of the relatively simple difference equations, the solution rapidly grows unattractive. The particular case of the Markov chain where $q_k = q$ and $p_k = p$ simplifies considerably. The probability of gambler's ruin (11.3) becomes
$$\Pr[X_T = 0 \mid X_0 = j] = \frac{\sum_{k=j}^{N-1} \left(\frac{q}{p}\right)^k}{\sum_{k=0}^{N-1} \left(\frac{q}{p}\right)^k} = \frac{\left(\frac{q}{p}\right)^j - \left(\frac{q}{p}\right)^N}{1 - \left(\frac{q}{p}\right)^N}$$
and, if $p = q = \frac{1}{2}$, via de l'Hospital's rule, $\Pr[X_T = 0 \mid X_0 = j] = \frac{N - j}{N}$. If the target fortune $N$ (at which the game ends) is infinitely large,
$$\Pr[X_T = 0 \mid X_0 = j] = \begin{cases} 1 & \text{if } q \geq p \\ \left(\frac{q}{p}\right)^j & \text{if } q < p \end{cases}$$
which demonstrates that the gambler surely will lose all his money if his chances $p$ of winning are smaller than those $q$ of losing. Even in a fair game where $p = q$, he will be defeated surely. In a favorable game ($p > q$) and with start capital $j$, ruin is possible with probability $\left(\frac{q}{p}\right)^j$. Another interpretation is a game with two players $a$ and $b$ in which player $a$ starts with capital $j$ and has winning chance of $p$, while player $b$ starts with capital $N - j$ and wins with probability $q = 1 - p$.
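Similarly, the mean duration of the game (11.4) simplifies to
$$\eta_j = -\sum_{k=1}^{j-1} \frac{1 - \left(\frac{q}{p}\right)^k}{p - q} + \frac{1 - \left(\frac{q}{p}\right)^j}{1 - \left(\frac{q}{p}\right)^N} \sum_{k=1}^{N-1} \frac{1 - \left(\frac{q}{p}\right)^k}{p - q}$$
or
$$E[T \mid X_0 = j] = \frac{1}{p - q}\left[ N\, \frac{1 - \left(\frac{q}{p}\right)^j}{1 - \left(\frac{q}{p}\right)^N} - j \right]$$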
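The closed forms for the ruin probability and the mean duration can be cross-checked against a direct solution of the underlying difference equations. The following is a minimal sketch assuming NumPy; the function names, the array layout (index 0 unused so that array indices match state numbers) and the numerical values $p = 0.45$, $q = 0.55$, $N = 10$ are illustrative assumptions.

```python
import numpy as np

def ruin_probability(p, q, j):
    """Evaluate (11.3); p[m], q[m] hold the probabilities of state m = 1..N-1."""
    N = len(p)
    ratios = np.cumprod(q[1:N] / p[1:N])      # prod_{m=1}^{k} q_m/p_m for k = 1..N-1
    terms = np.concatenate(([1.0], ratios))   # the k = 0 term is the empty product
    return terms[j:].sum() / terms.sum()

def solve_interior(p, q, rhs, left_boundary):
    """Solve (p_j + q_j) x_j - q_j x_{j-1} - p_j x_{j+1} = rhs for 0 < j < N,
    with x_0 = left_boundary and x_N = 0."""
    N = len(p)
    A = np.zeros((N - 1, N - 1))
    b = np.full(N - 1, float(rhs))
    for j in range(1, N):
        A[j - 1, j - 1] = p[j] + q[j]
        if j > 1:
            A[j - 1, j - 2] = -q[j]
        else:
            b[0] += q[1] * left_boundary      # known boundary value x_0
        if j < N - 1:
            A[j - 1, j] = -p[j]
    return np.linalg.solve(A, b)

N = 10
p = np.full(N, 0.45)                          # index 0 is unused
q = np.full(N, 0.55)
u = solve_interior(p, q, rhs=0.0, left_boundary=1.0)    # ruin probabilities u_1..u_{N-1}
eta = solve_interior(p, q, rhs=1.0, left_boundary=0.0)  # mean durations eta_1..eta_{N-1}
print(u[4], ruin_probability(p, q, 5), eta[4])
```

The same `solve_interior` helper also accepts state-dependent probabilities, in which case `ruin_probability` evaluates the general expression (11.3).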
11.2.2 The steady-state
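The steady-state equation (9.23) for the vector component $\pi_j$ becomes, for $1 \leq j < N$,
$$\pi_j = p_{j-1} \pi_{j-1} + r_j \pi_j + q_{j+1} \pi_{j+1} \qquad (11.5)$$
and, for $j = 0$ and $j = N$,
$$\pi_0 = r_0 \pi_0 + q_1 \pi_1$$
$$\pi_N = p_{N-1} \pi_{N-1} + r_N \pi_N$$
We rewrite these equations using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$ as
$$p_0 \pi_0 = q_1 \pi_1 \qquad (11.6)$$
$$p_j \pi_j = (p_{j-1} \pi_{j-1} - q_j \pi_j) + q_{j+1} \pi_{j+1} \qquad (11.7)$$
$$p_{N-1} \pi_{N-1} = q_N \pi_N$$
Explicitly, for a few values of $j$, we observe that
$$p_1 \pi_1 = (p_0 \pi_0 - q_1 \pi_1) + q_2 \pi_2 = q_2 \pi_2$$
$$p_2 \pi_2 = (p_1 \pi_1 - q_2 \pi_2) + q_3 \pi_3 = q_3 \pi_3$$
$$\cdots$$
or, in general, for all $j$,
$$p_j \pi_j = q_{j+1} \pi_{j+1}$$
By iteration of $\pi_{j+1} = \frac{p_j}{q_{j+1}} \pi_j$ starting at $j = 0$, we find
$$\pi_{j+1} = \frac{p_j}{q_{j+1}} \frac{p_{j-1}}{q_j} \cdots \frac{p_0}{q_1}\, \pi_0 = \pi_0 \prod_{m=0}^{j} \frac{p_m}{q_{m+1}}$$
The normalization $\|\pi\|_1 = 1$ yields a condition for $\pi_0$,
$$\pi_0 = \left(1 + \sum_{j=1}^{N} \prod_{m=0}^{j-1} \frac{p_m}{q_{m+1}}\right)^{-1}$$
which determines the complete steady-state vector for the general random walk as
$$\pi_j = \frac{\prod_{m=0}^{j-1} \frac{p_m}{q_{m+1}}}{1 + \sum_{k=1}^{N} \prod_{m=0}^{k-1} \frac{p_m}{q_{m+1}}} \qquad (11.8)$$
These relations remain valid even when the number of states $N$ tends to infinity provided the infinite sum converges.
In the simple case where $p_k = p$ and $q_k = q$, we obtain with $\rho = \frac{p}{q}$,
$$\pi_j = \frac{(1 - \rho)\, \rho^j}{1 - \rho^{N+1}} \qquad (11.9)$$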
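The product form (11.8) is easy to evaluate and to verify against the defining relation of a steady-state vector. The sketch below assumes NumPy; the function names and the numerical transition probabilities are hypothetical, and the arrays are indexed by state number with `q[0]` unused.

```python
import numpy as np

def random_walk_steady_state(p, q):
    """Evaluate (11.8); p[j] for j = 0..N-1 and q[j] for j = 1..N (q[0] unused)."""
    ratios = p / q[1:]                                      # p_m / q_{m+1}, m = 0..N-1
    weights = np.concatenate(([1.0], np.cumprod(ratios)))   # unnormalized pi_j / pi_0
    return weights / weights.sum()

def random_walk_matrix(p, q):
    """Band matrix (11.1), with r_j chosen so that every row sums to one."""
    N = len(p)
    P = np.zeros((N + 1, N + 1))
    for j in range(N):
        P[j, j + 1] = p[j]
        P[j + 1, j] = q[j + 1]
    P[np.diag_indices(N + 1)] = 1.0 - P.sum(axis=1)
    return P

p = np.array([0.5, 0.3, 0.4, 0.2])            # illustrative values, N = 4
q = np.array([0.0, 0.2, 0.5, 0.3, 0.6])
pi = random_walk_steady_state(p, q)
P = random_walk_matrix(p, q)
print(pi, np.allclose(pi @ P, pi))
```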
11.3 Birth and death process
A birth and death process¹ is defined by the infinitesimal generator matrix
$$Q = \begin{bmatrix} -\lambda_0 & \lambda_0 & 0 & 0 & 0 & \cdots \\ \mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & 0 & 0 & \cdots \\ 0 & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & 0 & \cdots \\ 0 & 0 & \mu_3 & -(\lambda_3 + \mu_3) & \lambda_3 & \cdots \\ 0 & 0 & 0 & \mu_4 & -(\lambda_4 + \mu_4) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
The transition graph is shown in Fig. 11.2. Although the theory in the previous chapter was derived for finite-state Markov chains, the birth and death process is a generalization to an infinite number of states. The general random walk (Section 11.2) forms the embedded Markov chain of the birth and death process with transition probabilities specified by (10.21) resulting in $V_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i}$, $V_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}$ and $V_{ik} = 0$ for $k \neq i - 1$ and $k \neq i + 1$. The transition probability matrix is a tri-band diagonal matrix which is irreducible if all $\lambda_j > 0$ and $\mu_j > 0$.
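[Figure 11.2 (image not reproduced): states $0, \ldots, j-1, j, j+1, \ldots$ on a line, with birth rates $\lambda_0, \ldots, \lambda_{j-1}, \lambda_j$ pointing to the next higher state and death rates $\mu_1, \ldots, \mu_j, \mu_{j+1}$ pointing to the next lower state.]
Fig. 11.2. The transition graph of a birth and death process.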
The basic system of differential equations that completely describes the birth and death process follows from the general state probability equations (10.11) for $s_k(t) = \Pr[X(t) = k]$ as
$$s'_0(t) = -\lambda_0 s_0(t) + \mu_1 s_1(t) \qquad (11.10)$$
$$s'_k(t) = -(\lambda_k + \mu_k)\, s_k(t) + \lambda_{k-1} s_{k-1}(t) + \mu_{k+1} s_{k+1}(t) \qquad (11.11)$$
with initial condition $s_k(0) = \Pr[X(0) = k]$. Exact analytic solutions for any $\lambda_k$ and $\mu_k$ are not possible. Indeed, let us denote the Laplace transform of $s_k(t)$ by
$$S_k(z) = \int_0^{\infty} e^{-zt} s_k(t)\, dt. \qquad (11.12)$$
Since $s_k(t)$ is a continuous and bounded function ($|s_k(t)| \leq 1$ for all $t > 0$), the Laplace transform exists for $\mathrm{Re}(z) > 0$. The Laplace transform of (11.10) and (11.11) becomes,
$$(\lambda_0 + z)\, S_0(z) = s_0(0) + \mu_1 S_1(z) \qquad (11.13)$$
$$(\lambda_k + \mu_k + z)\, S_k(z) = s_k(0) + \lambda_{k-1} S_{k-1}(z) + \mu_{k+1} S_{k+1}(z) \qquad (11.14)$$
which is a set of difference equations more complex due to the initial condition $s_k(0)$ than the set (A.51) in Appendix A.5.2.3. That set (A.51) appears in the general random walk whose solution is shown to be intractable in general. This infinite set of differential equations has been thoroughly studied over years under several simplifying conditions for $\lambda_k$ and $\mu_k$, for example, $\lambda_k = \lambda$ and $\mu_k = \mu$ for all $k$. As shown in Chapter 13, they form the basis for the simplest set of queueing models of the family M/M/m/K.
¹ Kleinrock (1975, p. ix) mentions that William Feller was the father of the birth and death process.
11.3.1 The steady-state
The steady-state follows from (10.19) as solution of the set
$$-\lambda_0 \pi_0 + \mu_1 \pi_1 = 0$$
$$\lambda_{j-1} \pi_{j-1} - (\lambda_j + \mu_j)\, \pi_j + \mu_{j+1} \pi_{j+1} = 0$$
This set is identical to (11.6) and (11.7) provided $p_j$ and $q_j$ are changed for $\lambda_j$ and $\mu_j$. After this modification in (11.8), the steady-state of the birth and death process is
$$\pi_0 = \frac{1}{1 + \sum_{j=1}^{\infty} \prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}}} \qquad (11.15)$$
$$\pi_j = \frac{\prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}}}{1 + \sum_{k=1}^{\infty} \prod_{m=0}^{k-1} \frac{\lambda_m}{\mu_{m+1}}} \qquad j \geq 1 \qquad (11.16)$$
Theorem 9.3.4 states that an irreducible Markov chain with a finite number of states is necessarily recurrent. However, it is in general difficult to decide whether an irreducible Markov chain with an infinite number of states is recurrent or transient. In case of the birth and death process, it is possible to determine when the process is transient or recurrent. The process is transient if and only if the embedded Markov chain (determined above) is transient. Section 9.2.3 discusses that, for a recurrent chain, $r_{ij} = \Pr[T_j < \infty \mid X_0 = i]$ equals 1: a finite hitting time means every state $j$ is certainly visited starting from initial state $i$. Applied to the embedded Markov chain, it follows from the gambler's ruin (11.3) that
$$\Pr[T_0 < \infty \mid X_0 = j] = 1 - \frac{\sum_{k=0}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}}{\sum_{k=0}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}}$$
Thus, for any fixed initial state $j$, the condition for a recurrent chain
$$\Pr[T_j < \infty \mid X_0 = i] = 1$$
is only possible in the limit $N \to \infty$ if $\lim_{N \to \infty} \sum_{k=0}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m} = \infty$. Transformed to the birth and death rates, the condition for recurrence becomes $\sigma_2 = \sum_{j=1}^{\infty} \prod_{m=1}^{j} \frac{\mu_m}{\lambda_m} = \infty$. Furthermore, we observe from (11.16) that the infinite series $\sigma_1 = \sum_{j=1}^{\infty} \prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}}$ must converge to have a stationary or steady-state distribution.
In summary, if $\sigma_1 < \infty$ and $\sigma_2 = \infty$, the birth and death process is positive recurrent. If $\sigma_1 = \infty$ and $\sigma_2 = \infty$, it is null recurrent. If $\sigma_2 < \infty$, the birth and death process is transient.
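The product-form steady-state of a birth and death process can be evaluated on a truncated state space. The sketch below assumes NumPy and uses, purely as an illustrative assumption not taken from the text, constant birth rates equal to 2 and death rates proportional to the state number, for which the normalized products reduce to a Poisson distribution with mean 2. The truncation level `K` and the function name are likewise hypothetical.

```python
import math
import numpy as np

def birth_death_steady_state(lam, mu):
    """Truncated product form: lam[m] for m = 0..K-1, mu[m] for m = 1..K (mu[0] unused)."""
    weights = np.concatenate(([1.0], np.cumprod(lam / mu[1:])))  # pi_j / pi_0
    return weights / weights.sum()

K = 60                                   # truncation level (assumption)
lam = np.full(K, 2.0)                    # constant birth rate 2
mu = np.arange(K + 1, dtype=float)       # death rate in state k equals k
pi = birth_death_steady_state(lam, mu)

# Reference: Poisson probabilities with mean 2
poisson = np.array([math.exp(-2.0) * 2.0**j / math.factorial(j) for j in range(K + 1)])
print(np.max(np.abs(pi - poisson)))
```

Here the series of birth over death products converges, so a steady-state exists; the truncation error is negligible because the neglected tail of the series is far below machine precision.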
11.3.2 A pure birth process
A pure birth process is defined as a process $\{X(t), t \geq 0\}$ for which in any state $i$ it holds that $\mu_i = 0$. It follows from Fig. 11.2 that a birth process can only jump to higher states such that $P_{ij}(t) = 0$ for $j < i$. Similarly, in a pure death process $\{X(t), t \geq 0\}$ all birth rates $\lambda_i = 0$.
11.3.2.1 The Poisson process
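Let us first consider the simplest case where all birth rates are equal, $\lambda_i = \lambda$, and where $P_{jj}(t) = \Pr[\tau_j > t \mid X(0) = j] = e^{-\lambda t}$. Using either the backward or forward equation or (10.25) with $V_{ik} = \delta_{k,i+1}$ yields
$$P_{ij}(t) = \delta_{ij} e^{-\lambda t} + \lambda \int_0^t e^{-\lambda u} P_{i+1,j}(t - u)\, du$$
or, for $j = i + k$ with $k > 0$,
$$P_{i,i+k}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i+1,i+k}(u)\, du$$
Explicitly, for $k = 1$,
$$P_{i,i+1}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} e^{-\lambda u}\, du = \lambda t\, e^{-\lambda t}$$
which is independent of $i$ and, thus, it holds for any $i \geq 1$. For $k = 2$,
$$P_{i,i+2}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i+1,i+2}(u)\, du = \lambda e^{-\lambda t} \int_0^t e^{\lambda u}\, \lambda u\, e^{-\lambda u}\, du = \frac{(\lambda t)^2}{2} e^{-\lambda t}$$
This suggests us to propose for any $i \geq 0$,
$$P_{i,i+k}(t) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} \qquad (11.17)$$
which is verified inductively as
$$P_{i,i+k}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i+1,i+k}(u)\, du = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i,i+k-1}(u)\, du$$
$$= \lambda e^{-\lambda t} \int_0^t e^{\lambda u}\, \frac{(\lambda u)^{k-1}}{(k-1)!}\, e^{-\lambda u}\, du = \frac{(\lambda t)^k}{k!} e^{-\lambda t}$$
Hence, the transition probabilities of a pure birth process with constant birth rate $\lambda$ have a Poisson distribution (11.17) and are only a function of the difference in states $k = j - i \geq 0$ for any $t \geq 0$. Moreover, for $0 \leq u \leq t$, consider the increment $X(t) - X(u)$,
$$\Pr[X(t) - X(u) = k] = \sum_{i \geq 0} \Pr[X(u) = i, X(t) = i + k]$$
$$= \sum_{i \geq 0} \Pr[X(u) = i]\, \Pr[X(t) = i + k \mid X(u) = i]$$
$$= \sum_{i \geq 0} \Pr[X(u) = i]\, P_{i,i+k}(t - u)$$
$$= \frac{(\lambda (t - u))^k}{k!} e^{-\lambda (t - u)} \sum_{i \geq 0} \Pr[X(u) = i]$$
Thus, the increment $X(t) - X(u)$ has a Poisson distribution,
$$\Pr[X(t) - X(u) = k] = \frac{(\lambda (t - u))^k}{k!} e^{-\lambda (t - u)} \qquad (11.18)$$
and, since $X(0) = 0$ and the increments are independent (Markov property), we conclude that the pure birth process is a Poisson process (Section 7.2).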
11.3.2.2 The general birth process
In case the birth rates $\lambda_k$ depend on the actual state $k$, the pure birth process can be regarded as the simplest generalization of the Poisson process. The Laplace transform difference equations (11.13) and (11.14) reduce to the set
$$S_0(z) = \frac{s_0(0)}{\lambda_0 + z}$$
$$S_k(z) = \frac{s_k(0)}{\lambda_k + z} + \frac{\lambda_{k-1}}{\lambda_k + z}\, S_{k-1}(z)$$
which, by the usual iteration, has the solution,
$$S_k(z) = \sum_{j=0}^{k} \frac{s_j(0) \prod_{m=j}^{k-1} \lambda_m}{\prod_{m=j}^{k} (\lambda_m + z)} \qquad (11.19)$$
with the convention that $\prod_{m=a}^{b} f(m) = 1$ if $a > b$. The validity of this general solution is verified by substitution into the difference equation for $S_k(z)$. The form of $S_k(z)$ is a ratio that can always be transformed back to the time-domain provided that $\lambda_k$ is known. If all $\lambda_k > 0$ are distinct, using (2.38) with $c > 0$, we find
$$s_k(t) = \sum_{j=0}^{k} s_j(0) \prod_{m=j}^{k-1} \lambda_m\; \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{\prod_{m=j}^{k} (\lambda_m + z)}\, dz$$
By closing the contour over the negative real half-plane ($\mathrm{Re}(z) < 0$), only simple poles at $z = -\lambda_n$ are encountered,
$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{\prod_{m=j}^{k} (\lambda_m + z)}\, dz = \sum_{n=j}^{k} \frac{e^{-\lambda_n t}}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)}$$
resulting in
$$s_k(t) = \sum_{j=0}^{k} s_j(0) \sum_{n=j}^{k} e^{-\lambda_n t}\, \frac{\prod_{m=j}^{k-1} \lambda_m}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)} \qquad (11.20)$$
If some $\lambda_k = \lambda_j$, multiple poles occur and a slightly more complex result appears that still can be computed in exact analytic form.
11.3.2.3 The Yule process
A classical example of a process with distinct birth rates is the Yule process,
where n = n. In that case, (11.20) can be simplified. With
Qn31
(n 1)!
p=m p
=
Qq31
Q
Qn
(m 1)! p=m (p q) np=q+1 (p q)
p=m;p6=q (p q )
11.3 Birth and death process
213
Qq31
Q
q313m (q m)! and
and with p=m (p q) = (1)q313m q3m
o=1 o = (1)
Qn
Qn3q
p=q+1 (p q) =
o=1 o = (n q)! we find
Qn31
(1)q313m (n 1)!
p=m p
=
Qn
(m 1)!(q m)!(n q)!
p=m;p6=q (p q )
such that
n
X
Qn31
n3m
(n 1)! X h3(q+m)w (1)q31
=
Qn
(m 1)! q=0 q!(n m q)!
p=m;p6=q (p q )
q=m
¶
n3m µ
(n 1)!h3mw X n m ³ 3w ´q
=
h
(n m)!(m 1)! q=0
q
µ
¶
´n3m
n 1 3mw ³
=
h
1 h3w
m 1
h3q w
p=m p
Finally, for the Yule process, we obtain from (11.20) the evolution of the
state probabilities over time
µ
¶
n
´n3m
X
n 1 3mw ³
vn (w) =
vm (0)
h
(11.21)
1 h3w
m1
m=0
In practice, vm (0) = mq if the process starts from state q (implying vn (w) = 0
for n ? q because the process moves to the right for w 0) and the general
form simplifies to
µ
¶
´n3q
n 1 3qw ³
h
(11.22)
vn (w) =
1 h3w
q1
The Yule process has been used as a simple model for the evolution of a population in which each individual gives birth at exponential rate $\lambda$ and $X(t)$ denotes the number of individuals in the population (that never decreases as there are no deaths) as a function of time $t$. At each state $k$ the population has precisely $k$ individuals that each generate births such that $\lambda_k = k\lambda$, the birth rate of the population. If the population starts at $t = 0$ with one individual, $n = 1$, the evolution over time has the distribution $s_k(t) = e^{-\lambda t}\left(1-e^{-\lambda t}\right)^{k-1}$, which is recognized from (3.5) as a geometric distribution with mean $e^{\lambda t}$. Since the sojourn times of a Markov process are independent exponential random variables, the average time $T_k$ to reach $k$ individuals from one ancestor equals $E[T_k] = \sum_{j=1}^{k}\frac{1}{\lambda_j} = \frac{1}{\lambda}\sum_{j=1}^{k}\frac{1}{j}$, which is well approximated (Abramowitz and Stegun, 1968, Section 6.3.18) as $E[T_k] \approx \frac{\log(k+1)+\gamma}{\lambda}$, where $\gamma = 0.577\,215\ldots$ is Euler's constant. If the population starts with $n$ individuals, the distribution (11.22) at time $t$ consists of a sum of $n$ i.i.d. geometric random variables, which is a negative binomial distribution. The Yule process has been employed, for example, as a crude model to estimate the spread of a disease or epidemic and the split of molecules in new species by cosmic rays.
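As a numerical illustration of (11.22), the following minimal sketch evaluates the state probabilities of a Yule process and checks that they behave like a negative binomial distribution with mean $n e^{\lambda t}$. The function names `yule_pmf` and `yule_mean` and the truncation level `kmax` are illustrative assumptions, not part of the text.

```python
from math import comb, exp

def yule_pmf(k, n, lam, t):
    """State probability s_k(t) of (11.22) for a Yule process started with n individuals."""
    if k < n:
        return 0.0                      # the population never decreases
    q = exp(-lam * t)
    return comb(k - 1, n - 1) * q**n * (1.0 - q)**(k - n)

def yule_mean(n, lam, t, kmax=2000):
    """Numerical mean population size; the infinite sum is truncated at kmax (assumption)."""
    return sum(k * yule_pmf(k, n, lam, t) for k in range(n, kmax + 1))

# Three ancestors, birth rate 0.5 per individual, observed at t = 2.
total = sum(yule_pmf(k, 3, 0.5, 2.0) for k in range(3, 2001))
mean = yule_mean(3, 0.5, 2.0)
```

For moderate $\lambda t$ the truncated sums should agree closely with the closed forms (total probability 1 and mean $n e^{\lambda t}$); for large $\lambda t$ the truncation level would need to grow.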
11.3.3 Constant rate birth and death process
In a constant rate birth and death process, both the birth rate $\lambda_k = \lambda$ and death rate $\mu_k = \mu$ are constant for any state $k$. From (11.16), the steady-state for all states $j$ with $\rho = \frac{\lambda}{\mu} < 1$,
$$\pi_j = (1-\rho)\,\rho^{j}, \qquad j \geq 0 \tag{11.23}$$
only depends on the ratio of birth over death rate. The time-dependent constant rate birth and death process can still be computed in analytic form. In this case, the matrix form of the infinitesimal generator $Q$ has the tri-band Toeplitz structure, which can be diagonalized in analytic form as shown in Appendix A.5.2.1. In this section, we present an alternative approach. Instead of dealing with an infinite set of difference equations, a generating function approach seems more convenient. Let us denote the generating function of the Laplace transforms $S_k(z)$ by
$$\varphi(x,z) = \sum_{k=0}^{\infty} S_k(z)\,x^{k} \tag{11.24}$$
Using (11.12) into (11.24) gives
$$\varphi(x,z) = \sum_{k=0}^{\infty}\int_{0}^{\infty} e^{-zt}s_k(t)\,x^{k}\,dt = \int_{0}^{\infty} e^{-zt}\sum_{k=0}^{\infty}s_k(t)\,x^{k}\,dt$$
where the reversal of summation and integration is allowed because all terms are positive. Since $0 \leq s_k(t) \leq 1$, the sum is at least convergent for $|x| < 1$, which shows that $\varphi(x,z)$ is analytic inside the unit circle $|x| < 1$ for any $\mathrm{Re}(z) > 0$.
After multiplying (11.14) by $x^k$ and summing over all $k$, we obtain
$$(\lambda+z)\,S_0(z) + (\lambda+\mu+z)\sum_{k=1}^{\infty}S_k(z)x^{k} = \sum_{k=0}^{\infty}s_k(0)x^{k} + \mu S_1(z) + \lambda\sum_{k=1}^{\infty}S_{k-1}(z)x^{k} + \mu\sum_{k=1}^{\infty}S_{k+1}(z)x^{k}$$
and, written in terms of $\varphi(x,z)$,
$$\varphi(x,z) = \frac{-\sum_{k=0}^{\infty}s_k(0)x^{k+1} + \mu(1-x)\,S_0(z)}{\lambda x^{2} - x(\lambda+\mu+z) + \mu}$$
Note that in the general case $\lambda_k$ and $\mu_k$, an expression in terms of $\varphi(x,z)$ is not possible. Before continuing with the computations, we make the additional simplification that $s_k(0) = \delta_{kj}$: we assume that the constant rate birth and death process starts in state $j$. With this initial condition, the generating function
$$\varphi(x,z) = \frac{\mu(1-x)\,S_0(z) - x^{j+1}}{\lambda x^{2} - x(\lambda+\mu+z) + \mu} \tag{11.25}$$
still depends on the unknown function $S_0(z)$. The following derivation involving the theory of complex functions demonstrates a standard procedure that will also be useful in other queuing problems.
The denominator in (11.25) has two roots,
$$x_1 = \frac{\lambda+\mu+z}{2\lambda} + \frac{1}{2\lambda}\sqrt{(\lambda+\mu+z)^{2} - 4\mu\lambda}$$
$$x_2 = \frac{\lambda+\mu+z}{2\lambda} - \frac{1}{2\lambda}\sqrt{(\lambda+\mu+z)^{2} - 4\mu\lambda}$$
We need the powerful theorem of Rouché (Titchmarsh, 1964, p. 116) to deduce more on the location of $x_1$ and $x_2$.
Theorem 11.3.1 (Rouché) If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $C$, and $|g(z)| < |f(z)|$ on $C$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros inside $C$.
Choose $f(x) = -x(\lambda+\mu+z) + \mu$ and $g(x) = \lambda x^{2}$ such that $f(x) + g(x) = \lambda x^{2} - x(\lambda+\mu+z) + \mu$, the denominator in (11.25). Since both $f(x)$ and $g(x)$ are polynomials, they are analytic everywhere in the complex $x$-plane. We know that $\varphi(x,z)$ is analytic inside the unit disk. If the roots $x_1$ or $x_2$ lie inside the unit disk, the numerator in (11.25) must have zeros at precisely the same place in order for $\varphi(x,z)$ to be analytic inside the unit disk. Hence, we consider as contour $C$ in Rouché's Theorem the unit circle $|x| = 1$. Clearly, $f(x)$ has one single zero $\frac{\mu}{\lambda+\mu+z}$ inside the unit circle (because $\lambda > 0$, $\mu > 0$ and $\mathrm{Re}(z) > 0$). Furthermore, on the unit circle $|x| = 1$,
$$|\mu - x(\lambda+\mu+z)| \geq \big|\,\mu - |x|\,|\lambda+\mu+z|\,\big| = \big|\,\mu - |\lambda+\mu+z|\,\big| > \lambda = |\lambda x^{2}|$$
which shows that $|g(x)| < |f(x)|$ on the unit circle. Rouché's Theorem then tells us that $f(x) + g(x)$ has precisely one zero inside the unit circle. This implies that $|x_1| > 1$ and $|x_2| < 1$ and that the numerator in (11.25) has a zero $x_2$,
$$\mu(1-x_2)\,S_0(z) - x_2^{j+1} = 0$$
This relation determines the unknown function $S_0(z)$ as
$$S_0(z) = \frac{x_2^{j+1}}{\mu(1-x_2)}$$
such that (11.25) becomes
$$\varphi(x,z) = \frac{\frac{1-x}{1-x_2}\,x_2^{j+1} - x^{j+1}}{\lambda(x-x_1)(x-x_2)} = \frac{x_2^{j+1}(1-x) - x^{j+1}(1-x_2)}{\lambda(1-x_2)(x-x_1)(x-x_2)}$$
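The location of the two roots and the resulting $S_0(z)$ can be checked numerically. The sketch below is only an illustration under assumed parameter values; the helper names `roots_x1_x2` and `S0` are hypothetical. It labels the roots by modulus, so that the result does not depend on the branch of the complex square root.

```python
import cmath

def roots_x1_x2(lam, mu, z):
    """Roots of lam*x**2 - x*(lam + mu + z) + mu = 0, the denominator of (11.25)."""
    b = lam + mu + z
    disc = cmath.sqrt(b * b - 4.0 * lam * mu)
    r_plus = (b + disc) / (2.0 * lam)
    r_minus = (b - disc) / (2.0 * lam)
    # Label by modulus so that |x1| > 1 > |x2| whatever the square-root branch.
    return (r_plus, r_minus) if abs(r_plus) >= abs(r_minus) else (r_minus, r_plus)

def S0(lam, mu, z, j):
    """Laplace transform S_0(z) when the process starts in state j."""
    _, x2 = roots_x1_x2(lam, mu, z)
    return x2 ** (j + 1) / (mu * (1.0 - x2))

lam, mu, z = 0.8, 1.0, 0.3 + 0.5j       # assumed values with Re(z) > 0
x1, x2 = roots_x1_x2(lam, mu, z)
```

As a further sanity check, for small real $z$ and start state $j = 0$ the product $z\,S_0(z)$ should approach the steady-state probability $\pi_0 = 1 - \rho$ from (11.23).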
We know that the numerator can be divided by $(x - x_2)$, or explicitly,
$$x_2^{j+1}(1-x) - x^{j+1}(1-x_2) = x_2^{j+1} - x^{j+1} + x_2\,x\,(x^{j} - x_2^{j})$$
$$= (x-x_2)\left[-\sum_{k=0}^{j}x_2^{j-k}x^{k} + x_2\,x\sum_{k=0}^{j-1}x_2^{j-1-k}x^{k}\right]$$
$$= (x-x_2)\left[-x_2^{j} + (x_2-1)\sum_{k=1}^{j}x_2^{j-k}x^{k}\right]$$
Finally,
$$\varphi(x,z) = \frac{-x_2^{j} + (x_2-1)\sum_{k=1}^{j}x_2^{j-k}x^{k}}{\lambda(1-x_2)(x-x_1)} \tag{11.26}$$
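By expanding the denominator in a Taylor series around $x = 0$, $\frac{1}{x-x_1} = -\frac{1}{x_1}\sum_{m=0}^{\infty}x_1^{-m}x^{m}$, and denoting
$$a_0 = x_2^{j}, \qquad a_k = (1-x_2)\,x_2^{j-k} \quad (1 \leq k \leq j), \qquad a_k = 0 \quad (k > j)$$
we have
$$\varphi(x,z) = \frac{1}{\lambda x_1(1-x_2)}\left(\sum_{k=0}^{\infty}a_k x^{k}\right)\left(\sum_{m=0}^{\infty}x_1^{-m}x^{m}\right) = \frac{1}{\lambda x_1(1-x_2)}\sum_{k=0}^{\infty}\left(\sum_{m=0}^{k}\frac{a_{k-m}}{x_1^{m}}\right)x^{k}$$
Comparing with (11.24) and equating the corresponding powers in $x$, we find an explicit form of the Laplace transforms of the probabilities that the birth and death process is in state $k$,
$$S_k(z) = \frac{1}{\lambda x_1(1-x_2)}\sum_{m=0}^{k}\frac{a_{k-m}}{x_1^{m}} = \frac{x_2^{j}}{\lambda(1-x_2)\,x_1^{k+1}} + \frac{x_2^{j}}{\lambda x_1^{k+1}}\sum_{i=1}^{\min(k,j)}\left(\frac{x_1}{x_2}\right)^{i}$$
$$= \frac{x_2^{j}}{\lambda(1-x_2)\,x_1^{k+1}} + \frac{x_2^{\,j-M}\left(x_1^{M} - x_2^{M}\right)}{\lambda x_1^{k}\,(x_1-x_2)}, \qquad M = \min(k,j)$$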
This expression can be put in different forms by using relations among the zeros $x_1$ and $x_2$, such as $x_1 + x_2 = \frac{\lambda+\mu+z}{\lambda}$ and $x_1 x_2 = \frac{\mu}{\lambda}$. This ingenuity is required to recognize in $S_k(z)$ a known Laplace transform. Otherwise, one has to proceed by computing the inverse Laplace transform by contour integration via (2.38). In any case, the computation needs advanced skills in complex function theory and we content ourselves here to present the result without derivation (see e.g. Cohen (1969, pp. 80-82)),
$$s_k(t) = e^{-(\lambda+\mu)t}\left[\rho^{(k-j)/2}\,I_{k-j}(at) + \rho^{(k-j-1)/2}\,I_{k+j+1}(at)\right] + e^{-(\lambda+\mu)t}\,\rho^{k}(1-\rho)\sum_{m=k+j+2}^{\infty}\rho^{-m/2}\,I_m(at) \tag{11.27}$$
where $\rho = \frac{\lambda}{\mu}$, $a = 2\mu\sqrt{\rho}$ and where $I_s(z)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6.1). Using the asymptotic formulas for the modified Bessel function, the behavior of $s_k(t)$ for large $t$ can be derived (see e.g. Cohen (1969, p. 84)),
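$$s_k(t) = (1-\rho)\rho^{k} + \frac{\rho^{(k-j)/2}\,e^{-(1-\sqrt{\rho})^{2}\mu t}}{2\sqrt{\pi}\left(\sqrt{\rho}\,\mu t\right)^{3/2}}\left(j - \frac{\sqrt{\rho}}{1-\sqrt{\rho}}\right)\left(k - \frac{\sqrt{\rho}}{1-\sqrt{\rho}}\right)\left[1 + O(t^{-1})\right]$$
$$= \frac{1}{\sqrt{\pi\mu t}}\left[1 + O(t^{-1})\right] \qquad \text{only if } \rho = 1$$
This expression demonstrates that the constant rate birth and death process converges to the steady-state $(1-\rho)\rho^{k}$ with a relaxation rate $\mu\left(1-\sqrt{\rho}\right)^{2}$.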
Clearly, the higher $\rho$, the lower the relaxation rate and the slower the process tends to equilibrium as illustrated in Fig. 11.3. Intuitively, two effects play a role. Since the probability that states with large $k$ are visited increases with increasing $\rho$, the build-up time for this occupation will be larger. In addition, the variability of the number of visited states (further derived for the M/M/1 queue in Section 14.1) increases with increasing $\rho$, which suggests that larger oscillations of the sample paths around the steady-state are likely to occur, enlarging the convergence time.

[Figure 11.3: curves of $s_4(t,\rho)$ versus $t$ (from 0 to 100) for $\rho$ = 0.2, 0.4, 0.6, 0.7, 0.8, 0.9; insert: relaxation time versus $\rho$]
Fig. 11.3. The probability $s_4(t)$ that the process is in state 4 given that it started from state 0 as function of time (in units of average death time, $\mu = 1$) for various $\rho = \frac{\lambda}{\mu}$. The insert shows the relaxation time $\tau = \left(1-\sqrt{\rho}\right)^{-2}$ (in units of average death time, $\mu = 1$). The corresponding steady-state probabilities $\pi_4$ are 0.0012, 0.015, 0.051, 0.072, 0.082, 0.065 for $\rho$ = 0.2, 0.4, 0.6, 0.7, 0.8, 0.9 respectively. Observe that for $\rho = 0.9$, the plotted 100 time units are smaller than the relaxation time, which is 379 time units.
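The numbers quoted with Fig. 11.3 can be reproduced from the steady-state (11.23) and the relaxation rate $\mu(1-\sqrt{\rho})^2$. The short sketch below does so; the function names are illustrative assumptions.

```python
from math import sqrt

def steady_state(k, rho):
    """Steady-state probability (1 - rho) * rho**k of (11.23)."""
    return (1.0 - rho) * rho**k

def relaxation_time(rho, mu=1.0):
    """Inverse of the relaxation rate mu * (1 - sqrt(rho))**2."""
    return 1.0 / (mu * (1.0 - sqrt(rho))**2)

for rho in (0.2, 0.4, 0.6, 0.7, 0.8, 0.9):
    print(f"rho={rho:.1f}  pi_4={steady_state(4, rho):.4f}  tau={relaxation_time(rho):.1f}")
```

The relaxation time grows without bound as $\rho \to 1$, which matches the slow convergence visible for the largest loads in the figure.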
11.4 A random walk on a graph
Let $G(N, L)$ denote a graph with $N$ nodes and $L$ links. Suppose that the link weight $w_{ij} = w_{ji}$ of an edge from node $i \to j$ (or vice versa) is proportional to the transition probability $P_{ij}$ that a packet at node $i$ decides to move to node $j$. Clearly, $w_{ii} = 0$. Specifically, with (9.8),
$$P_{ij} = \frac{w_{ij}}{\sum_{k=1}^{N}w_{ik}}$$
This constraint (9.8) destroys the symmetry in link weight structure ($w_{ij} = w_{ji}$) because, in general, $P_{ij} \neq P_{ji}$ since $\sum_{k=1}^{N}w_{ik} \neq \sum_{k=1}^{N}w_{jk}$. The sequence of nodes (or links) visited by that packet resembles a random walk on the graph $G(N, L)$ and constitutes a Markov chain. Moreover, the steady-state of this Markov process is readily obtained by observing that the chain is time reversible. Indeed, the condition for time reversibility (10.27) becomes
$$\frac{\pi_i\,w_{ij}}{\sum_{k=1}^{N}w_{ik}} = \frac{\pi_j\,w_{ji}}{\sum_{k=1}^{N}w_{jk}}$$
or, since $w_{ij} = w_{ji}$,
$$\frac{\pi_i}{\sum_{k=1}^{N}w_{ik}} = \frac{\pi_j}{\sum_{k=1}^{N}w_{jk}}$$
This implies that $\pi_i = \alpha\sum_{k=1}^{N}w_{ik}$ and using the normalization $\|\pi\|_1 = 1$, we obtain the steady-state probabilities for all nodes $i$,
$$\pi_i = \frac{\sum_{k=1}^{N}w_{ik}}{\sum_{i=1}^{N}\sum_{k=1}^{N}w_{ik}} = \frac{\sum_{k=1}^{N}w_{ik}}{2\sum_{i=1}^{N}\sum_{k=i+1}^{N}w_{ik}}$$
This Markov process can model an active packet that monitors the network by collecting state information (number of packets, number of lost or retransmitted packets, etc.) in each router. Of course, the link weight structure $w_{ij}$ for the active packet is decisive and requires additional information to be chosen efficiently. For example, for traffic monitoring, the distribution of the number of packets forwarded by each router must be obtained. For the collection of these data, the active packet should in steady-state visit all nodes about equally frequently or $\pi_i = \frac{1}{N}$, implying that the Markov transition matrix $P$ must be doubly stochastic (see Appendix A.5.1).
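The steady-state formula for the random walk can be verified on a small example. The following sketch builds the transition probabilities from a symmetric weight matrix and checks both stationarity and time reversibility; the function name `random_walk_chain` and the 4-node weight matrix are assumptions chosen for illustration.

```python
def random_walk_chain(w):
    """Return (P, pi) for a random walk on a graph with symmetric link weights w[i][j]."""
    n = len(w)
    strength = [sum(row) for row in w]      # sum_k w_ik for each node i
    total = sum(strength)                   # equals twice the sum over all links
    P = [[w[i][j] / strength[i] for j in range(n)] for i in range(n)]
    pi = [s / total for s in strength]
    return P, pi

# Hypothetical 4-node graph with symmetric weights and zero diagonal.
w = [[0, 1, 2, 0],
     [1, 0, 1, 3],
     [2, 1, 0, 1],
     [0, 3, 1, 0]]
P, pi = random_walk_chain(w)

# One step of the chain started in pi should reproduce pi.
pi_next = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
```

If all nodes have the same total weight, the resulting $\pi$ is uniform, which corresponds to the doubly stochastic case mentioned above.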
11.5 Slotted Aloha
The Aloha protocol is a basic example of a multiple access communication scheme of which Ethernet (see footnote 2) is considered as the direct descendant. Aloha, which means "hello" in the Hawaiian language, was invented by Norman Abramson at the University of Hawaii at the beginning of the 1970s to provide packet-switched radio communication between a central computer and various data terminals at the campus. Slotted Aloha is a discrete-time version of the pure Aloha protocol, where all transmitted packets have equal length and where each packet requires one timeslot for transmission.
Consider a network consisting of $N$ nodes that can communicate with each other via a shared communication channel (e.g. a radio channel) using the slotted Aloha protocol. The simplest arrival process $A$ of packets at each node is a Poisson process. We assume that these Poisson arrivals at a node are independent from the Poisson arrivals at another node and that all Poisson arrivals at a node have the same rate $\frac{\lambda}{N}$ where $\lambda$ is the overall arrival rate at the network of $N$ nodes. The idea of the Aloha protocol is that, upon receipt of a packet, the node transmits that newly arrived packet in the next timeslot. In case two nodes happen to transmit a packet in the same timeslot, a collision occurs, which results in a retransmission of the packets. A node with a packet that must be retransmitted is said to be backlogged. Even if new packets arrive at a backlogged node, the retransmitted packet is the first one to be transmitted and, for simplicity (to ignore queueing of packets at a node), we assume that those new packets are discarded. If backlogged nodes retransmitted the packet in the next timeslot, surely a new collision would occur. Therefore, backlogged nodes wait for some random number of timeslots before retransmitting. We assume, for simplicity, that $p_r$ is the probability (which is the same for all backlogged nodes) that a backlogged node retransmits in the next time slot. Moreover, the probability $p_r$ of retransmission is the same for each timeslot. The number of time slots between the occurrence of a collision and the retransmission is a geometric random variable $T_r$ (see Section 3.1.3) with parameter $p_r$ such that $\Pr[T_r = k] = p_r(1-p_r)^{k-1}$.

Footnote 2: The essential difference with the Ethernet's CSMA/CD (carrier sense multiple access with collision detection) is that Aloha does not use carrier sensing and does not stop transmitting when collisions are detected. Carrier sensing is only adequate if the nodes are near to each other (as in a local area network) such that collisions can be detected before the completion of transmission. Only then is a timely reaction possible.
11.5.1 The Markov chain
The slotted Aloha protocol constitutes a discrete-time Markov chain $X_k \in \{0, 1, \ldots, N\}$, where a state $j$ counts the number of backlogged nodes out of the $N$ nodes in total and the subscript $k$ refers to the $k$-th timeslot. Each of the $j$ backlogged nodes retransmits a packet in the next time slot with probability $p_r$, while each of the $N - j$ unbacklogged nodes will surely transmit a packet in the next time slot provided a packet arrives in the current timeslot. The latter event (at least one arrival $A$) occurs with probability $p_a = \Pr[A > 0] = 1 - \Pr[A = 0]$. If we assume that the arrival process is Poissonean, then $p_a = 1 - \exp\left(-\frac{\lambda}{N}\right)$, but the computations in this section are more generally valid.
The probability that $n$ backlogged nodes in state $j$ retransmit in the next time slot is binomially distributed
$$b_n(j) = \binom{j}{n}p_r^{n}(1-p_r)^{j-n}$$
and, similarly, the probability that $n$ unbacklogged nodes in state $j$ transmit in the next time slot is
$$u_n(j) = \binom{N-j}{n}p_a^{n}(1-p_a)^{N-j-n}$$
A packet is transmitted successfully if and only if (a) one new arrival and no backlogged packet or (b) no new arrival and one backlogged packet is transmitted. The probability of successful transmission in state $j$ and per time slot equals
$$p_s(j) = u_1(j)\,b_0(j) + u_0(j)\,b_1(j)$$
The transition probability $P_{j,j+m}$ equals
$$P_{j,j+m} = \begin{cases} u_m(j) & 2 \leq m \leq N-j \\ u_1(j)\left(1-b_0(j)\right) & m = 1 \\ u_1(j)\,b_0(j) + u_0(j)\left(1-b_1(j)\right) & m = 0 \\ u_0(j)\,b_1(j) & m = -1 \end{cases}$$
The state with $j$ backlogged nodes jumps to the state $j - 1$ with one backlogged node less if no new packets are sent and there is precisely 1 successful retransmission. The state $j$ remains in the state $j$ if there is 1 new arrival and there is no retransmission, or if there are no new arrivals and none or more than 1 retransmission. The state $j$ jumps to state $j + 1$ if there is 1 new arrival from a non-backlogged node and at least 1 retransmission because then there are surely collisions and the number of backlogged nodes increases by 1. Finally, the state $j$ jumps to state $j + m$ if $m$ new packets arrive from $m$ different non-backlogged nodes, which always causes collisions irrespective of how many backlogged nodes also retransmit in the next time slot.
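The four cases of the transition probability $P_{j,j+m}$ can be assembled into a full matrix as sketched below. This is only an illustration: the function names and the parameter values $N = 5$, $p_a = 0.1$, $p_r = 0.2$ are assumptions, not values from the text.

```python
from math import comb

def binom_pmf(n, size, p):
    """Binomial probability of n successes out of `size` trials."""
    return comb(size, n) * p**n * (1.0 - p)**(size - n)

def aloha_transition_matrix(N, p_a, p_r):
    """Transition matrix of the slotted Aloha chain; state j = number of backlogged nodes."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        b = lambda n: binom_pmf(n, j, p_r)        # b_n(j): n retransmissions
        u = lambda n: binom_pmf(n, N - j, p_a)    # u_n(j): n new transmissions
        if j >= 1:
            P[j][j - 1] = u(0) * b(1)             # one successful retransmission
        P[j][j] = u(1) * b(0) + u(0) * (1.0 - b(1))
        if j + 1 <= N:
            P[j][j + 1] = u(1) * (1.0 - b(0))     # new packet collides with a retransmission
        for m in range(2, N - j + 1):
            P[j][j + m] = u(m)                    # m new packets always collide
    return P

P = aloha_transition_matrix(5, 0.1, 0.2)
row_sums = [sum(row) for row in P]
```

Each row sums to one because the four cases partition the events $\{u_0, u_1, u_m\}$ of new transmissions, and all entries more than one position to the left of the diagonal are zero.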
The Markov chain is illustrated in Fig. 11.4, which shows that the state can only decrease by at most 1.

[Figure 11.4: states 0, 1, 2, 3, 4, ..., N on a line, with self-loops $P_{jj}$, left transitions $P_{j,j-1}$ and right transitions such as $P_{01}$, $P_{02}$, $P_{03}$, $P_{04}$, ..., $P_{0i}$]
Fig. 11.4. Graph of the Markov chain for slotted Aloha. Each state $j$ counts the number of backlogged nodes.
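The transition probability matrix $P$ has the structure
$$P = \begin{bmatrix} P_{00} & P_{01} & P_{02} & \cdots & \cdots & P_{0N} \\ P_{10} & P_{11} & P_{12} & \cdots & \cdots & P_{1N} \\ 0 & P_{21} & P_{22} & \cdots & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\ 0 & 0 & \cdots & 0 & P_{N,N-1} & P_{NN} \end{bmatrix}$$
whose eigenstructure is computed in Appendix A.5.3.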
In the asymptotic regime when $N \to \infty$, slotted Aloha has the peculiar property that the steady-state vector $\pi$ does not exist. Although for a small number of nodes $N$ the steady-state equations can be solved, when the number $N$ grows, slotted Aloha turns out to be unstable. It seems difficult to prove that $\lim_{N\to\infty}\pi = 0$, but there is another argument that suggests the truth of this awkward Aloha property. The expected change in backlog per time slot is
$$E\left[X_{k+1} - X_k \mid X_k = j\right] = (N-j)\,p_a - p_s(j) \tag{11.28}$$
and equals the expected number of new arrivals minus the expected number of successful transmissions. This quantity $E\left[X_{k+1} - X_k \mid X_k = j\right]$ is often called the drift. If the drift is positive for all timeslots $k$, the Markov chain moves (on average) to higher states or to the right in Fig. 11.4. Since $p_s(j) \leq 1$ and $p_a = 1 - \exp\left(-\frac{\lambda}{N}\right)$, it follows that
$$\lim_{N\to\infty}E\left[X_{k+1} - X_k \mid X_k = j\right] = \infty$$
Thus, the drift tends to infinity, which means that, on average, the number of backlogged nodes increases unboundedly and suggests (but does not prove; a counterexample is given in problem (ii) of Section 9.4) that the Markov chain is transient for $N \to \infty$.
A more detailed discussion and engineering approaches to cure this instability are found in Bertsekas and Gallager (1992, Chapter 4). The interest of the analysis of slotted Aloha lies in the fact that other types of multiple access protocols, such as the important class of carrier sense multiple access (CSMA) protocols, can be deduced in a similar manner. Of the CSMA class with collision detection, Ethernet is by far the most important because it is the basis of local area networks. Multiple access protocols of the CSMA/CD type are discussed in our book Data Communications Networking (Van Mieghem, 2004a).
11.5.2 Efficiency of slotted Aloha and the offered traffic G
We now investigate the probability of a successful transmission in state $j$ in more detail,
$$p_s(j) = u_1(j)\,b_0(j) + u_0(j)\,b_1(j)$$
$$= (N-j)\,p_a(1-p_a)^{N-j-1}(1-p_r)^{j} + j\,p_r(1-p_r)^{j-1}(1-p_a)^{N-j}$$
$$= \left[\frac{(N-j)\,p_a}{1-p_a} + \frac{j\,p_r}{1-p_r}\right](1-p_a)^{N-j}(1-p_r)^{j}$$
For small arrival probability $p_a$ and small retransmission probability $p_r$, the probability of successful transmission in state $j$ can be approximated by using the Taylor expansions of $(1-x)^{\alpha} = e^{\alpha\ln(1-x)} = e^{-\alpha x}\left(1+o(1)\right)$ and $\frac{x}{1-x} = x + O\left(x^{2}\right)$ as
$$p_s(j) = \left[(N-j)\,p_a + j\,p_r + O\left(p_a^{2} + p_r^{2}\right)\right]e^{-[(N-j)p_a + jp_r]}\left(1+o(1)\right)$$
$$= \left[(N-j)\,p_a + j\,p_r\right]\exp\left(-\left[(N-j)\,p_a + j\,p_r\right]\right)\left(1+o(1)\right)$$
Similarly, the probability that no packet is transmitted in state $j$ equals
$$p_{no}(j) = u_0(j)\,b_0(j) = (1-p_r)^{j}(1-p_a)^{N-j} = \exp\left(-\left[(N-j)\,p_a + j\,p_r\right]\right)\left(1+o(1)\right)$$
Hence, for small $p_a$ and small $p_r$, the probability of successful transmission and of no transmission in state $j$ is well approximated by
$$p_s(j) \simeq t(j)\,e^{-t(j)}, \qquad p_{no}(j) \simeq e^{-t(j)}$$
Now, $t(j) = (N-j)\,p_a + j\,p_r$ is the expected number of arrivals and retransmissions in state $j$ or, equivalently, the total rate of transmission attempts in state $j$. That total rate of transmissions in state $j$, $t(j)$, is also called the offered traffic $G$. The analysis shows that, for small $p_a$ and small $p_r$, $p_s(j)$ and $p_{no}(j)$ are closely approximated in terms of a Poisson random variable with rate $t(j)$. Moreover, the probability of successful transmission $p_s(j)$ can be interpreted as the departure rate from state $j$ or the throughput $S_{SAloha} = Ge^{-G}$, which is maximized if $G = t(j) = 1$. By controlling $p_r$ to achieve $t(j) = (N-j)\,p_a + j\,p_r = 1$, slotted Aloha performs with highest throughput. The efficiency $\eta_{SAloha}$ of slotted Aloha with many nodes $N > 1$ is defined as the maximum fraction of time during which packets are transmitted successfully, which is $\max p_s(j) = e^{-1}$. Hence, $\eta_{SAloha} \approx 36.8\%$.
Pure Aloha (Bertsekas and Gallager, 1992), where the nodes can start transmitting at arbitrary times instead of only at the beginning of time slots, only performs half as efficiently as slotted Aloha with $\eta_{PAloha} \approx 18.4\%$. Recall that each packet is assumed to have an equal length that corresponds with the length of one timeslot. In pure Aloha, a transmitted packet at time $t$ is successful if no other packet is sent during $(t-1, t+1)$. This time interval is precisely equal to two timeslots in slotted Aloha, which explains why $\eta_{PAloha} = \frac{1}{2}\eta_{SAloha}$. The same observation tells us that, in pure Aloha, $p_{no}(j) \simeq e^{-2t(j)}$ because in the successful interval the expected number of arrivals and retransmissions is twice that in slotted Aloha. The throughput $S$ roughly equals the total rate of transmission attempts $G$ (which is the same as in slotted Aloha) multiplied by $p_{no}(j) \simeq e^{-2t(j)}$, hence, $S_{PAloha} = Ge^{-2G}$.
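To see how good the Poisson-type approximation is, the exact success probability $p_s(j)$ can be compared with $t(j)e^{-t(j)}$, and the two throughput curves can be evaluated at their maxima. The sketch below is illustrative; the function names and the parameter values ($N = 100$, $j = 20$, $p_a = 0.005$, $p_r = 0.03$) are assumptions chosen such that $t(j) = 1$.

```python
from math import exp

def p_success_exact(j, N, p_a, p_r):
    """Exact p_s(j) = u_1(j) b_0(j) + u_0(j) b_1(j)."""
    new = (N - j) * p_a * (1 - p_a)**(N - j - 1) * (1 - p_r)**j
    old = j * p_r * (1 - p_r)**(j - 1) * (1 - p_a)**(N - j)
    return new + old

def offered_traffic(j, N, p_a, p_r):
    """t(j): expected number of transmission attempts in state j."""
    return (N - j) * p_a + j * p_r

def throughput_slotted(G):
    return G * exp(-G)          # S_SAloha = G e^{-G}

def throughput_pure(G):
    return G * exp(-2.0 * G)    # S_PAloha = G e^{-2G}

G = offered_traffic(20, 100, 0.005, 0.03)
exact = p_success_exact(20, 100, 0.005, 0.03)
approx = throughput_slotted(G)
```

Slotted Aloha attains its maximum $e^{-1}$ at $G = 1$, whereas $Ge^{-2G}$ is maximal at $G = \frac{1}{2}$ with value $\frac{1}{2}e^{-1}$, consistent with the factor of two between both efficiencies.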
11.6 Ranking of webpages
To retrieve webpages related to a user’s query, current websearch engines
first perform a search similar to that in text processors to find all webpages
containing the query terms. Due to the massive size of the World Wide Web,
this first action can result in a huge number of retrieved webpages. Several
thousands of webpages related to a query are not uncommon. To reduce the
list of webpages, many websearch engines apply a ranking criterion to sort
this list. In this section, we discuss PageRank, the hyperlink-based ranking
system used by the Google search engine. PageRank elegantly exploits the
power of discrete Markov theory.
11.6.1 A Markov model of the web
The hyperlink structure of the World Wide Web can be viewed as a directed
graph with Q nodes. Each node in the webgraph represents a certain webpage and the directed edges represent hyperlinks. Let us consider a small
collection of webpages as in Fig. 11.5 to illustrate the underlying idea of
PageRank, invented by Brin and Page, the founders of Google.
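[Figure 11.5: (a) a directed graph on five webpages, labeled 1 to 5; (b) the corresponding transition probability matrix]
$$P = \begin{bmatrix} 0 & P_{12} & 0 & P_{14} & P_{15} \\ 0 & 0 & P_{23} & P_{24} & P_{25} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & P_{43} & 0 & 0 \\ P_{51} & P_{52} & 0 & 0 & 0 \end{bmatrix}$$
Fig. 11.5. A subgraph of the World Wide Web (a) and the corresponding transition probability matrix $P$ (b).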
The topology of any graph is determined by an adjacency matrix (see Appendix B.1). A reasonable criterion to assess the importance of a webpage is the number of times that this webpage is visited. This criterion suggests considering a discrete Markov chain whose transition probability matrix $P$ corresponds to the adjacency matrix of the webgraph (shown in (b) in Fig. 11.5). The element $P_{ij}$ of the Markov transition probability matrix is the probability of moving from webpage $i$ (state $i$) to webpage $j$ (state $j$) in one time step. The component $s_i[k]$ of the corresponding state vector $s[k]$ denotes the probability that at time $k$ the webpage $i$ is visited. The long run mean fraction of time that webpage $i$ is visited equals the steady-state probability $\pi_i$ of the Markov chain. This probability $\pi_i$ is the ranking measure of the importance of webpage $i$ used in Google. The basic idea is indeed simple, but we have not shown yet how to determine the elements $P_{ij}$ nor whether the steady-state probability vector $\pi$ exists. In particular, we will demonstrate that guaranteeing that the steady-state vector exists and that $\pi$ can be computed for a Markov chain containing several billion states (the order of magnitude of the number $N$ of webpages) requires a deeper knowledge of discrete Markov chains.
To start determining the elements $P_{ij}$, we assume that, given we are on webpage $i$, any hyperlink on that webpage has equal probability to be clicked on. This assumption implies that $P_{ij} = \frac{1}{d_i}$ where the degree $d_i$ of a node $i$ (see also Section 15.3) equals the number of adjacent neighbors of node $i$ in the webgraph. This number $d_i$ is thus equal to the number of hyperlinks on webpage $i$. The transition probability matrix in Fig. 11.5 then becomes
$$P = \begin{bmatrix} 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \end{bmatrix}$$
The uniformity assumption is in most cases the best we can make if no additional information is available. If, for example, web usage information is available showing that a random surfer accessing page 2 is twice as likely to jump to page 4 than to any other neighboring webpage of 2, then the second row can be replaced by $\left(0 \;\; 0 \;\; \frac{1}{4} \;\; \frac{1}{2} \;\; \frac{1}{4}\right)$.
When solely adopting the adjacency matrix of the webgraph as underlying structure of the Markov transition probability matrix $P$, we cannot assure that $P$ is a stochastic matrix. For example, it often occurs that a node such as node 3 in our example in Fig. 11.5 does not contain outlinks. Such nodes are called dangling nodes. For example, many webpages may point to an important document on the web, which itself does not refer to any other webpage. The corresponding row in $P$ possesses only zero elements, which violates the basic law (9.8) of a stochastic matrix. To rectify the deviation from a stochastic matrix, each zero row must be replaced by a particular non-zero row vector $v^T$ (see footnote 3) that obeys (9.8), i.e. $\|v\|_1 = v^T u = 1$ where $u^T = [1 \; 1 \; \cdots \; 1]$. Again, the simplest recipe is to invoke uniformity and to replace any zero row by $v^T = \frac{u^T}{N}$. In our example, we replace the third row by $\left(\frac{1}{5} \;\; \frac{1}{5} \;\; \frac{1}{5} \;\; \frac{1}{5} \;\; \frac{1}{5}\right)$ and obtain
$$\bar{P} = \begin{bmatrix} 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\ 0 & 0 & 1 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \end{bmatrix}$$
However, this adjustment is not sufficient to ensure the existence of a steady-state vector $\pi$. In Section 9.3.1, we have shown that, if the Markov chain is irreducible, the steady-state vector exists. In an irreducible Markov chain any state is reachable from any other (Section 9.2.1.1). By its very nature, the World Wide Web leads almost surely to a reducible Markov chain. In order to create an irreducible matrix, Brin and Page have considered
$$\bar{\bar{P}} = \alpha\bar{P} + (1-\alpha)\frac{u\,u^T}{N}$$
where $0 < \alpha < 1$, $u\,u^T$ is an $N \times N$ matrix with each element equal to 1 and $\bar{P}$ is the previously adjusted matrix without zero-rows. The linear combination of the stochastic matrix $\bar{P}$ and a stochastic perturbation matrix $\frac{u\,u^T}{N}$ ensures that $\bar{\bar{P}}$ is an irreducible stochastic matrix. Every node is now directly connected (reachable in one step) to any other (because of $u\,u^T$), which makes the Markov chain irreducible with aperiodic, positive recurrent states (see Fig. 9.1). Slightly more generally, we can replace the matrix $\frac{u\,u^T}{N}$ by $u\,v^T$, where $v$ is a probability vector as above but where we must additionally require that each component of $v$ is non-zero in order to guarantee reachability. Brin and Page have called $v^T$ the personalization vector, which makes it possible to deviate from uniformity. Hence, we arrive at the Brin and Page Markov transition probability matrix
$$\bar{\bar{P}} = \alpha\bar{P} + (1-\alpha)\,u\,v^T \tag{11.29}$$

Footnote 3: We use the normal vector algebra convention, but remark that the stochastic vectors $\pi$ and $s[k]$ are also row vectors (without the transpose sign)!
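For $v^T = \left[\frac{1}{16} \;\; \frac{4}{16} \;\; \frac{6}{16} \;\; \frac{4}{16} \;\; \frac{1}{16}\right]$ and $\alpha = \frac{4}{5}$, the probability transition matrix in our example (with the dangling third row replaced by $v^T$) becomes
$$\bar{\bar{P}} = \begin{bmatrix} \frac{1}{80} & \frac{19}{60} & \frac{3}{40} & \frac{19}{60} & \frac{67}{240} \\ \frac{1}{80} & \frac{1}{20} & \frac{41}{120} & \frac{19}{60} & \frac{67}{240} \\ \frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \\ \frac{1}{80} & \frac{1}{20} & \frac{7}{8} & \frac{1}{20} & \frac{1}{80} \\ \frac{33}{80} & \frac{9}{20} & \frac{3}{40} & \frac{1}{20} & \frac{1}{80} \end{bmatrix}$$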
If the presented method were implemented, the initially very sparse matrix $P$ would be replaced by the dense matrix $\bar{\bar{P}}$, which for the size $N$ of the web would increase storage dramatically. Therefore, a more effective way is to define a special vector $r$ whose component $r_j = 1$ if row $j$ in $P$ is a zero-row, i.e. if node $j$ is a dangling node. Then, $\bar{P} = P + r\,v^T$ is a rank-one update of $P$ and so is $\bar{\bar{P}}$ because
$$\bar{\bar{P}} = \alpha\left(P + r\,v^T\right) + (1-\alpha)\,u\,v^T = \alpha P + \left(\alpha r + (1-\alpha)u\right)v^T$$
11.6.2 Computation of the PageRank steady-state vector
The steady-state vector $\pi$ obeys the eigenvalue equation (9.22), thus $\pi = \pi\bar{\bar{P}}$. Rather than solving this equation, Brin and Page propose to compute the steady-state vector from $\pi = \lim_{k\to\infty}s[k]$. Specifically, for any starting vector $s[0]$ (usually $s[0] = \frac{u^T}{N}$), we iterate the equation (9.6) $m$ times and choose $m$ sufficiently large such that $\|s[m] - \pi\| \leq \epsilon$ where $\epsilon$ is a prescribed tolerance. Before turning to the convergence of the iteration process that actually computes powers of $\bar{\bar{P}}$ as observed from (9.9), we first concentrate on the basic iteration (9.6),
$$s[k+1] = s[k]\,\bar{\bar{P}} = s[k]\left(\alpha P + \left(\alpha r + (1-\alpha)u\right)v^T\right)$$
Since $s[k]\,u = 1$, we find
$$s[k+1] = \alpha\,s[k]\,P + \left(\alpha\,s[k]\,r + (1-\alpha)\right)v^T \tag{11.30}$$
This formula indicates that only the product of $s[k]$ with the (extremely) sparse matrix $P$ needs to be computed and that $\bar{P}$ and $\bar{\bar{P}}$ are never formed nor stored. As shown in Appendix A.4.3, the rate of convergence of a Markov chain towards the steady-state is determined by the second largest eigenvalue. Furthermore, Lemma A.4.4 demonstrates that, for any personalization vector $v^T$, the second largest eigenvalue of $\bar{\bar{P}}$ is $\alpha\lambda_2$, where $\lambda_2$ is the second largest eigenvalue of $\bar{P}$. Lemma A.4.4 thus shows that, by choosing $\alpha$ in (11.29) appropriately, the iteration (11.30) converges at least as fast as $\alpha^{k}$ (since $\lambda_2 < 1$ for irreducible and $\lambda_2 = 1$ for reducible Markov chains) towards the steady-state vector $\pi$. Brin and Page report that only 50 to 100 iterations of (11.30) for $\alpha = 0.85$ are sufficient. Clearly, a fast convergence is found for small $\alpha$, but then (11.29) shows that the true characteristics of the webgraph are suppressed.
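The iteration (11.30) can be sketched in a few lines for the five-page example of Fig. 11.5. The code below is an illustration under stated assumptions, not the production algorithm: the function name `pagerank`, the dictionary-based sparse storage, the stopping rule on successive iterates, and the choice of $v$ as replacement row for dangling nodes are all assumptions.

```python
def pagerank(P_rows, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate s[k+1] = alpha*s[k]P + (alpha*s[k]r + 1 - alpha) v^T, as in (11.30).

    P_rows[i] is a dict {j: P_ij} holding only the non-zero entries of the sparse P.
    """
    N = len(P_rows)
    dangling = [i for i in range(N) if not P_rows[i]]   # rows with r_i = 1
    s = [1.0 / N] * N                                   # s[0] = u^T / N
    for _ in range(max_iter):
        sP = [0.0] * N
        for i, row in enumerate(P_rows):
            for j, p in row.items():
                sP[j] += s[i] * p                       # sparse product s[k] P
        sr = sum(s[i] for i in dangling)                # scalar s[k] r
        nxt = [alpha * sP[j] + (alpha * sr + 1.0 - alpha) * v[j] for j in range(N)]
        if sum(abs(a - b) for a, b in zip(nxt, s)) <= tol:
            return nxt
        s = nxt
    return s

# Example of Fig. 11.5 (pages 1..5 stored as indices 0..4); page 3 is dangling.
P_rows = [
    {1: 1/3, 3: 1/3, 4: 1/3},
    {2: 1/3, 3: 1/3, 4: 1/3},
    {},
    {2: 1.0},
    {0: 0.5, 1: 0.5},
]
v = [1/16, 4/16, 6/16, 4/16, 1/16]
pi = pagerank(P_rows, v, alpha=0.8)
```

Only the sparse rows of $P$ are touched in each iteration; the dense matrix $\bar{\bar{P}}$ is never built. The returned vector should be a fixed point of the dense example matrix given for $\alpha = \frac{4}{5}$.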
This brings us to a final remark concerning the irreducibility approach. The original method of Brin and Page that resulted in (11.29) by enforcing that each node is connected to each other alters the true nature of the webgraph even though the "connectivity strength" to create irreducibility is extremely small, $\frac{1}{N}$ in the case $v^T = \frac{u^T}{N}$. Instead of maximally connecting all nodes, another irreducibility approach of minimally connecting nodes, investigated by Langville and Meyer (2005), consists of creating one dummy node that is connected to all other nodes and to which all other nodes are connected to ensure overall reachability. Such an approach changes the webgraph less. The large size $N$ of the web introduces several challenges such as storage, stability and updating of the PageRank vector, choosing the personalization vector $v$ and other implementational considerations for which we refer to Langville and Meyer (2005).
11.7 Problems
(i) Determine the steady-state probability distribution for the birth-death processes with the following transition intensities
(a) $\lambda_i = \lambda$ and $\mu_i = i\mu$,
(b) $\lambda_i = \frac{\lambda}{i+1}$ and $\mu_i = \mu$,
where $\lambda$ and $\mu$ are constants.
(ii) Consider the slotted ALOHA of Section 11.5. There are eight stations that compete for slots by transmitting with probability 0.12 each in one slot. Assume that the stations always have packets to transmit. Compute the average time for one station to transmit seven packets.
12
Branching processes
A branching process is an evolutionary process that starts with an initial set of items that produce several other items with a certain probability distribution. These generated items in turn again produce new items and so on. If we denote by $X_k$ the number of items in the $k$-th generation and by $Y_{k,j}$ the number of items produced by the $j$-th item in generation $k$, then the basic law between the number of items in the $k$-th and $(k+1)$-th generation is, for $k \geq 0$,
$$X_{k+1} = \sum_{j=1}^{X_k}Y_{k,j} \tag{12.1}$$
Figure 12.1 illustrates the basic law (12.1) of a branching process.

[Figure 12.1: a tree with generations 0, 1, 2, 3, where $X_0 = 1$, $X_1 = 3 = Y_0$ and $X_2 = 5 = Y_{1,1} + Y_{1,2} + Y_{1,3} = 2 + 0 + 3$]
Fig. 12.1. A branching process with one root ($X_0 = 1$) drawn as a tree in which all nodes of generation $k$ lie at a same distance $k$ from the root (label 0).
In general, the production process in each generation $k$ can be different, but most often and in the sequel it is assumed that all generations produce items with the same probability distribution such that all random variables $Y_{k,j}$ are independent and have the same distribution as $Y$. The branching process is entirely defined by the basic law (12.1) and the distribution of the initial set $X_0$. The basic law (12.1) indicates that the number of items $X_{k+1}$ in generation $k + 1$ is only dependent on the number of items $X_k$ in the previous generation $k$. The Markov property (9.2)
$$\Pr\left[X_{k+1} = x_{k+1} \mid X_0 = x_0, \ldots, X_k = x_k\right] = \Pr\left[X_{k+1} = x_{k+1} \mid X_k = x_k\right] = \Pr\left[\sum_{j=1}^{x_k}Y = x_{k+1}\right] = \Pr\left[x_k Y = x_{k+1}\right]$$
is obeyed, which shows that the branching process $\{X_k\}_{k\geq 0}$ is a Markov chain with transition probabilities $P_{ij} = \Pr[iY = j]$. The discrete branching process can be extended to a continuous-time branching process in which items are produced continuously in time, rather than by generations. Since continuous-time Markov processes are mathematically more difficult than their discrete counterpart, we omit the continuous-time branching processes but refer to the book of Harris (1963) and to a simple example, the Yule process, in Section 11.3.2.3.
There are many examples of branching processes and we briefly describe
some of the most important. In biology, a certain species generates offspring
and the survival of that species after q generations is studied as a branching
process. In the same vein, what is the probability that a family name that is
inherited by sons only will eventually become extinct? This was the question
posed by Galton and Watson that gave birth to the theory of branching
processes in 1874. In physics, branching processes have been studied to
understand nuclear chain reactions. A nucleus is split by a neutron and
several new free neutrons are generated. Each of these free neutrons again
may hit another nucleus producing additional free neutrons and so on. In
micro-electronics, the avalanche break-down of a diode is another example of
a branching process. In queuing theory, all new arrivals of packets during the
service time of a particular packet can be described as a branching process.
The process continues as long as the queue lasts. The number of duplicates
generated by a flooding process in a communications network is a branching
process: a flooded packet is sent on all interfaces of a router except for
the incoming interface. The spread of computer viruses in the Internet can
be modeled approximately as a branching process. The application of a
branching process to compute the hopcount of the shortest path between
two arbitrary nodes in a network is discussed in Section 15.7.
12.1 The probability generating function
Since the $Y_{k,j}$ are independent, identically distributed random variables with the same distribution as $Y$ and independent of the random variable $X_k$, the probability generating function $\varphi_{X_{k+1}}(z)$ of $X_{k+1}$ follows from (2.70) and the basic law (12.1) as
$$\varphi_{X_{k+1}}(z) = \varphi_{X_k}(\varphi_Y(z)) \tag{12.2}$$
with $\varphi_{X_0}(z) = E\left[z^{X_0}\right] = f(z)$, where $f(z)$ is a given probability generating function. Iterating the general relation (12.2) gives
$$\varphi_{X_{k+1}}(z) = \varphi_{X_k}(\varphi_Y(z)) = \varphi_{X_{k-1}}(\varphi_Y(\varphi_Y(z))) = f(\varphi_Y(\varphi_Y(\ldots(\varphi_Y(z)))))$$
where the last relation consists of $k+1$ nested repeated functions $\varphi_Y(\cdot)$, called the iterates of $\varphi_Y(\cdot)$.
The expectation can be derived from the probability generating function by differentiation and setting $z = 1$. More elegantly, by taking the expectation of the basic law (12.1) and recalling that $X_k$ and $Y_{k,j}$ are independent, we have with $\mu = E[Y]$
$$E[X_{k+1}] = E\left[\sum_{j=1}^{X_k} Y_{k,j}\right] = E\left[\sum_{j=1}^{X_k} Y\right] = E[Y X_k] = \mu E[X_k]$$
Iteration starting from a given average $E[X_0]$ of the initial population gives
$$E[X_k] = \mu^k E[X_0] \tag{12.3}$$
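Using (2.27), the variance of $X_{k+1}$ follows from (12.2) with
$$\varphi'_{X_{k+1}}(z) = \varphi'_{X_k}(\varphi_Y(z))\,\varphi'_Y(z)$$
$$\varphi''_{X_{k+1}}(z) = \varphi''_{X_k}(\varphi_Y(z))\left(\varphi'_Y(z)\right)^2 + \varphi'_{X_k}(\varphi_Y(z))\,\varphi''_Y(z)$$
as
$$\begin{aligned}
\mathrm{Var}[X_{k+1}] &= \varphi''_{X_{k+1}}(1) + \varphi'_{X_{k+1}}(1) - \left(\varphi'_{X_{k+1}}(1)\right)^2 \\
&= \varphi''_{X_k}(1)\left(\varphi'_Y(1)\right)^2 + \varphi'_{X_k}(1)\varphi''_Y(1) + \varphi'_{X_k}(1)\varphi'_Y(1) - \left(\varphi'_{X_k}(1)\varphi'_Y(1)\right)^2 \\
&= \varphi''_{X_k}(1)\left(E[Y]\right)^2 + E[X_k]\,\varphi''_Y(1) + E[X_k]E[Y] - \left(E[X_k]E[Y]\right)^2 \\
&= \mu^2\,\mathrm{Var}[X_k] + E[X_k]\,\mathrm{Var}[Y]
\end{aligned}$$
Iteration starting from a given variance $\mathrm{Var}[X_0]$ of the initial set of items and employing the expression for the average (12.3) yields
$$\begin{aligned}
\mathrm{Var}[X_1] &= \mu^2\,\mathrm{Var}[X_0] + E[X_0]\,\mathrm{Var}[Y] \\
\mathrm{Var}[X_2] &= (E[Y])^2\,\mathrm{Var}[X_1] + E[X_1]\,\mathrm{Var}[Y] = \mu^4\,\mathrm{Var}[X_0] + \left(\mu^2 + \mu\right)E[X_0]\,\mathrm{Var}[Y] \\
\mathrm{Var}[X_3] &= (E[Y])^2\,\mathrm{Var}[X_2] + E[X_2]\,\mathrm{Var}[Y] = \mu^6\,\mathrm{Var}[X_0] + \left(\mu^4 + \mu^3 + \mu^2\right)E[X_0]\,\mathrm{Var}[Y]
\end{aligned}$$
from which we deduce
$$\mathrm{Var}[X_k] = \mu^{2k}\,\mathrm{Var}[X_0] + \sum_{j=k-1}^{2(k-1)} \mu^j\,E[X_0]\,\mathrm{Var}[Y]$$
or
$$\mathrm{Var}[X_k] = \mu^{2k}\,\mathrm{Var}[X_0] + E[X_0]\,\mathrm{Var}[Y]\,\mu^{k-1}\,\frac{1-\mu^k}{1-\mu} \tag{12.4}$$
Substitution into the recursion for $\mathrm{Var}[X_k]$ justifies the correctness of (12.4).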
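The agreement between the one-step recursions and the closed forms (12.3) and (12.4) can also be checked numerically. The following is a minimal sketch in Python; the function names and the sample parameter values are illustrative assumptions and not part of the text.

```python
def moments_by_recursion(mu, var_y, mean_x0, var_x0, k):
    """Iterate E[X_{j+1}] = mu E[X_j] and
    Var[X_{j+1}] = mu^2 Var[X_j] + E[X_j] Var[Y] for j = 0, ..., k-1."""
    mean, var = mean_x0, var_x0
    for _ in range(k):
        # The right-hand side is evaluated with the old mean and variance.
        mean, var = mu * mean, mu**2 * var + mean * var_y
    return mean, var


def moments_closed_form(mu, var_y, mean_x0, var_x0, k):
    """Closed forms (12.3) and (12.4); the critical case mu = 1 is the limit."""
    mean = mu**k * mean_x0
    if mu == 1:
        geometric_sum = k  # sum of k terms, each equal to 1
    else:
        geometric_sum = mu**(k - 1) * (1 - mu**k) / (1 - mu)
    var = mu**(2 * k) * var_x0 + mean_x0 * var_y * geometric_sum
    return mean, var


# Hypothetical example: Poisson production with mu = Var[Y] = 1.5 and X_0 = 1.
print(moments_by_recursion(1.5, 1.5, 1.0, 0.0, 6))
print(moments_closed_form(1.5, 1.5, 1.0, 0.0, 6))
```

Both calls should print the same pair up to rounding, for any generation index.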
The relations for the expectation (12.3) and the variance (12.4) of the number of items in generation $k$ imply that, if the average production per generation is $E[Y] = \mu = 1$, then $E[X_k] = E[X_0]$ and $\mathrm{Var}[X_k] = \mathrm{Var}[X_0] + k\,E[X_0]\,\mathrm{Var}[Y]$. In the case that the average production $E[Y] = \mu > 1$ ($E[Y] < 1$), the average population per generation grows (decreases) exponentially in $k$ with rate $\log\mu$. For $\mu > 1$ and large $k$, the standard deviation $\sqrt{\mathrm{Var}[X_k]}$ grows exponentially in $k$ with the same rate $\log\mu$; for $\mu < 1$, the term $\mu^{k-1}$ in (12.4) dominates, so that the standard deviation decreases with rate $\frac{1}{2}\log\mu$. Hence, the most important factor in the branching process is the average production $E[Y] = \mu$ per generation. The variance terms and $E[X_0]$ only play a role as prefactor. A branching process is called critical if $\mu = 1$, subcritical if $\mu < 1$ and supercritical if $\mu > 1$. In the sequel, we will only consider supercritical ($\mu > 1$) branching processes.
Often, the initial set of items consists of only one item. In that case, $X_0 = 1$ and $\varphi_{X_0}(z) = f(z) = z$ and
$$E[X_k] = \mu^k$$
$$\mathrm{Var}[X_k] = \mathrm{Var}[Y]\,\mu^{k-1}\,\frac{1-\mu^k}{1-\mu}$$
while the explicit nested form of the probability generating function indicates that
$$\varphi_{X_{k+1}}(z) = \varphi_Y(\varphi_{X_k}(z)) \tag{12.5}$$
This relation is only valid if $f(z) = z$ or, equivalently, only if $X_0 = 1$. In case $f(z) = z$, $\varphi_{X_k}(z)$ is the $k$-th iterate of $\varphi_Y(z)$.
Example Due to the nested structure (12.2), closed form expressions for the $k$-th generation probability generating function $\varphi_{X_k}(z)$ are rare. Assume that $X_0 = 1$. A simple case that allows explicit computation occurs in a deterministic production of $m$ offspring in each generation, for which $\varphi_Y(z) = z^m$. We have from (12.5) that $\varphi_{X_1}(z) = \varphi_Y(z) = z^m$ and
$$\varphi_{X_k}(z) = z^{m^k}$$
This branching process evolves as an $m$-ary tree shown in Fig. 17.7. A second example that can be computed exactly is the geometric branching process studied in Section 12.5.
12.2 The limit $W$ of the scaled random variables $W_k$
The conditional expectation, defined in Section 2.6,
$$\begin{aligned}
E[X_{k+1} \mid X_k, X_{k-1}, \ldots, X_0] &= E[X_{k+1} \mid X_k] \qquad \text{(Markov property)} \\
&= E\left[\left.\sum_{j=1}^{X_k} Y_{k,j}\,\right|\, X_k\right] = E[Y X_k \mid X_k] = \mu X_k
\end{aligned}$$
is a random variable, which suggests that we consider the scaled random variable $W_k = \frac{X_k}{\mu^k}$ because
$$E[W_{k+1} \mid W_k, W_{k-1}, \ldots, W_1] = W_k$$
while (12.3) shows that $E[W_k] = E[X_0]$ for all $k$. The stochastic process $\{W_k\}_{k \ge 1}$ is a martingale process, which is a generalization of a fair game with the characteristic property that at each step $k$ in the process $E[W_k]$ is a constant (independent of $k$). From (12.4), the variance of the scaled random variables $W_k = \frac{X_k}{\mu^k}$ is
$$\mathrm{Var}[W_k] = \mathrm{Var}[X_0] + E[X_0]\,\frac{\mathrm{Var}[Y]}{\mu^2 - \mu}\left(1 - \mu^{-k}\right) \tag{12.6}$$
which geometrically tends, provided $E[Y] = \mu > 1$, to a constant independent of $k$. The expression for the variance (12.6) indicates that
$$\mathrm{Var}[W] = \lim_{k\to\infty}\mathrm{Var}[W_k] = \mathrm{Var}[X_0] + E[X_0]\,\frac{\mathrm{Var}[Y]}{\mu^2 - \mu} \tag{12.7}$$
exists provided $E[Y] = \mu > 1$. We now show that the limit variable $W = \lim_{k\to\infty} W_k$ exists if $E[Y] > 1$.
Theorem 12.2.1 If $E[Y] = \mu > 1$, the scaled random variables $W_n \to W$ a.s.
Proof: Consider
$$E\left[(W_{k+n} - W_n)^2\right] = E\left[W_{k+n}^2\right] + E\left[W_n^2\right] - 2E[W_{k+n}W_n]$$
Using (2.72) with $h(x) = x$ and the Markov property,
$$E[W_{k+n}W_n] = E\left[E[W_{k+n} \mid W_n]\,W_n\right] = E\left[W_n^2\right]$$
we have, with (2.16), $E[W_{k+n}] = E[W_n] = E[X_0]$ and (12.6),
$$E\left[(W_{k+n} - W_n)^2\right] = \mathrm{Var}[W_{k+n}] - \mathrm{Var}[W_n] = E[X_0]\,\frac{\mathrm{Var}[Y]}{\mu^2 - \mu}\,\mu^{-n}\left(1 - \mu^{-k}\right)$$
In the limit $k \to \infty$,
$$E\left[(W - W_n)^2\right] = O\left(\mu^{-n}\right)$$
which means that the sequence $\{W_k\}_{k \ge 1}$ converges to $W$ in $L^2$ or in mean square (see Section 6.1.2). Moreover,
$$\sum_{n=1}^{\infty} E\left[(W - W_n)^2\right] = E\left[\sum_{n=1}^{\infty}(W - W_n)^2\right] = O(1)$$
which means that the series has finite expectation and that $\sum_{n=1}^{\infty}(W - W_n)^2$ is finite with probability 1. The convergence of this series implies for large $n$ that $(W - W_n)^2 \to 0$ with probability 1 or that $W_n \to W$ a.s. □
Theorem 12.2.1 means that the number of items in generation $k$ is, for large $k$, well approximated by $X_k \approx W\mu^k$. Hence, an asymptotic analysis of a branching process crucially relies on the properties of the limit random variable $W$.
The generating function $\varphi_W(z) = E\left[z^W\right]$ of this limit random variable can be deduced as the limit of the sequence of generating functions
$$\varphi_{W_k}(z) = E\left[z^{W_k}\right] = E\left[z^{X_k/\mu^k}\right] = \varphi_{X_k}\left(z^{\mu^{-k}}\right) \tag{12.8}$$
Using (12.5) in case$^1$ $X_0 = 1$ with $z \to z^{\mu^{-k-1}}$,
$$E\left[z^{X_{k+1}/\mu^{k+1}}\right] = \varphi_Y\left(E\left[\left(z^{1/\mu}\right)^{X_k/\mu^k}\right]\right)$$
leads with $W_k = \frac{X_k}{\mu^k}$ to the recursion of the pgf of the scaled random variables $W_k$,
$$\varphi_{W_{k+1}}(z) = \varphi_Y\left(\varphi_{W_k}\left(z^{1/\mu}\right)\right)$$
1 The use of the general equation (12.2) is inadequate.
In the limit $k \to \infty$ where $W_k \to W$ a.s., we can apply the Continuity Theorem 6.1.3, which results in the functional equation for the pgf of the continuous limit random variable $W$,
$$\varphi_W(z) = \varphi_Y\left(\varphi_W\left(z^{1/\mu}\right)\right) \tag{12.9}$$
Since $W$ is a continuous random variable except at $W = 0$ as explained below (see (12.19)), it is more convenient to define the moment generating function $\chi_W(t) = E\left[e^{-tW}\right]$. Obviously, the relation between the two generating functions is, with $z = e^{-t}$,
$$\chi_W(t) = \varphi_W\left(e^{-t}\right)$$
With $z = e^{-t}$ in (12.9), the functional equation of $\chi_W(t)$ is, for $t \ge 0$ and $E[W] = E[X_0] = 1$,
$$\chi_W(t) = \varphi_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) \tag{12.10}$$
The functional equation (12.10) is simpler than (12.9) and $\chi_W(t)$ is convex for all $t$, while $\varphi_W(z)$ is not convex for all $z$. In particular, $\varphi_W(z) = \chi_W(-\log z)$ is not analytic at $z = 0$ and appears$^2$ to have a concave regime near $z \downarrow 0$ where $\chi'_W(-\log z) + \chi''_W(-\log z) < 0$.
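Lemma 12.2.2 $\chi_W(t)$ is the only probability generating function satisfying the functional equation (12.10).
Proof: Let $\psi_{W^*}(t) = E\left[e^{-tW^*}\right]$ and $\chi_W(t) = E\left[e^{-tW}\right]$ be two probability generating functions that both satisfy (12.10). Then $\psi_{W^*}(t) - \chi_W(t)$ is continuous for $\mathrm{Re}\,t \ge 0$ and, since $E[W] = E[W^*] = 1$, the Taylor series (2.40) around $t = 0$ is
$$\begin{aligned}
\psi_{W^*}(t) - \chi_W(t) &= (-t)\,E[W^* - W] + E\left[\sum_{k=2}^{\infty}\frac{(-t)^k}{k!}\left((W^*)^k - W^k\right)\right] \\
&= t\,E\left[\sum_{k=1}^{\infty}\frac{(-t)^k}{(k+1)!}\left(W^{k+1} - (W^*)^{k+1}\right)\right]
\end{aligned}$$
from which $\psi_{W^*}(t) - \chi_W(t) = t\,h(t)$ and $h(0) = 0$. Since $|\varphi'_Y(z)| \le \mu$ for $|z| \le 1$, equation (5.6) of the Mean Value Theorem implies $|\varphi_Y(a) - \varphi_Y(b)| \le \mu\,|a - b|$ for any $|a|, |b| \in [0,1]$. Since $|\chi_W(t)| \le 1$ and $|\psi_{W^*}(t)| \le 1$ for $\mathrm{Re}(t) \ge 0$, we obtain
$$|t\,h(t)| = \left|\varphi_Y\left(\psi_{W^*}\left(\frac{t}{\mu}\right)\right) - \varphi_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right)\right| \le \mu\left|\psi_{W^*}\left(\frac{t}{\mu}\right) - \chi_W\left(\frac{t}{\mu}\right)\right| = \mu\left|\frac{t}{\mu}\,h\left(\frac{t}{\mu}\right)\right|$$
or
$$|h(t)| \le \left|h\left(\frac{t}{\mu}\right)\right|$$
After $K$ iterations, we have that $|h(t)| \le \left|h\left(\frac{t}{\mu^K}\right)\right|$, which holds for any integer $K$. Hence, for any finite $t$ and since $h(t)$ is continuous, which allows that $\lim_{K\to\infty} h\left(\frac{t}{\mu^K}\right) = h\left(\lim_{K\to\infty}\frac{t}{\mu^K}\right)$,
$$|h(t)| \le \lim_{K\to\infty}\left|h\left(\frac{t}{\mu^K}\right)\right| = |h(0)| = 0$$
which proves the Lemma. □
2 This fact is observed for both a geometric and a Poisson production distribution function.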
Lemma 12.2.2 is important because solving the functional equation, for example by Taylor expansion, is one of the primary tools to determine $\chi_W(t)$. If $\varphi_Y(z)$ is analytic inside a circle with radius $R_Y > 0$ centered at $z = 1$, then the Taylor series around $z_0 = 1$,
$$\varphi_Y(z) = 1 + \sum_{k=1}^{\infty} u_k (z-1)^k$$
converges for all $|z - 1| < R_Y$. The definition $\chi_W(t) = E\left[e^{-tW}\right]$ implies that the maximum value of $|\chi_W(t)|$ inside and on a circle with radius $r$ around the origin is attained at $\chi_W(-r)$. The functional equation (12.10) then shows that $\chi_W(t)$ is analytic inside a circle around $t = 0$ with radius $R_W$ for which $\chi_W\left(-\frac{R_W}{\mu}\right) < 1 + R_Y$. Since $\chi_W(0) = 1$, $\chi_W(t)$ is convex and decreasing for real $t$ and $R_Y > 0$, there exists such a non-zero value of $R_W$. This implies that the Taylor series
$$\chi_W(t) = 1 + \sum_{k=1}^{\infty}\omega_k t^k \tag{12.11}$$
converges around $t = 0$ for $|t| < R_W$. There exists a recursion to compute $\omega_k$ for any $k \ge 1$ as shown in Van Mieghem (2005). If $\chi_W(t)$ is not known in closed form, the interest of the Taylor series (12.11) lies in the fast convergence for small values of $|t| < 1$. The recursion for the Taylor coefficients $\omega_k$ enables the computation of $\chi_W(t)$ for $|t| < 1$ to any desired degree of accuracy. The functional equation $\chi_W(t) = \varphi_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right)$ extends the $t$-range to the entire complex plane. For large values of $t$, and in particular for negative real $t$, $\chi_W(t)$ is best computed from $\chi_W\left(\frac{t}{\mu^{[\log_\mu |t|]+1}}\right)$ after $[\log_\mu |t|] + 1$ functional iterations of (12.10). Indeed, since $\mu > 1$ such that $\left|\frac{t}{\mu^{[\log_\mu |t|]+1}}\right| < 1$, the Taylor series (12.11) provides an accurate start value $\chi_W\left(\frac{t}{\mu^{[\log_\mu |t|]+1}}\right)$ for this iterative scheme.
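The iterative scheme can be sketched in a few lines of Python. The sketch below is a simplification under stated assumptions: it is restricted to real $t \ge 0$, $X_0 = 1$ and $\mu > 1$; instead of the Taylor coefficients of Van Mieghem (2005), which are not reproduced here, it uses the crude two-term start value $\chi_W(s) \approx 1 - s$ (valid because $E[W] = 1$) and compensates with a number of extra iterations. The geometric production process of Section 12.5, whose $\chi_W(t)$ is known in closed form (12.23), serves as a check. All function names and the parameter `extra` are illustrative.

```python
import math


def geometric_pgf(z, mu):
    """pgf q / (1 - p z) of Pr[Y = k] = q p^k, k >= 0, with mean mu = p / q."""
    q = 1.0 / (1.0 + mu)
    return q / (1.0 - (1.0 - q) * z)


def chi_w_iterated(pgf, mu, t, extra=20):
    """Approximate chi_W(t) for real t >= 0 and mu > 1 via (12.10)."""
    if t == 0:
        return 1.0
    # [log_mu t] + 1 iterations bring t / mu^K below 1; the `extra`
    # iterations make the two-term start value sufficiently accurate.
    K = max(0, math.floor(math.log(t) / math.log(mu)) + 1) + extra
    value = 1.0 - t / mu**K      # chi_W(s) ~ 1 - s for small s since E[W] = 1
    for _ in range(K):
        value = pgf(value, mu)   # chi_W(mu s) = phi_Y(chi_W(s))
    return value


def chi_w_geometric_exact(mu, t):
    """Closed form (12.23) for the geometric production process."""
    return (t / mu + 1.0 - 1.0 / mu) / (t + 1.0 - 1.0 / mu)


print(chi_w_iterated(geometric_pgf, 2.0, 5.0), chi_w_geometric_exact(2.0, 5.0))
```

The number of extra iterations trades truncation error of the start value against amplification of rounding errors, so very large values of `extra` are counterproductive in floating-point arithmetic.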
12.3 The Probability of Extinction of a Branching Process
In many applications, the probability that the process will eventually terminate, and which parameters influence this extinction probability, are of interest. For instance, a nuclear reaction will only lead to an explosion if critical starting conditions are obeyed. The branching process terminates if, for some generation $n > 0$, $X_n = 0$ and, of course, $X_m = 0$ for all $m > n$. Let us denote
$$q_k = \Pr[X_k = 0] = \varphi_{X_k}(0)$$
If we assume that $X_0 = 1$, the analysis simplifies because the more specific version (12.5) holds. Hence, only if the initial set consists of a single item $X_0 = 1$,
$$q_{k+1} = \varphi_{X_{k+1}}(0) = \varphi_Y(\varphi_{X_k}(0)) = \varphi_Y(q_k) \tag{12.12}$$
and with $q_0 = \varphi_{X_0}(0) = 0$, $q_1 = \varphi_Y(0) = \Pr[Y = 0] \ge 0$. Obviously, if there is no production, $\Pr[Y = 0] = 1$, extinction is certain after the first generation, whereas if there is always production, $\Pr[Y = 0] = 0$, extinction never occurs. By its definition (2.18), a probability generating function of a non-negative discrete random variable is strictly increasing along the positive real $z$-axis. When excluding the extreme cases such that $0 < \Pr[Y = 0] < 1$, by the strict increase of $\varphi_Y(x)$ for $x = \mathrm{Re}\,z \ge 0$, we observe that
$$0 = q_0 < q_1 = \varphi_Y(0) < q_2 = \varphi_Y(q_1) < q_3 = \varphi_Y(q_2) < \ldots$$
The sequence $q_0, q_1, q_2, \ldots$ is monotone increasing and bounded by 1 because $\varphi_Y(1) = 1$. Hence, the probability of extinction
$$\pi_0 = \lim_{k\to\infty}\Pr[X_k = 0] = \Pr[W = 0]$$
exists and $0 < \pi_0 \le 1$. The existence of a limiting process and the fact that the probability generating function is analytic for $|z| < 1$ and, hence, continuous, which allows us to interchange $\lim_{k\to\infty}\varphi_Y(q_k) = \varphi_Y(\lim_{k\to\infty} q_k)$, yield the equation for the extinction probability $\pi_0$,
$$\pi_0 = \varphi_Y(\pi_0) \tag{12.13}$$
It demonstrates that the extinction probability $\pi_0$ is a root of $\varphi_Y(x) - x$ in the interval $x \in [0,1]$.
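The recursion (12.12) started at $q_0 = 0$ is itself a practical way to compute the extinction probability. The following Python sketch illustrates it; the function names, the tolerance and the two production distributions (Poisson and the geometric law $\Pr[Y = k] = q p^k$) are chosen for illustration only.

```python
import math


def extinction_probability(pgf, tol=1e-12, max_iter=100_000):
    """Iterate q_{k+1} = pgf(q_k) from q_0 = 0, as in (12.12)."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # not converged within max_iter (slow near the critical case)


def poisson_pgf(mu):
    """pgf exp(mu (z - 1)) of a Poisson random variable with mean mu."""
    return lambda z: math.exp(mu * (z - 1.0))


def geometric_pgf(p):
    """pgf q / (1 - p z) of Pr[Y = k] = q p^k, k >= 0, with q = 1 - p."""
    q = 1.0 - p
    return lambda z: q / (1.0 - p * z)


print(extinction_probability(geometric_pgf(2.0 / 3.0)))  # mean p/q = 2
print(extinction_probability(poisson_pgf(2.0)))           # mean 2
print(extinction_probability(poisson_pgf(0.5)))           # subcritical case
```

For the geometric law the result can be compared with the second root $q/p$ of $\varphi_Y(x) = x$; in the subcritical case the iteration approaches 1.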
Since $\varphi_W(0) = \Pr[W = 0] = \pi_0$, this equation (12.13) follows more directly from (12.9). Notice, however, that in the functional equation (12.9) the function $z^{1/\mu}$ is not analytic at $z = 0$ if $\mu \notin \mathbb{N}$, which may cause that $f_W(x)$ is possibly not continuous at $x = 0$, although the limit $\lim_{z\to 0}\varphi_W(z) = \pi_0$ exists. On the other hand, since $\varphi_W(z) = \chi_W(-\log z)$, the extinction probability is found as
$$\lim_{t\to\infty}\chi_W(t) = \pi_0$$
and the convexity of $\chi_W(t)$ implies that, for any real value of $t$, $\chi_W(t) \ge \pi_0$.
An alternative, more probabilistic derivation of equation (12.13) is as follows. Applying the law of total probability (2.46) to the definition of the extinction probability,
$$\pi_0 = \Pr[X_n = 0 \text{ for some } n > 0] = \sum_{j=0}^{\infty}\Pr[X_n = 0 \text{ for some } n > 0 \mid X_1 = j]\,\Pr[X_1 = j]$$
Only if $X_0 = 1$, relation (12.5) indicates that $\varphi_{X_1}(z) = \varphi_Y(z)$, which implies that $\Pr[X_1 = j] = \Pr[Y = j]$. In addition, given that the first generation consists of $j$ items, the branching process will eventually terminate if and only if each of the $j$ sets of items generated by the first generation eventually dies out. Since each set evolves independently and since the probability that any set generated by a particular ancestor in the first generation becomes extinct is $\pi_0$, we arrive at
$$\pi_0 = \sum_{j=0}^{\infty}\pi_0^j\,\Pr[Y = j] = \varphi_Y(\pi_0)$$
The different viewpoints thus lead to the same result, summarized by:
Theorem 12.3.1 If $X_0 = 1$ and $0 < \Pr[Y = 0] < 1$, the extinction probability $\pi_0$ is (a) the smallest positive real root of $x = \varphi_Y(x)$ and (b) $\pi_0 = 1$ if and only if $E[Y] \le 1$ and $\Pr[Y = 0] + \Pr[Y = 1] < 1$.
Proof: (a) Suppose that $x_o$ is the smallest positive real root obeying $\varphi_Y(x_o) = x_o > 0$. Then, $q_1 = \varphi_Y(0) < \varphi_Y(x_o) = x_o$. Assume (induction hypothesis) that $q_n < x_o$. The recursion (12.12) and the strict increase of $\varphi_Y(x)$ then show that $q_{n+1} = \varphi_Y(q_n) < \varphi_Y(x_o) = x_o$. Hence, the principle of induction demonstrates that $q_n < x_o$ for all (finite) $n$ and, hence, that $\pi_0 \le x_o$. Since $\pi_0$ is itself a positive root of $x = \varphi_Y(x)$ by (12.13) and $x_o$ is the smallest such root, $\pi_0 = x_o$.
(b) First, the condition $\Pr[Y = 0] + \Pr[Y = 1] < 1$ implies that $\Pr[Y > 1] > 0$ and that there exists at least one integer $j > 1$ such that $\Pr[Y = j] > 0$. In that case, for real $x > 0$ but $x$ smaller than the radius $R$ of convergence, which is at least $R = 1$, the second derivative $\varphi''_Y(x) = \sum_{j=2}^{\infty} j(j-1)\Pr[Y = j]\,x^{j-2}$ is positive, which implies that $\varphi_Y(x)$ is strictly convex in $(0,1)$. Since $x = 1$ obeys $x = \varphi_Y(x)$ and $\varphi_Y(0) = \Pr[Y = 0] \in (0,1)$, the strictly convex function $y = \varphi_Y(x)$ can only intersect the line $y = x$ in some point $x \in (0,1)$ if $\varphi_Y(x)$ is below that line near their intersection at $x = 1$ or if $\varphi'_Y(1) = E[Y] > 1$. In the other case, if $E[Y] \le 1$, the only intersection is at $x = 1$. □
The two possibilities are drawn in Fig. 12.2.
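[Figure 12.2: the curves $\varphi_Y(x)$, labeled a and b, versus $x$ on $[0,1]$, with intercepts $\Pr[Y_a = 0]$ and $\Pr[Y_b = 0]$, and with the root $\pi_0$ and the iterates $x_0, x_1, x_2, x_3$ marked on the $x$-axis.]
Fig. 12.2. The generating function $\varphi_Y(x)$ along the positive real axis $x$. The two possible cases are shown: curve a corresponds to $E[Y] < 1$ and curve b to $E[Y] > 1$. The fast convergence towards the zero $\pi_0$ is exemplified by the sequence $x_0 > x_1 = \varphi_Y(x_0) > x_2 = \varphi_Y(x_1) > x_3 = \varphi_Y(x_2)$.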
A root equation such as (12.13) also appears in queuing models such as the
M/G/1 (Section 14.3) and GI/D/1 (Section 14.4) and reflects the asymptotic
behavior as explained in Section 5.7. The extinction probability 0 can be
expressed explicitly as a Lagrange series as demonstrated in Van Mieghem
(1996).
The branching process with infinitely many generations $k \to \infty$ can be viewed as an infinite directed tree where each node has a finite degree a.s. The fact that $\pi_0 < 1$ if $E[Y] > 1$ implies that, in infinite directed trees, there exists an infinitely long path starting from the root with probability $1 - \pi_0$.
Theorem 12.3.2 The limiting branching process with $X_0 = 1$ obeys for $k \to \infty$
$$\Pr[X_k = 0] \to \pi_0$$
$$\Pr[X_k = j] \to 0 \qquad \text{for any } j > 0$$
Proof: First, if $E[Y] \le 1$, then Theorem 12.3.1 states that $\pi_0 = 1$. For any probability generating function $\varphi(z)$, it holds that $|\varphi(z)| \le 1$ for $|z| \le 1$. Hence, $\varphi_{X_k}(x) \le 1$ for real $x \in [0,1]$. Moreover, $q_k = \varphi_{X_k}(0) \le \varphi_{X_k}(x)$. In the limit $k \to \infty$, $q_k \to \pi_0 = 1$, which implies that for all $x \in [0,1]$ it holds that $\varphi_{X_k}(x) \to \pi_0 = 1$. The fact that a probability generating function, a Taylor series around $z = 0$, converges to a constant $\pi_0$ for $0 \le x \le 1$ implies that $\Pr[X_k = j] \to 0$ for any $j > 0$ and $\Pr[X_k = 0] \to \pi_0$.
The second case $E[Y] > 1$ possesses an extinction probability $\pi_0 < 1$. For $x \in (\pi_0, 1)$, Fig. 12.2 shows that $\pi_0 < \varphi_Y(x) < x < 1$. By induction using (12.5), we find that $\pi_0 < \varphi_{X_k}(x) < \varphi_{X_{k-1}}(x) < \cdots < 1$ or $\lim_{k\to\infty}\varphi_{X_k}(x) = \pi_0$ for $x \in (\pi_0, 1)$. For $x \in [0, \pi_0)$, the same argument $q_k = \varphi_{X_k}(0) \le \varphi_{X_k}(x) \le \pi_0$ shows that $\lim_{k\to\infty}\varphi_{X_k}(x) = \pi_0$ for $x \in [0,1)$. This proves the theorem. □
Theorem 12.3.2 states that, regardless of the value of $E[Y]$, the probability that the $k$-th generation will consist of any finite positive number of items tends to zero if $k \to \infty$. Theorem 12.3.2 is equivalent to the statement that, after an infinite number $k \to \infty$ of evolutions or generations, $X_k \to \infty$ with probability $1 - \pi_0$. Theorem 12.3.2 also illustrates that the Markov chain with an infinite number of states behaves differently than a chain with a finite number of states. In particular, Theorem 12.3.2 shows that the infinite Markov chain $\{X_k\}_{k \ge 0}$ has a single absorbing state $X_k = 0$ while all other states $j$ are transient ($\lim_{n\to\infty} P^n_{ij} = 0$ for $1 \le i, j < \infty$). The existence of the steady-state vector $\pi$ (not all components are zero) does not imply that the branching process with $X_0 = 1$ and $0 < \Pr[Y = 0] < 1$ and infinitely many states is an irreducible Markov chain.
12.4 Asymptotic behavior of $W$
Since $\chi_W(t)$ is convex and decreasing, $\chi'_W(t) \le 0$ for all real $t$ and $|\chi'_W(t)|$ is decreasing in $t$. We know that $\chi'_W(0) = -1$. Since $\lim_{t\to\infty}\chi_W(t) = \pi_0$, it follows that $\lim_{t\to\infty}\chi'_W(t) = 0$. The following Lemma 12.4.1 is a little more precise.
Lemma 12.4.1 $\chi'_W(t) = o\left(t^{-1}\right)$ for $t \to \infty$.
Proof: The derivative of the functional equation (12.10) is $\mu\,\chi'_W(\mu t) = \varphi'_Y(\chi_W(t))\,\chi'_W(t)$. By iteration, we have
$$\mu^K\chi'_W\left(\mu^K t\right) = \chi'_W(t)\prod_{j=0}^{K-1}\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right)$$
Since $\chi_W(t) \in [\pi_0, 1]$ for real $t \ge 0$, then $\varphi'_Y(\pi_0) \le \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \le \mu$ for any $j$. Theorem 12.3.1 states that if $\mu = \varphi'_Y(1) > 1$, then there are two zeros $\pi_0$ and 1 of $f(z) = \varphi_Y(z) - z$ in $z \in [0,1]$. By Rolle's Theorem applied to the continuous function $f(z) = \varphi_Y(z) - z$, there exists a $\xi \in (\pi_0, 1)$ for which $f'(\xi) = 0$. Equivalently, $\varphi'_Y(\xi) = 1$ and $\xi > \pi_0$. Since $\varphi'_Y(z)$ is monotonically increasing in $z \in [0,1]$, we have that $\varphi'_Y(0) = \Pr[Y = 1] \le \varphi'_Y(\pi_0) < 1$. Since $\chi_W(t)$ is continuous and monotone decreasing, there exists an integer $K_0$ such that $\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) < 1$ for $j > K_0$ and any $t > 0$. Hence,
$$\lim_{K\to\infty}\prod_{j=0}^{K-1}\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) = \prod_{j=0}^{K_0-1}\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right)\prod_{j=K_0}^{\infty}\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \to 0$$
and, for any finite $t > 0$, $\mu^K\chi'_W\left(\mu^K t\right) \to 0$ for $K \to \infty$, which implies the lemma. □
Lemma 12.4.1 is, for large $t$, equivalent to $|\chi'_W(t)| \le C\,t^{-1-\beta}$ for some real $\beta > 0$ and where $C$ is a finite positive real number. Lemma 12.4.1 thus suggests that
$$\chi'_W(t) = -g(t)\,t^{-\beta-1} \tag{12.14}$$
where $0 < g(t) \le C$ on the real positive $t$-axis.
Lemma 12.4.2 If $\varphi'_Y(\pi_0) > 0$ and $\mu > 1$, then
$$F = \lim_{t\to\infty} g(t) \tag{12.15}$$
exists, is finite and strictly positive.
Proof: We first use (a) the convexity of any pgf $\chi_W(t)$, implying that $\chi''_W(t) \ge 0$ for all $t$, and we then invoke (b) the functional equation (12.10) of $\chi_W(t)$.
(a) The function $g(t) = -\chi'_W(t)\,t^{\beta+1}$ is differentiable, thus continuous, and has for real $t > 0$ only one extremum at $t = \tau$ obeying
$$\tau = \frac{-\chi'_W(\tau)}{\chi''_W(\tau)}\,(\beta+1) > 0$$
Since $\chi'_W(0) = -1$, implying that $g(t) = t^{\beta+1}(1 + O(t))$ as $t \downarrow 0$ or that $g(t)$ is initially monotone increasing in $t$, the extremum at $t = \tau$ is a maximum. The derivative of $g(t) = -\chi'_W(t)\,t^{\beta+1}$ is, with (12.14),
$$g'(t) = \frac{\beta+1}{t}\,g(t) - \chi''_W(t)\,t^{\beta+1}$$
such that, for $\tau$ finite, $\max g(t) = \frac{\tau^{\beta+2}}{\beta+1}\,\chi''_W(\tau)$. Since $\chi''_W(t) \ge 0$ for all $t$, we also obtain the inequality for $t \ge 0$
$$g'(t) \le \frac{\beta+1}{t}\,g(t) \le \frac{\beta+1}{t}\,C$$
from which $\lim_{t\to\infty} g'(t) \le 0$. Hence, $g(t)$ is not increasing for $t \to \infty$.
(b) Substitution of (12.14) in the derivative of the functional equation (12.10) yields
$$g(t) = \mu^{\beta}\,\varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right)g\left(\frac{t}{\mu}\right) \tag{12.16}$$
Since $\varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) \ge \varphi'_Y(\pi_0) > 0$ (the restriction of this Lemma), there holds with $A = \mu^{\beta}\varphi'_Y(\pi_0) > 0$ for all $t > 0$ that
$$g(t) \ge A\,g\left(\frac{t}{\mu}\right)$$
For $t < \tau$, $g(t)$ is shown in (a) to be monotone increasing, which requires that $A \ge 1$ for $\mu > 1$. But, since the inequality with $A \ge 1$ holds for all $t > 0$, we must have that $\tau \to \infty$. Hence, $g(t)$ is continuous and strictly increasing for all $t \ge 0$ with a maximum at infinity, which proves the existence of a unique limit $F \le C$.
If $F = 0$, the suggestion (12.14) is not correct, implying that $\chi'_W(t)$ decreases faster than any power of $t^{-1}$. The proof of Lemma 12.4.1 indicates that this case can occur if $\varphi'_Y(\pi_0) = 0$. □
In fact, $A = 1$. For, when passing to the limit $t \to \infty$ in (12.16) using Lemma 12.4.2, we obtain
$$\mu^{-\beta} = \varphi'_Y(\pi_0)$$
which determines the exponent $\beta$ as
$$\beta = -\frac{\log\varphi'_Y(\pi_0)}{\log\mu} \tag{12.17}$$
After integration of (12.14), we have that
$$\chi_W(t) = \pi_0 + \int_t^{\infty} g(u)\,u^{-\beta-1}\,du \tag{12.18}$$
Approximating $g(u)$ by its limit $F$ for large $t$, we obtain the asymptotic form
$$\chi_W(t) \approx \pi_0 + \frac{F}{\beta}\,t^{-\beta}$$
Besides $\beta$ and the extinction probability $\pi_0$, the parameter $F$ appears as an additional characterizing quantity of a branching process. The behavior of the Laplace transform (2.37) for large $t$ reflects the behavior of the probability density function for small $x$. Hence, using $\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{xt}}{t^s}\,dt = \frac{x^{s-1}}{\Gamma(s)}$ for $\mathrm{Re}\,s > 0$, the probability density function is, for small $x$,
$$f_W(x) \approx \pi_0\,\delta(x) + \frac{F}{\Gamma(\beta+1)}\,x^{\beta-1} \tag{12.19}$$
The probability density function $f_W(x)$ is not continuous at $x = 0$ if $\pi_0 > 0$ and reflects the two different regimes: (a) $W = 0$, implying that the branching process becomes extinct, $X_k = 0$, from some generation $k$ on, and (b) $W > 0$, implying $X_k \approx W\mu^k$ for large $k$: the number of items per generation grows exponentially with prefactor $W$. If two sample paths of a same branching process are generated, $X_{k,1} \approx W_1\mu^k$ and $X_{k,2} \approx W_2\mu^k$ may be largely different for large $k$, because of the random nature of $W$: although the prefactors $W_1$ and $W_2$ both have the same probability density function $f_W(x)$, $W_1$ can differ substantially from $W_2$ as illustrated by the pdf $f_W(x)$ in Fig. 12.3.
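As a small numerical illustration of (12.17), the exponent $\beta$ can be evaluated once $\pi_0$ is known. The Python sketch below first obtains $\pi_0$ by iterating (12.12) and then applies (12.17); the function names, the fixed iteration count and the choice $\mu = 2$ are assumptions made for this example only.

```python
import math


def extinction_probability(pgf, iterations=200):
    """Iterate q_{k+1} = pgf(q_k) from q_0 = 0, as in (12.12)."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q


def beta_exponent(pgf, pgf_derivative, mu):
    """Exponent beta = -log(phi_Y'(pi_0)) / log(mu) of (12.17)."""
    pi0 = extinction_probability(pgf)
    return -math.log(pgf_derivative(pi0)) / math.log(mu)


mu = 2.0
q = 1.0 / (1.0 + mu)   # geometric law Pr[Y = k] = q p^k with mean p / q = mu
p = 1.0 - q
beta_geo = beta_exponent(lambda z: q / (1 - p * z),
                         lambda z: p * q / (1 - p * z) ** 2, mu)
beta_po = beta_exponent(lambda z: math.exp(mu * (z - 1)),
                        lambda z: mu * math.exp(mu * (z - 1)), mu)
print(beta_geo, beta_po)
```

For the geometric production process the computed exponent should equal 1, in agreement with Section 12.5; the Poisson value depends on $\mu$.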
[Figure 12.3: $f_W(x)$ versus $x$ for $0 \le x \le 10$, shown on a linear and on a logarithmic vertical scale, with curves for $\mu = 2, 3, 4, 5$ (arrows indicate increasing $\mu$) for a Poisson and a geometric production process.]
Fig. 12.3. The probability density function of the limit random variable $W$ for both a geometric and a Poisson production process for a same set of values of the average $\mu = E[Y]$.
12.5 A geometric branching process
Consider a production generating function $\varphi_Y(z)$ of the form $f(z) = \frac{az+b}{cz+d}$. Besides straightforward iteration of (12.5), a more elegant approach$^3$ relies on the following property of $f(z)$. For $z \ne x$, the difference is
$$f(z) - f(x) = \frac{ad - bc}{(d + cz)(d + cx)}\,(z - x)$$
and, hence, for any two points $x_0$ and $x_1$,
$$\frac{f(z) - f(x_0)}{f(z) - f(x_1)} = \left(\frac{cx_1 + d}{cx_0 + d}\right)\left(\frac{z - x_0}{z - x_1}\right)$$
3 The linear fractional transformation $f(z)$ is an automorphism of the extended complex plane and basic in the geometric theory of a complex function, for which we refer to the book of Sansone and Gerretsen (1960, vol. 2). Fixed points of an automorphism of the extended plane are solutions of $z = f(z)$, which is a quadratic equation $cz^2 + (d - a)z - b = 0$ and which shows that there are at most two different fixed points.
Let us now confine ourselves to the two fixed points, $x_0$ and $x_1$, of $f(z)$ that are solutions of $f(z) = z$ and let $\kappa = \frac{cx_1 + d}{cx_0 + d}$, then
$$\frac{f(z) - x_0}{f(z) - x_1} = \kappa\,\frac{z - x_0}{z - x_1}$$
Now, substitute $z \to f(z)$, then
$$\frac{f(f(z)) - x_0}{f(f(z)) - x_1} = \kappa\,\frac{f(z) - x_0}{f(z) - x_1} = \kappa^2\,\frac{z - x_0}{z - x_1}$$
Let us denote the iterates of $f(z)$ by $w_n = f_n(z) = f(f_{n-1}(z))$. By iterating, we find that the iterates obey
$$\frac{w_n - x_0}{w_n - x_1} = \kappa^n\,\frac{z - x_0}{z - x_1}$$
or
$$w_n = \frac{x_0(z - x_1) - x_1\kappa^n(z - x_0)}{z - x_1 - \kappa^n(z - x_0)} \tag{12.20}$$
Since the probability generating function (3.6) of a geometric random variable $Y$ is of the form $f(z) = \frac{az+b}{cz+d}$, a geometric branching process is regarded as a basic reference model in the study of branching processes. The production process in each generation obeys $\Pr[Y = k] = qp^k$ for $k \ge 0$, leading to $\varphi_Y(z) = \frac{q}{1 - pz}$, which is slightly different from (3.6). We know that the equation $\varphi_Y(x) = x$ can have two real zeros in $[0,1]$, one at $x_1 = 1$ since $\varphi_Y(z)$ is a probability generating function and another at $x_0 = \frac{q}{p} = \frac{1}{E[Y]} = \frac{1}{\mu} = \pi_0$ such that
$$\kappa = \frac{-px_1 + 1}{-px_0 + 1} = \frac{q}{p} = \frac{1}{\mu}$$
The functional equation (12.5) associates $w_n = \varphi_{X_n}(z)$ and after substitution in (12.20) we obtain
$$\varphi_{X_n}(z) = \frac{\left(\mu^{n-1} - 1\right)z - \mu^{n-1} + \frac{1}{\mu}}{\left(\mu^n - 1\right)z - \mu^n + \frac{1}{\mu}} \tag{12.21}$$
In the case that $E[Y] = \mu \to 1$ or $p = q$, using the rule of de l'Hospital gives
$$\varphi_{X_n}(z) = \frac{(n-1)z - n}{nz - (n+1)}$$
From (12.21), the probabilities of extinction at the $k$-th generation are
$$\Pr[X_k = 0] = \varphi_{X_k}(0) = \frac{1}{\mu}\left(\frac{\mu^k - 1}{\mu^k - \frac{1}{\mu}}\right)$$
If $E[Y] = \mu > 1$, then $\lim_{k\to\infty}\Pr[X_k = 0] = \frac{1}{\mu} = x_0 = \pi_0$ (Theorem 12.3.2), whereas for $E[Y] = 1$ we find that $\lim_{k\to\infty}\Pr[X_k = 0] = 1$. If $T_0$ is the hitting time defined in Section 9.2.2 as the smallest discrete-time $k$ such that $X_k = 0$, then $\Pr[T_0 \le k] = \Pr[X_k = 0]$.
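The closed form (12.21) can be cross-checked against direct iteration of $\varphi_Y(z) = \frac{q}{1 - pz}$, which by (12.5) gives $\varphi_{X_n}(z)$ when $X_0 = 1$. The Python sketch below does this; the function names and the parametrization by $\mu = p/q$ (so that $q = \frac{1}{1+\mu}$) are choices made for this illustration.

```python
def geometric_pgf(z, mu):
    """pgf q / (1 - p z) of Pr[Y = k] = q p^k, k >= 0, with mean mu = p / q."""
    q = 1.0 / (1.0 + mu)
    p = 1.0 - q
    return q / (1.0 - p * z)


def pgf_generation_iterated(z, mu, n):
    """n-th iterate of the geometric pgf, i.e. the pgf of X_n when X_0 = 1."""
    w = z
    for _ in range(n):
        w = geometric_pgf(w, mu)
    return w


def pgf_generation_closed(z, mu, n):
    """Closed form (12.21) for mu != 1 and its limit for mu = 1."""
    if mu == 1:
        return (n - (n - 1) * z) / (n + 1 - n * z)
    num = (mu**(n - 1) - 1) * z - mu**(n - 1) + 1 / mu
    den = (mu**n - 1) * z - mu**n + 1 / mu
    return num / den


# Extinction probability at generation k = 5 for mu = 2, computed both ways.
print(pgf_generation_iterated(0.0, 2.0, 5), pgf_generation_closed(0.0, 2.0, 5))
```

Evaluating at $z = 0$ returns $\Pr[X_k = 0]$, which increases with $k$ towards $\frac{1}{\mu}$ for $\mu > 1$.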
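The probability generating function of the scaled random variables $W_k = \frac{X_k}{\mu^k}$ follows from (12.21) and (12.8) as
$$\varphi_{W_k}(z) = \frac{\left(\mu^{k-1} - 1\right)z^{\mu^{-k}} - \mu^{k-1} + \frac{1}{\mu}}{\left(\mu^k - 1\right)z^{\mu^{-k}} - \mu^k + \frac{1}{\mu}}$$
from which $\varphi_{W;\mathrm{Geo}}(z) = \lim_{k\to\infty}\varphi_{W_k}(z)$ follows as
$$\varphi_{W;\mathrm{Geo}}(z) = \frac{\frac{1}{\mu}\log z + \frac{1}{\mu} - 1}{\log z + \frac{1}{\mu} - 1} \tag{12.22}$$
and
$$\chi_{W;\mathrm{Geo}}(t) = \frac{\frac{t}{\mu} + 1 - \frac{1}{\mu}}{t + 1 - \frac{1}{\mu}} = \frac{1}{\mu} + \sum_{k=0}^{\infty}\frac{\mu^{k-1}(-1)^k}{(\mu - 1)^{k-1}}\,t^k \tag{12.23}$$
Since $\chi_W(t) = E\left[e^{-Wt}\right]$ and using (2.40), all moments are found as $E\left[W^k\right] = \frac{k!\,\mu^{k-1}}{(\mu-1)^{k-1}}$. Furthermore, with $\pi_0 = \varphi_W(0) = \frac{1}{\mu}$ and from (2.38), the probability density function follows as
$$f_{W;\mathrm{Geo}}(x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\frac{t}{\mu} + 1 - \frac{1}{\mu}}{t + 1 - \frac{1}{\mu}}\,e^{xt}\,dt \qquad (c > 0)$$
By closing the contour for $x > 0$ over the negative $\mathrm{Re}(t)$-plane, we encounter a simple pole at $t = -1 + \frac{1}{\mu} = -(1 - \pi_0) < 0$ (since $\mu > 1$), resulting in
$$f_{W;\mathrm{Geo}}(x) = \begin{cases}\left(1 - \frac{1}{\mu}\right)^2\exp\left(-\left(1 - \frac{1}{\mu}\right)x\right) & x > 0 \\ \frac{1}{\mu}\,\delta(x) & x = 0 \\ 0 & x < 0\end{cases} \tag{12.24}$$
From (12.7), the variance is $\mathrm{Var}[W_{\mathrm{Geo}}] = \frac{\mu+1}{\mu-1}$. The limit random variable $W_{\mathrm{Geo}}$ of a geometric branching process is exponentially distributed with an atom at $x = 0$ equal to the extinction probability $\pi_0 = \frac{1}{\mu}$. From (12.17), the exponent $\beta_{\mathrm{Geo}} = 1$ for any value of $\mu > 1$. Comparing (12.24) and the general relation (12.19) for small $x$ indicates that the parameter $F = \left(1 - \frac{1}{\mu}\right)^2$ for a geometric production process.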
The limit random variable $W$ for production processes $Y$ of which all moments exist can be computed via Taylor series expansions. In Van Mieghem (2005), series for both $\chi_{W;\mathrm{Po}}(t)$ and $f_{W;\mathrm{Po}}(x)$ of a Poisson branching process are presented. Fig. 12.3 illustrates that the probability density function $f_{W;\mathrm{Po}}(x)$ of a Poisson branching process is definitely distinct from that of a geometric branching process. Since $E[W] = 1$, the variance $\mathrm{Var}[W_{\mathrm{Po}}] = \frac{1}{\mu - 1}$ of a Poisson limit random variable $W_{\mathrm{Po}}$ implies that $f_{W;\mathrm{Po}}(x)$ is centered around $x = 1$ more tightly as $\mu$ increases.
13
General queueing theory
Queueing theory describes basic phenomena such as the waiting time, the
throughput, the losses, the number of queueing items, etc. in queueing
systems. Following Kleinrock (1975), any system in which arrivals place
demands upon a finite-capacity resource can be broadly termed a queueing
system.
Queueing theory is a relatively new branch of applied mathematics that
is generally considered to have been initiated by A. K. Erlang in 1918 with
his paper on the design of automatic telephone exchanges, in which the famous Erlang blocking probability, the Erlang B-formula (14.17), was derived
(Brockmeyer et al., 1948, p. 139). It was only after the Second World War,
however, that queueing theory was boosted mainly by the introduction of
computers and the digitalization of the telecommunications infrastructure.
For engineers, the two volumes by Kleinrock (1975, 1976) are perhaps the
most well-known, while in applied mathematics, apart from the penetrating
influence of Feller (1970, 1971), the Single Server Queue of Cohen (1969)
is regarded as a landmark. Since Cohen’s book, which incorporates most
of the important work before 1969, a wealth of books and excellent papers
have appeared, an evolution that is still continuing today.
13.1 A queueing system
Examples of queueing abound in daily life: queueing situations at a ticket
window in the railway station or post office, at the cash points in the supermarket, the waiting room at the airport, train or hospital, etc. In telecommunications, the packets arriving at the input port of a router or switch are buffered in the output queue before transmission to the next hop towards the destination. In general, a queueing system consists of (a) arriving items (packets or customers), (b) a buffer or waiting room, (c) a service center and (d) departures from the system.
[Figure 13.1: block diagram linking the arrival process, the queueing process, the service process and the departure process.]
Fig. 13.1. The main processes in a general queueing system.
The main processes as illustrated in Fig. 13.1 are stochastic in nature.
Initially in queueing theory, the main stochastic processes were described in continuous time, while with the introduction of the Asynchronous Transfer Mode (ATM) in the late eighties, many queueing problems were more effectively treated in discrete time, where the basic time unit or time slot was the minimum service time of one ATM cell. In the literature, there is unfortunately no widely adopted standard notation for the main random variables, which often hampers clarity. Let us start by defining the main random variables in continuous time.
13.1.1 The arrival process
The arrival process is characterized by the arrival time $t_n$ of the $n$-th packet (customer) and the interarrival time $\tau_n = t_n - t_{n-1}$ between the $n$-th and $(n-1)$-th packet. If all interarrival times are i.i.d. random variables with distribution $F_A(t)$, then
$$\Pr[\tau_n \le t] = F_A(t)$$
As illustrated in Fig. 8.1, we can associate a counting process $\{N(t), t \ge 0\}$ to the arrival process $\{t_n, t \ge 0\}$ by the equivalence $\{N(t) \ge n\} \Longleftrightarrow \{t_n \le t\}$. In other words, if all interarrival times are i.i.d., the number of arriving packets (customers) is a general renewal process with interarrival time distribution specified by $F_A(t)$. We mention explicitly the condition of independence, which was initially considered as a natural assumption. In recent measurements, however, arrivals of IP packets are shown not to obey this simple condition of independence, which has led to the use of complicated self-similar and long-range dependent arrival processes.
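The equivalence between the arrival times and the counting process can be made concrete with a short Python sketch. It assumes that the first arrival occurs one interarrival time after time zero; the function names and the sample values are illustrative only.

```python
from bisect import bisect_right
from itertools import accumulate


def arrival_times(interarrival_times):
    """Arrival times t_1, t_2, ... as cumulative sums of the interarrival times."""
    return list(accumulate(interarrival_times))


def count_arrivals(times, t):
    """N(t): the number of arrivals with t_n <= t (times must be sorted)."""
    return bisect_right(times, t)


times = arrival_times([0.5, 1.0, 0.25])   # t_1 = 0.5, t_2 = 1.5, t_3 = 1.75
for n, t_n in enumerate(times, start=1):
    # {N(t) >= n} holds exactly from t = t_n onwards.
    print(n, t_n, count_arrivals(times, t_n) >= n)
```

Because the arrival times are sorted, a binary search suffices to evaluate the counting process at any time instant.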
In the sequel, we will use the following notation: $N_A(t)$ is the number of arrivals at time $t$, while $A(t) = \int_0^t N_A(u)\,du$ is the total number of arrivals in the interval $[0,t]$.
13.1.2 The service process
The service process is specified in a similar way by the service time $x_n$ of the $n$-th packet (customer). If the random variables $x_n$ are i.i.d. with distribution $F_x(t)$, then
$$\Pr[x_n \le t] = F_x(t) \tag{13.1}$$
The service process needs additional specifications. First of all, in a singleserver queueing system, only one packet (customer) is served at a time. If
there is more than one server, more packets can evidently be served simultaneously. Next, we must detail the service discipline or scheduling rule,
which describes the way a packet is treated. There is a large variety of
service disciplines. If all packets are of equal priority, the simplest rule
is first-in-first-out (FIFO), which serves the packets in the same order in
which they arrive. Other types such as last-in-first-out or a random order are possible, though in telecommunication, FIFO occurs most often. If
we have packets of different multimedia flows, all with different quality of
service requirements, not all packets have equal priority. For instance, a
delay-sensitive packet (of e.g. a voice call) must be served as soon as possible, preferably before non-delay-sensitive packets (of e.g. a file transfer).
In these cases, packets are extracted from the queue by a certain scheduling rule. The simplest case is a two-priority system with a head-of-the-line
scheduling rule: high-priority packets are always served before low-priority
packets. In the sequel, we confine the presentation to a single-server system
with one type of packet and a FIFO discipline. Hence, we omit a discussion
of scheduling rules. A next assumption is that of work conservation: if there
is a packet waiting for service, the server will always serve the packet. Thus,
the server is only idle if there are no packets waiting in the buffer and immediately starts service when the first packet is placed in the queue or arrives.
In a non-work-conservative system, the server may stay idle, even if there
are customers waiting (e.g. a situation where patients have to wait during
a coffee break in a hospital). Finally, we assume that the arrival process
is independent of the service process. Situations where arriving packets of
some type (e.g. control) change the way the remaining packets in the buffer
are served, or a service discipline that serves at a rate proportional to the
number of waiting packets, are not treated.
The service in a router consists in fetching the packet from the buffer,
inspecting the header to determine the correct output port and in placing
the packet on the output link for transmission.
In this chapter unless the contrary is explicitly mentioned, we consider
a single-server queueing system under a work-conservative, FIFO service
discipline in which the arrival and service process are independent.
13.1.3 The queueing process
From Fig. 13.1, we observe at least two aspects regarding the queue or
buffer: (a) the number of different queues and (b) the number of positions
in the queue. In general, a queueing system may have several queues or
even a shared queue for different servers. For example, in a router, there
is one physical fast memory or buffer in which arriving packets are placed.
Depending on the output interfaces, each link driver per output port is a
server that extracts the packet destined for its link from the common buffer
and transmits the packet on this link. For simplicity, we consider here only
one queue with $K$ positions. Often queueing analyses are greatly simplified
in the infinite buffer case $K \to \infty$. If the buffer is infinitely long, there is
zero loss, as opposed to the finite buffer case in which losses can occur if the
queue is full and packets arrive.
So far, the description of the queueing system is complete: we have specified the arrival process, the service process and the physical size of the
waiting room or queue. We now turn our attention to desirable quantities
that can be deduced from the model specification of the queueing system
such as (a) the waiting or queueing time $w_n$ of the $n$-th packet, (b) the
system time $T_n = w_n + x_n$ of the $n$-th packet, (c) the unfinished work (also
called the virtual waiting time or workload) $v(t)$ at time $t$, (d) the number
of packets in the queue $N_Q(t)$ or in the system $N_S(t)$ at time $t$ and (e) the
departure time $r_n$ of the $n$-th packet.
The waiting or queueing time $w_n$ of the $n$-th packet is only zero if the
queue is empty at arrival time $t_n$. The unfinished work $v(t)$ at time $t$ is
the time needed to empty the queueing system or to serve all remaining
packets in the system (queue plus server) at time $t$. Hence, the unfinished
work at time $t$ is equal to the sum of the service times of the $N_Q(t)$ buffered
packets at time $t$ plus the remaining service time of the packet under service
at time $t$. Precisely at an arrival epoch $t = t_n$ as illustrated in Fig. 13.2, we
observe that $v(t_n) = T_n = w_n + x_n$. In addition, $v'(t) = -1$ for all $t \ne t_n$ at which $v(t) > 0$, or
$v(t) = \max[T_n - t + t_n, 0]$ for $t_n \le t < t_{n+1}$.
The departure times $r_n$ satisfy $r_n = t_n + T_n$. The time during which
the server is busy is called a busy period, and likewise, the interval of non-activity is called an idle period.
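The quantities just defined can be computed for a concrete sample path. The following is a minimal sketch, assuming a single work-conserving FIFO server that starts empty; the function names and the numerical arrival and service times are hypothetical.

```python
def fifo_sample_path(arrivals, services):
    """Waiting times w_n, system times T_n and departure times r_n
    of a single-server FIFO queue that is empty at time 0."""
    waits, system_times, departures = [], [], []
    previous_departure = 0.0
    for t_n, x_n in zip(arrivals, services):
        start = max(t_n, previous_departure)   # service starts when the server is free
        w_n = start - t_n                      # waiting time in the queue
        T_n = w_n + x_n                        # system time
        r_n = t_n + T_n                        # departure time
        waits.append(w_n)
        system_times.append(T_n)
        departures.append(r_n)
        previous_departure = r_n
    return waits, system_times, departures


def unfinished_work(arrivals, system_times, t):
    """v(t) = max(T_n - (t - t_n), 0) for the last arrival with t_n <= t."""
    v = 0.0
    for t_n, T_n in zip(arrivals, system_times):
        if t_n <= t:
            v = max(T_n - (t - t_n), 0.0)
    return v


arrivals = [1.0, 2.0, 6.0]      # hypothetical t_n
services = [2.0, 1.5, 1.0]      # hypothetical x_n
w, T, r = fifo_sample_path(arrivals, services)
print(w, T, r)
print(unfinished_work(arrivals, T, 3.0))
```

In this example the third packet arrives after the second has left, so it finds the system empty and a new busy period starts.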
Fig. 13.2. The unfinished work $v(t)$ and the number of packets in the system $N_S(t)$
as function of time. At any new arrival at $t_n$ holds $v(t_n) = w_n + x_n$. The unfinished
work $v(t)$ decreases with slope $-1$ between two arrivals. The waiting times $w_n$ and
departure times $r_n$ are also shown. Notice that $w_1 = w_5 = 0$.
13.1.4 The Kendall notation for queueing systems
Kendall introduced a notation that is commonly used to describe or classify
the type of a queueing system. The general syntax is A/B/n/K/m, where
A specifies the interarrival process, B the service process, n the number of
servers, K the number of positions in the queue and m restricts the number
of allowed arrivals in the queueing system. Examples for both the interarrival distribution A and the service distribution B are M (memoryless or
Markovian) for the exponential distribution, G for a general distribution$^1$
and D for a deterministic distribution. When other letters are used besides
these three common assignments, the meaning will be defined. For example, M/G/1 stands for a queueing system with exponentially distributed
interarrival times, a general service distribution and 1 server. If one of the
two last identifiers K and m is not written, it should be interpreted as
infinity. Hence, M/G/1 has an infinitely long queue and no restriction on
the number of allowed arrivals.
1 Often G is written where GI, general independent process, is meant. We interpret G as a
general interarrival process, which can be correlated over time.
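The defaulting rule for the last two identifiers can be illustrated with a small parser. This is only a sketch: the `Kendall` tuple, its field names and the function `parse_kendall` are hypothetical helpers introduced here, not part of any standard library.

```python
import math
from typing import NamedTuple


class Kendall(NamedTuple):
    arrival: str             # A: interarrival process, e.g. "M", "G", "D"
    service: str             # B: service process
    servers: int             # n: number of servers
    queue_positions: float   # K: positions in the queue (infinite if omitted)
    allowed_arrivals: float  # m: limit on allowed arrivals (infinite if omitted)


def parse_kendall(notation):
    """Parse 'A/B/n', 'A/B/n/K' or 'A/B/n/K/m' into a Kendall tuple."""
    parts = notation.split("/")
    if not 3 <= len(parts) <= 5:
        raise ValueError("expected between 3 and 5 fields separated by '/'")
    optional = [float(p) for p in parts[3:]]
    optional += [math.inf] * (2 - len(optional))   # omitted identifiers mean infinity
    return Kendall(parts[0], parts[1], int(parts[2]), optional[0], optional[1])


print(parse_kendall("M/G/1"))
print(parse_kendall("M/M/2/10"))
```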
13.1.5 The traffic intensity $\rho$
An important parameter in any queueing system is the traffic intensity $\rho$, also
called the load or the utilization, defined as the ratio of the mean service
time $E[x] = \frac{1}{\mu}$ over the mean interarrival time $E[\tau] = \frac{1}{\lambda}$,
\[ \rho = \frac{E[x]}{E[\tau]} = \frac{\lambda}{\mu} \tag{13.2} \]
where $\lambda$ and $\mu$ are the mean interarrival and service rate, respectively.
Clearly, if $\rho > 1$ or $E[x] > E[\tau]$, which means that the mean service time is
longer than the mean interarrival time, then the queue will grow indefinitely
long for large $t$, because packets are arriving faster on average than they
can be served. In this case ($\rho > 1$), the queueing system is unstable or will
never reach a steady-state. The case where $\rho = 1$ is critical. In practice,
therefore, mostly situations where $\rho < 1$ are of interest. If $\rho < 1$, a steady-state can be reached. These considerations are a direct consequence of the
law of conservation of packets in the system, but can be proved rigorously
by ergodic theory or Markov steady-state theory, which determine when the
process is positive recurrent.
13.2 The waiting process: Lindley’s approach
From the definition of the waiting time and from Fig. 13.2, a relation between
$w_{n+1}$ and $w_n$ is found. Suppose the waiting time for the first packet $w_1 = w$,
which is the initialization. If $r_n \le t_{n+1}$, which means that the $n$-th packet
leaves the queueing system before the $(n+1)$-th packet arrives, the system
is idle and $w_{n+1} = 0$. In all other situations, $r_n > t_{n+1}$, the $n$-th packet
is still in the queueing system while the next $(n+1)$-th packet arrives and
$w_{n+1} = t_n + w_n + x_n - t_{n+1}$. Indeed, the waiting time of the $(n+1)$-th packet
equals the system time $T_n = w_n + x_n$ of the $n$-th packet, which started
at $t_n$, minus its own arrival time $t_{n+1}$. During the interval $[t_n, t_{n+1}]$, the
queueing system has processed an amount of the unfinished work equal to
$t_{n+1} - t_n = \tau_{n+1}$ time units. Hence, we arrive at the general recursion for
the waiting time,
\[ w_{n+1} = \max(w_n + x_n - \tau_{n+1}, 0) \]
Let $\theta_n = x_n - \tau_{n+1}$, then
\[
\begin{aligned}
w_{n+1} &= \max[0, w_n + \theta_n] \\
&= \max[0, \max[w_{n-1} + \theta_{n-1}, 0] + \theta_n] \\
&= \max[0, \theta_n, w_{n-1} + \theta_{n-1} + \theta_n]
\end{aligned}
\tag{13.3}
\]
and, by iteration,
\[ w_{n+1} = \max\left[0, \theta_n, \theta_{n-1} + \theta_n, \theta_{n-2} + \theta_{n-1} + \theta_n, \ldots, \sum_{k=1}^{n} \theta_k + w_1\right] \tag{13.4} \]
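The recursion (13.3) and the iterated form (13.4) can be compared numerically. The sketch below is illustrative only: the function names and the sample service and interarrival times are hypothetical, and the difference $x_n - \tau_{n+1}$ is stored in a list called `thetas`.

```python
def lindley_waiting_times(services, next_interarrivals, w1=0.0):
    """Iterate w_{n+1} = max(w_n + x_n - tau_{n+1}, 0), starting from w_1.

    services[i] holds x_n and next_interarrivals[i] holds tau_{n+1}
    for the same packet n = i + 1.
    """
    waits = [w1]
    for x_n, tau_next in zip(services, next_interarrivals):
        waits.append(max(waits[-1] + x_n - tau_next, 0.0))
    return waits


def lindley_explicit(thetas, w1=0.0):
    """Evaluate the maximum in (13.4) for w_{n+1}, given theta_1..theta_n."""
    candidates = [0.0]
    tail_sum = 0.0
    for theta in reversed(thetas):     # theta_n, theta_{n-1} + theta_n, ...
        tail_sum += theta
        candidates.append(tail_sum)
    candidates[-1] += w1               # only the complete sum carries w_1
    return max(candidates)


services = [2.0, 1.5, 1.0, 0.5]             # hypothetical x_1..x_4
next_interarrivals = [1.0, 4.0, 0.25, 0.25]  # hypothetical tau_2..tau_5
thetas = [x - tau for x, tau in zip(services, next_interarrivals)]

recursive = lindley_waiting_times(services, next_interarrivals)
explicit = [lindley_explicit(thetas[:n]) for n in range(len(thetas) + 1)]
print(recursive)
print(explicit)
```

In this sample path the third packet has zero waiting time, so the waiting times that follow no longer depend on the first two differences.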
A number of observations are in order:
First, if both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are
i.i.d. random variables and mutually independent, then the differences $\theta_n$
are i.i.d. random variables. In addition, $w_n$ and $\theta_n$ are also independent
because (13.4) shows that $w_n$ only depends on $\theta_k$ with indices $k < n$. Then,
the waiting time process $\{w_n\}_{n \ge 1}$ is a discrete time Markov process with
a continuous state space (the waiting times $w_n$ are non-negative real numbers)
because the general relation (13.3) reveals that, since the random variable
$\theta_n$ is independent of $w_n$, the waiting time for the $(n+1)$-th packet is only
dependent on the waiting time of the previous $n$-th packet. This is the
Markov property. Since the state space is a continuum, it is not a Markov
chain, merely a Markov process.
Second, if there exists a packet $m$ for which $w_m = 0$ (e.g. packet $m = 5$
in Fig. 13.2), which means that the $m$-th packet finds the system empty,
then all packets after the $m$-th packet are isolated from the effects of those
before the $m$-th. Mathematically, this separation between two busy periods
directly follows from (13.3) because $w_{m+1} = \max[0, \theta_m]$ leading via iteration
for $n \ge m$ to
\[ w_{n+1} = \max\left[0, \theta_n, \theta_{n-1} + \theta_n, \ldots, \sum_{k=m+1}^{n} \theta_k, \sum_{k=m}^{n} \theta_k\right] \]
In other words, this relation is similar to (13.4) as if the system were started
from $k = m$ and $w_m = 0$ instead of $k = 1$ with $w_1 = w$. Any busy period can
be regarded as a renewal of the waiting process, independent of the previous
busy periods.
Third, again invoking the assumption that $\theta_n$ are i.i.d. random variables,
then the order in the sequence $\{\theta_n\}_{n \ge 1}$ is of no importance in (13.4) and
we may relabel the random variables in (13.4) as $\theta_k \to \theta_{n-k+1}$ to obtain a
new random variable
\[ \omega_{n+1} = \max\left[0, \theta_1, \theta_1 + \theta_2, \ldots, \sum_{k=1}^{n-1} \theta_k, \sum_{k=1}^{n} \theta_k + w_1\right] \]
which is identically distributed as $w_{n+1}$. The interest of this observation
is that, provided $w_1 = 0$ and, hence, $\omega_n = \max_{1 \le j \le n} \sum_{k=0}^{n-j} \theta_k$ where $\theta_0 = 0$,
the sequence $\{\omega_n\}_{n \ge 1}$ can only increase with $n$, because the maximum
cannot decrease$^2$ if an additional term $\sum_{k=0}^{n+1} \theta_k$ is added. Thus, if $w_1 = 0$,
the event $\{\omega_{n+1} < x\}$ is always contained in $\{\omega_n < x\}$. In steady-state,
which is reached if $n \to \infty$,
\[ \lim_{n \to \infty} \{\omega_n < x\} = \bigcap_{n=1}^{\infty} \{\omega_n < x\} = \left\{ \sup_{j \ge 0} \sum_{k=0}^{j} \theta_k < x \right\} \]
which means that the random variable $\omega_n$ with same distribution as the
waiting time $w_n$ converges to a limit random variable that is the supremum
of the terms $\sum_{k=0}^{j} \theta_k$ in the series. From this relation, it follows that the
steady-state distribution $W(x)$ of the waiting time is
\[ W(x) = \lim_{n \to \infty} \Pr[w_n < x] = \lim_{n \to \infty} \Pr[\omega_n < x] = \Pr\left[ \sup_{j \ge 0} \sum_{k=0}^{j} \theta_k < x \right] \]
if the latter probability exists, i.e. not zero for all $x$. Lindley has proved that,
if $\rho < 1$, the latter corresponds to a proper probability distribution. In other
words, the steady-state distribution of the waiting time in a GI/G/1 system$^3$
exists. Alternatively, the Markov process $\{w_n\}_{n \ge 1}$ is ergodic if $\rho < 1$.
Lindley's proof is as follows. Due to the assumption that $\theta_n$ are i.i.d. random variables, the
Strong Law of Large Numbers (6.3) is applicable: $\Pr\left[\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} \theta_k = E[\theta]\right] = 1$ where
$E[\theta] = E[x] - E[\tau] < 0$ (the mean service time is smaller than the mean interarrival time) if
$\rho < 1$ while $E[\theta] > 0$ if $\rho > 1$. In case $\rho > 1$, there exists a number $\nu > 0$ and $\sigma > 1$ such
that, for all $n > \nu$, holds $\sum_{k=0}^{n} \theta_k \ge \frac{1}{\sigma} E[\theta]\, n$ with probability 1. For large $\nu$, $\sum_{k=0}^{n} \theta_k$ can be
made larger than any fixed $x$ such that $\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \theta_k < x\right] = 0$. In case $\rho < 1$, we have for
sufficiently large $n$ that $\sum_{k=0}^{n} \theta_k < 0$. Thus, for any $x > 0$ and $\epsilon > 0$, there exists a number $\nu$ (independent of $x$) such that, for all $n > \nu$,
\[ \Pr\left[\sum_{k=0}^{n} \theta_k < x\right] \ge \Pr\left[\sum_{k=0}^{n} \theta_k < 0\right] > 1 - \epsilon \]
while, for $n < \nu$, we can always find a number $\delta > 0$ such that for all $x > \delta$,
\[ \Pr\left[\sum_{k=0}^{n} \theta_k < x\right] > 1 - \epsilon \]
Since $\sup_{j \ge 0} \sum_{k=0}^{j} \theta_k$ is attained for $j < \nu$ or $j > \nu$ and because both regimes can be bounded
by the same lower bound,
\[ \Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \theta_k < x\right] > \Pr\left[\sum_{k=0}^{n < \nu} \theta_k < x\right] 1_{j < \nu} + \Pr\left[\sum_{k=0}^{n > \nu} \theta_k < x\right] 1_{j > \nu} > 1 - \epsilon \]
Clearly, $\lim_{x \to \infty} \Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \theta_k < x\right] = 1$ and $\Pr[w_n < 0] = 0$, thus $\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \theta_k < x\right]$
is non-decreasing and a proper probability distribution. We omit the considerations for the case
$\rho = 1$. $\square$
2 This observation cannot be made from (13.4) because $\theta_n$, which affects all but the first term
in the maximum, can be negative.
3 Notice that the analysis crucially relies on the independence of the interarrival and service
process.
We now concentrate on the computation of the steady-state distribution
for the waiting time (in the queue) in the case that the load $\rho < 1$ and
under the confining assumption that both the interarrival times $\tau_{n+1}$ and
the service times $x_n$ are i.i.d. random variables. We find from (13.3) that
\[
\Pr[w_{n+1} < x] =
\begin{cases}
\Pr[w_n + \theta_n < x] & \text{if } x \ge 0 \\
0 & \text{if } x < 0
\end{cases}
\]
With the law of total probability (2.46) and since $\theta_n$ can be negative, the
right-hand side is
\[ \Pr[w_n + \theta_n < x] = \int_{-\infty}^{\infty} \Pr[w_n < x - s \,|\, \theta_n = s]\, \frac{d}{ds} \Pr[\theta_n < s]\, ds \]
Using the independence of $w_n$ and $\theta_n$, and that $w_n \ge 0$, we obtain for $x \ge 0$,
\[ \Pr[w_n + \theta_n < x] = \int_{-\infty}^{x} \Pr[w_n < x - s]\, d\Pr[\theta_n < s] \]
The distribution $\Pr[\theta_n < s] = \Pr[x_n - \tau_{n+1} < s] = C_n(s)$ can be computed
(see Problem (v) in Chapter 3) provided the interarrival and service process
are known. Proceeding to the steady-state by letting $n \to \infty$ amounts to
Lindley's integral equation in $W(x) = \lim_{n \to \infty} \Pr[w_n < x]$ with $C(x) = \lim_{n \to \infty} C_n(x)$,
\[
W(x) =
\begin{cases}
\int_{-\infty}^{x} W(x - s)\, dC(s) & \text{if } x \ge 0 \\
0 & \text{if } x < 0
\end{cases}
\tag{13.5}
\]
The integral equation (13.5) is of the Wiener-Hopf type and treated in general by Titchmarsh (1948, Section 11.17) and specifically by Kleinrock (1975,
Section 8.2) and Cohen (1969, p. 337). Apart from Lindley's approach, Pollaczek has used variants of the complex integral expression for
\[ \max(x, 0) = \frac{x e^{-ax}}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{xz}}{z - a}\, dz \qquad (c > \mathrm{Re}(a) > 0) \]
to treat the complicating non-linear function $\max(x, 0)$ in (13.3). Several
other approaches (Kleinrock, 1975, Chapter 8) have been proposed to solve
(13.3). We will only discuss the approach due to Beneš, because his approach
does not make the confining assumption that both the interarrival times
$\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables. As mentioned
before, in Internet traffic, which has been shown to be long-range dependent
(i.e. correlated over many time units mainly due to TCP's control loop), the
interarrival times can be far from independent.
Fig. 13.3. The amount of work arriving to the queueing system $\alpha(t)$ versus time
$t$. At $t = u$, we observe that $\xi(t) = \alpha(t) - t + v(0^-) > 0$. The largest value of
$\xi(t) - \xi(u)$ is found for $u = t_1$ because $\xi(t_1) = -t_1$, the only negative value of $\xi(t)$
in $[0, u)$. Graphically, we shift the line at 45° so to intersect the point $(t_1, \alpha(t_1))$ to
determine $v(u)$. At $t = \tau$, $\xi(\tau) < 0$ and the largest negative value of $\xi(t)$ in $[0, \tau)$
is attained at $t = t_8$. Three of the five idle periods have also been shown.
13.3 The Beneš approach to the unfinished work
Instead of observing the queueing system at a time $t$, the Beneš approach
considers the behavior over a time interval $[0, t)$. Let $\alpha(t)$, $\eta(t)$ and $b(t)$
denote the amount of work arriving to the queueing system in the interval
$[0, t)$, the total idle time of the server and the total busy time of the server in
the interval $[0, t)$, respectively. The amount of work arriving to the system
is expressed in units of time and must be regarded as the time needed to
process this work, similarly to the definition of the unfinished work. If the
work to process arrives at discrete times, then $\alpha(t)$ increases in jumps, as
illustrated in Fig. 13.3,
\[ \alpha(t) = \sum_{j=0}^{A(t)} x_j \]
In general, however, the work may arrive continuously over time with possibly jumps at certain times. The purpose is to determine the unfinished
work or virtual waiting time $v(t)$ at time instant $t$, and not over a time
interval $[0, t)$ as the previously defined quantities and $A(t) = \int_0^t N_A(u)\,du$.
Clearly, for $t > 0$, the unfinished work at time $t$ consists of the total amount
of work brought in by arrivals during $[0, t)$ plus the amount of work present
just before $t = 0$ minus the total time the server has been active,
\[ v(t) = v(0^-) + \alpha(t) - b(t) \tag{13.6} \]
From the definitions above,
\[ \eta(t) + b(t) = t \tag{13.7} \]
Moreover, $\alpha(t)$, $\eta(t)$ and $b(t)$ are non-decreasing and right continuous (jumps
may occur) functions of time $t$. Since $\eta(t)$ and $b(t)$ are complementary, it is
convenient to eliminate $b(t)$ from (13.6) and (13.7) and further concentrate
on the total idle time $\eta(t)$ given as
\[ \eta(t) = v(t) + t - v(0^-) - \alpha(t) \tag{13.8} \]
If $v(u) > 0$ at all times $u \in [0, t)$, then $\eta(t) = 0$. On the other hand, if
$v(u) = 0$ at some time $u \in [0, t)$, then it follows from (13.8) that
\[ \eta(u) = u - v(0^-) - \alpha(u) \tag{13.9} \]
Since $\eta(t)$ is non-decreasing in $t$, the total idle time in the interval $[0, t)$ at
moments $u$ when the buffer is empty ($v(u) = 0$) is the largest value for $\eta$ that can be reached in $[0, t)$,
\[ \eta(t) = \sup_{0 < u < t} \left( u - v(0^-) - \alpha(u) \right) \]
and the supremum is needed because $\alpha(t)$ can increase discontinuously (in
jumps). Combining the two regimes, we obtain in general that
\[ \eta(t) = \max\left[ 0, \sup_{0 < u < t} \left( u - v(0^-) - \alpha(u) \right) \right] \tag{13.10} \]
Equating the two general expressions (13.8) and (13.10) for the total idle
time of the server leads to a new relation for the unfinished work,
\[
\begin{aligned}
v(t) &= v(0^-) + \alpha(t) - t + \max\left[ 0, \sup_{0 < u < t} \left( u - v(0^-) - \alpha(u) \right) \right] \\
&= \max\left[ v(0^-) + \alpha(t) - t, \sup_{0 < u < t} \left\{ \alpha(t) - t - (\alpha(u) - u) \right\} \right]
\end{aligned}
\]
The quantity $\xi(t) = \alpha(t) - t + v(0^-)$ is recognized as the server overload
during $[0, t)$, while $\alpha(t) - t$ is the amount of excess work arriving during the
interval $[0, t)$. Thus, $\xi(t) - \xi(u)$ is the amount of excess work during $[u, t)$
or the overload of the server during $[u, t)$ provided $u > 0$ and $\xi(0) = v(0^-)$.
Then,
\[ v(t) = \max\left[ \xi(t), \sup_{0 < u < t} \{ \xi(t) - \xi(u) \} \right] \]
and, with the convention that $v(0^-) = \sup_{0 < u < 0^-} \{ \xi(0) - \xi(u) \} = \xi(0)$,
\[ v(t) = \sup_{0 < u < t} \{ \xi(t) - \xi(u) \} \tag{13.11} \]
The unfinished work $v(t)$ at time $t$ is equal to the largest value of the overload
or excess work during any interval $[u, t) \subset [0, t)$. The relation (13.11) is
illustrated and further explained in Fig. 13.3. This general relation (13.11)
shows that the unfinished work is the maximum of a stochastic process.
Furthermore, if $v(t) = 0$, (13.11) indicates that $\sup_{0 < u < t} \{ \xi(t) - \xi(u) \} = 0$.
Let $u_T$ denote the value at which $\sup_{0 < u < t} \{ \xi(t) - \xi(u) \} = \xi(t) - \xi(u_T) = 0$.
But $\xi(u_T)$ is the lowest value of $\xi(t)$ in $[0, t)$ and, unless an arrival occurs
during the interval $[t, t + \Delta t]$, $\xi(t + \Delta t) = \xi(t) - \Delta t < \xi(t)$. This argument
shows that, as soon as a new idle period begins, $\xi(t)$ attains the minimum
value so far.
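The supremum representation of the unfinished work can be checked against a direct, event-by-event computation of the workload. The sketch below makes several assumptions that are not in the text: work arrives only in jumps $x_j$ at the arrival times $t_j$, an arrival exactly at time $t$ is not yet counted at $t$, and the excess work is evaluated just before each jump, where its local minima occur. All function names and numbers are hypothetical.

```python
def workload_direct(arrivals, services, t, v0=0.0):
    """Unfinished work at time t by following the sample path:
    the workload drains at rate 1 (never below 0) and jumps by x_j at t_j."""
    v, last = v0, 0.0
    for t_j, x_j in zip(arrivals, services):
        if t_j >= t:
            break
        v = max(v - (t_j - last), 0.0) + x_j
        last = t_j
    return max(v - (t - last), 0.0)


def workload_sup(arrivals, services, t, v0=0.0):
    """Unfinished work at time t as max(xi(t), sup_u (xi(t) - xi(u)))."""
    def xi(u):
        # Excess work: work arrived in [0, u) minus u, plus the initial work v0.
        arrived = sum(x_j for t_j, x_j in zip(arrivals, services) if t_j < u)
        return arrived - u + v0

    # xi decreases between arrivals, so its infimum over (0, t) is reached
    # just before an arrival or in the limit u -> t.
    candidates = [xi(t_j) for t_j in arrivals if 0.0 < t_j < t]
    candidates.append(xi(t))
    return max(xi(t), xi(t) - min(candidates))


arrivals = [1.0, 2.0, 6.0]    # hypothetical arrival times t_j
services = [2.0, 1.5, 1.0]    # hypothetical work x_j brought by each arrival
for t in [0.5, 1.5, 3.0, 5.5, 6.5]:
    print(t, workload_direct(arrivals, services, t), workload_sup(arrivals, services, t))
```

The first argument of the outer `max` only matters when initial work `v0` is present and the system has not yet emptied, which mirrors the role of the convention on $v(0^-)$.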
During the idle period as shown in Fig. 13.4, $\xi(t)$ further decreases linearly
with slope $-1$ towards a new minimum $\xi(b_j)$ in $[0, b_j]$ until the beginning of
a new busy period, say the $j$-th at $t = b_j$. Then, for all $b_j < t < b_{j+1}$,
\[ v(t) = \xi(t) - \xi(b_j) = \sup_{b_j < u < t} \{ \xi(t) - \xi(u) \} \]
In other words, we observe that idle periods decouple the past behavior from
future behavior, as deduced earlier from the waiting time analysis in Section
13.2. As illustrated in Fig. 13.4, the series $\{\xi(b_j)\}$, where $b_j$ denotes the start
of the $j$-th busy period, is monotonically decreasing in $b_j$, i.e. $\xi(b_j) > \xi(b_{j+1})$
for any $j$.
Fig. 13.4. The excess work $\xi(t)$ for the same process as in previous plot. The arrows
with $b_j$ denote the start of the $j$-th busy period. Observe that $\xi(b_j)$ is the minimum
so far and that a busy period ends at $t > b_j$ for which $\xi(t) = \xi(b_j)$. The length of
a busy period has been represented by a double arrow.
Let us proceed to compute the distribution of the unfinished work following an idea due to Beneš. Beneš applies the identity$^4$, valid for all $z$,
\[ e^{-zt} = 1 - z \int_0^t e^{-z\sigma}\, d\sigma \]
to the total idle time of the server by putting $\sigma = \eta(u)$
\[ e^{-zt} = 1 - z \int_{\eta^{-1}(0)}^{\eta^{-1}(t)} e^{-z\eta(u)}\, d\eta(u) \]
where $\eta^{-1}(t)$ is the inverse function. Note that $\eta(0) = 0$ and that $d\eta(u) = 1_{\{v(u) = 0\}}\, du = \delta(v(u))\, du$ where $\delta(x)$ is the Dirac impulse. Let $t \to \eta(t)$,
then
\[ e^{-z\eta(t)} = 1 - z \int_0^t e^{-z\eta(u)}\, \delta(v(u))\, du \]
Substituting (13.9) in the integral, which is only valid if $v(u) = 0$, and (13.8)
at the left-hand side, which is generally valid, gives
\[ e^{-z\left(v(t) + t - v(0^-) - \alpha(t)\right)} = 1 - z \int_0^t e^{-z\left(u - v(0^-) - \alpha(u)\right)}\, \delta(v(u))\, du \]
4 Borovkov (1976, p. 30) proposes another but less simple approach by avoiding the use of the
identity ingeniously introduced by Beneš.
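or, in terms of the excess work $\xi(t) = \alpha(t) - t + v(0^-)$,
\[ e^{-zv(t)} = e^{-z\xi(t)} - z \int_0^t e^{-z(\xi(t) - \xi(u))}\, \delta(v(u))\, du \]
Taking the expectation of both sides yields,
\[ E\left[e^{-zv(t)}\right] = E\left[e^{-z\xi(t)}\right] - z \int_0^t E\left[e^{-z(\xi(t) - \xi(u))}\, \delta(v(u))\right] du \]
Recall that, with (2.34), with the definition of a generating function (2.37),
and further with (2.61),
\[
\begin{aligned}
Q &= E\left[e^{-z(\xi(t) - \xi(u))}\, \delta(v(u))\right] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-zx}\, \delta(y)\, \frac{\partial^2}{\partial x \partial y} \Pr[\xi(t) - \xi(u) \le x, v(u) \le y]\, dx\, dy
\end{aligned}
\]
and with (2.45), we have
\[
\begin{aligned}
Q &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-zx}\, \delta(y)\, \frac{\partial^2}{\partial x \partial y} \left( \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) \le y] \Pr[v(u) \le y] \right) dx\, dy \\
&= \int_{-\infty}^{\infty} dx\, e^{-zx}\, \frac{\partial}{\partial x} \int_{-\infty}^{\infty} \delta(y) \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) \le y]\, \frac{\partial \Pr[v(u) \le y]}{\partial y}\, dy \\
&= \int_{-\infty}^{\infty} e^{-zx}\, \frac{d}{dx} \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0] \Pr[v(u) = 0]\, dx
\end{aligned}
\]
Combining all leads to
\[
\int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\, dx = \int_{-\infty}^{\infty} e^{-zx} f_{\xi(t)}(x)\, dx - z \int_{-\infty}^{\infty} e^{-zx}\, \frac{d}{dx} \int_0^t \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0] \Pr[v(u) = 0]\, du\, dx
\]
By partial integration, we can remove the factor $z$ at both sides. Indeed,
since $\Pr[v(t) \le x] = 0$ for $x < 0$,
\[ \int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\, dx = z \int_{-\infty}^{\infty} e^{-zx} \Pr[v(t) \le x]\, dx \]
Hence, we arrive at
\[
\int_{-\infty}^{\infty} e^{-zx} \Pr[v(t) \le x]\, dx = \int_{-\infty}^{\infty} e^{-zx} \left[ \Pr[\xi(t) \le x] - \frac{d}{dx} \int_0^t \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0] \Pr[v(u) = 0]\, du \right] dx
\]
which is equivalent to
\[ \Pr[v(t) \le x] = \Pr[\xi(t) \le x] - \int_0^t \frac{\partial \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0]}{\partial x} \Pr[v(u) = 0]\, du \tag{13.12} \]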
This general relation for the distribution of the unfinished work in terms
of the excess work is the Beneš equation. If $v(u) = 0$ for all $u \in [0, t)$,
this means that during that interval no work arrives and that $\xi(t) - \xi(u) = u - t$
or that $-t \le \xi(t) - \xi(u) \le 0$ for any $u \in [0, t)$. Thus, if we choose
$x \in [-t, 0)$ such that the event $\{\xi(t) - \xi(u) = u - t \le x\}$ is possible, the
probabilities appearing in the right-hand side are not identically zero while
$\Pr[v(t) \le x] = 0$. Hence, for $x \in [-t, 0)$, the Beneš equation reduces to
\[ \Pr[\xi(t) \le x] = \int_0^t \frac{\partial \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0]}{\partial x} \Pr[v(u) = 0]\, du \]
from which the unknown probability of an empty system $\Pr[v(u) = 0]$ can
be found$^5$ for $t + x \le u \le t$. The Beneš equation translates the problem
of finding the time-dependent virtual waiting time or unfinished work into an
integral equation that, in principle, can be solved. We further note that
in the derivation hardly any assumptions about the queueing system or
the arrival process are made such that the Beneš equation provides the
most general description of the unfinished work in any queueing system. Of
course, the price for generality is a considerable complexity in the integral
equation to be solved. However, we will see examples$^6$ of its use in ATM.
13.3.1 A constant service rate
If the server operates deterministically as in ATM, for example, the amount
of work arriving to the queueing system in the interval $[0, t)$ simplifies to
$\alpha(t) = A(t)$, the number of ATM cells in the interval $[0, t)$, because $x_j = x$
is the time to process one ATM cell, which we take as time unit $x = 1$.
With this convention, we have that $\xi(t) = A(t) - t$. After substitution of
$u = t - y$, the integral $I$ in (13.12) is
\[ I = \int_0^t \frac{\partial \Pr[\xi(t) - \xi(t - y) \le x \,|\, v(t - y) = 0]}{\partial x} \Pr[v(t - y) = 0]\, dy \]
and the event
\[ \{\xi(t) - \xi(t - y) \le x\} = \{A(t) - A(t - y) \le x + y\} \]
5 This relation in the unknown function $f(u) = \Pr[v(u) = 0]$ is a Volterra equation of the first
kind (see e.g. Morse and Feshbach (1978, Chapter 8))
\[ g(z) = \int_a^z K(z|u) f(u)\, du \]
These integral equations frequently appear in physics in boundary problems, potential and
Green's function theory.
6 Borovkov (1976) investigates the Beneš method in more detail. He further derives from (13.12)
formulae for light and heavy traffic, and the discrete time process.
Since $A(t) - A(t - y)$ is a non-negative integer $k$ and thus a discrete random
variable, the probability density function is
\[ \frac{\partial \Pr[\xi(t) - \xi(t - y) \le x \,|\, v(t - y) = 0]}{\partial x} = \Pr[A(t) - A(t - y) = k \,|\, v(t - y) = 0]\, 1_{x + y = k} \]
which implies that only values at $y = k - x$ contribute to the integral $I$.
Hence, $0 \le y \le t$ implies that $\lceil x \rceil \le k \le \lfloor x + t \rfloor$ where $\lfloor z \rfloor$ (respectively
$\lceil z \rceil$) is the integer equal to or smaller (respectively larger) than $z$,
\[ I = \sum_{k = \lceil x \rceil}^{\lfloor x + t \rfloor} \Pr[A(t) - A(t + x - k) = k \,|\, v(t + x - k) = 0] \Pr[v(t + x - k) = 0] \]
Hence, for a discrete queue with time slots equal to the constant service
time, the Beneš equation reduces to
\[ \Pr[v(t) \le x] = \Pr[A(t) \le \lfloor x + t \rfloor] - \sum_{k = \lceil x \rceil}^{\lfloor x + t \rfloor} \Pr[A(t) - A(t + x - k) = k \,|\, v(t + x - k) = 0] \Pr[v(t + x - k) = 0] \tag{13.13} \]
13.3.2 The steady-state distribution of the virtual waiting time
Let us turn to the steady-state distribution $V(x) = \lim_{t \to \infty} \Pr[v(t) \le x]$.
Since $\alpha(t)$ is the amount of work arriving to the queueing system in the
interval $[0, t)$, $\frac{\alpha(t)}{t}$ is the mean amount of work arriving in that interval
$[0, t)$. The steady-state stability condition, equivalent to $\rho < 1$, requires
that
\[ \lim_{t \to \infty} \frac{\alpha(t)}{t} = \rho < 1 \]
because the server capacity is 1 unit of work per unit of time. Since $\alpha(t)$
is not decreasing, $\xi(t)$ decreases continuously with slope $-1$ between two
arrivals and increases (possibly discontinuously with jumps) during arrival
epochs, as illustrated in Fig. 13.3. In the stable steady-state regime where
$\rho < 1$ and $\lim_{t \to \infty} \frac{\alpha(t)}{t} < 1$, we find that
\[ \lim_{t \to \infty} \xi(t) = \lim_{t \to \infty} t \left( \frac{\alpha(t)}{t} - 1 \right) = -\infty \]
and thus $\lim_{t \to \infty} \Pr[\xi(t) \le x] = 1$ and $\lim_{t \to \infty} \frac{\xi(t)}{t} = \rho - 1 < 0$. From (13.7),
we see that
\[ \lim_{t \to \infty} \frac{\eta(t)}{t} = 1 - \lim_{t \to \infty} \frac{b(t)}{t} = 1 - \rho \]
which, with (13.9), suggests that
\[ V(0) = \lim_{t \to \infty} \Pr[v(t) = 0] = 1 - \rho \tag{13.14} \]
If the Strong Law of Large Numbers is applicable, which implies that the
lengths of the idle periods are independent and identically distributed, this
relation $V(0) = 1 - \rho$ is proved to be true by Borovkov (1976, pp. 33-34).
Hence, for any stationary single-server system with traffic intensity $\rho$, the
probability of an empty system at an arbitrary time is $1 - \rho$.
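The long-run idle fraction can be estimated by simulation. The sketch below is not a proof, only a numerical illustration under assumptions chosen here: a work-conserving single server that starts empty, exponential interarrival times with mean 1 and service times uniform on $[0, 1.2]$, so that the load is 0.6. The function name `idle_fraction` and the seed are hypothetical.

```python
import random


def idle_fraction(interarrivals, services):
    """Fraction of the interval [0, last departure] in which the server is idle."""
    clock = 0.0        # current arrival time
    departure = 0.0    # time at which the server becomes free
    idle = 0.0         # accumulated idle time
    for tau, x in zip(interarrivals, services):
        clock += tau
        if clock > departure:          # server was idle before this arrival
            idle += clock - departure
            departure = clock
        departure += x                 # serve the arriving work
    return idle / departure


rng = random.Random(1)                 # fixed seed for reproducibility
n = 200_000
interarrivals = [rng.expovariate(1.0) for _ in range(n)]   # mean interarrival time 1
services = [rng.uniform(0.0, 1.2) for _ in range(n)]       # mean service time 0.6
rho = 0.6
estimate = idle_fraction(interarrivals, services)
print(estimate, 1 - rho)
```

For a deterministic sanity check, four packets with interarrival time 2 and service time 1 give an idle fraction of 5/9 over the observed interval, which approaches 1/2 as more packets are added.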
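Taking the limit $t \to \infty$ in (13.12) then yields
\[ V(x) = 1 - \lim_{t \to \infty} \int_0^t \frac{\partial \Pr[\xi(t) - \xi(t - y) \le x \,|\, v(t - y) = 0]}{\partial x} \Pr[v(t - y) = 0]\, dy \]
The tail probability $1 - V(x) = \lim_{t \to \infty} \Pr[v(t) > x] = \Pr[v(t_\infty) > x]$ is
\[ \Pr[v(t_\infty) > x] = \int_0^{\infty} \frac{\partial \Pr[\xi(t_\infty) - \xi(t_\infty - y) \le x \,|\, v(t_\infty - y) = 0]}{\partial x} \Pr[v(t_\infty - y) = 0]\, dy \]
This relation shows that, at a point in the steady-state $t_\infty \to \infty$, the contributions to $V(x)$ are due to arrivals and idle periods in the past. The
corresponding steady-state equation for (13.13) is
\[ \Pr[v(t_\infty) > x] = \sum_{k = \lceil x \rceil}^{\infty} \Pr\left[ v(t_\infty + x - k) = 0 \,\Big|\, \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k \right] \times \Pr\left[ \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k \right] \tag{13.15} \]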
13.4 The counting process
A similar general conservation relation to (13.3) can be deduced for the
counting process,
\[ N_S(r_{k+1}) = \max(N_S(r_k) - 1, 0) + N_A(r_{k+1}) - N_A(r_k) \]
The number of packets in the system at the departure time of the $(k+1)$-th
packet equals the number of packets in the system at the departure time
of the previous packet $k$ minus that packet, but increased by the number
of arrivals in the time interval $[r_k, r_{k+1}]$. Similarly, for the queue (which is
system minus the packet currently under service),
\[ N_Q(r_{k+1}) = \max(N_Q(r_k) + N_A(r_{k+1}) - N_A(r_k) - 1, 0) \]
which is the direct analog of (13.3) for the waiting time.
Whereas the waiting process is more natural to consider in problems where
interarrival times are specified, the counting process has more advantages
in a discrete time analysis. In the latter, the queueing system is observed
at certain moments in time, for instance, at the beginning of a timeslot $k$
that starts immediately$^7$ after the departure of the $k$-th packet and is equal
to the interval $[r_k, r_{k+1}]$, for all $r_k > 0$ and $r_0 = 0$. It will be convenient
to simplify the notation: $\mathcal{S}_k = N_S(r_k^+)$ denotes the system content (i.e. the
number of occupied queue positions including the packets currently being
served) at the beginning of timeslot $k$, $\mathcal{Q}_k = N_Q(r_k^+)$ is the queue content
at the beginning of timeslot $k$, and $\mathcal{X}_k$ and $\mathcal{A}_k$ are the number of served
packets and of arriving packets during timeslot $k$ respectively. The system
content satisfies the continuity (or balance) equation
\[ \mathcal{S}_{k+1} = (\mathcal{S}_k - \mathcal{X}_k)^+ + \mathcal{A}_k \tag{13.16} \]
whereas the queue content obeys
\[ \mathcal{Q}_{k+1} = (\mathcal{Q}_k - \mathcal{X}_k + \mathcal{A}_k)^+ \tag{13.17} \]
where $(x)^+ \equiv \max(x, 0)$. On the other hand, the relation between system
and queue content implies that $\mathcal{Q}_k = (\mathcal{S}_k - \mathcal{X}_k)^+$ such that (13.16) is rewritten as
\[ \mathcal{S}_{k+1} = \mathcal{Q}_k + \mathcal{A}_k \tag{13.18} \]
The number of cells at the beginning of a timeslot $k + 1$ in the system is
the sum of the number of queued packets at the beginning of the previous
timeslot $k$ and the newly arrived packets during timeslot $k$.
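The slot-by-slot bookkeeping can be sketched in Python. The example assumes, purely for illustration, that exactly one packet can be served per timeslot (as for ATM cells with a constant service time); the function name and the arrival counts per slot are hypothetical.

```python
def system_and_queue_content(arrivals_per_slot, served_per_slot=1, s0=0):
    """Iterate the balance equations slot by slot.

    Returns (S, Q): S[k] is the system content at the start of slot k and
    Q[k] = (S[k] - X)^+ the corresponding queue content, for a constant
    number X = served_per_slot of packets served in every slot.
    """
    S, Q = [s0], []
    for a_k in arrivals_per_slot:
        q_k = max(S[-1] - served_per_slot, 0)   # Q_k = (S_k - X_k)^+
        Q.append(q_k)
        S.append(q_k + a_k)                     # S_{k+1} = Q_k + A_k
    return S, Q


A = [2, 0, 3, 0, 0, 1]     # hypothetical number of arrivals in each slot
S, Q = system_and_queue_content(A)
print(S)
print(Q)
```

With a constant number of served packets per slot, the queue contents produced this way also satisfy the queue recursion (13.17), which can be verified term by term on the printed lists.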
13.4.1 Queue observations
It is worthwhile to investigate the relation between observations at various
instances of time of the queueing process $\{N_S(t), t \ge 0\}$ which represents the
number of packets in the system at time $t$. As seen before and as illustrated
in Fig. 13.5, two time instances seem natural: an inspection at departure
times where $N_S(r_n^+)$ describes the number of packets in the system just
after departure of the $n$-th packet and an observation at arrival times where
$N_S(t_n^-)$ describes the number of packets in the system just before the $n$-th
packet enters.
Suppose that the $n$-th packet leaves $N_S(r_n^+) = k \le j$ packets behind
in the system. This implies that precisely $k$ arrivals after the $n$-th packet
have entered the system. Hence, the $(n + k + 1)$-th packet sees, just before
entering the system, at most $k$ packets, because during the period between $t = r_n$
and $t = t_{n+k+1} \ge r_n$, only departures are possible. Thus, $N_S(t_{n+k+1}^-) \le k$
and, clearly, for $j \ge k$, the $(n + j + 1)$-th packet observes no more than
$N_S(t_{n+j+1}^-) \le j$. Hence, the following implication holds
\[ \left\{ N_S(r_n^+) \le j \right\} \Longrightarrow \left\{ N_S(t_{n+j+1}^-) \le j \right\} \]
7 We write $x^+ = x + \epsilon$ and $x^- = x - \epsilon$ where $\epsilon > 0$ is an arbitrarily small, positive real number. The
notation $x^+$ should not be confused with $(x)^+ = \max(x, 0)$.
Fig. 13.5. Relation between queue observations at arrival and at departure epochs.
Consider now the converse. Suppose that the $(n + j + 1)$-th packet sees
precisely $k \le j$ packets in front of it upon arrival: $N_S(t_{n+j+1}^-) = k \le j$. This
implies that the $(n + j + 1 - k)$-th packet is the first packet that will leave
the system after $t = t_{n+j+1}$ and that the $(n + j - k)$-th packet has already
left the system. At its departure at $t = r_{n+j-k} \le t_{n+j+1}$, it has observed at
most $k$ packets behind it, because only arrivals are possible in the interval
$[r_{n+j-k}, t_{n+j+1})$. Hence, $N_S(r_{n+j-k}^+) \le k$ and, set $k = j$, then $N_S(r_n^+) \le j$
leading to the implication
\[ \left\{ N_S(t_{n+j+1}^-) \le j \right\} \Longrightarrow \left\{ N_S(r_n^+) \le j \right\} \]
Combining both implications leads to the equivalence,
\[ \left\{ N_S(t_{n+j+1}^-) \le j \right\} \Longleftrightarrow \left\{ N_S(r_n^+) \le j \right\} \]
or, for any sample path (or realization), it holds, for any non-negative integer $j$,
that
\[ \Pr\left[ N_S(t_{n+j+1}^-) \le j \right] = \Pr\left[ N_S(r_n^+) \le j \right] \]
In steady-state for $n \to \infty$, with $\lim_{n \to \infty} N_S(t_n^-) = N_{S;A}$ and $\lim_{n \to \infty} N_S(r_n^+) = N_{S;D}$, we find that
\[ \Pr[N_{S;A} = j] = \Pr[N_{S;D} = j] \tag{13.19} \]
In words, in steady-state, the probability of the number of packets in the
system observed by arriving packets is equal to the probability of the number of packets in the system left behind by departing packets. Of course,
we have assumed that the steady-state distribution exists. If one of these
distributions exists, the analysis demonstrates that the other must exist.
Notice that no assumptions about the distribution or dependence are made
and that (13.19) is a general result which only assumes the existence of a
steady-state.
13.5 PASTA
Let us denote by $\lim_{t\to\infty} N_S(t) = N_S$ the steady-state system content or the number of packets in the system in steady-state. To compute the waiting time distribution (under a FIFO service discipline), we must take the view of how a typical arriving packet in steady-state finds the queue. Therefore, it is of interest to know when
$$\Pr[N_{S;A} = j] \stackrel{?}{=} \Pr[N_S = j] \tag{13.20}$$
The equality would imply that, in steady-state, the probability that an arriving packet finds the system in state $j$ equals the probability that the system is in state $j$. Recall with (6.1) that the existence of the probabilities means that $\Pr[N_S = j]$ also equals the long-run fraction of the time the system contains $j$ packets or is in state $j$. Similarly, $\Pr[N_{S;A} = j]$ also equals the long-run fraction of arriving packets that see the system in state $j$. In general, relation (13.20) is unfortunately not true. For example, consider a D/D/1 queue with a constant interarrival time $c$ and a constant service time $x_c < c$. Clearly, the D/D/1 system has a periodic service cycle: a busy period takes $x_c$ time units and the idle period equals $c - x_c$ time units. Thus, every arriving packet always finds the system empty and concludes $\Pr[N_{S;A} = 0] = 1$, while $\Pr[N_S = 1] = \frac{x_c}{c}$ and $\Pr[N_S = 0] = \frac{c - x_c}{c}$. The waiting time computation of the GI/D/c system in Section 14.4.2 is another counterexample. Since the arrival process $\{N_A(t), t \geq 0\}$ interacts with the system process $\{N_S(t), t \geq 0\}$ because every arrival increases the system content by one, they are dependent processes. Relation (13.20) is true for Poisson arrivals and this property is called “Poisson arrivals see time averages” (PASTA).
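The D/D/1 counterexample can be checked with exact arithmetic. The following is a minimal sketch, assuming integer or rational values for the interarrival time `c` and service time `x_c`; the function name and the `packets` parameter are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def dd1_empty_fractions(c, x_c, packets=1000):
    """Return (fraction of arrivals that see an empty system,
    long-run fraction of time the system is empty) for a D/D/1 queue."""
    if not 0 < x_c < c:
        raise ValueError("need 0 < x_c < c")
    arrivals = [n * c for n in range(packets)]
    # No packet ever waits because x_c < c, so departure = arrival + x_c.
    departures = [t + x_c for t in arrivals]
    seen_empty = sum(
        1 for n, t in enumerate(arrivals)
        if n == 0 or departures[n - 1] <= t
    )
    arrival_fraction = Fraction(seen_empty, packets)
    # Each cycle of length c contains a busy period of length x_c.
    time_fraction = 1 - Fraction(x_c) / Fraction(c)
    return arrival_fraction, time_fraction
```

With `c = 4` and `x_c = 1`, every arrival sees an empty system although the system is empty only three quarters of the time, which is the mismatch between arrival averages and time averages described above.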
Theorem 13.5.1 (PASTA) The long-run fraction of time that a process spends in state $j$ is equal to the long-run fraction of Poisson arrivals that find the process in state $j$,
$$\Pr[N_{S;A} = j] = \Pr[N_S = j]$$
Proof: See e.g. Wolff (1982) and footnote 8. $\square$
The Poisson process has the typical property that future increments are independent of the past and, thus, also of the past system history. In a certain sense, Poisson arrivals perform a random sampling which is sufficient to characterize the steady-state of the system exactly. The PASTA property also applies to Markov chains. The transitions in continuous-time Markov chains are Poisson processes if self-transitions are allowed (see Section 10.4.1). For any state $j$, the fraction of Poisson events that see the chain in state $j$ is $\pi_j$, which (see Lemma 6.1.2) also equals the fraction of time the chain is in state $j$.
13.6 Little’s Law
Little’s Law is perhaps the simplest of the general queueing formulae.
Theorem 13.6.1 (Little’s Law) The average number of packets (customers) in the system $E[N_S]$ (or in the queue $E[N_Q]$) equals the average arrival rate $\lambda$ times the average time spent in the system $E[T]$ (or in the queue $E[w]$),
$$E[N_S] = \lambda E[T] \tag{13.21}$$
$$E[N_Q] = \lambda E[w]$$
8
Although Wolff’s general proof (Wolff, 1982) only contains two pages, it is based on martingales and on axiomatic probability theory.
Little’s Law holds if two of the three limits
$$\lim_{t\to\infty} \frac{A(t)}{t} = \lambda \tag{13.22}$$
$$\lim_{t\to\infty} \frac{1}{t} \int_0^t N_S(u)\,du = E[N_S] \tag{13.23}$$
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} T_k = E[T] \tag{13.24}$$
exist.
Fig. 13.6. The arrival (bold) and departure (dotted) process, together with the system time $T_k$ for each packet in the queueing system.
Proof: Recall that $A(t)$ represents the total number of arrivals in time interval $[0, t]$. If $N_S(t) = 0$ or the system is idle at time $t$, then
$$\int_0^t N_S(u)\,du = \int_0^t \sum_{j=1}^{\infty} 1_{\{u \in [t_j, t_j + T_j)\}}\,du = \sum_{j=1}^{A(t)} \int_0^t 1_{\{u \in [t_j, t_j + T_j)\}}\,du = \sum_{k=1}^{A(t)} T_k$$
The general case where $N_S(t) \geq 0$ is more complicated, as Fig. 13.6 shows for $t = \tau$, because not all intervals $[t_j, t_j + T_j)$ for $1 \leq j \leq A(\tau)$ are contained in $[0, \tau)$. Hence, $\sum_{k=1}^{A(\tau)} T_k$ counts too much and is an upper bound for $\int_0^{\tau} N_S(u)\,du$. If $D(t)$ denotes the number of departures in $[0, t]$, Fig. 13.6 illustrates that the area (in grey) in an interval $[0, t]$, which equals the total number of packets in the system in that interval $\int_0^t (A(u) - D(u))\,du = \int_0^t N_S(u)\,du$, can be bounded for any realization (sample path) and any $t \geq 0$ by
$$\sum_{k:\, T_k + t_k \leq t} T_k \;\leq\; \int_0^t N_S(u)\,du \;\leq\; \sum_{k=1}^{A(t)} T_k$$
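where the lower bound only counts the packets that have left the system by time $t$. By dividing by $t$, we have
$$\frac{A(t)}{t}\,\frac{1}{A(t)} \sum_{k:\, T_k + t_k \leq t} T_k \;\leq\; \frac{1}{t}\int_0^t N_S(u)\,du \;\leq\; \frac{A(t)}{t} \sum_{k=1}^{A(t)} \frac{T_k}{A(t)} \tag{13.25}$$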
Since we assume that the limit (13.22) exists, we have that $A(t) = O(t)$ for $t \to \infty$. From the existence of the limit (13.24), we can thus write
$$\lim_{t\to\infty} \sum_{k=1}^{A(t)} \frac{T_k}{A(t)} = E[T]$$
When $t \to \infty$ in (13.25) and using the limits defined above, we find that the upper bound converges to $\lambda E[T]$. In order to prove (13.21), it remains to show that the lower bound in (13.25) also converges to the same limit $\lambda E[T]$. Since $A(t) = \lambda t + o(t)$ for $t \to \infty$, it follows for the sequence of arrival times $t_n$ that $t_n \to \infty$ if $n \to \infty$ and that
$$\frac{n}{t_n} = \frac{A(t_n)}{t_n} \to \lambda \quad \text{as } n \to \infty$$
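The convergence of the series (13.24) implies for $n \to \infty$ that
$$\frac{T_n}{n} = \frac{1}{n}\sum_{k=1}^{n} T_k - \frac{n-1}{n}\,\frac{1}{n-1}\sum_{k=1}^{n-1} T_k \to 0$$
Combining both relations leads to
$$\frac{T_n}{t_n} = \frac{T_n}{n}\,\frac{n}{t_n} \to 0 \quad \text{as } n \to \infty$$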
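which implies that, for any $\varepsilon > 0$, there exists a fixed $m$ such that, for all $k > m$, we have that $\frac{T_k}{t_k} < \varepsilon$ or $t_k + T_k < (1 + \varepsilon) t_k$. For $t > t_m$, the lower bound in (13.25) is
$$\sum_{k > m:\, T_k + t_k \leq t} T_k + \sum_{k=1}^{m} T_k \;\geq\; \sum_{k=1}^{A(t/(1+\varepsilon))} T_k$$
or
$$\frac{1}{t} \sum_{k \geq 1:\, T_k + t_k \leq t} T_k \;\geq\; \frac{1}{1+\varepsilon}\,\frac{A(t/(1+\varepsilon))}{t/(1+\varepsilon)} \sum_{k=1}^{A(t/(1+\varepsilon))} \frac{T_k}{A(t/(1+\varepsilon))}$$
In the limit $t \to \infty$, we obtain
$$\lim_{t\to\infty} \frac{1}{t} \sum_{k \geq 1:\, T_k + t_k \leq t} T_k \;\geq\; \frac{\lambda E[T]}{1+\varepsilon}$$
Since $\varepsilon$ can be made arbitrarily small, this finally proves (13.21). $\square$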
Although the proof may seem rather technical (see footnote 9) for what is, after all, an intuitive result, it reveals that no assumptions about the distributions of the arrival and service processes are made apart from steady-state convergence. No probabilistic arguments are used. In essence, Little’s Law is proved by showing that two limits exist for any sample path or realization of the process, which guarantees a very general theorem. Moreover, no assumptions are made about the service discipline, the dependence between arrival and service process, or the number of servers, which means that Little’s Law also holds for non-FIFO scheduling disciplines, in fact for any scheduling discipline! Little’s Law connects three essential quantities: once two of them are known, the third is determined by (13.21). Little’s Law is very important in operations where it relates the average inventory (similar to $E[N_S]$), the average flow rate or throughput $\lambda$ and the average flow time $E[T]$ in a process flow of products or services. Several examples can be found in Chapter 14, in Anupindi et al. (2006) and Bertsekas and Gallager (1992, pp. 157-162).
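The sample-path nature of Little’s Law can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it generates one realization of a single-server FIFO queue with uniformly distributed (hence non-Poisson, non-exponential) interarrival and service times, and compares the area under $N_S(u)$ with the sum of the system times $T_k$. The function names and the uniform distributions are arbitrary choices made for this example.

```python
import random

def fifo_sample_path(n, seed=1):
    """Arrival and departure times of n packets in a single-server FIFO queue."""
    rng = random.Random(seed)
    t, last_departure = 0.0, 0.0
    arrivals, departures = [], []
    for _ in range(n):
        t += rng.uniform(0.0, 2.0)            # interarrival time, mean 1
        start = max(t, last_departure)        # wait until the server is free
        last_departure = start + rng.uniform(0.0, 1.5)  # service time, mean 0.75
        arrivals.append(t)
        departures.append(last_departure)
    return arrivals, departures

def area_under_content(arrivals, departures, horizon):
    """Integral of the system content N_S(u) over [0, horizon]."""
    events = sorted([(a, +1) for a in arrivals] + [(d, -1) for d in departures])
    area, level, last = 0.0, 0, 0.0
    for time, step in events:
        time = min(time, horizon)
        area += level * (time - last)
        last = time
        level += step
    return area + level * (horizon - last)
```

If the horizon is chosen at an instant where the system is empty (for instance the last departure time), the area equals the sum of all $T_k$ exactly, so the time-average content equals the observed arrival rate times the average system time on that path.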
9
We have opted for a very general proof. Other proofs (e.g. in Ross (1996) and Gallager (1996)) use arguments from renewal reward theory (Section 8.4), which makes them less general because they require that the system has renewals.
14
Queueing models
This chapter presents some of the simplest and most basic queueing models.
Unfortunately, most queueing problems are not available in analytic form
and many queueing problems require a specific and sometimes tailor-made
solution.
Besides the simple and classical queueing models, we also present two other exactly solvable models that have played a key role in the development of Asynchronous Transfer Mode (ATM). In these ATM queueing systems the service discipline is deterministic and only the arrival process is the distinguishing element. The first is the N*D/D/1 queue (Roberts, 1991, Section 6.2) whose solution relies on the Beneš approach. The arrivals consist of $N$ periodic sources, each with a period of $D$ time slots, but randomly phased with respect to each other. The second model is the fluid flow model of Anick et al. (1982), known as the AMS-queue, which considers $N$ on-off sources as input. The solution uses Markov theory. Since the Markov transition probability matrix has a special tri-band diagonal structure, the eigenvector and eigenvalue decomposition can be computed analytically.
We would like to refer to a few other models. Norros (1994) succeeded in deriving the asymptotic probability distribution of the unfinished work for a queue with self-similar input, modeled via a fractional Brownian motion. The resulting asymptotic probability distribution turns out to be a Weibull distribution (3.40). Finally, Neuts (1989) has established a matrix analytic framework and was the founder of the class of Markov Modulated arrival processes and derivatives such as the Batch Markovian Arrival process (BMAP).
14.1 The M/M/1 queue
The M/M/1 queue consists of a Poisson arrival process of packets with exponentially distributed interarrival times, a service process with exponentially distributed service time, one server and an infinitely long queue. The M/M/1 queue is a basic model in queueing theory for several reasons. First, as shown below, the M/M/1 queue can be computed in analytic form, even the transient time behavior. Apart from the computational advantage, the M/M/1 queue possesses the basic feature of queueing systems: the quantities of interest (waiting time, number of packets, etc.) increase monotonically with the traffic intensity $\rho$.
Packets arrive in the M/M/1 queue with interarrival rate $\lambda$ and are served with service rate $\mu$. The M/M/1 queue is precisely described by a constant rate birth and death process. Any arrival of a packet to the queueing system can be regarded as a birth. The current state $k$ that reflects the number of packets in the M/M/1 system jumps to state $k + 1$ at the arrival of a new packet and the transition rate equals the interarrival rate $\lambda$: on average every $\frac{1}{\lambda}$ time units a packet arrives to the system. A packet leaves the M/M/1 system after service, which corresponds to a death: at each departure from the system the current state is decreased by one, with death rate equal to the service rate $\mu$: on average every $\frac{1}{\mu}$ time units, a packet is served. In the sequel, we concentrate on the steady-state behavior and refer for the transient behavior to the discussion of the birth and death process in Section 11.3.3.
14.1.1 The system content in steady-state
From the analogy with the constant rate birth and death process as studied in Section 11.3.3, we obtain immediately the steady-state queueing distribution (11.23) as
$$\Pr[N_S = j] = (1-\rho)\rho^j, \qquad j \geq 0 \tag{14.1}$$
where $N_S = \lim_{t\to\infty} N_S(t)$ is the number of packets in the system in the stationary regime. In other words, $\Pr[N_S = j]$ is the probability that the M/M/1 system (queue plus server) contains $j$ packets. It has been shown in Section 11.3 that the M/M/1 queue is ergodic (i.e. that an equilibrium or steady-state exists) if $\rho = \frac{\lambda}{\mu} < 1$, which is a characteristic of a general queueing system. The probability density function of the system content (14.1) is a geometric distribution reflecting the memoryless property. We observe that the M/M/1 system is empty with probability $\Pr[N_S = 0] = 1 - \rho$ and, of all states, the empty state has the highest probability. Consequently, the chance that there is a packet in the M/M/1 system is precisely equal to the traffic intensity $\rho$, namely $\Pr[N_S > 0] = 1 - \Pr[N_S = 0] = \rho$.
The corresponding probability generating function (2.18) is
$$\varphi_{N_S}(z) = \sum_{k=0}^{\infty} \Pr[N_S = k]\, z^k = \frac{1-\rho}{1-\rho z}$$
The average number of packets in the M/M/1 system $E[N_S] = \varphi'_{N_S}(1)$ equals
$$E\left[N_{S;M/M/1}\right] = \frac{\rho}{1-\rho}$$
while the variance $\mathrm{Var}[N_S]$ follows from (2.27) as
$$\mathrm{Var}\left[N_{S;M/M/1}\right] = \frac{\rho}{(1-\rho)^2}$$
Both the mean and the variance of the number of packets in the system diverge as $\rho \to 1$. When the interarrival rate tends to the service rate, the queue grows indefinitely long with indefinitely large variation. From Little’s law (13.21), the average time spent in the M/M/1 system equals
$$E\left[T_{M/M/1}\right] = \frac{E[N_S]}{\lambda} = \frac{1}{\mu(1-\rho)} = \frac{1}{\mu - \lambda} \tag{14.2}$$
where $\rho < 1$ or, equivalently, $\lambda < \mu$. If $\rho = 0$, there is no load in the system and the average waiting time attains its minimum equal to the average service time $\frac{1}{\mu}$. In the other limit $\rho \to 1$, the average waiting time grows unboundedly, just as the queue length or number of packets in the system. The behavior of the M/M/1 system in the limit $\rho \to 1$ is characteristic for the average of quantities ($N_S, w, T, \ldots$) in many queueing systems: a simple pole at $\rho = 1$.
As a remark, the average waiting time in the M/M/1 queue follows, after taking expectations from the general relation $T_n - x_n = w_n$ or $E[w] = E[T] - \frac{1}{\mu}$, as
$$E\left[w_{M/M/1}\right] = \frac{1}{\mu(1-\rho)} - \frac{1}{\mu} = \frac{\rho}{\mu(1-\rho)} \tag{14.3}$$
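The closed-form M/M/1 averages above are easily collected in a small helper. This is a minimal sketch; the function name and the returned dictionary keys are illustrative and not taken from the text.

```python
def mm1_metrics(lam, mu):
    """Steady-state averages of the M/M/1 queue for arrival rate lam and service rate mu."""
    if not 0 <= lam < mu:
        raise ValueError("stability requires 0 <= lam < mu")
    rho = lam / mu
    return {
        "rho": rho,
        "mean_content": rho / (1.0 - rho),        # E[N_S]
        "var_content": rho / (1.0 - rho) ** 2,    # Var[N_S]
        "mean_system_time": 1.0 / (mu - lam),     # E[T], eq. (14.2)
        "mean_waiting_time": rho / (mu - lam),    # E[w], eq. (14.3)
    }
```

For example, `mm1_metrics(2.0, 3.0)` describes a queue at load $\rho = 2/3$; multiplying its mean system time by the arrival rate reproduces the mean content, as Little’s Law requires.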
14.1.2 The virtual waiting time
For the M/M/1 queue, the virtual waiting time $v(t)$ at some time $t$ consists of (a) the residual service time of the packet currently under service, (b) the time needed to serve the $N_Q(t)$ packets in the queue. As mentioned in Section 13.1.3, the virtual waiting time at arrival epochs equals the system time $T_n$. In other words, if a new packet, say the $n$-th packet, enters the M/M/1 system at $t = t_n$, the total time (system time) that the packet spends in the M/M/1 system equals $v(t_n) = T_n$. At $t = t_n^-$, the number $N_S(t_n^-)$ does not include the new packet at the last position and the packet “sees” $N_S(t_n^-)$ other packets in the system (queue plus the packet in the server) in front of it. We assume further that the server operates in FIFO (first in, first out) order. Since the service time is exponentially distributed and possesses the memoryless property, the residual or remaining service time of the packet currently under service has the same distribution. In other words, it does not matter how long the packet has already been under service. The more general argument is that the PASTA property applies. The system time of the $n$-th packet is thus the sum of $N_S(t_n^-) + 1$ exponential i.i.d. random variables. As shown in Section 3.3.1, if $N_S(t_n^-) = k$, the system time has an Erlang distribution given by (3.24) with $n = k + 1$,
$$f_{T_n}\left(t \mid N_S(t_n^-) = k\right) = \frac{\mu(\mu t)^k}{k!}\, e^{-\mu t}$$
Using the law of total probability (2.46), the system time $T_n$ of the $n$-th packet or the virtual waiting time at time $t = t_n$ becomes
$$f_{T_n}(t) = \frac{d}{dt}\Pr[T_n \leq t] = \sum_{k=0}^{\infty} f_{T_n}\left(t \mid N_S(t_n^-) = k\right) \Pr\left[N_S(t_n^-) = k\right] = \mu e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!} \Pr\left[N_S(t_n^-) = k\right]$$
In Section 11.3.3, $s_k(t_n^-) = \Pr[N_S(t_n^-) = k]$ is computed in (11.27) assuming that the system starts with $j$ packets, i.e. $s_k(0) = \delta_{kj}$. In steady-state, where $t_n \to \infty$, it is shown that $s_k(t_n^-) \to (1-\rho)\rho^k$. In most cases, however, a time-dependent solution is not available in closed form. Fortunately, for Poisson arrivals, the PASTA property helps to circumvent this inconvenience. Based on the PASTA property, in steady-state, $\lim_{n\to\infty} \Pr[N_S(t_n^-) = k] = \Pr[N_S = k]$ given by (14.1). The probability density function $f_T(t) = \lim_{n\to\infty} f_{T_n}(t)$ of the steady-state system time $T$ (or the total waiting time of a packet) is
$$f_T(t) = \mu e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!} (1-\rho)\rho^k$$
or
$$f_T(t) = \mu(1-\rho)\, e^{-\mu(1-\rho)t} \tag{14.4}$$
In summary, the total time spent in the M/M/1 system in steady-state ($\rho < 1$) has an exponential distribution with mean $\frac{1}{\mu(1-\rho)} = \frac{1}{\mu - \lambda}$, which has been found above in (14.2) by Little’s law. Similarly (see footnote 1), the waiting time in the M/M/1 queue is
$$f_w(t) = (1-\rho)\,\delta(t) + \lambda(1-\rho)\, e^{-\mu(1-\rho)t} \tag{14.5}$$
where the first term with the Dirac function reflects a zero queueing time provided the system is empty, which has probability $\Pr[N_S = 0] = 1 - \rho$.
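The step from the Erlang mixture to the exponential density (14.4) relies on summing the series $\sum_k (\rho\mu t)^k / k! = e^{\rho\mu t}$. The sketch below checks this numerically by comparing a truncated series with the closed form; the function names and the truncation length `terms` are assumptions made for illustration.

```python
import math

def system_time_pdf_series(t, lam, mu, terms=200):
    """Truncated Erlang mixture: mu e^{-mu t} sum_k (mu t)^k/k! (1-rho) rho^k."""
    rho = lam / mu
    total, term = 0.0, 1.0          # term holds (rho*mu*t)^k / k!
    for k in range(terms):
        total += term
        term *= rho * mu * t / (k + 1)
    return mu * math.exp(-mu * t) * (1.0 - rho) * total

def system_time_pdf_closed(t, lam, mu):
    """Closed form (14.4): exponential density with rate mu(1-rho)."""
    rho = lam / mu
    return mu * (1.0 - rho) * math.exp(-mu * (1.0 - rho) * t)
```

Both functions agree to floating-point accuracy for moderate arguments, which confirms that a geometric mixture of Erlang densities collapses to a single exponential density.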
14.1.3 The departure process from the M/M/1 queue
There is a remarkable theorem due to Burke which has far-reaching consequences for networks of M/M/1 queues.
Theorem 14.1.1 (Burke) In a steady-state M/M/1 queue, the departure process is a Poisson process with rate $\lambda$.
Burke’s Theorem is equivalent to the statement that the interdeparture times $r_n - r_{n-1}$ in steady-state are i.i.d. exponential random variables with mean $\frac{1}{\lambda}$.
Proof: Let us denote the probability density function of the interdeparture time $r$ by
$$f_r(t) = \frac{d}{dt}\Pr[r \leq t]$$
In steady-state, it holds in general that $\Pr[N_{S;A} = j] = \Pr[N_{S;D} = j]$, as shown in Section 13.4.1, while the PASTA property (Theorem 13.5.1) states that $\Pr[N_{S;A} = j] = \Pr[N_S = j]$. Hence, in steady-state in the M/M/1 queue, departing packets see the steady-state system content, i.e. $\Pr[N_{S;D} = j] = \Pr[N_S = j]$. Moreover, in steady-state, the departure process can be decomposed into two different situations after the departure of a packet: (a) the departing packet sees an empty system (which is equivalent to “the system is empty”) or (b) the departing packet sees a next packet in the queue (which is equivalent to “the system serves immediately the next packet in the queue”),
$$\Pr[r \leq t] = \Pr[r \leq t \mid N_S = 0]\,\Pr[N_S = 0] + \Pr[r \leq t \mid N_S > 0]\,\Pr[N_S > 0]$$
1
The Laplace transform of the waiting time in the queue follows from $T_n = w_n + x_n$ as
$$\varphi_w(z) = \frac{\varphi_T(z)}{\varphi_x(z)} = \frac{\mu(1-\rho)}{z + \mu(1-\rho)}\,\frac{z + \mu}{\mu} = (1-\rho) + \frac{\lambda(1-\rho)}{z + \mu(1-\rho)}$$
which, after inverse Laplace transformation, gives (14.5).
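In case (a), we must wait for the next packet to arrive and to be served. This total time is the sum of an exponential random variable with rate $\lambda$ and an exponential random variable with rate $\mu$. It is more convenient to compute the Laplace transform as shown in Section 3.3.1,
$$\varphi_{r \mid N_S = 0}(z) = \int_0^{\infty} e^{-zt}\, d\left(\Pr[r \leq t \mid N_S = 0]\right) = \frac{\lambda}{z + \lambda}\,\frac{\mu}{z + \mu}$$
In case (b), the next packet leaves the M/M/1 queue after an exponential service time with rate $\mu$,
$$\varphi_{r \mid N_S > 0}(z) = \int_0^{\infty} e^{-zt}\, d\left(\Pr[r \leq t \mid N_S > 0]\right) = \frac{\mu}{z + \mu}$$
Hence,
$$\varphi_r(z) = \varphi_{r \mid N_S = 0}(z)\,\Pr[N_S = 0] + \varphi_{r \mid N_S > 0}(z)\,\Pr[N_S > 0] = (1-\rho)\,\frac{\lambda}{z + \lambda}\,\frac{\mu}{z + \mu} + \rho\,\frac{\mu}{z + \mu} = \frac{\lambda}{z + \lambda}$$
which proves the theorem. $\square$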
Burke’s Theorem states that the steady-state arrival and departure process of the M/M/1 queue are the same! Consequently, the steady-state departure rate equals the steady-state arrival rate $\lambda$.
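The final simplification in the proof, where the mixture of the two conditional transforms collapses to the transform $\frac{\lambda}{z+\lambda}$ of an exponential random variable with rate $\lambda$, can be verified numerically. The following sketch evaluates both sides for real $z$; the function name is an illustrative choice.

```python
def interdeparture_transform(z, lam, mu):
    """Mixture of the two conditional Laplace transforms in the proof of Burke's Theorem."""
    rho = lam / mu
    after_idle = (lam / (z + lam)) * (mu / (z + mu))   # case (a): wait for arrival, then serve
    after_busy = mu / (z + mu)                          # case (b): serve next packet immediately
    return (1.0 - rho) * after_idle + rho * after_busy
```

For any stable pair `lam < mu`, the returned value coincides with `lam / (z + lam)`, independent of `mu`.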
14.2 Variants of the M/M/1 queue
A number of readily obtained variants from the birth-death analogy are
worth considering here. Mainly a steady-state analysis is presented.
14.2.1 The M/M/m queue
Instead of one server, we consider the case with $m$ servers. The buffer is still infinitely long and the interarrival process is exponential with interarrival rate $\lambda$. The M/M/m queue can model a router with $m$ physically different interfaces (or output ports) with the same transmission rate $\mu$ towards the same next hop. All packets destined to that next hop can be transmitted over any of the $m$ interfaces. This type of load balancing frequently occurs in the Internet.
As shown in Fig. 14.1, the M/M/m system can still be described by a birth and death process with birth rate $\lambda_k = \lambda$, but with death rate $\mu_k = k\mu$ for $0 \leq k \leq m$ and $\mu_k = m\mu$ if $k \geq m$. Indeed, if there are $k \leq m$ packets in the system, they can all be served and the departure (or death) rate from the system is $k\mu$. If there are more packets, $k > m$, only $m$ of them can be served such that the death rate is limited to the maximum service rate $m\mu$.
Fig. 14.1. The birth-death process corresponding to the M/M/m queue.
14.2.1.1 System content
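From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find
$$\Pr[N_S = 0] = \frac{1}{1 + \sum_{j=1}^{m-1} \frac{\lambda^j}{j!\,\mu^j} + \sum_{j=m}^{\infty} \frac{\lambda^j}{m^{j-m}\, m!\,\mu^j}} = \frac{1}{\sum_{j=0}^{m-1} \frac{\lambda^j}{j!\,\mu^j} + \frac{\lambda^m}{m!\,\mu^m}\,\frac{1}{1 - \frac{\lambda}{m\mu}}} \tag{14.6}$$
$$\Pr[N_S = j] = \frac{\lambda^j}{j!\,\mu^j}\,\Pr[N_S = 0], \qquad j \leq m \tag{14.7}$$
$$\Pr[N_S = j] = \frac{m^m}{m!} \left(\frac{\lambda}{m\mu}\right)^{j} \Pr[N_S = 0], \qquad j \geq m \tag{14.8}$$
The traffic intensity is $\rho = \frac{\lambda}{m\mu}$, the ratio between the average interarrival rate and the average (maximum) service rate. Again, $\rho < 1$ corresponds to the stable (ergodic) regime.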
For the M/M/m system it is of interest to know what the probability of queueing is. Queueing occurs when an arriving packet finds all servers busy, which happens with probability $\Pr[N_S \geq m]$, or explicitly,
$$\Pr[N_S \geq m] = \frac{\Pr[N_S = 0]\,\lambda^m}{m!\left(1 - \frac{\lambda}{m\mu}\right)\mu^m} \tag{14.9}$$
This probability also corresponds to a situation in classical telephony where no trunk is available for an arriving call. Relation (14.9) is known as the Erlang C formula.
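Formulas (14.6) and (14.9) translate directly into code. The sketch below is a straightforward transcription with illustrative function names; for large $m$ the factorials and powers may overflow or lose accuracy, so it is meant for small to moderate numbers of servers only.

```python
import math

def mmm_empty_probability(lam, mu, m):
    """Pr[N_S = 0] of the M/M/m queue, eq. (14.6)."""
    load = lam / mu               # offered load lambda/mu
    rho = load / m                # traffic intensity lambda/(m mu)
    if rho >= 1.0:
        raise ValueError("stability requires lam < m * mu")
    head = sum(load ** j / math.factorial(j) for j in range(m))
    tail = load ** m / (math.factorial(m) * (1.0 - rho))
    return 1.0 / (head + tail)

def erlang_c(lam, mu, m):
    """Probability of queueing Pr[N_S >= m], eq. (14.9)."""
    load = lam / mu
    rho = load / m
    p0 = mmm_empty_probability(lam, mu, m)
    return p0 * load ** m / (math.factorial(m) * (1.0 - rho))
```

For a single server ($m = 1$) the queueing probability reduces to $\rho$, in agreement with $\Pr[N_S > 0] = \rho$ for the M/M/1 queue.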
14.2.1.2 Waiting (or queueing) time
Instead of computing the virtual waiting time (or system time, unfinished work), we will now concentrate on the waiting time of a packet in the M/M/m queue. The system time can be deduced from the basic relation $T_n = w_n + x_n$, where $x_n$ is an exponential random variable with rate $\mu$. A packet only experiences queueing if all servers are occupied. This event has probability $\Pr[N_S \geq m]$ specified by the Erlang C formula (14.9). Hence, the queueing time $w$ can be decomposed into two cases: (a) an arriving packet does not queue ($w = 0$) and (b) an arriving packet must wait in the M/M/m queue,
$$\Pr[w \leq t] = \Pr[w \leq t \mid N_S < m]\,\Pr[N_S < m] + \Pr[w \leq t \mid N_S \geq m]\,\Pr[N_S \geq m]$$
$$= \Pr[N_S < m] + \Pr[w \leq t \mid N_S \geq m]\,\Pr[N_S \geq m]$$
$$= 1 - \Pr[N_S \geq m] + \Pr[w \leq t \mid N_S \geq m]\,\Pr[N_S \geq m] \tag{14.10}$$
It remains to compute $\Pr[w \leq t \mid N_S \geq m]$. The reasoning is analogous to that in the M/M/1 queue. An arriving packet must wait for the packet currently under service and for the $j$ packets already in the queue before it. Thus, $w$ equals the sum of $j + 1$ exponential random variables with rate $m\mu$ because, for the M/M/m queue with all servers busy, the service rate is $m\mu$. Hence (see Section 3.3.1), the distribution for the waiting time $w$ in the queue, provided $j$ packets are in the queue, is an Erlang distribution,
$$f_w(t \mid N_Q = j) = \frac{m\mu\,(m\mu t)^j}{j!}\, e^{-m\mu t}$$
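Furthermore, the number of packets in the queue $N_Q$ in steady-state is related to the system content as $N_S = m + N_Q$. Using the law of total probability (2.46), the waiting time in the queue in steady-state is
$$f_w(t \mid N_S \geq m) = \frac{d}{dt}\Pr[w \leq t \mid N_S \geq m] = \sum_{j=0}^{\infty} f_w(t \mid N_S = m + j)\,\Pr[N_S = m + j \mid N_S \geq m]$$
The conditional probability $\Pr[N_S = m + j \mid N_S \geq m]$ follows from (2.44) and (14.8) with $\rho = \frac{\lambda}{m\mu}$ as
$$\Pr[N_S = m + j \mid N_S \geq m] = \frac{\Pr[N_S = m + j]}{\Pr[N_S \geq m]} = \frac{\Pr[N_S = 0]}{\Pr[N_S \geq m]}\,\frac{m^m}{m!} \left(\frac{\lambda}{m\mu}\right)^{j+m} = \left(1 - \frac{\lambda}{m\mu}\right)\left(\frac{\lambda}{m\mu}\right)^{j} = (1-\rho)\rho^j$$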
We observe from (14.1) that, if all $m$ servers are busy, the system content of an M/M/m system behaves as that in an M/M/1 system. Thus, the conditional probability density function for the waiting time in the M/M/m queue is also an exponential distribution,
$$f_w(t \mid N_S \geq m) = (1-\rho)\, m\mu\, e^{-m\mu t} \sum_{j=0}^{\infty} \frac{(\rho m\mu t)^j}{j!} = (1-\rho)\, m\mu\, e^{-(1-\rho) m\mu t}$$
or
$$\Pr[w \leq t \mid N_S \geq m] = 1 - e^{-(1-\rho) m\mu t}$$
Substitution in (14.10) finally results in the distribution of the waiting time in the queue of the M/M/m system,
$$F_w(t) = \Pr[w \leq t] = 1 - \Pr[N_S \geq m]\, e^{-(1-\rho) m\mu t} \tag{14.11}$$
Since $F_w(0) = 1 - \Pr[N_S \geq m] > 0$ while obviously $F_w(0^-) = 0$, there is probability mass at $t = 0$, which is reflected by a Dirac impulse in the probability density function,
$$f_w(t) = (1 - \Pr[N_S \geq m])\,\delta(t) + (1-\rho)\, m\mu\, \Pr[N_S \geq m]\, e^{-(1-\rho) m\mu t} \tag{14.12}$$
The pdf of the system time $T = w + x$ follows after convolution of (14.12) and $f_x(t) = \mu e^{-\mu t}$ as
$$f_T(t) = (1 - \Pr[N_S \geq m])\,\mu e^{-\mu t} + \frac{\Pr[N_S \geq m]}{1 - m(1-\rho)}\,(1-\rho)\, m\mu \left( e^{-(1-\rho) m\mu t} - e^{-\mu t} \right) \tag{14.13}$$
and the average system time can be computed from (14.13) with (2.33) or directly from $E[T] = E[w] + E[x]$ as
$$E[T] = \frac{1}{\mu} + \frac{\Pr[N_S \geq m]}{m\mu(1-\rho)} \tag{14.14}$$
Also, in the single-server case ($m = 1$), (14.12) reduces to the pdf (14.5) of the M/M/1 queue. Furthermore, Burke’s Theorem 14.1.1 can be extended to the M/M/m queue: the arrival and departure process of the M/M/m queue are both Poisson processes with rate $\lambda$.
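The waiting time distribution (14.11) and the mean system time (14.14) can be evaluated once the queueing probability $\Pr[N_S \geq m]$ is known. In the sketch below that probability is passed in as the parameter `p_wait`, and a crude midpoint rule integrates the tail $1 - F_w(t)$ to recover $E[w]$ as a consistency check; all names and the integration settings are illustrative assumptions.

```python
import math

def mmm_waiting_cdf(t, p_wait, m, mu, rho):
    """F_w(t) from (14.11); p_wait is the Erlang C probability Pr[N_S >= m]."""
    return 1.0 - p_wait * math.exp(-(1.0 - rho) * m * mu * t)

def mmm_mean_system_time(p_wait, m, mu, rho):
    """E[T] from (14.14)."""
    return 1.0 / mu + p_wait / (m * mu * (1.0 - rho))

def mmm_mean_wait_numeric(p_wait, m, mu, rho, steps=200_000):
    """Midpoint-rule integral of 1 - F_w(t), which equals E[w]."""
    rate = (1.0 - rho) * m * mu
    h = (40.0 / rate) / steps     # integrate far into the exponential tail
    return h * sum(
        1.0 - mmm_waiting_cdf((i + 0.5) * h, p_wait, m, mu, rho)
        for i in range(steps)
    )
```

The numerically integrated mean waiting time matches $\Pr[N_S \geq m] / (m\mu(1-\rho))$, the second term of (14.14).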
14.2.2 The M/M/m/m queue
The difference with the M/M/m queue is that the number of packets (calls) in the M/M/m/m queue is limited to $m$. Hence, packets (calls) that arrive while $m$ are already in the system are lost. This situation corresponds to classical telephony where a conversation is possible if no more than $m$ trunks are occupied; otherwise you hear a busy tone and the connection cannot be set up. The limitation to $m$ arrivals is modeled in the birth and death process by limiting the interarrival rates, $\lambda_k = \lambda$ if $k < m$ and $\lambda_k = 0$ if $k \geq m$. The death rates are the same as in the M/M/m queue, $\mu_k = k\mu$ for $0 \leq k \leq m$ and $\mu_k = m\mu$ if $k \geq m$.
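From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find
$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\,\mu^j}} \tag{14.15}$$
$$\Pr[N_S = j] = \begin{cases} \dfrac{\lambda^j}{j!\,\mu^j}\,\Pr[N_S = 0] & j \leq m \\ 0 & j > m \end{cases} \tag{14.16}$$
The quantity of interest in the M/M/m/m system is the probability that all trunks (servers) are busy, which is known as the Erlang B formula,
$$\Pr[N_S = m] = \frac{\dfrac{\lambda^m}{m!\,\mu^m}}{\sum_{j=0}^{m} \dfrac{\lambda^j}{j!\,\mu^j}} \tag{14.17}$$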
In practice, a telephony exchange is dimensioned (i.e. the number of lines $m$ is determined) such that the blocking probability $\Pr[N_S = m]$ is below a certain level, say below $10^{-4}$. In summary, the Erlang B formula (14.17) determines the blocking probability or loss probability (because only $m$ calls or packets are allowed to the system), while the Erlang C formula (14.9) is the probability that a packet must wait in the (infinitely long) queue because all servers are busy.
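The Erlang B formula (14.17) can be computed either directly or through a recurrence in the number of servers. The recurrence used below, $B_k = \frac{r B_{k-1}}{k + r B_{k-1}}$ with $B_0 = 1$ and $r = \lambda/\mu$, is a well-known equivalent form that is not derived in the text; it is included here as an assumption-labelled alternative because it avoids large factorials. Function names are illustrative.

```python
import math

def erlang_b_direct(load, m):
    """Blocking probability Pr[N_S = m] from (14.17); load = lambda/mu."""
    terms = [load ** j / math.factorial(j) for j in range(m + 1)]
    return terms[m] / sum(terms)

def erlang_b_recursive(load, m):
    """Same quantity via the standard recurrence over the number of servers
    (an equivalent form not given in the text)."""
    b = 1.0                                   # zero servers: every call is blocked
    for k in range(1, m + 1):
        b = load * b / (k + load * b)
    return b
```

Both functions return the same value; the recursive variant is the safer choice when dimensioning exchanges with hundreds of lines.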
Although the Erlang B formula (14.17) has been derived in the context of the M/M/m/m queue, it holds under much weaker assumptions, a fact already known to Erlang, as mentioned by Kelly (1991). Kelly starts his memoir on Loss Networks with the Erlang B formula, which Erlang obtained from his powerful method of statistical equilibrium. The latter concept is now identified as the steady-state of Markov processes. Kelly further relates the impact of the Erlang B formula from telephony to interacting particle systems and phase transitions in nature (e.g. the famous Ising model). Much effort has been devoted over time to generalize Erlang’s results as far as possible. The Erlang B formula (14.17) holds for the M/G/m/m queue as well, thus for an arbitrary service process provided the mean service rate (per server) equals $\mu$. The proof by Gnedenko and Kovalenko (1989, pp. 237-240) is long and complicated, whereas the proof of Ross (1996, Section 5.7.2) is more elegant and is based on the time-reversed Markov chain. As a corollary, Ross demonstrates that the departure process (including both lost and served packets) of the M/G/m/m system is a Poisson process with rate $\lambda$.
Example 1 In case $m \to \infty$, the expressions (14.15) and (14.6) tend to $\lim_{m\to\infty} \Pr[N_S = 0] = \exp\left(-\frac{\lambda}{\mu}\right)$ while (14.16) and (14.7) become
$$\Pr[N_S = j] = \frac{\lambda^j}{j!\,\mu^j}\,\exp\left(-\frac{\lambda}{\mu}\right)$$
This queueing system is denoted as M/M/$\infty$. Thus, the number in the M/M/$\infty$ system (in steady-state) is Poisson distributed with parameter $\frac{\lambda}{\mu}$. Hence, in case $m \to \infty$, the average number in the system is $E[N_S] = \frac{\lambda}{\mu}$ (as follows from (3.11)) and the average time in the system follows from Little’s theorem (13.21) as $E[T] = \frac{1}{\mu}$. The fact that, if $m \to \infty$, the mean time in the M/M/$\infty$ system equals the average service time has a consistent explanation: if the number of servers $m \to \infty$, implying that there is an infinite service capacity, no packet ever has to wait and the only time a packet is in the system is its service time $\frac{1}{\mu}$.
Example 2 Consider two voice over IP (VoIP) gateways connected by a link with capacity $C$. Denote the capacity of a voice call by $C_{voice}$ (in bit/s). For example in ISDN, $C_{voice} = 64$ kb/s. In general, $C_{voice}$ in VoIP depends on the specifics of the codecs used. The arrival rate of voice traffic can be expressed in terms of the number $a$ of call attempts per hour and the mean call duration $d$ (in seconds) as
$$\lambda = \frac{a \times d}{3600}$$
The number $m$ of calls that the link can carry simultaneously is
$$m = \frac{C}{C_{voice}}$$
Since the arrival process of voice calls is well modeled by a Poisson process with exponential holding time, the Erlang B formula (14.17) is applicable to compute the blocking probability or grade of service (GoS) as
$$\frac{\dfrac{r^m}{m!}}{\sum_{j=0}^{m} \dfrac{r^j}{j!}} \tag{14.18}$$
where $r = \frac{\lambda}{\mu} = m\rho$ and $\rho = \frac{\lambda}{m\mu}$ is the traffic intensity. This relation (14.18) specifies the probability that admission control will have to refuse a call request between the two VoIP gateways because the link is already transporting $m$ calls. An Internet service provider can make a trade-off between the link capacity $C$ (by hiring more links or a higher capacity link from a network provider) and the blocking probability or GoS. The latter must be small enough to keep its subscribed customers, but large enough to make profit. A reasonable value for the GoS seems to be $10^{-4}$. If the Internet service provider hires a 2 Mb/s link and offers its customers VoIP software with codec rate 40 kb/s (G.726 standard), then $m = 50$. Since (14.18) is strictly increasing in $\rho$, equating (14.18) to the target GoS and solving for $r$ yields $r \approx 28.87$, or the traffic intensity equals $\rho = 0.5775$. Furthermore, since $\mu = \frac{C}{m} = 40$ kb/s, we obtain $\lambda = 1.155$ Mb/s. If the mean call duration $d$ (in seconds) is known, the number of call attempts per hour then follows as $a = \frac{4158129}{d}$. If we assume that a telephone call lasts on average 2 minutes or $d = 120$ s, the number of call attempts per hour that the Internet service provider can handle with a GoS of $10^{-4}$ equals $a = 34651$.
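The dimensioning step in Example 2, solving the Erlang B expression for the offered load $r$ at a given number of lines and target GoS, can be sketched with a bisection search. This relies on the blocking probability being strictly increasing in $r$. The Erlang B value is computed with the standard recurrence over the number of servers, which is an equivalent form not given in the text; function names and the tolerance are illustrative assumptions.

```python
def erlang_b(load, m):
    """Erlang B blocking probability via the standard recurrence (assumed equivalent to (14.17))."""
    b = 1.0
    for k in range(1, m + 1):
        b = load * b / (k + load * b)
    return b

def offered_load_for_gos(m, gos, tol=1e-9):
    """Largest offered load r = lambda/mu whose blocking probability stays below gos."""
    lo, hi = 0.0, float(m)
    while erlang_b(hi, m) < gos:      # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_b(mid, m) < gos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `offered_load_for_gos(50, 1e-4)` gives the offered load for the 50-call link of the example, and dividing the result by 50 gives the corresponding traffic intensity $\rho$.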
14.2.3 The M/M/1/K queue
In contrast to the basic M/M/1 queue, the M/M/1/K system cannot contain more than $K$ packets (including the packet in the server). Arriving packets that find the system completely occupied (with $K$ packets) are refused service and are to be considered as lost (or marked).
In the basic steady-state relations for the birth and death process (11.15) the appearing summation is limited to $K$ instead of infinity or $\lambda_k = \lambda$ if $k < K$ and $\lambda_k = 0$ if $k \geq K$. Thus, with $\rho = \frac{\lambda}{\mu}$ and
$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{K} \rho^j} = \frac{1-\rho}{1-\rho^{K+1}}$$
the pdf of the system content for the M/M/1/K system becomes,
$$\Pr[N_S = j] = \begin{cases} \dfrac{(1-\rho)\rho^j}{1-\rho^{K+1}} & 0 \leq j \leq K \\ 0 & j > K \end{cases} \tag{14.19}$$
The probability that $j$ positions in the M/M/1/K system are occupied is proportional to that in the infinite system (14.1) with proportionality factor $\left(1 - \rho^{K+1}\right)^{-1}$.
The probability that the system is completely filled with $K$ packets equals
$$\Pr[N_S = K] = \frac{(1-\rho)\rho^K}{1-\rho^{K+1}} \tag{14.20}$$
This probability also equals the loss probability for packets in the M/M/1/K system. Regarding the QoS problem in multimedia based on IP-networks, a first crude estimate of the packet loss in a router with $K$ positions can be derived from (14.20). The estimate is rather crude because the arrival process of packets in the Internet is likely not a Poisson process and the variable length of the packets does not necessarily lead to an exponentially distributed service time.
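The distribution (14.19) and the loss probability (14.20) are simple to evaluate. The sketch below uses illustrative function names and treats $\rho = 1$ separately, where the closed form is $0/0$ and the geometric sum in the denominator shows that all $K + 1$ states are equally likely.

```python
def mm1k_distribution(rho, K):
    """Steady-state probabilities Pr[N_S = j], j = 0..K, of the M/M/1/K system (14.19)."""
    if rho == 1.0:
        # Limit of (14.19) for rho -> 1: the sum of rho^j over j = 0..K equals K + 1.
        return [1.0 / (K + 1)] * (K + 1)
    norm = (1.0 - rho) / (1.0 - rho ** (K + 1))
    return [norm * rho ** j for j in range(K + 1)]

def mm1k_loss_probability(rho, K):
    """Probability (14.20) that an arriving packet finds the system full."""
    return mm1k_distribution(rho, K)[K]
```

Unlike the infinite-buffer M/M/1 queue, the finite system has a steady-state for any $\rho$, including $\rho > 1$, at the price of a loss probability that then approaches $1 - 1/\rho$ for large $K$.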
14.3 The M/G/1 queue
The most general single-server queueing system with Poisson arrivals is the
M/G/1 queue. The service time distribution $F_x(t)$ can be any arbitrary distribution. Due to its importance, we will derive the system content and the waiting time distribution in steady-state.
In order to describe the M/G/1 queueing system, special observation points in time must be chosen such that the equations for the evolution of the number of packets in the system are most conveniently deduced. The set of departure times $\{r_n\}$ appears to be a suitable set. Any other set of observation points is likely to lead to a more complex mathematical treatment, mainly because the remaining service time of the packet currently under service is a stochastic variable. If the M/G/1 queue is observed at departure times, the evolution of the number of packets in the system $N_S(r_n^+)$ that the departing packets leave behind is a discrete Markov chain, namely the embedded Markov chain of the M/G/1 queue continuous-time process. Section 13.4.1 has shown that, in steady-state, relation (13.19) holds: the distribution of the number of packets in the system observed by arrivals equals that left behind by departures. In addition, the PASTA Theorem 13.5.1 states that, in steady-state, Poisson arrivals observe the actual distribution of the number of packets in the system. Because of this PASTA property, the embedded Markov chain, although it observes the system at departure epochs only, nevertheless provides the steady-state solution since the arrival process is Poisson.
Let us concentrate on deriving the transition probabilities that specify this
embedded Markov chain entirely, apart from an initial state distribution.
With the notation of Section 13.4, $\mathcal{S}_k = N_S(r_k^+)$ and $\mathcal{A}_k$ denote the number
of packets in the system at the discrete-time $r_k$ and the number of arrivals
during a time interval $[r_k, r_{k+1}]$, respectively. The transition probability of
the embedded Markov chain is
$$V_{ij} = \Pr[\mathcal{S}_{k+1} = j \mid \mathcal{S}_k = i]$$
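and the evolution over time follows from (13.16) as
$$\mathcal{S}_{k+1} = (\mathcal{S}_k - 1)^+ + \mathcal{A}_k \tag{14.21}$$
Hence, since $\mathcal{A}_k \geq 0$, we see that $\mathcal{S}_{k+1} \geq (\mathcal{S}_k - 1)^+$ or that $\mathcal{S}_{k+1} < (\mathcal{S}_k - 1)^+$
is impossible. Hence, $V_{ij} = 0$ for $j < i - 1$ and $i > 1$ while, for $i > 0$, $V_{ij} =
\Pr[i - 1 + \mathcal{A}_k = j \mid \mathcal{S}_k = i] = \Pr[\mathcal{A}_k = j - i + 1]$. The case for $i = 0$ results
in $V_{0j} = \Pr[\mathcal{A}_k = j] = V_{1j}$. Denoting $a_j = \Pr[\mathcal{A}_k = j]$, the transition
probability matrix becomes (see footnote 2)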
$$V = \begin{bmatrix}
a_0 & a_1 & a_2 & a_3 & \cdots \\
a_0 & a_1 & a_2 & a_3 & \cdots \\
0 & a_0 & a_1 & a_2 & \cdots \\
0 & 0 & a_0 & a_1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$
and the corresponding transition graph is sketched in Fig. 14.2.
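[Figure 14.2: chain of states ..., i-2, i-1, i, i+1, i+2, ..., j, ...; from state i the arcs to i-1, i, i+1, i+2 and j carry the probabilities a_0, a_1, a_2, a_3 and a_{j-i+1}.]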
Fig. 14.2. State transition graph for the M/G/1 embedded Markov chain.
The number of Poisson arrivals during a time slot $[r_k, r_{k+1}]$ clearly depends on the length of the service time $x_{k+1} = r_{k+1} - r_k$ that is distributed
according to $F_x(t)$, which is independent of a specific packet $k$. Furthermore,
the arrival process is a Poisson process with rate $\lambda$ and independent of the
state of the queueing process, thus $\Pr[\mathcal{A}_k = j] = \Pr[\mathcal{A} = j]$. Hence, using
the law of total probability (2.46),
$$\Pr[\mathcal{A} = j] = \int_0^\infty \Pr[\mathcal{A} = j \mid x = t]\, dF_x(t) = \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^j}{j!}\, dF_x(t)$$
Footnote 2: The structure of this transition probability matrix $V$ has been investigated in great depth by
Neuts (1989). Moreover, $V$ belongs to the class of matrices whose eigenstructure is explicitly
given in Appendix A.5.3.
If we denote the Laplace transform of the service time by
$$\varphi_x(z) = \int_0^\infty e^{-zt}\, dF_x(t)$$
then we observe that
$$a_j = \Pr[\mathcal{A} = j] = \frac{\lambda^j}{j!} \int_0^\infty e^{-\lambda t}\, t^j\, dF_x(t) = \frac{(-\lambda)^j}{j!} \left. \frac{d^j \varphi_x(z)}{dz^j} \right|_{z=\lambda} \tag{14.22}$$
so that the transition probability matrix $V$ is specified. Since $a_j > 0$ for
all $j \geq 0$, Fig. 14.2 indicates that all states are reachable from an arbitrary
state $i$ as the Markov process evolves over time in at most $i$ steps. The
maximum of $i$ steps occurs in the transition from state $i$ to state 0. This implies that
the Markov process is irreducible and the steady-state stability requirement
$\rho < 1$ makes the Markov process ergodic. The steady-state vector $\pi$ with
components $\pi_j = \Pr[N_{S;D} = j]$ where $\lim_{n\to\infty} N_S(r_n^+) = N_{S;D}$ follows from
(9.22) as solution of $\pi = \pi V$, where $V$ is a matrix of infinite dimensions.
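As a minimal numerical sketch (not part of the original text), the probabilities $a_j$ of (14.22) can be written in closed form for two common service laws, after which the balance equations $\pi = \pi V$ can be solved by forward substitution, row by row. The function names are illustrative, and the starting value $\pi_0 = 1 - \rho$ is an assumption borrowed from the Pollaczek-Khinchin equation (14.26) evaluated at $z = 0$.

```python
import math


def arrivals_exponential_service(lam, mu, n):
    """a_j for exponential service with rate mu: (14.22) gives a geometric law."""
    p = mu / (lam + mu)
    return [p * (1.0 - p) ** j for j in range(n)]


def arrivals_deterministic_service(lam, x, n):
    """a_j for a constant service time x: Poisson with mean lam * x."""
    rho = lam * x
    return [math.exp(-rho) * rho ** j / math.factorial(j) for j in range(n)]


def embedded_chain_steady_state(a, rho):
    """Solve pi = pi V for the first len(a) components.

    Column j of pi = pi V reads
        pi_j = pi_0 a_j + sum_{i=1}^{j+1} pi_i a_{j-i+1},
    which is solved for pi_{j+1}. Assumes pi_0 = 1 - rho.
    The recursion subtracts nearly equal numbers, so it is only
    trustworthy for a moderate number of components.
    """
    pi = [1.0 - rho]
    for j in range(len(a) - 1):
        s = pi[j] - pi[0] * a[j]
        s -= sum(pi[i] * a[j - i + 1] for i in range(1, j + 1))
        pi.append(s / a[0])
    return pi
```

For exponential service the recursion should reproduce the geometric M/M/1 law $(1-\rho)\rho^j$, which gives a quick sanity check of the matrix structure.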
14.3.1 The system content in steady-state
Rather than pursuing the matrix analysis that is explored by Neuts
(1989), we present an alternative method to determine the steady-state distribution $\pi_j$ using generating functions. The generating function approach
will lead in an elegant way to the celebrated Pollaczek-Khinchin equation.
The probability generating function (pgf) $G_k(z)$ of a discrete random variable $\mathcal{G}_k$ is defined in (2.18) as
$$G_k(z) = E\left[z^{\mathcal{G}_k}\right] = \sum_{j=0}^{\infty} g_k[j]\, z^j \tag{14.23}$$
where $g_k[j] = \Pr[\mathcal{G}_k = j]$. From (14.21), we have
$$S_{k+1}(z) = E\left[z^{(\mathcal{S}_k - 1)^+ + \mathcal{A}_k}\right] \tag{14.24}$$
Anticipating the corresponding result (14.31) derived in Section 14.4 for
the GI/D/m system in discrete-time, we observe that the generating function $S_k(z)$ satisfies a formally similar equation when $m = 1$ in (14.31).
This correspondence points to a more general framework because, by choosing appropriate observation points, the M/G/1 and GI/D/1 systems (in
discrete-time) formally obey the same equation. Since the results deduced for
the GI/D/m system are more general because of the $m$ servers instead of 1
here, we content ourselves with copying the result (14.35) derived below (see footnote 3) in
Section 14.4,
$$S(z) = \left(1 - A'(1)\right) \frac{(z-1)\, A(z)}{z - A(z)}$$
Footnote 3: Notice that the notation $A(z)$ here is different from $A(t) = \int_0^t N_A(u)\, du$ used before.
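We further continue to introduce in this general equation the details of the
M/G/1 queueing system by specifying $A(z) = \sum_{j=0}^{\infty} \Pr[\mathcal{A} = j]\, z^j$. With
(14.22), we find the Taylor expansion,
$$A(z) = \sum_{j=0}^{\infty} \frac{(-\lambda z)^j}{j!} \left. \frac{d^j \varphi_x(z)}{dz^j} \right|_{z=\lambda} = \varphi_x(\lambda - \lambda z) \tag{14.25}$$
and the probability generating function of the system content of the steady-state M/G/1 queueing system,
$$S(z) = \left(1 + \lambda \varphi_x'(0)\right) \frac{(z-1)\, \varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)}$$
But, since $-\varphi_x'(0) = E[x] = \frac{1}{\mu}$, which is the average service time, we
finally arrive using (13.2) at the famous Pollaczek-Khinchin equation,
$$S(z) = (1 - \rho) \frac{(z-1)\, \varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)} \tag{14.26}$$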
Let us further investigate what can be concluded from the Pollaczek-Khinchin
equation (14.26). First, we can verify by executing the limit $z \to 1$ that
$S(1) = 1$, the normalization condition for any probability generating function. More interestingly, the average number of packets in the M/G/1 system
follows after some tedious manipulations using de l'Hospital's rule as
$$E[N_S] = S'(1) = \rho + \frac{\lambda^2 \varphi_x''(0)}{2(1-\rho)}$$
From (2.42), $\varphi_x''(0) = E[x^2]$, so that
$$E[N_S] = \rho + \frac{\lambda^2 E[x^2]}{2(1-\rho)} \tag{14.27}$$
Hence, the average number of packets in the M/G/1 system (in steady-state)
grows linearly with the second moment of the service time distribution. Since
$E[x^2] = \mathrm{Var}[x] + (E[x])^2$, the relation (14.27) (see footnote 4) shows that, for equal average
service rates, the service process with highest variability leads to the largest
average number of packets in the system. One of the early successes of the
Japanese industry was the “just in time” (JIT) principle, which essentially
tries to minimize the variability in a manufacturing process. Minimization
of variability is also very important in the design of scheduling rules: the
less variability, the more efficiently buffer places in a router are used. Since
a deterministic server has the lowest variance (namely zero), the M/D/1
queue will hold on average the lowest number of packets. This design
principle was used in ATM, where all service times precisely equal the time
needed to serve one ATM cell.
Footnote 4: Sometimes the coefficient of variation for the service time, $C_x = \frac{\sqrt{\mathrm{Var}[x]}}{E[x]}$, is used such that
$$E[N_S] = \rho + \rho^2\, \frac{1 + C_x^2}{2(1-\rho)}$$
The average time spent in the system follows directly from Little's law (13.21),
$$E[T] = \frac{E[N_S]}{\lambda} = E[x] + \frac{\lambda E[x^2]}{2(1-\rho)}$$
and, since $E[T] = E[x] + E[w]$, the average waiting time in the queue is
$$E[w] = \frac{\lambda E[x^2]}{2(1-\rho)} \tag{14.28}$$
Observe a general property of “averages” in queueing systems: there is a
simple pole at $\rho = 1$. Both the average number of packets in the system
(and in the queue) and the average waiting time grow unboundedly as
$\rho \to 1$.
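The mean-value results (14.27) and (14.28) are easy to evaluate once the first two moments of the service time are known. The helper below is a small illustrative sketch; its name and the dictionary keys are assumptions made here, not notation from the text.

```python
def pk_mean_values(lam, mean_x, second_moment_x):
    """Mean values of the M/G/1 queue from (14.27), (14.28) and Little's law.

    lam             : Poisson arrival rate
    mean_x          : E[x], the mean service time
    second_moment_x : E[x^2], the second moment of the service time
    """
    rho = lam * mean_x
    if not 0.0 <= rho < 1.0:
        raise ValueError("steady-state requires rho < 1")
    mean_wait = lam * second_moment_x / (2.0 * (1.0 - rho))  # (14.28)
    mean_system_time = mean_x + mean_wait                    # E[T] = E[x] + E[w]
    mean_in_system = lam * mean_system_time                  # Little's law, equals (14.27)
    return {
        "rho": rho,
        "E[N_S]": mean_in_system,
        "E[T]": mean_system_time,
        "E[w]": mean_wait,
    }
```

With the same load, exponential service ($E[x^2] = 2E[x]^2$) gives twice the mean waiting time of deterministic service ($E[x^2] = E[x]^2$), which illustrates the role of the service variability.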
14.3.2 The waiting time in steady-state
The derivation of the pgf (14.26) of the steady-state system content $S(z)$
has not made any assumption about the service discipline that determines
the order in which packets are served. The waiting time (in the queue)
and the system time (total time spent in the M/G/1 queueing system) will,
of course, depend on the order. As mentioned earlier, a FIFO service
discipline is assumed. At each departure time $r_k$, the number of packets left
behind by that $k$-th packet is precisely $N_S(r_k)$. With FIFO, this implies
that during the total time $T_k$ that the $k$-th packet has spent in the M/G/1
queueing system, precisely $N_S(r_k)$ packets have arrived. Similarly as above
in (14.22), we compute the number $\mathcal{A}_T$ of Poisson arrivals during $T_k$ (instead
of $x_k$) and directly find, in steady-state,
$$\Pr[\mathcal{A}_T = j] = \frac{\lambda^j}{j!} \int_0^\infty e^{-\lambda t}\, t^j\, dF_T(t) = \frac{(-\lambda)^j}{j!} \left. \frac{d^j \varphi_T(z)}{dz^j} \right|_{z=\lambda}$$
and the corresponding pgf,
$$A_T(z) = \sum_{j=0}^{\infty} \Pr[\mathcal{A}_T = j]\, z^j = \varphi_T(\lambda - \lambda z)$$
where $\varphi_T(z)$ is the Laplace transform of the system time $T$. Since the number of Poisson arrivals $\mathcal{A}_T$ during the system time $T$ of a packet in steady-state equals the number of packets left behind by that packet, $\Pr[\mathcal{A}_T = j] =
\Pr[N_{S;D} = j]$. The PASTA property (Theorem 13.5.1) states that, in steady-state, the observed number of packets in the queue at departure or arrival
times is equal in distribution to the actual number of packets in the queue
or that $\Pr[N_{S;D} = j] = \Pr[N_S = j]$. By considering the pgfs of both sides,
$A_T(z) = S(z)$, such that with (14.26),
$$\varphi_T(\lambda - \lambda z) = (1-\rho) \frac{(z-1)\, \varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)}$$
After a change of variable $s = \lambda - \lambda z$, we end up with the result that the
Laplace transform of the total system time in steady-state is a function of
the Laplace transform of the service time
$$\varphi_T(s) = (1-\rho) \frac{s\, \varphi_x(s)}{s - \lambda + \lambda \varphi_x(s)} \tag{14.29}$$
Since $T_k = w_k + x_k$ and, in steady-state $T = w + x$, we have that $E[e^{-sT}] =
E[e^{-sw} e^{-sx}] = E[e^{-sw}]\, E[e^{-sx}]$, where the latter follows from independence
between $x$ and $w$. Hence, $\varphi_T(s) = \varphi_x(s)\, \varphi_w(s)$, from which the Laplace
transform of the waiting time in the queue follows as
$$\varphi_w(s) = (1-\rho) \frac{s}{s - \lambda + \lambda \varphi_x(s)} \tag{14.30}$$
Due to the correspondence with (14.26), these two relations (14.29) and
(14.30) are also called Pollaczek-Khinchin equations for the system time
and waiting time respectively. For example, for an exponential service time
with average $\frac{1}{\mu}$, $\varphi_x(s) = \frac{\mu}{s+\mu}$ and (14.29) becomes $\varphi_T(s) = \frac{\mu(1-\rho)}{s + \mu(1-\rho)}$, which
is indeed the Laplace transform of the pdf of total system time (14.4) in
the M/M/1 queue. The relation (14.30) can be written in terms of the
residual service time (8.17), which after Laplace transform becomes $\varphi_{r_w}(s) =
\frac{1 - \varphi_x(s)}{s E[x]}$, as
$$\varphi_w(s) = \frac{1-\rho}{1 - \rho\, \varphi_{r_w}(s)}$$
It shows that the dominant tail behavior (see Section 5.7) arises from the
pole at $\varphi_{r_w}(s) = \frac{1}{\rho}$. By formal expansion into a Taylor series (only valid for
$|\rho\, \varphi_{r_w}(s)| < 1$), we find
$$\varphi_w(s) = (1-\rho) \sum_{k=0}^{\infty} \rho^k \varphi_{r_w}^k(s)$$
or, after taking the inverse Laplace transform,
$$f_w(t) = \frac{d}{dt} \Pr[w \leq t] = \sum_{k=0}^{\infty} (1-\rho)\, \rho^k f_{r_w}^{(k*)}(t)$$
The pdf $f_w(t)$ of the waiting time in the queue can be interpreted as a sum
of convolved residual service time pdfs $f_{r_w}^{(k*)}(t)$ weighted by $(1-\rho)\rho^k =
\Pr[N_S = k]$, the steady-state probability of the system content in the M/M/1
system (14.1).
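The Pollaczek-Khinchin transforms (14.29) and (14.30) can be evaluated numerically for any service-time Laplace transform. The sketch below is illustrative only: the function names are assumptions, and `phi_x` is any Python callable that returns $\varphi_x(s)$.

```python
def pk_system_time_lt(s, lam, mean_x, phi_x):
    """Laplace transform of the system time T, equation (14.29)."""
    rho = lam * mean_x
    return (1.0 - rho) * s * phi_x(s) / (s - lam + lam * phi_x(s))


def pk_waiting_time_lt(s, lam, mean_x, phi_x):
    """Laplace transform of the waiting time w, equation (14.30)."""
    rho = lam * mean_x
    return (1.0 - rho) * s / (s - lam + lam * phi_x(s))


def exponential_service_lt(mu):
    """phi_x(s) = mu / (s + mu) for exponential service with mean 1/mu."""
    return lambda s: mu / (s + mu)
```

For exponential service the value of `pk_system_time_lt` can be compared with the closed form $\mu(1-\rho)/(s + \mu(1-\rho))$ quoted above. Both functions are of the form 0/0 at $s = 0$, so they should only be evaluated for $s \neq 0$.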
14.4 The GI/D/m queue
The analysis of the GI/D/m queue illustrates a discrete-time approach to
queueing. Since each of the $m$ servers operates deterministically, which means
that per unit time precisely one packet (or an ATM cell or customer) is
served, the basic time unit in the analysis, also called a time slot, is equal
to that service time. Hence, the arrival process is expressed as a counting
process: instead of specifying the interarrival rate, the number of arrivals at
each time slot is used.
In the sequel, we confine ourselves to a deterministic server discipline that
removes during timeslot $k$ precisely $m$ cells from the queue. Hence, we have
$\mathcal{X}_k = m$ and $X_k(z) = E\left[z^{\mathcal{X}_k}\right] = z^m$. Substituting (13.16) in (14.23) leads
to
$$S_{k+1}(z) = E\left[z^{(\mathcal{S}_k - m)^+ + \mathcal{A}_k}\right] \tag{14.31}$$
At this point, a further general evaluation of the expression (14.31) is only
possible by assuming independence between the random variables $\mathcal{A}_k$ and
$\mathcal{Q}_k$. From (13.18), it then follows that $S_{k+1}(z) = Q_k(z)\, A_k(z)$. This crucial
assumption facilitates the analysis considerably. For,
$$\begin{aligned}
S_{k+1}(z) &= E\left[z^{(\mathcal{S}_k - m)^+} z^{\mathcal{A}_k}\right] \\
&= E\left[z^{(\mathcal{S}_k - m)^+}\right] E\left[z^{\mathcal{A}_k}\right] && \text{(by independence)} \\
&= A_k(z) \sum_{j=0}^{\infty} \Pr[(\mathcal{S}_k - m)^+ = j]\, z^j && \text{(by definition (14.23))}
\end{aligned}$$
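The summation can be worked out as
$$\begin{aligned}
B &= \sum_{j=0}^{\infty} \Pr[(\mathcal{S}_k - m)^+ = j]\, z^j \\
&= \Pr[(\mathcal{S}_k - m)^+ = 0] + \sum_{j=1}^{\infty} \Pr[\mathcal{S}_k = m + j]\, z^j \\
&= \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j] + \sum_{j=1+m}^{\infty} \Pr[\mathcal{S}_k = j]\, z^{j-m}
\end{aligned}$$
Expressing this in terms of the generating function $S_k(z)$ yields
$$\begin{aligned}
B &= \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j] + z^{-m} \left( \sum_{j=0}^{\infty} \Pr[\mathcal{S}_k = j]\, z^j - \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j]\, z^j \right) \\
&= z^{-m} \left( S_k(z) - \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j]\, (z^j - z^m) \right)
\end{aligned}$$
Finally, since the term $j = m$ vanishes, we obtain a recursion relation for the generating function of the
system content in the GI/D/m system at discrete-time $k$,
$$S_{k+1}(z) = A_k(z)\, z^{-m} \left( S_k(z) - \sum_{j=0}^{m-1} \Pr[\mathcal{S}_k = j]\, (z^j - z^m) \right) \tag{14.32}$$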
In the single-server case $m = 1$, where precisely one cell is served per time
slot (provided the queue is not empty), equation (14.32) simplifies with
$S_k(0) = \Pr[\mathcal{S}_k = 0]$ to
$$\begin{aligned}
S_{k+1}(z) &= A_k(z)\, z^{-1} \left\{ S_k(z) - \Pr[\mathcal{S}_k = 0]\, (1 - z) \right\} \\
&= A_k(z) \left[ \frac{S_k(z) - S_k(0)}{z} + S_k(0) \right]
\end{aligned} \tag{14.33}$$
Notice that $S_{k+1}(0) = A_k(0)\left(S_k'(0) + S_k(0)\right)$.
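The recursion behind (14.31) and (14.32), $\mathcal{S}_{k+1} = (\mathcal{S}_k - m)^+ + \mathcal{A}_k$, can also be iterated directly on the probability vector instead of on the generating function. The following sketch assumes Poisson arrivals per slot and truncates both distributions to a finite length; the function names and the truncation with renormalization are choices made here for illustration.

```python
import math


def poisson_pmf(lam, n):
    """First n probabilities Pr[A = j] of a Poisson(lam) number of arrivals."""
    return [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(n)]


def step_system_content(s, a, m):
    """One time slot of S_{k+1} = (S_k - m)^+ + A_k on a truncated pmf s."""
    n = len(s)
    # Distribution of (S_k - m)^+ : at most m packets are served.
    after_service = [0.0] * n
    for j, p in enumerate(s):
        after_service[max(j - m, 0)] += p
    # Add independent arrivals (discrete convolution, truncated at n states).
    nxt = [0.0] * n
    for i, p in enumerate(after_service):
        if p == 0.0:
            continue
        for j, q in enumerate(a):
            if i + j < n:
                nxt[i + j] += p * q
    total = sum(nxt)  # renormalize the mass lost by truncation
    return [v / total for v in nxt]
```

Starting from an empty system and iterating until the vector no longer changes gives a numerical approximation of the steady-state system content. For $m = 1$ and Poisson arrivals with $\lambda < 1$, the first component should approach $1 - \lambda$, in agreement with $S(0)$ in (14.35).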
14.4.1 The steady-state of the GI/D/m queue
The steady-state behavior is reached if the system's distributions do not
change anymore in time. With $\lim_{k\to\infty} S_k(z) = S(z)$ and $\lim_{k\to\infty} A_k(z) =
A(z)$, (14.32) reduces in steady-state to
$$S(z) = \frac{A(z) \sum_{j=0}^{m-1} \Pr[\mathcal{S} = j]\, (z^m - z^j)}{z^m - A(z)}$$
Recall that $X_k(z) = z^m$ is the generating function of the service process.
At this point, we use the same powerful argument from complex analysis as
in Section 11.3.3. Since a generating function of a probability distribution
is analytic inside and on the unit circle, the possible zeros of $z^m - A(z)$
inside that unit circle must be precisely cancelled by zeros in the numerator.
Clearly, $z = 1$ is a zero of $z^m - A(z)$. On the unit circle, excluding the point
$z = 1$ where $A(1) = 1$, we know (see footnote 5) that $|A(z)| < 1$ for all points on the unit
circle $|z| = 1$ (except for $z = 1$).
The region around $z = 1$ deserves some closer investigation. From the
Taylor expansions
$$A(z) = 1 + \lambda (z - 1) + o(z - 1)$$
$$z^m = 1 + m (z - 1) + o(z - 1)$$
and the fact that $\lambda < m$ because a steady-state requires $\rho < 1$, we substitute
$z - 1 = \varepsilon e^{i\theta}$, which describes a circle with radius $\varepsilon$ around $z = 1$. Along this
circle with arbitrarily small radius $\varepsilon$, we find that
$$|A(z)| = \left|1 + \lambda \varepsilon e^{i\theta} + o(\varepsilon)\right| = \sqrt{(1 + \lambda\varepsilon\cos\theta + o(\varepsilon))^2 + (\lambda\varepsilon\sin\theta + o(\varepsilon))^2}$$
$$|z^m| = \left|1 + m \varepsilon e^{i\theta} + o(\varepsilon)\right| = \sqrt{(1 + m\varepsilon\cos\theta + o(\varepsilon))^2 + (m\varepsilon\sin\theta + o(\varepsilon))^2}$$
which demonstrates that $|z^m| > |A(z)|$ on this arbitrarily small circle if $\cos\theta >
0$ or $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$. Invoking Rouché's Theorem 11.3.1 with $f(z) = z^m$ and
$g(z) = -A(z)$ on the contour $C$, the unit circle enclosing the point $z = 1$ by
an arbitrarily small arc $\varepsilon$ on the right of $z = 1$ as illustrated in Fig. 14.3, such
that $|f(z)| > |g(z)|$ on the contour $C$, shows that $z^m - A(z)$ has precisely
$m$ zeros $\zeta_1, \zeta_2, \ldots, \zeta_m = 1$ inside that contour $C$.
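Since $Q(z) = \frac{S(z)}{A(z)}$ is the generating function of the number of occupied
buffer positions, it is also analytic inside the unit circle. Therefore, the zeros
$\{\zeta_k\}_{1 \leq k \leq m-1}$ and $\zeta_m = 1$ are also zeros of $p(z) = \sum_{j=0}^{m-1} \Pr[\mathcal{S} = j]\, (z^m - z^j)$.
This leads to a set of $m$ equations for each $\zeta_k \neq 0$,
$$\sum_{j=0}^{m-1} \Pr[\mathcal{S} = j]\, (\zeta_k^m - \zeta_k^j) = 0$$
which determine the unknown probabilities $\Pr[\mathcal{S} = j]$. Since $p(z)$ is a polynomial of degree $m$, $p(z)$ is entirely determined by its zeros as
$$p(z) = \alpha\, (z - 1) \prod_{k=1}^{m-1} (z - \zeta_k)$$
Footnote 5: For any probability generating function $\varphi_G(z)$ it holds for $|z| \leq 1$ that
$$|\varphi_G(z)| = \left| \sum_{j=0}^{\infty} \Pr[G = j]\, z^j \right| \leq \sum_{j=0}^{\infty} \Pr[G = j]\, |z|^j \leq \sum_{j=0}^{\infty} \Pr[G = j] = 1$$
[Figure 14.3: the unit circle |z| = 1 with origin 0 and the point 1; the contour C follows the unit circle and passes to the right of z = 1 along a small arc of radius ε.]
Fig. 14.3. Details of the contour $C$ in the neighborhood of $z = 1$.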
The unknown $\alpha$ is determined from the normalization condition $S(1) =
Q(1) = 1$, which is explicitly
$$\lim_{z \to 1} \frac{p(z)}{z^m - A(z)} = 1$$
With de l'Hospital's rule,
$$\alpha^{-1} = \prod_{k=1}^{m-1} (1 - \zeta_k)\, \lim_{z \to 1} \frac{1}{m z^{m-1} - A'(z)} = \frac{\prod_{k=1}^{m-1} (1 - \zeta_k)}{m - \lambda}$$
Finally, we arrive at the generating function of the buffer content via that
of the system content $S(z) = A(z)\, Q(z)$,
$$Q(z) = \frac{(m - \lambda)(z - 1)}{z^m - A(z)} \prod_{k=1}^{m-1} \frac{z - \zeta_k}{1 - \zeta_k} \tag{14.34}$$
With $A'(1) = \lambda$ the single-server case ($m = 1$) in (14.34) reduces to the
well-known result for the pgf of the system and buffer content of a GI/D/1
system respectively as
$$S(z) = \left(1 - A'(1)\right) \frac{(z - 1)\, A(z)}{z - A(z)} \tag{14.35}$$
$$Q(z) = (1 - \lambda) \frac{z - 1}{z - A(z)} \tag{14.36}$$
The probability of an empty buffer, $Q(0) = \Pr[\mathcal{Q} = 0]$, immediately follows
from (14.34). The average queue length for the single-server ($m = 1$) is
obtained as $Q'(1)$, or
$$E[N_Q] = E[\mathcal{Q}] = \frac{A''(1)}{2(1 - \lambda)} \tag{14.37}$$
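To make the role of the zeros $\zeta_k$ in (14.34) concrete, the sketch below computes them for the special case of Poisson arrivals, $A(z) = e^{\lambda(z-1)}$, which is an assumption made here and not required by the GI/D/m analysis. Each zero of $z^m - A(z)$ inside the unit circle satisfies $z = e^{2\pi i k/m}\, e^{\lambda(z-1)/m}$ for some $k$, and for $\lambda < m$ this map is a contraction on the unit disk, so plain fixed-point iteration is used. The function names are illustrative.

```python
import cmath
import math


def gidm_zeros_poisson(lam, m, iterations=500):
    """Zeros of z^m - exp(lam (z - 1)) in the closed unit disk, for lam < m.

    Returns m values; the last one (k = m) is the trivial zero z = 1.
    """
    if not 0.0 < lam < m:
        raise ValueError("steady-state requires 0 < lam < m")
    zeros = []
    for k in range(1, m + 1):
        omega = cmath.exp(2j * math.pi * k / m)  # m-th root of unity
        z = 0j
        for _ in range(iterations):
            z = omega * cmath.exp(lam * (z - 1.0) / m)
        zeros.append(z)
    return zeros


def empty_buffer_probability(lam, m):
    """Q(0) from (14.34) with A(0) = exp(-lam) for Poisson arrivals."""
    prod = 1.0 + 0j
    for zeta in gidm_zeros_poisson(lam, m)[:-1]:
        prod *= (0.0 - zeta) / (1.0 - zeta)
    return ((m - lam) / math.exp(-lam) * prod).real
```

For $m = 1$ there are no non-trivial zeros and the function reduces to $Q(0) = (1-\lambda)e^{\lambda}$, the M/D/1 value that follows from (14.36).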
14.4.2 The waiting time in a GI/D/m system
Let $T = w + 1$ denote the steady-state system time of an arbitrary packet,
a “test” packet, in units of a timeslot. In addition to the system content
$\mathcal{S}$ that describes the number of packets in the system at the beginning of
a time slot, an additional random variable must be introduced: $\mathcal{F}$ denotes
the number of packets that arrive in the same timeslot just before the “test”
packet. In the D-server and assuming a FIFO discipline, these $\mathcal{F}$ packets
will be served before the “test” packet, possibly in the same time slot. The
system time of the “test” packet equals
$$T = \left\lfloor \frac{(\mathcal{S} - m)^+ + \mathcal{F}}{m} \right\rfloor + 1$$
where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$. Indeed,
$(\mathcal{S} - m)^+ + \mathcal{F}$ is the number of packets in the system just before the
arrival of the “test” packet. At the beginning of a timeslot, at most $m$
packets are served, which explains the integer division. The service time
takes precisely one additional time slot. Let us simplify the notation by
defining $\mathcal{R} = (\mathcal{S} - m)^+ + \mathcal{F}$. From this expression for the system time,
we deduce, for each integer $k \geq 1$ (the minimal time spent in the system
equals 1 timeslot), that
$$\Pr[T = k] = \sum_{j=0}^{m-1} \Pr[\mathcal{R} = (k-1)m + j]$$
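such that the generating function of the system time $T(z)$ is
$$T(z) = \sum_{k=1}^{\infty} \Pr[T = k]\, z^k = \sum_{k=1}^{\infty} \sum_{j=0}^{m-1} \Pr[\mathcal{R} = (k-1)m + j]\, z^k = \sum_{j=0}^{m-1} \sum_{k=0}^{\infty} \Pr[\mathcal{R} = km + j]\, z^{k+1}$$
Also,
$$\begin{aligned}
T(z^m) &= z^m \sum_{j=0}^{m-1} z^{-j} \sum_{k=0}^{\infty} \Pr[\mathcal{R} = km + j]\, z^{mk+j} \\
&= z^m \sum_{j=0}^{m-1} z^{-j} \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n\, \delta_{n, mk+j} \\
&= z^m \sum_{j=0}^{m-1} z^{-j} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n \sum_{k=0}^{\infty} \delta_{n, mk+j}
\end{aligned}$$
where the Kronecker delta $\delta_{k,m} = 1$ if $k = m$ else $\delta_{k,m} = 0$. The sum
$$\sum_{k=0}^{\infty} \delta_{n, mk+j} = \sum_{k=-\infty}^{\infty} \delta_{n-j, mk} = 1_{m \mid n-j}$$
is one if $m$ divides $n - j$ else it is zero. Such an expression can be written as
$$\sum_{k=-\infty}^{\infty} \delta_{n-j, mk} = \frac{1 - e^{2\pi i (n-j)}}{m \left(1 - e^{2\pi i (n-j)/m}\right)} = \frac{1}{m} \sum_{k=0}^{m-1} e^{2\pi i k (n-j)/m}$$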
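Using the latter summation yields
$$\begin{aligned}
T(z^m) &= \frac{z^m}{m} \sum_{j=0}^{m-1} z^{-j} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n \sum_{k=0}^{m-1} e^{2\pi i k (n-j)/m} \\
&= \frac{z^m}{m} \sum_{k=0}^{m-1} \sum_{j=0}^{m-1} \left(z e^{2\pi i k/m}\right)^{-j} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n] \left(z e^{2\pi i k/m}\right)^{n} \\
&= \frac{z^m}{m} \sum_{k=0}^{m-1} \frac{1 - z^{-m}}{1 - \left(z e^{2\pi i k/m}\right)^{-1}}\, R\!\left(z e^{2\pi i k/m}\right)
\end{aligned}$$
where we have introduced the generating function $R(z) = \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n$.
Thus, we arrive at
$$T(z^m) = \frac{z^m - 1}{m} \sum_{k=0}^{m-1} \frac{R\!\left(z e^{2\pi i k/m}\right)}{1 - \left(z e^{2\pi i k/m}\right)^{-1}}$$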
The generating function $R(z)$ can be specified further since
$$R(z) = E\left[z^{(\mathcal{S} - m)^+ + \mathcal{F}}\right] = E\left[z^{\mathcal{F}}\right] E\left[z^{(\mathcal{S} - m)^+}\right]$$
where independence of the arrival process and queueing process (GI) has
been used. From (14.31), the corresponding steady-state relation is
$$S(z) = E\left[z^{(\mathcal{S} - m)^+ + \mathcal{A}}\right] = E\left[z^{\mathcal{A}}\right] E\left[z^{(\mathcal{S} - m)^+}\right] = A(z)\, E\left[z^{(\mathcal{S} - m)^+}\right]$$
while from $S(z) = A(z)\, Q(z)$, we observe that $Q(z) = E\left[z^{(\mathcal{S} - m)^+}\right]$. Hence,
$$R(z) = F(z)\, Q(z)$$
where $Q(z)$ is given by (14.34).
We now turn our attention to the determination of $F(z)$, the generating
function of the number of packets in front of the “test” packet. The “test”
packet has been uniformly chosen out of the total flow of arriving packets at
the system. Let us denote by $\mathcal{A}^*$ the number of arriving packets in the same
time slot as the “test” packet. The random variable $\mathcal{A}^*$ is not the same as
the number of arriving cells $\mathcal{A}$ per time slot. For example, we know that
there is at least one arrival in the time slot of the “test” packet, namely,
the “test” packet itself, hence, $\Pr[\mathcal{A}^* = 0] = 0$. Furthermore, the larger the
number of arriving packets in a time slot, the higher the probability that
the “test” packet is chosen out of those packets in this time slot. Hence,
$\Pr[\mathcal{A}^* = j]$ is proportional to the number $j$ of arriving packets in a time slot.
In addition, $\Pr[\mathcal{A}^* = j]$ is also proportional to $\Pr[\mathcal{A} = j]$, which describes
how likely a number $j$ of arriving packets is. Combining both shows that
$$\Pr[\mathcal{A}^* = j] = \alpha\, j \Pr[\mathcal{A} = j]$$
with the proportionality factor equal to $\alpha = \frac{1}{E[\mathcal{A}]}$ because $\sum_{j=0}^{\infty} \Pr[\mathcal{A}^* = j] =
1$. The “test” packet is uniformly distributed among the arriving packets
$\mathcal{A}^*$ in the time slot of the “test” packet (in steady-state). The probability
of having precisely $k$ packets in front of the “test” packet given $\mathcal{A}^* = j \geq 1$
equals
$$\Pr[\mathcal{F} = k \mid \mathcal{A}^* = j] = \frac{1_{k < j}}{j}$$
Indeed, the “test” packet has equal probability $\frac{1}{j}$ of occupying any of the
$j$ possible positions. The occupation of a position $k + 1$ implies precisely
$k$ cells in front of the “test” packet in a FIFO discipline. Using the law of
total probability (2.46),
$$\Pr[\mathcal{F} = k] = \sum_{j=1}^{\infty} \Pr[\mathcal{F} = k \mid \mathcal{A}^* = j] \Pr[\mathcal{A}^* = j] = \sum_{j=k+1}^{\infty} \frac{1}{j}\, \frac{j \Pr[\mathcal{A} = j]}{E[\mathcal{A}]} = \frac{1}{E[\mathcal{A}]} \sum_{j=k+1}^{\infty} \Pr[\mathcal{A} = j]$$
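The generating function $F(z)$ becomes
$$\begin{aligned}
F(z) &= \sum_{k=0}^{\infty} \Pr[\mathcal{F} = k]\, z^k = \frac{1}{E[\mathcal{A}]} \sum_{k=0}^{\infty} \sum_{j=k+1}^{\infty} \Pr[\mathcal{A} = j]\, z^k \\
&= \frac{1}{E[\mathcal{A}]} \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j] \sum_{k=1}^{j} z^{k-1} = \frac{1}{E[\mathcal{A}]} \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j]\, \frac{1 - z^j}{1 - z} \\
&= \frac{1}{E[\mathcal{A}]\, (z - 1)} \left( \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j]\, z^j - \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j] \right) \\
&= \frac{A(z) - \Pr[\mathcal{A} = 0] - 1 + \Pr[\mathcal{A} = 0]}{E[\mathcal{A}]\, (z - 1)}
\end{aligned}$$
Since $E[\mathcal{A}] = A'(1) = \lambda$, we finally arrive at
$$F(z) = \frac{A(z) - 1}{(z - 1)\, A'(1)}$$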
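The distribution of the number of packets in front of the “test” packet is a scaled tail sum of the arrival distribution, which the following sketch evaluates for a truncated arrival pmf. The Poisson example and the function names are assumptions made for illustration; the formula itself holds for any arrival distribution with finite mean.

```python
import math


def poisson_pmf(lam, n):
    """First n probabilities Pr[A = j] of a Poisson(lam) number of arrivals."""
    return [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(n)]


def packets_in_front_pmf(a):
    """Pr[F = k] = (1 / E[A]) * sum_{j > k} Pr[A = j] for a truncated pmf a."""
    mean = sum(j * p for j, p in enumerate(a))
    f = [0.0] * len(a)
    tail = 0.0  # running value of sum_{j > k} a[j]
    for k in range(len(a) - 1, -1, -1):
        f[k] = tail / mean
        tail += a[k]
    return f


def pgf(pmf, z):
    """Probability generating function sum_k pmf[k] z^k."""
    return sum(p * z ** k for k, p in enumerate(pmf))
```

Evaluating `pgf(packets_in_front_pmf(a), z)` and comparing it with $(A(z) - 1)/((z-1)A'(1))$ at a few points is a simple way to confirm the closed form for $F(z)$.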
Combining all involved expressions leads to the expression of the generating
function of the total time spent in the GI/D/m queue
$$T(z^m) = \frac{(m - \lambda)(z^m - 1)}{m \lambda} \sum_{k=0}^{m-1} \frac{A\!\left(z e^{2\pi i k/m}\right) - 1}{\left(1 - \left(z e^{2\pi i k/m}\right)^{-1}\right) \left(z^m - A\!\left(z e^{2\pi i k/m}\right)\right)} \prod_{n=1}^{m-1} \frac{z e^{2\pi i k/m} - \zeta_n}{1 - \zeta_n}$$
For the single-server case ($m = 1$), the generating function of the system
time (queueing time plus service time) considerably simplifies to
$$T(z) = \left(\frac{1 - \lambda}{\lambda}\right) z\, \frac{A(z) - 1}{z - A(z)}$$
from which $E[T]$ and $\mathrm{Var}[T]$ readily follow. The computation of the pdf
given the arrival process $A(z)$ is more complex, as illustrated for the M/D/1/K
queue in the next section.
14.5 The M/D/1/K queue
Suppose we have a buffer of $K$ cells and an aggregate arrival stream consisting of a large number of sources with none of them dominating the others.
This input process is well modeled by a Poisson process with arrival rate
$\lambda$. Both the input process as well as the buffer content and the output
process have been simulated in Fig. 14.4. Observe the effect of variations
and maximum number of cells in the queue and input process!
[Figure 14.4: three panels versus timeslot (0 to 1000): number of arriving cells for Poisson(0.8), number of cells in queue, and number of served cells.]
Fig. 14.4. On the left, the Poisson input process with $\lambda = 0.8$ in terms of the number
of cells versus the timeslot. In the middle, the buffer occupancy for a buffer with
$K = 20$ as function of time. On the right, the M/D/1/K output process in cells
served per timeslot.
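14.5.1 The pdf of the buffer occupancy in the M/D/1 queue
For a Poisson process, $\Pr[\mathcal{A} = k] = \frac{\lambda^k}{k!} e^{-\lambda}$ and $A(z) = e^{\lambda(z-1)}$. The pgf
$Q(z)$ of the buffer content immediately follows from (14.36) as
$$Q(z) = (1 - \lambda) \frac{(1 - z)\, e^{\lambda(1-z)}}{1 - z\, e^{\lambda(1-z)}}$$
The average queue length is obtained from (14.37) as
$$E\left[N_{Q;M/D/1}\right] = E[\mathcal{Q}] = \frac{\lambda^2}{2(1 - \lambda)}$$
and Little's law (13.21) provides the average waiting time in the queue
$$E\left[w_{M/D/1}\right] = \frac{E\left[N_{Q;M/D/1}\right]}{\lambda} = \frac{\lambda}{2(1 - \lambda)} \tag{14.38}$$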
Since $z\, e^{\lambda(1-z)}$ is analytic everywhere, there always exists a neighborhood
(depending on $\lambda$) around $z = 0$ for which $|z\, e^{\lambda(1-z)}| < 1$. Hence, we can use
the series expansion for the geometric series to obtain
$$\frac{Q(z)}{1 - \lambda} = (1 - z)\, e^{\lambda(1-z)} \sum_{k=0}^{\infty} z^k e^{k\lambda(1-z)} = \sum_{k=0}^{\infty} (1 - z)\, z^k e^{(k+1)\lambda(1-z)}$$
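Integrating with respect to $\lambda$ removes the factor $(1 - z)$,
$$\begin{aligned}
\int \frac{Q(z)}{1 - \lambda}\, d\lambda &= \sum_{k=0}^{\infty} \frac{z^k}{k+1}\, e^{(k+1)\lambda}\, e^{-(k+1)\lambda z} \\
&= \sum_{k=0}^{\infty} \frac{z^k}{k+1}\, e^{(k+1)\lambda} \sum_{n=0}^{\infty} \frac{(-1)^n (k+1)^n \lambda^n}{n!}\, z^n \\
&= \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} \frac{(-1)^n}{n!}\, e^{(k+1)\lambda}\, (k+1)^{n-1} \lambda^n z^{n+k}
\end{aligned}$$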
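After a change in the variable $m = n + k$ with $m \geq 0$, which implies that
$k \leq m$ since $n = m - k \geq 0$, we have
$$\begin{aligned}
\int \frac{Q(z)}{1 - \lambda}\, d\lambda &= \sum_{m=0}^{\infty} \left( \sum_{k=0}^{m} \frac{(-1)^{m-k}}{(m-k)!}\, e^{(k+1)\lambda}\, (k+1)^{m-k-1} \lambda^{m-k} \right) z^m \\
&= \sum_{m=0}^{\infty} \left( \sum_{k=1}^{m+1} \frac{(-1)^{m+1-k}}{(m+1-k)!}\, e^{k\lambda}\, k^{m-k} \lambda^{m+1-k} \right) z^m
\end{aligned}$$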
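Differentiation with respect to $\lambda$ gives
$$Q(z) = (1 - \lambda) \sum_{m=0}^{\infty} \left( \sum_{k=1}^{m+1} (-1)^{m+1-k} e^{k\lambda} \left[ \frac{(k\lambda)^{m+1-k}}{(m+1-k)!} + \frac{(k\lambda)^{m-k}}{(m-k)!} \right] \right) z^m$$
from which we finally deduce the probability $q[m] = \Pr[\mathcal{Q} = m]$ that
$m$ positions in the buffer are occupied (see footnote 6)
$$q[m] = (1 - \lambda) \sum_{k=1}^{m+1} (-1)^{m+1-k} e^{k\lambda} \left[ \frac{(k\lambda)^{m+1-k}}{(m+1-k)!} + \frac{(k\lambda)^{m-k}}{(m-k)!} \right] \tag{14.39}$$
where the second term in the brackets vanishes for $k = m + 1$.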
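One readily observes from the derivation above that the probability $s[m] =
\Pr[\mathcal{S} = m]$ that $m$ positions in the system are occupied, is $s[m] = q[m - 1]$
for $m \geq 2$ because the $z$-transform is $S(z) = (1 - \lambda) \frac{1 - z}{1 - z\, e^{\lambda(1-z)}}$. This result is
a characteristic property of a deterministic server.
Footnote 6: Explicitly, we have for the queue content probabilities
$$q[0] = e^{\lambda} (1 - \lambda)$$
$$q[1] = e^{\lambda} (1 - \lambda) \left(e^{\lambda} - \lambda - 1\right)$$
$$q[2] = e^{\lambda} (1 - \lambda) \left(\frac{\lambda^2}{2} + \lambda + e^{2\lambda} - e^{\lambda} - 2\lambda e^{\lambda}\right)$$
while the system content probabilities are
$$s[0] = (1 - \lambda), \qquad s[1] = \left(e^{\lambda} - 1\right)(1 - \lambda), \qquad s[m] = q[m - 1] \quad (m \geq 2)$$
Next, we rewrite (14.39) for $m \geq 1$ as
$$\begin{aligned}
q[m] &= (1 - \lambda) \left[ \sum_{k=1}^{m+1} (-1)^{m+1-k} e^{k\lambda}\, \frac{(k\lambda)^{m+1-k}}{(m+1-k)!} - \sum_{k=1}^{m} (-1)^{m-k} e^{k\lambda}\, \frac{(k\lambda)^{m-k}}{(m-k)!} \right] \\
&= (1 - \lambda) \left[ e^{(m+1)\lambda}\, g(-\lambda e^{-\lambda}; m+1) - e^{m\lambda}\, g(-\lambda e^{-\lambda}; m) \right]
\end{aligned} \tag{14.40}$$
where
$$g(x; m) = \sum_{k=0}^{m} \frac{x^k}{k!}\, (m - k)^k \tag{14.41}$$
Due to the nature of the differences, we immediately find the cumulative
distribution,
$$\sum_{m=1}^{K} q[m] = (1 - \lambda) \left[ e^{(K+1)\lambda}\, g(-\lambda e^{-\lambda}; K+1) - e^{\lambda} \right]$$
and since $q[0] = (1 - \lambda)\, e^{\lambda}$, we arrive at
$$\Pr[\mathcal{Q} \leq K] = \sum_{m=0}^{K} q[m] = (1 - \lambda)\, e^{(K+1)\lambda}\, g(-\lambda e^{-\lambda}; K+1) \tag{14.42}$$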
The expressions in (14.40) are numerically only useful for small $m$ because
the series is alternating. This problem may be solved by considering a
famous result due to Lagrange (Markushevich, 1985, Vol. 2, Chapter 3,
Section 14)
$$e^{bz} = 1 + b \sum_{n=1}^{\infty} \frac{(b + na)^{n-1}}{n!} \left(z e^{-az}\right)^n \tag{14.43}$$
that converges for $|z| < a^{-1}$. Differentiation of (14.43) with respect to
$w = z e^{-az}$ leads to
$$\frac{e^{(b+a)z}}{1 - az} = \sum_{n=0}^{\infty} \frac{(b + a + na)^n}{n!}\, z^n e^{-naz}$$
because $\frac{d}{dw} e^{bz} = \frac{d}{dz} e^{bz}\, \frac{dz}{dw} = \frac{b\, e^{bz}}{(1 - az)\, e^{-az}}$. Choosing $a = 1$, $b + a = -m$ and
$z = \lambda$, we obtain
$$\frac{e^{-m\lambda}}{1 - \lambda} = \sum_{n=0}^{\infty} \frac{(m - n)^n}{n!} \left(-\lambda e^{-\lambda}\right)^n = g(-\lambda e^{-\lambda}; m) + \sum_{n=m+1}^{\infty} \frac{(n - m)^n}{n!} \left(\lambda e^{-\lambda}\right)^n$$
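and thus
$$g(-\lambda e^{-\lambda}; m) = \frac{e^{-m\lambda}}{1 - \lambda} - \sum_{n=m+1}^{\infty} \frac{(n - m)^n}{n!} \left(\lambda e^{-\lambda}\right)^n$$
where the infinite series consists of merely positive terms. Substituted in
(14.42), this finally yields
$$\Pr[\mathcal{Q} > K] = (1 - \lambda)\, \lambda^{K+1} \sum_{n=1}^{\infty} \frac{n^{n+K+1}}{(n + K + 1)!} \left(\lambda e^{-\lambda}\right)^n \tag{14.44}$$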
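In the heavy traffic limit $\lambda = \rho \to 1$, the dominant zero $\zeta$ (5.35) of $1 -
z\, e^{\lambda(1-z)}$ is approximately equal to $\zeta \approx 1 + 2\,\frac{1-\rho}{\rho}$ and the resulting tail
asymptotic (5.31) for the buffer occupancy pdf is
$$\Pr[\mathcal{Q} > K] \approx \frac{1 - \rho}{\rho\zeta - 1}\, \zeta^{-K-1} \approx e^{-K \log \zeta} \approx e^{-2K \frac{1-\rho}{\rho}} \tag{14.45}$$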
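The difference between the alternating form (14.39) and the positive-term series (14.44) can be explored numerically. The sketch below implements both under the assumption of unit service time (so that $\rho = \lambda$). The function names, the number of series terms and the log-domain evaluation are choices made here for illustration.

```python
import math


def md1_queue_pmf(lam, m):
    """q[m] = Pr[Q = m] from the alternating sum (14.39); use for small m only."""
    total = 0.0
    for k in range(1, m + 2):
        e1 = m + 1 - k
        term = (k * lam) ** e1 / math.factorial(e1)
        if m - k >= 0:  # the second term vanishes for k = m + 1
            term += (k * lam) ** (m - k) / math.factorial(m - k)
        total += (-1) ** e1 * math.exp(k * lam) * term
    return (1.0 - lam) * total


def md1_overflow(lam, K, terms=400):
    """Pr[Q > K] from the positive series (14.44), truncated after `terms` terms.

    Each term is formed in the log domain to avoid overflow of n**(n+K+1).
    The series converges slowly when lam is close to 1, so `terms` may
    have to be increased in heavy traffic.
    """
    x = lam * math.exp(-lam)
    total = 0.0
    for n in range(1, terms + 1):
        log_term = (n + K + 1) * math.log(n) - math.lgamma(n + K + 2) + n * math.log(x)
        total += math.exp(log_term)
    return (1.0 - lam) * lam ** (K + 1) * total
```

For small $K$ both routes should agree, for example by comparing `1 - sum(md1_queue_pmf(lam, m) for m in range(K + 1))` with `md1_overflow(lam, K)`. For large $K$ only the second function remains numerically reliable.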
14.6 The N*D/D/1 queue
The N*D/D/1 queue is a basic model for constant bit rate sources in ATM,
as shown in Fig. 14.5. The input process consists of a superposition of $N$
independent periodic sources with the same period $D$ but randomly phased,
i.e. arbitrarily shifted in time with respect to each other. The server operates
deterministically and serves one ATM cell per timeslot. The buffer size is
assumed to be infinite, mainly to enable an exact analytic solution.
During a time period $D$ measured in time slots or server time units, precisely
$N$ cells arrive such that the traffic intensity or load (13.2) equals $\rho = \frac{N}{D}$.
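[Figure 14.5: input lines 1, 2, 3, ..., N feeding a buffer (labeled K) that is emptied by a deterministic server (labeled D).]
Fig. 14.5. Sketch of an ATM concentrator where $N$ input lines are multiplexed onto
a single output line. The N*D/D/1 queue models this ATM basic switching unit
accurately.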
Whereas the arrivals in the M/D/1 queue are uncorrelated, the successive
interarrival times in the N*D/D/1 queue are negatively correlated. For the
same average arrival rate, this more regular arrival process results in shorter
queues than in the M/D/1 queue, where the higher variability in the arrival
process causes longer queues.
Due to the dependence of the arrivals over many timeslots, the solution method is based on the Beneš approach and starts from the complementary distribution (13.15) for the virtual waiting time or unfinished
work in steady-state, $\Pr[v(t_\infty) > x] = \lim_{t\to\infty} \Pr[v(t) > x]$. Applied to the
N*D/D/1 queue the unfinished work equals the number of ATM cells in the
system, thus $\Pr[v(t_\infty) > x] = \Pr[N_S > x]$. Hence, in the steady-state for
$\rho < 1$ or $N < D$,
$$\Pr[N_S > x] = \sum_{k=\lceil x \rceil}^{\infty} \Pr\left[ v(t_\infty + x - k) = 0 \,\middle|\, \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k \right] \times \Pr\left[ \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k \right]$$
The periodic cell trains with period equal to $D$ timeslots at each input line
lead to a periodic aggregated arrival stream of the $N$ input lines also with
period $D$. Each cell train transports precisely one cell per period $D$, which
allows us to observe the characteristics of the aggregated arrival process
during the time interval $[0, D)$. The computations are most conveniently
performed if we choose the steady-state observation point $t_\infty = D$. Each of
the $N$ ATM cells arrives uniformly in $[0, D]$ due to the random phasing of
each cell train and the probability that it arrives in $[D + x - k, D]$ is $p = \frac{k - x}{D}$.
Hence, the number of arrivals in $[D + x - k, D]$ is a sum of Bernoulli random
variables, which is binomially distributed,
$$\Pr\left[ \int_{D + x - k}^{D} N_A(u)\, du = k \right] = \binom{N}{k} \left( \frac{k - x}{D} \right)^k \left( 1 - \frac{k - x}{D} \right)^{N-k}$$
The conditional probability is obtained as follows. The unfinished work
at time $D + x - k$ only depends on past arrivals in the interval $[0, D + x - k]$.
Given that the number of arrivals in $[D + x - k, D]$ equals $k$ while there
are always precisely $N$ in $[0, D]$, the number of arrivals in $[0, D + x - k]$
equals $N - k$ and the corresponding traffic intensity $\rho' = \frac{N - k}{D + x - k} < 1$ since
$N < D$ and, thus, $N - k < D - k$ for any $k$. From Section 13.3.2, we use the
local stationary result: for any stationary single-server queueing system with
traffic intensity $\rho$, the probability of an empty system at an arbitrary time
is $1 - \rho$. If we take a random point in $t \in [0, D + x - k]$, then stationarity
implies that $\Pr[v(t) = 0] = 1 - \rho' = \frac{D + x - N}{D + x - k}$. Since $\rho' < 1$, the system
is necessarily empty at some instant $t^*$ in $[0, D + x - k)$. As explained in
Section 13.2 and Section 13.3, we may consider that the system restarts
from $t^*$ on ignoring the past. But the probability $\Pr[v(t) = 0]$ at a random
time in the interval $[0, t^*]$ and in $[t^*, D + x - k]$ is the same, which means
that $\Pr[v(D + x - k) = 0] = \frac{D + x - N}{D + x - k}$ because we can periodically repeat
the system's process in $[0, t^*]$ while omitting any activity in $[t^*, D + x - k]$.
With respect to this newly constructed periodic arrival pattern, the point
$t = D + x - k$ is arbitrary such that the local stationary result is applicable.
In summary, we arrive at the overflow probability in a N*D/D/1 system,
$$\Pr[N_S > x] = \sum_{k=\lceil x \rceil}^{N} \frac{D + x - N}{D + x - k} \binom{N}{k} \left( \frac{k - x}{D} \right)^k \left( 1 - \frac{k - x}{D} \right)^{N-k} \tag{14.46}$$
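Observe that $\Pr[N_S > N] = 0$. Rewriting (14.46) yields
$$\begin{aligned}
\Pr[N_S > x] &= \frac{D + x - N}{D^N} \sum_{k=\lceil x \rceil}^{N} \binom{N}{k} (k - x)^k (D + x - k)^{N-k-1} \\
&= \frac{D + x - N}{D^N} \sum_{j=0}^{N - \lceil x \rceil} \binom{N}{j} (N - j - x)^{N-j} (D + x - N + j)^{j-1} \\
&= \frac{D + x - N}{D^N} \sum_{j=0}^{N} \binom{N}{j} (D + x - N + j)^{j-1} (N - x - j)^{N-j} \\
&\quad - \frac{D + x - N}{D^N} \sum_{j=N - \lceil x \rceil + 1}^{N} \binom{N}{j} (D + x - N + j)^{j-1} (N - j - x)^{N-j}
\end{aligned}$$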
Applying Abel's identity (Comtet, 1974, p. 128), valid for all $u, y, z$,
$$(u + y)^n = u \sum_{k=0}^{n} \binom{n}{k} (u - kz)^{k-1} (y + kz)^{n-k} \tag{14.47}$$
with $n = N$, $u = D + x - N$, $y = N - x$ and $z = -1$ gives
$$\Pr[N_S \leq x] = \frac{D + x - N}{D^N} \sum_{j=N - \lceil x \rceil + 1}^{N} \binom{N}{j} (D + x - N + j)^{j-1} (N - j - x)^{N-j} \tag{14.48}$$
demonstrating that, indeed, $\Pr[N_S \leq 0] = 0$. For small $x$, relation (14.48)
is convenient, while (14.46) is more suited for large $x \to N$. For example,
$\Pr[N_S \leq 1] = \frac{D + 1 - N}{D + 1} \left( 1 + \frac{1}{D} \right)^N$, while $\Pr[N_S > N - 1] = \left( \frac{1}{D} \right)^N$.
In the heavy traffic regime for $\rho = \frac{N}{D} \to 1$, a Brownian motion approximation (Roberts, 1991) is
$$\Pr[N_S > x] \simeq e^{-2x \left( \frac{x}{N} + \frac{1 - \rho}{\rho} \right)} \tag{14.49}$$
Figure 14.6 compares the exact (14.46) overflow probability and the Brownian approximation (14.49) for $\rho = 0.95$. Observe from (14.45) that
$$\Pr[N_S > x] \simeq e^{-\frac{2x^2}{N}} \Pr\left[N_{M/D/1} > x\right]$$
which shows that, for sufficiently high $N$, the overflow probability of the
N*D/D/1 queue tends to that of the M/D/1 queue. Thus, an arrival process
consisting of a superposition of a large number of periodic processes tends to
a Poisson arrival process. The decaying factor $e^{-\frac{2x^2}{N}}$ reflects the effect of the
negative correlations in the arrival process and shows that a Poisson process
overestimates the tail probability in heavy traffic. Comparison of (14.46)
and (14.44) for lower loads $\rho = \frac{N}{D}$ illustrates that the Poisson approximation
becomes more accurate.
[Figure 14.6: logarithmic plot of Pr[N_S > x] (from 10^0 down to 10^-15) versus x (0 to 100) at ρ = 0.95, showing the exact result and the Brownian approximation for N = 50, 100, 200, 500, 1000 and 5000, together with the M/D/1 curve.]
Fig. 14.6. The overflow probability $\Pr[N_S > x]$ in the N*D/D/1 queue for $\rho = 0.95$
and various numbers of sources $N$.
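The exact overflow probability (14.46) and the Brownian approximation (14.49) are straightforward to evaluate. The sketch below is one possible implementation; the function names are assumptions, and the binomial term is formed in the log domain (a choice made here) so that large $N$ does not overflow.

```python
import math


def ndd1_overflow(N, D, x):
    """Exact Pr[N_S > x] of the N*D/D/1 queue from (14.46), for N < D and x >= 0."""
    if x >= N:
        return 0.0
    total = 0.0
    for k in range(math.ceil(x), N + 1):
        p = (k - x) / D
        if p <= 0.0:
            # Binomial term with success probability 0: only k = 0 contributes.
            binom = 1.0 if k == 0 else 0.0
        else:
            log_b = (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
                     + k * math.log(p) + (N - k) * math.log1p(-p))
            binom = math.exp(log_b)
        total += (D + x - N) / (D + x - k) * binom
    return total


def ndd1_overflow_brownian(N, D, x):
    """Heavy-traffic Brownian approximation (14.49)."""
    rho = N / D
    return math.exp(-2.0 * x * (x / N + (1.0 - rho) / rho))
```

The two closed-form examples quoted after (14.48), $\Pr[N_S > N-1] = D^{-N}$ and $\Pr[N_S \leq 1] = \frac{D+1-N}{D+1}(1 + \frac{1}{D})^N$, give convenient checks of the exact routine.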
14.7 The AMS queue
Many arrival patterns in telecommunication exhibit active or “on” periods
succeeded by silent, inactive or “off” periods. At the burst or flow level,
phenomena on the time scale of an on-off period are dominant and the finer
details of the individual packet arrivals within an on-period can be ignored.
The stream of packets can be regarded as a continuous fluid characterized
by the flow arrival rate.
The AMS queue is perhaps the simplest exactly solvable queueing model
that describes the queueing behavior at the burst or flow level. The AMS
queue, named after Anick, Mitra and Sondhi (Anick et al., 1982; Mitra, 1988),
considers $N$ homogeneous, independent on-off sources in a continuous fluid
flow approach. For each source, both the on- and off-period are exponentially
distributed, which makes the model Markovian by Theorem 10.2.3. In the
on-period each source emits a unit amount of information. Hence, at each
moment in time when $r$ sources are in the on-period, $r$ packets (units of
information) arrive at the buffer. The service rate is constant and equal to
$c < N$ packets per unit time. If $c > N$, then the buffer is always empty. The
time unit is chosen as the average time of an on-period while the average
time of an off-period is denoted by $\frac{1}{\lambda}$. The buffer size is infinite. The
traffic intensity then equals $\rho = \frac{N\lambda}{c(1 + \lambda)}$ and stability requires that $\rho < 1$.
The ratio $\frac{\lambda}{1 + \lambda}$ is the long term “on” time fraction of the sources.
Suppose the number of on-sources at time $t$ is $i$. During the next time interval $\Delta t$ only two elementary actions can take place: a new source can start with probability $(N-i)\lambda\,\Delta t$ or a source can turn off with probability $i\,\Delta t$. Compound events have probabilities $O(\Delta t^2)$. The probability of no change in the arrival process is $1 - [(N-i)\lambda + i]\Delta t$, during which $i$ sources are active and the queue empties at rate $c - i$. The AMS queueing process is a birth-death process where state $i$ describes the number of on-sources and where the birth rate is $\lambda_i = (N-i)\lambda$ and the death rate is $\mu_i = i$. Let $P_i(t,x)$, where $0 \le i \le N$, $t \ge 0$, $x \ge 0$, be the probability that at time $t$, $i$ sources are on and the buffer content does not exceed $x$. Then, we have
\[
P_i(t+\Delta t, x) = [N-(i-1)]\lambda\,\Delta t\,P_{i-1}(t,x) + (i+1)\,\Delta t\,P_{i+1}(t,x) + \left[1 - \{(N-i)\lambda + i\}\Delta t\right] P_i\big(t, x-(i-c)\Delta t\big) + O(\Delta t^2)
\]
Passing to the limit $\Delta t \to 0$ yields
\[
\frac{\partial P_i(t,x)}{\partial t} + (i-c)\frac{\partial P_i(t,x)}{\partial x} = (N-i+1)\lambda P_{i-1}(t,x) - [(N-i)\lambda + i]P_i(t,x) + (i+1)P_{i+1}(t,x)
\]
The time-independent equilibrium probabilities, $\pi_i(x) = \lim_{t\to\infty} P_i(t,x)$, reflect the steady-state where $i$ sources are on and the buffer content does not exceed $x$. Setting $\frac{\partial P_i(t,x)}{\partial t} = 0$, the steady-state equations become for $0 \le i \le N$
\[
(i-c)\frac{d\pi_i(x)}{dx} = (N-i+1)\lambda\,\pi_{i-1}(x) - [(N-i)\lambda + i]\,\pi_i(x) + (i+1)\,\pi_{i+1}(x) \tag{14.50}
\]
In matrix notation, where $\pi(x)$ is a column vector as opposed to Markov theory where $\pi(x)$ is a row vector,
\[
D\frac{d\pi(x)}{dx} = Q\pi(x) \tag{14.51}
\]
and $D = \mathrm{diag}[-c, 1-c, 2-c, \ldots, N-c]$ and $Q$ is a tri-diagonal $(N+1)\times(N+1)$ matrix,
\[
Q = \begin{bmatrix}
-N\lambda & 1 & 0 & 0 & \cdots & 0 & 0\\
N\lambda & -[(N-1)\lambda+1] & 2 & 0 & \cdots & 0 & 0\\
0 & (N-1)\lambda & -[(N-2)\lambda+2] & 3 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & -[\lambda+(N-1)] & N\\
0 & 0 & 0 & 0 & \cdots & \lambda & -N
\end{bmatrix}
\]
The buffer overflow probability is $\Pr[N_S > x] = 1 - \|\pi(x)\|_1 = 1 - \sum_{j=0}^{N}\pi_j(x)$, which implies that $\|\pi(\infty)\|_1 = 1$. Moreover, $\lim_{x\to\infty}\frac{d\pi(x)}{dx} = 0$ and $Q\pi(\infty) = 0$ corresponds to the steady-state of the continuous Markov chain (arrival process and service process as a whole). Furthermore,
\[
\pi_i(\infty) = \binom{N}{i}\frac{\lambda^i}{(1+\lambda)^N} \tag{14.52}
\]
is the probability that $i$ out of $N$ sources are on simultaneously irrespective of what the buffer level in the system is.
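The binomial law (14.52) can be checked against the matrix $Q$ of (14.51) in exact rational arithmetic. The following is a minimal sketch; the function names and the values $N = 10$, $\lambda = 1/2$, $c = 9/2$ are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction
from math import comb

def ams_generator(N, lam):
    """Tri-diagonal matrix Q of (14.51), acting on column vectors."""
    Q = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        Q[i][i] = -((N - i) * lam + i)        # total rate out of state i
        if i > 0:
            Q[i][i - 1] = (N - i + 1) * lam   # a source switches on: i-1 -> i
        if i < N:
            Q[i][i + 1] = Fraction(i + 1)     # a source switches off: i+1 -> i
    return Q

def on_source_distribution(N, lam):
    """Binomial law (14.52) for the number of simultaneously active sources."""
    return [comb(N, i) * lam**i / (1 + lam)**N for i in range(N + 1)]

# Illustrative parameter values only (assumed, not from the text).
N, lam, c = 10, Fraction(1, 2), Fraction(9, 2)
Q = ams_generator(N, lam)
pi_inf = on_source_distribution(N, lam)

# Each row of Q applied to pi(infinity) should vanish exactly.
residual = [sum(Q[i][j] * pi_inf[j] for j in range(N + 1)) for i in range(N + 1)]
rho = N * lam / (c * (1 + lam))               # traffic intensity
print(residual == [0] * (N + 1), sum(pi_inf), rho)
```

Using `Fraction` avoids rounding, so the check $Q\pi(\infty) = 0$ holds exactly rather than up to a tolerance.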
As shown in Section 10.2.2, besides $\pi(x) = e^{D^{-1}Qx}\pi(0)$, the solution of (14.51) can be expressed in terms of the eigenvalues $\xi_j$, the corresponding right-eigenvector $x_j$ and left-eigenvector $y_j$ of $D^{-1}Q$,
\[
\pi(x) = \sum_{j=0}^{N} e^{\xi_j x}\left(x_j y_j^T\right)\pi(0)
\]
where, as shown in Appendix A.5.2.2, the eigenvalues are labeled in increasing order $\xi_{N-[c]-1} < \cdots < \xi_1 < \xi_0 < \xi_N = 0 < \xi_{N-1} < \cdots < \xi_{N-[c]}$. This way of writing distinguishes between underload and overload eigenvalues. Only bounded solutions are allowed. As shown in Appendix A.5.2.2, there are precisely $N-[c]$ negative real eigenvalues $\xi_j$, namely those with $j \in [0, N-[c]-1]$. In addition, $j = N$ corresponds to the eigenvalue $\xi_N = 0$ and the eigenvector $\pi(\infty)$. The general bounded solution of (14.51) is
\[
\pi(x) = \pi(\infty) + \sum_{j=0}^{N-[c]-1} a_j e^{\xi_j x} x_j \tag{14.53}
\]
where the scalar coefficients $a_j = y_j^T\pi(0)$ still need to be determined. Rather than determining $y_j^T\pi(0)$ as in Appendix A.5.2.2, a more elegant and physical method is used. The eigenvalue solution in Appendix A.5.2.2 has scaled the eigenvectors by setting the $N$-th component equal to 1, hence, $(x_j)_N = 1$. Writing the $N$-th component in (14.53) gives with (14.52)
\[
\pi_N(x) = \frac{\lambda^N}{(1+\lambda)^N} + \sum_{j=0}^{N-[c]-1} a_j e^{\xi_j x} \tag{14.54}
\]
The most convenient choice of $x$ is $x = 0$. If the number of on-sources $j$ at any time exceeds the service rate $c$, then the buffer builds up and cannot be empty,
\[
\pi_j(0) = 0 \qquad \text{for } [c]+1 \le j \le N
\]
This observation provides one equation in (14.54) for the coefficients $a_k$,
\[
\sum_{j=0}^{N-[c]-1} a_j = -\frac{\lambda^N}{(1+\lambda)^N}
\]
and shows that $N-[c]-1$ additional equations are needed to determine all coefficients $a_j$. By differentiating (14.54) $m$-times and evaluating at $x = 0$, we find these additional equations
\[
\left.\frac{d^m\pi_N(x)}{dx^m}\right|_{x=0} = \sum_{j=0}^{N-[c]-1} a_j\,\xi_j^m
\]
which will be determined with the help of the differential equation (14.51). Indeed, for $m = 1$, the differential equation (14.51) gives
\[
\left.\frac{d\pi(x)}{dx}\right|_{x=0} = D^{-1}Q\,\pi(0)
\]
The important observation is that the effect of multiplication by $D^{-1}Q$ decreases the number of zero components in $\pi(0)$ by 1, i.e. $\left.\frac{d\pi_j(x)}{dx}\right|_{x=0} = 0$ for $[c]+2 \le j \le N$. Any additional multiplication by $D^{-1}Q$ has the same effect. Since $\frac{d^m\pi(x)}{dx^m} = \left(D^{-1}Q\right)^m\pi(x)$, we thus find, for $0 \le m \le N-[c]-1$, that
\[
\left.\frac{d^m\pi_N(x)}{dx^m}\right|_{x=0} = 0
\]
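We write these $N-[c]$ equations in the unknowns $a_k$ in matrix form,
\[
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1\\
\xi_0 & \xi_1 & \xi_2 & \cdots & \xi_{N-[c]-1}\\
\xi_0^2 & \xi_1^2 & \xi_2^2 & \cdots & \xi_{N-[c]-1}^2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\xi_0^{N-[c]-2} & \xi_1^{N-[c]-2} & \xi_2^{N-[c]-2} & \cdots & \xi_{N-[c]-1}^{N-[c]-2}\\
\xi_0^{N-[c]-1} & \xi_1^{N-[c]-1} & \xi_2^{N-[c]-1} & \cdots & \xi_{N-[c]-1}^{N-[c]-1}
\end{bmatrix}
\begin{bmatrix}
a_0\\ a_1\\ a_2\\ \vdots\\ a_{N-[c]-2}\\ a_{N-[c]-1}
\end{bmatrix}
=
\begin{bmatrix}
-\frac{\lambda^N}{(1+\lambda)^N}\\ 0\\ 0\\ \vdots\\ 0\\ 0
\end{bmatrix}
\]
and recognize the matrix, denoted by $V$, as a Vandermonde matrix (Section A.1, art. 5) with
\[
\det(V) = \prod_{i=0}^{N-[c]-1}\ \prod_{j=i+1}^{N-[c]-1} (\xi_j - \xi_i)
\]
Since all eigenvalues appearing in the Vandermonde matrix are distinct (Appendix A.5.2.2), $\det(V) \ne 0$ and a unique solution follows for all $0 \le j \le N-[c]-1$ from Cramer's theorem as
\[
a_j = -\left(\frac{\lambda}{1+\lambda}\right)^N \prod_{i=0;\,i\ne j}^{N-[c]-1} \frac{\xi_i}{\xi_i - \xi_j} \tag{14.55}
\]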
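The product form (14.55) can be sanity-checked by substituting it back into the Vandermonde system. The sketch below does this in exact arithmetic; the four "eigenvalues" are hypothetical distinct negative numbers chosen for illustration and are not computed from an AMS queue, and the function name is likewise an assumption.

```python
from fractions import Fraction

def vandermonde_coefficients(xi, b):
    """Product form of (14.55): solves sum_j a_j * xi_j**m = b if m == 0 else 0."""
    coeffs = []
    for j, xj in enumerate(xi):
        prod = Fraction(1)
        for i, xv in enumerate(xi):
            if i != j:
                prod *= xv / (xv - xj)
        coeffs.append(b * prod)
    return coeffs

# Hypothetical distinct negative values standing in for the eigenvalues.
xi = [Fraction(-1, 2), Fraction(-3, 2), Fraction(-4), Fraction(-7)]
lam, N = Fraction(1, 2), 10
b = -(lam / (1 + lam)) ** N            # right-hand side of the first equation
a = vandermonde_coefficients(xi, b)

# Row m of the Vandermonde system: sum_j a_j * xi_j**m
rows = [sum(aj * xj**m for aj, xj in zip(a, xi)) for m in range(len(xi))]
print(rows[0] == b, all(r == 0 for r in rows[1:]))
```

The first row reproduces the right-hand side and all higher rows vanish, which is the Lagrange interpolation identity behind Cramer's rule for this particular right-hand side.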
Together with the exact determination of the eigenvalues $\xi_j$ and corresponding right-eigenvector $x_j$ explicitly given in Appendix A.5.2.2, the coefficients $a_j$ completely solve the AMS queue.
The buffer overflow probability $\Pr[N_S > x] = 1 - \sum_{j=0}^{N}\pi_j(x)$ becomes, with $\sum_{j=0}^{N}\pi_j(\infty) = 1$,
\[
\Pr[N_S > x] = -\sum_{j=0}^{N-[c]-1} a_j e^{\xi_j x}\sum_{l=0}^{N}(x_j)_l
\]
Using the explicit form of the generating function (A.44) where the roots $r_1$ and $r_2$ belonging to eigenvalue $\xi_k$ are specified in (A.42) and the residue $k = k_j = c_1$ in (A.43), the buffer overflow probability is
\[
\Pr[N_S > x] = -\sum_{j=0}^{N-[c]-1} a_j e^{\xi_j x}(1-r_1)^{k_j}(1-r_2)^{N-k_j} \tag{14.56}
\]
For large $x$, $\Pr[N_S > x]$ will be dominated by the exponential with the largest negative eigenvalue $\xi_0$ (for which $k_0 = N$),
\[
\Pr[N_S > x] \approx -a_0 e^{\xi_0 x}\sum_{j=0}^{N}(x_N)_j
\]
Writing that largest negative eigenvalue (A.47) in terms of the traffic intensity $\rho$ gives
\[
\xi_0 = -\frac{(1+\lambda)(1-\rho)}{1-\frac{c}{N}}
\]
From (A.49), we have $\sum_{j=0}^{N}(x_N)_j = \left(\frac{N}{c}\right)^N$. Combined with (14.55), the asymptotic formula for the buffer overflow probability becomes
\[
\Pr[N_S > x] \approx \rho^N e^{\xi_0 x}\prod_{i=1}^{N-[c]-1}\frac{\xi_i}{\xi_i - \xi_0} \tag{14.57}
\]
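That $\xi_0$ is indeed an eigenvalue of $D^{-1}Q$ can be confirmed without any eigenvalue solver: $\xi$ is an eigenvalue exactly when $\det(Q - \xi D) = 0$, and for a tri-diagonal matrix this determinant follows from a three-term recursion. The sketch below is illustrative only; the helper names and the values $N = 10$, $\lambda = 1/2$, $c = 9/2$ are assumptions.

```python
from fractions import Fraction

def tridiagonal_det(diag, upper, lower):
    """Determinant of a tri-diagonal matrix via the three-term recursion."""
    prev2, prev1 = Fraction(1), diag[0]
    for k in range(1, len(diag)):
        prev2, prev1 = prev1, diag[k] * prev1 - upper[k - 1] * lower[k - 1] * prev2
    return prev1

def char_det(N, lam, c, xi):
    """det(Q - xi*D) for the AMS matrices Q of (14.51) and D = diag[i - c]."""
    diag = [-((N - i) * lam + i) - xi * (i - c) for i in range(N + 1)]
    upper = [Fraction(i + 1) for i in range(N)]   # Q[i][i+1] = i + 1
    lower = [(N - i) * lam for i in range(N)]     # Q[i+1][i] = (N - i) * lam
    return tridiagonal_det(diag, upper, lower)

# Illustrative parameter values only (assumed, not from the text).
N, lam, c = 10, Fraction(1, 2), Fraction(9, 2)
rho = N * lam / (c * (1 + lam))
xi0 = -(1 + lam) * (1 - rho) / (1 - c / N)        # dominant eigenvalue formula

print(xi0, char_det(N, lam, c, xi0) == 0, char_det(N, lam, c, Fraction(0)) == 0)
```

The second check uses $\xi = 0$, which must also give a zero determinant because the columns of $Q$ sum to zero.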
[Figure 14.7: $\Pr[N_S > x]$ on a logarithmic scale (from $10^{0}$ down to $10^{-15}$) versus $x$ (0 to 20) for $\rho$ = 0.5, 0.7 and 0.9, each with $N = 40$ and $N = 100$.]
Fig. 14.7. The overflow probability (14.56) in the AMS queue versus the buffer level $x$ for fixed $\lambda = \frac{1}{2}$. For each traffic intensity $\rho$ = 0.5, 0.7 and 0.9, the upper curve corresponds to $N = 40$ and the lower to $N = 100$. The asymptotic formula (14.57) is shown as a dotted line.
Figure 14.7 shows both the exact (14.56) and asymptotic (14.57) overflow probability as a function of $x$ for various traffic intensities and two sizes of $N$. The average off-period in Fig. 14.7 is two times the average on-period. For large values of $x$ and large traffic intensities $\rho$, the asymptotic formula is adequate. For smaller values, clear differences are observed. As mentioned in Section 5.7, the asymptotic regime that nearly coincides with (14.57) refers to the burst scale phenomena while the non-asymptotic regime reflects the smaller scale variations. The AMS queue allows us to analyze the effect of the burstiness of a source by varying $\lambda$.
14.8 The cell loss ratio
Due to its importance in ATM and in future time-critical communication services, the QoS loss-performance measure, the cell loss ratio, deserves some attention. In designing a switch for time-critical services with strict delay requirements smaller than $D_T$, the buffer size $K$ is dimensioned as follows. The order of magnitude of $D_T$ is about 10 ms, the maximum end-to-end delay for high-quality telephony (world wide) advised in ITU-T standards. Let $E[H]$ be the average number of hops of a path in an ATM network that rarely exceeds 10 hops. The buffer size $K$ is determined such that the maximum waiting time of a cell never exceeds $\frac{D_T}{E[H]}$, thus, $K \le \frac{\mu D_T}{10}$. For example, for STM-1 links where $C = 155$ Mb/s, we have that $\mu \simeq 366\,800$ ATM cell/s such that $K \le \frac{\mu D_T}{10} \simeq 366$ ATM cell buffer positions. This first-order estimate shows that the ATM buffer for time-critical traffic consists of a few hundred ATM cell positions. That small number for $K$ indeed assures that the delay constraints are met, but introduces the probability of losing cells. Hence, the QoS parameter to be controlled for time-critical services is the cell loss ratio.
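The first-order estimate above is plain arithmetic and can be reproduced as follows. This sketch assumes the full STM-1 line rate of 155.52 Mb/s and 53-byte ATM cells with no further overhead; the variable names are illustrative.

```python
ATM_CELL_BITS = 53 * 8        # an ATM cell is 53 bytes long
link_rate = 155.52e6          # assumed STM-1 line rate in bit/s
delay_budget = 10e-3          # end-to-end delay target D_T in seconds
max_hops = 10                 # bound on the hopcount used in the text

service_rate = link_rate / ATM_CELL_BITS                  # cells per second
buffer_size = int(service_rate * delay_budget / max_hops)  # whole cell positions
print(round(service_rate), buffer_size)
```

With these assumptions the service rate comes out close to the 366 800 cell/s quoted in the text, and the buffer holds a few hundred cells.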
The cell loss ratio clr is defined as the ratio of the long-time average number of lost cells because of buffer overflow to the long-time average number of cells that arrive in steady-state. There are typically two different views to describe the cell loss ratio: a conservation-based and a combinatorial one. The conservation law simply states that cells entering the system also must leave it. The average number of entering cells equals the number offered per time slot minus the number that have been rejected, thus $\lambda(1-\mathrm{clr})$. On the other side, the average number of cells that leave the system is related to the server activity as $\mu(1-q[0])$, where $\mu$ is the service rate and $q[j] = \Pr[N_Q = j]$. Hence, we have
\[
\lambda(1-\mathrm{clr}) = \mu(1-q[0]) \tag{14.58}
\]
In the combinatorial view, only the arrival process is viewed from a position in the buffer and the number of ways in which cells are lost is counted, leading to
\[
\mathrm{clr} = \frac{1}{A'(1)}\sum_{n=0}^{\infty} n \sum_{j=0}^{K} q[K-j]\,a[j+n] \tag{14.59}
\]
with $A'(1) = \lambda$ and $a[j] = \Pr[A = j]$. Although equation (14.58) is simple, its practical use is limited since the quantities involved are to be known with extremely high accuracy if clr is of the order of $10^{-10}$, which in practice means a virtually loss-free service. Therefore, we confine ourselves to the combinatorial result and express (14.59) in terms of a generating function as
\[
\mathrm{clr}\,A'(1) = \left.\frac{dV(z)}{dz}\right|_{z=1} \tag{14.60}
\]
where
\[
V(z) = \sum_{n=0}^{\infty} z^n \sum_{j=0}^{K} q[K-j]\,a[j+n]
= \sum_{n=0}^{\infty} z^n \sum_{j=0}^{K} q[K-j]\,z^{-j}\,a[j+n]\,z^{j}
= \sum_{j=0}^{K} q[K-j]\,z^{-j}\left(\sum_{n=0}^{\infty} a[j+n]\,z^{j+n}\right)
\]
Rearranging in terms of the generating function for the arrivals $A(z)$ and for the buffer occupancy $Q(z) = \sum_{j=0}^{K} q[j]z^j$, where $q[j] = 0$ for $j > K$, yields
\[
\begin{aligned}
V(z) &= \sum_{j=0}^{K} q[K-j]\,z^{-j}\left(A(z) - \sum_{n=0}^{j-1} a[n]z^n\right)\\
&= A(z)\,z^{-K}\sum_{j=0}^{K} q[K-j]\,z^{K-j} - \sum_{j=0}^{K} q[K-j]\,z^{-j}\sum_{n=0}^{j-1} a[n]z^n\\
&= z^{-K}A(z)Q(z) - z^{-K}\sum_{j=0}^{K} q[j]\,z^{j}\sum_{n=0}^{K-j-1} a[n]z^n
\end{aligned}
\tag{14.61}
\]
In order to express the cell loss ratio entirely in terms of the generating functions $A(z)$ and $Q(z)$, we employ (2.20),
\[
\begin{aligned}
\sum_{j=0}^{n} y[j]z^j &= \sum_{j=0}^{n} z^j\,\frac{1}{2\pi i}\int_{C(0)}\frac{Y(\omega)}{\omega^{j+1}}\,d\omega
= \frac{1}{2\pi i}\int_{C}\frac{Y(\omega)}{\omega - z}\left[1 - \left(\frac{z}{\omega}\right)^{n+1}\right]d\omega\\
&= Y(z) - \frac{1}{2\pi i}\int_{C}\frac{Y(\omega)}{\omega - z}\left(\frac{z}{\omega}\right)^{n+1}d\omega
\end{aligned}
\tag{14.62}
\]
where $C$ is a contour enclosing the origin and the point $z$ and lying within the convergence region of $Y(z)$. Combining (14.61) and (14.62), we rewrite $V(z)$ as
\[
\begin{aligned}
V(z) &= z^{-K}A(z)Q(z) - z^{-K}Q(z)A(z) + \frac{1}{2\pi i}\int_{C}\frac{A(\omega)Q(\omega)}{(\omega - z)\,\omega^{K}}\,d\omega\\
&= \frac{1}{2\pi i}\int_{C}\frac{A(\omega)Q(\omega)}{(\omega - z)\,\omega^{K}}\,d\omega
\end{aligned}
\]
Finally, our expression for the cell loss ratio in a GI/G/1/K system reads
\[
\mathrm{clr} = \frac{1}{2\pi i\,A'(1)}\int_{C}\frac{A(\omega)Q(\omega)}{(\omega - 1)^2\,\omega^{K}}\,d\omega \tag{14.63}
\]
where the contour $C$ encloses both the origin and the point $z = 1$ and lies in the convergence region of $A(z)$. Usually, $A(z)$ is known while $Q(z)$ proves to be more complicated to obtain. The product $Q(z)A(z) = S(z)$ is the pgf of the system content.
If $Q(z)$ and $A(z)$ are meromorphic functions⁷ and if
\[
\lim_{z\to\infty}\left|\frac{A(z)\,Q(z)}{(z-1)^2\,z^{K-1}}\right| = 0,
\]
the contour $C$ in (14.63) can be closed over the $|\omega| > 1$-plane to get
\[
\mathrm{clr} = -\frac{1}{A'(1)}\sum_{p}\mathrm{Res}_{\omega\to p}\frac{A(\omega)Q(\omega)}{\omega^{K}(\omega-1)^2} \tag{14.64}
\]
where $p$ are the poles of $A(z)Q(z)$ outside the unit circle. If these conditions are met, a non-trivial evaluation of the cell loss ratio can be obtained. In case the buffer pgf of the finite system is known, then $Q(z)$ is a polynomial of degree at most $K$ so that the only pole of $\frac{Q(z)}{z^K}$ is zero and $\lim_{z\to\infty}\frac{Q(z)}{z^K} = q[K] \le 1$, and the above conditions simplify to $\lim_{z\to\infty}\left|\frac{z\,A(z)}{(z-1)^2}\right| = 0$. Executing (14.63) then leads to
\[
\mathrm{clr} = -\frac{1}{A'(1)}\sum_{p}\frac{Q(p)}{p^{K}(p-1)^2}\,\mathrm{Res}_{\omega\to p}A(\omega) \tag{14.65}
\]
where only the poles $p$ of the arrival process $A(z)$ play a role. For example, if the number of arrivals has a geometric distribution $a[k] = (1-\alpha)\alpha^k$ with $0 \le \alpha < 1$ with generating function (3.6), $A_{\mathrm{geo}}(z) = \frac{1-\alpha}{1-\alpha z}$, then the conditions for (14.65) are satisfied and we obtain,
\[
\mathrm{clr}_{\mathrm{geo}} = \alpha^{K}\,Q\!\left(\frac{1}{\alpha}\right)
\]
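The geometric result can be compared numerically with a truncated evaluation of the combinatorial sum (14.59). In the sketch below the buffer occupancy distribution `q` is a hypothetical pmf chosen only for illustration (the identity between the two expressions holds for any `q`), and the truncation level `n_max` is an assumption that is ample for $\alpha = 0.5$.

```python
from math import isclose

def clr_combinatorial(q, a_pmf, n_max):
    """Truncated evaluation of (14.59) for buffer pmf q[0..K] and arrival pmf a_pmf(k)."""
    K = len(q) - 1
    mean_arrivals = sum(k * a_pmf(k) for k in range(n_max + K + 1))   # A'(1)
    total = sum(n * q[K - j] * a_pmf(j + n)
                for n in range(n_max + 1) for j in range(K + 1))
    return total / mean_arrivals

alpha = 0.5

def geometric(k):
    """Geometric arrival pmf a[k] = (1 - alpha) * alpha**k."""
    return (1 - alpha) * alpha**k

# Hypothetical buffer occupancy pmf q[0..K] with K = 4 (not derived from a queue).
q = [0.4, 0.3, 0.15, 0.1, 0.05]
K = len(q) - 1

closed_form = alpha**K * sum(qj * (1 / alpha)**j for j, qj in enumerate(q))
numeric = clr_combinatorial(q, geometric, n_max=200)
print(closed_form, numeric)
```

In a real dimensioning problem `q` would itself depend on the arrival process, so this only checks the algebra linking (14.59) and the residue evaluation.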
⁷ Functions that only have poles in the complex plane.
An important class excluded from (14.64) consists of entire functions $A(z)$ that possess a Taylor series expansion converging for all complex variables $z$. The pgf of a Poisson process with parameter $\lambda$, $A_{\mathrm{Poisson}}(z) = e^{\lambda(z-1)}$, is an important representative of that class. For a Poisson arrival process, (14.63) is
\[
\mathrm{clr}_{\mathrm{Poisson}} = \frac{e^{-\lambda}}{2\pi i\,\lambda}\int_{C}\frac{e^{\lambda\omega}Q(\omega)}{(\omega-1)^2\,\omega^{K}}\,d\omega
\]
Deforming the contour to enclose the negative half $\omega$-plane ($\mathrm{Re}(\omega) < c$) yields
\[
\mathrm{clr}_{\mathrm{Poisson}} = \frac{e^{-\lambda}}{2\pi i\,\lambda}\int_{c-i\infty}^{c+i\infty}\frac{e^{\lambda\omega}Q(\omega)}{(\omega-1)^2\,\omega^{K}}\,d\omega
\]
where the real number $c$ exceeds unity. This expression is recognized as an inverse Laplace transform and since the argument of the Laplace transform is a rational function, an exact evaluation is possible leading, however, again to (14.59). Hence, the combinatorial view does not immediately offer much insight, which suggests considering a conservation-based approach. Indeed, it is well known that, owing to the PASTA property (Theorem 13.5.1), an exact expression (Syski, 1986; Bisdikian et al., 1992; Steyaert and Bruneel, 1994) in continuous-time for the cell loss ratio in a M/G/1/K system can be derived, with the result
\[
\mathrm{clr}_{M/G/1/K;\mathrm{cont}} = (1-\rho)\frac{\Pr[N > K-1]}{1-\rho\Pr[N > K-1]} \tag{14.66}
\]
where, as usual, the traffic intensity $\rho = \frac{\lambda}{\mu}$ and $\Pr[N > K-1]$ is the overflow probability in the corresponding infinite system M/G/1. Transforming (14.66) to discrete-time using $\mathrm{clr}_{\mathrm{discr}} = \frac{\mathrm{clr}_{\mathrm{cont}}}{\rho(1-\mathrm{clr}_{\mathrm{cont}})}$ yields
\[
\mathrm{clr}_{M/G/1/K;\mathrm{discr}} = \frac{1-\rho}{\rho}\,\frac{\Pr[N > K-1]}{1-\Pr[N > K-1]} \tag{14.67}
\]
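As a plausibility check of (14.66), the special case of exponential service can be used: in the infinite M/M/1 queue $\Pr[N > K-1] = \rho^K$, and (14.66) should then reduce to the well-known M/M/1/K blocking probability. The sketch below uses illustrative function names and parameter values.

```python
from math import isclose

def clr_mg1k_continuous(rho, overflow):
    """Formula (14.66); `overflow` is Pr[N > K-1] in the infinite M/G/1 queue."""
    return (1 - rho) * overflow / (1 - rho * overflow)

def mm1k_blocking(rho, K):
    """Textbook M/M/1/K blocking probability, used here only as a cross-check."""
    return (1 - rho) * rho**K / (1 - rho**(K + 1))

rho, K = 0.8, 20              # illustrative values
overflow = rho**K             # in the M/M/1 queue, Pr[N > K-1] = rho^K
print(clr_mg1k_continuous(rho, overflow), mm1k_blocking(rho, K))
```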
14.9 Problems
(i) A router is busy processing data packets 80% of the time. On average 3.2 packets are waiting for service. What is the mean waiting time of a packet given that the mean processing time equals $\frac{1}{\mu}$?
(ii) Compute in a M/M/m/m queue the average number of busy servers.
(iii) Let us model a router by a M/M/1 system with average service time equal to 0.5 s.
(a) What is the relation between the average response time (average system time) and the arrival rate $\lambda$?
(b) How many jobs/s can be processed for a given average response time of 2.5 s?
(c) What is the increase in average response time if the arrival rate increases by 10%?
(iv) Assume that a company has a call center with two phone lines for service. During some measurements it was observed that both lines are busy 10% of the time. On the other hand, the average call holding time was 10 minutes. Calculate the call blocking probability in the case that the average call holding time increases from 10 minutes to 15 minutes. Call arrivals are Poissonian with constant rate.
(v) Consider a queueing network with Poisson arrivals consisting of two infinitely long single-server queues in tandem with exponential service times. We assume that the service times of a customer at the first and second queue are mutually independent as well as independent of the arrival process. Let the rate of the Poisson arrival process be $\lambda$, and let the mean service rates at queues 1 and 2 be $\mu_1$ and $\mu_2$, respectively. Moreover, assume that $\lambda < \mu_1$ and $\lambda < \mu_2$. Give the probability that in steady-state there are $n$ customers at queue 1 and $m$ customers at queue 2.
(vi) Let us consider the following simple design question: which queue of the M/M/m family is most suitable if the arrival rate is $\lambda$ and the required service rate is $k\mu$, with $k > 1$? We have the three options illustrated in Fig. 14.8 at our disposal. All queues have infinite buffers and the same traffic intensity $\rho = \frac{\lambda}{k\mu}$, and thus the same throughput. The QoS qualifier of interest here is the delay, more precisely, the system time $T$ of a packet. Compare the average system time and draw conclusions.
[Figure 14.8: panels A, B and C, with arrival rates $\lambda$ or $\lambda/k$ and service rates $\mu$ or $k\mu$.]
Fig. 14.8. Three different options: (A) one M/M/1 queue with service rate $k\mu$, (B) $k$ M/M/1 queues with service rate $\mu$ and (C) one M/M/k queue with service rate $k\mu$.
(vii) An aeroplane takes exactly 5 minutes to land after the airport's traffic control has sent the signal to land. Aeroplanes arrive at random with an average rate of 6/hour. How long can an aeroplane expect to circle before getting the signal to land? (Only one aeroplane can land at a time.)
(viii) There are two kinds of connection requests arriving at a base station of a mobile telephone network: connection requests generated by new calls (that originate from the same cell as the base station) or handovers (that originate from a different cell, but are transferred to the cell of the base station). The handovers are supposed not to experience blocking. Therefore, the base station has to reject some of the new call connection requests. Every accepted connection request occupies one of the $M$ available channels. During a busy hour, the average measured channel occupation time of a call is 1.64 minutes irrespective of the type of call. Furthermore, the average number of active calls is 52 and the measured blocking is 2% of the number of all the connection requests. The average interarrival time between two consecutive new call connection requests in the cell is 3 seconds.
(a) Calculate the arrival rate (in calls/minute) for the handover calls.
(b) What is the percentage of new calls that are blocked?
(ix) Let $N$ denote the number of Poisson arrivals with rate $\lambda$ during the service time $x$ (random variable) of a packet. Assume that the Laplace transform of the service time $\varphi_x(s) = E[e^{-sx}]$ is known.
(a) Show that the pgf of $N$ is given by $\varphi_x(\lambda(1-z))$.
(b) What is the pgf if the service time $x$ is exponentially distributed with mean $\frac{1}{\mu}$? Deduce from this the distribution of $N$.
(x) A single-server queue has exponential inter-arrival and service times with means $\lambda^{-1}$ and $\mu^{-1}$, respectively. New customers are sensitive to the length of the queue. If there are $i$ customers in the system when a customer arrives, then that customer will join the queue with a probability $(i+1)^{-1}$, otherwise he/she departs and does not return. Find the steady-state probability distribution of this queueing system.
(xi) The M/M/m/m/s queue (the Engset formula). Consider a system with $m$ connections and $s$ customers who all desire to telephone and, hence, need to obtain a connection or line. Each customer can occupy at most one line. The group of $s$ customers consists of two subgroups. When a line has been assigned to a customer, this customer is transferred from the "still demanding subgroup" to the "served group". The number of call attempts decreases with the size of the "served group" whose members all occupy one line. More precisely, the arrival rate in the Engset model is proportional to the size of the "still demanding subgroup" and the interarrival times are exponentially distributed. The holding time of a line is also exponentially distributed with mean $\frac{1}{\mu}$.
(a) Describe the M/M/m/m/s queue as a birth-death process.
(b) Compute the steady-state.
(c) Compute the blocking probability (similar to the blocking in the Erlang model).
(xii) Compare the cell loss ratio of the M/M/1/K and of the discrete M/D/1/K queue using the dominant pole approximation in Section 5.7. Hint: approximate the cell loss ratio by the overflow probability.
Part III
Physics of networks
15
General characteristics of graphs
The structure or interconnection pattern of a network can be represented by
a graph. Properties of the graph of a network often relate to performance
measures or specific characteristics of that network. For example, routing is
an essential functionality in many networks. The computational complexity
of shortest path routing depends on the hopcount in the underlying graph.
This chapter mainly focuses on general properties of graphs that are of
interest to Internet modeling.
Mainly driven by the Internet, a large impetus from different fields in
science makes the understanding of the growth and the structure of graphs
one of the currently most studied and exciting research areas. The recent
books by Barabasi (2002) and Dorogovtsev and Mendes (2003) nicely reflect
the current state of the art in stochastic graph theory and its applications
to, for example, the Internet, the World Wide Web, and social and biological
networks.
15.1 Introduction
Network topologies as drawn in Fig. 15.1 are examples of graphs. A graph $G$ is a data structure consisting of a set of $V$ vertices connected by a set of $E$ edges. In stochastic graph theory and communications networking, the vertices and edges are called nodes and links, respectively. In order to avoid confusion with the expectation operator $E[\cdot]$, the set of links is denoted by $\mathcal{L}$ and the number of links by $L$ and similarly, the set of nodes by $\mathcal{N}$ and the number of nodes by $N$. Thus, the usual notation of a graph $G(V, E)$ in graph theory is here denoted by $G(N, L)$.
The full mesh or complete graph $K_N$ consists of $N$ nodes and $L = L_{\max} = \frac{N(N-1)}{2}$ links, where every node has a link to every other node. The graph that is generated by the statement "any $i$ is directly connected to any $j$" in a population of $N$ members is a complete graph $K_N$. Since in $K_N$ the number of links $L_{\max} = O(N^2)$ for large $N$, it demonstrates "Metcalfe's law": the value of networking increases quadratically in the number of connected members.
[Figure 15.1: full mesh (complete graph), star, ring, 2D (square) lattice, and tree (connected, loopless graph).]
Fig. 15.1. Several types of network topologies or graphs.
The interconnection pattern of a network with $N$ nodes can be represented by an adjacency matrix $A$ consisting of elements $a_{ij}$ that are either one or zero depending on whether there is a link between node $i$ and $j$ or not. The adjacency matrix is a real symmetric $N \times N$ matrix when we assume bi-directional transport over links. If there is a link from $i$ to $j$ ($a_{ij} = 1$) then there is a link from $j$ to $i$ ($a_{ji} = 1$) for any $j \ne i$. Moreover, we exclude self-loops ($a_{jj} = 0$) or multiple links between two nodes $i$ and $j$. More properties of the adjacency matrix of a graph are found in Appendix B.
A walk from node $A$ to node $B$ with $k-1$ hops or links is the node list $\mathcal{W}_{A\to B} = n_1 \to n_2 \to \cdots \to n_{k-1} \to n_k$ where $n_1 = A$ and $n_k = B$. A path from node $A$ to node $B$ with $k-1$ hops or links is the node list $\mathcal{P}_{A\to B} = n_1 \to n_2 \to \cdots \to n_{k-1} \to n_k$ where $n_1 = A$ and $n_k = B$ and where $n_j \ne n_i$ for each pair of distinct indices $i$ and $j$. Sometimes the shorter notation $\mathcal{P}_{A\to B} = n_1 n_2 \cdots n_{k-1} n_k$ is used. All links $n_i \to n_j$ and the nodes $n_j$ in the path $\mathcal{P}_{A\to B}$ are different, whereas in a walk $\mathcal{W}_{A\to B}$ no restrictions are put on the node list. If the starting node $A$ equals the destination node $B$, that path $\mathcal{P}_{A\to A}$ is called a cycle or loop. In telecommunications networks, paths and not walks are basic entities in connecting two communicating parties. Two paths between $A$ and $B$ are node(link)-disjoint if they have no nodes(links) in common.
Apart from the topological structure specified via the adjacency matrix $A$, the link between node $i$ and $j$ is further characterized by a link weight $w(i \to j)$, most often a positive real number¹ that reflects the importance of that particular link. Often symmetry in both directions, $w(i \to j) = w(j \to i)$, is assumed leading to undirected graphs. Although this assumption seems rather trivial, we point out that in telecommunications, transport of information in up-link and down-link is, in general, not symmetrical. Via measurements in the Internet, Paxson (1997) found in 1995 that about 50% of the paths from $A \to B$ were different from those from $B \to A$. Furthermore, it is often assumed that the link metric $w(i \to j)$ is independent from $w(k \to l)$ for all links $(i \to j)$ different from $(k \to l)$. In the Internet's intra-domain routing protocol, the Open Shortest Path First (OSPF) protocol, network operators have the freedom² to specify the link weight $w(i \to j) > 0$ on the interfaces of their routers.
¹ In quality of service routing, a link is specified by a vector $\vec{w}(i \to j)$ with positive components, each reflecting a metric (such as delay, jitter, loss, monetary cost, administrative weight, physical distance, available capacity, priority, etc.).
15.2 The number of paths with m hops
Let $X_j(A \to B; N)$ denote the number of paths with $j$ hops between a source node $A$ and a destination node $B$. The most general expression for the number of paths with $j$ hops between node $A$ and node $B$ is
\[
X_j(A \to B; N) = \sum_{k_1 \notin \{A,B\}}\ \sum_{k_2 \notin \{A,k_1,B\}} \cdots \sum_{k_{j-1} \notin \{A,k_1,\ldots,k_{j-2},B\}} 1_{A\to k_1}\cdot 1_{k_1\to k_2}\cdots 1_{k_{j-1}\to B} \tag{15.1}
\]
where $1_x$ is the indicator function. The number of paths with one hop equals $X_1(A \to B; N) = 1_{A\to B}$. The maximum number of $j$ hop paths is attained in the complete graph $K_N$ where $1_{k_1\to k_2} = 1$ for each link $k_1 \to k_2$ and equals
\[
\max\left(X_j(A \to B; N)\right) = \frac{(N-2)!}{(N-j-1)!} \tag{15.2}
\]
The maximum number of hops in any path is $N-1$. This maximum occurs, for example, in a line graph where the path runs from the one extreme node to the other or in a ring (see Fig. 15.1) between neighboring nodes where there is a one hop and a $(N-1)$-hops path.
The total number of paths $\mathcal{P}_N$ between two nodes in the complete graph is
\[
\mathcal{P}_N = \sum_{j=1}^{N-1}\max\left(X_j(A \to B; N)\right) = \sum_{j=1}^{N-1}\frac{(N-2)!}{(N-j-1)!} = (N-2)!\sum_{k=0}^{N-2}\frac{1}{k!} = (N-2)!\,e - R
\]
where
\[
R = (N-2)!\sum_{j=N-1}^{\infty}\frac{1}{j!} = \sum_{j=0}^{\infty}\frac{(N-2)!}{(N-1+j)!}
= \frac{1}{N-1} + \frac{1}{(N-1)N} + \frac{1}{(N-1)N(N+1)} + \cdots
< \sum_{j=1}^{\infty}\left(\frac{1}{N-1}\right)^{j} = \frac{1}{N-2}
\]
implying that for $N \ge 3$, $R < 1$. But $\mathcal{P}_N$ is an integer. Hence, the total number of paths in $K_N$ is exactly equal to
\[
\mathcal{P}_N = [e(N-2)!] \tag{15.3}
\]
where $e = 2.718\,281\ldots$ and $[x]$ denotes the largest integer smaller than or equal to $x$. Since any graph is a subgraph of the complete graph, the maximum total number of paths between two nodes in any graph is upper bounded by $[e(N-2)!]$.
² In Cisco's OSPF implementation, it is suggested to use $w(i \to j) = \frac{10^8}{B(i\to j)}$ where $B(i \to j)$ denotes the capacity (in bit/s) of the link between nodes $i$ and $j$. An approach to optimize the OSPF weights to reflect actual traffic loads is presented by Fortz and Thorup (2000).
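The closed form (15.3) can be compared with both the sum of (15.2) over all hopcounts and a brute-force enumeration of paths in small complete graphs. The sketch below uses illustrative function names; the enumeration labels the end nodes 0 and $N-1$ and treats every ordered selection of distinct intermediate nodes as one path.

```python
from itertools import permutations
from math import e, factorial, floor

def paths_by_formula(N):
    """Sum of (15.2) over all hopcounts j = 1..N-1."""
    return sum(factorial(N - 2) // factorial(N - j - 1) for j in range(1, N))

def paths_by_enumeration(N):
    """Count paths between nodes 0 and N-1 in K_N by listing intermediate nodes."""
    inner = range(1, N - 1)
    return sum(1 for r in range(N - 1) for _ in permutations(inner, r))

for N in range(3, 9):
    assert paths_by_formula(N) == paths_by_enumeration(N) == floor(e * factorial(N - 2))

print([paths_by_formula(N) for N in range(3, 9)])   # [2, 5, 16, 65, 326, 1957]
```

The floating-point product `e * factorial(N - 2)` is only reliable for moderate $N$; for large $N$ the exact integer sum is the safer way to evaluate $\mathcal{P}_N$.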
15.3 The degree of a node in a graph
The degree $d_j$ of a node $j$ in a graph $G(N, L)$ equals the number of its neighboring nodes and $0 \le d_j \le N-1$. Clearly, the node $j$ is disconnected from the rest of the graph if $d_j = 0$. Hence, in connected graphs, $1 \le d_j \le N-1$. The basic law for the degree (see also Appendix (B.2)) is
\[
\sum_{j=1}^{N} d_j = 2L,
\]
since each link belongs to precisely two nodes and, hence, is counted twice. In directed graphs, the in(out)-degree is defined as the number of the in(out)-going links at a node, while the sum of in- and out-degree equals the degree. The minimum nodal degree in the graph $G$ is denoted by $d_{\min} = \min_{j \in G} d_j$. The average degree of a graph is defined as $d_a = \frac{1}{N}\sum_{j=1}^{N} d_j = \frac{2L}{N}$ which is, for a connected graph, bounded by $2 - \frac{2}{N} \le d_a \le N-1$. The lower bound is obtained for any spanning tree, a graph that connects all nodes and that contains no cycles and where $L = L_{\min} = N-1$. The upper bound is reached in the complete graph $K_N$ with $L_{\max} = \frac{N(N-1)}{2}$. Graphs where $d_{\min} = d_a$ such as $K_N$ and the ring topology in Fig. 15.1 are called regular graphs since any node has precisely $d_a$ links.
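The degree law and the notion of a regular graph are easy to illustrate on a small adjacency matrix. The following sketch builds a ring on $N = 6$ nodes; the helper names and the choice of topology are illustrative assumptions.

```python
def degrees(adj):
    """Row sums of a symmetric 0/1 adjacency matrix give the nodal degrees."""
    return [sum(row) for row in adj]

def ring(N):
    """Adjacency matrix of a ring on N >= 3 nodes (each node linked to both neighbours)."""
    return [[1 if (i - j) % N in (1, N - 1) else 0 for j in range(N)] for i in range(N)]

N = 6
A = ring(N)
d = degrees(A)
L = sum(d) // 2                 # each link is counted at both of its end nodes
d_avg = 2 * L / N
print(d, L, d_avg, min(d) == d_avg)   # [2, 2, 2, 2, 2, 2] 6 2.0 True
```

Because the minimum degree equals the average degree, the ring is regular, in line with the definition above.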
Sometimes networks are classified either as dense if $d_a$ is high or as sparse if $d_a$ is small. For instance, the Internet is sparse with average degree $d_a \approx 3$, although some backbone routers may have a much higher degree, even exceeding 100. The distribution of the degree $D_{\mathrm{Internet}}$ of an arbitrary node in the Internet is shown to be approximately polynomial (Siganos et al., 2003),
\[
\Pr[D_{\mathrm{Internet}} = k] \approx \frac{k^{-\tau}}{\zeta(\tau)} \tag{15.4}
\]
with³ $\tau \in (2.2, 2.5)$ and where $\zeta(s) = \sum_{k=1}^{\infty} k^{-s}$ for $\mathrm{Re}(s) > 1$ is the Riemann Zeta function (Titchmarsh and Heath-Brown, 1986). A graph of this class is called a degree graph. Figures 15.2 and 15.3 show two instances of a degree graph.
Fig. 15.2. Degree graph with $\tau = 2.4$ and $N = 300$. All nodes are drawn on a circle.
Fig. 15.3. Degree graph with $\tau = 2.4$ and $N = 200$. The higher degree nodes are put inside the circle.
Also the web graph consisting of websites and hyperlinks features a power law for the in-degree. David Aldous has given the following argument why a power law of the in-degree of the web graph is natural. To a good approximation, the number of websites is growing exponentially at rate $\rho > 0$. This means that the lifetime $T$ of a random website satisfies $\Pr[T > t] \approx e^{-\rho t}$. Let $l(u)$ denote the number of links into a site at time $u$ after its creation. At observation time $t$, the distribution of the number of links $X$ into a random website is, by the law of total probability,
\[
\begin{aligned}
\Pr[X > k] &= \int_0^t \Pr[X > k \mid T = u]\,\frac{d\Pr[T \le u]}{du}\,du
\approx \rho\int_0^t e^{-\rho u}\Pr[X > k \mid T = u]\,du\\
&= \rho\int_0^t e^{-\rho u}\,1_{\{l(u) > k\}}\,du = \rho\int_{l^{-1}(k)}^{t} e^{-\rho u}\,du = -e^{-\rho t} + e^{-\rho\,l^{-1}(k)}
\end{aligned}
\]
Only if $l$ increases exponentially fast as $l(u) \approx e^{\gamma u}$ for some $\gamma < \rho$, a power law behavior of the in-degree
\[
\Pr[X > k] \approx k^{-\frac{\rho}{\gamma}}
\]
arises for sufficiently large $t$. For a polynomial growth $l(u) \approx u^{\alpha}$ and large $t$,
\[
\Pr[X > k] \approx e^{-\rho k^{\frac{1}{\alpha}}}
\]
The large difference in the decrease of $\Pr[X > k]$ with $k$ between both examples illustrates the importance of the growth law of $l(u)$. The argument shows that a polynomial scaling law, commonly referred to as a power law, is a natural consequence of exponential growth. An exponential growth possesses the property that $\frac{dl(u)}{du} = \gamma\,l(u)$, which is established by preferential attachment. Preferential attachment means that new links are on average added to sites proportional to their size. The more links a site has, the larger the probability that a new link attaches to this site. For example, already popular websites are increasingly more often linked to than small or less popular websites. Since many aspects of the Internet, such as the number of IP packets, number of users, number of websites, number of routers, etc., are currently growing approximately exponentially fast, the often observed power laws are more or less expected.
³ A more general expression than (15.4) is $\Pr[d_j = k] = c\,k^{-\tau}g(k)$, where $c$ is a normalization constant and where $g(k)$ is a slowly varying function (Feller, 1971, pp. 275-284) with basic property that $\lim_{t\to\infty}\frac{g(tx)}{g(t)} = 1$, for every $x > 0$.
15.4 Connectivity and robustness
A graph J is connected if there is a path between each pair of nodes and
disconnected otherwise. A telecommunication network should be connected.
Moreover, it is essential that the network should be robust: it should still
operate if some of the links between routers or switches are broken or temporarily blocked by other calls. Hence, the network graph should possess a
redundancy of links. The minimum number of links to connect all nodes in
the network equals Q 1. This minimum configuration is called redundancy
level 1. In general, a redundancy level of G is defined by Baran (2002) as
the link-to-node ratio in an infinite G-lattice4 . A redundancy level of at
least 3 is regarded as a highly robust network. A consequence of this insight has been employed in the design of the early Internet (Arpanet): it
would be theoretically possible to build extremely reliable communication
networks out of unreliable links by the proper use of redundancy. Another
more timely application of the same principle is the design of reliable ad-hoc
and sensor networks.
$^{4}$ A $D$-lattice is a graph where each nodal position corresponds to a point with integer coordinates within a $D$-dimensional hyper-cube with size $Z$. Apart from the border nodes, each node has the same degree equal to $2D$. The number of nodes equals $N = Z^D$. From (B.2), the link-to-node ratio follows as
$$\frac{L}{N} = \frac{1}{2N} \sum_{j=1}^{N} d_j = D - r$$
where the correction $r = O\left(N^{-1/D}\right)$ is due to the border nodes. For an infinite $D$-lattice, where the limit $Z \to \infty$ (which implies $N \to \infty$), we obtain
$$\lim_{Z \to \infty} \frac{L}{N} = D$$
General characteristics of graphs
There exist interesting results from graph theory that help to dimension a reliable telecommunication network. Instead of the redundancy level, the edge and vertex connectivity seem more natural quantifiers from which robustness can be derived. The edge connectivity $\lambda(G)$ of a connected graph $G$ is the smallest number of edges (links) whose removal disconnects $G$. The vertex connectivity $\kappa(G)$ of a connected graph different$^{5}$ from the complete graph $K_N$ is the smallest number of vertices (nodes) whose removal disconnects $G$.
[Figure: an example graph on the nodes A to G, with one panel titled "edge connectivity", labelled $\lambda(G) = 1$, and one panel titled "vertex connectivity", labelled $\kappa(G) = 1$.]
Fig. 15.4. An example of the edge and the vertex connectivity of a graph.
These definitions are illustrated in Fig. 15.4. For any connected graph $G$, it holds that
$$\kappa(G) \leq \lambda(G) \leq d_{\min}(G) \tag{15.5}$$
In particular, if $G$ is the complete graph $K_N$, then $\kappa(K_N) = \lambda(K_N) = d_{\min}(K_N) = N-1$. Due to the importance of the inequality$^{6}$ (15.5), it deserves some more discussion. Let us concentrate on a connected graph $G$ that is not a complete graph. Since $d_{\min}(G)$ is the minimum degree of a node, say $n$, in $G$, removing all links of node $n$ disconnects $G$. By definition, since $\lambda(G)$ is the minimum number of links whose removal leads to disconnectivity, it follows that $\lambda(G) \leq d_{\min}(G)$ and $\lambda(G) \leq N-2$ because $G$ is not a complete graph and consequently the minimum nodal degree is at most $N-2$. Furthermore, the definition of $\lambda(G)$ implies that there exists a set $S$ of $\lambda(G)$ links whose removal splits the graph $G$ into two connected subgraphs $G_1$ and $G_2$, as illustrated in Fig. 15.5. Any link of that set $S$ connects a node in $G_1$ to a node in $G_2$. Indeed, adding again an arbitrary link of that set makes $G$ again connected. But $G$ can be disconnected into the same two connected subgraphs by removing nodes in $G_1$ and/or $G_2$. Since possible disconnectivity inside either $G_1$ or $G_2$ can occur before $\lambda(G)$ nodes are removed, it follows that $\kappa(G)$ cannot exceed $\lambda(G)$, which establishes the inequality (15.5).
$^{5}$ The complete graph $K_N$ cannot be disconnected by removing nodes and we define $\kappa(K_N) = N-1$ for $N \geq 3$.
$^{6}$ A second general inequality (B.23) relates the second smallest eigenvalue of the Laplacian to the edge and vertex connectivity (see Section B.4).
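The chain of inequalities (15.5) can be checked by exhaustive search on a small example. The Python sketch below is illustrative only and not part of the original text: the helper names and the test graph (two triangles sharing node C) are assumptions, and the search is exponential in the graph size, so it is only suitable for toy graphs.

```python
from itertools import combinations

def is_connected(nodes, links):
    """Depth-first search check that every node in `nodes` is reachable."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for a, b in links:
        if a in nodes and b in nodes:   # ignore links touching removed nodes
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes

def edge_connectivity(nodes, links):
    """Smallest number of links whose removal disconnects the graph."""
    for k in range(len(links) + 1):
        for removed in combinations(links, k):
            rest = [e for e in links if e not in removed]
            if not is_connected(nodes, rest):
                return k
    return len(links)

def vertex_connectivity(nodes, links):
    """Smallest number of nodes whose removal disconnects the graph.
    A complete graph cannot be disconnected; N - 1 is returned by convention."""
    n = len(nodes)
    for k in range(n - 1):              # always leave at least two nodes
        for removed in combinations(nodes, k):
            rest = [v for v in nodes if v not in removed]
            if not is_connected(rest, links):
                return k
    return n - 1

def min_degree(nodes, links):
    """Minimum nodal degree d_min."""
    return min(sum(v in e for e in links) for v in nodes)

# Hypothetical example: two triangles that share node C.
bowtie_nodes = ["A", "B", "C", "D", "E"]
bowtie_links = [("A", "B"), ("A", "C"), ("B", "C"),
                ("C", "D"), ("C", "E"), ("D", "E")]
kappa = vertex_connectivity(bowtie_nodes, bowtie_links)
lam = edge_connectivity(bowtie_nodes, bowtie_links)
d_min = min_degree(bowtie_nodes, bowtie_links)
print(kappa, lam, d_min)  # kappa <= lam <= d_min
```

For this example graph, removing node C separates the two triangles, whereas at least two links must be removed to isolate a node.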
Fig. 15.5. A graph $G$ with $N = 16$ nodes and $L = 32$ links. Two connected subgraphs $G_1$ and $G_2$ are shown. The graph's connectivity parameters are $\kappa(G) = 1$ (removal of node $C$), $\lambda(G) = 2$ (removal of links from $C$ to $G_1$), $d_{\min}(G) = 3$ and $d_a = \frac{2L}{N} = 4$.
Let us proceed to find the number of link-disjoint paths between $A$ and $B$ in a connected graph $G$. Suppose that $H$ is a set of links whose removal separates $A$ from $B$. Thus, the removal of all links in the set $H$ destroys all paths from $A$ to $B$. The maximum number of link-disjoint paths between $A$ and $B$ cannot exceed the number of links in $H$. However, this property holds for any set $H$, and thus also for the set with the smallest possible number of links. A similar argument applies to node-disjoint paths. Hence, we end up with Theorem 15.4.1:
Theorem 15.4.1 (Menger's Theorem) The maximum number of link(node)-disjoint paths between $A$ and $B$ is equal to the minimum number of links (nodes) separating $A$ and $B$.
Recall that the edge connectivity $\lambda(G)$ (analogously vertex connectivity $\kappa(G)$) is the minimum number of links (nodes) whose removal disconnects $G$. By Menger's Theorem, it follows that there are at least $\lambda(G)$ link-disjoint paths and at least $\kappa(G)$ node-disjoint paths between any pair of nodes in $G$.
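The link version of Menger's Theorem can be illustrated numerically with a maximum-flow computation in which every link has unit capacity. The sketch below is an assumption-laden illustration, not the method of the text: it uses breadth-first augmenting paths (Edmonds-Karp), and the function name and the example graph are hypothetical.

```python
from collections import deque

def link_disjoint_paths(links, src, dst):
    """Maximum number of link-disjoint paths between src and dst in an
    undirected graph, computed as a unit-capacity maximum flow."""
    cap, adj = {}, {}
    for a, b in links:
        # An undirected link is modelled as two arcs of capacity 1.
        cap[(a, b)] = cap.get((a, b), 0) + 1
        cap[(b, a)] = cap.get((b, a), 0) + 1
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w not in parent and cap[(v, w)] > 0:
                    parent[w] = v
                    queue.append(w)
        if dst not in parent:
            return flow
        # Push one unit of flow along the path that was found.
        w = dst
        while parent[w] is not None:
            v = parent[w]
            cap[(v, w)] -= 1
            cap[(w, v)] += 1
            w = v
        flow += 1

# Hypothetical example: two triangles that share node C.
example_links = [("A", "B"), ("A", "C"), ("B", "C"),
                 ("C", "D"), ("C", "E"), ("D", "E")]
paths_a_d = link_disjoint_paths(example_links, "A", "D")
print(paths_a_d)
```

In this example the two links at node A form a smallest separating set between A and D, and two link-disjoint paths exist (A-C-D and A-B-C-E-D), although both pass through node C.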
In order to dimension the graph $G$ of a robust telecommunications network, the goal is to maximize both $\kappa(G)$ and $\lambda(G)$. Of course, the most reliable graph is the complete graph; however, it is also the most expensive. Usually, since the cost of digging and of installing/connecting the fibres is around 70% of the total network cost, the total number of links $L$ is minimized. Since the minimum cannot exceed the average, we have that $d_{\min} \leq d_a = \frac{2L}{N}$. From (15.5), it follows that the best possible reliability is achieved if the network graph is designed such that
$$\kappa(G) = \frac{2L}{N}$$
The optimum implies that $d_{\min}(G) = d_a = \frac{2L}{N}$ or that each node has the same degree $d_j = d_a$. Hence, a best possible reliable graph is a regular graph ($d_j = d_a$), but not every regular graph necessarily obeys $\kappa(G) = d_a$. Furthermore, two different graphs with the same parameters $N$, $L$, $\kappa(G)$, $\lambda(G)$ and $d_{\min}(G)$ are not necessarily equally reliable. Indeed, the edge and vertex connectivity only give a minimum number $\lambda(G)$ and $\kappa(G)$ respectively, but do not give information about the number of subsets in $G$ that lead to this number. Clearly, if only one vulnerable set of nodes is responsible for a low $\kappa(G)$, while in another graph there are more such sets, the first graph is more reliable than the second one. In summary, the presented simplified analysis gives some insights, but more details (e.g. the number of vulnerable sets or subgraphs) must be considered in the dimensioning study.
15.5 Graph metrics
An important challenge in the modeling of a network is to determine the class of graphs that best represents the global and local structure of the network. Most of the valuable networks like the Internet, road infrastructures, neural networks in the human brain, social networks, etc. are large and changing over time. In order to classify graphs, a set of distinguishing properties, called metrics, needs to be chosen. These metrics are in general functions of the graph's structure $G(N, L)$. Natural metrics of a graph are the degree distribution and the hopcount distribution of an arbitrary path. Besides quantities such as the diameter and the complexity of a graph defined in algebraic graph theory (Appendix B), some other metrics are the clustering coefficient, the expansion, the resilience, the distortion and the betweenness.
The clustering coefficient $c_G(v)$ characterizes the density of connections in the environment of a node $v$ and is defined as the ratio of the number of links $y$ connecting the $d_v$ neighbors of $v$ over the total possible $\frac{d_v (d_v - 1)}{2}$,
$$c_G(v) = \frac{2y}{d_v (d_v - 1)} \tag{15.6}$$
The expansion $e_G(h)$ of a graph reflects the number of nodes that can be reached in $h$ hops from a node $v$,
$$e_G(h) = \frac{1}{N^2} \sum_{v \in \mathcal{N}} |C(h)| \tag{15.7}$$
where $C(h)$ is the set of nodes that can be reached in $h$ hops from a node $v$ and $|A|$ represents the number of elements in the set $A$. We can interpret $C(h)$ geometrically as a ball centered at node $v$ with radius $h$.
The resilience $r_G(m)$ measures the connectivity or robustness of a graph. Let $m = |C(h)|$ denote the number of nodes in a ball centered at node $v$ and with radius $h$, and define $l(v, m)$ as the number of links that need to be removed to split $C(h)$ into two sets with roughly equal numbers of nodes (around $m/2$). The resilience $r_G(m)$ of a graph is
$$r_G(m) = \frac{1}{L} \sum_{v \in \mathcal{N}} l(v, m) \tag{15.8}$$
The distortion $t_G(m)$ measures how closely the graph resembles a tree and is defined as
$$t_G(m) = \frac{1}{N} \sum_{v \in \mathcal{N}} w(C(h)) \tag{15.9}$$
where $w(G)$ is the value of the minimum spanning tree in $G$ with unit link weight $w(i \to j) = 1$ for each link of $G$.
Consider a flow with a unit amount of traffic between each pair of nodes in the graph $G$. Each flow between a node pair follows the shortest path between that node pair. The betweenness $B$ of a link (node) is defined as the number of shortest paths between all possible pairs of nodes in $G$ that traverse the link (node). If $H_{i \to j}$ denotes the number of hops in the shortest path from $i \to j$, then the total number of hops $H_G$ in all shortest paths in $G$ is $H_G = \sum_{i=1}^{N} \sum_{j=i+1}^{N} H_{i \to j}$. This number is also equal to $H_G = \sum_{l=1}^{L} B_l$, where $B_l$ is the betweenness of a link $l$ in $G$. Taking the expectation of both relations gives the average betweenness of a link in terms of the average hopcount
$$E[B] = \frac{\binom{N}{2}}{L} E[H_N] \geq E[H_N]$$
with equality only for the complete graph.
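The clustering coefficient (15.6) and the relation between link betweenness and hopcount can be illustrated on a toy graph. The following Python sketch rests on stated assumptions: the function names and the five-link example graph are hypothetical, the graph is assumed connected, and when several shortest paths exist between a node pair, the breadth-first search simply picks one of them.

```python
from collections import deque
from itertools import combinations

def adjacency(links):
    """Adjacency sets of an undirected graph given as a list of links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def clustering(adj, v):
    """c_G(v) = 2y / (d_v (d_v - 1)), with y the links among the neighbors of v."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    y = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * y / (d * (d - 1))

def shortest_path(adj, src, dst):
    """One shortest path (list of nodes) from src to dst, found by BFS."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            break
        for w in sorted(adj[v]):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def link_betweenness(adj):
    """Per-link count of traversing shortest paths, and the total hop count H_G."""
    counts, total_hops = {}, 0
    for a, b in combinations(sorted(adj), 2):
        path = shortest_path(adj, a, b)
        total_hops += len(path) - 1
        for u, w in zip(path, path[1:]):
            key = (min(u, w), max(u, w))
            counts[key] = counts.get(key, 0) + 1
    return counts, total_hops

# Hypothetical example: a 4-cycle 1-2-3-4 with the chord 1-3.
example_adj = adjacency([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])
betweenness, hops = link_betweenness(example_adj)
n_pairs, n_links = 6, 5
print(clustering(example_adj, 1), hops, sum(betweenness.values()) / n_links)
```

Here the average link betweenness equals $\binom{N}{2} E[H_N] / L$ with $E[H_N]$ the hop total divided by the number of node pairs, and it exceeds $E[H_N]$ because the example graph is not complete.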
15.6 Random graphs
Besides the regular topologies in Fig. 15.1, the class of random graphs constitutes an attractive set of topologies to analyze network performance. The theory of random graphs originated from a series of papers by Erdös and Rényi in the late 1950s. There exists an astonishingly large amount of literature on random graphs. The standard work on random graphs is the book by Bollobas (2001). We also mention the work of Janson et al. (1993) on evolutionary processes in random graphs.
The two most frequently occurring models for random graphs are the Erdös-Rényi random graphs $G_p(N)$ and $G_r(N, L)$. The class of random graphs denoted by $G_p(N)$ consists of all graphs with $N$ nodes in which the links are chosen independently and with probability $p$. In the class $G_p(N)$ the total number of links is not deterministic, but on average equal to $E[L] = p L_{\max}$ where $L_{\max} = \binom{N}{2}$. Since $p = \frac{E[L]}{L_{\max}}$ we also call $p$ the link density of $G_p(N)$. An instance of the class $G_{0.013}(300)$ is drawn in Fig. 15.6. Related to $G_p(N)$ are the geometric random graphs $G_{\{p_{ij}\}}(N)$ where the links are still chosen independently but where the probability of $i \to j$ being an edge is $p_{ij}$. An example of $G_{\{p_{ij}\}}(N)$ is the Waxman graph (Waxman, 1998; Van Mieghem, 2001) with $p_{ij} = \exp(-a |\vec{r}_i - \vec{r}_j|)$, where the vector $\vec{r}_i$ represents the position of node $i$ and $a$ is a real, non-negative number. Geometric random graphs are good models for ad-hoc wireless networks where the probability $p_{ij} = f(r_{ij})$ that there is a wireless link between node $i$ and $j$ is specified by the radio propagation that is briefly explained at the end of Section 3.5.
The class $G_r(N, L)$ is the set of random graphs with $N$ nodes and $L$ links. In total, we can construct $\binom{L_{\max}}{L}$ different graphs, which corresponds to the number of ways we can distribute a set of $L$ ones in the $L_{\max}$ possible places in the upper triangular part above the diagonal of the adjacency matrix $A$. Each of the possible $L_{\max}$ links has equal probability to belong to a random graph of the class $G_r(N, L)$. The probability that an element in the adjacency matrix $A$ is $a_{ij} = 1$ equals $p = \frac{L}{L_{\max}}$. As opposed to the class $G_p(N)$, the number of non-zero elements in $A$ in each random graph of $G_r(N, L)$ is precisely $2L$ (see Appendix B.1, art. 2), which induces weak dependence between links in $G_r(N, L)$. The latter also explains why computations are easier in $G_p(N)$ than in $G_r(N, L)$.
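The difference between the two classes is easy to see when sampling them. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, nodes are labelled $0, \ldots, N-1$, and the fixed seed only serves reproducibility.

```python
import random
from itertools import combinations

def sample_gp(n, p, rng):
    """G_p(N): each of the L_max = N(N-1)/2 possible links exists
    independently with probability p, so the number of links is random."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]

def sample_gr(n, l, rng):
    """G_r(N, L): exactly L of the L_max possible links are chosen
    uniformly at random, so the number of links is deterministic."""
    return rng.sample(list(combinations(range(n), 2)), l)

rng = random.Random(1)                 # fixed seed, for reproducibility only
n, p = 300, 0.013                      # the parameters of Fig. 15.6
l_max = n * (n - 1) // 2
links_p = sample_gp(n, p, rng)
links_r = sample_gr(n, round(p * l_max), rng)
print(len(links_p), len(links_r), p * l_max)
```

The number of links in the $G_p(N)$ sample fluctuates around $E[L] = p L_{\max}$, whereas every $G_r(N, L)$ sample has exactly $L$ links.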
The average number of paths with $j$ hops between two arbitrary nodes in $G_p(N)$ follows from (15.1) and (2.13) as
$$E[X_j] = \frac{(N-2)!}{(N-j-1)!} p^j \tag{15.10}$$
for $1 \leq j \leq N-1$. The average total number of paths between two arbitrary nodes $A$ and $B$ equals
$$E\left[\sum_{j=1}^{N-1} X_j\right] = (N-2)!\, p^{N-1} \sum_{l=0}^{N-2} \frac{p^{-l}}{l!} \leq (N-2)!\, p^{N-1} e^{\frac{1}{p}}$$
where the latter bound is closely approached for large $N$. Moreover, when the random graph reduces to the complete graph ($p = 1$), we again obtain (15.3). Since the degree $d_j$ of a node $j$ is the number of links incident with that node, it follows directly from the definition of $G_p(N)$ that the probability density function of the degree $D_{rg}$ of an arbitrary node in $G_p(N)$ equals
$$\Pr[D_{rg} = k] = \binom{N-1}{k} p^k (1-p)^{N-1-k} \tag{15.11}$$
Fig. 15.6. A connected random graph $G_p(N)$ with $N = 300$ and $p = 0.013$ drawn on a circle.
The interest in random graphs is fueled by the fact that the topology of the Internet is inaccurately known and also that good models$^{7}$ are lacking. In some sense, the Internet can be regarded as a growing and changing organism. Such complex networks also arise in other fields. Increased interest from different disciplines to understand network behavior resulted in a new wave of science, which may be termed "the physics of networks" and which was recently reviewed by Strogatz (2001). Random graphs are an elegant vehicle to thoroughly analyze the performance of, for example, routing algorithms. Some constructed overlay networks such as Gnutella and mobile wireless ad-hoc networks seem reasonably well modeled by $G_p(N)$. However, the class $G_p(N)$ does not describe the Internet topology well, and the degree distribution in particular deviates significantly. The degree distribution (15.11) in $G_p(N)$ is binomial, while that of the Internet is close to a power law (15.4). Hence, there is a discrepancy between Internet measurements and properties of the random graph $G_p(N)$.
$^{7}$ A detailed discussion on difficulties in modeling or simulating the Internet is presented by Floyd and Paxson (2001).
15.6.1 The number $C(N, L)$ of connected random graphs in the class $G_r(N, L)$
From the point of view of telecommunications networks, by far the most interesting graphs are those with connected topology. This limitation restricts the value of the link density $p$ from below by a critical threshold $p_c$. For large $N$, the critical threshold is $p_c \sim \frac{\log N}{N}$, as shown in Section 15.6.3. In the theory of random graphs, the problem of determining the number of connected random graphs $C(N, L)$ in the class $G_r(N, L)$ has been intensively studied. Gilbert (1956) has presented an exact recursion formula for $C(N, L)$ via the technique of enumeration. Erdös and Rényi (1959, 1960) have determined the asymptotic behavior of random graphs via the probabilistic method, largely introduced by Erdös himself. Since the analysis$^{8}$ of Gilbert is both exact and simple, we will review his results here and those of Erdös and Rényi in the next section.
Consider a particular random graph $J$ of the class of random graphs $G_r(N+1, L)$, which is constructed from the class $G_r(N, L)$ by adding one node labelled $N+1$. Suppose that the node labelled $N+1$ in the random graph $J$ belongs to a connected component $K$ that possesses $v$ other nodes and some number $\mu$ of links. The remaining part of $J$ has $N-v$ nodes and $L-\mu$ links. There are $\binom{N}{v}$ ways in which the $v$ nodes of $K$ can be chosen out of the $N$ nodes in $G_r(N, L)$. On the other hand, there are $C(v+1, \mu)$ ways of picking a connected random graph $K$ while there are $\binom{\binom{N-v}{2}}{L-\mu}$ ways of constructing the remaining part of $J$. Hence, since the number of ways we can construct a graph $J$ equals $\binom{\binom{N+1}{2}}{L}$, we obtain Gilbert's recursion formula,
$$\binom{\binom{N+1}{2}}{L} = \sum_{v=0}^{N} \binom{N}{v} \sum_{\mu=v}^{\binom{v+1}{2}} C(v+1, \mu) \binom{\binom{N-v}{2}}{L-\mu} \tag{15.12}$$
$^{8}$ A different, less admissible approach is found in Goulden and Jackson (1983).
Gilbert (1956) further derives the generating function for $C(N, L)$ as
$$\sum_{N=1}^{\infty} \sum_{L=0}^{\infty} \frac{C(N, L)}{N!} x^N y^L = \log\left(1 + \sum_{k=1}^{\infty} \frac{(1+y)^{\binom{k}{2}} x^k}{k!}\right) \tag{15.13}$$
which converges for $-2 \leq y \leq 0$ and all $x$. So far, no other explicit formulae for $C(N, L)$ exist.
In 1889, Cayley (see Appendix B.1, art. 3) proved that, in the special case where $L = N-1$, there holds $C(N, N-1) = N^{N-2}$. In the other extreme where $L = L_{\max}$, corresponding to a full mesh, we have $C(N, L_{\max}) = 1$. Actually, when $\binom{N}{2} - (N-1) < L \leq L_{\max}$, the graph is always connected because the adjacency matrix $A$ has necessarily at least one non-zero element per row. This means that $C(N, L) = \binom{\binom{N}{2}}{L}$ for $\binom{N}{2} - (N-1) < L \leq L_{\max}$. In all cases where $L < N-1$, the random graphs are necessarily disconnected, leading to $C(N, L) = 0$.
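For computational purposes, we rewrite (15.12) as
$$\binom{\binom{N+1}{2}}{L} = \sum_{\mu=N}^{\binom{N+1}{2}} C(N+1, \mu) \binom{\binom{0}{2}}{L-\mu} + \sum_{v=0}^{N-1} \binom{N}{v} \sum_{\mu=v}^{\binom{v+1}{2}} C(v+1, \mu) \binom{\binom{N-v}{2}}{L-\mu}$$
Since $\binom{\binom{0}{2}}{L-\mu} = \delta_{\mu, L}$, we arrive after a substitution of $N \to N-1$ at the recursion formula,
$$C(N, L) = \binom{\binom{N}{2}}{L} - \sum_{v=0}^{N-2} \binom{N-1}{v} \sum_{\mu=v}^{\binom{v+1}{2}} C(v+1, \mu) \binom{\binom{N-1-v}{2}}{L-\mu} \tag{15.14}$$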
Below we list a few values:
C(2,1) = 1
C(3,2) = 3        C(3,3) = 1
C(4,3) = 16       C(4,4) = 15       C(4,5) = 6        C(4,6) = 1
C(5,4) = 125      C(5,5) = 222      C(5,6) = 205      C(5,7) = 120
C(5,8) = 45       C(5,9) = 10       C(5,10) = 1
C(6,5) = 1296     C(6,6) = 3660     C(6,7) = 5700     C(6,8) = 6165
C(6,9) = 4945     C(6,10) = 2997    C(6,11) = 1365    C(6,12) = 455
C(6,13) = 105     C(6,14) = 15      C(6,15) = 1
C(7,6) = 16807    C(7,7) = 68295    C(7,8) = 156555   C(7,9) = 258125
C(7,10) = 331506  C(7,11) = 343140  C(7,12) = 290745  C(7,13) = 202755
C(7,14) = 116175  C(7,15) = 54257   C(7,16) = 20349   C(7,17) = 5985
C(7,18) = 1330    C(7,19) = 210     C(7,20) = 21      C(7,21) = 1
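The recursion (15.14) translates almost literally into code. The Python sketch below is one possible implementation, not taken from the text: the function name is hypothetical, memoization via `functools.lru_cache` is a design choice, and the starting value $C(1, 0) = 1$ (a single node without links is connected) is an assumption needed to start the recursion.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def connected_graphs(n, l):
    """Number C(N, L) of connected labelled graphs with N nodes and L links,
    computed with the recursion (15.14)."""
    if n == 1:
        return 1 if l == 0 else 0       # assumed starting value C(1, 0) = 1
    if l < n - 1 or l > comb(n, 2):
        return 0                        # too few or too many links
    total = comb(comb(n, 2), l)         # all graphs with N nodes and L links
    for v in range(n - 1):              # v = 0, ..., N-2
        for mu in range(v, comb(v + 1, 2) + 1):
            if mu > l:
                break
            # Subtract the graphs in which a marked node sits in a connected
            # component with v other nodes and mu links.
            total -= (comb(n - 1, v)
                      * connected_graphs(v + 1, mu)
                      * comb(comb(n - 1 - v, 2), l - mu))
    return total

for n_nodes in range(2, 6):
    row = [connected_graphs(n_nodes, l) for l in range(n_nodes - 1, comb(n_nodes, 2) + 1)]
    print(n_nodes, row)
```

Note that `math.comb(a, b)` returns 0 when `b > a`, which conveniently removes the impossible terms of the double sum.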
15.6.2 The Erdös and Rényi asymptotic analysis
In a classical paper, Erdös and Rényi (1959) proved that
$$\lim_{N \to \infty} \Pr\left[G_r\left(N, \left[\tfrac{1}{2} N \log N + xN\right]\right) \text{ is connected}\right] = e^{-e^{-2x}} \tag{15.15}$$
Ignoring the integral part $[\cdot]$ operator and eliminating $x$ using the number of links $L = \frac{1}{2} N \log N + xN$ gives, for large $N$,
$$\Pr[G_r(N, L) = \text{connected}] \simeq e^{-N e^{-\frac{2L}{N}}} \tag{15.16}$$
which should be compared with the exact result,
$$\Pr[G_r(N, L) = \text{connected}] = \frac{C(N, L)}{\binom{\binom{N}{2}}{L}} \tag{15.17}$$
In contrast to the unattractive computation of the exact $C(N, L)$ via recursion (15.14), the Erdös and Rényi asymptotic expression (15.16) is simple. The accuracy for relatively small $N$ is shown in Fig. 15.7.
[Figure: $\Pr[G_r(N, L) = \text{disconnected}]$ (vertical axis, 0.0 to 1.0) versus the number of nodes $N$ (horizontal axis, 0 to 60), showing the exact result and Erdos' asymptotic formula for each choice of $L$.]
Fig. 15.7. The probability that a random graph $G_r(N, L)$ is disconnected: a comparison between the exact result (15.17) and Erdos' asymptotic formula (15.16) for $L = N$, $L = \frac{3}{2} N$, $L = 2N$ and $L = \frac{2}{3} N \log N$.
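A comparison in the spirit of Fig. 15.7 can also be made by simulation instead of the exact recursion. The Python sketch below is an illustration under assumptions: it estimates the connectivity probability of $G_r(N, L)$ by Monte Carlo sampling with a union-find connectivity check, the function names, the seed and the number of runs are arbitrary choices, and the estimate carries sampling noise.

```python
import math
import random
from itertools import combinations

def asymptotic_connected(n, l):
    """Asymptotic expression (15.16): exp(-N exp(-2L/N))."""
    return math.exp(-n * math.exp(-2.0 * l / n))

def is_connected(n, links):
    """Union-find check that the links join all n nodes into one component."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    components = n
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components == 1

def simulated_connected(n, l, runs, rng):
    """Fraction of sampled graphs from G_r(N, L) that are connected."""
    pairs = list(combinations(range(n), 2))
    hits = sum(is_connected(n, rng.sample(pairs, l)) for _ in range(runs))
    return hits / runs

rng = random.Random(7)                  # fixed seed, for reproducibility only
n_nodes = 30
for l in (n_nodes, 3 * n_nodes // 2, 2 * n_nodes, round(2 * n_nodes * math.log(n_nodes) / 3)):
    print(l, simulated_connected(n_nodes, l, 500, rng), asymptotic_connected(n_nodes, l))
```

The first column of probabilities is a noisy estimate of (15.17), the second the asymptotic value (15.16); Fig. 15.7 plots the complementary probabilities of being disconnected.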
The key observation of Erdös and Rényi (1959) is that a phase transition in random graphs with $N$ nodes occurs when the number of links $L$ is around $L_c = \frac{1}{2} N \log N$. Phase transitions are well-known phenomena in physics. For example, at a certain temperature, most materials possess a solid-liquid transition and at a higher temperature a second liquid-gas transition. Below that critical temperature, most properties of the material are completely different than above that temperature. Some materials are superconductive below a certain critical temperature $T_c$, but normally conductive (or even insulating) above $T_c$. Erdös and Rényi concentrated on the property $A_k$ that a random graph $G_r(N, L_x)$ with $L_x = \left[\frac{1}{2} N \log N + xN\right]$ consists of $N-k$ connected nodes and $k$ isolated nodes for fixed $k$. If $A_k^c$ means the absence of property $A_k$, they proved that, for all fixed $k$, $\Pr[A_k^c] \to 0$ if $N \to \infty$, which means that for a large number of nodes $N$, almost all random graphs $G_r(N, L_x)$ possess property $A_k$. This result is equivalent to a result proved in Section 15.6.3 that the class of random graphs $G_p(N)$ is almost surely disconnected if the link density $p$ is below $p_c \sim \frac{\log N}{N}$ and connected for $p > p_c$. In view of the analogy with physics, it is not surprising that corresponding sharp transitions are also observed for properties other than $A_k$.
In the sequel, we will show that, for the random graph $G_r(N, L_x)$, the probability that the largest connected component, called the giant component $GC(N, L_x)$, has $N-k$ nodes is, for large $N$, Poisson distributed with mean $e^{-2x}$,
$$\lim_{N \to \infty} \Pr[\text{number of nodes in } GC(N, L_x) = N-k] = \frac{\left(e^{-2x}\right)^k e^{-e^{-2x}}}{k!} \tag{15.18}$$
If $k = 0$, then all nodes belong to the giant component and the graph is completely connected, in which case (15.18) leads to (15.16).
The total number of graphs $G_r(N, L_x)$ with $k \geq 1$ isolated nodes equals $\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}$, the number of ways in which $k$ isolated nodes can be chosen out of the total of $N$ nodes multiplied by the number of graphs that can be constructed with $N-k$ nodes and $L_x$ links. Observe that this total number also includes those graphs where not all the $N-k$ nodes are necessarily connected. In other words, this total number includes the graphs that do not possess property $A_k$. The total number of graphs $T_0$ without isolated node follows from the inclusion-exclusion formula (2.10) as
$$T_0(N, L_x) = \sum_{k=0}^{N} (-1)^k \binom{N}{k} \binom{\binom{N-k}{2}}{L_x}$$
where the index $k = 0$ equals the total number of graphs with $N$ nodes and $L_x$ links, i.e. the total number of elements in the sample space. Evidently, the total number $C(N, L_x)$ of connected random graphs of the class $G_r(N, L_x)$ is smaller than $T_0(N, L_x)$ because all of them must obey property $A_0$ as well. Since$^{9}$
$$\lim_{N \to \infty} \frac{\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{\left(e^{-2x}\right)^k}{k!}$$
we obtain
$$\lim_{N \to \infty} \frac{T_0(N, L_x)}{\binom{\binom{N}{2}}{L_x}} = \sum_{k=0}^{\infty} (-1)^k \frac{\left(e^{-2x}\right)^k}{k!} = e^{-e^{-2x}}$$
But, if $N \to \infty$, the difference
$$\frac{T_0(N, L_x) - C(N, L_x)}{\binom{\binom{N}{2}}{L_x}} \leq \Pr[A_0^c] \to 0$$
which demonstrates (15.18) for $k = 0$. The remaining case for $k > 0$ in (15.18) follows from the observation that the number of graphs in $G_r(N, L_x)$ with property $A_k$ is equal to $\binom{N}{k}$ multiplied by the number of connected graphs with $N-k$ nodes and $L_x$ links, which is approximately
$$\frac{\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} \cdot \frac{T_0(N-k, L_x)}{\binom{\binom{N-k}{2}}{L_x}} \to \frac{\left(e^{-2x}\right)^k}{k!} e^{-e^{-2x}}$$
where the limit gives the correct result because the small difference between the total number and that without property $A_k$ tends to zero.
$^{9}$ It is convenient to take the logarithm of
$$t_k = \frac{\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{1}{k!} \prod_{j=0}^{k-1} (N-j) \prod_{j=0}^{L_x-1} \frac{\binom{N-k}{2} - j}{\binom{N}{2} - j} = \frac{N^k}{k!} \prod_{j=0}^{k-1} \left(1 - \frac{j}{N}\right) \left(1 - \frac{k}{N}\right)^{L_x} \left(1 - \frac{k}{N-1}\right)^{L_x} \prod_{j=0}^{L_x-1} \frac{1 - \frac{2j}{(N-k)(N-1-k)}}{1 - \frac{2j}{N(N-1)}}$$
which is
$$\log(k!\, t_k) = k \log N + \sum_{j=0}^{k-1} \log\left(1 - \frac{j}{N}\right) + L_x \left[\log\left(1 - \frac{k}{N}\right) + \log\left(1 - \frac{k}{N-1}\right)\right] + \sum_{j=0}^{L_x-1} \left[\log\left(1 - \frac{2j}{(N-k)(N-1-k)}\right) - \log\left(1 - \frac{2j}{N(N-1)}\right)\right]$$
For large $N$ and using the expansion $\log(1-z) = -z + O\left(z^2\right)$, we have for fixed $k$ with
$$\log\left(1 - \frac{2j}{(N-k)(N-1-k)}\right) = \log\left(1 - \frac{2j}{N(N-1)}\right) + O\left(j N^{-3}\right)$$
that
$$\log(k!\, t_k) = k \log N + O\left(N^{-1}\right) - L_x \frac{2k}{N} + O\left(L_x^2 N^{-3}\right)$$
In order to have a finite limit $\lim_{N \to \infty} \log(k!\, t_k) = c \in \mathbb{R}$, we must require that $k \log N - L_x \frac{2k}{N} = c$, which implies that $L_x = \frac{N}{2} \log N - \frac{N}{2k} c$. For this scaling the order term $O\left(L_x^2 N^{-3}\right)$ indeed vanishes if $N \to \infty$. By choosing $x = -\frac{c}{2k}$, we arrive at the correct scaling of $L_x = \frac{1}{2} N \log N + xN$ postulated above and $c = -2kx$.
15.6.3 Connectivity and degree
There is an interesting relation between the connectivity of a graph, a global property, and the degree $D$ of an arbitrary node, a local property. The implication $\{G \text{ is connected}\} \Longrightarrow \{D_{\min} \geq 1\}$ where $D_{\min} = \min_{\text{all nodes} \in G} D$ is always true. The opposite implication is not always true, however, because a network can consist of separate, disconnected clusters containing nodes each with minimum degree larger than 1. A random graph can be generated from a set of $N$ labelled nodes by randomly assigning a link with probability $p$ to each pair of nodes. During this construction process, initially separate clusters originate, but at a certain moment, one of those clusters starts dominating (and swallowing) the other clusters. This largest cluster becomes the giant component. For large $N$ and a certain $p_N$ which depends on $N$, the implication $\{D_{\min} \geq 1\} \Longrightarrow \{G_p(N) \text{ is connected}\}$ is almost surely (a.s.) correct. A rigorous mathematical proof is fairly complex and omitted. Thus, for large random graphs $G_p(N)$ the equivalence $\{G_p(N) \text{ is connected}\} \Longleftrightarrow \{D_{\min} \geq 1\}$ holds almost surely such that
$$\Pr[G_p(N) \text{ is connected}] = \Pr[D_{\min} \geq 1] \quad \text{a.s.}$$
From (3.32) and (15.11), we have that
$$\Pr[D_{\min} \geq 1] = \left(\Pr[D_{rg} \geq 1]\right)^N = \left(1 - \Pr[D_{rg} = 0]\right)^N = \left(1 - (1-p)^{N-1}\right)^N$$
which shows that $\Pr[D_{\min} \geq 1]$ rapidly tends to one for fixed $0 < p < 1$ and large $N$. Therefore, the asymptotic behavior of $\Pr[G_p(N) \text{ is connected}]$ requires the investigation of the influence of $p$ as a function of $N$,
$$\Pr[G_p(N) \text{ is connected}] = \exp\left(N \log\left(1 - (1-p_N)^{N-1}\right)\right) = \exp\left(-N \sum_{j=1}^{\infty} \frac{(1-p_N)^{j(N-1)}}{j}\right) = e^{-N (1-p_N)^{N-1}} \exp\left(-N \sum_{j=2}^{\infty} \frac{(1-p_N)^{(N-1)j}}{j}\right)$$
If we denote $c_N \triangleq N \cdot (1-p_N)^{N-1}$, then
$$N \sum_{j=2}^{\infty} \frac{(1-p_N)^{(N-1)j}}{j} = \sum_{j=2}^{\infty} \frac{c_N^j}{j N^{j-1}}$$
can be made arbitrarily small for large $N$ provided we choose $c_N = O\left(N^{\beta}\right)$ with $\beta < \frac{1}{2}$. Thus, for large $N$, we have that
$$\Pr[G_p(N) \text{ is connected}] = e^{-c_N} \left(1 + O\left(N^{2\beta-1}\right)\right)$$
which tends to 0 for $0 < \beta < \frac{1}{2}$ and to 1 for $\beta < 0$. Hence, the critical exponent where a sharp transition occurs is $\beta = 0$. In that case, $c_N = c$ (a real positive constant) and
$$p_N = 1 - \exp\left(-\frac{\log \frac{N}{c}}{N-1}\right) = \frac{\log N}{N} + O\left(\frac{\log c}{N}\right)$$
In summary, for large $N$,
$$\Pr[G_p(N) \text{ is connected}] \to \begin{cases} 0 & \text{if } p < \frac{\log N}{N} \\ 1 & \text{if } p > \frac{\log N}{N} \end{cases} \tag{15.19}$$
with a transition region around $p_c \sim \frac{\log N}{N}$ with a width of $O\left(\frac{1}{N}\right)$. Notice the agreement with (15.15) where $p_x = \frac{L_x}{L_{\max}} \simeq \frac{\log N}{N} + \frac{2x}{N}$: for large $x < 0$, $e^{-e^{-2x}} \to 0$, while for large $x > 0$, $e^{-e^{-2x}} \to 1$ and the width of the transition region for the link density $p$ is $O\left(\frac{1}{N}\right)$.
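The sharpness of the transition (15.19) is visible already in the closed form $\Pr[D_{\min} \geq 1] = \left(1 - (1-p)^{N-1}\right)^N$. The short Python sketch below evaluates it around $p_c = \frac{\log N}{N}$; the function name and the chosen value of $N$ are illustrative assumptions.

```python
import math

def prob_min_degree_positive(n, p):
    """Pr[D_min >= 1] = (1 - (1 - p)^(N - 1))^N, the large-N proxy for
    the probability that G_p(N) is connected."""
    return (1.0 - (1.0 - p) ** (n - 1)) ** n

n_nodes = 10_000
p_c = math.log(n_nodes) / n_nodes       # critical link density log N / N
for factor in (0.5, 0.9, 1.0, 1.1, 2.0):
    print(factor, prob_min_degree_positive(n_nodes, factor * p_c))
```

At $p = p_c$ we have $c_N \approx 1$, so the probability is close to $e^{-1}$, while halving or doubling $p$ pushes it close to 0 or 1 respectively.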
15.6.4 Size of the giant component
Let $S = \Pr[n \in C]$ denote the probability that a node $n$ in $G_p(N)$ belongs to the giant component $C$. If $n \notin C$, then none of the neighbors of node $n$ belongs to the giant component. The number of neighbors of a node $n$ is the degree $d_n$ of a node such that
$$\Pr[n \notin C] = \Pr[\text{all neighbors of } n \notin C] = \sum_{k \geq 0} \Pr[\text{all } k \text{ neighbors of } n \notin C \mid d_n = k] \Pr[d_n = k]$$
Since in $G_p(N)$ all neighbors of $n$ are independent$^{10}$, the conditional probability becomes, with $1 - S = \Pr[n \notin C]$,
$$\Pr[\text{all } k \text{ neighbors of } n \notin C \mid d_n = k] = \left(\Pr[n \notin C]\right)^k = (1-S)^k$$
Moreover, this probability holds for any node $n \in G_p(N)$ such that, writing the random variable $D_{rg}$ instead of an instance $d_n$,
$$1 - S = \sum_{k=0}^{\infty} (1-S)^k \Pr[D_{rg} = k] = \varphi_{D_{rg}}(1-S)$$
where $\varphi_{D_{rg}}(u) = E\left[u^{D_{rg}}\right]$ is the generating function of the degree $D_{rg}$ in $G_p(N)$. For large $N$, the degree distribution in $G_p(N)$ is Poisson distributed with mean degree $\mu_{rg} = p(N-1)$ and $\varphi_{D_{rg}}(u) \simeq e^{\mu_{rg}(u-1)}$. For large $N$, the fraction $S$ of nodes in the giant component in the random graph satisfies an equation similar to that in (12.13) of the extinction probability in a branching process,
$$S = 1 - e^{-\mu_{rg} S} \tag{15.20}$$
and the average size of the giant component is $NS$. For $\mu_{rg} < 1$ the only solution is $S = 0$ whereas for $\mu_{rg} > 1$ there is a non-zero solution for the size of the giant component. The solution can be expressed as a Lagrange series using (5.34),
$$S(\mu_{rg}) = 1 - e^{-\mu_{rg}} \sum_{n=0}^{\infty} \frac{(n+1)^n}{(n+1)!} \left(\mu_{rg} e^{-\mu_{rg}}\right)^n \tag{15.21}$$
By reversing (15.20), the average degree in the random graph can be expressed in terms of the fraction $S$ of nodes in the giant component,
$$\mu_{rg}(S) = -\frac{\log(1-S)}{S} \tag{15.22}$$
$^{10}$ This argument is not valid, for example, for a two-dimensional lattice $\mathbb{Z}_p^2$ in which each link between adjacent nodes at integer value coordinates in the plane exists with probability $p$. The critical link density for connectivity in $\mathbb{Z}_p^2$ is $p_c = \frac{1}{2}$, a famous result proved in the theory of percolation (see, for example, Grimmett (1989)).
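The three relations (15.20), (15.21) and (15.22) can be cross-checked numerically. The Python sketch below is illustrative: the function names, the iteration count and the truncation of the Lagrange series are assumptions, and the fixed-point iteration is started at $S = 1$ so that it converges to the non-zero root when the mean degree exceeds 1.

```python
import math

def giant_fraction(mu, iterations=10_000):
    """Solve S = 1 - exp(-mu S), equation (15.20), by fixed-point iteration."""
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-mu * s)
    return s

def giant_fraction_series(mu, terms=200):
    """Lagrange series (15.21), truncated after `terms` terms."""
    x = mu * math.exp(-mu)
    total = 0.0
    for n in range(terms):
        # (n+1)^n / (n+1)! * x^n, evaluated through logarithms to avoid overflow
        log_term = n * math.log(n + 1) - math.lgamma(n + 2) + n * math.log(x)
        total += math.exp(log_term)
    return 1.0 - math.exp(-mu) * total

def mean_degree(s):
    """Inverse relation (15.22): mean degree as a function of S."""
    return -math.log(1.0 - s) / s

for mu in (0.5, 1.5, 2.0, 4.0):
    print(mu, giant_fraction(mu), giant_fraction_series(mu))
```

For a mean degree below 1 both routines return (numerically) zero, and for a mean degree of 2 roughly 80% of the nodes belong to the giant component.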
15.7 The hopcount in a large, sparse graph with unit link weights
Routers in the Internet forward IP packets to the next hop router, which is found by routing protocols (such as OSPF and BGP). Intra-domain routing such as OSPF is based on the Dijkstra shortest path algorithm, while inter-domain routing with BGP is policy-based, which implies that BGP does not minimize a length criterion. Nevertheless, end-to-end paths in the Internet are shortest paths in roughly 70% of the cases. Therefore, we consider the shortest path between two arbitrary nodes because (a) the IP address does not reflect a precise geographical location and (b) uniformly distributed world wide communication, especially on the web, seems natural since the information stored in servers can be located in places unexpected and unknown to browsing users. The Internet type of communication is different from classical telephony because (a) telephone numbers have a direct binding with a physical location and (b) the intensity of average human interaction rapidly decreases with distance. We prefer to study the hopcount $H_N$ because it is simple to measure via the trace-route utility, it is an integer, dimensionless, and the quality of service (QoS) measures (such as packet delay, jitter and packet loss) depend on the hopcount, the number of traversed routers. In this section, we first investigate the hopcount in a sparse, but connected graph where all links have unit weight. Chapter 16 treats graphs with other link weight structures.
15.7.1 Bi-directional search
The basic idea of a bi-directional search to find the shortest path is to start the discovery process (e.g. using Dijkstra's algorithm) from $A$ and $B$ simultaneously. When both subsections from $A$ and from $B$ meet, the concatenation forms the shortest path from $A$ to $B$. In case all link weights are equal, $w(i \to j) = 1$ for any link $i \to j$ in a graph $G$, the shortest path from $A$ to $B$ is found when the discovery process from $A$ and that from $B$ have precisely one node of the graph in common.
Denote by $C_A(l)$, respectively $C_B(l)$, the set of nodes that can be reached from $A$, respectively $B$, in $l$ or less hops. We define $C_A(0) = \{A\}$ and $C_B(0) = \{B\}$. The hopcount is larger than $2l$ if and only if $C_A(l) \cap C_B(l)$ is empty. Conditionally on $|C_A(l)| = n_A$, respectively $|C_B(l)| = n_B$, the sets $C_A(l)$ and $C_B(l)$ do not possess a common node with probability
$$\Pr\left[C_A(l) \cap C_B(l) = \emptyset \mid |C_A(l)| = n_A, |C_B(l)| = n_B\right] = \frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}}$$
which consists of the ratio of all combinations in which the $n_B$ nodes around $B$ can be chosen out of the remaining nodes that do not belong to the set $C_A$ over all combinations in which $n_B$ nodes can be chosen in the graph with $N$ nodes except for node $A$. Furthermore,
$$\frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \frac{(N-n_A-1)(N-n_A-2) \cdots (N-n_A-n_B)}{(N-1)(N-2) \cdots (N-n_B)} = \frac{\left(1 - \frac{n_A+1}{N}\right)\left(1 - \frac{n_A+2}{N}\right) \cdots \left(1 - \frac{n_A+n_B}{N}\right)}{\left(1 - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right) \cdots \left(1 - \frac{n_B}{N}\right)}$$
For large $N$, we apply the Taylor series around $x = 0$ of $\log(1-x) = -x - \sum_{j=2}^{\infty} \frac{x^j}{j}$,
$$\log \frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \sum_{k=1}^{n_B} \left[\log\left(1 - \frac{n_A+k}{N}\right) - \log\left(1 - \frac{k}{N}\right)\right] = \sum_{k=1}^{n_B} \left(\frac{k}{N} - \frac{n_A+k}{N}\right) - \sum_{j=2}^{\infty} \frac{1}{j N^j} \sum_{k=1}^{n_B} \left((n_A+k)^j - k^j\right) = -\frac{n_A n_B}{N} - \left(\frac{n_A n_B}{N}\right)^2 \left(\frac{1}{2 n_A} + \frac{1}{2 n_B} + \frac{1}{2 n_A n_B}\right) - R$$
where the remainder is
$$R = \sum_{j=3}^{\infty} \frac{1}{j N^j} \sum_{m=0}^{j-1} \binom{j}{m} n_A^{j-m} \sum_{k=1}^{n_B} k^m = O\left(\left(\frac{n_A n_B}{N}\right)^3\right)$$
After exponentiation
$$\Pr\left[H_N > 2l \mid |C_A(l)| = n_A, |C_B(l)| = n_B\right] = e^{-\frac{n_A n_B}{N}} \left(1 + O\left(\left(\frac{n_A n_B}{N}\right)^2\right)\right)$$
By the law of total probability (2.47) and up to $O\left(\frac{E\left[|C_A(l)|^2 |C_B(l)|^2\right]}{N^2}\right)$ for large $N$, we obtain
$$\Pr[H_N > 2l] \simeq E\left[\exp\left(-\frac{|C_A(l)|\, |C_B(l)|}{N}\right)\right] \tag{15.23}$$
This probability (15.23) holds for any large graph with a unit link weight structure provided $E\left[|C_A(l)|^2 |C_B(l)|^2\right] = o\left(N^2\right)$. Formula (15.23) becomes increasingly accurate for decreasing $|C_A(l)|$ and $|C_B(l)|$, and so for sparser large graphs.
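The quality of the exponential approximation to the ratio of binomials is easily checked with exact integer arithmetic. The Python sketch below is an illustration with hypothetical function names and example sizes; it compares the exact ratio, the leading term $e^{-n_A n_B / N}$ and the second-order refinement from the expansion above.

```python
import math

def prob_disjoint_exact(n, n_a, n_b):
    """Exact ratio binom(N-1-n_A, n_B) / binom(N-1, n_B)."""
    return math.comb(n - 1 - n_a, n_b) / math.comb(n - 1, n_b)

def prob_disjoint_approx(n, n_a, n_b):
    """Leading-order approximation exp(-n_A n_B / N)."""
    return math.exp(-n_a * n_b / n)

def prob_disjoint_second_order(n, n_a, n_b):
    """Approximation that keeps the second-order term of the expansion."""
    x = n_a * n_b / n
    correction = 1 / (2 * n_a) + 1 / (2 * n_b) + 1 / (2 * n_a * n_b)
    return math.exp(-x - x * x * correction)

n_nodes, size_a, size_b = 10_000, 100, 100
exact = prob_disjoint_exact(n_nodes, size_a, size_b)
first = prob_disjoint_approx(n_nodes, size_a, size_b)
second = prob_disjoint_second_order(n_nodes, size_a, size_b)
print(exact, first, second)
```

With these example sizes $n_A n_B / N = 1$, and the leading term already agrees with the exact ratio to about one percent.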
15.7.2 Sparse large graphs and a branching process
In order to proceed, the number of nodes in the sets FD (o) and FE (o) needs to
be determined, which is di!cult in general. Therefore, we concentrate here
on a special class of graphs in which the discovery process from D and E is
reasonably well modeled by a branching process (Chapter 12). A branching
process evolves from a given set FD (o 1) in the next o-th discovery cycle
(or generation) to the set FD (o) by including only new nodes, not those
previously discovered. The application of a branching process implies that
the newly discovered nodes do not possess links to any previously discovered
node of FD (o 1) except for its parent node in FD (o 1). Hence, only for
large and sparse graphs or tree-like graphs, this assumption can be justified,
provided that the number of links that point backwards to early discovered
nodes in FD (o 1) is negligibly small.
Assuming that a branching process models the discovery process well, we will compute the number of nodes that can be reached from $A$, and similarly from $B$, in $l$ hops from a branching process with production $Y$ specified by the degree distribution of the nodes in the graph. The additional number of nodes $X_l$ discovered during the $l$-th cycle of a branching process that are included in the set $C_A(l)$ is described by the basic law (12.1). Thus, $|C_A(l)| = \sum_{k=0}^{l} X_k$ with $X_0 = 1$ (namely node $A$). In terms of the scaled random variable $W_k = \frac{X_k}{\mu^k}$ with unit mean $E[W_k] = 1$,
$$ |C_A(l)| = \sum_{k=0}^{l} W_k \mu^k $$
where $\mu = E[Y] - 1 > 1$ denotes the average degree minus 1, i.e. the outdegree, in the graph. Only the root has a production whose mean equals the mean degree $E[Y]$. Since $E[W_k] = 1$, the average size of the set of nodes reached from $A$ in $l$ hops immediately follows as
$$ E[|C_A(l)|] = \sum_{k=0}^{l} \mu^k = \frac{\mu^{l+1} - 1}{\mu - 1} $$
which equally holds for $E[|C_B(l)|]$.
Applying Jensen's inequality (5.5) to (15.23) yields
$$ E\left[\exp\left(-\frac{|C_A(l)|\,|C_B(l)|}{N}\right)\right] \geq \exp\left(-\frac{E[|C_A(l)|]\,E[|C_B(l)|]}{N}\right) $$
such that
$$ \Pr[H_N > k] \gtrsim \exp\left(-\frac{\mu^{k+2}}{N(\mu-1)^2}\right) $$
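With the tail probability expression (2.36) for the average, we arrive at the lower bound for the expected hopcount in large graphs,
$$ E[H_N] = \sum_{k=0}^{\infty} \Pr[H_N > k] \gtrsim \sum_{k=0}^{\infty} \exp\left(-\frac{\mu^{k+2}}{N(\mu-1)^2}\right) $$
The sum $S_1(t) = \sum_{k=0}^{\infty} \exp\left(-\mu^k t\right)$ can be evaluated exactly¹¹ as
$$ S_1(t) = \frac{1}{2} - \frac{\log t + \gamma}{\log \mu} + \frac{2}{\sqrt{\log \mu}} \sum_{k=1}^{\infty} \frac{\cos\left[\frac{2k\pi}{\log \mu}\log t - \arg \Gamma\left(\frac{2k\pi i}{\log \mu}\right)\right]}{\sqrt{2k \sinh\frac{2k\pi^2}{\log \mu}}} + \sum_{k=1}^{\infty}\left(1 - e^{-\mu^{-k} t}\right) $$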
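Furthermore,
$$ \left|\sum_{k=1}^{\infty} \frac{\cos\left[\frac{2k\pi}{\log \mu}\log t - \arg \Gamma\left(\frac{2k\pi i}{\log \mu}\right)\right]}{\sqrt{2k \sinh\frac{2k\pi^2}{\log \mu}}}\right| \leq \sum_{k=1}^{\infty} \frac{1}{\sqrt{2k \sinh\frac{2k\pi^2}{\log \mu}}} = b(\mu) $$
and the function $T(\mu) = \frac{2 b(\mu)}{\sqrt{\log \mu}}$ is increasing, but for $1 < \mu \leq 5$ its maximum value $T(5)$ is smaller than 0.0035. Since $t = \frac{\mu^2}{N(\mu-1)^2}$ is small and $\mu > 1$, we approximate
$$ S_1(t) \approx \frac{1}{2} - \frac{\log t + \gamma}{\log \mu} \tag{15.24} $$
and arrive, for large $N$, at
$$ E[H_N] \gtrsim \frac{\log N}{\log \mu} + \frac{1}{2} - \frac{\log \frac{\mu^2}{(\mu-1)^2} + \gamma}{\log \mu} $$
This shows that in large, sparse graphs for which the discovery process is well modeled by a branching process, it holds that $E[H_N]$ scales as $\frac{\log N}{\log \mu}$ where $\mu = E[Y] - 1 > 1$ is the average degree minus 1 in the graph.

¹¹ For $\mathrm{Re}(s) > 0$ and $\mathrm{Re}(p) > 0$, we have
$$ \frac{\Gamma(s)}{p^s} = \int_0^{\infty} t^{s-1} e^{-pt}\,dt $$
and
$$ \Gamma(s) \sum_{k=0}^{\infty} \frac{1}{\mu^{ks}} = \int_0^{\infty} t^{s-1} \sum_{k=0}^{\infty} e^{-\mu^k t}\,dt $$
or
$$ \int_0^{\infty} t^{s-1} S_1(t)\,dt = \Gamma(s)\,\frac{\mu^s}{\mu^s - 1} $$
By Mellin inversion, for $c > 0$,
$$ S_1(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\Gamma(s)\,\mu^s}{(\mu^s - 1)\, t^s}\,ds $$
By moving the line of integration to the left, we encounter a double pole at $s = 0$ from $\Gamma(s)$ and $\frac{1}{\mu^s - 1}$ and simple poles at $s = \frac{2k\pi i}{\log \mu}$ (with $k \neq 0$) from $\frac{1}{\mu^s - 1}$. Invoking Cauchy's residue theorem leads to the result.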
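The quality of the approximation (15.24) can be checked numerically by summing $S_1(t)$ term by term and comparing with the two leading terms. The following is a minimal sketch, not part of the original analysis: the function names and the example values $\mu = 3$ and $N = 10^6$ are assumptions chosen for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def s1_direct(t, mu, max_terms=2000):
    """Evaluate S_1(t) = sum_{k>=0} exp(-mu**k * t) by direct summation."""
    total = 0.0
    power = 1.0  # holds mu**k
    for _ in range(max_terms):
        term = math.exp(-power * t)
        total += term
        if term == 0.0:  # all remaining terms underflow to zero
            break
        power *= mu
    return total


def s1_approx(t, mu):
    """Leading terms of S_1(t), as in (15.24)."""
    return 0.5 - (math.log(t) + EULER_GAMMA) / math.log(mu)


mu, n_nodes = 3.0, 10**6  # illustrative values (assumed)
t = mu**2 / (n_nodes * (mu - 1) ** 2)
lower_bound = s1_direct(t, mu)  # the lower bound on E[H_N]
approx = s1_approx(t, mu)
```

For these values the two numbers should differ by far less than one hop, in line with the bound $T(5) < 0.0035$ on the neglected oscillating term.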
We can refine the above analysis. Let us now assume that the convergence of $W_k \to W$ is sufficiently fast for large $N$ and that $W > 0$ such that
$$ |C_A(l)| \approx W_A \sum_{k=0}^{l} \mu^k = W_A\,\frac{\mu^{l+1} - 1}{\mu - 1} \approx W_A\,\frac{\mu^{l+1}}{\mu - 1} $$
is a good approximation (and similarly for $|C_B(l)|$). The verification of this approximation is difficult in general. Theorem 12.3.2 states that $\Pr[W = 0] = \pi_0$ and equivalently $\Pr[W > 0] = 1 - \pi_0$ where the extinction probability $\pi_0$ obeys the equation (12.13). Using this approximation, we find from (15.23)
$$ \Pr[H_N > 2l] \approx E\left[\left.\exp\left(-\frac{W_A W_B\,\mu^{2l+2}}{N(\mu-1)^2}\right)\,\right|\, W_A, W_B > 0\right] $$
where the condition on $W > 0$ is required, else there are no clusters $C_A(l)$ and $C_B(l)$ nor a path. Since the same asymptotics also holds for odd values of the hopcount, we finally arrive, for $k \geq 1$ and large $N$, at
$$ \Pr[H_N > k] \approx E\left[\left.\exp\left(-Z\mu^k\right)\,\right|\, W_A, W_B > 0\right] $$
where the random variable
$$ Z = \frac{\mu^2\, W_A W_B}{N(\mu-1)^2} $$
and $W_A \stackrel{d}{=} W_B \stackrel{d}{=} W$. A more explicit computation of $\Pr[H_N > k]$ requires the knowledge of the limit random variable $W$, which strongly depends on the nodal degree $Y$.
The average hopcount $E[H_N]$ is found similarly as in the analysis above by using (15.24) with $t = Z$,
$$ E[H_N] \approx E\left[\,S_1(Z) \mid W_A, W_B > 0\right] $$
$$ = E\left[\left.\frac{1}{2} - \frac{2\log W - \log N + 2\log\frac{\mu}{\mu-1} + \gamma}{\log \mu}\,\right|\, W > 0\right] $$
$$ = \frac{1}{2} + \frac{\log N - 2\log\frac{\mu}{\mu-1} - \gamma}{\log \mu} - \frac{2\,E[\log W \mid W > 0]}{\log \mu} $$
In sparse graphs with average degree $E[Y]$ equal to $\mu$ and for a large number of nodes $N$, the average hopcount is well approximated¹² by
$$ E[H_N] \approx \frac{\log N}{\log \mu} + \frac{1}{2} - \frac{\gamma + 2\log\frac{\mu}{\mu-1}}{\log \mu} - 2\,\frac{E[\log W \mid W > 0]}{\log \mu} \tag{15.25} $$
This expression (15.25) for the average hopcount, which is more refined than the commonly used estimate $E[H_N] \approx \frac{\log N}{\log \mu}$, contains the curious average $E[\log W \mid W > 0]$ where $W$ is the limit random variable of the branching process produced by the graph's degree distribution $Y$.
Application to $G_p(N)$. The above analysis holds for fixed $E[Y] = p(N-1)$ such that, for large $N$, we require that $p = \frac{\mu}{N}$ where $\mu$ is approximately equal to the average degree. Since the binomial distribution (15.11) for the degree in $G_p(N)$ is very well approximated by the Poisson distribution $\Pr[D_{rg} = k] \approx \frac{\mu^k}{k!} e^{-\mu}$ for large $N$ and constant $\mu$, formula (15.25) requires the computation of $E[\log W \mid W > 0]$ in a Poisson branching process, which is presented in Hooghiemstra and Van Mieghem (2005) but here summarized in Fig. 15.8. The numerical evaluation of the average hopcount (15.25) in a random graph of the class $G_p(N)$ for small average degree $\mu$ and large $N$ shows that (15.25) is much more accurate than only its first term $\frac{\log N}{\log \mu}$.

[Figure 15.8: plot of $E[\log W \mid W > 0]$ (vertical axis, from -0.2 to 1.2) versus the average degree $\mu$ (horizontal axis, from 1 to 10).]
Fig. 15.8. The quantity $E[\log W \mid W > 0]$ of a Poisson branching process versus the average degree $\mu$.

¹² A more rigorous derivation that stochastically couples the graph's growth specified by a certain degree distribution to a corresponding branching process is found in van der Hofstad et al. (2005). In particular, the analysis is shown to be valid for any randomly constructed graph with a finite variance of the degree. More details on the result for the average hopcount are presented in Hooghiemstra and Van Mieghem (2005).
At the other end of the scale, for a constant link density $p = c < 1$, which corresponds to an average degree $E[Y] = c(N-1)$, the above analysis no longer applies because the average degree $E[Y]$ is too large. Fortunately, in that case, an exact asymptotic analysis is possible (see Problem (iii)):
$$ \Pr[H_N = 1] = p $$
$$ \Pr[H_N = 2] = (1-p)\left(1 - (1-p^2)^{N-2}\right) \tag{15.26} $$
Values of $H_N$ higher than 2 are extremely unlikely since $\Pr[H_N > 2] = (1-p)\left[1-p^2\right]^{N-2}$ tends to zero rapidly for sufficiently large $N$. Hence, $E[H_N] \simeq \Pr[H_N = 1] + 2\Pr[H_N = 2] \simeq 2 - p$ and, similarly, we find $\mathrm{Var}[H_N] \simeq p(1-p)$. This asymptotic analysis even holds for a wider link density regime $p = cN^{-\frac{1}{2}+\epsilon}$ with $\epsilon > 0$ because
$$ \lim_{N\to\infty} \Pr[H_N > 2] = \lim_{N\to\infty} \left(1 - cN^{-\frac{1}{2}+\epsilon}\right)\left[1 - c^2 N^{-1+2\epsilon}\right]^{N-2} = 0 $$
but for $\epsilon = 0$, it holds that $\lim_{N\to\infty} \Pr[H_N > 2] = e^{-c^2} > 0$.
In summary, if the link density $p$ scales as $p = cN^{-\alpha}$ with $\alpha \in [0, \frac{1}{2})$, the average hopcount $E[H_N] \simeq 2 - p$ is constant and very small. If $p = \mu N^{-1}$ (the case $\alpha = 1$), equation (15.25) shows that $E[H_N]$ scales as $\log N$. The regime in between, for $\alpha \in [\frac{1}{2}, 1)$, needs other analysis techniques.
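The dense-regime formulas in (15.26) are simple enough to evaluate directly. The sketch below is an illustration only; the function name and the example values $N = 200$ and $p = 0.3$ are assumptions, and the mean is computed by neglecting hopcounts above 2, as in the text.

```python
def dense_hopcount(n_nodes, p):
    """Hopcount probabilities in G_p(N) for constant link density p, following (15.26)."""
    pr1 = p                                          # direct link exists
    pr_gt2 = (1 - p) * (1 - p * p) ** (n_nodes - 2)  # no direct link, no 2-hop path
    pr2 = (1 - p) - pr_gt2
    mean_approx = pr1 + 2 * pr2                      # neglects hopcounts above 2
    return pr1, pr2, pr_gt2, mean_approx


pr1, pr2, pr_gt2, mean_approx = dense_hopcount(200, 0.3)
```

For such values the probability of a hopcount above 2 is already negligible, so that `mean_approx` is practically equal to $2 - p$.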
15.8 Problems
(i) An extremely regular graph is a $d$-lattice where each nodal position corresponds to a point with integer coordinates within a $d$-dimensional hyper-cube with size $Z$. Apart from border nodes, each node has a constant degree (number of neighbors), precisely equal to $2d$. Assuming that all link metrics are equal to one, compute the probability generating function of the hopcount of the shortest path between two uniformly chosen points.
(ii) If $c_{G_p(N)}$ is the clustering coefficient of the random graph $G_p(N)$, then compute $\Pr\left[c_{G_p(N)} \leq x\right]$ and $E\left[c_{G_p(N)}\right]$.
(iii) Derive (15.26) in $G_p(N)$ with unit link weights.
16
The Shortest Path Problem
The shortest path problem asks for the computation of the path from a source to a destination node that minimizes the sum of the positive weights¹ of its constituent links. The related shortest path tree (SPT) is the union of the shortest paths from a source node to a set of $m$ other nodes in the graph with $N$ nodes. If $m = N - 1$, the SPT connects all nodes and is termed a spanning tree. The SPT belongs to the fundamentals of graph theory and has many applications. Moreover, powerful shortest path algorithms like that of Dijkstra exist. Section 15.7 studied the hopcount, the number of hops (links) in the shortest path, in sparse graphs with unit link weights. In this chapter, the influence of the link weight structure on the properties of the SPT will be analyzed. Starting from one of the simplest possible graph models, the complete graph with i.i.d. exponential link weights, the characteristics of the shortest path will be derived and compared to Internet measurements.
The link weights seriously impact the path properties in QoS routing (Kuipers and Van Mieghem, 2003). In addition, from a traffic engineering perspective, an ISP may want to tune the weight of each link such that the resulting shortest paths between a particular set of ingresses and egresses follow the desirable routes in its network. Thus, apart from the topology of the graph, the link weight structure clearly plays an important role. Often, as in the Internet or other large infrastructures, both the topology and the link weight structure are not accurately known. This uncertainty about the precise structure leads us to consider both the underlying graph and each of the link weights as random variables.

¹ A zero link weight is regarded as the coincidence of two nodes (which we exclude), while an infinite link weight means the absence of a link.
16.1 The shortest path and the link weight structure
Since the shortest path is mainly sensitive to the smaller, positive link weights, the probability distribution of the link weights around zero will dominantly influence the properties of the resulting shortest path. A regular link weight distribution $F_w(x) = \Pr[w \leq x]$ has a Taylor series expansion around $x = 0$,
$$ F_w(x) = f_w(0)\,x + O\left(x^2\right) $$
since $F_w(0) = 0$ and $F_w'(0) = f_w(0)$ exists. A regular link weight distribution is thus linear around zero. The factor $f_w(0)$ only scales all link weights, but does not influence the shortest path. The simplest distribution of the link weight $w$ with a distinctly different behavior for small values is the polynomial distribution
$$ F_w(x) = x^{\alpha}\, 1_{x \in [0,1]} + 1_{x \in (1,\infty)}, \qquad \alpha > 0 \tag{16.1} $$
The corresponding density is $f_w(x) = \alpha x^{\alpha-1}\, 1_{x \in [0,1]}$. The exponent
$$ \alpha = \lim_{x \downarrow 0} \frac{\log F_w(x)}{\log x} $$
is called the extreme value index of the probability distribution of $w$ and $\alpha = 1$ for regular distributions. By varying the exponent $\alpha$ over all non-negative real values, any extreme value index can be attained and a large class of corresponding SPTs, in short $\alpha$-trees, can be generated.
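Link weights with the polynomial distribution (16.1) can be generated by inverting the distribution function: if $U$ is uniform on $[0,1]$, then $U^{1/\alpha}$ has distribution $x^{\alpha}$ on $[0,1]$. A minimal sketch follows; the function name, the seed and the choice $\alpha = 2$ are illustrative assumptions.

```python
import random


def sample_polynomial_weight(alpha, rng):
    """Draw a link weight w with F_w(x) = x**alpha on [0, 1] (inverse transform)."""
    return rng.random() ** (1.0 / alpha)


rng = random.Random(1)  # fixed seed, arbitrary choice
alpha = 2.0
weights = [sample_polynomial_weight(alpha, rng) for _ in range(20000)]

# Empirical check of F_w(0.5) = 0.5**alpha
frac_below_half = sum(w <= 0.5 for w in weights) / len(weights)
```

Smaller values of $\alpha$ push more probability mass towards zero, which is the region that matters for the shortest path.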
[Figure 16.1: schematic plot of $F_w(x)$ versus $x$ on $[0, 1]$ for three different $\alpha$-regimes, with the small region near zero of width $\epsilon$ marked and the remainder of the axis labeled "larger scale".]
Fig. 16.1. A schematic drawing of the distribution of the link weights for the three different $\alpha$-regimes. The shortest path problem is mainly sensitive to the small region around zero. The scaling invariant property of the shortest path allows us to divide all link weights by the largest possible such that $F_w(1) = 1$ for all link weight distributions.
Figure 16.1 illustrates schematically the probability distribution of the link weights around zero, in the interval $(0, \epsilon]$, where $\epsilon > 0$ is an arbitrarily small, positive real number. The larger link weights in the network will hardly appear in a shortest path provided the network possesses enough links. These larger link weights are drawn in Fig. 16.1 from the double dotted line to the right. The nice advantage that only small link weights dominantly influence the property of the resulting shortest path tree implies that the remainder of the link weight distribution (denoted by the arrow with larger scale in Fig. 16.1) only plays a second order role. To some extent, it also explains the success of the simple SPT model based on the complete graph $K_N$ with i.i.d. exponential link weights, which we derive in Section 16.2. A link weight structure effectively thins the complete graph $K_N$ (any other graph is a subgraph of $K_N$) to the extent that a specific shortest path tree can be constructed.
Finally, we assume the independence of link weights, which we deem a reasonable assumption in large networks, such as the Internet with its many independent autonomous systems (ASs). Apart from Section 16.7, we will mainly consider the case $\alpha = 1$, which allows an exact analysis.
16.2 The shortest path tree in $K_N$ with exponential link weights
16.2.1 The Markov discovery process
Let us consider the shortest path problem in the complete graph $K_N$, where each node in the graph is connected to each other node. The problem of finding the shortest path between two nodes $A$ and $B$ in $K_N$ with exponentially distributed link weights with mean 1 can be rephrased in terms of a Markov discovery process. The discovery process evolves as a function of time and stops at a random time $T$ when node $B$ is found. The process is shown in Fig. 16.2.
The evolution of the discovery process can be described by a continuous-time Markov chain $X(t)$, where $X(t)$ denotes the number of discovered nodes at time $t$, because the characteristics of a Markov chain (Theorem 10.2.3) are based on the exponential distribution and the memoryless property. Of particular interest here is the property (see Section 3.4.1) that the minimum of $n$ independent exponential variables each with parameter $\alpha_i$ is again an exponential variable with parameter $\sum_{i=1}^{n} \alpha_i$.
The discovery process starts at time $t = T_0$ with the source node $A$ and for the initial distribution of the Markov chain, we have $\Pr[X(T_0) = 1] = 1$. The state space of the continuous Markov chain is the set $S_N$ consisting of all positive integers (nodes) $n$ with $n \leq N$. For the complete graph $K_N$, the transition rates are given by
$$ \lambda_n = n(N - n), \qquad n \in S_N \tag{16.2} $$
Indeed, initially there is only the source node $A$ with label² 0, hence $n = 1$. From this first node $A$ precisely $N - 1$ new nodes can be reached in the complete graph $K_N$. Alternatively, one can say that $N - 1$ nodes are competing with each other, each with exponentially distributed strength, to be discovered and the winner amongst them, say $C$ with label 1, is the one reached in the shortest time, which corresponds to an exponential variable with rate $N - 1$.
[Figure 16.2: on the left, the Markov discovery process with equi-time circles at the discovery times $v_1, \ldots, v_8$ of the nodes labeled 0 to 8; on the right, the corresponding URT drawn per level $h = 0, 1, 2, 3$.]
Fig. 16.2. On the left, the Markov discovery process as function of time in a graph with $N = 9$ nodes. The circles centered at the discovering node $A$ with label 0 present equi-time lines and $v_k$ is the discovering time of the $k$-th node, while $\tau_k = v_k - v_{k-1}$ is the $k$-th interattachment time. The set of discovered nodes redrawn per level are shown on the right, where a level gives the number of hops $h$ from the source node $A$. The tree is a uniform recursive tree (URT).

² When continuous measures such as time and weight of a path are computed, the source node is most conveniently labeled by zero, whereas in counting processes, such as the number of hops of a path, the source node is labeled by one.
After having reached $C$ from $A$ at hitting time $v_1$, two nodes ($n = 2$) are found and the discovery process restarts from both $A$ and $C$. Although at time $v_1$ we had already progressed a certain distance towards each of the $N - 2$ other, not yet discovered, nodes, the memoryless property of the exponential distribution tells us that the remaining distance to these $N - 2$ nodes is again exponentially distributed with the same parameter 1. Hence, this allows us to restart the process from $A$ and $C$ by erasing the previously partial distance to any other not yet discovered node as if it had never been travelled. From the discovery time $v_1$ of the first node on, the discovery process has double strength to reach precisely $N - 2$ new nodes. Hence, the next winner, say $D$ labeled by 2, is reached at $v_2$ in the minimum time out of $2(N - 2)$ traveling times. This node $D$ has equal probability to be attached to $A$ or $C$ because of symmetry. When $D$ is attached to $A$ (the argument below holds similarly for attachment to $C$), symmetry appears to be broken, because $D$ and $C$ have only one link used, whereas $A$ has already two links used. However, since we are interested in the shortest path problem and since the direct link from $A$ to $D$ is shorter than the path $A \to C \to D$, we exclude the latter in the discovery process, hereby establishing again the full symmetry in the Markov chain. This exclusion also means that the Markov chain maintains single paths from $A$ to each newly discovered node and this path is also the shortest path. Hence, there are no cycles possible. Furthermore, similar to Dijkstra's shortest path algorithm, each newly reached node is withdrawn from the next competition round, which guarantees that the Markov chain eventually terminates. Besides terminating by extinction of all available nodes, after each transition when a new node is discovered, the Markov chain stops with probability equal to $\frac{1}{N-n}$, since each of the $n$ already discovered nodes has precisely 1 possibility out of the remaining $N - n$ to reach $B$ and only one of them is the discoverer. The stopping time $T$ is defined as the infimum for $t \geq 0$ at which the destination node $B$ is discovered. In summary, the described Markov discovery process, a pure birth process with birth rate $\lambda_n = n(N - n)$, models exactly the shortest path for all values of $N$.
16.2.2 The uniform recursive tree
A uniform recursive tree (URT) of size $N$ is a random tree rooted at $A$. At each stage a new node is attached uniformly to one of the existing nodes until the total number of nodes is equal to $N$. The hopcount $h_N$ (equivalent to the depth or distance) is the smallest number of links between the root $A$ and a destination chosen uniformly from all nodes $\{1, 2, \ldots, N\}$.
Denote by $\left\{X_N^{(k)}\right\}$ the $k$-th level set of a tree $T$, which is the set of nodes in the tree $T$ at hopcount $k$ from the root $A$ in a graph with $N$ nodes, and by $X_N^{(k)}$ the number of elements in the set $\left\{X_N^{(k)}\right\}$. Then, we have $X_N^{(0)} = 1$ because the zeroth level can only contain the root node $A$ itself. For all $k > 0$, it holds that $0 \leq X_N^{(k)} \leq N - 1$ and that
$$ \sum_{k=0}^{N-1} X_N^{(k)} = N \tag{16.3} $$
Another consequence of the definition is that, if $X_N^{(n)} = 0$ for some level $n < N - 1$, then all $X_N^{(j)} = 0$ for levels $j > n$. In such a case, the longest possible shortest path in the tree has a hopcount smaller than $n$. The level set
$$ L_N = \left\{1, X_N^{(1)}, X_N^{(2)}, \ldots, X_N^{(N-1)}\right\} $$
of a tree $T$ is defined as the set containing the number of nodes $X_N^{(k)}$ at each level $k$. An example of a URT organized per level $k$ is drawn on the right in Fig. 16.2 and in Fig. 16.3. A basic theorem for URTs, proved in van der Hofstad et al. (2002b), is the following:
Theorem 16.2.1 Let $\{Y_N^{(k)}\}_{k,N \geq 0}$ and $\{Z_N^{(k)}\}_{k,N \geq 0}$ be two independent copies of the vector of level sets of two sequences of independent URTs. Then
$$ \{X_N^{(k)}\}_{k \geq 0} \stackrel{d}{=} \{Y_{N_1}^{(k-1)} + Z_{N-N_1}^{(k)}\}_{k \geq 0}, \tag{16.4} $$
where on the right-hand side the random variable $N_1$ is uniformly distributed over the set $\{1, 2, \ldots, N-1\}$.
Theorem 16.2.1 also implies that a subtree rooted at a direct child of the root is a URT. For example, in Fig. 16.3, the tree rooted at node 5 is a URT of size 13 as well as the original tree without the tree rooted at node 5. By applying Theorem 16.2.1 to the URT subtree, any subtree rooted at a member of a URT is also a URT.
An arbitrary URT $U$ consisting of $N$ nodes and with the root labeled by 1 can be represented as
$$ U = (n_2 \leftarrow 2)(n_3 \leftarrow 3) \cdots (n_N \leftarrow N) \tag{16.5} $$
where $(n_j \leftarrow j)$ means that the $j$-th node is attached to node $n_j \in [1, j-1]$ and $n_2 = 1$. Hence, $n_j$ is the predecessor of $j$ and the predecessor relation is indicated by the arrow "$\leftarrow$". Moreover, $n_j$ is a discrete uniform random variable on $[1, j-1]$ and all $n_2, n_3, \ldots, n_N$ are independent.
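The representation (16.5) translates directly into a generator for URTs: draw each predecessor $n_j$ uniformly from $\{1, \ldots, j-1\}$. The following sketch is illustrative only; the function names and the seed are assumptions, and $N = 26$ is chosen merely to match the size of the tree in Fig. 16.3.

```python
import random


def random_urt(n_nodes, rng):
    """Return parent[j] for j = 2..N, each uniform on {1, ..., j-1}, as in (16.5)."""
    return {j: rng.randint(1, j - 1) for j in range(2, n_nodes + 1)}


def level_sizes(parent, n_nodes):
    """Count the nodes per hopcount level; the root (label 1) sits at level 0."""
    depth = {1: 0}
    for j in range(2, n_nodes + 1):  # a parent always carries a smaller label
        depth[j] = depth[parent[j]] + 1
    sizes = [0] * (max(depth.values()) + 1)
    for d in depth.values():
        sizes[d] += 1
    return sizes


rng = random.Random(7)  # fixed seed, arbitrary choice
parent = random_urt(26, rng)
sizes = level_sizes(parent, 26)
```

The list `sizes` is one realization of the level set $L_N$; its entries sum to $N$, in agreement with (16.3).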
[Figure 16.3: a uniform recursive tree with nodes labeled 1 (root) to 26, drawn per level with level sizes $X_N^{(0)} = 1$, $X_N^{(1)} = 5$, $X_N^{(2)} = 9$, $X_N^{(3)} = 7$ and $X_N^{(4)} = 4$.]
Fig. 16.3. An instance of a uniform recursive tree with $N = 26$ nodes organized per level $0 \leq k \leq 4$. The node number (inside the circle) indicates the order in which the nodes were attached to the tree.
Theorem 16.2.2 The total number of URTs with $N$ nodes is $(N-1)!$
Proof: (a) Let the nodes be labeled in the order of attachment to the URT and assign label 1 to the root. The URT growth law indicates that node 2 can only be attached in one way, node 3 in two ways, namely to node 1 and node 2 with equal probability. The $k$-th node can be attached to $k - 1$ possible nodes. Each of these possible constructions leads to a URT.
(b) By summing over all allowable configurations in (16.5), we obtain
$$ \sum_{n_2=1}^{1} \sum_{n_3=1}^{2} \cdots \sum_{n_N=1}^{N-1} 1 = (N-1)! $$
and this proves the theorem. □
In general, Cayley's Theorem (Appendix B.1 art. 3) states that there are $N^{N-2}$ labeled trees possible. The URT is a subset of the set of all possible labeled trees. Not all labeled trees are URTs, because the nodes that are further away from the root must have larger labels.
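The counting argument of Theorem 16.2.2 can be verified by brute force for small $N$ by enumerating all predecessor vectors $(n_2, \ldots, n_N)$ allowed in (16.5). The sketch below is an illustration; the function name is an assumption.

```python
from itertools import product
from math import factorial


def count_urts(n_nodes):
    """Count the predecessor vectors (n_2, ..., n_N) with 1 <= n_j <= j - 1."""
    choices = [range(1, j) for j in range(2, n_nodes + 1)]
    return sum(1 for _ in product(*choices))


counts = {n: count_urts(n) for n in range(1, 8)}
```

Each entry `counts[n]` should coincide with `factorial(n - 1)`, which is far smaller than Cayley's $N^{N-2}$ for all $N > 2$.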
The shortest path tree from the source or root $A$ to other nodes in the complete graph is the tree associated with the Markov discovery process, where the number of nodes $X(t)$ at time $t$ is constructed as follows. Just as the discovery process, the associated tree starts at the root $A$. We now investigate the embedded Markov chain (Section 10.4) of the continuous-time discovery process. After each transition in the continuous-time Markov chain, $X(t) \to X(t) + 1$, an edge of unit length is attached randomly to one of the $n$ already discovered nodes in the associated tree because a new edge is equally likely to be attached to any of the $n$ discovering nodes. Hence, the construction of the tree associated with the Markov discovery process and illustrated in Fig. 16.2 on the right demonstrates that the shortest path tree in the complete graph $K_N$ with exponential link weights is a uniform recursive tree. This property of the shortest path tree in $K_N$ with exponential link weights is an important motivation to study the URT. More generally, in van der Hofstad et al. (2001) we have proved that, for a fixed link density $p$ and sufficiently large $N$, the shortest path tree in the class RGU, the class of random graphs $G_p(N)$ with exponential or uniformly distributed link weights, is a URT. Smythe and Mahmoud (1995) have reviewed a number of results on recursive trees that have appeared in the literature from the late 1960s up to 1995.
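The link between the shortest path tree and the discovery process can be explored by simulation: draw i.i.d. exponential link weights with mean 1 on the complete graph and run Dijkstra's algorithm from the source. The sketch below is one possible illustration, not the construction used in the text; the function name, the seed and the size $N = 50$ are assumptions, and the source is labeled 0.

```python
import heapq
import random


def spt_complete_graph(n_nodes, rng):
    """Shortest path tree from node 0 in K_N with i.i.d. Exp(1) link weights."""
    weight = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            weight[i][j] = weight[j][i] = rng.expovariate(1.0)
    dist = [float("inf")] * n_nodes
    parent = [None] * n_nodes
    hops = [0] * n_nodes
    done = [False] * n_nodes
    dist[0] = 0.0
    heap = [(0.0, 0)]
    order = []  # nodes in order of discovery
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        order.append(u)
        for v in range(n_nodes):
            if not done[v] and d + weight[u][v] < dist[v]:
                dist[v] = d + weight[u][v]
                parent[v] = u
                hops[v] = hops[u] + 1
                heapq.heappush(heap, (dist[v], v))
    return order, parent, hops, dist


order, parent, hops, dist = spt_complete_graph(50, random.Random(3))
```

The list `order` plays the role of the attachment order in the URT, `dist` holds the discovery times, and `hops` gives the level of each node in the tree.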
16.3 The hopcount $h_N$ in the URT
16.3.1 Theory
The hopcount $h_N$ from the root to an arbitrarily chosen node in the URT equals the number of links or hops from the root to that node. We allow the arbitrary node to coincide with the root, in which case $h_N = 0$.
Theorem 16.3.1 The probability generating function of the hopcount in the URT with $N$ nodes is
$$ \varphi_{h_N}(z) = E\left[z^{h_N}\right] = \frac{\Gamma(N + z)}{\Gamma(N + 1)\,\Gamma(z + 1)} \tag{16.6} $$
Proof: Since the number of nodes at hopcount $k$ from the root (or at level $k$) is $X_N^{(k)}$, a node uniformly chosen out of $N$ nodes in the URT has probability $\frac{E\left[X_N^{(k)}\right]}{N}$ of having hopcount $k$,
$$ \Pr[h_N = k] = \frac{E\left[X_N^{(k)}\right]}{N} \tag{16.7} $$
If the size of the URT grows from $n$ to $n + 1$ nodes, each node at hopcount $k - 1$ from the root can generate a node at hopcount $k$ with probability $1/n$. Hence, for $k \geq 1$,
$$ E\left[X_N^{(k)}\right] = \sum_{n=k}^{N-1} \frac{E\left[X_n^{(k-1)}\right]}{n} $$
With (16.7), a recursion for $\Pr[h_N = k]$ follows for $k \geq 1$ as
$$ \Pr[h_N = k] = \frac{1}{N} \sum_{n=k}^{N-1} \Pr[h_n = k - 1] $$
The generating function of $h_N$ equals
$$ \varphi_{h_N}(z) = E\left[z^{h_N}\right] = \Pr[h_N = 0] + \sum_{k=1}^{N-1} \Pr[h_N = k]\, z^k $$
$$ = \frac{1}{N} + \frac{1}{N} \sum_{k=1}^{N-1} \sum_{n=k}^{N-1} \Pr[h_n = k - 1]\, z^k $$
$$ = \frac{1}{N} + \frac{1}{N} \sum_{n=1}^{N-1} \sum_{k=1}^{n} \Pr[h_n = k - 1]\, z^k = \frac{1}{N} + \frac{z}{N} \sum_{n=1}^{N-1} \varphi_{h_n}(z) $$
Taking the difference between $(N + 1)\varphi_{h_{N+1}}(z)$ and $N\varphi_{h_N}(z)$ results in the recursion
$$ (N + 1)\,\varphi_{h_{N+1}}(z) = (N + z)\,\varphi_{h_N}(z) $$
Iterating this recursion starting from $\varphi_{h_1}(z) = E\left[z^{h_1}\right] = E\left[z^0\right] = 1$ leads to (16.6). □
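The recursion $(N + 1)\varphi_{h_{N+1}}(z) = (N + z)\varphi_{h_N}(z)$ from the proof gives a direct way to compute the hopcount distribution exactly for moderate $N$, by treating the generating function as a polynomial in $z$. The sketch below uses exact rational arithmetic; the function name and the choice $N = 10$ are illustrative assumptions.

```python
from fractions import Fraction


def hopcount_pmf(n_nodes):
    """Coefficients Pr[h_N = k], k = 0..N-1, from (N+1) phi_{N+1}(z) = (N+z) phi_N(z)."""
    coeffs = [Fraction(1)]  # phi_1(z) = 1
    for n in range(1, n_nodes):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * n      # contribution of n * phi_n(z)
            nxt[k + 1] += c      # contribution of z * phi_n(z)
        coeffs = [c / (n + 1) for c in nxt]
    return coeffs


pmf = hopcount_pmf(10)
mean = sum(k * p for k, p in enumerate(pmf))
```

The coefficients sum to one, `pmf[0]` equals $1/N$ (the destination coincides with the root), and `mean` equals $\sum_{l=2}^{N} \frac{1}{l}$.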
Corollary 16.3.2 The probability density function of the hopcount in the URT with $N$ nodes is
$$ \Pr[h_N = k] = \frac{(-1)^{N-(k+1)}\, S_N^{(k+1)}}{N!} \tag{16.8} $$
Proof: The probability generating function $\varphi_{h_N}(z)$ in (16.6) is also the generating function of the Stirling numbers $S_N^{(k)}$ of the first kind (Abramowitz and Stegun, 1968, 24.1.3) such that the probability that a uniformly chosen node in the URT has hopcount $k$ equals (16.8). □
The explicit form of the generating function shows that the average hopcount $h_N$ in a URT of size $N$ equals
$$ E[h_N] = \varphi_{h_N}'(1) = \left.\frac{d}{dz} \log \varphi_{h_N}(z)\right|_{z=1} = \sum_{l=2}^{N} \frac{1}{l} = \psi(N + 1) + \gamma - 1 \tag{16.9} $$
where $\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)}$ is the digamma function (Abramowitz and Stegun, 1968, Section 6.3) and the Euler constant is $\gamma = 0.57721\ldots$. Similarly, the variance (2.27) follows from the logarithm of the generating function $L_{h_N}(z) = \log \Gamma(N + z) - \log \Gamma(N + 1) - \log \Gamma(z + 1)$ as
$$ \mathrm{Var}[h_N] = \psi'(N + 1) - \psi'(2) + \psi(N + 1) + \gamma - 1 = \psi(N + 1) + \gamma - \frac{\pi^2}{6} + \psi'(N + 1) $$
Using the asymptotic formulae for the digamma function leads to
$$ E[h_N] = \log N + \gamma - 1 + O\left(\frac{1}{N}\right) \tag{16.10} $$
$$ \mathrm{Var}[h_N] = \log N + \gamma - \frac{\pi^2}{6} + O\left(\frac{1}{N}\right) \tag{16.11} $$
For large $N$, we apply an asymptotic formula of the Gamma function (Abramowitz and Stegun, 1968, Section 6.1.47) to the generating function of the hopcount (16.6),
$$ \varphi_{h_N}(z) = \frac{N^{z-1}}{\Gamma(z + 1)} \left(1 + O\left(\frac{1}{N}\right)\right) $$
Introducing the Taylor series of $\frac{1}{\Gamma(z)} = \sum_{k=1}^{\infty} c_k z^k$, where the coefficients $c_k$ are listed in Abramowitz and Stegun (1968, Section 6.1.34), we obtain with $N^z = e^{z \log N}$,
$$ \varphi_{h_N}(z) = \frac{1}{N} \sum_{k=1}^{\infty} c_k z^{k-1} \sum_{k=0}^{\infty} \frac{\log^k N}{k!}\, z^k \left(1 + O\left(\frac{1}{N}\right)\right) $$
$$ = \frac{1 + O\left(\frac{1}{N}\right)}{N} \sum_{k=0}^{\infty} \sum_{m=0}^{k} c_{m+1}\, \frac{\log^{k-m} N}{(k - m)!}\, z^k $$
With the definition (2.18) of the probability generating function, we conclude that the asymptotic form of the probability density function (16.8) of the hopcount in the URT is
$$ \Pr[h_N = k] = \frac{1 + O\left(\frac{1}{N}\right)}{N} \sum_{m=0}^{k} c_{m+1}\, \frac{\log^{k-m} N}{(k - m)!} \tag{16.12} $$
Since the coefficients $c_k$ are rapidly decreasing, approximating the sum in (16.12) by its first term ($m = 0$) yields, to first order in $N$,
$$ \Pr[h_N = k] \approx \frac{(\log N)^k}{N\, k!} \tag{16.13} $$
which is recognized as a Poisson distribution (3.9) with mean $\log N$. Hence, for large $N$ and to first order, the average and variance of the hopcount in the URT are approximately $E[h_N] \approx \mathrm{Var}[h_N] \approx \log N$. The accuracy of the Poisson approximation can be estimated by comparison with the average (16.10) and the variance (16.11) found above up to second order in $N$. For example, if the URT has $N = 10^4$ nodes, the Poisson approximation yields $E[h_N] = \mathrm{Var}[h_N] = 9.21034$, while the average (16.10) is $E[h_N] = 8.78756$, accurate up to $10^{-4}$, and the variance (16.11) is $\mathrm{Var}[h_N] = 8.14262$. The exact results are $E[h_N] = 8.78761$ and $\mathrm{Var}[h_N] = 8.14277$.
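The exact values quoted for $N = 10^4$ can be reproduced with a few lines of code. The sketch below relies on an observation that is not spelled out in the text: iterating the recursion in the proof of Theorem 16.3.1 factors (16.6) as $\prod_{l=2}^{N} \left(1 - \frac{1}{l} + \frac{z}{l}\right)$, so that $h_N$ is distributed as a sum of independent Bernoulli random variables with means $\frac{1}{l}$. The function name is an assumption.

```python
import math


def urt_hopcount_moments(n_nodes):
    """Exact mean and variance of h_N as a sum of independent Bernoulli(1/l), l = 2..N."""
    mean = math.fsum(1.0 / l for l in range(2, n_nodes + 1))
    var = math.fsum((1.0 / l) * (1.0 - 1.0 / l) for l in range(2, n_nodes + 1))
    return mean, var


exact_mean, exact_var = urt_hopcount_moments(10**4)
poisson_mean = math.log(10**4)  # Poisson approximation: mean = variance = log N
```

The Poisson approximation overestimates both moments by a constant offset that does not vanish as $N$ grows, which is exactly what (16.10) and (16.11) express.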
16.3.2 Application of the URT to the hopcount in the Internet
In trace-route measurements explained in Van Mieghem (2004a), we are interested in the hopcount $H_N$, denoted with capital $H$, which equals $h_N$ in the URT excluding the event $h_N = 0$. In other words, the source and the destination are different nodes in the graph. Since from (16.8) $\Pr[h_N = 0] = \frac{(-1)^{N-1} S_N^{(1)}}{N!} = \frac{1}{N}$, we obtain, for $1 \leq k \leq N - 1$,
$$ \Pr[H_N = k] = \Pr[h_N = k \mid h_N \neq 0] = \frac{\Pr[h_N = k, h_N \neq 0]}{\Pr[h_N \neq 0]} = \frac{N}{N - 1} \Pr[h_N = k] $$
Using (16.8), we find
$$ \Pr[H_N = k] = \frac{N}{N - 1}\, \frac{(-1)^{N-(k+1)}\, S_N^{(k+1)}}{N!} \tag{16.14} $$
with corresponding generating function,
$$ \varphi_{H_N}(z) = \sum_{k=1}^{N-1} \Pr[H_N = k]\, z^k = \frac{N}{N - 1} \sum_{k=0}^{N-1} \Pr[h_N = k]\, z^k - \frac{N}{N - 1} \Pr[h_N = 0] = \frac{N}{N - 1} \left(\varphi_{h_N}(z) - \frac{1}{N}\right) $$
The average hopcount $E[H_N] = E[h_N \mid h_N \neq 0]$ is
$$ E[H_N] = \frac{N}{N - 1} \sum_{l=2}^{N} \frac{1}{l} \tag{16.15} $$
Hence, for large $N$ and in practice, we find that
$$ \Pr[H_N = k] = \Pr[h_N = k] + O\left(\frac{1}{N}\right) $$
which allows us to use the previously derived expressions (16.12), (16.10) and (16.11).
The histogram of the number of traversed routers in the Internet measured between two arbitrary communicating parties seems reasonably well modeled by the pdf (16.12). Figure 16.4 shows both the histogram of the hopcount deduced from paths in the Internet measured via the trace-route utility and the fit with (16.12). From the fit, we find a rather high number of nodes $e^{12.6} \approx 3 \cdot 10^5 \leq N \leq e^{13.5} \approx 7 \cdot 10^5$, which points to the approximate nature of modeling the Internet hopcount by that deduced from a URT. The relation between Internet measurements and the properties of the URT is further analyzed in a series of articles (Van Mieghem et al., 2000; van der Hofstad et al., 2001; Van Mieghem et al., 2001b; Janic et al., 2002; van der Hofstad et al., 2002b). At the time of writing, an accurate model of the hopcount in the Internet is not available.

[Figure 16.4: $\Pr[H = k]$ (vertical axis, from 0.00 to 0.10) versus hop $k$ (horizontal axis, from 0 to 30) for Asia, Europe and USA, with fits $\log(N_{Asia}) = 13.5$, $\log(N_{Europe}) = 12.6$ and $\log(N_{USA}) = 12.9$.]
Fig. 16.4. The histograms of the hopcount derived from the trace-route measurement in three continents from CAIDA in 2004 are fitted by the pdf (16.12) of the hopcount in the URT.
16.4 The weight of the shortest path
The weight, sometimes also called the length, of the shortest path is defined as the sum of the link weights that constitute the shortest path. In Section 16.2.1, the shortest path tree in the complete graph with exponential link weights was shown to be a URT. In this section, we confine ourselves to the same type of graph and require that the source node $A$ (or root) is different from the destination node $B$.
By Theorem 10.2.3 of a continuous-time Markov chain, the discovery time of the $k$-th node from node $A$ equals $v_k = \sum_{n=1}^{k} \tau_n$, where $\tau_1, \tau_2, \ldots, \tau_k$ are independent, exponentially distributed random variables with parameter $\lambda_n = n(N - n)$ with $1 \leq n \leq k$. We call $\tau_j$ the interattachment time between the discovery or the attachment to the URT of the $(j-1)$-th and $j$-th node in the graph. The Laplace transform of $v_k$ is
$$ E\left[e^{-z v_k}\right] = \int_0^{\infty} e^{-zt}\, \frac{d}{dt} \Pr[v_k \leq t]\, dt $$
For a sum of independent exponential random variables, using the probability generating function (3.16), we have
$$ E\left[e^{-z v_k}\right] = E\left[\exp\left(-z \sum_{n=1}^{k} \tau_n\right)\right] = \prod_{n=1}^{k} E\left[e^{-z \tau_n}\right] = \prod_{n=1}^{k} \frac{n(N - n)}{z + n(N - n)} \tag{16.16} $$
The probability generating function³ $\varphi_{W_N}(z) = E\left[e^{-z W_N}\right]$ of the weight $W_N$ of the shortest path equals
$$ \varphi_{W_N}(z) = \sum_{k=1}^{N-1} E\left[e^{-z v_k}\right] \Pr[B \text{ is the } k\text{-th attached node in the URT}] = \frac{1}{N - 1} \sum_{k=1}^{N-1} \prod_{n=1}^{k} \frac{n(N - n)}{z + n(N - n)} \tag{16.17} $$
because any node apart from the root $A$ but including the destination node $B$ has equal probability to be the $k$-th attached node.
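The average weight is
$$ E[W_N] = -\left.\frac{d\varphi_{W_N}(z)}{dz}\right|_{z=0} = -\frac{1}{N - 1} \sum_{k=1}^{N-1} \left.\frac{d}{dz} \prod_{n=1}^{k} \frac{n(N - n)}{z + n(N - n)}\right|_{z=0} $$
Using the logarithmic derivative of the product,
$$ \left.\frac{d}{dz} \prod_{n=1}^{k} \frac{n(N - n)}{z + n(N - n)}\right|_{z=0} = \left.\prod_{n=1}^{k} \frac{n(N - n)}{z + n(N - n)}\; \frac{d}{dz}\left(\sum_{n=1}^{k} \log \frac{n(N - n)}{z + n(N - n)}\right)\right|_{z=0} = -\sum_{n=1}^{k} \frac{1}{n(N - n)} $$
gives
$$ E[W_N] = \frac{1}{N - 1} \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n(N - n)} = \frac{1}{N - 1} \sum_{n=1}^{N-1} \frac{1}{n(N - n)} \sum_{k=n}^{N-1} 1 = \frac{1}{N - 1} \sum_{n=1}^{N-1} \frac{N - n}{n(N - n)} $$
The average weight is
$$ E[W_N] = \frac{1}{N - 1} \sum_{n=1}^{N-1} \frac{1}{n} = \frac{\psi(N) + \gamma}{N - 1} \tag{16.18} $$

³ If the link weights have rate $\frac{1}{a}$, i.e. mean $a$ (instead of 1), then $W_N$ is multiplied by $a$ as explained in Sections 16.2.1 and 3.4.1. The weight of the scaled shortest path $W_{N,a}$ has pgf
$$ \varphi_{W_{N,a}}(z) = E\left[e^{-z a W_N}\right] = \varphi_{W_N}(az) $$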
log Q + H [ZQ ] =
+R
Q
µ
1
Q2
¶
Similarly, the variance is computed (see problem (ii) in Section 16.9) as,
³P
´2
Q 31 1
Q31
X
q=1 q
1
3
Var [ZQ ] =
(16.19)
2
Q (Q 1)
q
(Q 1)2 Q
q=1
and for large Q,
2
+R
Var [ZQ ] =
2Q 2
µ
log2 Q
Q3
¶
By inverse Laplace transform of (16.17), the distribution Pr [ZQ w] can
be computed. The asymptotic distribution for the weight of the shortest
path is (see problem (iii) in Section 16.9)
3{
lim Pr [Q ZQ log Q {] = h3h
Q<"
(16.20)
A related but slightly more complex analysis is presented in Section 16.5.1
where we study the flooding time. The interest of such an asymptotic analysis is that it often leads to tractable solutions that are physically more appealing to interpret. Moreover, it turns out that results for finite, not too
small Q are reasonably approximated by the asymptotic law.
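The closed forms (16.18) and (16.19) can be cross-checked against the structure behind (16.17), where $W_N$ equals the discovery time $v_K$ with $K$ uniform on $\{1, \ldots, N-1\}$ and $v_k$ a sum of independent exponentials with rates $n(N - n)$. The sketch below computes both sides in exact rational arithmetic; the function names are assumptions introduced for illustration.

```python
from fractions import Fraction


def weight_moments_from_mixture(n_nodes):
    """Mean and variance of W_N, viewing W_N as v_K with K uniform on 1..N-1."""
    mean_v = var_v = Fraction(0)
    first = second = Fraction(0)
    for n in range(1, n_nodes):
        rate = n * (n_nodes - n)           # lambda_n = n(N - n)
        mean_v += Fraction(1, rate)        # E[v_n]
        var_v += Fraction(1, rate * rate)  # Var[v_n]
        first += mean_v
        second += var_v + mean_v ** 2      # E[v_n^2]
    first /= n_nodes - 1
    second /= n_nodes - 1
    return first, second - first ** 2


def weight_moments_closed_form(n_nodes):
    """Mean (16.18) and variance (16.19)."""
    h1 = sum(Fraction(1, n) for n in range(1, n_nodes))
    h2 = sum(Fraction(1, n * n) for n in range(1, n_nodes))
    mean = h1 / (n_nodes - 1)
    var = 3 * h2 / (n_nodes * (n_nodes - 1)) - h1 ** 2 / ((n_nodes - 1) ** 2 * n_nodes)
    return mean, var
```

For instance, `weight_moments_closed_form(2)` returns mean 1 and variance 1, as expected for a single exponential link with mean 1.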
Since $W_N$ equals the sum of the link weights of the shortest path from the root to an arbitrary node and since $H_N = h_N | h_N > 0$ is the number of links in that shortest path (where the arbitrary destination node is different from the root), one may wonder whether there is a relation between them. Although the shortest path has precisely $H_N$ hops, the destination node of that path is not necessarily the $H_N$-th attached node to the URT grown at the root. The destination node cannot be discovered sooner than the $H_N$-th attached node, otherwise the hopcount of the shortest path would be shorter than $H_N$. Hence, the destination node is the $k$-th discovered node and attached to the URT somewhere in between the $(H_N - 1)$-th and the last attached node. Thus, $k \in [H_N, N-1]$. If $k = H_N$, then all previously discovered nodes belong to the shortest path and the $j$-th attached node in the URT is linked to the $(j-1)$-th, for all $j \le k$. If $k > H_N$, precisely $k - H_N$ of the attached nodes do not belong to the shortest path. Hence $W_N = W_k$ provided $k - H_N$ nodes in the URT discovered so far do not belong to the path and precisely $H_N$ do. The latter condition requires the determination of all structurally favorable possibilities which is rather complex.
Curiously, the probability that the shortest path consists of the direct link between source and destination is, with (16.14), (16.18) and $S_N^{(2)} = (-1)^N (N-1)! \sum_{k=1}^{N-1}\frac{1}{k}$,
$$\Pr[H_N = 1] = \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{1}{k} = E[W_N]$$
16.5 The flooding time $T_N$
The most commonly used process that informs each node (router) about changes in the network topology is called flooding: the source node initiates the flooding process by sending the packet with topology information to all adjacent neighbors; every router forwards the packet on all interfaces except the incoming one, and duplicate packets are discarded. Flooding is particularly simple and robust since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overheads in routers are ignored). Therefore, an interesting problem lies in the determination of the flooding time $T_N$, which is the minimum time needed to inform all nodes in a network with $N$ nodes. Only after a time $T_N$ are all topology databases at the routers in the network synchronized again, i.e. all routers possess the same topology information. The flooding time $T_N$ is defined as the minimum time needed to reach all $N-1$ remaining nodes from a source node over their respective shortest paths.
We will here consider the flooding time $T_N$ in the complete graph containing $N$ nodes and with independent, exponentially distributed link weights with mean 1. The generalization to the random graph $G_p(N)$ with i.i.d. exponential (or uniform$^4$) distributed link weight is treated in van der Hofstad et al. (2002a).
The flooding time $T_N$ equals the absorption time, starting from state $n = 1$ of the birth-process with rates (16.2). The probability generating function follows directly from (16.16) with $k = N-1$,
$$\varphi_{T_N}(x) = E[e^{-xT_N}] = \int_0^\infty e^{-xt} f_{T_N}(t)\,dt = \prod_{n=1}^{N-1}\frac{n(N-n)}{n(N-n)+x} \tag{16.21}$$
The average flooding time equals
$$E[T_N] = \sum_{n=1}^{N-1}E[\tau_n] = \sum_{n=1}^{N-1}\frac{1}{n(N-n)} = \frac{2}{N}\sum_{n=1}^{N-1}\frac{1}{n} = \frac{2}{N}\left(\psi(N)+\gamma\right) \tag{16.22}$$
Using the asymptotic expansion (Abramowitz and Stegun, 1968, Section 6.3.18) of the digamma function, we conclude that
$$E[T_N] \sim \frac{2\log N}{N}$$
which demonstrates that the average flooding time in the complete graph with exponential link weights with mean 1 decreases to zero when $N \to \infty$. Also, the average flooding time is about twice as long as the average weight of an arbitrary shortest path (16.18). The variance of $T_N$ equals
$$\mathrm{Var}[T_N] = \sum_{n=1}^{N-1}\mathrm{Var}[\tau_n] = \sum_{n=1}^{N-1}\frac{1}{n^2(N-n)^2} = \frac{2}{N^2}\sum_{n=1}^{N-1}\frac{1}{n^2} + \frac{4}{N^3}\sum_{n=1}^{N-1}\frac{1}{n} \tag{16.23}$$
For large $N$, we have that $\mathrm{Var}[T_N] = \frac{\pi^2}{3N^2} + O\left(\frac{\log N}{N^3}\right)$.
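The moments (16.22) and (16.23) are easy to evaluate and to compare with a direct simulation of the birth process. The following is a minimal sketch, assuming only what the text states: $T_N$ is a sum of $N-1$ independent exponential sojourn times with rates $n(N-n)$. The function names are hypothetical.

```python
import math
import random


def flooding_mean(N):
    """Exact E[T_N] = sum_{n=1}^{N-1} 1/(n(N-n)), the middle sum in (16.22)."""
    return sum(1.0 / (n * (N - n)) for n in range(1, N))


def flooding_var(N):
    """Exact Var[T_N] = sum_{n=1}^{N-1} 1/(n(N-n))^2, the middle sum in (16.23)."""
    return sum(1.0 / (n * (N - n)) ** 2 for n in range(1, N))


def simulate_flooding_time(N, rng):
    """One sample of T_N: independent exponential sojourn times with rate n(N-n)."""
    return sum(rng.expovariate(n * (N - n)) for n in range(1, N))


if __name__ == "__main__":
    N = 100
    rng = random.Random(1)
    samples = [simulate_flooding_time(N, rng) for _ in range(5000)]
    print("simulated mean :", sum(samples) / len(samples))
    print("exact mean     :", flooding_mean(N))
    print("2 log N / N    :", 2 * math.log(N) / N)
    print("exact variance :", flooding_var(N))
    print("pi^2 / (3 N^2) :", math.pi ** 2 / (3 * N ** 2))
```

For moderate $N$ the leading term $2\log N / N$ still differs visibly from the exact mean, because the next term $2\gamma/N$ is of comparable size.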
16.5.1 The asymptotic law for flooding time $T_N$
The exact expression $f_{T_N}(t)$ for the probability density function of the flooding time $T_N$, derived in van der Hofstad et al. (2002a), does not provide much insight. Because we are interested in the flooding time in large networks, we investigate the asymptotic distribution of $T_N$ for $N$ large. We rewrite (16.21) as
$$\varphi_{T_N}(x) = \frac{[(N-1)!]^2}{\prod_{n=1}^{N-1}\left[\left(\frac{N^2}{4}+x\right)-\left(n-\frac{N}{2}\right)^2\right]} \tag{16.24}$$
$^4$ Both the exponential and uniform distribution are regular distributions with extreme value index $\alpha = 1$. This means that the small link weights that are most likely included in the shortest path are almost identically distributed for all regular distributions with same $f_w(0)$.
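For $N = 2M$, using $\frac{\Gamma(z+m)}{\Gamma(z+1)} = \prod_{n=1}^{m-1}(n+z)$, we deduce that
$$\varphi_{T_{2M}}(x) = \left(\frac{\Gamma(2M)\,\Gamma\left(1+\sqrt{x+M^2}-M\right)}{\Gamma\left(M+\sqrt{x+M^2}\right)}\right)^2 \tag{16.25}$$
For large $M$, there holds $\sqrt{x+M^2} \sim M + \frac{x}{2M}$, provided $|x| < 2M$. After substitution of $x = 2My$ in (16.25), with $|y| < 1$, we obtain
$$\varphi_{T_{2M}}(2My) \sim \Gamma^2(1+y)\,\frac{\Gamma^2(2M)}{\Gamma^2(2M+y)} \sim \Gamma^2(1+y)\,(2M)^{-2y}$$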
from which follows the asymptotic relation
$$\lim_{N\to\infty} N^{2y}\varphi_{T_N}(Ny) = \Gamma^2(1+y), \qquad |y| < 1 \tag{16.26}$$
Equivalently, we have for $|y| < 1$,
$$\lim_{N\to\infty} E\left[e^{-y(NT_N - 2\log N)}\right] = \lim_{N\to\infty}\frac{1}{N}\int_{-\infty}^{\infty} e^{-yt} f_{T_N}\left(\frac{t+2\log N}{N}\right)dt = \Gamma^2(1+y)$$
This limit demonstrates that the probability distribution function of the random variable $NT_N - 2\log N$ converges to a probability distribution with Laplace transform $\Gamma^2(1+y)$. Let us define the normalized density function
$$g_N(t) = \frac{1}{N} f_{T_N}\left(\frac{t+2\log N}{N}\right) \tag{16.27}$$
We can prove convergence in density, i.e. $\lim_{N\to\infty} g_N(t) = g(t)$ and that the latter exists. By the inversion theorem for Laplace transforms we obtain for $t \in \mathbb{R}$,
$$\lim_{N\to\infty} g_N(t) = \lim_{N\to\infty}\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{yt} N^{2y}\varphi_{T_N}(Ny)\,dy$$
where $0 < c < 1$. Since $\Gamma(z)$ is analytic over the entire complex plane except for simple poles at the points $z = -n$ for $n = 0, 1, 2, \ldots$, we find that $N^{2y}\varphi_{T_N}(Ny)$ is analytic whenever the real part of $y$ is non-negative. Evaluation along the line $\mathrm{Re}(y) = c = 0$ then gives
$$\lim_{N\to\infty} g_N(t) = \lim_{N\to\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{itu} N^{2iu}\varphi_{T_N}(iNu)\,du$$
As dominating function we take
$$\left|e^{itu} N^{2iu}\varphi_{T_N}(iNu)\right| = \left|\varphi_{T_N}(iNu)\right| \le \frac{1+u^2}{u^4}$$
when $|u| > 1$, and $|\varphi_{T_N}(iNu)| \le 1$ for $|u| \le 1$. This follows from the first equality in (16.24), using only the factors in the product with $n = 1$ and $n = N-1$, and bounding the other factors using
$$\frac{n(N-n)}{|n(N-n) + iNu|} \le 1$$
The Dominated Convergence Theorem 6.1.4 allows us to interchange the limit and integration operator such that
$$\lim_{N\to\infty} g_N(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{itu}\lim_{N\to\infty} N^{2iu}\varphi_{T_N}(iNu)\,du = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{ty}\lim_{N\to\infty} N^{2y}\varphi_{T_N}(Ny)\,dy = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{ty}\,\Gamma^2(1+y)\,dy \tag{16.28}$$
The right-hand side of (16.26) is a perfect square, which indicates that the limit distribution is a two-fold convolution. Now, the Mellin transform (Titchmarsh, 1948) of the exponential function is
$$e^{-t} = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} t^{-y}\,\Gamma(y)\,dy, \qquad c > 0$$
and thus with $t = e^{-u}$,
$$\frac{d}{du}\left(e^{-e^{-u}}\right) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{yu}\,\Gamma(y+1)\,dy$$
which shows that (16.28) is the two-fold convolution of the probability density function $\frac{d}{dt}\Lambda(t)$, where $\Lambda(t) = e^{-e^{-t}}$ is the Gumbel distribution (3.37).
Furthermore, the two-fold convolution is given by
$$\frac{d}{dt}\Lambda^{(2*)}(t) = e^{-t}\int_{-\infty}^{\infty} e^{-e^{-u}}\, e^{-e^{-(t-u)}}\,du = e^{-t}\int_{-\infty}^{\infty}\exp\left[-2e^{-t/2}\cosh\left(\frac{t}{2}-u\right)\right]du = 2e^{-t}\int_0^{\infty}\exp\left[-2e^{-t/2}\cosh(u)\right]du = 2e^{-t}K_0\left(2e^{-t/2}\right)$$
where $K_\nu(x)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6) of order $\nu$.
In summary,
$$\lim_{N\to\infty} g_N(t) = g(t) = \frac{d}{dt}\Lambda^{(2*)}(t) = 2e^{-t}K_0\left(2e^{-t/2}\right) \tag{16.29}$$
and the corresponding distribution function is
$$\lim_{N\to\infty}\Pr[NT_N - 2\log N \le z] = 2\int_{-\infty}^{z} e^{-t}K_0\left(2e^{-t/2}\right)dt = 2e^{-z/2}K_1\left(2e^{-z/2}\right) \tag{16.30}$$
The right-hand side of (16.29) is maximal for $t = 0.506357$, which is slightly smaller than $\gamma = 0.577216$, but still in accordance with $E[T_N]$ given by (16.22). The asymmetry shows that $\{NT_N - 2\log N \ge \gamma + z\}$ is much more likely than the event $\{NT_N - 2\log N \le \gamma - z\}$, which confirms the intuition that the flooding time can be much longer than the average $E[T_N]$, but not so much shorter than $E[T_N]$. Figure 16.5 illustrates the convergence of $g_N(t)$ to the limit in (16.29). When comparing (16.26) with the corresponding result (C.6) for the weight of the shortest path, we observe that, for large $N$, the random variable $NT_N - 2\log N$ consists of the sum of $NW_{N;1} - \log N + NW_{N;2} - \log N$, where both $NW_{N;j} - \log N$ are i.i.d. random variables. Intuitively, we can say that the flooding time consists of the time to travel from a left-hand corner of the graph to the center and from the center to a right-hand corner of the graph.
[Figure 16.5: $g_{2M}(t)$ versus $t$ for $M = 5$, $M = 10$, $M = 20$ and the limit $M \to \infty$.]
Fig. 16.5. The scaled density $g_N(t)$ for three values of $N = 2M$ (dotted lines) and the asymptotic result (full line) on a log-lin scale. The insert is drawn on a lin-lin scale.
The asymptotic distribution (16.30) is a beautiful example of a sum of $N$ independent random variables that clearly does not converge to a Gaussian and, hence, does not obey the (extended) Central Limit Theorem 6.3.1.
16.6 The degree of a node in the URT
Let us denote by $\{D_N^{(k)}\}$ the set of nodes with degree $k$ in a graph with $N$ nodes and by $D_N^{(k)}$ the cardinality (the number of elements) of this set $\{D_N^{(k)}\}$. Since each node appears only in one set, it holds for any graph that
$$\sum_{k=1}^{N-1} D_N^{(k)} = N \tag{16.31}$$
In a probabilistic setting, we may investigate the event that the degree $k$ occurs in a graph of size $N$. The expectation of that event is
$$E\left[D_N^{(k)}\right] = E\left[\sum_{j=1}^{N} 1_{\{d_j = k\}}\right] = \sum_{j=1}^{N} E\left[1_{\{d_j = k\}}\right] = \sum_{j=1}^{N}\Pr[d_j = k] \tag{16.32}$$
By summing over all $k$, we verify that
$$E\left[\sum_{k=1}^{N-1} D_N^{(k)}\right] = \sum_{j=1}^{N}\sum_{k=1}^{N-1}\Pr[d_j = k] = \sum_{j=1}^{N} 1 = N$$
which is again (16.31).
16.6.1 Recursion for $\Pr\left[D_N^{(k)} = j\right]$ in the URT
The growth law of URTs dictates the way a specific tree of size $N$ transforms to the tree of size $N+1$ by adding the node with label $N+1$ at random. Based on this growth law, the set of nodes with degree $k$ in a specific tree of size $N+1$ consists of:
(i) the same set of nodes with degree $k$ in the ancestor tree of size $N$ provided the new node $n_{N+1}$ is not attached to any of the nodes of this set nor to any of the nodes with degree $k-1$;
(ii) the same set of nodes with degree $k$ except for one, say node $n_l$, provided the new node $n_{N+1}$ is attached to that node $n_l$;
(iii) the same set of nodes with degree $k$ and one additional node of the set of degree $k-1$ nodes provided the new node $n_{N+1}$ is attached to a node of the set of degree $k-1$.
The evolution scenario in three parts is generally applicable for any class of trees that possess a growth law. It does not hold for graphs in general because only in a tree, a node has one well-defined parent node and the in-degree is one. Using the law of total probability (2.46) yields,
$$\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j \,\middle|\, n_{N+1} \notin \{D_N^{(k)}\},\{D_N^{(k-1)}\}\right]\Pr\left[n_{N+1} \notin \{D_N^{(k)}\},\{D_N^{(k-1)}\}\right]$$
$$\qquad + \Pr\left[D_N^{(k)} = j+1 \,\middle|\, n_{N+1} \in \{D_N^{(k)}\}\right]\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]$$
$$\qquad + \Pr\left[D_N^{(k)} = j-1 \,\middle|\, n_{N+1} \in \{D_N^{(k-1)}\}\right]\Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right]$$
If the process of attaching a new node $N+1$ does not depend on the way the $N$ previous nodes are attached but rather on their number, there holds $\Pr\left[D_N^{(k)} = j \,\middle|\, n_{N+1}\right] = \Pr\left[D_N^{(k)} = j\right]$. This property holds for the URT. We obtain a three point recursion for $k > 1$,
$$\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j\right]\Pr\left[n_{N+1} \notin \{D_N^{(k)}\},\{D_N^{(k-1)}\}\right]$$
$$\qquad + \Pr\left[D_N^{(k)} = j+1\right]\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]$$
$$\qquad + \Pr\left[D_N^{(k)} = j-1\right]\Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right]$$
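The probability generating function
$$\varphi_D(z, N; k) = E\left[z^{D_N^{(k)}}\right] = \sum_{j=0}^{\infty}\Pr\left[D_N^{(k)} = j\right] z^j$$
is obtained after multiplication by $z^j$ and summing over all $j$,
$$\varphi_D(z, N+1; k) = \Pr\left[n_{N+1} \notin \{D_N^{(k)}\},\{D_N^{(k-1)}\}\right]\varphi_D(z, N; k)$$
$$\qquad + \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]\frac{\varphi_D(z, N; k) - \varphi_D(0, N; k)}{z}$$
$$\qquad + \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right] z\,\varphi_D(z, N; k)$$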
Now $\varphi_D(0, N; k) = \Pr\left[D_N^{(k)} = 0\right]$ is the probability of the event that there are no nodes with degree $k$ for $1 \le k \le k_{\max} \le N-1$. Since the normalization of the generating function requires that $\varphi_D(1, N; k) = 1$ and since
$$\Pr\left[n_{N+1} \notin \{D_N^{(k)}\},\{D_N^{(k-1)}\}\right] + \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] + \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right] = 1$$
it follows that $\Pr\left[D_N^{(k)} = 0\right]\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] = 0$. Further,
$$\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] \ne 0$$
for any $k \in [1, k_{\max}]$ because the attachment of the node $n_{N+1}$ is possible to any non-empty set. This means that the absence of nodes with degree $k \in [1, k_{\max}]$ cannot occur in URTs, thus $\Pr\left[D_N^{(k)} = 0\right] = 0$. A consequence is that the probability generating function $\varphi_D(z, N; k)$ is at least $O(z)$ as $z \to 0$ (for $N > 1$). After using $\varphi_D(0, N; k) = 0$ and eliminating $\Pr\left[n_{N+1} \notin \{D_N^{(k)}\},\{D_N^{(k-1)}\}\right]$, the recursion relation for the probability generating function becomes$^5$
$$\frac{\varphi_D(z, N+1; k)}{\varphi_D(z, N; k)} = 1 + \left(\frac{\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]}{z} - \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right]\right)(1-z) \tag{16.33}$$
The special case for $k = 1$ and $N > 1$ is
$$\varphi_D(z, N+1; 1) = \left(1 + \frac{\Pr\left[n_{N+1} \in \{D_N^{(1)}\}\right](1-z)}{z}\right)\varphi_D(z, N; 1)$$
16.6.2 The Average Number of Degree $k$ Nodes in the URT
In the URT, a new node $n_{N+1}$ is attached uniformly to any of $N$ previously attached nodes such that
$$\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] = \frac{E\left[D_N^{(k)}\right]}{N}$$
Also, the probability that an arbitrary node in a URT of size $N$ has degree $k$ equals $\frac{E\left[D_N^{(k)}\right]}{N}$. We obtain from (16.33)
$$\varphi_D(z, N+1; k) = \left(1 + \left(\frac{E\left[D_N^{(k)}\right]}{Nz} - \frac{E\left[D_N^{(k-1)}\right]}{N}\right)(1-z)\right)\varphi_D(z, N; k) \tag{16.34}$$
$^5$ With the initialization $\varphi_D(z, m; k) = E\left[z^{D_m^{(k)}}\right] = 1$ for all $m \le k$ because $D_m^{(k)} = 0$ if $1 < m \le k$, after iterating (16.33) we arrive at
$$\varphi_D(z, N; k) = \prod_{m=k}^{N-1}\left\{1 - \Pr\left[n_{m+1} \in \{D_m^{(k)}\}\right]\left(1 - \frac{1}{z}\right) - \Pr\left[n_{m+1} \in \{D_m^{(k-1)}\}\right](1-z)\right\}$$
By taking the derivative of both sides in (16.34) with respect to $z$ and evaluating at $z = 1$, a recursion for the average is found,
$$E\left[D_{N+1}^{(k)}\right] = \frac{N-1}{N}E\left[D_N^{(k)}\right] + \frac{E\left[D_N^{(k-1)}\right]}{N} \tag{16.35}$$
Let $r_N^{(k)} = (N-1)E\left[D_N^{(k)}\right]$, then the recursion valid for $1 < k \le N-2$ becomes
$$r_{N+1}^{(k)} = r_N^{(k)} + \frac{r_N^{(k-1)}}{N-1} \tag{16.36}$$
Theorem 16.6.1 In the URT, the average number of degree $k$ nodes is given by
$$E\left[D_N^{(k)}\right] = \frac{N}{2^k} + \frac{(-1)^{N+k-1}S_{N-1}^{(k)}}{(N-1)!} + \frac{(-1)^N}{2^k (N-1)!}\sum_{j=1}^{k-1} S_{N-1}^{(j)}(-2)^j \tag{16.37}$$
Proof: See Section 16.8. $\square$
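For small trees, the closed form (16.37) can be compared with a direct iteration of the recursion (16.35). The sketch below uses exact fractions and assumes the signed Stirling numbers of the first kind $S_n^{(k)}$ defined by $x(x-1)\cdots(x-n+1) = \sum_k S_n^{(k)} x^k$. For $k = 1$ it uses the variant of the recursion stated in Section 16.8.2, where every new node adds one leaf. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def stirling_first(n_max):
    """Signed Stirling numbers of the first kind, S[n][k] for 0 <= k <= n <= n_max,
    built from the recurrence S(n+1, k) = S(n, k-1) - n * S(n, k)."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(n_max):
        for k in range(1, n + 2):
            S[n + 1][k] = S[n][k - 1] - n * S[n][k]
    return S


def mean_degree_counts(n_max):
    """E[D_N^(k)] for 2 <= N <= n_max, iterating (16.35); k = 1 adds one new leaf."""
    E = {2: {1: Fraction(2)}}
    for N in range(2, n_max):
        prev, cur = E[N], {}
        for k in range(1, N + 1):
            stay = Fraction(N - 1, N) * prev.get(k, Fraction(0))
            if k == 1:
                cur[k] = stay + 1
            else:
                cur[k] = stay + prev.get(k - 1, Fraction(0)) / N
        E[N + 1] = cur
    return E


def mean_degree_closed_form(N, k, S):
    """Right-hand side of (16.37)."""
    fact = factorial(N - 1)
    tail = sum(S[N - 1][j] * (-2) ** j for j in range(1, k))
    return (Fraction(N, 2 ** k)
            + Fraction((-1) ** (N + k - 1) * S[N - 1][k], fact)
            + Fraction((-1) ** N * tail, 2 ** k * fact))
```

As a usage example, `mean_degree_counts(5)[5]` gives the expected numbers of nodes of degree 1 to 4 in a URT with five nodes, and these sum to 5 as (16.31) requires.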
For large $N$ and using the asymptotics of the Stirling numbers $S_N^{(k)}$ of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3.III), the asymptotic law is
$$\Pr[D_{\mathrm{URT}} = k] = \frac{E\left[D_N^{(k)}\right]}{N} = \frac{1}{2^k} + O\left(\frac{\log^{k-1} N}{N^2}\right) \tag{16.38}$$
The ratio of the average number of nodes with degree $k$ over the total number of nodes, which equals the probability that an arbitrary node in a URT of size $N$ has degree $k$, decreases exponentially fast in $k$ with rate $\ln 2$.
The variance $\mathrm{Var}\left[D_N^{(k)}\right]$ is most conveniently computed from the logarithm of the probability generating function with (2.27). By taking the logarithm of both sides in (16.34) and differentiating twice and adding (16.35), we obtain
$$\mathrm{Var}\left[D_{N+1}^{(k)}\right] = f(N; k) + \mathrm{Var}\left[D_N^{(k)}\right]$$
where
$$f(N; k) = -\left(\frac{E\left[D_N^{(k)}\right]}{N} - \frac{E\left[D_N^{(k-1)}\right]}{N}\right)^2 + \left(\frac{E\left[D_N^{(k)}\right]}{N} + \frac{E\left[D_N^{(k-1)}\right]}{N}\right)$$
Since $\mathrm{Var}\left[D_m^{(k)}\right] = 0$ for $m \le k$, the general solution is
$$\mathrm{Var}\left[D_N^{(k)}\right] = \sum_{j=k}^{N-1} f(j; k)$$
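For large $N$, using (16.38), we observe that
$$\frac{\mathrm{Var}\left[D_N^{(k)}\right]}{N} = \left(\frac{3}{2^k} - \frac{1}{2^{2k}}\right) + O\left(\frac{\log^{2k-2} N}{N^2}\right) \tag{16.39}$$
In practice, if we use the estimator $\hat{t}_N^{(k)} = \frac{D_N^{(k)}}{N}$ for the probability that the degree of a node equals $k$, then (a) the estimator is unbiased because the mean of the estimator $E\left[\hat{t}_N^{(k)}\right]$ equals the correct mean $\frac{E\left[D_N^{(k)}\right]}{N}$ and (b) the variance $\mathrm{Var}\left[\hat{t}_N^{(k)}\right] = \mathrm{Var}\left[\frac{D_N^{(k)}}{N}\right] = \frac{\mathrm{Var}\left[D_N^{(k)}\right]}{N^2} \to 0$ as $O\left(\frac{1}{N}\right)$ for large $N$.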
0
RIPE data (May-June 2003) N = 2574, L = 3992
fit: ln(Pr[D U = k]) = 0.44 - 0.67 k with U = 0.99
Pr[DU = k]
10
10
10
10
RIPE data (Jan.-Feb. 2004) N = 3850, L = 6743
fit: ln(Pr[D U = k]) = -0.49 - 0.41 k with U = 0.95
-1
-2
-3
-4
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
30
32
34
k
Fig. 16.6. The histogram of the degree GX derived from the graph JX formed by
the union of paths measured via trace-route in the Internet. Both measurements in
2003 and 2004 are fitted on a log-lin plot and the correlation coe!cient quantifies
the quality of the fit.
The law (16.38) is observed in Fig. 16.6, which plots the histogram of the degree $D_U$ in the graph $G_U$. The graph $G_U$ is obtained from the union of trace-routes from each RIPE measurement box to any other box positioned mainly in the European part of the Internet. For about 50 measurement boxes in 2003, the correspondence is striking because the slope of the fit on a log-lin scale equals $-0.668$ while the law (16.38) gives $-\ln 2 = -0.693$. Ignoring in Fig. 16.6 the leaf nodes with $k = 1$ suggests that the graph $G_U$ is URT-like. For 72 measurement boxes in 2004, which obviously results in a larger graph $G_U$, deviations from the URT law (16.38) are observed. If measurements between a larger full mesh of boxes were possible and if the measurement boxes were more homogeneously spread over the Internet, a power law behavior is likely to be expected as mentioned in Section 15.3. However, these earlier reported trace-route measurements that lead to power law degrees have been performed from a relatively small number of sources to a large number of destinations. These results question the observability of the Internet: how accurate are Internet properties such as hopcount and degree that are derived from incomplete measurements, i.e. from a selected small subset of nodes at which measurement boxes are placed?
16.6.3 The degree of the shortest path tree in the complete graph
with i.i.d. exponential link weights
In the complete graph $K_N$ with i.i.d. exponential link weights, any node $n$ possesses equal properties in probability because of symmetry. If we denote by $d_n$ the degree of node $n$ in the shortest path tree rooted at that node $n$, the symmetry implies that $\Pr[d_n = k] = \Pr[d_i = k]$ for any node $n$ and $i$. In fact, we consider here the degree of a URT as an overlay tree in a complete graph. Concentrating on a node with label 1, we obtain from (16.32)
$$E\left[D_N^{(k)}\right] = N\Pr[d_1 = k] = N\Pr\left[X_N^{(1)} = k\right]$$
The latter follows from the fact that the degree of a node is equal to the number of its direct neighbors, the nodes at level 1, $X_N^{(1)}$. By definition of the URT, the second node surely belongs to the level set 1, while node 3 has equal probability to be attached to the root or to node 2. In general, when attaching a node $j$ to a URT of size $j-1$, the probability that node $j$ is attached to the root equals $\frac{1}{j-1}$. Thus, the number of nodes at level 1 in the URT (constructed upon the complete graph) is in distribution equal to the sum of $N-1$ independent Bernoulli random variables each with different mean $\frac{1}{j-1}$,
$$X_N^{(1)} \stackrel{d}{=} \sum_{j=2}^{N}\mathrm{Bernoulli}\left(\frac{1}{j-1}\right) = \sum_{j=1}^{N-1}\mathrm{Bernoulli}\left(\frac{1}{j}\right)$$
because each node in the complete graph is connected to $N-1$ neighbors.
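The generating function is
$$E\left[z^{X_N^{(1)}}\right] = E\left[z^{\sum_{j=1}^{N-1}\mathrm{Bernoulli}\left(\frac{1}{j}\right)}\right] = \prod_{j=1}^{N-1} E\left[z^{\mathrm{Bernoulli}\left(\frac{1}{j}\right)}\right]$$
Using the probability generating function (3.1) of a Bernoulli random variable, $E\left[z^{\mathrm{Bernoulli}\left(\frac{1}{j}\right)}\right] = 1 - \frac{1}{j} + \frac{z}{j}$, yields
$$E\left[z^{X_N^{(1)}}\right] = \prod_{j=1}^{N-1}\left(\frac{z+j-1}{j}\right) = \frac{\Gamma(z+N-1)}{\Gamma(z)\,\Gamma(N)}$$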
Compared to the generating function (16.6) of the hopcount $h_N$, we recognize that
$$E\left[z^{X_N^{(1)}}\right] = z\,\varphi_{N-1}(z) = \sum_{k=0}^{N-2}\Pr[h_{N-1} = k]\,z^{k+1} = \sum_{k=1}^{N-1}\Pr[h_{N-1} = k-1]\,z^k$$
from which we deduce, for $1 \le k \le N-1$,
$$\Pr\left[X_N^{(1)} = k\right] = \Pr[h_{N-1} = k-1]$$
Using (16.7), we arrive at the curious result
$$\Pr\left[X_N^{(1)} = k\right] = \frac{E\left[X_{N-1}^{(k-1)}\right]}{N-1} \qquad \text{for } 1 \le k \le N-1.$$
The probability that the number of level 1 nodes in the shortest path tree in the complete graph with i.i.d. exponential link weights is $k$ equals the average number of nodes on level $k-1$ in a URT of size $N-1$ divided by that size $N-1$. In other words, the "horizontal" distribution at level 1 is related to the "vertical" distribution of the size of the level sets. In summary$^6$, in the complete graph with i.i.d. exponential link weights, the probability that an arbitrary node $n$ as root of a shortest path tree has degree $k$ is
$$\Pr[d_n = k] = \Pr[h_{N-1} = k-1] = \frac{(-1)^{N-1-k}S_{N-1}^{(k)}}{(N-1)!} \tag{16.40}$$
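The root-degree law (16.40) can be reproduced by expanding the product of Bernoulli generating functions $\prod_{j=1}^{N-1}\left(1 - \frac{1}{j} + \frac{z}{j}\right)$ coefficient by coefficient. The sketch below does this with exact fractions and compares the result with the unsigned Stirling numbers of the first kind, $|S_{N-1}^{(k)}| = (-1)^{N-1-k}S_{N-1}^{(k)}$. The helper names are illustrative, not taken from the text.

```python
from fractions import Fraction
from math import factorial


def root_degree_pmf(N):
    """Coefficients of prod_{j=1}^{N-1} (1 - 1/j + z/j): entry k is Pr[d = k]."""
    coeffs = [Fraction(1)]
    for j in range(1, N):
        p = Fraction(1, j)          # probability that node j+1 attaches to the root
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)   # not attached to the root: degree unchanged
            nxt[k + 1] += c * p     # attached to the root: degree grows by one
        coeffs = nxt
    return coeffs


def unsigned_stirling_row(n):
    """Row n of the unsigned Stirling numbers of the first kind,
    from c(m+1, k) = m * c(m, k) + c(m, k-1)."""
    row = [1]
    for m in range(n):
        new = [0] * (len(row) + 1)
        for k, c in enumerate(row):
            new[k] += m * c
            new[k + 1] += c
        row = new
    return row
```

For example, `root_degree_pmf(3)` returns the probabilities 0, 1/2, 1/2 for degrees 0, 1, 2, reflecting that node 3 attaches to the root or to node 2 with equal probability.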
The degree of an arbitrary node in the union of all shortest path trees in the complete graph $K_N$ with i.i.d. exponential link weights is also given by (16.40) because in that union each node $n$ is once a root and further plays, by symmetry, the role of the $j$-th attached node in the URT rooted at any other node in $K_N$.
$^6$ This result is due to Remco van der Hofstad (private communication).
16.7 The minimum spanning tree
From an algorithmic point of view, the shortest path problem is closely related to the computation of the minimum spanning tree (MST). The Dijkstra
shortest path algorithm is similar to Prim’s minimum spanning tree algorithm (Cormen et al., 1991). In this section, we compute the average weight
of the MST in a graph with a general link weight structure.
16.7.1 The Kruskal growth process of the MST
Since the link weights in the underlying complete graph are chosen independently and assigned randomly to links in the complete graph, the resulting
graph is probabilistically the same if we first order the set of link weights
and assign them in increasing order randomly to links in the complete graph.
In the latter construction process, only the order statistics or the ranking
of the link weights suffice to construct the graph because the precise link
weight can be unambiguously associated to the rank of a link. This observation immediately favors the Kruskal algorithm for the MST over Prim’s
algorithm. Although the Prim algorithm leads to the same MST, it gives
a more complicated, long-memory growth process, where the attachment of
each new node depends stochastically on the whole growth history so far.
Pietronero and Schneider (1990) illustrate that in our approach Prim, in
contrast with Kruskal, leads to a very complicated stochastic process for the
construction of the MST.
The Kruskal growth process described here is closely related to a growth process of the random graph $G_r(N, L)$ with $N$ nodes and $L$ links. The construction or growth of $G_r(N, L)$ starts from $N$ individual nodes and in each step an arbitrary, not yet connected random pair of nodes is connected. The only difference with Kruskal's algorithm for the MST is that, in Kruskal, links generating loops are forbidden. Those forbidden links are the links that connect nodes within the same connected component or "cluster". As a result, the internal wiring of the clusters differs, but the cluster size statistics (counted in nodes, not links) are exactly the same as in the corresponding random graph. The metacode of the Kruskal growth process for the construction of the MST is shown in Fig. 16.7.
The growth process of the random graph $G_p(N)$, which is asymptotically equal to that of $G_r(N, L)$, is quantified in Section 15.6.4 for large $N$. The fraction of nodes $S$ in the giant component of $G_p(N)$ is related to the average degree $\mu_{rg}$ or to the link density $p$ because $\mu_{rg} = p(N-1)$ in $G_p(N)$ by (15.20). For large $N$, the size of the giant cluster in the forest is thus determined as a function of the number of added links that increase $\mu_{rg}$.
KruskalGrowthMST
1. start with N disconnected nodes
2. repeat until all nodes are connected
3.    randomly select a node pair (i, j)
4.    if a path P(i -> j) does not exist
5.    then connect i to j
Fig. 16.7. Kruskal growth process
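The metacode of Fig. 16.7 can be sketched with a disjoint-set (union-find) structure, which answers the question "does a path from $i$ to $j$ exist?" without searching the graph. This is one possible realization under a stated simplification: node pairs are drawn with replacement, so a pair may be selected again, but a repeated pair always lies within one component and is rejected, which leaves the resulting tree unaffected. The class and function names are hypothetical.

```python
import random


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return False if already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal_growth(n_nodes, rng):
    """Run the growth process of Fig. 16.7.
    Returns (tree_links, attempts), where attempts counts all selected pairs."""
    components = DisjointSet(n_nodes)
    tree_links = []
    attempts = 0
    while len(tree_links) < n_nodes - 1:        # step 2: until all nodes are connected
        i, j = rng.sample(range(n_nodes), 2)    # step 3: random node pair
        attempts += 1
        if components.union(i, j):              # steps 4-5: connect only if no path exists
            tree_links.append((i, j))
    return tree_links, attempts
```

The ratio `2 * attempts / n_nodes` plays the role of the mean degree $\mu_{rg}$ of the accompanying random graph, while `2 * len(tree_links) / n_nodes` is the mean degree $\mu_{MST}$ of the tree.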
[Figure 16.8: a giant component and several small components, with the six link types labelled a to f.]
Fig. 16.8. Component structure during the Kruskal growth process.
We will now transform the mean degree $\mu_{rg}$ in the random graph $G_p(N)$ to the mean degree $\mu_{MST}$ in the corresponding stage in the Kruskal growth process of the MST. In early stages of the growth each selected link will be added with high probability such that $\mu_{MST} = \mu_{rg}$ almost surely. After some time the probability that a selected link is forbidden increases, and thus $\mu_{rg}$ exceeds $\mu_{MST}$. In the end, when connectivity of all $N$ nodes is reached, $\mu_{MST} = 2$ (since it is a tree) while $\mu_{rg} = O(\log N)$, as follows from (15.19) and the critical threshold $p_c \sim \frac{\log N}{N}$.
Consider now an intermediate stage of the growth as illustrated in Fig. 16.8. Assume there is a giant component of average size $NS$ and $n_l = N(1-S)/s_l$ small components of average size $s_l$ each. Then we can distinguish six types of links labelled a-f in Fig. 16.8. Types a and b are links that have been chosen earlier in the giant component (a) and in the small components (b) respectively. Types c and d are eligible links between the giant component and a small component (c) and between small components (d) respectively. Types e and f are forbidden links connecting nodes within the giant component (e), respectively within a small component (f). For large $N$, we can enumerate the average number of links $L_x$ of each type $x$:
$L_a + L_b = \frac{1}{2}\mu_{MST} N$
$L_c = SN \cdot (1-S)N$
$L_d = \frac{1}{2} n_l^2 \cdot s_l^2$
$L_e = \frac{1}{2}(SN)^2 - SN$
$L_f = \frac{1}{2} n_l s_l (s_l - 1) - n_l (s_l - 1)$
To highest order in $O(N^2)$, we have
$$L_c = N^2 S(1-S), \qquad L_d = \frac{1}{2}N^2(1-S)^2, \qquad L_e = \frac{1}{2}N^2 S^2$$
The probability that a randomly selected link is eligible is $q = \frac{L_c + L_d}{L_c + L_d + L_e + L_f}$ or, to order $O(N^2)$,
$$q = 1 - S^2 \tag{16.41}$$
In contrast with the growth of the random graph $G_p(N)$ where at each stage a link is added with probability $p$, in the Kruskal growth of the MST we are only successful to add one link (with probability 1) per $\frac{1}{q}$ stages on average. Thus the average number of links added in the random graph corresponding to one link in the MST is $\frac{1}{q} = \frac{1}{1-S^2}$. This provides an asymptotic mapping between $\mu_{rg}$ and $\mu_{MST}$ in the form of a differential equation,
$$\frac{d\mu_{rg}}{d\mu_{MST}} = \frac{1}{1-S^2}$$
By using (15.22), we find
$$\frac{d\mu_{MST}}{dS} = \frac{d\mu_{MST}}{d\mu_{rg}}\frac{d\mu_{rg}}{dS} = \frac{(1+S)\left(S + (1-S)\log(1-S)\right)}{S^2}$$
Integration with the initial condition $\mu_{MST} = 2$ at $S = 1$, finally gives the average degree $\mu_{MST}$ in the MST as function of the fraction $S$ of nodes in the giant component
$$\mu_{MST}(S) = 2S - \frac{(1-S)^2\log(1-S)}{S} \tag{16.42}$$
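The mapping (16.42) and its derivative are simple to evaluate numerically. The sketch below assumes the relation $\mu_{rg} = -\log(1-S)/S$ that the text attributes to (15.22), and lets one confirm by finite differences that (16.42) indeed integrates the differential equation above. The function names are illustrative.

```python
import math


def mu_rg(S):
    """Mean degree of the random graph as a function of the giant fraction S,
    assuming (15.22) reads mu_rg = -log(1 - S) / S."""
    return -math.log1p(-S) / S


def mu_mst(S):
    """Mean degree of the MST as a function of S, equation (16.42)."""
    return 2.0 * S - (1.0 - S) ** 2 * math.log1p(-S) / S


def dmu_mst_dS(S):
    """Derivative of mu_MST with respect to S, as given before (16.42)."""
    return (1.0 + S) * (S + (1.0 - S) * math.log1p(-S)) / S ** 2


if __name__ == "__main__":
    for S in (0.1, 0.5, 0.9, 0.99):
        print(f"S = {S:4.2f}  mu_rg = {mu_rg(S):6.3f}  mu_MST = {mu_mst(S):6.3f}")
```

At the two ends of the range, $\mu_{MST} \to 1$ as $S \to 0$ (the transition point) and $\mu_{MST} \to 2$ as $S \to 1$ (a spanning tree), while $\mu_{rg}$ grows without bound.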
As shown in Fig. 16.9, the asymptotic result (16.42) agrees well with the simulation (even for a single sample), except in a small region around the transition $\mu_{MST} = 1$ and for relatively small $N$.
[Figure 16.9: fraction $S$ of nodes in the giant component versus mean degree $\mu_{MST}$, for $N = 1000$, $N = 10000$, $N = 25000$ and theory.]
Fig. 16.9. Size of the giant component (divided by $N$) as a function of the mean degree $\mu_{MST}$. Each simulation for a different number of nodes $N$ consists of one MST sample.
The key observation is that all transition probabilities in the Kruskal growth process asymptotically depend on merely one parameter $S$, the fraction of nodes in the giant component, and $S$ is called an order parameter in statistical physics. In general, the expectation of an order parameter distinguishes the qualitatively different regimes (states) below and above the phase transition. In higher dimensions, fluctuations of the order parameter around the mean can be neglected and the mean value can be computed from a self-consistent mean-field theory. In our problem, the underlying complete (or random) graph topology makes the problem effectively infinite-dimensional. The argument leading to (15.20) is essentially a mean-field argument.
16.7.2 The average weight of the minimum spanning tree
By definition, the weight of the MST is
$$W_{MST} = \sum_{j=1}^{L} w_{(j)} 1_{j \in MST} \tag{16.43}$$
where $w_{(j)}$ is the $j$-th smallest link weight. The average MST weight is
$$E[W_{MST}] = \sum_{j=1}^{L} E\left[w_{(j)} 1_{j \in MST}\right]$$
The random variables $w_{(j)}$ and $1_{j \in MST}$ are independent because the $j$-th smallest link weight $w_{(j)}$ only depends on the link weight distribution and the number of links $L$, while the appearance of the $j$-th link in the MST only depends on the graph's topology, as shown in Section 16.7.1. Hence,
$$E\left[w_{(j)} 1_{j \in MST}\right] = E\left[w_{(j)}\right]E\left[1_{j \in MST}\right] = E\left[w_{(j)}\right]\Pr[j \in MST]$$
such that the average weight of the MST is
$$E[W_{MST}] = \sum_{j=1}^{L} E\left[w_{(j)}\right]\Pr[j \in MST] \tag{16.44}$$
In general for independent link weights with probability density function $f_w(x)$ and distribution function $F_w(x) = \Pr[w \le x]$, the probability density function of the $j$-th order statistic follows from (3.36) as
$$f_{w_{(j)}}(x) = \frac{j f_w(x)}{F_w(x)}\binom{L}{j}(F_w(x))^j(1 - F_w(x))^{L-j} \tag{16.45}$$
The factor $\binom{L}{j}(F_w(x))^j(1 - F_w(x))^{L-j}$ is a binomial distribution with mean $\mu = F_w(x)L$ and variance $\sigma^2 = LF_w(x)(1 - F_w(x))$ that, by the Central Limit Theorem 6.3.1, tends for large $L$ to a Gaussian $\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(j-\mu)^2}{2\sigma^2}}$, which peaks at $j = \mu$. For large $N$ and fixed $\frac{j}{L}$, we have$^7$ $x_j = E\left[w_{(j)}\right] \simeq F_w^{-1}\left(\frac{j}{L}\right)$.
We found before in (16.41) that the link ranked $j$ appears in the MST with probability
$$\Pr[j \in MST] = 1 - S_j^2$$
where $S_j$ is the fraction of nodes in the giant component during the construction process of the random graph at the stage where the number of links precisely equals $j$. Since links are added independently, that stage in fact establishes the random graph $G_r(N, L = j)$. Our graph under consideration is the complete graph $K_N$ such that we add in total $L = \binom{N}{2}$ links.
$^7$ In general, it holds that $w_{(j)} = F_w^{-1}(U_{(j)})$ and
$$E\left[w_{(j)}\right] = E\left[F_w^{-1}(U_{(j)})\right] \ne F_w^{-1}\left(E\left[U_{(j)}\right]\right)$$
but, for a large number of order statistics $L$, the Central Limit Theorem 6.3.1 leads to
$$E\left[w_{(j)}\right] \simeq F_w^{-1}\left(E\left[U_{(j)}\right]\right) \simeq F_w^{-1}\left(\frac{j}{L}\right)$$
because for a uniform random variable $U$ on $[0,1]$ the average weight of the $j$-th smallest link is exactly
$$E\left[w_{(j)}\right] = \frac{j}{L+1} \simeq \frac{j}{L}$$
With (15.22) and $\mu_{rg} = \frac{2L}{N}$, it follows that
$$\frac{2j}{N} = -\frac{\log(1 - S_j)}{S_j} \tag{16.46}$$
Hence,
$$E[W_{MST}] \simeq \sum_{j=1}^{L} F_w^{-1}\left(\frac{j}{L}\right)\left(1 - S_j^2\right)$$
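We now approximate the sum by an integral,
$$E[W_{MST}] \simeq \int_1^{L} F_w^{-1}\left(\frac{u}{L}\right)\left(1 - S_u^2\right)du$$
Substituting $x = \frac{2u}{N}$ (which is the average degree in any graph $G(N, u)$) yields for large $N$ where $L \simeq \frac{N^2}{2}$,
$$E[W_{MST}] \simeq \frac{N}{2}\int_{\frac{2}{N}}^{N-1} F_w^{-1}\left(\frac{x}{N}\right)\left(1 - S_{\frac{N}{2}x}^2\right)dx \simeq \frac{N}{2}\int_0^{N} F_w^{-1}\left(\frac{x}{N}\right)\left(1 - S^2(x)\right)dx$$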
It is known (Janson et al., 1993) that, if the number of links in the growth process of the random graph is below $\frac{N}{2}$, with high probability (and ignoring a small onset region just below $\frac{N}{2}$), there is no giant component such that $S(x) = 0$ for $x \in [0, 1]$. Thus, we arrive at the general formula valid for large $N$,
$$E[W_{MST}] \simeq \frac{N}{2}\int_0^1 F_w^{-1}\left(\frac{x}{N}\right)dx + \frac{N}{2}\int_1^{N} F_w^{-1}\left(\frac{x}{N}\right)\left(1 - S^2(x)\right)dx \tag{16.47}$$
The first term is the contribution from the smallest $N/2$ links in the graph, which are included in the MST almost surely. The remaining part comes from the more expensive links in the graph, which are included with diminishing probability since $1 - S^2(x)$ decreases exponentially for large $x$ as can be deduced from (15.21). The rapid decrease of $1 - S^2(x)$ makes only relatively small values of the argument $F_w^{-1}\left(\frac{x}{N}\right)$ contribute to the second integral.
At this point, the specifics of the link weight distribution need to be introduced. The Taylor expansion of $\frac{N}{2}F_w^{-1}\left(\frac{x}{N}\right)$ for large $N$ to first order is
$$\frac{N}{2}F_w^{-1}\left(\frac{x}{N}\right) = \frac{N}{2}F_w^{-1}(0) + \frac{x}{2f_w(0)} + O\left(\frac{1}{N}\right) = \frac{x}{2f_w(0)} + O\left(\frac{1}{N}\right)$$
since we require that link weights are positive such that $F_w^{-1}(0) = 0$. This expansion is only useful provided $f_w$ is regular, i.e. $f_w(0)$ is neither zero nor infinity. These cases occur, for example, for polynomial link weights with $f_w(x) = \alpha x^{\alpha-1}$ and $\alpha \ne 1$. For polynomial link weights, however, holds that $\frac{N}{2}F_w^{-1}\left(\frac{x}{N}\right) = \frac{N^{1-\frac{1}{\alpha}}}{2}x^{\frac{1}{\alpha}}$. Formally, this latter expression reduces to the first order Taylor approach for $\alpha = 1$, apart from the constant factor $\frac{1}{f_w(0)}$. Therefore, we will first compute $E[W_{MST}]$ for polynomial link weights and then return to the case in which the Taylor expansion is useful.
16.7.2.1 Polynomial link weights
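The average weight of the MST for polynomial link weights follows$^8$ from (16.47) as
$$E[W_{MST}(\alpha)] \simeq \frac{N^{1-\frac{1}{\alpha}}}{2}\left(\frac{1}{1+\frac{1}{\alpha}} + \int_1^{N} x^{\frac{1}{\alpha}}\left(1 - S^2(x)\right)dx\right)$$
Let $y = S(x)$ and use (15.22), then $x = S^{-1}(y) = -\frac{\log(1-y)}{y}$ and $dx = \frac{d}{dy}\left(-\frac{\log(1-y)}{y}\right)dy$ while $y = S(1) = 0$ and $y = S(N) = 1$, such that
$$I = \int_1^{N} x^{\frac{1}{\alpha}}\left(1 - S^2(x)\right)dx = \int_0^1\left(-\frac{\log(1-y)}{y}\right)^{\frac{1}{\alpha}}\left(1 - y^2\right)\frac{d}{dy}\left(-\frac{\log(1-y)}{y}\right)dy$$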
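After partial integration, we have
$$I = -\frac{1}{\frac{1}{\alpha}+1} + \frac{2}{\frac{1}{\alpha}+1}\int_0^{\infty}\frac{x^{\frac{1}{\alpha}+1}e^{-x}}{\left(1 - e^{-x}\right)^{\frac{1}{\alpha}}}\,dx$$
Finally, we end up with
$$E[W_{MST}(\alpha)] \simeq N^{1-\frac{1}{\alpha}}\,\frac{1}{\frac{1}{\alpha}+1}\int_0^{\infty}\frac{x^{\frac{1}{\alpha}+1}e^{-x}}{\left(1 - e^{-x}\right)^{\frac{1}{\alpha}}}\,dx \tag{16.48}$$
$^8$ Since the average of the $k$-th smallest link weight can be computed from (3.36) as
$$E\left[w_{(k)}\right] = \frac{\Gamma\left(k+\frac{1}{\alpha}\right)L!}{\Gamma(k)\,\Gamma\left(L+1+\frac{1}{\alpha}\right)}$$
the exact formula (16.44) reduces to
$$E[W_{MST}(\alpha)] = \frac{L!}{\Gamma\left(L+1+\frac{1}{\alpha}\right)}\sum_{j=1}^{L}\frac{\Gamma\left(j+\frac{1}{\alpha}\right)}{\Gamma(j)}\left(1 - S_j^2\right)$$
Analogously to the above manipulations, after conversion to an integral, substituting $x = \frac{2u}{N}$ and using (Abramowitz and Stegun, 1968, Section 6.1.47), for large $z$, that $\frac{\Gamma\left(z+\frac{1}{\alpha}\right)}{\Gamma(z)} = z^{\frac{1}{\alpha}}\left(1 + O\left(\frac{1}{z}\right)\right)$, we arrive at the same formula.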
If $\alpha < 1$, then $E[W_{MST}(\alpha)] \to 0$ for $N \to \infty$, while for $\alpha > 1$, $E[W_{MST}(\alpha)] \to \infty$. In particular, $\lim_{\alpha\to\infty} E[W_{MST}(\alpha)] = N - 1$. Only for $\alpha = 1$, $E[W_{MST}(1)]$ is finite for large $N$. More precisely,
$$E[W_{MST}(1)] = \zeta(3) = 1.202\ldots \tag{16.49}$$
where we have used (Abramowitz and Stegun, 1968, Section 23.2.7) the integral of the Riemann Zeta function $\Gamma(s)\zeta(s) = \int_0^{\infty}\frac{u^{s-1}}{e^u - 1}\,du$, which is convergent for $\mathrm{Re}(s) > 1$. This particular case for $\alpha = 1$ has been proved earlier by Frieze (1985) based on a different method.
16.7.2.2 Generalizations
We now return to the Taylor series valid for link weights where $0 < f_w(0) < \infty$. The above result for $\alpha = 1$ immediately yields
$$E[W_{MST}] = \frac{\zeta(3)}{f_w(0)} \qquad (16.50)$$
This result is for the complete graph $K_N$. A random graph $G_p(N)$ with $p < 1$ and weight density $f_w(x)$ is equivalent to $K_N$ with a fraction $1 - p$ of infinite link weights. Thus the effective link weight distribution is $p f_w(x) + (1 - p)\delta_{w,\infty}$, and we can simply replace $f_w(0)$ by $p f_w(0)$ in the expression (16.50) to obtain the average weight of the MST in the random graph $G_p(N)$.
16.8 The proof of the degree Theorem 16.6.1 of the URT
16.8.1 The case $k = N$: $E[D_N^{(N-1)}]$
If $k = N$, the recursion (16.35) becomes with $D_N^{(N)} = 0$,
$$E\left[D_{N+1}^{(N)}\right] = \frac{E\left[D_N^{(N-1)}\right]}{N}$$
With initial value $D_2^{(1)} = 2$, the solution
$$E\left[D_N^{(N-1)}\right] = \frac{2}{(N-1)!} \qquad (16.51)$$
is readily verified. Since for any URT, it holds that $\Pr[D_N^{(k)} = j] = 0$ for $j > N - k$, we have that $E[D_N^{(N-1)}] = \Pr[D_N^{(N-1)} = 1]$. Since there exist in total $(N-1)!$ different URTs of size $N$, this result (16.51) means that there are precisely two possible URTs with a node of degree $N - 1$. Indeed, one is the root with $N - 1$ children and the other is the root with one child of degree $N - 1$ that in turn possesses $N - 2$ children. Also,
$$r_N^{(N-1)} = (N-1)\,E\left[D_N^{(N-1)}\right] = \frac{2}{(N-2)!} \qquad (16.52)$$
16.8.2 The case $k = 1$: $E[D_N^{(1)}]$
If $k = 1$ and $N \geq 3$, the recursion (16.35) is slightly different because the newly attached node $N + 1$ necessarily belongs to the set of degree 1 nodes in the URT of size $N + 1$ such that
$$E\left[D_{N+1}^{(1)}\right] = \frac{N-1}{N}\,E\left[D_N^{(1)}\right] + 1$$
with $D_1^{(0)} = 1$ and $D_2^{(1)} = 2$. Hence, $E[D_3^{(1)}] = 2$. With $r_N^{(1)} = (N-1)E[D_N^{(1)}]$, the recursion for $k = 1$ becomes
$$r_{N+1}^{(1)} = r_N^{(1)} + N$$
The particular solution is $r_{N;p}^{(1)} = aN^2 + bN + c$. Substitution of $r_N^{(1)} = aN^2 + bN + c$ into the difference equation yields
$$aN^2 + (b + 2a)N + a + b + c = aN^2 + (b + 1)N + c$$
or, by equating corresponding powers in $N$, we find the conditions $b + 2a = b + 1$ and $a + b = 0$ from which $a = \frac{1}{2}$, $b = -\frac{1}{2}$. Thus,
$$r_N^{(1)} = (N-1)\,E\left[D_N^{(1)}\right] = \frac{N}{2}(N-1) + c$$
and
$$E\left[D_N^{(1)}\right] = \frac{N}{2} + \frac{c}{N-1}$$
Using $E[D_3^{(1)}] = 2$ shows that $c = 1$ such that, for $N > 2$,
$$E\left[D_N^{(1)}\right] = \frac{N}{2} + \frac{1}{N-1} \qquad (16.53)$$
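The two closed forms (16.51) and (16.53) can be checked by brute force for small $N$. The sketch below is not part of the original proof: it assumes that a URT of size $N$ is a uniformly chosen recursive tree, so that each of the $(N-1)!$ parent assignments is equally likely, and it counts the link to the parent in the degree of a node. The helper names `all_urts`, `degrees` and `mean_degree_count` are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def all_urts(n):
    """Yield every recursive tree on nodes 0..n-1 as a tuple of parents.

    Node j (1 <= j < n) attaches to one of the earlier nodes 0..j-1,
    which gives (n-1)! equally likely trees (assumed URT model).
    """
    return product(*(range(j) for j in range(1, n)))

def degrees(parents):
    """Degree of every node; the link to the parent is counted too."""
    deg = [0] * (len(parents) + 1)
    for child, parent in enumerate(parents, start=1):
        deg[child] += 1
        deg[parent] += 1
    return deg

def mean_degree_count(n, k):
    """Exact average number of degree-k nodes over all URTs of size n."""
    trees = list(all_urts(n))
    total = sum(degrees(p).count(k) for p in trees)
    return Fraction(total, len(trees))

if __name__ == "__main__":
    for n in range(3, 8):
        print(n, mean_degree_count(n, 1), mean_degree_count(n, n - 1))
```

Exact rational arithmetic is used so that the comparison with $\frac{N}{2} + \frac{1}{N-1}$ and $\frac{2}{(N-1)!}$ is an equality test rather than a tolerance check; the enumeration grows as $(N-1)!$ and is only meant for small $N$.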
16.8.3 The general case: $E[D_N^{(k)}]$
Let us denote
$$R(x,y) = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k)} x^k y^N \qquad (16.54)$$
then the recursion (16.36) is transformed into
$$\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} (N-1)\,r_{N+1}^{(k)} x^k y^N = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} (N-1)\,r_N^{(k)} x^k y^N + \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k-1)} x^k y^N$$
Now, the left-hand side is
$$\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} (N-1)\,r_{N+1}^{(k)} x^k y^N = \frac{1}{y}\sum_{N=4}^{\infty}\sum_{k=2}^{N-2} (N-2)\,r_N^{(k)} x^k y^N = \frac{1}{y}\sum_{N=3}^{\infty}\sum_{k=2}^{N-2} (N-2)\,r_N^{(k)} x^k y^N$$
$$= \frac{1}{y}\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} N r_N^{(k)} x^k y^N - \frac{1}{y}\sum_{N=3}^{\infty} N r_N^{(N-1)} x^{N-1} y^N - \frac{2}{y}\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k)} x^k y^N + \frac{2}{y}\sum_{N=3}^{\infty} r_N^{(N-1)} x^{N-1} y^N$$
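Using (16.54) yields
$$\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} (N-1)\,r_{N+1}^{(k)} x^k y^N = \frac{\partial R(x,y)}{\partial y} - \frac{2}{y} R(x,y) - \frac{1}{y}\sum_{N=3}^{\infty} N r_N^{(N-1)} x^{N-1} y^N + \frac{2}{y}\sum_{N=3}^{\infty} r_N^{(N-1)} x^{N-1} y^N$$
Invoking (16.52) yields
$$\frac{2}{y}\sum_{N=3}^{\infty} r_N^{(N-1)} x^{N-1} y^N = \frac{4}{xy}\sum_{N=3}^{\infty}\frac{(xy)^N}{(N-2)!} = 4xy\sum_{N=1}^{\infty}\frac{(xy)^N}{N!} = 4xy\left(e^{xy} - 1\right)$$
$$\frac{1}{y}\sum_{N=3}^{\infty} N r_N^{(N-1)} x^{N-1} y^N = \frac{2}{xy}\sum_{N=3}^{\infty}\frac{N (xy)^N}{(N-2)!} = 2xy\sum_{N=1}^{\infty}\frac{(N+2)(xy)^N}{N!}$$
$$= 2(xy)^2\sum_{N=0}^{\infty}\frac{(xy)^N}{N!} + 4xy\sum_{N=1}^{\infty}\frac{(xy)^N}{N!} = 2(xy)^2 e^{xy} + 4xy\left(e^{xy} - 1\right)$$
such that
$$\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} (N-1)\,r_{N+1}^{(k)} x^k y^N = \frac{\partial R(x,y)}{\partial y} - \frac{2}{y} R(x,y) - 2(xy)^2 e^{xy}$$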
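Similarly,
$$\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k-1)} x^k y^N = x\sum_{N=3}^{\infty}\sum_{k=1}^{N-2} r_N^{(k)} x^k y^N$$
$$= x\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k)} x^k y^N + x^2\sum_{N=3}^{\infty} r_N^{(1)} y^N - x\sum_{N=3}^{\infty} r_N^{(N-1)} x^{N-1} y^N$$
$$= xR(x,y) + x^2\sum_{N=3}^{\infty} r_N^{(1)} y^N - x\sum_{N=3}^{\infty} r_N^{(N-1)} x^{N-1} y^N$$
Using both (16.52) and $r_N^{(1)} = \frac{N}{2}(N-1) + 1$ leads to
$$x^2\sum_{N=3}^{\infty} r_N^{(1)} y^N = x^2\sum_{N=3}^{\infty}\frac{N}{2}(N-1)\,y^N + x^2\sum_{N=3}^{\infty} y^N = \frac{(xy)^2}{2}\frac{d^2}{dy^2}\left(\frac{y^3}{1-y}\right) + \frac{x^2 y^3}{1-y}$$
$$x\sum_{N=3}^{\infty} r_N^{(N-1)} x^{N-1} y^N = 2\sum_{N=3}^{\infty}\frac{(xy)^N}{(N-2)!} = 2(xy)^2\sum_{N=1}^{\infty}\frac{(xy)^N}{N!} = 2(xy)^2\left(e^{xy} - 1\right)$$
such that
$$\sum_{N=3}^{\infty}\sum_{k=2}^{N-1} r_N^{(k-1)} x^k y^N = xR(x,y) + \frac{(xy)^2}{2}\frac{d^2}{dy^2}\left(\frac{y^3}{1-y}\right) + \frac{x^2 y^3}{1-y} - 2(xy)^2\left(e^{xy} - 1\right)$$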
Combining all transforms the recursion (16.36) to a first order linear partial differential equation
$$(1-y)\frac{\partial R(x,y)}{\partial y} + \left(1 - x - \frac{2}{y}\right) R(x,y) = x^2 y^3\,\frac{3 - 3y + y^2}{(1-y)^3} + \frac{x^2 y^3}{1-y} + 2(xy)^2 = x^2 y^2\left[\frac{1}{1-y} + \frac{1}{(1-y)^3}\right]$$
with boundary equations $R(x,0) = R(0,y) = 0$. Further,
$$\int \frac{R(x,y)}{y^2}\,dy = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1}\frac{r_N^{(k)}}{N-1}\,x^k y^{N-1} = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} E\left[D_N^{(k)}\right] x^k y^{N-1}$$
Hence, if $x = 1$,
$$\int \frac{R(1,y)}{y^2}\,dy = \sum_{N=3}^{\infty}\sum_{k=2}^{N-1} E\left[D_N^{(k)}\right] y^{N-1} = \sum_{N=3}^{\infty}\left(N - E\left[D_N^{(1)}\right]\right) y^{N-1}$$
$$= \sum_{N=3}^{\infty} N y^{N-1} - \sum_{N=3}^{\infty} E\left[D_N^{(1)}\right] y^{N-1} = \sum_{N=3}^{\infty} N y^{N-1} - \sum_{N=3}^{\infty}\frac{N}{2}\,y^{N-1} - \sum_{N=3}^{\infty}\frac{y^{N-1}}{N-1}$$
$$= \frac{1}{2}\,\frac{1}{(1-y)^2} - \frac{1}{2}(1 + 2y) - \int \frac{y}{1-y}\,dy$$
or
$$R(1,y) = \frac{y^2}{(1-y)^3} - \frac{y^2}{1-y} \qquad (16.55)$$
It is more convenient to consider the differential equation as an ordinary differential equation in $y$ and to regard the variable $x$ as a parameter. The homogeneous differential equation,
$$(1-y)\frac{\partial R_h(x,y)}{\partial y} = \left(x + \frac{2}{y} - 1\right) R_h(x,y)$$
is solved after integration with respect to $y$,
$$\ln R_h(x,y) = \int \frac{x + \frac{2}{y} - 1}{1-y}\,dy = (x-1)\int\frac{dy}{1-y} + 2\int\frac{dy}{y(1-y)}$$
$$= (1-x)\ln(1-y) + 2\left(\ln y - \ln(1-y)\right) = \ln\left[(1-y)^{-x-1} y^2\right]$$
or $R_h(x,y) = (1-y)^{-x-1} y^2$. The particular solution is of the form $R(x,y) = C(y)\,R_h(x,y)$ where $C(y)$ obeys
$$\frac{\partial C(y)}{\partial y} = x^2 (1-y)^x\left[\frac{1}{1-y} + \frac{1}{(1-y)^3}\right]$$
or
$$C(y) = x^2\int\left[(1-y)^{x-3} + (1-y)^{x-1}\right] dy + c(x) = -\frac{x^2 (1-y)^{x-2}}{x-2} - \frac{x^2 (1-y)^x}{x} + c(x)$$
where $c(x)$ is a function of $x$, independent of $y$, to be determined later. The solution is
$$R(x,y) = -\frac{(xy)^2}{(x-2)(1-y)^3} - \frac{x y^2}{1-y} + c(x)\,(1-y)^{-x-1} y^2$$
The initial condition $R(0,y) = 0$ shows that $c(0) = 0$, while the boundary condition (16.55) implies that $c(1) = 0$. Expanding this solution in a power series around $x = 0$ and $y = 0$ yields
$$R_h(x,y) = (1-y)^{-x-1} y^2 = \sum_{N=0}^{\infty}\binom{-x-1}{N}(-1)^N y^{N+2}$$
From the generating function of the Stirling numbers of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3),
$$\frac{\Gamma(x+1)}{\Gamma(x+1-n)} = \sum_{j=0}^{n} S_n^{(j)} x^j \qquad (16.56)$$
we observe that
$$\binom{-x-1}{N} = \frac{\Gamma(-x)}{N!\,\Gamma(-x-N)} = \sum_{k=0}^{N}\frac{S_{N+1}^{(k+1)}(-1)^k}{N!}\,x^k$$
such that
$$R_h(x,y) = \sum_{N=0}^{\infty}\sum_{k=0}^{N}\frac{S_{N+1}^{(k+1)}}{N!}(-1)^{N+k} x^k y^{N+2} = \sum_{N=2}^{\infty}\sum_{k=0}^{N-2}\frac{S_{N-1}^{(k+1)}}{(N-2)!}(-1)^{N+k} x^k y^N$$
Hence,
$$R(x,y) = \sum_{N=2}^{\infty}\sum_{k=2}^{\infty}\frac{1}{2^{k-1}}\binom{N}{2} x^k y^N - x\sum_{N=2}^{\infty} y^N + c(x)\sum_{N=2}^{\infty}\sum_{k=0}^{N-2}\frac{S_{N-1}^{(k+1)}}{(N-2)!}(-1)^{N+k} x^k y^N$$
It remains to determine $c(x)$ by equating the corresponding powers in $x$ and $y$ at both sides. With the definition (16.54), equating the second power ($N = 2$) in $y$ yields
$$0 = \sum_{k=2}^{\infty}\frac{1}{2^{k-1}}\,x^k - x + c(x)$$
which indicates that
$$c(x) = x - \frac{x^2}{2-x}$$
agreeing with $c(0) = c(1) = 0$. The Taylor series around $x = 0$ is $c(x) = \sum_{k=0}^{\infty} c_k x^k$ with $c_0 = 0$, $c_1 = 1$ and $c_k = -\frac{1}{2^{k-1}}$ for $k > 1$. Equating the power $N > 2$ in $y$,
$$\sum_{k=2}^{N-1} r_N^{(k)} x^k = \binom{N}{2}\sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} - x + c(x)\sum_{k=0}^{N-2}\frac{S_{N-1}^{(k+1)}}{(N-2)!}(-1)^{N+k} x^k$$
$$= \binom{N}{2}\sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} - x + \sum_{k=0}^{\infty} c_k x^k\,\sum_{k=0}^{\infty}\frac{S_{N-1}^{(k+1)}}{(N-2)!}(-1)^{N+k} x^k$$
$$= \binom{N}{2}\sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} - x + \sum_{k=1}^{\infty}\left(\sum_{j=0}^{k-1} c_{k-j}\,(-1)^{N+j}\frac{S_{N-1}^{(j+1)}}{(N-2)!}\right) x^k$$
$$= \binom{N}{2}\sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} - x + c_1\frac{(-1)^N S_{N-1}^{(1)}}{(N-2)!}\,x + \sum_{k=2}^{\infty}\left(\sum_{j=0}^{k-1} c_{k-j}\,\frac{(-1)^{N+j} S_{N-1}^{(j+1)}}{(N-2)!}\right) x^k$$
which, by using $S_{N-1}^{(1)} = (-1)^N (N-2)!$ and $c_1 = 1$, equals
$$\sum_{k=2}^{N-1} r_N^{(k)} x^k = \binom{N}{2}\sum_{k=2}^{\infty}\frac{x^k}{2^{k-1}} + \sum_{k=2}^{\infty}\left(\sum_{j=0}^{k-1} c_{k-j}\,(-1)^{N+j}\frac{S_{N-1}^{(j+1)}}{(N-2)!}\right) x^k$$
Finally, equating the corresponding powers in $x$ leads to
$$r_N^{(k)} = \frac{1}{2^{k-1}}\binom{N}{2} + \sum_{j=0}^{k-1} c_{k-j}\,(-1)^{N+j}\frac{S_{N-1}^{(j+1)}}{(N-2)!}$$
$$= \frac{1}{2^{k-1}}\binom{N}{2} + \frac{(-1)^{N+k-1} S_{N-1}^{(k)}}{(N-2)!} + \frac{(-1)^N}{2^k (N-2)!}\sum_{j=1}^{k-1} S_{N-1}^{(j)} (-2)^j$$
or to (16.37). As a check, using (16.56) the generating function $(-1)^n\frac{\Gamma(n-x)}{\Gamma(-x)} = \sum_{j=0}^{n} S_n^{(j)} x^j$ reveals that
$$E\left[D_N^{(N-1)}\right] = \frac{N}{2^{N-1}} + \frac{1}{(N-1)!} + \frac{(-1)^N}{2^{N-1}(N-1)!}\sum_{j=1}^{N-2} S_{N-1}^{(j)} (-2)^j$$
$$= \frac{N}{2^{N-1}} + \frac{1}{(N-1)!} + \frac{(-1)^N}{2^{N-1}(N-1)!}\left[(-1)^{N-1}\frac{\Gamma(N-1+2)}{\Gamma(2)} - S_{N-1}^{(N-1)} (-2)^{N-1}\right]$$
$$= \frac{N}{2^{N-1}} + \frac{1}{(N-1)!} - \frac{N!}{2^{N-1}(N-1)!} + \frac{1}{(N-1)!} = \frac{2}{(N-1)!}$$
Also $E[D_N^{(1)}] = \frac{N}{2} + \frac{1}{N-1}$ is readily verified.
16.9 Problems
(i) Comparison of simulations with exact results. Many of the theoretical results are easily verified by simulations. Consider the following standard simulation: (a) Construct a graph of a certain class, e.g. an instance of the random graphs $G_p(N)$ with exponentially distributed link weights, (b) Determine in that graph a desired property, e.g. the hopcount of the shortest path between two different arbitrary nodes, (c) Store the hopcount in a histogram and (d) repeat the sequence (a)-(c) $n$ times with each time a different graph instance in (a). Estimate the relative error of the simulated hopcount in $G_p(N)$ with $p = 1$ for $n = 10^4$, $10^5$ and $10^6$.
(ii) Given the probability generating function (16.17) of the weight of the shortest path in a complete graph with independent exponential link weights, compute the variance of $W_N$.
(iii) Prove the asymptotic law (16.20) of the weight of the shortest path in a complete graph with i.i.d. exponential link weights.
(iv) In a communication network often two paths are computed for each important flow to guarantee sufficient reliability. Apart from the shortest path between a source $A$ and a destination $B$, a second path between $A$ and $B$ is chosen that does not travel over any intermediate router of the shortest path. We call such a path node-disjoint to the shortest path. Derive a good approximation for the distribution of the hopcount of the shortest node-disjoint path to the shortest path in the complete graph with exponential link weights with mean 1.
17
The efficiency of multicast
The efficiency or gain of multicast in terms of network resources is compared to unicast. Specifically, we concentrate on a one-to-many communication, where a source sends a same message to $m$ different, uniformly distributed destinations along the shortest path. In unicast, this message is sent $m$ times from the source to each destination. Hence, unicast uses on average $f_N(m) = m E[H_N]$ link-traversals or hops, where $E[H_N]$ is the average number of hops to a uniform location in the graph with $N$ nodes. One of the main properties of multicast is that it economizes on the number of link-traversals: the message is only copied at each branch point of the multicast tree to the $m$ destinations. Let us denote by $H_N(m)$ the number of links in the shortest path tree (SPT) to $m$ uniformly chosen nodes. If we define the multicast gain $g_N(m) = E[H_N(m)]$ as the average number of hops in the SPT rooted at a source to $m$ randomly chosen distinct destinations, then $g_N(m) \leq f_N(m)$. The purpose here is to quantify the multicast gain $g_N(m)$. We present general results valid for all graphs and more explicit results valid for the random graph $G_p(N)$ and for the $k$-ary tree. The analysis presented here may be valuable to derive a business model for multicast: "How many customers $m$ are needed to make the use of multicast for a service provider profitable?"
Two modeling assumptions are made. First, the multicast process is assumed$^1$ to deliver packets along the shortest path from a source to each of the $m$ destinations. As most of the current Internet protocols forward packets based on the (reverse) shortest path, the assumption of SPT delivery is quite realistic. The second assumption is that the $m$ multicast group member nodes are uniformly chosen out of the total number of nodes $N$. This assumption has been discussed by Phillips et al. (1999). They concluded that, if $m$ and $N$ are large, deviations from the uniformity assumption are negligibly small. Also the Internet measurements of Chalmers and Almeroth (2001) seem to confirm the validity of the uniformity assumption.
1 The assumption ignores shared tree multicast forwarding such as core-based tree (CBT, see RFC 2201).
17.1 General results for $g_N(m)$
Theorem 17.1.1 For any connected graph with $N$ nodes,
$$m \leq g_N(m) \leq \frac{Nm}{m+1} \qquad (17.1)$$
Proof: We need at least one edge for each different user; therefore $g_N(m) \geq m$ and the lower bound is attained in a star topology with the source at the center.
We will next show that an upper bound is obtained in a line topology. It is sufficient to consider trees, because multicast only uses shortest paths without cycles. If the tree has not a line topology, then at least one node has degree 3 or the root has degree 2. Take the node closest to the root with this property and cut one of the branches at this node; we paste that branch to a node at the deepest level. Through this procedure the multicast function $g_N(m)$ stays unaltered or increases. Continuing in this fashion until we reach a line topology demonstrates the claim.
For the line topology we place the source at the origin and the other nodes at the integers $1, 2, \ldots, N-1$. The links of the graph are given by $(i, i+1)$, $i = 0, 1, \ldots, N-2$. The multicast gain $g_N(m)$ equals $E[M]$, where $M$ is the maximum of a sample of size $m$, without replacement, from the integers $1, 2, \ldots, N-1$. Thus,
$$\Pr[M \leq k] = \frac{\binom{k}{m}}{\binom{N-1}{m}}, \qquad m \leq k \leq N-1$$
from which $g_N(m) = E[M]$ is
$$g_N(m) = \sum_{k=m}^{N-1} k\,\frac{\binom{k}{m} - \binom{k-1}{m}}{\binom{N-1}{m}} = \sum_{k=m}^{N-1} k\,\frac{\binom{k-1}{m-1}}{\binom{N-1}{m}} = m\sum_{k=m}^{N-1}\frac{\binom{k}{m}}{\binom{N-1}{m}} = \frac{mN}{m+1}\sum_{k=m}^{N-1}\frac{\binom{k}{m}}{\binom{N}{m+1}} = \frac{mN}{m+1}$$
where we have used that $\sum_{k=m}^{N-1}\binom{k}{m}\big/\binom{N}{m+1} = 1$, because it is a sum of probabilities over all possible disjoint outcomes. □
Figure 17.1 shows the allowable space for $g_N(m)$.
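The line-topology step of the proof can be checked by direct enumeration. The following sketch is an illustration only: it assumes the setting of the proof (source at 0, destinations drawn without replacement from the positions $1, \ldots, N-1$) and the function name `line_gain` is hypothetical. It averages the largest chosen position over all receiver sets, which is the number of links in the multicast tree on a line.

```python
from fractions import Fraction
from itertools import combinations

def line_gain(n_nodes, m):
    """Exact g_N(m) on a line: mean of the largest of m distinct positions."""
    positions = range(1, n_nodes)            # destinations sit at 1..N-1
    samples = list(combinations(positions, m))
    return Fraction(sum(max(s) for s in samples), len(samples))

if __name__ == "__main__":
    N = 9
    for m in range(1, N):
        # upper bound of (17.1) is attained; lower bound m holds as well
        print(m, line_gain(N, m), Fraction(N * m, m + 1))
```

The enumeration visits $\binom{N-1}{m}$ receiver sets, so it is meant for small $N$ only.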
[Figure 17.1: $g_N(m)$ (vertical axis, with marked levels $1$, $c\log N$, $N/2$ and $N-1$) versus $m$ (horizontal axis, from $1$ to $N-1$); the upper boundary is the curve $Nm/(m+1)$.]
Fig. 17.1. The allowable region (in white) of $g_N(m)$. For exponentially growing graphs, $E[H_N] = c \log N$, implying that the allowable region for these graphs is smaller and bounded at the left (in dotted line) by the straight line $m(c \log N)$.
Theorem 17.1.2 For any connected graph with $N$ nodes, the map $m \mapsto g_N(m)$ is concave and the map $m \mapsto \frac{g_N(m)}{f_N(m)}$ is decreasing.
Proof: Define $Y_m$ to be the random variable giving the additional number of hops necessary to reach the $m$-th user when the first $m-1$ users are already connected. Then we have that
$$E[Y_m] = g_N(m) - g_N(m-1)$$
Moreover, let $Y'_m$ be the random number of additional hops necessary to reach the $m$-th multicast group member, when we discard all extra hops of the $(m-1)$-st group member. An example is illustrated in Fig. 17.2. The random variable $Y'_m$ has the same distribution as $Y_{m-1}$, because both the $(m-1)$-st and the $m$-th group member are chosen uniformly from the remaining $N - m - 1$ nodes. In general, $Y'_m \neq Y_{m-1}$, but, for each $k$, $\Pr[Y'_m = k] = \Pr[Y_{m-1} = k]$ and, hence,
$$E[Y'_m] = E[Y_{m-1}] \qquad (17.2)$$
Furthermore, we have by construction that $Y_m \leq Y'_m$ with probability 1, implying that
$$E[Y_m] \leq E[Y'_m] \qquad (17.3)$$
Indeed, attaching the $m$-th group member to the reduced tree takes at least as many hops as attaching that same group member to the non-reduced tree because the former is contained in the latter and the extra hops added by the $(m-1)$-st group member can only help us. Combining (17.2) and (17.3) immediately gives
$$g_N(m) - g_N(m-1) = E[Y_m] \leq E[Y'_m] = g_N(m-1) - g_N(m-2) \qquad (17.4)$$
This is equivalent to the concavity of the map $m \mapsto g_N(m)$.
[Figure 17.2: a tree with the Root, intermediate nodes A, B, C and D, and multicast group members 1 to 5.]
Fig. 17.2. A multicast session with $m = 5$ group members where $Y_5 = 1$ (namely link C-5). To construct $Y'_5$ the three dotted lines must be removed and we observe that $Y'_5 = 2$ (A-C-5), which is referred to as the reduced tree. In this example, $Y'_5 = Y_4 = 2$ because A-C-4 and A-C-5 both consist of 2 hops. In general, they are equal in distribution because the roles of group members 4 and 5 are identical in the reduced tree.
In order to show that $\frac{g_N(m)}{f_N(m)}$ is decreasing it suffices to show that $m \mapsto \frac{g_N(m)}{m}$ is decreasing, since $f_N(m)$ is proportional to $m$. Defining $g_N(0) = 0$, we can write $g_N(m)$ as a telescoping sum
$$g_N(m) = \sum_{k=1}^{m}\{g_N(k) - g_N(k-1)\} = \sum_{k=1}^{m} x_k$$
where $x_k = g_N(k) - g_N(k-1)$, $k = 1, \ldots, m$. Then,
$$\frac{g_N(m)}{m} = \frac{1}{m}\sum_{k=1}^{m} x_k$$
is the mean of a sequence of $m$ positive numbers $x_k$. By (17.4) the sequence $x_k \leq x_{k-1}$ is decreasing and, hence,
$$\frac{g_N(m)}{m} = \frac{1}{m}\sum_{k=1}^{m} x_k \leq \frac{1}{m-1}\sum_{k=1}^{m-1} x_k = \frac{g_N(m-1)}{m-1}$$
This proves that $m \mapsto g_N(m)/m$ is decreasing. □
Next, we will give a representation for $g_N(m)$ that is valid for all graphs. Let $X_i$ be the number of joint hops that all $i$ uniformly chosen and different group members have in common, then the following general theorem holds,
Theorem 17.1.3 For any connected graph with $N$ nodes,
$$g_N(m) = \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1} E[X_i] \qquad (17.5)$$
Note that
$$g_N(1) = f_N(1) = E[X_1] = E[H_N]$$
so that the decrease in average hops or the "gain" by using multicast over unicast is precisely
$$g_N(m) - f_N(m) = \sum_{i=2}^{m}\binom{m}{i}(-1)^{i-1} E[X_i]$$
However, computing $E[X_i]$ for general graphs is difficult.
Proof of Theorem 17.1.3: Let $A_1, A_2, \ldots, A_m$ be sets where $A_i$ consists of all links that constitute the shortest path from the source to multicast group member $i$. Denote by $|A_i|$ the number of elements in the set $A_i$. The multicast group members are chosen uniformly from the set of all nodes except for the root. Hence,
$$E[X_1] = E[|A_i|], \qquad \text{for } 1 \leq i \leq N$$
and
$$E[X_2] = E[|A_i \cap A_j|], \qquad \text{for } 1 \leq i < j \leq N$$
etc. Now, $g_N(m) = E[|A_1 \cup A_2 \cup \cdots \cup A_m|]$. Since $Q(A) = E[|A|]\big/\binom{N}{2}$ is a probability measure on the set of all links, we obtain from the inclusion-exclusion formula (2.3) applied to $Q$ and multiplied with $\binom{N}{2}$ afterwards,
$$E[|A_1 \cup A_2 \cup \cdots \cup A_m|] = \sum_{i=1}^{m} E[|A_i|] - \sum_{i<j} E[|A_i \cap A_j|] + \cdots + (-1)^{m-1} E[|A_1 \cap A_2 \cap \cdots \cap A_m|]$$
$$= m E[X_1] - \binom{m}{2} E[X_2] + \cdots + (-1)^{m-1} E[X_m]$$
This proves Theorem 17.1.3. □
Corollary 17.1.4 For any connected graph with $N$ nodes,
$$E[X_m] = \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1} g_N(i) \qquad (17.6)$$
The corollary is a direct consequence of the inversion formula for the binomial (Riordan, 1968, Chapter 2). Alternatively, in view of the Gregory-Newton interpolation formula (Lanczos, 1988, Chapter 4, Section 2) for $g_N(m) = \sum_{i=1}^{\infty}\binom{m}{i}\Delta^i g_N(0)$, we can write $E[X_i] = (-1)^{i-1}\Delta^i g_N(0)$ where $\Delta$ is the difference operator, $\Delta f(0) = f(1) - f(0)$.
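As an illustration of the pair (17.5) and (17.6), the sketch below evaluates both sides on a line topology, where the quantities are easy to enumerate. The choice of topology is an assumption made only for this example: with the source at 0 and group members at distinct positions in $1, \ldots, N-1$, the tree to a set of members has as many links as the largest position, and the links shared by all members number the smallest position. The helper name `mean_over_subsets` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def mean_over_subsets(n_nodes, size, statistic):
    """Average of statistic(subset) over all subsets of {1..N-1} of a given size."""
    subsets = list(combinations(range(1, n_nodes), size))
    return Fraction(sum(statistic(s) for s in subsets), len(subsets))

def alternating_binomial_sum(values, m):
    """sum_{i=1}^{m} C(m, i) (-1)^(i-1) values[i], the right-hand side of (17.5)/(17.6)."""
    return sum(comb(m, i) * (-1) ** (i - 1) * values[i] for i in range(1, m + 1))

N = 8
g = {m: mean_over_subsets(N, m, max) for m in range(1, N)}   # g_N(m) on the line
X = {i: mean_over_subsets(N, i, min) for i in range(1, N)}   # E[X_i] on the line

if __name__ == "__main__":
    for m in range(1, N):
        print(m, g[m], alternating_binomial_sum(X, m), X[m], alternating_binomial_sum(g, m))
```

The same alternating binomial sum maps $E[X_i]$ to $g_N(m)$ and back, which is the self-inverse property behind Corollary 17.1.4.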
Corollary 17.1.5 For any connected graph, the multicast efficiency $g_N(m)$ is bounded by
$$\frac{f_N(m)}{g_N(m)} \leq E[H_N] \qquad (17.7)$$
where $E[H_N]$ is the average number of hops in unicast.
Proof: We give two demonstrations. (a) From $g_N(N-1) = N-1$ (all nodes, source plus $N-1$ destinations, of the graph are spanned by a tree consisting of $N-1$ links) and the monotonicity of $m \mapsto \frac{g_N(m)}{f_N(m)}$ (see Theorem 17.1.2) we obtain:
$$\frac{g_N(m)}{f_N(m)} \geq \frac{g_N(N-1)}{f_N(N-1)} = \frac{N-1}{(N-1)E[H_N]} = \frac{1}{E[H_N]}$$
(b) Alternatively, Theorem 17.1.1 indicates that $g_N(m) \geq m$, which, with the identity $f_N(m) = m E[H_N]$, immediately leads to (17.7). □
Corollary 17.1.5 means that for any connected graph, including the graph describing the Internet, the ratio of the unicast over multicast efficiency is bounded by the expected hopcount in unicast. In other words, the maximum savings in resources an operator can gain by using multicast (over unicast) never exceeds $E[H_N]$, which is roughly 15 in the current Internet.
17.2 The random graph $G_p(N)$
In this section, we confine ourselves to the class RGU, the random graphs of the class $G_p(N)$ with independent identically and exponentially distributed link weights $w$ with mean $E[w] = 1$ and where $\Pr[w \leq x] = 1 - e^{-x}$, $x > 0$. In Section 16.2, we have shown that the corresponding SPT is, asymptotically, a URT. The analysis below is exact for the complete graph $K_N$ while asymptotically correct for connected random graphs $G_p(N)$.
17.2.1 The hopcount of the shortest path tree
Based on properties of the URT, the complete probability density function of the number of links $H_N(m)$ in the SPT to $m$ uniformly chosen nodes can be determined. We first derive a recursion for the probability generating function $\varphi_{H_N(m)}(z) = E\left[z^{H_N(m)}\right]$ of the number of links $H_N(m)$ in the SPT to $m$ uniformly chosen nodes in the complete graph $K_N$.
Lemma 17.2.1 For $N > 1$ and all $1 \leq m \leq N - 1$,
$$\varphi_{H_N(m)}(z) = \frac{(N-m-1)(N-1+mz)}{(N-1)^2}\,\varphi_{H_{N-1}(m)}(z) + \frac{m^2 z}{(N-1)^2}\,\varphi_{H_{N-1}(m-1)}(z) \qquad (17.8)$$
Proof: To prove (17.8), we use the recursive growth of URTs: a URT of size $N$ is a URT of size $N - 1$, where we add an additional link to a uniformly chosen node.
[Figure 17.3: three panels labelled Case A, Case B, and Case C and D, each showing where node $N$ attaches.]
Fig. 17.3. The several possible cases in which the $N$-th node can be attached uniformly to the URT of size $N - 1$. The root is dark shaded while the $m$ multicast member nodes are lightly shaded.
In order to obtain a recursion for $H_N(m)$ we distinguish between the $m$ uniformly chosen nodes all being in the URT of size $N - 1$ or not. The probability that they all belong to the tree of size $N - 1$ is equal to $1 - \frac{m}{N-1}$ (case A in Fig. 17.3). If they all belong to the URT of size $N - 1$, then we have that $H_N(m) = H_{N-1}(m)$. Thus, we obtain
$$\varphi_{H_N(m)}(z) = \left(1 - \frac{m}{N-1}\right)\varphi_{H_{N-1}(m)}(z) + \frac{m}{N-1}\,E\left[z^{1 + L_{N-1}(m)}\right] \qquad (17.9)$$
where $L_{N-1}(m)$ is the number of links in the subtree of the URT of size $N - 1$ spanned by $m - 1$ uniform nodes and the "one" refers to the link from the added $N$-th node to its ancestor in the URT of size $N - 1$. We complete the proof by investigating the generating function of $L_{N-1}(m)$. Again, there are two cases. In the first case (B in Fig. 17.3), the ancestor of the added $N$-th node is one of the $m - 1$ previous nodes (which can only happen if it is unequal to the root), else we get one of the cases C and D in Fig. 17.3. The probability of the first event equals $\frac{m-1}{N-1}$, the probability of the latter equals $1 - \frac{m-1}{N-1}$. If the ancestor of the added $N$-th node is one of the $m - 1$ previous nodes, then the number of links $L_{N-1}(m)$ equals $H_{N-1}(m-1)$, otherwise the generating function of the number of additional links equals
$$\left(1 - \frac{1}{N-m}\right)\varphi_{H_{N-1}(m)}(z) + \frac{1}{N-m}\,\varphi_{H_{N-1}(m-1)}(z)$$
The first contribution comes from the case where the ancestor of the added $N$-th node is not the root, and the second from where it is equal to the root, which has probability $\frac{1}{N-1-(m-1)} = \frac{1}{N-m}$. Therefore,
$$E\left[z^{L_{N-1}(m)}\right] = \frac{m-1}{N-1}\,\varphi_{H_{N-1}(m-1)}(z) + \frac{N-m}{N-1}\left(\frac{N-m-1}{N-m}\,\varphi_{H_{N-1}(m)}(z) + \frac{\varphi_{H_{N-1}(m-1)}(z)}{N-m}\right)$$
$$= \frac{m}{N-1}\,\varphi_{H_{N-1}(m-1)}(z) + \frac{N-m-1}{N-1}\,\varphi_{H_{N-1}(m)}(z) \qquad (17.10)$$
Substitution of (17.10) into (17.9) leads to (17.8). □
Since $g_N(m) = E[H_N(m)] = \varphi'_{H_N(m)}(1)$, we obtain the recursion for $g_N(m)$,
$$g_N(m) = \left(1 - \frac{m^2}{(N-1)^2}\right) g_{N-1}(m) + \frac{m^2}{(N-1)^2}\,g_{N-1}(m-1) + \frac{m}{N-1} \qquad (17.11)$$
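The recursion (17.11) is straightforward to iterate. The sketch below does so in exact rational arithmetic and compares the result with the closed form (17.14) stated in Corollary 17.2.3. The starting values $g_N(0) = 0$ and $g_2(1) = 1$ are assumptions of this example (a tree to zero members has no links, and two nodes are joined by one link), and the function names are illustrative.

```python
from fractions import Fraction

def gain_by_recursion(n_max):
    """Iterate (17.11) for 2 <= N <= n_max; returns a dict keyed by (N, m)."""
    g = {(2, 0): Fraction(0), (2, 1): Fraction(1)}
    for n in range(3, n_max + 1):
        g[(n, 0)] = Fraction(0)
        for m in range(1, n):
            w = Fraction(m * m, (n - 1) ** 2)
            # For m = n - 1 the weight 1 - w vanishes, so the undefined
            # g_{N-1}(N-1) never contributes; 0 is used as a placeholder.
            prev_same = g.get((n - 1, m), Fraction(0))
            g[(n, m)] = (1 - w) * prev_same + w * g[(n - 1, m - 1)] + Fraction(m, n - 1)
    return g

def gain_closed_form(n, m):
    """Closed form (17.14): m N / (N - m) * sum_{k=m+1}^{N} 1/k."""
    return Fraction(m * n, n - m) * sum(Fraction(1, k) for k in range(m + 1, n + 1))

if __name__ == "__main__":
    table = gain_by_recursion(6)
    for m in range(1, 6):
        print(m, table[(6, m)], gain_closed_form(6, m))
```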
Theorem 17.2.2 For all $N \geq 1$ and $1 \leq m \leq N - 1$,
$$\varphi_{H_N(m)}(z) = E\left[z^{H_N(m)}\right] = \frac{m!\,(N-1-m)!}{((N-1)!)^2}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\,\frac{\Gamma(N+kz)}{\Gamma(1+kz)} \qquad (17.12)$$
Consequently,
$$\Pr[H_N(m) = j] = \frac{m!\,(-1)^{N-(j+1)}\,S_N^{(j+1)}\,\mathcal{S}_j^{(m)}}{(N-1)!\,\binom{N-1}{m}} \qquad (17.13)$$
where $S_N^{(j+1)}$ and $\mathcal{S}_j^{(m)}$ denote the Stirling numbers of first and second kind (Abramowitz and Stegun, 1968, Section 24.1).
Proof: By iterating the recursion (17.8) for small values of $m$, the computations given in van der Hofstad et al. (2006a, Appendix) suggest the solution (17.12) for (17.8). One can verify that (17.12) satisfies (17.8). This proves (17.12) of Theorem 17.2.2. Using (Abramowitz and Stegun, 1968, Section 24.1.3.B), the Taylor expansion around $z = 0$ equals
$$\varphi_{H_N(m)}(z) = \frac{m!\,N\,(N-1-m)!}{(N-1)!}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\left(\frac{\Gamma(N+kz)}{N!\,\Gamma(1+kz)} - \frac{1}{N}\right)$$
$$= \frac{m!\,N\,(N-1-m)!}{(N-1)!}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\sum_{j=1}^{N-1}\frac{(-1)^{N-(j+1)} S_N^{(j+1)}}{N!}\,k^j z^j$$
$$= \frac{m!\,N\,(N-1-m)!}{(N-1)!}\sum_{j=1}^{N-1}\frac{(-1)^{N-(j+1)} S_N^{(j+1)}}{N!}\left[\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k} k^j\right] z^j$$
Using the definition of Stirling numbers of the second kind (Abramowitz and Stegun, 1968, 24.1.4.C),
$$m!\,\mathcal{S}_j^{(m)} = \sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k} k^j$$
for which $\mathcal{S}_j^{(m)} = 0$ if $j < m$, gives
$$\varphi_{H_N(m)}(z) = \frac{(m!)^2\,(N-1-m)!}{((N-1)!)^2}\sum_{j=1}^{N-1}(-1)^{N-(j+1)} S_N^{(j+1)}\,\mathcal{S}_j^{(m)} z^j$$
This proves (17.13) and completes the proof of Theorem 17.2.2. □
Figure 17.4 plots the probability density function of $H_{50}(m)$ for different values of $m$.
Corollary 17.2.3 For all $N \geq 1$ and $1 \leq m \leq N - 1$,
$$g_N(m) = E[H_N(m)] = \frac{mN}{N-m}\sum_{k=m+1}^{N}\frac{1}{k} \qquad (17.14)$$
and
$$\mathrm{Var}[H_N(m)] = \frac{N-1+m}{N+1-m}\,g_N(m) - \frac{g_N^2(m)}{N+1-m} - \frac{m^2 N^2\sum_{k=m+1}^{N}\frac{1}{k^2}}{(N-m)(N+1-m)} \qquad (17.15)$$
The formula (17.14) is proved in two different ways. The earlier proof presented in Section 17.6 below does not rely on the recursion in Lemma 17.2.1 nor on Theorem 17.2.2. The shorter proof is presented here. Formula (17.14) can be expressed in terms of the digamma function $\psi(x)$ as
$$g_N(m) = mN\left(\frac{\psi(N) - \psi(m)}{N-m}\right) - 1 \qquad (17.16)$$
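A numerical cross-check of (17.13) against (17.14) and (17.15) may help when implementing these formulas. The sketch below is an illustration under stated assumptions: it uses the unsigned Stirling numbers of the first kind, $(-1)^{N-(j+1)} S_N^{(j+1)}$, generated by their standard recurrence, and the standard recurrence for the Stirling numbers of the second kind. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def stirling_tables(n_max):
    """Unsigned first-kind c[n][k] and second-kind s2[n][k], 0 <= k <= n <= n_max."""
    c = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s2 = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = s2[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            c[n][k] = (n - 1) * c[n - 1][k] + c[n - 1][k - 1]
            s2[n][k] = k * s2[n - 1][k] + s2[n - 1][k - 1]
    return c, s2

def hopcount_pmf(n, m):
    """Pr[H_N(m) = j] for j = 0..N-1, following (17.13)."""
    c, s2 = stirling_tables(n)
    norm = factorial(n - 1) * comb(n - 1, m)
    return [Fraction(factorial(m) * c[n][j + 1] * s2[j][m], norm) for j in range(n)]

def mean_and_variance(pmf):
    mean = sum(j * p for j, p in enumerate(pmf))
    second = sum(j * j * p for j, p in enumerate(pmf))
    return mean, second - mean * mean

def mean_closed_form(n, m):
    """(17.14)."""
    return Fraction(m * n, n - m) * sum(Fraction(1, k) for k in range(m + 1, n + 1))

def variance_closed_form(n, m):
    """(17.15)."""
    g = mean_closed_form(n, m)
    inv_squares = sum(Fraction(1, k * k) for k in range(m + 1, n + 1))
    return (Fraction(n - 1 + m, n + 1 - m) * g
            - g * g / (n + 1 - m)
            - Fraction(m * m * n * n, (n - m) * (n + 1 - m)) * inv_squares)

if __name__ == "__main__":
    print(hopcount_pmf(4, 2))   # probabilities for j = 0, 1, 2, 3
```

For $N = 4$ and $m = 2$ the probabilities are $\frac{2}{3}$ at $j = 2$ and $\frac{1}{3}$ at $j = 3$, giving mean $\frac{7}{3}$ and variance $\frac{2}{9}$.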
[Figure 17.4: $\Pr[H_{50}(m) = j]$ (vertical axis, 0.0 to 0.5) versus $j$ hops (horizontal axis, 0 to 50).]
Fig. 17.4. The pdf of $H_{50}(m)$ for $m$ = 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 47.
Proof of Corollary 17.2.3: The expectation and variance of $H_N(m)$ will not be obtained using the explicit probabilities (17.13), but by rewriting (17.12) as
$$\varphi_{H_N(m)}(z) = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\,\partial_t^{N-1}\left[t^{N-1+kz}\right]_{t=1}$$
$$= \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(-1)^m\,\partial_t^{N-1}\left[t^{N-1}(1 - t^z)^m\right]_{t=1} \qquad (17.17)$$
Indeed,
$$E[H_N(m)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(-1)^m\,\partial_z\partial_t^{N-1}\left[t^{N-1}(1 - t^z)^m\right]_{t=z=1}$$
$$= \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,m(-1)^{m-1}\,\partial_t^{N-1}\left[t^N \log t\,(1-t)^{m-1}\right]_{t=1},$$
$$E[H_N(m)(H_N(m) - 1)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(-1)^m\,\partial_z^2\partial_t^{N-1}\left[t^{N-1}(1 - t^z)^m\right]_{t=z=1}$$
$$= \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,m(-1)^{m-1}\times\partial_t^{N-1}\left[t^N \log^2 t\,(1-t)^{m-2}\left[-(m-1)t + (1-t)\right]\right]_{t=1}$$
We will start with the former. Using $\partial_t^i (1-t)^j\big|_{t=1} = j!\,(-1)^j\delta_{i,j}$ and Leibniz' rule, we find
$$E[H_N(m)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\binom{N-1}{m-1} m!\;\partial_t^{N-m}\left[t^N \log t\right]_{t=1}$$
Since
$$\partial_t^k\left[t^n \log t\right]_{t=1} = \frac{n!}{(n-k)!}\sum_{j=n-k+1}^{n}\frac{1}{j}$$
we obtain expression (17.14) for $E[H_N(m)]$.
We now extend the above computation to $E[H_N(m)(H_N(m) - 1)]$ that we write as
$$E[H_N(m)(H_N(m) - 1)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\,(R_1 + R_2) \qquad (17.18)$$
where
$$R_1 = m(m-1)(-1)^{m-2}\,\partial_t^{N-1}\left[t^{N+1}\log^2 t\,(1-t)^{m-2}\right]_{t=1}$$
$$R_2 = m(-1)^{m-1}\,\partial_t^{N-1}\left[t^N \log^2 t\,(1-t)^{m-1}\right]_{t=1}$$
Using
$$\partial_t^k\left[t^n \log^2 t\right]_{t=1} = 2\,\frac{n!}{(n-k)!}\sum_{i=n-k+1}^{n}\sum_{j=i+1}^{n}\frac{1}{ij} = \frac{n!}{(n-k)!}\left[\left(\sum_{i=n-k+1}^{n}\frac{1}{i}\right)^2 - \sum_{i=n-k+1}^{n}\frac{1}{i^2}\right]$$
we obtain,
$$R_1 = \binom{N-1}{m-2} m(m-1)(m-2)!\;\partial_t^{N-m+1}\left[t^{N+1}\log^2 t\right]_{t=1} = \binom{N-1}{m-2}(N+1)!\left[\left(\sum_{k=m+1}^{N+1}\frac{1}{k}\right)^2 - \sum_{k=m+1}^{N+1}\frac{1}{k^2}\right]$$
Similarly,
$$R_2 = \binom{N-1}{m-1} m(m-1)!\;\partial_t^{N-m}\left[t^N\log^2 t\right]_{t=1} = \binom{N-1}{m-1} N!\left[\left(\sum_{k=m+1}^{N}\frac{1}{k}\right)^2 - \sum_{k=m+1}^{N}\frac{1}{k^2}\right]$$
Substitution into (17.18) leads to
$$E[H_N(m)(H_N(m) - 1)] = \frac{m^2 N^2}{(N+1-m)(N-m)}\left[\left(\sum_{k=m+1}^{N}\frac{1}{k}\right)^2 - \sum_{k=m+1}^{N}\frac{1}{k^2}\right] + \frac{2m(m-1)N}{(N+1-m)(N-m)}\sum_{k=m+1}^{N}\frac{1}{k}$$
From $g_N(m) = E[H_N(m)]$ and $\mathrm{Var}[H_N(m)] = E[H_N(m)(H_N(m) - 1)] + g_N(m) - g_N^2(m)$, we obtain (17.15). This completes the proof of Corollary 17.2.3. □
For $N = 1000$, Fig. 17.5 illustrates the typical behavior for large $N$ of the expectation $g_N(m)$ and the standard deviation $\sigma_N(m) = \sqrt{\mathrm{Var}[H_N(m)]}$ for all values of $m$. For any spanning tree, the number of links $H_N(N-1)$ is precisely $N - 1$, so that $\mathrm{Var}[H_N(N-1)] = 0$.
[Figure 17.5: average hopcount $g_N(m)$ (left axis, 0 to 1000) and standard deviation $\sigma_N(m)$ (right axis, 0 to 12) versus $m$ (0 to 1000); curves $g_{1000}(m)$ and $\sigma_{1000}(m)$.]
Fig. 17.5. The average number of hops $g_N(m)$ (left axis) in the SPT and the corresponding standard deviation $\sigma_N(m)$ (right axis) as a function of the number $m$ of multicast group members in the complete graph with $N = 1000$.
Figure 17.5 also indicates that the standard deviation $\sigma_N(m)$ of $H_N(m)$ is much smaller than the average, even for $N = 1000$. In fact, we obtain from (17.15) that
$$\mathrm{Var}[H_N(m)] \leq \frac{N-1+m}{N+1-m}\,g_N(m) \leq \frac{2N}{N-m}\,g_N(m) = \frac{2 g_N^2(m)}{m\sum_{k=m+1}^{N}\frac{1}{k}} = o\left(g_N^2(m)\right)$$
This bound implies that with probability converging to 1 for every $m = 1, \ldots, N-1$,
$$\left|\frac{H_N(m)}{g_N(m)} - 1\right| \leq \varepsilon$$
In van der Hofstad et al. (2006a), the scaled random variable $\frac{H_N(m) - g_N(m)}{\sqrt{g_N(m)}}$ is proved to tend to a Gaussian random variable, i.e. $\frac{H_N(m) - g_N(m)}{\sqrt{g_N(m)}} \xrightarrow{d} N(0,1)$, for all $m = o(\sqrt{N})$. For large graphs of the size of the Internet and larger, this observation implies that the mean $g_N(m) = E[H_N(m)]$ is a good approximation for the random variable $H_N(m)$ itself because the variations of $H_N(m)$ around the mean are small. Consequently, it underlines the importance of $g_N(m)$ as a significant measure for multicast.
17.2.2 The weight of the shortest path tree
In this section, we summarize results on the weight $W_N(m)$ of the SPT and omit derivations, but refer to van der Hofstad et al. (2006a). For all $1 \leq m \leq N - 1$, the average weight of the SPT is
$$E[W_N(m)] = \sum_{j=1}^{m}\frac{1}{N-j}\sum_{k=j}^{N-1}\frac{1}{k} \qquad (17.19)$$
In particular, if the shortest path tree spans the whole graph, then for all $N \geq 2$,
$$E[W_N(N-1)] = \sum_{n=1}^{N-1}\frac{1}{n^2} \qquad (17.20)$$
from which $E[W_N(N-1)] < \zeta(2) = \frac{\pi^2}{6}$ for any finite $N$. The variance is
$$\mathrm{Var}[W_N(N-1)] = \frac{4}{N}\sum_{k=1}^{N-1}\frac{1}{k^3} + 4\sum_{j=1}^{N-1}\frac{1}{j^3}\sum_{k=1}^{j}\frac{1}{k} - 5\sum_{j=1}^{N-1}\frac{1}{j^4} \qquad (17.21)$$
or asymptotically, for large $N$,
$$\mathrm{Var}[W_N(N-1)] = \frac{4\zeta(3)}{N} + O\left(\frac{\log N}{N^2}\right) \qquad (17.22)$$
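The relation between (17.19) and (17.20), and the small-$N$ values of (17.21), can be evaluated exactly. The sketch below is only an illustration: the function names are hypothetical, and the variance follows our reading of (17.21) as the three-term expression $\frac{4}{N}\sum_k k^{-3} + 4\sum_j j^{-3}\sum_{k \leq j} k^{-1} - 5\sum_j j^{-4}$.

```python
from fractions import Fraction

def mean_spt_weight(n, m):
    """E[W_N(m)] from (17.19)."""
    return sum(Fraction(1, n - j) * sum(Fraction(1, k) for k in range(j, n))
               for j in range(1, m + 1))

def var_full_spt_weight(n):
    """Var[W_N(N-1)] as read from (17.21)."""
    harmonic = Fraction(0)          # running sum_{k<=j} 1/k
    double_sum = Fraction(0)        # sum_j (1/j^3) * sum_{k<=j} 1/k
    for j in range(1, n):
        harmonic += Fraction(1, j)
        double_sum += harmonic / j ** 3
    cubes = sum(Fraction(1, k ** 3) for k in range(1, n))
    fourths = sum(Fraction(1, j ** 4) for j in range(1, n))
    return Fraction(4, n) * cubes + 4 * double_sum - 5 * fourths

if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, mean_spt_weight(n, n - 1), var_full_spt_weight(n))
```

For $N = 2$ the tree is a single link with an exponential weight of mean 1, so both the mean and the variance equal 1, which the two functions reproduce.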
Asymptotically for large $N$, the average weight of a shortest path tree is $\zeta(2) = 1.645\ldots$, while the average weight of the minimum spanning tree, given by (16.49), is $\zeta(3) < \zeta(2)$. This result has an interesting implication. The Steiner tree is the minimum weight tree that connects a set of $m$ members out of $N$ nodes in the graph. The Steiner tree problem is NP-complete, which means that it is infeasible to compute for large $N$. If $m = 2$, the weight of the Steiner tree equals that of the shortest path, $W_{Steiner,N}(2) = W_N$, while for $m = N$, we have $W_{Steiner,N}(N) = W_{MST}$. Hence, for any $m < N$ and $N$, $E[W_{Steiner,N}(m)] \leq \zeta(3)$ because the weight of the Steiner tree does not decrease if the number of members $m$ increases. The ratio $\frac{\zeta(2)}{\zeta(3)} = 1.368$ indicates that the use of the SPT (computationally easy) never performs on average more than 37% worse than the optimal Steiner tree (computationally infeasible). In a broader context and referring to the concept of the "Price of Anarchy", which is broadly explained in Robinson (2004), the SPT used in a communications network is related to the Nash equilibrium, while the Steiner tree gives the hardly achievable global optimum.
Simulations (even for small $N$, which allow us to cover the entire $m$-range as illustrated in Fig. 17.6) indicate that the normalized random variable $W_N^*(m) = \frac{W_N(m) - E[W_N(m)]}{\sqrt{\mathrm{var}[W_N(m)]}}$ lies between a normalized Gaussian $N(0,1)$ and a normalized Gumbel (see Theorem 6.4.1).
[Figure 17.6: $f_{W_N^*(m)}(x)$ on a logarithmic vertical axis ($10^{-4}$ to $10^{0}$) versus $x$ ($-4$ to $6$) for $N = 100$ and $m$ = 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, together with the normalized Gumbel and the normalized Gaussian $N(0,1)$.]
Fig. 17.6. The pdf of the normalized random variable $W_N^*(m)$ for $N = 100$.
Fig. 17.6 may suggest that, for all $m < N$,
$$\lim_{N\to\infty}\Pr\left[W_N^*(m) \leq x\right] \approx e^{-e^{-\frac{\pi x}{\sqrt{6}} - \gamma}} \qquad (17.23)$$
where $\gamma = 0.57721\ldots$ is Euler's constant. For the particular case of $m = 1$, the relation to the Gumbel distribution has been shown in Section 16.4 where the correct limit law is given in (16.20). However, van der Hofstad et al. (2006b) show that the weight of the shortest path tree for $m = N - 1$ tends to a Gaussian,
$$\sqrt{N}\left(W_N(N-1) - \zeta(2)\right) \xrightarrow{d} N\left(0, \sigma_{SPT}^2\right)$$
with $\sigma_{SPT}^2 = 4\zeta(3) \approx 4.80823$ as follows from (17.22). This shows that simulations alone may be inadequate to deduce asymptotic behavior. Finally, Janson (1995) gave the related result for the minimum spanning tree. He extended Frieze's result (16.49) by proving that the scaled weight of the minimum spanning tree also tends to a Gaussian for large $N$,
$$\sqrt{N}\left(W_{MST} - \zeta(3)\right) \xrightarrow{d} N\left(0, \sigma_{MST}^2\right)$$
where$^2$ $\sigma_{MST}^2 = 6\zeta(4) - 4\zeta(3) \approx 1.6857$.
[Figure 17.7: two trees, labelled $k = 2$ and $k = 5$.]
Fig. 17.7. The left hand side tree ($k = 2$) has $N = 31$ and $D = 4$, while the right hand side ($k = 5$) has $N = 31$ and $D = 2$.
17.3 The n-ary tree
In this section, we consider the n-ary tree of depth3 G with the source at the
root of the tree and p receivers at randomly chosen nodes (see Fig. 17.7).
In a n-ary tree the total number of nodes satisfies
Q = 1 + n + n2 + · · · + nG =
n G+1 1
n1
(17.24)
Theorem 17.3.1 For the $k$-ary tree,
$$g_{N,k}(m) = N - 1 - \sum_{j=0}^{D-1} k^{D-j}\, \frac{\binom{N-1-\frac{k^{j+1}-1}{k-1}}{m}}{\binom{N-1}{m}} \tag{17.25}$$
Proof: See Section 17.7. □
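As a hedged illustration of (17.25), the following Python sketch evaluates the multicast gain of a complete $k$-ary tree exactly and compares it with a brute-force average over all receiver sets. The function names and the heap-style node numbering (node 0 is the root, the parent of node $c$ is $\lfloor (c-1)/k \rfloor$) are assumptions made for this example only; the receivers are taken as $m$ distinct non-root nodes.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def kary_nodes(k: int, depth: int) -> int:
    """Number of nodes N = 1 + k + ... + k^depth of a complete k-ary tree, cf. (17.24)."""
    return sum(k**level for level in range(depth + 1))


def kary_gain(k: int, depth: int, m: int) -> Fraction:
    """Exact multicast gain g_{N,k}(m) according to (17.25)."""
    n = kary_nodes(k, depth)
    correction = Fraction(0)
    for j in range(depth):
        cut = kary_nodes(k, j)  # equals (k^(j+1) - 1)/(k - 1)
        correction += k ** (depth - j) * Fraction(comb(n - 1 - cut, m), comb(n - 1, m))
    return n - 1 - correction


def kary_gain_bruteforce(k: int, depth: int, m: int) -> Fraction:
    """Average number of links in the union of root-to-receiver paths (small trees only)."""
    n = kary_nodes(k, depth)
    total_links = 0
    groups = 0
    for receivers in combinations(range(1, n), m):
        used = set()  # each link is identified by its child node
        for node in receivers:
            # Walk up until the root or an already counted link is reached.
            while node != 0 and node not in used:
                used.add(node)
                node = (node - 1) // k
        total_links += len(used)
        groups += 1
    return Fraction(total_links, groups)
```

For instance, `kary_gain(2, 2, 2)` evaluates the binary tree with $N = 7$ and two receivers; the brute-force routine is only practical for small $N$ because it enumerates all $\binom{N-1}{m}$ receiver sets.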
Unfortunately, the $j$-summation seems difficult to express in closed form. Observe that $g_{N,k}(N-1) = N-1$, because all binomials vanish. The sum extends over all levels $j \le D-1$, for which the remaining number of nodes in the lower levels $l$ (i.e. $D \ge l > j$) is larger than $m$ nodes. In some sense, we may regard (17.25) as an (exact) expansion around $m = N-1$. Explicitly,
2 Wästlund (2005) succeeded in computing the triple sum in Janson's original result
$$\sigma^2_{MST} = \frac{\pi^4}{45} - 2\sum_{i=0}^{\infty}\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} \frac{(i+k-1)!\,k^k\,(i+j)^{i-2}\,j}{i!\,k!\,(i+j+k)^{i+k+2}}$$
3 The depth $D$ is equal to the number of hops from the root to a node at the leaves.
Fig. 17.8. The multicast gain $g_N(m)$ computed for the $k$-ary tree with four values of $k$ ($k = 2, 3, 5, 10$), the random graph (with "effective" $k_{rg} = e = 2.718\ldots$), and the Chuang-Sirbu power law ($m^{0.8}$ law) for $N = 10^4$ on a linear scale where the prefactor $E[H_N]$ is given by (16.10).
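$$g_{N,k}(m) = N - 1 - k^{D}\left(1 - \frac{m}{N-1}\right) - k^{D-1}\prod_{q=0}^{k}\left(1 - \frac{m}{N-1-q}\right) - \sum_{j=2}^{D-1} k^{D-j} \prod_{q=0}^{\frac{k^{j+1}-1}{k-1}-1}\left(1 - \frac{m}{N-1-q}\right) \tag{17.26}$$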
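which shows that $g_{N,k}(m)$ is a polynomial in $m$ of degree $\frac{N-1}{k}$. Moreover, the terms in the $j$-sum rapidly decrease; their ratio equals
$$\frac{1}{k}\prod_{q=\frac{k^{j}-1}{k-1}}^{\frac{k^{j+1}-1}{k-1}-1}\left(1 - \frac{m}{N-1-q}\right) < \frac{1}{k}\left(1 - \frac{m}{N-1-\frac{k^{j}-1}{k-1}}\right)^{k^{j}} \ll 1$$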
Figure 17.8 indicates that formula (17.25), although derived subject to (17.24), also seems valid when $D = \left\lfloor \frac{\log[1+N(k-1)]}{\log k} - 1 \right\rfloor$, where $\lfloor x \rfloor$ is the largest integer smaller than or equal to $x$. This suggests that the deepest level $D$ need not be filled completely to count $k^D$ nodes and that (17.25) may extend to "incomplete" $k$-ary trees. As further observed from Fig. 17.8, $g_{N,k}(m)$ is monotonically decreasing in $k$. Hence, it is quite likely that the map $k \mapsto g_{N,k}(m)$ is decreasing in $k \in [1, N-1]$. Intuitively, this conjecture can be understood from Fig. 17.7. Both the $k = 2$ and $k = 5$ trees have an equal number of nodes. We observe that the deeper $D$ (or the smaller $k$), the more overlap is possible, hence, the larger $g_{N,k}(m)$.
Theorem 17.1.1 can also be deduced from (17.25). The lower bound is attained in a star topology where $k = N-1$, $D = 1$ and $E[H_N] = 1$. The upper bound is attained in a line topology where $k = 1$, $D = N-1$ and $E[H_N] = \frac{N}{2}$. Furthermore, for real values of $k \in [1, N-1]$, the set of curves specified by (17.25) covers the total allowable space of $g_{N,k}(m)$, as shown in Fig. 17.1. This suggests using (17.25) to estimate $k$ in real topologies.
Since $g_N(1) = E[H_N]$, the average hopcount in a $k$-ary tree follows from (17.25) as
$$E[H_N] = N - 1 - \sum_{j=0}^{D-1} k^{D-j}\,\frac{N-1-\frac{k^{j+1}-1}{k-1}}{N-1} = \frac{1}{N-1}\sum_{j=0}^{D-1} k^{D-j}\,\frac{k^{j+1}-1}{k-1} = \frac{ND}{N-1} + \frac{D}{(N-1)(k-1)} - \frac{1}{k-1} \tag{17.27}$$
For large $N$, we find with
$$D = \left\lfloor \frac{\log[1+N(k-1)]}{\log k} - 1 \right\rfloor \approx \log_k N + \log_k(1 - 1/k) + O(1/N)$$
that
$$E[H_N] = \log_k N + \log_k(1 - 1/k) - \frac{1}{k-1} + O\left(\frac{\log_k N}{N}\right) \tag{17.28}$$
Comparing (17.28) with the average hopcount in the random graph (16.10) shows equality to first order if $k_{rg} = e$. Moreover, both the second order terms $\gamma - 1 = -0.42$ and $\log(1 - 1/e) - \frac{1}{e-1} = -1.04$ are $O(1)$ and independent of $N$. This shows that the multicast gain in the random graph is well approximated by $g_{N,e}(m)$.
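To make the step from (17.27) to (17.28) concrete, here is a small Python sketch that evaluates the exact average hopcount of a complete $k$-ary tree and its large-$N$ approximation without the $O(\log_k N / N)$ remainder. The function names are illustrative assumptions, and the sketch requires $k \ge 2$.

```python
from math import e, log


def kary_mean_hopcount(k: int, depth: int) -> float:
    """Exact E[H_N] of a complete k-ary tree of the given depth, cf. (17.27)."""
    n = (k ** (depth + 1) - 1) // (k - 1)  # total number of nodes, cf. (17.24)
    return n * depth / (n - 1) + depth / ((n - 1) * (k - 1)) - 1 / (k - 1)


def kary_mean_hopcount_asymptotic(k: int, n: int) -> float:
    """Leading terms of (17.28): log_k N + log_k(1 - 1/k) - 1/(k - 1)."""
    return log(n, k) + log(1 - 1 / k, k) - 1 / (k - 1)


def second_order_term_effective_k() -> float:
    """Second order term of (17.28) evaluated at the effective value k = e."""
    return log(1 - 1 / e) - 1 / (e - 1)
```

For a binary tree of depth 13 ($N = 16383$) the two expressions differ by roughly $2D/N$, in line with the stated remainder.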
17.4 The Chuang-Sirbu law
We discuss the empirical Chuang-Sirbu scaling law, which states that $g_N(m) \approx E[H_N]\, m^{0.8}$ for the Internet. Based on Internet measurements, Chuang and Sirbu (1998) observed that $g_N(m) \approx E[H_N]\, m^{0.8}$. Subsequently, Phillips et al. (1999) dubbed this observation the Chuang-Sirbu law.
Corollary 17.1.5 implies that the empirical law of Chuang-Sirbu cannot hold true for all $m \le N$. Indeed, if $g_N(m) = E[H_N]\, m^{0.8}$, we obtain from the inequality (17.7) and the identity $f_N(m) = m E[H_N]$, that $m^{0.2} \le E[H_N]$. Write $m = xN$ for a fixed $0 < x < 1$ and $x$ independent of $N$. Hence, we have shown that
Corollary 17.4.1 For all graphs satisfying the condition that $\frac{E[H_N]}{N^{0.2}} \to 0$, for large $N$, the empirical Chuang-Sirbu law does not hold in the region $m = xN$ with $0 < x \le 1$ and sufficiently large $N$.
The most realistic graph models for the Internet assume that $E[H_N] \approx c \log N$, since this implies that the number of routers that can be reached from any starting destination grows exponentially with the number of hops. For these realistic graphs, Corollary 17.4.1 states that the empirical Chuang-Sirbu law does not hold for all $m$. On the other hand, there are more regular graphs (such as a $d$-lattice, where $E[H_N] \simeq \frac{d}{3} N^{1/d}$) with $E[H_N] \approx N^{0.2+\epsilon}$ (and $\epsilon > 0$) for which the mathematical condition $m^{0.2} \le E[H_N]$ is satisfied for all $m$ and $N$. As shown in Van Mieghem et al. (2000), however, these classes of graphs, in contrast to random graphs, are not leading to good models for SPTs in the Internet.
17.4.1 Validity range of the Chuang-Sirbu law
For the random graph $G_p(N)$, the SPT is very close to a URT for large $N$ and with (16.10), we obtain
$$f_N(m) \approx m(\log N + \gamma - 1)$$
From the exact $g_N(m)$ formula (17.16) for the random graph $G_p(N)$, the asymptotic for large $N$ and $m$ follows as
$$g_N(m) \approx \frac{mN}{N-m}\log\left(\frac{N}{m}\right) - \frac{1}{2} \tag{17.29}$$
The above scaling explains the empirical Chuang-Sirbu law for $G_p(N)$: for $m$ small with respect to $N$, the graphs of $(\log N + \gamma - 1)\,m^{0.8}$ and $\frac{mN}{N-m}\log\left(\frac{N}{m}\right) - \frac{1}{2}$ look very alike in a log-log plot, as illustrated in Fig. 17.9.
Using the asymptotic properties of the digamma function $\psi$, we obtain (17.29) as an excellent approximation for large $N$ (and all $m$) or, in normalized form with $m = xN$ and $0 < x < 1$,
$$\frac{g_N(xN) + 0.5}{N} \approx \frac{x \log x}{x - 1} \tag{17.30}$$
The normalized Chuang-Sirbu law is $\frac{g_N(xN)}{N} = \frac{E[H_N]}{N^{0.2}}\, x^{0.8}$. It is interesting to note that the Chuang-Sirbu law is "best" if $\frac{E[H_N]}{N^{0.2}} = 1$, since then both endpoints $x = 0$ and $x = 1$ coincide with (17.30). This optimum is achieved when $N \approx 250\,000$, which is of the order of magnitude of the estimated number of routers in the current Internet. This observation may explain the fairly good correspondence on a less sensitive log-log scale with Internet measurements. At the same time, it shows that for a growing Internet, the fit of the Chuang-Sirbu law will deteriorate. For $N \ge 10^6$, the Chuang-Sirbu law underestimates $g_N(m)$ for all $m$.
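The comparison above can be reproduced numerically. The sketch below is only an illustration under stated assumptions: it takes the exact random-graph gain in the form $g_N(m) = \frac{mN}{N-m}(\psi(N) - \psi(m)) - 1$, which is what combining the last expression for $g_N(m)$ in Section 17.6 with (17.40) gives, and it uses $E[H_N] \approx \log N + \gamma - 1$ as the prefactor. All function names are hypothetical.

```python
from math import log

EULER_GAMMA = 0.5772156649015329


def gain_exact(n: int, m: int) -> float:
    """Assumed exact form of g_N(m); psi(N) - psi(m) = sum_{k=m}^{N-1} 1/k for integers."""
    return m * n / (n - m) * sum(1.0 / k for k in range(m, n)) - 1.0


def gain_asymptotic(n: float, m: float) -> float:
    """Large-N approximation (17.29)."""
    return m * n / (n - m) * log(n / m) - 0.5


def mean_hopcount_approx(n: float) -> float:
    """E[H_N] ~ log N + gamma - 1 for the random graph (the prefactor used in the text)."""
    return log(n) + EULER_GAMMA - 1.0


def chuang_sirbu(n: float, m: float) -> float:
    """Empirical Chuang-Sirbu law E[H_N] m^0.8."""
    return mean_hopcount_approx(n) * m**0.8


def best_fit_size(lo: float = 1e4, hi: float = 1e7) -> float:
    """Bisection for the N at which E[H_N] / N^0.2 = 1 (positive at lo, negative at hi)."""
    def mismatch(n: float) -> float:
        return mean_hopcount_approx(n) - n**0.2

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `best_fit_size()` returns a value close to the quoted $N \approx 250\,000$; the bracket `[1e4, 1e7]` is chosen to skip a second, much smaller root of the same equation.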
Fig. 17.9. The multicast efficiency for $N = 10^j$ with $j = 3, 4, \ldots, 7$ ($g_N(m)$ versus $m$ on a log-log scale, for the random graph and the $m^{0.8}$ law). The endpoint of each curve $g_N(N-1) = N-1$ determines $N$. The insert shows the effective power exponent $\beta(N)$ versus the number of nodes $N$.
17.4.2 The effective power exponent $\beta(N)$
For small to moderate values of $m$, $g_N(m)$ is very close to a straight line in a log-log plot. This "power law behavior" implies that $\log g_N(m) \approx \log E[H_N] + \beta(N) \log m$, which is a first order Taylor expansion of $\log g_N(m)$ in $\log m$. This observation suggests the computation$^4$ of the effective power exponent $\beta(N)$ as
$$\beta(N) = \left.\frac{d \log g_N(m)}{d \log m}\right|_{m=1} \tag{17.31}$$
Only for a straight line, the differential operator can be replaced by the difference operator such that $\beta(N) \approx \beta^*(N)$, where
$$\beta^*(N) = \frac{\log\frac{g_N(2)}{E[H_N]}}{\log 2} \tag{17.32}$$
In general, for small $m$, the effective power exponent (17.31) is not a constant 0.8 as in the Chuang-Sirbu law, but dependent on $N$. Since $g_N(m)$ is concave by Theorem 17.1.2, $\beta(N)$ is the maximum possible value for $\frac{d \log g_N(m)}{d \log m}$ at any $m \ge 1$. A direct consequence of Theorem 17.1.1 is that the effective power exponent $\beta(N) \in [\frac{1}{2}, 1]$. From recent Internet measurements, Chalmers and Almeroth (2001) found that $0.66 \le \beta(N) \le 0.7$.
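The effective power exponent $\beta(N)$ as defined in (17.31) for the random graph is
$$\beta(N) = \frac{N\left(\psi(N) + \gamma - \frac{\pi^2}{6} + \frac{\pi^2}{6N}\right)}{(N-1)\left(\psi(N) + (\gamma - 1) + \frac{1}{N}\right)}$$
while, according to the definition (17.32),
$$\beta^*(N) = \frac{\log\frac{g_N(2)}{E[H_N]}}{\log 2} = 1 + \log_2\left[\frac{(N-1)\left(\psi(N) + \gamma - 3/2 + 1/N\right)}{(N-2)\left(\psi(N) + \gamma - 1 + 1/N\right)}\right]$$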
The difference $\beta(N) - \beta^*(N)$ monotonically decreases and is largest, 0.048, at $N = 3$, while it is 0.0083 at $N = 10^5$ and 0.0037 at $N = 10^{10}$. This effective power exponent $\beta(N)$ is drawn in the insert of Fig. 17.9, which shows that $\beta(N)$ is increasing and not a constant close to 0.8. More interestingly, for large $N$, we find with (16.10) and (16.11) that $\beta(N) \approx \frac{\mathrm{Var}[H_N]}{E[H_N]}$ and that $\lim_{N\to\infty} \beta(N) = 1$. In Van Mieghem et al. (2000), the ratio $\alpha = \frac{\mathrm{Var}[H_N]}{E[H_N]}$ pops up naturally as the extreme value index of the distribution of the link weights in a topology. Since measurements of the hopcount in the Internet indicate that $\frac{\mathrm{Var}[H_N]}{E[H_N]} \approx 1$, which corresponds to a regular distribution, this extreme value index strongly favors the model of the hopcount based on shortest paths in $G_p(N)$, although random graphs do not model the Internet topology well.
4 Although (17.5) only has meaning for integer $m$, analytic continuation to a complex variable is possible and, hence, differentiation can be defined.
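The two random-graph expressions for the effective power exponent can be checked numerically. The following Python sketch is a minimal illustration, assuming the closed forms $\beta(N) = \frac{N(\psi(N) + \gamma - \pi^2/6 + \pi^2/(6N))}{(N-1)(\psi(N) + \gamma - 1 + 1/N)}$ and $\beta^*(N) = 1 + \log_2\frac{(N-1)(\psi(N) + \gamma - 3/2 + 1/N)}{(N-2)(\psi(N) + \gamma - 1 + 1/N)}$, and using $\psi(N) + \gamma = \sum_{k=1}^{N-1} 1/k$ for integer $N$. The function names are hypothetical, and the direct summation limits the sketch to moderate $N$.

```python
from math import log2, pi


def psi_plus_gamma(n: int) -> float:
    """psi(N) + gamma for integer N, i.e. the harmonic number H_{N-1}."""
    return sum(1.0 / k for k in range(1, n))


def beta(n: int) -> float:
    """Effective power exponent (17.31) for the random graph (requires N >= 2)."""
    h = psi_plus_gamma(n)
    zeta2 = pi**2 / 6
    return n * (h - zeta2 + zeta2 / n) / ((n - 1) * (h - 1 + 1 / n))


def beta_star(n: int) -> float:
    """Difference-based exponent (17.32) for the random graph (requires N >= 3)."""
    h = psi_plus_gamma(n)
    return 1 + log2((n - 1) * (h - 1.5 + 1 / n) / ((n - 2) * (h - 1 + 1 / n)))
```

Evaluating `beta(3) - beta_star(3)` gives a value close to the quoted 0.048, and `beta` grows slowly with $N$ rather than staying near 0.8.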
Thus, if the number of nodes in the Internet is still growing, we suggest, only for small to moderate values of $m$, the consideration of a power law approximation for the multicast gain
$$g_N(m) \approx E[H_N]\, m^{\frac{\mathrm{Var}[H_N]}{E[H_N]}}$$
instead of the Chuang-Sirbu law.
In summary, many properties in nature seem linear on an insensitive log-log scale. However, deriving simple and attractive power laws for complicated matter from these plots seems a little oversimplified$^5$.
17.5 Stability of a multicast shortest path tree
We now turn to the problem of quantifying the stability in a multicast tree. Inspired by Poisson arrival processes, at a single instant of time, we assume that either one or zero group members can leave. In the sequel, we do not make any further assumption about the time-dependent process of leaving/joining a multicast group and refrain from dependencies on time. The number of links in the tree that change after one multicast group member leaves the group has been chosen as a measure for the stability of the multicast tree. If we denote this quantity by $\Delta_N(m)$, then, by definition of $g_N(m)$, the average number of changes equals
$$E[\Delta_N(m)] = g_N(m) - g_N(m-1) \tag{17.33}$$
Since $g_N(m)$ is concave (Theorem 17.1.2), $E[\Delta_N(m)]$ is always positive and decreasing in $m$. If the scope of $m$ is extended to real numbers, $E[\Delta_N(m)] \approx g_N'(m)$, which simplifies further estimates.
The situation where on average less than one link changes if one multicast group member leaves may be regarded as a stable regime. Since $E[\Delta_N(m)]$ is always positive and decreasing in $m$, this stable regime is reached when the group size $m$ exceeds $m_1$, which satisfies $E[\Delta_N(m_1)] = 1$. For example, for the URT that is asymptotically the SPT for the class RGU defined in Section 16.2.2, this condition approximately follows from (17.29) as
$$E[\Delta_N(m)] \approx \frac{mN}{N-m}\log\left(\frac{N}{m}\right) - \frac{(m-1)N}{N-m+1}\log\left(\frac{N}{m-1}\right) \tag{17.34}$$
5 Many recent articles devote attention to power law behavior but most of them seem prudent: just recall the immense interest (hype?) a few years ago in the long range and self-similar nature of Internet traffic and the relation to the "simple" power law with only the Hurst parameter (comparable to $\beta(N)$ here) in the exponent.
Let $x = \frac{m}{N}$, then $0 < x < 1$ and
$$\frac{E[\Delta_N(m)]}{N} \approx -\frac{x}{1-x}\log x + \frac{x - 1/N}{1 - (x - 1/N)}\log\left(x - \frac{1}{N}\right)$$
After expanding the second term in a Taylor series around $x$ to first order in $\frac{1}{N}$,
$$E[\Delta_N(xN)] \approx \frac{x - 1 - \log x}{(1-x)^2} + O\left(\frac{1}{N}\right)$$
For large $N$, $E[\Delta_N(x_1 N)] \approx 1$ occurs when $x_1 = 0.3161$, which is the solution in $x$ of $\frac{x - 1 - \log x}{(1-x)^2} = 1$. For the class RGU, a stable tree as defined above is obtained when the multicast group size $m$ is larger than $m_1 = 0.3161 N \approx \frac{N}{3}$. In the sequel, since $m_1$ is high and of less practical interest, we will focus on multicast group sizes smaller than $m_1$. The computation of $m_1$ for other graph types turns out to be difficult. Since, as mentioned above, the comparison with Internet measurement (Van Mieghem et al., 2001a) shows that formula (17.29) provides a fairly good estimate, we expect that $m_1 \approx \frac{N}{3}$ also approximates well the stable regime in the Internet.
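The threshold of the stable regime can be recovered numerically. The Python sketch below is illustrative only: it solves $\frac{x - 1 - \log x}{(1-x)^2} = 1$ by bisection and, as a cross-check, evaluates $g_N(m) - g_N(m-1)$ with the random-graph gain taken in the assumed form $g_N(m) = \frac{mN}{N-m}\sum_{k=m}^{N-1}\frac{1}{k} - 1$. The function names and the bracketing interval are assumptions of this example.

```python
from math import log


def gain_exact(n: int, m: int) -> float:
    """Assumed exact random-graph gain; g_N(0) = 0 by convention."""
    if m == 0:
        return 0.0
    return m * n / (n - m) * sum(1.0 / k for k in range(m, n)) - 1.0


def expected_changes(n: int, m: int) -> float:
    """E[Delta_N(m)] = g_N(m) - g_N(m - 1), cf. (17.33)."""
    return gain_exact(n, m) - gain_exact(n, m - 1)


def limit_changes(x: float) -> float:
    """Large-N limit of E[Delta_N(xN)] for 0 < x < 1."""
    return (x - 1 - log(x)) / (1 - x) ** 2


def stable_fraction(lo: float = 0.01, hi: float = 0.99) -> float:
    """Bisection for limit_changes(x) = 1 (the limit exceeds 1 at lo and is below 1 at hi)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if limit_changes(mid) > 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $N = 10^4$, `expected_changes(10**4, 3162)` is close to one changed link, whereas small groups such as $m = 100$ lie well above that threshold.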
The following theorem quantifies the stability in the class RGU.
Theorem 17.5.1 For sufficiently large $N$ and fixed $m$, the number of changed edges $\Delta_N(m)$ in a random graph $G_p(N)$ with uniformly distributed link weights tends to a Poisson distribution,
$$\Pr[\Delta_N(m) = k] \approx e^{-E[\Delta_N(m)]}\,\frac{\left(E[\Delta_N(m)]\right)^k}{k!} \tag{17.35}$$
where $E[\Delta_N(m)] = g_N(m) - g_N(m-1)$ and $g_N(m)$ is given by (17.16) or approximately by (17.29).
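As a hedged illustration of the theorem, the following Python sketch tabulates the Poisson probabilities for the number of changed links, with the mean taken as $g_N(m) - g_N(m-1)$ and the random-graph gain in the assumed form $g_N(m) = \frac{mN}{N-m}\sum_{k=m}^{N-1}\frac{1}{k} - 1$. The function names and the truncation parameter `k_max` are choices of this example.

```python
from math import exp, factorial


def gain_exact(n: int, m: int) -> float:
    """Assumed exact random-graph gain; g_N(0) = 0 by convention."""
    if m == 0:
        return 0.0
    return m * n / (n - m) * sum(1.0 / k for k in range(m, n)) - 1.0


def changed_links_pmf(n: int, m: int, k_max: int) -> list[float]:
    """Poisson approximation of Pr[Delta_N(m) = k] for k = 0, ..., k_max."""
    mean = gain_exact(n, m) - gain_exact(n, m - 1)
    return [exp(-mean) * mean**k / factorial(k) for k in range(k_max + 1)]
```

For example, `changed_links_pmf(1000, 10, 60)` describes a group of ten members in a graph of 1000 nodes, where between three and four links change on average when one member leaves.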
Proof: In Section 16.2.1 we have mentioned that the SPT in the class RGU is a URT for large $N$. In addition, the random variable for the number of hops $H_N$ from the root to an arbitrary node tends, for large $N$, to a Poisson random variable with mean $E[H_N] \approx \log N + \gamma - 1$ as shown in Section 16.3.1. Now, $\Delta_N(m) = H_N(m) - H_N(m-1)$ is the positive discrete random variable that counts the absolute value of the difference between the hopcount $h_{R\to m}$ from the root (source) to user $m$ and the hopcount $h_{R\to m-1}$ from the root to the user closest in the tree to $m$, which we here relabel by $m-1$. The users $m$ and $m-1$ are not independent, nor are the two random variables $h_{R\to m}$ and $h_{R\to m-1}$ independent in general, due to possible overlap in their paths.
Fig. 17.10. A sketch of a uniform recursive tree (with nodes Root, A, B, D, $m$ and $m-1$), where $h_{R\to m} = 3$ and $h_{R\to m-1} = 4$ and the number of links in common is two (shown in bold Root-A-B).
If the shortest paths from the root to each of the two users $m$ and $m-1$ overlap, there always exists a node in the SPT, say node $B$ as illustrated in Fig. 17.10, that sees the partial shortest paths from itself to $m$ and $m-1$ as non-overlapping and independent. Since the SPT is a URT, the subtree rooted at that node $B$ (enclosed in dotted line in Fig. 17.10) is again a URT as follows from Theorem 16.2.1. With respect to $B$, the nodes $m$ and $m-1$ are uniformly chosen and the number of links $\Delta_N(m)$ that change if the $m$-th node leaves is just its hopcount with respect to $B$ (instead of the original root). We denote the unknown number of nodes in that subtree rooted at $B$ by $\nu(m) \le N$. We have that $\nu(m) \le \nu(m-1)$ because by adding a group member, the size of the subtree can only decrease. For large $N$ and small $m$, $\nu(m)$ is large such that the above mentioned asymptotic law of the hopcount applies. If both $m$ and $N$ are large, $\nu(m)$ will become too small for the asymptotic law to apply. Thus, for fixed $m$ and large $N$, this implies that $\Delta_N(m)$ tends to a Poisson random variable with mean $E[\Delta_N(m)]$. □
Simulations in Van Mieghem and Janic (2002) indicate that the Poisson law seems more widely valid than just in the asymptotic regime ($N \to \infty$).
The proof can be extended to a general topology. Assume for a certain class of graphs that the pdf of the hopcount $\Pr[H_N = k]$ and the multicast efficiency $g_N(m)$ can be computed for all sizes $N$. The subtree rooted at $B$ is again a SPT in a subcluster of size $\nu(m)$, which is an unknown random variable. An argument similar to the one in the proof above shows that
$$\Pr[\Delta_N(m) = k] = \Pr\left[H_{\nu(m)} = k\right]$$
This argument implicitly assumes that all multicast users are uniformly distributed over the graph. By the law of total probability,
$$\Pr\left[H_{\nu(m)} = k\right] = \sum_{n=1}^{N} \Pr\left[H_{\nu(m)} = k \mid \nu(m) = n\right]\Pr[\nu(m) = n] = \sum_{n=1}^{N} \Pr[H_n = k]\,\Pr[\nu(m) = n]$$
which, unfortunately, shows that the pdf of $\nu(m)$ is required to specify $\Pr[\Delta_N(m) = k]$. However, we can proceed further in an approximate way by replacing the unknown random variable $\nu(m)$ by its best estimate, $E[\nu(m)]$. In that approximation, the average size $E[\nu(m)]$ of the shortest path subtree rooted at $B$ can be specified, at least in principle, with the use of (17.33). Indeed, since $E\left[H_{E[\nu(m)]}\right] = \sum_{k=1}^{E[\nu(m)]-1} k \Pr\left[H_{E[\nu(m)]} = k\right]$, by equating
$$E\left[H_{E[\nu(m)]}\right] = g_N(m) - g_N(m-1)$$
a relation in one unknown $E[\nu(m)]$ is found and can be solved for $E[\nu(m)]$. In conclusion, we end up with the approximation
$$\Pr[\Delta_N(m) = k] \approx \Pr\left[H_{E[\nu(m)]} = k\right]$$
which roughly demonstrates that, in general, $\Pr[\Delta_N(m) = k]$ is likely related to the hopcount distribution in that certain class of graphs.
Unfortunately, for very few types of graphs, both the pdf $\Pr[H_N = k]$ and the multicast gain $g_N(m)$ can be computed. This fact augments the value of Theorem 17.5.1, although the class RGU is not a good model for the graph of the Internet. Fortunately, the shortest path tree deduced from that class seems a reasonable approximation as shown in Fig. 16.4 and sufficient to provide first order estimates.
17.6 Proof of (17.16): $g_N(m)$ for random graphs
Before embarking on the proof of formula (17.16), we first prove the following lemma.
Lemma 17.6.1 For $a > b$,
$$S(a,b) = \sum_{k=1}^{b} \frac{(a-k)!}{(b-k)!}\,\frac{1}{k} = \frac{a!}{b!}\left[\psi(a+1) - \psi(a-b+1)\right]$$
and
$$S(b,b) = \sum_{k=1}^{b} \frac{1}{k} = \psi(b+1) + \gamma$$
Proof: We start by writing
$$S(a,b) = \sum_{k=1}^{b} \frac{(a-k)\cdots(b-k+1)}{k} = a\sum_{k=1}^{b} \frac{(a-1-k)\cdots(b-k+1)}{k} - \sum_{k=1}^{b} (a-1-k)\cdots(b-k+1)$$
Since $(a-1-k)\cdots(b-k+1) = (a-b-1)!\binom{a-1-k}{b-k}$ and by the recurrence for the binomial $\sum_{k=1}^{b}\binom{a-1-k}{b-k} = \binom{a-1}{b-1}$, we have that
$$S(a,b) = a\,S(a-1,b) - \frac{1}{a-b}\,\frac{(a-1)!}{(b-1)!}$$
After $p$ iterations, we have
$$S(a,b) = a(a-1)\cdots(a-p+1)\,S(a-p,b) - \frac{a!}{(b-1)!}\sum_{j=0}^{p-1}\frac{1}{(a-j)(a-j-b)}$$
and, if $p = a-b$, the recursion stops with result,
$$S(a,b) = \frac{a!}{b!}\sum_{k=1}^{b}\frac{1}{k} - \frac{a!}{(b-1)!}\sum_{j=0}^{a-b-1}\frac{1}{(a-j)(a-j-b)} = \frac{a!}{b!}\sum_{k=1}^{b}\frac{1}{k} - \frac{a!}{b!}\left(\sum_{k=1}^{a-b}\frac{1}{k} - \sum_{k=b+1}^{a}\frac{1}{k}\right) = \frac{a!}{b!}\left(\sum_{k=1}^{a}\frac{1}{k} - \sum_{k=1}^{a-b}\frac{1}{k}\right)$$
from which the lemma follows. □
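The lemma can be checked with exact rational arithmetic. The short Python sketch below, with hypothetical function names, compares the defining sum with the closed form, using $\psi(a+1) - \psi(a-b+1) = \sum_{j=a-b+1}^{a} \frac{1}{j}$ for integers $a \ge b \ge 1$ (for $a = b$ this reduces to $\psi(b+1) + \gamma$).

```python
from fractions import Fraction
from math import factorial


def s_direct(a: int, b: int) -> Fraction:
    """S(a, b) = sum_{k=1}^{b} (a - k)! / ((b - k)! k), evaluated term by term."""
    return sum(
        (Fraction(factorial(a - k), factorial(b - k) * k) for k in range(1, b + 1)),
        Fraction(0),
    )


def s_closed(a: int, b: int) -> Fraction:
    """(a!/b!) [psi(a + 1) - psi(a - b + 1)] written as a finite harmonic sum."""
    digamma_difference = sum(
        (Fraction(1, j) for j in range(a - b + 1, a + 1)), Fraction(0)
    )
    return Fraction(factorial(a), factorial(b)) * digamma_difference
```

Both routines return `Fraction` objects, so agreement is exact rather than up to rounding.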
Proof of equation (17.16): We will investigate $E[X_i] = E\left[X_i^{(N)}\right]$ in the URT with $N$ nodes. Here $E[X_i]$ is the number of joint hops in a multicast SPT from the root to $i$ uniformly chosen nodes in the URT and where all the group member nodes are different from the root. Let $E\left[\tilde{X}_i\right]$ be the same quantity where we allow the group member nodes to be the root. Then,
$$E\left[\tilde{X}_i\right] = \frac{N-i}{N}\,E[X_i]$$
since there are $i$ possibilities each with probability $\frac{1}{N}$ that one of the nodes equals the root, in which case $X_i = 0$.
The average number of joint hops $E\left[\tilde{X}_i\right]$ is deduced from Fig. 17.11, where two clusters are shown with respectively $k$ and $N-k$ nodes. The first cluster with $k$ nodes does not possess the root (dark shaded), but it contains the $i$ multicast group members (light shaded). There is already at least 1 joint hop because the link between the root and node $A$, that can be viewed as the root of the first cluster, is used by all $i$ group members lying in the first cluster. Given the size $k$ of the first cluster, the probability that all $i$ uniformly chosen group members belong to the first cluster equals $\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}$ because the probability that the first group member belongs to that cluster is $\frac{k}{N}$, the probability that the second group member also belongs to the first cluster is $\frac{k-1}{N-1}$, and so on. Since the size of the first cluster connected to the root is uniform in between 1 and $N-1$, the probability that the size is $k$ equals $\frac{1}{N-1}$. When all $i$ nodes are in that first cluster of size $k$, $X_i$ is at least 1, and the problem restarts, but with $N$ replaced by $k$ and $A$ being the root. Hence, if all $i$ group members belong to the first cluster, the average number of joint hops is $\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\left(1 + E\left[\tilde{X}_i^{(k)}\right]\right)$ because we must sum over all possible sizes for the first cluster. If not all $i$ group member nodes are in the first cluster, the group member nodes are divided over the two clusters. But, in that case, we have no joint overlaps or $X_i = 0$.
Fig. 17.11. The two contributing clusters (one with $k$ nodes attached to the root via node A, the other with the root and $N-k$ nodes) leading to the $E\left[\tilde{X}_i^{(N)}\right]$-recursion.
Thus, if not all $i$ group member nodes are in the first cluster, the only way that there are possible joint overlaps ($X_i > 0$), is that all $i$ group member nodes are in the second cluster. However, by removing the first cluster, we are left again with a uniform recursive tree of size $N-k$. The average number of joint hops in this case is $\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{(N-k)(N-k-1)\cdots(N-k-i+1)}{N(N-1)\cdots(N-i+1)}\,E\left[\tilde{X}_i^{(N-k)}\right]$. Adding both contributions results in the recursion formula
$$E\left[\tilde{X}_i^{(N)}\right] = \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\left(1 + 2E\left[\tilde{X}_i^{(k)}\right]\right) \tag{17.36}$$
We next write
(Q )
l
l
k
˜ (Q ) =
= Q(Q 3 1) · · · (Q 3 l + 1)H [
l
l
k
Q!
˜ (Q )
H [
l
(Q 3 l)!
then the above recurrence equation (17.36) turns into
(Q )
l
=
Q
31
[
1
(n)
[n(n 3 1) · · · (n 3 l + 1) + 2l ]
Q 3 1 n=1
=
Q
31
Q
31
[
[
2
1
(n)
n(n 3 1) · · · (n 3 l + 1) +
Q 3 1 n=l
Q 3 1 n=1 l
Subtracting
(Q )
(Q 3 1)l
(Q 31)
3 (Q 3 2)l
=
(Q 3 1)!
(Q 31)
+ 2l
(Q 3 l 3 1)!
from which we obtain
(Q 31)
(Q )
l
Q
=
(Q 3 2)!
+ l
Q(Q 3 l 3 1)!
Q 31
(17.37)
Iterating (17.37) gives
(Q )
l
Q
=
n31
[
(Q 3n)
(Q 3 2 3 m)!
+ l
(Q 3 m)(Q 3 l 3 1 3 m)!
Q 3n
m=0
l
k
(l)
˜ (l) = 0, because the root is then always one of the group member nodes, we
Since l = H [
l
finally obtain,
(Q )
l
=Q
Q[
3l31
m=0
Q
[
(Q 3 2 3 m)!
(n 3 2)!
=Q
(Q 3 m)(Q 3 l 3 1 3 m)!
n(n
3 l 3 1)!
n=l+1
(Q )
It can be shown that, for large Q, l
Because
k
l
(Q )
H [l
=
(17.38)
(Q 32)!
Q
; l31
.
(Q 3l31)!
l
k
Q
˜ (Q ) = (Q 3 l 3 1)! (Q )
H [
l
l
Q 3l
(Q 3 1)!
we have that
Q
l
k
(Q 3 l 3 1)!Q [
(n 3 2)!
(Q )
=
H [l
(Q 3 1)!
n(n
3 l 3 1)!
n=l+1
(17.39)
k
l
(Q )
1
Q
1
and, for large Q, H [l
; l31
; l31
=
(Q 31)
Invoking Theorem 17.1.3, the average number of multicast hops for $m$ uniformly chosen, distinct group members is
$$g_N(m) = \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1}\,\frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=2}^{N}\frac{(k-2)!}{k(k-i-1)!} = \frac{-N}{(N-1)!}\sum_{s=0}^{N-2}\frac{(N-2-s)!}{N-s}\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(N-i-1-s)!}$$
The $i$-summation can be executed as follows. Consider $x^{N-1}(1-1/x)^m = \sum_{i=0}^{m}\binom{m}{i}(-1)^i x^{N-i-1}$. Differentiating $s$ times yields
$$\sum_{i=0}^{m}\binom{m}{i}(-1)^i\,\frac{(N-i-1)!}{(N-i-1-s)!}\,x^{N-i-s-1} = \frac{d^s}{dx^s}\left[x^{N-1-m}(x-1)^m\right]$$
Expanding the right-hand side around $x = 1$ gives
$$\frac{d^s}{dx^s}\left[x^{N-1-m}(x-1)^m\right] = \frac{d^s}{dx^s}\sum_{k=0}^{\infty}\binom{N-1-m}{k}(x-1)^{k+m} = \sum_{k=0}^{\infty}\binom{N-1-m}{k}\frac{(k+m)!}{(k+m-s)!}(x-1)^{k+m-s}$$
Evaluation at $x = 1$ only leads to a non-zero contribution if $k + m - s = 0$. Hence,
$$\sum_{i=1}^{m}\binom{m}{i}(-1)^i\,\frac{(N-i-1)!}{(N-i-1-s)!} = s!\binom{N-1-m}{s-m} - \frac{(N-1)!}{(N-1-s)!}$$
and
$$g_N(m) = \frac{-N(N-1-m)!}{(N-1)!}\sum_{s=0}^{N-2}\frac{s!}{(N-s)(s-m)!(N-1-s)} + N\sum_{s=0}^{N-2}\frac{1}{(N-s)(N-1-s)}$$
$$= \frac{-N(N-1-m)!}{(N-1)!}\left[\sum_{s=m}^{N-2}\frac{s!}{(s-m)!(N-1-s)} - \sum_{s=m}^{N-2}\frac{s!}{(s-m)!(N-s)}\right] + N\left[\sum_{k=1}^{N-1}\frac{1}{k} - \sum_{k=2}^{N}\frac{1}{k}\right]$$
$$= \frac{-N(N-1-m)!}{(N-1)!}\left[\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k} - \sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k}\right] + N - 1$$
Rewrite the first summation as
$$\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k} = \frac{(N-2)!}{(N-2-m)!} + \sum_{k=2}^{N-m}\frac{(N-k-1)!\,(N-k-m)}{(N-k-m)!\,k} = \frac{(N-2)!}{(N-2-m)!} + \sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k} - m\sum_{k=2}^{N-m}\frac{(N-k-1)!}{(N-k-m)!\,k}$$
Then,
$$g_N(m) = \frac{-N(N-1-m)!}{(N-1)!}\left[\frac{(N-2)!}{(N-2-m)!} - m\sum_{k=2}^{N-m}\frac{(N-k-1)!}{k(N-k-m)!}\right] + N - 1$$
$$= \frac{N(m-1)+1}{N-1} + \frac{mN(N-1-m)!}{(N-1)!}\sum_{k=2}^{N-m}\frac{(N-k-1)!}{k(N-k-m)!} = -1 + \frac{mN(N-1-m)!}{(N-1)!}\sum_{k=1}^{N-m}\frac{(N-k-1)!}{k(N-k-m)!}$$
Using Lemma 17.6.1
$$\sum_{k=1}^{N-m}\frac{(N-k-1)!}{(N-k-m)!}\,\frac{1}{k} = \frac{(N-1)!}{(N-m)!}\left[\psi(N) - \psi(m)\right] \tag{17.40}$$
finally leads to (17.16). □
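The end result of this derivation can be sanity-checked on small uniform recursive trees. The Python sketch below is only an illustration under stated assumptions: it takes the closed form obtained by substituting (17.40) into the last expression for $g_N(m)$, namely $g_N(m) = \frac{mN}{N-m}\left(\psi(N) - \psi(m)\right) - 1$, and compares it with an exhaustive average over all $(N-1)!$ recursive trees (node $j$ attaches to one of the nodes $1, \ldots, j-1$, node 1 being the root) and all sets of $m$ distinct non-root receivers. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def gain_closed_form(n: int, m: int) -> Fraction:
    """Assumed closed form; psi(N) - psi(m) = sum_{k=m}^{N-1} 1/k for integers m < N."""
    harmonic_part = sum((Fraction(1, k) for k in range(m, n)), Fraction(0))
    return Fraction(m * n, n - m) * harmonic_part - 1


def gain_bruteforce(n: int, m: int) -> Fraction:
    """Average multicast tree size over all recursive trees on n nodes (small n only)."""
    total_links = 0
    cases = 0
    # parents[j - 2] is the parent of node j; every choice is equally likely in a URT.
    for parents in product(*(range(1, j) for j in range(2, n + 1))):
        parent = dict(zip(range(2, n + 1), parents))
        for receivers in combinations(range(2, n + 1), m):
            used = set()  # each link is identified by its child node
            for node in receivers:
                while node != 1 and node not in used:
                    used.add(node)
                    node = parent[node]
            total_links += len(used)
            cases += 1
    return Fraction(total_links, cases)
```

The enumeration grows like $(N-1)!$, so it is meant for $N$ up to about 7; for $N = 4$ and $m = 2$ both routines give $7/3$.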
17.7 Proof of Theorem 17.3.1: $g_N(m)$ for k-ary trees
Let $\tilde{X}_i$ be the number of joint hops for $i$ different multicast group members (we allow the root to be a user in which case $\tilde{X}_i = 0$). Then,
$$\Pr\left[\tilde{X}_i \ge 1\right] = \Pr[\text{All group members belong to the same cluster connected to the root}]$$
$$= k \cdot \Pr[\text{All group members belong to the first cluster connected to the root}]$$
$$= k\,\frac{\binom{(N-1)/k}{i}}{\binom{N}{i}} = k\,\frac{\binom{1+k+\cdots+k^{D-1}}{i}}{\binom{1+k+\cdots+k^{D}}{i}} \tag{17.41}$$
By self-similarity of $k$-ary trees we obtain
$$\Pr\left[\tilde{X}_i \ge 2 \mid \tilde{X}_i \ge 1\right] = p_i^{(D-1)} = k\,\frac{\binom{1+k+\cdots+k^{D-2}}{i}}{\binom{1+k+\cdots+k^{D-1}}{i}}$$
because each cluster extending from the root is itself a $k$-ary tree of depth $D-1$. In general, we have $\Pr\left[\tilde{X}_i \ge j\right] = \Pr\left[\tilde{X}_i \ge j \mid \tilde{X}_i \ge j-1\right]\Pr\left[\tilde{X}_i \ge j-1\right]$. Hence, by iteration,
$$\Pr\left[\tilde{X}_i \ge j\right] = \prod_{n=D-j+1}^{D} p_i^{(n)}, \qquad j = 1, 2, \ldots, D-1 \tag{17.42}$$
Note that for $i \ge 2$ the probability $\Pr\left[\tilde{X}_i \ge D\right] = 0$, because if $\tilde{X}_i = D$ some destinations must be identical. From (2.36) we obtain for $i \ge 2$,
$$E\left[\tilde{X}_i\right] = \sum_{j=1}^{D-1}\prod_{n=D-j+1}^{D} p_i^{(n)} = \sum_{j=1}^{D-1} k^{j}\,\frac{\binom{1+\cdots+k^{D-j}}{i}}{\binom{1+\cdots+k^{D}}{i}} = \sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}} \tag{17.43}$$
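Since $E[X_i] = \frac{N}{N-i}E\left[\tilde{X}_i\right]$, we find
$$E[X_i] = \frac{N}{N-i}\sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}}, \qquad i \ge 2 \tag{17.44}$$
For the value of $E\left[\tilde{X}_1\right]$ and $E[X_1]$ we find
$$E\left[\tilde{X}_1\right] = \frac{1}{N}\sum_{j=1}^{D} k^{j}\left(1 + \cdots + k^{D-j}\right) = \frac{1}{N(k-1)}\left\{D k^{D+1} - (N-1)\right\}$$
and
$$E[X_1] = \frac{1}{N-1}\sum_{j=1}^{D} k^{j}\left(1 + \cdots + k^{D-j}\right) = \frac{N}{N-1}\sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{1}}{\binom{1+\cdots+k^{D}}{1}} + \frac{k^{D}}{N-1}$$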
Invoking Theorem 17.1.3 yields
$$g_{N,k}(m) = \frac{m k^{D}}{N-1} - \sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{N}{N-i}\sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}}$$
Writing $A_j = \frac{k^{j+1}-1}{k-1}$ and reversing the $i$- and $j$-summation yields, using (17.24),
$$g_{N,k}(m) = \frac{m k^{D}}{N-1} - N\sum_{j=1}^{D-1} k^{D-j}\,\frac{A_j!}{N!}\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(A_j-i)!}$$
Concentrating on the inner sum with lower sum bound $i = 0$, denoted as $S_j$, and substituting $k = m - i$, we have
$$S_j = \sum_{k=0}^{m}\binom{m}{k}(-1)^{m-k}\,\frac{\Gamma(N-m+k)}{\Gamma(A_j-m+k+1)}$$
Invoking the Taylor series of the hypergeometric function (Abramowitz and Stegun, 1968, Section 15.1.1),
$$F(a,b;c;z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\sum_{n=0}^{\infty}\frac{\Gamma(a+n)\Gamma(b+n)}{\Gamma(c+n)\,n!}\,z^{n}$$
$S_j$ is the coefficient of $z^m$ in the Cauchy product of
$$(1-z)^m = \sum_{k=0}^{\infty}\binom{m}{k}(-1)^{k} z^{k}$$
and
$$\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}\,F(1, N-m; A_j-m+1; z) = \sum_{k=0}^{\infty}\frac{\Gamma(N-m+k)}{\Gamma(A_j-m+1+k)}\,z^{k}$$
Hence,
$$S_j = \frac{1}{m!}\,\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}\left.\frac{d^m}{dz^m}\left[(1-z)^m F(1, N-m; A_j-m+1; z)\right]\right|_{z=0}$$
Invoking the differentiation formula (Abramowitz and Stegun, 1968, Section 15.2.7),
$$\frac{d^m}{dz^m}\left[(1-z)^{a+m-1}F(a,b;c;z)\right] = \frac{(-1)^m\,\Gamma(a+m)\Gamma(c-b+m)\Gamma(c)}{\Gamma(a)\Gamma(c-b)\Gamma(c+m)}\,(1-z)^{a-1}F(a+m,b;c+m;z)$$
we have, since $a = 1$ and $F(a,b;c;0) = 1$,
$$S_j = \frac{(-1)^m\,\Gamma(N-m)\,\Gamma(A_j+1-N+m)}{\Gamma(A_j+1-N)\,\Gamma(A_j+1)}$$
Thus,
$$g_{N,k}(m) = \frac{m k^{D}}{N-1} - N\sum_{j=1}^{D-1} k^{D-j}\,\frac{A_j!}{N!}\left[\frac{(-1)^m (N-m-1)!\,(A_j-N+m)!}{(A_j-N)!\,A_j!} - \frac{(N-1)!}{A_j!}\right]$$
$$= \frac{m k^{D}}{N-1} + \sum_{j=1}^{D-1} k^{D-j} + \frac{(-1)^{m-1}(N-m-1)!}{(N-1)!}\sum_{j=1}^{D-1} k^{D-j}\,\frac{(A_j-N+m)!}{(A_j-N)!}$$
from which (17.25) is immediate. □
17.8 Problem
(i) Compute the effective power exponent $\beta^*(N)$ for the $k$-ary tree.
18
The hopcount to an anycast group
In this chapter, the probability density function of the number of hops to the most nearby member of the anycast group consisting of $m$ members (e.g. servers) is analyzed. The results are applied to compute a performance measure $\eta$ of the efficiency of anycast over unicast and to the server placement problem. The server placement problem asks for the number of (replicated) servers $m$ needed such that any user in the network is not more than $j$ hops away from a server of the anycast group with a certain prescribed probability. As in Chapter 17 on multicast, two types of shortest path trees are investigated: the regular $k$-ary tree and the irregular uniform recursive tree treated in Chapter 16. Since these two extreme cases of trees indicate that the performance measure $\eta \approx 1 - a \log m$ where the real number $a$ depends on the details of the tree, it is believed that for trees in real networks (as the Internet) the same logarithmic law applies. An order calculus on exponentially growing trees further supplies evidence for the conjecture that $\eta \approx 1 - a \log m$ for small $m$.
18.1 Introduction
IPv6 possesses a new address type, anycast, that is not supported in IPv4. The anycast address is syntactically identical to a unicast address. However, when a set of interfaces is specified by the same unicast address, that unicast address is called an anycast address. The advantage of anycast is that a group of interfaces at different locations is treated as one single address. For example, the information on servers is often duplicated over several secondary servers at different locations for reasons of robustness and accessibility. Changes are only performed on the primary servers, which are then copied onto all secondary servers to maintain consistency. If both the primary and all secondary servers have the same anycast address, a query from some source towards that anycast address is routed towards the closest server of the group. Hence, anycast is more efficient than routing the packet to the root server (primary server).
Suppose there are $m$ (primary plus all secondary) servers and that these $m$ servers are uniformly distributed over the Internet. The number of hops from the querying device $A$ to the closest server is the minimum number of hops, denoted by $h_N(m)$, of the set of shortest paths from $A$ to these $m$ servers in a network with $N$ nodes. In order to solve the problem, the shortest path tree rooted at node $A$, the querying device, needs to be investigated. We assume in the sequel that one of the $m$ uniformly distributed servers can possibly coincide with the same router to which the querying machine $A$ is attached. In that case, $h_N(m) = 0$. This assumption is also reflected in the notation, small $h$, according to the convention made in Section 16.3.2 that capital $H$ for the hopcount excludes the event that the hopcount can be zero.
Clearly, if $m = 1$, the problem reduces to the hopcount of the shortest path from $A$ to one uniformly chosen node in the network and we have that
$$h_N(1) = h_N,$$
where $h_N$ is the hopcount of the shortest path in a graph with $N$ nodes. The other extreme for $m = N$ leads to
$$h_N(N) = 0$$
because all nodes in the network are servers. In between these extremes, it holds that
$$h_N(m) \le h_N(m-1)$$
since one additional anycast group member (server) can never increase the minimum number of hops from an arbitrary node to that larger group.
The hopcount to an anycast group is a stochastic problem. Even if the
network graph is exactly known, an arbitrary node D views the network
along a tree. Most often it is a shortest path tree. Although the sequel
emphasizes “shortest path trees”, the presented theory is equally valid for
any type of tree. The node D’s perception of the network is very likely
dierent from the view of another node D0 . Nevertheless, shortest path
trees in the same graph possess to some extent related structural properties
that allow us to treat the problem by considering certain types or classes
of shortest path trees. Hence, instead of varying the arbitrary node D over
all possible nodes in the graph and computing the shortest path tree at
each dierent node, we vary the structure of the shortest path tree rooted
18.2 General analysis
419
at D over all possible shortest path trees of a certain type. Of course, the
confinement of the analysis then lies in the type of tree that is investigated.
We will only consider the regular n-ary tree and the irregular URT . It
seems reasonable to assume that “real” shortest path trees in the Internet
possess a structure somewhere in between these extremes and that scaling
laws observed in both the two extreme cases may also apply to the Internet.
The presented analysis allows us to address at least two dierent issues.
First, for a same class of trees, the e!ciency of anycast over unicast defined
in terms of a performance measure ,
=
H [kQ (p)]
1
H [kQ (1)]
is quantified. The performance measure indicates how much hops (or link
traversals or bandwidth consumption) can be saved, on average, by anycast.
Alternatively, also reflects the gain in end-to-end delay or how much faster
than unicast, anycast finds the desired information. Second, the so-called
server placement problem can be treated. More precisely, the question “How
many servers p are needed to guarantee that any user request can access the
information within m hops with probability Pr [kQ (p) A m] , where is
certain level of stringency,” can be answered. The server placement problem
is expected to gain increased interest especially for real-time services where
end-to-end QoS (e.g. delay) requirements are desirable. In the most general
setting of this server placement problem, all nodes are assumed to be equally
important in the sense that users’ requests are generated equally likely at
any router in the network with Q nodes. As mentioned in Chapter 17, the
validity of this assumption has been justified by Phillips et al. (1999). In
the case of uniform user requests, the best strategy is to place servers also
uniformly over the network. Computations of $\Pr[k_Q(p) > m] < \epsilon$ for given
stringency $\epsilon$ and hop $m$ allow the determination of the minimum number $p$
of servers. The solution of this server placement problem may be regarded
as an instance of the general quality of service (QoS) portfolio of a network
operator. When the number of servers for a major application offered by the
service provider is properly computed, the service provider may announce
levels of QoS (e.g. via $\Pr[k_Q(p) > m] < \epsilon$) and accordingly price the use
of the application.
18.2 General analysis
Let us consider a particular shortest path tree $W$ rooted at node $D$ with
the level set $O_Q = \{X_Q^{(n)}\}_{1 \le n \le Q-1}$ as defined in Section 16.2.2. Suppose
that the result of uniformly distributing $p$ anycast group members over the
graph leads to a number $p^{(n)}$ of those anycast group member nodes that
are $n$ hops away from the root. These $p^{(n)}$ distinct nodes all belong to the
$n$-th level set $\{X_Q^{(n)}\}$. Similarly as for $X_Q^{(n)}$, some relations are immediate.
First, $p^{(0)} = 0$ means that none of the $p$ anycast group members coincides
with the root node $D$, while $p^{(0)} = 1$ means that one of them (and at most one)
is attached to the same router $D$ as the querying device. Also, for all $n > 0$,
it holds that $0 \le p^{(n)} \le X_Q^{(n)}$ and that
$$\sum_{n=0}^{Q-1} p^{(n)} = p \qquad (18.1)$$
Given the tree $W$ specified by the level set $O_Q$ and the anycast group members
specified by the set $\{p^{(0)}, p^{(1)}, \ldots, p^{(Q-1)}\}$, we will derive the lowest nonempty level $p^{(m)}$, which is equivalent to $k_Q(p)$.
Let us denote by $h_m$ the event that all first $m + 1$ levels are not occupied
by an anycast group member,
$$h_m = \{p^{(0)} = 0\} \cap \{p^{(1)} = 0\} \cap \cdots \cap \{p^{(m)} = 0\}$$
The probability distribution of the minimum hopcount, $\Pr[k_Q(p) = m \mid O_Q]$,
is then equal to the probability of the event $h_{m-1} \cap \{p^{(m)} > 0\}$. Since the
event $\{p^{(m)} > 0\} = \{p^{(m)} = 0\}^c$, using the conditional probability yields
$$\Pr[k_Q(p) = m \mid O_Q] = \Pr[\{p^{(m)} > 0\} \mid h_{m-1}] \Pr[h_{m-1}] = \left(1 - \Pr[\{p^{(m)} = 0\} \mid h_{m-1}]\right) \Pr[h_{m-1}] \qquad (18.2)$$
Since $h_m = h_{m-1} \cap \{p^{(m)} = 0\}$, the probability of the event $h_m$ can be decomposed as
$$\Pr[h_m] = \Pr[\{p^{(m)} = 0\} \mid h_{m-1}] \Pr[h_{m-1}] \qquad (18.3)$$
The assumption that all $p$ anycast group members are uniformly distributed
enables us to compute $\Pr[\{p^{(m)} = 0\} \mid h_{m-1}]$ exactly. Indeed, by the uniform
assumption, the probability equals the ratio of the number of favorable possibilities
over the total number of possibilities. The total number of ways to distribute $p$ items over
$Q - \sum_{n=0}^{m-1} X_Q^{(n)}$ positions (the latter constraint follows from the condition
$h_{m-1}$) equals $\binom{Q - \sum_{n=0}^{m-1} X_Q^{(n)}}{p}$. Likewise, the favorable number of ways to
distribute $p$ items over the remaining levels higher than $m$ leads to
$$\Pr[\{p^{(m)} = 0\} \mid h_{m-1}] = \frac{\binom{Q - \sum_{n=0}^{m} X_Q^{(n)}}{p}}{\binom{Q - \sum_{n=0}^{m-1} X_Q^{(n)}}{p}} \qquad (18.4)$$
The recursion (18.3) needs an initialization, given by
$$\Pr[h_0] = \Pr[p^{(0)} = 0] = 1 - \frac{p}{Q}$$
which follows from $\Pr[p^{(0)} = 0] = \binom{Q-1}{p} / \binom{Q}{p}$ and equals $\Pr[\{p^{(0)} = 0\} \mid h_{-1}]$
(although the event $h_{-1}$ is meaningless). Observe that $\Pr[p^{(0)} = 1] = \frac{p}{Q}$
holds for any tree such that
$$\Pr[k_Q(p) = 0] = \frac{p}{Q}$$
By iteration of (18.3), we obtain
$$\Pr[h_m] = \prod_{v=0}^{m} \frac{\binom{Q - \sum_{n=0}^{v} X_Q^{(n)}}{p}}{\binom{Q - \sum_{n=0}^{v-1} X_Q^{(n)}}{p}} = \frac{\binom{Q - \sum_{n=0}^{m} X_Q^{(n)}}{p}}{\binom{Q}{p}} \qquad (18.5)$$
where the convention in summation is that $\sum_{n=d}^{e} i_n = 0$ if $d > e$. Finally,
combining (18.2) with (18.4) and (18.5), we arrive at the general conditional
expression for the minimum hopcount to the anycast group,
$$\Pr[k_Q(p) = m \mid O_Q] = \frac{\binom{Q - \sum_{n=0}^{m-1} X_Q^{(n)}}{p} - \binom{Q - \sum_{n=0}^{m} X_Q^{(n)}}{p}}{\binom{Q}{p}} \qquad (18.6)$$
Clearly, while $\Pr[k_Q(0) = m \mid O_Q] = 0$ since there is no path, we have for
$p = 1$,
$$\Pr[k_Q(1) = m \mid O_Q] = \frac{X_Q^{(m)}}{Q}$$
It directly follows from (18.6) that
$$\Pr[k_Q(p) \le q \mid O_Q] = 1 - \frac{\binom{Q - \sum_{n=0}^{q} X_Q^{(n)}}{p}}{\binom{Q}{p}} \qquad (18.7)$$
If $Q - \sum_{n=0}^{q} X_Q^{(n)} < p$ or, equivalently, $\sum_{n=q+1}^{Q-1} X_Q^{(n)} < p$, then equation
(18.7) shows that $\Pr[k_Q(p) > q \mid O_Q] = 0$. The maximum possible hopcount
of a shortest path to an anycast group strongly depends on the specifics of the
shortest path tree or the level set $O_Q$. A general result is worth mentioning:
Theorem 18.2.1 For any graph, it holds that
$$\Pr[k_Q(p) > Q - p] = 0$$
In words, the longest shortest path to an anycast group with $p$ members
can never possess more than $Q - p$ hops.
Proof: This general theorem follows from the fact that the line topology
is the tree with the longest hopcount $Q - 1$ and only in the case that all
$p$ last positions (with respect to the source or root) are occupied by the $p$
anycast group members, is the maximum hopcount $Q - p$. □
For the URT, $\Pr[k_Q(p) = Q - p]$ is computed exactly in (18.12).
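The conditional distribution (18.6) is straightforward to evaluate once the level sizes are known. The following is a minimal sketch, not taken from the book: the function name `hopcount_pmf` and the two example level sets are hypothetical, and `levels[m]` plays the role of $X_Q^{(m)}$ with `levels[0] = 1` for the root. Exact rational arithmetic is used so that the special case $p = 1$ and the bound of Theorem 18.2.1 can be checked without rounding.

```python
from fractions import Fraction
from math import comb


def hopcount_pmf(levels, p):
    """Return [Pr[k_Q(p) = m | O_Q] for m = 0..len(levels)-1], following (18.6).

    levels[m] is the number of nodes at hop distance m from the root
    (levels[0] == 1 is the root itself); Q = sum(levels).
    """
    Q = sum(levels)
    if not 1 <= p <= Q:
        raise ValueError("need 1 <= p <= Q")
    pmf = []
    below = 0  # number of nodes in levels 0..m-1
    for x in levels:
        # placements avoiding levels < m, minus those that also avoid level m
        pmf.append(Fraction(comb(Q - below, p) - comb(Q - below - x, p), comb(Q, p)))
        below += x
    return pmf


# Hypothetical example tree: root, 2 nodes at hop 1, 3 nodes at hop 2 (Q = 6).
levels = [1, 2, 3]
print(hopcount_pmf(levels, 1))  # for p = 1 each entry equals levels[m] / Q
print(hopcount_pmf(levels, 2))

# Line topology with Q = 6 nodes: probability mass stops at hop Q - p.
print(hopcount_pmf([1] * 6, 2))
```

For the line topology the last non-zero entry sits at hop $Q - p$, in agreement with Theorem 18.2.1.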
Corollary 18.2.2 For any graph, it holds that
$$\Pr[k_Q(Q - 1) = 1] = \frac{1}{Q}$$
Proof: This corollary follows from Theorem 18.2.1 and the law of total
probability. Alternatively, if there are $Q - 1$ anycast members in a network
with $Q$ nodes, the shortest path can only consist of one hop if none of the
anycast members coincides with the root node. This probability is precisely
$\frac{1}{Q}$. □
Using the tail probability formula (2.36) for the average, it follows from
(18.7) that
$$E[k_Q(p) \mid O_Q] = \frac{1}{\binom{Q}{p}} \sum_{q=0}^{Q-2} \binom{Q - \sum_{n=0}^{q} X_Q^{(n)}}{p} \qquad (18.8)$$
from which we find,
$$E[k_Q(1) \mid O_Q] = \frac{1}{Q} \sum_{n=1}^{Q-1} n X_Q^{(n)}$$
Thus, given $O_Q$, a performance measure $\eta$ for anycast over unicast can be
quantified as
$$\eta = \frac{E[k_Q(p) \mid O_Q]}{E[k_Q(1) \mid O_Q]} \le 1$$
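As a hedged illustration of (18.8), the sketch below evaluates the conditional mean hopcount and the resulting ratio $\eta$ for a given list of level sizes. The helper names `mean_hopcount` and `eta` and the example level set are assumptions made for this sketch only.

```python
from fractions import Fraction
from math import comb


def mean_hopcount(levels, p):
    """E[k_Q(p) | O_Q] via the tail-sum formula (18.8).

    levels[m] is the number of nodes at hop distance m (levels[0] == 1).
    """
    Q = sum(levels)
    total = 0
    covered = 0  # nodes in levels 0..q
    for x in levels:
        covered += x
        # numerator of Pr[k_Q(p) > q | O_Q]; zero once all levels are covered
        total += comb(Q - covered, p)
    return Fraction(total, comb(Q, p))


def eta(levels, p):
    """Ratio E[k_Q(p) | O_Q] / E[k_Q(1) | O_Q] for a tree with more than one level."""
    return mean_hopcount(levels, p) / mean_hopcount(levels, 1)


levels = [1, 2, 3]  # hypothetical level set with Q = 6
for p in range(1, sum(levels) + 1):
    print(p, mean_hopcount(levels, p), eta(levels, p))
```

For $p = 1$ the first function reduces to $\frac{1}{Q}\sum_n n X_Q^{(n)}$, and for $p = Q$ the mean is zero because the root itself is then a member.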
Using the law of total probability, the distribution of the minimum hopcount to the anycast group is
$$\Pr[k_Q(p) = m] = \sum_{\text{all } O_Q} \Pr[k_Q(p) = m \mid O_Q] \Pr[O_Q] \qquad (18.9)$$
or explicitly,
$$\Pr[k_Q(p) = m] = \sum_{\sum_{n=1}^{Q-1} x_n = Q-1} \frac{\binom{\sum_{n=m}^{Q-1} x_n}{p} - \binom{\sum_{n=m+1}^{Q-1} x_n}{p}}{\binom{Q}{p}} \Pr\left[X_Q^{(1)} = x_1, \ldots, X_Q^{(Q-1)} = x_{Q-1}\right]$$
where the integers $x_n \ge 0$ for all $n$. This expression explicitly shows the
importance of the level structure $O_Q$ of the shortest path tree $W$. The level
set $O_Q$ entirely determines the shape of the tree $W$. Unfortunately, a general
form for $\Pr[O_Q]$ or $\Pr[k_Q(p) = m]$ is difficult to obtain. In principle, via
extensive trace-route measurements from several roots, the shortest path
tree and $\Pr[O_Q]$ can be constructed such that a (rough) estimate of the
level set $O_Q$ in the Internet can be obtained.
18.3 The n-ary tree
For regular trees, explicit expressions are possible because the summation
in (18.9) simplifies considerably. For example, for the $n$-ary tree defined in
Section 17.3,
$$X_Q^{(m)} = n^m$$
Provided the set $O_Q$ only contains these values of $X_Q^{(m)}$ for each $m$, we have
that $\Pr[O_Q] = 1$, else it is zero (because then $O_Q$ is not consistent with an
$n$-ary tree). Summarizing, for the $n$-ary tree with $Q = \frac{n^{G+1} - 1}{n - 1}$ and $G$ levels,
the distribution of the minimum hopcount to the anycast group is
$$\Pr[k_Q(p) = m] = \frac{\binom{Q - \frac{n^m - 1}{n - 1}}{p} - \binom{Q - \frac{n^{m+1} - 1}{n - 1}}{p}}{\binom{Q}{p}} \qquad (18.10)$$
Extension of the integer $n$ to real numbers in the formula (18.10) is expected to be of value as suggested in Section 17.3. When an $n$-ary tree was
used to fit corresponding Internet multicast measurements (Van Mieghem
et al., 2001a), a remarkably accurate agreement was found for the value
$n \approx 3.2$, which is about the average degree of the Internet graph. Hence,
if we were to use the $n$-ary tree as model for the hopcount to an anycast
group, we expect that $n \approx 3.2$ is the best value for Internet shortest path
trees. However, we feel we ought to mention that the hopcount distribution of the shortest path between two arbitrary nodes is definitely not that of an
$n$-ary tree, because in an $n$-ary tree $\Pr[k_Q(1) = m]$ increases with the hopcount $m$, which is
in conflict with Internet trace-route measurements (see, for example, the
bell-shape curve in Fig. 16.4).
Fig. 18.1. The distribution function of $k_{500}(p)$ versus the hops $m$ for various sizes
of the anycast group ($p$ = 1, 2, 5, 10 and 50) in an $n$-ary tree with $n = 3$ and $Q = 500$.
Figure 18.1 displays $\Pr[k_Q(p) \le m]$ for an $n$-ary tree with outdegree $n = 3$ and
$Q = 500$. This type of plot allows us to solve the “server placement problem”. For example, assuming that the $n$-ary tree is a good model and the
network consists of $Q = 500$ nodes, Fig. 18.1 shows that at least $p = 10$
servers are needed to assure that any user is not more than four hops separated from a server with a probability of 93%. More precisely, the equation
$\Pr[k_{500}(p) > 4] < 0.07$ is obeyed if $p \ge 10$.
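The server placement computation can be sketched numerically from (18.7) with the level sizes $n^m$ of a complete $n$-ary tree. This is an illustrative sketch only: the helper names are hypothetical, and a complete tree with $n = 3$ and depth 5 has $Q = 364$ nodes, so the numbers it produces are not those of Fig. 18.1, which uses $Q = 500$.

```python
from fractions import Fraction
from math import comb


def nary_levels(n, depth):
    """Level sizes n**m for m = 0..depth of a complete n-ary tree."""
    return [n ** m for m in range(depth + 1)]


def tail_prob(levels, p, m):
    """Pr[k_Q(p) > m]: all p servers lie deeper than level m, cf. (18.7)."""
    Q = sum(levels)
    return Fraction(comb(Q - sum(levels[: m + 1]), p), comb(Q, p))


def min_servers(levels, m, eps):
    """Smallest p such that Pr[k_Q(p) > m] < eps, or None if eps <= 0."""
    Q = sum(levels)
    for p in range(1, Q + 1):
        if tail_prob(levels, p, m) < eps:
            return p
    return None


levels = nary_levels(3, 5)          # assumed example: Q = 364
print(sum(levels))
print(min_servers(levels, 4, Fraction(7, 100)))
for p in (1, 2, 5, 10):
    print(p, float(tail_prob(levels, p, 4)))
```

Scanning $p$ upwards is sufficient here because the tail probability decreases as servers are added.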
Figure 18.2 gives an idea how the performance measure $\eta$ decreases with
the size of the anycast group in $n$-ary trees (all with outdegree $n = 3$), but
with different size $Q$. For values of $p$ up to around 20% of $Q$, we observe
that $\eta$ decreases logarithmically in $p$.
18.4 The uniform recursive tree (URT)
Chapter 16 motivates the interest in the URT. The URT is believed to
provide a reasonable, first order estimate for the hopcount problem to an
anycast group in the Internet.
18.4.1 Recursion for $\Pr[k_Q(p) = m]$
A combinatorial approach such as (18.9) is seldom successful for
URTs, while structural properties often lead to results. The basic Theorem 16.2.1 of the URT, applied to the anycast minimum hop problem, is
illustrated in Fig. 18.3.
Fig. 18.2. The performance measure $\eta$ for several sizes of $n$-ary trees (with $n = 3$)
as a function of the ratio of anycast nodes over the total number of nodes. Curves are shown for $Q = 100$, $500$, $5000$, $10^5$ and $10^6$; the horizontal axis $p/Q$ is logarithmic.
Fig. 18.3. A uniform recursive tree consisting of two subtrees $W_1$ and $W_2$ with $n$ and
$Q - n$ nodes respectively. The first cluster contains $l$ anycast members while the
cluster with $Q - n$ nodes contains $p - l$ anycast members.
Figure 18.3 shows that any URT can be separated into two subtrees $W_1$ and
$W_2$ with $n$ and $Q - n$ nodes respectively. Moreover, Theorem 16.2.1 states
that each subtree is independent of the other and again a URT. Consider
now a specific separation of a URT $W$ into $W_1 = w_1$ and $W_2 = w_2$, where the tree
$w_1$ contains $n$ nodes and $l$ of the $p$ anycast members and $w_2$ possesses $Q - n$
nodes and the remaining $p - l$ anycast members. The event $\{k_W(p) = m\}$
equals the union over all possible sizes $Q_1 = n$ and subgroups $p_1 = l$ of the
event $\{k_{w_1}(l) = m - 1\} \cap \{k_{w_2}(p - l) \ge m\}$ and the event $\{k_{w_1}(l) > m - 1\} \cap
\{k_{w_2}(p - l) = m\}$,
$$\{k_W(p) = m\} = \bigcup_n \bigcup_l \Big( \{k_{w_1}(l) = m - 1\} \cap \{k_{w_2}(p - l) \ge m\} \Big) \cup \Big( \{k_{w_1}(l) > m - 1\} \cap \{k_{w_2}(p - l) = m\} \Big)$$
Because $k_Q(0)$ is meaningless, the relation must be modified for the case
$l = 0$ to
$$\{k_W(p) = m\} = \{k_{w_2}(p) = m\}$$
and for the case $l = p$ to
$$\{k_W(p) = m\} = \{k_{w_1}(p) = m - 1\}$$
This decomposition holds for any URT $W_1$ and $W_2$, not only for the specific
ones $w_1$ and $w_2$. The transition towards probabilities becomes
$$\Pr[k_W(p) = m] = \sum_{\text{all } w_1, w_2, n, l} \Big( \Pr[k_{w_1}(l) = m - 1] \Pr[k_{w_2}(p - l) \ge m] + \Pr[k_{w_1}(l) > m - 1] \Pr[k_{w_2}(p - l) = m] \Big) \times \Pr[W_1 = w_1, W_2 = w_2, Q_1 = n, p_1 = l]$$
Since $W_1$ and $W_2$ and also $p_1$ are independent given $Q_1$, the last probability
$o$ simplifies to
$$o = \Pr[W_1 = w_1, W_2 = w_2, Q_1 = n, p_1 = l] = \Pr[W_1 = w_1 \mid Q_1 = n] \Pr[W_2 = w_2 \mid Q_1 = n] \Pr[p_1 = l \mid Q_1 = n] \Pr[Q_1 = n]$$
Theorem 16.2.1 states that $Q_1$ is uniformly distributed over the set with
$Q - 1$ nodes such that $\Pr[Q_1 = n] = \frac{1}{Q - 1}$. The fact that $l$ out of the $p$
anycast members, uniformly chosen out of $Q$ nodes, belong to the recursive
subtree $W_1$ implies that $p - l$ remaining anycast members belong to $W_2$.
Hence, analogous to a combinatorial problem outlined by Feller (1970, p. 43)
that leads to the hypergeometric distribution, we have
$$\Pr[p_1 = l \mid Q_1 = n] = \frac{\binom{n}{l} \binom{Q - n}{p - l}}{\binom{Q}{p}}$$
because all favorable combinations are those $\binom{n}{l}$ to distribute $l$ anycast members in $W_1$ with $n$ nodes multiplied by all favorable $\binom{Q - n}{p - l}$ to distribute the
remaining $p - l$ in $W_2$ containing $Q - n$ nodes. The total number of ways to distribute
$p$ anycast members over $Q$ nodes is $\binom{Q}{p}$. Finally, we remark that the hopcount of the shortest path to $p$ anycast members in a URT only depends on
its size. This means that the sum over all $w_1$ of $\Pr[W_1 = w_1 \mid Q_1 = n]$, which
equals 1, disappears and likewise also the sum over all $w_2$. Combining the
above leads to
$$\Pr[k_Q(p) = m] = \sum_{n=1}^{Q-1} \sum_{l=1}^{p-1} \Big( \Pr[k_n(l) = m - 1] \Pr[k_{Q-n}(p - l) \ge m] + \Pr[k_n(l) > m - 1] \Pr[k_{Q-n}(p - l) = m] \Big) \frac{\binom{n}{l} \binom{Q-n}{p-l}}{(Q - 1) \binom{Q}{p}} + \sum_{n=1}^{Q-1} \frac{\binom{Q-n}{p} \Pr[k_{Q-n}(p) = m] + \binom{n}{p} \Pr[k_n(p) = m - 1]}{(Q - 1) \binom{Q}{p}}$$
By substitution of $n' = Q - n$ and $p' = p - l$, we obtain the recursion,
$$\Pr[k_Q(p) = m] = \sum_{n=1}^{Q-1} \sum_{l=1}^{p-1} \frac{\binom{n}{l} \binom{Q-n}{p-l}}{(Q - 1) \binom{Q}{p}} \Big( \Pr[k_n(l) = m - 1] + \Pr[k_n(l) = m] \Big) \sum_{t=m}^{Q-n-1} \Pr[k_{Q-n}(p - l) = t] + \sum_{n=1}^{Q-1} \frac{\binom{n}{p} \Big( \Pr[k_n(p) = m] + \Pr[k_n(p) = m - 1] \Big)}{(Q - 1) \binom{Q}{p}} \qquad (18.11)$$
This recursion (18.11) is solved numerically for $Q = 20$. The result is shown
in Fig. 18.4, which demonstrates that $\Pr[k_Q(p) > Q - p] = 0$ or that the
path with the longest hopcount to an anycast group of $p$ members consists
of $Q - p$ links.
Since there are $(Q - 1)!$ possible recursive trees (Theorem 16.2.2) and
there is only one line tree with $Q - 1$ hops where each node has precisely
one child node, the probability to have precisely $Q - 1$ hops from the root is
$\frac{1}{(Q-1)!}$ (which also is $\Pr[k_Q = Q - 1]$ given in (16.8)). The longest possible
hopcount from a root to $p$ anycast members occurs in the line tree where
all $p$ anycast members occupy the last $p$ positions. Hence, the probability
for the longest possible hopcount equals
$$\Pr[k_Q(p) = Q - p] = \frac{p!}{(Q - 1)! \binom{Q}{p}} \qquad (18.12)$$
because there are $p!$ possible ways to distribute the $p$ anycast members
at the $p$ last positions in the line tree while there are $\binom{Q}{p}$ possibilities to
distribute $p$ anycast members at arbitrary places in the line tree.
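The URT decomposition can be evaluated numerically for small trees. The sketch below is one possible implementation, written for illustration and not the author's program: it uses the form of the recursion before the index substitution (subtree sizes $n$ and $Q - n$, with the cases $l = 0$ and $l = p$ handled separately), exact fractions, and memoization. The function names `pmf` and `tail` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def pmf(Q, p, m):
    """Pr[k_Q(p) = m] in a URT with Q nodes and p uniformly placed members."""
    if not 1 <= p <= Q or m < 0:
        return Fraction(0)
    if Q == 1:
        return Fraction(1) if m == 0 else Fraction(0)  # the root is the member
    total = Fraction(0)
    for n in range(1, Q):          # size of the subtree hanging below the root
        rest = Q - n               # size of the subtree that keeps the root
        for l in range(1, p):      # members that fall in the first subtree
            w = comb(n, l) * comb(rest, p - l)
            if w:
                total += w * (pmf(n, l, m - 1) * tail(rest, p - l, m)
                              + tail(n, l, m) * pmf(rest, p - l, m))
        total += comb(rest, p) * pmf(rest, p, m)     # case l = 0
        total += comb(n, p) * pmf(n, p, m - 1)       # case l = p
    return total / ((Q - 1) * comb(Q, p))


@lru_cache(maxsize=None)
def tail(Q, p, m):
    """Pr[k_Q(p) >= m]."""
    return sum((pmf(Q, p, t) for t in range(max(m, 0), Q)), Fraction(0))


Q = 5  # small assumed example; Q = 20 works too but takes longer
for p in range(1, Q + 1):
    print(p, [str(pmf(Q, p, m)) for m in range(Q)])
```

Exact fractions keep the very small tail probabilities free of rounding error, at the cost of speed for larger $Q$.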
Fig. 18.4. The pdf of $k_Q(p)$ in a URT with $Q = 20$ nodes for all possible $p$ (probability on a logarithmic axis from $10^{-19}$ to $10^{-1}$, hops from 0 to 20).
Observe that $\Pr[k_Q(p) > Q - p] = 0$. This relation connects the various curves
to the value for $p$.
Figure 18.4 allows us to solve the “server placement problem”. For example, consider the scenario in which a network operator announces that
any user request will reach a server of the anycast group in no more than
$m = 4$ hops in 99.9% of the cases. Assuming his network has $Q = 20$ routers
and the shortest path tree is a URT, the network operator has to compute
the number of anycast servers $p$ he has to place uniformly spread over the
$Q = 20$ routers by solving $\Pr[k_{20}(p) > 4] < 10^{-3}$. Figure 18.4 shows that
the intersection of the line $m = 4$ and the line $\Pr[k_{20}(p) = 4] = 10^{-3}$ is the
curve for $p = 7$. Since the curves for $p \ge 7$ are exponentially decreasing,
$\Pr[k_{20}(p) > 4]$ is safely approximated by $\Pr[k_{20}(p) = 4]$, which leads to
the placing of $p = 7$ servers. (Footnote 1: More precisely, since $\Pr[k_{20}(4) > 4] = 0.00106$ and $\Pr[k_{20}(5) > 4] = 0.00032$, only $p = 5$
servers are sufficient.) When following the line $m = 4$, we also observe
that the curves for $p = 5, 6, 7, 8$ lie near to that of $p = 7$. This means that
placing a server more does not considerably change the situation. It is a
manifestation of the law $\eta \approx 1 - d \log p$, which tells us that by placing $p$
servers the gain measured in hops with respect to the single server case is
slowly, more precisely logarithmically, increasing. The performance measure
$\eta$ for the URT is drawn for several sizes $Q$ in Fig. 18.5.
18.4.2 Analysis of the recursion relation
The product of two probabilities in the double sum in (18.11) seriously complicates a possible analytic treatment. A relation for a generating function of
Pr [kQ (p) = m] and other mathematical results are derived in Van Mieghem
(2004b). Here, we summarize the main results.
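(a) Let us check $\Pr[k_Q(p) = 0] = \frac{p}{Q}$. Using $\Pr[k_n(l) > -1] = 1$, the
convention that $\Pr[k_n(l) = -1] = 0$ and $\Pr[k_{Q-n}(p - l) = 0] = \frac{p - l}{Q - n}$, the
right hand side of (18.11), denoted by $u$, simplifies to
$$u = \frac{1}{(Q - 1) \binom{Q}{p}} \sum_{n=1}^{Q-1} \sum_{l=0}^{p} \frac{p - l}{Q - n} \binom{n}{l} \binom{Q - n}{p - l} = \frac{1}{(Q - 1) \binom{Q}{p}} \sum_{n=1}^{Q-1} \sum_{l=0}^{p-1} \binom{n}{l} \binom{Q - 1 - n}{p - 1 - l} = \frac{1}{(Q - 1) \binom{Q}{p}} \sum_{n=1}^{Q-1} \binom{Q - 1}{p - 1} = \frac{p}{Q}$$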
(b) Observe that $\Pr[k_Q(Q) = m] = 0$ for $m > 0$.
(c) For $p = 1$,
$$\Pr[k_Q = m] = \frac{1}{Q - 1} \sum_{n=1}^{Q-1} \frac{n}{Q} \Big( \Pr[k_n = m] + \Pr[k_n = m - 1] \Big)$$
Multiplying both sides by $z^m$ and summing over all $m$ leads to the recursion for
the generating function (16.6)
$$(Q + 1) \varphi_{Q+1}(z) = (z + Q) \varphi_Q(z)$$
(d) The case p = 2 is solved in Van Mieghem (2004b, Appendix) as
µ
¶
m
2(1)Q313m (m+1)
2(1)Q3m X
(n+m+1)
n 2n1
Pr [kQ (2) = m] =
+
(1)
VQ
VQ
Q!
Q !(Q 1)
n
n=1
¶ µ
¶¸
m31 µ
2n + 1
2(1)Q3m X m + n + 1
n+m+1
(1)n VQ
+
Q !(Q 1)
m
n
n=0
(18.13)
In van der Hofstad et al. (2002b) we have demonstrated that the covariance
between the number of nodes at level $u$ and $m$ for $u \le m$ in the URT is
$$E\left[X_Q^{(u)} X_Q^{(m)}\right] = \frac{(-1)^{Q-1}}{(Q - 1)!} \sum_{n=0}^{u} (-1)^{n+m} S_Q^{(n+m+1)} \binom{2n + m - u}{n}$$
For m u = 1, the last term in (18.13) is recognized as
¡2n31¢ 1 ¡2n¢
= 2 n , the first sum in (18.13) is
n
l
k
(m31) (m)
H [Q
[Q
(Q2 )
. Since
³
´ ¸
(m) 2
µ
¶
H [Q
m
(m+1)
2(1)Q3m X
2(1)Q3m31 VQ
(n+m+1)
n 2n1
(1)
=
VQ
¡ ¢
Q !(Q 1)
(Q 1)
Q!
n
2 Q2
n=1
Q 313m
With 2(31)Q!
(m+1)
VQ
= 2 Pr [kQ = m], we obtain
³
´ ¸
i
h
(m) 2
(m31) (m)
H [Q
H [Q [Q
2Q
Pr [kQ = m] +
Pr [kQ (2) = m] =
¡Q ¢
¡ ¢
Q 1
2 Q2
2
¶
m µ
2(1)Q31 X m + n
n+m
+
(1)n+m VQ
Q !(Q 1)
n
n=1
It would be of interest to find an interpretation for the last sum.
Without proof2 , we mention the following exact results:
Q µ ¶
X
Q
p=1
p
Pr [kQ (p) = Q 2] =
Q1
Xµ
p=1
¶
Q 1 Pr [kQ 1 (p) = Q 3]
1
+
Q 1
(Q 2)!
p
For p Q 3, it holds that
p!
Pr [kQ (p) = Q p 1] =
¡ ¢
(Q 1)! Q
p
2
"µ ¶
#
p
X
Q
p+1
+ (p 1)(p@2 + 1) +
2
n
By substitution into the recursion (18.11), one may verify these relations.
n=2
18.5 Approximate analysis
Since the general solution (18.9) is in many cases difficult to compute as
shown for the URT in Section 18.4, we consider a simplified version of the
above problem where each node in the tree has equal probability $s = \frac{p}{Q}$
to be a server. Instead of having precisely $p$ servers, the simplified version
considers on average $p$ servers and the probability that there are precisely
$p$ servers is $\binom{Q}{p} s^p (1 - s)^{Q - p}$. In the simplified version, the associated
equations to (18.4) and (18.3) are
$$\Pr[\{p^{(m)} = 0\} \mid h_{m-1}] = \Pr[\{p^{(m)} = 0\}] = (1 - s)^{X_Q^{(m)}}$$
$$\Pr[h_m] = \prod_{o=0}^{m} \Pr[\{p^{(o)} = 0\}] = (1 - s)^{\sum_{o=0}^{m} X_Q^{(o)}}$$
which implies that the probability that there are no servers in the tree is
$(1 - s)^Q$. Since in that case the hopcount is meaningless, we consider
the conditional probability (18.2) of the hopcount given that the level set
contains at least one server (which is denoted by $\tilde{k}_Q(p)$),
$$\Pr[\tilde{k}_Q(p) = m \mid O_Q] = \frac{\left(1 - (1 - s)^{X_Q^{(m)}}\right) (1 - s)^{\sum_{o=0}^{m-1} X_Q^{(o)}}}{1 - (1 - s)^Q}$$
Thus,
$$\Pr[\tilde{k}_Q(p) \le q \mid O_Q] = \frac{1 - (1 - s)^{\sum_{o=0}^{q} X_Q^{(o)}}}{1 - (1 - s)^Q}$$
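The simplified model is cheap to evaluate for a known level set. The following sketch, with the hypothetical name `approx_pmf` and an assumed example level set, computes the conditional distribution of $\tilde{k}_Q(p)$ when each node is independently a server with probability $s = p/Q$.

```python
def approx_pmf(levels, p):
    """Pr[k~_Q(p) = m | O_Q] when every node is a server with probability s = p/Q.

    levels[m] is the number of nodes at hop distance m (levels[0] == 1).
    The result is conditioned on at least one server being present.
    """
    Q = sum(levels)
    s = p / Q
    norm = 1.0 - (1.0 - s) ** Q          # Pr[at least one server in the tree]
    pmf = []
    below = 0                            # nodes in levels 0..m-1
    for x in levels:
        # no server in levels < m, at least one server in level m
        pmf.append((1.0 - (1.0 - s) ** x) * (1.0 - s) ** below / norm)
        below += x
    return pmf


levels = [1, 3, 9, 27]  # assumed example: complete 3-ary tree of depth 3
for p in (1, 2, 5):
    dist = approx_pmf(levels, p)
    print(p, [round(v, 4) for v in dist], round(sum(dist), 12))
```

The terms telescope, so the probabilities sum to one for any level set. For $p = 1$ the value at $m = 0$ exceeds the exact $\frac{p}{Q}$, which is one way to see why the approximation is weakest there.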
Finally, to avoid the knowledge of the entire level set $O_Q$, we use $E[X_Q^{(o)}] =
Q \Pr[k_Q(1) = o]$ from (16.7) as the best estimate for each $X_Q^{(o)}$ and obtain
the approximate formula
$$\Pr[\tilde{k}_Q(p) = m] = \frac{\left(1 - (1 - s)^{E[X_Q^{(m)}]}\right) (1 - s)^{\sum_{o=0}^{m-1} E[X_Q^{(o)}]}}{1 - (1 - s)^Q} \qquad (18.14)$$
In the dotted lines in Fig. 18.5, we have added the approximate result for
the URT where $E[k_Q(p)]$ is computed based on (18.14), but where $E[k_Q(1)]$
is computed exactly. For $p = 1$, the approximate analysis (18.14) is not
well suited: Fig. 18.5 illustrates this deviation in the fact that $\eta_{\text{appr}}(1) =
E[\tilde{k}_Q(1)] / E[k_Q(1)] < 1$. For higher values of $p$ we observe a fairly good
correspondence. We found that the probability (18.14) reasonably approximates the exact result plotted on a linear scale. Only the tail behavior (on
log-scale) and the case for $p = 1$ deviate significantly. In summary for the
URT, the approximation (18.14) for $\Pr[k_Q(p) = m]$ is much faster to compute than the exact recursion and it seems appropriate for the computation
of $\eta$ for $p > 1$. However, it is less adequate to solve the server placement
problem that requires the tail values $\Pr[k_Q(p) > m]$.
Fig. 18.5. The performance measure $\eta$ for several sizes $Q$ of URTs as a function of
the ratio $p/Q$. The plot legend lists logarithmic fits of $\eta$ in $\ln(p/Q)$ with coefficient magnitudes 0.404 ($Q = 10$), 0.295 ($Q = 20$), 0.252 ($Q = 30$) and 0.210 ($Q = 50$); the horizontal axis $p/Q$ is logarithmic and the vertical axis $\eta$ runs from 0 to 1.
18.6 The performance measure in exponentially growing trees
In this section, we investigate the observed law $\eta \approx 1 - d \log p$ for a much
larger class of trees, namely the class of exponentially growing trees to which
both the $n$-ary tree and the URT belong. Also most trees in the Internet
are exponentially growing trees. A tree is said to grow exponentially in the
number of nodes $Q$ with degree $\kappa$ if $\lim_{m \to \infty} \left(X_Q^{(m)}\right)^{1/m} = \kappa$ or, equivalently,
$X_Q^{(m)} \sim \kappa^m$, for large $m$. The fundamental problem with this definition is that
it only holds for infinite graphs $Q = \infty$. For real (finite) graphs, there must
exist some level $m = o$ for which the sequence $X_Q^{(o+1)}, X_Q^{(o+2)}, \ldots, X_Q^{(Q-1)}$
ceases to grow because $\sum_{m=0}^{Q-1} X_Q^{(m)} = Q < \infty$. This boundary effect complicates the definition of exponential growth in finite graphs. The second
complication is that even in the finite set $X_Q^{(0)}, X_Q^{(1)}, \ldots, X_Q^{(o)}$ not necessarily
all $X_Q^{(m)}$ with $0 \le m \le o$ need to obey $X_Q^{(m)} \sim \kappa^m$, but “enough” should.
Without the limit concept, we cannot specify the precise conditions of exponential growth in a finite shortest path tree. If we assume in finite graphs
that $X_Q^{(m)} \sim \kappa^m$ for $m \le o$, then $\sum_{m=0}^{o} X_Q^{(m)} = \alpha Q$ with $0 < \alpha < 1$. Indeed,
for $\kappa > 1$, the highest hopcount level $o$ possesses by far the most nodes since
$\kappa^o \sim \frac{\kappa^{o+1} - 1}{\kappa - 1}$, which cannot be larger than a fraction $\alpha Q$ of the total number
of nodes.
We now present an order calculus to estimate $\eta$ for exponentially growing
trees based on relation (18.8). Let us denote
$$y = \frac{\binom{Q - x}{p}}{\binom{Q}{p}} = \prod_{m=0}^{p-1} \left(1 - \frac{x}{Q - m}\right)$$
For large $Q$ and fixed $p$,
$$y = \exp\left(-\frac{xp}{Q}\right) (1 + o(1))$$
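A quick numerical comparison makes the order estimate concrete. The sketch below evaluates the binomial ratio, its product form and the exponential estimate for one assumed parameter set in which $x$ is small compared with $Q$; the function names and the chosen values are illustrative only.

```python
from math import comb, exp, prod


def exact_ratio(Q, x, p):
    """y = C(Q - x, p) / C(Q, p)."""
    return comb(Q - x, p) / comb(Q, p)


def product_form(Q, x, p):
    """Same quantity written as the product over m = 0..p-1 of (1 - x/(Q - m))."""
    return prod(1 - x / (Q - m) for m in range(p))


def exponential_estimate(Q, x, p):
    """Leading-order estimate exp(-x p / Q)."""
    return exp(-x * p / Q)


Q, x, p = 10**6, 1000, 5   # assumed values with x much smaller than Q
print(exact_ratio(Q, x, p))
print(product_form(Q, x, p))
print(exponential_estimate(Q, x, p))
```

The first two values agree up to floating-point rounding, while the exponential differs by a small relative error that shrinks as $x/Q$ decreases.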
(m)
In the case where the tree is exponentially growing for m o as [Q = m m
with m some slowly varying sequence, only very few levels o (bounded by
P
(n)
a fixed number) around o obey qn=0 [Q = R(Q ) where q 5 [o o> o],
Pq
(n)
while for all m A o, we have n=0 [Q = q Q with some sequence q ?
P
(n)
q+1 ? max q = 1. Applied to (18.8) where { = qn=0 [Q ? Q ,
¡
¢
Q32
³ p
´
X (13q )Q
q
p
exp q +
H [kQ (p)|OQ ] (1 + r (1))
¡Q ¢
Q
p
q=0
o
X
q=o+1
If there are only a few levels more than o, the last series is much smaller than
1 and can be omitted. Since the slowly varying sequence q is unknown, we
approximate q = and
Z p o 3x
³ p
³ p
´ Z o
´
Q
h
1
q
q
exp q exp gq =
gx
p
Q
Q
log x
0
q=0
Q
Z " 3x
h3p
h
1
gx log p x
p log µQ
³ p ´¶
h3p
p
1
log + R
=
log p
Q
Q
o
X
where in the last step a series (Abramowitz and Stegun, 1968, Section 5.1.11)
for the exponential integral is used. Thus,
µ
¶
3p
¡p¢
33 h p 3log p3log +R Q
1+
log Q
³
(1 + r (1))
¡ 1 ¢´
31
+
R
1 +hlog+log
Q
Q
Ã
µ
¶!
3p
h
1
log p h31 p
= (1 + r (1)) 1 +R
log Q
log Q
log2 Q
Since by definition = 1 for p = 1, we finally arrive at
¶
µ
3p
log p h31 h p
1
1
+R
log Q
log Q
log2 Q
which supplies evidence for the conjecture $\eta \approx 1 - d \log p$ that exponentially growing graphs possess a performance measure $\eta$ that logarithmically
decreases in $p$, which is rather slow.
Measurement data in the Internet seem to support this $\log p$-scaling law.
Apart from the correspondence with figures in the work of Jamin et al.
(2001), Fig. 6 in Krishnan et al. (2000) shows that the relative measured
traffic flow reduction decreases logarithmically in the number of caches $p$.
Appendix A
Stochastic matrices
This appendix reviews the matrix theory for Markov chains. In-depth analyses are found in classical books by Gantmacher (1959a,b), Wilkinson (1965)
and Meyer (2000).
A.1 Eigenvalues and eigenvectors
1. The algebraic eigenproblem consists in the determination of the eigenvalues $\lambda$ and the corresponding eigenvectors $x$ of a matrix $D$ for which the
set of $q$ homogeneous linear equations in $q$ unknowns
$$Dx = \lambda x \qquad (A.1)$$
has a non-zero solution. Clearly, the zero vector $x = 0$ is always a solution of
(A.1). A non-zero solution of (A.1) is possible if and only if the matrix
$D - \lambda L$, with $L$ the identity matrix, is singular, that is
$$\det(D - \lambda L) = 0 \qquad (A.2)$$
This determinant can be expanded in a polynomial in $\lambda$ of degree $q$,
$$f(\lambda) = (-1)^q \lambda^q + f_{q-1} \lambda^{q-1} + \cdots + f_1 \lambda + f_0 = 0 \qquad (A.3)$$
which is called the characteristic (eigenvalue) polynomial of the matrix $D$
and where the coefficients
$$f_n = (-1)^n \sum_{\text{all}} P_{q-n} \qquad (A.4)$$
and $P_n$ is a principal minor (footnote 1). Since a polynomial of degree $q$ has $q$ complex
zeros, the matrix $D$ possesses $q$ eigenvalues $\lambda_n$, not all necessarily distinct.
Footnote 1: A principal minor $P_n$ is the determinant of a principal $n \times n$ submatrix $P_{n \times n}$ obtained by
deleting the same $q - n$ rows and columns in $D$. Hence, the main diagonal elements $(P_{n \times n})_{ll}$
are $n$ elements of the main diagonal elements $\{d_{ll}\}_{1 \le l \le q}$ of $D$.
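In general, the characteristic polynomial can be written as
$$f(\lambda) = \prod_{n=1}^{q} (\lambda_n - \lambda) \qquad (A.5)$$
Since $f(\lambda) = \det(D - \lambda L)$, it follows from (A.3) and (A.5) that for $\lambda = 0$
$$\det D = f_0 = \prod_{n=1}^{q} \lambda_n \qquad (A.6)$$
Hence, if $\det D = 0$, there is at least one zero eigenvalue. Also,
$$(-1)^{q-1} f_{q-1} = \sum_{n=1}^{q} \lambda_n = \text{trace}(D) \qquad (A.7)$$
If all $\lambda_n \ge 0$, we can apply the general theorem of the arithmetic and
geometric mean (5.2) to (A.6) and (A.7) with $t_n = \frac{\alpha_n}{\sum_{m=1}^{q} \alpha_m}$,
$$\prod_{n=1}^{q} \lambda_n^{\alpha_n} \le \left( \frac{\sum_{n=1}^{q} \alpha_n \lambda_n}{\sum_{m=1}^{q} \alpha_m} \right)^{\sum_{m=1}^{q} \alpha_m}$$
and by choosing $\alpha_m = 1$, we find the inequality
$$\det D \le \left( \frac{\text{trace}(D)}{q} \right)^q$$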
To any eigenvalue $\lambda$, the set (A.1) has at least one non-zero eigenvector $x$.
Furthermore, if $x$ is a non-zero eigenvector, also $nx$, for any non-zero scalar $n$, is an eigenvector.
Therefore, eigenvectors are often normalized, for instance, a probabilistic
eigenvector has the sum of its components equal to 1 or a norm $\|x\|_1 = 1$
as defined in (A.23). If the rank of $D - \lambda L$ is less than $q - 1$, there will
be more than one independent eigenvector. Just these cases seriously complicate
the eigenvalue problem. In the sequel, we omit the discussion on multiple
eigenvalues and refer to Wilkinson (1965).
2. The eigenproblem of the transpose $D^T$,
$$D^T y = \lambda y \qquad (A.8)$$
is of singular importance. Since the determinant of a matrix is equal to the
determinant of its transpose, $\det(D^T - \lambda L) = \det(D - \lambda L)$, which shows
that the eigenvalues of $D$ and $D^T$ are the same. However, the eigenvectors
are, in general, different. Alternatively, we can write (A.8) as
$$y^T D = \lambda y^T \qquad (A.9)$$
The vector $y_m^T$ is therefore called the left-eigenvector of $D$ belonging to
the eigenvalue $\lambda_m$, whereas $x_m$ is called the right-eigenvector belonging to
the same eigenvalue $\lambda_m$. An important relation between left- and right-eigenvectors of a matrix $D$ is, for $\lambda_m \ne \lambda_n$,
$$y_m^T x_n = 0 \qquad (A.10)$$
Indeed, left-multiplying (A.1) with $\lambda = \lambda_n$ by $y_m^T$,
$$y_m^T D x_n = \lambda_n y_m^T x_n$$
and similarly right-multiplying (A.9) with $\lambda = \lambda_m$ by $x_n$
$$y_m^T D x_n = \lambda_m y_m^T x_n$$
leads, after subtraction, to $0 = (\lambda_n - \lambda_m) y_m^T x_n$ and (A.10) follows. Since
eigenvectors may be complex in general and since $y_m^T x_n = x_n^T y_m$, the expression $y_m^T x_n$ is not an inner-product that is always real and for which
$y_m^T x_n = \left(x_n^T y_m\right)^*$ holds. However, (A.10) expresses that the sets of left- and
right-eigenvectors are orthogonal if $\lambda_m \ne \lambda_n$.
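A small worked instance of relation (A.10) may help. The sketch below uses a hypothetical $2 \times 2$ matrix (called `A` in the code, playing the role of $D$) with eigenvalues 2 and 3, whose left- and right-eigenvectors were worked out by hand for this illustration, and checks the eigen-equations and the orthogonality between eigenvectors of different eigenvalues.

```python
def mat_vec(A, x):
    """Matrix times column vector: A x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def vec_mat(y, A):
    """Row vector times matrix: y^T A."""
    return [sum(y[i] * A[i][j] for i in range(len(y))) for j in range(len(A[0]))]


def dot(y, x):
    """Plain bilinear product y^T x (no complex conjugation)."""
    return sum(a * b for a, b in zip(y, x))


A = [[2, 1],
     [0, 3]]                       # assumed example with eigenvalues 2 and 3
right = {2: [1, 0], 3: [1, 1]}     # A x = lambda x
left = {2: [1, -1], 3: [0, 1]}     # y^T A = lambda y^T

for lam in (2, 3):
    print(lam, mat_vec(A, right[lam]), vec_mat(left[lam], A))

print(dot(left[2], right[3]), dot(left[3], right[2]))   # different eigenvalues
print(dot(left[2], right[2]), dot(left[3], right[3]))   # same eigenvalue
```

The cross products vanish while the products for matching eigenvalues do not, which is the bi-orthogonality property.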
3. If $D$ has $q$ distinct eigenvalues, then the $q$ eigenvectors are linearly independent and span the whole $q$-dimensional space. The proof is by reductio
ad absurdum. Assume that $v$ is the smallest number of linearly dependent
eigenvectors labelled by the first $v$ smallest indices. Linear dependence then
means that,
$$\sum_{n=1}^{v} \alpha_n x_n = 0 \qquad (A.11)$$
where $\alpha_n \ne 0$ for $1 \le n \le v$. Left-multiplying by $D$ and using (A.1) yields
$$\sum_{n=1}^{v} \alpha_n \lambda_n x_n = 0 \qquad (A.12)$$
On the other hand, multiplying (A.11) by $\lambda_v$ and subtracting from (A.12)
leads to
$$\sum_{n=1}^{v-1} \alpha_n (\lambda_n - \lambda_v) x_n = 0,$$
which, because all eigenvalues are distinct, implies that there is a smaller
set of $v - 1$ linearly dependent eigenvectors. This contradicts the initial
hypothesis.
This important property has a number of consequences. First, it applies to
left- as well as to right-eigenvectors. Relation (A.10) then shows that the sets
of left- and right-eigenvectors form a bi-orthogonal system with $y_n^T x_n \ne 0$.
For, if $x_n$ were orthogonal to $y_n$ (or $y_n^T x_n = 0$), (A.10) demonstrates that
$x_n$ would be orthogonal to all left-eigenvectors $y_m$. Since the set of left-eigenvectors spans the $q$-dimensional vector space, it would mean that the
$q$-dimensional vector $x_n$ would be orthogonal to the whole $q$-space, which
is impossible because $x_n$ is not the null vector. Second, any $q$-dimensional
vector can be written in terms of either the left- or right-eigenvectors.
4. Let us denote by $X$ the matrix with in column $m$ the right-eigenvector
$x_m$ and by $Y^T$ the matrix with in row $n$ the left-eigenvector $y_n^T$. If the right- and left-eigenvectors are scaled such that, for all $1 \le n \le q$, $y_n^T x_n = 1$, then
$$Y^T X = L \qquad (A.13)$$
or, the matrix $Y^T$ is the inverse of the matrix $X$. Furthermore, for any
right-eigenvector, (A.1) holds, rewritten in matrix form, that
$$DX = X \, \text{diag}(\lambda_n) \qquad (A.14)$$
Left-multiplying by $X^{-1} = Y^T$ yields the similarity transform of matrix $D$,
$$X^{-1} D X = Y^T D X = \text{diag}(\lambda_n) \qquad (A.15)$$
Thus, when the eigenvalues of $D$ are distinct, there exists a similarity transform $K^{-1} D K$ that reduces $D$ to diagonal form. In many applications, similarity transforms are applied to simplify matrix problems. Observe that a
similarity transform preserves the eigenvalues, because, if $Dx = \lambda x$, then
$\lambda K^{-1} x = K^{-1} D x = (K^{-1} D K) K^{-1} x$. The eigenvectors are transformed to
$K^{-1} x$.
When $D$ has multiple eigenvalues, it may be impossible to reduce $D$ to
a diagonal form by similarity transforms. Instead of a diagonal form, the
most compact form when $D$ has $u$ distinct eigenvalues each with multiplicity
$p_m$ such that $\sum_{m=1}^{u} p_m = q$ is the Jordan canonical form $F$,
$$F = \begin{bmatrix} F_{p_1 - d}(\lambda_1) & & & & \\ & F_d(\lambda_1) & & & \\ & & \ddots & & \\ & & & F_{p_{u-1}}(\lambda_{u-1}) & \\ & & & & F_{p_u}(\lambda_u) \end{bmatrix}$$
where $F_p(\lambda)$ is a $p \times p$ submatrix of the form,
$$F_p(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda & 1 \\ 0 & \cdots & 0 & 0 & \lambda \end{bmatrix}$$
The number of independent eigenvectors is equal to the number of submatrices. If an eigenvalue $\lambda$ has multiplicity $p$, there can be one large
submatrix $F_p(\lambda)$, but also a number $n$ of smaller submatrices $F_{e_m}(\lambda)$ such
that $\sum_{m=1}^{n} e_m = p$. This illustrates, as mentioned in art. 1, the much higher
complexity of the eigenproblem in case of multiple eigenvalues. For more
details we refer to Wilkinson (1965).
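5. The companion matrix of the characteristic polynomial (A.3) of $D$ is
defined as
$$F = \begin{bmatrix} (-1)^{q-1} f_{q-1} & (-1)^{q-1} f_{q-2} & \cdots & (-1)^{q-1} f_1 & (-1)^{q-1} f_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
Expanding $\det(F - \lambda L)$ in cofactors of the first row yields $\det(F - \lambda L) =
f(\lambda)$. If $D$ has distinct eigenvalues, $D$ as well as $F$ are similar to $\text{diag}(\lambda_l)$. It
has been shown that the similarity transform $K$ for $D$ equals $K = X$. The
similarity transform for $F$ is the Vandermonde matrix $V(\lambda)$, where
$$V(x) = \begin{bmatrix} x_1^{q-1} & x_2^{q-1} & \cdots & x_{q-1}^{q-1} & x_q^{q-1} \\ x_1^{q-2} & x_2^{q-2} & \cdots & x_{q-1}^{q-2} & x_q^{q-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_1 & x_2 & \cdots & x_{q-1} & x_q \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix}$$
The Vandermonde matrix $V(\lambda)$ is clearly non-singular if all eigenvalues are
distinct. Furthermore,
$$V(\lambda) \, \text{diag}(\lambda_l) = \begin{bmatrix} \lambda_1^{q} & \lambda_2^{q} & \cdots & \lambda_{q-1}^{q} & \lambda_q^{q} \\ \lambda_1^{q-1} & \lambda_2^{q-1} & \cdots & \lambda_{q-1}^{q-1} & \lambda_q^{q-1} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_{q-1}^2 & \lambda_q^2 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_{q-1} & \lambda_q \end{bmatrix}$$
while
$$F V(\lambda) = \begin{bmatrix} (-1)^{q-1} f(\lambda_1) + \lambda_1^q & (-1)^{q-1} f(\lambda_2) + \lambda_2^q & \cdots & (-1)^{q-1} f(\lambda_q) + \lambda_q^q \\ \lambda_1^{q-1} & \lambda_2^{q-1} & \cdots & \lambda_q^{q-1} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_q^2 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_q \end{bmatrix}$$
Since $f(\lambda_m) = 0$, it follows that $F V(\lambda) = V(\lambda) \, \text{diag}(\lambda_l)$, which demonstrates
the claim. Hence, the eigenvector $x_n$ of $F$ belonging to eigenvalue $\lambda_n$ is
$$x_n^T = \begin{bmatrix} \lambda_n^{q-1} & \lambda_n^{q-2} & \cdots & \lambda_n & 1 \end{bmatrix}$$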
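The identity between the companion matrix and the Vandermonde matrix can be checked on a small integer example. The sketch below is illustrative only: the eigenvalues 1, 2, 3 are an assumed example, the helper names are hypothetical, and the coefficients follow the sign convention of (A.3) and (A.5), that is, the polynomial is the product of the factors (eigenvalue minus $\lambda$).

```python
def char_coeffs(eigs):
    """Coefficients [f_0, f_1, ..., f_q] of prod(ev - lam) over the given eigenvalues."""
    c = [1]
    for ev in eigs:
        nxt = [0] * (len(c) + 1)
        for k, ck in enumerate(c):
            nxt[k] += ev * ck      # ev * lam^k
            nxt[k + 1] -= ck       # -lam * lam^k
        c = nxt
    return c


def companion(c):
    """Companion matrix with first row (-1)^(q-1) * [f_{q-1}, ..., f_0]."""
    q = len(c) - 1
    sign = (-1) ** (q - 1)
    rows = [[sign * c[q - 1 - j] for j in range(q)]]
    for r in range(1, q):
        rows.append([1 if j == r - 1 else 0 for j in range(q)])
    return rows


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


eigs = [1, 2, 3]                       # assumed distinct eigenvalues
q = len(eigs)
F = companion(char_coeffs(eigs))
V = [[lam ** (q - 1 - r) for lam in eigs] for r in range(q)]   # rows lam^(q-1) .. 1
Lam = [[eigs[i] if i == j else 0 for j in range(q)] for i in range(q)]

print(F)
print(matmul(F, V))
print(matmul(V, Lam))
```

Both products coincide, so each column of the Vandermonde matrix is an eigenvector of the companion matrix for the corresponding eigenvalue.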
6. When left-multiplying (A.1) by $A$, we obtain
$$A^2 x = \lambda A x = \lambda^2 x$$
or, in general for any integer $k \geq 0$,
$$A^k x = \lambda^k x \tag{A.16}$$
Since any eigenvalue $\lambda$ satisfies its characteristic polynomial $c(\lambda) = 0$, we directly find from (A.16) that the matrix $A$ satisfies its own characteristic equation,
$$c(A) = 0 \tag{A.17}$$
This result is the Cayley-Hamilton theorem. There exist several other proofs of the Cayley-Hamilton theorem.
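As a small numerical illustration of (A.17), the following sketch evaluates the characteristic polynomial of a $2 \times 2$ matrix at the matrix itself. For $n = 2$, $c(\lambda) = \lambda^2 - \mathrm{trace}(A)\,\lambda + \det A$. The function names and the test matrix are illustrative choices, not taken from the text.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cayley_hamilton_residual(A):
    """Return c(A) = A^2 - trace(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    A2 = matmul(A, A)
    identity = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


# Hypothetical example matrix with integer entries (exact arithmetic).
residual = cayley_hamilton_residual([[2, 1], [3, 4]])
```

With integer entries the residual is exactly the zero matrix, as (A.17) predicts.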
7. Consider an arbitrary matrix polynomial in $\lambda$,
$$F(\lambda) = \sum_{k=0}^{m} F_k \lambda^k$$
where all $F_k$ are $n \times n$ matrices and $F_m \neq O$. Any matrix polynomial $F(\lambda)$ can be right and left divided by another (non-zero) matrix polynomial $B(\lambda)$ in a unique way as proved in Gantmacher (1959a, Chapter IV). Hence the left-quotient and left-remainder $F(\lambda) = B(\lambda) Q_L(\lambda) + L(\lambda)$ and the right-quotient and right-remainder $F(\lambda) = Q_R(\lambda) B(\lambda) + R(\lambda)$ are unique. Let us concentrate on the right-remainder in the case where $B(\lambda) = \lambda I - A$ is a linear polynomial in $\lambda$. Using Euclid's division scheme for polynomials,
$$F(\lambda) = F_m \lambda^{m-1} (\lambda I - A) + (F_m A + F_{m-1}) \lambda^{m-1} + \sum_{k=0}^{m-2} F_k \lambda^k$$
$$= \left[ F_m \lambda^{m-1} + (F_m A + F_{m-1}) \lambda^{m-2} \right] (\lambda I - A) + \left( F_m A^2 + F_{m-1} A + F_{m-2} \right) \lambda^{m-2} + \sum_{k=0}^{m-3} F_k \lambda^k$$
and continued, we arrive at
$$F(\lambda) = \left[ F_m \lambda^{m-1} + \cdots + \lambda^{k-1} \sum_{j=k}^{m} F_j A^{j-k} + \cdots + \sum_{j=1}^{m} F_j A^{j-1} \right] (\lambda I - A) + \sum_{j=0}^{m} F_j A^j$$
In summary, $F(\lambda) = Q_R(\lambda) (\lambda I - A) + R(\lambda)$ (and similarly for the left-quotient and left-remainder) with
$$Q_R(\lambda) = \sum_{k=1}^{m} \lambda^{k-1} \left( \sum_{j=k}^{m} F_j A^{j-k} \right) \qquad Q_L(\lambda) = \sum_{k=1}^{m} \lambda^{k-1} \left( \sum_{j=k}^{m} A^{j-k} F_j \right)$$
$$R(\lambda) = \sum_{j=0}^{m} F_j A^j = F(A) \qquad L(\lambda) = \sum_{j=0}^{m} A^j F_j \tag{A.18}$$
and where the right-remainder is independent of $\lambda$. The Generalized Bézout Theorem states that the polynomial $F(\lambda)$ is divisible by $(\lambda I - A)$ on the right (left) if and only if $F(A) = O$ ($L(\lambda) = O$).
By the Generalized Bézout Theorem, the polynomial $F(\lambda) = g(\lambda) I - g(A)$ is divisible by $(\lambda I - A)$ because $F(A) = g(A) I - g(A) = O$. If $F(\lambda)$ is an ordinary polynomial, the right- and left-quotient and remainder are equal. The Cayley-Hamilton Theorem (A.17) states that $c(A) = 0$, which indicates that $c(\lambda) I = Q(\lambda) (\lambda I - A)$ and also $c(\lambda) I = (\lambda I - A) Q(\lambda)$. The matrix $Q(\lambda) = (\lambda I - A)^{-1} c(\lambda)$ is called the adjoint matrix of $A$. Explicitly, from (A.18),
$$Q(\lambda) = \sum_{k=1}^{n} \lambda^{k-1} \left( \sum_{j=k}^{n} c_j A^{j-k} \right)$$
and, with (A.6), $Q(0) = -A^{-1} \det A = \sum_{j=1}^{n} c_j A^{j-1}$. The main theoretical interest of the adjoint matrix stems from its definition $c(\lambda) I = Q(\lambda) (\lambda I - A) = (\lambda I - A) Q(\lambda)$ in case $\lambda = \lambda_k$ is an eigenvalue of $A$. Then, $(\lambda_k I - A) Q(\lambda_k) = 0$, which indicates by (A.1) that every non-zero column of the adjoint matrix $Q(\lambda_k)$ is an eigenvector belonging to the eigenvalue $\lambda_k$. In addition, by differentiation with respect to $\lambda$, we obtain
$$c'(\lambda) I = (\lambda I - A) Q'(\lambda) + Q(\lambda)$$
This demonstrates that, if $Q(\lambda_k) \neq O$, the eigenvalue $\lambda_k$ is a simple root of $c(\lambda)$ and, conversely, if $Q(\lambda_k) = O$, the eigenvalue $\lambda_k$ has higher multiplicity.
The adjoint matrix $Q(\lambda) = (\lambda I - A)^{-1} c(\lambda)$ is computed by observing that, on the Generalized Bézout Theorem, $\frac{c(\lambda) - c(\mu)}{\lambda - \mu}$ is divisible without remainder. By replacing in this polynomial $\lambda$ and $\mu$ by $\lambda I$ and $A$ respectively, $Q(\lambda)$ readily follows as illustrated in Section A.4.2.
8. Consider the arbitrary polynomial of degree $l$,
$$g(x) = g_0 \prod_{j=1}^{l} (x - \mu_j)$$
Substitute $x$ by $A$, then
$$g(A) = g_0 \prod_{j=1}^{l} (A - \mu_j I)$$
Since $\det(AB) = \det A \det B$ and $\det(kA) = k^n \det A$, we have
$$\det(g(A)) = g_0^n \prod_{j=1}^{l} \det(A - \mu_j I) = g_0^n \prod_{j=1}^{l} c(\mu_j)$$
With (A.5),
$$\det(g(A)) = g_0^n \prod_{j=1}^{l} \prod_{k=1}^{n} (\lambda_k - \mu_j) = \prod_{k=1}^{n} g_0 \prod_{j=1}^{l} (\lambda_k - \mu_j) = \prod_{k=1}^{n} g(\lambda_k)$$
If $h(x) = g(x) - \lambda$, we arrive at the general result: For any polynomial $g(x)$, the eigenvalues of $g(A)$ are $g(\lambda_1), \ldots, g(\lambda_n)$ and the characteristic polynomial is
$$\det(g(A) - \lambda I) = \prod_{k=1}^{n} (g(\lambda_k) - \lambda) \tag{A.19}$$
which is a polynomial in $\lambda$ of degree at most $n$. Since the result holds for an arbitrary polynomial, it should not surprise that, under appropriate conditions of convergence, it can be extended to infinite polynomials, in particular to the Taylor series of a complex function. As proved in Gantmacher (1959a, Chapter V), if the power series of a function $f(z)$ around $z = z_0$
$$f(z) = \sum_{j=0}^{\infty} f_j(z_0) (z - z_0)^j \tag{A.20}$$
converges for all $z$ in the disc $|z - z_0| < R$, then $f(A) = \sum_{j=0}^{\infty} f_j(z_0) (A - z_0 I)^j$ provided all eigenvalues of $A$ lie within the region of convergence of (A.20), i.e. $|\lambda - z_0| < R$. For example,
$$e^{Az} = \sum_{k=0}^{\infty} \frac{z^k A^k}{k!} \quad \text{for all } A$$
$$\log A = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} (A - I)^k \quad \text{for } |\lambda_k - 1| < 1, \text{ all } 1 \leq k \leq n$$
and, from (A.19), the eigenvalues of $e^{Az}$ are $e^{z\lambda_1}, \ldots, e^{z\lambda_n}$. Hence, the knowledge of the eigenstructure of a matrix $A$ allows us to compute any function of $A$ (under the same convergence restrictions as complex numbers $z$).
A.2 Hermitian and real symmetric matrices
A Hermitian matrix $A$ is a complex matrix that obeys $A^H = (A^*)^T = A$, where $a^H = (a_{ij})^*$ is the complex conjugate of $a_{ij}$. Hermitian matrices possess a number of attractive properties. A particularly interesting subclass of Hermitian matrices are real, symmetric matrices that obey $A^T = A$. The inner-product of vector $y$ and $x$ is defined as $y^H x$ and obeys $(y^H x)^* = (y^H x)^H = x^H y$. The inner-product $x^H x = \sum_{j=1}^{n} |x_j|^2$ is real and positive for all vectors except for the null vector.
9. The eigenvalues of a Hermitian matrix are all real. Indeed, left-multiplying (A.1) by $x^H$ yields
$$x^H A x = \lambda x^H x$$
and, since $(x^H A x)^H = x^H A^H x = x^H A x$, it follows that $\lambda x^H x = \lambda^H x^H x$ or $\lambda = \lambda^H$ because $x^H x$ is a positive real number. Furthermore, since $A = A^H$, we have
$$A^H x = \lambda x$$
Taking the complex conjugate yields
$$A^T x^* = \lambda x^*$$
In general, the eigenvectors of a Hermitian matrix are complex, but real for a real symmetric matrix since $A^H = A^T$. Moreover, the left-eigenvector $y^T$ is the complex conjugate of the right-eigenvector $x$. Hence, the orthogonality relation (A.10) reduces, after normalization, to an inner-product
$$x_k^H x_j = \delta_{kj} \tag{A.21}$$
where $\delta_{kj}$ is the Kronecker delta, which is zero if $k \neq j$ and else $\delta_{kk} = 1$. Consequently, (A.13) reduces to
$$X^H X = I$$
which implies that the matrix $X$ formed by the eigenvectors is a unitary matrix ($X^{-1} = X^H$). For a real symmetric matrix $A$, the corresponding relation $X^T X = I$ implies that $X$ is an orthogonal matrix ($X^{-1} = X^T$). Although the arguments so far (see Section A.1) have assumed that the eigenvalues of $A$ are distinct, the theorem applies in general (as proved in Wilkinson (1965, Section 47)): For any Hermitian matrix $A$, there exists a unitary matrix $U$ such that
$$U^H A U = \mathrm{diag}(\lambda_j) \qquad \text{real } \lambda_j$$
and for any real symmetric matrix $A$, there exists an orthogonal matrix $U$ such that
$$U^T A U = \mathrm{diag}(\lambda_j) \qquad \text{real } \lambda_j$$
10. To a real symmetric matrix $A$, a bilinear form $x^T A y$ is associated, which is a scalar defined as
$$x^T A y = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i y_j$$
We call a bilinear form a quadratic form if $y = x$. A necessary and sufficient condition for a quadratic form to be positive definite, i.e. $x^T A x > 0$ for all $x \neq 0$, is that all eigenvalues of $A$ should be positive. Indeed, art. 9 shows the existence of an orthogonal matrix $U$ that transforms $A$ to a diagonal form. Let $x = U z$, then
$$x^T A x = z^T U^T A U z = \sum_{k=1}^{n} \lambda_k z_k^2 \tag{A.22}$$
which is only positive for all $z_k$ provided $\lambda_k > 0$ for all $k$. From (A.6), a positive definite quadratic form $x^T A x$ possesses a positive determinant, $\det A > 0$. This analysis shows that the problem of determining an orthogonal matrix $U$ (or the eigenvectors of $A$) is equivalent to the geometrical problem of determining the principal axes of the hyper-ellipsoid
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j = 1$$
Relation (A.22) illustrates that the reciprocal eigenvalues $\lambda_k^{-1}$ are the squares of the principal semi-axes, since in the coordinates $z$ the hyper-ellipsoid reads $\sum_{k=1}^{n} \lambda_k z_k^2 = 1$. A multiple eigenvalue refers to an indeterminacy of the principal axes. For example if $n = 3$, an ellipsoid with two equal principal axes means that any section along the third axis is a circle. Any two perpendicular diameters of the largest circle orthogonal to the third axis are principal axes of that ellipsoid.
A.3 Vector and matrix norms
Vector and matrix norms, denoted by $\|x\|$ and $\|A\|$ respectively, provide a single number reflecting a "size" of the vector or matrix and may be regarded as an extension of the concept of the modulus of a complex number. A norm is a certain function of the vector components or matrix elements. All norms, vector as well as matrix norms, satisfy the three distance relations
(i) $\|x\| > 0$ unless $x = 0$
(ii) $\|\alpha x\| = |\alpha| \, \|x\|$ for any complex number $\alpha$
(iii) $\|x + y\| \leq \|x\| + \|y\|$
In general, the Hölder $q$-norm of a vector $x$ is defined as
$$\|x\|_q = \left( \sum_{j=1}^{n} |x_j|^q \right)^{1/q} \tag{A.23}$$
For example, the well-known Euclidean norm or length of the vector $x$ is found for $q = 2$ and $\|x\|_2^2 = x^H x$. In probability theory where $x$ denotes a discrete pdf, the law of total probability states that $\|x\|_1 = \sum_{j=1}^{n} x_j = 1$ and we will write $\|x\|_1 = \|x\|$. Finally, $\max |x_j| = \lim_{q \to \infty} \|x\|_q = \|x\|_\infty$. The unit-spheres $S_q = \{x \mid \|x\|_q = 1\}$ are, in three dimensions $n = 3$, for $q = 1$, an octahedron, for $q = 2$, a ball and for $q = \infty$, a cube. Furthermore, $S_1$ fits into $S_2$, which in turn fits into $S_\infty$, which implies that $\|x\|_1 \geq \|x\|_2 \geq \|x\|_\infty$ for any $x$.
The Hölder inequality proved in Section 5.5 states that, for $\frac{1}{p} + \frac{1}{q} = 1$ and real $p, q > 1$,
$$\left| x^H y \right| \leq \|x\|_p \, \|y\|_q \tag{A.24}$$
A special case of the Hölder inequality where $p = q = 2$ is the Cauchy-Schwarz inequality
$$\left| x^H y \right| \leq \|x\|_2 \, \|y\|_2 \tag{A.25}$$
The $q = 2$ norm is invariant under a unitary (hence also orthogonal) transformation $U$, where $U^H U = I$, because $\|U x\|_2^2 = x^H U^H U x = x^H x = \|x\|_2^2$.
Another example of a non-homogeneous vector norm is the quadratic form $\sqrt{\|x\|_A} = \sqrt{x^T A x}$ provided $A$ is positive definite. Relation (A.22) shows that, if not all eigenvalues $\lambda_j$ of $A$ are the same, not all components of the vector $x$ are weighted similarly and, thus, in general, $\sqrt{\|x\|_A}$ is a non-homogeneous norm. The quadratic form $\|x\|_I$ equals the homogeneous Euclidean norm $\|x\|_2^2$.
A.3.1 Properties of norms
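All norms are equivalent in the sense that there exist positive real numbers $c_1$ and $c_2$ such that, for all $x$,
$$c_1 \|x\|_p \leq \|x\|_q \leq c_2 \|x\|_p$$
For example,
$$\|x\|_2 \leq \|x\|_1 \leq \sqrt{n} \, \|x\|_2$$
$$\|x\|_\infty \leq \|x\|_1 \leq n \, \|x\|_\infty$$
$$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{n} \, \|x\|_\infty$$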
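By choosing in the Hölder inequality (5.15) $p = q = 1$, $x_j \to \alpha_j x_j^s$ for real $s > 0$ and $y_j \to \alpha_j > 0$, we obtain with $0 < \theta < 1$ an inequality for the weighted $q$-norm
$$\left( \frac{\sum_{j=1}^{n} \alpha_j |x_j|^{s\theta}}{\sum_{j=1}^{n} \alpha_j} \right)^{\frac{1}{s\theta}} \leq \left( \frac{\sum_{j=1}^{n} \alpha_j |x_j|^{s}}{\sum_{j=1}^{n} \alpha_j} \right)^{\frac{1}{s}}$$
For $\alpha_j = 1$, the weights $\alpha_j$ disappear such that the inequality for the Hölder $q$-norm becomes
$$\|x\|_{s\theta} \leq \|x\|_s \, n^{\frac{1}{s} \left( \frac{1}{\theta} - 1 \right)}$$
where $n^{\frac{1}{s} \left( \frac{1}{\theta} - 1 \right)} \geq 1$. On the other hand, with $0 < \theta < 1$ and for real $s > 0$,
$$\frac{\|x\|_{s\theta}}{\|x\|_s} = \frac{\left( \sum_{j=1}^{n} |x_j|^{s\theta} \right)^{\frac{1}{s\theta}}}{\left( \sum_{k=1}^{n} |x_k|^{s} \right)^{\frac{1}{s}}} = \left( \frac{\sum_{j=1}^{n} |x_j|^{s\theta}}{\left( \sum_{k=1}^{n} |x_k|^{s} \right)^{\theta}} \right)^{\frac{1}{s\theta}} = \left( \sum_{j=1}^{n} \left( \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \right)^{\theta} \right)^{\frac{1}{s\theta}}$$
Since $y = \frac{|x_j|^s}{\sum_{k=1}^{n} |x_k|^s} \leq 1$ and $\frac{1}{\theta} > 1$, it holds that $y^\theta \geq y$ and
$$\left( \sum_{j=1}^{n} \left( \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \right)^{\theta} \right)^{\frac{1}{s\theta}} \geq \left( \sum_{j=1}^{n} \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \right)^{\frac{1}{s\theta}} = 1$$
which leads to the opposite inequality (without normalization as $n^{\frac{1}{s} \left( \frac{1}{\theta} - 1 \right)}$),
$$\|x\|_s \leq \|x\|_{s\theta}$$
In summary, if $p > q > 0$, then the general inequality for the Hölder $q$-norm is
$$\|x\|_p \leq \|x\|_q \leq \|x\|_p \, n^{\frac{1}{q} - \frac{1}{p}} \tag{A.26}$$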
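The two-sided bound (A.26) is easy to check numerically. The sketch below computes the Hölder norm (A.23) and tests both inequalities for a given vector; the helper names `holder_norm` and `satisfies_a26` and the small tolerance are illustrative assumptions.

```python
def holder_norm(x, q):
    """Hölder q-norm (A.23) of a vector x for real q > 0."""
    return sum(abs(v) ** q for v in x) ** (1.0 / q)


def satisfies_a26(x, p, q, tol=1e-12):
    """Check ||x||_p <= ||x||_q <= ||x||_p * n^(1/q - 1/p) for p > q > 0."""
    n = len(x)
    lower = holder_norm(x, p)
    middle = holder_norm(x, q)
    upper = lower * n ** (1.0 / q - 1.0 / p)
    return lower <= middle + tol and middle <= upper + tol


# Hypothetical example vector.
example = [3.0, -4.0, 1.0]
ok = satisfies_a26(example, p=2, q=1)
```

For vectors with all components equal in modulus, the upper bound is attained, which shows that the factor $n^{1/q - 1/p}$ cannot be improved.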
For $m \times n$ matrices $A$, the most frequently used norms are the Euclidean or Frobenius norm
$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} \tag{A.27}$$
and the $q$-norm
$$\|A\|_q = \sup_{x \neq 0} \frac{\|A x\|_q}{\|x\|_q} \tag{A.28}$$
On the second distance relation, $\frac{\|A x\|_q}{\|x\|_q} = \left\| A \frac{x}{\|x\|_q} \right\|_q$, which shows that
$$\|A\|_q = \sup_{\|x\|_q = 1} \|A x\|_q \tag{A.29}$$
Furthermore, the matrix $q$-norm (A.28) implies that
$$\|A x\|_q \leq \|A\|_q \, \|x\|_q \tag{A.30}$$
Since the vector norm is a continuous function of the vector components and since the domain $\|x\|_q = 1$ is closed, there must exist a vector $x$ for which equality $\|A x\|_q = \|A\|_q \|x\|_q$ holds. Since the $i$-th vector component of $A x$ is $(A x)_i = \sum_{j=1}^{n} a_{ij} x_j$, it follows from (A.23) that
$$\|A x\|_q = \left( \sum_{i=1}^{m} \left| \sum_{j=1}^{n} a_{ij} x_j \right|^q \right)^{1/q}$$
For example, for all $x$ with $\|x\|_1 = 1$, we have that
$$\|A x\|_1 = \sum_{i=1}^{m} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \leq \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \, |x_j| = \sum_{j=1}^{n} |x_j| \sum_{i=1}^{m} |a_{ij}| \leq \sum_{j=1}^{n} |x_j| \left( \max_j \sum_{i=1}^{m} |a_{ij}| \right) = \max_j \sum_{i=1}^{m} |a_{ij}|$$
Clearly, there exists a vector $x$ for which equality holds, namely, if $k$ is the column in $A$ with maximum absolute sum, then $x = e_k$, the $k$-th basis vector with all components zero, except for the $k$-th one, which is 1. Similarly, for all $x$ with $\|x\|_\infty = 1$,
$$\|A x\|_\infty = \max_i \left| \sum_{j=1}^{n} a_{ij} x_j \right| \leq \max_i \sum_{j=1}^{n} |a_{ij}| \, |x_j| \leq \max_i \sum_{j=1}^{n} |a_{ij}|$$
Again, if $r$ is the row with maximum absolute sum and $x_j = 1 \cdot \mathrm{sign}(a_{rj})$ such that $\|x\|_\infty = 1$, then $(A x)_r = \sum_{j=1}^{n} |a_{rj}| = \max_i \sum_{j=1}^{n} |a_{ij}| = \|A x\|_\infty$. Hence, we have proved that
$$\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}| \tag{A.31}$$
$$\|A\|_1 = \max_j \sum_{i=1}^{m} |a_{ij}| \tag{A.32}$$
from which
$$\left\| A^H \right\|_1 = \|A\|_\infty$$
The $q = 2$ matrix norm, $\|A x\|_2$, is obtained differently. Consider
$$\|A x\|_2^2 = (A x)^H A x = x^H A^H A x$$
Since $A^H A$ is a Hermitian matrix, art. 9 shows that all eigenvalues are real and non-negative because a norm $\|A x\|_2^2 \geq 0$. These ordered eigenvalues are denoted as $\sigma_1^2 \geq \sigma_2^2 \geq \cdots \geq \sigma_n^2 \geq 0$. Applying the theorem in art. 9, there exists a unitary matrix $U$ such that $x = U z$ yields
$$x^H A^H A x = z^H U^H A^H A U z = z^H \mathrm{diag}\left( \sigma_j^2 \right) z \leq \sigma_1^2 z^H z = \sigma_1^2 \|z\|_2^2$$
Since the $q = 2$ norm is invariant under a unitary (orthogonal) transform, $\|x\|_2 = \|z\|_2$, by the definition (A.28),
$$\|A\|_2 = \sup_{x \neq 0} \frac{\|A x\|_2}{\|x\|_2} = \sigma_1 \tag{A.33}$$
where the supremum is achieved if $x$ is the eigenvector of $A^H A$ belonging to $\sigma_1^2$. Meyer (2000, p. 279) proves the corresponding result for the minimum eigenvalue provided that $A$ is non-singular,
$$\left\| A^{-1} \right\|_2 = \frac{1}{\min_{\|x\|_2 = 1} \|A x\|_2} = \sigma_n^{-1}$$
The non-negative quantity $\sigma_j$ is called the $j$-th singular value and $\sigma_1$ is the largest singular value of $A$. The importance of this result lies in an extension of the eigenvalue problem to non-square matrices which is called the singular value decomposition. A detailed discussion is found in Golub and Loan (1983). If $A$ has real eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, the above can be simplified and we obtain
$$\lambda_1 = \sup_{x \neq 0} \frac{x^T A x}{x^T x} \tag{A.34}$$
$$\lambda_n = \inf_{x \neq 0} \frac{x^T A x}{x^T x} \tag{A.35}$$
because, for any $x$, it holds that $\lambda_n x^T x \leq x^T A x \leq \lambda_1 x^T x$.
The Frobenius norm $\|A\|_F^2 = \mathrm{trace}\left( A^H A \right)$. With (A.7) and the analysis of $A^H A$ above,
$$\|A\|_F^2 = \sum_{k=1}^{n} \sigma_k^2 \tag{A.36}$$
In view of (A.33), the bounds $\|A\|_2 \leq \|A\|_F \leq \sqrt{n} \, \|A\|_2$ may be attained.
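The closed forms (A.27), (A.31) and (A.32) translate directly into code. The following sketch assumes a matrix stored as a list of rows; the function names are illustrative only.

```python
import math


def norm_one(A):
    """Matrix 1-norm (A.32): maximum absolute column sum."""
    rows, cols = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(rows)) for j in range(cols))


def norm_inf(A):
    """Matrix infinity-norm (A.31): maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)


def norm_frobenius(A):
    """Frobenius norm (A.27): root of the sum of squared moduli."""
    return math.sqrt(sum(abs(v) ** 2 for row in A for v in row))


# Hypothetical example matrix and a two-state stochastic matrix.
A = [[1, -2], [3, 4]]
P = [[0.9, 0.1], [0.4, 0.6]]
```

For the stochastic matrix `P`, the infinity-norm equals 1 because every row sums to 1, which is the starting point of the proof of Theorem A.4.2.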
A.3.2 Applications of norms
(a) Since $\left\| A^k \right\| = \left\| A A^{k-1} \right\| \leq \|A\| \left\| A^{k-1} \right\|$, by induction, we have for any integer $k$, that
$$\left\| A^k \right\| \leq \|A\|^k$$
and
$$\lim_{k \to \infty} A^k = 0 \quad \text{if } \|A\| < 1$$
(b) By taking the norm of the eigenvalue equation (A.1), $\|A x\| = |\lambda| \, \|x\|$ and with (A.30),
$$|\lambda| \leq \|A\|_q \tag{A.37}$$
Applied to $A^H A$, for any $q$-norm,
$$\sigma_1^2 \leq \left\| A^H A \right\|_q \leq \left\| A^H \right\|_q \|A\|_q$$
Choose $q = 1$ and with (A.33),
$$\|A\|_2^2 \leq \left\| A^H \right\|_1 \|A\|_1 = \|A\|_\infty \|A\|_1$$
(c) Any matrix $A$ can be transformed by a similarity transform $H$ to a Jordan canonical form $C$ (art. 4) as $A = H C H^{-1}$, from which $A^k = H C^k H^{-1}$. A typical Jordan submatrix $(C_m(\lambda))^k = \lambda^{k-2} B$, where $B$ is independent of $k$. Hence, for large $k$, $A^k \to 0$ if and only if $|\lambda| < 1$ for all eigenvalues.
A.4 Stochastic matrices
A probability matrix $P$ is reducible if there is a relabeling of the states that leads to
$$\tilde{P} = \begin{bmatrix} P_1 & B \\ O & P_2 \end{bmatrix}$$
where $P_1$ and $P_2$ are square matrices. Relabeling amounts to permuting rows and columns in the same fashion. Thus, there exists a similarity transform $H$ such that $P = H \tilde{P} H^{-1}$.
A.4.1 The eigenstructure
In this section, the basic theorem on the eigenstructure of a stochastic, irreducible matrix will be proved.
Lemma A.4.1 If $P$ is an irreducible non-negative matrix and if $v$ is a vector with positive components, then the vector $z = (P + I) v$ has always fewer zero components than $v$.
Proof: Denote
$$v = \begin{bmatrix} v_1 \\ 0 \end{bmatrix} \quad \text{and} \quad z = \begin{bmatrix} z_1 \\ 0 \end{bmatrix} \quad \text{where } v_1 > 0, \; z_1 > 0$$
which is always possible by suitable renumbering of the states and
$$P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$$
The relation $z = (P + I) v$ is written as
$$\begin{bmatrix} z_1 \\ 0 \end{bmatrix} = \begin{bmatrix} P_{11} v_1 \\ P_{21} v_1 \end{bmatrix} + \begin{bmatrix} v_1 \\ 0 \end{bmatrix}$$
Since $P$ is irreducible, $P_{21} \neq O$, such that $v_1 > 0$ implies that $P_{21} v_1 \neq 0$, which proves the lemma. □
Observe, in addition, that all components of $z$ are never smaller than those of $v$. Also, transposing does not alter the result.
Theorem A.4.2 (Frobenius) The modulus of every eigenvalue $\lambda$ of an irreducible stochastic matrix $P$ is less than or equal to 1. There is only one real eigenvalue $\lambda = 1$ and the corresponding eigenvector has positive components.
Proof: The $q = \infty$ norm (A.31) of a probability matrix $P$ with $n$ states defined by (9.7) subject to (9.8) precisely equals $\|P\|_\infty = 1$. From (A.37), it follows that all eigenvalues are, in absolute value, smaller than or equal to 1. Since all elements $P_{ij} \in [0, 1]$ and because an irreducible matrix has no zero element rows, $v^T P$ has positive components if $v^T$ has positive components. Thus, there always exists a scalar $0 < \lambda_v = \min_{1 \leq k \leq n} \frac{(v^T P)_k}{(v^T)_k}$, such that $\lambda_v v^T \leq v^T P$. By Lemma A.4.1, we can always transform the vector $v$ to a vector $z$ by right-multiplying both sides with $(I + P)$ such that
$$\lambda_v v^T (I + P) \leq v^T P (I + P)$$
$$\lambda_v z^T \leq z^T P$$
and, by definition of $\lambda_v$, $\lambda_v \leq \lambda_z$ since the components of $z$ are never smaller than those of $v$. Hence, for any arbitrary vector $v$ with positive components, the transform in Lemma A.4.1 leads to an increasing set $\lambda_v \leq \lambda_z \leq \cdots$, which is bounded by 1 because no eigenvalue can exceed 1. This shows that $\lambda = 1$ is the largest eigenvalue and the corresponding eigenvector $y^T$ has positive components.
This eigenvector $y^T$ is unique. For, if there were another linearly independent eigenvector $w^T$ corresponding to the eigenvalue $\lambda = 1$, any linear combination $z^T = \alpha y^T + \beta w^T$ is also an eigenvector belonging to $\lambda = 1$. But $\alpha$ and $\beta$ can always be chosen to produce a zero component which the transform method shows to be impossible. The fact that the eigenvector $y^T$ is the only eigenvector belonging to $\lambda = 1$ implies that the eigenvalue $\lambda = 1$ is a single zero of the characteristic polynomial of $P$. □
The theorem proved for stochastic matrices is a special case of the famous Frobenius theorem for non-negative matrices (see for a proof, e.g. Gantmacher (1959b, Chapter XIII)). We note that, in the theory of Markov chains, the interest lies in the determination of the left-eigenvector $y^T = \pi$ belonging to $\lambda = 1$, because the right-eigenvector $x$ of $P$ belonging to $\lambda = 1$ equals $u^T = \alpha [1 \; 1 \; \cdots \; 1]$, where $\alpha$ is a scalar, because of the constraints (9.8). Recall (A.10) and (A.13), the proper normalization, $y^T u = 1$, precisely corresponds to the total law of probability. Using the interpretation of Markov chains, an alternative argument is possible. If all eigenvalues were $|\lambda| < 1$, application (c) in Section A.3.2 indicates that the steady-state would be non-existent because $P^k \to 0$ for $k \to \infty$. Since this is impossible, there must be at least one eigenvalue with $|\lambda| = 1$. Furthermore, (9.22) shows that at least one eigenvalue corresponding to the steady-state $\pi$ is real and precisely 1.
Corollary A.4.3 An irreducible probability matrix $P$ cannot have two linearly independent eigenvectors with positive components.
Proof: Consider, apart from $y^T = \pi$ belonging to $\lambda = 1$, another eigenvector $w^T$ belonging to the eigenvalue $\omega \neq 1$. On art. 3, $w^T y = 0$, which is only possible if not all components of $w^T$ are positive. □
The corollary is important because no other eigenvector of $P$ than $y^T = \pi$ can represent a (discrete) probability density. Since the null vector is never an eigenvector, the corollary implies that at least one component in the other eigenvectors must be negative.
Since the characteristic polynomial of $P$ has real coefficients (because $P_{ij}$ is real), the eigenvalues occur in complex conjugate pairs. Since $\lambda = 1$ is an eigenvalue, for an even number of states $n$, there must be at least another real eigenvalue obeying $-1 \leq \lambda < 1$. It has been proved that the boundary of the locations of the eigenvalues inside the unit disc consists of a finite number of points on the unit circle joined by certain curvilinear arcs.
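There exists an interesting property of a rank-one update $\bar{P}$ of a stochastic matrix $P$. The lemma is of a general nature and also applies to reducible Markov chains with several eigenvalues $\lambda_j = 1$ for $1 < j \leq k$.
Lemma A.4.4 If $\{1, \lambda_2, \lambda_3, \ldots, \lambda_n\}$ are the eigenvalues of the stochastic matrix $P$, then the eigenvalues of $\bar{P} = \alpha P + (1 - \alpha) u v^T$, where $v^T$ is any probability vector, are $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n\}$.
Proof: We start from the eigenvalues equation (A.2)
$$\det\left( \bar{P} - \lambda I \right) = \det\left( \alpha P - \lambda I + (1 - \alpha) u v^T \right)$$
$$= \det\left( (\alpha P - \lambda I) \left( I + (\alpha P - \lambda I)^{-1} (1 - \alpha) u v^T \right) \right)$$
$$= \det(\alpha P - \lambda I) \, \det\left( I + (1 - \alpha) (\alpha P - \lambda I)^{-1} u v^T \right)$$
Applying the formula
$$\det\left( I + c d^T \right) = 1 + d^T c \tag{A.38}$$
which follows, after taking the determinant, from the matrix identity
$$\begin{pmatrix} I & 0 \\ d^T & 1 \end{pmatrix} \begin{pmatrix} I + c d^T & c \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ -d^T & 1 \end{pmatrix} = \begin{pmatrix} I & c \\ 0 & 1 + d^T c \end{pmatrix}$$
gives
$$\det\left( \bar{P} - \lambda I \right) = \det(\alpha P - \lambda I) \left( 1 + v^T (1 - \alpha) (\alpha P - \lambda I)^{-1} u \right)$$
Since the row sum of a stochastic matrix $P$ is 1, we have that $P u = u$ and, thus, $(\alpha P - \lambda I) u = (\alpha - \lambda) u$ from which $(\alpha P - \lambda I)^{-1} u = (\alpha - \lambda)^{-1} u$. Using this result leads to
$$1 + v^T (1 - \alpha) (\alpha P - \lambda I)^{-1} u = 1 + \frac{1 - \alpha}{\alpha - \lambda} v^T u = 1 + \frac{1 - \alpha}{\alpha - \lambda} = \frac{1 - \lambda}{\alpha - \lambda}$$
because a probability vector is normalized to 1, i.e. $v^T u = 1$. Hence, we end up with
$$\det\left( \bar{P} - \lambda I \right) = \frac{1 - \lambda}{\alpha - \lambda} \det(\alpha P - \lambda I)$$
Invoking (A.19) yields
$$\det\left( \bar{P} - \lambda I \right) = \frac{1 - \lambda}{\alpha - \lambda} \prod_{k=1}^{n} (\alpha \lambda_k - \lambda) = (1 - \lambda) \prod_{k=2}^{n} (\alpha \lambda_k - \lambda)$$
which shows that the eigenvalues of $\bar{P}$ are $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n\}$. □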
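Lemma A.4.4 can be checked by hand in the smallest non-trivial case. The sketch below assumes a two-state chain, for which the eigenvalues of any stochastic matrix are 1 and $\mathrm{trace} - 1$; the numerical values of `P`, `alpha` and `v` are arbitrary illustrations.

```python
def rank_one_update(P, alpha, v):
    """Return alpha*P + (1 - alpha)*u*v^T, where u is the all-one vector."""
    n = len(P)
    return [[alpha * P[i][j] + (1.0 - alpha) * v[j] for j in range(n)]
            for i in range(n)]


def second_eigenvalue_2x2(P):
    """A 2x2 stochastic matrix has eigenvalues 1 and trace(P) - 1."""
    return P[0][0] + P[1][1] - 1.0


# Hypothetical two-state chain, damping factor and probability vector.
P = [[0.9, 0.1], [0.4, 0.6]]
alpha = 0.85
v = [0.3, 0.7]

P_bar = rank_one_update(P, alpha, v)
lam2 = second_eigenvalue_2x2(P)          # second eigenvalue of P
lam2_bar = second_eigenvalue_2x2(P_bar)  # second eigenvalue of the update
```

Here `lam2_bar` equals `alpha * lam2` up to rounding, independent of the choice of `v`, while the rows of `P_bar` still sum to 1.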
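A similar property may occur in a special case where a Markov chain is supplemented by an additional state $n + 1$ which connects to every other state and to which every other state is connected (such that $\bar{P}$ is irreducible). Then,
$$\bar{P} = \begin{pmatrix} \alpha P & (1 - \alpha) u \\ v^T & 0 \end{pmatrix}$$
with corresponding eigenvalues $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n, \alpha - 1\}$, where the last eigenvalue follows from $\mathrm{trace}(\bar{P}) = \alpha \, \mathrm{trace}(P)$. This result is proved similarly to Lemma A.4.4 using (Meyer, 2000, p. 475)
$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det A \, \det\left( D - C A^{-1} B \right) \tag{A.39}$$
provided $A^{-1}$ exists unless $C = 0$.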
A.4.2 Example: the two-state Markov chain
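The two-state Markov chain is defined by
$$P = \begin{bmatrix} 1 - p & p \\ q & 1 - q \end{bmatrix}$$
Observe that $\det P = 1 - p - q$. The eigenvalues $\lambda$ of $P$ satisfy the characteristic polynomial $c(\lambda) = \lambda^2 - (2 - p - q) \lambda + \det P = 0$, from which $\lambda_1 = 1$ and $\lambda_2 = 1 - p - q = \det P$. The adjoint matrix $Q(\lambda)$ is computed (art. 7) via the polynomial $\frac{c(\lambda) - c(\mu)}{\lambda - \mu}$,
$$\frac{c(\lambda) - c(\mu)}{\lambda - \mu} = \lambda + \mu - (2 - p - q)$$
and after $\lambda \to \lambda I$ and $\mu \to P$
$$Q(\lambda) = \lambda I + P - (2 - p - q) I = \begin{bmatrix} \lambda - 1 + q & p \\ q & \lambda - 1 + p \end{bmatrix}$$
The (unscaled) right- (left-) eigenvectors of $P$ follow as the non-zero columns (rows) of $Q(\lambda)$. For $\lambda_1 = 1$, we find $x_1 = (1, 1)$ and $y_1^T = (q, p)$. For $\lambda_2 = 1 - p - q$, the eigenvector $x_2 = (p, -q)$ and $y_2^T = (1, -1)$. Normalization (art. 4) requires that $y_k^T x_k = 1$ or $x_1 = \frac{1}{p+q} (1, 1)$ and $x_2 = \frac{1}{p+q} (p, -q)$. If the eigenvalues are distinct ($p + q \neq 0$), the matrix $P$ can be written as (art. 4) $P = X \, \mathrm{diag}(\lambda_k) \, Y^T$,
$$P = \frac{1}{p+q} \begin{bmatrix} 1 & p \\ 1 & -q \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 - p - q \end{bmatrix} \begin{bmatrix} q & p \\ 1 & -1 \end{bmatrix}$$
from which any power $P^k$ is immediate as
$$P^k = \frac{1}{p+q} \begin{bmatrix} 1 & p \\ 1 & -q \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & (1 - p - q)^k \end{bmatrix} \begin{bmatrix} q & p \\ 1 & -1 \end{bmatrix} = \frac{1}{p+q} \begin{bmatrix} q & p \\ q & p \end{bmatrix} + \frac{(1 - p - q)^k}{p+q} \begin{bmatrix} p & -p \\ -q & q \end{bmatrix} \tag{A.40}$$
The steady-state matrix $P^\infty = \lim_{k \to \infty} P^k$ follows as
$$P^\infty = \frac{1}{p+q} \begin{bmatrix} q & p \\ q & p \end{bmatrix} \tag{A.41}$$
because $|1 - p - q| < 1$.
Alternatively, the steady-state vector $\pi$ is a solution of (9.25),
$$\begin{bmatrix} -p & q \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
Applying Cramer's rule with $D = \det \begin{bmatrix} -p & q \\ 1 & 1 \end{bmatrix} = -(p + q)$, we obtain
$$\pi_1 = \frac{1}{D} \det \begin{bmatrix} 0 & q \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad \pi_2 = \frac{1}{D} \det \begin{bmatrix} -p & 0 \\ 1 & 1 \end{bmatrix}$$
or
$$\pi = \begin{bmatrix} \frac{q}{p+q} & \frac{p}{p+q} \end{bmatrix}$$
which indeed agrees with (A.41) and (9.37).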
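The closed form (A.40) can be compared against repeated matrix multiplication. The sketch below is a plain-Python check; the helper names and the values $p = 0.2$, $q = 0.3$ are illustrative assumptions.

```python
def matmul2(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power_by_multiplication(p, q, k):
    """P^k computed by k successive multiplications."""
    P = [[1.0 - p, p], [q, 1.0 - q]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = matmul2(result, P)
    return result


def power_closed_form(p, q, k):
    """P^k from (A.40): steady-state part plus a decaying part."""
    s = p + q
    d = (1.0 - p - q) ** k
    return [[(q + d * p) / s, (p - d * p) / s],
            [(q - d * q) / s, (p + d * q) / s]]


def steady_state(p, q):
    """Steady-state vector pi = [q/(p+q), p/(p+q)] from (A.41)."""
    return [q / (p + q), p / (p + q)]


direct = power_by_multiplication(0.2, 0.3, 5)
closed = power_closed_form(0.2, 0.3, 5)
```

Both rows of `power_closed_form(p, q, k)` approach `steady_state(p, q)` geometrically at rate $|1 - p - q|$ as `k` grows.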
A.4.3 The tendency towards the steady-state
A stochastic matrix $P$ and the corresponding Markov chain is regular if the only eigenvalue with $|\lambda| = 1$ is $\lambda = 1$. It is fully regular if, in addition, $\lambda = 1$ is a simple zero of the characteristic polynomial of $P$. The Frobenius Theorem A.4.2 indicates that a regular matrix is necessarily reducible. Application (c) in Section A.3.2 demonstrates that the steady-state only exists for regular Markov chains. Alternatively, a regular matrix $P$ has the property that $P^k > O$ (for some $k$), i.e. all elements are strictly positive.
In the sequel, we concentrate on fully regular stochastic matrices $P$, where all eigenvalues lie within the unit circle, except for the largest one, $\lambda = 1$. If the $N$ eigenvalues of the regular stochastic matrix $P$ are ordered as $\lambda_1 = 1 > |\lambda_2| \geq \cdots \geq |\lambda_N| \geq 0$, the second largest eigenvalue $\lambda_2$ will determine the speed of convergence of the Markov chain towards the steady-state.
A.4.3.1 Example: the three-state Markov chain
The three-state Markov chain $P$ is defined by (9.7) with $N = 3$. Assuming that $P$ is irreducible, we determine the eigenvalues. Since the Frobenius Theorem A.4.2 already determines one eigenvalue $\lambda_1 = 1$, the remaining two $\lambda_2$ and $\lambda_3$ are found from (A.6) and (A.7). They obey the equations
$$\lambda_2 \lambda_3 = \det P$$
$$\lambda_2 + \lambda_3 = P_{11} + P_{22} + P_{33} - 1 = \mathrm{trace}(P) - 1$$
or the quadratic equation $x^2 - (\lambda_2 + \lambda_3) x + \lambda_2 \lambda_3 = 0$. The explicit solution is
$$\lambda_2 = \frac{1}{2} (\mathrm{trace}(P) - 1) + \frac{1}{2} \sqrt{(\mathrm{trace}(P) - 1)^2 - 4 \det P}$$
$$\lambda_3 = \frac{1}{2} (\mathrm{trace}(P) - 1) - \frac{1}{2} \sqrt{(\mathrm{trace}(P) - 1)^2 - 4 \det P}$$
All eigenvalues are real if the discriminant $(\mathrm{trace}(P) - 1)^2 - 4 \det P$ is non-negative which leads to three cases:
(a) In case $(\mathrm{trace}(P) - 1)^2 > 4 \det P$, the eigenvalues obey $\lambda_1 > \lambda_2 > \lambda_3$, but not necessarily $\lambda_1 > |\lambda_2| > |\lambda_3|$. The latter inequality is true if $\mathrm{trace}(P) > 1$, in which case the speed of convergence towards the steady-state is determined by the decay of $(\lambda_2)^k$ as $k \to \infty$. If $\mathrm{trace}(P) = 1$, then $\lambda_2 = -\lambda_3 = \sqrt{-\det P}$ and if $\mathrm{trace}(P) < 1$, $|\lambda_3|$ determines the speed of convergence. Notice that $\lambda_2 > \frac{1}{2} (\mathrm{trace}(P) - 1) \geq -\frac{1}{2}$.
(b) In case $(\mathrm{trace}(P) - 1)^2 < 4 \det P$, there are two complex conjugate roots $\lambda_2 = \xi + i\eta$ and $\lambda_3 = \xi - i\eta$, both with the same modulus $|\lambda_2| = |\lambda_3|$, whose square equals $\xi^2 + \eta^2 = \lambda_2 \lambda_3 = \det P$, and with real part $\xi = \frac{1}{2} (\mathrm{trace}(P) - 1)$. In this case, we have that $0 \leq \det P < 1$. Hence, the Markov chain converges towards the steady-state as $\left( \sqrt{\det P} \right)^k$ in the discrete-time $k$.
(c) In case $(\mathrm{trace}(P) - 1)^2 = 4 \det P$, there is a double eigenvalue
$$\lambda = \lambda_2 = \lambda_3 = \frac{1}{2} (\mathrm{trace}(P) - 1) = \pm \sqrt{\det P}$$
and $P$ cannot be reduced by a similarity transform $H$ to a diagonal matrix (Section A.1, art. 4) but to the Jordan canonical form $C$ such that $P^k = H^{-1} C^k H$. Since (Meyer, 2000, pp. 599-600)
$$C^k = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}^k = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda^k & k \lambda^{k-1} \\ 0 & 0 & \lambda^k \end{bmatrix}$$
the Markov chain converges towards the steady-state as $k \left( \sqrt{\det P} \right)^{k-1}$ in the discrete-time $k$. We observe that $-\frac{1}{2} \leq \lambda_2 = \lambda_3 < 1$ because $0 \leq \mathrm{trace}(P) < 3$. If $\mathrm{trace}(P) = 3$, then $P = I$, and $P$ is not irreducible.
The fastest possible convergence occurs when $\lambda_2 = \lambda_3 = 0$ or when $\det P = 0$ and $\mathrm{trace}(P) = 1$ in which case $P$ has rank 1. In any matrix of rank 1, all row vectors are linearly dependent. Since the column sum of a stochastic matrix $P$ is 1 by (9.8), every row in $P$ is precisely the same and (9.6) shows that after one discrete-time step, the steady-state $\pi = \begin{bmatrix} \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix}$ is reached. As shown in Section 9.3.1, a transition probability matrix with constant rows can be regarded as a limit transition probability matrix $A = \lim_{k \to \infty} \breve{P}^k$ of a Markov process with transition probability matrix $\breve{P}$.
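The trace and determinant relations above give the two remaining eigenvalues of a three-state chain without any general eigenvalue solver. The sketch below uses `cmath.sqrt` so that the complex case (b) is handled as well; the function names and the example matrix are illustrative assumptions.

```python
import cmath


def det3(P):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
            - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
            + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))


def remaining_eigenvalues(P):
    """Return (lambda_2, lambda_3) of a 3x3 stochastic matrix, given lambda_1 = 1."""
    t = P[0][0] + P[1][1] + P[2][2] - 1.0      # lambda_2 + lambda_3
    root = cmath.sqrt(t * t - 4.0 * det3(P))   # complex in case (b)
    return (t + root) / 2.0, (t - root) / 2.0


# Hypothetical irreducible three-state chain; it falls under case (a).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
lam2, lam3 = remaining_eigenvalues(P)
```

For this example $\mathrm{trace}(P) - 1 = 0.7$ and $\det P = 0.12$, so the discriminant is $0.01$ and the convergence towards the steady-state is governed by the larger of the two moduli.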
A.5 Special types of stochastic matrices
A.5.1 Doubly stochastic matrices
A doubly stochastic matrix $P$ has both row and column sums equal to 1,
$$\sum_{k=1}^{N} P_{ik} = \sum_{k=1}^{N} P_{kj} = 1 \quad \text{for all } i, j$$
If $P$ is symmetric, $P = P^T$, then $P$ is doubly stochastic, but the reverse implication is not true. As observed in Section A.4.1, the left-eigenvector $y^T = \pi$ and the right-eigenvector $x = u$ belonging to eigenvalue $\lambda = 1$ satisfy $y^T u = 1$. For doubly stochastic matrices, it holds that the role of left- and right-eigenvector can be reversed, which leads to $y = x$ or
$$\pi = \begin{bmatrix} \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix}$$
The example in Section A.4.3 illustrates that a steady-state vector equal to $\pi = \begin{bmatrix} \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix}$ does not necessarily imply that $P$ is doubly stochastic.
A.5.2 Tri-diagonal bandmatrices
A.5.2.1 Tri-diagonal Toeplitz bandmatrix
A Toeplitz matrix has constant entries on each diagonal parallel to the main diagonal. Of particular interest is the $N \times N$ tri-diagonal Toeplitz matrix,
$$A = \begin{bmatrix} b & a & & & \\ c & b & a & & \\ & \ddots & \ddots & \ddots & \\ & & c & b & a \\ & & & c & b \end{bmatrix}$$
that arises in the Markov chain of the random walk and the birth and death process. Moreover, the eigenstructure of the tri-diagonal Toeplitz matrix $A$ can be expressed in analytic form.
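An eigenvector $x$ corresponding to eigenvalue $\lambda$ satisfies $(A - \lambda I) x = 0$ or, written per component,
$$(b - \lambda) x_1 + a x_2 = 0$$
$$c x_{k-1} + (b - \lambda) x_k + a x_{k+1} = 0 \qquad 2 \leq k \leq N - 1$$
$$c x_{N-1} + (b - \lambda) x_N = 0$$
We assume that $a \neq 0$ and $c \neq 0$ and rewrite the set with $x_0 = x_{N+1} = 0$ as
$$x_{k+2} + \left( \frac{b - \lambda}{a} \right) x_{k+1} + \left( \frac{c}{a} \right) x_k = 0 \qquad 0 \leq k \leq N - 1$$
which are second order difference equations with constant coefficients. The general solution of these equations is $x_k = \alpha r_1^k + \beta r_2^k$ where $r_1$ and $r_2$ are the roots of the corresponding polynomial $x^2 + \left( \frac{b - \lambda}{a} \right) x + \frac{c}{a} = 0$. If $r_1 = r_2$, the general solution is $x_k = \alpha r_1^k + \beta k r_1^k$, which is impossible since it implies that all $x_k = 0$ due to the fact that $x_0 = x_{N+1} = 0$, which forces $r_1$ to be zero. An eigenvector is never the zero vector. Thus, we have distinct roots $r_1 \neq r_2$ that satisfy
$$r_1 + r_2 = -\left( \frac{b - \lambda}{a} \right)$$
$$r_1 r_2 = \frac{c}{a}$$
The constants $\alpha$ and $\beta$ follow from the boundary requirement $x_0 = x_{N+1} = 0$ as
$$\alpha + \beta = 0$$
$$\alpha r_1^{N+1} + \beta r_2^{N+1} = 0$$
Rewriting the last equation with $\beta = -\alpha$ yields $\left( \frac{r_1}{r_2} \right)^{N+1} = 1$ or $\frac{r_1}{r_2} = e^{\frac{2\pi i m}{N+1}}$ for some $1 \leq m \leq N$ (the root $m = 0$ must be rejected since $r_1 \neq r_2$). Substitution of $r_1 = r_2 e^{\frac{2\pi i m}{N+1}}$ into the last root equation yields
$$r_1 = \sqrt{\frac{c}{a}} \, e^{\frac{\pi i m}{N+1}} \quad \text{and} \quad r_2 = \sqrt{\frac{c}{a}} \, e^{-\frac{\pi i m}{N+1}}$$
The first root equation is only possible for special values of $\lambda = \lambda_m$ with $1 \leq m \leq N$, which are the eigenvalues,
$$\lambda_m = b + a \sqrt{\frac{c}{a}} \left( e^{\frac{\pi i m}{N+1}} + e^{-\frac{\pi i m}{N+1}} \right) = b + 2 \sqrt{ac} \cos\left( \frac{\pi m}{N+1} \right)$$
Since there are precisely $N$ different values of $m$, there are $N$ distinct eigenvalues $\lambda_m$. The components $x_k$ of the eigenvector belonging to $\lambda_m$ are
$$x_k = \alpha \left( \frac{c}{a} \right)^{\frac{k}{2}} \left( e^{\frac{\pi i m k}{N+1}} - e^{-\frac{\pi i m k}{N+1}} \right) = 2 i \alpha \left( \frac{c}{a} \right)^{\frac{k}{2}} \sin\left( \frac{\pi m k}{N+1} \right)$$
The scaling constant $\alpha$ follows from the normalization $\|x\|_1 = 1$ or
$$2 i \alpha \sum_{k=1}^{N} \left( \frac{c}{a} \right)^{\frac{k}{2}} \sin\left( \frac{\pi m k}{N+1} \right) = 1$$
Since $\sin\left( \frac{\pi m k}{N+1} \right) = \mathrm{Im}\left[ e^{\frac{\pi i m k}{N+1}} \right]$ and $\left( \sqrt{\frac{c}{a}} \, e^{\frac{\pi i m}{N+1}} \right)^{N+1} = (-1)^m \left( \sqrt{\frac{c}{a}} \right)^{N+1}$ is real, we have
$$\sum_{k=1}^{N} \left( \frac{c}{a} \right)^{\frac{k}{2}} \sin\left( \frac{\pi m k}{N+1} \right) = \mathrm{Im}\left[ \sum_{k=1}^{N} \left( \sqrt{\frac{c}{a}} \, e^{\frac{\pi i m}{N+1}} \right)^k \right] = \mathrm{Im}\left[ \frac{1 - \left( \sqrt{\frac{c}{a}} \, e^{\frac{\pi i m}{N+1}} \right)^{N+1}}{1 - \sqrt{\frac{c}{a}} \, e^{\frac{\pi i m}{N+1}}} - 1 \right]$$
$$= \frac{\sqrt{\frac{c}{a}} \left( 1 - (-1)^m \left( \sqrt{\frac{c}{a}} \right)^{N+1} \right) \sin\left( \frac{\pi m}{N+1} \right)}{1 - 2 \sqrt{\frac{c}{a}} \cos\left( \frac{\pi m}{N+1} \right) + \frac{c}{a}}$$
from which the scaling constant is
$$2 i \alpha = \left[ \frac{\sqrt{\frac{c}{a}} \left( 1 - (-1)^m \left( \sqrt{\frac{c}{a}} \right)^{N+1} \right) \sin\left( \frac{\pi m}{N+1} \right)}{1 - 2 \sqrt{\frac{c}{a}} \cos\left( \frac{\pi m}{N+1} \right) + \frac{c}{a}} \right]^{-1}$$
Finally, the components $x_k$ of the eigenvector $x$ belonging to $\lambda_m$ become, for $1 \leq k \leq N$,
$$x_k = \frac{\left( \frac{c}{a} \right)^{\frac{k}{2}} \sin\left( \frac{\pi m k}{N+1} \right) \left( 1 - 2 \sqrt{\frac{c}{a}} \cos\left( \frac{\pi m}{N+1} \right) + \frac{c}{a} \right)}{\sqrt{\frac{c}{a}} \left( 1 - (-1)^m \left( \sqrt{\frac{c}{a}} \right)^{N+1} \right) \sin\left( \frac{\pi m}{N+1} \right)}$$
Observe that for stochastic matrices $a + b + c = 1$ (see the general random walk in Section 11.2) and for the infinitesimal rate matrix $a + b + c = 0$ (see the birth and death process in Section 11.3), which only changes the eigenvalue through $b$.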
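The analytic eigenstructure of the tri-diagonal Toeplitz matrix can be verified by substituting the eigenvalue $b + 2\sqrt{ac}\cos\left(\frac{\pi m}{N+1}\right)$ and the unscaled eigenvector components $\left(\frac{c}{a}\right)^{k/2} \sin\left(\frac{\pi m k}{N+1}\right)$ back into the difference equations. The sketch below assumes $a > 0$ and $c > 0$ so that all quantities are real; the function names and parameter values are illustrative.

```python
import math


def toeplitz_eigenpair(a, b, c, N, m):
    """Eigenvalue and unscaled eigenvector for index m (1 <= m <= N), a, c > 0."""
    phi = math.pi * m / (N + 1)
    lam = b + 2.0 * math.sqrt(a * c) * math.cos(phi)
    x = [(c / a) ** (k / 2.0) * math.sin(phi * k) for k in range(1, N + 1)]
    return lam, x


def max_residual(a, b, c, N, m):
    """Largest |c x_{k-1} + (b - lam) x_k + a x_{k+1}| with x_0 = x_{N+1} = 0."""
    lam, x = toeplitz_eigenpair(a, b, c, N, m)
    ext = [0.0] + x + [0.0]  # pad with the boundary values x_0 and x_{N+1}
    return max(abs(c * ext[k - 1] + (b - lam) * ext[k] + a * ext[k + 1])
               for k in range(1, N + 1))


# Hypothetical random-walk parameters with a + b + c = 1.
residuals = [max_residual(0.3, 0.5, 0.2, 6, m) for m in range(1, 7)]
```

All residuals vanish up to rounding, and the $N$ eigenvalues are distinct because the cosine is strictly decreasing on $(0, \pi)$.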
A.5.2.2 Tri-diagonal AMS matrix
This section computes the exact spectrum of the tri-diagonal AMS matrix specified in (14.51). The analysis bears some resemblance to that of the birth and death process with constant birth and death rates in Section 11.3.3.
The eigenvalue equation G31 T{ = { is rewritten for the m-th component
of the right-eigenvector belonging to the eigenvalue as
(Q m + 1) {m31 [( + 1 ) m + Q f] {m + (m + 1) {m+1 = 0
for 0 m Q . This dierence equation has linear coe!cients whereas
those in Section A.5.2.1 are constant. It is most conveniently solved using
P
m
generating functions. Let J (}) = Q
m=0 {m } , then the dierence equation
@ [0> Q ] to
is transformed with {m = 0 if m 5
¢
¡
¢
¡
Q } J (}) {Q } Q } 2 J0 (}) Q {Q } Q31 ( + 1 ) }J0 (})
[Q f]J (}) + J0 (}) = 0
from which the logarithmic derivative is
Q } + f Q J0 (})
= 2
J (})
} + ( + 1 ) } 1
The integration of the right-hand side requires a partial fraction decomposition,
Q } + f Q } 2 + ( + 1 ) } 1
=
f2
f1
+
} u1 } u2
where u1 and u2 are the roots of the quadratic polynomial } 2 +( + 1 ) }
1 and f1 and f2 are the residues computed for n = 1> 2 as
fn = lim
}<un
(} un ) (Q } + f Q)
(} u1 ) (} u2 )
. Explicitly,
and they obey f1 + f2 = Q and f1 u2 + f2 u2 = f3Q
p
( + 1 )2 + 4
u1 =
A0
2
p
( + 1 ) ( + 1 )2 + 4
?0
u2 =
2
( + 1 ) +
(A.42)
s
with u1 u2 = 1 and u1 + u2 = +13
. Moreover, unless = 1 ± 2l A.5 Special types of stochastic matrices
461
in which case u1 = u2 = ± Il , the roots are distinct. The residues are
Q u1 + f Q (u1 u2 )
Q u2 + f Q = Q f1
f2 =
(u2 u1 )
f1 =
(A.43)
Integration now yields
$$\log G(z) = c_1 \log (z - r_1) + c_2 \log (z - r_2) + b$$
or
$$G(z) = e^{b} (z - r_1)^{c_1} (z - r_2)^{N - c_1}$$
The integration constant $b$ is obtained from $\lim_{z \to \infty} G(z)/z^N = x_N$. Thus,
$$\lim_{z \to \infty} \frac{G(z)}{z^N} = e^{b} \lim_{z \to \infty} \left( \frac{z - r_1}{z - r_2} \right)^{c_1} \left( 1 - \frac{r_2}{z} \right)^{N} = e^{b}$$
such that $e^{b} = x_N$. The obvious scaling for the eigenvector is to choose $x_N = 1$ and we arrive at
$$G(z) = \sum_{j=0}^{N} x_j z^j = (z - r_1)^{c_1} (z - r_2)^{N - c_1} \tag{A.44}$$
which shows that $c_1$ must be an integer $k \in [0, N]$ for $G(z)$ to be a polynomial of degree $N$. Expanding the binomials with $c_1 = k$ gives
n µ ¶
Q3n
X
X µQ n¶
n m
n3m
J (}) =
} (u1 )
} q (u2 )Q 3n3q
m
q
q=0
m=0
¶
m µ ¶µ
" X
X
n
Q n
=
(u1 )n3m (u2 )Q3n3m+q } m
q
m
q
q=0
m=0
from which the eigenvector components belonging to $ (n) are, for
0 m Q,
¶
m µ ¶µ
X
n
Q n
Q3m
(A.45)
u1 n3m u2 Q3n3m+q
{m (n) = (1)
q
m
q
q=0
The requirement on f1 also leads to equations for the eigenvalues . Indeed, equating f1 = n in (A.43) and substituting the explicit expressions for
the roots u1 and u2 , we obtain after squaring the quadratic equations for the
eigenvalue (n) for 0 n Q
D(n) 2 (n) + E(n) (n) + F(n) = 0
(A.46)
where
D(n) = (Q@2 n)2 (Q@2 f)2
E(n) = 2(1 ) (Q@2 n)2 Q (1 + ) (Q@2 f)
F(n) = (1 + )2 [(Q@2)2 (Q@2 n)2 ]
Each of the Q + 1 quadratic equations (A.46) has two roots 1 (n) and 2 (n),
thus in total 2(Q +1), while there are only Q +1 eigenvalues. The coe!cients
D (n), E (n) and F (n) only depend on n via (Q@2 n)2 , which means that
the quadratics (A.46) for which n0 = Q n are identical. This observation
reduces the set {1 (n)> 2 (n)}0$n$Q of roots to precisely Q + 1 and confines
the analysis to 0 n Q@2. We will show that all roots are real and
distinct (except for n = Q@2).
(n) = E 2 (n) 4D (n) F (n) is with | = (Q@2 n)2 5
£ The discriminant
¤
0> (Q@2)2 ,
¢
¡
(n) = 16| 2 + 4 (1 + ) f2 (1 + ) 2fQ + Q 2 |
2
{(n)
which shows that (n) is concave in | because g g|
= 32 ? 0, for
2
| = 0, (Q@2) = 0 and, for | = (Q@2)2 , (0) = Q 2 (f(1 + ) Q )2 A 0
and, hence, (n) 0 for n 5 [0> Q@2]. This means that, for 0 n ? Q@2,
the roots 1 (n) and 2 (n) are real and distinct and, for n = Q@2 (only if Q
is even) where (Q@2) = 0,
1 (Q@2) = 2 (Q@2) =
E (Q@2)
1+
=
2D (Q@2)
1 2 Qf
For ? n Q@2, the roots {1 ()> 2 ()} are dierent from the roots
{1 (n)> 2 (n)} because D(n) } 2 + E(n) } + F(n) ? D() } 2 + E() } + F()
for all }. Indeed, D() D(n) = (Q@2 )2 (Q@2 n)2 A 0 and the
discriminant (E() E (n))2 4 (D() D(n)) (F() F(n)) ? 0 shows
that there are no real solutions. Thus, an extreme eigenvalue occurs for
n = 0 for which F (0) = 0 such that 1 (0) = 0 and
2 (0) = 1 + Q
E (0)
f
=
D (0)
1 Qf
(A.47)
Q
? 1 and f ? Q shows that 2 (0) ? 0,
The stability requirement = f(1+)
and thus 2 (0) is the largest negative eigenvalue. The eigenvalues for other
0 ? n Q@2 are either larger than 0 or smaller than 2 (0). We need to
consider two dierent cases (a) f ? Q@2 and (b) f A Q@2 while F (n) ? 0
for all n 5 [0> Q).
(a) If f ? Q@2 and if 0 n ? f and , then D (n) A 0. Hence, the product
1 (n)2 (n) = F(n)
D(n) ? 0 which means that 1 (n) A 0 A 2 (n) and that there
are precisely [f] positive eigenvalues. Similarly, D (n) ? 0 for f ? n ? Q@2,
such that 1 (n)2 (n) A 0 while 1 (n) + 2 (n) = E(n)
D(n) ? 0 shows that both
eigenvalues are negative because E (n) ? 0. Indeed, if 1 and f ? Q@2,
the above expression immediately leads to E (n) ? 0 while if ? 1 and
f ? Q@2, the expression
"µ
µ
¶2 µ
¶2 #
¶
¸
Q
Q
Q
Q
2f
n f
f
+1
E(n) = 2(1 )
2
2
2
f
shows that both terms are negative.
(b) If f A Q@2, we see that D (n) A 0 for 0 ? n ? Q f leading to
1 (n) A 0 A 2 (n). For Q f ? n ? Q@2, we have D (n) ? 0 and thus
1 (n)2 (n) = F(n)
D(n) A 0 while their same sign follows from 1 (n) + 2 (n) =
E(n)
D(n) requires us to consider the sign of E (n). If 1, then E (n) A 0. If
A 1, then
µ
¶
Q
E(n) = Q (1 + ) f + 2(1 ) (Q@2 n)2
2
µ
¶
µ
¶
Q
Q 2
? Q (1 + ) f + 2(1 ) f 2
2
µ
¶µ
¶
Q
(Q f)
= 2f f +1 A0
2
f
which shows that 0 ? 2 (n) ? 1 (n). Hence, there are Q [f] + 2(Q@2 Q + [f]) = [f] positive eigenvalues.
In summary, there are [f] positive eigenvalues, one 1 (0) = 0 and Q [f]
negative eigenvalues. Relabel the eigenvalues as (n > Q3n ) = (1 (n)> 2 (n))
in increasing order Q3[f]31 ? · · · ? 1 ? 0 ? Q = 0 ? Q31 ? · · · ?
Q3[f] .This way of writing distinguishes between underload and overload
eigenvalues. In terms of the discriminant by (n) = E 2 (n) 4D (n) F (n),
the non-positive eigenvalues are
(a) If f ? Q@2,
s
3E(n)3 {(n)
2D(n)
s
3E(n)~ {(n)
1>2 (n) =
2D(n)
1 (n) =
0 n [f]
[f] + 1 n Q2
(b) If f A Q@2,
s
3E(n)3 {(n)
1 (n) =
2D(n)
0 n Q [f] 1
The eigenvector belonging to m follows from (A.45) where u1 and u2 are
given in (A.42) and n is determined from (A.43) since n = f1 . The eigenvectors for 1 (n) and 2 (n) belonging to a same quadratic n must be dierent.
Especially in this case, the corresponding n = f1 values can be determined
from (A.43). For example, for Q = 0, we find u1 = 1, u2 = 1 and n = 0
and the eigenvector belonging to Q is with (A.45),
µ ¶ m
µ ¶
Q Q3m Q
3m Q 3m
{m (0) = (1)
=
(A.48)
u1 u2
m Q
m
After renormalization such that k{(0)k1 = 1, i.e. by dividing each com¡Q ¢ m
P
(1+)Q
1 PQ
, the steady-state vector
ponent by Q
m=0 {m (0) = Q
m=0 m =
Q
(14.52) is obtained. Similarly, for the largest negative eigenvalue 0 in (A.47),
we find with u1 = 1 Qf , u2 = Q131 and n = f1 = Q such that
(f )
µ ¶µ
¶Q3m
µ ¶
Q
Q
Q3m Q
Q 3m 0
u2 =
(A.49)
1
u1
{m (Q ) = (1)
m
f
m
The left-eigenvectors | satisfy (A.9): | W G31 T = |W . The above approach is applicable. However, there is a more elegant method based on the
observation that there exists a diagonal matrix q
Z = gldj (Z0 > = = = > ZQ ) for
¡Q ¢
¡ 31
¢W
31
31 TZ is
m
which Z TZ = Z TZ , namely Zm =
m . Since Z
symmetric, the left- and right-eigenvectors corresponding to the same eigenvalue are the same (Section A.2, art. 9). Now |W G31 T = | W is equivalent
to
¡
¢31 ¡ 31
¢
Z TZ
| W Z = |W Z Z 31 G31 Z Z 31 TZ = | W Z Z 31 GZ
W = |W Z , G
31 GZ and T
31 TZ = TW , we obtain
With |Z
Z = Z
Z = Z
Z
31
31
W
W
|Z GZ TZ = |Z . The transpose |Z = TZ GZ |Z is
Z 2 | = TG31 Z 2 |
which shows compared to G31 T{ = { that { = Z 2 | or, the vector components are, for 0 m Q ,
µ ¶
Q m
{m =
|m
(A.50)
m
A.5.2.3 General tri-diagonal matrices
Since tri-diagonal matrices of the form (11.1) frequently occur in Markov theory, we devote this section to illustrate how far the eigen-analysis can be pushed. For an eigenpair (the right-eigenvector $x$ belonging to eigenvalue $\lambda$), the components in $(P - \lambda I) x = 0$ satisfy
$$(r_0 - \lambda) x_0 + p_0 x_1 = 0$$
$$q_j x_{j-1} + (r_j - \lambda) x_j + p_j x_{j+1} = 0 \qquad 1 \le j < N$$
$$q_N x_{N-1} + (r_N - \lambda) x_N = 0$$
If $p_j = p$ and $q_j = q$, the matrix $P$ reduces to a Toeplitz form for which the eigenvalues and eigenvectors can be explicitly written, as shown in Appendix A.5.2.1. Here, we consider the general case and show how orthogonal polynomials enter the scene.
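Using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$, the set becomes, with $\xi = \lambda - 1$,
$$x_1(\xi) = \frac{p_0 + \xi}{p_0}\, x_0(\xi)$$
$$x_{j+1}(\xi) = \frac{p_j + q_j + \xi}{p_j}\, x_j(\xi) - \frac{q_j}{p_j}\, x_{j-1}(\xi) \qquad 1 \le j < N$$
$$x_N(\xi) = \frac{q_N}{q_N + \xi}\, x_{N-1}(\xi) \tag{A.51}$$
The dependence on the eigenvalue $\xi$ is made explicit. Solving (A.51) iteratively for $j < N$,
$$x_2(\xi) = \frac{x_0(\xi)}{p_0 p_1} \left( \xi^2 + (q_1 + p_1 + p_0)\, \xi + p_1 p_0 \right)$$
$$x_3(\xi) = \frac{x_0(\xi)}{p_2 p_1 p_0} \left( \xi^3 + (q_1 + q_2 + p_2 + p_1 + p_0)\, \xi^2 + (q_2 q_1 + q_2 p_0 + p_2 q_1 + p_2 p_1 + p_2 p_0 + p_1 p_0)\, \xi + p_2 p_1 p_0 \right)$$
reveals a polynomial of degree $j$ in $\xi = \lambda - 1$. By inspection, the general form of $x_j(\xi)$ for $j < N$ is
$$x_j(\xi) = \frac{x_0(\xi)}{\prod_{m=0}^{j-1} p_m} \sum_{k=0}^{j} c_k(j)\, \xi^k \tag{A.52}$$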
with
$$c_j(j) = 1 \qquad c_{j-1}(j) = \sum_{m=0}^{j-1} (p_m + q_m) \qquad c_0(j) = \prod_{m=0}^{j-1} p_m$$
where $q_0 = p_N = 0$. By substituting (A.52) into (A.51),
$$\sum_{k=1}^{j-1} c_k(j+1)\, \xi^k = \sum_{k=1}^{j-1} \left[ (q_j + p_j)\, c_k(j) - q_j p_{j-1}\, c_k(j-1) + c_{k-1}(j) \right] \xi^k$$
and equating the corresponding powers in $\xi$, a recursion relation for the coefficients $c_k(j)$ ($0 \le k < N$) is obtained with $c_j(j) = 1$,
$$c_k(j+1) = (q_j + p_j)\, c_k(j) - q_j p_{j-1}\, c_k(j-1) + c_{k-1}(j)$$
from which all coefficients can be determined. Finally, for $j = N$, the explicit form of $x_N(\xi)$ follows from (A.51) as
$$x_N(\xi) = \frac{q_N}{q_N + \xi}\, x_{N-1}(\xi) = \frac{q_N}{q_N + \xi}\, \frac{x_0(\xi)}{\prod_{m=0}^{N-2} p_m} \sum_{k=0}^{N-1} c_k(N-1)\, \xi^k$$
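As a minimal sketch of how the recursion (A.51) can be evaluated, the Python fragment below runs the forward relations with the scaling $x_0 = 1$ and then computes the residual of the last relation in (A.51), which vanishes only when the trial value is an eigenvalue. The function names and the three-state example chain are illustrative assumptions and do not appear in the text; exact rational arithmetic is used so that the residuals are exactly zero.

```python
from fractions import Fraction as F


def forward_components(p, q, xi):
    """Return [x_0, ..., x_N] from the first two relations of (A.51), with x_0 = 1.

    p[j] and q[j] are the up and down transition probabilities of state j
    (q[0] and p[N] are zero); xi is the trial value of lambda - 1.
    """
    N = len(p) - 1
    x = [F(1), (p[0] + xi) / p[0]]
    for j in range(1, N):
        x.append((p[j] + q[j] + xi) / p[j] * x[j] - q[j] / p[j] * x[j - 1])
    return x


def last_row_residual(p, q, xi):
    """Residual (q_N + xi) x_N - q_N x_{N-1} of the last relation in (A.51)."""
    x = forward_components(p, q, xi)
    N = len(p) - 1
    return (q[N] + xi) * x[N] - q[N] * x[N - 1]


# Hypothetical example chain with P = [[1/2,1/2,0],[1/4,1/2,1/4],[0,1/2,1/2]],
# whose eigenvalues are 1, 1/2 and 0 (so xi = 0, -1/2 and -1).
p = [F(1, 2), F(1, 4), F(0)]
q = [F(0), F(1, 4), F(1, 2)]

for xi in (F(0), F(-1, 2), F(-1), F(-1, 4)):
    print(xi, forward_components(p, q, xi), last_row_residual(p, q, xi))
```

For the three eigenvalues the residual is zero, while the trial value $-1/4$ leaves a non-zero residual, which is the sense in which the eigenvector components are determined "apart from the eigenvalue".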
We can always scale an eigenvector without affecting the corresponding eigenvalue. If we require a normalization of the eigenvector $\|x(\xi)\|_1 = 1$, then $x_0(\xi)$ is uniquely determined,
$$x_0(\xi) = \frac{1}{1 + \sum_{j=1}^{N-1} \left| \sum_{k=0}^{j} \frac{c_k(j)}{\prod_{m=0}^{j-1} p_m}\, \xi^k \right| + \left| \frac{q_N}{q_N + \xi} \right| \left| \sum_{k=0}^{N-1} \frac{c_k(N-1)}{\prod_{m=0}^{N-2} p_m}\, \xi^k \right|}$$
Another scaling consists of choosing $x_0(\xi) = 1$. Hence, apart from the eigenvalue $\xi$, all eigenvector components $x_j(\xi)$ are explicitly determined.
If $\lambda = 1$ or $\xi = 0$, the solution is $x_j(\xi) = x_0(0)$. If $\|x(\xi)\|_1 = 1$, then $x_j(\xi) = \frac{1}{N+1}$, which is, after proper scaling by $N + 1$ (art. 4 in Section A.1), the right-eigenvector $u = [1\ 1\ \cdots\ 1]$ belonging to the left-eigenvector $\pi$ (see also Section A.4.1). If $x_0(\xi) = 1$, we immediately obtain $u$. Eigenvectors belonging to different eigenvalues $\xi' \neq \xi$ are linearly independent (art. 3 in Section A.1), but only orthogonal if $P = P^T$, i.e. if $p_j = q_{j+1}$. Only in the latter case (art. 9 in Section A.2), where also all eigenvalues are real, we have
$$\sum_{j=0}^{N} x_j(\xi)\, x_j(\xi') = \|x(\xi)\|_2^2\, \delta_{\xi \xi'}$$
This orthogonality requirement determines the different eigenvalues $\xi$. Since $\xi' = 0$ is an eigenvalue, each other real eigenvalue $\xi \neq 0$ must obey
$$\sum_{j=0}^{N} x_j(\xi) = 0$$
while the normalization enforces $\|x(\xi)\|_1 = \sum_{j=0}^{N} |x_j(\xi)| = 1$. The scaling $x_0(\xi) = 1$ leads to the polynomial $\sum_{k=0}^{N} b_k \xi^k$ of degree $N$ whose $N$ zeros equal the eigenvalues $\xi \neq 0$ and whose coefficients are, with $p_j = q_{j+1}$ and for $2 \le k \le N - 2$,
$$b_0 = (N + 1)\, q_N$$
$$b_1 = N + q_N \sum_{j=1}^{N-2} \frac{c_1(j)}{\prod_{m=0}^{j-1} p_m} + 2 q_N \frac{c_1(N-1)}{\prod_{m=0}^{N-2} p_m}$$
$$b_k = \sum_{j=k}^{N-2} \frac{q_N\, c_k(j)}{\prod_{m=0}^{j-1} p_m} + \sum_{j=k-1}^{N-1} \frac{c_{k-1}(j)}{\prod_{m=0}^{j-1} p_m} + \frac{2 q_N\, c_k(N-1)}{\prod_{m=0}^{N-2} p_m}$$
$$b_{N-1} = \frac{p_{N-2} + 2 q_N + c_{N-2}(N-1)}{\prod_{m=0}^{N-2} p_m}$$
$$b_N = \frac{1}{\prod_{m=0}^{N-2} p_m}$$
The Newton identities (B.9) relate these coefficients to the sum of integer powers of the real zeros $\xi \neq 0$.
Proceeding much further in the case that $P$ is not symmetric is difficult. A similarity transform is needed to transform the linearly independent set of vectors $x(\xi)$ for different $\xi$ to an orthogonal set from which the eigenvalues then follow, as in the symmetric case above. Karlin and McGregor (see Schoutens (2000, Chapter 3)) have shown the existence of a set of orthogonal polynomials (similar to our set $x_j(\xi)$) that obey an integral orthogonality condition (similar to Legendre or Chebyshev polynomials) instead of our summation orthogonality condition. Only in particular cases, however, were they able to specify this orthogonal set explicitly.
A.5.3 A triangular matrix complemented with one subdiagonal
The transition probability matrix $P$ has the structure of a triangular matrix complemented with one subdiagonal,
$$P = \begin{bmatrix}
P_{00} & P_{01} & P_{02} & \cdots & \cdots & P_{0N} \\
P_{10} & P_{11} & P_{12} & \cdots & \cdots & P_{1N} \\
0 & P_{21} & P_{22} & \cdots & \cdots & P_{2N} \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\
0 & 0 & \cdots & 0 & P_{N,N-1} & P_{NN}
\end{bmatrix}$$
Besides the normalization $\|\pi\|_1 = 1$, the steady-state vector $\pi$ obeys the relation $\pi = \pi P$, or per vector component (9.23),
$$\pi_j = \sum_{k=0}^{j+1} P_{kj}\, \pi_k$$
because $P_{kj} = 0$ if $k > j + 1$. Immediately we obtain an iterative equation that expresses $\pi_{j+1}$ (for $j < N$) in terms of the $\pi_k$ for $0 \le k \le j$ as
$$\pi_{j+1} = \left( \frac{1 - P_{jj}}{P_{j+1,j}} \right) \pi_j - \sum_{k=0}^{j-1} \frac{P_{kj}}{P_{j+1,j}}\, \pi_k$$
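The iterative equation lends itself to a direct computation: start from an arbitrary $\pi_0$, apply the recursion for $j = 0, \ldots, N - 1$ and normalize at the end. The following Python sketch illustrates this under stated assumptions: the function name `steady_state` and the 3-state example matrix are hypothetical, and all subdiagonal elements $P_{j+1,j}$ are assumed to be non-zero.

```python
from fractions import Fraction as F


def steady_state(P):
    """Steady-state vector of a stochastic matrix with P[k][j] = 0 for k > j + 1.

    Uses pi_{j+1} = ((1 - P_jj) pi_j - sum_{k<j} P_kj pi_k) / P_{j+1,j},
    starting from pi_0 = 1, followed by a normalization to unit sum.
    """
    n = len(P)
    pi = [F(1)]
    for j in range(n - 1):
        numerator = (1 - P[j][j]) * pi[j] - sum(P[k][j] * pi[k] for k in range(j))
        pi.append(numerator / P[j + 1][j])
    total = sum(pi)
    return [value / total for value in pi]


# Hypothetical example: upper triangular part plus one subdiagonal.
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 4), F(1, 4), F(1, 2)],
    [F(0), F(1, 2), F(1, 2)],
]
pi = steady_state(P)
pi_times_P = [sum(pi[k] * P[k][j] for k in range(3)) for j in range(3)]
print(pi)          # the normalized steady-state vector
print(pi_times_P)  # should reproduce pi
```

Because each step only needs the previously computed components, the cost is of the order of $N^2$ operations, compared with solving a general linear system.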
Let us consider the eigenvalue equation (A.1) that is written for stochastic matrices as $(P - \lambda I)^T x^T = 0$. The matrix $(P - \lambda I)^T$ is a $(N + 1) \times (N + 1)$ matrix of rank $N$ because $\det (P - \lambda I)^T = 0$ (else all eigenvectors $x$ are zero). When writing this set of equations in terms of $x_0$, we produce the following set of $N$ equations,
$$\begin{bmatrix}
P_{10} & 0 & 0 & \cdots & 0 & 0 \\
P_{11} - \lambda & P_{21} & 0 & \cdots & 0 & 0 \\
P_{12} & P_{22} - \lambda & P_{32} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
P_{1,N-2} & P_{2,N-2} & P_{3,N-2} & \cdots & P_{N-1,N-2} & 0 \\
P_{1,N-1} & P_{2,N-1} & P_{3,N-1} & \cdots & P_{N-1,N-1} - \lambda & P_{N,N-1}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{N-1} \\ x_N \end{bmatrix}
= x_0 \begin{bmatrix} \lambda - P_{00} \\ -P_{01} \\ -P_{02} \\ \vdots \\ -P_{0,N-2} \\ -P_{0,N-1} \end{bmatrix}$$
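Since the matrix on the left-hand side is triangular, its determinant equals the product of the diagonal elements or $\prod_{k=0}^{N-1} P_{k+1,k}$. By Cramer's rule, we find that $x_j / x_0$ equals the determinant of that matrix with the $j$-th column replaced by the right-hand side vector, divided by $\prod_{k=0}^{N-1} P_{k+1,k}$,
$$\frac{x_j}{x_0} = \frac{1}{\prod_{k=0}^{N-1} P_{k+1,k}} \det \begin{bmatrix}
P_{10} & 0 & \cdots & 0 & \lambda - P_{00} & 0 & \cdots & 0 \\
P_{11} - \lambda & P_{21} & \cdots & 0 & -P_{01} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
P_{1,j-2} & P_{2,j-2} & \cdots & P_{j-1,j-2} & -P_{0,j-2} & 0 & \cdots & 0 \\
P_{1,j-1} & P_{2,j-1} & \cdots & P_{j-1,j-1} - \lambda & -P_{0,j-1} & 0 & \cdots & 0 \\
P_{1j} & P_{2j} & \cdots & P_{j-1,j} & -P_{0j} & P_{j+1,j} & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
P_{1,N-1} & P_{2,N-1} & \cdots & P_{j-1,N-1} & -P_{0,N-1} & P_{j+1,N-1} & \cdots & P_{N,N-1}
\end{bmatrix}$$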
The above determinant is of the form (Meyer, 2000, p. 467)
$$\det \begin{bmatrix} A_{j \times j} & O_{j \times (N-j)} \\ B_{(N-j) \times j} & C_{(N-j) \times (N-j)} \end{bmatrix} = \det C\, \det A$$
where $\det C = \prod_{k=j}^{N-1} P_{k+1,k}$. In the determinant $\det A$, we can change the $j$-th column with the $(j-1)$-th, and subsequently, the $(j-1)$-th with the $(j-2)$-th and so on until the last column is permuted to the first column, in total $j - 1$ permutations. After changing the sign of that first column, the result is that $\det A = (-1)^j \det (P_{j \times j} - \lambda I_{j \times j})$ where $P_{j \times j}$ is the original transition probability matrix limited to $j$ states (instead of $N + 1$). Hence, for $1 \le j \le N$,
$$x_j = \frac{x_0\, (-1)^j \det (P_{j \times j} - \lambda I_{j \times j})}{\prod_{k=0}^{j-1} P_{k+1,k}}$$
and the normalization of eigenvectors $\|x\|_1 = 1$ determines $x_0$ as
$$x_0 = \frac{1}{1 + \sum_{j=1}^{N} \frac{(-1)^j \det (P_{j \times j} - \lambda I_{j \times j})}{\prod_{k=0}^{j-1} P_{k+1,k}}}$$
If the $N + 1$ eigenvalues $\lambda$ are known, we observe that all eigenvectors can be expressed in terms of the original matrix $P$ in the same way.
Appendix B
Algebraic graph theory
This appendix reviews the basics of the matrix theory for graphs $G(N, L)$. The book by Cvetkovic et al. (1995) is the current standard work on algebraic graph theory.
B.1 The adjacency and incidence matrix
1. The adjacency matrix $A$ of a graph $G$ with $N$ nodes is an $N \times N$ matrix with elements $a_{ij} = 1$ only if $(i, j)$ is a link of $G$, otherwise $a_{ij} = 0$. Because the existence of a link implies that $a_{ij} = a_{ji}$, the adjacency matrix $A = A^T$ is a real symmetric matrix. It is assumed further that the graph $G$ does not contain self-loops ($a_{ii} = 0$) nor multiple links between two nodes. The complement $G^c$ of the graph $G$ consists of the same set of nodes but with a link between $(i, j)$ if there is no link $(i, j)$ in $G$ and vice versa. Thus, $(G^c)^c = G$ and the adjacency matrix $A^c$ of the complement $G^c$ is $A^c = J - I - A$ where $J$ is the all-one matrix ($(J)_{ij} = 1$). Information about the direction of the links is specified by the incidence matrix $B$, an $N \times L$ matrix with elements
$$b_{ij} = \begin{cases} 1 & \text{if link } e_j = i \to j \\ -1 & \text{if link } e_j = i \leftarrow j \\ 0 & \text{otherwise} \end{cases}$$
Fig. B.1. A graph with $N = 6$ and $L = 9$. The links are lexicographically ordered, $e_1 = 1 \to 2$, $e_2 = 1 \to 3$, $e_3 = 1 \leftarrow 6$, etc.
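Figure B.1 exemplifies the definition of $A$ and $B$:
$$A = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\qquad
B = \begin{bmatrix}
1 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & -1 & -1 & 0 & 0 & 0 \\
0 & -1 & 0 & -1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & -1 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}$$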
2. The relation between adjacency and incidence matrix is given by the admittance matrix or Laplacian $Q$,
$$Q = B B^T = \Delta - A$$
where $\Delta = \mathrm{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix. Indeed, if $i \neq j$ and noting that each column of $B$ has only two non-zero elements at a different row,
$$q_{ij} = \left( B B^T \right)_{ij} = \sum_{k=1}^{L} b_{ik} b_{jk} = -1$$
when $(i, j)$ is a link, and $q_{ij} = 0$ otherwise. If $i = j$, then $\sum_{k=1}^{L} b_{ik}^2 = d_i$, the number of links that have node $i$ in common.
Also, by the definition of $A$, the row sum $i$ of $A$ equals the degree $d_i$ of node $i$,
$$d_i = \sum_{k=1}^{N} a_{ik} \tag{B.1}$$
Consequently, each row sum $\sum_{k=1}^{N} q_{ik} = 0$ which shows that $Q$ is singular implying that $\det Q = 0$. The Laplacian is symmetric $Q = Q^T$ because $A$ and $\Delta$ are both symmetric and the quadratic form defined in Section A.2 art. 10,
$$x^T Q x = x^T B B^T x = \| B^T x \|_2^2 \ge 0$$
is positive semidefinite, which implies that all eigenvalues of $Q$ are non-negative and at least one is zero because $\det Q = 0$.
Since $\sum_{i=1}^{N} \sum_{k=1}^{N} a_{ik} = 2L$, the basic law for the degree follows as
$$\sum_{i=1}^{N} d_i = 2L \tag{B.2}$$
Notice that $P = \Delta^{-1} A$ is a stochastic matrix because all elements of $P$ lie in the interval $[0, 1]$ and each row sum is 1.
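The relations of this article are easy to check numerically. The sketch below builds $A$ and $B$ from an oriented link list, forms $Q = B B^T$ and compares it with $\Delta - A$. The link list is an assumption: it follows the lexicographic ordering and the three orientations given in the caption of Fig. B.1, while the orientations of the remaining links are chosen for illustration only (the identity $B B^T = \Delta - A$ does not depend on them).

```python
# Oriented links (tail, head); nodes are numbered 1..6 as in Fig. B.1.
# Only the first three orientations are taken from the caption, the rest are assumed.
links = [(1, 2), (1, 3), (6, 1), (2, 3), (5, 2), (6, 2), (3, 4), (5, 4), (6, 5)]
N, L = 6, len(links)

A = [[0] * N for _ in range(N)]   # adjacency matrix
B = [[0] * L for _ in range(N)]   # incidence matrix
for e, (tail, head) in enumerate(links):
    A[tail - 1][head - 1] = A[head - 1][tail - 1] = 1
    B[tail - 1][e] = 1            # link e leaves node `tail`
    B[head - 1][e] = -1           # link e enters node `head`

degree = [sum(row) for row in A]  # row sums of A, relation (B.1)

# Laplacian computed in two ways: B B^T and Delta - A.
Q = [[sum(B[i][k] * B[j][k] for k in range(L)) for j in range(N)] for i in range(N)]
delta_minus_A = [[(degree[i] if i == j else 0) - A[i][j] for j in range(N)]
                 for i in range(N)]

print(Q == delta_minus_A)                 # the two constructions agree
print([sum(row) for row in Q])            # every row sum of Q is zero
print(sum(degree), 2 * L)                 # basic law for the degree (B.2)
```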
3. Let $J$ denote the all-one matrix with $(J)_{ij} = 1$ and $\xi(G)$ the total number of spanning trees in the graph $G$, also called the complexity of $G$, then
$$\mathrm{adj}\, Q = \xi(G)\, J \tag{B.3}$$
where $\mathrm{adj}\, Q$ is the adjugate of $Q$ (for a non-singular matrix, $Q^{-1} = \frac{\mathrm{adj}\, Q}{\det Q}$). We omit the proof, but apply the relation (B.3) to the complete graph $K_N$ where $Q = N I - J$. Equation (B.3) demonstrates that all elements of $\mathrm{adj}\, Q$ are equal to $\xi(G)$. Hence, it suffices to compute one suitable element of $\mathrm{adj}\, Q$, for example $(\mathrm{adj}\, Q)_{11}$ that is equal to the determinant of the $(N - 1) \times (N - 1)$ principal submatrix of $Q$ obtained by deleting the first row and column in $Q$,
$$(\mathrm{adj}\, Q)_{11} = \det \begin{bmatrix}
N - 1 & -1 & \ldots & -1 \\
-1 & N - 1 & \ldots & -1 \\
\vdots & \vdots & \ddots & \vdots \\
-1 & -1 & \ldots & N - 1
\end{bmatrix}$$
Adding all rows to the first and subsequently adding this new first row to all other rows gives
$$(\mathrm{adj}\, Q)_{11} = \det \begin{bmatrix}
1 & 1 & \ldots & 1 \\
-1 & N - 1 & \ldots & -1 \\
\vdots & \vdots & \ddots & \vdots \\
-1 & -1 & \ldots & N - 1
\end{bmatrix}
= \det \begin{bmatrix}
1 & 1 & \ldots & 1 \\
0 & N & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & N
\end{bmatrix} = N^{N-2}$$
Hence, the total number of spanning trees in the complete graph $K_N$, which is also the total number of possible spanning trees in any graph with $N$ nodes, equals $N^{N-2}$. This is a famous theorem of Cayley of which many proofs exist (van Lint and Wilson, 1996, Chapter 2).
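A short numerical check of this article, offered as a sketch: the number of spanning trees is computed as the cofactor $(\mathrm{adj}\, Q)_{11}$, i.e. the determinant of the Laplacian with its first row and column deleted. The helper names `det` and `spanning_trees` are assumptions introduced for illustration; the determinant is evaluated by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d                      # a row swap changes the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return d


def spanning_trees(A):
    """Cofactor (adj Q)_11 of the Laplacian Q = Delta - A of adjacency matrix A."""
    n = len(A)
    Q = [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return det([row[1:] for row in Q[1:]])


def complete_graph(n):
    return [[int(i != j) for j in range(n)] for i in range(n)]


for n in range(2, 8):
    print(n, spanning_trees(complete_graph(n)), n ** (n - 2))
```

The same function applies to any graph without self-loops; for instance, the cycle on four nodes has four spanning trees, one for each link that can be removed.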
4. The complexity of $G$ is also given by
$$\xi(G) = \frac{\det (J + Q)}{N^2} \tag{B.4}$$
Indeed, observe that $J Q = (J B) B^T = 0$ since $J B = 0$. Hence,
$$(N I - J)(J + Q) = N J + N Q - J^2 - J Q = N Q$$
and
$$\mathrm{adj} \left( (N I - J)(J + Q) \right) = \mathrm{adj} (J + Q)\, \mathrm{adj} (N I - J) = \mathrm{adj} (N Q)$$
Since $Q_{K_N} = N I - J$ and, as shown in art. 3, $\mathrm{adj}(N I - J) = N^{N-2} J$ and since $\mathrm{adj}(N Q) = N^{N-1}\, \mathrm{adj}\, Q = N^{N-1} \xi(G)\, J$ where we have used (B.3),
$$\mathrm{adj} (J + Q)\, J = N\, \xi(G)\, J$$
Left-multiplication with $J + Q$, taking into account that $J Q = 0$ and $J^2 = N J$, finally gives
$$(J + Q)\, \mathrm{adj} (J + Q)\, J = \det (J + Q)\, J = N^2\, \xi(G)\, J$$
which proves (B.4).
5. A walk of length $k$ from node $i$ to node $j$ is a succession of $k$ arcs of the form $(n_0 \to n_1)(n_1 \to n_2) \cdots (n_{k-1} \to n_k)$ where $n_0 = i$ and $n_k = j$. A path is a walk in which all vertices are different, i.e. $n_l \neq n_m$ for all $0 \le l \neq m \le k$.
Lemma B.1.1 The number of walks of length $k$ from node $i$ to node $j$ is equal to the element $\left( A^k \right)_{ij}$.
Proof (by induction): For $k = 1$, the number of walks of length 1 between node $i$ and $j$ equals the number of direct links between $i$ and $j$, which is by definition the element $a_{ij}$ in the adjacency matrix $A$. Suppose the lemma holds for $k - 1$. A walk of length $k$ consists of a walk of length $k - 1$ from $i$ to some vertex $r$ which is adjacent to $j$. By the induction hypothesis, the number of walks of length $k - 1$ from $i$ to $r$ is $\left( A^{k-1} \right)_{ir}$ and the number of walks with length 1 from $r$ to $j$ equals $a_{rj}$. The total number of walks from $i$ to $j$ with length $k$ then equals $\sum_{r=1}^{N} \left( A^{k-1} \right)_{ir} a_{rj} = \left( A^k \right)_{ij}$ (by the rules of matrix multiplication). $\square$
Explicitly,
$$\left( A^k \right)_{ij} = \sum_{r_1=1}^{N} \sum_{r_2=1}^{N} \cdots \sum_{r_{k-1}=1}^{N} a_{i r_1} a_{r_1 r_2} \cdots a_{r_{k-2} r_{k-1}} a_{r_{k-1} j}$$
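Lemma B.1.1 can be illustrated by comparing the matrix power with a brute-force enumeration of all node sequences, which is exactly the multiple sum written above. The following Python sketch does so for a small example graph; the graph and the helper names are assumptions chosen for illustration.

```python
from itertools import product


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][r] * Y[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, k):
    """A^k by repeated multiplication, starting from the identity matrix."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = mat_mul(R, A)
    return R


def count_walks(A, i, j, k):
    """Enumerate all intermediate node sequences r_1, ..., r_{k-1} explicitly."""
    n = len(A)
    total = 0
    for middle in product(range(n), repeat=k - 1):
        nodes = (i, *middle, j)
        if all(A[a][b] for a, b in zip(nodes, nodes[1:])):
            total += 1
    return total


# Hypothetical example: a triangle on nodes 0, 1, 2 with a pendant node 3.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

for k in range(1, 5):
    Ak = mat_pow(A, k)
    agree = all(Ak[i][j] == count_walks(A, i, j, k)
                for i in range(4) for j in range(4))
    print(k, agree)
```

The enumeration visits $N^{k-1}$ sequences per node pair, so it is only practical for small $k$, whereas the matrix power needs $k$ matrix multiplications.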
As shown in Section 15.2, the number of paths with $k$ hops between node $i$ and node $j$ is
$$X_k(i \to j; N) = \sum_{r_1 \neq \{i, j\}} \; \sum_{r_2 \neq \{i, r_1, j\}} \cdots \sum_{r_{k-1} \neq \{i, r_1, \ldots, r_{k-2}, j\}} a_{i r_1} a_{r_1 r_2} \cdots a_{r_{k-1} j}$$
The definition of a path restricts the first index $r_1$ to $N - 2$ possible values, the second $r_2$ to $N - 3$, etc., such that the total possible number of paths is
$$\prod_{l=1}^{k-1} (N - 1 - l) = \frac{(N - 2)!}{(N - k - 1)!}$$
whereas the total possible number of walks clearly is $N^{k-1}$.
A graph is connected if, for each pair of nodes, there exists a walk or, equivalently, if there exists some integer $k > 0$ for which $\left( A^k \right)_{ij} \neq 0$ for each $i, j$. The lowest integer $k$ for which $\left( A^k \right)_{ij} \neq 0$ for each pair of nodes $i, j$ is called the diameter of the graph $G$. Lemma B.1.1 demonstrates that the diameter equals the length of the longest shortest hop path in $G$.
B.2 The eigenvalues of the adjacency matrix
In this section, only general results of the eigenvalue spectrum of a graph $G$ are treated. For special types of graphs, there exists a wealth of additional, but specific properties of the eigenvalues.
1. Since $A$ is a real symmetric matrix, it has $N$ real eigenvalues (Section A.2), which we order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Section A.1, art. 4 shows that, apart from a similarity transform, the set of eigenvalues with corresponding eigenvectors is unique. A similarity transform consists of a relabeling of the nodes in the graph that obviously does not alter the structure of the graph but merely expresses the eigenvectors in a different base. The classical Perron-Frobenius Theorem for non-negative square matrices (of which Theorem A.4.2 is a special case) states that $\lambda_1$ is a simple and positive root of the characteristic polynomial in (A.3) possessing the only eigenvector of $A$ with non-zero components. Moreover, it follows from (A.34) that
$$\lambda_1 = \sup_{x \neq 0} \frac{x^T A x}{x^T x} = \max_{x \neq 0} \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} x_i x_j}{\sum_{i=1}^{N} x_i^2}$$
The maximum is attained if and only if $x$ is the eigenvector of $A$ belonging to $\lambda_1$ and for any other vector $y \neq x$, $\lambda_1 \ge \frac{y^T A y}{y^T y}$ as shown in Section A.3. By choosing the vector $y = u = (1, 1, \ldots, 1)$, we have, with $\sum_{j=1}^{N} a_{ij} = d_i$ and (B.2),
$$\lambda_1 \ge \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} = \frac{1}{N} \sum_{i=1}^{N} d_i = \frac{2L}{N} \tag{B.5}$$
The stochastic matrix $P = \Delta^{-1} A$, where $\Delta = \mathrm{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix, has the characteristic polynomial $\det \left( \Delta^{-1} A - \lambda I \right) = \frac{\det (A - \lambda \Delta)}{\det \Delta}$ where $\det \Delta = \prod_{j=1}^{N} d_j$. Since the largest eigenvalue of a stochastic matrix equals $\lambda_1 = 1$ (Theorem A.4.2), for a regular graph where $d_j = r$, the largest eigenvalue of $A$ equals $\lambda_1 = r$.
2. Since $a_{ii} = 0$, we have that $\mathrm{trace}(A) = 0$. From (A.7),
$$(-1)^{N-1} c_{N-1} = \sum_{k=1}^{N} \lambda_k = 0 \tag{B.6}$$
3. The Newton identities for polynomials. Let $p_n(z)$ denote a polynomial of order $n$ defined as
$$p_n(z) = \sum_{k=0}^{n} a_k(n)\, z^k = a_n(n) \prod_{k=1}^{n} (z - z_k(n)) \tag{B.7}$$
where $\{z_k(n)\}$ are the $n$ zeros. It follows from (B.7) that $p_n(0) = a_0(n) = a_n(n) \prod_{k=1}^{n} (-z_k(n))$. The logarithmic derivative of (B.7) is
$$p_n'(z) = p_n(z) \sum_{k=1}^{n} \frac{1}{z - z_k(n)}$$
For $|z| > \max_k |z_k(n)|$ (which is always possible for polynomials, but not for functions), we have that
$$p_n'(z) = p_n(z) \sum_{k=1}^{n} \sum_{j=0}^{\infty} \frac{(z_k(n))^j}{z^{j+1}} = p_n(z) \sum_{j=0}^{\infty} \frac{Z_j(n)}{z^{j+1}}$$
where
$$Z_j(n) = \sum_{k=1}^{n} (z_k(n))^j$$
Thus
$$\sum_{k=1}^{n} k\, a_k(n)\, z^k = \sum_{k=0}^{n} a_k(n)\, z^k \sum_{j=0}^{\infty} Z_j(n)\, z^{-j} = \sum_{j=0}^{\infty} \sum_{k=0}^{n} a_k(n)\, Z_j(n)\, z^{k-j} \tag{B.8}$$
Let $l = k - j$, then $-\infty \le l \le n$. Also $j = k - l \ge 0$ such that $k \ge l$. Combined with $0 \le k \le n$, we have $\max(0, l) \le k \le n$. Thus,
$$\sum_{j=0}^{\infty} \sum_{k=0}^{n} a_k(n)\, Z_j(n)\, z^{k-j} = \sum_{l=-\infty}^{n} \sum_{k=\max(l,0)}^{n} a_k(n)\, Z_{k-l}(n)\, z^l = \sum_{l=-\infty}^{-1} \sum_{k=0}^{n} a_k(n)\, Z_{k-l}(n)\, z^l + \sum_{l=0}^{n} \sum_{k=l}^{n} a_k(n)\, Z_{k-l}(n)\, z^l$$
Equating the corresponding powers of $z$ in (B.8) yields
$$a_0(n)\, Z_{-l}(n) + \sum_{k=1}^{n} a_k(n)\, Z_{k-l}(n) = 0 \qquad l \le 0$$
$$\sum_{k=l+1}^{n} a_k(n)\, Z_{k-l}(n) = (l - n)\, a_l(n)$$
The last set of equations for $0 \le l < n$,
$$a_l(n) = -\frac{1}{n - l} \sum_{k=l+1}^{n} a_k(n)\, Z_{k-l}(n) \tag{B.9}$$
are the Newton identities that relate the coefficients of a polynomial to the sum of the positive powers of the zeros. Applied to the characteristic polynomials (A.3) and (A.5) of the adjacency matrix with $z_k(n) = \lambda_k$, $a_k(n) = (-1)^N c_k$ and $c_{N-1} = 0$ (from (B.6)) yields for the first few values,
$$(-1)^N c_{N-2} = -\frac{1}{2} \sum_{k=1}^{N} \lambda_k^2$$
$$(-1)^N c_{N-3} = -\frac{1}{3} \sum_{k=1}^{N} \lambda_k^3$$
$$(-1)^N c_{N-4} = \frac{1}{8} \left( \left( \sum_{k=1}^{N} \lambda_k^2 \right)^2 - 2 \sum_{k=1}^{N} \lambda_k^4 \right)$$
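The Newton identities (B.9) can be verified on a polynomial with known zeros. The sketch below, under the assumption of a monic polynomial ($a_n(n) = 1$), computes the coefficients once by expanding the product in (B.7) and once from the power sums $Z_j(n)$ via (B.9). The function names are illustrative; the example zeros $2, -1, -1$ are the adjacency eigenvalues of the triangle $K_3$.

```python
from fractions import Fraction


def coefficients_from_zeros(zeros):
    """Coefficients a_0..a_n of prod_k (z - z_k), obtained by expanding the product."""
    a = [Fraction(1)]
    for zk in zeros:
        new = [Fraction(0)] * (len(a) + 1)
        for k, ak in enumerate(a):
            new[k + 1] += ak        # contribution of z * (a_k z^k)
            new[k] -= zk * ak       # contribution of -z_k * (a_k z^k)
        a = new
    return a


def coefficients_from_power_sums(zeros):
    """Same coefficients from the Newton identities (B.9), with a_n = 1."""
    n = len(zeros)
    Z = [sum(Fraction(z) ** j for z in zeros) for j in range(n + 1)]
    a = [Fraction(0)] * n + [Fraction(1)]
    for l in range(n - 1, -1, -1):
        a[l] = -sum(a[k] * Z[k - l] for k in range(l + 1, n + 1)) / (n - l)
    return a


zeros = [2, -1, -1]
print(coefficients_from_zeros(zeros))        # coefficients of z^3 - 3z - 2
print(coefficients_from_power_sums(zeros))   # identical list
```

For this example the coefficient of $z^{n-2}$ is $-3$, which equals minus the number of links of $K_3$, and the coefficient of $z^{n-3}$ is $-2$, which equals minus twice the number of triangles.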
4. From (A.4), the coefficient of the characteristic polynomial $(-1)^N c_{N-2} = \sum_{\text{all}} M_2$. Each principal minor $M_2$ has a principal submatrix of the form $\begin{bmatrix} 0 & x \\ y & 0 \end{bmatrix}$ with $x, y \in \{0, 1\}$. A minor $M_2$ is non-zero if and only if $x = y = 1$ in which case $M_2 = -1$. For each set of adjacent nodes, there exists such non-zero minor, which implies that
$$(-1)^N c_{N-2} = -L$$
From art. 3, it follows that the number of links $L$ equals
$$L = \frac{1}{2} \sum_{k=1}^{N} \lambda_k^2 \tag{B.10}$$
5. Each principal submatrix $M_{3 \times 3}$ is of the form
$$M_{3 \times 3} = \begin{bmatrix} 0 & x & z \\ x & 0 & y \\ z & y & 0 \end{bmatrix}$$
with determinant $M_3 = \det M_{3 \times 3} = 2xyz$, which is only non-zero for $x = y = z = 1$. That form of $M_{3 \times 3}$ corresponds with a subgraph of 3 nodes that are fully connected. Hence, $(-1)^N c_{N-3} = -2 \times$ the number of triangles in $G$. From art. 3, it follows that
$$\text{the number of triangles in } G = \frac{1}{6} \sum_{k=1}^{N} \lambda_k^3 \tag{B.11}$$
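Since $\sum_{k} \lambda_k^m = \mathrm{trace}(A^m)$, the relations (B.10) and (B.11) can be checked without computing any eigenvalue. The sketch below does this for the graph of Fig. B.1; the link list is an assumption reconstructed from the adjacency relations of that figure, and the brute-force triangle count serves only as an independent reference.

```python
from itertools import combinations

# Assumed undirected link list of the example graph (nodes 1..6, L = 9).
edges = [(1, 2), (1, 3), (1, 6), (2, 3), (2, 5), (2, 6), (3, 4), (4, 5), (5, 6)]
N = 6
A = [[0] * N for _ in range(N)]
for i, j in edges:
    A[i - 1][j - 1] = A[j - 1][i - 1] = 1


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][r] * Y[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)

links_from_trace = trace(A2) // 2        # (B.10): L = (1/2) sum lambda_k^2
triangles_from_trace = trace(A3) // 6    # (B.11): (1/6) sum lambda_k^3

# Independent count: node triples that are pairwise adjacent.
triangles_brute = sum(1 for a, b, c in combinations(range(N), 3)
                      if A[a][b] and A[b][c] and A[a][c])

print(links_from_trace, len(edges))
print(triangles_from_trace, triangles_brute)
```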
6. In general, from (A.4) and by identifying the structure of a minor $M_k$, any coefficient $c_{N-k}$ can be expressed in terms of graph characteristics,
$$(-1)^N c_{N-k} = \sum_{\mathcal{G} \in G_k} (-1)^{\mathrm{cycles}(\mathcal{G})} \tag{B.12}$$
where $G_k$ is the set of all subgraphs of $G$ with exactly $k$ nodes and $\mathrm{cycles}(\mathcal{G})$ is the number of cycles in a subgraph $\mathcal{G} \in G_k$. The minor $M_k$ is a determinant of the $M_{k \times k}$ submatrix of $A$ and defined as
$$\det M_k = \sum_{p} (-1)^{\sigma(p)} a_{1 p_1} a_{2 p_2} \cdots a_{k p_k}$$
where the sum is over all $k!$ permutations $p = (p_1, p_2, \ldots, p_k)$ of $(1, 2, \ldots, k)$ and $\sigma(p)$ is the parity of $p$, i.e. the number of interchanges of $(1, 2, \ldots, k)$ to obtain $(p_1, p_2, \ldots, p_k)$. Only if all the links $(1, p_1), (2, p_2), \ldots, (k, p_k)$ are contained in $G$, $a_{1 p_1} a_{2 p_2} \cdots a_{k p_k}$ is non-zero. Since $a_{jj} = 0$, the sequence of contributing links $(1, p_1), (2, p_2), \ldots, (k, p_k)$ is a set of disjoint cycles and $\sigma(p)$ depends on the number of those disjoint cycles. Now, $\det M_k$ is constructed from a specific set $\mathcal{G} \in G_k$ of $k$ out of $N$ nodes and in total there are $\binom{N}{k}$ such sets in $G_k$. Combining all contributions leads to the expression (B.12).
7. Since $A$ is a symmetric 0-1 matrix, we observe that using (B.1),
$$\left( A^2 \right)_{ii} = \sum_{k=1}^{N} a_{ik} a_{ki} = \sum_{k=1}^{N} a_{ik}^2 = \sum_{k=1}^{N} a_{ik} = d_i$$
Hence, with (A.16) or (B.10), (A.7) and the basic law for the degree (B.2) is expressed as
$$\mathrm{trace}(A^2) = \sum_{k=1}^{N} \lambda_k^2 = \sum_{k=1}^{N} d_k = 2L \tag{B.13}$$
Furthermore,
$$\sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \left( A^2 \right)_{ij} = \sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \sum_{k=1}^{N} a_{ik} a_{kj} = \sum_{k=1}^{N} \sum_{i=1}^{N} a_{ki} \sum_{j=1; j \neq i}^{N} a_{kj} = \sum_{k=1}^{N} \sum_{i=1}^{N} a_{ki} (d_k - a_{ki}) = \sum_{k=1}^{N} \left( d_k \sum_{i=1}^{N} a_{ki} - \sum_{i=1}^{N} a_{ki} \right)$$
or
$$\sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \left( A^2 \right)_{ij} = \sum_{k=1}^{N} d_k (d_k - 1) \tag{B.14}$$
Lemma B.1.1 states that $\sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \left( A^2 \right)_{ij}$ equals twice the total number of two-hop walks with different source and destination nodes. In other words, the total number of connected triplets of nodes in $G$ equals half (B.14).
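8. The total number $N_k$ of walks of length $k$ in a graph follows from Lemma B.1.1 as
$$N_k = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A^k \right)_{ij}$$
Since any real symmetric matrix (Section A.2, art. 9) can be written as $A = U\, \mathrm{diag}(\lambda_j)\, U^T$ where $U$ is an orthogonal matrix of the (normalized) eigenvectors of $A$, we have that $A^k = U\, \mathrm{diag}(\lambda_j^k)\, U^T$ and $\left( A^k \right)_{ij} = \sum_{n=1}^{N} u_{in} u_{jn} \lambda_n^k$. Hence,
$$N_k = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{n=1}^{N} u_{in} u_{jn} \lambda_n^k = \sum_{n=1}^{N} \left( \sum_{i=1}^{N} u_{in} \right)^2 \lambda_n^k$$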
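9. Applying the Hadamard inequality for the determinant of any matrix $C_{n \times n}$,
$$|\det C| \le \prod_{j=1}^{n} \left( \sum_{i=1}^{n} |c_{ij}|^2 \right)^{\frac{1}{2}}$$
yields, with $a_{ij} = a_{ji}$ and (B.1),
$$|\det A| \le \prod_{j=1}^{N} \left( \sum_{i=1}^{N} a_{ji}^2 \right)^{\frac{1}{2}} = \prod_{j=1}^{N} \left( \sum_{i=1}^{N} a_{ji} \right)^{\frac{1}{2}} = \prod_{j=1}^{N} \sqrt{d_j}$$
Hence, with (A.6),
$$(\det A)^2 = \prod_{k=1}^{N} \lambda_k^2 \le \prod_{j=1}^{N} d_j \tag{B.15}$$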
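10. Applying the Cauchy-Schwarz inequality (5.17)
$$\left( \sum_{k=1}^{n} a_k b_k \right)^2 \le \sum_{k=1}^{n} a_k^2 \sum_{k=1}^{n} b_k^2$$
to the vector $(\lambda_2, \ldots, \lambda_N)$ and the all-one vector $(1, 1, \ldots, 1)$ gives
$$\left( \sum_{k=2}^{N} \lambda_k \right)^2 \le (N - 1) \sum_{k=2}^{N} \lambda_k^2$$
Introducing (B.6) and (B.13),
$$\lambda_1^2 \le (N - 1) \left( 2L - \lambda_1^2 \right)$$
leads to the bound for the largest (and positive) eigenvalue $\lambda_1$,
$$\lambda_1 \le \sqrt{\frac{2L (N - 1)}{N}} \tag{B.16}$$
Alternatively, in terms of the average degree $d_a = \frac{1}{N} \sum_{j=1}^{N} d_j = \frac{2L}{N}$, the largest eigenvalue $\lambda_1$ is bounded by the geometric mean of the average degree and the maximum possible degree, $\lambda_1 \le \sqrt{d_a (N - 1)}$. Combining the lower bound (B.5) and upper bound (B.16) yields
$$\frac{2L}{N} \le \lambda_1 \le \sqrt{\frac{2L (N - 1)}{N}} \tag{B.17}$$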
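The bounds (B.17) can be illustrated numerically. The sketch below estimates $\lambda_1$ by power iteration followed by a Rayleigh quotient; this numerical method is an assumption introduced here for illustration and is not taken from the text. It converges for connected graphs that are not bipartite, which holds for both examples: an assumed link list for the six-node graph of Fig. B.1 and the complete graph $K_4$, for which both bounds coincide.

```python
import math


def largest_eigenvalue(A, iterations=500):
    """Estimate lambda_1 of a symmetric non-negative matrix by power iteration."""
    n = len(A)
    x = [1.0 + 0.1 * i for i in range(n)]       # arbitrary positive start vector
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(Ax, x))    # Rayleigh quotient, x has unit norm


def bounds(A):
    """Lower bound 2L/N of (B.5) and upper bound sqrt(2L(N-1)/N) of (B.16)."""
    n = len(A)
    L = sum(sum(row) for row in A) // 2
    return 2 * L / n, math.sqrt(2 * L * (n - 1) / n)


edges = [(1, 2), (1, 3), (1, 6), (2, 3), (2, 5), (2, 6), (3, 4), (4, 5), (5, 6)]
A_fig = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A_fig[i - 1][j - 1] = A_fig[j - 1][i - 1] = 1

K4 = [[int(i != j) for j in range(4)] for i in range(4)]

for name, A in (("example graph", A_fig), ("K4", K4)):
    lower, upper = bounds(A)
    print(name, lower, largest_eigenvalue(A), upper)
```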
11. From the inequality (A.26) for Hölder $q$-norms, we find that, if
$$\sum_{k=2}^{N} |\lambda_k|^q < \lambda_1^q$$
then
$$\sum_{k=2}^{N} |\lambda_k|^p < \lambda_1^p$$
for $p > q > 0$. Since $\sum_{k=1}^{N} \lambda_k = 0$, not all $\lambda_k$ can be positive and combined with $\left| \sum_{k=2}^{N} \lambda_k^p \right| \le \sum_{k=2}^{N} |\lambda_k|^p$, we also have that $\left| \sum_{k=2}^{N} \lambda_k^p \right| < \lambda_1^p$. Applied to the case where $q = 2$ and $p = 3$ gives the following implication: if $\sum_{k=2}^{N} \lambda_k^2 < \lambda_1^2$ then $\left| \sum_{k=2}^{N} \lambda_k^3 \right| < \lambda_1^3$. In that case, the number of triangles given in (B.11) is
$$\text{the number of triangles in } G = \frac{1}{6} \lambda_1^3 + \frac{1}{6} \sum_{k=2}^{N} \lambda_k^3 \ge \frac{1}{6} \lambda_1^3 - \frac{1}{6} \left| \sum_{k=2}^{N} \lambda_k^3 \right| > 0$$
Hence, if $\sum_{k=2}^{N} \lambda_k^2 < \lambda_1^2$, then the number of triangles in $G$ is at least one. Equivalently, in view of (B.10), if $\lambda_1 > \sqrt{L}$ then the graph $G$ contains at least one triangle.
12. A Theorem of Turan states that
Theorem B.2.1 A graph $G$ with $N$ nodes and more than $\left[ \frac{N^2}{4} \right]$ links contains at least one triangle.
This theorem is a consequence of art. 7 and 11. For, $L > \left[ \frac{N^2}{4} \right]$ implies for integer $L$ that $L > \frac{N^2}{4}$, which is equivalent to $N < 2 \sqrt{L}$. Using this in the bound on the largest eigenvalue (B.5),
$$\lambda_1 \ge \frac{2L}{N} > \frac{2L}{2 \sqrt{L}} = \sqrt{L}$$
and $\lambda_1 > \sqrt{L}$ is precisely the condition in art. 11 to have at least one triangle.
13. The eigenvalues of the complete graph $K_N$ are $\lambda_1 = N - 1$ and $\lambda_2 = \cdots = \lambda_N = -1$. This follows by computing the determinant in (A.2) in the same way as in Section B.1, art. 3. Alternatively, the adjacency matrix of the complete graph is $J - I$ and, if $u^T = [1\ 1\ \cdots\ 1]$ is the all-one vector, then $J = u\, u^T$. A direct computation yields
$$\det (J - I - \lambda I) = \det \left( u\, u^T - (\lambda + 1) I \right) = (-(\lambda + 1))^N \det \left( I - \frac{u\, u^T}{\lambda + 1} \right)$$
Using (A.38) and $u^T u = N$,
$$\det (J - I - \lambda I) = (-(\lambda + 1))^N \left( 1 - \frac{N}{\lambda + 1} \right) = (-1)^{N} (\lambda + 1)^{N-1} (\lambda + 1 - N)$$
gives the eigenvalues of $K_N$. Since the number of links in $K_N$ is $L = \frac{N(N-1)}{2}$, we observe that the equality sign in (B.16) can occur. Since $L \le \frac{N(N-1)}{2}$ for any graph, the upper bound (B.16) shows that $\lambda_1 \le N - 1$ for any graph.
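14. The difference between the largest eigenvalue $\lambda_1$ and second largest $\lambda_2$ is never larger than $N$, i.e.
$$\lambda_1 - \lambda_2 \le N \tag{B.18}$$
Since $\lambda_1 > 0$ as indicated by (B.17) and $\lambda_k \le \lambda_2$ for $k \ge 2$, it follows from (B.6) that
$$0 = \sum_{k=1}^{N} \lambda_k = \lambda_1 + \sum_{k=2}^{N} \lambda_k \le \lambda_1 + (N - 1) \lambda_2$$
such that
$$\lambda_2 \ge -\frac{\lambda_1}{N - 1}$$
Hence,
$$\lambda_1 - \lambda_2 \le \lambda_1 + \frac{\lambda_1}{N - 1} = \frac{N \lambda_1}{N - 1}$$
Art. 13 states that the largest possible eigenvalue is $\lambda_1 = N - 1$ of the complete graph which proves (B.18). Again, the equality sign in (B.18) occurs in case of the complete graph.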
15. Regular graphs. Every node $j$ in a regular graph has the same degree $d_j = r$ and relation (B.1) indicates that each row sum of $A$ equals $r$.
Theorem B.2.2 The maximum degree $d_{\max} = \max_{1 \le j \le N} d_j$ is an eigenvalue of the adjacency matrix $A$ of a connected graph $G$ if and only if the corresponding graph is regular (i.e. $d_j = d_{\max}$ for all $j$).
Proof: If $x$ is an eigenvector of $A$ belonging to eigenvalue $\lambda = d_{\max}$, so is each vector $kx$ for each complex $k$ (Section A.1, art. 1). Thus, we can scale the eigenvector $x$ such that the maximum component, say $x_m = 1$, and $x_k \le 1$ for all $k$. The eigenvalue equation $A x = d_{\max} x$ for that maximum component $x_m$ is
$$d_{\max} x_m = d_{\max} = \sum_{j=1}^{N} a_{mj} x_j$$
which implies that all $x_j = 1$ whenever $a_{mj} = 1$, i.e. when the node $j$ is adjacent to node $m$. Hence, the degree of node $m$ is $d_m = d_{\max}$. For any node $j$ adjacent to $m$ for which the component $x_j = 1$, the same eigenvalue relation holds and thus $d_j = d_{\max}$. Continuing this process shows that every node $k \in G$ has the same degree $d_k = d_{\max}$ because $G$ is connected. Hence, $x = u$ where $u^T = [1\ 1\ \cdots\ 1]$. Conversely, if $G$ is connected and regular, then $\sum_{j=1}^{N} a_{mj} = d_{\max}$ for each $m$ such that $u$ is the eigenvector belonging to eigenvalue $\lambda = d_{\max}$, and the only possible eigenvector (as follows from art. 1). Hence, there is only one eigenvalue $d_{\max}$. $\square$
16. The characteristic polynomial of the complement $G^c$ is
$$\det(A^c - \lambda I) = \det(J - A - (\lambda+1) I)$$
$$= (-1)^N \det\left( (A + (\lambda+1) I)\left(I - (A + (\lambda+1) I)^{-1} J\right)\right)$$
$$= (-1)^N \det(A + (\lambda+1) I)\, \det\left(I - (A + (\lambda+1) I)^{-1} u.u^T\right)$$
where we have used that $J = u.u^T$ and $u$ is the all-one vector. Similar to the proof of Lemma A.4.4, we find
$$\det(A^c - \lambda I) = (-1)^N g(\lambda) \det(A + (\lambda+1) I) \tag{B.19}$$
where
$$g(\lambda) = 1 - u^T (A + (\lambda+1) I)^{-1} u$$
In general, $g(\lambda)$ is not a simple function of $\lambda$ although a little more is known. For example, $g(\lambda) = 1 - \left\| (A + (\lambda+1) I)^{-1/2} u \right\|_2^2$ which shows that $g(\lambda) \in (-\infty, 1]$. Unlike in the proof of Lemma A.4.4, $u$ is generally not an eigenvector of $A$ and we can write (Section A.1, art. 8)
$$(A + (\lambda+1) I)^{-1} = \frac{1}{\lambda+1} \sum_{k=0}^{\infty} \frac{(-1)^k A^k}{(\lambda+1)^k}$$
where the last sum $\sum_{k=0}^{\infty} A^k z^k$ can be interpreted as the matrix generating function of the number of walks of length $k$ (see Section B.1, art. 5 and art. 8). Since $A^k = U \,\mathrm{diag}(\lambda_j^k)\, U^T$ (Section A.2, art. 9) where the orthogonal matrix $U$ consists of eigenvectors $u_j$ of $A$, the matrix product
$$u^T U = [u.u_1 \ \ u.u_2 \ \cdots \ u.u_N] = \sqrt{N}\, [\cos\alpha_1 \ \ \cos\alpha_2 \ \cdots \ \cos\alpha_N]$$
where $\alpha_j$ is the angle between the eigenvector $u_j$ and the all-one vector $u$. Hence, $u^T A^k u = u^T U \,\mathrm{diag}(\lambda_j^k)\, U^T u = N \sum_{j=1}^{N} \lambda_j^k \cos^2\alpha_j$ and, with $\sum_{k=0}^{\infty} \frac{(-\lambda_j)^k}{(\lambda+1)^k} = \frac{\lambda+1}{\lambda+1+\lambda_j}$, we can write
$$g(\lambda) = 1 - N \sum_{j=1}^{N} \frac{\cos^2\alpha_j}{\lambda + 1 + \lambda_j}$$
With (A.5), we have $c(-\lambda-1) = \det(A + (\lambda+1) I) = \prod_{k=1}^{N} (\lambda_k + 1 + \lambda)$ and, hence,
$$\det(A^c - \lambda I) = \frac{(-1)^N}{N} \sum_{j=1}^{N} \left(\lambda + 1 + \lambda_j - N^2 \cos^2\alpha_j\right) \prod_{k=1;\,k\neq j}^{N} (\lambda_k + 1 + \lambda) \tag{B.20}$$
which shows that the poles of $g(\lambda)$ are precisely compensated by the zeros of the polynomial $c(-\lambda-1)$. Thus, the eigenvalues of $A^c$ are generally different from $\{-\lambda_j - 1\}_{1\le j\le N}$ where $\lambda_j$ is an eigenvalue of $A$. Only if $u$ is an eigenvector of $A$ corresponding with $\lambda_k$, then $g(\lambda) = \frac{\lambda + 1 + \lambda_k - N}{\lambda + 1 + \lambda_k}$ and all eigenvalues of $A^c$ belong to the set $\{-\lambda_j - 1\}_{1\le j\ne k\le N} \cup \{N - 1 - \lambda_k\}$. According to art. 15, $u$ is only an eigenvector when the graph is regular.
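The statement for regular graphs can be checked numerically. The following is a minimal sketch, assuming NumPy and taking the cycle on seven nodes as an arbitrary regular example; the helper names `cycle_adjacency` and `complement_adjacency` are illustrative only. It compares the spectrum of the complement with the set $\{-\lambda_j - 1\} \cup \{N - 1 - r\}$, where $r$ is the degree.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the cycle graph on n >= 3 nodes (regular, degree 2)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[(i + 1) % n, i] = 1.0
    return A

def complement_adjacency(A):
    """Adjacency matrix of the complement: all-one matrix minus identity minus A."""
    n = A.shape[0]
    return np.ones((n, n)) - np.eye(n) - A

N = 7
A = cycle_adjacency(N)
Ac = complement_adjacency(A)

eig_A = np.sort(np.linalg.eigvalsh(A))    # ascending; the last entry is the degree r
eig_Ac = np.sort(np.linalg.eigvalsh(Ac))

r = eig_A[-1]
# Predicted complement spectrum: -lambda_j - 1 for all but the largest eigenvalue,
# plus N - 1 - r belonging to the all-one eigenvector.
predicted = np.sort(np.append(-eig_A[:-1] - 1.0, N - 1 - r))
print(np.allclose(eig_Ac, predicted))
```

For a non-regular graph the same comparison generally fails, in line with (B.20).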
B.3 The stochastic matrix $P = \Delta^{-1} A$
The stochastic matrix $P = \Delta^{-1} A$, introduced in Section B.2, art. 1, characterizes a random walk on a graph. A random walk is described by a finite Markov chain that is time-reversible. Alternatively, a time-reversible Markov chain can be viewed as a random walk on an undirected graph. Random walks on graphs have many applications in different fields (see e.g. the survey by Lovász (1993)); perhaps the most important application is randomly searching or sampling.
The combination of Markov theory and algebra leads to interesting properties of $P = \Delta^{-1} A$. Section 9.3.1 and A.4.1 show that the left-eigenvector of $P$ belonging to eigenvalue $\lambda = 1$ is the steady-state vector $\pi$ (which is a $1 \times N$ row vector) and that the corresponding right-eigenvector is the all-one vector $u$, which essentially follows from (9.8) and which indicates that, at each discrete time step, precisely one transition occurs. These eigenvectors obey the eigenvalue equations $P^T \pi^T = \pi^T$ and $P u = u$ and the orthogonality relation $\pi u = 1$ (Section A.1, art. 3). If $d = (d_1, d_2, \ldots, d_N)$ is the degree vector, then the basic law for the degree (B.2) is written in vector form as $d^T u = 2L$, or, $\left(\frac{d}{2L}\right)^T u = 1$. Theorem 9.3.5 states that the steady-state eigenvector $\pi$ is unique such that the equations $\pi u = 1$ and $\left(\frac{d}{2L}\right)^T u = 1$ imply that the steady-state vector is
$$\pi = \left(\frac{d}{2L}\right)^T \quad \text{or} \quad \pi_j = \frac{d_j}{2L} \tag{B.21}$$
In general, the matrix $P$ is not symmetric, but, after a similarity transform $H = \Delta^{1/2}$, a symmetric matrix $R = \Delta^{1/2} P \Delta^{-1/2} = \Delta^{-1/2} A \Delta^{-1/2}$ is obtained whose eigenvalues are the same as those of $P$ (Section A.1, art. 4). The powerful property (Section A.2, art. 9) of symmetric matrices shows that all eigenvalues are real and that $R = U \,\mathrm{diag}(\lambda_R)\, U^T$, where the columns of the orthogonal matrix $U$ consist of the normalized eigenvectors $v_k$ that obey $v_j^T v_k = \delta_{jk}$. Explicitly written in terms of these eigenvectors gives
$$R = \sum_{k=1}^{N} \lambda_k v_k v_k^T$$
where, with Frobenius Theorem A.4.2, the real eigenvalues are ordered as $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N \ge -1$. If we exclude bipartite graphs (where the set of nodes is $\mathcal{N} = \mathcal{N}_1 \cup \mathcal{N}_2$ with $\mathcal{N}_1 \cap \mathcal{N}_2 = \emptyset$ and where each link connects a node in $\mathcal{N}_1$ and in $\mathcal{N}_2$) or reducible Markov chains (Section A.4), then $|\lambda_k| < 1$, for $k > 1$. Section A.1, art. 4 shows that the similarity transform $H = \Delta^{1/2}$ maps the steady state vector $\pi$ into $v_1 = H^{-1} \pi^T$ and, with (B.21),
$$v_1 = \frac{\Delta^{-1/2} \pi^T}{\left\| \Delta^{-1/2} \pi^T \right\|_2}$$
or
$$v_{1j} = \frac{\frac{\sqrt{d_j}}{2L}}{\sqrt{\sum_{j=1}^{N} \left(\frac{\sqrt{d_j}}{2L}\right)^2}} = \sqrt{\frac{d_j}{2L}} = \sqrt{\pi_j}$$
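Finally, since $P = \Delta^{-1/2} R \Delta^{1/2}$, the spectral decomposition of the transition probability matrix of a random walk on a graph with adjacency matrix $A$ is
$$P = \sum_{k=1}^{N} \lambda_k \Delta^{-1/2} v_k v_k^T \Delta^{1/2} = u\pi + \sum_{k=2}^{N} \lambda_k \Delta^{-1/2} v_k v_k^T \Delta^{1/2}$$
The $n$-step transition probability (9.10) is, with $\left(v_k v_k^T\right)_{ij} = v_{ki} v_{kj}$ and (B.21),
$$P^n_{ij} = \frac{d_j}{2L} + \sqrt{\frac{d_j}{d_i}} \sum_{k=2}^{N} \lambda_k^n v_{ki} v_{kj}$$
The convergence towards the steady state $\pi_j$ can be estimated from
$$\left| P^n_{ij} - \pi_j \right| \le \sqrt{\frac{d_j}{d_i}} \sum_{k=2}^{N} |\lambda_k^n|\, |v_{ki}|\, |v_{kj}| < \sqrt{\frac{d_j}{d_i}} \sum_{k=2}^{N} |\lambda_k|^n$$
Denoting by $\xi = \max(|\lambda_2|, |\lambda_N|)$ and by $\xi_0$ the largest element of the reduced set $\{|\lambda_k|\} \setminus \{\xi\}$ with $2 \le k \le N$, we obtain
$$\left| P^n_{ij} - \pi_j \right| < \sqrt{\frac{d_j}{d_i}}\, \xi^n + O(\xi_0^n)$$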
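A small numerical sketch may help to make the steady-state vector (B.21) and the geometric convergence concrete. It assumes NumPy and an arbitrarily chosen connected, non-bipartite graph on five nodes; the variable names are illustrative and not part of the text.

```python
import numpy as np

# Hypothetical example graph: connected and non-bipartite (it contains a triangle).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)            # degree vector
L = d.sum() / 2              # number of links, from d^T u = 2L
P = A / d[:, None]           # P = Delta^{-1} A, every row sums to 1
pi = d / (2 * L)             # steady-state vector (B.21)

# The symmetric matrix R = Delta^{-1/2} A Delta^{-1/2} has the same eigenvalues as P.
R = A / np.sqrt(np.outer(d, d))
lam = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order, lam[0] = 1
xi = max(abs(lam[1]), abs(lam[-1]))          # second largest eigenvalue in absolute value

n = 50
Pn = np.linalg.matrix_power(P, n)            # n-step transition probabilities
max_error = np.abs(Pn - pi[None, :]).max()   # largest deviation from the steady state
print(pi, xi, max_error)
```

The deviation `max_error` shrinks roughly like `xi**n`, which is why a value of `xi` close to 1 signals slow mixing of the random walk.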
B.4 Eigenvalues and connectivity
A graph $G$ has $k$ components (or clusters) if there exists a relabeling of the nodes such that the adjacency matrix has the structure
$$A = \begin{bmatrix} A_1 & O & \cdots & O \\ O & A_2 & & \vdots \\ \vdots & & \ddots & \\ O & \cdots & & A_k \end{bmatrix}$$
where the square submatrix $A_m$ is the adjacency matrix of the connected component $m$. Disconnectivity is a special case of reducibility of a stochastic matrix defined in Section A.4 and expresses that no communication is possible between two states in a different component or cluster. Using (A.39) indicates that
$$\det(A - \lambda I) = \prod_{m=1}^{k} \det(A_m - \lambda I) \tag{B.22}$$
If $G$ is a regular graph with degree $r$, so is each component with adjacency matrix $A_m$. Since each component is connected, Section B.2, art. 15 states that the largest eigenvalue of any $A_m$ equals $r$. Hence, by (B.22), the multiplicity of the largest eigenvalue of $A$ equals the number of components in the regular graph.
As shown in Section B.1, art. 2, the Laplacian $Q$ has non-negative eigenvalues of which at least one equals zero. In addition, the matrix
$$(N-1) I - Q = (N-1) I - \Delta + A$$
is non-negative with constant row sums all equal to $N - 1$. Although the matrix $(N-1) I - Q$ is not an adjacency matrix and does not represent a regular graph, the main argument in the proof of Theorem B.2.2 is the property of constant row sums and non-negative matrix elements. Hence, the multiplicity of the largest eigenvalue of $(N-1) I - Q$ is equal to the number of components of $G$. But the largest eigenvalue of $(N-1) I - Q$ corresponds to the smallest eigenvalue of $Q - (N-1) I$ and, hence, of $Q$. Hence, we have proved
Theorem B.4.1 The multiplicity of the smallest eigenvalue $\lambda = 0$ of the Laplacian $Q$ is equal to the number of components in the graph $G$.
If $Q$ has only 1 zero eigenvalue with corresponding eigenvector $u$ (because $\sum_{k=1}^{N} q_{ik} = 0$ for each $1 \le i \le N$ is, in vector notation, $Qu = 0$), then the graph is connected; it has only 1 component. Theorem B.4.1 also implies that, if the second smallest eigenvalue $\lambda_Q = \lambda^{(Q)}_{N-1}$ of $Q$ is zero, the graph $G$ is disconnected. Since the eigenvectors of a symmetric matrix can be chosen mutually orthogonal, the eigenvector $x_Q$ of $Q$ belonging to $\lambda_Q$ must satisfy $x_Q^T u = 0$ since $u$ is the eigenvector belonging to $\lambda = 0$. By requiring this additional constraint and choosing the scaling of the eigenvector such that $x^T x = 1$, we obtain similar to (A.35) that
$$\lambda_Q = \min_{\|x\|_2^2 = 1 \text{ and } x^T u = 0} x^T Q x$$
The second smallest eigenvalue $\lambda_Q$ has many interesting properties that characterize how strongly a graph $G$ is connected. It is interesting to mention the inequality (Cvetkovic et al., 1995, p. 265)
$$\kappa_v(G) \ge \lambda_Q \ge 2 \kappa_e(G) \left(1 - \cos\frac{\pi}{N}\right) \tag{B.23}$$
where $\kappa_v(G)$ and $\kappa_e(G)$ are the vertex and edge connectivity respectively.
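Theorem B.4.1 lends itself to a direct numerical check. The sketch below assumes NumPy and two small hand-made graphs (both hypothetical examples); it counts the vanishing Laplacian eigenvalues and reads off the second smallest eigenvalue of the connected graph.

```python
import numpy as np

def laplacian(A):
    """Laplacian Q = Delta - A of an undirected graph with adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def zero_eigenvalue_multiplicity(A, tol=1e-9):
    """Number of Laplacian eigenvalues that vanish, up to a numerical tolerance."""
    eigenvalues = np.linalg.eigvalsh(laplacian(A))
    return int(np.sum(np.abs(eigenvalues) < tol))

def adjacency_from_links(n, links):
    """Symmetric 0/1 adjacency matrix built from a list of links (i, j)."""
    A = np.zeros((n, n))
    for i, j in links:
        A[i, j] = A[j, i] = 1.0
    return A

# A triangle {0, 1, 2} and a separate link {3, 4}: two components.
two_parts = adjacency_from_links(5, [(0, 1), (1, 2), (0, 2), (3, 4)])
# Adding the link (2, 3) joins them into one component.
joined = adjacency_from_links(5, [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)])

components_two_parts = zero_eigenvalue_multiplicity(two_parts)
components_joined = zero_eigenvalue_multiplicity(joined)
# Second smallest Laplacian eigenvalue of the connected graph.
algebraic_connectivity = np.sort(np.linalg.eigvalsh(laplacian(joined)))[1]
print(components_two_parts, components_joined, algebraic_connectivity)
```

The tolerance `tol` is a pragmatic choice for floating-point eigenvalues; for large or badly scaled graphs it may need adjustment.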
B.5 Random matrix theory
Random matrix theory investigates the eigenvalues of an $N \times N$ matrix $A$ whose elements $a_{ij}$ are random variables with a given joint distribution. Even in case all elements $a_{ij}$ are independent, there does not exist a general expression for the distribution of the eigenvalues. However, in some particular cases (such as Gaussian elements $a_{ij}$), there exist nice results. Moreover, if the elements $a_{ij}$ are properly scaled, in various cases the spectrum in the limit $N \to \infty$ seems to converge rapidly to a deterministic limit distribution. The fascinating results of random matrix theory and applications from nuclear physics to the distributions of the non-trivial zeros of the Riemann Zeta function are discussed by Mehta (1991).
Random matrix theory immediately applies to the adjacency matrix of the random graph $G_p(N)$ where each element $a_{ij}$ is 1 with probability $p$ and zero with probability $1 - p$.
B.5.1 The spectrum of the random graph $G_p(N)$
Let $\lambda$ denote an arbitrary eigenvalue of the adjacency matrix of the random graph $G_p(N)$. Clearly, $\lambda$ is a random variable with mean $E[\lambda] = \frac{1}{N} \sum_{k=1}^{N} \lambda_k = 0$ because of (B.6). In addition, the variance $\mathrm{Var}[\lambda] = E[\lambda^2] = \frac{1}{N} \sum_{k=1}^{N} \lambda_k^2$ and from (B.10)
$$\mathrm{Var}[\lambda] = \frac{2L}{N} = p(N-1)$$
This result implies that, for fixed $p$ and large $N$, the eigenvalues of $G_p(N)$ grow as $O\left(\sqrt{N}\right)$, with the exception$^1$ of the largest eigenvalue $\lambda_1$.
The number of links $L$ in $G_p(N)$ is binomially distributed with mean $E[L] = p \frac{N(N-1)}{2}$. Taking the expectation of the bounds (B.17) on the largest eigenvalue gives
$$\frac{2}{N} E[L] \le E[\lambda_1] \le \sqrt{\frac{2(N-1)}{N}}\, E\left[\sqrt{L}\right]$$
Using (2.12) yields
$$E\left[\sqrt{L}\right] = \sum_{k=0}^{\binom{N}{2}} \sqrt{k}\, \Pr[L = k] = \sum_{k=0}^{\binom{N}{2}} \sqrt{k} \binom{\binom{N}{2}}{k} p^k (1-p)^{\binom{N}{2} - k}$$
Unfortunately, the sum cannot be expressed in closed form, but
$$\frac{1}{\sqrt{\binom{N}{2}}} \sum_{k=0}^{\binom{N}{2}} \sqrt{k} \binom{\binom{N}{2}}{k} p^k (1-p)^{\binom{N}{2} - k} \le \sqrt{p}$$
with equality for $N \to \infty$. In summary, for any $N$ and $p$,
$$p(N-1) \le E[\lambda_1] \le \sqrt{p}\,(N-1) \tag{B.24}$$
The degree distribution (15.11) of the random graph is a binomial distribution with mean $E[D_{rg}] = p(N-1)$ and $\mathrm{Var}[D_{rg}] = (N-1)p(1-p)$. The inequality (5.13) indicates that the degree $D_{rg}$ converges exponentially fast to the mean $E[D_{rg}]$ for fixed $p$ and large $N$, which means that the random graph tends to a regular graph with high probability. Section B.2, art. 1 states that $\lambda_1 \to p(N-1)$ with high probability. Comparison with the bounds (B.24) indicates that the upper bound is less tight than the lower bound and that the upper bound is only sharp when $p \to 1$, i.e. for the complete graph. Section B.2, art. 13 shows that only for the complete graph the upper bound is indeed exactly attained.
$^1$ It is known that, for large $N$, the second largest eigenvalue of $G_p(N)$ grows as $O\left(N^{\frac{1}{2}+\epsilon}\right)$.
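The bounds on the largest eigenvalue can be explored by simulation. The following is a rough sketch, assuming NumPy, a fixed seed and arbitrarily chosen values $N = 50$, $p = 0.3$; the function name `random_graph_adjacency` is hypothetical. It checks the per-graph bounds $\frac{2L}{N} \le \lambda_1 \le \sqrt{\frac{2L(N-1)}{N}}$ that underlie (B.24) and compares the sample mean of $\lambda_1$ with $p(N-1)$ and $\sqrt{p}\,(N-1)$.

```python
import numpy as np

def random_graph_adjacency(N, p, rng):
    """Adjacency matrix of one realization of the random graph G_p(N)."""
    upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    return upper + upper.T          # symmetric, zero diagonal

rng = np.random.default_rng(1)
N, p, runs = 50, 0.3, 100

lam1 = np.empty(runs)               # largest eigenvalue of each realization
links = np.empty(runs)              # number of links L of each realization
for r in range(runs):
    A = random_graph_adjacency(N, p, rng)
    lam1[r] = np.linalg.eigvalsh(A)[-1]
    links[r] = A.sum() / 2

lower = p * (N - 1)                 # lower bound in (B.24)
upper = np.sqrt(p) * (N - 1)        # upper bound in (B.24)
mean_lam1 = lam1.mean()
print(lower, mean_lam1, upper)
```

In such experiments the sample mean typically lies much closer to the lower bound, in agreement with the remark that the upper bound is the less tight of the two.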
B.5.2 Wigner's Semicircle Law
Wigner's Semicircle Law is the fundamental result in the spectral theory of large random matrices.
Theorem B.5.1 (Wigner's Semicircle Law) Let $A$ be a random $N \times N$ real symmetric matrix with independent and identically distributed elements $a_{ij}$ with $\sigma^2 = \mathrm{Var}[a_{ij}]$ and denote by $\lambda(A_N)$ an eigenvalue of the set of the $N$ real eigenvalues of the scaled matrix $A_N = \frac{A}{\sqrt{N}}$. The probability density function $f_{\lambda(A_N)}(x)$ of $\lambda(A_N)$ tends for $N \to \infty$ to
$$\lim_{N\to\infty} f_{\lambda(A_N)}(x) = \frac{1}{2\pi\sigma^2} \sqrt{4\sigma^2 - x^2}\; 1_{|x| \le 2\sigma} \tag{B.25}$$
Since Wigner's first proof (Wigner, 1955) of this Theorem and his subsequent generalizations (Wigner, 1957, 1958) many proofs have been published. However, none of them is short and easy enough to include here. Wigner's Semicircle Law illustrates that, for sufficiently large $N$, the distribution of the eigenvalues of $\frac{A}{\sqrt{N}}$ does not depend anymore on the probability distribution of the elements $a_{ij}$. Hence, Wigner's Semicircle Law exhibits a universal property of a class of large, real symmetric matrices with independent random elements. Mehta (1991) suspects that, for a much broader class of large random matrices, a mysterious yet unknown law of large numbers must be hidden. The scaling of $A$ by $\frac{1}{\sqrt{N}}$ can be understood from the previous Section B.5.1. The adjacency matrix of the random graph satisfies the conditions in Theorem B.5.1 with $\sigma^2 = p(1-p)$ and its eigenvalues (apart from the largest) grow as $O\left(\sqrt{N}\right)$. In order to obtain the finite limit distribution (B.25) scaling by $\frac{1}{\sqrt{N}}$ is necessary.
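The role of the scaling by $\frac{1}{\sqrt{N}}$ can be illustrated with a short experiment. The sketch below assumes NumPy, a fixed seed and the arbitrary choice $N = 400$, $p = 0.5$; the function name `semicircle_pdf` is illustrative. It compares a histogram of the scaled eigenvalues of one random graph, with the largest eigenvalue removed, against the density (B.25).

```python
import numpy as np

def semicircle_pdf(x, sigma):
    """Limit density (B.25): sqrt(4 sigma^2 - x^2) / (2 pi sigma^2) on |x| <= 2 sigma."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= 2 * sigma
    out = np.zeros_like(x)
    out[inside] = np.sqrt(4 * sigma**2 - x[inside]**2) / (2 * np.pi * sigma**2)
    return out

rng = np.random.default_rng(7)
N, p = 400, 0.5
sigma = np.sqrt(p * (1 - p))

# One realization of the adjacency matrix of G_p(N).
upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
A = upper + upper.T

eig = np.sort(np.linalg.eigvalsh(A / np.sqrt(N)))
bulk = eig[:-1]      # drop the largest eigenvalue, which is not covered by (B.25)

hist, edges = np.histogram(bulk, bins=20, range=(-2 * sigma, 2 * sigma), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.abs(hist - semicircle_pdf(centers, sigma)).max()
print(np.mean(bulk**2), sigma**2, max_gap)
```

The second moment of the bulk is close to $\sigma^2$, the variance of the semicircle density, while the histogram still shows the finite-$N$ irregularities discussed in the text.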
The spectrum of $G_p(50)$ together with the properly rescaled Wigner's Semicircle Law (B.25) is plotted in Fig. B.2. Already for this small value of $N$, we observe that Wigner's Semicircle Law is a reasonable approximation for the intermediate $p$-region. The largest eigenvalue $\lambda_1$ for finite $N$, which is distributed around $p(N-1)$ as demonstrated above and shown in Fig. B.2 but which is not incorporated in Wigner's Semicircle Law, influences the average $E[\lambda] = \frac{1}{N} \sum_{k=1}^{N} \lambda_k = 0$ and causes the major bulk of the pdf around $x = 0$ to shift leftward compared to Wigner's Semicircle Law, which is perfectly centered around $x = 0$.
The complement of $G_p(N)$ is $(G_p(N))^c = G_{1-p}(N)$, because a link in $G_p(N)$ is present with probability $p$ and absent with probability $1 - p$ and $(G_p(N))^c$ is also a random graph. For large $N$, there exists a large range of $p$ values for which both $p \ge p_c$ and $1 - p \ge p_c$ such that both $G_p(N)$ and $(G_p(N))^c$ are connected almost surely. Figure B.2 shows that the normalized spectra of $G_p(N)$ and $G_{1-p}(N)$ are, apart from a small shift and ignoring the largest eigenvalue, almost identical. Equation (B.20) indicates that the spectrum of a graph and its complement tend to each other if $\cos\alpha_j \to 0$ (except for the largest eigenvalue, whose eigenvector will tend to $u$). This seems to suggest that $G_p(N)$ and $G_{1-p}(N)$ are tending to a regular graph with degree $p(N-1)$ and $(1-p)(N-1)$ and that these regular graphs (even for small $N$) have nearly the same spectrum (apart from the largest eigenvalue $p(N-1)$ and $(1-p)(N-1)$ respectively): $\frac{\lambda_{1-p}}{\sqrt{N}} \simeq -\frac{\lambda_p}{\sqrt{N}} - \frac{1}{\sqrt{N}}$ where $\lambda_p$ is an eigenvalue of $G_p(N)$.

[Figure B.2: $f_\lambda(x)$ versus eigenvalue $x$ for $N = 50$ and $p = 0.1, 0.2, \ldots, 0.9$, with the Semicircle Law for $p = 0.5$ and the position $E[\lambda_1] = p(N-1)$ indicated.]
Fig. B.2. The probability density function of an eigenvalue in $G_p(50)$ for various $p$. Wigner's Semicircle Law, rescaled and for $p = 0.5$ ($\sigma^2 = \frac{1}{4}$), is shown in bold. We observe that the spectrum for $p$ and $1 - p$ is similar, but slightly shifted. The high peak for $p = 0.1$ reflects disconnectivity, while the high peak at $p = 0.9$ shows the tendency to the spectrum of the complete graph where $N - 1$ eigenvalues are precisely $-1$.
Figure B.3 shows the probability density function $f_\lambda(x)$ of the eigenvalues of the adjacency matrix $A$ of $G_p(N)$ with $N = 100$ together with the eigenvalues of the corresponding matrix $A_U$ where all one elements in the adjacency matrix of $G_p(100)$ are replaced by i.i.d. uniform random variables on $[0,1]$. Wigner's Semicircle Law provides an already better approximation than for $N = 50$. Since the elements of $A_U$ are always smaller (with probability 1) than those of $A$, the matrix norm $\|A_U\|_2 < \|A\|_2$, which implies by Section B.2, art. 1 that $\lambda_1(A_U) < \lambda_1(A)$. In addition, relation (B.13) shows that $\sum_{k=1}^{N} \lambda_k^2(A_U) < 2L$ such that $\mathrm{Var}[\lambda(A_U)] < \mathrm{Var}[\lambda(A)]$, which is manifested by a narrower and higher peaked pdf centered around $x = 0$.

[Figure B.3: $f_\lambda(x)$ versus eigenvalue $x$ for $N = 100$ and $p = 0.2, 0.3, \ldots, 0.9$, with the Semicircle Law for $p = 0.5$.]
Fig. B.3. The spectrum of the adjacency matrix of $G_p(100)$ (full lines) and of the corresponding matrix with i.i.d. uniform elements (dotted lines). The small peaks at higher values of $x$ are due to $\lambda_1$.
Appendix C
Solutions of problems
C.1 Probability theory (Chapter 2)
(i) Using the general formula (2.12) for a non-zero random variable $X$, we have
$$E[\log X] = \sum_{k=1}^{\infty} \log k \,\Pr[X = k]$$
while (2.18), $\varphi_X(z) = \sum_{k=0}^{\infty} \Pr[X = k]\, z^k$, shows that we need to express $\log k$ in terms of $z^k$. A possible solution starts from the double integral with $0 < a \le b$,
$$\int_a^b dx \int_0^{\infty} dt\, e^{-tx} = \int_0^{\infty} dt \int_a^b dx\, e^{-tx}$$
where the reversal of integration is justified by absolute convergence (Titchmarsh, 1964, Section 1.8). Since $\int_0^{\infty} dt\, e^{-tx} = \frac{1}{x}$, the left-hand side integral equals
$$\int_a^b dx \int_0^{\infty} dt\, e^{-tx} = \log\frac{b}{a}$$
while the integral at the right-hand side is
$$\int_0^{\infty} dt \int_a^b dx\, e^{-tx} = \int_0^{\infty} \frac{e^{-ta} - e^{-tb}}{t}\, dt$$
hence,
$$\log k = \int_0^{\infty} \frac{e^{-t} - e^{-tk}}{t}\, dt$$
Multiplying both sides by $\Pr[X = k]$ and summing over $k$, we obtain (reversal in operators is justified on absolute convergence),
$$\sum_{k=1}^{\infty} \log k \,\Pr[X = k] = \int_0^{\infty} \frac{dt}{t} \sum_{k=1}^{\infty} \left(e^{-t} - e^{-tk}\right) \Pr[X = k]$$
$$= \int_0^{\infty} \frac{dt}{t} \left( e^{-t} \sum_{k=1}^{\infty} \Pr[X = k] - \sum_{k=1}^{\infty} e^{-tk} \Pr[X = k] \right)$$
which finally gives with (2.18)
$$E[\log X] = \int_0^{\infty} \frac{e^{-t} - \varphi_X(e^{-t}) + \varphi_X(0)}{t}\, dt$$
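The integral representation of $\log k$ used above can be verified numerically. The following sketch assumes NumPy and a plain midpoint rule on a truncated interval; the function name `log_via_integral` and the truncation point `t_max` are illustrative choices, not part of the solution.

```python
import numpy as np

def log_via_integral(k, t_max=60.0, steps=600_000):
    """Approximate log k = int_0^inf (e^{-t} - e^{-tk}) / t dt with the midpoint rule.

    The integrand tends to k - 1 for t -> 0, so midpoints avoid the division by zero,
    and the tail beyond t_max is exponentially small.
    """
    h = t_max / steps
    t = (np.arange(steps) + 0.5) * h
    integrand = (np.exp(-t) - np.exp(-k * t)) / t
    return integrand.sum() * h

for k in (1, 2, 5, 10):
    print(k, log_via_integral(k), np.log(k))
```

The same quadrature could be applied to the final expression for $E[\log X]$ once a probability generating function $\varphi_X$ is given.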
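(ii) (a) The pdf of the $k$-th smallest order statistic follows from (3.36) for an exponential distribution as
$$f_{X_{(k)}}(x) = m\alpha \binom{m-1}{k-1} \left(1 - e^{-\alpha x}\right)^{k-1} e^{-\alpha(m-k+1)x}$$
The probability generating function (2.37) is
$$\varphi_{X_{(k)}}(z) = E\left[e^{-zX_{(k)}}\right] = m\alpha \binom{m-1}{k-1} \int_0^{\infty} \left(1 - e^{-\alpha t}\right)^{k-1} e^{-(z + \alpha(m-k+1))t}\, dt$$
Let $u = e^{-\alpha t}$ and $\beta = z + \alpha(m-k+1)$, then the integral reduces to the well-known Beta function (Abramowitz and Stegun, 1968, Section 6.2.)
$$\int_0^{\infty} \left(1 - e^{-\alpha t}\right)^{k-1} e^{-\beta t}\, dt = \frac{1}{\alpha} \int_0^1 (1-u)^{k-1} u^{\frac{\beta}{\alpha} - 1}\, du = \frac{1}{\alpha} B\left(k, \frac{\beta}{\alpha}\right) = \frac{1}{\alpha} \frac{\Gamma(k)\, \Gamma\left(\frac{\beta}{\alpha}\right)}{\Gamma\left(k + \frac{\beta}{\alpha}\right)}$$
Hence,
$$\varphi_{X_{(k)}}(z) = \frac{m!}{(m-k)!} \frac{\Gamma\left(\frac{z}{\alpha} + m + 1 - k\right)}{\Gamma\left(\frac{z}{\alpha} + m + 1\right)} = \frac{m!}{(m-k)!} \prod_{j=0}^{k-1} \frac{1}{\frac{z}{\alpha} + m - j}$$
The mean follows from $E\left[X_{(k)}\right] = -L'_{X_{(k)}}(0)$ where $L_X$ is the logarithm of the generating function (2.41) as
$$E\left[X_{(k)}\right] = \frac{1}{\alpha} \sum_{j=0}^{k-1} \frac{1}{m - j} \tag{C.1}$$
(b) For a polynomial probability density function $f_X(x) = \alpha x^{\alpha-1} 1_{x \in [0,1]}$ with $\alpha > 0$, we have with (3.36) for $x \in [0,1]$ that
$$f_{X_{(k)}}(x) = m\alpha \binom{m-1}{k-1} x^{\alpha k - 1} \left(1 - x^{\alpha}\right)^{m-k}$$
with mean
$$E\left[X_{(k)}\right] = m\alpha \binom{m-1}{k-1} \int_0^1 x^{\alpha k} \left(1 - x^{\alpha}\right)^{m-k} dx$$
$$= m \binom{m-1}{k-1} \int_0^1 t^{k + \frac{1}{\alpha} - 1} (1-t)^{m-k}\, dt = \frac{m!}{(k-1)!} \frac{\Gamma\left(k + \frac{1}{\alpha}\right)}{\Gamma\left(m + 1 + \frac{1}{\alpha}\right)}$$
If $\alpha \to \infty$, then $E\left[X_{(k)}\right] = 1$, while for $\alpha \to 0$, $E\left[X_{(k)}\right] = 0$. For a uniform distribution where $\alpha = 1$, the result is $E\left[X_{(k)}\right] = \frac{k}{m+1}$. Indeed, the $m$ independently chosen uniform random variables divide, after ordering, the line segment $[0,1]$ into $m + 1$ subintervals. The length $L$ of each subinterval has the same distribution, which more easily follows by symmetry if the line segment is replaced by a circle of unit perimeter. Since the length $L$ of each subinterval is equal in distribution, one can consider the first subinterval $[0, X_{(1)}]$ whose length $L$ exceeds a value $x \in (0,1)$ if and only if all $m$ uniform random variables belong to $[x, 1]$. The latter event has probability equal to $(1-x)^m$ such that $\Pr[L > x] = (1-x)^m$ and, with (2.35), $E[L] = \frac{1}{m+1}$.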
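Both closed forms for the mean of the $k$-th order statistic can be checked with a few lines of code. The sketch assumes NumPy, a fixed seed and arbitrarily chosen parameters $m = 5$, $k = 2$, $\alpha = 2$; the function names are hypothetical.

```python
import math
import numpy as np

def mean_order_statistic_exponential(k, m, alpha):
    """Formula (C.1): mean of the k-th smallest of m i.i.d. exponentials with rate alpha."""
    return sum(1.0 / (m - j) for j in range(k)) / alpha

def mean_order_statistic_polynomial(k, m, alpha):
    """Mean of the k-th smallest of m i.i.d. variables with pdf alpha * x^(alpha-1) on [0, 1]."""
    return (math.factorial(m) / math.factorial(k - 1)
            * math.gamma(k + 1.0 / alpha) / math.gamma(m + 1 + 1.0 / alpha))

rng = np.random.default_rng(3)
m, k, alpha = 5, 2, 2.0

# Monte Carlo estimate for the exponential case: sort each row and pick the k-th smallest.
samples = rng.exponential(scale=1.0 / alpha, size=(200_000, m))
kth_smallest = np.sort(samples, axis=1)[:, k - 1]
estimate = kth_smallest.mean()
exact = mean_order_statistic_exponential(k, m, alpha)
print(estimate, exact)
```

For $\alpha = 1$ the polynomial case reduces to the uniform distribution, where `mean_order_statistic_polynomial(k, m, 1.0)` returns $\frac{k}{m+1}$.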
(iii) If $X$ were a discrete random variable, then $\Pr[X = k] \approx \frac{n_k}{n}$, where $n_k$ is the number of values in the set $\{x_1, x_2, \ldots, x_n\}$ that is equal to $k$. For a continuous random variable $X$, the values are generally real numbers ranging from $x_{\min} = \min_{1\le j\le n} x_j$ until $x_{\max} = \max_{1\le j\le n} x_j$. We first construct a histogram $H$ from the set $\{x_1, x_2, \ldots, x_n\}$ by choosing a bin size $\Delta x = \frac{x_{\max} - x_{\min}}{m}$, where $m$ is the number of bins (abscissa points). The choice of $1 < m < n$ is in general difficult to determine. However, most computer packages allow us to experiment with $m$ and the human eye proves sensitive enough to make a good choice of $m$: if $m$ is too small, we lose details, while a high $m$ may lead to high irregularities due to the stochastic nature of $X$. Once $m$ is chosen, the histogram consists of the set $\{h_0, h_1, \ldots, h_{m-1}\}$ where $h_j$ equals the number of $X$ values in the set $\{x_1, x_2, \ldots, x_n\}$ that lies in the interval $[x_{\min} + j\Delta x, x_{\min} + (j+1)\Delta x]$ for $0 \le j \le m-1$. By construction, $\sum_{j=0}^{m-1} h_j = n$.
The histogram $H$ approximates the probability density function $f_X(x)$ after dividing each value $h_j$ by $n\Delta x$ because
$$1 = \int_{x_{\min}}^{x_{\max}} f_X(x)\, dx = \lim_{\Delta x \to 0} \sum_{j=0}^{\frac{x_{\max} - x_{\min}}{\Delta x} - 1} f_X(\xi_j)\, \Delta x \approx \sum_{j=0}^{m-1} \frac{h_j}{n\Delta x}\, \Delta x = 1$$
where in the Riemann sum $\xi_j$ denotes a real number $\xi_j \in [x_{\min} + j\Delta x, x_{\min} + (j+1)\Delta x]$. Alternatively from (2.31) we obtain
$$f_X(\xi_j) = \lim_{\Delta x \to 0} \frac{\Pr[\xi_j < X \le \xi_j + \Delta x]}{\Delta x} \approx \frac{\Pr[x_{\min} + j\Delta x < X \le x_{\min} + (j+1)\Delta x]}{\Delta x}$$
such that
$$f_X(\xi_j) \approx \frac{h_j}{n\Delta x}$$
which reduces to the discrete case where $\Delta x = 1$.
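The construction of the normalized histogram can be sketched as follows, assuming NumPy and a sample of real values; the function name `histogram_density` and the choice of a standard normal test sample are illustrative assumptions.

```python
import numpy as np

def histogram_density(x, m):
    """Estimate a pdf from the samples x with m equal bins between min(x) and max(x).

    Returns the bin centers, the estimates h_j / (n * dx) and the bin size dx.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x_min, x_max = x.min(), x.max()
    dx = (x_max - x_min) / m
    # Bin index j with x_min + j*dx <= x < x_min + (j+1)*dx; the maximum goes in the last bin.
    j = np.minimum(((x - x_min) / dx).astype(int), m - 1)
    h = np.bincount(j, minlength=m)        # counts h_0, ..., h_{m-1}, summing to n
    centers = x_min + (np.arange(m) + 0.5) * dx
    return centers, h / (n * dx), dx

rng = np.random.default_rng(0)
data = rng.normal(size=10_000)
centers, density, dx = histogram_density(data, m=40)
print(density.sum() * dx)
```

Experimenting with `m` shows the trade-off mentioned above: few bins blur the shape, many bins produce a ragged estimate.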
(iv) The density of mobile nodes in the circle with radius $r$ equals $\nu = \frac{N}{\pi r^2}$. Let $R$ denote the (random) position of a mobile node. The probability that there is a mobile node between distance $x$ and $x + dx$ (and $x \le r$) is
$$\Pr[x \le R \le x + dx] = \frac{2\pi x\, dx}{\pi r^2}$$
From (2.31), the pdf of $R$ equals $f_R(x) = \frac{2x}{r^2} 1_{x \le r}$ and the distribution function follows by integration as $F_R(x) = \frac{x^2}{r^2} 1_{x \le r} + 1_{x > r}$. The (random) position $R_{(m)}$ of the $m$-th nearest mobile node to the center is given by (3.36)
$$f_{R_{(m)}}(x) = N f_R(x) \binom{N-1}{m-1} \left(F_R(x)\right)^{m-1} \left(1 - F_R(x)\right)^{N-m}$$
Written in terms of the density $\nu$ for $x \le r$,
$$f_{R_{(m)}}(x) = 2\pi\nu x \binom{N-1}{m-1} \left(\frac{\pi\nu x^2}{N}\right)^{m-1} \left(1 - \frac{\pi\nu x^2}{N}\right)^{N-1-(m-1)}$$
we recognize, apart from the prefactor $2\pi\nu x$, a binomial distribution (3.3) with $p = \frac{\pi\nu x^2}{N}$. Similar to the derivation of the law of rare events in Section 3.1.4, this binomial distribution tends, for large $N$ but constant density $\nu$, to a Poisson distribution with $\lambda = \pi\nu x^2$. Hence, asymptotically, the pdf of the position $R_{(m)}$ of the $m$-th nearest mobile node to the center is, for $x \le r$,
$$f_{R_{(m)}}(x) = 2\pi\nu x\, \frac{\left(\pi\nu x^2\right)^{m-1}}{(m-1)!}\, e^{-\pi\nu x^2}$$
(v) We use the law of total probability (2.46) first assuming that $W$ is discrete,
$$\Pr[V - W \le x] = \sum_k \Pr[V - W \le x \,|\, W = k] \Pr[W = k]$$
and, by independence, $\Pr[V - W \le x \,|\, W = k] = \Pr[V \le k + x]$. Hence,
$$\Pr[V - W \le x] = \sum_k \Pr[V \le k + x] \Pr[W = k]$$
If $W$ is continuous, the general formula is
$$\Pr[V - W \le x] = \int_{-\infty}^{\infty} \Pr[V \le x + y]\, \frac{d\Pr[W \le y]}{dy}\, dy \tag{C.2}$$
from which the pdf follows by differentiation
$$f_{V-W}(x) = \int_{-\infty}^{\infty} f_V(x + y) f_W(y)\, dy$$
This resembles the convolution integral (2.62). If both $V$ and $W$ have the same distribution, direct integration of (C.2) yields
$$\Pr[V \le W] = \Pr[W \le V] = \frac{1}{2}$$
This equation confirms the intuitive result that two independent random variables with the same density function have equal probability to be larger or smaller than the other.
C.2 Correlation (Chapter 4)
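(i) In two dimensions, formula (4.2) becomes
$$E\left[e^{-z_1 X - z_2 Y}\right] = \exp\left[-z_2\mu_Y - \mu_X z_1 + \frac{\sigma_X^2}{2} z_1^2 + \rho z_1 z_2 \sigma_Y \sigma_X + \frac{\sigma_Y^2}{2} z_2^2\right]$$
$$= \exp\left[-z_2\mu_Y - \mu_X z_1 + \frac{1}{2}\left(\sigma_X z_1 + \rho z_2 \sigma_Y\right)^2 + \frac{\sigma_Y^2\left(1 - \rho^2\right)}{2} z_2^2\right]$$
Hence, with (4.21) the joint probability distribution is
$$f_{XY}(x, y; \rho) = \frac{1}{(2\pi i)^2} \int_{c_1 - i\infty}^{c_1 + i\infty} \int_{c_2 - i\infty}^{c_2 + i\infty} e^{z_1(x - \mu_X) + z_2(y - \mu_Y)}\, e^{\frac{1}{2}\left(\sigma_X z_1 + \rho z_2 \sigma_Y\right)^2 + \frac{\sigma_Y^2(1-\rho^2)}{2} z_2^2}\, dz_1\, dz_2$$
$$= \frac{1}{2\pi i} \int_{c_2 - i\infty}^{c_2 + i\infty} e^{\frac{\sigma_Y^2(1-\rho^2)}{2} z_2^2}\, e^{z_2(y - \mu_Y)}\, dz_2 \times \frac{1}{2\pi i} \int_{c_1 - i\infty}^{c_1 + i\infty} e^{z_1(x - \mu_X)}\, e^{\frac{1}{2}\left(\sigma_X z_1 + \rho z_2 \sigma_Y\right)^2}\, dz_1$$
Evaluating the last integral, denoted by $I$, yields
$$I = \frac{1}{2\pi i} \int_{c_1 - i\infty}^{c_1 + i\infty} e^{z_1(x - \mu_X)} \exp\left[\frac{1}{2}\left(\sigma_X z_1 + \rho z_2 \sigma_Y\right)^2\right] dz_1$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{(c_1 + it)(x - \mu_X)} \exp\left[\frac{1}{2}\left(\sigma_X (c_1 + it) + \rho z_2 \sigma_Y\right)^2\right] dt$$
$$= \frac{1}{2\pi} e^{c_1(x - \mu_X)} \int_{-\infty}^{\infty} e^{it(x - \mu_X)} \exp\left[-\frac{\sigma_X^2}{2}\left(t - i\left(c_1 + \frac{\rho z_2 \sigma_Y}{\sigma_X}\right)\right)^2\right] dt$$
Since the integrand is an entire function, the contour can be shifted, which allows substitution as in real analysis. Thus, let $u = t - i\left(c_1 + \frac{\rho z_2 \sigma_Y}{\sigma_X}\right)$, then
$$I = \frac{1}{2\pi} e^{c_1(x - \mu_X)} \int_{-\infty}^{\infty} e^{i\left[u + i\left(c_1 + \frac{\rho z_2 \sigma_Y}{\sigma_X}\right)\right](x - \mu_X)} \exp\left[-\frac{\sigma_X^2}{2} u^2\right] du$$
$$= \frac{1}{2\pi} e^{-\rho z_2 \frac{\sigma_Y}{\sigma_X}(x - \mu_X)} \int_{-\infty}^{\infty} \exp\left[-\frac{\sigma_X^2}{2}\left(u^2 - 2iu\frac{(x - \mu_X)}{\sigma_X^2}\right)\right] du$$
$$= \frac{1}{2\pi} e^{-\rho z_2 \frac{\sigma_Y}{\sigma_X}(x - \mu_X)} \exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right] \int_{-\infty}^{\infty} \exp\left[-\frac{\sigma_X^2}{2}\left(u - i\frac{(x - \mu_X)}{\sigma_X^2}\right)^2\right] du$$
By substituting $t = u - i\frac{(x - \mu_X)}{\sigma_X^2}$, the integral becomes
$$\int_{-\infty}^{\infty} \exp\left[-\frac{\sigma_X^2}{2} t^2\right] dt = 2\int_0^{\infty} \exp\left[-\frac{\sigma_X^2}{2} t^2\right] dt = \frac{\sqrt{2}}{\sigma_X} \int_0^{\infty} e^{-w} w^{-1/2}\, dw = \frac{\sqrt{2}}{\sigma_X} \Gamma\left(\frac{1}{2}\right) = \frac{\sqrt{2\pi}}{\sigma_X}$$
where we have used the Gamma function (Abramowitz and Stegun, 1968, Chapter 6). Hence,
$$I = \frac{e^{-\rho z_2 \frac{\sigma_Y}{\sigma_X}(x - \mu_X)}}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right]$$
and
$$f_{XY}(x, y; \rho) = \frac{\exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right]}{\sigma_X \sqrt{2\pi}}\, \frac{1}{2\pi i} \int_{c_2 - i\infty}^{c_2 + i\infty} e^{\frac{\sigma_Y^2(1-\rho^2)}{2} z_2^2}\, e^{-z_2\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)}\, e^{z_2 y}\, dz_2$$
The last integral is recognized with (3.22) as the inverse Laplace transform of a Gaussian with variance $\sigma^2 = \sigma_Y^2\left(1 - \rho^2\right)$ and mean $\mu = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$. Thus
$$f_{XY}(x, y; \rho) = \frac{\exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right]}{\sigma_X \sqrt{2\pi}}\; \frac{\exp\left[-\frac{\left(y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)^2}{2\sigma_Y^2(1 - \rho^2)}\right]}{\sigma_Y \sqrt{1 - \rho^2}\, \sqrt{2\pi}}$$
which finally leads to the joint Gaussian density function (4.4). Hence, the linear combination method leads to exact results for Gaussian random variables.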
C.3 Poisson process (Chapter 7)
(i) Let $Y$ be a binomial random variable with parameters $N$ and $p$, where $N$ is a Poisson random variable with parameter $\lambda$. The probability density function of $Y$ is obtained by applying the law of total probability (2.46),
$$\Pr[Y = k] = \sum_{n=0}^{\infty} \Pr[Y = k \,|\, N = n] \Pr[N = n]$$
With (3.3) and (3.9), we have
$$\Pr[Y = k] = \sum_{n=0}^{\infty} \binom{n}{k} p^k q^{(n-k)} \frac{\lambda^n e^{-\lambda}}{n!} = \frac{p^k}{k!} e^{-\lambda} \sum_{n=k}^{\infty} \frac{q^{(n-k)} \lambda^n}{(n-k)!}$$
$$= \frac{p^k \lambda^k}{k!} e^{-\lambda} \sum_{n=0}^{\infty} \frac{q^n \lambda^n}{n!} = \frac{(\lambda p)^k}{k!} e^{-\lambda + \lambda q}$$
Since $q = 1 - p$, we arrive at $\Pr[Y = k] = \frac{(\lambda p)^k}{k!} e^{-\lambda p}$, which means that $Y$ is a Poisson random variable with mean $\lambda p$. If a sufficient sample of test strings defined above is sent and received, the average number of "one bits" at the receiver divided by the average number of bits at the sender gives the probability $p$ (if errors occur indeed independently).
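The thinning result can be confirmed by evaluating the sum over $n$ directly. The sketch below uses only the Python standard library; the parameter values $\lambda = 4$, $p = 0.3$ and the truncation point `n_max` are arbitrary assumptions made for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(k, mean):
    """Pr[X = k] for a Poisson random variable with the given mean."""
    return mean**k * exp(-mean) / factorial(k)

def thinned_pmf(k, lam, p, n_max=150):
    """Pr[Y = k] via the law of total probability, truncating the sum over n at n_max."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * poisson_pmf(n, lam)
               for n in range(k, n_max + 1))

lam, p = 4.0, 0.3
for k in range(6):
    # Both columns should agree: the mixture is Poisson with mean lam * p.
    print(k, thinned_pmf(k, lam, p), poisson_pmf(k, lam * p))
```

The truncation at `n_max` is harmless here because the Poisson tail beyond it is negligible for moderate $\lambda$.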
(ii) Since the counting process of a sum of Poisson processes is again a Poisson counting process with rate equal to $\sum_{j=1}^{4} \lambda_j$, the average number of packets of the four classes in the router's buffers during interval $T$ is $\lambda = T \sum_{j=1}^{4} \lambda_j$. Hence, the probability density function for the total number $N$ of arrivals is $\Pr[N = n] = \frac{\lambda^n}{n!} e^{-\lambda}$.
(iii) Theorem 7.3.4 states that the sum $X(t) = X_1(t) + X_2(t)$ is a Poisson counting process with rate $\lambda_1 + \lambda_2$. Then,
$$\Pr[X_1(t) = 1 \,|\, X(t) = 1] = \frac{\Pr[\{X_1(t) = 1\} \cap \{X(t) = 1\}]}{\Pr[X(t) = 1]} = \frac{\Pr[\{X_1(t) = 1\} \cap \{X_2(t) = 0\}]}{\Pr[X(t) = 1]}$$
$$= \frac{\Pr[X_1(t) = 1] \Pr[X_2(t) = 0]}{\Pr[X(t) = 1]} = \frac{\lambda_1}{\lambda_1 + \lambda_2}$$
since the Poisson random variables $X_1$ and $X_2$ are independent. As an application we can consider a Poissonean arrival flow of packets at a router with rate $\lambda$. If the packets are marked randomly with probability $p = \frac{\lambda_1}{\lambda}$, the resulting flow consists of two types, those marked and those not. Each of these flows is again a Poisson flow, the marked flow with rate $\lambda_1 = p\lambda$ and the non-marked flow with $\lambda_2 = (1 - p)\lambda$. Actually, this procedure leads to a decomposition of the Poisson process into two independent Poisson processes and leads to the reverse of Theorem 7.3.4.
(iv) (a) Applying the solution of previous exercise immediately gives $\frac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3}$.
(b) Since the three Poisson processes are independent, the total number of cars on the three lanes, denoted by $X$, is also a Poisson process (Theorem 7.3.4) with rate $\lambda = \lambda_1 + \lambda_2 + \lambda_3$. Hence, $\Pr[X = n] = \frac{\lambda^n}{n!} e^{-\lambda}$.
(c) Let us denote the Poisson process in lane $j$ by $X_j$. Then, using the independence between the $X_j$,
$$\Pr[X_1 = n, X_2 = 0, X_3 = 0] = \Pr[X_1 = n] \Pr[X_2 = 0] \Pr[X_3 = 0] = \frac{\lambda_1^n}{n!} e^{-\lambda_1} e^{-\lambda_2} e^{-\lambda_3} = \frac{\lambda_1^n}{n!} e^{-\lambda}$$
(v) (a) The player relies on the fact that during the time there is exactly one arrival. Since the game rules mention that he should identify the last signal in $(0, T)$, signals arriving during $(0, s)$ do not influence his chance to win because of the memoryless property of the Poisson process. The number of arrivals in the interval $(s, T)$ obeys a Poisson distribution with parameter $\lambda(T - s)$. The probability that precisely one signal arrives in the interval $(s, T)$ is $\Pr[N(T) - N(s) = 1] = \lambda(T - s) e^{-\lambda(T - s)}$.
(b) Maximizing this winning probability with respect to $s$ (by equating the first derivative to zero) yields
$$\frac{d}{ds} \Pr[N(T) - N(s) = 1] = -\lambda e^{-\lambda(T - s)} + \lambda^2 (T - s) e^{-\lambda(T - s)} = 0$$
with solution $\lambda(T - s) = 1$ or $s = T - 1/\lambda$. This maximum (which is readily verified by checking that $\frac{d^2}{ds^2} \Pr[N(T) - N(s) = 1] < 0$) lies inside the allowed interval $(0, T)$. The maximum probability of winning is $\Pr[N(T) - N(T - 1/\lambda) = 1] = 1/e$.
(vi) (a) We apply the general formula (7.1) for the pdf of a Poisson process with mean $E[X(t)] = \lambda t = 1$. Then, $\Pr[X(t + s) - X(s) = 0] = e^{-\lambda t} = \frac{1}{e}$.
(b) $\Pr[X(t + s) - X(s) > 10] = 1 - \Pr[X(t + s) - X(s) \le 10] = 1 - \frac{1}{e} \sum_{k=0}^{10} \frac{1}{k!}$.
(c) Each minute is equally probable as follows from Theorem 7.3.3.
(vii) This exercise is an application of randomly marking in a Poisson flow as explained in solution (iii) above. The total flow of packets can be split up into an ACK stream, a Poisson process $N_1$ with rate $\lambda p = 3\ \mathrm{s}^{-1}$ and a data flow, an independent Poisson process $N_2$ with rate $\lambda(1 - p) = 7\ \mathrm{s}^{-1}$. Then,
(a) $\Pr[N_1 \ge 1] = 1 - \Pr[N_1 = 0] = 1 - e^{-3}$
(b) The average number is $E[N_1 + N_2 \,|\, N_1 = 5] = E[N_1 \,|\, N_1 = 5] + E[N_2 \,|\, N_1 = 5] = 5 + E[N_2] = 5 + 7 = 12$ packets.
(c) $\Pr[N_1 = 2 \,|\, N_1 + N_2 = 8] = \frac{\Pr[N_1 = 2,\, N_1 + N_2 = 8]}{\Pr[N_1 + N_2 = 8]} = \frac{\frac{3^2}{2!} e^{-3}\, \frac{7^6}{6!} e^{-7}}{\frac{10^8}{8!} e^{-10}} \approx 29.65\%$
(viii) (a) Since the three Poisson arrival processes are independent, the total number of requests will also be a Poisson process with the parameter $\lambda = \lambda_1 + \lambda_2 + \lambda_3 = 20$ requests/hour (Theorem 7.3.4). The expected number of requests during an 8-hour working day is $E[N] = \lambda t = 20 \times 8 = 160$ requests.
(b) If we denote arrival processes of requests with different ADSL problems each with a random variable $X_i$ for $i = 1, 2,$ and $3$, then due to their mutual independence
$$\Pr[X_1 = 0, X_2 = k, X_3 = 0] = \Pr[X_1 = 0] \Pr[X_2 = k] \Pr[X_3 = 0] = e^{-\lambda_1 t}\, \frac{(\lambda_2 t)^k e^{-\lambda_2 t}}{k!}\, e^{-\lambda_3 t}$$
from which $\Pr[X_1 = 0, X_2 = 3, X_3 = 0] = e^{-\frac{8}{3}}\, \frac{\left(\frac{6}{3}\right)^3 e^{-\frac{6}{3}}}{3!}\, e^{-\frac{6}{3}} = 1.7 \times 10^{-3}$.
(c) If we denote the total number of requests by $X$ then $\Pr[X = 0] = e^{-\lambda t} = e^{-\frac{20}{4}} = 6.7 \times 10^{-3}$.
(d) The precise time is irrelevant for Poisson processes, only the duration of the interval matters. Here intervals are overlapping and we need to compute the probability
$$p = \Pr[\{X(0.2) = 1\} \cap \{X(0.5) - X(0.1) = 2\}]$$
$$= \sum_{k=0}^{1} \Pr[\{X(0.1) = k\} \cap \{X(0.2) - X(0.1) = 1 - k\} \cap \{X(0.5) - X(0.2) = 1 + k\}]$$
$$= \sum_{k=0}^{1} \Pr[X(0.1) = k] \Pr[X(0.2) - X(0.1) = 1 - k] \Pr[X(0.5) - X(0.2) = 1 + k]$$
$$= \sum_{k=0}^{1} e^{-2} \frac{(2)^k}{k!}\, e^{-2} \frac{(2)^{1-k}}{(1-k)!}\, e^{-6} \frac{(6)^{1+k}}{(1+k)!} = 48 e^{-10} = 2.18 \times 10^{-3}$$
(e) Given that at the moment $t + s$ there are $k + m$ requests, the probability that there were $k$ requests at the moment $t$ is
$$\Pr[X(t) = k \,|\, X(t + s) = k + m] = \frac{\Pr[\{X(t) = k\} \cap \{X(t + s) = k + m\}]}{\Pr[X(t + s) = k + m]} = \frac{\Pr[X(t) = k] \Pr[X(t + s) - X(t) = m]}{\Pr[X(t + s) = k + m]}$$
$$= \frac{\frac{(\lambda t)^k e^{-\lambda t}}{k!}\, \frac{(\lambda s)^m e^{-\lambda s}}{m!}}{\frac{(\lambda(t + s))^{k+m} e^{-\lambda(t + s)}}{(k + m)!}} = \binom{k + m}{k} \left(\frac{t}{t + s}\right)^k \left(\frac{s}{t + s}\right)^m$$
(ix) (a) The number of attacks that are arriving to the PC is a Poisson random variable $X(t)$ with rate $\lambda = 6$. The probability of exactly one ($k = 1$) attack during one ($t = 1$) hour follows from (7.1) as $\Pr[X(1) = 1] = 6e^{-6}$.
(b) Applying (7.2), the expected amount of time that the PC has been on is $t = \frac{E[X(t)]}{\lambda} = \frac{60}{6} = 10$ hours.
(c) The arrival time of the fifth attack is denoted by $T$. Given that there are six attacks in one hour ($t = 1$), we compute the probability $\Pr[T < t \,|\, X(1) = 6]$ that either five attacks arrive in the interval $(0, t)$ and one arrives in $(t, 1)$ or all six attacks arrive in $(0, t)$ and none arrives in the interval $(t, 1)$. Hence, for $0 \le t < 1$,
$$F_T(t) = \Pr[T < t \,|\, X(1) = 6] = \frac{\Pr[\{X(t) = 5\} \cap \{X(1) = 6\}] + \Pr[\{X(t) = 6\} \cap \{X(1) = 6\}]}{\Pr[X(1) = 6]}$$
$$= \frac{\Pr[X(t) = 5] \Pr[X(1) - X(t) = 1] + \Pr[X(t) = 6] \Pr[X(1) - X(t) = 0]}{\Pr[X(1) = 6]}$$
$$= \frac{\left((\lambda t)^5/5!\right) e^{-\lambda t}\, \lambda(1 - t) e^{-\lambda(1 - t)} + \left((\lambda t)^6/6!\right) e^{-\lambda t} e^{-\lambda(1 - t)}}{\left(\lambda^6/6!\right) e^{-\lambda}} = 6t^5 - 5t^6$$
The probability that the fifth attack will arrive between 1:30 p.m. and 2 p.m. is $F_T(1) - F_T\left(\frac{1}{2}\right) = 1 - \frac{7}{64} = \frac{57}{64}$.
(d) The expectation of $T$ given $X(1) = 6$ follows from (2.33) as $E[T \,|\, X(1)] = \int_0^1 x f_T(x)\, dx$ where $f_T(t) = \frac{dF_T(t)}{dt}$ derived in (c). Alternatively, the expectation can be computed from (2.35), $E[T \,|\, X(1)] = \int_0^1 \left(1 - (6x^5 - 5x^6)\right) dx = \frac{5}{7}$. Hence the expected arrival time of the fifth attack between 1 p.m. and 2 p.m. is about 1:43 p.m.
(x) Let $X$ and $X_j$ denote the lifetime of system and subsystem $j$ respectively. For a series of subsystems with independent lifetimes $X_j$, the event $\{X > t\} = \cap_{j=1}^{n} \{X_j > t\}$ and $\Pr[X > t] = \prod_{j=1}^{n} \Pr[X_j > t]$. Recall with (3.32) that $\Pr[X > t] = \Pr\left[\min_{1 \le j \le n} X_j > t\right]$. Using the definition of the reliability function (7.5) then yields
$$R_{\mathrm{series}}(t) = \prod_{j=1}^{n} R_j(t)$$
(xi) The probability that the system $S$ shown in Fig. 7.6 fails is determined by the subsystem with longest lifetime or $X = \max_{1 \le j \le n} X_j$. Invoking relation (3.33) combined with the definition of the reliability function (7.5) leads to
$$R_{\mathrm{parallel}}(t) = 1 - \prod_{j=1}^{n} \left(1 - R_j(t)\right)$$
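The two reliability formulas translate directly into code. The sketch below uses only the Python standard library; the function names and the exponential lifetimes with hypothetical failure rates are assumptions chosen for illustration.

```python
from math import exp, prod

def reliability_series(r):
    """Series system: it works only if every subsystem works."""
    return prod(r)

def reliability_parallel(r):
    """Parallel system: it fails only if every subsystem fails."""
    return 1 - prod(1 - x for x in r)

# Assumed example: independent exponential lifetimes, R_j(t) = exp(-a_j * t).
rates = [0.1, 0.2, 0.5]      # hypothetical failure rates a_j
t = 2.0
r_at_t = [exp(-a * t) for a in rates]

print(reliability_series(r_at_t), reliability_parallel(r_at_t))
```

For exponential lifetimes the series reliability equals $e^{-t \sum_j a_j}$, so a series system of exponential components again has an exponential lifetime.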
C.4 Renewal theory (Chapter 8)
(i) The equivalence $\{N(t) \ge n\} \Longleftrightarrow \{W_n \le t\}$ indicates
$$\Pr\left[W_{N(t)} \le x\right] = \sum_{n=0}^{\infty} \Pr[\{W_n \le x\} \cap \{W_{n+1} > t\}] = \Pr[W_0 \le x, W_1 > t] + \sum_{n=1}^{\infty} \Pr[W_n \le x, W_{n+1} > t]$$
The convention $W_0 = 0$ reduces $\Pr[W_0 \le x, W_1 > t] = \Pr[W_1 > t] = \Pr[\tau_1 > t] = 1 - F_\tau(t)$. Furthermore, by the law of total probability,
$$\Pr[W_n \le x, W_{n+1} > t] = \int_0^{\infty} \Pr[W_n \le x, W_{n+1} > t \,|\, W_n = u]\, \frac{d\Pr[W_n \le u]}{du}\, du = \int_0^x \Pr[W_{n+1} > t \,|\, W_n = u]\, d\Pr[W_n \le u]$$
A renewal process restarts after each renewal from scratch (due to the stationarity and the independent increments of the renewal process). This implies that $\Pr[W_{n+1} > t \,|\, W_n = u] = \Pr[\tau_{n+1} > t - u] = 1 - F_\tau(t - u)$ because the interarrival times are i.i.d. random variables. Combined,
$$\Pr\left[W_{N(t)} \le x\right] = \Pr[\tau > t] + \sum_{n=1}^{\infty} \int_0^x \Pr[\tau > t - u]\, d\Pr[W_n \le u] = \Pr[\tau > t] + \int_0^x \Pr[\tau > t - u]\, d\left(\sum_{n=1}^{\infty} \Pr[W_n \le u]\right)$$
With the basic equivalence (8.6) and the definition (8.7) of the renewal function $m(t)$, we arrive at
$$\Pr\left[W_{N(t)} \le x\right] = \Pr[\tau > t] + \int_0^x \Pr[\tau > t - u]\, dm(u)$$
This equation holds for all $x$. If $x = t$, we can use the renewal equation,
$$\int_0^t \Pr[\tau > t - u]\, dm(u) = m(t) - \int_0^t \Pr[\tau \le t - u]\, dm(u) = m(t) - m(t) + F_\tau(t)$$
which indeed confirms $\Pr\left[W_{N(t)} \le t\right] = 1$.
(ii) The generating function of the number of renewals in the interval $[0, t]$ is with (8.10)
\begin{align*}
\varphi_{N(t)}(z) = E\left[z^{N(t)}\right] &= \sum_{k=0}^{\infty} \Pr[N(t) = k]\, z^k \\
&= \Pr[N(t) = 0] + \int_0^{t} \left(\sum_{k=1}^{\infty} \Pr[N(t-s) = k-1]\, z^k\right) f(s)\, ds \\
&= \Pr[N(t) = 0] + z \int_0^{t} \left(\sum_{k=0}^{\infty} \Pr[N(t-s) = k]\, z^k\right) f(s)\, ds \\
&= \Pr[N(t) = 0] + z \int_0^{t} \varphi_{N(t-s)}(z)\, dF(s)
\end{align*}
From (8.6), we have that $\Pr[N(t) = 0] = 1 - F(t)$ and
\[ \varphi_{N(t)}(z) = 1 - F(t) + z \int_0^{t} \varphi_{N(t-s)}(z)\, dF(s) \]
By derivation with respect to $z$, we arrive at the differential-integral equation for the derivative of the generating function,
\begin{align*}
\varphi'_{N(t)}(z) &= \int_0^{t} \varphi_{N(t-s)}(z)\, dF(s) + z \int_0^{t} \varphi'_{N(t-s)}(z)\, dF(s) \\
&= \frac{\varphi_{N(t)}(z) - 1 + F(t)}{z} + z \int_0^{t} \varphi'_{N(t-s)}(z)\, dF(s)
\end{align*}
which reduces to the renewal equation (8.9) for $z = 1$ since $\varphi'_{N(t)}(1) = m(t)$. The second derivative
\begin{align*}
\varphi''_{N(t)}(z) &= 2 \int_0^{t} \varphi'_{N(t-s)}(z)\, dF(s) + z \int_0^{t} \varphi''_{N(t-s)}(z)\, dF(s) \\
&= \frac{2}{z}\, \varphi'_{N(t)}(z) - \frac{2}{z} \int_0^{t} \varphi_{N(t-s)}(z)\, dF(s) + z \int_0^{t} \varphi''_{N(t-s)}(z)\, dF(s)
\end{align*}
evaluated at $z = 1$, is
\[ \varphi''_{N(t)}(1) = 2m(t) - 2F(t) + \int_0^{t} \varphi''_{N(t-s)}(1)\, dF(s) \]
The variance $\mathrm{Var}[N(t)]$ follows from (2.27) as
\begin{align*}
\mathrm{Var}[N(t)] &= \varphi''_{N(t)}(1) + \varphi'_{N(t)}(1) - \left(\varphi'_{N(t)}(1)\right)^2 \\
&= 3m(t) - m^2(t) - 2F(t) + \int_0^{t} \varphi''_{N(t-s)}(1)\, dF(s)
\end{align*}
(iii) Every time an IP packet is launched by TCP, a renewal occurs and the reward is that 2000 km are travelled in each renewal, thus $R_n = 2000$ km. The speed in a trip that suffers from congestion is, on average, 40 000 km/s, while the speed without congestion is 120 000 km/s. Since congestion only occurs in 1/5 of the cases, the average length (in s) of a renewal period is
\[ E[\tau] = \frac{2000}{120000} \times \frac{4}{5} + \frac{2000}{40000} \times \frac{1}{5} = \frac{7}{300} \]
The average speed of an IP packet (in km/s) then follows from (8.20) as
\[ \lim_{t \to \infty} \frac{R(t)}{t} = \frac{E[R]}{E[\tau]} = \frac{2000}{7/300} = 85714.3 \]
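The renewal-reward computation above can be replayed with exact fractions. This is only a sketch: the helper name `renewal_reward_rate` and the variable names are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def renewal_reward_rate(reward, outcomes):
    """Return (E[R]/E[tau], E[tau]) for a fixed reward per renewal.

    `outcomes` is a list of (duration, probability) pairs describing the
    possible lengths of one renewal period.
    """
    mean_duration = sum(duration * prob for duration, prob in outcomes)
    return Fraction(reward) / mean_duration, mean_duration

distance_km = 2000
trip_times = [
    (Fraction(distance_km, 120_000), Fraction(4, 5)),  # trip without congestion
    (Fraction(distance_km, 40_000), Fraction(1, 5)),   # trip with congestion
]

speed, mean_tau = renewal_reward_rate(distance_km, trip_times)
print("E[tau] =", mean_tau, "s")
print("average speed = %.1f km/s" % float(speed))
```

Working with `Fraction` keeps $E[\tau] = 7/300$ exact, so the only rounding happens when the speed is printed.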
(iv) Every transmission of an ATM cell is a renewal with average length of the renewal interval equal to $E[\tau] = N/r$, where $1/r$ is the mean interarrival time for a voice sample. If $\tau_n$ is the time between the $n$-th and $(n+1)$-th arrival of a sample, then the average total cost per ATM cell transmission equals
\[ E[R] = E\left[\sum_{n=1}^{N-1} n c \times \tau_n + K\right] = c \sum_{n=1}^{N-1} n E[\tau_n] + K = \frac{c}{r}\, \frac{N(N-1)}{2} + K \]
Hence, the average cost per unit time incurred in UMTS is $\frac{E[R]}{E[\tau]} = c\, \frac{N-1}{2} + \frac{K r}{N}$.
(v) (a) The replacement of a router is a renewal process where the time at which router $j$ is replaced is $W_j = \min(X_j, T)$, and
\[ R_j = \begin{cases} A, & \text{if } X_j \le T \\ B, & \text{if } X_j > T \end{cases} \]
The average cost per renewal period is $E[R] = A \Pr[X_j \le T] + B \Pr[X_j > T]$ and the average length of a renewal interval equals
\[ E[X] = \int_0^{\infty} \Pr[W_j > t]\, dt = \int_0^{\infty} \Pr[\min(X_j, T) > t]\, dt = \int_0^{T} \Pr[X_j > t]\, dt \]
The time average cost rate of the policy ChangeRouter is $C = \frac{E[R]}{E[X]}$.
(b) For $A = 10000$, $B = 7000$, $\Pr[X_j \le T] = 1 - e^{-\lambda T}$ with mean lifetime $\frac{1}{\lambda} = 10$ years and $T = 5$, we have
\[ E[R] = A \Pr[X_j \le T] + B \Pr[X_j > T] = 10000 \times \left(1 - e^{-1/2}\right) + 7000 \times e^{-1/2} \approx 8200 \]
and
\[ E[X] = \int_0^{T} \Pr[X_j > t]\, dt = \int_0^{5} e^{-0.1 t}\, dt = 10 \times \left(1 - e^{-1/2}\right) \approx 4 \]
such that the time average cost rate of the policy ChangeRouter is $C \approx \frac{8200}{4} = 2050$.
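As a hedged numerical check of the ChangeRouter cost rate, the sketch below evaluates $E[R]$ and $E[X]$ without the intermediate rounding to 8200 and 4. The function and argument names are illustrative assumptions.

```python
import math

def change_router_cost_rate(cost_fail, cost_planned, rate, horizon):
    """Cost rate E[R]/E[X] for exponential lifetimes with the given failure
    rate and a planned replacement after `horizon` years."""
    p_fail = 1.0 - math.exp(-rate * horizon)        # Pr[X_j <= T]
    mean_cost = cost_fail * p_fail + cost_planned * (1.0 - p_fail)
    mean_length = p_fail / rate                     # integral of e^{-rate t} over [0, T]
    return mean_cost, mean_length, mean_cost / mean_length

mean_cost, mean_length, cost_rate = change_router_cost_rate(10_000, 7_000, 0.1, 5)
print("E[R] = %.1f, E[X] = %.4f, C = %.1f" % (mean_cost, mean_length, cost_rate))
```

Without rounding, $E[R]$ is about 8180 and $E[X]$ about 3.93, so the cost rate comes out near 2079 rather than the rounded value 2050 quoted above; the two agree to the accuracy of the approximations used.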
C.5 Discrete-time Markov chains (Chapter 9)
(i) (a) The Markov chain is drawn in Fig. C.1
Fig. C.1. Three-state Markov chain.
(b) The steady-state vector is computed via (9.24). The sequence $P^{2^n}$ rapidly converges to yield three correct digits after four multiplications
\[ P^2 = \begin{bmatrix} 0.800 & 0.160 & 0.040 \\ 0.640 & 0.320 & 0.040 \\ 0.640 & 0.160 & 0.200 \end{bmatrix} \qquad P^4 = \begin{bmatrix} 0.768 & 0.186 & 0.046 \\ 0.742 & 0.211 & 0.046 \\ 0.742 & 0.186 & 0.072 \end{bmatrix} \]
\[ P^8 = \begin{bmatrix} 0.762 & 0.190 & 0.048 \\ 0.761 & 0.191 & 0.048 \\ 0.761 & 0.190 & 0.048 \end{bmatrix} \qquad P^{16} = \begin{bmatrix} 0.762 & 0.190 & 0.048 \\ 0.762 & 0.190 & 0.048 \\ 0.762 & 0.190 & 0.048 \end{bmatrix} \]
from which we find that the row vector in $P^{16}$ equals $\pi = \begin{bmatrix} 0.762 & 0.190 & 0.048 \end{bmatrix}$.
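The second method consists in solving the set (9.25) by Cramer's method. Hence,
\[ \det M = \begin{vmatrix} -0.2 & 0.8 & 0.0 \\ 0.2 & -1.0 & 0.8 \\ 1 & 1 & 1 \end{vmatrix} = 0.84 \]
\[ \pi_1 = \frac{1}{\det M} \begin{vmatrix} 0 & 0.8 & 0.0 \\ 0 & -1 & 0.8 \\ 1 & 1 & 1 \end{vmatrix} = 0.762 \qquad \pi_2 = \frac{1}{\det M} \begin{vmatrix} -0.2 & 0 & 0.0 \\ 0.2 & 0 & 0.8 \\ 1 & 1 & 1 \end{vmatrix} = 0.19 \qquad \pi_3 = \frac{1}{\det M} \begin{vmatrix} -0.2 & 0.8 & 0 \\ 0.2 & -1 & 0 \\ 1 & 1 & 1 \end{vmatrix} = 0.048 \]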
The third method relies on the specific structure of the Markov chain, a discrete birth and death process or general random walk with constant $p_k = p$ and $q_k = q$. Applying formula (11.9), taking into account that $N = 2$ and $\rho = \frac{0.2}{0.8} = \frac{1}{4}$, yields
\[ \pi_1 = \frac{1 - \frac{1}{4}}{1 - 4^{-3}} = \frac{16}{21} = 0.762 \qquad \pi_2 = \rho\, \pi_1 = \frac{4}{21} = 0.190 \qquad \pi_3 = \rho\, \pi_2 = \frac{1}{21} = 0.048 \]
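The first and third methods can be compared numerically with a few lines of plain Python. The transition matrix `P` below is an assumption: it is the birth and death chain with $p = 0.2$ and $q = 0.8$ that reproduces the printed $P^2$, since Fig. C.1 itself is not recoverable here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Assumed transition matrix (p = 0.2 to the right, q = 0.8 to the left).
P = [[0.8, 0.2, 0.0],
     [0.8, 0.0, 0.2],
     [0.0, 0.8, 0.2]]

# Method 1: repeated squaring gives P^2, P^4, P^8, P^16.
power = P
for _ in range(4):
    power = mat_mul(power, power)
pi_power = power[0]

# Method 3: geometric form pi_j proportional to rho^j with rho = p/q.
rho = 0.2 / 0.8
norm = sum(rho ** j for j in range(3))
pi_closed = [rho ** j / norm for j in range(3)]

print([round(x, 3) for x in pi_power])
print([round(x, 3) for x in pi_closed])
```

Both lines should print the same three-digit vector, which illustrates why four squarings are already enough for this small chain.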
(ii) The Markov chain is shown in Fig. C.2. The state 1 is an absorbing state.

Fig. C.2. A recurrent Markov chain with positive drift.

From (9.23), the steady-state vector components are found as
\[ \pi_1 = \sum_{k=1}^{N} \frac{\pi_k}{k} \qquad \pi_2 = 0 \qquad \pi_j = \frac{j-2}{j-1}\, \pi_{j-1} \quad (j \ge 2) \]
or $\pi_1 = 1$ and $\pi_j = 0$ for $j > 1$. Hence, the steady-state vector exists, and is different from $\pi = 0$, which demonstrates that the Markov chain is positive recurrent for any number of states $N$. However, the drift for $j > 1$ (because $j = 1$ is absorbing) is
\[ E[X_{k+1} - X_k \mid X_k = j] = 1 - \frac{1}{j} - \frac{1}{j} = 1 - \frac{2}{j} \]
which is always positive for $j > 2$. Hence, given an initial state $j > 2$, the Markov chain will, on average, move to the right (higher states).
(iii) (a) The Markov chain is shown in Fig. C.3.

Fig. C.3. Markov chain of the growth process of trees in a forest during a period of 15 years.
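(b) The evolution of the Markov process is defined by
\[ \begin{bmatrix} b[k+1] \\ y[k+1] \\ m[k+1] \\ u[k+1] \end{bmatrix} = \begin{bmatrix} p_b & p_y & p_m & p_o \\ 1 - p_b & 0 & 0 & 0 \\ 0 & 1 - p_y & 0 & 0 \\ 0 & 0 & 1 - p_m & 1 - p_o \end{bmatrix} \cdot \begin{bmatrix} b[k] \\ y[k] \\ m[k] \\ u[k] \end{bmatrix} \]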
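(c) The number of trees in each category after 15 years (one period) is
\[ \begin{bmatrix} b[1] \\ y[1] \\ m[1] \\ u[1] \end{bmatrix} = \begin{bmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.9 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.7 & 0.6 \end{bmatrix} \cdot \begin{bmatrix} 5000 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 500 \\ 4500 \\ 0 \\ 0 \end{bmatrix} \]
and after 30 years (two periods)
\[ \begin{bmatrix} b[2] \\ y[2] \\ m[2] \\ u[2] \end{bmatrix} = \begin{bmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.9 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.7 & 0.6 \end{bmatrix} \cdot \begin{bmatrix} 500 \\ 4500 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 950 \\ 450 \\ 3600 \\ 0 \end{bmatrix} \]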
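(d) The steady-state vector obeys equation (9.22) or, equivalently, (9.25). Applying a variant of (9.25), we have
\[ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 - p_b & -1 & 0 & 0 \\ 0 & 1 - p_y & -1 & 0 \\ 0 & 0 & 1 - p_m & -p_o \end{bmatrix} \cdot \begin{bmatrix} \pi_b \\ \pi_y \\ \pi_m \\ \pi_o \end{bmatrix} \]
The determinant is $\det P = -p_o - (1 - p_b)\left(1 + 2p_o - p_y - p_m + p_y p_m - p_o p_y\right)$ and via Cramer's method we have
\[ \pi_b = \frac{1}{\det P} \det \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 0 \\ 0 & 1 - p_y & -1 & 0 \\ 0 & 0 & 1 - p_m & -p_o \end{bmatrix} = \frac{1}{\det P} \det \begin{bmatrix} -1 & 0 & 0 \\ 1 - p_y & -1 & 0 \\ 0 & 1 - p_m & -p_o \end{bmatrix} = \frac{-p_o}{\det P} \]
With the numerical values given in (c), $\pi_b = 0.25773$. After a similar calculation for the other categories, the total number of trees in steady growth is
\[ \begin{bmatrix} 5000\, \pi_b \\ 5000\, \pi_y \\ 5000\, \pi_m \\ 5000\, \pi_o \end{bmatrix} \approx \begin{bmatrix} 1289 \\ 1160 \\ 928 \\ 1624 \end{bmatrix} \]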
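Parts (c) and (d) can be replayed numerically. The sketch below is illustrative only: the names `T`, `step` and the way the steady state is obtained (from the balance relations between consecutive categories instead of Cramer's method) are choices made here, not taken from the text.

```python
def step(matrix, vec):
    """One 15-year period: multiply the transition matrix by the column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

p_b, p_y, p_m, p_o = 0.1, 0.2, 0.3, 0.4
T = [[p_b,     p_y,     p_m,     p_o],
     [1 - p_b, 0.0,     0.0,     0.0],
     [0.0,     1 - p_y, 0.0,     0.0],
     [0.0,     0.0,     1 - p_m, 1 - p_o]]

trees = [5000, 0, 0, 0]
after_one = step(T, trees)       # state after 15 years
after_two = step(T, after_one)   # state after 30 years

# Steady state from the rows of pi = T pi:
# pi_y = (1-p_b) pi_b, pi_m = (1-p_y) pi_y, p_o pi_o = (1-p_m) pi_m.
weights = [1.0]
weights.append((1 - p_b) * weights[0])
weights.append((1 - p_y) * weights[1])
weights.append((1 - p_m) * weights[2] / p_o)
pi = [w / sum(weights) for w in weights]

print([round(x) for x in after_one], [round(x) for x in after_two])
print([round(5000 * x) for x in pi])
```

The first component of `pi` should agree with $\pi_b = 0.25773$ obtained from Cramer's method.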
(iv) (a) The clustered error pattern is modeled as a two-state discrete Markov chain. When a bit is received incorrectly, the system is in state 0, else it is in state 1. The Markov chain is shown in Fig. 9.2, where $p = 1 - 0.95 = 0.05$ and $q = 1 - 0.999 = 0.001$. The transition probability matrix is
\[ P = \begin{bmatrix} 0.95 & 0.05 \\ 0.001 & 0.999 \end{bmatrix} \]
(b) There is only one communicating class because both states 0 and 1 are reachable from
each other. The Markov chain is therefore irreducible.
(c) The steady-state vector follows from (9.37) as
\[ \pi = \begin{bmatrix} \frac{1}{51} & \frac{50}{51} \end{bmatrix} = \begin{bmatrix} 0.0196 & 0.9804 \end{bmatrix} \]
The fraction of correctly received bits in the long run is 98.04% and the fraction of incorrectly received bits is 1.96%.
(d) After repair, the system operates correctly in 99.9% of the cases, which implies that $\pi_1 = 0.999$ and $\pi_0 = 0.001$. Formula (9.37) indicates that $\frac{p}{p+q} = 0.999$ and $\frac{q}{p+q} = 0.001$ or $p = 999q$. The test sequence shows that
\[ \Pr[X_0 = 1, \ldots, X_{11} = 1] = (1 - q)^{10} \Pr[X_0 = 1] = (1 - q)^{10} = 0.9999 \]
which leads to $q \approx 10^{-5}$ and thus $p = 999q \approx 0.00999$. A correctly (incorrectly) received bit is followed by a next correctly (incorrectly) received bit with probability $1 - q = 0.999\,99$, respectively $1 - p = 0.990\,01$.
C.6 Continuous-time Markov processes (Chapter 10)
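(i) (a) The failure rate for each processor is $\lambda = 0.001$ per hour. The repair rate is $\mu = 0.01$ per hour. The Markov chain is shown in Fig. C.4.

Fig. C.4. The Markov chain for the three states: (1) both processors work, (2) one processor is damaged and (3) both processors are damaged.

(b) The infinitesimal generator is
\[ Q = \begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda + \mu) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix} \]
If the state probability vector is denoted by $s(t)$, we can also write $s(t) Q = \frac{d}{dt} s(t)$, or
\[ \begin{bmatrix} s_1(t) & s_2(t) & s_3(t) \end{bmatrix} \cdot \begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda + \mu) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix} = \begin{bmatrix} s'_1(t) & s'_2(t) & s'_3(t) \end{bmatrix} \]
(c) The steady-state $\pi = \lim_{t \to \infty} s(t)$ obeys the equation (10.19)
\[ \begin{bmatrix} \pi_1 & \pi_2 & \pi_3 \end{bmatrix} \cdot \begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda + \mu) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \]
Since $\pi_1 + \pi_2 + \pi_3 = 1$, we find that $\pi_1 = \frac{\mu^2}{(\lambda + \mu)^2}$, $\pi_2 = \frac{2\lambda\mu}{(\lambda + \mu)^2}$ and $\pi_3 = \frac{\lambda^2}{(\lambda + \mu)^2}$. From the balance equation, we know that the probability flux from state 1 to state 2 should precisely equal that in the opposite direction such that $2\lambda \pi_1 = \mu \pi_2$ and similarly for the transitions $2 \to 3$, $\lambda \pi_2 = 2\mu \pi_3$. Using $\pi_1 + \pi_2 + \pi_3 = 1$ leads faster to the solution. With $\lambda = 0.001$ and $\mu = 0.01$, the values are $\pi_1 = 0.8264$, $\pi_2 = 0.1653$ and $\pi_3 = 0.0083$.
(d) The availability in case (i) is $\pi_1 = 0.8264$. The availability in case (ii) is $\pi_1 + \pi_2 = 0.9917$.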
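A short numerical check of the two-processor model: the sketch below plugs the closed-form steady state into $\pi Q = 0$ and prints the two availabilities. Variable names are illustrative.

```python
lam, mu = 0.001, 0.01   # failure and repair rate per hour

# Infinitesimal generator for states (1) both up, (2) one down, (3) both down.
Q = [[-2 * lam,      2 * lam,  0.0],
     [mu,       -(lam + mu),   lam],
     [0.0,           2 * mu, -2 * mu]]

total = (lam + mu) ** 2
pi = [mu ** 2 / total, 2 * lam * mu / total, lam ** 2 / total]

# Residual of pi Q = 0, column by column.
residual = [sum(pi[i] * Q[i][j] for i in range(3)) for j in range(3)]

print([round(x, 4) for x in pi])
print("availability, both needed :", round(pi[0], 4))
print("availability, one suffices:", round(pi[0] + pi[1], 4))
```

The residual vector should vanish up to floating-point error, confirming that the balance equations and the normalization are satisfied together.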
(ii) (a) In state 0, both servers are damaged, state 1 refers to one server down and one operating
while in state 2, both servers are operating. The corresponding Markov chain is shown in
Fig. C.5.
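(b) The infinitesimal generator is
\[ Q = \begin{bmatrix} -\lambda_B & 0 & \lambda_B \\ \mu_F + \mu_E & -\mu_F - \mu_E - \lambda & \lambda \\ \mu_E & 2\mu_H & -\mu_E - 2\mu_H \end{bmatrix} \]

Fig. C.5. The Markov chain is specified by $\lambda = \frac{1}{15}\ \mathrm{h}^{-1} = 6.66 \times 10^{-2}\ \mathrm{h}^{-1}$, $\lambda_B = \frac{1}{20}\ \mathrm{h}^{-1} = 5 \times 10^{-2}\ \mathrm{h}^{-1}$, $\mu_H = 3 \times 10^{-4}\ \mathrm{h}^{-1}$, $\mu_F = 7 \times 10^{-4}\ \mathrm{h}^{-1}$ and $\mu_E = 6 \times 10^{-5}\ \mathrm{h}^{-1}$.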
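(c) The steady-state vector obeys (10.19). The solution of $\pi Q = 0$ is
\[ \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix} \cdot \begin{bmatrix} -\lambda_B & 0 & \lambda_B \\ \mu_F + \mu_E & -\mu_F - \mu_E - \lambda & \lambda \\ \mu_E & 2\mu_H & -\mu_E - 2\mu_H \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \]
Since this linear set of equations is undetermined, we remove an arbitrary equation and add the normalization condition $\sum_{j=0}^{2} \pi_j = 1$,
\[ \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix} \cdot \begin{bmatrix} -\lambda_B & 0 & 1 \\ \mu_F + \mu_E & -\mu_F - \mu_E - \lambda & 1 \\ \mu_E & 2\mu_H & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \]
The steady-state probabilities are
\[ \pi_2 = \frac{\lambda_B (\mu_F + \mu_E + \lambda)}{(\mu_E + \lambda_B)(\mu_F + \mu_E + \lambda) + 2\mu_H (\mu_F + \mu_E + \lambda_B)} = 0.9898 \]
\[ \pi_1 = \frac{2\mu_H}{\mu_F + \mu_E + \lambda}\, \pi_2 = 0.0088 \]
\[ \pi_0 = 1 - \pi_2 - \pi_1 = 0.0013 \]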
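(d) Theorem 10.2.3 states that the average lifetime of state $j$ is $E[\tau_j] = \frac{1}{q_j}$. This yields
\[ E[\tau_0] = \frac{1}{q_0} = \frac{1}{\lambda_B} = 20\ \mathrm{h} \]
\[ E[\tau_1] = \frac{1}{q_1} = \frac{1}{\mu_F + \mu_E + \lambda} = 14.9\ \mathrm{h} \]
\[ E[\tau_2] = \frac{1}{q_2} = \frac{1}{\mu_E + 2\mu_H} = 1515\ \mathrm{h} \]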
(e) A repair takes place when the system transfers from state 1 to 2. When the system jumps from state 0 to state 2, two repairs take place. The fraction of time during which both servers are damaged is $\pi_0$ and the fraction of time in which one server is operating is $\pi_1$. The rate of repairs will be the rate of changing from state 1 to 2, plus two times the rate of changing from state 0 to state 2:
\[ f_r = \pi_1 q_{12} + 2\pi_0 q_{02} = \pi_1 \lambda + 2\pi_0 \lambda_B = 7.17 \times 10^{-4} \]
If we denote with $X$ the random variable of the number of total failures over the period of 1 year, then the average value of $X$ will be
\[ E[X] = f_r \times 24 \times 365 = 6.28 \]
C.7 Continuous-time Markov processes (Chapter 11)
(i) In both cases we apply the general formulae (11.15) and (11.16) for the steady-state of a
general birth and death process.
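(a) Using the notation $\rho = \frac{\lambda}{\mu}$, we first compute
\[ \prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}} = \prod_{m=0}^{j-1} \frac{\lambda}{(m+1)\mu} = \rho^j \prod_{m=1}^{j} \frac{1}{m} = \frac{\rho^j}{j!} \]
Then, with (11.16),
\[ \pi_j = \frac{\frac{\rho^j}{j!}}{1 + \sum_{j=1}^{\infty} \frac{\rho^j}{j!}} = \frac{\rho^j}{j!}\, e^{-\rho} \qquad j \ge 0 \]
which demonstrates that the steady-state probability that the birth and death process is in state $j$ is Poisson distributed with mean $\rho$.
(b) Similarly, we first compute with $\lambda_m = \frac{\lambda}{m+1}$ and $\mu_m = \mu$,
\[ \prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}} = \prod_{m=0}^{j-1} \frac{\rho}{m+1} = \frac{\rho^j}{j!} \]
which leads to precisely the same steady-state as in (a). Indeed, the steady-state is only a function of the ratios $\frac{\lambda_m}{\mu_{m+1}}$, which are the same in both (a) and (b).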
(ii) All stations in slotted ALOHA operate independently and each has probability $p_t = 0.12$ to transmit in a timeslot. A station is successful in one slot with probability $p_s = p_t (1 - p_t)^{N-1}$ where the number of stations $N = 8$. Thus, $p_s = 0.049$. The waiting time $W$ to transmit one packet is a geometric random variable with parameter $p_s$ from which (Section 3.1.3) the mean $E[W] = \frac{1}{p_s}$. Alternatively, $E[W]$ obeys the equation
\[ E[W] = p_s + (1 - p_s)(1 + E[W]) \]
because the average waiting time equals 1 timeslot with probability $p_s$ plus 1 timeslot increased with the average waiting time with probability $1 - p_s$. Solving that equation again yields $E[W] = \frac{1}{p_s} = 20.39$ timeslots. The average transmission time for 7 packets is $7 E[W] = 142.7$ timeslots.
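The numbers in the slotted ALOHA solution can be reproduced directly; the sketch also checks that $1/p_s$ satisfies the recursion for $E[W]$. Variable names are illustrative.

```python
p_t = 0.12       # transmission probability per station and timeslot
stations = 8

# Success: this station transmits and the other N-1 stations stay silent.
p_s = p_t * (1 - p_t) ** (stations - 1)
mean_wait = 1 / p_s

# The mean must satisfy E[W] = p_s + (1 - p_s) (1 + E[W]).
recursion_rhs = p_s + (1 - p_s) * (1 + mean_wait)

print("p_s  = %.3f" % p_s)
print("E[W] = %.2f timeslots" % mean_wait)
print("7 packets: %.1f timeslots" % (7 * mean_wait))
```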
C.8 Queueing (Chapters 13 and 14)
(i) Let us denote the number of packets in the server by $N_x$. Since a router either serves 0 or 1 packet, the problem states that $\Pr[N_x = 1] = 0.8$, and also that $E[N_x] = 0.8$. For any queue, it holds that $N_S = N_Q + N_x$ and $T = w + x$: the number in the system equals the number in the buffer plus the number that is being served. From Little's Theorem (13.21), it follows with $E[x] = \frac{1}{\mu}$ that $E[N_x] = \frac{\lambda}{\mu}$ or $\lambda = \mu E[N_x]$. Substituted into Little's law for the waiting time in the buffer, $E[N_Q] = \lambda E[w]$, and using $E[N_Q] = 3.2$ gives
\[ E[w] = \frac{E[N_Q]}{\mu E[N_x]} = \frac{4}{\mu} \]
(ii) In a M/M/m/m queue, the number of busy servers equals the number (of packets) in the system $N_S$. From (14.16) and the definition (2.11), the average number of busy servers equals
\[ E[N_S] = \sum_{j=0}^{m} j \Pr[N_S = j] = \frac{1}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\, \mu^j}} \sum_{j=1}^{m} \frac{\lambda^j}{(j-1)!\, \mu^j} \]
The sum can be rewritten as
\[ \sum_{j=1}^{m} \frac{\lambda^j}{(j-1)!\, \mu^j} = \frac{\lambda}{\mu} \sum_{j=0}^{m-1} \frac{\lambda^j}{j!\, \mu^j} = \frac{\lambda}{\mu} \left[ \sum_{j=0}^{m} \frac{\lambda^j}{j!\, \mu^j} - \frac{\lambda^m}{m!\, \mu^m} \right] \]
such that
\[ E[N_S] = \frac{\lambda}{\mu} \left[ 1 - \frac{\frac{\lambda^m}{m!\, \mu^m}}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\, \mu^j}} \right] = \frac{\lambda}{\mu} \left(1 - \Pr[N_S = m]\right) \]
where the last probability is recognized as the Erlang B formula (14.17).
(iii) (a) Since the average service rate $\mu = 2\ \mathrm{s}^{-1}$, the average response time (average system time) follows from (14.2) as
\[ E\left[T_{M/M/1}\right] = \frac{1}{2 - \lambda} \]
(b) If $E\left[T_{M/M/1}\right] = 2.5$ s, then it follows from (a) that $\lambda = 1.6\ \mathrm{s}^{-1}$. Hence, the number of jobs/s that can be processed for a given average response time of 2.5 s equals 1.6 jobs/s.
(c) A 10% increase in arrival rate corresponds to $\lambda = 1.76\ \mathrm{s}^{-1}$ and from (a) we obtain that $E\left[T_{M/M/1}\right] = \frac{1}{0.24} = 4.17$ s, which is with respect to 2.5 s an increase in average response time of 67%.
(iv) We know that when the average call holding time is $1/\mu = 10$ min, the time blocking probability $P_{B_t} = \frac{1}{10}$. Additionally, for Poisson call arrivals, the time blocking probability $P_{B_t}$ equals the call blocking probability $P_B$ by the PASTA property. The number of channels is $m = 2$. The arrival intensity can be calculated from the Erlang B formula (14.17)
\[ P_B = \frac{r^2 / 2}{1 + r + r^2 / 2} \]
where $r = \lambda / \mu$. Solving this quadratic equation for $r$ and taking into account that the traffic intensity lies in $[0, 1]$ yields
\[ r = \frac{P_B + \sqrt{2 P_B - P_B^2}}{1 - P_B} \]
For $P_B = \frac{1}{10}$, we have that $r = \frac{1 + \sqrt{19}}{9}$ from which $\lambda = \frac{1 + \sqrt{19}}{90}$. The blocking probability (14.17) corresponding to an average call holding time $1/\mu = 15$ min, for which $r = \lambda / \mu = \frac{1 + \sqrt{19}}{90} \times 15$, is $P_{B_t} \approx 0.174$.
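The Erlang B inversion for two channels can be checked numerically. The sketch below uses the standard Erlang B recursion, which is a swapped-in way of evaluating (14.17) and not the text's own derivation; function and variable names are illustrative.

```python
import math

def erlang_b(load, servers):
    """Erlang B blocking probability via the recursion
    B(0) = 1, B(k) = load * B(k-1) / (k + load * B(k-1))."""
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = load * blocking / (k + load * blocking)
    return blocking

p_b = 0.1
# Positive root of the quadratic obtained for m = 2 channels.
load_10min = (p_b + math.sqrt(2 * p_b - p_b ** 2)) / (1 - p_b)
arrival_rate = load_10min / 10      # calls per minute, holding time 10 min
load_15min = arrival_rate * 15      # same arrivals, holding time 15 min

print("load at 10 min: %.4f, check P_B = %.4f" % (load_10min, erlang_b(load_10min, 2)))
print("P_B at 15 min : %.3f" % erlang_b(load_15min, 2))
```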
(v) The queue 1 is a M/M/1 queue. By Burke's Theorem 14.1.1, the departure process from queue 1 is Poisson with rate $\lambda$. By assumption, this departure process, which is the arrival process to the second queue, is also independent of the service process at queue 2. Therefore, queue 2, viewed in isolation, is also a M/M/1 queue. We know that the queueing processes in both queues are stable because the load $\rho_1 = \frac{\lambda}{\mu_1} < 1$ and $\rho_2 = \frac{\lambda}{\mu_2} < 1$. The steady-state distributions of the number of customers in queue 1 and queue 2 follow from (14.1) as $\Pr[n \text{ at queue 1}] = \rho_1^n (1 - \rho_1)$ and $\Pr[m \text{ at queue 2}] = \rho_2^m (1 - \rho_2)$. The number of customers presently in queue 1 is independent of the sequence of earlier arrivals at queue 2 and therefore also of the number of customers presently in queue 2. This implies that
\[ \Pr[n \text{ at queue 1}, m \text{ at queue 2}] = \Pr[n \text{ at queue 1}] \Pr[m \text{ at queue 2}] = \rho_1^n (1 - \rho_1)\, \rho_2^m (1 - \rho_2) \]
(vi) The average system times $E[T]$ for the three different queueing systems are immediate. From (14.2), for system A, we have
\[ E[T_A] = \frac{1}{k \mu (1 - \rho)} \]
For each of the $k$ subqueues of system B, (14.2) gives
\[ E[T_B] = \frac{1}{\mu (1 - \rho)} \]
while for the M/M/k queue, (14.14) yields
\[ E[T_C] = \frac{k (1 - \rho) + \Pr[N_S \ge k]}{k \mu (1 - \rho)} \]
Clearly, $E[T_B] = k E[T_A]$ shows that by replacing $k$ small systems by one larger system with the same processing capability, the average system time decreases by a factor $k$.
From the relation $E[T_C] = f(k, \rho) E[T_A]$ with $f(k, \rho) = k (1 - \rho) + \Pr[N_S \ge k]$, it is more complicated to decide whether $f(k, \rho)$ is larger or smaller than 1. The extreme values of $f(k, \rho)$ are known: $f(k, 0) = k$ and $f(k, 1) = 1$. Since $\frac{\partial f(k, \rho)}{\partial \rho} = -k + \frac{\partial \Pr[N_S \ge k]}{\partial \rho}$ and $\frac{\partial \Pr[N_S \ge k]}{\partial \rho} > 0$, it cannot be concluded that $f(k, \rho)$ is monotonously decreasing in $\rho$ from $k$ to 1, in which case we would have $f(k, \rho) > 1$. Assuming $k$ real, we observe that $\frac{\partial f(k, \rho)}{\partial k} = (1 - \rho) + \frac{\partial \Pr[N_S \ge k]}{\partial k} > 0$ for all $\rho < 1$, which implies that $f(2, \rho) < f(3, \rho) < \cdots$ and allows us to concentrate only on $f(2, \rho)$. Numerical results show that $f(2, \rho) < 1$ if $\rho \ge 0.85$, but $f(3, \rho) \ge 1$. This leads us to the conclusion that for $k > 2$, system A always outperforms system C; only if $k = 2$ and in the heavy traffic regime $\rho \ge 0.85$, system C leads to a slightly shorter average system time of maximum 1.7%. Hence, replacing $k > 2$ processing units (servers) by one with the same processing capability always lowers the total time spent in the system. Of course, all conclusions only apply to systems that can be well modeled as M/M/m queueing systems. To first order a computing device (processor) may be regarded as a M/M/1 queue. Then, the analysis shows that replacing an old processor by a $k$ times faster one is faster (on average) than installing $k$ old processors in parallel.
(vii) The waiting process for aeroplanes is modeled as a M/D/1 queue because the arrival process is Poissonean with rate $\lambda = \frac{1}{10}$ arrivals/minute, it consists of a single queue as 1 aeroplane can land at a time, and the service process (the landing process) takes precisely $x = 5$ minutes (constant service time). Thus, $E[x] = 5$ minutes, $\mathrm{Var}[x] = 0$ and $\rho = \frac{5}{10}$. Since the M/D/1 process is a special case of the M/G/1, we can apply the general formula (14.28) for the average waiting time in the queue of an M/G/1 system
\[ E[w] = \frac{\lambda E[x^2]}{2 (1 - \rho)} = \frac{\frac{1}{10} \cdot 5^2}{2 \cdot \frac{1}{2}} = 2.5 \text{ minutes} \]
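The mean waiting time formula used above takes only the first two moments of the service time, so it is easy to evaluate. The sketch below computes the M/D/1 value and, for comparison, the value for an exponential service time with the same mean; the function name `mg1_mean_wait` is an illustrative choice.

```python
def mg1_mean_wait(arrival_rate, mean_service, var_service):
    """Mean waiting time in the queue of an M/G/1 system:
    E[w] = lambda * E[x^2] / (2 * (1 - rho))."""
    rho = arrival_rate * mean_service
    if rho >= 1:
        raise ValueError("queue is unstable for rho >= 1")
    second_moment = var_service + mean_service ** 2
    return arrival_rate * second_moment / (2 * (1 - rho))

# Landing problem: lambda = 1/10 per minute, constant 5 minute landings.
wait_md1 = mg1_mean_wait(0.1, 5.0, 0.0)
# Same load with exponential service (variance = mean^2) for comparison.
wait_mm1 = mg1_mean_wait(0.1, 5.0, 25.0)

print("M/D/1: %.2f min, M/M/1: %.2f min" % (wait_md1, wait_mm1))
```

The factor of two between the two results reflects that the exponential service time doubles $E[x^2]$ at equal mean.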
(viii) (a) We know that the arrival intensity of new calls to the cell is $\lambda_f = 20$ calls/min. Let $\lambda_h$ denote the arrival rate of the handover calls. The average time spent by a call in the cell is $E[T] = 1.64$ minutes and the average number of ongoing calls is $E[N] = 52$. Furthermore, the blocking rate is $P_B = 0.02$. The total arrival rate of calls that are carried by the base station is
\[ \lambda_{\text{carried}} = (1 - P_B)\, \lambda_{\text{offered}} = (1 - P_B)(\lambda_f + \lambda_h) \]
Little's formula (13.21) states that $E[N] = \lambda_{\text{carried}} E[T]$. Note that only the carried calls have an influence on the state of the system. We can solve the asked $\lambda_h$ from these two equations as
\[ \lambda_h = \frac{E[N]}{E[T] (1 - P_B)} - \lambda_f = 12.35 \text{ calls/minute} \]
(b) The arrival intensity of lost calls is $\lambda_{\text{lost}} = P_B (\lambda_f + \lambda_h) = 0.647$ calls/minute. If only the new calls are blocked, the asked blocking rate is
\[ P_{B_f} = \frac{\lambda_{\text{lost}}}{\lambda_f} = 3.24\% \]
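The handover computation is a direct application of Little's formula and can be replayed as follows; variable names are illustrative.

```python
new_call_rate = 20.0     # lambda_f, calls per minute
mean_time = 1.64         # E[T], minutes
mean_calls = 52.0        # E[N]
blocking = 0.02          # P_B

# Little: E[N] = lambda_carried * E[T]
carried_rate = mean_calls / mean_time
offered_rate = carried_rate / (1 - blocking)
handover_rate = offered_rate - new_call_rate

lost_rate = blocking * offered_rate
blocking_new_only = lost_rate / new_call_rate   # if only new calls are blocked

print("lambda_h    = %.2f calls/min" % handover_rate)
print("lambda_lost = %.3f calls/min" % lost_rate)
print("P_B (new calls only) = %.2f%%" % (100 * blocking_new_only))
```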
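(ix) (a) The derivation is given in the study of the M/G/1 queue in Section 14.3.1 where $A(z)$ given by (14.25) should be replaced by $E\left[z^N\right]$.
(b) Use in (14.25) the Laplace transform of an exponential random variable with mean $\frac{1}{\mu}$ given in (3.16)
\[ \varphi_x(s) = \frac{\mu}{\mu + s} \]
One obtains
\[ E\left[z^N\right] = \varphi_x\left(\lambda (1 - z)\right) = \frac{\mu}{\lambda (1 - z) + \mu} = \frac{\mu / (\lambda + \mu)}{1 - \left(\lambda / (\lambda + \mu)\right) z} = \left(1 - \frac{\lambda}{\lambda + \mu}\right) \sum_{k=0}^{\infty} \left(\frac{\lambda}{\lambda + \mu}\right)^k z^k \]
from which the probability density function
\[ \Pr[N = k] = \left(1 - \frac{\lambda}{\lambda + \mu}\right) \left(\frac{\lambda}{\lambda + \mu}\right)^k \]
follows. Thus, $\Pr[N = k]$ is recognized as a geometric random variable with mean $E[N] = \frac{\lambda}{\mu}$.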
(x) The queueing system is modeled by a birth and death process. The death rate is obvious and equal to $\mu$. The arrival rate into state $j$ equals the arrival rate of customers $\lambda$ multiplied by the probability of really going to that state $j$, which is $\frac{1}{j+1}$. Hence, $\lambda_j = \frac{\lambda}{j+1}$. The steady-state distribution of this birth and death process is a Poisson distribution with mean $\rho = \frac{\lambda}{\mu}$, as derived above in Section C.7 solution (i).
(xi) The M/M/m/m/s queue (the Engset formula). The arrival rate in the Engset model is proportional to the size of the "still demanding subgroup" and the number of arrivals is exponential. The holding time of a line is exponentially distributed with mean $\frac{1}{\mu}$.
The Engset model is described as a birth-death process where each state $k$ refers to the size of the "served subgroup". Since the total of customers is $s$, the "still demanding subgroup" consists of $s - k$ members. The birth rate is $\lambda_k = \alpha (s - k)$ and the death rate $\mu_k = k \mu$. The proportionality factor $\alpha$ can be interpreted as the arrival rate per "still demanding customer". The Markov graph is depicted in Fig. C.6.

Fig. C.6. The Markov chain of the Engset loss model.

Application of the general birth-death formulae for the steady-state vector (11.16) or $\Pr[N_S = j]$ yields, with $r = \frac{\alpha}{\mu}$,
\[ \pi_j = \frac{\prod_{k=1}^{j} \frac{\alpha (s - k + 1)}{k \mu}}{1 + \sum_{n=1}^{m} \prod_{k=1}^{n} \frac{\alpha (s - k + 1)}{k \mu}} = \frac{\binom{s}{j} r^j}{\sum_{n=0}^{m} \binom{s}{n} r^n} \]
The computation of the blocking probability is more complex than for the Erlang B formula, because the arrival process is not a Poisson process. Indeed, due to the finite number of customers $s$, the largest number of possible arrivals is finite and the arrival rate depends on the state. Hence, the PASTA property cannot be applied. For a small time interval $\Delta t$, the blocking probability $P_B$ equals the ratio of $p_b(\Delta t)$, the probability of blocking in $\Delta t$, over $p_a(\Delta t)$, the probability of an arrival in $\Delta t$. Since the arrival rates depend on the state, the probability of an arrival in $\Delta t$ is no longer state-independent as it is for the Erlang B model. Instead, we have
\begin{align*}
p_a(\Delta t) &= \alpha \Delta t \sum_{n=0}^{m} (s - n) \Pr[N_S = n] = \alpha \Delta t\, \frac{\sum_{n=0}^{m} (s - n) \binom{s}{n} r^n}{\sum_{n=0}^{m} \binom{s}{n} r^n} \\
&= \alpha s \Delta t\, \frac{\sum_{n=0}^{m} \binom{s-1}{n} r^n}{\sum_{n=0}^{m} \binom{s}{n} r^n}
\end{align*}
Furthermore, blocking is only caused if $N_S = m$ and if at least one of the $s - m$ customers of the "still demanding group" generates an arrival. However, since the interval $\Delta t$ can be made arbitrarily small$^1$, a generation of more than 1 arrival has probability $o(\Delta t)$ such that it suffices to consider only one call attempt. Hence,
\[ p_b(\Delta t) = \alpha \Delta t\, (s - m) \Pr[N_S = m] = \alpha s \Delta t\, \frac{\binom{s-1}{m} r^m}{\sum_{n=0}^{m} \binom{s}{n} r^n} \]
The Engset call blocking probability $P_B = \frac{p_b(\Delta t)}{p_a(\Delta t)}$ becomes
\[ P_B = \frac{\binom{s-1}{m} r^m}{\sum_{n=0}^{m} \binom{s-1}{n} r^n} \qquad \text{(C.3)} \]
Observe that $P_B = \Pr[N_S = m]$ in a system with $s - 1$ instead of $s$ customers: an entering customer observes a system with $s - 1$ customers, ignoring himself. At last, if we denote $\lambda = \alpha s$, the Engset call blocking formula (C.3) can be rewritten as
\[ P_B = \frac{\frac{1}{m!} \left(\frac{\lambda}{\mu}\right)^m}{\sum_{n=0}^{m} \frac{(s - 1 - m)!\, s^{m-n}}{n!\, (s - 1 - n)!} \left(\frac{\lambda}{\mu}\right)^n} \]
The ratio $\frac{(s - 1 - m)!}{(s - 1 - n)!}$ is a polynomial in $s$ of degree $n - m$ such that $\lim_{s \to \infty} \frac{(s - 1 - m)!\, s^{m-n}}{(s - 1 - n)!} = 1$. In conclusion, if $\lambda = \alpha s$ and $s \to \infty$, the Engset call blocking probability reduces to the Erlang B formula (14.17).
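The two statements of this solution, that the ratio $p_b/p_a$ collapses to (C.3) and that (C.3) tends to Erlang B for $\lambda = \alpha s$ fixed and $s \to \infty$, can be checked numerically. The sketch below is illustrative only; the function names and the test values of $s$, $m$ and $r$ are assumptions.

```python
from math import comb, factorial

def engset_blocking(s, m, r):
    """Call blocking (C.3): state-m probability in a system with s-1 customers."""
    numerator = comb(s - 1, m) * r ** m
    denominator = sum(comb(s - 1, n) * r ** n for n in range(m + 1))
    return numerator / denominator

def engset_from_rates(s, m, r):
    """Ratio p_b/p_a computed from the steady state with s customers."""
    weights = [comb(s, n) * r ** n for n in range(m + 1)]   # unnormalized pi_n
    blocked = (s - m) * weights[m]
    arrivals = sum((s - n) * w for n, w in enumerate(weights))
    return blocked / arrivals

def erlang_b(load, m):
    terms = [load ** n / factorial(n) for n in range(m + 1)]
    return terms[m] / sum(terms)

print(engset_blocking(10, 3, 0.2), engset_from_rates(10, 3, 0.2))

load = 2.0   # lambda / mu held fixed while s grows
for s in (10, 100, 10_000, 1_000_000):
    print(s, round(engset_blocking(s, 3, load / s), 6))
print("Erlang B:", round(erlang_b(load, 3), 6))
```

The first line prints the same value twice, and the loop shows the Engset blocking approaching the Erlang B value as the population grows.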
(xii) Although for a M/D/1 the exact expression of the overflow probability (14.44) exists, this series converges slowly for high traffic intensities $\rho = \lambda$ so that fast executable expressions are desirable. Substituting (14.45) with $\rho = \lambda$ into (14.67) gives
\[ \mathrm{clr}(\rho) \approx \frac{1 - \rho}{\rho}\; \frac{\frac{1 - \rho}{\rho \zeta - 1}\, \zeta^{-K}}{1 - \frac{1 - \rho}{\rho \zeta - 1}\, \zeta^{-K}} \]
For sufficiently high loads $\rho > 0.8$, we use the approximation $\zeta \approx \rho^{-2}$ of Section 5.7 to obtain
\[ \mathrm{clr}_{M/D/1/K} \approx \frac{(1 - \rho)\, \rho^{2K}}{1 - \rho^{2K+1}} \qquad \text{(C.4)} \]
Comparing with (14.20) in the M/M/1/K queue,
\[ \mathrm{clr}_{M/M/1/K} \approx \frac{(1 - \rho)\, \rho^{K}}{1 - \rho^{K+1}} \]
the M-server (in continuous-time) needs approximately twice as many buffer places to guarantee the same cell loss ratio as in the corresponding D-server (in discrete-time). Further combining (14.3) and (14.38) shows that
\[ \mu_{M/M/1}\, E\left[w_{M/M/1}\right] = 2\, \mu_{M/D/1}\, E\left[w_{M/D/1}\right] \]
or, the average waiting time in the queue (normalized to the average service time) for the M/M/1 queue is exactly twice as long as for the M/D/1 queue. The variability of the service in the M-server causes these rather large differences in performance. Furthermore, the simple formula (C.4) is particularly useful to engineer ATM buffers or to dimension simple queueing networks. If the number of individual flows that constitute the aggregate flow is large enough and none of the individual flows is dominant, the aggregate arrival process is quite well approximated by a Poisson process. Given as a QoS requirement a stringent cell loss ratio $\mathrm{clr}^*$, the input flow can be limited such that $\mathrm{clr}_{M/D/1/K} < \mathrm{clr}^*$. Alternatively, the buffer size $K$ can be derived from (C.4) subject to $\mathrm{clr}_{M/D/1/K} = \mathrm{clr}^*$ for an aggregate Poisson input flow $\rho = 0.9$. As long as the input flow is limited to $\rho < 0.9$, the thus found buffer size $K$ always guarantees a cell loss ratio below $\mathrm{clr}^*$ provided the input flow can be approximated as a Poisson arrival process.

$^1$ Similar arguments are used in Chapter 7 when studying the Poisson process.
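A buffer dimensioning exercise based on (C.4) might look as follows. This is a sketch under stated assumptions: the target cell loss ratio $10^{-9}$ is a hypothetical requirement chosen for illustration, and the helper names are not from the text.

```python
def clr_md1k(rho, k):
    """Approximation (C.4) for the M/D/1/K cell loss ratio at high load."""
    return (1 - rho) * rho ** (2 * k) / (1 - rho ** (2 * k + 1))

def clr_mm1k(rho, k):
    """Corresponding M/M/1/K expression used for comparison in the text."""
    return (1 - rho) * rho ** k / (1 - rho ** (k + 1))

def min_buffer(clr_fn, rho, target):
    """Smallest buffer size K whose cell loss ratio does not exceed the target."""
    k = 0
    while clr_fn(rho, k) > target:
        k += 1
    return k

rho = 0.9
target = 1e-9   # hypothetical QoS requirement, not a value from the text

k_d = min_buffer(clr_md1k, rho, target)
k_m = min_buffer(clr_mm1k, rho, target)
print("M/D/1/K buffer:", k_d, " M/M/1/K buffer:", k_m)
```

Because both approximations are decreasing in the load for fixed $K$ in this regime, a buffer sized at $\rho = 0.9$ also meets the target for lighter loads, which is the argument made in the paragraph above.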
C.9 General characteristics of graphs (Chapter 15)
(i) In one dimension ($d = 1$), the hopcount $h_N$ of the shortest path between two uniformly chosen points $x_A$ and $x_B$ equals the distance between $x_A$ and $x_B$. We allow the hopcount to be zero, which is reflected by the small $h$, while capital $H$ refers to the case where the source $A$ and the destination $B$ are different. Thus, $\Pr[|x_A - x_B| = k] = \frac{1}{Z} 1_{k=0} + \frac{2 (Z - k)}{Z^2} 1_{1 \le k \le Z-1}$ with corresponding generating function
\[ \varphi_Z(x) = \sum_{k=0}^{Z-1} \Pr[|x_A - x_B| = k]\, x^k = \frac{Z - Z x^2 + 2x \left(x^Z - 1\right)}{Z^2 (x - 1)^2} \]
Since the nodes are uniformly chosen, all coordinate dimensions are independent and the generating function of the hopcount of the shortest path in a $d$-lattice is$^2$ $\varphi_Z^d(x)$. From (2.26) and (2.27), the average number of hops is immediate as $E[h_N] = \frac{d}{3Z} \left(Z^2 - 1\right)$ and the variance as $\mathrm{Var}[h_N] = \frac{d \left(Z^2 - 1\right)\left(Z^2 + 2\right)}{18 Z^2}$. The total number of nodes in the $d$-lattice is $N = Z^d$ such that, for large $N$, we obtain
\[ E[h_N] \approx \frac{d}{3}\, N^{1/d} \]
and
\[ \mathrm{Var}[h_N] \approx \frac{d}{18}\, N^{2/d} \]
both increasing in $d > 1$ (for constant $N$) as in $N$ (for constant $d$). For a two-dimensional lattice, the average hopcount scales as $O\left(\sqrt{N}\right)$.
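The mean and variance formulas for the lattice hopcount can be verified by brute force on small lattices. The sketch below enumerates all ordered node pairs (source and destination may coincide, as in the text); the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def hop_moments(z, d):
    """Exact mean and variance of the hopcount between two uniformly chosen
    nodes of a d-dimensional lattice with z nodes per side."""
    nodes = list(product(range(z), repeat=d))
    # The shortest path in a lattice has length equal to the Manhattan distance.
    hops = [sum(abs(a - b) for a, b in zip(u, v)) for u in nodes for v in nodes]
    count = len(hops)
    mean = Fraction(sum(hops), count)
    var = Fraction(sum(h * h for h in hops), count) - mean ** 2
    return mean, var

def formula(z, d):
    """Closed forms E[h_N] = d(Z^2-1)/(3Z) and Var[h_N] = d(Z^2-1)(Z^2+2)/(18Z^2)."""
    mean = Fraction(d * (z * z - 1), 3 * z)
    var = Fraction(d * (z * z - 1) * (z * z + 2), 18 * z * z)
    return mean, var

for z, d in [(7, 1), (4, 2), (3, 3)]:
    print(z, d, hop_moments(z, d), formula(z, d))
```

Each printed line should show two identical pairs of fractions, one from enumeration and one from the closed forms.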
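(ii) Using the definition (15.6) of the clustering coefficient and applying the law of total probability (2.46) yields
\[ \Pr\left[c_{G_p(N)} \le x\right] = \sum_{k=0}^{N-1} \Pr\left[\frac{2y}{d_v (d_v - 1)} \le x \,\Big|\, d_v = k\right] \Pr[d_v = k] \]
The degree distribution $\Pr[d_v = k]$ in the random graph is given by (15.11) and
\[ \Pr\left[\frac{2y}{d_v (d_v - 1)} \le x \,\Big|\, d_v = k\right] = \Pr\left[y \le \binom{k}{2} x \,\Big|\, d_v = k\right] = \sum_{j=0}^{\left[\binom{k}{2} x\right]} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j} \]
because $y$ is the number of links between the $d_v = k$ neighbors of $v$, which is binomially distributed with parameter $p$. Combined gives
\[ \Pr\left[c_{G_p(N)} \le x\right] = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1 - p)^{N-1-k} \sum_{j=0}^{\left[\binom{k}{2} x\right]} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j} \]
The average $E\left[c_{G_p(N)}\right]$ is computed via (2.35) as
\begin{align*}
E\left[c_{G_p(N)}\right] &= \int_0^1 \Pr\left[c_{G_p(N)} > x\right] dx \\
&= \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1 - p)^{N-1-k} \int_0^1 \sum_{j=\left[\binom{k}{2} x\right]+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j}\, dx
\end{align*}
Let $t = \binom{k}{2} x$, then
\begin{align*}
\int_0^1 \sum_{j=\left[\binom{k}{2} x\right]+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j}\, dx &= \frac{1}{\binom{k}{2}} \int_0^{\binom{k}{2}} \sum_{j=[t]+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j}\, dt \\
&= \frac{1}{\binom{k}{2}} \sum_{t=0}^{\binom{k}{2}-1} \sum_{j=t+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j}
\end{align*}
Reversing the $t$- and $j$-sum yields
\begin{align*}
\int_0^1 \sum_{j=\left[\binom{k}{2} x\right]+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j}\, dx &= \frac{1}{\binom{k}{2}} \sum_{j=1}^{\binom{k}{2}} \sum_{t=0}^{j-1} \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j} \\
&= \frac{1}{\binom{k}{2}} \sum_{j=1}^{\binom{k}{2}} j \binom{\binom{k}{2}}{j} p^j (1 - p)^{\binom{k}{2} - j} = p
\end{align*}
Hence, we find that $E\left[c_{G_p(N)}\right] = p$. Along the same lines, we find that the generating function $\varphi_c(z)$ of the clustering coefficient $c_{G_p(N)}$ is
\[ \varphi_c(z) = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1 - p)^{N-1-k} \left(1 - p + p\, e^{-z / \binom{k}{2}}\right)^{\binom{k}{2}} \]
The variance is computed from (2.43) as
\[ \mathrm{Var}\left[c_{G_p(N)}\right] = \left(p - p^2\right) \sum_{k=2}^{N-1} \frac{\binom{N-1}{k} p^k (1 - p)^{N-1-k}}{\binom{k}{2}} \]

$^2$ If the sizes of the hypercube are not identical, the pgf is $\prod_{j=1}^{d} \varphi_{Z_j}(x)$.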
(iii) The probability $\Pr[H_N = 2]$ is determined by the intersection of two independent events. First, there is no direct path between node $A$ and $B$. This event has a chance proportional to $1 - p$. Second, there is at least one path with two hops. All $N - 2$ possible two-hop paths between $A$ and $B$ have the structure $(A \to j)(j \to B)$ and they have no links in common, i.e. they are mutually independent and independent from the direct link. The probability of the second event equals $1 - P_2$, where $P_2$ is the probability that there is no path with two hops. Hence, we have that $\Pr[H_N = 2] = (1 - p)(1 - P_2)$ and it remains to compute $P_2$. The event of no path with two hops is
\[ \left(\bigcup_{j=1}^{N-2} 1_{(A \to j)(j \to B)}\right)^c = \bigcap_{j=1}^{N-2} 1_{((A \to j)(j \to B))^c} \]
such that
\begin{align*}
P_2 &= \Pr\left[\left(\bigcup_{j=1}^{N-2} 1_{(A \to j)(j \to B)}\right)^c\right] = \Pr\left[\bigcap_{j=1}^{N-2} 1_{((A \to j)(j \to B))^c}\right] \\
&= \prod_{j=1}^{N-2} \Pr\left[1_{((A \to j)(j \to B))^c}\right] = \prod_{j=1}^{N-2} \left(1 - \Pr\left[1_{(A \to j)(j \to B)}\right]\right) = \left(1 - p^2\right)^{N-2}
\end{align*}
which demonstrates (15.26).
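For a small random graph the result $\Pr[H_N = 2] = (1 - p)\left(1 - (1 - p^2)^{N-2}\right)$ can be confirmed by exhaustively enumerating all link configurations. The sketch below does this for a handful of nodes; it is an illustration with made-up function names and is only practical for very small $N$.

```python
from itertools import combinations, product

def prob_two_hops_exact(n, p):
    """Sum the probabilities of all graphs on n nodes in which the shortest
    path between node 0 and node n-1 has exactly two hops."""
    links = list(combinations(range(n), 2))   # each link as (i, j) with i < j
    a, b = 0, n - 1
    total = 0.0
    for present in product([False, True], repeat=len(links)):
        graph = {link for link, on in zip(links, present) if on}
        weight = 1.0
        for on in present:
            weight *= p if on else 1 - p
        direct = (a, b) in graph
        two_hop = any((a, j) in graph and (j, b) in graph for j in range(1, n - 1))
        if not direct and two_hop:
            total += weight
    return total

def prob_two_hops_formula(n, p):
    return (1 - p) * (1 - (1 - p * p) ** (n - 2))

for n, p in [(4, 0.3), (5, 0.6)]:
    print(n, p, prob_two_hops_exact(n, p), prob_two_hops_formula(n, p))
```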
C.10 The uniform recursive tree (Chapter 16)
(i) The relative error $r$, defined as 1 minus the simulated value over the exact value at hop $k$ given in (16.14), versus the number of hops $k$ is shown in Fig. C.7. The insert in Fig. C.7 illustrates that, on a linear scale, the difference between simulation and theory (full line) is not distinguishable for $n \ge 10^5$ iterations.

Fig. C.7. The relative error of the simulations of the hopcount in the complete graph with exponential link weight versus the hopcount for $10^4$, $10^5$ and $10^6$ iterations. (Axes: relative error versus $k$ hops; insert: $\Pr[H_{50} = k]$ versus $k$ hops, with the exact pdf as full line.)

The average $E[r_n]$ and standard deviation $\sigma[r_n]$ of the relative error for $n$ iterations versus the hops $k$ are

  n         E[r_n]    sigma[r_n]
  10^4      0.12      0.17
  10^5      0.047     0.073
  10^6      0.017     0.02

where the range of $k$ values has been limited for $n = 10^4$ to 10 hops, for $n = 10^5$ to 11 hops, and for $n = 10^6$ to 12 hops. For larger hops, the simulations return zeros because the tail probability $\Pr[H_N > k]$ decreases as $O(1/k!)$ and simulating such a rare event requires on average at least as many simulations as $(\Pr[H_N = k])^{-1}$. The table roughly shows that the average error over the non-zero returned values decreases as $O\left(\frac{1}{\sqrt{n}}\right)$, which is in agreement with the Central Limit Theorem 6.3.1. Each iteration of the simulation can be regarded as an independent trial and the histogram sums in a particular way the number of these trials.
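(ii) Using (2.43), we have
\begin{align*}
\mathrm{Var}[W_N] &= \varphi''_{W_N}(0) - \left(\varphi'_{W_N}(0)\right)^2 = \varphi''_{W_N}(0) - \left(E[W_N]\right)^2 \\
&= \frac{1}{N-1} \sum_{k=1}^{N-1} \left. \frac{d^2}{dz^2} \prod_{n=1}^{k} \frac{n (N - n)}{z + n (N - n)} \right|_{z=0} - \left(\frac{1}{N-1} \sum_{n=1}^{N-1} \frac{1}{n}\right)^2
\end{align*}
where $E[W_N]$ is given in (16.18). The derivatives of the product
\[ g(z) = \prod_{n=1}^{k} \frac{n (N - n)}{z + n (N - n)} \]
are elegantly computed via the logarithmic derivative $\frac{dg(z)}{dz} = g(z) \frac{d \log g(z)}{dz}$. The second derivative is $\frac{d^2 g(z)}{dz^2} = g(z) \left(\frac{d \log g(z)}{dz}\right)^2 + g(z) \frac{d^2 \log g(z)}{dz^2}$. With
\[ \frac{d \log g(z)}{dz} = \frac{d}{dz} \sum_{n=1}^{k} \log \frac{n (N - n)}{z + n (N - n)} = -\sum_{n=1}^{k} \frac{1}{z + n (N - n)} \]
\[ \frac{d^2 \log g(z)}{dz^2} = \sum_{n=1}^{k} \frac{1}{(z + n (N - n))^2} \]
we obtain, since $g(0) = 1$,
\[ \mathrm{Var}[W_N] = \frac{1}{N-1} \sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n (N - n)}\right)^2 + \frac{1}{N-1} \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N - n)^2} - \left(\frac{\sum_{n=1}^{N-1} \frac{1}{n}}{N-1}\right)^2 \qquad \text{(C.5)} \]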
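The first sum is
\[ \sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n (N - n)}\right)^2 = \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n (N - n)} \sum_{j=1}^{k} \frac{1}{j (N - j)} = \sum_{n=1}^{N-1} \frac{1}{n (N - n)} \sum_{k=n}^{N-1} \sum_{j=1}^{k} \frac{1}{j (N - j)} \]
and, with $\sum_{k=n}^{N-1} \sum_{j=1}^{k} \frac{1}{j (N - j)} = \sum_{k=1}^{N-1} \sum_{j=1}^{k} \frac{1}{j (N - j)} - \sum_{k=1}^{n-1} \sum_{j=1}^{k} \frac{1}{j (N - j)}$,
\begin{align*}
\sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n (N - n)}\right)^2 &= \sum_{n=1}^{N-1} \frac{1}{n (N - n)} \left(\sum_{k=1}^{N-1} \sum_{j=1}^{k} \frac{1}{j (N - j)} - \sum_{k=1}^{n-1} \sum_{j=1}^{k} \frac{1}{j (N - j)}\right) \\
&= \sum_{n=1}^{N-1} \frac{1}{n (N - n)} \left(\sum_{j=1}^{N-1} \frac{1}{j (N - j)} \sum_{k=j}^{N-1} 1 - \sum_{j=1}^{n-1} \frac{1}{j (N - j)} \sum_{k=j}^{n-1} 1\right) \\
&= \sum_{n=1}^{N-1} \frac{1}{n (N - n)} \sum_{j=1}^{N-1} \frac{1}{j} - \sum_{n=1}^{N-1} \frac{1}{N - n} \sum_{j=1}^{n-1} \frac{1}{j (N - j)} + \sum_{n=1}^{N-1} \frac{1}{n (N - n)} \sum_{j=1}^{n-1} \frac{1}{N - j}
\end{align*}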
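Furthermore, since $\sum_{n=1}^{k} \frac{1}{n (N - n)} = \frac{1}{N} \sum_{n=1}^{k} \frac{1}{n} + \frac{1}{N} \sum_{n=N-k}^{N-1} \frac{1}{n}$, we have
\begin{align*}
\sum_{n=1}^{N-1} \frac{1}{N - n} \sum_{j=1}^{n-1} \frac{1}{j (N - j)} &= \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{N - n} \sum_{k=1}^{n-1} \frac{1}{k} + \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{N - n} \sum_{k=N-n+1}^{N-1} \frac{1}{k} \\
&= \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=1}^{N-j-1} \frac{1}{k} + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=j+1}^{N-1} \frac{1}{k}
\end{align*}
and
\begin{align*}
\sum_{n=1}^{N-1} \frac{1}{n (N - n)} \sum_{j=1}^{n-1} \frac{1}{N - j} &= \frac{1}{N} \sum_{n=1}^{N-1} \left(\frac{1}{n} + \frac{1}{N - n}\right) \sum_{j=1}^{n-1} \frac{1}{N - j} \\
&= \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=j+1}^{N-1} \frac{1}{k}
\end{align*}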
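Hence,
\begin{align*}
\sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n (N - n)}\right)^2 &= \frac{2}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 - \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=1}^{N-j-1} \frac{1}{k} + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} \\
&= \frac{2}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 - \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \left(\sum_{k=1}^{N-1} \frac{1}{k} - \sum_{k=N-j}^{N-1} \frac{1}{k}\right) + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} \\
&= \frac{1}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j}\, \frac{1}{N - j} + \frac{2}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} \\
&= \frac{1}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 + \frac{2}{N^2} \sum_{j=1}^{N-1} \frac{1}{j} + \frac{2}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}
\end{align*}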
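Substituted into (C.5) yields
\[ \mathrm{Var}[W_N] = -\frac{\left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2}{(N-1)^2 N} + \frac{2 \sum_{j=1}^{N-1} \frac{1}{j}}{(N-1) N^2} + \frac{2 \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}}{N (N-1)} + \frac{\sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N - n)^2}}{N-1} \]
Further,
\[ \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N - n)^2} = \sum_{n=1}^{N-1} \frac{1}{n^2 (N - n)^2} \sum_{k=n}^{N-1} 1 = \sum_{n=1}^{N-1} \frac{1}{n^2 (N - n)} \]
The partial fraction expansion of $\frac{1}{n^2 (N - n)} = \frac{1}{N^2 n} + \frac{1}{N n^2} + \frac{1}{N^2 (N - n)}$ is such that
\[ \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N - n)^2} = \frac{2}{N^2} \sum_{n=1}^{N-1} \frac{1}{n} + \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{n^2} \]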
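Combined,
$$
\mathrm{Var}[W_N]
=-\frac{\left(\sum_{n=1}^{N-1}\frac{1}{n}\right)^{2}}{(N-1)^{2}N}+\frac{4\sum_{n=1}^{N-1}\frac{1}{n}}{(N-1)N^{2}}+\frac{\sum_{n=1}^{N-1}\frac{1}{n^{2}}}{N(N-1)}+\frac{2}{N(N-1)}\sum_{j=1}^{N-1}\frac{1}{j}\sum_{k=N-j+1}^{N-1}\frac{1}{k}
$$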
Invoking the identity
$$
\sum_{j=1}^{N-1}\frac{1}{N-j}\sum_{k=j}^{N-1}\frac{1}{k}=\sum_{n=1}^{N-1}\frac{1}{n^{2}}
$$
(which can be verified by induction) yields
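$$
\begin{aligned}
\sum_{j=1}^{N-1}\frac{1}{j}\sum_{k=N-j+1}^{N-1}\frac{1}{k}
&=\sum_{j=1}^{N-1}\frac{1}{N-j}\sum_{k=j+1}^{N-1}\frac{1}{k}
=\sum_{j=1}^{N-1}\frac{1}{N-j}\left(\sum_{k=j}^{N-1}\frac{1}{k}-\frac{1}{j}\right)\\
&=\sum_{j=1}^{N-1}\frac{1}{N-j}\sum_{k=j}^{N-1}\frac{1}{k}-\sum_{j=1}^{N-1}\frac{1}{j(N-j)}
=\sum_{n=1}^{N-1}\frac{1}{n^{2}}-\frac{2}{N}\sum_{n=1}^{N-1}\frac{1}{n}
\end{aligned}
$$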
Finally, we arrive at (16.19).
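The chain of sum manipulations above can be checked with exact rational arithmetic. The sketch below is an illustration, not part of the original solution: it assumes, as the structure of the pgf (16.17) suggests, that $W_N$ is with probability $1/(N-1)$ a sum of $k$ independent exponentials with rates $n(N-n)$, and it takes the closed form obtained by inserting the last display into the combined expression as the content of (16.19), which is not reproduced in this section. The function names are hypothetical.

```python
from fractions import Fraction


def var_from_mixture(N):
    """Var[W_N] computed directly from the mixture structure behind (16.17).

    Assumption: with probability 1/(N-1), W_N is a sum of k independent
    exponentials with rates n(N-n), n = 1..k, for k = 1..N-1.
    """
    mean = Fraction(0)
    second = Fraction(0)
    s1 = Fraction(0)  # running sum of 1/(n(N-n)): mean of the k-term sum
    s2 = Fraction(0)  # running sum of 1/(n(N-n))^2: variance of the k-term sum
    for k in range(1, N):
        a = Fraction(1, k * (N - k))
        s1 += a
        s2 += a * a
        mean += s1
        second += s1 * s1 + s2  # E[S^2] = (E[S])^2 + Var[S]
    mean /= N - 1
    second /= N - 1
    return second - mean * mean


def var_closed_form(N):
    """Closed form that results from the derivation above (assumed to be (16.19))."""
    h1 = sum(Fraction(1, n) for n in range(1, N))
    h2 = sum(Fraction(1, n * n) for n in range(1, N))
    return Fraction(3, N * (N - 1)) * h2 - h1 * h1 / ((N - 1) ** 2 * N)


def identity_holds(N):
    """Check sum_j 1/(N-j) sum_{k>=j} 1/k == sum_n 1/n^2 for a given N."""
    lhs = sum(
        Fraction(1, N - j) * sum(Fraction(1, k) for k in range(j, N))
        for j in range(1, N)
    )
    rhs = sum(Fraction(1, n * n) for n in range(1, N))
    return lhs == rhs


if __name__ == "__main__":
    for N in (5, 10, 20):
        print(N, var_from_mixture(N) == var_closed_form(N), identity_holds(N))
```

Because `Fraction` is exact, agreement for a range of $N$ is a direct check of the algebra rather than a numerical approximation.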
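(iii) The limit for $N\to\infty$ of the probability generating function (16.17) of the weight $W_N$ of the shortest path,
$$
\varphi_{W_N}(z)=E\left[e^{-zW_N}\right]=\frac{1}{N-1}\sum_{k=1}^{N-1}\prod_{n=1}^{k}\frac{n(N-n)}{z+n(N-n)}
$$
will be derived, from which the distribution then follows by taking the inverse Laplace transform. Since
$$
z+n(N-n)=\left(\sqrt{\left(\frac{N}{2}\right)^{2}+z}+\frac{N}{2}-n\right)\left(\sqrt{\left(\frac{N}{2}\right)^{2}+z}-\frac{N}{2}+n\right)
$$
we have, with $y=\sqrt{\left(\frac{N}{2}\right)^{2}+z}$,
$$
\prod_{n=1}^{k}\frac{n(N-n)}{z+n(N-n)}=\frac{k!\,(N-1)!}{(N-k-1)!}\prod_{n=1}^{k}\frac{1}{y+\frac{N}{2}-n}\prod_{n=1}^{k}\frac{1}{y-\frac{N}{2}+n}
$$
The products can be written in terms of the Gamma function,
$$
\prod_{n=1}^{k}\frac{1}{y+\frac{N}{2}-n}=\frac{\Gamma\left(y+\frac{N}{2}-k\right)}{\Gamma\left(y+\frac{N}{2}\right)},\qquad
\prod_{n=1}^{k}\frac{1}{y-\frac{N}{2}+n}=\frac{\Gamma\left(y-\frac{N}{2}+1\right)}{\Gamma\left(y-\frac{N}{2}+k+1\right)}
$$
Thus,
$$
\varphi_{W_N}(z)=(N-2)!\,\frac{\Gamma\left(y-\frac{N}{2}+1\right)}{\Gamma\left(y+\frac{N}{2}\right)}\sum_{k=1}^{N-1}\frac{\Gamma(k+1)\,\Gamma\left(y+\frac{N}{2}-k\right)}{\Gamma(N-k)\,\Gamma\left(y-\frac{N}{2}+k+1\right)}
$$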
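As a sanity check on the rewriting in terms of Gamma functions, the following sketch evaluates the product form of the pgf (16.17) and the Gamma-function form side by side for real $z\ge 0$. It is only an illustration: the function names are hypothetical, and `math.lgamma` is used so that the large factorials never have to be formed explicitly.

```python
import math


def pgf_product(z, N):
    """Product form: (1/(N-1)) * sum_k prod_{n<=k} n(N-n) / (z + n(N-n))."""
    total = 0.0
    prod = 1.0
    for k in range(1, N):
        r = k * (N - k)
        prod *= r / (z + r)
        total += prod
    return total / (N - 1)


def pgf_gamma(z, N):
    """Gamma-function form with y = sqrt((N/2)^2 + z), valid here for z >= 0."""
    y = math.sqrt((N / 2) ** 2 + z)
    # log of (N-2)! * Gamma(y - N/2 + 1) / Gamma(y + N/2)
    log_pref = math.lgamma(N - 1) + math.lgamma(y - N / 2 + 1) - math.lgamma(y + N / 2)
    total = 0.0
    for k in range(1, N):
        log_term = (
            math.lgamma(k + 1)
            + math.lgamma(y + N / 2 - k)
            - math.lgamma(N - k)
            - math.lgamma(y - N / 2 + k + 1)
        )
        total += math.exp(log_pref + log_term)
    return total


if __name__ == "__main__":
    for z in (0.0, 0.5, 2.0, 10.0):
        print(z, pgf_product(z, 20), pgf_gamma(z, 20))
```

For $z\ge 0$ all Gamma arguments are positive, so working with logarithms is safe; for complex $z$ a different evaluation would be needed.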
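Let the number of nodes be even, $N=2M$, such that $y=\sqrt{M^{2}+z}\sim M+\frac{z}{2M}$ (provided $|z|<2M$). The sum, denoted by $S$, can be split as
$$
\begin{aligned}
S&=\sum_{k=1}^{M}\frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)}+\sum_{k=M+1}^{2M-1}\frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)}\\
&=\sum_{j=0}^{M-1}\frac{\Gamma(M-j+1)\,\Gamma(y+j)}{\Gamma(M+j)\,\Gamma(y-j+1)}+\sum_{k=1}^{M-1}\frac{\Gamma(M+k+1)\,\Gamma(y-k)}{\Gamma(M-k)\,\Gamma(y+k+1)}\\
&=\sum_{j=-(M-1)}^{M-1}\frac{\Gamma(y+j)\,\Gamma(M-j+1)}{\Gamma(M+j)\,\Gamma(y-j+1)}
\end{aligned}
$$
and
$$
\varphi_{W_{2M}}(z)=(2M-1)!\,\frac{\Gamma(y-M+1)}{\Gamma(y+M)}\,\frac{1}{2M-1}\sum_{j=-(M-1)}^{M-1}\frac{\Gamma(y+j)\,\Gamma(M-j+1)}{\Gamma(M+j)\,\Gamma(y-j+1)}
$$
For large $M$,
$$
(2M-1)!\,\frac{\Gamma(y-M+1)}{\Gamma(y+M)}\sim\Gamma(2M)\,\frac{\Gamma\left(\frac{z}{2M}+1\right)}{\Gamma\left(2M+\frac{z}{2M}\right)}\sim(2M)^{-\frac{z}{2M}}\,\Gamma\left(\frac{z}{2M}+1\right)
$$
which suggests that we consider $z\to 2Mz$ since then, using (Abramowitz and Stegun, 1968, Section 6.1.47),
$$
\begin{aligned}
\varphi_{W_{2M}}(2Mz)&\sim(2M)^{-z}\,\Gamma(z+1)\,\frac{1}{2M-1}\sum_{j=-(M-1)}^{M-1}\left(1+O\left(\frac{1}{M}\right)\right)\\
&\sim(2M)^{-z}\,\Gamma(z+1)\left(1+O\left(\frac{1}{M}\right)\right)
\end{aligned}
$$
Hence,
$$
\lim_{N\to\infty}N^{z}\varphi_{W_N}(Nz)=\Gamma(z+1)
$$
or equivalently,
$$
\lim_{N\to\infty}E\left[e^{-(NW_N-\log N)z}\right]=\Gamma(z+1)\tag{C.6}
$$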
The inverse Laplace transform of $\Gamma(z+1)$ is a Gumbel distribution (3.37) and we arrive at the asymptotic distribution for the weight of the shortest path (16.20). Since $\Pr[NW_N-\log N\le y]=\Pr\left[W_N\le\frac{y+\log N}{N}\right]$, substituting $x=\frac{y+\log N}{N}$ and ignoring the limit $N\to\infty$ gives $\Pr[W_N\le x]=e^{-Ne^{-Nx}}$, and the probability density function is found after differentiation as
$$
\tilde{f}_{W_N}(x)=N^{2}e^{-N\left(e^{-Nx}+x\right)}\tag{C.7}
$$
The goodness of this asymptotic distribution (C.7) for finite $N$ is illustrated in Fig. C.8. Observe from Fig. C.8 that $f_{W_N}(0)=1$ while $\tilde{f}_{W_N}(0)=N^{2}e^{-N}\approx 0$. Since $\varphi_{W_N}(z)=\int_{0}^{\infty}e^{-zt}f_{W_N}(t)\,dt$ is a single-sided Laplace transform, integrating by parts yields $z\varphi_{W_N}(z)=f_{W_N}(0)+\int_{0}^{\infty}e^{-zt}f'_{W_N}(t)\,dt$ provided $f'_{W_N}(t)$ exists for all $t\ge 0$. Hence, we find a well-known limit criterion of single-sided Laplace transforms,
$$
f_{W_N}(0)=\lim_{z\to\infty}z\varphi_{W_N}(z)\tag{C.8}
$$
Applied to (16.17), this leads to $f_{W_N}(0)=1$ for all finite $N$, and applied to the scaled link weight where the mean is $\frac{1}{a}$, such that $\varphi_{W_N;a}(z)=\varphi_{W_N}\left(\frac{z}{a}\right)$, it gives $f_{W_N}(0)=a$. The interpretation of this property is related to the choice of the link weights. The shortest path includes almost surely the smallest link weights, whose distribution near zero is $F_w(x)=f_w(0)x+O\left(x^{2}\right)$ since $F_w(0)=0$. Both the exponential (with parameter 1) and uniform distribution are regular with $f_w(0)=1$. Since the smallest values of the weight of the shortest path $W_N$ in $K_N$ occur for direct links a.s., the distribution of $W_N$ around zero is dominated by the distribution of the link weight $w$ around zero. The contribution cannot be due to an $n$-hop shortest path with $n>1$ since such a path consists of the sum of $n$ exponentials, which has a probability density around $x=0$ of the form $O\left(x^{n-1}\right)$. Indeed, see (3.24) or apply (C.8) to the pgf of a sum of $n$ exponentials $\varphi_{S_n}(z)=\prod_{k=1}^{n}\frac{k}{z+k}$.

[Figure C.8: semi-logarithmic plot of $f_{W_N}(x)$ versus $x$ (0.00 to 0.35) for $N=50$, $100$ and $200$.]
Fig. C.8. The pdf of the weight of the shortest path for various $N$. Each simulation consists of $10^{6}$ iterations. The bold curves represent the finite $N$-equivalent (C.7) of the asymptotic result.
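The limit criterion (C.8) is easy to probe numerically on the product form of the pgf (16.17). The following sketch is an illustration with hypothetical function names: it evaluates $z\varphi_{W_N}(z)$ for increasing real $z$, and also for the scaled link weight with mean $1/a$, using $\varphi_{W_N;a}(z)=\varphi_{W_N}(z/a)$.

```python
def pgf(z, N):
    """Product form of the pgf of W_N for real z >= 0."""
    total = 0.0
    prod = 1.0
    for k in range(1, N):
        r = k * (N - k)
        prod *= r / (z + r)
        total += prod
    return total / (N - 1)


def density_at_zero_estimate(z, N, a=1.0):
    """z * pgf of the scaled weight W_N / a, which tends to f(0) as z grows.

    Assumption: link weights are exponential with mean 1/a, so that the pgf of
    the scaled weight equals pgf(z / a).
    """
    return z * pgf(z / a, N)


if __name__ == "__main__":
    N = 50
    for z in (1e3, 1e6, 1e9):
        print(z, density_at_zero_estimate(z, N), density_at_zero_estimate(z, N, a=3.0))
```

Only the one-hop term $k=1$ survives the limit, which mirrors the argument that direct links dominate the distribution of $W_N$ around zero.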
(iv) When the intermediate nodes of the shortest path between a source $A$ and a destination $B$ are removed from $K_N$, we obtain again a complete graph with $N-h_N+1$ nodes. The resulting graph contains link weights that are not perfectly exponentially distributed anymore nor are they perfectly independent, because we have removed a special set of nodes and not a random set. But, since we have removed at each node of the shortest path, apart from the shortest link, $N-3$ other links, we assume that the dependence between $K_N$ and the reduced graph is negligibly small. Under these assumptions, the shortest node-disjoint path in $K_N$ is a shortest path in $K_{N-H_N+1}$ with exponential link weight with mean 1. The distribution of the hopcount $h_N^{nd}$ of that shortest node-disjoint path is
$$
\Pr\left[h_N^{nd}=k\right]\simeq\sum_{j=0}^{N-1}\Pr\left[h_{N-j+1}=k\mid h_N=j\right]\Pr\left[h_N=j\right]
$$
The hopcount $H_N$ of the shortest path in the complete graph $K_N$ with independent exponential link weights with mean 1 is given in (16.8). With the assumption that
$$
\Pr\left[h_{N-j+1}=k\mid h_N=j\right]=\Pr\left[h_{N-j+1}=k\right]
$$
we obtain
$$
\Pr\left[h_N^{nd}=k\right]\simeq\frac{(-1)^{k+1}}{N!}\sum_{j=0}^{N-1}\frac{S_{N-j+1}^{(k+1)}\,S_N^{(j+1)}}{(N-j+1)!}
$$
For large $N$, we can use the Poisson approximation (16.13)
$$
\Pr\left[h_N^{nd}=k\right]\approx\frac{1}{N\,k!}\sum_{j=0}^{N-1}\frac{\left(\log(N-j+1)\right)^{k}}{N-j+1}\,\frac{(\log N)^{j}}{j!}
$$
Since $\left(\log(N-j+1)\right)^{k}=\log^{k}N-\frac{k(j-1)}{N}\log^{k-1}N+O\left(\frac{\log^{k-1}N}{N^{2}}\right)$ and $\frac{1}{N-j+1}=\frac{1}{N}+O\left(\frac{1}{N^{2}}\right)$, we have to highest order in $N$,
$$
\begin{aligned}
\Pr\left[h_N^{nd}=k\right]&\approx\frac{1}{N\,k!}\sum_{j=0}^{N-1}\left(\frac{\log^{k}N}{N}+O\left(\frac{\log^{k-1}N}{N^{2}}\right)\right)\frac{(\log N)^{j}}{j!}\\
&\approx\frac{1}{k!}\left(\frac{\log^{k}N}{N}+O\left(\frac{\log^{k-1}N}{N^{2}}\right)\right)\approx\Pr\left[h_N=k\right]
\end{aligned}
$$
For large $N$, we expect that the hopcount of the shortest path and that of the shortest node-disjoint path have about the same distribution. The validity of the assumption is illustrated in Fig. C.9 for relatively small values of $N=50$, $100$, and $200$. Each simulation consisted of $n=10^{6}$ iterations. The corresponding weights of the shortest and node-disjoint shortest path are drawn in Fig. C.10. The weight of the node-disjoint shortest path is evidently always larger than that of the shortest path in the same graph. Nevertheless, for large $N$, the simulations suggest that both pdfs tend to each other.

[Figure C.9: plot of $\Pr[H_N=k]$ versus $k$ hops (0 to 14) for $N=50$, $100$ and $200$; curves for the shortest path and the node-disjoint shortest path.]
Fig. C.9. Both the pdf of the hopcount of the shortest path (thin line) and the shortest node-disjoint path (bold line).
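The claim that the two hopcount distributions agree to highest order can be illustrated by evaluating the Poisson-approximation sum directly. The sketch below is not from the original text; the function names are hypothetical, and the form $(\log N)^{k}/(N\,k!)$ for the shortest-path hopcount is the one implied by the last display above rather than a quotation of (16.13). The weights $(\log N)^{j}/j!$ are built iteratively to avoid overflowing the factorial.

```python
import math


def hop_poisson(k, N):
    """Approximation of Pr[h_N = k] implied by the text: (log N)^k / (N k!)."""
    return math.log(N) ** k / (N * math.factorial(k))


def hop_node_disjoint(k, N):
    """Poisson-approximation sum for the shortest node-disjoint path hopcount."""
    log_n = math.log(N)
    weight = 1.0  # holds (log N)^j / j!, updated iteratively
    total = 0.0
    for j in range(0, N):
        m = N - j + 1  # size of the reduced complete graph
        total += math.log(m) ** k / m * weight
        weight *= log_n / (j + 1)
    return total / (N * math.factorial(k))


if __name__ == "__main__":
    for N in (50, 100, 200):
        ratios = [hop_node_disjoint(k, N) / hop_poisson(k, N) for k in range(1, 11)]
        print(N, [round(r, 4) for r in ratios])
```

The ratios stay close to 1 and the deviation shrinks as $N$ grows, consistent with the correction terms in the expansion being of relative order $1/N$.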
[Figure C.10: semi-logarithmic plot of $f_{W_N}(x)$ versus $x$ (0.0 to 0.5) for $N=50$, $100$ and $200$; curves for the shortest path and the node-disjoint shortest path.]
Fig. C.10. Pdf of the weight of the shortest path (thin line) and the node-disjoint shortest path (bold line).
C.11 The efficiency of multicast (Chapter 17)
(i) Using (17.25), we obtain
$$
\begin{aligned}
g_{N,k}(2)&=N-1-\sum_{j=0}^{D-1}k^{D-j}\,\frac{\left(N-1-\frac{k^{j+1}-1}{k-1}\right)\left(N-2-\frac{k^{j+1}-1}{k-1}\right)}{(N-1)(N-2)}\\
&=\frac{(2N-3)DN}{(N-1)(N-2)}+\frac{(2N-3)D}{(N-1)(N-2)(k-1)}-\frac{2N-3}{(N-2)(k-1)}\\
&\quad-\frac{N(N-1-2D)}{(N-1)(N-2)(k-1)}-\frac{N-1-2D}{(N-1)(N-2)(k-1)^{2}}-\frac{1}{(N-2)(k-1)^{2}}
\end{aligned}
$$
or, for large $N$,
$$
g_{N,k}(2)\sim 2D-\frac{3}{k-1}+O\left(\frac{\log_k N}{N}\right)
$$
The effective power exponent $\beta(N)$, as defined in (17.32), equals for the $k$-ary tree and large $N$,
$$
\begin{aligned}
\beta(N)&\sim\frac{\log\frac{2D-\frac{3}{k-1}}{D-\frac{1}{k-1}}}{\log 2}
=1+\log_2\left[1-\frac{1}{2(k-1)\left(\log_k N+\log_k(1-1/k)-\frac{1}{k-1}\right)}\right]\\
&=1+\log_2\left(1-\frac{1}{2(k-1)E[H_N]}\right)\sim 1-\frac{1}{(\log 4)(k-1)E[H_N]}
\end{aligned}
$$
which shows, for large $N$, that $\beta(N)<1$, but that $\beta(N)\to 1$ if $k\to\infty$.
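The closed form for $g_{N,k}(2)$ can be compared against the defining sum with exact arithmetic. The sketch below is an illustration only; it assumes a complete $k$-ary tree of depth $D$ with $N=(k^{D+1}-1)/(k-1)$ nodes, which is the relation implied by the asymptotic step above, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def tree_size(k, D):
    """Number of nodes of a complete k-ary tree of depth D (assumes k >= 2)."""
    return (k ** (D + 1) - 1) // (k - 1)


def g2_sum(k, D):
    """g_{N,k}(2) evaluated from the sum over j = 0..D-1."""
    N = tree_size(k, D)
    total = Fraction(0)
    for j in range(D):
        x = (k ** (j + 1) - 1) // (k - 1)
        total += Fraction(k ** (D - j) * (N - 1 - x) * (N - 2 - x), (N - 1) * (N - 2))
    return N - 1 - total


def g2_closed(k, D):
    """g_{N,k}(2) evaluated from the six-term closed form."""
    N = tree_size(k, D)
    a = (N - 1) * (N - 2)
    return (
        Fraction((2 * N - 3) * D * N, a)
        + Fraction((2 * N - 3) * D, a * (k - 1))
        - Fraction(2 * N - 3, (N - 2) * (k - 1))
        - Fraction(N * (N - 1 - 2 * D), a * (k - 1))
        - Fraction(N - 1 - 2 * D, a * (k - 1) ** 2)
        - Fraction(1, (N - 2) * (k - 1) ** 2)
    )


def beta_asymptotic(k, D):
    """Large-N effective power exponent: log2 of (2D - 3/(k-1)) / (D - 1/(k-1))."""
    return math.log2((2 * D - 3 / (k - 1)) / (D - 1 / (k - 1)))


if __name__ == "__main__":
    for k, D in ((2, 5), (3, 4), (4, 6)):
        print(k, D, g2_sum(k, D) == g2_closed(k, D), beta_asymptotic(k, D))
```

For $k=2$, $D=1$ the tree is a root with two leaves, and both expressions give exactly 2 links, which is a convenient smallest case to verify by hand.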
Bibliography
Abramowitz, M. and Stegun, I. A. (1968). Handbook of Mathematical Functions.
(Dover Publications, Inc., New York).
Allen, A. O. (1978). Probability, Statistics, and Queueing Theory. Computer Science
and Applied Mathematics, (Academic Press, Inc., Orlando).
Almkvist, G. and Berndt, B. C. (1988). Gauss, Landen, Ramanujan, the Arithmetic-Geometric Mean, Ellipses, and the Ladies Diary. American Mathematical
Monthly 95, 585-608.
Anick, D., Mitra, D., and Sondhi, M. M. (1982). Stochastic theory of a data-handling system with multiple sources. The Bell System Technical Journal 61, 8
(October), 1871-1894.
Anupindi, R., Chopra, S., Deshmukh, S. D., Van Mieghem, J. A., and Zemel, E.
(2006). Managing Business Flows. Principles of Operations Management , 2nd
edn. (Prentice Hall, Upper Saddle River).
Barabasi, A.-L. (2002). Linked, The New Science of Networks. (Perseus, Cambridge,
MA).
Baran, P. (2002). The beginnings of packet switching - some underlying concepts:
The Franklin Institute and Drexel University seminar on the evolution of packet
switching and the Internet. IEEE Communications Magazine, 2—8.
Berger, M. A. (1993). An Introduction to Probability and Stochastic Processes.
(Springer-Verlag, New York).
Bertsekas, D. and Gallager, R. (1992). Data Networks, 2nd edn. (Prentice-Hall
International Editions, London).
Billingsley, P. (1995). Probability and Measure, 3rd edn. (John Wiley & Sons, New
York).
Bisdikian, C., Lew, J. S., and Tantawi, A. N. (1992). On the tail approximation
of the blocking probability of single server queues with finite buffer capacity.
Queueing Networks with Finite Capacity, Proc. 2nd Int. Conf., 267-280.
Bollobas, B. (2001). Random Graphs, 2nd edn. (Cambridge University Press,
Cambridge, UK).
Borovkov, A. A. (1976). Stochastic Processes in Queueing Theory. (Springer-Verlag,
New York).
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. (Cambridge University Press, Cambridge).
Brockmeyer, E., Halstrom, H. L., and Jensen, A. (1948). The Life and Works of
A. K. Erlang. (Academy of Technical Sciences, Copenhagen).
Chalmers, R. C. and Almeroth, K. C. (2001). Modeling the branching characteristics and efficiency gains in global multicast trees. IEEE INFOCOM2001,
Alaska.
Chen, L. Y. (1975). Poisson approximation for dependent trials. The Annals of
Probability 3, 3, 534—545.
Chen, W.-K. (1971). Applied Graph Theory. (North-Holland Publishing Company,
Amsterdam).
Chuang, J. and Sirbu, M. A. (1998). Pricing multicast communication: A cost-based approach. Proceedings of the INET’98.
Cohen, J. W. (1969). The Single Server Queue. (North-Holland Publishing Company, Amsterdam).
Cohen-Tannoudji, C., Diu, B., and Laloë, F. (1977). Mécanique Quantique. Vol. I
and II. (Hermann, Paris).
Comtet, L. (1974). Advanced Combinatorics, revised and enlarged edn. (D. Reidel
Publishing Company, Dordrecht, Holland).
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1991). An Introduction to
Algorithms. (MIT Press, Boston).
Cvetkovic, D. M., Doob, M., and Sachs, H. (1995). Spectra of Graphs, Theory and
Applications, third edn. (Johann Ambrosius Barth Verlag, Heidelberg).
Dorogovtsev, S. N. and Mendes, J. F. F. (2003). Evolution of Networks, From
Biological Nets to the Internet and WWW. (Oxford University Press, Oxford).
Embrechts, P., Klüppelberg, C., and Mikosch, T. (2001a). Modelling Extremal
Events for Insurance and Finance, 3rd edn. (Springer-Verlag, Berlin).
Embrechts, P., McNeil, A., and Straumann, D. (2001b). Correlation and Dependence in Risk Management: Properties and Pitfalls. Risk Management: Value at
Risk and Beyond, ed. M. Dempster and H. K. Moffatt, (Cambridge University
Press, Cambridge, UK).
Erdös, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae
Debrecen 6, 290—297.
Erdös, P. and Rényi, A. (1960). On the evolution of random graphs. Magyar Tud.
Akad. Mat. Kutato Int. Kozl. 5, 17—61.
Feller, W. (1970). An Introduction to Probability Theory and Its Applications, 3rd
edn. Vol. 1. (John Wiley & Sons, New York).
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, 2nd
edn. Vol. 2. (John Wiley & Sons, New York).
Floyd, S. and Paxson, V. (2001). Difficulties in simulating the internet. IEEE
Transactions on Networking 9, 4 (August), 392-403.
Fortz, B. and Thorup, M. (2000). Internet traffic engineering by optimizing OSPF
weights. IEEE INFOCOM2000.
Frieze, A. M. (1985). On the value of a random minimum spanning tree problem.
Discrete Applied Mathematics 10, 47—56.
Gallager, R. G. (1996). Discrete Stochastic Processes. (Kluwer Academic Publishers, Boston).
Gantmacher, F. R. (1959a). The Theory of Matrices. Vol. I. (Chelsea Publishing
Company, New York).
Gantmacher, F. R. (1959b). The Theory of Matrices. Vol. II. (Chelsea Publishing
Company, New York).
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae. Pars prior. Gauss Werke 4, 3-26.
Gilbert, E. N. (1956). Enumeration of labelled graphs. Canadian Journal of Mathematics 8, 405—411.
Gnedenko, B. V. and Kovalenko, I. N. (1989). Introduction to Queuing Theory,
second edn. (Birkhauser, Boston).
Golub, G. H. and Van Loan, C. F. (1983). Matrix Computations. (North Oxford
Academic, Oxford).
Goulden, I. P. and Jackson, D. M. (1983). Combinatorial Enumeration. (John
Wiley & Sons, New York).
Grimmett, G. R. (1989). Percolation. (Springer-Verlag, New York).
Grimmett, G. R. and Stirzaker, D. (2001). Probability and Random Processes, 3rd
edn. (Oxford University Press, Oxford).
Hardy, G. H. (1948). Divergent Series. (Oxford University Press, London).
Hardy, G. H., Littlewood, J. E., and Polya, G. (1999). Inequalities, 2nd edn.
(Cambridge University Press, Cambridge, UK).
Hardy, G. H. and Wright, E. M. (1968). An Introduction to the Theory of Numbers,
4th edn. (Oxford University Press, London).
Harris, T. E. (1963). The Theory of Branching Processes. (Springer-Verlag, Berlin).
Harrison, J. M. (1990). Brownian Motion and Stochastic Flow Systems. (Krieger
Publishing Company, Malabar, Florida).
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2001). First passage
percolation on the random graph. Probability in the Engineering and Informational Sciences (PEIS) 15, 225—237.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2002a). The flooding
time in random graphs. Extremes 5, 2 (June), 111—129.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2002b). On the
covariance of the level sizes in recursive trees. Random Structures and Algorithms 20, 519—539.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2005). Distances
in random graphs with finite variance degree. Random Structures and Algorithms 27, 1 (August), 76—123.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2006a). Size and
weight of shortest path trees with exponential link weights. Combinatorics, Probability and Computing.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2006b). The weight
of the shortest path tree. Random Structures and Algorithms.
Hooghiemstra, G. and Koole, G. (2000). On the convergence of the power series
algorithm. Performance Evaluation 42, 21—39.
Hooghiemstra, G. and Van Mieghem, P. (2005). On the mean distance in scale free
graphs. Methodology and Computing in Applied Probability (MCAP) 7, 285—306.
Jamin, S., C. Jin, A. R. Kurc, D. R., and Shavitt, Y. (2001). Constrained mirror
placement on the internet. IEEE INFOCOM’01 .
Janic, M., Kuipers, F., Zhou, X., and Van Mieghem, P. (2002). Implications for QoS
provisioning based on traceroute measurements. Proceedings of 3rd International
Workshop on Quality of Future Internet Services, QofIS2002 ed. B. Stiller et al.,
Zurich, Switzerland, Springer Verlag LNCS 2511, 3-14.
Janson, S. (1995). The minimal spanning tree in a complete graph and a functional limit theorem for trees in a random graph. Random Structures and Algorithms 7, 4 (December), 337—356.
Janson, S. (2002). On concentration of probability. Contemporary Combinatorics,
ed. B. Bollobás, Bolyai Soc. Math. Stud. 10, János Bolyai Mathematical Society,
Budapest, 289—301.
Janson, S., Knuth, D. E., Luczak, T., and Pittel, B. (1993). The birth of the giant
component. Random Structures and Algorithms 4, 3, 233—358.
Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd
edn. (Academic Press, San Diego).
Karlin, S. and Taylor, H. M. (1981). A Second Course in Stochastic Processes.
(Academic Press, San Diego).
Kelly, F. P. (1991). Special invited paper: Loss networks. The Annals of Applied
Probability 1, 3, 319—378.
Kleinrock, L. (1975). Queueing Systems. Vol. 1 — Theory. (John Wiley and Sons,
New York).
Kleinrock, L. (1976). Queueing Systems. Vol. 2 — Computer Applications. (John
Wiley and Sons, New York).
Krishnan, P., Raz, D., and Shavitt, Y. (2000). The cache location problem.
IEEE/ACM Transactions on Networking 8, 5 (October), 568-582.
Kuipers, F. A. and Van Mieghem, P. (2003). The Impact of Correlated Link Weights
on QoS Routing. IEEE INFOCOM03 .
Lanczos, C. (1988). Applied Analysis. (Dover Publications, Inc., New York).
Langville, A. N. and Meyer, C. D. (2005). Deeper inside PageRank. Internet
Mathematics 1, 3 (February), 335-380.
Le Boudec, J.-Y. and Thiran, P. (2001). Network Calculus, A Theory of Deterministic Queuing Systems for the Internet. (Springer Verlag, Berlin).
Leadbetter, M. R., Lindgren, G., and Rootzen, H. (1983). Extremes and Related
Properties of Random Sequences and Processes. (Springer-Verlag, New York).
Leon-Garcia, A. (1994). Probability and Random Processes for Electrical Engineering, 2nd edn. (Addison-Wesley, Reading, Massachusetts).
van Lint, J. H. and Wilson, R. M. (1996). A course in Combinatorics. (Cambridge
University Press, Cambridge, UK).
Lovász, L. (1993). Random Walks on Graphs: A Survey. Combinatorics 2, 1—46.
Markushevich, A. I. (1985). Theory of functions of a complex variable. Vol. I — III.
(Chelsea Publishing Company, New York).
Mehta, M. L. (1991). Random Matrices, 2nd edn. (Academic Press, Boston).
Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra. (Society for
Industrial and Applied Mathematics (SIAM), Philadelphia).
Mitra, D. (1988). Stochastic theory of a fluid model of producers and consumers
coupled by a buffer. Advances in Applied Probability 20, 646-676.
Morse, P. M. and Feshbach, H. (1978). Methods of Theoretical Physics. (McGraw-Hill Book Company, New York).
Neuts, M. F. (1989). Structured Stochastic Matrices of the M/G/1 Type and Their
Applications. (Marcel Dekker Inc., New York).
Norros, I. (1994). A storage model with self-similar input. Queueing Systems 16, 3-4, 387-396.
Pascal, B. (1954). Œuvres complètes. Bibliothèque de la Pléiade, (Gallimard, Paris).
Paxson, V. (1997). End-to-end Routing Behavior in the Internet. IEEE/ACM
Transactions on Networking 5, 5 (October), 601—615.
Phillips, G., Schenker, S., and Tangmunarunkit, H. (1999). Scaling of multicast
trees: Comments on the chuang-sirbu scaling law. ACM Sigcomm99 .
Pietronero, L. and Schneider, W. (1990). Invasion percolation as a fractal growth
problem. Physica A 170, 81—104.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992).
Numerical Recipes in C , 2nd edn. (Cambridge University Press, New York).
Rainville, E. D. (1960). Special Functions. (Chelsea Publishing Company, New
York).
Riordan, J. (1968). Combinatorial Identities. (John Wiley & Sons, New York).
Roberts, J. W. (1991). Performance Evaluation and Design of Multiservice Networks. Information Technologies and Sciences, vol. COST 224. (Commission of
the European Communities, Luxembourg).
Robinson, S. (2004). The prize of anarchy. SIAM News 37, 5 (June), 1—4.
Ross, S. M. (1996). Stochastic Processes, 2nd edn. (John Wiley & Sons, New York).
Royden, H. L. (1988). Real Analysis, 3rd edn. (Macmillan Publishing Company,
New York).
Sansone, G. and Gerretsen, J. (1960). Lectures on the Theory of Functions of a
Complex Variable. Vol. 1 and 2. (P. Noordhoff, Groningen).
Schoutens, W. (2000). Stochastic Processes and Orthogonal Polynomials. (Springer-Verlag, New York).
Siganos, G., Faloutsos, M., Faloutsos, P., and Faloutsos, C. (2003). Power laws and
the AS-level internet topology. IEEE/ACM Transactions on Networking 11, 4
(August), 514—524.
Smythe, R. T. and Mahmoud, H. M. (1995). A survey of recursive trees. Theory
of Probability and Mathematical Statistics 51, 1—27.
Steyaert, B. and Bruneel, H. (1994). Analytic derivation of the cell loss probability
in finite multiserver buffers, from infinite buffer results. Proceedings of the second
workshop on performance modelling and evaluation of ATM networks, Bradford
UK, 18.1-11.
Strogatz, S. H. (2001). Exploring complex networks. Nature 410, 8 (March), 268-276.
Syski, R. (1986). Introduction to Congestion Theory in Telephone Systems, 2nd
edn. Studies in Telecommunication, vol. 4. (North-Holland, Amsterdam).
Titchmarsh, E. C. (1948). Introduction to the Theory of Fourier Integrals, 2nd edn.
(Oxford University Press, Ely House, London W. I).
Titchmarsh, E. C. (1964). The Theory of Functions. (Oxford University Press,
Amen House, London).
Titchmarsh, E. C. and Heath-Brown, D. R. (1986). The Theory of the Zeta-function, 2nd edn. (Oxford Science Publications, Oxford).
Van Mieghem, P. (1996). The asymptotic behaviour of queueing systems: Large
deviations theory and dominant pole approximation. Queueing Systems 23, 27-55.
Van Mieghem, P. (2001). Paths in the simple random graph and the Waxman
graph. Probability in the Engineering and Informational Sciences (PEIS) 15,
535—555.
Van Mieghem, P. (2004a). Data Communications Networking. (Delft University of
Technology, Delft).
Van Mieghem, P. (2004b). The Probability Distribution of the Hopcount
to an Anycast Group. Delft University of Technology, Report 2003605
(www.nas.ewi.tudelft.nl/people/Piet/teleconference).
Van Mieghem, P. (2005). The limit random variable W of a branching process.
Delft University of Technology, Report 20050206
(www.nas.ewi.tudelft.nl/people/Piet/teleconference).
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. (2000). A Scaling Law
for the Hopcount in the Internet. Delft University of Technology, Report 2000125
(www.nas.ewi.tudelft.nl/people/Piet/telconference).
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. (2001a). On the
efficiency of multicast. IEEE/ACM Transactions on Networking 9, 6 (December),
719-732.
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. W. (2001b). Stochastic model for the number of traversed routers in internet. Proceedings of Passive
and Active Measurement: PAM-2001, April 23-24, Amsterdam.
Van Mieghem, P. and Janic, M. (2002). Stability of a multicast tree. Proceedings
IEEE INFOCOM2002 2, 1099—1108.
Veres, A. and Boda, M. (2000). The chaotic nature of TCP congestion control.
IEEE INFOCOM’2000, Tel-Aviv, Israel .
Walrand, J. (1998). Communication Networks, A First Course, 2nd edn. (McGraw-Hill, Boston).
Wästlund, J. (2005). Evaluation of Janson’s constant for the variance in the random minimum spanning tree problem. Linköping studies in Mathematics. Series
editor: Bengt Ove Turesson 7 (www.ep.liu.se/ea/lsm/2005/007).
Waxman, B. M. (1988). Routing of multipoint connections. IEEE Journal on
Selected Areas in Communications 6, 9 (December), 1617-1622.
Whittaker, E. T. and Watson, G. N. (1996). A Course of Modern Analysis, Cambridge Mathematical Library edn. (Cambridge University Press, Cambridge,
UK).
Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite
dimensions. Annals of Mathematics 62, 3 (November), 548—564.
Wigner, E. P. (1957). Characteristic vectors of bordered matrices with infinite
dimensions ii. Annals of Mathematics 65, 2 (March), 203—207.
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices.
Annals of Mathematics 67, 2 (March), 325—327.
Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem. (Oxford University
Press, New York).
Wolff, R. W. (1982). Poisson arrivals see time averages. Operations Research 30, 2
(April), 223-231.
Wolff, R. W. (1989). Stochastic Modeling and the Theory of Queues. (Prentice-Hall
International Editions, New York).
Index
n-ary tree, 387, 401, 414, 417, 423, 432, 522
adjacency matrix, 320, 471, 488
eigenvalues, 475
Bayes’ rule, 28, 197
Beneš’ equation, 261, 301
cell loss ratio (clr), 309, 512
Central Limit Theorem, 104, 148, 366, 377,
516
Chernoff bound, 88
Chuang-Sirbu scaling law, 404-407
complete graph, 319, 321, 327, 347, 349, 359,
371, 373, 380, 392, 473, 481, 482, 488, 520
conditional distribution function, 28
conditional expectation, 34, 233, 341
conditional probability, 26
conditional probability density function, 28
correlation coefficient, 30, 61, 67, 69, 74, 119
covariance, 29, 71, 78
matrix, 63, 66
degree graph, 323
degree of a node, 225, 322, 472
disjoint paths, 520
Menger’s Theorem, 327
distribution
n-th order statistics, 53, 494
Bernoulli, 37
binomial, 38, 332, 488
Cauchy, 54
chi-square, 50
Erlang, 48, 125, 274, 278
exponential, 44, 75
extremal, 106
Fréchet, 107
Gamma, 48, 51, 125
Gaussian, 46, 103, 400
geometric, 39, 272
Gumbel, 54, 107, 400
joint Gaussian, 64
lognormal, 57, 77
Pareto, 56
Poisson, 40, 116, 129, 335
polynomial, 44, 348, 494
regular, 348, 362
uniform, 43, 74
Weibull, 55, 107, 132
Engset formula, 314, 511
Erlang B formula, 2, 280
Erlang C formula, 277
event, 10, 53
mutually exclusive, 10
failure rate, 131
flooding time, 362
giant component, 335, 337, 339
Google, 224
graph connectivity, 325, 486
edge connectivity, 326, 487
vertex connectivity, 326, 487
graph metrics
betweenness, 329
clustering coefficient, 328, 346, 513
diameter, 475
distortion, 329
expansion, 328
hopcount, 329
resilience, 329
histogram, 118, 494
hopcount, 340, 347, 354, 357, 387, 392, 403,
409, 418, 420, 423, 431, 513, 520
incidence matrix, 471
inclusion-exclusion formula, 12, 335, 391
sieve of Eratosthenes, 15
indicator function, 12, 17, 43, 321
inequality
Boole, 15
Cauchy-Schwarz, 90, 91, 480
Chebyshev, 88
Gauss, 92
Hölder, 90, 480
Jensen, 85, 342
Markov, 88
Minkowski, 91
infinitesimal generator, 181
Laplacian (admittance matrix), 472, 486
law of rare events, 41, 128, 495
law of total probability, 27, 123, 142, 159, 204,
205, 238, 255, 274, 278, 284, 295, 324,
341, 367, 410, 422, 445, 495
level set of a tree, 352
Lindley’s equation, 255
link weight, 320, 340, 341, 347, 349, 359, 362,
373, 392, 406, 408
Little’s law, 267, 273, 275, 281, 287, 297, 508,
510
Markov chain
absorbing states, 164
communicating states, 162
conservative, 181
continuous-time, 179
discrete-time, 158
embedded, 186, 188
hitting time, 163
irreducible Markov chain, 161, 226
periodic and aperiodic, 162
transient and recurrent states, 165
mean time to failure, 131
memoryless property, 27, 40, 45, 125, 132, 185,
351
Metcalfe’s law, 320
minimum spanning tree (MST), 373, 399
modes of convergence, 99
Newton identities for polynomials, 477
order statistics, 52, 127
PageRank (Google), 224
phase transition, 335, 376
Poisson arrivals see time averages (PASTA),
267, 274, 275, 283, 288, 312, 509, 511
Pollaczek-Khinchin equation, 286
power law, 325
probability density function (pdf), 16, 20, 22
joint, 28, 32
probability generating function (pgf), 18
logarithm of, 19, 25
moment generating function, 25, 235
process
arrival, 248, 270
birth and death, 208, 304, 351
branching, 229, 342
geometric, 244
Poisson, 246, 345
counting, 263
Markov, 180, 253, 349
balance equation, 187
Chapman-Kolmogorov equation, 180
forward and backward equation, 182, 195
time reversibility, 196
nonhomogeneous Poisson, 129
Poisson, 120, 210
queueing, 250
renewal, 137
service, 249, 270
stochastic, 115
modeling, 117
Yule, 212, 230
quality of service (QoS), 2, 249, 283, 309, 340,
419
random graph, 330, 332, 337, 339, 346, 354,
362, 373, 374, 377, 387, 392, 403, 404,
406, 408, 410, 488, 513
random variable
continuous, 20, 59
discrete, 16, 58
expectation, 17, 22
independent, 28, 29, 32, 34, 47, 49, 51, 78, 97, 104
normalized, 31, 93, 400
random vector, 62
random walk, 202, 484
redundancy level, 325
regular graphs, 322, 328, 475, 482, 486, 488
reliability function, 131
renewal
alternating renewal process, 153
Blackwell’s Renewal Theorem, 146
Elementary Renewal Theorem, 145, 170
inspection paradox, 152
Key Renewal Theorem, 146, 151
renewal equation, 141
renewal function, 140
renewal process, 137
renewal theory, 165
server placement problem, 419, 424, 429
shortest path, 340, 347
tree (SPT), 387, 392, 399, 407, 419, 428
slotted Aloha, 219
stochastic matrix, 159, 450, 451, 455, 473
total variation distance, 42
transition probability matrix, 159, 190, 201
spectral decomposition, 184
unfinished work, 256, 261, 263, 301
uniform recursive tree (URT), 354, 380, 392,
404, 407, 411, 417, 419, 422, 424
uniformization, 189
Wald’s identity, 34, 145, 154
web graph, 224, 323
Wigner’s Semicircle Law, 489