Stochastic Processes,
Theory for Applications
Solutions to Exercises
Updated and corrected
as of October 16, 2014
R. G. Gallager
These solutions are available to instructors from Cambridge Press at www.Cambridge.org.
If any errors or typos are discovered, please notify Robert Gallager at gallager@mit.edu so
that they can be corrected.
The author gratefully acknowledges the help of Shan-Yuan Ho, who has edited many of
these solutions, and of a number of teaching assistants, particularly Natasha Blitvic and
Mina Karzand, who wrote earlier drafts of some solutions. Thanks are also due to Kluwer
Academic Press for permission to use a number of exercises that also appeared in Gallager,
‘Discrete Stochastic Processes,’ Kluwer, 1995. The original solutions to those exercises
were prepared by Shan-Yuan Ho, but were changed substantially here by the author, both to be
consistent with the new text and occasionally for added clarity.
APPENDIX A. SOLUTIONS TO EXERCISES

A.1 Solutions for Chapter 1
Exercise 1.1: Let A1 and A2 be arbitrary events and show that Pr{A1 ∪ A2} + Pr{A1 A2} = Pr{A1} + Pr{A2}. Explain which parts of the sample space are being double-counted on both sides of this equation and which parts are being counted once.

Solution: As shown in the figure below, A1 A2 is part of A1 ∪ A2 and is thus being double counted on the left side of the equation. It is also being double counted on the right (and is in fact the meaning of A1 A2 as those sample points that are both in A1 and in A2).
[Venn diagram: two overlapping circles for A1 and A2, with the regions labeled A1, A1 A2, and A2.]
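The identity can also be checked numerically. The sketch below (not part of the original solution; the two overlapping intervals are arbitrary choices for illustration) estimates all four probabilities by Monte Carlo over a uniform sample space:

```python
import random

def check_union_identity(trials=100_000, seed=1):
    """Estimate both sides of Pr{A1 u A2} + Pr{A1 A2} = Pr{A1} + Pr{A2}
    by sampling uniform points in [0, 1).  A1 and A2 are overlapping
    intervals, chosen arbitrarily for illustration."""
    rng = random.Random(seed)
    c1 = c2 = c_union = c_inter = 0
    for _ in range(trials):
        w = rng.random()
        a1 = 0.0 <= w < 0.6          # A1 = [0, 0.6)
        a2 = 0.4 <= w < 0.9          # A2 = [0.4, 0.9), overlaps A1 on [0.4, 0.6)
        c1 += a1
        c2 += a2
        c_union += (a1 or a2)
        c_inter += (a1 and a2)
    lhs = (c_union + c_inter) / trials   # Pr{A1 u A2} + Pr{A1 A2}
    rhs = (c1 + c2) / trials             # Pr{A1} + Pr{A2}
    return lhs, rhs
```

Since the indicator identity 1_{A1∪A2}(ω) + 1_{A1A2}(ω) = 1_{A1}(ω) + 1_{A2}(ω) holds at every single sample point, the two estimates agree exactly on every run, not just in the limit.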
Exercise 1.2: This exercise derives the probability of an arbitrary (non-disjoint) union of events, derives the union bound, and derives some useful limit expressions.

a) For 2 arbitrary events A1 and A2, show that

    A1 ∪ A2 = A1 ∪ (A2 − A1),        (A.1)

where A2 − A1 = A2 A1^c. Show that A1 and A2 − A1 are disjoint. Hint: This is what Venn diagrams were invented for.
Solution: Note that each sample point ω is in A1 or A1^c, but not both. Thus each ω is in exactly one of A1, A1^c A2, or A1^c A2^c. In the first two cases, ω is in both sides of (A.1) and in the last case it is in neither. Thus the two sides of (A.1) are identical. Also, as pointed out above, A1 and A2 − A1 are disjoint. These results are intuitively obvious from the Venn diagram,

[Venn diagram: two overlapping circles with the regions labeled A1 A2^c, A1 A2, and A2 A1^c = A2 − A1.]
b) For any n ≥ 2 and arbitrary events A1, ..., An, define Bn = An − ∪_{i=1}^{n−1} Ai. Show that B1, B2, ... are disjoint events and show that for each n ≥ 2, ∪_{i=1}^n Ai = ∪_{i=1}^n Bi. Hint: Use induction.
Solution: Let B1 = A1. From (a), B1 and B2 are disjoint and (from (A.1)), A1 ∪ A2 = B1 ∪ B2. Let Cn = ∪_{i=1}^n Ai. We use induction to prove that Cn = ∪_{i=1}^n Bi and that the Bn are disjoint. We have seen that C2 = B1 ∪ B2, which forms the basis for the induction. We assume that C_{n−1} = ∪_{i=1}^{n−1} Bi and prove that Cn = ∪_{i=1}^n Bi.

    Cn = C_{n−1} ∪ An = C_{n−1} ∪ (An C^c_{n−1})
       = C_{n−1} ∪ Bn = ∪_{i=1}^n Bi.

In the second equality, we used (A.1), letting C_{n−1} play the role of A1 and An play the role of A2. From this same application of (A.1), we also see that C_{n−1} and Bn = An − C_{n−1} are disjoint. Since C_{n−1} = ∪_{i=1}^{n−1} Bi, this also shows that Bn is disjoint from B1, ..., B_{n−1}.
c) Show that

    Pr{∪_{n=1}^∞ An} = Pr{∪_{n=1}^∞ Bn} = Σ_{n=1}^∞ Pr{Bn}.
Solution: If ω ∈ ∪_{n=1}^∞ An, then ω ∈ An for some n ≥ 1. Thus ω ∈ ∪_{i=1}^n Bi, and thus ω ∈ ∪_{n=1}^∞ Bn. The same argument works the other way, so ∪_{n=1}^∞ An = ∪_{n=1}^∞ Bn. This establishes the first equality above, and the second is the third axiom of probability.
d) Show that for each n, Pr{Bn} ≤ Pr{An}. Use this to show that

    Pr{∪_{n=1}^∞ An} ≤ Σ_{n=1}^∞ Pr{An}.

Solution: Since Bn = An − ∪_{i=1}^{n−1} Ai, we see that ω ∈ Bn implies that ω ∈ An, i.e., that Bn ⊆ An. From (1.5), this implies that Pr{Bn} ≤ Pr{An} for each n. Thus

    Pr{∪_{n=1}^∞ An} = Σ_{n=1}^∞ Pr{Bn} ≤ Σ_{n=1}^∞ Pr{An}.
e) Show that Pr{∪_{n=1}^∞ An} = lim_{n→∞} Pr{∪_{i=1}^n Ai}. Hint: Combine (c) and (b). Note that this says that the probability of a limit is equal to the limit of the probabilities. This might well appear to be obvious without a proof, but you will see situations later where similar appearing interchanges cannot be made.

Solution: From (c),

    Pr{∪_{n=1}^∞ An} = Σ_{n=1}^∞ Pr{Bn} = lim_{k→∞} Σ_{n=1}^k Pr{Bn}.

From (b), however,

    Σ_{n=1}^k Pr{Bn} = Pr{∪_{n=1}^k Bn} = Pr{∪_{n=1}^k An}.

Combining the first equation with the limit in k of the second yields the desired result.
f) Show that Pr{∩_{n=1}^∞ An} = lim_{n→∞} Pr{∩_{i=1}^n Ai}. Hint: Remember De Morgan's equalities.

Solution: Using De Morgan's equalities,

    Pr{∩_{n=1}^∞ An} = 1 − Pr{∪_{n=1}^∞ A^c_n} = 1 − lim_{k→∞} Pr{∪_{n=1}^k A^c_n}
                     = lim_{k→∞} Pr{∩_{n=1}^k An}.
Exercise 1.3: Find the probability that a five card poker hand, chosen randomly from a 52 card deck,
contains 4 aces. That is, if all 52! arrangements of a deck of cards are equally likely, what is the probability
that all 4 aces are in the first 5 cards of the deck.
Solution: The ace of spades can be in any of the first 5 positions, the ace of hearts in any
of the 4 remaining positions out of the first 5, and so forth for the other two aces. The
remaining 48 cards can be in any of the remaining 48 positions. Thus there are (5 · 4 · 3 · 2) · 48! permutations of the 52 cards for which the first 5 cards contain 4 aces. Thus

    Pr{4 aces} = (5 · 4 · 3 · 2) · 48! / 52! = (5 · 4 · 3 · 2) / (52 · 51 · 50 · 49) = 1.847 × 10⁻⁵.
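As a quick cross-check (not part of the original solution), the same probability can be computed by counting hands rather than deck permutations: of the C(52, 5) equiprobable 5-card hands, those containing all 4 aces are determined by the choice of the one non-ace card.

```python
from math import comb

# Count hands: a 5-card hand with all 4 aces is fixed by its fifth card,
# chosen from the 48 non-aces; all comb(52, 5) hands are equiprobable.
p_four_aces = comb(48, 1) / comb(52, 5)

# Permutation count from the solution, with the factorials cancelled:
# (5*4*3*2) * 48! / 52! = (5*4*3*2) / (52*51*50*49).
p_by_permutations = (5 * 4 * 3 * 2) / (52 * 51 * 50 * 49)
```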
Exercise 1.4: Consider a sample space of 8 equiprobable sample points and let A1 , A2 , A3 be three
events each of probability 1/2 such that Pr{A1 A2 A3 } = Pr{A1 } Pr{A2 } Pr{A3 }.
a) Create an example where Pr{A1 A2} = Pr{A1 A3} = 1/4 but Pr{A2 A3} = 1/8. Hint: Make a table with a row for each sample point and a column for each of the above 3 events and try different ways of assigning sample points to events (the answer is not unique).
Solution: Note that exactly one sample point must be in A1, A2, and A3 in order to make Pr{A1 A2 A3} = 1/8. In order to make Pr{A1 A2} = 1/4, there must be one additional sample point that is in A1 and A2 but not A3. Similarly, there must be one sample point that is in A1 and A3 but not A2. These points give rise to the first three rows in the table below. There can be no additional sample point in both A2 and A3 since Pr{A2 A3} = 1/8. Thus each remaining sample point can be in at most 1 of the events A1, A2, and A3. Since Pr{Ai} = 1/2 for 1 ≤ i ≤ 3, two sample points must be in A2 alone, two must be in A3 alone, and a single sample point must be in A1 alone. This uniquely specifies the table below except for which sample point lies in each event.
Sample point   A1   A2   A3
     1          1    1    1
     2          1    1    0
     3          1    0    1
     4          1    0    0
     5          0    1    0
     6          0    1    0
     7          0    0    1
     8          0    0    1
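The table can be verified mechanically. A minimal sketch (an illustration, not part of the original solution): encode each row as membership indicators and check every probability required by the exercise over the 8 equiprobable sample points.

```python
from fractions import Fraction

# Rows of the table above: (A1, A2, A3) membership for sample points 1..8.
rows = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0),
        (0, 1, 0), (0, 1, 0), (0, 0, 1), (0, 0, 1)]

def pr(event):
    """Probability of an event (a predicate on a row) with all 8 sample
    points equiprobable; exact arithmetic via Fraction."""
    return Fraction(sum(1 for r in rows if event(r)), len(rows))

p1 = pr(lambda r: r[0])
p2 = pr(lambda r: r[1])
p3 = pr(lambda r: r[2])
p12 = pr(lambda r: r[0] and r[1])
p13 = pr(lambda r: r[0] and r[2])
p23 = pr(lambda r: r[1] and r[2])
p123 = pr(lambda r: r[0] and r[1] and r[2])
```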
b) Show that, for your example, A2 and A3 are not independent. Note that the definition of statistical independence would be very strange if it allowed A1 , A2 , A3 to be independent while A2 and A3 are dependent.
This illustrates why the definition of independence requires (1.14) rather than just (1.15).
Solution: Note that Pr{A2 A3} = 1/8 ≠ Pr{A2} Pr{A3}, so A2 and A3 are dependent. We also note that Pr{A1^c A2^c A3^c} = 0 ≠ Pr{A1^c} Pr{A2^c} Pr{A3^c}, further reinforcing the conclusion that A1, A2, A3 are not statistically independent. Although the definition in (1.14) of statistical independence of more than 2 events looks strange, it is clear from this example that (1.15) is insufficient in the sense that it only specifies part of the above table.
Exercise 1.5: This exercise shows that for all rv’s X, FX (x) is continuous from the right.
a) For any given rv X, any real number x, and each integer n ≥ 1, let An = {ω : X > x + 1/n}, and show that A1 ⊆ A2 ⊆ ···. Use this and the corollaries to the axioms of probability to show that Pr{∪_{n≥1} An} = lim_{n→∞} Pr{An}.
Solution: If X(ω) > x + 1/n, then (since 1/n > 1/(n+1)) we also have X(ω) > x + 1/(n+1). Thus An ⊆ An+1 for all n ≥ 1. Thus from (1.9), Pr{∪_{n≥1} An} = lim_{n→∞} Pr{An}.
b) Show that Pr{∪_{n≥1} An} = Pr{X > x} and that Pr{X > x} = lim_{n→∞} Pr{X > x + 1/n}.
Solution: If X(ω) > x, then there must be an n sufficiently large that X(ω) > x + 1/n. Thus {ω : X > x} ⊆ ∪_{n≥1} An. The subset inequality goes the other way also, since X(ω) > x + 1/n for any n ≥ 1 implies that X(ω) > x. Since these represent the same events, they have the same probability, and Pr{∪_{n≥1} An} = Pr{X > x}. Then from (a) we also have

    Pr{X > x} = lim_{n→∞} Pr{An} = lim_{n→∞} Pr{X > x + 1/n}.
c) Show that for ε > 0, lim_{ε→0} Pr{X ≤ x + ε} = Pr{X ≤ x}.

Solution: Taking the complement of both sides of the above equation, Pr{X ≤ x} = lim_{n→∞} Pr{X ≤ x + 1/n}. Since Pr{X ≤ x + ε} is non-decreasing in ε, it also follows that for ε > 0, Pr{X ≤ x} = lim_{ε→0} Pr{X ≤ x + ε}.
Exercise 1.6: Show that for a continuous nonnegative rv X,

    ∫_0^∞ Pr{X > x} dx = ∫_0^∞ x fX(x) dx.        (A.2)

Hint 1: First rewrite Pr{X > x} on the left side of (A.2) as ∫_x^∞ fX(y) dy. Then think through, to your level of comfort, how and why the order of integration can be interchanged in the resulting expression.
Solution: We have Pr{X > x} = ∫_x^∞ fX(y) dy from the definition of a continuous rv. We look at E[X] = ∫_0^∞ Pr{X > x} dx as lim_{a→∞} ∫_0^a F^c_X(x) dx, since the limiting operation a → ∞ is where the interesting issue is.

    ∫_0^a F^c_X(x) dx = ∫_0^a ∫_x^∞ fX(y) dy dx
                      = ∫_0^a ∫_x^a fX(y) dy dx + ∫_0^a ∫_a^∞ fX(y) dy dx
                      = ∫_0^a ∫_0^y fX(y) dx dy + a F^c_X(a).

We first broke the inner integral into two parts, one for y ≤ a and the other for y > a. Since the limits of integration on the first part were finite, they could be interchanged. The inner integral of the first part (after the interchange) is y fX(y), so

    lim_{a→∞} ∫_0^a F^c_X(x) dx = lim_{a→∞} ∫_0^a y fX(y) dy + lim_{a→∞} a F^c_X(a).

Assuming that E[X] exists, the integral on the left is nondecreasing in a and has the finite limit E[X]. The first integral on the right is also nondecreasing and upper bounded by the integral on the left, so it also has a limit. This means that lim_{a→∞} a F^c_X(a) must also have a limit, say β. Now if β > 0, then for any ε ∈ (0, β), a F^c_X(a) > β − ε for all sufficiently large a. For all such a, then, F^c_X(a) > (β − ε)/a. This would imply that E[X] = ∫_0^∞ F^c_X(x) dx = ∞, which is a contradiction. Thus β = 0, i.e., lim_{a→∞} a F^c_X(a) = 0, establishing (A.2) for the case where E[X] is finite. The case where E[X] is infinite is a minor perturbation.

The result that lim_{a→∞} a F^c_X(a) = 0 is also important and can be seen intuitively from Figure 1.3.
Hint 2: As an alternate approach, derive (A.2) using integration by parts.

Solution: Using integration by parts and being less careful,

    ∫_0^∞ d[x F^c_X(x)] = −∫_0^∞ x fX(x) dx + ∫_0^∞ F^c_X(x) dx.

The left side is lim_{a→∞} a F^c_X(a) − 0·F^c_X(0), so this shows the same thing, again requiring the fact that lim_{a→∞} a F^c_X(a) = 0 when E[X] exists.
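For a concrete sanity check of (A.2) (an illustration, not part of the original solution), both sides can be approximated by midpoint Riemann sums for the exponential density fX(x) = e^{−x}, for which both integrals equal E[X] = 1:

```python
import math

def both_sides_of_A2(dx=1e-3, xmax=40.0):
    """Midpoint Riemann sums for both sides of (A.2) with fX(x) = exp(-x).

    Left side:  integral over [0, xmax) of Pr{X > x} = exp(-x).
    Right side: integral over [0, xmax) of x * fX(x) = x * exp(-x).
    The truncated tail beyond xmax = 40 is negligible (about exp(-40))."""
    n = int(xmax / dx)
    lhs = rhs = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        lhs += math.exp(-x) * dx
        rhs += x * math.exp(-x) * dx
    return lhs, rhs
```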
Exercise 1.7: Suppose X and Y are discrete rv's with the PMF pXY(xi, yj). Show (a picture will help) that this is related to the joint CDF by

    pXY(xi, yj) = lim_{δ>0, δ→0} [F(xi, yj) − F(xi − δ, yj) − F(xi, yj − δ) + F(xi − δ, yj − δ)].
Solution: The picture below makes this equation obvious. Note that F(xi, yj) is the probability of the quadrant of joint sample values (x, y) that satisfy (x, y) ≤ (xi, yj). The term F_XY(xi − δ, yj) is the probability of the hatched region on the left, and F_XY(xi, yj − δ) − F_XY(xi − δ, yj − δ) is the probability of the hatched region on the right. Thus the expression on the right of the equation is the probability of the δ by δ square to the left of and below (xi, yj). When this square becomes too small to include any other sample point, the probability of the square is pXY(xi, yj).

[Figure: the sample point (xi, yj) at the corner of a δ by δ square; the hatched region on the left has probability F_XY(xi − δ, yj), and the hatched region on the right has probability F_XY(xi, yj − δ) − F_XY(xi − δ, yj − δ).]
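The four-term difference can be exercised numerically. Below is a minimal sketch (not part of the original solution) using a small, arbitrary joint PMF on integer points; since the sample values are at least 1 apart, taking δ = 1 already isolates each point.

```python
# Hypothetical joint PMF on integer points (x, y), chosen only for illustration.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.15, (2, 1): 0.25}

def F(x, y):
    """Joint CDF: probability of the quadrant {(u, v) : u <= x and v <= y}."""
    return sum(p for (u, v), p in pmf.items() if u <= x and v <= y)

def p_from_F(x, y, d=1):
    """Recover pXY(x, y) by the four-term difference.  Here d plays the role
    of the vanishing delta; d = 1 suffices because no other sample point can
    lie inside the d-by-d square at (x, y)."""
    return F(x, y) - F(x - d, y) - F(x, y - d) + F(x - d, y - d)
```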
Exercise 1.8: A variation of Example 1.5.1 is to let M be a random variable that takes on both positive and negative values with the PMF

    pM(m) = 1 / (2|m|(|m| + 1)).

In other words, M is symmetric around 0 and |M| has the same PMF as the nonnegative rv N of Example 1.5.1.
a) Show that Σ_{m≥0} m pM(m) = ∞ and Σ_{m<0} m pM(m) = −∞. (Thus show that the expectation of M not only does not exist but is undefined even in the extended real number system.)
Solution:

    Σ_{m≥0} m pM(m) = Σ_{m>0} 1/(2(m + 1)) = ∞,
    Σ_{m<0} m pM(m) = −Σ_{m<0} 1/(2(|m| + 1)) = −∞.
b) Suppose that the terms in Σ_{m=−∞}^∞ m pM(m) are summed in the order of 2 positive terms for each negative term (i.e., in the order 1, 2, −1, 3, 4, −2, 5, ···). Find the limiting value of the partial sums in this series. Hint: You may find it helpful to know that

    lim_{n→∞} [ Σ_{i=1}^n 1/i − ∫_1^n (1/x) dx ] = γ,

where γ is the Euler-Mascheroni constant, γ = 0.57721···.
Solution: The sum after 3n terms is

    Σ_{m=1}^{2n} 1/(2(m+1)) − Σ_{m=1}^{n} 1/(2(m+1)) = (1/2) Σ_{i=n+2}^{2n+1} 1/i.

Taking the limit as n → ∞, the Euler-Mascheroni constant cancels out and

    lim_{n→∞} (1/2) Σ_{i=n+2}^{2n+1} 1/i = lim_{n→∞} (1/2) ∫_{n+2}^{2n+1} (1/x) dx = (1/2) ln 2.
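The limit (1/2) ln 2 is easy to corroborate by computing the partial sums directly, a numerical sketch not part of the original solution:

```python
import math

def partial_sum_after_3n_terms(n):
    """Partial sum of the series sum_m m*pM(m) taken in the order
    +1, +2, -1, +3, +4, -2, ... (two positive terms per negative term).
    For m > 0 the term is 1/(2(m+1)); for m < 0 it is -1/(2(|m|+1)).
    After 3n terms, the positive indices 1..2n and the negative indices
    -1..-n have been used."""
    pos = sum(1.0 / (2 * (m + 1)) for m in range(1, 2 * n + 1))
    neg = sum(1.0 / (2 * (m + 1)) for m in range(1, n + 1))
    return pos - neg
```

The partial sums approach (1/2) ln 2 ≈ 0.3466 at rate O(1/n).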
c) Repeat (b) where, for any given integer k > 0, the order of summation is k positive terms for each negative
term.
Solution: This is done the same way, and the answer is (1/2) ln k. What the exercise essentially shows is that in a sum for which the positive terms sum to ∞ and the negative terms sum to −∞, one can get any desired limit by summing terms in an appropriate order. In fact, to reach any desired limit, one alternates between positive terms until exceeding the desired limit, then negative terms until falling below the desired limit, then positive terms again, etc.
Exercise 1.9: (Proof of Theorem 1.4.1) The bounds on the binomial in this theorem are based on the Stirling bounds. These say that for all n ≥ 1, n! is upper and lower bounded by

    √(2πn) (n/e)^n < n! < √(2πn) (n/e)^n e^{1/12n}.        (A.3)

The ratio, √(2πn) (n/e)^n / n!, of the first two terms is monotonically increasing with n toward the limit 1, and the ratio √(2πn) (n/e)^n exp(1/12n) / n! is monotonically decreasing toward 1. The upper bound is more accurate, but the lower bound is simpler and known as the Stirling approximation. See [8] for proofs and further discussion of the above facts.

a) Show from (A.3) and from the above monotone property that

    (n choose k) < √(n / (2πk(n−k))) · n^n / (k^k (n−k)^{n−k}).

Hint: First show that n!/k! < √(n/k) n^n k^{−k} e^{−n+k} for k < n.
Solution: Since the ratio of the first two terms of (A.3) is increasing in n, we have

    √(2πk) (k/e)^k / k! < √(2πn) (n/e)^n / n!.

Rearranging terms, we have the result in the hint. Applying the first inequality of (A.3) to n−k and combining this with the result on n!/k! yields the desired result.
b) Use the result of (a) to upper bound pSn(k) by

    pSn(k) < √(n / (2πk(n−k))) · p^k (1−p)^{n−k} n^n / (k^k (n−k)^{n−k}).

Show that this is equivalent to the upper bound in Theorem 1.4.1.

Solution: Using the binomial equation and then (a),

    pSn(k) = (n choose k) p^k (1−p)^{n−k} < √(n / (2πk(n−k))) · (n^n / (k^k (n−k)^{n−k})) p^k (1−p)^{n−k}.

This is the desired bound on pSn(k). Letting p̃ = k/n, this becomes

    pSn(p̃n) < √(1 / (2πn p̃(1−p̃))) · p^{p̃n} (1−p)^{n(1−p̃)} / (p̃^{p̃n} (1−p̃)^{n(1−p̃)})
             = √(1 / (2πn p̃(1−p̃))) · exp{ n [ p̃ ln(p/p̃) + (1−p̃) ln((1−p)/(1−p̃)) ] },

which is the same as the upper bound in Theorem 1.4.1.
c) Show that

    (n choose k) > √(n / (2πk(n−k))) · (n^n / (k^k (n−k)^{n−k})) [1 − n / (12k(n−k))].

Solution: Use the factorial lower bound on n! and the upper bounds on k! and (n−k)!. This yields

    (n choose k) > √(n / (2πk(n−k))) · (n^n / (k^k (n−k)^{n−k})) exp(−1/(12k) − 1/(12(n−k)))
                 > √(n / (2πk(n−k))) · (n^n / (k^k (n−k)^{n−k})) [1 − n / (12k(n−k))],

where the latter equation comes from combining the two terms in the exponent and then using the bound e^{−x} > 1 − x.
d) Derive the lower bound in Theorem 1.4.1.
Solution: This follows by substituting p̃n for k in the solution to c) and substituting this
in the binomial formula.
e) Show that φ(p, p̃) = p̃ ln(p̃/p) + (1−p̃) ln((1−p̃)/(1−p)) is 0 at p̃ = p and nonnegative elsewhere.

Solution: It is obvious that φ(p, p̃) = 0 for p̃ = p. Taking the first two derivatives of φ(p, p̃) with respect to p̃,

    ∂φ(p, p̃)/∂p̃ = ln( p̃(1−p) / (p(1−p̃)) ),        ∂²φ(p, p̃)/∂p̃² = 1 / (p̃(1−p̃)).

Since the second derivative is positive for 0 < p̃ < 1, the minimum of φ(p, p̃) with respect to p̃ is 0, achieved where the first derivative is 0, i.e., at p̃ = p. Thus φ(p, p̃) > 0 for p̃ ≠ p. Furthermore, φ(p, p̃) increases as p̃ moves in either direction away from p.
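The function φ(p, p̃) is the Kullback-Leibler divergence between the binary distributions (p̃, 1−p̃) and (p, 1−p), and its claimed properties can be probed numerically. A minimal sketch (not part of the original solution):

```python
import math

def phi(p, pt):
    """phi(p, pt) = pt*ln(pt/p) + (1-pt)*ln((1-pt)/(1-p)) for
    0 < p < 1 and 0 < pt < 1 (the exponent in Theorem 1.4.1)."""
    return pt * math.log(pt / p) + (1 - pt) * math.log((1 - pt) / (1 - p))

def grid_minimum(p, steps=1000):
    """Return (argmin, min) of phi(p, .) over the grid i/steps, 0 < i < steps."""
    grid = [i / steps for i in range(1, steps)]
    pt_star = min(grid, key=lambda pt: phi(p, pt))
    return pt_star, phi(p, pt_star)
```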
Exercise 1.10: Let X be a ternary rv taking on the 3 values 0, 1, 2 with probabilities p0 , p1 , p2
respectively. Find the median of X for each of the cases below.
a) p0 = 0.2, p1 = 0.4, p2 = 0.4.
b) p0 = 0.2, p1 = 0.2, p2 = 0.6.
c) p0 = 0.2, p1 = 0.3, p2 = 0.5.
Note 1: The median is not unique in (c). Find the interval of values that are medians. Note 2: Some people
force the median to be distinct by defining it as the midpoint of the interval satisfying the definition given
here.
Solution: The median of X is 1 for (a), 2 for (b), and the interval [1, 2) for (c).
d) Now suppose that X is nonnegative and continuous with the density fX(x) = 1 for 0 ≤ x ≤ 0.5 and fX(x) = 0 for 0.5 < x ≤ 1. We know that fX(x) is positive for all x > 1, but it is otherwise unknown. Find the median or interval of medians.

The median is sometimes (incorrectly) defined as that α for which Pr{X > α} = Pr{X < α}. Show that it is possible for no such α to exist. Hint: Look at the examples above.

Solution: The interval of medians is [0.5, 1]. In particular, Pr{X ≤ x} = 1/2 for all x in this interval and Pr{X ≥ x} = 1/2 in this interval.
In examples (a) and (b) above, there is no α for which Pr{X < α} = Pr{X > α}.

One should then ask why there must always be an x such that Pr{X ≥ x} ≥ 1/2 and Pr{X ≤ x} ≥ 1/2. To see this, let xo = inf{x : FX(x) ≥ 1/2}. We must have FX(xo) ≥ 1/2 since FX is continuous on the right. Because of the infimum, we must have FX(xo − ε) < 1/2 for all ε > 0, and therefore Pr{X ≥ xo − ε} ≥ 1/2. But Pr{X ≥ x} is continuous on the left for the same reason that FX(x) is continuous on the right, and thus xo is a median of X. This is the kind of argument that makes many people hate analysis.
Exercise 1.11: a) For any given rv Y, express E[|Y|] in terms of ∫_{y<0} FY(y) dy and ∫_{y≥0} F^c_Y(y) dy. Hint: Review the argument in Figure 1.4.

Solution: We have seen in (1.34) that

    E[Y] = −∫_{y<0} FY(y) dy + ∫_{y≥0} F^c_Y(y) dy.

Since all negative values of Y become positive in |Y|,

    E[|Y|] = +∫_{y<0} FY(y) dy + ∫_{y≥0} F^c_Y(y) dy.

To spell this out in greater detail, let Y = Y⁺ + Y⁻ where Y⁺ = max{0, Y} and Y⁻ = min{Y, 0}. Then Y = Y⁺ + Y⁻ and |Y| = Y⁺ − Y⁻ = Y⁺ + |Y⁻|. Since E[Y⁺] = ∫_{y≥0} F^c_Y(y) dy and E[Y⁻] = −∫_{y<0} FY(y) dy, the above results follow.
b) For some given rv X with E[|X|] < ∞, let Y = X − α. Using (a), show that

    E[|X − α|] = ∫_{−∞}^α FX(x) dx + ∫_α^∞ F^c_X(x) dx.

Solution: This follows by changing the variable of integration in (a). That is,

    E[|X − α|] = E[|Y|] = ∫_{y<0} FY(y) dy + ∫_{y≥0} F^c_Y(y) dy
               = ∫_{−∞}^α FX(x) dx + ∫_α^∞ F^c_X(x) dx,

where in the last step we have changed the variable of integration from y to x = y + α.

c) Show that E[|X − α|] is minimized over α by choosing α to be a median of X. Hint: Both the easy way and the most instructive way to do this is to use a graphical argument illustrating the above two integrals. Be careful to show that when the median is an interval, all points in this interval achieve the minimum.
Solution: As illustrated in the picture, we are minimizing an integral for which the integrand changes from FX(x) to F^c_X(x) at x = α. If FX(x) is strictly increasing in x, then F^c_X = 1 − FX is strictly decreasing. We then minimize the integrand over all x by choosing α to be the point where the curves cross, i.e., where FX(x) = .5. Since the integrand has been minimized at each point, the integral must also be minimized.

[Figure: FX(x) rising through 0.5 and F^c_X(x) falling through 0.5, crossing at x = α.]

If FX is continuous but not strictly increasing, then there might be an interval over which FX(x) = .5; all points on this interval are medians and also minimize the integral; Exercise 1.10 (c) gives an example where FX(x) = 0.5 over the interval [1, 2). Finally, if FX(α) ≥ 0.5 and FX(α − ε) < 0.5 for some α and all ε > 0 (as in parts (a) and (b) of Exercise 1.10), then the integral is minimized at that α and that α is also the median.
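The minimization can be sanity-checked on the discrete rv of Exercise 1.10 (a) (a numerical illustration, not part of the original solution): E[|X − α|] should be smallest exactly at the median α = 1.

```python
# X takes values 0, 1, 2 with probabilities 0.2, 0.4, 0.4
# (Exercise 1.10 (a)); its median is 1.
values, probs = [0, 1, 2], [0.2, 0.4, 0.4]

def mean_abs_dev(alpha):
    """E[|X - alpha|] for the discrete rv above."""
    return sum(p * abs(x - alpha) for x, p in zip(values, probs))

# Scan a grid of candidate alphas in [0, 2].
grid = [i / 100 for i in range(201)]
best = min(grid, key=mean_abs_dev)
```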
Exercise 1.12: Let X be a rv with CDF FX (x). Find the CDF of the following rv’s.
a) The maximum of n IID rv’s, each with CDF FX (x).
Solution: Let M₊ be the maximum of the n rv's X1, ..., Xn. Note that for any real x, M₊ is less than or equal to x if and only if Xj ≤ x for each j, 1 ≤ j ≤ n. Thus

    Pr{M₊ ≤ x} = Pr{X1 ≤ x, X2 ≤ x, ..., Xn ≤ x} = ∏_{j=1}^n Pr{Xj ≤ x},

where we have used the independence of the Xj's. Finally, since Pr{Xj ≤ x} = FX(x) for each j, we have F_{M₊}(x) = Pr{M₊ ≤ x} = [FX(x)]^n.
b) The minimum of n IID rv’s, each with CDF FX (x).
Solution: Let M₋ be the minimum of X1, ..., Xn. Then, in the same way as in (a), M₋ > y if and only if Xj > y for 1 ≤ j ≤ n, and for all choices of y. We could make the same statement using greater than or equal in place of strictly greater than, but the strict inequality is what is needed for the CDF. Thus,

    Pr{M₋ > y} = Pr{X1 > y, X2 > y, ..., Xn > y} = ∏_{j=1}^n Pr{Xj > y}.

It follows that 1 − F_{M₋}(y) = [1 − FX(y)]^n.
c) The difference of the rv's defined in (a) and (b); assume X has a density fX(x).

Solution: There are many difficult ways to do this, but also a simple way, based on first conditioning on the event that X1 = x. Then X1 = M₊ if and only if Xj ≤ x for 2 ≤ j ≤ n. Also, given X1 = M₊ = x, we have R = M₊ − M₋ ≤ r if and only if Xj > x − r for 2 ≤ j ≤ n. Thus, since the Xj are IID,

    Pr{M₊ = X1, R ≤ r | X1 = x} = ∏_{j=2}^n Pr{x − r < Xj ≤ x}
        = [Pr{x − r < X ≤ x}]^{n−1} = [FX(x) − FX(x − r)]^{n−1}.

We can now remove the conditioning by averaging over X1 = x. Assuming that X has the density fX(x),

    Pr{X1 = M₊, R ≤ r} = ∫_{−∞}^∞ fX(x) [FX(x) − FX(x − r)]^{n−1} dx.

Finally, we note that the probability that two of the Xj are the same is 0, so the events Xj = M₊ are disjoint except with zero probability. Also we could condition on Xj = x instead of X1 with the same argument (i.e., by using symmetry), so Pr{Xj = M₊, R ≤ r} = Pr{X1 = M₊, R ≤ r}. It follows that

    Pr{R ≤ r} = ∫_{−∞}^∞ n fX(x) [FX(x) − FX(x − r)]^{n−1} dx.

The only place we really needed the assumption that X has a PDF was in asserting that the probability that two or more of the Xj's are jointly equal to the maximum is 0. The formula can be extended to arbitrary CDF's by being careful about this possibility.

These expressions have a simple form if X is exponential with the PDF λe^{−λx} for x ≥ 0. Then

    Pr{M₋ ≥ y} = e^{−nλy};    Pr{M₊ ≤ y} = (1 − e^{−λy})^n;    Pr{R ≤ y} = (1 − e^{−λy})^{n−1}.

We will see how to derive the above expression for Pr{R ≤ y} in Chapter 2.
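The general integral for Pr{R ≤ r} can be checked against the exponential closed form (a numerical sketch, not part of the original solution), using a midpoint Riemann sum with λ = 1:

```python
import math

def F_exp(x):
    """CDF of an exponential rv with rate 1 (0 for x <= 0)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def pr_range_le(r, n, dx=1e-3, xmax=40.0):
    """Riemann-sum evaluation of
        Pr{R <= r} = integral of n*fX(x)*[FX(x) - FX(x-r)]^(n-1) dx
    for n IID exponential(1) rv's; should match (1 - exp(-r))^(n-1)."""
    total = 0.0
    for i in range(int(xmax / dx)):
        x = (i + 0.5) * dx
        total += n * math.exp(-x) * (F_exp(x) - F_exp(x - r)) ** (n - 1) * dx
    return total
```

For n = 3 and r = 1 the closed form gives (1 − e⁻¹)², which the sum reproduces to the accuracy of the discretization.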
Exercise 1.13: Let X and Y be rv's in some sample space Ω and let Z = X + Y, i.e., for each ω ∈ Ω, Z(ω) = X(ω) + Y(ω). The purpose of this exercise is to show that Z is a rv. This is a mathematical fine point that many readers may prefer to simply accept without proof.
a) Show that the set of ω for which Z(ω) is infinite or undefined has probability 0.

Solution: Note that Z can be infinite (either ±∞) or undefined only when either X or Y is infinite or undefined. Since these are events of zero probability, Z can be infinite or undefined only with probability 0.
b) We must show that {ω ∈ Ω : Z(ω) ≤ α} is an event for each real α, and we start by approximating that event. To show that Z = X + Y is a rv, we must show that for each real number α, the set {ω ∈ Ω : X(ω) + Y(ω) ≤ α} is an event. Let B(n, k) = {ω : X(ω) ≤ k/n} ∩ {Y(ω) ≤ α + (1 − k)/n} for integer k > 0. Let D(n) = ∪_k B(n, k), and show that D(n) is an event.

Solution: We are trying to show that {Z ≤ α} is an event for arbitrary α and doing this by first quantizing X and Y into intervals of size 1/n, where k is used to number these quantized elements. Part (c) will make sense of how this is related to {Z ≤ α}, but for now we simply treat the sets as defined. Each set B(n, k) is an intersection of two events, namely the event {ω : X(ω) ≤ k/n} and the event {ω : Y(ω) ≤ α + (1 − k)/n}; these must be events since X and Y are rv's. For each n, D(n) is a countable union (over k) of the sets B(n, k), and thus D(n) is an event for each n and each α.
c) On a 2-dimensional sketch for a given α, show the values of X(ω) and Y(ω) for which ω ∈ D(n). Hint: This set of values should be bounded by a staircase function.

Solution:

[Figure: the (x, y) plane with the line x + y = α of slope −1 and, just above it, the staircase boundary of D(n) with steps of width 1/n.]

The region D(n) is sketched for αn = 5; it is the region below the staircase function above. The kth step of the staircase, extended horizontally to the left and vertically down, is the set B(n, k). Thus we see that D(n) is an upper bound to the set {Z ≤ α}, which is bounded by the straight line of slope −1 below the staircase.
d) Show that

    {ω : X(ω) + Y(ω) ≤ α} = ∩_{n≥1} D(n).        (A.4)

Explain why this shows that Z = X + Y is a rv.

Solution: The region {ω : X(ω) + Y(ω) ≤ α} is the region below the diagonal line of slope −1 that passes through the point (0, α). This region is thus contained in D(n) for each n ≥ 1 and is thus contained in ∩_{n≥1} D(n). On the other hand, each point ω for which X(ω) + Y(ω) > α is not contained in D(n) for sufficiently large n. This verifies (A.4). Since D(n) is an event, the countable intersection is also an event, so {ω : X(ω) + Y(ω) ≤ α} is an event. This applies for all α. This, in conjunction with (a), shows that Z is a rv.
e) Explain why this implies that if X1, X2, ..., Xn are rv's, then Y = X1 + X2 + ··· + Xn is a rv. Hint: Only one or two lines of explanation are needed.

Solution: We have shown that X1 + X2 is a rv, so (X1 + X2) + X3 is a rv, etc.
Exercise 1.14: a) Let X1, X2, ..., Xn be rv's with expected values X̄1, ..., X̄n. Show that E[X1 + ··· + Xn] = X̄1 + ··· + X̄n. You may assume that the rv's have a joint density function, but do not assume that the rv's are independent.
Solution: We assume that the rv’s have a joint density, and we ignore all mathematical
fine points here. Then
Z 1
Z 1
E [X1 + · · · + Xn ] =
···
(x1 + · · · + xn )fX1 ···Xn (x1 , . . . , xn ) dx1 · · · dxn
1
=
=
n Z 1
X
1
j=1
Z
n
1
X
1
j=1
1
···
Z 1
1
xj fX1 ···Xn (x1 , . . . , xn ) dx1 · · · dxn
n
X
xj fXj (xj ) dxj =
E [Xj ] .
j=1
Note that the separation into a sum of integrals simply used the properties of integration
and that no assumption of statistical independence was made.
b) Now assume that X1, ..., Xn are statistically independent and show that the expected value of the product is equal to the product of the expected values.

Solution: From the independence, f_{X1···Xn}(x1, ..., xn) = ∏_{j=1}^n f_{Xj}(xj). Thus

    E[X1 X2 ··· Xn] = ∫_{−∞}^∞ ··· ∫_{−∞}^∞ ∏_{j=1}^n xj f_{Xj}(xj) dx1 ··· dxn
        = ∏_{j=1}^n ∫_{−∞}^∞ xj f_{Xj}(xj) dxj = ∏_{j=1}^n E[Xj].
c) Again assuming that X1, ..., Xn are statistically independent, show that the variance of the sum is equal to the sum of the variances.

Solution: Since (a) shows that E[Σ_j Xj] = Σ_j X̄j, we have

    VAR[Σ_{j=1}^n Xj] = E[ (Σ_{j=1}^n Xj − Σ_{j=1}^n X̄j)² ]
        = E[ Σ_{j=1}^n Σ_{i=1}^n (Xj − X̄j)(Xi − X̄i) ]
        = Σ_{j=1}^n Σ_{i=1}^n E[(Xj − X̄j)(Xi − X̄i)],        (A.5)

where we have again used (a). Now from (b) (which used the independence of the Xj), E[(Xj − X̄j)(Xi − X̄i)] = 0 for i ≠ j. Thus (A.5) simplifies to

    VAR[Σ_{j=1}^n Xj] = Σ_{j=1}^n E[(Xj − X̄j)²] = Σ_{j=1}^n VAR[Xj].
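Parts (a)-(c) can be verified exactly for small independent discrete rv's by enumerating the product sample space. The sketch below is not part of the original solutions, and the two PMFs are arbitrary choices for illustration:

```python
from itertools import product

# Two hypothetical independent discrete rv's (value -> probability).
X = {0: 0.25, 1: 0.5, 3: 0.25}
Y = {-1: 0.5, 2: 0.5}

def mean(d):
    return sum(v * p for v, p in d.items())

def var(d):
    m = mean(d)
    return sum(p * (v - m) ** 2 for v, p in d.items())

# Joint distribution under independence, then the distribution of X + Y.
joint = {(x, y): px * py for (x, px), (y, py) in product(X.items(), Y.items())}
dist_sum = {}
for (x, y), p in joint.items():
    dist_sum[x + y] = dist_sum.get(x + y, 0.0) + p

mean_of_sum = mean(dist_sum)                                     # part (a)
mean_of_product = sum(p * x * y for (x, y), p in joint.items())  # part (b)
var_of_sum = var(dist_sum)                                       # part (c)
```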
Exercise 1.15: (Stieltjes integration) a) Let h(x) = u(x) and FX(x) = u(x) where u(x) is the unit step, i.e., u(x) = 0 for −∞ < x < 0 and u(x) = 1 for x ≥ 0. Using the definition of the Stieltjes integral in Footnote 19, show that ∫_{−1}^1 h(x) dFX(x) does not exist. Hint: Look at the term in the Riemann sum including x = 0 and look at the range of choices for h(x) in that interval. Intuitively, it might help initially to view dFX(x) as a unit impulse at x = 0.

Solution: The Riemann sum for this Stieltjes integral is Σ_n h(xn)[F(yn) − F(y_{n−1})] where y_{n−1} < xn ≤ yn. For any partition {yn; n ≥ 1}, consider the k such that y_{k−1} < 0 ≤ yk and consider choosing either xk < 0 or xk ≥ 0. In the first case h(xk)[F(yk) − F(y_{k−1})] = 0, and in the second h(xk)[F(yk) − F(y_{k−1})] = 1. All other terms are 0, and this can be done for all partitions as δ → 0, so the integral is undefined.
b) Let h(x) = u(x − a) and FX(x) = u(x − b) where a and b are in (−1, +1). Show that ∫_{−1}^1 h(x) dFX(x) exists if and only if a ≠ b. Show that the integral has the value 1 for a < b and the value 0 for a > b. Argue that this result is still valid in the limit of integration over (−∞, ∞).

Solution: Using the same argument as in (a) for any given partition {yn; n ≥ 1}, consider the k such that y_{k−1} < b ≤ yk. If a = b, xk can be chosen to make h(xk) either 0 or 1, causing the integral to be undefined as in (a). If a < b, then for a sufficiently fine partition, h(xk) = 1 for all xk such that y_{k−1} < xk ≤ yk. Thus that term in the Riemann sum is 1. For all other n, FX(yn) − FX(y_{n−1}) = 0, so the Riemann sum is 1. For a > b and k as before, h(xk) = 0 for a sufficiently fine partition, and the integral is 0. The argument does not involve the finite limits of integration, so the integral remains the same for infinite limits.
c) Let X and Y be independent discrete rv's, each with a finite set of possible values. Show that ∫_{−∞}^∞ FX(z − y) dFY(y), defined as a Stieltjes integral, is equal to the distribution of Z = X + Y at each z other than the possible sample values of Z, and is undefined at each sample value of Z. Hint: Express FX and FY as sums of unit steps. Note: This failure of Stieltjes integration is not a serious problem; FZ(z) is a step function, and the integral is undefined at its points of discontinuity. We automatically define FZ(z) at those step values so that FZ is a CDF (i.e., is continuous from the right). This problem does not arise if either X or Y is continuous.

Solution: Let X have the PMF {p(x1), ..., p(xK)} and Y have the PMF {q(y1), ..., q(yJ)}. Then FX(x) = Σ_{k=1}^K p(xk) u(x − xk) and FY(y) = Σ_{j=1}^J q(yj) u(y − yj). Then

    ∫_{−∞}^∞ FX(z − y) dFY(y) = Σ_{k=1}^K Σ_{j=1}^J ∫_{−∞}^∞ p(xk) q(yj) u(z − yj − xk) du(y − yj).

From (b), the integral above for a given k, j exists unless z = xk + yj. In other words, the Stieltjes integral gives the CDF of X + Y except at those z equal to xk + yj for some k, j, i.e., equal to the values of Z at which FZ(z) (as found by discrete convolution) has step discontinuities.

To give a more intuitive explanation, FX(x) = Pr{X ≤ x} for any discrete rv X has jumps at the sample values of X, and the value of FX(xk) at any such xk includes p(xk), i.e., FX is continuous to the right. The Riemann sum used to define the Stieltjes integral is not sensitive enough to 'see' this step discontinuity at the step itself. Thus, the stipulation that FZ be continuous on the right must be used in addition to the Stieltjes integral to define FZ at its jumps.
Exercise 1.16: Let X1, X2, ..., Xn, ... be a sequence of IID continuous rv's with the common probability density function fX(x); note that Pr{X = α} = 0 for all α and that Pr{Xi = Xj} = 0 for all i ≠ j. For n ≥ 2, define Xn as a record-to-date of the sequence if Xn > Xi for all i < n.
a) Find the probability that X2 is a record-to-date. Use symmetry to obtain a numerical answer without computation. (A one or two line explanation should be adequate.)
Solution: X2 is a record-to-date with probability 1/2. The reason is that X1 and X2 are
IID, so either one is larger with probability 1/2; this uses the fact that they are equal with
probability 0 since they have a density.
b) Find the probability that Xn is a record-to-date, as a function of n ≥ 1. Again use symmetry.

Solution: By the same symmetry argument, each Xi, 1 ≤ i ≤ n, is equally likely to be the largest, so that each is largest with probability 1/n. Since Xn is a record-to-date if and only if it is the largest of X1, ..., Xn, it is a record-to-date with probability 1/n.
c) Find a simple expression for the expected number of records-to-date that occur over the first m trials for any given integer m. Hint: Use indicator functions. Show that this expected number is infinite in the limit m → ∞.

Solution: Let In be 1 if Xn is a record-to-date and 0 otherwise. Then E[In] is the expected value of the ‘number’ of records-to-date (either 1 or 0) on trial n. That is,

E[In] = Pr{In = 1} = Pr{Xn is a record-to-date} = 1/n.

Thus

E[records-to-date up to m] = Σ_{n=1}^{m} E[In] = Σ_{n=1}^{m} 1/n.

This is the harmonic series, which goes to ∞ in the limit m → ∞. If you are unfamiliar with this, note that Σ_{n=1}^{∞} 1/n ≥ ∫_{1}^{∞} (1/x) dx = ∞.
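As a numerical sanity check (an addition to the solution, not part of it), the expected record count can be compared with the partial harmonic sum by simulation. The function names, trial count, and seed below are illustrative choices.

```python
import random

def count_records(samples):
    """Count records-to-date in a sequence (X1 counts as the trivial first record)."""
    best = float('-inf')
    count = 0
    for x in samples:
        if x > best:
            best, count = x, count + 1
    return count

def avg_records(m, trials=20000, seed=1):
    """Monte Carlo estimate of the expected number of records in m IID samples."""
    rng = random.Random(seed)
    total = sum(count_records([rng.random() for _ in range(m)])
                for _ in range(trials))
    return total / trials

def harmonic(m):
    """Partial harmonic sum, the exact expected record count over m trials."""
    return sum(1.0 / n for n in range(1, m + 1))
```

With m = 50, harmonic(50) is about 4.5, and the simulated average should agree to within a few hundredths; the distribution of X plays no role, as the argument above is purely one of ordering.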
Exercise 1.17: (Continuation of Exercise 1.16) a) Let N1 be the index of the first record-to-date in the sequence. Find Pr{N1 > n} for each n ≥ 2. Hint: There is a far simpler way to do this than working from (b) in Exercise 1.16.

Solution: The event {N1 > n} is the event that no record-to-date occurs in the first n trials, which means that X1 is the largest of {X1, X2, . . . , Xn}. By symmetry, this event has probability 1/n. Thus Pr{N1 > n} = 1/n.
b) Show that N1 is a rv (i.e., that N1 is not defective).

Solution: Every sample sequence for X1, X2, . . . maps into either a positive integer or infinity for N1. The probability that N1 is infinite is lim_{n→∞} Pr{N1 > n} = lim_{n→∞} (1/n), which is 0. Thus N1 is finite with probability 1 and is thus a rv.
c) Show that E[N1] = ∞.

Solution: Since N1 is a nonnegative rv,

E[N1] = ∫_{0}^{∞} Pr{N1 > x} dx = 1 + Σ_{n=1}^{∞} 1/n = ∞.
d) Let N2 be the index of the second record-to-date in the sequence. Show that N2 is a rv. You need not find the CDF of N2 here.

Solution: For any given n1 and n2, 2 ≤ n1 ≤ n2, we start by finding Pr{N1 = n1, N2 > n2}, which we show can be expressed using

{N1 = n1, N2 > n2} = {Xn1 = max(X1, . . . , Xn2)} ∩ {X1 = max(X1, . . . , X_{n1−1})}.    (A.6)

The first of these events, {Xn1 = max(X1, . . . , Xn2)}, means that Xn1 must be a record-to-date and also must be the last record-to-date up to and including n2. The second event, {X1 = max(X1, . . . , X_{n1−1})}, ensures that Xn1 is the first record-to-date. The first event has probability 1/n2. The second event has to do only with the ordering of the first n1 − 1 terms, and thus is independent of the first event and has probability 1/(n1 − 1). Thus,

Pr{N1 = n1, N2 > n2} = 1/[n2(n1 − 1)].

The marginal probability Pr{N2 > n2} can now be found by summing over n1 and including the event {N1 > n2, N2 > n2}, which has probability 1/n2:

Pr{N2 > n2} = 1/n2 + Σ_{n1=2}^{n2} 1/[n2(n1 − 1)].

This approaches 0 with increasing n2, so N2 is a rv. More precisely, we see that for large n2, Pr{N2 > n2} ≈ (ln n2)/n2, so that this approaches 0 somewhat more slowly than Pr{N1 > n2}, which is not surprising. These results do not depend on the CDF of X beyond the assumption that X is continuous, since they are purely ordering results.
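A small simulation (an added check; helper names and parameters are arbitrary) can confirm both Pr{N1 > n2} = 1/n2 and the formula above, which simplifies to (1 + Σ_{n1=2}^{n2} 1/(n1 − 1))/n2.

```python
import random

def first_two_records(samples):
    """Return the (1-based) indices of the first two records-to-date, padded with None."""
    best = samples[0]
    records = []
    for n in range(1, len(samples)):
        if samples[n] > best:
            best = samples[n]
            records.append(n + 1)
            if len(records) == 2:
                break
    return tuple(records + [None] * (2 - len(records)))

def tail_probs(n2, trials=40000, seed=2):
    """Estimate Pr{N1 > n2} and Pr{N2 > n2} over sequences of length n2."""
    rng = random.Random(seed)
    no_first = no_second = 0
    for _ in range(trials):
        n1, m2 = first_two_records([rng.random() for _ in range(n2)])
        no_first += (n1 is None)
        no_second += (m2 is None)
    return no_first / trials, no_second / trials
```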
e) Contrast your result in (c) to the result from (c) of Exercise 1.16 saying that the expected number of records-to-date is infinite over an infinite number of trials. Note: this might be a shock to your intuition, since there is an infinite expected wait for the first of an infinite sequence of occurrences.
Solution: Even though the expected wait for the first record-to-date is infinite, it is still
a random variable, and thus the first record-to-date must eventually occur. We have also
shown that the second record-to-date eventually occurs, and it can be shown that the nth
eventually occurs for all n. This makes the result in Exercise 1.16 unsurprising once it is
understood.
Exercise 1.18: (Another direction from Exercise 1.16) a) For any given n ≥ 2, find the probability that Xn and Xn+1 are both records-to-date. Hint: The idea in Exercise 1.16 (b) is helpful here, but the result is not.

Solution: For both Xn+1 and Xn to be records-to-date, it is necessary and sufficient for Xn+1 to be larger than all the earlier Xi (including Xn) and for Xn to be larger than all of the Xi earlier than it. Since any one of the first n + 1 Xi is equally likely to be the largest, the probability that Xn+1 is the largest is 1/(n + 1). For Xn to also be a record-to-date, it must be the second largest of the n + 1. Since all the first n terms are equally likely to be the second largest (given that Xn+1 is the largest), the conditional probability that Xn is the second largest is 1/n. Thus,

Pr{Xn+1 and Xn are records-to-date} = 1/[n(n + 1)].

Note that there might be earlier records-to-date before n; we have simply calculated the probability that Xn and Xn+1 are both records-to-date.
b) Is the event that Xn is a record-to-date statistically independent of the event that Xn+1 is a record-to-date?

Solution: Yes. We have found the joint probability that Xn and Xn+1 are both records-to-date, and it is the product of the probabilities that each is a record-to-date individually.
c) Find the expected number of adjacent pairs of records-to-date over the sequence X1, X2, . . . . Hint: A helpful fact here is that 1/[n(n + 1)] = 1/n − 1/(n + 1).

Solution: Let In be the indicator function of the event that Xn and Xn+1 are both records-to-date. Then E[In] = Pr{In = 1} = 1/[n(n + 1)]. Counting, for example, records at 2, 3, and 4 as two pairs of records, the expected number of pairs over the sequence is

E[Number of pairs] = Σ_{n=2}^{∞} 1/[n(n + 1)] = Σ_{n=2}^{∞} [1/n − 1/(n + 1)]
= Σ_{n=2}^{∞} 1/n − Σ_{n=3}^{∞} 1/n = 1/2,

where we have used the hint and then summed the terms separately. This hint is often useful in analyzing stochastic processes.

The intuition here is that records-to-date become more rare with increasing n (Xn is a record with probability 1/n). As we have seen, the expected number of records from 2 to m is on the order of ln m, which grows very slowly with m. The probability of an adjacent pair of records, as we have seen, decreases as 1/[n(n + 1)] with n, which means that if one does not occur for small n, it will probably not occur at all. It can be seen from this that the time until the first pair of records is a defective random variable.
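The telescoping sum can be verified exactly with rational arithmetic (an added check; the function name is illustrative): the partial sum from n = 2 to N should equal 1/2 − 1/(N + 1).

```python
from fractions import Fraction

def pair_sum(N):
    """Exact partial sum of 1/(n(n+1)) for n = 2..N."""
    return sum(Fraction(1, n * (n + 1)) for n in range(2, N + 1))
```

As N grows, the partial sum approaches 1/2, matching the expected number of adjacent record pairs computed above.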
Exercise 1.19: a) Assume that X is a nonnegative discrete rv taking on values a1, a2, . . . , and let Y = h(X) for some nonnegative function h. Let bi = h(ai), i ≥ 1, be the ith value taken on by Y. Show that E[Y] = Σ_i bi pY(bi) = Σ_i h(ai)pX(ai). Find an example where E[X] exists but E[Y] = ∞.

Solution: If we make the added assumption that bi ≠ bj for all i ≠ j, then Y has the sample value bi if and only if X has the sample value ai; thus pY(bi) = pX(ai) for each i. It then follows that Σ_i bi pY(bi) = Σ_i h(ai)pX(ai). This must be E[Y] (which might be finite or infinite). The idea is the same without the assumption that bi ≠ bj for i ≠ j, but now the more complicated notation pY(b) = Σ_{i: h(ai)=b} pX(ai) must be used for each sample value b of Y; the set of sample values of Y is then the set of distinct values of bi.

A simple example where E[X] is finite and E[Y] = ∞ is to choose a1, a2, . . . to be 1, 2, . . . and choose pX(i) = 2^−i. Then E[X] = 2. Choosing h(i) = 2^i, we have bi = 2^i and E[Y] = Σ_i 2^i · 2^−i = ∞.
b) Let X be a nonnegative continuous rv with density fX(x) and let h(x) be differentiable, nonnegative, and nondecreasing in x. Let A(δ) = Σ_{n≥1} h(nδ)[F(nδ) − F(nδ − δ)], i.e., A(δ) is a δth-order approximation to the Stieltjes integral ∫ h(x)dF(x). Show that if A(1) < ∞, then A(2^−k) ≤ A(2^−(k−1)) < ∞ for k ≥ 1. Show from this that ∫ h(x) dF(x) converges to a finite value. Note: this is a very special case, but it can be extended to many cases of interest. It seems better to consider these convergence questions as required rather than consider them in general.

Solution: Let δ = 2^−k for k ≥ 1. We take the expression for A(2δ) and break each interval of size 2δ into two intervals each of size δ; we use this to relate A(2δ) to A(δ):

A(2δ) = Σ_{n≥1} h(2nδ)[F(2nδ) − F(2nδ − 2δ)]
= Σ_{n≥1} h(2nδ)[F(2nδ) − F(2nδ − δ)] + Σ_{n≥1} h(2nδ)[F(2nδ − δ) − F(2nδ − 2δ)]
≥ Σ_{n≥1} h(2nδ)[F(2nδ) − F(2nδ − δ)] + Σ_{n≥1} h(2nδ − δ)[F(2nδ − δ) − F(2nδ − 2δ)]
= Σ_{m≥1} h(mδ)[F(mδ) − F(mδ − δ)] = A(δ),

where the inequality uses the fact that h is nondecreasing, and where to get the final line we substituted m for 2n in the first sum and m for 2n − 1 in the second, recombining the two sums into A(δ). Thus A(2^−k) is nonnegative and nonincreasing in k. Since A(1) is finite by definition, lim_{k→∞} A(2^−k) exists, and it must be the value of the Stieltjes integral.

To be a little more precise about the Stieltjes integral, note that A(δ) as defined above uses the largest value, h(nδ), of h(x) over the interval x ∈ [nδ − δ, nδ]. By replacing h(nδ) by h(nδ − δ), we get the smallest value in each interval; the resulting sequence of lower sums is nondecreasing in k, with the same limit. For an arbitrary partition of the real line, rather than the equi-spaced partition here, the argument would have to be further extended.
Exercise 1.20: a) Consider a positive, integer-valued rv whose CDF is given at integer values by

FY(y) = 1 − 2/[(y + 1)(y + 2)]    for integer y ≥ 0.

Use (1.31) to show that E[Y] = 2. Hint: Note that 1/[(y + 1)(y + 2)] = 1/(y + 1) − 1/(y + 2).

Solution: Combining (1.31) with the hint, we have

E[Y] = Σ_{y≥0} F^c_Y(y) = Σ_{y≥0} 2/[(y + 1)(y + 2)]
= Σ_{y≥0} 2/(y + 1) − Σ_{y≥0} 2/(y + 2)
= Σ_{y≥0} 2/(y + 1) − Σ_{y≥1} 2/(y + 1) = 2,

where the second sum in the last line eliminates all but the first term of the first sum.
b) Find the PMF of Y and use it to check the value of E[Y].

Solution: For y = 0, pY(0) = FY(0) = 0. For integer y ≥ 1, pY(y) = FY(y) − FY(y − 1). Thus for y ≥ 1,

pY(y) = 2/[y(y + 1)] − 2/[(y + 1)(y + 2)] = 4/[y(y + 1)(y + 2)].

Finding E[Y] from the PMF, we have

E[Y] = Σ_{y=1}^{∞} y pY(y) = Σ_{y=1}^{∞} 4/[(y + 1)(y + 2)]
= Σ_{y=1}^{∞} 4/(y + 1) − Σ_{y=2}^{∞} 4/(y + 1) = 2.
c) Let X be another positive, integer-valued rv. Assume its conditional PMF is given by

pX|Y(x|y) = 1/y    for 1 ≤ x ≤ y.

Find E[X | Y = y] and use it to show that E[X] = 3/2. Explore finding pX(x) until you are convinced that using the conditional expectation to calculate E[X] is considerably easier than using pX(x).

Solution: Conditioned on Y = y, X is uniform over {1, 2, . . . , y} and thus has the conditional mean (y + 1)/2. If you are unfamiliar with this fact, think of adding 1 + 2 + · · · + y to y + (y − 1) + · · · + 1, getting y(y + 1). Thus 1 + 2 + · · · + y = y(y + 1)/2. It follows that

E[X] = E[E[X | Y]] = E[(Y + 1)/2] = 3/2.

Calculating this expectation in the conventional way would require first calculating pX(x) and then calculating the expectation. Calculating pX(x),

pX(x) = Σ_{y=x}^{∞} pY(y) pX|Y(x|y) = Σ_{y=x}^{∞} 4/[y²(y + 1)(y + 2)].

It might be possible to calculate this in closed form, but it certainly does not look attractive. Only a dedicated algebraic masochist would pursue this further given the other approach.
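As an added numerical check (not in the original solution), the PMF from part (b) can be summed directly; truncating at a large N should reproduce E[Y] = 2 and E[X] = 3/2. The names and the truncation point are illustrative.

```python
def p_Y(y):
    """PMF of Y from part (b): pY(y) = 4/(y(y+1)(y+2)) for y >= 1."""
    return 4.0 / (y * (y + 1) * (y + 2))

def mean_Y(N):
    """Truncated sum for E[Y]; the tail is 4/(N+2), so this converges to 2."""
    return sum(y * p_Y(y) for y in range(1, N + 1))

def mean_X(N):
    """E[X] via conditional expectation: sum over y of pY(y)(y+1)/2; converges to 3/2."""
    return sum(p_Y(y) * (y + 1) / 2 for y in range(1, N + 1))
```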
d) Let Z be another integer-valued rv with the conditional PMF

pZ|Y(z|y) = 1/y²    for 1 ≤ z ≤ y².

Find E[Z | Y = y] for each integer y ≥ 1 and find E[Z].

Solution: As in (c), E[Z | Y] = (Y² + 1)/2. Since pY(y) approaches 0 as y^−3, we see that E[Y²] is infinite and thus E[Z] = ∞.
Exercise 1.21: a) Show that, for uncorrelated rv’s, the expected value of the product is equal to the product of the expected values (by definition, X and Y are uncorrelated if E[(X − X̄)(Y − Ȳ)] = 0).

Solution: This results from a straightforward computation:

0 = E[(X − X̄)(Y − Ȳ)] = E[XY] − E[X̄Y] − E[XȲ] + E[X̄Ȳ] = E[XY] − X̄Ȳ.
b) Show that if X and Y are uncorrelated, then the variance of X + Y is equal to the variance of X plus the variance of Y.

Solution: This is also straightforward, but a bit more tedious:

VAR[X + Y] = E[(X + Y)²] − (E[X + Y])²
= E[X²] + 2E[XY] + E[Y²] − X̄² − 2X̄Ȳ − Ȳ²
= VAR[X] + VAR[Y] + 2E[XY] − 2X̄Ȳ = VAR[X] + VAR[Y],

where we used (a) in the last step.

c) Show that if X1, . . . , Xn are uncorrelated, then the variance of the sum is equal to the sum of the variances.

Solution: If X1, . . . , Xn are uncorrelated, then X1 + · · · + Xn−1 is uncorrelated with Xn. Thus, from (b),

VAR[X1 + · · · + Xn] = VAR[X1 + · · · + Xn−1] + VAR[Xn].

Using this equation for n = 2 as the basis for induction, we see that VAR[X1 + · · · + Xn] = Σ_{i=1}^{n} VAR[Xi].
d) Show that independent rv’s are uncorrelated.

Solution: If X and Y are independent, then E[(X − X̄)(Y − Ȳ)] = E[X − X̄] E[Y − Ȳ] = 0, so X and Y are uncorrelated.
e) Let X, Y be identically distributed ternary-valued random variables with the PMF pX(−1) = pX(1) = 1/4, pX(0) = 1/2. Find a simple joint probability assignment such that X and Y are uncorrelated but dependent.

Solution: This becomes easier to understand if we express X as |X| · Xs, where |X| is the magnitude of X and Xs is ±1 with equal probability. From the given PMF, we see that |X| and Xs are independent. Similarly, let Y = |Y| · Ys. For reasons soon to be apparent, we construct a joint PMF by taking |Y| = |X| and by taking Xs, Ys, and |X| to be statistically independent.

Since X and Y are zero mean, they are uncorrelated if E[XY] = 0. For the given joint PMF, we have

E[XY] = E[|X| · Xs · |Y| · Ys] = E[|X|²] E[Xs Ys] = 0,
where we have used the fact that Xs and Ys are zero mean and independent. Thus X
and Y are uncorrelated. On the other hand, X and Y are certainly not independent since
|X| and |Y | are dependent and in fact identical. This should not be surprising since the
correlation of two rv’s is a single number, whereas many numbers are needed to specify
the joint distribution. This example was constructed through the realization that the type
of symmetry here between X and Y is sufficient to guarantee uncorrelatedness, but not
enough to cause independence.
f) You have seen that the moment generating function of a sum of independent rv’s is equal to the product of the individual moment generating functions. Give an example where this is false if the variables are uncorrelated but dependent.

Solution: We know (although we have not proven) that the MGF of a rv specifies the distribution, and thus the MGF of a dependent sum cannot be the same as the MGF of an independent sum with the same marginals. Thus any example will work, but the one in (e) is quite simple:

gX+Y(r) = (1/8)exp(−2r) + 3/4 + (1/8)exp(2r);

gX(r)gY(r) = [(1/4)exp(−r) + 1/2 + (1/4)exp(r)]²
= (1/16)exp(−2r) + (1/4)exp(−r) + 3/8 + (1/4)exp(r) + (1/16)exp(2r).
Exercise 1.22: Suppose X has the Poisson PMF, pX(n) = λⁿ exp(−λ)/n! for n ≥ 0, and Y has the Poisson PMF, pY(n) = μⁿ exp(−μ)/n! for n ≥ 0. Assume that X and Y are independent. Find the distribution of Z = X + Y and find the conditional distribution of Y conditional on Z = n.
Solution: The seemingly straightforward approach is to take the discrete convolution of X and Y (i.e., to sum the joint PMF of X and Y over the pairs for which X + Y has a given value Z = n). Thus

pZ(n) = Σ_{k=0}^{n} pX(k)pY(n − k) = Σ_{k=0}^{n} [λᵏ exp(−λ)/k!] · [μ^{n−k} exp(−μ)/(n − k)!]
= exp(−(λ + μ)) Σ_{k=0}^{n} λᵏ μ^{n−k}/[k!(n − k)!].

At this point, one needs some added knowledge or luck. One might hypothesize (correctly) that Z is also a Poisson rv with parameter λ + μ; one might recognize the sum above, or one might look at an old solution. We multiply and divide the right hand expression above by (λ + μ)ⁿ/n!:

pZ(n) = [(λ + μ)ⁿ exp(−(λ + μ))/n!] Σ_{k=0}^{n} [n!/(k!(n − k)!)] [λ/(λ + μ)]ᵏ [μ/(λ + μ)]^{n−k}
= (λ + μ)ⁿ exp(−(λ + μ))/n!,

where we have recognized the sum on the right as a binomial sum, which equals 1.
Another approach that is actually more straightforward uses the fact (see (1.52)) that the MGF of the sum of independent rv’s is the product of the MGF’s of those rv’s. From Table 1.2 (or a simple derivation), gX(r) = exp[λ(exp(r) − 1)]. Similarly, gY(r) = exp[μ(exp(r) − 1)], so gZ(r) = exp[(λ + μ)(exp(r) − 1)]. Since the MGF specifies the PMF, Z is a Poisson rv with parameter λ + μ.
Finally, we must find pY|Z(i | n). As a prelude to using Bayes’ law, note that

pZ|Y(n | i) = Pr{X + Y = n | Y = i} = Pr{X = n − i}.

Thus

pY|Z(i | n) = pY(i)pX(n − i)/pZ(n)
= [μⁱ exp(−μ)/i!] · [λ^{n−i} exp(−λ)/(n − i)!] · n!/[(μ + λ)ⁿ exp(−(λ + μ))]
= [n!/(i!(n − i)!)] [λ/(λ + μ)]^{n−i} [μ/(λ + μ)]ⁱ.

Why this turns out to be a binomial PMF will be clarified when we study Poisson processes.
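Both identities are easy to check numerically (an added sanity check; the function names and parameter values are arbitrary): the convolution should match the Poisson PMF with parameter λ + μ, and the conditional PMF should match a binomial.

```python
from math import exp, factorial, comb

def poisson(lam, n):
    """Poisson PMF with parameter lam at integer n."""
    return lam ** n * exp(-lam) / factorial(n)

def conv(lam, mu, n):
    """Discrete convolution: pZ(n) = sum over k of pX(k) pY(n-k)."""
    return sum(poisson(lam, k) * poisson(mu, n - k) for k in range(n + 1))

def cond(lam, mu, i, n):
    """pY|Z(i|n) via Bayes' law; should be binomial(n, mu/(lam+mu))."""
    return poisson(mu, i) * poisson(lam, n - i) / conv(lam, mu, n)
```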
Exercise 1.23: a) Suppose X, Y and Z are binary rv’s, each taking on the value 0 with probability 1/2 and the value 1 with probability 1/2. Find a simple example in which X, Y, Z are statistically dependent but are pairwise statistically independent (i.e., X, Y are statistically independent, X, Z are statistically independent, and Y, Z are statistically independent). Give pXYZ(x, y, z) for your example. Hint: In the simplest example, there are four joint values for x, y, z that have probability 1/4 each.

Solution: The simplest solution is also a very common relationship between 3 binary rv’s. The relationship is that X and Y are IID and Z = X ⊕ Y, where ⊕ is modulo-two addition, i.e., addition with the table 0 ⊕ 0 = 1 ⊕ 1 = 0 and 0 ⊕ 1 = 1 ⊕ 0 = 1. Since Z is a function of X and Y, there are only 4 sample values, each of probability 1/4. The 4 possible sample values for (X, Y, Z) are then (000), (011), (101) and (110). It is seen from this that all pairs of X, Y, Z are statistically independent.

b) Is pairwise statistical independence enough to ensure that

E[Π_{i=1}^{n} Xi] = Π_{i=1}^{n} E[Xi]

for a set of rv’s X1, . . . , Xn?

Solution: No; (a) gives a counterexample, since E[XYZ] = 0 while E[X] E[Y] E[Z] = 1/8.
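The XOR construction can be checked exhaustively (an illustrative addition; the helper names are made up): every pair of coordinates is independent, yet E[XYZ] differs from E[X]E[Y]E[Z].

```python
from itertools import product

# X, Y are IID fair bits and Z = X XOR Y: four equiprobable triples.
pmf = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(idx):
    """Marginal PMF of one coordinate of the triple."""
    out = {0: 0.0, 1: 0.0}
    for t, p in pmf.items():
        out[t[idx]] += p
    return out

def pair_marginal(i, j):
    """Joint PMF of two coordinates of the triple."""
    out = {}
    for t, p in pmf.items():
        out[(t[i], t[j])] = out.get((t[i], t[j]), 0.0) + p
    return out
```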
Exercise 1.24: Show that E[X] is the value of α that minimizes E[(X − α)²].

Solution: E[(X − α)²] = E[X²] − 2αE[X] + α². This is clearly minimized over α by α = E[X].
Exercise 1.25: For each of the following random variables, find the endpoints r− and r+ of the interval for which the moment generating function g(r) exists. Determine in each case whether g(r) exists at r− and r+. For parts a) and b) you should also find and sketch g(r). For parts c) and d), g(r) has no closed form.

a) Let λ, θ be positive numbers and let X have the density

fX(x) = (1/2)λ exp(−λx) for x ≥ 0;    fX(x) = (1/2)θ exp(θx) for x < 0.

Solution: Integrating to find gX(r) as a function of λ and θ, we get

gX(r) = ∫_{−∞}^{0} (1/2)θ exp(θx + rx) dx + ∫_{0}^{∞} (1/2)λ exp(−λx + rx) dx
= θ/[2(θ + r)] + λ/[2(λ − r)].

The first integral above converges for r > −θ and the second for r < λ. Thus r− = −θ and r+ = λ. The MGF does not exist at either endpoint.
b) Let Y be a Gaussian random variable with mean m and variance σ².

Solution: Calculating the MGF by completing the square in the exponent,

gY(r) = ∫_{−∞}^{∞} (1/√(2πσ²)) exp(−(y − m)²/(2σ²) + ry) dy
= ∫_{−∞}^{∞} (1/√(2πσ²)) exp(−(y − m − rσ²)²/(2σ²) + rm + r²σ²/2) dy
= exp(rm + r²σ²/2),

where the final equality arises from realizing that the remaining terms in the integrand form a Gaussian density, which integrates to 1. Note that this is the same as the result in Table 1.1. This MGF is finite for all finite r, so r− = −∞ and r+ = ∞. Also, gY(r) is infinite at each (infinite) endpoint.
c) Let Z be a nonnegative random variable with density

fZ(z) = k(1 + z)^−2 exp(−λz),    z ≥ 0,

where λ > 0 and k = [∫_{z≥0} (1 + z)^−2 exp(−λz) dz]^−1. Hint: Do not try to evaluate gZ(r). Instead, investigate values of r for which the integral is finite and infinite.

Solution: Writing out the formula for gZ(r), we have

gZ(r) = ∫_{0}^{∞} k(1 + z)^−2 exp[(r − λ)z] dz.

This integral is clearly infinite for r > λ and clearly finite for r < λ. For r = λ, the exponential term disappears, and we note that (1 + z)^−2 is bounded for z ≤ 1 and goes to 0 as z^−2 as z → ∞, so the integral is finite. Thus r+ = λ belongs to the region where gZ(r) is finite.

The whole point of this is that the random variables for which r+ = λ are those for which the density or PMF goes to 0 with increasing z as exp(−λz). Whether or not gZ(λ) is finite depends on the coefficient of exp(−λz).
d) For the Z of (c), find the limit of g′Z(r) as r approaches λ from below. Then replace (1 + z)² with (1 + z)³ in the definition of fZ(z) and k, and show whether the above limit is then finite or not. Hint: no integration is required.

Solution: Differentiating gZ(r) with respect to r,

g′Z(r) = ∫_{0}^{∞} kz(1 + z)^−2 exp[(r − λ)z] dz.

For r = λ, the integrand approaches 0 as 1/z, and thus the integral does not converge. In other words, although gZ(λ) is finite, the slope of gZ(r) is unbounded as r → λ from below. If (1 + z)^−2 is replaced with (1 + z)^−3 (with k modified to maintain a probability density), then as z → ∞, z(1 + z)^−3 goes to 0 as 1/z², so the integral converges. Thus in this case the slope of gZ(r) remains bounded for r < λ.
Exercise 1.26: a) Assume that the random variable X has a moment generating function gX(r) that is finite in the interval (r−, r+), r− < 0 < r+, and assume r− < r < r+ throughout. For any finite constant c, express the moment generating function of X − c, i.e., g(X−c)(r), in terms of the moment generating function of X. Show that g″(X−c)(r) ≥ 0.

Solution: Note that g(X−c)(r) = E[exp(r(X − c))] = gX(r)exp(−cr). Thus r+ and r− are the same for X and X − c, and (see Footnote 24) the derivatives of g(X−c)(r) with respect to r are finite. The first two derivatives are then given by

g′(X−c)(r) = E[(X − c) exp(r(X − c))];
g″(X−c)(r) = E[(X − c)² exp(r(X − c))] ≥ 0,

since (X − c)² exp(r(X − c)) ≥ 0 for all sample values of X.

b) Show that g″(X−c)(r) = [g″X(r) − 2c g′X(r) + c² gX(r)] exp(−rc).
Solution: Writing (X − c)² as X² − 2cX + c², we get

g″(X−c)(r) = E[X² exp(r(X − c))] − 2cE[X exp(r(X − c))] + c²E[exp(r(X − c))]
= {E[X² exp(rX)] − 2cE[X exp(rX)] + c²E[exp(rX)]} exp(−rc)
= [g″X(r) − 2c g′X(r) + c² gX(r)] exp(−rc).

c) Use a) and b) to show that g″X(r)gX(r) − [g′X(r)]² ≥ 0, and that the semi-invariant MGF satisfies γ″X(r) ≥ 0. Hint: Let c = g′X(r)/gX(r).
Solution: With the suggested choice for c,

g″(X−c)(r) = [g″X(r) − 2(g′X(r))²/gX(r) + (g′X(r))²/gX(r)] exp(−rc)
= {[g″X(r)gX(r) − (g′X(r))²]/gX(r)} exp(−cr).

Since this is nonnegative from (a) and gX(r) > 0, we see that

g″X(r)gX(r) − [g′X(r)]² ≥ 0,

and, since γ″X(r) = [g″X(r)gX(r) − (g′X(r))²]/g²X(r), it follows that γ″X(r) ≥ 0.

d) Assume that X is non-atomic, i.e., that there is no value of c such that Pr{X = c} = 1. Show that the inequality sign “≥” may be replaced by “>” everywhere in a), b) and c).

Solution: Since X is non-atomic, X − c must be non-zero with positive probability, and thus from (a), g″(X−c)(r) > 0. Thus the inequalities in parts b) and c) are strict also.
Exercise 1.27: A computer system has n users, each with a unique name and password. Due to a software error, the n passwords are randomly permuted internally (i.e., each of the n! possible permutations is equally likely). Only those users lucky enough to have had their passwords unchanged in the permutation are able to continue using the system.

a) What is the probability that a particular user, say user 1, is able to continue using the system?

Solution: The obvious (and perfectly correct) solution is that user 1 is equally likely to be assigned any one of the n passwords in the permutation, and thus has probability 1/n of being able to continue using the system.

This might be somewhat unsatisfying since it does not reason directly from the property that all n! permutations are equally likely. Thus a more direct argument is that the number of permutations assigning user 1 its own password is the number of permutations of the remaining n − 1 users, i.e., (n − 1)!. These are equally likely, each with probability 1/n!, so the probability that user 1 (or any given user i) can continue to use the system is (n − 1)!/n! = 1/n.
b) What is the expected number of users able to continue using the system? Hint: Let Xi be a rv with the value 1 if user i can use the system and 0 otherwise.

Solution: We have just seen that Pr{Xi = 1} = 1/n, so Pr{Xi = 0} = 1 − 1/n and E[Xi] = 1/n. The number of users who can continue to use the system is Sn = Σ_{i=1}^{n} Xi, so E[Sn] = Σ_i E[Xi] = n/n = 1. It is important to understand here that the Xi are not independent, and that the expected value of a finite sum of rv’s is equal to the sum of the expected values whether or not the rv’s are independent. For example, with two users, either both can continue or neither.
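The result E[Sn] = 1, independent of n, is easy to confirm by simulating random permutations (an added check; the function names, trial count, and seed are arbitrary).

```python
import random

def fixed_points(perm):
    """Number of users whose password is unchanged (fixed points of the permutation)."""
    return sum(1 for i, p in enumerate(perm) if i == p)

def avg_fixed_points(n, trials=20000, seed=3):
    """Monte Carlo estimate of E[Sn] for a uniformly random permutation of n items."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += fixed_points(perm)
    return total / trials
```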
Exercise 1.28: Suppose the rv X is continuous and has the CDF FX(x). Consider another rv Y = FX(X). That is, for each sample point ω such that X(ω) = x, we have Y(ω) = FX(x). Show that Y is uniformly distributed in the interval 0 to 1.

Solution: For simplicity, first assume that FX(x) is strictly increasing in x. (A sketch of such a CDF illustrates the inverse: if FX(x) = y, then F⁻¹X(y) = x.) Since FX(x) is continuous in x and strictly increasing from 0 to 1, there must be an inverse function F⁻¹X such that for each y ∈ (0, 1), F⁻¹X(y) = x for that x such that FX(x) = y. For this y, then, the event {FX(X) ≤ y} is the same as the event {X ≤ F⁻¹X(y)}. Using this equality for the given y,

Pr{Y ≤ y} = Pr{FX(X) ≤ y} = Pr{X ≤ F⁻¹X(y)} = FX(F⁻¹X(y)) = y,

where in the final equality we have used the fact that F⁻¹X is the inverse function of FX. This relation, for all y ∈ (0, 1), shows that Y is uniformly distributed between 0 and 1.

If FX is not strictly increasing, i.e., if there is any interval over which FX(x) has a constant value y, then we can define F⁻¹X(y) to have any given value within that interval. The above argument then still holds, although F⁻¹X is no longer the inverse of FX.

If there is any discrete point, say z, at which Pr{X = z} > 0, then FX(X) cannot take on values in the open interval between FX(z) − a and FX(z), where a = Pr{X = z}. Thus Y = FX(X) is uniformly distributed only for continuous rv’s.
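The probability integral transform above is easy to demonstrate empirically; here an exponential rv is pushed through its own CDF and the result is checked against the uniform CDF (the rate, sample size, and seed are arbitrary illustrative choices).

```python
import math
import random

def exp_cdf(x, lam=2.0):
    """CDF of an exponential rv with rate lam; strictly increasing for x > 0."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def transformed_samples(n=50000, lam=2.0, seed=4):
    """Samples of Y = FX(X) for X exponential; Y should be uniform on (0, 1)."""
    rng = random.Random(seed)
    return [exp_cdf(rng.expovariate(lam), lam) for _ in range(n)]
```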
Exercise 1.29: Let Z be an integer-valued rv with the PMF pZ(n) = 1/k for 0 ≤ n ≤ k − 1. Find the mean, variance, and moment generating function of Z. Hint: An elegant way to do this is to let U be a uniformly distributed continuous rv over (0, 1] that is independent of Z. Then U + Z is uniform over (0, k]. Use the known results about U and U + Z to find the mean, variance, and MGF for Z.

Solution: It is not particularly hard to find the mean, variance, and MGF of Z directly, but the technique here is often useful in relating continuous rv’s to their quantized versions. We have

E[Z] = E[U + Z] − E[U] = k/2 − 1/2 = (k − 1)/2;

σ²Z = σ²(U+Z) − σ²U = k²/12 − 1/12 = (k² − 1)/12;

gZ(r) = g(Z+U)(r)/gU(r) = [(exp(rk) − 1)/(rk)] · [r/(exp(r) − 1)] = [exp(kr) − 1]/(k[exp(r) − 1]).
Exercise 1.30: (Alternate approach 1 to the Markov inequality) a) Let Y be a nonnegative rv and y > 0 be some fixed number. Let A be the event that Y ≥ y. Show that y·IA ≤ Y (i.e., that this inequality is satisfied for every ω ∈ Ω).

Solution: If Y ≥ y, then IA = 1 and Y ≥ yIA. Alternatively, if Y < y, then IA = 0 and Y ≥ yIA again, since Y is nonnegative. Thus in all cases, Y ≥ yIA.

b) Use your result in (a) to prove the Markov inequality.

Solution: Taking the expected value of both sides of (a), E[Y] ≥ yE[IA]. Since E[IA] = Pr{Y ≥ y}, the Markov inequality follows.
Exercise 1.31: (Alternate approach 2 to the Markov inequality) a) Minimize E[Y] over all nonnegative rv’s such that Pr{Y ≥ b} = β for some given b > 0 and 0 < β < 1. Hint: Use a graphical argument similar to that in Figure 1.7. What is the rv that achieves the minimum? Hint: It is binary.

Solution: Since E[Y] = ∫_{0}^{∞} Fᶜ(y) dy, we can try to minimize E[Y] by minimizing Fᶜ(y) at each y subject to the constraints on Fᶜ(y). One constraint is that Fᶜ(y) is nonincreasing in y, and the other is that Pr{Y ≥ b} = β. Choosing Fᶜ(y) = β for 0 ≤ y < b and Fᶜ(y) = 0 for y ≥ b satisfies these constraints, and Fᶜ(y) cannot be further reduced anywhere. (The minimizing complementary CDF is thus a step function of height β on [0, b) and 0 thereafter.) The rv Y is then recognized as binary with the sample values 0 and b and the probabilities pY(0) = 1 − β and pY(b) = β. The minimizing E[Y] is then bβ.

b) Use (a) to prove the Markov inequality and also point out the distribution that meets the inequality with equality.

Solution: We have seen that E[Y] ≥ bβ = b Pr{Y ≥ b}, which, on dividing by b, is the Markov inequality. Thus, for any b > 0 and any Pr{Y ≥ b}, there is a binary rv that meets the Markov bound with equality.
Exercise 1.32: (The one-sided Chebyshev inequality) This inequality states that if a zero-mean rv X has a variance σ², then it satisfies the inequality

Pr{X ≥ b} ≤ σ²/(σ² + b²)    for every b > 0,    (A.7)

with equality for some b only if X is binary and Pr{X = b} = σ²/(σ² + b²). We prove this here using the same approach as in Exercise 1.31. Let X be a zero-mean rv that satisfies Pr{X ≥ b} = β for some given b > 0 and 0 < β < 1. The variance σ² of X can be expressed as

σ² = ∫_{−∞}^{b} x² fX(x) dx + ∫_{b}^{∞} x² fX(x) dx.    (A.8)

We will first minimize σ² over all zero-mean X satisfying Pr{X ≥ b} = β.

a) Show that the second integral in (A.8) satisfies ∫_{b}^{∞} x² fX(x) dx ≥ b²β.

Solution: We first should explain why we started trying to establish an upper bound to Pr{X ≥ b} and then switched to minimizing the variance such that X̄ = 0 and Pr{X ≥ b} = β for some given β. We will find that σ²min = b²β/(1 − β), or, equivalently, β = σ²min/(b² + σ²min). Thus for each zero-mean X satisfying Pr{X ≥ b} = β, we have (A.7), i.e.,

Pr{X ≥ b} = β = σ²min/(σ²min + b²) ≤ σ²X/(σ²X + b²).

Solving part (a),

∫_{b}^{∞} x² fX(x) dx ≥ ∫_{b}^{∞} b² fX(x) dx = b² FᶜX(b) = b²β.
b) Show that the first integral in (A.8) is constrained by

∫_{−∞}^{b} fX(x) dx = 1 − β    and    ∫_{−∞}^{b} x fX(x) dx ≤ −bβ.

Solution:

∫_{−∞}^{b} fX(x) dx = 1 − ∫_{b}^{∞} fX(x) dx = 1 − β.

Similarly, since X̄ = 0,

∫_{−∞}^{b} x fX(x) dx = −∫_{b}^{∞} x fX(x) dx ≤ −∫_{b}^{∞} b fX(x) dx = −bβ.
c) Minimize the first integral in (A.8) subject to the constraints in (b). Hint: If you scale fX(x) up by 1/(1 − β), it integrates to 1 over (−∞, b) and the second constraint becomes an expectation. You can then minimize the first integral in (A.8) by inspection.

Solution: Let h(x) = fX(x)/(1 − β) over x ≤ b and h(x) = 0 elsewhere. Then h(x) is a PDF and the corresponding mean is at most −bβ/(1 − β), so its magnitude is at least bβ/(1 − β). The corresponding second moment is lower bounded by the square of that mean, so

∫_{−∞}^{b} x² h(x) dx ≥ b²β²/(1 − β)².

Since h(x) = fX(x)/(1 − β), we have ∫_{−∞}^{b} x² fX(x) dx ≥ b²β²/(1 − β).
d) Combine the results in (a) and (c) to show that σ² ≥ b²β/(1 − β). Find the minimizing distribution. Hint: It is binary.

Solution: Substituting the results of (a) and (c) into (A.8),

σ² ≥ b²β²/(1 − β) + b²β = b²β/(1 − β).    (A.9)

This is met with equality when the inequalities in (a) and (c) are met with equality. Thus X = b whenever X ≥ b and X = −bβ/(1 − β) when X < b. Since β is the probability that X ≥ b, we see that pX(b) = β and pX(−bβ/(1 − β)) = 1 − β.
e) Use (d) to establish (A.7). Also show (trivially) that if Y has a mean Ȳ and variance σ², then Pr{Y − Ȳ ≥ b} ≤ σ²/(σ² + b²).

Solution: From (A.9), σ²(1 − β) ≥ b²β, from which β ≤ σ²/(b² + σ²), i.e., Pr{X ≥ b} ≤ σ²/(b² + σ²). The conditions for equality are clearly the same as in (d). The final result follows by letting X be the fluctuation of Y, i.e., X = Y − Ȳ, and applying (A.7) to X.
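The extremal binary distribution from part (d) can be checked numerically to be zero-mean, to have σ² = b²β/(1 − β), and to meet (A.7) with equality (an illustrative check with arbitrary b and β).

```python
def extremal_pmf(b, beta):
    """Binary rv from part (d): X = b w.p. beta, X = -b*beta/(1-beta) otherwise."""
    return [(b, beta), (-b * beta / (1 - beta), 1 - beta)]

def chebyshev_check(b, beta):
    """Return (Pr{X >= b}, one-sided Chebyshev bound sigma^2/(sigma^2 + b^2))."""
    pmf = extremal_pmf(b, beta)
    mean = sum(x * p for x, p in pmf)
    assert abs(mean) < 1e-12          # zero mean by construction
    sigma2 = sum(x * x * p for x, p in pmf)
    tail = sum(p for x, p in pmf if x >= b)
    return tail, sigma2 / (sigma2 + b * b)
```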
Exercise 1.33: (Proof of (1.48)) Here we show that if X is a zero-mean rv with a variance σ², then the median α satisfies |α| ≤ σ.

a) First show that |α| ≤ σ for the special case where X is binary with equiprobable values at ±σ.

Solution: In this special case, FX(x) = 1/2 for −σ ≤ x < σ, and any value in this range satisfies the condition to be a median; in particular, |α| ≤ σ.

b) For all zero-mean rv’s X with variance σ², other than the special case in (a), show that Pr{X ≥ σ} < 0.5. Hint: Use the one-sided Chebyshev inequality of Exercise 1.32.

Solution: The one-sided Chebyshev inequality for zero-mean X says that Pr{X ≥ b} ≤ σ²/(b² + σ²). Letting b = σ, this says that Pr{X ≥ σ} ≤ 0.5. The condition for equality here (as shown in Exercise 1.32) is that X is binary with values ±σ. Thus, outside of that special case, Pr{X ≥ σ} < 0.5.
c) Show that Pr{X
↵}
0.5. Other than the special case in a), show that this implies that ↵ < .
Solution: By definition of a median, Pr{X ↵}
(outside of the special case of (a)) shows that Pr{X
than .
0.5 and Pr{X  ↵} 0.5. Since (b)
} < 0.5, the median must be smaller
d) Other than the special case in a), show that |↵| < . Hint: repeat b) and c) for the rv
then shown that |↵| 
mean, this shows that |↵
X. You have
with equality only for the binary case with values ± . For rv’s Y with a non-zero
Y| .
Solution: Note that X is also zero-mean with variance 2 , so (outside of the case in
(a)), Pr{ X
} = Pr{X 
} < 0.5. This implies that the median is greater than
.
Thus |↵|  with equality only in the binary equiprobable case. For Y with a non-zero
mean, we simply apply this result to the fluctuation of Y .
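The bound |α − Ȳ| ≤ σ is easy to spot-check on standard distributions; this small illustration (not a proof; the three distributions are arbitrary choices) verifies it with known means, medians, and standard deviations.

```python
import math

# Illustration of |median - mean| <= sigma on a few standard rv's.
# Exponential(1): mean 1, sigma 1, median ln 2.
assert abs(math.log(2) - 1.0) <= 1.0
# Uniform(0, 1): mean 1/2, sigma 1/sqrt(12), median 1/2.
assert abs(0.5 - 0.5) <= 1 / math.sqrt(12)
# Bernoulli(3/4): mean 3/4, sigma sqrt(3)/4, median 1.
assert abs(1 - 0.75) <= math.sqrt(3) / 4
```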
Exercise 1.34: We stressed the importance of the mean of a rv X in terms of its association with the
sample average via the WLLN. Here we show that there is a form of WLLN for the median and for the
entire CDF, say F_X(x), of X via sufficiently many independent sample values of X.

a) For any given x, let I_j(x) be the indicator function of the event {X_j ≤ x}, where X₁, X₂, . . . , X_j, . . . are
IID rv's with the CDF F_X(x). State the WLLN for the IID rv's {I₁(x), I₂(x), . . . }.

Solution: The mean value of I_j(x) is F_X(x) and the variance (after a short calculation) is
F_X(x)F^c_X(x). This is finite (and in fact at most 1/4), so Theorem 1.7.1 applies and

    lim_{n→∞} Pr{ |(1/n) Σ_{j=1}^{n} I_j(x) − F_X(x)| > ε } = 0    for all x and ε > 0.      (A.10)

This says that if we take n samples of X and use (1/n) Σ_{j=1}^{n} I_j(x) to approximate the CDF
F_X(x) at each x, then the probability that the approximation error exceeds ε at any given
x approaches 0 with increasing n.

b) Does the answer to (a) require X to have a mean or variance?

Solution: No. As pointed out in a), I_j(x) has a mean and variance whether or not X does,
so Theorem 1.7.1 applies.

c) Suggest a procedure for evaluating the median of X from the sample values of X₁, X₂, . . . . Assume that
X is a continuous rv and that its PDF is positive in an open interval around the median. You need not be
precise, but try to think the issue through carefully.

What you have seen here, without stating it precisely or proving it, is that the median has a law of large
numbers associated with it, saying that the sample median of n IID samples of a rv is close to the true
median with high probability.

Solution: Note that (1/n) Σ_{j=1}^{n} I_j(y) is a rv for each y. Any sample function x₁, . . . , x_n
of X₁, . . . , X_n maps into a sample value of (1/n) Σ_{j=1}^{n} I_j(y) for each y. We can view this
collection of sample values as a function of y. Any such sample function is non-decreasing
in y, and as seen in (a) is an approximation to F_X(y) at each y. This function of y has all
the characteristics of a CDF itself, so we can let α̂_n be the median of (1/n) Σ_{j=1}^{n} I_j(y) as
a function of y.

Let α be the true median of X and let δ > 0 be arbitrary. Note that if
(1/n) Σ_{j=1}^{n} I_j(α−δ) < .5, then α̂_n > α−δ. Similarly, if (1/n) Σ_{j=1}^{n} I_j(α+δ) > .5, then
α̂_n < α+δ. Thus,

    Pr{|α̂_n − α| ≥ δ} ≤ Pr{ (1/n) Σ_{j=1}^{n} I_j(α−δ) ≥ .5 } + Pr{ (1/n) Σ_{j=1}^{n} I_j(α+δ) ≤ .5 }.

Because of the assumption of a nonzero density, there is some ε₁ > 0 such that F_X(α−δ) <
.5 − ε₁ and some ε₂ > 0 such that F_X(α+δ) > .5 + ε₂. Thus,

    Pr{|α̂_n − α| ≥ δ} ≤ Pr{ |(1/n) Σ_{j=1}^{n} I_j(α−δ) − F_X(α−δ)| > ε₁ }
                      + Pr{ |(1/n) Σ_{j=1}^{n} I_j(α+δ) − F_X(α+δ)| > ε₂ }.

From (A.10), the limit of this as n → ∞ is 0, which is a WLLN for the median. With a
great deal more fussing, the same result holds true without the assumption of a positive
density if we allow α above to be any median in cases where the median is nonunique.
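The two estimators discussed above, the empirical CDF and the sample median, can be watched converging in a simulation; this sketch (not part of the solution; Exp(1) samples, sample size, and tolerances are arbitrary choices) checks both against their known limits.

```python
import bisect
import math
import random
import statistics

# Sketch: empirical CDF and sample median of n IID Exp(1) samples.
# True CDF is 1 - e^{-x}; true median is ln 2.
random.seed(2)
n = 100_000
xs = sorted(random.expovariate(1.0) for _ in range(n))

def emp_cdf(x):
    # (1/n) sum of indicators I_j(x) = fraction of samples <= x.
    return bisect.bisect_right(xs, x) / n

assert abs(emp_cdf(1.0) - (1 - math.exp(-1))) < 0.01
sample_median = statistics.median(xs)
assert abs(sample_median - math.log(2)) < 0.02
```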
Exercise 1.35 a) Show that for any integers 0 < k < n,

    C(n, k+1) ≤ C(n, k) · (n−k)/k.

Solution:

    C(n, k+1) = n! / ((k+1)!(n−k−1)!) = [n! / (k!(n−k)!)] · (n−k)/(k+1)
              = C(n, k) · (n−k)/(k+1) ≤ C(n, k) · (n−k)/k.                  (A.11)

b) Extend (a) to show that, for all ℓ ≤ n−k,

    C(n, k+ℓ) ≤ C(n, k) · ((n−k)/k)^ℓ.                                      (A.12)

Solution: Using k+ℓ in place of k+1 in (A.11),

    C(n, k+ℓ) ≤ C(n, k+ℓ−1) · (n − k − (ℓ−1))/(k+ℓ−1) ≤ C(n, k+ℓ−1) · (n−k)/k.

Applying recursion on ℓ, we get (A.12).

c) Let p̃ = k/n and q̃ = 1 − p̃. Let S_n be the sum of n binary IID rv's with p_X(0) = q and p_X(1) = p. Show
that for all ℓ ≤ n − k,

    p_{S_n}(k+ℓ) ≤ p_{S_n}(k) · (q̃p/p̃q)^ℓ.                                  (A.13)

Solution: Using (b),

    p_{S_n}(k+ℓ) = C(n, k+ℓ) p^{k+ℓ} q^{n−k−ℓ} ≤ C(n, k) · ((n−k)/k)^ℓ · (p/q)^ℓ · p^k q^{n−k}
                = p_{S_n}(k) · (q̃p/p̃q)^ℓ,

since (n−k)/k = q̃/p̃.

d) For k/n > p, show that Pr{S_n ≥ k} ≤ [p̃q/(p̃−p)] p_{S_n}(k).

Solution: Using the bound in (A.13), we get

    Pr{S_n ≥ k} = Σ_{ℓ=0}^{n−k} p_{S_n}(k+ℓ) ≤ p_{S_n}(k) Σ_{ℓ=0}^{n−k} (q̃p/p̃q)^ℓ
               ≤ p_{S_n}(k) · 1/(1 − q̃p/p̃q) = p_{S_n}(k) · p̃q/(p̃q − q̃p)
               = p_{S_n}(k) · p̃q/(p̃(1−p) − (1−p̃)p) = p_{S_n}(k) · p̃q/(p̃ − p).       (A.14)

e) Now let ℓ be fixed and k = ⌈np̃⌉ for fixed p̃ such that 1 > p̃ > p. Argue that as n → ∞,

    p_{S_n}(k+ℓ) ~ p_{S_n}(k) (q̃p/p̃q)^ℓ    and    Pr{S_n ≥ k} ~ [p̃q/(p̃−p)] p_{S_n}(k),

where a(n) ~ b(n) means that lim_{n→∞} a(n)/b(n) = 1.

Solution: Note that (A.12) provides an upper bound to C(n, k+ℓ) and a slight modification of
the same argument provides the lower bound C(n, k+ℓ) ≥ C(n, k) · ((n−k−ℓ)/(k+ℓ))^ℓ. Taking the ratio of the
upper to lower bound for fixed ℓ and p̃ as n → ∞, we see that

    C(n, k+ℓ) ~ C(n, k) · ((n−k)/k)^ℓ    so that    p_{S_n}(k+ℓ) ~ p_{S_n}(k) (q̃p/p̃q)^ℓ      (A.15)

follows. Replacing the upper bound in (A.14) with the asymptotic equality in (A.15), and
letting ℓ grow very slowly with n, we get Pr{S_n ≥ k} ~ [p̃q/(p̃−p)] p_{S_n}(k).
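Both (A.13) and the tail bound (A.14) can be checked exactly against the binomial PMF; this sketch (not part of the solution; n, k, p are arbitrary choices with k/n > p as required) verifies them for every ℓ.

```python
from math import comb

# Sketch: exact check of (A.13) and (A.14) for a binomial PMF.
n, k, p = 100, 40, 0.3           # k/n = 0.4 > p, as part (d) requires
q = 1 - p
pt, qt = k / n, 1 - k / n        # p-tilde, q-tilde

def pmf(i):
    return comb(n, i) * p**i * q**(n - i)

# (A.13): the PMF decays at least geometrically beyond k.
for l in range(n - k + 1):
    assert pmf(k + l) <= pmf(k) * (qt * p / (pt * q)) ** l + 1e-18
# (A.14): the whole upper tail is bounded by a geometric-series constant.
tail = sum(pmf(i) for i in range(k, n + 1))
assert tail <= pmf(k) * pt * q / (pt - p)
```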
Exercise 1.36 A sequence {a_n; n ≥ 1} of real numbers has the limit 0 if for all ε > 0, there is an m(ε)
such that |a_n| ≤ ε for all n ≥ m(ε). Show that the sequences in parts a) and b) below satisfy lim_{n→∞} a_n = 0
but the sequence in (c) does not have a limit.

a) a_n = 1/ln(ln(n+1))

Solution: This is very tedious, but most of us who have never done any analysis must
carry out this kind of detail a few times before seeing automatically what is going on. For
a_n ≤ ε, we need ln(ln(n + 1)) ≥ 1/ε. Thus we must have ln(n + 1) ≥ e^{1/ε}, or, finally,
n ≥ exp(exp(1/ε)) − 1. Taking m(ε) to be the right-hand side (which is then finite for all ε > 0), we see that a_n ≤ ε
for all n ≥ m(ε).

b) a_n = n^{10} exp(−n)

Solution: The technique in (a) doesn't work easily here, but the following criterion is often
useful and applies here: If b_n ≥ a_n for all n greater than some n₀, and if lim_{n→∞} b_n = 0,
then lim_{n→∞} a_n = 0 also. For the sequence here, let b_n = exp(−n/2). For large enough n,
n^{10} = exp(10 ln n) ≤ exp(n/2), and this implies that a_n ≤ b_n for all large enough n. Thus
lim_{n→∞} a_n = 0.

c) a_n = 1 for n = 10^ℓ for each positive integer ℓ, and a_n = 0 otherwise.

Solution: Here, for any ε < 1, a_n > ε for all n of the form n = 10^ℓ for ℓ integer. Since
a_n = 0 for all other n, lim_{n→∞} a_n does not exist.

d) Show that the definition can be changed (with no change in meaning) by replacing ε with either 1/k or
2^{−k} for every positive integer k.

Solution: Suppose the original definition holds. Then for any k, we can let ε = 1/k, and
know that there is an m(1/k) such that a_n ≤ 1/k for all n ≥ m(1/k). Conversely, suppose
that for every k, there is an m_k such that a_n ≤ 1/k for all n ≥ m_k. Then for every ε > 0,
we can choose a k such that ε > 1/k. For this k, a_n ≤ ε for all n ≥ m_k. The same argument
holds when ε is replaced by 2^{−k}.
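The domination criterion used in part (b) is easy to verify numerically; this quick illustration (not part of the solution; the range of n is an arbitrary choice starting past the crossover 10 ln n ≤ n/2) checks that b_n = e^{−n/2} really does dominate a_n = n¹⁰e^{−n}.

```python
import math

# Illustration for part (b): n^10 e^{-n} <= e^{-n/2} once 10 ln n <= n/2,
# which holds for all n >= 100 (and well before that).
for n in range(100, 2000):
    a_n = n**10 * math.exp(-n)
    b_n = math.exp(-n / 2)
    assert a_n <= b_n
```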
Exercise 1.37: Represent the MGF of a rv X by

    g_X(r) = ∫_{−∞}^{0} e^{rx} dF(x) + ∫_{0}^{∞} e^{rx} dF(x).

In each of the following parts, you are welcome to restrict X to be either discrete or continuous.

a) Show that the first integral always exists (i.e., is finite) for r ≥ 0 and that the second integral always
exists for r ≤ 0.

Solution: For r ≥ 0 and x ≤ 0, we have 0 ≤ e^{rx} ≤ 1. Thus the first integral lies between 0
and 1 and is thus finite. Similarly, for r ≤ 0 and x ≥ 0, we have 0 ≤ e^{rx} ≤ 1, so the second
integral is finite in this range.

b) Show that if the second integral exists for a given r₁ > 0, then it also exists for all r in the range
0 ≤ r ≤ r₁.

Solution: For x ≥ 0 (i.e., in the domain of the second integral), we have e^{rx} ≤ e^{r₁x}, so we also
have 0 ≤ ∫_{0}^{∞} e^{rx} dF(x) ≤ ∫_{0}^{∞} e^{r₁x} dF(x). Thus the second integral exists for all r ∈ [0, r₁].

c) Show that if the first integral exists for a given r₂ < 0, then it also exists for all r in the range r₂ ≤ r ≤ 0.

Solution: This follows in the same way as (b).

d) Show that the domain of r over which g_X(r) exists is an interval from some r₋ ≤ 0 to some r₊ ≥ 0 (the
interval might or might not include each endpoint, and either or both end points might be 0, ±∞, or any
point between).

Solution: Combining parts a) and b), we see that if the second integral exists for some
r₁ > 0, then it exists for all r ≤ r₁. Defining r₊ as the supremum of values of r for which
the second integral exists, we see that the second integral exists for all r < r₊. Since the
second integral exists for r = 0, we see that r₊ ≥ 0.

Similarly, the first integral exists for all r > r₋ where r₋ ≤ 0 is the infimum of r such that
the first integral exists. From (a), both integrals then exist for r ∈ (r₋, r₊).

e) Find an example where r₊ = 1 and the MGF does not exist for r = 1. Find another example where
r₊ = 1 and the MGF does exist for r = 1. Hint: Consider f_X(x) = e^{−x} for x ≥ 0 and figure out how to
modify it to f_Y(y) so that ∫_{0}^{∞} e^{y} f_Y(y) dy < ∞ but ∫_{0}^{∞} e^{y+εy} f_Y(y) dy = ∞ for all ε > 0.

Solution: Note: The hint should not have been necessary here. Anyone trying to imagine
why ∫_{0}^{∞} e^{rx} f(x) dx should converge for some r and not for others who doesn't consider
exponentials was up too late the previous night. We observe that the second integral,
∫_{0}^{∞} exp(rx − x) dx, converges for r < 1 and diverges for r ≥ 1, so r₊ = 1 and r₊ is not in the domain of
convergence.

How do we modify e^{−x} for x ≥ 0 to make ∫_{0}^{∞} f(x) exp(xr) dx converge for r = 1 but
diverge for all r > 1? We need an f(x) that goes to 0 with increasing x just a little faster
than e^{−x}, but it has to go to zero more slowly than e^{−(1+ε)x} for any ε > 0. There are many
choices, such as exp(−x − √x) or (1 + x²)^{−1} exp(−x). Each of these requires a normalizing
constant to make the function a probability density, but there is no need to calculate the
constant.
Exercise 1.38: Let {X_n; n ≥ 1} be a sequence of independent but not identically distributed rv's. We
say that the weak law of large numbers (WLLN) holds for this sequence if for all ε > 0

    lim_{n→∞} Pr{ |S_n/n − E[S_n]/n| ≥ ε } = 0    where S_n = X₁ + X₂ + · · · + X_n.

a) Show that the WLLN holds if there is some constant A such that σ²_{X_n} ≤ A for all n.

Solution: From (1.41), the variance of S_n is given by σ²_{S_n} = Σ_{i=1}^{n} σ²_{X_i}. Since σ²_{X_i} ≤ A, we
have σ²_{S_n} ≤ nA and thus σ²_{S_n/n} ≤ A/n. Thus, from the Chebyshev inequality,

    Pr{ |S_n/n − E[S_n]/n| ≥ ε } ≤ A/(nε²).

Then, exactly as in the proof of the WLLN for IID rv's, the quantity on the right approaches
0 as n → ∞ for all ε > 0, completing the demonstration.

b) Suppose that σ²_{X_n} ≤ A n^{1−α} for some α, 0 < α < 1, and for all n ≥ 1. Show that the WLLN holds in this
case.

Solution: This is similar to (a), but it is considerably stronger, since it says that the
variance of the X_i can be slowly increasing in i, and if this increase is bounded as above,
the WLLN still holds. Since An^{1−α} is increasing in n, σ²_{X_i} ≤ An^{1−α} for i ≤ n. Thus,

    σ²_{S_n} = Σ_{i=1}^{n} σ²_{X_i} ≤ nAn^{1−α} = An^{2−α},    so σ²_{S_n/n} ≤ An^{−α}.

From the Chebyshev inequality,

    Pr{ |S_n/n − E[S_n]/n| ≥ ε } ≤ An^{−α}/ε²,

which for any ε > 0 approaches 0 as n → ∞.
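The part (b) bound can be watched in a simulation; this sketch (not part of the solution; the Gaussian X_i, α = 1/2, A = 1, and the trial counts are arbitrary choices) draws independent X_i with variances growing like i^{1−α} and checks that the empirical deviation frequency stays under the Chebyshev bound An^{−α}/ε².

```python
import random

# Sketch: independent X_i ~ N(0, i^{1-alpha}); S_n/n should still
# concentrate at 0, within the bound A n^{-alpha} / eps^2 (here A = 1).
random.seed(3)
alpha, eps, n, trials = 0.5, 0.5, 2000, 100

def sample_mean(n):
    # std of X_i is i^{(1-alpha)/2}, i.e. variance i^{1-alpha}
    s = sum(random.gauss(0.0, i ** ((1 - alpha) / 2)) for i in range(1, n + 1))
    return s / n

deviations = sum(abs(sample_mean(n)) >= eps for _ in range(trials))
assert deviations / trials <= n ** (-alpha) / eps ** 2
```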
Exercise 1.39: Let {X_i; i ≥ 1} be IID binary rv's. Let Pr{X_i = 1} = p, Pr{X_i = 0} = 1 − p. Let
S_n = X₁ + · · · + X_n. Let m be an arbitrary but fixed positive integer. Think! then evaluate the following
and explain your answers:

a) lim_{n→∞} Σ_{i: np−m ≤ i ≤ np+m} Pr{S_n = i}.

Solution: It is easier to reason about the problem if we restate the sum in the following
way:

    Σ_{i: np−m ≤ i ≤ np+m} Pr{S_n = i} = Pr{np − m ≤ S_n ≤ np + m}
        = Pr{−m ≤ S_n − nX̄ ≤ m}
        = Pr{ −m/(σ√n) ≤ (S_n − nX̄)/(σ√n) ≤ m/(σ√n) },

where σ is the standard deviation of X. Now in the limit n → ∞, (S_n − nX̄)/(σ√n)
approaches a normalized Gaussian rv in distribution, i.e.,

    lim_{n→∞} Pr{ −m/(σ√n) ≤ (S_n − nX̄)/(σ√n) ≤ m/(σ√n) }
        = lim_{n→∞} [Φ(m/(σ√n)) − Φ(−m/(σ√n))] = 0.

This can also be seen immediately from the binomial distribution as it approaches a discrete
Gaussian distribution. We are looking only at essentially the central 2m terms of the
binomial, and each of those terms goes to 0 as 1/√n with increasing n.

b) lim_{n→∞} Σ_{i: 0 ≤ i ≤ np+m} Pr{S_n = i}.

Solution: Here all terms on the lower side of the distribution are included and the upper side
is bounded as in (a). Arguing in the same way as in (a), we see that

    Σ_{i: 0 ≤ i ≤ np+m} Pr{S_n = i} = Pr{ (S_n − nX̄)/(σ√n) ≤ m/(σ√n) }.

In the limit, this is Φ(0) = 1/2.

c) lim_{n→∞} Σ_{i: n(p−1/m) ≤ i ≤ n(p+1/m)} Pr{S_n = i}.

Solution: Here the number of terms included in the sum is increasing linearly with n, and
the appropriate mechanism is the WLLN.

    Σ_{i: n(p−1/m) ≤ i ≤ n(p+1/m)} Pr{S_n = i} = Pr{ −1/m ≤ S_n/n − X̄ ≤ 1/m }.

In the limit n → ∞, this is 1 by the WLLN. The essence of this exercise has been to scale
the random variables properly to go to the limit. We have used the CLT and the WLLN, but
one could guess the answers immediately by recognizing what part of the distribution is
being looked at.
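The three limits (0, 1/2, and 1) are easy to see in a simulation; this sketch (not part of the solution; p = 1/2, n, m, the trial count, and the loose tolerances are arbitrary choices, with n only large enough to suggest the trend) estimates each probability by Monte Carlo.

```python
import random

# Sketch: the three sums of Exercise 1.39 as empirical probabilities.
random.seed(4)
p, n, m, trials = 0.5, 2500, 3, 1000
counts_a = counts_b = counts_c = 0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))
    counts_a += (n * p - m <= s <= n * p + m)              # (a): -> 0
    counts_b += (s <= n * p + m)                           # (b): -> 1/2
    counts_c += (n * (p - 1 / m) <= s <= n * (p + 1 / m))  # (c): -> 1
assert counts_a / trials < 0.25          # a thin central window: small
assert abs(counts_b / trials - 0.5) < 0.1  # half the distribution
assert counts_c / trials > 0.99          # a window of width 2n/m: almost 1
```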
Exercise 1.40: The WLLN is said to hold for a zero-mean sequence {S_n; n ≥ 1} if

    lim_{n→∞} Pr{ |S_n/n| > ε } = 0    for every ε > 0.

The CLT is said to hold for {S_n; n ≥ 1} if for some σ > 0 and all z ∈ R,

    lim_{n→∞} Pr{ S_n/(σ√n) ≤ z } = Φ(z),

where Φ(z) is the normalized Gaussian CDF. Show that if the CLT holds, then the WLLN holds also. Note
1: If you hate ε, δ arguments, you will hate this. Note 2: It will probably ease the pain if you convert the
WLLN statement to: For every ε > 0, δ > 0 there exists an n(ε, δ) such that for every n ≥ n(ε, δ),

    Pr{ S_n/n < −ε } ≤ δ    and    Pr{ S_n/n > ε } ≤ δ.                      (A.16)

Solution: The alternate form above simply splits the two tails of the distribution and uses
the definition of a limit. Note that if this statement is satisfied for one value of δ > 0, it is
satisfied for all larger δ. Thus we restrict attention to 0 < δ < 1. For any such δ and any
ε > 0, we now derive this alternate form from the CLT. To do this, choose z < 0 to satisfy
δ = 2Φ(z). Note that this also satisfies δ = 2(1 − Φ(|z|)) (it might help to look at Figure
1.12 while following this).

From the CLT, using this particular z, we see that since the limit is Φ(z), there must be an
n₁ large enough that

    Pr{ S_n/(σ√n) < z } < 2Φ(z) = δ    for n ≥ n₁.

Rewriting this,

    Pr{ S_n/n < σz/√n } < δ    for n ≥ n₁.

Let n(ε, δ) be any integer both greater than n₁ and such that σ|z|/√n < ε. Then the first
part of (A.16) is satisfied. The second part is a minor variation.
Exercise 1.41: (Details in the proof of Theorem 1.7.4) a) Show that if X₁, X₂, . . . are IID, then
the truncated versions X̆₁, X̆₂, . . . are also IID.

Solution: Readers with an engineering orientation can be forgiven for saying this is obvious.
For an actual proof, note that

    F_{X̆₁···X̆_n}(y₁, . . . , y_n) = Pr{ ∩_{j=1}^{n} {X̆_j ≤ y_j} }.

The rv X̆_j is a function of the rv X_j and for each real number y_j, we have

    {X̆_j ≤ y_j} = ∅               for y_j < X̄ − b,
                 = {X_j ≤ y_j}    for X̄ − b ≤ y_j < X̄ + b,
                 = Ω               for y_j ≥ X̄ + b.

If all the y_j are in the range [X̄−b, X̄+b), then each event {X̆_j ≤ y_j} is the same as
{X_j ≤ y_j}, so it is clear that these n events are statistically independent. If some of the y_j
are outside of this range, then the events are still independent since ∅ and Ω are independent
of all other events, including themselves. Thus

    F_{X̆₁···X̆_n}(y₁, . . . , y_n) = ∏_{j=1}^{n} Pr{X̆_j ≤ y_j} = ∏_{j=1}^{n} F_{X̆_j}(y_j).

Finally, since the X_j are identically distributed, the X̆_j are also.

b) Show that each X̆_i has a finite mean E[X̆] and finite variance σ²_{X̆}. Show that the variance is upper
bounded by the second moment around the original mean X̄, i.e., show that σ²_{X̆} ≤ E[ |X̆ − E[X]|² ].

Solution: Since X̄ − b ≤ X̆ ≤ X̄ + b, we have

    X̄ − b ≤ E[X̆] ≤ X̄ + b,

so E[X̆] is finite (and in the same way, E[|X̆|] is finite).

Also, since (X̆)² ≤ (|X̄| + b)², we have E[X̆²] ≤ (|X̄| + b)², so the variance σ²_{X̆} is also finite.

Finally, for any rv Y with a variance, and any real α, E[(Y − α)²] = E[Y² − 2αY + α²],
which is minimized by α = Ȳ, i.e., E[(Y − α)²] ≥ σ²_Y. Letting Y be X̆ and α be X̄ shows
that σ²_{X̆} ≤ E[ |X̆ − E[X]|² ].

c) Assume that X̆_i is X_i truncated to X̄ ± b. Show that |X̆ − X̄| ≤ b and that |X̆ − X̄| ≤ |X − X̄|. Use this
to show that σ²_{X̆} ≤ bE[ |X̆ − X̄| ] ≤ 2bE[|X|].

Solution: Since X̄−b ≤ X̆ ≤ X̄+b, we have −b ≤ X̆−X̄ ≤ b, i.e., |X̆−X̄| ≤ b. To show
that |X̆−X̄| ≤ |X−X̄|, we look at the two cases, |X−X̄| ≤ b and |X−X̄| > b. In the first
case, X̆ = X, so X̆−X̄ = X−X̄. In the second case, |X̆−X̄| = b, so |X̆−X̄| < |X−X̄|. Thus

    |X̆ − X̄|² ≤ b|X − X̄|,

where we have used the upper bound b on the first |X̆−X̄| above and the bound |X−X̄|
on the other. Combining this with (b),

    σ²_{X̆} ≤ E[ |X̆ − X̄|² ] ≤ bE[ |X − X̄| ] ≤ bE[ |X| + |X̄| ] ≤ 2bE[|X|].      (A.17)

d) Let S̆_n = X̆₁ + · · · + X̆_n and show that for any ε > 0,

    Pr{ |S̆_n/n − E[X̆]| ≥ ε/2 } ≤ 8bE[|X|]/(nε²).                              (A.18)

Solution: Note that S̆_n/n has variance σ²_{X̆}/n, so the Chebyshev inequality says that

    Pr{ |S̆_n/n − E[X̆]| ≥ ε/2 } ≤ 4σ²_{X̆}/(nε²).

Substituting (A.17) in this gives us the desired bound.

e) Use Figure 1.13, along with (1.34), to show that for all sufficiently large b, |E[X − X̆]| ≤ ε/2. Use this
to show that

    Pr{ |S̆_n/n − E[X]| ≥ ε } ≤ 8bE[|X|]/(nε²)    for all large enough b.        (A.19)

Solution: Note that X̆ and X have the same CDF from X̄ − b to X̄ + b. From (1.34), then,

    E[X − X̆] = −∫_{−∞}^{X̄−b} F_X(x) dx + ∫_{X̄+b}^{∞} F^c_X(x) dx.

Since X has a finite mean, ∫_{0}^{∞} F^c_X(x) dx is finite, so lim_{b→∞} ∫_{X̄+b}^{∞} F^c_X(x) dx = 0. This shows
that E[X − X̆] ≤ ε/2 for all sufficiently large b. In the same way E[X − X̆] ≥ −ε/2 for all
sufficiently large b, and thus |E[X − X̆]| ≤ ε/2. Next, recall that for all numbers a, b, we
have |a − b| ≤ |a| + |b|. Thus,

    |S̆_n/n − X̄| = |(S̆_n/n − E[X̆]) + (E[X̆] − X̄)| ≤ |S̆_n/n − E[X̆]| + |E[X − X̆]|.

The final term on the right is at most ε/2 for all sufficiently large b. Thus, for such b, every
sample point for which |S̆_n/n − X̄| ≥ ε must also satisfy |S̆_n/n − E[X̆]| ≥ ε/2. When we
take the probability of this set of terms then,

    Pr{ |S̆_n/n − X̄| ≥ ε } ≤ Pr{ |S̆_n/n − E[X̆]| ≥ ε/2 }    for all sufficiently large b.

Combining this with (A.18) gives the desired result.

f) Use the following equation to justify (1.96).

    Pr{ |S_n/n − E[X]| > ε } = Pr{ {|S_n/n − E[X]| > ε} ∩ {S_n = S̆_n} }
                             + Pr{ {|S_n/n − E[X]| > ε} ∩ {S_n ≠ S̆_n} }.

Solution: The first term on the right above is unchanged if the first S_n is replaced by
S̆_n, and therefore it is upper-bounded by (A.19). The second term is upper-bounded by
Pr{S_n ≠ S̆_n}, which is the probability that one or more of the X_j are outside of [X̄ −
b, X̄ + b]. The union bound then results in (1.96).
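The key inequality (A.17) can be spot-checked by sampling; this sketch (not part of the solution; X ~ Exp(1), b = 2, the sample size, and the seed are arbitrary choices) truncates samples to X̄ ± b and compares the truncated variance with 2bE[|X|].

```python
import random

# Sketch: numeric check of (A.17), var(X-trunc) <= 2 b E[|X|],
# for X ~ Exp(1) (mean 1) truncated to mean +/- b with b = 2.
random.seed(5)
b, n = 2.0, 200_000
xs = [random.expovariate(1.0) for _ in range(n)]
mean = sum(xs) / n
trunc = [min(max(x, mean - b), mean + b) for x in xs]
mt = sum(trunc) / n
var_trunc = sum((x - mt) ** 2 for x in trunc) / n
e_abs = sum(abs(x) for x in xs) / n
assert var_trunc <= 2 * b * e_abs
```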
Exercise 1.42: Let {X_i; i ≥ 1} be IID rv's with mean 0 and infinite variance. Assume that E[|X_i|^{1+h}] = β
for some given h, 0 < h < 1, and some finite β. Let S_n = X₁ + · · · + X_n.

a) Show that for any y > 0, Pr{|X_i| ≥ y} ≤ β y^{−1−h}.

Solution: Since E[|X_i|^{1+h}] is finite, we can apply the Markov inequality to |X_i|^{1+h}, getting

    Pr{|X_i| ≥ y} = Pr{ |X_i|^{1+h} ≥ y^{1+h} } ≤ E[|X_i|^{1+h}]/y^{1+h} = β y^{−1−h}.

Note that this converges faster as y → ∞ than the Markov inequality applied to |X_i|.

b) Let {X̆_i; i ≥ 1} be truncated variables,

    X̆_i = b        for X_i ≥ b,
        = X_i      for −b ≤ X_i ≤ b,
        = −b       for X_i ≤ −b.

Show that E[X̆²] ≤ 2βb^{1−h}/(1−h). Hint: For a nonnegative rv Z, E[Z²] = ∫_{0}^{∞} 2z Pr{Z ≥ z} dz (To establish
this, view Z² as a rv, so E[Z²] = ∫_{0}^{∞} Pr{Z² > z²} dz²).

Solution: We know that E[X̆_i²] is finite since it is truncated. Upper bounding it by first
using the hint and next using (a) (for 0 < h < 1), we get

    E[X̆_i²] = ∫_{0}^{b} 2x Pr{|X_i| ≥ x} dx ≤ ∫_{0}^{b} 2x · βx^{−1−h} dx = 2βb^{1−h}/(1−h).

c) Let S̆_n = X̆₁ + . . . + X̆_n. Show that Pr{S_n ≠ S̆_n} ≤ nβb^{−1−h}.

Solution:

    Pr{S_n ≠ S̆_n} ≤ Pr{ ∪_{i=1}^{n} {X_i ≠ X̆_i} } ≤ Σ_{i=1}^{n} Pr{X_i ≠ X̆_i} ≤ nβb^{−1−h},

where we have used the union bound and then the fact that X_i ≠ X̆_i occurs if and only if
|X_i| > b, which is bounded in (a).

d) Show that

    Pr{ |S_n/n| ≥ ε } ≤ β [ 2b^{1−h}/((1−h)nε²) + n/b^{1+h} ].

Solution: Pr{|S_n| ≥ εn} ≤ Pr{S_n ≠ S̆_n} + Pr{|S̆_n| ≥ nε}. Bounding the first term by
(c) and the second term by the Markov inequality,

    Pr{|S_n| ≥ nε} ≤ nβb^{−1−h} + E[S̆_n²]/(n²ε²) = nβb^{−1−h} + E[X̆_i²]/(nε²)
                  ≤ nβb^{−1−h} + 2βb^{1−h}/((1−h)nε²) = β [ 2b^{1−h}/((1−h)nε²) + n/b^{1+h} ].

e) Optimize your bound with respect to b. How fast does this optimized bound approach 0 with increasing
n?

Solution: To minimize this bound over b, we take the derivative of the term in brackets,
getting

    (∂/∂b) [ 2b^{1−h}/((1−h)nε²) + n/b^{1+h} ] = b^{−h} [ 2/(nε²) − (1+h)n/b² ].

The derivative is 0 at b = nε √((1+h)/2). This must be the minimum since the coefficient
b^{−h} is positive for all b > 0 and the term in brackets is increasing in b, being negative for
b < nε √((1+h)/2) and positive for larger b. Substituting this value of b into the result of
(d) and simplifying with a fair amount of algebra,

    Pr{ |S_n/n| ≥ ε } ≤ [4β/(1−h²)] · ((1+h)/2)^{(1−h)/2} · 1/(n^h ε^{1+h}).

This goes to 0 with 1/n^h, which is slower (as expected) than the 1/n decrease for the WLLN
with a finite variance, but much faster than the WLLN assuming only E[|X|] < ∞. In this
latter case, there is no known rate of convergence.
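The calculus in part (e) can be double-checked numerically; this sketch (not part of the solution; h, n, ε are arbitrary choices and β is taken as 1 so only the bracketed term is optimized) confirms both that b* = nε√((1+h)/2) minimizes the bracket and that the optimized value matches the closed form.

```python
import math

# Sketch: check the optimizer and closed form of part (e), with beta = 1.
h, n, eps = 0.4, 1000, 0.1

def g(b):
    # the bracketed bound of part (d)
    return 2 * b**(1 - h) / ((1 - h) * n * eps**2) + n * b**(-(1 + h))

b_star = n * eps * math.sqrt((1 + h) / 2)
for b in (0.5 * b_star, 0.9 * b_star, 1.1 * b_star, 2 * b_star):
    assert g(b_star) <= g(b)
# Closed form: 4 ((1+h)/2)^{(1-h)/2} / (n^h eps^{1+h} (1-h^2))
closed = 4 * ((1 + h) / 2) ** ((1 - h) / 2) / (n**h * eps**(1 + h) * (1 - h**2))
assert abs(g(b_star) - closed) < 1e-9 * closed
```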
Exercise 1.43: (MS convergence ⟹ convergence in probability) Assume that {Z_n; n ≥ 1} is
a sequence of rv's and α is a number with the property that lim_{n→∞} E[(Z_n − α)²] = 0.

a) Let ε > 0 be arbitrary and show that for each n ≥ 0,

    Pr{ |Z_n − α| ≥ ε } ≤ E[(Z_n − α)²]/ε².                                   (A.20)

Solution: It is possible for E[(Z_n − α)²] to be infinite for some n, although because of
the limit, it is finite for all large enough n. We consider only such large enough n in what
follows. The statement to be shown is almost the Chebyshev inequality, except that α
need not be E[Z_n]. However, the desired statement follows immediately from applying the
Markov inequality to (Z_n − α)². This gives us a further reminder that it is the Markov
inequality that is the more fundamental and useful between Markov and Chebyshev.

b) For the ε above, let δ > 0 be arbitrary. Show that there is an integer m such that E[(Z_n − α)²] ≤ ε²δ
for all n ≥ m.

Solution: By the definition of a limit, lim_{n→∞} E[(Z_n − α)²] = 0 means that for all ε₁ > 0,
there is an m large enough that E[(Z_n − α)²] < ε₁ for all n ≥ m. Choosing ε₁ = ε²δ
establishes the desired result.

c) Show that this implies convergence in probability.

Solution: Substituting E[(Z_n − α)²] ≤ ε²δ into (A.20), we see that for all ε, δ > 0, there is
an m such that Pr{ |Z_n − α| ≥ ε } ≤ δ for all n ≥ m. This is convergence in probability.
Exercise 1.44: Let X₁, X₂, . . . be a sequence of IID rv's each with mean 0 and variance σ². Let
S_n = X₁ + · · · + X_n for all n and consider the random variable S_n/(σ√n) − S_{2n}/(σ√(2n)). Find the limiting
CDF for this sequence of rv's as n → ∞. The point of this exercise is to see clearly that the CDF of
S_n/(σ√n) − S_{2n}/(σ√(2n)) is converging in n but that the sequence of rv's is not converging in any reasonable
sense.

Solution: If we write out the above expression in terms of the X_i, we get

    S_n/(σ√n) − S_{2n}/(σ√(2n)) = Σ_{i=1}^{n} X_i [ 1/(σ√n) − 1/(σ√(2n)) ] − Σ_{i=n+1}^{2n} X_i/(σ√(2n)).

The first sum above approaches a Gaussian distribution of variance (1 − 1/√2)² and the
second sum approaches a Gaussian distribution of variance 1/2. Since these two terms are
independent, the difference approaches a Gaussian distribution of variance 1/2 + (1 − 1/√2)² = 2 − √2.
This means that the distribution of the difference does converge to this Gaussian rv as
n → ∞. As n increases, however, this difference slowly changes, and each time n is doubled,
the new difference is only weakly correlated with the old difference.

Note that S_n/n − X̄ behaves in this same way. The CDF converges as n → ∞, but the rv's
themselves do not approach each other in any reasonable way. The point of the problem
was to emphasize this property.
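The limiting variance 2 − √2 ≈ 0.586 can be confirmed by simulation; this sketch (not part of the solution; Gaussian X_i with σ = 1, and the sample sizes, seed, and tolerance are arbitrary choices) estimates the variance of the difference directly.

```python
import math
import random

# Sketch: variance of S_n/(sigma sqrt(n)) - S_2n/(sigma sqrt(2n))
# should be near 1/2 + (1 - 1/sqrt(2))^2 = 2 - sqrt(2) (unit-variance X_i).
random.seed(6)
n, trials = 200, 4000
vals = []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(2 * n)]
    s_n = sum(xs[:n])
    s_2n = s_n + sum(xs[n:])
    vals.append(s_n / math.sqrt(n) - s_2n / math.sqrt(2 * n))
var = sum(v * v for v in vals) / trials - (sum(vals) / trials) ** 2
target = 2 - math.sqrt(2)
assert abs(var - target) < 0.05
```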
Exercise 1.45: Verify (1.57), i.e., verify that lim_{y→∞} yPr{Y ≥ y} = 0 if Y ≥ 0 and Ȳ < ∞. Hint: Show
that yPr{Y ≥ y} ≤ ∫_{z≥y} z dF_Y(z) and show that lim_{y→∞} ∫_{z≥y} z dF_Y(z) = 0.

Solution: We can write Pr{Y ≥ y} as the Stieltjes integral ∫_{y}^{∞} dF_Y(z). Since z ≥ y over
the range of integration,

    yPr{Y ≥ y} = ∫_{y}^{∞} y dF_Y(z) ≤ ∫_{y}^{∞} z dF_Y(z).

We can express Ȳ as the Stieltjes integral

    Ȳ = ∫_{0}^{∞} z dF_Y(z) = lim_{y→∞} ∫_{0}^{y} z dF_Y(z).

Since Ȳ < ∞, the limit above exists, so that lim_{y→∞} ∫_{y}^{∞} z dF_Y(z) = 0. Combining these
results, lim_{y→∞} yPr{Y ≥ y} = 0.

The result can also be proven without using Stieltjes integration. Since Ȳ = lim_{y→∞} ∫_{0}^{y} F^c_Y(z) dz
and Ȳ < ∞, the above limit exists, so

    lim_{y→∞} ∫_{y}^{∞} F^c_Y(z) dz = 0.

For any ε > 0, we can thereby choose y₀ such that

    ∫_{y₀}^{∞} F^c_Y(z) dz ≤ ε/2.                                             (A.21)

For y > y₀, we can express yPr{Y ≥ y} as

    yPr{Y ≥ y} = (y − y₀)Pr{Y ≥ y} + y₀ Pr{Y ≥ y}
               = ∫_{y₀}^{y} Pr{Y ≥ y} dz + y₀ Pr{Y ≥ y}                       (A.22)
               ≤ ∫_{y₀}^{y} F^c_Y(z) dz + y₀ Pr{Y ≥ y}.                       (A.23)

The first term in (A.23) upper bounds the first term in (A.22) since F^c_Y(z) ≥ Pr{Y ≥ y} for
z < y. From (A.21), the first term in (A.23) is upper bounded by ε/2. Since lim_{y→∞} Pr{Y ≥ y} =
0, y₀ Pr{Y ≥ y} ≤ ε/2 for all sufficiently large y. Thus yPr{Y ≥ y} ≤ ε for all sufficiently
large y. Since ε is arbitrary, lim_{y→∞} yPr{Y ≥ y} = 0.
Exercise 1.46: Show that ∏_{m≥n} (1 − 1/m) = 0. Hint: Show that

    (1 − 1/m) = exp( ln(1 − 1/m) ) ≤ exp(−1/m).

Solution: A particularly useful inequality, for x > −1, is ln(1 + x) ≤ x, with equality at
x = 0. This can be trivially derived by maximizing ln(1 + x) − x. This verifies the above
hint. Then

    ∏_{m≥n} (1 − 1/m) ≤ ∏_{m≥n} exp(−1/m) = exp( −Σ_{m≥n} 1/m ) = exp(−∞) = 0,

where we have used the fact that the harmonic series diverges.
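For this particular product the decay can also be seen exactly, since the partial products telescope; this small illustration (not part of the solution; the values of n and M are arbitrary choices) confirms that ∏_{m=n}^{M} (1 − 1/m) = (n−1)/M, which goes to 0 as M grows.

```python
# Illustration: partial products of (1 - 1/m) telescope to (n-1)/M.
n = 2
for M in (10, 100, 1000):
    prod = 1.0
    for m in range(n, M + 1):
        prod *= 1 - 1 / m
    assert abs(prod - (n - 1) / M) < 1e-12
```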
Exercise 1.47: Consider a discrete rv X with the PMF

    p_X(−1)     = (1 − 10⁻¹⁰)/2,
    p_X(1)      = (1 − 10⁻¹⁰)/2,
    p_X(10¹²)   = 10⁻¹⁰.

a) Find the mean and variance of X. Assuming that {X_m; m ≥ 1} is an IID sequence with the distribution
of X and that S_n = X₁ + · · · + X_n for each n, find the mean and variance of S_n. (no explanations needed.)

Solution: X̄ = 100 and σ²_X = 10¹⁴ + (1 − 10⁻¹⁰) − 10⁴ ≈ 10¹⁴. Thus S̄_n = 100n and
σ²_{S_n} ≈ n × 10¹⁴.

b) Let n = 10⁶ and describe the event {S_n ≤ 10⁶} in words. Find an exact expression for Pr{S_n ≤ 10⁶} =
F_{S_n}(10⁶).

Solution: This is the event that all 10⁶ trials result in ±1. That is, there are no occurrences
of 10¹². Thus Pr{S_n ≤ 10⁶} = (1 − 10⁻¹⁰)^{10⁶}.

c) Find a way to use the union bound to get a simple upper bound and approximation of 1 − F_{S_n}(10⁶).

Solution: From the union bound, the probability of one or more occurrences of the sample
value 10¹² out of 10⁶ trials is bounded by a sum of 10⁶ terms, each equal to 10⁻¹⁰, i.e.,
1 − F_{S_n}(10⁶) ≤ 10⁻⁴. This is also a good approximation, since we can write

    (1 − 10⁻¹⁰)^{10⁶} = exp( 10⁶ ln(1 − 10⁻¹⁰) ) ≈ exp(−10⁶ · 10⁻¹⁰) ≈ 1 − 10⁻⁴.

d) Sketch the CDF of S_n for n = 10⁶. You can choose the horizontal axis for your sketch to go from −∞
to +∞ or from −3×10³ to 3×10³ or from −10⁶ to 10⁶ or from 0 to 10¹², whichever you think will best
describe this CDF.

Solution: Conditional on no occurrences of 10¹², S_n simply has a binomial distribution.
We know from the central limit theorem for the binomial case that S_n will be approximately Gaussian with mean 0 and standard deviation 10³. Since one or more occurrences
of 10¹² occur only with probability 10⁻⁴, this can be neglected in the sketch, so the CDF is
approximately Gaussian with 3-sigma points at ±3×10³. There is a little blip, in the 4th
decimal place, out at 10¹², which doesn't show up well in the sketch, but of course could be
important for some purposes such as calculating σ².

    (Sketch: F_{S_n} rises, Gaussian-fashion, from near 0 at −3×10³ to near 1 at +3×10³.)

e) Now let n = 10¹⁰. Give an exact expression for Pr{S_n ≤ 10¹⁰} and show that this can be approximated
by e⁻¹. Sketch the CDF of S_n for n = 10¹⁰, using a horizontal axis going from slightly below 0 to slightly
more than 2×10¹². Hint: First view S_n as conditioned on an appropriate rv.

Solution: First consider the PMF p_B(j) of the number B = j of occurrences of the value
10¹². We have

    p_B(j) = C(10¹⁰, j) p^j (1−p)^{10¹⁰−j}    where p = 10⁻¹⁰

    p_B(0) = (1−p)^{10¹⁰} = exp{ 10¹⁰ ln(1 − p) } ≈ exp(−10¹⁰ p) = e⁻¹

    p_B(1) = 10¹⁰ p (1−p)^{10¹⁰−1} = (1−p)^{10¹⁰−1} ≈ e⁻¹

    p_B(2) = C(10¹⁰, 2) p² (1−p)^{10¹⁰−2} ≈ (1/2) e⁻¹.

Conditional on B = j, S_n will be approximately Gaussian with mean 10¹² j and standard
deviation 10⁵. Thus F_{S_n}(s) rises from 0 to e⁻¹ over a range from about −3×10⁵ to +3×10⁵.
It then stays virtually constant up to about 10¹² − 3×10⁵. It rises to 2/e by 10¹² + 3×10⁵.
It stays virtually constant up to 2×10¹² − 3×10⁵ and rises to 2.5/e by 2×10¹² + 3×10⁵.
When we sketch this, it looks like a staircase function, rising from 0 to 1/e at 0, from 1/e
to 2/e at 10¹² and from 2/e to 2.5/e at 2×10¹². There are smaller steps at larger values,
but they would not show up on the sketch.

f) Can you make a qualitative statement about how the distribution function of a rv X affects the required
size of n before the WLLN and the CLT provide much of an indication about S_n?

Solution: It can be seen that for this peculiar rv, S_n/n is not concentrated around its
mean even for n = 10¹⁰ and S_n/√n does not look Gaussian even for n = 10¹⁰. For this
particular distribution, n has to be so large that B, the number of occurrences of 10¹², is
large, and this requires n >> 10¹⁰. This illustrates a common weakness of limit theorems.
They say what happens as a parameter (n in this case) becomes sufficiently large, but it
takes extra work to see how large that is.
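The Poisson-like staircase of part (e) rests on the values p_B(0) ≈ p_B(1) ≈ e⁻¹; these can be verified exactly (working in logs to avoid overflow) without any simulation. This sketch is an illustration, not part of the solution.

```python
import math

# Check part (e): with n = 10^10 trials and p = 10^-10, the count B of
# occurrences of 10^12 is nearly Poisson(1): p_B(0) and p_B(1) are near 1/e.
n, p = 10**10, 1e-10
log_p0 = n * math.log1p(-p)                         # ln[(1-p)^n]
assert abs(math.exp(log_p0) - math.exp(-1)) < 1e-6
log_p1 = math.log(n) + math.log(p) + (n - 1) * math.log1p(-p)
assert abs(math.exp(log_p1) - math.exp(-1)) < 1e-6
```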
Exercise 1.48: Let {Y_n; n ≥ 1} be a sequence of rv's and assume that lim_{n→∞} E[|Y_n|] = 0. Show that
{Y_n; n ≥ 1} converges to 0 in probability. Hint 1: Look for the easy way. Hint 2: The easy way uses the
Markov inequality.

Solution: Applying the Markov inequality to |Y_n| for arbitrary n and arbitrary ε > 0, we
have

    Pr{ |Y_n| ≥ ε } ≤ E[|Y_n|]/ε.

Thus going to the limit n → ∞ for the given ε,

    lim_{n→∞} Pr{ |Y_n| ≥ ε } = 0.

Since this is true for every ε > 0, this satisfies the definition for convergence to 0 in probability.
A.2    Solutions for Chapter 2
Exercise 2.1: a) Find the Erlang density fSn (t) by convolving fX (x) = exp(
x), x
0 with itself n
times.
Solution: For n = 2, we convolve fX (x) with itself.
fS2 (t) =
Z t
0
fX1 (x)fX2 (t
x) dx =
Z t
e
x
(t x)
e
dx =
2
te
t
.
0
For larger n, convolving fX (x) with itself n times is found by taking the convolution n 1
times, i.e., fSn 1 (t), and convolving this with fX (x). Starting with n = 3,
fS3 (t) =
fS4 (t) =
Z t
0
Z t
0
fS2 (x)fX3 (t
3 x2
2
e
x
x) dx =
Z t
2
x
xe
e
(t x)
dx =
0
· e (t x) dx =
4 t3
3!
e
t
3 t2
2
e
t
.
We now see the pattern; each additional integration increases the power of and t by 1
n n 1
and multiplies the denominator by n 1. Thus we hypothesize that fSn (t) = tn! e t .
If one merely wants to verify the well-known Erlang density, one can simply use induction
from the beginning, but it is more satisfying, and not that much more difficult, to actually
derive the Erlang density, as done above.
b) Find the moment generating function of X (or find the Laplace transform of fX (x)), and use this to find
the moment generating function (or Laplace transform) of Sn = X1 + X2 + · · · + Xn .
Solution: The formula for the MGF is almost trivial here,
Z 1
gX (r) =
e x erx dx =
for r < .
r
0
Since Sn is the sum of n IID rv’s,
⇥
⇤n
gSn (r) = gX (r)
=
✓
r
◆n
.
c) Find the Erlang density by starting with (2.15) and then calculating the marginal density for Sn .
Solution: To find the marginal density, fSn (sn ), we start with the joint density in (2.15)
and integrate over the region of space where s1  s2  · · ·  sn . It is a peculiar integral,
since the integrand is constant and we are just finding the volume of the n 1 dimensional
space in s1 , . . . , sn 1 with the inequality constraints above. For n = 2 and n = 3, we have
Z s2
⇣
⌘
2
s2
2
fS2 (s2 ) =
e
ds1 =
e s2 s2
o
Z s3 Z s2
Z s3
⇣
⌘ s2
3
s3
3
s3
3
fS3 (s3 ) =
e
ds1 ds2 =
e
s2 ds2 =
e s3 3 .
2
0
0
0
45
A.2. SOLUTIONS FOR CHAPTER 2
The critical part of these calculations is the calculation of the volume, and we can do this inductively by guessing from the previous equation that the volume, given s_n, of the n−1 dimensional space where 0 < s_1 < ··· < s_{n−1} < s_n is s_n^{n−1}/(n−1)!. We can check that by

    ∫_0^{s_n} ∫_0^{s_{n−1}} ··· ∫_0^{s_2} ds_1 ··· ds_{n−2} ds_{n−1} = ∫_0^{s_n} [s_{n−1}^{n−2}/(n−2)!] ds_{n−1} = s_n^{n−1}/(n−1)!.
This volume integral, multiplied by λ^n e^{−λs_n}, is then the desired marginal density.
A more elegant and instructive way to calculate this volume is by first observing that the volume of the n−1 dimensional cube, s_n on a side, is s_n^{n−1}. Each point in this cube can be visualized as a vector (s_1, s_2, ..., s_{n−1}). Each component lies in (0, s_n), but the cube doesn't have the ordering constraint s_1 < s_2 < ··· < s_{n−1}. By symmetry, the volume of points in the cube satisfying this ordering constraint is the same as the volume in which the components s_1, ..., s_{n−1} are ordered in any other particular way. There are (n−1)! different ways to order these n−1 components (i.e., there are (n−1)! permutations of the components), and thus the volume satisfying the ordering constraints is s_n^{n−1}/(n−1)!.
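The symmetry argument is easy to test by sampling: the fraction of uniform points in the cube that happen to be in sorted order should be close to 1/(n−1)!. A small sketch (n = 5 and the sample size are arbitrary):

```python
import math, random

random.seed(2)
n, trials = 5, 200_000        # n-1 = 4 dimensional cube, illustrative sizes
hits = 0
for _ in range(trials):
    v = [random.random() for _ in range(n - 1)]
    if all(v[i] <= v[i + 1] for i in range(n - 2)):
        hits += 1             # the sampled point satisfies the ordering constraint
fraction = hits / trials
print(fraction, 1 / math.factorial(n - 1))   # should be close to 1/4! = 1/24
```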
Exercise 2.2: a) Find the mean, variance, and moment generating function of N (t), as given by (2.17).
Solution:

    E[N(t)] = Σ_{n=0}^∞ n(λt)^n e^{−λt}/n! = λt Σ_{n=1}^∞ (λt)^{n−1} e^{−λt}/(n−1)! = λt Σ_{m=0}^∞ (λt)^m e^{−λt}/m! = λt,

where in the last step, we recognized the terms in the sum as the Poisson PMF of parameter λt; since the PMF must sum to 1, the mean is λt as shown. Using the same approach, the second moment is (λt)^2 + λt, so the variance is λt. For the MGF,
    E[e^{rN(t)}] = Σ_{n=0}^∞ (λte^r)^n e^{−λt}/n! = e^{−λt} e^{λte^r} Σ_{n=0}^∞ (λte^r)^n e^{−λte^r}/n!.

Recognizing the final sum as the sum over the PMF of a Poisson rv of parameter λte^r (which must be 1), we get

    E[e^{rN(t)}] = e^{−λt} e^{λte^r} = exp[λt(e^r − 1)].
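All three quantities are easy to confirm by direct summation of the Poisson PMF, truncated far into the tail; λ, t, and r below are arbitrary illustrative values:

```python
import math

lam, t, r = 1.5, 2.0, 0.3
m = lam * t                                  # Poisson parameter λt

def pmf(n):
    return m**n * math.exp(-m) / math.factorial(n)

N = 100                                      # truncation point; the tail is negligible
mean = sum(n * pmf(n) for n in range(N))
var  = sum(n * n * pmf(n) for n in range(N)) - mean**2
mgf  = sum(math.exp(r * n) * pmf(n) for n in range(N))

print(mean, var, mgf, math.exp(m * (math.exp(r) - 1)))
```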
b) Show by discrete convolution that the sum of two independent Poisson rv’s is again Poisson.
Solution: Let X and Y be independent Poisson with parameters λ and µ respectively. Then

    p_{X+Y}(m) = Σ_{n=0}^m p_X(n) p_Y(m−n) = Σ_{n=0}^m [λ^n e^{−λ}/n!] [µ^{m−n} e^{−µ}/(m−n)!]
               = (e^{−λ−µ}/m!) Σ_{n=0}^m (m choose n) λ^n µ^{m−n} = (λ+µ)^m e^{−(λ+µ)}/m!,

where we recognized the final sum as a binomial sum.
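The discrete convolution above can be checked term by term for small arguments (λ and µ below are illustrative values):

```python
import math

lam, mu = 1.2, 0.7

def poisson_pmf(a, n):
    return a**n * math.exp(-a) / math.factorial(n)

for m in range(10):
    conv = sum(poisson_pmf(lam, n) * poisson_pmf(mu, m - n) for n in range(m + 1))
    direct = poisson_pmf(lam + mu, m)       # the claimed Poisson(λ+µ) PMF
    print(m, conv, direct)
```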
c) Show by using the properties of the Poisson process that the sum of two independent Poisson rv’s must
be Poisson.
Solution: For any t, τ > 0 in a Poisson process we know that N(t+τ) = N(t) + Ñ(t, t+τ), where N(t) is Poisson and Ñ(t, t+τ) is Poisson in τ. Since N(t) and Ñ(t, t+τ) are independent and t and τ are arbitrary positive numbers, the sum of 2 independent Poisson rv's is Poisson.
Exercise 2.3: The purpose of this exercise is to give an alternate derivation of the Poisson distribution for N(t), the number of arrivals in a Poisson process up to time t. Let λ be the rate of the process.
a) Find the conditional probability Pr{N(t) = n | S_n = τ} for all τ ≤ t.
Solution: The condition S_n = τ means that the epoch of the nth arrival is τ. Conditional on this, the event {N(t) = n} for some t > τ means there have been no subsequent arrivals from τ to t. In other words, it means that the (n+1)th interarrival time, X_{n+1}, exceeds t−τ. This interarrival time is independent of S_n and thus

    Pr{N(t) = n | S_n = τ} = Pr{X_{n+1} > t−τ} = e^{−λ(t−τ)}    for t > τ.    (A.24)
b) Using the Erlang density for Sn , use (a) to find Pr{N (t) = n}.
Solution: We find Pr{N(t)=n} simply by averaging (A.24) over S_n.

    Pr{N(t)=n} = ∫_0^∞ Pr{N(t)=n | S_n=τ} f_{S_n}(τ) dτ
               = ∫_0^t e^{−λ(t−τ)} [λ^n τ^{n−1} e^{−λτ}/(n−1)!] dτ
               = [λ^n e^{−λt}/(n−1)!] ∫_0^t τ^{n−1} dτ = (λt)^n e^{−λt}/n!.
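This averaging step can be reproduced numerically by integrating the Erlang density against e^{−λ(t−τ)} with Simpson's rule; λ, t, and n below are illustrative:

```python
import math

lam, t, n, steps = 2.0, 1.5, 3, 1000   # steps must be even for Simpson's rule

def integrand(tau):
    """e^{-lam(t-tau)} times the Erlang-n density at tau."""
    erlang = lam**n * tau**(n - 1) * math.exp(-lam * tau) / math.factorial(n - 1)
    return math.exp(-lam * (t - tau)) * erlang

h = t / steps
total = integrand(0.0) + integrand(t)
for i in range(1, steps):
    total += (4 if i % 2 else 2) * integrand(i * h)
prob = total * h / 3                   # composite Simpson approximation

poisson = (lam * t)**n * math.exp(-lam * t) / math.factorial(n)
print(prob, poisson)
```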
Exercise 2.4: Assume that a counting process {N(t); t>0} has the independent and stationary increment properties and satisfies (2.17) (for all t > 0). Let X_1 be the epoch of the first arrival and X_n be the interarrival time between the (n−1)st and the nth arrival. Use only these assumptions in doing the following parts of this exercise.
a) Show that Pr{X_1 > x} = e^{−λx}.
Solution: The event {X_1 > x} is the same as the event {N(x) = 0}. Thus, from (2.17), Pr{X_1 > x} = Pr{N(x) = 0} = e^{−λx}.
b) Let S_{n−1} be the epoch of the (n−1)st arrival. Show that Pr{X_n > x | S_{n−1} = τ} = e^{−λx}.
Solution: The conditioning event {S_{n−1} = τ} is somewhat messy to deal with in terms of the parameters of the counting process, so we start by solving an approximation of the desired result and then go to a limit as the approximation becomes increasingly close. Let δ > 0 be a small positive number which we later allow to approach 0. We replace the event {S_{n−1} = τ} with {τ−δ < S_{n−1} ≤ τ}. Since the occurrence of two arrivals in a small interval of size δ is very unlikely (of order o(δ)), we also include the condition that S_{n−2} ≤ τ−δ and S_n > τ. With this, the approximate conditioning event becomes

    {S_{n−2} ≤ τ−δ < S_{n−1} ≤ τ < S_n} = {N(τ−δ) = n−2, Ñ(τ−δ, τ) = 1}.
Since we are irretrievably deep in approximations, we also replace the event {X_n > x} (conditional on this approximating condition) with {Ñ(τ, τ+x) = 0}. Note that this approximation is exact for δ = 0, since in that case S_{n−1} = τ, so X_n > x means that no arrivals occur in (τ, τ+x).
We can now solve this approximate problem precisely,

    Pr{Ñ(τ, τ+x) = 0 | N(τ−δ) = n−2, Ñ(τ−δ, τ) = 1} = Pr{Ñ(τ, τ+x) = 0} = e^{−λx}.
In the first step, we used the independent increment property and in the second, the stationary increment property along with (a).
In the limit δ → 0, the conditioning event becomes S_{n−1} = τ and the conditioned event becomes X_n > x. The argument is very convincing, and becomes more convincing the more one thinks about it. At the same time, it is somewhat unsatisfactory since both the conditioned and conditioning events are being approximated. One can easily upper and lower bound the probability that X_n > x for each, but the 'proof' then requires many uninsightful and tedious details.
c) For each n > 1, show that Pr{X_n > x} = e^{−λx} and that X_n is independent of S_{n−1}.
Solution: We have seen that Pr{X_n > x | S_{n−1} = τ} = e^{−λx}. Since the value of this probability conditioned on {S_{n−1} = τ} does not depend on τ, X_n must be independent of S_{n−1}.
d) Argue that X_n is independent of X_1, X_2, ..., X_{n−1}.
Solution: Equivalently, we show that X_n is independent of {S_1=s_1, S_2=s_2, ..., S_{n−1}=s_{n−1}} for all choices of 0 < s_1 < s_2 < ··· < s_{n−1}. Using the same artifice as in (b), this latter event is the same as the limit as δ → 0 of the event

    {N(s_1−δ)=0, Ñ(s_1−δ, s_1)=1, Ñ(s_1, s_2−δ)=0, Ñ(s_2−δ, s_2)=1, ..., Ñ(s_{n−1}−δ, s_{n−1})=1}.

From the independent increment property, the above event is then independent of the rv Ñ(s_{n−1}, s_{n−1}+x) for each x > 0. As in (b), this shows that X_n is independent of S_1, ..., S_{n−1} and thus of X_1, ..., X_{n−1}.
The most interesting part of this entire exercise is that the Poisson CDF was used only to
derive the fact that X1 has an exponential CDF. In other words, we have shown quite a bit
more than Definition 2 of a Poisson process. We have shown that if X1 is exponential and
the stationary and independent increment properties hold, then the process is Poisson. On
the other hand, we have shown that a careful derivation of the properties of the Poisson
process from this definition requires a great deal of intricate, uninsightful, and tedious
analysis.
Exercise 2.5: The point of this exercise is to show that the sequence of PMF’s for the counting process
of a Bernoulli process does not specify the process. In other words, knowing that N (t) satisfies the binomial
distribution for all t does not mean that the process is Bernoulli. This helps us understand why the
second definition of a Poisson process requires stationary and independent increments along with the Poisson
distribution for N (t).
a) For a sequence of binary rv's Y_1, Y_2, Y_3, ..., in which each rv is 0 or 1 with equal probability, find a joint distribution for Y_1, Y_2, Y_3 that satisfies the binomial distribution, p_{N(t)}(k) = (t choose k) 2^{−t} for t = 1, 2, 3 and 0 ≤ k ≤ t, but for which Y_1, Y_2, Y_3 are not independent.
Your solution should contain four 3-tuples with probability 1/8 each, two 3-tuples with probability 1/4 each,
and two 3-tuples with probability 0. Note that by making the subsequent arrivals IID and equiprobable, you
have an example where N (t) is binomial for all t but the process is not Bernoulli. Hint: Use the binomial
for t = 3 to find two 3-tuples that must have probability 1/8. Combine this with the binomial for t = 2
to find two other 3-tuples with probability 1/8. Finally look at the constraints imposed by the binomial
distribution on the remaining four 3-tuples.
Solution: The 3-tuples 000 and 111 each have probability 1/8, and are the unique tuples
for which N (3) = 0 and N (3) = 3 respectively. In the same way, N (2) = 0 only for
(Y1 , Y2 ) = (0, 0), so (0,0) has probability 1/4. Since (0, 0, 0) has probability 1/8, it follows
that (0, 0, 1) has probability 1/8. In the same way, looking at N (2) = 2, we see that (1, 1, 0)
has probability 1/8.
The four remaining 3-tuples are (0,1,0), (0,1,1), (1,0,0) and (1,0,1). The constraints imposed by N(1) and N(2) require

    Pr{(0,1,0)} + Pr{(0,1,1)} = 1/4,    Pr{(1,0,0)} + Pr{(1,0,1)} = 1/4,

while those imposed by N(3) require

    Pr{(0,1,0)} + Pr{(1,0,0)} = 1/4,    Pr{(0,1,1)} + Pr{(1,0,1)} = 1/4.

It can be seen by inspection that if (0,1,0) and (1,0,1) each have probability 1/4 (and (0,1,1) and (1,0,0) each have probability 0), then the constraints are satisfied. There is one other solution satisfying the constraints: choose (0,1,1) and (1,0,0) to each have probability 1/4.
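A brute-force check of the first joint PMF above confirms that N(t) is binomial for t = 1, 2, 3 even though Y_1, Y_2, Y_3 are dependent:

```python
from math import comb

# Joint PMF from the solution: four 3-tuples with probability 1/8,
# two with probability 1/4, and two with probability 0
pmf = {(0, 0, 0): 1/8, (0, 0, 1): 1/8, (1, 1, 0): 1/8, (1, 1, 1): 1/8,
       (0, 1, 0): 1/4, (1, 0, 1): 1/4, (0, 1, 1): 0.0, (1, 0, 0): 0.0}

# Check that N(t) = Y1 + ... + Yt is binomial(t, 1/2) for t = 1, 2, 3
for t in range(1, 4):
    for k in range(t + 1):
        p = sum(q for y, q in pmf.items() if sum(y[:t]) == k)
        print(t, k, p, comb(t, k) / 2**t)

# Independence would force every 3-tuple to have probability 1/8,
# but (0,1,1) has probability 0, so the Yi are dependent.
```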
b) Generalize (a) to the case where Y_1, Y_2, Y_3 satisfy Pr{Y_i = 1} = q and Pr{Y_i = 0} = 1−q. Assume q < 1/2 and find a joint distribution on Y_1, Y_2, Y_3 that satisfies the binomial distribution, but for which the 3-tuple (0, 1, 1) has zero probability.
Solution: Arguing as in (a), we see that Pr{(0,0,0)} = (1−q)^3, Pr{(0,0,1)} = (1−q)^2 q, Pr{(1,1,1)} = q^3, and Pr{(1,1,0)} = q^2(1−q). The remaining four 3-tuples are constrained by N(1) and N(2) to satisfy

    Pr{(0,1,0)} + Pr{(0,1,1)} = q(1−q),    Pr{(1,0,0)} + Pr{(1,0,1)} = q(1−q),

and by N(3) to satisfy

    Pr{(0,1,0)} + Pr{(1,0,0)} = 2q(1−q)^2,    Pr{(0,1,1)} + Pr{(1,0,1)} = 2q^2(1−q).

If we set Pr{(0,1,1)} = 0, then Pr{(0,1,0)} = q(1−q), Pr{(1,0,1)} = 2q^2(1−q), and Pr{(1,0,0)} = q(1−q) − 2q^2(1−q) = q(1−q)(1−2q). This satisfies all the binomial constraints.
c) More generally yet, view a joint PMF on binary t-tuples as a nonnegative vector in a 2^t dimensional vector space. Each binomial probability p_{N(τ)}(k) = (τ choose k) q^k (1−q)^{τ−k} constitutes a linear constraint on this vector. For each τ, show that one of these constraints may be replaced by the constraint that the components of the vector sum to 1.
Solution: There are 2^t binary t-tuples and each has a probability, so the joint PMF can be viewed as a vector of 2^t numbers. The binomial probability p_{N(τ)}(k) = (τ choose k) q^k (1−q)^{τ−k} specifies the sum of the probabilities of the t-tuples in the event {N(τ) = k}, and thus is a linear constraint on the joint PMF. Note: Mathematically, a linear constraint specifies that a given weighted sum of components is 0. The type of constraint here, where the weighted sum is a nonzero constant, is more properly called a first-order constraint. Engineers often refer to first-order constraints as linear, and we follow that practice here.
Since Σ_{k=0}^τ (τ choose k) q^k (1−q)^{τ−k} = 1, one of these τ+1 constraints can be replaced by the constraint that the sum of all 2^t components of the PMF is 1.
d) Using (c), show that at most (t+1)t/2 + 1 of the binomial constraints are linearly independent. Note that this means that the linear space of vectors satisfying these binomial constraints has dimension at least 2^t − (t+1)t/2 − 1. This linear space has dimension 1 for t = 3, explaining the results in parts a) and b). It has a rapidly increasing dimension for t > 3, suggesting that the binomial constraints are relatively ineffectual for constraining the joint PMF of a joint distribution. More work is required for the case of t > 3 because of all the inequality constraints, but it turns out that this large dimensionality remains.
Solution: We know that the sum of all the 2^t components of the PMF is 1, and we saw in (c) that for each integer τ, 1 ≤ τ ≤ t, there are τ additional linear constraints on the PMF established by the binomial terms p_{N(τ)}(k) for 0 ≤ k ≤ τ. Since Σ_{τ=1}^t τ = (t+1)t/2, we see that there are at most t(t+1)/2 independent linear constraints on the joint PMF imposed by the binomial terms, in addition to the overall constraint that the components sum to 1. Thus the dimensionality of the space of 2^t-vectors satisfying these linear constraints is at least 2^t − 1 − (t+1)t/2.
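The constraint count can be verified by building the 0/1 constraint matrix (one indicator row per pair (τ, k), plus the all-ones row) and computing its rank with exact rational Gaussian elimination. A sketch (t = 3 and t = 4 are chosen for illustration; for t = 3 the text's dimension-1 observation forces the rank to be exactly 7 = 3·4/2 + 1):

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a rational matrix by Gaussian elimination."""
    rows = [list(map(Fraction, r)) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

for t in (3, 4):
    tuples = list(product((0, 1), repeat=t))
    rows = [[1] * 2**t]                     # components sum to 1
    for tau in range(1, t + 1):
        for k in range(tau + 1):            # indicator of the event {N(tau) = k}
            rows.append([1 if sum(y[:tau]) == k else 0 for y in tuples])
    print(t, rank(rows), t * (t + 1) // 2 + 1)   # rank vs the bound t(t+1)/2 + 1
```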
Exercise 2.6: Let h(x) be a positive function of a real variable that satisfies h(x + t) = h(x) + h(t) and
let h(1) = c.
a) Show that for integer k > 0, h(k) = kc.
Solution: We use induction. We know h(1) = c and the inductive hypothesis is that h(n) =
nc, which is satisfied for n = 1. We then have h(n + 1) = h(n) + h(1) = nc + c = (n + 1)c.
Thus if the hypothesis is satisfied for n it is also satisfied for n + 1, which verifies that it is
satisfied for all positive integer n.
b) Show that for integer j > 0, h(1/j) = c/j.
Solution: Repeatedly adding h(1/j) to itself, we get h(2/j) = h(1/j) + h(1/j) = 2h(1/j),
h(3/j) = h(2/j) + h(1/j) = 3h(1/j) and so forth to h(1) = h(j/j) = jh(1/j). Thus
h(1/j) = c/j.
c) Show that for all positive integers k, j, h(k/j) = ck/j.
Solution: Since h(1/j) = c/j for each positive integer j, we can use induction on positive integers k for any given j > 0 to get h(k/j) = ck/j.
d) The above parts show that h(x) is linear in positive rational numbers. For very picky mathematicians, this does not guarantee that h(x) is linear in positive real numbers. Show that if h(x) is also monotonic in x, then h(x) is linear in x > 0.
Solution: Let x > 0 be a real number and let x_1, x_2, ... be a sequence of increasing rational numbers approaching x. Then lim_{i→∞} h(x_i) = c lim_{i→∞} x_i = cx, and by monotonicity h(x) ≥ h(x_i) for each i, so h(x) ≥ cx. If we look at a similarly decreasing sequence, we see that h(x) ≤ cx, so h(x) = cx.
Exercise 2.7: Assume that a counting process {N(t); t>0} has the independent and stationary increment properties and, for all t > 0, satisfies

    Pr{Ñ(t, t+δ) = 0} = 1 − λδ + o(δ)
    Pr{Ñ(t, t+δ) = 1} = λδ + o(δ)                    (A.25)
    Pr{Ñ(t, t+δ) > 1} = o(δ).

a) Let F^c_1(τ) = Pr{N(τ) = 0} and show that dF^c_1(τ)/dτ = −λF^c_1(τ).
Solution: Note that F^c_1 is the complementary CDF of X_1. Using the fundamental definition of a derivative,

    dF^c_1(τ)/dτ = lim_{δ→0} [F^c_1(τ+δ) − F^c_1(τ)]/δ
                 = lim_{δ→0} [Pr{N(τ+δ) = 0} − Pr{N(τ) = 0}]/δ
                 = lim_{δ→0} Pr{N(τ) = 0} [Pr{Ñ(τ, τ+δ) = 0} − 1]/δ        (A.26)
                 = lim_{δ→0} Pr{N(τ) = 0} [−λδ + o(δ)]/δ = −λF^c_1(τ),     (A.27)

where (A.26) resulted from the independent increment property and (A.27) resulted from (A.25).
b) Show that X_1, the time of the first arrival, is exponential with parameter λ.
Solution: The complementary CDF of X_1 is F^c_1(τ), which satisfies the first order linear differential equation in (a). The solution to that equation, with the boundary point F^c_1(0) = 1, is e^{−λτ} for τ > 0, showing that X_1 is exponential.
c) Let F^c_n(τ) = Pr{Ñ(t, t+τ) = 0 | S_{n−1} = t} and show that dF^c_n(τ)/dτ = −λF^c_n(τ).
Solution: By the independent increment property, {Ñ(t, t+τ) = 0} is independent of {S_{n−1} = t}, and by the stationary increment property, it has the same probability as N(τ) = 0. Thus dF^c_n(τ)/dτ = −λF^c_n(τ) follows by the argument in (a).
d) Argue that X_n is exponential with parameter λ and independent of earlier arrival times.
Solution: Note that F^c_n(τ) is the complementary CDF of X_n conditional on {S_{n−1} = t}, and as shown in (c), its distribution does not depend on S_{n−1}. In other words, F^c_n(τ) as found in (c) is the complementary CDF of X_n. It is exponential by the argument given in (b) for X_1, and it is independent of earlier arrivals since {Ñ(t, t+τ) = 0} is independent of arrivals before t.
This was also shown in the solution to Exercise 2.4 (c) and (d). In other words, definitions 2 and 3 of a Poisson process both follow from the assumptions that X_1 is exponential and that the stationary and independent increment properties hold.
Exercise 2.8: For a Poisson process, let t > 0 be arbitrary and let Z_1 be the duration of the interval from t until the next arrival after t. Let Z_m, for each m > 1, be the interarrival time from the epoch of the (m−1)st arrival after t until the mth arrival after t.
a) Given that N(t) = n, explain why Z_1 = X_{n+1} − t + S_n and, for each m > 1, Z_m = X_{m+n}.
Solution: Given N(t) = n, the mth arrival after t for m ≥ 1 must be the (n+m)th arrival overall. Its arrival epoch is then

    S_{n+m} = S_{n+m−1} + X_{n+m}.    (A.28)

By definition for m > 1, Z_m is the interval from S_{n+m−1} (the time of the (m−1)st arrival after t) to S_{n+m}. Thus, from (A.28), Z_m = X_{n+m} for m > 1. For m = 1, Z_1 is the interval from t until the next arrival, i.e., Z_1 = S_{n+1} − t. Using m = 1 in (A.28), Z_1 = S_n + X_{n+1} − t.
b) Conditional on N(t) = n and S_n = τ, show that Z_1, Z_2, ... are IID.
Solution: The condition N(t) = n and S_n = τ implies that X_{n+1} > t−τ. Given this condition, X_{n+1} − (t−τ) is exponential by the memoryless property, so Z_1 = X_{n+1} − (t−τ) is exponential. For m > 1, and for the given condition, Z_m = X_{n+m}. Since X_{n+m} is exponential and independent of X_1, ..., X_{n+m−1}, we see that Z_m is also exponential and independent of Z_1, ..., Z_{m−1}. Since these exponential distributions are the same, Z_1, Z_2, ... are IID conditional on N(t) = n and S_n = τ.
c) Show that Z_1, Z_2, ... are IID.
Solution: We have shown that {Z_m; m ≥ 1} are IID and exponential conditional on N(t) = n and S_n = τ. The joint distribution of Z_1, ..., Z_m is thus specified as a function of N(t) = n and S_n = τ. Since this function is constant in n and τ, the joint conditional distribution must be the same as the joint unconditional distribution, and therefore Z_1, ..., Z_m are IID for all m > 0.
Exercise 2.9: Consider a "shrinking Bernoulli" approximation N(mδ) = Y_1 + ··· + Y_m to a Poisson process as described in Subsection 2.2.5.
a) Show that

    Pr{N(mδ) = n} = (m choose n) (λδ)^n (1−λδ)^{m−n}.

Solution: This is just the binomial PMF in (1.23).
b) Let t = mδ, and let t be fixed for the remainder of the exercise. Explain why

    lim_{δ→0} Pr{N(t) = n} = lim_{m→∞} (m choose n) (λt/m)^n (1 − λt/m)^{m−n},

where the limit on the left is taken over values of δ that divide t.
Solution: This is the binomial PMF in (a) with δ = t/m.
c) Derive the following two equalities:

    lim_{m→∞} (m choose n) (1/m^n) = 1/n!;    and    lim_{m→∞} (1 − λt/m)^{m−n} = e^{−λt}.
Solution: Note that

    (m choose n) = m!/(n!(m−n)!) = (1/n!) ∏_{i=0}^{n−1} (m−i).

When this is divided by m^n, each term in the product above is divided by m, so

    (m choose n)(1/m^n) = (1/n!) ∏_{i=0}^{n−1} (m−i)/m = (1/n!) ∏_{i=0}^{n−1} (1 − i/m).    (A.29)

Taking the limit as m → ∞, each of the n terms in the product approaches 1, so the limit is 1/n!, verifying the first equality in (c). For the second,
    (1 − λt/m)^{m−n} = exp[(m−n) ln(1 − λt/m)] = exp[(m−n)(−λt/m + o(1/m))]
                     = exp[−λt + nλt/m + (m−n)o(1/m)].

In the second equality, we expanded ln(1−x) = −x − x^2/2 − ···. In the limit m → ∞, the final expression is exp(−λt), as was to be shown.
If one wishes to see how the limit in (A.29) is approached, we have

    (1/n!) ∏_{i=0}^{n−1} (1 − i/m) = (1/n!) exp( Σ_{i=1}^{n−1} ln(1 − i/m) ) = (1/n!) exp( −n(n−1)/(2m) + o(1/m) ).
d) Conclude from this that for every t and every n, lim_{δ→0} Pr{N(t)=n} = Pr{N(t)=n} where {N(t); t > 0} is a Poisson process of rate λ.
Solution: We simply substitute the results of (c) into the expression in (b), getting

    lim_{δ→0} Pr{N(t) = n} = (λt)^n e^{−λt}/n!.
This shows that the Poisson PMF is the limit of shrinking Bernoulli PMF's, but recall from Exercise 2.5 that this is not quite enough to show that a Poisson process is the limit of shrinking Bernoulli processes. It is also necessary to show that the stationary and independent increment properties hold in the limit δ → 0. It can be seen that the Bernoulli process has these properties at each increment δ, and it is intuitively clear that these properties should hold in the limit, but it seems that carrying out all the analytical details to show this precisely is neither warranted nor interesting.
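The rate of convergence in (d) is easy to see numerically; with λt = 1 and n = 2 (arbitrary choices), the binomial PMF approaches the Poisson PMF roughly like 1/m:

```python
import math

def binom_pmf(m, n, p):
    return math.comb(m, n) * p**n * (1 - p)**(m - n)

lt, n = 1.0, 2                         # lambda*t and the count n, illustrative
poisson = lt**n * math.exp(-lt) / math.factorial(n)
for m in (10, 100, 1000, 10000):       # delta = t/m shrinks as m grows
    b = binom_pmf(m, n, lt / m)
    print(m, b, abs(b - poisson))      # the error shrinks with m
```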
Exercise 2.10: Let {N(t); t > 0} be a Poisson process of rate λ.
a) Find the joint probability mass function (PMF) of N(t), N(t+s) for s > 0.
Solution: Note that N(t+s) is the number of arrivals in (0, t] plus the number in (t, t+s]. In order to find the joint distribution of N(t) and N(t+s), it makes sense to express N(t+s) as N(t) + Ñ(t, t+s) and to use the independent increment property to see that Ñ(t, t+s) is independent of N(t). Thus for m ≥ n,

    p_{N(t)N(t+s)}(n, m) = Pr{N(t)=n} Pr{Ñ(t, t+s)=m−n}
                         = [(λt)^n e^{−λt}/n!] × [(λs)^{m−n} e^{−λs}/(m−n)!],

where we have used the stationary increment property to see that Ñ(t, t+s) has the same distribution as N(s). This solution can be rearranged in various ways, of which the most interesting is

    p_{N(t)N(t+s)}(n, m) = [(λ(t+s))^m e^{−λ(t+s)}/m!] × (m choose n) (t/(t+s))^n (s/(t+s))^{m−n},

where the first term is p_{N(t+s)}(m) (the probability of m arrivals in (0, t+s]) and the second, conditional on the first, is the binomial probability that n of those m arrivals occur in (0, t].
b) Find E[N(t) · N(t+s)] for s > 0.
Solution: Again expressing N(t+s) = N(t) + Ñ(t, t+s),

    E[N(t) · N(t+s)] = E[N^2(t)] + E[N(t)Ñ(t, t+s)]
                     = E[N^2(t)] + E[N(t)] E[N(s)]
                     = λt + λ^2 t^2 + λt · λs.

In the final step, we have used the fact (from Table 1.2 or a simple calculation) that the mean of a Poisson rv with PMF (λt)^n exp(−λt)/n! is λt and the variance is also λt (thus the second moment is λt + (λt)^2). This mean and variance was also derived in Exercise 2.2 and can also be calculated by looking at the limit of shrinking Bernoulli processes.
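A short simulation of a rate-λ process, generated from exponential interarrivals, confirms the correlation formula; λ, t, s, and the number of trials below are arbitrary:

```python
import random

random.seed(3)
lam, t, s, trials = 1.0, 2.0, 1.0, 100_000   # illustrative values

acc = 0.0
for _ in range(trials):
    clock, n_t, n_ts = 0.0, 0, 0
    while True:
        clock += random.expovariate(lam)     # next arrival epoch
        if clock > t + s:
            break
        n_ts += 1                            # arrival counted in N(t+s)
        if clock <= t:
            n_t += 1                         # also counted in N(t)
    acc += n_t * n_ts

empirical = acc / trials
theory = lam*t + (lam*t)**2 + (lam*t)*(lam*s)   # λt + λ²t² + λ²ts = 8 here
print(empirical, theory)
```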
c) Find E[Ñ(t_1, t_3) · Ñ(t_2, t_4)] where Ñ(t, τ) is the number of arrivals in (t, τ] and t_1 < t_2 < t_3 < t_4.
Solution: This is a straightforward generalization of what was done in (b). We break up Ñ(t_1, t_3) as Ñ(t_1, t_2) + Ñ(t_2, t_3) and break up Ñ(t_2, t_4) as Ñ(t_2, t_3) + Ñ(t_3, t_4). The interval (t_2, t_3] is shared. Thus

    E[Ñ(t_1, t_3)Ñ(t_2, t_4)] = E[Ñ(t_1, t_2)Ñ(t_2, t_4)] + E[Ñ^2(t_2, t_3)] + E[Ñ(t_2, t_3)Ñ(t_3, t_4)]
                              = λ^2(t_2−t_1)(t_4−t_2) + λ^2(t_3−t_2)^2 + λ(t_3−t_2) + λ^2(t_3−t_2)(t_4−t_3)
                              = λ^2(t_3−t_1)(t_4−t_2) + λ(t_3−t_2).
Exercise 2.11: An elementary experiment is independently performed N times where N is a Poisson rv of mean λ. Let {a_1, a_2, ..., a_K} be the set of sample points of the elementary experiment and let p_k, 1 ≤ k ≤ K, denote the probability of a_k.
a) Let N_k denote the number of elementary experiments performed for which the output is a_k. Find the PMF for N_k (1 ≤ k ≤ K). (Hint: no calculation is necessary.)
Solution: View the experiment as a combination of K Poisson processes where the kth has rate λp_k and the combined process has rate λ. At t = 1, the total number of experiments is then Poisson with mean λ and the kth process is Poisson with mean λp_k. Thus p_{N_k}(n) = (λp_k)^n e^{−λp_k}/n!.
b) Find the PMF for N_1 + N_2.
Solution: By the same argument,

    p_{N_1+N_2}(n) = [λ(p_1+p_2)]^n e^{−λ(p_1+p_2)}/n!.
c) Find the conditional PMF for N_1 given that N = n.
Solution: Each of the n combined arrivals over (0, 1] is then a_1 with probability p_1. Thus N_1 is binomial given that N = n,

    p_{N_1|N}(n_1|n) = (n choose n_1) p_1^{n_1} (1−p_1)^{n−n_1}.
d) Find the conditional PMF for N_1 + N_2 given that N = n.
Solution: Let the sample value of N_1 + N_2 be n_{12}. By the same argument as in (c),

    p_{N_1+N_2|N}(n_{12}|n) = (n choose n_{12}) (p_1+p_2)^{n_{12}} (1−p_1−p_2)^{n−n_{12}}.
e) Find the conditional PMF for N given that N_1 = n_1.
Solution: Since N is then n_1 plus the number of arrivals from the other processes, and those additional arrivals are Poisson with mean λ(1−p_1),

    p_{N|N_1}(n|n_1) = [λ(1−p_1)]^{n−n_1} e^{−λ(1−p_1)}/(n−n_1)!.
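The splitting identity behind (a), namely Σ_n Poisson(λ; n) · Binomial(n, p_k; j) = Poisson(λp_k; j), can be confirmed numerically; λ and p below are illustrative values:

```python
import math

lam, p = 2.5, 0.3

def poisson(mean, n):
    return mean**n * math.exp(-mean) / math.factorial(n)

def binom(n, k, q):
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

for j in range(6):
    # Condition on the total N = n, thin each experiment with probability p,
    # then average over n (truncated where the tail is negligible)
    split = sum(poisson(lam, n) * binom(n, j, p) for n in range(j, 80))
    print(j, split, poisson(lam * p, j))
```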
Exercise 2.12: Starting from time 0, northbound buses arrive at 77 Mass. Avenue according to a Poisson process of rate λ. Customers arrive according to an independent Poisson process of rate µ. When a bus arrives, all waiting customers instantly enter the bus and subsequent customers wait for the next bus.
a) Find the PMF for the number of customers entering a bus (more specifically, for any given m, find the PMF for the number of customers entering the mth bus).
Solution: Since the customer arrival process and the bus arrival process are independent Poisson processes, the sum of the two counting processes is a Poisson counting process of rate λ+µ. Each arrival for the combined process is a bus with probability λ/(λ+µ) and a customer with probability µ/(λ+µ). The sequence of choices between bus or customer arrivals is an IID sequence. Thus, starting immediately after bus m−1 (or at time 0 for m = 1), the probability of n customers in a row followed by a bus, for any n ≥ 0, is [µ/(λ+µ)]^n λ/(λ+µ). This is the probability that n customers enter the mth bus, i.e., defining N_m as the number of customers entering the mth bus, the PMF of N_m is

    p_{N_m}(n) = [µ/(λ+µ)]^n [λ/(λ+µ)].    (A.30)
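Simulating the IID bus/customer choices of the merged process directly (rates below are arbitrary) reproduces this geometric PMF:

```python
import random

random.seed(4)
lam, mu, arrivals = 1.0, 2.0, 500_000   # bus rate, customer rate (illustrative)

counts = []                             # customers entering each bus
waiting = 0
for _ in range(arrivals):               # classify each combined arrival
    if random.random() < lam / (lam + mu):
        counts.append(waiting)          # a bus arrives and picks up the queue
        waiting = 0
    else:
        waiting += 1                    # a customer joins the queue

p0 = sum(1 for c in counts if c == 0) / len(counts)
mean = sum(counts) / len(counts)
print(p0, lam / (lam + mu))             # P{N_m = 0} should be λ/(λ+µ) = 1/3
print(mean, mu / lam)                   # geometric mean should be µ/λ = 2
```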
b) Find the PMF for the number of customers entering the mth bus given that the interarrival interval between bus m−1 and bus m is x.
Solution: For any given interval of size x (i.e., for the interval (s, s+x] for any given s), the number of customer arrivals in that interval has a Poisson distribution of mean µx. Since the customer arrival process is independent of the bus arrivals, this is also the distribution of customer arrivals between the arrival of bus m−1 and that of bus m given that the interval X_m between these bus arrivals is x. Thus, letting X_m be the interval between the arrivals of bus m−1 and m,

    p_{N_m|X_m}(n|x) = (µx)^n e^{−µx}/n!.
c) Given that a bus arrives at time 10:30 PM, find the PMF for the number of customers entering the next bus.
Solution: First assume that for some given m, bus m−1 arrives at 10:30. The number of customers entering bus m is still determined by the argument in (a) and has the PMF in (A.30). In other words, N_m is independent of the arrival time of bus m−1. From the formula in (A.30), the PMF of the number entering a bus is also independent of m. Thus the desired PMF is that on the right side of (A.30).
d) Given that a bus arrives at 10:30 PM and no bus arrives between 10:30 and 11, find the PMF for the number of customers on the next bus.
Solution: Using the same reasoning as in (b), the number of customer arrivals from 10:30 to 11 is a Poisson rv, say N′, with PMF p_{N′}(n) = (µ/2)^n e^{−µ/2}/n! (we are measuring time in hours, so that µ is the customer arrival rate in arrivals per hour). Since this is independent of bus arrivals, it is also the PMF of customer arrivals in (10:30 to 11] given no bus arrival in that interval.
The number of customers to enter the next bus is N′ plus the number of customers N″ arriving between 11 and the next bus arrival. By the argument in (a), N″ has the PMF in (A.30). Since N′ and N″ are independent, the PMF of N′ + N″ (the number entering the next bus given this conditioning) is the convolution of the PMF's of N′ and N″, i.e.,

    p_{N′+N″}(n) = Σ_{k=0}^n [µ/(λ+µ)]^k [λ/(λ+µ)] · (µ/2)^{n−k} e^{−µ/2}/(n−k)!.

This does not simplify in any nice way.
e) Find the PMF for the number of customers waiting at some given time, say 2:30 PM (assume that the processes started infinitely far in the past). Hint: think of what happens moving backward in time from 2:30 PM.
Solution: Let {Z_i; −∞ < i < ∞} be the (doubly infinite) IID sequence of bus/customer choices where Z_i = 0 if the ith combined arrival is a bus and Z_i = 1 if it is a customer. Indexing this sequence so that −1 is the index of the most recent combined arrival before 2:30, we see that if Z_{−1} = 0, then no customers are waiting at 2:30. If Z_{−1} = 1 and Z_{−2} = 0, then one customer is waiting. In general, if Z_{−m} = 1 for 1 ≤ m ≤ n and Z_{−(n+1)} = 0, then n customers are waiting. Since the Z_i are IID, the PMF of the number N_past waiting at 2:30 is

    p_{N_past}(n) = [µ/(λ+µ)]^n [λ/(λ+µ)].

This is intuitive in one way, i.e., the number of customers looking back toward the previous bus should be the same as the number of customers looking forward to the next bus since the bus/customer choices are IID. It is paradoxical in another way since if we visualize a sample path of the process, we see waiting customers gradually increasing until a bus arrival, then going to 0 and gradually increasing again, etc. It is then surprising that the number of customers at an arbitrary time is statistically the same as the number immediately before a bus arrival. This paradox is partly explained at the end of (f) and fully explained in Chapter 5.
Mathematically inclined readers may also be concerned about the notion of 'starting infinitely far in the past.' A more precise way of looking at this is to start the Poisson process at time 0 (in accordance with the definition of a Poisson process). We can then find the PMF of the number waiting at time t and take the limit of this PMF as t → ∞. For very large t, the number M of combined arrivals before t is large with high probability. Given M = m, the geometric distribution above is truncated at m, which is a negligible correction for t large. This type of issue is handled more cleanly in Chapter 5.
f) Find the PMF for the number of customers getting on the next bus to arrive after 2:30. Hint: this is different from (a); look carefully at (e).
Solution: The number getting on the next bus after 2:30 is the sum of the number N_p waiting at 2:30 and the number of future customer arrivals N_f (found in (c)) until the next bus after 2:30. Note that N_p and N_f are IID. Convolving these PMF's, we get

    p_{N_p+N_f}(n) = Σ_{m=0}^n [µ/(λ+µ)]^m [λ/(λ+µ)] [µ/(λ+µ)]^{n−m} [λ/(λ+µ)]
                   = (n+1) [µ/(λ+µ)]^n [λ/(λ+µ)]^2.

This is very surprising. It says that the number of people getting on the first bus after 2:30 is the sum of two IID rv's, each with the same distribution as the number to get on the mth bus. This is an example of the 'paradox of residual life,' which we discuss very informally here and then discuss carefully in Chapter 5.
Consider a very large interval of time (0, t_o] over which a large number of bus arrivals occur. Then choose a random time instant T, uniformly distributed in (0, t_o]. Note that T is more likely to occur within one of the larger bus interarrival intervals than within one of the smaller intervals, and thus, given the randomly chosen time instant T, the bus interarrival interval around that instant will tend to be larger than that from a given bus arrival, m−1 say, to the next bus arrival m. Since 2:30 is arbitrary, it is plausible that the interval around 2:30 behaves like that around T, making the result here also plausible.
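The residual-life effect shows up clearly in simulation: sampling the number boarding the first bus after a fixed time t gives a mean of about 2µ/λ, twice the mean of (A.30). A sketch (the rates, the time t, and the number of trials are arbitrary; t is chosen large enough that the start-up transient is negligible):

```python
import random

random.seed(5)
lam, mu, t, trials = 1.0, 2.0, 20.0, 20_000   # bus rate, customer rate, fixed time

totals = []
for _ in range(trials):
    clock, waiting = 0.0, 0
    boarded = None
    while boarded is None:
        clock += random.expovariate(lam + mu)        # next combined arrival
        is_bus = random.random() < lam / (lam + mu)
        if clock <= t:
            waiting = 0 if is_bus else waiting + 1   # queue empties at each bus
        elif is_bus:
            boarded = waiting                        # first bus after time t
        else:
            waiting += 1                             # customers keep arriving
    totals.append(boarded)

print(sum(totals) / trials, 2 * mu / lam)            # mean should be near 2µ/λ = 4
```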
g) Given that I arrive to wait for a bus at 2:30 PM, find the PMF for the number of customers getting on the next bus.
Solution: My arrival at 2:30 is in addition to the Poisson process of customers, and thus the number entering the next bus is 1 + N_p + N_f. This has the sample value n if N_p + N_f has the sample value n−1, so from (f),

    p_{1+N_p+N_f}(n) = n [µ/(λ+µ)]^{n−1} [λ/(λ+µ)]^2.
Do not be discouraged if you made a number of errors in this exercise and if it still looks
very strange. This is a first exposure to a difficult set of issues which will become clear in
Chapter 5.
Exercise 2.13: a) Show that the arrival epochs of a Poisson process satisfy

    f_{S^{(n)}|S_{n+1}}(s^{(n)}|s_{n+1}) = n!/s_{n+1}^n.

Hint: This is easy if you use only the results of Section 2.2.2.
Solution: Note that S^{(n)} is shorthand for S_1, ..., S_n. Using Bayes' law,

    f_{S^{(n)}|S_{n+1}}(s^{(n)}|s_{n+1}) = f_{S^{(n)}}(s^{(n)}) f_{S_{n+1}|S^{(n)}}(s_{n+1}|s^{(n)}) / f_{S_{n+1}}(s_{n+1}).

From (2.15), f_{S^{(n)}}(s^{(n)}) = λ^n e^{−λs_n}. Also, f_{S_{n+1}|S^{(n)}}(s_{n+1}|s^{(n)}) = λe^{−λ(s_{n+1}−s_n)}. Finally, f_{S_{n+1}}(s_{n+1}) is Erlang, λ^{n+1} s_{n+1}^n e^{−λs_{n+1}}/n!. Combining these terms,

    f_{S^{(n)}|S_{n+1}}(s^{(n)}|s_{n+1}) = n!/s_{n+1}^n.
b) Contrast this with the result of Theorem 2.5.1.
Solution: (a) says that S_1, ..., S_n are uniformly distributed (subject to the ordering constraint) between 0 and t for any sample value t for S_{n+1}. Theorem 2.5.1 says that they are uniformly distributed between 0 and t given that N(t) = n. The conditions {S_{n+1} = t} and {N(t) = n} each imply that there are n arrivals in (0, t). The condition {S_{n+1} = t} in addition specifies that arrival n+1 is at epoch t, whereas {N(t) = n} specifies that arrival n+1 is in (t, ∞). From the independent increment property, the arrival epochs in (0, t) are independent of those in [t, ∞), and thus the conditional joint distribution of S^{(n)} is the same for each conditioning event.
One might ask whether this equivalence of conditional distributions provides a rigorous way
of answering (a). The answer is yes if the above argument is spelled out in more detail. It is
simpler, however, to use the approach in (a). The equivalence approach is more insightful,
on the other hand, so it is worthwhile to understand both approaches.
Exercise 2.14: Equation (2.42) gives f_{S_i|N(t)}(s_i | n), which is the density of random variable S_i conditional on N(t) = n for n ≥ i. Multiply this expression by Pr{N(t) = n} and sum over n to find f_{S_i}(s_i); verify that your answer is indeed the Erlang density.
Solution: It is almost magical, but of course it has to work out.

f_{S_i|N(t)}(s_i|n) = \frac{s_i^{i-1}(t-s_i)^{n-i}}{t^n}\,\frac{n!}{(i-1)!\,(n-i)!}; \qquad p_{N(t)}(n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}.
\sum_{n=i}^{\infty} f(s_i|n)\, p(n) = \frac{s_i^{i-1}}{(i-1)!} \sum_{n=i}^{\infty} \frac{(t-s_i)^{n-i}}{(n-i)!}\, \lambda^n e^{-\lambda t}

 = \frac{\lambda^i s_i^{i-1} e^{-\lambda s_i}}{(i-1)!} \sum_{n=i}^{\infty} \frac{\lambda^{n-i}(t-s_i)^{n-i}\, e^{-\lambda(t-s_i)}}{(n-i)!}

 = \frac{\lambda^i s_i^{i-1} e^{-\lambda s_i}}{(i-1)!}.
This is the Erlang distribution; it follows because the preceding sum is the sum of the terms
in the PMF of a Poisson rv of mean λ(t−s_i).
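As a hedged numerical sketch (not part of the original solution), the mixing identity can be checked by Monte Carlo: sample N(t), place the arrivals as order statistics of uniforms on (0, t) per Theorem 2.5.1, and compare the resulting distribution of S_i with the Erlang CDF. The parameters lam, t, i and the test point x are arbitrary choices for the check.

```python
import math
import random

random.seed(1)
lam, t, i = 1.3, 4.0, 3      # assumed parameters for the check
x = 2.0                      # test point, must satisfy x <= t
trials = 200_000

def erlang_cdf(x, k, rate):
    # Pr{Erlang(k, rate) <= x} = 1 - sum_{j<k} (rate*x)^j e^{-rate*x} / j!
    return 1.0 - sum((rate * x) ** j * math.exp(-rate * x) / math.factorial(j)
                     for j in range(k))

hits = 0
for _ in range(trials):
    # sample N(t) by counting arrivals of a rate-lam Poisson process in (0, t]
    n, s = 0, random.expovariate(lam)
    while s <= t:
        n += 1
        s += random.expovariate(lam)
    if n >= i:
        # given N(t) = n, S_i is the i-th order statistic of n uniforms on (0, t)
        order = sorted(random.uniform(0, t) for _ in range(n))
        if order[i - 1] <= x:
            hits += 1

# mixing the conditional law over the Poisson PMF should reproduce the Erlang CDF
print(abs(hits / trials - erlang_cdf(x, i, lam)) < 0.01)
```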
Exercise 2.15: Consider generalizing the bulk arrival process in Figure 2.5. Assume that the epochs at which arrivals occur form a Poisson process {N(t); t > 0} of rate λ. At each arrival epoch S_n, the number of arrivals Z_n satisfies Pr{Z_n = 1} = p, Pr{Z_n = 2} = 1−p. The variables Z_n are IID.

a) Let {N_1(t); t > 0} be the counting process of the epochs at which single arrivals occur. Find the PMF of N_1(t) as a function of t. Similarly, let {N_2(t); t > 0} be the counting process of the epochs at which double arrivals occur. Find the PMF of N_2(t) as a function of t.
Solution: Since the process of arrival epochs is Poisson, and these epochs are split into single and double-arrival epochs by an IID splitting, the process of single-arrival epochs is Poisson with rate λp and the process of double-arrival epochs is Poisson with rate λ(1−p). Thus, letting q = 1−p,

p_{N_1(t)}(n) = \frac{(\lambda p t)^n e^{-\lambda p t}}{n!}; \qquad p_{N_2(t)}(m) = \frac{(\lambda q t)^m e^{-\lambda q t}}{m!}.

b) Let {N_B(t); t > 0} be the counting process of the total number of arrivals. Give an expression for the PMF of N_B(t) as a function of t.
Solution: Since there are two arrivals at each double-arrival epoch, we have NB (t) =
N1 (t) + 2N2 (t), and as seen above N1 (t) and N2 (t) are independent. This can be done as
a digital convolution of the PMF’s for N1 (t) and 2N2 (t), but this can be slightly confusing
since 2N2 (t) is nonzero only for even integers. Thus we revert to the general approach,
which also reminds you where convolution comes from and thus how it can be done in
general (PDF’s, PMF’s, etc.).

Pr{N_B(t) = n, N_2(t) = m} = p_{N_1(t)}(n-2m)\, p_{N_2(t)}(m).
The marginal PMF for N_B(t) is then given by

p_{N_B(t)}(n) = \sum_{m=0}^{\lfloor n/2 \rfloor} p_{N_1(t)}(n-2m)\, p_{N_2(t)}(m)

 = \sum_{m=0}^{\lfloor n/2 \rfloor} \frac{(\lambda p t)^{n-2m} e^{-\lambda p t}}{(n-2m)!}\, \frac{(\lambda q t)^{m} e^{-\lambda q t}}{m!}

 = e^{-\lambda t} \sum_{m=0}^{\lfloor n/2 \rfloor} \frac{(\lambda p t)^{n-2m} (\lambda q t)^{m}}{(n-2m)!\, m!}.
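A quick Monte Carlo cross-check of this PMF (not part of the original solution; the rate, splitting probability, and time below are arbitrary test values):

```python
import math
import random

random.seed(2)
lam, p, t = 2.0, 0.6, 1.5    # assumed rate, single-arrival probability, time
q = 1 - p
trials = 100_000

def p_NB(n):
    # closed-form PMF derived above
    return math.exp(-lam * t) * sum(
        (lam * p * t) ** (n - 2 * m) * (lam * q * t) ** m
        / (math.factorial(n - 2 * m) * math.factorial(m))
        for m in range(n // 2 + 1))

counts = {}
for _ in range(trials):
    nb, s = 0, random.expovariate(lam)
    while s <= t:
        nb += 1 if random.random() < p else 2   # single or double arrival
        s += random.expovariate(lam)
    counts[nb] = counts.get(nb, 0) + 1

ok = all(abs(counts.get(n, 0) / trials - p_NB(n)) < 0.01 for n in range(8))
print(ok)
```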
Exercise 2.16: a) For a Poisson counting process of rate λ, find the joint probability density of S_1, S_2, . . . , S_{n−1} conditional on S_n = t.
Solution: The joint density of S_1, . . . , S_n is given in (2.15) as f_{S_1,...,S_n}(s_1, . . . , s_n) = \lambda^n \exp(-\lambda s_n). The marginal density of S_n is the Erlang density. The conditional density is the ratio of these, i.e.,

f_{S_1,...,S_{n-1}|S_n}(s_1, . . . , s_{n-1}|s_n) = \frac{\lambda^n \exp(-\lambda s_n)}{\lambda^n s_n^{n-1} e^{-\lambda s_n}/(n-1)!} = \frac{(n-1)!}{s_n^{n-1}}.
b) Find Pr{X_1 > τ | S_n = t}.
Solution: We first use Bayes’ law to find the density, f_{X_1|S_n}(τ|t).

f_{X_1|S_n}(\tau|t) = \frac{f_{X_1}(\tau)\, f_{S_n|X_1}(t|\tau)}{f_{S_n}(t)} = \frac{\lambda e^{-\lambda\tau}\, \lambda^{n-1}(t-\tau)^{n-2} e^{-\lambda(t-\tau)}/(n-2)!}{\lambda^n t^{n-1} e^{-\lambda t}/(n-1)!} = \frac{(n-1)(t-\tau)^{n-2}}{t^{n-1}} \quad \text{for } \tau < t.

Integrating this with respect to τ, we get

Pr{X_1 > \tau | S_n = t} = \left[\frac{t-\tau}{t}\right]^{n-1}.
c) Find Pr{X_i > τ | S_n = t} for 1 ≤ i ≤ n.
Solution: The condition here is X_1 + · · · + X_n = t. Since X_1, . . . , X_n are IID without the condition, and the condition is symmetric in X_1, . . . , X_n, we see that X_1, . . . , X_n are identically distributed conditional on S_n = t. Thus, from (b),

Pr{X_i > \tau | S_n = t} = \left[\frac{t-\tau}{t}\right]^{n-1} \qquad \text{for } 1 \le i \le n.
d) Find the density f_{S_i|S_n}(s_i|t) for 1 ≤ i ≤ n−1.
Solution: We can use Bayes’ law in the same way as (b), getting

f_{S_i|S_n}(s_i|t) = \frac{s_i^{i-1}(t-s_i)^{n-i-1}\,(n-1)!}{t^{n-1}\,(i-1)!\,(n-i-1)!}.
e) Give an explanation for the striking similarity between the condition N(t) = n−1 and the condition S_n = t.
Solution: The solutions to (c) and (d) are the same as (2.45) and (2.46) respectively for N(t) = n−1. The condition N(t) = n−1 and the condition S_n = t both imply that the number of arrivals in (0, t) is n−1. In addition, N(t) = n−1 implies that the first arrival after (0, t) is strictly after t, whereas S_n = t implies that the first arrival after (0, t) is at t. Because of the independent increment property, this additional implication does not affect the distribution of S_1, . . . , S_{n−1}. See the solution to Exercise 2.13(b) for a further discussion of this point.
The important fact here is that the equivalence of the arrival distributions in (0, t), given
these slightly different conditions, is a valuable aid in problem solving, since either approach
can be used.
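The equivalence just described also gives an easy simulation check of (c): conditional on S_n = t, the epochs S_1, . . . , S_{n−1} behave as ordered uniforms on (0, t), so the gaps X_i can be sampled directly. A hedged sketch, with n, t, τ as arbitrary test values:

```python
import random

random.seed(3)
n, t, tau = 6, 10.0, 2.5     # assumed parameters for the check
trials = 200_000

exceed = [0] * (n + 1)       # exceed[i] counts samples with X_i > tau
for _ in range(trials):
    # given S_n = t, (S_1, ..., S_{n-1}) are the order statistics of
    # n-1 uniforms on (0, t); the X_i are the gaps, with S_0 = 0, S_n = t
    s = sorted(random.uniform(0, t) for _ in range(n - 1))
    epochs = [0.0] + s + [t]
    for i in range(1, n + 1):
        if epochs[i] - epochs[i - 1] > tau:
            exceed[i] += 1

pred = ((t - tau) / t) ** (n - 1)   # claimed Pr{X_i > tau | S_n = t}, same for all i
ok = all(abs(exceed[i] / trials - pred) < 0.01 for i in range(1, n + 1))
print(ok)
```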
Exercise 2.17: a) For a Poisson process of rate λ, find Pr{N(t)=n | S_1=τ} for t > τ and n ≥ 1.
Solution: Given that S_1 = τ, the number, N(t), of arrivals in (0, t] is 1 plus the number in (τ, t]. This latter number, Ñ(τ, t), is Poisson with mean λ(t−τ). Thus

Pr{N(t)=n | S_1=\tau} = Pr\{\tilde N(\tau, t) = n-1\} = \frac{[\lambda(t-\tau)]^{n-1} e^{-\lambda(t-\tau)}}{(n-1)!}.
b) Using this, find f_{S_1|N(t)}(τ | n).
Solution: Using Bayes’ law,

f_{S_1|N(t)}(\tau|n) = \frac{n(t-\tau)^{n-1}}{t^n}.
c) Check your answer against (2.41).

Solution: Eq. (2.41) is Pr{S_1 > τ | N(t) = n} = [(t−τ)/t]^n. The negative of the derivative of this with respect to τ is f_{S_1|N(t)}(τ|n), which clearly checks with (b).
Exercise 2.18: Consider a counting process in which the rate is a rv Λ with probability density f_Λ(λ) = αe^{−αλ} for λ > 0. Conditional on a given sample value λ for the rate, the counting process is a Poisson process of rate λ (i.e., nature first chooses a sample value λ and then generates a sample path of a Poisson process of that rate λ).

a) What is Pr{N(t)=n | Λ=λ}, where N(t) is the number of arrivals in the interval (0, t] for some given t > 0?
Solution: Conditional on Λ = λ, {N(t); t > 0} is a Poisson process, so

Pr{N(t)=n | \Lambda=\lambda} = (\lambda t)^n e^{-\lambda t}/n!.
b) Show that Pr{N(t)=n}, the unconditional PMF for N(t), is given by

Pr{N(t)=n} = \frac{\alpha t^n}{(t+\alpha)^{n+1}}.
Solution: The straightforward approach is to average the conditional distribution over λ,

Pr{N(t) = n} = \int_0^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\, \alpha e^{-\alpha\lambda}\, d\lambda

 = \frac{\alpha t^n}{(t+\alpha)^n} \int_0^{\infty} \frac{[\lambda(t+\alpha)]^n e^{-\lambda(t+\alpha)}}{n!}\, d\lambda

 = \frac{\alpha t^n}{(t+\alpha)^{n+1}} \int_0^{\infty} \frac{x^n e^{-x}}{n!}\, dx = \frac{\alpha t^n}{(t+\alpha)^{n+1}},

where we changed the variable of integration from λ to x = λ(t+α) and then recognized the integral as the integral of an Erlang density of order n+1 with unit rate.
The solution can also be written as pq^n where p = α/(t+α) and q = t/(t+α). This suggests a different interpretation for this result. Pr{N(t) = n} for a Poisson process (PP) of rate λ is a function only of λt and n. The rv N(t) for a PP of rate λ thus has the same distribution as N(λt) for a PP of unit rate, and thus N(t) for a PP of variable rate Λ has the same distribution as N(Λt) for a PP of rate 1.

Since Λt is an exponential rv of parameter α/t, we see that N(Λt) is the number of arrivals of a PP of unit rate before the first arrival from an independent PP of rate α/t. This unit rate PP and rate α/t PP are independent and the combined process has rate 1 + α/t. The event {N(Λt) = n} then has the probability of n arrivals from the unit rate PP followed by one arrival from the α/t rate process, thus yielding the probability q^n p.
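As a numerical cross-check (not part of the original solution), the averaging integral in (b) can be evaluated with a simple trapezoidal rule and compared with the closed form; α, t, n below are arbitrary test values.

```python
import math

alpha, t, n = 1.7, 3.0, 4    # assumed parameters for the check

def integrand(lam):
    # Pr{N(t)=n | Lambda=lam} weighted by the density alpha * e^{-alpha*lam}
    return ((lam * t) ** n * math.exp(-lam * t) / math.factorial(n)
            * alpha * math.exp(-alpha * lam))

# trapezoidal rule on (0, L]; the exponential tail beyond L is negligible
L, steps = 40.0, 200_000
h = L / steps
numeric = h * (integrand(0.0) / 2
               + sum(integrand(k * h) for k in range(1, steps))
               + integrand(L) / 2)

closed_form = alpha * t ** n / (t + alpha) ** (n + 1)
print(abs(numeric - closed_form) < 1e-6)
```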
c) Find f_Λ(λ | N(t)=n), the density of Λ conditional on N(t)=n.
Solution: Using Bayes’ law with the answers in (a) and (b), we get

f_{\Lambda|N(t)}(\lambda | n) = \frac{\lambda^n e^{-(\alpha+t)\lambda}\,(\alpha+t)^{n+1}}{n!}.

This is an Erlang PDF of order n+1 and can be interpreted (after a little work) in the same way as (b).
d) Find E[Λ | N(t)=n] and interpret your result for very small t with n = 0 and for very large t with n large.

Solution: Since Λ conditional on N(t) = n is Erlang, it is the sum of n+1 IID rv’s, each of mean 1/(t+α). Thus

E[\Lambda | N(t)=n] = \frac{n+1}{\alpha+t}.

For N(t) = 0 and t << α, this is close to 1/α, which is E[Λ]. This is not surprising since it has little effect on the distribution of Λ. For n large and t >> α, E[Λ | N(t)=n] ≈ n/t.
e) Find E[Λ | N(t)=n, S_1, S_2, . . . , S_n]. (Hint: consider the distribution of S_1, . . . , S_n conditional on N(t) and Λ). Find E[Λ | N(t)=n, N(τ)=m] for some τ < t.

Solution: From Theorem 2.5.1, S_1, . . . , S_n are uniformly distributed, subject to 0 < S_1 < · · · < S_n < t, given N(t) = n and Λ = λ. Thus, conditional on N(t) = n, Λ is statistically independent of S_1, . . . , S_n.

E[\Lambda | N(t)=n, S_1, S_2, . . . , S_n] = E[\Lambda | N(t)=n] = \frac{n+1}{\alpha+t}.

Conditional on N(t) = n, N(τ) is determined by S_1, . . . , S_n for τ < t, and thus

E[\Lambda | N(t)=n, N(\tau)=m] = E[\Lambda | N(t)=n] = \frac{n+1}{\alpha+t}.

This corresponds to one’s intuition; given the number of arrivals in (0, t], it makes no difference where the individual arrivals occur.
Exercise 2.19: a) Use Equation (2.42) to find E [Si | N (t)=n]. Hint: When you integrate si fSi (si |
N (t)=n), compare this integral with fSi+1 (si | N (t)=n + 1) and use the fact that the latter expression is a
probability density.
Solution: We can find E[S_i | N(t)=n] from f_{S_i|N(t)}(s_i|n) (as given in (2.42)) by integration,

E[S_i | N(t)=n] = \int_0^{\infty} x f_{S_i|N(t)}(x|n)\, dx = \int_0^{\infty} \frac{x^i (t-x)^{n-i}\, n!}{t^n (n-i)!\,(i-1)!}\, dx. \qquad (A.31)

Using the hint,

f_{S_{i+1}|N(t)}(x|n+1) = \frac{x^i (t-x)^{n-i}\,(n+1)!}{t^{n+1} (n-i)!\, i!}. \qquad (A.32)

The factors involving x are the same as in (A.31), and substituting (A.32) into (A.31),

E[S_i | N(t)=n] = \frac{it}{n+1} \int_0^{\infty} f_{S_{i+1}|N(t)}(x|n+1)\, dx = \frac{it}{n+1}.

As a check, we see that this agrees with the more elegant derivation in (2.43). The derivation in (2.43), however, does not generalize as easily to the second moment.
b) Find the second moment and the variance of Si conditional on N (t)=n. Hint: Extend the previous hint.
Solution: Using the same approach as in (a),

E[S_i^2 | N(t)=n] = \int_0^{\infty} x^2 f_{S_i|N(t)}(x|n)\, dx = \int_0^{\infty} \frac{x^{i+1} (t-x)^{n-i}\, n!}{t^n (n-i)!\,(i-1)!}\, dx. \qquad (A.33)

This suggests comparing with the density of S_{i+2} conditional on N(t) = n+2,

f_{S_{i+2}|N(t)}(x|n+2) = \frac{x^{i+1} (t-x)^{n-i}\,(n+2)!}{t^{n+2} (n-i)!\,(i+1)!}. \qquad (A.34)

E[S_i^2 | N(t)=n] = \frac{(i+1)\, i\, t^2}{(n+2)(n+1)}.

Finally, calculating the variance as the second moment minus the mean squared,

VAR[S_i | N(t)=n] = \frac{(i+1) i t^2}{(n+2)(n+1)} - \left[\frac{it}{n+1}\right]^2 = \frac{i t^2 (n+1-i)}{(n+1)^2 (n+2)}.
c) Assume that n is odd, and consider i = (n+1)/2. What is the relationship between S_i, conditional on N(t)=n, and the sample median of n IID uniform random variables?
Solution: We have seen that the first n arrival epochs of a Poisson process, conditional on N(t) = n, have the same joint probability distribution as the order statistics of n IID rv’s that are uniform over (0, t]. Thus the sample value of the rv S_{(n+1)/2} is the sample median of those rv’s. From (a) and (b),

E[S_{(n+1)/2}] = t/2; \qquad VAR[S_{(n+1)/2}] = \frac{t^2}{4(n+2)}.
d) Give a weak law of large numbers for the above median.

Solution: The median S_{(n+1)/2} is a rv, just like the sample average of n rv’s is a rv. By a WLLN for the median, we mean a result that says that S_{(n+1)/2} converges in probability to a limit, in this case the mean of S_{(n+1)/2}, as n → ∞. Using the Chebyshev inequality,

Pr\left\{\left|S_{(n+1)/2} - \frac{t}{2}\right| \ge \epsilon\right\} \le \frac{VAR[S_{(n+1)/2}]}{\epsilon^2} = \frac{t^2}{4(n+2)\epsilon^2}.

Thus, for any fixed ε, t,

\lim_{n\to\infty} Pr\left\{\left|S_{(n+1)/2} - \frac{t}{2}\right| \ge \epsilon\right\} = 0.

The limit here does not correspond to any limit in a given Poisson process, but it does correspond to a limit of the median of n uniformly distributed IID rv’s. By combining this result with Exercise 1.28, it is possible to extend this result to the median of IID rv’s with an arbitrary probability density.
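The mean and variance above, and the concentration they imply, can be checked by simulating medians of uniform samples directly (a hedged sketch; the sample sizes and tolerances are arbitrary choices):

```python
import random

random.seed(5)
t = 1.0
trials = 40_000

def median_stats(n):
    # sample median of n IID uniforms on (0, t), n odd
    meds = []
    for _ in range(trials):
        u = sorted(random.uniform(0, t) for _ in range(n))
        meds.append(u[(n - 1) // 2])     # the ((n+1)/2)-th order statistic
    m = sum(meds) / trials
    v = sum((x - m) ** 2 for x in meds) / trials
    return m, v

for n in (5, 25, 101):
    m, v = median_stats(n)
    # compare with mean t/2 and variance t^2/(4(n+2)) derived above;
    # the shrinking variance is the concentration behind the WLLN in (d)
    print(n, abs(m - t / 2) < 0.01, abs(v - t * t / (4 * (n + 2))) < 0.005)
```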
Exercise 2.20: Suppose cars enter a one-way infinite length, infinite lane highway at a Poisson rate λ. The ith car to enter chooses a velocity V_i and travels at this velocity. Assume that the V_i’s are independent positive rv’s having a common CDF F. Derive the distribution of the number of cars that are located in an interval (0, a) at time t.
Solution: This is a thinly disguised variation of an M/G/∞ queue. The arrival process is the Poisson process of cars entering the highway. We then view the service time of a car as the time interval until the car reaches or passes point a. All cars then have IID service times, and service always starts at the time of arrival (i.e., this can be viewed as infinitely many independent and identical servers). To avoid distractions, assume initially that V is a continuous rv. The CDF G(τ) of the service time X is then given by the equation

G(\tau) = Pr\{X \le \tau\} = Pr\{a/V \le \tau\} = Pr\{V \ge a/\tau\} = F^c_V(a/\tau).

The PMF of the number N_1(t) of cars in service at time t is then given by (2.36) and (2.37) as

p_{N_1(t)}(n) = \frac{m^n(t) \exp[-m(t)]}{n!}, \qquad \text{where} \quad m(t) = \lambda \int_0^t [1 - G(\tau)]\, d\tau = \lambda \int_0^t F_V(a/\tau)\, d\tau.

Since this depends only on the CDF, it can be seen that the answer is the same if V is discrete or mixed.
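A simulation sketch with a concrete, assumed speed law V ~ Uniform(0.5, 1.5), a = 1 and t = 2 (none of these values come from the exercise): the count of cars in (0, a) at time t should be Poisson with the mean m(t) computed from the integral above.

```python
import math
import random

random.seed(6)
lam, a, t = 3.0, 1.0, 2.0    # assumed rate, interval end, observation time
trials = 100_000

def F_V(x):
    # CDF of the assumed speed law V ~ Uniform(0.5, 1.5)
    return min(max((x - 0.5) / 1.0, 0.0), 1.0)

# m(t) = lam * integral_0^t F_V(a/tau) dtau, by the trapezoidal rule
steps = 100_000
h = t / steps
m = lam * h * (1.0 / 2                    # as tau -> 0, F_V(a/tau) -> 1
               + sum(F_V(a / (k * h)) for k in range(1, steps))
               + F_V(a / t) / 2)

counts = {}
for _ in range(trials):
    n1, s = 0, random.expovariate(lam)
    while s <= t:
        v = random.uniform(0.5, 1.5)
        if v * (t - s) < a:               # car entering at s is still in (0, a)
            n1 += 1
        s += random.expovariate(lam)
    counts[n1] = counts.get(n1, 0) + 1

poisson = lambda n: m ** n * math.exp(-m) / math.factorial(n)
ok = all(abs(counts.get(n, 0) / trials - poisson(n)) < 0.01 for n in range(10))
print(ok)
```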
Exercise 2.21: Consider an M/G/∞ queue, i.e., a queue with Poisson arrivals of rate λ in which each arrival i, independent of other arrivals, remains in the system for a time X_i, where {X_i; i ≥ 1} is a set of IID rv’s with some given CDF F(x).

You may assume that the number of arrivals in any interval (t, t+ε) that are still in the system at some later time τ ≥ t+ε is statistically independent of the number of arrivals in that same interval (t, t+ε) that have departed from the system by time τ.
a) Let N (⌧ ) be the number of customers in the system at time ⌧ . Find the mean, m(⌧ ), of N (⌧ ) and find
Pr{N (⌧ ) = n}.
Solution: The answer is worked out and explained in (2.36) and (2.37). We have

m(\tau) = \lambda \int_0^{\tau} [1 - F(t)]\, dt; \qquad Pr\{N(\tau) = n\} = \frac{m(\tau)^n \exp(-m(\tau))}{n!}.

Note that m(τ) is non-decreasing in τ and has the limit \lim_{\tau\to\infty} m(\tau) = \lambda E[X].
b) Let D(⌧ ) be the number of customers that have departed from the system by time ⌧ . Find the mean,
E [D(⌧ )], and find Pr{D(⌧ ) = d}.
Solution: As illustrated in Figure 2.8, the number of arrivals that have departed and those that remain can be treated as a non-homogeneous splitting of the Poisson arrival process. Thus the departures can be handled in the same way as those still in the system. Let µ(τ) = E[D(τ)] be the mean of the number of arrivals that have departed by time τ. Then µ(τ) and Pr{D(τ) = d} are given by

\mu(\tau) = \lambda \int_0^{\tau} F(t)\, dt; \qquad Pr\{D(\tau) = d\} = \frac{\mu(\tau)^d \exp(-\mu(\tau))}{d!}.

Note that m(\tau) + \mu(\tau) = \lambda \int_0^{\tau} dt = \lambda\tau, so that \mu(\tau) tends to \lambda\tau - \lambda E[X] as τ increases.
c) Find Pr{N (⌧ ) = n, D(⌧ ) = d}.
Solution: By the same argument as used in Section 2.3 on the splitting of homogeneous Poisson processes, the process {N(τ); τ > 0} and the process {D(τ); τ > 0} are statistically independent. The assumption in the exercise is one step in that argument. Combining (a) and (b), we then have

Pr\{N(\tau) = n, D(\tau) = d\} = \frac{m(\tau)^n \mu(\tau)^d \exp[-m(\tau) - \mu(\tau)]}{n!\, d!}. \qquad (A.35)
d) Let A(⌧ ) be the total number of arrivals up to time ⌧ . Find Pr{N (⌧ ) = n | A(⌧ ) = a}.
Solution: Since A(τ) = N(τ) + D(τ), we can let a = n + d for a ≥ n and rewrite (A.35) as

Pr\{N(\tau) = n, A(\tau) = a\} = \frac{m(\tau)^n \mu(\tau)^{a-n} \exp[-m(\tau) - \mu(\tau)]}{n!\, (a-n)!}. \qquad (A.36)

Since A(τ) is Poisson with rate λ = [m(τ) + µ(τ)]/τ, the conditional PMF is

Pr\{N(\tau) = n | A(\tau) = a\} = \frac{m(\tau)^n \mu(\tau)^{a-n} \exp[-\lambda\tau]}{n!\, (a-n)!} \cdot \frac{a!}{(\lambda\tau)^a \exp(-\lambda\tau)} = \binom{a}{n} \left(\frac{m(\tau)}{\lambda\tau}\right)^n \left(\frac{\mu(\tau)}{\lambda\tau}\right)^{a-n}.
e) Find Pr{D(τ+ε) − D(τ) = d}.

Solution: Since {D(τ); τ > 0} is a non-homogeneous Poisson process, Y = D(τ+ε) − D(τ) is a Poisson random variable of mean \bar Y = \lambda \int_{\tau}^{\tau+\epsilon} F(t)\, dt. Thus

Pr\{Y = d\} = \frac{\bar Y^d \exp(-\bar Y)}{d!}.

Note that Y is the number of departures in (τ, τ+ε]. This is a Poisson rv which has a mean approaching λε as τ → ∞. The output process is a non-homogeneous Poisson process which becomes stationary (homogeneous) as τ → ∞.
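A hedged simulation sketch of (a)–(c) for a concrete, assumed service law F(x) = 1 − e^{−x}: the empirical means of N(τ) and D(τ) should match m(τ) and µ(τ), and their sample covariance should be near 0, consistent with the claimed independence. The rate and time below are arbitrary test values.

```python
import math
import random

random.seed(7)
lam, tau = 2.0, 3.0          # assumed arrival rate and observation time
trials = 100_000

# with service ~ Exp(1): m(tau) = lam(1 - e^-tau), mu(tau) = lam(tau - 1 + e^-tau)
m_tau = lam * (1 - math.exp(-tau))
mu_tau = lam * (tau - 1 + math.exp(-tau))

sum_n = sum_d = sum_nd = 0
for _ in range(trials):
    n = d = 0
    s = random.expovariate(lam)
    while s <= tau:
        if s + random.expovariate(1.0) > tau:
            n += 1               # still in the system at time tau
        else:
            d += 1               # departed by time tau
        s += random.expovariate(lam)
    sum_n += n; sum_d += d; sum_nd += n * d

En, Ed = sum_n / trials, sum_d / trials
cov = sum_nd / trials - En * Ed  # should be near 0 if N(tau), D(tau) independent
print(abs(En - m_tau) < 0.05, abs(Ed - mu_tau) < 0.07, abs(cov) < 0.15)
```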
Exercise 2.22: The voters in a given town arrive at the place of voting according to a Poisson process of rate λ = 100 voters per hour. The voters independently vote for candidate A and candidate B each with probability 1/2. Assume that the voting starts at time 0 and continues indefinitely.
a) Conditional on 1000 voters arriving during the first 10 hours of voting, find the probability that candidate
A receives n of those votes.
Solution: Intuitively, each of the 1000 voters votes independently for A with probability
1/2. Thus, from the binomial formula,
Pr\{n \text{ of 1000 votes for A}\} = \binom{1000}{n} (1/2)^{1000}.
Being more careful, the number of votes for A in 10 hours is Poisson with mean 500 and
the number for B is independent and Poisson with mean 500. The total number of votes
in 10 hours is the sum of these and also Poisson. Using Bayes’ law to find the votes for A
conditional on the overall number, we get the same answer.
b) Again conditional on 1000 voters during the first 10 hours, find the probability that candidate A receives
n votes in the first 4 hours of voting.
Solution: Intuitively, each of the 1000 voters comes in the first 4 hours independently with
probability .4, and out of those, each independently votes for A with probability 1/2. Thus
each voter independently both comes in the first 4 hours and votes for A with probability
0.2. Thus
Pr\{n \text{ votes for A in first 4 hours} \,|\, \text{1000 in 10 hours}\} = \binom{1000}{n} (0.2)^n (0.8)^{1000-n}.
Being more careful (or first stating a general theorem about these conditional Poisson
probabilities), the number in the first 4 hours is Poisson, the number of those who vote for
A is Poisson and the overall number to vote in 10 hours is Poisson and the sum of those in
the first 4 hours and those in the last 6 hours, further broken into A and B are independent
Poisson. We can then find the conditional probability as in (a).
c) Let T be the epoch of the arrival of the first voter voting for candidate A. Find the density of T .
Solution: View the voters for A as a splitting of the overall arrival process. Thus the voters for A form a Poisson process of rate 50 and f_T(t) = 50 \exp(-50t) for t ≥ 0.
d) Find the PMF of the number of voters for candidate B who arrive before the first voter for A.
Solution: n B voters arrive before the first A if the first n voters are B and the (n+1)st is A; this is an event of probability (1/2)^{n+1} for n ≥ 0. Thus this number is a geometric rv.
e) Define the nth voter as a reversal if the nth voter votes for a different candidate than the (n−1)st. For example, in the sequence of votes AABAABB, the third, fourth, and sixth voters are reversals; the third and sixth are A to B reversals and the fourth is a B to A reversal. Let N(t) be the number of reversals up to time t (t in hours). Is {N(t); t > 0} a Poisson process? Explain.
Solution: The first voter can not be a reversal. Every subsequent voter is a reversal with
probability 1/2. Thus, after the first arrival, each inter-reversal interval is exponential with
mean 1/50 hours. The process is not Poisson, however, because the interval until the first
reversal is not exponential with mean 1/50 hours.
When we study renewal processes, we denote renewal processes with a non-standard first
renewal interval as ‘delayed renewal processes’ and find that most of the renewal results
also apply to delayed renewal processes. Thus we might call this a delayed Poisson process.
f ) Find the expected time (in hours) between reversals.
Solution: Starting from one reversal we can view subsequent reversals as an equi-probable
splitting of the arrivals. Thus the expected time between reversals is 1/50.
g) Find the probability density of the time between reversals.
Solution: As explained in (f), the time X between reversals is exponential with mean 1/50, so f_X(x) = 50 \exp(-50x) for x ≥ 0.
h) Find the density of the time from one A to B reversal to the next A to B reversal.
Solution: Since A to B reversals alternate with B to A reversals, the time Y from one
A to B reversal to the next is the sum of 2 independent exponential rv’s. One can either
calculate this density by convolution or recognize it as an Erlang rv of order 2.
f_Y(y) = \lambda^2 y \exp(-\lambda y), \qquad \text{where } \lambda = 50 \text{ and } y \ge 0.
An alternate approach here is to see that after an A to B reversal, the time to the next A
to B reversal is the sum of the time to the next A followed by the time to the next B.
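The Erlang claim in (h) can be checked by simulating the vote stream directly (a sketch; the run length is an arbitrary choice): gaps between successive A to B reversals should have the Erlang-2 mean 2/50 and variance 2/50².

```python
import random

random.seed(8)
rate = 100.0                 # voters per hour, each voting A or B w.p. 1/2
num_votes = 200_000

times_ab = []                # epochs of A-to-B reversals
t_now, last = 0.0, None
for _ in range(num_votes):
    t_now += random.expovariate(rate)
    v = 'A' if random.random() < 0.5 else 'B'
    if last == 'A' and v == 'B':
        times_ab.append(t_now)
    last = v

gaps = [b - a for a, b in zip(times_ab, times_ab[1:])]
mean = sum(gaps) / len(gaps)
var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
# Erlang of order 2 and rate 50: mean 2/50 = 0.04, variance 2/2500 = 0.0008
print(abs(mean - 0.04) < 0.001, abs(var - 0.0008) < 0.0001)
```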
Exercise 2.23: Let {N_1(t); t > 0} be a Poisson counting process of rate λ. Assume that the arrivals from this process are switched on and off by arrivals from a second independent Poisson process {N_2(t); t > 0} of rate γ.

[Figure: sample paths of N_1(t), N_2(t), and the switched process N_A(t); the rate λ arrivals are passed through only during the ‘On’ periods determined by the rate γ process.]

Let {N_A(t); t > 0} be the switched process; that is, N_A(t) includes the arrivals from {N_1(t); t > 0} during periods when N_2(t) is even and excludes the arrivals from {N_1(t); t > 0} while N_2(t) is odd.
a) Find the PMF for the number of arrivals of the first process, {N_1(t); t > 0}, during the nth period when the switch is on.

Solution: We have seen that the combined process {N_1(t) + N_2(t)} is a Poisson process of rate λ + γ. For any even numbered arrival to process 2, subsequent arrivals to the combined process independently come from process 1 or 2, and come from process 1 with probability λ/(λ+γ). The number N_s of such arrivals before the next arrival to process 2 is geometric with PMF p_{N_s}(n) = \big[\lambda/(\lambda+\gamma)\big]^n \big[\gamma/(\lambda+\gamma)\big] for integer n ≥ 0.
b) Given that the first arrival for the second process occurs at epoch τ, find the conditional PMF for the number of arrivals N_a of the first process up to τ.

Solution: Since processes 1 and 2 are independent, this is equal to the PMF for the number of arrivals of the first process up to τ. This number has a Poisson PMF, (\lambda\tau)^n e^{-\lambda\tau}/n!.
c) Given that the number of arrivals of the first process, up to the first arrival for the second process, is n, find the density for the epoch of the first arrival from the second process.

Solution: Let N_a be the number of process 1 arrivals before the first process 2 arrival and let X_2 be the time of the first process 2 arrival. In (a), we showed that p_{N_a}(n) = \big[\lambda/(\lambda+\gamma)\big]^n \big[\gamma/(\lambda+\gamma)\big] and in (b) we showed that p_{N_a|X_2}(n|\tau) = (\lambda\tau)^n e^{-\lambda\tau}/n!. We can then use Bayes’ law to find f_{X_2|N_a}(\tau | n), which is the desired solution. We have

f_{X_2|N_a}(\tau | n) = f_{X_2}(\tau)\, \frac{p_{N_a|X_2}(n|\tau)}{p_{N_a}(n)} = \frac{(\lambda+\gamma)^{n+1}\, \tau^n\, e^{-(\lambda+\gamma)\tau}}{n!},

where we have used the fact that X_2 is exponential with PDF \gamma\exp(-\gamma\tau) for τ ≥ 0. It can be seen that the solution is an Erlang rv of order n+1. To interpret this (and to solve the exercise in a perhaps more elegant way), note that this is the same as the Erlang density for the epoch of the (n+1)th arrival in the combined process. This arrival epoch is independent of the process 1/process 2 choices for these n+1 arrivals, and thus is the arrival epoch for the particular choice of n successive arrivals to process 1 followed by 1 arrival to process 2.
d) Find the density of the interarrival time for {N_A(t); t ≥ 0}. Note: This part is quite messy and is done most easily via Laplace transforms.
Solution: The process {N_A(t); t > 0} is not a Poisson process, but, perhaps surprisingly, it is a renewal process; that is, the interarrival times are independent and identically distributed. One might prefer to postpone trying to understand this until starting to study renewal processes, but we have the necessary machinery already.

Starting at a given arrival to {N_A(t); t > 0}, let X_A be the interval until the next arrival to {N_A(t); t > 0} and let X be the interval until the next arrival to the combined process. Given that the next arrival in the combined process is from process 1, it will be an arrival to {N_A(t); t > 0}, so that under this condition, X_A = X. Alternatively, given that this next arrival is from process 2, X_A will be the sum of three independent rv’s: first X, next the interval X_2 to the following arrival for process 2, and next the interval from that point to the following arrival to {N_A(t); t > 0}. This final interarrival time will have the same distribution as X_A. Thus the unconditional PDF for X_A is given by

f_{X_A}(x) = \frac{\lambda}{\lambda+\gamma} f_X(x) + \frac{\gamma}{\lambda+\gamma} f_X(x) \otimes f_{X_2}(x) \otimes f_{X_A}(x)

 = \lambda \exp(-(\lambda+\gamma)x) + \gamma \exp(-(\lambda+\gamma)x) \otimes \gamma \exp(-\gamma x) \otimes f_{X_A}(x),

where ⊗ is the convolution operator and all functions are 0 for x < 0.
Solving this by Laplace transforms is a mechanical operation of no real interest here. The solution is

f_{X_A}(x) = B \exp\Big[-\frac{x}{2}\Big(2\gamma + \lambda + \sqrt{4\gamma^2+\lambda^2}\Big)\Big] + C \exp\Big[-\frac{x}{2}\Big(2\gamma + \lambda - \sqrt{4\gamma^2+\lambda^2}\Big)\Big],

where

B = \frac{\lambda}{2}\left(1 + \frac{\lambda}{\sqrt{4\gamma^2+\lambda^2}}\right); \qquad C = \frac{\lambda}{2}\left(1 - \frac{\lambda}{\sqrt{4\gamma^2+\lambda^2}}\right).
Exercise 2.24: Let us model the chess tournament between Fisher and Spassky as a stochastic process. Let X_i, for i ≥ 1, be the duration of the ith game and assume that {X_i; i ≥ 1} is a set of IID exponentially distributed rv’s each with density f_X(x) = \lambda e^{-\lambda x}. Suppose that each game (independently of all other games, and independently of the length of the games) is won by Fisher with probability p, by Spassky with probability q, and is a draw with probability 1−p−q. The first player to win n games is defined to be the winner, but we consider the match up to the point of winning as being embedded in an unending sequence of games.
a) Find the distribution of time, from the beginning of the match, until the completion of the first game that is won (i.e., that is not a draw). Characterize the process of the number {N(t); t > 0} of games won up to and including time t. Characterize the process of the number {N_F(t); t ≥ 0} of games won by Fisher and the number {N_S(t); t ≥ 0} won by Spassky.
Solution: The Poisson game process is split into 3 independent Poisson processes, namely the draw process of rate (1−p−q)λ, the Fisher win process of rate pλ and the Spassky win process of rate qλ. The process of wins is the sum of the Fisher and Spassky win processes, and is independent of the draw process. Thus the time to the first win is an exponential rv X_W of rate (p+q)λ. Thus the density and CDF are

f_{X_W}(x) = (p+q)\lambda \exp(-(p+q)\lambda x), \qquad F_{X_W}(x) = 1 - \exp(-(p+q)\lambda x).
b) For the remainder of the problem, assume that the probability of a draw is zero; i.e., that p + q = 1. How many of the first 2n−1 games must be won by Fisher in order to win the match?
Solution: Note that (a) shows that the process of wins is Poisson within the process of games including draws, and thus the assumption that there are no draws (i.e., p + q = 1) only simplifies the notation slightly. Note also that it makes no difference to the probability of winning the match whether they continue playing beyond the game in which one of them first wins n games.

If Fisher wins n or more of the first 2n−1 games, then Spassky wins at most n−1, so Fisher wins the match. Conversely if Fisher wins the match, he must win n or more of the first 2n−1 games. Thus Fisher wins if and only if he wins n or more of the first 2n−1 games.
c) What is the probability that Fisher wins the match? Your answer should not involve any integrals. Hint:
consider the unending sequence of games and use (b).
Solution: The sequence of games is Bernoulli with probability p of a Fisher game win. Thus, using the binomial PMF for 2n−1 plays, the probability that Fisher wins the match is

Pr\{\text{Fisher wins match}\} = \sum_{k=n}^{2n-1} \binom{2n-1}{k} p^k q^{2n-1-k}.

Without the hint, the problem is more tricky, but no harder computationally. The probability that Fisher wins the match at the end of game k, for n ≤ k ≤ 2n−1, is the probability that he wins n−1 games out of the first k−1 and then wins the kth. This is p times \binom{k-1}{n-1} p^{n-1} q^{k-n}. Thus

Pr\{\text{Fisher wins match}\} = \sum_{k=n}^{2n-1} \binom{k-1}{n-1} p^n q^{k-n}.

It is surprising that these very different appearing expressions are the same.
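The equality of the two expressions is easy to confirm numerically (a quick check, not part of the original solution):

```python
from math import comb

def win_binomial(n, p):
    # Fisher wins iff he wins at least n of the first 2n-1 games
    q = 1 - p
    return sum(comb(2 * n - 1, k) * p ** k * q ** (2 * n - 1 - k)
               for k in range(n, 2 * n))

def win_negative_binomial(n, p):
    # Fisher wins at game k by winning n-1 of the first k-1, then the kth
    q = 1 - p
    return sum(comb(k - 1, n - 1) * p ** n * q ** (k - n)
               for k in range(n, 2 * n))

# the two very different-looking expressions agree
checks = [abs(win_binomial(n, p) - win_negative_binomial(n, p)) < 1e-12
          for n in (1, 3, 6, 10) for p in (0.3, 0.5, 0.64)]
print(all(checks))
```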
d) Let T be the epoch at which the match is completed (i.e., either Fisher or Spassky wins). Find the CDF
of T .
Solution: Let T_f be the time at which Fisher wins his nth game and T_s be the time at which Spassky wins his nth game (again assuming that playing continues beyond the winning of the match). The Poisson process of Fisher wins is independent of that of Spassky wins. Also, the time T at which the match ends is the minimum of T_f and T_s, so, for any t > 0, Pr{T > t} = Pr{T_f > t, T_s > t}. Thus

Pr\{T > t\} = Pr\{T_f > t\}\, Pr\{T_s > t\}. \qquad (A.37)

Now T_f has an Erlang distribution so its complementary CDF is equal to the probability that fewer than n Fisher wins have occurred by time t. The number of Fisher wins is a Poisson rv, and Spassky wins are handled the same way. Thus,

Pr\{T > t\} = \sum_{k=0}^{n-1} \frac{(\lambda p t)^k e^{-\lambda p t}}{k!} \sum_{j=0}^{n-1} \frac{(\lambda q t)^j e^{-\lambda q t}}{j!} = e^{-\lambda t} \sum_{k=0}^{n-1} \frac{(\lambda p t)^k}{k!} \sum_{j=0}^{n-1} \frac{(\lambda q t)^j}{j!}.

Finally F_T(t) = 1 − Pr{T > t}.
e) Find the probability that Fisher wins and that T lies in the interval (t, t+δ) for arbitrarily small δ.
Solution: From (A.37), the PDF f_T(t) is given by

f_T(t) = f_{T_f}(t)\, Pr\{T_s > t\} + f_{T_s}(t)\, Pr\{T_f > t\}.

The first term on the right is associated with a Fisher win, and the second with a Spassky win. Thus

\lim_{\delta\to 0} \frac{Pr\{\text{Fisher wins}, T \in [t, t+\delta)\}}{\delta} = \frac{(\lambda p)^n t^{n-1} e^{-\lambda p t}}{(n-1)!} \sum_{j=0}^{n-1} \frac{(\lambda q t)^j e^{-\lambda q t}}{j!}. \qquad (A.38)
The main reason for calculating this is to point out that T and the event that Fisher wins the match are not independent events. The number of games that Fisher wins up to time t is independent of the number that Spassky wins up to t. Also, given that k games have been played by time t, the distribution of the number of those games won by Fisher does not vary with t. The problem is that, given a Fisher win with T = t, the most likely number of Spassky wins is 0 when t is very small and n−1 when t is very large. This can be seen by comparing the terms in the sum of (A.38). This is not something you could reasonably be expected to hypothesize intuitively; it is simply something that indicates that one must sometimes be very careful.
Exercise 2.25: a) For 1 ≤ i < n, find the conditional density of S_{i+1}, conditional on N(t) = n and S_i = s_i.
Solution: Recall from (2.41) that

Pr\{S_1 > \tau \,|\, N(t) = n\} = Pr\{X_1 > \tau \,|\, N(t) = n\} = \left(\frac{t-\tau}{t}\right)^n.

Given that S_i = s_i, we can apply this same formula to Ñ(s_i, t) for the first arrival after s_i:

Pr\{X_{i+1} > \tau \,|\, N(t)=n, S_i=s_i\} = Pr\{X_{i+1} > \tau \,|\, \tilde N(s_i, t) = n-i, S_i = s_i\} = \left(\frac{t - s_i - \tau}{t - s_i}\right)^{n-i}.

Since S_{i+1} = S_i + X_{i+1}, we get

Pr\{S_{i+1} > s_{i+1} \,|\, N(t)=n, S_i=s_i\} = \left(\frac{t - s_{i+1}}{t - s_i}\right)^{n-i},

f_{S_{i+1}|N(t), S_i}(s_{i+1} \,|\, n, s_i) = \frac{(n-i)(t - s_{i+1})^{n-i-1}}{(t - s_i)^{n-i}}. \qquad (A.39)
b) Use (a) to find the joint density of S1 , . . . , Sn conditional on N (t) = n. Verify that your answer agrees
with (2.38).
Solution: For each i, the conditional probability in (a) is clearly independent of S_{i−1}, . . . , S_1. Thus we can use the chain rule to multiply (A.39) by itself for each value of i. We must also include f_{S_1|N(t)}(s_1 | n) = n(t-s_1)^{n-1}/t^n. Thus

f_{S^{(n)}|N(t)}(s^{(n)} | n) = \frac{n(t-s_1)^{n-1}}{t^n} \cdot \frac{(n-1)(t-s_2)^{n-2}}{(t-s_1)^{n-1}} \cdot \frac{(n-2)(t-s_3)^{n-3}}{(t-s_2)^{n-2}} \cdots \frac{(t-s_n)^0}{t-s_{n-1}} = \frac{n!}{t^n}.

Note: There is no great insight to be claimed from this exercise. It is useful, however, in providing some additional techniques for working with such problems.
Exercise 2.26: A two-dimensional Poisson process is a process of randomly occurring special points in the plane such that (i) for any region of area A the number of special points in that region has a Poisson distribution with mean λA, and (ii) the number of special points in nonoverlapping regions is independent.
A.2. SOLUTIONS FOR CHAPTER 2
For such a process consider an arbitrary location in the plane and let X denote its distance from its nearest
special point (where distance is measured in the usual Euclidean manner). Show that

a) Pr{X > t} = exp(−λπt²).

Solution: Given an arbitrary location, X > t if and only if there are no special points in
the circle of radius t around the given point. The expected number in that circle is λπt²,
and since the number in that circle is Poisson with expected value λπt², the probability
that this number is 0 is e^{−λπt²}. Thus Pr{X > t} = e^{−λπt²}.
b) E[X] = 1/(2√λ).

Solution: Since X ≥ 0, we have

    E[X] = ∫₀^∞ Pr{X > t} dt = ∫₀^∞ exp(−λπt²) dt.

We can look this up in a table of integrals, or recognize its resemblance to the Gaussian
PDF. If we define σ² = 1/(2πλ), the above integral is

    E[X] = √(2π) σ ∫₀^∞ (1/(√(2π)σ)) exp(−t²/(2σ²)) dt = √(2π)σ/2 = 1/(2√λ).
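A quick simulation can check part (b) (this is my own illustration, not part of the original solution; λ = 1 and the square half-width L = 4 are arbitrary, chosen so that Pr{X > L} is negligible). We scatter a rate-λ Poisson field over a square and average the distance from the origin to the nearest point:

```python
import math
import random

random.seed(2)

def poisson_sample(mean):
    # Knuth's method: multiply uniforms until the product drops below e^{-mean}.
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

lam, L, trials = 1.0, 4.0, 5_000   # field on the square [-L, L]^2
total, used = 0.0, 0
for _ in range(trials):
    n_pts = poisson_sample(lam * (2 * L) ** 2)   # Poisson count, mean = lam * area
    if n_pts == 0:
        continue
    d2 = min(random.uniform(-L, L) ** 2 + random.uniform(-L, L) ** 2
             for _ in range(n_pts))
    total += math.sqrt(d2)
    used += 1

mean_dist = total / used
predicted = 1 / (2 * math.sqrt(lam))   # part (b): E[X] = 1/(2 sqrt(lam))
```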
Exercise 2.27: This problem is intended to show that one can analyze the long-term behavior of
queueing problems by using just notions of means and variances, but that such analysis is awkward, justifying
understanding the strong law of large numbers. Consider an M/G/1 queue. The arrival process is Poisson
with λ = 1. The expected service time, E[Y], is 1/2 and the variance of the service time is given to be 1.

a) Consider S_n, the time of the nth arrival, for n = 10^12. With high probability, S_n will lie within 3 standard
deviations of its mean. Find and compare this mean and the 3σ range.
Solution: Let X_i be the ith interarrival time. Then S_n = Σ_{i=1}^n X_i, so E[S_n] = n and
VAR[S_n] = n. Thus E[S_{10^12}] = 10^12 with 3 × 10^6 as the 3-sigma point.
b) Let V_n be the total amount of time during which the server is busy with these n arrivals (i.e., the sum
of 10^12 service times). Find the mean and 3σ range of V_n.

Solution: In the same way, E[V_{10^12}] = 5 × 10^11 with 3 × 10^6 as the 3-sigma point.
c) Find the mean and 3σ range of I_n, the total amount of time the server is idle up until S_n (take I_n as
S_n − V_n, thus ignoring any service time after S_n).

Solution: As before, E[I_{10^12}] = 5 × 10^11. Since interarrival times and service times are
independent and S_n and V_n have the same variance, the 3-sigma point of I_n is 6 × 10^6.
d) An idle period starts when the server completes a service and there are no waiting arrivals; it ends on
the next arrival. Find the mean and variance of an idle period. Are successive idle periods IID?
Solution: The time from the beginning of an idle period to the next arrival is exponential
with mean 1 and variance 1. Since the time at which the nth idle period starts is a function
only of the first n interarrival times and service times, and since for each sample value of the
beginning of the nth idle period, the interval until the next arrival is independent of those
arrivals (by the independent increment property) and independent of those service times,
it follows that the interval until the next arrival is independent of those earlier interarrival
intervals and service times. This interval (i.e., this idle period) is thus independent of all
earlier idle periods.
This is one of those peculiar situations where the independence of idle durations is virtually
obvious but a careful demonstration is quite tricky. This type of argument will become
clearer when we study renewal processes.
e) Combine (c) and (d) to estimate the total number of idle periods up to time Sn . Use this to estimate the
total number of busy periods.
Solution: The aggregate idle time up to S_n can be expressed as n/2 ± 6√n and the
aggregate duration of m idle periods can be expressed as m ± 3√m. Letting m(n) be
the number of idle periods up to S_n, we then have n/2 ± 6√n ≈ m ± 3√m. For n very
large (10^12), the square roots are small relative to the linear terms, so we can argue that
m(n) ≈ n/2 ± [6√n + 3√(n/2)]. One can do this more carefully by looking at the maximum
with 3-sigma limits of one expression and the minimum of the other, but however this
is expressed, the number of idle periods per arrival is increasingly close to 1/2 with high
probability as the number of arrivals is increased.

The period from 0 to S_n starts with an idle period and ends with a busy period, so that
m(n) is also the number of busy periods.
f ) Combine (e) and (b) to estimate the expected length of a busy period.
Solution: Let B be the expected duration of a busy period. Then the expected aggregate
duration of m busy periods is mB. Since the expected aggregate busy time for large m is
essentially equal to the expected aggregate idle time (each is about n/2 over m ≈ n/2 busy
periods), we have B ≈ 1.
It is curious to note that this does not depend on the variance of the service time (beyond
ensuring that the standard deviation of 1012 service times is small compared to the mean).
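A direct simulation can illustrate the conclusion of (f). This sketch is my own addition, and it uses exponential service times with mean 1/2 for convenience (a different service-time variance than the problem's, which the remark above argues should not change the mean busy period):

```python
import random

random.seed(3)
lam, mean_service = 1.0, 0.5      # rho = lam * E[Y] = 1/2

t, busy_end = 0.0, 0.0            # busy_end: epoch the server clears all work
period_start = None
durations = []
for _ in range(200_000):
    t += random.expovariate(lam)              # next Poisson arrival epoch
    if t > busy_end:                          # server was idle: busy period ended
        if period_start is not None:
            durations.append(busy_end - period_start)
        period_start = t
        busy_end = t
    busy_end += random.expovariate(1 / mean_service)   # add this service time

mean_busy = sum(durations) / len(durations)   # theory: E[Y]/(1 - rho) = 1
```

With ρ = 1/2 the average busy period should come out near 1, matching B ≈ 1 above.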
Exercise 2.28: The purpose of this problem is to illustrate that for an arrival process with independent
but not identically distributed interarrival intervals, X1 , X2 , . . . , the number of arrivals N (t) in the interval
(0, t] can be a defective rv. In other words, the ‘counting process’ is not a stochastic process according to
our definitions. This illustrates that it is necessary to prove that the counting rv’s for a renewal process are
actually rv’s .
a) Let the CDF of the ith interarrival interval for an arrival process be F_{X_i}(x_i) = 1 − exp(−α^{−i} x_i) for some
fixed α ∈ (0, 1). Let S_n = X_1 + · · · + X_n and show that

    E[S_n] = α(1 − α^n)/(1 − α).
Solution: Each X_i is an exponential rv, but the rate, α^{−i}, is rapidly increasing with i
and the expected interarrival time, E[X_i] = α^i, is rapidly decreasing with i. Thus

    E[S_n] = α + α² + · · · + α^n.

Recalling that 1 + α + α² + · · · + α^{n−1} = (1 − α^n)/(1 − α),

    E[S_n] = α(1 + α + · · · + α^{n−1}) = α(1 − α^n)/(1 − α) < α/(1 − α).
In other words, not only is E[X_i] decaying to 0 geometrically with increasing i, but E[S_n]
is upper bounded, for all n, by α/(1 − α).
b) Sketch a ‘reasonable’ sample function for N (t).
Solution: Since the expected interarrival times are decaying geometrically and the expected
arrival epochs are bounded for all n, it is reasonable for a sample path to have the following
shape:
[Figure: a staircase sample path of N(t) versus t, with arrival epochs S_1, S_2, S_3, . . . spaced
ever more closely, so that N(t) climbs without bound before a finite t.]
Note that the question here is not precise (there are obviously many sample paths, and
which are ‘reasonable’ is a matter of interpretation). The reason for drawing such sketches
is to acquire understanding to guide the solution to the following parts of the problem.
c) Find σ²_{S_n}.

Solution: Since X_i is exponential, σ²_{X_i} = α^{2i}. Since the X_i are independent,

    σ²_{S_n} = σ²_{X_1} + σ²_{X_2} + · · · + σ²_{X_n} = α² + α⁴ + · · · + α^{2n}
             = α²(1 + α² + · · · + α^{2(n−1)}) = α²(1 − α^{2n})/(1 − α²) < α²/(1 − α²).
d) Use the Markov inequality on Pr{S_n ≥ t} to find an upper bound on Pr{N(t) ≤ n} that is smaller than
1 for all n and for large enough t. Use this to show that N(t) is defective for large enough t.

Solution: The figure suggests (but does not prove) that for typical sample functions (and
in particular for a set of sample functions of non-zero probability), N(t) goes to infinity for
finite values of t. If the probability that N(t) ≤ n (for a given t) is bounded, independent
of n, by a number strictly less than 1, then that N(t) is a defective rv rather than a true
rv.

By the Markov inequality,

    Pr{S_n ≥ t} ≤ E[S_n]/t ≤ α/(t(1 − α)),

    Pr{N(t) < n} = Pr{S_n > t} ≤ Pr{S_n ≥ t} ≤ α/(t(1 − α)),

where we have used (2.3). Since this bound is independent of n, it also applies in the limit,
i.e.,

    lim_{n→∞} Pr{N(t) ≤ n} ≤ α/(t(1 − α)).
For any t > α/(1 − α), we see that α/(t(1 − α)) < 1. Thus N(t) is defective for any such t, i.e.,
for any t greater than lim_{n→∞} E[S_n].

Actually, by working harder, it can be shown that N(t) is defective for all t > 0. The outline
of the argument is as follows: for any given t, we choose an m such that Pr{S_m ≤ t/2} > 0
and such that Pr{S_∞ − S_m ≤ t/2} > 0, where S_∞ − S_m = Σ_{i=m+1}^∞ X_i. The second
inequality can be satisfied for m large enough by the Markov inequality. The first inequality
is then satisfied since S_m has a density that is positive for t > 0.
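The defect is easy to see numerically. In this sketch of mine (α = 1/2 and t = 2 are arbitrary choices; S_∞ is truncated at 60 terms, whose tail has expected value below α^60/(1−α) and is negligible), the fraction of sample paths with S_∞ < t estimates the probability that N(t) is infinite, and it should exceed the Markov-inequality lower bound 1 − E[S_∞]/t:

```python
import random

random.seed(4)
alpha, t, trials, terms = 0.5, 2.0, 20_000, 60

count = 0
for _ in range(trials):
    # X_i is exponential with rate alpha^{-i}, i.e., mean alpha^i.
    s = sum(random.expovariate(alpha ** -i) for i in range(1, terms + 1))
    if s < t:
        count += 1

frac_infinite = count / trials                  # estimate of Pr{N(t) = infinity}
markov_bound = 1 - alpha / (t * (1 - alpha))    # here: 1 - 1/2 = 1/2
```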
A.3 Solutions for Chapter 3
Exercise 3.1: a) Let X, Y be IID rv's, each with density f_X(x) = α exp(−x²/2). In (b), we show
that α must be 1/√(2π) in order for f_X(x) to integrate to 1, but in this part, we leave α undetermined. Let
S = X² + Y². Find the probability density of S in terms of α.
Solution: First we find the CDF of S.

    F_S(s) = ∫∫_{x²+y²≤s} α² e^{−(x²+y²)/2} dx dy
           = ∫₀^{2π} ∫_{r²≤s} α² r e^{−r²/2} dr dθ
           = ∫_{r²≤s} 2πα² e^{−r²/2} d(r²/2) = 2πα²(1 − e^{−s/2}),        (A.40)

where we first changed to polar coordinates and then integrated. The density is then

    f_S(s) = πα² e^{−s/2}   for s ≥ 0.
b) Prove from (a) that α must be 1/√(2π) in order for S, and thus X and Y, to be random variables. Show
that E[X] = 0 and that E[X²] = 1.

Solution: From (A.40) and the fact that lim_{s→∞} F_S(s) = 1, we see that 2πα² = 1, so
α = 1/√(2π). From the fact that f_X(x) = f_X(−x) for all x, we see that E[X] = 0. Also,
since S is exponential and is seen to have mean 2, and since X and Y must have the same
second moment, we see that E[X²] = 1. This also follows by using integration by parts.
c) Find the probability density of R = √S. R is called a Rayleigh rv.

Solution: Since S ≥ 0 and F_S(s) = 1 − e^{−s/2}, we see that R ≥ 0 and F_R(r) = 1 − e^{−r²/2}.
Thus the density is given by f_R(r) = r e^{−r²/2}.
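These three facts are easy to check by sampling (a sketch of my own, not part of the solution; the sample size and the check points S ≤ 2 and R ≤ 1 are arbitrary):

```python
import math
import random

random.seed(5)
n = 100_000
s_sum, s_below, r_below = 0.0, 0, 0
for _ in range(n):
    x, y = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    s = x * x + y * y            # S = X^2 + Y^2
    s_sum += s
    if s <= 2.0:
        s_below += 1
    if math.sqrt(s) <= 1.0:      # R = sqrt(S)
        r_below += 1

mean_s = s_sum / n   # exponential with mean 2
cdf_s = s_below / n  # F_S(2) = 1 - e^{-1}
cdf_r = r_below / n  # F_R(1) = 1 - e^{-1/2}
```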
Exercise 3.2: a) By expanding in a power series in (1/2)r²σ², show that

    exp(r²σ²/2) = 1 + r²σ²/2 + r⁴σ⁴/(2(2²)) + · · · + r^{2k}σ^{2k}/(k! 2^k) + · · · .        (A.41)

Solution: Recall that e^x = 1 + x + x²/2 + · · · + x^n/n! + · · · . Using r²σ²/2 in place of x,
we get (A.41).
b) By expanding e^{rZ} in a power series in rZ, show that

    g_Z(r) = E[e^{rZ}] = 1 + rE[Z] + r²E[Z²]/2 + · · · + r^k E[Z^k]/k! + · · · .

Solution: The power series of e^{rZ} converges for each sample value of Z, so we have

    E[e^{rZ}] = E[1 + rZ + r²Z²/2 + · · · + r^n Z^n/n! + · · ·].        (A.42)

If the moment generating function is finite, then the expectation can be interchanged with
the sum, although we shall not prove that here. This exchange results in (A.42).
c) By matching powers of r between parts (a) and (b), show that for all integer k ≥ 1,

    E[Z^{2k}] = (2k)! σ^{2k}/(k! 2^k) = (2k−1)(2k−3) · · · (3)(1) σ^{2k};     E[Z^{2k+1}] = 0.

Solution: The unstated assumption in this exercise is that Z ~ N(0, σ²). With this
assumption, (3.6) shows that g_Z(r) = exp(r²σ²/2), whose expansion is given in (A.41). Since the
power series is unique, the terms in (A.41) and (A.42) can be equated term by term so that
for even powers of r,

    r^{2k} E[Z^{2k}]/(2k)! = r^{2k} σ^{2k}/(k! 2^k).        (A.43)

For odd powers of r, E[Z^{2k+1}] = 0, which we already know from symmetry. Solving (A.43),
we get

    E[Z^{2k}] = (2k)! σ^{2k}/(k! 2^k).

We can now separate (2k)! into a product of even terms and a product of odd terms,
(2k)! = [(2k)(2k−2)(2k−4) · · · (2)][(2k−1)(2k−3) · · · (1)]. The product of the even
terms is k! 2^k, so E[Z^{2k}] = (2k−1)(2k−3) · · · (1) σ^{2k}, which is the desired result.
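The final identity, (2k)!/(k! 2^k) = (2k−1)(2k−3) · · · (1), is pure arithmetic and can be verified exactly for small k (my own check, not part of the original solution):

```python
import math

def odd_double_factorial(k):
    # (2k-1)(2k-3)...(3)(1)
    prod = 1
    for j in range(1, 2 * k, 2):
        prod *= j
    return prod

# E[Z^{2k}]/sigma^{2k} = (2k)!/(k! 2^k) for k = 1..8
moments = [math.factorial(2 * k) // (math.factorial(k) * 2 ** k)
           for k in range(1, 9)]
```

For k = 1, 2, 3 this gives 1, 3, 15, i.e., E[Z²] = σ², E[Z⁴] = 3σ⁴, E[Z⁶] = 15σ⁶.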
Exercise 3.3: Let X and Z be IID normalized Gaussian random variables. Let Y = |Z| Sgn(X), where
Sgn(X) is 1 if X ≥ 0 and −1 otherwise. Show that X and Y are each Gaussian, but are not jointly Gaussian.
Sketch the contours of equal joint probability density.
Solution: Note that Y has the magnitude of Z but the sign of X, so that X and Y are
either both positive or both negative, i.e., their joint density is nonzero only in the first and
third quadrants of the X, Y plane. Conditional on a given X, the conditional density of Y
is twice the conditional density of Z since both Z and −Z are mapped into the same Y.
Thus f_{XY}(x, y) = (1/π) exp(−(x² + y²)/2) for all x, y in the first or third quadrant.
[Sketch: the contours of equal joint density are circular arcs restricted to the first and third
quadrants.]
Exercise 3.4: a) Let X_1 ~ N(0, σ_1²) and let X_2 ~ N(0, σ_2²) be independent of X_1. Convolve the density
of X_1 with that of X_2 to show that X_1 + X_2 is Gaussian, N(0, σ_1² + σ_2²).
Solution: Let Z = X_1 + X_2. Since X_1 and X_2 are independent, the density of Z is the
convolution of the X_1 and X_2 densities. For initial simplicity, assume σ²_{X_1} = σ²_{X_2} = 1.

    f_Z(z) = f_{X_1}(z) ∗ f_{X_2}(z) = ∫_{−∞}^{∞} f_{X_1}(x) f_{X_2}(z − x) dx
           = ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2} (1/√(2π)) e^{−(z−x)²/2} dx
           = (1/2π) ∫_{−∞}^{∞} e^{−(x² − xz + z²/2)} dx
           = (1/2π) ∫_{−∞}^{∞} e^{−(x² − xz + z²/4) − z²/4} dx
           = (1/(2√π)) e^{−z²/4} ∫_{−∞}^{∞} (1/√π) e^{−(x − z/2)²} dx
           = (1/(2√π)) e^{−z²/4},

since the last integral integrates a Gaussian pdf with mean z/2 and variance 1/2, which
evaluates to 1. As expected, Z is Gaussian with zero mean and variance 2.
The ‘trick’ used here in the fourth equality above is called completing the square. The idea
is to take a quadratic expression such as x² + αxz and to add and subtract α²z²/4.
Then x² + αxz + α²z²/4 is (x + αz/2)², which leads to a Gaussian form that can be integrated.

Repeating the same steps for arbitrary σ²_{X_1} and σ²_{X_2}, we get the Gaussian density with mean
0 and variance σ²_{X_1} + σ²_{X_2}.
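The convolution above can also be checked by plain numerical integration (a sketch of my own; the grid range [−10, 10], step 0.001, and test points are arbitrary): a Riemann-sum convolution of two N(0, 1) densities should match the N(0, 2) density pointwise.

```python
import math

def gauss_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

# Riemann-sum convolution of two N(0,1) densities, evaluated at a few z.
dx = 0.001
xs = [-10.0 + i * dx for i in range(20001)]

def convolved(z):
    return dx * sum(gauss_pdf(x, 1.0) * gauss_pdf(z - x, 1.0) for x in xs)

pairs = [(convolved(z), gauss_pdf(z, 2.0)) for z in (0.0, 0.7, 1.5, 3.0)]
```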
b) Let W_1, W_2 be IID normalized Gaussian rv's. Show that a_1W_1 + a_2W_2 is Gaussian, N(0, a_1² + a_2²). Hint:
You could repeat all the equations of (a), but the insightful approach is to let X_i = a_iW_i for i = 1, 2 and
then use (a) directly.

Solution: Following the hint, σ²_{X_i} = a_i² for i = 1, 2, so a_1W_1 + a_2W_2 is N(0, a_1² + a_2²).
c) Combine (b) with induction to show that all linear combinations of IID normalized Gaussian rv's are
Gaussian.

Solution: The inductive hypothesis is that if {W_i; i ≥ 1} is a sequence of IID normal
rv's, if {a_i; i ≥ 1} is a sequence of numbers, and if Σ_{i=1}^n a_iW_i is N(0, Σ_{i=1}^n a_i²) for a given
n ≥ 1, then Σ_{i=1}^{n+1} a_iW_i is N(0, Σ_{i=1}^{n+1} a_i²). The basis for the induction was established
in (b). For the inductive step, let X = Σ_{i=1}^n a_iW_i. Now X is independent of W_{n+1} and
by the inductive hypothesis X ~ N(0, Σ_{i=1}^n a_i²). From (a), X + a_{n+1}W_{n+1} ~ N(0, Σ_{i=1}^{n+1} a_i²).
This establishes the inductive step, so Σ_{i=1}^n a_iW_i ~ N(0, Σ_{i=1}^n a_i²) for all n, i.e., all linear
combinations of IID normalized Gaussian rv's are Gaussian.
Exercise 3.5: a) Let U be an n-rv with mean m and covariance [K] whose MGF is given by (3.18). Let
X = r^T U for an arbitrary real vector r. Show that the MGF of X is given by g_X(r) = exp(rE[X] + r²σ²_X/2)
and relate E[X] and σ²_X to m and [K].
Solution: Note that X is defined in terms of a vector r and the argument of the MGF is
an unrelated real number r. To reduce confusion, we use s as the argument of the MGF.
Note also that it is not directly given whether either U or X are Gaussian. However, we
are given that (3.18) holds, so we have

    g_X(s) = E[e^{sX}] = E[e^{s r^T U}] = g_U(sr) = exp(s r^T m + s² r^T[K]r / 2).        (A.44)

Since X = r^T U, we have E[X] = r^T m and

    σ²_X = E[r^T (U − m)(U − m)^T r] = r^T [K] r.

Substituting this into (A.44), we get g_X(s) = exp(sE[X] + s²σ²_X/2).
b) Show that U is a Gaussian rv.
Solution: A basic fact that we use (but do not prove) is that if a rv has the MGF of a
Gaussian rv, then that rv is in fact Gaussian. The point of (b) is to show that this implies
the same result for a Gaussian random vector. We showed in (a) that X = r T U has the
MGF of a Gaussian rv, and thus is Gaussian. This is true for all choices of r , and thus
shows that all linear combinations of U are Gaussian. Thus from Theorem 3.4.8 (trivially
generalized to non-zero mean), U itself is Gaussian.
Exercise 3.6: a) Let Z ~ N(0, [K]) be n-dimensional. By expanding in a power series in (1/2)r^T[K]r,
show that

    g_Z(r) = exp(r^T[K]r / 2) = 1 + (Σ_{j,k} r_j r_k K_{jk})/2 + · · · + (Σ_{j,k} r_j r_k K_{jk})^m/(2^m m!) + · · · .        (A.45)

Solution: This looks complicated, but is really quite simple. The first equality above is
the standard result in (3.14) for the MGF of jointly Gaussian rv's. For the second equality,
note that ordinary vector/matrix multiplication shows that r^T[K]r = Σ_{j,k} r_j r_k K_{jk}. For
any given r, this is simply a number, and we can use the series expansion exp x =
1 + x + x²/2 + · · · (which always converges) on that number divided by 2, getting (A.45).
b) By expanding e^{r_j Z_j} in a power series in r_j Z_j for each j, show that

    g_Z(r) = E[exp(Σ_j r_j Z_j)] = Σ_{ℓ_1=0}^∞ · · · Σ_{ℓ_n=0}^∞ (r_1^{ℓ_1}/ℓ_1!) · · · (r_n^{ℓ_n}/ℓ_n!) E[Z_1^{ℓ_1} · · · Z_n^{ℓ_n}].        (A.46)

Solution: Starting with the definition of g_Z(r),

    g_Z(r) = E[exp(Σ_{j=1}^n r_j Z_j)] = E[Π_{j=1}^n exp(r_j Z_j)] = E[Π_{j=1}^n Σ_{ℓ_j=0}^∞ (r_j Z_j)^{ℓ_j}/ℓ_j!],

where we have used the power series expansion of e^x in the final equality. This is a product
of n sums and (A.46) results from writing out each sum separately and then interchanging
the expectation with the multiple sums. This interchange is valid but we won't justify it here.
c) Let D = {i_1, i_2, . . . , i_{2m}} be a set of 2m distinct integers each between 1 and n and let (j_1, . . . , j_{2m}) be
a permutation of D. Consider the term r_{j_1} r_{j_2} · · · r_{j_{2m}} E[Z_{j_1} Z_{j_2} · · · Z_{j_{2m}}] in (A.46). By comparing with the
same set of terms in (A.45) containing the same product r_{j_1} r_{j_2} · · · r_{j_{2m}}, show that

    E[Z_{i_1} Z_{i_2} · · · Z_{i_{2m}}] = Σ_{j_1 j_2 ... j_{2m}} K_{j_1 j_2} K_{j_3 j_4} · · · K_{j_{2m−1} j_{2m}} / (2^m m!),        (A.47)
where the sum is over all permutations (j1 , j2 , . . . , j2m ) of D.
Solution: This looks simple, but is really quite complicated. Let us start with m = 1 so
that D = (i_1, i_2) for some choice of i_1, i_2, i_1 ≠ i_2. The two permutations of D are (i_1, i_2)
and (i_2, i_1). The corresponding term in (A.46) is that with ℓ_{i_1} = 1, ℓ_{i_2} = 1, and ℓ_j = 0 for
j ≠ i_1, i_2, i.e., it is the term r_{i_1} r_{i_2} E[Z_{i_1} Z_{i_2}] = r_{i_1} r_{i_2} K_{i_1 i_2}. Both permutations correspond to
the same term in (A.46). In (A.45) the two permutations each correspond to a separate but
equal term, r_{i_1} r_{i_2} K_{i_1 i_2}/2 and r_{i_2} r_{i_1} K_{i_2 i_1}/2. Cancelling out the term r_{i_1} r_{i_2}, we get (A.47)
for the case m = 1.

The analysis becomes more complex for m > 1. We observe three points from the above
analysis for m = 1. First, each single term in (A.46) corresponds to multiple terms in
(A.45). Second, we are only looking at terms in (A.46) for which each ℓ_i is either 1 or 0.
In (e), we will find how to avoid analyzing the other terms. Finally, we are relying on the
fact that g_Z(r) can be represented as a unique multinomial expansion in r, so that we can
compare different ways of calculating this multinomial.

For arbitrary m ≤ n/2, let i_1, . . . , i_{2m} be distinct integers between 1 and n. We order
these integers i_1 < i_2 < · · · < i_{2m} to be specific. As before, consider the term in (A.46)
where ℓ_{i_1} = 1, ℓ_{i_2} = 1, . . . , ℓ_{i_{2m}} = 1 and ℓ_i = 0 otherwise. The specified term is then
r_{i_1} r_{i_2} · · · r_{i_{2m}} E[Z_{i_1} Z_{i_2} · · · Z_{i_{2m}}].

Next consider the form (Σ_{j,k} r_j r_k K_{jk})^m in (A.45) and in particular consider the terms in
this sum for which (j_1, k_1, j_2, k_2, . . . , j_m, k_m) is a permutation of i_1, . . . , i_{2m}. Each of these
terms, as a function of r_1, . . . , r_n, is the product r_{i_1} · · · r_{i_{2m}} times a product of correlation
terms, and thus the sum of all these terms must be equal to r_{i_1} r_{i_2} · · · r_{i_{2m}} E[Z_{i_1} Z_{i_2} · · · Z_{i_{2m}}].
Cancelling out r_{i_1} r_{i_2} · · · r_{i_{2m}}, we have (A.47).
d) Find the number of permutations of D containing the same set of unordered pairs ({j_1, j_2}, . . . , {j_{2m−1}, j_{2m}}).
For example, ({1, 2}, {3, 4}) is the same set of unordered pairs as ({3, 4}, {2, 1}). Show that

    E[Z_{j_1} Z_{j_2} · · · Z_{j_{2m}}] = Σ_{j_1, j_2, ..., j_{2m}} K_{j_1 j_2} K_{j_3 j_4} · · · K_{j_{2m−1} j_{2m}},        (A.48)

where the sum is over distinct sets of unordered pairs of the set D. Note: another way to say the same
thing is that the sum is over the set of all permutations of D for which j_{2k−1} < j_{2k} for 1 ≤ k ≤ m and
j_{2k−1} < j_{2k+1} for 1 ≤ k ≤ m − 1.
Solution: There are 2 ways to order the elements within each pair, so 2^m ways to order
the elements within all the m pairs. There are then m! ways to order the m pairs. Thus
there are 2^m m! permutations in (A.47) that all correspond to the same set of unordered
pairs. This cancels the denominator in (A.47), giving rise to (A.48). Note that this is valid
only for 2m distinct rv's, although they can have arbitrary correlations. We see in (e) how
to apply this to arbitrary powers of individual rv's.
e) To find E[Z_1^{j_1} · · · Z_n^{j_n}], where j_1 + j_2 + · · · + j_n = 2m, construct the random variables U_1, . . . , U_{2m}, where
U_1, . . . , U_{j_1} are all identically equal to Z_1, where U_{j_1+1}, . . . , U_{j_1+j_2} are identically equal to Z_2, etc., and use
(A.48) to find E[U_1 U_2 · · · U_{2m}]. Use this formula to find E[Z_1² Z_2 Z_3], E[Z_1² Z_2²], and E[Z_1⁴].

Solution: All of the expected values to be found involve E[U_1 U_2 · · · U_{2m}] for m = 2. From
(A.48),

    E[U_1 U_2 U_3 U_4] = E[U_1 U_2] E[U_3 U_4] + E[U_1 U_3] E[U_2 U_4] + E[U_1 U_4] E[U_2 U_3].

Applying this to the 3 desired expectations,

    E[Z_1² Z_2 Z_3] = E[Z_1²] E[Z_2 Z_3] + 2 E[Z_1 Z_2] E[Z_1 Z_3],
    E[Z_1² Z_2²] = E[Z_1²] E[Z_2²] + 2 (E[Z_1 Z_2])²,
    E[Z_1⁴] = 3 (E[Z_1²])².
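The middle formula is easy to test by simulation (my own sketch, not part of the original solution; ρ = 0.6 and the sample size are arbitrary). For normalized Z_1, Z_2 with correlation ρ it predicts E[Z_1² Z_2²] = 1 + 2ρ²:

```python
import math
import random

random.seed(8)
rho, n = 0.6, 200_000
acc = 0.0
for _ in range(n):
    z1 = random.gauss(0.0, 1.0)
    # z2 is a normalized Gaussian with correlation rho to z1
    z2 = rho * z1 + math.sqrt(1 - rho * rho) * random.gauss(0.0, 1.0)
    acc += z1 * z1 * z2 * z2

empirical = acc / n
predicted = 1.0 + 2.0 * rho * rho   # E[Z1^2]E[Z2^2] + 2 E[Z1 Z2]^2
```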
Exercise 3.7: Let [Q] be an orthonormal matrix. Show that the squared distance between any two
vectors z and y is equal to the squared distance between [Q]z and [Q]y.

Solution: The squared distance, say d², between z and y is d² = (z − y)^T(z − y). Letting
x = z − y, we have d² = x^T x. The squared distance, say d_1², between [Q]z and [Q]y is

    d_1² = ([Q]z − [Q]y)^T ([Q]z − [Q]y) = ([Q]x)^T ([Q]x) = x^T [Q]^T [Q] x.

From (3.26), [Q^T] = [Q^{−1}], so [Q]^T[Q] = [I] and d_1² = d².
Exercise 3.8: a) Let [K] = [.75 .25; .25 .75]. Show that 1 and 1/2 are eigenvalues of [K] and find the normalized
eigenvectors. Express [K] as [QΛQ^{−1}] where [Λ] is diagonal and [Q] is orthonormal.

Solution: Note that [K − λI] = [.75−λ .25; .25 .75−λ]. By inspection, this is singular for λ = 1
and λ = 1/2. We easily see that (1, 1)^T is an eigenvector for λ = 1, and this becomes
(√(1/2), √(1/2))^T when normalized. Similarly, (1, −1)^T, normalized as (√(1/2), −√(1/2))^T, is
an eigenvector for λ = 1/2. Thus

    [Λ] = [1 0; 0 1/2],     [Q] = [√(1/2) √(1/2); √(1/2) −√(1/2)],

and [Q^{−1}] = [Q].
b) Let [K′] = α[K] for real α ≠ 0. Find the eigenvalues and eigenvectors of [K′]. Don't use brute force—
think!

Solution: Note that [K′ − λ′I] = α[K − (λ′/α)I]. Thus if λ′/α is an eigenvalue of [K], then
λ′ is an eigenvalue of [K′]. Since 1 and 1/2 are the eigenvalues of [K], the eigenvalues of
[K′] are α and α/2. The eigenvectors remain unchanged.
c) Find the eigenvalues and eigenvectors of [K^m], where [K^m] is the mth power of [K].

Solution: Let q_j, j = 1, 2, be an eigenvector of [K] with eigenvalue λ_j. Then

    [K^m] q_j = λ_j [K^{m−1}] q_j = · · · = λ_j^m q_j.

Thus the eigenvalues of [K^m] are 1 and (1/2)^m. The eigenvectors are unchanged.
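All three parts can be verified with a few lines of arithmetic (a sketch of my own, not part of the original solution):

```python
import math

K = [[0.75, 0.25], [0.25, 0.75]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

s = math.sqrt(0.5)
q1, q2 = [s, s], [s, -s]     # normalized eigenvectors

Kq1 = matvec(K, q1)          # should equal 1.0 * q1
Kq2 = matvec(K, q2)          # should equal 0.5 * q2

v = q2
for _ in range(3):           # [K^3] q2 should equal (1/2)^3 q2
    v = matvec(K, v)
```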
Exercise 3.9: Let X and Y be jointly Gaussian with means m_X, m_Y, variances σ²_X, σ²_Y, and normalized
covariance ρ. Find the conditional density f_{X|Y}(x | y).

Solution: This was calculated in a straightforward way in (3.37) for the case in which X
and Y are zero mean. Thus, letting X̃ and Ỹ be the fluctuations of X and Y respectively,
the conditional density of the fluctuations is given by

    f_{X̃|Ỹ}(x̃ | ỹ) = (1/(√(2π(1−ρ²)) σ_X)) exp( −[x̃ − ρ(σ_X/σ_Y)ỹ]² / (2σ²_X(1−ρ²)) ).

Since X = X̃ + m_X and Y = Ỹ + m_Y, we have

    f_{X|Y}(x | y) = (1/(√(2π(1−ρ²)) σ_X)) exp( −[x − m_X − ρ(σ_X/σ_Y)(y − m_Y)]² / (2σ²_X(1−ρ²)) ).
Exercise 3.10: a) Let X and Y be zero-mean jointly Gaussian with variances σ²_X, σ²_Y, and normalized
covariance ρ. Let V = Y³. Find the conditional density f_{X|V}(x | v). Hint: This requires no computation.

Solution: Note that v = y³ for y ∈ R is a one-to-one mapping. It follows that if V = v,
then Y = v^{1/3}. Thus f_{X|V}(x|v) can be found directly from f_{X|Y}(x|y) as given in (3.37) by
substituting v^{1/3} for y, i.e.,

    f_{X|V}(x | v) = (1/(√(2π(1−ρ²)) σ_X)) exp( −[x − ρ(σ_X/σ_Y)v^{1/3}]² / (2σ²_X(1−ρ²)) ).
b) Let U = Y² and find the conditional density of f_{X|U}(x | u). Hint: first understand why this is harder
than (a).

Solution: Note that u = y² for y ∈ R is not one-to-one. If U = u, then Y is either u^{1/2} or
−u^{1/2}. We then have

    f_{X|U}(x|u) = f_{XU}(x, u)/f_U(u) = [f_{XY}(x, √u) + f_{XY}(x, −√u)] / [f_Y(√u) + f_Y(−√u)].

Since f_Y(√u) = f_Y(−√u), this can be rewritten as

    f_{X|U}(x|u) = [f_{X|Y}(x | √u) + f_{X|Y}(x | −√u)] / 2.

Substituting these terms into (3.37) gives a rather ugly answer. The point here is not the
answer but rather the approach to finding a conditional probability for a Gaussian problem
when the conditioning rv is a non-one-to-one function of a Gaussian rv.
Exercise 3.11: a) Let (X^T, Y^T) have a non-singular covariance matrix [K]. Show that [K_X] and [K_Y]
are positive definite, and thus non-singular.

Solution: Let X have dimension n and Y have dimension m. If [K_X] is not positive
definite but only semi-definite, then there is a b ≠ 0 such that b^T[K_X]b = 0. Defining b̂
as the (m+n)-dimensional vector with b as the first n components and zeros as the last m
components, we see that b̂^T[K]b̂ = 0, so that [K] is not positive definite. The contrapositive
of this is that if [K] is positive definite, then [K_X] is also positive definite.

The same argument shows that if [K] is positive definite, then [K_Y] is also positive definite.
In fact, this argument shows that all submatrices of [K] that are centered around the main
diagonal are positive definite.
b) Show that the matrices [B] and [D] in (3.39) are also positive definite and thus non-singular.
Solution: We can represent [K] in the form [QΛQ^{−1}], where [Q] is a matrix whose columns
are orthonormal eigenvectors of [K] and [Λ] is the diagonal matrix of eigenvalues. Assuming
that [K] is positive definite, the eigenvalues are all strictly positive. It is easily seen that the
inverse of [K] can be represented as [K^{−1}] = [QΛ^{−1}Q^{−1}], where [Λ^{−1}] is the diagonal matrix
of the reciprocals of the eigenvalues. Since the eigenvalues are positive, their reciprocals are
also, so [K^{−1}] is also positive definite. From (a), then, the matrices [B] and [D] are also
positive definite.
Exercise 3.12: Let X and Y be jointly-Gaussian rv's with means m_X and m_Y, covariance matrices
[K_X] and [K_Y], and cross covariance matrix [K_{X·Y}]. Find the conditional probability density f_{X|Y}(x | y).
Assume that the covariance of (X^T, Y^T) is non-singular. Hint: think of the fluctuations of X and Y.

Solution: Let X̃ = X − m_X and Ỹ = Y − m_Y be the fluctuations of X and Y. Then
X̃ and Ỹ have the same covariances and cross covariance as X and Y. They are also
zero-mean and jointly Gaussian, so their density is given by (3.40), i.e.,

    f_{X̃|Ỹ}(x̃ | ỹ) = exp{ −(1/2) (x̃ + [B^{−1}C]ỹ)^T [B] (x̃ + [B^{−1}C]ỹ) } / ((2π)^{n/2} √(det[B^{−1}])).

We now get f_{X|Y}(x | y) simply by translation,

    f_{X|Y}(x | y) = exp{ −(1/2) (x − m_X + [B^{−1}C](y − m_Y))^T [B] (x − m_X + [B^{−1}C](y − m_Y)) } / ((2π)^{n/2} √(det[B^{−1}])).
Exercise 3.13: a) Let W be a normalized IID Gaussian n-rv and let Y be a Gaussian m-rv. Suppose
we would like to choose Y so that the joint covariance E[W Y^T] is some arbitrary real-valued n × m matrix
[K]. Find the matrix [A] such that Y = [A]W achieves the desired joint covariance. Note: this shows that
any real-valued n × m matrix is the joint covariance matrix for some choice of random vectors.
Solution: Y = [A]W means that Y_i = Σ_{j=1}^n a_{ij} W_j for each i. Since the W_j are normalized
and IID, E[Y_i W_j] = a_{ij}. This in turn can be rewritten as E[Y W^T] = [A]. Thus we choose
[A] = [K^T]. Note that the desired matrix [K] also determines the covariance [K_Y], i.e.,

    [K_Y] = E[Y Y^T] = E[A W W^T A^T] = [A A^T].

In other words, we can choose [K] arbitrarily, but this also determines [K_Y].
b) Let Z be a zero-mean Gaussian n-rv with non-singular covariance [K_Z], and let Y be a Gaussian m-rv.
Suppose we would like the joint covariance E[Z Y^T] to be some arbitrary n × m matrix [K′]. Find the
matrix [B] such that Y = [B]Z achieves the desired joint covariance. Note: this shows that any real-valued
n × m matrix is the joint covariance matrix for some choice of random vectors Z and Y where [K_Z] is given
(and non-singular).
Solution: For any given m × n matrix [B], if we choose Y = [B]Z, then

    E[Z Y^T] = E[Z Z^T [B^T]] = [K_Z][B^T].

Thus if we want to set this equal to some given matrix [K′], it is sufficient to choose
[B] = [K′]^T [K_Z^{−1}].
c) Now assume that Z has a singular covariance matrix in (b). Explain the constraints this places on possible
choices for the joint covariance E[Z Y^T]. Hint: your solution should involve the eigenvectors of [K_Z].
Solution: If [KZ ] is singular, then there are one or more eigenvectors of [KZ ] of eigenvalue
0, i.e., vectors b such that E [Z Z T ] b = 0. For such b, we must have E [(b T Z )(Z T b)] = 0,
i.e., the first and second moments of b T Z must be 0. For each such b, the transpose b T
must satisfy b T E [Z Y T ] = 0. This means that each linearly independent eigenvector b of
eigenvalue 0 provides a linear constraint b T [K 0 ] = 0 on [K 0 ].
Exercise 3.14: a) Let W = (W_1, W_2, . . . , W_{2n})^T be a 2n-dimensional IID normalized Gaussian rv. Let
S_{2n} = W_1² + W_2² + · · · + W_{2n}². Show that S_{2n} is an nth order Erlang rv with parameter 1/2, i.e., that

    f_{S_{2n}}(s) = 2^{−n} s^{n−1} e^{−s/2} / (n−1)!.        (A.49)

Hint: look at S_2 from Exercise 3.1.

Solution: Exercise 3.1 showed that S_2 is an exponentially distributed random variable
of mean 2 (i.e., parameter λ = 1/2). Thus S_{2n} is the sum of n exponential rv's, each of
parameter 1/2. The Erlang density (see Section 2.2) is defined as the density of the sum
of n IID exponential rv's. Its density is given in Table 1.1 and is easily calculated by
convolving n exponential densities. The result (for λ = 1/2) is given in (A.49).
b) Let R_{2n} = √S_{2n}. Find the probability density of R_{2n}.

Solution: Since R²_{2n} = S_{2n}, we have

    f_{R_{2n}}(r) = [2^{−n} r^{2(n−1)} e^{−r²/2} / (n−1)!] · 2r = 2^{−(n−1)} r^{2n−1} e^{−r²/2} / (n−1)!.
c) Let v_{2n}(r) be the volume of a 2n-dimensional sphere of radius r and let b_{2n}(r) be the surface area of that
sphere, i.e., b_{2n}(r) = dv_{2n}(r)/dr. The point of this exercise is to show how to calculate these quantities. By
considering an infinitesimally thin spherical shell of thickness δ at radius r, show that

    f_{R_{2n}}(r) = b_{2n}(r) f_W(w)|_{w: w^T w = r²}.        (A.50)

Solution: Note that f_{R_{2n}}(r)δ is the probability that R_{2n} lies between r and r+δ (more
precisely, f_{R_{2n}}(r) is lim_{δ→0} (1/δ) Pr{r ≤ R_{2n} < r+δ}). In the same infinitesimal sense, f_W(w) is
constant over this infinitesimal 2n-dimensional spherical shell. Finally, in the same
infinitesimal sense, b_{2n}(r)δ is the volume of this shell. We then see that (A.50) states that the
probability of this shell is equal to the volume of the shell times the (constant) probability
per unit volume of the shell. The equation is exact since δ has been cancelled out.
d) Calculate b_{2n}(r) and v_{2n}(r). Note that for any fixed δ ≪ r, the limiting ratio as n → ∞ of the volume
of a sphere of radius r to that of a spherical shell of width δ and outer radius r is 1.

Solution: Note that f_W(w) = (√(2π))^{−2n} exp(−r²/2) over the shell where w^T w = r².
Thus, from (A.50),

    b_{2n}(r) = (2π)^n 2^{−(n−1)} r^{2n−1} e^{−r²/2} / ((n−1)! e^{−r²/2}) = 2π^n r^{2n−1} / (n−1)!.

We find v_{2n}(r) by integrating b_{2n}(r), getting

    v_{2n}(r) = π^n r^{2n} / n!.

The final statement compares the volume within δ of the surface to the entire volume. This
requires us to look at the shell between r−δ and r, whose volume is v_{2n}(r) − v_{2n}(r−δ). The ratio of
this shell volume to the whole volume is 1 − (1 − δ/r)^{2n}, which approaches 1 as n → ∞. The shell
between r and r+δ has a much larger volume than v_{2n}(r) for large n. If you lived in large
dimensional space, you could not store enough paint in your house to paint the outside.
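The formula v_{2n}(r) = π^n r^{2n}/n! and the shell-concentration remark are concrete enough to tabulate (my own illustration; the dimension 100 and shell width ε = 0.01 are arbitrary choices):

```python
import math

def v2n(n, r):
    # Volume of a 2n-dimensional ball of radius r: pi^n r^{2n} / n!
    return math.pi ** n * r ** (2 * n) / math.factorial(n)

vol_2d = v2n(1, 1.0)     # pi, the area of the unit disk
vol_4d = v2n(2, 1.0)     # pi^2 / 2
vol_100d = v2n(50, 1.0)  # essentially zero: high-dimensional unit balls are tiny

# Fraction of the volume within eps of the surface, in dimension 2n = 100:
eps = 0.01
shell_fraction = 1 - (1 - eps) ** 100
```

Even a shell only 1% of the radius deep already holds most of a 100-dimensional ball's volume.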
Exercise 3.15: a) Solve directly for [B], [C], and [D] in (3.39) for the one dimensional case where
n = m = 1. Show that (3.40) agrees with (3.37)
Solution: For X and Y one-dimensional, the covariance matrix of (X, Y)^T is

[K] = [ [K_X]       [K_{X·Y}] ]   =   [ σ_X²        ρσ_X σ_Y ]
      [ [K_{X·Y}]^T  [K_Y]    ]       [ ρσ_X σ_Y    σ_Y²     ].

With the assumption in (3.39) that [K] is positive definite, [K^{−1}] must exist and we can solve directly for B, C in

[K^{−1}] = [ B  C ]
           [ C  D ]

by multiplying out the left two terms in [K][K^{−1}] = [I]. We get σ_X² B + ρσ_X σ_Y C = 1 and ρσ_X σ_Y B + σ_Y² C = 0. From the second equation, C = −ρσ_X B/σ_Y. Substituting this into the first equation, B = 1/[σ_X²(1 − ρ²)]. Substituting this into the equation for C, C = −ρ/[σ_X σ_Y (1 − ρ²)]. From symmetry or from the right two terms, we solve for D, getting

[K^{−1}] = (1/(1 − ρ²)) [ σ_X^{−2}              −ρ σ_X^{−1} σ_Y^{−1} ]
                         [ −ρ σ_X^{−1} σ_Y^{−1}   σ_Y^{−2}           ].

Finally, noting that B^{−1}C = −ρσ_X/σ_Y, it is simple but slightly tedious to substitute these terms into (3.40) to get (3.37).
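As a quick numeric sanity check (not part of the original solution), the closed-form B, C, D above can be compared against a direct 2×2 inversion of [K]; the values of σ_X, σ_Y, ρ below are arbitrary illustrative choices.

```python
# Hypothetical illustrative values (not from the text)
sx, sy, rho = 1.5, 0.7, 0.4

# Covariance matrix [K] of (X, Y)^T
K = [[sx**2, rho*sx*sy],
     [rho*sx*sy, sy**2]]

# Direct 2x2 inversion: [a b; c d]^{-1} = (1/det) [d -b; -c a]
det = K[0][0]*K[1][1] - K[0][1]*K[1][0]
Kinv_direct = [[K[1][1]/det, -K[0][1]/det],
               [-K[1][0]/det, K[0][0]/det]]

# Closed-form B, C, D derived in the solution
B = 1.0/(sx**2*(1 - rho**2))
C = -rho/(sx*sy*(1 - rho**2))
D = 1.0/(sy**2*(1 - rho**2))

assert abs(Kinv_direct[0][0] - B) < 1e-12
assert abs(Kinv_direct[0][1] - C) < 1e-12
assert abs(Kinv_direct[1][1] - D) < 1e-12
print("B, C, D match the direct inverse")
```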
Exercise 3.16: a) Express [B], [C], and [D], as defined in (3.39), in terms of [K_X], [K_Y], and [K_{X·Y}] by multiplying the block expression for [K] by that for [K]^{−1}. You can check your solutions against those in (3.46) to (3.48). Hint: You can solve for [B] and [C] by looking at only the left two of the four block equations in [K][K^{−1}]. You can use the symmetry between X and Y to solve for [D].
Solution: One reason for going through this exercise, for those not very familiar with matrix manipulations, is to realize that algebraic manipulations on matrices are very similar to those on equations of numbers and real variables. One major difference is that matrix multiplication is not in general commutative (i.e., [A][B] ≠ [B][A] in many cases), and thus premultiplication and postmultiplication are different. Another is that invertibility involves much more than being non-zero. Recognizing this, we proceed with slightly guided plug and chug.

Multiplying out two of the block terms in [K][K^{−1}], we get [K_X B] + [K_{X·Y} C^T] = [I] and [K_{X·Y}^T B] + [K_Y C^T] = 0. These involve only two of the unknown terms, and we now solve for those terms. Recognizing that [K_X] and [K_Y] are invertible, we can rearrange the second equation as

[C^T] = −[K_Y^{−1} K_{X·Y}^T B].

Substituting this into the first equation, we get

([K_X] − [K_{X·Y} K_Y^{−1} K_{X·Y}^T]) [B] = [I].

Now [B] must be invertible (see Exercise 3.11c). Thus the matrix preceding [B] above is also invertible, so

[B] = ([K_X] − [K_{X·Y} K_Y^{−1} K_{X·Y}^T])^{−1}.

This agrees with the solution (derived very differently) in (3.46). Next, to solve for [C], we take the transpose of [C^T] above, leading to [C] = −[B K_{X·Y} K_Y^{−1}]. This agrees with (3.47). We could solve for [D] in the same way, but it is easier to use the symmetry and simply interchange the roles of X and Y to get (3.48).
b) Use your result in (a) for [C] plus the symmetry between X and Y to show that

[B K_{X·Y} K_Y^{−1}] = [K_X^{−1} K_{X·Y} D].

Solution: The quantity on the left above is −[C] as derived in (a). By using the symmetry between X and Y, we see that [D K_{X·Y}^T K_X^{−1}] is −[C^T], and taking the transpose completes the argument.
c) Show that [K_V^{−1} G] = [H^T K_Z^{−1}] for the formulations X = [G]Y + V and Y = [H]X + Z, where X and Y are zero-mean, jointly Gaussian and have a non-singular combined covariance matrix. Hint: This is almost trivial from (b), (3.43), (3.44), and the symmetry.

Solution: From (3.43), [K_V] = [B^{−1}], and from (3.44), [G] = [K_{X·Y} K_Y^{−1}]. Substituting these into the left side of (b) and the symmetric relations with X and Y interchanged into the right side completes the demonstration.
Exercise 3.17: Let X and Z be statistically independent Gaussian rv's of arbitrary dimension n and m respectively. Let Y = [H]X + Z where [H] is an arbitrary real m × n matrix.

a) Explain why X_1, . . . , X_n, Z_1, . . . , Z_m must be jointly Gaussian rv's and then why X_1, . . . , X_n, Y_1, . . . , Y_m must be jointly Gaussian.
Solution: Let a be an arbitrary real n-vector and b an arbitrary real m-vector. Then a^T X is a Gaussian rv. Also b^T Z is a Gaussian rv and is independent of a^T X. Thus a^T X + b^T Z is a Gaussian rv. Since a and b are arbitrary, this shows that all linear combinations of X_1, . . . , X_n, Z_1, . . . , Z_m are Gaussian, and thus that X_1, . . . , X_n, Z_1, . . . , Z_m are jointly Gaussian. In the same way, any linear combination a^T X + b^T Y = a^T X + b^T [H]X + b^T Z is the sum of two independent rv's, (a^T + b^T [H])X and b^T Z, and is thus Gaussian. Hence X_1, . . . , X_n, Y_1, . . . , Y_m are jointly Gaussian.
b) Show that if [KX ] and [KZ ] are non-singular, then the combined matrix [K] for (X1 , . . . , Xn , Y1 , . . . , Ym )T
must be non-singular.
Solution: For any random vector U with covariance matrix E[U U^T], positive definiteness is the same as the condition that every linear combination c^T U with c ≠ 0 is a rv with positive variance (i.e., c^T E[U U^T] c > 0). For the case here, a^T X + b^T Y = (a^T + b^T[H])X + b^T Z. If b ≠ 0, then VAR[b^T Z] > 0 and b^T Z is one of the independent rv's being added, so the variance is positive. If b = 0, then the other rv is a^T X, which has a positive variance since a must then be nonzero.
Exercise 3.18: a) Verify (3.56) for k = 0 by the use of induction on (3.54).
Solution: We start with E[X²(1)] = E[W²(0)] = σ², which is the basis of the induction and which satisfies (3.56) for n = 1, k = 0. Next assume that E[X²(n)] = σ²(1 − α^{2n})/(1 − α²) for given n ≥ 1. Using (3.54) and the independence of X(n) and W(n), we have

E[X²(n+1)] = α²E[X²(n)] + E[W²(n)] = σ²α²(1 − α^{2n})/(1 − α²) + σ²
           = [σ²α² − σ²α^{2(n+1)} + σ²(1 − α²)]/(1 − α²) = σ²(1 − α^{2(n+1)})/(1 − α²).

b) Verify (3.56) for k = 0 directly from (3.55).

Solution: Since W(0), W(1), . . . are independent, we can take the expected value of the square of (3.55) to get

E[X²(n)] = σ² + α²σ² + · · · + α^{2(n−1)}σ² = σ²(1 − α^{2n})/(1 − α²),

where the last equality can be verified by dividing 1 − α^{2n} by 1 − α².

c) Verify (3.56) for k > 0 by using induction on k.

Solution: We have demonstrated the basis of the induction (with k = 0) for all n in (a) and (b). Now assume (3.56) for any given k ≥ 0, i.e.,

E[X(n)X(n+k)] = σ²α^k(1 − α^{2n})/(1 − α²).    (A.51)

For k + 1, we then have

E[X(n)X(n+k+1)] = E[X(n)(αX(n+k) + W(n+k))] = αE[X(n)X(n+k)] = σ²α^{k+1}(1 − α^{2n})/(1 − α²).
d) Verify that

lim_{n→∞} E[X(n)X(n+k)] = σ²α^k/(1 − α²).

Solution: Since |α| < 1 by assumption, lim_{n→∞} α^{2n} = 0. The result is then immediate from (A.51).
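The closed form for E[X²(n)] can be checked numerically against the recursion E[X²(n+1)] = α²E[X²(n)] + σ² implied by (3.54); the values of α and σ² below are arbitrary illustrative choices, not from the text.

```python
alpha, sigma2 = 0.8, 2.0   # hypothetical values; sigma2 = E[W^2(n)]

# Second moment via the recursion E[X^2(n+1)] = alpha^2 E[X^2(n)] + sigma2,
# with E[X^2(0)] = 0, so E[X^2(1)] = sigma2 (the induction basis).
m = 0.0
for n in range(1, 21):
    m = alpha**2 * m + sigma2
    closed = sigma2 * (1 - alpha**(2*n)) / (1 - alpha**2)
    assert abs(m - closed) < 1e-12
print("recursion matches closed form for n = 1..20")
```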
Exercise 3.19: Let {X(t); t ∈ R} be defined by X(t) = tA for all t ∈ R where A ~ N(0, 1). Show that this is a Gaussian process. Find its mean for each t and find its covariance function. Note: The purpose of this exercise is to show that Gaussian processes can be very degenerate and trivial.

Solution: Since X(t_1), . . . , X(t_n) are all linear combinations of A, which is N(0, 1), {X(t); t ∈ R} is a Gaussian process. Since E[A] = 0, we see that E[X(t)] = tE[A] = 0. Similarly, E[X(t)X(τ)] = tτ E[A²] = tτ. The sample functions of this process are straight lines through the origin with the slope of the line given by the sample value of A.
Exercise 3.20: Let {X(t); t ≥ 0} be a stochastic process with independent and stationary increments and let X(0) be an arbitrary random variable. Show that E[X(t)] = E[X(0)] + t E[X(1) − X(0)] and that

K_X(t, τ) = VAR[X(0)] + min(t, τ) [VAR[X(1)] − VAR[X(0)]].
Solution: For integer n > 0, we can express

X(n) = X(0) + Σ_{i=1}^n [X(i) − X(i−1)].

Using the stationary increment property, E[X(i) − X(i−1)] = E[X(1) − X(0)], so E[X(n)] = E[X(0)] + nE[X(1) − X(0)]. Extending this to increments of 1/m for arbitrary integers n, m > 0,

X(n/m) = X(0) + Σ_{i=1}^n [X(i/m) − X((i−1)/m)].

Taking expected values and using the stationary increment property, E[X(n/m)] = E[X(0)] + nE[X(1/m) − X(0)]. For n = m, this is E[X(1)] − E[X(0)] = mE[X(1/m) − X(0)], so we have

E[X(n/m)] = E[X(0)] + (n/m) E[X(1) − X(0)].

This is the desired result for E[X(t)] for all positive rational t. Unfortunately this can not be extended to irrational t without making a further assumption about the process, and being concerned about it appears to be mathematical fetishism, especially since having a nonzero mean is somewhat silly for this kind of process.
For the covariance, we can express the fluctuations of the rv's as

X̃(n) = X̃(0) + Σ_{i=1}^n [X̃(i) − X̃(i−1)].

The increments are independent and stationary, and we also assume the increments are independent of X(0). This should have been stated explicitly in the problem statement, but the exercise becomes more challenging when one has to figure that out. Assuming k > n, we then have

K_X(n, k) = E[X̃(n)X̃(k)] = VAR[X(0)] + nVAR[X(1) − X(0)].

For n = 1, we see that VAR[X(1) − X(0)] = VAR[X(1)] − VAR[X(0)], so the result follows for t and τ integer. The result for t and τ rational follows as before.
Exercise 3.21: a) Let X(t) = R cos(2πft + θ) where R is a Rayleigh rv and the rv θ is independent of R and uniformly distributed over the interval 0 to 2π. Show that E[X(t)] = 0.

Solution: This can be done by standard (and quite tedious) manipulations, but if we first look at t = 0 and condition on a sample value of R, we are simply looking at cos(θ), and since θ is uniform over [0, 2π), it seems almost obvious that the mean should be 0. To capture this intuition, note that cos(θ) = −cos(θ + π). Since θ is uniform between 0 and 2π, E[cos(θ)] = E[cos(θ + π)], so that E[cos(θ)] = 0. The same argument works for any t, so the result follows.
b) Show that E[X(t)X(t+τ)] = (1/2)E[R²] cos(2πfτ).

Solution: Since θ and R are independent, we have

E[X(t)X(t+τ)] = E[R²] E[cos(2πft + θ) cos(2πf(t+τ) + θ)]
             = E[R²] (1/2) E[cos(4πft + 2πfτ + 2θ) + cos(2πfτ)]
             = E[R²] cos(2πfτ) / 2,

where we used a standard trigonometric identity and then took the expectation over θ using the same argument as in (a).
c) Show that {X(t); t ∈ R} is a Gaussian process.

Solution: Let W_1, W_2 be IID N(0, 1) rv's. These can be expressed in polar coordinates as W_1 = R cos θ and W_2 = R sin θ, where R is Rayleigh and θ is uniform. Expanding, X(t) = R cos(2πft + θ) = R cos θ cos(2πft) − R sin θ sin(2πft) = W_1 cos(2πft) − W_2 sin(2πft). Thus X(t) is a linear combination of W_1 and W_2 for each t, so each set {X(t_1), X(t_2), . . . , X(t_k)} of rv's is jointly Gaussian. It follows that the process is Gaussian.
Exercise 3.22: Let h(t) be a real square-integrable function whose Fourier transform is 0 for |f| > B for some B > 0. Show that Σ_n h²(t − n/2B) = 2B ∫ h²(τ) dτ for all t ∈ R. Hint: find the sampling theorem expansion for a time shifted sinc function.

Solution: We use a slightly simpler approach than that of the hint. The sampling theorem expansion of h(t) is given by

h(t) = Σ_n h(n/2B) sinc(2Bt − n).
Since the functions {√(2B) sinc(2Bt − n); n ∈ Z} are orthonormal, the energy equation, (3.64), says that ∫ h²(t) dt = Σ_n h²(n/2B)/(2B). For the special case of t = 0, this is the same as what is to be shown (summing over −n instead of n). To generalize this, consider h(t + τ) as a function of t for fixed τ. Then using the same sampling expansion on this shifted function,

h(t + τ) = Σ_n h(n/2B + τ) sinc(2Bt − n).

The Fourier transform of h(t + τ) (as a function of t for fixed τ) is still 0 for |f| > B, and ∫ h²(t) dt = ∫ h²(t + τ) dt. Thus,

∫ h²(t) dt = Σ_n h²(n/2B + τ)/(2B).

Replacing n by −n in the sum over Z and interchanging t and τ, we have the desired result.
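As a numeric illustration of the identity (a sketch, not part of the original solution), take h(t) = sinc(2Bt), for which ∫h²(τ)dτ = 1/(2B), so the identity predicts that the sum equals 2B · 1/(2B) = 1 for every t. The choices of B, t, and the truncation range below are arbitrary.

```python
import math

def sinc(x):
    # sinc(x) = sin(pi x)/(pi x), with sinc(0) = 1
    return 1.0 if x == 0 else math.sin(math.pi*x)/(math.pi*x)

B, t = 1.0, 0.3                       # hypothetical choices
# Truncated version of sum over n of h^2(t - n/2B) with h(t) = sinc(2Bt)
lhs = sum(sinc(2*B*t - n)**2 for n in range(-5000, 5001))
rhs = 2*B * (1.0/(2*B))               # 2B times the energy of h
assert abs(lhs - rhs) < 1e-3          # small truncation error remains
print("sum matches 2B times the energy of h")
```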
Exercise 3.23: a) Let Z = (Z_1, . . . , Z_n)^T be a circularly-symmetric n-rv. Show that Z_k is circularly symmetric for each k, 1 ≤ k ≤ n. Hint: use the definition directly (you cannot assume that Z is also Gaussian).

Solution: We carry out the argument for n = 2; the extension to arbitrary n will then be clear. Assume that Z has a probability density and let Y = e^{iθ}Z for an arbitrary angle θ. Then Z is circularly symmetric if for all such Y, f_{Y1,Y2}(z_1, z_2) = f_{Z1,Z2}(z_1, z_2) for all z. Integrating both sides over the real and imaginary parts of z_1, we get the marginal density of Z_2 on one side and Y_2 on the other; that is, f_{Z2}(z_2) = f_{Y2}(z_2). Since Y_2 = e^{iθ}Z_2, and this is true for all choices of θ, Z_2 is circularly symmetric.

Integrating over z_2 instead of z_1 shows that Z_1 is circularly symmetric. For n > 2, we integrate over all components other than k to show that Z_k is circularly symmetric. Although this shows that a circularly symmetric n-rv must have circularly symmetric components, the end of Example 3.7.7 shows that circularly-symmetric components do not necessarily combine into a circularly symmetric vector.
b) Show that Z_1 + Z_2 is circularly symmetric. For any complex n-vector c, show that c^T Z is a circularly symmetric rv.

Solution: Let U = Z_1 + Z_2. We can still find the PDF of U by convolving the PDFs of Z_1 and Z_2, i.e., f_U(u) = ∫ f_{Z1}(z_1) f_{Z2}(u − z_1) dz_1. Here, however, the integration is over the complex plane, i.e., over the real and imaginary parts of z_1. Let Y = e^{iθ}Z, so that Y_1 = e^{iθ}Z_1 and Y_2 = e^{iθ}Z_2. Let V = Y_1 + Y_2 = e^{iθ}U. We then have f_V(u) = ∫ f_{Y1}(z_1) f_{Y2}(u − z_1) dz_1. From the circular symmetry of Z_1 and Z_2, this integral is equal to f_U(u), and this is true for all u and θ. Thus U is circularly symmetric. We can next show that c_1 Z_1 + c_2 Z_2 is circularly symmetric and then use induction to see that c^T Z is circularly symmetric.

There are two lessons from this exercise. The first is that one can build up a set of results about circular symmetry and the second is that this is rather painful and time-consuming. The text (and the majority of applications) deal with complex random vectors that are both circularly symmetric and Gaussian, and here things are much more pleasant and elegant.
Exercise 3.24: Let A be a complex Gaussian rv, i.e., A = A_1 + iA_2 where A_1 and A_2 are zero-mean jointly-Gaussian real rv's with variances σ_1² and σ_2² respectively.

a) Show that E[AA*] = σ_1² + σ_2².

Solution:

AA* = (A_1 + iA_2)(A_1 − iA_2) = A_1² + A_2².    (A.52)

E[AA*] = E[A_1² + A_2²] = σ_1² + σ_2².

b) Show that E[(AA*)²] = 3σ_1⁴ + 3σ_2⁴ + 2σ_1²σ_2² + 4(E[A_1A_2])².

Solution: Using (A.52),

E[(AA*)²] = E[(A_1² + A_2²)²] = E[A_1⁴] + E[A_2⁴] + 2E[A_1²A_2²].    (A.53)

From (3.7), E[A_ℓ⁴] = 3σ_ℓ⁴ for ℓ = 1, 2. The solution to Exercise 3.5(e) shows that E[A_1²A_2²] = E[A_1²]E[A_2²] + 2(E[A_1A_2])². Substituting these terms into (A.53) yields the result to be shown.

c) Show that E[(AA*)²] ≥ 2(E[AA*])² with equality if and only if A_1 and A_2 are IID. Hint: Lower bound (E[A_1A_2])² by 0.

Solution: Using (a) and (b),

E[(AA*)²] − 2(E[AA*])² = 3σ_1⁴ + 3σ_2⁴ + 2σ_1²σ_2² + 4(E[A_1A_2])² − 2σ_1⁴ − 2σ_2⁴ − 4σ_1²σ_2²
                       = (σ_1² − σ_2²)² + 4(E[A_1A_2])² ≥ 0.

The condition for equality is that σ_1² = σ_2² and E[A_1A_2] = 0, i.e., that A_1 and A_2 are IID, which means that A is circularly symmetric.

d) Show that VAR[AA*] ≥ (E[AA*])².

Solution: This is immediate from (c), since VAR[AA*] = E[(AA*)²] − (E[AA*])². The condition for equality is that A is circularly symmetric.
Exercise 3.25: Let K_X(t) be the covariance function of a WSS process {X(t); t ∈ R}. Show that if K_X(t) is continuous at t = 0, then it is continuous everywhere. Hint: Show that lim_{δ→0} E[X(0)(X(t−δ) − X(t))] = 0 for all t. Use the Schwarz inequality.

Solution: The Schwarz inequality applied to zero-mean rv's says that |E[XY]| ≤ σ_X σ_Y. Now consider K_X(t) − K_X(t−δ) for zero-mean rv's,

|K_X(t) − K_X(t−δ)| = |E[X(0)(X(t) − X(t−δ))]|
                    ≤ σ_X E[(X(t) − X(t−δ))²]^{1/2}
                    = σ_X [2K_X(0) − 2K_X(δ)]^{1/2}.

Taking the limit on both sides as δ → 0, we see that if K_X is continuous at 0 it is continuous everywhere.
A.4 Solutions for Chapter 4
Exercise 4.1: Let [P] be the transition matrix for a finite state Markov chain and let state i be recurrent. Prove that i is aperiodic if P_ii > 0.

Solution: Since P_ii > 0, there is a walk of length 2 that starts in state i, goes to state i at epoch 1 and goes from there to state i at epoch 2. In the same way, for each n > 2, a walk of length n exists, going through state i at each step. Thus P_ii^n > 0 for all n ≥ 1, so the greatest common divisor of {n : P_ii^n > 0} is 1.
Exercise 4.2: Show that every Markov chain with M < ∞ states contains at least one recurrent set of states. Explaining each of the following statements is sufficient.

a) If state i_1 is transient, then there is some other state i_2 such that i_1 → i_2 and i_2 ↛ i_1.

Solution: If there is no such state i_2, then i_1 is recurrent by definition. Such a state i_2 is distinct from i_1, since otherwise i_1 → i_2 would imply i_2 → i_1.
b) If the i_2 of (a) is also transient, there is a third state i_3 such that i_2 → i_3, i_3 ↛ i_2; that state must satisfy i_3 ≠ i_2, i_3 ≠ i_1.

Solution: The argument why i_3 exists with i_2 → i_3, i_3 ↛ i_2 and with i_3 ≠ i_2 is the same as (a). Since i_1 → i_2 and i_2 → i_3, we have i_1 → i_3. We must also have i_3 ↛ i_1, since otherwise i_3 → i_1 and i_1 → i_2 would imply the contradiction i_3 → i_2. Since i_1 → i_3 and i_3 ↛ i_1, it follows as before that i_3 ≠ i_1.
c) Continue iteratively to repeat (b) for successive states, i_1, i_2, . . . . That is, if i_1, . . . , i_k are generated as above and are all transient, generate i_{k+1} such that i_k → i_{k+1} and i_{k+1} ↛ i_k. Then i_{k+1} ≠ i_j for 1 ≤ j ≤ k.

Solution: The argument why i_{k+1} exists with i_k → i_{k+1}, i_{k+1} ↛ i_k and with i_{k+1} ≠ i_k is the same as before. To show that i_{k+1} ≠ i_j for each j < k, we use contradiction, noting that if i_{k+1} = i_j, then i_{k+1} → i_{j+1} → · · · → i_k.
d) Show that for some k ≤ M, i_k is not transient, i.e., it is recurrent, so a recurrent class exists.

Solution: For transient states i_1, . . . , i_k generated in (c), state i_{k+1} found in (c) must be distinct from the distinct states {i_j; j ≤ k}. Since there are only M states, there cannot be M transient states, since then, with k = M, a new distinct state i_{M+1} would be generated, which is impossible. Thus there must be some k < M for which the extension to i_{k+1} leads to a recurrent state.
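The iterative argument above can be turned into a small computation: classify each state by checking whether every state accessible from it can access it back. This is a sketch (not from the text); the example chain is hypothetical.

```python
def reachable(P):
    """reach[i][j] = True iff j is accessible from i in one or more steps."""
    M = len(P)
    reach = [[P[i][j] > 0 for j in range(M)] for i in range(M)]
    for k in range(M):                       # Floyd-Warshall transitive closure
        for i in range(M):
            for j in range(M):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

def recurrent_states(P):
    reach = reachable(P)
    M = len(P)
    # i is recurrent iff every state accessible from i can access i back
    return [i for i in range(M)
            if all((not reach[i][j]) or reach[j][i] for j in range(M))]

# Example: state 0 is transient (0 -> 1, but 1 never returns); {1, 2} recurrent
P = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
print(recurrent_states(P))   # -> [1, 2]
```

Since the state space is finite, the returned list is always nonempty, which is exactly the conclusion of the exercise.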
Exercise 4.3: Consider a finite-state Markov chain in which some given state, say state 1, is accessible from every other state. Show that the chain has exactly one recurrent class R of states and that 1 ∈ R.

Solution: Since j → 1 for each j, there can be no state j for which 1 → j and j ↛ 1. Thus state 1 is recurrent. Next, for any given j, if 1 ↛ j, then j must be transient since j → 1. On the other hand, if 1 → j, then 1 and j communicate and j must be in the same recurrent class as 1. Thus each state is either transient or in the same recurrent class as 1.
Exercise 4.4: Show how to generalize the graph in Figure 4.4 to an arbitrary number of states M with one cycle of M nodes and one of M − 1 nodes. For M = 4, let node 1 be the node not in the cycle of M − 1 nodes. List the set of states accessible from node 1 in n steps for each n ≤ 12 and show that the bound in Theorem 4.2.11 is met with equality. Explain why the same result holds for all larger M.
Solution: As illustrated below, the generalization has the set of nonzero transition probabilities P_{i,i+1} for 1 ≤ i < M and has P_{M,1} > 0 and P_{M,2} > 0.

[Graph: states 1, 2, . . . , M in an outer cycle 1 → 2 → · · · → M → 1, with the additional transition M → 2 forming an inner cycle of M − 1 nodes.]
For M = 4, let T(n) be the set of states accessible from state 1 in n steps.

T(1) = {2}        T(5) = {2, 3}        T(9)  = {2, 3, 4}
T(2) = {3}        T(6) = {3, 4}        T(10) = {1, 2, 3, 4}
T(3) = {4}        T(7) = {1, 2, 4}     T(11) = {1, 2, 3, 4}
T(4) = {1, 2}     T(8) = {1, 2, 3}     T(12) = {1, 2, 3, 4}.
What we observe above is that for the first three steps, T(n) is a singleton set, and for arbitrary M, T(n) is a singleton set for each of the first M − 1 steps. There are then M − 1 steps with doubleton sets, and so forth, up to M − 1 steps with M − 1 states included in T(n). Thus it is on step (M − 1)² + 1 that all states are first included.
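The accessibility sets above can be generated mechanically for any M; the sketch below (hypothetical helper, not from the text) builds the chain with edges i → i+1 for i < M plus M → 1 and M → 2, and finds the first n at which T(n) contains all states.

```python
def accessible_sets(M, nmax):
    # Successors for the chain of Figure 4.4 generalized to M states
    succ = {i: [i+1] for i in range(1, M)}
    succ[M] = [1, 2]
    T, out = {1}, []
    for _ in range(nmax):
        T = {j for i in T for j in succ[i]}   # states reachable one step later
        out.append(frozenset(T))
    return out

M = 4
sets = accessible_sets(M, 12)
full = frozenset(range(1, M+1))
# All states are first reachable at step (M-1)^2 + 1 = 10
first = 1 + sets.index(full)
print(first)   # -> 10
```

Running the same check for larger M confirms that the bound of Theorem 4.2.11 is met with equality for this family of chains.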
Exercise 4.5: (Proof of Theorem 4.2.11) a) Show that an ergodic Markov chain with M > 1 states must contain a cycle with τ < M states. Hint: Use ergodicity to show that the smallest cycle cannot contain M states.

Solution: The states in any cycle (not counting the initial state) are distinct and thus the number of steps in a cycle is at most M. A recurrent chain must contain cycles, since for each pair of states ℓ ≠ j, there is a walk from ℓ to j and then back to ℓ; if any state i other than ℓ is repeated in this walk, the first i and all subsequent states before the second i can be eliminated. This can be done repeatedly until a cycle remains.

Finally, suppose a cycle contains M states. If there is any transition P_ij > 0 for which (i, j) is not a transition on that cycle, then that transition can be added to the cycle and all the transitions between i and j on the existing cycle can be omitted, thus creating a cycle of fewer than M steps. If there are no nonzero transitions other than those in a cycle with M steps, then the Markov chain is periodic with period M and thus not ergodic.
b) Let ℓ be a fixed state on a fixed cycle of length τ < M. Let T(m) be the set of states accessible from ℓ in m steps. Show that for each m ≥ 1, T(m) ⊆ T(m + τ). Hint: For any given state j ∈ T(m), show how to construct a walk of m + τ steps from ℓ to j from the assumed walk of m steps.

Solution: Let j be any state in T(m). Then there is an m-step walk from ℓ to j. There is also a cycle of τ steps from state ℓ to ℓ. Concatenate this cycle (as a walk) with the above m-step walk from ℓ to j, yielding a walk of τ + m steps from ℓ to j. Thus j ∈ T(m + τ) and it follows that T(m) ⊆ T(m + τ).
c) Define T(0) to be the singleton set {ℓ} and show that

T(0) ⊆ T(τ) ⊆ T(2τ) ⊆ · · · ⊆ T(nτ) ⊆ · · · .

Solution: Since T(0) = {ℓ} and ℓ ∈ T(τ), we see that T(0) ⊆ T(τ). Next, for each n ≥ 1, use (b), with m = nτ, to see that T(nτ) ⊆ T(nτ + τ). Thus each inclusion above is satisfied.
d) Show that if one of the inclusions above is satisfied with equality, then all subsequent inclusions are satisfied with equality. Show from this that at most the first M − 1 inclusions can be satisfied with strict inequality and that T(nτ) = T((M−1)τ) for all n ≥ M − 1.

Solution: We first show that if T((k+1)τ) = T(kτ) for some k, then T(nτ) = T(kτ) for all n > k. Note that T((k+1)τ) is the set of states reached in τ steps from T(kτ). Similarly T((k+2)τ) is the set of states reached in τ steps from T((k+1)τ). Thus if T((k+1)τ) = T(kτ), then also T((k+2)τ) = T((k+1)τ). Using induction,

T(nτ) = T(kτ)   for all n ≥ k.

Now if k is the smallest integer for which T((k+1)τ) = T(kτ), then the size of T(nτ) must increase for each n < k. Since |T(0)| = 1, we see that |T(nτ)| ≥ n + 1 for n ≤ k. Since M is the total number of states, we see that k ≤ M − 1. Thus T(nτ) = T((M−1)τ) for all n ≥ M − 1.
e) Show that all states are included in T((M−1)τ).

Solution: For any t such that P_ℓℓ^t > 0, we can repeat the argument in (b), replacing τ by t, to see that for any m ≥ 1, T(m) ⊆ T(m + t). Thus we have

T((M−1)τ) ⊆ T((M−1)τ + t) ⊆ · · · ⊆ T((M−1)τ + tτ) = T((M−1)τ),

where (d) was used in the final equality. This shows that all the inclusions above are satisfied with equality and thus that T((M−1)τ) = T((M−1)τ + kt) for all k ≤ τ. Using t in place of τ in the argument in (d), this can be extended to

T((M−1)τ) = T((M−1)τ + kt)   for all k ≥ 1.

Since the chain is ergodic, we can choose t so that both P_ℓℓ^t > 0 and gcd(t, τ) = 1. From elementary number theory, integers k ≥ 1 and j ≥ 1 can then be chosen so that kt = jτ + 1. Thus

T((M−1)τ) = T((M−1)τ + kt) = T((M−1+j)τ + 1) = T((M−1)τ + 1).    (4.5a)

As in (d), T((M−1)τ + 2) is the set of states reachable in one step from T((M−1)τ + 1). From (4.5a), this is the set of states reachable from T((M−1)τ) in 1 step, i.e.,

T((M−1)τ + 2) = T((M−1)τ + 1) = T((M−1)τ).

Extending this,

T((M−1)τ) = T((M−1)τ + m)   for all m ≥ 1.

This means that T((M−1)τ) contains all states that can ever occur from time (M−1)τ on, and thus must contain all states since the chain is recurrent.

f) Show that P_ij^{(M−1)²+1} > 0 for all i, j.
Solution: We have shown that all states are accessible from state ℓ at all times τ(M−1) or later, and since τ ≤ M − 1, all are accessible at all times n ≥ (M−1)². The same applies to any state on a cycle of length at most M − 1. It is possible (as in Figure 4.4) for some states to be only on a cycle of length M. Any such state can reach the cycle in the proof in at most M − τ steps. Using this path to reach a state on the cycle and following this by paths of length τ(M−1), all states can reach all other states at all times greater than or equal to

τ(M−1) + M − τ ≤ (M−1)² + 1.

The above derivation assumed M > 1. The case M = 1 is obvious, so the theorem is proven.
Exercise 4.6: Consider a Markov chain with one ergodic class of r states, say {1, 2, . . . , r}, and M − r other states that are all transient. Show that P_ij^n > 0 for all j ≤ r and n ≥ (r−1)² + 1 + M − r.

Solution: Applying Theorem 4.2.11 (proved in Exercise 4.5) to the ergodic class, we see that P_ij^n > 0 for all i ≤ r, j ≤ r, and all n ≥ (r−1)² + 1. Each transient state i has a path of some length m ≤ M − r to some state k in the ergodic class, and thus P_ij^n ≥ P_ik^m P_kj^{n−m} > 0 for each j ≤ r if n − m ≥ (r−1)² + 1, i.e., if n ≥ (r−1)² + 1 + M − r.
Exercise 4.7: a) Let τ be the number of states in the smallest cycle of an arbitrary ergodic Markov chain of M ≥ 3 states. Show that P_ij^n > 0 for all n ≥ (M−2)τ + M. Hint: Look at the proof of Theorem 4.2.11 in Exercise 4.5.

Solution: Exercise 4.5(e) showed that if τ is the length of the smallest cycle, then for i in such a cycle, P_ij^n > 0 for each j and each n ≥ (M−1)τ. Since some state on a cycle of length τ can be reached from any state in at most M − τ steps, any state can be reached from any other state in n steps for all n ≥ (M−1)τ + M − τ = (M−2)τ + M.

b) For τ = 1, draw the graph of an ergodic Markov chain (generalized for arbitrary M ≥ 3) for which there is an i, j for which P_ij^n = 0 for n = 2M − 3. Hint: Look at Figure 4.4.

Solution: Figure 4.4 contains a cycle of M − 1 states on the left side and an overall cycle of M states on the outside. If we shrink the left side cycle to a single state while maintaining the outer cycle, we get
[Graph: states 1, 2, . . . , M in a single cycle 1 → 2 → · · · → M → 1, with an added self-loop at state M, so that the smallest cycle has τ = 1.]
Starting with state 1, we see that state M − 1 is reached for the first time at step M − 2 and is reachable for the second time M steps later. One step before that, i.e., at time 2M − 3, state M − 1 cannot be reached. At 2M − 2, however, state M − 1 and all other states are reachable.

c) For arbitrary τ < M − 1, draw the graph of an ergodic Markov chain (generalized for arbitrary M) for which there is an i, j for which P_ij^n = 0 for n = (M−2)τ + M − 1. Assume that M and τ are relatively prime.

Solution: We generalize what was done in (b), creating a cycle of length τ on the left while maintaining the cycle of length M on the outside.
[Graph: states 1, 2, . . . , M in an outer cycle 1 → 2 → · · · → M → 1, with the added transition M → M − τ + 1 forming an inner cycle (M − τ + 1, . . . , M) of length τ.]
This can be analyzed in the same way as Exercise 4.4. In particular, let T(n) be the set of states reachable from state 1 in n steps. Then T(n) is a singleton for n < M, a doubleton for M ≤ n < M + τ, and so forth, up to containing M − 1 states for M + τ(M−3) ≤ n < M + τ(M−2). Thus there is a j for which P_{1j}^{(M−2)τ+M−1} = 0. It can be seen that this j is state M − τ.
Exercise 4.8: A transition probability matrix [P] is said to be doubly stochastic if

Σ_j P_ij = 1 for all i;    Σ_i P_ij = 1 for all j.

That is, each row sum and each column sum equals 1. If a doubly stochastic chain has M states and is ergodic (i.e., has a single class of states and is aperiodic), calculate its steady-state probabilities.

Solution: It is easy to see that if the row sums are all equal to 1, then [P]e = e, where e = (1, . . . , 1)^T. If the column sums are also equal to 1, then e^T[P] = e^T. Thus e^T is a left eigenvector of [P] with eigenvalue 1, and it is unique within a scale factor since the chain is ergodic. Scaling e^T to be probabilities, π = (1/M, 1/M, . . . , 1/M).
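The conclusion can be checked numerically: for an ergodic doubly stochastic chain, iterating π ← π[P] from any starting distribution converges to the uniform distribution. The 3-state chain below is a hypothetical example (each row and each column sums to 1).

```python
# Hypothetical doubly stochastic transition matrix (rows and columns sum to 1)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]

M = len(P)
pi = [1.0, 0.0, 0.0]           # arbitrary starting distribution
for _ in range(200):           # power iteration: pi <- pi [P]
    pi = [sum(pi[i]*P[i][j] for i in range(M)) for j in range(M)]

assert all(abs(x - 1.0/M) < 1e-9 for x in pi)
print("steady state is uniform: 1/M for each state")
```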
Exercise 4.9: a) Find the steady-state probabilities π_0, . . . , π_{k−1} for the Markov chain below. Express your answer in terms of the ratio ρ = p/q where q = 1 − p. Pay particular attention to the special case ρ = 1.

[Graph: a birth-death chain on states 0, 1, . . . , k−1, with P_{j,j+1} = p and P_{j,j−1} = 1 − p for the interior states, P_{0,0} = 1 − p, and P_{k−1,k−1} = p.]

Solution: The steady-state equations, using the abbreviation q = 1 − p, are

π_0 = qπ_0 + qπ_1;
π_j = pπ_{j−1} + qπ_{j+1}   for 1 ≤ j ≤ k − 2;
π_{k−1} = pπ_{k−2} + pπ_{k−1}.

Simplifying the first equation, we get pπ_0 = qπ_1.

Substituting qπ_1 for pπ_0 in the second equation (with j = 1), we get π_1 = qπ_1 + qπ_2. Simplifying, we get pπ_1 = qπ_2.
We can then use induction. Using the inductive hypothesis pπ_{j−1} = qπ_j (which has been verified for j = 1, 2) on π_j = pπ_{j−1} + qπ_{j+1}, we get

pπ_j = qπ_{j+1}   for 1 ≤ j ≤ k − 2.

Combining these equations, π_j = ρπ_{j−1} for 1 ≤ j ≤ k − 1, so π_j = ρ^j π_0 for 1 ≤ j ≤ k − 1. Normalizing by the fact that the steady-state probabilities sum to 1, and taking ρ ≠ 1,

π_0 Σ_{j=0}^{k−1} ρ^j = 1,   so   π_0 = (1 − ρ)/(1 − ρ^k);   π_j = ρ^j (1 − ρ)/(1 − ρ^k).    (A.54)

For ρ = 1, ρ^j = 1 and π_j = 1/k for 0 ≤ j ≤ k − 1. Note that we did not use the final steady-state equation. The reason is that π = π[P] specifies π only within a scale factor, so the k equations cannot be linearly independent and the kth equation cannot provide anything new.

Note also that the general result here is most simply stated as pπ_{j−1} = qπ_j. This says that the steady-state probability of a transition from j − 1 to j is the same as that from j to j − 1. This is intuitively obvious since in any sample path, the total number of transitions from j − 1 to j is within 1 of the number of transitions in the opposite direction. This important intuitive notion will become precise after studying the strong law of large numbers.
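The closed form (A.54) can be verified directly against the steady-state equations; the values of p and k below are arbitrary illustrative choices.

```python
p, k = 0.3, 6                  # hypothetical values with rho != 1
q, rho = 1 - p, p/(1 - p)
pi = [rho**j * (1 - rho)/(1 - rho**k) for j in range(k)]   # (A.54)

assert abs(sum(pi) - 1.0) < 1e-12                          # normalization
assert abs(pi[0] - (q*pi[0] + q*pi[1])) < 1e-12            # state 0
for j in range(1, k-1):                                    # interior states
    assert abs(pi[j] - (p*pi[j-1] + q*pi[j+1])) < 1e-12
assert abs(pi[k-1] - (p*pi[k-2] + p*pi[k-1])) < 1e-12      # state k-1
print("pi from (A.54) satisfies all steady-state equations")
```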
b) Sketch π_0, . . . , π_{k−1}. Give one sketch for ρ = 1/2, one for ρ = 1, and one for ρ = 2.

Solution: We see from (A.54) that for ρ ≠ 1, π_j is geometric in j for 0 ≤ j ≤ k − 1. For ρ > 1, it is geometrically increasing in j, and for ρ < 1, it is geometrically decreasing. For ρ = 1, it is constant. If you can't visualize this without a sketch, you should draw the sketch yourself.
c) Find the limit of π_0 as k approaches ∞; give separate answers for ρ < 1, ρ = 1, and ρ > 1. Find limiting values of π_{k−1} for the same cases.

Solution:

lim_{k→∞} π_0 = lim_{k→∞} (1−ρ)/(1−ρ^k) = 1 − ρ       for ρ < 1;
lim_{k→∞} π_0 = lim_{k→∞} 1/k = 0                     for ρ = 1;
lim_{k→∞} π_0 = lim_{k→∞} (ρ−1)/(ρ^k−1) = 0           for ρ > 1.

Note that the approach to 0 for ρ = 1 is harmonic in k and that for ρ > 1 is geometric in k. For state k − 1, the analogous result is

lim_{k→∞} π_{k−1} = lim_{k→∞} ρ^{k−1}(1−ρ)/(1−ρ^k) = 0          for ρ < 1;
lim_{k→∞} π_{k−1} = lim_{k→∞} 1/k = 0                           for ρ = 1;
lim_{k→∞} π_{k−1} = lim_{k→∞} ρ^{k−1}(ρ−1)/(ρ^k−1) = 1 − 1/ρ    for ρ > 1.

This shows us that for ρ > 1, the probability is still geometric in the states, but the large probabilities are clustered at the highest states. We will see in the next chapter that with a countable number of states, all states have probability zero when ρ ≥ 1, and this shows us how that limit is approached with increasing k.
Exercise 4.10:
a) Find the steady-state probabilities for each of the Markov chains in Figure 4.2.
Assume that all clockwise probabilities in the first graph are the same, say p, and assume that P4,5 = P4,1
in the second graph.
Solution: These probabilities can be found in a straightforward but tedious fashion by
solving (4.8). Note that ⇡P= ⇡ [P ] is a set of M linear equations of which only M 1 are
linearly independent and
⇡i = 1 provides the needed extra equation. The solutions are
⇡i = 1/4 for each state in the first graph and ⇡i = 1/10 for all but state 4 in the second
graph; ⇡4 = 1/5.
One learns more by trying to find ⇡ by inspection. For the first graph, the ⇡i are clearly equal
by symmetry. For the second graph, states 1 and 5 are immediately accessible only from
state 4 and are thus equally likely and each has half the probability of 4. The probabilities
on the states of each loop should be the same, leading to the answer above. It would be
prudent to check this answer by (4.8), but that is certainly easier than solving (4.8).
b) Find the matrices [P 2 ] for the same chains. Draw the graphs for the Markov chains represented by [P 2 ],
i.e., the graph of two step transitions for the original chains. Find the steady-state probabilities for these
two-step chains. Explain why your steady-state probabilities are not unique.
Solution: Let q = 1 − p in the first graph. In the second graph, all transitions out of states 3, 4, and 9 have probability 1/2. All other transitions have probability 1.
(Figure: the graphs of the two-step chains. In the first chain, states 1 and 3 form one pair and states 2 and 4 the other; each state has a self-loop of probability 2pq and a transition of probability p² + q² to the other state of its pair. In the second chain, the even states 2, 4, 6, 8 form one two-step chain and the odd states 1, 3, 5, 7, 9 the other.)
One steady-state probability for the first chain is π_1 = π_3 = 1/2 and the other is π_2 = π_4 = 1/2. These are the steady-state probabilities for the two recurrent classes of [P²]. The second chain also has two recurrent classes. The steady-state probabilities for the first are π_2 = π_6 = π_8 = 0.2 and π_4 = 0.4. Those for the second are π_1 = π_3 = π_5 = π_7 = π_9 = 0.2.
c) Find lim_{n→∞} [P^{2n}] for each of the chains.

Solution: The limit for each chain is block diagonal with one block being the even-numbered states and the other the odd-numbered states. Within a block, the rows are the same. For the first chain, the blocks are (1, 3) and (2, 4). We have lim_{n→∞} P_{ij}^{2n} = 1/2 for i, j both odd or both even; it is 0 otherwise. For the second chain, within the even block, lim_{n→∞} P_{ij}^{2n} = 0.2 for j ≠ 4 and 0.4 for j = 4. For the odd block, lim_{n→∞} P_{ij}^{2n} = 0.2 for all odd i, j.
Exercise 4.11: a) Assume that ν^{(i)} is a right eigenvector and π^{(j)} is a left eigenvector of an M × M stochastic matrix [P], with eigenvalues λ_i and λ_j respectively, where λ_i ≠ λ_j. Show that π^{(j)} ν^{(i)} = 0. Hint: Consider two ways of finding π^{(j)}[P]ν^{(i)}.

Solution: If we first premultiply [P] by π^{(j)}, we get

    π^{(j)}[P]ν^{(i)} = λ_j π^{(j)} ν^{(i)}.
APPENDIX A. SOLUTIONS TO EXERCISES
If we first postmultiply [P] by ν^{(i)}, we get

    π^{(j)}[P]ν^{(i)} = λ_i π^{(j)} ν^{(i)}.

Thus (λ_j − λ_i) π^{(j)} ν^{(i)} = 0. Since λ_j − λ_i ≠ 0, we must have π^{(j)} ν^{(i)} = 0.
b) Assume that [P] has M distinct eigenvalues. The right eigenvectors of [P] then span M-space (see Section 5.2 of Strang, [28]), so the matrix [U] with those eigenvectors as columns is non-singular. Show that [U^{−1}] is a matrix whose rows are the M left eigenvectors of [P]. Hint: use (a).
Solution: Order the right eigenvectors so that the ith column of [U] is ν^{(i)} with eigenvalue λ_i. Let [V] be a matrix whose rows are linearly independent left eigenvectors of [P], ordered so that the ith row is π^{(i)} with eigenvalue λ_i.

Then [V U]_{ji} is π^{(j)} ν^{(i)}. From (a), this is 0 for i ≠ j, from which it follows that [V U] is diagonal. We next show that the diagonal elements of [V U] must be nonzero. Since π^{(i)} is an eigenvector, it is nonzero by definition. Since [U] is nonsingular, π^{(i)}[U] is nonzero. The elements of π^{(i)}[U] are {π^{(i)} ν^{(j)}; 1 ≤ j ≤ M}. These elements are 0 for all j ≠ i, so π^{(i)} ν^{(i)} must be nonzero. Thus the ith element on the diagonal of [V U] is nonzero for each i.

The eigenvector π^{(i)} of λ_i (assuming M distinct eigenvalues) is unique within a scale factor, but that scale factor can now be chosen so that π^{(i)} ν^{(i)} = 1. With this scaling, [V U] = [I], which means that [V] = [U^{−1}]. Thus, the rows of [U^{−1}] are the left eigenvectors of [P], scaled so that π^{(i)} ν^{(i)} = 1 for each i.
c) For any given integer i, 1 ≤ i ≤ M, let [A] be a diagonal matrix with a single non-zero element, A_{ii} = a ≠ 0. Using the assumptions and results of (b), show that

    [U A U^{−1}] = a ν^{(i)} π^{(i)}.

Hint: visualize straightforward vector/matrix multiplication.

Solution: Since each column of [A] is zero other than the ith, each column of [U A] is zero other than the ith. Since the ith column of [U] is ν^{(i)}, the ith column of [U A] is a ν^{(i)}. Thus we have

    [U A U^{−1}]_{kj} = Σ_ℓ [U A]_{kℓ} [U^{−1}]_{ℓj} = [U A]_{ki} [U^{−1}]_{ij} = a ν_k^{(i)} π_j^{(i)}.

Putting these terms together, [U A U^{−1}] = a ν^{(i)} π^{(i)}.
d) Verify (4.30).

Solution: Note that [Λ^n] is a diagonal matrix with [Λ^n]_{ii} = λ_i^n. Let [Λ_i^n] be a diagonal matrix with a single non-zero element, λ_i^n, in position i, i. Then [Λ^n] = Σ_{i=1}^M [Λ_i^n]. It follows that

    [U Λ^n U^{−1}] = [U] ( Σ_{i=1}^M [Λ_i^n] ) [U^{−1}] = Σ_{i=1}^M [U Λ_i^n U^{−1}] = Σ_{i=1}^M λ_i^n ν^{(i)} π^{(i)},

where we have used the result of (c) in each term.
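The biorthogonality of (a)/(b) and the expansion in (d) are easy to check numerically. A sketch (numpy assumed; the 3-state stochastic matrix is our own example with distinct eigenvalues 1, 0.6 and 0.1):

```python
import numpy as np

P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

lam, U = np.linalg.eig(P)   # columns of U are right eigenvectors
V = np.linalg.inv(U)        # rows of V are then left eigenvectors

# V P = diag(lam) V: each row of V is a left eigenvector, and V U = I
# expresses the biorthogonality pi^(j) nu^(i) = 0 for i != j of (a).
for i in range(3):
    assert np.allclose(V[i] @ P, lam[i] * V[i])

# (4.30): [P^n] = sum_i lam_i^n nu^(i) pi^(i)
n = 5
Pn = sum(lam[i]**n * np.outer(U[:, i], V[i]) for i in range(3))
assert np.allclose(np.linalg.matrix_power(P, n), Pn)
print("decomposition verified")
```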
Exercise 4.12: a) Let λ_k be an eigenvalue of a stochastic matrix [P] and let π^{(k)} be a left eigenvector for λ_k. Show for each component π_j^{(k)} of π^{(k)} and each n that

    λ_k^n π_j^{(k)} = Σ_i π_i^{(k)} P_{ij}^n.

Solution: By the definition of an eigenvector, π^{(k)}[P] = λ_k π^{(k)}. By iterating this, π^{(k)}[P²] = λ_k π^{(k)}[P] = λ_k² π^{(k)}, and in general π^{(k)}[P^n] = λ_k^n π^{(k)}. The desired expression is the jth component of this vector equation.
b) By taking magnitudes of each side and looking at the appropriate j, show that |λ_k^n| ≤ M.

Solution: Choose j to maximize |π_j^{(k)}| for the given k. Then from (a) and the fact that P_{ij}^n ≤ 1,

    |λ_k^n| max_j |π_j^{(k)}| ≤ Σ_i |π_i^{(k)}| P_{ij}^n ≤ Σ_i [ max_j |π_j^{(k)}| ] = M max_j |π_j^{(k)}|.

Cancelling the common term, |λ_k|^n = |λ_k^n| ≤ M.
c) Show that |λ_k| ≤ 1.

Solution: Note that |λ_k|^n is exponential in n. Since |λ_k| is nonnegative and real, |λ_k|^n increases in n without bound if |λ_k| > 1. Since |λ_k|^n is bounded by M, we must have |λ_k| ≤ 1.
Exercise 4.13: Consider a finite-state Markov chain with matrix [P] which has κ aperiodic recurrent classes, R_1, …, R_κ, and a set T of transient states. For any given recurrent class R_ℓ, consider a vector ν such that ν_i = 1 for each i ∈ R_ℓ, ν_i = lim_{n→∞} Pr{X_n ∈ R_ℓ | X_0 = i} for each i ∈ T, and ν_i = 0 otherwise. Show that ν is a right eigenvector of [P] with eigenvalue 1. Hint: Redraw Figure 4.5 for multiple recurrent classes and first show that ν is an eigenvector of [P^n] in the limit.
Solution: A simple example of this result is treated in Exercise 4.29 and a complete derivation (extended almost trivially to periodic as well as aperiodic recurrent classes) is given in the solution to Exercise 4.18. Thus we give a more intuitive and slightly less complete derivation here.

Number the states with the transient states first, followed by each recurrent class in order. Then [P] has the following block structure
    [P] = [ [P_T]  [P_{T R_1}]  ⋯  [P_{T R_κ}]
              0     [P_{R_1}]   ⋯       0
              ⋮         ⋮        ⋱       ⋮
              0         0       ⋯   [P_{R_κ}]  ].
The ℓth recurrent class has an |R_ℓ| by |R_ℓ| transition matrix which, viewed alone, has an eigenvalue λ = 1 of multiplicity 1, a corresponding unique (within a scale factor) left eigenvector, say π(R_ℓ), and a corresponding unique (within a scale factor) right eigenvector, ν(R_ℓ) = (1, …, 1)ᵀ (see Theorem 4.4.2).

Let π^{(ℓ)} be an M-dimensional row vector whose components are equal to those of π(R_ℓ) over the states of R_ℓ and equal to 0 otherwise. Then it can be seen by visualizing elementary row/matrix multiplication on the block structure of [P] that π^{(ℓ)}[P] = π^{(ℓ)}. This gives us κ left eigenvectors of eigenvalue 1, one for each recurrent class R_ℓ.

These κ left eigenvectors are clearly linearly independent and span the κ-dimensional space of left eigenvectors of eigenvalue 1 (see Exercise 4.18).

If there are no transient states, then a set of κ right eigenvectors can be chosen in the same way as the left eigenvectors. That is, for each ℓ, 1 ≤ ℓ ≤ κ, the components of ν^{(ℓ)} can be chosen to be 1 for each state in R_ℓ and 0 for all other states. This doesn't satisfy the eigenvector equation if there are transient states, however. We now show, instead, that for each ℓ, 1 ≤ ℓ ≤ κ, there is a right eigenvector ν^{(ℓ)} of eigenvalue 1 that can be nonzero both on R_ℓ and T, and such that ν_i^{(ℓ)} = 0 for all i ∈ R_m, for each m ≠ ℓ. Finally we will show that these κ vectors are linearly independent and have the properties specified in the problem statement.
The right eigenvector equation that must be satisfied by ν^{(ℓ)}, assuming that ν_i^{(ℓ)} ≠ 0 only for i ∈ R_ℓ ∪ T, can be written out component by component, getting

    ν_i^{(ℓ)} = Σ_{j∈R_ℓ} P_{ij} ν_j^{(ℓ)}                                for i ∈ R_ℓ
    ν_i^{(ℓ)} = Σ_{j∈T} P_{ij} ν_j^{(ℓ)} + Σ_{j∈R_ℓ} P_{ij} ν_j^{(ℓ)}     for i ∈ T.
The first set of equations above are simply the usual right eigenvector equations for eigenvalue 1 over the recurrent submatrix [P_{R_ℓ}]. Thus ν_j^{(ℓ)} = 1 for j ∈ R_ℓ and this solution (over R_ℓ) is unique within a scale factor. Substituting this solution into the second set of equations, we get

    ν_i^{(ℓ)} = Σ_{j∈T} P_{ij} ν_j^{(ℓ)} + Σ_{j∈R_ℓ} P_{ij}    for i ∈ T.
This has a unique solution for each ℓ (see Exercise 4.18). These eigenvectors, {ν^{(ℓ)}, 1 ≤ ℓ ≤ κ}, must be linearly independent since ν_i^{(ℓ)} = 1 for i ∈ R_ℓ and ν_i^{(ℓ)} = 0 for i ∈ R_m, m ≠ ℓ. They then form a basis for the κ-dimensional space of eigenvectors of [P] of eigenvalue 1.

These eigenvectors are also eigenvectors of eigenvalue 1 for [P^n] for each n > 1. Thus

    ν_i^{(ℓ)} = Σ_{j∈T} P_{ij}^n ν_j^{(ℓ)} + Σ_{j∈R_ℓ} P_{ij}^n    for i ∈ T.

Now recall that for i, j ∈ T, we have lim_{n→∞} P_{ij}^n = 0. Also Σ_{j∈R_ℓ} P_{ij}^n is the probability that X_n ∈ R_ℓ given X_0 = i. Since there is no exit from R_ℓ, this quantity is non-decreasing in n and bounded by 1, so it has a limit. This limit is the probability of ever going from i to R_ℓ, completing the derivation.
Exercise 4.14: Answer the following questions for the following stochastic matrix [P]:

    [P] = [ 1/2  1/2   0
             0   1/2  1/2
             0    0    1  ].

a) Find [P^n] in closed form for arbitrary n > 1.
Solution: There are several approaches here. We first give the brute-force solution of
simply multiplying [P ] by itself multiple times (which is reasonable for a first look), and
then give the elegant solution.
    [P²] = [P][P] = [ 1/4  2/4  1/4
                       0   1/4  3/4
                       0    0    1  ].

    [P³] = [P][P²] = [ 1/8  3/8  4/8
                        0   1/8  7/8
                        0    0    1  ].
We could proceed to [P⁴], but it is natural to stop and think whether this is telling us something. The bottom row of [P^n] is clearly (0, 0, 1) for all n, and we can easily either reason or guess that the first two main diagonal elements are 2^{-n}. The final column is whatever is required to make the rows sum to 1. The only questionable element is P_{12}^n. We guess that it is n2^{-n} and verify it by induction,
    [P^{n+1}] = [P][P^n]
              = [ 1/2  1/2   0  ] [ 2^{-n}  n2^{-n}  1-(n+1)2^{-n}
                   0   1/2  1/2       0      2^{-n}     1-2^{-n}
                   0    0    1        0        0            1     ]
              = [ 2^{-(n+1)}  (n+1)2^{-(n+1)}  1-(n+2)2^{-(n+1)}
                      0           2^{-(n+1)}       1-2^{-(n+1)}
                      0              0                  1         ].
This solution is not very satisfying, first because it is tedious, second because it required a
somewhat unmotivated guess, and third because no clear rhyme or reason emerged.
The elegant solution, which requires no equations, comes from looking at the graph of the Markov chain:

(Graph: states 1 → 2 → 3, with the transitions 1→2 and 2→3 each of probability 1/2; states 1 and 2 each have a self-loop of probability 1/2, and state 3 has a self-loop of probability 1.)
It is now clear that P_{11}^n = 2^{-n} is the probability of taking the lower loop for n successive steps starting in state 1. Similarly P_{22}^n = 2^{-n} is the probability of taking the lower loop at state 2 for n successive steps.

Finally, P_{12}^n is the probability of taking the transition from state 1 to 2 exactly once out of the n transitions starting in state 1, and of staying in the same state (first 1 and then 2) for the other n − 1 transitions. There are n such paths, corresponding to the n possible steps at which the 1 → 2 transition can occur, and each path has probability 2^{-n}. Thus P_{12}^n = n2^{-n}, and we ‘see’ why this factor of n appears. The transitions P_{i3}^n are then chosen to make the rows sum to 1, yielding the same solution as above.
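The closed form for [P^n] can be verified directly against repeated matrix multiplication. A sketch (numpy assumed):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

def Pn_closed(n):
    # Closed form derived above: diagonal entries 2^-n, P12^n = n 2^-n,
    # and the last column fills each row out to 1.
    return np.array([[2.0**-n, n * 2.0**-n, 1 - (n + 1) * 2.0**-n],
                     [0.0,     2.0**-n,     1 - 2.0**-n],
                     [0.0,     0.0,         1.0]])

for n in range(1, 10):
    assert np.allclose(np.linalg.matrix_power(P, n), Pn_closed(n))
print("closed form verified for n = 1..9")
```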
b) Find all distinct eigenvalues and the multiplicity of each distinct eigenvalue for [P].

Solution: Note that [P] is an upper triangular matrix, and thus [P − λI] is also upper triangular. Thus its determinant is the product of the terms on the diagonal, det[P − λI] = (1/2 − λ)²(1 − λ). It follows that λ = 1 is an eigenvalue of multiplicity 1 and λ = 1/2 is an eigenvalue of multiplicity 2.
c) Find a right eigenvector for each distinct eigenvalue, and show that the eigenvalue of multiplicity 2 does not have 2 linearly independent eigenvectors.

Solution: For any Markov chain, e = (1, …, 1)ᵀ is a right eigenvector. For the given chain, this is unique within a scale factor, since λ = 1 has multiplicity 1. For ν to be a right eigenvector of eigenvalue 1/2, it must satisfy

    (1/2)ν_1 + (1/2)ν_2 + 0ν_3 = (1/2)ν_1
    0ν_1 + (1/2)ν_2 + (1/2)ν_3 = (1/2)ν_2
                           ν_3 = (1/2)ν_3.

From the first equation, ν_2 = 0 and from the third, ν_3 = 0, so ν = (1, 0, 0)ᵀ is the right eigenvector of λ = 1/2, unique within a scale factor. Thus λ = 1/2 does not have 2 linearly independent eigenvectors.
d) Use (c) to show that there is no diagonal matrix [Λ] and no invertible matrix [U] for which [P][U] = [U][Λ].

Solution: Letting ν_1, ν_2, ν_3 be the columns of a hypothesized matrix [U], we see that [P][U] = [U][Λ] can be written out as [P]ν_i = λ_i ν_i for i = 1, 2, 3. For [U] to be invertible, ν_1, ν_2, ν_3 must be linearly independent eigenvectors of [P]. However, part (c) showed that 3 such eigenvectors do not exist.
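An eigensolver illustrates the defect numerically: the two eigenvectors it reports for λ = 1/2 come out (numerically) parallel, so the eigenvector matrix it returns is singular. A sketch (numpy assumed; the exact vectors returned for a defective eigenvalue are solver-dependent):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

lam, U = np.linalg.eig(P)
print(np.sort(lam.real))          # eigenvalues 0.5, 0.5, 1

# The two unit-norm eigenvectors reported for lambda = 1/2 point in
# essentially the same direction, confirming only 2 independent ones.
i, j = np.argsort(lam.real)[:2]   # indices of the two 0.5 eigenvalues
cos = abs(U[:, i] @ U[:, j])
print(cos)                        # close to 1: the columns are parallel
```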
e) Rederive the result of (d) using the result of (a) rather than (c).

Solution: If the [U] and [Λ] of (d) existed, then [P^n] = [U][Λ^n][U^{−1}]. Then, as in (4.30), [P^n] = Σ_{i=1}^3 λ_i^n ν^{(i)} π^{(i)}, where ν^{(i)} is the ith column of [U] and π^{(i)} is the ith row of [U^{−1}]. Since P_{12}^n = n(1/2)^n, the factor of n means that it cannot have the form aλ_1^n + bλ_2^n + cλ_3^n for any choice of λ_1, λ_2, λ_3, a, b, c.

Note that the argument here is quite general. If [P^n] has any terms containing a polynomial in n times λ_i^n, then the eigenvectors can't span the space and a Jordan form decomposition is required.
Exercise 4.15: a) Let [J_i] be a 3 by 3 block of a Jordan form, i.e.,

    [J_i] = [ λ_i   1    0
               0   λ_i   1
               0    0   λ_i ].
Show that the nth power of [J_i] is given by

    [J_i^n] = [ λ_i^n   nλ_i^{n−1}   (n choose 2)λ_i^{n−2}
                  0       λ_i^n          nλ_i^{n−1}
                  0         0              λ_i^n       ].
Hint: Perhaps the easiest way is to calculate [Ji2 ] and [Ji3 ] and then use iteration.
Solution: It is probably worthwhile to multiply [J_i] by itself once or twice to gain familiarity with the problem, but note that the proposed formula for [J_i^n] is equal to [J_i] for n = 1, so we can use induction to show that it is correct for all larger n. That is, for each n ≥ 1, we assume the given formula for [J_i^n] is correct and demonstrate that [J_i][J_i^n] is the given formula evaluated at n + 1. We suppress the subscript i in the evaluation.
We start with the elements on the first row, i.e.,

    Σ_j J_{1j} J_{j1}^n = λ · λ^n = λ^{n+1}
    Σ_j J_{1j} J_{j2}^n = λ · nλ^{n−1} + 1 · λ^n = (n+1)λ^n
    Σ_j J_{1j} J_{j3}^n = λ · (n choose 2)λ^{n−2} + 1 · nλ^{n−1} = (n+1 choose 2)λ^{n−1}.

In the last equation, we used

    (n choose 2) + n = n(n−1)/2 + n = n(n+1)/2 = (n+1 choose 2).
The elements in the second and third row can be handled the same way, although with
slightly less work. The solution in (b) provides a more elegant and more general solution.
b) Generalize (a) to a k × k block of a Jordan form J. Note that the nth power of an entire Jordan form is composed of these blocks along the diagonal of the matrix.
Solution: This is rather difficult unless looked at in just the right way, but the right way
is quite instructive and often useful. Thus we hope you didn’t spin your wheels too much,
and hope you will learn from the solution. The idea comes from the elegant way to solve (a)
of Exercise 4.14, namely taking a graphical approach rather than a matrix multiplication
approach. Forgetting for the moment that J is not a stochastic matrix, we can draw its
graph, i.e., the representation of the non-zero elements of the matrix, as
(Graph: states 1 → 2 → 3 → ⋯ → k, with each state carrying a self-loop labelled λ and each edge i → i+1 labelled 1.)
Each edge, say from i → j in the graph, represents a non-zero element of J and is labelled by the value of J_{ij}. If we look at [J²], then J_{ij}² = Σ_k J_{ik} J_{kj} is the sum over the walks of length 2 in the graph of the ‘measure’ of that walk, where the measure of a walk is the product of the labels on the edges of that walk. Similarly, we can see that J_{ij}^n is the sum over walks of length n of the product of the labels for that walk. In short, J_{ij}^n can be found from the graph in the same way as for stochastic matrices; we simply ignore the fact that the outgoing edges from each node do not sum to 1.

We can now see immediately how to evaluate these elements. For J_{ij}^n, we must look at all the walks of length n that go from i to j. For i > j, the number is 0, so we assume j ≥ i. Each such walk must include j − i rightward transitions and thus must include n − j + i self-loops. Thus the measure of each such walk is λ^{n−j+i}. The number of such walks is the number of ways in which the j − i rightward transition times can be selected out of the n transitions altogether, i.e., (n choose j−i). Thus, the general formula is

    J_{ij}^n = (n choose j−i) λ^{n−j+i}    for j ≥ i.
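The walk-counting formula can be checked for any block size. A sketch (numpy assumed; the block size and λ below are arbitrary choices):

```python
import numpy as np
from math import comb

def jordan_block(lmbda, k):
    """k x k Jordan block: lambda on the diagonal, 1 on the superdiagonal."""
    return lmbda * np.eye(k) + np.diag(np.ones(k - 1), 1)

def jordan_power(lmbda, k, n):
    # J^n_{ij} = C(n, j-i) lambda^{n-(j-i)} for j >= i, else 0
    # (comb(n, m) = 0 for m > n, so short walks contribute nothing)
    J = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            J[i, j] = comb(n, j - i) * lmbda**(n - (j - i))
    return J

lmbda, k = 0.5, 4
J = jordan_block(lmbda, k)
for n in range(1, 8):
    assert np.allclose(np.linalg.matrix_power(J, n), jordan_power(lmbda, k, n))
print("J^n formula verified")
```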
c) Let [P] be a stochastic matrix represented by a Jordan form [J] as [P] = [U][J][U^{−1}] and consider [U^{−1}][P][U] = [J]. Show that any repeated eigenvalue λ of [P] that is represented by a Jordan block of 2 by 2 or more must satisfy |λ| < 1. Hint: Upper bound the element magnitudes of [U^{−1}][P^n][U] by taking the magnitude of the elements of [U] and [U^{−1}] and by upper bounding each element of [P^n] by 1.
Solution: Each element of the matrix [U^{−1}][P^n][U] is a sum of products of terms, and the magnitude of that sum is upper bounded by the sum of products of the magnitudes of the terms. Representing the matrix whose elements are the magnitudes of a matrix [U] as [|U|], and recognizing that [P^n] = [|P^n|] since its elements are nonnegative, we have

    [|J^n|] ≤ [|U^{−1}|][P^n][|U|] ≤ [|U^{−1}|][M][|U|],

where [M] is a matrix whose elements are all equal to 1.

The matrix on the right is independent of n, so each of its elements is simply a finite number. If [P] has an eigenvalue λ of magnitude 1 whose multiplicity exceeds its number of linearly independent eigenvectors, then [J^n] contains an element nλ^{n−1} of magnitude n, and for large enough n, this is larger than the finite number above, yielding a contradiction. Thus, for any stochastic matrix, any repeated eigenvalue with a defective number of linearly independent eigenvectors has a magnitude strictly less than 1.
d) Let λ_s be the eigenvalue of largest magnitude less than 1. Assume that the Jordan blocks for λ_s are at most of size k. Show that each ergodic class of [P] converges at least as fast as n^k λ_s^n.

Solution: What we want to show is that for each element i, j of the matrix, |[P^n] − eπ|_{ij} ≤ b n^k |λ_s|^n for some sufficiently large constant b depending on [P] but not on n.

The states of an ergodic class have no transitions out, and for questions of convergence within the class we can ignore transitions in. Thus we will get rid of some excess notation by simply assuming an ergodic Markov chain. Thus there is one eigenvalue equal to 1 and it has unit multiplicity. All other eigenvalues are strictly less than 1 in magnitude. The largest such magnitude is denoted |λ_s|. Assume that the eigenvalues are arranged in [J] in decreasing magnitude, so that J_{11} = 1.
Since [J] is in general a Jordan form, its upper triangular block structure is illustrated below (here with J_{11} = 1, a 3 by 3 Jordan block for λ_s, and further eigenvalues λ_3 and λ_4). The corresponding structure for [J^n], using part (b), is also shown.

    [J] = [ 1    0    0    0    0    0    0
            0   λ_s   1    0    0    0    0
            0    0   λ_s   1    0    0    0
            0    0    0   λ_s   0    0    0
            0    0    0    0   λ_3   0    0
            0    0    0    0    0   λ_4   1
            0    0    0    0    0    0   λ_4 ]

    [J^n] = [ 1     0          0              0            0      0        0
              0   λ_s^n   nλ_s^{n−1}  (n choose 2)λ_s^{n−2}  0      0        0
              0     0        λ_s^n        nλ_s^{n−1}         0      0        0
              0     0          0            λ_s^n            0      0        0
              0     0          0              0            λ_3^n    0        0
              0     0          0              0              0    λ_4^n  nλ_4^{n−1}
              0     0          0              0              0      0      λ_4^n   ].
For all n, we have [P^n] = [U][J^n][U^{−1}]. Note that lim_{n→∞}[J^n] is a matrix with J_{ij}^n → 0 except for i = j = 1, where J_{11}^n = 1 for all n. Let u_1 be the first column of [U] and let v_1 be the first row of [U^{−1}]. Then since only the top left element of [J^n] is nonzero in the limit, we can see that lim_{n→∞}[U J^n U^{−1}] = u_1 v_1. We have seen that lim_{n→∞}[P^n] = eπ. Thus it follows that u_1 = e and v_1 = π.

It follows from this that [P^n] − eπ = [U][J̃^n][U^{−1}], where [J̃^n] is the same as [J^n] except that the eigenvalue 1 has been replaced by 0. This means, however, that the magnitude of each element of [J̃^n] is upper bounded by n^k |λ_s^{n−k}|. It follows that the magnitude of each element in [U][J̃^n][U^{−1}] is upper bounded by b n^k |λ_s|^n for large enough b (the value of b takes account of [U], [U^{−1}], and λ_s^{−k}).
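The bound says the approach to steady state is governed by λ_s (times a polynomial in n when the block size k exceeds 1). For a diagonalizable two-state example the error is exactly proportional to λ_s^n; a sketch (numpy assumed; the chain below is our own example):

```python
import numpy as np

# Two-state ergodic chain; its eigenvalues are 1 and 0.7, so
# lambda_s = 0.7 with a 1x1 block (k = 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])        # solves pi P = pi
lam_s = 0.7

for n in (5, 10, 20, 40):
    err = np.abs(np.linalg.matrix_power(P, n) - np.outer([1.0, 1.0], pi)).max()
    print(n, err / lam_s**n)     # ratio is constant: err = (2/3) lambda_s^n
```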
Exercise 4.16: a) Let λ be an eigenvalue of a matrix [A], and let ν and π be right and left eigenvectors respectively of λ, normalized so that πν = 1. Show that

    [[A] − λνπ]² = [A²] − λ²νπ.

Solution: We simply multiply out the original square,

    [[A] − λνπ]² = [A²] − λ[A]νπ − λνπ[A] + λ²νπνπ
                 = [A²] − λ²νπ − λ²νπ + λ²νπ = [A²] − λ²νπ.

b) Show that

    [[A^n] − λ^n νπ][[A] − λνπ] = [A^{n+1}] − λ^{n+1}νπ.

Solution: This is essentially the same as (a),

    [[A^n] − λ^n νπ][[A] − λνπ] = [A^{n+1}] − λ[A^n]νπ − λ^n νπ[A] + λ^{n+1}νπνπ
                                = [A^{n+1}] − λ^{n+1}νπ − λ^{n+1}νπ + λ^{n+1}νπ = [A^{n+1}] − λ^{n+1}νπ.

c) Use induction to show that [[A] − λνπ]^n = [A^n] − λ^n νπ.

Solution: (a) gives the base of the induction and (b) gives the inductive step.
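The identity of (c) is easy to confirm numerically with any known eigenpair. A sketch (numpy assumed; we use a stochastic [A], whose eigenvalue λ = 1 has right eigenvector e and a known left eigenvector π):

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lam = 1.0
nu = np.array([[1.0], [1.0]])    # right eigenvector e, as a column
pi = np.array([[2/3, 1/3]])      # left eigenvector, row; pi nu = 1

B = A - lam * (nu @ pi)
n = 6
lhs = np.linalg.matrix_power(B, n)
rhs = np.linalg.matrix_power(A, n) - lam**n * (nu @ pi)
assert np.allclose(lhs, rhs)     # [[A] - lam nu pi]^n = [A^n] - lam^n nu pi
print("identity holds for n =", n)
```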
Exercise 4.17: Let [P] be the transition matrix for an aperiodic Markov unichain with the states numbered as in Figure 4.5.

a) Show that [P^n] can be partitioned as

    [P^n] = [ [P_T^n]  [P_x^n]
                 0     [P_R^n] ].

That is, the blocks on the diagonal are simply products of the corresponding blocks of [P], and the upper right block is whatever it turns out to be.
Solution: If we imagine multiplying [P][P] by straightforward block multiplication, we get

    [P²] = [ [P_T P_T]  [P_T P_{TR} + P_{TR} P_R]
                [0]            [P_R P_R]          ].

In the same way, using induction, and not tracking the upper right term,

    [P^n] = [ [P_T^n]  [P_x^n]
                [0]    [P_R^n] ].
Another approach is to look at P_{ij}^n. If i, j ∈ T, then any n-step walk from i to j stays in T. Similarly, if i, j ∈ R, any walk from i to j stays in R. Finally, there are no walks from R to T, giving rise to the matrix [0] in the lower left.
b) Let q_i be the probability that the chain will be in a recurrent state after t transitions, starting from state i, i.e., q_i = Σ_{j∈R} P_{ij}^t. Show that q_i > 0 for all transient i.

Solution: As specified in Figure 4.5, t is the number of transient states. As explained in Section 4.3.4, each transient state has a path of at most t steps to a recurrent state. All extensions of that path remain in recurrent states, and thus if the original path has fewer than t steps, any extension of that path to a walk with t steps still ends at a recurrent state and has positive probability. Thus q_i is positive.
c) Let q be the minimum q_i over all transient i and show that P_{ij}^{nt} ≤ (1 − q)^n for all transient i, j (i.e., show that [P_T^n] approaches the all-zero matrix [0] with increasing n).

Solution: Since Σ_{k∈R} P_{ik}^t ≥ q, it follows that Σ_{j∈T} P_{ij}^t ≤ 1 − q for all i ∈ T. Using the Chapman–Kolmogorov equations for i ∈ T,

    Σ_{j∈T} P_{ij}^{2t} = Σ_{k,j∈T} P_{ik}^t P_{kj}^t ≤ Σ_{k∈T} P_{ik}^t (1 − q) ≤ (1 − q)².

Iterating this equation, we get Σ_{j∈T} P_{ij}^{nt} ≤ (1 − q)^n and thus P_{ij}^{nt} ≤ (1 − q)^n for all j ∈ T.
d) Let π = (π_T, π_R) be a left eigenvector of [P] of eigenvalue 1. Show that π_T = 0 and show that π_R must be positive and be a left eigenvector of [P_R]. Thus show that π exists and is unique (within a scale factor).

Solution: Note that π is also an eigenvector of [P^n] for all n, so that π_T [P_T]^n = π_T. Since lim_{n→∞}[P_T^n] = 0, this shows that π_T = 0. We then have π_R = π_R [P_R], which has a unique solution.
e) Show that e is the unique right eigenvector of [P ] of eigenvalue 1 (within a scale factor).
Solution: e must be a right eigenvector of [P ] since [P ] is stochastic (i.e., its rows all sum
to 1, which is what [P ]e = e means). Since [P ] is a unichain, it has a single eigenvalue
equal to 1 and therefore a unique right eigenvector within a scale factor.
Exercise 4.18: Generalize Exercise 4.17 to the case of a Markov chain [P] with κ recurrent classes and one or more transient classes. In particular,

a) Show that [P] has exactly κ linearly independent left eigenvectors, π^{(1)}, π^{(2)}, …, π^{(κ)}, of eigenvalue 1, and that the mth can be taken as a probability vector that is positive on the mth recurrent class and zero elsewhere.
Solution: See Exercise 4.28 for a very simple example of these results, and see the solution to Exercise 4.13 for a more intuitive but less precise explanation.

Number the states so that the transient states come first, then the states in recurrent class 1, and so forth up to the recurrent states in class κ. Then [P] has the following block structure
    [P] = [ [P_T]  [P_{T R_1}]  ⋯  [P_{T R_κ}]
              0     [P_{R_1}]   ⋯       0
              ⋮         ⋮        ⋱       ⋮
              0         0       ⋯   [P_{R_κ}]  ].
The ℓth recurrent class has an |R_ℓ| by |R_ℓ| transition matrix which, viewed alone, has an eigenvalue λ = 1 of multiplicity 1, a corresponding unique (within a scale factor) left eigenvector, say π(R_ℓ), and a corresponding unique (within a scale factor) right eigenvector, ν(R_ℓ) = (1, …, 1)ᵀ (see Theorem 4.4.2).

Let π^{(ℓ)} be an M-dimensional row vector whose components are equal to those of π(R_ℓ) over the states of R_ℓ and equal to 0 elsewhere. Then it can be seen by visualizing elementary row/matrix multiplication on the block structure of [P] that π^{(ℓ)}[P] = π^{(ℓ)}. This gives us κ left eigenvectors of eigenvalue 1, one for each recurrent class R_ℓ.

If we look at the determinant equation, det[P − λI], the upper triangular block structure of [P] implies that det[P − λI] breaks into the product of κ + 1 determinants, one for the transient states and one for each recurrent class. Thus we can identify the eigenvalues with the classes: |T| eigenvalues for T and, for each ℓ, |R_ℓ| eigenvalues for R_ℓ. Each recurrent class has one eigenvalue equal to 1 and the transient set has no eigenvalues equal to 1. To verify this latter statement, assume the contrary, i.e., assume there is a row vector π^{(T)} that is nonzero only on the components of T and which satisfies π^{(T)} = π^{(T)}[P]. Then we must also have π^{(T)} = π^{(T)}[P^n] for all n > 1. However, we have seen that P_{ij}^n → 0 as n → ∞ for all i, j ∈ T. This yields the desired contradiction, since there can then be no nonzero solution to π^{(T)} = π^{(T)}[P^n] for large n.

We have thus shown that λ = 1 has multiplicity κ for the matrix [P] and we have identified all the left eigenvectors of eigenvalue 1 (or more strictly, we have identified a basis for the vector space of left eigenvectors of eigenvalue 1).
b) Show that [P] has exactly κ linearly independent right eigenvectors, ν^{(1)}, ν^{(2)}, …, ν^{(κ)}, of eigenvalue 1, and that the mth can be taken as a vector with ν_i^{(m)} equal to the probability that recurrent class m will ever be entered starting from state i.

Solution: If there are no transient states, then a set of κ right eigenvectors can be chosen in the same way as the left eigenvectors. That is, for each ℓ, 1 ≤ ℓ ≤ κ, the components of ν^{(ℓ)} can be chosen to be 1 for each state in R_ℓ and 0 for all other states. This doesn't satisfy the eigenvector equation if there are transient states, however. We now show, instead, that for each ℓ, 1 ≤ ℓ ≤ κ, there is a right eigenvector ν^{(ℓ)} of eigenvalue 1 such that ν_i^{(ℓ)} = 0 for all i ∈ R_m, for each m ≠ ℓ. Finally we will show that these κ vectors are linearly independent and have the properties specified in the problem statement.
The right eigenvector equation that must be satisfied by ν^{(ℓ)} with the conditions above can be written out component by component, getting

    ν_i^{(ℓ)} = Σ_{j∈R_ℓ} P_{ij} ν_j^{(ℓ)}                                for i ∈ R_ℓ
    ν_i^{(ℓ)} = Σ_{j∈T} P_{ij} ν_j^{(ℓ)} + Σ_{j∈R_ℓ} P_{ij} ν_j^{(ℓ)}     for i ∈ T.

The first set of equations above are simply the usual right eigenvector equations for eigenvalue 1 over the recurrent submatrix [P_{R_ℓ}]. Thus ν_j^{(ℓ)} = 1 for j ∈ R_ℓ and this solution (over R_ℓ) is unique within a scale factor. Substituting this solution into the second set of equations, we get

    ν_i^{(ℓ)} = Σ_{j∈T} P_{ij} ν_j^{(ℓ)} + Σ_{j∈R_ℓ} P_{ij}    for i ∈ T.    (A.55)
We can view this as a vector/matrix equation by letting ν_T^{(ℓ)} be the vector ν^{(ℓ)} restricted to the transient states. Thus, ν_T^{(ℓ)} = [P_T]ν_T^{(ℓ)} + x, where x_i = Σ_{j∈R_ℓ} P_{ij}. Since T is a set of transient states, [P_T] does not have 1 as an eigenvalue, so [P_T − I] is non-singular. Thus this set of equations has a unique solution (we have already specified the scale factor on ν^{(ℓ)} by setting ν_i^{(ℓ)} = 1 for i ∈ R_ℓ).
We have now established that for each ℓ, 1 ≤ ℓ ≤ κ, [P] has a right eigenvector of eigenvalue 1 that is nonzero only on R_ℓ and T. Since the components of ν^{(ℓ)} are 1 on R_ℓ and 0 on all other recurrent classes, these κ vectors are clearly linearly independent. Since the eigenvalue 1 has multiplicity κ, these eigenvectors form a basis for the eigenvectors of eigenvalue 1.

It remains only to show that ν_i^{(ℓ)} = lim_{n→∞} Pr{X_n ∈ R_ℓ | X_0 = i}. We have seen many times that if ν^{(ℓ)} is a right eigenvector of [P] of eigenvalue 1, then it is also a right eigenvector of [P^n] of eigenvalue 1. Since [P^n] has the same block structure as [P], we can repeat the argument leading to (A.55). For each n > 1, then, we have

    ν_i^{(ℓ)} = Σ_{j∈T} P_{ij}^n ν_j^{(ℓ)} + Σ_{j∈R_ℓ} P_{ij}^n    for i ∈ T.

In the limit n → ∞, each P_{ij}^n for i, j ∈ T goes to 0. Also Σ_{j∈R_ℓ} P_{ij}^n is the probability that X_n ∈ R_ℓ given X_0 = i. Since there is no exit from R_ℓ, this quantity is non-decreasing in n and bounded by 1, so it has a limit. This limit is the probability of ever going from i to R_ℓ, completing the derivation.
c) Show that

    lim_{n→∞} [P^n] = Σ_m ν^{(m)} π^{(m)}.
Solution: Unfortunately, this result is not true in general. The additional restriction that each recurrent class is ergodic must be added. In this case, P_{ij}^n for i ∈ T and j ∈ R_m converges to the probability that class R_m is entered from i times the steady-state probability of j within class R_m.
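Equation (A.55) gives a direct way to compute ν_T numerically: solve (I − [P_T])ν_T = x. A sketch (numpy assumed; the 5-state chain is our own illustration with two recurrent classes):

```python
import numpy as np

# States 0,1 transient; state 2 an absorbing class R1; states 3,4 an
# ergodic class R2.
P = np.array([[0.2, 0.3, 0.1, 0.4, 0.0],
              [0.3, 0.1, 0.4, 0.1, 0.1],
              [0.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.0, 0.6, 0.4]])

T, R1 = [0, 1], [2]
PT = P[np.ix_(T, T)]
x = P[np.ix_(T, R1)].sum(axis=1)      # one-step probability T -> R1

# (A.55): nu_T = PT nu_T + x, i.e. (I - PT) nu_T = x; [PT - I] is
# non-singular because T is transient.
nuT = np.linalg.solve(np.eye(len(T)) - PT, x)

# Cross-check against lim_n Pr{X_n in R1 | X_0 = i} via a large power.
P_big = np.linalg.matrix_power(P, 200)
assert np.allclose(nuT, P_big[np.ix_(T, R1)].sum(axis=1))
print(nuT)                            # [1/3, 5/9] for this chain
```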
Exercise 4.19: Suppose a recurrent Markov chain has period d and let S_m, 0 ≤ m ≤ d−1, be the mth subset in the sense of Theorem 4.2.9. Assume the states are numbered so that the first s_0 states are the states of S_0, the next s_1 are those of S_1, and so forth. Then the matrix [P] for the chain has the block form given by

    [P] = [    0      [P_0]    0     ⋯      0
               0        0    [P_1]   ⋯      0
               ⋮        ⋮      ⋮      ⋱      ⋮
               0        0      0     ⋯  [P_{d−2}]
           [P_{d−1}]    0      0     ⋯      0     ],

where [P_m] has dimension s_m × s_{m+1} for 0 ≤ m ≤ d−1, where the index (d−1)+1 is interpreted as 0 throughout (i.e., mod-d arithmetic is used on indices). In what follows it is often more convenient to express [P_m] as an M × M matrix [P′_m] whose entries are 0 except for the rows of S_m and the columns of S_{m+1}, where the entries are equal to those of [P_m]. In this view, [P] = Σ_{m=0}^{d−1} [P′_m].
a) Show that [P^d] has the form

    [P^d] = [ [Q_0]    0    ⋯      0
                0    [Q_1]  ⋯      0
                ⋮      ⋮     ⋱      ⋮
                0      0    ⋯  [Q_{d−1}] ],

where [Q_m] = [P_m][P_{m+1}] ⋯ [P_{d−1}][P_0] ⋯ [P_{m−1}]. Expressing [Q_m] as an M × M matrix [Q′_m] whose entries are 0 except for the rows and columns of S_m, where the entries are equal to those of [Q_m], this becomes [P^d] = Σ_{m=0}^{d−1} [Q′_m].
Solution: As explained in Section 4.3.5, P_{ij}^d must be 0 for (i, j) not in the same subset. Since the subsets are arranged as contiguous sets of state numbers, [Q_j] is the dth-order transition matrix of the states in class S_j. To see this more clearly, and to understand why [Q_m] = [P_m][P_{m+1}] ⋯ [P_{d−1}][P_0] ⋯ [P_{m−1}], note that [P²] is given by
    [P²] = [       0          0       [P_0 P_1]     0      ⋯
                   0          0          0       [P_1 P_2]  ⋯
                   ⋮          ⋮          ⋮          ⋮       ⋱
           [P_{d−2} P_{d−1}]  0          0          0       ⋯
                   0    [P_{d−1} P_0]    0          0       ⋯  ].

Similarly, [P^m] for m < d has a block structure where states in subset i can go only to states in subset (i + m) mod d, and go there with probabilities determined by going first to i + 1, then i + 2, up to i + m, all mod d.
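The block-diagonal structure of [P^d] is easy to see numerically. A sketch (numpy assumed; the small period-3 chain is our own example, with S_0 = {0}, S_1 = {1, 2}, S_2 = {3}):

```python
import numpy as np

P = np.zeros((4, 4))
P[0, 1], P[0, 2] = 0.5, 0.5     # [P0]: S0 -> S1
P[1, 3], P[2, 3] = 1.0, 1.0     # [P1]: S1 -> S2
P[3, 0] = 1.0                   # [P2]: S2 -> S0

Pd = np.linalg.matrix_power(P, 3)
# Off-diagonal blocks of P^d vanish: in d steps a state can only
# return to its own subset.
assert Pd[0, 1] == Pd[0, 2] == Pd[0, 3] == 0
assert np.allclose(Pd[0, 0], 1.0)   # [Q0] = [P0][P1][P2] is 1x1 here
print(Pd)
```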
b) Show that [Q_m] is the matrix of an ergodic Markov chain. Let its left eigenvector of eigenvalue 1 (scaled to be a probability vector) be π̂_m and let ν̂_m be the corresponding right eigenvector scaled so that π̂_m ν̂_m = 1. Let π̂′_m and ν̂′_m be the corresponding M-vectors. Show that lim_{n→∞} [P^{nd}] = Σ_{m=0}^{d−1} ν̂′_m π̂′_m.
Solution: Note that [Qm] for 0 ≤ m ≤ d−1 must be recurrent. This follows since [P] is recurrent (so that each pair of states in S_m communicates) and is periodic (so that each pair in S_m communicates only over multiples of d transitions). Also [Qm] is aperiodic, since otherwise P^k_{ii} for i ∈ S_m would be nonzero only for k = ℓd for some integer ℓ > 1, which is contrary to the given fact that the chain, and thus i, is periodic with period d. Thus each [Qm] is the matrix of an ergodic Markov chain. It follows that each [Qm] has a unique left eigenvector π̂_m whose components sum to 1, i.e., π̂_m e = 1 where e is the unit vector of s_m dimensions (i.e., ν̂_m = e). Thus lim_{n→∞} [Q^n_m] = ν̂_m π̂_m. If we now consider [Q'_m] as the M × M matrix that is equal to [Qm] on i, j ∈ S_m and zero elsewhere, we see that lim_{n→∞} [Q'^n_m] = ν̂'_m π̂'_m. Thus lim_{n→∞} [P^{nd}] = Σ_{m=0}^{d−1} ν̂'_m π̂'_m.
c) Show that π̂'_m [P'_m] = π̂'_{m+1}. Note that π̂'_m is an M-tuple that is non-zero only on the components of S_m.

Solution: [Qm] = [Pm P_{m+1} · · · P_{d−1} P0 · · · P_{m−1}]. We saw in (b) that each [Qm] has a unique (within a scale factor) left eigenvector π̂_m of eigenvalue 1. Thus π̂_m [Qm] = π̂_m, and

π̂_m [Pm P_{m+1} · · · P_{d−1} P0 · · · P_{m−1}] = π̂_m.
For m < d−1, we can postmultiply both sides of this equation by [Pm] to get

π̂_m [Pm][Q_{m+1}] = π̂_m [Pm].

This means that π̂_m [Pm] is a left eigenvector (with eigenvalue 1) of [Q_{m+1}]. Since each π̂_m is normalized to be a probability vector, the components of π̂_m [Pm] sum to 1, so π̂_m [Pm] = π̂_{m+1}. For m = d−1, the same argument shows that π̂_{d−1}[P_{d−1}] is a left eigenvector of [Q0]. In what follows, we use mod-d arithmetic on indices without special mention.
As in parts (a) and (b), we can express these results in terms of the M × M matrices. Thus,

[Q'_m] = [P'_m P'_{m+1} · · · P'_{d−1} P'_0 · · · P'_{m−1}], and it follows that π̂'_m [P'_m] = π̂'_{m+1}.
d) Let λ = e^{2πi/d} and let π^{(k)} = Σ_{m=0}^{d−1} π̂'_m λ^{mk}. Show that π^{(k)} is a left eigenvector of [P] of eigenvalue λ^{−k}.

Solution: Note that {λ^{−k}; k = 0, 1, . . . , d−1} is the set of d equi-spaced points around the complex unit circle. We want to show that these points are eigenvalues of [P] and that π^{(k)} is a corresponding left eigenvector of eigenvalue λ^{−k}. We have

π^{(k)}[P] = Σ_{m=0}^{d−1} λ^{mk} π̂'_m [P]
          = Σ_{m=0}^{d−1} λ^{mk} π̂'_m [P'_m]
          = Σ_{m=0}^{d−1} λ^{mk} π̂'_{m+1}
          = λ^{−k} Σ_{m=0}^{d−1} λ^{(m+1)k} π̂'_{m+1}
          = λ^{−k} Σ_{m=1}^{d} λ^{mk} π̂'_m  =  λ^{−k} π^{(k)}.

In the second step, we used the block structure of [P] and in the third step, we used (c). In the fourth step, we multiplied and divided by λ^k, and in the fifth, we substituted the index m for m+1 and used the mod-d summation to recognize that summing from 1 to d is equivalent to summing from 0 to d−1. Finally we used the definition of π^{(k)}.
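As a numerical illustration of parts (a)–(d), the following sketch builds a hypothetical period-3 chain with classes S0 = {0}, S1 = {1, 2}, S2 = {3} (the transition probabilities are invented for the example) and checks that [P^d] is block diagonal and that each π^{(k)} is a left eigenvector of [P] of eigenvalue λ^{−k}:

```python
import numpy as np

# Hypothetical period-3 chain with classes S0 = {0}, S1 = {1, 2}, S2 = {3}.
P = np.array([[0.0, 0.3, 0.7, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])
d = 3
Pd = np.linalg.matrix_power(P, d)

# (a): [P^d] is block diagonal on the classes.
assert np.allclose(Pd[0, 1:], 0) and np.allclose(Pd[3, :3], 0)

# (b): steady-state vector of each block [Q_m], zero-padded to an M-vector.
masks = [np.array([1, 0, 0, 0], dtype=bool),
         np.array([0, 1, 1, 0], dtype=bool),
         np.array([0, 0, 0, 1], dtype=bool)]
pihat = []
for mask in masks:
    Q = Pd[np.ix_(mask, mask)]
    w, v = np.linalg.eig(Q.T)
    vec = np.real(v[:, np.argmin(np.abs(w - 1))])
    full = np.zeros(4)
    full[mask] = vec / vec.sum()
    pihat.append(full)

# (d): pi^(k) = sum_m lambda^{mk} pihat'_m is a left eigenvector of eigenvalue lambda^{-k}.
lam = np.exp(2j * np.pi / d)
for k in range(d):
    pik = sum(lam ** (m * k) * pihat[m] for m in range(d))
    assert np.allclose(pik @ P, lam ** (-k) * pik)
    assert np.min(np.abs(np.linalg.eigvals(P) - lam ** (-k))) < 1e-8
```

The left eigenvectors of the blocks are obtained, as usual in numpy, as eigenvectors of the transposed matrix.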
Exercise 4.20: (continuation of Exercise 4.19) a) Show that, with the eigenvectors defined in Exercise 4.19,

lim_{n→∞} [P^{nd}][P] = Σ_{i=0}^{d−1} ν^{(i)} π^{(i+1)},    (A.56)

where, as before, (d−1) + 1 is taken to be 0.
Solution: The quantities ν̂'_i and π̂'_i defined in Exercise 4.19 are being referred to here as ν^{(i)} and π^{(i)}. We showed there that lim_{n→∞} [P^{nd}] = Σ_{i=0}^{d−1} ν^{(i)} π^{(i)} and π^{(i)}[P] = π^{(i+1)}. Thus (again using mod-d indices)

lim_{n→∞} [P^{nd}][P] = Σ_{i=0}^{d−1} ν^{(i)} π^{(i)} [P] = Σ_{i=0}^{d−1} ν^{(i)} π^{(i+1)}.
b) Show that, for 1  j < d,
lim [P nd ][P j ] =
n!1
d 1
X
⌫ (i)⇡ (i+j) .
(A.57)
i=0
Solution: This follows by successively postmultiplying (A.56) by [P ] j 1 times.
c) Show that

lim_{n→∞} [P^{nd}] { [I] + [P] + · · · + [P^{d−1}] } = Σ_{i=0}^{d−1} ν^{(i)} ( Σ_{j=0}^{d−1} π^{(i+j)} ).    (A.58)
Solution: This results immediately by summing (A.57) over j.
d) Show that

lim_{n→∞} (1/d) ( [P^n] + [P^{n+1}] + · · · + [P^{n+d−1}] ) = e π,    (A.59)

where π is the steady-state probability vector for [P]. Hint: Show that e = Σ_m ν^{(m)} and π = (1/d) Σ_m π^{(m)}.
Solution: Because of the mod-d indices, the last term in (A.58) can be replaced by Σ_{j=0}^{d−1} π^{(j)}. Now since π^{(j)}[P] = π^{(j+1)}, we see that (Σ_{j=0}^{d−1} π^{(j)})[P] = Σ_{j=0}^{d−1} π^{(j)}, again because of the mod-d indices. Thus Σ_{j=0}^{d−1} π^{(j)} is a left eigenvector of [P]. However, each π^{(j)} was chosen as a steady-state vector for a periodic subset of states and thus each sums to 1. Thus π = (1/d) Σ_{j=0}^{d−1} π^{(j)} is a left eigenvector of [P] for which the components sum to 1. This result also follows from Exercise 4.19(d) by setting k = 0.

We also note that ν^{(i)} is a vector whose components are equal to 1 over the ith periodic subset of states and 0 elsewhere. Thus Σ_{i=0}^{d−1} ν^{(i)} = e. We have thus shown that

lim_{n→∞} (1/d) [P^{nd}] { [I] + [P] + · · · + [P^{d−1}] } = e π.

This equation remains valid if we postmultiply it by [P] or [P^m] for any m, and thus we get (A.59).
e) Show that the above result is also valid for periodic unichains.
Solution: Each transient state can join each periodic subset with some arbitrary probability, but this makes no difference as far as (A.59) is concerned, since these probabilities are summed over the periodic subsets. We omit the details.
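The content of (A.59) can be seen in a minimal numerical sketch (a hypothetical period-2 chain; the numbers are invented): the powers [P^n] themselves oscillate, but their average over one full period converges to eπ.

```python
import numpy as np

# A period-2 chain: [P^n] itself does not converge, but the average over a
# full period does, to e*pi where pi is the steady-state vector of [P].
P = np.array([[0.0, 0.6, 0.4],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
d = 2

# Steady-state vector of [P] (left eigenvector of eigenvalue 1).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

n = 200
Pn = np.linalg.matrix_power(P, n)
avg = sum(np.linalg.matrix_power(P, n + m) for m in range(d)) / d

assert not np.allclose(Pn, np.outer(np.ones(3), pi))   # [P^n] alone oscillates
assert np.allclose(avg, np.outer(np.ones(3), pi))      # the period average is e*pi
```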
Exercise 4.21: Suppose A and B are each ergodic Markov chains with transition probabilities {P_{Ai,Aj}} and {P_{Bi,Bj}} respectively. Denote the steady-state probabilities of A and B by {π_{Ai}} and {π_{Bi}} respectively. The chains are now connected and modified as shown below. In particular, states A1 and B1 are now connected and the new transition probabilities P' for the combined chain are given by

P'_{A1,B1} = ε;        P'_{A1,Aj} = (1−ε) P_{A1,Aj}    for all Aj;
P'_{B1,A1} = δ;        P'_{B1,Bj} = (1−δ) P_{B1,Bj}    for all Bj.

All other transition probabilities remain the same. Think intuitively of ε and δ as being small, but do not make any approximations in what follows. Give your answers to the following questions as functions of ε, δ, {π_{Ai}} and {π_{Bi}}.
[Figure: Chain A and Chain B, joined by the new transitions A1 → B1 (probability ε) and B1 → A1 (probability δ).]

a) Assume that ε > 0, δ = 0 (i.e., that A is a set of transient states in the combined chain). Starting in state A1, find the conditional expected time to return to A1 given that the first transition is to some state in chain A.
Solution: Conditional on the first transition from state A1 being to a state Ai ≠ A1, these conditional transition probabilities are the same as the original transition probabilities for A. If we look at a long sequence of transitions in chain A alone, the relative frequency of state A1 tends to π_{A1}, so we might hypothesize that, within A, the expected time to return to A1 starting from A1 is 1/π_{A1}. This hypothesis is correct, and we will verify it by a simple argument when we study renewal theory. Here, however, we verify it by looking at first-passage times within the chain A. For now, label the states in A as (1, 2, . . . , M), where 1 stands for A1. For 2 ≤ i ≤ M, let v_i be the expected time to first reach state 1 starting in state i. As in (4.31),

v_i = 1 + Σ_{j≠1} P_{ij} v_j;        2 ≤ i ≤ M.    (A.60)
We can use these equations to write an equation for the expected time T to return to state 1 given that we start in state 1. The first transition goes to each state j with probability P_{1j}, and the remaining time to reach state 1 from state j is v_j. We define v_1 = 0 since, if the first transition from 1 goes to 1, there is no remaining time required to return to state 1. We then have

T = 1 + Σ_{j=1}^{M} P_{1j} v_j.    (A.61)
Note that this is very different from (4.32), where [P] is a Markov chain in which 1 is a trapping state. We can now combine (A.61) (for component 1) and (A.60) (for components 2 to M) into the following vector equation:

T e_1 + v = e + [P] v,

where e_1 = (1, 0, 0, . . . , 0)^T and e = (1, 1, . . . , 1)^T. Motivated by the hypothesis that T = 1/π1, we premultiply this vector equation by the steady-state row vector π, getting

T π1 + π v = 1 + π [P] v = 1 + π v.

Cancelling π v from each side, we get T = 1/π1 as hypothesized.
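The identity T = 1/π1 can also be checked numerically for an arbitrary ergodic chain; the sketch below (a random hypothetical chain) solves (A.60) for the first-passage times and then applies (A.61):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5
P = rng.random((M, M)); P /= P.sum(axis=1, keepdims=True)   # random ergodic chain

# Steady state: left eigenvector of eigenvalue 1, normalized to sum to 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

# First-passage times to state 0 (the text's state 1), solving (A.60).
A = np.eye(M - 1) - P[1:, 1:]
vfp = np.linalg.solve(A, np.ones(M - 1))

# Expected return time to state 0, applying (A.61) with v_0 = 0.
T = 1 + P[0, 1:] @ vfp
assert np.isclose(T, 1 / pi[0])
```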
b) Assume that ε > 0, δ = 0. Find T_{A,B}, the expected time to first reach state B1 starting from state A1. Your answer should be a function of ε and the original steady-state probabilities {π_{Ai}} in chain A.

Solution: Starting in state A1, we reach B1 in a single step with probability ε. With probability 1−ε, we wait for a return to A1 and then have expected time T_{A,B} remaining. Thus T_{A,B} = ε + (1−ε)[1/π_{A1} + T_{A,B}]. Solving this equation,

T_{A,B} = 1 + (1−ε)/(ε π_{A1}).

c) Assume ε > 0, δ > 0. Find T_{B,A}, the expected time to first reach state A1, starting in state B1. Your answer should depend only on δ and {π_{Bi}}.
Solution: The fact that ε > 0 here is irrelevant, since that transition can never be used in the first passage from B1 to A1. Thus the answer is the reversed version of the answer to (b), where now π_{B1} is the steady-state probability of B1 for chain B alone:

T_{B,A} = 1 + (1−δ)/(δ π_{B1}).

d) Assume ε > 0 and δ > 0. Find P'(A), the steady-state probability that the combined chain is in one of the states {Aj} of the original chain A.
Solution: With 0 < ε < 1 and 0 < δ < 1, the combined chain is ergodic. To see this, note that all states communicate with each other, so the combined chain is recurrent. Also, all walks in A are still walks in the combined chain, so the gcd of their lengths is 1. Thus A, and consequently B, are still aperiodic.

We can thus use the steady-state equations to find the unique steady-state vector π. In parts (d), (e), and (f), we are interested in those probabilities in chain A. We denote those states, as before, as (1, . . . , M), where 1 is state A1. Steady-state probabilities for A in the combined chain for given ε, δ are denoted π'_j, whereas they are denoted as π_j in the original chain. We first find π'_1 and then π'_j for 2 ≤ j ≤ M.

As we saw in (a), the expected first-return time from a state to itself is the reciprocal of the steady-state probability, so we first find T_{AA}, the expected time of first return from A1 to A1. Given that the first transition from state 1 goes to a state in A, the expected
first-return time is 1/π1 from (a). If the transition goes to B1, the expected first-return time is 1 + T_{BA}, where T_{BA} is found in (c). Combining these with the a priori probabilities of going to A or B,

T_{AA} = (1−ε)/π1 + ε[1 + T_{BA}] = (1−ε)/π1 + 2ε + ε(1−δ)/(δ π_{B1}).

Thus

π'_1 = [ (1−ε)/π1 + 2ε + ε(1−δ)/(δ π_{B1}) ]^{−1}.
Next we find π'_j for the other states in A in terms of the π_j for the uncombined chains and π'_1. The original and the combined steady-state equations for 2 ≤ j ≤ M are

π_j = Σ_{i≠1} π_i P_{ij} + π1 P_{1j};        π'_j = Σ_{i≠1} π'_i P_{ij} + π'_1 (1−ε) P_{1j}.

These equations, as M−1 equations in the unknowns π_j, j ≥ 2, for given π1, uniquely specify π2, . . . , πM, and they differ only in π1 being replaced by (1−ε)π'_1. From this, we see that the second set of equations is satisfied if we choose

π'_j = π_j (1−ε) π'_1 / π1.
We can now sum the steady-state probabilities in A to get

Pr{A} = Σ_{j=1}^{M} π'_j = π'_1 [ (1−ε)/π1 + ε ].

e) Assume ε > 0, δ = 0. For each state Aj ≠ A1 in A, find v_{Aj}, the expected number of visits to state Aj, starting in state A1, before reaching state B1. Your answer should depend only on ε and {π_{Ai}}.
Solution: We use a variation on the first-passage-time problem in (a) to find the expected number of visits to state j, E[Nj], in the original chain, starting in state 1, before the first return to 1. Here we let v_i(j) be the expected number of visits to j, starting in state i, before the first return to 1. The equations are

E[Nj] = Σ_{k≠1} P_{1k} v_k(j) + P_{1j};        v_i(j) = Σ_{k≠1} P_{ik} v_k(j) + P_{ij}    for i ≠ 1.

The first equation represents E[Nj] in terms of the expected number of visits to j conditional on each first transition k from the initial state 1. In the special case where that first transition is to j, the expected number of visits to j includes both 1 for the initial visit plus v_j(j) for the subsequent visits. The second set of equations is similar, but gives the expected number of visits to j (before a return to 1) starting from each state other than 1. Writing this as a vector equation, with v(j) = (0, v_2(j), v_3(j), . . . , v_M(j))^T, we get

E[Nj] e_1 + v(j) = [P] v(j) + [P] e_j.
Note that [P] e_j is the column vector (P_{1j}, . . . , P_{Mj})^T. Premultiplying by π, we see that E[Nj] = π_j/π1. Finally, to find v_{Aj}, the expected number of visits to state j before the first transition to B1, we have

v_{Aj} = (1−ε)[ E[Nj] + v_{Aj} ],    so that    v_{Aj} = (1−ε) E[Nj] / ε = (1−ε) π_j / (ε π1).

f) Assume ε > 0, δ > 0. For each state Aj in A, find π'_{Aj}, the steady-state probability of being in state Aj in the combined chain. Hint: Be careful in your treatment of state A1.
Solution: This was solved in (d). Readers might want to come back to this exercise later
and re-solve it more simply using renewal theory.
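The closed forms in parts (b)–(d) can be sanity-checked numerically. The sketch below uses two hypothetical two-state chains for A and B and hypothetical values of ε and δ, builds the combined chain, and compares its exact steady state against the formulas above:

```python
import numpy as np

def steady(P):
    w, v = np.linalg.eig(P.T)
    p = np.real(v[:, np.argmin(np.abs(w - 1))])
    return p / p.sum()

# Two small hypothetical ergodic chains A and B and the coupling probabilities.
PA = np.array([[0.5, 0.5], [0.3, 0.7]])
PB = np.array([[0.2, 0.8], [0.6, 0.4]])
eps, delta = 0.05, 0.02
piA, piB = steady(PA), steady(PB)

# Combined chain, states ordered (A1, A2, B1, B2): A1->B1 w.p. eps, B1->A1 w.p. delta.
P = np.zeros((4, 4))
P[0, :2] = (1 - eps) * PA[0]; P[0, 2] = eps
P[1, :2] = PA[1]
P[2, 2:] = (1 - delta) * PB[0]; P[2, 0] = delta
P[3, 2:] = PB[1]
pi = steady(P)

# Part (d): closed forms for pi'_{A1} and Pr{A}.
pi1 = 1 / ((1 - eps) / piA[0] + 2 * eps + eps * (1 - delta) / (delta * piB[0]))
assert np.isclose(pi[0], pi1)
assert np.isclose(pi[:2].sum(), pi1 * ((1 - eps) / piA[0] + eps))
```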
Exercise 4.22: Section 4.5.1 showed how to find the expected first passage times to a fixed state, say
1, from all other states. It is often desirable to include the expected first recurrence time from state 1 to
return to state 1. This can be done by splitting state 1 into 2 states, first an initial state with no transitions
coming into it but the original transitions going out, and second, a final trapping state with the original
transitions coming in.
a) For the chain on the left side of Figure 4.6, draw the graph for the modified chain with 5 states where
state 1 has been split into 2 states.
Solution: We split state 1 into states 1′ and 1, where 1′ is the trapping state and 1 can be an initial state.
[Figure: the left chain of Figure 4.6 with state 1 split into an initial state 1 (the original outgoing transitions) and a trapping state 1′ (the original incoming transitions); states 2, 3, 4 are unchanged.]
b) Suppose one has found the expected first-passage-times vj for states j = 2 to 4 (or in general from 2 to
M). Find an expression for v1 , the expected first recurrence time for state 1 in terms of v2 , v3 , . . . vM and
P12 , . . . , P1M .
Solution: Note that v2, v3, and v4 are unchanged by the addition of state 1, since no transitions can go to 1 from states 2, 3, or 4. We then have

v1 = 1 + Σ_{j=2}^{4} P_{1j} v_j.
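A numerical illustration of the splitting construction (using a random hypothetical 4-state chain in place of the chain of Figure 4.6, whose probabilities are not reproduced here): the expected time from the initial copy of state 1 to the trapping copy equals the first-recurrence time 1/π1.

```python
import numpy as np

# Hypothetical 4-state ergodic chain standing in for Figure 4.6.
rng = np.random.default_rng(1)
P = rng.random((4, 4)); P /= P.sum(axis=1, keepdims=True)

# Split state 1 (index 0) into an initial state 0 and a trapping state 4:
# transitions out of 0 are unchanged, transitions into 0 are redirected to 4.
Q = np.zeros((5, 5))
Q[0, 1:4] = P[0, 1:]          # original outgoing transitions of state 1
Q[0, 4] = P[0, 0]             # a self-loop of state 1 now goes straight to the trap
Q[1:4, 1:4] = P[1:, 1:]
Q[1:4, 4] = P[1:, 0]          # transitions into state 1 now enter the trap
Q[4, 4] = 1.0

# Expected time to reach the trap from each non-trapping state.
A = np.eye(4) - Q[:4, :4]
v = np.linalg.solve(A, np.ones(4))

# v[0] is the expected first-recurrence time of state 1, which must equal 1/pi_1.
w, vec = np.linalg.eig(P.T)
pi = np.real(vec[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()
assert np.isclose(v[0], 1 / pi[0])
```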
Exercise 4.23: a) Assume throughout that [P] is the transition matrix of a unichain (and thus the eigenvalue 1 has multiplicity 1). Show that a solution to the equation [P]w − w = r − ge exists if and only if r − ge lies in the column space of [P − I], where [I] is the identity matrix.
Solution: Let C[P−I] be the column space of [P−I]. A vector x is in C[P−I] if x is a linear combination of columns of [P−I], i.e., if x is w1 times the first column of [P−I] plus w2 times the second column, etc. More succinctly, x ∈ C[P−I] if and only if x = [P−I]w for some vector w. Thus r − ge ∈ C[P−I] if and only if [P−I]w = r − ge for some w, which after rearrangement is what is to be shown.
b) Show that this column space is the set of vectors x for which π x = 0. Then show that r − ge lies in this column space.

Solution: We know that [P] has a single eigenvalue equal to 1. Thus [P−I] is singular, and the steady-state vector π satisfies π[P−I] = 0. Thus for every w, π[P−I]w = 0, so π x = 0 for every x ∈ C[P−I]. Furthermore, π (and its scalar multiples) are the only vectors to satisfy π[P−I] = 0, and thus C[P−I] is an (M−1)-dimensional vector space. Since the vector space of vectors x that satisfy π x = 0 is (M−1)-dimensional, this must be the same as C[P−I]. Finally, since g is defined as π r, we have π(r − ge) = 0, so r − ge ∈ C[P−I].
c) Show that, with the extra constraint that π w = 0, the equation [P]w − w = r − ge has a unique solution.

Solution: For any w₀ that satisfies [P]w₀ − w₀ = r − ge, it is easy to see that w₀ + αe also satisfies this equation for any real α. Furthermore, since the column space of [P−I] is (M−1)-dimensional, this set of solutions, namely {w₀ + αe; α ∈ ℝ}, is the entire set of solutions. The additional constraint that π(w₀ + αe) = π w₀ + α = 0 specifies a unique element in this set.
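The existence and uniqueness argument of (a)–(c) can be illustrated by appending the constraint πw = 0 as an extra row to the singular system; the chain and reward below are random hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
P = rng.random((M, M)); P /= P.sum(axis=1, keepdims=True)   # an ergodic unichain
r = rng.random(M)

w_, v_ = np.linalg.eig(P.T)
pi = np.real(v_[:, np.argmin(np.abs(w_ - 1))]); pi /= pi.sum()
g = pi @ r

# ([P] - [I]) w = r - g e is singular but consistent (pi @ (r - ge) = 0);
# appending the row "pi w = 0" picks out the unique solution.
A = np.vstack([P - np.eye(M), pi])
b = np.append(r - g, 0.0)
w, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.isclose(pi @ (r - g), 0)            # r - ge lies in the column space of [P - I]
assert np.allclose((P - np.eye(M)) @ w, r - g)
assert np.isclose(pi @ w, 0)
```

Because the augmented matrix has full column rank and the right side is consistent, the least-squares solve returns the exact solution.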
Exercise 4.24: For the Markov chain with rewards in Figure 4.8,

a) Find the solution to (4.37) and find the gain g.

Solution: The symmetry in the transition probabilities shows that π = (1/2, 1/2), and thus g = π r = 1/2. The first component of (4.37), i.e., of w + ge = [P]w + r, is then w1 + 1/2 = P11 w1 + P12 w2 + r1. This simplifies to w1 − w2 = −50. The second component is redundant, and π w = 0 simplifies to w1 + w2 = 0. Thus w1 = −25 and w2 = 25.
b) Modify Figure 4.8 by letting P12 be an arbitrary probability. Find g and w again and give an intuitive explanation of why P12 affects w2.

Solution: With arbitrary P12, the steady-state probabilities become

π1 = 0.01/(P12 + 0.01);        π2 = P12/(P12 + 0.01).

The steady-state gain, g = π r, then becomes g = P12/(P12 + 0.01). Solving for w as before, we get

w1 = −P12/(P12 + 0.01)²;        w2 = 0.01/(P12 + 0.01)².
As P12 increases, the mean duration of each visit to state 1 decreases so that the fraction of
time spent in state 1 also decreases, thus increasing the expected gain. At the same time,
the relative advantage of starting in state 2 decreases since the interruptions to having a
reward on each transition become shorter. For example, if P12 = 1/2, the mean time to
leave state 1, starting in state 1, is 2.
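These formulas can be checked numerically, assuming (as the formulas above imply) that Figure 4.8 has P21 = 0.01 and reward vector r = (0, 1):

```python
import numpy as np

# Figure 4.8's chain with P21 = 0.01 and r = (0, 1), with P12 as a parameter.
def gain_and_w(P12):
    P = np.array([[1 - P12, P12], [0.01, 0.99]])
    r = np.array([0.0, 1.0])
    pi = np.array([0.01, P12]) / (P12 + 0.01)
    g = pi @ r
    # Solve w + g e = [P] w + r together with pi w = 0.
    A = np.vstack([np.eye(2) - P, pi])
    b = np.append(r - g, 0.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g, w

g, w = gain_and_w(0.01)                      # the symmetric case of part (a)
assert np.isclose(g, 0.5)
assert np.allclose(w, [-25.0, 25.0])

for P12 in (0.1, 0.5, 0.9):                  # part (b): arbitrary P12
    g, w = gain_and_w(P12)
    assert np.isclose(g, P12 / (P12 + 0.01))
    assert np.isclose(w[0], -P12 / (P12 + 0.01) ** 2)
    assert np.isclose(w[1], 0.01 / (P12 + 0.01) ** 2)
```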
Exercise 4.25: (Proof of Corollary 4.5.6) a) Show that the gain per stage g is 0. Hint: Show that r is zero where the steady-state vector π is non-zero.

Solution: The steady-state vector π is non-zero only for the recurrent states, and the reward is 0 on each of those states, so the steady-state gain g = π r is 0.
b) Let [PR ] be the transition matrix for the recurrent states and let r R = 0 be the reward vector and w R
the relative-gain vector for [PR ]. Show that w R = 0. Hint: Use Theorem 4.5.4.
Solution: Treating the recurrent states in isolation, (4.37) becomes w_R = [PR]w_R and π_R w_R = 0. The first part of this is simply the equation for the right eigenvector of [PR] of eigenvalue 1. This is unique within a scale factor, and the second part specifies that the scale factor is 0. As a matter of terminology, w_R is then not an eigenvector, since eigenvectors cannot be 0, but w_R = 0 does satisfy the eigenvector equation aside from the nonzero condition.
c) Show that wi = 0 for all i ∈ R. Hint: Compare the relative-gain equations for [P] to those for [PR].

Solution: Use the block upper-triangular form of [P] in Figure 4.5 and represent the transient and recurrent parts of w as (w_T, w_R). Then [P]w over the recurrent states is [PR]w_R, and (4.37) restricted to these components becomes the same as in (b).
d) Show that for each n ≥ 0, [P^n]w = [P^{n+1}]w + [P^n]r. Hint: Start with the relative-gain equation for [P].
Solution: The relative-gain equation, (4.37), with g = 0, is w = [P]w + r. Premultiplying this by [P] yields [P]w = [P²]w + [P]r, and, for n > 1, premultiplying by [P^n] yields [P^n]w = [P^{n+1}]w + [P^n]r. For the special case of n = 0, we interpret [P⁰] as the identity matrix, so [P^n]w = [P^{n+1}]w + [P^n]r becomes w = [P]w + r, which is the relative-gain equation.
e) Show that w = [P^{n+1}]w + Σ_{m=0}^{n} [P^m]r. Hint: Sum the result in (d).
Solution: Summing the result of (d) over the first n+1 terms,

Σ_{m=0}^{n} [P^m]w = Σ_{m=1}^{n+1} [P^m]w + Σ_{m=0}^{n} [P^m]r.
Canceling the repeated terms from each side, we have the desired result.
f) Show that lim_{n→∞} [P^{n+1}]w = 0 and that lim_{n→∞} Σ_{m=0}^{n} [P^m]r is finite, nonnegative, and has positive components for ri > 0. Hint: Use Lemma 4.3.6.
Solution: Since w_R = 0, we have [P^{n+1}]w = [P_T^{n+1}]w_T. From Lemma 4.3.6, the components of [P_T^{n+1}] all approach 0 geometrically in n. Thus lim_{n→∞} [P^{n+1}]w = 0, and lim_{n→∞} Σ_{m=0}^{n} [P^m]r is finite. It is nonnegative since r is nonnegative. Finally, if ri > 0, then wi = Σ_{n=0}^{∞} Σ_j P^n_{ij} rj > 0 because P^0_{ii} = 1.
g) Demonstrate the final result of the corollary by using the previous results on r = r′ − r′′.
Solution: This final result is obvious by linearity.
Exercise 4.26: Consider the Markov chain below:

a) Suppose the chain is started in state i and goes through n transitions; let v_i(n) be the expected number of transitions (out of the total of n) until the chain enters the trapping state, state 1. Find an expression for v(n) = (v1(n), v2(n), v3(n))^T in terms of v(n−1) (take v1(n) = 0 for all n). (Hint: view the system as a Markov reward system; what is the value of r?)
Solution: We use essentially the same approach as in Section 4.5.1, but we are explicit
here about the number of transitions. Starting in any state i ≠ 1 with n transitions to go,
[Figure: states 1, 2, 3. State 1 is trapping (P11 = 1). From state 2: P21 = 1/2, P22 = 1/2. From state 3: P31 = 1/4, P32 = 1/2, P33 = 1/4.]
the expected number of transitions until the chain either enters the trapping state or completes the nth transition is given by

v_i(n) = 1 + Σ_{j=1}^{3} P_{ij} v_j(n−1).    (A.62)

Since i = 1 is the trapping state, we can express v1(n) = 0 for all n, since no transitions are required to enter the trapping state. Thus (A.62) is modified for i = 1 to v1(n) = Σ_{j=1}^{3} P_{1j} v_j(n−1). Viewing r as a reward vector, with one unit of reward for being in any state other than the trapping state, we have r = (0, 1, 1)^T. Thus (A.62) can be expressed in vector form as

v(n) = r + [P] v(n−1).
b) Solve numerically for lim_{n→∞} v(n). Interpret the meaning of the elements vi in the solution of (4.32).

Solution: We have already seen that v1(n) = 0 for all n, and thus, since P23 = 0, we have v2(n) = 1 + (1/2)v2(n−1). Since v2(0) = 0, this iterates to

v2(n) = 1 + (1/2)[1 + v2(n−2)] = 1 + 1/2 + (1/4)[1 + v2(n−3)] = · · · = 2 − 2^{−(n−1)}.
For v3(n), we use the same approach, but directly use the above solution for v2(n) for each n:

v3(n) = 1 + P32 v2(n−1) + P33 v3(n−1) = 1 + (1/2)[2 − 2^{−(n−2)}] + (1/4) v3(n−1)
      = [2 − 2^{−(n−1)}] + (1/4)[2 − 2^{−(n−2)}] + (1/16) v3(n−2) = · · ·
      = [2 − 2^{−(n−1)}] + (1/4)[2 − 2^{−(n−2)}] + (1/16)[2 − 2^{−(n−3)}] + · · · + (1/4^{n−1})[2 − 2^0]
      = 2 [1 + 1/4 + · · · + 1/4^{n−1}] − [2^{−(n−1)} + 2^{−n} + · · · + 2^{−2n+2}]
      = 2 (1 − 4^{−n})/(1 − 4^{−1}) − 2^{−(n−1)} (1 − 2^{−n})/(1 − 2^{−1})
      = 8/3 − (1/3) 2^{−2n+3} − 2^{−n+2} + 2^{−2n+2}.

Taking the limit,

lim_{n→∞} v2(n) = 2;        lim_{n→∞} v3(n) = 8/3.
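The iteration v(n) = r + [P]v(n−1) and the closed forms above can be checked directly (states renumbered 0, 1, 2 with state 0 trapping):

```python
import numpy as np

# State 0 is trapping; state 1 is the text's state 2; state 2 is the text's state 3.
P = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25]])
r = np.array([0.0, 1.0, 1.0])

v = np.zeros(3)
for n in range(1, 61):
    v = r + P @ v                        # v(n) = r + [P] v(n-1)
    # check the closed forms derived above
    assert np.isclose(v[1], 2 - 2.0 ** (-(n - 1)))
    assert np.isclose(v[2], 8/3 - 2.0 ** (-n + 2) + (1/3) * 2.0 ** (-2*n + 2))

assert np.allclose(v, [0.0, 2.0, 8/3])   # the limiting first-passage times
```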
c) Give a direct argument why (4.32) provides the solution directly to the expected time from each state
to enter the trapping state.
Solution: The limit as n → ∞ gives the expected time to enter the trapping state with no limit on the required number of transitions. This limit exists in general since the transient states persist with probabilities decreasing exponentially in n.
Exercise 4.27: a) Show that (4.48) can be rewritten in the more compact form

v*(n, u) = v*(1, v*(n−1, u)).

Solution: From (4.45), v*(1, u) = max_k { r^k + [P^k] u }. This is valid for an arbitrary choice of u, so choosing v*(n−1, u) in place of u, we get

v*(1, v*(n−1, u)) = max_k ( r^k + [P^k] v*(n−1, u) ).

This is equal to v*(n, u) from (4.48).
b) Explain why it is also true that

v*(2n, u) = v*(n, v*(n, u)).    (A.63)

Solution: We can see that v*(n, v*(n, u)) is the maximum gain over n stages using a final gain of v*(n, u), where the final gain v*(n, u) can be interpreted as the maximum gain over the next n stages (with a final gain at time 2n of u). This expression makes sense since the gain over the next n stages depends only on the state at the beginning of those n stages. It depends on that initial state i at time n only as the ith component of the vector gain.
c) One might guess that (A.63) could be used iteratively, finding v*(2^{n+1}, u) from v*(2^n, u). Explain why this is not possible in any straightforward way. Hint: Think through explicitly how one might calculate v*(n, v*(n, u)) from v*(n, u).

Solution: If one could find v*(n, u) as a convenient formula in u, then by substituting v*(n, u) for u, one could immediately find v*(2n, u). The trouble is that the computation of v*(n, u) is an iteration requiring n steps for any given u. When u is changed to v*(n, u), one needs another n steps to find v*(n, v*(n, u)). This is exactly the same computation that is required to find v*(2n, u) in the first place.
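A small numerical illustration of (a) and (b), using a hypothetical two-state decision problem with two stationary policies (the transition matrices and rewards are invented):

```python
import numpy as np

# Two hypothetical policies: (P, r) pairs; component-wise max over them is the
# same as optimizing the decision in each state separately.
Ps = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.5, 0.5], [0.7, 0.3]])]
rs = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]

def v1(u):
    """One stage of value iteration: v*(1, u) = max_k { r^k + [P^k] u }."""
    return np.max([r + P @ u for P, r in zip(Ps, rs)], axis=0)

def vstar(n, u):
    for _ in range(n):
        u = v1(u)
    return u

u = np.array([3.0, -1.0])
# (a): v*(n, u) = v*(1, v*(n-1, u))
assert np.allclose(vstar(5, u), v1(vstar(4, u)))
# (b): v*(2n, u) = v*(n, v*(n, u))
assert np.allclose(vstar(8, u), vstar(4, vstar(4, u)))
```

As part (c) observes, the second identity does not shorten the computation: evaluating v*(n, v*(n, u)) still requires n further stages of iteration.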
Exercise 4.28: Consider finding the expected time until a given string appears in an IID binary sequence with Pr{Xi = 1} = p1, Pr{Xi = 0} = p0 = 1 − p1.

a) Following the procedure in Example 4.5.1, draw the 3-state Markov chain for the string (0, 1). Find the expected number of trials until the first occurrence of that string.
Solution:

[Figure: states 0, 1, 2 for the string (0, 1). From state 0: a 1 (probability p1) stays at 0, a 0 (probability p0) moves to 1. From state 1: a 0 (probability p0) stays at 1, a 1 (probability p1) moves to the trapping state 2.]
Let vi be the expected first-passage time from node i to node 2. Then

v0 = 1 + p1 v0 + p0 v1;        v1 = 1 + p0 v1 + p1 v2,

so that

v0 = 1/p0 + v1;        v1 = 1/p1 + v2.

Combining these equations to eliminate v1,

v0 = 1/p0 + 1/p1 + v2 = 1/(p0 p1) + v2.

Finally, the trapping state, 2, is reached when the string (0, 1) first occurs, so v2 = 0 and v0 = 1/(p0 p1).
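The answer v0 = 1/(p0 p1) can be checked by solving the first-passage equations of the three-state chain for several values of p1:

```python
import numpy as np

# Expected number of binary trials until the string (0, 1) first appears,
# computed from the 3-state chain and compared with 1/(p0*p1).
for p1 in (0.3, 0.5, 0.8):
    p0 = 1 - p1
    # states: 0 = no progress, 1 = "0" seen, 2 = "01" seen (trapping)
    P = np.array([[p1, p0, 0.0],
                  [0.0, p0, p1],
                  [0.0, 0.0, 1.0]])
    A = np.eye(2) - P[:2, :2]
    v = np.linalg.solve(A, np.ones(2))      # expected steps to absorption
    assert np.isclose(v[0], 1 / (p0 * p1))
    assert np.isclose(v[1], 1 / p1)
```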
b) For parts b) to d), let (a1, a2, a3, . . . , ak) = (0, 1, 1, . . . , 1), i.e., zero followed by k−1 ones. Draw the corresponding Markov chain for k = 4.
Solution:

[Figure: states 0, 1, 2, 3, 4 for the string (0, 1, 1, 1), with states 1–4 labeled by the progress '0', '01', '011', '0111'. From state 0: a 1 stays at 0, a 0 moves to 1. From state 1: a 0 stays at 1, a 1 moves to 2. From states 2 and 3: a 1 advances to the next state, a 0 returns to state 1. State 4 is trapping.]
c) Let vi , 1  i  k be the expected first-passage time from state i to state k. Note that vk = 0. For each
i, 1  i < k, show that vi = ↵i + vi+1 and v0 =
i + vi+1 where ↵i and
i are each expressed as a product
of powers of p0 and p1 . Hint: use induction on i using i = 1 as the base. For the inductive step, first find
i+1 as a function of
i starting with i = 1 and using the equation v0 = 1/p0 + v1 .
Solution: Part (a) solved the problem for i = 1. The fact that the string length was 2 there was of significance only at the end, where we set v2 = 0. We found that α1 = 1/p1 and β1 = 1/(p0 p1).

For the inductive step, assume vi = αi + v_{i+1} and v0 = βi + v_{i+1} for a given i ≥ 1. Using the basic first-passage-time equation,

v_{i+1} = 1 + p0 v1 + p1 v_{i+2}
        = p0 v0 + p1 v_{i+2}
        = p0 βi + p0 v_{i+1} + p1 v_{i+2}.

The second equality uses the basic equation for v0, i.e., v0 = 1 + p1 v0 + p0 v1, which reduces to p0 v1 + 1 = p0 v0, and the third equality uses the inductive hypothesis v0 = βi + v_{i+1}. Combining the terms in v_{i+1},

v_{i+1} = (p0/p1) βi + v_{i+2}.

This completes half the inductive step, showing that α_{i+1} = p0 βi / p1. Now we use this result plus the inductive hypothesis v0 = βi + v_{i+1} to get

v0 = βi + v_{i+1} = βi + (p0/p1) βi + v_{i+2} = (βi/p1) + v_{i+2}.
This completes the second half of the induction, showing that β_{i+1} = βi/p1. Iterating on these equations for βi and αi, we find the explicit expressions

αi = 1/p1^i;        βi = 1/(p0 p1^i).

Note that the algebra here was quite simple, but if one did not follow the hints precisely, one could get into a terrible mess. In addition, the whole derivation was quite unmotivated. We view the occurrence of these strings as renewals in Exercise 5.35 and find a more intuitive way to derive the same answer.
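The explicit expressions for αi and βi can be verified by solving the first-passage equations of the chain for k = 4 (the value p1 = 0.6 is a hypothetical choice):

```python
import numpy as np

# Chain for the string (0, 1, ..., 1) of length k: from a state i >= 1, a 1
# advances to i+1 and a 0 returns to state 1; from state 0, a 0 goes to 1 and
# a 1 stays at 0.
def passage_times(k, p1):
    p0 = 1 - p1
    P = np.zeros((k + 1, k + 1))
    P[0, 0], P[0, 1] = p1, p0
    P[1, 1], P[1, 2] = p0, p1
    for i in range(2, k):
        P[i, 1], P[i, i + 1] = p0, p1
    P[k, k] = 1.0                            # trapping state
    A = np.eye(k) - P[:k, :k]
    return np.linalg.solve(A, np.ones(k))    # v_0, ..., v_{k-1}

k, p1 = 4, 0.6
p0 = 1 - p1
v = passage_times(k, p1)

# v_i = alpha_i + v_{i+1} with alpha_i = 1/p1^i, and v_0 = beta_{k-1} = 1/(p0 p1^{k-1}).
for i in range(1, k):
    vnext = v[i + 1] if i + 1 < k else 0.0
    assert np.isclose(v[i] - vnext, 1 / p1 ** i)
assert np.isclose(v[0], 1 / (p0 * p1 ** (k - 1)))
```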
d) Let a = (0, 1, 0). Draw the corresponding Markov chain for this string. Evaluate v0 , the expected time
for (0, 1, 0) to occur.
Solution:

[Figure: states 0, 1, 2, 3 for the string (0, 1, 0), with states 1–3 labeled '0', '01', '010'. From state 0: a 1 stays at 0, a 0 moves to 1. From state 1: a 0 stays at 1, a 1 moves to 2. From state 2: a 0 (probability p0) moves to the trapping state 3, while a 1 (probability p1) returns to state 0.]
The solution for v0 and v1 in terms of v2 is the same as in (a). The basic equation for v2 in terms of its outward transitions is

v2 = 1 + p1 v0 + p0 v3 = 1 + p1 [ 1/(p0 p1) + v2 ].

Combining the terms in v2, we get

p0 v2 = 1 + 1/p0.

Using v0 = 1/(p0 p1) + v2,

v0 = 1/(p0 p1) + 1/p0 + 1/p0² = 1/p0 + 1/(p0² p1).
This solution will become more transparent after doing Exercise 5.35.
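The result can be checked by solving the first-passage equations of the (0, 1, 0) chain directly:

```python
import numpy as np

# Chain for the string (0, 1, 0): a 1 from state 2 ("01" seen) restarts at
# state 0, while a 0 completes the string.
for p1 in (0.3, 0.5, 0.7):
    p0 = 1 - p1
    P = np.array([[p1, p0, 0.0, 0.0],
                  [0.0, p0, p1, 0.0],
                  [p1, 0.0, 0.0, p0],
                  [0.0, 0.0, 0.0, 1.0]])
    A = np.eye(3) - P[:3, :3]
    v = np.linalg.solve(A, np.ones(3))
    assert np.isclose(v[0], 1 / p0 + 1 / (p0 ** 2 * p1))
```

For p0 = p1 = 1/2 this gives v0 = 2 + 8 = 10 trials.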
Exercise 4.29: a) Find lim_{n→∞} [P^n] for the Markov chain below. Hint: Think in terms of the long-term transition probabilities. Recall that the edges in the graph for a Markov chain correspond to the positive transition probabilities.

[Figure: states 1, 2, 3. States 1 and 2 each have a self-loop of probability 1. State 3 has a self-loop P33 and transitions P31 to state 1 and P32 to state 2.]
Solution: The chain has 2 recurrent states, each in its own class, and one transient state. Thus lim_{n→∞} P^n_{11} = 1 and lim_{n→∞} P^n_{22} = 1. Let q1 = lim_{n→∞} P^n_{31} and q2 = lim_{n→∞} P^n_{32}. Since q1 + q2 = 1, and since in each transition starting in state 3, P31 and P32 give the probabilities of moving to 1 or 2, q1 = P31/(P31 + P32) and q2 = P32/(P31 + P32). The other long-term transition probabilities are zero, so

                  ⎡ 1    0    0 ⎤
lim_{n→∞} [P^n] = ⎢ 0    1    0 ⎥ .
                  ⎣ q1   q2   0 ⎦

The general case here, with an arbitrary set of transient states and an arbitrary number of recurrent classes, is solved in Exercise 4.18.
b) Let π^{(1)} and π^{(2)} denote the first two rows of lim_{n→∞} [P^n] and let ν^{(1)} and ν^{(2)} denote the first two columns of lim_{n→∞} [P^n]. Show that π^{(1)} and π^{(2)} are independent left eigenvectors of [P], and that ν^{(1)} and ν^{(2)} are independent right eigenvectors of [P]. Find the eigenvalue for each eigenvector.

Solution: π^{(1)} = (1, 0, 0) and π^{(2)} = (0, 1, 0). Multiplying π^{(i)} by lim_{n→∞} [P^n] for i = 1, 2, we see that these are the left eigenvectors of eigenvalue 1 of the two recurrent classes. We also know this from Section 4.4. Similarly, ν^{(1)} = (1, 0, q1)^T and ν^{(2)} = (0, 1, q2)^T are right eigenvectors of eigenvalue 1, both of lim_{n→∞} [P^n] and also of [P].
c) Let r be an arbitrary reward vector and consider the equation

w + g^{(1)} ν^{(1)} + g^{(2)} ν^{(2)} = r + [P] w.    (A.64)

Determine what values g^{(1)} and g^{(2)} must have in order for (A.64) to have a solution. Argue that with the additional constraints w1 = w2 = 0, (A.64) has a unique solution for w, and find that w.

Solution: In order for this equation to have a solution, it is necessary, first, for a solution to exist when both sides are premultiplied by π^{(1)}. This results in g^{(1)} = r1. Similarly, premultiplying by π^{(2)} results in g^{(2)} = r2. In other words, g^{(1)} is the gain per transition when in state 1 or when the chain moves from state 3 to state 1. Similarly, g^{(2)} is the gain per transition in state 2 or after a transition from 3 to 2. Since there are two recurrent sets of states, there is no common meaning to a gain per transition.

Setting w1 = w2 = 0 makes a certain amount of sense, since starting in state 1 or starting in state 2, the reward increases by r1 or r2 per transition with no initial transient. With this choice, the first two components of the vector equation in (A.64) are 0 = 0 and the third is

w3 + r1 q1 + r2 q2 = r3 + P33 w3.

Solving for w3,

w3 = (1/(P31 + P32)) [ r3 − r1 P31/(P31 + P32) − r2 P32/(P31 + P32) ].

This can be interpreted as the relative gain of starting in state 3 relative to the 'average' of starting in 1 or 2. The interpretation is quite limited, since the gain per transition depends on which recurrent class is entered, and thus the relative gain has nothing definitive for comparison.
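A numerical check of parts (a) and (c), with hypothetical values for P31, P32, P33 and r:

```python
import numpy as np

P31, P32, P33 = 0.2, 0.3, 0.5          # hypothetical values with P31+P32+P33 = 1
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [P31, P32, P33]])
r = np.array([2.0, -1.0, 0.5])

q1, q2 = P31 / (P31 + P32), P32 / (P31 + P32)
limit = np.array([[1, 0, 0], [0, 1, 0], [q1, q2, 0.0]])
assert np.allclose(np.linalg.matrix_power(P, 200), limit)

# (A.64) with g1 = r1, g2 = r2, and w1 = w2 = 0:
w3 = (r[2] - r[0] * q1 - r[1] * q2) / (P31 + P32)
nu1, nu2 = np.array([1, 0, q1]), np.array([0, 1, q2])
w = np.array([0.0, 0.0, w3])
assert np.allclose(w + r[0] * nu1 + r[1] * nu2, r + P @ w)
```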
Exercise 4.30: Let u and u′ be arbitrary final reward vectors with u ≤ u′.
a) Let k be an arbitrary stationary policy and prove that v^k(n, u) ≤ v^k(n, u′) for each n ≥ 1.

Solution: For n = 1,

v^k(1, u) = r^k + [P^k] u ≤ r^k + [P^k] u′ = v^k(1, u′),

where the inequality has used the fact that [P^k] is nonnegative. We use this as the basis in the following inductive step: assume that v^k(n, u) ≤ v^k(n, u′). Then

v^k(n+1, u) = r^k + [P^k] v^k(n, u) ≤ r^k + [P^k] v^k(n, u′) = v^k(n+1, u′).

Thus v^k(n, u) ≤ v^k(n, u′) for all n ≥ 1. Note that this does not depend on whether [P^k] is the matrix of a unichain; it is valid for any Markov chain and any policy.
b) For the optimal dynamic policy, prove that v*(n, u) ≤ v*(n, u′) for each n ≥ 1. This is known as the monotonicity theorem.

Solution: Let k′ maximize v^k(1, u) over k. Then

v*(1, u) = v^{k′}(1, u) ≤ v^{k′}(1, u′) ≤ v*(1, u′).

We use this as the basis of the following inductive step over n. Assume that v*(n, u) ≤ v*(n, u′). Let k′ maximize v^k(n+1, u) over k. Then

v*(n+1, u) = v^{k′}(n+1, u) = r^{k′} + [P^{k′}] v*(n, u)
           ≤ r^{k′} + [P^{k′}] v*(n, u′)
           ≤ v*(n+1, u′),

where in the next-to-last step we used the inductive hypothesis plus the nonnegativity of [P^{k′}], and in the last step we re-optimized over the policy for the new final reward u′.
c) Now let u and u′ be arbitrary. Let α = maxi (ui − u′i). Show that

v⇤(n, u) ≤ v⇤(n, u′) + αe.    (A.65)

Solution: Note that ui − u′i ≤ α for all i, so u ≤ u′ + αe. Thus, using (b),

v⇤(n, u) ≤ v⇤(n, u′ + αe).

Letting k′ maximize v k (n, u′ + αe), we have

v⇤(n, u′ + αe) = r k′ + [P k′ ](u′ + αe) = r k′ + [P k′ ]u′ + αe ≤ v⇤(n, u′) + αe,

which gives us (A.65). The final inequality here is actually an equality, since adding a fixed quantity α to the final reward in all states cannot change the optimization over states.
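The monotonicity of (a) and (b), together with the bound of (c), is easy to check numerically. The sketch below runs optimal value iteration on a small two-state decision problem; the rewards and transition probabilities here are made up purely for illustration and are not from the text.

```python
# Numerical check of Exercise 4.30: if u <= u' componentwise, then
# v*(n, u) <= v*(n, u') for every n, and v*(n, u') <= v*(n, u) + alpha*e
# with alpha = max_i (u'_i - u_i).  The two-state MDP is hypothetical.

decisions = {
    0: [(1.0, [0.5, 0.5]), (0.0, [0.9, 0.1])],   # state 0: two decisions
    1: [(2.0, [0.2, 0.8])],                      # state 1: one decision
}

def value_iteration(u, n_stages):
    """Return the list [v*(1, u), ..., v*(n_stages, u)]."""
    v, out = list(u), []
    for _ in range(n_stages):
        v = [max(r + sum(p * vj for p, vj in zip(row, v))
                 for r, row in decisions[i])
             for i in sorted(decisions)]
        out.append(v)
    return out

u, u_prime = [0.0, 0.0], [1.0, 3.0]              # u <= u' componentwise
alpha = max(b - a for a, b in zip(u, u_prime))
for v, vp in zip(value_iteration(u, 30), value_iteration(u_prime, 30)):
    assert all(a <= b <= a + alpha + 1e-9 for a, b in zip(v, vp))
```

The same check works for any decision table, since the proof above uses only the nonnegativity of the transition matrices.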
Exercise 4.31: Consider a Markov decision problem with M states in which some state, say state 1, is
inherently reachable from each other state.
a) Show that there must be some other state, say state 2, and some decision, k2 , such that P21^(k2) > 0.
Solution: Assume that M > 1 and let j be any state other than 1. The definition of inherent reachability says that there is a policy, say k (1), for which there is a path from j to 1. Denote the last state on this path before state 1 as state 2 and let k2 be the decision in state 2 under policy k (1). Then P21^(k2) > 0.
b) Show that there must be some other state, say state 3, and some decision, k3 , such that either P31^(k3) > 0 or P32^(k3) > 0.
Solution: Assume that M > 2 and let j be any state other than 1 or 2. There is a policy, say k (2), under which there is a path from j to 1. Denote state 3 as the last state on this path that differs from 1 and 2. Let k3 be the decision in state 3 under policy k (2). Then if this path goes directly from 3 to 1, we have P31^(k3) > 0. Alternatively, if this path goes from 3 to 2 to 1, we have P32^(k3) > 0 (and, of course, P21^(k2) > 0).
c) Assume, for some i and some set of decisions k2 , . . . , ki , that for each j, 2 ≤ j ≤ i, Pjℓ^(kj) > 0 for some ℓ < j (i.e., that each state from 2 to i has a non-zero transition to a lower numbered state). Show that there is some state (other than 1 to i), say i + 1, and some decision ki+1 such that Pi+1,ℓ^(ki+1) > 0 for some ℓ ≤ i.
Solution: For those confused by all the indices, we are constructing a tree, rooted at node
(state) 1, with ingoing branches corresponding to transitions with positive probability under
the selected decisions. The numbering of the states is not unique, but is used simply to
show that such a tree can be constructed. Each higher numbered node is connected by an
inward going transition to the part of the tree already constructed.
As before, assume M > i and choose some state j other than 1 to i. There is a policy k (i) for which there is a path from j to state 1. Denote i + 1 as the last state on this path that differs from 1 to i. Let ℓ be the next state on this path (which must satisfy 1 ≤ ℓ ≤ i) and let ki+1 be the decision in policy k (i) used in state i + 1. Then Pi+1,ℓ^(ki+1) > 0.
d) Use parts a), b), and c) to observe that there is a stationary policy k = k1 , . . . , kM for which state 1 is
accessible from each other state.
Solution: When each state has been chosen (i.e., when i = M), every state j has a path to state 1, where kj is the decision used in state j. This set of decisions {kj ; 1 ≤ j ≤ M} constitutes a policy under which state 1 is accessible from each other node.
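The construction in (a)-(c) is effectively an algorithm: grow a set A, initially {1}, by repeatedly finding a state outside A with some decision giving a positive transition into A. A sketch follows; the four-state decision table is hypothetical.

```python
# Sketch of the tree construction of Exercise 4.31: repeatedly find a
# state outside the current set A (initially {1}) having some decision
# with a positive transition into A.  The decision table is hypothetical.

P = {
    # (state, decision) -> {next_state: probability}
    (1, 1): {1: 1.0},
    (2, 1): {2: 1.0},          # decision 1 in state 2 never reaches A
    (2, 2): {1: 0.5, 2: 0.5},  # decision 2 does
    (3, 1): {2: 1.0},
    (4, 1): {3: 0.7, 4: 0.3},
}

def policy_reaching_state_1(P):
    states = {s for s, _ in P}
    A, policy = {1}, {}
    while A != states:
        for (s, k), row in P.items():
            if s not in A and any(p > 0 and t in A for t, p in row.items()):
                policy[s] = k      # decision k in state s leads toward A
                A.add(s)
                break
        else:
            raise ValueError("state 1 not inherently reachable")
    return policy

policy = policy_reaching_state_1(P)
```

Under the resulting stationary policy, every state has a positive-probability path into state 1, which is the observation of part (d).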
Exercise 4.32: George drives his car to the theater, which is at the end of a one-way street. There are
parking places along the side of the street and a parking garage that costs $5 at the theater. Each parking
place is independently occupied or unoccupied with probability 1/2. If George parks n parking places away
from the theater, it costs him n cents (in time and shoe leather) to walk the rest of the way. George is
myopic and can only see the parking place he is currently passing. If George has not already parked by the
time he reaches the nth place, he first decides whether or not he will park if the place is unoccupied, and
then observes the place and acts according to his decision. George can never go back and must park in the
parking garage if he has not parked before.
a) Model the above problem as a 2 state dynamic programming problem. In the “driving” state, state
2, there are two possible decisions: park if the current place is unoccupied or drive on whether or not the
current place is unoccupied.
Solution: View the two states as walking (state 1) and driving (state 2). The state
transitions correspond to passing successive possibly idle parking spaces, which are passed
either by car or foot. Thus r1 = 1 (the cost in shoe leather and time) and r2 = 0. There is no choice of policy in state 1, but in state 2, policy t is to try to park and policy d is to continue to drive.
First consider stage 1 where George is between the second and first parking space from the
end. If George is in the walk state, there is one unit of cost going from 2 to 1 and 1 unit
of final cost going from 1 to 0. If George is in the drive state (2) and uses the try policy,
r2 = 0 and with probability 1/2, he parks and has one unit final cost getting to the theatre.
With probability 1/2, he doesn’t park and the final cost is 500. For policy d, r2 = 0 and
with probability 1, the final cost is 500.
In summary, u = (1, 500)T , r = (1, 0)T (components ordered as (walk, drive)), and the Markov chain is as follows: state 1 returns to state 1 with probability 1, with r1 = 1. Under policy d, P22^(d) = 1 with r2 = 0. Under policy t, P21^(t) = 1/2 and P22^(t) = 1/2, again with r2 = 0.
b) Find vi⇤(n, u), the minimum expected aggregate cost for n stages (i.e., immediately before observation of the nth parking place) starting in state i = 1 or 2; it is sufficient to express vi⇤(n, u) in terms of vi⇤(n−1, u). The final costs, in cents, at stage 0 should be v2 (0) = 500, v1 (0) = 0.
Solution: We start with stage 1.

v⇤(1, u) = r + min_k [P k ]u = (2, 250.5)T ,

where policy t is clearly optimal. At an arbitrary stage n, the cost is found iteratively from stage n−1 by (4.48), i.e.,

v⇤(n, u) = min_k ( r k + [P k ]v⇤(n−1, u) ).    (A.66)
c) For what values of n is the optimal decision the decision to drive on?

Solution: The straightforward approach (particularly with computational aids) is to simply calculate (A.66). The cost drops sharply with increasing n, and for n ≥ 9 the optimal decision is to drive on, whereas for n ≤ 8 the optimal decision is to park if possible.

As with many problems, the formalism of dynamic programming makes hand calculation and understanding more awkward than it need be. The simple approach, for any distance n away from the theatre, is to calculate the cost of parking if a place is available, namely n, and to compare it with the expected cost En of driving on. This expected cost is easily seen to be

En = (1/2)(n−1) + (1/4)(n−2) + · · · + (1/2^{n−1})(1) + 500/2^{n−1}.

With some patience, this is simplified to En = n − 2 + 501/2^{n−1}. Parking is preferable when n < En , i.e., when 501/2^{n−1} > 2, so one should park for n ≤ 8 and drive on for n ≥ 9.
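The closed form for En can be verified directly against the defining sum; a quick check (assuming the same 500-cent garage cost as above):

```python
# Check the closed form E_n = n - 2 + 501/2^(n-1) against the defining sum
# E_n = sum_{j=1}^{n-1} (1/2^j)(n - j) + 500/2^(n-1).

def E_sum(n):
    return sum((n - j) / 2**j for j in range(1, n)) + 500 / 2**(n - 1)

def E_closed(n):
    return n - 2 + 501 / 2**(n - 1)

for n in range(1, 20):
    assert abs(E_sum(n) - E_closed(n)) < 1e-9
```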
d) What is the probability that George will park in the garage, assuming that he follows the optimal policy?
Solution: George will park in the garage if the last 8 potential parking spaces are full, an event of probability 2^{−8}.
Exercise 4.33: (Proof of Corollary 4.6.9) a) Show that if two stationary policies k′ and k have the same recurrent class R′ and if k′i = ki for all i ∈ R′, then w′i = wi for all i ∈ R′. Hint: See the first part of the proof of Lemma 4.6.7.
Solution: For all i ∈ R′, Pij^(k′i) = Pij^(ki). The steady-state vector π is determined solely by the transition probabilities in the recurrent class, so πi = π′i for all i ∈ R′. Since ri = r′i for all i ∈ R′, it also follows that g = g′. The equations for the components of the relative gain vector w for i ∈ R′ are

wi + g = ri^(ki) + Σj Pij^(ki) wj ;    i ∈ R′.

These equations, along with πw = 0, have a unique solution if we look at them only over the recurrent class R′. Since all components are the same for k and k′, there is a unique solution over R′, i.e., wi = w′i for all i ∈ R′.
b) Assume that k′ satisfies (4.50) (i.e., that it satisfies the termination condition of the policy improvement algorithm) and that k satisfies the conditions of (a). Show that (4.64) is satisfied for all states ℓ.

Solution: The implication from R′ being the recurrent class of both k and k′ is that each of them is a unichain. Thus w + ge = r k + [P k ]w with πw = 0 has a unique solution, and there is similarly a unique solution for the primed case. Rewriting these equations for the primed and unprimed cases, and recognizing from (a) that ge = g′e and πw = πw′,

w′ − w = [P k ](w′ − w) + { r k′ − r k + [P k′ ]w′ − [P k ]w′ }.    (A.67)

This is the vector form of (4.64).
c) Show that w ≤ w′. Hint: Follow the reasoning at the end of the proof of Lemma 4.6.7.

Solution: The quantity in braces in (A.67) is non-negative because of the termination condition of the policy improvement algorithm. Also w = w′ over the recurrent components of k. Viewing the term in braces as the non-negative difference between two reward vectors, Corollary 4.5.6 shows that w ≤ w′.
Exercise 4.34: Consider the dynamic programming problem below with two states and two possible policies, denoted k and k′. The policies differ only in state 2. Under policy k: P11 = P12 = 1/2 with r1 = 0, and P21 = 1/8, P22 = 7/8 with r2^k = 5. Under policy k′: P11 = P12 = 1/2 with r1 = 0, and P21 = 1/4, P22 = 3/4 with r2^{k′} = 6.
a) Find the steady-state gain per stage, g and g′, for stationary policies k and k′. Show that g = g′.

Solution: Denote the steady-state vectors as π and π′. Then π = (1/5, 4/5), so g = 4. Similarly, π′ = (1/3, 2/3), so g′ = 4.
b) Find the relative-gain vectors, w and w′, for stationary policies k and k′.

Solution: With an M-state unichain, w + ge = [P ]w + r provides M − 1 independent equations for w; πw = 0 provides the final equation. Thus we solve w1 + g = P11 w1 + P12 w2 + r1 and π1 w1 + π2 w2 = 0. The solutions are w = (−6.4, 1.6)T and w′ = (−16/3, 8/3)T.
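These values can be confirmed by solving the two-state unichain equations directly; a sketch in plain Python (no linear-algebra package), valid for any two-state unichain:

```python
# For a two-state unichain with transition matrix P and reward vector r,
# compute the steady-state vector pi, the gain g, and the relative-gain
# vector w satisfying w + g*e = r + P w together with pi . w = 0.

def gain_and_w(P, r):
    p12, p21 = P[0][1], P[1][0]
    pi = (p21 / (p12 + p21), p12 / (p12 + p21))
    g = pi[0] * r[0] + pi[1] * r[1]
    # First component of w + g e = r + P w:  (1 - P11) w1 - P12 w2 = r1 - g,
    # and pi . w = 0 gives w2 = -(pi1/pi2) w1.
    w1 = (r[0] - g) / ((1 - P[0][0]) + P[0][1] * pi[0] / pi[1])
    w2 = -pi[0] * w1 / pi[1]
    return pi, g, (w1, w2)

pi, g, w = gain_and_w([[0.5, 0.5], [1/8, 7/8]], [0, 5])      # policy k
pi_p, g_p, w_p = gain_and_w([[0.5, 0.5], [1/4, 3/4]], [0, 6])  # policy k'
```

Running this reproduces g = g′ = 4, w = (−6.4, 1.6), and w′ = (−16/3, 8/3).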
c) Suppose the final reward, at stage 0, is u1 = 0, u2 = u. For what range of u does the dynamic programming algorithm use decision k in state 2 at stage 1?

Solution: It will simplify the following parts if we do this for arbitrary u1 , u2 . Then

v1⇤(1, u) = (u1 + u2)/2 ;    v2⇤(1, u) = max( 5 + (u1 + 7u2)/8 , 6 + (u1 + 3u2)/4 ).

Expressing the difference v2⇤(1, u) − v1⇤(1, u) in terms of u2 − u1 , this becomes

v2⇤(1, u) − v1⇤(1, u) = 8 + (u2 − u1 − 8)/4 + max( (u2 − u1 − 8)/8 , 0 ).    (A.68)

Thus k is used for u2 − u1 > 8 and k′ for u2 − u1 < 8. Either can be used for u2 − u1 = 8.
d) For what range of u does the dynamic programming algorithm use decision k in state 2 at stage 2? At stage n? You should find that (for this example) the dynamic programming algorithm uses the same decision at each stage n as it uses in stage 1.

Solution: We find v1⇤(2, u) and v2⇤(2, u) in the same way, simply replacing u1 , u2 by v1⇤(1, u) and v2⇤(1, u). We have found that decision k is used if u2 − u1 > 8, and it is seen from (A.68) that in this case v2⇤(1, u) − v1⇤(1, u) > 8, so that k is used in stage 2. The same argument works inductively for all n, so if u2 − u1 > 8, then k is used at all stages. Clearly, k′ is used at all stages if u2 − u1 < 8.
e) Find the optimal gain v2⇤ (n, u) and v1⇤ (n, u) as a function of stage n assuming u = 10.
Solution: In this case, the stationary policy k is used at all stages, and from (4.40),

v⇤(n, u) = nge + w + [P k ]^n (u − w).
f) Find limn→∞ [v⇤(n, u) − nge] and show how it depends on u.

Solution: From (e), using the limiting form of [P k ]^n,

v⇤(n, u) − nge → w + eπ(u − w) = w + eπu,

where w and π should be primed if u2 − u1 < 8. As a sanity check, w + eπu > w′ + eπ′u if u2 − u1 > 8, with the inequality reversed if u2 − u1 < 8.
Exercise 4.35: Consider a Markov decision problem in which the stationary policies k and k 0 each
satisfy (4.50) and each correspond to ergodic Markov chains.
a) Show that if r k′ + [P k′ ]w′ ≥ r k + [P k ]w′ is not satisfied with equality, then g′ > g.
Solution: The solution is very similar to the proof of Lemma 4.6.5. Since [P k ] is ergodic, π k is strictly positive. Now r k′ + [P k′ ]w′ ≥ r k + [P k ]w′ must be satisfied because of (4.50), and if one or more components are not satisfied with equality, then

π k ( r k′ + [P k′ ]w′ ) > π k ( r k + [P k ]w′ ) = g + π k w′,    (A.69)

where in the equality we used the definition of g and the fact that π k is an eigenvector of [P k ]. Since r k′ + [P k′ ]w′ = w′ + g′e from (4.37), (A.69) simplifies to

π k w′ + g′ π k e > g + π k w′.

After canceling π k w′ from each side, it is clear that g′ > g.
b) Show that r k′ + [P k′ ]w′ = r k + [P k ]w′. (Hint: use (a).)

Solution: The ergodicity of k and k′ assures the inherently reachable assumption of Theorem 4.6.8, and thus we know, from the fact that k satisfies the termination condition of the policy improvement algorithm, that g ≥ g′. Thus g′ > g is impossible, and none of the components of r k′ + [P k′ ]w′ ≥ r k + [P k ]w′ can be satisfied with strict inequality, i.e.,

r k′ + [P k′ ]w′ = r k + [P k ]w′.
c) Find the relationship between the relative-gain vector w for policy k and the relative-gain vector w′ for policy k′. (Hint: Show that r k + [P k ]w′ = ge + w′; what does this say about w and w′?)

Solution: Using (4.37) and (b), we have

w′ + g′e = r k′ + [P k′ ]w′ = r k + [P k ]w′.

We also have

w + ge = r k + [P k ]w.

Subtracting the second equation from the first, and remembering that g′ = g, we get

w′ − w = [P k ](w′ − w).

Thus w′ − w is a right eigenvector (of eigenvalue 1) of [P k ]. This implies that w′ − w = αe for some α ∈ R. Note that this was seen in the special case of Exercise 4.34(d).
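For the two policies of Exercise 4.34, this can be checked numerically; both components of w′ − w equal α = 16/15.

```python
# Check that w' - w is a multiple of e for the policies of Exercise 4.34:
# w = (-6.4, 1.6) and w' = (-16/3, 8/3) from that exercise's part (b).
w = (-6.4, 1.6)
w_prime = (-16/3, 8/3)
diff = [b - a for a, b in zip(w, w_prime)]
alpha = diff[0]
assert all(abs(d - alpha) < 1e-9 for d in diff)   # w' - w = alpha * e
assert abs(alpha - 16/15) < 1e-9
```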
d) Suppose that policy k uses decision 1 in state 1 and policy k′ uses decision 2 in state 1 (i.e., k1 = 1 for policy k and k1 = 2 for policy k′). What is the relationship between r1^(k), P11^(k), P12^(k), . . . , P1J^(k) for k equal to 1 and 2?

Solution: The first component of the equation r k′ + [P k′ ]w′ = w′ + ge is

r1^(2) + Σ_{j=1}^M P1j^(2) w′j = w′1 + g.

From (c), w′ = w + αe, so we have

r1^(2) + Σ_{j=1}^M P1j^(2) (wj + α) = w1 + α + g.

Since α cancels out (the P1j^(2) sum to 1), and since the first component of w + ge = r k + [P k ]w gives w1 + g = r1^(1) + Σ_{j=1}^M P1j^(1) wj ,

r1^(2) + Σ_{j=1}^M P1j^(2) wj = r1^(1) + Σ_{j=1}^M P1j^(1) wj .
e) Now suppose that policy k uses decision 1 in each state and policy k′ uses decision 2 in each state. Is it possible that ri^(1) > ri^(2) for all i? Explain carefully.

Solution: It seems a little surprising, since π k r k = π k′ r k′, but Exercise 4.34 essentially provides an example. In that example, r1 is fixed over both policies, but by providing a choice in state 1 like that provided in state 2, one can create a policy with g = g′ where ri^(1) > ri^(2) for all i.
f) Now assume that ri^(1) is the same for all i. Does this change your answer to (e)? Explain.

Solution: If ri^(1) is the same, say r, for all i, then g = r. The only way to achieve g′ = r is for ri^(2) ≥ r for at least one i. Thus we cannot have ri^(1) > ri^(2) for all i.
Exercise 4.36: Consider a Markov decision problem with three states. Assume that each stationary
policy corresponds to an ergodic Markov chain. It is known that a particular policy k 0 = (k1 , k2 , k3 ) = (2, 4, 1)
is the unique optimal stationary policy (i.e., the gain per stage in steady state is maximized by always using
decision 2 in state 1, decision 4 in state 2, and decision 1 in state 3). As usual, ri^(k) denotes the reward in state i under decision k, and Pij^(k) denotes the probability of a transition to state j given state i and given
the use of decision k in state i. For each of the following ways of modifying the rewards in the Markov
decision problem, find first whether the gain per stage of policy k 0 is increased, decreased, or unchanged,
and second, whether it is possible that another policy becomes optimal after the change. The changes in
each part are to be considered in the absence of the changes in the other parts:
a) r1^(1) is replaced by r1^(1) − 1.
Solution: The gain g 0 for policy k 0 is unchanged since k 0 uses decision 2 rather than 1 in
state 1. Each other policy, say k initially has a gain g < g 0 , and if k uses decision 1 in
state 1, its gain decreases and otherwise its gain remains the same. Thus k 0 must still be
optimal.
b) r1^(2) is replaced by r1^(2) + 1.
Solution: Since the chain for policy k 0 is ergodic, state 1 is used with positive probability
so g 0 must increase, say to g̃ 0 . Surprisingly, there might be a policy k whose modified gain
g̃ is greater than g̃ 0 . To see this, suppose the rewards using policy k 0 are all zero, and
the rewards for all other decisions in all states are slightly negative. Then k 0 is obviously
optimal with g′ = 0. When r1^(2) is increased to 1, there is the possibility of added gain
per stage by using decisions in the other states that increase the steady-state probability
of state 1, since the slight negative gain in the other states is negligible. Note that if there
is such a new optimal policy k , it still uses decision 2 in state 1; it is the decisions in the
other states that are changed.
c) r1^(k) is replaced by r1^(k) + 1 for all state 1 decisions k.
Solution: As in (b), g 0 increases and some other policy might become optimal. Here it is
possible that only the decision in state 1 changes, since under the same type of example as
before, another decision in state 1 might increase the probability of being in state 1 with
only a negligible loss of gain in each visit to state 1.
d) For all i, ri^(ki) is replaced by ri^(ki) + 1 for the decision ki of policy k′.
Solution: Here the gain using policy k 0 is increased by 1 in each stage, so g 0 increases by
exactly 1. As we now show, k 0 is still the optimal policy after the change. For any policy
k , the increase in gain is 1 in each state in which k uses the same decision as k 0 and the
increase is 0 in each other state. Thus the increase in gain per stage, g with policy k is at
most 1. Since g < g 0 before the change, g < g 0 after the change.
Exercise 4.37: (The Odoni Bound) Let k 0 be the optimal stationary policy for a Markov decision
problem and let g 0 and ⇡ 0 be the corresponding gain and steady-state probability respectively. Let vi⇤ (n, u)
be the optimal dynamic expected reward for starting in state i at stage n with final reward vector u.
a) Show that mini [vi⇤(n, u) − vi⇤(n−1, u)] ≤ g′ ≤ maxi [vi⇤(n, u) − vi⇤(n−1, u)] for n ≥ 1. Hint: Consider premultiplying v⇤(n, u) − v⇤(n−1, u) by π′ or π where k is the optimal dynamic policy at stage n.
Solution: For the upper bound,

v⇤(n, u) = max_k ( r k + [P k ]v⇤(n−1, u) )    (A.70)
        ≥ r k′ + [P k′ ]v⇤(n−1, u),

where k′ is the optimal stationary policy. Premultiplying both sides by π k′,

π k′ v⇤(n, u) ≥ g′ + π k′ v⇤(n−1, u),

so that π k′ ( v⇤(n, u) − v⇤(n−1, u) ) ≥ g′. Thus it is impossible for all components of v⇤(n, u) − v⇤(n−1, u) to be less than g′, so g′ ≤ maxi [ vi⇤(n, u) − vi⇤(n−1, u) ].

For the lower bound, premultiply both sides of (A.70) by π k where k maximizes (A.70). Then

π k ( v⇤(n, u) − v⇤(n−1, u) ) = g,

where g = π k r k . Assuming inherent reachability, as we do throughout, Theorem 4.6.8 shows that g ≤ g′. Since mini [vi⇤(n, u) − vi⇤(n−1, u)] ≤ π k ( v⇤(n, u) − v⇤(n−1, u) ) = g, we have mini [vi⇤(n, u) − vi⇤(n−1, u)] ≤ g′.
b) Show that the lower bound is non-decreasing in n, that the upper bound is non-increasing in n, and that both converge to g′ with increasing n.

Solution: Let k and k1 be the optimal dynamic policies at stages n and n−1 respectively, for final reward u. Then

v⇤(n, u) = r k + [P k ]v⇤(n−1, u) ≥ r k1 + [P k1 ]v⇤(n−1, u)    (A.71)
v⇤(n−1, u) = r k1 + [P k1 ]v⇤(n−2, u) ≥ r k + [P k ]v⇤(n−2, u).    (A.72)

Taking the difference of these equations, using the equality in (A.71) and the inequality in (A.72), we get the upper bound,

v⇤(n, u) − v⇤(n−1, u) ≤ [P k ]( v⇤(n−1, u) − v⇤(n−2, u) )
    ≤ maxi [ vi⇤(n−1, u) − vi⇤(n−2, u) ] [P k ]e
    = maxi [ vi⇤(n−1, u) − vi⇤(n−2, u) ] e.

Each component on the left is upper bounded by the coefficient of e on the right, so

maxi [ vi⇤(n, u) − vi⇤(n−1, u) ] ≤ maxi [ vi⇤(n−1, u) − vi⇤(n−2, u) ].
The lower bound is virtually the same. We start with the difference of the inequality in (A.71) and the equality in (A.72) to get

v⇤(n, u) − v⇤(n−1, u) ≥ [P k1 ]( v⇤(n−1, u) − v⇤(n−2, u) ).

In the same way as before, then,

mini [ vi⇤(n, u) − vi⇤(n−1, u) ] ≥ mini [ vi⇤(n−1, u) − vi⇤(n−2, u) ].
The upper and lower bounds do not necessarily approach each other. A counter example,
without even any alternative policies, is given by a two state periodic chain with 0 reward
in each state and a final reward of (1, 0)T . They do approach each other with the extra
condition that the optimal policy is unique and ergodic, but we don’t prove that here.
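The behavior of these bounds can be observed on the two-state decision problem of Exercise 4.34 with final reward u = (0, 10)T, where the optimal policy is ergodic: the lower bound is non-decreasing, the upper bound is non-increasing, and both converge to g′ = 4. A sketch (states indexed 0 and 1 here rather than 1 and 2):

```python
# Odoni bounds for the decision problem of Exercise 4.34, final reward
# u = (0, 10)^T.  lower = min_i [v_i*(n) - v_i*(n-1)], upper = max_i [...];
# both converge to g' = 4 since the optimal policy is ergodic.

decisions = {
    0: [(0.0, [0.5, 0.5])],                      # no choice in state 1
    1: [(5.0, [1/8, 7/8]), (6.0, [1/4, 3/4])],   # decisions k and k' in state 2
}

v = [0.0, 10.0]
lowers, uppers = [], []
for _ in range(100):
    v_new = [max(r + sum(p * vj for p, vj in zip(row, v))
                 for r, row in decisions[i]) for i in (0, 1)]
    diffs = [a - b for a, b in zip(v_new, v)]
    lowers.append(min(diffs))
    uppers.append(max(diffs))
    v = v_new
```

After 100 stages both bounds agree with g′ = 4 to within floating-point precision, and the monotonicity proved above holds stage by stage.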
Exercise 4.38: Consider an integer-time queueing system with a finite buffer of size 2. At the beginning
of the nth time interval, the queue contains at most two customers. There is a cost of one unit for each
customer in queue (i.e., the cost of delaying that customer). If there is one customer in queue, that customer
is served. If there are two customers, an extra server is hired at a cost of 3 units and both customers are
served. Thus the total immediate cost for two customers in queue is 5, the cost for one customer is 1, and
the cost for 0 customers is 0. At the end of the nth time interval, either 0, 1, or 2 new customers arrive
(each with probability 1/3).
a) Assume that the system starts with 0 ≤ i ≤ 2 customers in queue at time −1 (i.e., in stage 1) and terminates at time 0 (stage 0) with a final cost u of 5 units for each customer in queue (at the beginning of interval 0). Find the expected aggregate cost vi (1, u) for 0 ≤ i ≤ 2.
Solution: The aggregate cost at stage 1 is the immediate cost plus the expected final cost at time 0, i.e.,

vi (1, u) = ri + Σj Pij uj ,

where r0 = 0, r1 = 1, and r2 = 5, and where the cost of hiring an extra server is included as part of the immediate cost. Since Pij = 1/3 for all i, j, the expected final cost is 5 for all i. Thus,

v0 (1, u) = 5,    v1 (1, u) = 6,    v2 (1, u) = 10.
b) Assume now that the system starts with i customers in queue at time −2 with the same final cost at time 0. Find the expected aggregate cost vi (2, u) for 0 ≤ i ≤ 2.

Solution: As in dynamic programming (simplified by having no policy choice),

vi (2, u) = ri + Σj Pij vj (1, u) = ri + 7,

so v0 (2, u) = 7, v1 (2, u) = 8, v2 (2, u) = 12.

c) For an arbitrary starting time −n, find the expected aggregate cost vi (n, u) for 0 ≤ i ≤ 2.
Solution: Using the same approach as in (b),

vi (n, u) = ri + Σj Pij vj (n−1, u).    (A.73)

We note that in going from n = 1 to n = 2, vi (2, u) = 2 + vi (1, u), and thus

Σj Pij vj (2, u) = 2 + Σj Pij vj (1, u).

We can take this as the basis for an inductive proof that

Σj Pij vj (n, u) = 2 + Σj Pij vj (n−1, u).

The inductive step follows from (A.73) by multiplying by Pij and summing over j. We thus see that

vi (n, u) = ri + 3 + 2n.

To be a little more honest and a little less elegant, one would calculate vi (n, u) for n = 3 (and perhaps n = 4), see the pattern, and then look for the inductive proof.
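The formula vi (n, u) = ri + 3 + 2n is easy to confirm by running the recursion (A.73) directly:

```python
# Check v_i(n, u) = r_i + 3 + 2n for the no-decision queueing chain:
# P_ij = 1/3 everywhere, immediate costs r = (0, 1, 5), final cost
# u = (0, 5, 10).
r = [0.0, 1.0, 5.0]
v = [0.0, 5.0, 10.0]               # stage-0 final costs
for n in range(1, 25):
    avg = sum(v) / 3               # since P_ij = 1/3 for all i, j
    v = [ri + avg for ri in r]
    assert all(abs(v[i] - (r[i] + 3 + 2 * n)) < 1e-9 for i in range(3))
```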
d) Find the cost per stage and find the relative cost (gain) vector.
Solution: This was essentially done in (c). The gain (cost) per stage is 2 and the relative
gain (cost) vector is (3, 4, 8).
e) Now assume that there is a decision maker who can choose whether or not to hire the extra server when
there are two customers in queue. If the extra server is not hired, the 3 unit fee is saved, but only one of
the customers is served. If there are two arrivals in this case, assume that one is turned away at a cost of 5
units. Find the minimum dynamic aggregate expected cost vi⇤(1, u), 0 ≤ i ≤ 2, for stage 1 with the same final cost as before.
Solution: We have analyzed the stationary policy where the extra server is hired in state
2. The possible new policy involves a decision in state 2 which we denote by primes. For
the decision not to hire an additional server, the cost of the server is eliminated but the cost
of throwing away an arrival given 2 new arrivals is 5. It is more convenient to view the cost
of throwing away an arrival as an immediate cost rather than a final cost (the student is
welcome to pursue the other alternative, but it becomes unwieldy). Thus r2′ = 2 + 5/3 = 11/3. Also P′20 = 0, P′21 = 1/3, and P′22 = 2/3.
We see that the minimum cost decision at stage 1 in states 0 and 1 does not involve the decision in state 2, so v0⇤(1, u) = 5 and v1⇤(1, u) = 6 as before. For state 2,

v2⇤(1, u) = min( 10, r2′ + Σj P′2j uj ) = min( 10, 11/3 + 5/3 + 20/3 ).

This is the minimum of 10 and 12, which is 10, showing that the original decision to hire an extra server is optimal at stage 1.
f ) Find the minimum dynamic aggregate expected cost vi⇤ (n, u) for stage n, 0  i  2.
Solution: Repeating (b) - (d) with this additional decision, we see that the decision to hire
an extra server is optimal at each stage.
g) Now assume a final cost u of one unit per customer rather than 5, and find the new minimum dynamic
aggregate expected cost vi⇤ (n, u), 0  i  2.
Solution: The decision to hire the extra server in state 2 at stage 1 has immediate cost 5 (as before) but average final cost 1. Thus v2 (1, u′) = 6, where u′ = (0, 1, 2)T. The cost under the decision to not hire is

v2′ (1, u′) = r2′ + Σj P′2j u′j = 11/3 + 1/3 + 4/3 = 16/3.

Thus the optimal policy at stage 1 is to not hire the extra server in state 2, and

v0⇤(1, u′) = 1;    v1⇤(1, u′) = 2;    v2⇤(1, u′) = 16/3.
Using these results for the stage 2 computation, we find that v2 (2, u′) = 70/9 and v2′ (2, u′) = 71/9, so that hiring the extra server is the optimal decision at stage 2 and

v0⇤(2, u′) = 25/9;    v1⇤(2, u′) = 34/9;    v2⇤(2, u′) = 70/9.
This can be rewritten as v⇤(2, u′) = 4e + w − (38/9)e, where w = (3, 4, 8)T is the relative cost vector found in (d). We can use this to hypothesize that v⇤(n, u′) = 2ne + w − (38/9)e. This can be checked for n ≥ 2 in general by induction, comparing v2 (n, u′) with v2′ (n, u′) to see that hiring an extra server is the minimum cost decision for each n ≥ 2 when in state 2.
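The stage-by-stage minimization for (g) can be reproduced with exact rational arithmetic, confirming v2⇤(1, u′) = 16/3 and the stage-2 comparison of 70/9 against 71/9:

```python
# Exact DP for Exercise 4.38(g): final cost u' = (0, 1, 2); in state 2,
# hire (immediate cost 5, uniform transitions) vs. not hire
# (immediate cost 11/3, transitions (0, 1/3, 2/3)).
from fractions import Fraction as F

r = [F(0), F(1), F(5)]
v = [F(0), F(1), F(2)]                 # stage-0 final costs u'
stages = []
for n in range(1, 6):
    avg = sum(v) / 3                   # expectation under uniform transitions
    hire = r[2] + avg
    no_hire = F(11, 3) + F(1, 3) * v[1] + F(2, 3) * v[2]
    v = [r[0] + avg, r[1] + avg, min(hire, no_hire)]
    stages.append((hire, no_hire, list(v)))
```

At stage 1 the minimum is 16/3 (do not hire); from stage 2 on, hiring wins (70/9 vs. 71/9 at stage 2), as claimed.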
A.5 Solutions for Chapter 5
Exercise 5.1: The purpose of this exercise is to show that for an arbitrary renewal process, N (t), the
number of renewals in (0, t] is a (non-defective) random variable.
a) Let X1 , X2 , . . . be a sequence of IID inter-renewal rv's. Let Sn = X1 + · · · + Xn be the corresponding renewal epochs for each n ≥ 1. Assume that each Xi has a finite expectation X̄ > 0 and, for any given t > 0, use the weak law of large numbers to show that limn→∞ Pr{Sn ≤ t} = 0.

Solution: From the WLLN, (1.75),

limn→∞ Pr{ |Sn/n − X̄| > ε } = 0    for every ε > 0.

Choosing ε = X̄/2, say, and looking only at the lower limit above, limn→∞ Pr{Sn < nX̄/2} = 0. For any given t and all large enough n, t < nX̄/2, so limn→∞ Pr{Sn ≤ t} = 0.
b) Use (a) to show that limn→∞ Pr{N (t) ≥ n} = 0 for each t > 0 and explain why this means that N (t) is a rv, i.e., is not defective.

Solution: Since {Sn ≤ t} = {N (t) ≥ n}, we see that limn→∞ Pr{N (t) ≥ n} = 0 for each t > 0. Since N (t) is nonnegative, this shows that it is a rv.
c) Now suppose that the Xi do not have a finite mean. Consider truncating each Xi to X̆i , where for any given b > 0, X̆i = min(Xi , b). Let N̆ (t) be the renewal counting process for the inter-renewal intervals X̆i . Show that N̆ (t) is non-defective for each t > 0. Show that N (t) ≤ N̆ (t) and thus that N (t) is non-defective. Note: Large inter-renewal intervals create small values of N (t), and thus E[X] = ∞ has nothing to do with potentially large values of N (t), so the argument here was purely technical.
Solution: Since Pr{X̆ > b} = 0, we know that X̆ has a finite mean, and consequently, from (b), N̆ (t) is a rv for each t > 0. Since X̆n ≤ Xn for each n, we also have S̆n ≤ Sn for all n ≥ 1. Thus if S̆n > t, we also have Sn > t. Consequently, if N̆ (t) < n, we also have N (t) < n. It follows that Pr{N (t) ≥ n} ≤ Pr{N̆ (t) ≥ n}, so N (t) is also a rv.
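The comparison N (t) ≤ N̆ (t) is deterministic for any sample path, and can be illustrated with a heavy-tailed example; here X = (1 − U)^{−2} for uniform U, which has infinite mean, and the truncation level b and horizon t are arbitrary choices for illustration.

```python
# N(t) <= N_breve(t): truncating the inter-renewal intervals at b can
# only add renewals by time t.  Given the same draws, the comparison is
# deterministic, not statistical.
import random

def count_renewals(intervals, t):
    s, n = 0.0, 0
    for x in intervals:
        s += x
        if s > t:
            break
        n += 1
    return n

rng = random.Random(1)
b, t = 2.0, 50.0
xs = [(1 - rng.random()) ** -2 for _ in range(100_000)]  # X >= 1, E[X] infinite
xs_trunc = [min(x, b) for x in xs]                       # truncated at b
assert count_renewals(xs, t) <= count_renewals(xs_trunc, t)
```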
Exercise 5.2: This exercise shows that, for an arbitrary renewal process, N (t), the number of renewals
in (0, t], has finite expectation.
a) Let the inter-renewal intervals have the CDF FX (x), with, as usual, FX (0) = 0. Using whatever combination of mathematics and common sense is comfortable for you, show that for any ε, 0 < ε < 1, there is a δ > 0 such that FX (δ) ≤ 1 − ε. In other words, you are to show that a positive rv must lie in some range of positive values bounded away from 0 with positive probability.

Solution: Consider the sequence of events {X > 1/k; k ≥ 1}. The union of these events is the event {X > 0}. Since Pr{X ≤ 0} = 0, Pr{X > 0} = 1. The events {X > 1/k} are nested in k, so that, from (1.9),

1 = Pr{ ∪k {X > 1/k} } = limk→∞ Pr{X > 1/k}.
Thus, for any 0 < ε < 1 and any k large enough, Pr{X > 1/k} > ε. Taking δ to be 1/k for that value of k shows that Pr{X ≤ δ} ≤ 1 − ε. Another equally good approach is to use the continuity from the right of FX .
b) Show that Pr{Sn  }  (1
✏)n .
Solution: Sn is the sum of n interarrival times, and, bounding very loosely, Sn  implies
that Xi  for each i, 1  i  n. Using the ✏ and of (a), Pr{Xi  }  (1 ✏) for
1  i  n. Since the Xi are independent, we then have Pr{Sn  }  (1 ✏)n .
c) Show that E[N (δ)] ≤ 1/ε.

Solution: Since N (t) is nonnegative and integer valued,

E[N (δ)] = Σ_{n=1}^∞ Pr{N (δ) ≥ n} = Σ_{n=1}^∞ Pr{Sn ≤ δ} ≤ Σ_{n=1}^∞ (1 − ε)^n = (1 − ε)/(1 − (1 − ε)) = (1 − ε)/ε ≤ 1/ε.
of (a), show that for every integer k, E [N (k )]  k/✏ and thus that E [N (t)]  t+
for any
✏
Solution: The solution of (c) suggests breaking the interval (0, kδ] into k intervals, each of size δ. Letting Ñi = N (iδ) − N ((i−1)δ) be the number of arrivals in the ith of these intervals, we have E[N (kδ)] = Σ_{i=1}^k E[Ñi].

For the first of these intervals, we have shown that E[Ñ1] ≤ 1/ε, but that argument does not quite work for the subsequent intervals, since the first arrival in an interval i > 1 might correspond to an interarrival interval that starts early in interval i−1 and ends late in interval i. Subsequent arrivals in interval i must correspond to interarrival intervals that both start and end in interval i and thus have duration at most δ. Thus if we let Sn^(i) be the interval from the first to the nth arrival in interval i, then Ñi ≥ n requires Sn^(i) ≤ δ, and Sn^(i) is a sum of n−1 interarrival intervals, so

Pr{ Ñi ≥ n } ≤ (1 − ε)^{n−1}.

Repeating the argument in (c) for i > 1, then,

E[Ñi] = Σ_{n=1}^∞ Pr{ Ñi ≥ n } ≤ Σ_{n=1}^∞ (1 − ε)^{n−1} = 1/(1 − (1 − ε)) = 1/ε.

Since E[N (kδ)] = Σ_{i=1}^k E[Ñi], we then have E[N (kδ)] ≤ k/ε.

Since N (t) is non-decreasing in t, N (t) can be upper bounded by replacing t by the smallest integer multiple of δ that is t or greater, i.e.,

E[N (t)] ≤ E[N (δ⌈t/δ⌉)] ≤ ⌈t/δ⌉/ε ≤ ((t/δ) + 1)/ε.
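A numerical illustration of the final bound: with unit-rate exponential interarrivals the true E[N (t)] is t, and with the illustrative choices δ = 0.1 and ε = 1/2 we have FX (δ) ≈ 0.095 ≤ 1 − ε, so the bound applies (it is very loose here, but valid).

```python
# Simulated E[N(t)] versus the bound (t/delta + 1)/epsilon for
# unit-rate exponential interarrivals; delta and epsilon are example
# values chosen so that F_X(delta) <= 1 - epsilon.
import math, random

rng = random.Random(7)
delta, eps, t = 0.1, 0.5, 5.0
assert 1 - math.exp(-delta) <= 1 - eps        # F_X(delta) <= 1 - epsilon

def n_of_t():
    s, n = 0.0, 0
    while True:
        s += rng.expovariate(1.0)
        if s > t:
            return n
        n += 1

mean = sum(n_of_t() for _ in range(2000)) / 2000
assert mean <= (t / delta + 1) / eps          # bound is (50 + 1)/0.5 = 102
```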
e) Use the result here to show that N (t) is non-defective.
Solution: Since N (t), for each t, is nonnegative and has finite expectation, it is obviously non-defective. One way to see this is that E[N (t)] is the integral of the complementary CDF F^c_{N(t)}(n) of N (t). Since this integral is finite, F^c_{N(t)}(n) must approach 0 with increasing n.
Exercise 5.3: Let {Xi ; i ≥ 1} be the inter-renewal intervals of a renewal process generalized to allow for inter-renewal intervals of size 0, and let Pr{Xi = 0} = α, 0 < α < 1. Let {Yi ; i ≥ 1} be the sequence of non-zero interarrival intervals. For example, if X1 = x1 > 0, X2 = 0, X3 = x3 > 0, . . . , then Y1 = x1 , Y2 = x3 , . . . .

a) Find the CDF of each Yi in terms of that of the Xi .

Solution: The intuitive approach here is to note that each positive inter-renewal interval has the distribution of a generalized inter-renewal interval, conditional on that interval being positive. Thus the positive inter-renewal intervals are IID with the distribution Pr{Y > y} = Pr{X > y} / Pr{X > 0}. Rearranging this in terms of the CDF,

FY (y) = 1 − (1 − FX (y))/(1 − FX (0)) = (FX (y) − FX (0))/(1 − FX (0)).    (A.74)

If one wishes to see this worked out with all the steps, we can start with Y1 . Conditional on X1 > 0, Y1 has the CDF in (A.74). Next, conditional on X1 = 0 and X2 > 0, Y1 again has the CDF in (A.74). Continuing with X1 = 0, X2 = 0, and X3 > 0, etc., we get (A.74) in general. We can then continue with Y2 , Y3 , . . . in the same way.
b) Find the PMF of the number of arrivals of the generalized renewal process at each epoch at which arrivals
occur.
Solution: There is no arrival at t = 0 if X1 > 0. There is one arrival if X1 = 0, X2 > 0, etc.
Letting the rv K be the number of arrivals at 0, p_K(k) = (Pr{X = 0})^k Pr{X > 0}. If there
are arrivals at t = 0 (i.e., given one or more arrivals at t = 0), then Pr{K = k | K ≥ 1} =
(Pr{X = 0})^{k−1} Pr{X > 0}. This also gives the probability of k arrivals at any epoch at
which 1 or more arrivals occur.
A.5. SOLUTIONS FOR CHAPTER 5
c) Explain how to view the generalized renewal process as an ordinary renewal process with inter-renewal
intervals {Y_i; i ≥ 1} and bulk arrivals at each renewal epoch.
Solution: The generalized renewal process here is essentially a special case of bulk arrivals
at the epochs of an ordinary renewal process; the number of bulk arrivals at each ordinary
arrival epoch is geometrically distributed (with one or more arrivals). There is a peculiarity
here about arrivals at t = 0. For an ordinary renewal process with geometric bulk arrivals,
there are no arrivals at time zero, whereas here there are geometrically many (with 0 or
more arrivals) at t = 0. It is difficult to imagine an application in which arrivals at time 0
are desirable, let alone being geometric (with zero or more arrivals) rather than with 1 or
more arrivals.
d) When a generalized renewal process is viewed as an ordinary renewal process with bulk arrivals, what is
the distribution of the bulk arrivals? (The point of this part is to illustrate that bulk arrivals on an ordinary
renewal process are considerably more general than generalized renewal processes.)
Solution: As seen in (b), the number of bulk arrivals at each ordinary arrival epoch is
geometrically distributed with one or more arrivals.
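The claims in (a) and (b) are easy to check numerically. The sketch below is our addition, not part of the original solution; it assumes α = 0.4 and an exponential distribution for the positive part of X, and compares the empirical tail of the nonzero intervals Y with Pr{X > y}/Pr{X > 0}, and the empirical bulk sizes with the geometric PMF α^k(1 − α).

```python
import math
import random

random.seed(1)
alpha, n = 0.4, 200_000

# Generalized inter-renewal intervals: 0 w.p. alpha, else Exponential(1).
xs = [0.0 if random.random() < alpha else random.expovariate(1.0)
      for _ in range(n)]
ys = [x for x in xs if x > 0]               # the nonzero intervals Y_i

# (a): Pr{Y > y} = Pr{X > y}/Pr{X > 0}, which is e^{-y} here.
y0 = 1.0
emp_tail = sum(y > y0 for y in ys) / len(ys)
assert abs(emp_tail - math.exp(-y0)) < 0.01

# (b): the number K of extra arrivals at a renewal epoch is the run of
# zero intervals preceding a positive one, so p_K(k) = alpha^k (1-alpha).
bulks, run = [], 0
for x in xs:
    if x == 0.0:
        run += 1
    else:
        bulks.append(run)
        run = 0
p0 = sum(b == 0 for b in bulks) / len(bulks)
assert abs(p0 - (1 - alpha)) < 0.01
```

Both assertions pass comfortably with a sample of this size, since the standard deviations of the empirical fractions are on the order of 0.002.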
Exercise 5.4: Is it true for a renewal process that:

a) N(t) < n if and only if S_n > t?

b) N(t) ≤ n if and only if S_n ≥ t?

c) N(t) > n if and only if S_n < t?

Solution: Part (a) is true, as pointed out in (5.1) and more fully explained in (2.2) and
(2.3).

Parts (b) and (c) are false, as seen by any situation where S_n < t and S_{n+1} > t. In these cases,
N(t) = n. In other words, one must be careful about strict versus non-strict inequalities
when using {S_n ≤ t} = {N(t) ≥ n}.
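A three-line numerical illustration (our addition, with arrival epochs chosen for the counterexample): with S = (1, 2, 3) and t = 2.5 we have N(t) = 2, the equivalence in (a) holds for every n, and the case S_2 < t < S_3 falsifies both (b) and (c).

```python
S = [1.0, 2.0, 3.0]            # arrival epochs S_1, S_2, S_3
t = 2.5
N_t = sum(s <= t for s in S)   # N(t) = 2

# (a) N(t) < n  <=>  S_n > t holds for every n here.
for n in (1, 2, 3):
    assert (N_t < n) == (S[n - 1] > t)

# (b) fails at n = 2: N(t) <= 2, yet S_2 = 2 is not >= t.
assert N_t <= 2 and not S[1] >= t
# (c) fails at n = 2: S_2 < t, yet N(t) is not > 2.
assert S[1] < t and not N_t > 2
```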
Exercise 5.5: (This shows that convergence WP1 implies convergence in probability.) Let {Y_n; n ≥ 1}
be a sequence of rv's that converges to 0 WP1. For any positive integers m and k, let

    A(m, k) = {ω : |Y_n(ω)| ≤ 1/k for all n ≥ m}.

a) Show that if lim_{n→∞} Y_n(ω) = 0 for some given ω, then (for any given k) ω ∈ A(m, k) for some positive
integer m.

Solution: Note that for a given ω, {Y_n(ω); n ≥ 1} is simply a sequence of real numbers.
The definition of convergence of such a sequence to 0 says that for any ε (or any 1/k where
k > 0 is an integer), there must be an m large enough that |Y_n(ω)| ≤ 1/k for all n ≥ m. In
other words, the given ω is contained in A(m, k) for that m.
b) Show that for all k ≥ 1,

    Pr{∪_{m=1}^∞ A(m, k)} = 1.

Solution: The set of ω for which lim_{n→∞} Y_n(ω) = 0 has probability 1. For any such ω
and any given k, part (a) showed that ω ∈ A(m, k) for some integer m > 0. Thus each such
ω lies in the above union, implying that the union has probability 1.
c) Show that, for all m ≥ 1, A(m, k) ⊆ A(m+1, k). Use this (plus (1.9)) to show that

    lim_{m→∞} Pr{A(m, k)} = 1.

Solution: Note that if |Y_n(ω)| ≤ 1/k for all n ≥ m, then also |Y_n(ω)| ≤ 1/k for all
n ≥ m + 1. This means that A(m, k) ⊆ A(m+1, k). From (1.9), then,

    1 = Pr{∪_{m≥1} A(m, k)} = lim_{m→∞} Pr{A(m, k)}.
d) Show that if ω ∈ A(m, k), then |Y_m(ω)| ≤ 1/k. Use this (plus (c)) to show that

    lim_{m→∞} Pr{|Y_m| > 1/k} = 0.

Since k ≥ 1 is arbitrary, this shows that {Y_n; n ≥ 1} converges in probability.

Solution: Note that if |Y_n(ω)| ≤ 1/k for all n ≥ m, then certainly |Y_n(ω)| ≤ 1/k for n =
m. It then follows from (c) that lim_{m→∞} Pr{|Y_m| ≤ 1/k} = 1, which is equivalent to the
desired statement. This shows that {Y_n; n ≥ 1} converges in probability.
Exercise 5.6: A town starts a mosquito control program and the rv Z_n is the number of mosquitoes
at the end of the nth year (n = 0, 1, 2, . . . ). Let X_n be the growth rate of mosquitoes in year n; i.e.,
Z_n = X_n Z_{n−1}; n ≥ 1. Assume that {X_n; n ≥ 1} is a sequence of IID rv's with the PMF Pr{X=2} = 1/2;
Pr{X=1/2} = 1/4; Pr{X=1/4} = 1/4. Suppose that Z_0, the initial number of mosquitoes, is some known
constant and assume for simplicity and consistency that Z_n can take on non-integer values.

a) Find E[Z_n] as a function of n and find lim_{n→∞} E[Z_n].

Solution: First we find the expected value of X_n (which is the same for each n):

    E[X_n] = (1/2)·2 + (1/4)·(1/2) + (1/4)·(1/4) = 19/16.

Since X_n is independent of Z_{n−1},

    E[Z_n] = E[X_n] E[Z_{n−1}] = E[X_n] E[X_{n−1}] · · · E[X_1] Z_0 = (19/16)^n Z_0.

Assuming that Z_0 > 0, we see that lim_{n→∞} E[Z_n] = ∞.
b) Let W_n = log_2 X_n. Find E[W_n] and E[log_2(Z_n/Z_0)] as a function of n.

Solution:

    E[W_n] = (1/2) log_2(2) + (1/4) log_2(1/2) + (1/4) log_2(1/4) = 1/2 − 1/4 − 1/2 = −1/4.

    log_2(Z_n/Z_0) = log_2(X_1 X_2 · · · X_n) = ∑_{j=1}^n W_j.

Thus E[log_2(Z_n/Z_0)] = −n/4.
c) There is a constant α such that lim_{n→∞} (1/n)[log_2(Z_n/Z_0)] = α with probability 1. Find α and explain
how this follows from the strong law of large numbers.

Solution: From (b), we see that (1/n) log_2(Z_n/Z_0) is the sample average of the n IID rv's
W_1, . . . , W_n. The SLLN applies since E[|W_n|] < ∞. Thus

    lim_{n→∞} (1/n)[log_2(Z_n/Z_0)] = E[W] = −1/4    WP1.    (A.75)

d) Using (c), show that lim_{n→∞} Z_n = β with probability 1 for some β, and evaluate β.

Solution: For any sample value ω satisfying (A.75), and any ε > 0, there must be an m
for which

    (1/n) log_2(Z_n(ω)/Z_0) < −1/4 + ε    for all n ≥ m.

For each such ω, Z_n(ω) < Z_0 2^{−n(0.25 − ε)} for all n ≥ m. Thus (choosing ε < 0.25),
lim_{n→∞} Z_n(ω) = 0. It follows that lim_{n→∞} Z_n = 0 WP1.
e) Explain carefully how the result in (a) and the result in (d) are compatible. What you should learn from
this problem is that the expected value of the log of a product of IID rv’s might be more significant than
the expected value of the product itself.
Solution: After a very large number of years, the mosquito population will increase in
about half the years and decrease in about half the years, but half the decreases lead to
quartering of the population rather than halving. Thus we expect the mosquito population
to gradually die out with high probability, and (d) states this precisely.
When we look at the expected number of mosquitoes, however, very rare events have an
extreme effect on the average. For example, doubling each year for n years in a row has
probability 2^{−n}, but leads to 2^n Z_0 mosquitoes, hiding the large probability of a gradually
decreasing population.

We will see a number of situations (including gambling and investing) that involve products
of IID rv's. This same sort of effect is typical in these cases.
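A quick simulation makes this contrast vivid. The sketch below is an added illustration (Z_0 = 1000 and the horizon of 200 years are our choices): the exact expectation (19/16)^n Z_0 explodes, while almost every simulated path collapses below a single mosquito.

```python
import random

random.seed(7)
Z0, n_years, n_paths = 1000.0, 200, 2000
growth = [2.0, 0.5, 0.25]          # possible values of X_n
probs  = [0.5, 0.25, 0.25]         # their probabilities

def final_pop():
    z = Z0
    for _ in range(n_years):
        z *= random.choices(growth, weights=probs)[0]
    return z

finals = [final_pop() for _ in range(n_paths)]

# (a): E[Z_n] = (19/16)^n Z_0 grows without bound.
expected = (19 / 16) ** n_years * Z0
assert expected > 1e12

# (d): E[log2(Z_n/Z_0)] = -n/4, so typical paths are near Z_0 * 2^{-50}
# after 200 years; nearly all paths are below one mosquito.
fraction_extinct = sum(z < 1.0 for z in finals) / n_paths
assert fraction_extinct > 0.9
```

The mean of log_2(Z_200/Z_0) is −50 with standard deviation near 18, so extinction (in the sense Z_n < 1) occurs on roughly 98% of paths even though E[Z_200] is astronomically large.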
Exercise 5.7: In this exercise, you will find an explicit expression for {ω : lim_n Y_n(ω) = 0}. You need
not be mathematically precise.

a) Let {Y_n; n ≥ 1} be a sequence of rv's. Using the definition of convergence for a sequence of numbers,
justify the following set equivalences:

    {ω : lim_n Y_n(ω) = 0} = ∩_{k=1}^∞ {ω : there exists an m such that |Y_n(ω)| ≤ 1/k for all n ≥ m}  (A.76)
                           = ∩_{k=1}^∞ ∪_{m=1}^∞ {ω : |Y_n(ω)| ≤ 1/k for all n ≥ m}                    (A.77)
                           = ∩_{k=1}^∞ ∪_{m=1}^∞ ∩_{n=m}^∞ {ω : |Y_n(ω)| ≤ 1/k}.                       (A.78)

Solution: For any given sample point ω, {Y_n(ω); n ≥ 1} is a sample sequence of the
random sequence {Y_n; n ≥ 1} and is simply a sequence of real numbers. That sequence
converges to 0 if for every integer k > 0, there is an m(k) such that |Y_n(ω)| ≤ 1/k for all
n ≥ m(k). The set of ω that satisfies this test is given by (A.76). A set-theoretic way to express the
existence of an m(k) is to take the union over m ≥ 1, giving us (A.77). Finally, the set-theoretic
way to express 'for all n ≥ m' is to take the intersection over n ≥ m, giving us (A.78).
b) Explain how this shows that {ω : lim_n Y_n(ω) = 0} must be an event.

Solution: Since Y_n is a rv, {ω : |Y_n(ω)| ≤ 1/k} is an event for all n, k. The countable
intersection over n ≥ m is then an event (by the axioms of probability), the countable
union of these events over m is an event for each k, and the final intersection over k is an event.
c) Use De Morgan's laws to show that the complement of the above equivalence is

    {ω : lim_n Y_n(ω) = 0}^c = ∪_{k=1}^∞ ∩_{m=1}^∞ ∪_{n=m}^∞ {ω : |Y_n(ω)| > 1/k}.    (A.79)

Solution: For a sequence of sets A_1, A_2, . . . , de Morgan's laws say that ∪_n A_n^c = {∩_n A_n}^c
and also that ∩_n A_n^c = {∪_n A_n}^c. Applying de Morgan's laws repeatedly to the
complement of each side of (A.78), we get

    {ω : lim_n Y_n(ω) = 0}^c = ∪_{k=1}^∞ { ∪_{m=1}^∞ ∩_{n=m}^∞ {ω : |Y_n(ω)| ≤ 1/k} }^c
                             = ∪_{k=1}^∞ ∩_{m=1}^∞ { ∩_{n=m}^∞ {ω : |Y_n(ω)| ≤ 1/k} }^c
                             = ∪_{k=1}^∞ ∩_{m=1}^∞ ∪_{n=m}^∞ { ω : |Y_n(ω)| ≤ 1/k }^c
                             = ∪_{k=1}^∞ ∩_{m=1}^∞ ∪_{n=m}^∞ { ω : |Y_n(ω)| > 1/k }.

In the second line, the first form of de Morgan's laws was applied to the complemented
term in braces on the first line. The third line applied the second form to the complemented
term in the second line.
d) Show that for {Y_n; n ≥ 1} to converge to zero WP1, it is necessary and sufficient to satisfy

    Pr{∩_{m=1}^∞ ∪_{n=m}^∞ {|Y_n| > 1/k}} = 0    for all k ≥ 1.    (A.80)

Solution: Applying the union bound to (A.79),

    Pr{{ω : lim_n Y_n(ω) = 0}^c} ≤ ∑_{k=1}^∞ Pr{∩_{m=1}^∞ ∪_{n=m}^∞ {|Y_n| > 1/k}}.

If (A.80) is satisfied for all k ≥ 1, then the above sum is 0 and Pr{{ω : lim_n Y_n(ω) = 0}^c} =
0. This means that Pr{lim_{n→∞} Y_n = 0} = 1, i.e., that lim_{n→∞} Y_n = 0 WP1.

If (A.80) is not satisfied for some given k, on the other hand, then the probability of the
event ∩_{m=1}^∞ ∪_{n=m}^∞ {|Y_n| > 1/k} must be positive, say ε > 0. Using (A.79) and lower
bounding the probability of the union over k by the single k above,

    Pr{{ω : lim_n Y_n(ω) = 0}^c} = Pr{∪_{k=1}^∞ ∩_{m=1}^∞ ∪_{n=m}^∞ {ω : |Y_n(ω)| > 1/k}}
                                 ≥ Pr{∩_{m=1}^∞ ∪_{n=m}^∞ {ω : |Y_n(ω)| > 1/k}} ≥ ε.

Thus there is a set of ω, whose probability is at least ε, for which lim_n Y_n(ω) either does
not exist or does not equal 0.
e) Show that for {Y_n; n ≥ 1} to converge to zero WP1, it is necessary and sufficient to satisfy

    lim_{m→∞} Pr{∪_{n=m}^∞ {|Y_n| > 1/k}} = 0    for all k ≥ 1.

Hint: Use (a) of Exercise 5.8. Note: (e) provides an equivalent condition that is often useful in establishing
convergence WP1. It also brings out quite clearly the difference between convergence WP1 and convergence
in probability.

Solution: As in Exercise 5.8, let B_m = ∪_{n≥m} {|Y_n| > 1/k}. Then the condition in (e)
is lim_{m→∞} Pr{B_m} = 0 and the necessary and sufficient condition established in (d) is
Pr{∩_{m≥1} B_m} = 0. The equivalence of these conditions is implied by (1.10).

The solution to this exercise given here is mathematically precise and has the added benefit
of showing that the various sets used in establishing the SLLN are in fact events since they
are expressed as countable intersections and unions of events.
Exercise 5.8: (The Borel-Cantelli lemma) Consider the event ∩_{m≥1} ∪_{n≥m} A_n where A_1, A_2, . . . are
arbitrary events.

a) Show that

    lim_{m→∞} Pr{∪_{n≥m} A_n} = 0    ⟺    Pr{∩_{m≥1} ∪_{n≥m} A_n} = 0.

Hint: Apply (1.10).

Solution: Let B_m = ∪_{n≥m} A_n. Then the statement to be shown is

    lim_{m→∞} Pr{B_m} = 0    ⟺    Pr{∩_{m≥1} B_m} = 0.

Since B_m ⊇ B_{m+1} for each m ≥ 1, this is implied by (1.10).

b) Show that ∑_{m=1}^∞ Pr{A_m} < ∞ implies that lim_{m→∞} Pr{∪_{n≥m} A_n} = 0. Hint: First explain why
∑_{m=1}^∞ Pr{A_m} < ∞ implies that lim_{m→∞} ∑_{n=m}^∞ Pr{A_n} = 0.

Solution: Let S_m = ∑_{n=1}^m Pr{A_n}. By definition of an infinite sum,

    ∑_{m=1}^∞ Pr{A_m} = lim_{m→∞} ∑_{n=1}^m Pr{A_n} = lim_{m→∞} S_m.

Since Pr{A_n} ≥ 0, {S_m; m ≥ 1} is non-decreasing. Assume in what follows that
∑_{m=1}^∞ Pr{A_m} < ∞ with some given value α. Then lim_{m→∞} S_m exists with the value
α. We then have lim_{m→∞}(α − S_m) = 0, which means that lim_{m→∞} ∑_{n=m+1}^∞ Pr{A_n} =
0. Shifting the index m, lim_{m→∞} ∑_{n=m}^∞ Pr{A_n} = 0. Now, using the union bound,

    Pr{∪_{n≥m} A_n} ≤ ∑_{n≥m} Pr{A_n}.

Taking the limit as m → ∞, the right side is 0, so lim_{m→∞} Pr{∪_{n≥m} A_n} = 0.

c) Show that if ∑_{m=1}^∞ Pr{A_m} < ∞, then Pr{∩_{m≥1} ∪_{n≥m} A_n} = 0. This well-known result is called the
Borel-Cantelli lemma.

Solution: If ∑_{m=1}^∞ Pr{A_m} < ∞, then from (b), lim_{m→∞} Pr{∪_{n≥m} A_n} = 0. From (a),
this implies that Pr{∩_{m≥1} ∪_{n≥m} A_n} = 0.
d) The set ∩_{m≥1} ∪_{n≥m} A_n is often referred to as the set of sample points ω that are contained in
infinitely many of the A_n. Consider an ω that is contained in a finite number k of the sets A_n and argue
that there must be an integer m such that ω ∉ A_n for all n > m. Conversely, if ω is contained in infinitely
many of the A_n, show that there cannot be such an m.

Solution: If a given ω is in any of the sets A_n, let b_1 be the smallest n for which ω ∈ A_n.
If it is in other sets, let b_2 be the next smallest, and so forth. Thus ω ∈ A_{b_i} for 1 ≤ i ≤ k.
If k is finite, then ω ∉ A_n for all n > b_k. If k is infinite, then for all m, ω ∈ A_n for some
n > m. This type of set looks very complicated at first, but it arises in many situations,
and it is useful intuitively to interpret it as the event of infinitely recurring sample points.
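Part (b) is ultimately the statement that the tail of a convergent series vanishes. As a small numerical illustration (our example; the choice Pr{A_n} = 1/n² is an assumption for concreteness), the union bound Pr{∪_{n≥m} A_n} ≤ ∑_{n≥m} Pr{A_n} is driven to 0 as m grows:

```python
# Tail sums of Pr{A_n} = 1/n^2, a summable series with total pi^2/6.
def tail(m, N=2_000_000):
    # numerical stand-in for sum_{n=m}^infinity 1/n^2
    return sum(1.0 / (n * n) for n in range(m, N))

assert tail(1) > 1.6           # close to pi^2/6 = 1.6449...
assert tail(100) < 0.011       # the tail behaves like 1/m
assert tail(10_000) < 0.00011  # so Pr{union_{n>=m} A_n} -> 0
```

By Borel-Cantelli, with these probabilities only finitely many of the A_n occur, with probability 1.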
Exercise 5.9: (Strong law for renewals where X̄ = ∞) Let {X_i; i ≥ 1} be the inter-renewal intervals
of a renewal process and assume that E[X_i] = ∞. Let b > 0 be an arbitrary number and X̆_i be a truncated
random variable defined by X̆_i = X_i if X_i ≤ b and X̆_i = b otherwise.

a) Show that for any constant M > 0, there is a b sufficiently large so that E[X̆_i] ≥ M.

Solution: Since E[X] = ∫_0^∞ F^c_X(x) dx = ∞, we know from the definition of an integral over
an infinite limit that

    E[X] = lim_{b→∞} ∫_0^b F^c_X(x) dx = ∞.

For X̆ = min(X, b), we see that F_X̆(x) = F_X(x) for x < b and F_X̆(x) = 1 for x ≥ b. Thus
E[X̆] = ∫_0^b F^c_X(x) dx. We have just seen that the limit of this as b → ∞ is ∞, so that for
any M > 0, there is a b sufficiently large that E[X̆] ≥ M.
b) Let {N̆(t); t ≥ 0} be the renewal counting process with inter-renewal intervals {X̆_i; i ≥ 1} and show that
for all t > 0, N̆(t) ≥ N(t).

Solution: Note that X − X̆ is a non-negative rv, i.e., it is 0 for X ≤ b and X − b otherwise.
Thus X̆ ≤ X. It follows then that for all n ≥ 1,

    S̆_n = X̆_1 + X̆_2 + · · · + X̆_n ≤ X_1 + X_2 + · · · + X_n = S_n.

Since S̆_n ≤ S_n, it follows for all t > 0 that if S_n ≤ t then also S̆_n ≤ t. This then means that
if N(t) ≥ n, then also N̆(t) ≥ n. Since this is true for all n, N̆(t) ≥ N(t), i.e., the number
of renewals after truncation is greater than or equal to the number before truncation.
c) Show that for all sample functions N(t, ω), except a set of probability 0, N(t, ω)/t ≤ 2/M for all sufficiently
large t. Note: Since M is arbitrary, this means that lim_{t→∞} N(t)/t = 0 with probability 1.

Solution: Let M and b < ∞ such that E[X̆] ≥ M be fixed in what follows. Since X̆ ≤ b,
we see that E[X̆] < ∞, so we can apply Theorem 5.3.1, which asserts that

    Pr{ω : lim_{t→∞} N̆(t, ω)/t = 1/E[X̆]} = 1.

Let A(M) denote the set of sample points for which the above limit exists, i.e., for which
lim_{t→∞} N̆(t, ω)/t = 1/E[X̆]; thus A(M) has probability 1 from Theorem 5.3.1. We will
show that, for each ω ∈ A(M), N(t, ω)/t ≤ 2/M for all sufficiently large t. We know that for every ω ∈ A(M),
lim_t N̆(t, ω)/t = 1/E[X̆]. The definition of the limit of a real-valued function states that
for any ε > 0, there is a τ(ε) such that

    |N̆(t, ω)/t − 1/E[X̆]| ≤ ε    for all t ≥ τ(ε).

Note that τ(ε) depends on b and ω as well as ε, so we denote it as τ(ε, b, ω). Using only
one side of this inequality, N̆(t, ω)/t ≤ ε + 1/E[X̆] for all t ≥ τ(ε, b, ω). Since we have seen
that N(t, ω) ≤ N̆(t, ω) and E[X̆] ≥ M, we have

    N(t, ω)/t ≤ ε + 1/M    for all t ≥ τ(ε, b, ω).

Since ε is arbitrary, we can choose it as 1/M, so N(t, ω)/t ≤ 2/M for all sufficiently large t
for each ω ∈ A(M). Now consider the intersection ∩_M A(M) over integer M ≥ 1. Since each
A(M) has probability 1, the intersection has probability 1 also. For all ω in this intersection,
N(t, ω)/t ≤ 2/M for all sufficiently large t, for every integer M ≥ 1, so lim_{t→∞} N(t)/t = 0 WP1.
Exercise 5.10: The first part of the proof of the SLLN showed that if E[X⁴] < ∞, then E[X²] < ∞.
Here we generalize this to show that for 1 ≤ m < n, if E[|Xⁿ|] < ∞, then E[|X^m|] < ∞.

a) For any x > 0, show that x^m ≤ 1 + x^n.

Solution: This is virtually the same as the argument used in the proof of the SLLN. If
x ≤ 1, then x^m ≤ 1 for any m ≥ 1, and if x > 1, then x^m ≤ x^n for any m ≤ n. Thus for all
x > 0, x^m ≤ 1 + x^n.

b) For any rv X, show that if E[|Xⁿ|] < ∞, then E[|X^m|] ≤ 1 + E[|Xⁿ|].

Solution: For any sample value X(ω) of X, (a) shows that |X^m(ω)| ≤ 1 + |Xⁿ(ω)|. Taking
the expectation yields E[|X^m|] ≤ 1 + E[|Xⁿ|].
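The inequality in (a) can be spot-checked on a grid (an added sanity check; the grid bounds are arbitrary):

```python
# Check x^m <= 1 + x^n for 1 <= m < n over a grid in (0, 20].
for m, n in [(1, 2), (2, 4), (3, 7)]:
    for i in range(1, 2001):
        x = i / 100.0
        assert x**m <= 1.0 + x**n
```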
Exercise 5.11: Let Y(t) = S_{N(t)+1} − t be the residual life at time t of a renewal process. First consider
a renewal process in which the interarrival time has density f_X(x) = e^{−x}; x ≥ 0, and next consider a renewal
process with density

    f_X(x) = 3/(x + 1)⁴;    x ≥ 0.    (A.81)

For each of the above densities, use renewal-reward theory to find:

i) the time average of Y(t).

Solution: From (5.17),

    lim_{t→∞} (1/t) ∫_{τ=0}^t Y(τ) dτ = E[X²]/(2 E[X])    WP1.

For f_X(x) = e^{−x} for x > 0, E[X] = 1 and E[X²] = 2. Thus the time-average residual life
is 1. This is something we already know, since if the interarrival time is exponential, the
renewal process is Poisson.

For the density in (A.81), we have E[X] = ∫_0^∞ 3x(x + 1)^{−4} dx = 1/2 and E[X²] =
∫_0^∞ 3x²(x + 1)^{−4} dx = 1, so the time-average residual life is again 1.
ii) the second moment in time of Y(t) (i.e., lim_{T→∞} (1/T) ∫_0^T Y²(t) dt).

Solution: We regard Y²(t) as simply another reward function. Given an inter-renewal
interval x_n, the reward at each residual life r from 0 to x_n is r², so the accumulated reward
over that interval is R_n = ∫_0^{x_n} r² dr = x_n³/3. Thus the expected reward over an interval is
E[R_n] = E[X³]/3. Using Theorem 5.4.5, then, the time-average second moment is

    lim_{t→∞} (1/t) ∫_{τ=0}^t Y²(τ) dτ = E[X³]/(3 E[X])    WP1 if E[X³] < ∞.

For f_X(x) = e^{−x}, E[X³] = 6, so the time-average second moment is 2.

For the density in (A.81),

    E[X³] = ∫_0^∞ 3x³(x + 1)^{−4} dx = ∞,

so the time-average second moment is infinite. This is somewhat shocking, but the distribution
of X has tails that go to 0 rather slowly, and the enormous accumulated reward over
large sample values of X_n causes this infinite reward.

For the exponential density, verify your answers by finding E[Y(t)] and E[Y²(t)] directly.

Solution: The exponential density yields a Poisson process and the residual life at any
time is exponential, independent of the previous arrival. Thus E[Y(t)] = 1 for all t and
E[Y²(t)] = 2 for all t.
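The moment calculations for the density in (A.81) are easy to confirm numerically (an added check using midpoint-rule integration; the truncation points of the integrals are ours):

```python
def moment(k, upper, steps=400_000):
    # integral_0^upper of x^k * 3/(x+1)^4 dx by the midpoint rule
    dx = upper / steps
    return sum(((i + 0.5) * dx) ** k * 3.0 / ((i + 0.5) * dx + 1.0) ** 4 * dx
               for i in range(steps))

EX  = moment(1, 2_000.0)         # should be 1/2
EX2 = moment(2, 20_000.0)        # should be 1
assert abs(EX - 0.5) < 1e-3
assert abs(EX2 - 1.0) < 1e-2
assert abs(EX2 / (2 * EX) - 1.0) < 0.05   # time-average residual life = 1

# E[X^3] diverges: the truncated integral keeps growing (like 3 ln b).
assert moment(3, 10_000.0) > moment(3, 1_000.0) + 2.0
```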
Exercise 5.12: Consider a variation of an M/G/1 queueing system in which there is no facility to save
waiting customers. Assume customers arrive according to a Poisson process of rate λ. If the server is busy,
the customer departs and is lost forever; if the server is not busy, the customer enters service with a service
time CDF denoted by F_Y(y).

Successive service times (for those customers that are served) are IID and independent of arrival times.
Assume that customer number 0 arrives and enters service at time t = 0.

a) Show that the sequence of times S_1, S_2, . . . at which successive customers enter service are the renewal
times of a renewal process. Show that each inter-renewal interval X_i = S_i − S_{i−1} (where S_0 = 0) is the sum
of two independent random variables, Y_i + U_i where Y_i is the ith service time; find the probability density
of U_i.

Solution: Let Y_1 be the first service time, i.e., the time spent serving customer 0. Customers
who arrive during (0, Y_1] are lost, and, given that Y_1 = y, the residual time until the
next customer arrives is memoryless and exponential with rate λ. Thus the time X_1 = S_1
at which the next customer enters service is Y_1 + U_1 where U_1 is exponential with rate λ,
i.e., f_{U_1}(u) = λ exp(−λu).

At time X_1, the arriving customer enters service, customers are dropped until X_1 + Y_2, and
after an exponential interval U_2 of rate λ a new customer enters service at time X_1 + X_2
where X_2 = Y_2 + U_2. Both Y_2 and U_2 are independent of X_1, so X_2 and X_1 are independent.
Since the Y_i are IID and the U_i are IID, X_1 and X_2 are IID. In the same way, the sequence
X_1, X_2, . . . are IID intervals between successive services. Thus {X_i; i ≥ 1} is a sequence of
inter-renewal intervals for a renewal process and S_1, S_2, . . . are the renewal epochs.
b) Assume that a reward (actually a cost in this case) of one unit is incurred for each customer turned away.
Sketch the expected reward function as a function of time for the sample function of inter-renewal intervals
and service intervals shown below; the expectation is to be taken over those (unshown) arrivals of customers
that must be turned away.

[Figure: a time axis showing service intervals Y_1, Y_2, Y_3 beginning at the renewal epochs
S_0 = 0, S_1, S_2.]

Solution: Customers are turned away at rate λ during the service times, so if we let R(t)
be the rate at which customers are turned away for a given sample path of services and
arrivals, we have R(t) = λ for t in a service interval and R(t) = 0 otherwise.

[Figure: R(t) equal to λ over each service interval Y_1, Y_2, Y_3 starting at S_0 = 0, S_1, S_2,
and equal to 0 between the end of each service interval and the next renewal.]
Note that the number of arrivals within a service interval is dependent on the length of
the service interval but independent of arrivals outside of that interval and independent of
other service intervals.

To be more systematic, we would define {Z_m; m ≥ 1} as the sequence of customer interarrival
intervals. This sequence along with the service sequence {Y_n; n ≥ 1} specifies the
sequences {S_n, Y_n; n ≥ 1}. The discarded customer arrivals within each service interval are
then Poisson of rate λ. The function {R(t); t > 0} above would then better be described as
E[R(t) | {S_n, Y_n; n ≥ 1}], where R(t) would then become a unit impulse at each discarded
arrival.
c) Let ∫_0^t R(τ)dτ denote the accumulated reward (i.e., cost) from 0 to t and find the limit as t → ∞
of (1/t) ∫_0^t R(τ)dτ. Explain (without any attempt to be rigorous or formal) why this limit exists with
probability 1.

Solution: The reward within the nth inter-renewal interval (averaged over the discarded
arrivals, but conditional on Y_n) is R_n = λY_n. Assuming that E[Y_n] < ∞, Theorem 5.4.5
asserts that the average reward, WP1, is λE[Y]/(E[Y] + 1/λ). Thus the theorem asserts that the limit
exists with probability 1. If E[Y_n] = ∞, one could use a truncation argument to show that
the average reward, WP1, is λ.
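A renewal-reward simulation of (c) is a useful cross-check. The sketch below is our addition, with λ = 2 and exponential unit-mean service assumed for concreteness: each cycle has length Y_n + U_n and an expected λY_n discards, so the long-run discard rate should be λE[Y]/(E[Y] + 1/λ) = 2/1.5 = 4/3.

```python
import random

random.seed(3)
lam, n_cycles = 2.0, 100_000
lost = total_time = 0.0
for _ in range(n_cycles):
    y = random.expovariate(1.0)       # service time Y_n, E[Y] = 1
    # Poisson(lam) arrivals during (0, y] are discarded; by memorylessness
    # the overshoot past y is the exponential idle gap U_n.
    t = random.expovariate(lam)
    while t < y:
        lost += 1
        t += random.expovariate(lam)
    total_time += y + (t - y)         # cycle length Y_n + U_n

rate = lost / total_time
# renewal-reward prediction: lam*E[Y] / (E[Y] + 1/lam) = 4/3
assert abs(rate - 4.0 / 3.0) < 0.05
```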
d) In the limit of large t, find the expected reward from time t until the next renewal. Hint: Sketch this
expected reward as a function of t for a given sample of inter-renewal intervals and service intervals; then
find the time average.

Solution: For the sample function above, the expected reward from t until the next renewal
(again averaged over dropped arrivals) is λ(Y_n − (t − S_{n−1})) for t in the nth service
interval and 0 from the end of that service interval until the next renewal; i.e., it decreases
linearly from λY_n to 0 over each service interval.

[Figure: R(t) as a sawtooth, falling linearly from λY_n to 0 over each service interval Y_1,
Y_2, Y_3 starting at S_0 = 0, S_1, S_2.]

The expected reward accumulated over the nth inter-renewal interval is then λY_n²/2, so the
sample path average of the expected reward per unit time is

    E[R_n]/X̄ = λ E[Y²] / (2(Ȳ + 1/λ)).
e) Now assume that the arrivals are deterministic, with the first arrival at time 0 and the nth arrival at time
n − 1. Does the sequence of times S_1, S_2, . . . at which subsequent customers start service still constitute the
renewal times of a renewal process? Draw a sketch of arrivals, departures, and service time intervals. Again
find lim_{t→∞} (∫_0^t R(τ) dτ)/t.

Solution: Since the arrivals are deterministic at unit intervals, the customer to be served
at the completion of Y_1 is the customer arriving at ⌈Y_1⌉ (the problem statement was not
sufficiently precise to specify what happens if a service completion and a customer arrival
are simultaneous, so we assume here that such a customer is served). The customers arriving
from time 1 to ⌈Y_1⌉ − 1 are then dropped, as illustrated below.

[Figure: arrivals at the integer times; the arrivals during each service interval are dropped
and the first arrival at or after each service completion is served, so S_1 = ⌈Y_1⌉ and the
service intervals Y_2, Y_3 begin at S_1, S_2.]

We see that the interval between the first and second service is ⌈Y_2⌉ and in general between
service n − 1 and n is ⌈Y_n⌉. These intervals are IID and thus form a renewal process.
Finding the sample path average as before,

    lim_{t→∞} (1/t) ∫_0^t R(τ) dτ = E[⌈Y⌉ − 1] / E[⌈Y⌉]    WP1.
Exercise 5.13: Let Z(t) = t − S_{N(t)} be the age of a renewal process and Y(t) = S_{N(t)+1} − t be the
residual life. Let F_X(x) be the CDF of the inter-renewal interval and find the following as a function of
F_X(x):

a) Pr{Y(t)>x | Z(t)=s}

Solution: First assume that X is discrete, and thus S_n is also discrete for each n. We first
find Pr{Z(t) = s} for any given s, 0 < s ≤ t. Since {Z(t) = s} = {S_{N(t)} = t − s}, we have

    Pr{Z(t)=s} = Pr{S_{N(t)} = t−s} = ∑_{n=0}^∞ Pr{N(t) = n, S_n = t−s}
               = ∑_{n=0}^∞ Pr{S_n = t−s, X_{n+1} > s} = F^c_X(s) ∑_{n=0}^∞ Pr{S_n = t−s},    (A.82)

where in the next to last step we noted that if the nth arrival comes at t − s and N(t) = n,
then no arrivals occur in (t−s, t], so X_{n+1} > s. The last step used the fact that
X_{n+1} is IID over n and statistically independent of S_n. The figure below illustrates these
relationships and also illustrates how to find Pr{Y(t)>x, Z(t)=s}.

[Figure: an arrival at S_n = t − s; the event {Z(t)=s} requires X_{n+1} > s (no arrival in
(t−s, t]), and the event {Y(t)>x, Z(t)=s} requires X_{n+1} > s + x (no arrival in (t−s, t+x]).]

    Pr{Y(t)>x, Z(t)=s} = ∑_{n=0}^∞ Pr{N(t)=n, S_n=t−s, S_{n+1} > t+x}
                       = ∑_{n=0}^∞ Pr{N(t)=n, S_n=t−s, X_{n+1} > s+x}
                       = F^c_X(s+x) ∑_{n=0}^∞ Pr{N(t)=n, S_n=t−s}.    (A.83)

We can now find Pr{Y(t) > x | Z(t) = s} by dividing (A.83) by (A.82) and cancelling out
the common summation. Thus

    Pr{Y(t) > x | Z(t) = s} = F^c_X(s+x) / F^c_X(s).

We can see the intuitive reason for this answer from the figure above: Z(t) = s specifies
that there is an n such that S_n = t−s and there are no arrivals in (t−s, t]; Y(t) > x extends
this interval of no arrivals to (t−s, t+x], and these probabilities do not depend on n.

The argument above assumed that X is discrete, but this is a technicality that could be
handled by a limiting argument. In this case, however, it is cleaner to modify the above
argument by working directly with the conditional probabilities:

    Pr{Y(t) > x | Z(t) = s} = ∑_{n=0}^∞ Pr{N(t)=n} Pr{Y(t) > x | S_{N(t)}=t−s, N(t)=n}
                            = ∑_{n=0}^∞ Pr{N(t)=n} Pr{X_{n+1} > s+x | S_n=t−s, X_{n+1}>s}
                            = (F^c_X(s+x)/F^c_X(s)) ∑_{n=0}^∞ Pr{N(t)=n} = F^c_X(s+x)/F^c_X(s),

where the second line again uses the fact that {S_n=t−s, N(t)=n} = {S_n=t−s, X_{n+1}>s}.
b) Pr{Y(t)>x | Z(t+x/2)=s}.

Solution: The conditioning event {Z(t + x/2) = s} means that there is an arrival at
t + x/2 − s and no further arrival until after t + x/2. We must look at two cases, first where
the arrival at t + x/2 − s comes at or before t (i.e., s ≥ x/2) and second where it comes
after t. The solution where s ≥ x/2 is quite similar to (a), and we repeat the argument
there for general X, modifying the terms in (a) as needed. The diagram below will clarify
the various terms, assuming s ≥ x/2.

[Figure: an arrival at S_n = t + x/2 − s ≤ t; the event {Z(t+x/2)=s} requires X_{n+1} > s,
and the event Y(t) > x requires X_{n+1} > s + x/2.]

As in (a),

    Pr{Y(t)>x | Z(t+x/2)=s} = ∑_{n=0}^∞ Pr{N(t+x/2)=n} Pr{Y(t)>x | Z(t+x/2)=s, N(t+x/2)=n}.

The conditioning event above is

    {Z(t+x/2) = s, N(t+x/2) = n} = {S_n = t + x/2 − s, X_{n+1} > s}.    (A.84)

Given this conditioning event, the diagram shows that Y(t) > x implies that X_{n+1} > s + x/2.
Thus

    Pr{Y(t)>x | Z(t+x/2)=s} = ∑_{n=0}^∞ Pr{N(t+x/2)=n} Pr{X_{n+1} > s+x/2 | S_n = t+x/2−s, X_{n+1} > s}
                            = F^c_X(s+x/2) / F^c_X(s)    for s ≥ x/2.    (A.85)

For the other case, where s < x/2, the condition asserts that S_{N(t+x/2)} > t. Thus there
must be an arrival in the interval from t to t + x/2, so Pr{Y(t) > x | Z(t+x/2)=s} = 0.
c) Pr{Y(t)>x | Z(t+x)>s} for a Poisson process.

Solution: The event {Z(t+x) > s} is the event that there are no arrivals in the interval
(t+x−s, t+x]. We consider two cases separately, first s ≥ x and next s < x. If s ≥ x, then
there are no arrivals from t to t + x, so that Y(t) must exceed x. Thus

    Pr{Y(t) > x | Z(t+x) > s} = 1    for s ≥ x.

Alternatively, if s < x, any arrival between t and t + x must arrive between t and t + x − s,
so the probability of no arrival between t and t + x given no arrival between t + x − s and
t + x is the probability of no arrival between t and t + x − s:

    Pr{Y(t) > x | Z(t+x) > s} = exp(−λ(x − s))    for s < x,

where λ is the rate of the Poisson process.
Exercise 5.14: Let F_Z(z) be the fraction of time (over the limiting interval (0, ∞)) that the age of a
renewal process is at most z. Show that F_Z(z) satisfies

    F_Z(z) = (1/X̄) ∫_{x=0}^z Pr{X > x} dx    WP1.

Hint: Follow the argument in Example 5.4.7.

Solution: We want to find the time-average CDF of the age Z(t) of an arbitrary renewal
process, and do this by following Example 5.4.7. For any given z > 0, define the reward
function R(t) to be 1 for Z(t) ≤ z and to be 0 otherwise, i.e.,

    R(t) = R̃(Z(t), X̃(t)) = { 1 if Z(t) ≤ z;  0 otherwise }.

Note that R(t) is positive only over the first z units of a renewal interval. Thus R_n =
min(z, X_n). It follows that

    E[R_n] = E[min(z, X_n)] = ∫_0^∞ Pr{min(X, z) > x} dx    (A.86)
           = ∫_0^z Pr{X > x} dx.    (A.87)

Let F_Z(z) = lim_{t→∞} (1/t) ∫_0^t R(τ) dτ denote the fraction of time that the age is less than or
equal to z. From Theorem 5.4.5 and (A.87),

    F_Z(z) = E[R_n]/X̄ = (1/X̄) ∫_0^z Pr{X > x} dx    WP1.
Exercise 5.15: a) Let J be a stopping rule and I_{J≥n} be the indicator random variable of the event
{J ≥ n}. Show that J = ∑_{n≥1} I_{J≥n}.

Solution: Given that J = m for any given m ≥ 1, we see that I_{J≥n} is 1 for each n ≤ m
and 0 for n > m. Thus ∑_{n≥1} I_{J≥n} = m given J = m. Since this is true for all sample
values m of J, J = ∑_{n≥1} I_{J≥n}.

b) Show that I_{J≥1} ≥ I_{J≥2} ≥ · · · , i.e., show that for each n ≥ 1, I_{J≥n}(ω) ≥ I_{J≥n+1}(ω) for each ω ∈ Ω
(except perhaps for a set of probability 0).

Solution: Given J = m, I_{J≥n} is 1 for n ≤ m and 0 for n > m. Thus I_{J≥n} ≥ I_{J≥n+1}
with strict inequality when n = m. The statement I_{J≥n} ≥ I_{J≥n+1} is true for each event
{J = m}, however, and thus true in general. The strange exception of a set of probability
0 is because rv's can be undefined on a set of 0 probability, which is something we don't
usually concern ourselves with and need not concern ourselves with here.
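Both parts are deterministic identities about indicator functions, so they can be verified exhaustively for small sample values of J (an added check; the truncation of the sum at n = 99 is harmless since terms with n > J vanish):

```python
# Verify J = sum_{n>=1} I{J>=n} and the monotonicity of the indicators
# for every sample value J = m up to 49.
for J in range(1, 50):
    indicators = [1 if J >= n else 0 for n in range(1, 100)]
    assert sum(indicators) == J                                     # part (a)
    assert all(a >= b for a, b in zip(indicators, indicators[1:]))  # part (b)
```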
Exercise 5.16: a) Use Wald's equality to compute the expected number of trials of a Bernoulli process
up to and including the kth success.

Solution: In a Bernoulli process {X_n; n ≥ 1}, we call trial n a success if X_n = 1. Define
a stopping trial J as the first trial n at which ∑_{m=1}^n X_m = k. This constitutes a stopping
rule since I_{J=n} is a function of X_1, . . . , X_n. Given J = n, S_J = X_1 + · · · + X_n = k, and
since this is true for all n, S_J = k unconditionally. Thus E[S_J] = k. From Wald's equality,
E[S_J] = X̄ E[J], so that E[J] = k/X̄.

We should have shown that E[J] < ∞ to justify the use of Wald's equality, but we will
show that in (b).

b) Use elementary means to find the expected number of trials up to and including the first success. Use
this to find the expected number of trials up to and including the kth success. Compare with (a).

Solution: Let Pr{X_n = 1} = p. Then the first success comes at trial 1 with probability p,
and at trial n with probability (1−p)^{n−1} p. The expected time to the first success is then
1/p = 1/X̄. The expected time to the kth success is then k/X̄, which agrees with the result
in (a).

The reader might question the value of Wald's equality in this exercise, since the demonstration
that E[J] < ∞ was most easily accomplished by solving the entire problem by
elementary means. In typical applications, however, the condition that E[J] < ∞ is essentially
trivial.
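A simulation cross-check of E[J] = k/X̄ (our addition; the values p = 0.3 and k = 5 are assumptions for illustration):

```python
import random

random.seed(11)
p, k, trials = 0.3, 5, 50_000

def time_to_kth_success():
    # number of Bernoulli(p) trials up to and including the kth success
    successes = n = 0
    while successes < k:
        n += 1
        successes += random.random() < p
    return n

avg_J = sum(time_to_kth_success() for _ in range(trials)) / trials
assert abs(avg_J - k / p) < 0.2      # Wald: E[J] = k / E[X] = 16.67
```

The sample mean of J over 50,000 trials has a standard deviation of about 0.03, so the tolerance above is very loose.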
Exercise 5.17: A gambler with an initial finite capital of d > 0 dollars starts to play a dollar slot
machine. At each play, either his dollar is lost or is returned with some additional number of dollars. Let
Xi be his change of capital on the ith play. Assume that {Xi; i = 1, 2, . . .} is a set of IID random variables
taking on integer values {−1, 0, 1, . . .}. Assume that E[Xi] < 0. The gambler plays until losing all his money
(i.e., the initial d dollars plus subsequent winnings).
a) Let J be the number of plays until the gambler loses all his money. Is the weak law of large numbers
sufficient to argue that lim_{n→∞} Pr{J > n} = 0 (i.e., that J is a random variable) or is the strong law
necessary?
Solution: We show below that the weak law is sufficient. The event {J > n} is the event
that Si > −d for 1 ≤ i ≤ n. Thus Pr{J > n} ≤ Pr{Sn > −d}. Since E[Sn] = nE[X] and
E[X] < 0, we see that the event {Sn > −d} for large n is an event in which Sn is very far above
its mean. Putting this event in terms of distance from the sample average to the mean,

    Pr{Sn > −d} = Pr{Sn/n − E[X] > −d/n − E[X]}.

The WLLN says that lim_{n→∞} Pr{|Sn/n − E[X]| > ε} = 0 for all ε > 0, and this implies the same
statement with the absolute value signs removed, i.e., lim_{n→∞} Pr{Sn/n − E[X] > ε} = 0 for all ε > 0. If
we choose ε = −E[X]/2 in the equation above, it becomes

    lim_{n→∞} Pr{Sn > nE[X]/2} = 0.

Since E[X] < 0, we see that −d > nE[X]/2 for n > 2|d/E[X]|, and thus lim_{n→∞} Pr{Sn > −d} = 0.
b) Find E [J]. Hint: The fact that there is only one possible negative outcome is important here.
Solution: One stops playing on trial J = n if one's capital reaches 0 for the first time
on the nth trial, i.e., if Sn = −d for the first time at trial n. This is clearly a function
of X1, X2, . . . , Xn, so J is a stopping rule. Note that stopping occurs exactly on reaching
−d since Sn can decrease with n only in increments of −1 and Sn is always integer. Thus
SJ = −d. Using Wald's equality, we then have

    E[J] = −d / E[X],
which is positive since E[X] is negative. You should note from the exercises we have done with
Wald's equality that it is often used to solve for E[J] after determining E[SJ].
For mathematical completeness, we should also verify that E[J] < ∞ in order to use Wald's equality.
If X is bounded, this can be done by using the Chernoff bound on Pr{Sn > −d}. This
is exponentially decreasing in n, thus verifying that E[J] < ∞, and consequently that
E[J] = −d/E[X]. If X is not bounded, a truncation argument can be used. Letting X̆b be X
truncated to b, we see that E[X̆b] is increasing with b toward E[X], and is less than E[X] for all b.
Thus the expected stopping time, say E[Jb], is upper bounded by −d/E[X] for all b. It follows
that lim_{b→∞} E[Jb] is finite (and equal to −d/E[X]). Most students are ill-advised to worry too
much about such details at first.
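The result E[J] = −d/E[X] can likewise be checked by a seeded simulation. This is a sketch, not part of the original solution; the step distribution and the capital d = 5 below are arbitrary choices with E[X] < 0.

```python
import random

def plays_until_ruin(d, values, weights, rng):
    """Number of plays until the capital is gone, i.e., until S_n = -d."""
    s = n = 0
    while s > -d:
        s += rng.choices(values, weights)[0]
        n += 1
    return n

rng = random.Random(2)
values, weights = [-1, 0, 1], [0.5, 0.3, 0.2]   # E[X] = -0.3 < 0
mean_x = sum(v * w for v, w in zip(values, weights))
d, runs = 5, 20_000
est = sum(plays_until_ruin(d, values, weights, rng) for _ in range(runs)) / runs
print(est, -d / mean_x)   # Wald predicts E[J] = -d/E[X] = 16.66...
```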
Exercise 5.18: Let {Xi; i ≥ 1} be IID binary random variables with pX(0) = pX(1) = 1/2. Let J be
a positive integer-valued random variable defined on the above sample space of binary sequences and let
SJ = Σ_{i=1}^J Xi. Find the simplest example you can in which J is not a stopping trial for {Xi; i ≥ 1} and
where E[X]E[J] ≠ E[SJ]. Hint: Try letting J take on only the values 1 and 2.
Solution: We start with a brief discussion of how to find examples, since this is a major
part of creative research, but is practically ignored in engineering, science, and mathematics
curricula. One should start by trying to combine whatever insight one has about the problem
with the simplest possible cases, where simplicity might involve very little choice (such as
here with J = 1 or 2), or great symmetry, or very extreme cases. One then tries some
simple examples which might not provide the desired result, and tries to understand what
has gone wrong. One also, of course, uses any available hints.
In this case, using the hint, we need only think about the four pairs of binary numbers
and how to map each into J. Since we are looking for an example that is not a stopping
rule, choosing J = 1 must sometimes depend on X2. We also want our example to satisfy
E[SJ] ≠ E[J]E[X]. We have no control over E[X], so we want to make E[SJ] either small
or large for given E[J]. If E[SJ] represents winnings and E[J] represents effort to get those
winnings, we want to get large winnings with little effort.
A good strategy is now almost self-evident. When X2 = 1, we choose J = 2, and when
X2 = 0, we choose J = 1. Thus, we get a win when X2 = 1 and don't waste the effort
when X2 = 0. This requires clairvoyance, or peeking, etc., but this just means we are not
using a stopping rule. With this rule, E[SJ] = E[X1] + E[X2] = 1. Also, E[J] = 3/2 and
E[J]E[X] = 3/4, so Wald's equality does not hold (which is not surprising since J is not a
stopping rule).
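Since the relevant sample space consists of just four equiprobable pairs, the example can be verified exactly by enumeration (a sketch, not part of the original solution):

```python
from itertools import product

# The "peeking" rule: J = 2 when x2 = 1, J = 1 when x2 = 0.
E_SJ = E_J = 0.0
for x1, x2 in product([0, 1], repeat=2):   # four equiprobable pairs
    j = 2 if x2 == 1 else 1
    sj = x1 + x2 if j == 2 else x1
    E_SJ += sj / 4
    E_J += j / 4
print(E_SJ, E_J, E_J * 0.5)   # 1.0, 1.5, 0.75, so E[SJ] != E[J]E[X]
```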
Exercise 5.19: Let J = min{n | Sn ≤ b or Sn ≥ a}, where a is a positive integer, b is a negative integer,
and Sn = X1 + X2 + · · · + Xn. Assume that {Xi; i ≥ 1} is a set of zero-mean IID rv's that can take on only
the set of values {−1, 0, +1}, each with positive probability.
a) Is J a stopping rule? Why or why not? Hint: The more difficult part of this is to argue that J is a
random variable (i.e., non-defective); you do not need to construct a proof of this, but try to argue why it
must be true.
Solution: For J to be a stopping trial, it must be a random variable and also, for each
n, I_{J=n} must be a function of X1, . . . , Xn. For the case here, Sn = X1 + · · · + Xn is
clearly a function of X1, . . . , Xn, so the event that Sn ≥ a or Sn ≤ b is a function of
X1, . . . , Xn. The first n at which this occurs is a function of S1, . . . , Sn, which is a function
of X1, . . . , Xn. Thus I_{J=n} must be a function of X1, . . . , Xn. For J to be a rv, we must
show that lim_{n→∞} Pr{J ≤ n} = 1. The central limit theorem states that (Sn − nE[X])/√n
approaches a normalized Gaussian rv in distribution as n → ∞. Since X is given to be
zero-mean, Sn/√n must approach normal. Now both a/√n and b/√n approach 0 as
n → ∞, so the probability that {Sn; n > 0} (i.e., the process in the absence of a stopping
rule) remains between these limits goes to 0 as n → ∞. Thus the probability that the
process has not stopped by time n goes to 0 as n → ∞.
An alternate approach here is to model {Sn; n ≥ 1} for the stopped process as a Markov
chain in which a and b are recurrent states and the other states are transient. Then we know
that one of the recurrent states is reached eventually with probability 1.
b) What are the possible values of SJ ?
Solution: Since {Sn; n > 0} can change only in integer steps, Sn cannot exceed a without
some Sm, m < n, first equaling a, and it cannot be less than b without some Sk, k < n, first
equaling b. Thus SJ is either a or b.
c) Find an expression for E[SJ] in terms of p, a, and b, where p = Pr{SJ = a}.
Solution: E[SJ] = aPr{SJ = a} + bPr{SJ = b} = pa + (1 − p)b.
d) Find an expression for E [SJ ] from Wald’s equality. Use this to solve for p.
Solution: We have seen that J is a stopping trial for the IID rv's {Xi; i > 0}. Assuming
for the moment that E[J] < ∞, Wald's equality holds and E[SJ] = E[X]E[J]. Since E[X] = 0,
we conclude that E[SJ] = 0. Combining this with (c), we have 0 = pa + (1 − p)b, so
p = −b/(a − b). This is easier to interpret as

    p = |b|/(a + |b|).    (A.88)
The assumption that E[J] < ∞ is more than idle mathematical nitpicking, since we saw
in Example 5.5.4 that E[J] = ∞ for b = −∞. The CLT does not resolve this issue, since
the probability that Sn remains inside the limits [b, a] approaches 0 with increasing n only as
n^{−1/2}. Viewing the process as a finite-state Markov chain with a and b viewed as a common
trapping state does resolve the issue, since, as seen in Theorem 4.5.4 applied to expected
first-passage times, the expected time to reach the trapping state is finite. As will be seen
when we study Markov chains with countably many states, this result is no longer valid,
as illustrated by Example 5.5.4. This issue will become more transparent when we study
random walks in Chapter 9.
The solution in (A.88) applies only for E[X] = 0, but Chapter 9 shows how to solve the problem
for an arbitrary distribution on X. We also note that the solution is independent of pX(0),
although pX(0) is obviously involved in finding E[J]. Finally we note that this helps explain
the peculiar behavior of the 'stop when you're ahead' example. The negative threshold b
represents the capital of the gambler in that example and shows that as b → −∞, the
probability of reaching the threshold a increases, but at the expense of a larger catastrophe
if the gambler's capital is wiped out.
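A numerical check of (A.88), not part of the original solution: treat the stopped process as a Markov chain on the states b, . . . , a and compute the probability of reaching a before b by value iteration. The zero-mean step probabilities below are arbitrary; (A.88) says the answer does not depend on pX(0).

```python
# h(s) = Pr{hit a before b | S_0 = s}; boundary values h(b) = 0, h(a) = 1
a, b, q = 3, -2, 0.3          # steps -1, 0, +1 with probs q, 1-2q, q
h = {s: 0.0 for s in range(b, a + 1)}
h[a] = 1.0
for _ in range(5000):          # Gauss-Seidel sweeps; converge geometrically
    for s in range(b + 1, a):
        h[s] = q * h[s - 1] + (1 - 2 * q) * h[s] + q * h[s + 1]
print(h[0], abs(b) / (a + abs(b)))   # both 0.4, as (A.88) predicts
```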
Exercise 5.20: Show that the interchange of expectation and sum in (5.34) is valid if E[J] < ∞. Hint:
First express the sum as Σ_{n=1}^{k−1} Xn I_{J≥n} + Σ_{n=k}^∞ (Xn^+ + Xn^−) I_{J≥n} and then consider the limit as k → ∞.
Solution: We first prove Theorem 5.5.3 in a different and more elegant way, which eliminates the need to justify the exchange of expectation and sum in (5.34).
Let J be a stopping trial for the sequence {Xn; n ≥ 1}. Consider a renewal process in
which the stopping trial is viewed as a renewal. That is, for a sample path {xn; n ≥ 1}
of {Xn; n ≥ 1}, let J1 be the n at which the stopping rule is satisfied. Consider using
the same stopping rule again, now starting at x_{n+1}, x_{n+2}, . . . . Let J2 be the value of m
for which x_{n+1}, x_{n+2}, . . . , x_{n+m} satisfies the stopping rule. In this same way, consider the
successive trials at which the stopping rule is satisfied as an integer arrival process. Since
I_{J=n} is a function of X1, . . . , Xn, and since the {Xn; n ≥ 1} are IID, this arrival process
is actually a renewal process. Now let R(t) be a renewal reward process where at each
positive integer time t, R(t) = X_t. For non-integer t, let R(t) = R(⌈t⌉). By the
SLLN, lim_{t→∞} (1/t) ∫_0^t R(τ) dτ = E[X] with probability 1.
Now J is the length of an interrenewal interval, so the expected interrenewal interval is
E[J], which is given to be finite. Finally, the expected reward in an interrenewal interval
is E[SJ] = E[Σ_{n=1}^J Xn]. The renewal reward theorem, Theorem 5.4.5 (generalized to
cover the reward function here as discussed at the end of Section 5.4.1), says that E[X] =
E[SJ]/E[J], so that E[SJ] = E[X]E[J], which is Theorem 5.5.3.
It is quite surprising to realize that Wald’s equality is really just a special case of the renewal
reward theorem. Wald’s equality is used to prove the elementary renewal theorem, so this
illustrates the advantage of understanding both time averages and ensemble averages, since
they are quite intimately related.
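The renewal-reward view can be illustrated by applying one stopping rule over and over and comparing time averages. The rule below (stop at the first Xn ≥ 5, with Xn uniform on {1, . . . , 6}) is an arbitrary choice for illustration, not from the text.

```python
import random

rng = random.Random(3)
total_trials = total_reward = segments = 0
while segments < 50_000:
    while True:                     # one renewal: run the rule to completion
        x = rng.randint(1, 6)
        total_trials += 1
        total_reward += x
        if x >= 5:                  # the stopping rule
            break
    segments += 1
E_J = total_trials / segments       # approaches 3 (geometric with p = 1/3)
E_SJ = total_reward / segments
print(E_J, E_SJ, E_SJ / E_J)        # the ratio approaches E[X] = 3.5
```

The ratio E[SJ]/E[J] is exactly the time-average reward per trial, which the SLLN drives to E[X]; this is Wald's equality seen as renewal reward.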
We now return to solving the problem as intended in the hint. We must pull an important
theorem, called the Lebesgue monotone convergence theorem, out of the hat. This theorem
is a simple but fundamental result in measure theory, so we accept it without proof. It
provides an almost generic justification for interchanging limits and expectations. Stated
in terms of rv's and expectations, it says the following:
Lebesgue monotone convergence theorem: Let Y1, Y2, . . . be a sequence of rv's
with the properties (a) that Y1(ω) ≤ Y2(ω) ≤ · · · for each sample point ω and (b) that
lim_{k→∞} Yk(ω) converges (is finite) for each ω (WP1). Let Y(ω) = lim_{k→∞} Yk(ω) for each
ω. Then Y is a rv and lim_{k→∞} E[Yk] = E[Y].
In the context of (5.34), we initially assume that X (i.e., each Xn) is nonnegative. Then let
Yk = Σ_{n=1}^k Xn I_{J≥n}. Since Xn I_{J≥n} is nonnegative, we see that Yk(ω) ≤ Y_{k+1}(ω) for each
k ≥ 1 and each ω. Finally, since J(ω) is finite WP1, we see that lim_{k→∞} Yk(ω) = Y_{J(ω)}(ω),
so the second condition of the theorem is satisfied. The theorem then says that

    E[lim_{k→∞} Yk] = E[Y] = lim_{k→∞} E[Yk],

which is (5.34).
Finally, if Xn has both a positive and a negative part, say Xn^+ = max[Xn, 0] and Xn^− =
min[Xn, 0], we apply the result above to {Xn^+; n ≥ 1} and {−Xn^−; n ≥ 1} separately.
Exercise 5.21: Consider a G/G/1 queue with inter-arrival sequence {Xi; i ≥ 1} and service time
sequence {Vi; i ≥ 0}. Assume that E[X] < ∞ and that E[V] < E[X].
a) Let Zn = Σ_{i=1}^n (Xi − V_{i−1}) and let In be the sum of the intervals between 0 and Sn = Σ_{i=1}^n Xi during
which the server is empty. Show that In ≥ Zn and explain the difference between them. Show that if arrival
n enters an empty system, then In = Zn.
Solution: Σ_{i=0}^{n−1} Vi is the aggregate time required by the server to serve customers 0 to
n − 1. The amount of time the server has been busy is equal to this sum less the unfinished
work still in the queue and server. The aggregate time the server has been idle over (0, Sn]
is thus Zn plus the unfinished work. If customer n enters an empty system, then there is
no unfinished work and In = Zn.
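The relation In = Zn + (unfinished work), and hence In = Zn at arrivals to an empty system, can be checked with a small workload simulation. This is a sketch, not part of the original solution; the exponential interarrivals and uniform service times below are arbitrary choices satisfying E[V] < E[X].

```python
import random

rng = random.Random(4)
N = 2000
X = [rng.expovariate(1.0) for _ in range(N)]    # X_1..X_N, E[X] = 1
V = [rng.uniform(0.0, 1.2) for _ in range(N)]   # V_0..V_{N-1}, E[V] = 0.6

u = 0.0        # unfinished work just before the current arrival
idle = 0.0     # I_n: aggregate idle time in (0, S_n]
sx = sv = 0.0
empties = 0
for n in range(1, N + 1):
    u += V[n - 1]                     # arrival n-1 brings its service time
    idle += max(X[n - 1] - u, 0.0)    # server idles once the work runs out
    u = max(u - X[n - 1], 0.0)        # unfinished work seen by arrival n
    sx += X[n - 1]
    sv += V[n - 1]
    Zn = sx - sv                      # Z_n = S_n - sum_{i=0}^{n-1} V_i
    assert idle >= Zn - 1e-6          # I_n >= Z_n in general
    assert abs(idle - (Zn + u)) < 1e-6  # I_n = Z_n + unfinished work
    if u == 0.0:                      # arrival n enters an empty system
        empties += 1                  # here I_n = Z_n exactly
print("arrivals entering an empty system:", empties)
```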
b) Assume that the inter-arrival distribution is bounded in the sense that for some b < ∞, Pr{X > b} = 0.
Show that the duration of each idle period for the G/G/1 queue is less than or equal to b.
Solution: Note that Sn − S_{n−1} ≤ b. The server was busy just after the arrival at S_{n−1}, and
thus the idle period before Sn was at most b.
c) Show that the number of idle intervals for the server up to time Sn is at least Zn/b. Show that the
number of renewals N^r(Sn) up to the time Sn at which an arrival sees an empty system is at least Zn/b.
Solution: The sequence of arrivals to an empty system constitutes the renewal process of
interest, so N^r(Sn) is the number Mn of idle intervals over (0, Sn]. Note that Sn is either
the arrival time of an arrival that enters the queue or of an arrival that ends an idle period,
so the number of idle intervals and the number of renewals in (0, Sn] are the same. Since the
duration of each idle interval is at most b, Mn b ≥ In ≥ Zn, so Mn ≥ Zn/b.
d) Show that the renewal counting process satisfies lim_{t→∞} N^r(t)/t = a WP1 for some a > 0. Show that the
mean number of arrivals per arrival to an empty system is E[J] = 1/a < ∞.
Solution: It was shown in Section 5.5.3 that {N^r(t); t > 0} is a renewal counting process,
so lim_{t→∞} N^r(t)/t exists WP1, and its value a is the reciprocal of the expected interval between renewals.
We must show that a > 0. We have

    a = lim_{t→∞} N^r(t)/t = lim_{n→∞} N^r(Sn)/Sn = lim_{n→∞} Mn/Sn
      = lim_{n→∞} (Mn/n) · lim_{n→∞} (n/Sn)
      ≥ lim_{n→∞} (Zn/bn) · lim_{n→∞} (n/Sn) = [(E[X] − E[V])/b] · [1/E[X]] > 0,

where we have used the SLLN and the strong law for renewal processes.
e) Assume everything above except that Pr{X > b} = 0 for some b. Use the truncation method on X to
establish (d) for this new case.
Solution: Let E[X] − E[V] = ε and choose b large enough that, when we choose X̃ =
min{X, b}, we have E[X̃] − E[V] ≥ ε/2. Now compare an arbitrary sample function of
{Xi; i ≥ 1}, {Vi; i ≥ 0} with the truncated version {X̃i; i ≥ 1}, {Vi; i ≥ 0}. For each n
for which the sample values satisfy x̃n < xn, we see that the unfinished work in the
truncated system at S̃n = s̃n increases relative to that at Sn = sn. When the original system has
a period where the server is idle, the unfinished work in the truncated system decreases
toward 0 while that in the original system remains equal to 0. Thus, if the truncated
system has an idle period in the interval before S̃n, the untruncated system has an idle
period there also. In other words, the number of idle intervals in the truncated system up
to S̃n must be less than or equal to the number of idle periods in the original system up to
Sn. Thus Ñ^r(S̃n) ≤ N^r(Sn) for all n. Since the rate of arrivals seeing an empty system in
the truncated version exceeds 0, that in the original system does also.
Exercise 5.22: A company has three promising approaches to the design of a new wireless application.
Unknown to the company, the first approach requires 2 weeks, the second, 4 weeks, and the third 8 weeks,
but only the first will be successful. The company hires one programmer after another for the design,
stopping when a design is successful. Let Xi be the time taken by the ith programmer, assuming that each
programmer independently chooses an approach with equal probability (i.e., p_{Xi}(x) = 1/3 for x = 2, 4, and
8). Let J be the number of the first programmer who is successful and let T = Σ_{i=1}^J Xi be the time of the
first success.
a) Use Wald’s equality to find E [T ].
Solution: Let {Xi; i ≥ 1} be the IID sequence of programmer times, visualizing this sequence
as continuing beyond the time of the first successful program. The stopping trial J is the
smallest i for which Xi = 2. Note that I_{J=n} is a function only of X1, . . . , Xn, so J is a (possibly
defective) stopping rule. We see that J = 1 with probability 1/3, J = 2 with probability
(2/3)(1/3), and in general J = n with probability (2/3)^{n−1}(1/3). Thus J is a geometric rv
(and thus non-defective) and E[J] = 3 (which is finite). This shows that the Wald equality
holds, so

    E[T] = E[X]E[J] = (1/3)(2 + 4 + 8) E[J] = 14.

b) Compute E[Σ_{i=1}^J Xi | J=n] and show that it is not equal to E[Σ_{i=1}^n Xi].
Solution: The conditioning event {J = n} is the same as the event {X1 ≠ 2, . . . , X_{n−1} ≠
2, Xn = 2}. Given this event, X1, . . . , X_{n−1} are IID and take on the values 4 and 8 with
equal probability. Thus

    E[Σ_{i=1}^n Xi | J = n] = (n − 1)(1/2)(4 + 8) + 2 = 6n − 4.    (A.89)

On the other hand, viewing the sequence {Xi; i ≥ 1} as continuing beyond the stopping
time,

    E[Σ_{i=1}^n Xi] = nE[X] = 14n/3.    (A.90)
The fact that (A.89) increases with n faster than (A.90) should not be surprising, since
when J is large, the Xi for i < J must all be large (4 or 8), thus enlarging the expected
value.
c) Use (b) for a second derivation of E [T ].
Solution: Note that (A.89) is E[T | J=n], so

    E[T] = Σ_{n=1}^∞ E[T | J=n] pJ(n) = Σ_{n=1}^∞ (6n − 4) pJ(n) = 6E[J] − 4.

Since E[J] = 3, this results in E[T] = 14 as before.
d) Find E [T ] if the company tells successive programmers which approaches have been tried unsuccessfully
earlier.
Solution: We assume that the first programmer chooses the approaches with equal probability, that the second chooses the two remaining choices with equal probability if the first
is not successful, and the third chooses the final choice if the first two are unsuccessful. This
can be calculated by brute force, but the easy way is to note that the two-week approach must
always be taken before stopping. The four-week approach will be tried before the two-week
approach with probability 1/2, and the eight-week approach will be tried before the two-week
approach with probability 1/2. Thus E[T] = 2 + 4/2 + 8/2 = 8. It is not surprising that
this requires much less time than continuing to use failed approaches.
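Both computations can be confirmed directly (a sketch, not part of the original solution; the truncation point of the geometric sum is arbitrary):

```python
from itertools import permutations

# (a)/(c): E[T] = sum_n (6n - 4) Pr{J = n}, with Pr{J = n} = (2/3)^(n-1) (1/3)
ET = sum((6 * n - 4) * (2 / 3) ** (n - 1) * (1 / 3) for n in range(1, 400))
print(ET)   # 14, up to a negligible geometric tail

# (d): average over the 6 orderings of {2, 4, 8} of the time spent up to
# and including the first try of the 2-week approach
times = []
for order in permutations([2, 4, 8]):
    t = 0
    for x in order:
        t += x
        if x == 2:
            break
    times.append(t)
print(sum(times) / len(times))   # 8.0
```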
Exercise 5.23: a) Consider a renewal process for which the inter-renewal intervals have the PMF
pX (1) = pX (2) = 1/2. Let m(t) = E [N (t)] and use elementary combinatorics to show that m(1) = 1/2,
m(2) = 5/4, and m(3) = 15/8.
Solution: Note that m(n) = Σ_{i=1}^n E[Wi], where Wi = 1 if an arrival occurs at time i and
Wi = 0 otherwise. Now W1 = 1 if and only if (iff) X1 = 1, so E[W1] = 1/2 and m(1) = 1/2.
Next, W2 = 1 iff either X1 = 2 or both X1 = 1 and X2 = 1. Thus E[W2] = 3/4 and
m(2) = 5/4. To calculate m(3), we get a little fancier and note that if W2 = 0, then W3
must be 1, and if W2 = 1, then W3 is 1 with probability 1/2. Thus

    Pr{W3 = 1} = Pr{W2 = 0} + (1/2)Pr{W2 = 1} = 5/8.

It follows that m(3) = E[W1] + E[W2] + E[W3] = 15/8.
b) Use elementary means to show that E[S_{N(1)}] = 1/2 and E[S_{N(1)+1}] = 9/4. Verify (5.37) in this
case (i.e., for t = 1) and show why N(1) is not a stopping trial. Note also that the expected duration,
E[S_{N(1)+1} − S_{N(1)}], is not equal to E[X].
Solution: Note that N(1) is the number of arrivals up to and including time 1, so N(1) = 1
with probability 1/2 and N(1) = 0 with probability 1/2. Now S_{N(1)} is the epoch of arrival
N(1). If N(1) = 1, then that arrival must have occurred at epoch 1. If N(1) = 0, no arrival
has occurred and by convention S0 = 0. Thus S_{N(1)} is 1 with probability 1/2 and is 0 with
probability 1/2; so E[S_{N(1)}] = 1/2.
The first arrival epoch after time 1, i.e., S_{N(1)+1}, is at time 2 if X1 = 2 and is at time 2 or
3 with equal probability if X1 = 1. Thus it is 2 with probability 3/4 and 3 with probability
1/4, so E[S_{N(1)+1}] = 9/4.
Eq. (5.37) says that E[S_{N(1)+1}] = E[X](E[N(1)] + 1). Since E[X] = 3/2 and E[N(1)] + 1 =
3/2, (5.37) is satisfied.
The event {N(1) = 0} has non-zero probability but is not determined before the observation
of X1. Thus N(1) cannot be a stopping trial for {Xn; n ≥ 1}. Also, Wald's equality (if it
applied) would equate E[S_{N(1)}] = 1/2 with E[X]E[N(1)] = 3/4, so it is not satisfied.
c) Generalize (a) so that Pr{X = 1} = 1 − p and Pr{X = 2} = p. Let Wn = N(n) − N(n−1), i.e., Wn
is 1 if an arrival occurs at time n and is 0 if no arrival occurs at n. Show that W̄n satisfies the difference
equation W̄n = 1 − pW̄_{n−1} for n ≥ 1, where by convention W̄0 = 1. Use this to show that

    W̄n = (1 − (−p)^{n+1}) / (1 + p).    (A.91)

From this, solve for m(n) for n ≥ 1 and verify the result in (a).
Solution: Note that if W_{n−1} = 0, then, since each interarrival interval is either 1 or 2, we
must have Wn = 1. Alternatively, if W_{n−1} = 1, then Wn is 1 with probability 1 − p. Thus

    Pr{Wn = 1} = Pr{W_{n−1} = 0} + (1 − p)Pr{W_{n−1} = 1}.

Since W̄n = Pr{Wn = 1}, this can be rewritten as

    W̄n = 1 − W̄_{n−1} + (1 − p)W̄_{n−1} = 1 − pW̄_{n−1}.

Expanding this expression recursively,

    W̄n = 1 − p(1 − pW̄_{n−2}) = · · · = 1 − p + p² − p³ + · · · + (−p)^n.

Now for any number y, note that

    (1 − y)(1 + y + y² + · · · + y^n) = 1 − y^{n+1}.    (A.92)

Taking −p for y, we get (A.91). We can find m(n) = Σ_{i=1}^n W̄i by using (A.92) to sum the
powers of −p in (A.91). The result is

    m(n) = n/(1 + p) − p²(1 − (−p)^n)/(1 + p)².

With some patience, this agrees with the solution in (a). This same approach, combined
with more tools for difference equations, can be used to solve for m(n) for other integer-valued
renewal processes. Finally, note that we can solve for E[S_{N(n)+1}] for all n by using
Wald's equality.
Exercise 5.24: Let {N (t); t > 0} be a renewal counting process generalized to allow for
inter-renewal intervals {Xi} of duration 0. Let each Xi have the PMF Pr{Xi = 0} = 1 − ε;
Pr{Xi = 1/ε} = ε.
a) Sketch a typical sample function of {N (t); t > 0}. Note that N (0) can be non-zero (i.e.,
N (0) is the number of zero interarrival times that occur before the first non-zero interarrival
time).
Solution: [Sketch: a staircase sample function N(t, ω) for the sample values X1 = 0,
X2 = 1/ε, X3 = 0, X4 = 1/ε, X5 = 1/ε; arrivals occur only at multiples of 1/ε, and the
count jumps by 1 at each arrival, including the zero-length interarrival at t = 0.]
b) Evaluate E [N (t)] as a function of t.
Solution: Arrivals occur only at nonnegative integer multiples of 1/ε. Epoch 0 is different
from the positive multiples of 1/ε in that the nominal initial arrival that starts the process
is not counted as part of N(t). Thus p_{N(0)}(j) = (1 − ε)^j ε for integer j ≥ 0, so E[N(0)] =
(1 − ε)/ε.
At each positive multiple ℓ/ε of 1/ε, the initial arrival is counted, so the probability of j
arrivals at time ℓ/ε is (1 − ε)^{j−1} ε for all j ≥ 1. Thus the expected number of arrivals at epoch
ℓ/ε for each ℓ ≥ 1 is 1/ε. We can now evaluate E[N(k/ε)] as the expected number of arrivals
at 0 plus the expected number at each ℓ/ε, 1 ≤ ℓ ≤ k. Thus E[N(k/ε)] = k/ε + (1 − ε)/ε.
Finally, since arrivals occur only at multiples of 1/ε, E[N(t)] for arbitrary t > 0 is

    E[N(t)] = (1/ε)⌊tε⌋ + (1 − ε)/ε.    (A.93)
c) Sketch E[N(t)]/t as a function of t.
Solution: [Sketch: E[N(t)]/t decays as 1/t between the arrival epochs and jumps at each
multiple of 1/ε, taking the values 2 − ε at t = 1/ε, (3 − ε)/2 at t = 2/ε, (4 − ε)/3 at
t = 3/ε, and so forth; the lower bound 1/E[X] − 1/t = 1 − 1/t of part (e) is drawn on the
same graph.]
d) Evaluate E[S_{N(t)+1}] as a function of t (do this directly, and then use Wald's equality as a check on your
work).
Solution: At each epoch k/ε with integer k ≥ 1 there are one or more arrivals, and there
are no arrivals at other times. Thus for any t ≥ 0, the epoch of the first arrival after t is
(1/ε)⌊εt⌋ + 1/ε. Thus

    S_{N(t)+1} = (1/ε)⌊εt⌋ + 1/ε.

Since N(t) is the number of arrivals up to and including t, N(t) + 1 is the first arrival after
t, so the formula above works for arbitrary t and particularly for t = k/ε. Note that for
each t > 0, S_{N(t)+1} is deterministic, so

    E[S_{N(t)+1}] = (1/ε)⌊εt⌋ + 1/ε.    (A.94)
The Wald equality says E[S_{N(t)+1}] = E[X] E[N(t) + 1]. Since E[X] = 1, this checks with (A.93)
and (A.94).
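A seeded simulation confirms (A.93); the choices ε = 0.5 and t = 5 below are arbitrary, not from the exercise.

```python
import random

def count_arrivals(rng, eps, t):
    """Arrivals with epochs <= t, counting zero-length interarrivals at 0."""
    s, n = 0.0, 0
    while True:
        x = 0.0 if rng.random() < 1 - eps else 1 / eps
        if s + x > t:
            return n
        s += x
        n += 1

rng = random.Random(5)
eps, t, runs = 0.5, 5.0, 100_000
est = sum(count_arrivals(rng, eps, t) for _ in range(runs)) / runs
exact = (1 / eps) * int(t * eps) + (1 - eps) / eps   # (A.93), equals 5.0 here
print(est, exact)
```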
e) Sketch the lower bound E[N(t)]/t ≥ 1/E[X] − 1/t on the same graph with (c).
Solution: See (c).
f) Sketch E[S_{N(t)+1}] − t as a function of t and find the time average of this quantity.
Solution: [Sketch: E[S_{N(t)+1}] − t is a sawtooth that decreases with slope −1 from 1/ε
down to 0 on each interval [k/ε, (k+1)/ε).] By inspection,

    lim_{τ→∞} (1/τ) ∫_0^τ (E[S_{N(t)+1}] − t) dt = 1/(2ε).
g) Evaluate E[S_{N(t)}] as a function of t; verify that E[S_{N(t)}] ≠ E[X] E[N(t)].
Solution: From the same argument as in (d), E[S_{N(t)}] = (1/ε)⌊εt⌋. From (A.93), this is not
equal to E[X] E[N(t)].
Exercise 5.25: Let {N(t); t > 0} be a renewal counting process and let m(t) = E[N(t)] be the expected
number of arrivals up to and including time t. Let {Xi; i ≥ 1} be the sequence of inter-renewal intervals and
assume that F_X(0) = 0.
a) For all x > 0 and t > x show that E[N(t) | X1 = x] = E[N(t − x)] + 1.
Solution: Given that the first arrival occurs at x, the number of arrivals in (x, t] forms a renewal process starting at x, and thus the expected number of arrivals in (x, t] is E[N(t − x)].
The expected total number of arrivals in (0, t] is then E[N(t − x)] + 1, as was to be shown.
b) Use (a) to show that m(t) = F_X(t) + ∫_0^t m(t − x) dF_X(x) for t > 0. This equation is the renewal equation,
derived differently in (5.55).
Solution: Assume that X has a density f_X(x), for initial simplicity. Then

    m(t) = ∫_0^t E[N(t) | X1 = x] f_X(x) dx
         = ∫_0^t (m(t − x) + 1) f_X(x) dx
         = F_X(t) + ∫_0^t m(t − x) f_X(x) dx.    (A.95)

This same argument works for arbitrary X by using Stieltjes integration in place of the
integrals above.
c) Suppose that X is an exponential random variable with parameter λ. Evaluate L_m(s) from (5.57); verify
that the inverse Laplace transform is λt; t ≥ 0.
Solution: Evaluating the Laplace transform of both sides of (A.95), and recognizing that
F_X(x) is the integral of f_X(x) and that ∫_0^t m(t − x)f_X(x) dx is the convolution of m(t) and
f_X(x), we get

    L_m(s) = L_X(s)/s + L_m(s)L_X(s).

Solving this equation for L_m(s),

    L_m(s) = L_X(s) / (s[1 − L_X(s)]).    (A.96)

The Laplace transform of an exponential with density λe^{−λx} for x ≥ 0 is L_X(s) = λ/(λ + s).
Substituting this into (A.96) yields L_m(s) = λ/s². The inverse Laplace transform of this is
λt for t ≥ 0.
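The renewal equation (A.95) can also be solved numerically on a grid; with exponential inter-renewals of rate λ, the numerical solution should track m(t) = λt. The discretization below (midpoint sampling of the density) is a rough sketch, not part of the original solution, so only modest accuracy is expected.

```python
import math

lam, dt, n = 1.0, 0.005, 1001        # grid out to t = 5
# density f sampled at bin midpoints; F evaluated exactly on the grid
f = [0.0] + [lam * math.exp(-lam * (j - 0.5) * dt) for j in range(1, n)]
F = [1.0 - math.exp(-lam * k * dt) for k in range(n)]
m = [0.0] * n
for k in range(1, n):
    # m(t) = F(t) + integral_0^t m(t - x) f(x) dx, discretized
    m[k] = F[k] + dt * sum(m[k - j] * f[j] for j in range(1, k + 1))
print(m[-1], lam * (n - 1) * dt)     # both close to 5
```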
Exercise 5.26: a) Let the inter-renewal interval of a renewal process have a second order Erlang density,
f_X(x) = λ²x exp(−λx). Evaluate the Laplace transform of m(t) = E[N(t)].
Solution: The Laplace transform of f_X(x) is

    L_X(s) = ∫_0^∞ λ²x exp(−(λ + s)x) dx = λ²/(λ + s)².

We then evaluate L_m(s) from (5.57) to get

    L_m(s) = L_X(s)/(s[1 − L_X(s)]) = λ²/(s³ + 2λs²) = λ²/(s²(s + 2λ)).

b) Use this to evaluate m(t) for t ≥ 0. Verify that your answer agrees with (5.59).
Solution: Note that L_m(s) has a second order pole at s = 0, and the residue (the value of
s²L_m(s) at s = 0) is λ/2. The remainder leads to another single order pole at s = 0 and a
pole at s = −2λ, leading to the Heaviside expansion

    L_m(s) = λ/(2s²) − 1/(4s) + 1/(4(s + 2λ)).

The inverse Laplace transform is then

    m(t) = λt/2 − 1/4 + (1/4)e^{−2λt}    for t ≥ 0.

For the second order Erlang, E[X] = 2/λ and σ² = 2/λ². Substituting this in (5.59) yields the
non-vanishing terms λt/2 − 1/4, in agreement with the result here.
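A seeded simulation of the Erlang-2 renewal process (interarrivals generated as sums of two exponentials) confirms this m(t); λ = 1 and t = 10 below are arbitrary choices, not from the exercise.

```python
import math
import random

def count_renewals(rng, lam, t):
    """Renewals in (0, t] with Erlang-2 (sum of two exponentials) gaps."""
    s, n = 0.0, 0
    while True:
        s += rng.expovariate(lam) + rng.expovariate(lam)
        if s > t:
            return n
        n += 1

rng = random.Random(6)
lam, t, runs = 1.0, 10.0, 50_000
est = sum(count_renewals(rng, lam, t) for _ in range(runs)) / runs
exact = lam * t / 2 - 0.25 + 0.25 * math.exp(-2 * lam * t)
print(est, exact)   # exact = 4.75 up to a negligible e^{-20}/4 term
```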
c) Evaluate the slope of m(t) at t = 0 and explain why that slope is not surprising.
Solution: The slope of m(t) at 0, i.e., lim_{t→0+} dm(t)/dt, is 0, which means that the expected
number of renewals in (0, δ] is o(δ). This is not surprising since F_X(δ) is also o(δ).
d) View the renewals here as being the even numbered arrivals in a Poisson process of rate λ. Sketch m(t)
for the process here and show one half the expected number of arrivals for the Poisson process on the same
sketch. Explain the difference between the two.
Solution: The first arrival of a second order Erlang distribution has the distribution of the
second arrival in a Poisson process, and successive arrivals from the Erlang distribution then
have the distribution of the even numbered Poisson arrivals. To sketch m(t) for the process
here, note that m(0) = 0 with an initial slope of 0. Asymptotically, m(t) = λt/2 − 1/4, and
the slope increases from 0 toward λ/2. [Sketch: m(t) starts flat at 0 and approaches the line
λt/2 from below, lagging it by 1/4 asymptotically.]
The number of even arrivals in (0, t] is always either one less than the number of odd arrivals
or the same, asymptotically each half the time. Since the even plus the odd arrivals sum
to the arrivals of a Poisson process of rate λ, the expected number of even arrivals lags
slightly behind half the overall expected number, as shown in the sketch.
Exercise 5.27: a) Let N(t) be the number of arrivals in the interval (0, t] for a Poisson process of rate λ.
Show that the probability that N(t) is even is [1 + exp(−2λt)]/2. Hint: Look at the power series expansion
of exp(λt) and that of exp(−λt), and look at the sum of the two. Compare this with Σ_{n even} Pr{N(t) = n}.
Solution: The power series expansions are

    e^{−λt} = 1 − λt + (λt)²/2 − (λt)³/3! + · · · + (−λt)^n/n! + · · ·
    e^{λt} = 1 + λt + (λt)²/2 + (λt)³/3! + · · · + (λt)^n/n! + · · · .

Summing these two equations and dividing by 2,

    (1/2)(e^{λt} + e^{−λt}) = Σ_{n even} (λt)^n/n!.

Multiplying both sides by e^{−λt},

    (1/2)[1 + e^{−2λt}] = Σ_{n even} (λt)^n e^{−λt}/n! = Σ_{n even} Pr{N(t) = n}.    (A.97)

b) Let Ñ(t) be the number of even numbered arrivals in (0, t]. Show that Ñ(t) = N(t)/2 − I_odd(t)/2, where
I_odd(t) is a random variable that is 1 if N(t) is odd and 0 otherwise.
Solution: When I_odd(t) = 0, N(t) is even and the number of odd-numbered arrivals is
equal to the number of even-numbered arrivals; thus 2Ñ(t) = N(t). When I_odd(t) = 1, the
number of odd-numbered arrivals is one more than the number of even-numbered arrivals;
thus 2Ñ(t) + 1 = N(t). In both cases, then, 2Ñ(t) + I_odd(t) = N(t). Dividing by 2 gives
the desired result.
c) Use (a) and (b) to find E[Ñ(t)]. Note that this is m(t) for a renewal process with second-order Erlang
inter-renewal intervals.
Solution: From (b), E[Ñ(t)] = (1/2)E[N(t)] − (1/2)E[I_odd(t)]. But E[I_odd(t)] is 1 minus the
probability that N(t) is even, so from (A.97), E[I_odd(t)] = (1/2)[1 − e^{−2λt}]. Thus

    E[Ñ(t)] = λt/2 − 1/4 + (1/4)e^{−2λt}.

Note that this is the expected number of arrivals for the second-order Erlang arrival process
derived by Laplace transform methods in Exercise 5.26.
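The result can be checked numerically from the Poisson PMF, using the part (b) identity in the form Ñ(t) = ⌊N(t)/2⌋; the rate and time below are arbitrary choices, not from the exercise.

```python
import math

lam, t = 1.3, 2.0
mu = lam * t
# Poisson PMF, truncated where the tail is utterly negligible
pmf = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(100)]

E_even = sum((k // 2) * p for k, p in enumerate(pmf))   # E[floor(N/2)]
formula = mu / 2 - 0.25 + 0.25 * math.exp(-2 * mu)
print(E_even, formula)   # the two agree
```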
Exercise 5.28: a) Consider a function r(z) ≥ 0 defined as follows for 0 ≤ z < ∞: For each integer n ≥ 1
and each integer k, 1 ≤ k < n, r(z) = 1 for n + k/n ≤ z ≤ n + k/n + 2^{−n}. For all other z, r(z) = 0. Sketch
this function and show that r(z) is not directly Riemann integrable.
Solution: [Sketch: in each interval (n, n+1), r(z) consists of n − 1 pulses of height 1 and
width 2^{−n}, located at the points n + k/n for 1 ≤ k < n.]
For any δ > 0, and all n > 2/δ, the increments of size δ within (n, n+1) all contain points where
r equals 0 and points where r equals 1. Thus the upper Riemann sum is infinite and the lower sum is zero. This
shows that the function is not directly Riemann integrable. The problem here is not the
discontinuities, but rather the increasingly fast variations as z → ∞.
b) Evaluate the Riemann integral ∫_0^∞ r(z) dz.
Solution: By definition, the Riemann integral ∫_0^∞ r(z) dz = lim_{t→∞} ∫_0^t r(z) dz. For integer
t,

    ∫_0^{t+1} r(z) dz = Σ_{k=1}^t (k − 1)2^{−k}.

This is clearly finite for all integer t > 0, and the integral is non-decreasing in t for t both integer
and non-integer. The limit as t → ∞ is 1. Note that for direct Riemann integration, we
take the limit t → ∞ first and then the limit over δ. For ordinary Riemann integration,
the limit over δ is taken first and the limit over t second.
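Numerically, the pulses in [n, n+1) contribute area (n − 1)2^{−n}, and the partial sums of Σ (k − 1)2^{−k} approach the value 1 quickly:

```python
# partial sums of sum_{k=1}^{t} (k-1) 2^{-k}, which converge to 1
partial = [sum((k - 1) * 2.0**-k for k in range(1, t + 1)) for t in (5, 10, 30, 60)]
print(partial)   # increasing toward 1
```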
c) Suppose r(z) is decreasing, i.e., that r(z) ≥ r(y) for all y > z > 0. Show that if r(z) is Riemann
integrable, it is also directly Riemann integrable.
Solution: Assume r(z) is non-increasing and Riemann integrable over [0, ∞). We are
assuming r(z) ≥ 0 throughout the problem (and if not, r cannot be Riemann integrable
in this part). Thus ∫_0^∞ r(z) dz = lim_{t→∞} ∫_0^t r(z) dz exists. If we define ∫_t^∞ r(z) dz to be
∫_0^∞ r(z) dz − ∫_0^t r(z) dz, we see that lim_{t→∞} ∫_t^∞ r(z) dz = 0. Note that ∫_t^∞ r(z) dz is also
given by lim_{t1→∞} ∫_t^{t1} r(z) dz.
Consider any partition P^t = z0, z1, . . . of [t, ∞) where z0 = t and 0 < z_{i+1} − z_i ≤ δ for
i ≥ 0. Then r(z_i) ≥ r(z) ≥ r(z_{i+1}) for z_i ≤ z ≤ z_{i+1}. It follows that the upper Riemann
sum, say U(P^t), and the lower Riemann sum, L(P^t), are given by

    U(P^t) = Σ_{i=0}^∞ r(z_i)(z_{i+1} − z_i);    L(P^t) = Σ_{i=0}^∞ r(z_{i+1})(z_{i+1} − z_i).    (A.98)

The difference U(P^t) − L(P^t) is then

    U(P^t) − L(P^t) = Σ_{i=0}^∞ [r(z_i) − r(z_{i+1})](z_{i+1} − z_i)
                    ≤ δ Σ_{i=0}^∞ [r(z_i) − r(z_{i+1})] = δ r(t).

Since r is Riemann integrable and non-increasing, lim_{t→∞} r(t) = 0. Thus lim_{t→∞} [U(P^t) −
L(P^t)] = 0. Taking the limit t → ∞, we see that the direct Riemann integral from t to ∞
exists. This implies that the direct Riemann integral over [0, ∞) exists also.
d) Suppose f(z) ≥ 0, defined for z ≥ 0, is decreasing and Riemann integrable and that f(z) ≥ r(z) ≥ 0 for all
z ≥ 0. Show that r(z) is directly Riemann integrable.
Solution: For any t and any partition P_t^δ of [t, ∞), we can apply (A.98) to f, showing
that U_f(P_t^δ) − L_f(P_t^δ) ≤ δ f(t). The Riemann and direct Riemann integrals of f over [t, ∞)
exist, are the same, and lie between the upper and lower Riemann sums, so

    U_f(P_t^δ) ≤ δ f(t) + ∫_t^∞ f(z) dz.

Since r(z) ≤ f(z), it is easy to see that for any partition P_t^δ, U_r(P_t^δ) ≤ U_f(P_t^δ). Similarly,
L_r(P_t^δ) ≥ 0, so

    U_r(P_t^δ) − L_r(P_t^δ) ≤ δ f(t) + ∫_t^∞ f(z) dz.

Again, taking the limit t → ∞, we see that lim_{t→∞} [U_r(P_t^δ) − L_r(P_t^δ)] = 0, so the direct
Riemann integral exists.
⇥ ⇤
e) Let X be a nonnegative rv for which E X 2 < 1. Show that x FcX (x) is directly Riemann integrable. Hint:
R1
Consider yFcX (y) + y FcX (x) dx and use Figure 1.7 (or integration by parts) to show that this expression is
decreasing in y.
Solution: The fact that y F^c_X(y) + ∫_y^∞ F^c_X(x) dx is non-increasing in y is obvious geometrically from Figure 1.7. It can be shown using integration by parts that

    E[X²] = ∫_0^∞ ( y F^c_X(y) + ∫_y^∞ F^c_X(x) dx ) dy.

Thus the above Riemann integral exists. The expression in the hint is non-increasing and is an
upper bound to y F^c_X(y) for each y, and thus by (d), y F^c_X(y) is directly Riemann
integrable. It is curious that the integral of y F^c_X(y) from 0 to ∞ is E[X²]/2.
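A numerical illustration (an addition; the exponential distribution is an assumed example): for X exponential with rate λ, F^c_X(x) = e^{−λx}, so the hint's expression is (y + 1/λ)e^{−λy} in closed form; its integral over y should equal E[X²] = 2/λ², and it should be non-increasing.

```python
import math

lam = 1.5                          # assumed rate for the exponential example
def g(y):                          # y*Fc(y) + integral_y^inf Fc(x) dx, in closed form
    return (y + 1.0 / lam) * math.exp(-lam * y)

# trapezoidal integration of g over [0, 40/lam]; the tail beyond is negligible
n, top = 200000, 40.0 / lam
h = top / n
total = 0.5 * (g(0.0) + g(top)) * h + sum(g(i * h) for i in range(1, n)) * h
print(total, 2.0 / lam ** 2)       # the two values should agree closely

# g is non-increasing, as the hint claims
assert all(g((i + 1) * 0.01) <= g(i * 0.01) + 1e-12 for i in range(1000))
```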
Exercise 5.29: Let Z(t), Y(t), X̃(t) denote the age, residual life, and duration at time t for a renewal
counting process {N(t); t > 0} in which the interarrival time has a density given by f(x). Find the following
probability densities; assume steady state.
a) fY (t) (y | Z(t+s/2)=s) for given s > 0.
Solution: The condition {Z(t+s/2) = s} is the event that an arrival, say S_n, occurred at
(t+s/2) − s = t − s/2 and that no further arrival occurred in the interval (t−s/2, t+s/2] (this
assumes that t ≥ s/2, which is implicit in the notion of steady state). Thus the conditional
residual life Y(t) at t must be greater than s/2 and is given by Y(t) = X_{n+1} − s/2 (see
the diagram below).

[Diagram: arrival S_n at t − s/2; the age Z(t+s/2) = s extends back from t + s/2 to S_n,
so X_{n+1} > s; the residual life Y(t) runs from t to the next arrival.]

Thus, adding the condition that X_{n+1} > s,

    f_{Y(t)}(y | Z(t+s/2) = s) = f_X(y + s/2) / F^c_X(s);    for y > s/2.

This is valid without the steady-state assumption so long as t ≥ s/2.
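As an illustrative check (not in the original solution), the conditional density above must integrate to 1 over y > s/2. Here f_X is taken to be an assumed Erlang-2 density λ²x e^{−λx}, for which F^c_X(x) = (1 + λx)e^{−λx}:

```python
import math

lam, s = 2.0, 1.0                                      # assumed parameters
fX  = lambda x: lam ** 2 * x * math.exp(-lam * x)      # Erlang-2 density
FcX = lambda x: (1.0 + lam * x) * math.exp(-lam * x)   # its complementary CDF

# trapezoidal integration of f_X(y + s/2) / FcX(s) over y > s/2
dens = lambda y: fX(y + s / 2) / FcX(s)
n, lo = 100000, s / 2
top = lo + 30.0 / lam
h = (top - lo) / n
total = 0.5 * (dens(lo) + dens(top)) * h + sum(dens(lo + i * h) for i in range(1, n)) * h
print(total)   # should be close to 1
```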
b) fY (t),Z(t) (y, z).
Solution: First review the joint distribution of age and duration given in (5.77) for the non-arithmetic case. In steady state, the arrival rate is 1/X̄, so for given z > 0, the probability
of an arrival in the incremental interval [t − z, t − z + dz) is dz/X̄ + o(dz). Given such
an arrival at t − z, the probability that the next arrival is in [t − z + x, t − z + x + dx) is
f_X(x) dx. If x > z, this corresponds to an age z and duration x. Thus, putting (5.77) in
PDF form, the joint density is 0 except for X̃(t) > Z(t) ≥ 0, where it is

    f_{Z(t)X̃(t)}(z, x) = f_X(x) / X̄;    for x > z ≥ 0.    (A.99)

Since Y(t) = X̃(t) − Z(t), the joint density of Y(t), Z(t) is 0 except for Y(t) > 0, Z(t) ≥ 0,
where it is

    f_{Z(t)Y(t)}(z, y) = f_X(y + z) / X̄;    for y > 0, z ≥ 0.
An alternate approach is to observe that for z ≥ 0, y > 0, t > z, the joint density
f_{Z(t)Y(t)}(z, y) is the joint density of ‘an arrival’ at t − z and the next arrival at t + y; more
specifically, this is the sum over n ≥ 1 of the density of the nth arrival at t − z times the
density of the (n + 1)th arrival at t + y, i.e.,

    f_{Z(t)Y(t)}(z, y) = Σ_{n≥1} f_{S_n}(t − z) f_X(y + z).

In the limit t → ∞, Blackwell essentially says that Σ_{n≥1} f_{S_n}(t − z) = 1/X̄. This same
approach can be used throughout the exercise and clarifies the meaning of ‘arrival density.’
c) f_{Y(t)|X̃(t)}(y | x).

Solution: Replacing Z(t) in (A.99) by Y(t) = X̃(t) − Z(t) (where the dependence on Z(t)
appears only in the condition X̃(t) > Z(t) ≥ 0), the joint density of Y(t), X̃(t) is 0 except
for X̃(t) ≥ Y(t) > 0, where it is

    f_{Y(t)X̃(t)}(y, x) = f_X(x) / X̄;    for 0 < y ≤ x.

Conditional on X̃(t) = x, we see that the joint density is constant for 0 < y ≤ x, so

    f_{Y(t)|X̃(t)}(y | x) = 1/x;    for 0 < y ≤ x.
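A small simulation can illustrate this (an added check; the Uniform(0,1) interarrival distribution is an arbitrary assumption): sampling the process at a large fixed t, the ratio Y(t)/X̃(t) should average 1/2 (uniform given the duration), while the sampled duration X̃(t) should average the inspection-paradox value E[X²]/X̄ = (1/3)/(1/2) = 2/3.

```python
import random
random.seed(1)

t_obs, trials = 50.0, 20000
ratios, durations = [], []
for _ in range(trials):
    s = 0.0
    while True:
        x = random.random()            # Uniform(0,1) interarrival (assumed)
        if s + x > t_obs:              # interval containing t_obs found
            ratios.append((s + x - t_obs) / x)   # Y(t)/X~(t)
            durations.append(x)                  # duration X~(t)
            break
        s += x

print(sum(ratios) / trials)      # near 0.5: Y/X~ is uniform on (0,1)
print(sum(durations) / trials)   # near E[X^2]/Xbar = 2/3
```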
d) f_{Z(t)|Y(t−s/2)}(z | s) for given s > 0.

Solution: This is essentially the same as (a), with the roles of Z(t) and Y(t) interchanged (but the steady-state assumption is needed here). Using the same argument,

    f_{Z(t)|Y(t−s/2)}(z | s) = f_X(z + s/2) / F^c_X(s);    for z > s/2.

e) f_{Y(t)}(y | Z(t+s/2) ≥ s) for given s > 0.
Solution: By a slight modification of (a) and (b), we see that the joint density of Y(t) and
Z(t+s/2) is

    f_{Y(t),Z(t+s/2)}(y, z) = (1/X̄) f_X(y + z − s/2);    for z ≥ s/2, y > s/2.

By integrating this over z ≥ s, we get the joint probability that Z(t+s/2) ≥ s and
probability density that Y(t) = y:

    ∫_s^∞ f_{Y(t)Z(t+s/2)}(y, z) dz = (1/X̄) ∫_s^∞ f_X(y + z − s/2) dz = F^c_X(y + s/2) / X̄.

The conditional density of Y(t), conditional on Z(t+s/2) ≥ s, is this result divided by
Pr{Z(t+s/2) ≥ s}. From the steady-state assumption, Pr{Z(t+s/2) ≥ s} = Pr{Z(t) ≥ s}.
Thus, using (5.88),

    Pr{Z(t+s/2) ≥ s} = Pr{Z(t) ≥ s} = (1/X̄) ∫_s^∞ F^c_X(τ) dτ.

Thus

    f_{Y(t)}(y | Z(t+s/2) ≥ s) = F^c_X(y + s/2) / ∫_s^∞ F^c_X(τ) dτ;    for y > s/2.
Exercise 5.30: a) Find lim_{t→∞} {E[N(t)] − t/X̄} for a renewal counting process {N(t); t > 0} with
inter-renewal times {X_i; i ≥ 1}. Hint: use Wald's equation.
Solution: Use Wald's equation, stopping on the first arrival after t. Thus E[S_{N(t)+1}] =
X̄ E[N(t) + 1] and

    E[N(t)] − t/X̄ = (1/X̄)(E[S_{N(t)+1}] − t) − 1 = (1/X̄) E[Y(t)] − 1,

where Y(t) = S_{N(t)+1} − t is the residual life at t. As seen in detail in Section 5.7.4
for age and in less detail for residual life in Section 5.7.5, this has a limit if X is non-arithmetic (and typically does not have a limit otherwise). Assuming the non-arithmetic
case, lim_{t→∞} E[Y(t)] = E[X²]/(2X̄). Thus

    lim_{t→∞} {E[N(t)] − t/X̄} = E[X²]/(2X̄²) − 1.    (A.100)
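A Monte Carlo check of (A.100) (an added illustration; Uniform(0,1) interarrivals are an assumed example, giving X̄ = 1/2 and E[X²] = 1/3, so the predicted limit is (1/3)/(2·(1/2)²) − 1 = −1/3):

```python
import random
random.seed(2)

t_obs, trials = 50.0, 40000
total_n = 0
for _ in range(trials):
    s, n = 0.0, 0
    while True:
        s += random.random()   # Uniform(0,1) interarrival (assumed)
        if s > t_obs:
            break
        n += 1
    total_n += n

# estimate of E[N(t)] - t/Xbar at t = 50, with Xbar = 1/2
print(total_n / trials - t_obs / 0.5)   # near -1/3
```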
b) Evaluate your result for the case in which X is an exponential random variable (you already know what
the result should be in this case).

Solution: For an exponential rv X with f_X(x) = λe^{−λx} for x ≥ 0, we have X̄ = 1/λ and
E[X²] = 2/λ². Thus the expression in (A.100) is zero, and we already know from the
Poisson process that E[N(t)] = λt for all t > 0. Thus this case serves as a sanity check on
the general result.
c) Evaluate your result for a case in which E[X] < ∞ and E[X²] = ∞. Explain (very briefly) why this
does not contradict the elementary renewal theorem.

Solution: If E[X] < ∞ and E[X²] = ∞, then the limit in (A.100) is infinite, but the
result does not show how E[N(t)] − t/X̄ approaches ∞ as t → ∞. The elementary renewal
theorem says that lim_{t→∞} (1/t){E[N(t)] − t/X̄} = 0, which shows that E[N(t)] − t/X̄
must approach ∞ as t → ∞ more slowly than linearly in t.
Exercise 5.31: Customers arrive at a bus stop according to a Poisson process of rate λ. Independently,
buses arrive according to a renewal process with the inter-renewal interval CDF F_X(x). At the epoch of a
bus arrival, all waiting passengers enter the bus and the bus leaves immediately. Let R(t) be the number of
customers waiting at time t.

a) Draw a sketch of a sample function of R(t).
Solution: Let S_n = X_1 + · · · + X_n be the epoch of the nth bus arrival and C_m the epoch of
the mth customer arrival. Then {C_m; m ≥ 1} is the sequence of arrival times in a Poisson
process of rate λ and {S_n; n ≥ 1} is the renewal time sequence of a renewal process.

[Sketch: R(t) is a staircase that rises by 1 at each customer arrival C_1, C_2, . . . and drops
to 0 at each bus arrival S_1, S_2, . . . .]
b) Given that the first bus arrives at time X_1 = x, find the expected number of customers picked up; then
find E[∫_0^x R(t) dt], again given the first bus arrival at X_1 = x.

Solution: The expected number of customers picked up, given S_1 = x, is the expected
number of arrivals in the Poisson process by x, i.e., λx. For t < x, R(t) is the number of
customers that have arrived by time t. R(t), for t < S_1, is independent of S_1, so

    E[ ∫_0^x R(t) dt ] = ∫_0^x λt dt = λx²/2.

c) Find lim_{t→∞} (1/t) ∫_0^t R(τ) dτ (with probability 1). Assuming that F_X is a non-arithmetic distribution, find
lim_{t→∞} E[R(t)]. Interpret what these quantities mean.
Solution: In (b), we found the expected reward in an inter-renewal interval of size x, i.e.,
we took the expected value over the Poisson process given a specific value of X_1. Taking
the expected value of this over the bus renewal interval, we get (λ/2) E[X²]. This is
the expected accumulated reward over the first bus inter-renewal period, denoted E[R_1] in
(5.22). It is the expected number of customers at each time, integrated over the time until
the first bus arrival. This integrated value is not the expected number waiting for a bus,
but rather will be used as a step in finding the time-average number waiting over time.
Since the bus arrivals form a renewal process, this is equal to E[R_n] for each inter-renewal
period n. By (5.24),

    lim_{t→∞} (1/t) ∫_0^t R(τ) dτ = E[R_n]/X̄ = λE[X²]/(2X̄)    WP1.    (A.101)

Since the renewals are non-arithmetic, this is the same as lim_{t→∞} E[R(t)].

Note that this is not the same as the expected number of customers per bus. It is the
expected number of waiting customers, averaged over time. If a bus is highly delayed, a
large number of customers accumulate, but also, because of the large interval waiting for
the bus, the contribution to the time average grows as the square of the wait.
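The limit (A.101) can be checked by simulation (an added illustration; Uniform(0,2) bus intervals and λ = 2 are assumptions, giving X̄ = 1, E[X²] = 4/3 and a predicted time average of λE[X²]/(2X̄) = 4/3):

```python
import random
random.seed(3)

lam = 2.0                                  # Poisson customer rate (assumed)
nbus, area, total_t = 200000, 0.0, 0.0
for _ in range(nbus):
    x = random.uniform(0.0, 2.0)           # bus inter-renewal (assumed Uniform(0,2))
    total_t += x
    c = random.expovariate(lam)            # customer arrivals within the interval
    while c < x:
        area += x - c                      # this customer contributes x - c to the integral
        c += random.expovariate(lam)

print(area / total_t)   # near lam*E[X^2]/(2*Xbar) = 4/3
```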
d) Use (c) to find the time-average expected wait per customer.

Solution: Little's theorem can be applied here since (A.101) gives the limiting time-average number of
customers in the system, L̄. The time-average wait per customer is W̄ = L̄/λ, which from (A.101) is
E[X²]/(2X̄). The system here does not satisfy the conditions of Little's theorem, but it is easy to check that
the proof of the theorem applies in this case.
e) Find the fraction of time that there are no customers at the bus stop. (Hint: this part is independent of
(a), (b), and (c); check your answer for E[X] << 1/λ.)

Solution: There are no customers at the bus stop at the beginning of each renewal period.
Let U_n be the interval from the beginning of the nth renewal period until the first customer
arrival. It is possible that no customer will arrive before the next bus arrives, so the interval
within the nth inter-renewal period when there is no customer waiting is min(U_n, X_n).
Consider a reward function R(t) equal to 1 when no customer is in the system and 0
otherwise. The accumulated reward R_n within the nth inter-renewal period is then R_n =
min(U_n, X_n). Thus, using the independence of U_n and X_n,

    E[R_n] = ∫_0^∞ Pr{min(U_n, X_n) > t} dt = ∫_0^∞ Pr{U_n > t} Pr{X_n > t} dt
           = ∫_0^∞ Pr{X_n > t} e^{−λt} dt.
Using (5.24), the limiting time-average fraction of time when no customers are waiting is
(1/X̄) ∫_0^∞ Pr{X_n > t} e^{−λt} dt. Checking when X̄ << 1/λ, we see that the above integral is
close to X̄, so the fraction is close to 1. In particular, if X_n is exponential with rate µ, we
see that the fraction above is µ/(µ + λ).
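A quick simulation of the exponential-bus case (an added check; λ = 1 and µ = 2 are assumed values) confirms the fraction µ/(µ + λ):

```python
import random
random.seed(4)

lam, mu = 1.0, 2.0           # customer rate and (assumed) exponential bus rate
n = 200000
idle = tot = 0.0
for _ in range(n):
    x = random.expovariate(mu)       # bus inter-renewal X_n
    u = random.expovariate(lam)      # time to first customer U_n
    idle += min(u, x)                # no-customer time in this renewal period
    tot += x

print(idle / tot)    # near mu/(mu + lam) = 2/3
```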
Exercise 5.32: Consider the same setup as in Exercise 5.31 except that now customers arrive according
to a non-arithmetic renewal process independent of the bus arrival process. Let 1/λ be the expected inter-renewal interval for the customer renewal process. Assume that both renewal processes are in steady state
(i.e., either we look only at large t, or we assume that they are equilibrium processes). Given that the nth
customer arrives at time t, find the expected wait for customer n. Find the expected wait for customer n
without conditioning on the arrival time.
Solution: The expected time customer n waits for a bus, given that C_n = t, where C_n is
n's arrival time, is the residual life of the bus process at time t. Since the bus process is in
equilibrium, this is the same as the time-average residual life of the bus process, which is
E[X²]/(2X̄). This does not depend on t, and is thus the unconditional expected wait.
Exercise 5.33: Let {N_1(t); t > 0} be a Poisson counting process of rate λ. Assume that the arrivals from
this process are switched on and off by arrivals from a non-arithmetic renewal counting process {N_2(t); t > 0}
(see figure below). The two processes are independent.

[Figure: the arrivals of N_1(t), at rate λ, pass to the output N_A(t) only during ‘On’ periods;
successive arrivals of N_2(t) alternately switch the output On and Off.]

Let {N_A(t); t ≥ 0} be the switched process; that is, N_A(t) includes arrivals from {N_1(t); t > 0} while N_2(t)
is even and excludes arrivals from {N_1(t); t > 0} while N_2(t) is odd.
a) Is NA (t) a renewal counting process? Explain your answer and if you are not sure, look at several
examples for N2 (t).
Solution: N_A(t) is not a renewal process. An easy way to see this is to start by letting
N_2(t) be deterministic, with inter-renewal intervals equal to 1, and let λ be very large. Then
N_A(t) contains large numbers of arrivals from each even integer to the subsequent odd
integer and none from odd to even. The interarrival interval for N_A(t) then depends on the
placement of each arrival relative to the next even integer.

This argument doesn't quite work since N_2(t) is arithmetic here. This is cured by letting
the inter-renewal interval for process 2 be uniform between 0.999 and 1.001. Then the
interarrivals for N_A(t) are still bunched into groups separated by approximately 1 and it
is clear that the interarrivals close to the end of a group have a different distribution than
those close to the beginning, i.e., the interarrival times of N_A(t) are not IID.

Note, as a comment on doing research, that if one doesn't start with a simple arithmetic
process here, then it is difficult to get the insight to construct a correct non-arithmetic
process.
b) Find lim_{t→∞} (1/t) N_A(t) and explain why the limit exists with probability 1. Hint: Use symmetry—that
is, look at N_1(t) − N_A(t). To show why the limit exists, use the renewal-reward theorem. What is the
appropriate renewal process to use here?
Solution: Let {N_2e(t); t > 0} be the arrival process of the even-numbered arrivals of
{N_2(t); t > 0}. This is seen to be another non-arithmetic renewal process, since each inter-renewal interval is the sum X_1 + X_2 of 2 inter-renewal intervals from N_2(t). Then consider
a reward R(t) of 1 for the initial part of an inter-renewal interval of the N_2e process until
the subsequent odd renewal of {N_2(t); t > 0}. Let R(t) = 0 from this odd renewal until the
subsequent renewal of N_2e(t). The time-average renewal-reward theorem then shows that
lim_{t→∞} (1/t) ∫_0^t R(τ) dτ exists WP1. The symmetry between R(t) and 1 − R(t) shows that
this limit is 1/2. From the memoryless property of the Poisson process, then

    lim_{t→∞} (1/t) N_A(t) = λ/2    WP1.
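The limit λ/2 can be checked by simulation (an added sketch; the Uniform(0.5, 1.5) inter-renewal distribution for N_2(t) is an arbitrary non-arithmetic choice, and λ = 3 is assumed):

```python
import random
random.seed(5)

lam, T = 3.0, 20000.0

# N2: renewal epochs with Uniform(0.5, 1.5) inter-renewal intervals (assumed);
# the switch is On whenever N2(t) is even (it starts On, since N2(0) = 0)
toggles = []
s = 0.0
while s < T:
    s += random.uniform(0.5, 1.5)
    toggles.append(s)

# walk the Poisson arrivals of N1 and count those falling in On periods
count, t, k = 0, random.expovariate(lam), 0
while t < T:
    while k < len(toggles) and toggles[k] <= t:
        k += 1                  # k = N2(t)
    if k % 2 == 0:              # N2(t) even: arrival is passed to NA
        count += 1
    t += random.expovariate(lam)

print(count / T)    # near lam/2 = 1.5
```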
c) Now suppose that {N_1(t); t > 0} is a non-arithmetic renewal counting process but not a Poisson process
and let the expected inter-renewal interval be 1/λ. For any given δ, find lim_{t→∞} E[N_A(t + δ) − N_A(t)] and
explain your reasoning. Why does your argument in (b) fail to demonstrate a time average for this case?
Solution: The argument in (b) fails here because the arrivals in N_1(t) are not memoryless,
and thus the number of process 1 arrivals in a given even-to-odd period of N_2(t) might
depend on the duration of other renewal intervals from N_2(t).

Applying Blackwell's theorem to N_2(t), lim_{t→∞} E[N_2(t + δ) − N_2(t)] = δ/X̄_2, where X̄_2 is
the mean inter-renewal interval of N_2(t). For all large enough t for the given δ,
E[N_2(t + δ) − N_2(t)] = δ/X̄_2 ± o(δ). From the Markov inequality, then,
Pr{N_2(t + δ) − N_2(t) ≥ 1} ≤ δ/X̄_2 ± o(δ). This means that N_2(τ) is even throughout
t ≤ τ < t + δ with probability (1/2)(1 − δ/X̄_2 ± o(δ)).

If N_2(τ) is even for [t, t + δ), then N_A(t + δ) − N_A(t) = N_1(t + δ) − N_1(t). If N_2(τ) is odd
for [t, t + δ), then N_A(t + δ) − N_A(t) = 0. The remaining case, where N_2(τ) changes parity, is a
low-probability event, and N_A(t + δ) − N_A(t) lies between the above two values in this
event. Thus

    E[N_A(t + δ) − N_A(t)] = (1/2)(1 − δ/X̄_2 ± o(δ)) E[N_1(t + δ) − N_1(t)] ± o(δ)
                           = (1/2)(1 − δ/X̄_2 ± o(δ))(λδ ± o(δ)) ± o(δ) = λδ/2 ± o(δ),

where in the last step we have used Blackwell's theorem on N_1(t). By adding together an
arbitrarily large number of these incremental intervals and going to the limit, we get

    lim_{t→∞} E[N_A(t + δ) − N_A(t)] = λδ/2.
Exercise 5.34: An M/G/1 queue has arrivals at rate λ and a service time distribution given by F_Y(y).
Assume that λ < 1/E[Y]. Epochs at which the system becomes empty define a renewal process. Let F_Z(z)
be the CDF of the inter-renewal intervals and let E[Z] be the mean inter-renewal interval.

a) Find the fraction of time that the system is empty as a function of λ and E[Z]. State carefully what you
mean by such a fraction.
Solution: Consider a renewal-reward process where the reward is 1 when the system is
empty and 0 otherwise. At the beginning of each renewal period, the system remains
empty until the first subsequent arrival; since the arrival process is Poisson, that interval
is exponential with mean 1/λ. Thus the expected aggregate reward per renewal
interval is 1/λ and the time-average reward, WP1, is 1/(λ E[Z]). Thus f_idle, the fraction of
time (WP1) that the server is idle, is

    f_idle = 1/(λ E[Z]).    (A.102)

This means that for all ω, except perhaps a set of probability 0, lim_{t→∞} (1/t) [aggregate
time that sample path ω is idle from 0 to t] = 1/(λ E[Z]).
b) Apply Little's theorem, not to the system as a whole, but to the number of customers in the server (i.e.,
0 or 1). Use this to find the fraction of time that the server is busy.

Solution: Little's theorem says that L̄ = λW̄. Here L̄ is the time-average number of
customers in the server and W̄ is the time-average service time, WP1. Little also says that
each of these converges WP1. Since the server is busy when it contains one customer and
idle when it contains none, L̄ is f_busy, the fraction of time (WP1) that the server is busy.
Similarly, since the expected service time is Ȳ and successive services are IID, W̄ = Ȳ.
Thus f_busy = λȲ.
c) Combine your results in (a) and (b) to find E[Z] in terms of λ and E[Y]; give the fraction of time that
the system is idle in terms of λ and E[Y].

Solution: Since f_busy = 1 − f_idle and f_busy = λȲ, we have f_idle = 1 − λȲ. Combining this
with (A.102), we get

    E[Z] = 1 / (λ(1 − λȲ)).
d) Find the expected duration of a busy period.

Solution: Each renewal interval contains one idle period and one busy period. Thus the
expected duration of a busy period is E[Z] − 1/λ = Ȳ/(1 − λȲ).
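The busy-period mean Ȳ/(1 − λȲ) can be checked by simulating busy periods directly through the branching view: the busy period is the total service time of the initiating customer plus that of all customers arriving during services within it. This is an added illustration; exponential service with Ȳ = 1 and λ = 0.5 are assumptions, predicting a mean of 1/(1 − 0.5) = 2.

```python
import random
random.seed(6)

lam = 0.5                        # arrival rate (assumed); service ~ Exp(1), Ybar = 1

def busy_period():
    customers, length = 1, 0.0   # one initiating customer
    while customers > 0:
        s = random.expovariate(1.0)          # one service time
        length += s
        g = random.expovariate(lam)          # arrivals during this service
        while g < s:
            customers += 1
            g += random.expovariate(lam)
        customers -= 1                       # the served customer departs
    return length

trials = 50000
est = sum(busy_period() for _ in range(trials)) / trials
print(est)    # near Ybar/(1 - lam*Ybar) = 2
```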
Exercise 5.35: Consider a sequence X_1, X_2, . . . of IID binary rv's with Pr{X_n = 1} = p_1 and Pr{X_n = 0} =
p_0 = 1 − p_1. A renewal is said to occur at time n ≥ 2 if X_{n−1} = 0 and X_n = 1.

a) Show that {N(n); n > 0} is a renewal counting process where N(n) is the number of renewals up to and
including time n.
Solution: Let Y_1, Y_2, . . . be the intervals between successive occurrences of the pair (0,1).
Note that Y_1 ≥ 2 since the first binary rv occurs at time 1. Also Y_n ≥ 2 for each n > 1, since
if there is an occurrence at time n (i.e., if X_{n−1} = 0, X_n = 1), then the pair X_n = 0, X_{n+1} = 1
is impossible. The length of each inter-arrival interval is determined solely by the bits within
that interval, which are IID both among themselves and between intervals. Thus the inter-arrival intervals are IID and are thus renewals, so the counting process is also a renewal
process.
b) What is the probability that a renewal occurs at time n, n ≥ 2?

Solution: The probability of a renewal at time n ≥ 2 is Pr{X_{n−1} = 0, X_n = 1} = p_0 p_1.
c) Find the expected inter-renewal interval; use Blackwell's theorem.

Solution: The inter-renewal distribution is arithmetic with span 1 and the expected number
of renewals in the interval (n−1, n] is simply the probability of a renewal at time n, i.e.,
Pr{X_{n−1} = 0, X_n = 1} = p_0 p_1. Thus, from Blackwell's theorem,

    E[N(n)] − E[N(n−1)] = p_0 p_1 = 1/E[Y].

The expected inter-renewal interval is then 1/(p_0 p_1); this is also the expected time until the
first occurrence of (0,1). It is not hard to find E[Y] by elementary means without using
Blackwell, but the advantage of using Blackwell's theorem is clear in (g), where the string
of interest has an arbitrary length.
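A simulation check of the expected waiting time 1/(p_0 p_1) for the first occurrence of (0,1) (an addition; p_1 = 0.3 is an arbitrary choice, predicting 1/(0.7 · 0.3) ≈ 4.76):

```python
import random
random.seed(7)

p1 = 0.3

def time_to_01():
    n, prev = 0, None
    while True:
        n += 1
        bit = 1 if random.random() < p1 else 0
        if prev == 0 and bit == 1:
            return n
        prev = bit

trials = 40000
est = sum(time_to_01() for _ in range(trials)) / trials
print(est)    # near 1/(p0*p1) = 1/0.21
```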
d) Now change the definition of renewal; a renewal now occurs at time n if X_{n−1} = 1 and X_n = 1. Show
that {N*(n); n ≥ 0} is a delayed renewal counting process, where N*(n) is the number of renewals up to and
including n for this new definition of renewal.
Solution: Let Y_1, Y_2, . . . now be the inter-arrival intervals between successive occurrences
of the pair (1,1). Note that Y_1 ≥ 2 as before, since the first binary rv occurs at time 1, so the
string (1,1) cannot appear until time 2. In this case, however, it is possible to have Y_i = 1
for i ≥ 2, since if (X_{m−1}, X_m, X_{m+1}) = (1, 1, 1), then a renewal occurs at time m and also
at m + 1.

Since successive occurrences of (1,1) can overlap, we must be a little careful to show that
Y_1, Y_2, . . . are independent. Given Y_1, . . . , Y_i, we see that Y_{i+1} = 1 with probability p_1, and
this is independent of Y_1, . . . , Y_i, so the event {Y_{i+1} = 1} is independent of Y_1, . . . , Y_i. In
the same way, {Y_{i+1} = k} is independent of Y_1, . . . , Y_i, so Y_{i+1} is independent of Y_1, . . . , Y_i
for all i ≥ 1, establishing the independence. The fact that Y_2, Y_3, . . . are identically distributed
is obvious as before. Thus N* is a delayed renewal process.
e) Find E[Y_i] for i ≥ 2 for the case in (d).

Solution: Using the same argument as in (c), E[Y_i] = 1/p_1² for i ≥ 2.
f) Find E[Y_1] for the case in (d). Hint: Show that E[Y_1 | X_1 = 1] = 1 + E[Y_2] and E[Y_1 | X_1 = 0] = 1 + E[Y_1].

Solution: If X_1 = 1, then the wait for a renewal, starting at time 1, has the same probabilities as the wait for a renewal starting at the previous renewal. If X_1 = 0, then the wait,
starting at time 1, is the same as it was before. Thus

    E[Y_1] = 1 + p_1 E[Y_2] + p_0 E[Y_1].

Combining the terms in E[Y_1], we have p_1 E[Y_1] = 1 + p_1 E[Y_2], so

    E[Y_1] = 1/p_1 + E[Y_2] = 1/p_1 + 1/p_1².
g) Looking at your results above for the strings (0,1) and (1,1), show that for an arbitrary string a =
(a_1, . . . , a_k), the arrival process of successive occurrences of the string is a renewal process if no proper suffix
of a is a prefix of a. Otherwise it is a delayed renewal process.
Solution: For the 2 strings looked at, (0,1) has the property that no proper suffix is a
prefix, and (1,1) has the property that the suffix (1) is equal to the prefix (1). In general,
if there is a proper suffix of length i equal to a prefix of length i, then, when the string
occurs, the final i bits of the string provide a prefix for the next string, which can then
have a length as small as k − i. Since the first occurrence of the string cannot appear
until time k, the arrival process is a delayed renewal process, where the independence and
the identical distribution after the first occurrence follow the arguments for the length-2
strings above. If no proper suffix of a is equal to a prefix of a, then the end of one string
provides no beginning for a new string and the first renewal is statistically identical to the
future renewals; i.e., the process is a renewal process.
h) Suppose a string a = (a_1, . . . , a_k) of length k has no proper suffixes equal to a prefix. Show that the time
to the first renewal satisfies

    E[Y_1] = 1 / ∏_{ℓ=1}^{k} p_{a_ℓ}.    (A.103)
Solution: As shown in (g), the successive occurrences of a constitute a renewal process
(rather than a delayed renewal process). The denominator in (A.103) is then the probability
of a renewal at any given time n ≥ k, and the result follows from Blackwell's theorem as in
the earlier parts.
i) Suppose the string a = (a_1, . . . , a_k) has at least one proper suffix equal to a prefix, and suppose i is the
length of the longest such suffix. Show that the expected time until the first occurrence of a is given by

    E[Y_1] = 1 / ∏_{ℓ=1}^{k} p_{a_ℓ} + E[U_i],    (A.104)

where E[U_i] is the expected time until the first occurrence of the string (a_1, . . . , a_i).
Solution: The first term above is the expected inter-renewal interval, which follows from
Blackwell. If a renewal occurs at time n, then, since (a_1, . . . , a_i) is both a prefix and a suffix of
a, we must have (X_{n−i+1}, . . . , X_n) = (a_1, . . . , a_i). This means that an inter-renewal interval
is the time from an occurrence of (a_1, . . . , a_i) until the first occurrence of (a_1, . . . , a_k).
Starting from time 0, it is necessary for (a_1, . . . , a_i) to occur before (a_1, . . . , a_k), so we can
characterize the time until the first occurrence of a as the sum of the time to the first
occurrence of (a_1, . . . , a_i) plus the time from that first occurrence to the first occurrence of
(a_1, . . . , a_k); as explained in greater detail in Example 4.5.1, this is statistically the same
as an inter-renewal time.
j) Show that the expected time until the first occurrence of a = (a_1, . . . , a_k) is given by

    E[Y_1] = Σ_{i=1}^{k} I_i / ∏_{ℓ=1}^{i} p_{a_ℓ},    (A.105)

where, for 1 ≤ i ≤ k, I_i is 1 if the prefix of a of length i is equal to the suffix of length i. Hint: Use (A.104)
recursively. Also show that if a has a suffix of length i equal to the prefix of length i and also a suffix of
length j equal to a prefix of length j where j < i, then the suffix of (a_1, . . . , a_i) of length j is also equal to
the prefix of both a and (a_1, . . . , a_i) of length j.
Solution: Note that the suffix of a of length k is not a proper suffix and is simply equal
to a itself, and thus also equal to the prefix of a of length k. This gives the final term of the
expression above, which is also the first term in (A.104).

The next smaller value of i for which I_i = 1 is the largest i for which a proper suffix of a
of length i is equal to a prefix. If we apply (A.104) to finding the expected first occurrence
of (a_1, . . . , a_i) for that i, we see that it is the next-to-final non-zero term in the above sum
plus the expected time for the first occurrence of (a_1, . . . , a_j), where j is the largest length
for which a proper suffix of (a_1, . . . , a_i) is equal to the prefix of (a_1, . . . , a_i) of length j.
Now the first j bits of (a_1, . . . , a_i) are also the first j bits of a. Also, since (a_1, . . . , a_i) =
(a_{k−i+1}, . . . , a_k), we see that the last j bits of (a_1, . . . , a_i) are also the last j bits of a.
Thus j is the third-from-last value of i for which I_i = 1.

This argument continues (recursively) through all the terms in the above sum. We can also
interpret the partial sums above as the expected times of occurrence for each prefix that is
equal to a suffix.
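Formula (A.105) is straightforward to implement; the sketch below (an addition, with helper names of our own choosing) computes the expected first-occurrence time of an arbitrary binary string and cross-checks it by simulation for the string (1,0,1) with p_1 = 1/2, where the prefix-suffix matches of lengths 1 and 3 give 1/p_1 + 1/(p_1 p_0 p_1) = 2 + 8 = 10.

```python
import random
random.seed(8)

def expected_first_occurrence(a, p1):
    """Expected time to the first occurrence of binary string a, via (A.105)."""
    p = {1: p1, 0: 1.0 - p1}
    total = 0.0
    for i in range(1, len(a) + 1):
        if a[:i] == a[-i:]:              # I_i = 1: prefix of length i equals suffix
            prod = 1.0
            for bit in a[:i]:
                prod *= p[bit]
            total += 1.0 / prod
    return total

a, p1 = (1, 0, 1), 0.5
print(expected_first_occurrence(a, p1))  # 1/p1 + 1/(p1*p0*p1) = 2 + 8 = 10

# Monte Carlo cross-check
def sim(a, p1):
    window, n = [], 0
    while True:
        n += 1
        window.append(1 if random.random() < p1 else 0)
        window = window[-len(a):]
        if tuple(window) == a:
            return n

trials = 20000
est = sum(sim(a, p1) for _ in range(trials)) / trials
print(est)    # near 10
```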
k) Use (i) to find, first, the expected time until the first occurrence of (1,1,1,1,1,1,0) and, second, that of
(1,1,1,1,1,1). Use (4.31) to check the relationship between these answers.

Solution: The first string has no proper suffixes equal to a prefix, so the expected time is
p_1^{−6} p_0^{−1}. For the second string, all suffixes are also prefixes, so the expected time is

    Σ_{j=1}^{6} p_1^{−j} = p_1^{−6} (1 + p_1 + · · · + p_1^5) = p_1^{−6} (1 − p_1^6)/(1 − p_1)
                        = p_1^{−6} p_0^{−1} − p_0^{−1}.

Adapting (4.31) to the notation here (after the first occurrence of 111111, one more bit
arrives; with probability p_0 it completes 1111110, while with probability p_1 it yields a fresh
occurrence of 111111),

    E[time to 1111110] = 1 + p_1 E[time to 1111110] + p_0 E[time to 111111],
    p_0 E[time to 1111110] = 1 + p_0 E[time to 111111],

which checks with the answers above, since they differ by p_0^{−1}.
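For a fair coin (p_1 = 1/2, an assumed special case), the two answers are 2^7 = 128 and 2 + 4 + · · · + 64 = 126, and their difference is 1/p_0 = 2, as the check requires:

```python
p1 = 0.5
p0 = 1.0 - p1
t_1111110 = (p1 ** -6) * (p0 ** -1)             # no proper suffix equals a prefix
t_111111  = sum(p1 ** -j for j in range(1, 7))  # every suffix is a prefix
print(t_1111110, t_111111)                      # 128.0 126.0
assert t_1111110 - t_111111 == p0 ** -1         # the 1/p0 relation from (4.31)
```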
Exercise 5.36: A large system is controlled by n identical computers. Each computer independently
alternates between an operational state and a repair state. The duration of the operational state, from
completion of one repair until the next need for repair, is a random variable X with finite expected duration
E[X]. The time required to repair a computer is an exponentially distributed random variable with density
λe^{−λt}. All operating durations and repair durations are independent. Assume that all computers are in the
repair state at time 0.

a) For a single computer, say the ith, do the epochs at which the computer enters the repair state form a
renewal process? If so, find the expected inter-renewal interval.
Solution: For the ith computer, let {X_{i,m}; m > 0} be the sequence of successive operating
durations between repairs. Let V_{i,1} be the completion time of the first repair. Since the
repair times are each exponential at rate λ, this first repair completion time is also exponential, independent of how long computer i has been in the repair state before time 0.
Let {V_{i,m}; m > 1} be the subsequent repair intervals of the ith computer; these are also
exponential at rate λ and are independent of each other and of V_{i,1}. Thus the sequence
{V_{i,m}; m > 0} is IID. Let {U_{i,m} = V_{i,m} + X_{i,m}; m > 0} be the sequence of successive repair-plus-operation intervals. From the specified independence of repair and operational intervals, {U_{i,m}; m > 0} is a sequence of IID rv's. Thus this sequence of rv's is the sequence
of interarrival intervals of a renewal process, and the successive epochs at which computer
i enters the repair state after time 0 are the epochs of a renewal process. The expected
inter-renewal interval is Ū = X̄ + 1/λ.

Note that this is essentially obvious, except for the need to be clear about the first repair
time of computer i starting at time 0.
b) Do the epochs at which it enters the operational state form a renewal process?

Solution: The first epoch at which it enters the operational state is V_{i,1}, and each successive inter-epoch interval is X_{i,n} + V_{i,n+1} for n > 0. Thus the inter-epoch intervals are
independent, but the first has a different distribution than the rest. This means that these
epochs do not form a renewal process, but instead form a delayed renewal process.
c) Find the fraction of time over which the ith computer is operational and explain what you mean by
fraction of time.

Solution: Let R_i(t) be a renewal reward function for the renewal process in (a). Let
R_i(t) = 1 when computer i is in the operational state and let R_i(t) = 0 otherwise. The
time average of R_i(t), i.e., lim_{t→∞} (1/t) ∫_0^t R_i(τ) dτ, is the fraction of time computer i is
operational. From Theorem 5.4.5, this limit converges WP1 to the value X̄/(X̄ + 1/λ).

Those of you following the derivations of theorems very carefully might notice that Theorem
5.4.5 does not quite apply here, since R_i(t) is a function of {V_{i,m}; m > 0} as well as of the
inter-renewal intervals. As discussed at the end of Section 5.4 and in 5.5.3-5, however, the
theorem does apply to these more general cases.
d) Let Q_i(t) be the probability that the ith computer is operational at time t and find lim_{t→∞} Q_i(t).

Solution: Using the notation of (c),

    Q_i(t) = Pr{R_i(t) = 1} = E[R_i(t)].

Thus lim_{t→∞} Q_i(t) = lim_{t→∞} E[R_i(t)], assuming the limit exists. Since U_{i,n} = V_{i,n} + X_{i,n} is
non-arithmetic, Section 5.7 of the text essentially shows that this limiting ensemble average
exists and equals the time average, i.e., that lim_{t→∞} Q_i(t) = X̄/(X̄ + 1/λ).

As in (c), the argument in Section 5.7 has to be extended slightly, but going through the
details does not seem to be warranted here.
e) The system is in failure mode at a given time if all computers are in the repair state at that time. Do
the epochs at which system failure modes begin form a renewal process?

Solution: At time 0, all computers are in the repair state, and thus each requires an
independent exponentially distributed repair time (of rate λ) which is independent of all
subsequent repair intervals and operational intervals. At each subsequent entry to a failure
mode, each computer requires an independent exponentially distributed repair time (of rate
λ), independent of all subsequent repair intervals and operational intervals. Thus the epochs
of entering a failure mode constitute a renewal process.
f) Let Pr{t} be the probability that the system is in failure mode at time t. Find lim_{t→∞} Pr{t}. Hint: look at (d).
175
A.5. SOLUTIONS FOR CHAPTER 5
Solution: The repair and operation intervals of each computer are independent of those of
all other computers, and thus, for each t the event that computer i is operational at time t
is independent of the events that the other computers are operational at time t. Thus
Pr{t} = [1 − Q_i(t)]^n.

Using (d) to take the limit as t → ∞,

lim_{t→∞} Pr{t} = [1 − X̄/(X̄ + 1/λ)]^n = [1/(λX̄ + 1)]^n.

g) For δ small, find the probability that the system enters failure mode in the interval (t, t + δ] in the limit as t → ∞.
Solution: For δ sufficiently small, the probability of two or more computers entering or leaving the repair mode in (t, t + δ] is negligible compared with that of one computer. Thus the system will enter failure mode in (t, t + δ] if n − 1 computers are in repair mode at time t and the remaining computer enters repair mode in (t, t + δ]. From (f), the limiting probability that a given set of n − 1 computers are in repair mode at t is [1/(1 + λX̄)]^{n−1}. Applying Blackwell's theorem to the renewal process of the remaining computer, the limiting probability that the remaining computer enters repair mode (i.e., that that computer's renewal process has a renewal) in (t, t + δ] is δ/(X̄ + 1/λ). Since there are n choices for the remaining computer, the probability that the system enters failure mode in (t, t + δ] is nλδ/(λX̄ + 1)^n.
h) Find the expected time between successive entries into failure mode.
Solution: Let {W_i; i > 0} be the sequence of inter-renewal intervals for the renewal process of system failures. Thus we are asked to find E[W]. From Blackwell's theorem, applied to this failure renewal process, the expected number of failures in (t, t + δ], in the limit t → ∞, is δ/E[W]. For δ small, this expected number is the same as the probability of a failure as given in (g). Thus,

E[W] = (λX̄ + 1)^n / (nλ).

An alternate approach is to first observe that the expected duration of a failure is the time until the first of the n computers is repaired, i.e., 1/(nλ). Then, letting R(t) be a reward of 1 during failure mode and 0 elsewhere, we see that 1/(nλ) is the expected reward during an inter-renewal interval. Then, using Corollary 5.4.6, the time-average reward is 1/(nλE[W]) WP1. But now using the equality of time averages and limiting ensemble averages, the time-average reward is lim_{t→∞} Pr{t} as found in (f), and solving for E[W] gives the same answer.
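The formula E[W] = (λX̄ + 1)^n/(nλ) can be checked by simulation. The sketch below additionally assumes exponential operational times (rate μ = 1/X̄), which is not required by the derivation but makes the number of computers in repair a simple birth-death chain that is easy to simulate exactly.

```python
# Monte Carlo check of part (h): n computers with exponential operational
# times (rate mu = 1/X_bar) and exponential repair times (rate lam).
# The number in repair, j, is a birth-death chain; successive entries into
# the all-in-repair state j = n should be (lam*X_bar + 1)^n / (n*lam)
# apart on average.
import random

rng = random.Random(7)
n, lam, mu = 3, 1.0, 1.0      # X_bar = 1/mu = 1
t, j = 0.0, n                 # start with all n computers in repair
entries, t_first, t_last = 0, None, None
for _ in range(500_000):
    rate = j * lam + (n - j) * mu            # total event rate in state j
    t += rng.expovariate(rate)
    if rng.random() < j * lam / rate:        # a repair completes
        j -= 1
    else:                                    # an operational computer fails
        j += 1
        if j == n:                           # system enters failure mode
            entries += 1
            if t_first is None:
                t_first = t
            t_last = t

est_EW = (t_last - t_first) / (entries - 1)
predicted = (lam / mu + 1.0) ** n / (n * lam)   # = 8/3 here
print(est_EW, predicted)
```

With n = 3 and λ = μ = 1 the predicted spacing is 8/3, matching the simulation to within sampling error.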
i) Next assume that the repair time of each computer has an arbitrary density rather than exponential, but has a mean repair time of 1/λ. Do the epochs at which system failure modes begin form a renewal process?
Solution: For each computer, the set of epochs at which that computer enters the repair state forms a renewal process or a delayed renewal process, depending on the assumption about whether a repair interval starts at time 0 or is in process at time 0. When a system failure occurs, however, each computer (except the last to fail) has been in the repair state for some random period, and the set of those periods affects the interval until the next
failure. Thus the epochs at which system failures start do not form a renewal process and
also do not form a delayed renewal process.
j) Repeat (f) for the assumption in (i).
Solution: Despite the fact that the sequence of failure epochs does not form a renewal process, (f) still gives the limit, as t → ∞, of the probability of a failure at time t, Pr{t}. The reason is that the independence between computers still holds, so that Pr{t} = [1 − Q_i(t)]^n as in (f), where now Q_i(t) is the probability that computer i is in the operational mode at time t. Since the epochs at which computer i enters the repair state still form a renewal process (or a delayed renewal process), lim_{t→∞} Q_i(t) = X̄/(X̄ + 1/λ) as shown in (d). Thus the answer in (f) is still valid.
Exercise 5.37 : Let {N1 (t); t>0} and {N2 (t); t>0} be independent renewal counting processes. Assume
that each has the same CDF F(x) for interarrival intervals and assume that a density f (x) exists for the
interarrival intervals.
a) Is the counting process {N1 (t) + N2 (t); t > 0} a renewal counting process? Explain.
Solution: For the special case in which {N_1(t); t > 0} and {N_2(t); t > 0} are Poisson, the sum N_1(t) + N_2(t) is also Poisson and thus a renewal counting process. Otherwise, when a renewal from one
process occurs, the residual life of the other process (i.e., the time until the next arrival)
is statistically dependent on the age of that process, which is not the same for each arrival
of the other process. Thus N1 (t) + N2 (t) is not a renewal process. For example, if the
inter-renewal interval for {N1 (t); t > 0} is the sum of two IID exponentials (i.e., if N1 (t) is
the number of even arrivals in a Poisson process), then, on a renewal at time t for process
1, the time until the next renewal depends on whether process 2 has experienced an even
or odd number of Poisson arrivals at time t.
b) Let Y(t) be the interval from t until the first arrival (from either process) after t. Find an expression for the CDF of Y(t) in the limit t → ∞ (you may assume that time averages and ensemble averages are the same).
Solution: Let Y_1(t) be the residual life for process 1 and let Y_2(t) be the residual life for process 2. Then Y(t) = min[Y_1(t), Y_2(t)]. Since the two processes are independent, Y_1(t) and Y_2(t) are independent rv's, and, of course, are non-negative. It follows that for any y > 0, Pr{Y(t) > y} = Pr{Y_1(t) > y} Pr{Y_2(t) > y}. Thus, for the CDF's,

F_{Y(t)}(y) = 1 − [1 − F_{Y_1(t)}(y)][1 − F_{Y_2(t)}(y)] = 1 − [1 − F_{Y_1(t)}(y)]^2,

where we have used the fact that Y_1(t) and Y_2(t) are IID. Since Y_1(t) is the residual life of a renewal process, its CDF is given (as a time average WP1) by (5.30) as

F_{Y_1}(y) = (1/X̄) ∫_0^y Pr{X > x} dx        WP1.

Given that time and ensemble averages are the same, i.e., given that each process is non-arithmetic, this is also the limiting ensemble average as t → ∞ for Y_1(t).
c) Assume that a reward R of rate 1 unit per second starts to be earned whenever an arrival from process
1 occurs and ceases to be earned whenever an arrival from process 2 occurs.
Assume that lim_{t→∞} (1/t) ∫_0^t R(τ) dτ exists with probability 1 and find its numerical value.
Solution: Assuming that the above limit exists, it must have the value 1/2 by symmetry. To see this, let R*(t) be a reward of rate 1 unit per second that starts whenever an arrival from process 2 occurs and ceases whenever an arrival from process 1 occurs. Then at any time τ, either R(τ) = 1 (if the last arrival was from 1) or R*(τ) = 1 (if the last arrival was from 2). Thus R(τ) + R*(τ) = 1 for all τ. Since the interarrival intervals for the two processes are IID, the two processes are identically distributed, so R(τ) and its time average are identically distributed to R*(τ) and its time average. Since the two time averages sum to 1, each must equal 1/2. Although this explanation was somewhat lengthy, the conclusion is essentially obvious.
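The symmetry argument above can be confirmed numerically; the sketch below assumes uniform(0, 2) interarrival times for both processes (an arbitrary choice) and integrates the reward R over a long horizon.

```python
# Simulation check of part (c): two independent renewal processes with
# the same interarrival CDF (uniform(0, 2) here).  R(tau) = 1 from each
# process-1 arrival until the next process-2 arrival; by symmetry its
# time average should be 1/2.
import random

rng = random.Random(3)

def arrivals(horizon, rng):
    """Arrival epochs of a renewal process with uniform(0, 2) interarrivals."""
    t, out = 0.0, []
    while True:
        t += rng.uniform(0.0, 2.0)
        if t > horizon:
            return out
        out.append(t)

horizon = 200_000.0
merged = sorted([(t, 1) for t in arrivals(horizon, rng)] +
                [(t, 2) for t in arrivals(horizon, rng)])

reward, prev_t, state = 0.0, 0.0, 0   # state: which process arrived last
for t, src in merged:
    if state == 1:
        reward += t - prev_t          # R was 1 over (prev_t, t]
    prev_t, state = t, src

time_avg = reward / horizon
print(time_avg)                        # should be close to 1/2
```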
d) Let Z(t) be the interval from t until the first time after t that R(t) (as in (c)) changes value. Find an expression for E[Z(t)] in the limit t → ∞. Hint: Make sure you understand why Z(t) is not the same as Y(t) in (b). You might find it easiest to first find the expectation of Z(t) conditional on both the duration of the {N_1(t); t > 0} interarrival interval containing t and the duration of the {N_2(t); t ≥ 0} interarrival interval containing t; draw pictures!
Solution: Rename Z(t) as U(t) to avoid confusion with the age variables Z_1(t) and Z_2(t), and leave out t since we are dealing with a fixed, arbitrarily large t throughout. Note that if Z_1 > Z_2, then the next change in R after t is at t + Y_1; if Z_1 < Z_2, R changes at t + Y_2. We assume that X is continuous and ignore whether or not inequalities are strict. We only outline a solution, since the arithmetic is messy and of no real interest.

Consider Z_max = max(Z_1, Z_2). Since Z_1 and Z_2 are IID, we have F_{Z_max}(z) = F_{Z_1}^2(z). From (5.93), the CDF of Z_i, i = 1, 2, is F_{Z_i}(z) = (1/X̄) ∫_0^z F_X^c(x) dx, so

F_{Z_max}(z) = (1/X̄^2) (∫_0^z F_X^c(x) dx)^2,

f_{Z_max}(z) = (2F_X^c(z)/X̄^2) ∫_0^z F_X^c(x) dx.        (A.106)
Now consider the joint duration and age, (X̃_i, Z_i), for either process, i = 1, 2. Assuming that X has a density, and converting the incremental expression in (5.77) to a density form,

f_{Z_i,X̃_i}(z, x) = f_X(x)/X̄        for 0 < z < x.

The marginal density of Z_i is then found by integrating over x > z above, giving f_{Z_i}(z) = F_X^c(z)/X̄. The density of X̃_i conditional on Z_i can then be found as

f_{X̃_i|Z_i}(x|z) = f_{Z_i,X̃_i}(z, x)/f_{Z_i}(z) = f_X(x)/F_X^c(z)        for x > z > 0.

Now let X̃_max denote X̃_i for the i that maximizes (Z_1, Z_2). Then
f_{X̃_max|Z_max}(x|z) = f_X(x)/F_X^c(z)        for x > z > 0.        (A.107)

The joint PDF of (Z_max, X̃_max) is now given by the product of (A.106) and (A.107). We can then calculate the marginal PDF of X̃_max from this and then calculate the expectation of X̃_max. Alternatively, we can find the conditional mean of X̃_max and then find the unconditional mean. Neither leads to much insight. At any rate, after finding E[X̃_max], we can find E[U] = E[X̃_max] − E[Z_max].
Exercise 5.38: This problem provides another way of treating ensemble averages for renewal-reward
problems. Assume for notational simplicity that X is a continuous random variable.
a) Show that Pr{one or more arrivals in (τ, τ + δ]} = m(τ + δ) − m(τ) − o(δ), where o(δ) ≥ 0 and lim_{δ→0} o(δ)/δ = 0.
Solution: Recall that m(t) = E[N(t)], so m(t + δ) − m(t) is the expected number of arrivals in (t, t + δ]. Thus

m(t + δ) − m(t) = Σ_{i=1}^∞ i Pr{i arrivals in (t, t + δ]} ≥ Pr{one or more arrivals in (t, t + δ]}.

Unfortunately, the difference between the two sides of this inequality can be greater than o(δ). An example occurs when F_X(x) = √x for 0 ≤ x ≤ 1. Then the probability of two arrivals in (0, δ) is greater than the probability of one arrival in (0, δ/2) followed by a second arrival in the interval of duration δ/2 after the first. The probability of this is (√(δ/2))^2 = δ/2, which is not o(δ).

When you can't prove something in your research, a good policy is to alternate between looking for a proof and looking for a counterexample. In the remainder of the exercise, we restrict attention to the usual case where (a) is satisfied.
b) Show that Pr{Z(t) ∈ [z, z + δ), X̃(t) ∈ (x, x + δ]} is equal to [m(t − z) − m(t − z − δ) − o(δ)][F_X(x + δ) − F_X(x)] for x ≥ z + δ.
Solution: The clean way to do this is given in the proof of Theorem 5.7.2, where it is also shown that the o(δ) term is unnecessary. Here we take another approach that is less general and rigorous, but perhaps more intuitive. Assuming δ to be negligibly small, Pr{Z(t) ∈ [z, z + δ)} is essentially the probability of, first, an arrival in (t − z − δ, t − z] and, second, no arrival in (t − z, t]. The probability of no arrival in (t − z, t], given an arrival essentially at t − z, is F_X^c(z). Thus

Pr{Z(t) ∈ [z, z + δ)} ≈ [m(t − z) − m(t − z − δ)] F_X^c(z).        (A.108)

Now, given Z(t) ∈ [z, z + δ), the probability that the next arrival after t − z is in (x, x + δ] is [F_X(x + δ) − F_X(x)]/F_X^c(z). This together with (A.108) gives the desired result.
c) Assuming that m′(τ) = dm(τ)/dτ exists for all τ, show that the joint density of Z(t), X̃(t) is f_{Z(t),X̃(t)}(z, x) = m′(t − z) f_X(x) for x > z.
Solution: Ignoring the o(δ) term, the probability in (b) is m′(t − z) f_X(x) δ^2. The δ^2 term disappears in going to the joint density.

d) Show that E[R(t)] = ∫_{z=0}^t ∫_{x=z}^∞ R(z, x) f_X(x) dx m′(t − z) dz.

Solution: Letting R(z, x) be the reward for Z(t) = z, X̃(t) = x, and using the joint density in (c), we get the desired result.
Exercise 5.39: In this problem, we show how to calculate the distribution of the residual life Y (t) as
a transient in t. Let µ(t) = dm(t)/dt where m(t) = E [N (t)], and let the interarrival distribution have the
density fX (x). Let Y (t) have the density fY (t) (y).
a) Show that these densities are related by the integral equation

μ(t + y) = f_{Y(t)}(y) + ∫_{u=0}^y μ(t + u) f_X(y − u) du.        (A.109)
Solution: Assuming that all these densities exist, and ignoring the bizarre example in Exercise 5.38 (a), μ(t + y)δ + o(δ) is the probability of an arrival in (t+y, t+y+δ]. Such an arrival event can be expressed as a disjoint union of subevents depending on the epoch of the most recent arrival on or before t + y. One of these subevents is that of no arrivals in (t, t + y] and an arrival in (t+y, t+y+δ]. This subevent is the event that Y(t) ∈ (y, y + δ] and has probability f_{Y(t)}(y)δ + o(δ). The other subevents are those in which the previous arrival occurs in one of the ⌊y/δ⌋ δ-sized intervals between t and t + y. Thus

μ(t + y)δ + o(δ) = f_{Y(t)}(y)δ + Σ_{i=1}^{⌊y/δ⌋} μ(t + iδ) f_X(y − iδ) δ^2 + o(δ).

Cancelling δ from both sides and going to the limit δ → 0, we get (A.109).

b) Let L_{μ,t}(r) = ∫_{y≥0} μ(t + y) e^{−ry} dy and let L_{Y(t)}(r) and L_X(r) be the Laplace transforms of f_{Y(t)}(y) and f_X(x) respectively. Find L_{Y(t)}(r) as a function of L_{μ,t} and L_X.
Solution: For fixed t, the integral in (A.109) is the convolution of μ(t + u) over u ≥ 0 with f_X, evaluated at y. Thus,

L_{μ,t}(r) = L_{Y(t)}(r) + L_{μ,t}(r) L_X(r);        L_{Y(t)}(r) = [1 − L_X(r)] L_{μ,t}(r).        (A.110)
c) Consider the inter-renewal density f_X(x) = (1/2)e^{−x} + e^{−2x} for x ≥ 0 (as in Example 5.6.1). Find L_{μ,t}(r) and L_{Y(t)}(r) for this example.
Solution: From Example 5.6.1, we have

L_X(r) = [(3/2)r + 2] / [(r + 1)(r + 2)],

L_m(r) = 4/(3r^2) + 1/(9r) − 1/[9(r + 3/2)],

m(t) = 4t/3 + (1 − exp[−(3/2)t])/9.

We see that μ(t) = (d/dt) m(t) = 4/3 + (1/6) exp(−3t/2). From this, we calculate

L_{μ,t}(r) = ∫_0^∞ μ(t + y) e^{−ry} dy = 4/(3r) + e^{−3t/2}/[6(r + 3/2)].
Substituting this and the expression for L_X(r) into (A.110), we get

L_{Y(t)}(r) = [8(r + 3/2) + r e^{−3t/2}] / [6(r + 1)(r + 2)].
d) Find f_{Y(t)}(y). Show that your answer reduces to that of (5.30) in the limit as t → ∞.
Solution: Using the Heaviside expansion formula on L_{Y(t)}(r),

f_{Y(t)}(y) = (2/3)(e^{−y} + e^{−2y}) − (1/6)(e^{−y} − 2e^{−2y}) exp(−3t/2),        (A.111)

lim_{t→∞} f_{Y(t)}(y) = (2/3)(e^{−y} + e^{−2y}).

The time-average result in (5.30), when applied to this example, yields X̄ = 3/4 and F_X^c(x) = (1/2)(e^{−x} + e^{−2x}). Thus F_Y^c(y) = (2/3)e^{−y} + (1/3)e^{−2y}. Differentiating, we get the t → ∞ limit of the density in (A.111).
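The transient density (A.111) can be checked by direct simulation of this renewal process: f_X(x) = (1/2)e^{−x} + e^{−2x} is an equal-weight mixture of an Exp(1) and an Exp(2) density, so interarrival times are easy to sample. The sketch below compares the empirical Pr{Y(t) ≤ y_0} at t = 1 with the value obtained by numerically integrating (A.111); the choices t = 1 and y_0 = 0.5 are arbitrary.

```python
# Numerical check of (A.111): simulate the renewal process with
# f_X(x) = (1/2)e^{-x} + e^{-2x} (an equal mixture of Exp(1) and Exp(2))
# and compare the empirical CDF of the residual life Y(t) at t = 1 with
# the transient density derived above.
import math
import random

def f_Y(y, t):
    """Transient residual-life density from (A.111)."""
    return ((2.0/3.0) * (math.exp(-y) + math.exp(-2*y))
            - (1.0/6.0) * (math.exp(-y) - 2*math.exp(-2*y)) * math.exp(-1.5*t))

t, y0 = 1.0, 0.5
# Analytic Pr{Y(t) <= y0} by trapezoidal integration of f_Y.
steps = 10_000
h = y0 / steps
analytic = sum(0.5 * h * (f_Y(i*h, t) + f_Y((i+1)*h, t)) for i in range(steps))

rng = random.Random(11)
def draw_X():
    return rng.expovariate(1.0) if rng.random() < 0.5 else rng.expovariate(2.0)

trials, hits = 200_000, 0
for _ in range(trials):
    s = 0.0
    while s <= t:                     # run the renewal process past t
        s += draw_X()
    if s - t <= y0:                   # residual life Y(t) = s - t
        hits += 1

print(hits / trials, analytic)
```

At t = 0 the formula correctly reduces to f_X itself, which is a second quick consistency check.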
e) Explain how to go about finding f_{Y(t)}(y) in general, assuming that f_X has a rational Laplace transform.

Solution: One simply goes through the same procedure as in Example 5.6.1 and here. Recognizing that the Laplace transforms remain rational in r throughout the procedure, one finally gets f_{Y(t)}(y). In fact, the procedure could easily be automated. The reason we do not stress this more is that one learns more about Laplace transforms than about renewal theory from these standard manipulations.
Exercise 5.40: This problem is designed to give you an alternate way of looking at ensemble averages for renewal-reward problems. First we find an exact expression for Pr{S_{N(t)} > s}. We find this for arbitrary s and t, 0 < s < t.
a) By breaking the event {S_{N(t)} > s} into subevents {S_{N(t)} > s, N(t) = n}, explain each of the following steps:

Pr{S_{N(t)} > s} = Σ_{n=1}^∞ Pr{t ≥ S_n > s, S_{n+1} > t}        (A.112)

= Σ_{n=1}^∞ ∫_{y=s}^t Pr{S_{n+1} > t | S_n = y} dF_{S_n}(y)        (A.113)

= ∫_{y=s}^t F_X^c(t − y) d[Σ_{n=1}^∞ F_{S_n}(y)]        (A.114)

= ∫_{y=s}^t F_X^c(t − y) dm(y)        where m(y) = E[N(y)].        (A.115)
Solution: N(t) is the (random) number of arrivals up to and including t, and S_{N(t)} is the epoch of the last of these arrivals. Thus S_{N(t)+1} must be greater than t. The joint event N(t) = n and S_{N(t)} > s then has probability Pr{t ≥ S_n > s, S_{n+1} > t}. These joint events are disjoint over n and do not include n = 0, since S_0 = 0 by convention and, by assumption, s > 0. Thus summing the probabilities of these subevents over n ≥ 1 gives Pr{S_{N(t)} > s}, establishing (A.112). We then see that (A.113) is simply an alternate way of expressing the probability of the subevents above.

In (A.114), we recognized that X_{n+1} = S_{n+1} − S_n and that X_{n+1} is independent of S_n. We also interchanged the order of summation and integration, which is permissible since all the terms are positive. Finally, in (A.115), we used (5.54).
b) Show that for 0 < s < t < u,

Pr{S_{N(t)} > s, S_{N(t)+1} > u} = ∫_{y=s}^t F_X^c(u − y) dm(y).        (A.116)

Solution: This is virtually the same as (a), except that (A.112) is replaced by

Pr{S_{N(t)} > s, S_{N(t)+1} > u} = Σ_{n=1}^∞ Pr{t ≥ S_n > s, S_{n+1} > u}.

In (A.113)–(A.115), t is simply replaced by u.
c) Draw a two-dimensional sketch, with age and duration as the axes, and show the region of (age, duration) values corresponding to the event {S_{N(t)} > s, S_{N(t)+1} > u}.

Solution: [Sketch: with age Z(t) and duration X̃(t) as the axes, the feasible region for (S_{N(t)}, S_{N(t)+1}) satisfies 0 ≤ Z(t) ≤ t and X̃(t) ≥ Z(t). The event {S_{N(t)} > s, S_{N(t)+1} > u} is the sub-region where Z(t) < t − s and X̃(t) > Z(t) + u − t; the incremental event {s < S_{N(t)} < s + δ, u < S_{N(t)+1} < u + δ} is a small δ × δ region at Z(t) = t − s, X̃(t) = u − s.]
d) Assume that for large t, dm(y) can be approximated (according to Blackwell) as (1/X̄)dy, where X̄ = E[X]. Assuming that X also has a density, use the result in parts b) and c) to find the joint density of age and duration.

Solution: We can use the expression in (A.116) to evaluate the probability of the trapezoidal region in (c). Assuming that δ is vanishingly small,

Pr{s < S_{N(t)} < s + δ, u < S_{N(t)+1} < u + δ} = f_X(u − s)(X̄)^{−1} δ^2.

Going to the limit δ → 0, the joint probability density is

f_{S_{N(t)} S_{N(t)+1}}(s, u) = f_X(u − s)/X̄.
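In age/duration coordinates this density is f_X(x)/X̄ for 0 < z < x, which can be checked by sampling the age and duration at widely spaced time points on one long sample path. The sketch below assumes uniform(0, 2) interarrivals (X̄ = 1, an arbitrary illustrative choice), so the limiting joint density is the constant 1/2 on 0 < z < x < 2.

```python
# Monte Carlo check of part (d): for uniform(0, 2) interarrivals, the
# limiting joint density of age Z(t) and duration X~(t) is f_X(x)/X_bar
# = 1/2 on 0 < z < x < 2.  We estimate Pr{Z in (0.2, 0.4), X~ in (1.0, 1.5)},
# whose predicted value is 0.5 * 0.2 * 0.5 = 0.05.
import random

rng = random.Random(5)
grid_step, T = 10.0, 1_000_000.0
next_grid, hits, samples = grid_step, 0, 0
s = 0.0                                   # most recent arrival epoch
u = rng.uniform(0.0, 2.0)                 # next arrival epoch
while next_grid < T:
    while u <= next_grid:                 # advance renewals past the grid point
        s, u = u, u + rng.uniform(0.0, 2.0)
    z, x = next_grid - s, u - s           # age and duration at the grid point
    if 0.2 < z < 0.4 and 1.0 < x < 1.5:
        hits += 1
    samples += 1
    next_grid += grid_step

print(hits / samples)                     # should be near 0.05
```

The grid spacing of 10 is much larger than any interarrival time, so successive samples are essentially independent.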
Exercise 5.41: Show that for a G/G/1 queue, the time-average wait in the system is the same as lim_{n→∞} E[W_n]. Hint: Consider an integer renewal counting process {M(n); n ≥ 0} where M(n) is the number of renewals in the G/G/1 process of Section 5.5.3 that have occurred by the nth arrival. Show that this renewal process has a span of 1. Then consider {W_n; n ≥ 1} as a reward within this renewal process.

Solution: The time-average wait in the system is what we have called the sample-path-average waiting time in Little's theorem. It is lim_{t→∞} [(Σ_{i=0}^{N(t)} W_i)/N(t)], and according to Little's theorem, the limit exists and is equal to a given number W̄ WP1. The question is whether this sample-path average is the same as the limiting ensemble average lim_{n→∞} E[W_n]. The hint shows that lim_{n→∞} E[W_n] exists, and Little's theorem implies that this ensemble average is equal to the sample-path average.
Exercise 5.42: If one extends the definition of renewal processes to include inter-renewal intervals of duration 0, with Pr{X = 0} = α, show that the expected number of simultaneous renewals at a renewal epoch is 1/(1 − α), and that, for a non-arithmetic process, the probability of 1 or more renewals in the interval (t, t + δ] tends to (1 − α)δ/E[X] + o(δ) as t → ∞.

Solution: A renewal epoch is a time at which one or more renewals occur. Such an epoch consists of one renewal with probability 1 − α, two with probability α(1 − α), etc. Summing the series, the expected number of renewals at a renewal epoch is 1/(1 − α).

As in the solution to Exercise 5.3, we can consider the related ordinary renewal process in which only the renewals after a nonzero interval are counted. In other words, the inter-renewal intervals form the sequence {Y_n; n ≥ 1} generated by crossing out all inter-renewal intervals equal to 0. Thus F_Y^c(y) = F_X^c(y)/F_X^c(0) = F_X^c(y)/(1 − α). Integrating the complementary CDF, we get the expectation, E[Y] = E[X]/(1 − α). Applying Blackwell's theorem to {Y_n; n ≥ 1}, the expected number of renewal epochs in (t, t + δ] tends to δ/Ȳ as t → ∞, and thus the probability of a renewal epoch tends to δ/Ȳ + o(δ) = (1 − α)δ/X̄ + o(δ).

Note that we could also apply Blackwell to {X_n; n ≥ 1}, but we couldn't get from there to the probability of an arrival in an interval.
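The geometric argument for the mean 1/(1 − α) is easy to verify empirically; the value α = 0.4 below is an arbitrary choice.

```python
# Check of the first claim in Exercise 5.42: with Pr{X = 0} = alpha, the
# number of simultaneous renewals at a renewal epoch is geometric with
# Pr{count = j} = alpha^{j-1}(1 - alpha), so its mean is 1/(1 - alpha).
import random

rng = random.Random(9)
alpha = 0.4
epochs = 100_000
total = 0
for _ in range(epochs):
    count = 1                     # the renewal ending the nonzero interval
    while rng.random() < alpha:   # each further zero interval w.p. alpha
        count += 1
    total += count

mean_simultaneous = total / epochs
predicted = 1.0 / (1.0 - alpha)   # = 5/3 here
print(mean_simultaneous, predicted)
```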
Exercise 5.43: The purpose of this exercise is to show why the interchange of expectation and sum in the proof of Wald's equality is justified when E[J] < ∞ but not otherwise. Let X_1, X_2, . . . be a sequence of IID rv's, each with the CDF F_X. Assume that E[|X|] < ∞.

a) Show that S_n = X_1 + · · · + X_n is a rv for each integer n > 0. Note: S_n is obviously a mapping from the sample space to the real numbers, but you must show that it is finite with probability 1. Hint: Recall the additivity axiom for the real numbers.

Solution: The additivity axiom for real numbers says that the sum of 2 real numbers is a real number (i.e., a finite real number). By induction, the sum of n real numbers is a real number. For n rv's, the sample values of each are finite WP1, and hence the sum is finite WP1. Exercise 1.13 treats this in more depth.
b) Let J be a stopping trial for X_1, X_2, . . . . Show that S_J = X_1 + · · · + X_J is a rv. Hint: Represent Pr{S_J ≤ A} as Σ_{n=1}^∞ Pr{J = n, S_n ≤ A}.

Solution: Since {S_J ≤ A} is clearly an event for each real A, it suffices to show that lim_{A→∞} Pr{S_J ≤ A} = 1. Taking the complementary event,

Pr{S_J > A} = Σ_{n=1}^∞ Pr{J = n, S_n > A} ≤ Σ_{n=1}^k Pr{S_n > A} + Pr{J > k}.

This is valid for all k ≥ 1, so for any given k, we can take the limit A → ∞ above to see that lim_{A→∞} Pr{S_J > A} ≤ Pr{J > k} for that k. Since J is a rv, we can now take the limit k → ∞ to see that lim_{A→∞} Pr{S_J > A} = 0. Thus S_J is a rv.
c) For the stopping trial J above, let J^(k) = min(J, k) be the stopping trial J truncated to integer k. Explain why the interchange of sum and expectation in the proof of Wald's equality is justified in this case, so E[S_{J^(k)}] = X̄ E[J^(k)].
Solution: A finite sum and expectation can always be interchanged if the expectations
exist (see Exercise 1.14).
d) In parts (d), (e), and (f), assume, in addition to the assumptions above, that F_X(0) = 0, i.e., that the X_i are positive rv's. Show that lim_{k→∞} E[S_{J^(k)}] < ∞ if E[J] < ∞ and lim_{k→∞} E[S_{J^(k)}] = ∞ if E[J] = ∞.

Solution: Since the X_i are positive rv's with E[|X|] < ∞, it follows that X̄ > 0. The rv J^(k) is non-decreasing in k and upper bounded by J. Thus E[J^(k)] is non-decreasing in k and upper bounded by E[J]. If E[J] < ∞, then lim_{k→∞} E[J^(k)] is also finite, so, from (c), lim_{k→∞} E[S_{J^(k)}] < ∞. If E[J] = ∞, then for any B, there is a k large enough that E[J^(k)] > B, which means that lim_{k→∞} E[J^(k)] = ∞. Thus, since X̄ > 0,

lim_{k→∞} E[S_{J^(k)}] = lim_{k→∞} X̄ E[J^(k)] = ∞        for E[J] = ∞.
e) Show that

Pr{S_{J^(k)} > x} ≤ Pr{S_J > x}        for all k, x.        (A.117)

Solution: Since S_n = S_{n−1} + X_n, we see that S_n > S_{n−1} for all n ≥ 1, and thus S_n > S_m for all n > m > 0. Since J^(k) ≤ J, it follows that S_J ≥ S_{J^(k)}. The desired result follows.

f) Show that E[S_J] = X̄ E[J] if E[J] < ∞ and E[S_J] = ∞ if E[J] = ∞.

Solution: Integrating (A.117) over x > 0, we see that E[S_{J^(k)}] ≤ E[S_J] and also that E[S_{J^(k)}] is increasing in k, which provides an independent derivation of some of the results in (d). Since the positive rv's S_{J^(k)} increase to S_J, we have E[S_{J^(k)}] → E[S_J]; combined with (c) and (d), this gives the desired result.
g) Now assume that X has both negative and positive values with non-zero probability and let X⁺ = max(0, X) and X⁻ = min(X, 0). Express S_J as S_J⁺ + S_J⁻ where S_J⁺ = Σ_{i=1}^J X_i⁺ and S_J⁻ = Σ_{i=1}^J X_i⁻. Show that E[S_J] = X̄ E[J] if E[J] < ∞ and that E[S_J] is undefined otherwise.

Solution: The result in (f) can be applied separately to {X_i⁺; i ≥ 1} and to {−X_i⁻; i ≥ 1}, giving the desired result.
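Wald's equality E[S_J] = X̄ E[J] is easy to check by simulation for a concrete stopping trial; the step distribution uniform(−0.5, 1.5) (so X̄ = 1/2) and the threshold rule below are arbitrary illustrative choices.

```python
# Monte Carlo check of Wald's equality E[S_J] = X_bar * E[J]: IID steps
# uniform on (-0.5, 1.5) (so X_bar = 0.5) with the stopping trial
# J = min{n : S_n >= 1}.  The stopping decision uses only X_1..X_n, so J
# is a valid stopping trial, and the positive drift makes E[J] finite.
import random

rng = random.Random(17)
trials = 100_000
sum_SJ = sum_J = 0.0
for _ in range(trials):
    s, n = 0.0, 0
    while s < 1.0:
        s += rng.uniform(-0.5, 1.5)
        n += 1
    sum_SJ += s
    sum_J += n

ratio = sum_SJ / sum_J     # estimates E[S_J]/E[J], which Wald says is X_bar
print(ratio)               # should be close to 0.5
```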
Exercise 5.44: This is a very simple exercise designed to clarify confusion about the roles of past, present, and future in stopping rules. Let {X_n; n ≥ 1} be a sequence of IID binary rv's, each with the pmf p_X(1) = 1/2, p_X(0) = 1/2. Let J be a positive integer-valued rv that takes on the sample value n of the first trial for which X_n = 1. That is, for each n ≥ 1,

{J = n} = {X_1 = 0, X_2 = 0, . . . , X_{n−1} = 0, X_n = 1}.
a) Use the definition of stopping trial, Definition 5.5.1, to show that J is a stopping trial for {X_n; n ≥ 1}.

Solution: Note that I_{J=n} = 1 if X_n = 1 and X_i = 0 for all i < n, and I_{J=n} = 0 otherwise. Thus I_{J=n} is a function of X_1, . . . , X_n. We also see that J is a positive integer-valued rv; that is, each sample sequence of {X_i; i ≥ 1} (except the zero-probability sequence of all zeros) maps into a positive integer.
b) Show that for any given n, the rv's X_n and I_{J=n} are statistically dependent.

Solution: Note that Pr{I_{J=n} = 1} = 2^{−n} and Pr{X_n = 1} = 1/2. The product of these is 2^{−(n+1)}, whereas Pr{X_n = 1, I_{J=n} = 1} = 2^{−n}, demonstrating statistical dependence. More intuitively, Pr{X_n = 1 | I_{J=n} = 1} = 1 while Pr{X_n = 1} = 1/2.
c) Show that for every m > n, X_n and I_{J=m} are statistically dependent.

Solution: For m > n, I_{J=m} = 1 implies that X_i = 0 for i < m, and thus that X_n = 0. Thus Pr{X_n = 1 | I_{J=m} = 1} = 0. Since Pr{X_n = 1} = 1/2, I_{J=m} and X_n are dependent.
d) Show that for every m < n, X_n and I_{J=m} are statistically independent.

Solution: Since I_{J=m} is a function of X_1, . . . , X_m and X_n is independent of X_1, . . . , X_m, it is clear that X_n is independent of I_{J=m}.
e) Show that X_n and I_{J≥n} are statistically independent. Give the simplest characterization you can of the event {J ≥ n}.

Solution: For the same reason that I_{J=m} is independent of X_n for m < n, we see that I_{J<n} is independent of X_n. Now {J ≥ n} = {J < n}^c, so I_{J≥n} = 1 − I_{J<n}. Thus {J ≥ n} is also independent of X_n. We give an intuitive explanation after (f).
f) Show that X_n and I_{J>n} are statistically dependent.

Solution: The event {J > n} implies that X_1, . . . , X_n are all 0, so Pr{X_n = 1 | J > n} = 0. Since Pr{X_n = 1} = 1/2, it follows that X_n and I_{J>n} are dependent.

It is important (and essentially the central issue of the exercise) to understand why (e) and (f) are different. Note that {J ≥ n} = {J = n} ∪ {J > n}. Now {J = n} implies that X_n = 1, whereas {J > n} implies that X_n = 0. The union, however, implies nothing about X_n, and is independent of X_n. The event {J ≥ n} is simply the event that X_1, . . . , X_{n−1} are all 0, and X_n then determines whether J = n or J > n.
Note: The results here are characteristic of most sequences of IID rv's. For most people, this requires some realignment of intuition, since {J ≥ n} is the union of {J = m} for all m ≥ n, and each of these events is highly dependent on X_n. The right way to think of this is that {J ≥ n} is the complement of {J < n}, which is determined by X_1, . . . , X_{n−1}. Thus {J ≥ n} is also determined by X_1, . . . , X_{n−1} and is thus independent of X_n. The moral of the story is that thinking of stopping rules as rv's independent of the future is very tricky, even in totally obvious cases such as this.
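Since the sample space here is finite for any fixed horizon, the dependence claims in (b), (e), and (f) can be verified exactly by enumeration; the horizon N = 5 and the choice n = 3 below are arbitrary.

```python
# Exhaustive check of Exercise 5.44 over all 2^5 equiprobable length-5
# binary sequences: I_{J=n} depends on X_n, I_{J>=n} is independent of
# X_n, and I_{J>n} depends on X_n (shown here for n = 3).
from itertools import product

N, n = 5, 3
outcomes = list(product([0, 1], repeat=N))           # all equally likely
prob = 1.0 / len(outcomes)

def J(seq):
    """First trial with X = 1 (N + 1 if the sequence is all zeros)."""
    for i, x in enumerate(seq, start=1):
        if x == 1:
            return i
    return N + 1

p_xn = sum(prob for w in outcomes if w[n-1] == 1)            # Pr{X_n = 1}
p_eq = sum(prob for w in outcomes if J(w) == n)              # Pr{J = n}
p_ge = sum(prob for w in outcomes if J(w) >= n)              # Pr{J >= n}
p_gt = sum(prob for w in outcomes if J(w) > n)               # Pr{J > n}
p_xn_eq = sum(prob for w in outcomes if w[n-1] == 1 and J(w) == n)
p_xn_ge = sum(prob for w in outcomes if w[n-1] == 1 and J(w) >= n)
p_xn_gt = sum(prob for w in outcomes if w[n-1] == 1 and J(w) > n)

print(p_xn_eq, p_xn * p_eq)   # dependent:  2^-n  vs  2^-(n+1)
print(p_xn_ge, p_xn * p_ge)   # independent: the two agree
print(p_xn_gt, p_xn * p_gt)   # dependent:  0  vs  a positive number
```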
Exercise 5.45: Assume a friend has developed an excellent program for finding the steady-state probabilities for finite-state Markov chains. More precisely, given the transition matrix [P], the program returns lim_{n→∞} P_{ii}^n for each i. Assume all chains are aperiodic.

a) You want to find the expected time to first reach a given state k starting from a different state m for a Markov chain with transition matrix [P]. You modify the matrix to [P′] where P′_{km} = 1, P′_{kj} = 0 for j ≠ m, and P′_{ij} = P_{ij} otherwise. How do you find the desired first-passage time from the program output given [P′] as an input? (Hint: The times at which a Markov chain enters any given state can be considered as renewals in a (perhaps delayed) renewal process.)
Solution: First assume k, m are in the same recurrent class in [P′]. Successive entries to state k form a delayed renewal process (delayed because the process starts in state m). By (5.106), i.e., Blackwell's theorem for delayed arithmetic renewal processes, the probability of an arrival at a limiting time n (i.e., lim_{n→∞} [P′]^n_{kk}) is 1/E[X_{kk}], where X_{kk} is the inter-renewal time between successive arrivals to state k. The first-passage time from m to k is E[X_{kk}] − 1, since the first transition after an arrival at k is the transition back to m. The friend's program gives us lim_{n→∞} [P′]^n_{kk}, so the first-passage time is (lim_{n→∞} [P′]^n_{kk})^{−1} − 1. Note
that successive arrivals to state m also form a renewal process, but not every return to state
m corresponds to a first passage time to k (as detailed in (b)).
If k and m are not in the same recurrent class in [P′], then the expected first-passage time is infinite, i.e., there is a positive probability of taking some path from m from which there is no path to k. Such a path must exist in the original chain also.
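The recipe in (a) can be checked on a small concrete chain; the 3-state matrix below is an arbitrary example, and the "program output" lim_n [P′]^n_kk is imitated by raising the modified matrix to a high power.

```python
# Check of part (a) on a 3-state chain: modify [P] so state k returns
# deterministically to m, estimate lim_n [P']^n_kk by matrix powering,
# and verify that 1/lim - 1 equals the expected first-passage time from
# m to k computed directly from v_i = 1 + sum_{j != k} P_ij v_j.
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.4, 0.2]]
m, k = 0, 2

Pmod = [row[:] for row in P]       # [P']: from k, go to m w.p. 1
Pmod[k] = [0.0] * len(P)
Pmod[k][m] = 1.0

Q = Pmod
for _ in range(200):               # a high power approximates the limit
    Q = mat_mul(Q, Pmod)
first_passage_via_program = 1.0 / Q[k][k] - 1.0

# Direct computation of the first-passage times by fixed-point iteration
# over the states other than k (states 0 and 1 here).
v = [0.0, 0.0]
for _ in range(500):
    v = [1.0 + P[0][0]*v[0] + P[0][1]*v[1],
         1.0 + P[1][0]*v[0] + P[1][1]*v[1]]

print(first_passage_via_program, v[m])   # both should be 4.8
```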
b) Using the same [P 0 ] as the program input, how can you find the expected number of returns to state m
before the first passage to state k?
Solution: Consider a (delayed) renewal reward process with unit reward on each arrival to state m and a renewal on each arrival to state k. The limiting probability of an arrival to m at time n is lim_{n→∞} [P′]^n_{mm}, which can be found from the program. From the renewal reward theorem, the expected reward per renewal is

E[reward per renewal] = lim_{n→∞} [P′]^n_{mm} / lim_{n→∞} [P′]^n_{kk}.

This is the expected number of visits to m per visit to k, and we are asked to find the number of returns to m per visit to k, which is one less than the value above. There is always at least one visit to m per visit to k, so this expected number of returns is always non-negative.
c) Suppose, for the same Markov chain [P] and the same starting state m, you want to find the probability of reaching some given state n before the first passage to k. Modify [P] to some [P″] so that the above program with [P″] as an input allows you to easily find the desired probability.

Solution: One solution is to modify the chain so that n always goes to k and k always goes to m, i.e., P″_{nk} = 1, P″_{nj} = 0 for j ≠ k, P″_{km} = 1, and P″_{kj} = 0 for j ≠ m. This doesn't alter any first-passage path from m to k that doesn't pass through n, and doesn't alter any first-passage path from m to n that doesn't first pass through k. Assigning a unit reward for each visit to n and noting that at most one visit to n can occur per inter-renewal interval, we see that the probability of visiting n before k is the same as the expected reward per inter-renewal interval. Thus

Pr{n before k} = lim_{ℓ→∞} [P″]^ℓ_{nn} / lim_{ℓ→∞} [P″]^ℓ_{kk}.
d) Let Pr{X(0) = i} = Q_i, 1 ≤ i ≤ M, be an arbitrary set of initial probabilities for the same Markov chain [P] as above. Show how to modify [P] to some [P‴] for which the steady-state probabilities allow you to easily find the expected time of the first passage to state k.

Solution: Essentially, we want to modify the matrix [P] so that after each visit to k, the chain reverts to its initial state. This suggests P‴_{ki} = Q_i for each i, with the other transitions unchanged. This doesn't quite work, since k might be one of the initial states, and in finding the first-passage time from k we must use P_{ki} rather than P‴_{ki}. The solution to this is the same as was used in Exercise 4.22: the state k is split into two states, one with the original transition probabilities and the other with P‴_{ki} = Q_i.
Exercise 5.46: Consider a ferry that carries cars across a river. The ferry holds an integer number k
of cars and departs the dock when full. At that time, a new ferry immediately appears and begins loading
newly arriving cars ad infinitum. The ferry business has been good, but customers complain about the long
wait for the ferry to fill up.
a) Assume that cars arrive according to a renewal process. The IID interarrival times have mean X̄, variance σ², and moment generating function g_X(r). Does the sequence of departure times of the ferries form a renewal process? Explain carefully.
Solution: Yes, the ferry departure times form a renewal process. The reason is that the ℓth ferry departure is immediately after the (kℓ)th customer arrival. The time from the ℓth to the (ℓ+1)th ferry departure is the time from the (kℓ)th to the (k(ℓ+1))th customer arrival, which is clearly independent of all previous ferry departure times.
b) Find the expected time that a customer waits, starting from its arrival at the ferry terminal and ending
at the departure of its ferry. Note 1: Part of the problem here is to give a reasonable definition of the
expected customer waiting time. Note 2: It might be useful to consider k = 1 and k = 2 first.
Solution: For k = 1, the ferry leaves immediately when a customer arrives, so the expected waiting time for each customer is 0. For k = 2, odd-numbered customers wait for the following even-numbered customer, and even-numbered customers don't wait at all, so the average waiting time over customers (which we take as the only sensible definition of expected waiting time) is X̄/2.
We next find the expected waiting time, averaged over customers, for the ℓth ferry. To simplify notation, we look at the first ferry; later ferries could be shown to be the same by renewal-reward theory, but more simply they are the same by the same argument with added notation. The average expected wait over the k customers is the sum of their expected waits divided by k (recall that this is true even if, as here, the waits of different customers are statistically dependent). The expected wait of customer 1 is (k−1)X̄, that of customer 2 is (k−2)X̄, etc. Recall (or derive) that 1 + 2 + · · · + (k−1) = (k−1)k/2. Thus the expected wait per customer is (k−1)X̄/2, which checks with the result for k = 1 and k = 2.
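The (k−1)X̄/2 result is easy to check by simulation; the sketch below uses hypothetical parameters (exponential interarrivals of mean 2 and k = 4), since the formula holds for any interarrival distribution:

```python
import random

def avg_ferry_wait(k, mean_x, num_ferries=200_000, seed=1):
    """Monte Carlo estimate of the average wait per customer when each
    ferry departs at the kth customer arrival after it starts loading."""
    rng = random.Random(seed)
    total_wait = 0.0
    for _ in range(num_ferries):
        t = 0.0
        arrivals = []
        for _ in range(k):
            arrivals.append(t)
            t += rng.expovariate(1.0 / mean_x)   # IID interarrival times
        depart = arrivals[-1]                    # ferry is full at the kth arrival
        total_wait += sum(depart - a for a in arrivals)
    return total_wait / (num_ferries * k)

k, mean_x = 4, 2.0
print(avg_ferry_wait(k, mean_x))    # theory: (k - 1) * mean_x / 2 = 3.0
```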
c) Is there a 'slow truck' phenomenon (a dependence on E[X²]) here? Give an intuitive explanation. Hint: Look at k = 1 and k = 2 again.
Solution: Clearly, there is no 'slow truck' phenomenon for the ferry wait here, since the expected wait per customer is (k−1)X̄/2. The reason is most evident for k = 1, where the wait is 0. The arrival of a car at the ferry terminal could have been delayed by an arbitrary amount by a slow truck in getting to the ferry terminal, but it is not delayed at all in getting on the ferry, since the slow truck took an earlier ferry. For larger k, a vehicle could be delayed by a later-arriving truck, but at most k−1 vehicles could be delayed that way, while the E[X²] slow-truck effect arises from the potentially unbounded number of customers delayed by a slow truck.
d) In an effort to decrease waiting, the ferry managers institute a policy where no customer ever has to wait
more than one hour. Thus, the first customer to arrive after a ferry departure waits for either one hour or
the time at which the ferry is full, whichever comes first, and then the ferry leaves and a new ferry starts
to accumulate new customers. Does the sequence of ferry departures form a renewal process under this new
system? Does the sequence of times at which each successive empty ferry is entered by its first customer
form a renewal process? You can assume here that t = 0 is the time of the first arrival to the first ferry.
Explain carefully.
Solution: The sequence of ferry departures does not form a renewal process (nor a delayed renewal process). As an example, suppose that k = 2 and X is 1 minute with probability 9/10 and 2.1 hours with probability 1/10. Then a ferry interdeparture interval of 2 minutes followed by one of an hour and one minute implies that the second departure was generated by a one-minute interarrival followed by a 2.1-hour interarrival. The second ferry then left 1 hour after the one-minute interarrival, with 1.1 hours remaining until the 2.1-hour interarrival completes. Thus the third ferry departure comes more than 1.1 hours after the second departure, so the interdeparture intervals are not independent.

The times at which each successive empty ferry is entered by its first customer do form a renewal process, since customer arrivals form a renewal process.
e) Give an expression for the expected waiting time of the first new customer to enter an empty ferry under
this new strategy.
Solution: The first new customer waits for a time T (in hours) which is the minimum of 1 hour and X₂ + · · · + X_k. This latter term has the CDF of the sum S_{k−1} of k−1 IID rv's with the distribution of X. Since the expected value of a nonnegative rv is the integral of its complementary CDF,

E[T] = ∫_0^1 [1 − F_{S_{k−1}}(s)] ds.
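To make the expression concrete, suppose (as an added assumption, not part of the exercise) that the interarrival times are exponential with rate λ, so that S_{k−1} is Erlang and the integral can be evaluated numerically:

```python
import math

def expected_wait_first_customer(k, lam, steps=100_000):
    """E[T] = integral over [0,1] of [1 - F_{S_{k-1}}(s)] ds, where
    S_{k-1} ~ Erlang(k-1, lam) is the sum of the k-1 exponential
    interarrival times after the first customer; T is capped at 1 hour."""
    def comp_cdf(s):
        # Pr{S_{k-1} > s} for an Erlang(k-1, lam) rv
        return sum(math.exp(-lam * s) * (lam * s) ** m / math.factorial(m)
                   for m in range(k - 1))
    h = 1.0 / steps
    # midpoint rule on [0, 1]
    return h * sum(comp_cdf((i + 0.5) * h) for i in range(steps))

# k = 2: T = min(1, X2) with X2 exponential, so E[T] = (1 - e^{-lam})/lam
print(expected_wait_first_customer(k=2, lam=3.0))
```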
Exercise 5.47: Consider a G/G/∞ 'queueing' system. That is, the arriving customers form a renewal process, i.e., the interarrival intervals {X_n; n ≥ 1} are IID. You may assume throughout that E[X] < ∞. Each arrival immediately enters service; there are infinitely many servers, so one is immediately available for each arrival. The service time Y_i of the ith customer is a rv of expected value Ȳ < ∞ and is IID with all other service times and independent of all interarrival times. There is no queueing in such a system, and one can easily intuit that the number in service never becomes infinite, since there are always available servers.
[Figure: a sample path of the system, showing the arrival counting process N(t), the service intervals Y₁, Y₂, Y₃, Y₄ beginning at the arrival epochs S₁, S₂, . . . , and the number in service L(t).]
a) Give a simple example of distributions for X and Y in which this system never becomes empty. Hint:
Deterministic rv’s are fair game.
Solution: Suppose, for example, that the arrivals are deterministic, one arrival each second, and that each service takes 2 seconds. Then when the second arrival occurs, the first service is only half over. When the third customer arrives, the first service is just finishing, but the second is only half finished, etc. This also illustrates the unusual behavior of systems with an unlimited number of servers: if the arrival rate is increased, there are simply more servers at work simultaneously; there is no possibility of the system becoming overloaded. The example here can be made less trivial by making the interarrival distribution arbitrary but upper bounded by 1 second. The service distribution could similarly be arbitrary but lower bounded by 2 seconds.
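A minimal sketch of the deterministic example (arrivals at 1, 2, 3, . . . seconds and fixed 2-second services, matching the setup above):

```python
def num_in_service(t, interarrival=1.0, service=2.0):
    """Number of customers in service at time t for deterministic
    arrivals at 1, 2, 3, ... and fixed 2-second services (the first
    arrival is at time interarrival, matching the exercise setup)."""
    count = 0
    n = 1
    while n * interarrival <= t:
        if n * interarrival + service > t:   # customer n still in service at t
            count += 1
        n += 1
    return count

# After the second arrival, the system is never empty:
print([num_in_service(t) for t in [0.5, 1.5, 2.5, 3.5, 10.5]])  # → [0, 1, 2, 2, 2]
```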
b) We want to prove Little's theorem for this type of system, but there are no renewal instants for the entire system. As illustrated above, let N(t) be the renewal counting process for the arriving customers and let L(t) be the number of customers in the system (i.e., receiving service) at time t. In distinction to our usual view of queueing systems, assume that there is no arrival at time 0 and the first arrival occurs at time S₁ = X₁. The nth arrival occurs at time S_n = X₁ + · · · + X_n.

Carefully explain why, for each sample point ω and each time t > 0,

∫_0^t L(τ, ω) dτ ≤ Σ_{i=1}^{N(t,ω)} Y_i(ω).
Solution: For the case in the figure above, ∫_0^t L(τ, ω) dτ is Y₁, plus Y₂, plus the part of Y₃ on which service has been completed by t. Thus ∫_0^t L(τ, ω) dτ ≤ Y₁ + Y₂ + Y₃. Since N(t) = 3 for this example, ∫_0^t L(τ, ω) dτ ≤ Σ_{i=1}^{N(t)} Y_i.

In general, Σ_{i=1}^{N(t)} Y_i is the total amount of service required by all customers arriving before or at t, while ∫_0^t L(τ, ω) dτ is the amount of service already provided by t. Since not all the required service has necessarily been provided by t, the inequality is as given.
c) Find the limit as t → ∞ of (1/t) Σ_{i=1}^{N(t,ω)} Y_i(ω) and show that this limit exists WP1.
Solution: This is virtually the same as many limits we have taken. The trick is to multiply and divide by N(t, ω) and then use the SLLN and the strong law for renewals, i.e.,

lim_{t→∞} (1/t) Σ_{i=1}^{N(t,ω)} Y_i(ω) = lim_{t→∞} [N(t, ω)/t] [1/N(t, ω)] Σ_{i=1}^{N(t,ω)} Y_i(ω).

For a given ω, N(t, ω)/t is a real-valued function of t which approaches 1/X̄ for all ω except perhaps a set of probability 0. For each of those sample points, lim_{t→∞} N(t, ω) = ∞, and thus the final factor above has the following limit WP1:

lim_{t→∞} [1/N(t, ω)] Σ_{i=1}^{N(t,ω)} Y_i(ω) = lim_{n→∞} (1/n) Σ_{i=1}^{n} Y_i(ω).

Since the Y_i are IID, the SLLN says that this limit is Ȳ WP1. Also, lim_{t→∞} N(t, ω)/t = 1/X̄ WP1. The set of ω for which both limits exist then has probability 1, so

lim_{t→∞} (1/t) Σ_{i=1}^{N(t,ω)} Y_i(ω) = Ȳ/X̄

for a set of ω of probability 1.
d) Assume that the service time distribution is bounded between 0 and some b > 0, i.e., that F_Y(b) = 1. Carefully explain why

∫_0^{t+b} L(τ, ω) dτ ≥ Σ_{i=1}^{N(t,ω)} Y_i(ω).
Solution: This is almost the same as (b). All customers that have entered service by t have completed service by t + b, and thus the difference between the two terms above is the service provided by t + b to customers that have arrived between t and t + b.
e) Find the limit as t → ∞ of (1/t) ∫_0^t L(τ, ω) dτ and indicate why this limit exists WP1.
Solution: Combining the results of b) and d),

Σ_{i=1}^{N(t,ω)} Y_i(ω) ≤ ∫_0^{t+b} L(τ, ω) dτ ≤ Σ_{i=1}^{N(t+b,ω)} Y_i(ω).

Dividing by t and going to the limit as t → ∞, the limit on the left is Ȳ/X̄ WP1, and the limit on the right, after multiplying and dividing by t + b, is the same. Thus,

lim_{t→∞} (1/t) ∫_0^{t+b} L(τ, ω) dτ = lim_{t→∞} [(t+b)/t] [1/(t+b)] ∫_0^{t+b} L(τ, ω) dτ = Ȳ/X̄.

This is Little's theorem for the case of a service-time rv Y that is truncated to b, and a truncation argument establishes Little's theorem for arbitrary Y.
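The Ȳ/X̄ time average can be checked numerically. The sketch below assumes Poisson arrivals and uniform (hence bounded) service times, which are illustrative choices only; the integral of L over [0, T] is computed exactly by summing the portion of each service interval lying in [0, T]:

```python
import random

def time_avg_in_service(T, lam, mean_y, seed=0):
    """Compute (1/T) * integral of L(t) over [0, T] for one sample path:
    each customer arriving at time t with service time y contributes
    min(y, T - t) to the integral of the number-in-service L."""
    rng = random.Random(seed)
    t = rng.expovariate(lam)          # first arrival at S1 = X1 (no arrival at 0)
    area = 0.0
    while t <= T:
        y = rng.uniform(0.0, 2.0 * mean_y)   # bounded service times, mean mean_y
        area += min(y, T - t)
        t += rng.expovariate(lam)
    return area / T

lam, mean_y = 2.0, 0.5
print(time_avg_in_service(200_000.0, lam, mean_y))   # theory: Ybar/Xbar = lam * mean_y = 1.0
```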
A.6 Solutions for Chapter 6
Exercise 6.1: Let {P_ij; i, j ≥ 0} be the set of transition probabilities for a countable-state Markov chain. For each i, j, let F_ij(n) be the probability that state j occurs sometime between time 1 and n inclusive, given X₀ = i. For some given j, assume that {x_i; i ≥ 0} is a set of nonnegative numbers satisfying x_i = P_ij + Σ_{k≠j} P_ik x_k for all i ≥ 0. Show that x_i ≥ F_ij(n) for all n and i, and hence that x_i ≥ F_ij(∞) for all i. Hint: use induction.
Solution: We use induction on n. As the basis for the induction, we know that F_ij(1) = P_ij. Since the x_i are by assumption nonnegative, it follows for all i that

F_ij(1) = P_ij ≤ P_ij + Σ_{k≠j} P_ik x_k = x_i.

For the inductive step, assume that F_ij(n) ≤ x_i for a given n and all i. Using (6.8),

F_ij(n+1) = P_ij + Σ_{k≠j} P_ik F_kj(n) ≤ P_ij + Σ_{k≠j} P_ik x_k = x_i     for all i.

By induction, it then follows that F_ij(n) ≤ x_i for all i, n. From (6.7), F_ij(n) is nondecreasing in n and upper bounded by 1. It thus has a limit, F_ij(∞), which satisfies F_ij(∞) ≤ x_i for all i.

One solution to the set of equations x_i = P_ij + Σ_{k≠j} P_ik x_k is x_i = 1 for all i. Another (from (6.9)) is x_i = F_ij(∞). These solutions are different when F_ij(∞) < 1, and this exercise then shows that the solution {F_ij(∞); i ≥ 0} is the smallest of all possible solutions.
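The recursion F_ij(n+1) = P_ij + Σ_{k≠j} P_ik F_kj(n) can be run numerically to watch it converge to the smallest solution. The sketch below uses the birth-death chain of Figure 6.2 with p = 0.6 (truncated to finitely many states for computation), for which Exercise 6.2 gives F_i0(∞) = (q/p)^i:

```python
def smallest_solution_F(p, num_states=200, iters=2000):
    """Iterate F_i0(n+1) = P_i0 + sum_{k != 0} P_ik F_k0(n) for the
    birth-death chain of Figure 6.2 (self loop q at 0, up p, down q),
    truncated to num_states states; starts from F_i0(0) = 0."""
    q = 1 - p
    F = [0.0] * num_states
    for _ in range(iters):
        new = [0.0] * num_states
        new[0] = q + p * F[1]                     # P_00 = q, P_01 = p
        for i in range(1, num_states - 1):
            new[i] = (q if i == 1 else q * F[i - 1]) + p * F[i + 1]
        new[-1] = q * F[-2]                       # truncation boundary (slight underestimate)
        F = new
    return F

F = smallest_solution_F(0.6)
print([round(F[i], 4) for i in range(1, 5)])
# theory: (q/p)**i for i = 1..4, i.e., 2/3, 4/9, 8/27, 16/81
```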
Exercise 6.2: a) For the Markov chain in Figure 6.2, show that, for p ≥ 1/2, F_00(∞) = 2(1 − p) and show that F_i0(∞) = [(1 − p)/p]^i for i ≥ 1. Hint: first show that this solution satisfies (6.9), and then show that (6.9) has no smaller solution (Exercise 6.1 shows that F_ij(∞) is the smallest solution to (6.9)). Note that you have shown that the chain is transient for p > 1/2 and that it is recurrent for p = 1/2.
Solution: Note that the diagram in Figure 6.2 implicitly assumes p < 1, and we assume that here, since the p = 1 case is trivial.

The hint provides a straightforward algebraic solution, but provides little insight. It may be helpful, before carrying that solution out, to solve the problem by the more intuitive method used in the 'stop when you're ahead' example, Example 5.4.4. This method also derives the values of F_i0(∞) rather than just verifying that they form a solution.

Let F_10(∞) be denoted by α. This is the probability of ever reaching state 0 starting from state 1. This probability is unchanged by converting state 0 to a trapping state, since, when starting in state 1, transitions from state 0 can occur only after reaching state 0.

In the same way, F_{i,i−1}(∞) for any i > 1 is unchanged by converting state i−1 into a trapping state, and the modified Markov chain, for states k ≥ i−1, is then the same as the modified chain that uses state 0 as a trapping state. Thus F_{i,i−1}(∞) = α.
We can proceed from this to calculate F_{i,i−2}(∞) for any i > 2. In order to access state i−2 from state i, it is necessary to first access state i−1. Given a first passage from i to i−1 (an event of probability α), it is necessary to have a subsequent first passage from i−1 to i−2, so F_{i,i−2}(∞) = α². Iterating on this, F_i0(∞) = α^i.

We can now solve for α by using (6.9) for i = 1, j = 0: F_10(∞) = q + pF_20(∞). Thus α = q + pα². This is a quadratic equation in α with the solutions α = 1 and α = q/p. We then know that F_10(∞) has either the value 1 or q/p. For whichever value of α is correct, we also have F_i0(∞) = α^i for i ≥ 1. Finally, from (6.9) for j = 0, i = 0, F_00(∞) = q + pα.

From the reasoning above, we know that these two possible solutions are the only possibilities. If both of them satisfy (6.9), then, from Exercise 6.1, the one with α = q/p is the correct one, since it is the smaller solution to (6.9). We now show that the solution with α = q/p satisfies (6.9). This is the first part of what the question asks, but it is now unnecessary to also show that this is the smallest solution.
If we substitute the hypothesized solution,

F_00(∞) = 2q;     F_i0(∞) = (q/p)^i  for i > 0,     (A.118)

into (6.9) for j = 0, we get the hypothesized set of equalities,

2q = q + p(q/p)                              for i = 0,
q/p = q + p(q/p)²                            for i = 1,
(q/p)^i = q(q/p)^{i−1} + p(q/p)^{i+1}        for all i ≥ 2.

The first of these is clearly an identity, and the third is seen to be an identity by rewriting the right side as p(q/p)^i + q(q/p)^i. The second is an identity by almost the same argument. Thus (A.118) satisfies (6.9) and thus gives the correct solution.
For those who dutifully took the hint directly, it is still necessary to show that (A.118) is the smallest solution to (6.9). Let x_i abbreviate F_i0(∞) for i ≥ 0. Then (6.9) for j = 0 can be rewritten as

x_0 = q + px_1,
x_1 = q + px_2,
x_i = qx_{i−1} + px_{i+1}     for i ≥ 2.

The equation for each i ≥ 2 can be rearranged into the alternate form,

x_{i+1} − (q/p)x_i = x_i − (q/p)x_{i−1}.     (A.119)

For i = 1, the similar rearrangement is x_2 − (q/p)x_1 = x_1 − q/p. Now consider the possibility of a solution to these equations with x_1 < q/p, say x_1 = (q/p) − β with β > 0. Recursing on the equations (A.119), we have

x_{i+1} − (q/p)x_i = −β     for i ≥ 1.     (A.120)

For the case q = p = 1/2, this becomes x_{i+1} − x_i = −β. Thus for sufficiently large i, x_i becomes negative, so there can be no nonnegative solution of (6.9) with x_1 < q/p.
For the case 1/2 < p < 1, (A.120) leads to

x_{i+1} = (q/p)x_i − β ≤ (q/p)² x_{i−1} − β ≤ · · · ≤ (q/p)^i x_1 − β.

For large enough i, this shows that x_{i+1} is negative, showing that no nonnegative solution exists with x_1 < q/p.
b) Under the same conditions as (a), show that F_ij(∞) equals 2(1 − p) for j = i, equals [(1 − p)/p]^{i−j} for i > j, and equals 1 for i < j.
Solution: In the first part of the solution to (a), we used a trapping-state argument to show that F_{i,i−1}(∞) = F_10(∞) for each i > 1. That same argument shows that F_ij(∞) = F_{i−j,0}(∞) for all i > j. Thus F_ij(∞) = (q/p)^{i−j} for i > j.

Next, for i < j, consider converting state j into a trapping state. This does not alter F_ij(∞) for i < j, but converts the states 0, 1, . . . , j into a finite-state Markov chain with a single recurrent state, j. Thus F_ij(∞) = 1 for i < j.

Finally, for i = j > 1, (6.9) says that

F_ii(∞) = qF_{i−1,i}(∞) + pF_{i+1,i}(∞) = q + pF_10(∞) = 2q,

where we have used the earlier results for i < j and i > j. The case of F_11(∞) is virtually the same.
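These closed forms are easy to check by Monte Carlo; a sketch with p = 0.6 (runs are truncated at a large step count, which is essentially harmless here since a walk that has not yet hit its target has almost surely drifted upward for good):

```python
import random

def est_F(i, j, p, trials=10_000, max_steps=2_000, seed=0):
    """Monte Carlo estimate of F_ij(infinity) for the chain of Figure 6.2:
    the probability of ever visiting j (at time >= 1) starting in i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        state = i
        for _ in range(max_steps):
            if state == 0:
                # self loop with probability q = 1 - p, up with probability p
                state = 1 if rng.random() < p else 0
            else:
                state = state + 1 if rng.random() < p else state - 1
            if state == j:
                hits += 1
                break
    return hits / trials

p = 0.6
f31 = est_F(3, 1, p)      # theory: (q/p)**(3-1) = 4/9
f22 = est_F(2, 2, p)      # theory: 2q = 0.8
print(f31, f22)
```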
Exercise 6.3 a): Show that the nth order transition probabilities, starting in state 0, for the Markov chain in Figure 6.2 satisfy

P^n_{0j} = pP^{n−1}_{0,j−1} + qP^{n−1}_{0,j+1},  for j ≠ 0;     P^n_{00} = qP^{n−1}_{00} + qP^{n−1}_{01}.

Hint: Use the Chapman-Kolmogorov equality, (4.7).

Solution: This is the Chapman-Kolmogorov equality in the form P^n_{ij} = Σ_k P^{n−1}_{ik} P_{kj}, where P_{00} = q, P_{k,k+1} = p for all k ≥ 0, and P_{k,k−1} = q for all k ≥ 1; P_{kj} = 0 otherwise.
b) For p = 1/2, use this equation to calculate P^n_{0j} iteratively for n = 1, 2, 3, 4. Verify (6.3) for n = 4 and then use induction to verify (6.3) in general. Note: this becomes an absolute mess for p ≠ 1/2, so don't attempt this in general.
Solution: This is less tedious if organized as an array of terms. For each n, each term P^n_{0j} (except P^n_{00}) is half the term diagonally above and to the left, plus half the term diagonally above and to the right; P^n_{00} is half the term above plus half the term diagonally above and to the right.
The resulting array, with each row computed from the previous one as just described, is

  j :        0      1      2      3      4
  P^1_{0j}:  1/2    1/2
  P^2_{0j}:  1/2    1/4    1/4
  P^3_{0j}:  3/8    3/8    1/8    1/8
  P^4_{0j}:  3/8    1/4    1/4    1/16   1/16

Comparing the n = 4 row with (6.3), we see that they agree for each j; the other terms are similar. We also see (with less work) that (6.3) is valid for 1 ≤ n ≤ 3 for all j, 0 ≤ j ≤ n. We can avoid fussing with the constraint j ≤ n by following the convention that the binomial coefficient C(n, k) = 0 for all n ≥ 1 and k > n. We have then shown that (6.3) is valid for 1 ≤ n ≤ 4 and all j ≥ 0.
We next use induction to validate (6.3) in general. From the previous calculation, any n, 1 ≤ n ≤ 4, can be used as the basis of the induction, and then, given the assumption that (6.3) is valid for a given n > 0 and all j ≥ 0, we will prove that (6.3) is also valid for n+1 and all j ≥ 0. Initially, for given n, we assume j > 0; the case j = 0 is a little different since it has self transitions, so we consider it later.

For the subcase where n + j is even, we have

P^{n+1}_{0j} = (1/2)[P^n_{0,j−1} + P^n_{0,j+1}]
            = (1/2)[C(n, (j+n)/2) 2^{−n} + C(n, (j+n)/2 + 1) 2^{−n}]
            = C(n+1, (j+n)/2 + 1) 2^{−(n+1)}.

The first equality comes from part (a) with p = q = 1/2. The second equality uses (6.3) for the given n along with the fact that n+j−1 and n+j+1 are odd. The final equality follows immediately from the combinatorial identity

C(n, k) + C(n, k+1) = C(n+1, k+1).

This identity follows by viewing C(n+1, k+1) as the number of ways to arrange k+1 ones in a binary (n+1)-tuple. These arrangements can be separated into those that start with a one, followed by k ones in the remaining n positions, and those that start with a zero, followed by k+1 ones in the remaining n positions. The final result for P^{n+1}_{0j} above is for n+1+j odd and agrees with the odd case in (6.3) after substituting n+1 for n in (6.3). This validates (6.3) for this case. The subcase where n+j is odd is handled in the same way.
For the case j = 0, we use the second portion of part (a), namely P^{n+1}_{00} = (1/2)P^n_{00} + (1/2)P^n_{01}. For the subcase where n is even, we then have

P^{n+1}_{00} = (1/2)[C(n, n/2) 2^{−n} + C(n, n/2 + 1) 2^{−n}]
            = C(n+1, n/2 + 1) 2^{−(n+1)}.

In the first equality, we have used (6.3) with n+j even for j = 0 and with n+j odd for j = 1. The second equality uses the combinatorial identity above. This result agrees with the odd case in (6.3) after substituting n+1 for n. Finally, the subcase where n is odd is handled in almost the same way, except that one must recognize that C(n, (n+1)/2) = C(n, (n−1)/2).
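Both the recursion of part (a) and the closed form (6.3) are easy to program, so the induction can be cross-checked numerically; a sketch:

```python
from math import comb

def p0j_iterative(nmax):
    """Iterate P^n_{0j} = (P^{n-1}_{0,j-1} + P^{n-1}_{0,j+1})/2 for j > 0
    and P^n_{00} = (P^{n-1}_{00} + P^{n-1}_{01})/2, i.e., part (a) with
    p = q = 1/2; returns the rows for n = 1, ..., nmax."""
    size = nmax + 2
    P = [0.0] * size
    P[0] = 1.0                       # chain starts in state 0
    rows = []
    for _ in range(nmax):
        new = [0.0] * size
        new[0] = 0.5 * (P[0] + P[1])
        for j in range(1, size - 1):
            new[j] = 0.5 * (P[j - 1] + P[j + 1])
        P = new
        rows.append(P[:])
    return rows

def p0j_closed(n, j):
    """Closed form (6.3): C(n,(j+n)/2) 2^-n for j+n even,
    C(n,(j+n+1)/2) 2^-n for j+n odd, and 0 for j > n."""
    if j > n:
        return 0.0
    k = (j + n) // 2 if (j + n) % 2 == 0 else (j + n + 1) // 2
    return comb(n, k) / 2 ** n

rows = p0j_iterative(8)
for n in range(1, 9):
    for j in range(n + 1):
        assert abs(rows[n - 1][j] - p0j_closed(n, j)) < 1e-12
print("recursion matches (6.3) for n = 1, ..., 8")
```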
(c) As a more interesting approach, which brings out the relationship of Figures 6.2 and 6.1, note that (6.3), with j + n even, is the probability that S_n = j for the chain in 6.1. Similarly, (6.3) with j + n odd is the probability that S_n = −j − 1 for the chain in 6.1. By viewing each transition over the self loop at state 0 as a sign reversal for the chain in 6.1, explain why this surprising result is true. (Again, this doesn't work for p ≠ 1/2, since the sign reversals also reverse the +1, −1 transitions.)
Solution: The meaning of the hint will be less obscure if we redraw Figure 6.1 in the following way:

[Figure 6.1 redrawn: the Bernoulli chain on states . . . , −3, −2, −1, 0, 1, 2, . . . , with all transition probabilities 1/2, drawn in two rows so that each nonnegative state i lies directly above state −i−1; alternate columns are labeled 'even' and 'odd'.]
Compare this with the discrete M/M/1 chain of Figure 6.2:

[Figure 6.2 redrawn: the chain on states 0, 1, 2, . . . with a self loop of probability 1/2 at state 0 and transition probabilities 1/2 in each direction between neighbors.]
Note that Figure 6.2 can be viewed as the result of combining each nonnegative state i in the redrawn Figure 6.1 with the state −i−1 lying directly beneath it. To be more specific, the transition probability from state 0 in Fig. 6.1 to the aggregated states 1 and −2 is 1/2, and the transition probability to the aggregated states 0 and −1 is 1/2. The same transition probabilities hold for state −1. Similarly, starting from any state i in Fig. 6.1, there is a transition probability 1/2 to the aggregated states i+1 and −i−2, and 1/2 to the aggregated states i−1 and −i. The same aggregated transition probabilities hold starting from state −i−1.

What this means is that the set of aggregated pairs forms a Markov chain in its own right, and this Markov chain is the M/M/1 chain of Fig. 6.2. The nth order transition probabilities P^n_{0i} for the M/M/1 chain are thus the same as the nth order aggregate transition probabilities, say Q^n_{0i} + Q^n_{0,−i−1}, for the Bernoulli chain. Since the Bernoulli chain is periodic with period 2, however, and each pair of states consists of one even and one odd term, only one of these aggregated terms is nonzero. This helps explain the strange-looking difference in part (b) between n + i even and odd.
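The aggregation identity P^n_{0i} = Q^n_{0i} + Q^n_{0,−i−1} can be verified numerically by propagating both chains; a sketch for p = q = 1/2:

```python
def bernoulli_nstep(n):
    """n-step probabilities Q^n_{0m} for the simple random walk of
    Figure 6.1: steps +1 or -1, each with probability 1/2."""
    Q = {0: 1.0}
    for _ in range(n):
        new = {}
        for m, pr in Q.items():
            new[m + 1] = new.get(m + 1, 0.0) + 0.5 * pr
            new[m - 1] = new.get(m - 1, 0.0) + 0.5 * pr
        Q = new
    return Q

def mm1_nstep(n, size=50):
    """n-step probabilities P^n_{0j} for the M/M/1 chain of Figure 6.2
    with p = q = 1/2 (self loop at state 0)."""
    P = [0.0] * size
    P[0] = 1.0
    for _ in range(n):
        new = [0.0] * size
        new[0] = 0.5 * (P[0] + P[1])
        for j in range(1, size - 1):
            new[j] = 0.5 * (P[j - 1] + P[j + 1])
        P = new
    return P

n = 10
Q = bernoulli_nstep(n)
P = mm1_nstep(n)
for i in range(n + 1):
    agg = Q.get(i, 0.0) + Q.get(-i - 1, 0.0)   # aggregate the pair (i, -i-1)
    assert abs(P[i] - agg) < 1e-12
print("P^n_{0i} equals Q^n_{0i} + Q^n_{0,-i-1} for n = 10")
```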
Exercise 6.4: Let j be a transient state in a Markov chain and let j be accessible from i. Show that i
is transient also. Interpret this as a form of Murphy’s law (if something bad can happen, it will, where the
bad thing is the lack of an eventual return). Note: give a direct demonstration rather than using Theorem
6.2.5.
Solution: It is instructive to start off using Theorem 6.2.5, and after that, we give two solutions without it. First, using the given path from state i to j, we use contradiction, showing that the assumption that state i is recurrent implies the contradiction that j is recurrent. This contradiction will imply that i is not recurrent, i.e., transient. From Lemma 6.2.3 (with the roles of i and j reversed), i recurrent implies that F_ji(∞) = 1. This in turn implies that there is a path from j to i, so that i and j are in the same class. From Theorem 6.2.5, this implies that j is recurrent, which is a contradiction. Thus state i must be transient.

The second proof will use contradiction in the same way. If i is recurrent, then the Markov chain generates a renewal process with renewals on occurrences of state i. From Theorem 6.2.6, part 3, lim_{t→∞} E[N_ii(t)] = ∞. For each renewal (occurrence of i), there is an occurrence of j before the next renewal with at least the positive probability, say α > 0, that the path from i to j is taken immediately after the renewal (and thus before the next renewal). Thus the expected number of occurrences of j per occurrence of i is at least α, and E[N_jj(t)] ≥ α(E[N_ii(t)] − 1). Thus lim_{t→∞} E[N_jj(t)] = ∞, so, from Theorem 6.2.6, j must be recurrent. This is a contradiction, so i is transient.
The third proof is direct (it does not use contradiction) and uses a technique that can be useful in many contexts. Since state j is transient, it follows by definition that F_jj(∞) < 1. This implies that the number of occurrences of state j over the infinite time interval is a geometric rv and the interval between occurrences is a possibly defective rv (see Theorem 6.2.6). Consider the sequence of occurrences of states i and j, ignoring the time intervals between them and considering only the order in which they occur, such as, for example, i, i, j, i, j, j, where after the last occurrence of j, no more occurrences of either i or j occur. After each i, the next occurrence in this sequence will be either i, j, or nothing. The probability of each of these, conditional on i, is independent of position in the sequence and independent of the terms before that i. The same situation (with different probabilities) occurs after each j, and 'nothing' can be viewed as a trapping state. Thus this sequence (including the 'nothings') forms a 3-state Markov chain. Since state j is transient in the original chain, it is also transient in this 3-state chain, since j appears the same finite number of times in each.

It is given that there is a path from i to j in the original chain, so there is a nonzero transition probability from i to j in the 3-state chain. If there is also a path from j to i in the original chain, then there is a nonzero transition from j to i in the 3-state chain, so j and i are in the same class (both for the original and the 3-state chain). If there is no such transition from j to i, then they are not in the same class, but both are still transient (as seen from the analysis of classes for finite-state chains). Thus i is transient in the 3-state chain, so the number of occurrences of state i in the original chain is finite with probability 1. Thus i is transient also.
It is not necessary, in this argument, to be able to calculate the transition probabilities in the 3-state chain, but it might seem a little less abstract to see how to calculate them. For example, to find the transition probability from i to i in the 3-state chain, consider changing the original Markov chain by making j a trapping state. This does not change F_ij(∞), but F_ii(∞) then becomes the probability that an i follows an i in the 3-state chain. The other transition probabilities are found in a similar way.
Murphy’s law is not a law, but rather a popular saying used to express pessimism. In terms
of Murphy’s law, if there is a nonzero probability that i ever gets to a state from which
there is the possibility of no return, then that possibility will eventually happen.
Exercise 6.5: This exercise treats a mathematical detail from the proof of Lemma 6.2.4. Assume that j is recurrent, there is a path of probability α > 0 from j to i, and n ≥ 1 is an integer. Let T_ji be the possibly defective rv giving the first-passage time from j to i.
Verify the following equations for any integer t > 0:

Pr{T_ji > t} = Pr{T_ji > t, T^{(n)}_{jj} > t} + Pr{T_ji > t, T^{(n)}_{jj} ≤ t}     (A.121)
            ≤ Pr{T^{(n)}_{jj} > t} + Pr{T_ji > T^{(n)}_{jj}}     (A.122)
            ≤ Pr{T^{(n)}_{jj} > t} + (1 − α)^n     (A.123)
Pr{T_ji = ∞} ≤ (1 − α)^n.     (A.124)
Solution: (A.121) simply splits the event {T_ji > t} into two disjoint terms; (A.122) upper bounds the probability of the joint event in the first term by the probability of the first of those events. The second term can be rewritten as Pr{T_ji > t ≥ T^{(n)}_{jj}} and upper bounded by eliminating t. Note that {T_ji > T^{(n)}_{jj}} is the event that the first visit to i comes after the first n visits to j. This is upper bounded in (A.123) by the probability of not taking the given path of probability α from j to i on each of the first n visits to j. Finally, (A.124) takes the limit as t → ∞, first using the fact that T^{(n)}_{jj} is a (non-defective) rv, so that lim_{t→∞} Pr{T^{(n)}_{jj} > t} = 0. Since n is arbitrary, this shows that lim_{t→∞} Pr{T_ji > t} = 0, which is the result claimed in Lemma 6.2.4.
Exercise 6.6: (Renewal theory proof that a class is all recurrent or all transient.) Assume that state j in a countable-state Markov chain is recurrent and i communicates with j. More specifically, assume that there is a path of length m from i to j and one of length k from j to i.

a) Show that for any n > 0,

P^{m+n+k}_{ii} ≥ P^m_{ij} P^n_{jj} P^k_{ji}.     (A.125)
Solution: First, if P^n_{jj} = 0, then the right side of (A.125) is zero, so (A.125) is satisfied in that case. Next, for any n such that P^n_{jj} > 0, there are one or more walks of length m from i to j, one or more walks of length n from j to j, and one or more walks of length k from j to i. Concatenating these, we have one or more walks of length m + n + k from i to i. The set of such walks, which go from i to j in m steps, j to j in n steps, and j to i in k steps, has probability P^m_{ij} P^n_{jj} P^k_{ji} and is a subset of all walks that go from i to i in m + n + k steps, establishing (A.125).
b) By summing over n, show that state i is also recurrent. Hint: Use Theorem 6.2.6.

Solution: Using the fourth part of Theorem 6.2.6 and the fact that j is recurrent, we have Σ_{n=1}^∞ P^n_{jj} = ∞. Using (A.125),

Σ_{n=1}^ℓ P^{m+n+k}_{ii} ≥ P^m_{ij} P^k_{ji} Σ_{n=1}^ℓ P^n_{jj}.

Since P^m_{ij} P^k_{ji} > 0, this has the limit ∞ as ℓ → ∞. Applying the if-and-only-if aspect of the fourth part of the theorem shows that i is recurrent.
c) Explain why this shows that all states in a class are recurrent or all are transient.

Solution: We have shown that if a state j is recurrent, then each state i that communicates with j (i.e., is in the same class as j) is also recurrent. Thus if any one state in the class is recurrent, then all states in that class are, which also means that if any state in the class is recurrent, no state in the class can be transient. Thus if any state is transient, all states must be transient.
Exercise 6.7: Consider an irreducible Markov chain that is positive recurrent. Recall the technique used to find the expected first-passage time from j to i, i.e., T̄_ji, in Section 4.5. The state i was turned into a trapping state by turning all the transitions out of i into a single transition P_ii = 1. Here, in order to preserve the positive recurrence, we instead move all transitions out of state i into the single transition P_ij = 1.
a) Use Figure 6.2 to illustrate that the above strategy can turn an irreducible chain into a reducible chain. Also explain why states i and j are still positive recurrent and still in the same class.

Solution: In the figure below, we have modified state 2 so that it has only one outgoing transition, and that transition goes to state 3. As seen, states 0 and 1 have become transient, but 2 and 3 (and all higher-numbered states) are in the same recurrent class.

More generally, if any state i in this chain (other than state 0) is modified to have only one outgoing transition, then if that one remaining transition is upward, all paths downward from that state are severed. If j > i+1, then the transition from i to i+1 is also severed, but all the states between i and j are reachable via j. Similarly, if the one remaining transition is downward, all paths from i upward are severed. Either way, the chain is no longer irreducible.
[Figure: the chain of Figure 6.2, with p upward and q = 1 − p downward transitions and a self loop of probability q at state 0, modified so that the only transition out of state 2 has probability 1 and goes to state 3.]

b) Let {π′_k; k ≥ 0} be the steady-state probabilities for the positive-recurrent class in the modified Markov chain. Show that the expected first-passage time T̄′_ji from j to i in the modified Markov chain is (1/π′_i) − 1.
Solution: We showed in (a) that states i and j are positive recurrent in the modified chain for Figure 6.2, and we show in part (c) that they are positive recurrent in the modified chain whenever the original chain is positive recurrent. Since the only outgoing transition from state i in the modified chain goes to state j, we see that every first passage from i back to i goes first to j and from there back to i via a first passage from j to i. Thus, for the modified chain, the expected first-passage time from j to i is 1 less than the expected first-passage time from i to i. That expected first-passage time from i to i is 1/π′_i by renewal theory (assuming that i is positive recurrent and thus π′_i > 0).
c) Show that the expected first-passage time from j to i is the same in the modified and unmodified chain.

Solution: Each sample walk for a first passage from j to i contains state i only as the final state in the path, and thus cannot include an outward transition from i. Thus each such walk in the original chain is a walk in the modified chain and vice versa. The probability of each such walk is the same in the modified chain as in the original chain. Thus T̄′_ji = T̄_ji.

It follows that T̄′_ii = T̄′_ji + 1 = T̄_ji + 1. Since π′_i = 1/T̄′_ii, we can show that π′_i > 0 by showing that T̄_ji < ∞. Since i and j communicate in the modified chain, this will show that i and j are positive recurrent in the modified chain.
For each state j in the original positive-recurrent chain, we have π_j = 1/T̄_jj > 0 and thus T̄_jj < ∞. We can express

T̄_jj = P_jj + Σ_{k≠j} P_jk [T̄_kj + 1].

Thus T̄_kj < ∞ for each k such that there is a path of length 1 from j to k. In the same way, for every path j, k, ℓ of length 2 out of state j, we have

T̄_kj = P_kj + Σ_{ℓ≠j} P_kℓ [T̄_ℓj + 1].

Since T̄_kj < ∞, this shows that T̄_ℓj < ∞ for all ℓ such that there is a path of length 2 from j to ℓ. By iteration to longer paths, we see that T̄_ij < ∞ for every i reachable from j by a path of arbitrary length. Since the chain is recurrent, there is a path from j to i for each i and j, so T̄_ij < ∞ for all i, j. Thus the class of states including i and j in the modified chain is positive recurrent. Note that the modified chain is a function of both i and j, so π′_i is a function of j as well as i.
d) Show by example that after the modification above, two states i and j that were not positive recurrent
before the modification can become positive recurrent and the above technique can again be used to find
the expected first passage time.
Solution: Let the modified chain in (a) be the original chain in this part. Consider finding
the expected first passage time from state 1 to 2. Then the modified chain has a transition
of probability 1 from state 2 to 1, and states 0, 1, and 2 become a positive recurrent class.
The expected time from 1 to 2 is then $1/\pi'_2 - 1$.
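The modified-chain trick of parts (a)-(d) is easy to check numerically. The sketch below uses a hypothetical 3-state transition matrix (not from the text) and plain power iteration: to find the expected first-passage time from state $a$ to state $b$, replace row $b$ by an immediate return to $a$, compute the stationary vector $\pi'$ of the modified chain, and take $1/\pi'_b - 1$. The result is compared against a direct iterative solution of the first-passage equations.

```python
# Hypothetical 3-state chain; the numbers are purely illustrative.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.4, 0.2]]

def stationary(P, iters=20000):
    """Stationary distribution by power iteration (chain assumed ergodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def first_passage_via_modified_chain(P, a, b):
    """E[first-passage time a -> b] = 1/pi'_b - 1 in the chain where b -> a w.p. 1."""
    Q = [row[:] for row in P]
    Q[b] = [1.0 if j == a else 0.0 for j in range(len(P))]   # force b -> a
    return 1.0 / stationary(Q)[b] - 1.0

def first_passage_direct(P, a, b, iters=20000):
    """Solve v_i = 1 + sum_{k != b} P_ik v_k, v_b = 0, by fixed-point iteration."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [0.0 if i == b else
             1.0 + sum(P[i][k] * v[k] for k in range(n) if k != b)
             for i in range(n)]
    return v[a]

print(first_passage_via_modified_chain(P, 1, 2))   # ~2.8, agreeing with the direct solve
```

For this particular matrix the direct equations give $v_1 = 2.8$ exactly, so the two methods can be checked against each other.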
Exercise 6.8: Let $\{X_n;\ n \geq 0\}$ be a branching process with $X_0 = 1$. Let $\overline{Y}$ and $\sigma^2$ be the mean and variance
of the number of offspring of an individual.
a) Argue that $\lim_{n\to\infty} X_n$ exists with probability 1 and either has the value 0 (with probability $F_{10}(\infty)$) or
the value $\infty$ (with probability $1 - F_{10}(\infty)$).
Solution: We consider two special, rather trivial, cases before considering the important case
(the case covered in the text). Let $\{p_i;\ i \geq 0\}$ be the PMF of the number of offspring of
each individual. Then if $p_1 = 1$, we see that $X_n = 1$ for all $n$, so the statement to be argued
is simply false. You should be proud of yourself if you noticed the need for ruling this case
out before constructing a proof.
The next special case is where $p_0 = 0$ and $p_1 < 1$. Then $X_{n+1} \geq X_n$ (i.e., the population
never shrinks but can grow). Since $X_n(\omega)$ is non-decreasing for each sample path, either
$\lim_{n\to\infty} X_n(\omega) = \infty$ or $\lim_{n\to\infty} X_n(\omega) = j$ for some $j < \infty$. The latter case is impossible,
since $P_{jj} = p_1^j$ and thus $P^m_{jj} = p_1^{mj} \to 0$ as $m \to \infty$.
Ruling out these two trivial cases, we have $p_0 > 0$ and $p_1 < 1 - p_0$. In this case, state 0
is recurrent (i.e., it is a trapping state) and states $1, 2, \ldots$ are in a transient class. To see
this, note that $P_{10} = p_0 > 0$, so $F_{11}(\infty) \leq 1 - p_0 < 1$, which means by definition that state
1 is transient. All states $i > 1$ communicate with state 1, so by Theorem 6.2.5, all states
$j \geq 1$ are transient. Thus one can argue that the process has 'no place to go' other than 0
or $\infty$.
The following tedious analysis makes this precise. Each $j > 0$ is transient, so from Theorem
6.2.6 part 3,
$$\lim_{t\to\infty} \mathsf{E}[N_{jj}(t)] < \infty.$$
Note that $N_{1j}(t)$ is the number of visits to $j$ in the interval $[1, t]$ starting from state 1 at
time 0. This is one more than the number of returns to $j$ after the first visit to $j$. The
expected number of such returns is upper bounded by the number in $t$ steps starting in $j$,
so $\mathsf{E}[N_{1j}(t)] \leq 1 + \mathsf{E}[N_{jj}(t)]$. It follows that
$$\lim_{t\to\infty} \mathsf{E}[N_{1j}(t)] < \infty \qquad \text{for each } j > 0.$$
Now note that the expected number of visits to $j$ in $[1, t]$ can be rewritten as $\sum_{n=1}^{t} P^n_{1j}$.
Since this sum in the limit $t \to \infty$ is finite, the remainder in the sum from $t$ to $\infty$ must
approach 0 as $t \to \infty$, so
$$\lim_{t\to\infty} \sum_{n > t} P^n_{1j} = 0.$$
From this, we see that for every finite integer $\ell$,
$$\lim_{t\to\infty} \sum_{n > t} \sum_{j=1}^{\ell} P^n_{1j} = 0.$$
This says that for every $\epsilon > 0$, there is a $t$ sufficiently large that the probability of ever
entering states 1 to $\ell$ on or after step $t$ is less than $\epsilon$. Since $\epsilon > 0$ is arbitrary, all sample
paths (other than a set of probability 0) never enter states 1 to $\ell$ after some finite time.
Since $\ell$ is arbitrary, $\lim_{n\to\infty} X_n$ exists WP1 and is either 0 or $\infty$. By definition, it is 0 with
probability $F_{10}(\infty)$.
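The extinction probability $F_{10}(\infty)$ can be checked numerically as the smallest root of $z = g(z)$, where $g$ is the offspring probability generating function; iterating $z \leftarrow g(z)$ from $z = 0$ converges monotonically to that root. The offspring PMF below is a hypothetical example, not one from the text.

```python
# Extinction probability of a branching process as the smallest fixed point
# of the offspring PGF, found by iterating z <- g(z) from z = 0.

def extinction_prob(pmf, iters=10000):
    def g(z):
        # offspring PGF: g(z) = sum_k p_k z^k
        return sum(p * z**k for k, p in enumerate(pmf))
    z = 0.0
    for _ in range(iters):
        z = g(z)
    return z

pmf = [0.25, 0.25, 0.5]      # hypothetical p0, p1, p2; offspring mean 1.25 > 1
q = extinction_prob(pmf)
# Here z = 0.25 + 0.25 z + 0.5 z^2 has roots z = 0.5 and z = 1, so the
# smallest root (the extinction probability) is 0.5.
print(q)   # ~0.5
```

Since the offspring mean exceeds 1, extinction is not certain, and the surviving sample paths are exactly those with $X_n \to \infty$.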
b) Show that $\text{VAR}[X_n] = \sigma^2\,\overline{Y}^{\,n-1}(\overline{Y}^{\,n} - 1)/(\overline{Y} - 1)$ for $\overline{Y} \neq 1$ and $\text{VAR}[X_n] = n\sigma^2$ for $\overline{Y} = 1$.
Solution: We will demonstrate the case for $\overline{Y} \neq 1$ and $\overline{Y} = 1$ together by showing that
$$\text{VAR}[X_n] = \sigma^2\,\overline{Y}^{\,n-1}\bigl[1 + \overline{Y} + \overline{Y}^2 + \cdots + \overline{Y}^{\,n-1}\bigr]. \qquad \text{(A.126)}$$
First express $\mathsf{E}\bigl[X_n^2\bigr]$ in terms of $\mathsf{E}\bigl[X_{n-1}^2\bigr]$. Note that, conditional on $X_{n-1} = \ell$, $X_n$ is the
sum of $\ell$ IID rv's each with mean $\overline{Y}$ and variance $\sigma^2$, so
$$\mathsf{E}\bigl[X_n^2\bigr] = \sum_\ell \Pr\{X_{n-1} = \ell\}\,\mathsf{E}\bigl[X_n^2 \mid X_{n-1} = \ell\bigr]
= \sum_\ell \Pr\{X_{n-1} = \ell\}\bigl[\ell\sigma^2 + \ell^2\,\overline{Y}^2\bigr]
= \sigma^2\,\mathsf{E}[X_{n-1}] + \overline{Y}^2\,\mathsf{E}\bigl[X_{n-1}^2\bigr]
= \sigma^2\,\overline{Y}^{\,n-1} + \overline{Y}^2\,\mathsf{E}\bigl[X_{n-1}^2\bigr].$$
We also know from (6.50) (or by simple calculation) that $\mathsf{E}[X_n] = \overline{Y}\,\mathsf{E}[X_{n-1}]$. Thus,
$$\text{VAR}[X_n] = \mathsf{E}\bigl[X_n^2\bigr] - \bigl(\mathsf{E}[X_n]\bigr)^2 = \sigma^2\,\overline{Y}^{\,n-1} + \overline{Y}^2\,\text{VAR}[X_{n-1}]. \qquad \text{(A.127)}$$
We now use induction to derive (A.126) from (A.127). For the base of the induction, we
see that $X_1$ is the number of progeny from the single element $X_0$, so $\text{VAR}[X_1] = \sigma^2$. For
the inductive step, we assume that
$$\text{VAR}[X_{n-1}] = \sigma^2\,\overline{Y}^{\,n-2}\bigl[1 + \overline{Y} + \overline{Y}^2 + \cdots + \overline{Y}^{\,n-2}\bigr].$$
Substituting this into (A.127),
$$\text{VAR}[X_n] = \sigma^2\,\overline{Y}^{\,n-1} + \sigma^2\,\overline{Y}^{\,n}\bigl[1 + \overline{Y} + \cdots + \overline{Y}^{\,n-2}\bigr]
= \sigma^2\,\overline{Y}^{\,n-1}\bigl[1 + \overline{Y} + \cdots + \overline{Y}^{\,n-1}\bigr],$$
completing the induction.
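The closed form (A.126) and the recursion (A.127) can be cross-checked numerically. The sketch below uses hypothetical parameter values $\overline{Y} = 1.5$, $\sigma^2 = 0.75$ (chosen only for illustration) and verifies the two against each other, as well as against the stated form $\sigma^2\overline{Y}^{\,n-1}(\overline{Y}^{\,n}-1)/(\overline{Y}-1)$.

```python
# Cross-check of the branching-process variance: closed form (A.126)
# versus the recursion (A.127), with hypothetical Ybar and sigma^2.

Ybar, sigma2 = 1.5, 0.75

def var_closed_form(n):
    # (A.126): sigma^2 * Ybar^(n-1) * (1 + Ybar + ... + Ybar^(n-1))
    return sigma2 * Ybar**(n - 1) * sum(Ybar**k for k in range(n))

def var_recursive(n):
    # (A.127): VAR[X_n] = sigma^2 Ybar^(n-1) + Ybar^2 VAR[X_(n-1)], VAR[X_1] = sigma^2
    v = sigma2
    for m in range(2, n + 1):
        v = sigma2 * Ybar**(m - 1) + Ybar**2 * v
    return v

for n in range(1, 8):
    assert abs(var_closed_form(n) - var_recursive(n)) < 1e-9
    # equivalently sigma^2 Ybar^(n-1) (Ybar^n - 1)/(Ybar - 1) for Ybar != 1
    geo = sigma2 * Ybar**(n - 1) * (Ybar**n - 1) / (Ybar - 1)
    assert abs(var_closed_form(n) - geo) < 1e-9
print("closed form (A.126) matches recursion (A.127)")
```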
Exercise 6.9: There are $n$ states and for each pair of states $i$ and $j$, a positive number $d_{ij} = d_{ji}$ is given.
A particle moves from state to state in the following manner: Given that the particle is in any state $i$, it
will next move to any $j \neq i$ with probability $P_{ij}$ given by
$$P_{ij} = \frac{d_{ij}}{\sum_{k \neq i} d_{ik}}. \qquad \text{(A.128)}$$
Assume that $P_{ii} = 0$ for all $i$. Show that the sequence of positions is a reversible Markov chain and find the
limiting probabilities.
Solution: From Theorem 6.5.3, $\{P_{ij}\}$ is the set of transition probabilities and $\boldsymbol{\pi}$ is the
steady-state probability vector of a reversible chain if $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i, j$. Thus, given
$\{d_{ij}\}$, we attempt to find $\boldsymbol{\pi}$ to satisfy the equations
$$\pi_i P_{ij} = \frac{\pi_i\,d_{ij}}{\sum_k d_{ik}} = \frac{\pi_j\,d_{ji}}{\sum_k d_{jk}} = \pi_j P_{ji}; \qquad \text{for all } i, j.$$
We have taken $d_{ii} = 0$ for all $i$ here. Since $d_{ij} = d_{ji}$ for all $i, j$, we can cancel $d_{ij}$ and $d_{ji}$
from the inner equations, getting
$$\frac{\pi_i}{\sum_k d_{ik}} = \frac{\pi_j}{\sum_k d_{jk}}; \qquad \text{for all } i, j.$$
Thus $\pi_i$ must be proportional to $\sum_k d_{ik}$, so normalizing to make $\sum_i \pi_i = 1$, we get
$$\pi_i = \frac{\sum_k d_{ik}}{\sum_\ell \sum_k d_{\ell k}}; \qquad \text{for all } i.$$
If the chain has a countably infinite number of states, this still works if the sums exist.
This exercise is not quite as specialized as it sounds, since given any reversible Markov chain,
we can define $d_{ij} = \pi_i P_{ij}$ and get this same set of equations (normalized in a special way).
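The claim that $\pi_i$ is proportional to the row sums of $d$ can be verified directly. The sketch below uses a hypothetical symmetric weight matrix and checks both detailed balance $\pi_i P_{ij} = \pi_j P_{ji}$ and stationarity $\sum_i \pi_i P_{ij} = \pi_j$.

```python
# Random walk on weighted edges: P_ij = d_ij / sum_k d_ik, with symmetric d.
# The claimed stationary vector is pi_i proportional to sum_k d_ik.

d = [[0, 2, 1, 3],
     [2, 0, 4, 1],
     [1, 4, 0, 2],
     [3, 1, 2, 0]]          # hypothetical symmetric weights, d_ii = 0

n = len(d)
row = [sum(d[i]) for i in range(n)]
P = [[d[i][j] / row[i] for j in range(n)] for i in range(n)]
total = sum(row)
pi = [row[i] / total for i in range(n)]

for i in range(n):
    for j in range(n):
        # detailed balance: pi_i P_ij = pi_j P_ji
        assert abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
for j in range(n):
    # detailed balance implies stationarity: sum_i pi_i P_ij = pi_j
    assert abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) < 1e-12
print("pi proportional to the row sums of d satisfies detailed balance")
```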
Exercise 6.10: Consider a reversible Markov chain with transition probabilities $P_{ij}$ and limiting probabilities $\pi_i$. Also consider the same chain truncated to the states $0, 1, \ldots, M$. That is, the transition
probabilities $\{P'_{ij}\}$ of the truncated chain are
$$P'_{ij} = \begin{cases} \dfrac{P_{ij}}{\sum_{m=0}^{M} P_{im}}\,; & 0 \leq i, j \leq M \\[1ex] 0\,; & \text{elsewhere.} \end{cases}$$
Show that the truncated chain is also reversible and has limiting probabilities given by
$$\overline{\pi}_i = \frac{\pi_i \sum_{j=0}^{M} P_{ij}}{\sum_{k=0}^{M} \pi_k \left(\sum_{m=0}^{M} P_{km}\right)}. \qquad \text{(A.129)}$$
Solution: The steady-state probabilities $\{\pi_i;\ i \geq 0\}$ of the original chain must be positive.
Thus $\overline{\pi}_i > 0$ for each $i$. By summing $\overline{\pi}_i$ over $i$ in (A.129), it is seen that the numerator is
the same as the denominator, so $\sum_{i=0}^{M} \overline{\pi}_i = 1$. Finally,
$$\overline{\pi}_i P'_{ij} = \frac{\pi_i P_{ij}}{\sum_{k=0}^{M} \pi_k \left(\sum_{m=0}^{M} P_{km}\right)}.$$
Since $\pi_i P_{ij} = \pi_j P_{ji}$ for each $i, j$, it is clear that $\overline{\pi}_i P'_{ij} = \overline{\pi}_j P'_{ji}$ for each $i, j \leq M$. Thus, by
Theorem 6.5.3, the truncated chain is reversible.
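Formula (A.129) is easy to check on an example. The sketch below assumes a sampled-time M/M/1 birth-death chain with hypothetical up/down probabilities $p = 0.2$, $q = 0.3$ (so $\pi_i \propto (p/q)^i$), truncates it to $\{0, \ldots, M\}$, and verifies that the $\overline{\pi}$ of (A.129) is stationary for the truncated chain.

```python
# Truncation of a reversible birth-death chain: check that pibar of (A.129)
# is stationary for P'. The parameters p, q, M are hypothetical.

p, q, M = 0.2, 0.3, 3
rho = p / q
pi = [rho**i for i in range(M + 1)]      # unnormalized; (A.129) renormalizes

def P_full(i, j):
    # full-chain transition probabilities (self-loops take up the slack)
    if j == i + 1:
        return p
    if j == i - 1 and i > 0:
        return q
    if j == i:
        return 1 - p - (q if i > 0 else 0)
    return 0.0

s = [sum(P_full(i, m) for m in range(M + 1)) for i in range(M + 1)]
Pp = [[P_full(i, j) / s[i] for j in range(M + 1)] for i in range(M + 1)]

num = [pi[i] * s[i] for i in range(M + 1)]       # numerator of (A.129)
pibar = [x / sum(num) for x in num]

for j in range(M + 1):
    # stationarity: sum_i pibar_i P'_ij = pibar_j
    assert abs(sum(pibar[i] * Pp[i][j] for i in range(M + 1)) - pibar[j]) < 1e-12
print("pibar from (A.129) is stationary for the truncated chain")
```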
Exercise 6.11: A Markov chain (with states $\{0, 1, 2, \ldots, J-1\}$ where $J$ is either finite or infinite) has
transition probabilities $\{P_{ij};\ i, j \geq 0\}$. Assume that $P_{0j} > 0$ for all $j > 0$ and $P_{j0} > 0$ for all $j > 0$. Also
assume that for all $i, j, k$, we have $P_{ij}P_{jk}P_{ki} = P_{ik}P_{kj}P_{ji}$.
a) Assuming also that all states are positive recurrent, show that the chain is reversible and find the steady-state probabilities $\{\pi_i;\ i \geq 0\}$ in simplest form.
Solution: Section 6.5.3 shows that reversibility implies the three-cycle condition $P_{ij}P_{jk}P_{ki} = P_{ik}P_{kj}P_{ji}$, and the exercise asks us to show that, with the addition of another condition
about positive transition probabilities, the three-cycle condition also implies reversibility.
We first show why there is a need for some condition on positive probabilities, then demonstrate the result for finitely many states, and finally, after quite a bit of work, give a
demonstration for the countable case; this final demonstration also yields quite a few other
results.
To see the need for certain positive probabilities, consider a four-state chain where $P_{01} = P_{12} = P_{23} = P_{30} = 2/3$ and $P_{32} = P_{21} = P_{10} = P_{03} = 1/3$. Assume that $P_{ij} = 0$ for all
other $i, j$. This is not reversible, since (from symmetry) $\pi_i = 1/4$ for all $i$, but, for example,
$\pi_0 P_{01} \neq \pi_1 P_{10}$. The trouble here is that the 3-cycle condition is satisfied, but only satisfied
vacuously, since there are no 3-cycles with positive probability in this chain. This example
can be extended to show the irreversibility of a chain with any number of states if three
states, say 1, 2, 3, have the same transition probabilities as above (and thus communicate
with states $i > 3$ only through state 0) and $P_{01} = 2\alpha/3$ and $P_{03} = \alpha/3$, where $1 \geq \alpha > 0$ is
the aggregate probability of going from state 0 to states $i > 3$.
Next assume the positivity conditions in the problem statement again, and suppose the
number of states J is finite. From the guessing theorem, Theorem 6.5.3, the chain is
reversible if $\{\pi_i;\ i \geq 0\}$ exists such that $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i, j$. To select a set of $\pi_i$ that
satisfy this condition, let $\pi_0$ be arbitrary for the moment and let $\pi_i = \pi_0 P_{0i}/P_{i0}$ for all $i > 0$
(each $\pi_i$, $i \geq 1$, exists since $P_{i0} > 0$). This ensures that $\pi_i P_{i0} = \pi_0 P_{0i}$ is satisfied for all
$i \geq 1$. Since $\sum_i \pi_i$ must be 1 for the guessing-theorem conditions to be satisfied, we have
$$1 = \sum_i \pi_i = \pi_0\left[1 + \sum_{i>0} \frac{P_{0i}}{P_{i0}}\right]. \qquad \text{(11.a)}$$
The sum has finitely many terms and by assumption each $P_{i0}$ is positive. Thus the sum is
finite and, for the guessing-theorem conditions to be satisfied, we must have
$$\pi_0 = \left[1 + \sum_{i>0} \frac{P_{0i}}{P_{i0}}\right]^{-1} \qquad \text{and thus} \qquad \pi_i = \frac{P_{0i}}{P_{i0}}\left[1 + \sum_{i>0} \frac{P_{0i}}{P_{i0}}\right]^{-1}.$$
This satisfies all the guessing-theorem conditions except $\pi_i P_{ij} = \pi_j P_{ji}$ for each $i, j > 0$. For
any such $i, j > 0$, consider the 3-cycle condition $P_{0i}P_{ij}P_{j0} = P_{0j}P_{ji}P_{i0}$. Since $P_{0j} > 0$ and
$P_{i0} > 0$, we can solve for $P_{ji}$,
$$P_{ji} = P_{ij}\,\frac{P_{0i}P_{j0}}{P_{i0}P_{0j}} = P_{ij}\,\frac{\pi_i\,\pi_0}{\pi_0\,\pi_j} = P_{ij}\,\frac{\pi_i}{\pi_j}.$$
This demonstrates the reversibility condition for each $i, j > 0$, and thus, by the guessing
theorem, the chain is reversible with the indicated steady-state probabilities.
This does not quite work when the number of states is countably infinite, since then the
sum $\sum_{i>0} P_{0i}/P_{i0}$ might be infinite. Since the chain is positive recurrent, the true $\pi_0$ is
positive, but the $\pi_0$ in the above argument is only a hypothesized probability and thus
requires a different approach to show that it must be positive.
We now use the following three lemmas to demonstrate reversibility when the number of states
is countably infinite.
Lemma 1: For the conditions in the exercise, and for arbitrary states $i, j, k$, we have
$$P_{0i}P_{ij}P_{jk}P_{k0} = P_{0k}P_{kj}P_{ji}P_{i0}.$$
Proof: If this is restricted to distinct nonzero $i, j, k$, it says that all 4-cycles that include state 0 are reversible. The lemma does not restrict $i, j, k$ to be distinct and nonzero.
Similarly, the condition $P_{0i}P_{ij}P_{j0} = P_{0j}P_{ji}P_{i0}$ in the problem statement does not require
$i$ and $j$ to be distinct and nonzero. It makes no difference whether the problem statement applies when $i, j$ are equal to each other or to 0, since in those cases, for example
$P_{00}P_{0j}P_{j0} = P_{0j}P_{j0}P_{00}$, the equality holds automatically without any need for an assumption. We prove the lemma, however, without the assumption that $i, j, k$ are nonzero and
unique.
For arbitrary states $i, j, k$, we now have
$$P_{0i}P_{ij}P_{jk}P_{k0} = \frac{\bigl(P_{0i}P_{ij}P_{j0}\bigr)\bigl(P_{0j}P_{jk}P_{k0}\bigr)}{P_{j0}P_{0j}}
= \frac{\bigl(P_{0j}P_{ji}P_{i0}\bigr)\bigl(P_{0k}P_{kj}P_{j0}\bigr)}{P_{j0}P_{0j}}
= P_{ji}P_{i0}P_{0k}P_{kj} = P_{0k}P_{kj}P_{ji}P_{i0},$$
where the first step multiplies and divides by the positive term $P_{j0}P_{0j}$; the second uses the
given condition on $P_{0i}P_{ij}P_{j0}$ (and on $P_{0j}P_{jk}P_{k0}$); and the third rearranges terms.
The next lemma generalizes Lemma 1 from three states $i, j, k$ to an arbitrary number.
Lemma 2: For arbitrary $m \geq 2$, let $i_1, i_2, \ldots, i_m$ be arbitrary states (not necessarily unique
or nonzero). Then
$$P_{0i_1}P_{i_1 i_2}\cdots P_{i_{m-1} i_m}P_{i_m 0} = P_{0i_m}P_{i_m i_{m-1}}\cdots P_{i_2 i_1}P_{i_1 0}. \qquad \text{(11.b)}$$
Proof: We use induction. The given condition, $P_{0i}P_{ij}P_{j0} = P_{0j}P_{ji}P_{i0}$, is the desired result
for $m = 2$ and Lemma 1 gives the desired result for $m = 3$. The inductive proof below
assumes the result to be true for $m - 1$ and uses this to establish the result for $m$.
Thus Lemma 1 is not really necessary except as a less abstract version of the proof below.
$$P_{0i_1}P_{i_1 i_2}\cdots P_{i_{m-1} i_m}P_{i_m 0}
= \frac{\bigl(P_{0i_1}P_{i_1 i_2}\cdots P_{i_{m-2} i_{m-1}}P_{i_{m-1} 0}\bigr)\bigl(P_{0i_{m-1}}P_{i_{m-1} i_m}P_{i_m 0}\bigr)}{P_{i_{m-1} 0}\,P_{0i_{m-1}}}
= \frac{\bigl(P_{0i_{m-1}}P_{i_{m-1} i_{m-2}}\cdots P_{i_2 i_1}P_{i_1 0}\bigr)\bigl(P_{0i_m}P_{i_m i_{m-1}}P_{i_{m-1} 0}\bigr)}{P_{i_{m-1} 0}\,P_{0i_{m-1}}}
= P_{0i_m}P_{i_m i_{m-1}}\cdots P_{i_2 i_1}P_{i_1 0},$$
where the steps above are the same as those in Lemma 1.
Lemma 3: For each $m \geq 2$ and each state $i > 0$,
$$P_{0i}\,P^m_{i0} = P^m_{0i}\,P_{i0}.$$
Proof: Note that $P_{i_1 i_2}P_{i_2 i_3}\cdots P_{i_m 0}$ is the probability of a given path of length $m$ going
from state $i_1$ to 0. If this is summed over all choices of $i_2, i_3, \ldots, i_m$, we get the probability
of going from state $i_1$ to 0 in $m$ steps, i.e.,
$$\sum_{i_2, i_3, \ldots, i_m} P_{i_1 i_2}P_{i_2 i_3}\cdots P_{i_m 0} = P^m_{i_1 0}.$$
Thus the left side of (11.b), summed over $i_2, \ldots, i_m$, is $P_{0i_1}P^m_{i_1 0}$. In the same way, the right
side of (11.b), summed over $i_2, i_3, \ldots, i_m$, is $P^m_{0i_1}P_{i_1 0}$. Since each term being summed on the
left is equal to the corresponding term on the right, we have
$$P_{0i_1}\,P^m_{i_1 0} = P^m_{0i_1}\,P_{i_1 0}.$$
Since $i_1 > 0$ is arbitrary, the lemma is proved.
We can now prove that the chain is reversible. There are two cases: first, the chain is
aperiodic and second, the chain is periodic. If the chain is aperiodic, then by Blackwell (see
Section 6.3.3), $\lim_{m\to\infty} P^m_{0i} = \pi_i$ and $\lim_{m\to\infty} P^m_{i0} = \pi_0$. Thus $\pi_0 P_{0i} = \pi_i P_{i0}$ for all $i$. It
then follows by the same argument as in the finite Markov chain case that $\pi_i P_{ij} = \pi_j P_{ji}$
for all $i, j$, so the chain is reversible.
Finally, the periodic case can arise only if $P_{ij} = 0$ for all $i > 0$ and $j > 0$, and if in addition
$P_{00} = 0$. In this case, the chain is obviously reversible. This reversibility also follows from
Blackwell's theorem.
Lemma 2 above can also be modified slightly to show that the 3-cycle condition (where i, j
are distinct and nonzero) implies the m-cycle condition (again with distinct and nonzero
states).
b) Find a condition on $\{P_{0j};\ j \geq 0\}$ and $\{P_{j0};\ j \geq 0\}$ that is sufficient to ensure that all states are positive
recurrent.
Solution: Positive recurrence was one of the given conditions in part (a) for demonstrating
reversibility. However, (11.b) was derived without using the positive-recurrence assumption.
Thus, if $\sum_{i>0} P_{0i}/P_{i0} < \infty$ is satisfied, then (11.a) specifies all the $\pi_i$. The rest of the argument
used above for the finite-state case then shows that the chain is both positive recurrent
and reversible. Thus $\sum_{i>0} P_{0i}/P_{i0} < \infty$, in addition to the other conditions of part (a) except
positive recurrence, is sufficient to demonstrate positive recurrence (as well as reversibility).
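The finite-state construction of part (a) can be checked on an example. The sketch below builds a hypothetical chain from symmetric weights (any such chain is reversible and hence satisfies the 3-cycle condition, with all $P_{0i}, P_{i0} > 0$), then recovers $\pi_i \propto P_{0i}/P_{i0}$ as in the solution and verifies detailed balance.

```python
# Check of part (a): on a chain satisfying the 3-cycle condition with
# P_0i > 0 and P_i0 > 0, the vector pi_i proportional to P_0i/P_i0 is
# reversible. Weights below are hypothetical.

w = [[0, 1, 2, 3],
     [1, 0, 1, 4],
     [2, 1, 0, 1],
     [3, 4, 1, 0]]                   # symmetric weights, diagonal zero
n = len(w)
P = [[w[i][j] / sum(w[i]) for j in range(n)] for i in range(n)]

# the 3-cycle condition P_ij P_jk P_ki = P_ik P_kj P_ji holds
for i in range(n):
    for j in range(n):
        for k in range(n):
            lhs = P[i][j] * P[j][k] * P[k][i]
            rhs = P[i][k] * P[k][j] * P[j][i]
            assert abs(lhs - rhs) < 1e-12

# construct pi as in the solution: pi_i proportional to P_0i / P_i0
r = [1.0] + [P[0][i] / P[i][0] for i in range(1, n)]
pi = [x / sum(r) for x in r]
for i in range(n):
    for j in range(n):
        assert abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12   # reversibility
print("pi_i proportional to P_0i/P_i0 gives a reversible chain")
```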
Exercise 6.12: a) Use the birth and death model described in Figure 6.4 to find the steady-state
probability mass function for the number of customers in the system (queue plus service facility) for the
following queues:
i) M/M/1 with arrival probability $\lambda\delta$, service completion probability $\mu\delta$.
ii) M/M/m with arrival probability $\lambda\delta$, service completion probability $i\mu\delta$ for $i$ servers busy, $1 \leq i \leq m$.
iii) M/M/$\infty$ with arrival probability $\lambda\delta$, service probability $i\mu\delta$ for $i$ servers. Assume $\delta$ so small that $i\mu\delta < 1$
for all $i$ of interest.
Assume the system is positive recurrent.
Solution: M/M/1: This is worked out in detail in Section 6.6. From (6.45), $\pi_i = (1-\rho)\rho^i$,
where $\rho = \lambda/\mu < 1$.
M/M/m: The Markov chain for this is a birth-death chain on the states $0, 1, 2, \ldots$: from
each state $i$ there is an upward transition with probability $\lambda\delta$, and a downward transition
with probability $i\mu\delta$ for $1 \leq i \leq m$ and $m\mu\delta$ for $i > m$ (self-loops carry the remaining
probability). [Figure: birth-death chain with upward labels $\lambda\delta$ and downward labels
$\mu\delta, 2\mu\delta, \ldots, m\mu\delta, m\mu\delta, \ldots$]
This is a birth-death chain and the steady-state probabilities are given by (6.33), where
$\rho_i = \lambda/((i{+}1)\mu)$ for $i < m$ and $\rho_i = \lambda/(m\mu)$ for $i \geq m$. Evaluating this,
$$\pi_i = \frac{\pi_0(\lambda/\mu)^i}{i!} \quad \text{for } i < m; \qquad \pi_i = \frac{\pi_0(\lambda/\mu)^i}{m!\,m^{i-m}} \quad \text{for } i \geq m,$$
where $\pi_0$ is given by
$$\pi_0^{-1} = 1 + \sum_{i=1}^{m-1} \frac{(\lambda/\mu)^i}{i!} + \frac{(m\rho_m)^m}{m!\,(1-\rho_m)}. \qquad \text{(A.130)}$$
M/M/$\infty$: The assumption that $\delta$ is so small that $i\mu\delta < 1$ for all '$i$ of interest' is rather
strange, since we don't know what is of interest. When we look at the solutions to the
M/M/1 and M/M/m sampled-time queues above, however, we see that they do not depend
on $\delta$. It is necessary that $m\mu\delta \leq 1$ for the Markov chain to be defined, and $m\mu\delta \ll 1$ for
it to be a reasonable approximation to the behavior of the continuous-time queue, but the
results turn out to be the same as for the continuous-time queue in any case, as will be seen in
Chapter 7. What was intended here was to look at the M/M/m queue in the limit $m \to \infty$.
When we do this,
$$\pi_i = \frac{\pi_0(\lambda/\mu)^i}{i!} \quad \text{for all } i,$$
where
$$\pi_0^{-1} = 1 + \sum_{i=1}^{\infty} \frac{(\lambda/\mu)^i}{i!}.$$
Recognizing this as the power series of an exponential, $\pi_0 = e^{-\lambda/\mu}$.
b) For each of the queues above give necessary conditions (if any) for the states in the chain to be i) transient,
ii) null recurrent, iii) positive recurrent.
Solution: The M/M/1 queue is transient if $\lambda/\mu > 1$, null recurrent if $\lambda/\mu = 1$, and positive
recurrent if $\lambda/\mu < 1$ (see Section 6.6). In the same way, the M/M/m queue is transient if
$\lambda/(m\mu) > 1$, null recurrent if $\lambda/(m\mu) = 1$, and positive recurrent if $\lambda/(m\mu) < 1$. The M/M/$\infty$
queue is positive recurrent in all cases. As the arrival rate speeds up, the number of servers
in use increases accordingly.
c) For each of the queues find:
L = (steady-state) mean number of customers in the system.
Lq = (steady-state) mean number of customers in the queue.
W = (steady-state) mean waiting time in the system.
Wq = (steady-state) mean waiting time in the queue.
Solution: The above parameters are related in a common way for the three types of queues.
First, applying Little's law first to the system and then to the queue, we get
$$W = L/\lambda; \qquad W_q = L_q/\lambda. \qquad \text{(A.131)}$$
Next, define $L_v$ as the steady-state mean number of customers in service and $W_v$ as the
steady-state waiting time per customer in service. We see that for each system, $W_v = 1/\mu$,
since when a customer enters service the mean time to completion is $1/\mu$. From Little's law
applied to service, $L_v = \lambda/\mu$. Now $L = L_q + L_v$ and $W = W_q + W_v$. Thus
$$L = L_q + \lambda/\mu; \qquad W = W_q + 1/\mu. \qquad \text{(A.132)}$$
Thus for each of the above queue types, we need only compute one of these four quantities
and the others are trivially determined. We assume positive recurrence in all cases.
For M/M/1, we compute
$$L = \sum_i i\,\pi_i = (1-\rho)\sum_{i=0}^{\infty} i\,\rho^i = \frac{\rho}{1-\rho} = \frac{\lambda}{\mu - \lambda}.$$
For M/M/m, we compute $L_q$ since queueing occurs only in states $i > m$.
$$L_q = \sum_{i>m} (i-m)\,\pi_i = \sum_{i>m} \frac{(i-m)\,\pi_0(\lambda/\mu)^i}{m!\,m^{i-m}}
= \sum_{j>0} \frac{j\,\pi_0\,\rho_m^j\,(\lambda/\mu)^m}{m!}
= \frac{\pi_0\,\rho_m\,(\lambda/\mu)^m}{(1-\rho_m)^2\,m!},$$
where $\pi_0$ is given in (A.130).
Finally, for M/M/$\infty$, there is no queueing, so $L_q = 0$.
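The M/M/m formulas and the relations (A.131)-(A.132) can be verified numerically. The sketch below uses hypothetical parameters $\lambda = 2$, $\mu = 1$, $m = 4$ (so $\rho_m = 0.5$), computes $L$ and $L_q$ directly from the $\pi_i$, and checks them against the closed form for $L_q$ and the Little's-law identities.

```python
# M/M/m steady state: direct sums versus the closed forms, with a check
# of (A.131)-(A.132). Parameters are hypothetical.
import math

lam, mu, m = 2.0, 1.0, 4
a, rho_m = lam / mu, lam / (m * mu)

def pi_unnorm(i):
    # unnormalized pi_i from the solution above
    if i < m:
        return a**i / math.factorial(i)
    return a**i / (math.factorial(m) * m**(i - m))

N = 200                                  # terms decay like rho_m^i, so 200 suffices
Z  = sum(pi_unnorm(i) for i in range(N))             # this is pi_0^{-1}
L  = sum(i * pi_unnorm(i) for i in range(N)) / Z
Lq = sum((i - m) * pi_unnorm(i) for i in range(m + 1, N)) / Z

# closed form for Lq from the text above
Lq_closed = (1 / Z) * rho_m * a**m / ((1 - rho_m)**2 * math.factorial(m))
assert abs(Lq - Lq_closed) < 1e-9
assert abs(L - (Lq + lam / mu)) < 1e-9               # (A.132): L = Lq + lam/mu
W, Wq = L / lam, Lq / lam                            # (A.131), Little's law
assert abs(W - (Wq + 1 / mu)) < 1e-9                 # (A.132): W = Wq + 1/mu
print(L, Lq)
```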
Exercise 6.13: a) Given that an arrival occurs in the interval $(n\delta, (n{+}1)\delta)$ for the sampled-time M/M/1
model in Figure 6.5, find the conditional PMF of the state of the system at time $n\delta$ (assume $n$ arbitrarily
large and assume positive recurrence).
Solution: The probability of an arrival in $(n\delta, (n{+}1)\delta)$ is the same for all states at time
$n\delta$ and thus is independent of the state at $n\delta$. Thus
$$\Pr\{X_n = i \mid (X_{n+1} - X_n) = 1\} = \Pr\{X_n = i\} = (1-\rho)\rho^i,$$
where the final equality assumes steady state.
b) For the same model, again in steady state but not conditioned on an arrival in $(n\delta, (n{+}1)\delta)$, find the
probability $Q(i, j)$ for $(i \geq j > 0)$ that the system is in state $i$ at $n\delta$ and that $i - j$ departures occur before
the next arrival.
Solution: We do this directly for the sampled-time Markov model and then briefly describe
the same argument for the underlying continuous-time process. The marginal probability
$\Pr\{X_n = i\}$ in steady state is $(1-\rho)\rho^i$. The probability that the next state change is a
departure (given that $X_n > 0$) is $\mu/(\mu+\lambda)$. Similarly, the probability that the next $i - j$
changes are departures (given that $X_n > j$) is $[\mu/(\mu+\lambda)]^{i-j}$. This is because each of these
changes is independent of the others. Finally, the probability that the next change after
these $i - j$ departures is an arrival is $\lambda/(\lambda+\mu)$. Thus $Q(i, j)$, the probability that the state
is $i$ and exactly $i - j$ departures occur before the next arrival, is
$$Q(i,j) = (1-\rho)\rho^i\left[\frac{\mu}{\mu+\lambda}\right]^{i-j}\frac{\lambda}{\mu+\lambda}; \qquad \text{for } i \geq j > 0.$$
c) Find the expected number of customers seen in the system by the first arrival after time $n\delta$. (Note: the
purpose of this exercise is to make you cautious about the meaning of "the state seen by a random arrival".)
Solution: Note that $Q(i, j)$ is the probability that $X_n = i$ and that $X_A = j$, where $A\delta$ is the
time at which the next arrival occurs, in the transition from $A\delta$ to $(A{+}1)\delta$. In other words, $X_A$
is the state seen by the next arrival after time $n\delta$. We have
$$\Pr\{X_A = j\} = \sum_{i \geq j} Q(i,j)
= \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^{j}\frac{\lambda}{\mu+\lambda}\sum_{i \geq j}\left(\frac{\lambda}{\mu+\lambda}\right)^{i-j}
= \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^{j+1},$$
where we have used $\rho^{i-j}[\mu/(\mu{+}\lambda)]^{i-j} = [\lambda/(\mu{+}\lambda)]^{i-j}$ and summed the resulting geometric
series. Thus $\mathsf{E}[X_A] = \sum_j j\,(1-\rho)\rho^{j+1} = \rho^2/(1-\rho)$, which is smaller than the steady-state
mean $\rho/(1-\rho)$.
This says that the next arrival after time $n\delta$ does not see a steady-state distribution. What
is happening is that if the state at time $n\delta$ is 0, then the next change must be an arrival.
If the state is positive, then the next change might be a departure, decreasing the state, so
that when the next arrival occurs, the state tends to have a smaller value than it has in
steady state.
The same argument applies for the continuous-time queue, where now we can view arrivals and
departures as competing Poisson processes. There is a famous principle called PASTA,
meaning 'Poisson arrivals see time averages.' It is a useful principle, but the meaning of 'see'
is very subtle. We have just shown that the first arrival after an arbitrary time $n\delta$ in steady
state does not see a steady-state distribution.
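The identity $\Pr\{X_A = j\} = (1-\rho)\rho^{j+1}$ can be confirmed by summing $Q(i, j)$ numerically. The sketch below assumes hypothetical rates $\lambda = 1$, $\mu = 2$ (so $\rho = 0.5$); the sum over $i$ is truncated, which is harmless since the terms decay geometrically.

```python
# Check that summing Q(i, j) over i >= j gives (1 - rho) * rho^(j+1),
# with hypothetical lam and mu.
lam, mu = 1.0, 2.0
rho = lam / mu

def Q(i, j):
    # Q(i, j) from part (b)
    return (1 - rho) * rho**i * (mu / (mu + lam))**(i - j) * lam / (mu + lam)

for j in range(1, 6):
    s = sum(Q(i, j) for i in range(j, 400))
    assert abs(s - (1 - rho) * rho**(j + 1)) < 1e-12
print("the first arrival sees PMF (1-rho)*rho^(j+1), not the steady-state PMF")
```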
Exercise 6.14: Find the backward transition probabilities for the Markov chain model of age in Figure
6.3. Draw the graph for the backward Markov chain, and interpret it as a model for residual life.
Solution: The backward transition probabilities are by definition given by $P^*_{ij} = \pi_j P_{ji}/\pi_i$.
Since the steady-state probabilities $\pi_i$ are all positive, $P^*_{ij} > 0$ if and only if $P_{ji} > 0$. Thus
the graph for the backward chain is the same as that for the forward chain except that all the arrows
are reversed and, of course, the labels are changed accordingly. The graph shows that there
is only one transition coming out of each positive-numbered state in the backward chain.
Thus $P^*_{i,i-1} = 1$ for all $i > 0$. This can be easily verified algebraically from (6.27). It is also
seen that $P^*_{0i} = \Pr\{W = i + 1\}$.
[Figure: backward chain on states $0, 1, 2, 3, 4, \ldots$; state 0 has transitions $P^*_{00}, P^*_{01}, P^*_{02}, P^*_{03}, P^*_{04}, \ldots$ to each state, and each state $i > 0$ has the single transition $P^*_{i,i-1} = 1$ to state $i-1$.]
We can interpret this Markov chain as representing residual life for an integer renewal
process. The number of the state represents the residual life immediately before a transition,
i.e., state 0 means that a renewal will happen immediately, state 1 means that it will happen
in one time unit, etc. When a renewal occurs, the next state will indicate the residual life at
the end of that unit interval. Thus when a transition from 0 to i occurs, the corresponding
renewal interval is i + 1.
For those confused by comparing the chain above both as the backward chain of age in
Figure 6.3 and as the residual-life chain of the same renewal process, consider a sample
path for the renewal process in which an inter-renewal interval of 5 is followed by one of 2.
The sample path for age in Figure 6.3 is then (0, 1, 2, 3, 4, 0, 1, 2, 0). The sample path for
residual life for the same sample path of inter-renewals, for the interpretation above, is (0,
4, 3, 2, 1, 0, 2, 1, 0). On the other hand, the backward sample path for age is (0, 2, 1, 0,
4, 3, 2, 1, 0). In other words, the residual-life sample path runs backward from age within
each renewal interval, but forward between renewals.
Exercise 6.15: Consider the sampled-time approximation to the M/M/1 queue in Figure 6.5.
a) Give the steady-state probabilities for this chain (no explanations or calculations required; just the answer).
Solution: $\pi_i = (1-\rho)\rho^i$ for $i \geq 0$, where $\rho = \lambda/\mu$.
In parts b) to g), do not use reversibility and do not use Burke's theorem. Let $X_n$ be the state of the
system at time $n\delta$ and let $D_n$ be a random variable taking on the value 1 if a departure occurs between $n\delta$
and $(n{+}1)\delta$, and the value 0 if no departure occurs. Assume that the system is in steady state at time $n\delta$.
b) Find $\Pr\{X_n = i, D_n = j\}$ for $i \geq 0$, $j = 0, 1$.
Solution: The purpose of the following parts is to prove Burke's theorem by elementary
means with no reference to reversibility. Given $X_n = 0$, a departure cannot occur in
$(n\delta, (n{+}1)\delta)$ since there is no customer to depart. Thus $D_n = 0$. Given $X_n = i$ for any
$i > 0$, $D_n = 1$ with probability $\mu\delta$. Thus
$$\Pr\{X_n{=}0, D_n{=}1\} = 0; \qquad \Pr\{X_n{=}0, D_n{=}0\} = 1 - \rho;$$
$$\Pr\{X_n{=}i, D_n{=}1\} = \mu\delta\,(1-\rho)\rho^i; \qquad \Pr\{X_n{=}i, D_n{=}0\} = (1 - \mu\delta)(1-\rho)\rho^i; \qquad \text{for } i > 0.$$
c) Find $\Pr\{D_n = 1\}$.
Solution: Summing over the values of $X_n$ above,
$$\Pr\{D_n = 1\} = \sum_{i \geq 1} \mu\delta\,(1-\rho)\rho^i = \mu\delta\,\rho = \lambda\delta.$$
This is initially counter-intuitive. We are used to thinking of departures conditional on
the state of the system, where, so long as the state is positive, departures are independent
and occur with probability $\mu\delta$. This says that in the marginal distribution for departures,
a departure occurs in an increment with probability $\lambda\delta$. Part (g) shows that the marginal
distribution (averaged over state) for a sequence of departures is IID at rate $\lambda$.
d) Find $\Pr\{X_n = i \mid D_n = 1\}$ for $i \geq 0$.
Solution: From (b) and (c), we see that $\Pr\{X_n = 0 \mid D_n = 1\} = 0$. For $i \geq 1$,
$$\Pr\{X_n = i \mid D_n = 1\} = \frac{\mu\delta\,(1-\rho)\rho^i}{\lambda\delta} = (1-\rho)\rho^{i-1}.$$
e) Find $\Pr\{X_{n+1} = i \mid D_n = 1\}$ and show that $X_{n+1}$ is statistically independent of $D_n$. Hint: Use (d); also
show that $\Pr\{X_{n+1} = i\} = \Pr\{X_{n+1} = i \mid D_n = 1\}$ for all $i \geq 0$ is sufficient to show independence.
Solution: Given $D_n = 1$, i.e., a departure in $(n\delta, (n{+}1)\delta)$, we have $X_{n+1} = X_n - 1$. Thus
$$\Pr\{X_{n+1} = i \mid D_n = 1\} = \Pr\{X_n = i{+}1 \mid D_n = 1\} = (1-\rho)\rho^i,$$
where we have used (d). From (a), this is $\Pr\{X_{n+1} = i\}$, so the event $\{X_{n+1} = i\}$ is
independent of the event $\{D_n = 1\}$ and thus also independent of the complementary event
$\{D_n = 0\}$. Thus the rv $X_{n+1}$ is independent of the rv $D_n$.
f) Find $\Pr\{X_{n+1} = i, D_{n+1} = j \mid D_n\}$ and show that the pair of variables $(X_{n+1}, D_{n+1})$ is statistically
independent of $D_n$.
Solution:
$$\Pr\{X_{n+1}{=}i, D_{n+1}{=}j \mid D_n\} = \Pr\{D_{n+1}{=}j \mid X_{n+1}{=}i, D_n\}\,\Pr\{X_{n+1}{=}i \mid D_n\}$$
$$= \Pr\{D_{n+1}{=}j \mid X_{n+1}{=}i\}\,\Pr\{X_{n+1}{=}i \mid D_n\} \qquad \text{(A.133)}$$
$$= \Pr\{D_{n+1}{=}j \mid X_{n+1}{=}i\}\,\Pr\{X_{n+1}{=}i\} \qquad \text{(A.134)}$$
$$= \Pr\{D_{n+1}{=}j, X_{n+1}{=}i\}.$$
In (A.133), we used the fact that $\{X_n;\ n \geq 0\}$ is a Markov chain, so that, given $X_{n+1}$, the rv's
$X_n$ and $X_{n+2}$ are independent. Since $D_{n+1}$ is a function of $X_{n+1}$ and $X_{n+2}$, and $D_n$ is a
function of $X_{n+1}$ and $X_n$, it follows that $D_{n+1}$ and $D_n$ are independent conditional on $X_{n+1}$.
In (A.134) we used the independence between $X_{n+1}$ and $D_n$ established in (e).
g) For each $k > 1$, find $\Pr\{X_{n+k} = i, D_{n+k} = j \mid D_{n+k-1}, D_{n+k-2}, \ldots, D_n\}$ and show that the pair $(X_{n+k}, D_{n+k})$
is statistically independent of $(D_{n+k-1}, D_{n+k-2}, \ldots, D_n)$. Hint: use induction on $k$; as a substep, find
$\Pr\{X_{n+k} = i \mid D_{n+k-1} = 1, D_{n+k-2}, \ldots, D_n\}$ and show that $X_{n+k}$ is independent of $D_{n+k-1}, D_{n+k-2}, \ldots, D_n$.
Solution: We use induction on $k$, noting that (f) establishes the base of the induction at
$k = 1$. We compress the notation by letting $D_n^{n+k-1}$ represent $(D_n, D_{n+1}, \ldots, D_{n+k-1})$.
Then, using the same approach as in (f),
$$\Pr\{X_{n+k}, D_{n+k} \mid D_n^{n+k-1}\} = \Pr\{D_{n+k} \mid X_{n+k}, D_n^{n+k-1}\}\,\Pr\{X_{n+k} \mid D_n^{n+k-1}\}$$
$$= \Pr\{D_{n+k} \mid X_{n+k}\}\,\Pr\{X_{n+k} \mid D_n^{n+k-1}\} \qquad \text{(A.135)}$$
$$= \Pr\{D_{n+k} \mid X_{n+k}\}\,\Pr\{X_{n+k}\} \qquad \text{(A.136)}$$
$$= \Pr\{D_{n+k}, X_{n+k}\}.$$
The argument in (A.135) is almost the same as that in (A.133). The only difference is
that here, $D_n^{n+k-1}$ is a function of $X_n, X_{n+1}, \ldots, X_{n+k}$, which, given $X_{n+k}$, is independent of $X_{n+k+1}$ and thus of $D_{n+k}$. In (A.136), we have claimed that $\Pr\{X_{n+k} \mid D_n^{n+k-1}\} =
\Pr\{X_{n+k}\}$, and that is the step where we must use the hint. We look at this first for the
case $D_{n+k-1} = 1$ and use the argument in (e):
$$\Pr\{X_{n+k}{=}i \mid D_{n+k-1}{=}1, D_n^{n+k-2}\} = \Pr\{X_{n+k-1}{=}i{+}1 \mid D_{n+k-1}{=}1, D_n^{n+k-2}\}.$$
From the inductive hypothesis, this is independent of $D_n^{n+k-2}$, and from (e), it is equal to
$\Pr\{X_{n+k} = i\}$.
h) What do your results mean relative to Burke's theorem?
Solution: We have in fact proven Burke's theorem for sampled-time M/M/1 queues. Since
$(X_{n+k}, D_{n+k})$ is independent of the prior sequence of departures, $D_{n+k}$ is also independent
of prior departures. Thus the departure process running backward in time is a Bernoulli
process of rate $\lambda$, and the state is independent of previous departures. The argument was
not particularly difficult, except for being unmotivated and somewhat tricky. The use of
reversibility provides a much more elegant and insightful way of proving this result.
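The identities of parts (c)-(e) can be recomputed numerically from the joint PMF of part (b). The sketch below uses hypothetical values $\lambda\delta = 0.1$, $\mu\delta = 0.2$ (so $\rho = 0.5$) and checks that the marginal departure probability is $\lambda\delta$ and that the state after a departure has the steady-state PMF.

```python
# Numerical recap of parts (b)-(e) of the sampled-time Burke argument,
# with hypothetical lambda*delta and mu*delta.
ld, md = 0.1, 0.2            # lambda*delta, mu*delta
rho = ld / md

def pi(i):
    return (1 - rho) * rho**i

# (c): Pr{D_n = 1} = sum_{i>=1} mu*delta * pi(i) = lambda*delta
pD = sum(md * pi(i) for i in range(1, 300))
assert abs(pD - ld) < 1e-12

# (d)/(e): Pr{X_(n+1) = i | D_n = 1} = mu*delta*pi(i+1)/pD = pi(i)
for i in range(10):
    cond = md * pi(i + 1) / pD
    assert abs(cond - pi(i)) < 1e-12
print("departures occur with probability lambda*delta and leave the chain in steady state")
```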
Exercise 6.16: Let $\{X_n;\ n \geq 1\}$ denote an irreducible recurrent Markov chain having a countable state
space. Now consider a new stochastic process $\{Y_n;\ n \geq 0\}$ that only accepts values of the Markov chain that
are between 0 and some integer $m$. For instance, if $m = 3$ and $X_1 = 1$, $X_2 = 3$, $X_3 = 5$, $X_4 = 6$, $X_5 = 2$,
then $Y_1 = 1$, $Y_2 = 3$, $Y_3 = 2$.
a) Is $\{Y_n;\ n \geq 0\}$ a Markov chain? Explain briefly.
Solution: $\{Y_n;\ n \geq 1\}$ is a Markov chain. The transition probability, say $Q_{ij}$, from $i$ to $j$
in this modified chain is $P_{ij}$ (the probability of a direct transition in the original chain),
plus the aggregate probability of all distinct walks in the original chain starting in $i$, ending
in $j$, and remaining in states greater than $m$ otherwise. The probability of each such walk,
conditional on the initial state $i$, is independent of states previous to that initial state,
providing the Markov property.
b) Let $p_j$ denote the proportion of time that $\{X_n;\ n \geq 1\}$ is in state $j$. If $p_j > 0$ for all $j$, what proportion
of time is $\{Y_n;\ n \geq 0\}$ in each of the states $0, 1, \ldots, m$?
Solution: The assumption that $p_j > 0$ for all $j$ implies that $\{X_n;\ n \geq 1\}$ is positive
recurrent. Thus, if the X chain starts in any given state $i$, the fraction of time spent by the
X process in state $j$ is $\lim_{t\to\infty}(1/t)N_{ij}(t)$, where the limit exists and is equal to $p_j$ WP1.
Time in the Y chain is different from time in the X chain. The clock 'ticks' in the Y chain
only when the state of the X chain is between 0 and $m$. Thus the ratio of time on the Y
clock to time on the X clock, as X time goes to infinity, is $\sum_{k=0}^m p_k$ WP1.
Thus if we let $M_{ij}(\tau)$ be the number of visits to state $j$ (starting in state $i$) up to time $\tau$
on the Y clock, we see that time $\tau$ on the Y clock, as a function of sample path and time $t$
on the X clock, is given by $\tau(t) = \sum_{k=0}^m N_{ik}(t)$. Thus
$$(1/\tau)\,M_{ij}(\tau) = \frac{N_{ij}(t)}{\sum_{k=0}^m N_{ik}(t)}.$$
Taking the limit as $\tau$ (and thus $t$) approaches $\infty$, we get
$$\lim_{\tau\to\infty}(1/\tau)\,M_{ij}(\tau) = \lim_{t\to\infty}\frac{N_{ij}(t)/t}{\sum_{k=0}^m N_{ik}(t)/t} = \frac{p_j}{\sum_{k=0}^m p_k}.$$
c) Suppose $\{X_n\}$ is null recurrent and let $p_i(m)$, $i = 0, 1, \ldots, m$, denote the long-run proportions for $\{Y_n;\ n \geq 0\}$. Show that $p_j(m) = p_i(m)\,\mathsf{E}[\text{time the X process spends in } j \text{ between returns to } i]$, for $j \neq i$.
Solution: We interpret $\mathsf{E}[\text{time the X process spends in } j \text{ between returns to } i]$ as the expected
number of visits to $j$ between visits to $i$. Successive returns to $i$ form a renewal process (or delayed
renewal process, depending on the starting state) in the X process (albeit with an infinite
expected renewal interval). For each sample path, the number of visits to $j$ between visits
to $i$ is the same in the Y process as in the X process. Also, the Y process is a finite-state
Markov chain, and thus $p_i(m) > 0$ for $0 \leq i \leq m$. The result then follows by viewing visits
to $j$ as unit rewards in the renewal process of returns to $i$ in the Y process.
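Part (b) can be checked on a finite example. The sketch below uses a hypothetical 4-state chain censored to $\{0, 1\}$ ($m = 1$): the censored transition matrix is $Q = P_{KK} + P_{KH}(I - P_{HH})^{-1}P_{HK}$, summing over all excursions through the hidden states, and its stationary vector should be $(p_0, p_1)$ renormalized.

```python
# Censored (watched) Markov chain: the Y-chain proportions equal the
# X-chain proportions restricted to {0,...,m} and renormalized.
# The 4-state transition matrix is hypothetical.

P = [[0.10, 0.30, 0.40, 0.20],
     [0.20, 0.20, 0.30, 0.30],
     [0.30, 0.30, 0.20, 0.20],
     [0.25, 0.25, 0.25, 0.25]]
keep, hide = [0, 1], [2, 3]

def stationary(M, iters=20000):
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]
    return v

# (I - P_HH)^{-1} for the 2x2 hidden block, by hand
a, b = 1 - P[2][2], -P[2][3]
c, d = -P[3][2], 1 - P[3][3]
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]

# Q_ij = P_ij + sum over excursions through the hidden states
Q = [[P[i][j]
      + sum(P[i][hide[r]] * inv[r][s] * P[hide[s]][j]
            for r in range(2) for s in range(2))
      for j in keep] for i in keep]

p = stationary(P)        # long-run proportions of the X chain
pY = stationary(Q)       # long-run proportions of the Y chain
for j in keep:
    assert abs(pY[j] - p[j] / (p[0] + p[1])) < 1e-9
print("censored-chain proportions match p_j / (p_0 + p_1)")
```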
Exercise 6.17: Verify that (6.58) is satisfied by the hypothesized solution to $\mathbf{p}$ in (6.62). Also show that
the equations involving the idle state $\phi$ are satisfied.
Solution: Let us first review where these equations come from. We started by making some
plausible guesses about the backward transition probabilities. If these guesses are correct,
then (6.56)-(6.59) must be satisfied, by the definition of backward probabilities. Then, using
the guess in (6.57) about departures in the forward direction, we derived a general formula
for arbitrary states in (6.62), repeated here as
$$\pi_s = \left(\frac{\lambda}{1-\lambda}\right)^m\left(\prod_{j=1}^m F(z_j)\right)\pi_\phi, \qquad \text{(A.137)}$$
where the state $s$ is $s = (m, z_1, \ldots, z_m)$. We have also normalized time so that $\delta = 1$ is
the unit service time. This formula satisfies (6.57) since it was derived from (6.57). As
shown in the text, it also satisfies (6.56), i.e., the relationship between rotation transitions,
assuming that the guesses were correct. If the formula also satisfies (6.58) (for forward
arrival transitions) and if it satisfies the transitions from the empty state, then, by the
guessing theorem, Theorem 6.5.4, the guess must be correct. We then use (A.137) to
calculate $\pi_s P_{s,a(s)}$ and $\pi_{a(s)} P^*_{a(s),s}$ and show they are the same.
    π_s P_{s,a(s)} = λ(1 − f(1)) π_s = λ(1 − f(1)) (λ/(1−λ))^m [ ∏_{j=1}^m F̄(z_j) ] π_φ.

Since a(s) = (m+1, z_1, . . . , z_m, 1) and P*_{a(s),s} = 1 − λ,

    π_{a(s)} P*_{a(s),s} = (1 − λ) (λ/(1−λ))^{m+1} [ ∏_{j=1}^m F̄(z_j) ] F̄(1) π_φ
                         = λ (λ/(1−λ))^m [ ∏_{j=1}^m F̄(z_j) ] F̄(1) π_φ.

Noting that 1 − f(1) = F̄(1), we see that these two expressions are the same and (6.58) is satisfied.
There are two kinds of transitions from the empty state φ: first, self transitions from φ to φ, and second, transitions from φ to (1, 1). We clearly have P_{φ,φ} = P*_{φ,φ}, so π_φ P_{φ,φ} = π_φ P*_{φ,φ}. Finally,

    π_φ P_{φ,(1,1)} = π_φ λ F̄(1);        π_{1,1} P*_{(1,1),φ} = (λ/(1−λ)) F̄(1) π_φ (1 − λ) = λ F̄(1) π_φ,

so π_φ P_{φ,(1,1)} = π_{1,1} P*_{(1,1),φ}.
Exercise 6.18: Replace the state m = (m, z_1, . . . , z_m) in Section 6.8 with an expanded state m = (m, z_1, w_1, z_2, w_2, . . . , z_m, w_m) where m and {z_i; 1 ≤ i ≤ m} are as before and w_1, w_2, . . . , w_m are the original service requirements of the m customers.

a) Hypothesizing the same backward round-robin system as hypothesized in Section 6.8, find the backward transition probabilities and give the corresponding equations to (6.56)-(6.59) for the expanded state description.

Solution: It might be helpful to do Exercise 6.17 before this to increase familiarity with this type of argument. We normalize the time reference to make δ = 1 and use s to refer to the expanded state here. If an arrival with w > 1 required units of service occurs in state s = (m, w_1, z_1, . . . , w_m, z_m), then the next state is a(s, w) = (m+1, w_1, z_1, . . . , w_m, z_m, w, w−1). The probability of this transition is P_{s,a(s,w)} = λf(w).

The hypothesis we are adopting for the backward system is the same hypothesis as in Section 6.8, namely that the backward system is also round robin with Bernoulli arrivals at rate λ per time unit. The state is denoted using the same notation as the forward system, but the customer at the right side of the queue (customer m) is served and then rotates to the left side, and z_m represents the remaining service rather than the service already applied. Thus an arrival in the forward system corresponds to a departure in the backward system. A departure occurs only if the service still required by the customer at the right end of the queue is one less than the original requirement and if there is no arrival in the backward system.
The tricky part is to look at the state a(s, w) = (m+1, w_1, z_1, . . . , w_m, z_m, w, w−1) for w > 1 and realize that when the backward system 'sees' this state, it sees only the state and does not 'know' that the forward system just had an arrival. The state a(s, w), w > 1, satisfies the backward requirement for departure, i.e., the rightmost element satisfies z = w − 1. Also, with probability 1 − λ (from the backward system's view), no backward arrival occurs at that increment. Thus P*_{a(s,w),s} = 1 − λ. The analogy to (6.56), assuming w > 1, with this new way of representing states is then

    P_{s,a(s,w)} = λf(w);        P*_{a(s,w),s} = 1 − λ;        λf(w) π_s = (1 − λ) π_{a(s,w)}.
Using the same type of analysis for a departure in the forward system, a departure can occur only if s = (m, w_1, w_1−1, w_2, z_2, . . . , w_m, z_m). If in addition no arrival occurs, a departure occurs and the new state is d(s) = (m−1, w_2, z_2, . . . , w_m, z_m). In the backward system, this corresponds to an arrival with service requirement w_1; this causes a backward transition from d(s) to s. The probability of these transitions and the resulting hypothesized balance equation is given by the analogy to (6.57),

    P_{s,d(s)} = 1 − λ;        P*_{d(s),s} = λf(w_1);        (1 − λ) π_s = λf(w_1) π_{d(s)},

where we remember that this applies only for s = (m, w_1, w_1−1, w_2, z_2, . . . , w_m, z_m), i.e., for m ≥ 1 where z_1 = w_1 − 1.
Finally, rotation occurs only for s such that m ≥ 1 and z_1 < w_1 − 1, and it then occurs with probability 1 − λ. The analogy to (6.58) is

    P_{s,r(s)} = 1 − λ;        P*_{r(s),s} = 1 − λ;        π_s = π_{r(s)}.
Finally, a transition from s to s occurs if there is an arrival of requirement 1. The corresponding backward transition is also an arrival of requirement 1. The analogy to (6.59) is

    P_{s,s} = λf(1);        P*_{s,s} = λf(1);        π_s = π_s.
b) Solve the resulting equations to show that

    π_m = π_φ (λ/(1−λ))^m ∏_{j=1}^m f(w_j).    (A.138)
Solution: For any state s, consider the string of transitions with no arrivals until the system becomes empty. The product of these transition probabilities consists of 1's for each rotation and λf(w_i)/(1 − λ) for each departure, leading to (A.138). There is a slight mathematical question here in verifying that each state can be reduced to φ in this way. Note that after each arrival, the new (w, z) pair satisfies z = 1, w > 1, and each rotation increases one of the z's, but only up to w − 1 for that pair. Thus each feasible state contains only pairs for which z_i < w_i. The z_i's are increasing with time and the customer departs when z_1 = w_1 − 1. It is easy to check that (A.138) is consistent with each of the balance equations above, and therefore, by Theorem 6.5.4, both the backward round-robin hypothesis and (A.138) are verified.
c) Show that the probability that there are m customers in the system, and that those customers have original service requirements given by w_1, . . . , w_m, is

    Pr{m, w_1, . . . , w_m} = π_φ (λ/(1−λ))^m ∏_{j=1}^m (w_j − 1) f(w_j).    (A.139)

Solution: We saw in (b) that all feasible states satisfy z_i < w_i for each i, 1 ≤ i ≤ m. Thus if all states satisfying 1 ≤ z_i ≤ w_i − 1 are feasible, then there are ∏_{i=1}^m (w_i − 1) different states with the same values of m, w_1, . . . , w_m. To show that these states are all feasible, start with an arbitrary hypothesized state s = (m, w_1, z_1, . . . , w_m, z_m) with 1 ≤ z_i < w_i, 1 ≤ i ≤ m. The argument in (b) finds the probability of that state (if it is feasible), and the corresponding backward sequence then shows that that state is feasible. Adding the probability of each of these ∏_{i=1}^m (w_i − 1) states and using (A.138) gives us (A.139).
d) Given that a customer has original service requirement w, find the expected time that the customer spends in the system.

Solution: The probability that m customers are in the system in steady state is given in (6.66) as P_m = (1 − ρ)ρ^m where ρ = [λ/(1 − λ)](W̄ − 1) (recall that the notation for states in this exercise does not change the underlying system). Thus the probability that the m customers in the system have original requirements w_1, . . . , w_m, given m customers in the system, is Pr{w_1, . . . , w_m | m} = ∏_{i=1}^m (w_i − 1) f(w_i) [W̄ − 1]^{−m}. Thus the requirements of the m customers are independent given m. Let N(w) be the number of customers in the system in steady state that have requirement w. Conditional on m in the system, E[N(w)|m] = m(w − 1)f(w)/[W̄ − 1]. Averaging over m, then

    E[N(w)] = ρ(w − 1)f(w) / ((1 − ρ)[W̄ − 1]).

Finally, using Little's law on the customers requiring w units of service, the expected delay of those customers is (w − 1)/[1 − λ − λ(W̄ − 1)].

Note that as the increment of service becomes small, W̄ and typical values of w become large, since we have normalized time to the increment of service. The fact that w and W̄ are reduced by 1 here is primarily an anomaly of the state labeling process. For example, if all requirements were unit requirements, then the system would always be in the empty state. In the same way, if there is only one customer in the system with requirement w, then the state includes that customer only for w − 1 time units. There is nothing wrong in this; the state accurately represents the system at each integer time at the end of the service during that increment. However, quantities like delay must be interpreted with some care.
A.7 Solutions for Chapter 7
Exercise 7.1: Consider an M/M/1 queue as represented in Figure 7.4. Assume throughout that X_0 = i where i > 0. The purpose of this exercise is to understand the relationship between the holding interval until the next state transition and the interval until the next arrival to the M/M/1 queue. Your explanations in the following parts can and should be very brief.

a) Explain why the expected holding interval E[U_1 | X_0 = i] until the next state transition is 1/(λ + µ).

Solution: By definition of a countable-state Markov process, U_n, conditional on X_{n−1}, is exponential. For the M/M/1 queue, the rate of the exponential out of state i > 0 is λ + µ, and thus the expected interval U_1 is 1/(λ + µ).
b) Explain why the expected holding interval U1 , conditional on X0 = i and X1 = i + 1, is
E [U1 |X0 = i, X1 = i + 1] = 1/( + µ).
Show that E [U1 |X0 = i, X1 = i
1] is the same.
Solution: The holding interval U1 , again by definition of a countable state Markov process,
conditional on X0 = i > 0, is independent of the next state X1 (see Figure 7.1). Thus
E [U1 |X0 = i, X1 = i + 1] = E [U1 |X0 = i] =
1
.
+µ
This can be visualized by viewing the arrival process and the departure process (while the
queue is busy) as independent Poisson processes of rate and µ respectively. From Section
2.3, on splitting and combining of Poisson processes, U1 (the time until the first occurrence
in the combined Poisson process) is independent of which split process (arrival or departure)
that first occurrence comes from.
Since this result is quite unintuitive for most people, we explain it in yet another way.
Quantizing time into very small increments of size , the probability of a customer arrival
in each increment is
and the probability of a customer departure (assuming the server is
busy) is µ . This is the same for every increment and is independent of previous increments
(so long as the server is busy). Thus X1 ( which is X0 + 1 for an arrival and X0 1 for a
departure) is independent of the time of that first occurrence, Thus given X0 > 0, the time
of the next occurrence (U1 ) is independent of X1 .
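The independence claim is easy to probe numerically (this check is an addition, not part of the original solution). The sketch below, with assumed example values λ = 0.01 and µ = 1, samples U_1 as the minimum of the two exponential clocks and checks that its conditional mean is near 1/(λ + µ) whichever clock fires first:

```python
import random

# Check that U1 = min(A, D) has the same conditional mean regardless of
# which of the two independent exponential clocks fires first.
# lam and mu are assumed example values, not fixed by the exercise.
lam, mu = 0.01, 1.0
rng = random.Random(7)

up_times, down_times = [], []
for _ in range(200_000):
    a = rng.expovariate(lam)   # time to next arrival
    d = rng.expovariate(mu)    # time to next departure (server busy)
    if a < d:
        up_times.append(a)     # X1 = i + 1
    else:
        down_times.append(d)   # X1 = i - 1

mean_up = sum(up_times) / len(up_times)
mean_down = sum(down_times) / len(down_times)
print(mean_up, mean_down, 1 / (lam + mu))   # all three should be close
```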
c) Let V be the time of the first arrival after time 0 (this may occur either before or after the time W of the first departure). Show that

    E[V | X_0 = i, X_1 = i + 1] = 1/(λ + µ).    (A.140)
    E[V | X_0 = i, X_1 = i − 1] = 1/(λ + µ) + 1/λ.    (A.141)

Hint: In the second equation, use the memorylessness of the exponential rv and the fact that V under this condition is the time to the first departure plus the remaining time to an arrival.

Solution: Given X_0 = i > 0 and X_1 = i + 1, the first transition is an arrival, and from (b), the interval until that arrival is 1/(λ + µ), verifying (A.140). Given X_0 = i > 0 and X_1 = i − 1, the first transition is a departure, and from (b) the conditional expected time until this first transition is 1/(λ + µ). The expected time after this first transition to an arrival is simply the expected interval until an arrival, 1/λ, starting at that first transition, verifying (A.141).
d) Use your solution to (c) plus the probability of moving up or down in the Markov chain to show that E[V] = 1/λ. (Note: you already know that E[V] = 1/λ. The purpose here is to show that your solution to (c) is consistent with that fact.)

Solution: This is a sanity check on (A.140) and (A.141). For i > 0,

    Pr{X_1 = i+1 | X_0 = i} = λ/(λ + µ);        Pr{X_1 = i−1 | X_0 = i} = µ/(λ + µ).

Using this with (A.140) and (A.141),

    E[V | X_0 = i] = [λ/(λ + µ)] · 1/(λ + µ) + [µ/(λ + µ)] · [1/(λ + µ) + 1/λ]
                   = 1/(λ + µ) + µ/(λ(λ + µ)) = 1/λ.
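A two-line numeric check of this identity (an addition to the original solution), with arbitrary example values of λ and µ:

```python
# Numeric sanity check of E[V] = 1/lam from (A.140)-(A.141);
# lam and mu are arbitrary example values.
lam, mu = 0.3, 1.7
p_up = lam / (lam + mu)
p_down = mu / (lam + mu)
e_v = p_up * (1 / (lam + mu)) + p_down * (1 / (lam + mu) + 1 / lam)
print(e_v, 1 / lam)   # identical up to rounding
```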
Exercise 7.2: Consider a Markov process for which the embedded Markov chain is a birth-death chain with transition probabilities P_{i,i+1} = 2/5 for all i ≥ 1, P_{i,i−1} = 3/5 for all i ≥ 1, P_{01} = 1, and P_{ij} = 0 otherwise.

a) Find the steady-state probabilities {π_i; i ≥ 0} for the embedded chain.
Solution: The embedded chain is given below. Note that it is the same embedded chain as in Example 7.2.8 and Exercise 7.3, but the process behavior will be very different when the holding times are brought into consideration.

(Figure: a birth-death chain on states 0, 1, 2, 3, . . . with P_{01} = 1, and P_{i,i+1} = 2/5, P_{i,i−1} = 3/5 for i ≥ 1; the holding rates ν_i = 2^i are shown under the states.)
The embedded chain is a birth-death chain, and thus the steady-state probabilities are related by (2/5)π_i = (3/5)π_{i+1} for i ≥ 1. Thus, as we have done many times, π_i = π_1 (2/3)^{i−1}. The transitions between states 0 and 1 are different, and π_1 = (5/3)π_0. Thus for i ≥ 1, we have

    π_i = (5/3)(2/3)^{i−1} π_0 = (5/2)(2/3)^i π_0.

Setting the sum of π_0 plus Σ_{i≥1} π_i to 1, we get π_0 = 1/6. Thus

    π_i = (5/12)(2/3)^i for i ≥ 1;        π_0 = 1/6.    (A.142)

b) Assume that the transition rate ν_i out of state i, for i ≥ 0, is given by ν_i = 2^i. Find the transition rates {q_{ij}} between states and find the steady-state probabilities {p_i} for the Markov process. Explain heuristically why π_i ≠ p_i.
Solution: The transition rate q_{ij} is given by ν_i P_{ij}. Thus for i > 0,

    q_{i,i+1} = (2/5)·2^i;        q_{i,i−1} = (3/5)·2^i;        q_{01} = 1.

The simplest way to evaluate p_i is by (7.7), i.e., p_i = (π_i/ν_i) / [Σ_{j≥0} π_j/ν_j].

    Σ_{j≥0} π_j/2^j = 1/6 + (5/12) Σ_{i>0} (1/3)^i = 1/6 + (5/12)(1/2) = 3/8.

Thus p_0 = 4/9 and, for i > 0, p_i = (10/9)3^{−i}. Note that π_i is the steady-state fraction of transitions going into state i and p_i is the steady-state fraction of time in state i. Since ν_i^{−1} = 2^{−i} is the expected time-interval in state i per transition into state i, we would think (correctly) that p_i would approach 0 much more rapidly as i → ∞ than π_i does.
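As a numeric check (an addition, not part of the original solution), truncating the sums at a large N reproduces Σ_j π_j/ν_j = 3/8, p_0 = 4/9, and p_i = (10/9)3^{−i}:

```python
# Numeric check of Exercise 7.2(b): compute p_i from (7.7) using
# pi_0 = 1/6, pi_i = (5/12)(2/3)^i for i >= 1, and nu_i = 2^i,
# truncating the sums at a large N (the tails are geometrically small).
N = 200
pi = [1/6] + [(5/12) * (2/3)**i for i in range(1, N)]
nu = [2**i for i in range(N)]
denom = sum(pi[j] / nu[j] for j in range(N))   # should be 3/8
p = [pi[i] / nu[i] / denom for i in range(N)]
print(denom, p[0], p[3])   # expect 0.375, 4/9, (10/9)/27
```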
c) Explain why there is no sampled-time approximation for this process. Then truncate the embedded chain to states 0 to m and find the steady-state probabilities for the sampled-time approximation to the truncated process.

Solution: In order for a sampled-time approximation to exist, one must be able to choose the time-increment δ small enough that the conditional probability of a transition out of the state in that increment is small. Since the transition rates are unbounded, this cannot be done here. If we truncate the chain to states 0 to m, then the ν_i are bounded by 2^m, so choosing a time-increment less than 2^{−m} will allow a sampled-time approximation to exist.

There are two sensible ways to do this truncation. In both, q_{m,m+1} must be changed to 0, but then q_{m,m−1} can either be kept the same (thus reducing ν_m) or ν_m can be kept the same (thus increasing q_{m,m−1}). We keep q_{m,m−1} the same since it simplifies the answer slightly. Let {p_i^{(m)}; 0 ≤ i ≤ m} be the steady-state process PMF in the truncated chain. Since these truncated chains (as well as the untruncated chain) are birth-death chains, we can use (7.38), p_i^{(m)} = p_0^{(m)} ∏_{j<i} ρ_j, where ρ_j = q_{j,j+1}/q_{j+1,j}. Thus ρ_0 = 5/6 and ρ_i = 1/3 for 1 ≤ i < m. Thus

    p_i^{(m)} = p_0^{(m)} (5/6)(1/3)^{i−1}    for 1 ≤ i ≤ m.
Since p_0^{(m)} + p_1^{(m)} + · · · + p_m^{(m)} = 1, we can solve for p_0^{(m)} from

    1 = p_0^{(m)} [1 + (5/6)(1 + 3^{−1} + 3^{−2} + · · · + 3^{−m+1})] = p_0^{(m)} [1 + (5/4)(1 − 3^{−m})].

Combining these equations,

    p_0^{(m)} = 4/(9 − 5·3^{−m});        p_i^{(m)} = 10·3^{−i}/(9 − 5·3^{−m})    for 1 ≤ i ≤ m.    (A.143)

For δ < 2^{−m}, these are also the sampled-time 'approximation' to the truncated chain.
d) Show that as m → ∞, the steady-state probabilities for the sequence of sampled-time approximations approach the probabilities p_i in (b).

Solution: For each i, we see from (A.143) that lim_{m→∞} p_i^{(m)} = p_i. It is important to recognize that the convergence is not uniform in i, since (A.143) is defined only for i ≤ m. Similarly, for each i, the approximation is close only when both m ≫ i and δ is small relative to 3^{−m}.
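A quick check of this convergence (an addition, not part of the original solution); p_trunc and p_exact below are hypothetical helper names for the two formulas:

```python
# Convergence check for Exercise 7.2(d): the truncated-chain
# probabilities of (A.143) approach p_0 = 4/9, p_i = (10/9)3^{-i}
# of part (b) as m grows.
def p_trunc(i, m):
    denom = 9 - 5 * 3**(-m)
    return 4 / denom if i == 0 else 10 * 3**(-i) / denom

def p_exact(i):
    return 4/9 if i == 0 else (10/9) * 3**(-i)

for i in range(4):
    print(i, p_trunc(i, 5), p_trunc(i, 30), p_exact(i))
```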
Exercise 7.3: Consider a Markov process for which the embedded Markov chain is a birth-death chain with transition probabilities P_{i,i+1} = 2/5 for all i ≥ 1, P_{i,i−1} = 3/5 for all i ≥ 1, P_{01} = 1, and P_{ij} = 0 otherwise.

a) Find the steady-state probabilities {π_i; i ≥ 0} for the embedded chain.
Solution: This is the same embedded chain as in Exercise 7.2. The steady-state embedded probabilities were calculated there in (A.142) to be

    π_i = (5/12)(2/3)^i for i ≥ 1;        π_0 = 1/6.

b) Assume that the transition rate out of state i, for i ≥ 0, is given by ν_i = 2^{−i}. Find the transition rates {q_{ij}} between states and show that there is no probability vector solution {p_i; i ≥ 0} to (7.23).
Solution: This particular Markov process was discussed in Example 7.2.8. The transition rates q_{ij} are given by q_{01} = 1 and, for all i > 0,

    q_{i,i+1} = P_{i,i+1} ν_i = (2/5)·2^{−i};        q_{i,i−1} = P_{i,i−1} ν_i = (3/5)·2^{−i}.

The other transition rates are all 0. We are to show that there is no probability vector {p_i; i ≥ 0} solution to (7.23), i.e., no solution to

    p_j ν_j = Σ_i p_i q_{ij} for all j ≥ 0, with Σ_i p_i = 1.    (7.23′)

Let α_j = p_j ν_j, so that any hypothesized solution to (7.23′) becomes α_j = Σ_i α_i P_{ij} for all j, with Σ_i α_i/ν_i = 1.

Since the embedded chain is positive recurrent, there is a unique solution {π_i; i ≥ 0} such that π_j = Σ_i π_i P_{ij} for all j and Σ_i π_i = 1. Thus there must be some 0 < β < ∞ such that any hypothesized solution to (7.23′) satisfies α_j = βπ_j for all j ≥ 0. Thus Σ_j βπ_j/ν_j < ∞ for this hypothesized solution.

Now note that π_j/ν_j = (5/12)(4/3)^j. Since this increases exponentially with j, we must have Σ_j π_j/ν_j = ∞. Thus there cannot be a probability vector solution to (7.23). We discuss this further after (c) and (d).
c) Argue that the expected time between visits to any given state i is infinite. Find the expected number of transitions between visits to any given state i. Argue that, starting from any state i, an eventual return to state i occurs with probability 1.

Solution: From Theorem 7.2.6 (which applies since the embedded chain is positive recurrent), the expected time between returns to state i is W̄(i) = (1/π_i) Σ_k π_k/ν_k. Since Σ_k π_k/ν_k is infinite and π_i is positive for all i, W̄(i) = ∞. As explained in Section 7.2, W(i) is a rv (i.e., non-defective), and is thus finite with probability 1 (guaranteeing an eventual return with probability 1). Under the circumstances here, however, it is a rv with infinite expectation for each i. The expected number of transitions between returns to state i is 1/π_i, which is finite (since the embedded chain is positive recurrent), but the increasingly long intervals spent in high-numbered states cause the expected renewal time from i to i to be infinite.
d) Consider the sampled-time approximation of this process with δ = 1. Draw the graph of the resulting Markov chain and argue why it must be null recurrent.

Solution:
(Figure: the sampled-time chain with δ = 1. State 0 moves to state 1 with probability 1; for i ≥ 1, state i moves up with probability (2/5)2^{−i}, down with probability (3/5)2^{−i}, and otherwise has a self transition, i.e., self-transition probabilities 1/2, 3/4, 7/8, . . . for states 1, 2, 3, . . . .)
Note that the approximation is not very good at the low-numbered states, but it gets better and better for higher-numbered states, since ν_i becomes increasingly negligible compared to 1/δ = 1, and thus there is negligible probability of more than one arrival or departure in a unit increment. Thus it is intuitively convincing that the mean interval between returns to any given state i must be infinite, but that a return must happen eventually with probability 1. This is a convincing argument why the chain is null recurrent.

At the same time, the probability of an up-transition from i to i + 1 is 4/3 of the probability of a down-transition from i + 1 to i, making the chain look like the classical example of a transient chain in Figure 6.2. Thus there is a need for a more convincing argument.
One could truncate both the process and the sampled-time chain to states 0 to m and then go to the limit as m → ∞, but this would be very tedious.
We next outline a procedure that is mathematically rigorous to show that the chain above
is null recurrent; the procedure can also be useful in other circumstances. Each sample
sequence of states for the chain above consists of runs of the same state separated on
each side by a state either one larger or one smaller. Consider creating a new Markov
chain by replacing each such run of repeated states with a single copy of that state. It
is easy to see that this new chain is Markov. Also, since there can be no repetition of a state, the new transition probabilities, say Q_{i,i+1} and Q_{i,i−1} for i > 0, satisfy Q_{i,i+1} = P_{i,i+1}/(P_{i,i+1} + P_{i,i−1}) = 2/5 and Q_{i,i−1} = 3/5. Thus this new chain is the same as the embedded chain of the original process, which is already known to be positive recurrent.
At this point, we can repeat the argument in Section 7.2.2, merely replacing the exponentially distributed reward interval in Figure 7.7 with a geometrically distributed interval.
The expected first passage time from i to i (in the approximation chain) is then infinite as
before, and the return occurs eventually with probability 1. Thus the approximation chain
is null recurrent.
The nice thing about the above procedure is that it can be applied to any birth-death chain with self transitions.
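The run-collapsing trick above can also be tested by simulation (an addition, not part of the original solution): generate the sampled-time chain, delete repeated states, and check that the up-move frequency from states i ≥ 1 is near the embedded chain's 2/5. This is only a rough sketch with an assumed run length and seed; since the chain is null recurrent, the number of collapsed moves in a fixed run is itself random:

```python
import random

# Simulate the sampled-time chain (delta = 1) of Exercise 7.3(d),
# collapse runs of repeated states, and measure the up-transition
# frequency from states i >= 1; it should approach P_{i,i+1} = 2/5
# of the embedded chain.
rng = random.Random(3)
state, path = 0, [0]
for _ in range(400_000):
    if state == 0:
        state = 1                        # P_{01} = 1
    else:
        u = rng.random()
        p_up = (2/5) * 2**(-state)
        p_down = (3/5) * 2**(-state)
        if u < p_up:
            state += 1
        elif u < p_up + p_down:
            state -= 1
        # else: self transition, state unchanged
    if state != path[-1]:
        path.append(state)               # collapse runs of repeated states

ups = sum(1 for a, b in zip(path, path[1:]) if a >= 1 and b == a + 1)
moves = sum(1 for a, b in zip(path, path[1:]) if a >= 1)
print(ups / moves)   # should be near 2/5
```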
Exercise 7.4: Consider the Markov process for the M/M/1 queue, as given in Figure 7.4.

a) Find the steady-state process probabilities (as a function of ρ = λ/µ) from (7.15) and also as the solution to (7.23). Verify that the two solutions are the same.

Solution: For the M/M/1 queue, the embedded steady-state probabilities are calculated in (7.4) to be π_0 = (1 − ρ)/2 and π_i = (1/2)(1 − ρ²)ρ^{i−1} for i > 0, where ρ = λ/µ and ρ < 1 is assumed. Since the holding parameters are ν_0 = λ and ν_i = λ + µ for i > 0, it is easy but somewhat tedious to calculate p_j from (7.7), i.e., from p_j = (π_j/ν_j)/(Σ_k π_k/ν_k). We find that Σ_k π_k/ν_k = 1/(2λ), and finally p_j = (1 − ρ)ρ^j. This is clearly not the simple approach for the M/M/1 queue. Substituting this into (7.23), we easily see that this solution also satisfies (7.23). It also satisfies (7.37), which of course is the easy way to find the p_j.
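A numeric check of these two computations (an addition, not part of the original solution), with an assumed value ρ = 0.6 and truncated sums:

```python
# Numeric check of Exercise 7.4(a): with rho = lam/mu < 1 (example
# values), verify sum_k pi_k/nu_k = 1/(2 lam) and p_j = (1 - rho) rho^j.
lam, mu = 0.6, 1.0
rho = lam / mu
N = 2000   # truncation; the tail is geometrically small
pi = [(1 - rho) / 2] + [0.5 * (1 - rho**2) * rho**(i - 1) for i in range(1, N)]
nu = [lam] + [lam + mu] * (N - 1)
s = sum(pi[k] / nu[k] for k in range(N))
p = [pi[j] / nu[j] / s for j in range(N)]
print(s, 1 / (2 * lam))           # equal
print(p[0], 1 - rho)              # equal
print(p[5], (1 - rho) * rho**5)   # equal
```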
b) For the remaining parts of the exercise, assume that ρ = 0.01, thus ensuring (for aiding intuition) that states 0 and 1 are much more probable than the other states. Assume that the process has been running for a very long time and is in steady state. Explain in your own words the difference between π_1 (the steady-state probability of state 1 in the embedded chain) and p_1 (the steady-state probability that the process is in state 1). More explicitly, what experiments could you perform (repeatedly) on the process to measure π_1 and p_1?

Solution: π_1 is the long-term fraction of transitions going into state 1, whereas p_1 is the long-term fraction of time in state 1. With ρ = 0.01, the process is in state zero 99% of the time and state one .99% ≈ 1% of the time. Since for every transition into state 0, the following transition must go to state 1, it is not surprising that π_1 = (1/2)(1 − ρ²) ≈ 1/2.

One could measure the π_i's by tabulating the fraction of transitions going into each state over a very long period. One could measure the p_i's by tabulating the fraction of time in each state over a very long period. One could also measure the p_i's by averaging the number of samples in that state over a long interval, using either evenly spaced or randomly spaced samples. One could not measure either the π_i's or the p_i's, however, by sampling the next transition after an evenly spaced or randomly spaced sample time. With ρ = 0.01, the process is in state zero 99% of the time, and the next transition would then be to state 1 with probability slightly more than 0.99.
The p_i's are called the steady-state process probabilities, and this terminology makes sense because, from (7.11), lim_{t→∞} Pr{X(t) = i} = p_i. Thus after the process has been running a long time t, p_i gives the probability that the process is in state i at time t. The π_i can only be considered as steady-state probabilities within the sequence of state transitions in the embedded Markov chain.
c) Now suppose you want to start the process in steady state. Show that it is impossible to choose initial
probabilities so that both the process and the embedded chain start in steady state. Which version of steady
state is closest to your intuitive view? (There is no correct answer here, but it is important to realize that
the notion of steady state is not quite as simple as you might imagine).
Solution: The embedded chain is in steady state when the state distribution is given by {π_i; i ≥ 0} and the process is in steady state when the state distribution is {p_i; i ≥ 0}. We have seen that these distributions are different. Most people would view the process to be started in steady state if Pr{X(0) = i} = p_i, since then Pr{X(t) = i} = p_i for all t. This is why the p_i are denoted as steady-state process probabilities.
d) Let M (t) be the number of transitions (counting both arrivals and departures) that take place by time
t in this Markov process and assume that the embedded Markov chain starts in steady state at time 0.
Let U1 , U2 , . . . be the sequence of holding intervals between transitions (with U1 being the time to the first
transition). Show that these rv’s are identically distributed. Show by example that they are not independent
(i.e., M (t) is not a renewal process).
Solution: If the embedded Markov chain is in steady state, then the state at the next transition is also in steady state, so U_2 has the same distribution as U_1. If we visualize this process as an M/M/1 queue, we see occasional arrivals usually followed very quickly by a departure, so arrivals and departures almost alternate, with rather rare periods when a customer actually waits in a queue. Analytically, since π_0 = 0.99/2 and 1 − π_0 = 1.01/2, we see that Fᶜ_{U_1}(t) ≈ (1/2)e^{−λt} + (1/2)e^{−(λ+µ)t}. If we take λ = 0.01, µ = 1, and t = 10, then Fᶜ_{U_1}(t) ≈ 1/2 and we get

    Pr{U_1 > t, U_2 > t | X_0 = 0, X_1 = 1} = exp[−λt − (λ + µ)t] = exp[−10.2] ≈ 0.
We get the same answer conditioning on X_0 = 1, X_1 = 0, and the condition X_0 = X_1 is an impossible event. The event {U_1 > 10, U_2 > 10} is more unlikely than above for all other combinations of X_0, X_1, so Pr{U_1 > 10, U_2 > 10} ≈ 0, demonstrating a lack of statistical independence. The fact that M(t) is not a renewal process (for this process or others) makes the analysis in Section 7.2 more complex than it might appear to be.
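The lack of independence can be seen directly by Monte Carlo (a rough sketch added here, with an assumed sample size and seed): the marginal probabilities Pr{U_1 > 10} and Pr{U_2 > 10} are each near 1/2, while their joint probability is essentially 0:

```python
import random

# Monte Carlo for Exercise 7.4(d): start the embedded M/M/1 chain in
# steady state (rho = 0.01) and sample the first two holding intervals.
lam, mu, t = 0.01, 1.0, 10.0
rho = lam / mu
rng = random.Random(5)

def sample_state0():
    # embedded steady state: pi_0 = (1-rho)/2, pi_i = (1/2)(1-rho^2)rho^(i-1)
    if rng.random() < (1 - rho) / 2:
        return 0
    i = 1
    while rng.random() < rho:   # geometric tail of the pi_i
        i += 1
    return i

both, first, second, n = 0, 0, 0, 100_000
for _ in range(n):
    x0 = sample_state0()
    u1 = rng.expovariate(lam if x0 == 0 else lam + mu)
    if x0 == 0:
        x1 = 1                  # from state 0, the next state is always 1
    else:
        x1 = x0 + 1 if rng.random() < lam / (lam + mu) else x0 - 1
    u2 = rng.expovariate(lam if x1 == 0 else lam + mu)
    first += u1 > t
    second += u2 > t
    both += (u1 > t) and (u2 > t)

# marginals near 1/2 each; joint near 0: U1 and U2 are not independent
print(first / n, second / n, both / n)
```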
Exercise 7.5: Consider the Markov process illustrated below. The transitions are labelled by the rate q_{ij} at which those transitions occur. The process can be viewed as a single-server queue where arrivals become increasingly discouraged as the queue lengthens. The word time-average below refers to the limiting time-average over each sample-path of the process, except for a set of sample paths of probability 0.

(Figure: a birth-death process on states 0, 1, 2, 3, 4, . . . with upward rates q_{i,i+1} = λ/(i + 1) and downward rates q_{i+1,i} = µ.)
a) Find the time-average fraction of time p_i spent in each state i > 0 in terms of p_0 and then solve for p_0. Hint: First find an equation relating p_i to p_{i+1} for each i. It also may help to recall the power series expansion of e^x.

Solution: The p_i, i ≥ 0 for a birth-death Markov process are related by p_{i+1} q_{i+1,i} = p_i q_{i,i+1}, which in this case is p_{i+1} µ = p_i λ/(i + 1). Iterating this equation,

    p_i = p_{i−1} λ/(µi) = p_{i−2} λ²/(µ² i(i−1)) = · · · = p_0 λ^i/(µ^i i!).

Denoting λ/µ by ρ,

    1 = Σ_{i=0}^∞ p_i = p_0 Σ_{i=0}^∞ ρ^i/i! = p_0 e^ρ.

Thus,

    p_0 = e^{−ρ};        p_i = ρ^i e^{−ρ}/i!.

b) Find a closed-form solution to Σ_i p_i ν_i, where ν_i is the rate at which transitions out of state i occur. Show that the embedded chain is positive recurrent for all choices of λ > 0 and µ > 0 and explain intuitively why this must be so.
Solution: The embedded-chain steady-state probabilities π_i can be found from the steady-state process probabilities by (7.11), i.e.,

    π_j = p_j ν_j / Σ_i p_i ν_i.    (7.11′)

We start by finding Σ_i p_i ν_i. The departure rate from state i is

    ν_0 = λ;        ν_i = µ + λ/(i + 1) for all i > 0.

We now calculate Σ_i p_i ν_i by separating out the i = 0 term and then, for i ≥ 1, summing separately over the two terms, µ and λ/(i + 1), of ν_i:

    Σ_{i=0}^∞ p_i ν_i = λe^{−ρ} + Σ_{i=1}^∞ e^{−ρ} ρ^i µ/i! + Σ_{i=1}^∞ λ e^{−ρ} ρ^i/(i!(i + 1)).

Substituting µρ for λ and combining the first and third terms,

    Σ_{i=0}^∞ p_i ν_i = Σ_{i=1}^∞ e^{−ρ} ρ^i µ/i! + Σ_{i=0}^∞ e^{−ρ} ρ^{i+1} µ/(i + 1)!
                      = 2 Σ_{i=1}^∞ e^{−ρ} ρ^i µ/i! = 2µ(1 − e^{−ρ}).

Since Σ_i p_i ν_i < ∞, we see from (7.11′) that each π_i is strictly positive and that Σ_i π_i = 1. Thus the embedded chain is positive recurrent. Intuitively, {π_i; i ≥ 0} is found from {p_i; i ≥ 0} by finding π′_i = p_i ν_i and then normalizing {π′_i; i ≥ 0} to sum to 1. Since min(λ, µ) ≤ ν_i ≤ λ + µ, i.e., the ν_i all lie within positive bounds, this normalization must work (i.e., Σ_i p_i ν_i < ∞).
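A numeric check of the closed form 2µ(1 − e^{−ρ}) (an addition, not part of the original solution), with assumed example rates; the Poisson PMF is built iteratively to avoid factorial overflow:

```python
import math

# Numeric check of Exercise 7.5(b): sum_i p_i nu_i should equal
# 2 mu (1 - e^{-rho}); lam and mu are arbitrary example values.
lam, mu = 1.3, 0.7
rho = lam / mu
N = 200                        # Poisson tail beyond N is negligible
p = [math.exp(-rho)]           # p_0 = e^{-rho}
for i in range(1, N):
    p.append(p[-1] * rho / i)  # p_i = rho^i e^{-rho} / i!
nu = [lam] + [mu + lam / (i + 1) for i in range(1, N)]
total = sum(pi_ * nu_ for pi_, nu_ in zip(p, nu))
print(total, 2 * mu * (1 - math.exp(-rho)))   # should agree closely
```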
c) For the embedded Markov chain corresponding to this process, find the steady-state probabilities π_i for each i ≥ 0 and the transition probabilities P_{ij} for each i, j.

Solution: The transition probabilities are P_{ij} = q_{ij}/ν_i, i.e., P_{01} = 1 and, for i ≥ 1, P_{i,i+1} = ρ/(i + 1 + ρ) and P_{i,i−1} = (i + 1)/(i + 1 + ρ). Since π_j = p_j ν_j/[Σ_i p_i ν_i], we simply plug in the values for p_i, ν_i, and Σ_i p_i ν_i found in (a) and (b) to get

    π_0 = ρ / (2(e^ρ − 1));        π_i = [ρ^i / (2 i! (e^ρ − 1))] (ρ/(i + 1) + 1)    for i ≥ 1.

There are many forms for this answer. One sanity check is to observe that the embedded chain probabilities do not change if λ and µ are both multiplied by the same constant, and thus the π_i must be a function of ρ alone. Another sanity check is to observe that in the limit ρ → 0, the embedded chain is dominated by an alternation between states 0 and 1, so that in this limit π_0 = π_1 = 1/2.
d) For each i, find both the time-average interval and the time-average number of overall state transitions between successive visits to i.

Solution: The time-average interval between visits to state i is W̄_i = 1/(p_i ν_i). This is explained in detail in Section 7.2.6, but the essence of the result is that for renewals at successive entries to state i, p_i must be the ratio of the expected time 1/ν_i spent in state i to the overall expected renewal interval W̄_i. Thus W̄_i = 1/(ν_i p_i), i.e.,

    W̄_0 = e^ρ/λ;        W̄_i = (i + 1)! e^ρ / (ρ^i [λ + (i + 1)µ])    for i ≥ 1.

The time-average number of state transitions per visit to state i is T̄_{ii} = 1/π_i. This is proven in Theorem 6.3.8.
Exercise 7.6: (Detail from proof of Theorem 7.2.6) a) Let U_n be the nth holding interval for a Markov process starting in steady state, with Pr{X_0 = i} = π_i. Show that E[U_n] = Σ_k π_k/ν_k for each integer n.

Solution: There are two reasonable definitions of starting in steady state for a Markov process; the first is for the embedded chain to start in steady state and the second is for the process to start with X(0) having a PMF equal to the process steady-state probabilities p_i (see Exercise 7.4). Here it is explicitly stated that the embedded chain starts in steady state, and thus at each transition, X_n is in steady state with the PMF Pr{X_n = k} = π_k. Since the expected holding interval given X_n = k is 1/ν_k, the marginal expected holding interval at each transition n is E[U_n] = Σ_k π_k/ν_k.

b) Let S_n be the epoch of the nth transition. Show that Pr{S_n ≥ nδ} ≤ Σ_k π_k/(ν_k δ) for all δ > 0.

Solution: E[S_n] = E[U_1] + E[U_2] + · · · + E[U_n] = n Σ_k π_k/ν_k. Applying the Markov inequality, we get the desired result.
c) Let M(t) be the number of transitions of the Markov process up to time t, given that X_0 is in steady state. Show that Pr{M(nδ) ≥ n} ≥ 1 − Σ_k π_k/(ν_k δ).

Solution: Using (b),

    Pr{S_n ≤ nδ} = 1 − Pr{S_n > nδ} ≥ 1 − Pr{S_n ≥ nδ} ≥ 1 − Σ_k π_k/(ν_k δ).

The event {S_n ≤ nδ} is the event that the nth transition occurs before time nδ. This is the same as the event {M(nδ) ≥ n}, i.e., the event that the number of transitions by time nδ is n or more. Thus for any δ > 0 and every n,

    Pr{M(nδ) ≥ n} ≥ 1 − Σ_k π_k/(ν_k δ).
d) Show that if Σ_k π_k/ν_k is finite, then lim_{t→∞} M(t)/t = 0 WP1 is impossible. (Note that this is equivalent to showing that lim_{t→∞} M(t)/t = 0 WP1 implies Σ_k π_k/ν_k = ∞.)

Solution: If Σ_k π_k/ν_k is finite and δ is, say, 2 Σ_k π_k/ν_k, then Pr{M(nδ) ≥ n} ≥ 1/2 for all n. Thus (1/(nδ))M(nδ) does not converge to 0 in probability, and thus does not converge to 0 with probability 1.
e) Let M_i(t) be the number of transitions by time t starting in state i. Show that if Σ_k π_k/ν_k is finite, then lim_{t→∞} M_i(t)/t = 0 WP1 is impossible.

Solution: Eq. (7.18) shows that lim_{t→∞} M_i(t)/t is the same for all i and thus the same as lim_{t→∞} M(t)/t. Thus none of these can be equal to 0 WP1.
Exercise 7.7: a) Consider the process in the figure below. The process starts at X(0) = 1, and for all i ≥ 1, Pi,i+1 = 1 and νi = i² for all i. Let Sn be the epoch when the nth transition occurs. Show that

E[Sn] = Σ_{i=1}^{n} i^{−2} < 2 for all n.

A.7. SOLUTIONS FOR CHAPTER 7

Hint: Upper bound the sum from i = 2 by integrating x^{−2} from x = 1.
[Figure: a chain of states 1 → 2 → 3 → 4 → · · · with Pi,i+1 = 1 and transition rates ν1 = 1, ν2 = 4, ν3 = 9, ν4 = 16, . . . ]
Solution:

E[Sn] ≤ Σ_{i=1}^{∞} i^{−2} ≤ 1 + ∫_1^∞ x^{−2} dx = 2.
b) Use the Markov inequality to show that Pr{Sn ≥ 4} ≤ 1/2 for all n. Show that the probability of an infinite number of transitions by time 4 is at least 1/2.

Solution: By the Markov inequality, Pr{Sn ≥ 4} ≤ 2/4 = 1/2 for all n. Thus Pr{Sn < 4} ≥
1/2 for all n. Since {Sn < 4} ⊇ {Sn+1 < 4} for all n, Pr{limn→∞ Sn < 4} ≥ 1/2. This
means that infinitely many transitions have occurred by time 4 with probability at least 1/2.
Thus {X(t); t > 0} is not really a stochastic process, since X(t) is not defined as a rv for
t > 4 (and in fact not for smaller t either), since X(t) = ∞ with positive probability.
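A quick Monte Carlo sketch of this explosion (the rates νi = i² are from the exercise; the truncation point N and trial count are arbitrary choices). Truncating the infinite sum at N transitions only undercounts each sample of the total time, so the fraction of paths finishing before time 4 computed here slightly overestimates the true probability.

```python
import random

# Exercise 7.7 rates: nu_i = i^2, so E[S_infinity] = sum_i 1/i^2 = pi^2/6 < 2.
rng = random.Random(7)
N, trials = 1000, 1000
mean_bound = sum(1.0 / (i * i) for i in range(1, N + 1))   # < 2
below_4 = 0
for _ in range(trials):
    # total time for the first N transitions; holding time in state i ~ Exp(i^2)
    s = sum(rng.expovariate(i * i) for i in range(1, N + 1))
    if s < 4.0:
        below_4 += 1
frac = below_4 / trials       # Markov guarantees Pr{S < 4} >= 1 - mean/4
```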
Exercise 7.8: a) Consider a Markov process with the set of states {0, 1, . . . } in which the transition rates {qij} between states are given by qi,i+1 = (3/5)2^i for i ≥ 0, qi,i−1 = (2/5)2^i for i ≥ 1, and qij = 0 otherwise. Find the transition rate νi out of state i for each i ≥ 0 and find the transition probabilities {Pij} for the embedded Markov chain.

Solution: For i > 0,

νi = qi,i+1 + qi,i−1 = (3/5)2^i + (2/5)2^i = 2^i.

For i = 0, ν0 = q01 = 3/5. It follows that Pij = qij/νi is given by

Pi,i+1 = 3/5, Pi,i−1 = 2/5 for i > 0;    P01 = 1.

b) Find a solution {pi; i ≥ 0} with Σi pi = 1 to (7.23).

Solution: This is a birth-death chain, so the {pi; i ≥ 0} satisfying (7.23) exist if there is a
solution to pi qi,i+1 = pi+1 qi+1,i for i ≥ 0 with Σi pi = 1. Thus

pi+1 = (3/5)2^i pi / [(2/5)2^{i+1}] = (3/4)pi.

Thus, defining ρ = 3/4, the desired probabilities are pi = (1−ρ)ρ^i. These are called steady-state process probabilities because, if the embedded chain is positive recurrent, then each pi
gives the long-term average fraction of time the sample functions of the process are in state
i WP1. As we see in part (c), this embedded chain is transient; the point of the problem is
to discover what this solution for {pi; i ≥ 0} means, if anything.
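The balance relations above are easy to confirm numerically; the sketch below checks pi qi,i+1 = pi+1 qi+1,i for pi = (1−ρ)ρ^i with ρ = 3/4, and that a long partial sum of the pi is essentially 1.

```python
# Verify p_i q_{i,i+1} = p_{i+1} q_{i+1,i} for p_i = (1 - rho) rho^i, rho = 3/4,
# with q_{i,i+1} = (3/5) 2^i and q_{i,i-1} = (2/5) 2^i from the exercise.
rho = 0.75
p = [(1 - rho) * rho**i for i in range(60)]

def up(i):        # q_{i,i+1}
    return (3 / 5) * 2**i

def down(i):      # q_{i,i-1}
    return (2 / 5) * 2**i

balance_ok = all(
    abs(p[i] * up(i) - p[i + 1] * down(i + 1)) <= 1e-9 * p[i] * up(i)
    for i in range(59)
)
partial_sum = sum(p)      # geometric partial sum, equals 1 - rho^60
```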
c) Show that all states of the embedded Markov chain are transient.

Solution: The embedded Markov chain is the embedded Markov chain of an M/M/1 queue.
Since the steady-state balance equations are πi Pi,i+1 = πi+1 Pi+1,i, we see that if a steady
state exists, it satisfies πi+1 = (3/2)πi for each i ≥ 1. We have seen before (see Exercise
6.2) that this means that the embedded Markov chain is transient.
d) Explain in your own words why your solution to (b) is not in any sense a set of steady-state probabilities.
Solution: The transient embedded Markov chain shows that for each state i, there is a
positive probability that i will never be re-entered after the first visit, and more precisely
that the aggregate time spent in i after the first entry is a geometrically distributed sum of
IID exponential rv’s. This in turn (see Example 2.3.3) implies that the aggregate time in
state i is an exponentially distributed rv. This can certainly not be viewed in any reasonable
sense as a steady-state fraction of time in i.
We next show that an infinite number of states are visited with arbitrarily large probability
in a finite time. This indicates that this chain is similar to the one in Exercise 7.7, where
it is obvious that time averages can't have any real meaning. To carry out this argument,
define Ȳm as the expected first-passage time from state m to state m+1. Then S̄m =
Ȳ0 + Ȳ1 + · · · + Ȳm−1 is the expected first-passage time from state 0 to state m. We denote
p as 3/5 and q as 2/5, thus allowing the argument to be used with different transition
probabilities such that p > q.

Ȳm = p·2^{−m} + q[2^{−m} + Ȳm−1 + Ȳm];   for all m ≥ 1.

To understand this, note that the expected time to the first transition, starting from state m, is 2^{−m}.
With probability p, that transition goes directly to state m+1, whereas with probability q,
that transition must be followed first by a first passage from m−1 to m and next by a first
passage from m to m+1. Simplifying this equation, we get

Ȳm = (1/p)2^{−m} + (q/p)Ȳm−1;   for all m ≥ 1.
Recursing on this equation back to Ȳ0, we get

Ȳm = (2^{−m}/p)[1 + (2q/p) + (2q/p)² + · · · + (2q/p)^{m−1}] + (q/p)^m Ȳ0    (i)

   = (2^{−m}/p)[1 + (2q/p) + (2q/p)² + · · · + (2q/p)^{m−1}] + (2^{−m}/p)(2q/p)^m    (ii)

   = (2^{−m}/p) [(2q/p)^{m+1} − 1] / [(2q/p) − 1]

   = [2(q/p)^{m+1} − (1/2)^m] / (2q − p).    (iii)

We have used the fact that Ȳ0 is the expected duration of the first passage time from state
0 to 1, which is simply 1/p. The result in (iii) thus holds for all m ≥ 1, and, as can be
easily checked from the formula, it also holds for m = 0, where Ȳ0 = 1/p.
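The closed form (iii) can be confirmed against the recursion Ȳm = 2^{−m}/p + (q/p)Ȳm−1; the sketch below also checks that the partial sums S̄n approach 2/(p−q) = 10 for p = 3/5, q = 2/5.

```python
# Check (iii) against the recursion Ybar_m = 2^{-m}/p + (q/p) Ybar_{m-1},
# Ybar_0 = 1/p, and check that Sbar_n = sum of Ybar_m approaches 2/(p - q).
p, q = 3 / 5, 2 / 5
M = 200
ybar = [1 / p]
for m in range(1, M):
    ybar.append(2.0**(-m) / p + (q / p) * ybar[m - 1])
closed = [(2 * (q / p)**(m + 1) - 0.5**m) / (2 * q - p) for m in range(M)]
max_err = max(abs(a - b) for a, b in zip(ybar, closed))
sbar = sum(ybar)                  # expected first-passage time from 0 to M
limit = 2 / (p - q)               # = 10
```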
Each sample function of this process (starting at state 0) can be viewed as a sequence of
segments, the mth of which starts at the first entry to state m and ends at the first entry
to m+1. The duration of the mth segment is a rv Ym. It is independent of all the other
segment durations, and must be non-degenerate since it has finite expectation, Ȳm, as given
in (iii).
Since q < p, the second term in (iii) becomes negligible with respect to the first term
with increasing m, so that Y m essentially approaches 0 geometrically as (q/p)m . Thus the
expected aggregate duration of this countable sequence of intervals is finite. The partial
sums of this sequence of segments, and thus the partial sums of transition intervals, become
so closely spaced that the infinite aggregate is bounded. The process does not continue
forever, but ends at a limit point with infinitely many segments, and thus also transitions,
piled up at the limit point.
To make this more precise, we now find S̄n, the expected first-passage time from 0 to n, by
summing Ȳm from 0 to n−1. Thus

S̄n = [2q/(p(2q−p))] · [1 − (q/p)^n]/(1 − q/p) − [2/(2q−p)][1 − (1/2)^n]

   = 2/(p−q) − [2/(2q−p)]·[(q/(p−q))(q/p)^n − (1/2)^n] < 2/(p−q).

The important thing here is that S̄n is 2/(p−q), reduced by a quantity that essentially
decreases geometrically to zero as n → ∞. Thus limn→∞ S̄n = 2/(p−q). Now Sn, the
first-passage time from state 0 to n, is increasing in n, and thus limn→∞ Sn must be a rv
since its expected value is finite. This is the limit referred to above where successive process
transitions pile up. The limit depends on the sample path, but is a rv with expected value
2/(p−q). This pile-up of transitions at a finite limit is much like the simpler process in
Exercise 7.7.
For p = 3/5 and q = 2/5, limn→∞ S̄n = 10. By the Markov inequality, Pr{Sn ≥ 2α/(p−q)} ≤
1/α for all α > 1. Since this is true for all n ≥ 1 and Sn ≥ Sn−1 for all n ≥ 1, we see that

Pr{limn→∞ Sn ≥ 2α/(p−q)} ≤ 1/α   for all α > 1.

In other words, the limiting point at which transitions pile up into a limit might vary with
the sample process, but that limiting point is finite WP1. For the case here, for example,
Pr{limn→∞ Sn ≥ 20} ≤ 1/2.
Since we have come this far, it makes sense to also interpret the meaning of the solution
pi = (1−ρ)ρ^i of (7.23) found in part (b), where ρ = p/(2q). Note from the recursion
Ȳm = 2^{−m}/p + (q/p)Ȳm−1 that Ȳm is the expected first-passage time to state m+1 from state m. The first term, 2^{−m}/p, is the expected
time (perhaps including more than one visit) in state m and the other, (q/p)Ȳm−1, is the
expected time in states 0 to m−1. The recursion leading to (ii) carries this further, so that
the term (q/p)^k 2^{−(m−k)}/p is the expected time in state m−k during the duration of the
first passage from m to m+1. Thus the expected time in state i, 0 ≤ i ≤ m, within this
first-passage time from m to m+1 is ρ^i (2^{−m} ρ^{−m}/p). This is simply the solution to (7.23),
normalized to be a fraction of Ȳm. In other words, during each segment m of the process,
the expected time in each state 0 ≤ i ≤ m is Ȳm pi / Σ_{j≤m} pj.
One can interpret this as a type of average if one chooses, but it is not very close to the
usual time average of a positive recurrent Markov process. First, there is no SLLN to rely
on, and second, the process terminates after a finite time. It is still quite surprising that the
pi have so much meaning given the extreme peculiarity of this process.
Exercise 7.9: Let qi,i+1 = 2^{i−1} for all i ≥ 0 and let qi,i−1 = 2^{i−1} for all i ≥ 1. All other transition rates are 0.

a) Solve the steady-state equations and show that pi = 2^{−i−1} for all i ≥ 0.

Solution: The process is a birth-death process, so we can find the steady-state probabilities
(if they exist) from the equations pi qi,i+1 = pi+1 qi+1,i for i ≥ 0. Thus pi+1 = pi/2.
Normalizing to Σi pi = 1, we get pi = 2^{−i−1}.
b) Find the transition probabilities for the embedded Markov chain and show that the chain is null recurrent.

Solution: Since Pij = qij/νi and νi = qi,i+1 + qi,i−1 = 2^i for all i ≥ 1, the embedded chain
has P01 = 1 and Pi,i+1 = Pi,i−1 = 1/2 for i ≥ 1. Now assume, for the purpose of finding a
contradiction, that the embedded Markov chain is positive recurrent. Then by (7.21),
πj = pj νj / Σi pi νi. Note that pi νi = 1/2 for each i ≥ 1, so Σi pi νi = ∞. Thus πj = 0
for all j ≥ 0, and the chain must be either null recurrent or transient. We show that the chain
is null recurrent in (c).
c) For any state i, consider the renewal process for which the Markov process starts in state i and renewals occur on each transition to state i. Show that, for each i ≥ 1, the expected inter-renewal interval is equal to 2. Hint: Use renewal-reward theory.

Solution: We use the same argument as in Section 7.2.2 with unit reward when the process
is in state i. The limiting fraction of time in state i given that X0 = i is then pi(i) =
1/(νi W̄(i)), where W̄(i) is the mean renewal interval between entries to state i. This is the
same for all starting states, and by Blackwell's theorem it is also limt→∞ Pr{X(t) = i}. This
pi(i) must also satisfy the steady-state process equations and thus be equal to pi = 2^{−i−1}.
Since νi = 2^i, we have W̄(i) = 2 for all i ≥ 1. Finally, this means that a return to i must
happen in a finite number of transitions. Since the embedded chain cannot be positive
recurrent, it thus must be null recurrent.
The argument here has been a little tricky since the development in the text usually assumes
that the embedded chain is positive recurrent, but the use of renewal theory above gets
around that.
d) Show that the expected number of transitions between each entry into state i is infinite. Explain why this does not mean that an infinite number of transitions can occur in a finite time.

Solution: We have seen in (b) and (c) that the embedded chain is null recurrent. This
means that, given X0 = i, for any given i, a return to i must happen in a finite number
of transitions (i.e., limn→∞ Fii(n) = 1), but the expected number of such transitions is
infinite. We have seen many rv's that have an infinite expectation but, being rv's, have a
finite sample value WP1.
Exercise 7.10: a) Consider the two-state Markov process of Example 7.3.1 with q01 = λ and q10 = µ. Find the eigenvalues and eigenvectors of the transition rate matrix [Q].

Solution: The matrix is

[Q] = [ −λ   λ ]
      [  µ  −µ ].

For any transition rate matrix, the rows all
sum to 0, and thus [Q]e = 0, establishing that 0 is an eigenvalue with the right eigenvector
e = (1, 1)^T. The left eigenvector, normalized to be a probability vector, is the steady-state
vector p = (µ/(λ+µ), λ/(λ+µ)). The other eigenvalue is then easily calculated as −(λ+µ),
with left eigenvector (−1, 1) and right eigenvector (−λ/(λ+µ), µ/(λ+µ))^T. This answer is not
unique, since the eigenvectors can be scaled differently while still maintaining p_i^T ν_j = δij.

b) If [Q] has M distinct eigenvalues, the differential equation d[P(t)]/dt = [Q][P(t)] can be solved by the equation

[P(t)] = Σ_{i=1}^{M} ν_i e^{λ_i t} p_i^T,

where p_i and ν_i are the left and right eigenvectors of eigenvalue λ_i. Show that this equation gives the same solution as that given for Example 7.3.1.

Solution: This follows from substituting the values above. Note that the eigenvectors
above are the same as those for the sampled-time approximations and the eigenvalues are
related as explained in Example 7.3.1.
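The spectral formula in (b) can be checked numerically against a truncated Taylor series for e^{[Q]t}; the rates λ = 1, µ = 2 and the time t below are arbitrary illustrative choices, not from the exercise.

```python
import math

# Spectral solution P(t) = e*p + e^{-(lam+mu)t} v2*l2 (outer products of the
# eigenvectors found above) versus a truncated Taylor series for exp([Q]t).
lam, mu = 1.0, 2.0          # illustrative rates
s = lam + mu
t = 0.7

def spectral(t):
    e = [1.0, 1.0]                       # right eigenvector, eigenvalue 0
    p_left = [mu / s, lam / s]           # left eigenvector (steady state)
    v2 = [-lam / s, mu / s]              # right eigenvector, eigenvalue -(lam+mu)
    l2 = [-1.0, 1.0]                     # left eigenvector, eigenvalue -(lam+mu)
    w = math.exp(-s * t)
    return [[e[i] * p_left[j] + w * v2[i] * l2[j] for j in range(2)]
            for i in range(2)]

def taylor_expm(t, terms=60):
    Q = [[-lam, lam], [mu, -mu]]
    P = [[1.0, 0.0], [0.0, 1.0]]         # running sum, starts at identity
    term = [[1.0, 0.0], [0.0, 1.0]]      # current Taylor term (Qt)^k / k!
    for k in range(1, terms):
        term = [[sum(term[i][m] * Q[m][j] for m in range(2)) * t / k
                 for j in range(2)] for i in range(2)]
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return P

A, B = spectral(t), taylor_expm(t)
err = max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))
row_sums = [sum(row) for row in A]       # each row of P(t) must sum to 1
```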
Exercise 7.11: Consider the three-state Markov process below; the number given on edge (i, j) is qij, the transition rate from i to j. Assume that the process is in steady state.

[Figure: states 1, 2, 3 with symmetric rates q12 = q21 = 1, q13 = q31 = 2, q23 = q32 = 4.]

a) Is this process reversible?

Solution: Since qij = qji for all states i, j, we can satisfy the condition for reversibility in
(7.52) by choosing pi = pj for all i, j. Thus the process is reversible and p1 = p2 = p3 = 1/3.
b) Find pi, the time-average fraction of time spent in state i for each i.

Solution: This is solved in (a), but is somewhat surprising because of the different transition
rates. One can rationalize this by starting with all qij equal, so that pi = 1/3 for all i
is obvious from symmetry. Then if q23 and q32 are each increased but kept equal, we can
see that this increases the oscillation rate between 2 and 3, but wouldn't be expected to
change p2 or p3. The same argument applies if either of the other pairs is then changed.
c) Given that the process is in state i at time t, find the mean delay from t until the process leaves state i.

Solution: This mean delay is νi^{−1} = (qij + qik)^{−1}, where i, j, k are distinct. Thus
ν1^{−1} = 1/3, ν2^{−1} = 1/5, ν3^{−1} = 1/6.
d) Find πi, the time-average fraction of all transitions that go into state i for each i.

Solution: From (7.21), πi = pi νi / Σj pj νj. Using (b) and (c), π1 = 3/14, π2 = 5/14, π3 = 6/14.
e) Suppose the process is in steady state at time t. Find the steady-state probability that the next state to be entered is state 1.

Solution: In order for 1 to be the next state to be entered, the state at time t must be 2 or
3. Given X(t) = 2, the probability of going next to 1 is q21/ν2 = 1/5. Given X(t) = 3, the probability
of going next to 1 is q31/ν3 = 1/3. Thus Pr{next state = 1} = (1/3)(1/5) + (1/3)(1/3) = 8/45.
f) Given that the process is in state 1 at time t, find the mean delay until the process first returns to state 1.

Solution: From Theorem 7.2.6, W̄1 = (ν1 p1)^{−1} = 1. Another instructive approach is to
use the renewal-reward process (in time) for which renewals occur on entries to state 1 and
unit reward occurs while in state 1. The expected reward during a visit to state 1 is the
mean time ν1^{−1} until an exit from 1, i.e., 1/3. The overall fraction of time spent in state 1
is also 1/3 WP1. By the renewal-reward theorem, Theorem 5.4.5, the mean time between
visits to state 1 is (1/3)/(1/3) = 1. Since the time spent in state 1 is exponential, this is also the
mean time to a renewal starting in state 1 at an arbitrary time t.
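Parts (b) through (f) can all be verified with exact arithmetic from the edge rates in the figure:

```python
from fractions import Fraction as F

# Exact check of parts (b)-(f) using the symmetric edge rates of the figure.
q = {(1, 2): F(1), (2, 1): F(1),
     (1, 3): F(2), (3, 1): F(2),
     (2, 3): F(4), (3, 2): F(4)}
states = [1, 2, 3]
nu = {i: sum(q[(i, j)] for j in states if j != i) for i in states}
p = {i: F(1, 3) for i in states}                  # part (b): uniform
denom = sum(p[j] * nu[j] for j in states)
pi = {i: p[i] * nu[i] / denom for i in states}    # part (d)
next_is_1 = sum(p[i] * q[(i, 1)] / nu[i] for i in [2, 3])   # part (e)
w1 = 1 / (nu[1] * p[1])                           # part (f): mean recurrence time
```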
g) Consider an arbitrary irreducible finite-state Markov process in which qij = qji for all i, j. Either show that such a process is reversible or find a counterexample.

Solution: The argument in (a) is general for finite-state processes, so any such process is
reversible and the steady-state probabilities are again all equal. For a countable state space,
each steady-state process probability pi would have to be equal, but then they must all be 0, and
they cannot sum to 1. Thus steady-state probabilities cannot exist, and the definition of
reversibility in Section 7.6 does not apply, since the backward transition probabilities q*ij are
undefined.
Exercise 7.12: a) Consider an M/M/1 queueing system with arrival rate λ and service rate µ > λ. Assume that the queue is in steady state. Given that an arrival occurs at time t, find the probability that the system is in state i immediately after time t.

Solution: One must be careful interpreting these statements. We assume that the system
is in steady state for times τ < t. The arrival process is memoryless, and for times τ ≥ t
the probability of an arrival in any increment is independent of the state at each τ < t.
Thus, given an arrival at t, the state increases by 1 from X(t⁻), the steady-state state
immediately before t. Thus, defining ρ = λ/µ, Pr{X(t) = i} = (1−ρ)ρ^{i−1} for all i > 0. Of
course Pr{X(t) = 0} = 0.
b) Assuming FCFS service, and conditional on i customers in the system immediately after the above arrival, characterize the time until the above customer departs as a sum of random variables.

Solution: Let Y1 be the service time of the entering customer, let Y2, . . . , Yi−1 be the
service times of the i−2 earlier customers waiting in the queue, and let Yi be the remaining
required service of the customer being served. The time until the arriving customer departs
is then Y1 + Y2 + · · · + Yi. These are IID exponential rv's of rate µ each, so the sum is
Erlang of order i.
c) Find the unconditional probability density of the time until the above customer departs. Hint: You know (from splitting a Poisson process) that the sum of a geometrically distributed number of IID exponentially distributed random variables is exponentially distributed. Use the same idea here.

Solution: From (a), the number X of customers in the system at time t has the PMF
pX(i) = (1−ρ)ρ^{i−1} for i ≥ 1. This is geometric starting at 1. The time W until the given
customer departs is W = Y1 + · · · + YX, where the Yi are IID and each is exponential with
parameter µ. As explained in Example 2.3.3, the distribution of such a geometric sum of
IID exponentially distributed rv's is itself exponential with rate (1−ρ)µ, i.e.,

fW(w) = µ(1−ρ) exp[−µ(1−ρ)w].

In order to better understand this equation for the problem at hand, note that the sequence
of departing customers (at least up to time t+W) can be viewed as a Poisson process of rate
µ. View this Poisson process as the combination of two independent sub-Poisson processes,
the first of rate (1−ρ)µ and the second of rate ρµ. These sub-processes are mathematical
constructs with no immediate interpretation for the given problem. The probability that
the first departure from the combined process is from sub-process 1 is then 1−ρ. For each
i > 1, the probability that the first i−1 departures are from sub-process 2 and the ith is
from sub-process 1 is (1−ρ)ρ^{i−1}. Thus the probability that the number in the original system
at time t is X = i is the same as the probability that the first i−1 departures are from
sub-process 2 and departure i is from sub-process 1. Thus the marginal probability density
for W is the same in each case, and is thus the density of the first departure in the first
sub-process, which is the PDF given above. Since ρ = λ/µ, this result is usually stated as

fW(w) = (µ−λ) e^{−(µ−λ)w}.
Exercise 7.13: a) Consider an M/M/1 queue in steady state. Assume ρ = λ/µ < 1. Find the probability Q(i, j) for i ≥ j > 0 that the system is in state i at time t and that i−j departures occur before the next arrival.

Solution: The probability that the queue is in state i at time t is pi = (1−ρ)ρ^i (see (7.40)).
Given state i > 0, successive departures (so long as the state remains positive) are Poisson
at rate µ and arrivals are Poisson at rate λ. Thus (conditional on X(t) = i) the probability
of exactly i−j departures before the next arrival is [µ/(λ+µ)]^{i−j} · λ/(λ+µ). Thus

Q(i, j) = (1−ρ)ρ^i [µ/(λ+µ)]^{i−j} · λ/(λ+µ).
b) Find the PMF of the state immediately before the first arrival after time t.

Solution: Q(i, j) is the probability that X(t) = i and that X(τ⁻) = j, where τ is the time
of the next arrival after t and j > 0. Thus for j > 0,

Pr{X(τ⁻) = j} = Σ_{i≥j} Q(i, j) = (1−ρ) [λ/(λ+µ)] Σ_{i≥j} ρ^i [µ/(λ+µ)]^{i−j}

             = (1−ρ)ρ^j [λ/(λ+µ)] · 1/[1 − λ/(λ+µ)] = (1−ρ)ρ^{j+1},

where we have used ρµ/(λ+µ) = λ/(λ+µ). For j = 0, a similar but simpler calculation leads to
Pr{X(τ⁻) = 0} = (1−ρ)(1+ρ). In
other words, the system is not in steady state immediately before the next arrival. This is
not surprising, since customers can depart but not arrive in the interval (t, τ).
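These sums are easy to confirm numerically; the sketch below checks that summing Q(i, j) over i ≥ j gives (1−ρ)ρ^{j+1}, and that the resulting PMF (including the j = 0 term) sums to 1. The rates λ, µ here are illustrative choices.

```python
# Numerical check of part (b): sum_i Q(i,j) = (1 - rho) rho^{j+1} for j >= 1,
# and the PMF including Pr{X = 0} = (1 - rho)(1 + rho) sums to 1.
lam, mu = 3.0, 4.0          # illustrative rates with rho = 3/4
rho = lam / mu

def Qij(i, j):
    return (1 - rho) * rho**i * (mu / (lam + mu))**(i - j) * lam / (lam + mu)

err = max(
    abs(sum(Qij(i, j) for i in range(j, 200)) - (1 - rho) * rho**(j + 1))
    for j in range(1, 20)
)
p0 = (1 - rho) * (1 + rho)
total = p0 + sum((1 - rho) * rho**(j + 1) for j in range(1, 400))
```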
c) There is a well-known queueing principle called PASTA, standing for "Poisson arrivals see time averages". Given your results above, give a more precise statement of what that principle means in the case of the M/M/1 queue.

Solution: The PASTA principle requires being so careful about a precise statement of the
conditions under which it holds that it is better treated as a hypothesis to be considered
rather than a principle. One plausible meaning for what arrivals 'see' is given in (b), and
that is not steady state. Another plausible meaning is to look at the fraction of arrivals that
arise from each state; that fraction is the steady-state probability. Perhaps the simplest
visualization of PASTA is to look at the discrete-time model of the M/M/1 queue. There
the steady-state fraction of arrivals that come from state i is equal to the steady-state
probability pi.
Exercise 7.14: A small bookie shop has room for at most two customers. Potential customers arrive at a Poisson rate of 10 customers per hour; they enter if there is room and are turned away, never to return, otherwise. The bookie serves the admitted customers in order, requiring an exponentially distributed time of mean 4 minutes per customer.

a) Find the steady-state distribution of number of customers in the shop.

Solution: The system can be modeled as a Markov process with 3 states, representing 0,
1, or 2 customers in the system. Arrivals in state 2 are turned away, so they do not change
the state and are not shown. In transitions per hour, λ = 10 and µ = 15. Since the process
is a birth-death process, the steady-state equations are λpi = µpi+1 for i = 0, 1. Thus
p1 = (2/3)p0 and p2 = (4/9)p0. Normalizing, p0 = 9/19, p1 = 6/19, p2 = 4/19.

[Figure: states 0 ⇄ 1 ⇄ 2, with rate λ on each rightward transition and rate µ on each leftward transition.]
b) Find the rate at which potential customers are turned away.

Solution: All arrivals in state 2 are turned away, so the average rate at which customers
are turned away is λp2 = 40/19 per hour.
c) Suppose the bookie hires an assistant; the bookie and assistant, working together, now serve each customer in an exponentially distributed time of mean 2 minutes, but there is only room for one customer (i.e., the customer being served) in the shop. Find the new rate at which customers are turned away.

Solution: Now λ = 10 and µ = 30. There are two states, with p0 = 3/4 and p1 = 1/4.
Customers are turned away at rate λp1 = 10/4 per hour, which is somewhat higher than the rate
without the assistant.
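The numbers in (a) through (c) check out with exact arithmetic:

```python
from fractions import Fraction as F

# Exact check of the bookie-shop numbers: lam = 10/hr, mu = 15/hr, 3 states.
lam, mu = F(10), F(15)
p0 = 1 / (1 + lam / mu + (lam / mu)**2)   # birth-death balance, normalized
p1 = p0 * lam / mu
p2 = p1 * lam / mu
turned_away = lam * p2                    # part (b): arrivals lost in state 2

# Part (c): combined service rate 30/hr, room for one customer only.
mu2 = F(30)
q0 = 1 / (1 + lam / mu2)
q1 = q0 * lam / mu2
turned_away2 = lam * q1
```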
Exercise 7.15: This exercise explores a continuous-time version of a simple branching process. Consider a population of primitive organisms which do nothing but procreate and die. In particular, the population starts at time 0 with one organism. This organism has an exponentially distributed lifespan T0 with rate µ (i.e., Pr{T0 > τ} = e^{−µτ}). While this organism is alive, it gives birth to new organisms according to a Poisson process of rate λ. Each of these new organisms, while alive, gives birth to yet other organisms. The lifespan and birthrate for each of these new organisms are independently and identically distributed to those of the first organism. All these and subsequent organisms give birth and die in the same way, again independently of all other organisms.

a) Let X(t) be the number of (live) organisms in the population at time t. Show that {X(t); t ≥ 0} is a Markov process and specify the transition rates between the states.

Solution: If X(t) = j > 0, then the state will change at the next instant when either a
birth or death occurs among the j living organisms. The first birth or death among the j
organisms can be viewed as the smallest of j independent exponential rv's of rate λ and j
of rate µ. That first birth or death is a birth with probability λ/(λ+µ), so the transition
rate from state j to state j+1 is jλ. Similarly, the transition rate from j to j−1 is jµ. If
X(t) = 0, then no deaths or births can occur at any τ > t. Thus in general, state transitions
are exponential (with no transitions from state 0), and the rate of each transition at time t
depends only on the state at time t. Thus the process is a Markov process.
b) Find the embedded Markov chain {Xn; n ≥ 0} corresponding to the Markov process in (a). Find the transition probabilities for this Markov chain.

Solution: Let Xn be the state at the end of the nth transition. Given that Xn = j for any
given j > 0, Xn+1 can be only j+1 or j−1. Also

Pr{Xn+1 = j+1 | Xn = j} = jλ/(jλ + jµ) = λ/(λ+µ).

There are no transitions from state 0 in the Markov process, and thus there are no transitions
from state 0 in the embedded chain.
c) Explain why the Markov process and Markov chain above are not irreducible. Note: The majority of
results you have seen for Markov processes assume the process is irreducible, so be careful not to use those
results in this exercise.
Solution: State 0 is a trapping state in the Markov chain, i.e., it communicates with no
other state. Thus the Markov chain is not irreducible. By definition, the Markov process is
then also not irreducible.
d) For purposes of analysis, add an additional transition of rate λ from state 0 to state 1. Show that the Markov process and the embedded chain are irreducible. Find the values of λ and µ for which the modified chain is positive recurrent, null recurrent, and transient.

Solution: With the additional transition from state 0 to 1, there is a path from each
state to each other state (both in the process and the embedded chain), and thus both
are irreducible. Curiously enough, the embedded chain is now the same as the embedded
chain of an M/M/1 queue. With ρ = λ/µ, it is positive recurrent for ρ < 1, null recurrent for ρ = 1, and
transient for ρ > 1.
e) Assume that λ < µ. Find the steady-state process probabilities for the modified Markov process.

Solution: The transitions in the Markov process are illustrated below.

[Figure: states 0 ⇄ 1 ⇄ 2 ⇄ 3 ⇄ 4 · · · with upward rates λ, λ, 2λ, 3λ, . . . and downward rates µ, 2µ, 3µ, 4µ, . . . ]

Let qj be the steady-state process probability of state j > 0. Then

q1 = λq0/µ;    qj = (j−1)λ qj−1/(jµ)   for j > 1.    (A.144)

Let ρ = λ/µ (so 0 < ρ < 1). Iterating (A.144), qj can be expressed in terms of q0 as

qj = [(j−1)ρ/j] · [(j−2)ρ/(j−1)] · · · [ρ/2] · ρq0 = ρ^j q0 / j.    (A.145)
The simplicity of (A.145) is quite surprising, but there is another way to derive the same
result that helps explain it. This process has the same embedded chain as the M/M/1
process, for which the steady-state process probabilities are pj = ρ^j p0. For the M/M/1 case,
the rate of transitions out of each state j is νj = λ + µ, whereas here the corresponding
rate is νj′ = j(λ+µ). From the relationship between the steady-state probabilities for the
embedded chain and the process, however, pj = απj/νj and qj = βπj/νj′, where α and β
are normalizing constants. From this, we see that qj = (β/α)pj/j = (β/α)p0 ρ^j/j. The
normalizing constants can be ignored, since they merely insure that Σj qj = 1, and thus
this provides another derivation of (A.145).
We still need to solve for q0 to find the steady-state process probabilities. We have

1 = Σj qj = q0 [1 + Σ_{j=1}^{∞} ρ^j/j] = q0 [1 − ln(1−ρ)].

To see the equality of Σ_{j≥1} ρ^j/j and −ln(1−ρ), note that they agree at ρ = 0 and, for
each ρ > 0, the derivative of each with respect to ρ is 1/(1−ρ). We then have

q0 = 1/[1 − ln(1−ρ)];    qj = ρ^j / (j[1 − ln(1−ρ)])   for j ≥ 1.
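As a check, the sketch below verifies that qj = ρ^j q0/j with q0 = 1/(1 − ln(1−ρ)) satisfies the balance relations (A.144) and sums to 1; the rates λ, µ (hence ρ = 1/2) are arbitrary illustrative values.

```python
import math

# Verify lam*q_0 = mu*q_1 and (j-1)*lam*q_{j-1} = j*mu*q_j for
# q_j = rho^j q_0 / j with q_0 = 1/(1 - ln(1 - rho)); lam, mu illustrative.
lam, mu = 1.0, 2.0
rho = lam / mu
q0 = 1 / (1 - math.log(1 - rho))
q = [q0] + [rho**j * q0 / j for j in range(1, 400)]

err1 = abs(lam * q[0] - mu * q[1])
err2 = max(abs((j - 1) * lam * q[j - 1] - j * mu * q[j]) for j in range(2, 400))
total = sum(q)      # should be 1 up to a tiny truncated tail
```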
f) Find the mean recurrence time between visits to state 0 for the modified Markov process.

Solution: The mean recurrence time between visits to state 0 in the modified Markov
process is

W̄0 = 1/(ν0 q0) = [1 − ln(1−ρ)]/λ.
g) Find the mean time T̄ for the population in the original Markov process to die out. Note: We have seen before that removing transitions from a Markov chain or process to create a trapping state can make it easier to find mean recurrence times. This is an example of the opposite, where adding an exit from a trapping state makes it easy to find the recurrence time.

Solution: The mean recurrence time for state 0 in the modified chain consists of two parts:
the mean time (1/λ) to first reach state 1 from state 0, and then the mean time to first
reach state 0 again. This latter time is the same whether or not the extra transition from
0 to 1 is present, and is thus given by T̄ = W̄0 − 1/λ = −(1/λ) ln(1−ρ).
Exercise 7.16: Consider the job-sharing computer system illustrated below. Incoming jobs arrive from the left in a Poisson stream of rate λ. Each job, independently of other jobs, requires pre-processing in system 1 with probability Q. Jobs in system 1 are served FCFS, and the service times for successive jobs entering system 1 are IID with an exponential distribution of mean 1/µ1. The jobs entering system 2 are also served FCFS, and successive service times are IID with an exponential distribution of mean 1/µ2. The service times in the two systems are independent of each other and of the arrival times. Assume that µ1 > λQ and that µ2 > λ. Assume that the combined system is in steady state.

[Figure: a Poisson stream of rate λ splits; with probability Q a job goes through System 1 (rate µ1) and then to System 2 (rate µ2); with probability 1−Q it bypasses System 1 and goes directly to System 2.]

a) Is the input to system 1 Poisson? Explain.

Solution: The input to system 1 is the splitting of a Poisson process and is thus Poisson
of rate λQ.

b) Are each of the two input processes coming into system 2 Poisson? Explain.

Solution: The output of system 1 is Poisson of rate λQ by Burke's theorem. The other
process entering system 2 is also Poisson, with rate (1−Q)λ, since it is a splitting of the
original Poisson process of rate λ. This input to system 2 is independent of the departures
from system 1 (since it is independent both of the arrivals to and the departures from system 1).
Thus the combined process into system 2 is Poisson with rate λ.
c) Give the joint steady-state PMF of the number of jobs in the two systems. Explain briefly.

Solution: The state X2(t) of system 2 at time t depends on the inputs to system 2 up
to time t and the services in system 2 up to time t. By Burke's theorem, the outputs from
system 1 at times up to time t are independent of the state X1(t) of system 1 at time t, and
thus the inputs to system 2 from system 1 at times up to t are independent of X1(t). The
other input to system 2 is entirely independent of system 1. Thus X1(t) and X2(t) are
independent, and each is the state of an M/M/1 queue in steady state at time t:

Pr{X1(t) = i, X2(t) = j} = (1−ρ1)ρ1^i (1−ρ2)ρ2^j,

where ρ1 = λQ/µ1 and ρ2 = λ/µ2. Note that this independence applies to the states of
systems 1 and 2 at the same instant t. The processes are not independent, and, for example,
X1(t) and X2(t+τ) for τ > 0 are not independent.
d) What is the probability that the first job to leave system 1 after time t is the same as the first job that entered the entire system after time t?

Solution: The first job out of system 1 after t is the same as the first job into system 1 after
t if and only if system 1 is empty at time t. This job is also the first job into the
overall system if the first arrival to the entire system goes into system 1. The probability
of both of these events (which are independent) is thus P1 = Q(1−ρ1).
e) What is the probability that the first job to leave system 2 after time t both passed through system 1 and arrived at system 1 after time t?

Solution: Let P2 be the probability of this event. This requires, first, that the event in (d)
is satisfied, second, that system 2 is empty at time t, and, third, that the first job to bypass
system 1 after the first arrival to system 1 occurs after the service of that first arrival.
This third event is the event that an exponential rv of rate µ1 has a smaller sample value
than one of rate (1−Q)λ. The probability of this is µ1/[µ1 + (1−Q)λ]. Thus

P2 = P1 (1−ρ2) · µ1/[µ1 + (1−Q)λ].
Exercise 7.17: Consider the following combined queueing system. The first queue system is M/M/1
with service rate µ1 . The second queue system has IID exponentially distributed service times with rate
µ2 . Each departure from system 1 independently leaves the entire system with probability 1 Q1 . System
2 has an additional Poisson input of rate 2 , independent of inputs and outputs from the first system.
Each departure from the second system independently leaves the combined system with probability Q2 and
re-enters system 2 with probability 1 Q2 . For (a), (b), (c) assume that Q2 = 1 (i.e., there is no feedback).
1
-
System 1
Q1
µ1
2
-
System 2
µ2
Q2 1 Q2
?
1 Q1
(part d))
a) Characterize the process of departures from system 1 that enter system 2 and characterize the overall
process of arrivals to system 2.
Solution: The departures from system 1 are Poisson of rate λ1 by Burke's theorem.
After splitting into two streams, the stream entering system 2 is Poisson of rate Q1 λ1. The
additional Poisson input to system 2 is independent of the first, and thus the combined
arrivals to system 2 are Poisson of rate Q1 λ1 + λ2.
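As a quick numerical sanity check of the splitting/merging argument (not part of the original solution), the sketch below simulates the two Poisson streams and the Bernoulli split; the values of lam1, lam2, Q1 and the horizon T are illustrative assumptions, not values from the exercise. Only the empirical rate is checked; the fact that the merged stream is actually Poisson is what Burke's theorem and the splitting/combining results of Chapter 2 supply.

```python
# Monte Carlo sketch: splitting a Poisson stream with probability Q1 and merging
# with an independent Poisson stream gives combined rate Q1*lam1 + lam2.
import random

rng = random.Random(1)
lam1, lam2, Q1, T = 3.0, 2.0, 0.4, 10_000.0   # illustrative parameters

def poisson_arrivals(rate, horizon, rng):
    """Arrival epochs of a Poisson process of the given rate on [0, horizon]."""
    t, epochs = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return epochs
        epochs.append(t)

stream1 = poisson_arrivals(lam1, T, rng)
# Independent Bernoulli(Q1) split of stream 1 (the departures entering system 2).
split = [t for t in stream1 if rng.random() < Q1]
stream2 = poisson_arrivals(lam2, T, rng)
merged = sorted(split + stream2)

rate_est = len(merged) / T   # empirical rate of the combined arrivals to system 2
print(rate_est)
```

The estimate should be close to Q1·λ1 + λ2 = 3.2 for these parameters.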
b) Assuming FCFS service in each system, find the steady state distribution of time that a customer spends
in each system.
Solution: As determined in Example 2.3.3 and rederived in Exercise 7.12, the PDF of
system time in system 1 is f1(t) = (µ1 - λ1) exp(-(µ1 - λ1)t). For system 2 (since it is
M/M/1), the PDF is f2(t) = (µ2 - λ2T) exp(-(µ2 - λ2T)t), where λ2T = Q1 λ1 + λ2.
c) For a customer that goes through both systems, show why the time in each system is independent of that
in the other; find the distribution of the combined system time for such a customer.
Solution: This is explained in the discussion of Burke's theorem, but we repeat it briefly
here. By (c) of Burke's theorem, the arrival time (and thus the system time) corresponding
to a departure from system 1 at time t is independent of the departures before t. Those
departures from system 1 that enter system 2 (plus the independent arrivals at rate λ2 and
the service times of system 2) determine the state of system 2 at t, which is thus independent
of the system 1 system time. Thus the overall system-time density fT(t) is the convolution
of the individual system-time densities.
    fT(t) = [αβ/(β - α)] (exp(-αt) - exp(-βt)),

where α = µ1 - λ1 and β = µ2 - λ2T. If α = β, this must be replaced with the Erlang
density of order 2.
d) Now assume that Q2 < 1. Is the departure process from the combined system Poisson? Which of the
three input processes to system 2 are Poisson? Which of the input processes are independent? Explain your
reasoning, but do not attempt formal proofs.
A.7. SOLUTIONS FOR CHAPTER 7
Solution: If we look only at system 2 with the feedback loop around it, we see exogenous
Poisson inputs at rate λ2T = Q1 λ1 + λ2. This is the same system as in Figure 7.14. From
the analysis there, the overall output (that at rate Q2 λ2T) is Poisson. The overall input to
system 2, including the feedback link, is not Poisson (see Figure 7.15).
Exercise 7.18: Suppose a Markov chain with transition probabilities {Pij} is reversible. Suppose we
change the transition probabilities out of state 0 from {P0j; j ≥ 0} to {P′0j; j ≥ 0}. Assuming that all
Pij for i ≠ 0 are unchanged, what is the most general way in which we can choose {P′0j; j ≥ 0} so as to
maintain reversibility? Your answer should be explicit about how the steady-state probabilities {πi; i ≥ 0}
are changed. Your answer should also indicate what this problem has to do with uniformization of reversible
Markov processes, if anything. Hint: Given Pij for all i, j, a single additional parameter will suffice to specify
P′0j for all j.
Solution: Let π′i denote the new steady-state probabilities. To maintain reversibility,
π′0 P′0j = π′j Pj0 for all j ≠ 0 and π′i Pij = π′j Pji for all i, j ≠ 0. We consider the latter set
of equations first and divide each of these equations by the original reversibility conditions,
πi Pij = πj Pji. This gives us π′i/πi = π′j/πj for all i, j ≠ 0. This means that all the steady-state
probabilities for i ≠ 0 scale by the same amount, i.e., that for some α, π′i = απi for
all i ≠ 0. By summing over i, we see that 1 - π′0 = α(1 - π0).

We can substitute this equation into π′0 P′0j = π′j Pj0 to get

    P′0j = π′j Pj0 / π′0 = α πj Pj0 / (1 - α + απ0) = α π0 P0j / (1 - α + απ0).
Thus the single parameter α, along with the original probabilities, specifies all of the changed
transition probabilities P′0j, and those changes simply scale each P0j, j ≠ 0, up or down in
proportion. Note also that instead of specifying the parameter α, we could simply specify
the new desired π′0 or the desired scaling of P0j, j ≠ 0. Finally, since Σj P0j = 1 where the
sum includes j = 0, specifying the change in P00 specifies all the other changes.
What this has to do with uniformization is that when one inserts self-transition rates into
a given state, say 0, of a reversible Markov process, the embedded Markov chain changes
in exactly this way. The process transition rates qij, except for i = j = 0, remain the same, but the
{P0j; j ≠ 0} in the embedded chain each change in proportion to accommodate the new P00.
Each such change preserves the reversibility of the embedded chain, so that if a Markov
process is reversible, the uniformized chain is also reversible.
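The scaling rule can be checked on a concrete example. The sketch below uses a small illustrative reversible birth-death chain and an arbitrary α (neither is from the exercise), and confirms that the modified {P′0j} form a probability distribution and that detailed balance holds with the scaled steady-state probabilities.

```python
# Numeric sketch of the scaling rule of Exercise 7.18 on an illustrative
# 3-state reversible (birth-death) chain.
P = [[0.5, 0.5, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.6, 0.4]]

# Original steady state via detailed balance: pi_{j+1} = pi_j P[j][j+1]/P[j+1][j].
w = [1.0]
for j in range(2):
    w.append(w[-1] * P[j][j + 1] / P[j + 1][j])
pi = [x / sum(w) for x in w]

alpha = 0.8                              # illustrative scaling parameter
pi_new = [1 - alpha * (1 - pi[0])] + [alpha * p for p in pi[1:]]

# New transitions out of state 0:  P'_{0j} = pi'_j P_{j0} / pi'_0  for j != 0.
P0_new = [0.0, 0.0, 0.0]
for j in (1, 2):
    P0_new[j] = pi_new[j] * P[j][0] / pi_new[0]
P0_new[0] = 1.0 - P0_new[1] - P0_new[2]

# Detailed balance of the modified chain at state 0 and on the untouched link.
balance_err = max(abs(pi_new[0] * P0_new[j] - pi_new[j] * P[j][0]) for j in (1, 2))
print(P0_new, balance_err)
```

Note that the untouched transitions (here the 1-2 link) remain in detailed balance automatically, since all πi, i ≠ 0, scale by the same α.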
Exercise 7.19: Consider the closed queueing network in the figure below. There are three customers
who are doomed forever to cycle between node 1 and node 2. Both nodes use FCFS service and have
exponentially distributed IID service times. The service times at one node are also independent of those at
the other node and are independent of the customer being served. The server at node i has mean service
time 1/µi , i = 1, 2. Assume to be specific that µ2 < µ1 .
[Figure: Node 1 (service rate µ1) and Node 2 (service rate µ2) in a cycle; departures from each node enter the other.]
a) The system can be represented by a four state Markov process. Draw its graphical representation and
label it with the individual states and the transition rates between them.
Solution: The three customers are statistically identical, so we can take the number of
customers in node 1 (which can be 0, 1, 2, 3) to be the state. In states 1, 2, 3, departures
from node 1 take place at rate µ1 . In states 0, 1, 2, departures occur from node 2 at rate
µ2 , and these serve as arrivals to node 1. Thus the process has the following graphical
representation.
[Graph: states 0 ↔ 1 ↔ 2 ↔ 3; each rightward transition i → i+1 occurs at rate µ2 and each leftward transition i+1 → i at rate µ1.]
b) Find the steady-state probability of each state.
Solution: Perhaps surprisingly, this is the same as an M/M/1 queue in which arrivals are
turned away when X(t) = 3. In the queue representation, we lose the identity of the three
customers, but we need not keep track of them since they are statistically identical. As we
see in the rest of the exercise, the customer identities can be tracked supplementally, since
the arrivals to node 1 rotate from customer 1 to 2 to 3. Thus each third arrival in the queue
representation corresponds to the same customer.
Let ρ = µ2/µ1. Then the steady-state process probabilities are p1 = ρ p0, p2 = ρ^2 p0, p3 = ρ^3 p0, where p0 = 1/(1 + ρ + ρ^2 + ρ^3).
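A quick numerical check (with illustrative rates satisfying µ2 < µ1, not values from the exercise): the claimed distribution satisfies the steady-state equation πQ = 0 for the generator of the four-state birth-death process.

```python
# Verify that p_n proportional to rho^n (rho = mu2/mu1) is stationary for the
# four-state birth-death generator.  mu1 > mu2 are illustrative rates.
mu1, mu2 = 3.0, 2.0
rho = mu2 / mu1
w = [rho ** n for n in range(4)]
p = [x / sum(w) for x in w]

# Generator on states {0,1,2,3}: up-rate mu2 (arrival to node 1), down-rate mu1.
Q = [[0.0] * 4 for _ in range(4)]
for i in range(3):
    Q[i][i + 1] = mu2
    Q[i + 1][i] = mu1
for i in range(4):
    Q[i][i] = -sum(Q[i])

residual = max(abs(sum(p[i] * Q[i][j] for i in range(4))) for j in range(4))
print(p, residual)
```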
c) Find the time-average rate at which customers leave node 1.
Solution: Each customer departure from node 1 corresponds to a downward transition
in the queue representation, and thus occurs at rate µ1 from each state except 0 of the
queue. Thus in steady state, customers leave node 1 at rate r = (1 - p0)µ1. This is also the
time-average rate at which customers leave node 1.
d) Find the time-average rate at which a given customer cycles through the system.
Solution: Each third departure from node 1 is a departure of the same customer, and this
corresponds to each third downward transition in the queue. Thus the departure rate of a
given customer from queue 1, which is the same as the rotation rate of that customer, is
r/3.
e) Is the Markov process reversible? Suppose that the backward Markov process is interpreted as a closed
queueing network. What does a departure from node 1 in the forward process correspond to in the backward
process? Can the transitions of a single customer in the forward process be associated with transitions of a
single customer in the backward process?
Solution: The queueing process is reversible since it is a birth-death process. The backward
process from the two node system, however, views each forward departure from node 1 as an
arrival to node 1, i.e., a departure from node 2. Customers 1, 2, 3 then rotate in backward
order, 1, 3, 2, in the backward system. Thus if one wants to maintain the customer order,
the system is not reversible. The system, viewed as a closed queueing system in the sense of
Section 7.7.1, is in essence the same as the queueing network here and also abstracts away
the individual customer identities, so it is again reversible.
Exercise 7.20: Consider an M/G/1 queueing system with last come first serve (LCFS) pre-emptive
resume service. That is, customers arrive according to a Poisson process of rate λ. A newly arriving
customer interrupts the customer in service and enters service itself. When a customer is finished, it leaves
the system and the customer that had been interrupted by the departing customer resumes service from
where it had left off. For example, if customer 1 arrives at time 0 and requires 2 units of service, and
customer 2 arrives at time 1 and requires 1 unit of service, then customer 1 is served from time 0 to 1;
customer 2 is served from time 1 to 2 and leaves the system, and then customer 1 completes service from
time 2 to 3. Let Xi be the service time required by the ith customer; the Xi are IID random variables with
expected value E[X]; they are independent of customer arrival times. Assume λE[X] < 1.
a) Find the mean time between busy periods (i.e., the time until a new arrival occurs after the system
becomes empty).
Solution: The arrival process is Poisson, and thus the residual life until the next arrival,
starting at any given time, is exponential at rate λ. Thus the mean period separating busy
periods is 1/λ.
b) Find the time-average fraction of time that the system is busy.
Solution: Applying Little's law to the service facility, the time-average number of customers in service (which is the fraction of time the server is busy) is equal to λ times the
sample-path average service time per customer, E[X]. Since the server is busy if and only
if the system is busy, λE[X] is the time-average fraction of time the system is busy.
c) Find the mean duration, E[B], of a busy period. Hint: use (a) and (b).
Solution: Consider the beginning of each busy period as a renewal and consider a reward
rate of one unit while the system is busy. Then the mean reward per renewal is E [B] and
the mean duration of a renewal period is E [B] + 1/ . The ratio of these, by the renewal
reward theorem, is E [X]. Thus
E [X] =
E [B]
;
E [B] + 1/
E [B] =
1
E [X]
.
E [X]
(A.146)
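Equation (A.146) can be checked by simulation. The busy fraction and the busy-period lengths depend only on the workload process, so any work-conserving discipline can be simulated; the sketch below uses exponential service and illustrative values of λ and E[X] (assumptions, not values from the exercise).

```python
# Simulate the workload of a single-server queue with Poisson arrivals and
# compare the busy fraction to lam*E[X] and the mean busy period to
# E[X]/(1 - lam*E[X]).  lam and EX are illustrative.
import random

rng = random.Random(7)
lam, EX, T_end = 0.5, 1.0, 200_000.0

t = work = busy_time = 0.0
n_busy = 0
while True:
    gap = rng.expovariate(lam)           # time until the next arrival
    busy_time += min(work, gap)           # server works off backlog meanwhile
    work = max(work - gap, 0.0)
    t += gap
    if t >= T_end:
        break
    if work == 0.0:
        n_busy += 1                       # arrival to an empty system starts a busy period
    work += rng.expovariate(1.0 / EX)     # add this arrival's service requirement

frac_busy = busy_time / t
mean_B = busy_time / n_busy
print(frac_busy, mean_B)
```

For λ = 0.5 and E[X] = 1, the targets are λE[X] = 0.5 and E[X]/(1 - λE[X]) = 2.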
d) Explain briefly why the customer that starts a busy period remains in the system for the entire busy
period; use this to find the expected system time of a customer given that that customer arrives when the
system is empty.
Solution: The customers in the system form a stack (where the top of the stack is regarded
as the customer in service). If a second customer arrives while the first is in service, it is
placed above the first in the stack, and the first can never complete service until after the
second. Successive arrivals within the busy period are also above the first (and also above
the second if it is still there), so the busy period finishes when the first customer finishes.
Thus a customer that arrives to an empty system remains for the entire busy period, and
its expected system time is E[B].
e) Is there any statistical dependence between the system time of a given customer (i.e., the time from the
customer’s arrival until departure) and the number of customers in the system when the given customer
arrives?
Solution: There is no such dependence. The newly arriving customer is on the stack
above all previously arrived but not completed customers, so the new customer’s system
time depends only on future arrivals and their service times, all of which are independent
of past arrivals and service times.
f) Show that a customer's expected system time is equal to E[B]. Hint: Look carefully at your answers to
(d) and (e).
Solution: Since a customer’s system time is independent of the number in the system when
it arrives, the customer’s system time has the same expectation as a customer who finds
the system empty. Thus E [B] is the expected system time of each arrival.
g) Let C be the expected system time of a customer conditional on the service time X of that customer
being 1. Find (in terms of C) the expected system time of a customer conditional on X = 2; (Hint: compare
a customer with X = 2 to two customers with X = 1 each); repeat for arbitrary X = x.
Solution: For a customer with service requirement 2, consider the time required to complete one unit of service. This time depends only on the epochs and service requirements
of future arrivals, and is thus identically distributed to that of a customer with one unit of
service requirement. At the end of that unit of service, the second unit of service is on the
top of the stack, and its expected system time is the same as a new customer entering at
that instant. The system time of the original customer with service requirement 2 is the
sum, 2C, of these two system times, each of which is C.
Similarly, for an integer service requirement m, the expected system time is mC. Running the
same argument backwards, if a customer has service requirement 1/k, an arrival with one
unit of service requirement has k times as much expected system time. Thus for a service
requirement equal to any rational number m/k, the expected system time is mC/k. Finally,
since the expected system time is monotonic in the service requirement, a customer with
service requirement x has an expected system time equal to xC.
h) Find the constant C. Hint: use (f) and (g); don't do any tedious calculations.
Solution: Since xC is the expected system time conditional on X = x, the unconditional
expected system time is CE[X] = E[B]. Using (A.146), we then have C = 1/(1 - λE[X]).

Note that the LCFS service discipline avoids the slow truck effect of FCFS for M/G/1 queues.
The reason is that only customers already in the system are held up by a customer with
large service requirement. This number is independent of the service requirements for
LCFS, whereas for FCFS, the expected number held up by a long service-time customer is
proportional to the service requirement of that long service-time customer.
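The conclusion of (f) and (h) can be checked by simulating the LCFS pre-emptive resume discipline directly. The sketch below does this for the M/M/1 special case, where memorylessness lets the remaining service of a pre-empted customer be resampled as a fresh exponential; the rates λ = 0.5 and µ = 1 are illustrative assumptions.

```python
# LCFS pre-emptive resume simulation for M/M/1: the mean system time should be
# E[X]/(1 - lam*E[X]) = 1/(mu - lam).  lam, mu are illustrative.
import random

rng = random.Random(3)
lam, mu, n_target = 0.5, 1.0, 200_000

t, stack = 0.0, []                 # stack of arrival epochs; top is in service
next_arr = rng.expovariate(lam)
total_sys, n_done = 0.0, 0
while n_done < n_target:
    if stack:
        # Remaining service of the customer on top is Exp(mu) by memorylessness.
        done_at = t + rng.expovariate(mu)
        if done_at < next_arr:     # service completes before the next arrival
            t = done_at
            total_sys += t - stack.pop()
            n_done += 1
            continue
    # Otherwise the next event is an arrival, which pre-empts the server.
    t = next_arr
    stack.append(t)
    next_arr = t + rng.expovariate(lam)

mean_T = total_sys / n_done
print(mean_T)                      # should be near 1/(mu - lam) = 2.0 here
```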
Exercise 7.21: Consider a queueing system with two classes of customers, type A and type B. Type
A customers arrive according to a Poisson process of rate λA and type B customers arrive according to
an independent Poisson process of rate λB. The queue has a FCFS server with exponentially distributed
IID service times of rate µ > λA + λB. Characterize the departure process of class A customers; explain
carefully. Hint: Consider the combined arrival process and be judicious about how to select between A and
B types of customers.
Solution: The combined arrival process is Poisson, since it is the addition of two independent
Poisson processes. The service discipline serves customers in order of arrival and therefore
the departure process is the same as that for a single Poisson input process. Thus the
output process is Poisson also. Since the arriving customers are independently A type with
probability λA/(λA + λB), the departing customers, being in the same order as the arriving
customers, are independently A type with the same probability. Thus the departing A
customers form a Poisson process of rate λA, and this process is independent of the type B
departures.
Exercise 7.22: Consider a pre-emptive resume last come first serve (LCFS) queueing system with two
classes of customers. Type A customer arrivals are Poisson with rate λA and Type B customer arrivals are
Poisson with rate λB. The service time for type A customers is exponential with rate µA and that for type
B is exponential with rate µB. Each service time is independent of all other service times and of all arrival
epochs.
Define the “state” of the system at time t by the string of customer types in the system at t, in order of
arrival. Thus state AB means that the system contains two customers, one of type A and the other of type
B; the type B customer arrived later, so is in service. The set of possible states arising from transitions out
of AB is as follows:
ABA if another type A arrives.
ABB if another type B arrives.
A if the customer in service (B) departs.
Note that whenever a customer completes service, the next most recently arrived resumes service, so the
state changes by eliminating the final element in the string.
a) Draw a graph for the states of the process, showing all states with 2 or fewer customers and a couple of
states with 3 customers (label the empty state as E). Draw an arrow, labelled by the rate, for each state
transition. Explain why these are states in a Markov process.
Solution: [Graph: from the empty state E, an A arrival leads to state A (rate λA) and a B arrival to state B (rate λB). In general, from any state s, an A arrival leads to sA at rate λA, a B arrival leads to sB at rate λB, and a service completion (rate µA or µB according to the last letter of s) removes that last letter. For example, AB has transitions to ABA (rate λA), to ABB (rate λB), and to A (rate µB).]
For each state, the interval to the next transition is exponential and the next state, conditional on the present state, is independent of all earlier states. Thus by definition it is a
Markov process.
b) Is this process reversible? Explain. Assume positive recurrence. Hint: If there is a transition from one
state S to another state S′, how is the number of transitions from S to S′ related to the number from S′ to
S?
Solution: It is reversible for the same reason as a birth-death chain. The graph (viewing
each pair of transitions joining two nodes as a single link) is a tree, and thus each transition
in one direction on a link must be followed by a transition in the opposite direction before
the next transition in the first direction. Thus if states s and s′ are joined by a link, the
steady-state process probabilities are related by ps qss′ = ps′ qs′s.

Note that this is an important generalization of the result that a birth-death process is
reversible. Any process where the graph of nonzero transitions has the form of a tree is also
reversible under the same positive-recurrence conditions as a birth-death chain.
c) Characterize the process of type A departures from the system (i.e., are they Poisson?; do they form a
renewal process?; at what rate do they depart?; etc.)
Solution: Since the process is reversible, the backward process is statistically identical
to the forward process. Type A departures in the forward process correspond to type A
arrivals in the backward process, and thus the process of A departures is Poisson with rate
λA. Similarly, type B departures are Poisson with rate λB and are independent of type A
departures.
d) Express the steady-state probability Pr{A} of state A in terms of the probability of the empty state
Pr{E}. Find the probability Pr{AB} and the probability Pr{ABBA} in terms of Pr{E}. Use the notation
ρA = λA/µA and ρB = λB/µB.
Solution: From (b), we have Pr{E} λA = Pr{A} µA, so Pr{A} = ρA Pr{E}. Similarly, Pr{AB} = ρB Pr{A}, so Pr{AB} = ρB ρA Pr{E}. In the same way, Pr{ABBA} =
ρA^2 ρB^2 Pr{E}.
e) Let Qn be the probability of n customers in the system, as a function of Q0 = Pr{E}. Show that
Qn = (1 - ρ)ρ^n where ρ = ρA + ρB.
Solution: Note that

    Q1 = Pr{A} + Pr{B} = ρA Q0 + ρB Q0 = ρ Q0.

Similarly, consider any state s with n customers in the system. Let Pr{s} be the probability
of s. The probability of state sA (where sA is the state consisting of s followed by A) is
ρA Pr{s}. It follows that the probability of s followed by the single letter A or B is ρ Pr{s}.
Summing this over all states for which n customers are in the system, Qn+1 = ρ Qn. By
induction then, Qn = Q0 ρ^n. Finally, since Σn Qn = 1, we have Qn = (1 - ρ)ρ^n.
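The induction can also be checked by brute force: enumerate all strings of length n over {A, B} and sum their probabilities relative to Pr{E}. The loads ρA, ρB below are illustrative choices with ρ < 1.

```python
# Brute-force check that the string-state probabilities of Exercise 7.22 sum,
# over strings of length n, to (rhoA + rhoB)^n times Pr{E}.
from itertools import product

rhoA, rhoB = 0.3, 0.25     # illustrative loads
rho = rhoA + rhoB

def state_weight(s):
    """Probability of string state s divided by Pr{E}: product of per-letter loads."""
    w = 1.0
    for c in s:
        w *= rhoA if c == 'A' else rhoB
    return w

Q_rel = [sum(state_weight(s) for s in product('AB', repeat=n)) for n in range(1, 7)]
print(Q_rel)               # should match rho, rho^2, ..., rho^6
```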
Exercise 7.23: a) Generalize Exercise 7.22 to the case in which there are m types of customers, each
with independent Poisson arrivals and each with independent exponential service times. Let λi and µi
be the arrival rate and service rate respectively of the ith user. Let ρi = λi/µi and assume that ρ =
ρ1 + ρ2 + · · · + ρm < 1. In particular, show, as before, that the probability of n customers in the system is
Qn = (1 - ρ)ρ^n for 0 ≤ n < ∞.
Solution: The argument is exactly the same as that in Exercise 7.22.
b) View the customers in (a) as a single type of customer with Poisson arrivals of rate λ = Σi λi and
with a service density Σi (λi/λ) µi exp(-µi x). Show that the expected service time is ρ/λ. Note that you
have shown that, if a service distribution can be represented as a weighted sum of exponentials, then the
distribution of customers in the system for LCFS service is the same as for the M/M/1 queue with equal
mean service time.
Solution: Note that λi/λ is the probability that an arrival will be a type i arrival. Thus
the expected service time is 1/µ = Σi (λi/λ)(1/µi) = ρ/λ. Thus ρ = λ/µ. The result in
(a) then says that the distribution of the number of customers in the system is the same as
that of customers of an M/M/1 queue with the given λ and µ.
Exercise 7.24: Consider a k node Jackson-type network with the modification that each node i has s
servers rather than one server. Each server at i has an exponentially distributed service time with rate µi.
The exogenous input rate to node i is λ0 Q0i and each output from i is switched to j with probability
Qij and switched out of the system with probability Qi0 (as in the text). Let λi, 1 ≤ i ≤ k, be the solution,
for given λ0, to

    λj = Σ_{i=0}^{k} λi Qij;        1 ≤ j ≤ k,

and assume that λi < sµi, 1 ≤ i ≤ k. Show that the steady-state probability of state m is

    Pr{m} = Π_{i=1}^{k} pi(mi),        (A.147)

where pi(mi) is the probability of state mi in an (M, M, s) queue. Hint: simply extend the argument in the
text to the multiple server case.
Solution: The Jackson network with M/M/s servers is a special case of the generalization
of Jackson networks in which the service rate at each node is still exponential, but exponential with a rate depending on the number in service. The steady-state distribution in
(A.147) is given in (7.76). In the special case for M/M/s queues, ρij in (7.76) and (7.77) is
given by ρij = λi/µij where µij = jµi for j ≤ s and µij = sµi for j > s.

It is important to observe here that the solution for the overall arrival rate at each node i
is independent of the number of servers (assuming that the network is stable, i.e., that the
arrival rate is less than sµi at each node). To be more explicit, {Qij} forms a stochastic
matrix which has a unique steady-state probability distribution π. This can be scaled to
the arrival rate λi at each node by the fact that λ0 is known.
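The remark about the traffic equations can be made concrete: the sketch below solves λj = Σi λi Qij by fixed-point iteration for a small two-node network with assumed (illustrative) routing probabilities; nothing in the computation refers to the number of servers.

```python
# Solve the traffic equations lambda_j = sum_i lambda_i Q_ij for an
# illustrative k = 2 network; node 0 is the exogenous source.
lam0 = 1.0
Q = {(0, 1): 0.6, (0, 2): 0.4,     # exogenous input split between the nodes
     (1, 2): 0.5, (1, 0): 0.5,     # node 1: half to node 2, half out
     (2, 1): 0.25, (2, 0): 0.75}   # node 2: quarter back to node 1, rest out

lam = [lam0, 0.0, 0.0]
for _ in range(200):               # fixed-point iteration; converges since the
    lam = [lam0] + [sum(lam[i] * Q.get((i, j), 0.0) for i in range(3))
                    for j in (1, 2)]    # routing is substochastic (open network)

# Flow conservation: total exogenous input equals total departure rate.
out_rate = lam[1] * Q[(1, 0)] + lam[2] * Q[(2, 0)]
print(lam, out_rate)
```

For these routing probabilities the solution is λ1 = λ2 = 0.8, and the total departure rate equals λ0, as flow conservation requires.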
Exercise 7.25: Suppose a Markov process with the set of states A is reversible and has steady-state
probabilities pi; i ∈ A. Let B be a subset of A and assume that the process is changed by setting qij = 0
for all i ∈ B, j ∉ B. Assuming that the new process (starting in B and remaining in B) is irreducible, show
that the new process is reversible and find its steady-state probabilities.
Solution: For the given transition rates {qij} on A, the modified transition rates on B
are q′ij = qij for i, j ∈ B and q′ij = 0 otherwise. For i ∈ B, define p′i = pi / Σ_{j∈B} pj. Then,
for all i, j ∈ B, we have p′i q′ij = p′j q′ji. Since Σi p′i νi is clearly finite, Theorem 7.6.2 asserts
that the new Markov process on B is reversible.
This, along with a number of other exercises here, emphasizes the importance of the guessing
theorem, Theorem 7.6.3, with its special case for reversible processes in Theorem 7.6.2.
Exercise 7.26: Consider a queueing system with two classes of customers. Type A customer arrivals
are Poisson with rate λA and Type B customer arrivals are Poisson with rate λB. The service time for type
A customers is exponential with rate µA and that for type B is exponential with rate µB. Each service time
is independent of all other service times and of all arrival epochs.
a) First assume there are infinitely many identical servers, and each new arrival immediately enters an idle
server and begins service. Let the state of the system be (i, j) where i and j are the numbers of type A
and B customers respectively in service. Draw a graph of the state transitions for i ≤ 2, j ≤ 2. Find the
steady-state PMF, {p(i, j); i, j ≥ 0}, for the Markov process. Hint: Note that the type A and type B
customers do not interact.
Solution: It is clear that type A and B customers do not interact, since the arrivals
are independent, there is no queueing, and each new arrival has service rate µA or µB,
depending on the type of arrival. Theorem 7.6.2 (the guessing theorem) verifies this. Thus
p(i, j) = pA(i) pB(j), where pA(i) is the PMF of the number of customers in the system (in
steady state) for an M/M/∞ queue for system A and pB(j) is the PMF for system B. From
the M/M/∞ case of Exercise 6.12, pA(i) = ρA^i exp(-ρA)/i! and pB(j) = ρB^j exp(-ρB)/j!
where ρA = λA/µA and ρB = λB/µB. Thus, with ρ = ρA + ρB,

    p(i, j) = ρA^i ρB^j exp(-ρ) / (i! j!).
b) Assume for the rest of the exercise that there is some finite number m of servers. Customers who arrive
when all servers are occupied are turned away. Find the steady-state PMF, {p(i, j); i, j ≥ 0, i + j ≤ m}, in
terms of p(0, 0) for this Markov process. Hint: Combine (a) with the result of Exercise 7.25.
Solution: The Markov process in (a) is reversible since p(i, j) λA = p(i+1, j) µA, with a
similar relation for B. If the number of servers is reduced to m, the corresponding Markov
process is the same as that in (a) except that all arrival transitions out of states (i, j) with
i + j = m (to (i+1, j) or (i, j+1)) are removed, and we consider only the class of states
with i + j ≤ m. Using the result from Exercise 7.25, the resulting process is reversible and
the steady-state probabilities are scaled versions of those in (a). Thus, letting pm(i, j) be
the resulting steady-state probabilities,

    pm(i, j) = ρA^i ρB^j pm(0, 0) / (i! j!).
c) Let Qn be the probability that there are n customers in service at some given time in steady state. Show
that Qn = p(0, 0)ρ^n/n! for 0 ≤ n ≤ m where ρ = ρA + ρB, ρA = λA/µA, and ρB = λB/µB. Solve for p(0, 0).
Solution: We use convolution, getting

    Qn = Σ_{i=0}^{n} pm(i, n-i) = Σ_{i=0}^{n} ρA^i ρB^{n-i} pm(0, 0) / (i! (n-i)!).

Using the binomial expansion for (ρA + ρB)^n, this is equal to pm(0, 0) ρ^n/n!.

Using the fact that Σ_{n=0}^{m} Qn = 1, we see that

    pm(0, 0) = 1 / [Σ_{0≤n≤m} ρ^n/n!].
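The convolution identity and the normalization in Exercise 7.26(c) can be verified numerically; the loads ρA, ρB and server count m below are illustrative assumptions.

```python
# Check that Q_n computed by the two-type convolution equals p(0,0)*rho^n/n!,
# and that the Q_n sum to 1 with p(0,0) = 1/sum_{n<=m} rho^n/n!.
from math import factorial

rhoA, rhoB, m = 0.8, 0.5, 5
rho = rhoA + rhoB
p00 = 1.0 / sum(rho ** n / factorial(n) for n in range(m + 1))

Q_conv = [sum(rhoA ** i * rhoB ** (n - i) * p00 / (factorial(i) * factorial(n - i))
              for i in range(n + 1)) for n in range(m + 1)]
Q_closed = [p00 * rho ** n / factorial(n) for n in range(m + 1)]
print(Q_closed)
```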
Exercise 7.27: a) Generalize Exercise 7.26 to the case in which there are K types of customers, each
with independent Poisson arrivals and each with independent exponential service times. Let λk and µk
be the arrival rate and service rate respectively for the kth user type, 1 ≤ k ≤ K. Let ρk = λk/µk and
ρ = ρ1 + ρ2 + · · · + ρK. In particular, show, as before, that the probability of n customers in the system is
Qn = p(0, . . . , 0)ρ^n/n! for 0 ≤ n ≤ m.
Solution: This is an exercise in manipulating a large number of indices, and really does
little new beyond Exercise 7.26. Let a state be represented by (i1, . . . , ik, . . . , iK), where for
1 ≤ k ≤ K, ik is the number of type k customers in the system. The steady-state process
probability pm(i1, . . . , iK) is then

    pm(i1, . . . , iK) = pm(0, . . . , 0) Π_{k=1}^{K} ρk^{ik} / ik!.

This is derived by starting with m = ∞, where each class of customer is independent of the
other classes by the same argument as in Exercise 7.26. Then Exercise 7.25 is used to see
that the finite m case is a truncation of the m = ∞ case. It is thus reversible and simply a
scaling of the m = ∞ case over the truncated set of states. We next sum over all states for
which the total number of customers over all types is n. The multinomial expansion works
in the same way as the binomial expansion in Exercise 7.26 to yield

    Qn = pm(0, . . . , 0) ρ^n/n!,        where pm(0, . . . , 0) = 1/[Σ_{0≤n≤m} ρ^n/n!].
b) View the customers in (a) as a single type of customer with Poisson arrivals of rate λ = Σk λk and with
a service density Σk (λk/λ) µk exp(-µk x). Show that the expected service time is ρ/λ. Note that what you
have shown is that, if a service distribution can be represented as a weighted sum of exponentials, then the
distribution of customers in the system is the same as for the M/M/m/m queue with equal mean service
time.
Solution: In (a), the type k customers have exponential service at rate µk and thus density
µk exp(-µk t) and mean service time 1/µk. Viewing the customers as a Poisson splitting of
a Poisson process of rate λ, the marginal density of the service requirement of a customer
(averaged over types) is Σk (λk/λ) µk exp(-µk x). The mean service requirement is then the
average of the mean service requirement over types, i.e., Σk (λk/λ)(1/µk) = ρ/λ.
Exercise 7.28: Consider a sampled-time M/D/m/m queueing system. The arrival process is Bernoulli
with probability λ << 1 of arrival in each time unit. There are m servers; each arrival enters a server if a
server is not busy and otherwise the arrival is discarded. If an arrival enters a server, it keeps the server
busy for d units of time and then departs; d is some integer constant and is the same for each server.

Let n, 0 ≤ n ≤ m be the number of customers in service at a given time and let xi be the number of time
units that the ith of those n customers (in order of arrival) has been in service. Thus the state of the system
can be taken as (n, x) = (n, x1, x2, . . . , xn) where 0 ≤ n ≤ m and 1 ≤ x1 < x2 < . . . < xn ≤ d.
Let A(n, x) denote the next state if the present state is (n, x) and a new arrival enters service. That is,

    A(n, x) = (n + 1, 1, x1 + 1, x2 + 1, . . . , xn + 1)        for n < m and xn < d    (A.148)
    A(n, x) = (n, 1, x1 + 1, x2 + 1, . . . , x_{n-1} + 1)       for n ≤ m and xn = d.   (A.149)
That is, the new customer receives one unit of service by the next state time, and all the old customers
receive one additional unit of service. If the oldest customer has received d units of service, then it leaves
the system by the next state time. Note that it is possible for a customer with d units of service at the
present time to leave the system and be replaced by an arrival at the present time (i.e., (A.149) with n = m,
xn = d). Let B(n, x) denote the next state if either no arrival occurs or if a new arrival is discarded.

    B(n, x) = (n, x1 + 1, x2 + 1, . . . , xn + 1)               for xn < d    (A.150)
    B(n, x) = (n - 1, x1 + 1, x2 + 1, . . . , x_{n-1} + 1)      for xn = d.   (A.151)
a) Hypothesize that the backward chain for this system is also a sampled-time M/D/m/m queueing system,
but that the state (n, x1, . . . , xn), (0 ≤ n ≤ m, 1 ≤ x1 < x2 < . . . < xn ≤ d) has a different interpretation: n
is the number of customers as before, but xi is now the remaining service required by customer i. Explain
how this hypothesis leads to the following steady-state equations:

    λ π_{n,x} = (1 - λ) π_{A(n,x)}          ; n < m, xn < d    (A.152)
    λ π_{n,x} = λ π_{A(n,x)}                ; n ≤ m, xn = d    (A.153)
    (1 - λ) π_{n,x} = λ π_{B(n,x)}          ; n ≤ m, xn = d    (A.154)
    (1 - λ) π_{n,x} = (1 - λ) π_{B(n,x)}    ; n ≤ m, xn < d.   (A.155)
Solution: Given a state (n, x) at time t from which an arrival can be admitted, λ is the
probability of an arrival, and thus of a transition into A(n, x) at time t+1. Since A(n, x)
is the result of an arrival, the new customer has received only one unit of service. Thus
for the backward system in state A(n, x) at time t+1, that customer is going to depart in
the transition from t+1 to t.

Now consider the case in which xn < d. Then the number of customers in the system in
state A(n, x) is n + 1. In order for the backward system to transit from A(n, x) at t+1
to (n, x) at time t, it is necessary and sufficient that no backward arrival occur. Thus the
probability of a backward transition from A(n, x) to (n, x) is 1 - λ, justifying (A.152) under
the given hypothesis.

Next consider the case xn = d. Then one customer arrives and another departs in making
a transition from (n, x) to A(n, x). In the backward system, then, there is the necessity for
a backward arrival, as well as the backward departure we have already justified. Thus the
probability of this backward transition is λ, justifying (A.153). (A.154) and (A.155) are
verified in the same way.
b) Using this hypothesis, find πn,x in terms of π0, where π0 is the probability of an empty system. Hint: Use (A.154) and (A.155); your answer should depend on n, but not x.
Solution: Let (n, x ) be any fixed state and consider a (forward) sample path in which
no arrivals occur for d time units. At the end of these d time units, the system must be
empty. Use (A.154) for each transition in which a departure occurs and use (A.155) where
a departure does not occur (at most one departure can occur at each time unit). We then
have
    πn,x = (λ/(1 − λ))^n π0 .        (A.156)
c) Verify that the above hypothesis is correct.
Solution: Substituting (A.156) into both sides of (A.152) to (A.155), we see that each is satisfied for all states, for both a forward arrival (if possible) and a forward non-arrival. These equations
also include all backward transitions of backward arrivals (if possible) and backward non-arrivals. Thus, by the guessing theorem, Theorem 7.6.3, these equations provide the unique solution for the steady-state probabilities.
d) Find an expression for π0.
Solution: Each state (n, x) for a given n can be alternatively viewed as a d-vector of 1's and 0's where the ith component is 1 if one of the customers, say the jth in order in x, satisfies xj = i. Note that at most one arrival can occur in each time unit, so each customer has received a different amount of service. Thus there are C(d, n) = d!/(n!(d − n)!) different states (n, x) for each given n. Summing (A.156) over x, the probability pn that there are n customers in the system is
    pn = π0 C(d, n) (λ/(1 − λ))^n ,        (A.157)

where C(d, n) is the binomial coefficient.
Summing this over n to find π0,

    Σ_{n=0}^{m} π0 C(d, n) (λ/(1 − λ))^n = 1;        π0 = [ Σ_{n=0}^{m} C(d, n) (λ/(1 − λ))^n ]^{−1},

where the n = 0 term, C(d, 0)(λ/(1 − λ))^0 = 1, accounts for p0 = π0.
e) Find an expression for the steady-state probability that an arriving customer is discarded.

Solution: This can be interpreted either as the probability that an arrival occurs at a given time and is discarded or as the probability it is discarded given that it arrives at a given time. The latter is the probability of the set of states (n, x) for which n = m and xn < d. There are C(d − 1, m) such states, so the probability of turning away an arriving customer is

    π0 C(d − 1, m) (λ/(1 − λ))^m .

Multiplying by λ, we get the probability that a customer arrives at a given time and is discarded.
It is interesting to look at this in the limit as the time increment goes to 0. Let Λ be the arrival rate in arrivals per second, let δ be the time increment, and let λ = Λδ be the arrival rate per time increment. Let D be the service duration in seconds and d = D/δ be the service duration in time increments. (A.157) then becomes

    pn = π0 C(D/δ, n) (Λδ/(1 − Λδ))^n  →  π0 (DΛ)^n / n!        as δ → 0.

Recognizing that D is the expected service time, we see that this is the same result as the M/M/m/m queue. Exercise 7.27 shows that the same result holds if the service distribution is a sum of exponentials. The same result holds in general for M/G/m/m. This can be derived as a limit of discrete distributions using the same arguments as here, with much more complex notation and tricky limiting operations.
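This limit is easy to check numerically. The sketch below (with hypothetical values Λ = 0.5 arrivals/sec, D = 2 sec deterministic service, m = 3 servers) computes the discrete occupancy distribution of (A.157) for a small increment δ and compares it with the Erlang (M/M/m/m) form proportional to (ΛD)^n/n!:

```python
from math import comb, factorial

def mdmm_pn(lam, d, m):
    """Normalized p_n from (A.157): p_n proportional to C(d, n) (lam/(1-lam))**n, n = 0..m."""
    w = [comb(d, n) * (lam / (1 - lam)) ** n for n in range(m + 1)]
    s = sum(w)
    return [x / s for x in w]

def erlang_pn(rho, m):
    """M/M/m/m occupancy: p_n proportional to rho**n / n!, n = 0..m."""
    w = [rho ** n / factorial(n) for n in range(m + 1)]
    s = sum(w)
    return [x / s for x in w]

# Hypothetical parameters (not from the text): Lambda arrivals/sec, D sec service, m servers.
Lambda, D, m, delta = 0.5, 2.0, 3, 0.001
p_discrete = mdmm_pn(Lambda * delta, round(D / delta), m)
p_erlang = erlang_pn(Lambda * D, m)
print(p_discrete)   # approaches p_erlang as delta -> 0
print(p_erlang)
```

With δ = 0.001 the two distributions already agree to roughly three decimal places, as the limiting argument predicts.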
Exercise 7.29: A taxi alternates between three locations. When it reaches location 1 it is equally likely
to go next to either 2 or 3. When it reaches 2 it will next go to 1 with probability 1/3 and to 3 with
probability 2/3. From 3 it always goes to 1. The mean times between locations i and j are t12 = 20, t13 = 30, t23 = 30. Assume exponential service and assume tij = tji.
Find the (limiting) probability that the taxi’s most recent stop was at location i, i = 1, 2, 3.
What is the (limiting) probability that the taxi is heading for location 2?
What fraction of time is the taxi traveling from 2 to 3? Note: Upon arrival at a location the taxi immediately departs.
Solution: The taxi location is best modeled as a semi-Markov process {X(t); t > 0} where X(t) is the most recent site visited up to time t. (It cannot be modeled as a Markov process since the holding time in each state i depends both on i and the next state j.) For the embedded Markov chain, we are given that P12 = 1/2, P13 = 1/2, P21 = 1/3, P23 = 2/3, P31 = 1. From this, the steady-state probabilities in the embedded Markov chain are

    π1 = 3/7,        π2 = 3/14,        π3 = 5/14.
The holding times, i.e., the intervals between sites, are U(1, 2) = U(2, 1) = 20 and U(1, 3) = U(3, 1) = U(2, 3) = 30. We can then use (7.88) to find the expected holding time in each state, U(j) = Σ_k Pjk U(j, k), so

    U(1) = 25,        U(2) = 80/3,        U(3) = 30.

The time-average fraction of time in state i, which is also the limiting probability of being in state i, is then given by (7.90) as pj = πj U(j) / Σ_k πk U(k). Thus we have

    p1 = 15/38,        p2 = 8/38,        p3 = 15/38.
In terms of the taxi problem, pi is the probability that X(t) = i, i.e., the probability that
the most recent site visited up to time t is location i.
Next, the probability that the next site to be visited is j is the sum over i of the probabilities that the state is i and the next state is j at time t. Letting Q(i, j) be the probability that X(t) = i and that the next state is j, we see from (7.94) that Q(i, j) = pi Pij U(i, j)/U(i). Thus the probability that the next state is 2 is

    Pr{next state = 2} = Q(1, 2) = p1 P12 U(1, 2)/U(1) = 3/19.
Finally, the limiting fraction of time spent traveling from 2 to 3 is Q(2, 3), which from (7.94)
is 3/19.
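The numbers in this solution can be reproduced mechanically. The sketch below uses exact rational arithmetic on the transition data given in the exercise:

```python
from fractions import Fraction as F

# Embedded-chain transition probabilities and travel times from the exercise.
P = {(1, 2): F(1, 2), (1, 3): F(1, 2), (2, 1): F(1, 3), (2, 3): F(2, 3), (3, 1): F(1, 1)}
U = {(1, 2): 20, (2, 1): 20, (1, 3): 30, (3, 1): 30, (2, 3): 30, (3, 2): 30}
states = (1, 2, 3)

# Embedded steady-state probabilities found above.
pi = {1: F(3, 7), 2: F(3, 14), 3: F(5, 14)}

# Expected holding time in each state, (7.88): U(i) = sum_k P_ik U(i, k).
Ubar = {i: sum(P[i, k] * U[i, k] for k in states if (i, k) in P) for i in states}
# Time-average state probabilities, (7.90).
norm = sum(pi[k] * Ubar[k] for k in states)
p = {i: pi[i] * Ubar[i] / norm for i in states}
# Q(i, j) = p_i P_ij U(i, j) / U(i), from (7.94).
Q = {(i, j): p[i] * P[i, j] * U[i, j] / Ubar[i] for (i, j) in P}

print(Ubar)              # {1: 25, 2: 80/3, 3: 30}
print(p)                 # {1: 15/38, 2: 4/19, 3: 15/38}
print(Q[1, 2], Q[2, 3])  # 3/19 and 3/19
```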
Exercise 7.30: a) Assume that the Markov process in Exercise 7.5 is changed in the following way:
whenever the process enters state 0, the time spent before leaving state 0 is now a uniformly distributed rv, taking values from 0 to 2/λ. All other transitions remain the same. For this new process, determine
whether the successive epochs of entry to state 0 form renewal epochs, whether the successive epochs of exit
from state 0 form renewal epochs, and whether the successive entries to any other given state i form renewal
epochs.
Solution: On each successive entry to state 0, the embedded chain is in state 0 and the
duration in that state is a uniform rv independent of all earlier states and their durations.
This duration in state 0 is finite and IID between successive visits to 0. All subsequent
states and durations are independent of those after each subsequent visit to state 0 and
the time to return to state 0 is finite with probability 1 from the same argument used for
the Markov process. Thus successive returns to state 0 form a renewal process. The same
argument works for exits from state 0.
For successive entries to some arbitrary given state j, the argument is slightly more complex,
but it also forces us to understand the issue better. The argument that the epochs of
successive entries to a given state in a Markov process form a renewal process does not
depend at all on the exponential holding times. It depends only on the independence of
the holding times and on the Markov property of the embedded chain. Thus changing one
holding time from an exponential to a uniform rv of the same expected value does not
change the idependence of holding times and transitions between successive entries into any
given state. Thus successive visits to any given state form renewal epochs.
b) For each i, find both the time-average interval and the time-average number of overall state transitions
between successive visits to i.
Solution: The expected holding time in each state is unchanged by the modification and
thus the time average interval between successive visits to any given state i is the same
as before the modification. Also, the modification did not change the embedded chain, so
the overall time-average number of state transitions between visits to i is also unchanged.
These averages are calculated in part (d) of Exercise 7.5.
c) Is this modified process a Markov process in the sense that Pr{X(t) = i | X(τ) = j, X(s) = k} = Pr{X(t) = i | X(τ) = j} for all 0 < s < τ < t and all i, j, k? Explain.
Solution: The modified process is not a Markov process. Essentially, the fact that the
holding time in state 0 is no longer exponential, and thus not memoryless, means that if
X(t) = 0, the time at which state 0 was entered provides information about when state 0
will be left.
As an example, suppose that µ ≪ λ so we can ignore multiple visits to state 0 within an interval of length 2/λ. We then see that

    Pr{X(t + λ^{-1}) = 0 | X(t) = 0, X(t − λ^{-1}) = 0} = 0

since there is 0 probability of a visit to state 0 lasting for more than time 2/λ. On the other hand,

    Pr{X(t + λ^{-1}) = 0 | X(t) = 0, X(t − λ^{-1}) = 1} > 0.

In fact, this latter probability is 1/8 in the limit of large t and small µ.
Exercise 7.31: Consider an M/G/1 queueing system with Poisson arrivals of rate λ and expected service time E[X]. Let ρ = λE[X] and assume ρ < 1. Consider a semi-Markov process model of the M/G/1 queueing system in which transitions occur on departures from the queueing system and the state is the number of customers immediately following a departure.

a) Suppose a colleague has calculated the steady-state probabilities {pi} of being in state i for each i ≥ 0. For each i ≥ 0, find the steady-state probability πi of state i in the embedded Markov chain. Give your solution as a function of ρ, pi, and p0.
Solution: Note that the embedded chain is not a birth/death chain, since multiple arrivals can occur without a state change. Eq. (7.90) gives {pi; i ≥ 0} in terms of {πi; i ≥ 0}, and we can use that to solve for {πi; i ≥ 0} in terms of {pi; i ≥ 0},

    πi = (pi/U(i)) / Σ_j (pj/U(j)).

For all states i > 0, U(i) is simply the expected service time, E[X]. For i = 0, however, the state does not change until both an arrival occurs and then that arrival is served. Thus U(0) = E[X] + 1/λ. Thus Σ_j pj/U(j) = (1 − p0)/E[X] + p0/(E[X] + 1/λ), and for i > 0,

    πi = pi / (1 − p0 + p0/(1 + 1/ρ)) = pi (ρ + 1)/(1 + ρ − p0).        (A.158)

Similarly, π0 = ρ p0/(1 + ρ − p0).
b) Calculate p0 as a function of ρ.

Solution: Consider the renewal process where renewals occur on each entry to state 0 and let W be the expected time between renewals. We first show that W = 1/(λ(1 − ρ)) and then use this to find p0. Consider a reward function equal to 1 when the system is idle. Then the expected reward per renewal is 1/λ. Since the time-average reward is the fraction of time the system is idle, it is 1 − ρ. Thus by the renewal-reward theorem, 1 − ρ = (1/λ)(1/W), so W = 1/(λ(1 − ρ)).

Now consider a different reward function which is 1 both while the system is empty and while the first new arrival is being processed. The expected reward per renewal is 1/λ + E[X] = (1 + ρ)/λ. Now p0 is the time-average reward, so using the renewal-reward theorem again,

    p0 = ((1 + ρ)/λ) · (1/W) = (1 + ρ)(1 − ρ).        (A.159)
c) Find πi as a function of ρ and pi.

Solution: Substituting (A.159) into (A.158), we get πi = pi/ρ for i > 0. Similarly, for i = 0, π0 = 1 − ρ.
d) Is pi the same as the steady-state probability that the queueing system contains i customers at a given time? Explain carefully.

Solution: There is no reason to believe that pi, which is the steady-state probability of state i in the semi-Markov process, should be the same as the steady-state probability of i customers in the M/G/1 queue at a given time. The state in the M/G/1 queue changes on each arrival and departure, whereas that in the semi-Markov model changes only on departures. For example, p0 = (1 + ρ)(1 − ρ), whereas the time-average probability of an empty system is 1 − ρ.
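The algebra connecting (A.158), (A.159) and part (c) can be spot-checked with exact rational arithmetic; the values of ρ and pi below are arbitrary illustrative choices, not from the text:

```python
from fractions import Fraction as F

def embedded_from_time_avg(rho, p_i):
    """pi_i for i > 0 via (A.158), with p0 taken from (A.159)."""
    p0 = (1 + rho) * (1 - rho)
    return p_i * (1 + rho) / (1 + rho - p0)

rho = F(1, 2)                         # arbitrary illustrative value of rho
p0 = (1 + rho) * (1 - rho)            # (A.159): p0 = (1+rho)(1-rho)
pi0 = rho * p0 / (1 + rho - p0)       # pi_0 from part (a)
print(pi0, 1 - rho)                   # both 1/2, i.e. pi_0 = 1 - rho

p_i = F(3, 25)                        # arbitrary illustrative value for some i > 0
print(embedded_from_time_avg(rho, p_i), p_i / rho)   # equal: pi_i = p_i / rho
```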
Exercise 7.32: Consider an M/G/1 queue in which the arrival rate is λ and the service time distribution is uniform (0, 2W) with λW < 1. Define a semi-Markov chain following the framework for the M/G/1 queue in Section 7.8.

a) Find P0j for j ≥ 0.
Solution: There is little to do here, since (7.99) gives P0j in terms of the density g(u) of the service time. Here g(u) = 1/(2W) for 0 ≤ u ≤ 2W, so P0j is

    P0j = ∫_0^{2W} [(λu)^j exp(−λu) / (2W j!)] du;        j ≥ 0.
Note that the integrand here is the probability of j additional arrivals in the interval of size
u between the first arrival (i.e., the beginning of the first service) and the end of that service.
This expression can be manipulated in various ways, but does not materially simplify.
b) Find Pij for i > 0; j ≥ i − 1.

Solution:

    Pij = ∫_0^{2W} [(λu)^{j−i+1} exp(−λu) / (2W (j − i + 1)!)] du;        j ≥ i − 1, i > 0.

Here, with i > 0, the integrand is the probability of j − i + 1 arrivals in an interval of size u. Note that starting a service in state i with j − i + 1 arrivals and one departure leaves the system in state j.
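As a sanity check on the P0j expression, the sketch below (with hypothetical values λ = 0.7 and W = 1.5, not from the text) integrates it numerically and verifies that the P0j sum to 1 over j, and that the mean number of arrivals during a service is λW, the arrival rate times the mean of the uniform (0, 2W) service time:

```python
from math import exp, factorial

def P0j(lam, W, j, steps=4000):
    """Midpoint-rule approximation of (lam u)^j e^{-lam u} / (2W j!) integrated over (0, 2W)."""
    h = 2 * W / steps
    fj = factorial(j)
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += (lam * u) ** j * exp(-lam * u) / (2 * W * fj) * h
    return total

lam, W = 0.7, 1.5   # hypothetical arrival rate and mean service time
probs = [P0j(lam, W, j) for j in range(40)]
print(sum(probs))                               # ~1: the P0j form a distribution over j
print(sum(j * q for j, q in enumerate(probs)))  # ~lam*W: mean arrivals during one service
```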
Exercise 7.33: Consider a semi-Markov process for which the embedded Markov chain is irreducible
and positive recurrent. Assume that the distribution of inter-renewal intervals for one state j is arithmetic
with span d. Show that the distribution of inter-renewal intervals for all states is arithmetic with the same
span.
Solution: Let X0, X1, . . . be a sequence of embedded states and let U1, U2, . . . be the corresponding durations in the semi-Markov process. Assume for now that the durations are all discrete rv's. Consider a joint sample path of {(Xi, Ui)} in which X0 = j and Xn = j. Then Σ_{ℓ=1}^{n} Uℓ is the duration of one or more renewals of state j in the semi-Markov process. By hypothesis then, this is an arithmetic rv whose span is a multiple of d.
For some given state i, consider a subset of such joint sample paths such that X0 = j, Xm = i, Xk = i, Xn = j for 0 < m < k < n and such that Xm+1, . . . , Xk−1 ≠ i. Thus we are looking at a sample path in the embedded chain that goes from j to i to a renewal of i and then back to j. Considering the associated durations, Σ_{ℓ=1}^{n} Uℓ is again a multiple of d for all such sample paths. Also Σ_{ℓ=1}^{m} Uℓ + Σ_{ℓ=k+1}^{n} Uℓ is a multiple of d, since this also corresponds to one or more renewals of state j (with the cycle from i to i omitted). It follows that Σ_{ℓ=m+1}^{k} Uℓ is also a multiple of d.
We have thus shown that the inter-renewal intervals for state i are arithmetic with a span
that is a multiple of d. This multiple must be 1, since we can reverse the roles of i and j
above. Since i is arbitrary, the inter-renewal intervals for all states are arithmetic with span
d.
Finally we must show that the rv's Uik are all discrete. Consider any sample path from say j to j that includes a single transition from i to k. The duration of this path must be a multiple of d for all sample values of Uik, and thus these sample values can differ only by multiples of d, and they are thus discrete. Note that it doesn't follow that the random variables Uik are arithmetic since, for example, the embedded chain might have period 2 with odd numbered states transitioning only to even numbered states and vice versa. One set of transitions might have durations of the form α + md where 0 < α < d is arbitrary and m is a variable positive integer. The other set might have durations equal to md − α. Any renewal would contain an equal number of durations from each set and thus all renewal durations would be integer multiples of d.
A.8 Solutions for Chapter 8
Exercise 8.1: In this exercise, we evaluate Pr{eη | X = a} and Pr{eη | X = b} for binary detection from vector signals in Gaussian noise directly from (8.40) and (8.41).

a) By using (8.40) for each sample value y of Y, show that

    E[LLR(Y) | X=a] = −(b − a)^T (b − a) / (2σ²).

Hint: Note that, given X = a, Y = a + Z.
Solution: Taking the expectation of (8.40) conditional on X = a,

    E[LLR(Y) | X=a] = ((b − a)^T/σ²) E[Y − (b + a)/2]
                    = ((b − a)^T/σ²) (a − (b + a)/2),

from which the desired result is obvious.
b) Defining γ = ‖b − a‖/(2σ), show that

    E[LLR(Y) | X=a] = −2γ².

Solution: The result in (a) can be expressed as E[LLR(Y) | X=a] = −‖b − a‖²/(2σ²), from which the result follows.
c) Show that

    VAR[LLR(Y) | X=a] = 4γ².

Hint: Note that the fluctuation of LLR(Y) conditional on X = a is (1/σ²)(b − a)^T Z.

Solution: Using the hint,

    VAR[LLR(Y) | X=a] = (1/σ⁴)(b − a)^T E[Z Z^T] (b − a)
                      = (1/σ⁴)(b − a)^T [σ²I] (b − a) = (1/σ²)‖b − a‖²,

from which the result follows.
d) Show that, conditional on X = a, LLR(Y) ∼ N(−2γ², 4γ²). Show that, conditional on X = a, LLR(Y)/(2γ) ∼ N(−γ, 1).

Solution: Conditional on X = a, we see that Y = a + Z is Gaussian and thus LLR(Y) is also Gaussian conditional on X = a. Using the conditional mean and variance of LLR(Y) found in (b) and (c), LLR(Y) ∼ N(−2γ², 4γ²).

When the rv LLR(Y) is divided by 2γ, the conditional mean is also divided by 2γ, and the variance is divided by (2γ)², leading to the desired result.

Note that LLR(Y)/(2γ) is very different from LLR(Y/(2γ)). The first scales the LLR and the second scales the observation Y. If the observation itself is scaled, the result is a sufficient statistic and the LLR is unchanged by the scaling.
e) Show that the first half of (8.44) is valid, i.e., that

    Pr{eη | X=a} = Pr{LLR(Y) ≥ ln η | X=a} = Q(ln η/(2γ) + γ).

Solution: The first equality above is simply the result of a threshold test with the threshold η. The second uses the fact in (d) that LLR(Y)/(2γ), conditional on X = a, is N(−γ, 1). This is a unit variance Gaussian rv with mean −γ. The probability that it exceeds ln η/(2γ) is then Q(ln η/(2γ) + γ).
f) By essentially repeating (a) through (e), show that the second half of (8.44) is valid, i.e., that

    Pr{eη | X = b} = Q(−ln η/(2γ) + γ).

Solution: One can simply rewrite each equation above, but care is needed in observing that the likelihood ratio requires a convention for which hypothesis goes on top of the fraction. Thus, here the sign of the LLR is opposite to that in parts (a) to (e). This also means that the error event occurs on the opposite side of the threshold.
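The conditional statistics of the LLR derived in this exercise can be checked by simulation. The sketch below uses hypothetical values a = (0, 0), b = (2, 1), σ = 1, and threshold η = 1 (none of these come from the text) and compares the empirical mean, variance, and error rate under X = a against −2γ², 4γ², and Q(ln η/(2γ) + γ):

```python
import random
from math import erfc, log, sqrt

random.seed(1)
a, b, sigma, eta = [0.0, 0.0], [2.0, 1.0], 1.0, 1.0   # hypothetical signals, noise, threshold
d = [bi - ai for ai, bi in zip(a, b)]
gamma = sqrt(sum(x * x for x in d)) / (2 * sigma)
mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]

def llr(y):
    # (8.40): LLR(y) = (b - a)^T (y - (b+a)/2) / sigma^2
    return sum(di * (yi - mi) for di, yi, mi in zip(d, y, mid)) / sigma ** 2

N = 200_000
samples = [llr([ai + random.gauss(0, sigma) for ai in a]) for _ in range(N)]  # X = a sent
mean = sum(samples) / N
var = sum((s - mean) ** 2 for s in samples) / N
p_err = sum(s >= log(eta) for s in samples) / N

Q = lambda x: 0.5 * erfc(x / sqrt(2))
print(mean, -2 * gamma ** 2)                      # sample mean vs -2 gamma^2
print(var, 4 * gamma ** 2)                        # sample variance vs 4 gamma^2
print(p_err, Q(log(eta) / (2 * gamma) + gamma))   # error rate vs Q(ln eta/(2 gamma) + gamma)
```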
Exercise 8.2: (Generalization of Exercise 8.1) a) Let U = (b − a)^T KZ^{-1} Y. Find E[U | X=a] and E[U | X=b].

Solution: As explained in Example 8.2.5, U is a sufficient statistic. Conditional on X = a, we have Y = a + Z and thus

    E[U | X=a] = (b − a)^T KZ^{-1} E[a + Z] = (b − a)^T KZ^{-1} a.

Similarly,

    E[U | X=b] = (b − a)^T KZ^{-1} E[b + Z] = (b − a)^T KZ^{-1} b.
b) Find the conditional variance of U conditional on X = a. Hint: see the hint in Exercise 8.1 (c).

Solution: The fluctuation of U given X = a is (b − a)^T KZ^{-1} Z. Thus

    VAR[U | X=a] = (b − a)^T KZ^{-1} E[Z Z^T] KZ^{-1} (b − a) = (b − a)^T KZ^{-1} (b − a).
c) Give the threshold test in terms of the sample value u of U, and evaluate Pr{eη | X=a} and Pr{eη | X=b} from this and (b). Show that your answer agrees with (8.53).

Solution: As in (8.50), define γ² = (b − a)^T KZ^{-1} (b − a)/4. Then VAR[U | X=a] = 4γ². The LLR for U is the same as that for Y and can be evaluated (somewhat tediously) from (a) and (b). It is easier to use the value in (8.49), E[LLR(U) | X=a] = −2γ². Similarly, VAR[U | X=b] = 4γ² and E[LLR(U) | X=b] = +2γ². The error probabilities are then the same as those in (e) and (f) of Exercise 8.1, which is the same as (8.53).
d) Explain what happens if [KZ] is singular. Hint: you must look at two separate cases, depending on the vector b − a.

Solution: If b − a is an eigenvector with eigenvalue 0 of [KZ], it means that b − a is in a direction in which there is no noise. Thus the error probability will be zero if it is based on a threshold rule on the component of Y in the direction of b − a. In fact, if b − a has any component in the direction of an eigenvector of eigenvalue 0, the error probability can be made zero by looking only in that direction. This is called singular detection. Alternatively, if b − a is orthogonal to all eigenvectors of eigenvalue 0, the error probability will be nonzero and can be found by using a coordinate system in which the directions of eigenvectors of eigenvalue 0 are omitted from the formulation.
Exercise 8.3: a) Let Y be the observation rv for a binary detection problem, let y be the observed sample value. Let v = v(y) be a sufficient statistic and let V be the corresponding random variable. Show that Λ(y) is equal to pV|X(v(y) | b)/pV|X(v(y) | a). In other words, show that the likelihood ratio of a sufficient statistic is the same as the likelihood ratio of the original observation.

Solution: This is the third statement of Theorem 8.2.8, and we don't see any way of improving on that proof. It relies on the second statement of the theorem, which is further investigated in Exercise 8.4.
b) Show that this also holds for the ratio of probability densities if V is a continuous rv or random vector.

Solution: Repeating the proof of the third statement of Theorem 8.2.8 for the case in which Y and V have densities, we start with the second statement, i.e., pX|Y(x | y) = pX|V(x | v(y)).

    pX|Y(1 | y) / pX|Y(0 | y) = pX|V(1 | v(y)) / pX|V(0 | v(y))

    [fY|X(y | 1) pX(1)/fY(y)] / [fY|X(y | 0) pX(0)/fY(y)] = [fV|X(v(y) | 1) pX(1)/fV(v(y))] / [fV|X(v(y) | 0) pX(0)/fV(v(y))],

where we have used Bayes' rule on each term. Cancelling terms, we get

    fY|X(y | 1) / fY|X(y | 0) = fV|X(v(y) | 1) / fV|X(v(y) | 0).
The left side of this is Λ(y), so this is the desired result. We see that it is derived simply by replacing PMFs with PDFs, but there are some assumptions made about the densities being positive. Turning this into a mathematical theorem with precise conditions would require measure theory, and without that, it is better to rely on common sense applied to simple models.
Exercise 8.4: a) Show that if v(y) is a sufficient statistic according to condition 1 of Theorem 8.2.8, then

    pX|Y V(x | y, v(y)) = pX|Y(x | y).        (A.160)
Solution: Let V(Y) be the rv with sample values v(y). Note that {Y = y} is the same event as {Y = y} ∩ {V = v(y)}. If Y is discrete and this event has positive probability, then (A.160) is obvious since the condition on both sides is the same and has positive probability. If Y is a rv with a positive density, then (A.160) is true if the condition Y = y is replaced with y − δ < Y ≤ y. Then (A.160) holds if lim_{δ→0} (1/δ) Pr{y − δ < Y ≤ y} > 0, which is valid since Y has a positive density. This type of argument can be extended to the case where Y is a random vector and holds whether or not V(Y) is a sufficient statistic. This says that X → Y → V is Markov, which is not surprising since V is simply a function of Y.
b) Consider the subspace of events conditional on V(y) = v for a given v. Show that for y such that v(y) = v,

    pX|Y V(x | y, v(y)) = pX|V(x | v).        (A.161)

Solution: We must assume (as a natural extension of (a)) that V(Y) is a sufficient statistic, i.e., that there is a function u such that for each v, u(v) = Λ(y) for all y such that v(y) = v. We also assume that Y = y has positive probability or probability density, since the conditional probabilities don't have much meaning otherwise. Then

    pX|Y V(1 | y, v(y)) / pX|Y V(0 | y, v(y)) = pX|Y(1 | y) / pX|Y(0 | y) = (p1/p0) Λ(y) = (p1/p0) u(v(y)),

where we first used (A.160), then Bayes' law, and then the assumption that u(v) is a sufficient statistic. Since this ratio is the same for all y for which v(y) has the same value, the ratio is a function of v alone,

    pX|Y V(1 | y, v(y)) / pX|Y V(0 | y, v(y)) = pX|V(1 | v) / pX|V(0 | v).        (A.162)

Finally, since pX|V(0 | v) = 1 − pX|V(1 | v) and pX|Y V(0 | y, v(y)) = 1 − pX|Y V(1 | y, v(y)), we see that (A.162) implies (A.161). Note that this says that X → V → Y is Markov.
c) Explain why this argument is valid whether Y is a discrete or continuous random vector and whether V
is discrete, continuous or part discrete and part continuous.
Solution: The argument above essentially makes no assumptions about either Y or V being discrete or having a density. The argument does depend on the assumption that the given conditional probabilities are defined.
Exercise 8.5: a) Let Y be a discrete observation random vector and let v(y) be a function of the sample values of Y. Show that

    pY|V X(y | v(y), x) = pY|X(y | x) / pV|X(v(y) | x).        (A.163)

Solution: We must assume that pV X(v(y), x) > 0 so that the expression on the left is defined. Then, using Bayes' law on Y and V for fixed x on the left side of (A.163),

    pY|V X(y | v(y), x) = pV|Y X(v(y) | y, x) pY|X(y | x) / pV|X(v(y) | x).        (A.164)

Since V is a deterministic function of Y, the first term in the numerator above is 1, so this is equivalent to (A.163).
b) Using Theorem 8.2.8, show that the above fraction is independent of X if and only if v(y) is a sufficient statistic.

Solution: Using Bayes' law on the numerator and denominator of (A.163),

    pY|X(y | x) / pV|X(v(y) | x) = [pX|Y(x | y) pY(y)] / [pX|V(x | v(y)) pV(v(y))].        (A.165)

If v(y) is a sufficient statistic, then the second equivalent statement of Theorem 8.2.8, i.e.,

    pX|Y(x | y) = pX|V(x | v(y)),

shows that the first terms on the right side of (A.165) cancel, showing that the fraction is independent of x. Conversely, if the fraction is independent of x, then the ratio of pX|Y(x | y) to pX|V(x | v(y)) is a function only of y. Since X is binary, this fraction must be 1, establishing the second statement of Theorem 8.2.8.
c) Now assume that Y is a continuous observation random vector, that v(y) is a given function, and V = v(Y) has a probability density. Define

    fY|V X(y | v(y), x) = fY|X(y | x) / fV|X(v(y) | x).        (A.166)

One can interpret this as a strange kind of probability density on a conditional sample space, but it is more straightforward to regard it simply as a fraction. Show that v(y) is a sufficient statistic if and only if this fraction is independent of x. Hint: Model your derivation on that in (b), modifying (b) as necessary to do this.

Solution: The trouble with (A.164) can be seen by looking at (A.165). When this is converted to densities, the joint density of Y, V(Y) has the same problem as the densities in Example 8.2.10. The numerator and denominator of (A.163) are well defined as densities, however, and the argument in (b) carries through as before.
Exercise 8.6: a) Consider Example 8.2.5, and let Z = [A]W where W ∼ N(0, I) is normalized IID Gaussian and [A] is non-singular. The observation rv Y is a + Z given X = a and is b + Z given X = b. Suppose the observed sample value y is transformed into v = [A^{-1}]y. Explain why v is a sufficient statistic for this detection problem (and thus why MAP detection based on v must yield the same decision as that based on y).

Solution: v is a sufficient statistic if there is a function u such that u(v(y)) = Λ(y). Since v is an invertible function of y, we can clearly map v back into y and map that into Λ(y). In equations, using (8.64),

    Λ(y) = fV|X(v(y) | 1) / fV|X(v(y) | 0).
b) Consider the detection problem where V = [A^{-1}]a + W given X = a and [A^{-1}]b + W given X = b. Find the log-likelihood ratio LLR(v) for a sample value v of V. Show that this is the same as the log-likelihood ratio for a sample value y = [A]v of Y.

Solution: The problem V = [A^{-1}]X + W is the same as that in Example 8.2.3 except that the received signal is V rather than Y, the transmitted signals are [A^{-1}]a and [A^{-1}]b, and σ² = 1. The log-likelihood ratio, from (8.40), is

    LLR(v) = ([A^{-1}]b − [A^{-1}]a)^T (v − ([A^{-1}]b + [A^{-1}]a)/2),

    LLR(v(y)) = (b − a)^T [A^{-1}]^T ([A^{-1}]y − [A^{-1}](b + a)/2)
              = (b − a)^T [A^{-1}]^T [A^{-1}] (y − (b + a)/2).        (A.167)

This is the same as LLR(y) as given in (8.45) after making the observation that [KZ] = [A][A^T], so that [KZ^{-1}] = [A^{-1}]^T [A^{-1}].
c) Find Pr{e | X=a} and Pr{e | X=b} for the detection problem in (b) by using the results of Example 8.2.3. Show that your answer agrees with (8.53). Note: the methodology here is to transform the observed sample value to make the noise IID; this approach is often both useful and insightful.

Solution: The error probability using the results of Example 8.2.3 is given by (8.44) in terms of γ, which is defined as

    γ = (1/2) ‖[A^{-1}]b − [A^{-1}]a‖ = (1/2) ‖[A^{-1}](b − a)‖
      = (1/2) √((b − a)^T [A^{-1}]^T [A^{-1}] (b − a)).

The result in (8.53) is the same equation, except that γ is defined in (8.50) as

    γ = √( ((b − a)^T/2) [KZ^{-1}] ((b − a)/2) ).

Since [KZ] = [A][A^T], these are the same.
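A small numeric check of the identity used here, with a hypothetical non-singular 2×2 matrix [A] and a hypothetical difference vector b − a: since KZ = E[Z Z^T] = [A][A^T], the quadratic form (b − a)^T [A^{-1}]^T [A^{-1}] (b − a) must equal (b − a)^T KZ^{-1} (b − a):

```python
# Hypothetical non-singular 2x2 matrix A and difference vector d = b - a.
A = [[2.0, 1.0], [0.0, 1.0]]
d = [1.0, -2.0]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def quad(v, M):
    # v^T M v
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

KZ = matmul(A, transpose(A))                          # K_Z = A A^T since Z = A W, W ~ N(0, I)
lhs = quad(d, matmul(transpose(inv2(A)), inv2(A)))    # (b-a)^T [A^-1]^T [A^-1] (b-a)
rhs = quad(d, inv2(KZ))                               # (b-a)^T K_Z^-1 (b-a)
print(lhs, rhs)   # equal up to rounding
```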
Exercise 8.7: Binary frequency shift keying (FSK) with incoherent reception can be modeled in terms of a 4-dimensional observation vector Y = (Y1, Y2, Y3, Y4)^T. Y = U + Z where Z ∼ N(0, σ²I) and Z is independent of X. Under X = 0, U = (a cos Φ, a sin Φ, 0, 0)^T, whereas under X = 1, U = (0, 0, a cos Φ, a sin Φ)^T. The random variable Φ is uniformly distributed between 0 and 2π and is independent of X and Z. The a priori probabilities are p0 = p1 = 1/2.

a) Convince yourself from the circular symmetry of the situation that the ML receiver calculates the sample values v0 and v1 of V0 = Y1² + Y2² and V1 = Y3² + Y4² and chooses x̂ = 0 if v0 ≥ v1 and chooses x̂ = 1 otherwise.
Solution: It is easier to visualize the situation here if we express the 4-rv's as pairs of complex rv's. That is, U = (a e^{iΦ}, 0) under X = 0 and U = (0, a e^{iΦ}) under X = 1. Similarly, Z = (R0 e^{iθ0}, R1 e^{iθ1}) where R0 and R1 are Rayleigh rv's and are independent of the other rv's as in the problem description. The received complex rv's are (√V0 e^{iφ0}, √V1 e^{iφ1}). The phases Φ, θ0, and θ1 are each uniformly distributed over [0, 2π) and are independent of each other and of the other variables as described. The received phases φ0 and φ1 are complicated functions of the other phases, but it is intuitively clear if one draws a picture of √V0 e^{iφ0} that φ0 is uniformly distributed over [0, 2π) and is independent of V0, V1, and X. This is proven in a slightly more general context in the solution to Exercise 3.23. In the same way, φ1 is uniform and independent of V0, V1, and X. It follows that V0 and V1 form a sufficient statistic for X.
b) Find Pr{V1 > v1 | X=0} as a function of v1 > 0.

Solution: Conditional on X = 0, V1 = |Z1|². Since |Z1|² is the sum of the squares of 2 IID real Gaussian rv's, each of mean 0 and variance σ²,

    Pr{V1 > v1 | X=0} = exp(−v1/(2σ²)).        (A.168)

This is derived in detail in the solution to Exercise 3.1.
c) Show that

    fY1,Y2|X,Φ(y1, y2 | 0, 0) = (1/(2πσ²)) exp[−(y1² + y2² − 2y1 a + a²)/(2σ²)].        (A.169)

Solution: Conditional on X = 0 and Φ = 0, Y1 is N(a, σ²) and Y2 is N(0, σ²) and independent of Y1. Thus their conditional joint density is as shown.
d) Show that

    Pr{V1 > V0 | X=0, Φ=0} = ∫∫ fY1,Y2|X,Φ(y1, y2 | 0, 0) Pr{V1 > y1² + y2²} dy1 dy2.        (A.170)

Show that this is equal to (1/2) exp(−a²/(4σ²)).

Solution: The right side of (A.170) is Pr{V1 > Y1² + Y2² | X=0, Φ=0}. Recalling that V0 = Y1² + Y2², we see that (A.170) is satisfied. Substituting (A.168) and (A.169) into (A.170), we get

    Pr{V1 > V0 | X=0, Φ=0} = ∫∫ (1/(2πσ²)) exp[−(y1² + y2² − 2y1 a + a²)/(2σ²)] exp[−(y1² + y2²)/(2σ²)] dy1 dy2
        = ∫∫ (1/(2πσ²)) exp[−(2y1² + 2y2² − 2y1 a + a²)/(2σ²)] dy1 dy2
        = (1/2) exp(−a²/(4σ²)).        (A.171)
e) Explain why this is the probability of error (i.e., why the event V1 > V0 is independent of Φ), and why Pr{e | X=0} = Pr{e | X=1}.

Solution: If we use an arbitrary phase φ in (A.168), we can express Z1, Z2 in a coordinate system rotated by φ to see that (A.168) is still valid. Thus (A.171) is valid for all φ and thus valid with no conditioning on Φ. The probability of error conditional on X = 1 is obviously the same because of the symmetry between X = 0 and X = 1.
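The closed form (A.171) can be checked by simulating the incoherent receiver directly. The amplitude and noise level below (a = 3, σ = 1) are hypothetical choices; the empirical error rate should be near (1/2) exp(−a²/(4σ²)):

```python
import random
from math import cos, sin, exp, pi

random.seed(2)
a, sigma = 3.0, 1.0        # hypothetical amplitude and noise level
N, errors = 200_000, 0
for _ in range(N):
    phi = random.uniform(0, 2 * pi)   # X = 0 sent: U = (a cos phi, a sin phi, 0, 0)
    y = [a * cos(phi) + random.gauss(0, sigma), a * sin(phi) + random.gauss(0, sigma),
         random.gauss(0, sigma), random.gauss(0, sigma)]
    v0, v1 = y[0] ** 2 + y[1] ** 2, y[2] ** 2 + y[3] ** 2
    errors += v1 > v0                 # ML decision: choose x_hat = 0 iff v0 >= v1
print(errors / N, 0.5 * exp(-a ** 2 / (4 * sigma ** 2)))  # simulation vs (A.171)
```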
Exercise 8.8: Binary frequency shift keying (FSK) on a Rayleigh fading channel can be modeled in terms of a 4-dimensional observation vector Y = (Y1, Y2, Y3, Y4)^T. Y = U + Z where Z ∼ N(0, σ²I) and Z is independent of U. Under X = 0, U = (U1, U2, 0, 0)^T, whereas under X = 1, U = (0, 0, U3, U4)^T. The random variables Ui ∼ N(0, a²) are IID. The a priori probabilities are p0 = p1 = 1/2.

a) Convince yourself from the circular symmetry of the situation that the ML receiver calculates sample values v0 and v1 for V0 = Y1² + Y2² and V1 = Y3² + Y4² and chooses x̂ = 0 if v0 > v1 and chooses x̂ = 1 otherwise.
Solution: See the solution to Exercise 8.7(a) for a proof that (V0, V1) is a sufficient statistic for this problem. From the symmetry between X = 1 and X = 0, it is then clear that the ML decision threshold is at v1 − v0 = 0.
b) Find fV0|X(v0 | 0) and fV1|X(v1 | 0).

Solution: Conditional on X = 0, we see that Y1 = Z1 + U1 and Y2 = Z2 + U2. Thus, conditional on X = 0, Y1 ∼ N(0, σ² + a²) and, independently, Y2 ∼ N(0, σ² + a²). Since V0 given X = 0 is the sum of the squares of two IID rv's, each N(0, σ² + a²),

    fV0|X(v0 | 0) = (1/(2(σ² + a²))) exp(−v0/(2(σ² + a²)));        for v0 ≥ 0.

In the same way, conditional on X = 0, Y3 ∼ N(0, σ²) and Y4 ∼ N(0, σ²). Thus

    fV1|X(v1 | 0) = (1/(2σ²)) exp(−v1/(2σ²));        for v1 ≥ 0.

c) Let W = V0 − V1 and find fW(w | X=0).
Solution: The joint density of V0 and V1 conditional on X = 0 is

   fV0V1|X(v0, v1 | 0) = [1/(4(σ² + a²)σ²)] exp(−v0/(2(σ² + a²)) − v1/(2σ²));   for v0, v1 ≥ 0.

Substitute w = v0 − v1 for v0 in this and then integrate over v1. For w > 0, we integrate v1 from 0 to ∞, and for w ≤ 0, we integrate from −w to ∞. The result is

   fW(w) = [1/(4σ² + 2a²)] exp(−w/(2σ² + 2a²));   for w > 0,
   fW(w) = [1/(4σ² + 2a²)] exp(w/(2σ²));   for w ≤ 0.
d) Show that Pr{e | X=0} = [2 + a²/σ²]⁻¹. Explain why this is also the unconditional probability of an incorrect decision.

Solution: An error occurs if W ≤ 0. We have

   Pr{e | X=0} = ∫_{−∞}^{0} fW|X(w | 0) dw = 2σ²/(4σ² + 2a²) = 1/(2 + a²/σ²).

The same probability of error clearly holds for X = 1 because of the symmetry between X = 0 and X = 1; thus this is the unconditional error probability.

Note the difference between the error probability for the Rayleigh fading channel using FSK and that for incoherent reception in Exercise 8.7. In the Rayleigh fading case, Pr{e} decays essentially as (a²/σ²)⁻¹, whereas it decays exponentially in a²/σ² for the incoherent case. The reason is that, even with very large a²/σ², the signal has a significant probability of fading to a level smaller than the noise in the fading case.
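A numerical check (ours, not part of the original solution): under X = 0 the signal coordinates Y1, Y2 are N(0, σ² + a²) and the noise-only coordinates Y3, Y4 are N(0, σ²), so the error probability is easy to estimate by Monte Carlo and compare with 1/(2 + a²/σ²). The parameter values are arbitrary.

```python
import math
import random

def rayleigh_fsk_error_prob(a, sigma, trials=200_000, seed=2):
    """Monte Carlo check of Pr{e} = 1/(2 + a^2/sigma^2) for binary FSK
    on a Rayleigh fading channel.  An error occurs (under X=0) when
    V1 = Y3^2 + Y4^2 exceeds V0 = Y1^2 + Y2^2."""
    rng = random.Random(seed)
    s_sig = math.sqrt(sigma**2 + a**2)   # std dev of the faded signal coords
    errors = 0
    for _ in range(trials):
        v0 = rng.gauss(0, s_sig) ** 2 + rng.gauss(0, s_sig) ** 2
        v1 = rng.gauss(0, sigma) ** 2 + rng.gauss(0, sigma) ** 2
        errors += (v1 > v0)
    return errors / trials

a, sigma = 3.0, 1.0
exact = 1.0 / (2 + a**2 / sigma**2)   # = 1/11 for these values
est = rayleigh_fsk_error_prob(a, sigma)
```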
Exercise 8.9: A disease has two strains, 0 and 1, which occur with a priori probabilities p0 and p1 = 1 − p0 respectively.
a) Initially, a rather noisy test was developed to find which strain is present for patients with the disease. The output of the test is the sample value y1 of a random variable Y1. Given strain 0 (X=0), Y1 = 5 + Z1, and given strain 1 (X=1), Y1 = 1 + Z1. The measurement noise Z1 is independent of X and is Gaussian, Z1 ∼ N(0, σ²). Give the MAP decision rule, i.e., determine the set of observations y1 for which the decision is x̂ = 1. Give Pr{e | X=0} and Pr{e | X=1} in terms of the function Q(x).
Solution: This is simply a case of binary detection with an additive Gaussian noise rv. To prevent simply copying the answer from Example 8.2.3, the signal a associated with X = 0 is 5 and the signal b associated with X = 1 is 1. Thus b < a, contrary to the assumption in Example 8.2.3. Looking at that example, we see that (8.27), repeated below, is still valid.

   LLR(y) = ((b − a)/σ²)(y − (b + a)/2)  ≷  ln(η),   choosing x̂(y) = b if ≥ and x̂(y) = a if <.

We can get a threshold test on y directly by first taking the negative of this expression and then dividing both sides by the positive term (a − b)/σ² to get

   y  ≷  −σ² ln(η)/(a − b) + (b + a)/2,   choosing x̂(y) = a if y exceeds the threshold and x̂(y) = b otherwise.

We get the same equation by switching the association of X = 1 and X = 0, which also changes the sign of the log threshold.
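The threshold and the resulting error probabilities can be computed as below. This sketch is ours, using the MAP threshold η = p0/p1 and expressing Q via the complementary error function; in the ML case p0 = p1 the threshold sits at the midpoint (a + b)/2 = 3.

```python
import math

def Q(x):
    """Gaussian tail: Q(x) = Pr{N(0,1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def map_test(a, b, sigma, p0, p1):
    """Threshold and error probabilities for the scalar Gaussian test
    with signal a under X=0, signal b < a under X=1, Z ~ N(0, sigma^2).
    Decide x_hat = 0 (signal a) iff y is at or above the threshold."""
    eta = p0 / p1
    t = (a + b) / 2 - sigma**2 * math.log(eta) / (a - b)
    pe0 = Q((a - t) / sigma)   # Pr{e | X=0}: y falls below t
    pe1 = Q((t - b) / sigma)   # Pr{e | X=1}: y falls at or above t
    return t, pe0, pe1

# ML case (p0 = p1): threshold at (5+1)/2 = 3, both errors Q(2/sigma).
t, pe0, pe1 = map_test(a=5, b=1, sigma=1, p0=0.5, p1=0.5)
```

For p0 = p1 and σ = 1 both conditional error probabilities equal Q(2) ≈ 0.0228.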
b) A budding medical researcher determines that the test is making too many errors. A new measurement procedure is devised with two observation random variables Y1 and Y2. Y1 is the same as in (a). Y2, under hypothesis 0, is given by Y2 = 5 + Z1 + Z2, and, under hypothesis 1, is given by Y2 = 1 + Z1 + Z2. Assume that Z2 is independent of both Z1 and X, and that Z2 ∼ N(0, σ²). Find the MAP decision rule for x̂ in terms of the joint observation (y1, y2), and find Pr{e | X=0} and Pr{e | X=1}. Hint: Find fY2|Y1,X(y2 | y1, 0) and fY2|Y1,X(y2 | y1, 1).

Solution: Note that Y2 is simply Y1 plus the noise term Z2, and that Z2 is independent of X and Y1. Thus Y2, conditional on Y1 and X, is simply N(Y1, σ²), which is independent of X. Thus Y1 is a sufficient statistic and Y2 is irrelevant. Including Y2 does not change the probability of error.
c) Explain in laymen’s terms why the medical researcher should learn more about probability.
Solution: It should have been clear intuitively that adding an additional observation that
is only a noisy version of what has already been observed will not help in the decision, but
knowledge of probability sharpens one’s intuition so that something like this becomes self
evident even without mathematical proof.
d) Now suppose that Z2 , in (b), is uniformly distributed between 0 and 1 rather than being Gaussian. We
are still given that Z2 is independent of both Z1 and X. Find the MAP decision rule for x̂ in terms of the
joint observation (y1 , y2 ) and find Pr(e | X=0) and Pr(e | X=1).
Solution: The same argument as in (b) shows that Y2 , conditional on Y1 , is independent
of X, and thus the decision rule and error probability do not change.
e) Finally, suppose that Z1 is also uniformly distributed between 0 and 1. Again find the MAP decision rule
and error probabilities.
Solution: By the same argument as before, Y2, conditional on Y1, is independent of X, so
Y1 is a sufficient statistic and Y2 is irrelevant. Since Z1 is uniformly distributed between 0
and 1, then Y1 lies between 5 and 6 for X = 0 and between 1 and 2 for X = 1. There is
thus no possibility of error in this case.
Exercise 8.10: a) Consider a binary hypothesis testing problem, and denote the hypotheses as X = 1 and X = −1. Let a = (a1, a2, ..., an)ᵀ be an arbitrary real n-vector and let the observation be a sample value y of the random vector Y = Xa + Z where Z ∼ N(0, σ²In) and In is the n × n identity matrix. Assume that Z and X are independent. Find the maximum likelihood decision rule and find the probabilities of error Pr(e | X=−1) and Pr(e | X=1) in terms of the function Q(x).
Solution: This is a minor notational variation on Example 8.2.4. Since we are interested in maximum likelihood, ln η = 0. The ML test, from (8.41), is then

   LLR(y) = 2aᵀy/σ²  ≷  0,   choosing x̂(y) = 1 if ≥ and x̂(y) = −1 if <.

The error probabilities, from (8.44), are then

   Pr{e | X=1} = Q(γ),    Pr{e | X=−1} = Q(γ),    where γ = ‖2a‖/(2σ).
b) Now suppose a third hypothesis, X = 0, is added to the situation of (a). Again the observation random vector is Y = Xa + Z, but here X can take on values −1, 0, or +1. Find a one dimensional sufficient statistic for this problem (i.e., a one dimensional function of y from which the likelihood ratios

   Λ1(y) = p_{Y|X}(y | 1)/p_{Y|X}(y | 0)    and    Λ−1(y) = p_{Y|X}(y | −1)/p_{Y|X}(y | 0)

can be calculated).
Solution: We have seen that the likelihood ratio for each of these binary decisions depends only on the noise in the direction of the difference between the vectors. Since each difference is ±a, we conclude that aᵀy is a sufficient statistic. One can verify this easily by calculating Λ1(y) and Λ−1(y).
c) Find the maximum likelihood decision rule for the situation in (b) and find the probabilities of error, Pr(e | X=x) for x = −1, 0, +1.

Solution: For X = 1, an error is made if Λ1(y) is less than 1. This occurs if aᵀy < ‖a‖²/2 and has probability Pr{e | X=1} = Q(‖a‖/2σ). For X = 0, an error occurs if aᵀy ≥ ‖a‖²/2 or if aᵀy < −‖a‖²/2. Thus Pr{e | X=0} = 2Q(‖a‖/2σ). Finally, for X = −1, an error occurs if aᵀy ≥ −‖a‖²/2. Thus Pr{e | X=−1} = Q(‖a‖/2σ).
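A short Monte Carlo check (ours; the vector a and σ are arbitrary) of these three error probabilities, working directly with the sufficient statistic: under X = x, aᵀY ∼ N(x‖a‖², σ²‖a‖²).

```python
import math
import random

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def ternary_error_probs(a, sigma, trials=100_000, seed=3):
    """Estimate Pr{e | X=x} for X in {-1, 0, +1}, Y = Xa + Z, using the
    ML rule that compares a^T y with +-||a||^2 / 2."""
    rng = random.Random(seed)
    a2 = sum(ai * ai for ai in a)          # ||a||^2
    na = math.sqrt(a2)                     # ||a||
    est = {}
    for x in (-1, 0, 1):
        errors = 0
        for _ in range(trials):
            # a^T y = x ||a||^2 + a^T z, with a^T z ~ N(0, sigma^2 ||a||^2)
            s = x * a2 + rng.gauss(0, sigma * na)
            xhat = 1 if s >= a2 / 2 else (-1 if s < -a2 / 2 else 0)
            errors += (xhat != x)
        est[x] = errors / trials
    exact = {1: Q(na / (2 * sigma)),
             0: 2 * Q(na / (2 * sigma)),
             -1: Q(na / (2 * sigma))}
    return est, exact

est, exact = ternary_error_probs(a=[1.0, 1.0, 1.0, 1.0], sigma=1.0)
```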
d) Now suppose that Z1, ..., Zn in (a) are IID and each is uniformly distributed over the interval −2 to +2. Also assume that a = (1, 1, ..., 1)ᵀ. Find the maximum likelihood decision rule for this situation.
Solution: If X = 1, then each Yi, 1 ≤ i ≤ n, lies between −1 and 3, and the conditional probability density at each such point is (1/4)ⁿ. Similarly, for X = −1, each Yi lies between −3 and 1. If all Yi are between −1 and +1, then the LLR is 0. If any are above 1, the LLR is ∞, and if any are below −1, the LLR is −∞. Thus the ML rule is x̂ = 1 if any Yi > 1 and x̂ = −1 if any Yi < −1. Everything else (which has aggregate probability 2⁻ⁿ) is 'don't care,' which by convention is detected as X = 1.
Exercise 8.11: A sales executive hears that one of his salespeople is routing half of his incoming sales to a competitor. In particular, arriving sales are known to be Poisson at rate one per hour. According to the report (which we view as hypothesis X=1), each second arrival is routed to the competition; thus under hypothesis 1 the interarrival density for successful sales is f(y|X=1) = y e⁻ʸ; y ≥ 0. The alternate hypothesis (X=0) is that the rumor is false and the interarrival density for successful sales is f(y|X=0) = e⁻ʸ; y ≥ 0. Assume that, a priori, the hypotheses are equally likely. The executive, a recent student of stochastic processes, explores various alternatives for choosing between the hypotheses; he can only observe the times of successful sales however.

a) Starting with a successful sale at time 0, let Si be the arrival time of the ith subsequent successful sale. The executive observes S1, S2, ..., Sn (n ≥ 1) and chooses the maximum a posteriori probability hypothesis given this data. Find the joint probability density f(S1, S2, ..., Sn | X=1) and f(S1, ..., Sn | X=0) and give the decision rule.
Solution: The interarrival times are independent, conditional on each of X = 1 and X = 0. The density of an interarrival interval given X = 1 is Erlang of order 2, with density x e⁻ˣ, so (with s0 = 0)

   fS|X(s1, ..., sn | 1) = ∏_{i=1}^{n} (s_i − s_{i−1}) exp(−(s_i − s_{i−1})) = e^{−s_n} ∏_{i=1}^{n} (s_i − s_{i−1}).

The density of an interarrival interval given X = 0 is exponential, so

   fS|X(s1, ..., sn | 0) = e^{−s_n}.

The MAP rule, with p0 = p1, is then

   LLR(s) = ln(s1) + Σ_{i=2}^{n} ln(s_i − s_{i−1})  ≷  0,   choosing x̂ = 1 if ≥ and x̂ = 0 if <.      (A.172)
The executive might have based a decision only on the aggregate time for n sales to take place, but this would not have been a sufficient statistic for the sequence of sale times, so this would have yielded a higher probability of error. It is also interesting to note that a very short interarrival interval is weighted very heavily in (A.172), and this is not surprising since very short intervals are very improbable under X = 1. The salesperson, if both fraudulent and a master of stochastic processes, would recognize that randomizing the sales routed to the competitor would make the fraud much more difficult to detect.
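The rule (A.172) can be exercised numerically. The sketch below (ours; the sample sizes are arbitrary) generates successful-sale times under each hypothesis and applies the LLR test; with n = 50 observed sales the test is almost always correct.

```python
import math
import random

def decide(arrival_times):
    """MAP decision from (A.172): x_hat = 1 iff the sum of the logs of
    the interarrival intervals (with s0 = 0) is >= 0."""
    s_prev, llr = 0.0, 0.0
    for s in arrival_times:
        llr += math.log(s - s_prev)
        s_prev = s
    return 1 if llr >= 0 else 0

def sample_path(x, n, rng):
    """Successful-sale times: exponential(1) interarrivals under X=0,
    Erlang-2 interarrivals (sum of two exponentials) under X=1."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0)
        if x == 1:
            t += rng.expovariate(1.0)
        times.append(t)
    return times

rng = random.Random(4)
n, trials = 50, 2000
correct = sum(decide(sample_path(x, n, rng)) == x
              for _ in range(trials) for x in (0, 1))
accuracy = correct / (2 * trials)
```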
b) This is the same as (a) except that the system is in steady state at time 0 (rather than starting with a successful sale). Find the density of S1 (the time of the first arrival after time 0) conditional on X=0 and on X=1. What is the decision rule now after observing S1, ..., Sn?

Solution: Under X = 1, the last arrival before 0 was successful with probability 1/2 and routed away with probability 1/2. Thus fS1|X(s1 | 1) = (1/2)e^{−s1} + (1/2)s1 e^{−s1}. This could also be derived as a residual life probability. With this modification, the first term in the LLR of (A.172) would be changed from ln(s1) to ln((s1 + 1)/2).
c) This is the same as (b), except rather than observing n successful sales, the successful sales up to some given time t are observed. Find the probability, under each hypothesis, that the first successful sale occurs in (s1, s1 + δ], the second in (s2, s2 + δ], ..., and the last in (s_{N(t)}, s_{N(t)} + δ] (assume δ very small). What is the decision rule now?

Solution: The somewhat artificial use of δ here is to avoid dealing with the discrete rv N(t) and the density of a random number of rv's. One can eliminate this artificiality after understanding the solution. Under X = 0, the probability that N(t) = n and Si ∈ [si, si + δ) for 1 ≤ i ≤ n and s_n ≤ t (using an approximation for very small δ) is

   [δ e^{−s1}] ∏_{i=2}^{n} [δ e^{−(s_i − s_{i−1})}] e^{−(t − s_n)} = δⁿ e^{−t}.      (A.173)

The term e^{−(t − s_n)} on the left side is the probability of no arrivals in (s_n, t], which along with the other arrival times specifies that N(t) = n.

Under X = 1, the term e^{−s1} in (A.173) is changed to (1/2)(s1 + 1)e^{−s1} as we saw in (b). The final term, e^{−(t − s_n)} in (A.173), must be changed to the probability of no successful sales in (s_n, t] under X = 1. This is (t − s_n + 1)e^{−(t − s_n)}. The term δⁿ cancels out in the likelihood ratio, so the decision rule is

   LLR = ln((1 + s1)/2) + ln(t − s_n + 1) + Σ_{i=2}^{n} ln(s_i − s_{i−1})  ≷  0,   choosing x̂ = 1 if ≥ and x̂ = 0 if <.
Exercise 8.12: This exercise generalizes the MAP rule to cases where neither a PMF nor PDF exist. To view the problem in its simplest context, assume a binary decision, a one-dimensional observation (y ∈ ℝ¹), and fixed a priori probabilities p0, p1. An arbitrary test, denoted test A, can be defined by the set A of observations that are mapped into decision x̂ = 1. Using such a test, the overall probability of correct decision is given by (8.6) as

   Pr{X̂_A(Y) = X} = p0 Pr{Y ∈ Aᶜ | X = 0} + p1 Pr{Y ∈ A | X = 1}.      (A.174)

The maximum of this probability over all tests A is then

   sup_A Pr{X̂_A(Y) = X} = sup_A [p0 Pr{Y ∈ Aᶜ | X = 0} + p1 Pr{Y ∈ A | X = 1}].      (A.175)

We first consider this supremum over all A consisting of a finite union of disjoint intervals. Then (using measure theory), we show that the supremum over all measurable sets A is the same as that over finite unions of disjoint intervals.

a) If A is a union of k intervals, show that Aᶜ is a union of at most k + 1 disjoint intervals. Intervals can be open or closed on each end and can be bounded by ±∞.
Solution: First suppose A is a union of k disjoint finite intervals, (a1, b1), (a2, b2), ..., (ak, bk), ordered so that a1 < a2 < ··· < ak and where the notation throughout omits distinguishing between the inclusion of end points. Then Aᶜ is (−∞, a1), (b1, a2), ..., (bk, ∞), which is k + 1 intervals. If a1 = −∞ or bk = ∞, this simply reduces the number of intervals in Aᶜ. If the intervals are not disjoint, then at most one interval in Aᶜ can start at each bi and at most one at −∞, so the number is at most k + 1 in general.
262
APPENDIX A. SOLUTIONS TO EXERCISES
b) Let I be the partition of ℝ created by the intervals of both A and of Aᶜ. Let I(j) be the jth interval in this partition. Show that

   Pr{x̂_A(Y) = X} ≤ Σ_j max[p0 Pr{Y ∈ I(j) | X = 0}, p1 Pr{Y ∈ I(j) | X = 1}].      (A.176)

Hint: Break (A.174) into intervals and apply the MAP principle on an interval basis.

Solution: Express Pr{Y ∈ A | X} as Σ_{i=1}^{k} Pr{Y ∈ (ai, bi) | X}. Pr{Y ∈ Aᶜ | X} is expressed in the same way over the complementary intervals. Thus Pr{x̂_A(Y) = X} is the sum in (A.176), but with the max replaced by the particular choice that test A makes on each interval of I.
c) The expression on the right in (A.176) is a function of the partition but is otherwise independent of A. It corresponds to a test where Y is first quantized to a finite set of intervals, and the MAP test is then applied to this discrete problem. We denote this test as MAP(I). Let the partition I′ be a refinement of I in the sense that each interval of I′ is contained in some interval of I. Show that

   Pr{x̂_MAP(I)(Y) = X} ≤ Pr{x̂_MAP(I′)(Y) = X}.      (A.177)

Solution: To avoid a lot of notation, suppose I′ is the same as I except for one interval (a, b) ∈ I that is broken into (a, c), (c, b) in I′. Then MAP(I) is the same as MAP(I′) except that MAP(I) is restricted to the same decision on (a, c) and (c, b), and thus (A.177) is satisfied in this case. Any finite refinement can be expressed as a sequence of single refinements of this type.
d) Show that for any two finite partitions of R into intervals, there is a third finite partition that is a
refinement of each of them.
Solution: If I is a finite partition of n disjoint intervals, it is specified by the n − 1 points separating these intervals. Similarly, if I′ has m disjoint intervals, it is specified by m − 1 points. Taking the union of these point sets, we specify another partition which is a refinement of both.
e) Consider a sequence {Aj; j ≥ 1} (each is a finite union of intervals) that approaches the sup (over finite unions of intervals) of (A.175). Demonstrate a corresponding sequence of successive refinements of partitions {Ij; j ≥ 1} for which Pr{x̂_MAP(Ij)(Y) = X} approaches the same limit.
Note what this exercise has shown so far: if the likelihoods have no PDF or PMF, there is no basis for a
MAP test on an individual observation y. However, by quantizing the observations sufficiently finely, the
quantized MAP rule has an overall probability of being correct that is as close as desired to the optimum
over all rules where A is a finite (but arbitrarily large) union of intervals. The arbitrarily fine quantization
means that the decisions are arbitrarily close to pointwise decisions, and the use of quantization means that
infinitely fine resolution is not required for the observations.
Solution: We demonstrated in (b) that each finite union of intervals Aj corresponds to a finite partition, say Ij, for which (A.176) is satisfied. Each test MAP(Ij) also corresponds to a test A′j where A′j is the union of the intervals in Ij that are mapped into decision 1. Thus, for each j,

   Pr{x̂_MAP(Ij)(Y) = X} ≤ sup_A Pr{X̂_A(Y) = X},

where the sup is over finite unions of intervals. Thus lim_{j→∞} Pr{x̂_MAP(Ij)(Y) = X} must have this same limit.
f) Next suppose the supremum in (A.175) is taken over all measurable sets A ⊆ ℝ. Show that, given any measurable set A and any ε > 0, there is a finite union of intervals A′ such that Pr{A ≠ A′ | X = ℓ} ≤ ε both for ℓ = 0 and ℓ = 1. Show from this that the sup in (A.175) is the same whether taken over measurable sets or finite unions of intervals.
Solution: This is a standard result in measure theory.
Exercise 8.13: For the min-cost hypothesis testing problem of Section 8.3, assume that there is a cost C0 of choosing X=1 when X=0 is correct, and a cost C1 of choosing X=0 when X=1 is correct. Show that a threshold test minimizes the expected cost using the threshold η = (C0 p0)/(C1 p1).
Solution: For a given sample vector y of the received rv Y, the expected cost if the decision x̂(y) = 0 is made is the probability of an incorrect decision (i.e., that X = 1) times the cost C1. Thus

   E[Cost | Y = y, x̂(y) = 0] = C1 Pr{X = 1 | Y = y}.

Similarly, if the decision x̂ = 1 is made,

   E[Cost | Y = y, x̂(y) = 1] = C0 Pr{X = 0 | Y = y}.

The min-cost decision, given Y = y, is the decision for which this conditional cost is minimum (with the usual convention of choosing x̂(y) = 1 in case of a tie). Thus the decision can be expressed as

   C1 Pr{X = 1 | Y = y}  ≷  C0 Pr{X = 0 | Y = y},   choosing x̂(y) = 1 if ≥ and x̂(y) = 0 if <.

Using Bayes' law on each probability, and rearranging terms,

   Λ(y)  ≷  (C0 p0)/(C1 p1),   choosing x̂(y) = 1 if ≥ and x̂(y) = 0 if <.
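A discrete toy example (ours, with made-up PMFs and costs) confirms the result by brute force: the threshold test at η = C0 p0/(C1 p1) achieves the minimum expected cost over all 2⁴ deterministic decision rules.

```python
import itertools

# Hypothetical discrete model: Y takes 4 values with these conditional PMFs.
p_y_given_0 = [0.4, 0.3, 0.2, 0.1]
p_y_given_1 = [0.1, 0.2, 0.3, 0.4]
p0, p1 = 0.6, 0.4
C0, C1 = 1.0, 2.0
eta = (C0 * p0) / (C1 * p1)          # threshold from the exercise

def expected_cost(rule):
    """rule[y] = decision for observation y; a false 1 costs C0,
    a false 0 costs C1."""
    cost = 0.0
    for y, d in enumerate(rule):
        if d == 1:
            cost += C0 * p0 * p_y_given_0[y]   # decided 1, X=0 true
        else:
            cost += C1 * p1 * p_y_given_1[y]   # decided 0, X=1 true
    return cost

# Threshold test: x_hat = 1 iff the likelihood ratio meets eta.
threshold_rule = [1 if p_y_given_1[y] / p_y_given_0[y] >= eta else 0
                  for y in range(4)]
# Brute-force search over all deterministic rules.
best = min(itertools.product([0, 1], repeat=4), key=expected_cost)
```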
Exercise 8.14: a) For given θ, 0 < θ ≤ 1, let η* achieve the supremum sup_{0≤η<∞} [q1(η) + η(q0(η) − θ)]. Show that η* ≤ 1/θ. Hint: Think in terms of Lemma 8.4.1 applied to a very simple test.
Solution: Recall that q0(η) and q1(η) are the error probabilities, given X = 0 and X = 1 respectively, using a threshold test with threshold η. As shown in Lemma 8.4.1, for each η, there is a straight line of slope −η passing through the point (q0(η), q1(η)), and for every test A, the point (q0(A), q1(A)) lies North East of that line.

Since this is true for all η > 0, it is also true for the error curve, defined as the supremum of all these straight lines, u(θ) = sup_{0≤η<∞} [q1(η) + η(q0(η) − θ)]. Then (see Theorem 8.4.2), for all tests A, (q0(A), q1(A)) lies above or on the curve (θ, u(θ)); 0 ≤ θ ≤ 1.

From the first figure below, we can see that the slope magnitude |1/θ| ≥ |η*|, where −η* is the slope of the error curve at θ. To see this analytically, |1/θ| is the slope of the hypotenuse of a right triangle of height 1 and width θ. Similarly, |η*| is the slope of the hypotenuse of a right triangle of height q1(η*) + η*q0(η*) − u(θ) and width θ. Now u(θ) ≥ 0 and (as explained more fully in the next paragraph), q1(η) + ηq0(η) ≤ 1 for all η ≥ 0, showing that |1/θ| ≥ |η*|. The second figure simply gives an example (similar to Example 8.4.3) in which this inequality is met with equality.
[Figures: on the left, the error curve u(θ) with its tangent of slope −η* at θ, the vertical-axis intercept q1(η*) + η*q0(η*), and the line of slope −1/θ from (0, 1) to (θ, 0); on the right, a piecewise-linear error curve for which |η*| = 1/θ.]
Finally, to verify that q1(η) + ηq0(η) ≤ 1, note from the proof of Lemma 8.4.1 that q1(η) + ηq0(η) ≤ q1(A) + ηq0(A) for all sets A. Choosing A to be the empty set (i.e., always decide hypothesis 0), we have q1(A) = 1 and q0(A) = 0, so q1(η) + ηq0(η) ≤ 1.

b) Show that the magnitude of the slope of the error curve u(θ) at θ is at most 1/θ.

Solution: The slope of the error curve at θ is −η* as defined and shown in (a). It was also shown in (a) that |η*| ≤ |1/θ|.
Exercise 8.15: Consider a binary hypothesis testing problem where X is 0 or 1 and a one dimensional
observation Y is given by Y = X + U where U is uniformly distributed over [-1, 1] and is independent of X.
a) Find fY |X (y | 0), fY |X (y | 1) and the likelihood ratio ⇤(y).
Solution: Note that fY|X is simply the density of U shifted by X, i.e.,

   fY|X(y | 0) = 1/2 for −1 ≤ y ≤ 1 (0 elsewhere);    fY|X(y | 1) = 1/2 for 0 ≤ y ≤ 2 (0 elsewhere).

The likelihood ratio Λ(y) is defined only for −1 ≤ y ≤ 2, since both conditional densities are zero outside this range.

   Λ(y) = fY|X(y | 1)/fY|X(y | 0) = 0 for −1 ≤ y < 0;   = 1 for 0 ≤ y ≤ 1;   = ∞ for 1 < y ≤ 2.
b) Find the threshold test at η for each η, 0 < η < ∞, and evaluate the conditional error probabilities, q0(η) and q1(η).

Solution: Since Λ(y) has finitely many (3) possible values, all values of η between any adjacent pair lead to the same threshold test. Thus, for η > 1, Λ(y) ≥ η if and only if (iff) Λ(y) = ∞. Thus x̂ = 1 iff 1 < y ≤ 2. For η = 1, x̂ = 1 iff Λ(y) ≥ 1, i.e., iff Λ(y) is 1 or ∞. Thus x̂ = 1 iff 0 ≤ y ≤ 2. For η < 1, Λ(y) ≥ η iff Λ(y) is 1 or ∞. Thus x̂ = 1 iff 0 ≤ y ≤ 2. Note that the MAP test is the same for η = 1 and η < 1, in both cases choosing x̂ = 1 for 0 ≤ y ≤ 2.

Consider q1(η) (the error probability using a threshold test at η conditional on X = 1). For η > 1, we have seen that x̂ = 1 (no error) for 1 < y ≤ 2. This occurs with probability 1/2 given X = 1. Thus q1(η) = 1/2 for η > 1. Also, for η > 1, x̂ = 0 for −1 ≤ y ≤ 1. Thus q0(η) = 0. Reasoning in the same way for η ≤ 1, we have q1(η) = 0 and q0(η) = 1/2.
c) Find the error curve u(α) and explain carefully how u(0) and u(1/2) are found (hint: u(0) = 1/2).

Solution: Each η > 1 maps into the pair of error probabilities (q0(η), q1(η)) = (0, 1/2). Similarly, each η ≤ 1 maps into the pair of error probabilities (q0(η), q1(η)) = (1/2, 0). The error curve contains these points and also contains the supremum of the straight lines of each slope −η around (0, 1/2) for η > 1 and around (1/2, 0) for η ≤ 1. The resulting curve is given below.

[Figure: the error curve runs from (0, 1) down to (0, 1/2), along the straight line of slope −1 from (0, 1/2) to (1/2, 0), and then to (1, 0).]
Another approach (perhaps more insightful) is to repeat (a) and (b) for the alternative threshold tests that choose x̂ = 0 in the don't-care cases, i.e., the cases for η = 1 and 0 ≤ y ≤ 1. It can be seen that Lemma 8.4.1 and Theorem 8.4.2 apply to these alternative threshold tests also. The points on the straight line between (0, 1/2) and (1/2, 0) can then be achieved by randomizing the choice between the threshold tests and the alternative threshold tests.
d) Find a discrete sufficient statistic v(y) for this problem that has 3 sample values.

Solution: v(y) = Λ(y) is a discrete sufficient statistic with 3 sample values.
e) Describe a decision rule for which the error probability under each hypothesis is 1/4. You need not use
a randomized rule, but you need to handle the don’t-care cases under the threshold test carefully.
Solution: The don’t care cases arise for 0  y  1 when ⌘ = 1. With the decision rule of
(8.11), these don’t care cases result in x̂ = 1. If half of those don’t care cases are decided
as x̂ = 0, then the error probability given X = 1 is increased to 1/4 and that for X = 0 is
decreased to 1/4. This could be done by random choice, or more easily, by mapping y > 1/2
into x̂ = 1 and y  1/2 into x̂ = 0.
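A quick Monte Carlo check (ours) of this final rule: mapping y > 1/2 into x̂ = 1 and y ≤ 1/2 into x̂ = 0 gives error probability 1/4 under each hypothesis.

```python
import random

def decide(y):
    """The rule from (e): x_hat = 1 iff y > 1/2, which splits the
    don't-care region [0, 1] evenly between the two decisions."""
    return 1 if y > 0.5 else 0

rng = random.Random(5)
trials = 200_000
# Under X=0, Y is uniform on [-1, 1]; under X=1, uniform on [0, 2].
q0 = sum(decide(rng.uniform(-1, 1)) == 1 for _ in range(trials)) / trials
q1 = sum(decide(rng.uniform(0, 2)) == 0 for _ in range(trials)) / trials
```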
A.9 Solutions for Chapter 9
Exercise 9.1: Consider the simple random walk {Sn; n ≥ 1} of Section 9.1.1 with Sn = X1 + ··· + Xn and Pr{Xi = 1} = p; Pr{Xi = −1} = 1 − p; assume that p ≤ 1/2.

a) Show that Pr{⋃_{n≥1}{Sn ≥ k}} = [Pr{⋃_{n≥1}{Sn ≥ 1}}]^k for any positive integer k. Hint: Given that the random walk ever reaches the value 1, consider a new random walk starting at that time and explore the probability that the new walk ever reaches a value 1 greater than its starting point.
Solution: Since {Sn; n ≥ 1} changes only in increments of 1 or −1, the only way that the walk can reach a threshold at integer k > 1 (i.e., the only way that the event ⋃_{n≥1}{Sn ≥ k} can occur) is if the walk first eventually reaches the value 1, and then, starting from the first time that 1 is reached, goes on to eventually reach 2, and so forth on to k.

The probability of eventually reaching 2 given that 1 is reached is the same as the probability of eventually reaching 1 starting from 0; this is most clearly seen from the Markov chain depiction of the simple random walk given in Figure 6.1. Similarly, the probability of eventually reaching any j starting from j − 1 is again the same, so (using induction if one insists on being formal), we get the desired relationship. This relationship also holds for p > 1/2.
b) Find a quadratic equation for y = Pr{⋃_{n≥1}{Sn ≥ 1}}. Hint: explore each of the two possibilities immediately after the first trial.

Solution: As explained in Example 5.5.4, y = p + (1 − p)y².
c) For p < 1/2, show that the two roots of this quadratic equation are p/(1 − p) and 1. Argue that Pr{⋃_{n≥1}{Sn ≥ 1}} cannot be 1 and thus must be p/(1 − p).

Solution: This is also explained in Example 5.5.4.
d) For p = 1/2, show that the quadratic equation in (c) has a double root at 1, and thus Pr{⋃_{n≥1}{Sn ≥ 1}} = 1. Note: this is the very peculiar case explained in Section 5.5.1.

Solution: As explained in Example 5.5.4, the fact that the roots are both 1 means that Pr{⋃_{n≥1}{Sn ≥ 1}} = 1.
e) For p < 1/2, show that p/(1 − p) = exp(−r*) where r* is the unique positive root of g(r) = 1, where g(r) = E[e^{rX}].

Solution: Note that g(r) = E[e^{rX}] = pe^r + (1 − p)e^{−r}. The positive root of g(r) = 1 is the r* > 0 for which

   1 = pe^{r*} + (1 − p)e^{−r*}.

This is quadratic in e^{r*} (and also in e^{−r*}) and is the same equation as in (b) if we substitute y for e^{−r*}. The two solutions are e^{−r*} = 1 (r* = 0) and e^{−r*} = p/(1 − p) (r* = ln[(1 − p)/p]). Thus the unique positive solution for r* is ln[(1 − p)/p]. Thus the optimized Chernoff bound for crossing a threshold is satisfied with equality by the simple combinatorial solution in (c).
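The combinatorial result can also be checked by simulation. The sketch below (ours; the truncation length and parameters are arbitrary) estimates the probability that the walk ever reaches k and compares it with (p/(1 − p))^k. Truncating each path slightly underestimates the true probability, but with a strong negative drift the truncation error is negligible.

```python
import random

def crossing_prob(p, k, trials=20_000, max_steps=400, seed=6):
    """Monte Carlo estimate of Pr{union_n {S_n >= k}} for the simple
    random walk with Pr{X_i = 1} = p < 1/2."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(trials):
        s = 0
        for _ in range(max_steps):
            s += 1 if rng.random() < p else -1
            if s >= k:
                crossed += 1
                break
    return crossed / trials

p, k = 0.3, 2
exact = (p / (1 - p)) ** k        # (p/(1-p))^k, from (a) and (c)
est = crossing_prob(p, k)
```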
Exercise 9.2: Consider a G/G/1 queue with IID arrivals {Xi; i ≥ 1}, IID FCFS service times {Yi; i ≥ 0}, and an initial arrival to an empty system at time 0. Define Ui = Y_{i−1} − Xi for i ≥ 1. (Note: the problem statement in the text incorrectly defined this as Ui = Xi − Y_{i−1}, which is inconsistent with the treatment in Section 9.2.) Consider a sample path where (u1, ..., u6) = (1, −2, 2, −1, 3, −2).
a) Let Zi⁶ = U6 + U5 + ··· + U_{6−i+1}. Find the queueing delay for customer 6 as the maximum of the 'backward' random walk with elements 0, Z1⁶, Z2⁶, ..., Z6⁶; sketch this random walk.

Solution:

   Z1⁶ = U6 = −2
   Z2⁶ = U6 + U5 = 1
   Z3⁶ = U6 + U5 + U4 = 0
   Z4⁶ = U6 + U5 + U4 + U3 = 2
   Z5⁶ = U6 + U5 + U4 + U3 + U2 = 0
   Z6⁶ = U6 + U5 + U4 + U3 + U2 + U1 = 1

Note that the equation for Ziⁿ is Σ_{j=0}^{i−1} U_{n−j}, rather than the slight typo given for Ziⁿ in the equation between (9.5) and (9.6) in the text. The queueing delay W6 for customer 6 is the maximum of (0, Z1⁶, ..., Z6⁶), i.e., 2.

[Figure: sketch of the backward random walk 0, z1, ..., z6 = −2, 1, 0, 2, 0, 1.]
b) Find the queueing delay for customers 1 to 5.

Solution: In the same way, (W1, ..., W5) = (1, 0, 2, 1, 4).

c) Which customers start a busy period (i.e., arrive when the queue and server are both empty)? Verify that if Zi⁶ maximizes the random walk in (a), then a busy period starts with arrival 6 − i.

Solution: Customer 2 starts a busy period (i.e., W2 = 0), and Zi⁶ is maximized in the random walk in (a) at i = 4, i.e., by arrival 6 − i = 2.

d) Now consider a forward random walk Vn = U1 + ··· + Un. Sketch this walk for the sample path above and show that the queueing delay for each customer is the difference between two appropriately chosen values of this walk.
Solution: (v1, ..., v6) = (1, −1, 1, 0, 3, 1). The corresponding sample path segment is sketched below.

[Figure: sketch of the forward random walk v1, ..., v6 = 1, −1, 1, 0, 3, 1.]
Note that Ziⁿ = Un + ··· + U_{n−i+1}, and this can be rewritten as Vn − V_{n−i}. Thus Wn = max_{i≤n} (Vn − V_{n−i}) = Vn − min_{0≤i≤n} Vi (with V0 = 0). Thus the queueing delay at any n is the difference between the V random walk at n and the lowest previous point on that random walk. To understand this, consider the random walk {Vn; n ≥ 1}. The first arrival i > 0 for which vi < 0 is the first i > 0 to see an already empty system, and thus Wi = 0 rather than Wi = Vi < 0. Subsequent delays are then determined as if the random walk is reset to 0 at the ith arrival. The next arrival j that sees an already empty system is the first j for which Vj < Vi, and
so forth at each arrival to see an already empty system. Thus the smallest Vj for j  n
is the last j  n to start a new busy period. This is somewhat more intuitive than the
backward random walk, but the backward random walk is the one that connects G/G/1
queue-waiting with random walks.
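The two formulations can be checked against each other on the sample path of this exercise. The sketch below (ours) computes each Wn both as the maximum of the backward walk of (a) and as Vn − min_{i≤n} Vi from (d).

```python
def delay_backward(u, n):
    """W_n as the maximum of the backward walk 0, Z_1^n, ..., Z_n^n,
    where Z_i^n = U_n + ... + U_{n-i+1}  (part (a))."""
    best, z = 0, 0
    for i in range(n - 1, -1, -1):   # add u[n-1], u[n-2], ..., u[0]
        z += u[i]
        best = max(best, z)
    return best

def delays_forward(u):
    """W_n = V_n - min_{0<=i<=n} V_i via the forward walk (part (d)),
    with V_0 = 0."""
    v, vmin, out = 0, 0, []
    for x in u:
        v += x
        out.append(v - min(vmin, v))
        vmin = min(vmin, v)
    return out

u = [1, -2, 2, -1, 3, -2]            # the sample path of this exercise
w_back = [delay_backward(u, n) for n in range(1, 7)]
w_fwd = delays_forward(u)
```

Both give (W1, ..., W6) = (1, 0, 2, 1, 4, 2), matching parts (a) and (b).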
Exercise 9.3: A G/G/1 queue has a deterministic service time of 2 and interarrival times that are 3 with probability 1 − p and 1 with probability p, where p < 1/2. (Note: for the steady-state wait in (b) to exist, the longer interarrival time must be the more likely one; the probabilities are stated here so that this holds and so that the answer in (b) takes the form (p/(1 − p))^k.)

a) Find the distribution of W1, the wait in queue of the first arrival after the beginning of a busy period.

Solution: The service time is always 2 units, so the first arrival after the beginning of a busy period arrives either 1 unit before the current service is completed (interarrival time 1) or 1 unit after it is completed (interarrival time 3). Thus Pr{W1 = 1} = p and Pr{W1 = 0} = 1 − p.
b) Find the distribution of W∞, the steady-state wait in queue.

Solution: The rv Ui = Y_{i−1} − Xi is binary (either 2 − 1 or 2 − 3), so Pr{Ui = 1} = p and Pr{Ui = −1} = 1 − p. The backward random walk is thus a simple random walk. From Theorem 9.2.1, Pr{W ≥ k} is the probability that the random walk based on {Ui; i ≥ 1} ever crosses k. From (9.2), recognizing that the simple random walk is integer valued,

   Pr{W ≥ k} = (p/(1 − p))^k.
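A simulation check (ours; parameter values arbitrary) of this tail formula, generating the delays through the standard equivalent recursion Wn = max(0, W_{n−1} + Un):

```python
import random

def gg1_tail(p, k, n=200_000, seed=7):
    """Simulate W_n = max(0, W_{n-1} + Y_{n-1} - X_n) for deterministic
    service time 2 and interarrival times 1 (probability p) or 3
    (probability 1-p), and estimate the steady-state Pr{W >= k}."""
    rng = random.Random(seed)
    w, hits, burn = 0, 0, n // 10    # discard a burn-in prefix
    for i in range(n):
        x = 1 if rng.random() < p else 3
        w = max(0, w + 2 - x)
        if i >= burn:
            hits += (w >= k)
    return hits / (n - burn)

p, k = 0.3, 2
exact = (p / (1 - p)) ** k           # (p/(1-p))^k from part (b)
est = gg1_tail(p, k)
```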
c) Repeat (a) and (b) assuming the service times and interarrival times are exponentially distributed with rates µ and λ respectively.

Solution: This is an M/M/1 queue. With probability µ/(λ + µ), the first arrival appears after the initial departure and thus has no wait in the queue. With probability λ/(λ + µ), the first arrival appears before the initial departure, and then waits in queue for an exponentially distributed time of rate µ. Thus Pr{W1 > w} = [λ/(λ + µ)] exp(−µw) for w ≥ 0.

In steady state, i.e., to find Pr{W > w} = lim_{n→∞} Pr{Wn > w}, we recall that the steady-state number N in the system is geometric with pN(n) = (1 − λ/µ)(λ/µ)ⁿ for n ≥ 0. Given N = 0 (an event of probability 1 − λ/µ), the wait is 0. Given N = n, the wait in the queue is Σ_{i=1}^{n} Yi* where each Yi* is exponential with rate µ. Thus for N > 0 (an event of probability λ/µ) the wait is a geometrically distributed sum of exponential rv's. From Example 2.3.3, this is equivalent to a sum of two Poisson processes, one of rate λ and one of rate µ − λ, where the first arrival from the µ − λ process is geometrically distributed in the combined rate-µ process. Thus,

   Pr{W > w} = (λ/µ) exp(−(µ − λ)w);   for w ≥ 0.
Exercise 9.4:
Let U = V1 + · · · + Vn where V1 , . . . , Vn are IID rv’s with the MGF gV (s). Show that
gU (s) = [gV (s)]n . Hint: You should be able to do this simply in a couple of lines.
Solution:
gU(s) = E[exp(sU)] = E[exp(sV1 + sV2 + ··· + sVn)] = [gV(s)]ⁿ,
where we have used the fact that the expectation of a product of independent rv’s is equal
to the product of the expectations.
Exercise 9.5: Let {an; n ≥ 1} and {αn; n ≥ 1} be sequences of numbers. For some b, assume that an ≤ αn e^{−bn} for all n ≥ 1. For each of the following choices for αn and arbitrary b, determine whether the bound is exponentially tight. Note: you may assume b ≥ 0 if you wish, but it really does not matter.

a) αn = Σ_{j=1}^{k} cj n^j.

b) αn = exp(√n).

c) αn = e^{n²}.
Solution to (a), (b), (c): The question is ill-posed, since we know nothing about the
sequence {an; n ≥ 1} other than that it is upper-bounded by an ≤ αn e^{−bn}. For example, an = 0
for each n ≥ 1 would satisfy the indicated bounds (assuming each cj > 0 in part (a)). To
make sense of the exercise, assume that an = αn e^{−bn} for each n ≥ 1 and assume that the
intended bound is an ≤ e^{−bn} for each n. Thus the question has to do with whether the
coefficients {αn; n ≥ 1} are 'significant enough' to destroy exponential tightness.

By definition of exponential tightness, the sequence of bounds an ≤ e^{−bn} for each n ≥ 1 is
exponentially tight if lim_{n→∞} (1/n) ln[an] = lim_{n→∞} (1/n) ln[e^{−bn}], i.e., if

lim_{n→∞} (1/n) ln[αn e^{−bn}] = lim_{n→∞} (1/n) ln[e^{−bn}].

Simplifying, the sequence of bounds is exponentially tight if lim_{n→∞} (1/n) ln[αn] = 0. Thus,
assuming cj > 0 for 1 ≤ j ≤ k, we have lim_{n→∞} (1/n) ln[Σ_{j=1}^k cj n^j] = 0. Thus the bound
is exponentially tight in part (a), and is similarly exponentially tight in (b). The bound
is not exponentially tight in (c) since lim_{n→∞} (1/n) ln[e^{n²}] = ∞. In other words, the
'coefficient' e^{n²} overwhelms the exponential effect in part (c).
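The criterion lim (1/n) ln αn = 0 can be seen numerically. The sketch below evaluates (1/n) ln αn for the three choices, taking cj = 1 for all j as an arbitrary example.

```python
import math

# (1/n) ln(alpha_n) for the three coefficient choices; exponential tightness
# survives iff this tends to 0.  Taking c_j = 1 is an arbitrary example choice.
k = 3
for n in (10, 100, 1000, 10000):
    a = math.log(sum(n ** j for j in range(1, k + 1))) / n   # (a) polynomial
    b = math.sqrt(n) / n                                     # (b) ln[exp(sqrt n)]/n
    c = n ** 2 / n                                           # (c) ln[exp(n^2)]/n
    print(f"n={n}: (a) {a:.4f}  (b) {b:.4f}  (c) {c:.0f}")
# (a) and (b) shrink toward 0; (c) grows without bound
```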
Exercise 9.6: Define γ(r) as ln[g(r)] where g(r) = E[exp(rX)]. Assume that X is discrete with possible
outcomes {ai; i ≥ 1}, let pi denote Pr{X = ai}, and assume that g(r) exists in some open interval (r−, r+)
containing r = 0. For any given r, r− < r < r+, define a random variable Xr with the same set of possible
outcomes {ai; i ≥ 1} as X, but with a probability mass function qi = Pr{Xr = ai} = pi exp[ai r − γ(r)]. Xr
is not a function of X, and is not even to be viewed as in the same probability space as X; it is of interest
simply because of the behavior of its defined probability mass function. It is called a tilted random variable
relative to X, and this exercise, along with Exercise 9.11, will justify our interest in it.
a) Verify that Σ_i qi = 1.

Solution: Note that these tilted probabilities are described in Section 9.3.2.

Σ_i qi = Σ_i pi exp[ai r − γ(r)] = (1/g(r)) Σ_i pi exp[ai r] = g(r)/g(r) = 1.
b) Verify that E[Xr] = Σ_i ai qi is equal to γ′(r).

Solution:

E[Xr] = Σ_i ai pi exp[ai r − γ(r)] = (1/g(r)) Σ_i ai pi exp[ai r]
      = (1/g(r)) (d/dr) Σ_i pi e^{r ai} = g′(r)/g(r) = γ′(r).   (A.178)
c) Verify that VAR[Xr] = Σ_i ai² qi − (E[Xr])² is equal to γ″(r).

Solution: We first calculate the second moment, E[Xr²].

E[Xr²] = Σ_i ai² pi exp[ai r − γ(r)] = (1/g(r)) Σ_i ai² pi exp[ai r]
       = (1/g(r)) (d²/dr²) Σ_i pi e^{r ai} = g″(r)/g(r).   (A.179)

Using (A.178) and (A.179),

VAR[Xr] = g″(r)/g(r) − [g′(r)]²/[g(r)]² = (d²/dr²) ln(g(r)) = γ″(r).

d) Argue that γ″(r) ≥ 0 for all r such that g(r) exists, and that γ″(r) > 0 if γ″(0) > 0.
Solution: Since γ″(r) is the variance of a rv, it is nonnegative. If γ″(0) = VAR[X] > 0, then X is
non-deterministic, i.e., it takes at least two values with positive probability. Since Xr has the same
set of possible outcomes, each with positive probability, Xr is also non-deterministic and thus has a
positive variance wherever it exists.
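The identities of (a) to (c) are easy to check numerically: the tilted mean and variance should agree with finite differences of γ(r). The three-point PMF, the value of r, and the step size h below are arbitrary illustrative choices.

```python
import math

# Tilted-PMF check for an arbitrary small PMF (values and probs illustrative)
a = [-1.0, 0.5, 2.0]
p = [0.5, 0.3, 0.2]
r, h = 0.4, 1e-5

def gamma(r):
    return math.log(sum(pi * math.exp(ai * r) for ai, pi in zip(a, p)))

q = [pi * math.exp(ai * r - gamma(r)) for ai, pi in zip(a, p)]   # tilted PMF
mean = sum(ai * qi for ai, qi in zip(a, q))
var = sum(ai * ai * qi for ai, qi in zip(a, q)) - mean ** 2

# Compare with centered finite differences of gamma(r)
g1 = (gamma(r + h) - gamma(r - h)) / (2 * h)
g2 = (gamma(r + h) - 2 * gamma(r) + gamma(r - h)) / h ** 2
print(abs(sum(q) - 1) < 1e-12, abs(mean - g1) < 1e-6, abs(var - g2) < 1e-4)
```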
e) Give a similar definition of Xr for a random variable X with a density, and modify (a) to (d) accordingly.

Solution: If X has a density fX(x), and also has an MGF over some region of r, then
the tilted variable Xr is defined to have the density fXr(x) = fX(x) exp(xr − γ(r)). The
derivations above follow as before except for the need of more care about convergence.
Exercise 9.7: a) Suppose Z is uniformly distributed over [−b, +b]. Find gZ(r), g′Z(r), and γ′Z(r) as a
function of r.

Solution:

gZ(r) = E[e^{rZ}] = ∫_{−b}^{b} (1/2b) e^{rz} dz = (e^{br} − e^{−br})/(2br).

g′Z(r) = (d/dr)[(e^{br} − e^{−br})/(2br)] = [(br − 1)e^{br} + (br + 1)e^{−br}]/(2br²).

γ′Z(r) = g′Z(r)/gZ(r) = [(br − 1)e^{br} + (br + 1)e^{−br}] / [r(e^{br} − e^{−br})].   (A.180)

b) Show that the interval over which gZ(r) exists is the real line, i.e., r+ = ∞ and r− = −∞. Show that
lim_{r→∞} γ′Z(r) = b.
Solution: We know from the theoretical development that all these quantities exist over
an interval r− to r+, and it is clear from (A.180) that they are finite over all nonzero r, but
it is not immediately clear from (A.180) itself that they are finite for r = 0. If one expands
e^{br} and e^{−br} into power series around r = 0, however, it can be seen that gZ(0) = 1 and
γ′Z(0) = 0. Finally, by dividing numerator and denominator of (A.180) by re^{br}, it is clear
that lim_{r→∞} γ′Z(r) = b.
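A quick numerical look at γ′Z(r) from (A.180) shows the claimed limit; b = 1 below is an arbitrary choice.

```python
import math

# gamma'_Z(r) for Z uniform on [-b, b], evaluated from (A.180); b = 1 illustrative
b = 1.0

def gamma_prime(r):
    num = (b * r - 1) * math.exp(b * r) + (b * r + 1) * math.exp(-b * r)
    den = r * (math.exp(b * r) - math.exp(-b * r))
    return num / den

for r in (1.0, 10.0, 50.0):
    print(r, gamma_prime(r))   # increases toward b = 1 as r grows
```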
c) Show that if a > b, then inf_{r∈I(X)} [γ(r) − ra] = −∞, so that the optimized Chernoff bound in (9.10) says
that Pr{Sn > na} = 0. Explain without any mathematics why Pr{Sn > na} = 0 must be valid. Explain
(with as little mathematical verbiage as possible) why an infimum rather than a minimum must be used in
(9.10).

Solution: We know that γ″(r) ≥ 0 and thus, since lim_{r→∞} γ′Z(r) = b, we must have
γ′(r) ≤ b for all r. Since γZ(0) = 0, it then follows that γ(r) ≤ br. Thus γ(r) − ar ≤ (b − a)r,
which approaches −∞ as r → ∞. Thus the optimized Chernoff bound is 0. Since Sn is
the sum of n rv's each upper bounded by b, it is obvious that Pr{Sn > bn} = 0 and thus
Pr{Sn > an} = 0. Finally, an infimum is needed in (9.10) because there is no minimum
over finite r.
d) For an arbitrary rv X, show that if r+ = ∞ and lim_{r→∞} γ′(r) = b, then (9.10) says that
Pr{Sn > na} = 0 for a > b.

Solution: The argument in (c) is in fact general for any rv under the above assumptions.
Exercise 9.8: Note that the MGF of the nonnegative exponential rv with density e^{−x} is (1 − r)^{−1} for
r < r+ = 1. It can be seen from this that g(r+) does not exist (i.e., it is infinite) and both lim_{r→r+} g(r) = ∞
and lim_{r→r+} g′(r) = ∞, where the limit is over r < r+. In this exercise, you are first to assume an arbitrary
rv X for which r+ < ∞ and g(r+) = ∞ and show that both lim_{r→r+} g(r) = ∞ and lim_{r→r+} g′(r) = ∞.
You will then use this to show that if sup_{r<r+} g′(r) < ∞, then γ(r+) < ∞ and µ(a) in (9.10) is given by
µ(a) = γ(r+) − r+a for a > sup_{r<r+} γ′(r).

a) For X such that r+ < ∞ and g(r+) = ∞, explain why

lim_{A→∞} ∫_0^A e^{xr+} dF(x) = ∞.   (A.181)

Solution: g(r+) = E[exp(r+X)] = ∫ e^{xr+} dF(x) = ∞. Since the part of this integral over x < 0 is
at most 1, the integral ∫_0^∞ e^{xr+} dF(x) must be infinite. By the definition of an integral with an
infinite upper limit, this integral from 0 to ∞ is the limit on the left of (A.181), which is thus infinite.
b) Show that for any ε > 0 and any A > 0,

g(r+ − ε) ≥ e^{−εA} ∫_0^A e^{xr+} dF(x).   (A.182)

Solution: Since e^{x(r+−ε)} ≥ 0,

g(r+ − ε) ≥ ∫_0^A e^{x(r+−ε)} dF(x) ≥ e^{−εA} ∫_0^A e^{xr+} dF(x),

where the second step uses e^{−xε} ≥ e^{−εA} for 0 ≤ x ≤ A.

c) Choose A = 1/ε and show that lim_{ε→0} g(r+ − ε) = ∞.

Solution: Using A = 1/ε in (A.182),

g(r+ − ε) ≥ e^{−1} ∫_0^{1/ε} e^{xr+} dF(x).

From (A.181), the limit of this as ε → 0 is ∞.
d) Show that lim_{ε→0} g′(r+ − ε) = ∞.

Solution: Using the same approach as in (A.182) and then lower-bounding the coefficient
x by 1 for x ≥ 1,

g′(r+ − ε) ≥ ∫_0^A x e^{x(r+−ε)} dFX(x) ≥ ∫_1^A e^{x(r+−ε)} dFX(x) ≥ ∫_0^A e^{x(r+−ε)} dFX(x) − e^{r+}.

As shown in (c), if 1/ε is substituted for A in the final integral, and then the limit ε → 0 is
taken, we get lim_{ε→0} g′(r+ − ε) = ∞.
e) Use (a) to (d) to show that if r+ < ∞ and sup_{r<r+} g′(r) < ∞, then g(r+) < ∞.

Solution: Given that r+ < ∞ and ε > 0, we have shown that g(r+) = ∞ implies that
lim_{ε→0} g′(r+ − ε) = ∞. Thus if lim_{ε→0} g′(r+ − ε) < ∞, we must have g(r+) < ∞. Since g′(r)
is increasing with r, sup_{r<r+} g′(r) = lim_{ε→0} g′(r+ − ε), so sup_{r<r+} g′(r) < ∞ also implies
that g(r+) < ∞.
f) Show that if r+ < ∞ and sup_{r<r+} g′(r) < ∞, then µ(a) = γX(r+) − ar+ for a > sup_{r<r+} γ′X(r).

Solution: If sup_{r<r+} g′(r) < ∞ then g(r+) < ∞ and thus sup_{r<r+} γ′X(r) < ∞. If a >
sup_{r<r+} γ′X(r), then (d/dr)[γ(r) − ra] < 0 for all r < r+. Thus γ(r) − ra is decreasing in r for
r < r+, so

µ(a) = inf_{r<r+} [γX(r) − ra] = γ(r+) − ar+.

See Exercise 1.25, (c) and (d) for examples of this.
Exercise 9.9: [Details in proof of Theorem 9.3.3] a) Show that the two appearances of ε in (9.24) can
be replaced with two independent arbitrary positive quantities ε1 and ε2, getting

Pr{Sn ≥ n(γ′(r) − ε1)} ≥ (1 − δ) exp[−n(rγ′(r) + rε2 − γ(r))].   (A.183)

Show that if this equation is valid for ε1 and ε2, then it is valid for all larger values of ε1 and ε2. Hint: Note
that the left side of (9.24) is increasing in ε and the right side is decreasing.

Solution: For arbitrary ε1 and ε2, let ε = min(ε1, ε2). For that ε and any δ > 0, there
is an no such that (9.24) is satisfied for all n ≥ no. Then, using the hint to replace ε by
ε1 ≥ ε on the left of (9.24),

Pr{Sn ≥ n(γ′(r) − ε1)} ≥ Pr{Sn ≥ n(γ′(r) − ε)}
                       ≥ (1 − δ) exp[−n(rγ′(r) + rε − γ(r))]
                       ≥ (1 − δ) exp[−n(rγ′(r) + rε2 − γ(r))].

This also shows that if (A.183) is satisfied for a given ε1, ε2 and n, then it is satisfied for all
larger ε1, ε2 for that n.
b) Show that by increasing the required value of no, the factor of (1 − δ) can be eliminated in (A.183).
Solution: We can rewrite (A.183) as

Pr{Sn ≥ n(γ′(r) − ε1)} ≥ (1 − δ)e^{nrε2} exp[−n(rγ′(r) + 2rε2 − γ(r))].

This applies for n ≥ no for the no in part (a). Now choose n′o large enough so that both
(1 − δ)e^{n′o rε2} ≥ 1 and (A.183) (for the given (ε1, ε2)) is satisfied. Then for n ≥ n′o,

Pr{Sn ≥ n(γ′(r) − ε1)} ≥ (1 − δ)e^{nrε2} exp[−n(rγ′(r) + 2rε2 − γ(r))]
                       ≥ exp[−n(rγ′(r) + 2rε2 − γ(r))]
                       = exp[−n(rγ′(r) + rε′2 − γ(r))],   (A.184)

where ε′2 = 2ε2. Since ε2 > 0 is arbitrary, ε′2 > 0 is also arbitrary, with the corresponding
change to n′o.
c) For any r ∈ (0, r+), let δ1 be an arbitrary number in (0, r+ − r), let r1 = r + δ1, and let ε1 = γ′(r1) − γ′(r).
Show that there is an m such that for all n ≥ m,

Pr{Sn ≥ nγ′(r)} ≥ exp[−n((r + δ1)γ′(r + δ1) + (r + δ1)ε2 − γ(r + δ1))].   (A.185)

Using the continuity of γ and its derivatives, show that for any ε > 0, there is a δ1 > 0 so that the right side
of (A.185) is greater than or equal to exp[−n(rγ′(r) − γ(r) + rε)].

Solution: Since γ′(r) is increasing in r, ε1 = γ′(r1) − γ′(r) > 0. Now (A.184) applies for
arbitrary r ∈ (0, r+) and ε1, ε2 (using the appropriate no for those parameters). Thus we
can apply (A.184) to r1 and ε1 as defined above and with ε2 replacing ε′2:

Pr{Sn ≥ n(γ′(r1) − ε1)} ≥ exp[−n(r1γ′(r1) + r1ε2 − γ(r1))].

Replacing γ′(r1) − ε1 on the left with γ′(r) and r1 on the right by r + δ1, we get (A.185).
This is valid for n ≥ m where m is the value of n′o for these modifications of r, ε1, and ε2.

Let ε > 0 be arbitrary. Since rγ′(r) is continuous in r, we see that for all small enough
δ1, (r + δ1)γ′(r + δ1) ≤ rγ′(r) + rε/3. Similarly, γ(r + δ1) ≥ γ(r) − rε/3 for all small
enough δ1. Finally, for any such small enough δ1, we can choose ε2 > 0 small enough that
(r + δ1)ε2 ≤ rε/3. Substituting these inequalities into (A.185), there is an m such that for
n ≥ m,

Pr{Sn ≥ nγ′(r)} ≥ exp[−n(rγ′(r) + rε − γ(r))],
thus completing the final details of the proof of Theorem 9.3.3.
Exercise 9.10: In this exercise, we show that the optimized Chernoff bound is tight for the 3rd case in
(9.12) as well as the first case. That is, we show that if r+ < ∞ and a > sup_{r<r+} γ′(r), then for any ε > 0,
Pr{Sn ≥ na} ≥ exp{n[γ(r+) − r+a − ε]} for all large enough n.

a) Let Yi be the truncated version of Xi, truncated for some given b to Yi = Xi for Xi ≤ b and Yi = b
otherwise. Let Wn = Y1 + · · · + Yn. Show that Pr{Sn ≥ na} ≥ Pr{Wn ≥ na}.

Solution: Since Xi − Yi is a nonnegative rv for each i, Sn − Wn is also nonnegative. Thus
all sample points satisfying Wn ≥ na also satisfy Sn ≥ na, so Pr{Wn ≥ na} ≤ Pr{Sn ≥ na}.
b) Let gb(r) be the MGF of Y. Show that gb(r) < ∞ and that gb(r) is non-decreasing in b for all r < ∞.
Solution: Assuming a PMF for notational simplicity, we have

gb(r) = Σ_{x≤b} pX(x)e^{rx} + Σ_{x>b} pX(x)e^{rb} ≤ Σ_x pX(x)e^{rb} = e^{rb} < ∞.

If b is replaced by b1 > b, the terms above for x ≤ b are unchanged and the terms with
x > b are increased from pX(x)e^{rb} to either pX(x)e^{rx} or pX(x)e^{rb1}, increasing the sum. For
those who prefer calculus to logic, ∂gb(r)/∂b = re^{br} Pr{X > b}, which is nonnegative.
c) Show that lim_{b→∞} gb(r) = ∞ for all r > r+ and that lim_{b→∞} gb(r) = g(r) for all r ≤ r+.

Solution: For r > r+, we have g(r) = ∞ (by definition of r+). Also gb(r) ≥ Σ_{x≤b} pX(x)e^{rx},
which goes to g(r) = ∞ in the limit b → ∞. For r ≤ r+, we have Σ_{x≤b} pX(x)e^{rx} ≤ gb(r) ≤
g(r), so lim_{b→∞} gb(r) = g(r).
d) Let γb(r) = ln gb(r). Show that γb(r) < ∞ for all r < ∞. Also show that lim_{b→∞} γb(r) = ∞ for r > r+
and lim_{b→∞} γb(r) = γ(r) for r ≤ r+. Hint: Use (b) and (c).

Solution: From (b), gb(r) < ∞, from which it follows that γb(r) = ln gb(r) < ∞. Also,
from (c), lim_{b→∞} gb(r) = ∞ for r > r+, from which it follows that lim_{b→∞} γb(r) = ∞ for
r > r+. Finally, since lim_{b→∞} gb(r) = g(r) for r ≤ r+, it follows that lim_{b→∞} γb(r) = γ(r)
for r ≤ r+.
e) Let γ′b(r) = ∂γb(r)/∂r and let δ > 0 be arbitrary. Show that for all large enough b,
γ′b(r+ + δ) > a. Hint: First show that γ′b(r+ + δ) ≥ [γb(r+ + δ) − γb(r+)]/δ.

Solution: This is the first part of the exercise where we must consider the situation in the
problem statement, where we are looking at the case r+ < ∞ and a > sup_{r<r+} γ′(r). This
implies that sup_{r<r+} γ′(r) < ∞, and, as shown in Exercise 9.8 (e), this implies that g(r+) < ∞
and thus γ(r+) < ∞.

Since γ′b(r) is the slope of γb(r) and is non-decreasing in r, we have

γ′b(r+ + δ) ≥ [γb(r+ + δ) − γb(r+)]/δ ≥ [γb(r+ + δ) − γ(r+)]/δ,

where in the final step we used the fact that γb(r+) ≤ γ(r+). Since γ(r+) < ∞ and
lim_{b→∞} γb(r+ + δ) = ∞, we see that γ′b(r+ + δ) is unbounded with increasing b, and thus
exceeds a for large enough b.
f) Show that the optimized Chernoff bound for Pr{Wn ≥ na} is exponentially tight for the values of b in
(e). Show that the optimizing r is less than r+ + δ.

Solution: For each b < ∞, the MGF of Wn is finite for all r, 0 ≤ r < ∞. Thus, from
Theorem 9.3.3, the result is exponentially tight if a = γ′b(r) for some r < ∞. For any b such
that γ′b(r+ + δ) > a, the continuity of γ′b(r) in r and the fact that γ′b(0) ≤ X̄ < a show that
such an r must exist. Since γ′b(r+ + δ) > a, the optimizing r is less than r+ + δ.
g) Show that for any ε > 0 and all sufficiently large b,

γb(r) − ra ≥ γ(r) − ra − ε ≥ γ(r+) − r+a − ε   for 0 < r ≤ r+.

Hint: Show that the convergence of γb(r) to γ(r) is uniform in r for 0 < r < r+.

Solution: Since g(r+) < ∞, it can be seen that g(r) is uniformly continuous in r over
0 ≤ r ≤ r+. It then follows that gb(r) must converge to g(r) uniformly in r for 0 ≤ r ≤ r+,
and the uniform convergence of γb(r) follows from this. For any ε > 0, and for large enough
b, this means that γ(r) − γb(r) ≤ ε for all r ∈ (0, r+). This is equivalent to the first part of
the desired statement. The second part follows from the fact that γ′(r) is increasing in r
and thus a > sup_{r<r+} γ′(r) ≥ γ′(r) for 0 ≤ r ≤ r+. This implies that γ(r) − ar is decreasing
in r over 0 ≤ r ≤ r+ and is thus lower bounded, over 0 ≤ r ≤ r+, by its value at r+.
h) Show that for arbitrary ε > 0, there exists δ > 0 and bo such that

γb(r) − ra ≥ γ(r+) − r+a − ε   for r+ < r < r+ + δ and b ≥ bo.

Solution: Define γ′(r+) as sup_{r<r+} γ′(r). For the given ε > 0, choose
δ = min{1, ε/[3(a − γ′(r+))]}, and choose bo large enough that, for all b ≥ bo, both
γb(r+) ≥ γ(r+) − ε/3 and γ′b(r+) ≥ γ′(r+) − ε/3. The latter choice is possible since γ′b(r+)
is non-decreasing in b with lim_{b→∞} γ′b(r+) = γ′(r+); note also that γ′b(r+) ≤ γ′(r+) < a.
Since γb(r) is convex in r, we then have, for r+ < r < r+ + δ and b ≥ bo,

γb(r) − ra ≥ γb(r+) + (r − r+)γ′b(r+) − ra
           = γb(r+) − r+a + (r − r+)[γ′b(r+) − a]
           ≥ γ(r+) − ε/3 − r+a + δ[γ′b(r+) − a]
           ≥ γ(r+) − r+a − ε/3 + δ[γ′(r+) − a] − δε/3
           ≥ γ(r+) − r+a − ε/3 − ε/3 − ε/3 = γ(r+) − r+a − ε,

where the third step uses γ′b(r+) − a < 0 and r − r+ < δ, the fourth uses γ′b(r+) ≥ γ′(r+) − ε/3,
and the last uses δ[a − γ′(r+)] ≤ ε/3 together with δ ≤ 1.

i) Note that if we put (g), (h), and (d) together, and use the δ of (h) in (d), then we have shown that the
optimized exponent in the Chernoff bound for Pr{Wn ≥ na} satisfies µb(a) ≥ γ(r+) − r+a − ε for sufficiently
large b. Show that this means that Pr{Sn ≥ na} ≥ exp[n(γ(r+) − r+a − 2ε)] for sufficiently large n.

Solution: Theorem 9.3.3 applies to {Pr{Wn ≥ na}; n ≥ 1}, so for sufficiently large n and
large enough b,

Pr{Wn ≥ na} ≥ exp[n(µb(a) − ε)] ≥ exp[n(γ(r+) − r+a − 2ε)].

Using the result of (a) completes the argument.
Exercise 9.11: Assume that X is discrete, with possible values {ai; i ≥ 1} and probabilities Pr{X = ai} =
pi. Let Xr be the corresponding tilted random variable as defined in Section 9.3.2. Let Sn = X1 + · · · + Xn
be the sum of n IID rv's with the distribution of X, and let Sn,r = X1,r + · · · + Xn,r be the sum of n IID
tilted rv's with the distribution of Xr. Assume that X̄ < 0 and that r > 0 is such that γ(r) exists.

a) Show that Pr{Sn,r = s} = Pr{Sn = s} exp[sr − nγ(r)]. Hint: Read the text.

Solution: Since Xr is the tilted random variable of X, it satisfies

Pr{Xr = ai} = Pr{X = ai} exp[ai r − γ(r)].
Let v1, . . . , vn be any set of values drawn from the set {ai} of possible values for X. Let
s = v1 + v2 + · · · + vn. For any such set, since {Xi,r} is a set of IID variables, each with
PMF pXi,r(vi) = pX(vi) exp[vi r − γ(r)],

Pr{X1,r = v1, . . . , Xn,r = vn} = Pr{X1 = v1, . . . , Xn = vn} exp[sr − nγ(r)].

Let V(s) be the set of all vectors v = (v1, . . . , vn) such that v1 + v2 + · · · + vn = s. Then

Pr{Sn,r = s} = Σ_{v∈V(s)} Pr{X1,r = v1, . . . , Xn,r = vn}
             = Σ_{v∈V(s)} Pr{X1 = v1, . . . , Xn = vn} exp[sr − nγ(r)]
             = Pr{Sn = s} exp[sr − nγ(r)].
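The identity of (a) can be verified exactly by brute-force enumeration for a small integer-valued PMF. The PMF below (chosen with X̄ < 0, as assumed) and the values of r and n are illustrative choices.

```python
import math
from itertools import product

# Verify Pr{S_{n,r}=s} = Pr{S_n=s} exp[sr - n*gamma(r)] by enumeration.
vals  = [-2, -1, 1]          # arbitrary example PMF with negative mean
probs = [0.5, 0.3, 0.2]
r, n = 0.3, 3
g = sum(p * math.exp(r * v) for v, p in zip(vals, probs))
gamma = math.log(g)
tilted = [p * math.exp(r * v - gamma) for v, p in zip(vals, probs)]

def pmf_of_sum(ps):
    """Exact PMF of the sum of n IID draws with per-value probabilities ps."""
    out = {}
    for combo in product(range(len(vals)), repeat=n):
        s = sum(vals[i] for i in combo)
        out[s] = out.get(s, 0.0) + math.prod(ps[i] for i in combo)
    return out

pS, pSr = pmf_of_sum(probs), pmf_of_sum(tilted)
ok = all(abs(pSr[s] - pS[s] * math.exp(s * r - n * gamma)) < 1e-12 for s in pS)
print(ok)   # True
```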
b) Find the mean and variance of Sn,r in terms of γ(r).

Solution: E[Sn,r] = nE[Xr] = nγ′(r) and VAR[Sn,r] = nVAR[Xr] = nγ″(r), as shown in
Exercise 9.6.
c) Define a = γ′(r) and σr² = γ″(r). Show that Pr{|Sn,r − na| ≤ √(2n) σr} > 1/2. Use this to show that

Pr{|Sn − na| ≤ √(2n) σr} > (1/2) exp[−r(an + √(2n) σr) + nγ(r)].   (i)

Solution: Assuming σr > 0 and applying the Chebyshev inequality, Pr{|Y − Ȳ| ≥ ε} ≤
VAR[Y]/ε², to Sn,r,

Pr{|Sn,r − na| ≥ √(2n) σr} ≤ nσr²/(2nσr²) = 1/2.

The complementary event then satisfies

Pr{|Sn,r − na| < √(2n) σr} ≥ 1/2.

The strictness of the inequalities is not important here. We will use the result in this form,
but if one traces out the case of equality in the Chebyshev bound, one also gets the form
in the problem statement. Combining this with (a),

1/2 ≤ Pr{|Sn,r − na| < √(2n) σr}
    = Σ_{s: |s−na|<√(2n)σr} Pr{Sn,r = s}
    = Σ_{s: |s−na|<√(2n)σr} Pr{Sn = s} exp[sr − nγ(r)]
    < Σ_{s: |s−na|<√(2n)σr} Pr{Sn = s} exp[nar + √(2n) rσr − nγ(r)]
    = Pr{|Sn − na| < √(2n) σr} exp[nar + √(2n) rσr − nγ(r)].

In the third step, we used part (a); in the fourth step, we recognized that s < na + √(2n) σr
over the terms in the sum; and in the fifth step, we rewrote the sum of terms as a single
probability. Rearranging terms, we have (i).
d) Use this to show that for any ε > 0 and for all sufficiently large n,

Pr{Sn ≥ n(γ′(r) − ε)} > (1/2) exp[−rn(γ′(r) + ε) + nγ(r)].

Solution: Note that the event {|Sn − na| < √(2n) σr} is contained in the event
{Sn > na − √(2n) σr}. Also, for any ε > 0, we can choose n0 such that nε > √(2n) σr for all
n ≥ n0. For all n ≥ n0, the event {Sn > na − √(2n) σr} is contained in the event
{Sn > na − nε} = {Sn > n(γ′(r) − ε)}. Thus

Pr{Sn > n(γ′(r) − ε)} > (1/2) exp[−r(na + √(2n) σr) + nγ(r)].

Also, for n ≥ n0, the right hand side of this equation can be further bounded, yielding

Pr{Sn > n(γ′(r) − ε)} > (1/2) exp[−rn(γ′(r) + ε) + nγ(r)].

Note that this exercise gives a slightly different proof from Theorem 9.3.3 that the Chernoff
bound is exponentially tight.
Exercise 9.12: Let p = (p1, . . . , pM) and p̃ = (p̃1, . . . , p̃M) be probability vectors. Show that the
divergence satisfies D(p̃‖p) ≥ 0 and is 0 if and only if p̃ = p.

Solution: Recall that

D(p̃‖p) = Σ_j p̃j ln(p̃j/pj).

If pj = 0 and p̃j > 0 for one or more j, this is infinite and by convention greater than 0.
Thus assume p and p̃ to be strictly positive. Using the inequality ln x ≤ x − 1 for x > 0,
we can upper bound −D(p̃‖p) by

−D(p̃‖p) = Σ_j p̃j ln(pj/p̃j) ≤ Σ_j p̃j [pj/p̃j − 1] = Σ_j pj − Σ_j p̃j = 0.   (A.186)

Since ln x = x − 1 if and only if x = 1, the inequality in (A.186) is satisfied with equality if
and only if p̃ = p.
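A minimal numeric sketch of this result, using arbitrary example probability vectors:

```python
import math

def D(p_tilde, p):
    """Divergence D(p_tilde || p) = sum_j p~_j ln(p~_j / p_j)."""
    return sum(q * math.log(q / pj) for q, pj in zip(p_tilde, p) if q > 0)

p = [0.5, 0.3, 0.2]
for q in ([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.1, 0.1, 0.8]):
    print(q, D(q, p))   # 0 only when q equals p, positive otherwise
```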
b) Let u = (u1, . . . , uM) satisfy Σ_j uj = 0 and show that p + εu is a probability vector over a sufficiently small
region of ε around 0. Show that

(∂/∂ε) D(p + εu‖p) |_{ε=0} = 0.

Solution: It is necessary to add the condition that pj > 0 for all j, since if pj = 0 then
pj + εuj < 0 for all ε < 0 when uj > 0 and for all ε > 0 when uj < 0. If all components of p
are positive, then pj + εuj is positive for all j for small enough |ε|. Note that Σ_j (pj + εuj) = 1,
so p + εu is a probability vector for small enough |ε|.

Next, taking the derivative,

(∂/∂ε) D(p + εu‖p) = (∂/∂ε) Σ_j (pj + εuj) ln[(pj + εuj)/pj] = Σ_j uj [ln((pj + εuj)/pj) + 1],

which is clearly 0 at ε = 0, since Σ_j uj = 0.
Viewing D(p + εu‖p) geometrically as a function in R^M of p + εu, this says that the function
is flat at p in each probability vector direction.
c) Show that

(∂²/∂ε²) D(p̃ + εu‖p) ≥ 0,

where p̃ + εu is a probability vector with non-zero components. Explain why this implies that D is convex
in p̃ and nonnegative over the region of probability vectors.

Solution: Taking the second derivative with respect to ε, and here allowing ε to be positive
or negative,

(∂²/∂ε²) D(p̃ + εu‖p) = Σ_j uj²/(p̃j + εuj) ≥ 0

for |ε| sufficiently small.

This shows that for fixed positive probability vectors p̃, p, and arbitrary u, D(p̃ + εu‖p)
is a convex function of ε. This says that the function D over any straight-line probability
vector direction from p̃ is convex, and this implies that D is convex over the convex region
where p̃ is a probability vector.
Exercise 9.13: Consider a random walk {Sn; n ≥ 1} where Sn = X1 + · · · + Xn and {Xi; i ≥ 1} is a
sequence of IID exponential rv's with the PDF f(x) = λe^{−λx} for x ≥ 0. In other words, the random walk is
the sequence of arrival epochs in a Poisson process.

a) Show that for aλ > 1, the optimized Chernoff bound for Pr{Sn ≥ na} is given by

Pr{Sn ≥ na} ≤ (aλ)^n e^{−n(aλ − 1)}.

Solution: The moment generating function is g(r) = E[e^{rX}] = λ/(λ − r) for r < λ. Thus
γ(r) = ln g(r) = ln(λ/(λ − r)) and γ′(r) = 1/(λ − r). The optimizing r for the Chernoff
bound is then the solution to a = 1/(λ − r), which is r = λ − 1/a. Using this r in the
Chernoff bound,

Pr{Sn ≥ na} ≤ exp[n ln(λ/(λ − r)) − nra] = exp[n ln(aλ) − n(aλ − 1)],

which is equivalent to the desired expression.
b) Show that the exact value of Pr{Sn ≥ na} is given by

Pr{Sn ≥ na} = Σ_{i=0}^{n−1} (naλ)^i e^{−naλ}/i!.   (A.187)

Solution: For a Poisson counting process {N(t); t > 0}, the event {Sn > na} is the same
as {N(na) < n} = ⋃_{i=0}^{n−1} {N(na) = i}. Since this is a union of disjoint events,

Pr{Sn > na} = Σ_{i=0}^{n−1} Pr{N(na) = i}.

Using the Poisson PMF, the right side of this is equal to the right side of (A.187). Since Sn
is continuous, Pr{Sn > na} = Pr{Sn ≥ na}.
c) By upper and lower bounding the quantity on the right of (A.187), show that

(naλ)^n e^{−naλ}/(n! aλ) ≤ Pr{Sn ≥ na} ≤ (naλ)^n e^{−naλ}/[n!(aλ − 1)].

Hint: Use the term at i = n − 1 for the lower bound and note that the terms on the right can be bounded by
a geometric series starting at i = n − 1.

Solution: The lower bound on the left is the single term with i = n − 1 of the sum in
(A.187). For the upper bound, rewrite the sum in (b) as

Σ_{i=0}^{n−1} (naλ)^i e^{−naλ}/i! = [(naλ)^n e^{−naλ}/n!] [n/(naλ) + n(n−1)/(naλ)² + · · ·]
                                 ≤ [(naλ)^n e^{−naλ}/n!] [1/(aλ) + 1/(aλ)² + · · ·]
                                 = (naλ)^n e^{−naλ}/[n!(aλ − 1)].
d) Use the Stirling bounds on n! to show that

(aλ)^n e^{−n(aλ−1)} / [√(2πn) aλ exp(1/12n)] ≤ Pr{Sn ≥ na} ≤ (aλ)^n e^{−n(aλ−1)} / [√(2πn) (aλ − 1)].

Solution: The Stirling bounds are

√(2πn) (n/e)^n < n! < √(2πn) (n/e)^n e^{1/12n}.

Substituting these for n! in (c) and cancelling terms gives the desired expression. Note that
the Chernoff bound contains all the factors that vary exponentially with n. Note also that
the Erlang expression for Sn and the Poisson expression for N(t) are quite simple, but the
corresponding CDF's are quite messy, and this makes the Chernoff bound more attractive
in this case.
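The chain of bounds in (a) to (d) can be checked numerically; λ = 1 and a = 2 below are illustrative values satisfying aλ > 1.

```python
import math

# Exact tail (A.187), Chernoff bound from (a), and Stirling-based bounds from (d)
lam, a = 1.0, 2.0   # illustrative values with a*lam > 1
for n in (5, 20, 80):
    exact = sum((n * a * lam) ** i * math.exp(-n * a * lam) / math.factorial(i)
                for i in range(n))
    chernoff = (a * lam) ** n * math.exp(-n * (a * lam - 1))
    upper = chernoff / (math.sqrt(2 * math.pi * n) * (a * lam - 1))
    lower = chernoff / (math.sqrt(2 * math.pi * n) * a * lam * math.exp(1 / (12 * n)))
    print(n, lower <= exact <= min(upper, chernoff))
```

The exact tail sits between the two Stirling-based bounds, while the Chernoff bound alone captures the exponential rate.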
Exercise 9.14: Consider a random walk with thresholds α > 0, β < 0. We wish to find Pr{SJ ≥ α} in
the absence of a lower threshold. Use the upper bound in (9.46) for the probability that the random walk
crosses α before β.

a) Given that the random walk crosses β first, find an upper bound to the probability that α is now crossed
before a yet lower threshold at 2β is crossed.

Solution: Let J1 be the stopping trial at which the walk first crosses either α or β. Let J2
be the stopping trial at which the random walk first crosses either α or 2β (assuming the
random walk continues forever rather than actually stopping at any stopping trial). Note
that if SJ1 ≥ α, then SJ2 = SJ1, but if SJ1 ≤ β, then it is still possible to have SJ2 ≥ α.
In order for this to happen, a random walk starting at trial J1 must reach a threshold of
α − SJ1 before reaching 2β − SJ1. Putting this into equations,

Pr{SJ2 ≥ α} = Pr{SJ1 ≥ α} + Pr{SJ2 ≥ α | SJ1 ≤ β} Pr{SJ1 ≤ β}
            ≤ Pr{SJ1 ≥ α} + Pr{SJ2 ≥ α | SJ1 ≤ β}.

The second term above upper bounds the probability that the RW starting at trial J1
reaches α − SJ1 before 2β − SJ1, given that SJ1 ≤ β. Since α − SJ1 ≥ α − β,

Pr{SJ2 ≥ α | SJ1 ≤ β} ≤ exp[−r*(α − β)].

Thus,

Pr{SJ2 ≥ α} ≤ exp(−r*α) + exp[−r*(α − β)].

Note that it is conceivable that SJ1 = SJ2, i.e., that β and 2β are crossed at the same time.
The argument above includes this case, and no further note about it is necessary.
b) Given that 2β is crossed before α, upper bound the probability that α is crossed before a threshold at
3β. Extending this argument to successively lower thresholds, find an upper bound to each successive term,
and find an upper bound on the overall probability that α is crossed. By observing that β is arbitrary, show
that (9.46) is valid with no lower threshold.

Solution: Let Jk for each k ≥ 1 be the stopping trial at which α or kβ is crossed. By the
same argument as above,

Pr{SJk+1 ≥ α} = Pr{SJk ≥ α} + Pr{SJk+1 ≥ α | SJk ≤ kβ} Pr{SJk ≤ kβ}
              ≤ Pr{SJk ≥ α} + exp[−r*(α − kβ)].

Finally, let J∞ be the defective stopping time at which α is first crossed. We see from above
that the event SJ∞ ≥ α is the union of the events SJk ≥ α over all k ≥ 1. We can upper
bound its probability by

Pr{SJ∞ ≥ α} ≤ Pr{SJ1 ≥ α} + Σ_{k=1}^∞ Pr{SJk+1 ≥ α | SJk ≤ kβ} ≤ exp[−r*α] / (1 − exp[r*β]).

Since this is true for all β < 0, it is valid in the limit β → −∞, yielding e^{−r*α}.

The reason why we did not simply take the limit β → −∞ in the first place is that such
a limit would not define a non-defective stopping rule. The approach here is to define the
limit as a union of non-defective stopping rules.
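For a simple random walk with integer α, the bound e^{−r*α} is in fact met with equality, since there is no overshoot. The sketch below checks this for a walk with X = +1 w.p. p and X = −1 w.p. 1 − p (p = 0.3 and α = 5 are illustrative); here r* = ln[(1 − p)/p] solves g(r) = 1, and the exact probability of ever reaching α is the standard simple-random-walk result (p/(1 − p))^α.

```python
import math

# Simple random walk: X = +1 w.p. p, -1 w.p. 1-p, with p < 1/2 (negative drift)
p, alpha = 0.3, 5                       # illustrative choices
rstar = math.log((1 - p) / p)           # positive root of ln g(r) = 0
exact = (p / (1 - p)) ** alpha          # Pr{walk ever reaches alpha}
bound = math.exp(-rstar * alpha)
print(exact, bound)   # identical: no overshoot for a unit-step walk
```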
Exercise 9.15: This exercise verifies that Corollary 9.4.4 holds in the situation where γ(r+) < 0 (see
Figure 9.4). We have defined r* = r+ in this case.

a) Use the Wald identity at r = r+ to show that

Pr{SJ ≥ α} E[exp(r+SJ − Jγ(r+)) | SJ ≥ α] ≤ 1.   (A.188)

Hint: Look at the first part of the proof of Corollary 9.4.4.

Solution: Starting with the Wald identity expressed for two thresholds,

1 = Pr{SJ ≥ α} E[exp(r+SJ − Jγ(r+)) | SJ ≥ α] + Pr{SJ ≤ β} E[exp(r+SJ − Jγ(r+)) | SJ ≤ β]
  ≥ Pr{SJ ≥ α} E[exp(r+SJ − Jγ(r+)) | SJ ≥ α],

which is the same as (A.188).
b) Show that

Pr{SJ ≥ α} exp[r+α − γ(r+)] ≤ 1.   (A.189)

Solution: Since the expectation is conditional on SJ ≥ α, we can lower bound SJ by α in
the expectation. Also, since γ(r+) < 0, the term −Jγ(r+) can be lower bounded by −γ(r+)
by lower bounding J by 1. Thus

1 ≥ Pr{SJ ≥ α} E[exp(r+SJ − Jγ(r+)) | SJ ≥ α]
  ≥ Pr{SJ ≥ α} E[exp(r+α − γ(r+)) | SJ ≥ α]
  = Pr{SJ ≥ α} exp(r+α − γ(r+)).
c) Show that

Pr{SJ ≥ α} ≤ exp[−r+α + γ(r+)]   and that   Pr{SJ ≥ α} ≤ exp[−r*α].

Note that the first bound above is a little stronger than the second, and note that lower bounding J by 1
is not necessarily a very weak bound, since this is the case where, if α is crossed, it tends to be crossed for
small J.

Solution: The first bound is simply a rearrangement of (A.189) and the second follows
from the first since γ(r+) is negative by assumption.
Exercise 9.16: a) Use Wald's equality to show that if X̄ = 0, then E[SJ] = 0, where J is the time of
threshold crossing with one threshold at α > 0 and another at β < 0.

Solution: Wald's equality holds since E[|J|] < ∞, which follows from Lemma 9.4.1, which
says that J has an MGF in an interval around 0. Thus E[SJ] = X̄ E[J]. Since X̄ = 0, it
follows that E[SJ] = 0.

b) Obtain an expression for Pr{SJ ≥ α}. Your expression should involve the expected value of SJ conditional
on crossing the individual thresholds (you need not try to calculate these expected values).

Solution: Writing out E[SJ] = 0 in terms of conditional expectations,

E[SJ] = Pr{SJ ≥ α} E[SJ | SJ ≥ α] + Pr{SJ ≤ β} E[SJ | SJ ≤ β]
      = Pr{SJ ≥ α} E[SJ | SJ ≥ α] + [1 − Pr{SJ ≥ α}] E[SJ | SJ ≤ β].

Using E[SJ] = 0, we can solve this for Pr{SJ ≥ α}:

Pr{SJ ≥ α} = E[−SJ | SJ ≤ β] / (E[−SJ | SJ ≤ β] + E[SJ | SJ ≥ α]).
c) Evaluate your expression for the case of a simple random walk.
Solution: A simple random walk moves up or down only by unit steps. Thus if α and β are
integers, there can be no overshoot when a threshold is crossed. Thus E[SJ | SJ ≥ α] = α
and E[SJ | SJ ≤ β] = β, so Pr{SJ ≥ α} = |β|/(|β| + α). If α is non-integer, then a positive
threshold crossing occurs at ⌈α⌉ and a lower threshold crossing at ⌊β⌋. Thus, in this general
case, Pr{SJ ≥ α} = |⌊β⌋|/(|⌊β⌋| + ⌈α⌉).
d) Evaluate your expression when X has an exponential density, fX(x) = a1 e^{−λx} for x ≥ 0 and
fX(x) = a2 e^{µx} for x < 0, and where a1 and a2 are chosen so that X̄ = 0.

Solution: Let us condition on J = n, Sn ≥ α, and Sn−1 = s, for s < α. The overshoot,
V = SJ − α, is then V = Xn + s − α. Because of the memoryless property of the exponential,
the density of V, conditioned as above, is exponential and fV(v) = λe^{−λv} for v ≥ 0. This
does not depend on n or s, and is thus the overshoot density conditioned only on SJ ≥ α.
Thus E[SJ | SJ ≥ α] = α + 1/λ. In the same way, E[SJ | SJ ≤ β] = β − 1/µ. Thus

Pr{SJ ≥ α} = (|β| + µ^{−1}) / (α + λ^{−1} + |β| + µ^{−1}).

Note that it is not necessary to calculate a1 or a2.
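The integer-threshold result of (c) is easy to confirm by simulation; the thresholds and trial count below are illustrative choices.

```python
import random

# Monte Carlo check of Pr{S_J >= alpha} = |beta|/(|beta| + alpha) for the
# zero-mean simple random walk (X = +/-1 equiprobable), integer thresholds.
random.seed(2)
alpha, beta = 3, -5   # illustrative thresholds

def crosses_upper():
    s = 0
    while beta < s < alpha:
        s += random.choice((1, -1))
    return s >= alpha

trials = 20000
est = sum(crosses_upper() for _ in range(trials)) / trials
print(est, abs(beta) / (abs(beta) + alpha))   # both near 5/8 = 0.625
```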
Exercise 9.17: A random walk {Sn; n ≥ 1}, with Sn = Σ_{i=1}^n Xi, has the following probability density
for Xi:

fX(x) = e^{−x}/(e − e^{−1})   for −1 ≤ x ≤ 1;   fX(x) = 0 elsewhere.
a) Find the values of r for which g(r) = E[exp(rX)] = 1.

Solution: Evaluating g(r),

g(r) = ∫_{−1}^{1} e^{rx} e^{−x}/(e − e^{−1}) dx = [e^{r−1} − e^{−(r−1)}] / [(r − 1)(e − e^{−1})].

One solution to g(r) = 1 (for any rv) is r = 0, and this is easily verified for this special
case. To find the other solution, one can either plot the function g(r) numerically, or one
can observe that the function is symmetric around r = 1. Either way, it can be verified
that the other solution is r = 2.
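Both roots can be confirmed numerically. The sketch below checks g(2) = 1 and recovers the nonzero root by bisection; the bracket [1.5, 3] is an arbitrary choice containing the root.

```python
import math

# g(r) for the density f(x) = e^{-x}/(e - e^{-1}) on [-1, 1]
def g(r):
    if abs(r - 1) < 1e-12:                       # removable singularity at r = 1
        return 2 / (math.e - 1 / math.e)
    return (math.exp(r - 1) - math.exp(-(r - 1))) / ((r - 1) * (math.e - 1 / math.e))

print(abs(g(2.0) - 1.0) < 1e-12)   # True

lo, hi = 1.5, 3.0                  # g(1.5) < 1 < g(3); bisect for the root
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(mid) < 1 else (lo, mid)
print(round((lo + hi) / 2, 6))     # 2.0
```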
b) Let Pα be the probability that the random walk ever crosses a threshold at α for some α > 0. Find an
upper bound to Pα of the form Pα ≤ e^{−αA}, where A is a constant that does not depend on α; evaluate A.

Solution: Using Corollary 9.4.4 (which is the major result for upper bounding the probability of a threshold crossing for a random walk), we see that Pr{SJ ≥ α} ≤ exp(−r*α).
Here J is the stopping rule that stops on the first crossing of a positive threshold at α or
a negative threshold at β. The negative threshold can be set at −∞, as can be seen by
going to the limit β → −∞, or by doing Exercise 9.14, or by using Theorem 9.8.5. The
parameter r* here is the positive root of ln g(r), which, from (a), is r* = 2. Thus

Pα ≤ exp(−2α).
c) Find a lower bound to Pα of the form Pα ≥ Be^{−αA}, where A is the same as in (b) and B is a constant
that does not depend on α. Hint: keep it simple — you are not expected to find an elaborate bound. Also
recall that E[e^{r*SJ}] = 1 where J is a stopping trial for the random walk and g(r*) = 1.

Solution: Wald's identity, with r = r*, is E[exp(r*SJ)] = 1, which is evaluated as

1 = Pα E[exp(r*SJ) | SJ ≥ α] + (1 − Pα) E[exp(r*SJ) | SJ ≤ β]
  ≤ Pα E[exp(r*SJ) | SJ ≥ α] + (1 − Pα) e^{r*β}.

The second term approaches 0 as β → −∞, so the limiting value of Pα as the lower threshold
approaches −∞ is

Pα ≥ (E[exp(r*SJ) | SJ ≥ α])^{−1} ≥ (exp[2(α + 1)])^{−1} = (1/e)² e^{−2α}.

The second inequality above follows since |X| ≤ 1 and therefore the overshoot in crossing
α is at most 1, i.e., r*SJ ≤ r*(α + 1) and r* = 2.
Exercise 9.18: Let {Xn; n ≥ 1} be a sequence of IID integer-valued random variables with the probability mass function pX(k) = Qk. Assume that Qk > 0 for |k| ≤ 10 and Qk = 0 for |k| > 10. Let {Sn; n ≥ 1}
be a random walk with Sn = X1 + · · · + Xn. Let α > 0 and β < 0 be integer-valued thresholds, let J be the
smallest value of n for which either Sn ≥ α or Sn ≤ β. Let {Sn*; n ≥ 1} be the stopped random walk; i.e.,
Sn* = Sn for n ≤ J and Sn* = SJ for n > J. Let πi* = Pr{SJ = i}.

a) Consider a Markov chain in which this stopped random walk is run repeatedly until the point of stopping.
That is, the Markov chain transition probabilities are given by Pij = Q_{j−i} for β < i < α, and Pi0 = 1 for
i ≤ β and i ≥ α. All other transition probabilities are 0 and the set of states is the set of integers [−9 + β, 9 + α].
Show that this Markov chain is ergodic.

Solution: The number of states in this repeating Markov chain is finite, so we need only
show that there is a single class of states and it is aperiodic. All states clearly communicate
with each other, so there is a single class of states and it is recurrent. Also P00 = Q0 > 0, so as
shown in Exercise 4.1, the chain is aperiodic.
b) Let {π_i} be the set of steady-state probabilities for this Markov chain. Find the set of probabilities {π*_i} for the stopping states of the stopped random walk in terms of {π_i}.

Solution: The steady-state probabilities {π_i} are rather a mess, even for given values of {Q_i}. We specified X with so many (21) possible values to discourage you from spending much time on finding the π_i. Fortunately, finding {π*_i} from {π_i} is fairly easy. The repeating Markov process can be viewed as a (delayed) renewal process, with each visit to any of the stopping states taken as a renewal. The expected time between renewals is then

$$E[T] = \frac{1}{\sum_{i=\beta-9}^{\beta}\pi_i + \sum_{i=\alpha}^{\alpha+9}\pi_i}.$$

The probabilities of the stopping values in the random walk are then π*_i = π_i E[T] for β−9 ≤ i ≤ β and α ≤ i ≤ α+9.

Note that we cannot use visits to state 0 as renewals here, since visits to state 0 occur both on transitions from the stopping states and on transitions from states neighboring state 0.
c) Find E[S_J] and E[J] in terms of {π_i}.

Solution: A renewal requires J steps to cross a threshold, and the time to the next renewal requires an additional step to return to 0. Thus E[J] = E[T] − 1. The expected stopping value, E[S_J], is

$$E[S_J] = \sum_{i=\beta-9}^{\beta} i\pi_i^* + \sum_{i=\alpha}^{\alpha+9} i\pi_i^* = E[T]\left[\sum_{i=\beta-9}^{\beta} i\pi_i + \sum_{i=\alpha}^{\alpha+9} i\pi_i\right].$$

The Wald equality could also be used to find E[S_J] as X̄ E[J] (where X̄ = Σ_k kQ_k), but then X̄ would be required in the solution.
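A scaled-down numerical illustration of (b) and (c) makes the renewal relations concrete. The sketch below is our own check, with jumps limited to |k| ≤ 2 instead of |k| ≤ 10 so that the chain stays small; the PMF Q is an assumed example, not taken from the exercise. It solves for {π_i}, forms π*_i = π_i E[T], and confirms E[J] = E[T] − 1 against the Wald equality E[S_J] = X̄ E[J].

```python
# Scaled-down analogue: Q_k > 0 only for |k| <= 2, thresholds alpha, beta.
Q = {-2: 0.30, -1: 0.25, 0: 0.15, 1: 0.20, 2: 0.10}   # assumed example PMF
alpha, beta = 3, -3
states = list(range(beta - 1, alpha + 2))      # overshoot here is at most 1

P = {s: {} for s in states}
for s in states:
    if beta < s < alpha:                       # interior: random-walk step
        for k, q in Q.items():
            P[s][s + k] = P[s].get(s + k, 0.0) + q
    else:                                      # stopping state: back to 0
        P[s][0] = 1.0

# Steady-state probabilities pi_i by power iteration.
pi = {s: 1.0 / len(states) for s in states}
for _ in range(10_000):
    new = {s: 0.0 for s in states}
    for s, row in P.items():
        for t, q in row.items():
            new[t] += pi[s] * q
    pi = new

stop = [s for s in states if s <= beta or s >= alpha]
E_T = 1.0 / sum(pi[s] for s in stop)           # mean time between renewals
pi_star = {s: pi[s] * E_T for s in stop}       # PMF of the stopping value
E_J = E_T - 1.0                                # the return to 0 costs 1 step
E_SJ = sum(s * w for s, w in pi_star.items())
Xbar = sum(k * q for k, q in Q.items())        # mean step of the walk
print(E_J, E_SJ, Xbar * E_J)                   # last two agree (Wald)
```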
Exercise 9.19: a) Conditional on X=0 for the hypothesis testing problem, consider the random variables Z_i = ln[f_{Y_i|X}(y_i|1)/f_{Y_i|X}(y_i|0)]. Show that r*, the positive solution to g_0(r) = 1, where g_0(r) = E[exp(rZ) | X=0], is given by r* = 1.

Solution: First assume that Y and Z are one-dimensional. Then

$$g_0(r) = E[\exp(rZ)\mid X=0] = \int_{-\infty}^{\infty} f_{Y|X}(y|0)\exp\left[r\ln\frac{f_{Y|X}(y|1)}{f_{Y|X}(y|0)}\right]dy = \int_{-\infty}^{\infty} f_{Y|X}^{\,1-r}(y|0)\, f_{Y|X}^{\,r}(y|1)\,dy = 1 \quad\text{for } r=1.$$

Since g_0(r) is convex in r over the region where it exists, g_0(r) = 1 can have only two roots, one of them at r = 0 and, as just shown, the other at r = 1. For the finite-dimensional case, it is simply necessary to replace the one-dimensional integral with a multi-dimensional integral; there is no requirement that Y_1, ..., Y_n be independent conditional on X.
b) Assuming that Y is a discrete random variable (under each hypothesis), show that the tilted random variable Z_r with r = 1 has the PMF p_{Y|X}(y|1).

Solution: The assumption that Y and thus Z are conditional on X = 0 carries over from (a). Thus,

$$q_{Z_r|X}(z|0) = p_{Z|X}(z|0)\exp[rz - \gamma_0(r)] = p_{Y|X}(y|0)\exp\left[r\ln\frac{p_{Y|X}(y|1)}{p_{Y|X}(y|0)} - \gamma_0(r)\right] = p_{Y|X}(y|1) \quad\text{for } r = 1.$$
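Both facts are easy to check numerically for a toy discrete alphabet. The PMFs below are an assumed example of our own (any two strictly positive PMFs on the same alphabet would do); the check verifies that g_0(r) = 1 at both r = 0 and r = 1, and that the tilted PMF at r = 1 coincides with p_{Y|X}(·|1).

```python
# Assumed example PMFs for Y under the two hypotheses.
p0 = {0: 0.5, 1: 0.3, 2: 0.2}     # p_{Y|X}(y|0)
p1 = {0: 0.2, 1: 0.3, 2: 0.5}     # p_{Y|X}(y|1)

def g0(r):
    # g_0(r) = E[exp(rZ) | X=0] = sum_y p0(y)^{1-r} p1(y)^r
    return sum(p0[y] ** (1 - r) * p1[y] ** r for y in p0)

print(g0(0.0), g0(1.0))            # both roots of g_0(r) = 1

# Tilted PMF of part (b), written in terms of y:
# q(y) = p0(y) exp(r ln(p1(y)/p0(y))) / g0(r); at r = 1 this is exactly p1.
r = 1.0
tilted = {y: p0[y] * (p1[y] / p0[y]) ** r / g0(r) for y in p0}
print(tilted)
```

Convexity of g_0 shows up here too: at an intermediate r such as 1/2, g_0(r) dips strictly below 1 whenever the two PMFs differ.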
Exercise 9.20: (Proof of Stein's lemma) a) Use a power series expansion around r = 0 to show that

$$\gamma_0(r) - r\gamma_0'(r) = -\frac{r^2\gamma_0''(0)}{2} + o(r^2).$$

Solution: Assuming that the MGF exists in an open interval around 0, g(r) has derivatives of all orders at r = 0, and γ_0(r) = ln g(r) has the same. Then, using a second-order Taylor expansion at r = 0, γ_0(r) = γ_0(0) + rγ_0'(0) + r²γ_0''(0)/2 + o(r²). Using a first-order Taylor expansion for γ_0'(r) around r = 0, we get γ_0'(r) = γ_0'(0) + rγ_0''(0) + o(r). This shows that rγ_0'(r) = rγ_0'(0) + r²γ_0''(0) + o(r²). Combining these expansions, γ_0(r) − rγ_0'(r) = −r²γ_0''(0)/2 + o(r²). Since γ_0(0) = 0, this is the desired result.

This can also be derived (with considerably more effort) by applying the power series expansion of g(r) to γ_0(r) = ln(g(r)) and rγ_0'(r).
b) Choose r = n^{−1/3} in (9.61) and (9.62). Show that (9.61) can then be rewritten as

$$\Pr\{e(n) \mid X=0\} \le \exp\big\{n[-r^2\gamma_0''(0)/2 + o(r^2)]\big\} \qquad (A.190)$$
$$= \exp\big\{-n^{1/3}\gamma_0''(0)/2 + o(n^{1/3})\big\}. \qquad (A.191)$$

Thus Pr{e(n) | X=0} approaches 0 with increasing n exponentially in n^{1/3}.

Solution: Eq. (A.190) is simply the result of substituting the result of (a) into (9.61). Then (A.191) follows by choosing r = n^{−1/3}, so that nr² = n^{1/3}. The term o(n^{1/3}) is a function of n that satisfies lim_{n→∞} o(n^{1/3})/n^{1/3} = 0. In other words,

$$\lim_{n\to\infty} \frac{\ln \Pr\{e(n)\mid X=0\}}{n^{1/3}} \le -\frac{\gamma_0''(0)}{2}.$$
c) Using the same power series expansion for r in (9.62), show that

$$\Pr\{e(n)\mid X=1\} \le \exp\big\{n[-r^2\gamma_0''(0)/2 + \gamma_0'(0) + o(r)]\big\} = \exp\big\{n\gamma_0'(0) + o(n)\big\}.$$

Thus lim_n (1/n) ln Pr{e(n) | X=1} = γ_0'(0) = E[Z | X=0].

Solution: Using the expansion of (a) in (9.62), we get

$$\Pr\{e(n)\mid X=1\} \le \exp\big\{n[-r^2\gamma_0''(0)/2 + o(r^2) + \gamma_0'(r)]\big\}$$
$$= \exp\big\{n[-r^2\gamma_0''(0)/2 + o(r^2) + \gamma_0'(0) + r\gamma_0''(0) + o(r)]\big\}$$
$$= \exp\big\{n[\gamma_0'(0) + r\gamma_0''(0) + o(r)]\big\} \qquad (A.192)$$
$$= \exp\big\{n\gamma_0'(0) + o(n)\big\}, \qquad (A.193)$$

where in (A.192) we recognized that −r²γ_0''(0)/2 + o(r²) can be included in the o(r) term, and in (A.193) we substituted n^{−1/3} for r, as is necessary for the bounds on Pr{e(n) | X=0} and Pr{e(n) | X=1} to represent the same threshold rule and thus be jointly valid.
d) Explain why the bound in (9.62) is exponentially tight, given that Pr{e(n) | X=0} → 0 as n → ∞.

Solution: Part (d) is not stated precisely, but is intended to show that (9.62) at r = 0 (i.e., (A.193)) is exponentially tight for any sequence of tests such that lim_{n→∞} Pr{e(n) | X=0} = 0. Assume that the bound in (A.193) is not exponentially tight, i.e., that for some ε > 0, Pr{e(n) | X=1} ≤ exp(n[γ_0'(0) − ε]) for some non-terminating subsequence of n. For each such n, a threshold test minimizes Pr{e(n) | X=0} given this bound on Pr{e(n) | X=1}. Since the MGF exists in some open interval around 0, there is an ε > 0 small enough that γ_0'(0) − ε = γ_0'(r) for some r less than 0. For the sequence of threshold tests at the threshold η = e^{nγ_0'(r)}, the Chernoff bound on Pr{e(n) | X=0} is an exponential bound on the probability of correct decision (since r < 0). Thus for this subsequence of n, Pr{e(n) | X=0} must approach 1 exponentially rather than 0. Thus lim_{n→∞} Pr{e(n) | X=0} cannot exist and be equal to 0 if (A.193) is not exponentially tight.

This completes the proof of Stein's lemma.
Exercise 9.21: (The secretary problem or marriage problem) This illustrates a very different type of sequential decision problem from those of Section 9.5. Alice is looking for a spouse and dates a set of n suitors sequentially, one per week. For simplicity, assume that Alice must make her decision to accept a suitor for marriage immediately after dating that suitor; she cannot come back later to accept a formerly rejected suitor. Her decision must be based only on whether the current suitor is more suitable than all previous suitors. Mathematically, we can view the dating as continuing for all n weeks, but the choice at week m is a stopping rule. Assume that each suitor's suitability is represented by a real number and that all n numbers are different. Alice does not observe the suitability numbers, but only observes at each time m whether suitor m has the highest suitability so far. The suitors are randomly permuted before Alice dates them.

a) A reasonable algorithm for Alice is to reject the first k suitors (for some k to be optimized later) and then to choose the first suitor m > k that is more suitable than all the previous m−1 suitors. Find the probability q_k that Alice chooses the most suitable of all n when she uses this algorithm for given k. Hint 1: What is the probability (as a function of m) that the most suitable of all n is dated on week m? Hint 2: Given that the most suitable of all suitors is dated at week m, what is the probability that Alice chooses that suitor? For m > k, show how this involves the location of the most suitable choice from 1 to m−1, conditional on m's overall optimality. Note: if no suitor is selected by the algorithm, it makes no difference whether suitor n is chosen at the end or no suitor is chosen.
Solution: Since the suitability numbers are randomly permuted (i.e., permuted so that each permutation is equally likely), the largest appears at position m with probability 1/n for 1 ≤ m ≤ n.

Conditional on the largest number appearing at some given position m, note that for m ≤ k, Pr{choose m | m best} = 0 since the first k suitors are rejected out of hand. Next suppose that the position m of the largest number satisfies m > k. Then Alice will choose m (and thus choose the best suitor) unless Alice has chosen someone earlier, i.e., unless some suitor j, k < j < m, is more suitable than suitors 1 to k. To state this slightly more simply, Alice chooses someone earlier if the best suitor from 1 to m−1 lies in a position from k+1 to m−1. Since the best suitor from 1 to m−1 is in each position with equal probability,

$$\Pr\{\text{choose } m \mid m \text{ best}\} = \frac{k}{m-1}, \qquad m > k.$$

Averaging over m, we get

$$q_k = \sum_{m=k+1}^{n} \frac{k}{n(m-1)}.$$
b) Approximating Σ_{i=1}^{j} 1/i by ln j, show that for n and k large,

$$q_k \approx \frac{k}{n}\ln\frac{n}{k}.$$

Ignoring the constraint that n and k are integers, show that the right-hand side above is maximized at k/n = 1/e and that max_k q_k ≈ 1/e.
Solution: Using the solution for q_k in (a), and letting i = m−1,

$$q_k = \frac{k}{n}\sum_{i=k}^{n-1}\frac{1}{i} = \frac{k}{n}\left[\sum_{i=1}^{n-1}\frac{1}{i} - \sum_{i=1}^{k-1}\frac{1}{i}\right] \approx \frac{k}{n}\big[\ln(n-1) - \ln(k-1)\big] \approx \frac{k}{n}\ln\frac{n}{k}.$$

The approximations here are pretty crude. The first one can be improved by using the Euler-Mascheroni constant, but q_k is still approximately optimized at k = n/e, and at that point q_k ≈ 1/e. The exact value is too easy to calculate to scrutinize the approximations closely.
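Indeed, the exact value is a one-line computation. The sketch below (our own check, for a hypothetical n = 100) evaluates q_k from the sum in (a), and locates the maximizing k near n/e ≈ 36.8 with a maximum near 1/e ≈ 0.368.

```python
import math

def q(n, k):
    # Exact success probability from (a): q_k = sum_{m=k+1}^{n} k / (n(m-1)).
    return sum(k / (n * (m - 1)) for m in range(k + 1, n + 1))

n = 100
qs = {k: q(n, k) for k in range(1, n)}
k_best = max(qs, key=qs.get)
print(k_best, qs[k_best])      # close to n/e = 36.8 and 1/e = 0.368
```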
c) (Optional) Show that the algorithm of (a), optimized over k, is optimal over all algorithms (given the constraints of the problem). Hint: Let p_m be the maximum probability of choosing the optimal suitor given that no choice has been made before time m. Show that

$$p_m = \max\left[\frac{1}{n} + \frac{m-1}{m}\,p_{m+1},\; p_{m+1}\right];$$

part of the problem here is understanding exactly what p_m means.
Solution outline: We have looked at the class of algorithms for which the first k suitors are rejected and the first subsequent 'best-yet' suitor is chosen. Alice has many other choices, including choosing the suitor at time m using the past history of best-yet suitors. It turns out, however, that if we start at time m = n and work back inductively, we can find the optimal probability of success conditional on no suitor being chosen before time m. Denote this optimal conditional probability as p_m. As we shall see, the remarkable fact which makes this possible is that this conditional probability depends neither on the decisions made before time m nor on the history of best-yet suitors before m.

Given that no decision is made before time n, a selection at time n is correct only if suitor n is the best suitor, and this is an event of probability 1/n, independent of past history and past decisions. Given no prior decision, the optimal choice is for Alice to choose suitor n if n is best-yet and thus best overall. Thus p_n = 1/n.

Next suppose no decision has been made before time n−1. The suitor at n−1 is optimal with probability 1/n, and in this case must be a best-yet. Consider the following possible decisions at time n−1: first, choose n−1 if best-yet, or, second, wait until time n whether or not n−1 is best-yet. We evaluate p_{n−1} under the first of these decisions and then show it is optimal. These decisions at n−1 and n are successful overall if n−1 is optimal (an event of probability 1/n independent of past history and choices) and also if n−1 is not best-yet and n is best. The event that n−1 is not best-yet has probability (n−2)/(n−1) and is independent of the event that n is best, so

$$p_{n-1} = \frac{1}{n} + \frac{n-2}{n(n-1)}. \qquad (A.194)$$

If Alice waits until time n even if n−1 is best-yet, the probability of success drops to 1/n. Finally, if Alice bases her decision on the past history of best-yet suitors, the probability of success is a linear combination of p_{n−1} above and 1/n, thus verifying (A.194) in general.

We next find p_m in terms of p_{m+1} for arbitrary m. We first assume that Alice's optimal strategy at time m, given that no prior decision has been made, is to choose m if it is best-yet, and if not best-yet to wait until time m+1. As before, m is optimal with probability 1/n and if so, must be a best-yet choice. If m is not best-yet (an event of probability (m−1)/m), then the event that success will occur at time m+1 or later has probability p_{m+1}. This event is independent of the event that m is not best-yet, so if this is the best decision,

$$p_m = \frac{1}{n} + \frac{m-1}{m}\,p_{m+1} \qquad\text{tentatively}. \qquad (A.195)$$

Now recall that Alice can also reject suitor m even if best-yet, and in this case the probability of success becomes p_{m+1}. If the choice is based on past history, then the probability of success is a linear combination of p_{m+1} and the value in (A.195). We then have

$$p_m = \max\left[\frac{1}{n} + \frac{m-1}{m}\,p_{m+1},\; p_{m+1}\right]. \qquad (A.196)$$

If we subtract p_{m+1} from both sides above, we see that p_m − p_{m+1} = max[1/n − p_{m+1}/m, 0], from which we conclude that p_m increases as m decreases from n, and eventually becomes constant, with the strategy rejecting all suitors for small m. If we expand the solution in (A.196), we see that it is simply the strategy in (a), which is thus optimal.
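The backward recursion (A.196) is easy to run numerically. The sketch below (our own check, again for a hypothetical n = 100) iterates p_m from m = n down to 1 and confirms that p_1 equals the best success probability max_k q_k achievable by the threshold strategy of (a):

```python
def q(n, k):
    # Success probability of the reject-first-k strategy from (a).
    return sum(k / (n * (m - 1)) for m in range(k + 1, n + 1))

def p1_by_recursion(n):
    # (A.196) run backward: p_n = 1/n, then for m = n-1, ..., 1,
    # p_m = max(1/n + ((m-1)/m) p_{m+1}, p_{m+1}).
    p = 1.0 / n
    for m in range(n - 1, 0, -1):
        p = max(1.0 / n + (m - 1) / m * p, p)
    return p

n = 100
best_q = max(q(n, k) for k in range(1, n))
print(p1_by_recursion(n), best_q)   # agree (up to rounding)
```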
This proves the desired result, but it is of interest to see what happens if Alice receives more information. If at each m she receives the ordering of the first m numbers, then it is easy to see, by an extension of the argument above, that the additional information is irrelevant and that the algorithm of (a), optimized over k, is still optimal.

On the other hand, if Alice sees the actual suitability numbers, she can do better. For example, if the sequence of suitability numbers is a sequence of n IID rv's, then if Alice knows the distribution of those numbers, she can solve the problem in a very different way and increase the probability of best choice significantly. If she doesn't know the distribution, and n is very large, she can estimate the distribution and improve the algorithm. The case of an 'unknown' distribution is flaky, however, and the notion of optimality becomes undefined.
d) Explain why this is a poor model for choosing a spouse (or for making a best choice in a wide variety of similar problems). Caution: It is not enough to explain why this is not closely related to these real problems; you should also explain why this gives very little insight into such real problems.

Solution: The assumptions that there are a given number n of candidates, that they are randomly permuted, that they can be evaluated on a one-dimensional scale, that Alice is unaware of that evaluation but only of a best-yet ranking, and finally that decisions must be made immediately are often individually appropriate but, as discussed below, are very strange when taken together.

We often model problems where some given parameter (n in this case) is taken to be either large or infinite according to analytical convenience, and a stopping rule is needed to terminate an essentially infinite search. Here, however, the optimal stopping rule depends critically on n, so that n should be a meaningful parameter of the underlying problem. In such cases, the reason for stopping and making an early decision is usually related to a cost of making measurements, which does not exist here. Here, the need to make decisions sequentially is quite artificial.

We next ask whether this model can illustrate limited aspects of choosing a spouse (or similar problems of choice). The many artificial and arbitrary aspects of the model make this difficult. The worst part is that most real-world problems of sequential choice involve a combination of learning, decision making, and avoidance of delay. This model, in rejecting the first k suitors, appears to be like this, but in fact no learning is involved and no criterion of time-saving is involved.

On the other hand, this model is clearly interesting from a mathematical standpoint; there is a large literature that illustrates its inherent interest. Problems that are inherently interesting are certainly worthy of attention. In addition, limiting the decision maker to best-yet data can serve as a lower bound to situations of immediate choice where the decision maker has more available data. What is objectionable is claiming applicability to a broad set of real-world decision problems where no such applicability exists.
Exercise 9.22: a) Suppose {Z_n; n ≥ 1} is a martingale. Verify (9.83); i.e., E[Z_n] = E[Z_1] for n > 1.

Solution: The special case of Lemma 9.6.5 with i = 1 is E[Z_n | Z_1] = Z_1. Since E[|Z_n|] < ∞ by definition of a martingale, (1.44) (i.e., the total expectation theorem) shows that E[Z_n] = E[Z_1].

b) If {Z_n; n ≥ 1} is a submartingale, verify (9.88), and if a supermartingale, verify (9.89).

Solution: As explained in (9.86), E[Z_n | Z_1] ≥ Z_1, so again (1.44) shows that E[Z_n] ≥ E[Z_1]. The same argument with inequalities reversed applies to supermartingales.
Exercise 9.23: Suppose {Z_n; n ≥ 1} is a martingale. Show that

$$E\big[Z_m \mid Z_{n_i}, Z_{n_{i-1}}, \dots, Z_{n_1}\big] = Z_{n_i} \quad\text{for all } 0 < n_1 < n_2 < \cdots < n_i < m.$$

Solution: First observe from Lemma 9.6.5 that for n_i < m,

$$E[Z_m \mid Z_{n_i}, Z_{n_i-1}, Z_{n_i-2}, \dots, Z_1] = Z_{n_i}.$$

This is valid for every sample value of every conditioning variable. Thus consider Z_{n_i−1}, for example. Since this equation has the same value for each sample value of Z_{n_i−1}, the expected value of this conditional expectation over Z_{n_i−1} is E[Z_m | Z_{n_i}, Z_{n_i−2}, ..., Z_1] = Z_{n_i}. In the same way, any subset of these conditioning rv's could be removed, leaving us with the desired form.
Exercise 9.24: a) Assume that {Z_n; n ≥ 1} is a submartingale. Show that

$$E[Z_m \mid Z_n, Z_{n-1}, \dots, Z_1] \ge Z_n \quad\text{for all } n < m.$$

Solution: The proof of Lemma 9.6.5, with inequalities used in place of equalities, works here.

b) Show that

$$E\big[Z_m \mid Z_{n_i}, Z_{n_{i-1}}, \dots, Z_{n_1}\big] \ge Z_{n_i} \quad\text{for all } 0 < n_1 < n_2 < \cdots < n_i < m.$$

Solution: This follows by using inequalities in place of equalities in the same set of steps used in the solution to Exercise 9.23.

c) Assume now that {Z_n; n ≥ 1} is a supermartingale. Show that (a) and (b) still hold with ≥ replaced by ≤.

Solution: One can repeat the steps of (a) and (b), blindly replacing each ≥ with ≤. Alternatively, note that if {Z_n; n ≥ 1} is a supermartingale, then {−Z_n; n ≥ 1} is a submartingale. This technique is frequently useful in converting results about submartingales into results about supermartingales.
Exercise 9.25: Let {Z_n = exp[rS_n − nγ(r)]; n ≥ 1} be the generating function martingale of (9.77), where S_n = X_1 + ... + X_n and where X_1, X_2, ... are IID with mean X̄ < 0. Let J be the possibly-defective stopping trial for which the process stops after crossing a threshold at α > 0 (there is no negative threshold). Show that exp[−r*α] is an upper bound to the probability of threshold crossing by considering the stopped process {Z*_n; n ≥ 1}.

The purpose of this exercise is to illustrate that the stopped process can yield useful upper bounds even when the stopping trial is defective.

Solution: Example 9.6.4 shows that if g_X(r) exists for any given r, then {Z_n = exp[rS_n − nγ(r)]; n ≥ 1} is a martingale and E[Z_n] = 1 for all n. From Theorem 9.8.3, the corresponding stopped process {Z*_n; n ≥ 1} is a martingale. From Theorem 9.8.5, E[Z*_n] = 1 for all n ≥ 1.

There is a confusing notational issue here, since we are using Z*_n to refer to the stopped process for some given r, and we earlier defined r* as the positive root of γ_X(r). We now look at the generating function process {Z_n; n ≥ 1} for r = r* and the stopped process {Z*_n; n ≥ 1} for r = r*. Then Z*_n = e^{r*S*_n} (since γ(r*) = 0). If S_J ≥ α, then Z*_n ≥ e^{r*α} for some sufficiently large n, and we have

$$\Pr\{S_J \ge \alpha\} = \lim_{n\to\infty}\Pr\big\{Z_n^* \ge e^{r^*\alpha}\big\}.$$

By the Markov inequality, Pr{Z*_n ≥ e^{r*α}} ≤ e^{−r*α}. Since this holds for all n, it holds in the limit, and Pr{S_J ≥ α} ≤ e^{−r*α}.
Exercise 9.26: This exercise uses a martingale to find the expected number of successive trials E[J] until some fixed pattern, a_1, a_2, ..., a_k, of successive binary digits occurs within a sequence of IID binary random variables X_1, X_2, ... (see Example 4.5.1 and Exercise 5.35 for alternative approaches). We take the stopping time J to be the smallest n for which (X_{n−k+1}, ..., X_n) = (a_1, ..., a_k). A mythical casino and sequence of gamblers who follow a prescribed strategy will be used to determine E[J]. The outcomes of the plays (trials), {X_n; n ≥ 1}, at the casino form a binary IID sequence for which Pr{X_n = i} = p_i for i ∈ {0, 1}. If a gambler places a bet s on 1 at play n, the return is s/p_1 if X_n = 1 and 0 otherwise. With a bet s on 0, the return is s/p_0 if X_n = 0 and 0 otherwise; i.e., the game is fair.

a) Assume an arbitrary choice of bets on 0 and 1 by the various gamblers on the various trials. Let Y_n be the net gain of the casino on trial n. Show that E[Y_n] = 0. Let Z_n = Y_1 + Y_2 + ... + Y_n be the aggregate gain of the casino over n trials. Show that for any given pattern of bets, {Z_n; n ≥ 1} is a martingale.

Solution: The net gain of the casino on trial n is the sum of the gains from each gambler. If a gambler bets s on outcome 1, the expected gain for the casino is s − p_1(s/p_1) = 0. Similarly, it is 0 for a bet on outcome 0. Since the expected gain from each gambler is 0, independent of earlier gains, we have E[Y_n | Y_{n−1}, ..., Y_1] = 0. As seen in Example 9.6.3, this implies that {Z_n; n ≥ 1} is a martingale.
b) In order to determine E[J] for a given pattern (a_1, a_2, ..., a_k), we program our gamblers to bet as follows:

i) Gambler 1 has an initial capital of 1 which is bet on a_1 at trial 1. If X_1 = a_1, the capital grows to 1/p_{a_1}, all of which is bet on a_2 at trial 2. If X_2 = a_2, the capital grows to 1/(p_{a_1}p_{a_2}), all of which is bet on a_3 at trial 3. Gambler 1 continues in this way until either losing at some trial (in which case he leaves with no money) or winning on k successive trials (in which case he leaves with 1/[p_{a_1} ... p_{a_k}]).

ii) Gambler ℓ, for each ℓ > 1, follows the same strategy, but starts at trial ℓ. Note that if the string (a_1, ..., a_k) appears for the first time at trials n−k+1, n−k+2, ..., n, i.e., if J = n, then gambler n−k+1 leaves at time n with capital 1/[p_{a_1} ... p_{a_k}] and gamblers j < n−k+1 have all lost their capital. We will come back later to investigate the capital at time n for gamblers n−k+2 to n.

First consider the string a_1 = 0, a_2 = 1 with k = 2. Find the sample values of Z_1, Z_2, Z_3 for the sample sequence X_1 = 1, X_2 = 0, X_3 = 1, .... Note that gamblers 1 and 3 have lost their capital, but gambler 2 now has capital 1/(p_0 p_1). Show that the sample value of the stopping time for this case is J = 3. Given an arbitrary sample value n ≥ 2 for J, show that Z_n = n − 1/(p_0 p_1).

Solution: Since gambler 1 bets on 0 at the first trial and X_1 = 1, gambler 1 loses and Z_1 = 1. At trial 2, gambler 2 bets on 0 and X_2 = 0. Gambler 2's capital increases from 1 to 1/p_0, so Y_2 = 1 − 1/p_0. Thus Z_2 = 2 − 1/p_0. On trial 3, gambler 1 is broke and doesn't bet, gambler 2's capital increases from 1/p_0 to 1/(p_0 p_1), and gambler 3 loses. Thus Y_3 = 1 + 1/p_0 − 1/(p_0 p_1) and Z_3 = 3 − 1/(p_0 p_1). It is preferable here to look only at the casino's aggregate gain Z_3 and not the immediate gain Y_3. In aggregate, the casino keeps all 3 initial bets, and pays out 1/(p_0 p_1).
J = 3 since (X_2, X_3) = (a_1, a_2) = (0, 1) and this is the first time that the pattern (0, 1) has appeared. For an arbitrary sample value n for J, note that each gambler before n−1 loses unit capital, gambler n−1 retires to Maui with capital increased from 1 to 1/(p_0 p_1), and gambler n loses. Thus the casino has n − 1/(p_0 p_1) as its gain.

c) Find E[Z_J] from (a). Use this plus (b) to find E[J]. Hint: This uses the special form of the solution in (b), not the Wald equality.

Solution: The casino's expected gain up to each time n is E[Z_n] = 0, so it follows that E[Z_J] = 0 (it is easy to verify that the condition in (9.104) is satisfied in this case). We saw in (b) that E[Z_n | J = n] = n − 1/(p_0 p_1), so E[Z_J] = E[J] − 1/(p_0 p_1). Thus E[J] = 1/(p_0 p_1).

Note that this is the mean first-passage time for the same problem in Exercise 4.28. The approach there was simpler than this for this short string. For long strings, the approach here will be simpler.
d) Repeat parts (b) and (c) using the string (a_1, ..., a_k) = (1, 1) and initially assuming (X_1 X_2 X_3) = (011). Be careful about gambler 3 for J = 3. Show that E[J] = 1/p_1² + 1/p_1.

Solution: This is almost the same as (b) except that here gambler 3 wins at time 3. In other words, since a_1 = a_2, gamblers 2 and 3 both bet on 1 at time 3. As before, J = 3 for this sample outcome. We also see that for J equal to an arbitrary n, gamblers n−1 and n both bet on 1 and since X_n = 1, both win. Thus E[J] = 1/p_1² + 1/p_1.

e) Repeat parts (b) and (c) for (a_1, ..., a_k) = (1, 1, 1, 0, 1, 1).

Solution: Given that J = n, we know that (X_{n−5}, ..., X_n) = (111011), so gambler n−5 leaves with 1/(p_1⁵ p_0) and all previous gamblers lose their capital. For the gamblers after n−5, note that gambler n makes a winning bet on 1 at time n and gambler n−1 makes winning bets on (1, 1) at times (n−1, n). Thus the casino wins n − 1/(p_1⁵ p_0) − 1/p_1 − 1/p_1². Averaging over J, we see that E[J] = 1/(p_1⁵ p_0) + 1/p_1 + 1/p_1². In general, we see that, given J = n, gambler n wins if a_1 = a_k, gambler n−1 wins if (a_1, a_2) = (a_{k−1}, a_k), and so forth.
f) Consider an arbitrary binary string a_1, ..., a_k and condition on J = n for some n ≥ k. Show that the sample capital of gambler ℓ is then equal to

• 0 for ℓ ≤ n−k.

• 1/[p_{a_1} p_{a_2} ... p_{a_k}] for ℓ = n−k+1.

• 1/[p_{a_1} p_{a_2} ... p_{a_i}] for ℓ = n−i+1, 1 ≤ i < k, if (a_1, ..., a_i) = (a_{k−i+1}, ..., a_k).

• 0 for ℓ = n−i+1, 1 ≤ i < k, if (a_1, ..., a_i) ≠ (a_{k−i+1}, ..., a_k).

Verify that this general formula agrees with parts (b), (d), and (e).

Solution: Gambler ℓ for ℓ ≤ n−k bets (until losing a bet) on a_1, a_2, ..., a_k. Since the first occurrence of (a_1, ..., a_k) occurs at n, we see that each of these gamblers loses at some point and thus is reduced to 0 capital at that point and stays there. Gambler n−k+1 bets on a_1, ..., a_k at times n−k+1, ..., n and thus wins each bet for J = n. Finally, gambler ℓ = n−i+1 bets (until losing) on a_1, a_2, ..., a_i at times n−i+1 to n. Since J = n implies that (X_{n−k+1}, ..., X_n) = (a_1, ..., a_k), gambler n−i+1 is successful on all i bets if and only if (a_1, ..., a_i) = (a_{k−i+1}, ..., a_k).

For (b), gambler n is unsuccessful and in (d), gambler n is successful. In (e), gamblers n−1 and n are each successful. It might be slightly mystifying at first that conditioning on J is enough to specify what happens to each gambler after time n−k+1, but the sample values of X_{n−k+1} to X_n are specified by J = n, and the bets of the gamblers are also specified.
g) For a given binary string (a_1, ..., a_k), and each j, 1 ≤ j ≤ k, let I_j = 1 if (a_1, ..., a_j) = (a_{k−j+1}, ..., a_k) and let I_j = 0 otherwise. Show that

$$E[J] = \sum_{i=1}^{k} \frac{I_i}{\prod_{\ell=1}^{i} p_{a_\ell}}.$$

Note that this is the same as the final result in Exercise 5.35. The argument is shorter here, but more motivated and insightful there. Both approaches are useful and lead to simple generalizations.

Solution: The ith term in the above expansion is the capital of gambler n−i+1 at time n. The final term at i = k corresponds to the gambler who retires to Maui, and I_k = 1 in all cases. How many other terms are non-zero depends on the choice of string. These other terms can all be set to zero by choosing a string for which no prefix is equal to the suffix of the same length.
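The formula is easy to implement and to check by simulation. The sketch below (our own check) evaluates it for the patterns of parts (b), (d), and (e) with an assumed fair coin (p_0 = p_1 = 1/2), and then confirms part (e)'s answer of 64 + 2 + 4 = 70 by direct simulation.

```python
import random

def expected_J(pattern, p):
    # E[J] = sum over i with prefix of length i equal to suffix of length i
    # of 1 / prod_{l <= i} p[a_l]   (the I_i formula of part (g)).
    k = len(pattern)
    total, prod = 0.0, 1.0
    for i in range(1, k + 1):
        prod *= p[pattern[i - 1]]
        if pattern[:i] == pattern[k - i:]:
            total += 1.0 / prod
    return total

p = {0: 0.5, 1: 0.5}                       # assumed fair coin
print(expected_J((0, 1), p))               # 1/(p0 p1)        = 4
print(expected_J((1, 1), p))               # 1/p1^2 + 1/p1    = 6
print(expected_J((1, 1, 1, 0, 1, 1), p))   # 64 + 2 + 4       = 70

def simulate_J(pattern, p, rng):
    # Generate IID bits until the pattern first appears; return the time.
    k, window, n = len(pattern), [], 0
    while True:
        n += 1
        window.append(1 if rng.random() < p[1] else 0)
        if len(window) > k:
            window.pop(0)
        if len(window) == k and tuple(window) == pattern:
            return n

rng = random.Random(7)
est = sum(simulate_J((1, 1, 1, 0, 1, 1), p, rng) for _ in range(20_000)) / 20_000
print(est)   # close to 70
```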
Exercise 9.27: a) This exercise shows why the condition E[|Z_J|] < ∞ is required in Lemma 9.8.4. Let Z_1 = −2 and, for n ≥ 1, let Z_{n+1} = Z_n[1 + X_n(3n+1)/(n+1)], where X_1, X_2, ... are IID and take on the values +1 and −1 with probability 1/2 each. Show that {Z_n; n ≥ 1} is a martingale.

Solution: From the definition of Z_n above,

$$E[Z_n \mid Z_{n-1}, Z_{n-2}, \dots, Z_1] = E\big[Z_{n-1}[1 + X_{n-1}(3n-2)/n] \,\big|\, Z_{n-1}, \dots, Z_1\big].$$

Since the X_n are zero mean and IID, this is just E[Z_{n−1} | Z_{n−1}, ..., Z_1], which is Z_{n−1}. We must also show that E[|Z_n|] < ∞ for each n. Note that

$$|Z_n| \le |Z_{n-1}|\big[1 + (3n-2)/n\big] \le 4|Z_{n-1}|.$$

Thus for each n, |Z_n| is bounded (although the bound is rapidly increasing in n), so E[|Z_n|] < ∞ for each n. It follows that {Z_n; n ≥ 1} is a martingale.
b) Consider the stopping trial J such that J is the smallest value of n > 1 for which Z_n and Z_{n−1} have the same sign. Show that, conditional on n < J, Z_n = (−2)ⁿ/n and, conditional on n = J, Z_J = −(−2)ⁿ(2n−1)/(n²−n).

Solution: It can be seen from the iterative definition of Z_n that Z_n and Z_{n−1} have different signs if X_{n−1} = −1 and have the same sign if X_{n−1} = 1. Thus the stopping trial is the smallest trial n ≥ 2 for which X_{n−1} = 1. Thus for n < J, we must have X_i = −1 for 1 ≤ i < n. For n = 2 < J, X_1 must be −1, so from the formula above, Z_2 = Z_1[1 − 4/2] = 2. Thus Z_n = (−2)ⁿ/n for n = 2 < J. We can use induction now for arbitrary n < J. Thus for X_n = −1,

$$Z_{n+1} = Z_n\left[1 - \frac{3n+1}{n+1}\right] = \frac{(-2)^n}{n}\cdot\frac{-2n}{n+1} = \frac{(-2)^{n+1}}{n+1}.$$

The remaining task is to compute Z_n for n = J. Using the result just derived for n = J−1 and using X_{J−1} = 1,

$$Z_J = Z_{J-1}\left[1 + \frac{3(J-1)+1}{J}\right] = \frac{(-2)^{J-1}}{J-1}\cdot\frac{4J-2}{J} = \frac{-(-2)^J(2J-1)}{J(J-1)}.$$
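These closed forms follow directly from the recursion, and a short exact-arithmetic check confirms them (a check we added; it is not part of the exercise): run Z_{n+1} = Z_n[1 + X_n(3n+1)/(n+1)] with X_i = −1 before the stopping step and X_{J−1} = +1.

```python
from fractions import Fraction

def z_path(J):
    # X_i = -1 for i < J-1 and X_{J-1} = +1, so that the first sign repeat
    # (the stopping event) occurs exactly at trial J.
    Z = [None, Fraction(-2)]                      # Z_1 = -2
    for n in range(1, J):
        x = 1 if n == J - 1 else -1
        Z.append(Z[n] * (1 + Fraction(x * (3 * n + 1), n + 1)))
    return Z

for J in range(2, 12):
    Z = z_path(J)
    for n in range(2, J):
        assert Z[n] == Fraction((-2) ** n, n)                          # n < J
    assert Z[J] == Fraction(-((-2) ** J) * (2 * J - 1), J * (J - 1))   # n = J
print("closed forms verified for J = 2, ..., 11")
```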
c) Show that E[|Z_J|] is infinite, so that E[Z_J] does not exist according to the definition of expectation, and show that lim_{n→∞} E[Z_n | J > n] Pr{J > n} = 0.

Solution: We have seen that J = n if and only if X_i = −1 for 1 ≤ i ≤ n−2 and X_{n−1} = 1. Thus Pr{J = n} = 2^{−n+1}, so

$$E[|Z_J|] = \sum_{n=2}^{\infty} 2^{-n+1}\cdot\frac{2^n(2n-1)}{n(n-1)} = \sum_{n=2}^{\infty}\frac{2(2n-1)}{n(n-1)} \ge \sum_{n=2}^{\infty}\frac{4}{n} = \infty, \qquad \text{(i)}$$

since the harmonic series diverges.

Finally, we see that Pr{J > n} = 2^{−n+1} (since this event occurs if and only if X_i = −1 for 1 ≤ i < n). Thus

$$E[Z_n \mid J > n]\,\Pr\{J > n\} = \frac{2^{-n+1}(-2)^n}{n} = \frac{2(-1)^n}{n} \to 0.$$

This martingale and stopping rule give an example where the condition E[|Z_J|] < ∞ is needed in Lemma 9.8.4. Note that E[Z_J], if it exists, is lim_{n→∞} Σ_{i=1}^{n} E[Z_J | J=i] Pr{J=i}. For this example, this is the alternating version of the series in (i), which converges in the Cauchy sense but not in the usual sense. Exercise 1.8 provides a reminder of why this series is defined to be nonexistent, i.e., of why the condition E[|Z_J|] < ∞ is needed.
Exercise 9.28: This exercise shows why the sup of a martingale can behave markedly di↵erently than the
maximum of an arbitrarily large number of the variables. More precisely, it shows that Pr supn 1 Zn
nS
o
can be unequal to Pr
a} .
n 1 {Zn
a) Consider a martingale where Zn can take on only the values 2 n 1 and 1
a
2 n 1 , each with probability
1/2. Given that Zn , conditional on Zn 1 , is independent of Z1 , . . . Zn 2 , find Pr{Zn |Zn 1 } for each n so
that the martingale condition is satisfied.
Solution: This is a positive martingale for which the sample values approach 0 or 1 with increasing n. We will see that as n → ∞, the transitions between the larger and smaller value become less frequent. Let p_0(n) = Pr{Z_n = 2^{-n-1} | Z_{n-1} = 2^{-n}}, i.e., the probability that the sample value close to 0 at trial n-1 goes to the yet smaller value at trial n. For n ≥ 1, with Z_0 taken to be 1/2, we have

    E[Z_n | Z_{n-1} = 2^{-n}] = 2^{-n-1} p_0(n) + (1 - 2^{-n-1})(1 - p_0(n)).

To satisfy the martingale condition, this must be equal to 2^{-n}, so we can solve for p_0(n), getting

    p_0(n) = (2^n - 3/2)/(2^n - 1).   (A.197)

We can define p_1(n) = Pr{Z_n = 1 - 2^{-n-1} | Z_{n-1} = 1 - 2^{-n}} and solve for p_1(n) in the same way, getting p_1(n) = p_0(n) for n ≥ 1. Thus a martingale exists with the transition probabilities p_0(n) (and p_1(n) = p_0(n)) in (A.197).
APPENDIX A. SOLUTIONS TO EXERCISES

We should also show that Pr{Z_n = 2^{-n-1}} = 1/2 as specified in the problem statement. For n = 1, it can be seen from (A.197) that p_0(1) = 1/2, and similarly p_1(1) = 1/2. It follows that E[Z_1] = 1/2. From (9.83), then, E[Z_n] = 1/2 for all n ≥ 1. It follows that Pr{Z_n = 2^{-n-1}} = Pr{Z_n = 1 - 2^{-n-1}} = 1/2.
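The algebra behind (A.197) is easy to check numerically. The following sketch (my addition, not part of the original solution) verifies that this choice of p_0(n) satisfies the martingale condition E[Z_n | Z_{n-1} = 2^{-n}] = 2^{-n} over a range of n:

```python
# Verify that p0(n) = (2^n - 3/2)/(2^n - 1) gives
# E[Z_n | Z_{n-1} = 2^{-n}] = 2^{-n}, i.e., the martingale condition.
for n in range(1, 20):
    p0 = (2**n - 1.5) / (2**n - 1)
    small, large = 2.0**(-n - 1), 1 - 2.0**(-n - 1)   # the two sample values of Z_n
    cond_mean = small * p0 + large * (1 - p0)
    assert abs(cond_mean - 2.0**(-n)) < 1e-12          # martingale condition
    assert 0 <= p0 <= 1                                 # valid probability
print("martingale condition holds; p0(1) =", (2**1 - 1.5) / (2**1 - 1))
```

Note that p_0(n) → 1 as n grows, which is the statement that transitions between the small and large values become less frequent.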
b) Show that Pr{sup_{n≥1} Z_n ≥ 1} = 1/2 and show that Pr{∪_n {Z_n ≥ 1}} = 0.

Solution: First we show that Pr{∪_n {Z_n ≥ 1}} = 0. Since Z_n has no sample values of 1 or more, Pr{Z_n ≥ 1} = 0 for all n. From the union bound, Pr{∪_n {Z_n ≥ 1}} is upper bounded by a sum of 0's, so Pr{∪_n {Z_n ≥ 1}} = 0.
Finding Pr{sup_{n≥1} Z_n ≥ 1} is a little harder, since a sample sequence of {Z_n; n ≥ 1} can jump back and forth between values less than 1/2 and greater than 1/2. This is an ideal opportunity to use the martingale convergence theorem, which says that {Z_n; n ≥ 1} converges WP1. Since the sample values greater than 1/2 approach 1 and the sample values less than 1/2 approach 0, Z_n must converge to 1 with some probability q and to 0 with probability 1 - q. Either using the symmetry between 0 and 1 or using the fact that each sample value has probability 1/2 for each n, Pr{sup_{n≥1} Z_n ≥ 1} = 1/2.
c) Show that for every ε > 0, Pr{sup_{n≥1} Z_n ≥ a} ≤ E[Z_1]/(a - ε). Hint: Use the relationship between Pr{sup_{n≥1} Z_n ≥ a} and Pr{∪_n {Z_n ≥ a}} while getting around the issue in (b).

Solution: For any ε > 0, if sup_{n≥1} Z_n ≥ a, then for some large enough m, max_{1≤n≤m} Z_n ≥ a - ε. The probability of this latter event is non-decreasing with m and bounded by 1, so it has a limit, and

    Pr{sup_{n≥1} Z_n ≥ a} ≤ lim_{m→∞} Pr{max_{1≤n≤m} Z_n ≥ a - ε} ≤ lim_{m→∞} E[Z_1]/(a - ε) = E[Z_1]/(a - ε),

where we used (9.110) in the middle step.
d) Use (c) to establish (9.111).

Solution: Since (c) is valid for all ε > 0, it is valid in the limit as ε → 0, yielding (9.111).
Exercise 9.29: Show that Theorem 9.7.4 is also valid for martingales relative to a joint process. That is, show that if h is a convex function of a real variable and if {Z_n; n ≥ 1} is a martingale relative to a joint process {Z_n, X_n; n ≥ 1}, then {h(Z_n); n ≥ 1} is a submartingale relative to {h(Z_n), X_n; n ≥ 1}.

Solution: This is simply a matter of adding the additional conditioning to the proof of Theorem 9.7.4. Specifically, using Jensen's inequality and then the martingale property,

    E[h(Z_n) | Z_{n-1}, X_{n-1}, . . . , Z_1, X_1] ≥ h(E[Z_n | Z_{n-1}, X_{n-1}, . . . , Z_1, X_1]) = h(Z_{n-1}).

This shows that {h(Z_n); n ≥ 1} is a submartingale relative to {h(Z_n), X_n; n ≥ 1}.

Exercise 9.30: Show that if {Z_n; n ≥ 1} is a martingale (submartingale or supermartingale) relative to a joint process {Z_n, X_n; n ≥ 1} and if J is a stopping trial for {Z_n; n ≥ 1} relative to {Z_n, X_n; n ≥ 1}, then the stopped process is a martingale (submartingale or supermartingale) respectively relative to the joint process.
A.9. SOLUTIONS FOR CHAPTER 9

Solution: This simply extends Theorem 9.8.3 to include a joint process. The proof that E[|Z_n|] < ∞ is unchanged by the inclusion of a joint process. Now suppose that {Z_n; n ≥ 1} is a submartingale relative to {Z_n, X_n; n ≥ 1} and let {Z*_n; n ≥ 1} be the corresponding stopped process. For any n ≥ 1, let {z_1, x_1, z_2, x_2, . . . , z_{n-1}, x_{n-1}} be a sample sequence for {Z_1, X_1, Z_2, X_2, . . . , Z_{n-1}, X_{n-1}}. This sample sequence specifies whether or not J ≤ n-1 and specifies the sample value of J if J ≤ n-1. Thus, as in the proof of Theorem 9.8.3, if the sample sequence is such that J ≤ n-1, then z*_n = z*_{n-1} and

    E[Z*_n | Z_1^{*(n-1)} = z_1^{*(n-1)}, X_1^{(n-1)} = x_1^{(n-1)}] = z*_{n-1}.

Alternatively, if J > n-1, then z*_n = z_n and

    E[Z*_n | Z_1^{*(n-1)} = z_1^{*(n-1)}, X_1^{(n-1)} = x_1^{(n-1)}] ≥ z*_{n-1}.

Thus {Z*_n; n ≥ 1} is a submartingale relative to the joint process. The argument for a supermartingale is the same with a reversal of the inequality, and that for a martingale follows since it is both a sub- and supermartingale.
Exercise 9.31: Prove Corollaries 9.9.3 to 9.9.5, i.e., prove the following three statements:

a) Let {Z_n; n ≥ 1} be a martingale with E[Z_n^2] < ∞ for all n ≥ 1. Then

    Pr{max_{1≤n≤m} |Z_n| ≥ b} ≤ E[Z_m^2]/b^2;  for all integer m ≥ 2, and all b > 0.   (A.198)

Hint: First show that {Z_n^2; n ≥ 1} is a submartingale.

Solution: Z_n^2 is a convex function of Z_n and E[Z_n^2] < ∞ for all n ≥ 1. Thus, from Theorem 9.7.4, {Z_n^2; n ≥ 1} is a submartingale. Applying the submartingale inequality to {Z_n^2; n ≥ 1},

    Pr{max_{1≤n≤m} Z_n^2 ≥ a} ≤ E[Z_m^2]/a.

Substituting b^2 for a, this becomes (A.198).
b) [Kolmogorov's random walk inequality] Let {S_n; n ≥ 1} be a random walk with S_n = X_1 + · · · + X_n where {X_i; i ≥ 1} is a set of IID random variables with mean X̄ and variance σ^2. Then for any positive integer m and any ε > 0,

    Pr{max_{1≤n≤m} |S_n - nX̄| ≥ mε} ≤ σ^2/(mε^2).   (A.199)

Hint: First show that {S_n - nX̄; n ≥ 1} is a martingale.

Solution: Let Z_n = S_n - nX̄. Then {Z_n; n ≥ 1} is a martingale as shown in Example 9.6.2. E[Z_m^2] = mσ^2, so using (A.198) with mε substituted for b, we get (A.199).
c) Let {S_n; n ≥ 1} be a random walk, S_n = X_1 + · · · + X_n, where each X_i has mean X̄ < 0 and semi-invariant moment generating function γ(r). For any r > 0 such that 0 < γ(r) < ∞, and for any α > 0, show that

    Pr{max_{1≤i≤n} S_i ≥ α} ≤ exp{-rα + nγ(r)}.   (A.200)

Hint: First show that {e^{rS_n}; n ≥ 1} is a submartingale.

Solution: Again let Z_n = S_n - nX̄, so {Z_n; n ≥ 1} is a martingale. Then e^{rS_n} = e^{rZ_n + rnX̄} and we see by differentiating twice with respect to Z_n that e^{rS_n} is a convex function of Z_n. Thus by Theorem 9.7.4, {e^{rS_n}; n ≥ 1} is a submartingale. Thus by the Kolmogorov submartingale inequality,

    Pr{max_{1≤i≤n} e^{rS_i} ≥ a} ≤ E[e^{rS_n}]/a.   (A.201)

Now E[e^{rS_n}] = [g_X(r)]^n = e^{nγ(r)}. Substituting e^{rα} for a, (A.201) becomes (A.200).
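The bound in (A.199) can be illustrated by a small Monte Carlo sketch (my addition, not part of the original solution; the example assumes X_i = ±1 equiprobably, so X̄ = 0 and σ^2 = 1):

```python
import random

random.seed(1)
# Estimate Pr{max |S_n| >= m*eps} for a +/-1 walk and compare with (A.199).
m, eps, trials = 50, 0.5, 20000
count = 0
for _ in range(trials):
    s, hit = 0.0, False
    for n in range(1, m + 1):
        s += random.choice((-1.0, 1.0))
        if abs(s) >= m * eps:       # maximum over 1..m exceeds the threshold
            hit = True
            break
    count += hit
bound = 1.0 / (m * eps**2)          # sigma^2/(m eps^2) with sigma^2 = 1
print(f"empirical {count/trials:.4f} <= bound {bound:.4f}")
assert count / trials <= bound
```

As is typical of Chebyshev-type bounds, the empirical frequency here is far below the bound; the bound's value is its uniformity over all m and ε.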
Exercise 9.32: a) Let {Z_n; n ≥ 1} be the scaled branching process of Section 9.6.2 and assume that Y, the number of offspring of an element, has finite mean Ȳ and finite variance σ^2. Assume that the population at time 0 is 1. Show that

    E[Z_n^2] = 1 + σ^2 (Ȳ^{-2} + Ȳ^{-3} + · · · + Ȳ^{-n-1}).   (A.202)
Solution: Let {X_n; n ≥ 1} be the unscaled process as discussed in Section 6.7. Thus Z_n = X_n/Ȳ^n. We first find the second moment of X_n. Recall that X_n = Σ_{i=1}^{X_{n-1}} Y_{i,n-1}. Thus

    E[X_n | X_{n-1}] = X_{n-1} Ȳ  and  E[X_n] = Ȳ E[X_{n-1}].

Recursing over n, we see that E[X_n] = Ȳ^n. We use the same approach to find E[X_n^2].

    E[X_n^2 | X_{n-1}] = E[(Σ_{i=1}^{X_{n-1}} Y_{i,n-1})^2 | X_{n-1}]
                       = E[Y^2] X_{n-1} + Ȳ^2 X_{n-1}(X_{n-1} - 1)
                       = σ^2 X_{n-1} + Ȳ^2 X_{n-1}^2;
    E[X_n^2] = σ^2 Ȳ^{n-1} + Ȳ^2 E[X_{n-1}^2].

In the third line, we used E[Y^2] = σ^2 + Ȳ^2. Recursing on n down to n = 0, we get

    E[X_n^2] = σ^2 (Ȳ^{n-1} + Ȳ^n + · · · + Ȳ^{2n-2}) + Ȳ^{2n}.

Since Z_n^2 = X_n^2/Ȳ^{2n}, this implies (A.202).
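Formula (A.202) can be checked exactly, without simulation, by computing the PMF of X_n by repeated convolution for a small offspring distribution. The distribution below is my own hypothetical example, not from the text:

```python
# Exact check of (A.202) for offspring PMF Y = 0, 1, 2 w.p. 0.25, 0.25, 0.5.
def convolve(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

y = [0.25, 0.25, 0.5]                               # offspring PMF (hypothetical)
ybar = sum(k * pk for k, pk in enumerate(y))         # mean offspring
sig2 = sum(k * k * pk for k, pk in enumerate(y)) - ybar**2

def sum_pmf(k):                                      # PMF of the sum of k iid copies of Y
    r = [1.0]
    for _ in range(k):
        r = convolve(r, y)
    return r

pmf = [0.0, 1.0]                                     # X_0 = 1
for n in range(1, 5):
    new = [0.0] * (2 * (len(pmf) - 1) + 1)
    for k, pk in enumerate(pmf):                     # condition on X_{n-1} = k
        for j, pj in enumerate(sum_pmf(k)):
            new[j] += pk * pj
    pmf = new
    ez2 = sum(j * j * pj for j, pj in enumerate(pmf)) / ybar**(2 * n)
    formula = 1 + sig2 * sum(ybar**(-i) for i in range(2, n + 2))
    assert abs(ez2 - formula) < 1e-9                 # (A.202)
```

The dynamic-programming computation of the PMF is independent of the recursion used in the solution, so the agreement is a genuine consistency check.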
b) Assume that Ȳ > 1 and find lim_{n→∞} E[Z_n^2]. Show from this that the conditions for the martingale convergence theorem, Theorem 9.9.8, are satisfied.

Note: This condition is not satisfied if Ȳ ≤ 1. The general martingale convergence theorem requires only a bounded first absolute moment, so it holds for Ȳ ≤ 1.

Solution: From (A.202), E[Z_n^2] = 1 + (σ^2/Ȳ^2)(1 + Ȳ^{-1} + · · · + Ȳ^{-n+1}). Summing the geometric series,

    E[Z_n^2] = 1 + σ^2 (1 - Ȳ^{-n}) / (Ȳ^2 (1 - Ȳ^{-1})).
Since Ȳ > 1, the limit converges to 1 + σ^2/(Ȳ^2(1 - Ȳ^{-1})). Since E[Z_n^2] is thus bounded over all n, the conditions for Theorem 9.9.8 are satisfied. See Example 9.9.10 for an interpretation of this result.

Exercise 9.33: Show that if {Z_n; n ≥ 1} is a martingale (submartingale or supermartingale) relative to a joint process {Z_n, X_n; n ≥ 1} and if J is a stopping trial for {Z_n; n ≥ 1} relative to {Z_n, X_n; n ≥ 1}, then the stopped process satisfies (9.98), (9.99), or (9.100) respectively.
Solution: This follows from Exercise 9.30 by the same argument as used in proving Theorem 9.8.3 from Theorem 9.8.2.
Exercise 9.34: Show that if {Z_n; n ≥ 1} is a martingale relative to a joint process {Z_n, X_n; n ≥ 1} and if J is a stopping trial for {Z_n; n ≥ 1} relative to {Z_n, X_n; n ≥ 1}, then E[Z_J] = E[Z_1] if and only if (9.104) is satisfied.

Solution: This follows from the martingale part of Exercise 9.30 in the same way as Lemma 9.8.4 follows from Theorem 9.8.2.
Exercise 9.35: (The double-or-quarter game) Consider an investment example similar to that of Example 9.10.1 in which there is only one investment other than cash. The ratio X of that investment value at the end of an epoch to that at the beginning is either 2 or 1/4, each with equal probability. Thus Pr{X = 2} = 1/2 and Pr{X = 1/4} = 1/2. Successive ratios, X_1, X_2, . . . , are IID.

a) In (a) to (c), assume a constant allocation strategy where a fraction λ is constantly placed in the double-or-quarter investment and the remaining 1 - λ is kept in cash. Find the expected wealth E[W_n(λ)] and the expected log-wealth E[L_n(λ)] as a function of λ ∈ [0, 1] and n ≥ 1. Assume unit initial wealth.

Solution: With probability 1/2, the fraction λ that is invested during epoch 1 becomes 2λ and the 1 - λ in cash remains 1 - λ, leading to wealth W_1 = 1 + λ. Similarly, with probability 1/2, the fraction λ that is invested becomes λ/4, yielding the final wealth W_1 = 1 - 3λ/4. The expected wealth at the end of the first epoch is then

    E[W_1(λ)] = (1/2)(1 + λ) + (1/2)(1 - 3λ/4) = 1 + λ/8.

As in Example 9.10.1, this grows geometrically with n, so

    E[W_n(λ)] = (1 + λ/8)^n.

The log-wealth L_1(λ) is the log of W_1(λ), which is ln(1 + λ) with probability 1/2 and ln(1 - 3λ/4) with probability 1/2. Thus E[L_1(λ)] = (1/2) ln[(1 + λ)(1 - 3λ/4)], and

    E[L_n(λ)] = (n/2) ln[(1 + λ)(1 - 3λ/4)].

b) For λ = 1, find the PMF for W_4(1) and give a brief explanation of why E[W_n(1)] is growing exponentially with n but E[L_n(1)] is decreasing linearly toward -∞.

Solution: The probability of k successes out of n is given by the binomial (n choose k) 2^{-n}, which for n = 4 is, respectively, 1/16, 4/16, 6/16, 4/16, 1/16 for k = 0, 1, 2, 3, 4. Thus, using logs to the base 2,

    Pr{W_4(1) = 16}    = Pr{L_4(1) = 4}  = 1/16
    Pr{W_4(1) = 2}     = Pr{L_4(1) = 1}  = 4/16
    Pr{W_4(1) = 1/4}   = Pr{L_4(1) = -2} = 6/16
    Pr{W_4(1) = 1/32}  = Pr{L_4(1) = -5} = 4/16
    Pr{W_4(1) = 1/256} = Pr{L_4(1) = -8} = 1/16.

As seen above, the sample values of W_4(1) are exponential in the number of successes, whereas those of L_4(1) are linear in the number of successes. As n increases, the probabilities cluster around the mean of L_n (since L_n is the sum of IID rv's), but the very large low probability values of W_n lead to an exponentially increasing value of E[W_n].
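The PMF above can be reproduced in a few lines (my addition, not part of the original solution). With λ = 1 and k wins out of n, W_n = 2^k (1/4)^{n-k} = 2^{3k-2n}:

```python
from math import comb, log2

# PMF of W_n(1) for the double-or-quarter game with everything bet each epoch.
n = 4
pmf = {}                                   # wealth value -> probability
for k in range(n + 1):                     # k = number of "X = 2" epochs
    w = 2.0**(3*k - 2*n)
    pmf[w] = comb(n, k) / 2**n

assert pmf[16.0] == 1/16 and pmf[2.0] == 4/16
assert pmf[0.25] == 6/16 and pmf[1/32] == 4/16 and pmf[1/256] == 1/16

# E[W_n] grows geometrically while E[L_n] (log base 2) falls linearly:
ew = sum(w * p for w, p in pmf.items())
el = sum(log2(w) * p for w, p in pmf.items())
assert abs(ew - (1 + 1/8)**n) < 1e-12      # matches (1 + lambda/8)^n at lambda = 1
assert abs(el - (-0.5) * n) < 1e-12        # E[L_1(1)] = -1/2 per epoch in base 2
```

The two asserts at the end exhibit the contrast discussed above: the mean wealth grows like (9/8)^n even though the typical log-wealth drifts down by 1/2 bit per epoch.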
c) Using the same approach as in Example 9.10.1, find the value of λ* that maximizes E[L_n(λ)]. Show that your solution satisfies the optimality conditions in (9.136).

Solution: The easiest way to do this is to use the fact that the log function is monotonic. Therefore we can maximize L_n(λ) over λ by maximizing (1 + λ)(1 - 3λ/4). The maximum occurs at λ* = 1/6. We then see that E[L_n(λ*)] = (n/2) ln(49/48), which is slowly but steadily increasing in n.

The optimality conditions of (9.136) in this case involve only 2 investments: the double-or-quarter price-relative, denoted by X, and cash, denoted by 1. Letting λ* and 1 - λ* be the corresponding allocations, (9.136) for the double-or-quarter investment becomes

    E[X/(λ*X + (1 - λ*))]  = 1  for λ* > 0;
                           ≤ 1  for λ* = 0.

To verify that λ* = 1/6 satisfies this equation, note that X/(λ*X + (1 - λ*)) is 12/7 for X = 2 and 2/7 for X = 1/4. Since each value occurs with probability 1/2, we have

    E[X/(λ*X + (1 - λ*))] = (1/2)(12/7) + (1/2)(2/7) = 1.

The same type of calculation works for the cash investment. The lesson to be learned from this is that the optimality conditions, while simple appearing and highly useful theoretically, are not trivial to interpret or work with.
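A short numerical sketch (my addition) confirms both that λ* = 1/6 maximizes (1 + λ)(1 - 3λ/4) and that the optimality condition holds with equality:

```python
# f(lmb) is the quantity whose log gives the per-epoch growth rate.
f = lambda lmb: (1 + lmb) * (1 - 0.75 * lmb)
lstar = 1/6

# Grid search confirms the maximizer.
grid = [i / 10000 for i in range(10001)]
best = max(grid, key=f)
assert abs(best - lstar) < 1e-3

# Optimality condition E[X/(l*X + 1-l*)] = 1 from (9.136).
ratio = lambda x: x / (lstar * x + (1 - lstar))
expectation = 0.5 * ratio(2) + 0.5 * ratio(0.25)     # gives 12/7 and 2/7 terms
assert abs(expectation - 1.0) < 1e-12

# Per-epoch growth factor 49/48 as claimed.
assert abs(f(lstar) - 49/48) < 1e-12
```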
d) Find the range of λ over which E[L_n(λ)] is positive.

Solution: E[L_n(λ)] is positive where (1 + λ)(1 - 3λ/4) is greater than 1. Solving (1 + λ)(1 - 3λ/4) = 1, we get λ = 1/3 and λ = 0. We then see that E[L_n(λ)] is positive between those limits, i.e., for 0 < λ < 1/3.
e) Find the PMF of the rv Z_n/Z_{n-1} as given in (9.138) for any given λ_n.

Solution: Again let X_n be the price-relative of the double-or-quarter investment and 1 be that for cash. Letting λ_n and 1 - λ_n be the corresponding allocations and λ* and 1 - λ* be the optimal allocations, (9.138) becomes

    Z_n/Z_{n-1} = [λ_n X_n + (1 - λ_n)] / [λ* X_n + (1 - λ*)].

For the two possible values of X_n, this becomes

    Z_n/Z_{n-1} = [2λ_n + (1 - λ_n)]/(7/6) = (1 + λ_n)/(7/6) = 6/7 + 6λ_n/7     for X_n = 2;
    Z_n/Z_{n-1} = [λ_n/4 + (1 - λ_n)]/(7/8) = (1 - 3λ_n/4)/(7/8) = 8/7 - 6λ_n/7  for X_n = 1/4.

Thus, Z_n/Z_{n-1} is 6/7 + 6λ_n/7 with probability 1/2 and 8/7 - 6λ_n/7 with probability 1/2. As a check, E[Z_n/Z_{n-1}] = 1 for all λ_n, since as we have seen {Z_n; n ≥ 1} is a product martingale in this case.
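The product-martingale check at the end can be verified directly for several allocations (my addition):

```python
# Verify the two sample values of Z_n/Z_{n-1} and the martingale property
# E[Z_n/Z_{n-1}] = 1, for a few choices of the allocation lmb.
lstar = 1/6
for lmb in (0.0, 0.1, 0.25, 0.5, 1.0):
    up = (lmb * 2 + (1 - lmb)) / (lstar * 2 + (1 - lstar))           # X_n = 2
    down = (lmb * 0.25 + (1 - lmb)) / (lstar * 0.25 + (1 - lstar))   # X_n = 1/4
    assert abs(up - (6/7 + 6 * lmb/7)) < 1e-12
    assert abs(down - (8/7 - 6 * lmb/7)) < 1e-12
    assert abs(0.5 * up + 0.5 * down - 1.0) < 1e-12                  # martingale
```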
Exercise 9.36: (Kelly's horse-racing game) An interesting special case of this simple theory of investment is the horse-racing game due to J. Kelly and described in [5]. There are ℓ - 1 horses in a race and each horse j wins with some probability p(j) > 0. One and only one horse wins, and if j wins, the gambler receives r(j) > 0 for each dollar bet on j and nothing for the bets on other horses. In other words, the price-relative X(j) for each j, 1 ≤ j ≤ ℓ - 1, is r(j) with probability p(j) and 0 otherwise. For cash, X(ℓ) = 1.

The gambler's allocation of wealth on the race is denoted by λ(j) on each horse j with λ(ℓ) kept in cash. As usual, Σ_j λ(j) = 1 and λ(j) ≥ 0 for 1 ≤ j ≤ ℓ. Note that X(1), . . . , X(ℓ - 1) are highly dependent, since only one has a nonzero sample value in any race.

a) For any given allocation λ, find the expected wealth and the expected log-wealth at the end of a race for unit initial wealth.
Solution: With probability p(j), horse j wins and the resulting value of W_1(λ) is λ(j)r(j) + λ(ℓ). Thus

    E[W_1(λ)] = Σ_{j=1}^{ℓ-1} p(j)[λ(j)r(j) + λ(ℓ)],
    E[L_1(λ)] = Σ_{j=1}^{ℓ-1} p(j) ln[λ(j)r(j) + λ(ℓ)].
b) Assume that a statistically identical sequence of races are run, i.e., X_1, X_2, . . . , are IID where each X_n = (X_n(1), . . . , X_n(ℓ))^T. Assuming unit initial wealth and a constant allocation λ on each race, find the expected log-wealth E[L_n(λ)] at the end of the nth race and express it as nE[Y(λ)].

Solution: Using (9.128) to express E[L_n(λ)] as nE[Y(λ)], we have

    E[Y(λ)] = Σ_{j=1}^{ℓ-1} p(j) ln[λ(j)r(j) + λ(ℓ)].   (A.203)
c) Let λ* maximize E[Y(λ)]. Use the necessary and sufficient condition (9.136) on λ* for horse j, 1 ≤ j < ℓ, to show that λ*(j) can be expressed in the following two equivalent ways; each uniquely specifies each λ*(j) in terms of λ*(ℓ):

    λ*(j) ≥ p(j) - λ*(ℓ)/r(j);  with equality if λ*(j) > 0,   (A.204)

    λ*(j) = max{p(j) - λ*(ℓ)/r(j), 0}.   (A.205)

Solution: The necessary and sufficient condition for λ* in (9.136) for horse j is

    E[X(j)/Σ_k λ*(k)X(k)] ≤ 1;  with equality if λ*(j) > 0.

In the event that horse j wins, X(j) = r(j) while X(k) = 0 for horses k ≠ j. Also X(ℓ) = 1. Thus in the event that horse j wins, X(j)/Σ_k λ*(k)X(k) = r(j)/(λ*(j)r(j) + λ*(ℓ)). If any other horse wins, X(j)/Σ_k λ*(k)X(k) = 0. Thus, since j wins with probability p(j),

    E[X(j)/Σ_k λ*(k)X(k)] = p(j)r(j)/(λ*(j)r(j) + λ*(ℓ)) ≤ 1;  with equality if λ*(j) > 0.   (A.206)

Rearranging this inequality yields (A.204); (A.205) is then easily verified by looking separately at the cases λ*(j) > 0 and λ*(j) = 0.
Solving for λ*(ℓ) (which in turn specifies the other components of λ*) breaks into 3 special cases, which are treated below in parts d), e), and f) respectively. The first case, in (d), shows that if Σ_{j<ℓ} 1/r(j) < 1, then λ*(ℓ) = 0. The second case, in (e), shows that if Σ_{j<ℓ} 1/r(j) > 1, then λ*(ℓ) > min_j p(j)r(j), with the specific value specified by the unique solution to (A.208). The third case, in (f), shows that if Σ_{j<ℓ} 1/r(j) = 1, then λ*(ℓ) is nonunique, and its set of possible values occupy the range [0, min_j p(j)r(j)].

d) Sum (A.204) over j to show that if λ*(ℓ) > 0, then Σ_{j<ℓ} 1/r(j) ≥ 1. Note that the logical obverse of this is that Σ_{j<ℓ} 1/r(j) < 1 implies that λ*(ℓ) = 0 and thus that λ*(j) = p(j) for each horse j.
Solution: Summing (A.204) over j < ℓ and using the fact that Σ_{j<ℓ} λ*(j) = 1 - λ*(ℓ), we get

    1 - λ*(ℓ) ≥ 1 - λ*(ℓ) Σ_{j<ℓ} 1/r(j).

If λ*(ℓ) > 0, this shows that Σ_{j<ℓ} 1/r(j) ≥ 1. The logical obverse is that if Σ_{j<ℓ} 1/r(j) < 1, then λ*(ℓ) = 0. This is the precise way of saying that if the returns on the horses are sufficiently large, then no money should be retained in cash.

When λ*(ℓ) = 0 is substituted into (A.204), we see that each λ*(j) must be positive and thus equal to p(j). This is very surprising, since it says that the allocation of bets does not depend on the rewards r(j) (so long as the rewards are large enough to satisfy Σ_{j<ℓ} 1/r(j) < 1). This will be further explained by example in (g).
e) In (c), λ*(j) for each j < ℓ was specified in terms of λ*(ℓ); here you are to use the necessary and sufficient condition (9.136) on cash to specify λ*(ℓ). More specifically, you are to show that λ*(ℓ) satisfies each of the following two equivalent inequalities:

    Σ_{j<ℓ} p(j)/(λ*(j)r(j) + λ*(ℓ)) ≤ 1;  with equality if λ*(ℓ) > 0,   (A.207)

    Σ_{j<ℓ} p(j)/max[p(j)r(j), λ*(ℓ)] ≤ 1;  with equality if λ*(ℓ) > 0.   (A.208)

Show from (A.208) that if λ*(ℓ) ≤ p(j)r(j) for each j, then Σ_j 1/r(j) ≤ 1. Point out that the logical obverse of this is that if Σ_j 1/r(j) > 1, then λ*(ℓ) > min_j p(j)r(j). Explain why (A.208) has a unique solution for λ*(ℓ) in this case. Note that λ*(j) = 0 for each j such that p(j)r(j) < λ*(ℓ).
Solution: The necessary and sufficient condition for cash (investment ℓ) is

    E[X(ℓ)/Σ_k λ*(k)X(k)] ≤ 1;  with equality if λ*(ℓ) > 0.   (A.209)

In the event that horse j wins, X(ℓ) has the sample value 1 and Σ_k λ*(k)X(k) has the sample value λ*(j)r(j) + λ*(ℓ). Taking the expectation by multiplying by p(j) and summing over j < ℓ, (A.209) reduces to (A.207). Now if we multiply both sides of (A.205) by r(j) and then add λ*(ℓ) to both sides, we get

    λ*(j)r(j) + λ*(ℓ) = max[p(j)r(j), λ*(ℓ)],

which converts (A.207) into (A.208). Now assume that λ*(ℓ) ≤ p(j)r(j) for each j. Then the max in the denominator of the left side of (A.208) is simply p(j)r(j) for each j and (A.208) becomes Σ_{j<ℓ} 1/r(j) ≤ 1. The logical obverse is that Σ_{j<ℓ} 1/r(j) > 1 implies that λ*(ℓ) > min_j p(j)r(j), as was to be shown.

Finally, we must show that if Σ_{j<ℓ} 1/r(j) > 1, then (A.208) has a unique solution for λ*(ℓ). The left side of (A.208), viewed as a function of λ*(ℓ), is Σ_{j<ℓ} 1/r(j) > 1 for λ*(ℓ) = min_j p(j)r(j). This function is continuous and strictly decreasing with further increases in λ*(ℓ) and is less than or equal to 1 at λ*(ℓ) = 1. Thus there must be a unique value of λ*(ℓ) at which (A.208) is satisfied with equality.

It is interesting to observe from (A.205) that λ*(j) = 0 for each j such that p(j)r(j) ≤ λ*(ℓ). In other words, no bets are placed on any horse j for which E[X(j)] < λ*(ℓ). This is in marked contrast to the case in (d), where the allocation does not depend on r(j) (within the assumed range).
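The monotonicity argument above suggests solving (A.208) for λ*(ℓ) by bisection and then recovering the bets from (A.205). The sketch below (my addition; the 3-horse numbers are a hypothetical example with Σ 1/r(j) > 1) does exactly that:

```python
# Hypothetical race: sum(1/r) > 1, so some wealth is held as cash.
p = [0.5, 0.3, 0.2]
r = [3.0, 2.0, 2.0]
assert sum(1/x for x in r) > 1

def lhs(c):                        # left side of (A.208) as a function of lambda*(cash)
    return sum(pj / max(pj * rj, c) for pj, rj in zip(p, r))

# lhs is continuous and non-increasing on [min p(j)r(j), 1]; bisect for lhs = 1.
lo, hi = min(pj * rj for pj, rj in zip(p, r)), 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if lhs(mid) > 1:
        lo = mid
    else:
        hi = mid
cash = (lo + hi) / 2
lam = [max(pj - cash/rj, 0.0) for pj, rj in zip(p, r)]   # (A.205)

assert abs(sum(lam) + cash - 1.0) < 1e-9                 # allocations sum to 1
# horse 1 gets a positive bet, so (A.206) holds with equality for it:
assert abs(p[0]*r[0] / (lam[0]*r[0] + cash) - 1.0) < 1e-9
```

For these numbers the solver gives cash = 3/4 and bets only on horse 1 (the only horse with p(j)r(j) > λ*(ℓ)), illustrating the remark that losing horses with p(j)r(j) ≤ λ*(ℓ) receive no bet.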
f) Now consider the case in which Σ_{j<ℓ} 1/r(j) = 1. Show that (A.208) is satisfied with equality for each choice of λ*(ℓ), 0 ≤ λ*(ℓ) ≤ min_{j<ℓ} p(j)r(j).

Solution: Note that max[p(j)r(j), λ*(ℓ)] = p(j)r(j) over the given range of λ*(ℓ), so the left side of (A.208) is Σ_{j<ℓ} 1/r(j) = 1 over this range. Thus the equality in (A.208) is satisfied for all λ*(ℓ) in this range. From (A.204), λ*(j) = p(j) - λ*(ℓ)/r(j) can be used for each j < ℓ to see that all the necessary and sufficient conditions are satisfied for maximizing E[Y(λ)].
g) Consider the special case of a race with only two horses. Let p(1) = p(2) = 1/2. Assume that r(1) and r(2) are large enough to satisfy 1/r(1) + 1/r(2) < 1; thus no cash allotment is used in maximizing E[Y(λ)]. With λ(3) = 0, we have

    E[Y(λ)] = (1/2) ln[λ(1)r(1)] + (1/2) ln[λ(2)r(2)] = (1/2) ln[λ(1)r(1)(1 - λ(1))r(2)].   (A.210)

Use this equation to give an intuitive explanation for why λ*(1) = 1/2, independent of r(1) and r(2).

Solution: Suppose that r(1) >> r(2). Choosing λ(1) to be large so as to enhance the profit when horse 1 wins is counter-productive, since (A.210) shows that there is a corresponding loss when horse 2 wins. This gain and loss cancel each other in the expected log-wealth. To view this slightly differently, if each horse wins n/2 times, W_n is given by

    W_n = λ(1)^{n/2} (1 - λ(1))^{n/2} r(1)^{n/2} r(2)^{n/2},

which again makes it clear that λ*(1) does not depend on r(1) and r(2).
h) Again consider the special case of two horses with p(1) = p(2) = 1/2, but let r(1) = 3 and r(2) = 3/2. Show that λ* is nonunique, with (1/2, 1/2, 0) and (1/4, 0, 3/4) as possible values. Show that if r(2) > 3/2, then the first solution above is unique, and if r(2) < 3/2, then the second solution is unique, assuming p(1) = 1/2 and r(1) = 3 throughout. Note: When 3/2 < r(2) < 2, this is an example where an investment is used to maximize log-wealth even though E[X(2)] = p(2)r(2) < 1, i.e., horse 2 is a lousy investment, but is preferable to cash in this case as a hedge against horse 1 losing.

Solution: Approach 1: Substitute λ* = (1/2, 1/2, 0)^T and then (1/4, 0, 3/4)^T into the necessary and sufficient conditions; each satisfies those conditions. Approach 2: Note that 1/r(1) + 1/r(2) = 1. Thus, from (f), each of these values satisfies the conditions.

Both choices of λ* here lead to the same rv, i.e., Y(λ*) = ln[3/2] for the event that horse 1 wins and Y(λ*) = ln[3/4] for the event that horse 2 wins. In other words, the maximizing rv Y(λ*) is uniquely specified, even though λ* is not unique. All points on the straight line between these two choices of λ*, i.e., (1/2 - α, 1/2 - 2α, 3α)^T for 0 ≤ α ≤ 1/4, also lead to the same optimizing Y(λ*).

For r(2) > 3/2, we have 1/r(1) + 1/r(2) < 1, so from (d), the solution (1/2, 1/2, 0) is valid and in this case unique. This can also be seen by substituting this choice of λ* into the necessary and sufficient conditions, first with r(2) > 3/2 and then with r(2) < 3/2.

From (e), the choice λ* = (1/4, 0, 3/4) is the unique solution for 1/r(1) + 1/r(2) > 1, i.e., for r(2) < 3/2. This can be recognized as the allocation that maximizes E[Y(λ)] for the triple-or-nothing investment.
i) For the case where Σ_{j<ℓ} 1/r(j) = 1, define q(j) = 1/r(j) as a PMF on {1, . . . , ℓ - 1}. Show that E[Y(λ*)] = D(p ∥ q) for the conditions of (f). Note: To interpret this, we could view a horse race where each horse j has probability q(j) of winning the reward r(j) = 1/q(j) as a 'fair game'. Our gambler, who knows that the true probabilities are {p(j); 1 ≤ j < ℓ}, then has 'inside information' if p(j) ≠ q(j), and can establish a positive rate of return equal to D(p ∥ q).

Solution: From (f), (p(1), . . . , p(ℓ-1), 0)^T is one choice for the optimizing λ*. Using this choice,

    E[Y(λ*)] = Σ_{j<ℓ} p(j) ln[p(j)r(j)] = D(p ∥ q).

To further clarify the notion of a fair game, put on rose-colored glasses to envision a race track that simply accumulates the bets from all gamblers and distributes all income to the bets on the winning horse. In this sense, q(j) = 1/r(j) is the 'odds' on horse j as established by the aggregate of the gamblers. Fairness is not a word that is always used the same way, and here, rather than meaning anything about probabilities and expectations, it simply refers to the unrealistic assumption that neither the track nor the horse owners receive any expected return from the betting.
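The identity E[Y(λ*)] = D(p ∥ q) is easy to confirm numerically (my addition; the fair-odds numbers below are a hypothetical example):

```python
from math import log

# Fair-odds race: q(j) = 1/r(j) sums to 1; the bettor knows the true PMF p.
p = [0.5, 0.3, 0.2]
r = [2.0, 4.0, 4.0]                    # 1/2 + 1/4 + 1/4 = 1
q = [1/x for x in r]
assert abs(sum(q) - 1.0) < 1e-12

# E[Y(lambda*)] with the proportional bet lambda*(j) = p(j) and no cash.
growth = sum(pj * log(pj * rj) for pj, rj in zip(p, r))
# KL divergence D(p || q).
kl = sum(pj * log(pj / qj) for pj, qj in zip(p, q))
assert abs(growth - kl) < 1e-12
assert growth > 0                      # positive rate of return since p != q
```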
Exercise 9.37: (Proof of Theorem 9.10.2) Let Z_n = (1/n)L_n(λ) - E[Y(λ)] and note that lim_{n→∞} Z_n = 0 WP1. Recall that this means that the set of sample points ω for which lim Z_n(ω) = 0 has probability 1. Let ε > 0 be arbitrary but fixed throughout the exercise and let n_o ≥ 1 be an integer. Let A(n_o, ε) = {ω : |Z_n(ω)| ≤ ε for all n ≥ n_o}.

a) Consider an ω for which lim_{n→∞} Z_n(ω) = 0 and explain why ω ∈ A(n_o, ε) for some n_o.

Solution: By the definition of the limit (to 0) of a sequence of real numbers, i.e., of lim Z_n(ω), there must be an n_o large enough that |Z_n(ω)| ≤ ε for all n ≥ n_o. Clearly, ω ∈ A(n_o, ε) for that n_o.
b) Show that Pr{∪_{n_o=1}^∞ A(n_o, ε)} = 1.

Solution: The set of ω for which lim_{n→∞} Z_n(ω) = 0 has probability 1, and, from (a), each such ω is in A(n_o, ε) for some n_o. Thus each such ω is in the above union, so that the union has probability 1.
c) Show that for any δ > 0, there is an n_o large enough that Pr{A(n_o, ε)} > 1 - δ. Hint: Use (1.9).

Solution: The sequence {A(n_o, ε)} is nested in n_o, i.e., A(n_o, ε) ⊆ A(n_o + 1, ε) for all n_o ≥ 1. Thus, from (1.9), lim_{n_o→∞} Pr{A(n_o, ε)} = 1. Using the definition of a limit of real numbers (plus the fact that probabilities are at most equal to 1), it follows that for any δ > 0, there is an n_o large enough that Pr{A(n_o, ε)} > 1 - δ.
d) Show that for any ε > 0 and δ > 0, there is an n_o such that

    Pr{ exp[n(E[Y(λ)] - ε)] ≤ W_n(λ) ≤ exp[n(E[Y(λ)] + ε)] for all n ≥ n_o } > 1 - δ.

Solution: (c) applies to any given ε, and (from the definition of A(n_o, ε)) says that, for any ε, δ > 0, there is an n_o such that Pr{|Z_n| ≤ ε for all n ≥ n_o} > 1 - δ. Substituting the definition of Z_n into this,

    Pr{ n(E[Y(λ)] - ε) ≤ L_n(λ) ≤ n(E[Y(λ)] + ε) for all n ≥ n_o } > 1 - δ.

Finally, by exponentiating the terms above, we get the desired expression.
Exercise 9.38: In this exercise, you are to express the buy-and-hold strategy and the two-pile strategy of (9.137) as special cases of a time-varying allocation.

a) Suppose the buy-and-hold strategy starts with an allocation λ_1. Let W_n(k) be the investor's wealth in investment k at time n. Show that

    W_n(k) = λ_1(k) ∏_{m=1}^n X_m(k).

Solution: There is hardly anything to be shown. λ_1(k) is the original investment, and each X_m(k) is the ratio of the wealth in k at the end of epoch m to that at the beginning, so the product gives the final wealth at epoch n.
b) Show that λ_n(k) can be expressed in each of the following ways:

    λ_n(k) = W_{n-1}(k)/Σ_j W_{n-1}(j) = λ_{n-1}(k)X_{n-1}(k)/Σ_j λ_{n-1}(j)X_{n-1}(j).   (A.211)

Solution: Note that λ_n(k) is the fraction of total wealth in investment k at the beginning of epoch n. Thus, as shown in the first equality above, it is the wealth in k at the end of epoch n-1 divided by the total wealth at the end of epoch n-1. For the second equality, we start by rewriting the terms in the first equality as

    λ_n(k) = W_{n-2}(k)X_{n-1}(k) / Σ_j W_{n-2}(j)X_{n-1}(j)
           = [W_{n-2}(k)/Σ_i W_{n-2}(i)] X_{n-1}(k) / Σ_j [W_{n-2}(j)/Σ_i W_{n-2}(i)] X_{n-1}(j)
           = λ_{n-1}(k)X_{n-1}(k) / Σ_j λ_{n-1}(j)X_{n-1}(j),

where in the last equality we have used the first equality in (A.211), applied to n - 2.

Note that there is no real reason for wanting to know λ_n(k) for any given investment strategy, other than simply convincing oneself that λ_n(k) (which depends on X_m(i) for all m < n and investments i) exists for any strategy and in fact specifies any given strategy.
c) For the two-pile strategy of (9.137), verify (9.137) and find an expression for λ_n(k).

Solution: We omit the parenthesis for k since there is only one investment (other than cash). For a constant allocation strategy, we see from (9.126) that E[W_n(λ)] = (1 + λ/2)^n. The two-pile strategy invests 1/2 the initial capital in a constant allocation with λ = 1 and 1/2 with λ = 1/4. Thus the resulting expected capital is

    E[W_n] = (1/2)(1 + 1/2)^n + (1/2)(1 + 1/8)^n,

as desired.

To find E[L_n], note that for the n-tuple X_m = 3 for 1 ≤ m ≤ n, we have L_n = ln[(1/2)3^n + (1/2)(3/2)^n]. This n-tuple occurs with probability 2^{-n} and thus this value of L_n can be neglected in approximating E[L_n]. For all other n-tuples, W_n is 1/2 its value with the constant allocation λ = 1/4, so L_n is reduced by ln(2) from the pure strategy value, giving rise to (9.137).

To find λ_n, we look first at λ_1 and λ_2. Since each pile initially has 1/2 (of the initial investment), and 1/4 of the first pile is bet, and all of the second pile is bet, λ_1 = (1/2)(1/4) + 1/2 = 5/8.

Now note that λ_2 is a rv depending on X_1. If X_1 = 3, then the piles contain 3/4 and 3/2 respectively. Since 1/4 of the first pile is bet and all of the second pile is bet, we have 3/16 + 3/2 bet overall out of 3/4 + 3/2. Thus, for X_1 = 3, we have λ_2 = 3/4. For X_1 = 0, only the first pile is non-empty and λ_2 = 1/4.

In general, λ_n = 1/4 except where X_m = 3 for 1 ≤ m < n, and then

    λ_n = (1 + 2^{n+1}) / (4 + 2^{n+1}).
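The closed form for λ_n along the all-wins path can be checked by tracking the two piles directly (my addition, not part of the original solution):

```python
# Track the two piles along the all-wins path (X_m = 3 for all m).
p1, p2 = 0.5, 0.5                 # pile with lambda = 1/4, pile with lambda = 1
for n in range(1, 8):
    # fraction of total wealth bet at epoch n
    lam_n = (0.25 * p1 + p2) / (p1 + p2)
    assert abs(lam_n - (1 + 2**(n + 1)) / (4 + 2**(n + 1))) < 1e-12
    # both piles win (X_n = 3): the bet portion triples, the rest stays in cash
    p1 = 0.75 * p1 + 3 * 0.25 * p1     # = 1.5 * p1
    p2 = 3 * p2
```

At n = 1 this gives λ_1 = 5/8 and at n = 2 (given X_1 = 3) it gives λ_2 = 3/4, matching the values computed by hand above.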
Exercise 9.39: a) Consider the martingale {Z_n; n ≥ 1} where Z_n = W_n/W*_n and W_n is the wealth at time n for the pure triple-or-nothing strategy of Example 9.10.1 and W*_n is the wealth using the constant allocation λ*. Find the PMF of Z where Z = lim_{n→∞} Z_n WP1.

Solution: The martingale convergence theorem applies, so Z actually exists as a rv. Actually, the power of the martingale convergence theorem is wasted here, since it is obvious that lim_{n→∞} Z_n(ω) = 0 for all sample sequences ω except that for which X_n = 3 for all n. Thus Z is the constant rv Z = 0 WP1.
b) Now consider the two-pile strategy of (9.137) and find the PMF of the limiting rv Z for this strategy.

Solution: The martingale convergence theorem again applies, and again is unnecessary since Z_n(ω) = 1/2 for all ω other than those for which X_m = 3 for all m ≤ n. Thus Z = lim_{n→∞} Z_n = 1/2 WP1.
c) Now consider a 'tempt-the-fates' strategy where one uses the pure triple-or-nothing strategy for the first 3 epochs and uses the constant allocation λ* thereafter. Again find the PMF of the limiting rv Z.

Solution: This is slightly less trivial, since Z is now a non-trivial (i.e., non-constant) rv. The martingale convergence theorem again applies, although the answer can be seen without using that theorem. If X_1 = 3, then W_1 = 3 and W*_1 = 3/2, so Z_1 = 2. If X_1 = 0, then Z_n = 0 for all n ≥ 1.

If X_1 = X_2 = 3, then W_2 = 9 and W*_2 = (3/2)^2, so Z_2 = 4. If X_2 = 0, then Z_n = 0 for all n ≥ 2. Similarly, if X_m = 3 for 1 ≤ m ≤ 3, then Z_3 = 8, and otherwise Z_n = 0 for all n ≥ m.

Since the allocation λ* is used for m > 3, we see that Z_m = Z_3 for m ≥ 3. Thus Z = lim_{n→∞} Z_n = 8 with probability 1/8 (i.e., if X_1 = X_2 = X_3 = 3) and Z = 0 with probability 7/8.
Exercise 9.40: Consider the Markov modulated random walk in the figure below. The random variables Y_n in this example take on only a single value for each transition, that value being 1 for all transitions from state 1, 10 for all transitions from state 2, and 0 otherwise. ε > 0 is a very small number, say ε < 10^{-6}.

[Figure: a three-state chain on states 0, 1, 2. From state 0, transitions to state 1 and to state 2 each have probability 1/2; state 1 has a self-loop of probability 1 - ε and a transition back to state 0 of probability ε; state 2 likewise has a self-loop of probability 1 - ε and a transition to state 0 of probability ε.]
a) Show that the steady-state gain per transition is 5.5/(1 + ε). Show that the relative-gain vector is

    w = (0, (ε - 4.5)/(ε(1 + ε)), (10ε + 4.5)/(ε(1 + ε)))^T.   (A.212)

Solution: Observe that the Markov chain goes from state 0 to either 1 or 2 with equal probability, stays there for a very long, geometrically distributed time, then goes back to 0 for another equiprobable choice between 1 and 2. The chain is a birth-death chain, so its steady-state probabilities are easily calculated to be

    π_0 = ε/(1 + ε),  π_1 = 1/(2(1 + ε)),  π_2 = 1/(2(1 + ε)).

The steady-state gain per transition is then g = 5.5/(1 + ε). The relative-gain equation is w + ge = Ȳ + [P]w. Solving this with w_0 = 0, we get (A.212).
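Both the steady-state gain and the relative-gain vector (A.212) can be verified numerically (my addition; ε = 10^{-3} is used here purely for illustration):

```python
eps = 1e-3
P = [[0.0, 0.5, 0.5],
     [eps, 1 - eps, 0.0],
     [eps, 0.0, 1 - eps]]
Y = [0.0, 1.0, 10.0]                 # reward on transitions out of each state

pi = [eps/(1 + eps), 0.5/(1 + eps), 0.5/(1 + eps)]
# stationarity check: pi P = pi
for j in range(3):
    assert abs(sum(pi[i] * P[i][j] for i in range(3)) - pi[j]) < 1e-12

g = sum(pi[i] * Y[i] for i in range(3))
assert abs(g - 5.5/(1 + eps)) < 1e-12

w = [0.0, (eps - 4.5)/(eps*(1 + eps)), (10*eps + 4.5)/(eps*(1 + eps))]
# relative-gain equation: w_i + g = Y_i + sum_j P_ij w_j
for i in range(3):
    lhs = w[i] + g
    rhs = Y[i] + sum(P[i][j] * w[j] for j in range(3))
    assert abs(lhs - rhs) < 1e-6
```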
b) Let S_n = Y_0 + Y_1 + · · · + Y_{n-1} and take the starting state X_0 to be 0. Let J be the smallest value of n for which S_n ≥ 100. Find Pr{J = 11} and Pr{J = 101}. Find an estimate of E[J] that is exact in the limit ε → 0.

Solution: If the first transition goes to state 2 and the state remains in state 2 for transitions 1 to 10, then S_11 = 100. Since Y_i ≤ 10 and Y_0 = 0, this is the smallest value of n for which S_n ≥ 100, and Pr{J = 11} = (1/2)(1 - ε)^{10}. As ε → 0, Pr{J = 11} → 1/2. If the first transition goes to state 1 and no transitions out of state 1 occur for the next 100 transitions, then the threshold is crossed at epoch 101 and Pr{J = 101} ≈ (1/2)(1 - ε)^{100}, which is 1/2 in the limit ε → 0. This is an approximation because there are other low probability ways of crossing the threshold at time 101. The most probable of these is to stay in state 1 for 98 transitions and then go to 0 and then 2. The probability of this is (1/4)ε(1 - ε)^{98}. In the limit ε → 0, J = 11 with probability 1/2 and J = 101 with probability 1/2. Thus lim_{ε→0} E[J] = 11/2 + 101/2 = 56.
c) Show that $\Pr\{X_J = 1\} = (1 - 45\epsilon + o(\epsilon))/2$ and that $\Pr\{X_J = 2\} = (1 + 45\epsilon + o(\epsilon))/2$. Verify, to first order in $\epsilon$, that (9.159) is satisfied.
Solution: We can consider only sample paths that include at most one $\epsilon$-transition, since all other paths have probability $o(\epsilon)$. In order to stop in state 2 (i.e., satisfy $\{X_J = 2\}$), there is one sample path of probability $(1/2)(1-\epsilon)^{10} = 1/2 - 5\epsilon + o(\epsilon)$, and we see that $5\epsilon + o(\epsilon)$ is the probability of going to state 2 initially and then returning to 0 before stopping. Half of these sample segments then go from 0 to 2, a set of paths of probability $5\epsilon/2 + o(\epsilon)$. In the same way, there is a set of paths of probability $50\epsilon/2 + o(\epsilon)$ that go from 0 to 1 initially, return to 0 before stopping, and then go to state 2. Adding these terms together,

$$\Pr\{X_J = 2\} = 1/2 - 5\epsilon + 5\epsilon/2 + 50\epsilon/2 + o(\epsilon) = 1/2 + 45\epsilon/2 + o(\epsilon).$$

Then $\Pr\{X_J = 1\} = 1/2 - 45\epsilon/2 + o(\epsilon)$.
Exercise 9.42: Let $\{Z_n;\, n \geq 1\}$ be a martingale, and for some integer $m$, let $Y_n = Z_{n+m} - Z_m$.

a) Show that $\mathsf{E}[Y_n \mid Z_{n+m-1} = z_{n+m-1},\, Z_{n+m-2} = z_{n+m-2},\, \ldots,\, Z_m = z_m,\, \ldots,\, Z_1 = z_1] = z_{n+m-1} - z_m$.
Solution: This is more straightforward if the desired result is written in the more abbreviated form

$$\mathsf{E}[Y_n \mid Z_{n+m-1}, Z_{n+m-2}, \ldots, Z_m, \ldots, Z_1] = Z_{n+m-1} - Z_m.$$

Since $Y_n = Z_{n+m} - Z_m$, the left side above is

$$\mathsf{E}[Z_{n+m} - Z_m \mid Z_{n+m-1}, \ldots, Z_1] = Z_{n+m-1} - \mathsf{E}[Z_m \mid Z_{n+m-1}, \ldots, Z_m, \ldots, Z_1].$$

Given sample values for each conditioning rv on the right of the above expression, and particularly given that $Z_m = z_m$, the expected value of $Z_m$ is simply the conditioning value $z_m$ for $Z_m$. This is one of those strange things that are completely obvious, and yet somehow obscure. We then have $\mathsf{E}[Y_n \mid Z_{n+m-1}, \ldots, Z_1] = Z_{n+m-1} - Z_m$.
b) Show that $\mathsf{E}[Y_n \mid Y_{n-1} = y_{n-1}, \ldots, Y_1 = y_1] = y_{n-1}$.

Solution: In abbreviated form, we want to show that $\mathsf{E}[Y_n \mid Y_{n-1}, \ldots, Y_1] = Y_{n-1}$. We showed in (a) that $\mathsf{E}[Y_n \mid Z_{n+m-1}, \ldots, Z_1] = Y_{n-1}$. For each sample point $\omega$, $Y_{n-1}(\omega), \ldots, Y_1(\omega)$ is a function of $Z_{n+m-1}(\omega), \ldots, Z_1(\omega)$. Thus, the rv $\mathsf{E}[Y_n \mid Z_{n+m-1}, \ldots, Z_1]$ specifies the rv $\mathsf{E}[Y_n \mid Y_{n-1}, \ldots, Y_1]$, which then must be $Y_{n-1}$.
c) Show that $\mathsf{E}[|Y_n|] < \infty$. Note that b) and c) show that $\{Y_n;\, n \geq 1\}$ is a martingale.

Solution: Since $Y_n = Z_{n+m} - Z_m$, we have $|Y_n| \leq |Z_{n+m}| + |Z_m|$. Since $\{Z_n;\, n \geq 1\}$ is a martingale, $\mathsf{E}[|Z_n|] < \infty$ for each $n$, so

$$\mathsf{E}[|Y_n|] \leq \mathsf{E}[|Z_{n+m}|] + \mathsf{E}[|Z_m|] < \infty.$$
Exercise 9.43: This exercise views the continuous-time branching process of Exercise 7.15 as a stopped random walk. Recall that the process was specified there as a Markov process such that for each state $j$, $j \geq 0$, the transition rate to $j+1$ is $j\lambda$ and to $j-1$ is $j\mu$. There are no other transitions, and in particular, there are no transitions out of state 0, so that the Markov process is reducible. Recall that the embedded Markov chain is the same as the embedded chain of an M/M/1 queue except that there is no transition from state 0 to state 1.

a) To model the possible extinction of the population, convert the embedded Markov chain above to a stopped random walk, $\{S_n;\, n \geq 0\}$. The stopped random walk starts at $S_0 = 0$ and stops on reaching a threshold at $-1$. Before stopping, it moves up by one with probability $\frac{\lambda}{\lambda+\mu}$ and downward by 1 with probability $\frac{\mu}{\lambda+\mu}$ at each step. Give the (very simple) relationship between the state $X_n$ of the Markov chain and the state $S_n$ of the stopped random walk for each $n \geq 0$.
Solution: Consider a random walk where $S_n = Y_1 + \cdots + Y_n$ and the sequence $\{Y_n;\, n \geq 1\}$ is IID with the PMF

$$p_{Y_n}(1) = \frac{\lambda}{\lambda+\mu} \qquad \text{and} \qquad p_{Y_n}(-1) = \frac{\mu}{\lambda+\mu}.$$

We note that $X_n$ in the Markov chain is the sum of 1 (for the initial organism) and $\pm 1$ for each subsequent birth or death transition. Since each of those birth and death transitions in the Markov chain is an IID random variable so long as the number of organisms is positive, we see that $S_n = X_n - 1$ so long as the number of organisms is positive. When the number of organisms goes to 0, the number of deaths exceeds the number of births by 1, so $S_n = -1$. The stopped process, stopping when $S_n = -1$, is then identical to the Markov chain, which stays in state 0 after it is once entered.
b) Find the probability that the population eventually dies out as a function of $\lambda$ and $\mu$. Be sure to consider all three cases $\lambda > \mu$, $\lambda < \mu$, and $\lambda = \mu$.
Solution: The probability $P_d$ that the population dies out is the probability that the random walk in (a) crosses a threshold at $-1$. If $\lambda < \mu$, the random walk has a negative drift (i.e., the rv's $Y_n$ of (a) have a negative mean), so the walk crosses the threshold eventually with probability 1 and $P_d = 1$.

For $\lambda > \mu$, the random walk has a positive drift and the problem is identical to that of the negative of the random walk (where the step $-Y$ has a negative drift) crossing a positive threshold at $+1$. The probability of threshold crossing (using the Wald identity and Corollary 9.4.4) is upper bounded by $P_d \leq e^{-r^*}$, where $r^*$ is the $r$ at which $\gamma_{-Y}(r^*) = 0$. Solving for $r^*$, we get $r^* = \ln(\mu/\lambda)$, so $P_d \leq \mu/\lambda$. Since the random walk changes by $\pm 1$ at each step, and the threshold is an integer, this inequality is satisfied with equality, so $P_d = \mu/\lambda$. This same analysis works for $\lambda = \mu$, so $P_d = 1$ in that case.

An alternative approach is to note that this negative random walk with a threshold at $+1$ is the same as Example 5.5.4 (Stop when you're ahead in coin tossing).
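The closed form $P_d = \min(1, \mu/\lambda)$ is easy to check by simulating the stopped walk directly. The following sketch truncates each walk at an arbitrary horizon; with positive drift, walks still alive at the horizon have negligible probability of dying later, so the truncation bias is small.

```python
import random

def extinction_prob(lam, mu, trials=5000, horizon=500, seed=1):
    """Estimate the probability that the walk S_n (step +1 w.p. lam/(lam+mu),
    step -1 w.p. mu/(lam+mu)) ever reaches -1, by direct simulation."""
    rng = random.Random(seed)
    p_up = lam / (lam + mu)
    died = 0
    for _ in range(trials):
        s = 0
        for _ in range(horizon):
            s += 1 if rng.random() < p_up else -1
            if s == -1:                      # threshold at -1 crossed: extinction
                died += 1
                break
    return died / trials
```

For example, with $\lambda = 2\mu$ the estimate should be near $\mu/\lambda = 1/2$, and with $\lambda < \mu$ it should be near 1.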
A.10 Solutions for Chapter 10
Exercise 10.1: a) Consider the joint probability density $f_{XZ}(x,z) = e^{-z}$ for $0 \leq x \leq z$ and $f_{XZ}(x,z) = 0$ otherwise. Find the pair $x, z$ of values that maximize this density. Find the marginal density $f_Z(z)$ and find the value of $z$ that maximizes this.

Solution: $e^{-z}$ has value 1 when $x = z = 0$, and the joint density is smaller whenever $z > 0$, and is zero when $z < 0$, so $f_{XZ}(x,z)$ is maximized by $x = z = 0$. The marginal density is found by integrating $f_{XZ}(x,z) = e^{-z}$ over $x$ in the range 0 to $z$, so $f_Z(z) = z e^{-z}$ for $z \geq 0$. This is maximized at $z = 1$.
b) Let $f_{XZY}(x,z,y)$ be $y^2 e^{-yz}$ for $0 \leq x \leq z$, $1 \leq y \leq 2$ and be 0 otherwise. Conditional on an observation $Y = y$, find the joint MAP estimate of $X, Z$. Find $f_{Z|Y}(z|y)$, the marginal density of $Z$ conditional on $Y = y$, and find the MAP estimate of $Z$ conditional on $Y = y$.

Solution: The joint MAP estimate is the value of $x, z$ in the range $0 \leq x \leq z$ that maximizes $f_{XZ|Y}(x,z|y) = f_{XZY}(x,z,y)/f_Y(y) = (y^2 e^{-yz})/f_Y(y)$. For any given $y$, $1 \leq y \leq 2$, this is maximized, as in (a), for $x = z = 0$. Next, integrating $f_{XZY}(x,z,y)$ over $x$ from 0 to $z$, we get $f_{ZY}(z,y) = y^2 z e^{-yz}$. Integrating over $z$, we get $f_Y(y) = 1$ for $1 \leq y \leq 2$. Thus

$$f_{Z|Y}(z|y) = f_{ZY}(z,y) = y^2 z e^{-yz}.$$

This is maximized by $z = 1/y$, which is thus the MAP estimate for $Z$ alone.

This shows that MAP estimation on joint rv's does not necessarily agree with the MAP estimates of the individual rv's. This illustrates that MAP estimates do not necessarily have the properties one might desire for estimation. More specifically, estimation, as distinguished from hypothesis testing, usually involves approximation of the desired quantity, and MAP makes no allowance for closeness in the approximation.
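The disagreement between the joint and the marginal MAP estimates can be seen numerically with a simple grid search; the sample value $y = 1.5$ below is an arbitrary choice in $[1,2]$.

```python
import math

y = 1.5                                           # arbitrary sample value in [1, 2]

# Joint conditional density f_{XZ|Y}(x,z|y) is proportional to y^2 exp(-y z) on 0 <= x <= z.
def f_joint(x, z):
    return y*y*math.exp(-y*z) if 0.0 <= x <= z else 0.0

# Marginal conditional density f_{Z|Y}(z|y) = y^2 z exp(-y z) for z >= 0.
def f_marg(z):
    return y*y*z*math.exp(-y*z) if z >= 0.0 else 0.0

grid = [i/1000.0 for i in range(3001)]            # z values in [0, 3]
# For fixed z the joint density is flat in x, so maximize over z at x = 0.
joint_max_z = max(grid, key=lambda z: f_joint(0.0, z))
# The marginal density of Z peaks elsewhere, at z = 1/y.
marg_max_z = max(grid, key=f_marg)
```

The joint maximum sits at $x = z = 0$ while the marginal maximum sits near $z = 1/y$, matching the solution above.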
Exercise 10.2: Let $Y = X + Z$ where $X$ and $Z$ are IID and $\mathcal{N}(0,1)$. Let $U = Z - X$.

a) Explain (using the results of Chapter 3) why $Y$ and $U$ are jointly Gaussian and why they are statistically independent.

Solution: Since $Y$ and $U$ are both linear combinations of $X, Z$, they are jointly Gaussian by Definition 3.3.1. Since $\mathsf{E}[YU] = 0$, i.e., $Y$ and $U$ are uncorrelated, and since they are jointly Gaussian, they are independent.
b) Without using any matrices, write down the joint probability density of $Y$ and $U$. Verify your answer from (3.22).

Solution: Since $Y$ and $U$ are independent and are each $\mathcal{N}(0,2)$, the joint density is

$$f_{YU}(y,u) = \frac{1}{4\pi}\exp\left(-\frac{y^2 + u^2}{4}\right).$$

Since the covariance matrix of $Y$ and $U$ is $[K] = \left[\begin{smallmatrix} 2 & 0 \\ 0 & 2 \end{smallmatrix}\right]$, this is the same as (3.22).
c) Find the MMSE estimate $\hat{x}(y)$ of $X$ conditional on a given sample value $y$ of $Y$. You can derive this from first principles, or use (10.9) or Example 10.2.2.

Solution: From first principles, $\hat{x}(y) = \mathsf{E}[X \mid Y=y]$. To find this, we first find $f_{X|Y}(x|y)$.

$$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\,f_X(x)}{f_Y(y)} = \frac{1}{\sqrt{\pi}}\exp\left(-\frac{(y-x)^2}{2} - \frac{x^2}{2} + \frac{y^2}{4}\right) = \frac{1}{\sqrt{\pi}}\exp\left(-(x - y/2)^2\right).$$

Thus, given $Y = y$, $X \sim \mathcal{N}(y/2,\, 1/2)$. It follows that $\hat{x}(y) = \mathsf{E}[X \mid Y=y] = y/2$.
d) Show that the estimation error $\xi = X - \hat{X}(Y)$ is equal to $-U/2$.

Solution: From (c), $\hat{X}(Y) = Y/2$, so

$$\xi = X - \hat{X}(Y) = X - (X+Z)/2 = (X - Z)/2 = -U/2.$$
e) Write down the probability density of $U$ conditional on $Y = y$ and that of $\xi$ conditional on $Y = y$.

Solution: Since $U$ and $Y$ are statistically independent,

$$f_{U|Y}(u|y) = f_U(u) = \frac{1}{\sqrt{4\pi}}\exp\left(-u^2/4\right).$$

Since $\xi = -U/2$ (or since $\xi$, conditional on $Y = y$, is the fluctuation of $X$, conditional on $Y = y$),

$$f_{\xi|Y}(\xi|y) = f_\xi(\xi) = \frac{1}{\sqrt{\pi}}\exp\left(-\xi^2\right).$$
f) Draw a sketch, in the $x, z$ plane, of the equiprobability contours of $X$ and $Z$. Explain why these are also equiprobability contours for $Y$ and $U$. For some given sample value of $Y$, say $Y = 1$, illustrate the set of points for which $x + z = 1$. For a given point on this line, illustrate the sample value of $U$.

Solution: The equiprobability contours of $X, Z$ are circles centered on the origin. Since $y^2 + u^2 = 2(x^2 + z^2)$, they are also equiprobability contours of $Y, U$.

[Figure: circles centered at the origin in the $x, z$ plane. The $y$ axis is the line $\{(x,z): x = z\}$ (where $u = 0$) and the $u$ axis is the line $\{(x,z): z = -x\}$ (where $y = 0$); the line $\{(x,z): x + z = 1\}$ is the set $\{(y,u): y = 1\}$, and a point on this line determines the sample value of $u$.]

The point of the problem, in associating the estimation error with $-u/2$, is to give a graphical explanation of why the estimation error is independent of the estimate. $Y$ and $U$ are independent since the $y$ axis and the $u$ axis are at 45° rotations from the $x$ and $z$ axes, and thus 90° from each other.
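The algebraic facts in (c)-(e) are easy to confirm by Monte Carlo: the error $X - Y/2$ equals $-U/2$ sample by sample, is uncorrelated with $Y$, and has variance $1/2$. A sketch (sample size and seed are arbitrary choices):

```python
import random

rng = random.Random(0)
n = 100000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x + z for x, z in zip(xs, zs)]
us = [z - x for x, z in zip(xs, zs)]

# Estimation error xi = X - Y/2 equals -U/2 exactly, sample by sample.
xis = [x - y/2.0 for x, y in zip(xs, ys)]
max_dev = max(abs(xi + u/2.0) for xi, u in zip(xis, us))

# Y and U are uncorrelated (hence independent, being jointly Gaussian).
corr_yu = sum(y*u for y, u in zip(ys, us)) / n

# The error variance matches xi ~ N(0, 1/2).
var_xi = sum(xi*xi for xi in xis) / n
```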
Exercise 10.3: a) Let $X, Z_1, Z_2, \ldots, Z_n$ be independent zero-mean Gaussian rv's with variances $\sigma_X^2, \sigma_{Z_1}^2, \ldots, \sigma_{Z_n}^2$ respectively. Let $Y_j = h_j X + Z_j$ for $j \geq 1$ and let $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\mathsf T}$. Use (10.9) to show that the MMSE estimate of $X$ conditional on $\mathbf{Y} = \mathbf{y} = (y_1, \ldots, y_n)^{\mathsf T}$ is given by

$$\hat{x}(\mathbf{y}) = \sum_{j=1}^n g_j y_j; \qquad \text{where} \qquad g_j = \frac{h_j/\sigma_{Z_j}^2}{(1/\sigma_X^2) + \sum_{i=1}^n h_i^2/\sigma_{Z_i}^2}. \qquad \text{(A.213)}$$

Hint: Let the row vector $\mathbf{g}^{\mathsf T}$ be $[K_{X\cdot\mathbf{Y}}][K_{\mathbf{Y}}^{-1}]$ and multiply $\mathbf{g}^{\mathsf T}$ by $K_{\mathbf{Y}}$ to solve for $\mathbf{g}^{\mathsf T}$.
Solution: From (10.9), we see that $\hat{x}(\mathbf{y}) = \mathbf{g}^{\mathsf T}\mathbf{y}$ where $\mathbf{g}^{\mathsf T} = [K_{X\cdot\mathbf{Y}}][K_{\mathbf{Y}}^{-1}]$. Since $\mathbf{Y} = \mathbf{h}X + \mathbf{Z}$, we see that $[K_{\mathbf{Y}}] = \sigma_X^2\,\mathbf{h}\mathbf{h}^{\mathsf T} + [K_{\mathbf{Z}}]$ and $[K_{X\cdot\mathbf{Y}}] = \sigma_X^2\,\mathbf{h}^{\mathsf T}$. We can avoid inverting $[K_{\mathbf{Y}}]$ by taking the hint to solve the vector equation $\mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}] = [K_{X\cdot\mathbf{Y}}]$, i.e.,

$$\mathbf{g}^{\mathsf T}\mathbf{h}\,\sigma_X^2\,\mathbf{h}^{\mathsf T} + \mathbf{g}^{\mathsf T}[K_{\mathbf{Z}}] = \sigma_X^2\,\mathbf{h}^{\mathsf T}.$$

Since $\mathbf{g}^{\mathsf T}\mathbf{h}$ is a scalar, we can rewrite this as $(1 - \mathbf{g}^{\mathsf T}\mathbf{h})\,\sigma_X^2\,\mathbf{h}^{\mathsf T} = \mathbf{g}^{\mathsf T}[K_{\mathbf{Z}}]$. The $j$th component of this equation is

$$g_j = \frac{(1 - \mathbf{g}^{\mathsf T}\mathbf{h})\,\sigma_X^2\,h_j}{\sigma_{Z_j}^2}. \qquad \text{(A.214)}$$

This shows that the weighting factors $g_j$ in $\hat{x}(\mathbf{y})$ depend on $j$ only through $h_j/\sigma_{Z_j}^2$ (if we visualize scaling $Y_j$ by scaling both $h_j$ and $Z_j$, we see that the estimate should involve $h_j$ and $\sigma_{Z_j}$ only in the ratio $h_j/\sigma_{Z_j}$). We still must determine the unknown constant $1 - \mathbf{g}^{\mathsf T}\mathbf{h}$. To do this, multiply (A.214) by $h_j$ and sum over $j$, getting

$$\mathbf{g}^{\mathsf T}\mathbf{h} = (1 - \mathbf{g}^{\mathsf T}\mathbf{h})\sum_j \frac{\sigma_X^2 h_j^2}{\sigma_{Z_j}^2}.$$

Solving for $\mathbf{g}^{\mathsf T}\mathbf{h}$ from this,

$$\mathbf{g}^{\mathsf T}\mathbf{h} = \frac{\sum_j \sigma_X^2 h_j^2/\sigma_{Z_j}^2}{1 + \sum_j \sigma_X^2 h_j^2/\sigma_{Z_j}^2}; \qquad 1 - \mathbf{g}^{\mathsf T}\mathbf{h} = \frac{1}{1 + \sum_j \sigma_X^2 h_j^2/\sigma_{Z_j}^2}. \qquad \text{(A.215)}$$

Substituting the expression for $1 - \mathbf{g}^{\mathsf T}\mathbf{h}$ into (A.214) yields (A.213).
b) Let $\xi = X - \hat{X}(\mathbf{Y})$ and show that (10.29) is valid, i.e., that

$$\frac{1}{\sigma_\xi^2} = \frac{1}{\sigma_X^2} + \sum_{i=1}^n \frac{h_i^2}{\sigma_{Z_i}^2}.$$

Solution: Using (10.6) in one dimension, $\sigma_\xi^2 = \mathsf{E}[\xi X] = \sigma_X^2 - \mathsf{E}\big[\hat{X}(\mathbf{Y})X\big]$. Since $\hat{X}(\mathbf{Y}) = \sum_j g_j Y_j$ from (A.213), and since $\mathsf{E}[Y_i X] = \sigma_X^2 h_i$, we have

$$\sigma_\xi^2 = \sigma_X^2 - \sum_{i=1}^n g_i\,\mathsf{E}[Y_i X] = \sigma_X^2\left(1 - \sum_{i=1}^n g_i h_i\right) = \sigma_X^2(1 - \mathbf{g}^{\mathsf T}\mathbf{h}) = \frac{\sigma_X^2}{1 + \sum_j \sigma_X^2 h_j^2/\sigma_{Z_j}^2} = \frac{1}{1/\sigma_X^2 + \sum_j h_j^2/\sigma_{Z_j}^2},$$

where we have used (A.215). This is equivalent to (10.29).
c) Show that (10.28), i.e., $\hat{x}(\mathbf{y}) = \sigma_\xi^2 \sum_{j=1}^n h_j y_j/\sigma_{Z_j}^2$, is valid.

Solution: From (b), the denominator of the expression for $g_j$ in (A.213) is equal to $1/\sigma_\xi^2$. Thus $g_j = \sigma_\xi^2 h_j/\sigma_{Z_j}^2$ and $\hat{x}(\mathbf{y}) = \sum_j g_j y_j$ is the desired expression.
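The scalar-weight formula (A.213) and the error-variance formula (10.29) can be checked against the direct matrix formula $\mathbf{g}^{\mathsf T} = [K_{X\cdot\mathbf{Y}}][K_{\mathbf{Y}}^{-1}]$ for arbitrary numbers. A sketch using NumPy (all parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
sx2 = 2.0                                   # sigma_X^2 (arbitrary)
h = rng.normal(size=n)                      # arbitrary h_j
sz2 = rng.uniform(0.5, 2.0, size=n)         # arbitrary sigma_{Z_j}^2

# Direct formula (10.9): g^T = K_{X.Y} K_Y^{-1}, with K_Y = sx2 h h^T + K_Z.
KY = sx2*np.outer(h, h) + np.diag(sz2)
KXY = sx2*h
g_direct = np.linalg.solve(KY, KXY)         # K_Y symmetric, so g = K_Y^{-1} K_{X.Y}^T

# Scalar-weight formula (A.213).
denom = 1.0/sx2 + np.sum(h*h/sz2)
g_scalar = (h/sz2)/denom

# Error variance: (10.29) against (10.10) in one dimension.
var_xi = 1.0/denom
var_direct = sx2 - KXY @ g_direct
```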
d) Show that the expression in (10.29) is equivalent to the iterative expression in (10.27).

Solution: We use $\xi_n$ to refer to the error for $n$ observations and $\xi_{n-1}$ for the error using the first $n-1$ of those observations. Using (10.29),

$$\frac{1}{\sigma_{\xi_n}^2} = \frac{1}{\sigma_X^2} + \sum_{i=1}^n \frac{h_i^2}{\sigma_{Z_i}^2} = \frac{1}{\sigma_X^2} + \sum_{i=1}^{n-1} \frac{h_i^2}{\sigma_{Z_i}^2} + \frac{h_n^2}{\sigma_{Z_n}^2} = \frac{1}{\sigma_{\xi_{n-1}}^2} + \frac{h_n^2}{\sigma_{Z_n}^2}, \qquad \text{(A.216)}$$

which is (10.27). This holds for all $n > 1$, so (10.27) for all $n$ also implies (10.29).
e) Show that the expression in (10.28) is equivalent to the iterative expression in (10.25).

Solution: Breaking (10.28) into the first $n-1$ terms followed by the term for $n$, we get

$$\hat{x}(y_1^n) = \sigma_{\xi_n}^2 \sum_{j=1}^{n-1} \frac{h_j y_j}{\sigma_{Z_j}^2} + \frac{\sigma_{\xi_n}^2 h_n y_n}{\sigma_{Z_n}^2} = \frac{\sigma_{\xi_n}^2}{\sigma_{\xi_{n-1}}^2}\,\hat{x}(y_1^{n-1}) + \frac{\sigma_{\xi_n}^2 h_n y_n}{\sigma_{Z_n}^2}, \qquad \text{(A.217)}$$

where we used (10.28) for $\hat{x}(y_1^{n-1})$. We can solve for $\sigma_{\xi_n}^2/\sigma_{\xi_{n-1}}^2$ by multiplying (A.216) by $\sigma_{\xi_n}^2$, getting

$$\frac{\sigma_{\xi_n}^2}{\sigma_{\xi_{n-1}}^2} = 1 - \frac{\sigma_{\xi_n}^2 h_n^2}{\sigma_{Z_n}^2}.$$

Substituting this into (A.217) yields

$$\hat{x}(y_1^n) = \hat{x}(y_1^{n-1}) + \frac{\sigma_{\xi_n}^2 h_n\left(y_n - h_n\hat{x}(y_1^{n-1})\right)}{\sigma_{Z_n}^2}.$$

Finally, if we invert (A.216), we get

$$\sigma_{\xi_n}^2 = \frac{\sigma_{\xi_{n-1}}^2\,\sigma_{Z_n}^2}{h_n^2\,\sigma_{\xi_{n-1}}^2 + \sigma_{Z_n}^2}.$$

Substituting this into the preceding expression, we get (10.25).
Exercise 10.4: Let $X, Z_1, \ldots, Z_n$ be independent, zero-mean, and Gaussian, satisfying $Y_i = h_i X + Z_i$ for $1 \leq i \leq n$. The point of this exercise is to point out that estimation of $X$ from $Y_1^n$ is virtually no more general here than the same problem with $h_i = 1$ for $1 \leq i \leq n$.

a) Write out the MMSE estimate and variance of the estimation error from (10.28) and (10.29) where each $h_i = 1$.
Solution:

$$\hat{x}(\mathbf{y}_1^n) = \sigma_{\xi_n}^2 \sum_{j=1}^n \frac{y_j}{\sigma_{Z_j}^2}; \qquad \frac{1}{\sigma_{\xi_n}^2} = \frac{1}{\sigma_X^2} + \sum_{i=1}^n \frac{1}{\sigma_{Z_i}^2}.$$
b) For arbitrary numbers $h_1, \ldots, h_n$, consider the equations $h_i Y_i = h_i X + h_i Z_i$. Let $U_i = h_i Y_i$. Write out the MMSE estimate of $X$ as a function of $U_1, \ldots, U_n$, still using the rv's $Z_1, \ldots, Z_n$. Hint: Note that if you scale each observation $U_i$ to $U_i/h_i$ you have the same estimation problem as in (a).

Solution: Assume that $h_i \neq 0$ for $1 \leq i \leq n$. Letting $U_i = h_i Y_i$, the modified equations become $U_i = h_i X + h_i Z_i$ for $1 \leq i \leq n$. Scaling these equations, $U_i/h_i = X + Z_i$ for $1 \leq i \leq n$. The MMSE estimate of $X$ from $U_1, \ldots, U_n$ must be the same as the MMSE estimate from $U_1/h_1, \ldots, U_n/h_n$, since either set of rv's specifies the other set and the MMSE estimate is a minimum over all types of processing of the observations. Since $U_i/h_i = X + Z_i$, the MMSE estimation problem is that solved in (a), with the observations now equal to $u_i/h_i$. Thus

$$\hat{x}(u_1, \ldots, u_n) = \sigma_{\xi_n}^2 \sum_{j=1}^n \frac{u_j}{h_j\,\sigma_{Z_j}^2}; \qquad \frac{1}{\sigma_{\xi_n}^2} = \frac{1}{\sigma_X^2} + \sum_{i=1}^n \frac{1}{\sigma_{Z_i}^2}. \qquad \text{(A.218)}$$

If the assumption that all $h_i \neq 0$ is not made, then for any $i$ such that $h_i = 0$, there is no observation for that $i$ and the equation above must be changed by omitting all such $i$.
c) Now let $W_i = h_i Z_i$. Write out the same MMSE estimate in terms of both $U_1, \ldots, U_n$ and $W_1, \ldots, W_n$ and show that you now have (10.28) with all the $h_i$'s back in the equation. Hint: you still have the basic equation in (a), but the variance of the $Z_i$'s must be expressed in terms of the $W_i$'s. Write out the error variance from (10.29) in the same way.

Solution: The equations in terms of $\{W_i;\, 1 \leq i \leq n\}$ are $U_i = h_i X + W_i$ and this set of variables is still jointly Gaussian with independence between $X$ and $W_1, \ldots, W_n$. The MMSE estimate given in (A.218) is still valid, since we are merely denoting $Z_i$ as $W_i/h_i$. Noting that $\sigma_{W_i}^2 = h_i^2\sigma_{Z_i}^2$, we have

$$\hat{x}(u_1, \ldots, u_n) = \sigma_{\xi_n}^2 \sum_{j=1}^n \frac{h_j u_j}{\sigma_{W_j}^2}; \qquad \frac{1}{\sigma_{\xi_n}^2} = \frac{1}{\sigma_X^2} + \sum_{i=1}^n \frac{h_i^2}{\sigma_{W_i}^2}.$$

This gives the MMSE estimate and error variance for the observations $U_i = h_i X + W_i$, and the expression is the same as the general equations in (10.28) and (10.29). In other words, the more general problem is the restricted problem with appropriate scaling.
Exercise 10.5: a) Assume that $X_1 \sim \mathcal{N}(\overline{X}_1, \sigma_{X_1}^2)$ and that for each $n \geq 1$, $X_{n+1} = \alpha X_n + W_n$, where $0 < \alpha < 1$, $W_n \sim \mathcal{N}(0, \sigma_W^2)$, and $X_1, W_1, W_2, \ldots$ are independent. Show that for each $n \geq 1$,

$$\mathsf{E}[X_n] = \alpha^{n-1}\overline{X}_1; \qquad \sigma_{X_n}^2 = \frac{\left(1 - \alpha^{2(n-1)}\right)\sigma_W^2}{1 - \alpha^2} + \alpha^{2(n-1)}\sigma_{X_1}^2.$$
Solution: For each $n > 1$, $\mathsf{E}[X_n] = \alpha\mathsf{E}[X_{n-1}]$, so by iteration, $\mathsf{E}[X_n] = \alpha^{n-1}\overline{X}_1$. Similarly,

$$\sigma_{X_n}^2 = \alpha^2\sigma_{X_{n-1}}^2 + \sigma_W^2 = \alpha^2\left[\alpha^2\sigma_{X_{n-2}}^2 + \sigma_W^2\right] + \sigma_W^2 = \alpha^4\sigma_{X_{n-2}}^2 + (1 + \alpha^2)\sigma_W^2 = \cdots$$
$$= \alpha^{2(n-1)}\sigma_{X_1}^2 + \left(1 + \alpha^2 + \alpha^4 + \cdots + \alpha^{2(n-2)}\right)\sigma_W^2 \qquad \text{(A.219)}$$
$$= \frac{\left(1 - \alpha^{2(n-1)}\right)\sigma_W^2}{1 - \alpha^2} + \alpha^{2(n-1)}\sigma_{X_1}^2.$$
b) Show directly, by comparing the equation $\sigma_{X_n}^2 = \alpha^2\sigma_{X_{n-1}}^2 + \sigma_W^2$ for each two adjoining values of $n$, that $\sigma_{X_n}^2$ moves monotonically from $\sigma_{X_1}^2$ to $\sigma_W^2/(1 - \alpha^2)$ as $n \to \infty$.

Solution: View

$$\sigma_{X_n}^2 = \alpha^2\sigma_{X_{n-1}}^2 + \sigma_W^2 \qquad \text{(A.220)}$$

as a function of $\sigma_{X_{n-1}}^2$ for fixed $\alpha^2$ and $\sigma_W^2$. Since $\alpha^2 > 0$, this is monotonic increasing in $\sigma_{X_{n-1}}^2$. Now consider two cases, first where $\sigma_{X_2}^2 > \sigma_{X_1}^2$ and second where $\sigma_{X_2}^2 < \sigma_{X_1}^2$.

Case 1: Consider (A.220) for $n = 2$ and $n = 3$. Since $\sigma_{X_2}^2 > \sigma_{X_1}^2$ and since (A.220) is increasing in $\sigma_{X_{n-1}}^2$, we see that $\sigma_{X_3}^2 > \sigma_{X_2}^2$. Iterating this argument, $\sigma_{X_n}^2 > \sigma_{X_{n-1}}^2$ for all $n > 1$, i.e., $\sigma_{X_n}^2$ is monotonic increasing in $n$. Next, substituting $\sigma_{X_n}^2 > \sigma_{X_{n-1}}^2$ into (A.220), we get $\sigma_{X_n}^2 < \alpha^2\sigma_{X_n}^2 + \sigma_W^2$ for all $n \geq 1$, so that $\sigma_{X_n}^2 < \sigma_W^2/(1 - \alpha^2)$ for all $n \geq 1$. Since $\{\sigma_{X_n}^2;\, n > 1\}$ is a bounded increasing sequence, it has a limit. The limit satisfies $\lim_{n\to\infty}\sigma_{X_n}^2 = \alpha^2\lim_{n\to\infty}\sigma_{X_n}^2 + \sigma_W^2$. Thus $\lim_{n\to\infty}\sigma_{X_n}^2 = \sigma_W^2/(1 - \alpha^2)$.

Case 2: This follows the same argument as case 1, except here $\sigma_{X_n}^2$ is decreasing in $n$ from an initial value above $\sigma_W^2/(1 - \alpha^2)$. Finally, for $\sigma_{X_1}^2 = \sigma_{X_2}^2$, we see that $\sigma_{X_n}^2 = \sigma_W^2/(1 - \alpha^2)$ for all $n \geq 1$.
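The monotone convergence of the variance recursion to $\sigma_W^2/(1-\alpha^2)$ is easy to see numerically; a minimal sketch in which $\alpha$, $\sigma_W^2$, and the two starting variances are arbitrary choices (one below the limit, one above):

```python
alpha, var_w = 0.9, 1.0          # arbitrary parameters with 0 < alpha < 1
limit = var_w/(1.0 - alpha*alpha)

def variance_sequence(var_x1, steps=100):
    """Iterate (A.220): sigma_{X_n}^2 = alpha^2 sigma_{X_{n-1}}^2 + sigma_W^2."""
    seq = [var_x1]
    for _ in range(steps):
        seq.append(alpha*alpha*seq[-1] + var_w)
    return seq

low = variance_sequence(1.0)     # starts below the limit (about 5.26 here)
high = variance_sequence(20.0)   # starts above the limit
```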
c) Assume that sample values of $Y_1, Y_2, \ldots$ are observed, where $Y_n = hX_n + Z_n$ and where $Z_1, Z_2, \ldots$ are IID zero-mean Gaussian rv's with variance $\sigma_Z^2$. Assume that $Z_1, \ldots, W_1, \ldots, X_1$ are independent and assume $h \geq 0$. Rewrite the recursion for the variance of the estimation error in (10.41) for this special case. Show that if $h/\sigma_Z = 0$, then $\sigma_{\xi_n}^2 = \sigma_{X_n}^2$ for each $n \geq 1$. Hint: Compare the recursion in (b) to that for $\sigma_{\xi_n}^2$.

Solution: Rewriting (10.41) for this special case,

$$\frac{1}{\sigma_{\xi_n}^2} = \frac{1}{\alpha^2\sigma_{\xi_{n-1}}^2 + \sigma_W^2} + \frac{h^2}{\sigma_Z^2}. \qquad \text{(A.221)}$$

If $h/\sigma_Z = 0$, this simplifies to

$$\sigma_{\xi_n}^2 = \alpha^2\sigma_{\xi_{n-1}}^2 + \sigma_W^2. \qquad \text{(A.222)}$$

This is the same recursion as (A.220) with $\sigma_{X_n}^2$ replaced by $\sigma_{\xi_n}^2$. Now for $n = 1$, $\sigma_{\xi_1}^2$ is the variance of the error in the MMSE estimate of $X_1$ with no measurement, i.e., $\sigma_{\xi_1}^2 = \sigma_{X_1}^2$ (this is also clear from (10.33)). Thus from the recursion in (A.222), $\sigma_{\xi_n}^2 = \sigma_{X_n}^2$ for all $n \geq 1$.
d) Show from the recursion that $\sigma_{\xi_n}^2$ is a decreasing function of $h/\sigma_Z$ for each $n \geq 2$. Use this to show that $\sigma_{\xi_n}^2 \leq \sigma_{X_n}^2$ for each $n$. Explain (without equations) why this result must be true.

Solution: From (10.33), $\sigma_{\xi_1}^2$ is decreasing in $h$. We use this as the basis for induction on (A.221), i.e., as $h/\sigma_Z$ increases, $\sigma_{\xi_{n-1}}^2$ decreases by the inductive hypothesis, and thus from (A.221), $\sigma_{\xi_n}^2$ also decreases. Since $\sigma_{\xi_n}^2 = \sigma_{X_n}^2$ for $h = 0$, we must then have $\sigma_{\xi_n}^2 \leq \sigma_{X_n}^2$ for $h > 0$. This can also be seen because one possible estimate for $X_n$ is $\mathsf{E}[X_n]$, i.e., the mean of $X_n$ in the absence of measurements. The error variance is then $\sigma_{X_n}^2$, which must be greater than or equal to the variance with a MMSE estimate using the measurements.
e) Show that the sequence $\{\sigma_{\xi_n}^2;\, n \geq 1\}$ is monotonic in $n$. Hint: Use the same technique as in (b). From this and (d), show that $\lambda = \lim_{n\to\infty}\sigma_{\xi_n}^2$ exists. Show that the limit satisfies (10.42) (note that (10.42) must have 2 roots, one positive and one negative, so the limit must be the positive root).

Solution: View (A.221) as a monotonic increasing function of $\sigma_{\xi_{n-1}}^2$, viewing all other quantities as constants. Thus if $\sigma_{\xi_{n-1}}^2$ is replaced by $\sigma_{\xi_n}^2$ and $\sigma_{\xi_{n-1}}^2 < \sigma_{\xi_n}^2$, then $\sigma_{\xi_n}^2 < \sigma_{\xi_{n+1}}^2$. By induction, then, $\sigma_{\xi_n}^2$ is increasing in $n$. Similarly, if $\sigma_{\xi_{n-1}}^2 > \sigma_{\xi_n}^2$, then $\sigma_{\xi_n}^2$ is decreasing in $n$, so either way $\sigma_{\xi_n}^2$ is monotonic in $n$. From (b) and (d), $\sigma_{\xi_n}^2$ is also bounded independent of $n$, so $\lambda = \lim_{n\to\infty}\sigma_{\xi_n}^2$ must exist. Eq. (10.42) is simply a rearrangement of (A.221) replacing $\sigma_{\xi_n}^2$ and $\sigma_{\xi_{n-1}}^2$ by $\lambda$, yielding

$$\frac{\alpha^2 h^2}{\sigma_Z^2}\lambda^2 + \left[1 - \alpha^2 + \frac{h^2\sigma_W^2}{\sigma_Z^2}\right]\lambda - \sigma_W^2 = 0. \qquad \text{(A.223)}$$

This must be satisfied by $\lambda = \lim_{n\to\infty}\sigma_{\xi_n}^2$. Since each term within the brackets above is positive, the coefficient of $\lambda$ is positive. Since the constant term is negative, this quadratic equation must have two solutions, one positive and one negative. The limiting error variance is, of course, the positive root, and thus can be found uniquely simply by solving this equation.
f) Show that for each $n \geq 1$, $\sigma_{\xi_n}^2$ is increasing in $\sigma_W^2$ and increasing in $\alpha$. Note: This increase with $\alpha$ is surprising, since when $\alpha$ is close to one, $X_n$ changes slowly, so we would expect to be able to track $X_n$ well. The problem is that $\lim_{n\to\infty}\sigma_{X_n}^2 = \sigma_W^2/(1-\alpha^2)$, so the variance of the untracked $X_n$ is increasing without limit as $\alpha$ approaches 1. (g) is somewhat messy, but resolves this issue.

Solution: The argument that $\sigma_{\xi_n}^2$ is increasing in $\alpha$ and increasing in $\sigma_W^2$ is the same as the argument in (d).
g) Show that if the recursion is expressed in terms of $\gamma = \sigma_W^2/(1-\alpha^2)$ and $\alpha$, then $\lambda$ is decreasing in $\alpha$ for constant $\gamma$.

Solution: If we make the suggested substitution in (A.223) and rearrange it, we get

$$\frac{\alpha^2\lambda^2}{1-\alpha^2} + \left(\frac{\sigma_Z^2}{h^2} + \gamma\right)\lambda - \frac{\gamma\,\sigma_Z^2}{h^2} = 0.$$

Since the quadratic term is increasing in $\alpha$ over the range $0 < \alpha < 1$, the positive root is decreasing.
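That the positive root of the quadratic (A.223) agrees with the limit of the recursion (A.221) can be confirmed numerically; the parameter values below are arbitrary choices.

```python
import math

alpha, var_w, h, var_z = 0.8, 1.0, 2.0, 1.5   # arbitrary parameters

# Iterate (A.221): 1/v_n = 1/(alpha^2 v_{n-1} + var_w) + h^2/var_z.
v = 1.0                                        # any positive starting variance works
for _ in range(200):
    v = 1.0/(1.0/(alpha*alpha*v + var_w) + h*h/var_z)

# Positive root of (A.223): a lam^2 + b lam - var_w = 0.
a = alpha*alpha*h*h/var_z
b = 1.0 - alpha*alpha + h*h*var_w/var_z
lam = (-b + math.sqrt(b*b + 4.0*a*var_w))/(2.0*a)
```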
Exercise 10.6: a) Assume that $\mathbf{X}$ and $\mathbf{Y}$ are zero-mean, jointly Gaussian, jointly non-singular, and related by $\mathbf{Y} = [H]\mathbf{X} + \mathbf{Z}$. Show that the MMSE estimate can be expressed as

$$\hat{\mathbf{x}}(\mathbf{y}) = [K_{\boldsymbol{\xi}}][H^{\mathsf T}][K_{\mathbf{Z}}^{-1}]\mathbf{y}. \qquad \text{(A.224)}$$

Hint: Recall from (10.9) that $\hat{\mathbf{x}}(\mathbf{y}) = [G]\mathbf{y}$. Relate $[G]$ and $[H]$ by using (3.115).

Solution: We assume that $\mathbf{X}$ and $\mathbf{Z}$ are independent (as required for $[H]$ and $[K_{\mathbf{Z}}]$ to specify the joint covariance of $\mathbf{X}$ and $\mathbf{Y}$ and as required for the development in Section 3.5 to make any sense). As shown in Section 3.5, the relation between $\mathbf{X}$ and $\mathbf{Y}$ can be expressed both as $\mathbf{Y} = [H]\mathbf{X} + \mathbf{Z}$ (with $\mathbf{X}$ and $\mathbf{Z}$ independent) and $\mathbf{X} = [G]\mathbf{Y} + \mathbf{V}$ (with $\mathbf{Y}$ and $\mathbf{V}$ independent). From (3.43) and (3.49), $[G]$ and $[H]$ are symmetrically given as

$$[G] = [K_{\mathbf{X}\cdot\mathbf{Y}}K_{\mathbf{Y}}^{-1}]; \qquad [H] = [K_{\mathbf{X}\cdot\mathbf{Y}}^{\mathsf T}K_{\mathbf{X}}^{-1}]. \qquad \text{(A.225)}$$

From the relationship $\mathbf{X} = [G]\mathbf{y} + \mathbf{V}$, it is clear that the MMSE estimate of $\mathbf{X}$ for a sample observation $\mathbf{y}$ is $\hat{\mathbf{x}}(\mathbf{y}) = [G]\mathbf{y}$, that the estimation error is $\mathbf{v}$, and thus that $[K_{\boldsymbol{\xi}}] = [K_{\mathbf{V}}]$. The matrices $[G]$ and $[H]$ are related in (3.115) as $[G^{\mathsf T}K_{\mathbf{V}}^{-1}] = [K_{\mathbf{Z}}^{-1}H]$ and thus

$$[G^{\mathsf T}] = [K_{\mathbf{Z}}^{-1}HK_{\mathbf{V}}] = [K_{\mathbf{Z}}^{-1}HK_{\boldsymbol{\xi}}]. \qquad \text{(A.226)}$$

Substituting the transpose of this into $\hat{\mathbf{x}}(\mathbf{y}) = [G]\mathbf{y}$, we get (A.224).
b) Show that

$$[K_{\boldsymbol{\xi}}^{-1}] = [K_{\mathbf{X}}^{-1}] + [H^{\mathsf T}K_{\mathbf{Z}}^{-1}H]. \qquad \text{(A.227)}$$

Hint: A reasonable approach is to start with (10.10), i.e., $[K_{\boldsymbol{\xi}}] = [K_{\mathbf{X}}] - [K_{\mathbf{X}\cdot\mathbf{Y}}K_{\mathbf{Y}}^{-1}K_{\mathbf{X}\cdot\mathbf{Y}}^{\mathsf T}]$. Pre-multiply both sides by $[K_{\mathbf{X}}^{-1}]$ and post-multiply both sides by $[K_{\boldsymbol{\xi}}^{-1}]$. The final product of matrices here can be expressed as $[H^{\mathsf T}K_{\mathbf{Z}}^{-1}H]$ by using the hint in (a) on $[K_{\mathbf{Z}}^{-1}H]$.

Solution: Following the hint,

$$[K_{\mathbf{X}}^{-1}] = [K_{\boldsymbol{\xi}}^{-1}] - [K_{\mathbf{X}}^{-1}K_{\mathbf{X}\cdot\mathbf{Y}}K_{\mathbf{Y}}^{-1}K_{\mathbf{X}\cdot\mathbf{Y}}^{\mathsf T}K_{\boldsymbol{\xi}}^{-1}].$$

From (A.225), we can substitute $[H^{\mathsf T}]$ for $[K_{\mathbf{X}}^{-1}K_{\mathbf{X}\cdot\mathbf{Y}}]$ and $[G^{\mathsf T}]$ for $[K_{\mathbf{Y}}^{-1}K_{\mathbf{X}\cdot\mathbf{Y}}^{\mathsf T}]$ above, getting

$$[K_{\mathbf{X}}^{-1}] = [K_{\boldsymbol{\xi}}^{-1}] - [H^{\mathsf T}G^{\mathsf T}K_{\boldsymbol{\xi}}^{-1}].$$

Rearranging and substituting (A.226) into this, we get (A.227).

One minor detail we have omitted above is to verify that $[K_{\mathbf{Z}}]$ and $[K_{\mathbf{V}}]$ are non-singular. If $[K_{\mathbf{Z}}]$ is singular, then there is some vector $\mathbf{a}$ such that $\mathbf{a}^{\mathsf T}\mathbf{Z}$ is 0. This implies that $\mathbf{a}^{\mathsf T}\mathbf{Y} = \mathbf{a}^{\mathsf T}[H]\mathbf{X}$, which means that $\mathbf{X}$ and $\mathbf{Y}$ cannot be jointly non-singular. Since we have assumed that $\mathbf{X}$ and $\mathbf{Y}$ are jointly non-singular, this means that $\mathbf{Z}$ (and thus $[K_{\mathbf{Z}}]$) is non-singular. The same argument applies to $\mathbf{V}$.
Exercise 10.7: a) Write out $\mathsf{E}\left[(X - \mathbf{g}^{\mathsf T}\mathbf{Y})^2\right] = \sigma_X^2 - 2[K_{X\cdot\mathbf{Y}}]\mathbf{g} + \mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}]\mathbf{g}$ as a function of $\mathbf{g} = (g_1, g_2, \ldots, g_n)^{\mathsf T}$ and take the partial derivative of the function with respect to $g_i$ for each $i$, $1 \leq i \leq n$. Show that the vector of these partial derivatives is $-2[K_{X\cdot\mathbf{Y}}] + 2\mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}]$.

Solution: Note that $\frac{\partial}{\partial\mathbf{g}}[K_{X\cdot\mathbf{Y}}]\mathbf{g} = [K_{X\cdot\mathbf{Y}}]$. For the second term, we take the partial derivative with respect to $g_i$ for any given $i$, getting

$$\frac{\partial}{\partial g_i}\,\mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}]\mathbf{g} = [K_{\mathbf{Y}}]_{i\cdot}\,\mathbf{g} + \mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}]_{\cdot i} = 2\mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}]_{\cdot i},$$

where $[K_{\mathbf{Y}}]_{\cdot i}$ denotes the $i$th column of $[K_{\mathbf{Y}}]$ and $[K_{\mathbf{Y}}]_{i\cdot}$ denotes the $i$th row. We have used the fact that $[K_{\mathbf{Y}}]$ is symmetric for the latter equality. Putting these equations into vector form,

$$\frac{\partial}{\partial\mathbf{g}}\left[\sigma_X^2 - 2[K_{X\cdot\mathbf{Y}}]\mathbf{g} + \mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}]\mathbf{g}\right] = -2[K_{X\cdot\mathbf{Y}}] + 2\mathbf{g}^{\mathsf T}[K_{\mathbf{Y}}].$$

b) Explain why the stationary point here is actually a minimum.

Solution: The stationary point, say $\tilde{\mathbf{g}}$, i.e., the point at which this vector partial derivative is 0, is $\tilde{\mathbf{g}}^{\mathsf T} = [K_{X\cdot\mathbf{Y}}K_{\mathbf{Y}}^{-1}]$. Taking the transpose, $\tilde{\mathbf{g}} = [K_{\mathbf{Y}}^{-1}][K_{X\cdot\mathbf{Y}}^{\mathsf T}]$. There are two ways of showing that $\tilde{\mathbf{g}}$ minimizes $\mathsf{E}\left[(X - \mathbf{g}^{\mathsf T}\mathbf{Y})^2\right]$ over $\mathbf{g}$. For the first of these, recall that for the Gaussian case, $\hat{x}_{\text{MMSE}}(\mathbf{y}) = \tilde{\mathbf{g}}^{\mathsf T}\mathbf{y}$. Since the MMSE estimate is then linear in $\mathbf{y}$, the LLSE estimate must be the same, and thus $\tilde{\mathbf{g}}$ must minimize $\mathsf{E}\left[(X - \mathbf{g}^{\mathsf T}\mathbf{Y})^2\right]$ over $\mathbf{g}$.

The second way to show that the stationary point $\tilde{\mathbf{g}}$ minimizes $\mathsf{E}\left[(X - \mathbf{g}^{\mathsf T}\mathbf{Y})^2\right]$ is to note that $\mathsf{E}\left[(X - \mathbf{g}^{\mathsf T}\mathbf{Y})^2\right]$ is convex in $\mathbf{g}$, and thus must be minimized by a stationary point if a stationary point exists. As we have just shown, a stationary point does exist in this case.
Exercise 10.8: For a real inner-product space, show that $n$ vectors, $Y_1, \ldots, Y_n$, are linearly dependent if and only if the matrix of inner products, $\{\langle Y_j, Y_k\rangle;\, 1 \leq j, k \leq n\}$, is singular.

Solution: Let $[K]$ be the matrix of inner products, $\{\langle Y_j, Y_i\rangle;\, 1 \leq i, j \leq n\}$, and let $[K]_i = (\langle Y_1, Y_i\rangle, \ldots, \langle Y_n, Y_i\rangle)^{\mathsf T}$ be the $i$th column of $[K]$. Then $[K]$ is singular iff there is a non-zero vector $\mathbf{a} = (a_1, \ldots, a_n)^{\mathsf T}$ such that $\sum_i a_i[K]_i = 0$. This vector equation, written out by components, is $\sum_i a_i\langle Y_j, Y_i\rangle = 0$ for $1 \leq j \leq n$. This can be rewritten as $\langle Y_j, \sum_i a_i Y_i\rangle = 0$ for $1 \leq j \leq n$. Thus, if $[K]$ is singular, there is an $\mathbf{a}$ for which this equation is satisfied for $1 \leq j \leq n$. It follows that $\langle\sum_j a_j Y_j, \sum_i a_i Y_i\rangle = 0$. This implies that $\sum_i a_i Y_i = 0$, so $\{Y_i;\, 1 \leq i \leq n\}$ is a linearly dependent set. Conversely, if $\{Y_i;\, 1 \leq i \leq n\}$ is a linearly dependent set, then there is a non-zero $\mathbf{a}$ such that $\sum_i a_i Y_i = 0$. Thus $\langle Y_j, \sum_i a_i Y_i\rangle = 0$ for $1 \leq j \leq n$ and $[K]$ is singular.
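For ordinary vectors in $\mathbb{R}^3$ with the usual dot product, the matrix of inner products is the Gram matrix, and the equivalence can be illustrated numerically; the example vectors below are arbitrary choices.

```python
import numpy as np

def gram(vectors):
    """Matrix of inner products <Y_j, Y_k> for a list of real vectors."""
    V = np.array(vectors, dtype=float)
    return V @ V.T

indep = [[1, 0, 0], [1, 1, 0], [0, 0, 2]]     # linearly independent set
dep = [[1, 0, 0], [0, 1, 0], [2, -3, 0]]      # third = 2*first - 3*second

det_indep = np.linalg.det(gram(indep))        # nonzero: vectors independent
det_dep = np.linalg.det(gram(dep))            # zero: vectors dependent
```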
Exercise 10.9: Show that (10.89)-(10.91) agree with (10.11).

Solution: From (10.90),

$$\beta = \mathsf{E}[X] - \sum_j \alpha_j\,\mathsf{E}[Y_j]. \qquad \text{(A.228)}$$

Note that $\mathsf{E}[XY_i] = \mathsf{E}[X]\mathsf{E}[Y_i] + K_{XY_i}$ and $\mathsf{E}[Y_iY_j] = \mathsf{E}[Y_i]\mathsf{E}[Y_j] + K_{Y_iY_j}$. Substituting this in (10.91),

$$\mathsf{E}[X]\mathsf{E}[Y_i] + K_{XY_i} = \beta\,\mathsf{E}[Y_i] + \sum_j \alpha_j\,\mathsf{E}[Y_i]\mathsf{E}[Y_j] + \sum_j \alpha_j K_{Y_iY_j}. \qquad \text{(A.229)}$$

Multiplying (A.228) by $\mathsf{E}[Y_i]$ and subtracting from (A.229),

$$K_{XY_i} = \sum_j \alpha_j K_{Y_iY_j}.$$

This implies that $\boldsymbol{\alpha}^{\mathsf T} = K_{X\mathbf{Y}}K_{\mathbf{Y}}^{-1}$, so that (10.89) becomes $\hat{X}(\mathbf{Y}) = \beta + K_{X\mathbf{Y}}K_{\mathbf{Y}}^{-1}\mathbf{Y}$. In terms of any given sample value $\mathbf{y}$, this is $\hat{X}(\mathbf{y}) = \beta + K_{X\mathbf{Y}}K_{\mathbf{Y}}^{-1}\mathbf{y}$. Associating $\beta$ with $b$, we see that this is the one-dimensional (in $X$) form of (10.12), which is equivalent to (10.11). Recall that (10.11) is the MMSE estimate for the Gaussian case and thus is also the LLSE estimate for the general case with the same covariances.
Exercise 10.10: Let $\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf T}$ be a zero-mean complex rv with real and imaginary components $X_{\text{re},j}, X_{\text{im},j}$, $1 \leq j \leq n$, respectively. Express $\mathsf{E}[X_{\text{re},j}X_{\text{re},k}]$, $\mathsf{E}[X_{\text{re},j}X_{\text{im},k}]$, $\mathsf{E}[X_{\text{im},j}X_{\text{im},k}]$, $\mathsf{E}[X_{\text{im},j}X_{\text{re},k}]$ as functions of the components of $[K_{\mathbf{X}}]$ and $\mathsf{E}\left[\mathbf{X}\mathbf{X}^{\mathsf T}\right]$.

Solution: Note that $\mathsf{E}[\mathbf{X}\mathbf{X}^{\mathsf T}]$ is the pseudo-covariance matrix $[M_{\mathbf{X}}]$ as treated in Section 3.7.2, and the desired covariances of real and imaginary parts are given in (3.100). We simply rewrite those results in the notation of the problem statement.

$$\mathsf{E}[X_{\text{re},j}X_{\text{re},k}] = \tfrac{1}{2}\,\Re\left([K_{\mathbf{X}}]_{jk} + [M_{\mathbf{X}}]_{jk}\right),$$
$$\mathsf{E}[X_{\text{re},j}X_{\text{im},k}] = \tfrac{1}{2}\,\Im\left(-[K_{\mathbf{X}}]_{jk} + [M_{\mathbf{X}}]_{jk}\right),$$
$$\mathsf{E}[X_{\text{im},j}X_{\text{im},k}] = \tfrac{1}{2}\,\Re\left([K_{\mathbf{X}}]_{jk} - [M_{\mathbf{X}}]_{jk}\right),$$
$$\mathsf{E}[X_{\text{im},j}X_{\text{re},k}] = \tfrac{1}{2}\,\Im\left([K_{\mathbf{X}}]_{jk} + [M_{\mathbf{X}}]_{jk}\right).$$
Exercise 10.11: Let $Y = Y_{\text{re}} + iY_{\text{im}}$ be a complex random variable. For arbitrary real numbers $a, b, c, d$, find complex numbers $\alpha$ and $\beta$ such that

$$\Re[\alpha Y + \beta Y^*] = aY_{\text{re}} + bY_{\text{im}}, \qquad \Im[\alpha Y + \beta Y^*] = cY_{\text{re}} + dY_{\text{im}}.$$

Solution: Writing out the first equation above in terms of the real and imaginary parts of $\alpha$, $\beta$, and $Y$, we get

$$\Re[\alpha Y + \beta Y^*] = \alpha_{\text{re}}Y_{\text{re}} - \alpha_{\text{im}}Y_{\text{im}} + \beta_{\text{re}}Y_{\text{re}} + \beta_{\text{im}}Y_{\text{im}} = aY_{\text{re}} + bY_{\text{im}}.$$

This must be satisfied for all sample values of $Y_{\text{re}}$ and $Y_{\text{im}}$, so it breaks into two equations,

$$\alpha_{\text{re}} + \beta_{\text{re}} = a; \qquad -\alpha_{\text{im}} + \beta_{\text{im}} = b.$$

Treating the second equation in the same way, we get

$$\alpha_{\text{im}} + \beta_{\text{im}} = c; \qquad \alpha_{\text{re}} - \beta_{\text{re}} = d.$$

We now have two equations in the real variables $\alpha_{\text{re}}$ and $\beta_{\text{re}}$ and two equations in $\alpha_{\text{im}}$ and $\beta_{\text{im}}$. They can be trivially solved for $\alpha_{\text{re}}, \beta_{\text{re}}, \alpha_{\text{im}}$, and $\beta_{\text{im}}$, getting

$$\alpha = \frac{a+d}{2} + i\,\frac{c-b}{2}, \qquad \beta = \frac{a-d}{2} + i\,\frac{b+c}{2}.$$

Note: The complex field is similar to the real field in so many ways that it is important to sometimes focus on essentially trivial exercises like this that force one to be aware of the differences, especially when going from a complex rv $Y$ to the real rv's $Y_{\text{re}}$ and $Y_{\text{im}}$.
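The formulas for $\alpha$ and $\beta$ can be spot-checked at arbitrary numbers using Python's built-in complex arithmetic:

```python
a, b, c, d = 1.0, -2.0, 3.0, 0.5       # arbitrary real coefficients
alpha = complex((a + d)/2, (c - b)/2)
beta = complex((a - d)/2, (b + c)/2)

def combo(y):
    """alpha*Y + beta*conj(Y), whose real/imaginary parts should match the targets."""
    return alpha*y + beta*y.conjugate()

y = complex(0.7, -1.3)                  # arbitrary sample value of Y
val = combo(y)
```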
Exercise 10.12: (Derivation of circularly symmetric Gaussian density) Let $\mathbf{X} = \mathbf{X}_{\text{re}} + i\mathbf{X}_{\text{im}}$ be a zero-mean, circularly-symmetric, $n$-dimensional Gaussian complex rv. Let $\mathbf{U} = (\mathbf{X}_{\text{re}}^{\mathsf T}, \mathbf{X}_{\text{im}}^{\mathsf T})^{\mathsf T}$ be the corresponding $2n$-dimensional real rv. Let $[K_{\text{re}}] = \mathsf{E}\left[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{re}}^{\mathsf T}\right]$ and $[K_{\text{ri}}] = \mathsf{E}\left[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{im}}^{\mathsf T}\right]$.

a) Show that

$$[K_{\mathbf{U}}] = \begin{bmatrix} K_{\text{re}} & K_{\text{ri}} \\ -K_{\text{ri}} & K_{\text{re}} \end{bmatrix}.$$

Solution:

$$\mathsf{E}[\mathbf{X}\mathbf{X}^{\mathsf T}] = \mathsf{E}[(\mathbf{X}_{\text{re}} + i\mathbf{X}_{\text{im}})(\mathbf{X}_{\text{re}} + i\mathbf{X}_{\text{im}})^{\mathsf T}] = \mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{re}}^{\mathsf T}] - \mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{im}}^{\mathsf T}] + i\,\mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{im}}^{\mathsf T}] + i\,\mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{re}}^{\mathsf T}].$$

This is the pseudo-covariance matrix, so it must be zero from the circular-symmetry constraint. Thus

$$K_{\text{re}} = \mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{re}}^{\mathsf T}] = \mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{im}}^{\mathsf T}], \qquad K_{\text{ri}} = \mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{im}}^{\mathsf T}] = -\mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{re}}^{\mathsf T}].$$

Relating this to $[K_{\mathbf{U}}]$,

$$[K_{\mathbf{U}}] = \mathsf{E}\begin{bmatrix}\mathbf{X}_{\text{re}}\\ \mathbf{X}_{\text{im}}\end{bmatrix}\begin{bmatrix}\mathbf{X}_{\text{re}}^{\mathsf T},\, \mathbf{X}_{\text{im}}^{\mathsf T}\end{bmatrix} = \begin{bmatrix}\mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{re}}^{\mathsf T}] & \mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{im}}^{\mathsf T}]\\ \mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{re}}^{\mathsf T}] & \mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{im}}^{\mathsf T}]\end{bmatrix} = \begin{bmatrix} K_{\text{re}} & K_{\text{ri}} \\ -K_{\text{ri}} & K_{\text{re}} \end{bmatrix}.$$

This is the desired solution, but note in addition that since $[K_{\mathbf{U}}]$ is a covariance matrix of a (real) rv, it is symmetric, so that the lower left matrix is also equal to $K_{\text{ri}}^{\mathsf T}$. Thus $K_{\text{ri}}^{\mathsf T} = -K_{\text{ri}}$.
b) Show that

$$[K_{\mathbf{U}}^{-1}] = \begin{bmatrix} B & C \\ -C & B \end{bmatrix},$$

and find the $B$, $C$ for which this is true.

Solution: We assume throughout that $[K_{\mathbf{U}}]$ is non-singular, i.e., that none of the $2n$ real and imaginary parts of $\mathbf{X}$ are linear combinations of the others. We see later that this is equivalent to $[K_{\mathbf{X}}]$ being non-singular. From (3.46), the upper left portion of $[K_{\mathbf{U}}^{-1}]$ is $[B] = \left([K_{\text{re}}] + [K_{\text{ri}}K_{\text{re}}^{-1}K_{\text{ri}}]\right)^{-1}$. The same formula applies to the lower right matrix, so it is $[B]$ also. Similarly, from (3.47), the upper right matrix is $[C] = -[BK_{\text{ri}}K_{\text{re}}^{-1}]$. The lower left matrix is then $-[C] = [BK_{\text{ri}}K_{\text{re}}^{-1}]$.
c) Show that $[K_{\mathbf{X}}] = 2[K_{\text{re}} - iK_{\text{ri}}]$.

Solution: Since $[K_{\mathbf{X}}] = \mathsf{E}\left[\mathbf{X}\mathbf{X}^{\dagger}\right]$,

$$[K_{\mathbf{X}}] = \mathsf{E}[(\mathbf{X}_{\text{re}} + i\mathbf{X}_{\text{im}})(\mathbf{X}_{\text{re}} - i\mathbf{X}_{\text{im}})^{\mathsf T}] = \mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{re}}^{\mathsf T}] + \mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{im}}^{\mathsf T}] + i\,\mathsf{E}[\mathbf{X}_{\text{im}}\mathbf{X}_{\text{re}}^{\mathsf T}] - i\,\mathsf{E}[\mathbf{X}_{\text{re}}\mathbf{X}_{\text{im}}^{\mathsf T}] = 2[K_{\text{re}}] - 2i[K_{\text{ri}}].$$

Note: After some notational changes, this agrees with (3.105). In (3.105), $[K_{\text{re}}]$ was defined as $\Re[K_{\mathbf{X}}]$, which is $2[K_{\text{re}}]$ as defined here. In (3.105), $[K_{\text{im}}]$ was defined as $\Im[K_{\mathbf{X}}]$, which is $-2[K_{\text{ri}}]$ as defined here.
d) Show that $[K_{\mathbf{X}}^{-1}] = \frac{1}{2}[B - iC]$.

Solution: We are to show that $[K_{\mathbf{X}}^{-1}] = \frac{1}{2}[B - iC]$. This is equivalent to showing that $[K_{\mathbf{X}}]\frac{1}{2}[B - iC] = [I_n]$, which is equivalent to showing that $[K_{\text{re}} - iK_{\text{ri}}][B - iC] = [I_n]$. To show this, consider

$$[I_{2n}] = [K_{\mathbf{U}}][K_{\mathbf{U}}^{-1}] = \begin{bmatrix} K_{\text{re}} & K_{\text{ri}} \\ -K_{\text{ri}} & K_{\text{re}} \end{bmatrix}\begin{bmatrix} B & C \\ -C & B \end{bmatrix}.$$

Expanding, we see that $[I_n] = [K_{\text{re}}B] - [K_{\text{ri}}C]$ and $0 = [K_{\text{re}}C] + [K_{\text{ri}}B]$. Since the matrices $[K_{\text{re}}]$, $[K_{\text{ri}}]$, $[B]$, and $[C]$ are real, this is equivalent to $[K_{\text{re}} - iK_{\text{ri}}][B - iC] = [I_n]$.
e) Define $f_{\mathbf{X}}(\mathbf{x}) = f_{\mathbf{U}}(\mathbf{u})$ for $\mathbf{u} = (\mathbf{x}_{\text{re}}^{\mathsf T}, \mathbf{x}_{\text{im}}^{\mathsf T})^{\mathsf T}$ and show that

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\left(-\mathbf{x}^{\dagger}K_{\mathbf{X}}^{-1}\mathbf{x}\right)}{(2\pi)^n\sqrt{\det[K_{\mathbf{U}}]}}.$$

Solution: Since $\mathbf{U}$ is a non-singular, real, zero-mean Gaussian $2n$-rv, its density is

$$f_{\mathbf{U}}(\mathbf{u}) = \frac{\exp\left(-\frac{1}{2}\mathbf{u}^{\mathsf T}K_{\mathbf{U}}^{-1}\mathbf{u}\right)}{(2\pi)^n\sqrt{\det[K_{\mathbf{U}}]}}. \qquad \text{(A.230)}$$

Expanding the quadratic form in the exponent, we get

$$\mathbf{u}^{\mathsf T}K_{\mathbf{U}}^{-1}\mathbf{u} = (\mathbf{x}_{\text{re}}^{\mathsf T}, \mathbf{x}_{\text{im}}^{\mathsf T})\begin{bmatrix} B & C \\ -C & B \end{bmatrix}\begin{bmatrix}\mathbf{x}_{\text{re}}\\ \mathbf{x}_{\text{im}}\end{bmatrix} = \mathbf{x}_{\text{re}}^{\mathsf T}B\mathbf{x}_{\text{re}} + \mathbf{x}_{\text{im}}^{\mathsf T}B\mathbf{x}_{\text{im}} + \mathbf{x}_{\text{re}}^{\mathsf T}C\mathbf{x}_{\text{im}} - \mathbf{x}_{\text{im}}^{\mathsf T}C\mathbf{x}_{\text{re}}. \qquad \text{(A.231)}$$

Since $B$ is real, the first two terms in (A.231) can be rewritten as $\Re\left[(\mathbf{x}_{\text{re}} - i\mathbf{x}_{\text{im}})^{\mathsf T}B(\mathbf{x}_{\text{re}} + i\mathbf{x}_{\text{im}})\right]$. Since $[B]$ is symmetric, $\Im\left[(\mathbf{x}_{\text{re}} - i\mathbf{x}_{\text{im}})^{\mathsf T}B(\mathbf{x}_{\text{re}} + i\mathbf{x}_{\text{im}})\right] = 0$, and thus the first two terms of (A.231) are equal to $(\mathbf{x}_{\text{re}} - i\mathbf{x}_{\text{im}})^{\mathsf T}B(\mathbf{x}_{\text{re}} + i\mathbf{x}_{\text{im}})$. Since $C = -C^{\mathsf T}$, we have $\mathbf{x}^{\mathsf T}C\mathbf{x} = 0$ for any real $\mathbf{x}$ (and thus for both $\mathbf{x}_{\text{re}}$ and $\mathbf{x}_{\text{im}}$). Thus the latter two terms in (A.231) can be rewritten as

$$(\mathbf{x}_{\text{re}} - i\mathbf{x}_{\text{im}})^{\mathsf T}[-iC](\mathbf{x}_{\text{re}} + i\mathbf{x}_{\text{im}}) = \mathbf{x}_{\text{re}}^{\mathsf T}C\mathbf{x}_{\text{im}} - \mathbf{x}_{\text{im}}^{\mathsf T}C\mathbf{x}_{\text{re}}.$$

Thus, we have

$$\mathbf{u}^{\mathsf T}K_{\mathbf{U}}^{-1}\mathbf{u} = \mathbf{x}^{\dagger}(B - iC)\mathbf{x} = 2\,\mathbf{x}^{\dagger}K_{\mathbf{X}}^{-1}\mathbf{x}.$$

Substituting this back in (A.230), we have the desired expression.
f) Show that

$$\det[K_U] = \det\begin{bmatrix} K_{re} + iK_{ri} & K_{ri} - iK_{re} \\ -K_{ri} & K_{re} \end{bmatrix}.$$

Hint: Recall that elementary row operations do not change the value of a determinant.
A.10. SOLUTIONS FOR CHAPTER 10
Solution: Start with the $2n \times 2n$ matrix $[K_U]$ and add $-i$ times row $n+1$ to row 1.
Similarly, for each $j$, $2 \le j \le n$, add $-i$ times row $n+j$ to row $j$. This yields

$$\det[K_U] = \det\begin{bmatrix} K_{re} + iK_{ri} & K_{ri} - iK_{re} \\ -K_{ri} & K_{re} \end{bmatrix}.$$
g) Show that

$$\det[K_U] = \det\begin{bmatrix} K_{re} + iK_{ri} & 0 \\ -K_{ri} & K_{re} - iK_{ri} \end{bmatrix}.$$

Hint: Recall that elementary column operations do not change the value of a determinant.

Solution: This is essentially the same as (f). Add $i$ times each of the first $n$ columns to
the corresponding final $n$ columns to yield

$$\det[K_U] = \det\begin{bmatrix} K_{re} + iK_{ri} & 0 \\ -K_{ri} & K_{re} - iK_{ri} \end{bmatrix}.$$
h) Show that

$$\det[K_U] = 2^{-2n}(\det[K_X])^2,$$

and from this conclude that (3.108) is valid.

Solution: The determinant in (g) is just the determinant of the upper left block times the
determinant of the lower right block. From (c), the upper left block is $(1/2)[K_X^*]$ and the
lower right block is $(1/2)[K_X]$. Since multiplying each element of an $n \times n$ matrix by $1/2$
multiplies the determinant by $(1/2)^n$, we have

$$\det[K_U] = 2^{-2n}(\det[K_X])^2.$$

Substituting $\sqrt{\det[K_U]} = 2^{-n}\det[K_X]$ into the result of (e) replaces the denominator
$(2\pi)^n\sqrt{\det[K_U]}$ by $\pi^n \det[K_X]$, which gives (3.108).

This also shows that $[K_U]$ is non-singular if and only if $[K_X]$ is.
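The determinant identity in (h) can be checked numerically as well. The sketch below uses the same illustrative $2 \times 2$ choices of $K_{re}$ (symmetric positive definite) and $K_{ri}$ (antisymmetric), which are not values from the text, and compares $\det[K_U]$ against $2^{-2n}(\det[K_X])^2$ for $n = 2$:

```python
# Numerical check of part (h) for n = 2, with illustrative matrices.

def det(M):                               # determinant by Laplace expansion
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

Kre = [[2.0, 1.0], [1.0, 2.0]]            # symmetric, positive definite
Kri = [[0.0, 0.5], [-0.5, 0.0]]           # antisymmetric

# [K_U] = [[Kre, Kri], [-Kri, Kre]]  (2n x 2n, real)
KU = [Kre[i] + Kri[i] for i in range(2)] + \
     [[-v for v in Kri[i]] + Kre[i] for i in range(2)]

# [K_X] = 2[Kre - i Kri]  (n x n, complex Hermitian)
KX = [[2 * Kre[i][j] - 2j * Kri[i][j] for j in range(2)] for i in range(2)]

lhs = det(KU)
rhs = 2 ** (-4) * det(KX) ** 2            # 2^{-2n} (det K_X)^2 with n = 2
print(lhs, rhs)                           # the two agree
```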
Exercise 10.13: (Alternate derivation of circularly-symmetric Gaussian density)

a) Let $X$ be a circularly-symmetric, zero-mean, complex Gaussian rv with variance 1. Show that

$$f_X(x) = \frac{\exp(-x^* x)}{\pi}.$$

Recall that the real part and imaginary part each have variance $1/2$.

Solution: Since $X_{re}$ and $X_{im}$ are independent and $\sigma^2_{X_{re}} = \sigma^2_{X_{im}} = \frac{1}{2}$, we have

$$f_X(x) = \frac{1}{2\pi\,\sigma_{X_{re}}\sigma_{X_{im}}}\exp\left(-\frac{x_{re}^2}{2\sigma^2_{X_{re}}} - \frac{x_{im}^2}{2\sigma^2_{X_{im}}}\right) = \frac{\exp(-x^* x)}{\pi}.$$
b) Let $X$ be an $n$-dimensional, circularly-symmetric, complex Gaussian zero-mean random vector with
$K_X = [I_n]$. Show that

$$f_X(x) = \frac{\exp(-x^\dagger x)}{\pi^n}.$$
Solution: Since all real and imaginary components are independent, the joint probability
over all $n$ complex components is the product of $n$ terms with the form in (a):

$$f_X(x) = \frac{\exp\left(-\sum_{i=1}^n x_i^* x_i\right)}{\pi^n} = \frac{\exp(-x^\dagger x)}{\pi^n}.$$
c) Let $Y = [H]X$ where $[H]$ is $n \times n$ and invertible. Show that

$$f_Y(y) = \frac{\exp\big[-y^\dagger (H^{-1})^\dagger H^{-1} y\big]}{v\,\pi^n},$$

where $v = |dy|/|dx|$ is the ratio of an incremental $2n$-dimensional volume element after being transformed by $[H]$
to that before being transformed.
Solution: The argument here is exactly the same as that in Section 3.3.4. Since $Y = [H]X$,
we have $X = [H^{-1}]Y$, so for any $y$ and $x = [H^{-1}]y$,

$$f_Y(y)|dy| = f_X(x)|dx|, \qquad \text{so} \qquad f_Y(y) = \frac{f_X(H^{-1}y)}{|dy|/|dx|}.$$

Substituting the result of (b) into this,

$$f_Y(y) = \frac{\exp\big[-y^\dagger [H^\dagger]^{-1}[H]^{-1} y\big]}{\pi^n\,|dy|/|dx|}.$$
d) Use this to show that (3.108) is valid.

Solution: First note that

$$[K_Y] = E\big[Y Y^\dagger\big] = E\big[HX X^\dagger H^\dagger\big] = [HH^\dagger].$$

Thus $[K_Y]^{-1} = [H^\dagger]^{-1}[H]^{-1}$ and (c) can be rewritten as

$$f_Y(y) = \frac{\exp\big[-y^\dagger [K_Y^{-1}] y\big]}{\pi^n\,|dy|/|dx|}.$$

Next, view $|dx|$ as an incremental volume in $2n$-dimensional space ($n$ real and $n$ imaginary
components) and view $|dy|$ as the corresponding incremental volume for $y = Hx$, i.e.,

$$\begin{bmatrix} y_{re} \\ y_{im} \end{bmatrix} = \begin{bmatrix} H_{re} & -H_{im} \\ H_{im} & H_{re} \end{bmatrix}\begin{bmatrix} x_{re} \\ x_{im} \end{bmatrix}.$$

Using the same manipulations on the determinant as in Exercise 10.12, parts (f) and (g),
we then have

$$\frac{|dy|}{|dx|} = \det\begin{bmatrix} H_{re} & -H_{im} \\ H_{im} & H_{re} \end{bmatrix} = \det\begin{bmatrix} H_{re} + iH_{im} & iH_{re} - H_{im} \\ H_{im} & H_{re} \end{bmatrix}
= \det\begin{bmatrix} H_{re} + iH_{im} & 0 \\ H_{im} & H_{re} - iH_{im} \end{bmatrix} = \det[H]\det[H^*] = \det[K_Y].$$
Substituting this into the expression for $f_Y(y)$ above, we have (3.108).
Exercise 10.14: a) Let $Y = X^2 + Z$ where $Z$ is a zero-mean unit variance Gaussian random variable.
Show that no unbiased estimate of $X$ exists from observation of $Y$. Hint: Consider any $x > 0$ and compare
with $-x$.

Solution: Let $x > 0$ be arbitrary. Then $f_{Y|X}(y|x) = f_Z(y - x^2)$. Similarly, $f_{Y|X}(y|-x) =
f_Z(y - x^2)$. Thus for all choices of parameter $x$, $f_{Y|X}(y|x) = f_{Y|X}(y|-x)$. It follows that

$$E\big[\hat{X}(Y)\,|\,X = x\big] = \int \hat{x}(y) f_{Y|X}(y|x)\,dy
= \int \hat{x}(y) f_{Y|X}(y|-x)\,dy = E\big[\hat{X}(Y)\,|\,X = -x\big].$$

As a consequence, the bias is given by

$$b_{\hat{x}}(x) = E\big[\hat{X}(Y) - X\,|\,X{=}x\big] = E\big[\hat{X}(Y)\,|\,X{=}x\big] - x$$
$$b_{\hat{x}}(-x) = E\big[\hat{X}(Y) - X\,|\,X{=}{-x}\big] = E\big[\hat{X}(Y)\,|\,X{=}x\big] + x.$$

Thus $b_{\hat{x}}(-x) = b_{\hat{x}}(x) + 2x$. This shows that for each $x \ne 0$, if $\hat{x}$ is unbiased at $X = x$, then
it must be biased at $X = -x$ and vice-versa.
This is not a bizarre type of example; it is unusual only in its extreme simplicity. The
underlying problem here occurs whether or not one assumes an a priori distribution for $X$,
but it is easiest to understand if $X$ has a symmetric distribution around 0. In this case, $Y$
is independent of the sign of $X$, so one should not expect any success in estimating the sign
of $X$ from $Y$. The point of interest here, however, is that bias is not a very fundamental
quantity. Estimation usually requires some sort of tradeoff between successful estimation
of different values of $x$, and focusing on bias hides this tradeoff.
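The symmetry argument above is easy to see by simulation. The sketch below uses the particular (illustrative) estimator $\hat{x}(y) = y$ and arbitrary values of $x$ and sample size; since $Y = X^2 + Z$ has the same distribution under $X = x$ and $X = -x$, the two estimated conditional means agree, forcing $b_{\hat{x}}(-x) \approx b_{\hat{x}}(x) + 2x$:

```python
# Monte Carlo illustration of (a) with the estimator xhat(y) = y (illustrative).
import random
random.seed(1)

x, N = 0.8, 200_000
# Y = X^2 + Z has the same distribution under X = x and X = -x,
# so E[xhat(Y)|X=x] = E[xhat(Y)|X=-x] for ANY estimator.
mean_plus  = sum(x * x + random.gauss(0.0, 1.0) for _ in range(N)) / N
mean_minus = sum((-x) ** 2 + random.gauss(0.0, 1.0) for _ in range(N)) / N
b_plus  = mean_plus - x          # bias at X =  x
b_minus = mean_minus - (-x)      # bias at X = -x
print(b_plus, b_minus)           # b_minus ≈ b_plus + 2x
```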
b) Let $Y = X + Z$ where $Z$ is uniform over $(-1, 1)$ and $X$ is a parameter lying in $(-1, 1)$. Show that $\hat{x}(y) = y$
is an unbiased estimate of $x$. Find a biased estimate $\hat{x}_1(y)$ for which $|\hat{x}_1(y) - x| \le |\hat{x}(y) - x|$ for all $x$ and $y$
with strict inequality with positive probability for all $x \in (-1, 1)$.

Solution: Choosing $\hat{x}(y) = y$ implies that $E\big[\hat{X}(Y)\,|\,X{=}x\big] = E[X + Z\,|\,X = x] = x$ since
$Z$ is independent of $X$ and satisfies $E[Z] = 0$. Thus $b_{\hat{x}}(x) = 0$ for all $x$, $-1 < x < 1$, and $\hat{x}(y)$
is unbiased. This is clearly a stupid estimation rule, however, since whenever $y > 1$, the
rule chooses $\hat{x}(y)$ to be larger than any possible $x$. Reducing the estimate to 1 whenever
$y > 1$ clearly reduces the error $|\hat{x}(y) - x|$ for all $x$ when $y > 1$. Increasing the estimate to
$-1$ when $y < -1$ has the same advantage. Thus a more sensible estimate is

$$\hat{x}_1(y) = \begin{cases} 1; & y \ge 1 \\ y; & -1 < y < 1 \\ -1; & y \le -1. \end{cases}$$
For all $y > 1$, $|\hat{x}_1(y) - x| = 1 - x < y - x = |\hat{x}(y) - x|$. Similarly, for $y < -1$, $|\hat{x}_1(y) - x| <
|\hat{x}(y) - x|$. For $-1 \le y \le 1$, $|\hat{x}_1(y) - x| = |\hat{x}(y) - x|$. In summary, $|\hat{x}_1(y) - x| \le |\hat{x}(y) - x|$
with strict inequality for $|y| > 1$. For each $x \ne 0$, there is a positive probability that $|Y| > 1$,
so the expected magnitude of error is smaller for $\hat{x}_1$ than for $\hat{x}$ for all $x \ne 0$.
To be more explicit, it is straightforward to calculate that $b_{\hat{x}_1}(x) = -x^2/4$ for $x \ge 0$ and
$b_{\hat{x}_1}(x) = x^2/4$ for $x < 0$, so the new estimate is biased for all $x$ except $x = 0$. The expected
error magnitude for $\hat{x}_1$ is $E[|\hat{x}_1(Y) - x|] = 1/2 - x^2/4$, whereas for the unbiased estimate,
$E[|\hat{x}(Y) - x|] = 1/2$. Thus the expected error magnitude using $\hat{x}_1$ is smaller than that for $\hat{x}$
for all $x \ne 0$.
Neither this example nor the one in (a) is exotic. Together they give the picture that unbiased
estimators can neither be counted on to exist nor to provide sensibly small estimation
errors without checking for additional specialized structure.
Exercise 10.15: a) Assume that for each parameter value $x$, $Y$ is Gaussian, $\mathcal{N}(x, \sigma^2)$. Show that $v_x(y)$
as defined in (10.103) is equal to $(y - x)/\sigma^2$ and show that the Fisher information is equal to $1/\sigma^2$.

Solution: Note that we can view $Y$ as $X + Z$ where $Z$ is $\mathcal{N}(0, \sigma^2)$. We have $f(y|x) =
(2\pi\sigma^2)^{-1/2}\exp(-(y - x)^2/(2\sigma^2))$. Thus

$$\partial f(y|x)/\partial x = \big[(y - x)/\sigma^2\big](2\pi\sigma^2)^{-1/2}\exp(-(y - x)^2/(2\sigma^2)).$$

Dividing by $f(y|x)$, we see that $v_x(y) = (y - x)/\sigma^2$. Then the random variable $V_x(Y)$,
conditional on $x$, is $(Y - x)/\sigma^2$. Since $Y \sim \mathcal{N}(x, \sigma^2)$, the variance of $(Y - x)/\sigma^2$, conditional
on $x$, is $\sigma^2/\sigma^4 = 1/\sigma^2$. By definition (see (10.107)), this is the Fisher information, i.e.,
$J(x) = 1/\sigma^2$.
b) Show that for ML estimation, the bias is 0 for all $x$. Show that the Cramer-Rao bound is satisfied with
equality for all $x$ in this case.

Solution: For the ML estimate, $\hat{x}_{ML}(y) = y$, and for any given $x$,

$$E\big[\hat{X}_{ML}(Y)\,|\,X = x\big] = E[Y\,|\,X = x] = x.$$

Thus $b_{\hat{x}_{ML}}(x) = 0$ and the estimate is unbiased. The estimation error is just $Z$, so the
mean-square estimation error, $E\big[(\hat{X}_{ML}(Y) - x)^2\,|\,X = x\big]$, is $\sigma^2$ for each $x$. Combining this
with (a),

$$E\big[(\hat{X}_{ML}(Y) - x)^2\,|\,X = x\big] = 1/J(x) \qquad \text{for each } x.$$

Comparing with (10.113), the Cramer-Rao bound for an unbiased estimator is met with
equality for all $x$ by using the ML estimate.
c) Consider using the MMSE estimator for the a priori distribution $X \sim \mathcal{N}(0, \sigma_X^2)$. Show that the bias
satisfies $b_{\hat{x}_{MMSE}}(x) = -x\sigma^2/(\sigma^2 + \sigma_X^2)$.

Solution: For the MMSE estimator, $\hat{x}_{MMSE}(y) = \frac{\sigma_X^2}{\sigma^2 + \sigma_X^2}\,y$. Thus, since $E[Y\,|\,X{=}x] = x$, we
have $E\big[\hat{X}(Y)\,|\,X{=}x\big] = \frac{x\,\sigma_X^2}{\sigma^2 + \sigma_X^2}$ and

$$b_{\hat{x}_{MMSE}}(x) = \frac{x\,\sigma_X^2}{\sigma^2 + \sigma_X^2} - x = -\frac{x\,\sigma^2}{\sigma^2 + \sigma_X^2}.$$
d) Show that the MMSE estimator in (c) satisfies the Cramer-Rao bound with equality for each $x$. Note
that the mean-squared error, as a function of $x$, is smaller than that for the ML case for small $x$ and larger
for large $x$.

Solution: From (10.109), the Cramer-Rao bound for a biased estimate is

$$\text{VAR}\big[\hat{X}(Y)\,|\,X{=}x\big] \ge \frac{\Big[1 + \frac{\partial b_{\hat{x}}(x)}{\partial x}\Big]^2}{J(x)}
= \frac{\big[1 - \sigma^2/(\sigma^2 + \sigma_X^2)\big]^2}{1/\sigma^2} = \frac{\sigma^2 \sigma_X^4}{(\sigma^2 + \sigma_X^2)^2}. \qquad \text{(A.232)}$$

Calculating $\text{VAR}\big[\hat{X}(Y)\,|\,X{=}x\big]$ directly,

$$\text{VAR}\big[\hat{X}(Y)\,|\,X{=}x\big] = \text{VAR}\left[\frac{\sigma_X^2\,Y}{\sigma^2 + \sigma_X^2}\,\Big|\,X{=}x\right]
= \text{VAR}\left[\frac{\sigma_X^2\,(x + Z)}{\sigma^2 + \sigma_X^2}\,\Big|\,X{=}x\right]
= \left(\frac{\sigma_X^2}{\sigma^2 + \sigma_X^2}\right)^2 \sigma^2 = \frac{\sigma^2 \sigma_X^4}{(\sigma^2 + \sigma_X^2)^2}. \qquad \text{(A.233)}$$

Comparing (A.232) and (A.233), we see that the Cramer-Rao bound is met with equality
for all $x$. Note also that the variance of the estimation error is smaller for all $\sigma_X^2$ than that
for the ML estimate, and that this variance shrinks to 0 as $\sigma_X^2 \to 0$. This seems surprising
until we recognize that the estimate is also shrinking to 0 as $\sigma_X^2 \to 0$.

If we use the Cramer-Rao bound to look at the mean-square error given $x$ rather than the
variance given $x$ (see (10.112)), we get

$$E\big[(\hat{X}(Y) - x)^2\,|\,X{=}x\big] \ge \frac{\sigma^2 \sigma_X^4 + \sigma^4 x^2}{(\sigma^2 + \sigma_X^2)^2},$$

again met with equality here. We see then that the mean-square error using MMSE estimation is small when $|x|$ is small
and large when $|x|$ is large. This is not surprising, since the MMSE estimate chooses $\hat{x}(y)$
to be a fraction smaller than 1 of the observation $y$.
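The equality between the biased Cramer-Rao bound and the directly computed variance can be checked numerically; the sketch below also estimates the variance by simulation. The values of $\sigma$, $\sigma_X$, $x$, and the sample size are arbitrary illustrative choices:

```python
# Check of (d): variance of the MMSE estimate vs the biased Cramer-Rao bound.
import random
random.seed(3)

sigma, sigma_X, x, N = 1.0, 2.0, 0.7, 400_000
a = sigma_X**2 / (sigma**2 + sigma_X**2)            # MMSE scale factor
bound = (1 - sigma**2 / (sigma**2 + sigma_X**2))**2 * sigma**2
closed_form = sigma**2 * sigma_X**4 / (sigma**2 + sigma_X**2)**2

est = [a * (x + random.gauss(0.0, sigma)) for _ in range(N)]   # xhat(Y), Y = x + Z
m = sum(est) / N
var_est = sum((e - m)**2 for e in est) / N
print(bound, closed_form, var_est)                  # all ≈ 0.64 for these values
```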
Exercise 10.16: Assume that $Y$ is $\mathcal{N}(0, x)$. Show that $V_x(y)$ as defined in (10.103) is $v_x(y) = [y^2/x -
1]/(2x)$. Verify that $V_x(Y)$ is zero mean for each $x$. Find the Fisher information, $J(x)$.
Solution: Given $x$, $Y$ is simply a zero-mean Gaussian rv, so

$$f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi x}}\exp\left(-\frac{y^2}{2x}\right)$$
$$\frac{\partial f_{Y|X}(y|x)}{\partial x} = \frac{1}{\sqrt{2\pi x}}\left[-\frac{1}{2x} + \frac{y^2}{2x^2}\right]\exp\left(-\frac{y^2}{2x}\right)$$
$$v_x(y) = -\frac{1}{2x} + \frac{y^2}{2x^2} = \frac{(y^2/x) - 1}{2x}.$$
Since the variance of $Y$, conditional on $X = x$, is $x$, the expected value of $V_x(Y)$ conditional
on $X = x$ is 0 for all $x$.

Since $E[V_x(Y)\,|\,X = x] = 0$, we have

$$J(x) = \text{VAR}[V_x(Y)\,|\,X{=}x] = E\big[V_x^2(Y)\,|\,X{=}x\big]
= \frac{1}{4x^2}\,E\big[(Y^2/x - 1)^2\,|\,X{=}x\big] = \frac{1}{4x^2}\,E\left[\frac{Y^4}{x^2} - \frac{2Y^2}{x} + 1\,\Big|\,X{=}x\right]
= \frac{1}{4x^2}\left[\frac{3x^2}{x^2} - \frac{2x}{x} + 1\right] = \frac{1}{2x^2}.$$

It is slightly easier to use (10.108) to calculate $J(x)$, recognizing that the second derivative
in (10.108) is the first derivative of the score $V_x(Y)$.
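The conclusion that the score has mean 0 and variance $J(x) = 1/(2x^2)$ is easy to confirm by simulation (the value of $x$ and the sample size below are arbitrary illustrative choices):

```python
# Monte Carlo check that V_x(Y) = (Y^2/x - 1)/(2x) has mean 0 and
# variance 1/(2x^2) when Y ~ N(0, x).
import math
import random
random.seed(5)

x, N = 2.0, 400_000
v = [((random.gauss(0.0, math.sqrt(x)) ** 2) / x - 1.0) / (2 * x) for _ in range(N)]
mean_v = sum(v) / N
var_v = sum(vi * vi for vi in v) / N - mean_v**2
print(mean_v, var_v)          # ≈ 0 and ≈ 1/(2x^2) = 0.125
```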