
Instructor’s Solutions Manual
Version October 14, 2017
Probability, A Lively Introduction
Henk Tijms
Cambridge University Press, 2017
1 Despite careful checking, there is a non-negligible probability that there are errors in the answers. If anything should be corrected, please let me know by sending an email to h.c.tijms@gmail.com.
Chapter 1
1.1 Imagine a blue and a red die. Take as sample space the set of the
ordered pairs (b, r), where b is the number shown on the blue and r
is the number shown on the red die. Each of the 36 elements (b, r) is
equally likely. There are 2 × 3 × 3 = 18 elements for which exactly
one component is odd. Thus the probability that the sum of the two
dice is odd equals 18/36 = 1/2. There are 3 × 3 = 9 elements for which
both components are odd. Thus the probability that the product of
the two numbers rolled is odd equals 9/36 = 1/4. Alternatively, you can
obtain the probabilities by using as sample space the set consisting of
the four equiprobable elements (odd, odd), (odd, even), (even, even),
and (even, odd).
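As a quick sanity check (not part of the original manual), both probabilities can be verified by enumerating the 36 equally likely outcomes in Python:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (b, r) of the blue and red die.
outcomes = list(product(range(1, 7), repeat=2))

p_sum_odd = Fraction(sum((b + r) % 2 == 1 for b, r in outcomes), len(outcomes))
p_prod_odd = Fraction(sum((b * r) % 2 == 1 for b, r in outcomes), len(outcomes))

print(p_sum_odd, p_prod_odd)  # 1/2 1/4
```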
1.2 Label the plumbers as 1 and 2. Take as sample space the set of all
possible sequences of ones and twos of length 3, where a one stands
for plumber 1 and a two for plumber 2. The sample space has 2^3 = 8
equiprobable outcomes. There are 2 outcomes with three ones or three
twos. The sought probability is 2/8 = 1/4.
1.3 Label the 10 letters of “randomness” as 1 to 10. Take as sample space
the set of all permutations of the numbers 1 to 10. All 10! outcomes are
equally likely. There are 3 × 2 × 8! outcomes that begin and end with
a vowel and there are 8 × 3! × 7! outcomes in which the three vowels
are adjacent to each other. The probabilities are 3 × 2 × 8!/10! = 1/15
and 8 × 3! × 7!/10! = 1/15.
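The two counts can be checked with exact factorial arithmetic; a short verification sketch (not in the original):

```python
from fractions import Fraction
from math import factorial

# Begin and end with a vowel: 3 choices for the first position, 2 for the
# last, and 8! arrangements of the remaining letters.
p_ends = Fraction(3 * 2 * factorial(8), factorial(10))

# Three vowels adjacent: 8 positions for the vowel block, 3! orders inside
# the block, 7! arrangements of the other letters.
p_adjacent = Fraction(8 * factorial(3) * factorial(7), factorial(10))

print(p_ends, p_adjacent)  # 1/15 1/15
```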
1.4 Take as sample space the set of all possible sequences of zeros and ones
of length 4, where a zero stands for male gender and a one for
female gender. The sample space has 2^4 = 16 equiprobable outcomes.
There are 8 outcomes with exactly three zeros or exactly three ones
and 6 outcomes with exactly two zeros. Hence the probability of three
puppies of one gender and one of the other is 8/16 = 1/2. The probability of
two puppies of each gender is 6/16 = 3/8.
1.5 Take as sample space the set of all unordered samples of m different
numbers. The sample space has \binom{n}{m} equiprobable elements. There
are \binom{n−1}{m−1} samples that contain the largest number. The probability
of getting the largest number is \binom{n−1}{m−1}/\binom{n}{m} = m/n. Alternatively, you
can take as sample space the set of all n! permutations of the integers
1 to n. There are m × (n − 1)! permutations for which the number n
is in one of the first m positions.
Note: More generally, the probability that the largest r numbers
are among the m numbers picked is given by both \binom{n−r}{m−r}/\binom{n}{m} and
\binom{m}{r} r!(n − r)!/n!.
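Both the identity \binom{n−1}{m−1}/\binom{n}{m} = m/n and the note's two expressions can be confirmed for small values with exact arithmetic; a quick check (not in the original):

```python
from fractions import Fraction
from math import comb, factorial

for n in range(2, 12):
    for m in range(1, n + 1):
        # Probability that the largest number is among the m numbers picked.
        assert Fraction(comb(n - 1, m - 1), comb(n, m)) == Fraction(m, n)
        for r in range(1, m + 1):
            # The note's two expressions for the largest r numbers agree.
            lhs = Fraction(comb(n - r, m - r), comb(n, m))
            rhs = Fraction(comb(m, r) * factorial(r) * factorial(n - r), factorial(n))
            assert lhs == rhs

print("identities verified")
```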
1.6 Take as sample space the set of all possible combinations of two persons
who do the dishes. The sample space has \binom{6}{2} = 15 equally likely
outcomes. The number of outcomes consisting of two boys is \binom{3}{2} = 3.
The sought probability is 3/15 = 1/5. Alternatively, using an ordered
sample space consisting of all 6! possible orderings of the six people
and imagining that the first two people in the ordering have to do the
dishes, the sought probability can be calculated as (3 × 2 × 4!)/6! = 1/5.
1.7 Imagine that the balls are labeled as 1, . . . , n. It is no restriction to
assume that the two winning balls have the labels 1 and 2. Take as
sample space the set of all n! permutations of 1, . . . , n. For any k,
the number of permutations having either 1 or 2 on the kth place is
(n − 1)! + (n − 1)!. Thus, the probability that the kth person picks a
winning ball is ((n − 1)! + (n − 1)!)/n! = 2/n for each k.
1.8 Take as sample space the set of all ordered pairs (i, j) : i, j = 1, . . . , 6,
where i is the number rolled by player A and j is the number rolled by
player B. The sample space has 36 equally likely outcomes. The number
of winning outcomes for player B is 9 + 11 = 20. The probability
of player A winning is 16/36 = 4/9.
1.9 Take as sample space the set of all unordered samples of six different
numbers from the numbers 1 to 42. The sample space has \binom{42}{6}
equiprobable outcomes. There are \binom{41}{5} outcomes with the number 10.
Thus the probability of getting the number 10 is \binom{41}{5}/\binom{42}{6} = 6/42. The
probability that each of the six numbers picked is 20 or more is equal
to \binom{23}{6}/\binom{42}{6} = 0.0192. Alternatively, the probabilities can be calculated
by using the sample space consisting of all ordered arrangements
of the numbers 1 to 42, where the numbers in the first six positions
are the lotto numbers. This leads to the calculations (6 × 41!)/42! = 6/42
and (\binom{23}{6} × 6! × 36!)/42! = 0.0192 for the sought probabilities.
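A quick check of the two lotto probabilities with `math.comb` (not part of the original solution):

```python
from fractions import Fraction
from math import comb

p_ten = Fraction(comb(41, 5), comb(42, 6))   # ticket contains the number 10
p_high = comb(23, 6) / comb(42, 6)           # all six numbers are 20 or more

print(p_ten, round(p_high, 4))  # 1/7 0.0192
```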
1.10 Take as (unordered) sample space all possible combinations of two
candidates to receive a cup of tea from the waiter. The sample space
has \binom{5}{2} = 10 equally likely outcomes. The number of combinations
of two people each getting the cup of tea they ordered is 1. The
sought probability is 1/10. Alternatively, using an ordered sample space
consisting of all possible orderings of the five people and imagining
that the first two people in the ordering get a cup of tea from the
waiter, the probability can be calculated as (2 × 1 × 3!)/5! = 1/10.
1.11 Label the nine socks as s1 , . . . , s9 . The probability model in which the
order of selection of the socks is considered relevant has a sample space
with 9 × 8 = 72 equiprobable outcomes (si , sj ). There are 4 × 5 = 20
outcomes for which the first sock chosen is black and the second is
white, and there are 5 × 4 = 20 outcomes for which the first sock is
white and the second is black. The sought probability is 40/72 = 5/9.
The probability model in which the order of selection of the socks is
not considered relevant has a sample space with \binom{9}{2} = 36 equiprobable
outcomes. The number of outcomes for which the socks have different
colors is \binom{5}{1} × \binom{4}{1} = 20, yielding the same value 20/36 = 5/9 for the
sought probability.
1.12 This problem can be solved by using either an ordered sample space or
an unordered sample space. Label the ten letters of the word Cincinnati as 1, 2, . . . , 10. As ordered sample space, take the set of all ordered pairs (i1 , i2 ), where i1 is the label of the first letter dropped and
i2 is the label of the second letter dropped. This sample space has
10 × 9 = 90 equally likely outcomes. Let A be the event that the two
letters dropped are the same. Noting that in the word Cincinnati the
letter c occurs two times and the letters i and n each occur three times,
it follows that there are \binom{2}{2} × 2! + \binom{3}{2} × 2! + \binom{3}{2} × 2! = 14 outcomes
leading to the event A. Hence P(A) = 14/90 = 7/45. An unordered sample
space can also be used. This sample space consists of all possible sets
of two differently labeled letters from the ten letters of Cincinnati.
This sample space has \binom{10}{2} = 45 equally likely outcomes. The number
of outcomes for which the two labeled letters in the set represent the
same letter is \binom{2}{2} + \binom{3}{2} + \binom{3}{2} = 7. This gives the same value 7/45 for the
probability that the two letters dropped are the same.
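The value 7/45 can be confirmed by enumerating all pairs of positions directly; a check (not in the original):

```python
from fractions import Fraction
from itertools import combinations

word = "cincinnati"
pairs = list(combinations(range(len(word)), 2))   # the 45 unordered position pairs
same = sum(word[i] == word[j] for i, j in pairs)  # pairs showing the same letter

print(Fraction(same, len(pairs)))  # 7/45
```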
1.13 Take as sample space the set of all unordered pairs of two distinct
cards. The sample space has \binom{52}{2} equally likely outcomes. There are
\binom{1}{1} × \binom{51}{1} = 51 outcomes with the ten of hearts, and \binom{3}{1} × \binom{12}{1} = 36
outcomes with a heart and a ten but not the ten of hearts. The sought
probability is (51 + 36)/\binom{52}{2} = 0.0656.
1.14 Represent the words chance and choice by chanCe and choiCe. Take as
sample space the set of all possible pairs (l1, l2), where l1 is an element
from the word chanCe and l2 is an element from the word choiCe.
By distinguishing between c and C, the sample space has 6 × 6 = 36
equally likely outcomes. The number of outcomes for which the two
chosen letters represent the same letter is 4 + 1 + 1 = 6. The sought
probability is 6/36 = 1/6.
1.15 Take as sample space the set of all sequences (i1, . . . , i10), where ik is the
number shown on the kth roll of the die. Each element of the sample
space is equally likely. The explanation is that there is a one-to-one
correspondence between the elements (i1, . . . , i10) with \sum_{k=1}^{10} i_k = s
and the elements (7 − i1, . . . , 7 − i10) with \sum_{k=1}^{10} (7 − i_k) = 70 − s.
1.16 Take as ordered sample space the set of all sequences (i1, . . . , i12),
where ik is the number rolled by the kth die. The sample space has
6^12 equally likely outcomes. The number of outcomes in which each
number appears exactly two times is \binom{12}{2} × \binom{10}{2} × \binom{8}{2} × \binom{6}{2} × \binom{4}{2} =
12!/2^6. The sought probability is \frac{12!}{2^6 × 6^{12}} = 0.0034.
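The count 12!/2^6 and the resulting probability can be verified numerically; a check (not in the original):

```python
from math import comb, factorial

ways = 1
remaining = 12
for _ in range(6):            # assign two of the remaining dice to each face value
    ways *= comb(remaining, 2)
    remaining -= 2

assert ways == factorial(12) // 2**6
p = ways / 6**12
print(round(p, 4))  # 0.0034
```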
1.17 Take as sample space the set of all possible samples of three residents.
This leads to the value \binom{4}{1}\binom{4}{1}\binom{4}{1}/\binom{12}{3} = 16/55 for the sought probability.
1.18 Take as sample space the set of all ordered arrangements of 10 people,
where the people in the first five positions form group 1 and the other
five people form group 2. The sample space has 10! equally likely
elements. The number of elements for which your two friends and you
together are in the same group is 5 × 4 × 3 × 7! + 5 × 4 × 3 × 7!. The
sought probability is (120 × 7!)/10! = 1/6. Alternatively, the probability can be
calculated as (\binom{3}{3}\binom{7}{2} + \binom{3}{0}\binom{7}{5})/\binom{10}{5} = 1/6, using as sample space the set of all
possible combinations of five people for the first group. A third way
to calculate the probability is (\binom{5}{3} + \binom{5}{3})/\binom{10}{3} = 1/6, using as sample space the
set of all possible combinations of three positions for the three friends.
1.19 Take as sample space the set of the 9! possible orderings of the nine
books. The subjects mathematics, physics and chemistry can be ordered in 3 × 2 × 1 = 6 ways and so the number of favorable orderings is
6 × 4! × 3! × 2!. The sought probability is (6 × 4! × 3! × 2!)/9! = 1/210.
1.20 The sample space is Ω = {(i, j, k) : i, j, k = 0, 1}, where the three
components correspond to the outcomes of the three individual tosses
of the three friends. Here 0 means heads and 1 means tails. Each
element of the sample space gets assigned a probability of 1/8. Let A
denote the event that one of the three friends pays for all the three
tickets. The set A is given by A = Ω\{(0, 0, 0), (1, 1, 1)} and consists
of six elements. The sought probability is P(A) = 6/8.
1.21 Label the eleven letters of the word Mississippi as 1, 2, . . . , 11 and take
as sample space the set of the 11^11 possible ordered sequences of eleven
numbers from 1, . . . , 11. The four positions for a number representing
i, the four positions for a number representing s, the two positions
for a number representing p, and the one position for the number
representing m can be chosen in \binom{11}{4} × \binom{7}{4} × \binom{3}{2} ways. Therefore the
number of outcomes in which all letters of the word Mississippi are
represented is \binom{11}{4} × \binom{7}{4} × \binom{3}{2} × 4^4 × 4^4 × 2^2. Dividing this number by
11^11 gives the value 0.0318 for the sought probability.
1.22 One pair is a hand with the pattern aabcd, where a, b, c and d are
from distinct kinds of cards. There are 13 kinds and four of each kind
in a standard deck of 52 cards. The probability of getting one pair is

\binom{13}{1}\binom{4}{2}\binom{12}{3}\binom{4}{1}^3 / \binom{52}{5} = 0.4226.

Two pair is a hand with the pattern aabbc, where a, b and c are from
distinct kinds of cards. The probability of getting two pair is

\binom{13}{2}\binom{4}{2}\binom{4}{2}\binom{11}{1}\binom{4}{1} / \binom{52}{5} = 0.0475.
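Both poker probabilities can be reproduced with `math.comb`; a verification (not part of the original):

```python
from math import comb

hands = comb(52, 5)
one_pair = comb(13, 1) * comb(4, 2) * comb(12, 3) * comb(4, 1)**3
two_pair = comb(13, 2) * comb(4, 2)**2 * comb(11, 1) * comb(4, 1)

print(round(one_pair / hands, 4))  # 0.4226
print(round(two_pair / hands, 4))  # 0.0475
```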
1.23 Take as sample space the set of all possible combinations of two apartments
from the 56 apartments. These two apartments represent the
vacant apartments. The sample space has \binom{56}{2} equiprobable elements.
The number of elements with no vacant apartment on the top floor is
\binom{48}{2}. Thus the sought probability is (\binom{56}{2} − \binom{48}{2})/\binom{56}{2} = 0.2675. Alternatively,
using a sample space made up of all permutations of the
56 apartments, the probability can be calculated as 1 − (48 × 47 × 54!)/56! =
0.2675.
1.24 Imagine that the balls are labeled as 1 to 11, where the white balls get
the labels 1 to 7 and the red balls the labels 8 to 11. Take as sample
space the set of all possible permutations of 1, 2, . . . , 11. The number
of outcomes in which a red ball appears for the first time at the ith
drawing is \binom{7}{i−1} × (i − 1)! × 4 × (7 − (i − 1) + 3)! for 1 ≤ i ≤ 8. The
sought probability is

\frac{1}{11!} \sum_{k=1}^{4} \binom{7}{2k − 1} × (2k − 1)! × 4 × (7 − (2k − 1) + 3)! = \frac{13}{33}.
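The sum evaluates to 13/33 exactly; a quick check (not in the original):

```python
from fractions import Fraction
from math import comb, factorial

# Outcomes in which the first red ball appears at drawing i = 2k (k = 1,...,4).
total = sum(comb(7, 2 * k - 1) * factorial(2 * k - 1) * 4 * factorial(7 - (2 * k - 1) + 3)
            for k in range(1, 5))

print(Fraction(total, factorial(11)))  # 13/33
```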
1.25 Take as sample space the set of all ordered pairs (i, j), where i is
the first number picked and j is the second number picked. There
are n2 equiprobable outcomes. For r ≤ n + 1, the r − 1 outcomes
(1, r − 1), (2, r − 2), . . . , (r − 1, 1) are the only outcomes (i, j) for which
i + j = r. Thus the probability that the sum of the two numbers
picked is r is (r − 1)/n^2 for 2 ≤ r ≤ n + 1. Therefore the probability of
getting a sum s when rolling two dice is (s − 1)/36 for 2 ≤ s ≤ 7. By a
symmetry argument, the probability of getting a sum s is the same as
the probability of getting a sum 14 − s for 7 ≤ s ≤ 12 (opposite faces
of a die always total 7). Thus the probability of rolling a sum s has
the value (13 − s)/36 for 7 ≤ s ≤ 12.
1.26 Take as sample space the interval (0, 1). The outcome x means that
the stick is broken at the point x. The length of the longer piece is
at least three times the length of the shorter piece if x ∈ (0, 1/4) or
x ∈ (3/4, 1). The sought probability is 1/4 + 1/4 = 1/2.
1.27 Take as sample space the square {(x, y) : 0 ≤ x, y ≤ a}. The outcome
(x, y) refers to the position of the middle point of the coin. The sought
probability is given by the probability that a randomly chosen point
in the square falls in the subset {(x, y) : d/2 ≤ x, y ≤ a − d/2} and is equal
to (a − d)^2/a^2.
1.28 Take as sample space the interval (0, 1). The outcome x means that a
randomly chosen point in (0, 1) is equal to x. The sought probability is
the probability that a randomly chosen point in (0, 1) falls into one of
the intervals (0, 1/12) or (1/2, 7/12). The sought probability is 1/12 + 1/12 = 1/6.
1.29 This problem can be solved with the model of picking at random a
point inside a rectangle. The rectangle R = {(x, y) : 0 ≤ x ≤ 1, 1/2 ≤
y ≤ 1} is taken as sample space, where the outcome (x, y) means that
you arrive 60x minutes past 7 a.m. and your friend arrives 60y minutes
past 7 a.m. The probability assigned to each subset of the sample space
is the area of the subset divided by the area of the rectangle R. The
sought probability is P(A), where the set A is the union of the three
disjoint subsets {(x, y) : 1/2 < x, y < 1/2 + 1/12}, {(x, y) : 1/2 + 1/12 < x, y < 3/4}
and {(x, y) : 3/4 < x, y < 1}. This gives

P(A) = 2 × (1/12 × 1/12 + 1/6 × 1/6 + 1/4 × 1/4) = 7/36.
1.30 Translate the problem into choosing a point at random inside the unit
square {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. The probability that the
two persons will meet within 10 minutes of each other is given by the
probability that a point chosen at random in the unit square will fall
inside the shaded region in the figure, the band between the lines
y = x − 1/6 and y = x + 1/6. The area of the shaded region
is calculated as 1 − (5/6) × (5/6) = 0.3056. This gives the desired probability.
1.31 For q = 1, take the square {(x, y) : −1 < x, y < 1} as sample
space. The sought probability is the probability that a point (x, y)
chosen at random in the square satisfies y ≤ (1/4)x^2 and is equal to
\frac{1}{4}\big(2 + \int_{−1}^{1} \frac{1}{4}x^2\, dx\big) = 0.5417. For the general case, the sought probability is (make a picture!):

\frac{1}{4q^2}\Big(2q^2 + \int_{−q}^{q} \frac{1}{4}x^2\, dx\Big) = \frac{1}{2} + \frac{q}{24}   for 0 < q < 4,

\frac{1}{4q^2}\Big(2q^2 + \int_{−2\sqrt{q}}^{2\sqrt{q}} \frac{1}{4}x^2\, dx + 2(q − 2\sqrt{q})q\Big) = 1 − \frac{2}{3\sqrt{q}}   for q ≥ 4.
1.32 Take as sample space the set {(x, y) : 0 ≤ x, y ≤ 1}. The outcome
(x, y) means that a randomly chosen point in the unit square is equal to
(x, y). The probability that the Manhattan distance from a randomly
chosen point to the point (0, 0) is no more than a is given by the
probability that the randomly chosen point (x, y) satisfies x + y ≤ a.
The area of the region {(x, y) : 0 ≤ x, y ≤ 1 and x + y ≤ a} is \frac{1}{2}a^2.
This gives the sought probability. By a symmetry argument, this
probability also applies to the case that the point is randomly chosen
in the square {(x, y) : −1 ≤ x, y ≤ 1}.
1.33 Take as sample space the set S = {(y, θ) : 0 ≤ y ≤ \frac{1}{2}D, 0 ≤ θ ≤
\frac{π}{2}}, where y is the distance from the midpoint of the diagonal of the
rectangular card to the closest line on the floor and the angle θ is as
described in the figure. It is no restriction to assume that b ≥ a.

[Figure: the card with sides a and b; α is the angle between the half-diagonal and the side of length b, and y_θ is the critical distance.]

Using the figure, it is seen that the card will intersect one of the
lines on the floor if and only if the distance y is less than y_θ, where
y_θ is determined by sin(α + θ) = y_θ/(\frac{1}{2}\sqrt{a^2 + b^2}). Since sin(α + θ) =
sin(α)cos(θ) + cos(α)sin(θ) with sin(α) = \frac{a}{\sqrt{a^2+b^2}} and cos(α) = \frac{b}{\sqrt{a^2+b^2}},
it follows that y_θ = \frac{1}{2}\big(a cos(θ) + b sin(θ)\big). The sought probability is
the area under the curve y = \frac{1}{2}\big(a cos(θ) + b sin(θ)\big) divided by the area
of the set S and so it is equal to

\frac{1}{(1/4)πD} \int_0^{π/2} \frac{1}{2}\big(a cos(θ) + b sin(θ)\big)\, dθ = \frac{2(a + b)}{πD}.
1.34 The perpendicular distance from the randomly chosen point is larger
than d if and only if the point falls inside the shaded region of the
triangle in the left figure (the triangle with base b and height h, shaded
above height d). Using the fact that the base of the shaded
triangle is \frac{h−d}{h}b (the ratio of this base and b equals (h − d)/h), it follows
that the first probability is given by

\frac{\frac{1}{2}[(h − d) × (h − d)b/h]}{\frac{1}{2}h × b} = \frac{(h − d)^2}{h^2}.

The randomly chosen point and the base of the triangle form an obtuse
triangle if and only if the randomly chosen point falls inside the shaded
region in the right figure. The area of the shaded region is the sum
of the areas of two equilateral triangles with side lengths b/2 plus the
area of one sixth of a circle with radius b/2. The area of an equilateral
triangle with sides of length a is given by \frac{1}{4}\sqrt{3}a^2. It now follows that
the second probability is given by

\frac{(1/4)\sqrt{3}(b/2)^2 + (1/6)π(b/2)^2 + (1/4)\sqrt{3}(b/2)^2}{(1/4)\sqrt{3}b^2} = \frac{1}{2} + \frac{π}{6\sqrt{3}} = 0.8023.
1.35 Take as sample space the unit square {(x, y) : 0 ≤ x, y ≤ 1}. The
side lengths v = x, w = y × (1 − v) and 1 − v − w should satisfy the
conditions v + w > 1 − v − w, v + 1 − v − w > w and w + 1 − v − w > v.
These conditions can be translated into y > \frac{1−2x}{2−2x}, y < \frac{1}{2−2x} and x < \frac{1}{2}.
The sought probability is given by the area of the shaded region in the
first part of the figure and is equal to \int_0^{0.5} \frac{1}{2−2x}\, dx − \int_0^{0.5} \frac{1−2x}{2−2x}\, dx =
ln(2) − 0.5. To find the second probability, let v be the first random
breakpoint chosen on the stick and w be the other breakpoint. The
point (v, w) can be represented by v = x and w = y × (1 − v) if v < \frac{1}{2}
and by v = x and w = y × v if v > \frac{1}{2}, where (x, y) is a randomly chosen
point in the unit square. The second probability is the area of the shaded
region in the second part of the figure and is equal to 2(ln(2) − 0.5).
1.36 The problem can be translated into choosing a random point in the
unit square. The sample space is {(x, y) : 0 ≤ x, y ≤ 1}. For any point
(x, y) in the sample space, distinguish between the cases x > y and
x < y (the probability that a randomly chosen point (x, y) satisfies
x = y is zero). Consider first the case of x > y. Then, the three side
lengths are y, x − y and 1 − x. Three side lengths a, b and c form a
triangle if and only if a + b > c, a + c > b and b + c > a.
Hence the lengths y, x − y and 1 − x must satisfy the three conditions

y + x − y > 1 − x,   y + 1 − x > x − y   and   x − y + 1 − x > y.

These three conditions can be rewritten as x > 1/2, y > x − 1/2 and
y < 1/2. The set of all points (x, y) satisfying these three conditions is
the down-right shaded region in the figure. Next consider the case of
x < y. Then the three side lengths are x, y − x, and 1 − y. Then (x, y)
must satisfy the three conditions y > 1/2, x > y − 1/2 and x < 1/2. The
set of all points (x, y) satisfying these three conditions is the top-left
shaded region in the figure. The area of the two shaded regions is
1/8 + 1/8. Hence the desired probability is 1/4.
1.37 The unique chord having the randomly chosen point P as its midpoint
is the chord that is perpendicular to the line connecting the point P
to the center O of the circle, see the figure. A little geometry tells us
that the chord is longer than the side of the equilateral triangle if and
only if the point P falls inside the shaded inner circle in the figure.
Thus the sought probability is π(r/2)^2/(πr^2) = 1/4.
1.38 This is a compound experiment that consists of three subexperiments.
Take as sample space the set Ω = {(i, j, k) : i, j, k = 0, 1}, where i=1
(0) if player A beats (is beaten by) player B, the component j=1 (0) if
player A beats (is beaten by) player C, and the component k=1 (0) if
player B beats (is beaten by) player C. The probabilities pi,j,k assigned
to the eight outcomes (i, j, k) are p0,0,0 = 0.5×0.3×0.6 = 0.09, p1,0,0 =
0.5 × 0.3 × 0.6 = 0.09, p0,1,0 = 0.5 × 0.7 × 0.6 = 0.21, p0,0,1 = 0.5 × 0.3 ×
0.4 = 0.06, p1,1,0 = 0.5×0.7×0.6 = 0.21, p1,0,1 = 0.5×0.3×0.4 = 0.06,
p0,1,1 = 0.5 × 0.7 × 0.4 = 0.14, and p1,1,1 = 0.5 × 0.7 × 0.4 = 0.14.
Denote by E the event that player A wins at least as many games as
any other player; then E = {(0, 1, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)}. Thus,
the desired probability is P(E) = 0.21 + 0.21 + 0.06 + 0.14 = 0.62.
1.39 Take as sample space the set of all four-tuples (δ1, δ2, δ3, δ4), where δi = 0 if
component i has failed and δi = 1 otherwise. The probability r1 r2 r3 r4
is assigned to (δ1, δ2, δ3, δ4), where ri = fi if δi = 0 and ri = 1 − fi
if δi = 1. Let A0 be the event that the system fails, A1 be the event
that none of the four components fails and Ai be the event that only
component i fails for i = 2, 3 and 4. Then, P(A1) = (1 − f1)(1 −
f2)(1 − f3)(1 − f4), P(A2) = (1 − f1)f2(1 − f3)(1 − f4), P(A3) =
(1 − f1)(1 − f2)f3(1 − f4), and P(A4) = (1 − f1)(1 − f2)(1 − f3)f4. The
events Ak are mutually exclusive and their union is the sample space.
Hence the sought probability is P(A0) = 1 − \sum_{i=1}^{4} P(Ai).
1.40 Proceeding along the same lines as in Example 1.12, the probability
that Bill is the first person to pick a red ball is \sum_{k=1}^{∞} \big(\frac{7}{11}\big)^{2k−1} \frac{4}{11} = \frac{7}{18}.
1.41 Take as sample space the set {(1, s), (2, s), . . .} ∪ {(1, e), (2, e), . . .} and
assign to the outcomes (i, s) and (i, e) the probabilities (1 − a7 −
a8)^{i−1} a7 and (1 − a7 − a8)^{i−1} a8, where a7 = 6/36 and a8 = 5/36. The
probability of getting a total of 8 before a total of 7 is

\sum_{i=1}^{∞} (1 − a7 − a8)^{i−1} a8 = \frac{a8}{a7 + a8} = \frac{5}{11}.
1.42 Using the same reasoning as in Example 1.12, the probability that
desperado A will be the one to shoot himself dead is

\sum_{n=0}^{∞} \big(\tfrac{5}{6}\big)^{3n} \tfrac{1}{6} = \frac{36}{91}.

The probabilities are 30/91 for desperado B and 25/91 for desperado C.
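The geometric series can be evaluated exactly with `fractions.Fraction`; a check (not in the original):

```python
from fractions import Fraction

# A survives a full round with probability (5/6)^3, so the series is geometric.
pA = Fraction(1, 6) / (1 - Fraction(5, 6)**3)
pB = Fraction(5, 6) * pA       # B shoots only after A survives once
pC = Fraction(5, 6)**2 * pA    # C shoots after both A and B survive

print(pA, pB, pC)  # 36/91 30/91 25/91
```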
1.43 Take as sample space the set {(s1, s2) : 2 ≤ s1, s2 ≤ 12}, where
s1 and s2 are the sums rolled by the two persons. The probability
p(s1, s2) = p(s1) × p(s2) is assigned to the outcome (s1, s2), where
p(s) is the probability of getting the sum s in a roll of two dice. The
probabilities p(s) are given by p(2) = p(12) = 1/36, p(3) = p(11) = 2/36,
p(4) = p(10) = 3/36, p(5) = p(9) = 4/36, p(6) = p(8) = 5/36, and p(7) = 6/36.
The probability that the sums rolled are different is

\sum_{s1 ≠ s2} p(s1, s2) = 1 − \sum_{s=2}^{12} p(s)^2 = \frac{575}{648}.
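The value 575/648 follows from p(s) = (6 − |s − 7|)/36; a short exact check (not in the original):

```python
from fractions import Fraction

# p(s): probability of rolling sum s with two fair dice.
p = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
assert sum(p.values()) == 1

p_different = 1 - sum(ps**2 for ps in p.values())
print(p_different)  # 575/648
```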
1.44 (a) Let the set C = B\A consist of those outcomes that belong to B
but do not belong to A. The sets A and C are disjoint and B = A ∪ C.
Then, by Axiom 1.3 (in fact, Rule 1.1 in Section 1.3 should be used),
P(B) = P(A ∪ C) = P(A) + P(C). By Axiom 1.1, P(C) ≥ 0 and so
we get the desired result P(B) ≥ P(A).
(b) We can define pairwise disjoint sets B1, B2, . . . such that \bigcup_{k=1}^{∞} A_k
is equal to \bigcup_{k=1}^{∞} B_k. Let B1 = A1 and let B2 = A2\A1. In general, let

B_k = A_k\(A1 ∪ · · · ∪ A_{k−1})    for k = 2, 3, . . . .

By induction, B1 ∪ · · · ∪ B_k = A1 ∪ · · · ∪ A_k for any k ≥ 1. Also, the sets
B1, . . . , B_k are pairwise disjoint. Hence \bigcup_{k=1}^{∞} B_k = \bigcup_{k=1}^{∞} A_k and the
sets B1, B2, . . . are pairwise disjoint. Using Axiom 1.3, it now follows
that

P\Big(\bigcup_{k=1}^{∞} A_k\Big) = P\Big(\bigcup_{k=1}^{∞} B_k\Big) = \sum_{k=1}^{∞} P(B_k).

Since B_k ⊆ A_k, we have P(B_k) ≤ P(A_k) and so the desired result
follows.
1.45 The sought probability is at least as large as 1 − P(\bigcap_{n=1}^{∞} B_n). We have

P(B_n) = \Big(1 − \big(\tfrac{1}{2}\big)^r\Big)^n    for any n ≥ 1.

By the continuity property of probability, P(\bigcap_{n=1}^{∞} B_n) = \lim_{n→∞} P(B_n)
and so 1 − P(\bigcap_{n=1}^{∞} B_n) = 1.
1.46 It is intuitively clear that the probability is equal to 0.5. This can be
proved as follows. Define A (B) as the event that you see at least 10
consecutive tails (heads) before you see 10 consecutive heads (tails)
for the first time if you toss a fair coin indefinitely often. Using the
result of Problem 1.45, it follows that P (A ∪ B) = 1. The events A
and B are mutually exclusive and satisfy P (A) = P (B). This proves
that P (A) = P (B) = 0.5.
1.47 Let A be the event that a second-hand car is bought and B be the
event that a Japanese car is bought. Noting that P (A ∪ B) = 1 − 0.55,
it follows from P (A ∪ B) = P (A) + P (B) − P (AB) that P (AB) =
0.25 + 0.30 − 0.45 = 0.10.
1.48 Let A be the event that a randomly chosen household is subscribed
to the morning newspaper and B be the event that a randomly chosen household is subscribed to the afternoon newspaper. The sought
probability P (A∩B) = P (A)+P (B)−P (A∪B) is 0.5+0.7−0.8 = 0.4.
1.49 Let A be the event that the truck is used on a given day and B be
the event that the van is used on a given day. Then, P (A) = 0.75,
P (AB) = 0.30 and P (Ac B c ) = 0.10. By De Morgan’s first law, P (A ∪
B) = 1 − P (Ac B c ). By P (A ∪ B) = P (A) + P (B) − P (AB), the
probability that the van is used on a given day is P (B) = 0.90 −
0.75 + 0.30 = 0.45. Since P (Ac B) + P (AB) = P (B), the probability
that only the van is used on a given day is P (Ac B) = 0.45−0.30 = 0.15.
1.50 Since B ⊆ A ∪ B, it follows that P(B) ≤ P(A ∪ B) = 3/4. Using the
relation P(A ∪ B) = P(A) + P(B) − P(AB), it follows that P(B) ≥
P(A ∪ B) − P(A) = 3/4 − 2/3 = 1/12. Hence, 1/12 ≤ P(B) ≤ 3/4.
1.51 The probability that exactly one of the events A and B will occur is
given by P((A ∩ B^c) ∪ (B ∩ A^c)) = P(A ∩ B^c) + P(B ∩ A^c). Next note
that P(A ∩ B^c) = P(A) − P(A ∩ B) and P(B ∩ A^c) = P(B) − P(A ∩ B).
Thus the probability of exactly one of the events A and B occurring
is

P(A) + P(B) − 2P(AB).

Note: Similarly, we find a formula for the probability of exactly one
of the events A, B, and C occurring. This probability is equal to
P(A ∩ B^c ∩ C^c) + P(B ∩ A^c ∩ C^c) + P(C ∩ A^c ∩ B^c). The first term
P(A ∩ B^c ∩ C^c) can be evaluated as P(A) − [P(A ∩ B) + P(A ∩ C)] +
P(A ∩ B ∩ C). In the same way, the other two terms can be evaluated.
Thus the formula for the probability of exactly one of the events A,
B, and C occurring is

P(A) + P(B) + P(C) − 2P(AB) − 2P(AC) − 2P(BC) + 3P(ABC).

A general formula for the probability that exactly r of the events
A1, . . . , An will occur is

\sum_{k=0}^{n−r} (−1)^k \binom{r+k}{r} \sum_{j_1 < ··· < j_{r+k}} P(A_{j_1} · · · A_{j_{r+k}}).

As an illustration, let us determine the probability of getting exactly d
different face values when rolling a fair die n times, see also Example
10.6 in the book for another approach. Defining Ai as the event that
face value i does not appear in n rolls of the die, we get that the
desired probability is given by

\sum_{k=0}^{min(n−(6−d),\, d)} (−1)^k \binom{6−d+k}{6−d} \binom{6}{6−d+k} \frac{(d−k)^n}{6^n}    for d = 1, . . . , 6.

If n = 6, this probability has the values 1.28 × 10^{−4}, 0.0199, 0.2315,
0.5015, 0.2315, and 0.0154 for d = 1, . . . , 6.
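The illustration's formula can be coded directly; the sketch below (not in the original) reproduces the listed values for n = 6:

```python
from math import comb

def p_exactly_d(n, d):
    """Probability of exactly d different face values in n rolls of a fair die."""
    total = sum((-1)**k * comb(6 - d + k, 6 - d) * comb(6, 6 - d + k) * (d - k)**n
                for k in range(0, min(n - (6 - d), d) + 1))
    return total / 6**n

for d in range(1, 7):
    print(d, f"{p_exactly_d(6, d):.4g}")
```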
1.52 In Problem 1.44, the upper bound has already been established. By
P(A ∪ B) = P(A) + P(B) − P(AB), the lower bound is true for
n = 2. Suppose the lower bound is verified for n = 2, . . . , k. Then,
for n = k + 1, let A = \bigcup_{i=1}^{k} A_i and B = A_{k+1}. Using the induction
hypothesis, we get

P\Big(\bigcup_{i=1}^{k+1} A_i\Big) = P(A) + P(B) − P(AB)
  ≥ \sum_{i=1}^{k} P(A_i) − \sum_{i=1}^{k−1} \sum_{j=i+1}^{k} P(A_i A_j) + P(A_{k+1}) − P\Big(\bigcup_{i=1}^{k} A_i A_{k+1}\Big)
  ≥ \sum_{i=1}^{k} P(A_i) − \sum_{i=1}^{k−1} \sum_{j=i+1}^{k} P(A_i A_j) + P(A_{k+1}) − \sum_{i=1}^{k} P(A_i A_{k+1})
  = \sum_{i=1}^{k+1} P(A_i) − \sum_{i=1}^{k} \sum_{j=i+1}^{k+1} P(A_i A_j),

as was to be verified.
1.53 In this “birthday” problem, the sought probability is

1 − \frac{250 × 249 × · · · × 221}{250^{30}} = 0.8368.
1.54 This problem is the birthday problem with m equally likely birthdays
and n people. Using the complement rule, the probability that at
least one of the outcomes O1, . . . , Om will occur two or more times in
n trials is

1 − \frac{(m − 1) · · · (m − n + 1)}{m^{n−1}} = 1 − \Big(1 − \frac{1}{m}\Big) · · · \Big(1 − \frac{n − 1}{m}\Big).

This probability can be approximated by 1 − e^{−1/m} · · · e^{−(n−1)/m} =
1 − e^{−\frac{1}{2}n(n−1)/m} for m large. Solving n from 1 − e^{−\frac{1}{2}n(n−1)/m} = 0.5
is equivalent to solving n from the quadratic equation −\frac{1}{2}n(n − 1) =
m ln(0.5). This yields

n ≈ 1.177\sqrt{m} + 0.5.

Using the complement rule, the probability that the outcome O1 occurs
at least once is 1 − \frac{(m − 1)^n}{m^n} ≈ 1 − e^{−n/m} for m large. Solving n from
1 − e^{−n/m} = 0.5 gives n ≈ 0.6931m.
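The approximation n ≈ 1.177√m + 0.5 can be checked against the exact collision probability; for m = 365 it reproduces the familiar 23-person threshold (a sketch, not in the original):

```python
from math import log, prod, sqrt

def p_collision(n, m):
    """Exact probability that some outcome occurs two or more times in n trials."""
    return 1 - prod(1 - k / m for k in range(1, n))

m = 365
n = round(sqrt(2 * log(2)) * sqrt(m) + 0.5)  # 1.177... = sqrt(2 ln 2)
print(n, round(p_collision(n, m), 4))  # 23 0.5073
```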
1.55 This problem is a variant of the birthday problem. One can choose
two distinct numbers from the numbers 1, 2, . . . , 25 in \binom{25}{2} = 300 ways.
The desired probability is given by

1 − \frac{300 × 299 × · · · × 276}{300^{25}} = 0.6424.

1.56 This problem is a birthday problem with m = \binom{49}{6} equally likely birthdays
and n = 3,016 people. Using the solution of Problem 1.54, the
probability that in 3,016 drawings some combination of six numbers
will appear more than once is about 1 − e^{−\frac{1}{2}n(n−1)/m} = 0.2776. This
approximate value agrees with the exact value in all four decimals.
1.57 The translation step to the birthday problem is to imagine that each of
the n = 500 Oldsmobile cars gets assigned a “birthday” chosen at random from m = 2,400,000 possible “birthdays”. Using the approximate
formula in Problem 1.54, the probability that at least one subscriber
gets two or more cars can be calculated as 1 − e^{−\frac{1}{2}n(n−1)/m} = 0.051.
1.58 Imagine that the balls are numbered from 1 to 20. Using the complement
rule, we find that the sought probability is

1 − \frac{20 × 10 × 18 × 9 × 16 × 8}{20 × 19 × 18 × 17 × 16 × 15} = 0.8514.
1.59 (a) The sought probability is

1 − \frac{10,000 × 9,999 × · · · × 9,990}{10,000^{11}} = 0.005487.

(b) The sought probability is 1 − (1 − 0.005487)^{300} = 0.8081.
1.60 Using the complement rule, this variant of the birthday problem has
as solution

1 − \frac{450 × 435 × 420 × 405 × 390 × 375 × 360 × 345 × 330 × 315}{450^{10}} = 0.8154.
1.61 Let A1 be the event that the card of your first favorite team is not
obtained and A2 be the event that the card of your second favorite
team is not obtained. The sought probability is 1 − P(A1 ∪ A2). Since
P(A1 ∪ A2) = (9/10)^5 + (9/10)^5 − (8/10)^5 = 0.8533, the sought probability is 1 −
0.8533 = 0.1467.
1.62 (a) Let A be the event that you get at least one ace. It is easier to
compute the probability of the complementary event Ac that you get
no ace in a poker hand of five cards. For the sample space of the
chance experiment, we take all ordered five-tuples (x1 , x2 , x3 , x4 , x5 ),
where xi corresponds to the suit and value of the ith card you get dealt.
The total number of possible outcomes equals 52 × 51 × 50 × 49 × 48.
The number of outcomes without ace equals 48 × 47 × 46 × 45 × 44.
Assuming that the cards are randomly dealt, all possible outcomes are
equally likely. Then, the event Ac has the probability
P(Ac) = \frac{48 × 47 × 46 × 45 × 44}{52 × 51 × 50 × 49 × 48} = 0.6588.
Hence, the probability of getting at least one ace in a poker hand of five
cards is 1 − P(Ac) = 0.3412. Another possible choice for the sample
space consists of the collection of all unordered sets of five distinct
cards, resulting in the probability 1 − \binom{48}{5}/\binom{52}{5} = 0.3412.
(b) It is easiest to take as sample space the collection of all unordered sets of five distinct cards. The sample space has C(52,5) equally likely elements. Let Ai be the event that the five cards of the poker hand are from suit i for i = 1, . . . , 4. Each set Ai has C(13,5) elements. The events A1, . . . , A4 are mutually exclusive and so the desired probability is P(A1 ∪ · · · ∪ A4) = Σ_{i=1}^{4} P(Ai). For each i, P(Ai) = C(13,5)/C(52,5) = 4.95 × 10^{−4}. Hence the desired probability is 0.00198.
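As a quick numerical check (not part of the original solution), both probabilities of Problem 1.62 can be recomputed with exact binomial coefficients:

```python
from math import comb

# (a) P(at least one ace in a five-card poker hand), via the complement rule.
p_a = 1 - comb(48, 5) / comb(52, 5)
# (b) P(all five cards come from one suit), summing over the four suits.
p_b = 4 * comb(13, 5) / comb(52, 5)
print(round(p_a, 4), round(p_b, 5))  # 0.3412 0.00198
```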
1.63 It is easiest to compute the complementary probability that more than 5 rolls are needed to obtain at least one five and at least one six. This probability is given by P(A ∪ B), where A is the event that no five is obtained in 5 rolls and B is the event that no six is obtained in 5 rolls. We have P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = (5/6)^5 + (5/6)^5 − (4/6)^5. Therefore the sought probability is given by 1 − 2(5/6)^5 + (4/6)^5 = 0.3279.
Note: The probability that exactly r rolls are needed to obtain at least one five and at least one six is given by Q_{r−1} − Q_r, where Q_n is defined as the probability that more than n rolls are needed to obtain at least one five and at least one six. We have Q_n = (5/6)^n + (5/6)^n − (4/6)^n.
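The complement-rule computation can be verified by brute force over all 6^5 equally likely outcomes of five rolls (a verification sketch, not part of the solution):

```python
from itertools import product

# Count the outcomes of five rolls that contain at least one five
# and at least one six.
hits = sum(1 for rolls in product(range(1, 7), repeat=5)
           if 5 in rolls and 6 in rolls)
print(hits, 6 ** 5)  # 2550 7776
```

The ratio 2550/7776 agrees with 1 − 2(5/6)^5 + (4/6)^5.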
1.64 The sample space is given by the set {(i, j, k) : i, j, k = 1, 2, . . . , 6}. Each element gets assigned a probability of 1/216. It is easiest to compute the complementary probability P(A), where A is the event that none of the three rolls gives your chosen number. The set A contains 5 × 5 × 5 = 125 elements and so P(A) = 125/216. Hence the desired probability is 1 − 125/216 = 0.4213.
1.65 The sought probability is given by P(∪_{i=1}^{253} Ai). For any i, the probability P(Ai) is equal to (365 × 1 × 364 × 363 × · · · × 344)/365^{23} = 0.0014365. The events Ai are mutually exclusive and so P(∪_{i=1}^{253} Ai) = Σ_{i=1}^{253} P(Ai). Therefore the probability that in a class of 23 children exactly two children have the same birthday is equal to 253 × 0.0014365 = 0.3634.
1.66 Take as sample space the set of all sequences (i1 , . . . , i6 ), where ik is
the number shown by the kth roll. The sample space has 66 equally
likely elements. Let A be the event that all six face values appear and
B be the event that the largest number rolled is r. The set A has 6!
elements. Noting that the number of elements for which the largest
number rolled does not exceed k equals k 6 , we have that the set B has
r^6 − (r − 1)^6 elements. The sought probabilities are P(A) = 6!/6^6 = 0.0154 and P(B) = (r^6 − (r − 1)^6)/6^6.
1.67 Take as sample space the set of all ordered sequences of the possible destinations of the five winners. The probability that at least one of the destinations A and B will be chosen is 1 − (1/3)^5. Let Ak be the event that none of the winners has chosen the kth destination. The probability P(A1 ∪ A2 ∪ A3) equals (2/3)^5 + (2/3)^5 + (2/3)^5 − (1/3)^5 − (1/3)^5 − (1/3)^5 = 31/81.
1.68 Take as sample space the set of all possible combinations of six different numbers from the numbers 1 to 42. The sample space has C(42,6) equally likely elements. Let A be the event that none of the numbers 7, 14, 21, 28, 35, and 42 is drawn and B be the event that exactly one of these numbers is drawn. Then, P(A) = C(36,6)/C(42,6) and P(B) = C(6,1) × C(36,5)/C(42,6). The events A and B are disjoint. Using the complement rule, the sought probability is 1 − P(A) − P(B) = 0.1975.
1.69 Let A be the event that there is no ace among the five cards and B
be the event that there is neither a king nor a queen among the five
cards. The sought probability is
1 − P(A ∪ B) = 1 − [C(48,5) + C(44,5) − C(40,5)]/C(52,5) = 0.1765.
1.70 Take as sample space the set {(i, j) : 1 ≤ i, j ≤ 6}, where i is the
number rolled by John and j is the number rolled by Paul. Each
element (i, j) of the sample space gets assigned the probability p(i, j) = 1/36. Let A be the event that John gets a larger number than Paul and B the event that the product of the numbers rolled by John and Paul is odd. The desired probability is P(A ∪ B) = P(A) + P(B) − P(AB). This gives that the probability of John winning is P(A ∪ B) = 15/36 + 9/36 − 3/36 = 21/36, using the fact that P(A) = Σ_{i=2}^{6} Σ_{j=1}^{i−1} p(i, j) = 15/36, P(B) = Σ_{i odd} Σ_{j odd} p(i, j) = 9/36 and P(AB) = 3/36.
1.71 Take as sample space the set of all 40! possible orderings of the aces
and the cards 2 through 10 (the other cards are not relevant). The first
probability is 4 × 39!/40! = 1/10 and the second probability is (4 × 3 × 38!)/40! = 1/130.
1.72 The sample space consists of the elements O1 , O12 O1 , O12 O12 O1 , . . .,
where O1 occurs if the first trial gives outcome O1 , O12 O1 occurs if the
first trial gives neither of the outcomes O1 and O2 and the second trial
gives the outcome O1, etc. The probabilities p1, (1 − p1 − p2)p1, (1 − p1 − p2)^2 p1, . . . are assigned to the elements O1, O12 O1, O12 O12 O1, . . .. The first probability is
Σ_{n=1}^{∞} (1 − p1 − p2)^{n−1} p1 = p1/(p1 + p2).
To get the second probability, note that the probability of the event A_{n,k} is given by C(n−1, r−1) C(n−r, k) p1^{r−1} p2^k (1 − p1 − p2)^{n−r−k} p1. Since the events A_{n,k} are mutually exclusive, it now follows that the second probability is given by
Σ_{k=0}^{s−1} Σ_{n=r+k}^{∞} C(n−1, r−1) C(n−r, k) p1^r p2^k (1 − p1 − p2)^{n−r−k}.
This sum can be rewritten as
Σ_{k=0}^{s−1} [p1^r p2^k/((r − 1)! k!)] Σ_{l=0}^{∞} (l + 1) · · · (l + r + k − 1)(1 − p1 − p2)^l.
Using the identity Σ_{l=0}^{∞} (l + 1) · · · (l + m) x^l = m!/(1 − x)^{m+1}, the desired result then follows.
1.73 (a) The number of permutations in which the particular number r belongs to a cycle of length k is C(n−1, k−1)(k − 1)!(n − k)!. The sought probability is
C(n−1, k−1)(k − 1)!(n − k)!/n! = 1/n.
(b) For fixed r, s with r ≠ s, let Ak be the event that r and s belong to a same cycle with length k. The sought probability is
P(A2 ∪ · · · ∪ An) = (1/n!) Σ_{k=2}^{n} C(n−2, k−2)(k − 1)!(n − k)! = 1/2.
1.74 It is no restriction to assume that the 2n prisoners have agreed that
the ith prisoner goes to the ith box for i = 1, 2, . . . , 2n. The order in
which the names of the prisoners show up in the boxes is a random
permutation of the integers 1, 2, . . . , 2n. Each prisoner finds his own
name after inspecting up to a maximum of n boxes if and only if
each cycle of the random permutation has length n or less. In other
words, the probability that all prisoners will be released is equal to 1 − P(A_{n+1} ∪ · · · ∪ A_{2n}), where Ak is the event that a random permutation of 1, 2, . . . , 2n contains a cycle of length k. A crucial observation is that, for any k > n, any random permutation of the integers 1, 2, . . . , 2n has at most one cycle of length k. Hence P(Ak) = C(2n, k)(k − 1)!(2n − k)!/(2n)! = 1/k. Further, the events A_{n+1}, . . . , A_{2n} are mutually exclusive and so P(A_{n+1} ∪ · · · ∪ A_{2n}) = Σ_{k=n+1}^{2n} P(Ak). Hence the sought probability is
1 − 1/(n + 1) − 1/(n + 2) − · · · − 1/(2n).
This probability is about 1 − ln(2) = 0.3069 for n large enough. The
exact value is 0.3118 for n = 50.
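The closed-form release probability is easy to check numerically (a verification sketch, not part of the solution):

```python
from math import log

# Release probability 1 - sum_{k=n+1}^{2n} 1/k for the 2n-prisoner problem,
# compared with its large-n limit 1 - ln(2).
def release_prob(n):
    return 1 - sum(1 / k for k in range(n + 1, 2 * n + 1))

print(round(release_prob(50), 4), round(1 - log(2), 4))  # 0.3118 0.3069
```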
1.75 In line with the strategy outlined in Problem 1.74, the person with
the task of finding the car first opens door 1. This person next opens
door 2 if the car key is behind door 1 and next opens door 3 if the
goat is behind door 1. The person with the task of finding the car key
first opens door 2. This person next opens door 1 if the car is behind
door 2 and next opens door 3 if the goat is behind door 2. Under this
strategy the probability of winning the car is 2/3: the four arrangements (car, key, goat), (car, goat, key), (key, car, goat) and (goat, key, car) are winning, whereas the two arrangements (key, goat, car) and (goat, car, key) are losing.
1.76 Some reflection shows that the game cannot take more than 15 spins
(it takes 15 spins if and only if the spins 1, . . . , 4, 6, . . . , 9 and 11, . . . , 14
result in “odds,” while the spins 5, 10 and 15 result in “heads”). Let
Ai be the event that the spinner wins on the ith toss for i = 3, . . . , 15.
The events Ai are mutually exclusive. The win probability is given by
Σ_{i=3}^{15} P(Ai) = 1/2^6 + 3/2^7 + 3/2^7 + 5/2^8 + 15/2^{10} + 18/2^{11} + 19/2^{12} + 9/2^{12} + 15/2^{14} + 5/2^{14} + 3/2^{15} + 3/2^{17} + 1/2^{18} = 0.11364.
The game is unfavorable to the bettor.
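A quick check that the thirteen probabilities P(A_3), . . . , P(A_15) sum to the stated value (a verification sketch, not part of the solution):

```python
# Each term is a numerator over a power of 2, listed as (numerator, exponent).
terms = [(1, 6), (3, 7), (3, 7), (5, 8), (15, 10), (18, 11), (19, 12),
         (9, 12), (15, 14), (5, 14), (3, 15), (3, 17), (1, 18)]
p_win = sum(num / 2 ** e for num, e in terms)
print(round(p_win, 5))  # 0.11364
```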
1.77 The probability of correctly identifying five or more wines can be calculated as Σ_{k=5}^{10} e^{−1}/k! = 0.00366 when the person is not a connoisseur and just guesses the names of the wines. This small probability is a strong indication that the person is a connoisseur.
1.78 This problem is a variant of the hat-check problem. Let A be the event that exactly one student receives his own paper and Bi be the event that the ith student is the only student who gets back his own paper. Then, by results in Example 1.19, P(A) = Σ_{k=0}^{14} (−1)^k/k! ≈ e^{−1}. Since A = ∪_{i=1}^{15} Bi and the events Bi are disjoint, P(A) = Σ_{i=1}^{15} P(Bi). For reasons of symmetry P(Bi) = P(B1) for all i. Thus the sought probability is about (1/15)e^{−1}.
1.79 Let Ai be the event that the ith person gets both the correct coat and
the correct umbrella. The sought probability is
P(A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5) = Σ_{k=1}^{5} (−1)^{k+1} C(5,k) (5 − k)!(5 − k)!/(5! 5!) = 0.1775.
1.80 Label the three Italian wines as 1, 2, and 3. Let Ai be the event that
the Italian wine with label i is correctly
guessed.
The sought probabil
ity is 1−P (A1 ∪A2 ∪A3 ) = 1− 31 P (A1 )− 32 P (A1 A2 )+P (A1 A2 A3 ) .
8!
7!
9!
, P (A1 A2 ) = 10!
, and P (A1 A2 A3 ) = 10!
. The
We have P (A1 ) = 10!
probability that none of the three Italian wines is correctly guessed is
8!
7! 9!
= 0.7319.
−3×
+
1− 3×
10!
10! 10!
1.81 Let Ai be the event that the three choices of five distinct numbers have
number i in common. The sought probability is
P(∪_{k=1}^{25} Ak) = Σ_{k=1}^{5} (−1)^{k+1} C(25,k) P(A1 · · · Ak)
= Σ_{k=1}^{5} (−1)^{k+1} C(25,k) [C(25−k, 5−k)/C(25,5)]^3 = 0.1891.
1.82 Let Ai be the event that all four cards of kind i are contained in the
hand of 13 cards. The desired probability is
P(A1 ∪ · · · ∪ A13) = Σ_{k=1}^{3} (−1)^{k+1} C(13,k) P(A1 · · · Ak).
This probability can be evaluated as
C(13,1) × C(48,9)/C(52,13) − C(13,2) × C(44,5)/C(52,13) + C(13,3) × C(40,1)/C(52,13) = 0.0342.
1.83 Let Ai be the event that the player’s hand does not contain any card
of suit i. The desired probability is
P(∪_{i=1}^{4} Ai) = Σ_{k=1}^{3} (−1)^{k+1} C(4,k) C(52 − 13k, 13)/C(52,13) = 0.0511
for the bridge hand. For the poker hand, the desired probability is
Σ_{k=1}^{3} (−1)^{k+1} C(4,k) C(52 − 13k, 5)/C(52,5) = 0.7363.
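Both inclusion-exclusion sums can be evaluated exactly (a verification sketch, not part of the solution):

```python
from math import comb

# P(a randomly dealt hand misses at least one entire suit),
# for a given hand size (13 for bridge, 5 for poker).
def missing_suit(hand):
    return sum((-1) ** (k + 1) * comb(4, k) * comb(52 - 13 * k, hand)
               / comb(52, hand) for k in range(1, 4))

print(round(missing_suit(13), 4), round(missing_suit(5), 4))  # 0.0511 0.7363
```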
1.84 Take as sample space the set of all possible ordered arrangements of
the 12 people. Label the six rooms as i = 1, . . . , 6. The first two people
in an arrangement are assigned into room 1, the next two people in the
arrangement are assigned into room 2, etc. Let Ai be the event that room i has two people of different nationalities. The sought probability is 1 − P(A1 ∪ · · · ∪ A6) = 1 − Σ_{k=1}^{4} (−1)^{k+1} C(6,k) P(A1 · · · Ak). We have
P(A1) = (8 × 4 × 2 × 10!)/12!, P(A1A2) = (8 × 4 × 7 × 3 × 2^2 × 8!)/12!,
P(A1A2A3) = (8 × 4 × 7 × 3 × 6 × 2 × 2^3 × 6!)/12!,
P(A1A2A3A4) = (8 × 4 × 7 × 3 × 6 × 2 × 5 × 2^4 × 4!)/12!.
This leads to the value 1/33 for the sought probability.
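The value 1/33 can be confirmed by a direct enumeration, assuming (as the counts 8 and 4 above indicate) eight people of one nationality and four of another: the random arrangement amounts to choosing the 4 positions of the minority nationality among the 12 positions, and no room is mixed exactly when those positions fill two complete rooms.

```python
from itertools import combinations
from math import comb

rooms = [{2 * i, 2 * i + 1} for i in range(6)]  # positions per room
count = 0
for pos in combinations(range(12), 4):
    s = set(pos)
    # No room mixed: each room is fully inside s or disjoint from it.
    if all(r <= s or not (r & s) for r in rooms):
        count += 1
p = count / comb(12, 4)
print(count, comb(12, 4), p)  # 15 495 0.0303... = 1/33
```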
1.85 The possible paths from node n1 to node n4 are the four paths (l1, l5), (l2, l6), (l1, l3, l6) and (l2, l4, l5). Let Ai be the event that the
ith path is functioning. The probability P (A1 ∪ A2 ∪ A3 ∪ A4 ) can be
evaluated as
p1 p5 + p2 p6 + p1 p3 p6 + p2 p4 p5 − p1 p2 p5 p6 − p1 p3 p5 p6 − p1 p2 p4 p5
− p1 p2 p3 p6 − p2 p4 p5 p6 + p1 p2 p3 p5 p6 + p1 p2 p4 p5 p6 .
This probability reduces to 2p^2(1 + p + p^3) − 5p^4 when pi = p for all i.
1.86 Let Ai be the event that number i has not appeared in 30 draws of
the Lotto 6/45. The desired probability is given by P (A1 ∪ · · · ∪ A45 ).
Noting that P(A1 · · · Ak) = [C(45−k, 6)/C(45,6)]^{30}, it follows that the desired probability is equal to
Σ_{k=1}^{39} (−1)^{k+1} C(45,k) [C(45−k, 6)/C(45,6)]^{30} = 0.4722.
1.87 Let Ai be the event that it takes more than r purchases to get the
ith coupon. The probability that more than r purchases are needed
in order to get a complete set of coupons is
P(∪_{i=1}^{n} Ai) = Σ_{k=1}^{n} (−1)^{k+1} C(n,k) (n − k)^r/n^r.
Using this expression for n = 6, 365, and 100, we find that the required number of rolls is r = 13, the required number of people is r = 2,287, and the required number of balls is r = 497.
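The required numbers can be recomputed from the formula, taking "required" to mean the smallest r for which P(more than r purchases are needed) drops to 1/2 or below (an assumption about the problem's intent). For a die (n = 6) this reproduces r = 13:

```python
from math import comb

# P(more than r purchases are needed to collect all n coupons).
def p_more(n, r):
    return sum((-1) ** (k + 1) * comb(n, k) * ((n - k) / n) ** r
               for k in range(1, n + 1))

def median_draws(n):
    r = n  # fewer than n draws can never complete the set
    while p_more(n, r) > 0.5:
        r += 1
    return r

print(median_draws(6))  # 13
```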
1.88 Let Ai be the event that the ith boy becomes part of a couple. The
desired probability is 1 − P(A1 ∪ · · · ∪ An). For any fixed i, P(Ai) = (n × n^{2n−2})/n^{2n} = n/n^2. For any fixed i and j with i ≠ j, P(AiAj) = (n × (n − 1) × n^{2n−4})/n^{2n} = n(n − 1)/n^4. Continuing in this way, we find
P(A1 ∪ · · · ∪ An) = Σ_{k=1}^{n} (−1)^{k+1} C(n,k) n(n − 1) · · · (n − k + 1)/n^{2k}.
1.89 Let Ai be the event that the ith person does not share his or her
birthday with someone else. The sought probability is given by 1 −
P(A1 ∪ · · · ∪ An) and is equal to
1 − (1/365^n) Σ_{k=1}^{min(n,365)} (−1)^{k+1} C(n,k) 365 × · · · × (365 − k + 1) × (365 − k)^{n−k}.
This probability is 0.5008 for n = 3,064.
1.90 Let Ai be the event that all of the three random permutations have
the same number in the ith position. The desired probability P (A1 ∪
A2 ∪ · · · ∪ A10) is given by
Σ_{k=1}^{10} (−1)^{k+1} C(10,k) [10 × · · · × (10 − k + 1)] × [(10 − k)!]^3/[10!]^3 = 0.0947.
1.91 There are C(10,2) = 45 possible combinations of two persons. Let Ak be the event that the two persons from the kth combination have chosen each other's name. Using the fact that P(AiAj) = 0 for i ≠ j when Ai and Aj have a person in common, we find that the sought probability is given by
P(A1 ∪ · · · ∪ A45) = C(10,2)(1/9)^2 − (1/2!) C(10,2) C(8,2) (1/9)^4 + · · · − (1/5!) C(10,2) C(8,2) C(6,2) C(4,2) C(2,2) (1/9)^{10} = 0.4654.
1.92 Let Ai be the event that the ith person is a survivor. The desired
probability is given by 1 − P (A1 ∪ · · · ∪ An ). For any i,
P(Ai) = (n − 2)^{n−1}(n − 1)/(n − 1)^n.
For any i, j with i ≠ j,
P(AiAj) = (n − 3)^{n−2}(n − 2)^2/(n − 1)^n.
Continuing in this way, it follows that
P(A1 ∪ · · · ∪ An) = Σ_{k=1}^{n−2} (−1)^{k+1} C(n,k) (n − (k + 1))^{n−k}(n − k)^k/(n − 1)^n.
Note: Suppose that there is a second round with the survivors if there
are two or more survivors after the first round, the second round is
followed by a third round if there are two or more survivors after the
second round, and so on, until there is one survivor or no survivor at
all. What is the probability that the game ends with one survivor?
This probability is given for several values of n in the table below.
These probabilities can be computed by using the powerful method of
an absorbing Markov chain, see Chapter 10 of the book. In this method
we need the one-step transition probabilities pn,m being the probability
of going in one step from state n to state m. In this particular example,
the state is the number of survivors and the probability pn,m is the
probability of m survivors after a round starting with n survivors. To
give these probabilities, we need the generalized inclusion-exclusion
formula
P(exactly m of the events A1, . . . , An will occur)
= Σ_{k=0}^{n−m} (−1)^k C(m+k, m) Σ_{i1 < i2 < · · · < i_{m+k}} P(A_{i1} A_{i2} · · · A_{i_{m+k}}).
n    P(one survivor)    n      P(one survivor)
2    0.0000             20     0.4693
3    0.7500             30     0.5374
4    0.5926             40     0.4996
5    0.4688             50     0.4720
6    0.4161             60     0.4879
7    0.4389             70     0.5155
8    0.4890             80     0.5309
9    0.5323             90     0.5291
10   0.5547             100    0.5160
1.93 Let Ai be the event that no ball of color i is picked for i = 1, 2, 3. The probability of picking at least one ball of each color is
1 − P(A1 ∪ A2 ∪ A3) = 1 − [Σ_{i=1}^{3} P(Ai) − Σ_{i=1}^{2} Σ_{j=i+1}^{3} P(AiAj)].
If the balls are picked with replacement, then
P(A1) = 12^5/15^5, P(A2) = 10^5/15^5, P(A3) = 8^5/15^5,
P(A1A2) = 7^5/15^5, P(A1A3) = 5^5/15^5, and P(A2A3) = 3^5/15^5.
The sought probability is 1 − 0.4760 = 0.5240. If the balls are picked without replacement, then
P(A1) = C(12,5)/C(15,5), P(A2) = C(10,5)/C(15,5), P(A3) = C(8,5)/C(15,5),
P(A1A2) = C(7,5)/C(15,5), P(A1A3) = C(5,5)/C(15,5), and P(A2A3) = 0.
The probability of picking at least one ball of each color is 1 − 0.3590 = 0.6410.
1.94 Take as sample space the set of all possible ordered arrangements of
the 2n people. The sample space has (2n)! equally likely elements.
Imagine that the two people in the positions 2k − 1 and 2k of the
ordered arrangement are paired as bridge partners for k = 1, . . . , n.
Let Ai be the event that couple i is paired as bridge partners. The sought probability is 1 − P(A1 ∪ · · · ∪ An). The number of elements in the set A1 ∩ · · · ∩ Ak is n × (n − 1) × · · · × (n − k + 1) × 2^k × (2n − 2k)!. There are n(n − 1) · · · (n − k + 1) possible choices for the couples in the first 2k positions of the arrangement and two partners from a couple can be ordered in two ways. The remaining 2n − 2k people can be ordered in (2n − 2k)! ways. Thus we find
P(A1 ∪ · · · ∪ An) = Σ_{k=1}^{n} (−1)^{k+1} C(n,k) P(A1 · · · Ak)
= Σ_{k=1}^{n} (−1)^{k+1} C(n,k) [n × (n − 1) × · · · × (n − k + 1) × 2^k × (2n − 2k)!]/(2n)!.
Note: the probability that at least one of the couples will be paired as bridge partners tends to 1 − e^{−1/2} = 0.3935 as the number of couples gets large, see also Problem 3.85.
1.95 Let Ai be the event that the four cards of rank i are matched. The
sought probability is
P(∪_{i=1}^{13} Ai) = Σ_{k=1}^{13} (−1)^{k+1} C(13,k) (4!)^k (52 − 4k)!/52! = 4.80 × 10^{−5}.
Chapter 2
2.1 Take as sample space the set of all ordered pairs (i, j), where the
outcome (i, j) represents the two numbers shown on the dice. Each
of the 36 possible outcomes is equally likely. Let A be the event that
the sum of the two dice is 8 and B be the event that the two numbers
shown on the dice are different. There are 30 outcomes (i, j) with i 6= j.
In four of those outcomes i and j sum to 8. Therefore P(AB) = 4/36 and P(B) = 30/36. The sought probability P(A | B) is (4/36)/(30/36) = 2/15.
2.2 The ordered sample space consists of the eight equally likely elements (H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, T, T ), (T, T, H),
(T, H, T ), and (T, H, H), where the first component refers to the nickel,
the second to the dime and the third to the quarter. Let A the event
that the quarter shows up heads and B be the event that the coins
showing up heads represent an amount of at least 15 cents. To find
P (A | B) = P (AB)/P (B), note that the set AB consists of four elements (H, H, H), ,(H,T,H),(T,T,H) and (T, H, H), while the set B
consists of the 5 elements (H, H, H), (H, H, T ), (H, T, H), (T, T, H),
and (T, H, H). This gives P (AB) = 48 and P (B) = 58 . Hence the
desired probability P (A | B) is 45 .
2.3 Take as sample space the set of the ordered pairs (G, G), (G, F ), (F, G),
and (F, F ), where G stands for a “correct prediction” and F stands
for a “false prediction,” and the first and second components of each
outcome refer to the predictions of weather station 1 and weather
station 2. The probabilities 0.9×0.8 = 0.72, 0.9×0.2 = 0.18, 0.1×0.8 =
0.08, and 0.1 × 0.2 = 0.02 are assigned to these elements. Let the
event A = {(G, F )} and the event B = {(G, F ), (F, G)}. The sought
probability is P(A | B) = 0.18/0.26 = 9/13.
2.4 Let A be the event that a randomly chosen student passes the first
test and B be the event that this student also passes the second test.
Then P(B | A) = 0.50/0.80 = 0.625. The answer is 62.5%.
2.5 Let A be the event that a randomly chosen household has a cat and
B be the event that the household has a dog. Then, P (A) = 0.3,
P (B) = 0.25, and P (B | A) = 0.2. The sought probability P (A | B)
satisfies
P(A | B) = P(AB)/P(B) = P(A)P(B | A)/P(B)
and thus is equal to 0.3 × 0.2/0.25 = 0.24.
2.6 The ordered sample space is the set {(H, 1), . . . , (H, 6), (T, 1), . . . , (T, 6)}.
Each outcome is equally likely to occur. Let A be the event that the
coin lands heads and B the event that the die lands six. The set A
consists of six elements, the set AB consists of a single element and the
set A ∪ B consists of seven elements. Hence the desired probabilities
are given by
P(AB | A ∪ B) = P(AB)/P(A ∪ B) = 1/7 and P(A | A ∪ B) = P(A)/P(A ∪ B) = 6/7.
2.7 Label the two red balls as R1 and R2 , the blue ball as B and the green
ball as G. Take as unordered sample space the set consisting of the
six equally likely combinations {R1 , R2 }, {R1 , B}, {R2 , B}, {R1 , G},
{R2 , G}, and {B, G} of two balls. Let C be the event that two non-red
balls have been grabbed, D be the event that at least one non-red ball
has been grabbed, and E be the event that the green ball has been
grabbed. Then, P(CD) = 1/6, P(D) = 5/6, P(CE) = 1/6 and P(E) = 3/6. The sought probabilities are P(C | D) = 1/5 and P(C | E) = 1/3. In the second situation you have more information.
2.8 The ordered sample space is the set {(i, j) : i, j = 1, 2, . . . , 6}. Each
element is equally likely to occur. Let A be the event that both dice
show up an even number and let B the event that at least one of the
two dice shows up an even number. The set AB is equal to the set A
consisting of 9 elements and the set B consists of 27 elements. The probability P(A | B) of your winning the bet is equal to (9/36)/(27/36) = 1/3. The bet is not fair to you.
2.9 Take as unordered sample space the set of all possible combinations of
13 distinct cards. Let A be the event that the hand contains exactly
one ace, B be the event that the hand contains at least one ace, and
C be the event that the hand contains the ace of hearts. Then
P(A | B) = [C(4,1) C(48,12)/C(52,13)]/[1 − C(48,13)/C(52,13)] = 0.6304 and P(A | C) = C(48,12)/C(51,12) = 0.4388.
The desired probabilities are 0.3696 and 0.5612. The second case involves more information.
2.10 The probability that the number of tens in the hand is the same as
the number of aces in the hand is given by
Σ_{k=0}^{4} C(4,k) C(4,k) C(44, 13 − 2k)/C(52,13) = 0.3162.
Hence, using a symmetry argument, the probability that the hand contains more aces than tens is (1/2)(1 − 0.3162) = 0.3419. Letting A be the event that the hand contains more aces than tens and B the event that the hand contains at least one ace, then P(A | B) = P(AB)/P(B) = P(A)/P(B). Therefore
P(A | B) = 0.3419/[Σ_{k=1}^{4} C(4,k) C(48, 13 − k)/C(52,13)] = 0.4911.
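All three quantities can be recomputed with exact binomial coefficients (a verification sketch, not part of the solution):

```python
from math import comb

c = comb(52, 13)
# P(hand has as many aces as tens), summing over the common count k.
p_tie = sum(comb(4, k) ** 2 * comb(44, 13 - 2 * k) for k in range(5)) / c
# By symmetry between aces and tens:
p_more_aces = (1 - p_tie) / 2
p_at_least_one_ace = 1 - comb(48, 13) / c
print(round(p_tie, 4), round(p_more_aces, 4),
      round(p_more_aces / p_at_least_one_ace, 4))
```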
2.11 Let A be the event that each number rolled is higher than all those that
were rolled earlier and B be the event that three different numbers are rolled. Then P(A) = P(AB) and so P(A) = P(B)P(A | B). We have P(B) = (6 × 5 × 4)/6^3 = 5/9 and P(A | B) = 1/3!. Thus
P(A) = (5/9) × (1/3!) = 20/216 = 5/54.
2.12 (a) Since P(A | B) > P(B | A) is the same as P(AB)/P(B) > P(AB)/P(A), it follows that P(A) > P(B).
(b) Since P (B | A) = P (AB)/P (A) = P (A | B)P (B)/P (A), we get
P (B | A) > P (B). Also, by P (B c | A) + P (B | A) = [P (B c A) +
P (BA)]/P (A) = P (A)/P (A) = 1, we get P (B c | A) = 1 − P (B | A) ≤
1 − P (B) = P (B c ).
(c) If A and B are disjoint, then P (AB) = 0 and so P (A | B) = 0. If
B is a subset of A, then P (AB) = P (B) and so P (A | B) = 1.
2.13 Let A be the event that a randomly chosen student takes Spanish and
B be the event that the student takes French. Then, P (A) = 0.35,
P(B) = 0.15, and P(A ∪ B) = 0.40. Thus P(AB) = 0.35 + 0.15 − 0.40 = 0.10 and so P(B | A) = 0.10/0.35 = 2/7.
2.14 Let A be the event that a randomly chosen child is enrolled in swimming and B be the event that the child is enrolled in tennis. The
sought probability P (A | B) follows from P (A | B) = P (AB)/P (B) =
P (A)P (B | A)/P (B) and is equal to (1/3) × 0.48/0.40 = 0.64.
2.15 Let A be the event that a randomly chosen voter is a Democrat, B be
the event that the voter is a Republican, and C be the event that the
voter is in favor of the election issue.
(a) Since P (A) = 0.45, P (B) = 0.55, P (C | A) = 0.7 and P (C | B) =
0.5, it follows from P (AC) = P (C | A)P (A) and P (BC) = P (C |
B)P (B) that P (AC) = 0.7 × 0.45 = 0.315 and P (BC) = 0.5 × 0.55 =
0.275.
(b) Since P (C) = P (AC) + P (BC), we get P (C) = 0.59.
(c) P(A | C) = 0.315/0.59 = 0.5339.
2.16 Let A be the event that a randomly selected household subscribes
to the morning newspaper and B be the event that the household
subscribes to the afternoon newspaper. To find the sought probability
P(Ac | B), use the relation P(B) = P(AB) + P(AcB). Thus
P(Ac | B) = [P(B) − P(AB)]/P(B) = (0.70 − 0.40)/0.70 = 3/7.
2.17 Let A1 (A2 ) be the event that the first (second) card picked belongs
to one of the three business partners. Then P(A1A2) = (3/5) × (2/4) = 3/10.
2.18 Let Ai be the event that the ith card you receive is a picture card that
you have not received before. Then, by P (A1 A2 A3 A4 ) = P (A1 )P (A2 |
A1 )P (A3 | A1 A2 )P (A4 | A1 A2 A3 ), the sought probability can be computed as
P(A1A2A3A4) = (16/52) × (12/51) × (8/50) × (4/49) = 9.46 × 10^{−4}.
2.19 Let A be the event that one or more sixes are rolled and B the event that no 1 is rolled. Then, by P(AB) = P(B)P(A | B), we have that the sought probability is
P(AB) = (5/6)^6 [1 − (4/5)^6] = 0.2471.
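The value can be confirmed by enumerating all 6^6 equally likely outcomes of the six rolls (a verification sketch, not part of the solution):

```python
from itertools import product

# Count outcomes with at least one six and no 1.
hits = sum(1 for rolls in product(range(1, 7), repeat=6)
           if 6 in rolls and 1 not in rolls)
print(hits, hits / 6 ** 6)  # 11529 out of 46656, about 0.2471
```

The count equals 5^6 − 4^6 = 11529, in line with (5/6)^6[1 − (4/5)^6].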
2.20 It is no restriction to assume that the drawing of lots begins with the Spanish teams. Let A0 be the event that the two Spanish teams are paired and Ai be the event that the ith Spanish team is not paired to the other Spanish team or to a German team. The events A0 and A1A2 are disjoint. The sought probability is
P(A0) + P(A1A2) = 1/7 + P(A1)P(A2 | A1) = 1/7 + (4/7) × (3/5) = 17/35.
2.21 Let Ai be the event that you get a white ball on the ith pick. The probability that you need three picks is P(A1A2A3) = (3/5) × (2/5) × (1/5) = 6/125. Five picks require that exactly two black balls are taken in the first four picks. By the chain rule, summing the probabilities of the six possible orderings of two white and two black balls over the first four picks, each followed by a white ball on the fifth pick, the probability that five picks are needed is (96 + 72 + 54 + 48 + 36 + 24)/5^5 = 66/625.
2.22 (a) The sought probability is the same as the probability of getting two red balls when two balls are drawn at random from a bowl with three red balls and three blue balls. Let Ai be the event that the ith ball drawn is red. The sought probability is P(A1A2). This probability is evaluated as P(A1)P(A2 | A1) = (3/6) × (2/5) = 1/5.
(b) Let Ai be the event that the ith number drawn is not 10 and Ei be the event that the ith number drawn is more than 19. The first probability is
1 − P(A1 · · · A6) = 1 − (41/42) × (40/41) × · · · × (36/37) = 6/42 = 1/7.
The second probability is P(E1 · · · E6) = (23/42) × (22/41) × · · · × (18/37) = 0.0192.
(c) Suppose that first the two cups of coffee are put on the table. Let Ai be the event that the ith cup of coffee is given to a person who ordered coffee. The sought probability is
P(A1A2) = P(A1)P(A2 | A1) = (2/5) × (1/4) = 1/10.
(d) Suppose that the two socks are chosen one by one. Let Ai be the event that the ith sock chosen is black for i = 1, 2. The sought probability is 2P(A1A2). We have P(A1A2) = P(A1)P(A2 | A1) = (1/5) × (1/4) = 1/20. Hence the sought probability is 2 × 1/20 = 1/10.
(e) Imagine that two apartments become vacant one after the other. Let A1 be the event that the first vacant apartment is not on the top floor and A2 be the event that the second vacant apartment is not on the top floor. The sought probability is 1 − P(A1A2). The probability P(A1A2) is evaluated as P(A1)P(A2 | A1) = (48/56) × (47/55) = 0.7325.
2.23 Let Ai be the event that the ith person in line is the first person
matching a birthday with one of the persons in front of him. Then
P(A2) = 1/365 and P(Ai) = (364/365) × (363/365) × · · · × ((364 − i + 3)/365) × ((i − 1)/365) for i ≥ 3. The probability P(Ai) is maximal for i = 20 and has then the value 0.0323.
2.24 Let A1 be the event that the luggage is not lost in Amsterdam, A2 the
event that the luggage is not lost in Dubai and A3 the event that the luggage is not lost in Singapore. Then,
P(the luggage is lost) = 1 − P(A1A2A3) = 1 − P(A1)P(A2 | A1)P(A3 | A1A2) = 1 − 0.95 × 0.97 × 0.98 = 0.09693.
Letting Ai^c be the complementary event of the event Ai, we have
P(the luggage is lost in Dubai | the luggage is lost) = P(A1A2^c)/P(the luggage is lost) = P(A1)P(A2^c | A1)/P(the luggage is lost) = 0.95 × 0.03/0.09693 = 0.2940.
2.25 Let Ai be the event that the ith leaving person does not have to squeeze past a still seated person. The sought probability is the same as P(A1A2A3A4A5) = (2/7) × (2/6) × (2/5) × (2/4) × (2/3) = 0.0127.
2.26 Let Ak be the event that the first ace appears at the kth card, and let
pk = P(Ak). Then, by P(A1A2 · · · An) = P(A1)P(A2 | A1) · · · P(An | A1 . . . An−1), it follows that p1 = 4/52, p2 = (48/52) × (4/51), and
pk = (48/52) × (47/51) × · · · × ((48 − k + 2)/(52 − k + 2)) × (4/(52 − k + 1)), k = 3, . . . , 49.
The three players do not have the same chance to become the dealer.
For P = A, B, and C, let rP be the probability that player P becomes
the dealer. Then rA > rB > rC , because the probabilityP
pk is decreasing in k. ThePprobabilities can be calculated P
as rA = 16
n=0 p1+3n =
15
0.3600, rB = 15
p
=
0.3328,
and
r
=
p
C
n=0 2+3n
n=0 3+3n = 0.3072.
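The three probabilities can be computed exactly in rational arithmetic (a verification sketch, not part of the solution):

```python
from fractions import Fraction

# p_k = P(first ace appears at the kth card), positions 1..49.
def p_first_ace(k):
    p = Fraction(1)
    for i in range(k - 1):              # k-1 non-aces come first
        p *= Fraction(48 - i, 52 - i)
    return p * Fraction(4, 52 - (k - 1))

# Player A gets positions 1, 4, ..., 49; B gets 2, 5, ...; C gets 3, 6, ...
r = [sum(p_first_ace(k) for k in range(start, 50, 3)) for start in (1, 2, 3)]
print([float(x) for x in r])
```

The three values sum to exactly 1, since the first ace always appears within the first 49 cards.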
2.27 Under the condition that the events A1 , . . . , Ai−1 have occurred, the
ith couple can match the birthdays of at most one of the couples 1 to
i − 1. Thus P(Ai^c | A1 · · · Ai−1) = (i − 1)/365^2 and so P(Ai | A1 · · · Ai−1) = 1 − (i − 1)/365^2. The sought probability is 1 − P(A2A3 · · · An) and equals 1 − ∏_{i=2}^{n} (1 − (i − 1)/365^2), by the chain rule.
2.28 The desired probability is 1 − P (A1 A2 · · · Am−2 ). We have
P(A1) = [C(39,5) − 1]/C(39,5), P(Ai | A1 · · · Ai−1) = [C(39,5) − 2]/C(39,5)
for i ≥ 2. The desired probability now follows by applying the chain rule P(A1A2 · · · Am−2) = P(A1)P(A2 | A1) · · · P(Am−2 | A1 · · · Am−3) and is equal to
1 − [C(39,5) − 1][C(39,5) − 2]^{m−3}/C(39,5)^{m−2}.
2.29 Using the chain rule for conditional probabilities, the sought probability is r/(r + b) for k = 1, [b/(r + b)] × [r/(r + b − 1)] for k = 2, and
[b/(r + b)] × [(b − 1)/(r + b − 1)] × · · · × [(b − (k − 2))/(r + b − (k − 2))] × [r/(r + b − (k − 1))]
for 3 ≤ k ≤ b + 1. The sought probability can be written as
[C(b, k−1)/C(r + b, k−1)] × r/(r + b − (k − 1)) = C(r + b − k, r − 1)/C(r + b, r).
This representation can be explained as the probability that the first k − 1 picks are blue balls multiplied with the conditional probability that the kth pick is a red ball given that the first k − 1 picks are blue balls. The answer to the last question is b/(r + b), as can be directly seen by a symmetry argument. The probability that the last ball picked is blue is the same as the probability that the first ball picked is blue.
2.30 The probability that the rumor will not be repeated to any one person once more is P (A1 A2 · · · A10 ), where Ai is the event that the
rumor reaches only different persons during the first i times that
the rumor is told. Noting that P (A1 ) = 1, P (A2 | A1 ) = 1 and
P(Ai | A1 · · · Ai−1) = (25 − i)/23 for i ≥ 3, it follows that the probability that the rumor will not be repeated to any one person once more is
(22/23) × (21/23) × · · · × (15/23) = 0.1646.
The probability that the rumor will not return to the originator is (22/23)^8 = 0.7007.
2.31 Since P(A) = 18/36, P(B) = 18/36, and P(AB) = 9/36, we get P(AB) = P(A)P(B). This shows that the events A and B are independent.
2.32 The number is randomly chosen from the matrix and so P(A) = 30/50, P(B) = 25/50 and P(AB) = 15/50. Since P(AB) = P(A)P(B), the events A and B are independent. This result can also be explained by noting that you obtain a random number from the matrix by choosing first a row at random and choosing next a column at random.
2.33 Since A is the union of the disjoint sets AB and AB c , we have P (A) =
P (AB)+P (AB c ). This gives P (AB c ) = P (A)−P (A)P (B) = P (A)[1−
P (B)] and so P (AB c ) = P (A)P (B c ), showing that A and B c are independent events. Applying this result with A replaced by B c and B
by A, we next get that B c and Ac are independent events.
2.34 Since A = AB∪AB c and the events AB and AB c are disjoint, it follows
that P (A) = P (AB) + P (AB c ) = P (A | B)P (B) + P (A | B c )P (B c ).
This gives P (A) = P (A | B)P (B)+P (A | B)P (B c ) = P (A | B). Thus
P(A) = P(AB)/P(B) and so P(AB) = P(A)P(B).
2.35 The result follows directly from P (A1 ∪ · · · ∪ An ) = 1 − P (Ac1 · · · Acn )
and the independence of the Aci , using P (Ac1 · · · Acn ) = P (Ac1 ) · · · P (Acn )
and P (Aci ) = 1 − P (Ai ).
2.36 Using Problem 2.35, the probability is 1 − (1/2) × (2/3) × (3/4) = 3/4.
2.37 The set A can be represented as A = ∩_{n=1}^{∞} ∪_{k=n}^{∞} Ak. Since the sequence of sets ∪_{k=n}^{∞} Ak is nonincreasing, we have P(A) = lim_{n→∞} P(∪_{k=n}^{∞} Ak), by the continuity property of probability. Next use the fact that P(∪_{k=n}^{∞} Ak) = 1 − P(∩_{k=n}^{∞} Ak^c). Using the independence of the events An and the continuity property of probability measure, it is readily verified that P(∩_{k=n}^{∞} Ak^c) = ∏_{k=n}^{∞} P(Ak^c). By P(Ak^c) = 1 − P(Ak) and the inequality 1 − x ≤ e^{−x}, we get
P(∩_{k=n}^{∞} Ak^c) ≤ ∏_{k=n}^{∞} e^{−P(Ak)} = e^{−Σ_{k=n}^{∞} P(Ak)} = 0 for n ≥ 1,
where the last equality uses the assumption Σ_{n=1}^{∞} P(An) = ∞. This verifies that P(∪_{k=n}^{∞} Ak) = 1 for all n ≥ 1 and so P(A) = 1.
2.38 Let A be the event that you have picked the ball with number 7 written
on it and Bi the event that you have chosen box i for i = 1, 2. By
the law of conditional probability, P(A) = P(A | B1)P(B1) + P(A |
B2)P(B2). Therefore

    P(A) = (1/10) × (1/2) + (1/25) × (1/2) = 0.07.
2.39 Let A be the event that HAPPY HOUR appears again, B1 be the event
that either the two letters H or the two letters P have been removed,
and B2 be the event that two different letters have been removed. Then
P(B1) = (2/9) × (1/8) + (2/9) × (1/8) = 1/18 and P(B2) = 1 − P(B1). Obviously,
P(A | B1) = 1 and P(A | B2) = 1/2. By the law of conditional probability,

    P(A) = Σ_{i=1}^{2} P(A | Bi)P(Bi) = 1 × (1/18) + (1/2) × (17/18) = 19/36.
2.40 Let A be the event that the cases with $1,000,000 and $750,000 are
still in the game when you have opened 20 cases. Also, let B0 be the
event that your chosen case does not contain either of the amounts
$1,000,000 and $750,000 and B1 be the complementary event of B0 .
Then P(A) = P(A | B0)P(B0) + P(A | B1)P(B1), which gives

    P(A) = [(23 choose 20)/(25 choose 20)] × (24/26) + [(24 choose 20)/(25 choose 20)] × (2/26) = 3/65.
2.41 Let A be the event that you ever win the jackpot when buying a
single ticket only once. Also, let B be the event that you match the
six numbers drawn and C be the event that you match exactly two
of these numbers. It follows from P (A) = P (A | B)P (B) + P (A |
C)P(C) that P(A) = P(B) + P(A)P(C), and so P(A) = P(B)/(1 − P(C)). Since
P(B) = 1/(59 choose 6) and P(C) = (6 choose 2)(53 choose 4)/(59 choose 6),
we get P(A) = 1/40,665,099.
2.42 Let A be the event that Joe’s dinner is burnt, B0 be the event that he
did not arrive home on time, and B1 be the event that he arrived home
on time. The probability P (A) = P (A | B0 )P (B0 ) + P (A | B1 )P (B1 )
is equal to 0.5 × 0.2 + 0.15 × 0.8 = 0.22. The inverse probability
P (B1 | A) is given by
    P(B1 | A) = P(B1 A)/P(A) = P(B1)P(A | B1)/P(A) = (0.8 × 0.15)/0.22 = 6/11.
2.43 Let A be the event of reaching your goal, B1 be the event of winning
the first bet and B2 be the event of losing the first bet. Then, by
P(A) = P(A | B1)P(B1) + P(A | B2)P(B2), we get P(A) = 1 ×
(12/37) + (9/37) × (25/37). Thus the probability of reaching your goal is 0.4887.
Note: This probability is slightly more than the probability 0.4865 of
reaching your goal when you use bold play and stake the whole $10,000
on an 18-numbers bet.
2.44 Let A be the event that the player wins and Bi be the conditioning
event that the first roll of the two dice gives a dice sum of i points
Then, P(A) = Σ_{k=2}^{12} P(A | Bk)P(Bk). We have
P(A | Bi) = 1 for i = 7, 11, and P(A | Bi) = 0 for i = 2, 3, 12.
Put for abbreviation pk = P(Bk); then pk = (k − 1)/36 for k = 2, . . . , 7
and pk = p_{14−k} for k = 8, . . . , 12. The other conditional probabilities
P(A | Bi) can be given in terms of the pk. For example, the conditional
probability P(A | B4) is no other than the unconditional probability
that the total of 4 will appear before the total of 7 does in the (compound)
experiment of repetitive dice rolling. The total of 4 will appear
before the total of 7 if and only if one of the events E1, E2, . . . occurs,
where Ek is the event that the first k − 1 rolls give neither the total
of 4 nor the total of 7 and the kth roll gives a total of 4. The events
E1, E2, . . . are mutually exclusive and so

    P(4 before 7) = P(∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei).

Any event Ek is generated by physically independent subexperiments
and thus the probabilities of the individual outcomes in the subexperiments
are multiplied by each other in order to obtain P(Ek) =
(1 − p4 − p7)^{k−1} p4 for any k ≥ 1. This leads to the formula

    P(4 before 7) = Σ_{k=1}^∞ (1 − p4 − p7)^{k−1} p4 = p4/(p4 + p7).

In this way, we find that P(A | Bi) = pi/(pi + p7) for i = 4, 5, 6, 8, 9, 10.
Putting all the pieces together, we get

    P(A) = Σ_{k=2}^{12} P(A | Bk)pk = 0.4929.
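The craps win probability derived above can be recomputed in a few lines; a minimal sketch:

```python
# Numerical check of Problem 2.44: the craps win probability.
# p[k] = (k-1)/36 for k = 2..7 and p[k] = p[14-k] = (13-k)/36 for k = 8..12.
p = {k: (k - 1) / 36 if k <= 7 else (13 - k) / 36 for k in range(2, 13)}

def p_win_given_first(i):
    if i in (7, 11):
        return 1.0
    if i in (2, 3, 12):
        return 0.0
    return p[i] / (p[i] + p[7])  # P(total i appears before total 7)

p_win = sum(p_win_given_first(i) * p[i] for i in range(2, 13))
print(round(p_win, 4))  # 0.4929
```

The exact value is 244/495.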
2.45 Apply the gambler’s ruin formula with p = 0.3, a = 3 and b = 7. The
sought probability is 0.0025.
2.46 For fixed integer r, let Ar be the event that there are exactly r winning
tickets among the fifty thousand tickets sold. Let Bk be the event
that there are exactly k winning tickets among the one hundred thousand
tickets printed. Then, by the law of conditional probability,
    P(Ar) = Σ_{k=0}^∞ P(Ar | Bk)P(Bk).
Obviously, P (Ar | Bk ) = 0 for k < r. For all practical purposes the
so-called Poisson probability e−1 /k! can be taken for the probability
of the event Bk for k = 0, 1, . . ., see Example 1.19. This gives
    P(Ar) = Σ_{k=r}^∞ (k choose r)(1/2)^k e^{−1}/k! = e^{−1} ((1/2)^r/r!) Σ_{j=0}^∞ (1/2)^j/j! = e^{−1/2} (1/2)^r/r!.

Hence the probability of exactly r winning tickets among the fifty
thousand tickets sold is given by the Poisson probability e^{−1/2} (1/2)^r/r!
for r = 0, 1, . . ..
2.47 It is no restriction to assume that the starting point is 1 and the first
transition is from point 1 to point 2 (otherwise, renumber the points).
Some reflection shows that the probability of visiting all points before
returning to the starting point is nothing else than the probability
1/(1 + 10) = 1/11 from the gambler's ruin model.
2.48 Let A be the event that the card picked is a red card, B1 be the
event that the removed top card is red and B2 be the event that the
removed top card is black. The sought probability P (A) is given by
P (A | B1 )P (B1 ) + P (A | B2 )P (B2 ). Therefore
    P(A) = [(r − 1)/(r + b − 1)] × [r/(r + b)] + [r/(r + b − 1)] × [b/(r + b)] = r/(r + b).
2.49 Let A be the event that John needs more tosses than Pete and Bj
be the event that Pete needs j tosses to obtain three heads. Then
P(Bj) = (j−1 choose 2)(1/2)^j and P(A | Bj) = (j choose 0)(1/2)^j + (j choose 1)(1/2)^j. By the law of
conditional probability, the sought probability P (A) is
    P(A) = Σ_{j=3}^∞ P(A | Bj)P(Bj) = 0.1852.
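The infinite series converges geometrically, so truncating it gives a quick numerical check; a sketch:

```python
from math import comb

# Numerical check of Problem 2.49, truncating the infinite series at j = 200
# (the terms decay geometrically, so the truncation error is negligible).
p_A = sum(
    comb(j - 1, 2) * 0.5**j * (comb(j, 0) + comb(j, 1)) * 0.5**j
    for j in range(3, 201)
)
print(round(p_A, 4))  # 0.1852
```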
2.50 Take any of the twenty balls and mark this ball. Let A be the event
that this ball is the last ball picked for the situation that three balls
were overlooked and were added to the bin at the end. If we can show
that P(A) = 1/20, the raffle is still fair. Let B1 (B2) be the event that the
marked ball is (is not) one of the three balls that were unintentionally
overlooked. Then, by the law of conditional probability,

    P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) = (1/3) × (3/20) + 0 × (17/20).

Hence P(A) = 1/20, the same win probability as for the case in which
no balls would have been overlooked.
2.51 Let state i mean that player A’ s bankroll is i. Also, let E be the event
of reaching state k without having reached state a + b when starting
in state a and F be the event of reaching state a + b without having
reached state k − 1 when starting in state k. Then the unconditional
probability of player A winning and having k as the lowest value of its
bankroll during the game is given by P (EF ) = P (E)P (F | E). Using
the gambler's ruin formula, P(E) = b/(a + b − k) and P(F | E) = 1/(a + b − k + 1).
Thus the sought conditional probability is

    b(a + b)/(a(a + b − k)(a + b − k + 1)) for k = 1, . . . , a.

This probability has the values 0.1563, 0.2009, 0.2679, and 0.3750 for
k = 1, 2, 3, and 4 when a = 4 and b = 5.
2.52 Let A be the event that two or more participating cyclists will have
birthdays on the same day during the tournament and Bi be the event
that exactly i participating cyclists have their birthdays during the
tournament. The conditional probability P (A | Bi ) is easy to calculate. It is the standard birthday problem. We have
    P(A | Bi) = 1 − [23 × 22 × · · · × (23 − i + 1)]/23^i for 2 ≤ i ≤ 23.

Further, P(A | Bi) = 1 for i ≥ 24. Also, P(A | B0) = P(A | B1) = 0.
Therefore the probability P(Bi) is given by

    P(Bi) = (180 choose i) (23/365)^i (1 − 23/365)^{180−i} for 0 ≤ i ≤ 180.

Putting the pieces together and using P(A) = Σ_{i=2}^{180} P(A | Bi)P(Bi),
we get

    P(A) = 1 − P(B0) − P(B1) − Σ_{i=2}^{23} [23 × 22 × · · · × (23 − i + 1)/23^i] P(Bi).

This yields the value 0.8841 for the probability P(A).
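The value 0.8841 can be reproduced numerically; a minimal sketch:

```python
from math import comb

# Numerical check of Problem 2.52. q(i) is the classical birthday-problem
# probability that i cyclists whose birthdays fall in the 23-day tournament
# window all have distinct birthdays.
def q(i):
    r = 1.0
    for m in range(i):
        r *= (23 - m) / 23
    return r

p_win = 23 / 365  # a given cyclist's birthday falls in the tournament window
P_B = [comb(180, i) * p_win**i * (1 - p_win) ** (180 - i) for i in range(181)]
P_A = 1 - P_B[0] - P_B[1] - sum(q(i) * P_B[i] for i in range(2, 24))
print(round(P_A, 4))  # 0.8841 (the value found above)
```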
2.53 Let pn (i) be the probability of reaching his home no later than midnight without having reached first the police station given that he is
i steps away from his home and he has still time to make n steps before it is midnight. The sought probability is p180 (10). By the law of
conditional probability, the pn (i) satisfy the recursion
    pn(i) = (1/2) pn−1(i + 1) + (1/2) pn−1(i − 1).
The boundary conditions are pk (30) = 0 and pk (0) = 1 for k ≥ 0, and
p0 (i) = 0 for i ≥ 1. Applying the recursion, we find p180 (10) = 0.4572.
In the same way, the value 0.1341 can be calculated for the probability
of reaching the police station before midnight. Note: As a sanity check,
we verified that pn(10) tends to 2/3 as n gets large, in agreement with
the gambler's ruin formula. The probability pn(10) has the values
0.5905, 0.6659, and 0.6665 for n = 360, 1,200, and 1,440.
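The recursion with its boundary conditions is easy to iterate directly; a sketch:

```python
# Numerical check of Problem 2.53: p_n(i) = 0.5*p_{n-1}(i+1) + 0.5*p_{n-1}(i-1),
# with absorbing boundaries p_k(0) = 1 (home) and p_k(30) = 0 (police station),
# and p_0(i) = 0 for i >= 1.
def prob_home(n_steps, start, bad_state=30):
    prev = [1.0 if i == 0 else 0.0 for i in range(bad_state + 1)]
    for _ in range(n_steps):
        cur = prev[:]
        for i in range(1, bad_state):
            cur[i] = 0.5 * prev[i + 1] + 0.5 * prev[i - 1]
        prev = cur
    return prev[start]

print(round(prob_home(180, 10), 4))   # 0.4572 (as above)
print(round(prob_home(1440, 10), 4))  # close to the limit 2/3
```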
2.54 Let A be the event that John and Pete meet each other in the semifinals. To find P (A), let B1 be the event that John and Pete are
allocated to either group 1 or group 2 but not to the same group and
B2 be the event that John and Pete are allocated to either group 3 or
group 4 but not to the same group. Then P(B1) = P(B2) = (1/2) × (2/7) = 1/7.
By the law of conditional probability,

    P(A) = P(A | B1) × (1/7) + P(A | B2) × (1/7)
         = (1/2) × (1/2) × (1/7) + (1/2) × (1/2) × (1/7) = 1/14.
Let C be the event that John and Pete meet each other in the final. To
find P (C), let D1 be the event that John is allocated to either group 1
or group 2 and Pete to either group 3 or group 4 and D2 be the event
that John is allocated to either group 3 or group 4 and Pete to either
group 1 or group 2. Then P(D1) = P(D2) = (1/2) × (4/7) = 2/7. By the law
of conditional probability,
    P(C) = P(C | D1) × (2/7) + P(C | D2) × (2/7)
         = (1/2) × (1/2) × (1/2) × (1/2) × (2/7) + (1/2) × (1/2) × (1/2) × (1/2) × (2/7) = 1/28.
The latter result can also be directly seen by a symmetry argument.
The probability that any one pair contests the final is the same as that
for any other pair. There are (8 choose 2) different pairs and so the probability
that John and Pete meet each other in the final is 1/(8 choose 2) = 1/28.
2.55 Let A be the event that you have chosen the bag with one red ball
and B be the event that you have the other bag. Also, let E be the
event that the first ball picked is red. The sought probability that the
second ball picked is red is
    (1/4) P(A | E) + (3/4) P(B | E),

by the law of conditional probability. We have

    P(A | E) = P(AE)/P(E) = P(A)P(E | A)/P(E).

Further, P(B | E) = 1 − P(A | E). Since P(A) = P(B) = 1/2, P(E) =
P(A) × (1/4) + P(B) × (3/4) = 1/2, and P(E | A) = 1/4, we get P(A | E) = 1/4
and P(B | E) = 3/4. Thus the sought probability is (1/4) × (1/4) + (3/4) × (3/4) = 5/8.
2.56 The key idea for the solution approach is to parameterize the starting
state. Define Ds as the event that Dave wins the game when the game
begins with Dave rolling the dice and Dave has to roll more than s
points in his first roll. Similarly, the event Es is defined for Eric. The
goal is to find P (D1 ). This probability can be found from a recursion
scheme for the P (Ds ). The recursion scheme follows by conditioning
on the events Bj , where Bj is the event that a roll of the two dice
results in a sum of j points. The probabilities pj = P (Bj ) are given
by pj = (j − 1)/36 for 2 ≤ j ≤ 7 and pj = p_{14−j} for 8 ≤ j ≤ 12. By the law
of conditional probability,
    P(Ds) = Σ_{j=s+1}^{12} P(Ds | Bj) pj for s = 1, 2, . . . , 11.
Obviously, P (D12 ) = 0. Since P (Ds | Bj ) = 1 − P (Ej ) for j > s and
P (Ek ) = P (Dk ) for all k, we get the recursion scheme
    P(Ds) = Σ_{j=s+1}^{12} [1 − P(Dj)] pj for s = 1, 2, . . . , 11.
Starting with P (D12 ) = 0, we recursively compute P (D11 ), . . . , P (D1 ).
This gives the value P (D1 ) = 0.6541 for the probability of Dave winning the game.
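The backward recursion for P(D_s) takes only a few lines to run; a sketch:

```python
# Numerical check of Problem 2.56: backward recursion for P(D_s),
# starting from P(D_12) = 0 and working down to P(D_1).
p = {j: (j - 1) / 36 if j <= 7 else (13 - j) / 36 for j in range(2, 13)}

D = {12: 0.0}
for s in range(11, 0, -1):
    D[s] = sum((1 - D[j]) * p[j] for j in range(s + 1, 13))
print(round(D[1], 4))  # 0.6541
```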
2.57 Fix j. Label the c = (7 choose j) possible combinations of j stops as l =
1, . . . , c. Let A be the event that there will be exactly j stops at
which nobody gets off and Bl be the event that nobody gets off at
the j stops from combination l. Then, P(A) = Σ_{l=1}^{c} P(A | Bl)P(Bl).
We have that P(Bl) = (7 − j)^25/7^25 for all l and P(A | Bl) is the
unconditional probability that at least one person gets off at each
stop when there are 7 − j stops and 25 persons. Thus P(A | Bl) =
1 − Σ_{k=1}^{7−j} (−1)^{k+1} (7−j choose k)(7 − j − k)^25/(7 − j)^25, using the result
of Example 1.18. Next we get after some algebra the desired result
    P(A) = Σ_{k=0}^{7−j} (−1)^k (7 choose j+k)(j+k choose j) (7 − j − k)^25/7^25.
Note: More generally, the probability of exactly j empty bins when
m ≥ b balls are sequentially placed at random into b bins is given by

    Σ_{k=0}^{b−j} (−1)^k (b choose j+k)(j+k choose j) (b − j − k)^m/b^m.
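The general empty-bins formula in the note can be verified against brute-force enumeration for a small case; a sketch:

```python
from itertools import product
from math import comb

# Check the closed-form probability of exactly j empty bins against direct
# enumeration for a small case (m = 5 balls, b = 3 bins).
def p_exactly_j_empty(j, m, b):
    return sum(
        (-1) ** k * comb(b, j + k) * comb(j + k, j) * (b - j - k) ** m
        for k in range(b - j + 1)
    ) / b**m

m, b = 5, 3
for j in range(b + 1):
    brute = sum(
        1 for balls in product(range(b), repeat=m)
        if b - len(set(balls)) == j
    ) / b**m
    assert abs(p_exactly_j_empty(j, m, b) - brute) < 1e-12
print("formula matches enumeration")
```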
2.58 Let A be the event that all of the balls drawn are blue and Bi be the
event that the number of points shown by the die is i for i = 1, . . . , 6.
By the law of conditional probability, the probability that all of the
balls drawn are blue is given by
    P(A) = Σ_{i=1}^{6} P(A | Bi)P(Bi) = Σ_{i=1}^{5} [(5 choose i)/(10 choose i)] × (1/6) = 5/36.
The probability that the number of points shown by the die is r given
that all of the balls drawn are blue is equal to
    P(Br | A) = P(Br A)/P(A) = [(1/6)(5 choose r)/(10 choose r)] / (5/36).

This probability has the values 3/5, 4/15, 1/10, 1/35, 1/210, and 0 for r = 1, . . . , 6.
2.59 Let A be the event that both rolls of the two dice show the same
combination of two numbers. Also, let B1 be the event that the first
roll of the two dice shows two equal numbers and B2 be the event
that the first roll shows two different numbers. Then P(B1) = 6/36 and
P(B2) = 30/36. Further, P(A | B1) = 1/36 and P(A | B2) = 2/36. By the
law of conditional probability,

    P(A) = Σ_{i=1}^{2} P(A | Bi)P(Bi) = (1/36) × (6/36) + (2/36) × (30/36) = 11/216.
2.60 Let Aj be the event that the team placed jth in the competition wins
the first place in the draft and Bj be the event that this team wins
the second place in the draft for 7 ≤ j ≤ 14. Obviously,
    P(Aj) = (15 − j)/36 for j = 7, . . . , 14.

By the law of conditional probability, P(Bj) = Σ_{k≠j} P(Bj | Ak)P(Ak).
We have P(Bj | Ak) = (15 − j)/(36 − 15 + k) for k ≠ j. Therefore

    P(Bj) = Σ_{k≠j} [(15 − j)/(36 − 15 + k)] × [(15 − k)/36] for j = 7, . . . , 14.
The probability P (Bj ) has the numerical values 0.2013, 0.1848, 0.1653,
0.1431, 0.1185, 0.0917, 0.0629, and 0.0323 for j = 7, 8, . . . , 14.
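The numerical values listed above can be reproduced directly; a sketch:

```python
# Numerical check of Problem 2.60: first- and second-place draft probabilities.
P_A = {j: (15 - j) / 36 for j in range(7, 15)}

# P(B_j | A_k) = (15 - j)/(21 + k) for k != j
P_B = {
    j: sum((15 - j) / (21 + k) * P_A[k] for k in range(7, 15) if k != j)
    for j in range(7, 15)
}
print([round(P_B[j], 4) for j in range(7, 15)])
# matches the values listed above
```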
2.61 This problem can be seen as a random walk on the integers, where the
random walk starts at zero. In the first step the random walk moves
to 1 with probability p and to −1 with probability q = 1 − p.
Take p < 1/2. Starting from 1, the random walk will ever return to zero
with probability 1 − lim_{b→∞} [1 − (q/p)^a]/[1 − (q/p)^{a+b}] with a = 1. This
probability is 1. Starting from −1, the random walk will ever return to
zero with probability 1 − lim_{b→∞} [1 − (p/q)^a]/[1 − (p/q)^{a+b}] with a = 1.
This probability is p/q. The sought probability is p × 1 + (1 − p) × p/(1 − p) = 2p
(this result is also valid for p = 1/2).
2.62 Let A be the event that the drunkard will ever visit the point which is
one unit distance south from his starting point. Let B1 be the event
that the first step is one unit distance to the south and B2 be the event
that the first step is two units distance to the north. By the law of
conditional probability, P (A) = P (A | B1 )P (B1 ) + P (A | B2 )P (B2 ).
Obviously, P (B1 ) = P (B2 ) = 21 and P (A | B1 ) = 1. Noting that the
probability of ever going three units distance to the south from any
starting point is P (A) × P (A) × P (A), it follows that
    P(A) = 0.5 + 0.5 (P(A))^3.

The cubic equation x^3 − 2x + 1 = 0 has the root x = 1 and so the
equation can be factorized as (x − 1)(x^2 + x − 1) = 0. The only positive
root of x^2 + x − 1 = 0 is x = (1/2)(√5 − 1). This is the desired value for the
sought probability P(A). Next some reflection shows that (1/2)(√5 − 1)
gives also the probability of the number of heads ever exceeding twice
the number of tails if a fair coin is tossed over and over.
2.63 It does not matter what question you ask. To see this, let A be the
event that your guess is correct, B1 be the event that the answer of
your friend is yes and B2 be the event that the answer is no. For the
question whether the card is red, we have P(A) = (1/26) × (1/2) + (1/26) × (1/2) = 1/26,
by the law of conditional probability. For the other question, P(A) =
1 × (1/52) + (1/51) × (51/52) = 1/26. The same probability.
2.64 Let A be the event that player 1 wins the game. We have P (A) = 0.5,
regardless of the value of m. The simplest way to see this is to define E1
as the event that player 1 has more heads than player 2 after m tosses,
E2 as the event that player 1 has fewer heads than player 2 after m
tosses, and E3 as the event that player 1 has the same number of heads
as player 2 after m tosses. Then P(A) = Σ_{i=1}^{3} P(A | Ei)P(Ei), by
the law of conditional probability. To evaluate this, it is not necessary
to know the P (Ei ). Since P (E2 ) = P (E1 ) and P (E3 ) = 1 − 2P (E1 ),
it follows that
    P(A) = 1 × P(E1) + 0 × P(E2) + (1/2) × P(E3)
         = P(E1) + (1/2) × (1 − 2P(E1)) = 0.5.
2.65 Let A be the event that you roll two consecutive totals of 7 before a
total of 12. Let B1 be the event that each of the first two rolls results
in a total of 7, B2 be the event that the first roll gives a total of 7 and
the second roll a total different from 7 and 12, B3 be the event that
the first roll gives a total different from 7 and 12, B4 be the event that
the first roll gives a total of 7 and the second roll a total of 12, and B5
be the event that the first roll gives a total of 12. Then,
    P(A) = 1 × (6/36) × (6/36) + P(A) × (6/36) × (29/36) + P(A) × (29/36)
           + 0 × (6/36) × (1/36) + 0 × (1/36),

and so P(A) = 6/13.
2.66 A minor modification of the analysis of Example 2.11 shows that the
optimal stopping level for player A remains the same, but the win
probability of player A changes to 0.458.
2.67 The recursion is

    p(i, t) = [1/(6 − i + 1)] Σ_{j=0}^{6−i} p(i + 1, t − j),

as follows by conditioning upon the number of tokens you lose at the
ith cup. This leads to p(1, 6) = 169/720.
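The solution leaves the boundary conditions implicit; the sketch below assumes (consistent with the stated answer 169/720) that you succeed if at least one token survives all six cups, and that at cup i the loss is uniform on {0, 1, . . . , 6 − i}:

```python
from fractions import Fraction
from functools import lru_cache

# Numerical check of Problem 2.67. Assumed boundary conditions: p(7, t) = 1
# for t >= 1 (past the last cup with a token left) and p(i, t) = 0 once t <= 0.
@lru_cache(maxsize=None)
def p(i, t):
    if t <= 0:
        return Fraction(0)
    if i == 7:
        return Fraction(1)
    return sum(p(i + 1, t - j) for j in range(0, 6 - i + 1)) / (6 - i + 1)

print(p(1, 6))  # 169/720
```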
2.68 For fixed n, Let A be the event that the total score ever reaches the
value n. To find pn = P (A), condition on the outcome of the first
roll of the die. Let Bj be the event that the outcome of this roll is j.
Then, P (A | Bj ) = pn−j and so, by the law of conditional probability,
    pn = Σ_{k=1}^{6} p_{n−k} × (1/6) for all n ≥ 1,

with the convention p0 = 1 and pj = 0 for j < 0. The result that pn tends to
1/3.5 as n gets large can be intuitively explained from the fact that after
each roll of the die the expected increase in the total score is equal to
(1/6)(1 + 2 + · · · + 6) = 3.5.
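The convergence of pn to 1/3.5 = 2/7 is easy to observe numerically; a sketch:

```python
# Numerical check of Problem 2.68: p_n = (p_{n-1} + ... + p_{n-6}) / 6,
# with p_0 = 1 and p_j = 0 for j < 0; p_n tends to 1/3.5 = 2/7.
p = [1.0]
for n in range(1, 201):
    p.append(sum(p[n - k] for k in range(1, 7) if n - k >= 0) / 6)
print(round(p[200], 6))  # 0.285714
```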
2.69 (a) Define rn as the probability of getting a run of either r successes
or r failures in n trials. Also, define sn as the probability of getting a run
of either r successes or r failures in n trials given that the first trial
results in a success, and fn as the probability of getting a run of either r
successes or r failures in n trials given that the first trial results in a
failure. Then rn = p sn + (1 − p)fn. The sn and fn satisfy the recursive
schemes

    sn = p^{r−1} + Σ_{k=1}^{r−1} p^{k−1}(1 − p) f_{n−k}
    fn = (1 − p)^{r−1} + Σ_{k=1}^{r−1} (1 − p)^{k−1} p s_{n−k}

for n ≥ r, where sj = fj = 0 for j ≤ r − 1.
(b) Parameterize the starting state and let p(r, b, L) be the probability
that the longest run of red balls will be L or more when the bowl
initially contains r red and b blue balls. Fix r > L and b ≥ 1. Let A be
the event that a run of L red balls will occur. To find P (A) = p(r, b, L),
let BL be the conditioning event that the first L balls picked are red
and Bj−1 be the conditioning event that each of the first j − 1 balls
picked is red but the jth ball picked is blue, where 1 ≤ j ≤ L. Then
    P(BL) = [r/(r + b)] × · · · × [(r − (L − 1))/(r + b − (L − 1))],
    P(B_{j−1}) = [r/(r + b)] × · · · × [(r − (j − 2))/(r + b − (j − 2))] × [b/(r + b − (j − 1))], 1 ≤ j ≤ L.

Note that P(A | BL) = 1 and P(A | B_{j−1}) = p(r − (j − 1), b − 1, L)
for 1 ≤ j ≤ L. Then

    P(A) = P(BL) + Σ_{j=1}^{L} P(A | B_{j−1})P(B_{j−1})

gives a recursion scheme for the calculation of the probability p(r, b, L).
2.70 For the case of n dwarfs, p(k, n) is defined as the probability that the
kth dwarf will not sleep in his own bed when the first dwarf chooses
randomly one of the n beds (the dwarfs 1, 2, . . . , n go to bed in this
order and dwarf j has bed j). Let us first note that the dwarfs 2, . . . , j−
1 sleep in their own beds if the first dwarf chooses bed j. The first
dwarf chooses each of the n beds with the same probability n1 . Fix
k ≥ 2. Under the condition that the first dwarf chooses bed j with
2 ≤ j ≤ k, the conditional probability that the kth dwarf will not sleep
in his own bed is equal to 1 for j = k and is equal to the unconditional
probability p(k − (j − 1), n − (j − 1)) for 2 ≤ j ≤ k − 1 (when dwarf j
goes to bed, we face the situation of n − (j − 1) dwarfs where bed 1 is
now the bed of dwarf j and dwarf k is in the (k − (j − 1))-th position).
Hence, by the law of conditional probability, we find the recursion

    p(k, n) = 1/n + (1/n) Σ_{j=2}^{k−1} p(k − (j − 1), n − (j − 1))

for k = 2, . . . , n and all n ≥ 2. Noting that p(1, n) = (n − 1)/n for all n ≥ 1,
we get p(2, n) = 1/n and p(3, n) = 1/(n − 1). Next, by induction, we obtain

    p(k, n) = 1/(n − k + 2) for 2 ≤ k ≤ n.
Hence the probability that the kth dwarf can sleep in his own bed is
equal to 1 − (n − 1)/n = 1/n for k = 1 and 1 − 1/(n − k + 2) = (n − k + 1)/(n − k + 2)
for 2 ≤ k ≤ n. A remarkable result is that p(n, n) = 1/2 for all n ≥ 2.
A simple intuitive explanation can be given for the result that the last
dwarf will sleep in his own bed with probability 1/2, regardless of the
number of dwarfs.
The key observation is that the last free bed is either the bed of the
youngest dwarf or the bed of the oldest dwarf. This is an immediate
consequence of the fact that any of the other dwarfs always chooses
his own bed when it is free. Each time a dwarf finds his bed occupied,
the dwarf chooses at random a free bed and then the probability of
the youngest dwarf’s bed being chosen is equal to the probability of
the oldest dwarf’s bed being chosen. Thus the last free bed is equally
likely to be the bed of the youngest dwarf or the bed of the oldest
dwarf.
Note: Consider the following variant of the problem with seven dwarfs.
The jolly youngest dwarf decides not to choose his own bed but rather
to choose at random one of the other six beds. Then, the probability
that the oldest dwarf can sleep in his own bed is (5/6) × (1/2) = 5/12, as can
be seen by using the intuitive reasoning above.
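The recursion for p(k, n) and the closed form 1/(n − k + 2) can be checked against each other exactly; a sketch:

```python
from fractions import Fraction
from functools import lru_cache

# Numerical check of Problem 2.70: the recursion for p(k, n) versus the
# closed form p(k, n) = 1/(n - k + 2) for 2 <= k <= n.
@lru_cache(maxsize=None)
def p(k, n):
    if k == 1:
        return Fraction(n - 1, n)
    return (Fraction(1) + sum(p(k - (j - 1), n - (j - 1)) for j in range(2, k))) / n

for n in range(2, 9):
    for k in range(2, n + 1):
        assert p(k, n) == Fraction(1, n - k + 2)
print("closed form confirmed for n <= 8")
```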
2.71 Let’s assume that the numbers 1, 2, . . . , R are on the wheel. It is
obvious that the optimal strategy of the second player B is to stop
after the first spin if and only if the score is larger than the final score
of player A and is larger than a2 , where a2 is the optimal switching
point for the first player in the two-player game (a2 = 53 for R = 100).
Denote by S3 (a) [C3 (a)] the probability that the first player A will beat
both player B and player C if player A obtains a score of a points in the
first spin and stops [continues] after the first spin. Let the switching
point a3 be the largest value of a for which C3 (a) is still larger than
S3 (a). Then, in the three-player game it is optimal for player A to
stop after the first spin if the score of this spin is more than a3 points.
Denote by P3 (A) the overall win probability of player A. Then, by the
law of conditional probability,
    P3(A) = (1/R) Σ_{a=1}^{a3} C3(a) + (1/R) Σ_{a=a3+1}^{R} S3(a).
To obtain S3 (a), we first determine the conditional probability that
player A will beat player B when player A stops after the first spin
with a points. Denote this conditional probability by Pa. To find Pa, we
first note that the probability S(a) = a^2/R^2 from the two-player game
represents the probability that the second player in this game scores
no more than a points in the first spin and has not beaten the first
player after the second spin. Thus, taking into account the form of
the optimal strategy of player B, we find for a ≥ a2 ,
    Pa = P(B gets no more than a in the first spin and A beats B) = S(a) = a^2/R^2,

and for 1 ≤ a < a2,

    Pa = P(B gets no more than a in the first spin and A beats B)
       + P(B gets between a and a2 + 1 in the first spin and A beats B)
       = a^2/R^2 + (1/R) Σ_{j=a+1}^{a2} [1 − (R − j)/R] = S(a) + [a2(a2 + 1) − a(a + 1)]/(2R^2),
where 1 − (R − j)/R denotes the probability that B’s total score after the second spin exceeds R when B’s score in the first spin is j.
Obviously, for the case that player A stops after the first spin with a
points, the conditional probability of player A beating player C given
that player A has already beaten player B is equal to a^2/R^2. Thus, the
function S3(a) is given by

    S3(a) = (a^2/R^2) × (a^2/R^2) for a2 ≤ a ≤ R,
    S3(a) = [a^2/R^2 + (1/(2R^2))(a2(a2 + 1) − a(a + 1))] × (a^2/R^2) for 1 ≤ a < a2.
Further,
    C3(a) = (1/R) Σ_{k=1}^{R−a} S3(a + k) for 1 ≤ a ≤ R.
Noting that S3(a) = a^4/R^4 and C3(a) = (1/R) Σ_{k=1}^{R−a} (a + k)^4/R^4 for a ≥ a2,
and taking for granted that a3 ≥ a2, the switching point a3 is nothing else
than the largest integer a ≥ a2 for which

    (1/R) Σ_{k=1}^{R−a} (a + k)^4/R^4 > a^4/R^4.
The probability P3 (B) of the second player B being the overall winner
can be calculated as follows. For the situation of optimal play by the
players, let p3(a) denote the probability that the final score of player
A will be a points and, for b > a, let p3(b | a) denote the probability
that the final score of player B will be b points given that player A's
final score is a. Then,
    P3(B) = Σ_{a=0}^{R−1} p3(a) Σ_{b=a+1}^{R} p3(b | a) b^2/R^2.
It easily follows that p3(0) = Σ_{k=1}^{a3} (1/R) × (k/R) = (1/2) a3(a3 + 1)/R^2,
p3(a) = Σ_{k=1}^{a−1} (1/R) × (1/R) = (a − 1)/R^2 for 1 ≤ a ≤ a3, and
p3(a) = 1/R + a3/R^2 for a3 < a ≤ R. Then, for 0 ≤ a < a2 and b > a,

    p3(b | a) = (b − 1)/R^2 for b ≤ a2,  p3(b | a) = 1/R + a2/R^2 for b > a2,

and p3(b | a) = 1/R + a/R^2 for a2 ≤ a < R and b > a.
Numerical results: For R = 20, we find a3 = 13 (a2 = 10), P3 (A) =
0.3414, P3 (B) = 0.3307, and P3 (C) = 0.3279 with P3 (C) = 1−P3 (A)−
P3 (B). For R = 100, the results are a3 = 65 (a2 = 53), P3 (A) =
0.3123, P3 (B) = 0.3300, and P3 (C) = 0.3577, while for R = 1,000 the
results are a3 = 648 (a2 = 532), P3 (A) = 0.3059, P3 (B) = 0.3296, and
P3 (C) = 0.3645 (see also the solution of Problem 7.35).
Note: The following result can be given for the s-player game with
s > 3. Denoting by as the optimal switching point for the first player
A in the s-player game, the value of as can be calculated as the largest
integer a ≥ as−1 for which
    (1/R) Σ_{k=1}^{R−a} ((a + k)/R)^{2(s−1)} > (a/R)^{2(s−1)}.
For R = 20, as has the values 14, 15, 16, and 17 for s=4, 5, 7, and 10.
These values are 71, 75, 80, and 85 when R = 100 and are 711, 752,
803, and 847 when R = 1,000.
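The switching points as can be found by scanning the inequality for the s-player game directly; a sketch:

```python
# Numerical check of Problems 2.71 (s = 3) and the s-player note:
# a_s is the largest a with (1/R) * sum_{k=1}^{R-a} ((a+k)/R)^(2(s-1))
# exceeding (a/R)^(2(s-1)).
def switch_point(R, s=3):
    e = 2 * (s - 1)
    best = 0
    for a in range(1, R + 1):
        lhs = sum(((a + k) / R) ** e for k in range(1, R - a + 1)) / R
        if lhs > (a / R) ** e:
            best = a
    return best

print(switch_point(20))        # 13
print(switch_point(100))       # 65
print(switch_point(20, s=4))   # 14
```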
2.72 Let the hypothesis H be the event that a 1 is sent and the evidence E
be the event that a 1 is received. The posterior odds are

    P(H | E)/P(H^c | E) = [P(H)/P(H^c)] × [P(E | H)/P(E | H^c)] = (0.8/0.2) × (0.95/0.01) = 380.

Hence the posterior probability P(H | E) that a 1 has been sent is
380/381 = 0.9974.
2.73 Let the hypothesis H be the event that oil is present and the evidence
E be the event that the test is positive. Then P(H) = 0.4, P(H^c) = 0.6,
P(E | H) = 0.9, and P(E | H^c) = 0.15. Thus the posterior odds are

    P(H | E)/P(H^c | E) = [P(H)/P(H^c)] × [P(E | H)/P(E | H^c)] = (0.4/0.6) × (0.90/0.15) = 4.

The posterior probability is P(H | E) = 4/(1 + 4) = 0.8.
2.74 Let the hypothesis H be the event that it rains tomorrow and E be
the event that rain is predicted for tomorrow. The prior odds of the
event H are P(H)/P(H^c) = 0.1/0.9. The likelihood ratio is given by
P(E | H)/P(E | H^c) = 0.85/0.25. Then, by Bayes' rule in odds form,
the posterior odds are

    P(H | E)/P(H^c | E) = [P(H)/P(H^c)] × [P(E | H)/P(E | H^c)] = (0.1/0.9) × (0.85/0.25) = 17/45.

It next follows that the posterior probability P(H | E) of rainfall
tomorrow given the information that rain is predicted for tomorrow is
equal to (17/45)/(1 + 17/45) = 0.2742.
2.75 Let the hypothesis H be the event that the blue coin is unfair and the
evidence E be the event that all three tosses of the blue coin show
a head. The posterior odds are (0.2/0.8) × (0.75)^3/(0.5)^3 = 27/32. The
posterior probability is P(H | E) = 27/59 = 0.4576.
2.76 Let the hypothesis H be the event that Dennis Nightmare played the
final and the evidence E be the event that the Dutch team won the
final. Then, P(H) = 0.75, P(H^c) = 0.25, P(E | H) = 0.5, and P(E | H^c) = 0.3.
Therefore the posterior odds are

    P(H | E)/P(H^c | E) = (0.75/0.25) × (0.5/0.3) = 5.

Thus the sought posterior probability is P(H | E) = 5/6.
2.77 Let the hypothesis H be the event that both children are boys.
(a) If the evidence E is the event that at least one child is a boy, then
the posterior odds are

    [(1/4)/(3/4)] × [1/(2/3)] = 1/2.

The posterior probability is P(H | E) = 1/3.
(b) If the evidence E is the event that at least one child is a boy born
on a Tuesday, then the posterior odds are

    [(1/4)/(3/4)] × [1 − (6/7)^2] / [(1/3) × (1/7) + (1/3) × (1/7) + (1/3) × 0] = 13/14.

The posterior probability is P(H | E) = 13/27.
(c) If the evidence E is the event that at least one child is a boy born
on one of the first k days of the week, then the posterior odds are

    [(1/4)/(3/4)] × [1 − (1 − k/7)^2] / [(1/3) × (k/7) + (1/3) × (k/7) + (1/3) × 0] = (14 − k)/14.

The posterior probability is P(H | E) = (14 − k)/(28 − k) for k = 1, 2, . . . , 7.
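Parts (a) and (b) can be confirmed by brute-force enumeration; a sketch, modeling a child as a (sex, weekday) pair with all 14 combinations equally likely and "Tuesday" arbitrarily labeled as day 2:

```python
from itertools import product

# Brute-force check of Problem 2.77: enumerate all 14 x 14 = 196 equally
# likely two-child families of (sex, weekday) pairs.
families = list(product(product("BG", range(7)), repeat=2))

def posterior_both_boys(evidence):
    keep = [f for f in families if evidence(f)]
    both = [f for f in keep if f[0][0] == "B" and f[1][0] == "B"]
    return len(both), len(keep)

# (a) at least one boy
n, d = posterior_both_boys(lambda f: any(c[0] == "B" for c in f))
print(n, d)  # 49 147, i.e. 1/3

# (b) at least one boy born on a Tuesday (day 2)
n, d = posterior_both_boys(lambda f: any(c == ("B", 2) for c in f))
print(n, d)  # 13 27
```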
2.78 Let the hypothesis H be the event that the inhabitant you overheard
spoke truthfully and the evidence E be the event that the other inhabitant says that the inhabitant you overheard spoke the truth. The
posterior odds are

    P(H | E)/P(H^c | E) = [(1/3)/(2/3)] × [(1/3)/(2/3)] = 1/4.

Hence the posterior probability P(H | E) that the inhabitant you
overheard spoke the truth is (1/4)/(1 + 1/4) = 1/5.
2.79 Let the hypothesis H be the event that the suspect is guilty and the
evidence E be the event that the suspect makes a confession. To verify
that P(H | E) > P(H) if and only if P(E | H) > P(E | H^c), we use
the fact that a/(1 − a) > b/(1 − b) for 0 < a, b < 1 if and only if a > b.
Bayes' rule in odds form states that

    P(H | E)/P(H^c | E) = [P(H)/P(H^c)] × [P(E | H)/P(E | H^c)].

If P(E | H) > P(E | H^c), then it follows from Bayes' rule in odds form
that P(H | E)/(1 − P(H | E)) > P(H)/(1 − P(H)) and so P(H | E) > P(H).
Next suppose that P(H | E) > P(H). Then P(H | E)/(1 − P(H | E)) >
P(H)/(1 − P(H)) and thus, by Bayes' rule in odds form, P(E | H) > P(E | H^c).
2.80 Let the hypothesis H be the event that the bowl originally contained a
red ball and the evidence E be the event that a red ball is picked from
the bowl after a red ball was added. Then, P(H) = 0.5, P(H^c) = 0.5,
P(E | H) = 1, and P(E | H^c) = 0.5. Therefore P(H | E)/P(H^c | E) =
(1/2 × 1)/(1/2 × 1/2) = 2. Thus the posterior probability P(H | E) = 2/3.
2.81 Let the hypothesis H be the event that the woman has breast cancer
and the evidence E be the event that the test result is positive. Since
P(H) = 0.01, P(H^c) = 0.99, P(E | H) = 0.9, and P(E | H^c) = 0.1, the
posterior odds are (0.01/0.99) × (0.9/0.1) = 1/11. Therefore the posterior
probability P(H | E) = 1/12.
Note: As a sanity check, the posterior probability can also be obtained
by a heuristic but insightful approach. This approach presents the
relevant information in terms of frequencies instead of probabilities.
Imagine 10,000 (say) women who undergo the test. On average, there
are 90 positive tests for the 100 women having the malicious disease,
whereas there are 990 false positives for the 9,900 healthy women.
Thus, based on the information presented in this way, we find that the
sought probability is 90/(90 + 990) = 1/12.
2.82 Let the hypothesis H be the event that Elvis was an identical twin
and the evidence E be the event that Elvis's twin was male. Then
P(H) = (1/300)/(1/300 + 1/125) = 5/17, P(H̄) = 12/17, P(E | H) = 1, and P(E | H̄) = 0.5.
Then, by Bayes' rule in odds form, P(H | E)/P(H̄ | E) = 5/6. This gives P(H | E) = 5/11.
Note: A heuristic way to get the answer is as follows. In 3000 births
(say), we would expect 3000/300 = 10 sets of identical twins. Roughly
half of those we would expect to be boys. That's 5 sets of boy-boy
identical twins. In 3000 births, we would expect 3000/125 = 24 sets
of fraternal twins. One fourth would be boy-boy, one fourth would
be girl-girl, one fourth would be boy-girl, and one fourth girl-boy.
Therefore six sets would be boy-boy. So, out of 3000 births, five out of
eleven sets of boy-boy twins would be identical. Therefore the chance
that Elvis was an identical twin is about 5/11.
2.83 Let the hypothesis H be the event that you have chosen the two-headed
coin and the evidence E be the event that all n tosses result in heads.
The posterior odds are

P(H | E)/P(H̄ | E) = (1/10,000)/(9,999/10,000) × 1/0.5^n.

This gives P(H | E) = 2^n/(2^n + 9,999). The probability P(H | E) has the
values 0.0929, 0.7662, and 0.9997 for n = 10, 15, and 25.
2.84 Let the random variable Θ represent the unknown probability that a
single toss of the die results in the outcome 6. The prior distribution
of Θ is given by p0(θ) = 0.25 for θ = 0.1, 0.2, 0.3, and 0.4. The
posterior probability p(θ | data) = P(Θ = θ | data) is proportional to
L(data | θ)p0(θ), where L(data | θ) = C(300, 75) θ^75 (1 − θ)^225. Hence the
posterior probability p(θ | data) is given by

p(θ | data) = L(data | θ)p0(θ) / Σ_{k=1}^{4} L(data | k/10)p0(k/10)
            = θ^75 (1 − θ)^225 / Σ_{k=1}^{4} (k/10)^75 (1 − k/10)^225, θ = 0.1, 0.2, 0.3, 0.4.

The posterior probability p(θ | data) has the values 3.5 × 10^−12, 0.4097,
0.5903, and 1.2 × 10^−6 for θ = 0.1, 0.2, 0.3, and 0.4.
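Note: these posterior values can be reproduced with a short Python sketch (not part of the original solution). Working in log space avoids underflow of θ^75(1 − θ)^225, and the binomial coefficient cancels in the normalization:

```python
from math import log, exp

def posterior(successes, failures, thetas, prior):
    # log of likelihood times prior, up to the binomial coefficient,
    # which cancels when normalizing
    logw = [successes * log(t) + failures * log(1 - t) + log(p)
            for t, p in zip(thetas, prior)]
    m = max(logw)                       # subtract the max for stability
    w = [exp(v - m) for v in logw]
    total = sum(w)
    return [v / total for v in w]

thetas = [0.1, 0.2, 0.3, 0.4]
post = posterior(75, 225, thetas, [0.25] * 4)
for t, p in zip(thetas, post):
    print(t, p)
```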
2.85 Let the random variable Θ represent the unknown win probability of
Alassi. The prior of Θ is p0(0.4) = p0(0.5) = p0(0.6) = 1/3. Let E
be the event that Alassi wins the best-of-five contest. The likelihood
function L(E | θ) is θ^3 + C(3, 2) θ^2(1 − θ)θ + C(4, 2) θ^2(1 − θ)^2 θ. The posterior
probability p(θ | E) is proportional to p0(θ)L(E | θ) and has the values
0.2116, 0.3333, and 0.4550 for θ = 0.4, 0.5, and 0.6.
2.86 The prior density of the unknown success probability is

p0(θ) = 1/101 for θ = 0, 0.01, . . . , 0.99, 1.

For a single observation, the prior is updated with the likelihood factor
θ if the observation corresponds to a success of the new treatment and
with 1 − θ otherwise. The first observation S leads to an update that
is proportional to θ p0(θ), the second observation S to an update that
is proportional to θ^2 p0(θ), the third observation F to an update that
is proportional to θ^2(1 − θ)p0(θ), and so on; the tenth observation F
leads to an update that is proportional to θ^2(1 − θ) θ^2(1 − θ) θ^3(1 − θ) p0(θ) =
θ^7(1 − θ)^3 p0(θ). This is the same posterior as found in Example 2.17,
where all observations were used simultaneously.
2.87 Let the random variable Θ be 1 if the student is unprepared for the
exam, 2 if the student is half prepared, and 3 if the student is well
prepared. The prior of Θ is p0(1) = 0.2, p0(2) = 0.3, and p0(3) = 0.5.
Let E be the event that the student has answered correctly 26 out of
50 questions. The likelihood function L(E | θ) is C(50, 26) a_θ^26 (1 − a_θ)^24,
where a_1 = 1/3, a_2 = 0.45, and a_3 = 0.8. The posterior probability
p(θ | E) is proportional to p0(θ)L(E | θ) and has the values 0.0268,
0.9730, and 0.0001 for θ = 1, 2, and 3.
2.88 Let the random variable Θ represent the unknown probability that a
free throw of your friend will be successful. The prior probability
p0(θ) = P(Θ = θ) has the values 0.2, 0.6, and 0.2 for θ = 0.25, 0.50,
and 0.75. The posterior probability p(θ | data) = P(Θ = θ | data) is
proportional to L(data | θ)p0(θ), where L(data | θ) = C(10, 7) θ^7 (1 − θ)^3.
Hence the posterior probability p(θ | data) is given by

θ^7 (1 − θ)^3 p0(θ) / [0.25^7 × 0.75^3 × 0.2 + 0.50^7 × 0.50^3 × 0.6 + 0.75^7 × 0.25^3 × 0.2]

for θ = 0.25, 0.50, and 0.75. The possible values 0.25, 0.50, and 0.75
for the success probability of the free throws of your friend have the
posterior probabilities 0.0051, 0.5812, and 0.4137.
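Note: the same three-point update can be checked with a few lines of Python (a sketch; the factor C(10, 7) cancels in the normalization):

```python
thetas = [0.25, 0.50, 0.75]
prior = [0.2, 0.6, 0.2]

# unnormalized posterior weights: theta^7 (1 - theta)^3 times the prior
weights = [t**7 * (1 - t)**3 * p for t, p in zip(thetas, prior)]
total = sum(weights)
post = [w / total for w in weights]
print(post)
```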
Chapter 3
3.1 Take as sample space the set consisting of the four outcomes (0, 0),
(1, 0), (0, 1), and (1, 1), where the first (second) component is 0 if the
first (second) student picked has not done homework and is 1 otherwise.
The probabilities 5/20 × 4/19 = 1/19, 5/20 × 15/19 = 15/76, 15/20 × 5/19 = 15/76,
and 15/20 × 14/19 = 21/38 are assigned to these four outcomes. The random
variable X takes on the value 0 for the outcome (1, 1), the value 1 for
the outcomes (0, 1) and (1, 0), and the value 2 for the outcome (0, 0).
Thus P(X = 0) = 21/38, P(X = 1) = 15/76 + 15/76 = 15/38, and P(X = 2) = 1/19.
3.2 The probability mass function of X can be calculated by using conditional probabilities. Let Ai be the event that the ith person entering
the room is the first person matching a birthday. Then P(X = i) =
P(Ai) for i = 2, 3, . . . , 366. Using the chain rule for conditional probabilities, it follows that P(X = 2) = 1/365 and

P(X = i) = 364/365 × · · · × (365 − i + 2)/365 × (i − 1)/365 for i ≥ 3.
3.3 The random variable X can take on the values 0, 1, and 2. By the
law of conditional probability, P(X = 0) = 1/3 × 0 + 2/3 × 1/4 = 1/6, P(X = 1) = 1/3 × 0 + 2/3 × 1/2 = 1/3, and P(X = 2) = 1/3 × 1 + 2/3 × 1/4 = 1/2.
3.4 Denote by the random variable X the number of prize winners. The
random variable X takes on the value 1 if all three digits of the lottery
number drawn are the same, the value 3 if exactly two of the three
digits of the lottery number drawn are the same, and the value 6
if all three digits of the lottery number drawn are different. There
are 10 lottery numbers for which all three digits are the same and
so P(X = 1) = 10/1,000. There are 10 × 9 × 8 = 720 lottery numbers
with three different digits and so P(X = 6) = 720/1,000. The probability
P(X = 3) = 1 − P(X = 1) − P(X = 6) = 270/1,000.
3.5 The random variable X can take on the values 2, 3, and 4. Two tests
are needed if the first two tests give the depleted batteries, while three
tests are needed if the first three batteries tested are not depleted or
if a second depleted battery is found at the third test. Thus, by the
chain rule for conditional probabilities, P(X = 2) = 2/5 × 1/4 = 1/10,
P(X = 3) = 3/5 × 2/4 × 1/3 + 2/5 × 3/4 × 1/3 + 3/5 × 2/4 × 1/3 = 3/10. The probability
P(X = 4) is calculated as 1 − P(X = 2) − P(X = 3) = 6/10.
3.6 For 1 ≤ k ≤ 4, you get 2^{k−1} × 10 dollars if the first k tosses are heads
and the (k + 1)th toss is tails. You get 160 dollars if the first five tosses
are heads. The probability mass function of X is P(X = 0) = 1/2,
P(X = 10) = 1/4, P(X = 20) = 1/8, P(X = 40) = 1/16, P(X = 80) = 1/32,
and P(X = 160) = 1/32.
3.7 The random variable X can take on the values 0, 5, 10, 15, 20, and
25. Using the chain rule for conditional probabilities, P(X = 0) = 4/7,
P(X = 5) = 1/7 × 4/6, P(X = 10) = 2/7 × 4/6, P(X = 15) = 2/7 × 1/6 × 4/5 +
1/7 × 2/6 × 4/5, P(X = 20) = 2/7 × 1/6 × 4/5, and P(X = 25) = 2/7 × 1/6 × 1/5 +
1/7 × 2/6 × 1/5 + 2/7 × 1/6 × 1/5.
3.8 The sample space is the set {1, . . . , 10}, where the outcome i means
that the card with number i is picked. Let X be your payoff. The
random variable X can take on the values 0.5, 5, 6, 7, 8, 9, and 10. We
have P(X = 0.5) = 4/10 and P(X = k) = 1/10 for 5 ≤ k ≤ 10. Therefore
E(X) = 0.5 × 4/10 + Σ_{k=5}^{10} k × 1/10 = 4.70 dollars.
3.9 The probability that a randomly chosen student belongs to a particular
class gets larger when the class has more students. Therefore E(Y) >
E(X). We have E(X) = 15 × 1/4 + 20 × 1/4 + 70 × 1/4 + 125 × 1/4 = 57.5
and E(Y) = 15 × 15/230 + 20 × 20/230 + 70 × 70/230 + 125 × 125/230 = 91.96.
3.10 Let the random variable X be the net cost of the risk reduction in
dollars. The random variable X takes on the values 2,000, 2,000−5,000
and 2,000−10,000 with respective probabilities 0.75, 0.15 and 0.10. We
have E(X) = 2,000 × 0.75 − 3,000 × 0.15 − 8,000 × 0.10 = 250 dollars.
3.11 Let the random variable X be the amount of money you win. Then
P(X = m) = C(10, m)/C(12, m) and P(X = 0) = 1 − P(X = m). This gives

E(X) = m × C(10, m)/C(12, m).

This expression is maximal for m = 4 with E(X) = 56/33.
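Note: a short Python enumeration (not part of the original solution) confirms that m = 4 is optimal:

```python
from fractions import Fraction
from math import comb

def expected_win(m):
    # you win m dollars with probability C(10, m)/C(12, m), else nothing
    return m * Fraction(comb(10, m), comb(12, m))

values = {m: expected_win(m) for m in range(1, 11)}
best = max(values, key=values.get)
print(best, values[best])
```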
3.12 Using conditional probabilities, your probability of winning is 4/6 × 3/5 =
2/5. Hence your expected winnings are 2/5 × 1.25 − 3/5 × 1 = −0.10 dollars.
This is not a fair bet.
3.13 Put for abbreviation c_k = C(52, k − 1). Using the second argument from
Example 3.2, we get P(X2 = k) = (1/c_k) C(4, 1) C(48, k − 2) × 3/(52 − (k − 1)),
P(X3 = k) = (1/c_k) C(4, 2) C(48, k − 3) × 2/(52 − (k − 1)), and
P(X4 = k) = (1/c_k) C(4, 3) C(48, k − 4) × 1/(52 − (k − 1)). This
leads to E(X2) = 21.2, E(X3) = 31.8, and E(X4) = 42.4. Intuitively,
the result E(Xj) = (52 + 1)/5 × j for 1 ≤ j ≤ 4 can be explained by a
symmetry argument.
Note: More generally, suppose that balls are removed from a bowl
containing r red and b blue balls, one at a time and at random. Then
the expected number of picks until a red ball is removed for the jth
time is (r + b + 1)/(r + 1) × j for 1 ≤ j ≤ r.
3.14 Let the random variable X denote the number of chips you get back
in any given round. The possible values of X are 0, 2, and 5. The
random variable X is defined on the sample space consisting of the 36
equiprobable outcomes (1, 1), (1, 2), . . . , (6, 6). Outcome (i, j) means
that i points turn up on the first die and j points on the second
die. The total of the two dice is 7 for the six outcomes (1, 6), (6, 1),
(2, 5), (5, 2), (3, 4), and (4, 3). Thus P(X = 5) = 6/36. Similarly,
P(X = 0) = 15/36 and P(X = 2) = 15/36. This gives

E(X) = 0 × 15/36 + 2 × 15/36 + 5 × 6/36 = 5/3.

You bet two chips each round. Thus, your average loss is 2 − 5/3 = 1/3
chip per round when you play the game over and over.
3.15 Let the random variable X be the total amount staked and the random
variable Y be the amount won. The probability pk that k bets will be
placed is (19/37)^{k−1} × 18/37 for k = 1, . . . , 10 and (19/37)^{10} for k = 11. Thus,

E(X) = Σ_{k=1}^{10} (1 + 2 + . . . + 2^{k−1}) pk + (1 + 2 + . . . + 2^9 + 1,000) p11
     = 12.583 dollars.

If the round goes to 11 bets, the player's loss is $23 if the 11th bet is
won and is $2,023 if the 11th bet is lost. Thus

E(Y) = 1 × (1 − p11) − 23 × p11 × 18/37 − 2,023 × p11 × 19/37 = −0.3401 dollars.

The ratio of 0.3401 and 12.583 is in line with the house advantage of
2.70% of the casino.
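Note: both expectations can be reproduced numerically; this Python sketch follows the formulas above:

```python
p_win, p_lose = 18 / 37, 19 / 37

# probability that the round consists of exactly k bets
p = {k: p_lose**(k - 1) * p_win for k in range(1, 11)}
p[11] = p_lose**10

# total staked after k doubled bets is 1 + 2 + ... + 2^(k-1) = 2^k - 1;
# a round that reaches bet 11 also stakes the final 1,000 dollars
EX = sum((2**k - 1) * p[k] for k in range(1, 11)) + (2**10 - 1 + 1000) * p[11]

# the player nets +1 unless the round reaches bet 11
EY = 1 * (1 - p[11]) - 23 * p[11] * p_win - 2023 * p[11] * p_lose
print(EX, EY)
```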
3.16 Let the random variable X be the payoff of the game. Then P(X = 0) = (1/2)^m and P(X = 2^k) = (1/2)^{k−1} × 1/2 for k = 1, . . . , m. Therefore
E(X) = m.
3.17 Take as sample space the set {0} ∪ {(x, y) : x² + y² ≤ 25}, where 0 means
that the dart has missed the target. The score is a random variable X
with P(X = 0) = 0.25, P(X = 5) = 0.75 × (25 − 9)/25 = 0.48, P(X = 8) =
0.75 × (9 − 1)/25 = 0.24, and P(X = 15) = 0.75 × 1/25 = 0.03. The expected
value of the score is E(X) = 0.48 × 5 + 0.24 × 8 + 0.03 × 15 = 4.77.
3.18 Let the random variable X be your net winnings. The random variable
X takes on the values −1, 0, and 10. We have P(X = 0) =
C(6, 1)C(4, 2)/C(10, 3) = 3/10, P(X = 10) = C(4, 3)/C(10, 3) = 1/30, and P(X = −1) =
1 − P(X = 0) − P(X = 10) = 2/3. Thus

E(X) = −1 × 2/3 + 0 × 3/10 + 10 × 1/30 = −1/3.
3.19 Let the random variable X be the payoff when you go for a second
spin given that the first spin showed a score of a points. Then P(X =
a + k) = 1/1,000 for 1 ≤ k ≤ 1,000 − a and P(X = 0) = a/1,000. Thus

E(X) = Σ_{k=1}^{1,000−a} (1/1,000)(a + k) = (1/2,000)(1,000 − a)(1,000 + a + 1).

The largest value of a for which E(X) > a is a* = 414. The optimal
strategy is to stop after the first spin if this spin gives a score of more
than 414 points.
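Note: the threshold 414 follows from evaluating E(X) for each score a; a quick Python check:

```python
def expected_second_spin(a):
    # E(X) = (1/2000)(1000 - a)(1000 + a + 1) from the solution above
    return (1000 - a) * (1000 + a + 1) / 2000

# largest score a for which spinning again beats stopping with a
a_star = max(a for a in range(1, 1000) if expected_second_spin(a) > a)
print(a_star)
```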
3.20 The expected value of the number of chips you leave the casino with is
U(a, b) = b × P(a, b) + 0 × (1 − P(a, b)). Since P(a, b) ≈ (q/p)^{−b} for
a large, we get

U(a, b) ≈ b × (q/p)^{−b}.

It is a matter of simple algebra to verify that b × (q/p)^{−b} is maximal for
b ≈ 1/ln(q/p). This gives

P(a, 1/ln(q/p)) ≈ 1/e and U(a, 1/ln(q/p)) ≈ 1/(e ln(q/p)).
3.21 Let the random variable X be the payoff of the game. Then, using
conditional probabilities, P(X = 0) = (5/6)^3, P(X = 2) = C(3, 1)(1/6)(5/6)^2 × 4/5,
P(X = 2.5) = C(3, 1)(1/6)(5/6)^2 × 1/5, P(X = 3) = C(3, 2)(1/6)^2(5/6), and P(X = 4) =
(1/6)^3. This gives E(X) = 0 × 125/216 + 2 × 60/216 + 2.5 × 15/216 + 3 × 15/216 + 4 × 1/216 =
0.956.
3.22 Let the random variable X be the payoff in the dice game. The random
variable X can take on the values 0, 10, and 100. We have P(X =
100) = 6 × (1/6)^4 = 1/216, P(X = 10) = 3 × 3 × C(4, 2) × (1/6)^4 = 1/24,
and P(X = 0) = 1 − P(X = 100) − P(X = 10) = 206/216. Therefore
E(X) = 100/216 + 10/24 = 0.8796. The dice game is unfavorable to you.
Let Y be the payoff in the coin-tossing game. Then P(Y = 0) = 1/2,
P(Y = i) = (1/2)^{i+1} for 1 ≤ i ≤ 4, and P(Y = 30) = (1/2)^5. Therefore
E(Y) = Σ_{i=1}^{4} i × (1/2)^{i+1} + 30 × (1/2)^5 = 1.75. The coin-tossing game is
also unfavorable to you.
3.23 Let the random variable X be the payoff of the game. The probability
that X takes on the value k with k < m is the same as the probability
that a randomly chosen number from the interval (0, 1) falls into the
subinterval (1/(k + 2), 1/(k + 1)) or into the subinterval (1 − 1/(k + 1), 1 − 1/(k + 2)). Thus
P(X = k) = 2(1/(k + 1) − 1/(k + 2)) for 1 ≤ k ≤ m − 1 and P(X = m) = 2/(m + 1).
The stake should be E(X) = 2(1/2 + · · · + 1/(m + 1)).
3.24 Suppose that your current total is i points. If you decide to roll the
die once more and then to stop, the expected value of the change of your total
is

(1/6) Σ_{k=2}^{6} k − (1/6) × i = 20/6 − i/6.

Therefore the one-stage-look-ahead rule prescribes to stop as soon as
your total is 20 points or more. This stopping rule is optimal.
3.25 Suppose you have rolled a total of i < 10 points so far. The expected
value of the change of your current total is

Σ_{k=1}^{10−i} k × 1/6 − i × (i − 4)/6 = (1/12)(10 − i)(10 − i + 1) − (1/6) i(i − 4)

if you decide to continue for one more roll. This expression is positive
for i ≤ 5 and negative for i ≥ 6. Thus, the one-stage-look-ahead rule
prescribes to stop as soon as the total number of points rolled is 6 or
more. This rule maximizes the expected value of your reward.
3.26 Suppose that at a given moment there are i0 empty bins and i1 bins
with exactly one ball (and 25 − i0 − i1 bins with two or more balls). If
you decide to drop one more ball before stopping rather than stopping
immediately, the expected value of the change of your net winnings is
(i0/25) × 1 − (i1/25) × 1.50. The one-stage-look-ahead rule prescribes to stop
in the states (i0, i1) with i0 − 1.5 i1 ≤ 0 and to continue otherwise. For
the case that you lose ½k dollars for every bin containing k ≥ 2 balls, the
expected value of the change of your net winnings is

(i0/25) × 1 − (i1/25) × 2 − ((25 − i0 − i1)/25) × 0.50

when you decide to drop one more ball before stopping rather than
stopping immediately. Therefore, the one-stage-look-ahead rule prescribes to stop in the states (i0, i1) with 3i0 − 3i1 − 25 ≤ 0 and to
continue otherwise.
3.27 Suppose that you have gathered so far a dollars. Then there are still
w − a white balls in the bowl. If you decide to pick one more ball,
then the expected change of your bankroll is

(1/(r + w − a))(w − a) − (r/(r + w − a)) a.

This expression is less than or equal to zero for a ≥ w/(r + 1). It is optimal
to stop as soon as you have gathered at least w/(r + 1) dollars.
3.28 Suppose your current total is i points. If you decide to continue for
one more roll, then the expected value of the change of your dollar
value is

3 × 2/36 + 4 × 2/36 + 5 × 4/36 + 6 × 4/36 + 7 × 6/36 + 8 × 4/36 + 9 × 4/36
+ 10 × 2/36 + 11 × 2/36 − (1/6) × i = 210/36 − i/6.

The smallest value of i for which the expected change is less than or
equal to 0 is i = 35. The one-stage-look-ahead rule prescribes to stop
as soon as you have gathered 35 or more points. This rule is optimal
among all conceivable stopping rules.
Note: The maximal expected reward is 14.22 dollars; see Problem 7.52.
3.29 (a) Writing Σ_{k=0}^{∞} P(X > k) = Σ_{k=0}^{∞} Σ_{j=k+1}^{∞} P(X = j) and interchanging the order of summation give

Σ_{k=0}^{∞} P(X > k) = Σ_{j=0}^{∞} Σ_{k=0}^{j−1} P(X = j) = Σ_{j=0}^{∞} j P(X = j)

and so Σ_{k=0}^{∞} P(X > k) = E(X). The interchange of the order of
summation is justified by the nonnegativity of the terms involved.
(b) Let X be the largest among the 10 random numbers. Then P(X >
0) = 1 and P(X > k) = 1 − (k/100)^{10} for 1 ≤ k ≤ 99. Thus, E(X) =
Σ_{k=0}^{99} P(X > k) = 91.4008.
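Note: part (b) is easily confirmed numerically with the tail-sum formula from part (a):

```python
# X is the largest of 10 independent draws from {1, ..., 100}:
# P(X > k) = 1 - (k/100)^10, and E(X) is the sum of the tail probabilities
E = sum(1 - (k / 100)**10 for k in range(100))
print(E)
```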
3.30 Define the random variable X as the number of floors on which the
elevator will stop. Let the random variable Xj = 1 if the elevator does
not stop on floor j and Xj = 0 otherwise. Then X = r − Σ_{j=1}^{r} Xj.
We have P(Xj = 1) = ((r − 1)/r)^m and so E(Xj) = ((r − 1)/r)^m for
j = 1, 2, . . . , r. Hence

E(X) = r − Σ_{j=1}^{r} E(Xj) = r(1 − ((r − 1)/r)^m).

Note: This problem is an instance of the balls-and-bins model from
Example 3.8.
3.31 Let Ik = I(Ak). Then P(A1^c ∩ · · · ∩ An^c) = E[(1 − I1) · · · (1 − In)]. We
have that

(1 − I1) · · · (1 − In) = 1 − Σ_{j=1}^{n} Ij + Σ_{j<k} Ij Ik + · · · + (−1)^n I1 · · · In.

Taking the expected value of both sides, noting that E(I_{i1} · · · I_{ir}) is
given by P(A_{i1} ∩ · · · ∩ A_{ir}), and using P(∪_{k=1}^{n} Ak) = 1 − P(∩_{k=1}^{n} Ak^c), we
get the inclusion-exclusion formula.
3.32 Let the random variable X be the number of times that two adjacent
letters are the same in a random permutation of the word Mississippi.
Then X can be represented as X = Σ_{j=1}^{10} Xj, where the random
variable Xj equals 1 if the letters in positions j and j + 1 are the same in the
random permutation and equals 0 otherwise. By numbering the eleven
letters of the word Mississippi as 1, 2, . . . , 11, it is easily seen that

P(Xj = 1) = (4 × 3 × 9! + 4 × 3 × 9! + 2 × 1 × 9!)/11! = 26/110

for all j (alternatively, using conditional probabilities, P(Xj = 1) can
be calculated as 4/11 × 3/10 + 4/11 × 3/10 + 2/11 × 1/10 = 26/110). Hence, using the
linearity of the expectation operator,

E(X) = Σ_{j=1}^{10} E(Xj) = 10 × 26/110 = 2.364.
3.33 Let the indicator variable Ik be equal to 1 if the kth team has a married
couple and zero otherwise. Then P(Ik = 1) = (12 × 22)/C(24, 3) = 3/23
for any k. The expected number of teams with a married couple is
Σ_{k=1}^{8} E(Ik) = 24/23.
Note: As a sanity check, the probability that a given team has no
married couple is 22/23 × 20/22 = 20/23, by using the chain rule for conditional
probabilities.
3.34 It is no restriction to assume that n is even. Let the indicator variable
I2j be 1 if the random walk returns to the zero level at time 2j and
0 otherwise. Then, by the linearity of the expectation operator, the
expected number of returns to the zero level during the first n time
units is Σ_{j=1}^{n/2} E(I2j). Let pj = P(I2j = 1); then E(I2j) = pj. In
Example 1.4, it is shown that pj ≈ 1/√(πj) for j large. Next, we get
Σ_{j=1}^{n/2} E(I2j) ≈ √(2/π) √n for n large.
3.35 Let Ik be 1 if two balls of the same color are removed on the kth pick
and 0 otherwise. The expected number of times that you pick two
balls of the same color is Σ_{k=1}^{r+b} E(Ik). By a symmetry argument, each
Ik has the same distribution as I1 (the order in which you draw the
pairs of balls does not matter; that is, for all practical purposes the
kth pair can be considered as the first pair). Since

P(I1 = 1) = (2r/(2r + 2b)) × ((2r − 1)/(2r + 2b − 1)) + (2b/(2r + 2b)) × ((2b − 1)/(2r + 2b − 1)),

the expected number of times that you pick two balls of the same color
is [r(2r − 1) + b(2b − 1)]/(2r + 2b − 1).
3.36 (a) Let Xi be equal to 1 if there is a birthday on day i and 0 otherwise.
For each i, P(Xi = 0) = (364/365)^{100} and P(Xi = 1) = 1 − P(Xi = 0).
The expected number of distinct birthdays is

E(Σ_{i=1}^{365} Xi) = 365 × [1 − (364/365)^{100}] = 87.6.

(b) Let Xi be equal to 1 if some child in the second class shares the
birthday of the ith child in the first class and Xi be zero otherwise.
Then P(Xi = 1) = 1 − (364/365)^s for all i. The expected number of
children in the first class sharing a birthday with some child in the
other class is

E(Σ_{i=1}^{r} Xi) = r × [1 − (364/365)^s].
3.37 Let the indicator variable Is be 1 if item s belongs to T after n steps
and 0 otherwise. Then P(Is = 0) = ((n − 1)/n)^n and so E(Is) = 1 − (1 − 1/n)^n.
Thus the expected value of the number of distinct items in T after n
steps is n(1 − (1 − 1/n)^n). Note that the expected value is about n(1 − 1/e)
for n large.
3.38 Let the random variable Xi be 1 if the ith person survives the first
round and Xi be zero otherwise. The random variable Xi takes on
the value 1 if and only if none of the other n − 1 persons shoots at
person i. Hence

P(Xi = 1) = ((n − 2)/(n − 1)) × · · · × ((n − 2)/(n − 1)) = ((n − 2)/(n − 1))^{n−1}

for all 1 ≤ i ≤ n. The expected value of the number of people who
survive the first round is

E(Σ_{i=1}^{n} Xi) = n ((n − 2)/(n − 1))^{n−1} = n (1 − 1/(n − 1))^{n−1}.

This expected value can be approximated by n/e for n large, where
e = 2.71828 . . ..
3.39 Label the white balls as 1, . . . , w. Let the indicator variable Ik be equal
to 1 if the white ball with label k remains in the bag when you stop
and 0 otherwise. To find P(Ik = 1), you can discard the other white
balls. Therefore P(Ik = 1) = 1/(r + 1). The expected number of remaining
white balls is Σ_{k=1}^{w} E(Ik) = w/(r + 1).
3.40 Number the 25 persons as 1, 2, . . . , 25 and let person 1 be the originator
of the rumor. Let the random variable Xk be equal to 1 if person k
hears about the rumor, for 2 ≤ k ≤ 25. For fixed k with 2 ≤ k ≤ 25,
let Aj be the event that person k hears about the rumor for the first
time when the rumor is told the jth time. Then E(Xk) = P(Xk =
1) = Σ_{j=1}^{10} P(Aj), where

P(A1) = 1/24 and P(Aj) = (1 − 1/24)(1 − 1/23)^{j−2} × 1/23 for 2 ≤ j ≤ 10.

This gives

E(Xk) = 1/24 + Σ_{j=2}^{10} (1 − 1/24)(1 − 1/23)^{j−2} × 1/23 = 0.35765

for 2 ≤ k ≤ 25. Hence the expected value of the number of persons
who know about the rumor is

1 + Σ_{k=2}^{25} E(Xk) = 1 + 24 × 0.35765 = 9.584.
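Note: a small Python sketch (not part of the original solution) reproduces both numbers:

```python
p_first = 1 / 24      # the rumor reaches person k at the very first telling

def p_hear_at(j):
    # person k hears the rumor for the first time at the j-th telling
    return (1 - 1 / 24) * (1 - 1 / 23)**(j - 2) * (1 / 23)

E_Xk = p_first + sum(p_hear_at(j) for j in range(2, 11))
expected_knowers = 1 + 24 * E_Xk
print(E_Xk, expected_knowers)
```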
3.41 Let the indicator variable Ik be equal to 1 if the numbers k and
k + 1 appear in the lotto drawing and 0 otherwise. Then P(Ik =
1) = C(43, 4)/C(45, 6). The expected number of consecutive numbers is
Σ_{k=1}^{44} E(Ik) = 2/3.
3.42 For any k ≥ 2, let Xk be the amount you get at the kth game. Then
the total amount you will get is Σ_{k=2}^{s} Xk. By the linearity of the
expectation operator, the expected value of the total amount you will
get is Σ_{k=2}^{s} E(Xk). Since P(Xk = 1) = 1/(k(k − 1)) and P(Xk = 0) =
1 − P(Xk = 1), it follows that E(Xk) = 1/(k(k − 1)) for k = 2, . . . , s. Hence
the expected value of the total amount you will get is equal to

Σ_{k=2}^{s} 1/(k(k − 1)) = (s − 1)/s.

The fact that the sum equals (s − 1)/s is easily verified by induction.
3.43 Let the random variable Xi be equal to 1 if box i contains more than
3 apples and Xi be equal to 0 otherwise. Then

P(Xi = 1) = Σ_{k=4}^{25} C(25, k) (1/10)^k (9/10)^{25−k} = 0.2364

and so E(Xi) = 0.2364. Thus the expected value of the number of
boxes containing more than 3 apples is given by E(Σ_{k=1}^{10} Xk) = 10 ×
0.2364 = 2.364.
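Note: the binomial tail probability can be checked with a short Python sketch:

```python
from math import comb

# probability that a given box receives more than 3 of the 25 apples,
# each apple landing in that box with probability 1/10
p = sum(comb(25, k) * 0.1**k * 0.9**(25 - k) for k in range(4, 26))
expected_boxes = 10 * p
print(p, expected_boxes)
```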
3.44 In Problem 1.73 the reader was asked to prove that the probability that
a particular number r belongs to a cycle of length k is 1/n, regardless
of the value of k (this result can also be proved by using conditional
probabilities: ((n − 1)/n) × ((n − 2)/(n − 1)) × · · · × ((n − k + 1)/(n − k + 2)) × 1/(n − k + 1) = 1/n). Thus, for fixed
k, E(Xr) = 1/n for r = 1, . . . , n. Therefore the expected number of
cycles of length k is

E((1/k) Σ_{r=1}^{n} Xr) = (1/k) Σ_{r=1}^{n} E(Xr) = 1/k

for any 1 ≤ k ≤ n. This shows that the expected value of the total
number of cycles is

Σ_{k=1}^{n} 1/k.

This sum can be approximated by ln(n) + γ for n sufficiently large,
where γ = 0.57722 . . . is Euler's constant.
3.45 For any i ≠ j, let Xij = 1 if the integers i and j are switched in the
random permutation and Xij = 0 otherwise. Then P(Xij = 1) =
(n − 2)!/n!. The expected number of interchanges is

Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E(Xij) = (n(n − 1)/2) × 1/(n(n − 1)) = 1/2.
3.46 Let X be the payoff in dollars for investment A and Y be the payoff
in dollars for investment B. Then

E(X) = 0.20 × 1,000 + 0.40 × 2,000 + 0.40 × 3,000 = 2,200.

Also, we have

E(X²) = 0.20 × 1,000² + 0.40 × 2,000² + 0.40 × 3,000² = 5,400,000.

Thus σ²(X) = 5,400,000 − 2,200² = 560,000 and so σ(X) = 748.33.
In the same way, we get E(Y) = 2,200, E(Y²) = 5,065,000, σ²(Y) =
225,000, and σ(Y) = 474.34.
3.47 Using the basic sums Σ_{k=1}^{n} k = n(n + 1)/2 and Σ_{k=1}^{n} k² = n(n +
1)(2n + 1)/6, it follows that E(X) and E(X²) are (a + b)/2 and (2a² +
2ab − a + 2b² + b)/6, which leads to var(X) = (a² − 2ab − 2a + b² + 2b)/12.
3.48 Since E(X) = pa + (1 − p)b and E(X²) = pa² + (1 − p)b², we have
var(X) = pa² + (1 − p)b² − (pa + (1 − p)b)². It is a matter of simple
algebra to get var(X) = p(1 − p)(a − b)². Then, by noting that p(1 − p)
is maximal for p = 1/2, the desired result follows.
Note: We have var(X) ≤ ¼(a − b)² for any discrete random variable X
that is concentrated on the integers a, a + 1, . . . , b.
3.49 Let Ik be 1 if the kth team has a married couple and 0 otherwise. Let
X = Σ_{k=1}^{8} Ik. Then E(X²) = Σ_{k=1}^{8} E(Ik²) + 2 Σ_{j=1}^{7} Σ_{k=j+1}^{8} E(Ij Ik).
We have

E(Ik²) = P(Ik = 1) = (12 × 22)/C(24, 3) for all k,
E(Ij Ik) = P(Ij = 1, Ik = 1) = (12 × 11 × 20 × 19)/(C(24, 3) C(21, 3)) for all j ≠ k.

This gives E(X) = 24/23, E(X²) = 48/23, and σ(X) = 0.9991.
3.50 Define X as the number of integers that do not show up in the 15
lotto drawings. Let Xi = 1 if the number i does not show up in the
15 lotto drawings and Xi = 0 otherwise. Then X = Σ_{i=1}^{45} Xi. The
probability that a specified number does not show up in any given
drawing is C(44, 6)/C(45, 6) = 39/45. Hence E(Xi) = P(Xi = 1) = (39/45)^{15}
and so

E(X) = 45 × (39/45)^{15} = 5.2601.

The probability that two specified numbers i and j with i ≠ j do not
show up in any given drawing is C(43, 6)/C(45, 6) = (39/45) × (38/44). Hence
E(Xi Xj) = P(Xi = 1, Xj = 1) = [(39 × 38)/(45 × 44)]^{15} and so

E(X²) = 45 × (39/45)^{15} + 2 C(45, 2) [(39 × 38)/(45 × 44)]^{15} = 30.9292.

This leads to σ(X) = 1.8057.
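Note: a Python sketch following the formulas above reproduces E(X) and σ(X):

```python
from math import comb, sqrt

q1 = comb(44, 6) / comb(45, 6)   # one fixed number absent from a drawing
q2 = comb(43, 6) / comb(45, 6)   # two fixed numbers absent from a drawing

EX = 45 * q1**15
EX2 = 45 * q1**15 + 2 * comb(45, 2) * q2**15
sigma = sqrt(EX2 - EX**2)
print(EX, sigma)
```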
3.51 By the substitution rule, we have

E(X) = Σ_{k=1}^{10} k (11 − k)/55 = 4 and E(X²) = Σ_{k=1}^{10} k² (11 − k)/55 = 22,

and so σ(X) = √(22 − 16) = 2.449. The number of reimbursed treatments is Y = g(X), where g(x) = min(x, 5). Then, by applying the
substitution rule,

E(Y) = Σ_{k=1}^{4} k (11 − k)/55 + 5 Σ_{k=5}^{10} (11 − k)/55 = 37/11,
E(Y²) = Σ_{k=1}^{4} k² (11 − k)/55 + 25 Σ_{k=5}^{10} (11 − k)/55 = 151/11.

The standard deviation σ(Y) = √(151/11 − (37/11)²) = 1.553.
3.52 Let V be the stock left over and W be the amount of unsatisfied
demand. We have V = g1(X) and W = g2(X), where the functions
g1(x) and g2(x) are given by g1(x) = Q − x for x < Q and g1(x) = 0
otherwise, and g2(x) = x − Q for x > Q and g2(x) = 0 otherwise. By
the substitution rule,

E(V) = Σ_{k=0}^{Q−1} (Q − k) pk and E(W) = Σ_{k=Q+1}^{∞} (k − Q) pk.

Note: Writing Σ_{k=Q+1}^{∞} (k − Q) pk as Σ_{k=0}^{∞} (k − Q) pk − Σ_{k=0}^{Q} (k − Q) pk,
it follows that E(W) = µ − Q + E(V), where µ is the expected demand.
3.53 Let the random variable X be the number of repairs that will be
necessary in the coming year and Y be the maintenance costs in excess
of the prepaid costs. Then Y = g(X), where the function g(x) =
100(x − 155) for x > 155 and g(x) = 0 otherwise. By the substitution
rule,

E(Y) = Σ_{x=156}^{∞} 100(x − 155) e^{−150} 150^x/x! = 280.995,
E(Y²) = Σ_{x=156}^{∞} 100²(x − 155)² e^{−150} 150^x/x! = 387,929.

The standard deviation of Y is √(E(Y²) − E²(Y)) = 555.85.
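Note: the two Poisson sums converge quickly and can be checked numerically (a sketch; the sums are truncated at x = 400, far out in the tail of a Poisson(150) distribution):

```python
from math import exp, log, sqrt

def poisson_pmf(x, lam):
    # evaluate e^(-lam) lam^x / x! via logs to avoid overflow
    logp = -lam + x * log(lam) - sum(log(i) for i in range(1, x + 1))
    return exp(logp)

lam = 150
EY = sum(100 * (x - 155) * poisson_pmf(x, lam) for x in range(156, 400))
EY2 = sum(100**2 * (x - 155)**2 * poisson_pmf(x, lam) for x in range(156, 400))
sd = sqrt(EY2 - EY**2)
print(EY, sd)
```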
3.54 Let the random variable X be the monthly demand for the medicine
and Y be the net profit in any given month. Then Y = g(X), where
the function g(x) is given by

g(x) = 400x − 800 for 3 ≤ x ≤ 8 and g(x) = 400x − 800 − 350(x − 8) for x > 8.

By the substitution rule,

E(Y) = Σ_{x=3}^{10} g(x) P(X = x) and E(Y²) = Σ_{x=3}^{10} [g(x)]² P(X = x).

The standard deviation of Y is √(E(Y²) − E²(Y)). Substituting the
values of g(x) and P(X = x), it follows after some calculations that the
expected value and the standard deviation of the monthly net profit
Y = g(X) are given by $1227.50 and $711.24.
3.55 Your probability of winning the contest is

Σ_{k=0}^{n} 1/(k + 1) P(X = k) = E(1/(1 + X)),

by the law of conditional probability. The random variable X can
be written as X = X1 + · · · + Xn, where Xi is equal to 1 if the ith
person in the first round survives this round and 0 otherwise. Since
E(Xi) = 1/n for all i, we have E(X) = 1. Thus, by Jensen's inequality,

E(1/(1 + X)) ≥ 1/(1 + E(X)) = 1/2.
3.56 By the linearity of the expectation operator, E[(X1 + · · · + Xn)² + (Y1 +
· · · + Yn)²] is equal to E[(X1 + · · · + Xn)²] + E[(Y1 + · · · + Yn)²]. Using
the algebraic formula (a1 + · · · + an)² = Σ_{i=1}^{n} ai² + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} ai aj
and again the linearity of the expectation operator, it follows that

E[(X1 + · · · + Xn)²] = Σ_{i=1}^{n} E(Xi²) + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E(Xi Xj)
                     = Σ_{i=1}^{n} E(Xi²) + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E(Xi)E(Xj),

where the last equality uses the fact that E(Xi Xj) = E(Xi)E(Xj)
by the independence of Xi and Xj for i ≠ j. For each i, E(Xi) =
1 × 1/4 + (−1) × 1/4 = 0 and E(Xi²) = 1² × 1/4 + (−1)² × 1/4 = 1/2. This gives

E[(X1 + · · · + Xn)²] = n/2.

In the same way, E[(Y1 + · · · + Yn)²] = n/2. Hence we find the interesting
result that the expected value of the squared distance between the
drunkard's position after n steps and his starting position is equal to
n for any value of n.
Note: It is not true that the expected value of the distance between
the drunkard's position after n steps and his starting position equals
√n. Otherwise, the variance of the distance would be zero and so the
distance would exhibit no variability, but this cannot be true.
3.57 For x, y ∈ {−1, 1}, we have P(X = x, Y = y) = P(X = x | Y =
y)P(Y = y) and P(X = x | Y = y) = P(Z = x/y | Y = y). Since Y
and Z are independent, P(Z = x/y | Y = y) = P(Z = x/y) = 0.5.
This gives

P(X = x, Y = y) = 0.5 × P(Y = y) for all x, y ∈ {−1, 1}.

Also, P(X = 1) = P(Y = 1, Z = 1) + P(Y = −1, Z = −1). Thus, by
the independence of Y and Z, we get P(X = 1) = 0.25 + 0.25 = 0.5
and so P(X = −1) = 0.5. Therefore, the result P(X = x, Y =
y) = 0.5 × P(Y = y) implies P(X = x, Y = y) = P(X = x)P(Y = y),
proving that X and Y are independent. However, X is not independent
of Y + Z. To see this, note that P(X = 1, Y + Z = 0) = 0 and
P(X = 1)P(Y + Z = 0) > 0.
3.59 Noting that Xi = Xi−1 + Ri, we get Xi = R2 + · · · + Ri for 2 ≤ i ≤ 10.
This implies Σ_{i=2}^{10} Xi = Σ_{k=2}^{10} (11 − k)Rk. Since P(Rk = 0) = P(Rk =
1) = 1/2, we have E(Rk) = 1/2 and σ²(Rk) = 1/4. The random variables
Rk are independent and so, by the Rules 3.1 and 3.9,

E(Σ_{i=2}^{10} Xi) = Σ_{k=2}^{10} (11 − k)E(Rk) = 22.5,
σ²(Σ_{i=2}^{10} Xi) = Σ_{k=2}^{10} (11 − k)² σ²(Rk) = 71.25.
3.60 We have P(X + Y = k) = Σ_{j=1}^{k−1} (1 − p)^{j−1} p (1 − p)^{k−j−1} p for k = 2, 3, . . .,
by the convolution formula. This leads to

P(X + Y = k) = (k − 1)p²(1 − p)^{k−2} for k = 2, 3, . . . .
3.61 We have E(Σ_{k=1}^{∞} Xk Ik) = Σ_{k=1}^{∞} E(Xk Ik), since it is always allowed
to interchange expectation and summation for nonnegative random
variables. Since Xk and Ik are independent, E(Xk Ik) = E(Xk)E(Ik)
for any k ≥ 1. Also, E(Ik) = P(N ≥ k). Thus,

E(Σ_{k=1}^{∞} Xk Ik) = E(X1) Σ_{k=1}^{∞} P(N ≥ k) = E(X1)E(N),

using the fact that E(N) = Σ_{n=0}^{∞} P(N > n); see Problem 3.29.
Note: If the Xk are not nonnegative, the proof still applies in view of
the fact that E(Σ_{k=1}^{∞} Yk) = Σ_{k=1}^{∞} E(Yk) when Σ_{k=1}^{∞} E(|Yk|) < ∞.
The proof of this so-called dominated convergence result can be found
in advanced texts.
3.62 Let the random variable $X$ be the number of passengers showing up. Then $X$ is binomially distributed with parameters $n = 160$ and $p = 0.9$. The overbooking probability is
$$\sum_{k=151}^{160} \binom{160}{k} 0.9^k (0.1)^{160-k} = 0.0359.$$
Denote by the random variable $R$ the daily return. Using the substitution rule, the expected value of $R$ is calculated as
$$E(R) = \sum_{k=0}^{150} \big[75k + 37.5(160-k)\big]\binom{160}{k} 0.9^k (0.1)^{160-k}$$
$$+ \sum_{k=151}^{160} \big[75 \times 150 + 37.5(160-k) - 425(k-150)\big]\binom{160}{k} 0.9^k (0.1)^{160-k}.$$
This gives $E(R) = \$11{,}367.63$. Also, by the substitution rule, $E(R^2)$ is calculated as
$$E(R^2) = \sum_{k=0}^{150} \big[75k + 37.5(160-k)\big]^2\binom{160}{k} 0.9^k (0.1)^{160-k}$$
$$+ \sum_{k=151}^{160} \big[75 \times 150 + 37.5(160-k) - 425(k-150)\big]^2\binom{160}{k} 0.9^k (0.1)^{160-k}.$$
Hence the standard deviation of $R$ equals $\sqrt{E(R^2) - E^2(R)} = \$194.71$.
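As a quick numerical check (not part of the original manual), the overbooking probability and the expected return can be recomputed in Python; the dollar amounts ($75 fare, $37.50 kept per no-show, $425 compensation per bumped passenger) are taken from the formulas above:

```python
from math import comb

n, p = 160, 0.9
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Overbooking occurs when more than 150 of the 160 booked passengers show up.
p_overbook = sum(pmf[151:])

def revenue(k):
    # Revenue when k passengers show up: $75 per seated passenger,
    # $37.50 kept per no-show, $425 compensation per bumped passenger.
    if k <= 150:
        return 75 * k + 37.5 * (160 - k)
    return 75 * 150 + 37.5 * (160 - k) - 425 * (k - 150)

e_r = sum(revenue(k) * pmf[k] for k in range(n + 1))
print(round(p_overbook, 4), round(e_r, 2))
```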
3.63 The probability
$$P_r = \sum_{k=r}^{6r} \binom{6r}{k} \Big(\frac{1}{6}\Big)^k \Big(\frac{5}{6}\Big)^{6r-k}$$
of getting at least $r$ sixes in one throw of $6r$ dice has the values 0.6651, 0.6187, and 0.5973 for $r = 1$, 2, and 3. Thus it is best to throw 6 dice. Pepys believed that it was best to throw 18 dice.
Note: The probability $P_r$ is decreasing in $r$ and tends to $\frac{1}{2}$ as $r \to \infty$.
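The three values of $P_r$ can be verified directly (a check added here, not part of the original solution):

```python
from math import comb

def at_least_r_sixes(r):
    # P(at least r sixes in one throw of 6r fair dice)
    n = 6 * r
    return sum(comb(n, k) * (1 / 6) ** k * (5 / 6) ** (n - k)
               for k in range(r, n + 1))

probs = [round(at_least_r_sixes(r), 4) for r in (1, 2, 3)]
print(probs)  # decreasing in r, so throwing 6 dice is best
```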
3.64 If the competition had been continued, the Yankees would have won with probability $\sum_{k=2}^{4} \binom{4}{k} 0.5^k\, 0.5^{4-k} = \frac{11}{16}$. The prize money should be divided between the Yankees and the Mets according to the proportion 11:5.
3.65 This question can be translated into the question what the probability is of getting 57 or fewer heads in 199 tosses of a fair coin. A binomial random variable with parameters $n = 199$ and $p = 0.5$ has expected value $199 \times 0.5 = 99.5$ and standard deviation $0.5\sqrt{199} = 7.053$. Thus the observed number of polio cases in the treatment group is more than six standard deviations below the expected number. Without doing any further calculations, we can say that the probability of this occurring is extremely small (the precise value of the probability is $7.4 \times 10^{-10}$). This makes clear that there is overwhelming evidence that the vaccine does work.
3.66 Let $X$ be the number of beans that will come up white and $Y$ be the number of points gained by the bean thrower. Then $P(X = k) = \binom{8}{k} 0.5^8$ for $k = 0, 1, \ldots, 8$. We have $P(Y = 1) = \sum_{k=0}^{3} P(X = 2k+1) = 0.5$, $P(Y = 2) = P(X = 0) + P(X = 8) = \frac{1}{128}$, and $P(Y = -1) = 1 - P(Y = 1) - P(Y = 2) = \frac{63}{128}$. This gives $E(Y) = \frac{3}{128}$. Thus the bean thrower has a slight advantage.
3.67 Using the law of conditional probability, we have that the sought probability is given by
$$\sum_{k=0}^{n} \frac{k}{n}\binom{n}{k} p^k (1-p)^{n-k}.$$
This probability is nothing else than $\frac{1}{n}E(X)$, where $X$ is binomially distributed with parameters $n$ and $p$. Thus the probability that you will be admitted to the program is $\frac{np}{n} = p$.
3.68 Let $X$ and $Y$ be the numbers of successful penalty kicks of the two teams. The independent random variables $X$ and $Y$ are binomially distributed with parameters $n = 5$ and $p = 0.7$. The probability of a tie is $\sum_{k=0}^{5} P(X = k, Y = k)$. Using the independence of $X$ and $Y$, this probability can be evaluated as
$$\sum_{k=0}^{5} \binom{5}{k} 0.7^k\, 0.3^{5-k} \times \binom{5}{k} 0.7^k\, 0.3^{5-k} = 0.2716.$$
3.69 Let the random variable $X$ be the number of coins that will be set aside. Then the random variable $100 - X$ is binomially distributed with parameters $n = 100$ and $p = \frac{1}{8}$. Therefore
$$P(X = k) = \binom{100}{100-k} \Big(\frac{1}{8}\Big)^{100-k} \Big(\frac{7}{8}\Big)^{k} \quad\text{for } k = 0, 1, \ldots, 100.$$
3.70 Using only the expected value and the standard deviation of the binomial distribution, you can see that it is highly unlikely that the medium will have to be paid out. The expected value and the standard deviation of the binomial distribution with parameters $n = 250$ and $p = \frac{1}{5}$ are 50 and $\sqrt{250 \times \frac{1}{5} \times \frac{4}{5}} = 6.32$. Thus the requirement of 82 or more correct answers means more than five standard deviations above the expected value, which has a negligible probability.
Note: The psychic who took the challenge of the famous scientific skeptic James Randi was able to get only fifty predictions correct.
3.71 Let the random variable $X$ be the number of rounds in which you get cards with an ace. If the cards are well-shuffled each time, then $X$ is binomially distributed with parameters $n = 10$ and $p = 1 - \binom{48}{13}/\binom{52}{13}$. The answer to the question should be based on
$$P(X \le 2) = \sum_{k=0}^{2} \binom{10}{k} p^k (1-p)^{10-k} = 0.0017.$$
This small probability is a strong indication that the cards were not well-shuffled.
3.72 By the same argument as used for the binomial distribution, $P(X_1 = x_1, X_2 = x_2, \ldots, X_r = x_r)$ is given by
$$\binom{n}{x_1}\binom{n - x_1}{x_2} \cdots \binom{n - x_1 - \cdots - x_{r-1}}{x_r}\, p_1^{x_1} \cdots p_r^{x_r}.$$
Using this expression, the result follows.
3.73 Let $X_i$ be the number of times that image $i$ shows up in the roll of the five poker dice. Then $(X_1, X_2, \ldots, X_6)$ has a multinomial distribution with parameters $n = 5$ and $p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$. Let $X$ be the payoff to the player for each unit staked. Then $E(X) = 3 \times P(X_1 \ge 1, X_2 \ge 1)$. This gives
$$E(X) = 3 \sum_{x_1=1}^{4} \sum_{x_2=1}^{5-x_1} \frac{5!}{x_1!\,x_2!\,(5-x_1-x_2)!}\, p_1^{x_1} p_2^{x_2} (1-p_1-p_2)^{5-x_1-x_2} = 0.9838.$$
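The double sum can be checked against inclusion-exclusion, since $P(X_1 \ge 1, X_2 \ge 1) = 1 - 2(5/6)^5 + (4/6)^5$ (a verification added here, not part of the original solution):

```python
from math import factorial

p = 1 / 6  # probability of any particular image on one poker die

# P(image 1 appears at least once and image 2 at least once), multinomial sum.
total = 0.0
for x1 in range(1, 5):
    for x2 in range(1, 6 - x1):
        coef = factorial(5) // (factorial(x1) * factorial(x2) * factorial(5 - x1 - x2))
        total += coef * p**x1 * p**x2 * (1 - 2 * p) ** (5 - x1 - x2)

e_payoff = 3 * total
# Cross-check by inclusion-exclusion.
check = 3 * (1 - 2 * (5 / 6) ** 5 + (4 / 6) ** 5)
print(round(e_payoff, 4))
```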
3.74 The number of winners in any month has a binomial distribution with parameters $n = 200$ and $p = \frac{50}{450{,}000}$. This distribution can be very well approximated by a Poisson distribution with an expected value of $\lambda = 200 \times \frac{50}{450{,}000} = \frac{1}{45}$. The monthly amount the corporation will have to give away is \$0 with probability $e^{-\lambda} = 0.9780$, \$25,000 with probability $e^{-\lambda}\lambda = 0.0217$, and \$50,000 with probability $e^{-\lambda}\frac{\lambda^2}{2!} = 2.4 \times 10^{-4}$.
3.75 Let the random variable $X$ be the number of king's rolls. Then $X$ has a binomial distribution with parameters $n = 4 \times 6^{r-1}$ and $p = \frac{1}{6^r}$. This distribution converges to a Poisson distribution with expected value $np = \frac{2}{3}$ as $r \to \infty$. The binomial probability $1 - (1-p)^n$ tends very fast to the Poisson probability $1 - e^{-2/3} = 0.48658$ as $r \to \infty$. The binomial probability has the values 0.51775, 0.49140, 0.48738, 0.48660, and 0.48658 for $r = 1$, 2, 3, 5, and 7.
3.76 An appropriate model is the Poisson model. We have the situation of 500 independent trials each having a success probability of $\frac{1}{365}$. The number of marriages having the feature that both partners are born on the same day is approximately distributed as a Poisson random variable with expected value $\frac{500}{365} = 1.3699$. The approximate Poisson probabilities are 0.2541, 0.3481, 0.2385, 0.1089, 0.0373, and 0.0102 for 0, 1, 2, 3, 4, and 5 matches, while the exact binomial probabilities are 0.2537, 0.3484, 0.2388, 0.1089, 0.0372, and 0.0101.
3.77 The Poisson model is an appropriate model. Using the fact that the sum of two independent Poisson random variables is again Poisson distributed, the sought probability is $1 - \sum_{k=0}^{10} e^{-6.2}\, 6.2^k/k! = 0.0514$.
3.78 It is reasonable to model the number of goals scored per team per game by a Poisson random variable $X$ with expected value $\frac{126}{2 \times 48} = 1.3125$. To see how good this fit is, we calculate $P(X = k)$ for $k = 0, 1, 2, 3$ and $P(X > 3)$. These probabilities have the values 0.2691, 0.3533, 0.2318, 0.1014, and 0.0417. These values are close to the empirical probabilities 0.2708, 0.3542, 0.2500, 0.0833, and 0.0444.
3.79 Using the Poisson model, an estimate is $1 - \sum_{k=0}^{7} e^{-4.2}\, 4.2^k/k! = 0.064$.
3.80 The probability that you will win the jackpot in any given week by submitting 5 six-number sequences in the lottery 6/42 is $5/\binom{42}{6} = 9.531 \times 10^{-7}$. The number of times that you will win the jackpot in the next 312 drawings of the lottery can be modeled by a Poisson distribution with expected value $\lambda_0 = 312 \times 5/\binom{42}{6} = 2.9738 \times 10^{-4}$. Therefore
$$P(\text{you win the jackpot two or more times in the next 312 drawings}) = 1 - e^{-\lambda_0} - \lambda_0 e^{-\lambda_0} = 4.421 \times 10^{-8}.$$
Thus the number of people among the 100 million players who will win the jackpot two or more times in the coming three years can be modeled by a Poisson distribution with expected value
$$\lambda = 100{,}000{,}000 \times 4.421 \times 10^{-8} = 4.421.$$
The sought probability is $1 - e^{-\lambda} = 0.9880$.
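The chain of Poisson approximations can be reproduced in a few lines (a check added here, not part of the original solution):

```python
from math import comb, exp

# Probability of winning the 6/42 jackpot in one week with 5 tickets.
p_week = 5 / comb(42, 6)
lam0 = 312 * p_week  # expected number of jackpots per player over 312 drawings

# P(a given player wins two or more jackpots in 312 drawings).
p_double = 1 - exp(-lam0) - lam0 * exp(-lam0)

# Expected number of double winners among 100 million players.
lam = 100_000_000 * p_double
p_some_double_winner = 1 - exp(-lam)
print(round(p_some_double_winner, 4))
```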
3.81 Let $X$ be the number of weekly winners. An appropriate model for $X$ is the Poisson distribution with expected value 0.25. The standard deviation of this distribution is $\sqrt{0.25} = 0.5$. The observed number of winners lies $\frac{3 - 0.25}{0.5} = 5.5$ standard deviations above the expected value. Without doing any further calculations, we can say that the probability of three or more winners is quite small ($P(X \ge 3) = 2.2 \times 10^{-3}$).
3.82 (a) Suppose that $X$ has a Poisson distribution. By the substitution rule,
$$E[\lambda g(X+1) - Xg(X)] = \sum_{k=0}^{\infty} \lambda g(k+1)e^{-\lambda}\frac{\lambda^k}{k!} - \sum_{k=0}^{\infty} kg(k)e^{-\lambda}\frac{\lambda^k}{k!}$$
$$= \sum_{k=0}^{\infty} \lambda g(k+1)e^{-\lambda}\frac{\lambda^k}{k!} - \lambda\sum_{l=0}^{\infty} g(l+1)e^{-\lambda}\frac{\lambda^l}{l!} = 0,$$
where the second sum is obtained by the substitution $l = k - 1$.
(b) Let $p_j = P(X = j)$ for $j = 0, 1, \ldots$. For fixed $i \ge 1$, define the indicator function $g(x)$ by $g(k) = 1$ for $k = i$ and $g(k) = 0$ for $k \ne i$. Then the relation $E[\lambda g(X+1) - Xg(X)] = 0$ reduces to
$$\lambda p_{i-1} - ip_i = 0.$$
This gives $p_i = \frac{\lambda}{i}p_{i-1}$ for $i \ge 1$. By repeated application of this equation, it next follows that $p_i = \frac{\lambda^i}{i!}p_0$ for $i \ge 0$. Using the fact that $\sum_{i=0}^{\infty} p_i = 1$, we get $p_0 = e^{-\lambda}$. This gives
$$P(X = i) = e^{-\lambda}\frac{\lambda^i}{i!} \quad\text{for } i = 0, 1, \ldots,$$
proving the desired result.
3.83 Translate the problem into a chance experiment with $\binom{25}{3} = 2{,}300$ trials. There is a trial for each possible combination of three people. Three given people have the same birthday with probability $(\frac{1}{365})^2$ and have birthdays falling within one day of each other with probability $7 \times (\frac{1}{365})^2$. The Poisson heuristic gives the approximations $1 - e^{-2{,}300 \times (1/365)^2} = 0.0171$ and $1 - e^{-2{,}300 \times 7/(365)^2} = 0.1138$ for the sought probabilities. Simulation shows that the approximate values are close to the exact values. In a simulation study we found the values 0.016 and 0.103.
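The two Poisson-heuristic values can be reproduced directly (a check added here, not part of the original solution):

```python
from math import comb, exp

trials = comb(25, 3)  # one trial per triple of people
p_same = (1 / 365) ** 2        # all three born on the same day
p_close = 7 * (1 / 365) ** 2   # birthdays within one day of each other

approx_same = 1 - exp(-trials * p_same)
approx_close = 1 - exp(-trials * p_close)
print(round(approx_same, 4), round(approx_close, 4))
```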
3.84 Label the four suits as $i = 1, \ldots, 4$. Translate the problem into a chance experiment with 4 trials. The $i$th trial is said to be successful if suit $i$ is missing in the bridge hand. The success probability of each trial is $p = \binom{39}{13}/\binom{52}{13}$. The Poisson heuristic gives the approximation $1 - e^{-4p}$ to the probability that some suit will be missing in the bridge hand. The approximate value is 0.0499. The exact value can be obtained by the inclusion-exclusion formula and is 0.0511.
3.85 Translate the problem into a chance experiment with $n$ trials. The $i$th trial is said to be successful if couple $i$ is paired as bridge partners. The success probability of each trial is $p = \frac{1}{2n-1}$. Letting $\lambda = np = \frac{n}{2n-1}$, the Poisson heuristic gives the approximation $1 - e^{-n/(2n-1)}$ to the probability that some couple will be paired as bridge partners. For $n = 10$, the approximate value is 0.4092. This approximate value is quite close to the exact value 0.4088, which is obtained from the inclusion-exclusion method.
3.86 Imagine a chance experiment with 51 trials. In the $i$th trial the face values of the cards in the positions $i$ and $i+1$ are compared. The trial is said to be successful if the face values are the same. The success probability is $\frac{3}{51}$. The sought probability is equal to the probability of at least one successful trial. The latter probability can be approximated by the Poisson probability $1 - e^{-51 \times (3/51)} = 1 - e^{-3} = 0.950$. In a simulation study we found the value 0.955.
3.87 Translate the problem into a chance experiment with 365 trials. The $i$th trial is said to be successful if two or more people have their birthdays on day $i$. The success probability is $p = 1 - (\frac{364}{365})^{75} - 75 \times \frac{1}{365} \times (\frac{364}{365})^{74}$. Letting $\lambda = 365p = 6.6603$, the Poisson heuristic gives the approximation $1 - \sum_{k=0}^{6} e^{-\lambda}\lambda^k/k! = 0.499$ to the probability that there are seven or more days so that on each of these days two or more people have their birthday. In a simulation study we found the value 0.516.
3.88 There are $\binom{n}{2}$ combinations of two different integers from 1 to $n$. Take a random permutation of the integers 1 to $n$. Imagine a chance experiment with $\binom{n}{2}$ trials, where the $i$th trial is said to be successful if the two integers involved in the trial have interchanged positions in the random permutation. The success probability of each trial is $\frac{(n-2)!}{n!} = \frac{1}{n(n-1)}$. The number of successful trials can be approximated by a Poisson distribution with expected value $\binom{n}{2}\frac{1}{n(n-1)} = \frac{1}{2}$. In particular, the probability of no successful trial is approximated by $e^{-\frac{1}{2}}$.
3.89 Think of a sequence of $2n$ trials with $n = 5$, where in each trial a person draws at random a card from the hat. The trial is said to be successful if the person draws the card with their own number or the card with the number of their spouse. The success probability of each trial is $p = \frac{2}{2n}$. The number of successes can be approximated by a Poisson distribution with expected value $\lambda = 2n \times p = 2$. In particular, the probability of no success can be approximated by $e^{-2} = 0.1353$. The exact value is 0.1213 when $n = 5$. This value is obtained from the exact formula
$$\frac{1}{(2n)!}\int_0^{\infty} (x^2 - 4x + 2)^n e^{-x}\,dx,$$
which is stated without proof.
Note: Using the exact formula, it can be experimentally verified that $e^{-2}(1 - \frac{1}{2n})$ is a better approximation than $e^{-2}$. In a generalization of the Las Vegas card game, you have two thoroughly shuffled decks of cards, where each deck has $r(= 13)$ types of cards and $s(= 4)$ cards of each type. A match occurs when two cards of the same type occupy the same position in their respective decks. Then the probability of no match can be approximated by $e^{-s}$ for $r$ large, while the exact value of this probability can be calculated from
$$(-1)^{rs}\frac{(s!)^r}{(rs)!}\int_0^{\infty} [L_s(x)]^r e^{-x}\,dx,$$
where $L_s(x) = \sum_{j=0}^{s} (-1)^j \binom{s}{j}\frac{x^j}{j!}$ is the Laguerre polynomial of degree $s$.
3.90 Imagine a chance experiment with $b$ trials, where the $i$th trial is said to be successful if the $i$th bin receives no ball. The success probability of each trial is $p = \big(\frac{b-1}{b}\big)^m$. The trials are weakly dependent when $b$ is large. Then the probability mass function of the number of empty bins can be approximated by a Poisson distribution with expected value $b\big(\frac{b-1}{b}\big)^m$.
3.91 Imagine a trial for each person. The trial is said to be successful if the person involved has a lone birthday. The success probability is $p = (\frac{364}{365})^{m-1}$. The probability that nobody in the group has a lone birthday is the same as the probability of having no successful trial. Thus, by the Poisson heuristic, the probability that nobody in the group has a lone birthday is approximately equal to $e^{-m(364/365)^{m-1}}$. This leads to the approximate value 3,061 for the minimum number of people that are required in order to have a fifty-fifty probability of no lone birthday. The exact value is 3,064, see Problem 1.89.
3.92 Translate the problem into a chance experiment with 8 trials. The $i$th trial is said to be successful if you have predicted correctly the two teams for the $i$th match. The success probability is $p = \frac{8 \times 2 \times 14!}{16!} = \frac{1}{15}$. The Poisson distribution with expected value $\lambda = 8 \times \frac{1}{15} = \frac{8}{15}$ provides a remarkably accurate approximation for the distribution of the number of correctly predicted matches. In a simulation study we found the values 0.587, 0.312, 0.083, and 0.015 for the probability that $k$ matches are correctly predicted for $k = 0$, 1, 2, and 3, while the approximate values are 0.5866, 0.3129, 0.0834, and 0.0148.
3.93 To approximate the probability of drawing two consecutive numbers, translate the problem into a chance experiment with 44 trials, where there is a trial for any two consecutive numbers from 1 to 45. The probability of drawing two specific consecutive numbers is $\binom{43}{4}/\binom{45}{6}$. Thus, letting $\lambda_1 = 44 \times \binom{43}{4}/\binom{45}{6}$, the Poisson heuristic gives the approximation $1 - e^{-\lambda_1} = 0.487$ for the probability of drawing two consecutive numbers. In the same way, letting $\lambda_2 = 43 \times \binom{42}{3}/\binom{45}{6}$, we get the approximation $1 - e^{-\lambda_2} = 0.059$ for the probability of drawing three consecutive numbers. In a simulation study we found the values 0.529 and 0.056 for the two probabilities.
Note: An exact expression for the probability of two or more consecutive numbers in a draw of the lottery is given by
$$1 - \binom{40}{6}\Big/\binom{45}{6}.$$
The trick to get this result is as follows. There is a one-to-one correspondence between the non-adjacent ways of choosing six distinct numbers from 1 to 45 and all ways of choosing six distinct numbers from 1 to 40. To explain this, take a particular non-adjacent draw of 6 from 45, say 3-12-18-27-35-44. This non-adjacent draw can be converted to a draw of 6 from 40 by subtracting respectively 0, 1, 2, 3, 4, and 5 from the ordered six numbers. This gives the draw 3-11-16-24-31-39 for the Lotto 6/40. Conversely, take any set of 6 from 40 and add respectively 0, 1, 2, 3, 4, and 5.
3.94 Imagine that the twenty numbers drawn from the numbers $1, \ldots, 80$ are identified as $R = 20$ red balls in an urn and that the remaining sixty, nonchosen numbers are identified as $W = 60$ white balls in the urn. You have ticked ten numbers on your game form. The probability that you have chosen $r$ numbers from the red group is simply the probability that $r$ red balls will come up in the random drawing of $n = 10$ balls from the urn when no balls are replaced. Thus
$$P(r \text{ numbers correct out of 10 ticked numbers}) = \frac{\binom{20}{r}\binom{60}{10-r}}{\binom{80}{10}}.$$
This probability has the values $4.58 \times 10^{-2}$, $1.80 \times 10^{-1}$, $2.95 \times 10^{-1}$, $2.67 \times 10^{-1}$, $1.47 \times 10^{-1}$, $5.14 \times 10^{-2}$, $1.15 \times 10^{-2}$, $1.61 \times 10^{-3}$, $1.35 \times 10^{-4}$, $6.12 \times 10^{-6}$, and $1.12 \times 10^{-7}$ for $r = 0, 1, \ldots, 10$.
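The whole hypergeometric table can be generated at once (a check added here, not part of the original solution):

```python
from math import comb

# Keno-style hypergeometric: 20 "red" numbers drawn out of 80, you tick 10.
probs = [comb(20, r) * comb(60, 10 - r) / comb(80, 10) for r in range(11)]
print([f"{q:.2e}" for q in probs])
```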
3.95 Let $X$ denote how many numbers you will correctly guess. Then $X$ has a hypergeometric distribution with parameters $R = 5$, $W = 34$, and $n = 5$. Therefore
$$P(X = k) = \frac{\binom{5}{k}\binom{34}{5-k}}{\binom{39}{5}} \quad\text{for } k = 0, \ldots, 5.$$
Let $E$ be the expected payoff per dollar staked. Then,
$$E = 100{,}000 \times P(X = 5) + 500 \times P(X = 4) + 25 \times P(X = 3) + E \times P(X = 2).$$
This gives $E = 0.631$. The house percentage is 36.9%.
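Since $E$ appears on both sides of the payoff equation (matching two numbers replays the bet, as the $E \times P(X = 2)$ term implies), solving the linear equation gives $E$; a numerical check added here, not part of the original solution:

```python
from math import comb

def pmf(k):
    # Hypergeometric: 5 winning numbers, 34 losing numbers, you pick 5.
    return comb(5, k) * comb(34, 5 - k) / comb(39, 5)

# E = 100000*P(5) + 500*P(4) + 25*P(3) + E*P(2), so solve for E.
numer = 100_000 * pmf(5) + 500 * pmf(4) + 25 * pmf(3)
e_payoff = numer / (1 - pmf(2))
print(round(e_payoff, 3))
```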
3.96 Let the random variable $X$ be the number of left shoes among the four shoes you have chosen. The random variable $X$ has a hypergeometric distribution with parameters $R = 10$, $W = 10$, and $n = 4$. The desired probability is
$$1 - P(X = 0) - P(X = 4) = 1 - \frac{\binom{10}{4}}{\binom{20}{4}} - \frac{\binom{10}{4}}{\binom{20}{4}} = 0.9133.$$
3.97 The hypergeometric model with $R = W = 25$ and $n = 25$ is applicable under the hypothesis that the psychologist blindly guesses which 25 persons are left-handed. Then, the probability of identifying correctly 18 or more of the 25 left-handers is
$$\sum_{k=18}^{25} \frac{\binom{25}{k}\binom{25}{25-k}}{\binom{50}{25}} = 2.1 \times 10^{-3}.$$
This small probability provides evidence against the hypothesis.
3.98 Let the random variables $X_1$, $X_2$, and $X_3$ indicate how many syndicate tickets match five, four, or three of the winning numbers. Then the expected amount of money won by the syndicate is
$$\$25{,}000 \times E(X_1) + \$925 \times E(X_2) + \$27.50 \times E(X_3).$$
To work out this expression, we need the probability that a particular ticket matches 5, 4, or 3 numbers given that none of the 200,000 tickets matches all six winning numbers. Denote by $A_k$ the event that a particular ticket matches exactly $k$ of the six winning numbers and by $B$ the event that none of the 200,000 tickets matches all six winning numbers. Then, by the hypergeometric model,
$$P(A_k) = \frac{\binom{6}{k}\binom{40}{6-k}}{\binom{46}{6}}.$$
The sought probability $P(A_k \mid B)$ satisfies
$$P(A_k \mid B) = \frac{P(A_k B)}{P(B)} = \frac{P(B \mid A_k)P(A_k)}{P(B)}.$$
The probability $P(B)$ can be calculated as the Poisson probability of zero successes in 200,000 independent trials each having success probability $1/\binom{46}{6}$, and $P(B \mid A_k)$ can be calculated as the Poisson probability of zero successes in 199,999 of such trials. Noting that $P(B \mid A_k)/P(B)$ is 1 for all practical purposes, we get
$$P(A_k \mid B) = P(A_k).$$
We now find that
$$E(X_k) = 200{,}000 \times \frac{\binom{6}{6-k}\binom{40}{k}}{\binom{46}{6}} \quad\text{for } k = 1, 2, 3.$$
This gives the values $E(X_1) = 5.12447$, $E(X_2) = 249.8180$, and $E(X_3) = 4219.149$. Therefore the expected amount of money won by the syndicate is
$$25{,}000 \times 5.12447 + 925 \times 249.8180 + 27.50 \times 4219.149 = 475{,}220$$
dollars. The expected profit is \$75,220. To conclude, we remark that the random variables $X_1$, $X_2$, and $X_3$ are Poisson distributed. These random variables are practically independent of each other and so the standard deviation of the random variable $25{,}000X_1 + 925X_2 + 27.50X_3$ can be approximated by
$$\sqrt{25{,}000^2\sigma^2(X_1) + 925^2\sigma^2(X_2) + 27.50^2\sigma^2(X_3)} = 58{,}478.50.$$
Note: The probability distribution of the random variable $25{,}000X_1 + 925X_2 + 27.50X_3$ can be approximated by a normal distribution with expected value \$475,220 and standard deviation \$58,478.50 (see Chapter 4). This leads to the approximation of 9.9% for the probability that the syndicate will lose money on its investment of \$400,000.
3.99 Use the hypergeometric model with $R = 8$, $W = 7$, and $n = 10$. The sought probability is equal to the probability of picking 5 or more red balls and this probability is $\sum_{k=5}^{8} \binom{8}{k}\binom{7}{10-k}/\binom{15}{10} = \frac{9}{11}$.
3.100 Let the random variable $X$ be the largest number drawn and $Y$ the smallest number drawn. Then,
$$P(X = k) = \frac{\binom{k-1}{5}}{\binom{45}{6}} \quad\text{for } 6 \le k \le 45,$$
$$P(Y = k) = \frac{\binom{45-k}{5}}{\binom{45}{6}} \quad\text{for } 1 \le k \le 40.$$
3.101 For a single player, the problem can be translated into the urn model with 24 red and 56 white balls. This leads to
$$Q_k = 1 - \frac{\binom{24}{24}\binom{56}{k-24}}{\binom{80}{k}} \quad\text{for } 24 \le k \le 79,$$
where $Q_{23} = 1$ and $Q_{80} = 0$. The probability that more than 70 numbers must be called out before one of the players has achieved a full card is given by $Q_{70}^{36} = 0.4552$. The probability that you will be the first player to achieve a full card while no other player has a full card at the same time as you is equal to
$$\sum_{k=24}^{79} \big(Q_{k-1} - Q_k\big)Q_k^{35} = 0.0228.$$
The probability that you will be among the first players achieving a full card is
$$\sum_{k=24}^{80}\sum_{a=0}^{35} \binom{35}{a}\big(Q_{k-1} - Q_k\big)^{a+1}Q_k^{35-a} = 0.0342.$$
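The three bingo probabilities can be recomputed from the $Q_k$ (a numerical check added here, not part of the original solution; the setup assumes 36 players, each with a card covering 24 of the 80 numbers):

```python
from math import comb

# Q[k] = P(a single card is NOT full after k of the 80 numbers are called);
# a card covers 24 of the 80 numbers.
Q = {23: 1.0, 80: 0.0}
for k in range(24, 80):
    Q[k] = 1 - comb(56, k - 24) / comb(80, k)

p_beyond_70 = Q[70] ** 36  # none of the 36 players is full after 70 calls

# You are the unique first player to get a full card.
p_sole_first = sum((Q[k - 1] - Q[k]) * Q[k] ** 35 for k in range(24, 80))

# You are among the first players to get a full card (ties with a others).
p_among_first = sum(comb(35, a) * (Q[k - 1] - Q[k]) ** (a + 1) * Q[k] ** (35 - a)
                    for k in range(24, 81) for a in range(36))
print(round(p_beyond_70, 4), round(p_sole_first, 4), round(p_among_first, 4))
```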
3.102 Write $X = X_1 + \cdots + X_r$, where $X_i$ is the $i$th number picked. Then $E(X) = \sum_{i=1}^{r} E(X_i)$ and $E(X^2) = \sum_{i=1}^{r} E(X_i^2) + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} E(X_i X_j)$. Since the $X_i$ are interchangeable random variables, $E(X_i) = E(X_1)$ for all $i$ and $E(X_i X_j) = E(X_1 X_2)$ for all $i \ne j$. Obviously, $E(X_1) = \frac{1}{2}(s+1)$ and so
$$E(X) = \frac{1}{2}r(s+1).$$
Since $P(X_1 = k, X_2 = l) = \frac{1}{s} \times \frac{1}{s-1}$ for any $k \ne l$, we have
$$E(X_1 X_2) = \sum_{k,l:\, l \ne k} kl\,\frac{1}{s(s-1)} = \frac{1}{s(s-1)}\Big(\sum_{k=1}^{s} k \sum_{l=1}^{s} l - \sum_{k=1}^{s} k^2\Big).$$
Using the formulas $\sum_{l=1}^{s} l = \frac{1}{2}s(s+1)$ and $\sum_{k=1}^{s} k^2 = \frac{1}{6}s(s+1)(2s+1)$, it follows after some algebra that $E(X_1 X_2) = \frac{1}{12}(s+1)(3s+2)$. Also, we have $E(X_1^2) = \frac{1}{s}\sum_{k=1}^{s} k^2 = \frac{1}{6}(s+1)(2s+1)$. Putting the pieces together, we get
$$E(X^2) = \frac{1}{6}r(s+1)(2s+1) + \frac{1}{12}r(r-1)(s+1)(3s+2).$$
Next, it is a matter of simple algebra to get the formula for $\sigma^2(X)$.
3.103 Fix $1 \le r \le a$. Let the random variable $X$ be the number of picks needed to obtain $r$ red balls. Then $X$ takes on the value $k$ if and only if $r-1$ red balls are obtained in the first $k-1$ picks and another red ball at the $k$th pick. Thus, for $k = r, \ldots, b+r$,
$$P(X = k) = \frac{\binom{a}{r-1}\binom{b}{k-1-(r-1)}}{\binom{a+b}{k-1}} \times \frac{a-(r-1)}{a + b - (k-1)}.$$
Alternatively, the probability mass function of $X$ can be obtained from the tail probability $P(X > k) = \sum_{j=0}^{r-1} \binom{a}{j}\binom{b}{k-j}\big/\binom{a+b}{k}$.
3.104 Call your opponents East and West. The probability that East has two spades and West has three spades is
$$\frac{\binom{5}{2}\binom{21}{11}}{\binom{26}{13}} = \frac{39}{115}.$$
Hence the desired probability is $2 \times \frac{39}{115} = 0.6783$.
3.105 For $0 \le k \le 4$, let $E_k$ be the event that a diamond has appeared 4 times and a spade $k$ times in the first $4+k$ cards and $F_k$ be the event that the $(4+k+1)$th card is a diamond. The sought probability is $\sum_{k=0}^{4} P(E_k F_k) = \sum_{k=0}^{4} P(E_k)P(F_k \mid E_k)$. Thus, the win probability of player A is
$$\sum_{k=0}^{4} \frac{\binom{8}{4}\binom{7}{k}}{\binom{15}{4+k}} \times \frac{4}{11-k} = 0.6224.$$
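The sum is easy to evaluate numerically (a check added here, not part of the original solution; the formula implies a pack of 8 diamonds and 7 spades, with player A winning when a fifth diamond appears before a fifth spade):

```python
from math import comb

# Deck of 15 cards: 8 diamonds and 7 spades. Player A wins at card 5+k when
# the first 4+k cards hold 4 diamonds and k spades and card 5+k is a diamond.
p_win = sum(comb(8, 4) * comb(7, k) / comb(15, 4 + k) * 4 / (11 - k)
            for k in range(5))
print(round(p_win, 4))
```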
3.106 Let $A_k$ be the event that the last drawing has exactly $k$ numbers in common with the second last drawing. Seeing the six numbers from the second last drawing as red balls and the other numbers as white balls, it follows that
$$P(A_k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}} \quad\text{for } k = 0, 1, \ldots, 6.$$
Let $E$ be the event that the next drawing will have no numbers in common with the last two drawings. Then, by the law of conditional probability, $P(E) = \sum_{k=0}^{6} P(E \mid A_k)P(A_k)$ and so
$$P(E) = \sum_{k=0}^{6} \frac{\binom{49-(6+6-k)}{6}}{\binom{49}{6}} \times \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}} = 0.1901.$$
3.107 Let the random variable $X$ be the number of tickets you will win. Then $X$ has a hypergeometric distribution with parameters $R = 100$, $W = 124{,}900$, and $n = 2{,}500$. The sought probability is $1 - P(X = 0) = 0.8675$. Since $R + W \gg n$, the hypergeometric distribution can be approximated by the binomial distribution with parameters $n = 2{,}500$ and $p = \frac{R}{R+W} = 0.0008$. The binomial distribution in turn can be approximated by a Poisson distribution with expected value $\lambda = np = 2$.
3.108 The probability of the weaker team winning the final is
$$\sum_{k=4}^{7} \binom{k-1}{3}(0.45)^4(0.55)^{k-4} = 0.3917.$$
Let the random variable $X$ be the number of games the final will take. Then,
$$P(X = k) = \binom{k-1}{3}(0.45)^4(0.55)^{k-4} + \binom{k-1}{3}(0.55)^4(0.45)^{k-4}.$$
This probability has the numerical values 0.1325, 0.2549, 0.3093, and 0.3032 for $k = 4$, 5, 6, and 7. The expected value and the standard deviation of the random variable $X$ are given by 5.783 and 1.020.
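The distribution of the series length and its moments follow directly from these formulas (a check added here, not part of the original solution):

```python
from math import comb, sqrt

def p_games(k, p=0.45):
    # The final ends at game k when one of the teams takes its 4th win there.
    return comb(k - 1, 3) * (p**4 * (1 - p)**(k - 4) + (1 - p)**4 * p**(k - 4))

p_weaker = sum(comb(k - 1, 3) * 0.45**4 * 0.55**(k - 4) for k in range(4, 8))
dist = [p_games(k) for k in range(4, 8)]
mean = sum(k * q for k, q in zip(range(4, 8), dist))
var = sum(k * k * q for k, q in zip(range(4, 8), dist)) - mean**2
print(round(p_weaker, 4), round(mean, 3), round(sqrt(var), 3))
```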
3.109 The probability of fifteen successes before four failures is
$$\sum_{n=15}^{19} \binom{n-1}{14}\Big(\frac{3}{4}\Big)^{15}\Big(\frac{1}{4}\Big)^{n-15} = 0.4654,$$
using the negative binomial distribution. As a sanity check, imagine that 19 trials are done. Then, the sought probability is the probability of 15 or more successes in 19 trials and is equal to the binomial probability
$$\sum_{k=15}^{19} \binom{19}{k}\Big(\frac{3}{4}\Big)^k\Big(\frac{1}{4}\Big)^{19-k} = 0.4654.$$
3.110 Let the random variable $X$ have a negative binomial distribution with parameters $r = 15$ and $p = \frac{3}{4}$. The probability that the red bag will be emptied first is
$$\sum_{k=15}^{19} P(X = k) = \sum_{k=15}^{19} \binom{k-1}{14}\Big(\frac{3}{4}\Big)^{15}\Big(\frac{1}{4}\Big)^{k-15} = 0.4654.$$
The probability that there are still $k \ge 1$ balls in the blue bag when the red bag gets empty is
$$P(X = 20 - k) = \binom{19-k}{14}\Big(\frac{3}{4}\Big)^{15}\Big(\frac{1}{4}\Big)^{5-k} \quad\text{for } k = 1, \ldots, 5.$$
3.111 Imagine that each player continues rolling the die until one of the assigned numbers of that player appears. Let $X_1$ be the number of rolls player A needs to get a 1 or 2 and $X_2$ be the number of rolls player B needs to get a 4, 5, or 6. Then $X_1$ and $X_2$ are independent and geometrically distributed with parameters $p_1 = \frac{1}{3}$ and $p_2 = \frac{1}{2}$. The probability of player A winning is
$$P(X_1 \le X_2) = \sum_{j=1}^{\infty} p_1(1-p_1)^{j-1}(1-p_2)^{j-1} = \frac{p_1}{p_1 + p_2 - p_1p_2} = \frac{1}{2}.$$
The length of the game is $X = \min(X_1, X_2)$. Thus
$$P(X > l) = (1-p_1)^l(1-p_2)^l = (1-p)^l \quad\text{for } l = 0, 1, \ldots,$$
where $p = p_1 + p_2 - p_1p_2$. Therefore the length of the game is geometrically distributed with parameter $p = \frac{2}{3}$.
3.112 The geometric distribution with success probability $p = \frac{1}{37}$ applies to this situation. The probability that the house number 0 will come up at least once in 25 spins of the roulette wheel is
$$1 - (1-p)^{25} = 0.495897.$$
The expected value of the gambler's net profit per dollar bet is \$0.0082.
3.113 Let $P_A$ be the probability of player A winning and $P_d$ be the probability of a draw. By the law of conditional probability, $P_A = a(1-b) + (1-a)(1-b)P_A$ and $P_d = ab + (1-a)(1-b)P_d$. This gives
$$P_A = \frac{a(1-b)}{a + b - ab} \quad\text{and}\quad P_d = \frac{ab}{a + b - ab}.$$
The length of the game is geometrically distributed with parameter $p = 1 - (1-a)(1-b) = a + b - ab$.
3.114 The random variable $X$ is given by $Y - 3$, where the random variable $Y$ has a negative binomial distribution with parameters $r = 3$ and $p = \frac{1}{2}$. Hence
$$P(X = x) = \binom{x+2}{2}\Big(\frac{1}{2}\Big)^{x+3} \quad\text{for } x = 0, 1, \ldots.$$
The expected value and the standard deviation of $X$ are given by $6 - 3 = 3$ and $\sqrt{6} = 2.449$.
3.115 Suppose the strategy is to stop as soon as you have picked a number larger than or equal to $r$. The number of trials needed is geometrically distributed with parameter $\frac{25-r+1}{25}$ and the amount you get paid has a discrete uniform distribution on $r, \ldots, 25$. The expected net payoff is given by
$$\frac{1}{25-r+1}\sum_{k=r}^{25} k - \frac{25}{25-r+1} = \frac{1}{2}(25+r) - \frac{25}{25-r+1}.$$
This expression takes on the maximal value \$18.4286 for $r = 19$.
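The optimal threshold can be found by evaluating the closed-form payoff for every $r$ (a check added here, not part of the original solution; the $25/(26-r)$ term implies each pick costs one dollar):

```python
# Expected net payoff of stopping at the first number >= r: the mean payment
# (25 + r) / 2 minus the expected number of $1 trials, 25 / (26 - r).
def net_payoff(r):
    return (25 + r) / 2 - 25 / (26 - r)

best_r = max(range(1, 26), key=net_payoff)
print(best_r, round(net_payoff(best_r), 4))
```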
3.116 The probability that both coins simultaneously show the same outcome is $p \times \frac{1}{2} + (1-p) \times \frac{1}{2} = \frac{1}{2}$. The desired probability distribution is the geometric distribution with parameter $\frac{1}{2}$.
3.117 Let $X$ be the number of rounds required for the game. The random variable $X$ is geometrically distributed with parameter $p = \sum_{i=2}^{12} a_i^2 = \frac{73}{648}$, where $a_i$ is the probability of rolling a dice total of $i$ and is given by $a_i = \frac{i-1}{36}$ for $2 \le i \le 7$ and $a_{14-i} = a_i$ for $8 \le i \le 12$. The probability of John paying for the beer is
$$\sum_{k=1}^{5} a_{2k+1}^2 \Big/ \sum_{i=2}^{12} a_i^2 = \frac{38}{73}.$$
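Both fractions can be verified exactly with rational arithmetic (a check added here, not part of the original solution):

```python
from fractions import Fraction

# a[i] = probability of a dice total of i with two fair dice.
a = {i: Fraction(6 - abs(i - 7), 36) for i in range(2, 13)}

p_match = sum(q * q for q in a.values())          # both totals equal in a round
p_john = sum(a[2 * k + 1] ** 2 for k in range(1, 6)) / p_match
print(p_match, p_john)
```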
3.118 Let us say that a success occurs each time an ace is drawn that you have not seen before. Denote by $X_j$ the number of cards drawn between the occurrences of the $(j-1)$th and $j$th success. The random variable $X_j$ is geometrically distributed with success probability $\frac{4-(j-1)}{52}$. Also, the random variables $X_1, \ldots, X_4$ are independent of each other (the cards are drawn with replacement). A geometrically distributed random variable with parameter $p$ has expected value $1/p$ and variance $(1-p)/p^2$. Hence the expected value and the standard deviation of the number of times you have to draw a card until you have seen all four different aces are
$$E(X_1 + X_2 + X_3 + X_4) = \frac{52}{4} + \frac{52}{3} + \frac{52}{2} + \frac{52}{1} = 108.33,$$
$$\sigma(X_1 + X_2 + X_3 + X_4) = \sqrt{\sum_{k=1}^{4} \frac{1 - k/52}{(k/52)^2}} = 61.16.$$
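This coupon-collector-style computation is short enough to verify directly (a check added here, not part of the original solution):

```python
from math import sqrt

# Waiting times for the 1st, 2nd, 3rd and 4th distinct ace are geometric
# with success probabilities 4/52, 3/52, 2/52, 1/52 (draws with replacement).
mean = sum(52 / k for k in range(1, 5))
sd = sqrt(sum((1 - k / 52) / (k / 52) ** 2 for k in range(1, 5)))
print(round(mean, 2), round(sd, 2))
```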
Chapter 4
4.1 Since $c\int_0^{10} (10-x)\,dx$ must be equal to 1, we get $c = \frac{1}{50}$. The probabilities are $P(X \le 5) = \int_0^5 \frac{1}{50}(10-x)\,dx = \frac{3}{4}$ and $P(X > 2) = \int_2^{10} \frac{1}{50}(10-x)\,dx = \frac{16}{25}$.
4.2 The constant $c$ follows from the requirement $c\int_0^1 (3x^2 - 8x - 5)\,dx = 1$. This gives $c = -\frac{1}{8}$. Since $f(x) = (5 + 8x - 3x^2)/8$ is positive for $0 < x < 1$, we have that $f(x)$ indeed represents a probability density function. The cumulative probability distribution function $F(x) = P(X \le x)$ is given by
$$F(x) = \frac{1}{8}\int_0^x (5 + 8y - 3y^2)\,dy = \frac{1}{8}(5x + 4x^2 - x^3) \quad\text{for } 0 \le x \le 1.$$
Further, $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x \ge 1$.
4.3 Noting that $\int_0^a 2cxe^{-cx^2}\,dx = \int_0^{ca^2} e^{-u}\,du = 1 - e^{-ca^2}$, we get
$$P(X \le 15) = \int_0^{15} 2cxe^{-cx^2}\,dx = 0.3023,$$
$$P(X > 30) = \int_{30}^{\infty} 2cxe^{-cx^2}\,dx = 0.2369,$$
$$P(20 < X \le 25) = \int_{20}^{25} 2cxe^{-cx^2}\,dx = 0.1604.$$
4.4 Let the random variable $X$ be the length of any particular phone call made by the travel agent. Then,
$$P(X > 7) = \int_7^{\infty} 0.25e^{-0.25x}\,dx = \big[-e^{-0.25x}\big]_7^{\infty} = e^{-1.75} = 0.1738.$$
4.5 The proportion of pumping engines that will not fail before 10,000 hours of use is $P(X > 10) = \int_{10}^{\infty} 0.02xe^{-0.01x^2}\,dx$. Since
$$\int_a^{\infty} 0.02xe^{-0.01x^2}\,dx = \int_{a^2/100}^{\infty} e^{-y}\,dy = e^{-a^2/100},$$
we get $P(X > 10) = e^{-1}$. Also, $P(X > 5) = e^{-0.25}$. Therefore the probability that the engine will survive for another 5,000 hours given that it has functioned properly during the past 5,000 hours is
$$P(X > 10 \mid X > 5) = \frac{P(X > 10)}{P(X > 5)} = \frac{e^{-1}}{e^{-0.25}} = 0.4724.$$
4.6 The cumulative distribution function $P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$ is given by $F(x) = \frac{1}{50}(x-115)^2$ for $115 \le x \le 120$ and $F(x) = 1 - \frac{1}{50}(125-x)^2$ for $120 \le x \le 125$. Since $P(117 < X < 123) = F(123) - F(117) = \frac{21}{25}$, the proportion of non-acceptable strain gauges is $\frac{4}{25}$.
4.7 The cumulative distribution function $F(x) = P(X \le x)$ of the random variable $X$ is $F(x) = 105\int_0^x y^4(1-y)^2\,dy = x^5(15x^2 - 35x + 21)$ for $0 \le x \le 1$. The solution of the equation $1 - F(x) = 0.05$ is $x = 0.8712$. Thus the capacity of the storage tank in thousands of gallons should be 0.8712.
4.8 A stockout occurs if and only if the demand $X$ is larger than $Q$. Thus
$$P(\text{stockout}) = \int_Q^{\infty} f(x)\,dx = 1 - \int_0^Q f(x)\,dx.$$
4.9 Let the random variable $Y$ be the area of the circle. Then $Y = \pi X^2$. Since $P(X \le x) = x$ for $0 \le x \le 1$ and $P(Y \le y) = P(X \le \sqrt{y/\pi})$, we get $P(Y \le y) = \sqrt{y/\pi}$ for $0 \le y \le \pi$. Differentiation of $P(Y \le y)$ gives that the density function of $Y$ is $1/(2\sqrt{\pi y})$ for $0 < y < \pi$ and 0 otherwise.
4.10 To find the density function of $Y = \frac{1}{X}$, we determine $P(Y \le y)$. Obviously, $P(Y \le y) = 0$ for $y \le 1$. For $y > 1$,
$$P(Y \le y) = P\Big(X \ge \frac{1}{y}\Big) = 1 - P\Big(X \le \frac{1}{y}\Big) = 1 - F\Big(\frac{1}{y}\Big),$$
where $F(x)$ is the probability distribution function of $X$. By differentiation, it follows that the density function $g(y)$ of $Y$ is given by
$$g(y) = f\Big(\frac{1}{y}\Big) \times \frac{1}{y^2} = \frac{6}{7}\Big(\frac{1}{y^3} + \frac{1}{y^2\sqrt{y}}\Big) \quad\text{for } y > 1$$
and $g(y) = 0$ otherwise.
4.11 The cumulative distribution function of $Y = X^2$ is
$$P(Y \le y) = P(X \le \sqrt{y}) = F(\sqrt{y}) \quad\text{for } y \ge 0,$$
where $F(x) = P(X \le x)$. Differentiation gives that the density function of $Y$ is $\frac{1}{2}f(\sqrt{y})/\sqrt{y}$ for $y > 0$ and 0 otherwise. The cumulative distribution function of $W = V^2$ is
$$P(W \le w) = P(-\sqrt{w} \le V \le \sqrt{w}) = \frac{2\sqrt{w}}{2a} \quad\text{for } 0 \le w \le a^2.$$
The density function of $W$ is $1/(2a\sqrt{w})$ for $0 < w < a^2$ and 0 otherwise.
4.12 (a) Let the random variable V be the sum of the coordinates of the
point Q. For 0 ≤ v ≤ 1, the random variable V takes on a value
smaller than or equal to v if and only if the point Q falls in a right
triangle with legs of length v (draw a picture). The area of this triangle
is (1/2)v^2. Hence
P(V ≤ v) = (1/2)v^2 for 0 ≤ v ≤ 1.
For 1 ≤ v ≤ 2, the random variable V takes on a value larger than v
if and only if the point Q falls in a right triangle with legs of length
1 − (v − 1) = 2 − v. The area of this triangle is (1/2)(2 − v)^2 and so
P(V > v) = (1/2)(2 − v)^2 for 1 ≤ v ≤ 2. This gives
P(V ≤ v) = 1 − (1/2)(2 − v)^2 for 1 ≤ v ≤ 2.
By differentiation, it now follows that the density function fV(v) of V
satisfies fV(v) = v for 0 < v ≤ 1, fV(v) = 2 − v for 1 < v ≤ 2 and
fV(v) = 0 otherwise.
(b) Let the random variable W be the product of the coordinates
of the randomly chosen point Q. A point (x, y) in the unit square
satisfies xy ≤ w for any given 0 ≤ w ≤ 1 if and only if either the point
belongs to the set {(x, y) : 0 ≤ x ≤ w, 0 ≤ y ≤ 1} or the point satisfies
w ≤ x ≤ 1 and lies below the graph y = w/x (draw a figure). This gives
P(W ≤ w) = w + ∫_w^1 (w/x) dx = w − w ln(w).
The density function of W is fW(w) = −ln(w) for 0 < w < 1 and
fW(w) = 0 otherwise.
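The distribution function P(W ≤ w) = w − w ln(w) in part (b) can be checked by simulation; a small sketch with an arbitrarily chosen w and a fixed seed:

```python
import math
import random

random.seed(42)  # fixed seed so the check is reproducible
w = 0.3          # arbitrary test value in (0, 1)
n = 200_000
# W is the product of two independent uniform(0,1) coordinates
hits = sum(1 for _ in range(n) if random.random() * random.random() <= w)
empirical = hits / n
theoretical = w - w * math.log(w)  # P(W <= w) = w - w ln(w)
```

With 200,000 samples the Monte Carlo error is of order 0.001, well inside the tolerance used here.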
4.13 The random variable V = X/(1 − X) satisfies
P(V ≤ v) = P(X ≤ v/(1 + v)) = v/(1 + v) for v ≥ 0.
Thus the density function of V is 1/(1 + v)^2 for v > 0 and 0 otherwise. To
get the density of W = X(1 − X), note that the function x(1 − x) has
1/4 as its maximal value on (0, 1) and that the equation x(1 − x) = w
has the solutions x_1 = 1/2 − (1/2)√(1 − 4w) and x_2 = 1/2 + (1/2)√(1 − 4w) for
0 ≤ w ≤ 1/4. Thus
P(W > w) = P(x_1 ≤ X ≤ x_2) = ∫_{x_1}^{x_2} 1 dx = √(1 − 4w) for 0 ≤ w ≤ 1/4.
Thus the density function of W is 2/√(1 − 4w) for 0 < w < 1/4 and 0
otherwise.
4.14 Let the random variable U be a number chosen at random from the
interval (0,1). Using the fact that P(U ≤ u) = u for 0 ≤ u ≤ 1, it
follows that
P(X ≤ x) = P(0 ≤ U ≤ x) + P(1 − x ≤ U ≤ 1) = 2x for 0 ≤ x ≤ 0.5.
Hence X has the density function f(x) = 2 for 0 < x < 0.5 and
f(x) = 0 otherwise. Let the random variable Y = X/(1 − X). Then,
P(Y ≤ y) = P(X ≤ y/(1 + y)) = 2y/(1 + y) for 0 ≤ y ≤ 1.
The density function of Y = X/(1 − X) is fY(y) = 2/(1 + y)^2 for 0 < y < 1
and fY(y) = 0 otherwise.
4.15 The sample space of the experiment is {(x, y) : 0 ≤ x, y ≤ 1}. Noting
that max(x, y) ≤ v if and only if x ≤ v and y ≤ v, it follows that the
random variable V takes on a value smaller than or equal to v if and
only if the randomly chosen point falls in the set A = {(x, y) : 0 ≤
x, y ≤ v}. Hence the probability P(V ≤ v) is equal to the area of the
set A and so
P(V ≤ v) = v^2 for 0 ≤ v ≤ 1.
Hence the density function of V is fV(v) = 2v for 0 < v < 1 and
fV(v) = 0 otherwise. Noting that min(x, y) > w if and only if x > w
and y > w, the probability P(W > w) can be calculated as the area
of the set B = {(x, y) : w < x, y < 1}, which is (1 − w)^2. This gives
P(W ≤ w) = 1 − (1 − w)^2 for 0 ≤ w ≤ 1.
Hence the density function of W is given by fW(w) = 2(1 − w) for
0 < w < 1 and fW(w) = 0 otherwise.
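Both distribution functions can be checked by simulation; a sketch with a fixed seed and an arbitrary test value v:

```python
import random

random.seed(1)
n = 200_000
v = 0.7  # arbitrary test value in (0, 1)
pairs = [(random.random(), random.random()) for _ in range(n)]
# P(max <= v) should be close to v^2; P(min <= v) close to 1 - (1-v)^2
p_max = sum(1 for x, y in pairs if max(x, y) <= v) / n
p_min = sum(1 for x, y in pairs if min(x, y) <= v) / n
```

The empirical frequencies agree with v^2 = 0.49 and 1 − (1 − v)^2 = 0.91 to within Monte Carlo noise.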
4.16 Drawing a figure and using the symmetry in the model, we can conclude that the height X above the ground can be modeled as X =
15 + 15 cos(Θ), where Θ is a randomly chosen angle between 0 and π.
Then,
P(X ≤ x) = P(15 + 15 cos(Θ) ≤ x) = P(Θ ≥ arccos(x/15 − 1))
= 1 − (1/π) arccos(x/15 − 1) for 0 ≤ x ≤ 30,
where the last equality uses the fact that the randomly chosen angle
Θ has the density 1/π on (0, π). In particular, P(X ≤ 22.5) = 2/3 and
P(X ≤ 7.5) = 1/3. The derivative of arccos(z) is −1/√(1 − z^2). Hence the
density function of X is given by
f(x) = 1/(15π √(1 − (x/15 − 1)^2)) for 0 < x < 30
and f(x) = 0 otherwise.
4.17 We have E(X) = (1/625) ∫_50^75 x(x − 50) dx + (1/625) ∫_75^100 x(100 − x) dx = 75
hundred hours.
4.18 By partial integration,
E(X) = ∫_0^0.25 x π√2 cos(πx) dx = [√2 x sin(πx)]_0^0.25 + [(√2/π) cos(πx)]_0^0.25.
This gives E(X) = 0.1182 seconds.
4.19 The density function of the distance X thrown by Big John is f(x) =
(x − 50)/600 for 50 < x < 80, f(x) = (90 − x)/200 for 80 < x < 90, and f(x) = 0
otherwise. This gives E(X) = ∫_50^90 x f(x) dx = 73 1/3 meters.
4.20 The expected values of the random variables in the Problems 4.2, 4.4,
and 4.6 are 53/96, 4, and 120.
4.21 Let the random variable X be the distance from the randomly chosen
point to the base of the triangle. Using a little geometry, it follows that
P(X > x) is equal to the ratio of (1/2)[(h − x) × (h − x)b/h] and (1/2)h × b.
Differentiation shows that the density function of X is 2(h − x)/h^2 for
0 < x < h and 0 otherwise. Then
E(X) = ∫_0^h x × 2(h − x)/h^2 dx = (1/3)h.
4.22 The expected value of the price paid for the collector's item is E(X) =
∫_0^1 x × 90(x^8 − x^9) dx = 9/11.
4.23 Let X be the distance from the point to the origin. Then
P(X ≤ a) = (1/4)πa^2 for 0 ≤ a ≤ 1,
P(X ≤ a) = (1/4)πa^2 − 2 ∫_1^a √(a^2 − x^2) dx = (1/4)πa^2 − a^2 arccos(1/a) + √(a^2 − 1)
for 1 < a ≤ √2. The density function f(x) of X satisfies
f(x) = (1/2)πx for 0 < x < 1,
f(x) = (1/2)πx − 2x arccos(1/x) for 1 < x < √2.
Numerical integration leads to E(X) = ∫_0^√2 x f(x) dx = 0.765.
4.24 The range of the random variable X is the interval (0, 0.5). Let A be
the subset of points from the unit square for which the distance to the
closest side of the square is larger than x, where 0 < x < 0.5. Then
A is a square whose sides have the length 1 − 2x and so the area of A
is (1 − 2x)^2. It now follows that
P(X ≤ x) = 1 − (1 − 2x)^2 for 0 ≤ x ≤ 0.5.
The probability density f(x) of X is given by f(x) = 4(1 − 2x) for
0 < x < 0.5 and f(x) = 0 otherwise. The expected value of X is
∫_0^0.5 x × 4(1 − 2x) dx = 0.1667.
4.25 (a) By P(A | B) = P(AB)/P(B), we get
P(X ≤ x | X > a) = ∫_a^x f(v) dv / P(X > a).
Thus the conditional density of X given that X > a is f(x)/P(X > a)
for x > a and 0 otherwise. In the same way, the conditional density
of X given that X ≤ a is f(x)/P(X ≤ a) for x < a and 0 otherwise.
(b) By E(X | X > a) = (1/(1 − a)) ∫_a^1 x dx, we get E(X | X > a) = (1/2)(1 + a).
(c) E(X | X > a) = a + 1/λ and E(X | X ≤ a) = (1 − e^{−λa} − λa e^{−λa})/(λ(1 − e^{−λa})) for any
a > 0.
4.26 (a) By P(X > x) = ∫_x^∞ f(y) dy, it follows that
∫_0^∞ P(X > x) dx = ∫_{x=0}^∞ dx ∫_{y=x}^∞ f(y) dy.
By interchanging the order of integration, the last integral becomes
∫_{y=0}^∞ f(y) dy ∫_{x=0}^y dx = ∫_0^∞ y f(y) dy = E(X),
proving the desired result.
(b) Let the random variable X be the smallest of n independent random numbers from (0, 1). Then P(X > x) = (1 − x)^n for 0 ≤ x ≤ 1
and P(X > x) = 0 for x > 1. This gives E(X) = ∫_0^1 (1 − x)^n dx = 1/(n + 1).
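Part (b) also gives a quick numerical test of the identity in part (a): integrating P(X > x) = (1 − x)^n over (0, 1) should return 1/(n + 1). A sketch with an arbitrary n:

```python
n = 5            # arbitrary number of uniform random numbers
N = 100_000      # trapezoidal grid size
h = 1.0 / N
# integrate P(X > x) = (1 - x)^n over (0, 1); endpoints are 1 and 0
integral = h * (0.5 * (1.0 + 0.0) + sum((1.0 - i * h) ** n for i in range(1, N)))
expected = 1.0 / (n + 1)
```

The quadrature reproduces E(X) = 1/6 for n = 5 essentially exactly.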
4.27 (a) The function g(x) = 1/x is convex for x > 0. Therefore, by Jensen's
inequality, E(1/X) ≥ 1/E(X). Since E(X) = 3/5, we get E(1/X) ≥ 5/3.
(b) E(1/X) = ∫_0^1 (1/x) 12x^2(1 − x) dx = 2 and E(1/X^2) = ∫_0^1 (1/x^2) 12x^2(1 − x) dx = 6, and so σ(1/X) = √2.
4.28 The random variable U has density function f(u) = 1 for 0 < u < 1
and f(u) = 0 otherwise. By the substitution rule, we find for V = √U
and W = U^2 that
E(V) = ∫_0^1 √u du = 2/3 and E(V^2) = ∫_0^1 u du = 1/2,
E(W) = ∫_0^1 u^2 du = 1/3 and E(W^2) = ∫_0^1 u^4 du = 1/5.
Hence, the expected value and standard deviation of V are given by 2/3
and √(1/2 − 4/9) = 0.2357. The expected value and standard deviation of
W are given by 1/3 and √(1/5 − 1/9) = 0.2981.
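The four moments can be reproduced by numerical integration on a midpoint grid (no randomness involved):

```python
import math

N = 100_000
us = [(i + 0.5) / N for i in range(N)]        # midpoint grid on (0, 1)
ev  = sum(math.sqrt(u) for u in us) / N       # E(sqrt(U))  ~ 2/3
eu  = sum(us) / N                             # E(U) = E(V^2) ~ 1/2
ew  = sum(u * u for u in us) / N              # E(U^2) = E(W) ~ 1/3
ew2 = sum(u ** 4 for u in us) / N             # E(U^4) = E(W^2) ~ 1/5
sd_v = math.sqrt(eu - ev ** 2)                # ~ 0.2357
sd_w = math.sqrt(ew2 - ew ** 2)               # ~ 0.2981
```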
4.29 Let Y be the amount paid by the supplement policy. Then Y = g(X),
where g(x) is min(500, x − 450) for x > 450 and 0 otherwise. By the
substitution rule,
E(Y) = ∫_450^950 (x − 450) (1/1250) dx + 500 ∫_950^1250 (1/1250) dx = 220 dollars.
4.30 Let the random variable X be the amount of waste (in thousands of gallons) produced during a week and Y be the total costs incurred during
a week. Then the random variable Y can be represented as Y = g(X),
where the function g(x) is given by
g(x) = 1.25 + 0.5x for 0 < x < 0.9,
g(x) = 1.25 + 0.5 × 0.9 + 5 + 10(x − 0.9) for 0.9 < x < 1,
and g(x) = 0 otherwise. By the substitution rule, the expected value
of the weekly costs is given by
E(Y) = 105 ∫_0^1 g(x) x^4 (1 − x)^2 dx = 1.6975.
To find the standard deviation of the weekly costs, we first calculate
E(Y^2) = 105 ∫_0^1 g^2(x) x^4 (1 − x)^2 dx = 3.6204.
Thus the standard deviation of the weekly costs is √(E(Y^2) − E^2(Y)) =
0.8597.
4.31 The net profit is Y = g(X), where g(x) = 2x for 0 ≤ x ≤ 250 and
g(x) = 2 × 250 − 0.5(x − 250) for x > 250. By the substitution rule,
E(Y) = ∫_0^250 2x f(x) dx + ∫_250^∞ [500 − 0.5(x − 250)] f(x) dx = 194.10
dollars. The probability of a stockout is P(X > 250) = 1 − ∫_0^250 f(x) dx =
0.0404.
4.32 The insurance payment (in thousands of dollars) is a so-called mixed
random variable S, where
S = 20 − 1 with probability 0.01,
S = max(0, X − 1) with probability 0.02,
S = 0 with probability 0.97,
where X represents the cost of a repairable damage. The random
variable X has the density function f(x) = (1/200)(20 − x) for 0 < x < 20.
Thus,
E(S) = 0.01 × 19 + 0.02 [∫_0^1 0 × f(x) dx + ∫_1^20 (x − 1) f(x) dx] + 0.97 × 0
= 0.19 + 0.02 ∫_1^20 (x − 1)(20 − x)/200 dx = 0.19 + 0.11432 = 0.30432.
The expected value of the insurance payment is 304.32 dollars.
4.33 Let U be the random point in (0, 1) and define g(u) = 1 − u if u < s
and g(u) = u if u ≥ s. Then L = g(U) is the length of the subinterval
covering the point s. By the substitution rule,
E(L) = ∫_0^s (1 − u) du + ∫_s^1 u du = s − s^2 + 1/2.
53
4.34 In the Problems 4.2, 4.4, and 4.6, E(X) has the values 96
, 4, and 120
2
and the second moment E(X ) has the values 0.3833, 32, and 14404.2
Therefore the standard deviation σ(X) has the values 0.2802, 4, and
2.0412.
4.35 The area of the circle is Y = πX^2, where X has the density function
f(x) = 1 for 0 < x < 1. By the substitution rule,
E(Y) = ∫_0^1 πx^2 dx = π/3 and E(Y^2) = ∫_0^1 π^2 x^4 dx = π^2/5.
The expected value and the standard deviation of Y are π/3 and 2π/(3√5).
4.36 Let the random variable X be the distance from the center of the
sphere to the point Q. Using the fact that the volume of a sphere with
radius r is (4/3)πr^3, we get
P(X ≤ x) = x^3/r^3 for 0 ≤ x ≤ r.
Hence X has the density function f(x) = 3x^2/r^3 for 0 < x < r and
f(x) = 0 otherwise. The expected value and the standard deviation
of the random variable X are (3/4)r and √(3/80) r.
4.37 (a) E[(X − c)^2] = E(X^2) − 2cE(X) + c^2 and is minimal for c = E(X),
as follows by differentiation. The minimal value is the variance of X.
(b) E(|X − c|) = ∫_{−∞}^c (c − x) f(x) dx + ∫_c^∞ (x − c) f(x) dx. The derivative
of E(|X − c|) with respect to c is 2P(X ≤ c) − 1. The minimizing value of c satisfies
P(X ≤ c) = 1/2 and is the median of X.
4.38 The height above the ground is given by the random variable X =
15 + 15 cos(Θ), where Θ is uniformly distributed on (0, π). Using the
substitution rule and the relation cos^2(x) = (1/2)(cos(2x) + 1), we get
E(X) = (1/π) ∫_0^π [15 + 15 cos(x)] dx = 15,
E(X^2) = (1/π) ∫_0^π 225[1 + 2 cos(x) + cos^2(x)] dx
= 337.5 + (225/(2π)) ∫_0^π cos(2x) dx = 337.5.
The standard deviation of X is σ(X) = √(337.5 − 15^2) = 10.61 meters.
Note: An alternative method to calculate E(X^2) is to use the density
function h(x) of X, see Problem 4.16. However, it seems that numerical integration must be used to obtain the value of ∫_0^30 x^2 h(x) dx (as a
sanity check, the numerical computation of the integral also gives the
answer 337.5). It is much simpler to use the substitution rule to get
this answer.
4.39 Let Y be the amount of demand that cannot be satisfied from stock
on hand and define g(x) = x − s for x > s and g(x) = 0 otherwise. By
the substitution rule,
E(Y) = ∫_s^∞ (x − s) λe^{−λx} dx and E(Y^2) = ∫_s^∞ (x − s)^2 λe^{−λx} dx.
These integrals can be evaluated as
E(Y) = (1/λ) e^{−λs} and E(Y^2) = (2/λ^2) e^{−λs}.
This leads to
E(Y) = (1/λ) e^{−λs} and σ(Y) = (1/λ)[e^{−λs}(2 − e^{−λs})]^{1/2}.
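A quick simulation check of E(Y) and σ(Y), using arbitrary illustrative values λ = 1 and s = 0.5:

```python
import math
import random

random.seed(7)
lam, s = 1.0, 0.5  # arbitrary illustrative parameters
n = 200_000
# Y = max(X - s, 0) with X exponential(lam)
ys = [max(random.expovariate(lam) - s, 0.0) for _ in range(n)]
mean_y = sum(ys) / n
var_y = sum(y * y for y in ys) / n - mean_y ** 2
exact_mean = math.exp(-lam * s) / lam
exact_sd = math.sqrt(math.exp(-lam * s) * (2 - math.exp(-lam * s))) / lam
```

The simulated mean and standard deviation match the closed forms to within Monte Carlo noise.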
4.40 (a) The expected value of X is given by
E(X) = ∫_β^∞ x (α/β)(β/x)^{α+1} dx = ∫_β^∞ αβ^α x^{−α} dx
= [αβ^α/(1 − α)] x^{−α+1} |_β^∞ = αβ/(α − 1),
provided that α > 1; otherwise E(X) = ∞. For α > 2,
E(X^2) = ∫_β^∞ x^2 (α/β)(β/x)^{α+1} dx = ∫_β^∞ αβ^α x^{−α+1} dx
= [αβ^α/(2 − α)] x^{−α+2} |_β^∞ = αβ^2/(α − 2).
For 0 < α ≤ 2, E(X^2) = ∞. Thus, for α > 2,
var(X) = αβ^2/(α − 2) − [αβ/(α − 1)]^2 = αβ^2/((α − 1)^2(α − 2)).
For any α > 0,
P(X ≤ x) = ∫_β^x (α/β)(β/y)^{α+1} dy = 1 − (β/x)^α for x > β.
Putting P(X ≤ x) = 0.5 gives for the median m the value
m = 2^{1/α} β.
(b) The mean of the income is 4,500 dollars and the median is 3,402
dollars. The percentage of the population with an income between 25
and 40 thousand dollars is 0.37%, as follows from
P(25 < X ≤ 40) = P(X ≤ 40) − P(X ≤ 25)
= (2.5/25)^{2.25} − (2.5/40)^{2.25} = 0.0037.
(c) The Pareto distribution shows rather well the way that a larger
portion of the wealth in a country is owned by a smaller percentage
of the people in that country. The explanation is that the Pareto
density f (x) decreases from x = β onwards and has a long tail. Thus,
most realizations of a Pareto distributed random variable tend to be
small but occasionally the realizations will be very large. This is quite
typical for income distributions. Also, the Pareto distribution has the
property that the mean is always larger than the median.
4.41 The age of the bulb upon replacement is Y = g(X), where g(x) = x
for x ≤ 10 and g(x) = 10 for x > 10. Then E(Y) = ∫_2^10 x (1/10) dx +
∫_10^12 10 (1/10) dx and E(Y^2) = ∫_2^10 x^2 (1/10) dx + ∫_10^12 100 (1/10) dx. This leads to
E(Y) = 6.8 and σ(Y) = 2.613.
4.42 Let the random variable X be the thickness of a sheet of steel and Y
be the thickness of a non-scrapped sheet of steel. Then
P(Y > y) = P(X > y | X > 125) for 125 ≤ y ≤ 150.
The random variable X is uniformly distributed on (120, 150) and so
P(X > x) = (150 − x)/30 for 120 ≤ x ≤ 150. This implies that
P(Y > y) = (150 − y)/25 for 125 ≤ y ≤ 150.
In other words, the random variable Y is uniformly distributed on
(125, 150). Hence the expected value and the standard deviation of a
non-scrapped sheet of steel are given by (125 + 150)/2 = 137.5 millimeters
and 25/√12 = 7.217 millimeters.
4.43 Since P(min(X, Y) > t) = P(X > t, Y > t) = P(X > t)P(Y > t), we
get
P(min(X, Y) > t) = e^{−αt} e^{−βt} = e^{−(α+β)t} for all t > 0,
and so min(X, Y) is exponentially distributed. Using this result and
the memoryless property of the exponential distribution, we have that
the time to failure of the reliability system is distributed as T_1 + T_2 + T_3,
where T_1, T_2 and T_3 are independent and exponentially distributed
with respective parameters 5λ, 4λ and 3λ. Thus
E(T_1 + T_2 + T_3) = 1/(5λ) + 1/(4λ) + 1/(3λ) = 47/(60λ),
σ(T_1 + T_2 + T_3) = [1/(25λ^2) + 1/(16λ^2) + 1/(9λ^2)]^{0.5} = √769/(60λ).
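The first claim of Problem 4.43 — that min(X, Y) of two independent exponentials is again exponential with parameter α + β — is easy to verify by simulation; α and β below are arbitrary illustrative rates:

```python
import math
import random

random.seed(3)
alpha, beta = 1.0, 2.0  # arbitrary illustrative rates
t = 0.4                 # arbitrary test point
n = 200_000
survive = sum(1 for _ in range(n)
              if min(random.expovariate(alpha), random.expovariate(beta)) > t)
empirical = survive / n
exact = math.exp(-(alpha + beta) * t)  # P(min(X, Y) > t)
```

The empirical survival frequency matches e^{−(α+β)t} to within Monte Carlo noise.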
4.44 By the memoryless property of the exponential distribution, the time
from three o’clock in the afternoon until the next departure of a limousine has an exponential distribution with an expected value of 20 minutes. Using the fact that the standard deviation of an exponential
density is the same as the expected value of the density, the expected
value and the standard deviation of your waiting time are both equal
to 20 minutes.
4.45 Since the sojourn time of each bus is exactly half an hour, the number
of buses on the parking lot at 4 p.m. is the number of buses arriving between 3:30 p.m. and 4 p.m. Taking the hour as unit of time, the buses
arrive according to a Poisson process with rate λ = 4/3. Using the memoryless property of the Poisson process, the number of buses arriving
between 3:30 p.m. and 4 p.m. is Poisson distributed with expected value
λ × 1/2 = 2/3.
4.46 Take the hour as unit of time. The average number of arrivals per hour
between 6 p.m. and 10 p.m. is 1.2. The random variable X measuring
the time from 6 p.m. until the first arrival after 6 p.m. is exponentially
distributed with parameter λ = 1.2. Hence the expected value of X
is 1/1.2 = 10/12 hours or 50 minutes. The median of X follows by solving
1 − e^{−1.2x} = 0.5 and is equal to −ln(0.5)/1.2 = 0.5776 hours or 34.66
minutes. The probability that the first call occurs between 6:20 p.m.
and 6:45 p.m. is given by
P(1/3 ≤ X ≤ 3/4) = e^{−1.2×1/3} − e^{−1.2×3/4} = 0.2638.
Let the random variable Y be the time measured from 7 p.m. until the
first arrival after 7 p.m. The probability of no arrival between 7 p.m.
and 7:20 p.m. and at least one arrival between 7:20 p.m. and 7:45 p.m.
is P(1/3 < Y ≤ 3/4). By the memoryless property of the exponential
distribution, the random variable Y has the same exponential distribution as X. Hence the probability P(1/3 < Y ≤ 3/4) is also equal to
e^{−1.2×1/3} − e^{−1.2×3/4} = 0.2638.
4.47 The probability that the time between the passings of two consecutive
cars is more than c seconds is given by p = ∫_c^∞ λe^{−λt} dt = e^{−λc}. By the
lack of memory of the exponential distribution, p = e^{−λc} also gives the
probability that no car comes around the corner during the c seconds
measured from the moment you arrive at the road. The number of
passing cars before you can cross the road has the shifted geometric
distribution {(1 − p)^k p, k = 0, 1, . . .}.
4.48 By the lack of memory of the exponential distribution, the remaining
washing time of the car being washed in the station has the same exponential density as a newly started washing time. Hence the probability
that the car in the washing station will need no more than five more
minutes is equal to
∫_0^5 (1/15) e^{−t/15} dt = 1 − e^{−5/15} = 0.2835.
The probability that you have to wait more than 20 minutes before
your car can be washed is equal to P(X_1 + X_2 > 20), where X_1 is
the remaining service time of the car in service when you arrive and
X_2 is the service time of the other car. The random variables X_1 and
X_2 are independent. By the memoryless property of the exponential
distribution, X_1 has the same exponential distribution as X_2. The
random variable X_1 + X_2 has an Erlang-2 distribution and the sought
probability is given by
P(X_1 + X_2 > 20) = e^{−20/15} + (20/15) e^{−20/15} = 0.6151.
Alternatively, this answer can be seen from Rule 4.3 by noting that
P(X_1 + X_2 > 20) is the probability of at most one service completion
in the 20 minutes.
4.49 The probability of having a replacement because of a system failure is
given by
Σ_{n=0}^∞ P(nT < X ≤ (n + 1)T − a) = Σ_{n=0}^∞ [e^{−µnT} − e^{−µ[(n+1)T−a]}].
This probability is equal to (1 − e^{−µ(T−a)})/(1 − e^{−µT}). The expected
time between two replacements is
Σ_{n=1}^∞ nT P((n − 1)T < X ≤ nT) = T/(1 − e^{−µT}).
4.50 The probability that the closest integer to the random observation is
odd is equal to
Σ_{k=0}^∞ P(2k + 1/2 < X < 2k + 3/2) = Σ_{k=0}^∞ ∫_{2k+1/2}^{2k+3/2} e^{−x} dx
= Σ_{k=0}^∞ [e^{−(2k+1/2)} − e^{−(2k+3/2)}] = (1 − e^{−1}) e^{−1/2}/(1 − e^{−2}) = e^{−1/2}/(1 + e^{−1}).
The conditional probability that the closest integer to the random
observation is odd given that it is larger than the even integer r is
equal to
Σ_{k=0}^∞ P(2k + 1/2 < X < 2k + 3/2 | X > r)
= (1/P(X > r)) Σ_{k=0}^∞ P(2k + 1/2 < X < 2k + 3/2, X > r)
= e^r Σ_{k=r/2}^∞ ∫_{2k+1/2}^{2k+3/2} e^{−x} dx = e^r Σ_{k=r/2}^∞ [e^{−(2k+1/2)} − e^{−(2k+3/2)}].
Since Σ_{k=r/2}^∞ [e^{−(2k+1/2)} − e^{−(2k+3/2)}] = e^{−r} Σ_{l=0}^∞ [e^{−(2l+1/2)} − e^{−(2l+3/2)}],
the conditional probability that the closest integer to the random observation is odd given that it is larger than r is equal to
Σ_{l=0}^∞ [e^{−(2l+1/2)} − e^{−(2l+3/2)}] = (1 − e^{−1}) e^{−1/2}/(1 − e^{−2}) = e^{−1/2}/(1 + e^{−1}).
The conditional probability is the same as the unconditional probability that the closest integer to the random observation from the
exponential density is odd. This result can also be explained from the
memoryless property of the exponential distribution.
4.51 Your win probability is the probability of having exactly one signal in
(s, T). This probability is e^{−λ(T−s)} λ(T − s), by the memoryless property of the Poisson process. Putting the derivative of this expression
equal to zero, we get that the optimal value of s is T − 1/λ. The maximal
win probability is e^{−1}.
4.52 Let N(t) be the number of events to occur in (0, t). Then,
P(N(a) = k | N(a + b) = n) = P(N(a) = k, N(a + b) − N(a) = n − k) / P(N(a + b) = n)
for any 0 ≤ k ≤ n. We have P(N(a) = k, N(a + b) − N(a) = n − k) =
P(N(a) = k) P(N(a + b) − N(a) = n − k), by the independence of
N(a) and N(a + b) − N(a). Thus, for k = 0, 1, . . . , n,
P(N(a) = k | N(a + b) = n) = [e^{−λa}(λa)^k/k! × e^{−λb}(λb)^{n−k}/(n − k)!] / [e^{−λ(a+b)}(λ(a + b))^n/n!]
= (n choose k) (a/(a + b))^k (b/(a + b))^{n−k}.
In view of the characteristic properties of the Poisson process, it is not
surprising that the conditional distribution of N(a) is the binomial
distribution with parameters n and a/(a + b).
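The algebraic identity can be confirmed numerically for arbitrary illustrative values of λ, a, b, n and k:

```python
import math

lam, a, b = 2.0, 1.5, 2.5  # arbitrary illustrative values
n, k = 6, 2

def pois(mean, x):
    # Poisson probability P(N = x) for a Poisson(mean) random variable
    return math.exp(-mean) * mean ** x / math.factorial(x)

# left side: ratio of Poisson probabilities from the conditioning argument
lhs = pois(lam * a, k) * pois(lam * b, n - k) / pois(lam * (a + b), n)
# right side: binomial(n, a/(a+b)) probability
p = a / (a + b)
rhs = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
```

The two sides agree to machine precision, and λ cancels, as the derivation shows.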
4.53 Take the minute as time unit. Let λ = 8/60 and T = 10. The probability
that the ferry will leave with two cars is 1 − e^{−λT} = 0.7364. Let the
generic variable X be exponentially distributed with an expected value
of 1/λ = 7.5 minutes. The expected value of the time until the ferry
leaves is
1/λ + E[min(X, T)] = 1/λ + ∫_0^T t λe^{−λt} dt + T ∫_T^∞ λe^{−λt} dt
minutes. This leads to an expected value of 1/λ + (1/λ)(1 − e^{−λT}) = 13.02
minutes.
4.54 Noting that major cracks on the highway occur according to a Poisson
process with rate 1/10, it follows that the probability that there are no
major cracks on a specific 15-mile stretch of the highway is e^{−15/10} =
0.2231 and the probability of two or more major cracks on that part
of the highway is 1 − e^{−15/10} − (15/10)e^{−15/10} = 0.4422.
4.55 In view of Rule 4.3, we can think of failures occurring according to a
Poisson process with a rate of 4 per 1,000 hours. The probability of
no more than five failures during 1,000 hours is given by the Poisson
probability
Σ_{k=0}^5 e^{−4} 4^k/k! = 0.7851.
The smallest value of n such that Σ_{k=0}^n e^{−4} 4^k/k! ≥ 0.95 is n = 8.
4.56 The probability of no bus arriving during a wait of t minutes at the
bus stop is e−t/10 . Putting e−t/10 = 0.05 gives t = 29.96. You must
leave home no later than 7:10 a.m.
4.57 (a) Since the number of goals in the match is Poisson distributed with
an expected value of 90 × (1/30) = 3, the answer is 1 − Σ_{k=0}^2 e^{−3} 3^k/k! =
0.5768.
(b) The numbers of goals in disjoint time intervals are independent of
each other and so the answer is e^{−1.5} (1.5^2/2!) × e^{−1.5} (1.5/1!) = 0.0840.
(c) Let, for k = 0, 1, . . .,
a_k = e^{−3×(12/25)} (3 × (12/25))^k/k! and b_k = e^{−3×(13/25)} (3 × (13/25))^k/k!.
Then, by the results of Rule 3.12, we get that the probability of a
draw is equal to Σ_{k=0}^∞ a_k × b_k = 0.2425 and the probability of a win
for team A is equal to Σ_{k=1}^∞ a_k × Σ_{n=0}^{k−1} b_n = 0.3524.
4.58 The probability of having no other emergency unit within a distance
r of the incident is given by the probability of no emergency unit in a
circle with radius r around the point of the incident. The probability
of no Poisson event in a region with area πr^2 is e^{−απr^2} and so the
desired probability is 1 − e^{−απr^2}.
4.59 The answer is (1 − Φ(20/16)) × 100% = 10.56%.
4.60 The solution of 1−Φ(x) = 0.05 is given by the percentile z0.95 = 1.6449.
Thus the cholesterol level of 5.2 + 1.6449 × 0.65 = 6.27 mmol/L is
exceeded by 5% of the population.
4.61 An estimate for the standard deviation σ of the demand follows from
the formula 50 + σz_{0.95} = 75, where z_{0.95} = 1.6449 is the 95% percentile
of the standard normal distribution. This gives the estimate σ = 25/1.6449 = 15.2.
4.62 The proportion of euro coins that are not accepted by the vending
machine is
Φ((22.90 − 23.25)/0.10) + 1 − Φ((23.60 − 23.25)/0.10) = 2[1 − Φ(3.5)] = 0.00047.
4.63 By P(X < 20) = P(X ≤ 20) = P((X − 25)/2.5 ≤ (20 − 25)/2.5) = Φ(−2), we
have P(X < 20) = 0.0228. Finding the standard deviation σ of the
thickness of the coating so that P(X < 20) = 0.01 translates into
solving σ from Φ((20 − 25)/σ) = 0.01. The 0.01th percentile of the N(0, 1)
distribution is −2.3263, and so −5/σ = −2.3263, or σ = 2.149.
4.64 The proportion of the mill's output that can be used by the customer
is equal to
Φ((10.15 − 10)/0.07) − Φ((9.85 − 10)/0.07) = 0.9839.
4.65 We have P (|X − µ| > kσ) = P (|Z| > k) = P (Z ≤ −k) + P (Z ≥ k),
where Z is N (0, 1) distributed. Since P (Z ≥ k) = P (Z ≤ −k) and
P (Z ≥ k) = 1 − Φ(k), the sought result follows.
4.66 Let the random variable Y = aX + b. To evaluate P(Y ≤ y), distinguish between the two cases a ≥ 0 and a < 0. For the case that
a ≥ 0,
P(Y ≤ y) = P(X ≤ (y − b)/a) = P((X − µ)/σ ≤ (y − b − aµ)/(aσ))
= Φ((y − b − aµ)/(aσ)),
showing that Y is N(aµ + b, a^2 σ^2) distributed. For the case that a < 0,
P(Y ≤ y) = P(X ≥ (y − b)/a) = P((X − µ)/σ ≥ (y − b − aµ)/(aσ))
= 1 − Φ((y − b − aµ)/(aσ)) = 1 − Φ((−y + b + aµ)/(|a|σ)).
Using the fact that Φ(−x) = 1 − Φ(x) for any x > 0, it next follows
that P(Y ≤ y) = Φ((y − b − aµ)/(|a|σ)). In other words, Y is N(aµ + b, a^2 σ^2)
distributed.
4.67 The number of heads in 10,000 tosses of a fair coin is approximately
normally distributed with expected value 5,000 and standard deviation
50. The outcome of 5,250 heads lies five standard deviations above
the expected value. Without doing any further calculations we can
conclude that the claim is highly implausible (1 − Φ(5) = 2.87 × 10−7 ).
4.68 (a) For any z ≥ 0, we have P(|Z| ≤ z) = P(−z ≤ Z ≤ z) =
Φ(z) − Φ(−z). Differentiation gives that |Z| has the probability density
function
(2/√(2π)) e^{−z^2/2} for z > 0.
Using the change of variable v = z^2/2, we get
E(|Z|) = ∫_0^∞ z (2/√(2π)) e^{−z^2/2} dz = (2/√(2π)) ∫_0^∞ e^{−v} dv = √(2/π).
Also, noting that E(|Z|^2) = E(Z^2) and using the fact that E(Z^2) = 1
for the N(0, 1) distributed random variable Z, we get E(|Z|^2) = 1.
This gives σ^2(|Z|) = 1 − 2/π and so σ(|Z|) = √(1 − 2/π).
(b) Let V = max(Z − c, 0). Since E(V) = ∫_0^∞ P(V > v) dv (see
Problem 4.26), we have
E(V) = ∫_0^∞ [1 − Φ(v + c)] dv = ∫_c^∞ [1 − Φ(x)] dx.
By partial integration, we next get
E(V) = x[1 − Φ(x)] |_c^∞ + (1/√(2π)) ∫_c^∞ x e^{−x^2/2} dx.
The integral ∫_c^∞ x e^{−x^2/2} dx is ∫_{c^2/2}^∞ e^{−y} dy = e^{−c^2/2}. Thus we get
E(V) = −c[1 − Φ(c)] + (1/√(2π)) e^{−c^2/2}.
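The closed form E(V) = −c[1 − Φ(c)] + (1/√(2π)) e^{−c^2/2} in part (b) can be checked against direct numerical integration of ∫_c^∞ [1 − Φ(x)] dx; c = 0.5 below is an arbitrary test value:

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

c = 0.5                  # arbitrary test value
upper, N = 8.0, 100_000  # truncation point and trapezoidal grid size
h = (upper - c) / N
integral = h * (0.5 * ((1 - Phi(c)) + (1 - Phi(upper)))
                + sum(1 - Phi(c + i * h) for i in range(1, N)))
closed = -c * (1 - Phi(c)) + math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
```

Truncating at 8 standard deviations loses only a negligible tail, and the two values agree to about six decimal places.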
√
4.69 Since X − Y is N (0, 2σ 2 ) distributed, (X − Y )/(σ 2)√ispN (0, 1) distributed. Thus, using Problem 4.68, E(|X − Y |) = σ 2 2/π. Also,
E(X + Y ) = 2µ. The formulas for E(|X − Y |) and E(X + Y ) give
two equations in E[max(X, Y )] and E[min(X, Y )], yielding the sought
result.
q
Note: max(X, Y ) and min(X, Y ) have each standard deviation σ 1 − π1 .
4.70 The random variable D_n can be represented as
D_n = |X_1 + · · · + X_n|,
where the random variable X_i is equal to 1 if the ith step of the
drunkard goes to the right and is otherwise equal to −1. The random
variables X_1, . . . , X_n are independent and have the same distribution
with expected value µ = 0 and standard deviation σ = 1. The central
limit theorem now tells us that X_1 + · · · + X_n is approximately normally
distributed with expected value 0 and standard deviation √n for n
large. Thus,
P(D_n ≤ x) ≈ Φ(x/√n) − Φ(−x/√n) for x > 0.
In Problem 4.68, the expected value and the standard deviation of V =
|X| are given for a standard normally distributed random variable X.
Using this result and the fact that (X_1 + · · · + X_n)/√n is approximately
N(0, 1) distributed, the approximations for the expected value and the
standard deviation of D_n follow.
4.71 Let X_1 and X_2 be the two measurement errors. Since X_1 and X_2 are
independent, (1/2)(X_1 + X_2) is normally distributed with expected value
0 and standard deviation (1/2)√(0.006^2 l^2 + 0.004^2 l^2) = l√52/2,000. The
sought probability is
P((1/2)|X_1 + X_2| ≤ 0.005l) = P(−0.01l ≤ X_1 + X_2 ≤ 0.01l)
= Φ(20/√52) − Φ(−20/√52) = 0.9945.
4.72 The desired probability is P(|X_1 − X_2| ≤ a). Since the random variables X_1 and X_2 are independent, the random variable X_1 − X_2 is
normally distributed with expected value µ = µ_1 − µ_2 and standard
deviation σ = √(σ_1^2 + σ_2^2). It now follows that
P(|X_1 − X_2| ≤ a) = P(−a ≤ X_1 − X_2 ≤ a)
= P((−a − µ)/σ ≤ (X_1 − X_2 − µ)/σ ≤ (a − µ)/σ)
= Φ((a − (µ_1 − µ_2))/√(σ_1^2 + σ_2^2)) − Φ((−a − (µ_1 − µ_2))/√(σ_1^2 + σ_2^2)).
4.73 (a) The profit of Joe and his brother after 52 weeks is Σ_{i=1}^52 X_i, where
the X_i are independent with P(X_i = 10) = 1/2 and P(X_i = −5) = 1/2.
The expected value and the standard deviation of the X_i are 2.5 and
√(62.5 − 2.5^2) = 7.5 dollars. The sought probability is
P(Σ_{i=1}^52 X_i ≥ 100) ≈ 1 − Φ((100 − 52 × 2.5)/(7.5√52)) = 0.7105.
(b) Let X_i be the score of the ith roll. Then
P(Σ_{i=1}^80 X_i ≤ 300) ≈ Φ((300 − 80 × 3.5)/(1.7078√80)) = 0.9048.
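The normal-approximation arithmetic in part (a) can be reproduced with math.erf:

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sigma, n, target = 2.5, 7.5, 52, 100
z = (target - n * mu) / (sigma * math.sqrt(n))
p = 1 - Phi(z)  # probability of a profit of at least 100 dollars
```

Here z ≈ −0.555, giving p ≈ 0.7105 as stated.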
4.74 The random variable Y_n = (1/n)(X_1 + · · · + X_n) − µ has expected value 0
and variance σ^2/n. Using the central limit theorem, Y_n is approximately
N(0, σ^2/n) distributed for n large. Since P(|Y_n| > c) = P(Y_n < −c) +
P(Y_n > c), we have
P(|Y_n| > c) ≈ Φ(−c√n/σ) + 1 − Φ(c√n/σ) = 2[1 − Φ(c√n/σ)].
4.75 (a) The number of sixes in one throw of 6r dice is distributed as the
binomial random variable S_{6r} = Σ_{k=1}^{6r} X_k, where the X_k are independent 0−1 variables with expected value µ = 1/6 and standard deviation
σ = (1/6)√5. We have
P(S_{6r} ≥ r) = P((S_{6r} − 6rµ)/(σ√(6r)) ≥ 0).
By the central limit theorem, this probability tends to 1 − Φ(0) = 1/2
as r → ∞.
(b) Let X_1, . . . , X_n be independent and Poisson distributed random
variables with expected value 1. The sum X_1 + · · · + X_n is Poisson
distributed with expected value n. Therefore,
P(X_1 + · · · + X_n ≤ n) = e^{−n}(1 + n/1! + n^2/2! + · · · + n^n/n!).
Next repeat the arguments in (a).
4.76 The probability is about 1 − Φ(2.828) = 0.0023.
4.77 The number of even numbers in any given drawing of the lotto 6/45
has a hypergeometric distribution with expected value µ = 132/45 and
standard deviation σ = (1/15)√299. By the central limit theorem, the
total number of even numbers that will be obtained in 52 drawings
of the lotto 6/45 is approximately normally distributed with expected
value 52µ = 152.533 and standard deviation σ√52 = 8.313. The
outcome 162 lies (162 − 152.533)/8.313 = 1.14 standard deviations
above the expected value. This outcome does not cast doubts on the
unbiased nature of the lotto drawings.
4.78 The probability distribution of the total rainfall (in millimeters) next
year in Amsterdam can be modeled by a normal distribution with expected value 799.5 and standard deviation 121.39. The sought probability is equal to
1 − Φ((1000 − 799.5)/121.39) = 0.0493.
4.79 The payoff per game has an expected value of µ = 12 dollars and
a standard deviation of σ = √(6,142 − 144) = 77.447 dollars. By the
central limit theorem, the probability of the casino losing money in a
given week is approximately
Φ(−(5,000 × 3)/(77.447 √5,000)) = 1 − Φ(2.739) = 3.1 × 10^{−3}.
4.80 The total number of bets lost by the casino is X_1 + · · · + X_n, where the
random variable X_i is equal to 1 if the casino loses the ith bet and X_i
is otherwise equal to 0. We have E(X_i) = p and σ(X_i) = √(p(1 − p)).
By the central limit theorem, X_1 + · · · + X_n has approximately a
normal distribution with expected value np and standard deviation
[p(1 − p)]^{1/2} √n for large n. The casino loses money to the player if and
only if the casino loses (1/2)n + 1 or more bets (assume that n is even).
The probability of this is approximately equal to 1 − Φ(β_n), where
β_n = ((1/2)n + 1 − np)/([p(1 − p)]^{1/2} √n).
The loss probability is about 0.1876, 0.0033, and 6.1 × 10^{−18} for n =
1,000, 10,000 and 100,000. Assuming one dollar is staked on each
bet, then for n plays the profit of the casino over the gamblers equals
W_n = n − 2(X_1 + · · · + X_n). We have
E(W_n) = n(1 − 2p) and σ(W_n) = 2[p(1 − p)]^{1/2} √n.
The random variable W_n is approximately normally distributed for
large n. The standard normal density has 99% of its probability mass
to the right of the point −2.326. This means that, with a probability of
approximately 99%, the profit of the casino over the player is greater
than n(1 − 2p) − 2.326 × 2[p(1 − p)]^{1/2} √n.
4.81 The premium c should be chosen such that P(rc − (X_1 + · · · + X_r) ≥
(1/10)rc) is at least 0.99, where X_i is the amount claimed by the ith
policyholder. This probability can be approximated by Φ(((9/10)rc −
rµ)/(σ√r)). Thus c should be chosen such that ((9/10)rc − rµ)/(σ√r)
equals the 0.99th percentile 2.326 of the standard normal distribution.
Therefore,
c ≈ (10/9)(µ + 2.326σ/√r).
4.82 The probability mass function of the number of copies of the appliance
to be used when an infinite supply would be available is a Poisson
distribution with expected value of 150/2 = 75. Suppose that Q copies
of the appliance are stored in the space ship. Let the exponentially
distributed random variable X_i be the lifetime (in days) of the ith copy
used. Then the probability of a shortage during the space mission
is P(X_1 + · · · + X_Q ≤ 150). The random variables X_1, . . . , X_Q are
independent and have an expected value of 1/λ days and a standard
deviation of 1/λ days, where λ = 1/2. By the central limit theorem,
P(X_1 + · · · + X_Q ≤ 150) = P((X_1 + · · · + X_Q − 2Q)/(2√Q) ≤ (150 − 2Q)/(2√Q))
≈ Φ((150 − 2Q)/(2√Q)).
The 0.001th percentile of the standard normal distribution is −3.0902.
Solving the equation (150 − 2Q)/(2√Q) = −3.0902 gives Q = 106.96 and so the
normal approximation suggests to store 107 units.
Note: The exact value of the required stock follows by finding the
smallest value of Q for which the Poisson probability Σ_{k>Q} e^{−75} 75^k/k! is
smaller than or equal to 10^{−3}. This gives Q = 103.
107
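The exact value in the note can be reproduced with a short computation; a sketch (the helper name is mine), using only the standard library and working in log space so that 75^k never overflows:

```python
import math

def poisson_tail(q, mu=75.0):
    # P(N > q) for N ~ Poisson(mu); the pmf is accumulated iteratively
    cdf, log_pmf = 0.0, -mu  # log P(N = 0) = -mu
    for k in range(q + 1):
        if k > 0:
            log_pmf += math.log(mu / k)
        cdf += math.exp(log_pmf)
    return 1.0 - cdf

# smallest Q with P(N > Q) <= 10**-3
Q = 0
while poisson_tail(Q) > 1e-3:
    Q += 1
print(Q)  # -> 103
```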
4.83 Let Xi be the amount of dollars the casino owner loses on the ith bet. Then the Xi are independent random variables with P(Xi = 10) = 18/37 and P(Xi = −5) = 19/37. Then, E(Xi) = 85/37 and σ(Xi) = (45/37)√38. The amount of dollars lost by the casino owner is Σ_{i=1}^{2,500} Xi and is approximately N(µ, σ²) distributed with µ = 2,500 × 85/37 and σ = 50 × (45/37)√38. The casino owner will lose more than 6,500 dollars with a probability of about

1 − Φ((6,500 − µ)/σ) = 0.0218.
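A numerical cross-check of the 0.0218 figure; a sketch (variable names are mine) using the standard normal CDF via math.erf:

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_win, p_lose = 18 / 37, 19 / 37      # casino gains 10 w.p. 18/37, loses 5 w.p. 19/37
mean = 10 * p_win - 5 * p_lose        # E(X_i) = 85/37
var = 100 * p_win + 25 * p_lose - mean ** 2
mu, sigma = 2500 * mean, math.sqrt(2500 * var)
prob = 1 - norm_cdf((6500 - mu) / sigma)
print(round(prob, 4))  # -> 0.0218
```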
4.84 By a change to polar coordinates x = r cos(θ) and y = r sin(θ) with dx dy = r dr dθ, it follows that

∫_{−∞}^∞ ∫_{−∞}^∞ e^{−½(x²+y²)} dx dy = ∫_0^{2π} dθ ∫_0^∞ e^{−½r²} r dr = 2π ∫_0^∞ re^{−½r²} dr = π ∫_0^∞ e^{−½r²} dr² = 2π,

using the fact that ∫_0^∞ e^{−½y} dy = 2. This proves the result I = √(2π).
Note: This result implies Γ(½) = √π. The change of variable t = ½x² in I = 2∫_0^∞ e^{−½x²} dx leads to √(2π) = (2/√2) ∫_0^∞ e^{−t} t^{−½} dt, showing that Γ(½) = √π.
4.85 Let Vn be the bankroll (in dollars) of the gambler after the nth bet. Then Vn = (1 − α)Vn−1 + αVn−1 Rn, where α = 0.05, V0 = 1,000 and the Ri are independent random variables with P(Ri = 1/4) = 19/37 and P(Ri = 2) = 18/37. Iterating this equality gives Vn = (1 − α + αR1) × · · · × (1 − α + αRn)V0. This leads to

ln(Vn/V0) = Σ_{i=1}^n ln(1 − α + αRi).

The random variables Xi = ln(1 − α + αRi) are independent. The expected value and the variance of these random variables are

µ = (19/37) ln(0.9625) + (18/37) ln(1.05)
σ² = (19/37) ln²(0.9625) + (18/37) ln²(1.05) − µ².

By the central limit theorem, the random variable ln(V100/V0) is approximately N(100µ, 100σ²) distributed (the gambler's bankroll after 100 bets is approximately lognormally distributed). The probability that the gambler will take home more than d dollars is P(Vn > V0 + d) = P(ln(Vn/V0) > ln(1 + d/V0)). This probability is approximately equal to

1 − Φ((ln(1 + d/V0) − 100µ)/(10σ))

and has the values 0.8276, 0.5050, 0.2581, and 0.0264 for d = 0, 500, 1,000, and 2,500.
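The four probabilities can be recomputed directly; a sketch (variable names are mine):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

alpha, V0 = 0.05, 1000.0
p_lose, p_win = 19 / 37, 18 / 37
g_lose = math.log(1 - alpha + alpha * 0.25)  # ln(0.9625)
g_win = math.log(1 - alpha + alpha * 2.0)    # ln(1.05)
mu = p_lose * g_lose + p_win * g_win
var = p_lose * g_lose ** 2 + p_win * g_win ** 2 - mu ** 2
probs = [1 - norm_cdf((math.log(1 + d / V0) - 100 * mu) / (10 * math.sqrt(var)))
         for d in (0, 500, 1000, 2500)]
print([round(p, 4) for p in probs])
```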
4.86 Denoting by the random variable Fn the factor by which the size of the population changes in the nth generation, the size Sn of the population after n generations is distributed as (F1 × · · · × Fn)s0. By the central limit theorem,

ln(Sn) = Σ_{i=1}^n ln(Fi) + ln(s0)

has approximately a normal distribution with expected value nµ1 + ln(s0) and standard deviation σ1√n for n large, where µ1 and σ1 are the expected value and the standard deviation of the ln(Fi). The numerical values of µ1 and σ1 are given by

µ1 = 0.5 ln(1.25) + 0.5 ln(0.8) = 0
σ1 = √(0.5[ln(1.25)]² + 0.5[ln(0.8)]²) = 0.22314.

Since ln(Sn) has approximately a normal distribution with expected value ln(s0) and standard deviation 0.22314√n, the probability distribution of Sn can be approximated by a lognormal distribution with parameters µ = ln(s0) and σ = 0.22314√n.
4.87 The distance can be modeled as √(X² + Y²), where X and Y are independent N(0, 1) random variables. The random variable X² + Y² has the chi-square density f(v) = ½e^{−½v}. We have

P(√(X² + Y²) ≤ r) = P(X² + Y² ≤ r²) = ∫_0^{r²} f(x) dx for r > 0.

Hence the probability density of the distance from the center of the target to the point of impact is 2rf(r²) = re^{−½r²} for r > 0. The expected value of the distance is ∫_0^∞ r²e^{−½r²} dr = ½√(2π). The density re^{−½r²} assumes its maximum value 1/√e at r = 1. Thus the mode of the distance is 1. The median of the distance is √(2 ln(2)), as follows by solving the equation ∫_0^x re^{−½r²} dr = 0.5.
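A quick simulation cross-check of the mean and the median of the distance; a sketch:

```python
import math, random

random.seed(1)
n = 200_000
# distances of n simulated hit points with independent N(0, 1) coordinates
dist = sorted(math.hypot(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n))
print(round(dist[n // 2], 2))   # sample median, near sqrt(2 ln 2) = 1.18
print(round(sum(dist) / n, 2))  # sample mean, near (1/2) sqrt(2 pi) = 1.25
```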
4.88 If the random variable X is positive, the result follows directly from Rule 4.6 by noting that 1/x is strictly decreasing for x > 0 and has −1/x² as its derivative. If X can take on both positive and negative values, we use first principles. Then, by P(Y ≤ y) = P(1/X ≤ y), we have

P(Y ≤ y) = P(X ≤ 0) + P(0 < X ≤ 1/y) for y > 0 and P(Y ≤ y) = P(1/y ≤ X ≤ 0) for y ≤ 0.

Differentiation gives that Y has the density function (1/y²)f(1/y). The desired result next follows by noting that (1/y²) × 1/(π(1 + 1/y²)) = 1/(π(1 + y²)).
4.89 The inverse of the function y = ½ms² is s = √(2y/m). We have ds/dy = 1/√(2ym). An application of Rule 4.6 gives that the probability density of the kinetic energy E = ½mS² is

(2/c³)√(y/(πm³)) e^{−y/(mc²)} for y > 0.
4.90 The conditions of Rule 4.6 are not satisfied for the random variable ln(|X|^a). Noting that 0 < |X|^a < 1 and so ln(|X|^a) < 0, we get by first principles that

P(ln(|X|^a) ≤ x) = P(ln(|X|) ≤ x/a) = P(|X| ≤ e^{x/a}) for x ≤ 0.

Therefore, using the fact that X is uniformly distributed on (−1, 1),

P(ln(|X|^a) ≤ x) = P(−e^{x/a} ≤ X ≤ e^{x/a}) = 2e^{x/a}/2 for x ≤ 0.

This shows that the probability density of ln(|X|^a) is (1/a)e^{x/a} for x < 0 and is 0 otherwise.
Note: For a < 0, the probability density of ln(|X|^a) is −(1/a)e^{x/a} for x > 0 and 0 otherwise. This result readily follows by noting that ln(|X|^a) = −ln(|X|^{−a}) for a < 0.
4.91 It follows from P(Y ≤ y) = P(X ≤ ln(y)/ln(10)) that

P(Y ≤ y) = ln(y)/ln(10) for 1 ≤ y ≤ 10.

Next, differentiation shows that Y has the density function 1/(ln(10)y) for 1 < y < 10.
4.92 (a) The Weibull distributed random variable has the cumulative probability distribution function F(x) = 1 − e^{−(λx)^α} for x ≥ 0. Letting u be a random number between 0 and 1, the solution of the equation F(x) = u gives the random observation x = (1/λ)[− ln(1 − u)]^{1/α} from the Weibull distribution. Since 1 − u is also a random number between 0 and 1, one can also take

x = (1/λ)[− ln(u)]^{1/α}

as a random observation from the Weibull distribution. The Weibull distribution with α = 1 is the exponential distribution.
(b) Let u be a random number between 0 and 1; then solving the equation F(x) = u for F(x) = e^x/(1 + e^x) gives e^x = u/(1 − u). Thus x = ln(u/(1 − u)) is a random observation from the logistic distribution.
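Both inverse-transform samplers can be sketched as follows (function names are mine); the Weibull sampler is checked through its exponential special case α = 1:

```python
import math, random

def weibull_sample(lam, alpha, rng):
    # invert F(x) = 1 - exp(-(lam*x)**alpha); since 1 - u is again uniform,
    # -log(u) replaces -log(1 - u)
    return (1.0 / lam) * (-math.log(rng.random())) ** (1.0 / alpha)

def logistic_sample(rng):
    # invert F(x) = e**x / (1 + e**x)
    u = rng.random()
    return math.log(u / (1.0 - u))

rng = random.Random(42)
n = 100_000
mean_w = sum(weibull_sample(2.0, 1.0, rng) for _ in range(n)) / n
print(round(mean_w, 2))  # alpha = 1 is exponential: mean near 1/lam = 0.5
samples = sorted(logistic_sample(rng) for _ in range(n))
print(round(samples[n // 2], 2))  # logistic median is 0
```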
4.93 Generate n random numbers u1, . . . , un from the interval (0, 1). Then the number −(1/λ)[ln(u1) + · · · + ln(un)] = −(1/λ) ln(u1 × · · · × un) is a random observation from the gamma distributed random variable.
4.94 It is easiest to consider first the case of a random variable V having a triangular density on (0, 1) with m0 as its most likely value. The density function f(v) of V is given by f(v) = 2v/m0 for 0 < v ≤ m0 and f(v) = 2(1 − v)/(1 − m0) for m0 < v < 1. The probability distribution function of V is

P(V ≤ v) = v²/m0 for 0 ≤ v ≤ m0 and P(V ≤ v) = (2v − v² − m0)/(1 − m0) for m0 ≤ v ≤ 1.

To obtain a random observation from V, generate a random number u between 0 and 1 and solve v from the equation P(V ≤ v) = u. The solution is v = √(m0 u) if 0 < u ≤ m0 and v = 1 − √((1 − m0)(1 − u)) for m0 < u < 1. If X has a triangular density on (a, b) with m as its most likely value, then the random variable V = (X − a)/(b − a) has a triangular density on (0, 1) with m0 = (m − a)/(b − a). Hence a random observation from X is given by x = a + (b − a)√(m0 u) if 0 < u ≤ m0 and x = a + (b − a)[1 − √((1 − m0)(1 − u))] if m0 < u < 1.
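The two-piece inversion above can be sketched as a sampler (function name and test parameters are mine):

```python
import random

def triangular_sample(a, b, m, rng):
    # inverse transform for the triangular density on (a, b) with mode m
    m0 = (m - a) / (b - a)
    u = rng.random()
    if u <= m0:
        v = (m0 * u) ** 0.5
    else:
        v = 1.0 - ((1.0 - m0) * (1.0 - u)) ** 0.5
    return a + (b - a) * v

rng = random.Random(7)
n = 100_000
mean_mc = sum(triangular_sample(2.0, 8.0, 3.0, rng) for _ in range(n)) / n
print(round(mean_mc, 2))  # near (a + b + m)/3 = 13/3
```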
4.95 (a) The random variable X is with probability p distributed as an exponential random variable with parameter λ1 and with probability 1 − p as an exponential random variable with parameter λ2. Hence generate two random numbers u1 and u2 from (0, 1). The random observation is −(1/λ1) ln(u2) if u1 ≤ p and is −(1/λ2) ln(u2) otherwise.
(b) Let V be exponentially distributed with parameter λ. Then the random variable X is with probability p distributed as V and with probability 1 − p as −V. Hence generate two random numbers u1 and u2 from (0, 1). The random observation is −(1/λ) ln(u2) if u1 ≤ p and (1/λ) ln(u2) otherwise. This simulation method is called the composition method.
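The composition method of part (a) can be sketched as follows (function name and parameter values are mine):

```python
import math, random

def hyperexponential_sample(p, lam1, lam2, rng):
    # composition: exponential(lam1) with probability p, else exponential(lam2)
    u1, u2 = rng.random(), rng.random()
    rate = lam1 if u1 <= p else lam2
    return -math.log(u2) / rate

rng = random.Random(3)
n = 200_000
mean_mc = sum(hyperexponential_sample(0.3, 1.0, 0.5, rng) for _ in range(n)) / n
print(round(mean_mc, 2))  # near p/lam1 + (1 - p)/lam2 = 1.7
```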
4.96 Apply the definition r(x) = f(x)/(1 − F(x)) to get the expression for r(x). It is a matter of elementary but tedious algebra to show that r(x) is first decreasing and then increasing with a bathtub shape. As an illustration of the bathtub shape of r(x), we give in the figure the graph of r(x) for ν = 0.5.

[Figure: graph of r(x) for ν = 0.5, showing the bathtub shape; the vertical axis r(x) runs from 0 to 20 and the horizontal axis x from 0 to 1.]
4.97 Since P(V > x) = P(X1 > x, . . . , Xn > x), it follows from the independence of the Xk and the failure rate representation of the reliability function that

P(V > x) = P(X1 > x) · · · P(Xn > x) = e^{−∫_0^x r1(y) dy} · · · e^{−∫_0^x rn(y) dy} for x ≥ 0.

This gives P(V > x) = e^{−∫_0^x [Σ_{k=1}^n rk(y)] dy}, proving the desired result.

4.98 Noting that 1 − F(x) = e^{−∫_0^x r(t) dt}, we get

1 − F(x) = e^{−ln(1+x)} = 1/(1 + x) for x ≥ 0.
4.99 Let F(x) = P(X ≤ x). Since

∫_0^x r(t) dt = λ ∫_0^x d ln(1 + t^α) = λ ln(1 + x^α),

it follows that F(x) = 1 − e^{−λ ln(1+x^α)}. Thus the reliability function is given by

1 − F(x) = (1 + x^α)^{−λ} for x ≥ 0.

The derivative r′(x) is equal to (α − 1 − x^α)/(1 + x^α)² up to the positive factor λαx^{α−2}. For the case that α > 1, the derivative is positive for x < (α − 1)^{1/α} and negative for x > (α − 1)^{1/α}, showing that r(x) first increases and then decreases.
4.100 Let Fi(x) = P(Xi ≤ x). Then

P(Xi > s + t | Xi > s) = (1 − Fi(s + t))/(1 − Fi(s)) = e^{−∫_0^{s+t} ri(x) dx} / e^{−∫_0^s ri(x) dx}.

Since r1(x) = ½r2(x), we get

P(X1 > s + t | X1 > s) = e^{−½∫_0^{s+t} r2(x) dx} / e^{−½∫_0^s r2(x) dx} = [e^{−∫_0^{s+t} r2(x) dx} / e^{−∫_0^s r2(x) dx}]^{1/2},

showing that P(X1 > s + t | X1 > s) = [P(X2 > s + t | X2 > s)]^{1/2}.
4.101 The integral ∫_{1,000}^∞ e^{−(y/1,250)^{2.1}} dy / e^{−(1,000/1,250)^{2.1}} gives the mean residual lifetime m(1,000). By numerical integration, m(1,000) = 516.70 hours.
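The numerical integration can be reproduced with a plain trapezoidal rule; a sketch, with an arbitrary truncation point chosen where the integrand is negligible:

```python
import math

def survival(y):
    # reliability function exp(-(y/1250)**2.1) of the Weibull lifetime
    return math.exp(-((y / 1250.0) ** 2.1))

t, upper, steps = 1000.0, 10000.0, 200_000  # survival(upper) is ~0
h = (upper - t) / steps
area = 0.5 * (survival(t) + survival(upper))
for i in range(1, steps):
    area += survival(t + i * h)
area *= h
mrl = area / survival(t)
print(round(mrl, 2))  # near the 516.70 hours found above
```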
4.102 In order to maximize −Σ_{i=1}^∞ pi log pi subject to pi ≥ 0 for all i, Σ_{i=1}^∞ pi = 1 and Σ_{i=1}^∞ i pi = µ, form the Lagrange function

F(p1, p2, . . . , λ1, λ2) = −Σ_{i=1}^∞ pi log pi + λ1(Σ_{i=1}^∞ pi − 1) + λ2(Σ_{i=1}^∞ i pi − µ),

where λ1 and λ2 are the Lagrange multipliers. Putting ∂F/∂pi = 0 gives the equations −1 − log pi + λ1 + λ2 i = 0 for i ≥ 1 and so

pi = e^{λ1−1+λ2 i} for i ≥ 1.

The condition Σ_{i=1}^∞ pi = 1 implies that λ2 < 0 and e^{λ1−1} = (1 − e^{λ2}) e^{−λ2}. Hence we have pi = (1 − e^{λ2}) e^{λ2(i−1)} for i ≥ 1. Letting r = 1 − e^{λ2}, we get the geometric distribution pi = r(1 − r)^{i−1} for i ≥ 1. The condition Σ_{i=1}^∞ i pi = µ implies that 1/r = µ and so r = 1/µ.
4.103 Form the Lagrange function

F(p1, . . . , pn, λ1, λ2) = −Σ_{i=1}^n pi log pi + λ1(Σ_{i=1}^n pi − 1) + λ2(Σ_{i=1}^n pi Ei − E),

where λ1 and λ2 are Lagrange multipliers. Putting ∂F/∂pi = 0 results in −1 − log pi + λ1 + λ2 Ei = 0 and so pi = e^{λ1−1+λ2 Ei} for all i. Substituting pi into the constraint Σ_{i=1}^n pi = 1 gives e^{λ1−1} = 1/Σ_{i=1}^n e^{λ2 Ei}. Thus

pi = e^{λ2 Ei} / Σ_{k=1}^n e^{λ2 Ek} for all i.

Substituting this into Σ_{i=1}^n pi Ei = E gives the equation

Σ_{i=1}^n Ei e^{λ2 Ei} − E Σ_{k=1}^n e^{λ2 Ek} = 0

for λ2. Replacing λ2 by −β, we get the desired expression for the p*_i and the equation for the unknown β.
4.104 Apply Rule 10.8 with E1 = 4.50, E2 = 6.25, E3 = 7.50, and E = 5.75. Using a numerical root-finding method, we obtain β = 0.218406. The maximum entropy probabilities of a customer ordering a regular cheeseburger, a double cheeseburger and a big cheeseburger are 0.4542, 0.3099 and 0.2359.
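The root for β can be found by bisection, since the expected energy is decreasing in β; a sketch (the bracket [0, 5] is my choice):

```python
import math

E_vals = [4.50, 6.25, 7.50]
E_target = 5.75

def mean_energy(beta):
    # expected energy under the maximum entropy distribution with parameter beta
    w = [math.exp(-beta * e) for e in E_vals]
    return sum(e * wi for e, wi in zip(E_vals, w)) / sum(w)

lo, hi = 0.0, 5.0  # mean_energy(lo) > E_target > mean_energy(hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > E_target:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)
w = [math.exp(-beta * e) for e in E_vals]
probs = [wi / sum(w) for wi in w]
print(round(beta, 4), [round(p, 4) for p in probs])
```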
Chapter 5
5.1 Let X be the low points rolled and Y be the high points rolled. These random variables are defined on the sample space consisting of the 36 equiprobable outcomes (i, j) with 1 ≤ i, j ≤ 6, where i is the number shown by the first die and j is the number shown by the second die. For k < l, the event {X = k, Y = l} occurs for the outcomes (k, l) and (l, k). This gives P(X = k, Y = l) = 2/36 for 1 ≤ k < l ≤ 6. Further, P(X = k, Y = k) = 1/36 for all k.
5.2 Imagine that the 52 cards are numbered as 1, 2, . . . , 52. The random variables X and Y are defined on the sample space consisting of all \binom{52}{13} sets of 13 different numbers from 1, 2, . . . , 52. Each element of the sample space is equally likely. The number of elements ω for which X(ω) = x and Y(ω) = y is equal to \binom{13}{x}\binom{13}{y}\binom{26}{13−x−y}. Hence the joint probability mass function of X and Y is given by

P(X = x, Y = y) = \binom{13}{x}\binom{13}{y}\binom{26}{13−x−y} / \binom{52}{13}

for all integers x, y with x, y ≥ 0 and x + y ≤ 13.
5.3 The joint mass function of X and Y − X is

P(X = x, Y − X = z) = P(X = x, Y = z + x) = e^{−2}/(x! z!)

for x, z = 0, 1, . . .. Since Σ_{x=0}^∞ e^{−2}/(x! z!) = e^{−1}/z! and Σ_{z=0}^∞ e^{−2}/(x! z!) = e^{−1}/x!, the marginal distributions of X and Y − X are Poisson distributions with expected value 1. Noting that

P(X = x, Y − X = z) = P(X = x)P(Y − X = z) for all x, z,

we have by Rule 3.7 that X and Y − X are independent. Thus, using Example 3.14, the random variable Y is Poisson distributed with expected value 2.
5.4 The joint probability mass function of X and Y satisfies

P(X = i, Y = 4 + i) = \binom{3+i}{i} 0.45^i 0.55^4 for i = 0, 1, 2, 3
P(X = 4, Y = 4 + k) = \binom{3+k}{k} 0.45^4 0.55^k for k = 0, 1, 2, 3.

The other P(X = i, Y = j) are zero.
5.5 The sample space for X and Y is the set of the \binom{10}{3} = 120 combinations of three distinct numbers from 1 to 10. The joint mass function of X and Y is

P(X = x, Y = y) = (y − x − 1)/120 for 1 ≤ x ≤ 8, x + 2 ≤ y ≤ 10.

The marginal distributions are

P(X = x) = Σ_{y=x+2}^{10} (y − x − 1)/120 = (10 − x)(9 − x)/240 for 1 ≤ x ≤ 8
P(Y = y) = Σ_{x=1}^{y−2} (y − x − 1)/120 = (y − 1)(y − 2)/240 for 3 ≤ y ≤ 10.

Further, for 2 ≤ k ≤ 9,

P(Y − X = k) = Σ_{x=1}^{10−k} P(X = x, Y = x + k) = (k − 1)(10 − k)/120.
5.6 The random variables X and Y are defined on a countably infinite sample space consisting of all pairs (x, y) of positive integers with x ≠ y. The joint probability mass function of X and Y is given by

P(X = x, Y = y) = (8/10)^{x−1}(1/10)(9/10)^{y−x−1}(1/10) for 1 ≤ x < y
P(X = x, Y = y) = (8/10)^{y−1}(1/10)(9/10)^{x−y−1}(1/10) for 1 ≤ y < x.

Let the random variables V and W be defined by V = min(X, Y) and W = max(X, Y). Then,

P(V = v) = Σ_{y=v+1}^∞ P(X = v, Y = y) + Σ_{x=v+1}^∞ P(X = x, Y = v) = 2(8/10)^{v−1}(1/10) for v = 1, 2, . . . .

Noting that (1/100) Σ_{x=1}^{w−1} (8/10)^{x−1}(9/10)^{w−x−1} = (1/72)(9/10)^w Σ_{x=1}^{w−1} (8/9)^x, we find after some algebra that

P(W = w) = Σ_{x=1}^{w−1} P(X = x, Y = w) + Σ_{y=1}^{w−1} P(X = w, Y = y) = (2/9)(9/10)^w [1 − (8/9)^{w−1}] for w = 2, 3, . . . .
5.7 Using the formula P(X = x, Y = y, N = n) = (1/6)\binom{n}{x}(1/2)^n \binom{n}{y}(1/2)^n, we find that

P(X = x, Y = y) = (1/6) Σ_{n=1}^6 \binom{n}{x}\binom{n}{y}(1/2)^{2n} for 0 ≤ x, y ≤ 6.

Since P(X = Y) = (1/6) Σ_{n=1}^6 (1/2)^{2n} Σ_{x=0}^n \binom{n}{x}², it follows that

P(X = Y) = (1/6) Σ_{n=1}^6 \binom{2n}{n}(1/2)^{2n} = 0.3221.
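A direct evaluation of the double sum; a one-line check:

```python
from math import comb

# sum over the die outcome n and the common value x of the two binomials
p = sum(comb(n, x) ** 2 / 4 ** n for n in range(1, 7) for x in range(n + 1)) / 6
print(round(p, 4))  # -> 0.3221
```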
5.8 The random variables X, Y and N are defined on a countably infinite state space. The event {X = i, Y = j, N = n} can occur in \binom{n−1}{i−1}\binom{n−1−(i−1)}{j−1} ways. This is the number of ways to choose i − 1 places for the first i − 1 heads of coin 1 and to choose j − 1 non-overlapping places for the j − 1 heads of coin 2 in the first n − 1 tosses. Thus the joint probability mass function of X, Y and N is given by

P(X = i, Y = j, N = n) = \binom{n−1}{i−1}\binom{n−i}{j−1}(1/4)^n

for i, j = 1, 2, . . . and n = i + j − 1, i + j, . . .. By P(X = i, Y = j) = Σ_{n=i+j−1}^∞ P(X = i, Y = j, N = n), it follows that

P(X = i, Y = j) = Σ_{n=i+j−1}^∞ [(n − 1)!(n − i)!] / [(i − 1)!(n − i)!(n − i − j + 1)!(j − 1)!] (1/4)^n
= \binom{i+j−2}{i−1} Σ_{n=i+j−1}^∞ \binom{n−1}{i+j−2} (1/4)^n.

Using the identity Σ_{m=r}^∞ \binom{m}{r} a^m = a^r/(1 − a)^{r+1} for 0 < a < 1, it follows that

P(X = i, Y = j) = \binom{i+j−2}{i−1} Σ_{m=i+j−2}^∞ \binom{m}{i+j−2} (1/4)^{m+1} = \binom{i+j−2}{i−1} (1/3)^{i+j−1}.

By P(X = i) = Σ_{j=1}^∞ P(X = i, Y = j),

P(X = i) = Σ_{j=1}^∞ \binom{i+j−2}{i−1} (1/3)^{i+j−1} = Σ_{n=i−1}^∞ \binom{n}{i−1} (1/3)^{n+1}.

Using again the identity Σ_{m=r}^∞ \binom{m}{r} a^m = a^r/(1 − a)^{r+1} for 0 < a < 1, it follows that

P(X = i) = (1/2)^i for i = 1, 2, . . . .

Further, we have

P(X = Y) = Σ_{n=1}^∞ \binom{2n−2}{n−1} (1/3)^{2n−1} = (1/3) Σ_{k=0}^∞ \binom{2k}{k} (1/9)^k.

Using the identity Σ_{k=0}^∞ \binom{2k}{k} x^k = 1/√(1 − 4x) for |x| < 1/4, the numerical value 0.4472 is obtained for P(X = Y).
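A truncated evaluation of the series confirms the value 1/√5 ≈ 0.4472; a sketch:

```python
from math import comb, sqrt

# truncation of (1/3) * sum_k C(2k, k) (1/9)^k, which converges to 1/sqrt(5)
p = sum(comb(2 * k, k) * (1 / 9) ** k for k in range(200)) / 3
print(round(p, 4))  # -> 0.4472
```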
5.9 The constant c must satisfy

1 = c ∫_0^∞ dx ∫_0^x e^{−2x} dy = c ∫_0^∞ xe^{−2x} dx.

Noting that the Erlang probability density 4xe^{−2x} integrates to 1 over (0, ∞), we find c = 4. By P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, we have that Z = X − Y satisfies

P(Z > z) = ∫_0^∞ dy ∫_{y+z}^∞ 4e^{−2x} dx = ∫_0^∞ 2e^{−2(y+z)} dy = e^{−2z} for z > 0.

Thus Z has the exponential density 2e^{−2z}.
5.10 The constant c is determined by c ∫_0^∞ ∫_0^∞ xe^{−2x(1+y)} dx dy = 1. The gamma density satisfies ∫_0^∞ λ^k u^{k−1} e^{−λu}/(k − 1)! du = 1 for any integer k ≥ 1 and any λ > 0. Using this identity with k = 1 and λ = 2x, we get

c ∫_0^∞ ∫_0^∞ xe^{−2x(1+y)} dx dy = c (1/2) ∫_0^∞ e^{−2x} (∫_0^∞ 2xe^{−2xy} dy) dx = c (1/2) ∫_0^∞ e^{−2x} dx = c/4 = 1

and so c = 4. Let the random variable Z be defined by Z = XY. Then, using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, we find

P(XY ≤ z) = 4 ∫_0^∞ dx ∫_0^{z/x} xe^{−2x(1+y)} dy = 2 ∫_0^∞ e^{−2x} (∫_0^{z/x} 2xe^{−2xy} dy) dx,

and so

P(XY ≤ z) = 2 ∫_0^∞ e^{−2x}(1 − e^{−2x(z/x)}) dx = (1 − e^{−2z}) ∫_0^∞ 2e^{−2x} dx = 1 − e^{−2z} for z > 0.
5.11 Since c ∫_0^1 dx ∫_0^1 √(x + y) dy = 1, the constant c = (15/4)(4√2 − 2)^{−1}. Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, it follows that

P(X + Y ≤ z) = c ∫_0^z dx ∫_0^{z−x} √(x + y) dy = (2c/3) ∫_0^z (z^{3/2} − x^{3/2}) dx = (2c/5) z²√z for 0 ≤ z ≤ 1

P(X + Y > z) = c ∫_{z−1}^1 dx ∫_{z−x}^1 √(x + y) dy = (4c/15)(2^{5/2} − z^{5/2}) − (2c/3)(2 − z) z^{3/2} for 1 ≤ z ≤ 2.

Differentiation gives that the density function of X + Y is cz√z for 0 < z < 1 and c(2 − z)√z for 1 ≤ z < 2.
5.12 The random variable V = 2π√(X² + Y²) gives the circumference of the circle. Thus P(V > π) = P(X² + Y² > 1/4). Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy with C = {(x, y) : x, y ≥ 0, x² + y² ≤ 1/4}, we get

P(X² + Y² ≤ 1/4) = ∫_0^{0.5} dx ∫_0^{√(0.25−x²)} (x + y) dy = ∫_0^{0.5} x√(0.25 − x²) dx + 1/24 = (1/2) ∫_0^{0.25} √(0.25 − u) du + 1/24 = 1/12.

Therefore P(V > π) = 11/12.
5.13 Let U1 and U2 be independent and uniformly distributed on (0, a). Then, for ∆x and ∆y small,

P(x < X ≤ x + ∆x, y < Y ≤ y + ∆y) = P(x < U1 ≤ x + ∆x, y < U2 ≤ y + ∆y) + P(x < U2 ≤ x + ∆x, y < U1 ≤ y + ∆y) = 2 (∆x/a)(∆y/a) for 0 < x < y < a.

Therefore the joint density function of X and Y is given by f(x, y) = 2/a² for 0 < x < y < a and f(x, y) = 0 otherwise. Alternatively, the joint density f(x, y) can be obtained from

P(X > x, Y ≤ y) = ((y − x)/a)² for 0 ≤ x ≤ y ≤ a.

Next, use the identity

P(X ≤ x, Y ≤ y) + P(X > x, Y ≤ y) = P(Y ≤ y)

and apply f(x, y) = ∂²P(X ≤ x, Y ≤ y)/(∂x ∂y).
5.14 The joint density function of (X, Y, Z) is given by f(x, y, z) = 1 for 0 < x, y, z < 1 and f(x, y, z) = 0 otherwise. Using the representation P((X, Y, Z) ∈ D) = ∫∫∫_D f(x, y, z) dx dy dz with D = {(x, y, z) : 0 ≤ x, y, z ≤ 1, x + y < z}, we get

P(X + Y < Z) = ∫_0^1 dz ∫_0^z dy ∫_0^{z−y} dx.

This integral can be evaluated as

∫_0^1 dz ∫_0^z (z − y) dy = ∫_0^1 (1/2)z² dz = 1/6.

Since P(max(X, Y) < Z) is the probability that Z is the largest of the three components X, Y, and Z, we have by a symmetry argument that P(max(X, Y) < Z) = 1/3. Thus, by P(max(X, Y) > Z) = 1 − P(max(X, Y) < Z),

P(max(X, Y) > Z) = 1 − 1/3 = 2/3.

Alternatively, P(max(X, Y) > Z) = ∫_0^1 dx ∫_0^1 dy ∫_0^{max(x,y)} dz. This integral is ∫_0^1 dx [∫_0^x x dy + ∫_x^1 y dy] = ∫_0^1 [x² + ½(1 − x²)] dx = 2/3.
5.15 Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, it follows that

P(X < Y) = (1/10) ∫_5^{10} dx ∫_x^∞ (1/2) e^{−½(y+3−x)} dy = (1/10) ∫_5^{10} e^{−3/2} dx = (1/2) e^{−3/2}.
5.16 Let Z = X + Y. By the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, we get that P(Z ≤ z) is given by

(1/2) ∫_0^z dx ∫_0^{z−x} (x + y)e^{−(x+y)} dy = (1/2) ∫_0^z dx ∫_x^z ue^{−u} du
= (1/2) ∫_0^z (−ze^{−z} + xe^{−x} + e^{−x} − e^{−z}) dx = 1 − e^{−z}(1 + z + ½z²)

for z ≥ 0. Hence the density function of Z = X + Y is f(z) = ½z²e^{−z} for z > 0 and f(z) = 0 otherwise. This is the Erlang density with shape parameter 3 and scale parameter 1.
5.17 We have P(max(X, Y) > a min(X, Y)) = P(X > aY) + P(Y > aX). Thus, by a symmetry argument,

P(max(X, Y) > a min(X, Y)) = 2P(X > aY).

The joint density of X and Y is f(x, y) = 1 for 0 < x, y < 1 and so

P(X > aY) = ∫_0^1 dx ∫_0^{x/a} dy = ∫_0^1 (x/a) dx = 1/(2a).

The sought probability is 1/a.
5.18 The expected value of the time until the electronic device goes down is given by

E(X + Y) = ∫_1^∞ ∫_1^∞ (x + y) 24/(x + y)^4 dx dy = ∫_1^∞ dx ∫_1^∞ 24/(x + y)³ dy = ∫_1^∞ 12/(x + 1)² dx = 6.

To find the density function of X + Y, we calculate P(X + Y > t) and distinguish between 0 ≤ t ≤ 2 and t > 2. Obviously, P(X + Y > t) = 1 for 0 ≤ t ≤ 2. For the case of t > 2,

P(X + Y > t) = ∫_1^{t−1} dx ∫_{t−x}^∞ 24/(x + y)^4 dy + ∫_{t−1}^∞ dx ∫_1^∞ 24/(x + y)^4 dy
= ∫_1^{t−1} (8/t³) dx + ∫_{t−1}^∞ 8/(x + 1)³ dx = 8(t − 2)/t³ + 4/t².

By differentiation, the density function g(t) of X + Y is g(t) = 24(t − 2)/t^4 for t > 2 and g(t) = 0 otherwise.
5.19 The time until both components are down is T = max(X, Y). Noting that P(T ≤ t) = P(X ≤ t, Y ≤ t), it follows that

P(T ≤ t) = (1/4) ∫_0^t dx ∫_0^t (2y + 2 − x) dy = 0.125t³ + 0.5t² for 0 ≤ t ≤ 1
P(T ≤ t) = (1/4) ∫_0^t dx ∫_0^1 (2y + 2 − x) dy = 0.75t − 0.125t² for 1 ≤ t ≤ 2.

The density function of T is 0.375t² + t for 0 < t < 1 and 0.75 − 0.25t for 1 ≤ t < 2.
5.20 Let X and Y be the two random points at which the stick is broken, with X being the point that is closest to the left end point of the stick. Assume that the stick has length 1. The joint density function of (X, Y) satisfies f(x, y) = 2 for 0 < x < y < 1 and f(x, y) = 0 otherwise. To see this, note that X = min(U1, U2) and Y = max(U1, U2), where U1 and U2 are independent and uniformly distributed on (0, 1). For any 0 < x < y < 1 and dx > 0, dy > 0 sufficiently small, P(x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy) is equal to the sum of P(x ≤ U1 ≤ x + dx, y ≤ U2 ≤ y + dy) and P(x ≤ U2 ≤ x + dx, y ≤ U1 ≤ y + dy). By the independence of U1 and U2, this gives

P(x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy) = 2 dx dy,

showing that f(x, y) = 2 for 0 < x < y < 1. All three pieces are no longer than half the length of the stick if and only if X ≤ 0.5, Y − X ≤ 0.5 and 1 − Y ≤ 0.5. That is, (X, Y) should satisfy 0 ≤ X ≤ 0.5 and 0.5 ≤ Y ≤ 0.5 + X. It now follows that

P(no piece is longer than half the length of the stick) = ∫_0^{0.5} dx ∫_{0.5}^{0.5+x} 2 dy = 2 ∫_0^{0.5} x dx = 1/4.
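The answer 1/4 can be cross-checked by simulating the two break points; a sketch:

```python
import random

random.seed(11)
n = 200_000
hits = 0
for _ in range(n):
    u1, u2 = random.random(), random.random()
    x, y = min(u1, u2), max(u1, u2)
    # the three pieces are x, y - x and 1 - y
    hits += max(x, y - x, 1.0 - y) <= 0.5
print(round(hits / n, 3))  # near 1/4
```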
5.21 (a) Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, it follows that the sought probability is

P(B² ≥ 4A) = ∫_0^1 ∫_0^1 χ(a, b) f(a, b) da db,

where χ(a, b) = 1 for b² ≥ 4a and χ(a, b) = 0 otherwise. This leads to

P(B² ≥ 4A) = ∫_0^1 db ∫_0^{b²/4} (a + b) da = 0.0688.

(b) Similarly,

P(B² ≥ 4AC) = ∫_0^1 ∫_0^1 ∫_0^1 χ(a, b, c) f(a, b, c) da db dc,

where χ(a, b, c) = 1 for b² ≥ 4ac and χ(a, b, c) = 0 otherwise. A convenient order of integration for P(B² ≥ 4AC) is

(2/3) ∫_0^1 db ∫_0^{b²/4} da ∫_0^1 (a + b + c) dc + (2/3) ∫_0^1 db ∫_{b²/4}^1 da ∫_0^{b²/(4a)} (a + b + c) dc.

This leads to P(B² ≥ 4AC) = 0.1960.
5.22 The marginal density of X is given by

fX(x) = ∫_0^∞ 4xe^{−2x(1+y)} dy = 2e^{−2x} ∫_0^∞ 2xe^{−2xy} dy = 2e^{−2x} for x > 0.

The marginal density of Y is given by

fY(y) = ∫_0^∞ 4xe^{−2x(1+y)} dx = (1/(1 + y)²) ∫_0^∞ [2(1 + y)]² xe^{−2x(1+y)} dx = 1/(1 + y)² for y > 0,

using the fact that the gamma density λ²xe^{−λx} for x > 0 integrates to 1 over (0, ∞).
5.23 The marginal densities of X and Y are fX(x) = ∫_0^x 4e^{−2x} dy = 4xe^{−2x} for x > 0 and fY(y) = ∫_y^∞ 4e^{−2x} dx = 2e^{−2y} for y > 0.
5.24 The marginal density of X is given by

fX(x) = ∫_0^{1−x} (3 − 2x − y) dy = (3 − 2x)(1 − x) − ½(1 − x)² = 1.5x² − 4x + 2.5 for 0 < x < 1.

The marginal density of Y is given by

fY(y) = ∫_0^{1−y} (3 − 2x − y) dx = 3(1 − y) − (1 − y)² − y(1 − y) = 2 − 2y for 0 < y < 1.
5.25 The joint density of X and Y is f(x, y) = 4/√3 for (x, y) inside the triangle. The marginal density of X is

fX(x) = ∫_0^{x√3} f(x, y) dy = 4x for 0 < x < 0.5
fX(x) = ∫_0^{(1−x)√3} f(x, y) dy = 4(1 − x) for 0.5 < x < 1.

The marginal density of Y is

fY(y) = ∫_{y/√3}^{1−y/√3} f(x, y) dx = 4/√3 − 8y/3 for 0 < y < ½√3.
5.26 Since f(x, y) = 1/x for 0 < x < 1 and 0 < y < x and f(x, y) = 0 otherwise, we get fX(x) = ∫_0^x (1/x) dy = 1 for 0 < x < 1 and fY(y) = ∫_y^1 (1/x) dx = −ln(y) for 0 < y < 1.
5.27 Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, we have

P(X ≤ x, Y − X ≤ z) = ∫_0^x dv ∫_v^{v+z} e^{−w} dw = (1 − e^{−x})(1 − e^{−z})

for x, z > 0. By partial differentiation, we get that the joint density of X and Z = Y − X is f(x, z) = e^{−x}e^{−z} for x, z > 0. The marginal densities of X and Z are the exponential densities fX(x) = e^{−x} and fZ(z) = e^{−z}. The time until the system goes down is Y. The density function of Y is ∫_0^y f(x, y) dx = ∫_0^y e^{−y} dx = ye^{−y} for y > 0. This is the Erlang density with shape parameter 2 and scale parameter 1.
5.28 The joint density of X and Y is f(x, y) = 1 for 0 < x, y < 1. The area of the rectangle is Z = XY. Using the relation P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, it follows that

P(Z ≤ z) = ∫_0^z dx ∫_0^1 dy + ∫_z^1 dx ∫_0^{z/x} dy = z − z ln(z) for 0 ≤ z ≤ 1.

The density function of Z is f(z) = −ln(z) for 0 < z < 1. The expected value of Z is E(Z) = −∫_0^1 z ln(z) dz = 1/4. Note that E(XY) = E(X)E(Y).
5.29 Let X and Y be the packet delays on the two lines. The joint density of X and Y is f(x, y) = λe^{−λx} λe^{−λy} for x, y > 0. Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, we obtain

P(X − Y > v) = ∫_0^∞ λe^{−λy} dy ∫_{y+v}^∞ λe^{−λx} dx = (1/2)e^{−λv}

for v ≥ 0. For any v ≤ 0, P(X − Y ≤ v) = P(Y − X ≥ −v). Thus, by symmetry, P(X − Y ≤ v) = ½e^{λv} for v ≤ 0. Thus the density of X − Y is ½λe^{−λ|v|} for −∞ < v < ∞, which is the so-called Laplace density.
5.30 It is easiest to derive the results by using the basic relation P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy. The joint density of X and Y is f(x, y) = 1 for 0 < x, y < 1. Let V = ½(X + Y). Then P(V ≤ v) = P(X + Y ≤ 2v). Thus

P(V ≤ v) = ∫_0^{2v} dx ∫_0^{2v−x} dy = 2v² for 0 ≤ v ≤ 0.5
P(V > v) = ∫_{2v−1}^1 dx ∫_{2v−x}^1 dy = 2 − 4v + 2v² for 0.5 ≤ v ≤ 1.

Thus fV(v) = 4v for 0 < v ≤ ½ and fV(v) = 4 − 4v for 0.5 < v < 1. This is the triangular density with a = 0, b = 1, m = 0.5.
To get the density of |X − Y|, note that

P(|X − Y| ≤ v) = P(X − Y ≤ v) − P(X − Y ≤ −v) for 0 ≤ v ≤ 1.

Also, P(X − Y ≤ −v) = P(Y − X ≥ v) for 0 ≤ v ≤ 1. Thus P(|X − Y| ≤ v) = 2P(X − Y ≤ v) − 1 for 0 ≤ v ≤ 1. We have

P(X − Y ≤ v) = P(X ≤ Y + v) = ∫_0^{1−v} dy ∫_0^{y+v} dx + ∫_{1−v}^1 dy ∫_0^1 dx = v − ½v² + ½ for 0 ≤ v ≤ 1.

Thus P(|X − Y| ≤ v) = 2(v − ½v² + ½) − 1 for 0 ≤ v ≤ 1. Therefore the density of |X − Y| is 2(1 − v) for 0 < v < 1. This is the triangular density with a = 0, b = 1, m = 0.
5.31 We have

P(F ≤ c) = P(X + Y ≤ c) + P(1 ≤ X + Y ≤ c + 1) for 0 ≤ c ≤ 1.

Since P(X + Y ≤ c) = ∫_0^c dx ∫_0^{c−x} dy and P(1 ≤ X + Y ≤ c + 1) = ∫_0^1 dx ∫_{1−x}^{min(c+1−x, 1)} dy, we get for any 0 ≤ c ≤ 1 that

P(X + Y ≤ c) = ½c²
P(1 ≤ X + Y ≤ c + 1) = ∫_0^c dx ∫_{1−x}^1 dy + ∫_c^1 dx ∫_{1−x}^{c+1−x} dy = ½c² + c(1 − c).

This gives P(F ≤ c) = c for all 0 ≤ c ≤ 1, proving the desired result.
5.32 Let X be uniformly distributed on (0, 24) and Y be uniformly distributed on (0, 36), where X and Y are independent. The sought probability is given by P(X < Y < X + 7) + P(Y < X < Y + 7). Since

P(X < Y < X + 7) = ∫_0^{24} (1/24) dx ∫_x^{x+7} (1/36) dy = 7/36
P(Y < X < Y + 7) = ∫_0^{24} (1/36) dy ∫_y^{min(24, y+7)} (1/24) dx = 287/1728,

we find that the sought probability is equal to 7/36 + 287/1728 = 0.3605.
5.33 The joint density function f(x, y) of X and Y satisfies f(x, y) = fX(x)fY(y) and is equal to 1 for all 0 < x, y < 1 and 0 otherwise. Using the relation P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy with C = {(x, y) : 0 < x < min(1, zy), 0 < y < 1}, we get

P(Z ≤ z) = ∫_0^1 dy ∫_0^{min(1, zy)} dx for z > 0.

Distinguish between the cases 0 ≤ z ≤ 1 and z > 1. For 0 ≤ z ≤ 1,

P(Z ≤ z) = ∫_0^1 dy ∫_0^{zy} dx = ∫_0^1 zy dy = ½z.

For z > 1,

P(Z ≤ z) = ∫_0^{1/z} dy ∫_0^{zy} dx + ∫_{1/z}^1 dy ∫_0^1 dx = 1 − 1/(2z).

Hence the density function of Z is ½ for 0 < z ≤ 1 and 1/(2z²) for z > 1. The probability that the first significant digit of Z equals 1 is given by

Σ_{n=0}^∞ P(10^n ≤ Z < 2 × 10^n) + Σ_{n=1}^∞ P(10^{−n} ≤ Z < 2 × 10^{−n}) = 5/18 + 1/18 = 1/3.

In general, the probability that the first significant digit of Z equals k is

(10/18) × 1/(k(k + 1)) + 1/18 for k = 1, . . . , 9.
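The first-digit probabilities can be cross-checked by simulating Z as the ratio of two uniform random numbers; a sketch (the digit-extraction helper is mine):

```python
import random

def first_digit(z):
    # shift z into [1, 10) and take the integer part
    while z < 1.0:
        z *= 10.0
    while z >= 10.0:
        z /= 10.0
    return int(z)

random.seed(2)
n = 500_000
counts = [0] * 10
for _ in range(n):
    z = (1.0 - random.random()) / (1.0 - random.random())  # ratio of two (0,1] uniforms
    counts[first_digit(z)] += 1
for k in (1, 2, 9):
    pred = (10 / 18) / (k * (k + 1)) + 1 / 18
    print(k, round(counts[k] / n, 3), round(pred, 3))
```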
5.34 We have

P(Z ≤ z) = ∫_0^∞ λe^{−λx} dx ∫_{x/z}^∞ λe^{−λy} dy = ∫_0^∞ e^{−λx/z} λe^{−λx} dx = z/(1 + z).

Thus the density function of Z is 1/(1 + z)².
5.35 We have

P(max(X, Y) ≤ t) = P(X ≤ t, Y ≤ t) = P(X ≤ t)P(Y ≤ t) = (1 − e^{−λt})² for t > 0.

Also,

P(X + ½Y ≤ t) = ∫_0^t λe^{−λx} dx ∫_0^{2(t−x)} λe^{−λy} dy = (1 − e^{−λt})².
5.36 (a) The formula is true for n = 1. Suppose that the formula has been verified for n = 1, . . . , k. This means that the density function of X1 + · · · + Xk satisfies s^{k−1}/(k − 1)! for 0 < s < 1. Then, by the convolution formula, the density function of X1 + · · · + Xk + Xk+1 is given by

∫_0^s (s − y)^{k−1}/(k − 1)! dy = s^k/k! for 0 < s < 1.

This gives

P(X1 + · · · + Xk+1 ≤ s) = ∫_0^s x^k/k! dx = s^{k+1}/(k + 1)! for 0 ≤ s ≤ 1.

(b) We have P(N > n) = P(X1 + · · · + Xn ≤ 1) = 1/n!. It follows from the formula E(N) = Σ_{n=0}^∞ P(N > n) (see Problem 3.29) that

E(N) = Σ_{n=0}^∞ 1/n! = e.
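Part (b) can be cross-checked by simulating N, the number of uniform random numbers needed for the sum to exceed 1; a sketch:

```python
import math, random

random.seed(13)
trials = 100_000
total = 0
for _ in range(trials):
    s, count = 0.0, 0
    while s <= 1.0:          # keep drawing until the sum exceeds 1
        s += random.random()
        count += 1
    total += count
avg = total / trials
print(round(avg, 2))  # near e = 2.718...
```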
5.37 Let X1, X2, . . . be a sequence of independent random variables that are uniformly distributed on (0, 1), and let Sn = X1 + · · · + Xn. The sought probability is

P(S1 > a) + Σ_{n=1}^∞ P(Sn ≤ a, a < Sn + Xn+1 ≤ 1).

Since Sn and Xn+1 are independent of each other, the joint density fn(s, x) of Sn and Xn+1 satisfies fn(s, x) = s^{n−1}/(n − 1)! for 0 < s < 1 and 0 < x < 1, using the result (a) of Problem 5.36. Therefore,

P(Sn ≤ a, a < Sn + Xn+1 ≤ 1) = ∫_0^a ds ∫_{a−s}^{1−s} fn(s, x) dx = (1 − a) a^n/n!.

Thus the sought probability is

1 − a + Σ_{n=1}^∞ (1 − a) a^n/n! = (1 − a)e^a.
5.38 By the independence of X1, X2, and X3, the joint density function of X1, X2, and X3 is 1 × 1 × 1 = 1 for 0 < x1, x2, x3 < 1 and 0 otherwise. Let C = {(x1, x2, x3) : 0 < x1, x2, x3 < 1, 0 < x2 + x3 < x1}. Then

P(X1 > X2 + X3) = ∫∫∫_C dx1 dx2 dx3 = ∫_0^1 dx1 ∫_0^{x1} dx2 ∫_0^{x1−x2} dx3.

This gives

P(X1 > X2 + X3) = ∫_0^1 dx1 ∫_0^{x1} (x1 − x2) dx2 = ∫_0^1 ½x1² dx1 = ½ × ⅓ = 1/6.

Since the events {X1 > X2 + X3}, {X2 > X1 + X3} and {X3 > X1 + X2} are mutually exclusive, the probability that the largest of the three random variables is greater than the sum of the other two is 3 × 1/6 = 1/2.
Note: More generally, let X1, X2, . . . , Xn be independent random numbers chosen from (0, 1); then P(X1 > X2 + · · · + Xn) = 1/n! for any n ≥ 2.
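Both the value 1/6 and the 1/n! pattern of the note can be cross-checked by simulation; a sketch:

```python
import random

random.seed(17)
trials = 300_000
hit3 = hit4 = 0
for _ in range(trials):
    x1, x2, x3, x4 = (random.random() for _ in range(4))
    hit3 += x1 > x2 + x3
    hit4 += x1 > x2 + x3 + x4
print(round(hit3 / trials, 3))  # near 1/3! = 1/6
print(round(hit4 / trials, 3))  # near 1/4! = 1/24
```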
5.39 By P(V > v, W ≤ w) = P(v < X ≤ w, v < Y ≤ w) and the independence of X and Y, we have

P(V > v, W ≤ w) = P(v < X ≤ w)P(v < Y ≤ w) = (e^{−λv} − e^{−λw})²

for 0 ≤ v ≤ w. Taking partial derivatives, we get that the joint density of V and W is f(v, w) = 2λ²e^{−λ(v+w)} for 0 < v < w. It follows from

P(W − V > z) = ∫_0^∞ dv ∫_{v+z}^∞ 2λ²e^{−λ(v+w)} dw

that P(W − V > z) = e^{−λz} for z > 0, in agreement with the memoryless property of the exponential distribution.
5.40 By the substitution rule, the expected value of the area of the rectangle is equal to

E(XY) = ∫_0^1 ∫_0^1 xy(x + y) dx dy = ∫_0^1 x(½x + ⅓) dx = 1/3.
5.41 Define the function g(x, y) as g(x, y) = T − max(x, y) if 0 ≤ x, y ≤ T and g(x, y) = 0 otherwise. The joint density function of X and Y is e^{−(x+y)} for x, y > 0. Using the memoryless property of the exponential distribution, the expected amount of time the system is down between two inspections is given by

E[g(X, Y)] = ∫_0^T ∫_0^T (T − max(x, y)) e^{−(x+y)} dx dy = 2 ∫_0^T (T − x)(1 − e^{−x})e^{−x} dx = T − 1.5 + 2e^{−T} − ½e^{−2T}.
5.42 By the substitution rule, the expected value of the time until the system goes down is

E[max(X, Y)] = (1/4) ∫_0^2 dx ∫_0^1 max(x, y)(2y + 2 − x) dy
= (1/4) ∫_0^1 2x² dx + (1/4) ∫_0^1 [⅔(1 − x³) + ½(2 − x)(1 − x²)] dx + (1/4) ∫_1^2 (3x − x²) dx = 0.96875.

The expected value of the time between the failures of the two components is E[max(X, Y)] − E[min(X, Y)]. By the substitution rule,

E[min(X, Y)] = (1/4) ∫_0^2 dx ∫_0^1 min(x, y)(2y + 2 − x) dy = 0.44792

and so the expected time between the failures of the two components is 0.52083.
5.43 Using the substitution rule, the expected value of the area of the circle is

∫_0^1 ∫_0^1 π(x² + y²)(x + y) dx dy = π ∫_0^1 (x³ + ½x² + ⅓x + ¼) dx = (5/6)π.
5.44 Using the substitution rule and writing x + y = 2x + y − x, we get
E(X + Y) = Σ_{x=0}^∞ Σ_{y=x}^∞ (x + y) e^{−2}/(x!(y − x)!)
= Σ_{x=0}^∞ 2x e^{−1}/x! + Σ_{z=0}^∞ z e^{−1}/z! = 3.
Also, using the substitution rule and writing xy = x(y − x + x),
E(XY) = Σ_{x=0}^∞ Σ_{y=x}^∞ xy e^{−2}/(x!(y − x)!)
= Σ_{x=0}^∞ x e^{−1}/x! × Σ_{z=0}^∞ z e^{−1}/z! + Σ_{x=0}^∞ x² e^{−1}/x! = 3.
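A truncated double sum over the joint pmf e^{−2}/(x!(y − x)!) reproduces both expectations; the truncation bounds below are generous since the Poisson(1) tails are negligible.

```python
import math

# Truncated double sum over p(x,y) = e^{-2} / (x! (y-x)!), y >= x >= 0,
# checking E(X+Y) = 3 and E(XY) = 3.
e_sum = 0.0
e_prod = 0.0
for x in range(40):
    for y in range(x, x + 60):
        p = math.exp(-2.0) / (math.factorial(x) * math.factorial(y - x))
        e_sum += (x + y) * p
        e_prod += x * y * p
```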
5.45 The inverse functions x = a(v, w) and y = b(v, w) are a(v, w) = vw
and b(v, w) = v(1 − w). The Jacobian J(v, w) is equal to −v. The
joint density of V and W is
fV,W(v, w) = μe^{−μvw} × μe^{−μv(1−w)} × |−v| = μ²v e^{−μv} for v > 0, 0 < w < 1.
The marginal densities of V and W are
fV(v) = ∫_0^1 μ²v e^{−μv} dw = μ²v e^{−μv} for v > 0,
fW(w) = ∫_0^∞ μ²v e^{−μv} dv = 1 for 0 < w < 1.
Since fV,W (v, w) = fV (v)fW (w) for all v, w, the random variables V
and W are independent.
5.46 To find the joint density of V and W , we apply the transformation
formula. The inverse functions x = a(v, w) and y = b(v, w) are given
by a(v, w) = vw/(1 + w) and b(v, w) = v/(1 + w). The Jacobian
J(v, w) is equal to −v/(1 + w)2 and so the joint density of V and W
is given by
fV,W(v, w) = f(a(v, w), b(v, w)) |J(v, w)| = v/(2(1 + w)²) for 0 < v < 2 and w > 0
and fV,W(v, w) = 0 otherwise. The marginal density of V is
fV(v) = ∫_0^∞ v/(2(1 + w)²) dw = (1/2)v for 0 < v < 2
and fV(v) = 0 otherwise. The marginal density of W is given by
fW(w) = ∫_0^2 v/(2(1 + w)²) dv = 1/(1 + w)² for w > 0
and fW (w) = 0 otherwise. Since fV,W (v, w) = fV (v)fW (w) for all
v, w, the random variables V and W are independent.
5.47 Since Z² has the χ₁² density (1/√(2π)) u^{−1/2} e^{−u/2} when Z is N(0, 1)
distributed and the random variables Z1² and Z2² are independent, the joint density
of X = Z1² and Y = Z2² is (1/(2π))(xy)^{−1/2} e^{−(x+y)/2} for x, y > 0. For the
transformation V = X + Y and W = X − Y, the inverse functions x = a(v, w) and
y = b(v, w) are a(v, w) = (1/2)(v + w) and b(v, w) = (1/2)(v − w). The Jacobian
J(v, w) is equal to −1/2. The joint density of V and W is
fV,W(v, w) = (1/(2π)) (v² − w²)^{−1/2} e^{−v/2} for v > 0, −v < w < v.
The random variables V and W are not independent.
5.48 Let V = Y√X and W = X. To find the joint density of V and W, we apply the
transformation formula. The inverse functions x = a(v, w) and y = b(v, w) are
a(v, w) = w and b(v, w) = v/√w. The Jacobian J(v, w) is equal to −1/√w and so the
joint density of V and W is given by
fV,W(v, w) = (1/π) w e^{−w(1 + v²/w)/2} (1/√w) = (1/π) √w e^{−w/2} e^{−v²/2} for v, w > 0
and fV,W(v, w) = 0 otherwise. The densities fV(v) = ∫_0^∞ fV,W(v, w) dw and
fW(w) = ∫_0^∞ fV,W(v, w) dv are given by
fV(v) = √(2/π) e^{−v²/2} for v > 0,  fW(w) = (1/√(2π)) w^{1/2} e^{−w/2} for w > 0.
The random variable V is distributed as |Z| with Z having the standard normal
distribution and W has a gamma distribution with shape parameter 3/2 and scale
parameter 1/2. Since fV,W(v, w) = fV(v)fW(w) for all v, w, the random variables V
and W are independent.
5.49 The inverse functions are a(v, w) = v e^{−(v²+w²)/4}/√(v² + w²) and
b(v, w) = w e^{−(v²+w²)/4}/√(v² + w²). The Jacobian is (1/2) e^{−(v²+w²)/2}. Since
fX,Y(x, y) = 1/π, we get
fV,W(v, w) = (1/π) × (1/2) e^{−(v²+w²)/2} for −∞ < v, w < ∞.
Noting that fV,W(v, w) = (1/√(2π)) e^{−v²/2} × (1/√(2π)) e^{−w²/2} for all v, w, it
follows that V and W are independent and N(0, 1) distributed.
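This computation underlies the polar (Marsaglia) method for generating standard normal numbers. A minimal sketch consistent with the inverse functions above (the seed and sample size are arbitrary test choices): a point (X, Y) uniform in the unit disk is transformed with S = X² + Y².

```python
import math
import random

random.seed(7)

def polar_normal_pair():
    """Draw (X, Y) uniformly in the unit disk by rejection from the square,
    then return V = X*sqrt(-2 ln S / S), W = Y*sqrt(-2 ln S / S) with
    S = X^2 + Y^2; this is the transformation inverted in Problem 5.49."""
    while True:
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        s = x * x + y * y
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x * factor, y * factor

samples = [polar_normal_pair()[0] for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum(v * v for v in samples) / len(samples) - mean * mean
```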
5.50 The joint density function f(r, θ) of (R, Θ) is given by 1 × 1/(2π) for 0 < r < 1
and 0 < θ < 2π. The inverse functions r = a(v, w) and θ = b(v, w) are given by
a(v, w) = √(v² + w²) and b(v, w) = arctan(w/v). Using the fact that arctan(x) has
1/(1 + x²) as derivative, it follows that the Jacobian is given by 1/√(v² + w²).
Noting that f(a(v, w), b(v, w)) is 1/(2π) if v² + w² ≤ 1 and 0 otherwise, it follows
from the two-dimensional transformation formula that the joint density fV,W(v, w) of
the random vector (V, W) is given by
fV,W(v, w) = (1/(2π)) (1/√(v² + w²)) for −1 < v, w < 1 with v² + w² ≤ 1
and fV,W(v, w) = 0 otherwise. To get the marginal density
fV(v) = ∫_{−√(1−v²)}^{√(1−v²)} fV,W(v, w) dw,
we use the following result from calculus:
∫_0^x dt/√(1 + t²) = ln(x + √(1 + x²)) for x > 0.
This leads after some algebra to
fV(v) = (1/π) ln( (1 + √(1 − v²))/|v| ) for −1 < v < 1.
The marginal density of W is of course the same as that of V. The intuitive
explanation of why (V, W) is not a random point inside the unit circle is as follows:
the closer a (small) rectangle within the unit circle is to the center of the circle, the
larger the probability of the point (V, W) falling in that rectangle.
5.51 The joint density of X and Y is (1/(Γ(α)Γ(β))) x^{α−1} y^{β−1} e^{−(x+y)}. The
inverse functions are a(v, w) = vw and b(v, w) = w(1 − v). The Jacobian
J(v, w) = w. Thus the joint density of V and W is
(1/(Γ(α)Γ(β))) (vw)^{α−1} (w(1 − v))^{β−1} e^{−w} w for 0 < v < 1 and w > 0.
This density can be rewritten as
(Γ(α + β)/(Γ(α)Γ(β))) v^{α−1} (1 − v)^{β−1} × (w^{α+β−1} e^{−w}/Γ(α + β))
for 0 < v < 1 and w > 0.
This shows that V and W are independent, where V has a beta distribution with parameters α and β, and W has a gamma distribution
with shape parameter α + β and scale parameter 1.
5.52 Since Z and Y are independent, the joint density of Z and Y is
f(z, y) = (1/√(2π)) e^{−z²/2} × (y^{ν/2−1} e^{−y/2})/(2^{ν/2} Γ(ν/2)) for −∞ < z < ∞ and y > 0.
The inverse functions z = a(v, w) and y = b(v, w) are given by z = w√(v/ν) and
y = v. The Jacobian is √(v/ν). Thus, by the two-dimensional transformation formula,
the joint density function of V and W is
fV,W(v, w) = (1/√(2π)) e^{−w²v/(2ν)} × (v^{ν/2−1} e^{−v/2})/(2^{ν/2} Γ(ν/2)) × √(v/ν) for v > 0.
Letting λ_w = 1 + w²/ν for any w, α = (1/2)(ν + 1), and using the change of variable
u = v/2, it follows that fW(w) = ∫_0^∞ fV,W(v, w) dv can be written as
fW(w) = (λ_w^{−α} Γ((ν + 1)/2))/(√(πν) Γ(ν/2)) ∫_0^∞ (λ_w^α u^{α−1} e^{−λ_w u})/Γ(α) du
= (λ_w^{−α} Γ((ν + 1)/2))/(√(πν) Γ(ν/2)),
showing the desired result.
Note: This problem shows that the two-dimensional transformation V = X and
W = h(X, Y) may be useful when you want to find the density of a function h(X, Y)
of a random vector (X, Y) with a given joint density function f(x, y).
5.53 For ∆x, ∆y sufficiently small,
P(x − (1/2)∆x ≤ U(1) ≤ x + (1/2)∆x, y − (1/2)∆y ≤ U(n) ≤ y + (1/2)∆y)
= n(n − 1)(y − x)^{n−2} ∆x∆y for 0 < x < y < 1.
Therefore the joint density of U(1) and U(n) is
f(x, y) = (n!/(n − 2)!) (y − x)^{n−2} for 0 < x < y < 1.
5.54 Let X = U(1) and Y = U(n) . The joint density of X and Y is given
by n(n − 1)(y − x)n−2 for 0 < x < y < 1, see Problem 5.53. For the
transformation V = Y and W = Y − X, the inverse functions are
a(v, w) = v − w and b(v, w) = v. The Jacobian J(v, w) = 1. Thus the
joint density of V and W is given by n(n − 1)wn−2 for w < v < 1 and
0 < w < 1. The marginal density of the range W is
∫_w^1 n(n − 1)w^{n−2} dv = n(n − 1)w^{n−2}(1 − w) for 0 < w < 1.
Note: an alternative derivation of the results of Problems 5.53 and 5.54 can be given,
as follows. It follows from P(X > x, Y ≤ y) = P(x < Ui ≤ y for i = 1, . . . , n) that
P(X > x, Y ≤ y) = (y − x)^n for 0 ≤ x ≤ y ≤ 1. Taking partial derivatives, we get
that the joint density function of X and Y is given by f(x, y) = n(n − 1)(y − x)^{n−2}
for 0 < x < y < 1 and f(x, y) = 0 otherwise. Next, we get from
P(Y − X > z) = n(n − 1) ∫_0^{1−z} dx ∫_{z+x}^1 (y − x)^{n−2} dy that
P(Y − X > z) = n ∫_0^{1−z} [(1 − x)^{n−1} − z^{n−1}] dx. This gives
P(Y − X > z) = 1 + (n − 1)z^n − nz^{n−1} for 0 ≤ z ≤ 1 and so the density of Y − X
is n(n − 1)z^{n−2}(1 − z) for 0 < z < 1.
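The formula P(Y − X > z) = 1 + (n − 1)zⁿ − nz^{n−1} can be checked by simulating the range of n uniforms directly; n = 5, z = 0.5, the seed, and the number of trials below are arbitrary test choices.

```python
import random

# Monte Carlo check of the range distribution of n uniform(0,1) numbers.
random.seed(1)
n, z = 5, 0.5
trials = 200_000
hits = 0
for _ in range(trials):
    u = [random.random() for _ in range(n)]
    if max(u) - min(u) > z:
        hits += 1

estimate = hits / trials
exact = 1 + (n - 1) * z**n - n * z**(n - 1)   # closed form from the solution
```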
5.55 The marginal distributions of X and Y are the Poisson distributions
pX(x) = e^{−1}/x! for x ≥ 0 and pY(y) = e^{−2} 2^y/y! for y ≥ 0, with
E(X) = σ²(X) = 1 and E(Y) = σ²(Y) = 2. We have
E(XY) = Σ_{x=0}^∞ Σ_{y=x}^∞ xy e^{−2}/(x!(y − x)!).
Noting that Σ_{y=x}^∞ y e^{−1}/(y − x)! = Σ_{y=x}^∞ (y − x + x) e^{−1}/(y − x)! = 1 + x,
we get E(XY) = 3. This gives ρ(X, Y) = 1/√2.
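A truncated double sum over the joint pmf confirms E(XY) = 3 and hence ρ(X, Y) = 1/√2.

```python
import math

# Truncated sums over p(x,y) = e^{-2}/(x!(y-x)!), y >= x >= 0.
e_xy = 0.0
for x in range(40):
    for y in range(x, x + 60):
        e_xy += x * y * math.exp(-2.0) / (math.factorial(x) * math.factorial(y - x))

# E(X) = var(X) = 1 and E(Y) = var(Y) = 2 for the Poisson marginals.
rho = (e_xy - 1.0 * 2.0) / math.sqrt(1.0 * 2.0)
```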
5.56 Using the marginal densities fX(x) = (4/3)(1 − x³) for 0 < x < 1 and
fY(y) = 4y³ for 0 < y < 1, we obtain
E(X) = 2/5, E(Y) = 4/5, σ²(X) = 14/225, and σ²(Y) = 2/75.
By E(XY) = ∫_0^1 dy ∫_0^y xy × 4y² dx = ∫_0^1 2y⁵ dy, we find E(XY) = 1/3. Hence
ρ(X, Y) = (E(XY) − E(X)E(Y))/(σ(X)σ(Y)) = 0.3273.
5.57 The variance of the portfolio’s return is
f²σA² + (1 − f)²σB² + 2f(1 − f)σAσBρAB.
Putting the derivative of this function equal to zero, it follows that the optimal
fraction f is (σB² − σAσBρAB)/(σA² + σB² − 2σAσBρAB).
5.58 Using the linearity of the expectation operator, it is readily verified
from the definition of covariance that
cov(X + Z, Y + Z) = cov(X, Y ) + cov(X, Z) + cov(Z, Y ) + cov(Z, Z).
Since the random variables X, Y , and Z are independent, we have
cov(X, Y ) = cov(X, Z) = cov(Z, Y ) = 0. Further, cov(Z, Z) = σ 2 (Z),
σ 2 (X + Z) = σ 2 (X) + σ 2 (Z) = 2, and σ 2 (Y + Z) = σ 2 (Y ) + σ 2 (Z) = 2.
Therefore ρ(X + Z, Y + Z) = 1/2.
5.59 (a) Let RA be the rate of return of stock A and RB be the rate of
return of stock B. Since RB = −RA + 14, the correlation coefficient is
−1.
(b) Let X = f RA + (1 − f)RB. Since X = (2f − 1)RA + 14(1 − f), the variance of X
is minimal for f = 1/2. Invest half of your capital in stock A and half in stock B.
Then the portfolio has a guaranteed rate of return of 7%.
5.60 We have E(XY) = 6 ∫_0^1 dx ∫_0^x xy(x − y) dy = 1/5. The marginal densities
of X and Y are fX(x) = 3x² for 0 < x < 1 and fY(y) = 3y² − 6y + 3 for 0 < y < 1.
Then E(X) = 3/4, E(Y) = 1/4, and σ(X) = σ(Y) = √(3/80). This leads to
ρ(X, Y) = 1/3.
5.61 The joint density of (X, Y) is f(x, y) = 1/π for (x, y) inside the circle C. Then,
E(XY) = (1/π) ∫∫_C xy dx dy.
Since the function xy has opposite signs on the quadrants of the circle, a symmetry
argument gives E(XY) = 0. Also, by the same argument, E(X) = E(Y) = 0. This
gives ρ(X, Y) = 0, although X and Y are dependent.
5.62 The joint density function of X and Y is 1/2 on the region D. Since the function
xy has opposite signs on the four triangles of the region D, we have E(XY) = 0.
Also, E(X) = E(Y) = 0. Therefore ρ(X, Y) = 0.
5.63 The joint density function fV,W (v, w) of V and W is most easily obtained from the relation
P(v < V < v + ∆v, w < W < w + ∆w)
= P(v < X < v + ∆v, w < Y < w + ∆w) + P(v < Y < v + ∆v, w < X < w + ∆w)
= 2∆v∆w
for 0 ≤ v < w ≤ 1 when ∆v, ∆w are small enough. This shows that fV,W(v, w) = 2
for 0 < v < w < 1. Next it follows that fV(v) = 2(1 − v) for 0 < v < 1 and
fW(w) = 2w for 0 < w < 1. This leads to E(V W) = 1/4, E(V) = 1/3, E(W) = 2/3,
and σ(V) = σ(W) = 1/(3√2). Thus ρ(V, W) = 1/2.
5.64 Let X denote the low points rolled and Y the high points rolled. We have
P(X = i, Y = i) = 1/36 for 1 ≤ i ≤ 6 and P(X = i, Y = j) = 2/36 for
1 ≤ i < j ≤ 6, see also Problem 5.1. The marginal distribution of X is given by
P(X = 1) = 11/36, P(X = 2) = 9/36, P(X = 3) = 7/36, P(X = 4) = 5/36,
P(X = 5) = 3/36, and P(X = 6) = 1/36, while the marginal distribution of Y is
P(Y = 1) = 1/36, P(Y = 2) = 3/36, P(Y = 3) = 5/36, P(Y = 4) = 7/36,
P(Y = 5) = 9/36, and P(Y = 6) = 11/36. Straightforward calculations yield
E(X) = 91/36, E(X²) = 301/36, E(Y) = 161/36, E(Y²) = 791/36, σ(X) = 1.40408,
σ(Y) = 1.40408, and E(XY) = Σ_{x=1}^6 Σ_{y=x}^6 xy P(X = x, Y = y) = 441/36.
It now follows that
ρ(X, Y) = (E(XY) − E(X)E(Y))/(σ(X)σ(Y))
= (441/36 − (91/36)(161/36))/(1.40408)² = 0.479.
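The moments can be reproduced exactly by enumerating the 36 equally likely outcomes of the two dice.

```python
# Enumerate the 36 equally likely rolls and compute the correlation
# between the low roll X and the high roll Y.
e_x = e_y = e_x2 = e_y2 = e_xy = 0.0
for a in range(1, 7):
    for b in range(1, 7):
        x, y = min(a, b), max(a, b)
        p = 1.0 / 36.0
        e_x += x * p
        e_y += y * p
        e_x2 += x * x * p
        e_y2 += y * y * p
        e_xy += x * y * p

rho = (e_xy - e_x * e_y) / ((e_x2 - e_x**2) ** 0.5 * (e_y2 - e_y**2) ** 0.5)
```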
5.65 The joint probability mass function of X and Y is given by
P(X = x, Y = y) = (1/100) × (1/x) for x = 1, 2, . . . , 100 and y = 1, . . . , x.
The marginal distributions of X and Y are given by
P(X = x) = 1/100 for 1 ≤ x ≤ 100 and P(Y = y) = (1/100) Σ_{x=y}^{100} 1/x for 1 ≤ y ≤ 100.
Next it follows that
E(XY) = Σ_{x=1}^{100} Σ_{y=1}^x xy × 1/(100x) = (1/100) Σ_{x=1}^{100} x(x + 1)/2 = 1717.
Further,
E(X) = (1/100) Σ_{x=1}^{100} x = 50.5,  E(X²) = (1/100) Σ_{x=1}^{100} x² = 3383.5,
E(Y) = (1/100) Σ_{x=1}^{100} (1/x) Σ_{y=1}^x y = (1/200) Σ_{x=1}^{100} (x + 1) = 25.75,
E(Y²) = (1/100) Σ_{x=1}^{100} (1/x) Σ_{y=1}^x y² = (1/600) Σ_{x=1}^{100} (x + 1)(2x + 1) = 1153.25.
Hence the standard deviations of X and Y are σ(X) = √(3383.5 − 50.5²) = 28.8661
and σ(Y) = √(1153.25 − 25.75²) = 22.1402 and so
ρ(X, Y) = (1717 − 50.5 × 25.75)/(28.8661 × 22.1402) = 0.652.
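All moments follow from a direct double loop over the joint pmf.

```python
# Direct computation over P(X=x, Y=y) = 1/(100x) for 1 <= y <= x <= 100.
e_x = e_y = e_x2 = e_y2 = e_xy = 0.0
for x in range(1, 101):
    for y in range(1, x + 1):
        p = 1.0 / (100.0 * x)
        e_x += x * p
        e_y += y * p
        e_x2 += x * x * p
        e_y2 += y * y * p
        e_xy += x * y * p

rho = (e_xy - e_x * e_y) / ((e_x2 - e_x**2) ** 0.5 * (e_y2 - e_y**2) ** 0.5)
```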
5.66 The joint density of X and Y is f(x, y) = 1/x for 0 < y < x < 1 and
f(x, y) = 0 otherwise. Thus, E(XY) = ∫_0^1 dx ∫_0^x xy (1/x) dy = 1/6. The marginal
densities of X and Y are fX(x) = 1 for 0 < x < 1 and fY(y) = ∫_y^1 (1/x) dx = −ln(y)
for 0 < y < 1. This leads to E(X) = 1/2, E(Y) = 1/4, σ(X) = √(1/12), and
σ(Y) = √(7/144). Therefore
ρ(X, Y) = (1/6 − (1/2) × (1/4)) / √((1/12) × (7/144)) = 0.655.
5.67 The joint probability mass function p(x, y) = P(X = x, Y = y) is given by
p(x, y) = r^{x−1} p (r + p)^{y−x−1} q for x < y and p(x, y) = r^{y−1} q (r + q)^{x−y−1} p
for x > y. It is a matter of some algebra to get
E(XY) = p [ 1/(q(1 − r)²) + (1 + r)/(1 − r)³ ] + q [ 1/(p(1 − r)²) + (1 + r)/(1 − r)³ ].
Also, E(X) = 1/p and E(Y) = 1/q. This leads to cov(X, Y) = −1/(1 − r).
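The closed form for E(XY) and the covariance −1/(1 − r) can be checked by a truncated double sum over this pmf; p = 0.3 and q = 0.5 below are arbitrary test values.

```python
# Truncated double sum over the pmf of Problem 5.67, checking E(XY)
# against the closed form and cov(X,Y) against -1/(1-r).
p, q = 0.3, 0.5
r = 1.0 - p - q

e_xy = 0.0
for x in range(1, 200):
    for y in range(1, 200):
        if x < y:
            prob = r**(x - 1) * p * (r + p)**(y - x - 1) * q
        elif x > y:
            prob = r**(y - 1) * q * (r + q)**(x - y - 1) * p
        else:
            continue   # X = Y is impossible
        e_xy += x * y * prob

closed_form = (p * (1/(q*(1-r)**2) + (1+r)/(1-r)**3)
               + q * (1/(p*(1-r)**2) + (1+r)/(1-r)**3))
cov = e_xy - (1.0/p) * (1.0/q)
```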
5.68 To obtain the joint density function of X and Y, note that for ∆x and ∆y small
P(x < X ≤ x + ∆x, y < Y ≤ y + ∆y) = 6∆x(y − x)∆y
for 0 ≤ x < y < 1, see also Example 5.3. Thus the joint density function of X and Y
is given by
f(x, y) = 6(y − x) for 0 < x < y < 1
and f(x, y) = 0 otherwise. Therefore
E(XY) = 6 ∫_0^1 dx ∫_x^1 xy(y − x) dy
= 6 ∫_0^1 [ (1/3)x(1 − x³) − (1/2)x²(1 − x²) ] dx = 1/5.
The marginal density functions of X and Y are given by
fX(x) = 3(1 − x)² for 0 < x < 1,  fY(y) = 3y² for 0 < y < 1.
Simple calculations give E(X) = 1/4, E(Y) = 3/4, σ(X) = √(1/10 − 1/16) = √(3/80),
and σ(Y) = √(3/5 − 9/16) = √(3/80). This leads to
ρ(X, Y) = (1/5 − (1/4) × (3/4)) / √((3/80) × (3/80)) = 1/3.
5.69 The “if part” follows from the relations cov(X, aX + b) = a cov(X, X) = aσ1²
and σ(aX + b) = |a|σ1. Suppose now that |ρ| = 1. Since
var(V) = (1/σ2²)σ2² + (ρ²/σ1²)σ1² − 2(ρ/(σ1σ2)) cov(X, Y) = 1 − ρ²,
we have var(V) = 0. This result implies that V is equal to a constant and this
constant is E(V) = E(Y)/σ2 − ρ E(X)/σ1. This shows that Y = aX + b, where
a = ρσ2/σ1 and b = E(Y) − a E(X).
5.70 Using the linearity of the expectation operator, it is readily verified from the
definition of covariance that
cov(aX + b, cY + d) = ac cov(X, Y).
Also, σ(aX + b) = |a|σ(X) and σ(cY + d) = |c|σ(Y). Therefore
ρ(aX + b, cY + d) = ρ(X, Y) if a and c have the same sign.
5.71 (a) Suppose that E(Y²) > 0 (if E(Y²) = 0, then Y = 0). Let
h(t) = E[(X − tY)²]. Then,
h(t) = E(X²) − 2tE(XY) + t²E(Y²).
The function h(t) is minimal for t = E(XY)/E(Y²). Substituting this t-value into
h(t) and noting that h(t) ≥ 0, the Cauchy–Schwarz inequality follows.
(b) The Cauchy–Schwarz inequality gives [cov(X, Y)]² ≤ var(X)var(Y) or,
equivalently, ρ²(X, Y) ≤ 1 and so −1 ≤ ρ(X, Y) ≤ 1.
(c) Noting that E(XY) = E(X) and E(Y²) = P(X > 0), the Cauchy–Schwarz
inequality gives [E(X)]² ≤ E(X²)P(X > 0). This shows that
P(X > 0) ≥ [E(X)]²/E(X²) and so P(X = 0) ≤ var(X)/E(X²).
5.72 (a) Since var(aX) = a²var(X) and cov(aX, bY) = ab cov(X, Y), it suffices to
verify the assertion for ai = 1 for all i. We use the method of induction to prove that
var(Σ_{j=1}^k Xj) = Σ_{j=1}^k var(Xj) + 2 Σ_{i=1}^{k−1} Σ_{j=i+1}^k cov(Xi, Xj)
for all k ≥ 2. For k = 2, the assertion has been proved in Rule 5.11. Suppose the
assertion has been proved for k = 2, . . . , m for some m ≥ 2. Then, by the induction
hypothesis and Rule 5.11 with X = X1 + · · · + Xm and Y = Xm+1, it follows that
var(Σ_{j=1}^{m+1} Xj) is given by
var(Σ_{j=1}^m Xj) + var(Xm+1) + 2 cov(Σ_{j=1}^m Xj, Xm+1)
= Σ_{j=1}^m var(Xj) + 2 Σ_{i=1}^{m−1} Σ_{j=i+1}^m cov(Xi, Xj) + var(Xm+1)
+ 2 Σ_{i=1}^m cov(Xi, Xm+1)
= Σ_{j=1}^{m+1} var(Xj) + 2 Σ_{i=1}^m Σ_{j=i+1}^{m+1} cov(Xi, Xj).
(b) Using the fact that σ²(aX) = a²σ²(X) for any constant a, it follows that
σ²(X̄n) = (1/n²)[nσ² + 2 × (1/2)n(n − 1)].
(c) Since cov(aX, bY) = ab cov(X, Y), it suffices to verify the assertion for ai = 1 for
all i and bj = 1 for all j. Using the linearity of the expectation operator, it is
immediately verified from the definition of covariance that
cov(X, Y + Z) = cov(X, Y) + cov(X, Z). It is readily verified by induction on m that
cov(X1, Σ_{j=1}^m Yj) = Σ_{j=1}^m cov(X1, Yj) for all m ≥ 1. Next, for fixed m, it can
be verified by induction on n that
cov(Σ_{i=1}^n Xi, Σ_{j=1}^m Yj) = Σ_{i=1}^n Σ_{j=1}^m cov(Xi, Yj).
(d) Using the result of (c) and the fact that cov(X, Y) = 0 for independent X and Y,
we get
cov(X̄n, Xi − X̄n) = (1/n) Σ_{k=1}^n cov(Xk, Xi) − (1/n²) Σ_{k=1}^n Σ_{j=1}^n cov(Xk, Xj)
= (1/n)σ²(Xi) − (1/n²) Σ_{k=1}^n σ²(Xk) = 0.
(e) Using the result of (c), we have
cov(X1 − X2, X1 + X2) = σ²(X1) + cov(X1, X2) − cov(X2, X1) − σ²(X2)
= σ²(X1) − σ²(X2) = 0.
5.73 Since cov(Xi, Xj) = cov(Xj, Xi), the matrix C is symmetric. To prove that C is
positive semi-definite, we must verify that Σ_{i=1}^n Σ_{j=1}^n ti tj σij ≥ 0 for all real
numbers t1, . . . , tn. This property follows from the formula for var(Σ_{i=1}^n ti Xi) in
Problem 5.72 and the fact that the variance is always nonnegative.
5.74 Since X and Y are independent, cov(X, Y) = 0. Therefore, using the result of
Problem 5.72(c), cov(X, V) = cov(X, X) + cov(X, Y) = σ²(X) = 1 > 0,
cov(V, W) = cov(X, Y) − a cov(X, X) + cov(Y, Y) − a cov(Y, X)
= −aσ²(X) + σ²(Y) = 1 − a > 0 for 0 < a < 1, and
cov(X, W) = cov(X, Y) − a cov(X, X) = −a < 0.
5.75 Let V = max(X, Y) and W = min(X, Y). Then E(V) = 1/√π and
E(W) = −1/√π, see Problem 4.69. Obviously, V W = XY and so
E(V W) = E(X)E(Y) = 0, by the independence of X and Y. Thus
cov(V, W) = 1/π.
We have min(X, Y) = −max(−X, −Y). Since the independent random variables −X
and −Y are distributed as X and Y, it follows that min(X, Y) has the same
distribution as min(−X, −Y) = −max(X, Y). Therefore σ²(V) = σ²(W). Also, by
V + W = X + Y, we have σ²(V + W) = σ²(X + Y) = 2. Using the relation
σ²(V + W) = σ²(V) + σ²(W) + 2 cov(V, W), we get σ²(V) + σ²(W) = 2 − 2/π and
so σ²(V) = σ²(W) = 1 − 1/π. This leads to
ρ(V, W) = (1/π)/(1 − 1/π) = 1/(π − 1).
Note: the result is also true when X and Y are N(μ, σ²) distributed. To see this, use
the relations
cov(V, W) = σ² cov( max((X − μ)/σ, (Y − μ)/σ), min((X − μ)/σ, (Y − μ)/σ) ) = σ²/π,
var(V) = var(W) = σ²(1 − 1/π).
5.76 Let V = min(X, Y) and W = max(X, Y). The random variable V is
exponentially distributed with parameter 2λ and has E(V) = 1/(2λ) and
σ²(V) = 1/(2λ)². The random variable W satisfies
P(W ≤ w) = (1 − e^{−λw}) × (1 − e^{−λw}) for w ≥ 0.
It is a matter of some algebra to get E(W) = 3/(2λ) and σ²(W) = 5/(4λ²). Noting
that E(V W) = E(XY) = E(X)E(Y) = 1/λ², we find
cov(V, W) = 1/λ² − (1/(2λ)) × (3/(2λ)) = 1/(4λ²).
This leads to ρ(V, W) = 1/√5.
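A quick Monte Carlo check of ρ(V, W) = 1/√5; the rate λ = 1, the seed, and the sample size are arbitrary test choices.

```python
import random

# Simulate min and max of two independent Exp(lambda) lifetimes and
# estimate their correlation, which should be close to 1/sqrt(5).
random.seed(3)
lam = 1.0
n = 200_000
vs, ws = [], []
for _ in range(n):
    x = random.expovariate(lam)
    y = random.expovariate(lam)
    vs.append(min(x, y))
    ws.append(max(x, y))

mv = sum(vs) / n
mw = sum(ws) / n
cov = sum(v * w for v, w in zip(vs, ws)) / n - mv * mw
sv = (sum(v * v for v in vs) / n - mv * mv) ** 0.5
sw = (sum(w * w for w in ws) / n - mw * mw) ** 0.5
rho = cov / (sv * sw)
```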
5.77 The linear least squares estimate of D1 given that D1 − D2 = d is equal to
E(D1) + ρ(D1 − D2, D1) (σ(D1)/σ(D1 − D2)) [d − E(D1 − D2)].
By the independence of D1 and D2, E(D1 − D2) = μ1 − μ2,
σ(D1 − D2) = √(σ1² + σ2²), and cov(D1 − D2, D1) = σ1². The linear least squares
estimate is
μ1 + (σ1²/(σ1² + σ2²))(d − μ1 + μ2).
Chapter 6
6.1 It suffices to prove the result for the standard bivariate normal distribution. Let
W = aX + bY. Let us first assume that b > 0. Then,
P(W ≤ w) = (1/(2π√(1 − ρ²))) ∫_{−∞}^∞ [ ∫_{−∞}^{(w−ax)/b} e^{−(1/2)(x² − 2ρxy + y²)/(1−ρ²)} dy ] dx
for −∞ < w < ∞. Differentiation yields that the density function of W is
fW(w) = (1/(2πb√(1 − ρ²))) ∫_{−∞}^∞ e^{−(1/2)[x² − 2ρx(w−ax)/b + (w−ax)²/b²]/(1−ρ²)} dx.
It is a matter of some algebra to obtain
fW(w) = (η√(2π))^{−1} exp(−(1/2)w²/η²) for −∞ < w < ∞,
where η = √(a² + b² + 2abρ). This result also applies when b ≤ 0. To see this, write
W = aX + (−b)(−Y) and note that (X, −Y) has the standard bivariate normal
density with correlation coefficient −ρ. We can now conclude that aX + bY is
normally distributed for all a, b if (X, Y) has a bivariate normal distribution. To find
P(X > Y) for (X, Y) having a bivariate normal distribution with parameters
(μ1, μ2, σ1², σ2², ρ), note that X − Y is N(μ1 − μ2, σ1² + σ2² − 2ρσ1σ2) distributed.
Therefore
P(X > Y) = 1 − Φ( −(μ1 − μ2)/(σ1² + σ2² − 2ρσ1σ2)^{1/2} ).
6.2 Let the random variables X and Y denote the rates of return on the stocks A
and B. Define the random variable V by V = (1/2)X + (1/2)Y. Since (X, Y) is
assumed to have a bivariate normal distribution, any linear combination of X and Y
is normally distributed. Hence, the random variable V is normally distributed with
expected value E(V) = (1/2)μ1 + (1/2)μ2 = 0.10 and variance
σ²(V) = (1/4)σ1² + (1/4)σ2² + 2 × (1/4)ρσ1σ2 = 0.004375. Thus,
P(V > 0.11) = 1 − Φ( (0.11 − 0.10)/√0.004375 ) = 0.4399.
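The numerical answer can be reproduced with the standard normal cdf written in terms of the error function, Φ(x) = (1/2)(1 + erf(x/√2)).

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# P(V > 0.11) for V ~ N(0.10, 0.004375)
p = 1.0 - norm_cdf((0.11 - 0.10) / math.sqrt(0.004375))
```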
6.3 Using the basic formula P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy, we have
P(Z ≤ z) = ∫_0^∞ dy ∫_{−∞}^{yz} f(x, y) dx + ∫_{−∞}^0 dy ∫_{yz}^∞ f(x, y) dx. Taking the
derivative, we get that the density of Z is
fZ(z) = ∫_0^∞ y f(yz, y) dy − ∫_{−∞}^0 y f(yz, y) dy = ∫_{−∞}^∞ |y| f(yz, y) dy.
Inserting the standard bivariate normal density for f(x, y), the desired result follows
after some algebra.
6.4 Using the decomposition formula for the standard bivariate normal density,
P(X ≤ a, Y ≤ b) = P( (X − μ1)/σ1 ≤ (a − μ1)/σ1, (Y − μ2)/σ2 ≤ (b − μ2)/σ2 )
= ∫_{−∞}^{(a−μ1)/σ1} (1/√(2π)) e^{−(1/2)x²} [ ∫_{−∞}^{(b−μ2)/σ2} (1/(τ√(2π))) e^{−(1/2)(y−ρx)²/τ²} dy ] dx,
where τ² = 1 − ρ². Since the second integral represents the probability that an
N(ρx, τ²) distributed random variable is smaller than or equal to (b − μ2)/σ2, we
obtain that P(X ≤ a, Y ≤ b) can be calculated as
(1/√(2π)) ∫_{−∞}^{(a−μ1)/σ1} Φ( ((b − μ2)/σ2 − ρx)/√(1 − ρ²) ) e^{−(1/2)x²} dx.
This one-dimensional integral is well suited for numerical integration.
For the special case of a = μ1 and b = μ2, we can give an explicit expression for
P(X ≤ a, Y ≤ b). Letting c = ρ/√(1 − ρ²) and using the standard method of
changing to polar coordinates, it follows that P(X ≤ μ1, Y ≤ μ2) can be evaluated as
(1/√(2π)) ∫_{−∞}^0 Φ( −ρx/√(1 − ρ²) ) e^{−(1/2)x²} dx
= (1/(2π)) ∫_{−∞}^0 ∫_{−∞}^{−cx} e^{−(1/2)(x² + y²)} dy dx
= (1/(2π)) ∫_{π−arctan(c)}^{3π/2} ∫_0^∞ e^{−(1/2)r²} r dr dφ.
Noting that ∫_0^∞ e^{−(1/2)r²} r dr = 1, we obtain
P(X ≤ μ1, Y ≤ μ2) = (1/(2π)) ∫_{π−arctan(c)}^{3π/2} dφ = 1/4 + (1/(2π)) arctan(c).
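As noted, the one-dimensional integral is well suited for numerical integration; the sketch below checks it against the closed form 1/4 + arctan(c)/(2π) for the test value ρ = 0.5, where both sides equal 1/3.

```python
import math

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rho = 0.5   # arbitrary test value
c = rho / math.sqrt(1.0 - rho * rho)

# Midpoint rule for (1/sqrt(2 pi)) * int_{-8}^{0} Phi(-rho x / sqrt(1-rho^2)) e^{-x^2/2} dx
m = 20_000
a, b = -8.0, 0.0
h = (b - a) / m
integral = 0.0
for i in range(m):
    x = a + (i + 0.5) * h
    integral += norm_cdf(-rho * x / math.sqrt(1.0 - rho * rho)) * math.exp(-0.5 * x * x) * h
integral /= math.sqrt(2.0 * math.pi)

closed_form = 0.25 + math.atan(c) / (2.0 * math.pi)
```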
6.5 Any linear combination of V and W is a linear combination of X and
Y and thus is normally distributed.
6.6 The vector (X, X + Y) has a bivariate normal distribution with parameters
μ1* = μ2* = 0, σ1* = 1, σ2* = √(2 + 2ρ), and ρ* = √((1 + ρ)/2).
6.7 Since X and V are linear combinations of X and Y, any linear combination of X
and V is normally distributed and so (X, V) has a bivariate normal distribution.
Noting that E(V) = 0 and σ²(V) = (1 + ρ² − 2ρ²)/(1 − ρ²) = 1, the random variable
V is N(0, 1) distributed like the random variable X. To prove the independence of X
and V, it suffices to verify that cov(X, V) = 0. This is immediate from
cov(X, V) = (cov(X, Y) − ρσ²(X))/√(1 − ρ²) = (ρ − ρ)/√(1 − ρ²) = 0.
6.8 The solution to this problem requires the fact that any linear combination of two independent normally distributed random variables is again
normally distributed (see Rule 8.6 in Chapter 8). Any linear combination of X1 and X2 is a linear combination of the independent and
normally distributed random variables Z1 and Z2 and is thus normally
distributed, showing that (X1 , X2 ) has a bivariate normal distribution.
Using Rule 5.11 and the relations cov(aX + b, cY + d) = ac cov(X, Y) for any
constants a, b, c, d and cov(X, V + W) = cov(X, V) + cov(X, W), we obtain
E(X1) = μ1, E(X2) = μ2, σ²(X1) = σ1², σ²(X2) = σ2²(ρ² + 1 − ρ²) = σ2²,
ρ(X1, X2) = cov(X1, X2)/(σ(X1)σ(X2)) = σ1σ2ρ cov(Z1, Z1)/(σ1σ2) = ρ,
where the expressions for σ²(X2) and cov(X1, X2) use the independence of Z1 and
Z2. Hence the parameters of the bivariate normal distribution of (X1, X2) are given
by (μ1, μ2, σ1², σ2², ρ).
6.9 Any linear combination of X + Y and X − Y is a linear combination
of X and Y and thus is normally distributed. This shows that the
random vector (X +Y, X −Y ) has a bivariate normal distribution. The
components X+Y and X−Y are independent if cov(X+Y, X−Y ) = 0.
Since
cov(X + Y, X − Y ) = cov(X, X) − cov(X, Y ) + cov(X, Y ) − cov(Y, Y ),
it follows that cov(X + Y, X − Y ) = σ 2 (X) − σ 2 (Y ) = 0.
6.10 Go through the path of length n in the opposite direction and next continue this
path with m steps.
6.11 Since Sn1 and Sn2 are approximately N(0, n/2) distributed for n large, it follows
from the results in Problem 4.68 that both |Sn1| and |Sn2| have the approximate
density (2/√(πn)) e^{−u²/n} with E(|Sn1|) = E(|Sn2|) ≈ √(n/π). This gives
E(Rn) ≈ 2√(n/π) for large n.
Also, |Sn1| and |Sn2| are nearly independent for n large. Using the convolution
formula in Section 5.2, the density of Rn is approximately
∫_0^r (2/√(πn)) e^{−u²/n} (2/√(πn)) e^{−(r−u)²/n} du.
This integral can be rewritten as
(4/√(2πn)) e^{−(1/2)r²/n} (1/√(2π)) ∫_{−r/√n}^{r/√n} e^{−(1/2)z²} dz,
showing that the density of Rn is approximately
(4/√(2πn)) e^{−(1/2)r²/n} [ Φ(r/√n) − Φ(−r/√n) ] for large n.
6.12 Noting that 70,000 km is equal to 7 × 10^10 millimeters, the equality
√(8m/(3π)) = (7 × 10^10)/(10^{−1})
shows that the average number of collisions that a photon undergoes before reaching
the sun’s surface is approximately equal to m = 5.773 × 10^23. A photon travels at a
speed of 300,000 km per second and thus the travel time of a photon between two
collisions is equal to 10^{−1}/(3 × 10^{11}) = 3.333 × 10^{−13} seconds. The average
travel time of a photon from the sun’s core to its surface is thus approximately equal
to
(5.773 × 10^{23}) × (3.333 × 10^{−13}) = 1.924 × 10^{11} seconds.
If you divide this by 365.25×24×3,600, then you find that the average
travel time is approximately 6,000 years. A random walk is not a very
fast way to get anywhere! Once it reaches the surface of the sun, it
takes a photon only 8 minutes to travel from the surface of the sun to
the earth (the distance from the sun to the earth is 149,600,000 km).
6.13 The random variables a1X1 + · · · + anXn and b1Y1 + · · · + bmYm are normally
distributed for any constants a1, . . . , an and b1, . . . , bm. Moreover, these random
variables are independent (any functions f and g result in independent random
variables f(X) and g(Y) if X and Y are independent). Since the sum of two
independent normally distributed random variables is again normally distributed (see
Rule 8.6), the sum a1X1 + · · · + anXn + b1Y1 + · · · + bmYm is normally distributed,
showing that (X, Y) has a multivariate normal distribution.
6.14 Let the random variables XA , XB , and XC denote the annual rates of
return on the stocks A, B, and C. Also, let’s express all amounts in
units of $1,000.
(a) Let the random variable W denote the portfolio’s value after one
year. Then, the random variable W can be written as
W = (1 + XA ) × 20 + (1 + XB ) × 20 + (1 + XC ) × 40 + 1.05 × 20.
Since the random vector (XA , XB , XC ) is assumed to have a trivariate
normal distribution, the random variable W is normally distributed
with expected value
E(W ) = 1.075 × 20 + 1.1 × 20 + 1.2 × 40 + 1.05 × 20 = 112.5
thousand dollars. The variance of W is computed as
400 × σ 2 (XA ) + 400 × σ 2 (XB ) + 1,600 × σ 2 (XC )
+ 2 × 400 × 0.7 × σ(XA ) × σ(XB ) + 2 × 800
× (−0.5) × σ(XA ) × σ(XC ) + 2 × 800 × (−0.3) × σ(XB ) × σ(XC ).
This leads to the standard deviation σ(W ) = 7.660 thousand dollars.
(b) Suppose that the fractions f1 , f2 , f3 , and f4 of the investor’s capital
are invested in the stocks A, B, C, and the riskless asset, respectively.
Then the expected value of the portfolio’s value (in units of $1,000)
after one year is given by
(1.075f1 + 1.1f2 + 1.2f3 + 1.05f4 ) × 100
and the variance is given by
(0.0049f12 +0.01f22 +0.04f32 +0.0098f1 f2 −0.014f1 f3 −0.012f2 f3 )×104 .
There are many choices for the values of the fi such that the resulting
portfolio has an expected return that is not less than that of the portfolio from question (a) but whose risk is smaller than the risk of the
portfolio of question (a). The optimal values of the fi are determined
by the optimization problem
Minimize
0.0049f12 + 0.01f22 + 0.04f32 + 0.0098f1 f2 − 0.014f1 f3 − 0.012f2 f3
subject to the constraints
1.075f1 + 1.1f2 + 1.2f3 + 1.05f4 ≥ 1.125,
f1 + f2 + f3 + f4 = 1 and fi ≥ 0 for i = 1, . . . , 4.
This optimization problem is a so-called quadratic programming problem that can be numerically solved by existing codes. The solution is
f1 = 0.381, f2 = 0.274, f3 = 0.345, and f4 = 0. The expected return is
again 112.5 thousand dollars, but the standard deviation is now 6.538
thousand dollars.
(c) Since W is normally distributed, the probability that the portfolio’s value next
year will be less than $112,500 is Φ(0) = 0.5 and the probability that the portfolio’s
value next year will be more than $125,000 is
1 − Φ( (125 − 112.5)/6.619 ) = 0.0295.
6.15 The value of the chi-square statistic is
(60,179 − 61,419.5)²/61,419.5 + (55,551 − 55,475.7)²/55,475.7 + · · ·
+ (60,745 − 59,438.2)²/59,438.2 + (61,334 − 61,419.5)²/61,419.5 = 642.46.
The probability that a χ11² distributed random variable takes on a
value larger than 642.46 is practically zero. This leaves no room at all
for doubt about the fact that birth dates are not uniformly distributed
over the year.
6.16 The observed value of test statistic D is 0.470. The probability
P(χ3² ≥ 0.470) = 0.925. The agreement with the theory is very good.
6.17 The parameter of the hypothesized Poisson distribution is estimated as
λ = 37/78 vacancies per year. The data are divided in three groups: years with 0
vacancies, with 1 vacancy, and with ≥ 2 vacancies. Letting pi = e^{−λ}λ^i/i!, the
expected number of years with 0 vacancies is 78p0 = 48.5381, with 1 vacancy is
78p1 = 23.0245, and with ≥ 2 vacancies is 78(1 − p0 − p1) = 6.4374. The chi-square
test statistic with 3 − 1 − 1 = 1 degree of freedom has the value 0.055. Since
P(χ1² > 0.055) = 0.8145, the Poisson distribution gives an excellent fit.
6.18 The parameter of the hypothesized Poisson distribution for the number of
deaths per corps-year is estimated as
λ = 196/(14 × 20) = 0.7.
The corps-years with 3 or more deaths are aggregated and so four possible data
groups are considered. In the table we give the observed number and the expected
number of corps-years with 0, 1, 2 and ≥ 3 deaths, where the expected number of
corps-years with exactly k deaths is computed as 280 × e^{−λ}λ^k/k!. The value of the
test statistic is calculated as
(144 − 139.0439)²/139.0439 + (91 − 97.3307)²/97.3307 + (32 − 34.0658)²/34.0658
+ (13 − 9.5596)²/9.5596 = 1.952.
The test statistic has approximately a chi-square distribution with 4 − 1 − 1 = 2
degrees of freedom. The probability P(χ2² ≥ 1.952) = 0.3768. The Poisson
distribution gives a good fit.

number of deaths   observed number of corps-years   expected number of corps-years
0                  144                              139.0439
1                  91                               97.3307
2                  32                               34.0658
≥ 3                13                               9.5596
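The chi-square statistic for this table is easy to recompute.

```python
import math

# Chi-square statistic for the horse-kick data: observed corps-year
# counts for 0, 1, 2 and >= 3 deaths, with lambda = 196/280 = 0.7.
observed = [144, 91, 32, 13]
lam = 196 / (14 * 20)
n = 280
expected = [n * math.exp(-lam) * lam**k / math.factorial(k) for k in range(3)]
expected.append(n - sum(expected))   # aggregated ">= 3" group

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```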
6.19 The parameter of the hypothesized Poisson distribution is estimated
as λ = 2.25. The matches with 5 or more goals are aggregated and so
six data groups are considered. The test statistic has approximately a
chi-square distribution with 6 − 1 − 1 = 4 degrees of freedom and its
value is 1.521. The probability P(χ4² > 1.521) = 0.8229. The Poisson
distribution gives an excellent fit.
6.20 The parameter of the hypothesized Poisson distribution is estimated as
λ = 10097/2608 = 3.8715. The intervals with 11 or more α-particles are aggregated
and so 12 data groups are considered. The expected number of time intervals with
exactly k particles is computed as 2608 × e^{−λ}λ^k/k! for k = 0, 1, . . . , 10, while
2608 × Σ_{j=11}^∞ e^{−λ}λ^j/j! is computed as the expected number of time intervals
with 11 or more particles. The value of the test statistic is calculated as 12.364. The
test statistic has approximately a chi-square distribution with 12 − 1 = 11 degrees of
freedom. The probability P(χ11² ≥ 12.364) = 0.3369. Thus the Poisson distribution
gives a good fit.
6.21 Think of n = 98, 364,597 independent repetitions of a chance experiment with seven possible outcomes, where the outcomes 1, 2, . . . , 6
correspond to the prizes 1, 2, . . . , 6 and the outcome 7 means that
none of these six prizes was won. Denote by pj the probability of outcome j for j = 1, . . . , 7. The probabilities pj are easiest obtained by
imagining a vase with 6 red balls ( the regular numbers from the lotto
drawing), 1 blue number (the bonus number from the lotto drawing)
and 38 black numbers (the other numbers). Then,
the probability p_j of winning prize j follows for j = 1, . . . , 6 as a hypergeometric-type probability for drawing 6 balls from this vase, and

p_7 = 1 − \sum_{j=1}^{6} p_j.
Letting N_1 = 2, N_2 = 6, N_3 = 9, N_4 = 35, N_5 = 411, N_6 = 2,374, and
N_7 = 98,361,760 and assuming that the tickets are randomly filled in,
the value of the test statistic D is calculated as

D = \sum_{j=1}^{7} (N_j − np_j)^2/(np_j) = 20.848.
Next we calculate the probability P(χ^2_6 > 20.848) = 0.00195. This
small value is a strong indication that the tickets are not randomly
filled in. People do not choose their numbers randomly, but often
use birth dates, lucky numbers, arithmetical sequences, etc., to
choose their lottery numbers. The same conclusion was reached in a similar
study by D. Kadell and D. Ylvisaker entitled "Lotto play: the good,
the fair and the truly awful," Chance 4 (1991): 22-25.
Chapter 7
7.1 To specify P(X = x | Y = 2) for x = 0, 1, we use the results from
Table 5.1. From this table, we get P(X = 0, Y = 2) = (1 − p)^2,
P(X = 1, Y = 2) = 2p(1 − p)^2 and P(Y = 2) = (1 − p)^2 + 2p(1 − p)^2.
Therefore

P(X = 0 | Y = 2) = 1/(1 + 2p)   and   P(X = 1 | Y = 2) = 2p/(1 + 2p).
7.2 First the marginal mass function of Y must be determined. Using
Newton's binomial formula, it follows that

P(Y = y) = \sum_{x=0}^{y} \binom{y}{x} (1/6)^x (1/3)^{y−x} = (1/3)^y \sum_{x=0}^{y} \binom{y}{x} (1/2)^x
         = (1/3)^y (1/2 + 1)^y = (1/2)^y   for y = 1, 2, . . . .

That is, the random variable Y is geometrically distributed with parameter 1/2. Hence the conditional mass function of X given that Y = y
is given by

P(X = x | Y = y) = \binom{y}{x} (1/6)^x (1/3)^{y−x} / (1/2)^y = \binom{y}{x} (1/2)^x (2/3)^y   for x = 0, 1, . . . , y.

Note: Using the identity \sum_{y=x}^{∞} \binom{y}{x} a^{y−x} = (1 − a)^{−x−1} for |a| < 1,
it is readily verified that the marginal mass function of X is given
by P(X = 0) = 1/2 and P(X = x) = (3/2)(1/4)^x for x = 1, 2, . . . . The
conditional mass function of Y given that X = 0 is

P(Y = y | X = 0) = 2 (1/3)^y   for y ≥ 1.

For x ≥ 1 the conditional mass function of Y given that X = x is

P(Y = y | X = x) = \binom{y}{x} (2/3)^{x+1} (1/3)^{y−x}   for y ≥ x.
7.3 Since P(X = 1, Y = 2) = (1/6) × (1/6), P(X = x, Y = 2) = (4/6) × (1/6) × (5/6)^{x−3} × (1/6)
for x ≥ 3, and P(Y = 2) = (5/6) × (1/6), it follows that

P(X = x | Y = 2) = 1/5 for x = 1,   and   P(X = x | Y = 2) = (4/30)(5/6)^{x−3} for x ≥ 3.

Similarly,

P(X = x | Y = 20) = (1/5)(4/5)^{x−1} for 1 ≤ x ≤ 19,   and   P(X = x | Y = 20) = (1/6)(4/5)^{19}(5/6)^{x−21} for x ≥ 21.
7.4 In Problem 5.5 it was shown that the joint probability mass function
of X and Y is

P(X = x, Y = y) = (y − x − 1)/120   for 1 ≤ x ≤ 8 and x + 2 ≤ y ≤ 10,

and the marginal distributions of X and Y are

P(X = x) = (10 − x)(9 − x)/240   for 1 ≤ x ≤ 8,
P(Y = y) = (y − 1)(y − 2)/240   for 3 ≤ y ≤ 10.

It now follows that, for fixed y, the conditional probability mass function of X given that Y = y is

P(X = x | Y = y) = 2(y − x − 1)/((y − 1)(y − 2))   for x = 1, . . . , y − 2.

For fixed x, the conditional probability mass function of Y given that
X = x is

P(Y = y | X = x) = 2(y − x − 1)/((10 − x)(9 − x))   for y = x + 2, . . . , 10.
7.5 The joint mass function of X and Y is

P(X = x, Y = y) = (5/6)^{x−1}(1/6) \binom{x}{y} (1/2)^x   for 0 ≤ y ≤ x, x ≥ 1.

The marginal mass function of Y is

P(Y = y) = (1/5) \sum_{x=y}^{∞} \binom{x}{y} (5/12)^x = (12/35)(5/7)^y   for y ≥ 1,
P(Y = 0) = \sum_{x=1}^{∞} (5/6)^{x−1}(1/6)(1/2)^x = 1/7.

For fixed y ≥ 1,

P(X = x | Y = y) = \binom{x}{y} (7/12)^{y+1} (5/12)^{x−y}   for x ≥ y.

Further, P(X = x | Y = 0) is (7/5)(5/12)^x for x ≥ 1.
7.6 The joint probability mass function of X and Y is given by

P(X = i, Y = i) = 1/36   for 1 ≤ i ≤ 6,
P(X = i, Y = j) = 2/36   for 1 ≤ i < j ≤ 6.

The marginal mass functions of X and Y are

P(X = i) = 1/36 + 2(6 − i)/36   and   P(Y = j) = 1/36 + 2(j − 1)/36

for 1 ≤ i ≤ 6 and 1 ≤ j ≤ 6. This leads to

P(X = i | Y = j) = 1/(1 + 2(j − 1)) for i = j,   and   2/(1 + 2(j − 1)) for i < j,
P(Y = j | X = i) = 1/(1 + 2(6 − i)) for j = i,   and   2/(1 + 2(6 − i)) for j > i.
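Reading X as the smallest and Y as the largest of two die rolls (the natural interpretation of the joint mass function above), the conditional probabilities can be checked by brute-force enumeration of the 36 equally likely outcomes; a small sketch:

```python
from fractions import Fraction
from collections import Counter

# X = smallest, Y = largest of two fair dice: enumerate all 36 equally likely rolls.
joint = Counter()
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(min(d1, d2), max(d1, d2))] += Fraction(1, 36)

# Conditional mass function of X given Y = j, for j = 4 say.
j = 4
p_y = sum(p for (i, jj), p in joint.items() if jj == j)
cond = {i: joint[(i, j)] / p_y for i in range(1, j + 1)}
# Formula above: 2/(1 + 2(j-1)) = 2/7 for i < j and 1/(1 + 2(j-1)) = 1/7 for i = j.
print(cond[1], cond[4])   # 2/7 1/7
```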
7.7 The joint mass function of X and Y is

P(X = x, Y = y) = \binom{24}{x} (1/6)^x (5/6)^{24−x} \binom{x}{y} (1/6)^y (5/6)^{x−y}

for 0 ≤ x ≤ 24 and 0 ≤ y ≤ x. Thus, the marginal mass function of Y
is

P(Y = y) = (1/6)^y (5/6)^{24−y} \sum_{x=y}^{24} \binom{24}{x} \binom{x}{y} (1/6)^x   for 0 ≤ y ≤ 24.

For fixed y, the conditional mass function of X is

P(X = x | Y = y) = \binom{24}{x} \binom{x}{y} (1/6)^x / \sum_{k=y}^{24} \binom{24}{k} \binom{k}{y} (1/6)^k   for y ≤ x ≤ 24.
7.8 The joint probability mass function of X and Y is given by

P(X = x, Y = y) = \binom{13}{x} \binom{13}{y} \binom{26}{13−x−y} / \binom{52}{13}   for x + y ≤ 13.

The marginal mass functions of X and Y are

P(X = x) = \binom{13}{x} \binom{39}{13−x} / \binom{52}{13}   and   P(Y = y) = \binom{13}{y} \binom{39}{13−y} / \binom{52}{13}

for x = 0, 1, . . . , 13 and y = 0, 1, . . . , 13. Thus the conditional mass
functions are given by

P(X = x | Y = y) = \binom{13}{x} \binom{26}{13−x−y} / \binom{39}{13−y},   P(Y = y | X = x) = \binom{13}{y} \binom{26}{13−y−x} / \binom{39}{13−x}.

7.9 The marginal densities of X and Y are given by

f_X(x) = \int_0^{∞} x e^{−x(y+1)} dy = e^{−x}   for x > 0,
f_Y(y) = \int_0^{∞} x e^{−x(y+1)} dx = 1/(1 + y)^2   for y > 0.

Therefore the conditional density functions of X and Y are

f_X(x | y) = (y + 1)^2 x e^{−x(y+1)}   for x > 0,
f_Y(y | x) = x e^{−xy}   for y > 0.

Further, P(Y > 1 | X = 1) = \int_1^{∞} f_Y(y | 1) dy = \int_1^{∞} e^{−y} dy = e^{−1}.
7.10 The marginal densities are f_X(x) = \int_0^1 (x − y + 1) dy = x + 0.5 for
0 < x < 1 and f_Y(y) = \int_0^1 (x − y + 1) dx = 1.5 − y for 0 < y < 1. Thus
the conditional density functions of X and Y are

f_X(x | y) = (x − y + 1)/(1.5 − y)   and   f_Y(y | x) = (x − y + 1)/(x + 0.5)

for 0 < x < 1 and 0 < y < 1. This gives

P(X > 0.5 | Y = 0.25) = (4/5) \int_{0.5}^1 (x + 0.75) dx = 0.6,
P(Y > 0.5 | X = 0.25) = (4/3) \int_{0.5}^1 (1.25 − y) dy = 1/3.
7.11 The marginal probability densities of X and Y are

f_X(x) = \int_0^x (1/x) dy = 1   for 0 < x < 1,
f_Y(y) = \int_y^1 (1/x) dx = −ln(y)   for 0 < y < 1.

Therefore, for any given y with 0 < y < 1, the conditional density
f_X(x | y) = −1/(x ln(y)) for y ≤ x < 1. For any given x with 0 < x < 1,
the conditional density f_Y(y | x) = 1/x for 0 < y ≤ x.
7.12 The marginal density functions of X and Y are

f_X(x) = \int_x^{∞} e^{−y} dy = e^{−x}   for x > 0,
f_Y(y) = \int_0^y e^{−y} dx = y e^{−y}   for y > 0.

Thus the conditional density functions of X and Y are f_X(x | y) = 1/y
for 0 < x < y and f_Y(y | x) = e^{−(y−x)} for y > x.
7.13 We have f_X(x) = 1 for 0 < x < 1 and f_Y(y | x) = 1/x for 1 − x < y < 1.
By f(x, y) = f_X(x)f_Y(y | x), the joint density of X and Y is f(x, y) =
1/x for 0 < x < 1 and 1 − x < y < 1. Hence, using the basic formula
P((X, Y) ∈ C) = \int\int_C f(x, y) dx dy, we get

P(X + Y > 1.5) = \int_{0.5}^1 dx \int_{1.5−x}^1 (1/x) dy = 0.5 ln(0.5) + 0.5 = 0.1534,
P(Y > 0.5) = \int_0^{0.5} dx \int_{1−x}^1 (1/x) dy + \int_{0.5}^1 dx \int_{0.5}^1 (1/x) dy = 0.5 − 0.5 ln(0.5) = 0.8466.
7.14 Since f_Y(y) = \int_0^{1−y} 3(x + y) dx = (3/2)(1 − y)^2 + 3y(1 − y) for 0 < y < 1,
we have for fixed y that

f_X(x | y) = (x + y) / ((1/2)(1 − y)^2 + y(1 − y))   for 0 < x < 1 − y.
7.15 By f(x, y) = f_X(x)f_Y(y | x), the joint density of X and Y is f(x, y) =
2x × (1/x) = 2 for 0 < y ≤ x < 1. The marginal density of Y is f_Y(y) =
2(1 − y) for 0 < y < 1. Using again f(x, y) = f_Y(y)f_X(x | y), we get

f_X(x | y) = 1/(1 − y)   for y ≤ x < 1,

which is the uniform density on (y, 1).
7.16 The marginal density functions of X and Y are

f_X(x) = \int_0^x (2y/x^2) dy = 1   and   f_Y(y) = \int_y^1 (2y/x^2) dx = 2 − 2y

for 0 < x < 1 and 0 < y < 1. Therefore the conditional density
functions of X and Y are

f_X(x | y) = (2y/x^2)/(2 − 2y)   for y < x < 1,   f_Y(y | x) = 2y/x^2   for 0 < y < x.

To simulate a random observation from f(x, y), we use the representation f(x, y) = f_X(x)f_Y(y | x). A random observation from f_X(x) is
obtained by generating a random number from (0, 1). Since for fixed x
the cumulative distribution function P(Y ≤ y | x) = y^2/x^2 for 0 ≤ y ≤ x
is easily inverted, a random observation from f_Y(y | x) = 2y/x^2 can be
obtained by using the inverse-transformation method. Therefore, a
random observation from f(x, y) can be simulated as follows: (i) generate two random numbers u_1 and u_2 from (0, 1), (ii) output x := u_1
and y := u_1 √u_2.
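The two-step scheme can be sketched in Python; the comparison with E(X) = 1/2 and E(Y) = 1/3 (the latter follows from E(Y | X = x) = 2x/3) is an added sanity check, not part of the solution:

```python
import random

def sample_xy(rng):
    # Step (i): x is a random number from (0, 1), i.e. a draw from f_X.
    # Step (ii): invert P(Y <= y | x) = y^2/x^2, giving y = x * sqrt(u).
    u1, u2 = rng.random(), rng.random()
    return u1, u1 * u2 ** 0.5

rng = random.Random(42)
samples = [sample_xy(rng) for _ in range(200_000)]
mean_x = sum(x for x, _ in samples) / len(samples)
mean_y = sum(y for _, y in samples) / len(samples)
print(round(mean_x, 3), round(mean_y, 3))   # close to 0.5 and 0.333
```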
7.17 Put for abbreviation

P_∆y(x | y) = P(X = x | y − (1/2)∆y ≤ Y ≤ y + (1/2)∆y).

Then

P_∆y(x | y) = P(y − (1/2)∆y ≤ Y ≤ y + (1/2)∆y | X = x) P(X = x) / P(y − (1/2)∆y ≤ Y ≤ y + (1/2)∆y).

Thus, for continuity points y, we have

P_∆y(x | y) ≈ p_X(x) f_Y(y | x) ∆y / (f_Y(y) ∆y) = p_X(x) f_Y(y | x) / f_Y(y).

Define p_X(x | y) as lim_{∆y→0} P_∆y(x | y). Then, for fixed y, p_X(x | y) as a
function of x is proportional to p_X(x) f_Y(y | x). The proportionality
constant is the reciprocal of \sum_x p_X(x) f_Y(y | x). This explains the
definition of the conditional mass function of X.
7.18 Assuming that the random noise N is independent of X,

P(Y ≤ y | X = 1) = P(N ≤ y − 1) = P((N − 0)/σ ≤ (y − 1)/σ) = Φ((y − 1)/σ)   for −∞ < y < ∞.

In the same way,

P(Y ≤ y | X = −1) = Φ((y + 1)/σ)   for −∞ < y < ∞.

Differentiation gives

f_Y(y | x) = (1/(σ√(2π))) e^{−(1/2)(y−1)^2/σ^2} for x = 1,   and   (1/(σ√(2π))) e^{−(1/2)(y+1)^2/σ^2} for x = −1.

Next apply the general formula

p_X(x | y) = p_X(x) f_Y(y | x) / \sum_u p_X(u) f_Y(y | u).

This formula was derived in Problem 7.17. Hence we find

P(X = 1 | Y = y) = p e^{−(1/2)(y−1)^2/σ^2} / (p e^{−(1/2)(y−1)^2/σ^2} + (1 − p) e^{−(1/2)(y+1)^2/σ^2}).
7.19 Let Y be the time needed to process a randomly chosen claim. For
fixed 0 < x < 1, we have f_Y(y | x) = 1/x for x < y < 2x. By the
relation f(x, y) = f_Y(y | x)f_X(x), the joint density function of X and
Y satisfies f(x, y) = 1.5(2 − x) for 0 < x < 1 and x < y < 2x, and
f(x, y) = 0 otherwise. The sought probability is P((X, Y) ∈ C) with
C = {(x, y) : 0 ≤ x ≤ 1, x ≤ y ≤ min(2x, 1)} and is evaluated as

\int_0^1 dx \int_x^{min(2x,1)} 1.5(2 − x) dy = \int_0^{0.5} 1.5(2 − x)x dx + \int_{0.5}^1 1.5(2 − x)(1 − x) dx = 0.5625.
7.20 By the independence of X and Y, the joint density function f(x, y) of
X and Y is given by

f(x, y) = λe^{−λx} λe^{−λy}   for x, y > 0.

Let V = X and W = X + Y. To obtain f_{V,W}(v, w), we use transformation rule 5.7. The functions a(v, w) and b(v, w) are a(v, w) = v
and b(v, w) = w − v. The Jacobian is equal to 1. Hence the joint
density of V and W is

f_{V,W}(v, w) = λe^{−λv} λe^{−λ(w−v)} = λ^2 e^{−λw}   for 0 < v < w < ∞.

The marginal density of W is given by

f_W(w) = \int_0^w λ^2 e^{−λw} dv = wλ^2 e^{−λw}   for w > 0.

Hence, for any fixed w, the conditional density of V given that W = w
is

f_V(v | w) = λ^2 e^{−λw} / (wλ^2 e^{−λw}) = 1/w   for 0 < v < w.

This verifies that the conditional density of X given that X + Y = u
is the uniform density on (0, u).
7.21 We have P(N = k) = \int_0^1 P(N = k | X_1 = u) du, by the law of
conditional probability. Thus

P(N = k) = \int_0^1 u^{k−2}(1 − u) du = 1/(k(k − 1))   for k = 2, 3, . . . .

The expected value of N is equal to \sum_{k=2}^{∞} 1/(k − 1) = ∞.
7.22 The number p is a random observation from a random variable U that
is uniformly distributed on (0, 1). By the law of conditional probability,

P(X = k) = \int_0^1 P(X = k | U = p) dp = \int_0^1 \binom{n}{k} p^k (1 − p)^{n−k} dp

for k = 0, 1, . . . , n. Using the fact that the beta integral \int_0^1 x^{r−1}(1 − x)^{s−1} dx
is equal to (r − 1)!(s − 1)!/(r + s − 1)! for positive integers r
and s, we next obtain

P(X = k) = \binom{n}{k} k!(n − k)!/(n + 1)! = 1/(n + 1)   for k = 0, 1, . . . , n.
7.23 Condition on the unloading time. By the law of conditional probability, the probability of no breakdown is given by

\int_{−∞}^{∞} e^{−λy} (1/(σ√(2π))) e^{−(1/2)(y−µ)^2/σ^2} dy = e^{−µλ + (1/2)σ^2λ^2}.
7.24 Let the random variable R denote the number of passengers who make
a reservation for a given trip. Then P(R = r) = 1/6 for r = 5, . . . , 10.
By the law of conditional probability,

P(V = j) = \sum_{r=j}^{10} \binom{r}{j} 0.8^j 0.2^{r−j} P(R = r)   for j = 0, 1, . . . , 10,
P(W = k) = \sum_{j=0}^{10} P(W = k | V = j) P(V = j)   for k = 0, 1, 2, 3.

The probability P(V = j) has the values 0.0001, 0.0014, 0.0121,
0.0547, 0.1397, 0.2059, 0.1978, 0.1748, 0.1286, 0.0671, and 0.0179 for
j = 0, 1, . . . , 10. The probability mass function of W is

P(W = 0) = 0.25[P(V = 0) + · · · + P(V = 9)] + P(V = 10) = 0.2634,
P(W = 1) = 0.45[P(V = 0) + · · · + P(V = 8)] + 0.75 P(V = 9) = 0.4621,
P(W = 2) = 0.20[P(V = 0) + · · · + P(V = 7)] + 0.30 P(V = 8) = 0.1959,
P(W = 3) = 0.10[P(V = 0) + · · · + P(V = 7)] = 0.0786.
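With R uniform on 5, . . . , 10 and each reserved passenger showing up independently with probability 0.8 (the reading used above), the listed values of P(V = j) can be reproduced directly:

```python
from math import comb

def p_v(j):
    # Condition on R = r reservations; the number of shows is binomial(r, 0.8).
    return sum(comb(r, j) * 0.8 ** j * 0.2 ** (r - j) / 6
               for r in range(max(j, 5), 11))

probs = [p_v(j) for j in range(11)]
print([round(p, 4) for p in probs])
# [0.0001, 0.0014, 0.0121, 0.0547, 0.1397, 0.2059, 0.1978, 0.1748, 0.1286, 0.0671, 0.0179]
```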
7.25 Let Y be the outcome of the first roll of the die. Then, by P(X = k) = \sum_{i=1}^{6} P(X = k | Y = i) P(Y = i), we get

P(X = k) = \sum_{i=1}^{5} \binom{i}{k} (1/6)^k (5/6)^{i−k} × (1/6) + \binom{6}{k−1} (1/6)^{k−1} (5/6)^{7−k} × (1/6)

for k = 0, 1, . . . , 7, with the conventions \binom{i}{k} = 0 for k > i and \binom{6}{−1} = 0.
This probability has the numerical values 0.4984, 0.3190, 0.1293,
0.0422, 0.0096, 0.0014, 1.1 × 10^{−4}, and 3.6 × 10^{−6}.
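A direct evaluation of this sum reproduces the listed values; the conventions C(i, k) = 0 for k > i and C(6, −1) = 0 are handled by the summation range and the k ≥ 1 guard. A sketch:

```python
from math import comb

def p_x(k):
    # First term: condition on first roll i = 1..5; comb(i, k) = 0 for k > i.
    first = sum(comb(i, k) * (1 / 6) ** k * (5 / 6) ** (i - k) * (1 / 6)
                for i in range(max(k, 1), 6))
    # Second term: first roll is a 6; the convention C(6, -1) = 0 gives 0 for k = 0.
    second = (comb(6, k - 1) * (1 / 6) ** (k - 1) * (5 / 6) ** (7 - k) * (1 / 6)
              if k >= 1 else 0.0)
    return first + second

print([round(p_x(k), 4) for k in range(6)])   # [0.4984, 0.319, 0.1293, 0.0422, 0.0096, 0.0014]
```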
7.26 Let f(x) be the gamma density with shape parameter r and scale parameter (1 − p)/p. Then P(N = j) = \int_0^{∞} P(N = j | X = x) f(x) dx,
by the law of conditional probability. Thus

P(N = j) = \int_0^{∞} e^{−x} (x^j/j!) ((1 − p)/p)^r (x^{r−1}/(r − 1)!) e^{−x(1−p)/p} dx
         = ((r + j − 1)!/(j!(r − 1)!)) p^j (1 − p)^r \int_0^{∞} ((1/p)^{r+j} x^{r+j−1}/(r + j − 1)!) e^{−x/p} dx.

Since the gamma density (1/p)^{r+j} x^{r+j−1} e^{−x/p}/(r + j − 1)! integrates to 1 over
(0, ∞), it next follows that

P(N = j) = \binom{r+j−1}{r−1} p^j (1 − p)^r   for j = 0, 1, . . . .

This can be written as P(N = k − r) = \binom{k−1}{r−1} p^{k−r} (1 − p)^r for k =
r, r + 1, . . . . In other words, the random variable N + r has a negative
binomial distribution with parameters r and p (the random variable
N gives the number of failures before the rth success occurs).
7.27 By the law of conditional probability, the probability of having k red
balls among the r selected balls is

\sum_{n=0}^{B} (\binom{n}{k} \binom{B−n}{r−k} / \binom{B}{r}) \binom{B}{n} p^n (1 − p)^{B−n}.

This probability can be simplified to \binom{r}{k} p^k (1 − p)^{r−k}. This result can
be directly seen by assuming that the B balls are originally non-colored
and giving each of the r balls chosen the color red with probability p.
7.28 Denote by f_1(x) and f_2(x) the probability densities of the random
variables X_1 and X_2. Let us first point out that pf_1(x) + (1 − p)f_2(x)
is not the probability density of W = pX_1 + (1 − p)X_2, as many
students erroneously believe. As a counterexample, take p = 1/2 and
assume that X_1 and X_2 are independent random variables having the
uniform distribution on (0, 1). Then pf_1(x) + (1 − p)f_2(x) is the uniform
density on (0, 1), but (1/2)X_1 + (1/2)X_2 has a triangular density rather than
a uniform density.

The random variable V is distributed as X_1 with probability p and as
X_2 with probability 1 − p. Then, by the law of conditional probability,

P(V ≤ x) = pP(X_1 ≤ x) + (1 − p)P(X_2 ≤ x)

and so pf_1(x) + (1 − p)f_2(x) is the probability density of V. The density
of W = pX_1 + (1 − p)X_2 is the N(pµ_1 + (1 − p)µ_2, p^2σ_1^2 + (1 − p)^2σ_2^2)
density when the N(µ_1, σ_1^2) distributed X_1 and the N(µ_2, σ_2^2) distributed
X_2 are independent of each other. This result uses the fact that the sum
of two independent normal random variables is normally distributed, see Rule 8.6.
7.29 By the law of conditional probability,

P(B^2 ≥ 4AC) = \int_0^1 P(AC ≤ b^2/4) db = \int_0^1 db \int_0^1 P(C ≤ b^2/(4a)) da
             = \int_0^1 db [\int_0^{b^2/4} da + \int_{b^2/4}^1 (b^2/(4a)) da]
             = \int_0^1 [b^2/4 − (b^2/4) ln(b^2/4)] db = 5/36 + (1/6) ln(2) = 0.2544.
7.30 Let the random variable X_1 be the first number picked. Also, let the
random variable χ be 1 if you have to pick exactly two numbers and
be 0 otherwise. By the law of conditional probability, the probability
that you have to pick exactly two numbers is

P(χ = 1) = \int_{0.5}^1 P(χ = 1 | X_1 = x_1) dx_1 = \int_{0.5}^1 (0.5/x_1) dx_1 = −(1/2) ln(1/2).

Similarly, we get that the probability that you have to pick exactly
three numbers is equal to

\int_{0.5}^1 dx_1 \int_{0.5}^{x_1} (1/x_1)(0.5/x_2) dx_2 = (1/4)[ln(1/2)]^2.

In general, for any 0 < a < 1, the probability that exactly n numbers must be picked in order to obtain a number less than a is
a(−ln(a))^{n−1}/(n − 1)! for n = 1, 2, . . . (by writing a as e^{−(−ln(a))}, this
probability mass function can be seen as a shifted Poisson distribution). The expected value of the number of picks is 1 − ln(a).

Note: A funny illustration of the discrete version of the problem is as
follows. It is your birthday. You are asked to blow out all of the c
burning candles on your birthday cake. The number of burning candles that expire when you blow while there are still d burning candles
has the discrete uniform distribution on 0, 1, . . . , d. Then, using the
law of conditional expectation, it is readily verified that the expected
value of the number of attempts to blow out all c candles is 1 + \sum_{k=1}^{c} 1/k.
However, the probability mass function of the number of attempts is
rather difficult to calculate.
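The integrals above model picks that are uniform on (0, previous pick); under that reading the result E(number of picks) = 1 − ln(a) is easy to check by simulation:

```python
import random
from math import log

def picks_needed(a, rng):
    # Keep picking, each pick uniform on (0, previous pick), until a pick falls below a.
    upper, n = 1.0, 0
    while True:
        n += 1
        x = rng.uniform(0.0, upper)
        if x < a:
            return n
        upper = x

a = 0.5
rng = random.Random(7)
trials = [picks_needed(a, rng) for _ in range(200_000)]
print(round(sum(trials) / len(trials), 3))   # close to 1 - ln(0.5) = 1.693
```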
7.31 The expected number of crossings of the zero level during the first
n jumps is \sum_{k=1}^{n−1} E(I_k), where E(I_k) = P(I_k = 1). Denote by S_k
the position of the particle just before the (k + 1)th jump. Then S_k
is the sum of k independent standard normally distributed random
variables and is thus normally distributed (see also Rule 8.6). The
random variable S_k has expected value 0 and variance k. Thus S_k has
the density function (1/√(2πk)) e^{−(1/2)x^2/k}. By conditioning on S_k, we get that
P(I_k = 1) is equal to

\int_0^{∞} (1/√(2πk)) e^{−(1/2)x^2/k} dx \int_{−∞}^{−x} (1/√(2π)) e^{−(1/2)y^2} dy
+ \int_{−∞}^0 (1/√(2πk)) e^{−(1/2)x^2/k} dx \int_{−x}^{∞} (1/√(2π)) e^{−(1/2)y^2} dy.

Using polar coordinates in order to evaluate these integrals, it next
follows that P(I_k = 1) is equal to (1/π)(π/2 − arctg(√k)) = (1/π) arctg(1/√k).
Therefore

\sum_{k=1}^{n−1} E(I_k) = (1/π) \sum_{k=1}^{n−1} arctg(1/√k).

Note: An asymptotic expansion for this sum is (2/π)√n + c + 1/(6π√n), where
c = −0.68683... .
7.32 Take the minute as time unit and represent the period between 5.45
and 6 p.m. by the interval (0, 15). If you arrive at time point x at
the bus stop, you will take bus number 1 home if no bus number
3 arrives in the next 15 − x time units. The probability of this
happening is e^{−λ(15−x)} with λ = 1/15. This follows from the fact that
the exponential distribution is memoryless. By conditioning on your
arrival epoch having the uniform distribution on (0, 15) with density
f(x) = 1/15 for 0 < x < 15 and using the law of conditional probability,
it now follows that

P(you take bus 1 home) = \int_0^{15} e^{−(15−x)/15} f(x) dx = (1/15) \int_0^{15} e^{−(15−x)/15} dx = 1 − 1/e.

An intuitive explanation of why this probability is larger than 1/2 is as
follows. If you arrive at a random point in time at the bus stop, your
average waiting time for bus number 3 is 15 minutes (by the memoryless property of the exponential distribution), while your average
waiting time for bus number 1 is 7.5 minutes.
7.33 It suffices to find P(a, b) for a ≥ b. By symmetry, P(a, b) = 1 − P(b, a)
for a ≤ b. For fixed a and b with a ≥ b, let the random variables S_A
and S_B be the total scores of the players A and B. Let f_A(s) be the
probability density of S_A. Then, by the law of conditional probability,

P(a, b) = \int_0^{1+a} P(A beats B | S_A = s) f_A(s) ds.

By conditioning on the outcome of the first draw of player A, it follows
that

P(S_A ≤ s) = \int_0^s (s − u) du   for 0 < s ≤ a,
P(S_A > s) = 1 − s + \int_0^a (1 − (s − u)) du   for a < s ≤ 1,
P(S_A > s) = \int_{s−1}^a (1 − (s − u)) du   for 1 < s < 1 + a.

Differentiation gives that the density function f_A(s) of S_A is s for
0 < s ≤ a, 1 + a for a < s ≤ 1, and 1 + a − s for 1 < s < 1 + a. The
distribution of S_B follows by replacing a by b in the distribution of
S_A. Next it is a matter of tedious algebra to obtain

P(a, b) = 1/2 − (1/6)(a − b)(a^2 b + a^2 + ab^2 + b^2 + ab + 3a − 3)   for a ≥ b.

Also, by a symmetry argument,

P(a, b) = 1/2 + (1/6)(b − a)(b^2 a + b^2 + ba^2 + a^2 + ba + 3b − 3)   for a ≤ b,

using the fact that P(a, b) = 1 − P(b, a) for a ≤ b. Let a_0 be the
optimal threshold value of player A. Then P(a_0, b) ≥ 0.5 for all b
with P(a_0, b) = 0.5 for b = a_0. This leads to the equation

2a_0^3 + 3a_0^2 + 3a_0 − 3 = 0.

The solution of this equation is a_0 = 0.5634. If player A uses this
threshold value, his win probability is at least 50%, whatever threshold
player B uses.
7.34 It should be clear that each player uses a strategy of the following
form: choose a new number if the original number is below a given
threshold, otherwise keep the original number. Let P(a, b) denote the
winning probability of player A when player A uses a threshold a and
player B uses a threshold b. Player A wants to use the threshold
a = a*, where a* attains the maximum in max_a min_b P(a, b). By the
law of conditional probability, we have for any given a, b that

P(a, b) = \int_0^1 dx \int_0^1 a(x, y) dy,

where a(x, y) is player A's winning probability if the original number
of player A is x and the original number of player B is y. Consider
first the case of a ≥ b. Then, for x ≤ a we have that a(x, y) = 1 − y
for y > b and a(x, y) = 1/2 for y < b, while for x > a we have that
a(x, y) = 1 for b < y < x, a(x, y) = 0 for y > x and a(x, y) = x for
y < b. This leads to

P(a, b) = 1/2 + (1/2)(a − b − a^2 + ab + ab^2 − a^2 b)   for a ≥ b.

By a symmetry argument, we have P(a, b) = 1 − P(b, a) for a ≤ b.
This gives

P(a, b) = 1/2 + (1/2)(a − b + b^2 − ab + ab^2 − a^2 b)   for a ≤ b.

It is not necessary to invoke a numerical method for getting the number
a that attains the maximum in max_a min_b P(a, b). It is not difficult to
verify by analytical means that this number is given by

a* = (1/2)(√5 − 1).

To prove this, we write P(a, b) as P(a, b) = 1/2 + (1/2)(a − b)(1 − a − ab)
for a ≥ b and P(a, b) = 1/2 + (1/2)(a − b)(1 − b − ab) for a ≤ b. Using these
expressions and the fact that a* = (1/2)(√5 − 1) satisfies a* × a* + a* = 1,
it is directly verified that P(a*, b) > 1/2 both for b > a* and for b < a*
(of course, P(a*, b) = 1/2 for b = a*). Hence, if player A chooses his
threshold as (1/2)(√5 − 1) he will win with a probability of more than
50% unless player B also uses the threshold (1/2)(√5 − 1), in which case
player A wins with a probability of exactly 50%.
7.35 Denote by S_3(a) [C_3(a)] the probability of player A being overall winner if player A gets the score a at the first draw and stops [continues]
after the first draw. By conditioning on the outcome of the second
draw of player A,

C_3(a) = \int_0^{1−a} S_3(a + v) dv   for 0 < a < 1.

The function S_3(a) is increasing with S_3(0) = 0 and S_3(1) = 1, whereas
the function C_3(a) is decreasing with C_3(0) > 0 and C_3(1) = 0. Let
a_3 be defined as the solution to the equation S_3(a) = C_3(a); then a_3 is
the optimal stopping point for the first player A in the three-player
game. It will be shown that

S_3(a) = a^4 for a ≥ a_2   and   C_3(a) = (1/5)(1 − a^5) for a ≥ a_2,

where a_2 (= 0.53209) is the optimal stopping point for the first player
in the two-player game. Taking for granted that a_3 ≥ a_2, it follows
that the optimal stopping point a_3 is the solution to the equation

a^4 = (1/5)(1 − a^5)

on the interval (a_2, 1). This solution is given by a_3 = 0.64865.
The calculation of the overall winning probability of player A is less
simple and requires S_3(a) for all 0 < a < 1.
To derive S_3(a) for all 0 < a < 1, we first observe that in the three-player game the optimal strategy of the second player B is to stop
after the first draw if and only if the score of this draw exceeds both
the final score of player A and a_2 = 0.53209. Thus, given that player
A's final score is a with a > a_2, the probability of player B getting a
score below a in the first draw and next losing from player A in the
second draw is equal to \int_0^a (a − x + (1 − (1 − x))) dx = a^2. Hence

P(A will beat B | A's final score is a) = a^2   for a > a_2.

To obtain P(A will beat B | A's final score is a) for 0 < a < a_2, we
have to add the probability that B's score is between a and a_2 in the
first draw and exceeds 1 after the second draw. This probability is
given by \int_a^{a_2} x dx = (1/2)a_2^2 − (1/2)a^2. Thus, for 0 < a < a_2,

P(A will beat B | A's final score is a) = a^2 + (1/2)a_2^2 − (1/2)a^2.

Obviously, the conditional probability that player A will beat player C
given that player A's final score is a and player A has already beaten
player B is equal to a^2 for all 0 < a < 1. This gives

S_3(a) = a^2 × a^2   for a > a_2,
S_3(a) = (a^2 + (1/2)a_2^2 − (1/2)a^2) × a^2   for 0 < a < a_2.
Next we evaluate C_3(a). By

C_3(a) = \int_0^{1−a} S_3(a + v) dv   for 0 < a < 1,

we obtain

C_3(a) = (1/5)(1 − a^5)   for a ≥ a_2,
C_3(a) = (1/10)(a_2^5 − a^5) + (1/6)(a_2^5 − a_2^2 a^3) + (1/5)(1 − a_2^5)   for 0 < a < a_2.

This result completes the derivation of the critical level a_3, but also
enables us to calculate P_3(A), which is defined as the probability of
player A winning under optimal play of each of the players. By the
law of conditional probability,

P_3(A) = \int_0^{a_3} C_3(a) da + \int_{a_3}^1 S_3(a) da = 0.3052.

Let P_3(B) be the probability of player B being the overall winner when
all players act optimally. To calculate P_3(B), it is convenient to define
F(a) as the probability that the final score of player A will be no more
than a for 0 ≤ a ≤ 1. Then, by conditioning on the result of the first
draw of player A,

F(0) = \int_0^{a_3} (1 − (1 − x)) dx = (1/2)a_3^2,
F(a) = F(0) + \int_0^a dx \int_0^{a−x} dy = F(0) + (1/2)a^2   for 0 < a < a_3.

For a > a_3,

F(a) = F(0) + \int_{a_3}^a dx + \int_0^{a_3} dx \int_0^{a−x} dy = F(0) − (1/2)a_3^2 − a_3 + (1 + a_3)a.

The cumulative distribution function F(a) has the mass (1/2)a_3^2 at a = 0,
the density a for 0 < a < a_3 and the density 1 + a_3 for a_3 < a < 1.
Next we calculate P_3(B) by conditioning on the final score of player A.
Using the fact that P_2(A) = 0.453802 is the overall winning probability
of the first player in the two-player game and noting that in the two-player game the first player wins with probability v^2 when the final
score of the first player is v, it follows that P_3(B) is given by

(1/2)a_3^2 P_2(A) + \int_{a_3}^1 (1 + a_3) [\int_0^x dv \int_{x−v}^{1−v} (v + w)^2 dw + \int_x^1 v^2 dv] dx
+ \int_{a_2}^{a_3} x [\int_0^x dv \int_{x−v}^{1−v} (v + w)^2 dw + \int_x^1 v^2 dv] dx
+ \int_0^{a_2} x [\int_0^x dv \int_{x−v}^{1−v} (v + w)^2 dw + \int_x^{a_2} dv \int_0^{1−v} (v + w)^2 dw + \int_{a_2}^1 v^2 dv] dx.

After some algebra, this leads to

P_3(B) = (1/2)a_3^2 P_2(A)
+ (1 + a_3)[1/3 − (1/3)a_3 + 1/6 − (1/6)a_3^2 − 1/12 + (1/12)a_3^4 − 1/15 + (1/15)a_3^5]
+ (1/6)(a_3^2 − a_2^2) + (1/9)(a_3^3 − a_2^3) − (1/15)(a_3^5 − a_2^5) − (1/18)(a_3^6 − a_2^6)
+ (1/9)a_2^3 − (1/18)a_2^6 + (1/6)a_2^3 − (1/9)a_2^3 − (1/24)a_2^6 + (1/60)a_2^5
+ (1/3)(1 − a_2^3) × (1/2)a_2^2.
This gives P3 (B) = 0.3295. Finally, the probability of player C being
the overall winner is 1 − P3 (A) − P3 (B) = 0.3653. By simulation, we
found that the final score of the winning player in the three-player
game has the expected value 0.836 and the standard deviation 0.149
(in the two-player game the final score of the winning player has the
expected value 0.753 and the standard deviation 0.209).
Note: For the s-player game the optimal strategy for the players is
easy to characterize: the first player A stops after the first draw if
and only if this draw gives a score that exceeds a_s; the second player
stops after the first draw if and only if this draw gives a score that
exceeds both a_{s−1} and the final score of the first player; generally, the
ith player stops after the first draw only if this draw gives a score that
exceeds both a_{s−i+1} and the largest of the final scores obtained
so far. For any s ≥ 2, the critical level a_s is the solution of

a^{2(s−1)} = (1/(2s − 1))(1 − a^{2s−1})

on the interval (a_{s−1}, 1), where a_1 = 0. For the general s-player game,
the calculation of the overall win probability of each of the players is
rather cumbersome. We have used computer simulation to obtain the
overall win probabilities for the cases of s = 4 and s = 5. The overall
win probabilities of the players are 0.231, 0.242, 0.255, and 0.271 when
s = 4 and are 0.186, 0.192, 0.199, 0.207, and 0.215 when s = 5. The
optimal stopping point a_s has the values 0.71145 and 0.75225 for s = 4
and s = 5.
7.36 The conditional expected value of the number of consolation prizes
given that no main prize has been won is given by

E(Y | X = 0) = \sum_{y=0}^{3} y P(Y = y | X = 0) = \sum_{y=0}^{3} y \binom{15}{y} \binom{30}{3−y} / \binom{45}{3} = 1.
7.37 Since

f_Y(y | s) = 3(s + 1)^3/(s + y)^4   for y > 1,

we have

E(Y | X = s) = \int_1^{∞} y (3(s + 1)^3/(s + y)^4) dy = 1 + (s + 1)^3 \int_1^{∞} (s + y)^{−3} dy = 1 + (1/2)(s + 1).
7.38 The joint probability mass function of X and Y is given by

P(X = x, Y = y) = (y − x − 1)/\binom{100}{3}   for 1 ≤ x ≤ 98, x + 2 ≤ y ≤ 100.

The marginal distributions of X and Y are given by

P(X = x) = (100 − x)(99 − x)/(2\binom{100}{3}),   P(Y = y) = (y − 1)(y − 2)/(2\binom{100}{3})

for x = 1, 2, . . . , 98 and y = 3, . . . , 100. Next it follows that

P(X = x | Y = y) = 2(y − x − 1)/((y − 1)(y − 2)),
P(Y = y | X = x) = 2(y − x − 1)/((100 − x)(99 − x)).

Hence

E(X | Y = y) = (2/((y − 1)(y − 2))) \sum_{x=1}^{y−2} x(y − x − 1) = (1/3)y,
E(Y | X = x) = (2/((100 − x)(99 − x))) \sum_{y=x+2}^{100} y(y − x − 1) = (1/3)(x + 202).
7.39 The joint density of X and Y is f(x, y) = 6(y − x) for 0 < x < y < 1,
as follows from P(x < X ≤ x + ∆x, y < Y ≤ y + ∆y) = 6∆x(y − x)∆y
for ∆x, ∆y small; see also Example 5.3. This gives f_X(x) = 3(1 − x)^2
for 0 < x < 1 and f_Y(y) = 3y^2 for 0 < y < 1. Thus

f_X(x | y) = 6(y − x)/(3y^2)   for 0 < x < y,
f_Y(y | x) = 6(y − x)/(3(1 − x)^2)   for x < y < 1.

This gives

E(X | Y = y) = \int_0^y x (6(y − x)/(3y^2)) dx = (1/3)y,
E(Y | X = x) = \int_x^1 y (6(y − x)/(3(1 − x)^2)) dy = (2 + x)/3.
7.40 For ease, consider the case that X and Y are continuously distributed.
If X and Y are independent, then their joint density function f(x, y)
satisfies f(x, y) = f_X(x)f_Y(y). Then, by f_X(x | y) = f(x, y)/f_Y(y), it
follows that f_X(x | y) = f_X(x) and so

E(X | Y = y) = \int x f_X(x | y) dx = \int x f_X(x) dx = E(X).
7.41 Noting that X can be written as X = (1/2)(X + Y) + (1/2)(X − Y), it follows
that

E(X | X + Y = v) = (1/2)v + (1/2)E(X − Y | X + Y = v).

By Problem 6.9, X + Y and X − Y are independent and so E(X − Y |
X + Y = v) = E(X − Y). Also, E(X − Y) = µ_1 − µ_2. Thus

E(X | X + Y = v) = (1/2)v + (1/2)(µ_1 − µ_2).

Note: The conditional distribution of X given that X + Y = v is the
normal distribution with mean (1/2)(µ_1 − µ_2 + v) and variance (1/2)σ^2(1 − ρ).
This result follows from the relation

P(X ≤ x | X + Y = v) = P((1/2)(X − Y) + (1/2)v ≤ x)

and the fact that X − Y is N(µ_1 − µ_2, 2σ^2(1 − ρ)) distributed.
7.42 The marginal density of X is f_X(x) = \int_x^1 dy = 1 − x for 0 < x < 1
and f_X(x) = \int_{−x}^1 dy = 1 + x for −1 < x < 0. The marginal density of
Y is f_Y(y) = \int_{−y}^y dx = 2y for 0 < y < 1. Therefore, for any 0 < y < 1,

f_X(x | y) = 1/(2y)   for −y < x < y

and f_X(x | y) = 0 otherwise. Thus

E(X | Y = y) = \int_{−y}^y x (1/(2y)) dx = 0.

For any 0 < x < 1, we have

f_Y(y | x) = 1/(1 − x)   for x < y < 1

and f_Y(y | x) = 0 otherwise. For any −1 < x < 0, we have

f_Y(y | x) = 1/(1 + x)   for −x < y < 1

and f_Y(y | x) = 0 otherwise. For 0 < x < 1, we have

E(Y | X = x) = \int_x^1 y (1/(1 − x)) dy = (1/2)(1 + x).

For −1 < x < 0, we have

E(Y | X = x) = \int_{−x}^1 y (1/(1 + x)) dy = (1/2)(1 − x).
7.43 Let X be the number of trials until the first success in a sequence of
Bernoulli trials and N be the number of successes in the first n trials.
Then, for 1 ≤ r ≤ n and 1 ≤ j ≤ n − r + 1,

P(X = j, N = r) = (1 − p)^{j−1} p \binom{n−j}{r−1} p^{r−1} (1 − p)^{n−j−(r−1)}.

Since P(N = r) = \binom{n}{r} p^r (1 − p)^{n−r}, we get

P(X = j | N = r) = \binom{n−j}{r−1} / \binom{n}{r}.

Thus, by E(X | N = r) = \sum_{j=1}^{n−r+1} j P(X = j | N = r), we find

E(X | N = r) = (n + 1)/(r + 1)   for 1 ≤ r ≤ n.
7.44 Suppose that r dice are rolled. Define the random variable X as the
total number of points gained. Let the random variable I = 1 if none
of the r dice shows a 1 and I = 0 otherwise. Then E(X) = E(X |
I = 0)P(I = 0) + E(X | I = 1)P(I = 1) = E(X | I = 1)(5/6)^r. Under
the condition that I = 1 the random variable X is distributed as the
sum of r independent random variables X_k each having the discrete
uniform distribution on 2, . . . , 6. Each of the X_k has expected value
4. Thus

E(X) = 4r(5/6)^r.

The function 4r(5/6)^r is maximal for both r = 5 and r = 6. The maximal
value is 8.0376. To find σ(X), use E(X^2) = E(X^2 | I = 1)P(I = 1)
together with

E(X^2 | I = 1) = E[(X_1 + · · · + X_r)^2] = rE(X_1^2) + r(r − 1)E^2(X_1).

We have E(X_1) = 4 and E(X_1^2) = \sum_{k=2}^{6} k^2 (1/5) = 18. This leads to
E(X) = 8.0376 and σ(X) = 10.008 for r = 5 and E(X) = 8.0376
and σ(X) = 11.503 for r = 6.
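The mean and standard deviation follow from the two moment formulas; a quick numerical check:

```python
def dice_game_stats(r):
    # E(X) = 4r(5/6)^r and E(X^2) = (5/6)^r (18r + 16r(r - 1)).
    q = (5 / 6) ** r
    mean = 4 * r * q
    second_moment = q * (18 * r + 16 * r * (r - 1))
    return mean, (second_moment - mean ** 2) ** 0.5

for r in (5, 6):
    mean, sd = dice_game_stats(r)
    print(r, round(mean, 4), round(sd, 3))
# 5 8.0376 10.008
# 6 8.0376 11.503
```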
7.45 Denote by the random variables X and Y the zinc content and the
iron content. The marginal density of Y is

f_Y(y) = \int_2^3 (1/75)(5x + y − 30) dx = (1/75)(y − 17.5)   for 20 < y < 30,

and so, for any 2 < x < 3, we have f_X(x | y) = (5x + y − 30)/(y − 17.5). Thus

E(X | Y = y) = \int_2^3 x f_X(x | y) dx = (15y − 260)/(6(y − 17.5))   for 20 < y < 30.
7.46 The insurance payout is a mixed random variable: it takes on one of the
discrete values 0 and 2×106 or a value in the continuous interval (0, 2×
106 ). To calculate its expected value we condition on the outcome of
the random variable I, where I = 0 if no claim is made and I = 1
otherwise. The insurance payout is 0 if I takes on the value 0, and
otherwise the insurance payout is distributed as min(2×106 , D), where
the random variable D has an exponential distribution with parameter
λ = 1/10^6. Thus, by conditioning,

    E(insurance payout) = 0.9 × 0 + 0.1 × E[min(2 × 10^6, D)].

Using the substitution rule, it follows that

    E[min(2 × 10^6, D)] = ∫_0^∞ min(2 × 10^6, x) λe^{−λx} dx
                        = ∫_0^{2×10^6} x λe^{−λx} dx + ∫_{2×10^6}^∞ (2 × 10^6) λe^{−λx} dx.

This leads after some calculations to

    E[min(2 × 10^6, D)] = 10^6 (1 − e^{−2}) = 864,665 dollars.

Hence, we can conclude that E(insurance payout) = $86,466.50.
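A quick numerical cross-check of the integral (a sketch; the midpoint Riemann sum and the truncation at t = 2 × 10^7 are ad-hoc choices):

```python
from math import exp

lam = 1e-6
cap = 2e6
closed_form = 1e6 * (1 - exp(-2))   # the value derived above

# Crude midpoint-rule evaluation of E[min(cap, D)] for D exponential(lam).
dt = 100.0
riemann = sum(min(cap, (k + 0.5) * dt) * lam * exp(-lam * (k + 0.5) * dt) * dt
              for k in range(200_000))

expected_payout = 0.1 * closed_form
```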
7.47 (a) We have

    P(X ≤ x | a < Y < b) = (1/P(a < Y < b)) ∫_{−∞}^x dv ∫_a^b f(v, w) dw.

Differentiation yields that ∫_a^b f(x, w) dw / P(a < Y < b) is the conditional
probability density of X given that a < Y < b. In the same way, we get that

    ∫_{−∞}^x f(x, w) dw / P(X > Y)

is the conditional density of X given that X > Y.
(b) For (X, Y) having a standard bivariate normal distribution with
correlation coefficient ρ, the formula for E(X | a < Y < b) is obvious
from (a). To get E(X | X > Y), note that X = (1/2)(X + Y) + (1/2)(X − Y).
Therefore

    E(X | X > Y) = (1/2)E(X + Y | X − Y > 0) + (1/2)E(X − Y | X − Y > 0).

By the independence of X + Y and X − Y (see Problem 6.9), it follows
that E(X + Y | X − Y > 0) = E(X + Y) = 0. Since X − Y is N(0, σ^2)
distributed with σ^2 = 2(1 − ρ), we have

    E(X − Y | X − Y > 0) = (1/P(X − Y > 0)) (1/(σ√(2π))) ∫_0^∞ v e^{−v^2/(2σ^2)} dv,

which yields, using P(X − Y > 0) = 1/2,

    E(X − Y | X − Y > 0) = 2√((1 − ρ)/π),

and thus E(X | X > Y) = √((1 − ρ)/π).

7.48 For any 0 ≤ x ≤ 1,

    P(U_1 ≤ x | U_1 > U_2) = P(U_1 ≤ x, U_1 > U_2)/P(U_1 > U_2)
                           = [∫_0^x du_1 ∫_0^{u_1} du_2] / (1/2) = x^2,

    P(U_2 ≤ x | U_1 > U_2) = P(U_2 ≤ x, U_1 > U_2)/P(U_1 > U_2)
                           = [∫_0^x du_2 ∫_{u_2}^1 du_1] / (1/2) = 2(x − (1/2)x^2).

Thus the conditional densities of U_1 and U_2 given that U_1 > U_2 are
2x and 2(1 − x) for 0 < x < 1 and zero otherwise. This gives

    E(U_1 | U_1 > U_2) = ∫_0^1 x · 2x dx = 2/3,
    E(U_2 | U_1 > U_2) = ∫_0^1 x · 2(1 − x) dx = 1/3.
7.49 By the law of conditional probability, the probability of running out
of oil is given by (2/3)P(X_1 > Q) + (1/3)P(X_2 > Q), where X_i is N(µ_i, σ_i^2)
distributed. The stockout probability can be evaluated as

    (2/3)[1 − Φ((Q − µ_1)/σ_1)] + (1/3)[1 − Φ((Q − µ_2)/σ_2)].

By the law of conditional expectation, the expected value of the shortage is

    (2/3)E[(X_1 − Q)^+] + (1/3)E[(X_2 − Q)^+],

where x^+ = max(x, 0). The expected value of the shortage can be
evaluated as

    (2/3) σ_1 I((Q − µ_1)/σ_1) + (1/3) σ_2 I((Q − µ_2)/σ_2),

where I(k) is the so-called normal loss integral

    I(k) = (1/√(2π)) ∫_k^∞ (x − k) e^{−x^2/2} dx.

The normal loss integral can be evaluated as

    I(k) = (1/√(2π)) e^{−k^2/2} − k[1 − Φ(k)].

The expected value of the number of gallons left over equals the expected
value of the shortage minus ((2/3)µ_1 + (1/3)µ_2 − Q).
7.50 Denote by the random variable R the number of people who wish to
make a reservation. The random variable R is Poisson distributed with
expected value λ = 170. Let the random variable S be the number of
people who show up for a given flight and the random variable D be
the number of people who show up for a given flight but cannot be
seated. By the law of conditional expectation,

    E(S) = Σ_{r=0}^∞ E(S | R = r)P(R = r) = Σ_{r=0}^∞ E(S | R = r) e^{−λ} λ^r/r!,
    E(D) = Σ_{r=0}^∞ E(D | R = r)P(R = r) = Σ_{r=0}^∞ E(D | R = r) e^{−λ} λ^r/r!.

We have

    E(S | R = r) = Σ_{k=0}^{min(r,Q)} k C(min(r, Q), k) (1 − q)^k q^{min(r,Q)−k},
    E(D | R = r) = Σ_{k=N+1}^{min(r,Q)} (k − N) C(min(r, Q), k) (1 − q)^k q^{min(r,Q)−k}.

For the numerical data Q = 165, N = 150 and q = 0.07, we get
E(S) = 150.61 and E(D) = 2.71.
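These numerical values can be reproduced directly from the two conditional-expectation formulas (a sketch; the truncation of the Poisson sum at r = 400 is an ad-hoc choice):

```python
from math import comb, exp

lam, Q, N, q = 170.0, 165, 150, 0.07

def cond_means(r):
    m = min(r, Q)
    mean_show = m * (1 - q)  # binomial mean of the number who show up
    mean_denied = sum((k - N) * comb(m, k) * (1 - q) ** k * q ** (m - k)
                      for k in range(N + 1, m + 1))
    return mean_show, mean_denied

E_S = E_D = 0.0
pmf = exp(-lam)  # P(R = 0)
for r in range(400):
    es, ed = cond_means(r)
    E_S += es * pmf
    E_D += ed * pmf
    pmf *= lam / (r + 1)   # advance the Poisson pmf to r + 1
```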
7.51 Let the geometrically distributed random variable Y be the number of
messages waiting in the buffer. Under the condition that Y = y the
random variable X is uniformly distributed on 0, 1, . . . , y − 1. Therefore
E(X | Y = y) = (1/2)(y − 1) and E(X^2 | Y = y) = (1/6)(2y^2 − 3y + 1), see
the answer to Problem 3.47. By the law of conditional expectation,

    E(X^k) = Σ_{y=1}^∞ E(X^k | Y = y) p(1 − p)^{y−1}  for k = 1, 2.

Using the relations Σ_{k=1}^∞ k a^{k−1} = 1/(1 − a)^2 and
Σ_{k=1}^∞ k^2 a^{k−1} = (1 + a)/(1 − a)^3 for 0 < a < 1, we find after
some algebra that

    E(X) = (1 − p)/(2p)  and  E(X^2) = (p^2 − 5p + 4)/(6p^2).

This gives σ(X) = (1/(2√3 p)) √((1 − p)(p + 5)).
7.52 Let the random variable X be the number of newly arriving messages
during the transmission time T of a message. The conditional distribution
of X given that T = n is the binomial distribution with
parameters n and p. Thus, by the law of conditional expectation,

    E(X) = Σ_{n=1}^∞ E(X | T = n)P(T = n) = Σ_{n=1}^∞ np a(1 − a)^{n−1} = p/a,

    E(X^2) = Σ_{n=1}^∞ E(X^2 | T = n)P(T = n)
           = Σ_{n=1}^∞ [np(1 − p) + n^2 p^2] a(1 − a)^{n−1}
           = p(1 − p)/a + p^2(2 − a)/a^2.

The standard deviation of X is

    σ(X) = (1/a) √(ap(1 − p) + p^2(1 − a)).
7.53 For fixed n, let u_k(i) = E[X_k(i)]. The goal is to find u_n(0). Apply
the recursion

    u_k(i) = (1/2) u_{k−1}(i + 1) + (1/2) u_{k−1}(i)

for i satisfying i/(n − k) ≤ 1/2. The boundary conditions are

    u_0(i) = i/n  and  u_k(i) = i/(n − k)  for i > (n − k)/2 and 1 ≤ k ≤ n.

The sought probability u_n(0) has the values 0.7083, 0.7437, 0.7675,
and 0.7761 for n = 5, 10, 25, and 50.
Note: u_n(0) tends to π/4 as n increases without bound, see also Example
8.4.
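The recursion is easy to run in a few lines. The sketch below assumes the indexing convention that k counts the tosses still to go, so that n − k tosses have been performed when u_k is evaluated:

```python
def u_n0(n):
    u = [i / n for i in range(n + 1)]   # boundary u_0(i) = i/n
    for k in range(1, n + 1):
        done = n - k                     # tosses already performed
        new = [0.0] * (n - k + 1)
        for i in range(n - k + 1):
            if done > 0 and i > done / 2:
                new[i] = i / done        # boundary u_k(i) = i/(n - k)
            else:
                new[i] = 0.5 * (u[i + 1] + u[i])
        u = new
    return u[0]
```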
7.54 If you arrive at time point x at the bus stop, then your waiting time
until the next bus arrival is distributed as W(x) = min(15 − x, T),
where the random variable T is the time from your arrival epoch until
the next bus number 3 arrives. By the memoryless property of the
exponential distribution, the random variable T has the exponential
density λe^{−λt} with λ = 1/15. By conditioning on the random variable
T, the expected value of W(x) is calculated as

    E[W(x)] = ∫_0^{15−x} t λe^{−λt} dt + ∫_{15−x}^∞ (15 − x) λe^{−λt} dt,

which leads after some algebra to E[W(x)] = (1/λ)(1 − e^{−λ(15−x)}). Your
arrival time X at the bus stop is uniformly distributed over (0, 15) and
thus has density f(x) = 1/15 = λ for 0 < x < 15. By conditioning on
your arrival time X and applying again the law of conditional expectation,
we find that the expected value of your waiting time until the
next bus arrival is given by

    ∫_0^{15} E[W(x)] f(x) dx = ∫_0^{15} (1 − e^{−(15−x)/15}) dx = 15/e.
7.55 Let X_a be your end score when you continue for a second spin after
having obtained a score of a in the first spin. Then, by the law of
conditional expectation,

    E(X_a) = ∫_0^{1−a} (a + x) dx + ∫_{1−a}^1 0 dx = a(1 − a) + (1/2)(1 − a)^2.

The solution of a(1 − a) + (1/2)(1 − a)^2 = a is a* = √2 − 1. The optimal
strategy is to stop after the first spin if this spin gives a score larger
than √2 − 1. Your expected payoff is $609.48.
7.56 Given that the carnival master tells you that the ball picked from the
red beaker has value r, let L(r) be your expected payoff when you
guess a larger value and let S(r) be your expected payoff when you guess
a smaller value. Then

    L(r) = (1/10) Σ_{k=r+1}^{10} k + (1/10)(r/2) = (1/20)(10 − r)(r + 11) + r/20 = (110 − r^2)/20,

    S(r) = (1/10) Σ_{k=1}^{r−1} k + (1/10)(r/2) = (1/20)(r − 1)r + r/20 = r^2/20.

We have L(r) > S(r) for 1 ≤ r ≤ 7 and L(r) < S(r) for 8 ≤ r ≤ 10,
as can be seen by noting that 110 − x^2 = x^2 has x* = √55 ≈ 7.4
as solution. Thus, given that the carnival master tells you that the
ball picked from the red beaker has value r, your expected payoff is
maximal by guessing a larger value if r ≤ 7 and guessing a smaller
value otherwise. Applying the law of conditional expectation, it now
follows that your expected payoff is

    Σ_{r=1}^{7} [(110 − r^2)/20] × (1/10) + Σ_{r=8}^{10} (r^2/20) × (1/10) = 4.375 dollars

if you use the decision rule with critical level 7. The game is not fair,
but the odds are only slightly in favor of the carnival master if you
play optimally. Then the house edge is 2.8% (for critical levels 5 and
6 the house edge has the values 8.3% and 4.1%).
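The computation of L(r), S(r) and the expected payoff can be mirrored in a few lines (a minimal check of the numbers above):

```python
def L(r):  # expected payoff when guessing "larger", per the formula above
    return (sum(range(r + 1, 11)) + r / 2) / 10

def S(r):  # expected payoff when guessing "smaller"
    return (sum(range(1, r)) + r / 2) / 10

# Play the best of the two guesses for each announced value r.
payoff = sum(max(L(r), S(r)) for r in range(1, 11)) / 10
```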
7.57 In each of the two problems, define v_i as the expected reward that can
be achieved when your current total is i points. A recursion scheme
for the v_i is obtained by applying the law of conditional expectation.
(a) For Problem 3.24, use the recursion

    v_i = (1/6) Σ_{k=2}^{6} v_{i+k}  for 0 ≤ i ≤ 19,

where v_j = j for j ≥ 20. The maximal expected reward is v_0 = 8.5290.
(b) For Problem 3.25, use the recursion

    v_i = (1/6) Σ_{k=1}^{6} v_{i+k}  for 0 ≤ i ≤ 5,

where v_j = j for 6 ≤ j ≤ 10 and v_11 = 0. The maximal expected
reward is v_0 = 6.9988.
7.58 Let p_r be the probability of rolling a dice total of r with two different
numbers. Then p_r = (r − 1)/36 for 2 ≤ r ≤ 7 and p_r = p_{14−r} for
8 ≤ r ≤ 12. To find the expected reward under the stopping rule, apply
the recursion

    v_i = Σ_{r=3}^{11} v_{i+r} p_r  for i = 0, 1, . . . , 34,

where v_s = s for s ≥ 35. The expected reward under the stopping rule
is v_0 = 14.215.
Note: The stopping rule is the one-stage-look-ahead rule, see also
Problem 3.28.
7.59 Define E_i as the expected value of the remaining duration of the game
when the current capital of John is i dollars. Then, by conditioning,
E_i = 1 + pE_{i+1} + qE_{i−1} for 1 ≤ i ≤ a + b − 1, where E_0 = E_{a+b} = 0.
The solution of this standard linear difference equation is

    E_i = i/(q − p) − [(a + b)/(q − p)] × (1 − (q/p)^i)/(1 − (q/p)^{a+b})  if p ≠ q,

and E_i = i(a + b − i) if p = q. Substituting p = 18/37, q = 19/37, a = 2
and b = 8 into the formula for the expected duration of the game of
gambler's ruin, we get that the expected value of the number of bets
is 15.083.
Note: The expected value of the number of dollars you will stake in the
game is 25 × 15.083 = 377.07, and the expected value of the number
of dollars you will lose is (1 − 0.1592) × 50 − 0.1592 × 200 = 10.2
(the probability that you will reach a bankroll of $250 is 0.1592). The
ratio of 10.2 and 377.07 is 0.027, in agreement with the fact that in
the long run you will lose on average 2.7 dollar cents on every dollar
you bet in European roulette, regardless of what roulette system you
play.
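Plugging the numbers into these formulas gives a quick check:

```python
p, q, a, b = 18 / 37, 19 / 37, 2, 8
i, n = a, a + b
ratio = q / p

# Expected duration of the gambler's-ruin game, per the formula above.
E_i = i / (q - p) - (n / (q - p)) * (1 - ratio ** i) / (1 - ratio ** n)

# Probability of reaching a + b units (i.e., a bankroll of $250).
p_win = (1 - ratio ** i) / (1 - ratio ** n)

expected_loss = (1 - p_win) * 50 - p_win * 200
```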
7.60 Let µ_n be the expected number of clumps of cars when there are n
cars on the road. Then, by conditioning on the position of the slowest
car, we get the recursion

    µ_n = Σ_{i=1}^{n} (1 + µ_{n−i}) (1/n)  for n = 1, 2, . . . ,

where µ_0 = 0. This gives that the expected number of clumps of cars
is Σ_{k=1}^{n} 1/k.
7.61 For fixed n, let F(i, k) be the maximal expected payoff that can be
achieved when still k tosses can be done and heads turned up i times
so far. The recursion is

    F(i, k) = max[ (1/2)F(i + 1, k − 1) + (1/2)F(i − 1, k − 1), i/(n − k) ]  for k = 1, . . . , n

with F(i, 0) = i/n. The maximal expected payoff F(0, n) has the values
0.7679, 0.7780, 0.7834, and 0.7912 for n = 25, 50, 100, and 1,000.
7.62 Define the value function f_k(i) as the expected value of the maximal
score you can still reach when k rolls are still possible and the last roll
of the two dice gave a score of i points. You want to find f_6(0) and the
optimal strategy. Let a_j denote the probability of getting a score of j
in a single roll of two dice. The a_j are given by a_j = (j − 1)/36 for
2 ≤ j ≤ 7 and a_j = a_{14−j} for 8 ≤ j ≤ 12. The recursion

    f_k(i) = max[ i, Σ_{j=2}^{12} f_{k−1}(j) a_j ]  for i = 0, 1, . . . , 12

applies for k = 1, . . . , 6 with the boundary condition f_0(i) = i for all
i. The recursion leads to f_6(0) = 9.474. The numerical calculations
reveal the optimal strategy as well: if still k rolls are possible, you
stop if the last roll gave s_k or more points and otherwise you continue,
where s_1 = s_2 = 8, s_3 = s_4 = 9, and s_5 = 10.
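The backward recursion can be run exactly with rational arithmetic; in the sketch below, indifference at the boundary is resolved by continuing, which reproduces the thresholds s_k:

```python
from fractions import Fraction

# Two-dice score probabilities a_j, j = 2,...,12.
a = {j: Fraction(min(j - 1, 13 - j), 36) for j in range(2, 13)}

f = {i: Fraction(i) for i in range(13)}  # boundary f_0(i) = i
thresholds = []
for k in range(1, 7):
    cont = sum(f[j] * a[j] for j in range(2, 13))  # value of rolling again
    thresholds.append(min(i for i in range(13) if Fraction(i) > cont))
    f = {i: max(Fraction(i), cont) for i in range(13)}

f6_0 = float(f[0])
```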
7.63 Let state (l, r, 1) ((l, r, 0)) mean that r numbers have been taken out
of the hat, l is the largest number seen so far and l was obtained (not
obtained) at the last pick. For k = 0, 1, define F_r(l, k) as the maximal
probability of obtaining the largest number starting from state (l, r, k)
when r numbers have been taken out of the hat. The maximal success
probability is Σ_{l=1}^{N} F_1(l, 1) (1/N). The optimality equations are

    F_r(l, 0) = F_{r+1}(l, 0) (l − r)/(N − r) + Σ_{j=l+1}^{N} F_{r+1}(j, 1) (1/(N − r)),

    F_r(l, 1) = max[ C(l − r, n − r)/C(N − r, n − r),
                     F_{r+1}(l, 0) (l − r)/(N − r) + Σ_{j=l+1}^{N} F_{r+1}(j, 1) (1/(N − r)) ]

for l = r, . . . , N, where C(l − r, n − r) = 0 for l < n and the boundary
conditions are F_n(l, 0) = 0 and F_n(l, 1) = 1 for l = n, . . . , N. For n = 10
and N = 100, the maximal success probability is 0.6219 and the optimal
stopping rule is characterized by l_1 = 93, l_2 = 92, l_3 = 91, l_4 = 89,
l_5 = 87, l_6 = 84, l_7 = 80, l_8 = 72, and l_9 = 55. This rule prescribes to
stop in state (l, r, 1) if l ≥ l_r and to continue otherwise.
Note: For the case of n = 10 and N = 100, we verified experimentally
that l_r is the smallest value of l such that Q_s(l, r) ≥ Q_c(l, r), where

    Q_s(l, r) = C(l − r, 10 − r)/C(100 − r, 10 − r)

and

    Q_c(l, r) = Σ_{k=1}^{10−r} (1/k) C(100 − l, k) C(l − r, 10 − r − k)/C(100 − r, 10 − r).

We have that Q_s(l, r) is the probability of having obtained the overall
largest number when stopping in state (l, r, 1), and Q_c(l, r) is the
probability of getting the overall largest number when continuing in
state (l, r, 1) and stopping as soon as you pick a number larger than l.
7.64 For k = 0, 1, let state (l, r, k) and the value function F_r(l, k) be defined
in the same way as in Problem 7.63. Then

    F_r(l, 0) = F_{r+1}(l, 0) (l − 1)/N + Σ_{j=l}^{N} F_{r+1}(j, 1) (1/N),

    F_r(l, 1) = max[ (l/N)^{n−r}, F_{r+1}(l, 0) (l − 1)/N + Σ_{j=l}^{N} F_{r+1}(j, 1) (1/N) ]

for l = r, . . . , N, where the boundary conditions are F_n(l, 0) = 0 and
F_n(l, 1) = 1 for l = n, . . . , N. The maximal success probability is
Σ_{l=1}^{N} F_1(l, 1) (1/N).
7.65 Define the value function v(i_0, i_1) as the maximal expected net winnings
you can still achieve starting from state (i_0, i_1), where state
(i_0, i_1) means that there are i_0 empty bins and i_1 bins with exactly
one ball. The desired expected value v(b, 0) can be obtained from the
optimality equation

    v(i_0, i_1) = max[ i_1 − (1/2)(b − i_0 − i_1),
                       (i_0/b) v(i_0 − 1, i_1 + 1) + (i_1/b) v(i_0, i_1 − 1)
                       + ((b − i_0 − i_1)/b) v(i_0, i_1) ]

with the boundary condition v(0, i_1) = i_1 − (1/2)(b − i_1). This equation
can be solved by backwards calculations. First calculate v(1, i_1) for
i_1 = 0, . . . , b − 1. Next calculate v(2, i_1) for i_1 = 0, . . . , b − 2. Continuing
in this way, the desired v(b, 0) is obtained. Numerical investigations
lead to the conjecture that the optimal stopping rule has the following
simple form: you stop only in the states (i_0, i_1) with i_1 ≥ a, where a is
the smallest integer larger than or equal to 2i_0/3. For b = 25, we find
that the maximal expected net winnings is $7.566. The one-stage-look-ahead
rule prescribes to stop in the states (i_0, i_1) with i_0 ≤ 1.5 i_1
and to continue otherwise. This stopping rule has an expected net
winnings of $7.509.
Note: The standard deviation of the net winnings is $2.566 for the
optimal stopping rule and $2.229 for the one-stage-look-ahead rule.
7.66 Let state (i, s) correspond to the situation that your accumulated reward
is i dollars and the dice total in the last roll is s. Define the
value function V(i, s) as the maximal achievable reward starting from
state (i, s). The goal is to find Σ_{s=2}^{12} V(s, s) p_s, where p_s is the
probability of getting a dice total of s in a single roll of the two dice. The
p_k are given by p_k = (k − 1)/36 for 2 ≤ k ≤ 7 and p_k = p_{14−k} for
8 ≤ k ≤ 12. The optimality equation is

    V(i, s) = max[ i, Σ_{k=2, k≠s}^{12} V(i + k, k) p_k ]

with the boundary condition V(j, k) = j for j ≥ M. Using backward
calculations the values V(s, s) and the optimal stopping rule are found.
Note: A heuristic rule, which is very close in performance to the optimal
stopping rule, is the one-stage-look-ahead rule. The heuristic
rule prescribes to stop in the states (i, s) with i ≥ N_s and to continue
otherwise, where the threshold values N_s are given by N_2 = 250,
N_3 = 123, N_4 = 80, N_5 = 58, N_6 = 45, N_7 = 35, N_8 = 43, N_9 = 54,
N_10 = 74, N_11 = 115, and N_12 = 240. The critical level N_s is the
smallest integer i such that Σ_{k=2, k≠s}^{12} k p_k − i p_s ≤ 0 or, equivalently,
7 − (i + s)p_s ≤ 0.
7.67 Imagine that the balls are placed into the bins at times generated by
a Poisson process with rate 1. Then, a Poisson process with rate 1/b
generates the times at which the ith bin receives a ball. Using the
independence of the Poissonian subprocesses and conditioning upon
the time that the ith bin receives its first ball, it follows that

    P(A_i) = ∫_0^∞ [ Σ_{k=1}^{m} e^{−t/b} (t/b)^k/k! ]^{b−1} (1/b) e^{−t/b} dt.

The sought probability is Σ_{i=1}^{b} P(A_i). As a sanity check, this
probability is b!/b^b for m = 1.
7.68 Imagine that the bottles are bought at epochs generated by a Poisson
process with rate 1. Let T be the first time at which the required
numbers of the letters have been collected and let N be the number
of bottles needed. Then E(N) = E(T). The letters A, B, R, and S
are obtained at epochs generated by Poisson processes with respective
rates 0.15, 0.10, 0.40, and 0.35. Moreover, these Poisson processes are
independent of each other. Using the relation E(T) = ∫_0^∞ [1 − P(T ≤ t)] dt,
we find that the expected number of bottles needed to form the
payoff word is

    ∫_0^∞ [ 1 − (1 − (1 + 0.15t + 0.15^2 t^2/2!) e^{−0.15t}) × (1 − (1 + 0.1t) e^{−0.1t})
            × (1 − (1 + 0.4t) e^{−0.4t}) × (1 − e^{−0.35t}) ] dt = 26.9796.
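Evaluating this integral numerically takes only a few lines (a sketch; midpoint rule with an ad-hoc truncation at t = 600):

```python
from math import exp

def integrand(t):
    three_a = 1 - (1 + 0.15 * t + (0.15 * t) ** 2 / 2) * exp(-0.15 * t)
    two_b = 1 - (1 + 0.10 * t) * exp(-0.10 * t)
    two_r = 1 - (1 + 0.40 * t) * exp(-0.40 * t)
    one_s = 1 - exp(-0.35 * t)
    return 1 - three_a * two_b * two_r * one_s   # P(T > t)

dt = 0.01
expected_bottles = sum(integrand((k + 0.5) * dt) * dt for k in range(60_000))
```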
7.69 Imagine that rolls of the two dice occur at epochs generated by a
Poisson process with rate 1. Let N be the number of rolls needed to
remove all tokens and T be the first epoch at which all tokens have
been removed. Then, E(N) = E(T) and E(T) = ∫_0^∞ P(T > t) dt.
Also,

    T = max_{2≤j≤12} T_j,

where T_j is the first epoch at which all tokens in section j have been
removed. The rolls resulting in a dice total of k occur according to
a Poisson process with rate p_k and these Poissonian subprocesses are
independent of each other. The p_k are given by p_k = (k − 1)/36 for
2 ≤ k ≤ 7 and p_k = p_{14−k} for 8 ≤ k ≤ 12. By the independence of the T_k,

    P(T ≤ t) = P(T_2 ≤ t) · · · P(T_12 ≤ t).

Also,

    P(T_k > t) = Σ_{j=0}^{a_k−1} e^{−p_k t} (p_k t)^j/j!,

where a_k is the number of tokens placed in section k. Putting the pieces
together and using numerical integration, we find E(N) = 31.922.
7.70 Imagine that purchases are made at epochs generated by a Poisson
process with rate 1. For any i = 1, . . . , n, a Poisson subprocess with
rate 1/n generates the epochs at which a coupon of type i is obtained.
The Poisson subprocesses are independent of each other. Let T be
the first epoch at which two complete sets of coupons are obtained
and let N be the number of purchases needed to get two complete
sets of coupons. Then E(N) = E(T). Using the relation E(T) =
∫_0^∞ [1 − P(T ≤ t)] dt, we find that the expected number of purchases
needed to get two complete sets of coupons is equal to

    ∫_0^∞ [ 1 − (1 − e^{−t/n} − (t/n) e^{−t/n})^n ] dt.

This integral has the value 24.134 when n = 6.
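The integral is easily evaluated numerically (a sketch; midpoint rule, truncated at t = 400):

```python
from math import exp

n = 6
dt = 0.01
# 1 - P(all n coupon types collected at least twice by time t)
E_N = sum((1 - (1 - exp(-t / n) * (1 + t / n)) ** n) * dt
          for t in ((k + 0.5) * dt for k in range(40_000)))
```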
7.71 Imagine that the rolls of the die occur at epochs generated by a Poisson
process with rate 1. Then, the times at which an odd number is rolled
are generated by a Poisson process with rate 1/2 and the times at which
the even number k is rolled are generated by a Poisson process with
rate 1/6 for k = 2, 4, and 6. These Poisson processes are independent
of each other. By conditioning on the first epoch at which an odd
number is rolled, we find that the sought probability is

    ∫_0^∞ (1 − e^{−t/6})^3 (1/2) e^{−t/2} dt = 0.05.
7.72 Taking the model with replacement and mimicking the arguments used
in the solution of Example 7.13, we get that the sought probability is

    ∫_0^∞ (1 − e^{−(10/35)t}) (1 − e^{−(20/35)t}) (5/35) e^{−(5/35)t} dt = 0.6095.
7.73 Imagine that the rolls of the die occur at epochs generated by a Poisson
process with rate 1. Then, independent Poisson processes each having
rate µ = 1/6 describe the moves of the horses. The density of the sum of
r = 6 − s_1 independent exponentially distributed interoccurrence times
each having expected value 1/µ is the Erlang density µ^r t^{r−1} e^{−µt}/(r − 1)!,
see Section 4.5.1. Thus the win probability of horse 1 with starting
position s_1 = 0 is

    ∫_0^∞ [ Σ_{k=0}^{4} e^{−t/6} (t/6)^k/k! ]^2 [ Σ_{k=0}^{3} e^{−t/6} (t/6)^k/k! ]^2
          [ Σ_{k=0}^{5} e^{−t/6} (t/6)^k/k! ] (1/6)^6 (t^5/5!) e^{−t/6} dt.

This gives the win probability 0.06280 for the horses 1 and 6. In the
same way, we get the win probability 0.13991 for the horses 2 and
5, and the win probability 0.29729 for the horses 3 and 4. To find
the expected duration of the game, let T_i be the time at which horse
i would reach the finish if the game were continued until every horse
has finished. The expected duration of the game is equal to E(T) with
T = min(T_1, . . . , T_6). Noting that E(T) = ∫_0^∞ P(T > t) dt =
∫_0^∞ P(T_1 > t) · · · P(T_6 > t) dt, it follows that the expected duration of
the game is

    ∫_0^∞ [ Σ_{k=0}^{5} e^{−t/6} (t/6)^k/k! ]^6 dt = 19.737

when each horse starts at panel 0.
7.74 The analysis is along the same lines as the analysis of Problem 7.73.
The probability of player A winning is

    ∫_0^∞ [ Σ_{k=0}^{3} e^{−3t/9} (3t/9)^k/k! ] × [ Σ_{l=0}^{2} e^{−2t/9} (2t/9)^l/l! ]
          × (4/9)^5 (t^4/4!) e^{−4t/9} dt.

The other win probabilities follow similarly. This leads to P(A) = 0.3631,
P(B) = 0.3364, and P(C) = 0.3005. The expected number of games
is given by

    ∫_0^∞ [ Σ_{j=0}^{4} e^{−4t/9} (4t/9)^j/j! ] × [ Σ_{k=0}^{3} e^{−3t/9} (3t/9)^k/k! ]
          × [ Σ_{l=0}^{2} e^{−2t/9} (2t/9)^l/l! ] dt.

This integral can be evaluated as 7.3644.
7.75 Imagine that cards are picked at epochs generated by a Poisson process
with rate 1. Let N be the number of picks until all cards of one
of the suits have been obtained and let T be the epoch at which this
occurs. Then E(T) = E(N). Any specific card is picked at epochs
generated by a Poisson process with rate 1/20. These Poisson processes
are independent of each other. Let T_i be the time until all cards of the
ith suit have been picked. The T_i are independent random variables
and T = min(T_1, T_2, T_3, T_4). Since P(T_i > t) = 1 − (1 − e^{−t/20})^5 and
P(T > t) = P(T_1 > t) · · · P(T_4 > t), we get

    E(T) = ∫_0^∞ [ 1 − (1 − e^{−t/20})^5 ]^4 dt = 24.694.

Therefore E(N) = 24.694.
7.76 Imagine that a Poisson process with rate 1 generates the epochs at
which a ball is placed into a randomly chosen bin. Then, for any i =
1, . . . , b, the epochs at which the ith bin receives a ball are generated
by a Poisson subprocess with rate 1/b. These Poisson processes are
independent of each other. Let T be the first time at which each bin
contains m or more balls and let N be the number of balls needed until
each bin contains at least m balls. Then E(N) = E(T). Using the
relation E(T) = ∫_0^∞ [1 − P(T ≤ t)] dt, we get

    E(N) = ∫_0^∞ [ 1 − ( Σ_{k=m}^{∞} e^{−t/b} (t/b)^k/k! )^b ] dt
         = b ∫_0^∞ [ 1 − ( 1 − Σ_{k=0}^{m−1} e^{−u} u^k/k! )^b ] du.
7.77 Writing x_i − θ = (x_i − x̄) + (x̄ − θ), it follows that

    Σ_{i=1}^{n} (x_i − θ)^2 = n(x̄ − θ)^2 + Σ_{i=1}^{n} (x_i − x̄)^2 + 2(x̄ − θ) Σ_{i=1}^{n} (x_i − x̄).

Noting that Σ_{i=1}^{n} (x_i − x̄) = 0, it follows that

    L(x | θ) = (σ√(2π))^{−n} e^{−(1/2) Σ_{i=1}^{n} (x_i − θ)^2/σ^2}

is proportional to e^{−(1/2) n(θ − x̄)^2/σ^2}. Thus the posterior density f(θ | x) is
proportional to e^{−(1/2) n(θ − x̄)^2/σ^2} f_0(θ), where
f_0(θ) = (1/(σ_0 √(2π))) e^{−(1/2)(θ − µ_0)^2/σ_0^2}.
Next it is a matter of some algebra to find that the posterior density
is proportional to

    e^{−(1/2)(θ − µ_p)^2/σ_p^2},

where µ_p and σ_p^2 are equal to

    µ_p = (σ_0^2 x̄ + (σ^2/n) µ_0)/(σ_0^2 + σ^2/n)  and  σ_p^2 = σ_0^2 (σ^2/n)/(σ_0^2 + σ^2/n).

In other words, the posterior density is the N(µ_p, σ_p^2) density. Inserting
the data n = 10, σ = √2, µ_0 = 73 and σ_0 = 0.7, it follows that the
posterior density is maximal at θ* = µ_p = 73.356. Using the 0.025
and 0.975 percentiles of the standard normal density, a 95% Bayesian
credible interval for θ is

    (µ_p − 1.960σ_p, µ_p + 1.960σ_p) = (72.617, 74.095).
7.78 The IQ of the test person is modeled by the random variable Θ. The
posterior density of Θ is proportional to

    e^{−(1/2)(123 − θ)^2/56.25} × e^{−(1/2)(θ − 100)^2/125}.

Using a little algebra to rewrite this expression, we get that the posterior
density is a normal density with expected value µ_p = 115.862 and
standard deviation σ_p = 6.228 (the normal distribution is a conjugate
prior for a likelihood function of the form of a normal density, see also
Problem 7.77). The posterior density is maximal at θ* = 115.862. A
95% Bayesian credible interval for θ is

    (µ_p − 1.960σ_p, µ_p + 1.960σ_p) = (103.65, 128.07).
7.79 The posterior density is proportional to

    e^{−(1/2)(t_1 − θ)^2/σ^2} × e^{−(1/2)(θ − µ_0)^2/σ_0^2},

where t_1 = 140, σ = 20, µ_0 = 150, and σ_0 = 25 light years. Next
a little algebra shows that the posterior density is proportional to
e^{−(1/2)(θ − µ_p)^2/σ_p^2}, where

    µ_p = (σ_0^2 t_1 + σ^2 µ_0)/(σ_0^2 + σ^2)  and  σ_p^2 = σ_0^2 σ^2/(σ_0^2 + σ^2).

This gives that the posterior density is a normal density with an expected
value of µ_p = 143.902 light years and a standard deviation
of σ_p = 15.617 light years. The posterior density is maximal at
θ = 143.902. A 95% Bayesian credible interval for the distance is

    (µ_p − 1.960σ_p, µ_p + 1.960σ_p) = (113.293, 174.512).
7.80 The prior density f_0(θ) of the proportion of Liberal voters is

    f_0(θ) = c θ^{474−1} (1 − θ)^{501−1},

where c is a normalization constant. The likelihood function L(E | θ)
is given by

    L(E | θ) = C(1100, 527) θ^{527} (1 − θ)^{573}.

The posterior density f(θ | E) is proportional to L(E | θ)f_0(θ) and so
it is proportional to

    θ^{1,000} (1 − θ)^{1,073}.

In other words, the posterior density f(θ | E) is the beta(1,001, 1,074)
density. The posterior probability that the Liberal party will win the
election is

    ∫_{0.5}^{1} f(θ | E) dθ = 0.0546.

A Bayesian 95% credible interval for the proportion of Liberal voters
can be calculated as (0.4609, 0.5039).
7.81 The prior density of the parameter of the exponential lifetime of a
light bulb is f_0(θ) = c θ^{α−1} e^{−λθ}, where c is a normalization constant.
Let E be the event that light bulbs have failed at times t_1 < · · · < t_r
and m − r light bulbs are still functioning at time T. The likelihood
function L(E | θ) is defined as

    C(m, r) r! θ^r e^{−[t_1 + · · · + t_r + (m−r)T]θ}

(the rationale of this definition is the probability that one light bulb
fails in each of the infinitesimal intervals (t_i − Δ/2, t_i + Δ/2) and m − r
light bulbs are still functioning at time T). The posterior density
f(θ | E) is proportional to

    θ^{α+r−1} e^{−[λ + t_1 + · · · + t_r + (m−r)T]θ}.

In other words, the posterior density f(θ | E) is a gamma density with
shape parameter α + r and scale parameter λ + Σ_{i=1}^{r} t_i + (m − r)T.
Chapter 8
8.1 (a) The binomial random variable X can be represented as X = X_1 +
· · · + X_n, where the X_i are independent with P(X_i = 0) = 1 − p and
P(X_i = 1) = p. Since E(z^{X_i}) = 1 − p + pz, Rule 8.2 gives that

    G_X(z) = (1 − p + pz)^n.

The negative binomial random variable Y can be represented as Y =
Y_1 + · · · + Y_r, where the Y_i are independent with P(Y_i = k) = p(1 − p)^{k−1}
for k ≥ 1. Since E(z^{Y_i}) = pz/(1 − (1 − p)z), Rule 8.2 gives that

    G_Y(z) = [ pz/(1 − (1 − p)z) ]^r.

(b) For the binomial distribution, G′_X(1) = np and G″_X(1) = n(n − 1)p^2,
implying that

    E(X) = np  and  σ^2(X) = np(1 − p).

For the negative binomial distribution, G′_Y(1) = r/p and G″_Y(1) =
(r(1 − 2p) + r^2)/p^2, and thus

    E(Y) = r/p  and  σ^2(Y) = r(1 − p)/p^2.
8.2 Put for abbreviation p_n = P(X = n). By the definition of G_X(z), we
have

    G_X(−1) = Σ_{n=0}^{∞} p_{2n} − Σ_{n=0}^{∞} p_{2n+1}.

Also, Σ_{n=0}^{∞} p_{2n} + Σ_{n=0}^{∞} p_{2n+1} = 1. Adding these two equations, we
get G_X(−1) + 1 = 2 Σ_{n=0}^{∞} p_{2n}, showing the desired result.
8.3 By Rule 8.2, the generating function of the total score S is given by

    G_S(z) = [ (1/3)z + (1/6)z^2 + (1/6)z^3 + (1/6)z^4 + (1/6)z^5 ]^n.

Since G_S(−1) = (−1/3)^n, the sought probability is (1/2)[(−1/3)^n + 1].
8.4 (a) Denote by X_k the kth sample from the continuous probability
distribution. Obviously, P(I_k = 1) = P(X_k = max(X_1, . . . , X_k)) and
so, by a symmetry argument,

    P(I_k = 1) = 1/k  for k = 1, . . . , n.

By the chain rule for conditional probabilities, P(I_{i_1} = 1, . . . , I_{i_r} = 1)
is equal to

    P(I_{i_r} = 1) × P(I_{i_{r−1}} = 1 | I_{i_r} = 1) × · · · × P(I_{i_1} = 1 | I_{i_r} = 1, . . . , I_{i_2} = 1).

Also, we have that

    P(I_{i_k} = 1 | I_{i_r} = 1, . . . , I_{i_{k+1}} = 1) = P(X_{i_k} = max(X_1, . . . , X_{i_k})) = 1/i_k.

Hence, P(I_{i_1} = 1, . . . , I_{i_r} = 1) = (1/i_r) × · · · × (1/i_1). This proves that,
for all 1 ≤ i_1 < · · · < i_r ≤ n and 1 ≤ r ≤ n,

    P(I_{i_1} = 1, . . . , I_{i_r} = 1) = P(I_{i_1} = 1) × · · · × P(I_{i_r} = 1).

Next it will be shown that P(I_{i_1} = δ_{i_1}, . . . , I_{i_r} = δ_{i_r}) = P(I_{i_1} = δ_{i_1}) ×
· · · × P(I_{i_r} = δ_{i_r}) for all 1 ≤ i_1 < . . . < i_r ≤ n and δ_{i_1}, . . . , δ_{i_r} ∈ {0, 1}.
The proof is by induction on l, where l is the number of δ_{i_k} with the
value zero. Take l = 1 and suppose for ease that δ_{i_1} = 0. Then,
P(I_{i_1} = 0, I_{i_2} = 1, . . . , I_{i_r} = 1) is equal to

    P(I_{i_2} = 1, . . . , I_{i_r} = 1) − P(I_{i_1} = 1, I_{i_2} = 1, . . . , I_{i_r} = 1)
    = P(I_{i_2} = 1) × · · · × P(I_{i_r} = 1) − P(I_{i_1} = 1) × · · · × P(I_{i_r} = 1)
    = [1 − P(I_{i_1} = 1)] × P(I_{i_2} = 1) × · · · × P(I_{i_r} = 1)
    = P(I_{i_1} = 0) × P(I_{i_2} = 1) × · · · × P(I_{i_r} = 1),

as was to be verified. Continuing in this way, we finally get that

    P(I_1 = δ_1, . . . , I_n = δ_n) = P(I_1 = δ_1) × · · · × P(I_n = δ_n)

for all δ_1, . . . , δ_n ∈ {0, 1}, showing that I_1, I_2, . . . , I_n are independent.
(b) The number of record draws is distributed as

    R = I_1 + · · · + I_r.

For each k, we have P(I_k = 1) = 1/k and P(I_k = 0) = 1 − 1/k. The
generating function of the random variable I_k is given by

    1 − 1/k + (1/k)z  for k = 1, . . . , r.

The random variables I_1, . . . , I_r are independent. Hence, by the convolution
rule, the generating function of R is given by

    G_R(z) = z (1 − 1/2 + (1/2)z) · · · (1 − 1/r + (1/r)z).

Using mathematical software for the multiplication of polynomials, we
find that for r = 10 the generating function G_R(z) is given by

    (1/10)z + (7,129/25,200)z^2 + (1,303/4,032)z^3 + (4,523/22,680)z^4
    + (19/256)z^5 + (3,013/172,800)z^6 + (1/384)z^7 + (29/120,960)z^8
    + (1/80,640)z^9 + (1/3,628,800)z^{10}.

Denoting by p_k the probability of exactly k records in 10 samples, we
have p_1 = 1/10, p_2 = 7,129/25,200, . . ., p_9 = 1/80,640, and
p_10 = 1/3,628,800.
8.5 We have G_{X+Y}(z) = e^{−µ(1−z)}, where µ = E(X + Y). By the
independence of X and Y,

    G_{X+Y}(z) = G_X(z) G_Y(z).

Since X and Y have the same distribution, G_X(z) = G_Y(z). Thus
[G_X(z)]^2 = e^{−µ(1−z)} and so

    G_X(z) = e^{−(1/2)µ(1−z)}.

The generating function uniquely determines the probability mass
function. Thus X and Y are Poisson distributed with mean µ/2.
Note: The assumption that X and Y are identically distributed can be
dropped, but this requires deep analysis using characteristic functions.
8.6 By conditioning on N, we have

    E(z^S) = Σ_{n=0}^{∞} E(z^S | N = n)P(N = n)
           = z^0 P(N = 0) + Σ_{n=1}^{∞} E(z^{X_1 + · · · + X_n})P(N = n).

Using the convolution rule and the assumption that X_1, X_2, . . . are
independent random variables with generating function A(z), it follows
that

    G_S(z) = Σ_{n=0}^{∞} [A(z)]^n e^{−µ} µ^n/n! = e^{−µ[1 − A(z)]}.

Taking the first two derivatives of G_S(z) at z = 1 gives the expressions
for E(S) and var(S).
8.7 Since E(z^{N(t+h)}) = E(z^{N(t+h)−N(t)} z^{N(t)}) and N(t + h) − N(t) is
independent of N(t), we have

    g_z(t + h) = E(z^{N(t+h)−N(t)}) g_z(t) = [(1 − λh) + λhz + o(h)] g_z(t)

for h → 0. This leads to [g_z(t + h) − g_z(t)]/h = −λ(1 − z)g_z(t) + o(h)/h
as h → 0, and so

    (∂/∂t) g_z(t) = −λ(1 − z)g_z(t)  for t > 0.

Together with the initial condition g_z(0) = 1, this gives

    g_z(t) = e^{−λt(1−z)},

showing that the generating function of N(t) is given by the generating
function of a Poisson distributed random variable with expected value
λt. By the uniqueness property of the generating function, it follows
that N(t) is Poisson distributed with expected value λt.
8.8 Let N be the outcome of the first roll of the die. Also, let X_1, X_2, . . .
be independent random variables each having the uniform distribution
on 1, 2, . . . , 6. The sum S of the face values of the simultaneous roll of
the dice is distributed as X_1 + · · · + X_N. The conditional distribution of
X_1 + · · · + X_N given that N = k is the same as the unconditional
distribution of X_1 + · · · + X_k. Applying the law of conditional expectation,
we get that the generating function of S is given by

    G_S(z) = E(z^{X_1 + · · · + X_N}) = Σ_{k=1}^{6} E(z^{X_1 + · · · + X_N} | N = k)P(N = k)
           = (1/6) Σ_{k=1}^{6} [ (1/6)z + (1/6)z^2 + (1/6)z^3 + (1/6)z^4 + (1/6)z^5 + (1/6)z^6 ]^k.

This generating function can be expanded as a polynomial by using
standard mathematical software. Letting p_k = P(S = k), we have

    p_1 = 1/36, p_2 = 7/216, p_3 = 49/1,296, p_4 = 343/7,776, p_5 = 2,401/46,656,
    p_6 = 16,807/279,936, p_7 = 493/11,664, p_8 = 4,169/93,312, p_9 = 3,269/69,984,
    p_10 = 749/15,552, p_11 = 2,275/46,656, p_12 = 749/15,552.

The sought probability is Σ_{k=1}^{12} p_k = 0.5323.
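The expansion of G_S(z) amounts to convolving the die distribution with itself; the sketch below checks the p_k exactly:

```python
from fractions import Fraction

die = {f: Fraction(1, 6) for f in range(1, 7)}

p = {}                   # p[s] = P(S = s)
dist = {0: Fraction(1)}  # distribution of X_1 + ... + X_n
for n in range(1, 7):
    new = {}
    for s, q in dist.items():
        for f, w in die.items():
            new[s + f] = new.get(s + f, Fraction(0)) + q * w
    dist = new
    # N = n occurs with probability 1/6
    for s, q in dist.items():
        p[s] = p.get(s, Fraction(0)) + Fraction(1, 6) * q

prob_le_12 = float(sum(q for s, q in p.items() if s <= 12))
```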
8.9 Using first-step analysis, we get E(z^X) = pzE(z^X) + q + rE(z^X). This
leads to

    Σ_{k=0}^{∞} P(X = k)z^k = q/(1 − pz − r).

Writing q/(1 − pz − r) as [q/(1 − r)]/[1 − pz/(1 − r)] and using the
expansion

    [q/(1 − r)]/[1 − pz/(1 − r)] = (q/(1 − r)) Σ_{k=0}^{∞} (p/(1 − r))^k z^k

for |z| < (1 − r)/p, we obtain by equating terms that

    P(X = k) = (q/(1 − r)) (p/(1 − r))^k  for k = 0, 1, . . . .
8.10 Define the random variable $X$ as the sum of the integers that are generated. Let the random variable $I$ denote the first integer generated. Then, the conditional distribution of $X$ given that $I = i$ with $i \neq 0$ is the same as the unconditional distribution of $i + X$. Thus, by the law of conditional expectation,
$$E(z^X) = \sum_{i=0}^{9} E(z^X \mid I = i)P(I = i) = \frac{1}{10} + \frac{1}{10}\sum_{i=1}^{9} E(z^{i+X}).$$
It can now be seen that $G_X(z) = E(z^X)$ is given by
$$G_X(z) = \frac{1/10}{1 - (1/10)\sum_{i=1}^{9} z^i}.$$
An alternative derivation of this formula is as follows. The random variable $X$ can be represented as the random sum $X = \sum_{i=1}^{N-1} Y_i$, where $N$ has a geometric distribution with probability $p = \frac{1}{10}$ and $Y_1, Y_2, \ldots$ are independent random variables that have a discrete uniform distribution on $1, \ldots, 9$. Also, the $Y_i$ are independent of $N$. Next, by repeating the analysis for Problem 8.6, we get the result.
Note: The first and second derivatives of $G_X(z)$ at $z = 1$ have the values $G'_X(1) = 45$ and $G''_X(1) = 4{,}290$. This gives $E(X) = 45$ and $\sigma(X) = 48.06$. By numerical inversion of $G_X(z)$ (using the fast Fourier transform), we find that $P(X > 10k)$ has the numerical values 0.9000, 0.7506, 0.6116, 0.4967, 0.4035, 0.3278, 0.2663, 0.2163, 0.1757, 0.1427, and 0.1159 for $k = 0, 1, \ldots, 10$.
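The coefficients of $G_X(z)$ can also be generated directly from the first-step recursion $p_0 = \frac{1}{10}$, $p_k = \frac{1}{10}\sum_{i=1}^{\min(k,9)} p_{k-i}$, which is just the expansion of the geometric series above. A small sketch (variable names and truncation point are our own choices):

```python
K = 5000  # truncation point; the tail beyond it is negligible here
p = [0.0] * (K + 1)
p[0] = 0.1
for k in range(1, K + 1):
    # condition on the first digit i in {1,...,9}, each with probability 1/10
    p[k] = 0.1 * sum(p[k - i] for i in range(1, min(k, 9) + 1))

mean = sum(k * p[k] for k in range(K + 1))
tail10 = 1.0 - sum(p[:11])  # P(X > 10)
print(round(mean, 2), round(tail10, 4))
```

This agrees with $E(X) = 45$ and $P(X > 10) = 0.7506$ from the note.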
8.11 Conditioning on the outcome of the first toss, we get
$$E(z^X) = \frac{1}{2}zE(z^{X_1}) + \frac{1}{2}zE(z^{X_2}).$$
The random variable $X_1$ is equal to $r-1$ if the next $r-1$ tosses give heads, and $X_1$ is distributed as $k + X_2$ for $1 \le k \le r-1$ if the next $k-1$ tosses give heads and are followed by tails. Therefore
$$E(z^{X_1}) = \Big(\frac{1}{2}\Big)^{r-1} z^{r-1} + \sum_{k=1}^{r-1}\Big(\frac{1}{2}\Big)^k z^k E(z^{X_2}).$$
Since $E(z^{X_2}) = E(z^{X_1})$, we have $E(z^{X_1}) = \big(\frac{1}{2}z\big)^{r-1}\big/\big[1 - \sum_{k=1}^{r-1}\big(\frac{1}{2}z\big)^k\big]$. This leads to
$$E(z^X) = \frac{2\big(\frac{1}{2}z\big)^r}{1 - \sum_{k=1}^{r-1}\big(\frac{1}{2}z\big)^k}.$$
Taking the derivatives of $E(z^X)$ and putting $z = 1$, we get
$$E(X) = 2^r - 1 \quad \text{and} \quad \mathrm{var}(X) = 2^r(2^r - 2r + 1) - 2.$$
8.12 Since $\lim_{z\to 1} G'_X(z) = \infty$, we have $E(X) = \infty$. The probability mass function of $X$ has a long tail. It is interesting to give some numerical values for the tail probability $P(X > n)$. The tail probability has the values 0.3125, 0.2460, 0.1550, 0.1123, 0.0796, 0.0504, 0.0357, 0.0252, 0.0160, 0.0113, 0.0092, and 0.0080 for $n = 5$, 10, 25, 50, 100, 250, 500, 1,000, 2,500, 5,000, 7,500, and 10,000.
8.13 The extinction probability is the smallest root of the equation $u = P(u)$, where the generating function $P(u)$ is given by
$$P(u) = \frac{p}{1 - (1-p)u} \quad \text{for } |u| \le 1.$$
The equation $u = P(u)$ has the two roots $u = \frac{p}{1-p}$ and $u = 1$. The extinction probability is $\frac{p}{1-p}$ if $p < \frac{1}{2}$ and is 1 otherwise.
8.14 The offspring distribution is given by
$$p_0 = \frac{1}{5} + \frac{4}{5}\Big(\frac{1}{3}\times\frac{1}{4} + \frac{2}{3}\times\frac{1}{8}\Big) = \frac{10}{30},\qquad p_1 = \frac{4}{5}\Big(\frac{1}{3}\times\frac{1}{2} + \frac{2}{3}\times\frac{3}{8}\Big) = \frac{10}{30},$$
$$p_2 = \frac{4}{5}\Big(\frac{1}{3}\times\frac{1}{4} + \frac{2}{3}\times\frac{3}{8}\Big) = \frac{8}{30},\qquad p_3 = \frac{4}{5}\times\frac{2}{3}\times\frac{1}{8} = \frac{2}{30}.$$
The generating function of the offspring distribution is
$$P(u) = \frac{1}{3} + \frac{1}{3}u + \frac{4}{15}u^2 + \frac{1}{15}u^3.$$
The equation $P(u) = u$ has the roots $u_1 = 1$, $u_2 = \frac{1}{2}(-5 + \sqrt{45})$, and $u_3 = \frac{1}{2}(-5 - \sqrt{45})$ (the equation $P(u) = u$ can be factorized as $(u-1)\big(\frac{1}{15}u^2 + \frac{1}{3}u - \frac{1}{3}\big) = 0$). The desired probability is $u_\infty = \frac{1}{2}(-5 + \sqrt{45}) = 0.8541$.
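A quick numerical check: iterating $u_n = P(u_{n-1})$ from $u_0 = 0$ converges to the smallest nonnegative root of $u = P(u)$, so the extinction probability can be confirmed without factorizing. A minimal sketch:

```python
import math

def P(u):
    # generating function of the offspring distribution from Problem 8.14
    return 1/3 + u/3 + 4*u**2/15 + u**3/15

u = 0.0
for _ in range(500):  # fixed-point iteration u_n = P(u_{n-1})
    u = P(u)

exact = (math.sqrt(45) - 5) / 2  # the root (-5 + sqrt(45))/2
print(round(u, 4), round(exact, 4))  # both 0.8541
```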
8.15 The generating function of the offspring distribution is
$$P(u) = \frac{1}{3} + \frac{2}{3}u^2.$$
(a) To find $u_3$, iterate $u_n = P(u_{n-1})$ starting with $u_0 = 0$. This gives $u_1 = P(0) = \frac{1}{3}$, $u_2 = P\big(\frac{1}{3}\big) = \frac{1}{3} + \frac{2}{3}\big(\frac{1}{3}\big)^2 = \frac{11}{27}$, and $u_3 = P\big(\frac{11}{27}\big) = \frac{1}{3} + \frac{2}{3}\big(\frac{11}{27}\big)^2 = 0.4440$.
(b) The equation $u = \frac{1}{3} + \frac{2}{3}u^2$ has the roots $u = 1$ and $u = \frac{1}{2}$. The probability $u_\infty = \frac{1}{2}$.
(c) The probabilities are $u_3^2 = 0.1971$ and $u_\infty^2 = 0.25$.
8.16 We have $M_X(t) = \frac{1}{(b-a)t}\big(e^{bt} - e^{at}\big)$ for all $t \neq 0$, where $M_X(0) = 1$.
8.17 Since the uniform random variable on $(0,1)$ has the moment-generating function
$$\int_0^1 e^{tu}\, du = \frac{e^t - 1}{t},$$
it is plausible that $X$ takes on the value 0 with probability $\frac{1}{2}$ and is uniformly distributed on $(0,1)$ with probability $\frac{1}{2}$. Indeed, for such a mixed random variable $X$, we have $E(e^{tX}) = \frac{1}{2} + \frac{1}{2}\,\frac{e^t - 1}{t}$.
8.18 We have
$$M_X(t) = \int_{-\infty}^{0} e^{tx}\,\frac{1}{2}ae^{ax}\, dx + \int_{0}^{\infty} e^{tx}\,\frac{1}{2}ae^{-ax}\, dx = \frac{a}{2}\Big[\int_0^{\infty} e^{-(a+t)y}\, dy + \int_0^{\infty} e^{-(a-t)x}\, dx\Big].$$
Therefore $M_X(t)$ is only defined for $-a < t < a$ and is given by
$$M_X(t) = \frac{a}{2}\Big(\frac{1}{a+t} + \frac{1}{a-t}\Big).$$
By $M'_X(0) = 0$ and $M''_X(0) = \frac{2}{a^2}$, we have $E(X) = 0$ and $\mathrm{var}(X) = \frac{2}{a^2}$.
8.19 Since the random variables $X_i$ are independent, the moment-generating function of $\sum_{i=1}^{n} X_i$ is
$$\Big(\frac{\lambda}{\lambda - t}\Big)^{\alpha_1}\cdots\Big(\frac{\lambda}{\lambda - t}\Big)^{\alpha_n} = \Big(\frac{\lambda}{\lambda - t}\Big)^{\alpha_1 + \cdots + \alpha_n}.$$
This proves the desired result by the uniqueness property of the moment-generating function.
8.20 By the independence of $X$ and $Y$, we have $M_{X+Y}(t) = M_X(t)M_Y(t)$. If the random variable $X+Y$ is $N(\mu, \sigma^2)$ distributed, then $M_{X+Y}(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. Since $X$ and $Y$ are identically distributed, we have $M_X(t) = M_Y(t)$ and so
$$[M_X(t)]^2 = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$
Hence we find that
$$M_X(t) = e^{(\mu/2)t + \frac{1}{2}(\sigma^2/2)t^2}.$$
This is the moment-generating function of an $N\big(\frac{1}{2}\mu, \frac{1}{2}\sigma^2\big)$ distributed random variable. The moment-generating function uniquely determines the underlying probability distribution. Thus, both $X$ and $Y$ are $N\big(\frac{1}{2}\mu, \frac{1}{2}\sigma^2\big)$ distributed.
Note: The assumption that $X$ and $Y$ are identically distributed can be dropped, but this requires deep analysis.
8.21 The definition $M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, e^x/(1+e^x)^2\, dx$ reveals that $M_X(t)$ is finite only for $-1 < t < 1$. Using the change of variable $u = 1/(1+e^x)$ and noting the relations $\frac{du}{dx} = -e^x/(1+e^x)^2$ and $e^x = \frac{1-u}{u}$, it readily follows that
$$M_X(t) = \int_0^1 \Big(\frac{1-u}{u}\Big)^t du \quad \text{for } -1 < t < 1.$$
This integral is the beta integral, see Section 4.6.3. Thus
$$M_X(t) = \Gamma(-t+1)\Gamma(t+1) \quad \text{for } -1 < t < 1,$$
where $\Gamma(x)$ is the gamma function. Since $a^t = e^{t\ln(a)}$ has as derivative $\ln(a)\,a^t$, we get from the integral representation of $M_X(t)$ that
$$M'_X(t) = \int_0^1 \ln\Big(\frac{1-u}{u}\Big)\Big(\frac{1-u}{u}\Big)^t du.$$
Thus
$$M'_X(0) = \int_0^1 \big[\ln(1-u) - \ln(u)\big]\, du = 0.$$
In the same way, we find
$$M''_X(0) = \int_0^1 \big[\ln(1-u) - \ln(u)\big]^2\, du = \frac{\pi^2}{3},$$
showing that $E(X) = 0$ and $\sigma^2(X) = \frac{\pi^2}{3}$.
8.22 Let Y = 1 − X. Then, by the assumption MX (t) = et MX (−t), we get
MY (t) = E(et(1−X) ) = et MX (−t) = MX (t).
Thus 1 − X has the same distribution as X. This implies E(1 − X) =
E(X) and so E(X) = 0.5. The distribution of X is not uniquely
determined: 1 − X has the same distribution both for a continuous
random variable X that is uniformly distributed on (0, 1) and for a
discrete random variable X with P (X = 0) = P (X = 1) = 0.5.
8.23 (a) Using the decomposition formula for the standard bivariate normal density function $f(x,y)$ in Section 6.1 and the basic formula $P((X,Y) \in C) = \iint_C f(x,y)\, dx\, dy$, the moment-generating function $E(e^{sX+tY})$ can be evaluated as
$$\int_{-\infty}^{\infty} e^{sx}\,\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\Big[\int_{-\infty}^{\infty} e^{ty}\,\frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}(y-\rho x)^2/(1-\rho^2)}\, dy\Big]\, dx.$$
The inner integral can be interpreted as $E(e^{tW})$ with $N(\rho x, 1-\rho^2)$-distributed $W$, and so, using Example 8.5, the inner integral reduces to $e^{\rho x t + \frac{1}{2}(1-\rho^2)t^2}$. Thus we get
$$E(e^{sX+tY}) = e^{\frac{1}{2}(1-\rho^2)t^2}\int_{-\infty}^{\infty} e^{(s+\rho t)x}\,\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\, dx.$$
The latter integral can be interpreted as $E\big(e^{(s+\rho t)Z}\big)$ with $N(0,1)$-distributed $Z$ and is thus equal to $e^{\frac{1}{2}(s+\rho t)^2}$. Putting the pieces together, we have
$$E(e^{sX+tY}) = e^{\frac{1}{2}(s^2 + 2\rho st + t^2)},$$
as was to be verified.
(b) It suffices to verify the assertion for $N(0,1)$ distributed $X$ and $Y$. Let $\rho = \rho(X,Y)\ (= \mathrm{cov}(X,Y))$. Using the assumption and Rule 5.12, the random variable $aX + bY$ is $N(0, a^2 + 2ab\rho + b^2)$ distributed for any constants $a, b$. Then, by Example 8.5 with $t = 1$,
$$E(e^{aX+bY}) = e^{\frac{1}{2}(a^2 + 2ab\rho + b^2)} \quad \text{for all } a, b.$$
This proves the desired result with an appeal to the result of part (a) and the uniqueness property of the moment-generating function.
Chapter 9
9.1 Let $X$ be the total time needed for both tasks. Then $E(X) = 45$ and $\sigma^2(X) = 65$. Write the probability $P(X < 60)$ as $1 - P(X \ge 45 + 15)$. The one-sided Chebyshev inequality gives
$$P(X < 60) \ge 1 - \frac{65}{65 + 225} = 0.7759.$$
9.2 By the two-sided Chebyshev inequality,
$$P(|X - \mu| \le k\sigma) \ge 1 - \frac{\sigma^2}{k^2\sigma^2} = 1 - \frac{1}{k^2}.$$
Thus choose $k \ge 1/\sqrt{1-\beta}$.
9.3 The moment-generating function of a random variable $X$ that is uniformly distributed on $(-1,1)$ is $M_X(t) = \frac{1}{2t}(e^t - e^{-t})$ for all $t \neq 0$. Put for abbreviation $\overline{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$. Since the $X_i$ are independent,
$$M_{\overline{X}_n}(t) = \Big[\frac{1}{2t/n}\big(e^{t/n} - e^{-t/n}\big)\Big]^n.$$
By Chernoff's bound, $P(\overline{X}_n \ge c) \le \min_{t>0} e^{-ct}M_{\overline{X}_n}(t)$. Using the inequality $\frac{1}{2}(e^u - e^{-u}) \le ue^{u^2/6}$ for $u > 0$, we get
$$P(\overline{X}_n \ge c) \le \min_{t>0} e^{-ct}e^{t^2/6n}.$$
The function $e^{-(ct - t^2/6n)}$ is minimal for $t = 3cn$, which gives the desired bound.
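Plugging $t = 3cn$ into $e^{-ct + t^2/6n}$ gives the bound $P(\overline{X}_n \ge c) \le e^{-3c^2 n/2}$. The sketch below compares this bound with a Monte Carlo estimate for uniform $(-1,1)$ samples; the sample sizes, the value of $c$, and the seed are our own choices:

```python
import math
import random

random.seed(1)
n, c, runs = 100, 0.2, 20_000
bound = math.exp(-1.5 * c * c * n)  # Chernoff bound e^{-3 c^2 n / 2}

hits = 0
for _ in range(runs):
    sample_mean = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
    if sample_mean >= c:
        hits += 1

estimate = hits / runs
print(estimate, "<=", round(bound, 5))
```

As expected, the empirical frequency sits well below the (not tight) Chernoff bound.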
9.4 (a) We have
$$E(e^{tX_i}) = E\Big(\sum_{n=0}^{\infty}\frac{t^n X_i^n}{n!}\Big) = \sum_{n=0}^{\infty}\frac{t^n}{n!}E(X_i^n).$$
The interchange of the order of expectation and summation is justified by the absolute convergence of $\sum_{n=1}^{\infty} E\big(\frac{t^n X_i^n}{n!}\big)$. Since $E(X_i) = 0$ and $E(X_i^n) \le B^{n-2}E(X_i^2) = B^{n-2}\sigma_i^2$ for $n \ge 2$, we get
$$E(e^{tX_i}) \le 1 + \frac{\sigma_i^2}{B^2}\sum_{n=2}^{\infty}\frac{t^n B^n}{n!} = 1 + \frac{\sigma_i^2}{B^2}\big(e^{tB} - 1 - tB\big) \le e^{\sigma_i^2(e^{tB} - 1 - tB)/B^2},$$
where the last inequality uses the fact that $1 + x \le e^x$ for $x > 0$.
(b) Using Rule 9.3 and Rule 8.5, we have
$$P\Big(\frac{1}{n}\sum_{i=1}^{n} X_i \ge c\Big) \le \min_{t>0}\big[e^{-nct}M_{X_1}(t)\cdots M_{X_n}(t)\big] \le \min_{t>0}\big[e^{-nct + n\sigma^2(e^{tB}-1-tB)/B^2}\big],$$
where $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2$. The minimizing value of $t$ is $t = \frac{1}{B}\ln\big(1 + \frac{cB}{\sigma^2}\big)$, as follows by putting the derivative of $-nct + n\sigma^2(e^{tB} - 1 - tB)/B^2$ equal to zero. Next it is a matter of some algebra to get the desired result
$$P\Big(\frac{1}{n}\sum_{i=1}^{n} X_i \ge c\Big) \le e^{-\frac{nc^2}{2\sigma^2 + 2Bc/3}} \quad \text{for } c > 0.$$
9.5 The random variable $X$ is distributed as $\sum_{i=1}^{n} X_i$, where the $X_i$ are independent with $P(X_i = 1) = p$ and $P(X_i = 0) = 1-p$. This gives $M_X(t) = (pe^t + 1 - p)^n$. By Chernoff's bound,
$$P\big(X \ge np(1+\delta)\big) \le \min_{t>0}\big[e^{-np(1+\delta)t}(pe^t + 1 - p)^n\big].$$
Let $g(t) = e^{-p(1+\delta)t}(pe^t + 1 - p)$. Putting the derivative of $g(t)$ equal to zero, it follows that the function $g(t)$ takes on its absolute minimum for $t = \ln(\gamma)$ with
$$\gamma = \frac{(1-p)(1+\delta)}{1 - p(1+\delta)}.$$
This leads to the upper bound
$$P\big(X \ge np(1+\delta)\big) \le \Big(\frac{p\gamma + 1 - p}{\gamma^{p(1+\delta)}}\Big)^n.$$
Next it is a matter of some algebra to obtain the first bound. To get the other bound, note that $f(a) = a\ln\big(\frac{a}{p}\big) + (1-a)\ln\big(\frac{1-a}{1-p}\big)$ has the derivatives
$$f'(a) = \ln\Big(\frac{a}{p}\Big) - \ln\Big(\frac{1-a}{1-p}\Big), \qquad f''(a) = \frac{1}{a(1-a)} \quad \text{for } 0 < a < 1.$$
Next use Taylor's formula $f(a) = f(p) + (a-p)f'(p) + \frac{1}{2!}(a-p)^2 f''(\eta_a)$ for some $\eta_a$ with $p < \eta_a < a$. Since $f(p) = f'(p) = 0$ and $\eta(1-\eta) \le \frac{1}{4}$ for $0 < \eta < 1$, we obtain $f(a) \ge 2(a-p)^2 = 2\delta^2 p^2$ for $p < a < 1$, which gives the second bound.
9.6 Using Rule 8.5 and the arithmetic-geometric mean inequality, we have
$$E(e^{tX}) = (1 - p_1 + p_1 e^t) \times \cdots \times (1 - p_n + p_n e^t) \le \Big[\frac{1}{n}\sum_{i=1}^{n}(1 - p_i + p_i e^t)\Big]^n = (pe^t + 1 - p)^n,$$
where $p = \frac{1}{n}\sum_{i=1}^{n} p_i$. The rest of the proof is identical to the proof for Problem 9.5.
9.7 Since $\ln(P_n) = \frac{1}{n}\sum_{k=1}^{n}\ln(X_k)$ and $E(\ln(X_k)) = \int_0^1 \ln(x)\, dx = -1$, it follows from the strong law of large numbers that
$$P\big(\{\omega : \lim_{n\to\infty}\ln(P_n(\omega)) = -1\}\big) = 1.$$
This implies that $P\big(\{\omega : \lim_{n\to\infty} P_n(\omega) = e^{-1}\}\big) = 1$.
9.8 For fixed $\epsilon > 0$, let $A_n = \{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}$. Then, by $\sum_{n=1}^{\infty} P(A_n) < \infty$, it follows from the first Borel–Cantelli lemma that
$$P\big(\{\omega : |X_n(\omega) - X(\omega)| > \epsilon \text{ for infinitely many } n\}\big) = 0.$$
This holds for any $\epsilon > 0$. Thus $P\big(\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\big) = 1$.
9.9 Fix $\epsilon > 0$. Let $A_n = \{\omega : |X_n(\omega)| > \epsilon\}$. Suppose to the contrary that $\sum_{n=1}^{\infty} P(A_n) = \infty$. Then, by the second Borel–Cantelli lemma,
$$P\big(\{\omega : \omega \in A_n \text{ for infinitely many } n\}\big) = 1.$$
This contradicts the assumption that $P\big(\{\omega : \lim_{n\to\infty} X_n(\omega) = 0\}\big) = 1$. Therefore $\sum_{n=1}^{\infty} P(A_n) < \infty$.
9.10 By the definition of $X_k$, we have $P\big(\{\omega : \lim_{k\to\infty} X_k(\omega) = 0\}\big) = 1$. Therefore $X_k$ converges almost surely to 0. However,
$$E(X_k) = \sum_{l=k+1}^{\infty} l\,\frac{c}{l^2} = c\sum_{l=k+1}^{\infty}\frac{1}{l} = \infty$$
for any $k \ge 1$.
9.11 By Markov's inequality,
$$P(|X_n - X| > \epsilon) = P(|X_n - X|^2 > \epsilon^2) \le \frac{E(|X_n - X|^2)}{\epsilon^2}$$
for each $\epsilon > 0$. Therefore $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$ for each $\epsilon > 0$, showing convergence in probability. A counterexample showing that convergence in probability does not imply convergence in mean square is provided by Problem 9.10.
Note: Mean-square convergence is neither stronger nor weaker than almost sure convergence.
9.12 Since the $X_k$ are uncorrelated, we have $E[(X_i - \mu)(X_j - \mu)] = 0$ for $j \neq i$. Therefore $E\big[(M_n - \mu)^2\big]$ can be evaluated as
$$E\Big[\frac{1}{n^2}\Big(\sum_{i=1}^{n}(X_i - \mu)\Big)\Big(\sum_{j=1}^{n}(X_j - \mu)\Big)\Big] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E\big[(X_i - \mu)(X_j - \mu)\big] = \frac{1}{n^2}\sum_{i=1}^{n} E\big[(X_i - \mu)^2\big] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},$$
which shows that $M_n$ converges in mean square to $\mu$.
9.13 Since the $X_i$ are independent, $E(Y_k) = \mu^2$ and so $E\big(\frac{1}{n}\sum_{k=1}^{n} Y_k\big) = \mu^2$. By Chebyshev's inequality,
$$P\Big(\Big|\frac{1}{n}\sum_{k=1}^{n} Y_k - \mu^2\Big| > \epsilon\Big) \le \frac{\sigma^2\big(\sum_{k=1}^{n} Y_k\big)}{n^2\epsilon^2}$$
for each $\epsilon > 0$. By the independence of the $X_k$, we have $\mathrm{cov}(Y_i, Y_j) = \mu^4 - \mu^4 = 0$ for $j > i + 1$. Thus
$$\sigma^2\Big(\sum_{k=1}^{n} Y_k\Big) = \sum_{k=1}^{n}\sigma^2(Y_k) + 2\sum_{i=1}^{n-1}\mathrm{cov}(Y_i, Y_{i+1}).$$
We have $\sigma^2(Y_k) = E(X_k^2)E(X_{k+1}^2) - \mu^4$ and $\mathrm{cov}(Y_i, Y_{i+1}) = \mu^2 E(X_{i+1}^2) - \mu^4$. Thus, by the boundedness of the $\sigma^2(X_i)$, there is a constant $c > 0$ such that
$$\sigma^2\Big(\sum_{k=1}^{n} Y_k\Big) \le nc \quad \text{for each } n \ge 1.$$
Then, we get that $P\big(\big|\frac{1}{n}\sum_{k=1}^{n} Y_k - \mu^2\big| > \epsilon\big)$ tends to 0 as $n \to \infty$, as was to be proved.
9.14 Since $E\big(\sum_{n=1}^{\infty} Y_n\big) < \infty$, we have
$$P\Big(\Big\{\omega : \sum_{n=1}^{\infty} Y_n(\omega) < \infty\Big\}\Big) = 1.$$
If $\sum_{n=1}^{\infty} Y_n(\omega) < \infty$, then $\lim_{n\to\infty} Y_n(\omega) = 0$. Therefore
$$P\big(\{\omega : \lim_{n\to\infty} Y_n(\omega) = 0\}\big) = 1.$$
9.15 We have
$$P(X_n \le x) = P(X_n \le x, |X_n - X| \le \epsilon) + P(X_n \le x, |X_n - X| > \epsilon) \le P(X \le x + \epsilon) + P(|X_n - X| > \epsilon).$$
Letting $n \to \infty$ and using the assumption that $X_n$ converges in probability to $X$, we get
$$\lim_{n\to\infty} P(X_n \le x) \le P(X \le x + \epsilon) \quad \text{for any } \epsilon > 0.$$
Next, letting $\epsilon \to 0$, we obtain $\lim_{n\to\infty} P(X_n \le x) \le P(X \le x)$ when $x$ is a continuity point of $P(X \le x)$. Next we will verify that $\lim_{n\to\infty} P(X_n \le x) \ge P(X \le x)$ when $x$ is a continuity point of $P(X \le x)$. Interchanging the roles of $X_n$ and $X$ in the above argument, we get
$$P(X \le x) = P(X \le x, |X_n - X| \le \epsilon) + P(X \le x, |X_n - X| > \epsilon) \le P(X_n \le x + \epsilon) + P(|X_n - X| > \epsilon).$$
Thus, by the assumption that $X_n$ converges in probability to $X$,
$$P(X \le x) \le \lim_{n\to\infty} P(X_n \le x + \epsilon).$$
Replacing $x$ by $x - \epsilon$, we obtain
$$\lim_{n\to\infty} P(X_n \le x) \ge P(X \le x - \epsilon) \quad \text{for any } \epsilon > 0.$$
Therefore $\lim_{n\to\infty} P(X_n \le x) \ge P(X \le x)$ when $x$ is a continuity point of $P(X \le x)$. This completes the proof.
9.16 Since $\sum_{k=1}^{n} I_k \le 1$, we have $S_n^2 \ge S_n^2\sum_{k=1}^{n} I_k$ and so
$$E(S_n^2) \ge E\Big(\sum_{k=1}^{n} S_n^2 I_k\Big) = \sum_{k=1}^{n} E(S_n^2 I_k).$$
Thus, writing $S_n^2$ as $S_k^2 + 2S_k(S_n - S_k) + (S_n - S_k)^2$, we get
$$E(S_n^2) \ge \sum_{k=1}^{n} E\big[\big(S_k^2 + 2S_k(S_n - S_k) + (S_n - S_k)^2\big)I_k\big] = \sum_{k=1}^{n} E(S_k^2 I_k) + \sum_{k=1}^{n} E[2S_k(S_n - S_k)I_k] + \sum_{k=1}^{n} E[(S_n - S_k)^2 I_k].$$
Note that $E[S_k(S_n - S_k)I_k] = E(S_k I_k)E(S_n - S_k)$, by the independence of the $X_j$. Since $E(X_j) = 0$ for all $j$, we have $E(S_n - S_k) = 0$. Also, $E[(S_n - S_k)^2 I_k] \ge 0$. Next, using the fact that the events $A_k$ are disjoint, we find
$$E(S_n^2) \ge \sum_{k=1}^{n} E(S_k^2 I_k) \ge \sum_{k=1}^{n} c^2 P(I_k = 1) = c^2\sum_{k=1}^{n} P(A_k) = c^2 P\Big(\bigcup_{k=1}^{n} A_k\Big) = c^2 P\big(\max_{1\le k\le n}|S_k| \ge c\big),$$
where the second inequality uses the fact that
$$E(S_k^2 I_k) = E(S_k^2 \mid I_k = 1)P(I_k = 1) + E(S_k^2 \mid I_k = 0)P(I_k = 0) \ge E(S_k^2 \mid I_k = 1)P(I_k = 1) \ge c^2 P(I_k = 1).$$
This completes the proof of Kolmogorov's inequality. The result can be rewritten as
$$P\big(\max_{1\le k\le n}|S_k| \ge c\big) \le \frac{1}{c^2}\sum_{k=1}^{n}\mathrm{var}(X_k).$$
This follows from the assumption that the $X_k$ are independent and satisfy $E(X_k) = 0$ for all $k$, implying that $E(X_i X_j) = E(X_i)E(X_j) = 0$ for $i \neq j$ and $E(X_k^2) = \mathrm{var}(X_k)$. Therefore,
$$E(S_n^2) = E\Big[\Big(\sum_{k=1}^{n} X_k\Big)^2\Big] = \sum_{k=1}^{n} E(X_k^2) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} E(X_i X_j) = \sum_{k=1}^{n} E(X_k^2) = \sum_{k=1}^{n}\mathrm{var}(X_k).$$
9.17 Since $\int_0^x \frac{1}{1+y^2}\, dy = \arctan(x)$, we have $P(X_i \le x) = \frac{2}{\pi}\arctan(x)$ for $x \ge 0$. Since the $X_i$ are independent, $P\big(\frac{M_n}{n} \le x\big) = P(X_1 \le nx)\cdots P(X_n \le nx)$. Therefore
$$P\Big(\frac{M_n}{n} \le x\Big) = \Big(\frac{2}{\pi}\arctan(nx)\Big)^n = \Big(1 - \frac{2}{\pi}\arctan\Big(\frac{1}{nx}\Big)\Big)^n \quad \text{for } x > 0.$$
For any fixed $x$, we have that $|\frac{1}{nx}| < 1$ for $n$ large enough. Using the power series expansion of $\arctan(y)$ for $|y| < 1$, it follows that
$$\lim_{n\to\infty} P\Big(\frac{M_n}{n} \le x\Big) = \lim_{n\to\infty}\Big(1 - \frac{2}{\pi x n}\Big)^n = e^{-\frac{2}{\pi x}} \quad \text{for } x > 0.$$
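The limit law can be checked by simulation: $X = \tan(\pi U/2)$ with $U$ uniform on $(0,1)$ has the cdf $\frac{2}{\pi}\arctan(x)$, so the empirical distribution of $M_n/n$ should be close to $e^{-2/(\pi x)}$ for large $n$. A sketch (the sample sizes, test point, and seed are our own choices):

```python
import math
import random

random.seed(7)
n, runs, x = 2000, 2000, 1.0
limit_cdf = math.exp(-2.0 / (math.pi * x))  # about 0.529 at x = 1

count = 0
for _ in range(runs):
    # one sample of M_n: the maximum of n draws with cdf (2/pi) arctan
    m = max(math.tan(math.pi * random.random() / 2) for _ in range(n))
    if m / n <= x:
        count += 1

print(round(count / runs, 2), round(limit_cdf, 3))
```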
9.18 The assumption $E(X_i^4) < \infty$ implies that $E(|X_i|^k) < \infty$ for $k = 1, 2$, and 3. This follows from the inequality $|x|^k \le 1 + x^4$ for all $x$ and $k = 1$, 2, and 3. It suffices to prove the strong law of large numbers under the assumption that $E(X_i) = 0$; otherwise, replace $X_i$ by $X_i - E(X_i)$. We first verify that, for some constant $c$,
$$E\big[(X_1 + \cdots + X_n)^4\big] \le cn^2 \quad \text{for all } n \ge 1.$$
To verify this, note that $(X_1 + \cdots + X_n)^4$ is the sum of terms $X_i^4$, $X_i^2 X_j^2$, $X_i^3 X_j$, $X_i^2 X_j X_k$, and $X_i X_j X_k X_l$, where $i$, $j$, $k$, and $l$ are different. By the independence of $X_1, \ldots, X_n$ and the assumption $E(X_j) = 0$, we have $E(X_i^3 X_j) = E(X_i^3)E(X_j) = 0$. Similarly, $E(X_i^2 X_j X_k) = 0$ and $E(X_i X_j X_k X_l) = 0$. There are $n$ terms $E(X_i^4)$ and $\binom{n}{2}\binom{4}{2} = 3n(n-1)$ terms $E(X_i^2 X_j^2)$. Therefore
$$E\big[(X_1 + \cdots + X_n)^4\big] = nE(X_1^4) + 3n(n-1)E^2(X_1^2),$$
showing that $E\big[(X_1 + \cdots + X_n)^4\big]$ is bounded by $cn^2$ for some constant $c$. This implies
$$\sum_{n=1}^{\infty}\frac{E\big[(X_1 + \cdots + X_n)^4\big]}{n^4} \le c\sum_{n=1}^{\infty}\frac{1}{n^2} < \infty.$$
Next, by the result of Problem 9.14, $\frac{1}{n^4}(X_1 + \cdots + X_n)^4$ converges almost surely to 0, which implies that $\frac{1}{n}(X_1 + \cdots + X_n)$ converges almost surely to 0, as was to be proved.
9.19 The Kelly betting fraction suggests to stake 2.7% of your current
bankroll each time.
9.20 Using the relation $V_n = (1 - \alpha + \alpha R_1)\cdots(1 - \alpha + \alpha R_n)V_0$, the asymptotic growth rate is now given by
$$E[\ln(1 - \alpha + \alpha R_1)] = p\ln(1 - \alpha + \alpha f_1) + (1-p)\ln(1 - \alpha + \alpha f_2).$$
Putting the derivative of this expression with respect to $\alpha$ equal to zero leads to the formula for $\alpha^*$. Next the formula for the asymptotic rate of return follows.
Note: In the case that there is a rate of return $r$ on the non-invested part of your bankroll, the expression for $\alpha^*$ becomes
$$\alpha^* = \min\Big(\frac{\big(pf_1 + (1-p)f_2\big)(1+r) - (1+r)^2}{(f_1 + f_2)(1+r) - f_1 f_2 - (1+r)^2},\ 1\Big).$$
9.21 Denote by $V_k$ your bankroll after $k$ bets. Then, by the same arguments as used in the derivation of the Kelly betting fraction, $\ln(V_n/V_0) = \sum_{i=1}^{n}\ln(1 - \alpha + \alpha R_i)$, where the $R_i$ are independent random variables with $P(R_i = f_1) = p$ and $P(R_i = f_2) = 1 - p$. By the central limit theorem, $\ln(V_n/V_0)$ is approximately $N(n\mu_\alpha, n\sigma_\alpha^2)$ distributed for $n$ large enough, where
$$\mu_\alpha = p\ln(1 - \alpha + f_1\alpha) + (1-p)\ln(1 - \alpha + f_2\alpha),$$
$$\sigma_\alpha^2 = p\ln^2(1 - \alpha + f_1\alpha) + (1-p)\ln^2(1 - \alpha + f_2\alpha) - \mu_\alpha^2.$$
Next it readily follows that, for large $n$,
$$P(V_n > x) \approx 1 - \Phi\Big(\frac{\ln(x/V_0) - n\mu_\alpha}{\sigma_\alpha\sqrt{n}}\Big).$$
For $p = 0.5$, $f_1 = 1.8$, $f_2 = 0.4$, $n = 52$, and $\alpha = \frac{5}{24}$, the normal approximation yields the values 0.697, 0.440, and 0.150 for $x/V_0 = 1$, 2, and 5. A simulation study with 1 million runs yields the values 0.660, 0.446, and 0.167. For $\alpha = 1$, the probability is about 0.5 that your bankroll will be no more than $1.8^{26} \times 0.4^{26} \times 10{,}000 = 1.95$ dollars after 52 weeks. The intuitive explanation is that in the most likely scenario a path will unfold in which the stock price rises during half of the time and falls during the other half of the time.
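The normal-approximation values quoted above are easy to reproduce. The sketch below evaluates $\Phi$ via the error function; the helper names are ours:

```python
import math

def Phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p, f1, f2, n, alpha = 0.5, 1.8, 0.4, 52, 5 / 24
a = math.log(1 - alpha + f1 * alpha)
b = math.log(1 - alpha + f2 * alpha)
mu = p * a + (1 - p) * b
sigma = math.sqrt(p * a * a + (1 - p) * b * b - mu * mu)

results = []
for ratio in (1, 2, 5):  # ratio = x / V0
    prob = 1.0 - Phi((math.log(ratio) - n * mu) / (sigma * math.sqrt(n)))
    results.append(round(prob, 3))
print(results)  # [0.697, 0.44, 0.15]
```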
9.22 Let the random variable $S_t$ be equal to 1 if the shuttle is on the way from the hotel to the airport and be equal to 2 if the shuttle is on the way from the airport to the hotel. The stochastic process $\{S_t, t \ge 0\}$ is regenerative. The epochs at which the shuttle departs from the hotel can be taken as the regeneration epochs. A cycle consists of a trip from the hotel to the airport directly followed by a trip from the airport to the hotel. Imagine that a reward at rate 1 is earned when the shuttle is on the way from the hotel to the airport. If $d$ is the distance in miles between the airport and the hotel, then
$$E(\text{reward in one cycle}) = \frac{1}{2}\times\frac{d}{30} + \frac{1}{2}\times\frac{d}{50},$$
$$E(\text{length of a cycle}) = \frac{1}{2}\times\frac{d}{30} + \frac{1}{2}\times\frac{d}{50} + \frac{1}{20}\int_{30}^{50}\frac{d}{s}\, ds.$$
Thus, with probability 1, the long-run proportion of the shuttle's operating time (excluding any time needed to pick up passengers) that is spent going to the airport is
$$\frac{\frac{1}{2}\times\frac{d}{30} + \frac{1}{2}\times\frac{d}{50}}{\frac{1}{2}\times\frac{d}{30} + \frac{1}{2}\times\frac{d}{50} + \frac{1}{20}\int_{30}^{50}\frac{d}{s}\, ds} = 0.5108.$$
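Note that the distance $d$ cancels in the ratio, so the value 0.5108 can be checked with a two-line computation (variable names are ours):

```python
import math

to_airport = 0.5 / 30 + 0.5 / 50           # E(time to the airport) per mile
back = (1 / 20) * math.log(50 / 30)        # E(time back) per mile: (1/20) * Integral_30^50 ds/s
fraction = to_airport / (to_airport + back)
print(round(fraction, 4))  # 0.5108
```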
9.23 Let the random variable $S_t$ be the age of the bulb in use at time $t$. The stochastic process $\{S_t\}$ describing the age of the light bulb in use is regenerative. It regenerates itself each time a new bulb is installed. Let the generic random variable $X$ be distributed as the lifetime of a bulb. Let $F(x) = P(X \le x)$ be the probability distribution function of the lifetime $X$ of a bulb and $f(x)$ be its probability density. Then,
$$E(\text{length of a cycle}) = \int_0^T tf(t)\, dt + \int_T^{\infty} Tf(t)\, dt = \int_0^T\big(1 - F(t)\big)\, dt,$$
$$E(\text{cost incurred in one cycle}) = c_2 P(X \le T) + c_1 P(X > T) = c_1 + (c_2 - c_1)F(T).$$
Thus, with probability 1, the long-run average cost per unit time is
$$\frac{c_1 + (c_2 - c_1)F(T)}{\int_0^T\big(1 - F(t)\big)\, dt}.$$
9.24 Let the random variable $S_t$ be the number of orders awaiting processing at time $t$. The stochastic process $\{S_t, t \ge 0\}$ is regenerative. It regenerates itself each time $N$ orders have accumulated. The expected length of one cycle is the expected value of the sum of $N$ interarrival times and thus is equal to $N\eta$. The expected amount of time the first order arriving in a cycle has to wait until processing begins is $(N-1)\eta$, the expected waiting time of the second order arriving in a cycle is $(N-2)\eta$, and so on. Hence the total expected cost in one cycle is
$$K + h\big[(N-1)\eta + (N-2)\eta + \cdots + \eta\big] = K + \frac{1}{2}hN(N-1)\eta.$$
Hence, with probability 1, the long-run average cost per unit time is
$$\frac{E(\text{cost in one cycle})}{E(\text{length of one cycle})} = \frac{K}{N\eta} + \frac{1}{2}h(N-1).$$
This cost function is minimal for one of the two integers nearest to $\sqrt{2K/(h\eta)}$.
9.25 Let $S_t$ be equal to 1 if the channel is on at time $t$ and be equal to 0 otherwise. The stochastic process $\{S_t\}$ is regenerative. Take the epochs at which an on-time starts as the regeneration epochs. Let $\mu_{on}$ be the expected length of the on-time $X$. Then,
$$\mu_{on} = \int_0^1 x\cdot 6x(1-x)\, dx = 0.5.$$
By the law of conditional expectation,
$$E(\text{length of a cycle}) = \int_0^1 E(L \mid X = x)f(x)\, dx = \int_0^1\big(x + x^2\sqrt{x}\big)\,6x(1-x)\, dx = \mu_{on} + \mu_{off},$$
where $\mu_{off} = \int_0^1 x^2\sqrt{x}\cdot 6x(1-x)\, dx = 0.2424$. By the same arguments as in Example 9.9, it now follows that the long-run fraction of time the system is on equals
$$\frac{\mu_{on}}{\mu_{on} + \mu_{off}} = \frac{0.5}{0.5 + 0.2424} = 0.673.$$
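Both integrals are elementary: $\mu_{on} = 6\int_0^1(x^2 - x^3)\,dx = 6\big(\frac{1}{3} - \frac{1}{4}\big)$ and $\mu_{off} = 6\int_0^1(x^{4.5} - x^{5.5})\,dx = 6\big(\frac{1}{5.5} - \frac{1}{6.5}\big)$... or, equivalently, written with the exponents before multiplying by $6x(1-x)$, as below. A two-line check:

```python
mu_on = 6 * (1/3 - 1/4)        # Integral_0^1 x * 6x(1-x) dx = 0.5
mu_off = 6 * (1/4.5 - 1/5.5)   # Integral_0^1 x^{2.5} * 6x(1-x) dx
fraction_on = mu_on / (mu_on + mu_off)
print(round(mu_off, 4), round(fraction_on, 3))  # 0.2424 0.673
```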
9.26 Let the random variable $S_t$ be equal to 0 if the system is out of stock at time $t$ and be equal to 1 otherwise. The continuous-time stochastic process $\{S_t, t \ge 0\}$ is regenerative. The epochs at which the stock drops to zero can be taken as regeneration epochs. A cycle starts each time the stock on hand drops to zero. The system is out of stock during the time elapsed from the beginning of a cycle until the next inventory replenishment. This amount of time is exponentially distributed with mean $1/\mu$. The expected amount of time it takes to go from stock level $Q$ to 0 equals $Q/\lambda$. Hence, with probability 1,
$$\text{the long-run fraction of time the system is out of stock} = \frac{1/\mu}{1/\mu + Q/\lambda}.$$
To find the fraction of demand that is lost, let the random variable $I_n$ be equal to 0 if the system runs out of stock at the $n$th demand epoch and be equal to 1 otherwise. The discrete-time stochastic process $\{I_n, n = 1, 2, \ldots\}$ is regenerative. It regenerates itself each time a demand occurs and the stock drops to zero. In the discrete case, a cycle is to be interpreted as the number of demand epochs between two demand epochs at which the stock drops to 0. The expected value of the number of demands lost in one cycle equals $\lambda \times E(\text{amount of time the system is out of stock during one cycle}) = \lambda/\mu$. The expected number of demands occurring in one cycle is $\lambda/\mu + Q$. Hence, with probability 1,
$$\text{the long-run fraction of demand that is lost} = \frac{\lambda/\mu}{\lambda/\mu + Q}.$$
It now follows that the long-run fraction of customers finding the system out of stock is equal to the long-run fraction of time the system is out of stock. This is a particular instance of the property "Poisson arrivals see time averages."
9.27 The process describing the status of the processor is regenerative. Take as cycle the time interval between two successive epochs at which an arriving job finds the processor idle. Using the memoryless property of the Poisson process, the expected length of a cycle is $\mu + \frac{1}{\lambda}$ and the expected amount of idle time in one cycle is $\frac{1}{\lambda}$. Thus the long-run fraction of time the server is idle is
$$\frac{1/\lambda}{\mu + 1/\lambda} = \frac{1}{1 + \lambda\mu}.$$
Let $N$ be the number of jobs arriving during the processing time $X$. Then $E(N \mid X = x) = \lambda x$ and so $E(N) = \lambda\mu$. Thus the expected number of arrivals during one cycle is $1 + \lambda\mu$. The number of jobs accepted in one cycle is 1, and so the long-run fraction of jobs that are accepted is
$$\frac{1}{1 + \lambda\mu}.$$
9.28 Let the random variable $S_t$ denote the number of non-failed units at time $t$. The stochastic process $\{S_t, t \ge 0\}$ is regenerative. The regeneration epochs are the inspection epochs. A cycle is the time interval between two inspections. The expected length of a cycle is $T$. The expected amount of time the system is down during one cycle is $E[\max(T - X_1 - X_2, 0)]$, where $X_1$ and $X_2$ are independent random variables having an exponential distribution with parameter $\alpha$. The density of $X_1 + X_2$ is the Erlang-2 density $\alpha^2 te^{-\alpha t}$. Thus
$$E[\max(T - X_1 - X_2, 0)] = \int_0^T (T - t)\alpha^2 te^{-\alpha t}\, dt = T\big(1 - e^{-\alpha T} - \alpha Te^{-\alpha T}\big) - \frac{2}{\alpha}\Big(1 - e^{-\alpha T} - \alpha Te^{-\alpha T} - \frac{(\alpha T)^2}{2}e^{-\alpha T}\Big) = T - \frac{2}{\alpha} + e^{-\alpha T}\Big(T + \frac{2}{\alpha}\Big).$$
Hence, with probability 1, the long-run fraction of time the system is down equals
$$\frac{T - \frac{2}{\alpha} + e^{-\alpha T}\big(T + \frac{2}{\alpha}\big)}{T}.$$
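The closed form can be double-checked by numerically integrating $\int_0^T (T-t)\alpha^2 te^{-\alpha t}\,dt$ with a midpoint rule; the parameter values below are arbitrary:

```python
import math

def downtime(T, a):
    # closed form: T - 2/a + e^{-aT} (T + 2/a)
    return T - 2 / a + math.exp(-a * T) * (T + 2 / a)

def downtime_numeric(T, a, steps=100_000):
    # midpoint-rule evaluation of Integral_0^T (T-t) a^2 t e^{-a t} dt
    h = T / steps
    s = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        s += (T - t) * a * a * t * math.exp(-a * t) * h
    return s

T, a = 3.0, 0.8
print(round(downtime(T, a), 6), round(downtime_numeric(T, a), 6))
```

The two values agree to the printed precision.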
9.29 Let a cycle be the time interval between two successive replacements of the bulb. Denote by the generic variable $X$ the length of a cycle. Imagine that a cost at rate 1 is incurred if the age of the bulb is larger than $c$ and a cost at rate 0 is incurred otherwise. Then, the cost incurred during one cycle is $\max(X - c, 0)$. The long-run average cost per unit time is
$$\frac{E[\max(X - c, 0)]}{E(X)}.$$
To evaluate $E[\max(X - c, 0)]$, we use the result that $E(V) = \int_0^{\infty} P(V > v)\, dv$ for any nonnegative random variable $V$, see Problem 4.26. This gives
$$E[\max(X - c, 0)] = \int_0^{\infty} P\big(\max(X - c, 0) > v\big)\, dv = \int_0^{\infty}\big(1 - F(v + c)\big)\, dv = \int_c^{\infty}\big(1 - F(x)\big)\, dx,$$
which verifies the desired result.
9.30 Let a cycle be the time interval between two successive replacements of the unit. Denote by the generic variable $X$ the length of a cycle. The expected length of one cycle is $E(X)$. Under the condition that $X = x$ the cost incurred in the cycle is $\frac{1}{2}x^2$ and so the expected cost incurred in one cycle is $\frac{1}{2}E(X^2)$. Thus the long-run average cost per unit time is given by
$$\frac{E(X^2)}{2E(X)}.$$
9.31 Define a cycle as the time elapsed between two consecutive replacements of the item. By conditioning on the lifetime $X$ of the item, we have that the expected length of a cycle is
$$\int_0^T xf(x)\, dx + \int_T^{\infty}\big(T + a(x)\big)f(x)\, dx,$$
where $a(x) = E[\min(x - T, V)]$ for $x > T$ and $V$ is the length of the interval between $T$ and the first preventive replacement opportunity after time $T$. Since $V$ is exponentially distributed with parameter $\lambda$, we have
$$a(x) = \int_0^{x-T} v\lambda e^{-\lambda v}\, dv + (x - T)e^{-\lambda(x-T)} = \frac{1}{\lambda}\big[1 - e^{-\lambda(x-T)}\big].$$
Noting that $P(V \le x - T) = 1 - e^{-\lambda(x-T)}$ for $x > T$, we have that the expected cost incurred in one cycle is
$$c_0 F(T) + \int_T^{\infty}\big[c_1\big(1 - e^{-\lambda(x-T)}\big) + c_0 e^{-\lambda(x-T)}\big]f(x)\, dx,$$
where $F(x) = P(X \le x)$. The long-run average cost per unit time is
$$\frac{c_0 F(T) + \int_T^{\infty}\big[c_1\big(1 - e^{-\lambda(x-T)}\big) + c_0 e^{-\lambda(x-T)}\big]f(x)\, dx}{\int_0^T xf(x)\, dx + \int_T^{\infty}\big[T + \frac{1}{\lambda}\big(1 - e^{-\lambda(x-T)}\big)\big]f(x)\, dx}.$$
9.32 Substituting $M(t) = \frac{t}{\mu} - \big(\frac{1-p}{2-p}\big)^2\big(1 - e^{-\lambda(2-p)t}\big)$ into the right-hand side of the renewal equation for $M(t)$, we get that the right-hand side is given by
$$1 - e^{-\lambda t} - (1-p)\lambda te^{-\lambda t} + \int_0^t\Big[\frac{t-x}{\mu} - \Big(\frac{1-p}{2-p}\Big)^2\big(1 - e^{-\lambda(2-p)(t-x)}\big)\Big]\big[p\lambda e^{-\lambda x} + (1-p)\lambda^2 xe^{-\lambda x}\big]\, dx.$$
It is a matter of some algebra to show that this expression is equal to
$$\frac{t}{\mu} - \Big(\frac{1-p}{2-p}\Big)^2\big(1 - e^{-\lambda(2-p)t}\big).$$
9.33 For fixed clearing time $T > 0$, the stochastic process describing the number of messages in the buffer regenerates itself each time the buffer is cleared. The clearing epochs are regeneration epochs because of the memoryless property of the Poisson process. The expected length of a cycle is $T$. The expected cost incurred in the first cycle is
$$K + E\Big[\sum_{n=1}^{\infty} h(S_n)\Big] = K + \sum_{n=1}^{\infty} E[h(S_n)].$$
Since $E[h(S_n)] = h\int_0^T (T - x)\frac{\lambda^n x^{n-1}}{(n-1)!}e^{-\lambda x}\, dx$, we get
$$\sum_{n=1}^{\infty} E[h(S_n)] = h\int_0^T (T - x)\lambda\, dx = \frac{1}{2}h\lambda T^2.$$
Thus the average cost per unit time is $\big(K + \frac{1}{2}h\lambda T^2\big)/T$. This expression is minimal for $T = \sqrt{2K/(h\lambda)}$.
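The optimizing $T$ can be confirmed by scanning the average-cost function numerically; the parameter values below are our own illustrative choices:

```python
import math

K, h, lam = 25.0, 2.0, 4.0
cost = lambda T: (K + 0.5 * h * lam * T * T) / T  # average cost per unit time

T_star = math.sqrt(2 * K / (h * lam))  # analytic optimum sqrt(2K/(h*lambda))

# crude grid search over T in (0, 10]
grid = [i / 1000 for i in range(1, 10001)]
T_best = min(grid, key=cost)
print(round(T_star, 3), round(T_best, 3))  # both 2.5 for these parameters
```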
9.34 Using Rule 5.10, we have that
$$E(\text{time that } i \text{ messages are present during one cycle}) = \sum_{k=i}^{\infty}\frac{T}{k+1}\,e^{-\lambda T}\frac{(\lambda T)^k}{k!} = \frac{1}{\lambda}\sum_{j=i+1}^{\infty} e^{-\lambda T}\frac{(\lambda T)^j}{j!} = \frac{1}{\lambda}\Big(1 - \sum_{j=0}^{i} e^{-\lambda T}\frac{(\lambda T)^j}{j!}\Big).$$
The ratio of this expression and the cycle length $T$ gives the sought result.
9.35 In view of the interpretation of an Erlang distributed interoccurrence
time as the sum of r independent phases, imagine that phases are
completed according to a Poisson process with rate α, where the completion of each rth phase marks the occurrence of an event. Then
P (N (t) ≤ k) can be interpreted as the probability that at most (k +
1)r − 1 phases are completed up to time t.
9.36 By conditioning on the epoch $X_1$ of the first renewal and using the law of conditional expectation, we have that $P(Y(t) > u)$ is equal to
$$\int_0^t P(Y(t) > u \mid X_1 = x)f(x)\, dx + \int_t^{t+u} P(Y(t) > u \mid X_1 = x)f(x)\, dx + \int_{t+u}^{\infty} P(Y(t) > u \mid X_1 = x)f(x)\, dx.$$
Next note that the conditional distribution of $Y(t)$ given that $X_1 = x$ with $0 \le x \le t$ is the same as the unconditional distribution of $Y(t-x)$. Also, we have that $P(Y(t) > u \mid X_1 = x) = 1$ if $x > t + u$ and $P(Y(t) > u \mid X_1 = x) = 0$ if $t < x \le t + u$. Thus we get
$$P(Y(t) > u) = 1 - F(t+u) + \int_0^t P(Y(t-x) > u)f(x)\, dx,$$
which verifies the equation for $Q_t(u)$. If $f(x) = \lambda e^{-\lambda x}$, then substitution of $P(Y(t) > u) = e^{-\lambda u}$ into $1 - F(t+u) + \int_0^t P(Y(t-x) > u)f(x)\, dx$ gives
$$1 - F(t+u) + \int_0^t P(Y(t-x) > u)f(x)\, dx = e^{-\lambda(t+u)} + \int_0^t e^{-\lambda u}\lambda e^{-\lambda x}\, dx = e^{-\lambda u}.$$
Since the renewal equation for $Q_t(u)$ has a unique solution, we have verified that, for any $t > 0$,
$$P(Y(t) > u) = e^{-\lambda u} \quad \text{for } u > 0$$
if the interoccurrence times are exponentially distributed with parameter $\lambda$. This is the memoryless property of the Poisson process.
9.37 Fix $s \ge 1$. Imagine that a reward of 1 is earned for each customer belonging to a batch of size $s$. Then the expected reward earned per batch is $sp_s$. The expected number of customers in a batch is $\mu$. Thus the average reward per customer is
$$\frac{sp_s}{\mu},$$
which intuitively explains the result.
Chapter 10
10.1 Let $X_n$ be the number of type-1 particles in compartment A after the $n$th transfer. The process $\{X_n\}$ is a Markov chain with state space $I = \{0, 1, \ldots, r\}$. The one-step transition probabilities are
$$p_{i,i-1} = \frac{i^2}{r^2}, \qquad p_{ii} = \frac{2i(r-i)}{r^2}, \qquad p_{i,i+1} = \frac{(r-i)^2}{r^2} \quad \text{for } i = 0, 1, \ldots, r,$$
and $p_{ij} = 0$ otherwise.
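Since $i^2 + 2i(r-i) + (r-i)^2 = r^2$, every row of the transition matrix sums to one. A quick sanity check for the chain of Problem 10.1 (the value of $r$ is an arbitrary choice):

```python
r = 10  # arbitrary number of particles per compartment
P = [[0.0] * (r + 1) for _ in range(r + 1)]
for i in range(r + 1):
    if i >= 1:
        P[i][i - 1] = i * i / r**2
    P[i][i] = 2 * i * (r - i) / r**2
    if i <= r - 1:
        P[i][i + 1] = (r - i) ** 2 / r**2

assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
print("all rows sum to 1")
```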
10.2 Let state 1 correspond to the situation that the professor is driving to the office and has his driver's license with him, state 2 to the situation that the professor is driving to his office and has his driver's license at home, state 3 to the situation that the professor is driving to the office and has his driver's license at the office, state 4 to the situation that the professor is driving home and has his driver's license with him, state 5 to the situation that the professor is driving to his home and has his driver's license at the office, and state 6 to the situation that the professor is driving to his home and has his driver's license at home. Denoting by $X_n$ the state at the $n$th drive of the professor, the process $\{X_n\}$ is a Markov chain with state space $I = \{1, 2, \ldots, 6\}$. The matrix of one-step transition probabilities of the Markov chain is given by

from\to    1      2      3      4      5      6
  1        0      0      0      0.5    0.5    0
  2        0      0      0      0      0      1
  3        0      0      0      0.5    0.5    0
  4        0.75   0.25   0      0      0      0
  5        0      0      1      0      0      0
  6        0.75   0.25   0      0      0      0
10.3 Take as state the largest outcome in the last roll. Let $X_n$ be the state after the $n$th roll with the convention $X_0 = 0$. The process $\{X_n\}$ is a Markov chain with state space $I = \{0, 1, \ldots, 6\}$. The one-step transition probabilities are $p_{0k} = \frac{1}{6}$ for $1 \le k \le 6$ and
$$p_{jk} = \Big(\frac{k}{6}\Big)^j - \Big(\frac{k-1}{6}\Big)^j \quad \text{for } j, k = 1, \ldots, 6,$$
using the relation $P(Y = k) = P(Y \le k) - P(Y \le k - 1)$.
10.4 Let state XY correspond to the situation that yesterday’s weather is
of type X and today’s weather is of type Y , where X and Y can take
on the values S (sunny) and R (rainy). Denote by Xn the state of
the weather at the nth day, then {Xn } is a Markov chain with state
space I = {SS, SR, RS, RR}. The one-step transition probabilities of
the Markov chain are given by
from\to    SS     SR     RS     RR
  SS      0.9    0.1     0      0
  SR       0      0     0.5    0.5
  RS      0.7    0.3     0      0
  RR       0      0     0.45   0.55
10.5 Let’s say that the system is in state (0, 0) if both machines are good,
in state (0, k) if one of the machines is good and the other one is in
revision with a remaining repair time of k days for k = 1, 2, and in
state (1, 2) if both machines are in revision with remaining repair times
of one day and two days. Defining Xn as the state of the system at the
end of the nth day, the process {Xn } is a Markov chain. The one-step
transition probabilities are given by
p(0,0)(0,0) = 9/10,  p(0,0)(0,2) = 1/10,  p(0,1)(0,0) = 9/10,  p(0,1)(0,2) = 1/10,
p(0,2)(0,1) = 9/10,  p(0,2)(1,2) = 1/10,  p(1,2)(0,1) = 1,
and pvw = 0 otherwise.
10.6 A circuit board is said to have status 0 if it has failed and is said to
have status i if it functions and has the age of i weeks. Let’s say that
the system is in state (i, j) with 0 ≤ i ≤ j ≤ 6 if one of the circuit
boards has status i and the other one has status j just before any
replacement. This state description requires (7 choose 2) + 7 = 28 states
rather than 7^2 = 49 states when a separate state variable would be
used for each circuit board. Denote by Xn the state of the system at
the end of the nth week. Then, the process {Xn} is a Markov chain.
The one-step probabilities can be expressed in terms of the failure
probabilities ri. For states (i, j) with i ≠ j and 0 ≤ i < j ≤ 5, we have

p(i,j),(i+1,j+1) = (1 − ri)(1 − rj),   p(i,j),(0,i+1) = (1 − ri)rj,
p(i,j),(0,j+1) = ri(1 − rj),   p(i,j),(0,0) = ri rj,

and p(i,j),(v,w) = 0 otherwise. For states (i, i) with 0 ≤ i ≤ 5, we have
p(i,i),(i+1,i+1) = (1 − ri)^2, p(i,i),(0,i+1) = 2ri(1 − ri), and p(i,i),(0,0) = ri^2.
Further, p(i,6),(i+1,1) = 1 − ri and p(i,6),(0,1) = ri for 0 ≤ i ≤ 5, and
p(6,6),(1,1) = 1.
10.7 Let’s say that the system is in state i if the channel holds i messages
(including any message in transmission). If the system is in state i
at the beginning of a time slot, then the buffer contains max(i − 1, 0)
messages. Define Xn as the state of the system at the beginning of
the nth time slot. The process {Xn } is a Markov chain with state
space I = {0, 1, . . . , K + 1}. In a similar way as in Example 10.4, the
one-step transition probabilities are obtained. Letting
ak = e^(−λ) λ^k / k!   for k = 0, 1, . . . ,

the one-step transition probabilities can be expressed as

p0j = aj for 0 ≤ j ≤ K − 1,   p0,K = Σ_{k=K}^∞ ak,   pK+1,K = 1 − f,
pK+1,K+1 = f,   pij = (1 − f)a_{j−i+1} + f a_{j−i} for 1 ≤ i ≤ j ≤ K,
pi,i−1 = (1 − f)a0 and pi,K+1 = 1 − Σ_{j=i−1}^K pij for 1 ≤ i ≤ K,

and pij = 0 otherwise.
10.8 Let’s follow the location of a particular car. Let Xn denote the location
of the car after its nth return, then the process {Xn } is a Markov chain
whose matrix P of one-step transition probabilities is given by
from\to     1      2      3      4
   1       0.8    0.1     0     0.1
   2       0.1    0.7    0.2     0
   3       0.2    0.1    0.5    0.2
   4        0     0.2    0.1    0.7
If the car is currently at location 3, it will be back at location 3 after
being rented out five times with probability p33^(5) = 0.1677 (the entry
in the third row and third column of P^5). This
probability is obtained from the matrix product

        ( 0.4185  0.2677  0.1151  0.1987 )
P^5  =  ( 0.3089  0.3305  0.1904  0.1702 )
        ( 0.3008  0.2860  0.1677  0.2455 )
        ( 0.1890  0.3328  0.1985  0.2798 )
It appears experimentally that the matrix product Pn converges to
a matrix with identical rows as n gets large (the Markov chain is
aperiodic and has no two disjoint closed sets). We find that
        ( 0.3165  0.3038  0.1645  0.2152 )
P^25 =  ( 0.3165  0.3038  0.1645  0.2152 )  = P^26 = P^27 = · · · .
        ( 0.3165  0.3038  0.1645  0.2152 )
        ( 0.3165  0.3038  0.1645  0.2152 )
The long-run frequency at which the car is returned to location i has
the values 0.3165, 0.3038, 0.1645, and 0.2152 for i = 1, 2, 3, and 4.
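These matrix powers are easy to reproduce with NumPy (a sketch; the use of array indices 0–3 for locations 1–4 is the sketch's own convention):

```python
import numpy as np

# One-step transition matrix of the car-rental chain (locations 1-4).
P = np.array([[0.8, 0.1, 0.0, 0.1],
              [0.1, 0.7, 0.2, 0.0],
              [0.2, 0.1, 0.5, 0.2],
              [0.0, 0.2, 0.1, 0.7]])

P5 = np.linalg.matrix_power(P, 5)    # five-step transition probabilities
P25 = np.linalg.matrix_power(P, 25)  # practically the limiting matrix

print(np.round(P5, 4))    # P5[3, 3] ≈ 0.2798, P5[2, 2] ≈ 0.1677
print(np.round(P25[0], 4))  # ≈ (0.3165, 0.3038, 0.1645, 0.2152)
```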
10.9 Use a Markov chain with four states SS, SR, RS, and RR. The matrix
P of one-step transition probabilities satisfies (rows and columns
ordered as SS, SR, RS, RR)

        ( 0.71780  0.09767  0.08787  0.09666 )
P^5  =  ( 0.61508  0.10236  0.13250  0.15007 )
        ( 0.68371  0.09972  0.10236  0.11422 )
        ( 0.60893  0.10280  0.13506  0.15321 )

and

        ( 0.69231  0.09890  0.09890  0.10989 )
P^30 =  ( 0.69231  0.09890  0.09890  0.10989 )  = P^31 = · · · .
        ( 0.69231  0.09890  0.09890  0.10989 )
        ( 0.69231  0.09890  0.09890  0.10989 )

The matrix P^5 gives pRR,RS^(5) + pRR,SS^(5) = 0.7440. The matrix P^30
gives that the long-run probability is 0.6923 + 0.0989 = 0.7912. The
expected value of the number of sunny days in the coming 14 days is

Σ_{t=1}^{14} (pRR,RS^(t) + pRR,SS^(t)) = 10.18.
10.10 The formula for p00^(k) is correct for k = 1:

p00^(1) = β/(α+β) + α(1 − α − β)/(α+β) = 1 − α(α+β)/(α+β) = 1 − α = p00.

Similarly, the formula for p11^(k) is correct for k = 1. Suppose the
formulas for p00^(k) and p11^(k) have been verified for k = 1, . . . , n − 1.
Then, using the formula for p00^(n−1) together with
p10^(n−1) = 1 − p11^(n−1) = β/(α+β) − β(1 − α − β)^(n−1)/(α+β), we get

p00^(n) = (1 − α)p00^(n−1) + α p10^(n−1)
        = (1 − α)[β/(α+β) + α(1 − α − β)^(n−1)/(α+β)] + α[β/(α+β) − β(1 − α − β)^(n−1)/(α+β)]
        = β/(α+β) + ((α(1 − α) − αβ)/(α+β))(1 − α − β)^(n−1)
        = β/(α+β) + α(1 − α − β)^n/(α+β).

Similarly, the formula for p11^(n) is verified.
10.11 Consider a two-state Markov chain with states 0 and 1, where state
0 means that the last bit was received incorrectly and state 1 means
that the last bit was received correctly. The one-step transition probabilities are given by p00 = 0.9, p01 = 0.1, p10 = 0.001, and p11 = 0.999.
The expected number of incorrectly received bits is

Σ_{n=1}^{5,000} p10^(n) = 49.417.
10.12 It is sufficient to analyze the evolution of a single tree growing at a
particular spot in the forest. The state of the system is 0 if the tree is
a baby tree, is 1 if the tree is a young tree, is 2 if the tree is a middle-aged tree, and is 3 if the tree is an old tree. Let Xn be the state after
50n years. Then, the process {Xn } is a Markov chain with state space
I = {0, 1, 2, 3}. The matrix P of one-step transition probabilities is
given by
from\to     0      1      2      3
   0       0.2    0.8     0      0
   1      0.05     0     0.95    0
   2       0.1     0      0     0.9
   3      0.25     0      0    0.75
From the matrix product P^5 the probabilities aj = p0j^(5) are obtained for
j = 0, 1, 2, 3. The values of the probabilities aj are a0 = 0.2144, a1 =
0.1675, a2 = 0.0760, and a3 = 0.5421. It appears experimentally that
the matrix product P^n converges to a matrix with identical rows as n
gets large (the Markov chain is aperiodic and has no two disjoint closed
sets). Thus, we calculate the probabilities πj for j = 0, 1, 2, 3 as the
row elements of the matrix product P^n for sufficiently large values of n
(n = 20 is large enough for convergence in four decimals). The values
of the probabilities πj are π0 = 0.1888, π1 = 0.1511, π2 = 0.1435,
and π3 = 0.5166. The age distribution of the forest after 50 years
is a multinomial distribution with parameters (10,000, a0, a1, a2, a3)
and the age distribution of the forest in equilibrium is a multinomial
distribution with parameters (10,000, π0, π1, π2, π3).
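Both the five-step probabilities aj and the equilibrium distribution can be checked with a short NumPy sketch (here the equilibrium row is obtained by solving πP = π together with the normalization, rather than by raising P to a high power):

```python
import numpy as np

# Tree-age chain (states 0-3: baby, young, middle-aged, old).
P = np.array([[0.20, 0.80, 0.00, 0.00],
              [0.05, 0.00, 0.95, 0.00],
              [0.10, 0.00, 0.00, 0.90],
              [0.25, 0.00, 0.00, 0.75]])

a = np.linalg.matrix_power(P, 5)[0]        # a_j = p_{0j}^(5)

# Equilibrium: pi P = pi plus sum(pi) = 1, solved as a least-squares system.
A = np.vstack([P.T - np.eye(4), np.ones((1, 4))])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.round(a, 4))   # a  ≈ (0.2144, 0.1675, 0.0760, 0.5421)
print(np.round(pi, 4))  # pi ≈ (0.1888, 0.1511, 0.1435, 0.5166)
```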
10.13 Using

var(Σ_{t=1}^n It) = Σ_{t=1}^n var(It) + 2 Σ_{t=1}^{n−1} Σ_{u=t+1}^n cov(It, Iu),

we get the result for σ^2[Vij(n)]. The normal approximation to the
sought probability is

1 − Φ((240.5 − 217.294)/12.101) = 0.0276.

By simulation, we found the value 0.0267.
10.14 Use a Markov chain with states 1, 2, and 3, where the three states
correspond to the situation that the last match was a win, a loss and
a draw for England. The matrix of one-step transition probabilities is
from\to     1      2      3
   1      0.44   0.37   0.19
   2      0.28   0.43   0.29
   3      0.27   0.30   0.43

The expected number of wins for England in the next three matches
given that the last match was a draw is

Σ_{k=1}^3 p31^(k) = 0.9167.
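The sum of the three matrix powers can be evaluated directly (a NumPy sketch; the use of indices 0–2 for the states win, loss, draw is the sketch's own convention):

```python
import numpy as np

# England match chain: states 0, 1, 2 = win, loss, draw for England.
P = np.array([[0.44, 0.37, 0.19],
              [0.28, 0.43, 0.29],
              [0.27, 0.30, 0.43]])

# Expected number of wins in the next three matches, starting from a draw.
expected_wins = sum(np.linalg.matrix_power(P, k)[2, 0] for k in range(1, 4))
print(round(expected_wins, 4))  # ≈ 0.9167
```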
10.15 Take a Markov chain with the states s = (i, k), where i = 1 if England
has won the last match, i = 2 if England has lost the last match, i = 3
if the last match was a draw, and k ∈ {0, 1, 2, 3} denotes the number of
matches England has won so far. For states s = (i, k) with k = 0, the
one-step transition probabilities are p(1,0)(1,1) = 0.44, p(1,0)(2,0) = 0.37,
p(1,0)(3,0) = 0.19, p(2,0)(1,1) = 0.28, p(2,0)(2,0) = 0.43, p(2,0)(3,0) = 0.29,
p(3,0)(1,1) = 0.27, p(3,0)(2,0) = 0.30, and p(3,0)(3,0) = 0.43. Similarly,
the one-step probabilities for the states (i, 1) and (i, 2). The one-step
transition probabilities for the states (i, 3) are not relevant and may
be taken as p(i,3)(i,3) = 1. Let pk denote the probability that England
will win k matches of the next three matches when the last match was
a draw. Then,
pk = p(3,0)(1,k)^(3) + p(3,0)(2,k)^(3) + p(3,0)(3,k)^(3) for 0 ≤ k ≤ 3

(verify that this formula uses the fact that the second component of
state s = (i, k) cannot decrease). This leads to p0 = 0.3842, p1 =
0.3671, p2 = 0.1964 and p3 = 0.0523. As a sanity check, Σ_{k=0}^3 k·pk =
0.9167, in agreement with the answer to Problem 10.14.
10.16 Use a Markov chain with five states, where state 1 means deuce,
state 2 means Bill has advantage, state 3 means Mark has advantage,
state 4 means Bill is the winner, and state 5 means Mark is the winner.
The states 4 and 5 are absorbing. The one-step transition probabilities are
p12 = 0.55, p13 = 0.45, p21 = 0.40, p24 = 0.60, and p31 = p35 = 0.50.
For starting state i, let fi be the probability that Bill will be the winner
of the game and µi be the expected duration of the game. The sought
probability f1 = 0.5946 follows by solving the linear equations

f1 = 0.55f2 + 0.45f3,  f2 = 0.4f1 + 0.6,  f3 = 0.5f1.

The sought expected value µ1 = 3.604 follows by solving the linear
equations

µ1 = 1 + 0.55µ2 + 0.45µ3,  µ2 = 1 + 0.4µ1,  µ3 = 1 + 0.5µ1.
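The two linear systems share the same coefficient matrix, so both can be solved in a few lines (a sketch; it uses f3 = 0.5 f1, since from state 3 Bill can only win by first returning to deuce):

```python
import numpy as np

# Unknowns ordered (x1, x2, x3) for states deuce, advantage-Bill, advantage-Mark.
A = np.array([[1.0, -0.55, -0.45],
              [-0.4, 1.0, 0.0],
              [-0.5, 0.0, 1.0]])

# Win probabilities: f1 = 0.55 f2 + 0.45 f3, f2 = 0.4 f1 + 0.6, f3 = 0.5 f1.
f = np.linalg.solve(A, np.array([0.0, 0.6, 0.0]))
print(round(f[0], 4))   # Bill's win probability from deuce, ≈ 0.5946

# Expected durations: mu1 = 1 + 0.55 mu2 + 0.45 mu3, mu2 = 1 + 0.4 mu1, mu3 = 1 + 0.5 mu1.
mu = np.linalg.solve(A, np.ones(3))
print(round(mu[0], 3))  # expected duration from deuce, ≈ 3.604
```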
10.17 Use a Markov chain with the 11 states 1, 2, . . . , 10 and 10+, where
state i means that the particle is in position i and state 10+ means
that the particle is in a position beyond position 10. The state 10+ is
taken to be absorbing. The one-step transition probabilities are
pij = 0.5 for j = i − ⌊i/2⌋ and for j = 2i − ⌊i/2⌋.
The other pij are zero. The sought probability is p1,10+^(25) = 0.4880.
10.18 Use a Markov chain with six states, where state i means that the jar
contains i red balls. Take state 5 as an absorbing state and so p55 = 1.
The other one-step transition probabilities are p01 = 1, pi,i−1 = i/5 and
pi,i+1 = (5 − i)/5 for 1 ≤ i ≤ 4. The probability that more than n picks are
needed is

1 − p25^(n).

This probability has the values 0.9040, 0.8139, 0.5341, 0.2840, and
0.0761 for n = 5, 10, 25, 50, and 100. The expected number of picks is
µ2 = 40.167. To obtain this result, define µi as the expected number of
picks needed to reach state 5 from state i and solve the linear equations

µi = 1 + (i/5)µi−1 + ((5 − i)/5)µi+1 for i = 0, 1, . . . , 4,

where µ−1 = µ5 = 0.
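A sketch of the linear-equation solve for the expected number of picks (the equation for i = 0 reduces to µ0 = 1 + µ1, and µ5 = 0 for the absorbing state):

```python
import numpy as np

# mu_i = 1 + (i/5) mu_{i-1} + ((5-i)/5) mu_{i+1} for i = 0,...,4, with mu_5 = 0.
n = 5
A = np.eye(n)
b = np.ones(n)
for i in range(n):
    if i - 1 >= 0:
        A[i, i - 1] -= i / 5
    if i + 1 < n:
        A[i, i + 1] -= (5 - i) / 5
mu = np.linalg.solve(A, b)
print(round(mu[2], 3))  # expected picks starting from 2 red balls, ≈ 40.167
```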
10.19 Use a Markov chain with four states 0, 1, 2, and 3, where state 0 means
neither a total of 7 nor a total of 12 for the last roll, state 1 means
a total of 7 for the last roll but not for the roll before, state 2 means
a total of 7 for the last two rolls, and state 3 means a total of 12 for
the last roll. The states 2 and 3 are absorbing with p22 = p33 = 1.
Further, p00 = p10 = 29/36, p01 = p12 = 6/36, p03 = p13 = 1/36, and pij = 0
otherwise. Let fi be the probability of absorption in state 2 when the
initial state is i. The sought probability f0 is 6/13, as follows by solving
the two linear equations

f0 = (29/36)f0 + (6/36)f1 and f1 = (29/36)f0 + 6/36.
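The two equations can be solved exactly with rational arithmetic (a sketch; the first equation is rearranged to (7/36) f0 = (6/36) f1 before substituting):

```python
from fractions import Fraction as F

# (7/36) f0 = (6/36) f1  and  f1 = (29/36) f0 + 6/36
# Substituting the second equation into the first and solving for f0:
f0 = (F(6, 36) * F(6, 36)) / (F(7, 36) - F(6, 36) * F(29, 36))
print(f0)  # 6/13
```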
10.20 (a) Consider a Markov chain with six states i = 0, 1, 2, 3, 4, and 5,
where state i now corresponds to the situation that the last i tosses
resulted in heads but not the toss preceding those i tosses. State 5 is
absorbing. For i = 0, 1, 2, 3, 4, the one-step transition probabilities are
pi0 = pi,i+1 = 0.5. The desired probability is p05^(20) = 0.2499.
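The chain of part (a) is small enough to check directly (a NumPy sketch):

```python
import numpy as np

# Run-of-five-heads chain: states 0..5, p_{i0} = p_{i,i+1} = 1/2 for i <= 4,
# state 5 absorbing.
P = np.zeros((6, 6))
for i in range(5):
    P[i, 0] = 0.5
    P[i, i + 1] = 0.5
P[5, 5] = 1.0

prob = np.linalg.matrix_power(P, 20)[0, 5]
print(round(prob, 4))  # probability of five heads in a row within 20 tosses, ≈ 0.2499
```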
(b) Consider again a Markov chain with six states i = 0, 1, 2, 3, 4, 5,
where state i corresponds to the situation that the last i tosses resulted
in the same outcomes but not the toss preceding those i tosses. State
5 is absorbing, that is, p5,5 = 1. Then, p0,1 = 1. For i = 1, 2, 3, 4, the
one-step transition probabilities are pi1 = pi,i+1 = 0.5. The desired
probability is p05^(20) = 0.4584.
(c) Use a Markov chain with the six states (i, j) for i = 0, 1 and
j = 0, 1, 2 and the three absorbing states ak for k = 1, 2, 3, where state
(i, j) means that player A has tossed i heads in a row and player B has
tossed j heads in a row, state a1 means that player A has won, state
a2 means that player B has won, and state a3 means that there is a
tie. The matrix P of one-step probabilities is easily determined. For
example, p(1,0)(0,0) = p(1,0)(0,1) = 0.25 and p(1,0)a1 = 0.5. The sought
probabilities can be obtained by solving twice a system of six linear
equations. However, we obtained these probabilities by calculating Pn
for n sufficiently large. Player A wins with probability 0.7398, player
B with probability 0.2125, and a tie occurs with probability 0.0477.
10.21 Take a specific city (say, Venice) and a particular number (say, 53).
Consider a Markov chain with state space I = {0, 1, . . . , 182}, where
state i indicates the number of draws since the particular number 53
appeared for the last time in the Venice lottery. The state 182 is
taken as an absorbing state. The one-step transition probabilities for
the other states of the Markov chain are
pi0 = 5/90 and pi,i+1 = 85/90 for i = 0, 1, . . . , 181.
The probability that in the next 1,040 draws of the Venice lottery
there is some window of 182 consecutive draws in which the number
53 does not appear can be calculated as p0,182^(1,040). This probability has
the value p = 0.00077541. The five winning numbers in a draw of the
lottery are not independent of each other, but the dependence is weak
enough to give a good approximation for the probability that in the
next 1,040 drawings of the Venice lottery there is not some number
that stays away during some window of 182 consecutive draws. This
probability is approximated by (1 − p)^90. The lottery takes place in
10 cities. Thus the sought probability is approximately equal to
1 − (1 − p)^900 = 0.5025.
This problem is another illustration of the fact that coincidences can
nearly always be explained by probabilistic arguments!
10.22 Let’s say that the system is in state i if i different numbers are drawn
so far. Define the random variable Xn as the state of the system after
the nth drawing. The process {Xn } is a Markov chain with state
space I = {0, 1, . . . , 45}. State 45 is an absorbing state. The one-step
transition probabilities are given by p06 = 1 and

pi,i+k = (45 − i choose k)(i choose 6 − k) / (45 choose 6)

for i = 0, 1, . . . , 44 and k = 0, 1, . . . , min(45 − i, 6), p45,45 = 1, and
pij = 0 otherwise, with the convention (m choose n) = 0 for n > m. The
probability that more than r drawings are needed to obtain all of the
numbers 1, 2, . . . , 45 is equal to

1 − p0,45^(r).

This probability has the values 0.9989, 0.7409, 0.2643, and 0.035 for
r = 15, 25, 35, and 50.
10.23 Take a Markov chain with state space I = {(i, j) : i, j ≥ 0, i + j ≤ 25},
where state (i, j) means that i pictures are in the pool once and j
pictures are in the pool twice or more. State (0, 25) is absorbing. The
other one-step transition probabilities are

p(i,j),(i,j) = (j/25) × (j/25),
p(i,j),(i+1,j) = 2 × ((25 − i − j)/25) × (j/25),
p(i,j),(i+2,j) = ((25 − i − j)/25) × ((24 − i − j)/25),
p(i,j),(i,j+1) = (i/25) × ((25 − i − j)/25) + ((25 − i − j)/25) × ((i + 1)/25),
p(i,j),(i−1,j+1) = (j/25) × (i/25) + (i/25) × ((j + 1)/25),
p(i,j),(i−2,j+2) = (i/25) × ((i − 1)/25).

Then

P(N > n) = 1 − p(0,0),(0,25)^(n).

We have E(N) = 71.4 weeks. This value can be calculated from
E(N) = Σ_{n=0}^∞ P(N > n) or by solving a set of linear equations.
10.24 Let us analyze the game for the case that player A chooses HHT
and player B responds with T HH. We use a Markov chain with the
7 states labeled as 0, H, T , HH, T H, HHT , and T HH. State 0
corresponds to the beginning of the game. State HH means (heads,
heads) for the last two tosses and state T H means (tails, heads) for
the last two tosses. State H means heads for the first toss of the game.
The meaning of state T is more subtle: state T means that the last
toss is tails (it is not necessary to use separate states
TT and HT). The states HHT (win for player A) and THH (win for
player B) are absorbing states. Let Xn be the state after the nth toss.
Then the process {Xn } is a Markov chain. The matrix P of one-step
transition probabilities is given by
from\to     0     H     T     HH    TH    HHT   THH
   0        0    0.5   0.5    0     0     0     0
   H        0     0    0.5   0.5    0     0     0
   T        0     0    0.5    0    0.5    0     0
   HH       0     0     0    0.5    0    0.5    0
   TH       0     0    0.5    0     0     0    0.5
   HHT      0     0     0     0     0     1     0
   THH      0     0     0     0     0     0     1
Calculating Pn for several large values of n (or solving a system of
linear equations), we get the value 0.75 for the win probability of
player B. Similarly, the other win probabilities of player B can be
calculated (the calculation of the win probability 0.875 for player B
when A chooses HHH does not require a Markov chain: if player A
chooses HHH, then player A can only win if the first three tosses are
heads).
10.25 Take a Markov chain with state space I = {0, 1, . . . , 5}, where state
i means that Joe's bankroll is i × 200 dollars. The states 0 and 5 are
absorbing. The other one-step transition probabilities are p10 = p20 =
p31 = p43 = 19/37, p12 = p24 = p35 = p45 = 18/37, and pij = 0 otherwise.
(a) The probability that Joe will place more than n bets is

1 − p4,0^(n) − p4,5^(n).

This probability has the values 0.2637, 0.1283, 0.0320, and 0.0080 for
n = 2, 3, 5, and 7.
(b) To find the probability of Joe reaching his goal, solve the four
linear equations

f1 = (18/37)f2,  f2 = (18/37)f4,  f3 = 18/37 + (19/37)f1,  f4 = 18/37 + (19/37)f3.

The probability of Joe reaching his goal is f4 = 0.78531.
(c) Let si be the expected value of the total amount
you will stake in the remaining part of the game when the current
state is i. To find s4, solve the linear equations

s1 = 200 + (18/37)s2,  s2 = 400 + (18/37)s4,  s3 = 400 + (19/37)s1,  s4 = 200 + (19/37)s3.

This gives s4 = 543.37 dollars. As a sanity check, the ratio of your
expected loss and the expected amount staked during the game is
indeed equal to the house advantage of 0.0270 dollar per dollar staked
(the expected loss is 0.21469 × 800 − 0.78531 × 200 = 14.69 dollars).
Note: The maximal value of the probability of Joe reaching his goal is
not achieved by the strategy of bold play, see also Problem 2.43. Using
the optimization method of dynamic programming, the maximal value
can be calculated as 0.7900 when no combined bets are done on the
outcome of the wheel. Bold play is optimal in a primitive casino with
only one type of roulette bet having wf ≤ 1, where w is the probability
of getting back f times the stake and 1 − w is the probability of losing
the stake (payoff odds f for 1). The proof of this result requires deep
mathematics.
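The linear systems of parts (b) and (c) have the same coefficient matrix, so one solve routine serves both (a NumPy sketch; unknowns ordered f1, . . . , f4 and s1, . . . , s4):

```python
import numpy as np

# a = win probability 18/37, b = loss probability 19/37 per spin.
a, b = 18 / 37, 19 / 37

# f1 = a f2, f2 = a f4, f3 = a + b f1, f4 = a + b f3.
A = np.array([[1, -a, 0, 0],
              [0, 1, 0, -a],
              [-b, 0, 1, 0],
              [0, 0, -b, 1]], dtype=float)
f = np.linalg.solve(A, np.array([0.0, 0.0, a, a]))
print(round(f[3], 5))  # probability of reaching the goal from state 4, ≈ 0.78531

# s1 = 200 + a s2, s2 = 400 + a s4, s3 = 400 + b s1, s4 = 200 + b s3.
s = np.linalg.solve(A, np.array([200.0, 400.0, 400.0, 200.0]))
print(round(s[3], 2))  # expected total amount staked from state 4, ≈ 543.37
```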
10.26 To answer the question about European roulette, we define the following Markov chain. Let’s say that the system is in state 0 if the
game begins or if the last spin showed a zero and in state i if the last
i spins of the wheel showed the same color for i = 1, 2, . . . , 26 but not
the spin before those i spins. State 26 is taken as an absorbing state.
Let Xn be the state of the system after the nth spin of the wheel. The
process {Xn } is a Markov chain with one-step transition probabilities
p00 = 1/37, p01 = 36/37, pi0 = 1/37, pi,i+1 = pi1 = 18/37 for i = 1, 2, . . . , 25,
p26,26 = 1, and pij = 0 otherwise. The probability that in the next
n spins of the wheel the same color will come up 26 or more times
in a row is given by p0,26^(n). This probability has the value 0.0368 for
n = 5,000,000 (and 0.0723 for n = 10,000,000). A minor adjustment
of the one-step transition probabilities of the Markov chain is required
for the question about American roulette. Then, the probability that
in the next n spins of the wheel one of the numbers 1 to 36 will come
up 6 or more times in a row is 0.0565 for n = 5,000,000 (and 0.1099
for n = 10,000,000).
10.27 Use a Markov chain with the states i = 0, 1, . . . , 8, where state i means
that the dragon has i heads. The states 0, 7, and 8 are absorbing. The
one-step transition probabilities are pi,i−1 = 0.7, pi,i+1 = 0.3p, and
pi,i+2 = 0.3(1 − p) for 1 ≤ i ≤ 6. The win probabilities for p = 0, 0.5,
and 1 are the probabilities 0.6748, 0.8255, and 0.9688 of absorption in
state 0 starting from state 3.
10.28 (a) For i = 1, 2, . . . , 6, let state i mean that the outcomes of the last
i rolls are different but not the outcomes of the last i + 1 rolls. The
auxiliary state 0 means that the first roll is still to occur. State 6 is taken
as an absorbing state. Let Xn be the state after the nth roll. The
one-step transition probabilities of the absorbing Markov chain {Xn}
are p01 = 1 and, for 1 ≤ i ≤ 5, pij = 1/6 for j = 1, . . . , i and
pi,i+1 = (6 − i)/6 (when the new roll matches the outcome that occurred
j rolls back, the last j rolls are the different ones). Raising the matrix
P of one-step transition probabilities to the power 100, it follows that
the probability of getting a run of six different outcomes within 100
rolls is given by p0,6^(100) = 0.7054. To find the expected value of the
number of rolls until a run of six different outcomes occurs, define µi
as the expected number of rolls until such a run occurs given that the
process starts in state i. Then,

µ0 = 1 + µ1,   µi = 1 + (1/6) Σ_{j=1}^i µj + ((6 − i)/6) µi+1 for i = 1, 2, . . . , 5,

where µ6 = 0. The solution of this system of six linear equations gives
the desired value µ0 = 83.20.
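A sketch that builds the chain of part (a) and reproduces both answers (the expected value is obtained from the standard equation (I − Q)µ = 1 on the transient states):

```python
import numpy as np

# States 0..6. From state i, a repeat of one of the last i outcomes leads to
# state j (j = 1..i) with probability 1/6 each; a new outcome leads to state
# i+1 with probability (6-i)/6. State 6 is absorbing.
P = np.zeros((7, 7))
P[0, 1] = 1.0
for i in range(1, 6):
    for j in range(1, i + 1):
        P[i, j] = 1 / 6
    P[i, i + 1] = (6 - i) / 6
P[6, 6] = 1.0

prob = np.linalg.matrix_power(P, 100)[0, 6]
print(round(prob, 4))  # probability of a run of six different outcomes within 100 rolls

Q = P[:6, :6]  # transient part of the chain
mu = np.linalg.solve(np.eye(6) - Q, np.ones(6))
print(round(mu[0], 2))  # expected number of rolls, ≈ 83.2
```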
(b) Let state 1 mean that the last roll gave the outcome 1, state 2 that
the last two rolls gave the run 12, state 3 that the last three rolls gave
the run 123, state 4 that the last four rolls gave the run 1234, state 5
that the last five rolls gave the run 12345, and state 6 that the last six
rolls gave the run 123456. The auxiliary state 0 refers to the situation
that none of these six states applies. State 6 is taken as an absorbing
state. Let Xn be the state after the nth roll. The one-step transition
probabilities of the absorbing Markov chain {Xn } are
p00 = 5/6, p01 = 1/6, and pi0 = 4/6, pi1 = pi,i+1 = 1/6 for 1 ≤ i ≤ 5.

The probability of getting the run 123456 within 100 rolls of the die is
obtained by raising the matrix P of one-step transition probabilities
to the power 100. This gives p0,6^(100) = 0.00203. By setting up a similar
system of linear equations as in (a), it follows that the expected number
of rolls until such a run occurs is 46,656.
Note: The value 0.00203 for the probability of getting the run 123456
within 100 rolls of the die can also be obtained by the Poisson heuristic
discussed in Section 3.7.1. Consider 95 trials i = 1, 2, . . . , 95, where
trial i is said to be successful if the rolls i, i + 1, . . . , i + 5 give the
successive outcomes 1, 2, . . . , 6. The success probability of each trial is
1/6^6. The trials are not independent, but the dependence is quite weak.
This justifies approximating the number of successes by a Poisson
distribution with expected value λ = 95 × (1/6^6). Thus the probability
of getting the run 123456 within 100 rolls of the die is approximately
equal to 1 − e^(−λ) = 0.00203.
10.29 Take a Markov chain with six states 0, 1, . . . , 5, where state 0 corresponds to the start of the game, state 1 means that all five dice show
a different value, and state i with i ≥ 2 means that you have i dice
of a kind. State 5 is an absorbing state. In state 1 you re-roll all five
dice and in state i with i ≥ 2 you leave the i dice of a kind and re-roll
the other 5 − i dice. The matrix of one-step transition probabilities is
from\to    0        1          2          3         4        5
   0       0    120/1296   900/1296   250/1296   25/1296   1/1296
   1       0    120/1296   900/1296   250/1296   25/1296   1/1296
   2       0        0       120/216    80/216    15/216    1/216
   3       0        0          0       25/36     10/36     1/36
   4       0        0          0         0        5/6       1/6
   5       0        0          0         0         0         1

The probability of getting Yahtzee within three rolls is p05^(3) = 0.04603.
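The three-step probability can be reproduced with exact rational arithmetic (a sketch; the 6 × 6 matrix below is the one displayed above):

```python
from fractions import Fraction as F

# Yahtzee chain: states 0 (start), 1 (all different), i >= 2 (i of a kind).
P = [
    [0, F(120, 1296), F(900, 1296), F(250, 1296), F(25, 1296), F(1, 1296)],
    [0, F(120, 1296), F(900, 1296), F(250, 1296), F(25, 1296), F(1, 1296)],
    [0, 0, F(120, 216), F(80, 216), F(15, 216), F(1, 216)],
    [0, 0, 0, F(25, 36), F(10, 36), F(1, 36)],
    [0, 0, 0, 0, F(5, 6), F(1, 6)],
    [0, 0, 0, 0, 0, 1],
]

def matmul(A, B):
    # Plain 6x6 matrix product with exact fractions.
    return [[sum(A[i][k] * B[k][j] for k in range(6)) for j in range(6)]
            for i in range(6)]

P3 = matmul(matmul(P, P), P)
print(float(P3[0][5]))  # probability of Yahtzee within three rolls, ≈ 0.04603
```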
10.30 Number the three players as 1, 2, and 3, where the players 1 and 2 play
the first game. Let state 0 correspond to the start of the tournament,
state (i, j) to the situation that players i and j are playing against
each other with player i being the winner of the previous game, and
state ai to the situation that player i has won two games in a row. Let
Xn be the state after the nth game. Then {Xn } is a Markov chain
with three absorbing states a1 , a2 , and a3 . The one-step transition
probabilities are given by
from\to    0   (1,2)  (1,3)  (2,1)  (2,3)  (3,1)  (3,2)   a1    a2    a3
  0        0     0     0.5     0     0.5     0      0      0     0     0
  (1,2)    0     0      0      0     0.5     0      0     0.5    0     0
  (1,3)    0     0      0      0      0      0     0.5    0.5    0     0
  (2,1)    0     0     0.5     0      0      0      0      0    0.5    0
  (2,3)    0     0      0      0      0     0.5     0      0    0.5    0
  (3,1)    0    0.5     0      0      0      0      0      0     0    0.5
  (3,2)    0     0      0     0.5     0      0      0      0     0    0.5
  a1       0     0      0      0      0      0      0      1     0     0
  a2       0     0      0      0      0      0      0      0     1     0
  a3       0     0      0      0      0      0      0      0     0     1

The probability of player 3 being the ultimate winner can be computed
as lim_{n→∞} p0,a3^(n) or by solving a system of 7 linear equations in 7
unknowns. The probability has the value 0.2857. The expected duration
of the tournament can be computed from
Σ_{n=0}^∞ (1 − p0,a1^(n) − p0,a2^(n) − p0,a3^(n))
or by solving a system of 7 linear equations in 7 unknowns. The
expected value is 3 games. This result can also be directly seen by
noting that the tournament takes r games with probability 2(1/2)^r for
r = 2, 3, . . . .
10.31 You may use a Markov chain with eight states (0, 0, 0), . . . , (1, 1, 1),
where 0 means an empty glass and 1 means a filled glass. However, for
reasons of symmetry, a Markov chain with four states i = 0, 1, 2, and
3 suffices, where state i means that there are i filled glasses. State 0
is absorbing with p00 = 1. The other one-step transition probabilities
are p10 = 1/3, p12 = 2/3, p21 = 2/3, p23 = 1/3, and p32 = 1. By solving the
linear equations

µ3 = 1 + µ2,  µ2 = 1 + (2/3)µ1 + (1/3)µ3,  µ1 = 1 + (2/3)µ2,

we find E(N) = µ3 = 10. The probability P(N > n) = 1 − p30^(n) has the
values 0.6049, 0.3660, 0.1722, 0.1042, and 0.0490 for n = 5, 10, 15, 20,
and 25.
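A sketch of the linear solve (unknowns ordered µ1, µ2, µ3):

```python
import numpy as np

# mu3 = 1 + mu2,  mu2 = 1 + (2/3) mu1 + (1/3) mu3,  mu1 = 1 + (2/3) mu2.
A = np.array([[1, -2 / 3, 0],
              [-2 / 3, 1, -1 / 3],
              [0, -1, 1]], dtype=float)
mu = np.linalg.solve(A, np.ones(3))
print(round(mu[2], 6))  # E(N) starting from three filled glasses, ≈ 10.0
```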
10.32 For i = 0, 1, . . . , 25, let state i correspond to the situation that the
number of five-dollar notes in the till is equal to i and the next person
wants to buy a ticket. Also, define an auxiliary state −1 for the situation that there is no change for a person wishing to buy a ticket with
a ten-dollar note. The states −1 and 25 are taken as absorbing states.
For the states i = 0, 1, . . . , 24, the one-step transition probabilities are
given by
pi,i−1 = pi,i+1 = 0.5 and
pij = 0 otherwise.
The probability that none of the fifty persons will have to wait for
change is 1 − p0,−1^(50) = 0.1123.
10.33 Use a Markov chain with the 8 states 0, 1, . . . , 6 and −1 , where state
0 is the starting state, state i with 1 ≤ i ≤ 6 means that the outcome
of the last roll is i and at least as large as the outcome of the roll
preceding it, and state −1 means that the outcome of the last roll is less
than the outcome of the roll preceding it. State −1 is absorbing with
p−1,−1 = 1. To answer the first question, take the one-step transition
probabilities p0j = 1/6 for 1 ≤ j ≤ 6, pii = · · · = pi6 = 1/6 and
pi,−1 = (i − 1)/6 for 1 ≤ i ≤ 6. The probability that each of the last three
rolls is at least as large as the roll preceding it equals 1 − p0,−1^(4) = 0.0972.
In a similar way, the probability that each of the last three rolls is larger
than the roll preceding it is 0.0116.
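A sketch that builds the eight-state chain for the first question (the sketch's own convention is to put the absorbing state −1 at array index 0 and states 0, 1, . . . , 6 at indices 1 to 7):

```python
import numpy as np

P = np.zeros((8, 8))
P[0, 0] = 1.0                    # state -1 is absorbing
for j in range(1, 7):
    P[1, 1 + j] = 1 / 6          # p_{0j} = 1/6 for j = 1..6
for i in range(1, 7):
    for j in range(i, 7):
        P[1 + i, 1 + j] = 1 / 6  # p_{ij} = 1/6 for j >= i
    P[1 + i, 0] = (i - 1) / 6    # p_{i,-1} = (i-1)/6

prob = 1 - np.linalg.matrix_power(P, 4)[1, 0]
print(round(prob, 4))  # all three later rolls at least as large, ≈ 0.0972
```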
10.34 For the first probability, we adjust the Markov matrix P in Problem
10.4 to the matrix Q by replacing the last row of P by (0, 0, 0, 1), that
is, state RR is made absorbing. The first probability is calculated as
1 − qSS,RR^(7) = 0.7281. To calculate the second probability, we introduce
two auxiliary states R1 and R2 and consider the following Markov
matrix M = (mij ):
from\to    SS     SR     RS     RR     R1     R2
  SS      0.9    0.1     0      0      0      0
  SR       0      0     0.5     0     0.5     0
  RS      0.7    0.3     0      0      0      0
  RR       0      0     0.45    0     0.55    0
  R1       0      0     0.45    0      0     0.55
  R2       0      0      0      0      0      1
This leads to the value 1 − mRR,R2^(7) = 0.4859 for the second probability.
Note: The second probability can also be calculated as 1 minus the
element in the intersection of the last column and last row from the
matrix product P^2 Q^5.
10.35 We adjust the Markov matrix P in Problem 10.8 to the matrix Q by
replacing the first row of P by (1, 0, 0, 0), that is, state 1 is made
absorbing. The matrix products Q^4 and Q^5 are given by

        (    1        0        0        0    )
Q^4 =   ( 0.39310  0.31010  0.19840  0.09840 )
        ( 0.44310  0.19760  0.16090  0.19840 )
        ( 0.17090  0.32140  0.19760  0.31010 )

and

        (    1        0        0        0    )
Q^5 =   ( 0.46379  0.25659  0.17106  0.10856 )
        ( 0.49504  0.19409  0.13981  0.17106 )
        ( 0.24256  0.30676  0.19409  0.25659 )

The probability that the car will be rented out more than five times
before it returns to location 1 is equal to 1 − q41^(5) = 0.7574 if the car
is currently at location 4 and is equal to

p12(1 − q21^(4)) + p13(1 − q31^(4)) + p14(1 − q41^(4)) = 0.1436

if the car is currently at location 1.
10.36 Let the random variable Xn be equal to 1 if the nth letter is a vowel
and be equal to 2 if the nth letter is a consonant. The process {Xn }
is a Markov chain with state space I = {1, 2}. The matrix of one-step
transition probabilities is given by
from\to      1       2
   1       0.128   0.872
   2       0.663   0.337
The equilibrium equations are
π1 = 0.128π1 + 0.663π2
π2 = 0.872π1 + 0.337π2 .
Solving the equations π1 = 0.128π1 + 0.663π2 and π1 + π2 = 1 gives
π1 = 0.4319 and π2 = 0.5681. The theoretical equilibrium probabilities
are in perfect agreement with the empirical findings of Andrey Markov.
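For a two-state chain the equilibrium probabilities have the closed form π1 = p21/(p12 + p21), which gives a one-line check (a sketch):

```python
# Vowel/consonant chain: p12 = P(vowel -> consonant), p21 = P(consonant -> vowel).
p12, p21 = 0.872, 0.663
pi1 = p21 / (p12 + p21)  # long-run fraction of vowels
pi2 = p12 / (p12 + p21)  # long-run fraction of consonants
print(round(pi1, 4), round(pi2, 4))  # ≈ 0.4319 and 0.5681
```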
10.37 In Example 10.3, the equilibrium equations are
π1 = 0.50π1 + 0.50π3 , π2 = 0.5π1 + 0.5π3 , and π3 = π2 .
Solving two of these equations together with π1 + π2 + π3 = 1 gives
π1 = π2 = π3 = 31 . The long-run proportion of time the professor has
his license with him is π1 = 13 . In Problem 10.2 the Markov chain
has six states 1, 2, . . . , 6, where state 1/2/3 means that the professor is
driving to the office and has his driver’s license with him/at home/at
office, and state 4/5/6 means that the professor is driving to home and
has his driver’s license with him/at office/at home. The equilibrium
equations are
π1 = 0.75π4 + 0.75π6 , π2 = 0.25π4 + 0.25π6 , π3 = π5 ,
π4 = 0.50π1 + 0.50π3 , π5 = 0.50π1 + 0.50π3 , and π6 = π2 .
Solving five of these equations together with π1 +· · ·+π6 = 1 gives π1 =
π3 = π4 = π5 = 0.2143, π2 = π6 = 0.0714. The long-run proportion of
time the professor has his license with him is π1 + π4 = 0.4286.
10.38 Let state 1 mean that the student is eating in the Italian restaurant,
state 2 mean that the student is eating in the Mexican restaurant,
state 3 mean that the student is eating in the Thai restaurant, and
state 4 mean that the student is eating at home. Let Xn be the state
at the nth evening. The process {Xn } is a Markov chain with the
one-step transition probabilities
from\to     1      2      3      4
   1      0.10   0.35   0.25   0.30
   2      0.40   0.15   0.25   0.20
   3      0.50   0.15   0.05   0.30
   4      0.40   0.35   0.25    0

The equilibrium equations are
π1 = 0.10π1 + 0.40π2 + 0.50π3 + 0.40π4
π2 = 0.35π1 + 0.15π2 + 0.15π3 + 0.35π4
π3 = 0.25π1 + 0.25π2 + 0.05π3 + 0.25π4
π4 = 0.30π1 + 0.20π2 + 0.30π3 .
Solving three of these equations together with π1 + π2 + π3 + π4 = 1
gives π1 = 0.3237, π2 = 0.2569, π3 = 0.2083, and π4 = 0.2110. The
proportion of time the student is eating at home is 0.2110.
10.39 Use a four-state Markov chain with the one-step transition probabilities
pii = 1 − ri and pij = (1/3)ri for j ≠ i. The Markov chain is
aperiodic. Therefore the limiting probabilities exist and are given by
the equilibrium probabilities. The equilibrium equations are

πj = (1 − rj)πj + (1/3) Σ_{i=1, i≠j}^4 ri πi for 1 ≤ j ≤ 4.
Solving three of these equations together with π1 + π2 + π3 + π4 = 1, we get
π1 = 0.2817, π2 = 0.1690, π3 = 0.2113, and π4 = 0.3380.
10.40 The equilibrium equations are
πSS = 0.90πSS + 0.70πRS , πSR = 0.10πSS + 0.30πRS
πRS = 0.50πSR + 0.45πRR , πRR = 0.50πSR + 0.55πRR .
Solving three of these equations together with πSS + πSR + πRS +
πRR = 1 yields πSS = 0.6923, πSR = 0.0989, πRS = 0.0989, and
πRR = 0.1099. The long-run proportion of days it will be sunny is
πSS + πRS = 0.7912. The limiting probability of a rainy Sunday is
πRR + πSR = 0.2088 (the Markov chain is aperiodic).
10.41 State j means that compartment A contains j particles of type 1. An
intuitive guess is the hypergeometric distribution
πj = C(r, j) C(r, r − j) / C(2r, r) for j = 0, 1, . . . , r,
where C(n, k) denotes the binomial coefficient. These πj satisfy the equilibrium equations
πj = ((r − j + 1)²/r²) πj−1 + (2j(r − j)/r²) πj + ((j + 1)²/r²) πj+1,
as can be verified by substitution. Since the Markov chain has no two
or more disjoint closed sets, its equilibrium distribution is uniquely
determined.
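The substitution check can also be done numerically. The sketch below builds the tridiagonal transition matrix implied by the equilibrium equations above — from state j the chain moves to j − 1 with probability (j/r)², stays with probability 2j(r − j)/r², and moves to j + 1 with probability ((r − j)/r)² — and verifies that the hypergeometric πj solve πj = Σ_i πi pij for r = 5:

```python
from math import comb

r = 5
# Transition probabilities implied by the equilibrium equations above.
P = [[0.0] * (r + 1) for _ in range(r + 1)]
for j in range(r + 1):
    P[j][j] = 2 * j * (r - j) / r**2
    if j > 0:
        P[j][j - 1] = (j / r) ** 2
    if j < r:
        P[j][j + 1] = ((r - j) / r) ** 2

# The conjectured hypergeometric equilibrium distribution.
pi = [comb(r, j) * comb(r, r - j) / comb(2 * r, r) for j in range(r + 1)]

# Check pi_j = sum_i pi_i P[i][j] for every state j.
for j in range(r + 1):
    rhs = sum(pi[i] * P[i][j] for i in range(r + 1))
    assert abs(pi[j] - rhs) < 1e-12
print("hypergeometric equilibrium verified for r =", r)
```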
10.42 Solving the equilibrium equations
πA = 0.340πA + 0.193πB + 0.200πC + 0.240πD
πB = 0.214πA + 0.230πB + 0.248πC + 0.243πD
πC = 0.296πA + 0.345πB + 0.271πC + 0.215πD
together with πA + πB + πC + πD = 1 gives πA = 0.2419, πB = 0.2343,
πC = 0.2808, and πD = 0.2429. The long-run frequency of base A
appearing is πA = 0.2419 and the long-run frequency of observing
base A followed by another A is πA pAA = 0.0822.
10.43 The first thought might be to use a Markov chain with 16 states.
However, a Markov chain with two states 0 and 1 suffices, where state
0 means that Linda and Bob are in different venues and state 1 means
that they are in the same venue. The one-step transition probability p01 is
p01 = 2 × 0.4 × 0.6 × (1/3) + 0.6 × (2/3) × 0.6 × (1/3),
where the first term refers to the probability that exactly one of the two persons does not change venue and the other person goes to the venue of that person, and the second term refers to the probability that both persons change venue and go to the same venue. By a similar argument,
p11 = 0.4 × 0.4 + 0.6 × 0.6 × (1/3).
This gives p01 = 0.24 and p11 = 0.28. Further, p00 = 1 − p01 and
p10 = 1 − p11 . Solving the two equations
π0 = p00 π0 + p10 π1 and π0 + π1 = 1
gives π0 = 3/4 and π1 = 1/4. The long-run fraction of weekends that Linda and Bob visit the same venue is π1 = 1/4. The limiting probability that they visit the same venue two weekends in a row is π1 × p11 = 0.07 (the Markov chain is aperiodic).
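For any two-state chain the equilibrium equations have the closed-form solution π1 = p01/(p01 + p10). A quick Python check using the values of p01 and p11 derived above:

```python
# p01 and p11 as computed above; p10 = 1 - p11 and p00 = 1 - p01.
p01 = 2 * 0.4 * 0.6 * (1 / 3) + 0.6 * (2 / 3) * 0.6 * (1 / 3)  # = 0.24
p11 = 0.4 * 0.4 + 0.6 * 0.6 * (1 / 3)                          # = 0.28
p10 = 1 - p11

# Solving pi0 = p00*pi0 + p10*pi1 together with pi0 + pi1 = 1 gives
# the closed form pi1 = p01/(p01 + p10).
pi1 = p01 / (p01 + p10)
print(round(p01, 2), round(p11, 2), round(pi1, 4))
```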
10.44 In Problem 10.3 the state of the Markov chain and the one-step transition probabilities are given (in the equilibrium analysis the auxiliary
state 0 is not needed). The equilibrium equations are given by
πk = Σ_{j=1}^{6} [(k/6)^j − ((k − 1)/6)^j] πj for k = 1, . . . , 6.
Solving the equilibrium equations, we find that the long-run frequency
at which k dice are rolled has the values 0.0004, 0.0040, 0.0230, 0.0878,
0.2571, and 0.6277 for k = 1, 2, . . . , 6.
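Interpreting (k/6)^j − ((k − 1)/6)^j as the probability that the highest score of j fair dice equals k (the mechanism of Problem 10.3), the equilibrium distribution can be reproduced numerically by power iteration:

```python
# Transition probabilities p_{jk} = (k/6)^j - ((k-1)/6)^j, the
# probability that the maximum of j fair dice equals k.
P = [[(k / 6) ** j - ((k - 1) / 6) ** j for k in range(1, 7)]
     for j in range(1, 7)]

pi = [1.0 / 6] * 6
for _ in range(5000):
    pi = [sum(pi[i] * P[i][j] for i in range(6)) for j in range(6)]

# Compare with the values 0.0004, 0.0040, 0.0230, 0.0878, 0.2571, 0.6277.
print([round(p, 4) for p in pi])
```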
10.45 Since Σ_{k=1}^{N} pkj = 1 for 1 ≤ j ≤ N, it follows that πj = 1/N for all j is a solution of the equilibrium equations
πj = Σ_{k=1}^{N} πk pkj for 1 ≤ j ≤ N.
The Markov chain has no two or more disjoint closed sets and so the
discrete uniform distribution is its unique equilibrium distribution.
10.46 The process {Xn } is a Markov chain with state space {0, 1, . . . , r − 1}.
To give the matrix of one-step transition probabilities, we distinguish
between the cases 2 ≤ r ≤ 5 and r ≥ 6. For r = 2, 3, 4, and 5 the
respective matrices of one-step transition probabilities of the Markov
chain are given by
For r = 2:
from\to    0     1
   0      1/2   1/2
   1      1/2   1/2
For r = 3:
from\to    0     1     2
   0      2/6   2/6   2/6
   1      2/6   2/6   2/6
   2      2/6   2/6   2/6
For r = 4:
from\to    0     1     2     3
   0      1/6   2/6   2/6   1/6
   1      1/6   1/6   2/6   2/6
   2      2/6   1/6   1/6   2/6
   3      2/6   2/6   1/6   1/6
For r = 5:
from\to    0     1     2     3     4
   0      1/6   2/6   1/6   1/6   1/6
   1      1/6   1/6   2/6   1/6   1/6
   2      1/6   1/6   1/6   2/6   1/6
   3      1/6   1/6   1/6   1/6   2/6
   4      2/6   1/6   1/6   1/6   1/6
Each of the transition matrices corresponding to r = 2, 3, 4, and 5 has the property that Σ_{i=0}^{r−1} pij = 1 for all j. This property is also satisfied
for the matrix of one-step transition probabilities corresponding to the
case of r ≥ 6. For any r ≥ 6, it is readily verified that for each fixed
j the probability pij = 1/6 for six different values of i and pij = 0 for
the other i. Thus the Markov chain is doubly stochastic for any r ≥ 2.
Moreover, the Markov chain has no two or more disjoint closed sets
and is aperiodic. Invoking the result of Problem 10.45, we get that
limn→∞ P(Xn = 0) = 1/r.
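A small numeric check of the doubly stochastic property. The sketch below assumes the chain X_{n+1} = (X_n + D) mod r with D the score of a fair die, which is consistent with the matrices displayed above:

```python
# p_{ij} = #{d in 1..6 : (i + d) mod r = j} / 6 for the chain
# X_{n+1} = (X_n + D) mod r with D the score of a fair die.
def transition_matrix(r):
    P = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for d in range(1, 7):
            P[i][(i + d) % r] += 1 / 6
    return P

for r in (2, 3, 4, 5, 9):
    P = transition_matrix(r)
    # Doubly stochastic: every row and every column sums to 1.
    assert all(abs(sum(row) - 1) < 1e-12 for row in P)
    assert all(abs(sum(P[i][j] for i in range(r)) - 1) < 1e-12
               for j in range(r))
print("doubly stochastic for all tested r")
```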
10.47 The Markov chain is a regenerative stochastic process. The times at
which the process visits state r are taken as regeneration epochs and so
a cycle is the time interval between two successive visits to state r. The
expected length of one cycle is the mean recurrence time µrr = 1/πr .
Fix state j ≠ r. Assume that a reward of 1 is earned each time the
Markov chain visits state j. Then, by the renewal-reward theorem,
the long-run average reward per unit time is γjr /µrr , where γjr is the
expected number of visits to state j between two returns of the Markov
chain to state r. Further, the long-run average reward per unit time
is πj and πr = 1/µrr, showing that πj = γjr πr for any state j.
10.48 The perturbed Markov chain {X̄n} also has no two or more disjoint closed sets. Denote by {π̄i, i ∈ I} its unique equilibrium distribution. Then, for all j ∈ I,
π̄j = Σ_{k∈I} p̄kj π̄k = p̄jj π̄j + Σ_{k∈I, k≠j} p̄kj π̄k = (1 − τ + τ pjj)π̄j + τ Σ_{k∈I, k≠j} pkj π̄k = (1 − τ)π̄j + τ Σ_{k∈I} pkj π̄k.
This gives
π̄j = Σ_{k∈I} pkj π̄k for all j ∈ I.
In other words, {π̄j, j ∈ I} is an equilibrium distribution of the Markov chain {Xn}. Since this Markov chain has a unique equilibrium distribution {πj, j ∈ I}, it follows that π̄j = πj for all j ∈ I.
Note: Since pii > 0 for all i, the perturbed Markov chain {X̄n} is aperiodic and so limn→∞ p̄ij^(n) = πj for all i, j ∈ I.
10.49 (a) This result follows from the inequality
pab^(n+m) ≥ pac^(n) pcb^(m) for all states a, b and c.
(b) To prove the “if” part, assume to the contrary that C is not irreducible. Then there is a closed set S ⊂ C with S ≠ C. Choose any i ∈ C. Since S is closed, C(i) ⊆ S and so C(i) ≠ C, contradicting that i communicates with all states in C. The “only if” part follows by showing that C(i) = C for any i ∈ C. To show this, it suffices to prove that C(i) is closed. Assume to the contrary that C(i) is not closed. Then there are states a ∈ C(i) and b ∉ C(i) with pab > 0. Since a ∈ C(i), there is an integer n ≥ 1 such that pia^(n) > 0 and so
pib^(n+1) ≥ pia^(n) pab > 0,
contradicting that b ∉ C(i).
(c) Since C is finite and closed, there must be some state j ∈ C such
that P (Xn = j for infinitely many n | X0 = j) = 1. Such a state is
recurrent.
(d) Assume to the contrary that some transient state t can be reached from some recurrent state r, that is, prt^(m) > 0 for some m ≥ 1. Then P(Xn = r for infinitely many n | X0 = r) = 1 implies that this relation also holds with r replaced by t, which would mean that state t is recurrent.
(e) By (c), there is some state j ∈ C that is recurrent. Take any other state i ∈ C. By (b), the states i and j communicate and so there are integers r, s ≥ 1 such that pij^(r) > 0 and pji^(s) > 0. It now follows from pii^(r+n+s) ≥ pij^(r) pjj^(n) pji^(s) that
Σ_{n=1}^{∞} pii^(n) ≥ pij^(r) pji^(s) Σ_{n=1}^{∞} pjj^(n).
Since Σ_{n=1}^{∞} pjj^(n) = ∞ for the recurrent state j, we find Σ_{n=1}^{∞} pii^(n) = ∞, showing that state i is recurrent.
(f) Since P(Nij > r) ≤ P(Nij > ri) = 1 − pij^(ri) ≤ 1 − ρ for all i ∈ C, we have P(Nij > kr) ≤ (1 − ρ)^k for k ≥ 1. Then, by E(Nij) = Σ_{n=0}^{∞} P(Nij > n), we get
E(Nij) = 1 + Σ_{k=1}^{∞} Σ_{l=(k−1)r+1}^{kr} P(Nij > l) ≤ 1 + Σ_{k=1}^{∞} r(1 − ρ)^{k−1},
which proves that E(Nij) < ∞. Note: let T be the set of transient states of a Markov chain having a single irreducible set C of states. Mimicking the foregoing proof shows that the expected number of transitions until reaching the set C is finite for any starting state i ∈ I.
(g) Denote by dk the period of any state k ∈ C. Choose i, j ∈ C with i ≠ j. By (b), there are integers v, w ≥ 1 such that pij^(v) > 0 and pji^(w) > 0. Then pii^(v+w) > 0, and so v + w is divisible by di. Let n ≥ 1 be any integer with pjj^(n) > 0. Then pii^(v+n+w) > 0 and so v + n + w is divisible by di. Thus, n is divisible by di and so di ≤ dj. For reasons of symmetry, dj ≤ di, showing that di = dj.
10.50 For fixed 1 ≤ k ≤ d, let qij = pij^(d) for i, j ∈ Rk. Then the matrix Q = (qij), i, j ∈ Rk, is a Markov matrix and has the property qij^(n) = pij^(nd). The Markov matrix Q is aperiodic. Therefore limn→∞ qij^(n) exists and so limn→∞ pij^(nd) exists for all i, j ∈ Rk. To show that the
limit is dπj for all i, j ∈ Rk , we reason as follows. Imagine now that
the state of the Markov process {Xn } is only observed at the times
0, d, 2d, . . . and suppose that the starting state belongs to Rk . Then
the long-run frequency at which any state j ∈ Rk will be observed is
d times the long-run frequency πj at which state j will be observed
when considering the process over all times 0, 1, 2, . . . .
10.51 The state of the Markov chain is the inventory position just before
review. The equilibrium equations for the πi with i ≠ 0 are
πi = Σ_{j=0}^{s−1} πj e^{−λ} λ^{S−i}/(S − i)! + Σ_{j=max(s,i)}^{S} πj e^{−λ} λ^{j−i}/(j − i)! for 0 < i ≤ S.
(a) The long-run average stock on hand at the end of the week is given by Σ_{j=0}^{S} j πj = 4.387.
(b) The long-run average ordering frequency is Σ_{j=0}^{s−1} πj = 0.5005.
(c) Let L(j) = Σ_{k=j+1}^{∞} (k − j) e^{−λ} λ^k/k! denote the expected amount of demand lost in the coming week if the current stock on hand just after review is j. By Rule 10.7, the long-run average amount of demand lost per week is L(S) Σ_{j=0}^{s−1} πj + Σ_{j=s}^{S} L(j) πj = 0.0938.
10.52 In Problem 10.40, we computed the equilibrium probabilities πSS =
0.6923, πSR = 0.0989, πRS = 0.0989, and πRR = 0.1099. Hence, the
long-run average sales per day is
1,000 × (πSS + πRS ) + 500 × (πSR + πRR ) = 895.60 dollars.
10.53 In the answer to Problem 10.6, the state of the Markov chain and the
one-step transition probabilities are given.
(a) The long-run proportion of time the device operates properly is
1 − π(0,0) = 0.9814.
(b) By Rule 10.7, the long-run average weekly cost is 750π(0,0) + 200[π(0,0) + π(6,6) + π(0,6)] + 100 Σ_{j=1}^{5} [π(0,j) + π(j,6)] = 52.46 dollars.
10.54 Let state 0 mean that both stations are idle, state 1 mean that only
station 1 is occupied, state 2 mean that only station 2 is occupied,
and state 3 mean that both stations are occupied. Let Xn be the
state just before an item arrives at t = n. Then, by the memoryless
property of the exponentially distributed processing times, {Xn } is
a discrete-time Markov chain with one-step transition probabilities
p00 = 1 − e−µ1 , p01 = e−µ1 , p10 = p20 = p30 = (1 − e−µ1 )(1 − e−µ2 ),
p11 = p21 = p31 = e−µ1 (1 − e−µ2 ), p12 = p22 = p32 = (1 − e−µ1 )e−µ2 ,
and p13 = p23 = p33 = e−µ1 e−µ2 . Putting for abbreviation ai = e−µi
for i = 1, 2, the equilibrium equations are
π0 = (1 − a1)π0 + Σ_{i=1}^{3} (1 − a1)(1 − a2)πi,
π1 = a1 π0 + Σ_{i=1}^{3} a1(1 − a2)πi, π2 = Σ_{i=1}^{3} (1 − a1)a2 πi, π3 = Σ_{i=1}^{3} a1 a2 πi,
where µ1 = 4/3 and µ2 = 4/5. The equilibrium probabilities are π0 = 0.6061, π1 = 0.2169, π2 = 0.1304, and π3 = 0.0467. The loss probability is equal to π3 = 0.0467.
Note: The long-run fraction of time that both stations are occupied is (1 − π0) E[min(U, 1)] = 0.1628, where U is the minimum of two independent exponentially distributed random variables with rates µ1 and µ2. Thus U has an exponential distribution with rate µ1 + µ2, which gives E[min(U, 1)] = 0.41323.
10.55 Let Xn be the number of tokens in the buffer at the beginning of
the nth time slot just before a new token arrives. Then, {Xn } is a
Markov chain with state space I = {0, 1, . . . , M }. Put for abbreviation
ak = e^{−λ} λ^k/k!. The one-step transition probabilities are pjk = a_{j+1−k} for 0 ≤ j < M and 1 ≤ k ≤ j + 1, pj0 = 1 − Σ_{k=1}^{j+1} pjk for 0 ≤ j < M, pMk = a_{M−k} for 1 ≤ k ≤ M, and pM0 = 1 − Σ_{k=1}^{M} pMk. Let {πj} be the equilibrium distribution of the Markov chain. By Rule 10.7, the long-run average number of packets admitted in one time slot is equal to Σ_{j=0}^{M} c(j)πj, where c(j) = Σ_{k=0}^{j} k ak + (j + 1)(1 − Σ_{k=0}^{j} ak) for 0 ≤ j < M and c(M) = Σ_{k=0}^{M−1} k ak + M(1 − Σ_{k=0}^{M−1} ak).
10.56 Define the random variable Xn as the premium class for the transport
firm at the beginning of the nth year. Then, the stochastic process
{Xn } is a Markov chain with four possible states i = 1, ..., 4. The
one-step transition probabilities pij are easily found. Denote by the random variable S the total damage in the coming year and let G(s) denote the cumulative probability distribution function of S. A one-step transition from state i to state 1 occurs if and only if at the end
of the year a damage is claimed; otherwise a transition from state i to
state i + 1 occurs (state 5 = state 4). Since for premium class i only
a cumulative damage S larger than αi will be claimed, it follows that
pi1 = 1 − G(αi )
for i = 1, ..., 4,
pi,i+1 = G(αi ) for i = 1, 2, 3, and p44 = G(α4 ).
The other one-step transition probabilities pij are equal to zero. The
Markov chain has no two disjoint closed sets. Hence the equilibrium
probabilities πj , 1≤ j ≤ 4 are the unique solution to the equilibrium
equations
π1 = (1 − G(α1))π1 + (1 − G(α2))π2 + (1 − G(α3))π3 + (1 − G(α4))π4,
π2 = G(α1)π1, π3 = G(α2)π2, π4 = G(α3)π3 + G(α4)π4,
together with the normalizing equation π1 + π2 + π3 + π4 = 1. Denote
by c(j) the expected costs incurred during the coming year when at
the beginning of the year premium Pj is paid. Then, by Rule 10.7, the long-run average cost per year is g(α1, ..., α4) = Σ_{j=1}^{4} c(j)πj, with
the long-run average cost per year is g(α1 , ..., α4 ) = 4j=1 c(j)πj , with
probability 1. The one-year cost c(j) consists of the premium Pj and
any damages not compensated that year by the insurance company.
By conditioning on the cumulative damage S in the coming year, it
follows that
c(j) = Pj + ∫_0^{αj} s g(s) ds + rj [1 − G(αj)],
where g(s) denotes the probability density of S.
In case S has an exponential distribution with mean 1/η, the expression for c(j) can be simplified to c(j) = Pj + (1/η)(1 − e^{−ηαj} − ηαj e^{−ηαj}) + rj e^{−ηαj}; otherwise, numerical integration must be used to obtain the c(j).
10.57 The Markov matrix is doubly stochastic and irreducible. Therefore
its unique equilibrium distribution is the uniform distribution, that is,
πi = 0.25 for all i, see Problem 10.45. Also, the Markov matrix has
the property that pjk = pkj for all j, k. This gives that πj pjk = πk pkj
for all j, k, showing that the Markov chain is reversible.
10.58 The two equilibrium equations boil down to the single equation aπ1 =
bπ2 . This equation states that π1 p12 = π2 p21 , showing that the Markov
chain is reversible for all a, b with 0 < a, b < 1. This result is obvious
by noting that in any two-state Markov chain with no absorbing states
the long-run average number of transitions from state 1 to state 2 per
unit time must be equal to the long-run average number of transitions
from state 2 to state 1 per unit time.
10.59 The detailed balance equation πi pij = πj pji boils down to
πi / Σ_{k=1}^{N} wik = πj / Σ_{k=1}^{N} wjk for i, j = 1, . . . , N.
Therefore the detailed balance equations are satisfied by
πi = Σ_{k=1}^{N} wik / Σ_{j=1}^{N} Σ_{k=1}^{N} wjk for i = 1, . . . , N.
Thus the Markov chain is reversible and has the πi as its unique equilibrium distribution. The equilibrium probabilities for the mouse problem are π1 = π5 = π11 = π15 = 2/44, π2 = π3 = π4 = π6 = π10 = π12 = π13 = π14 = 3/44, and π7 = π8 = π9 = 4/44 (take wij = 1 if the rooms i and j are connected by a door). The mean recurrence time from state i to itself is µii = 1/πi.
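For the mouse problem, the weights wij can be generated from the room layout. The sketch below assumes the usual 3 × 5 grid of rooms, numbered row by row from 1 to 15, with a door between every pair of adjacent rooms; πi is then proportional to the number of doors of room i:

```python
# pi_i proportional to sum_k w_{ik} = number of doors of room i,
# assuming the 15 rooms form a 3x5 grid, numbered row by row.
rows, cols = 3, 5
deg = {}
for r in range(rows):
    for c in range(cols):
        room = r * cols + c + 1
        deg[room] = sum(1 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= r + dr < rows and 0 <= c + dc < cols)

total = sum(deg.values())            # = 44, twice the number of doors
pi = {room: d / total for room, d in deg.items()}
assert total == 44
assert abs(pi[1] - 2 / 44) < 1e-12   # corner room
assert abs(pi[8] - 4 / 44) < 1e-12   # interior room
print({k: round(v, 4) for k, v in sorted(pi.items())})
```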
10.60 Let K be the common number of points in each of the sets N (i).
Fix j, k ∈ I with j ≠ k. If k ∉ N(j), then pjk = pkj = 0 and so e^{−c(j)/T} pjk = e^{−c(k)/T} pkj. For k ∈ N(j),
pjk = (1/K) min(1, e^{−c(k)/T}/e^{−c(j)/T}).
Therefore
e^{−c(j)/T} pjk = e^{−c(j)/T} (1/K) min(1, e^{−c(k)/T}/e^{−c(j)/T}) = (1/K) min(e^{−c(j)/T}, e^{−c(k)/T}) = e^{−c(k)/T} (1/K) min(e^{−c(j)/T}/e^{−c(k)/T}, 1) = e^{−c(k)/T} pkj.
Letting αi = e^{−c(i)/T} for all i ∈ I, we have now verified that αj pjk = αk pkj for all j, k ∈ I (trivially, αj pjk = αk pkj for j = k). Converting the αj into the probability mass function {αj*} with αj* = αj / Σ_{i∈I} αi, we have
αj* pjk = αk* pkj for all j, k ∈ I.
In other words, the Markov chain with the one-step transition probabilities pjk is reversible. Summing over k in the latter equation, we
get that the αj∗ satisfy the equilibrium equations of the Markov chain.
Since the Markov chain has no two disjoint closed sets, the αj∗ form
the unique equilibrium distribution of the Markov chain.
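The chain of equalities above can be checked numerically for an arbitrary cost function. The following sketch uses a fully connected neighborhood structure (so that every N(j) has the common size K = n − 1) and random costs:

```python
import math, random

# Detailed-balance check: alpha_j p_{jk} = alpha_k p_{kj} with
# alpha_i = e^{-c(i)/T} and the Metropolis transition probabilities.
random.seed(7)
T = 1.5
n = 8                       # states 0..n-1, fully connected, K = n - 1
c = [random.uniform(0, 5) for _ in range(n)]
K = n - 1

def p(j, k):
    return (1 / K) * min(1.0, math.exp(-c[k] / T) / math.exp(-c[j] / T))

for j in range(n):
    for k in range(n):
        if j != k:
            lhs = math.exp(-c[j] / T) * p(j, k)
            rhs = math.exp(-c[k] / T) * p(k, j)
            assert abs(lhs - rhs) < 1e-12
print("detailed balance verified")
```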
Note 1: The assumption that each set N(i) has the same number of elements is not essential. If this assumption does not hold, the Markov chain is defined as follows. The Markov chain moves from any state i to state j ∈ N(i) with probability pij = (1/K) min(1, e^{−c(j)/T}/e^{−c(i)/T}) or stays in state i with probability 1 − Σ_{j∈N(i)} pij, where K = maxi Ki with Ki denoting the number of elements in N(i) (note the assumption that i ∉ N(i)).
Note 2: In the case that the function c(i) assumes its absolute minimum in a unique point m, then
πm = 1 / (1 + Σ_{k≠m} e^{−[c(k)−c(m)]/T})
with c(k) − c(m) > 0 for all k ≠ m, implying that πm tends to 1 as T → 0. More generally, if M is the set of points at which the function c(i) takes on its absolute minimum, then πj for j ∈ M tends to 1/|M| as T → 0 (and so Σ_{j∈M} πj tends to 1 as T → 0), where |M| denotes the number of elements in the set M.
10.61 Applying 100,000 iterations of the Metropolis-Hastings algorithm with
random-walk sampling, we found for a = 0.02, 0.2, 1, and 5 the average
values 97.3%, 70.9%, 34%, and 4.5% for the acceptance probability.
Further, the simulation results clearly showed that a high acceptance
probability does not necessarily guarantee a good mixing of the state.
We experimentally found that a = 0.6 gives an excellent mixing with
an average acceptance probability of about 49.9% (based on 100,000
iterations). For the choice a = 0.6, the estimates 1.652 and 1.428 for
the expected value and the standard deviation of X1 were obtained
after 100,000 iterations (the exact values are 1.6488 and 1.4294).
10.62 The following algorithm can be used to see how quickly the probability
mass function π(1) = 0.2, π(2) = 0.8 can be recovered when applying
the Metropolis-Hastings algorithm with q(t | s) = 0.5 for s, t = 1, 2.
Algorithm:
Step 0. Choose a (large) number M (the number of iteration steps)
Take a starting state s0 ∈ {1, 2} and let n := 1.
Step 1. Generate a random number u1 from (0, 1). Let the candidate state tn := 1 if u1 ≤ 0.5 and tn := 2 otherwise. Calculate the acceptance probability
α = min(π(tn)/π(sn−1), 1).
Step 2. Generate a random number u2 from (0, 1). If u2 ≤ α, accept
tn and let sn := tn ; otherwise, sn := sn−1 .
Step 3. n := n + 1. If n < M , repeat step 1 with sn−1 replaced by sn ;
otherwise, stop.
The probability π(1) is estimated by (1/M) Σ_{n=1}^{M} In, where In = 1 if sn = 1 and In = 0 otherwise.
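A direct Python implementation of the algorithm (the number of iterations M and the random seed are arbitrary choices):

```python
import random

# Metropolis-Hastings for pi(1) = 0.2, pi(2) = 0.8 with the symmetric
# proposal q(t | s) = 0.5 for s, t = 1, 2.
random.seed(1)
pi = {1: 0.2, 2: 0.8}
M = 200_000
s = 1                                       # starting state
count1 = 0
for _ in range(M):
    t = 1 if random.random() <= 0.5 else 2  # candidate state
    alpha = min(pi[t] / pi[s], 1.0)         # acceptance probability
    if random.random() <= alpha:
        s = t
    count1 += (s == 1)

print(count1 / M)   # estimate of pi(1), close to 0.2
```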
10.63 Using the definition
π1(x1 | x2) = π(x1, x2) / ∫_{−∞}^{∞} π(u1, x2) du1,
it follows that π1(x1 | x2) as a function of x1 is proportional to
e^{−(1/2)[(1 + x2²)x1² − 7x1]} = e^{−(1/2)[x1² − 7(1 + x2²)^{−1} x1]/(1 + x2²)^{−1}}.
Hence, for fixed x2, π1(x1 | x2) as a function of x1 is proportional to
e^{−(1/2)[x1 − (7/2)(1 + x2²)^{−1}]²/(1 + x2²)^{−1}}.
(Figure: simulated probability histogram of the marginal density of X1.)
In other words, the univariate conditional density π1(x1 | x2) is given by the N((7/2)(1 + x2²)^{−1}, (1 + x2²)^{−1}) density. Similarly, the univariate conditional density π2(x2 | x1) is the N((7/2)(1 + x1²)^{−1}, (1 + x1²)^{−1}) density. Next it is straightforward to apply the Gibbs sampler. Using the
standard Gibbs sampler, the estimates 1.6495 and 1.4285 are found for
E(X1 ) and σ(X1 ) after one million runs (the exact values are 1.6488
and 1.4294). The simulated probability histogram of the marginal
density of X1 is given in the above figure.
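A minimal Gibbs sampler for the two univariate conditional normal densities derived above (the seed, burn-in, and run length are arbitrary choices):

```python
import random

# Gibbs sampler: X1 | x2 ~ N(3.5/(1+x2^2), 1/(1+x2^2)), and
# symmetrically for X2 | x1.
random.seed(42)
x1, x2 = 0.0, 0.0
burn_in, n_iter = 1000, 200_000
samples = []
for it in range(burn_in + n_iter):
    v1 = 1.0 / (1.0 + x2 * x2)          # conditional variance of X1
    x1 = random.gauss(3.5 * v1, v1 ** 0.5)
    v2 = 1.0 / (1.0 + x1 * x1)          # conditional variance of X2
    x2 = random.gauss(3.5 * v2, v2 ** 0.5)
    if it >= burn_in:
        samples.append(x1)

mean = sum(samples) / len(samples)
print(round(mean, 3))   # should be near E(X1) = 1.6488
```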
10.64 The univariate conditional densities follow from the definitions
π1(x | y, n) = π(x, y, n) / Σ_{u=0}^{n} π(u, y, n),
π2(y | x, n) = π(x, y, n) / ∫_0^1 π(x, u, n) du,
π3(n | x, y) = π(x, y, n) / Σ_{u=x}^{∞} π(x, y, u).
This gives that π1(x | y, n) is proportional to y^x (1 − y)^{n−x} as a function of x with 0 ≤ x ≤ n, π2(y | x, n) is proportional to y^{x+α−1} (1 − y)^{n−x+β−1} as a function of y with 0 < y < 1, and π3(n | x, y) is proportional to λ^{n−x} (1 − y)^{n−x}/(n − x)! as a function of n with n ≥ x. This shows that the conditional distribution of X is the binomial distribution with parameters n and y, the conditional distribution of Y is the beta distribution with parameters x + α and n − x + β, and the conditional distribution of N is a Poisson distribution shifted to x and having parameter λ(1 − y). Next
it is straightforward to apply the Gibbs sampler. Using the standard
Gibbs sampler, the estimates 9.992 and 6.836 are found for E(X) and
σ(X) after 100,000 iterations (the exact values are 10 and 6.809). In
the figure below the simulated probability histogram for the marginal
density of X is given.
(Figure: simulated probability histogram of the marginal density of X.)
Note: The values in the simulated probability histogram can be compared with the exact values of the probabilities. It can be analytically
verified that the marginal distribution of X is given by
P(X = x) = 72(x + 1) Σ_{n=x}^{∞} e^{−50} 50^n (n − x + 7)! / [(n + 9)! (n − x)!], x = 0, 1, . . . .
The exact values of P (X = x) are 0.0220, 0.0500, 0.0646, 0.0541,
0.0332, 0.0167, 0.0071, 0.0026, and 0.0008 for x = 0, 2, 5, 10, 15, 20,
25, 30, and 35.
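The series for P(X = x) can be evaluated by truncation; computing the factorial ratio in log space with lgamma avoids overflow. A sketch:

```python
from math import lgamma, exp, log

# P(X = x) = 72(x+1) sum_{n>=x} e^{-50} 50^n (n-x+7)! / ((n+9)!(n-x)!),
# with the series truncated at n_max.
def prob_x(x, n_max=400):
    total = 0.0
    for n in range(x, n_max + 1):
        log_term = (-50.0 + n * log(50.0)
                    + lgamma(n - x + 8)      # log (n-x+7)!
                    - lgamma(n + 10)         # log (n+9)!
                    - lgamma(n - x + 1))     # log (n-x)!
        total += exp(log_term)
    return 72 * (x + 1) * total

print(round(prob_x(0), 4), round(prob_x(5), 4))
```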
Chapter 11
11.1 Let state i mean that i units are in working condition, where i = 0, 1, 2.
Let X(t) be the state at time t. The process {X(t)} is a continuous-time Markov chain. If an item has an exponentially distributed lifetime
with failure rate α, then an item of age t will fail in the next ∆t time
units with probability α∆t + o(∆t) as ∆t → 0. Thus, for ∆t → 0,
P(X(t + ∆t) = j | X(t) = i) =
  µ∆t + o(∆t) for i = 0, j = 1,
  λ∆t(1 − µ∆t) + o(∆t) for i = 1, j = 0,
  µ∆t(1 − λ∆t) + o(∆t) for i = 1, j = 2,
  (λ + η)∆t + o(∆t) for i = 2, j = 1,
where P (X(t + ∆t) = 1 | X(t) = 2) uses the fact that the minimum of
two independent exponentially distributed lifetimes is again exponentially distributed. Therefore the transition rates of the Markov chain
are given by q01 = µ, q10 = λ, q12 = µ, and q21 = λ + η.
11.2 Let the random variable X(t) be the number of cars present in the
gasoline station at time t. The process {X(t)} is a continuous-time
Markov chain with state space I = {0, 1, 2, 3, 4}. Noting that the
probability of two or more state changes in a very small time interval
of length ∆t is o(∆t) as ∆t → 0, we have
P (X(t + ∆t) = i + 1 | X(t) = i) = λ∆t + o(∆t) for 0 ≤ i ≤ 3,
P (X(t + ∆t) = i − 1 | X(t) = i) = µ∆t + o(∆t) for 1 ≤ i ≤ 4.
Therefore the transition rates of the Markov chain are given by qi,i+1 =
λ for 0 ≤ i ≤ 3 and qi,i−1 = µ for 1 ≤ i ≤ 4. The other qij are zero.
11.3 Let X(t) denote the number of customers in the barbershop at time
t. The process {X(t)} is a continuous-time Markov chain with state
space I = {0, 1, . . . , 7}. For 0 ≤ i ≤ 6, we have
P (X(t + ∆t) = i + 1 | X(t) = i) = λ∆t × (1 − b(i)) + o(∆t).
Thus qi,i+1 = λ(1 − b(i)) for 0 ≤ i ≤ 6. Also, q10 = µ, qi,i−1 = 2µ for
2 ≤ i ≤ 7, and the other qij = 0.
11.4 Let the random variable X(t) be equal to i if the ferry is at point A at
time t and i cars are on the ferry for i = 0, 1, . . . , 6, be equal to 7 if the
ferry is traveling to point B at time t, and be equal to 8 if the ferry is
traveling to point A at time t. The stochastic process {X(t), t ≥ 0} is
a continuous-time Markov chain. Its transition rates qij are given by
qi,i+1 = λ for i = 0, 1, . . . , 6, q78 = µ1 , q80 = µ2 , and the other qij = 0.
11.5 Let state (i, 0) mean that i passengers are waiting at the stand and
no sheroot is present (0 ≤ i ≤ 7), and let state (i, 1) mean that i
passengers are waiting at the stand and a sheroot is present (0 ≤
i ≤ 6). Let X(t) be the state at time t. The process {X(t)} is a
continuous-time Markov chain with transition rates
q(i,0),(i+1,0) = λ and q(i,0),(i,1) = µ for 0 ≤ i ≤ 6, q(7,0),(0,0) = µ,
q(i,1),(i+1,1) = λ for 0 ≤ i ≤ 5, and q(6,1),(0,0) = λ.
11.6 For any t ≥ 0, define the random variable X1 (t) as the number of
ships present at time t. Let X2 (t) be equal to 1 if the unloader is
available at time t and be equal to 0 otherwise. The process X(t) =
{(X1 (t), X2 (t))} is a continuous-time Markov chain with state space
I = {(i, 0) | i = 1, . . . , 4} ∪ {(i, 1) | i = 0, 1, . . . , 4}. Noting that the
probability of two or more state transitions in a very small time interval
of length ∆t is o(∆t) as ∆t → 0, we get the transition rates
q(i,0)(i,1) = β for 1 ≤ i ≤ 4, q(i,0)(i+1,0) = q(i,1)(i+1,1) = λ for 1 ≤ i ≤ 3,
q(i,1)(i,0) = δ, and q(i,1)(i−1,1) = µ for 1 ≤ i ≤ 4.
11.7 Let X1 (t) be the number of messages in the system at time t. Also,
let X2 (t) be 1 if the gate is not closed at time t and be 0 otherwise.
The process {(X1 (t), X2 (t))} is a continuous-time Markov chain with
state space I = {(0, 1), . . . , (R − 1, 1)} ∪ {(r + 1, 0), . . . , (R, 0)}. The
transition rates are q(i,1),(i+1,1) = λ for 0 ≤ i ≤ R−2, q(R−1,1),(R,0) = λ,
q(i,1),(i−1,1) = µ for 1 ≤ i ≤ R − 1, q(i,0),(i−1,0) = µ for r + 2 ≤ i ≤ R,
and q(r+1,0),(r,1) = µ.
11.8 For i = 1, 2, let the random variable Xi (t) be equal to 0 if station
i is free and be equal to 1 if station i is occupied. The process
{(X1 (t), X2 (t))} is a continuous-time Markov chain. The transition
rates are q(0,0)(1,0) = q(1,0)(1,1) = q(0,1)(1,1) = λ, q(1,0)(0,0) = q(1,1)(0,1) =
σ1 µ, q(0,1)(0,0) = q(1,1)(1,0) = σ2 µ, and the other qst = 0.
11.9 Define the state as the number of machines in repair. In the states
i = 0 and 1 there are 10 machines in operation and 2 − i machines
on standby, while in state i ≥ 2 there are 12 − i machines in operation
and no machines on standby. Let X(t) be the state at time t. The
process {X(t)} is a continuous-time Markov chain with state space
I = {0, 1, . . . , 12}. The transition rates are qi,i+1 = 10λ for i = 0, 1,
qi,i+1 = (12 − i)λ for 2 ≤ i ≤ 11, and qi,i−1 = iµ for 1 ≤ i ≤ 12.
11.10 The expected amount of time that the process is in state j during
(0, T ) when the initial state is i is given by
E[∫_0^T I(t) dt | X(0) = i] = ∫_0^T E(I(t) | X(0) = i) dt = ∫_0^T P(I(t) = 1 | X(0) = i) dt = ∫_0^T pij(t) dt.
11.11 Using Problem 11.10 and Example 11.1 (continued) in Section 11.2,
the answer is
∫_0^T [µ1/(µ1 + µ2) − (µ1/(µ1 + µ2)) e^{−(µ1+µ2)t}] dt = µ1 T/(µ1 + µ2) − (µ1/(µ1 + µ2)²)(1 − e^{−(µ1+µ2)T}).
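A numeric check of this integral for assumed values of µ1, µ2 and T (the problem's actual data are not fixed here):

```python
import math

# Closed form versus a midpoint-rule approximation of the integral.
mu1, mu2, T = 1.3, 0.7, 4.0
s = mu1 + mu2

closed = mu1 * T / s - (mu1 / s**2) * (1 - math.exp(-s * T))

N = 200_000
h = T / N
approx = sum((mu1 / s - (mu1 / s) * math.exp(-s * (i + 0.5) * h)) * h
             for i in range(N))

assert abs(closed - approx) < 1e-8
print(round(closed, 6))
```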
11.12 It is a matter of straightforward algebra to verify the result by substitution into the Kolmogorov forward differential equations.
11.13 (a) Let N (t) be the number of state changes in (0, t). Then
P(N(T) = 1 | X(0) = a, X(T) = b) = P(X(T) = b, N(T) = 1 | X(0) = a) / P(X(T) = b | X(0) = a).
Obviously, P(X(T) = b | X(0) = a) = pab(T). Let νi = Σ_{j≠i} qij and pij = qij/νi. Then, by the law of conditional probability,
P(X(T) = b, N(T) = 1 | X(0) = a) = ∫_0^T e^{−νb(T−t)} νa e^{−νa t} pab dt.
Thus the sought probability is
qab (e^{−νb T} − e^{−νa T}) / [(νa − νb) pab(T)] if νa ≠ νb and qab T e^{−νa T} / pab(T) if νa = νb.
(b) For fixed j, let I(t) = 1 if X(t) = j and I(t) = 0 otherwise. Then
P(I(t) = 1 | X(0) = a, X(T) = b) = P(X(t) = j, X(T) = b | X(0) = a) / P(X(T) = b | X(0) = a) = paj(t) pjb(T − t) / pab(T).
Thus the desired expected value is
E[∫_0^T I(t) dt | X(0) = a, X(T) = b] = (1/pab(T)) ∫_0^T paj(t) pjb(T − t) dt.
11.14 Using the result of Problem 11.10, the expected amount of time that the process is in state j during (0, t) when the starting state is 1 is given by ∫_0^t p1j(s) ds for j = 1, 2. Therefore the expected number of Poisson events during (0, t) when the initial state is 1 is
λ1 ∫_0^t p11(s) ds + λ2 ∫_0^t p12(s) ds.
By the result of Example 11.1 (continued), we have for any s > 0 that
p11(s) = β/(α + β) + (α/(α + β)) e^{−(α+β)s}, p12(s) = α/(α + β) − (α/(α + β)) e^{−(α+β)s}.
Next we find after some algebra that the expected number of Poisson events during (0, t) when the initial state is 1 is given by
((λ1 β + λ2 α)/(α + β)) t + (α(λ1 − λ2)/(α + β)²)(1 − e^{−(α+β)t}).
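The closed form can be verified by numerical integration of λ1 p11(s) + λ2 p12(s) for assumed parameter values:

```python
import math

# Check of the closed form for assumed alpha, beta, lambda1, lambda2, t.
alpha, beta, lam1, lam2, t = 0.5, 1.5, 2.0, 1.0, 2.0
s = alpha + beta

def p11(u):
    return beta / s + (alpha / s) * math.exp(-s * u)

def p12(u):
    return alpha / s - (alpha / s) * math.exp(-s * u)

closed = ((lam1 * beta + lam2 * alpha) / s) * t \
         + (alpha * (lam1 - lam2) / s**2) * (1 - math.exp(-s * t))

# Midpoint-rule approximation of the integral.
N = 200_000
h = t / N
approx = sum((lam1 * p11((i + 0.5) * h) + lam2 * p12((i + 0.5) * h)) * h
             for i in range(N))
assert abs(closed - approx) < 1e-8
print(round(closed, 6))
```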
11.15 The number of messages in the buffer is described by a continuous-time Markov chain with state space I = {0, 1, . . . , 10} and transition rates qi,i−1 = iµ for 1 ≤ i ≤ 10 and qi,i+1 = λ for 0 ≤ i ≤ 9. Using the alternative construction of a continuous-time Markov chain,
fi = (iµ/(λ + iµ)) fi−1 + (λ/(λ + iµ)) fi+1 for 1 ≤ i ≤ 9,
where f0 = 0 and f10 = 1.
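Writing di = fi − fi−1, the recursion above gives λ di+1 = iµ di, so the fi can be computed in closed form once d1 is fixed by f10 = 1. A sketch for assumed values of λ and µ (the problem's data are not fixed here):

```python
from math import factorial

# Solve the recursion f_i = (i*mu/(lam+i*mu)) f_{i-1}
#                         + (lam/(lam+i*mu)) f_{i+1},
# f_0 = 0, f_10 = 1, via d_i = f_i - f_{i-1} with
# d_i = d_1 * (i-1)! * (mu/lam)^(i-1).
lam, mu = 5.0, 1.0   # assumed rates
M = 10

denom = sum(factorial(i) * (mu / lam) ** i for i in range(M))
d1 = 1.0 / denom
f = [0.0]
for i in range(1, M + 1):
    f.append(f[-1] + d1 * factorial(i - 1) * (mu / lam) ** (i - 1))

# Check the boundary values and the original recursion.
assert abs(f[0]) < 1e-9 and abs(f[M] - 1.0) < 1e-9
for i in range(1, M):
    rhs = (i * mu / (lam + i * mu)) * f[i - 1] \
        + (lam / (lam + i * mu)) * f[i + 1]
    assert abs(f[i] - rhs) < 1e-9
print([round(x, 4) for x in f])
```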
11.16 By the alternative construction of a continuous-time Markov chain, it takes an exponentially distributed time with expected value 1/νi to make a transition from state i and such a transition will be to state j with probability pij = qij/νi. By the law of conditional expectation,
µi = 1/νi + Σ_{j≠i} µj pij = 1/νi + Σ_{j≠i} µj qij/νi for i ≠ r.
11.17 (a) Let N (t) be the number of state transitions in (0, t) and Xn be
the state after the nth state transition. The random variable N (t)
is Poisson distributed with expected value νt and the embedded process {Xn } is a discrete-time Markov chain. By the law of conditional
probability, P(J(t) = j | J(0) = i) is given by
Σ_{n=0}^{∞} P(J(t) = j | J(0) = i, N(t) = n) P(N(t) = n) = Σ_{n=0}^{∞} P(Xn = j | X0 = i) e^{−νt} (νt)^n/n!,
which gives the desired result.
(b) The process {J(t)} is a continuous-time Markov chain. Since a
finite-state continuous-time Markov chain is uniquely determined by
its transition rates, it suffices to verify that {J(t)} has the qij as its
transition rates. Using the expression for pij(t) = P(J(t) = j | J(0) = i) in part (a) and noting that rij^(0) = 0 for j ≠ i, it follows from
lim_{t→0} pij(t)/t = lim_{t→0} (1/t) Σ_{n=0}^{∞} rij^(n) e^{−νt} (νt)^n/n!
that lim_{t→0} pij(t)/t = rij ν = qij for any j ≠ i, as was to be proved.
11.18 (a) Consider first the case of a single standby unit. In the solution of
Problem 11.1, the state is defined as the number of units in working
condition and the transition rates are given. To find E(T ), define µi
as the expected amount of time until the first visit to state 0 starting
from state i for i = 1, 2. Then E(T ) = µ2 . The µi can be found by
solving two linear equations, see Problem 11.1. Noting that the rate
ν1 out of state 1 and the rate ν2 out of state 2 are given by ν1 = λ + µ
and ν2 = λ + η, we get
µ1 = 1/(λ + µ) + (µ/(λ + µ)) µ2 + (λ/(λ + µ)) × 0 and µ2 = 1/(λ + η) + µ1.
For the numerical data λ = 1, η = 0 and µ = 50, the solution of
the two equations is µ1 = 51 and µ2 = 52. Hence the expected time
until the system goes down for the first time is E(T ) = 52. To find
P (T > t), the two linear differential equations
Q′1 (t) = −(λ + µ)Q1 (t) + µQ2 (t)
Q′2 (t) = −(λ + η)Q2 (t) + (λ + η)Q1 (t)
are solved by using a numerical code. The probability P (T > t) is
given by Q2 (t). This probability has the numerical values 0.8253,
0.6184, 0.3823, 0.1461, and 0.0213 for t =10, 25, 50, 100, and 200.
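The two differential equations can be integrated with any standard scheme; the sketch below uses a classical fourth-order Runge-Kutta step for the data λ = 1, η = 0, µ = 50 (the step size dt = 0.001 is an arbitrary choice):

```python
# RK4 integration of Q1'(t) = -(lam+mu) Q1 + mu Q2 and
# Q2'(t) = -(lam+eta) Q2 + (lam+eta) Q1; Q2(t) = P(T > t).
lam, eta, mu = 1.0, 0.0, 50.0

def deriv(q1, q2):
    d1 = -(lam + mu) * q1 + mu * q2
    d2 = -(lam + eta) * q2 + (lam + eta) * q1
    return d1, d2

def survival(t_end, dt=0.001):
    q1 = q2 = 1.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(q1, q2)
        k2 = deriv(q1 + dt / 2 * k1[0], q2 + dt / 2 * k1[1])
        k3 = deriv(q1 + dt / 2 * k2[0], q2 + dt / 2 * k2[1])
        k4 = deriv(q1 + dt * k3[0], q2 + dt * k3[1])
        q1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        q2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q2

print(round(survival(50.0), 4))   # compare with 0.3823 above
```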
(b) For the case of two standby units, use a continuous-time Markov
chain with the four states i = 0, 1, 2, and 3. State i means that the
number of units in working condition is i. The transition rates are
q01 = µ, q10 = λ, q12 = µ, q21 = λ + η, q23 = µ, q32 = λ + 2η.
The other qij = 0. The rate νi at which the process leaves state i is
ν1 = µ + λ, ν2 = µ + λ + η and ν3 = λ + 2η.
This leads to the following linear equations for the µi :
µ1 = 1/(µ + λ) + (µ/(µ + λ)) µ2,
µ2 = 1/(µ + λ + η) + ((λ + η)/(µ + λ + η)) µ1 + (µ/(µ + λ + η)) µ3,
µ3 = 1/(λ + 2η) + µ2.
The solution of these three linear equations leads to E(T ) = µ3 =
2,603. To find P (T > t), the three linear differential equations
Q′1 (t) = −(µ + λ)Q1 (t) + µQ2 (t)
Q′2 (t) = −(µ + λ + η)Q2 (t) + (λ + η)Q1 (t) + µQ3 (t)
Q′3 (t) = −(λ + 2η)Q3 (t) + (λ + 2η)Q2 (t)
are solved by using a numerical code. The probability P (T > t) is
given by Q3 (t). This probability has the numerical values 0.9962,
0.9905, 0.9810, 0.9623, and 0.9261 for t =10, 25, 50, 100, and 200.
Note: In both cases the occurrence of a system failure is a rare event,
because λ ≪ µ. This implies that the time until the first system failure is approximately exponentially distributed (see Section 4.5). That is,
P (T > t) ≈ e−t/E(T )
for t > 0.
For case (a), e−t/E(T ) gives the approximate values 0.8251, 0.6183,
0.3823, 0.1462, and 0.0214 for t =10, 25, 50, 100, and 200. Indeed this
is an excellent approximation for P (T > t). It is also interesting to
compare the solution for case (a) with the solution of Example 4.9 in
which the replacement time is a constant rather than an exponentially
distributed random variable. For case (b), e^{−t/E(T)} gives the approximate values 0.9962, 0.9904, 0.9810, 0.9623, and 0.9260 for t = 10, 25, 50, 100, and 200.
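The exponential approximation is easy to reproduce numerically; a minimal sketch in Python, using the values E(T) = 52 and E(T) = 2,603 found above:

```python
from math import exp

def tail_approx(mean_time, times):
    # P(T > t) is approximately e^(-t/E(T)) when a system failure is a rare event
    return [exp(-t / mean_time) for t in times]

times = [10, 25, 50, 100, 200]
case_a = tail_approx(52, times)     # one standby unit, E(T) = 52
case_b = tail_approx(2603, times)   # two standby units, E(T) = 2603
```

Rounding the entries of case_a and case_b to four decimals reproduces the approximate values listed above.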
11.19 This problem can be analyzed by a continuous-time Markov chain with
nine states. The states and the transition rates are given in the figure.
(Figure: transition rate diagram with the states 6, 5, 4, 3, 2, 1, sleep 1,
sleep 2, and crash, with transition rates expressed in λ, µ, and η.)
The sleep mode requires two states, because it takes an exponentially
distributed time until the system is converted into the sleep mode and
another gyroscope may fail during this time. The state labeled as the
crash state is taken as an absorbing state. Using a numerical code for
linear differential equations, the values 0.000504 and 0.3901 are found
for the sought probabilities.
11.20 Let state i mean that the drunkard is i steps away from his home.
Denote by X(t) the state at time t. The process {X(t)} is a continuous-time
Markov chain with state space I = {0, 1, . . . , 30}, where the states
0 and 30 are absorbing (ν0 = ν30 = 0). Taking the minute as time unit,
the other transition rates are qi,i−1 = qi,i+1 = 1 for 1 ≤ i ≤ 29. Using
the uniformization method from Problem 11.17, the sought probability
p21,0(90) is calculated as 0.4558 (the probability p21,30(90) = 0.1332).
As a sanity check, we verified that p21,0(t) tends to 2/3 as t gets large, in
agreement with the gambler's ruin formula (see also Problem 2.53).
The probability p21,0 (t) has the values 0.5899, 0.6659, and 0.6665 for
t = 180, 600, and 720.
11.21 By “rate out of state i = rate into state i”, we get the balance equations
µp0 = λp1 , (λ + µ)p1 = µp0 + (λ + η)p2 , and (λ + η)p2 = µp1 .
Together with the normalization equation p0 + p1 + p2 = 1, these
equations can be solved (one of the balance equations can be omitted).
The long-run fraction of time the system is down is given by
p0 = λ / (λ + µ + µ²/(λ + η)).
The probability p0 = 0.0129 when λ = 0.1, η = 0.05, and µ = 1.
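The closed-form expression can be checked in a few lines of Python for the data λ = 0.1, η = 0.05, and µ = 1:

```python
# Long-run fraction of time the system is down (Problem 11.21).
lam, eta, mu = 0.1, 0.05, 1.0
p0 = lam / (lam + mu + mu**2 / (lam + eta))

# Cross-check against the balance equations mu*p0 = lam*p1 and (lam+eta)*p2 = mu*p1.
p1 = (mu / lam) * p0
p2 = mu * p1 / (lam + eta)
total = p0 + p1 + p2   # should equal 1 by the normalization equation
```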
11.22 Using the transition rates given in the solution of Problem 11.2 and
equating the rate out of state i to the rate into state i, we get the
balance equations
λp0 = µp1, (λ + µ)pi = λpi−1 + µpi+1 for i = 1, 2, 3, and µp4 = λp3.

Together with the normalization equation p0 + · · · + p4 = 1, the equilibrium probabilities pi can be computed. It is noted that the balance
equations can be recursively solved: they can be rewritten as

µpi = λpi−1 for i = 1, 2, 3, 4,

as can be seen by equating the rate out of the set {i, . . . , 4} to the rate
into the set {i, . . . , 4} for i = 1, 2, 3, and 4. For λ = 1/6 and µ = 1/4, the
equilibrium probabilities are p0 = 0.3839, p1 = 0.2559, p2 = 0.1706,
p3 = 0.1137, and p4 = 0.0758. The long-run fraction of time the pump
is occupied is equal to p1 + p2 + p3 + p4 = 0.6162. The average number
of cars waiting in queue is
Lq = Σ_{i=1}^{4} (i − 1)pi = 0.6256.
By the property Poisson arrivals see time averages, the long-run fraction of cars that enter the station is equal to 1 − p4 , being the long-run
fraction of time that there is room for other cars in the station. Let
Wq be the average waiting time per car entering the station. To find
Wq , note that the conditional probability of a car finding upon arrival
j other cars present, given that the car can enter the station, is
pj/(1 − p4) for j = 0, 1, 2, and 3. Therefore

Wq = Σ_{j=1}^{3} (j/µ) pj/(1 − p4) = 4.0615.
Note: Letting the rejection probability Prej = p4 (the long-run fraction
of cars that cannot enter the station), note that
Lq = λ(1 − Prej )Wq .
This relation is generally valid for finite-capacity queueing systems and
is known as Little’s formula.
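The recursive solution of Problem 11.22 takes only a few lines; a sketch in Python with λ = 1/6 and µ = 1/4:

```python
# Equilibrium probabilities from the recursion mu*p_i = lam*p_{i-1} (Problem 11.22).
lam, mu = 1 / 6, 1 / 4

p = [1.0]                            # unnormalized, starting with p0 = 1
for i in range(1, 5):
    p.append(lam / mu * p[i - 1])    # mu*p_i = lam*p_{i-1}
total = sum(p)
p = [x / total for x in p]           # normalize so that p0 + ... + p4 = 1

Lq = sum((i - 1) * p[i] for i in range(1, 5))                 # average queue length
Wq = sum(j * p[j] for j in range(1, 4)) / (mu * (1 - p[4]))   # average wait per entering car
```

Running this reproduces p0 = 0.3839, . . . , p4 = 0.0758, Lq = 0.6256, and Wq = 4.0615, and Little's formula Lq = λ(1 − Prej)Wq holds up to rounding error.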
11.23 There are five states. Let state (0, 0) mean that both stations are free,
state (0, 1) that station 1 is free and station 2 is busy, state (1, 0) that
station 1 is busy and station 2 is free, state (1, 1) that both stations
are busy and state (b, 1) that station 1 is blocked and station 2 is busy.
(Figure: transition rate diagram for the five states (0,0), (1,0), (0,1),
(1,1), and (b,1) with rates λ, µ1, and µ2.) Using the transition rate
diagram and equating the rate out of state (i, j) to the rate into state
(i, j), we get the balance equations
λp(0, 0) = µ2 p(0, 1), (µ2 + λ)p(0, 1) = µ1 p(1, 0) + µ2 p(b, 1),
µ1 p(1, 0) = λp(0, 0) + µ2 p(1, 1), (µ1 + µ2 )p(1, 1) = λp(0, 1),
µ2 p(b, 1) = µ1 p(1, 1).
The long-run fraction of time station 1 is blocked is equal to p(b, 1).
The long-run fraction of items that are rejected is equal to
p(1, 0) + p(1, 1) + p(b, 1),
by the property Poisson arrivals see time averages.
11.24 Let state (i, j) mean that i customers for gas and j customers for LPG
are at the station. Denote by X(t) the state at time t. Then {X(t)} is
a continuous-time Markov chain with the thirteen states (0, 0), (1, 0),
(2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1), (0, 2), (1, 2), (2, 2), (0, 3), and
(1, 3). Take the minute as time unit and let λ1 = 1/3, λ2 = 1/12, µ1 = 1/2,
and µ2 = 1/3. The transition rates are easily expressed in the λi and
the µi . Denote by p(i, j) the limiting probability of state (i, j). It
is helpful to draw a transition diagram in order to get the balance
equations
(λ1 + λ2 )p(0, 0) = µ1 p(1, 0) + µ2 p(0, 1),
(λ1 + λ2 + µ1 )p(i, 0)
= µ1 p(i + 1, 0) + µ2 p(i, 1) + λ1 p(i − 1, 0) for i = 1, 2, 3,
(λ1 + λ2 + µ2 )p(0, j)
= µ1 p(1, j) + µ2 p(0, j + 1) + λ2 p(0, j − 1) for j = 1, 2, 3,
(λ1 + λ2 + µ1 + µ2 )p(1, j)
= µ1 p(2, j) + µ2 p(1, j + 1) + λ1 p(0, j) + λ2 p(1, j − 1) for j = 1, 2, 3,
(λ1 + λ2 + µ1 + µ2 )p(2, j)
= µ1 p(3, j) + µ2 p(2, j + 1) + λ1 p(1, j) + λ2 p(2, j − 1) for j = 1, 2,
(µ1 + µ2 )p(3, 1) = λ1 p(2, 1) + λ2 p(3, 0),
where p(4, 0) = p(0, 4) = p(1, 4) = p(2, 3) = p(3, 2) = 0. Denote by pk the
limiting probability of having a total of k cars in the station. Then,
p0 = p(0, 0), p1 = p(1, 0) + p(0, 1), p2 = p(2, 0) + p(0, 2) + p(1, 1),
p3 = p(3, 0) + p(2, 1) + p(1, 2) + p(0, 3), p4 = p(3, 1) + p(2, 2) + p(1, 3).
Also, the long-run average number of cars served per unit time is
µ1 [ Σ_{i=1}^{3} (p(i, 0) + p(i, 1)) + p(1, 2) + p(2, 2) + p(1, 3) ]
+ µ2 [ Σ_{j=1}^{3} (p(0, j) + p(1, j)) + p(2, 1) + p(2, 2) + p(3, 1) ].
The long-run fraction of LPG-cars that cannot enter is p(3, 1)+p(2, 2)+
p(1, 3) + p(0, 3), by the property Poisson arrivals see time averages.
For the numerical data λ1 = 1/3, λ2 = 1/12, µ1 = 1/2, and µ2 = 1/3, the
limiting probabilities pj are p0 = 0.3157, p1 = 0.2894, p2 = 0.2127,
p3 = 0.1467, and p4 = 0.0354. Also, the long-run fraction of LPG-cars
that cannot enter is 0.0404 (and the long-run fraction of gas-cars that
cannot enter is 0.1290). The long-run average number of cars served
per hour is 60 × (0.2904 + 0.0800) = 22.2.
11.25 Let state (i, 0) mean that i passengers are waiting at the stand and
no sheroot is present, and let state (i, 1) mean that i passengers are
waiting at the stand and a sheroot is present. (Figure: transition rate
diagram for the states (i, 0), 0 ≤ i ≤ 7, and (i, 1), 0 ≤ i ≤ 6, with rates
λ and µ.) Using the transition rate diagram and equating the rate out
of state (i, j) to the rate into state (i, j), we get the balance equations
(λ + µ)p(0, 0) = µp(7, 0) + λp(6, 1),
(λ + µ)p(i, 0) = λp(i − 1, 0) for 1 ≤ i ≤ 6,
µp(7, 0) = λp(6, 0), λp(0, 1) = µp(0, 0),
λp(i, 1) = µp(i, 0) + λp(i − 1, 1) for 1 ≤ i ≤ 5.
The long-run average number of waiting passengers is Σ_{i=1}^{6} i[p(i, 0) +
p(i, 1)] + 7p(7, 0). The long-run fraction of potential passengers who
go elsewhere is p(7, 0).
11.26 Let the random variable X(t) be the number of units in stock at time
t. The process {X(t)} is a continuous-time Markov chain with state
space I = {1, 2, . . . , Q} (note that state 0 should not be included: if
only one unit is in stock and a demand occurs or the unit deteriorates,
the stock is immediately replenished to Q). The transition rates are
given by q1Q = λ + µ and qi,i−1 = λ + iµ for 2 ≤ i ≤ Q. The other
qij are zero. In the figure the transition rate diagram is given. This
diagram is helpful in writing down the balance equations.
(Figure: transition rate diagram on the states 1, 2, . . . , Q with rate
λ + iµ from state i to state i − 1 and rate λ + µ from state 1 to state Q.)
By equating the rate out of state i to the rate into state i, we get
(λ + iµ)pi = [λ + (i + 1)µ]pi+1 for 1 ≤ i ≤ Q − 1,
(λ + Qµ)pQ = (λ + µ)p1.
The equilibrium probabilities can be recursively computed. Starting
with p1 = 1, the numbers p2 , . . ., pQ are recursively obtained. Next the
desired pi follow by normalization. The long-run average stock equals
Σ_{i=1}^{Q} i pi. The long-run average number of replenishment orders placed
per unit time equals the long-run average number of transitions from
state 1 to state Q per unit time and thus is equal to (λ + µ)p1 .
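The recursive computation can be sketched as follows; the values λ = 1, µ = 0.2, and Q = 5 are illustrative assumptions, since the problem leaves the parameters symbolic:

```python
# Recursive solution of the balance equations of Problem 11.26.
lam, mu, Q = 1.0, 0.2, 5   # assumed illustrative values

p = [0.0, 1.0]             # p[0] is a dummy; start with unnormalized p1 = 1
for i in range(1, Q):
    # (lam + i*mu) * p_i = (lam + (i+1)*mu) * p_{i+1}
    p.append((lam + i * mu) * p[i] / (lam + (i + 1) * mu))
total = sum(p)
p = [x / total for x in p]

avg_stock = sum(i * p[i] for i in range(1, Q + 1))
orders_rate = (lam + mu) * p[1]    # average replenishment orders per unit time
```

The closing balance equation (λ + Qµ)pQ = (λ + µ)p1 is then satisfied automatically by the telescoping recursion.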
11.27 The state of the system is defined as (i, k) when i ≥ 1 cars are in
the gasoline station and the service time of the car in service is in
phase k for k = 1, 2. State 0 means that the gasoline station is empty.
Then the state of the system is described by a continuous-time Markov
chain. The arrival rate of cars is λ = 1/6 and the rate at which a service
phase is completed is β = 1/2. Denoting the limiting probability of state
s by p(s) the balance equations are
(λ + β)p(i, 1) = λp(i − 1, 1) + βp(i + 1, 2) for 1 ≤ i ≤ 3,
(λ + β)p(i, 2) = λp(i − 1, 2) + βp(i, 1) for 1 ≤ i ≤ 3, λp(0) = βp(1, 2),
βp(4, 1) = λp(3, 1), and βp(4, 2) = λp(3, 2) + βp(4, 1),
where p(0, 1) = p(0) and p(0, 2) = 0. The numerical solution of the
balance equations leads to the following answers:
(a) the fraction of time the pump is occupied is 1 − p(0) = 0.6138.
(b) The average number of cars waiting in queue is
Lq = Σ_{i=1}^{4} (i − 1)[p(i, 1) + p(i, 2)] = 0.5603.
(c) The fraction of cars not entering the station is
Prej = p(4, 1) + p(4, 2) = 0.0529,
by the property Poisson arrivals see time averages.
(d) The average waiting time in queue of a car entering the station is
Wq = Σ_{i=1}^{3} i (2/β) p(i, 1)/(1 − Prej) + Σ_{i=1}^{3} [(i − 1)(2/β) + 1/β] p(i, 2)/(1 − Prej) = 3.5494

minutes, using the fact that the conditional probability that a car entering the station finds upon arrival state (i, k) is p(i, k)/(1 − Prej ).
Note: As a sanity check, Lq = λ(1 − Prej )Wq . This relation is generally
valid for finite-capacity queues and is known as Little’s formula.
11.28 The Erlang-distributed repair time with shape parameter 3 and scale
parameter β can be seen as the sum of three independent phases each
having an exponential distribution with mean β1 . The system is said
to be in state (0, k) if both units have failed and the repair phase of
the unit in repair is phase k for k = 1, 2, and 3, and the system is said
to be in state (1, k) if one unit is working and the repair phase of the
unit in repair is phase k for k = 1, 2, and 3. The state of the system
is 2 if both units are in working condition. Let the random variable
X(t) be the state of the system at time t. Then the process {X(t)} is
a continuous-time Markov chain. The transition rates are
q2,(1,1) = λ + η, q(1,1)(1,2) = β, q(1,1)(0,1) = λ, q(1,2)(1,3) = β, q(1,2)(0,2) = λ,
q(1,3)2 = β, q(1,3)(0,3) = λ, q(0,1)(0,2) = q(0,2)(0,3) = q(0,3)(1,1) = β.
Denoting the limiting probability of state s by p(s), we have the balance equations
(λ + η)p(2) = βp(1, 3), (λ + β)p(1, 1) = (λ + η)p(2) + βp(0, 3),
(λ + β)p(1, 2) = βp(1, 1), (λ + β)p(1, 3) = βp(1, 2),
βp(0, k) = λp(1, k) + βp(0, k − 1)
for k = 1, 2, 3,
where p(0, 0) = 0. The long-run fraction of time the system is down
is p(0, 1) + p(0, 2) + p(0, 3). By solving the equilibrium equations for
λ = 0.1, η = 0.05, and β = 3, we find p(0, 1) = 1.527 × 10−3 , p(0, 2) =
3.005 × 10−3 , and p(0, 3) = 4.435 × 10−3 . Therefore the long-run
fraction of time the system is down is equal to 8.97 × 10−3 . Also, the
long-run fraction of time that a repair is going on is 1 − p(2) = 0.1420.
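The balance equations of Problem 11.28 have a recursive structure that makes them easy to solve without a linear-algebra package; a sketch in Python for λ = 0.1, η = 0.05, and β = 3:

```python
lam, eta, beta = 0.1, 0.05, 3.0

p11 = 1.0                          # unnormalized p(1,1)
p12 = beta * p11 / (lam + beta)    # (lam+beta)p(1,2) = beta*p(1,1)
p13 = beta * p12 / (lam + beta)    # (lam+beta)p(1,3) = beta*p(1,2)
p2 = beta * p13 / (lam + eta)      # (lam+eta)p(2) = beta*p(1,3)
p01 = lam * p11 / beta             # beta*p(0,1) = lam*p(1,1)
p02 = lam * p12 / beta + p01       # beta*p(0,2) = lam*p(1,2) + beta*p(0,1)
p03 = lam * p13 / beta + p02       # beta*p(0,3) = lam*p(1,3) + beta*p(0,2)

total = p2 + p11 + p12 + p13 + p01 + p02 + p03
p2, p11, p12, p13, p01, p02, p03 = (x / total for x in
                                    (p2, p11, p12, p13, p01, p02, p03))

down = p01 + p02 + p03             # fraction of time the system is down
repair_busy = 1 - p2               # fraction of time a repair is going on
```

The remaining balance equation (λ + β)p(1, 1) = (λ + η)p(2) + βp(0, 3) is then satisfied automatically, confirming that one equation is redundant.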
11.29 (a) In each of the problems the continuous-time Markov chain has a
state space of the form {0, 1, . . . , N } and transition rates qjk that are
zero if |j − k| > 1. Equating the rate at which the system leaves the
set of states {j, j + 1, . . . N } to the rate at which the system enters this
set gives that
qj,j−1 pj = qj−1,j pj−1 for j = 1, . . . , N,
showing that the Markov chain is reversible.
(b) By the reversibility of the Markov process {X(t)}, pj qjk = pk qkj
for all j, k ∈ A with j ≠ k. This gives that pj q̄jk = pk q̄kj for all
j, k ∈ A with j ≠ k. Summing this equality over k ∈ A, we get

pj Σ_{k≠j} q̄jk = Σ_{k≠j} pk q̄kj for all j ∈ A.

In other words, the pj satisfy the balance equations of the continuous-time
Markov chain {X̄(t)} and the normalization equation Σ_{j∈A} pj = 1.
The solution of these equations is unique. This proves that the pj
are the limiting probabilities of the Markov process {X̄(t)}.
11.30 Let state 1 correspond to the perfect state, state 2 to the good state,
and state 3 to the acceptable state. Let the random variable X(t) be
the state at time t. The process {X(t)} is a continuous-time Markov
chain with transition rates q12 = q13 = (1/2)µ1, q23 = µ2, q31 = µ3,
and qij = 0 otherwise. The equilibrium equations are µ1 p1 = µ3 p3,
µ2 p2 = (1/2)µ1 p1, and µ3 p3 = (1/2)µ1 p1 + µ2 p2. This gives
p1 = (µ3/µ1)p3 and p2 = (µ3/(2µ2))p3. Thus (µ3/µ1 + µ3/(2µ2) + 1)p3 = 1.
It now follows that the average number of replacements per unit time is

µ3 p3 = µ1 µ2 µ3 / (µ2 µ3 + µ1 µ2 + (1/2)µ1 µ3).
11.31 Let Ti be the sojourn time in state i. Then,
P (Ti ≤ t + ∆t | Ti > t) = (µi ∆t) × (1 − ai ) + o(∆t)
for ∆t small, showing that Ti is exponentially distributed with mean
1/[(1−ai )µi ]. Let X(t) = i if a type-i part is processed at time t. Then
{X(t)} is a continuous-time Markov chain with state space I = {1, 2}
and transition rates q12 = µ1 (1−a1 ) and q21 = µ2 (1−a2 ). The limiting
probabilities of a two-state Markov chain can be explicitly given (see
also Example 11.1 (continued)). The probabilities are
p1 = µ2(1 − a2) / [µ1(1 − a1) + µ2(1 − a2)] and p2 = µ1(1 − a1) / [µ1(1 − a1) + µ2(1 − a2)].
The average number of type-i parts processed per unit time is µi pi for
i = 1, 2.
11.32 (a) The epochs at which a transition into state 0 occurs are regeneration epochs of the continuous-time Markov chain. Imagine that a
reward at rate 1 is earned when the system is in state 0. Then, by the
renewal-reward theorem from Section 9.4, the long-run average reward
per unit time is (1/ν0)/µ00. The long-run average reward per unit
time is nothing else than the long-run fraction of time the process is
in state 0. Therefore

(1/ν0)/µ00 = p0,

showing that µ00 = 1/(ν0 p0).
(b) Conditioning upon the time the process leaves state 0 and using
the law of conditional expectation, we get
E(T) = ∫_0^τ ( x + Σ_{j≠0} (q0j/ν0)µj0 + E(T) ) ν0 e^{−ν0 x} dx + ∫_τ^∞ τ ν0 e^{−ν0 x} dx.

Since Σ_{j≠0} (q0j/ν0)µj0 = µ00 − 1/ν0 = 1/(ν0 p0) − 1/ν0 by part (a), this gives

E(T) = τ e^{−ν0 τ} + (1/ν0)(1 − e^{−ν0 τ} − ν0 τ e^{−ν0 τ}) + ( 1/(ν0 p0) − 1/ν0 + E(T) )(1 − e^{−ν0 τ}),

which yields the desired expression for E(T) after some simplification.
11.33 Let state 0 mean that both stations are idle, state 1 mean that only
station 1 is occupied, state 2 mean that only station 2 is occupied and
state 3 mean that both stations are occupied. Let X(t) be the state
at time t. Then the process {X(t)} is a continuous-time Markov chain
with transition rates q01 = λ, q10 = µ1 , q13 = λ, q20 = µ2 , q23 = λ,
q31 = µ2 , and q32 = µ1 . The balance equations are
λp0 = µ1 p1 + µ2 p2 , (λ + µ1 )p1 = λp0 + µ2 p3
(λ + µ2 )p2 = µ1 p3 , (µ2 + µ1 )p3 = λp1 + λp2 ,
where λ = 1, µ1 = 4/3, and µ2 = 4/5. The solution of the balance
equations is p0 = 0.4387, p1 = 0.2494, p2 = 0.1327 and p3 = 0.1791.
The long-run fraction of time both stations are occupied is p3 = 0.1791.
By the property Poisson arrivals see time averages, this probability also
gives the long-run fraction of items that are rejected.
Note: The loss probability is 0.0467 in Problem 10.52, showing that
the loss probability is very sensitive to the distributional form of the
arrival process.
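The four balance equations of Problem 11.33 can be solved by direct substitution; a sketch in Python:

```python
# Balance equations of Problem 11.33 with lam = 1, mu1 = 4/3, mu2 = 4/5.
lam, mu1, mu2 = 1.0, 4 / 3, 4 / 5

p3 = 1.0                             # unnormalized
p2 = mu1 * p3 / (lam + mu2)          # (lam+mu2)p2 = mu1*p3
p1 = (mu1 + mu2) * p3 / lam - p2     # (mu2+mu1)p3 = lam*p1 + lam*p2
p0 = (mu1 * p1 + mu2 * p2) / lam     # lam*p0 = mu1*p1 + mu2*p2

total = p0 + p1 + p2 + p3
p0, p1, p2, p3 = (x / total for x in (p0, p1, p2, p3))
```

This reproduces p0 = 0.4387, p1 = 0.2494, p2 = 0.1327, and p3 = 0.1791; the omitted balance equation (λ + µ1)p1 = λp0 + µ2p3 then holds automatically.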
11.34 In the solution of Problem 11.9, the state is defined as the number of
machines in repair and the transition rates are given. By equating the
rate out of the set {i, i + 1, . . . , 12} to the rate into this set, we get the
following recursion scheme for the limiting probabilities:
iµpi = 10λpi−1
for i = 1, 2
iµpi = (12 − i + 1)λpi−1
for i = 3, . . . , 12.
Starting with p̄0 = 1, we recursively compute p̄1, p̄2, . . . , p̄12. Next we
calculate pj as p̄j / Σ_{k=0}^{12} p̄k for j = 0, 1, . . . , 12, using the fact that
any solution to the balance equations is uniquely determined up to a
multiplicative constant.
11.35 This problem can be solved through the Erlang loss model. The customers are the containers and the lots on the yard are the servers. A
capacity for 18 containers is required. Then, the loss probability is
0.0071. By the insensitivity property of the Erlang loss model, the
answer is the same when the holding time of a customer has a uniform
distribution with the same expected value of 10 hours.
11.36 This problem is another application of the Erlang loss model and the
solution of the problem is based on the insensitivity property of the
Erlang loss model. Since the sum of two independent Poisson processes
is again a Poisson process, cars for the parking place arrive according
to a Poisson process with rate λ = 4 + 6 = 10 cars per hour. Also, any
arriving car is a short-term parker with probability λ1/(λ1 + λ2) = 4/10 and a
long-term parker with probability λ2/(λ1 + λ2) = 6/10, see Rule 5.4. Thus the
expected parking time of a car is (4/10) × (2/3) + (6/10) × (3/2) = 35/30 hours.
Hence the parameters of the Erlang loss model are given by λ = 10, µ = 30/35,
and s = 10. Thus the limiting probability of an arriving car finding
all parking places occupied is equal to

[(λ/µ)^s / s!] / Σ_{k=0}^{s} (λ/µ)^k / k! = 0.4275.
11.37 This inventory model is a special case of the Erlang loss model. Identify the number of outstanding orders with the number of busy servers.
The limiting distribution of the stock on hand is given by
rj = γ (λL)^{S−j} / (S − j)! for 0 ≤ j ≤ S,

where γ = 1 / Σ_{k=0}^{S} (λL)^k / k!. The average stock on hand is Σ_{j=0}^{S} j rj.
The fraction of lost demand is r0.
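A small Python sketch of these formulas, for the illustrative (assumed) values λ = 2, L = 1.5, and S = 6; as an independent check, r0 must coincide with the Erlang loss probability B(S, λL):

```python
from math import factorial

lam, L, S = 2.0, 1.5, 6            # assumed illustrative values
a = lam * L                        # offered load

gamma = 1.0 / sum(a**k / factorial(k) for k in range(S + 1))
r = [gamma * a**(S - j) / factorial(S - j) for j in range(S + 1)]

avg_stock = sum(j * r[j] for j in range(S + 1))
lost_fraction = r[0]

# Erlang loss probability via the standard recursion B_n = a*B_{n-1}/(n + a*B_{n-1}).
B = 1.0
for n in range(1, S + 1):
    B = a * B / (n + a * B)
```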
11.38 Use the infinite-server queueing model from Example 11.5 to conclude
that the limiting distribution of the number of outstanding replenishment orders is a Poisson distribution with an expected value of
λL. Defining the net stock as the stock on hand minus the amount
of backordered demand, the sum of the net stock and the number of
outstanding replenishment orders is always S. This gives that the
long-run average stock on hand is Σ_{k=0}^{S−1} (S − k) e^{−λL} (λL)^k / k!. The
long-run fraction of demand that is backordered is equal to the long-run
fraction of time that S or more orders are outstanding and is thus given by

Σ_{j=S}^{∞} e^{−λL} (λL)^j / j!.
11.39 The process describing the inventory position is a continuous-time
Markov chain with state space I = {s + 1, . . . , s + Q} and transition rates qi,i−1 = λ for s + 2 ≤ i ≤ s + Q and qs+1,s+Q = λ. The
limiting probabilities are pi = 1/Q for s + 1 ≤ i ≤ s + Q. Let pi(t) be the
probability that the inventory position at time t is i and rk (t) be the
probability that the stock on hand at time t is k for 0 ≤ k ≤ s+Q. For
any t > L, the stock on hand minus the amount backordered at time
t + L equals the inventory position at time t minus the total demand
in (t, t + L]. Then
rk(t) = Σ_{i=k}^{s+Q} e^{−λL} (λL)^{i−k}/(i − k)! pi(t) for 1 ≤ k ≤ s + Q

and r0(t) = Σ_{i=s+1}^{s+Q} pi(t) Σ_{l=i}^{∞} e^{−λL} (λL)^l / l!. Noting that
lim_{t→∞} pi(t) = 1/Q, the limiting distribution of the stock on hand follows.
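A short Python sketch of this computation, using the illustrative (assumed) values s = 2, Q = 5, λ = 3, and L = 0.4:

```python
from math import exp, factorial

s, Q, lam, L = 2, 5, 3.0, 0.4      # assumed illustrative values
a = lam * L                        # mean lead-time demand

def poisson(k):
    return exp(-a) * a**k / factorial(k)

pi_limit = 1.0 / Q                 # limiting inventory-position probabilities

# Limiting distribution r_k of the stock on hand, 0 <= k <= s+Q.
r = [0.0] * (s + Q + 1)
for k in range(1, s + Q + 1):
    r[k] = sum(poisson(i - k) * pi_limit for i in range(max(k, s + 1), s + Q + 1))
# k = 0: inventory position i and lead-time demand at least i.
r[0] = sum(pi_limit * (1 - sum(poisson(l) for l in range(i)))
           for i in range(s + 1, s + Q + 1))
```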
11.40 Let the random variable X1 (t) be the number of tables occupied by
one person and X2 (t) be the number of tables occupied by two persons.
The process {(X1(t), X2(t))} is a continuous-time Markov chain with
state space {(i, j) : i + j ≤ 20, 0 ≤ i ≤ 16, and 0 ≤ j ≤ 20}. In state
(i, j) there are 20 − i − j free tables. Take the hour as time unit and
let λ1 = 10, λ2 = 12, µ1 = 2, and µ2 = 1.5. The transition rates are
q(i,j)(i+1,j) = λ1 for i + j ≤ 15, q(i,j)(i,j+1) = λ2 for i + j ≤ 19,
q(i,j)(i−1,j) = iµ1 for 1 ≤ i ≤ 16, and q(i,j)(i,j−1) = jµ2 for 1 ≤ j ≤ 20.
Denoting by p(i, j) the limiting probabilities, we have the balance
equations
(λ1 + λ2 + iµ1 + jµ2 )p(i, j) = λ1 p(i − 1, j) + λ2 p(i, j − 1)
+ (i + 1)µ1 p(i + 1, j) + (j + 1)µ2 p(i, j + 1) for 0 ≤ i + j ≤ 15,
(λ2 + iµ1 + jµ2 )p(i, j) = λ1 p(i − 1, j) + λ2 p(i, j − 1)
+ (i + 1)µ1 p(i + 1, j) + (j + 1)µ2 p(i, j + 1) for 16 ≤ i + j ≤ 20,
where p(i, j) = 0 for i > 16 and p(i, j) = 0 for i + j > 20. The
performance measures are
the average number of occupied tables = Σ_{i=0}^{16} Σ_{j=0}^{20−i} (i + j) p(i, j),

the average number of singles served per hour = Σ_{i=1}^{16} Σ_{j=0}^{20−i} i µ1 p(i, j),

the average number of pairs served per hour = Σ_{i=0}^{16} Σ_{j=1}^{20−i} j µ2 p(i, j).
Note: By the property Poisson arrivals see time averages, the fraction
of singles who cannot get a table is Σ_{i=0}^{16} Σ_{j=16−i}^{20−i} p(i, j) and the
fraction of pairs who cannot get a table is Σ_{i=0}^{16} p(i, 20 − i).
11.41 Use the M/G/∞ model. The probability that there are more than 15
parts on the conveyor is
Σ_{k=16}^{∞} e^{−3×3} (3 × 3)^k / k! = 0.0220.
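In Python (the number of parts on the conveyor is Poisson distributed with mean 3 × 3 = 9):

```python
from math import exp, factorial

mean = 3 * 3
# P(more than 15 parts) = 1 - P(at most 15 parts)
tail = 1.0 - sum(exp(-mean) * mean**k / factorial(k) for k in range(16))
```

This gives tail = 0.0220, in agreement with the answer above.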
11.42 Define X(t) as the number of customers in the system at time t. The
process {X(t), t ≥ 0} is a continuous-time Markov chain with state
space I = {0, 1, . . .}. In the figure below the transition rate diagram
is given. Since the transition rate qij = 0 for j ≤ i − 2, the equilibrium
probabilities pj can be recursively computed. By equating the rate
out of the set {i, i + 1, . . .} to the rate into this set, we find
σ1µ pi = λpi−1 for 1 ≤ i ≤ R − 1,
σ2µ pi = λpi−1 for i ≥ R.
Starting with p̄0 = 1, we can recursively compute p̄1 , p̄2 , . . . and next
obtain the desired pi ’s by normalization.
(Figure: transition rate diagram with arrival rate λ, service rate σ1µ in
the states below level R, and service rate σ2µ from level R on.)
11.43 Define the state of the system as the number of taxis waiting at the
stand. It is immediate from the transition rate diagram that the
M/M/1 queueing model with arrival rate λ = 7/60 and service rate
µ = 10/60 applies to the number of taxis present at the stand. The
service requests in the model are the taxis arriving at a rate of λ = 7/60
per minute, and the service rate of µ = 10/60 per minute is the rate at
which potential passengers come to the stand. The limiting probability
of having no taxi present is p0 = 1 − λ/µ = 0.3. Hence

the long-run average number of waiting taxis = (λ/µ)/(1 − λ/µ) = 7/3,
the long-run proportion of passengers who get a taxi = 1 − p0 = 0.7,

where the last result uses the property Poisson arrivals see time averages.
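These M/M/1 quantities in Python:

```python
lam, mu = 7 / 60, 10 / 60          # taxis arrive at rate 7/60, passengers at rate 10/60

rho = lam / mu
p0 = 1 - rho                       # probability of no taxi at the stand
avg_waiting_taxis = rho / (1 - rho)
prop_served = 1 - p0               # fraction of passengers who get a taxi (PASTA)
```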
11.44 Let the random variable X(t) be the number of customers present at
time t. The process {X(t)} is a birth-and-death process with state
space I = {0, 1, . . .} and transition rates
λi = λ/(i + 1) for i ≥ 0, µi = µ for i ≥ 1.

Note that the probability of going from state i to state i + 1 in a very
small time interval of length h is λh × 1/(i + 1) + o(h). By Rule 11.5, we
find

pj = [(λ/µ)^j / j!] p0 for j = 0, 1, . . . .

Using the normalization equation Σ_{j=0}^{∞} pj = 1, it next follows that
p0 = e^{−λ/µ} and so

pj = e^{−λ/µ} (λ/µ)^j / j! for j = 0, 1, . . . .
By the property Poisson arrivals see time averages, the long-run fraction of customers finding upon arrival j other customers present is pj .
Any customer seeing j other customers upon arrival enters the system
with probability 1/(j + 1). Using the law of conditional probability, it
now follows that the long-run fraction of arrivals that actually join the
queue is

Σ_{j=0}^{∞} pj/(j + 1) = (µ/λ) Σ_{k=1}^{∞} e^{−λ/µ} (λ/µ)^k / k! = (µ/λ)(1 − e^{−λ/µ}),

in agreement with the result that the long-run average number of customers
served per unit time is µ(1 − p0) = µ(1 − e^{−λ/µ}). The long-run fraction of
customers who go elsewhere is 1 − (µ/λ)(1 − e^{−λ/µ}).
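The infinite sum for the joining fraction is easily verified numerically; a sketch in Python with the illustrative (assumed) rates λ = 2 and µ = 1:

```python
from math import exp

lam, mu = 2.0, 1.0                 # assumed illustrative rates
a = lam / mu

# Direct evaluation of sum over j of p_j/(j+1), with p_j Poisson(a) probabilities.
term = exp(-a)                     # p_0
joining = 0.0
for j in range(100):               # tail beyond j = 100 is negligible
    joining += term / (j + 1)
    term *= a / (j + 1)            # p_{j+1} from p_j

closed_form = (mu / lam) * (1 - exp(-a))
```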
11.45 This problem is an application of the infinite-server queueing model
from Example 11.5. The steady-state probability that more than seven
oil tankers are on the way to Rotterdam is insensitive to the shape of
the sailing-time distribution and is given by the Poisson probability
Σ_{j=8}^{∞} e^{−4} 4^j / j! = 0.0511.
11.46 To solve this problem, a key observation is the following. If all s
servers are busy, then, by the memoryless property of the exponential distribution, each of the s (remaining) service times is exponentially distributed with rate µ. The minimum of s independent random variables
each having an exponential distribution with rate µ is exponentially
distributed with rate sµ. As long as all s servers are working, the
times between service completions are independent random variables
each having an exponential distribution with rate sµ. In other words,
service completions occur according to a Poisson process with rate sµ
as long as s or more customers are present. Thus the conditional delay in queue of a customer finding j ≥ s other customers present upon
arrival is the sum of j − s + 1 independent random variables each having
an exponential distribution with expected value 1/(sµ) and thus has
an Erlang (j − s + 1, sµ) distribution. The probability that an Erlang
(j − s + 1, sµ) distributed random variable is larger than t is given by
Σ_{k=0}^{j−s} e^{−sµt} (sµt)^k / k! for t ≥ 0.
By the property Poisson arrivals see time averages, the steady-state
probability that an arriving customer sees j other customers present
is equal to pj . By the law of conditional probability,
lim_{n→∞} P(Wn > t) = Σ_{j=s}^{∞} pj Σ_{k=0}^{j−s} e^{−sµt} (sµt)^k / k! for t ≥ 0.
Using the formulas for pj and Pdelay , it is next a matter of some algebra
to obtain
lim_{n→∞} P(Wn > t) = Pdelay e^{−sµ(1−ρ)t} for t ≥ 0.
11.47 The process describing the number of service requests present is a
birth-and-death process with transition rates
λi = λ for i ≥ 0, µi = iµ for 1 ≤ i ≤ s, µi = sµ + (i − s)θ for i > s.
The limiting probabilities pj can be recursively obtained from
jµ pj = λpj−1 for 1 ≤ j ≤ s, [sµ + (j − s)θ] pj = λpj−1 for j > s.
Since the long-run average number of balking callers per unit time is
Σ_{j=s+1}^{∞} (j − s)θ pj
and the average arrival rate of callers is λ, it follows that the long-run
fraction of balking callers is
(1/λ) Σ_{j=s+1}^{∞} (j − s)θ pj.
11.48 Let the random variable X1 (t) be the number of available bikes at the
bike rental and X2 (t) be the number of bikes at the depot. The process
{(X1(t), X2(t))} is a continuous-time Markov chain with state space
I = {(i, j) : i + j ≤ 25, i ≥ 0, and 0 ≤ j ≤ 9}.
In state (i, j) there are 25 − i − j bikes rented out. Take the hour as
time unit and let λ = 10 and µ = 0.5. The transition rates are
q(i,j)(i+1,j) = 0.75(25 − i − j)µ, q(i,j)(i,j+1) = 0.25(25 − i − j)µ for j < 9,
q(i,9)(i+10,0) = 0.25(25 − i − 9)µ and q(i,j)(i−1,j) = λ for i ≥ 1.
Denoting by p(i, j) the limiting probabilities and letting δ(0) = 0 and
δ(i) = 1 for i ≥ 1, we have the balance equations
[λδ(i) + (25 − i − j)µ] p(i, j) = λp(i + 1, j) + (25 − i − j + 1)µ
× [0.25p(i, j − 1) + 0.75p(i − 1, j)] for 0 ≤ i < 10, 0 ≤ j ≤ 9,
[λ + (25 − i)µ] p(i, 0) = λp(i + 1, 0) + (26 − i)µ
× [0.25p(i − 10, 9) + 0.75p(i − 1, 0)] for 10 ≤ i ≤ 25,
[λ + (25 − i − j)µ] p(i, j) = λp(i + 1, j) + (25 − i − j + 1)µ
× [0.25p(i, j − 1) + 0.75p(i − 1, j)] for 10 ≤ i ≤ 25, 1 ≤ j ≤ 9,
where p(i, −1) = p(−1, j) = 0 and p(i, j) = 0 for i + j > 25. The
performance measures are
the average number of bikes at the bike rental = Σ_{i=0}^{25} Σ_{j=0}^{min(25−i, 9)} i p(i, j),

the average number of bikes at the depot = Σ_{i=0}^{25} Σ_{j=0}^{min(25−i, 9)} j p(i, j),

the fraction of tourists who cannot rent a bike = Σ_{j=0}^{9} p(0, j),

where the last result uses the property Poisson arrivals see time averages. Further,

the average number of transports per unit time from the depot
to the bike rental = Σ_{i=0}^{16} 0.25(25 − i − 9)µ p(i, 9).
11.49 Let X(t) be the number of busy channels at time t. The process {X(t)}
is a continuous-time Markov chain with state space I = {0, 1, . . . , c}.
The transition rates are
qi,i−1 = iµ for i = 1, . . . , c and qi,i+1 = (M −i)α for i = 0, 1, . . . , c−1.
By equating the rate out of the set {j, . . . , c} to the rate into the set
{j, . . . , c} for 1 ≤ j ≤ c, we find that the equilibrium probabilities pj
satisfy the recursive equations
jµpj = (M − j + 1)αpj−1
for j = 1, . . . , c.
The equilibrium probabilities are given by the truncated binomial distribution
pj = C(M, j) p^j (1 − p)^{M−j} / Σ_{k=0}^{c} C(M, k) p^k (1 − p)^{M−k} for j = 0, 1, . . . , c,
where p = (1/µ)/(1/µ + 1/α). The long-run average number of service
requests generated per unit time when i service channels are busy is
(M − i)αpi . Thus, the long-run fraction of service requests that are
lost is
(M − c)αpc / Σ_{i=0}^{c} (M − i)αpi.
Note: The probability model in this problem is known as the Engset
loss model. This model has the property that the limiting probabilities
pj are insensitive to the specific form of the service-time distribution
and thus depend on the service time only through its expected value. In
the Engset model, the limiting probabilities are also insensitive to
the shape of the on-time distribution of the sources when the on-time
distribution and/or the service time distribution is continuous. A proof
of this deep result is beyond the scope of the book.
By letting M → ∞ and α → 0 such that M α remains equal to the
constant λ, it follows from the Poisson approximation to the binomial
probability that pj converges to
[e^{−λ/µ} (λ/µ)^j / j!] / [Σ_{k=0}^{c} e^{−λ/µ} (λ/µ)^k / k!] for j = 0, 1, . . . , c.
In other words, the Erlang loss model is a limiting case of the Engset
loss model. This is not surprising, since the arrival process of service
requests becomes a Poisson process with rate λ when we let M → ∞
and α → 0 such that M α = λ.
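This limit is easy to see numerically; a Python sketch with c = 3 channels, λ = 2, µ = 1 (illustrative assumptions) and M = 10000 sources:

```python
from math import factorial

c, lam, mu, M = 3, 2.0, 1.0, 10000   # assumed illustrative values
alpha = lam / M                      # so that M*alpha = lam
p = (1 / mu) / (1 / mu + 1 / alpha)

# Engset probabilities: truncated binomial terms built up via ratios.
engset = [1.0]
for j in range(1, c + 1):
    engset.append(engset[-1] * (M - j + 1) / j * p / (1 - p))
tot = sum(engset)
engset = [x / tot for x in engset]

# Erlang loss probabilities: truncated Poisson with offered load lam/mu.
a = lam / mu
erlang = [a**j / factorial(j) for j in range(c + 1)]
tot = sum(erlang)
erlang = [x / tot for x in erlang]

max_diff = max(abs(x - y) for x, y in zip(engset, erlang))
```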
11.50 (a) Let X(t) be the number of working units at time t. The process {X(t)} is a continuous-time Markov chain with state space I =
{0, 1, . . . , c} and transition rates qi,i−1 = iα for 1 ≤ i ≤ c and qi,i+1 = µ
for 0 ≤ i ≤ c − 1. The transition rate diagram is identical to the transition diagram in Figure 11.2 for Example 11.4 when s is replaced by
c, λ by µ, and µ by α. The lifetime in Problem 11.50 is the repair time
in Example 11.4 (or the service time in the Erlang loss model). Thus
the limiting distribution of the number of working units is
pj = [(µ/α)^j / j!] / Σ_{k=0}^{c} (µ/α)^k / k! for j = 0, 1, . . . , c.
In particular, the long-run fraction of time the system is down is p0 .
This performance measure is insensitive to the shape of the lifetime
distribution, by the insensitivity property of the Erlang loss model.
(b) For the case of ample repairmen, the model is the same as the
Engset model by identifying the number of working units with the
number of active sources and taking M = c. The limiting probability
of having j working units is
pj = C(c, j) p^j (1 − p)^{c−j} / Σ_{k=0}^{c} C(c, k) p^k (1 − p)^{c−k} for 0 ≤ j ≤ c,
where p = (1/α)/(1/α+1/µ). The long-run fraction of time the system
is down is equal to p0 . Again, the insensitivity property applies.
Note: As a sanity check, in both case (a) and case (b) we have
p0 = (1/µ)/(1/α + 1/µ) when c = 1, in agreement with the result of
Example 9.3.
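The c = 1 sanity check in Python, for the assumed illustrative rates α = 0.2 and µ = 1.25:

```python
alpha, mu = 0.2, 1.25              # assumed failure and repair rates

# Case (a): truncated Poisson distribution with c = 1.
p0_a = 1.0 / (1.0 + mu / alpha)

# Case (b): binomial distribution with c = 1 and p = (1/alpha)/(1/alpha + 1/mu).
p = (1 / alpha) / (1 / alpha + 1 / mu)
p0_b = 1 - p

# Alternating renewal argument of Example 9.3: fraction of time in repair.
down = (1 / mu) / (1 / alpha + 1 / mu)
```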
11.51 (a) Let X(t) be the stock on hand at time t. The process {X(t)} is a
continuous-time Markov chain with state space I = {1, 2, . . . , R − 1}.
The transition rates are q1Q = λ, qi,i−1 = λ for 2 ≤ i ≤ R − 1,
qR−1,Q = µ, qi,i+1 = µ for 1 ≤ i ≤ R − 2, and the other qij = 0. The
balance equations are
(λ + µ)pi = µpi−1 + λpi+1
for i = 1, . . . , Q − 1,
(λ + µ)pQ = λp1 + µpQ−1 + λpQ+1 + µpR−1
(λ + µ)pi = µpi−1 + λpi+1
for i = Q + 1, . . . , R − 1,
where p0 = pR = 0.
(b) The average stock on hand is Σ_{i=1}^{R−1} i pi.
(c) The average number of stock replenishments per unit time is λp1
and the average number of stock reductions per unit time is µpR−1 .