Introduction to
PROBABILITY THEORY
with
CONTEMPORARY APPLICATIONS
Lester L. Helms
INTRODUCTION TO PROBABILITY THEORY
With Contemporary Applications
Lester L. Helms
University of Illinois at Urbana-Champaign
DOVER PUBLICATIONS, INC.
Mineola, New York
Copyright
Copyright © 1997, 2010 by Lester L. Helms
All rights reserved.
Bibliographical Note
This Dover edition, first published in 2010, is an unabridged republication of the
work originally published in 1997 by W. H. Freeman and Company, New York. The
author has provided a new errata list for this edition.
Library of Congress Cataloging-in-Publication Data
Helms, L. L. (Lester La Verne), 1927—
Introduction to probability theory : with contemporary applications / Lester L.
Helms.
Dover ed.
p. cm.
Originally published: New York : W. H. Freeman, 1997. With new errata list.
Includes bibliographical references and index.
ISBN-13: 978-0-486-47418-2
ISBN-10: 0-486-47418-6
1. Probabilities. I. Title.
QA273.H52 2010
519.2—dc22
2009034243
Manufactured in the United States by Courier Corporation
47418601
www.doverpublications.com
In memory of
David Michael Helms
1955-1990
CONTENTS

Preface    ix
Errata    xi

1  Classical Probability    1
   1.1  Beginnings    1
   1.2  Basic Rules    2
   1.3  Counting    5
   1.4  Equally Likely Case    12
   1.5  Other Models    17

2  Axioms of Probability    25
   2.1  Introduction    25
   2.2  Set Theory    26
   2.3  Countable Sets    31
   2.4  Axioms    35
   2.5  Properties of Probability Functions    41
   2.6  Conditional Probability and Independence    45
   2.7  Some Applications    52

3  Random Variables    60
   3.1  Introduction    60
   3.2  Random Variables    60
   3.3  Independent Random Variables    72
   3.4  Generating Functions    81
   3.5  Gambler’s Ruin Problem    92
   3.6  Appendix    96

4  Expectation    99
   4.1  Introduction    99
   4.2  Expected Value    99
   4.3  Properties of Expectation    107
   4.4  Covariance and Correlation    117
   4.5  Conditional Expectation    125
   4.6  Entropy    133

5  Stochastic Processes    144
   5.1  Introduction    144
   5.2  Markov Chains    145
   5.3  Random Walks    159
   5.4  Branching Processes    167
   5.5  Prediction Theory    172

6  Continuous Random Variables    181
   6.1  Introduction    181
   6.2  Random Variables    182
   6.3  Distribution Functions    190
   6.4  Joint Distribution Functions    199
   6.5  Computations with Densities    209
   6.6  Multivariate and Conditional Densities    216

7  Expectation Revisited    226
   7.1  Introduction    226
   7.2  Riemann-Stieltjes Integral    226
   7.3  Expectation and Conditional Expectation    234
   7.4  Normal Density    243
   7.5  Covariance and Covariance Functions    255

8  Continuous Parameter Markov Processes    267
   8.1  Introduction    267
   8.2  Poisson Process    267
   8.3  Birth and Death Processes    273
   8.4  Markov Chains    278
   8.5  Matrix Calculus    285
   8.6  Stationary Distributions    293

Solutions to Exercises    300
Standard Normal Distribution Function    346
Symbols    347
Index    348
PREFACE
In addition to exposing a student to diverse applications of probability theory
through numerous examples, a probability textbook should convince the
student that there is a coherent set of rules for dealing with probabilities and
that there are powerful methodologies for solving probability problems. Aside
from routine differentiation and integration methods, as far as possible I
have based this book on the following three topics from the calculus: (1) the
principle of mathematical induction, (2) the existence of limits of monotone
sequences, and (3) power series. With three or four exceptions, complete proofs
of theorems are included for the benefit of the highly motivated student.
The transition from calculus to probability theory is not easy for the typical
student. New concepts, which are not amenable to “plug and chug” methods,
are introduced at each turn of the page. At the risk of verbosity, I have
endeavored to err on the side of readability to make this transition easier.
Even so, the student will need to have pen and scratch pad ready for writing
out some of the details. Again for the benefit of the student, solutions to all
the exercises are included at the end of the text. Students need immediate
reassurance that they have worked a problem correctly so that they can get on
with the learning process and should not be made to wait until the next class
meeting. Some of the exercises are tagged with a caution symbol in the form
of a hand; these should not be attempted without Mathematica or Maple V
software. Some of these exercises stipulate that answers should be calculated to
n decimal places to ensure that mathematical software is actually used rather
than a hand calculator and a crude approximation.
At most, two or three classroom periods should be allotted to Chapter 1. A
standard one-semester course might consist of Chapters 1-4, one section from
Chapter 5, and Chapters 6-7. On the other hand, an instructor who believes in
the inevitability of a digitized science might offer a one-semester course based
only on Chapters 1-5. There is more than enough material for a two-semester
course.
The manuscript was classroom tested during the fall semester of 1994 and
the spring semester of 1996. Many examples and exercises have been added
to the original manuscript at the suggestion of the students. This book was
written for students, and I would welcome any suggestions from them on how
it might be improved via e-mail at l-helms@math.uiuc.edu.
I would like to express my appreciation to W. H. Freeman reviewers S. James
Taylor of the University of Virginia and Cathleen M. Zucco of LeMoyne College
for their many suggestions on how to improve the book. I also thank Mary
Louise Byrd, Project Editor at W. H. Freeman, for maintaining a reasonable
production schedule. I especially thank Holly Hodder, Senior Editor at W. H.
Freeman, for her interest in publishing this book.
July 1996
ERRATA
1. In line 15 of page 44, change 3/16 to 5/32.
2. In line 12 of page 48, interchange .116 and .5.
3. In line 16 of page 48, change the sentence beginning “Since ...” to:

   That someone must be able to pass on a B allele and therefore must
   be of genotype OB, BB, or AB; in the first and third cases there is
   a 50-50 chance that the B allele will be passed to the child. The
   computation of P(E | F^c) is similar to that of P(E | F) except for an
   additional term P(E ∩ F^c | F_AB)P(F_AB) in the numerator. Thus,

       P(E | F^c) = [(.5)(.116) + .007 + (.5)(.038)] / .877 = .096

4. Line 11 from the bottom of page 48 should read:

       P(E | F) / P(E | F^c) = .528 / .096
5. In line 2 of page 49, change .89 to .85.
6. The following sentence should be added to line 2 on page 106:
Before trying to verify (4.2) below, the reader should do Exercise
4.2.8 first.
7. The first sentence of line 14 on page 106 should read:
A similar argument applies to the fourth starting pattern, but care
must be taken with the second and third starting patterns.
8. Equation (4.2) on page 106 should be changed to read:

       g_T(n) = g_T(n - 1) - (1/8)g_T(n - 3) + (1/16)g_T(n - 4),    n ≥ 4.
9. The display equation following Equation (4.3) on page 106 should read:

       g_T(t) - 1 - t - t^2 - t^3
           = t(g_T(t) - 1 - t - t^2) - (t^3/8)(g_T(t) - 1) + (t^4/16)g_T(t)
10. The last display equation following Equation (4.3) on page 106 should read:

       g_T(t) = (16 + 2t^3) / (16 - 16t + 2t^3 - t^4)
11. The 30 on lines 3 and 4 of page 107 should be changed to 18.
12. Exercise 4.2.8 on page 107 should be replaced by:

    Consider a sequence of Bernoulli trials with probability of success
    p = 1/2, define a waiting time T by putting T = n if the word
    101 appears for the first time at the end of the nth trial, and let
    g_T(n) = P(T > n). Show that

        g_T(n) = (1/2)g_T(n - 1) + (1/2)g_T(n - 1) - (1/4)g_T(n - 2) + (1/8)g_T(n - 3)

    for n ≥ 3. Using the fact that g_T(0) = g_T(1) = g_T(2) = 1, find the
    generating function g_T and E[T].
13. The following sentences should be added to Definition 5.2 on page 154:

    (3) The state j is periodic of period d(j) if d(j) is the greatest
    common divisor of the set {n ∈ N : p_jj^(n) > 0} and is called
    aperiodic if d(j) = 1; (4) the chain {X_n, n ≥ 0} is aperiodic if each
    state is aperiodic.
14. In line 6 from the bottom of page 154, insert “an aperiodic” before the word irreducible.
15. In line 8 from the bottom of page 168, add a right parenthesis.
16. In line 6 of page 172, replace “with” by “by”.
17. In line 9 from the bottom of page 179, replace “and” by “<”.
18. In line 5 from the bottom of page 179, replace “and” by “<”.
19. The D in line 3 on page 238 (after Figure 7.1) should have an exponent 2,
as in D^2.
20. Solution 2.5.3 on page 308 should be 17/24.
21. Solution 4.2.8 on page 319 should read:

    Let T = n if the word 101 occurs on the nth trial for the first time.
    Then

        g_T(t) = (8 + 2t^2) / (8 - 8t + 2t^2 - t^3)

    and E[T] = g_T(1) = 10.
22. Solution 4.2.9 on page 319 should read:

        P(T ≤ 11) = 1 - P(T > 11) = 1 - g_T(11) = 435/1024 = .4248.
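Errata items 12 and 21 can be cross-checked numerically. Collecting terms, the corrected recursion in item 12 is g_T(n) = g_T(n - 1) - (1/4)g_T(n - 2) + (1/8)g_T(n - 3); it should reproduce P(T > n) computed by brute-force enumeration, and summing the tail probabilities should give E[T] = 10. An editorial Python sketch (not part of the errata; the truncation point and tolerances are arbitrary choices):

```python
from itertools import product

def tail_prob(n):
    """P(T > n): fraction of length-n fair-coin words avoiding '101'."""
    if n == 0:
        return 1.0
    words = ["".join(w) for w in product("01", repeat=n)]
    return sum("101" not in w for w in words) / len(words)

# Tail recursion for the waiting time until 101 first appears
# (errata item 12, with the g_T(n - 1) terms collected).
g = [1.0, 1.0, 1.0]                     # g_T(0) = g_T(1) = g_T(2) = 1
for n in range(3, 400):
    g.append(g[-1] - g[-2] / 4 + g[-3] / 8)

for n in range(12):                     # recursion matches brute force
    assert abs(g[n] - tail_prob(n)) < 1e-12

print(round(sum(g), 6))                 # E[T] = sum of P(T > n) = 10 (item 21)
```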
1 CLASSICAL PROBABILITY

1.1 BEGINNINGS
As far back as 3500 B.C., devices were used in conjunction with board games to
inject an element of uncertainty into the game. A heel bone or knucklebone of
a hooved animal was commonly used. Dice made from clay were in existence
even before the Greek and Roman empires. Just how the outcomes of these
devices were measured or weighted, if at all, is unknown. It may be that the
outcomes were ascribed to fate, the gods, or whatever, with no attempt being
made to associate numbers with outcomes.
At the end of the fifteenth century and beginning of the sixteenth century,
numbers began to be associated with the outcomes of gaming devices, and by
that time empirical odds had been established for some devices by inveterate
gamblers. In the first half of the sixteenth century, the Italian physician and
mathematician Girolamo Cardano (1501-1576) made the abstraction from
empiricism to theoretical concept in his book Liber de Ludo Aleae, “The Book
of Games of Chance,” which was published posthumously in 1663. An English
translation of this book by Sydney Gould can be found in Cardano, The
Gambling Scholar by Oystein Ore (see the Supplemental Reading List at the end
of this chapter). Among other things, Cardano calculated the odds of getting
various scores with two dice and with three dice.
During the period 1550-1650, several mathematicians were involved in
calculating the chances of winning at gambling. Sometime between 1613 and
1623, Galileo (1564-1642) wrote a paper on dice without alluding to any prior
work, as though the calculation of probabilities had become commonplace by
then. Some historians mark 1654 as the birth of the theory of probability. It was
in this year that a gambler, the Chevalier de Mere, proposed several problems to
Blaise Pascal (1623-1662), who in turn communicated the problems to Pierre
de Fermat (1601-1665). Thus began a correspondence between Pascal and
Fermat about probabilities that some authors claim to be the beginning of the
theory of probability. On the heels of this correspondence, in 1657, another
seminal work appeared: De Ratiociniis in Ludo Aleae by Christianus Huygens
(1629-1695), in which the concept of expectation was introduced for the first
time.
In the following sections, a theory of probability will be developed (as
opposed to the theory of probability, since there are several approaches to
probability) as expeditiously as possible using contemporary notation and
terminology.
1.2 BASIC RULES
Game playing and gambling have been common forms of recreation among all
classes of people for hundreds of years. In fact, the desire to win at gambling
was a primary driving force in the development of probability theory during
the sixteenth century. As a result, much of the early work dealt with dice
and with answering questions about perceived discrepancies in empirical odds.
Some of the gamblers were quite astute at recognizing discrepancies on the
order of 1/100. The point is that the gamblers of the sixteenth century were
aware of some kind of empirical law according to which there was predictability
about the frequency of occurrence of a specified outcome of a game, even
though there was no way of predicting the outcome of a particular play of the
game.
Consider an experiment or game in which the outcome is uncertain and
consider some attribute of an outcome. Let A be the collection of outcomes
having that attribute. Suppose the experiment or game is repeated N times and
N(A) is the number of repetitions for which the outcome is in A. The ratio
N(A)/N is called the relative frequency of A. The fact that the ratio N(A)/N
seems to stabilize near some real number p when N is large, written

    N(A)/N ≈ p,

is an empirical law. This law can no more be proved than Newton’s law of
cooling can be proved. Of course, the number p depends upon A, and it is
customary to denote it by P(A), so that the empirical law is usually written

    N(A)/N ≈ P(A).

The number P(A) is called the probability of A. This tendency of relative
frequencies to stabilize near a real number is illustrated in Figure 1.1. The
graph depicts the relative frequency of getting a head in flipping a coin N times
for N up to 500, calculated at multiples of 5 and rounded off. The number
500 was chosen in advance of any coin flips.
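The stabilization of N(A)/N is easy to reproduce by simulation. The following sketch is an editorial illustration in Python (the text itself assumes Mathematica or Maple V for machine computation); the seed and the sample sizes are arbitrary choices:

```python
import random

def head_frequency(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return the
    relative frequency N(A)/N of the attribute A = "heads"."""
    rng = random.Random(seed)
    heads = sum(rng.randint(0, 1) for _ in range(n_flips))
    return heads / n_flips

# As N grows, the relative frequency settles near p = 1/2.
for n in (10, 100, 500, 10_000):
    print(n, head_frequency(n))
```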
Just what was Cardano’s contribution? It was the observation that for most
simple games of chance, the probability of a particular outcome is simply the
reciprocal of the total number of outcomes for the game, an observation that
seemed to agree with empirical odds established by the gamblers. For example,
if the game consists of rolling a fair die (i.e., a nearly perfect cubical die), then
the outcomes 1,2,3,4,5,6 represent the number of pips on the top surface of
the die after coming to rest, and so each outcome has an associated probability
of 1/6.
Cardano also considered the roll of two dice; for purposes of argument, a
red die and a white die. There are six outcomes for the red die. Each of the
outcomes of the red die can be paired with one of six outcomes for the white
die, and so there are a total of 36 possible outcomes for the roll of the two dice.
Thus, Cardano assigned each outcome a probability of 1/36.
Cardano’s assignment of probabilities is universally accepted for most simple
games of chance: rolling a die, rolling two dice, . . ., rolling n dice; flipping
a coin, flipping a coin two times in succession, . .., flipping a coin n times
in succession; flipping n coins simultaneously; and dealing a hand of n cards
from a well-shuffled deck of playing cards.
Consider two collections of outcomes A and B having no outcomes in
common; i.e., A and B are mutually exclusive. If A U B denotes the collection of
outcomes in A or in B and N (A U B) denotes the number of times the outcome
is in A U B in N repetitions of the game, then N(A U B) = N(A) + N(B).
Since

    N(A ∪ B)/N = N(A)/N + N(B)/N,

it follows from the empirical law that

    P(A ∪ B) = P(A) + P(B)    (1.1)
whenever A and B are mutually exclusive. Note also that 0 ≤ N(A)/N ≤ 1,
and it follows from the empirical law that

    0 ≤ P(A) ≤ 1.    (1.2)

In particular, if Ω is the collection of all outcomes of the game, then N(Ω) = N
and

    P(Ω) = 1.    (1.3)
The properties of probabilities expressed by Equations 1.1, 1.2, and 1.3 embody
the basic rules for more general probability models.
Returning to Cardano’s assignment of probabilities, if A consists of the
outcomes ω_1, ω_2, ..., ω_k and N(ω_i) is the number of times the outcome is
ω_i in N repetitions of a game, then N(A) = N(ω_1) + ··· + N(ω_k) and
P(A) = P(ω_1) + ··· + P(ω_k) by the empirical law. Letting |A| denote the
number of outcomes in A,

    P(A) = |A|/|Ω|.    (1.4)
We now have the basic rules for calculating probabilities associated with
simple games of chance. Such calculations are reduced to counting outcomes.
The reader should develop a systematic procedure for identifying and labeling
outcomes, as in the following example.
EXAMPLE 1.1 Consider an experiment in which a coin is flipped three
times in succession. The outcomes can be labeled using three-letter words
made up from an alphabet of H and T (or 1 and 0). The label TTH stands
for an outcome for which the first two flips resulted in tails and the third in
heads. All possible outcomes can be listed: HHH, THH, HTH, HHT, HTT,
THT, TTH, TTT. Consider the attribute “the number of heads in the outcome
is 2.” If A is the collection of outcomes having this attribute, then |A| = 3, and
so P(A) = 3/8. ■
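The enumeration in Example 1.1 can be reproduced mechanically; a brief Python sketch (an editorial illustration, not part of the text):

```python
from itertools import product

# All three-letter words over the alphabet {H, T}: the 8 outcomes.
outcomes = ["".join(w) for w in product("HT", repeat=3)]

# A = the outcomes with exactly two heads.
A = [w for w in outcomes if w.count("H") == 2]
print(sorted(A), len(A) / len(outcomes))  # |A| = 3, P(A) = 3/8
```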
EXAMPLE 1.2 Consider an experiment in which a bowl contains five
chips numbered 1,2, 3,4,5. The chips are thoroughly mixed and one of them
is selected blindly, the remaining chips are thoroughly mixed again, and then
one of the remaining chips is selected blindly. An outcome of this experiment
can be labeled using a two-letter word made up from an alphabet consisting of
the digits 1,2,3,4,5, with the proviso that no digit can be repeated. All possible
outcomes can be listed: 12,13,14,15, 21,23,24,25,31, 32, 34,35,41,42,43,
45,51,52,53,54. Consider the attribute “the first digit is less than the second.”
If A is the collection of outcomes with this attribute, then |A| = 10, and so
P(A) = 10/20 = 1/2. ■
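Example 1.2 admits the same mechanical check (again an editorial Python sketch):

```python
from itertools import permutations

# Ordered selections of two distinct chips from {1, 2, 3, 4, 5}.
outcomes = list(permutations([1, 2, 3, 4, 5], 2))

# A = the outcomes whose first digit is less than the second.
A = [(i, j) for (i, j) in outcomes if i < j]
print(len(outcomes), len(A))  # 20 outcomes, |A| = 10, so P(A) = 1/2
```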
EXERCISES 1.2
The last problem requires the principle of mathematical induction, which
states: If P(n) is a statement concerning the positive integer n that satisfies (i)
P(1) is true and (ii) P(n + 1) is true whenever P(n) is true, then P(n) is true
for all integers n ≥ 1.
1. Consider an experiment in which a coin is flipped four times in
   succession. If A is the collection of outcomes having two heads,
   determine P(A).

2. If a coin is flipped n times in succession, what is the relationship
   between |Ω| and n?

3. Consider four distinguishable coins (e.g., a penny, a nickel, a dime, and
   a quarter). If the four coins are tossed simultaneously and A consists of
   all outcomes having two heads, determine P(A).

4. If four coins of like kind are tossed simultaneously and A consists of all
   outcomes having three heads, determine P(A).

5. A simultaneous toss of two indistinguishable dice results in a
   configuration (if the positions of the two dice are interchanged, a new
   configuration is not obtained). What is the total number of
   configurations?

6. Use the principle of mathematical induction to prove that

       1 + 2 + 3 + ··· + n = n(n + 1)/2

   and

       1^2 + 2^2 + 3^2 + ··· + n^2 = n(n + 1)(2n + 1)/6

   for every integer n ≥ 1.
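Exercise 6 asks for induction proofs of the standard sums 1 + 2 + ··· + n = n(n + 1)/2 and 1^2 + 2^2 + ··· + n^2 = n(n + 1)(2n + 1)/6. A numerical spot check (an editorial sketch, and no substitute for the induction proof) is immediate:

```python
# Check both closed forms for n = 1, ..., 100.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(k * k for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
print("both identities hold for n = 1, ..., 100")
```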
1.3 COUNTING
In the previous section, outcomes were given labels that were words made up
using a specified alphabet. This procedure is just a special case of more general
schemes.
If a_1, ..., a_n are n distinct objects, A will denote the collection consisting
of these objects, written A = {a_1, ..., a_n}. If B = {b_1, ..., b_m} is a second
collection, we can form a new collection, denoted by A × B, which consists of
all ordered pairs (a_i, b_j) with i = 1, ..., n and j = 1, ..., m. Since we can
form a rectangular array with n rows and m columns in which the element of
the ith row and jth column is (a_i, b_j), the total number of ordered pairs in the
array is n × m. Therefore,

    |A × B| = n × m.    (1.5)
More generally, let A_1, ..., A_r be r collections having n_1, ..., n_r members,
respectively. We can then form the collection A_1 × ··· × A_r of ordered r-tuples
(a_1, ..., a_r) where each a_i belongs to A_i, i = 1, ..., r. In this case,

    |A_1 × ··· × A_r| = n_1 × n_2 × ··· × n_r.    (1.6)

The proof of this result requires a mathematical induction argument. The
essential step in the argument is as follows, if we agree that the ordered r-tuple
(a_1, ..., a_r) is the same as the ordered pair ((a_1, ..., a_(r-1)), a_r). By the
induction argument, the number of ordered (r - 1)-tuples (a_1, ..., a_(r-1)) is
n_1 × ··· × n_(r-1), and it follows from Equation 1.5 that the number of ordered
r-tuples is (n_1 × ··· × n_(r-1)) × n_r.

In the particular case that all of the A_1, ..., A_r are the same collection A
and |A| = n, the number of ordered r-tuples (a_1, ..., a_r) where each a_i
belongs to A is given by

    |A × ··· × A| = n^r    (r factors).    (1.7)
Ordered r-tuples of the type just described have another name in probability
theory. The collection A is called a population and the ordered r-tuple
(a_1, ..., a_r) is called an ordered sample of size r with replacement from a
population A of size n. Such an ordered sample can be thought of as being
formed by successively selecting r elements from A with each element being
returned to A before the next element is chosen.
Theorem 1.3.1
The number of ordered samples of size r with replacement from a population of
size n is nr.
EXAMPLE 1.3 Suppose a die is rolled three times in succession. The
outcome can be regarded as an ordered sample of size 3 with replacement from
the population A = {1, 2, 3, 4, 5, 6}. In this case n = 6 and r = 3, and so the
total number of outcomes is 6^3 = 216. ■
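Theorem 1.3.1 and Example 1.3 can be checked by direct enumeration (an editorial Python sketch):

```python
from itertools import product

# Ordered samples of size r = 3 with replacement from A = {1, ..., 6}:
# three successive rolls of a die.
A = range(1, 7)
samples = list(product(A, repeat=3))
print(len(samples))  # n^r = 6**3 = 216
```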
EXAMPLE 1.4 Two dice are thrown 24 times in succession. The outcome
can be described as an ordered sample of size 24 from a population of size 36
with replacement. The total number of outcomes is 36^24. ■
If in forming an ordered sample from a population A we choose not to
return an element to A, then we obtain an ordered sample of size r without
replacement from a population A of size n.
Theorem 1.3.2
The number of ordered samples of size r without replacement from a population
of size n is n(n - 1) × ··· × (n - r + 1).
Again a mathematical induction argument is needed to prove this result.
Simply put, there are n choices for the first member of the sample, n - 1
choices for the second, and upon making the rth choice there are only
n - (r - 1) = n - r + 1 choices for the rth member of the sample, and so the
total number of choices is n(n - 1) × ··· × (n - r + 1). Because the latter
product arises quite frequently, it is convenient to introduce a symbol for it;
namely,

    (n)_r = n(n - 1) × ··· × (n - r + 1).    (1.8)

Note that

    (n)_n = n(n - 1) × ··· × 2 × 1 = n!
EXAMPLE 1.5 The game of solitaire is played with a deck of 52 cards.
The game commences with 28 cards placed on a table in a prescribed order
as drawn from the deck and constitutes an ordered sample of size 28 without
replacement from a population of size 52. The number of such samples is
(52)_28 = 52 × 51 × ··· × 25. ■
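The falling factorial (n)_r can be computed directly and, in Python 3.8+, agrees with the standard library's math.perm (an editorial sketch):

```python
import math

def falling_factorial(n, r):
    """(n)_r = n(n-1) x ... x (n-r+1): the number of ordered samples
    of size r without replacement from a population of size n."""
    result = 1
    for k in range(r):
        result *= n - k
    return result

# Example 1.5: the opening deal in solitaire, (52)_28 = 52 x 51 x ... x 25.
print(falling_factorial(52, 28) == math.perm(52, 28))  # True
```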
In some cases, order is irrelevant. For example, it is not necessary to hold
the cards of a poker hand in the order in which they are dealt from a deck. An
unordered sample of size r from a population A of size n is just a subpopulation
of A having r members. C(n, r) will denote the number of such unordered
samples. Such a sample is also called a combination of n things taken r at a time.
Theorem 1.3.3
The number of unordered samples of size r from a population of size n is

    C(n, r) = (n)_r / r! = n(n - 1) × ··· × (n - r + 1) / r!
A convincing argument can be made as follows. The number of ordered
samples of size r without replacement from the population of size n is (n)r.
Each such ordered sample can be obtained by first selecting an unordered sample
of size r from the population, which can be done in C(n, r) ways, and then
taking an ordered sample of size r without replacement from the subpopulation
of size r. Since the latter can be done in r! ways, (n)_r = C(n, r) × r!.
EXAMPLE 1.6 Suppose a poker hand of 5 cards is dealt from a well-shuffled
deck of 52 cards. Since a poker hand can be regarded as unordered, the total
number of poker hands is C(52, 5) = 2,598,960. ■
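The relation (n)_r = C(n, r) × r! and the poker-hand count of Example 1.6 can be confirmed with the standard library (an editorial sketch; math.comb and math.perm require Python 3.8+):

```python
import math
from itertools import combinations

# Unordered samples (combinations): the C(52, 5) poker hands.
print(math.comb(52, 5))  # 2,598,960

# (n)_r = C(n, r) x r! for a small case, by direct enumeration.
n, r = 6, 3
unordered = len(list(combinations(range(n), r)))
assert math.perm(n, r) == unordered * math.factorial(r)
```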
Note that

    n(n - 1) × ··· × (n - r + 1)/r! = n!/(r!(n - r)!)

since n! = (n(n - 1) × ··· × (n - r + 1)) × (n - r)!. As a matter of notational
convenience, 0! is equal to 1 by definition and C(n, 0) = 1 if calculated
formally using the last displayed equation. Another commonly used notation
for C(n, r) is ( n r ), read “n over r.” The two will be used interchangeably. It is
implicit in the above definition of C(n, r) that 0 ≤ r ≤ n. Again as a matter of
notational convenience, we will put C(n, r) = ( n r ) = 0 if r < 0 or if r > n.
The ( n r ) are called binomial coefficients because of their association with the
binomial theorem:

    (a + b)^n = Σ_{k=0}^n C(n, k) a^k b^(n-k).    (1.9)
This theorem can be used to derive useful relationships connecting the
coefficients. For example, putting a = b = 1,

    2^n = Σ_{k=0}^n C(n, k) = C(n, 0) + C(n, 1) + ··· + C(n, n).    (1.10)
EXAMPLE 1.7 If n is a positive integer with n ≥ 2, then

    C(n, 0) - C(n, 1) + C(n, 2) - ··· ± C(n, n) = 0,

where the coefficient of C(n, n) is + or - as n is even or odd, respectively. This
can be seen as follows. Taking b = 1 in Equation 1.9,

    (a + 1)^n = Σ_{k=0}^n C(n, k) a^k.

Taking a = -1,

    0 = (-1 + 1)^n = Σ_{k=0}^n (-1)^k C(n, k). ■
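Equation 1.10 and the alternating-sum identity of Example 1.7 are easy to confirm numerically (an editorial sketch):

```python
import math

for n in range(2, 30):
    coeffs = [math.comb(n, k) for k in range(n + 1)]
    # Putting a = b = 1 in the binomial theorem: coefficients sum to 2^n.
    assert sum(coeffs) == 2 ** n
    # Putting a = -1, b = 1: the alternating sum vanishes.
    assert sum((-1) ** k * c for k, c in enumerate(coeffs)) == 0
print("both identities hold for n = 2, ..., 29")
```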
EXAMPLE 1.8 If m and n are positive integers and t is any real number,
then

    (1 + t)^(m+n) = (1 + t)^m (1 + t)^n.

Applying the binomial theorem three times,

    Σ_{k=0}^{m+n} C(m + n, k) t^k = (Σ_{i=0}^m C(m, i) t^i)(Σ_{j=0}^n C(n, j) t^j)
                                  = Σ_{i=0}^m Σ_{j=0}^n C(m, i) C(n, j) t^(i+j).

Collecting terms with a factor of t^k,

    Σ_{k=0}^{m+n} C(m + n, k) t^k = Σ_{k=0}^{m+n} (Σ_{i=0}^k C(m, i) C(n, k - i)) t^k.

Equating corresponding coefficients of t^k, it follows that

    C(m + n, k) = Σ_{i=0}^k C(m, i) C(n, k - i). ■    (1.11)
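The identity derived in Example 1.8 (Vandermonde's identity) can be verified over a range of small cases (an editorial sketch):

```python
import math

def convolution_side(m, n, k):
    """Right side of Example 1.8's identity: sum of C(m, i) C(n, k - i)."""
    return sum(math.comb(m, i) * math.comb(n, k - i) for i in range(k + 1))

for m in range(1, 8):
    for n in range(1, 8):
        for k in range(m + n + 1):
            assert math.comb(m + n, k) == convolution_side(m, n, k)
print("C(m + n, k) = sum_i C(m, i) C(n, k - i) verified for m, n < 8")
```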
Returning to Equation 1.8, note that the right side of the equation makes
sense if n is any real number. For any real number x and any positive integer
r, we define

    (x)_r = x(x - 1) × ··· × (x - r + 1)

and

    C(x, r) = (x)_r / r! = x(x - 1) × ··· × (x - r + 1) / r!.

We also define (x)_0 = 1 and C(x, 0) = 1; for any negative integer r, we define
C(x, r) = 0.
EXAMPLE 1.9 If r is a nonnegative integer, then C(-1, r) = (-1)^r since

    C(-1, r) = (-1)(-2) × ··· × (-1 - r + 1) / r! = (-1)^r r! / r! = (-1)^r. ■
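The extended definition of C(x, r), and Example 1.9's computation C(-1, r) = (-1)^r, can be checked with exact rational arithmetic (an editorial sketch):

```python
from fractions import Fraction

def gen_binom(x, r):
    """C(x, r) = x(x-1) x ... x (x-r+1) / r! for rational x;
    C(x, 0) = 1, and C(x, r) = 0 for negative integer r."""
    if r < 0:
        return Fraction(0)
    result = Fraction(1)
    for k in range(r):
        result = result * (Fraction(x) - k) / (k + 1)
    return result

# Example 1.9: C(-1, r) = (-1)^r for every nonnegative integer r.
for r in range(10):
    assert gen_binom(-1, r) == (-1) ** r
print("C(-1, r) = (-1)^r checked for r = 0, ..., 9")
```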
There are many more equations relating to binomial coefficients. The reader
interested in pursuing the subject further should consult the books by Feller
and Tucker (see the Supplementary Reading List at the end of the chapter).
It was tacitly assumed in the preceding discussions that the elements of
a sample are distinguishable. But there are probability models in physics in
which some elementary particles behave as though they are indistinguishable.
A general scheme for dealing with such particles can be described as follows.
Consider r indistinguishable balls and n distinguishable boxes numbered 1,
2, ..., n. If the balls are distributed among the boxes in some way, the result is
called a configuration. If a ball in Box 1 is interchanged with a ball in Box 2,
the configuration does not change. The total number of configurations can be
calculated using the following device. The label * | * | * * * | ** | * signifies
that there are a total of five boxes with one ball in Box 1, one ball in Box 2, three
balls in Box 3, two balls in Box 4, and one ball in Box 5, which is to the right of
the last vertical bar. In general, there are (n — 1) + r = n + r — 1 symbols in
the label because the number of vertical bars is one less than the number n of
boxes. A label is completely specified if r of the n + r — 1 positions are selected
to be filled by asterisks; i.e., by selecting a subpopulation of size r. The total
number of ways of selecting such subpopulations is C(n + r — 1, r).
Theorem 1.3.4
The total number of ways of distributing r indistinguishable balls into n boxes is
C(n + r − 1, r).
EXAMPLE 1.10 Suppose two dice, indistinguishable to the naked eye, are
tossed. To determine the total number of possible configurations, consider
boxes numbered 1,2,..., 6 and consider the dice as balls that are placed into
the boxes. In this case, n = 6 and r = 2, so that the total number of
configurations is C(6 + 2 − 1, 2) = C(7, 2) = 21. ■
Dice do not behave as though they are indistinguishable; in fact, two dice
behave as though there are 36 outcomes with each having the same probability.
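Theorem 1.3.4 can be cross-checked by brute-force enumeration for small n and r; a Python sketch (the helper name `configurations` is ours):

```python
from math import comb
from itertools import combinations_with_replacement

def configurations(n, r):
    """Number of ways to put r indistinguishable balls into n boxes."""
    return comb(n + r - 1, r)

# Example 1.10: two indistinguishable dice are two balls in six boxes.
assert configurations(6, 2) == 21

# Cross-check by listing the multisets of box labels directly.
assert len(list(combinations_with_replacement(range(6), 2))) == 21
```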
EXERCISES 1.3
The reader should review Maclaurin and Taylor series expansions before
starting on these problems.
1. If m and n are positive integers with 0 ≤ n ≤ m, show that
C(m, n) = C(m, m − n) by (a) expressing both sides in terms of factorials
and (b) interpreting each side as the number of ways of selecting a
subpopulation.
2.
If n and r are positive integers with 1 ≤ r ≤ n, show that C(n, r) =
C(n − 1, r) + C(n − 1, r − 1). (This equation validates the triangular
array

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1

commonly called “Pascal’s Triangle” in Western cultures.)
3.
If n is a positive integer, show that the Maclaurin series expansion of
the function f(t) = (1 + t)^n is

f(t) = Σ_{k=0}^{n} C(n, k) t^k.
How does this result relate to the binomial theorem?
4. If a is any real number, show that the Maclaurin series expansion of the
function f(t) = (1 + t)^a is

(1 + t)^a = Σ_{k=0}^{∞} C(a, k) t^k

(which is valid for |t| < 1).
5.
If n is a positive integer, use the binomial expansion of (1 + t)^n to show
that

n 2^{n−1} = Σ_{k=1}^{n} k C(n, k) = C(n, 1) + 2 C(n, 2) + ⋯ + n C(n, n).
6. If n is a positive integer with n ≥ 2, show that

n(n − 1) 2^{n−2} = 2 · 1 C(n, 2) + 3 · 2 C(n, 3) + ⋯ + n(n − 1) C(n, n).
7. A die is tossed n times in succession. What is the probability that a 1
will not appear?
8. A coin is flipped 2n times in succession. What is the probability that
the number of heads and tails will be equal?
9. A rectangular box in 3-space is subdivided into 2n congruent rectangular
boxes numbered 1, 2, ..., 2n. (a) If n indistinguishable particles
are distributed in the 2n boxes, what is the total number of configurations?
(b) If all configurations have the same probability of occurrence,
what is the probability that boxes numbered 1, 2, ..., n will be empty?
10. If x > 0 and k is a nonnegative integer, show that
The remaining problems are too tedious to do manually. Mathematical
software such as Mathematica or Maple V is appropriate.
11. A coin is flipped 20 times in succession. Find the probability, accurate
to six decimal places, that the number of heads and tails will be equal.
12. Suppose 20 dice, indistinguishable to the naked eye, are tossed. What is
the total number of possible configurations?
EQUALLY LIKELY CASE
This section will address only probability models of the type described in the
previous section. For such models, calculating probabilities consists of two
steps: counting the total number of outcomes and counting the number of
outcomes in a given collection. In doing so it is important either to make a
complete list of all outcomes or to give a precise mathematical description of
all such outcomes.
Consider an experiment in which two dice, one red and one white, are tossed
simultaneously. The outcome of the experiment is a complicated picture that
can be recorded only partially by a camera. It is not necessary to go that far,
however, since we are interested only in the number of pips showing on the
two dice, and that information can be summarized by creating a name or label
for it; e.g., the ordered pair (i, j), 1 ≤ i, j ≤ 6, can be used as a label for the
outcome in which the red die shows i and the white die shows j. Each such
outcome is an ordered sample of size 2 with replacement from a population of
size 6. The total number of such outcomes is 62 = 36. The collection of all 36
labels is shown in Figure 1.2.
In tossing two dice, we are not usually interested in the number of pips
on each die but rather in the sum of the two numbers; i.e., the score. For
example, consider the score of 4. We can identify the outcomes with a score
of 4 as those in the third diagonal from the upper left corner in Figure 1.2;
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

FIGURE 1.2 Outcomes for two dice.
namely, (1,3), (2,2), and (3,1). If A is the collection of these outcomes, then
P(A) = 3/36 = 1/12. In general, ifx is one of the scores 2, 3,..., 12 andp(x)
denotes the probability of the collection of outcomes having the score x, then
p(x) can be calculated in the same way. The results are shown in Figure 1.3:
FIGURE 1.3 Scores for two dice.
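The probabilities collected in Figure 1.3 can be recomputed by enumerating the 36 equally likely outcomes; a short Python check (code is ours):

```python
from fractions import Fraction
from itertools import product

# p(x) = (number of outcomes (i, j) with i + j = x) / 36.
p = {x: Fraction(sum(1 for i, j in product(range(1, 7), repeat=2) if i + j == x), 36)
     for x in range(2, 13)}

assert p[4] == Fraction(1, 12)   # outcomes (1,3), (2,2), (3,1)
assert p[7] == Fraction(1, 6)
assert sum(p.values()) == 1      # the weights account for every outcome
```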
Coin flipping is an experiment that most people have performed. Consider
an experiment in which a coin is flipped n times in succession. An outcome of
this experiment can be labeled by an n -letter word using an alphabet made up
of T and H (or 0 and 1). For example, TTHTHH is the label for an outcome of
an experiment of flipping a coin six times in succession with tails occurring on
the first, second, and fourth flips and heads appearing on the remaining flips.
If n is large it is impractical to make a list of all outcomes, but we can count
the total number of outcomes because an outcome is an ordered sample of size
n with replacement from a population {T,H} of size 2. Thus, the total number
of outcomes is 2^n by Theorem 1.3.1.
EXAMPLE 1.11 Suppose a coin is flipped 10 times in succession. The
total number of outcomes is 210 = 1024. What are the chances that there will
be three heads in the outcome? To answer this question, let A be the collection
of outcomes having three heads. Since each outcome has the same probability
assigned to it, we need only count the number of outcomes having three heads.
A label for an outcome consists of 10 letter positions that are filled by H’s or
T’s. There are C(10,3) ways of selecting three positions to be filled with H’s
and the remaining seven positions with T’s. Thus,

P(A) = C(10, 3) / 2^10 = 15/128. ■
Notice in the wording of the question posed in this example that “three
heads” is used rather than the “exactly three heads” that is commonly used in
elementary algebra books. Three means exactly three; the prefix “exactly” is
redundant.
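The count in Example 1.11 can be verified both by formula and by listing all 1024 labels; a Python sketch (our code, not the book's):

```python
from fractions import Fraction
from itertools import product
from math import comb

# P(A) = C(10, 3) / 2^10.
p = Fraction(comb(10, 3), 2 ** 10)
assert p == Fraction(15, 128)

# Cross-check by enumerating every ten-letter word over {H, T}.
count = sum(1 for w in product("HT", repeat=10) if w.count("H") == 3)
assert count == comb(10, 3)
```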
In both the two-dice and the coin-flipping experiments, the outcomes
were regarded as ordered samples with replacement. The following example
illustrates counting ordered samples without replacement.
EXAMPLE 1.12 (The Birthday Problem) Consider a class of 30 students.
Each student has a birthday that can be any one of the days numbered
1,2,..., 365. Assume that the 30 birthdays of the students constitute an
ordered sample of size 30 with replacement from a population of size 365 and
that all outcomes have the same chance of occurring. What are the chances
that no two of them will have the same birthday? Let A be the collection of
outcomes for which there are no repetitions of birthdays; i.e., A is an ordered
sample of size 30 without replacement from a population of size 365. Thus,
|A| = (365)_30 and

P(A) = (365)_30 / 365^30,
which is equal to .29 rounded to two decimal places. Thus, it is unlikely that
no two will have the same birthday. ■
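The birthday computation is a one-liner with exact arithmetic; a Python sketch (our code):

```python
from fractions import Fraction
from math import prod

# P(A) = (365)_30 / 365^30: all 30 birthdays are distinct.
p = Fraction(prod(365 - i for i in range(30)), 365 ** 30)
assert round(float(p), 2) == 0.29   # matches the value in the text
```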
A poker hand can be considered an ordered sample without replacement or
an unordered sample as far as calculating probabilities is concerned, provided
there is total adherence to whichever of the two is adopted. It is customary
to consider poker hands as unordered samples. In counting outcomes, it is
important to not introduce order.
EXAMPLE 1.13 Suppose a poker hand of 5 cards is dealt from a well-shuffled
deck of 52 playing cards. What is the probability of getting a royal
flush; i.e., 10, J, Q, K, A of the same suit? Regarding a poker hand as an unordered
sample of size 5 from a population of size 52, the total number of outcomes is
C(52, 5). Let A be the collection of outcomes that are royal flushes. We can form
a royal flush in the following way. We first select a suit from among the four
suits, which can be done in four ways; having selected the suit, the royal flush
is then completely determined. Thus, P(A) = 4/C(52, 5) ≈ .0000015. ■
A common mistake is to introduce order into the following example where
there should be none.
EXAMPLE 1.14 Consider a poker hand as described in the previous
example. What is the probability of getting two pairs; i.e., a hand of the type
{x, x, y, y, z} where x, y, and z are distinct face values? There are 13 face values.
We first choose a subpopulation of size 2, which can be done in C(13, 2) ways,
to specify the face value for each of the pairs. We then choose the face value
for the singleton card from among the remaining 11 face values, which can
be done in 11 ways. All face values have now been selected. Since there are
four cards having the face value of the singleton, there are four choices for the
singleton. We now go to the lower of the face values of the two selected for
the pairs. Since there are four cards having that face value, we select a
subpopulation of size 2, which can be done in C(4, 2) ways. Having done this we
now select a subpopulation of size 2 from the four cards having the other face
value for a pair, which also can be done in C(4, 2) ways. If A is the collection of
outcomes that have two pairs, then

P(A) = [C(13, 2) × 11 × 4 × C(4, 2) × C(4, 2)] / C(52, 5),

which is approximately 1/20. ■
It might appear that order was introduced into this calculation when we
chose to look first at the pair with the lower face value, but the order was
already there once the two face values were chosen. In choosing the face values
for the two pairs, it would have been incorrect to say that this could be done in
13 × 12 ways, because this would regard “a pair of jacks and a pair of kings” as
different from “a pair of kings and a pair of jacks”.
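The two-pair count of Example 1.14 is easy to reproduce; a Python sketch (our code):

```python
from fractions import Fraction
from math import comb

# Choose the two pair values, the singleton value and its card,
# then two cards from each of the chosen pair values.
favorable = comb(13, 2) * 11 * 4 * comb(4, 2) * comb(4, 2)
p = Fraction(favorable, comb(52, 5))

assert favorable == 123552
assert 1 / 22 < float(p) < 1 / 20   # "approximately 1/20"
```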
An unordered sample of size r from a population of size n is called a random
sample if each sample has the same probability 1/C(n, r) of occurring. A poker
hand is a random sample of size 5 from a population of size 52.
We will conclude this section on counting by looking at a commonly used
sampling model. Consider a population consisting of n_1 Type 1 individuals
and n_2 Type 2 individuals. The population size is then n = n_1 + n_2. Suppose a
random sample of size r is selected from the population. Since the population
contains individuals of both types, we can ask for the probability that the
random sample will contain k Type 1 individuals where 0 ≤ k ≤ r. Of
course, k cannot exceed the number of Type 1 individuals, so we must also have
k ≤ n_1; i.e., 0 ≤ k ≤ min{r, n_1}. Let A be the collection of samples having
k individuals of Type 1. Then

P(A) = C(n_1, k) C(n_2, r − k) / C(n, r)

for 0 ≤ k ≤ min{r, n_1}.
EXAMPLE 1.15 On a given day, a machine produces 100 items. Assuming
that 10 of the items are defective, what is the probability that a random sample
of size 5 from the output will contain 3 defective items? Let A be the collection
of samples having 3 defective items. Then

P(A) = C(10, 3) C(90, 2) / C(100, 5) ≈ .0064. ■
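This sampling formula is simple to implement and to sanity-check; a Python sketch (the helper name `hyper` is ours):

```python
from fractions import Fraction
from math import comb

def hyper(n1, n2, r, k):
    """P(k Type 1 individuals in a random sample of size r)."""
    return Fraction(comb(n1, k) * comb(n2, r - k), comb(n1 + n2, r))

# Example 1.15: 10 defectives among 100 items, sample of 5, exactly 3 defective.
p = hyper(10, 90, 5, 3)
assert p == Fraction(comb(10, 3) * comb(90, 2), comb(100, 5))

# The probabilities over all feasible k sum to 1, as they must.
assert sum(hyper(10, 90, 5, k) for k in range(6)) == 1
```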
More generally, suppose a population of size n contains n_1 individuals of
Type 1, n_2 individuals of Type 2, ..., n_k individuals of Type k. If a random
sample of size r is taken from the population, what is the probability that the
random sample will contain r_1 Type 1 individuals, r_2 Type 2 individuals, ...,
r_k Type k individuals? Let A be the collection of such samples. Then

P(A) = [C(n_1, r_1) × ⋯ × C(n_k, r_k)] / C(n, r),

where r = r_1 + r_2 + ⋯ + r_k.
EXAMPLE 1.16 If a bridge hand of 13 cards is dealt from a well-shuffled
deck of 52 playing cards, what is the probability that the hand will contain
three hearts, five diamonds, two spades, and three clubs? Since the hand is a
random sample of size 13 from a population of size 52, the total number of
outcomes is C(52, 13). Let A be the collection of samples as described. Then

P(A) = C(13, 3) C(13, 5) C(13, 2) C(13, 3) / C(52, 13). ■
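The bridge-hand probability of Example 1.16 evaluates numerically as follows; a Python sketch (our code, with the rounded value ours as well):

```python
from fractions import Fraction
from math import comb

# 3 hearts, 5 diamonds, 2 spades, 3 clubs in a random 13-card hand.
p = Fraction(comb(13, 3) * comb(13, 5) * comb(13, 2) * comb(13, 3),
             comb(52, 13))
assert round(float(p), 3) == 0.013
```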
EXERCISES 1.4
In sampling problems, the student should first decide whether the sample is
unordered or ordered and, in the latter case, whether with replacement or
without replacement.
1.
Instead of the usual dice, consider two (regular) tetrahedral dice with
faces bearing 1,2,3,4 pips. If the two tetrahedral dice are rolled simulta­
neously, find the probability p(x) that the total score will be x where x
can be one of the integers 2,..., 8.
2. If three tetrahedral dice are rolled simultaneously, find the probability
p(x) that the total score will be x where x can be one of the integers
3, ..., 12.
3.
If three cubical dice are rolled simultaneously, find the probability
p(x) that the total score will be x where x can be one of the integers
3,4,...,18.
4. If you purchase a single ticket for a lottery in which a random sample
of size 6 is selected from the population {1, 2,..., 54}, what is the
probability that you hold the winning ticket?
5. In some state lotteries, a winning ticket must have six numbers between
1 and 48 listed in the same order as the numbers were successively
drawn at random without replacement. What is the probability that the
purchaser of a single ticket will hold the winning ticket?
6.
If 1000 raffle tickets are sold, of which 50 are winning tickets and you
purchase 10 tickets, what is the probability that you will have 2 winning
tickets?
7.
In a group of four people, what is the probability that no two will have
the same birth month?
8.
If a poker hand of 5 cards is dealt from a well-shuffled deck of 52
playing cards, what is the probability of getting a full house (i.e., 3 cards
with the same face value and 2 cards with the same face value)?
9.
If a poker hand of 5 cards is dealt from a well-shuffled deck of 52
playing cards, what is the probability of getting a straight flush (i.e., 5
cards in sequence in the same suit with the ace counting as a 1 or as the
highest card)?
10. In a fish-tagging survey, 100 bass are netted, tagged, and released. After
waiting long enough for the tagged fish to disperse, a second sample of
100 bass is taken, of which 5 are observed to be tagged. If the number
of bass in the lake is n, what is the probability that a random sample of
size 100 will contain 5 tagged fish? If you were asked to estimate the
number of bass in the lake, what would you estimate?
OTHER MODELS
In the early stages of probability theory, a controversy arose between M. de
Roberval and Blaise Pascal over the assignment of equal probabilities to
18
1
CLASSICAL PROBABILITY
outcomes. The basic issue of the controversy can be described as follows.
Suppose a coin is flipped until a head appears with a maximum of two flips. It
was argued by M. de Roberval that the outcomes H,TH,TT are equally likely
and each should be assigned probability 1/3; Pascal, however, reasoned that
they should be assigned the probabilities 1/2,1/4, and 1/4, respectively, on the
grounds that the coin could be flipped twice and the result of the second flip
simply ignored after getting a head on the first flip; thus, the two outcomes HH
and HT would have probabilities adding to 1/2.
Whether or not outcomes should be assigned equal probabilities in the case
of simple games of chance depends on what one calls an outcome. Consider,
for example, rolling two dice. If we declare the score obtained an outcome,
then the possible outcomes are 2,3,..., 12, and we have previously seen that
these outcomes should not be assigned equal probabilities but rather those
given in Figure 1.3. This suggests that we should have available a more general
model.
Consider an experiment with a finite number of outcomes ω_1, ω_2, ..., ω_n
and let Ω = {ω_1, ω_2, ..., ω_n}. For each i = 1, 2, ..., n, let p(ω_i) be a weight
associated with ω_i satisfying

(i) 0 ≤ p(ω_i) ≤ 1.
(ii) Σ_{i=1}^{n} p(ω_i) = 1.

The weight p(ω_i) will be called the probability of ω_i. If A = {ω_{i_1}, ..., ω_{i_k}} is
a collection of outcomes, we define

P(A) = Σ_{j=1}^{k} p(ω_{i_j});
i.e., P(A) is the sum of the weights of the outcomes in A. It can be
seen that Equation 1.1 is satisfied as follows. If A = {ω_{i_1}, ..., ω_{i_k}} and
B = {ω_{j_1}, ..., ω_{j_m}} have no outcomes in common, then A ∪ B consists of the
outcomes ω_{i_1}, ..., ω_{i_k}, ω_{j_1}, ..., ω_{j_m} and P(A ∪ B) is the sum of the weights
associated with the latter outcomes. Thus,

P(A ∪ B) = p(ω_{i_1}) + ⋯ + p(ω_{i_k}) + p(ω_{j_1}) + ⋯ + p(ω_{j_m})
= [p(ω_{i_1}) + ⋯ + p(ω_{i_k})] + [p(ω_{j_1}) + ⋯ + p(ω_{j_m})]
= P(A) + P(B).
Similarly, Equations 1.2 and 1.3 are satisfied.
EXAMPLE 1.17 Consider an experiment for which the outcome can be
described by a four-letter word using the alphabet 0, 1. If ω is such an outcome,
a weight p(ω) can be associated with ω by forming a product in which each
1 in ω is replaced by 1/3 and each 0 by 2/3. For example, if ω = 1110, then
p(ω) = 1/3 · 1/3 · 1/3 · 2/3 = (1/3)^3 (2/3)^1. Note that the exponent of 1/3 is
just the sum of the digits in ω and the exponent of 2/3 is 4 minus the sum of
the digits in ω. There are 16 outcomes and 16 associated weights. It is tedious
to do so, but the 16 outcomes and weights can be listed and the sum of the
weights shown to be 1. ■
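The tedious listing can be delegated to a machine; a Python sketch that enumerates the 16 words and their weights (our code):

```python
from fractions import Fraction
from itertools import product

def weight(w):
    """Each 1 contributes a factor 1/3, each 0 a factor 2/3."""
    return Fraction(1, 3) ** sum(w) * Fraction(2, 3) ** (len(w) - sum(w))

words = list(product((0, 1), repeat=4))
assert len(words) == 16
assert weight((1, 1, 1, 0)) == Fraction(1, 3) ** 3 * Fraction(2, 3)
assert sum(weight(w) for w in words) == 1   # the weights sum to 1
```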
EXAMPLE 1.18 (n Bernoulli Trials) Fix 0 < p < 1. Let q = 1 − p and
let n be any positive integer. Let Ω be the collection of all words of length n
using the alphabet {0, 1}. We can think of 0 and 1 as an encoding of failure
and success or tail and head, respectively, in n repetitions of a basic experiment
in which the probability of success is p and the probability of failure is q. If
ω = {x_j}_{j=1}^{n} is an element of Ω, we associate with ω the weight

p(ω) = p^{Σ_{j=1}^{n} x_j} q^{n − Σ_{j=1}^{n} x_j}.

Clearly, p(ω) ≥ 0 for each ω ∈ Ω. We need only verify that the sum of
all the weights is 1. Each outcome ω = {x_j}_{j=1}^{n} such that Σ_{j=1}^{n} x_j = k
has associated weight p^k q^{n−k}. The number of such outcomes is equal to the
number of ways of selecting k of the n letter positions to be filled with 1’s,
which is C(n, k). Thus, the sum of the weights of outcomes with Σ_{j=1}^{n} x_j = k
is C(n, k) p^k q^{n−k}. If we now add the sums of these weights for k = 0, 1, ..., n,
we obtain by the binomial theorem

Σ_{k=0}^{n} C(n, k) p^k q^{n−k} = (p + q)^n = 1.

This model goes by the name n Bernoulli trials. We have just seen that if we let
A_k be the collection of outcomes having k successes, then

P(A_k) = C(n, k) p^k q^{n−k}. ■
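The verification that the Bernoulli weights sum to 1 can be mirrored exactly in code; a Python sketch with p = 1/3 and n = 10 (parameter choices ours):

```python
from fractions import Fraction
from math import comb

p, q, n = Fraction(1, 3), Fraction(2, 3), 10

# P(A_k) = C(n, k) p^k q^(n-k); summed over k = 0, ..., n this is (p + q)^n = 1.
P = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
assert sum(P) == 1
```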
EXAMPLE 1.19 A distributor plans to use an optical character recognition
scanner to transfer the contents of a catalog of parts to a computer. Each
part has a 12-digit part number and a 13-digit stock number. The probability
that the scanner will misread a digit depends upon the digit being scanned; e.g.,
it is more likely that an 8 will be misread as a 3 than as a 1. Assuming that the
maximum probability that a digit will be misread is .01, what is the probability
that the part number and stock number will be recorded without error? We can
view this experiment as a succession of 25 trials in which success is interpreted
to mean that a digit is read correctly and failure is interpreted to mean that a
digit is misread. Assuming that probabilities are assigned in accordance with
the Bernoulli model with p = .99, let A be the collection of outcomes having
no misreads. A consists of a single outcome with probability

P(A) = (.99)^25 ≈ .78. ■
EXAMPLE 1.20 Consider an experiment in which a coin is flipped until
a head appears for the first time with a maximum of five flips. The outcomes
can be labeled H, TH, TTH, TTTH, TTTTH, TTTTT with weights p(H) =
1/2, p(TH) = 1/4, p(TTH) = 1/8, p(TTTH) = 1/16, p(TTTTH) =
1/32, and p(TTTTT) = 1/32. ■

The weights attached to the outcomes in this example were constructed using
Pascal’s line of reasoning.
The previous example suggests a coin-flipping experiment in which a coin
is flipped until a head appears for the first time, at which time the experiment
terminates. The outcomes of this experiment can be described as an infinite
sequence of labels H, TH, TTH, TTTH, .... Does this list describe all outcomes?
What about the possibility that a head never appears? We can include
or not include an unending label TTT... for this possibility as we choose. We
will not include such a label on the grounds that in any instance in which this
experiment has been performed, the experiment terminates in a finite number
of steps. By analogy with the previous example, we can assign weights as
follows: p(H) = 1/2, p(TH) = 1/4, p(TTH) = 1/8, .... Note that even if
we included the outcome with label TTT..., there would be no weight left for
it because

p(H) + p(TH) + p(TTH) + ⋯ = Σ_{n=1}^{∞} (1/2)^n

and the sum of this geometric series is 1.
This model suggests an even more general model. Consider an experiment
for which there is an infinite sequence of outcomes ω_1, ω_2, .... Let Ω =
{ω_1, ω_2, ...}, and for each i ≥ 1 let p(ω_i) be a weight associated with ω_i
satisfying

(i) 0 ≤ p(ω_i) ≤ 1.
(ii) Σ_{i=1}^{∞} p(ω_i) = 1.

If A = {ω_{i_1}, ..., ω_{i_m}} is any finite subcollection of outcomes, we define

P(A) = Σ_{k=1}^{m} p(ω_{i_k});
if A = {ω_{i_1}, ω_{i_2}, ...} is an infinite sequence of outcomes, we define

P(A) = Σ_{k=1}^{∞} p(ω_{i_k}).
Rather than making the distinction between finite sums and infinite sums as in
the last two equations, we usually just write P(A) = Σ_k p(ω_{i_k}), the range of k
being clear from the description of A. Again, Equations 1.1, 1.2, and 1.3 are
satisfied.
EXAMPLE 1.21 A pair of dice are rolled until a score of 6 appears for
the first time, at which time the experiment is terminated. A typical outcome
can be labeled by the word * * ⋯ * 6 where * represents a score other than 6;
if there are n asterisks preceding the 6 with n ≥ 0, the weight or probability
associated with the outcome is

p(* * ⋯ * 6) = (31/36)^n (5/36).

Note that the weights are nonnegative, and since the weights constitute the
terms of a convergent geometric series,

Σ_{n=0}^{∞} (31/36)^n (5/36) = 1. ■
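This geometric-series check can also be done numerically; a Python sketch summing a long partial sum exactly (the cutoff at 200 terms is our choice):

```python
from fractions import Fraction

# Partial sums of sum_{n>=0} (31/36)^n (5/36) approach 1; the tail after
# N terms is exactly (31/36)^N.
s = sum(Fraction(31, 36) ** n * Fraction(5, 36) for n in range(200))
assert 1 - s < Fraction(1, 10 ** 6)
```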
The last model we will describe involves the concept of conditional
probability. Consider two collections of outcomes A and B associated with an
experiment. Before performing the experiment, we have some notion of what
P(A) should be. Instead of performing the experiment and observing the
outcome, an impartial observer views the outcome and relates only partial
information to us; namely, that the outcome is in B. Quite often in a situation
like this, we would adjust our estimate of the chance that the outcome is in A.
For example, suppose the experiment consists of selecting a person at random
from a given population consisting of men and women in equal numbers.
Before performing the experiment, the probability that the selected person is
a man is 1/2. But if the experiment is performed and an impartial observer
tells us only that the selected person is color-blind, then we would adjust our
estimate of the probability that the person is a man to be much higher, because
color blindness is much more prevalent in men than in women.
To see how probabilities should be changed in the light of partial informa­
tion, we go back to the empirical law. Suppose the experiment in question
is repeated N times. Since the impartial observer conveys information to us
when the outcome is in B, we can ignore all repetitions for which the outcome
is not in B. Let A ∩ B denote the collection of outcomes that are in both A
and B. The number of repetitions for which the outcome is in B is N(B), and
among these N(A ∩ B) are also in A. The relative frequency of occurrence
of outcomes in A among those in B is N(A ∩ B)/N(B). This ratio should
stabilize near the new probability when N is large. Thus,

N(A ∩ B)/N(B) = [N(A ∩ B)/N] / [N(B)/N] ≈ P(A ∩ B)/P(B).
Of course, P(B) must be positive for the quotient to be defined. This new
probability is called the conditional probability of A given B and is denoted by
P(A | B). We therefore define
P(A | B) = P(A ∩ B)/P(B).   (1.12)

Note that

P(A ∩ B) = P(A | B) P(B).   (1.13)
EXAMPLE 1.22 Two dice are rolled and we are informed that the score is
6. What is the probability that there is a 3 on each die? Let A be the collection
of outcomes for which there is a 3 on each die and let B be the collection of
outcomes for which the score is 6. Then P(A | B) = P(A ∩ B)/P(B) =
(1/36)/(5/36) = 1/5. ■
Note that the conditional probability can be viewed in the following way.
As soon as we are told that the score is 6, we are dealing with the population
{(5,1), (4, 2),..., (1, 5)}. Since there are only five outcomes in this new
population, the probability of the outcome (3, 3) is 1/5.
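The reduced-population view of Example 1.22 is easy to verify by enumeration; a Python sketch (our code):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
B = [w for w in outcomes if sum(w) == 6]      # the score is 6
A_and_B = [w for w in B if w == (3, 3)]       # a 3 on each die

# In the equally likely case, P(A | B) = |A ∩ B| / |B|.
assert Fraction(len(A_and_B), len(B)) == Fraction(1, 5)
```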
There are probability models for which the probability mechanism is not
specified by giving the probability of each outcome but rather by a mixture of
such probabilities and conditional probabilities.
EXAMPLE 1.23 Suppose a bowl contains 10 red chips and 5 white chips.
An experiment consists of selecting a chip at random from the bowl. If the
drawn chip is red, it and 5 other red chips are returned to the bowl; if the chip is
white, it is discarded. A second chip is then selected at random from the bowl.
What is the probability that both chips will be red? This model is not described
in such a way that the probability of each outcome is known; it is described in
terms of probabilities of some outcomes and conditional probabilities. Let B
be the collection of outcomes for which the first chip selected is red and let A
be the collection for which the second chip is red. Then P(A | B) = 3/4 and
P(B) = 2/3 so that
P(B ∩ A) = P(A | B)P(B) = (3/4)(2/3) = 1/2

by Equation 1.13. ■
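The two probabilities in Example 1.23 follow directly from the bowl counts; a Python sketch (our code):

```python
from fractions import Fraction

# First draw: 10 red, 5 white. A red draw is returned with 5 extra red chips,
# leaving 15 red and 5 white for the second draw.
P_B = Fraction(10, 15)           # first chip red
P_A_given_B = Fraction(15, 20)   # second chip red, given the first was red

assert P_A_given_B * P_B == Fraction(1, 2)   # Equation 1.13
```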
EXERCISES 1.5
The reader should review infinite series, sums of infinite series, and infinite
geometric series before doing the following exercises.
1. Determine the sum of the series Σ_{n=0}^{∞} (1/2)^{3n}.
2. Determine the sum of the series Σ_{n=4}^{∞} (1/4)^{2n}.
3. Suppose a pair of dice are rolled until a score of 7 appears for
the first time, whereupon the experiment ends. An outcome with
n scores different from 7 followed by a 7 is assigned probability
(5/6)^n (1/6), n ≥ 0. What is the probability that the experiment will
terminate on an odd number of rolls of the dice?
4. If a pair of dice are rolled, what is the probability that the score will be
greater than or equal to 8?
5. Suppose a coin is flipped 10 times in succession. For i = 1, 2, ..., 10
let A_i be the collection of outcomes for which there is a head on the ith
flip. Calculate P(A_1), P(A_2), P(A_1 ∩ A_2), and P(A_2 | A_1). How are the
first three probabilities related? How are the second and fourth related?
How do these numbers change if the 10 is replaced by 20?
6. In the notation of Problem 5, calculate P(A_j | A_i) for 1 ≤ i < j ≤ n.
7. Bowl 1 contains 10 red chips and 5 white chips. Bowl 2 contains 10
red chips and 10 white chips. A chip is selected at random from Bowl
1, transferred to Bowl 2, and then a second chip is selected at random
from Bowl 2. What is the probability that both chips will be red?
8. A pair of dice, one red and one white, are rolled. Let A be the collection
of outcomes for which the number of pips on the red die is less than or
equal to 2 and let B be the collection of outcomes for which the number
of pips on the white die is greater than or equal to 4. Calculate P(A | B).
What does this say about the partial information “the number of pips
on the white die is greater than or equal to 4”?
9. A man has n keys of which one will open his lock and the others will
not. If he tries the keys randomly one at a time, what is the probability
that the lock will be opened on the rth try where 1 ≤ r ≤ n?
10. Consider an experiment in which the outcomes are the positive integers
1, 2, .... For each k ≥ 1, let

p(k) = 1/(k(k + 1)) = 1/k − 1/(k + 1).

Can the p(k) serve as weights for a probability model?
SUPPLEMENTAL READING LIST
1. F. N. David (1962). Games, Gods, and Gambling. New York: Hafner Publishing Co.
2. W. Feller (1957). An Introduction to Probability Theory and Its Applications, 2nd ed. New York: Wiley.
3. Oystein Ore (1953). Cardano, The Gambling Scholar. Princeton, N.J.: Princeton University Press.
4. I. Todhunter (1965). A History of the Mathematical Theory of Probability. New York: Chelsea.
5. A. Tucker (1984). Applied Combinatorics. New York: Wiley.
AXIOMS OF PROBABILITY
INTRODUCTION
Rules for calculating probabilities associated with simple games of chance were
developed in the works of P. R. de Montmort (1678-1719) and A. de Moivre
(1667-1754). These rules also began to be applied in mortality tables and life
insurance calculations as early as the late seventeenth century. Most of the
effort during this period was concentrated on specific problems dealing with
combinations. But eventually problems required more than combinatorial
methods, and powerful tools had to be developed for their solutions.
Terms such as “gain” and “duration of play” were commonly used during
this period and evolved into an abstract concept known as a “chance variable,”
much like “momentum” in mechanics. In any particular application, a chance
variable was defined in some natural way, not as a mathematical entity but
rather by its properties.
The publication of Foundations of the Theory of Probability by A. N.
Kolmogorov in 1933 marked the beginning of a rapid development of
probability theory and its application to diverse fields, particularly during and
immediately after World War II.
The reader interested in alternatives to the axiomatic probability model
discussed in this chapter should read the book by Hamming listed in the
Supplemental Reading List.
The content of this chapter is rather abstract. A real appreciation of
probability theory cannot be gained without some firsthand experience with
a random device. Experiment with flipping a coin many times—you will be
surprised by some of the facets of randomness.
SET THEORY
A typical exercise in probability theory involving two dice will start out “Let A
be the event ‘the score is 11.’ ” For our purposes, this layman’s description of A
is an abbreviated form of “Let A be the collection of outcomes ω with score 11,”
and consequently A is a subcollection of all possible outcomes. Later, we will
define an event to be a subcollection of the collection of all possible outcomes.
We saw in Chapter 1 that probability theory pertains to collections of
outcomes. Such collections are called sets. One starting point for developing
mathematics is the set N of natural numbers 1,2,..., which is denoted by
N = {1, 2, 3,...} and eventually leads to the set R of real numbers. Algebraic
and order properties of real numbers will be taken for granted.
A primitive notion of set theory is that of membership. We write x ∈ X if
x is a member or element of the set X. If x is not a member of X, we write
x ∉ X. If X and Y are two sets, we write X ⊂ Y if x ∈ X implies x ∈ Y
and say that X is contained in or is a subset of Y. We say that two sets X and
Y are equal, written X = Y, if X ⊂ Y and Y ⊂ X. It sometimes happens in
manipulating sets that we end up with something that has no members. As a
matter of notational convenience, we use ∅ to signify a set that has no elements
and call ∅ the empty set.
We need a procedure for specifying sets. To obtain one, let p(x) be a
sentence containing a variable x. Then {x : p(x)} will denote the set of objects
x for which p(x) is true. For example, consider the sentence "x = 1 or x = 2 or x = 3." Then {x : p(x)} consists of the natural numbers 1, 2, and 3. This set is usually written {1, 2, 3} for brevity.
EXAMPLE 2.1 Let p(x) be the sentence "x ∈ ℕ and x² ≤ 40." Then {x : p(x)} = {1, 2, 3, 4, 5, 6}. ■
EXAMPLE 2.2 If a, b ∈ ℝ with a < b, then we have the usual definitions of closed, open, and semiclosed intervals:

[a, b] = {x : x ∈ ℝ and a ≤ x ≤ b}
(a, b) = {x : x ∈ ℝ and a < x < b}
[a, b) = {x : x ∈ ℝ and a ≤ x < b}
(a, b] = {x : x ∈ ℝ and a < x ≤ b}.

Infinite intervals are defined similarly; e.g.,

[a, +∞) = {x : x ∈ ℝ and x ≥ a}. ■
For the remainder of this section, we will assume that we are dealing with a
universe U. All objects under consideration will be members of U, and all sets
will be subcollections of U.
Given X ⊂ U, the complement of X (relative to U), denoted by Xᶜ, is defined by

Xᶜ = {x : x ∉ X}.

The set specified on the right should contain "x ∈ U" as part of its description, but this part is customarily omitted when it is understood that we are dealing with a fixed universe. It is easy to see that

∅ᶜ = U
Uᶜ = ∅.
If X and Y are two subsets of U, the union of X and Y, denoted by X ∪ Y, is defined by

X ∪ Y = {x : x ∈ X or x ∈ Y};

the intersection of X and Y, denoted by X ∩ Y, is defined by

X ∩ Y = {x : x ∈ X and x ∈ Y}.

If X ∩ Y = ∅, we say that X and Y are mutually exclusive or disjoint.
These concepts can be illustrated as follows. For U take the points inside a rectangle in a plane, and for a subset A of U take the points within and on a simple closed curve (e.g., a circle). If this is done for subsets X, Y, Z, ... of U, the resulting picture is called a Venn diagram. The operations on sets defined above can be depicted as in Figure 2.1. Venn diagrams can be helpful for understanding set operations.

FIGURE 2.1 Venn diagrams illustrating set operations (shaded regions: X ∪ Y and Xᶜ).
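These operations have direct analogues in many programming languages. As a small illustration, the following sketch uses Python's built-in set type, with a hypothetical ten-element universe U playing the role of the rectangle in the Venn diagram.

```python
U = set(range(10))           # a small universe
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}

print(X | Y)                 # union X ∪ Y
print(X & Y)                 # intersection X ∩ Y
print(U - X)                 # complement of X relative to U
print(X.isdisjoint({7, 8}))  # True: X ∩ {7, 8} = ∅
```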
If X₁, X₂, ..., Xₙ is a finite sequence of sets, their union is denoted and defined by

⋃_{i=1}^n Xᵢ = {x : x ∈ Xᵢ for some i = 1, ..., n}

and their intersection by

⋂_{i=1}^n Xᵢ = {x : x ∈ Xᵢ for all i = 1, ..., n}.

Similarly, if X₁, X₂, ... is an infinite sequence of sets, then the union and intersection of the sets are defined by

⋃_{i=1}^∞ Xᵢ = {x : x ∈ Xᵢ for some i ≥ 1}

and

⋂_{i=1}^∞ Xᵢ = {x : x ∈ Xᵢ for all i ≥ 1},

respectively. As was the case with sums, rather than making the distinction between finite unions (intersections) and infinite unions (intersections), we usually just write ⋃Xᵢ (⋂Xᵢ) if the range on i is easily ascertained.
The following example requires the use of the Archimedean property of the real numbers, which states that if r is a real number, then there is a positive integer n such that n > r.
EXAMPLE 2.3 For each n ≥ 1, let Aₙ = [0, 1/n). Since 0 ∈ Aₙ for all n ≥ 1, 0 ∈ ⋂Aₙ. Clearly, ⋂Aₙ cannot contain any negative numbers. But what about positive numbers? Assume x > 0. By the Archimedean property, there is a positive integer m such that m > 1/x. It follows that x > 1/m, so that x ∉ Aₘ, and thus x ∉ ⋂Aₙ. It follows that ⋂Aₙ = {0}. ■
The union, intersection, and complement operations on sets are subject to
algebraic laws that in some cases, but not all, are the same as the algebraic laws
for real numbers. Corresponding to the addition and multiplication of real
numbers we have commutative laws:
X ∪ Y = Y ∪ X
X ∩ Y = Y ∩ X.
A proof of the commutative law for union requires that two things be proved; namely, that X ∪ Y ⊂ Y ∪ X and Y ∪ X ⊂ X ∪ Y. Consider the first relation.
FIGURE 2.2 X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z).
Suppose x ∈ X ∪ Y. Then x ∈ X or x ∈ Y; but this statement is the same as x ∈ Y or x ∈ X, and so x ∈ Y ∪ X. Thus, x ∈ X ∪ Y implies x ∈ Y ∪ X. At a crucial point in this argument, there is a claim that the statement "x ∈ X or x ∈ Y" is equivalent to the statement "x ∈ Y or x ∈ X." To justify this claim, we could move on to formal "truth tables," but we will not. The equivalence of the two statements is taken for granted as something from logic.
If X, Y, and Z are three sets, there are associative laws:
X ∪ (Y ∪ Z) = (X ∪ Y) ∪ Z
X ∩ (Y ∩ Z) = (X ∩ Y) ∩ Z.

The associative laws permit us to omit the parentheses altogether since they can be reinserted in any manner; e.g., A ∪ B ∪ C ∪ D = ((A ∪ B) ∪ C) ∪ D = A ∪ (B ∪ (C ∪ D)).
There are also distributive laws:
X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z).

A convincing, but not rigorous, argument that the first distributive law is true can be made by examining a Venn diagram for X ∩ (Y ∪ Z) as in Figure 2.2. The two top shaded regions represent X ∩ Y and X ∩ Z. Their union is the lower shaded region X ∩ (Y ∪ Z).
The effect of complementation on unions and intersections is the subject of de Morgan's laws:

(X ∪ Y)ᶜ = Xᶜ ∩ Yᶜ
(X ∩ Y)ᶜ = Xᶜ ∪ Yᶜ.

There are also more general distributive laws:

X ∩ (⋃Yₙ) = ⋃(X ∩ Yₙ)
X ∪ (⋂Yₙ) = ⋂(X ∪ Yₙ)

and more general de Morgan's laws:

(⋃Xₙ)ᶜ = ⋂Xₙᶜ
(⋂Xₙ)ᶜ = ⋃Xₙᶜ.
The following special relations hold for all X ⊂ U:

X ∩ ∅ = ∅
X ∪ U = U
X ∪ Xᶜ = U
X ∪ ∅ = X
X ∩ U = X
X ∩ Xᶜ = ∅.
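Identities like these can be spot-checked mechanically. The sketch below (Python; the universe and the number of trials are our own choices) tests a distributive law and a de Morgan law on randomly generated subsets. Such checks illustrate the laws, though of course they do not prove them.

```python
import random

random.seed(1)
U = set(range(10))

def random_subset(universe):
    """Each element of the universe is included with probability 1/2."""
    return {x for x in universe if random.random() < 0.5}

for _ in range(100):
    X, Y, Z = (random_subset(U) for _ in range(3))
    # distributive law: X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
    assert X & (Y | Z) == (X & Y) | (X & Z)
    # de Morgan: (X ∪ Y)^c = X^c ∩ Y^c, complements taken relative to U
    assert U - (X | Y) == (U - X) & (U - Y)
print("all checks passed")
```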
Venn diagrams must be recognized for what they are—doodles. Equations
relating sets cannot be proved using Venn diagrams. Such proofs require
repeated applications of the laws defined above. Venn diagrams can be used
legitimately to prove negative results, however.
EXAMPLE 2.4 Consider the equation X ∩ Y ∩ Z = X ∩ Y ∩ (Y ∪ Z). Is this equation true for all subsets X, Y, Z of a given U? The answer is no if we can construct a U and X, Y, Z for which the equation is not true. By constructing a Venn diagram for three sets, labeling the parts 1, 2, ..., 8 as in Figure 2.3, and defining U = {1, 2, ..., 8}, X = {1, 4, 5, 7}, Y = {2, 5, 6, 7}, and Z = {3, 4, 6, 7}, we obtain X ∩ Y ∩ Z = {7} ≠ {5, 7} = X ∩ Y ∩ (Y ∪ Z). We thus have a specific example of X, Y, and Z for which the above equation is not true, and therefore the equation is not always true. ■
FIGURE 2.3 Counterexample.
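The counterexample can be checked directly by encoding the eight regions of Figure 2.3 as the numbers 1 through 8, as in the following sketch.

```python
X = {1, 4, 5, 7}
Y = {2, 5, 6, 7}
Z = {3, 4, 6, 7}

left = X & Y & Z         # X ∩ Y ∩ Z
right = X & Y & (Y | Z)  # X ∩ Y ∩ (Y ∪ Z)
print(left, right)       # {7} versus {5, 7}
assert left != right     # the equation fails for these sets
```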
Care must be taken when going beyond the relations listed above. For example, if X ∪ Y = X ∪ Z, there may be a temptation to conclude that Y = Z because the analogous result is true in arithmetic. But the conclusion would not be valid. For example, let U = {1, 2, 3, 4}, X = {1, 2, 4}, Y = {2, 3}, and Z = {2, 3, 4}. Then Y ≠ Z, but X ∪ Y = {1, 2, 3, 4} = X ∪ Z.
EXERCISES 2.2
1. Which of the following statements are correct? (a) 2 ∈ {1, 2, 3}, (b) 2 ⊂ {1, 2, 3}, (c) {2} ∈ {1, 2, 3}, (d) {2} ⊂ {1, 2, 3}.
2. Consider a universe U consisting of ordered pairs (i, j), 1 ≤ i, j ≤ 6, where i represents the number of pips on a red die and j the number on a white die. Express the lay statement "the number of pips on the red die is greater than the number of pips on the white die" as a proposition concerning elements of U, and identify the set A specified by the proposition.
3. If Aₙ = [0, 2^(1/n)), n ≥ 1, determine ⋂Aₙ.
4. If Aₙ = {(x, y) : x ∈ ℝ, y ∈ ℝ, 0 ≤ y ≤ xⁿ, 0 ≤ x < 1}, n ≥ 1, determine ⋂Aₙ.
5. If Aₙ = {(x, y) : x ∈ ℝ, y ∈ ℝ, 0 ≤ y < xⁿ, 0 ≤ x < 1}, n ≥ 1, determine ⋂Aₙ.
6. If A is any subset of the universe U, show that (Aᶜ)ᶜ = A.
7. If A and B are any two sets, show that A ⊂ B if and only if Bᶜ ⊂ Aᶜ.
8. Is it true that X ∩ (Y ∪ Z) = (X ∩ Y) ∪ Z for all subsets X, Y, and Z of the universe U? If not, give an example to show that the equation is not true in general.
9. Prove that (X ∪ Y) ∩ (X ∩ Y)ᶜ = (X ∩ Yᶜ) ∪ (Xᶜ ∩ Y) for all subsets X, Y of the universe U.
10. If Aₙ = [0, |sin(nπ/2)|], n ≥ 1, determine ⋃_{n=1}^∞ (⋂_{k=n}^∞ A_k) and ⋂_{n=1}^∞ (⋃_{k=n}^∞ A_k).
2.3 COUNTABLE SETS
Let A and B be two nonempty sets and consider A × B, the collection of all ordered pairs (x, y) with x ∈ A, y ∈ B. A function or mapping f from A to B is a subset f ⊂ A × B with the property that

(x, y) ∈ f and (x, z) ∈ f implies y = z.  (2.1)

The domain of f is the set

{x : x ∈ A and (x, y) ∈ f for some y ∈ B}.
32
2
AXIOMS OF PROBABILITY
We will assume that A is chosen so that A is the domain of f. The range of f is the set

{y : y ∈ B and (x, y) ∈ f for some x ∈ A}.

If (x, y) ∈ f, then y is written f(x) in the usual calculus terminology, so that f consists of all pairs (x, f(x)) as x ranges over A. All of the above is condensed into a single symbol f : A → B.

EXAMPLE 2.5 Let A = B = ℝ and consider

f = {(x, y) : x ∈ ℝ, y ∈ ℝ, −1 ≤ x ≤ 1, y ≥ 0, x² + y² = 1}.

Then f is a semicircle in the xy-plane. In the usual notation, f(x) = √(1 − x²) with domain {x : x ∈ ℝ, −1 ≤ x ≤ 1} and range {y : y ∈ ℝ, 0 ≤ y ≤ 1}. ■

EXAMPLE 2.6 Let A = B = ℝ and consider

g = {(x, y) : x ∈ ℝ, y ∈ ℝ, −1 ≤ x ≤ 1, x² + y² = 1}.

In this case, for each x with −1 < x < 1, there are two values g(x) = ±√(1 − x²) such that (x, g(x)) ∈ g. Therefore, g is not a function or mapping because 2.1 is not satisfied. ■
Finite and infinite sequences are specific examples of mappings. When we speak of a finite sequence of real numbers a₁, ..., aₙ, we are dealing with a collection of ordered pairs (k, aₖ) where k ∈ {1, 2, ..., n} and aₖ ∈ ℝ. If we let a be the collection of such pairs, then a : {1, 2, ..., n} → ℝ. Similarly, an infinite sequence of real numbers is a mapping a : ℕ → ℝ; if we put aₖ = a(k), the usual notation for a is then a = {aₖ}_{k=1}^∞. Care must be taken to distinguish between the terms of an infinite sequence and the range of the sequence. For example, if a = {(−1)ᵏ}_{k=1}^∞, then −1, +1, −1, ... are the terms of the sequence, but the range is the set {−1, +1}.
We can use these concepts to make precise the commonly used term "finite." A set X is finite if for some n ∈ ℕ and some set B it is the range of a mapping a : {1, 2, ..., n} → B. A set is infinite if it is not finite.

The set X is countable if it is the range of an infinite sequence; i.e., the range of a mapping a : ℕ → B for some B containing X. We can always replace B by X. By definition, the empty set ∅ is countable. Finite sets are countable because if X is the range of the finite sequence {aₖ}_{k=1}^n, then it is the range of the infinite sequence {aₖ}_{k=1}^∞ where aₖ = aₙ for all k > n. The set of natural numbers ℕ = {1, 2, ...} is countable because ℕ is the range of the map I : ℕ → ℕ where I(n) = n, n ≥ 1. The set of even positive integers
FIGURE 2.4 Countable union.
{2, 4, 6, ...} is countable because it is the range of the map a : ℕ → ℕ with a(n) = 2n, n ≥ 1. The set of negative integers {..., −2, −1} is countable because it is the range of the map a : ℕ → ℝ with a(n) = −n, n ≥ 1.
A countable set can be finite or infinite. If we wish to exclude the finite case,
we say that X is countably infinite if X is infinite and countable.
The union of two countable sets is again countable; in fact, the union of finitely many countable sets is again countable. The proof of the following theorem is more palatable if it is looked upon as a programming problem: for each n ≥ 1 an algorithm is given for calculating the kth term of a sequence {x_{n,k}}_{k=1}^∞, and we would like to define a single algorithm for listing all of the x_{n,k}.
Theorem 2.3.1
The union of a countable collection of countable sets is countable.
PROOF: We will assume that the collection is countably infinite. In this case, the collection is the range of a sequence {X_j}_{j=1}^∞ of countable sets. We can assume that each of the X_j is the range of an infinite sequence, by repeating one of its elements infinitely many times if this is not the case. Letting X = ⋃_{j=1}^∞ X_j, we must show that there is a mapping a : ℕ → X having X as its range. For j ≥ 1, let X_j = {a_{j,k}}_{k=1}^∞. The terms of X_j appear in the jth row of the array shown in Figure 2.4. An informal argument can be made for arranging the elements of this array as a sequence by following the path indicated in Figure 2.4. A map a : ℕ → X can be constructed using the same idea but following a diagonal from lower left to upper right, dropping down to the next diagonal, following the next diagonal from lower left to upper right, and so forth. We will illustrate the construction by using the identity 1 + 2 + ⋯ + n = n(n + 1)/2 to identify the element in the array corresponding to a(100). Note that

1 + 2 + ⋯ + 13 = (13 · 14)/2 = 91,
which is the total number of elements in the array located in the first 13
diagonals. Starting at a_{14,1}, if we move 9 positions along the 14th diagonal, we arrive at the element in the array designated by a(100); it is easy to calculate that a(100) = a_{6,9}. ■
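The diagonal bookkeeping in this proof can be carried out by a short routine. The sketch below (Python; the function name is ours) finds, for a step number m, the indices (j, k) of the array element a_{j,k} visited at step m when each diagonal is followed from lower left to upper right; it confirms that step 100 lands on a_{6,9}.

```python
def diagonal_index(m):
    """Indices (j, k) of the array element a_{j,k} visited at step m (m >= 1),
    following diagonals from lower left to upper right."""
    d = 1
    while d * (d + 1) // 2 < m:    # find the diagonal containing step m
        d += 1
    offset = m - (d - 1) * d // 2  # 1-based position within the dth diagonal
    return d - offset + 1, offset  # row j, column k

print(diagonal_index(100))  # → (6, 9), i.e., a(100) = a_{6,9}
```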
EXAMPLE 2.7 The set of integers ℤ = {..., −2, −1, 0, 1, 2, ...} is countable. This is true since ℤ is the union of the countable sets {..., −2, −1}, {0}, and {1, 2, ...}. ■
EXAMPLE 2.8 If q ∈ ℕ, let Z_q = {..., −2/q, −1/q, 0/q, 1/q, ...}. Then Z_q is countable since Z_q is the union of three sets {..., −2/q, −1/q}, {0/q}, and {1/q, 2/q, ...}, each of which is easily seen to be countable. ■
EXAMPLE 2.9 The set ℚ of rational numbers p/q, where q ∈ ℕ and p ∈ ℤ, is countable. This follows from the fact that ℚ = ⋃_{q=1}^∞ Z_q, that each Z_q is countable, and from Theorem 2.3.1. ■
Theorem 2.3.2
If Y is countable and X ⊂ Y, then X is countable.
PROOF: We can assume that X ≠ ∅, because otherwise X is countable by definition. Since Y is countable, there is a mapping a : ℕ → Y with Y the range of the map. Let x₀ be a fixed element of X. Define β : ℕ → X by putting

β(n) = a(n) if a(n) ∈ X
β(n) = x₀ if a(n) ∈ Y ∩ Xᶜ

for n ≥ 1. The range of β is then X, and consequently X is countable. ■
The set ℝ of real numbers is not countable. In view of the previous theorem, it suffices to show that [0, 1) is not countable. This is done by using a method known as Cantor's diagonalization procedure. Each x in [0, 1) has a decimal representation x = .d₁d₂⋯ where dᵢ ∈ {0, 1, 2, ..., 9}, i ≥ 1. But the representation is not unique. For example, 1/2 = .500⋯ = .499⋯. We will achieve uniqueness when this happens by using the representation that has all zeros beyond some point. Assume that [0, 1) is countable. Then [0, 1) = {x₁, x₂, ...}. Suppose xᵢ has the unique decimal representation xᵢ = .d_{i1}d_{i2}⋯, i ≥ 1, and consider the array

.d₁₁ d₁₂ d₁₃ ⋯
.d₂₁ d₂₂ d₂₃ ⋯
.d₃₁ d₃₂ d₃₃ ⋯
⋮
Consider the diagonal starting at d₁₁. For each j ≥ 1, choose e_j different from d_{jj}, 0, and 9. Then y = .e₁e₂⋯ represents a real number in [0, 1) that is different from each xᵢ. But we assumed that the decimal representation of every real number in [0, 1) appears in the above array, and we have a contradiction. Our assumption that [0, 1) is countable therefore fails, and [0, 1) is not countable.
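The diagonalization step itself is easy to mechanize for any finite list of decimal digit strings. In the sketch below (Python; the sample rows are hypothetical), each digit e_j is chosen different from d_{jj}, 0, and 9, so the resulting expansion differs from every row.

```python
def diagonal_escape(digit_rows):
    """Given rows of decimal digits d_i1 d_i2 ..., build digits e_j with
    e_j != d_jj, e_j != 0, e_j != 9, so .e1 e2 ... differs from every row."""
    escape = []
    for j, row in enumerate(digit_rows):
        e = next(d for d in range(1, 9) if d != row[j])  # excludes 0 and 9
        escape.append(e)
    return escape

rows = [
    [5, 0, 0, 0],  # .5000... = 1/2
    [3, 3, 3, 3],  # .3333... = 1/3
    [1, 4, 1, 5],
    [9, 9, 9, 8],
]
e = diagonal_escape(rows)
print(e)  # differs from row j in the jth digit, for every j
assert all(e[j] != rows[j][j] for j in range(len(rows)))
```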
EXERCISES 2.3
The last problem requires the use of the well-ordering property of the natural numbers ℕ, which states that if A ⊂ ℕ and A ≠ ∅, then A has a least element.
1. If f = {(x, y) : x ∈ ℝ, y ∈ ℝ, y = x², −1 ≤ x ≤ 1}, determine its domain and range.
2. If in the customary notation of the calculus f(x) = √(1 − x⁴), describe f as a subset of ℝ × ℝ and determine its domain and range.
3. If in the customary notation of the calculus f(x) = 1/√(1 − x²), describe f as a subset of ℝ × ℝ and determine its domain and range.
4. If q ∈ ℕ and X = {p/q : p ∈ ℕ}, show that X is countable.
5. Let X₁, X₂, ..., Xₘ be a finite sequence of countably infinite sets. Show that X = X₁ ∪ ⋯ ∪ Xₘ is countable.
6. Show that the set X of all infinite sequences of 0's and 1's is uncountable.
7. Show that ℕ × ℕ = {(m, n) : m ∈ ℕ, n ∈ ℕ} is countable by considering the collection of finite sets A_k = {(m, n) : m + n = k}.
8. Let A and B be countable sets. Show that A × B is countable.
9. Which of the following sets are countable?
(a) The set of circles in the plane having centers with rational coordinates and rational radii.
(b) The set of all polynomials P(x) = aₙxⁿ + ⋯ + a₁x + a₀ having integer coefficients.
(c) The set of all intervals (a, b) ⊂ ℝ having rational endpoints.
10. Let X₁, X₂, ... be an infinite sequence of countably infinite sets. Show that X = ⋃_{n=1}^∞ Xₙ is countable.
2.4 AXIOMS
If A and B are disjoint collections of outcomes, we have seen that

P(A ∪ B) = P(A) + P(B).

More generally, if A₁, ..., Aₙ are disjoint, it follows from the empirical law that

P(⋃_{j=1}^n A_j) = Σ_{j=1}^n P(A_j).  (2.4)
If the total number of outcomes is finite, there is no more to be said in regard to Equation 2.4. But what if {A_j} is an infinite sequence of disjoint collections? An experiment with an infinite number of outcomes was discussed in Chapter 1; namely, flipping a coin until a head appears for the first time. In this case, it is possible to have an infinite sequence {A_j} of disjoint collections of outcomes, and so it makes sense to ask if

P(⋃_{j=1}^∞ A_j) = Σ_{j=1}^∞ P(A_j).  (2.5)
This is a moot question for all the examples with finitely many outcomes
and has an affirmative answer for the single model just described. Since
Equation 2.5 is compatible with every example we have considered, can we
assume that Equation 2.5, in addition to Equations 1.1, 1.2, and 1.3, is valid in
a general model for probability theory? We can, but we cannot have everything
we would like to have. It turns out that we cannot assume that Equation 2.5 is
valid for all sequences {Ay} of disjoint collections of outcomes and at the same
time assume that P(A) is meaningful for all possible A. We must give up one
of the two assumptions. We will give up the latter, and so P(A) may not be
meaningful for some A.
Let Ω denote the collection of all outcomes for a given experiment. The following definitions are needed to limit the A for which P(A) will be defined.
Definition 2.1
A collection 𝒜 of subsets of Ω is an algebra if
1. A, B ∈ 𝒜 implies A ∪ B ∈ 𝒜.
2. A ∈ 𝒜 implies Aᶜ ∈ 𝒜.
3. Ω ∈ 𝒜. ■

That is, 𝒜 is an algebra of subsets of Ω if it is closed under the operations of union and complementation and Ω ∈ 𝒜. A mathematical induction argument can be used to show that an algebra 𝒜 is closed under finite unions; i.e., if A₁, A₂, ..., Aₙ ∈ 𝒜, then ⋃_{j=1}^n A_j ∈ 𝒜.
The important thing to remember about algebras is that by starting with a
finite number of elements of the algebra and performing a finite number of
union, intersection, and complementation operations on them, the result is
still in the algebra.
EXAMPLE 2.10 Let 𝒜 be an algebra of subsets of Ω and let A₁, A₂, A₃, A₄ be elements of the algebra with some or all having nonempty intersections. If we need to restructure the union ⋃_{i=1}^4 Aᵢ into a union of disjoint sets in 𝒜, we could let B₁ = A₁, B₂ = A₂ ∩ A₁ᶜ, B₃ = A₃ ∩ (A₁ ∪ A₂)ᶜ, and B₄ = A₄ ∩ (A₁ ∪ A₂ ∪ A₃)ᶜ. Then B_j ⊂ A_j, 1 ≤ j ≤ 4, the B_j are disjoint, and ⋃A_j = ⋃B_j. ■
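The same disjointification works for any finite list of sets, and it uses only union, intersection, and complement, so the resulting sets stay inside the algebra. A sketch (Python; set difference a − seen plays the role of intersecting with a complement):

```python
def disjointify(sets):
    """B_1 = A_1, B_j = A_j ∩ (A_1 ∪ ... ∪ A_{j-1})^c; the B_j are disjoint
    and have the same union as the A_j."""
    result, seen = [], set()
    for a in sets:
        result.append(a - seen)  # a ∩ seen^c
        seen |= a
    return result

A = [{1, 2}, {2, 3}, {3, 4}, {1, 4, 5}]
B = disjointify(A)
print(B)  # [{1, 2}, {3}, {4}, {5}]
```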
We need to postulate more if we want to deal with infinite sequences.

Definition 2.2
A collection ℱ of subsets of Ω is a σ-algebra if
1. ℱ is an algebra.
2. If {A_j} is an infinite sequence in ℱ, then ⋃A_j ∈ ℱ. ■

The important thing to remember about σ-algebras is that by starting with a sequence of elements of the σ-algebra and performing countably many union, intersection, and complementation operations on them, the result is still in the σ-algebra.
EXAMPLE 2.11 Let Ω be a finite set of outcomes and let 𝒜 be the collection of all subsets of Ω. Then 𝒜 is an algebra. ■

EXAMPLE 2.12 Let Ω be any set and let ℱ be the collection of all subsets of Ω. Then ℱ is a σ-algebra. Clearly, Ω ∈ ℱ since Ω ⊂ Ω. If {Aₙ} is a finite or infinite sequence in ℱ, then ⋃Aₙ is a subset of Ω and therefore is in ℱ. If A ∈ ℱ, then Aᶜ ⊂ Ω and Aᶜ ∈ ℱ. ■
If 𝒞 is any collection of subsets of a set Ω, then there is a "smallest σ-algebra," denoted by σ(𝒞), that contains 𝒞. In discussing probability, we began with objects ω that were used to form collections A that have now been used to form σ-algebras ℱ. This process has taken us through three hierarchical levels of set theory, and to prove the result just stated would require going to a fourth hierarchical level. This fourth level is left to more advanced texts. For the time being, we have all the concepts needed to describe a general probability model. Ω will be a fixed collection of outcomes.
Definition 2.3
A probability space is a triple (Ω, ℱ, P) where ℱ is a nonempty σ-algebra of subsets of Ω and P is a mapping from ℱ to ℝ satisfying
1. P(Ω) = 1.
2. 0 ≤ P(A) ≤ 1 for all A ∈ ℱ.
3. If {A_j} is a finite or infinite disjoint sequence in ℱ, then P(⋃A_j) = Σ P(A_j). ■

All the simple games of chance described in Chapter 1 for which Ω is finite, ℱ is the collection of all subsets of Ω, and P is defined as described there result in probability spaces (Ω, ℱ, P).
Definition 2.4
If (Ω, ℱ, P) is a probability space, elements of ℱ are called events. ■
We return to a model discussed in Section 1.5.
EXAMPLE 2.13 Let Ω = {ω₁, ω₂, ...} be countably infinite, ℱ the σ-algebra of all subsets of Ω, and p(ω) a weight function as defined in Section 1.5. Define P(A), A ∈ ℱ, as in Section 1.5. Then (Ω, ℱ, P) is a probability space. To show this, we need only verify Item 3 of Definition 2.3. Let {A_j} be a sequence of disjoint events in ℱ and let A = ⋃A_j. Suppose A = {ω_{i₁}, ω_{i₂}, ...}. The fact that the series Σ p(ω_{i_k}) is a convergent series with sum P(A) means that the terms of the series can be rearranged without affecting the sum of the series. This fact about absolutely convergent series is proved or at least discussed in most calculus books. We rearrange the terms of the series so that the terms p(ω_{i_k}) with ω_{i_k} ∈ A₁ come first, then the terms p(ω_{i_k}) with ω_{i_k} ∈ A₂ second, and so on, to obtain

P(A) = Σ_{ω ∈ A₁} p(ω) + Σ_{ω ∈ A₂} p(ω) + ⋯ = P(A₁) + P(A₂) + ⋯.

Therefore,

P(⋃A_j) = Σ P(A_j),

and (Ω, ℱ, P) is a probability space. ■
We have previously encountered the following situation. Suppose a coin is flipped until a head appears for the first time, with a maximum of n flips, where n is a large positive integer. Let A be the event "the experiment terminates on the fifth flip." We can think of this experiment continuing through all n flips of the coin and simply ignoring what happens after the fifth flip. If ω ∈ A, then the first four letters of ω are T, the fifth is H, and there are two choices for each of the remaining n − 5 letters. Thus, |A| = 2^{n−5}, so that P(A) = 2^{n−5}/2ⁿ = 1/2⁵, and it appears that P(A) does not depend upon n at all! This computation is based on Pascal's reasoning, in which we think of the coin as continuing to be flipped beyond the fifth flip and simply ignore everything beyond the fifth flip. Why bother to mention the number n at all? If we eliminate mentioning n, then we are confronted with a conceptual experiment in which a coin is continually flipped. We can, in fact, construct a probability space (Ω, ℱ, P) with Ω consisting of outcomes ω that are words of infinite length
using the alphabet T, H, and probabilities for events such as the A described above are calculated by fixing some large n. We will consider a more general model.

In the following example, Ω will denote the set of all infinite sequences {x_i}_{i=1}^∞ where each xᵢ is a 1 or a 0. We can think of 1 and 0 as an encoding of H and T or S and F, respectively, where S stands for success and F stands for failure. This Ω is uncountable; i.e., not countable. (See Exercise 2.3.6.) Therefore, none of the models we have discussed pertain to Ω. The model depends upon a parameter p, called the probability of success, with 0 < p < 1. The number q = 1 − p is called the probability of failure. Whenever p and q appear, these conditions on p and q will be taken for granted without comment.
EXAMPLE 2.14 (Infinite Sequence of Bernoulli Trials) Fix 0 < p < 1 and let q = 1 − p. Let Ω be the set described above and let ℱ₀ be the collection of subsets A of Ω of the form

A = {ω : ω = {x_i}_{i=1}^∞, x_{i₁} = δ₁, ..., x_{iₙ} = δₙ},  (2.6)

where n is any positive integer, 1 ≤ i₁ < i₂ < ⋯ < iₙ, and each δᵢ is a 0 or a 1. We think of the xᵢ as the results of successive trials. For ℱ we take σ(ℱ₀), the smallest σ-algebra containing ℱ₀. As an illustration of how probabilities are to be computed, consider the event "1 on the second trial, 0 on the fourth trial, and 1 on the eighth trial"; i.e., the event

A = {ω : ω = {x_i}_{i=1}^∞, x₂ = 1, x₄ = 0, x₈ = 1}.

Then

P(A) = p²q = p²(1 − p).

Note that

P(A) = p^{x₂+x₄+x₈} q^{3−(x₂+x₄+x₈)}.

For an event A of the type described in Equation 2.6, its probability is defined to be

P(A) = p^{Σ_{j=1}^n δ_j} q^{n−Σ_{j=1}^n δ_j}.  (2.7)

Note that Σ_{j=1}^n δ_j is the number of 1's in the trials numbered i₁, i₂, ..., iₙ and n − Σ_{j=1}^n δ_j is the number of 0's in the same trials. It cannot be done here, but it is possible to extend the definition of P so that P(A) is defined for all A ∈ ℱ. Any set of outcomes that can be expressed in terms of events placing restrictions on only a finite number of trials will also be an event. Consider the event A described by "a 1 eventually appears in the outcome ω"; i.e.,

A = {ω : ω = {x_j}_{j=1}^∞, Σ_{j=1}^∞ x_j ≥ 1}.
If we let A_j be the event "1 appears for the first time on the jth trial," then

A = ⋃_{j=1}^∞ A_j ∈ ℱ

for the reasons just cited, and the A_j are disjoint. Since P(A_j) = q^{j−1}p and the latter is the general term of a geometric series,

P(A) = Σ_{j=1}^∞ q^{j−1}p = p Σ_{j=1}^∞ q^{j−1} = p/(1 − q) = 1. ■
There is no reason to limit the number of results of each trial to just the 0 and 1 of the preceding example. We can allow the possibility that each trial results in one of k possibilities r₁, ..., r_k with associated weights p₁, p₂, ..., p_k, where 0 ≤ pᵢ ≤ 1, i = 1, ..., k. Suppose n ≥ 1, 1 ≤ i₁ < i₂ < ⋯ < iₙ, and δ₁, ..., δₙ ∈ {r₁, ..., r_k}. For the event

A = {ω : ω = {x_j}_{j=1}^∞, x_{i₁} = δ₁, ..., x_{iₙ} = δₙ},

we can define

P(A) = p₁^{m₁} × p₂^{m₂} × ⋯ × p_k^{m_k},

where mᵢ is the number of trials resulting in rᵢ, 1 ≤ i ≤ k. This model is applicable to an unending sequence of throws of a die where the result of each throw is one of the integers 1, 2, 3, 4, 5, 6 with weight 1/6 associated with each.
EXERCISES 2.4
1. Using mathematical induction, write out a formal proof that an algebra is closed under finite unions; i.e., for every n ≥ 1,

A₁, ..., Aₙ ∈ 𝒜 implies ⋃_{j=1}^n A_j ∈ 𝒜.

2. If 𝒜 is an algebra of subsets of Ω, show that 𝒜 is closed under finite intersections.
3. Let ℱ be a σ-algebra of subsets of Ω. Show that ℱ is closed under countable intersections; i.e., if {A_j} is a finite or infinite sequence in ℱ, then ⋂A_j ∈ ℱ.
4. Let Ω be an uncountable set and let ℱ be the collection of subsets A of Ω such that either A is countable or Aᶜ is countable. Show that ℱ is a σ-algebra.
5. Consider an infinite sequence of Bernoulli trials with probability of success p. What is the probability that a success (or 1) will occur for the first time on an even-numbered trial?
6. An experiment consists of tossing a pair of dice until a score of 8 is
observed for the first time, whereupon the experiment is terminated.
What is the probability that it will terminate on an odd number of
tosses of the dice?
7. A bowl contains w white chips, r red chips, and b black chips. Chips are
successively selected at random from the bowl with replacement. What
is the probability that a white chip will appear before a black chip?
8. If ℱ is a σ-algebra of subsets of Ω and {A_j}_{j=1}^∞ is an increasing sequence in ℱ (i.e., Aₙ ⊂ Aₙ₊₁ for all n ≥ 1), show that there is a disjoint sequence {B_j}_{j=1}^∞ in ℱ such that ⋃_{j=1}^∞ A_j = ⋃_{j=1}^∞ B_j.
9. If ℱ is a σ-algebra of subsets of Ω, {A_j}_{j=1}^∞ is a decreasing sequence in ℱ (i.e., Aₙ₊₁ ⊂ Aₙ for all n ≥ 1), and A = ⋂_{j=1}^∞ A_j, show that there is a decreasing sequence {B_j}_{j=1}^∞ in ℱ such that Aₙ = A ∪ Bₙ, A ∩ Bₙ = ∅ for all n ≥ 1, and ⋂_{j=1}^∞ B_j = ∅.
2.5 PROPERTIES OF PROBABILITY FUNCTIONS
Throughout this section, (Ω, ℱ, P) will be a fixed probability space as described in Definition 2.3. We will now deduce several properties of the probability function P from the axioms listed in Definition 2.3.

Consider two events A, B ∈ ℱ. Since Ω = A ∪ Aᶜ, intersecting both sides of this equation with B we obtain

B = B ∩ Ω = B ∩ (A ∪ Aᶜ) = (B ∩ A) ∪ (B ∩ Aᶜ);

i.e., we can decompose B into two parts according to whether an outcome in B is in A or not in A. Since B ∩ A and B ∩ Aᶜ are disjoint, by Item 3 of Definition 2.3,

P(B) = P(B ∩ A) + P(B ∩ Aᶜ) for all A, B ∈ ℱ.  (2.8)

If we put B = Ω and use Item 1 of Definition 2.3, then 1 = P(Ω) = P(Ω ∩ A) + P(Ω ∩ Aᶜ), so that

P(Aᶜ) = 1 − P(A) for all A ∈ ℱ.  (2.9)

In particular, P(∅) = 1 − P(Ω) = 0.
EXAMPLE 2.15 Consider n flips of a coin and let A be the event "the outcome ω has one or more heads." Calculating P(A) directly is complicated, but calculating P(Aᶜ) is easily done because Aᶜ consists of just one outcome having a label of n T's. Since each outcome has probability 1/2ⁿ, P(A) = 1 − P(Aᶜ) = 1 − (1/2ⁿ). ■
Suppose now that A, B ∈ ℱ with A ⊂ B. Then A ∩ B = A, and so Equation 2.8 becomes P(B) = P(A) + P(B ∩ Aᶜ). Since P(B ∩ Aᶜ) ≥ 0 by Item 2 of Definition 2.3,

P(A) ≤ P(B) whenever A, B ∈ ℱ, A ⊂ B.  (2.10)

If A and B are any two events, then A ∪ B = (A ∩ Bᶜ) ∪ (A ∩ B) ∪ (Aᶜ ∩ B); i.e., A ∪ B can be split into three parts: (1) those outcomes in A but not in B, (2) those outcomes in both A and B, and (3) those outcomes in B but not in A. Thus,

P(A ∪ B) = P(A ∩ Bᶜ) + P(A ∩ B) + P(Aᶜ ∩ B).

Applying Equation 2.8 to the first and third terms on the right and simplifying, we obtain

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).  (2.11)
EXAMPLE 2.16 A card is selected at random from a deck of 52 cards. What is the probability that the card selected will be a king or a spade? Let A be the event "the outcome ω is a king" and let B be the event "the outcome ω is a spade." The required probability is P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/13 + 1/4 − 1/52 = 4/13. ■
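The same answer can be obtained by brute-force enumeration of the 52-card sample space, as in the following sketch (Python; the encoding of the deck is our own).

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]  # 52 equally likely outcomes

favorable = [c for c in deck if c[0] == "K" or c[1] == "spades"]
prob = Fraction(len(favorable), len(deck))
print(prob)  # → 4/13
```

The 16 favorable cards are the 13 spades plus the 3 kings that are not spades, in agreement with the inclusion/exclusion count 4 + 13 − 1.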
If A, B, and C are any three events, then

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

More generally, if A₁, ..., A_N are any events, then

P(A₁ ∪ ⋯ ∪ A_N) = Σ_{i=1}^N P(Aᵢ) − Σ_{1≤i₁<i₂≤N} P(A_{i₁} ∩ A_{i₂})
+ Σ_{1≤i₁<i₂<i₃≤N} P(A_{i₁} ∩ A_{i₂} ∩ A_{i₃}) − ⋯ + (−1)^{N−1} P(A₁ ∩ A₂ ∩ ⋯ ∩ A_N)
= Σ_{r=1}^N (−1)^{r−1} Σ_{1≤i₁<⋯<i_r≤N} P(A_{i₁} ∩ ⋯ ∩ A_{i_r}).  (2.12)

This result goes by the name inclusion/exclusion principle and can be proved using mathematical induction.
Returning to Equation 2.11,

P(A ∪ B) ≤ P(A) + P(B) for all A, B ∈ ℱ

since P(A ∩ B) ≥ 0 by Item 2 of Definition 2.3. This inequality is a special case of a more general inequality whose proof will require the following lemma.
Lemma 2.5.1
If {A_j}_{j=1}^∞ is a sequence of events, then there is a disjoint sequence {B_j}_{j=1}^∞ of events such that B_j ⊂ A_j for all j ≥ 1, ⋃_{j=1}^n A_j = ⋃_{j=1}^n B_j for all n ≥ 1, and ⋃B_j = ⋃A_j.

PROOF: Let B₁ = A₁ and B_j = A_j ∩ (⋃_{i=1}^{j−1} Aᵢ)ᶜ for j ≥ 2. Clearly, B_j ⊂ A_j for all j ≥ 1. For 1 ≤ i ≤ j − 1,

Bᵢ ∩ B_j ⊂ Aᵢ ∩ B_j ⊂ (⋃_{i=1}^{j−1} Aᵢ) ∩ B_j = ∅.

Thus, Bᵢ ∩ B_j = ∅ whenever 1 ≤ i ≤ j − 1, j ≥ 1. This means that the B_j are disjoint. Clearly, ⋃_{j=1}^n B_j ⊂ ⋃_{j=1}^n A_j. Suppose ω ∈ ⋃_{j=1}^n A_j. Then there is a smallest integer k ≤ n such that ω ∈ A_k. Thus, ω ∈ A_k ∩ (⋃_{i=1}^{k−1} Aᵢ)ᶜ = B_k ⊂ ⋃_{j=1}^n B_j, and it follows that ⋃_{j=1}^n A_j ⊂ ⋃_{j=1}^n B_j and therefore that the two are equal. The proof of the last assertion is essentially the same. ■
Theorem 2.5.2 (Boole's Inequality)
If {A_j} is any sequence of events, then P(⋃A_j) ≤ Σ P(A_j).

PROOF: By Lemma 2.5.1, there is a disjoint sequence of events {B_j} such that B_j ⊂ A_j, j ≥ 1, and ⋃B_j = ⋃A_j. By Inequality 2.10, P(B_j) ≤ P(A_j), j ≥ 1. Since the B_j are disjoint,

P(⋃A_j) = P(⋃B_j) = Σ P(B_j) ≤ Σ P(A_j). ■

Theorem 2.5.3
Let {A_j}_{j=1}^∞ be a sequence of events.
(i) If A₁ ⊂ A₂ ⊂ ⋯ is an increasing sequence and A = ⋃_{j=1}^∞ A_j, then P(A) = lim_{n→∞} P(Aₙ).
(ii) If A₁ ⊃ A₂ ⊃ ⋯ is a decreasing sequence and A = ⋂_{j=1}^∞ A_j, then P(A) = lim_{n→∞} P(Aₙ).
PROOF: (i) Let {A_j}_{j=1}^∞ be an increasing sequence of events and let A = ⋃_{j=1}^∞ A_j ∈ ℱ. Note that ⋃_{j=1}^n A_j = Aₙ. By Lemma 2.5.1, there is a disjoint sequence of events {B_j}_{j=1}^∞ such that B_j ⊂ A_j, j ≥ 1, ⋃_{j=1}^n A_j = ⋃_{j=1}^n B_j, and ⋃A_j = ⋃B_j. By Item 3 of Definition 2.3,

P(⋃_{j=1}^∞ A_j) = P(⋃_{j=1}^∞ B_j) = Σ_{j=1}^∞ P(B_j) = lim_{n→∞} Σ_{j=1}^n P(B_j) = lim_{n→∞} P(⋃_{j=1}^n B_j) = lim_{n→∞} P(Aₙ).
(ii) Let {A_j}_{j=1}^∞ be a decreasing sequence of events and let A = ⋂_{j=1}^∞ A_j ∈ ℱ. Then {A_jᶜ}_{j=1}^∞ is an increasing sequence of events, and Aᶜ = (⋂_{j=1}^∞ A_j)ᶜ = ⋃_{j=1}^∞ A_jᶜ. By the first part of the proof,

1 − P(A) = P(Aᶜ) = lim_{n→∞} P(Aₙᶜ) = lim_{n→∞} (1 − P(Aₙ)) = 1 − lim_{n→∞} P(Aₙ),

and so P(A) = lim_{n→∞} P(Aₙ). ■
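Theorem 2.5.3 can be illustrated with the Bernoulli model of Example 2.14. If Aₙ is the event "no success in the first n trials," the Aₙ form a decreasing sequence with P(Aₙ) = qⁿ, so P(⋂Aₙ) = lim qⁿ = 0. A small numerical sketch (Python; q = 0.5 is an arbitrary choice of ours):

```python
q = 0.5  # failure probability per trial (hypothetical choice)

# A_n = "no success in the first n trials": A_1 ⊃ A_2 ⊃ ..., with
# P(A_n) = q^n, and the intersection is the event "no success ever."
probs = [q ** n for n in range(1, 31)]
print(probs[:5])  # a decreasing sequence of probabilities
print(min(probs))  # already tiny; the limit P(⋂A_n) is 0
```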
EXERCISES 2.5
1.
In manufacturing brass cylindrical sleeves, 5 percent are defective
because the outer diameter is too small and 3 percent are defective
because the inner diameter is too large. What is the best you can say
about the probability that a sleeve selected at random from a lot will be
defective?
2. If A, B, and C are any three events, show that P(A ∪ B ∪ C) =
P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C).

3. Consider three events A, B, C for which P(A) = 1/3, P(B) = 1/4,
P(C) = 1/2, P(A ∩ B) = 1/8, P(A ∩ C) = 1/8, P(B ∩ C) = 3/16,
and P(A ∩ B ∩ C) = 1/32. Calculate P(A ∪ B ∪ C).
4. An integer is chosen at random between 0000 and 9999. (a) Use the
inclusion/exclusion principle to calculate the probability that at least
one 1 will appear in the number. (b) Calculate the same probability
assuming that the experiment is that of four Bernoulli trials.
5. Show that the probability that one and only one of the events A and B
will occur is
    P(A) + P(B) - 2P(A ∩ B).
6. The mid-seventeenth-century gambler Chevalier de Méré thought that
the probability of getting at least one ace with the throw of four dice is
equal to the probability of getting at least one double ace in 24 throws
of two dice. Was de Méré correct?
7. Consider an infinite sequence of Bernoulli trials with probability of
success p. If ω_0 is any outcome, show that P({ω_0}) = 0. (Note: There
is no significance to the fact that each outcome has probability 0 whereas
the aggregate of all outcomes has probability 1! After all, points in
the interval [0,1] have zero length, but the aggregate [0,1] has length 1.)
8. If P(A) = .8 and P(B) = .75, show that P(A ∩ B) ≥ .55. More
generally, show that if A and B are any two events, then

    min(P(A), P(B)) ≥ P(A ∩ B) ≥ P(A) + P(B) - 1.
9. If A_1, ..., A_n are any events, show that

    P(A_1 ∩ ⋯ ∩ A_n) ≥ P(A_1) + ⋯ + P(A_n) - (n - 1).
2.6 CONDITIONAL PROBABILITY AND INDEPENDENCE
Conditional probabilities are defined for general probability spaces as in
Equation 1.12.

Definition 2.5

For B ∈ ℱ with P(B) > 0 and A ∈ ℱ, define

    P(A|B) = P(A ∩ B) / P(B). ■

Since P(A|B) associates with each A ∈ ℱ a real number, it is a function
from ℱ to R, which we denote by P(·|B) and call a conditional probability
function. An immediate consequence of the definition is the equation

    P(A ∩ B) = P(A|B)P(B),    A, B ∈ ℱ, P(B) > 0,    (2.13)

which is sometimes called the law of compound probabilities. If P(B) = 0,
we usually define P(A|B) = 0, which is consistent with Equation 2.13 since
P(A ∩ B) = 0 whenever P(B) = 0.
It was pointed out at the end of Section 1.5 that some probability models
are described not by specifying the probability of each outcome but rather by
a combination of outcome probabilities and conditional probabilities, as in the
following example.

EXAMPLE 2.17 A bowl contains 10 red balls and 10 white balls. An
experiment consists of selecting a ball at random from the bowl, replacing it
by a ball of the other color, putting the replacement into the bowl, and then
selecting a second ball at random from the bowl. There are four outcomes of
the experiment: (R,R), (R,W), (W,R), and (W,W). Probabilities of these four
outcomes are not given explicitly, but the model is described so that they can be
determined. To do this, let R_1 denote the event "first ball selected is red" and
let R_2 denote the event "second ball selected is red." The following numbers
are the given data:

    P(R_1) = 1/2      P(R_2|R_1) = 9/20       P(R_2^c|R_1) = 11/20
    P(R_1^c) = 1/2    P(R_2|R_1^c) = 11/20    P(R_2^c|R_1^c) = 9/20
As an illustration of these computations, consider P(R_2|R_1). Given that the
outcome is in R_1, at the time of the second selection there are 9 red and 11 white
balls in the bowl, and so the probability that the second ball will be red is 9/20.
Probabilities of individual outcomes can be calculated using Equation 2.13;
e.g.,

    P((R,R)) = P(R_1 ∩ R_2) = P(R_2|R_1)P(R_1) = (9/20)(1/2) = 9/40. ■
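The four outcome probabilities in Example 2.17 can be reproduced with exact fractions. The sketch below is ours, not the text's, but it carries out the same computation via the law of compound probabilities:

```python
from fractions import Fraction as F

# Example 2.17 in code: 10 red + 10 white balls; the drawn ball is
# replaced by one of the opposite color before the second draw.
P_R1 = F(1, 2)
P_R2_given_R1 = F(9, 20)    # after a red is swapped out: 9 red, 11 white
P_R2_given_W1 = F(11, 20)   # after a white is swapped out: 11 red, 9 white

# Law of compound probabilities: P(A ∩ B) = P(A|B)P(B)
outcomes = {
    ("R", "R"): P_R2_given_R1 * P_R1,
    ("R", "W"): (1 - P_R2_given_R1) * P_R1,
    ("W", "R"): P_R2_given_W1 * (1 - P_R1),
    ("W", "W"): (1 - P_R2_given_W1) * (1 - P_R1),
}
print(outcomes)
assert outcomes[("R", "R")] == F(9, 40)   # matches the text's computation
assert sum(outcomes.values()) == 1        # the four outcomes exhaust the model
```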
All the theorems proved for probability functions in the previous section are
true for conditional probability functions P(·|B) with a fixed B ∈ ℱ, P(B) > 0.
To see this, define

    Q(A) = P(A|B)    for A ∈ ℱ.

Since Q(Ω) = P(Ω ∩ B)/P(B) = 1, Item 1 of Definition 2.3 is satisfied. Since
for A ∈ ℱ, 0 ≤ Q(A) = P(A|B) = P(A ∩ B)/P(B) ≤ 1, Item 2 is satisfied.
Let {A_j} be a finite or infinite disjoint sequence in ℱ. Then

    Q(∪ A_j) = P(∪ A_j|B) = P((∪ A_j) ∩ B)/P(B) = P(∪ (A_j ∩ B))/P(B).

Since the events A_j ∩ B are disjoint,

    Q(∪ A_j) = Σ P(A_j ∩ B)/P(B) = Σ P(A_j|B) = Σ Q(A_j),

and Item 3 of Definition 2.3 is satisfied. Since the theorems proved in the
previous section were consequences of Items 1, 2, and 3 in Definition 2.3, these
same theorems are true for conditional probability functions P(·|B) for fixed
B ∈ ℱ, P(B) > 0. For example, if {A_j}_{j=1}^∞ is an increasing sequence of events
with A = ∪_{j=1}^∞ A_j, it is not necessary to give a proof that

    lim_{n→∞} P(A_n|B) = P(A|B).
One of the most useful applications of conditional probabilities is known
as Bayes' rule. Let A_1, A_2, ..., A_n be a finite disjoint collection of events that
exhausts Ω; i.e., Ω = ∪_{j=1}^n A_j. We think of the A_1, ..., A_n as a stratification
of Ω. If B is any other event with P(B) > 0, then for 1 ≤ i ≤ n,

    P(A_i|B) = P(B|A_i)P(A_i) / P(B).

Since B = ∪_{j=1}^n (B ∩ A_j) with the latter events disjoint,

    P(B) = Σ_{j=1}^n P(B|A_j)P(A_j),

and so

    P(A_i|B) = P(B|A_i)P(A_i) / Σ_{j=1}^n P(B|A_j)P(A_j).    (2.14)

Note that all the probabilities P(A_1), ..., P(A_n), P(B|A_1), ..., P(B|A_n) must
be given data to apply Bayes' rule and that the A_1, ..., A_n are disjoint and exhaust
Ω. It was tacitly assumed in this discussion that P(A_i) > 0, 1 ≤ i ≤ n. This
is always true of at least one A_i, and the last equation is true assuming only that
P(B) > 0.
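Equation 2.14 translates directly into code. The sketch below is an illustration we added; the two-urn numbers in the usage are made up, not an example from the text:

```python
def bayes(prior, likelihood, i):
    """P(A_i | B) via Equation 2.14.

    prior[j] = P(A_j) for a disjoint, exhaustive stratification A_1, ..., A_n,
    likelihood[j] = P(B | A_j).
    """
    total = sum(p * l for p, l in zip(prior, likelihood))  # P(B) by total probability
    return prior[i] * likelihood[i] / total

# Hypothetical illustration: urn 0 is chosen with probability 0.6 and yields
# a red ball with probability 0.3; urn 1 is chosen with probability 0.4 and
# yields red with probability 0.8.  Posterior that urn 0 was chosen given red:
post0 = bayes([0.6, 0.4], [0.3, 0.8], 0)
print(post0)
assert abs(post0 - 0.18 / 0.50) < 1e-12   # 0.18 / (0.18 + 0.32) = 0.36
```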
EXAMPLE 2.18 A bowl contains three red balls and one white ball. A
ball is selected at random from the bowl, replaced by a ball of the other color,
and returned to the bowl. A second ball is then selected at random from the
bowl. Given that the second ball is red, what is the probability that the first
ball was red? Let R_i be the event "ith ball is red," i = 1, 2. Then R_1 and
R_1^c are disjoint and exhaust Ω. We are given the data P(R_1) = 3/4, P(R_1^c) =
1/4, P(R_2|R_1) = 1/2, P(R_2|R_1^c) = 1. Thus,

    P(R_1|R_2) = P(R_2|R_1)P(R_1) / [P(R_2|R_1)P(R_1) + P(R_2|R_1^c)P(R_1^c)]
               = (1/2)(3/4) / [(1/2)(3/4) + (1)(1/4)] = 3/5. ■
The next application of Bayes’ rule has to do with the settlement of paternity
cases in a court of law and necessitates a crude review of genetics related to
blood types.
In conceiving a child, each parent contributes one of the alleles O, A, or B
to form one of the pairs OO, AO, AA, BO, BB, AB, called genotypes. Both
A and B are dominant over O; neither A nor B is dominant over the other.
The observed blood types, called phenotypes, of the child can be O, A, B, or
AB. Figure 2.5 gives combinations of genotypes and phenotypes as well as the
proportion of each combination in the general population.
    Genotype    Phenotype    Proportion
    OO          O            .479
    OA          A            .310
    AA          A            .050
    OB          B            .116
    BB          B            .007
    AB          AB           .038

FIGURE 2.5 Frequencies of genotypes and phenotypes.
EXAMPLE 2.19 (Paternity Index) Jane of blood type A claims in court
that Dick of blood type B is the father of her child of blood type B. The
following calculations are made to support her claim. Consider an experiment
in which a person is selected at random from the population of adult males.
Let E be the event "The child of the person selected is of blood type B" and
let F be the event "The person selected is Dick." The genotype of the child is
either OB or BB. Since the mother has blood type A, her genotype can only
be OA, and she passed on the O allele to her child. The genotype of Dick
is unknown, but it must be either OB or BB. Let F_OB and F_BB be the events
"Dick's genotype is OB" and "Dick's genotype is BB," respectively. Then

    P(E|F) = P(E ∩ F)/P(F) = [P(E ∩ F|F_OB)P(F_OB) + P(E ∩ F|F_BB)P(F_BB)] / P(F).

From Figure 2.5, P(E ∩ F|F_OB) = .116, P(F_OB) = .5, P(E ∩ F|F_BB) =
1, P(F_BB) = .007, and P(F) = .123. Therefore,

    P(E|F) = [(.116)(.5) + .007] / .123 = .528.
We now calculate P(E|F^c), the probability that the child is of blood type B
given that someone other than Dick is the father. Since that someone must
have blood type B and therefore genotype OB or BB, and in the first case there
is a 50-50 chance that the B allele will be passed to the child,

    P(E|F^c) = (.116)(.5) + .007 = .065.

The quantity

    P(E|F)/P(E|F^c) = .528/.065 = 1/.123 ≈ 8.1

is called the paternity index and is interpreted to mean that a person of blood
type B is eight times more likely to be the father than some other person. The
paternity index is just as applicable to one man of blood type B as it is to any
other man of the same blood type. This is a useful index, but it does not give
us the probability that Dick is the father of the child with blood type B. By
Bayes' rule,

    P(F|E) = P(E|F)P(F) / [P(E|F)P(F) + P(E|F^c)P(F^c)].
We can use the calculations above to obtain the two conditional probabilities
on the right, but to complete the computation we need to know P(F). Jane
claims that this number should be 1 and Dick claims that it should be 0. In
this situation it is customary to compromise by using the figure P(F) = .5, in
which case P(F|E) = .89. ■
The reader might question the applicability of Bayes’ rule in paternity cases
but not Bayes’ rule itself. The basic premise in this example is that Jane chose
an adult male at random from the population of adult males and the chosen
person fathered her child.
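The arithmetic of Example 2.19 can be checked in a few lines of Python. This is a sketch we added; the variable names are ours and the figures come from Figure 2.5 and the example:

```python
# Example 2.19's arithmetic, reproduced in code.
p_OB, p_BB = 0.116, 0.007        # population frequencies of genotypes OB, BB
p_typeB = p_OB + p_BB            # frequency of blood type B: 0.123

# P(child is type B | father's blood type is B): OB fathers pass B half the
# time, BB fathers always.
p_E_given_F = (p_OB * 0.5 + p_BB * 1.0) / p_typeB   # ~ 0.528
p_E_given_Fc = p_OB * 0.5 + p_BB * 1.0              # ~ 0.065

paternity_index = p_E_given_F / p_E_given_Fc        # = 1/0.123, about 8.1

# Bayes' rule with the court's compromise prior P(F) = 0.5:
pF = 0.5
p_F_given_E = (p_E_given_F * pF) / (p_E_given_F * pF + p_E_given_Fc * (1 - pF))

print(p_E_given_F, paternity_index, p_F_given_E)
assert abs(p_E_given_F - 0.528) < 0.001
assert abs(paternity_index - 1 / 0.123) < 1e-9
assert abs(p_F_given_E - 0.89) < 0.005
```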
As in Chapter 1, we will interpret P(A|B) as the probability of A given
the partial information that the outcome is in B. It sometimes happens that
the partial information is irrelevant as far as the event A is concerned; i.e.,
P(A|B) = P(A) or, using Equation 2.13, P(A ∩ B) = P(A)P(B). In this
case, the events A and B are said to be independent. We will reformulate the
definition so that it is not required that P(B) > 0.
Definition 2.6

The events A, B ∈ ℱ are independent events if

    P(A ∩ B) = P(A)P(B). ■

The definition is now symmetric in A and B.
EXAMPLE 2.20 Consider a roll of two dice, one red and one white. Let
R_i, i = 1, ..., 6 be the event "i pips on the red die" and let W_j, j = 1, ..., 6
be the event "j pips on the white die." Any pair R_i and W_j are independent
events since

    P(R_i ∩ W_j) = P((i,j)) = 1/36 = (1/6)(1/6) = P(R_i)P(W_j). ■

Generally speaking, any event specified solely by conditions on a red die
will be independent of any event specified solely by conditions on a white
die. Let A be the event "even number of pips on the red die" and let B be
the event "odd number of pips on the white die." By examining Figure 1.2,
P(A ∩ B) = 1/4 = 1/2 · 1/2 = P(A)P(B).
Theorem 2.6.1 If A and B are independent events, then each of the pairs A and B^c, A^c and B, A^c
and B^c are independent.

PROOF: Consider the pair A and B^c. By Equation 2.8, P(A ∩ B^c) = P(A) -
P(A ∩ B) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A)P(B^c), and so A
and B^c are independent. Similarly for the other two pairs. ■
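Theorem 2.6.1 can be verified by brute-force enumeration on the two-dice sample space of Example 2.20. This quick Python check is ours, not part of the text:

```python
from fractions import Fraction

# The 36 equally likely rolls of a red and a white die.
omega = [(r, w) for r in range(1, 7) for w in range(1, 7)]
P = lambda E: Fraction(len(E), 36)   # uniform probability of an event (a set)

A = {o for o in omega if o[0] % 2 == 0}   # even pips on the red die
B = {o for o in omega if o[1] % 2 == 1}   # odd pips on the white die
Ac = set(omega) - A
Bc = set(omega) - B

assert P(A & B) == P(A) * P(B)            # A and B are independent...
# ...and so is each pair built from complements, as the theorem asserts:
assert P(A & Bc) == P(A) * P(Bc)
assert P(Ac & B) == P(Ac) * P(B)
assert P(Ac & Bc) == P(Ac) * P(Bc)
print("all four pairs independent")
```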
If A, B, and C are any three events, independence of the three could be taken
to mean that the three pairs A and B, A and C, and B and C are independent
pairs. This type of independence is called pairwise independence. In some
models there is a stronger built-in independence.
Definition 2.7

The events A_1, ..., A_n are mutually independent if

    P(A_{i_1} ∩ A_{i_2}) = P(A_{i_1})P(A_{i_2}),    i_1 ≠ i_2, 1 ≤ i_1, i_2 ≤ n,
    P(A_{i_1} ∩ A_{i_2} ∩ A_{i_3}) = P(A_{i_1})P(A_{i_2})P(A_{i_3})    for i_1, i_2, i_3 distinct, 1 ≤ i_1, i_2, i_3 ≤ n,
    ⋮
    P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n) = P(A_1)P(A_2) × ⋯ × P(A_n). ■

The total number of conditions imposed in this definition is easily calculated
using Equation 1.10 and is 2^n - n - 1. It is possible for events to be pairwise
independent but not mutually independent.
EXAMPLE 2.21 Suppose a pair of dice are rolled, one red and one white.
Let A be the event "odd number of pips on the red die," B the event "odd
number of pips on the white die," and C the event "the score is odd." Checking
Figures 1.2 and 1.3, it is easy to see that A, B, and C are pairwise independent,
but P(A ∩ B ∩ C) = 0 ≠ (1/2)^3 = P(A)P(B)P(C) since A ∩ B ∩ C = ∅. ■
Caveat: Independent and mutually exclusive are not the same.
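Example 2.21 is easy to confirm by enumeration. The short Python check below is ours, not the text's; it exhibits pairwise independence together with the failure of the triple product rule:

```python
from fractions import Fraction

omega = [(r, w) for r in range(1, 7) for w in range(1, 7)]
P = lambda E: Fraction(len(E), 36)

A = {o for o in omega if o[0] % 2 == 1}           # odd pips on the red die
B = {o for o in omega if o[1] % 2 == 1}           # odd pips on the white die
C = {o for o in omega if (o[0] + o[1]) % 2 == 1}  # odd score

# Pairwise independent:
for X, Y in ((A, B), (A, C), (B, C)):
    assert P(X & Y) == P(X) * P(Y)

# ...but not mutually independent: A ∩ B ∩ C is empty, yet the product is 1/8.
assert P(A & B & C) == 0
assert P(A) * P(B) * P(C) == Fraction(1, 8)
print("pairwise but not mutually independent")
```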
Theorem 2.6.2 If A_1, A_2, ..., A_n are mutually independent events, then B_1, B_2, ..., B_n are also
mutually independent where each B_j is A_j or A_j^c.
In rolling a pair of dice, the numbers of pips on each die constitute
independent events. Coin flipping also has built-in independence.

EXAMPLE 2.22 Consider an infinite sequence of Bernoulli trials with
probability of success p. Suppose δ_1, δ_2, ..., δ_n ∈ {0,1} are given and 1 ≤
i_1 < i_2 < ⋯ < i_n. For j = 1, 2, ..., n, let

    A_{i_j} = {ω : ω = {x_i}_{i=1}^∞, x_{i_j} = δ_j};

i.e., if ω ∈ A_{i_j}, then the result of the i_jth trial is δ_j and nothing else is known
about the results of the other trials. The events A_{i_1}, ..., A_{i_n} are then mutually
independent events. According to Equation 2.7, P(A_{i_j}) = p^{δ_j} q^{1-δ_j}, and since

    A_{i_1} ∩ ⋯ ∩ A_{i_n} = {ω : ω = {x_i}_{i=1}^∞, x_{i_1} = δ_1, ..., x_{i_n} = δ_n},

    P(A_{i_1} ∩ ⋯ ∩ A_{i_n}) = Π_{j=1}^n p^{δ_j} q^{1-δ_j} = P(A_{i_1}) × ⋯ × P(A_{i_n}).
Since this is true for any set of integers 1 ≤ i_1 < i_2 < ⋯ < i_n, the 2^n - n - 1
conditions for mutual independence are fulfilled. ■
Theorem 2.6.3

Let A_1, A_2, ..., A_n be mutually independent events and let I = {i_1, ..., i_k}, J =
{j_1, ..., j_{n-k}} be nonempty disjoint subsets of {1, 2, ..., n}. Then any event con-
structed from the A_{i_1}, ..., A_{i_k} is independent of any event constructed from the
A_{j_1}, ..., A_{j_{n-k}}.

This theorem is proved in more advanced texts. For now we must be satisfied
with verifying it in specific cases in the exercises.
EXERCISES 2.6

Bayes' rule is needed for some of the following problems.

1. Let A, B, and C be mutually independent events. Show that A, B^c, and
C are mutually independent.

2. Let A_1, A_2, ..., A_n be mutually independent events. Show that B_1, B_2, ...,
B_n are mutually independent events where B_i is A_i or A_i^c, i =
1, 2, ..., n.
3. (a) If A, B, and C are any three events, show that P(A ∩ B ∩ C) =
P(A|B ∩ C)P(B|C)P(C) provided the conditional probabilities are
defined. (b) State a generalization for events A_1, A_2, ..., A_n.

4. A bowl contains 10 red balls and 10 white balls. A ball is selected
at random from the bowl, replaced by a ball of the other color, and
returned to the bowl. This procedure is repeated two more times. An
outcome is defined to be an ordered triple (i, j, k) where i is the number
of red balls in the bowl after the first return, j is the number after the
second return, and k is the number after the third return. Determine
the probability of each outcome.

5. A coin is flipped twice in succession. Let A be the event "head on the
first flip," B the event "head on the second flip," and C the event "the
two flips match." (a) Are the events A, B, and C pairwise independent?
(b) Are they mutually independent? In both cases, justify your answer.

6. Binary digits 0 and 1 are transmitted over a communications channel.
If a 1 is sent, it will be received as a 1 with probability .95 and as a 0 with
probability .05; if a 0 is sent, it will be received as a 0 with probability
.99 and as a 1 with probability .01. If the probabilities that a 0 or 1 is
sent are equal, what is (a) the probability that a 1 was sent given that a 1
was received and (b) the probability that a 0 was sent given that a 0 was
received?

7. If, in the previous problem, three successive binary digits are transmitted
with independence between digits, what is the probability that 111 was
sent given that 101 was received?

8. There are three chests with two drawers each, and each drawer con-
tains a gold coin or a silver coin. Chest 1 contains two gold coins, Chest 2
contains a gold coin and a silver coin, and Chest 3 contains two silver
coins. (Gold coins? A very old problem.) A chest is selected at random,
and then one of its two drawers is selected at random and opened. If
a gold coin is observed, what is the probability that the other drawer
contains a gold coin?
9. Let A, B, C, and D be mutually independent events. Show that A ∪ B
and C ∩ D are independent events.

10. Consider an event A and an infinite sequence of disjoint events {A_j}_{j=1}^∞
such that A and A_j are independent for each j ≥ 1. Show that A and
∪ A_j are independent events.
11. A mechanical system consists of components A, B_1, B_2, C, and D as
indicated in the diagram. The system will function if there is a path
from α to β along which all components are functioning. (a) If in a
specified period of time, A, C, and D will each malfunction with
probability .05 while B_1 and B_2 will each malfunction with probability
.2, all independently of one another, what is the probability that the
system will function during the period? (b) If B_3 is added to the system
in parallel with B_1 and B_2 and with the same probability of malfunction,
what is the probability that the system will function?
The following problem does not require mathematical software such as
Mathematica or Maple V, but using a hand calculator is a bit tedious.
12. Suppose in the previous exercise that the components B_1 and B_2
are replaced by B_1, ..., B_m connected in parallel and C is replaced by
C_1, ..., C_n connected in parallel. Assume that each B_i will malfunction
with probability .4 and each C_j will malfunction with probability .6.
If the B_i cost $100 each and the C_j cost $80 each, how many of the
B_i and C_j components are required to ensure that the total system will
function with probability at least .88 and will minimize the cost?
2.7 SOME APPLICATIONS

The first application will deal with such questions as "How secure is your re-
motely operated garage door opener? Your computer password? Your answer-
ing machine access number? Your telephone calling card number?" Such
applications require an extension of the definition of mutually independent
events to countable collections. Fix the probability space (Ω, ℱ, P).
Definition 2.8

The events A_1, A_2, ... are mutually independent if every finite subcollection
consists of mutually independent events. ■
EXAMPLE 2.23 Consider an infinite sequence of Bernoulli trials with
probability of success p. For each j ≥ 1, let δ_j ∈ {0,1} and define

    A_j = {ω : ω = {x_i}_{i=1}^∞, x_j = δ_j},    j ≥ 1.

If {A_{i_1}, ..., A_{i_n}} is any finite subcollection of the A_j, we saw in the previous section that
the A_{i_1}, ..., A_{i_n} are mutually independent events. Thus, the events A_1, A_2, ...
are mutually independent. ■
Consider any sequence of events {A_j}_{j=1}^∞ and an outcome ω. What is to be
meant by the statement that ω belongs to infinitely many of the A_j? It should
mean that no matter how far out you go in the sequence, ω should belong
to an A_j out beyond that point; i.e., for every k ≥ 1, ω belongs to some A_j
with j ≥ k or, in the language of set theory, ω ∈ ∪_{j≥k} A_j. Since this is true
for every k ≥ 1, ω ∈ ∩_{k≥1} ∪_{j≥k} A_j.

Definition 2.9

If {A_j}_{j=1}^∞ is any sequence of events, we define

    {A_n i.o.} = ∩_{k≥1} ∪_{j≥k} A_j. ■
Note that {A_n i.o.}, read "A_n infinitely often," is an event, because ℱ is
closed under countable unions and intersections.

What is the complement of {A_n i.o.}? If ω is in the complement, then it is
not true that ω ∈ A_j for infinitely many j; i.e., ω ∈ A_j for at most finitely
many j. Formally, by de Morgan's laws,

    {A_n i.o.}^c = ∪_{k≥1} ∩_{j≥k} A_j^c.

This brings us to a famous theorem that for some reason is called a lemma.
Lemma 2.7.1 (Borel-Cantelli)

Let {A_j}_{j=1}^∞ be an infinite sequence of events.

(i) If Σ_{j=1}^∞ P(A_j) converges, then P({A_n i.o.}) = 0.

(ii) If the A_j are mutually independent events and Σ_{j=1}^∞ P(A_j) diverges,
then P({A_n i.o.}) = 1.
PROOF: (i) Assume that the series Σ_{j=1}^∞ P(A_j) converges. Since the sequence
{∪_{j≥k} A_j}_{k=1}^∞ is a decreasing sequence and {A_n i.o.} = ∩_{k=1}^∞ ∪_{j≥k} A_j,

    P({A_n i.o.}) = lim_{k→∞} P(∪_{j≥k} A_j)

by Theorem 2.5.3. By Theorem 2.5.2,

    P(∪_{j≥k} A_j) ≤ Σ_{j=k}^∞ P(A_j).

Thus,

    0 ≤ P({A_n i.o.}) ≤ Σ_{j=k}^∞ P(A_j)

for all k ≥ 1. Since the series Σ_{j=1}^∞ P(A_j) converges, the sum on the right
has the limit zero as k → ∞. Since P({A_n i.o.}) does not depend upon k,
P({A_n i.o.}) = 0.
(ii) It is easily checked using calculus that the graph of the equation y = 1 - x
lies below the graph of the equation y = e^{-x} for x ≥ 0; i.e., 1 - x ≤ e^{-x} for
all x ≥ 0. Therefore,

    P(A_j^c) = 1 - P(A_j) ≤ e^{-P(A_j)},    j ≥ 1.    (2.15)

Consider {A_n i.o.}^c = ∪_{k≥1} ∩_{j≥k} A_j^c. Since {∩_{j=k}^r A_j^c}_{r=k}^∞ is a decreasing
sequence and

    ∩_{j≥k} A_j^c ⊂ ∩_{j=k}^r A_j^c    for all r ≥ k,

it follows that

    P(∩_{j≥k} A_j^c) ≤ lim_{r→∞} P(∩_{j=k}^r A_j^c).

Since A_k, ..., A_r are mutually independent events, A_k^c, ..., A_r^c are mutually
independent events and

    P(∩_{j≥k} A_j^c) ≤ lim_{r→∞} Π_{j=k}^r P(A_j^c) ≤ lim_{r→∞} Π_{j=k}^r e^{-P(A_j)} = lim_{r→∞} e^{-Σ_{j=k}^r P(A_j)}

by Inequality 2.15. Since the series Σ_{j=k}^∞ P(A_j) diverges to +∞ for each
k ≥ 1, the limit on the right is zero, and so P(∩_{j≥k} A_j^c) = 0 for all k ≥ 1.
Therefore,

    0 ≤ P({A_n i.o.}^c) = P(∪_{k≥1} ∩_{j≥k} A_j^c) ≤ Σ_{k=1}^∞ P(∩_{j≥k} A_j^c) = 0

and P({A_n i.o.}^c) = 0. Thus, P({A_n i.o.}) = 1, as was to be proved. ■
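Both hypotheses of the lemma can be probed numerically. The sketch below is an illustration we added; the choices P(A_j) = 1/j^2 (convergent series) and P(A_j) = 1/j (divergent series) are assumptions for illustration. It evaluates the two bounds used in the proof: the tail sum from part (i) and the exponential bound from part (ii).

```python
import math

# Part (i): P({A_n i.o.}) <= sum_{j>=k} P(A_j) for every k.  With
# P(A_j) = 1/j^2 the tail is bounded by 1/(k-1), which forces the limit 0.
for k in (10, 100, 1000):
    tail = sum(1 / j**2 for j in range(k, 10**6))
    assert tail < 1 / (k - 1)

# Part (ii): P(intersection_{j=k}^{r} A_j^c) <= exp(-sum_{j=k}^{r} P(A_j)).
# With P(A_j) = 1/j the exponent diverges, so the bound tends to 0.
k = 10
for r in (100, 10_000, 1_000_000):
    bound = math.exp(-sum(1 / j for j in range(k, r + 1)))
    print(r, bound)

assert math.exp(-sum(1 / j for j in range(10, 1_000_001))) < 1e-4
```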
Lemma 2.7.2

If {A_{i_1}, A_{i_2}, ...} is any subcollection of the collection {A_1, A_2, ...}, then {A_{i_n} i.o.} ⊂
{A_n i.o.}.

PROOF: If ω is in infinitely many of the A_{i_j}, then it is in infinitely many of
the A_j. ■
EXAMPLE 2.24 (Password Problem) Consider an infinite sequence of
Bernoulli trials with probability of success p. Consider the four-letter word
1001. You may substitute the binary representation of your social security
number (which may require up to 30 binary digits 0 and 1), computer password,
answering machine access number, or telephone calling card number for this
number. What is the probability that the word 1001 will appear infinitely often
in an outcome ω = {x_j}_{j=1}^∞? For each j ≥ 1, let

    B_j = {ω : ω = {x_i}_{i=1}^∞, x_j = 1, x_{j+1} = 0, x_{j+2} = 0, x_{j+3} = 1}.

If ω ∈ B_1, then ω looks like 1001.... If ω ∈ {B_n i.o.}, then 1001 appears in ω
infinitely often. Although the events B_1, B_2, ... are not mutually independent,
the events B_1, B_5, B_9, ... are mutually independent because they are based
on nonoverlapping sets of four trials. Since each B_j has probability p^2 q^2 and
Σ_{j=0}^∞ P(B_{1+4j}) = Σ_{j=0}^∞ p^2 q^2 diverges, P({B_{1+4n} i.o.}) = 1 by (ii) of the
Borel-Cantelli lemma. Since B_1, B_5, B_9, ... is a subcollection of the collection
B_1, B_2, ... and {B_{1+4n} i.o.} ⊂ {B_n i.o.} by the preceding lemma, P({B_n i.o.}) =
1. ■
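A simulation makes Example 2.24 concrete. The sketch below is ours, not the text's; the seed and run length are arbitrary choices. It counts occurrences of the word 1001 in a long run of fair Bernoulli trials; the expected count is about (1/2)^4 per starting position.

```python
import random

# Count occurrences of the word 1001 in a long run of fair (p = 1/2)
# Bernoulli trials.  With 100_000 starting positions, roughly
# (1/2)^4 * 100_000 = 6250 occurrences are expected.
random.seed(42)
bits = [random.randint(0, 1) for _ in range(100_000)]
word = [1, 0, 0, 1]
count = sum(1 for j in range(len(bits) - 3) if bits[j:j + 4] == word)

print(count)
assert 5500 < count < 7000   # close to the expected 6250
```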
How safe is your remotely operated garage door opener, computer password,
answering machine access number, or telephone calling card number? It de-
pends upon how long it would take a random generator of 1's and 0's to hit the
electronic combination. The question should not be "Can it be violated?" but
rather "How long will it take?" But that is another mathematical problem to
which we will return in Chapter 4.

To illustrate the inclusion/exclusion principle, consider a deck of cards that
are numbered 1, 2, ..., N. Suppose the deck is thoroughly shuffled and the
cards are dealt one by one onto positions numbered 1, 2, ..., N. A match
occurs at position j if the card numbered j is at that position. If all N!
arrangements of the deck are equally likely, what is the probability that there
will be at least one match? Let A_j be the event "there is a match at the jth
position." The answer to the question lies in calculating P(A_1 ∪ ⋯ ∪ A_N).
This probability can be calculated using the inclusion/exclusion principle given
by Equation 2.12:

    P(A_1 ∪ ⋯ ∪ A_N) = Σ_{i=1}^N P(A_i) - Σ_{1≤i_1<i_2≤N} P(A_{i_1} ∩ A_{i_2})
                       + Σ_{1≤i_1<i_2<i_3≤N} P(A_{i_1} ∩ A_{i_2} ∩ A_{i_3}) - ⋯ + (-1)^{N-1} P(A_1 ∩ A_2 ∩ ⋯ ∩ A_N)
                     = Σ_{r=1}^N (-1)^{r-1} Σ_{1≤i_1<⋯<i_r≤N} P(A_{i_1} ∩ ⋯ ∩ A_{i_r}).

Consider a typical term P(A_{i_1} ∩ ⋯ ∩ A_{i_r}) where 1 ≤ i_1 < i_2 < ⋯ < i_r ≤ N.
For an outcome to be in A_{i_1} ∩ ⋯ ∩ A_{i_r} there must be matches at the i_1, ..., i_r
positions. The number of outcomes with such matches is (N - r)!. Thus,

    P(A_{i_1} ∩ ⋯ ∩ A_{i_r}) = (N - r)!/N!,

and

    Σ_{1≤i_1<⋯<i_r≤N} P(A_{i_1} ∩ ⋯ ∩ A_{i_r}) = C(N, r) (N - r)!/N! = 1/r!,

since the sum on the left has C(N, r) terms corresponding to the number of ways
of choosing a subset {i_1, ..., i_r} from {1, ..., N}. Therefore,

    P(A_1 ∪ ⋯ ∪ A_N) = Σ_{r=1}^N (-1)^{r-1} C(N, r) (N - r)!/N! = Σ_{r=1}^N (-1)^{r-1}/r!.

The last sum is a partial sum for the Maclaurin series expansion e^{-x} =
Σ_{r=0}^∞ (-1)^r x^r/r! with x = 1, except for a missing r = 0 term. If N is large,
the sum Σ_{r=1}^N (-1)^{r-1}/r! can be approximated by 1 - e^{-1}. Thus, for large N,
the probability of at least one match is

    P(A_1 ∪ ⋯ ∪ A_N) ≈ 1 - 1/e,

and the probability of no match is approximately 1/e. Actually, the approxi-
mation of the probability of no match by 1/e is quite good even for N as small
as 6, the error in this case being on the order of .0002.
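The partial sums Σ_{r=1}^N (-1)^{r-1}/r! and the 1/e approximation can be compared exactly. This check is ours, using exact rational arithmetic:

```python
import math
from fractions import Fraction

def p_at_least_one_match(N):
    """Exact inclusion/exclusion sum: sum_{r=1}^{N} (-1)^(r-1) / r!."""
    return sum(Fraction((-1) ** (r - 1), math.factorial(r))
               for r in range(1, N + 1))

exact6 = p_at_least_one_match(6)
approx = 1 - 1 / math.e
print(float(exact6), approx)

# The 1/e approximation for "no match" is good even for N = 6
# (error on the order of .0002, as the text says):
assert abs((1 - float(exact6)) - 1 / math.e) < 0.0003
```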
The following example appears in many guises.
EXAMPLE 2.25 (Coupon Collector Problem) Any one of N different
coupons (e.g., baseball cards) is included in a commercial product. Assume
independence between purchases and that the coupons are equally likely to
appear in a product. If a collector purchases the product n times, n ≥ N, what
is the probability that a complete set of coupons will be collected? Suppose
the coupons are labeled 1, 2, ..., N. Let A_j be the event that coupon j does
not appear among the n purchases, j = 1, ..., N. The probability that a
complete set is not collected in n purchases is then P(A_1 ∪ ⋯ ∪ A_N). By the
inclusion/exclusion principle,

    P(A_1 ∪ ⋯ ∪ A_N) = Σ_{r=1}^N (-1)^{r-1} Σ_{1≤i_1<⋯<i_r≤N} P(A_{i_1} ∩ ⋯ ∩ A_{i_r})
                     = Σ_{i=1}^N P(A_i) - Σ_{1≤i<j≤N} P(A_i ∩ A_j) + ⋯ + (-1)^{N-1} P(A_1 ∩ ⋯ ∩ A_N).

Note that the last term P(A_1 ∩ ⋯ ∩ A_N) = 0 since it is impossible for no
coupon to appear. Consider A_i. The probability that coupon i will not appear
with a particular purchase is 1 - (1/N), and the probability that it will not
appear in n purchases is (1 - (1/N))^n. Thus,

    Σ_{i=1}^N P(A_i) = N (1 - 1/N)^n.

Consider A_i and A_j, i ≠ j. The probability that coupons i and j will not
appear with a particular purchase is 1 - (2/N), and the probability that they
will not appear in n purchases is (1 - (2/N))^n. Since there are C(N, 2) choices
of i and j with 1 ≤ i < j ≤ N,

    Σ_{1≤i<j≤N} P(A_i ∩ A_j) = C(N, 2) (1 - 2/N)^n.

Similarly,

    Σ_{1≤i_1<⋯<i_r≤N} P(A_{i_1} ∩ ⋯ ∩ A_{i_r}) = C(N, r) (1 - r/N)^n.

Therefore,

    P(A_1 ∪ ⋯ ∪ A_N) = Σ_{r=1}^{N-1} (-1)^{r-1} C(N, r) (1 - r/N)^n.

For example, if N = 6 and n = 25, then P(A_1 ∪ ⋯ ∪ A_6) = .062 and
the probability of collecting a complete set of 6 coupons with 25 purchases is
.938. ■
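The closed-form answer is easy to evaluate. The sketch below, which we added, reproduces the text's figures for N = 6 and n = 25:

```python
from math import comb

def p_incomplete(N, n):
    """P(some coupon is missing after n purchases), by inclusion/exclusion."""
    return sum((-1) ** (r - 1) * comb(N, r) * (1 - r / N) ** n
               for r in range(1, N))   # the r = N term is 0 and is omitted

p = p_incomplete(6, 25)
print(round(p, 3), round(1 - p, 3))   # matches the text: .062 and .938
assert round(p, 3) == 0.062
assert round(1 - p, 3) == 0.938
```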
EXERCISES 2.7

1. A deck of cards numbered 1, 2, ..., 10 is shuffled and the cards are dealt
one by one onto positions 1, 2, ..., 10. Calculate the exact probability
of at least one match.

2. Consider an infinite number of Bernoulli trials with probability of
success p ≠ 1/2, 0 < p < 1. An equalization occurs as of some trial
if there is an equal number of heads and tails. Equalization can occur
only on an even number of trials. (a) If A_{2n} is the event "Equalization
occurs on the 2nth trial," show that

    P(A_{2n}) = C(2n, n) p^n (1 - p)^n.

(b) What is the probability that an infinite number of equalizations will
occur?

The next two problems relate to an infinite sequence of Bernoulli trials
with probability of success 1/2. A run of length r beginning on the nth
trial occurs if there are 1's on the n through (n + r - 1) trials followed
immediately by a 0. For integers n, r ≥ 1, let A_{n,r} be the event consisting of
those outcomes for which there is a run of length greater than or equal to r
beginning on the nth trial.

3. If r is a fixed positive integer, determine P(A_{n,r} i.o.).

4. If for n ≥ 1 and δ > 0, r_n = (1 + δ) log_2 n, determine P(A_{n,r_n} i.o.).
The following problems pertain to the game of craps, which is played with
two dice according to the following rules:

• You win on the first roll if you roll a score of 7 or 11.
• You lose on the first roll if you roll a score of 2, 3, or 12.
• If you do not roll a 2, 3, 7, 11, or 12 on the first roll, the score
becomes your "point" for subsequent rolls.
• You win on subsequent rolls if you roll your point without having
rolled a 7 and lose if you roll a 7 without having rolled your point.

5. Assuming independence between trials, describe appropriate Ω, ℱ,
and P.
6. What is the probability that you will win with a point of 8?
7. What is the probability that you will win at craps?
8. What is the probability that the game will terminate?
The following problems require mathematical software, such as Mathematica
or Maple V, or much patience.
9. A deck of 52 cards numbered 1, 2, ..., 52 is shuffled and the cards are
dealt one by one onto positions 1, 2, ..., 52. Calculate the probability
of at least one match without using the approximation 1 - 1/e.
10. A commercial product includes a coupon that can be either a worthless
coupon or one of eight collectible coupons. If 30 percent of the coupons
are worthless and the collectible coupons occur in equal proportions,
how many products must be purchased to be 95 percent confident
of obtaining a complete set of collectible coupons, assuming that the
coupons are inserted randomly into the product?
SUPPLEMENTAL READING LIST
R. W. Hamming (1991). The Art of Probability for Scientists and Engineers.
Redwood City, Calif.: Addison-Wesley.
RANDOM VARIABLES
INTRODUCTION
The score obtained upon rolling two dice and the number of heads in n
flips of a coin are examples of random variables. It is possible to forgo the
apparatus of the first two chapters and deal directly with a primitive concept
of random variables by specifying certain probability statements about the
random variables. Eventually, however, the study of algebraic and limiting
properties of random variables would lead to the considerations of the first
two chapters.
One of the problems we will study in this chapter is the gambler’s ruin
problem, which apparently appeared in print for the first time in a paper by
Huygens around the beginning of the eighteenth century. This problem was
solved by James Bernoulli in a paper published posthumously in 1713. A more
modern method, the method of difference equations, will be used to solve
the ruin problem. Another important methodology for solving probability
problems involves generating functions, which were introduced by de Moivre
around 1740 and treated exhaustively by Laplace at the end of the eighteenth
century. The reader wanting to learn more applications of generating functions,
or of probability theory in general, would be well advised to read the book by
Feller listed at the end of the chapter.
3.2 RANDOM VARIABLES
Unless otherwise specified, (Ω, ℱ, P) will be a fixed probability space. At this
juncture, we are not going to give the most general definition of a random
variable but will keep things as simple as possible.
Definition 3.1
A map X : Ω → R is a random variable if the range of X is a countable set
{x_1, x_2, ...}, finite or infinite, and {ω : X(ω) = x_j} ∈ ℱ for all j ≥ 1. ■

A random variable as just defined is customarily called a "discrete random
variable," but the prefix "discrete" will be dropped because no other type of
random variable will be considered until much later. The definition will be
extended to allow the possibility that X can take on the value +∞.
In all the probability models considered so far, except for an infinite
sequence of Bernoulli trials, ℱ consists of all subsets of Ω. When this is the
case, {ω : X(ω) = x_j} is just another subset of Ω and therefore is in ℱ. In
most cases, showing that X is a random variable simply amounts to verifying
that the range of X is countable.

The notation for the event {ω : X(ω) = x_j} will be compressed to
(X = x_j) by suppressing the ω. The same is true for other events; e.g., the
event {ω : a < X(ω) ≤ b} will be compressed to (a < X ≤ b).
Definition 3.2

Let X be a random variable with range {x_1, x_2, ...}. The density function f_X is
the real-valued function on the range of X defined by

    f_X(x_j) = P(X = x_j),    j = 1, 2, ... ■

The range of X can be finite or infinite. The density function f_X will be
denoted simply by f if the meaning is clear from the context. It is important to
keep in mind that the domains of f_X and P are not the same. P is a function
on ℱ with values in R, whereas f_X is a function on the range {x_1, x_2, ...} of the
random variable X with values in R.
EXAMPLE 3.1 Let X be the score upon rolling two dice. The function
p defined on {2, 3, ..., 12} as in Figure 1.3 is the density function of X; e.g.,
f_X(7) = p(7) = 6/36. ■
EXAMPLE 3.2 (Binomial Density)  Consider n Bernoulli trials with
probability of success p. Let X be the number of successes in n trials; i.e., if
ω = {x_j}_{j=1}^n with x_j ∈ {0, 1}, then X(ω) = Σ_{j=1}^n x_j. Since the range of X
is {0, 1, ..., n}, X is a random variable. If Σ_{j=1}^n x_j = k, then there are k
successes in ω and n - k failures, and so ω has probability p^k q^{n-k}. But there
are C(n, k) outcomes with exactly k successes, and so

    b(k; n, p) = C(n, k) p^k q^{n-k},    k = 0, 1, ..., n,

is the density function. This density function is called the binomial density
function with parameters n and p and is denoted by b(·; n, p). ■
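The counting argument above is easy to check numerically. The following is a small sketch (not from the text; the parameter values are arbitrary) using only the Python standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    # b(k; n, p) = C(n, k) p^k q^(n-k), the binomial density
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The C(n, k) outcomes with k successes each have probability p^k q^(n-k),
# and the densities over k = 0, ..., n must sum to 1.
n, p = 10, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(total)  # 1.0 up to rounding error
```

The sum being 1 reflects the binomial theorem: Σ C(n, k) p^k q^(n-k) = (p + q)^n = 1.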
62
3
RANDOM VARIABLES
EXAMPLE 3.3  Between a source S and a collector C there is an absorption
medium as indicated below.

    S → [Absorber] → C

The probability that a given particle emitted from S is not absorbed is p, and
the probability that it is absorbed is 1 - p, 0 < p < 1. Assume that the
particles are absorbed independently of each other. If n particles are emitted,
the probability that exactly k particles will reach C is given by the binomial
density

    b(k; n, p) = C(n, k) p^k (1 - p)^{n-k},    k = 0, ..., n. ■
EXAMPLE 3.4 (Geometric Density Function)  Consider an infinite
sequence of Bernoulli trials with probability of success p as described in Example
2.14. If X is defined as the first trial at which success occurs, then we have a
small problem in that X(ω_0) is not defined for the outcome ω_0 = (0, 0, ...)
consisting of all 0's. It was shown in Example 2.14 that a success will eventually
occur with probability 1. We can therefore define X(ω_0) however we choose,
the result having no bearing on the computation of probabilities. We choose
to define X(ω_0) = +∞. The range of X is then N ∪ {+∞}, which is countable.
For k ∈ N,

    (X = k) = {ω : ω = {x_j}_{j=1}^∞, x_1 = 0, ..., x_{k-1} = 0, x_k = 1} ∈ ℱ,

and since (X = +∞) = (∪_{k=1}^∞ (X = k))^c ∈ ℱ, X is a random variable. We
saw in Section 2.4 that

    f_X(k) = P(X = k) = p q^{k-1},    k = 1, 2, ...

This density function is called a geometric density function with parameters p
and q = 1 - p, 0 < p < 1. ■
Note that the definition of a random variable has been extended in this
example because Definition 3.1 requires that the values of X be real numbers
and R does not contain +∞. If the value +∞ is allowed, the random variable
is called an extended real-valued random variable. The only situation in which
a random variable will be allowed to take on the value +∞ is that in which X
measures some waiting time as in the previous example. In either case, the
criterion is still the same, because if (X = x_j) ∈ ℱ for all real values x_j of
X, then (X = +∞) = (∪_j (X = x_j))^c ∈ ℱ, since ℱ is a σ-algebra. In most
instances, the original definition is applicable.
The geometric density is applicable to physical systems for which aging is not
a factor. For example, the waiting time, in discrete units, for a radioactive atom
to decay has a geometric density in which the parameter p can be determined
from the half-life of the atom.
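The half-life remark can be made concrete with a small sketch (not from the text; the half-life value is a made-up illustration). For a geometric waiting time, P(X > k) = q^k, so a half-life of h discrete time units corresponds to q^h = 1/2:

```python
# Hypothetical illustration: a geometric waiting time with half-life h
# satisfies q^h = 1/2, which determines q and hence p = 1 - q.
h = 20.0                 # assumed half-life, in discrete time units
q = 0.5 ** (1.0 / h)     # per-unit survival probability
p = 1 - q                # per-unit decay probability

def geom_pmf(k, p):
    # f_X(k) = p q^(k-1), k = 1, 2, ...
    return p * (1 - p) ** (k - 1)

# Tail check: summing the density over k > 20 recovers q^20 = 1/2
# (up to a negligible remainder beyond the truncation point).
tail = sum(geom_pmf(k, p) for k in range(21, 2000))
print(round(tail, 6))  # close to 0.5
```

This is the "no aging" property in action: the decay probability per unit time is constant, so the waiting time is geometric.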
EXAMPLE 3.5 (Negative Binomial Density)  Consider an infinite
sequence of Bernoulli trials with probability of success p. Fix a positive integer r
and let T_r be the trial at which the rth success occurs for the first time. If for
the outcome ω the rth success never occurs, then we put T_r(ω) = +∞. The
range of T_r is {r, r + 1, ..., +∞}. If x ≥ r is a positive integer, it is easy to
see that (T_r = x) is a condition on just finitely many trials, and so T_r is an
extended real-valued random variable. Each outcome in the event (T_r = x)
has probability p^r q^{x-r}. Since r - 1 of the x - 1 trials preceding the xth trial must be
successes, the number of outcomes in this event is C(x - 1, r - 1). Thus,

    P(T_r = x) = C(x - 1, r - 1) p^r q^{x-r},    x = r, r + 1, ...

Changing the scale by replacing x by x + r,

    P(T_r - r = x) = C(x + r - 1, r - 1) p^r q^x = C(x + r - 1, x) p^r q^x,    x = 0, 1, 2, ...

By Exercise 1.3.10,

    P(T_r - r = x) = C(-r, x) p^r (-q)^x,    x = 0, 1, 2, ...

It follows that T_r - r has the density function

    f(x) = C(-r, x) p^r (-q)^x,    x = 0, 1, 2, ...,

which is called the negative binomial density with parameters r and p. The
name arises from the fact that

    Σ_{x=0}^∞ C(-r, x) (-q)^x = (1 - q)^{-r}

by the generalized binomial theorem (see Exercise 1.3.4). Accordingly,

    Σ_{x=0}^∞ C(-r, x) p^r (-q)^x = p^r (1 - q)^{-r} = 1,

and it follows that P(T_r < +∞) = P(T_r ∈ {r, r + 1, ...}) = 1. This means
that the rth success will eventually occur with probability 1. ■
Caveat: In the discussions that follow, assume that all random variables are
real-valued unless explicitly stated otherwise.
The next density, the Poisson density, can be obtained as a limiting case of
the binomial density as follows. Consider a sequence of experiments described
by a binomial density for which the probability of success depends upon n;
that is, consider a sequence of binomial densities b(·; n, p_n), n ≥ 1. Assume
that as n increases, p_n varies in such a way that n p_n → λ > 0 for some fixed
λ. Fix an integer k ≥ 0. Then

    lim_{n→∞} b(k; n, p_n) = lim_{n→∞} C(n, k) p_n^k (1 - p_n)^{n-k}.

For large n, p_n ≈ λ/n, 1 - p_n ≈ 1 - (λ/n), and

    C(n, k) p_n^k (1 - p_n)^{n-k} ≈ (λ^k / k!) × (n(n - 1) × ... × (n - k + 1) / n^k) × (1 - λ/n)^n (1 - λ/n)^{-k}.

Since k is fixed, there are a fixed number of factors in the last product, and the
limit of the product is the product of the limits. Since lim_{n→∞} (1 - (j/n)) =
1 for j = 1, ..., k - 1, lim_{n→∞} (1 - (λ/n))^n = e^{-λ} from the calculus, and
lim_{n→∞} (1 - (λ/n))^{-k} = 1,

    lim_{n→∞} C(n, k) p_n^k (1 - p_n)^{n-k} = (λ^k e^{-λ}) / k!.

Therefore,

    lim_{n→∞} b(k; n, p_n) = (λ^k e^{-λ}) / k!,    k ≥ 0.
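This limiting behavior is easy to observe numerically. The sketch below (not from the text; the parameter choices are arbitrary) compares b(k; n, λ/n) with λ^k e^{-λ}/k! as n grows:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # the limiting density lam^k e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

# With p_n = lam/n (so that n * p_n -> lam), b(k; n, p_n) approaches
# the Poisson density as n grows.
lam, k = 2.0, 3
for n in (10, 100, 100000):
    print(n, binom_pmf(k, n, lam / n))
print("limit:", poisson_pmf(k, lam))
```

Already at n = 100 the binomial and Poisson values agree to two or three decimal places.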
Can the function f(k) = (λ^k e^{-λ})/k!, k ≥ 0, serve as the density of a random
variable X?
According to Definition 3.2, the domain of the density function f_X is the
range {x_1, x_2, ...} of X. Whenever it is convenient to do so, we will define
f_X(x) = 0 for x ∉ {x_1, x_2, ...}. With this convention in mind, a density
function has the following properties:

    0 ≤ f(x) ≤ 1    for all x ∈ R.    (3.1)

    There is a countable set {x_1, x_2, ...} such that Σ_j f(x_j) = 1
    and f(x) = 0 whenever x ∉ {x_1, x_2, ...}.    (3.2)
Conversely, given such a function, we can construct a probability space
(Ω, ℱ, P) by taking Ω = {x_1, x_2, ...}, ℱ the collection of all subsets of Ω,
and defining P using the weight function f(x_j) as in Section 1.5. The random
variable X defined on Ω by X(x_j) = x_j then has f as its density function. To
show that this construction can be applied to the function f(k) of the previous
paragraph, we need the Maclaurin series expansion

    e^λ = Σ_{k=0}^∞ λ^k / k!.    (3.3)
EXAMPLE 3.6 (Poisson Density Function)  Fix a positive number λ and
let

    f(k) = (λ^k e^{-λ}) / k!    if k = 0, 1, ...
    f(x) = 0                    otherwise.

Clearly, f(x) ≥ 0 for all x ∈ R. Since {0, 1, ...} is a countable set, f(x) = 0
for all x ∉ {0, 1, ...}, and Σ_{k=0}^∞ f(k) = 1 by Equation 3.3, the function
f satisfies 3.1 and 3.2. Thus, there is a probability space (Ω, ℱ, P) and a random
variable X having f as its density function. This density function is called a
Poisson density function with parameter λ and is usually denoted by p(·; λ). ■
The Poisson density is usually applied in situations in which there are a large
number n of trials with a small probability p that an event will occur in each
trial and with λ = np moderate in magnitude.
EXAMPLE 3.7  An electronic system has a periodic operating cycle of
0.01 second. In each of the cycles, an event can occur with probability .001.
What is the probability of observing fewer than 15 events in a 100-second
time interval? During 100 seconds, 10,000 cycles will be observed. Letting
λ = 10,000(.001) = 10, the probability that k events will be observed is given
by the Poisson density p(k; 10). The required probability is

    Σ_{k=0}^{14} p(k; 10) = Σ_{k=0}^{14} (10^k e^{-10}) / k! ≈ .9165. ■
EXAMPLE 3.8 (Uniform Density Function)  For fixed n ∈ N, let Ω =
{1, 2, ..., n} and let f(k) = 1/n for k = 1, 2, ..., n. Then f satisfies 3.1 and
3.2, and there is a random variable X having f as its density function. ■
If A is any set of real numbers and X is a random variable, then

    (X ∈ A) = {ω : X(ω) ∈ A} = ∪_{x_j ∈ A} {ω : X(ω) = x_j}

belongs to ℱ since each set {ω : X(ω) = x_j} ∈ ℱ. Since the events in the
union are disjoint,

    P(X ∈ A) = Σ_{x_j ∈ A} P(X = x_j) = Σ_{x_j ∈ A} f_X(x_j).    (3.4)

This equation allows us to compute probabilities related to the random
variable X.
EXAMPLE 3.9  Suppose X has a geometric density with parameters p and
q = 1 - p. Then

    P(X ≥ 10) = P(X ∈ [10, ∞)) = Σ_{j=10}^∞ p q^{j-1} = q^9. ■
Let X be any random variable and let φ be a real-valued function on R.
Given any ω ∈ Ω, it makes sense to form the composite function φ(X(ω)).
This composite map is denoted by φ(X). The range of φ(X) is the countable
set {φ(x_1), φ(x_2), ...}. Let y be in the range of φ(X). To show that
(φ(X) = y) ∈ ℱ, let x_{j_1}, x_{j_2}, ... be those values of X for which φ(x_{j_i}) = y.
Then (φ(X) = y) = ∪_i (X = x_{j_i}) ∈ ℱ, since X is a random variable. This
shows that φ(X) is a random variable.
EXAMPLE 3.10  If X is a random variable, then X^2 is the random variable
defined for each ω by X^2(ω) = (X(ω))^2, sin X is the random variable defined
for each ω by (sin X)(ω) = sin(X(ω)), |X| is the random variable defined for
each ω by |X|(ω) = |X(ω)|, and so forth. ■
Given a random variable X and a real-valued function φ on R, how do we
determine the density function f_Z of the random variable Z = φ(X)? There
is no algorithm for generating the density function f_Z.
EXAMPLE 3.11  Let X be a random variable having the geometric density

    f_X(x) = p q^{x-1},    x = 1, 2, ...,

and let Y = min(X, 5). The range of Y is the set {1, 2, 3, 4, 5}. If X(ω) is 1, 2, 3,
or 4, then Y(ω) = X(ω) and f_Y(x) = f_X(x), x = 1, 2, 3, 4. If X(ω) ≥ 5, then
Y(ω) = min(X(ω), 5) = 5 so that (Y = 5) = (X ≥ 5) = ∪_{x=5}^∞ (X = x).
Therefore,

    f_Y(5) = Σ_{x=5}^∞ p q^{x-1} = q^4.

Thus,

    f_Y(y) = p q^{y-1}    if y = 1, 2, 3, 4
    f_Y(y) = q^4          if y = 5
    f_Y(y) = 0            otherwise. ■
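The truncated density above can be verified with a short sketch (not from the text; the value of p is arbitrary):

```python
def truncated_density(y, p, cutoff=5):
    # density of Y = min(X, cutoff) when X is geometric with parameter p
    q = 1 - p
    if 1 <= y < cutoff:
        return p * q ** (y - 1)   # f_Y(y) = f_X(y) for y below the cutoff
    if y == cutoff:
        return q ** (cutoff - 1)  # f_Y(5) = q^4, the entire tail mass
    return 0.0

p = 0.3
total = sum(truncated_density(y, p) for y in range(1, 6))
print(total)  # the five masses sum to 1
```

Collapsing the tail (X ≥ 5) onto the single value 5 is what makes the five masses sum exactly to 1.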
Consider now two random variables X and Y on the same probability space
with ranges {x_1, x_2, ...} and {y_1, y_2, ...}, respectively, and let ψ be a real-valued
function of two variables. Then ψ(X(ω), Y(ω)) for ω ∈ Ω defines a new
map from Ω to R that is denoted by ψ(X, Y). The range of ψ(X, Y) is the set
{ψ(x_i, y_j) : i ≥ 1, j ≥ 1}, which is a subset of the set ∪_{i≥1} {ψ(x_i, y_j) : j ≥ 1},
which is countable by Theorem 2.3.1, and is therefore countable. Let z be any
value of ψ(X, Y) and let (x_{i_1}, y_{j_1}), (x_{i_2}, y_{j_2}), ... be those ordered pairs for which
ψ(x_{i_k}, y_{j_k}) = z. Then

    (ψ(X, Y) = z) = ∪_k {ω : X(ω) = x_{i_k}, Y(ω) = y_{j_k}}
                  = ∪_k ((X = x_{i_k}) ∩ (Y = y_{j_k})) ∈ ℱ

since X and Y are random variables. Thus, Z = ψ(X, Y) is a random variable.
EXAMPLE 3.12  If X and Y are random variables, then X + Y, X - Y,
XY, X^2 + Y^2, max(X, Y), min(X, Y), sin XY, and so forth, are all random
variables. ■
More generally, if X_1, ..., X_n are n random variables and ψ is a real-valued
function of n variables, then ψ(X_1, ..., X_n) defined by

    ψ(X_1, ..., X_n)(ω) = ψ(X_1(ω), ..., X_n(ω))

for ω ∈ Ω is a random variable. Finding the density function of Z =
ψ(X_1, ..., X_n) can be difficult, depending upon ψ. We will show how this can
be done when n = 2 in some cases using the joint density of two random
variables.
Definition 3.3
Let X and Y be random variables with ranges {x_1, x_2, ...} and {y_1, y_2, ...},
respectively. The joint density function f_{X,Y} of X and Y is defined on the set of
ordered pairs {(x_i, y_j) : i ≥ 1, j ≥ 1} by

    f_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j),    i, j ≥ 1. ■
Equation 3.4 can be extended to two or more random variables. Let X and Y
be two random variables with ranges {x_1, x_2, ...} and {y_1, y_2, ...}, respectively,
and let A be any subset of R × R. Since ((X, Y) ∈ A) = ∪_{(x_i, y_j) ∈ A} (X = x_i, Y = y_j),

    P((X, Y) ∈ A) = Σ_{(x_i, y_j) ∈ A} P(X = x_i, Y = y_j) = Σ_{(x_i, y_j) ∈ A} f_{X,Y}(x_i, y_j).    (3.5)

In calculating probabilities pertaining to the pair X, Y, there is some latitude
in the choice of A. The set A can usually be defined by replacing X and Y by
typical values x_i and y_j, respectively.
EXAMPLE 3.13  Suppose two dice are rolled, one red and one white. Let
X be the number of pips on the red die, let Y be the number on the white die,
and let Z be the maximum of the two numbers; i.e., Z = max(X, Y). The
joint density of X and Y is

    f_{X,Y}(x, y) = 1/36,    x, y = 1, 2, ..., 6.

Suppose we want to find the joint density of X and Z. Both X and Z have
range {1, 2, ..., 6}. Let x and z be typical values of X and Z, respectively. Since
Z ≥ X, f_{X,Z}(x, z) = P(X = x, Z = z) = 0 if z < x. If z = x, then we
must have Y ≤ x. Hence, (X = x, Z = x) = (X = x, Y ≤ x). To put the
event (X = x, Y ≤ x) into the form ((X, Y) ∈ A), formally replace X by i
and Y by j in the first event to define A = {(i, j) : i = x, j ≤ x}. Thus, by
Equation 3.5,

    f_{X,Z}(x, z) = P(X = x, Z = x) = P((X, Y) ∈ A) = Σ_{i=x, 1≤j≤x} 1/36 = x/36

whenever z = x. If z > x, the event (X = x, Z = z) can occur only if
Y = z; i.e., (X = x, Z = z) = (X = x, Y = z) whenever z > x, and thus
f_{X,Z}(x, z) = P(X = x, Y = z) = 1/36 whenever z > x. In summary,

    f_{X,Z}(x, z) = 0       if z < x
    f_{X,Z}(x, z) = x/36    if z = x
    f_{X,Z}(x, z) = 1/36    if z > x. ■    (3.6)

Definition 3.4
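The three cases of this joint density can be confirmed by brute-force enumeration of the 36 equally likely outcomes; the following sketch (not from the text) uses exact rational arithmetic:

```python
from fractions import Fraction
from collections import defaultdict

# Enumerate the 36 outcomes (x, y) and tabulate f_{X,Z}(x, z)
# for Z = max(X, Y), confirming the three cases of Equation 3.6.
joint = defaultdict(Fraction)
for x in range(1, 7):
    for y in range(1, 7):
        joint[(x, max(x, y))] += Fraction(1, 36)

print(joint[(3, 3)])  # x/36 with x = 3, i.e. 3/36 = 1/12
print(joint[(2, 5)])  # 1/36, the z > x case
print(joint[(4, 2)])  # 0, since z < x is impossible
```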
The joint density function f_{X_1,...,X_n} of the random variables X_1, ..., X_n is the
real-valued function

    f_{X_1,...,X_n}(x_1, ..., x_n) = P(X_1 = x_1, ..., X_n = x_n). ■

Of course, if any x_i is not in the range of X_i, then the probability on the right
is zero. If it is clear from context, the joint density will be denoted simply by
f(x_1, ..., x_n), keeping in mind that the order of the variables in the argument
of f corresponds to the order of the random variables.
If the joint density f_{X_1,...,X_n} of X_1, ..., X_n is known, then the joint density
of any subcollection of the X_i's can be determined. For example, suppose we
want to determine the joint density of X_1, ..., X_{n-1}. Let {x_{n1}, x_{n2}, ...} be the
range of X_n. Then

    Ω = ∪_k (X_n = x_{nk}).

Intersecting both sides of this equation with (X_1 = x_1, ..., X_{n-1} = x_{n-1}),

    (X_1 = x_1, ..., X_{n-1} = x_{n-1}) = ∪_k (X_1 = x_1, ..., X_{n-1} = x_{n-1}, X_n = x_{nk}).

Since the events on the right are disjoint,

    f_{X_1,...,X_{n-1}}(x_1, ..., x_{n-1}) = Σ_k f_{X_1,...,X_n}(x_1, ..., x_{n-1}, x_{nk}).

Alternatively,

    f_{X_1,...,X_{n-1}}(x_1, ..., x_{n-1}) = Σ_{x_n} f_{X_1,...,X_n}(x_1, ..., x_{n-1}, x_n)    (3.7)

since f_{X_1,...,X_n}(x_1, ..., x_{n-1}, x_n) = 0 whenever x_n is not one of the values of
X_n. This procedure can be repeated as often as necessary to obtain the joint
density of any subcollection.
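Summing out a variable as in Equation 3.7 is mechanical; here is a small sketch (not from the text; the joint-density table below is made up purely for illustration):

```python
# Marginalizing a joint density table, as in Equation 3.7.
# The table of f_{X,Y} values below is a hypothetical example.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

def marginal_first(joint):
    # f_X(x) = sum over y of f_{X,Y}(x, y)
    out = {}
    for (x, y), prob in joint.items():
        out[x] = out.get(x, 0.0) + prob
    return out

fx = marginal_first(joint)
print(fx)  # x = 0 gets mass 0.3, x = 1 gets mass 0.7 (up to rounding)
```

The same loop with the roles of x and y swapped produces the marginal density of the second variable.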
EXAMPLE 3.14  A point with integer coordinates (X, Y) is chosen at
random from the triangle with vertices at (1, 1), (n, 1), and (n, n), where n is a
fixed positive integer. Since the total number of points with integer coordinates
(x, y) in the triangle is 1 + 2 + ... + n = (n(n + 1))/2, the density function of
the pair (X, Y) is

    f_{X,Y}(x, y) = 2/(n(n + 1))    if 1 ≤ y ≤ x, x = 1, 2, ..., n
    f_{X,Y}(x, y) = 0               otherwise.

For x = 1, 2, ..., n, the density f_X(x) is given by

    f_X(x) = Σ_{y=1}^x 2/(n(n + 1)) = 2x/(n(n + 1)).

Thus,

    f_X(x) = 2x/(n(n + 1))    if x = 1, 2, ..., n
    f_X(x) = 0                otherwise.

For y = 1, 2, ..., n, the density f_Y(y) is given by

    f_Y(y) = Σ_{x=y}^n 2/(n(n + 1)) = 2(n - y + 1)/(n(n + 1)).

Thus,

    f_Y(y) = 2(n - y + 1)/(n(n + 1))    if y = 1, 2, ..., n
    f_Y(y) = 0                          otherwise. ■

EXERCISES 3.2

1.
A bus tour operator uses a bus with a capacity of 45 passengers but sells
50 tickets. If one person out of 12 is a no-show, what is the probability
that everyone who shows up for the tour will be accommodated?
2. What is the maximum number of tickets the bus tour operator should
sell to be able to accommodate all that show up with probability at least
equal to .90?
3.
An electronic system has an operating cycle of 0.01 second. During
successive time intervals of length 5 × 10^{-6} second, an event may occur with
probability p = .0005. What is the approximate probability that fewer
than 8 events will occur during 10 cycles?
4. The random variables X and Y have the joint density function

    f_{X,Y}(x, y) = 2/(n(n + 1))    if 1 ≤ y ≤ -x + n + 1, 1 ≤ x ≤ n
    f_{X,Y}(x, y) = 0               otherwise,

where x and y are positive integers. Find f_X(x) and f_Y(y).
5. Suppose a pair of dice are rolled and Z is the larger of the number of
pips on each. What is the density of Z?

6. Denote the general term of the binomial density with parameters n and
p by

    b(k; n, p) = C(n, k) p^k q^{n-k},    k = 0, ..., n.

Find a recursion formula for calculating b(k; n, p) from b(k - 1; n, p),
and put the ratio of the two in the form 1 + .... For what value or
values of k is b(k; n, p) a maximum?
7. The two random variables X and Y have the joint density

    f_{X,Y}(x, y) = (e^{-(α+β)} α^x β^y) / (x! y!),    x, y = 0, 1, 2, ...,

where α, β > 0. What are the densities of X and Y?

8. Suppose the random variables X and Y have ranges {x_1, x_2, ...} and
{y_1, y_2, ...}, respectively, and their joint density has the form

    f_{X,Y}(x_i, y_j) = f(x_i) g(y_j),    i, j = 1, 2, ...

Express the densities of X and Y in terms of f and g.

9. The random variables X and Y have the joint density function tabulated
below. Find the densities of X and Y and calculate P(Y ≤ X).
            x = 1    x = 2    x = 3    x = 4
    y = 1    .03      .01      .07      .01
    y = 2    .07      .01      .06      .13
    y = 3    .04      .08      .06      .02
    y = 4    .06      .09      .03      .07
    y = 5    .05      .06      .03      .02
10. The cards of a deck are numbered 1, 2, ..., 50. The deck is thoroughly
shuffled and then two cards are dealt. Let X and Y be the numbers on
the first and second cards, respectively. Use Equation 3.5 to calculate
P(|X - Y| > 2).

The following problems require mathematical software such as Mathematica
or Maple V.

11. A jumbo jet with a capacity of 365 passengers is oversold by 10 tickets.
If one person out of 25 is a no-show, what is the probability that all of
those who show up will board the jet?
12. What is the maximum number of tickets the airline should sell to be
able to accommodate all who show up with probability .99?
3.3  INDEPENDENT RANDOM VARIABLES
The discussion of the empirical law in Chapter 1 does not include the phrase
“under identical conditions,” which is usually a part of the discussion. The
phrase means that an experiment should be repeated in such a way that
the outcome of a repetition should not be influenced by the outcome of
previous repetitions; i.e., it should be independent of previous outcomes of
the experiment. If X_1, X_2, ... denote the outcomes of successive repetitions,
then the events (X_1 = x_1), (X_2 = x_2), ... should be independent events. The
formal definition will be given in terms of joint densities.
Definition 3.5
The random variables X_1, ..., X_n are independent if

    f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) × f_{X_2}(x_2) × ... × f_{X_n}(x_n)

for all x_1, ..., x_n ∈ R. ■
That is, X_1, ..., X_n are independent if their joint density is the product
of their individual densities. We frequently use the fact that if X_1, ..., X_n
are independent random variables and {i_1, ..., i_k} is a subset of {1, 2, ..., n},
then X_{i_1}, ..., X_{i_k} are independent random variables. Consider, for example,
X_1, ..., X_{n-1}. By Equation 3.7,

    f_{X_1,...,X_{n-1}}(x_1, ..., x_{n-1}) = Σ_{x_n} f_{X_1,...,X_n}(x_1, ..., x_n)
        = Σ_{x_n} f_{X_1}(x_1) × ... × f_{X_{n-1}}(x_{n-1}) f_{X_n}(x_n)
        = f_{X_1}(x_1) × ... × f_{X_{n-1}}(x_{n-1}) Σ_{x_n} f_{X_n}(x_n)
        = f_{X_1}(x_1) × ... × f_{X_{n-1}}(x_{n-1}).

Thus, X_1, ..., X_{n-1} are independent random variables.
EXAMPLE 3.15  Consider the roll of two dice, one red and one white. Let
X be the number of pips on the red die and Y the number on the white die. Since
f_X(x) = P(X = x) = 1/6 for x = 1, ..., 6, f_Y(y) = P(Y = y) = 1/6 for
y = 1, ..., 6, and f_{X,Y}(x, y) = 1/36 for 1 ≤ x, y ≤ 6,

    f_{X,Y}(x, y) = f_X(x) f_Y(y)    for all x, y ∈ R.

Thus, X and Y are independent random variables. ■
Independence of random variables X and Y implies much more. Suppose
A and B are any sets of real numbers and the ranges of X and Y are {x_1, x_2, ...}
and {y_1, y_2, ...}, respectively. Then

    P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).

This follows from independence, since

    P(X ∈ A, Y ∈ B) = Σ_{x_i ∈ A, y_j ∈ B} f_{X,Y}(x_i, y_j)
        = Σ_{x_i ∈ A, y_j ∈ B} f_X(x_i) f_Y(y_j)
        = (Σ_{x_i ∈ A} f_X(x_i)) (Σ_{y_j ∈ B} f_Y(y_j))
        = P(X ∈ A) P(Y ∈ B).

For example,

    P(X > x, Y > y) = P(X > x) P(Y > y).

More generally, if X_1, X_2, ..., X_n are independent random variables and
A_1, A_2, ..., A_n are any sets of real numbers, then

    P(X_1 ∈ A_1, ..., X_n ∈ A_n) = P(X_1 ∈ A_1) × ... × P(X_n ∈ A_n).
EXAMPLE 3.16  Consider n Bernoulli trials with probability of success
p. For j = 1, ..., n, let X_j = 1 if there is a success on the jth trial and
let X_j = 0 if there is a failure on the jth trial. Note that the set (X_j = x_j)
involves a condition imposed solely on the jth trial. It follows that the events
(X_1 = x_1), ..., (X_n = x_n) are mutually independent, and therefore

    P(X_1 = x_1, ..., X_n = x_n) = P(X_1 = x_1) × ... × P(X_n = x_n);

that is,

    f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) × ... × f_{X_n}(x_n),

and therefore X_1, ..., X_n are independent random variables. ■
Example 3.16 is a special case of a more general model. Instead of allowing
each trial to have just two outcomes 0 and 1, we could allow r outcomes
on each trial. For example, in eight repeated tosses of a die, the outcome
of each toss is one of the integers 1, 2, ..., 6. A typical outcome might look
like ω = (3, 5, 2, 1, 2, 3, 3, 4). As in the Bernoulli case, we can define random
variables X_1, X_2, ..., X_8 to specify outcomes of individual tosses; i.e., for this
outcome, X_1(ω) = 3, X_2(ω) = 5, ..., X_8(ω) = 4. In the Bernoulli case,
we also counted the number of successes; in tossing a die, we could count
the number Y_i of times the outcome i appears among the eight tosses of the
die. For the above outcome, Y_1(ω) = 1, Y_2(ω) = 2, Y_3(ω) = 3, Y_4(ω) = 1,
Y_5(ω) = 1, Y_6(ω) = 0.
EXAMPLE 3.17 (Multinomial Density Function)  Consider a basic
experiment with r outcomes that we choose to label as 1, 2, ..., r having
probabilities p_1, ..., p_r, respectively. Consider the compound experiment
of n independent repetitions of this basic experiment. An outcome ω of
the compound experiment is an ordered n-tuple ω = (i_1, ..., i_n) where
each i_j ∈ {1, 2, ..., r}. We associate with each ω = (i_1, ..., i_n) the weight
p(ω) = p_{i_1} × ... × p_{i_n}. Letting Ω be the collection of all such outcomes,
Ω is finite and we can take for ℱ the collection of all subsets of Ω. For
j = 1, ..., n, we can define a random variable X_j by putting X_j(ω) = i_j
whenever ω = (i_1, ..., i_n). Probabilities have been defined so that

    P(X_1 = i_1, ..., X_n = i_n) = p_{i_1} × ... × p_{i_n}

for i_j ∈ {1, ..., r}, j = 1, ..., n. Since Σ_{j=1}^r p_j = 1,

    P(X_1 = i_1) = Σ_{i_2,...,i_n = 1}^r P(X_1 = i_1, X_2 = i_2, ..., X_n = i_n)
        = Σ_{i_2,...,i_n = 1}^r p_{i_1} × ... × p_{i_n}
        = Σ_{i_2 = 1}^r ... Σ_{i_{n-1} = 1}^r p_{i_1} × ... × p_{i_{n-1}} (Σ_{i_n = 1}^r p_{i_n})
        = Σ_{i_2 = 1}^r ... Σ_{i_{n-1} = 1}^r p_{i_1} × ... × p_{i_{n-1}}
        = ... = p_{i_1}.

Similarly, P(X_j = i_j) = p_{i_j}, j = 1, ..., n. Thus,

    P(X_1 = i_1, ..., X_n = i_n) = P(X_1 = i_1) × ... × P(X_n = i_n),

and X_1, ..., X_n are independent random variables. For 1 ≤ k ≤ r, let
Y_k(ω) be the number of k's in the outcome ω = (i_1, ..., i_n). Note that
Y_1 + ... + Y_r = n. Let n_1, ..., n_r be nonnegative integers with n = n_1 + ... + n_r.
Any outcome ω for which the number of 1's is n_1, the number of 2's is n_2, and
so on, has probability p_1^{n_1} × ... × p_r^{n_r}. Since there are many outcomes fitting
this description,

    P(Y_1 = n_1, ..., Y_r = n_r) = C(n; n_1, ..., n_r) p_1^{n_1} × ... × p_r^{n_r}

where C(n; n_1, ..., n_r) is the number of such outcomes. We can calculate this
constant as follows. The number of ways of selecting n_1 positions out of the n
positions to be filled with 1's is C(n, n_1); having done this, the number of ways
of selecting n_2 positions out of the remaining n - n_1 to be filled with 2's is
C(n - n_1, n_2), and so forth. Thus,

    P(Y_1 = n_1, ..., Y_r = n_r) = C(n, n_1) C(n - n_1, n_2) × ... × C(n - n_1 - ... - n_{r-1}, n_r) p_1^{n_1} × ... × p_r^{n_r}.

Expressing the binomial coefficients in terms of factorials and simplifying,

    P(Y_1 = n_1, ..., Y_r = n_r) = (n! / (n_1! × ... × n_r!)) p_1^{n_1} × ... × p_r^{n_r}.

This joint density of Y_1, ..., Y_r is called the multinomial density. ■
EXAMPLE 3.18  Suppose a die is tossed 12 times in succession. The
probability that there will be two 1's, one 2, four 3's, one 4, three 5's, and one
6 is

    P(Y_1 = 2, Y_2 = 1, Y_3 = 4, Y_4 = 1, Y_5 = 3, Y_6 = 1) = (12! / (2! 1! 4! 1! 3! 1!)) (1/6)^{12} ≈ .00076. ■
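The factorial formula and the numerical value above can be checked with a short sketch (not from the text) using exact rational arithmetic:

```python
from math import factorial
from fractions import Fraction

def multinomial_pmf(counts, probs):
    # n! / (n_1! ... n_r!) * p_1^{n_1} ... p_r^{n_r}
    n = sum(counts)
    coef = factorial(n)
    prob = Fraction(1)
    for n_k, p_k in zip(counts, probs):
        coef //= factorial(n_k)
        prob *= Fraction(p_k) ** n_k
    return coef * prob

# Example 3.18: twelve tosses of a fair die with the stated counts.
p = multinomial_pmf([2, 1, 4, 1, 3, 1], [Fraction(1, 6)] * 6)
print(float(p))  # approximately .00076
```

The multinomial coefficient here is 12!/(2! 1! 4! 1! 3! 1!) = 1,663,200, and each qualifying outcome has probability (1/6)^12.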
Definition 3.6
The random variables of the sequence X_1, X_2, ... are independent if for every
n ≥ 1, X_1, X_2, ..., X_n are independent random variables. ■
Definition 3.7
The sequence of random variables X_1, X_2, ... is called an infinite sequence of
Bernoulli random variables with probability of success p if they are independent,
P(X_j = 1) = p, and P(X_j = 0) = 1 - p for all j ≥ 1. ■
EXAMPLE 3.19  Consider an infinite sequence of Bernoulli trials with
probability of success p as described in Example 2.14. For j ≥ 1, let X_j = 1 if
there is a success on the jth trial and let X_j = 0 if there is a failure on the jth
trial. It was shown in Example 2.22 that the events (X_1 = x_1), ..., (X_n = x_n)
are mutually independent for every n ≥ 1. Thus, the random variables of the
sequence X_1, X_2, ... are independent. It was shown in Example 2.22 that

    P(X_1 = x_1, ..., X_n = x_n) = P(X_1 = x_1) × ... × P(X_n = x_n).

The probability P(X_2 = x_1, ..., X_{n+1} = x_n) is also equal to the product on
the right side of this equation. More generally, we have the following property
of the joint density of n consecutive X_j's:

    f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_{k+1},...,X_{k+n}}(x_1, ..., x_n)    (3.8)

for all n, k ≥ 1; i.e., the joint density of X_{k+1}, ..., X_{k+n} is independent of
k. In particular, the probability of getting r successes in the first n trials is the
same as the probability of getting r successes in any n trials. ■
Theorem 3.3.1
Let X and Y be independent random variables with ranges {x_1, x_2, ...} and
{y_1, y_2, ...}, respectively, and let Z = X + Y. Then

    f_Z(z) = Σ_i f_X(x_i) f_Y(z - x_i) = Σ_j f_X(z - y_j) f_Y(y_j).    (3.9)

PROOF: For fixed z ∈ R, f_Z(z) = P(Z = z) = P(X + Y = z). Now
stratify the event (X + Y = z) according to the values of X; i.e., write

    (X + Y = z) = ∪_i (X + Y = z, X = x_i).

Since (X + Y = z, X = x_i) = (Y = z - x_i, X = x_i),

    (X + Y = z) = ∪_i (X = x_i, Y = z - x_i).

Since the events on the right are disjoint,

    P(X + Y = z) = Σ_i P(X = x_i, Y = z - x_i).

By independence of X and Y,

    f_Z(z) = Σ_i f_{X,Y}(x_i, z - x_i) = Σ_i f_X(x_i) f_Y(z - x_i).

A similar argument applies to the second assertion. ■
If the random variables of Theorem 3.3.1 are of a particular type, the
formula takes on a simpler form.

Definition 3.8
The random variable X is nonnegative integer-valued if f_X(x) = 0 whenever
x ∉ {0, 1, 2, ...}. ■

Theorem 3.3.2
If X and Y are independent nonnegative integer-valued random variables and
Z = X + Y, then

    f_Z(z) = Σ_{x=0}^z f_X(x) f_Y(z - x)    if z = 0, 1, 2, ...
    f_Z(z) = 0                              otherwise.

PROOF: Note that Z is also nonnegative integer-valued. By Theorem 3.3.1,
for z = 0, 1, 2, ...,

    f_Z(z) = Σ_{x=0}^∞ f_X(x) f_Y(z - x).

Note that f_Y(z - x) = 0 whenever x > z, and so the infinite upper limit can be replaced
by z. ■
EXAMPLE 3.20  Let X and Y be independent random variables having
Poisson densities with parameters λ_1 and λ_2, respectively. Let Z = X + Y.
Consider any z ∈ {0, 1, 2, ...}. By Theorem 3.3.2 and the binomial theorem,

    f_Z(z) = Σ_{x=0}^z (λ_1^x e^{-λ_1} / x!)(λ_2^{z-x} e^{-λ_2} / (z - x)!)
           = (e^{-(λ_1+λ_2)} / z!) Σ_{x=0}^z C(z, x) λ_1^x λ_2^{z-x}
           = ((λ_1 + λ_2)^z e^{-(λ_1+λ_2)}) / z!.

It follows that Z has a Poisson density with parameter λ_1 + λ_2. ■
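The convolution formula of Theorem 3.3.2 and the closure property just derived can both be checked numerically; this is a small sketch (not from the text; the parameter values are arbitrary):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # p(k; lam) = lam^k e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

def convolve_pmf(f, g, z):
    # Theorem 3.3.2: f_Z(z) = sum_{x=0}^{z} f_X(x) f_Y(z - x)
    return sum(f(x) * g(z - x) for x in range(z + 1))

lam1, lam2 = 1.5, 2.5
for z in range(6):
    direct = poisson_pmf(z, lam1 + lam2)
    conv = convolve_pmf(lambda k: poisson_pmf(k, lam1),
                        lambda k: poisson_pmf(k, lam2), z)
    assert abs(direct - conv) < 1e-12
print("convolution of Poisson(1.5) and Poisson(2.5) matches Poisson(4.0)")
```

The same convolve_pmf helper applies verbatim to the binomial case of the next example.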
EXAMPLE 3.21  Let X and Y be independent random variables having
binomial densities b(·; m, p) and b(·; n, p), respectively. The range of Z =
X + Y is then {0, 1, ..., m + n}. Suppose z is in the range of Z. By Theorem
3.3.2 and Equation 1.11,

    f_Z(z) = Σ_{x=0}^z b(x; m, p) b(z - x; n, p)
           = Σ_{x=0}^z C(m, x) p^x q^{m-x} C(n, z - x) p^{z-x} q^{n-z+x}
           = p^z q^{(m+n)-z} Σ_{x=0}^z C(m, x) C(n, z - x)
           = C(m + n, z) p^z q^{(m+n)-z}.

Thus, Z has the binomial density b(·; m + n, p). ■
The last two examples have extensions to sums of finitely many random
variables. The following lemma will be needed. Recall that a density function
f_X(x) is zero when x is not in the range of X.
Lemma 3.3.3
If X_1, X_2, ..., X_n are independent random variables, then X_1 + X_2 + ... + X_{n-1}
and X_n are independent random variables.

PROOF: (n = 4 case) Let α and β be values of X_1 + X_2 + X_3 and X_4,
respectively. Then

    f_{X_1+X_2+X_3, X_4}(α, β)
        = P(X_1 + X_2 + X_3 = α, X_4 = β)
        = Σ_{x_3} P(X_1 + X_2 + X_3 = α, X_3 = x_3, X_4 = β)
        = Σ_{x_3} P(X_1 + X_2 = α - x_3, X_3 = x_3, X_4 = β)
        = Σ_{x_3} Σ_{x_2} P(X_1 = α - x_3 - x_2, X_2 = x_2, X_3 = x_3, X_4 = β).

Since X_1, X_2, X_3, and X_4 are independent random variables,

    f_{X_1+X_2+X_3, X_4}(α, β)
        = Σ_{x_3} Σ_{x_2} P(X_1 = α - x_3 - x_2) P(X_2 = x_2) P(X_3 = x_3) P(X_4 = β).

Using the fact that X_1, X_2, and X_3 are independent random variables,

    f_{X_1+X_2+X_3, X_4}(α, β)
        = Σ_{x_3} Σ_{x_2} P(X_1 = α - x_3 - x_2, X_2 = x_2, X_3 = x_3) P(X_4 = β)
        = (Σ_{x_3} Σ_{x_2} P(X_1 + X_2 + X_3 = α, X_2 = x_2, X_3 = x_3)) P(X_4 = β).

Applying Equation 3.7 two times in succession,

    f_{X_1+X_2+X_3, X_4}(α, β) = f_{X_1+X_2+X_3}(α) f_{X_4}(β). ■
This result is a special case of a more general result. Consider a collection
X_1, ..., X_n, X_{n+1}, ..., X_{n+m} of independent random variables. Let φ and ψ be
real-valued functions of n and m variables, respectively. Then φ(X_1, ..., X_n)
and ψ(X_{n+1}, ..., X_{n+m}) are independent random variables. Any potential
for gaining insight into probability theory by proving this result would be
overwhelmed by the cumbersome notation required at this stage.
Theorem 3.3.4
Let X_1, X_2, ..., X_k be independent random variables.

(i) If each X_i has a binomial density with parameters n_i and p, then
X_1 + ... + X_k has a binomial density with parameters n_1 + ... + n_k
and p.

(ii) If each X_i has a Poisson density with parameter λ_i, then X_1 + ... + X_k
has a Poisson density with parameter λ_1 + ... + λ_k.

(iii) If each X_i has a negative binomial density with parameters r_i and p,
then X_1 + ... + X_k has a negative binomial density with parameters
r_1 + ... + r_k and p.

Assertions (i) and (ii) were proved in the case of two summands in the last two
examples; the general cases use these results and a mathematical induction
argument. Assertion (iii) is left as an exercise in the case of two summands, the general
case again being an easy application of mathematical induction.
The problem of finding the density function of Z = φ(X_1, ..., X_n) can
be difficult. Sometimes independence makes it possible, as in the following
example.
EXAMPLE 3.22  Let X and Y be independent random variables each of
which has a uniform density on {1, 2, ..., n} and let Z = max(X, Y). The
range of Z is then {1, 2, ..., n}. Suppose z is in the range. Rather than
calculating f_Z(z) = P(Z = z), we calculate P(Z ≤ z) for reasons that will
become apparent.

    P(Z ≤ z) = P(max(X, Y) ≤ z)
             = P(X ≤ z, Y ≤ z)
             = Σ_{x=1}^z Σ_{y=1}^z P(X = x, Y = y)
             = Σ_{x=1}^z Σ_{y=1}^z P(X = x) P(Y = y)
             = z^2 / n^2.

By Equation 2.8, for 2 ≤ z ≤ n,

    P(Z = z) = P((Z ≤ z) ∩ (Z ≤ z - 1)^c)
             = P(Z ≤ z) - P((Z ≤ z) ∩ (Z ≤ z - 1))
             = P(Z ≤ z) - P(Z ≤ z - 1)
             = (2z - 1) / n^2.

It is easy to see that this result holds for z = 1 also. Thus,

    f_Z(z) = (2z - 1)/n^2    if z = 1, 2, ..., n
    f_Z(z) = 0               otherwise. ■    (3.10)

EXERCISES 3.3
Problem 1 requires the following fact from the calculus. If Σ_{n=0}^∞ a_n x^n
and Σ_{n=0}^∞ b_n x^n are power series, the product of their sums can be written
Σ_{n=0}^∞ c_n x^n, where c_n = Σ_{k=0}^n a_k b_{n-k} for n ≥ 0, on the common interval
of convergence.

1. If a and b are any real numbers and z is a nonnegative integer, show
that

    Σ_{x=0}^z C(a, x) C(b, z - x) = C(a + b, z).
2. Let X and Y be independent random variables having negative binomial
densities with parameters r and p and s and p, respectively. Derive the
density of Z = X + Y.

3. Let X and Y be independent random variables each having a uniform
density on {1, 2, ..., n}. Calculate P(X > Y) and P(X = Y).

4. Let N be a random variable having a Poisson density with parameter
λ > 0. Given that N = n, n Bernoulli trials are performed, the
number X of successes is counted, and the number Y of failures is
counted. Show that X and Y are independent random variables.
5. Let X and Y be independent random variables having geometric
densities with the same parameter p. Calculate P(X ≥ Y) and
P(X = Y).

6. Let X and Y be as in Problem 5. Find the density of Z = X + Y.

7. Let X and Y be independent random variables having uniform densities
on {1, 2, ..., n} and let Z = X + Y. Find the density of Z.

8. Let X and Y be independent random variables and let φ and ψ be two
real-valued functions on R. Show that φ(X) and ψ(Y) are independent
random variables.
Solving the following problem without the benefit of Mathematica or Maple
V software would be tedious.

9. Let X and Y be independent random variables with X having a binomial
density b(·; 10, 1/2) and Y having a uniform density on {1, 2, 3}. Find
the density of Z = X + Y accurate to three decimal places.
3.4  GENERATING FUNCTIONS
In some instances, the problem of finding the density function of a sum of two
random variables can be transformed into a purely algebraic problem using
generating functions.
Definition 3.9
Let {a_j}_{j=0}^∞ be a sequence of real numbers. If the power series Σ_{j=0}^∞ a_j t^j has
(-t_0, t_0) as its interval of convergence for some t_0 > 0, then the function A(t) =
Σ_{j=0}^∞ a_j t^j is called its generating function. ■
EXAMPLE 3.23  If a_j = 1 for all j ≥ 0, then A(t) = Σ_{j=0}^∞ t^j has
(-1, 1) as its interval of convergence and A(t) = 1/(1 - t). If a_j = 1/j! for all
j ≥ 0, then A(t) = Σ_{j=0}^∞ t^j/j! has (-∞, ∞) as its interval of convergence and
A(t) = e^t. If a_0 = a_1 = 0 and a_j = 1 for all j ≥ 2, then A(t) = Σ_{j=2}^∞ t^j
has (-1, 1) as its interval of convergence and A(t) = t^2/(1 - t). ■
Returning to the notation of Definition 3.9, if there is an M ∈ R such that
|a_j| ≤ M for all j ≥ 0, then the series Σ_{j=0}^∞ a_j t^j converges absolutely at least
for -1 < t < 1, since the general term of the series Σ_{j=0}^∞ |a_j t^j| is dominated
by the general term of the series Σ_{j=0}^∞ M|t|^j, which is known to converge for
|t| < 1, and thus the interval of convergence of Σ_{j=0}^∞ a_j t^j contains the interval
(-1, 1).
An important result about generating functions is the fact that if a function
can be represented as the sum of a power series on an open interval containing
0, then that representation is unique; i.e., if
f(t) = Σ_{j=0}^∞ a_j t^j = Σ_{j=0}^∞ b_j t^j

on an open interval containing 0, then a_j = b_j for all j ≥ 0.

EXAMPLE 3.24
Suppose we have found that a generating function is given by

A(t) = 1/(1 − t²), |t| < 1.

What is the sequence {a_j}_{j=0}^∞? We can interpret 1/(1 − t²) as the sum of the
geometric series Σ_{j=0}^∞ t^{2j}. Thus,

A(t) = Σ_{j=0}^∞ a_j t^j = Σ_{j=0}^∞ t^{2j},

and corresponding coefficients of t^j are equal. Noting that coefficients of even
powers of t on the right are equal to 1 and coefficients of odd powers of t on
the right are 0, a_j = (1/2)(1 − (−1)^{j+1}), j = 0, 1, 2, .... ■
Another important property of power series is the following. If {a_j}_{j=0}^∞ and
{b_j}_{j=0}^∞ are sequences of real numbers and c_j = a₀b_j + ··· + a_j b₀, j ≥ 0, then
the power series Σ_{j=0}^∞ c_j t^j converges absolutely on the common interval of
convergence of the series Σ_{j=0}^∞ a_j t^j and Σ_{j=0}^∞ b_j t^j, and

Σ_{j=0}^∞ c_j t^j = (Σ_{j=0}^∞ a_j t^j)(Σ_{j=0}^∞ b_j t^j).
It is important to remember that the method of generating functions applies
only to nonnegative integer-valued random variables.
Definition 3.10
If X is a nonnegative integer-valued random variable with density f_X, its generating
function is the function f̂_X on [−1, 1] defined by

f̂_X(t) = Σ_{x=0}^∞ f_X(x) t^x, −1 ≤ t ≤ 1. ■

Note that the generating function of X is the same as the generating function
of the sequence {f_X(x)}_{x=0}^∞. Since f_X(x) ≤ 1 for all x ≥ 0, f̂_X(t) is certainly
defined on (−1, 1). But since Σ_{x=0}^∞ f_X(x) = 1, the power series converges
absolutely when |t| = 1. Thus, f̂_X is defined on [−1, 1].
EXAMPLE 3.25
Let X have a geometric density with parameter p, 0 < p < 1. Then

f̂_X(t) = Σ_{x=1}^∞ p q^{x−1} t^x = pt Σ_{y=0}^∞ (qt)^y = pt/(1 − qt). ■

EXAMPLE 3.26
Let X have a binomial density with parameters n and p. Then

f̂_X(t) = Σ_{x=0}^n C(n, x) p^x q^{n−x} t^x = Σ_{x=0}^n C(n, x) (pt)^x q^{n−x} = (pt + q)^n. ■
EXAMPLE 3.27
Let X have a Poisson density with parameter λ > 0. Then

f̂_X(t) = Σ_{x=0}^∞ (e^{−λ} λ^x / x!) t^x = e^{−λ} Σ_{x=0}^∞ (λt)^x / x! = e^{−λ} e^{λt} = e^{λ(t−1)}. ■

EXAMPLE 3.28
Let X have a negative binomial density with parameters r and p. Then

f̂_X(t) = Σ_{x=0}^∞ C(−r, x) p^r (−q)^x t^x = p^r Σ_{x=0}^∞ C(−r, x) (−qt)^x.

By the generalized binomial theorem, f̂_X(t) = p^r (1 − qt)^{−r}. ■
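The closed forms in Examples 3.25 through 3.27 can be spot-checked numerically by truncating the defining power series. The following Python sketch (the helper name gf_from_density is our own, not from the text) evaluates each truncated series at a sample point t and compares it with the corresponding closed form:

```python
import math

def gf_from_density(f, t, terms=100):
    """Evaluate f_hat(t) = sum_x f(x) t^x by truncating the series."""
    return sum(f(x) * t**x for x in range(terms))

p, q, lam, n, t = 0.3, 0.7, 2.5, 8, 0.6   # arbitrary sample parameters

# Geometric on {1, 2, ...}: f(x) = p q^(x-1); closed form pt/(1 - qt).
geom = gf_from_density(lambda x: p * q**(x - 1) if x >= 1 else 0.0, t)
assert abs(geom - p * t / (1 - q * t)) < 1e-9

# Binomial: f(x) = C(n, x) p^x q^(n-x); closed form (pt + q)^n.
bino = gf_from_density(
    lambda x: math.comb(n, x) * p**x * q**(n - x) if x <= n else 0.0, t)
assert abs(bino - (p * t + q)**n) < 1e-9

# Poisson: f(x) = e^(-lam) lam^x / x!; closed form e^(lam (t - 1)).
pois = gf_from_density(
    lambda x: math.exp(-lam) * lam**x / math.factorial(x), t)
assert abs(pois - math.exp(lam * (t - 1))) < 1e-9
print("all three closed forms agree with the truncated series")
```

Truncation at 100 terms is more than enough here, since the neglected tails are astronomically small for these parameter values.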
The following theorem is one justification for introducing generating
functions.
Theorem 3.4.1
If X and Y are independent nonnegative integer-valued random variables and
Z = X + Y, then f̂_Z(t) = f̂_X(t) f̂_Y(t) for all t ∈ [−1, 1].
PROOF: First note that

f̂_X(t) f̂_Y(t) = (Σ_{x=0}^∞ f_X(x) t^x)(Σ_{y=0}^∞ f_Y(y) t^y) = Σ_{z=0}^∞ c_z t^z,

where c_z = Σ_{x=0}^z f_X(x) f_Y(z − x). By Theorem 3.3.2,

Σ_{z=0}^∞ c_z t^z = Σ_{z=0}^∞ f_Z(z) t^z.

Thus, f̂_Z(t) = f̂_X(t) f̂_Y(t). ■
EXAMPLE 3.29
Let X be a random variable taking on the values 1, 2, 3
with probabilities .02, .53, .45, respectively, and let Y be a random variable
taking on the values 1, 2, 3, 4 with equal probabilities. What is the density
of Z = X + Y, assuming that X and Y are independent? The generating
function of X is f̂_X(t) = .02t + .53t² + .45t³, the generating function of Y is
f̂_Y(t) = .25t + .25t² + .25t³ + .25t⁴, and the generating function of Z is

f̂_Z(t) = f̂_X(t) f̂_Y(t)
= (.02t + .53t² + .45t³)(.25t + .25t² + .25t³ + .25t⁴)
= .005t² + .1375t³ + .25t⁴ + .25t⁵ + .245t⁶ + .1125t⁷.

Therefore, f_Z(2) = .005, f_Z(3) = .1375, f_Z(4) = f_Z(5) = .25, f_Z(6) = .245,
and f_Z(7) = .1125. ■
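The polynomial multiplication in Example 3.29 is just a convolution of the two density vectors, which is easy to carry out directly. A minimal Python sketch (the function name convolve is our own):

```python
def convolve(f, g):
    """Density of X + Y for independent X, Y: coefficient-wise product
    of the generating functions, i.e. the convolution of the densities."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

# Densities indexed by value, starting at 0 (P(X = 0) = P(Y = 0) = 0 here).
fX = [0, .02, .53, .45]
fY = [0, .25, .25, .25, .25]
fZ = convolve(fX, fY)
print([round(p, 4) for p in fZ[2:]])   # densities f_Z(2), ..., f_Z(7)
```

The printed list reproduces the coefficients read off from f̂_Z(t) in the example.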
Corollary 3.4.2
If X₁, ..., X_n are independent nonnegative integer-valued random variables and
Z = X₁ + ··· + X_n, then f̂_Z(t) = Π_{i=1}^n f̂_{X_i}(t).

PROOF: The statement is trivially true when n = 1. Assume it is true for
n − 1. Since X₁ + ··· + X_{n−1} and X_n are independent by Lemma 3.3.3,

f̂_Z(t) = f̂_{X₁+···+X_{n−1}}(t) · f̂_{X_n}(t)

by Theorem 3.4.1. By the induction hypothesis, f̂_{X₁+···+X_{n−1}}(t) = Π_{i=1}^{n−1} f̂_{X_i}(t).
Therefore, f̂_Z(t) = Π_{i=1}^n f̂_{X_i}(t). ■
This corollary provides an alternative proof of the three assertions in
Theorem 3.3.4.
EXAMPLE 3.30 Let X₁, ..., X_k be independent random variables and let
Z = X₁ + ··· + X_k. If each X_i has a negative binomial density with parameters
r_i and p, then

f̂_Z(t) = Π_{i=1}^k p^{r_i} (1 − qt)^{−r_i} = p^r (1 − qt)^{−r},

where r = r₁ + ··· + r_k. But by the generalized binomial theorem,

p^r (1 − qt)^{−r} = Σ_{z=0}^∞ C(−r, z) p^r (−q)^z t^z.

Therefore,

f̂_Z(t) = Σ_{z=0}^∞ C(−r, z) p^r (−q)^z t^z,

and it follows that

f_Z(z) = C(−r, z) p^r (−q)^z, z = 0, 1, 2, ...;

i.e., Z has a negative binomial density with parameters r = r₁ + ··· + r_k
and p. ■
Having tediously calculated the probabilities p(x) of getting a score of x
upon rolling three dice in Exercise 1.4.3, the reader will appreciate the ease
with which these probabilities can be calculated using generating functions.
Let X₁, X₂, and X₃ denote the number of pips on each of the dice and let
X = X₁ + X₂ + X₃. The generating function of each X_i is

f̂_{X_i}(t) = (1/6)(t + t² + ··· + t⁶).

Since X₁, X₂, and X₃ are independent, by Corollary 3.4.2,

f̂_X(t) = (1/6³)(t + t² + ··· + t⁶)³.
Expanding the expression on the right side using mathematical software,

f̂_X(t) = (1/216)t³ + (1/72)t⁴ + (1/36)t⁵ + (5/108)t⁶ + (5/72)t⁷ + (7/72)t⁸
+ (25/216)t⁹ + (1/8)t¹⁰ + (1/8)t¹¹ + (25/216)t¹² + (7/72)t¹³ + (5/72)t¹⁴
+ (5/108)t¹⁵ + (1/36)t¹⁶ + (1/72)t¹⁷ + (1/216)t¹⁸.

Since f_X(x) is just the coefficient of t^x, the probabilities can be read off; e.g.,
P(X = 13) = 7/72.
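The same expansion can be done exactly with rational arithmetic by convolving the density of one die with itself twice; the sketch below is a hypothetical Python substitute for the Mathematica/Maple expansion mentioned above:

```python
from fractions import Fraction

# P(pips = x) for x = 0, ..., 6; index 0 carries probability 0.
die = [Fraction(0)] + [Fraction(1, 6)] * 6

def convolve(f, g):
    """Convolution of two densities (product of generating functions)."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

# Density of the score X = X1 + X2 + X3; score[x] = P(X = x).
score = convolve(convolve(die, die), die)
assert score[13] == Fraction(7, 72)
assert sum(score) == 1
print(score[3:])   # exact probabilities for scores 3 through 18
```

The printed coefficients agree term by term with the expansion of f̂_X(t) above.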
Generating functions are particularly useful for solving difference equations,
as in the following example.
EXAMPLE 3.31 Consider an infinite sequence of Bernoulli trials with
probability of success p. For each n ≥ 1, let p_n be the probability of the event
E_n = “an even number of successes in the first n trials.” We will use the fact
that the probability of an even number of successes in trials 1, ..., n − 1 is the
same as the probability of an even number of successes in trials 2, ..., n. If the
first trial results in a failure, in order for an outcome to be in E_n there must
be an even number of successes in trials 2, ..., n, and if the first trial results
in success, there must be an odd number of successes in trials 2, ..., n. Thus,

p_n = q p_{n−1} + p(1 − p_{n−1}).

Decomposing E_n in this way makes sense only for n ≥ 2. In one trial there
is only one way to get an even number of successes, namely none at all, by that
trial resulting in failure. Thus, p₁ = q. If the equation above is to hold when
n = 1, we must have q = p₁ = q p₀ + p(1 − p₀), and so p₀ must be taken
equal to 1. Therefore, the p_n satisfy the difference equation

p_n = q p_{n−1} + p(1 − p_{n−1}), n ≥ 1, (3.11)
and the initial condition p₀ = 1. To solve the equation subject to the initial
condition, let P(t) be the generating function of the sequence {p_n}_{n=0}^∞; i.e.,
P(t) = Σ_{n=0}^∞ p_n t^n. Multiplying both sides of Equation 3.11 by t^n and
summing over n = 1, 2, ...,

Σ_{n=1}^∞ p_n t^n = qt Σ_{n=1}^∞ p_{n−1} t^{n−1} + pt Σ_{n=1}^∞ t^{n−1} − pt Σ_{n=1}^∞ p_{n−1} t^{n−1}
= qt Σ_{n=0}^∞ p_n t^n + pt Σ_{n=0}^∞ t^n − pt Σ_{n=0}^∞ p_n t^n.

Since Σ_{n=1}^∞ p_n t^n = P(t) − p₀ = P(t) − 1 and Σ_{n=0}^∞ t^n = 1/(1 − t),

P(t) − 1 = qt P(t) + pt/(1 − t) − pt P(t).
Solving for P(t),

P(t) = 1/(1 − qt + pt) + pt/((1 − t)(1 − qt + pt)).

Applying the method of partial fraction expansions to the second term on the
right,

P(t) = 1/(1 − qt + pt) + (p/(1 − q + p)) · 1/(1 − t) − (p/(1 − q + p)) · 1/(1 − qt + pt).

Since 1 − q + p = 2p,

P(t) = (1/2) · 1/(1 − t) + (1/2) · 1/(1 − qt + pt)

and

2P(t) = 1/(1 − t) + 1/(1 − qt + pt).

Regarding the two terms on the right as sums of geometric series (note that
1 − qt + pt = 1 − (q − p)t),

Σ_{n=0}^∞ 2p_n t^n = Σ_{n=0}^∞ t^n + Σ_{n=0}^∞ (q − p)^n t^n = Σ_{n=0}^∞ (1 + (q − p)^n) t^n.

Equating coefficients of t^n, we obtain

p_n = (1/2)(1 + (q − p)^n), n ≥ 1.

This solution for p_n is much more enlightening than the solution

p_n = b(0; n, p) + b(2; n, p) + b(4; n, p) + ··· + b(2m; n, p),

where m is the greatest integer such that 2m ≤ n. ■
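The closed form can be checked against the difference equation numerically; the sketch below (parameter values are arbitrary) iterates Equation 3.11 and compares each p_n with (1/2)(1 + (q − p)^n):

```python
p = 0.3          # arbitrary success probability
q = 1 - p

# Recursion p_n = q p_{n-1} + p (1 - p_{n-1}) with p_0 = 1,
# versus the closed form p_n = (1 + (q - p)^n) / 2.
pn = 1.0
for n in range(1, 21):
    pn = q * pn + p * (1 - pn)
    assert abs(pn - (1 + (q - p)**n) / 2) < 1e-12
print("closed form matches the recursion for n = 1, ..., 20")
```

The same check passes for any 0 < p < 1, since both sides satisfy the same recursion and initial condition.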
It is often necessary to interchange the order of summation of two infinite
series. The essential facts will be presented; proofs are contained in the
appendix at the end of the chapter.
A map a : N × N → R is called a double sequence, and its value at (i, j) is
denoted by a_{i,j}. We also write a = {a_{i,j}}_{i,j=1}^∞. The double sequence
{a_{i,j}} converges and has limit L if for every ε > 0 there are integers M, N ≥ 1
such that

|a_{m,n} − L| < ε for all m ≥ M, n ≥ N.

In this case we write lim_{i,j→∞} a_{i,j} = L.
Given a double sequence {a_{i,j}}, the formal expression

Σ_{i,j=1}^∞ a_{i,j}

can be formed and is called a double series. For each m ≥ 1, n ≥ 1, the
following partial sum can be formed:

S_{m,n} = Σ_{1≤i≤m, 1≤j≤n} a_{i,j}.
EXAMPLE 3.32 Consider the double sequence {a_{i,j}} defined by a_{i,j} =
(−1)^{i+j}(1/j). Since S_{m,n} is a sum of finitely many terms, the terms can be
added in any order: the partial sum can be formed by fixing j and summing
over i, or, on the other hand, by fixing i and summing over j.
Definition 3.11
The double series Σ_{i,j=1}^∞ a_{i,j} is said to converge and have sum S if lim_{i,j→∞} S_{i,j} =
S; i.e., if for each ε > 0 there are integers M, N ≥ 1 such that

|S_{i,j} − S| < ε for all i ≥ M, j ≥ N. ■

If the a_{i,j} ≥ 0 for all i, j ≥ 1, we say that Σ_{i,j=1}^∞ a_{i,j} diverges to +∞
if for every L ∈ R there are integers M, N ≥ 1 such that S_{i,j} ≥ L for all
i ≥ M, j ≥ N.
Given the double series Σ_{i,j=1}^∞ a_{i,j}, we can form the iterated sums

Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) and Σ_{j=1}^∞ (Σ_{i=1}^∞ a_{i,j}).

Proofs often depend upon showing that the latter two iterated series are equal;
i.e., the order of summation can be interchanged.
Theorem 3.4.3
If Σ_{i,j=1}^∞ a_{i,j} is a double series with a_{i,j} ≥ 0 for all i, j ≥ 1, then

Σ_{i,j=1}^∞ a_{i,j} = Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) = Σ_{j=1}^∞ (Σ_{i=1}^∞ a_{i,j}),

even if any sum is +∞.

The double series Σ_{i,j=1}^∞ a_{i,j} converges absolutely if the double series
Σ_{i,j=1}^∞ |a_{i,j}| converges.

Theorem 3.4.4
If the double series Σ_{i,j=1}^∞ a_{i,j} converges absolutely, then

Σ_{i,j=1}^∞ a_{i,j} = Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) = Σ_{j=1}^∞ (Σ_{i=1}^∞ a_{i,j}).
The next application of generating functions has to do with the sum of
a random number of random variables. Consider any infinite sequence of
random variables {X_i}_{i=1}^∞ and let N be any random variable taking on values
in {1, 2, ...}. An outcome ω determines an infinite sequence X₁(ω), X₂(ω), ...
as well as a positive integer N(ω), and we can form the sum of the first N(ω)
terms of the infinite sequence, which is denoted by

S_N(ω) = X₁(ω) + X₂(ω) + ··· + X_{N(ω)}(ω).

S_N is called the sum of a random number of random variables. If N is a constant
n, then S_N is the sum of a fixed number of random variables. That S_N is a
random variable follows from the fact that

(S_N = s) = ∪_{n=1}^∞ (S_n = s, N = n) = ∪_{n=1}^∞ ((N = n) ∩ (S_n = s))

and the fact that each S_n is a random variable.
Theorem 3.4.5
If {X_i}_{i=1}^∞ is an infinite sequence of independent nonnegative integer-valued
random variables all having the same density function f, N is a positive integer-valued
random variable, and N, X₁, X₂, ... are independent, then

f̂_{S_N}(t) = f̂_N(f̂_{X₁}(t)).
PROOF: By stratifying the event (S_N = s) according to the values of N and
using the fact that S_N = S_n on the event (N = n),

f̂_{S_N}(t) = Σ_{s=0}^∞ f_{S_N}(s) t^s
= Σ_{s=0}^∞ P(S_N = s) t^s
= Σ_{s=0}^∞ Σ_{n=1}^∞ P(S_N = s, N = n) t^s
= Σ_{s=0}^∞ Σ_{n=1}^∞ P(S_n = s, N = n) t^s.

Since N, X₁, ..., X_n are independent random variables, X₁ + ··· + X_n and N
are independent random variables by Lemma 3.3.3. By Theorem 3.4.4,

f̂_{S_N}(t) = Σ_{s=0}^∞ Σ_{n=1}^∞ P(S_n = s) P(N = n) t^s
= Σ_{n=1}^∞ (Σ_{s=0}^∞ P(S_n = s) t^s) P(N = n)
= Σ_{n=0}^∞ f̂_{S_n}(t) f_N(n)
= Σ_{n=0}^∞ (f̂_{X₁}(t))^n f_N(n),

where the terms corresponding to n = 0 in the last two expressions can be
included since f_N(0) = 0. Since f̂_N(t) = Σ_{n=1}^∞ t^n f_N(n),

f̂_{S_N}(t) = f̂_N(f̂_{X₁}(t)). ■
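Theorem 3.4.5 can be illustrated numerically. When N is Poisson with parameter λ and the summands are Bernoulli with parameter p (the seed example treated next in the text), stratifying on N as in the proof should reproduce a Poisson density with parameter λp. A Python sketch under these assumptions (the sum below includes the n = 0 term, which is harmless in the Poisson case):

```python
import math

lam, p = 3.0, 0.25   # arbitrary sample parameters

def pois(k, mu):
    """Poisson density with parameter mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def binom(s, n, pr):
    """Binomial density b(s; n, pr)."""
    return math.comb(n, s) * pr**s * (1 - pr)**(n - s) if s <= n else 0.0

# P(S_N = s) by stratifying on N (truncating the sum over n),
# compared with the Poisson density with parameter lam * p.
for s in range(10):
    direct = sum(binom(s, n, p) * pois(n, lam) for n in range(100))
    assert abs(direct - pois(s, lam * p)) < 1e-9
print("S_N has a Poisson density with parameter", lam * p)
```

Truncating the sum over n at 100 is safe here because the Poisson(3) tail beyond that point is negligible.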
EXAMPLE 3.33 Suppose the wind carries N seeds onto a given plot of
land where N has a Poisson density with parameter λ > 0 and each seed
has probability p of germinating, independently of the number of seeds and
independently of the other seeds. Let X₁, X₂, ... be an infinite sequence of
Bernoulli random variables with P(X_i = 1) = p. The number of germinating
seeds is then S_N = X₁ + ··· + X_N. Since f̂_{X_i}(t) = pt + q and f̂_N(t) = e^{λ(t−1)},

f̂_{S_N}(t) = f̂_N(pt + q) = e^{λ(pt+q−1)} = e^{λp(t−1)}.

It follows that S_N has a Poisson density with parameter λp. ■

EXERCISES 3.4
1.
Let X be a random variable having a uniform density on {1, 2, ..., n}.
Find the generating function of X.
2.
The sequence of real numbers {a_j}_{j=1}^∞ has the generating function
A(t) = 1 − (1 − t²)^{1/2}. Find a formula for the a_j.
3.
If the random variable X has each of the following generating functions,
what is the corresponding density function?
(a) f̂_X(t) =
(b) f̂_X(t) =
(c) f̂_X(t) = t/(8 − 7t)⁵
4.
A die is rolled to determine how many times a coin is to be flipped and
then the coin is flipped that many times. Let X be the number of heads
so obtained. Find the generating function of X.
5.
Consider the generating function
Describe a compound experiment and an associated random variable
having this generating function.
6.
If the random variable X has the generating function

f̂_X(t) = e^{2(t²−1)},
what is the density function of X?
7.
Consider an infinite sequence of Bernoulli random variables X₁, X₂, ...
with probability of success p. Let E_n be the event “there are an even
number of successes in the first n trials.” Express E_n in terms of the
random variables X₁, X₂, ... and use the theorems of probability theory
to derive Equation 3.11 by stratifying E_n according to the values of X₁;
i.e., p_n = P(E_n) = P(E_n ∩ (X₁ = 0)) + P(E_n ∩ (X₁ = 1)), and so
forth.
The following problems require software such as Mathematica or Maple V.
8.
Consider an infinite sequence of Bernoulli trials with probability of
success p = 1/2. For n ≥ 1, let q_n be the probability that the pattern
11 will not appear in the first n trials (i.e., the probability that there will
not be two consecutive 1’s). Derive a difference equation for the q_n,
specify initial conditions, and find a formula for the q_n.
9. If 10 dice are tossed simultaneously, what is the probability of getting a
score of 42?
10. If X has the binomial density b(·; 5, .5), Y has a uniform density
on {1, 2, ..., 6}, and X, Y are independent random variables, find the
density of Z = X + Y.
11. For j = 1, ..., 10, the random variable X_j takes on the values 1 and 0
with probabilities p_j and 1 − p_j, respectively, where p_j = (.95)^j/2. If
X = X₁ + ··· + X₁₀, find the density of X assuming that X₁, ..., X₁₀
are independent.
GAMBLER'S RUIN PROBLEM
Suppose a gambler and an opponent have a combined capital of a units and
the gambler has x units of capital where 1 ≤ x ≤ a − 1. The gambler wagers
one unit on successive plays of a game in which the probability that he will win
one unit is p and that he will lose one unit is q = 1 — p, where 0 < p < 1. The
gambler is ruined if his capital ever reaches zero units; his opponent is ruined
if the gambler’s capital ever reaches a units. What is the probability that the
gambler will be ruined eventually? Since it is conceivable that the wagers could
go on forever and neither be ruined, it is of interest to also find the probability
that the opponent will be ruined.
A more immediate question concerns a probability model for which these
questions make sense. Let {X_j} be an infinite sequence of Bernoulli trials with
probability of success p so that the X_j’s are independent, P(X_j = 1) = p, and
P(X_j = 0) = q. If for each j ≥ 1 we let Y_j = 2X_j − 1, then the Y_j’s are
independent random variables with P(Y_j = 1) = p and P(Y_j = −1) = q.
For j ≥ 1, Y_j represents the gambler’s gain on the jth play of the game. His
capital as of the jth play is then S_j = x + Y₁ + ··· + Y_j, j ≥ 1. The gambler is
ruined if S₁ = 0 or 0 < S₁ < a, ..., 0 < S_{j−1} < a, S_j = 0 for some j ≥ 2.
The probability of eventual ruin q_x, which depends upon x, is given by

q_x = P(S₁ = 0 or 0 < S₁ < a, ..., 0 < S_{j−1} < a, S_j = 0 for some j ≥ 2).
Since the indicated events are mutually exclusive,

q_x = P(S₁ = 0) + P(0 < S₁ < a, ..., 0 < S_{j−1} < a, S_j = 0 for some j ≥ 2).

Suppose 1 < x < a − 1. Then ruin cannot occur on the first wager, and

q_x = P(0 < S₁ < a, ..., 0 < S_{j−1} < a, S_j = 0 for some j ≥ 2).

We now show that

q_x = p q_{x+1} + q q_{x−1}, 1 < x < a − 1.
A “probabilistic argument” can be made as follows. The first wager can
result in winning one unit, with probability p, whereupon the gambler’s capital
becomes x + 1 and the probability of subsequent ruin is q_{x+1}; since an event
determined solely by the first wager and an event determined by subsequent
wagers are independent, the probability of winning the first wager and then
being ruined is p q_{x+1}. Similarly, the probability of losing the first wager and
then being ruined is q q_{x−1}. Since these two possibilities are mutually exclusive,

q_x = p q_{x+1} + q q_{x−1}

for 1 < x < a − 1.
The same argument applies when x = 1, with the exception that the probability
of losing the first wager and then being ruined is q · 1, since ruin has already
occurred on the first wager. Thus, q₁ = p q₂ + q, and if the equation above is
to hold when x = 1, we must have q₀ = 1. Similarly, q_{a−1} = q q_{a−2}, and we
must have q_a = 0. The q_x must then satisfy the difference equation

q_x = p q_{x+1} + q q_{x−1}, 1 ≤ x ≤ a − 1, (3.12)

subject to the boundary conditions

q₀ = 1, q_a = 0. (3.13)
One way of solving such a problem is to try known functions successively until
we come across a solution; e.g., q_x = A, q_x = Bx, q_x = Cx², ..., q_x = Dλ^x,
and so forth, where A, B, C, D, ... are constants. It is easy to check that if A is
any constant, then q_x = A satisfies the difference equation but does not satisfy
the boundary conditions. Trying q_x = Bλ^x results in a quadratic equation in
λ that has two roots λ = 1 and λ = q/p. At this point, we must consider
two cases according to whether p ≠ q or p = q = 1/2, since there is only
one solution in the latter case. Suppose first that p ≠ q so that there are two
distinct roots of the quadratic equation. In this case there are two solutions
q_x = A and q_x = B(q/p)^x, but neither satisfies both boundary conditions.
Noting that the difference equation has the property that if q_x^{(1)} and q_x^{(2)} are
two solutions, then q_x^{(1)} + q_x^{(2)} is also a solution, we might try

q_x = A + B(q/p)^x.

In this case, A and B can be chosen so that both boundary conditions are
satisfied and satisfy the equations

A + B = 1
A + B(q/p)^a = 0.
Solving for A and B,

q_x = ((q/p)^a − (q/p)^x) / ((q/p)^a − 1), (3.14)
provided p ≠ q. Suppose now that p = q = 1/2. Again q_x = A is a solution
of the difference equation

q_x = (1/2) q_{x+1} + (1/2) q_{x−1}, 1 ≤ x ≤ a − 1,

but does not satisfy both boundary conditions. This time q_x = Bx satisfies
the difference equation but not the boundary conditions. The function
q_x = A + Bx will satisfy all conditions and leads to the solution

q_x = 1 − x/a, (3.15)

provided p = q = 1/2.
Equations 3.14 and 3.15 provide the answer to the first of the two questions
originally raised about the probability of eventual ruin. What about the second
question pertaining to the probability p_x that the gambler will wipe out his
adversary? It is not necessary to repeat the arguments given above, since we
can interpret p_x as the probability of ruin for the adversary, in which case x is
replaced by a − x and p by q in the equations above. In the p ≠ q case,

p_x = ((p/q)^a − (p/q)^{a−x}) / ((p/q)^a − 1), (3.16)

and in the p = q = 1/2 case,

p_x = x/a, 1 ≤ x ≤ a − 1. (3.17)
Returning to the probability of ruin, we have found a solution to the
problem, depending upon whether p ≠ q or p = q = 1/2. How do we know
that the q_x is the real solution to our problem? Perhaps there is some other
solution q_x that satisfies the difference equation and the boundary conditions.
It is a question of the uniqueness of the solution. Suppose q_x^{(1)} and q_x^{(2)} are two
solutions. Then u_x = q_x^{(1)} − q_x^{(2)} will satisfy the equation

u_x = p u_{x+1} + q u_{x−1}, 1 ≤ x ≤ a − 1, (3.18)

and the boundary conditions

u₀ = 0, u_a = 0. (3.19)
Assume that u_x ≢ 0. By replacing u_x by −u_x, if necessary, we can assume
that u_y > 0 for some y ∈ {0, 1, ..., a}. Consider the finite set of numbers
{u₀, u₁, ..., u_a}. There is some m for which u_m is the largest of the numbers
in this set; i.e., u_m ≥ u_x for x = 1, 2, ..., a − 1, and u_m > 0. If there
is more than one such m, we can assume by the well-ordering principle
that m is the smallest integer with this property. Then u_{m−1} < u_m. But
u_m = p u_{m+1} + q u_{m−1} < p u_{m+1} + q u_m ≤ p u_m + q u_m = u_m, a contradiction.
The assumption that u_x ≢ 0 leads to a contradiction and therefore u_x ≡ 0;
i.e., q_x^{(1)} = q_x^{(2)} for x = 1, 2, ..., a − 1. Thus, the q_x given by Equation 3.14
or 3.15 is the only solution of the difference equation satisfying the boundary
conditions.
What happens if the gambler decides to wager one-half unit each time
instead of one unit? Will this improve his chances of avoiding eventual ruin?
The effect of this change is to double the number of units. In the p = q = 1/2
case, the probability of eventual ruin is

1 − 2x/2a = 1 − x/a, (3.20)

and there is no change in the probability of eventual ruin. Suppose p ≠ q. In
this case the probability of eventual ruin is

((q/p)^{2a} − (q/p)^{2x}) / ((q/p)^{2a} − 1) = q_x · ((q/p)^a + (q/p)^x) / ((q/p)^a + 1).

In the usual situation in which the game is unfair to the gambler (q > p), the
second factor on the right is greater than 1, so that wagering half a unit instead
of a whole unit actually increases the probability of eventual ruin.
EXAMPLE 3.34 Suppose the gambler has an initial capital of $100 and
he decides in advance to continue placing wagers of $10 on a game with
p = .45 until he has increased his capital by $10 or has been ruined. He
then has 10 units to wager. By Equation 3.14, the probability of eventual ruin
is q₁₀ = .204. Thus, there is a probability of .796 of achieving the goal of
increasing his capital by $10. Of course, if upon winning the $10 the gambler
gets greedy and continues to play against an adversary who for all practical
purposes is infinitely rich, then it is simply a question of how long it will take
for ruin to occur. But that is another mathematical problem. ■
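Equations 3.14 and 3.15 are easy to evaluate, and the resulting values can be checked against the difference equation (3.12) and boundary conditions (3.13). A Python sketch (the function name ruin_prob is our own):

```python
def ruin_prob(x, a, p):
    """Probability of eventual ruin starting from x units of a total a
    (Equation 3.14 for p != q, Equation 3.15 for p = q = 1/2)."""
    q = 1 - p
    if p == q:
        return 1 - x / a
    r = q / p
    return (r**a - r**x) / (r**a - 1)

# Example 3.34: x = 10 units, goal a = 11, p = .45.
q10 = ruin_prob(10, 11, 0.45)
print(round(q10, 3))   # close to the .204 quoted in the text

# The solution satisfies (3.12) and the boundary conditions (3.13).
a, p = 11, 0.45
assert abs(ruin_prob(0, a, p) - 1) < 1e-12
assert abs(ruin_prob(a, a, p)) < 1e-12
for x in range(1, a):
    lhs = ruin_prob(x, a, p)
    rhs = p * ruin_prob(x + 1, a, p) + (1 - p) * ruin_prob(x - 1, a, p)
    assert abs(lhs - rhs) < 1e-12
```

The same function handles the fair game: ruin_prob(x, a, 0.5) returns 1 − x/a.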
EXERCISES 3.5
1.
What is the probability that the wagering will eventually terminate?
2. If q > p, what is the gambler’s probability of eventual ruin against an
infinitely rich adversary?
3. If {i₁, ..., i_m} and {j₁, ..., j_n} are disjoint sets of positive integers, it
is known that events of the type (Y_{i₁} = δ₁, ..., Y_{i_m} = δ_m) and
(Y_{j₁} = ε₁, ..., Y_{j_n} = ε_n) are independent. Show that the events
(Y₁ = 1) and (Y₂ + ··· + Y_j = y for some j ≥ 2) are independent.
4. Modify the gambler’s ruin problem by allowing the possibility of a tie
on each play of the game so that there are positive numbers α, β, γ
with α + β + γ = 1 such that P(Y_j = 1) = α, P(Y_j = 0) = β, and
P(Y_j = −1) = γ, and let q_x be the probability of eventual ruin for
the gambler. Derive a difference equation for the q_x and appropriate
boundary conditions. Solve for the q_x and draw conclusions.
APPENDIX
Proof of Theorem 3.4.3: We need only deal with the first equation because
the second can be obtained by interchanging the roles of i and j. Suppose first
that Σ_{i,j=1}^∞ a_{i,j} converges and has sum S. Clearly, S_{i,j} ≤ S for all i, j ≥ 1.
Since lim_{i,j→∞} S_{i,j} = S, given ε > 0 there are integers M, N ≥ 1 such that

S − ε < Σ_{1≤i≤m, 1≤j≤n} a_{i,j} ≤ S for all m ≥ M, n ≥ N.

Since

Σ_{1≤i≤m, 1≤j≤n} a_{i,j} = Σ_{i=1}^m Σ_{j=1}^n a_{i,j}

is valid for finite sums, S − ε < Σ_{i=1}^m Σ_{j=1}^n a_{i,j} ≤ S for all m ≥ M, n ≥ N.
Since Σ_{j=1}^n a_{i,j} is an increasing sequence for each i, with m fixed we can take
the limit as n → ∞ to obtain S − ε ≤ Σ_{i=1}^m Σ_{j=1}^∞ a_{i,j} ≤ S. Since the
middle expression increases with m and is bounded above by S, the series
Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) converges and

S − ε ≤ Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) ≤ S.

Since ε is arbitrary, Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) = S.

Suppose now that Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) converges to S in R. Given ε > 0, there is
an M ≥ 1 such that S − ε < Σ_{i=1}^m (Σ_{j=1}^∞ a_{i,j}) ≤ S + ε for all m ≥ M, from
which it follows that for each i = 1, ..., m, the series Σ_{j=1}^∞ a_{i,j} converges.
Thus, S − ε < lim_{n→∞} Σ_{i=1}^m Σ_{j=1}^n a_{i,j} ≤ S + ε, and there is an N ≥ 1 such
that S − ε < Σ_{i=1}^m Σ_{j=1}^n a_{i,j} = S_{m,n} < S + ε for all m ≥ M, n ≥ N. This
shows that Σ_{i,j=1}^∞ a_{i,j} converges and has sum S = Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}).
Assume that the double series Σ_{i,j=1}^∞ a_{i,j} diverges to +∞. Given any L ∈ R,
there are integers M, N ≥ 1 such that

Σ_{i=1}^m Σ_{j=1}^n a_{i,j} = Σ_{1≤i≤m, 1≤j≤n} a_{i,j} ≥ L for all m ≥ M, n ≥ N.

Thus, Σ_{i=1}^m (Σ_{j=1}^∞ a_{i,j}) ≥ L for m ≥ M. Thus, the sequence of partial sums
diverges to +∞, and so Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) = +∞.

Finally, suppose that the series Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) diverges to +∞. To deal with
this case, note that lim_{n→∞} S_{n,n} exists as a real number or lim_{n→∞} S_{n,n} = +∞
since {S_{n,n}}_{n=1}^∞ is an increasing sequence of real numbers. In the latter
case, given L ∈ R there is an M ≥ 1 such that S_{n,n} ≥ L for all n ≥ M,
and therefore S_{m,n} ≥ L for all m ≥ M, n ≥ M; i.e., lim_{m,n→∞} S_{m,n} =
Σ_{i,j=1}^∞ a_{i,j} = +∞. On the other hand, if lim_{n→∞} S_{n,n} = S ∈ R, then
it is easy to see that lim_{m,n→∞} S_{m,n} = S, and by the first part of the
proof Σ_{i=1}^∞ (Σ_{j=1}^∞ a_{i,j}) converges to S ∈ R, a contradiction. Therefore,
lim_{m,n→∞} S_{m,n} = Σ_{i,j=1}^∞ a_{i,j} = +∞. ■
The following functions will be needed for the next proof. For x ∈ R, let

x⁺ = max(x, 0), x⁻ = max(−x, 0).

Then it is easy to see, by considering the two cases x ≥ 0 and x ≤ 0 separately,
that

x = x⁺ − x⁻, |x| = x⁺ + x⁻, 0 ≤ x⁺ ≤ |x|, 0 ≤ x⁻ ≤ |x|.
Proof of Theorem 3.4.4: Since 0 ≤ a_{i,j}⁺ ≤ |a_{i,j}| and 0 ≤ a_{i,j}⁻ ≤ |a_{i,j}|,
the double series Σ_{i,j=1}^∞ a_{i,j}⁺ and Σ_{i,j=1}^∞ a_{i,j}⁻ converge. By Theorem 3.4.3,
the iterated sums of each agree with the corresponding double series.
Taking the difference between the + and − versions results in the conclusions
of the theorem. ■
SUPPLEMENTAL READING LIST
W. Feller (1957). An Introduction to Probability Theory and Its Applications, 2nd ed.
New York: Wiley.
EXPECTATION
INTRODUCTION
The concept of expectation was first formalized in print by Huygens in the
middle of the seventeenth century, and it has played an essential role in
probability theory ever since. The expected value of a random variable is a
number that summarizes information about a random variable. From the
time of its inception until the middle of the twentieth century, the concept of
expected value developed along two paths: the discrete and continuous cases.
Although it is possible to treat both paths simultaneously, we will stay with the
discrete for the time being.
Among other things, we will determine the expected duration of play in
the gambler’s ruin problem, discuss prediction and filtering theory, and look
briefly at some applications to communication theory.
EXPECTED VALUE
Unless specified otherwise in examples, (Ω, ℱ, P) will be a fixed probability
space. The idea behind expected value is very simple. If a gambler wagers on
a game in which he can win one unit with probability p = 3/4 and lose two
units with probability q = 1/4 and he plays 100 games, then according to the
empirical law for relative frequencies, the gambler would expect to win about
75 games and lose about 25 games. Thus, he would expect to win about 75 • 1
units and lose about 25 • 2 units with a net gain of 75 • 1 - 25 • 2 units. Putting
this on a per-game basis, he would expect a net gain per game of
(75 · 1 − 25 · 2)/100 = 1 · (3/4) − 2 · (1/4) = 1/4 unit,

where the coefficients 3/4 and 1/4 are the probabilities of winning 1 and −2
units, respectively.
Definition 4.1
Let X be a random variable with range {x₁, x₂, ...}, finite or infinite. The expected
value of X, denoted by E[X], is defined as the real number

E[X] = Σ_i x_i f_X(x_i),

provided that the series converges absolutely, in which case X is said to have finite
expectation; if P(X ≥ 0) = 1 and the series diverges, E[X] is defined as +∞. ■
If the range of X is finite, the series on the right is a finite sum and there is
no question of convergence, absolute or otherwise. The question of absolute
convergence is appropriate only when the series is infinite. Recall that if a series
converges but not absolutely, then it is conditionally convergent. In the case
of a conditionally convergent series, a rearrangement of the terms of the series
can alter the sum of the series. If the sum in the series above is conditionally
convergent, then one person listing the values of X in one order might arrive
at a different sum than would some other person listing the values in another
order. Under absolute convergence, the order in which the values of X are
listed does not matter.
EXAMPLE 4.1 Let X have a uniform density on {0, 1, ..., n} so that
f_X(x) = 1/(n + 1), x = 0, 1, ..., n. By Exercise 1.2.6,

E[X] = Σ_{x=0}^n x/(n + 1) = (1/(n + 1)) Σ_{x=0}^n x = (1/(n + 1)) · n(n + 1)/2 = n/2. ■
EXAMPLE 4.2
Let X have a binomial density with parameters n and p. Then

E[X] = Σ_{x=0}^n x C(n, x) p^x q^{n−x}
= np Σ_{x=1}^n ((n − 1)! / ((x − 1)!(n − x)!)) p^{x−1} q^{(n−1)−(x−1)}
= np Σ_{x=0}^{n−1} b(x; n − 1, p)
= np,

the last equation holding because Σ_{x=0}^{n−1} b(x; n − 1, p) is the sum of all the
probabilities making up a binomial density. ■
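The identity E[X] = np can be confirmed by computing the defining sum directly for sample parameters:

```python
import math

n, p = 12, 0.35   # arbitrary sample parameters

# E[X] = sum_x x * b(x; n, p), which should equal np.
mean = sum(x * math.comb(n, x) * p**x * (1 - p)**(n - x)
           for x in range(n + 1))
assert abs(mean - n * p) < 1e-9
print("E[X] =", round(mean, 10), "= np =", n * p)
```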
EXAMPLE 4.3
Let X have a geometric density with parameter p so that
f_X(x) = pq^{x−1}, x = 1, 2, .... Regard the series Σ_{x=0}^∞ q^x as a power series in
q with (−1, 1) as its interval of convergence. Since power series can be
differentiated term by term within the interval of convergence,

1/(1 − q)² = (d/dq)(1/(1 − q)) = (d/dq) Σ_{x=0}^∞ q^x = Σ_{x=1}^∞ x q^{x−1},

with the latter series also converging absolutely in the interval (−1, 1). Returning
to the geometric density,

E[X] = Σ_{x=1}^∞ x p q^{x−1} = p Σ_{x=1}^∞ x q^{x−1} = p/(1 − q)² = 1/p. ■
EXAMPLE 4.4
Let X have a Poisson density with parameter λ > 0 so that
f_X(x) = e^{−λ} λ^x/x!, x = 0, 1, .... Then

E[X] = Σ_{x=0}^∞ x e^{−λ} λ^x / x!,

provided that the series converges absolutely. Since the terms of the series are
nonnegative, absolute convergence and convergence are the same thing and we
need only verify the latter. Since

Σ_{x=0}^∞ x e^{−λ} λ^x / x! = λ e^{−λ} Σ_{x=1}^∞ λ^{x−1}/(x − 1)!

and the latter series is the Maclaurin series expansion of e^λ, which is
known to converge absolutely on (−∞, +∞), E[X] is defined and E[X] =
λ e^{−λ} e^λ = λ. ■
If X is a random variable and φ is a real-valued function on R, then
Z = φ(X) is also a random variable. According to the definition of expected
value, to calculate E[Z] we must first determine the density f_Z of the random
variable Z. This need not be done, according to the following theorem.
Theorem 4.2.1
Let X be a random variable with range {x₁, x₂, ...} and let φ be a real-valued
function on R. Then E[φ(X)] is defined and

E[φ(X)] = Σ_j φ(x_j) f_X(x_j),

provided the series converges absolutely.
In applying this result, the sum on the right is formed by replacing X in
φ(X) by a typical value x_j, multiplying by the probability that X takes on that
value, and then summing over j.
PROOF: Assume that the series converges absolutely. Any rearrangement of
the series will not affect the convergence or sum of the series. Let {z₁, z₂, ...}
be the range of Z = φ(X). By rearranging the terms of the series,

Σ_j φ(x_j) f_X(x_j) = Σ_i Σ_{j: φ(x_j) = z_i} φ(x_j) f_X(x_j)
= Σ_i z_i Σ_{j: φ(x_j) = z_i} f_X(x_j)
= Σ_i z_i f_Z(z_i)
= E[Z].

The same steps applied with φ(x_j) replaced by |φ(x_j)| show that the series
Σ_i z_i f_Z(z_i) converges absolutely. ■
EXAMPLE 4.5
Let X be a random variable with binomial density
b(·; n, p) and let φ(x) = x², x ∈ R. Then

E[φ(X)] = Σ_{k=0}^n k² C(n, k) p^k q^{n−k}
= Σ_{k=0}^n k(k − 1) C(n, k) p^k q^{n−k} + Σ_{k=0}^n k C(n, k) p^k q^{n−k}.

We have seen that the second sum on the right is equal to np. Since

Σ_{k=0}^n k(k − 1) C(n, k) p^k q^{n−k} = n(n − 1)p² Σ_{k=2}^n C(n − 2, k − 2) p^{k−2} q^{(n−2)−(k−2)}
= n(n − 1)p² Σ_{k=0}^{n−2} C(n − 2, k) p^k q^{(n−2)−k}
= n(n − 1)p² Σ_{k=0}^{n−2} b(k; n − 2, p)
= n(n − 1)p²,

E[X²] = n(n − 1)p² + np = n²p² − np² + np. ■
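Theorem 4.2.1, applied here with φ(x) = x² as in Example 4.5, can be checked by computing Σ_x x² f_X(x) directly and comparing with n(n − 1)p² + np:

```python
import math

n, p = 9, 0.4   # arbitrary sample parameters
f = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Theorem 4.2.1 with phi(x) = x^2: E[phi(X)] = sum_x phi(x) f_X(x).
second_moment = sum(x**2 * fx for x, fx in enumerate(f))
assert abs(second_moment - (n * (n - 1) * p**2 + n * p)) < 1e-9
print("E[X^2] =", round(second_moment, 10))
```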
It is not hard to construct examples of random variables X for which E[X]
is not defined as a real number.
EXAMPLE 4.6 Let X be a random variable with density f_X(x) = 1/(x(x + 1)),
x = 1, 2, ... (see Exercise 1.5.10). Since the series

Σ_{x=1}^∞ x f_X(x) = Σ_{x=1}^∞ 1/(x + 1)

is the divergent harmonic series except for a missing first term, the series
Σ_{x=1}^∞ x f_X(x) does not converge, and therefore E[X] is not defined as a real
number. ■
In some instances, as in the previous example, when the terms of the series
Σ_j x_j f_X(x_j) are nonnegative but the series does not converge, we say that the
series diverges to +∞ and we write E[X] = +∞.
Theorem 4.2.2
If X has finite expectation and c is any real number, then

(i) If P(X ≥ 0) = 1, then E[X] ≥ 0.
(ii) If P(X = c) = 1, then E[X] = c.
(iii) E[cX] = cE[X].

PROOF: If P(X ≥ 0) = 1, then f_X(x) = 0 whenever x < 0, and so E[X] =
Σ_j x_j f_X(x_j) = Σ_{x_j ≥ 0} x_j f_X(x_j) ≥ 0. If P(X = c) = 1, then f_X(c) = 1 and
f_X(x) = 0 whenever x ≠ c, so that E[X] = Σ_j x_j f_X(x_j) = c f_X(c) = c. By
Theorem 4.3.1, E[cX] = Σ_j c x_j f_X(x_j) = cE[X]. ■
If the density function of a nonnegative integer-valued random variable X
is not known but its generating function is, the expected value of X can be
calculated indirectly using the generating function. The following notation is
useful for this purpose. Let f be a real-valued function defined on an interval
having a as its right endpoint. Then f(a−) is defined to be lim_{x→a−} f(x), even
if infinite.
Theorem 4.2.3 (Abel)
Let {a_j}_{j=0}^∞ be a sequence with a_j ≥ 0 and generating function A(t) on (−1, 1),
and let A(1) = Σ_{j=0}^∞ a_j. Then

A(1−) = lim_{t→1−} A(t) = A(1),

even if the series diverges to +∞.

PROOF: Suppose first that Σ_{j=0}^∞ a_j diverges to +∞. Given any M, there is an
N ≥ 1 such that Σ_{j=0}^n a_j > M for all n ≥ N. Since lim_{t→1−} Σ_{j=0}^n a_j t^j =
Σ_{j=0}^n a_j > M for all n ≥ N,

A(1−) ≥ M.

Since M is arbitrary, A(1−) = +∞. Suppose now that Σ_{j=0}^∞ a_j < +∞. Let
L = Σ_{j=0}^∞ a_j. Given any ε > 0, there is an N ≥ 1 such that Σ_{j=0}^m a_j > L − ε
for all m ≥ N. Thus,

L − ε ≤ A(1−) ≤ L.

Since ε is arbitrary, A(1−) = lim_{t→1−} A(t) = L = Σ_{j=0}^∞ a_j. ■

The point of Abel’s theorem is that if we formally put t = 1 in the
equation A(t) = Σ_{j=0}^∞ a_j t^j, then A(1−) = A(1), even if infinite. In working
with power series on (−1, 1) having nonnegative coefficients, we will put
A(1) = Σ_{j=0}^∞ a_j, even if infinite.
Let X be a nonnegative integer-valued random variable with density function f_X and generating function f̂_X. In this case, it is possible for E[X] =
Σ_{x=0}^∞ x f_X(x) = +∞. We will need a tail probability function defined by

g_X(x) = P(X > x),  x = 0, 1, ...,

and its generating function

ĝ_X(t) = Σ_{x=0}^∞ g_X(x) t^x,  −1 < t < 1.

Note that g_X is not a density function. The functions f̂_X and ĝ_X are related by
the following equation:

ĝ_X(t) = (1 − f̂_X(t)) / (1 − t),  −1 < t < 1.  (4.1)

To see this, write
(1 − t) ĝ_X(t) = (1 − t) Σ_{x=0}^∞ g_X(x) t^x
= Σ_{x=0}^∞ g_X(x) t^x − Σ_{x=0}^∞ g_X(x) t^{x+1}
= g_X(0) + Σ_{x=1}^∞ g_X(x) t^x − Σ_{x=1}^∞ g_X(x − 1) t^x
= 1 − f_X(0) − Σ_{x=1}^∞ (g_X(x − 1) − g_X(x)) t^x.

Noting that g_X(x − 1) − g_X(x) = P(X > x − 1) − P(X > x) = P(X = x) = f_X(x) for x ≥ 1,

(1 − t) ĝ_X(t) = 1 − f̂_X(t).

This establishes Equation 4.1.
Since the interval of convergence of the power series defining f̂_X contains
(−1, 1),

f̂_X′(t) = Σ_{x=1}^∞ x f_X(x) t^{x−1},  −1 < t < 1,

and therefore E[X] = Σ_{x=0}^∞ x f_X(x) = f̂_X′(1), even if infinite. Similarly,
E[X(X − 1)] = f̂_X″(1), even if infinite.

Theorem 4.2.4
If X is a nonnegative integer-valued random variable, then

E[X] = f̂_X′(1) = ĝ_X(1),

whether finite or infinite.
PROOF: By Equation 4.1 and the mean value theorem for derivatives,

ĝ_X(t) = (1 − f̂_X(t)) / (1 − t) = (f̂_X(1) − f̂_X(t)) / (1 − t) = f̂_X′(ξ(t)),

where t < ξ(t) < 1. By Abel's theorem,

ĝ_X(1) = ĝ_X(1−) = lim_{t→1−} f̂_X′(ξ(t)) = f̂_X′(1−) = f̂_X′(1) = E[X]. ■
EXAMPLE 4.7 Let X have a negative binomial density with parameters r
and p so that f̂_X(t) = p^r (1 − qt)^{−r}. Then f̂_X′(t) = r p^r q (1 − qt)^{−r−1}, and so
E[X] = f̂_X′(1) = r(q/p). ■
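As a sanity check on this example, the mean can also be obtained by summing x f_X(x) directly. The sketch below assumes the density f_X(x) = C(r + x − 1, x) p^r q^x, x = 0, 1, 2, ..., which is the form of the negative binomial density used here; the function name is ours.

```python
from math import comb

def neg_binomial_mean(r, p, terms=2000):
    # numerically sum x * f_X(x) for the negative binomial density
    # f_X(x) = C(r + x - 1, x) * p^r * q^x, x = 0, 1, 2, ...
    q = 1.0 - p
    return sum(x * comb(r + x - 1, x) * p**r * q**x for x in range(terms))

# Theorem 4.2.4 predicts E[X] = f'(1) = r q / p.
r, p = 3, 0.4
print(neg_binomial_mean(r, p), r * (1 - p) / p)  # both approximately 4.5
```

The truncation at 2000 terms is harmless here because q^x decays geometrically.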
It was shown in Section 2.7 that a remotely operated garage door opener is
anything but secure. Let {X_j}_{j=1}^∞ be an infinite sequence of Bernoulli random
variables with probability of success p = 1/2. We have seen that the word 1001
will occur infinitely often in an outcome with probability 1.
EXAMPLE 4.8 (Password Problem) Consider the sequence just described
and define a random waiting time T by putting T = n if the word 1001
appears for the first time at the end of the nth trial, so that T ≥ 4. Let
g_T(n) = P(T > n); i.e., g_T(n) is the probability that the word 1001 does not
occur in the first n trials. Note that g_T(0) = g_T(1) = g_T(2) = g_T(3) = 1.
The only way for the word 1001 not to appear in the first n trials for n ≥ 4 is
for an outcome to begin with one of the following starting patterns:

0 · · ·,  11 · · ·,  101 · · ·,  1000 · · ·,

and the word 1001 does not subsequently appear. The event consisting of
those outcomes with the starting pattern 0 · · · such that the word 1001 does not
subsequently appear in the remaining n − 1 trials has probability (1/2)g_T(n − 1).
A similar argument applies to the other three starting patterns. Therefore, the
g_T(n) must satisfy the difference equation

g_T(n) = (1/2)g_T(n − 1) + (1/4)g_T(n − 2) + (1/8)g_T(n − 3) + (1/16)g_T(n − 4),  n ≥ 4,  (4.2)

subject to the initial conditions

g_T(0) = g_T(1) = g_T(2) = g_T(3) = 1.  (4.3)
Multiplying both sides of Equation 4.2 by t^n and summing over n ≥ 4,

ĝ_T(t) − 1 − t − t² − t³ = (t/2)(ĝ_T(t) − 1 − t − t²) + (t²/4)(ĝ_T(t) − 1 − t)
+ (t³/8)(ĝ_T(t) − 1) + (t⁴/16)ĝ_T(t).

Solving for ĝ_T,

ĝ_T(t) = (16 + 8t + 4t² + 2t³) / (16 − 8t − 4t² − 2t³ − t⁴).
Since f̂_T(t) = 1 − (1 − t)ĝ_T(t), which is a rational function of t, we could in
principle determine the density f_T by applying the method of partial fraction
expansions to f̂_T(t); this requires, however, finding the roots of the polynomial
in the denominator of ĝ_T. If all we are interested in is E[T], then these problems
can be avoided by using Theorem 4.2.4 to obtain E[T] = ĝ_T(1) = 30. On
the average, it will take about 30 trials for the word 1001 to appear in an
outcome. ■
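Because E[T] = ĝ_T(1) = Σ_{n≥0} g_T(n) by Theorem 4.2.4, the difference equation (4.2) can also be iterated numerically to confirm the value 30 without any root-finding. A minimal sketch (function name ours):

```python
def tail_probs(n_max):
    # g_T(n) = P(T > n): probability that 1001 has not appeared in the
    # first n fair-coin trials, from difference equation (4.2) with the
    # initial conditions (4.3).
    g = [1.0, 1.0, 1.0, 1.0]
    for n in range(4, n_max + 1):
        g.append(g[n - 1] / 2 + g[n - 2] / 4 + g[n - 3] / 8 + g[n - 4] / 16)
    return g

# E[T] = sum of the tail probabilities; the tail decays geometrically,
# so 1500 terms are more than enough for double precision.
print(sum(tail_probs(1500)))  # ≈ 30
```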
EXERCISES 4.2

1. Let X have a geometric density with parameter p. Find E[X²].

2. If X has a Poisson density function p(·; λ), calculate E[X²].

3. A random sample of size 3 is drawn from a bowl containing 10 white
and 5 red balls. If X is the number of white balls in the sample, find
E[X].

4. Let X be a random variable having a Poisson density with parameter
λ > 0. Calculate E[1/(1 + X)].

5. Let X be a random variable having a negative binomial density with
parameters r ≥ 2 and p. Calculate E[1/(X + 1)].

6. If X is a nonnegative integer-valued random variable, show that
E[X] = Σ_{x=1}^∞ P(X ≥ x).

7. Let {X_j}_{j=1}^∞ be a sequence of independent nonnegative integer-valued
random variables all having the same density function for which
E[X_j] = E[X₁] is defined as a real number, and let N be a positive
integer-valued random variable such that E[N] is defined as a real
number. Assume that N, X₁, X₂, ... are independent. If S_N =
X₁ + X₂ + · · · + X_N, show that E[S_N] = E[N]E[X₁].

8. A remotely operated garage door opener has an electronic combination
lock of 10 binary digits. If a random device transmits a signal that
has the probability properties of an infinite sequence of Bernoulli trials
with probability of success p = 1/2, what is the expected number of
digits required to activate the opener?

The following problem requires mathematical software such as Mathematica
or Maple V.

9. Consider the random variable T of Example 4.8. Calculate P(T ≤ 11).
4.3 PROPERTIES OF EXPECTATION
In Chapter 3, we defined functions of random variables such as φ(X, Y) and
ψ(X₁, ..., Xₙ). In the case of a single random variable X, it was shown that the
expected value of Z = φ(X) could be calculated without going through the
intermediate step of determining the density function of Z. A similar result
applies to a function of several random variables.
Theorem 4.3.1
If X₁, ..., Xₙ are random variables and φ is a real-valued function of n variables,
then

E[φ(X₁, ..., Xₙ)] = Σ_{x₁,...,xₙ} φ(x₁, ..., xₙ) f_{X₁,...,Xₙ}(x₁, ..., xₙ),  (4.4)

provided the multiple series on the right converges absolutely.
Operationally, the sum on the right is obtained by replacing X₁, ..., Xₙ by
typical values x₁, ..., xₙ, respectively, multiplying by the probability that the
random variables will take on those values, and then summing in any order
over all possible values of the random variables.
The proof of Theorem 4.3.1 amounts to a justification of the rearrangement
of the terms of the series. The reader is referred to Theorem 12-42 in the book
by Apostol listed at the end of the chapter.
Theorem 4.3.2
If X and Y are random variables with finite expectations and P(X ≥ Y) = 1,
then E[X] ≥ E[Y].

PROOF: Since f_{X,Y}(x_i, y_j) = 0 whenever x_i < y_j, by Theorem 4.3.1,

E[X] = Σ_{x_i,y_j} x_i f_{X,Y}(x_i, y_j)
= Σ_{x_i ≥ y_j} x_i f_{X,Y}(x_i, y_j)
≥ Σ_{x_i ≥ y_j} y_j f_{X,Y}(x_i, y_j)
= Σ_{x_i,y_j} y_j f_{X,Y}(x_i, y_j)
= E[Y]. ■
Theorem 4.3.3
If X₁, ..., Xₙ are any random variables with finite expectations and c₁, ..., cₙ
are any real constants, then Σ_{j=1}^n c_j X_j has finite expectation and

E[ Σ_{j=1}^n c_j X_j ] = Σ_{j=1}^n c_j E[X_j].
PROOF: By Theorem 4.2.2, we can assume that c_j = 1, j = 1, ..., n.
Taking ψ(x₁, ..., xₙ) = x₁ + · · · + xₙ in Theorem 4.3.1, we must first show
that the series therein is absolutely convergent. Since |x₁ + · · · + xₙ| ≤
|x₁| + · · · + |xₙ|,

Σ_{x₁,...,xₙ} |x₁ + · · · + xₙ| f_{X₁,...,Xₙ}(x₁, ..., xₙ)
≤ Σ_{x₁,...,xₙ} (|x₁| + · · · + |xₙ|) f_{X₁,...,Xₙ}(x₁, ..., xₙ)
= Σ_{x₁,...,xₙ} |x₁| f_{X₁,...,Xₙ}(x₁, ..., xₙ) + · · · + Σ_{x₁,...,xₙ} |xₙ| f_{X₁,...,Xₙ}(x₁, ..., xₙ).

By Theorem 3.4.3, a suitable order of iterated summation can be chosen so that

Σ_{x₁,...,xₙ} |x_j| f_{X₁,...,Xₙ}(x₁, ..., xₙ)
= Σ_{x_j} |x_j| ( Σ_{x₁,...,x_{j−1},x_{j+1},...,xₙ} f_{X₁,...,Xₙ}(x₁, ..., xₙ) )
= Σ_{x_j} |x_j| f_{X_j}(x_j),

and so

Σ_{x₁,...,xₙ} |x₁ + · · · + xₙ| f_{X₁,...,Xₙ}(x₁, ..., xₙ) ≤ Σ_{x₁} |x₁| f_{X₁}(x₁)
+ · · · + Σ_{xₙ} |xₙ| f_{Xₙ}(xₙ).

Since each X_j has finite expectation, each term on the right is finite and the
multiple series converges absolutely. Therefore, E[X₁ + · · · + Xₙ] is defined,
and

E[X₁ + · · · + Xₙ] = Σ_{x₁,...,xₙ} (x₁ + · · · + xₙ) f_{X₁,...,Xₙ}(x₁, ..., xₙ)
= Σ_{x₁,...,xₙ} x₁ f_{X₁,...,Xₙ}(x₁, ..., xₙ)
+ · · · + Σ_{x₁,...,xₙ} xₙ f_{X₁,...,Xₙ}(x₁, ..., xₙ)
= E[X₁] + · · · + E[Xₙ]. ■
EXAMPLE 4.9 Consider an infinite sequence of Bernoulli random variables {X_j}_{j=1}^∞ with probability of success p and let Sₙ = X₁ + · · · + Xₙ. Then
E[X_j] = 1 · p + 0 · q = p, j = 1, 2, .... By Theorem 4.3.3, E[Sₙ] = np,
a result obtained previously using the fact that Sₙ has the binomial density
b(·; n, p). ■
The introduction of auxiliary random variables as in the next example can
simplify the computation of expected value.
EXAMPLE 4.10 Suppose a population of n objects consists of n₁ objects
of Type 1, n₂ objects of Type 2, ..., n_s objects of Type s, where n = n₁ +
n₂ + · · · + n_s. A random sample of size r ≤ n is taken without replacement
from the population. Let X₁ be the number of Type 1 objects, X₂ the number
of Type 2 objects, ..., X_s the number of Type s objects in the sample. To calculate E[X_j], we define auxiliary random variables I_{j,1}, ..., I_{j,r} as follows. Let
I_{j,k} = 1 or 0 according to whether the kth object in the sample is of Type j or
not. The value of I_{j,k} is determined by looking at the kth object chosen from
the population and totally disregarding the other choices. This amounts to
selecting just one object. Thus, P(I_{j,k} = 1) = n_j/n, P(I_{j,k} = 0) =
(n − n_j)/n, and therefore E[I_{j,k}] = 0 · ((n − n_j)/n) + 1 · (n_j/n) = n_j/n.
Since X_j = I_{j,1} + · · · + I_{j,r},

E[X_j] = Σ_{k=1}^r E[I_{j,k}] = r n_j / n

by Theorem 4.3.3. ■
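For a small population the identity E[X_j] = r n_j/n can be verified by brute-force enumeration of all unordered samples. The helper below is an illustrative sketch of ours, not anything from the text; it uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def expected_type1_count(population, r):
    # exact average number of Type 1 objects over all C(n, r) unordered samples
    samples = list(combinations(population, r))
    return Fraction(sum(s.count(1) for s in samples), len(samples))

population = [1, 1, 1, 2, 2]          # n = 5: n1 = 3 of Type 1, n2 = 2 of Type 2
print(expected_type1_count(population, 2))  # r * n1 / n = 2 * 3 / 5 = 6/5
```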
Theorem 4.3.4
If X and Y are independent random variables with finite expectations, then XY
has finite expectation and

E[XY] = E[X]E[Y].

PROOF: Let x_i and y_j be arbitrary elements of the range of X and Y, respectively. We must first show that

Σ_{i,j} |x_i y_j| f_{X,Y}(x_i, y_j) < +∞.

By Theorem 3.4.3,

Σ_{i,j} |x_i y_j| f_{X,Y}(x_i, y_j) = Σ_{i,j} |x_i||y_j| f_X(x_i) f_Y(y_j)
= Σ_i |x_i| f_X(x_i) ( Σ_j |y_j| f_Y(y_j) ).

Since the sum within the parentheses is a constant, it can be taken outside the
summation over i to obtain

Σ_{i,j} |x_i y_j| f_{X,Y}(x_i, y_j) = ( Σ_j |y_j| f_Y(y_j) )( Σ_i |x_i| f_X(x_i) ) < +∞.

Therefore, XY has finite expectation and

E[XY] = Σ_{i,j} x_i y_j f_X(x_i) f_Y(y_j)
= Σ_i x_i f_X(x_i) ( Σ_j y_j f_Y(y_j) )
= E[Y] Σ_i x_i f_X(x_i)
= E[Y]E[X]. ■
It is important to remember that Theorem 4.3.4 applies only to independent
random variables.

EXAMPLE 4.11 Suppose two dice, one red and one white, are rolled. Let
X and Y be the number of pips on the red and white die, respectively. By
Exercise 1.2.6,

E[X] = E[Y] = Σ_{j=1}^6 j · (1/6) = 7/2.

Since X and Y are independent random variables, E[XY] = E[X]E[Y] =
49/4. ■
Definition 4.2
The random variable X has a finite second moment if E[X²] is finite. ■

We will need the following fact: if x is any real number, then |x| ≤ x² + 1.
To see this, note that if |x| ≤ 1, then |x| < x² + 1, whereas if |x| ≥ 1, then
|x| ≤ |x|² ≤ x² + 1.
Consider a random variable X with finite second moment and range
{x₁, x₂, ...}. Since Σ_i |x_i| f_X(x_i) ≤ Σ_i (x_i² + 1) f_X(x_i) = E[X²] + 1, X has
finite expectation. In this case, we can define a parameter μ_X by

μ_X = E[X],

which is called the mean or expected value of X.
Consider the random variable (X − μ_X)² = X² − 2μ_X X + μ_X². Since X² and
X have finite expectation, (X − μ_X)² has finite expectation by Theorems 4.2.2
and 4.3.3, and we can define a second parameter

σ_X² = E[(X − μ_X)²],

called the variance of X. The variance of X is also denoted by var X. σ_X =
√(var X) is called the standard deviation of X. If the random variable X is
clear from the context, the subscript X on μ_X and σ_X will be omitted. Since
(X − μ_X)² = X² − 2μ_X X + μ_X² and E[μ_X X] = μ_X E[X] = (E[X])², by
Theorem 4.2.2

var X = σ² = E[(X − μ_X)²] = E[X²] − (E[X])².

It is easily checked that var(aX) = a² var X and var(X + c) = var X.
EXAMPLE 4.12 Let X be a random variable having a uniform density
on {0, 1, ..., n}. It was shown in the previous section that E[X] = n/2. By
Exercise 1.2.6,

E[X²] = Σ_{x=0}^n x² · 1/(n + 1) = (1/(n + 1))[1² + 2² + · · · + n²] = n(2n + 1)/6.

Thus,

var X = n(2n + 1)/6 − (n/2)² = n(n + 2)/12. ■
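The formula n(n + 2)/12 can be checked exactly in a few lines. This is a hypothetical helper of ours that computes the variance straight from the definition with exact rational arithmetic:

```python
from fractions import Fraction

def uniform_variance(n):
    # var X for X uniform on {0, 1, ..., n}, straight from the definition
    values = range(n + 1)
    mean = Fraction(sum(values), n + 1)
    return sum((x - mean) ** 2 for x in values) / (n + 1)

for n in (4, 10, 25):
    assert uniform_variance(n) == Fraction(n * (n + 2), 12)
print(uniform_variance(10))  # 10
```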
EXAMPLE 4.13 Let X be a random variable having a binomial density
b(·; n, p). It was shown in the previous section that E[X] = np and that
E[X²] = n²p² − np² + np, so that

var X = E[X²] − (E[X])² = np(1 − p). ■
EXAMPLE 4.14 Let X₁, ..., Xₙ be independent random variables all having the same density function and finite second moments. Let μ = E[X_j] and
σ² = var X_j, 1 ≤ j ≤ n, and let Sₙ = X₁ + · · · + Xₙ. By Theorem 4.3.3,
E[Sₙ] = nμ. The variance of Sₙ can be calculated using the equation

(Sₙ − nμ)² = ( Σ_{j=1}^n (X_j − μ) )² = Σ_{j=1}^n (X_j − μ)² + Σ_{i≠j} (X_i − μ)(X_j − μ),

provided the terms on the right have finite expectations. The terms of the
first sum have finite expectations because the X_j have finite second moments.
By independence and Theorem 4.3.4, the terms of the second sum have finite
expectations, and E[(X_i − μ)(X_j − μ)] = E[X_i − μ]E[X_j − μ] = 0 for i ≠ j. Thus,

var Sₙ = E[(Sₙ − nμ)²] = Σ_{j=1}^n E[(X_j − μ)²] = nσ². ■
Let X be a nonnegative integer-valued random variable with finite second
moment. We have seen that E[X] can be calculated from the generating
function f̂_X by the equation E[X] = f̂_X′(1); var X can also be calculated from
the generating function. In fact,

var X = f̂_X″(1) + f̂_X′(1) − [f̂_X′(1)]².  (4.5)

To see this, recall that f̂_X(t) = Σ_{x=0}^∞ f_X(x) t^x, so that

f̂_X″(t) = Σ_{x=0}^∞ x(x − 1) f_X(x) t^{x−2}

on the interval (−1, 1). By Abel's theorem,

f̂_X″(1) = Σ_{x=0}^∞ x(x − 1) f_X(x) = E[X(X − 1)] = E[X²] − E[X],

and so E[X²] = f̂_X″(1) + f̂_X′(1) and var X = f̂_X″(1) + f̂_X′(1) − [f̂_X′(1)]².
EXAMPLE 4.15 Let X be a random variable having a Poisson density with
parameter λ > 0. Then f̂_X(t) = e^{λ(t−1)}, f̂_X′(t) = λe^{λ(t−1)}, and f̂_X″(t) = λ²e^{λ(t−1)}.
Thus, var X = f̂_X″(1) + f̂_X′(1) − [f̂_X′(1)]² = λ² + λ − λ² = λ. ■
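Equation (4.5) can be checked numerically for the Poisson density by computing the factorial moments f̂′(1) = E[X] and f̂″(1) = E[X(X − 1)] as direct sums. A sketch with names of our own choosing:

```python
from math import exp

def poisson_factorial_moments(lam, terms=200):
    # E[X] and E[X(X-1)] for a Poisson(lam) density, by direct summation;
    # these equal f'(1) and f''(1) for the generating function e^{lam(t-1)}.
    f = exp(-lam)            # f_X(0)
    m1 = m2 = 0.0
    for x in range(1, terms):
        f *= lam / x         # f_X(x) = f_X(x-1) * lam / x
        m1 += x * f
        m2 += x * (x - 1) * f
    return m1, m2

lam = 2.5
m1, m2 = poisson_factorial_moments(lam)
print(m2 + m1 - m1 ** 2)     # variance by Equation (4.5), ≈ 2.5 = lam
```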
The mean and variance of a random variable X are just two parameters that
summarize some of the information in its density function. Even though in
most cases they do not determine the density, they can provide information
about probabilities.
Lemma 4.3.5
(Markov's
Inequality)
If X is any random variable with finite expectation and t > 0, then

P(|X| ≥ t) ≤ E[|X|] / t.

PROOF: Let {x₁, x₂, ...} be the range of X. Since the series defining E[X] is
absolutely convergent, E[|X|] < +∞. By Theorem 4.2.1,

E[|X|] = Σ_j |x_j| f_X(x_j) ≥ Σ_{|x_j| ≥ t} |x_j| f_X(x_j) ≥ t Σ_{|x_j| ≥ t} f_X(x_j) = t P(|X| ≥ t). ■
The next inequality is an easy consequence of Markov’s inequality.
Theorem 4.3.6
(Chebyshev's
Inequality)
Let X be a random variable with mean μ and finite variance σ². Then

P(|X − μ| ≥ δ) ≤ σ²/δ²

for all δ > 0.

PROOF: By Markov's inequality,

P(|X − μ| ≥ δ) = P((X − μ)² ≥ δ²) ≤ E[(X − μ)²]/δ² = σ²/δ². ■
Consider an infinite sequence of Bernoulli trials {X_j}_{j=1}^∞ with probability
of success p and let Sₙ = X₁ + · · · + Xₙ be the number of successes in n
trials. We know that Sₙ has a b(·; n, p) density, that E[Sₙ] = np, and that
var Sₙ = np(1 − p). By Chebyshev's inequality,

P(|Sₙ/n − p| ≥ δ) = P(|Sₙ − np| ≥ nδ) ≤ np(1 − p)/(n²δ²) = p(1 − p)/(nδ²).

By maximizing the function g(p) = p(1 − p), 0 ≤ p ≤ 1, it is easily seen
that the maximum value of g is 1/4. Thus,

P(|Sₙ/n − p| ≥ δ) ≤ 1/(4nδ²).  (4.6)
Since Sₙ represents the number of successes in n trials, Sₙ/n represents the
relative frequency of successes in n trials. Taking the limit as n → ∞,

lim_{n→∞} P(|Sₙ/n − p| ≥ δ) = 0  (4.7)

for all δ > 0; i.e., given a prescribed error δ > 0, the probability that the relative
frequency Sₙ/n will differ from p by more than δ goes to zero as n → ∞. This
sounds suspiciously like the empirical law for relative frequencies, but it is, in
fact, a mathematical theorem.
Inequality 4.6 can be used to determine how many repetitions of an
experiment are required to pin down the probability of success p when it is
unknown.
EXAMPLE 4.16 Consider an infinite sequence of Bernoulli trials with
probability of success p. How many repetitions are required to be 97 percent
confident that the relative frequency of success Sₙ/n will be within .05 of p?
That is, how do we choose n so that

P(|Sₙ/n − p| ≥ .05) ≤ .03?

By Inequality 4.6, if we choose n so that

1/(4n(.05)²) ≤ .03,

then the above condition will be satisfied. Therefore,

n ≥ 1/(4(.05)²(.03)) ≈ 3333.3,

and so n can be taken to be 3334. ■
The number n = 3334 in this example is rather large, but it must be
remembered that nothing has been assumed about p. Any preliminary
information about p can reduce the number n by several factors; e.g., if it is
known that p pertains to an event that is relatively uncommon, say p ≤ 1/10,
then p(1 − p) ≤ 9/100 and n can be reduced to 1200.
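The computation in Example 4.16 amounts to solving Inequality 4.6 for n. A small exact-arithmetic sketch (function name ours; `pq_bound` is an assumed upper bound on p(1 − p), 1/4 when nothing is known about p):

```python
from fractions import Fraction
from math import ceil

def trials_needed(delta, alpha, pq_bound="1/4"):
    # smallest n with pq_bound / (n * delta^2) <= alpha, per Inequality 4.6;
    # exact rationals avoid floating-point trouble at the boundary
    delta, alpha, pq_bound = Fraction(delta), Fraction(alpha), Fraction(pq_bound)
    return ceil(pq_bound / (alpha * delta ** 2))

print(trials_needed("0.05", "0.03"))                    # 3334, as in Example 4.16
print(trials_needed("0.05", "0.03", pq_bound="0.09"))   # 1200 when p <= 1/10
```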
The fact that lim_{n→∞} P(|Sₙ/n − p| ≥ δ) = 0 for all δ > 0 was first
proved by Jacob Bernoulli around 1713. It is a special case of a slightly more
general result.
Theorem 4.3.7
(Weak Law
of Large
Numbers)
Let {X_j}_{j=1}^∞ be a sequence of independent random variables all having the same
density function and finite second moments. If Sₙ = X₁ + · · · + Xₙ and
μ = E[X_j], j ≥ 1, then

lim_{n→∞} P(|Sₙ/n − μ| ≥ δ) = 0

for all δ > 0.
PROOF: Let σ² = var X_j, j ≥ 1. By Theorem 4.3.3 and Example 4.14,
E[Sₙ] = nμ and var Sₙ = nσ². Thus, for each δ > 0,

P(|Sₙ/n − μ| ≥ δ) = P(|Sₙ − nμ| ≥ nδ) ≤ var Sₙ/(n²δ²) = σ²/(nδ²) → 0

as n → ∞. ■
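The convergence in the weak law can be seen exactly for Bernoulli trials, since P(|Sₙ/n − p| ≥ δ) is a finite binomial sum. The sketch below (our helper) prints each exact value next to the Chebyshev bound of Inequality 4.6:

```python
from fractions import Fraction
from math import comb

def prob_deviation(n, p, delta):
    # exact P(|S_n/n - p| >= delta) for S_n with density b(.; n, p)
    p, delta = Fraction(p), Fraction(delta)
    total = sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
                for k in range(n + 1)
                if abs(Fraction(k, n) - p) >= delta)
    return float(total)

for n in (100, 400, 1600):
    bound = 1 / (4 * n * 0.05 ** 2)      # Inequality 4.6
    print(n, prob_deviation(n, "1/2", "0.05"), bound)
```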
There is a strong law of large numbers that reflects the empirical law more
precisely than the weak law. The strong law is beyond the scope of this book.
EXERCISES 4.3

1. Suppose two dice are rolled, one red and one white. Let X be the
number of pips on the red die, let Y be the number of pips on the
white die, and let Z be the larger of the two numbers of pips. The joint
density f_{X,Z} is given by Equation 3.6. Calculate E[XZ].

2. Let X and Y be as in Problem 1 and let U = min(X, Y). Calculate
E[U].

3. Let X be a random variable having a geometric density with parameter
p. Use the generating function f̂_X(t) to find var X.

4. Let X be a random variable having generating function f̂(t) = .
Calculate var X.

5. Let X be a random variable having a negative binomial density with
parameters r and p. Calculate var X.

6. A manufacturer produces items of which 3 percent are defective.
The manufacturer contracts to sell 10,000 items to a buyer with the
stipulation that if the number of defective items exceeds d units, then
the buyer can claim a full refund. How should d be chosen so that the
manufacturer does not have to give a refund to more than 5 percent of
the buyers?

7. Consider an infinite sequence of Bernoulli trials with probability of
success p for which it is known that p ≤ 1/4. How many trials
are required to be 90 percent confident that the relative frequency of
successes Sₙ/n will be within .05 of p?

The next three problems pertain to a population of n objects of which n₁ are
of Type 1, n₂ are of Type 2, ..., n_s are of Type s.

8. If Type 1 objects have value V₁, Type 2 have value V₂, ..., Type s have
value V_s, and V is the value of a random sample of size r ≤ n without
replacement from the population, derive a formula for E[V].

9. A commercial fisherman is allowed to net 50 game fish each month from
a lake in which 30 percent of the fish are largemouth bass, 10 percent are
smallmouth bass, 20 percent are white bass, and 40 percent are walleyes.
If the largemouth bass average 2.5 pounds, the smallmouth bass 1.8
pounds, the white bass 1.2 pounds, and the walleye 2.4 pounds, what is
the expected weight of his catch?

10. For j = 1, ..., s, let X_j be the number of Type j objects in a random
sample of size r ≤ n without replacement from the population. Use
the auxiliary random variables of Example 4.10 to calculate var X_j.

11. Let X be a random variable having a finite second moment. If
μ = E[X] and σ² = var X = 0, show that X = μ with probability 1.

12. If X is a nonnegative integer-valued random variable, then E[X] =
f̂_X′(1) = ĝ_X(1). Assuming that E[X²] is finite, express var X in terms
of ĝ_X.

13. Calculate the standard deviation σ_T of the waiting time T of Example 4.8.
4.4 COVARIANCE AND CORRELATION
It was shown in the previous section that the expected value of a sum of random
variables is the sum of the expected values. Is this true of variances? Generally
speaking, it is not.

A simple inequality will be needed for the next result. If a, b are any real
numbers, then

(a + b)² ≤ 2(a² + b²).

This follows from the fact that a² − 2ab + b² = (a − b)² ≥ 0, so that
2ab ≤ a² + b² and (a + b)² = a² + 2ab + b² ≤ 2(a² + b²).
Lemma 4.4.1
If X₁, ..., Xₙ are random variables with finite second moments and c₁, ..., cₙ are
any real numbers, then Σ_{j=1}^n c_j X_j has a finite second moment.

PROOF: If the random variable X with range {x₁, x₂, ...} has finite second moment and c ∈ R, then cX has finite second moment since Σ_j (cx_j)² f_X(x_j) =
c² Σ_j x_j² f_X(x_j) < +∞. Thus, each c_j X_j has finite second moment and it
can be assumed that c_j = 1, 1 ≤ j ≤ n. We prove the result for the
n = 2 case first. Since (X₁ + X₂)² ≤ 2(X₁² + X₂²), by Theorem 4.3.2
E[(X₁ + X₂)²] ≤ 2(E[X₁²] + E[X₂²]) < +∞, and X₁ + X₂ has finite second moment. The general case follows from a mathematical induction argument. ■
Consider two random variables X and Y with finite second moments. Since
μ_{X+Y} = E[X + Y] = μ_X + μ_Y,

var(X + Y) = E[((X + Y) − (μ_X + μ_Y))²]
= E[((X − μ_X) + (Y − μ_Y))²]
= E[(X − μ_X)²] + E[(Y − μ_Y)²] + 2E[(X − μ_X)(Y − μ_Y)]
= var X + var Y + 2E[(X − μ_X)(Y − μ_Y)].

The last term will be given a name of its own. But we must first establish that
it is finite.
Theorem 4.4.2
(Schwarz's
Inequality)
If X and Y have finite second moments, then

(E[XY])² ≤ E[X²]E[Y²];  (4.8)

equality holds if and only if P(X = 0) = 1 or P(Y = aX) = 1 for some constant a.

PROOF: Either P(X = 0) = 1 or P(X = 0) < 1. In the first case, equality
holds in Equation 4.8 because both sides are zero. We can therefore assume
that P(X = 0) < 1, which means that X takes on some value x₀ ≠ 0 with
positive probability, so that E[X²] = Σ_j x_j² f_X(x_j) > 0. Define a quadratic
function by the equation

g(λ) = E[(Y − λX)²] = E[Y²] − 2λE[XY] + λ²E[X²].

This function has a minimum value at

λ₀ = E[XY]/E[X²].

Thus, 0 ≤ E[(Y − λ₀X)²] ≤ E[(Y − λX)²] for all real numbers λ. Replacing
λ₀ by E[XY]/E[X²],

E[(Y − λ₀X)²] = E[Y²] − 2λ₀E[XY] + λ₀²E[X²]
= E[Y²] − 2(E[XY])²/E[X²] + (E[XY])²/E[X²]
= E[Y²] − (E[XY])²/E[X²],

and so

0 ≤ E[(Y − λ₀X)²] = E[Y²] − (E[XY])²/E[X²].

On the one hand, this implies that

(E[XY])² ≤ E[X²]E[Y²];

on the other hand, if there is equality, then E[(Y − λ₀X)²] = 0. If Y − λ₀X
takes on some nonzero value with positive probability, we would have E[(Y −
λ₀X)²] > 0, a contradiction. Thus, P(Y − λ₀X = 0) = 1. ■
If X and Y have finite second moments, then we know that both E[(X −
μ_X)²] and E[(Y − μ_Y)²] are finite. Applying Inequality 4.8 to the random
variables X − μ_X and Y − μ_Y,

(E[(X − μ_X)(Y − μ_Y)])² ≤ E[(X − μ_X)²]E[(Y − μ_Y)²] < +∞,

and therefore E[(X − μ_X)(Y − μ_Y)] is defined.
Definition 4.3
If X and Y have finite second moments, the covariance of X and Y, denoted by
cov(X, Y), is defined by

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].

Alternatively,

cov(X, Y) = E[XY] − E[X]E[Y]. ■
Note that cov(X, c) = E[(X − μ_X)(c − c)] = 0 whenever c is a constant,
that cov(X, X) = E[X²] − (E[X])² = var X, and also that cov(X, Y) = 0
whenever X and Y are independent, by Theorem 4.3.4. We now return to the
variance of a sum.
Theorem 4.4.3
If X₁, ..., Xₙ have finite second moments, then

var( Σ_{j=1}^n X_j ) = Σ_{j=1}^n var X_j + 2 Σ_{1≤i<j≤n} cov(X_i, X_j).

PROOF: Since E[ Σ_{j=1}^n X_j ] = Σ_{j=1}^n μ_{X_j},

var( Σ_{j=1}^n X_j ) = E[( Σ_{j=1}^n X_j − Σ_{j=1}^n μ_{X_j} )²]
= E[( Σ_{j=1}^n (X_j − μ_{X_j}) )²]
= Σ_{i,j=1}^n E[(X_i − μ_{X_i})(X_j − μ_{X_j})]
= Σ_{j=1}^n E[(X_j − μ_{X_j})²] + 2 Σ_{1≤i<j≤n} E[(X_i − μ_{X_i})(X_j − μ_{X_j})]
= Σ_{j=1}^n var X_j + 2 Σ_{1≤i<j≤n} cov(X_i, X_j). ■
Corollary 4.4.4
If X₁, ..., Xₙ are independent random variables having finite second moments,
then

var( Σ_{j=1}^n X_j ) = Σ_{j=1}^n var X_j.

PROOF: For i ≠ j, X_i and X_j are independent and cov(X_i, X_j) = 0. ■
There is a more general version of Theorem 4.4.3.
Theorem 4.4.5
Let X₁, ..., X_m, Y₁, ..., Yₙ have finite second moments and let a₁, ..., a_m,
b₁, ..., bₙ be arbitrary real numbers. Then

cov( Σ_{i=1}^m a_i X_i, Σ_{j=1}^n b_j Y_j ) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j cov(X_i, Y_j).

PROOF: By Theorem 4.3.3,

E[ Σ_{i=1}^m a_i X_i ] = Σ_{i=1}^m a_i E[X_i]

and

E[ Σ_{j=1}^n b_j Y_j ] = Σ_{j=1}^n b_j E[Y_j].

Since

( Σ_{i=1}^m a_i X_i )( Σ_{j=1}^n b_j Y_j ) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j X_i Y_j,

E[( Σ_{i=1}^m a_i X_i )( Σ_{j=1}^n b_j Y_j )] = Σ_{i=1}^m Σ_{j=1}^n a_i b_j E[X_i Y_j].

Thus,

cov( Σ_{i=1}^m a_i X_i, Σ_{j=1}^n b_j Y_j )
= E[( Σ_{i=1}^m a_i X_i )( Σ_{j=1}^n b_j Y_j )] − E[ Σ_{i=1}^m a_i X_i ] E[ Σ_{j=1}^n b_j Y_j ]
= Σ_{i=1}^m Σ_{j=1}^n a_i b_j (E[X_i Y_j] − E[X_i]E[Y_j])
= Σ_{i=1}^m Σ_{j=1}^n a_i b_j cov(X_i, Y_j). ■
EXAMPLE 4.17 Consider an experiment in which balls numbered 1,
2, ..., n are distributed at random in n boxes so that the total number of
outcomes is n!. Let Sₙ be the number of matches; i.e., the number of balls in
boxes having the same number. The range of Sₙ is {0, 1, ..., n}. Suppose we
want to calculate E[Sₙ] and var Sₙ. For j = 1, ..., n, let X_j = 1 or 0 according
to whether the jth ball is in the jth box or not. Then Sₙ = X₁ + · · · + Xₙ. Since
P(X_j = 1) = (n − 1)!/n! = 1/n, E[X_j] = 1/n. Thus, E[Sₙ] = 1. Since
X_j² = X_j,

var X_j = E[X_j²] − (E[X_j])² = E[X_j] − (E[X_j])² = 1/n − 1/n².

We now calculate E[X_j X_k] for j ≠ k. Now X_j X_k is 1 or 0 according to
whether the jth and kth balls are in the corresponding boxes or not. Thus,
P(X_j X_k = 1) = (n − 2)!/n! = 1/(n(n − 1)), so that for j ≠ k,

cov(X_j, X_k) = E[X_j X_k] − E[X_j]E[X_k] = 1/(n(n − 1)) − 1/n² = 1/(n²(n − 1)).

By Theorem 4.4.3,

var Sₙ = Σ_{j=1}^n var X_j + 2 Σ_{1≤i<j≤n} cov(X_i, X_j).

Since all the terms in the second sum are equal to 1/(n²(n − 1)) and the
number of terms is the number of ways of selecting two distinct integers i and
j from {1, ..., n} without regard to order,

var Sₙ = n(1/n − 1/n²) + 2 · C(n, 2) · 1/(n²(n − 1)) = 1 − 1/n + 1/n = 1.

Therefore, E[Sₙ] = 1 and var Sₙ = 1. ■
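The claims E[Sₙ] = 1 and var Sₙ = 1 can be confirmed by exhaustive enumeration of all n! assignments for a small n. An illustrative helper of ours, using exact arithmetic:

```python
from fractions import Fraction
from itertools import permutations

def match_moments(n):
    # exact mean and variance of the number of matches (fixed points)
    # over all n! equally likely assignments of balls to boxes
    counts = [sum(pi[i] == i for i in range(n)) for pi in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, var

print(match_moments(6))  # (Fraction(1, 1), Fraction(1, 1))
```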
There are good reasons for replacing the random variable X by a centered
and normalized random variable (X − μ_X)/σ_X.

Definition 4.4
If X and Y are two random variables having finite second moments, the correlation between X and Y, denoted by ρ(X, Y), is defined by

ρ(X, Y) = cov(X, Y)/(σ_X σ_Y) = E[((X − μ_X)/σ_X)((Y − μ_Y)/σ_Y)]. ■
If X and Y are independent random variables with finite second moments,
then ρ(X, Y) = 0 since cov(X, Y) = 0 in this case. The converse is not true
in general: it is possible for ρ(X, Y) = 0 without X and Y being independent.
Also, replacing X and Y in ρ(X, Y) by certain linear functions of X and Y,
respectively, does not change the correlation; i.e.,

ρ(aX + b, cY + d) = ρ(X, Y)  whenever a > 0, c > 0.

This follows from the fact that var(aX + b) = E[((aX + b) − (aμ_X + b))²] =
E[a²(X − μ_X)²] = a² var X, var(cY + d) = c² var Y, and

cov(aX + b, cY + d) = E[((aX + b) − (aμ_X + b))((cY + d) − (cμ_Y + d))]
= E[ac(X − μ_X)(Y − μ_Y)]
= ac cov(X, Y),

so that

ρ(aX + b, cY + d) = ac cov(X, Y)/(aσ_X cσ_Y) = ρ(X, Y)  (4.9)

whenever a > 0, c > 0.

The following result is of interest in its own right and also tells us something
about ρ(X, Y).
Theorem 4.4.6
Let X and Y be random variables with finite second moments, σ_X > 0, σ_Y > 0.
Then |ρ(X, Y)| ≤ 1 with equality if and only if there are constants a and b such
that P(Y = aX + b) = 1.

PROOF: Let X* = (X − μ_X)/σ_X and Y* = (Y − μ_Y)/σ_Y. By Equation 4.9,
ρ(X*, Y*) = ρ(X, Y). Since E[X*²] = E[((X − μ_X)/σ_X)²] = (1/σ_X²)E[(X −
μ_X)²] = 1 and likewise E[Y*²] = 1, by Inequality 4.8,

ρ(X, Y)² = ρ(X*, Y*)² = (E[X*Y*])² ≤ E[X*²]E[Y*²] = 1.

Thus, |ρ(X, Y)| ≤ 1 with equality if and only if P(Y* = aX*) = 1 for some
a ∈ R, in which case there are constants a and b such that P(Y = aX + b) =
1. ■
EXAMPLE 4.18 Let X₁, X₂, and X₃ be independent random variables
with σ²_{X₁} = 2, σ²_{X₂} = 4, and σ²_{X₃} = 3, respectively, and consider the problem
of calculating the correlation between the random variables 2X₁ − 3X₂ + 5X₃
and X₁ + 2X₂ − 4X₃. By independence, cov(X_i, X_j) = 0 whenever i ≠ j. By
Theorem 4.4.5,

cov(2X₁ − 3X₂ + 5X₃, X₁ + 2X₂ − 4X₃)
= (2)(1) cov(X₁, X₁) + (2)(2) cov(X₁, X₂) + (2)(−4) cov(X₁, X₃)
+ (−3)(1) cov(X₂, X₁) + (−3)(2) cov(X₂, X₂) + (−3)(−4) cov(X₂, X₃)
+ (5)(1) cov(X₃, X₁) + (5)(2) cov(X₃, X₂) + (5)(−4) cov(X₃, X₃)
= 2 cov(X₁, X₁) − 6 cov(X₂, X₂) − 20 cov(X₃, X₃)
= 2σ²_{X₁} − 6σ²_{X₂} − 20σ²_{X₃}
= −80.

Since the random variables 2X₁, −3X₂, and 5X₃ are independent,

var(2X₁ − 3X₂ + 5X₃) = var(2X₁) + var(−3X₂) + var(5X₃)
= 4σ²_{X₁} + 9σ²_{X₂} + 25σ²_{X₃} = 119.

Similarly,

var(X₁ + 2X₂ − 4X₃) = 66.

Therefore,

ρ(2X₁ − 3X₂ + 5X₃, X₁ + 2X₂ − 4X₃) = −80/(√119 √66) ≈ −.903. ■
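The arithmetic in Example 4.18 follows a single pattern for independent X_i: only the diagonal terms cov(X_i, X_i) = var X_i survive in Theorem 4.4.5. The sketch below (names ours) reproduces the computation:

```python
from math import sqrt

def corr_of_combos(a, b, variances):
    # rho(sum a_i X_i, sum b_i X_i) for independent X_i, via Theorem 4.4.5:
    # cov = sum a_i b_i var X_i, and each variance is sum a_i^2 var X_i
    cov = sum(ai * bi * v for ai, bi, v in zip(a, b, variances))
    var_a = sum(ai * ai * v for ai, v in zip(a, variances))
    var_b = sum(bi * bi * v for bi, v in zip(b, variances))
    return cov / sqrt(var_a * var_b)

rho = corr_of_combos([2, -3, 5], [1, 2, -4], [2, 4, 3])
print(rho)  # -80 / sqrt(119 * 66) ≈ -0.903
```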
The correlation between X and Y measures the linear dependence between
X and Y. In the case |ρ(X, Y)| = 1, there is a linear functional relationship
between X and Y. It is possible for two random variables U and V to be
related functionally in the same way as two random variables X and Y with
wide disparities between the correlations ρ(U, V) and ρ(X, Y).
EXAMPLE 4.19 Let X be a random variable that takes on values −1, 0, 1
with probabilities 1/4, 1/2, 1/4, respectively, and let Y = X². It is easy to
calculate that ρ(X, Y) = 0. If we let U = X + 1 and V = U², then U and V
have the same functional relationship as X and Y. It is also easy to calculate
that ρ(U, V) = 2√2/3 ≈ .94 if use is made of the fact that X³ = X and
X⁴ = X²; for example,

var V = E[V²] − (E[V])²
= E[(X + 1)⁴] − (E[(X + 1)²])²
= E[X⁴] + 4E[X³] + 6E[X²] + 4E[X] + 1 − (E[X²] + 2E[X] + 1)²
= E[X²] + 4E[X] + 6E[X²] + 4E[X] + 1 − (E[X²] + 2E[X] + 1)².

Since E[X] = 0 and E[X²] = 1/2, var V = 9/4. Even though U and V are
functionally related in the same way as X and Y, one pair has correlation zero
and the other pair has correlation close to 1. ■
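Both correlations in Example 4.19 can be computed mechanically from the three-point density of X. A sketch (helper names ours):

```python
from fractions import Fraction
from math import sqrt

def moments(dist, g):
    # E[g(X)] for a finite distribution {value: probability}
    return sum(p * g(x) for x, p in dist.items())

def corr(dist, f, g):
    # rho(f(X), g(X)) from the definition
    mf, mg = moments(dist, f), moments(dist, g)
    cov = moments(dist, lambda x: f(x) * g(x)) - mf * mg
    vf = moments(dist, lambda x: f(x) ** 2) - mf ** 2
    vg = moments(dist, lambda x: g(x) ** 2) - mg ** 2
    return cov / sqrt(vf * vg)

X = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
print(corr(X, lambda x: x, lambda x: x * x))            # rho(X, Y) = 0
print(corr(X, lambda x: x + 1, lambda x: (x + 1) ** 2)) # rho(U, V) = 2*sqrt(2)/3
```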
EXERCISES 4.4

1. The joint density function f_{X,Y}(x, y) of the random variables X and Y
is tabulated below. Calculate ρ(X, Y).

            y = 1   y = 2   y = 3   y = 4   y = 5
   x = −1    1/20    2/20    1/20      0       0
   x = 0       0     3/20    2/20    1/20      0
   x = 1     1/20    2/20    3/20      0       0
   x = 2       0     1/20    1/20    1/20    1/20
2. Let X and Y be random variables with ρ(X, Y) = 3/4, var X = 2, and
var Y = 1. Calculate var(X + 2Y).

3. A bowl contains r red balls and b black balls. An unordered random
sample of size 2 is selected from the bowl. Let X be the number of red
balls and Y the number of black balls in the sample. Calculate ρ(X, Y)
without using the joint density of X and Y.

4. Let X₁, X₂, and X₃ be independent random variables with σ²_{X₁} =
4, σ²_{X₂} = 3, and σ²_{X₃} = 1. Calculate ρ(X₁ + 2X₂ − X₃, 3X₁ − X₂ + X₃).

5. A bowl contains three balls numbered 1, 2, 3. Two balls are successively
selected at random from the bowl without replacement. If X is the
number on the first ball and Y the number on the second ball, calculate
ρ(X, Y).
6. Suppose n distinguishable balls are randomly distributed into r boxes.
If S_r is the number of empty boxes, then S_r = X₁ + · · · + X_r, where X_i is 1
or 0 according to whether box i is empty or not, 1 ≤ i ≤ r.
(a) Calculate E[X_i].
(b) Calculate E[X_i X_j], i ≠ j.
(c) Calculate E[S_r].
(d) Calculate var S_r.

7. Consider a basic experiment with r outcomes 1, 2, ..., r having probabilities p₁, p₂, ..., p_r, respectively, and consider n independent repetitions of this basic experiment. For i = 1, 2, ..., r, let Y_i be the number
of trials resulting in the outcome i. Writing Y_i = I_{i,1} + I_{i,2} + · · · + I_{i,n},
where I_{i,j} = 1 or 0 according to whether the jth trial results in i or not,
(a) Calculate E[I_{i,k} I_{j,ℓ}] for i ≠ j, k ≠ ℓ, 1 ≤ k, ℓ ≤ n.
(b) Calculate E[I_{i,k} I_{j,k}] for i ≠ j.
(c) Calculate E[Y_i] and E[Y_i Y_j], i ≠ j.
(d) Calculate var Y_i.
(e) Calculate ρ(Y_i, Y_j), i ≠ j.
8. Let X and Y be two random variables that take on only two values each.
If cov(X, Y) = 0, show that X and Y are independent.

9. Let X and Y be random variables with finite second moments. The
linear function aX + b of X is called the best mean square linear predictor
of Y if

E[(Y − aX − b)²] ≤ E[(Y − cX − d)²]

for all real numbers c, d ∈ R. Calculate a and b.
4.5 CONDITIONAL EXPECTATION
We have seen in some instances that conditional probabilities can be used to
simplify computations and, in fact, some probability models are defined in
terms of conditional probabilities. We will look at this concept in the context
of random variables.
Let X and Y be two random variables with ranges {x₁, x₂, ...} and {y₁, y₂, ...},
respectively. If P(X = x_j) > 0, then P(Y = y_k | X = x_j) is defined, and we
will let f_{Y|X}(y_k | x_j) denote this conditional probability. Thus,

f_{Y|X}(y_k | x_j) = f_{X,Y}(x_j, y_k) / f_X(x_j).

When P(X = x_j) = f_X(x_j) = 0, the above quotient is undefined, and we
define f_{Y|X}(y_k | x_j) = 0 whenever f_X(x_j) = 0. The function f_{Y|X}(y_k | x_j) of the
two variables x_j, y_k is called the conditional density of Y given X = x_j. The x_j
variable is usually thought of as a parameter. It follows from the definition that

f_{X,Y}(x_j, y_k) = f_{Y|X}(y_k | x_j) f_X(x_j).  (4.10)

EXAMPLE 4.20
A bowl contains chips numbered from 1 to 10. A chip
is selected at random from the bowl. If the chip selected is numbered x,
1 < x
10, then a second chip is selected at random from the chips
numbered 1,2,... ,x. This is an experiment for which the probability model
is defined in terms of conditional densities. Let X be the number on the first
chip and Y the number on the second chip. Then
forx = 1,2,..., 10
otherwise.
1/10
0
The remainder of the description of the experiment specifies /y|x(y|x). For
x = 1,2,..., 10,
)
*
A|x(/I
1/x
0
=
fory = 1,2,... ,x
otherwise.
Thus,
fx.y(x,y) =
1/1 Ox
0
for 1 < y < x, x = 1,2,..., 10
otherwise. ■
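The construction in Example 4.20 is easy to check numerically. The sketch below is ours, not the text's (the function name `f_joint` is illustrative); it tabulates the joint density with exact fractions and verifies that it sums to 1:

```python
from fractions import Fraction

# f_X(x) = 1/10 for x = 1..10; f_{Y|X}(y|x) = 1/x for y = 1..x.
def f_joint(x, y):
    """Joint density f_{X,Y}(x, y) = 1/(10x) for 1 <= y <= x <= 10."""
    if 1 <= y <= x <= 10:
        return Fraction(1, 10 * x)
    return Fraction(0)

# A joint density must sum to 1 over all pairs (x, y).
total = sum(f_joint(x, y) for x in range(1, 11) for y in range(1, 11))
print(total)  # 1
```

Using exact fractions avoids any floating-point doubt: for each x the inner sum contributes x · 1/(10x) = 1/10, and the ten values of x together give 1.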
Conditional probabilities can also be defined for collections of random
variables. In what follows, x_j will denote a typical value of X_j and y a typical
value of Y.
Definition 4.5
If Y, X_1, X_2, ..., X_m are random variables, the conditional density of Y given
X_1, X_2, ..., X_m is the function

f_{Y|X_1,...,X_m}(y | x_1, ..., x_m) = f_{X_1,...,X_m,Y}(x_1, ..., x_m, y) / f_{X_1,...,X_m}(x_1, ..., x_m)

whenever the denominator is different from zero and is equal to zero otherwise. ■
The conditional density satisfies the following equation:

f_{X_1,...,X_m,Y}(x_1, ..., x_m, y) = f_{Y|X_1,...,X_m}(y | x_1, ..., x_m) f_{X_1,...,X_m}(x_1, ..., x_m).    (4.11)

Theorem 4.5.1
If the random variable Y is independent of the collection of random variables
{X_1, ..., X_m} (i.e., f_{X_1,...,X_m,Y}(x_1, ..., x_m, y) = f_{X_1,...,X_m}(x_1, ..., x_m) f_Y(y)),
then

f_{Y|X_1,...,X_m}(y | x_1, ..., x_m) = f_Y(y)

whenever f_{X_1,...,X_m}(x_1, ..., x_m) > 0.
PROOF: The result follows directly from the definition of the conditional
density. ■
EXAMPLE 4.21  Consider an infinite sequence of Bernoulli random variables
{X_j}_{j=1}^∞ with probability of success p. Fixing n > 1,

f_{X_n|X_1,...,X_{n-1}}(x_n | x_1, ..., x_{n-1}) = f_{X_n}(x_n)

whenever f_{X_1,...,X_{n-1}}(x_1, ..., x_{n-1}) > 0. This follows from the fact that the
X_1, ..., X_n are independent random variables, so that

f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) × ... × f_{X_n}(x_n)
    = f_{X_1,...,X_{n-1}}(x_1, ..., x_{n-1}) f_{X_n}(x_n),

and therefore X_n is independent of the collection {X_1, ..., X_{n-1}}. ■
Let {y_1, y_2, ...} be the range of the random variable Y. For any values
x_1, ..., x_n of X_1, ..., X_n, respectively, such that f_{X_1,...,X_n}(x_1, ..., x_n) > 0, the
conditional density f_{Y|X_1,...,X_n}(y | x_1, ..., x_n) is a density function as a function
of y since

f_{Y|X_1,...,X_n}(y_k | x_1, ..., x_n) = P(Y = y_k | X_1 = x_1, ..., X_n = x_n),

the conditional probabilities on the right are nonnegative, and the union of the
disjoint events (Y = y_k) is all of Ω, so that

Σ_{y_k} f_{Y|X_1,...,X_n}(y_k | x_1, ..., x_n) = Σ_{y_k} P(Y = y_k | X_1 = x_1, ..., X_n = x_n)
    = P(∪_k (Y = y_k) | X_1 = x_1, ..., X_n = x_n)
    = P(Ω | X_1 = x_1, ..., X_n = x_n)
    = 1.
Definition 4.6
Let {Y, X_1, ..., X_n} be a collection of random variables with E[Y] finite. The
conditional expectation of Y given X_1 = x_1, ..., X_n = x_n is defined by

E[Y | X_1 = x_1, ..., X_n = x_n] = Σ_j y_j f_{Y|X_1,...,X_n}(y_j | x_1, ..., x_n)

whenever f_{X_1,...,X_n}(x_1, ..., x_n) > 0 and is defined arbitrarily when
f_{X_1,...,X_n}(x_1, ..., x_n) = 0. ■
The definition of E[Y] required that the defining series be absolutely
convergent, but no mention is made of absolute convergence of the series above
defining E[Y | X_1 = x_1, ..., X_n = x_n]. The absolute convergence is inherent in
the requirement that E[Y] be finite; i.e., that E[|Y|] < +∞. By Theorem 3.4.3,

+∞ > Σ_j |y_j| f_Y(y_j)
    = Σ_j |y_j| Σ_{x_1,...,x_n} f_{X_1,...,X_n,Y}(x_1, ..., x_n, y_j)
    = Σ_j Σ_{x_1,...,x_n} |y_j| f_{X_1,...,X_n,Y}(x_1, ..., x_n, y_j)
    = Σ_{x_1,...,x_n} Σ_j |y_j| f_{Y|X_1,...,X_n}(y_j | x_1, ..., x_n) f_{X_1,...,X_n}(x_1, ..., x_n)
    = Σ_{x_1,...,x_n} ( Σ_j |y_j| f_{Y|X_1,...,X_n}(y_j | x_1, ..., x_n) ) f_{X_1,...,X_n}(x_1, ..., x_n).

Thus, for any term with f_{X_1,...,X_n}(x_1, ..., x_n) > 0, the series within the
parentheses converges, and so the series defining E[Y | X_1 = x_1, ..., X_n = x_n]
converges absolutely.
EXAMPLE 4.22  Consider the random variables X and Y of Example 4.20.
Suppose x ∈ {1, 2, ..., 10}. By Exercise 1.2.6,

E[Y | X = x] = Σ_{y=1}^{x} y f_{Y|X}(y | x) = (1/x) Σ_{y=1}^{x} y = (x + 1)/2

for x = 1, 2, ..., 10. ■
We will now consider operational properties of the conditional expectation.
Theorem 4.5.2
If Y, X_1, ..., X_n are any random variables with E[Y] finite, then

E[Y] = Σ_{x_1,...,x_n} E[Y | X_1 = x_1, ..., X_n = x_n] f_{X_1,...,X_n}(x_1, ..., x_n).
PROOF: By definition of the conditional expectation,

Σ_{x_1,...,x_n} E[Y | X_1 = x_1, ..., X_n = x_n] f_{X_1,...,X_n}(x_1, ..., x_n)
    = Σ_{x_1,...,x_n} Σ_{y_k} y_k f_{Y|X_1,...,X_n}(y_k | x_1, ..., x_n) f_{X_1,...,X_n}(x_1, ..., x_n)
    = Σ_{x_1,...,x_n} Σ_{y_k} y_k f_{X_1,...,X_n,Y}(x_1, ..., x_n, y_k)
    = Σ_{y_k} Σ_{x_1,...,x_n} y_k f_{X_1,...,X_n,Y}(x_1, ..., x_n, y_k)
    = Σ_{y_k} y_k f_Y(y_k)
    = E[Y].

The interchange of order of summation is justifiable by absolute convergence
and Theorem 3.4.4. ■
EXAMPLE 4.23  Consider the random variables X and Y of Example 4.22.
The expected value of Y is given by

E[Y] = Σ_{x=1}^{10} E[Y | X = x] f_X(x) = Σ_{x=1}^{10} ((x + 1)/2)(1/10)
     = (1/20) Σ_{x=1}^{10} (x + 1) = 3.25. ■
Theorem 4.5.3
Let Y be a random variable with finite expectation that is independent of
X_1, ..., X_n. Then

E[Y | X_1 = x_1, ..., X_n = x_n] = E[Y]

whenever f_{X_1,...,X_n}(x_1, ..., x_n) > 0.

PROOF: Suppose f_{X_1,...,X_n}(x_1, ..., x_n) > 0. By Theorem 4.5.1,

E[Y | X_1 = x_1, ..., X_n = x_n] = Σ_{y_k} y_k f_{Y|X_1,...,X_n}(y_k | x_1, ..., x_n)
    = Σ_{y_k} y_k f_Y(y_k) = E[Y]. ■
We mention in passing that if {Y_1, ..., Y_m} and {X_1, ..., X_n} are two
collections of random variables and ψ(Y_1, ..., Y_m) has finite expectation, then

E[ψ(Y_1, ..., Y_m) | X_1 = x_1, ..., X_n = x_n]
    = Σ_{y_1,...,y_m} ψ(y_1, ..., y_m) f_{Y_1,...,Y_m|X_1,...,X_n}(y_1, ..., y_m | x_1, ..., x_n).

The proof of this result again involves rearranging the terms of an infinite
series. Using this result, properties of conditional expected value analogous to
those of expected value can be proved; e.g., the conditional expectation of a
sum of random variables is equal to the sum of the conditional expectations.
In the remainder of this section, we will deal with the expected duration of
play for the gambler’s ruin problem. Consider an infinite sequence of Bernoulli
random variables {X_j}_{j=1}^∞ with probability of success p and the associated
gambler’s ruin problem. We will use the notation of Section 3.5 where we
were able to calculate the probabilities of eventual ruin given in Equations 3.14
and 3.15. We will now consider how long the play will last. If the gambler’s
initial capital is x, let Tx = n if play terminates on the nth play (i.e., either
the gambler or his adversary is ruined on the nth play) and let Dx = E[TX]
be the expected duration of play. Suppose 1 < x < a — 1. If the gambler
wins one unit on the first play (with probability p), then his capital becomes
x + 1, and the subsequent expected duration of play is Dx+l; if he loses one
unit on the first play (with probability q), then his capital becomes x — 1, and
the subsequent expected duration of play is Dx-i. Since one play has already
taken place and these two possibilities are mutually exclusive,
D_x = p(D_{x+1} + 1) + q(D_{x-1} + 1)   if 1 < x < a − 1.
If x = 1 and the gambler loses on the first play, then his subsequent expected
duration is zero; since one play has already taken place, the second term in this
equation becomes just q. Similarly, the first term becomes just p if x = a — 1
and he wins on the first play. This means that this equation holds for x = 1
and x = a − 1 provided we put D_0 = D_a = 0. Therefore, the expected
duration Dx satisfies the difference equation
D_x = pD_{x+1} + qD_{x-1} + 1   if 1 ≤ x ≤ a − 1,    (4.12)

subject to the boundary conditions

D_0 = 0,  D_a = 0.    (4.13)
It should be emphasized that the derivation of the difference equation and
boundary conditions is heuristic and not mathematical. Were it not for the
constant term in Equation 4.12, we could solve this problem as in Section 3.5.
The procedure for solving this problem is as follows. Note that if u_x satisfies
the equation

u_x = pu_{x+1} + qu_{x-1}   for 1 ≤ x ≤ a − 1    (4.14)

and v_x^{(p)} satisfies the equation

v_x^{(p)} = pv_{x+1}^{(p)} + qv_{x-1}^{(p)} + 1   for 1 ≤ x ≤ a − 1,    (4.15)

then u_x + v_x^{(p)} satisfies Equation 4.12. Since Equation 4.14 is the same as
Equation 3.12 in Section 3.5, we can use the results of Section 3.5 to solve
Equation 4.14 depending upon whether p ≠ q or p = q. In the p ≠ q case,

u_x = A + B(q/p)^x,

where A and B are arbitrary constants, and in the p = q case

u_x = A + Bx.

Since there are two arbitrary constants A and B in these solutions, it suffices
to find some v_x^{(p)}, called a particular solution, of Equation 4.15. In the p ≠ q
case, we can take

v_x^{(p)} = x/(q − p),

and in the p = q case,

v_x^{(p)} = −x².

We can therefore find a solution to Equation 4.12 in the p ≠ q case of the form

D_x = x/(q − p) + A + B(q/p)^x,   1 ≤ x ≤ a − 1,

and in the p = q case of the form

D_x = −x² + A + Bx,   1 ≤ x ≤ a − 1.

Choosing A and B to satisfy the boundary conditions 4.13, in the p ≠ q case,

D_x = x/(q − p) − (a/(q − p)) (1 − (q/p)^x)/(1 − (q/p)^a),   1 ≤ x ≤ a − 1,    (4.16)

and in the p = q case,

D_x = x(a − x),   1 ≤ x ≤ a − 1.    (4.17)

Against an infinitely rich adversary in the unfair case q > p, lim_{a→∞} D_x =
x/(q − p); in the fair case q = p, lim_{a→∞} D_x = +∞.
EXAMPLE 4.24  Suppose a gambler and his adversary each have $100
and $1 is wagered each time in a fair game. The expected duration is then
D_100 = 100(200 − 100) = 10,000. If one-half dollar is wagered each time,
then this has the effect of doubling the units, and the expected duration is then
D_200 = 200(400 − 200) = 40,000. We saw in Section 3.5 that doubling the
number of units has no effect on the probability of eventual ruin in the fair
case; doubling the units by wagering one-half unit on each play only prolongs
the agony. ■
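A Monte Carlo check of the fair-case formula D_x = x(a − x) is straightforward. The sketch below is ours (parameter values are illustrative), simulating the play directly:

```python
import random

def duration(x, a, p=0.5, rng=random):
    """Number of plays until the gambler's capital reaches 0 or a."""
    n = 0
    while 0 < x < a:
        x += 1 if rng.random() < p else -1  # win or lose one unit
        n += 1
    return n

random.seed(1)
a, x0 = 20, 10
est = sum(duration(x0, a) for _ in range(5000)) / 5000
print(est)  # should be near x0 * (a - x0) = 100 in the fair case
```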
The heuristic argument used to derive the difference equation 4.12 and
the boundary conditions 4.13 should not be confused with a mathematical
derivation. Although a proper mathematical argument can be made, the details
are too tedious at this stage.
EXERCISES 4.5
1.
An experiment consists of selecting an integer X at random from
{1,2,..., 100} and then selecting an integer Y at random from
{1, 2, ..., X}. Calculate E[Y] and var Y.
2. An experiment consists of selecting an integer X at random from
{0,1,..., 100} and then selecting an integer Y at random from
{0, 1, ..., X}. Use the results of Example 4.1 and Example 4.12 to
identify E[Y | X = x] and E[Y² | X = x] for x = 0, 1, ..., 100 and
then calculate E[Y] and var Y.
3. Let X_1 and X_2 be independent random variables with Poisson densities
p(·; λ_1) and p(·; λ_2), respectively. If n is a positive integer, show that the
conditional density of X_1 given that X_1 + X_2 = n is a binomial density
with parameters n and p = λ_1/(λ_1 + λ_2).
4. If X and Y are independent random variables with binomial densities
b(·; m, p) and b(·; n, p), respectively, calculate E[X | X + Y = z], z =
0, 1, ..., m + n.
5. A number P is selected from the set {1/10, 2/10, ..., 9/10} according
to a uniform density. Given that P = j/10, a number X is selected
from {1, 2, ..., 100} according to a binomial density with parameters
n = 100 and p = j/10. Calculate E[X].
6. Let {Xj} be a sequence of random variables having finite expectations
and let N be a nonnegative integer-valued random variable that is
independent of each Xj. Show that
E[X_N | N = n] = E[X_n]
whenever f_N(n) > 0.
7. Let {X_j} be a sequence of independent random variables having the
same mean μ and finite variance σ², let S_0 = 0, and let S_n =
X_1 + ... + X_n, n ≥ 1. Also let N be a nonnegative integer-valued random
variable having finite mean and finite variance that is independent
of the X_j. Use the result of the previous problem to show that

E[S_N] = μE[N]   and   var S_N = σ²E[N] + μ² var N.
8. If c is a constant and X is any random variable, show that E[c | X =
x] = c whenever f_X(x) > 0.
9. Let X and Y be random variables with Y having a finite second moment,
and let φ(X) be a real-valued function of X having finite
second moment. Show that E[φ(X)Y | X = x] = φ(x)E[Y | X = x]
whenever f_X(x) > 0.
10. Calculate the expected duration of play Dx for the modified gambler’s
ruin problem described in Exercise 3.5.4.
ENTROPY
In 1948, in a fundamental paper on the transmission of information (see the
Supplemental Reading List at the end of the chapter), C. E. Shannon proposed a
measure to quantify the uncertainty of an event. The basic idea of his measure
is that frequently occurring events convey less information than infrequently
occurring events. For example, the frequently occurring letter E in an English
message conveys less information than the infrequently occurring Q, X, or Z.
The two words uncertainty and information are used repeatedly in what
follows, and it is necessary to have some understanding of the relationship
between the two. For example, consider a random variable X that takes on the
values 1,..., 6 with equal probabilities. Initially, there is uncertainty about the
value of X. But if X is observed and we are told that the value of X is 3 or 4,
then there is a decrease in uncertainty and an increase in information.
Consider an event A with probability p. A measure of the uncertainty of A
should be some nonnegative monotone decreasing function I(p) of p, so that
I(p) is large when p is small. Moreover, if A_1 and A_2 are independent events
with probabilities p_1 and p_2, respectively, then the uncertainty of A_1 ∩ A_2 is
I(p_1 p_2). If it becomes known that A_2 has occurred, the uncertainty I(p_1 p_2)
should be decreased by I(p_2), and we should be left with the uncertainty I(p_1);
i.e., I(p_1 p_2) − I(p_2) = I(p_1), or

I(p_1 p_2) = I(p_1) + I(p_2).    (4.18)
This additive property of uncertainty for independent events is an assumption
on our part. While we are at it, we might as well assume that I(p) is a continuous
function of p. The property expressed by Equation 4.18 is reminiscent of
the log function, and it should not come as a surprise that I(p) can be shown
to be the log function except possibly for a multiplicative factor. The function

I(p) = −log p = log (1/p),   0 < p ≤ 1,

satisfies all of the above requirements. But I(p) has not been completely
determined, because there is more than one choice for the base of the log
function. In communication theory, the base 2 is used because an on-off
relay records one unit of information, called a bit. In this section, it will be
understood that the log function is to the base 2. If an event A has probability
1/2, then I(1/2) = −log 1/2 = 1 bit.
Having defined the uncertainty of a single event, we now define the
uncertainty associated with a random variable as the average uncertainty of the
events (X = x).
Definition 4.7
Let X be a random variable with range {x_1, x_2, ...}. The entropy or uncertainty
of X is the quantity

H(X) = −Σ_j f_X(x_j) log f_X(x_j),

where f_X(x_j) log f_X(x_j) = 0 whenever f_X(x_j) = 0. ■
It must be emphasized that H (X) is determined by the values of the density
function and only indirectly by the random variable X.
EXAMPLE 4.25  Let X be a random variable taking on the values −1, 0,
and 1 with probabilities 1/4, 1/2, and 1/4, respectively, and let Y be a random
variable taking on the values 0, e, and π with probabilities 1/2, 1/4, and 1/4,
respectively. Then

H(X) = −(1/4) log (1/4) − (1/2) log (1/2) − (1/4) log (1/4) = 3/2

and

H(Y) = −(1/2) log (1/2) − (1/4) log (1/4) − (1/4) log (1/4) = 3/2. ■
It is apparent from this example that the entropy of a random variable is totally
unrelated to the meaning of the random variable and is determined solely by
the values of its density function. This situation would be better portrayed if
the notation H(f), where f is a density function, were used instead of H(X).
H(X) is the expected value of a function of X in a rather complicated way;
namely, H(X) = E[−log f_X(X)] = −Σ_j f_X(x_j) log f_X(x_j).
Consider a typical term of the form h(p) = -p log p in the definition of
H(X). The function h is continuous on (0, 1]. If we define h(p) to be zero
when p = 0, then h is also continuous at 0 since lim_{p→0+}(−p log p) = 0 by
l'Hôpital's rule. The graph of h(p) is depicted in Figure 4.1.
Since the terms in the series defining H(X) are nonnegative, H(X) is defined
even if the series diverges to +∞. It is possible for H(X) to be infinite.
EXAMPLE 4.26  Consider a random variable X having density function

f_X(n) = c/(n log² n),   n = 2, 3, ...,

where c is chosen so that Σ_{n=2}^∞ f_X(n) = 1. The integral test for infinite series
can be used to show that the series Σ_{n=2}^∞ 1/(n log n) diverges to +∞ and that
the series Σ_{n=2}^∞ 1/(n log² n) converges. The entropy of X is then

H(X) = −Σ_{n=2}^∞ (c/(n log² n)) log (c/(n log² n))
     = Σ_{n=2}^∞ (c/(n log² n)) (−log c + log n + 2 log log n)
     = c Σ_{n=2}^∞ ( (−log c)/(n log² n) + 1/(n log n) + (2 log log n)/(n log² n) ).

Since the sum of the first terms converges, the sum of the second terms diverges
to +∞, and the sum of the third terms is nonnegative, the series defining H(X)
diverges to +∞. Thus, H(X) = +∞. ■
EXAMPLE 4.27  Let X be a random variable having a uniform density on
{1, 2, ..., n}. Then

H(X) = −Σ_{x=1}^{n} (1/n) log (1/n) = log n bits. ■
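Definition 4.7 translates directly into code. The sketch below is ours, not the book's; it reproduces the values in Examples 4.25 and 4.27:

```python
from math import log2

def entropy(density):
    """H(X) = -sum_j f(x_j) log2 f(x_j), with the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in density if p > 0)

# Example 4.25: the density (1/4, 1/2, 1/4) gives 3/2 bits.
print(entropy([0.25, 0.5, 0.25]))   # 1.5
# Example 4.27: a uniform density on n points gives log2(n) bits.
n = 8
print(entropy([1 / n] * n))         # 3.0
```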
The following lemma will be needed to establish an important property of
H(X).
FIGURE 4.1  Graph of −p log p.
Lemma 4.6.1
ln x ≤ x − 1 for all x > 0, with equality holding if and only if x = 1.

This result is proved by showing that the line y = x − 1 is tangent to the
curve y = ln x when x = 1 and that the graph of the latter lies below the
tangent line since y = ln x is concave downward.
The assumption that the random variables of the following theorem have
the same range is not essential because their ranges can be replaced by the
union of their ranges insofar as densities are concerned.
Theorem 4.6.2
(Gibbs' Inequality)
Let X and Y be discrete random variables having the same range {z_1, z_2, ...}
such that f_X(z) = 0 if and only if f_Y(z) = 0. Then

Σ_j f_X(z_j) log (f_Y(z_j)/f_X(z_j)) ≤ 0    (4.19)

with equality holding if and only if f_X(z_j) = f_Y(z_j) for all j.

PROOF: We need only consider those z_j for which f_X(z_j) > 0. By Lemma 4.6.1,

Σ_j f_X(z_j) ln (f_Y(z_j)/f_X(z_j)) ≤ Σ_j f_X(z_j) ((f_Y(z_j)/f_X(z_j)) − 1)
    = Σ_j f_Y(z_j) − Σ_j f_X(z_j) = 1 − 1 = 0.    (4.20)

Since log a = (ln a)/(ln 2), multiplying both sides of this inequality by 1/ln 2
we obtain Inequality 4.19. If f_X(z_j) = f_Y(z_j) for all j, then the left side of
Inequality 4.19 is zero and there is equality therein. Assume now that there is
equality in Inequality 4.19. Multiplying both sides by ln 2,

Σ_j f_X(z_j) ln (f_Y(z_j)/f_X(z_j)) = 0.

Thus, the left member of Inequality 4.20 is zero, and therefore

Σ_j f_X(z_j) ( ((f_Y(z_j)/f_X(z_j)) − 1) − ln (f_Y(z_j)/f_X(z_j)) ) = 0.    (4.21)

Since the terms of the sum are nonnegative, they must all be zero; i.e.,

ln (f_Y(z_j)/f_X(z_j)) = (f_Y(z_j)/f_X(z_j)) − 1,

and f_X(z_j) = f_Y(z_j) by Lemma 4.6.1. ■
4.6
ENTROPY
137
EXAMPLE 4.28  Consider all random variables X that take on exactly n
values x_1, ..., x_n with positive probabilities. Let Y be a random variable such
that P(Y = x_i) = 1/n, i = 1, ..., n. Then H(X) ≤ H(Y) = log n for all
such X; i.e., the entropy is a maximum when the density is uniform. This
follows from Gibbs' inequality, since

Σ_i f_X(x_i) log ((1/n)/f_X(x_i)) ≤ 0,

so that

H(X) = Σ_i f_X(x_i) log (1/f_X(x_i)) ≤ Σ_i f_X(x_i) log n = log n = H(Y)

by Example 4.27. ■
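Gibbs' inequality is easy to probe numerically. In the check below (our code, with arbitrarily chosen densities), the left side of Inequality 4.19 is negative for two unequal densities and zero in the equality case:

```python
from math import log2

def gibbs_lhs(f, g):
    """Left side of Inequality 4.19: sum_j f(z_j) log2(g(z_j)/f(z_j))."""
    return sum(p * log2(q / p) for p, q in zip(f, g) if p > 0)

f = [0.5, 0.3, 0.2]
g = [1 / 3, 1 / 3, 1 / 3]
print(gibbs_lhs(f, g))  # negative, since f differs from g
print(gibbs_lhs(f, f))  # 0.0, the equality case f = g
```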
If a random variable X takes on values including −1 and 1, then X² has the
effect of lumping −1 and 1 together. Such lumping reduces entropy. Upon
observing X², there is a loss of information gained as compared to observing X.
Theorem 4.6.3
If X is a discrete random variable and φ is any real-valued function on the range
of X, then H(φ(X)) ≤ H(X).
PROOF: Suppose first that {x_{i_1}, x_{i_2}, ...} is a subset of the range of X, finite
or infinite. For each j ≥ 1, let p_j = f_X(x_{i_j}). Consider first the finite case
{x_{i_1}, ..., x_{i_k}}. Since p_i ≤ Σ_{j=1}^{k} p_j,

−p_i log p_i ≥ −p_i log (Σ_{j=1}^{k} p_j),   i = 1, ..., k.

Adding corresponding members of these k inequalities,

−Σ_{i=1}^{k} p_i log p_i ≥ −(Σ_{i=1}^{k} p_i) log (Σ_{i=1}^{k} p_i).

If the sequence {x_{i_j}} is infinite, since p log p is continuous on [0, 1] we can let
k → ∞ in this inequality to obtain

−Σ_i p_i log p_i ≥ −(Σ_i p_i) log (Σ_i p_i)

in the finite or infinite case. Now let {z_1, z_2, ...} be the range of Z = φ(X). Then

H(Z) = H(φ(X)) = −Σ_j f_Z(z_j) log f_Z(z_j).

Consider a fixed z_j and let {x_{j,1}, x_{j,2}, ...} be the set of values of X such that
z_j = φ(x_{j,k}). Since the probabilities of the x_{j,1}, x_{j,2}, ... are lumped together to
produce f_Z(z_j), by the above result

−f_Z(z_j) log f_Z(z_j) ≤ −Σ_k f_X(x_{j,k}) log f_X(x_{j,k}).

Summing over j,

H(Z) ≤ Σ_j Σ_k {−f_X(x_{j,k}) log f_X(x_{j,k})}.

Since the terms in the iterated sums on the right are nonnegative, Theorem 3.4.3
can be applied to obtain

H(Z) ≤ −Σ_j f_X(x_j) log f_X(x_j) = H(X). ■
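The lumping phenomenon of Theorem 4.6.3 can be seen concretely on the −1, 0, 1 example mentioned above; a small sketch (our code, not the text's):

```python
from math import log2
from collections import defaultdict

def entropy(density):
    # H = -sum p log2 p over the density's values, with 0 log 0 = 0
    return -sum(p * log2(p) for p in density.values() if p > 0)

# X takes values -1, 0, 1; phi(X) = X**2 lumps -1 and 1 together.
fX = {-1: 0.25, 0: 0.5, 1: 0.25}
fZ = defaultdict(float)
for x, p in fX.items():
    fZ[x * x] += p          # lumped probabilities add

print(entropy(fX))  # 1.5
print(entropy(fZ))  # 1.0 -- the lumping reduced the entropy
```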
Since the definition of uncertainty applies to any discrete density, it makes
sense to discuss the joint uncertainty of two random variables.
Definition 4.8
If X and Y are discrete random variables, the joint uncertainty or joint entropy
is defined by

H(X, Y) = −Σ_{i,j} f_{X,Y}(x_i, y_j) log f_{X,Y}(x_i, y_j). ■
Theorem 4.6.4
If X and Y are discrete random variables, then H(X, Y) ≤ H(X) + H(Y), with
equality holding if and only if X and Y are independent.
PROOF: Since

H(X) = −Σ_i f_X(x_i) log f_X(x_i) = −Σ_i Σ_j f_{X,Y}(x_i, y_j) log f_X(x_i)

and

H(Y) = −Σ_j f_Y(y_j) log f_Y(y_j) = −Σ_i Σ_j f_{X,Y}(x_i, y_j) log f_Y(y_j),

by Theorem 3.4.3,

H(X) + H(Y) = −Σ_i Σ_j f_{X,Y}(x_i, y_j)(log f_X(x_i) + log f_Y(y_j))
    = −Σ_i Σ_j f_{X,Y}(x_i, y_j) log f_X(x_i) f_Y(y_j).

By Gibbs' inequality,

Σ_i Σ_j f_{X,Y}(x_i, y_j) log (f_X(x_i) f_Y(y_j)/f_{X,Y}(x_i, y_j)) ≤ 0,

from which it follows that

H(X, Y) = −Σ_i Σ_j f_{X,Y}(x_i, y_j) log f_{X,Y}(x_i, y_j)
    ≤ −Σ_i Σ_j f_{X,Y}(x_i, y_j) log f_X(x_i) f_Y(y_j)
    = H(X) + H(Y).

There is equality in this application of Gibbs' inequality if and only if
f_{X,Y}(x_i, y_j) = f_X(x_i) f_Y(y_j) for all i and j; i.e., if and only if X and Y are
independent. ■
Since the concept of uncertainty applies to any discrete density function, we
can define conditional uncertainty.
Definition 4.9
1. Let X and Y be discrete random variables. The conditional uncertainty or
conditional entropy of Y given that X = x is defined by

H(Y | X = x) = −Σ_j f_{Y|X}(y_j | x) log f_{Y|X}(y_j | x).

2. The conditional uncertainty or conditional entropy of Y given X is defined by

H(Y | X) = Σ_i f_X(x_i) H(Y | X = x_i);

i.e., H(Y | X) is the weighted average of the H(Y | X = x_i). ■
Note that

H(Y | X) = −Σ_i f_X(x_i) Σ_j f_{Y|X}(y_j | x_i) log f_{Y|X}(y_j | x_i)
         = −Σ_{i,j} f_{X,Y}(x_i, y_j) log f_{Y|X}(y_j | x_i).
Theorem 4.6.5
If X and Y are discrete random variables, then

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y).

PROOF:

H(X, Y) = −Σ_{i,j} f_{X,Y}(x_i, y_j) log f_{X,Y}(x_i, y_j)
    = −Σ_{i,j} f_{Y|X}(y_j | x_i) f_X(x_i) log f_{Y|X}(y_j | x_i) f_X(x_i)
    = −Σ_{i,j} f_{Y|X}(y_j | x_i) f_X(x_i) log f_{Y|X}(y_j | x_i)
      − Σ_{i,j} f_{Y|X}(y_j | x_i) f_X(x_i) log f_X(x_i)
    = Σ_i f_X(x_i) H(Y | X = x_i) − Σ_i f_X(x_i) log f_X(x_i)
    = H(Y | X) + H(X). ■
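Theorem 4.6.5 can be verified numerically for the chip experiment of Example 4.20, where given X = x the variable Y is uniform on x values. This check is ours, not from the text:

```python
from math import log2

# Joint density from Example 4.20: f(x, y) = 1/(10x) for 1 <= y <= x <= 10.
pairs = {(x, y): 1 / (10 * x) for x in range(1, 11) for y in range(1, x + 1)}

H_XY = -sum(p * log2(p) for p in pairs.values())
H_X = log2(10)                                  # X is uniform on 10 values
# H(Y|X) = sum_x f_X(x) H(Y | X = x); given X = x, Y is uniform on x values.
H_Y_given_X = sum(0.1 * log2(x) for x in range(1, 11))

print(abs(H_XY - (H_X + H_Y_given_X)))  # ~0: H(X, Y) = H(X) + H(Y|X)
```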
We will now examine a procedure for selecting a density function called
the maximum entropy principle. Consider a random variable X that takes
on values 1,2,..., 6 with unknown probabilities. What is known is that
E [X] = 9/2 rather than the 7/2 it would be if X took on the six values with
equal probabilities. For i = 1, ..., 6, let p_i = P(X = i). Can we choose
p_1, ..., p_6 so that E[X] = Σ_{i=1}^{6} i p_i = 9/2, and how do we choose them? We
might see if we can choose the p_1, ..., p_6 to maximize the entropy

H(X) = −Σ_{i=1}^{6} p_i log p_i,

subject to the conditions that

Σ_{i=1}^{6} p_i = 1    (4.22)

and

Σ_{i=1}^{6} i p_i = 9/2.    (4.23)
Recall from the calculus that this is a maximization problem subject to two
constraints, which can be dealt with by the method of Lagrange multipliers.
Let

L(p_1, ..., p_6) = H(X) − λ(Σ_{i=1}^{6} p_i − 1) − μ(Σ_{i=1}^{6} i p_i − 9/2).

Setting (∂/∂p_i)L = 0, i = 1, ..., 6,

(∂/∂p_i)(p_i log (1/p_i)) − λ − μi = 0,   i = 1, ..., 6,

or

−1 − log p_i − λ − μi = 0,   i = 1, ..., 6.

Thus, log p_i = −(1 + λ + μi), or

p_i = e^{−(1+λ+μi)}.

Let x = e^{−μ} and y = e^{1+λ}. Then p_i = x^i/y, and Equations 4.22 and 4.23
become

Σ_{i=1}^{6} x^i = y

and

Σ_{i=1}^{6} i x^i = (9/2) y.
It follows from the first of these two equations that x cannot be zero. Therefore,

Σ_{i=1}^{6} i x^i = (9/2) Σ_{i=1}^{6} x^i.

Dividing by x,

Σ_{i=1}^{6} i x^{i−1} = (9/2) Σ_{i=1}^{6} x^{i−1}.

Writing out the terms of this equation and clearing of fractions,

3x^5 + x^4 − x^3 − 3x^2 − 5x − 7 = 0.

This equation has only one positive root, x ≈ 1.449254. It follows that
y ≈ 26.663653, and using the equation p_i = x^i/y,

p_1 = .05435
p_2 = .07877
p_3 = .11416
p_4 = .16544
p_5 = .23977
p_6 = .34749. ■
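The positive root and the resulting density can be recovered numerically; a bisection sketch (our code, reproducing the quoted values to the printed precision):

```python
def poly(x):
    # 3x^5 + x^4 - x^3 - 3x^2 - 5x - 7, from clearing fractions above
    return 3 * x**5 + x**4 - x**3 - 3 * x**2 - 5 * x - 7

# poly(1) < 0 < poly(2), so bisect on [1, 2].
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if poly(mid) > 0:
        hi = mid
    else:
        lo = mid
x = (lo + hi) / 2
y = sum(x**i for i in range(1, 7))       # Equation 4.22: y = sum of x^i
p = [x**i / y for i in range(1, 7)]      # p_i = x^i / y

print(x)   # approximately 1.449254
print(p)   # approximately [.05435, .07877, .11416, .16544, .23977, .34749]
```

As a sanity check, the recovered density satisfies both constraints: the p_i sum to 1 and the mean Σ i p_i comes out at 9/2.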
EXERCISES 4.6
1.
Calculate the entropy of a random variable X having density f(1) = 1/2,
f(2) = 1/4, f(3) = 1/8, f(4) = 1/16, f(5) = 1/16.
2. Let X be a random variable having a geometric density f_X(x) =
(1/2)^x, x = 1, 2, ... . Calculate the entropy H(X). How much
information is gained upon observing that X = 3?
3.
Consider a random variable X that has a uniform density on {1,2,...,
2m}. If successive pairs of integers are lumped together (i.e., 1 and 2,
3 and 4, etc., are lumped together), by how much is the uncertainty
decreased?
4. Let X be selected at random from the set of integers {1,2,..., n }. Given
that X = x, Y is then selected at random from the set of integers
{1, 2,..., x}. Calculate H(X, Y) without using the joint density of X
and Y.
5.
If X is the score on tossing two dice, calculate H(X).
6. If a card is drawn at random from a deck of 52 cards and a king of
diamonds is observed, how much information has been gained?
7.
If a card is drawn at random from a deck of 52 cards and you are told
that a king has been observed, how much information has been gained?
8.
If X and Y are random variables, use Lemma 4.6.1 to show that
H(X | Y) ≤ H(X)
by calculating H(X | Y) − H(X).
9.
Consider a random variable X that takes on the values 1, 2,..., 6. Given
that E [X] = 4, determine the density of X that maximizes H(X).
SUPPLEMENTAL READING LIST
1. T. M. Apostol (1957). Mathematical Analysis. Reading, Mass.: Addison-Wesley.
2. C. E. Shannon (1948). A Mathematical Theory of Communication. Monograph
B-1598. Bell System Technical Journal.
5
STOCHASTIC PROCESSES
INTRODUCTION
The topics discussed in this chapter have been selected not only to illustrate
the concepts introduced in the previous chapters but also to expose the reader
to the breadth and depth of applications of probability theory. In this elemen­
tary treatment, we will only scratch the surface of these topics. Topics within
this chapter are independent, and subsequent chapters are independent of the
topics of this chapter.
We first take up a model for randomly evolving processes having the
property that probability statements about future developments given the past
history depend only upon the immediate past and not the remote past. The
section on random walks was chosen primarily because the topic involves more
applications of generating functions and difference equations. After random
walks comes a section on branching processes, which were developed as a model
for survival of family names and nuclear chain reactions. Because some of
the great successes of probability theory have to do with prediction theory and
communication theory in general, the chapter concludes with an application
to prediction theory.
Each of these topics could be expanded to book length and has been. Having
learned some of the techniques for dealing with such topics, the reader can
pursue them in greater depth in the book by Karlin and Taylor listed in the
Supplemental Readings at the end of the chapter. More substantial applications
to engineering can be found in the book by Helstrom. The section on prediction
theory just barely scratches the surface of this subject. An excellent additional
source is the book by Kendall and Ord.
MARKOV CHAINS
Several of the examples discussed in the previous chapters share a common
structure that will be elaborated upon in this section.
Consider a countable set S = {s_1, s_2, ...}, finite or infinite, called a state
space and consisting of objects called states. Since we can encode the states
by giving s_j the label j, we can assume that S = {1, 2, ..., N} for some N or
S = {1, 2, ...}.
Definition 5.1
A sequence of random variables {X_n}_{n=0}^∞ is called a Markov chain if for all n
and j_0, j_1, ..., j_n ∈ S,

P(X_n = j_n | X_0 = j_0, ..., X_{n−1} = j_{n−1}) = P(X_n = j_n | X_{n−1} = j_{n−1}). ■    (5.1)
The significance of a Markov chain lies in the fact that if (X_n = j_n) is
a future event, then the conditional probability of this event given the past
history (X_0 = j_0, ..., X_{n−1} = j_{n−1}) depends only upon the immediate past
(X_{n−1} = j_{n−1}) and not upon the remote past (X_0 = j_0, ..., X_{n−2} = j_{n−2}).
Let {X_n}_{n=0}^∞ be a Markov chain. If X_n = j, we say that the chain is in the
state j at time n. The probabilities

p_{i,j}^{(n−1,n)} = P(X_n = j | X_{n−1} = i),   n ≥ 1, i, j ∈ S,

are called one-step transition probabilities and depend upon the time that a
transition from i to j takes place. If P(X_n = j | X_{n−1} = i) is defined and is
independent of n, the probabilities

p_{i,j} = P(X_n = j | X_{n−1} = i),   n ≥ 1, i, j ∈ S,

are called stationary transition probabilities; if the conditional probability is not
defined, we put p_{i,j} = 0. The numbers p_{i,j} can be displayed in matrix form:

    [ p_{1,1}  p_{1,2}  ...
      p_{2,1}  p_{2,2}  ...
      ...
      p_{i,1}  p_{i,2}  ...
      ...               ]

If |S| = N, this is an N × N matrix; if S is infinite, there are an infinite number
of rows and columns. The matrix is customarily symbolized by P = [p_{i,j}].
The ith row of P is the conditional density of X_n given that X_{n−1} = i. Clearly,
each p_{i,j} ≥ 0. Since the union of the disjoint events (X_n = j), j = 1, 2, ...,
is Ω,

Σ_j p_{i,j} = Σ_j P(X_n = j | X_{n−1} = i) = P(Ω | X_{n−1} = i) = 1.

A matrix P = [p_{i,j}] with the last two properties is called a stochastic matrix.
The density of X_0 is denoted by π_0 (i.e., π_0(j) = f_{X_0}(j), j ∈ S) and is
called the initial density. Suppose n ≥ 1 and j_0, j_1, ..., j_n ∈ S. If the Markov
chain {X_n}_{n=0}^∞ has stationary transition probabilities, then

P(X_0 = j_0, ..., X_n = j_n)
    = P(X_n = j_n | X_0 = j_0, ..., X_{n−1} = j_{n−1}) P(X_0 = j_0, ..., X_{n−1} = j_{n−1})
    = P(X_n = j_n | X_{n−1} = j_{n−1}) P(X_0 = j_0, ..., X_{n−1} = j_{n−1})
    = p_{j_{n−1},j_n} P(X_0 = j_0, ..., X_{n−1} = j_{n−1})
    = p_{j_{n−1},j_n} p_{j_{n−2},j_{n−1}} × ... × p_{j_0,j_1} P(X_0 = j_0)
    = π_0(j_0) p_{j_0,j_1} × ... × p_{j_{n−1},j_n}.    (5.2)
It should be noted that if at some stage the given event has zero probability,
then the final result is still true since both sides are then zero.
The last equation provides the means for constructing Markov chains on
a state space S = {1, 2, ...}. Given a stochastic matrix P = [p_{i,j}] and a
density function π_0 on S, a probability space (Ω, ℱ, P) and random variables
{X_n}_{n=0}^∞ can be constructed so that the probabilities P(X_0 = j_0, ..., X_n = j_n)
are defined by Equation 5.2.
Equation 5.2 can be used to reformulate the definition of a Markov chain.
If 1 ≤ m < n and j_0, j_1, ..., j_n ∈ S, then

P(X_{m+1} = j_{m+1}, ..., X_n = j_n | X_0 = j_0, ..., X_m = j_m)
    = P(X_{m+1} = j_{m+1}, ..., X_n = j_n | X_m = j_m);    (5.3)

i.e., the probability of any future event given the past depends only upon
the immediate past (X_m = j_m) and not upon the remote past (X_0 =
j_0, ..., X_{m−1} = j_{m−1}). To see this, first consider the left side of the equation.
By Equation 5.2,

P(X_{m+1} = j_{m+1}, ..., X_n = j_n | X_0 = j_0, ..., X_m = j_m)
    = (π_0(j_0) p_{j_0,j_1} × ... × p_{j_{m−1},j_m} p_{j_m,j_{m+1}} × ... × p_{j_{n−1},j_n})
      / (π_0(j_0) p_{j_0,j_1} × ... × p_{j_{m−1},j_m})
    = p_{j_m,j_{m+1}} × ... × p_{j_{n−1},j_n}.
Since

P(X_m = j_m, ..., X_n = j_n) = Σ_{j_0,...,j_{m−1}} π_0(j_0) p_{j_0,j_1} × ... × p_{j_{n−1},j_n}

and

P(X_m = j_m) = Σ_{j_0,...,j_{m−1}} π_0(j_0) p_{j_0,j_1} × ... × p_{j_{m−1},j_m},

it follows that

P(X_m = j_m, ..., X_n = j_n) = P(X_m = j_m) p_{j_m,j_{m+1}} × ... × p_{j_{n−1},j_n}.

Dividing by P(X_m = j_m),

P(X_{m+1} = j_{m+1}, ..., X_n = j_n | X_m = j_m) = p_{j_m,j_{m+1}} × ... × p_{j_{n−1},j_n}.

Thus, both sides of Equation 5.3 are equal to the product p_{j_m,j_{m+1}} × ... × p_{j_{n−1},j_n}.
This establishes Equation 5.3.
EXAMPLE 5.1 (Binary Information Source)  A Markov information
source is a sequential mechanism for which the chance that a certain symbol
will be produced may depend upon the preceding symbol. Suppose the possible
symbols are 0 and 1. If at some stage a 0 is produced, then at the next stage a 1
will be produced with probability p and a 0 will be produced with probability
1 − p; if a 1 is produced, at the next stage a 0 will be produced with probability
q and a 1 will be produced with probability 1 − q. Since the p and q do
not depend upon the number of times a symbol has been produced, this
experiment can be described by the stationary transition matrix

    P = [ 1 − p    p
          q        1 − q ]. ■
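The source of Example 5.1 is easy to simulate. The sketch below is ours (p and q are arbitrary illustrative values); it generates a long symbol sequence and estimates the transition matrix from observed one-step transitions:

```python
import random

# Transition probabilities described in Example 5.1: 0 -> 1 w.p. p, 1 -> 0 w.p. q.
p, q = 0.3, 0.6
P = [[1 - p, p],
     [q, 1 - q]]   # row i is the conditional density of the next symbol given symbol i

rng = random.Random(0)
seq = [0]
for _ in range(200_000):
    i = seq[-1]
    seq.append(1 if rng.random() < P[i][1] else 0)   # produce the next symbol

# Estimate the transition probabilities from the observed transitions.
counts = [[0, 0], [0, 0]]
for a, b in zip(seq, seq[1:]):
    counts[a][b] += 1
est = [[c / sum(row) for c in row] for row in counts]
print(est)   # each row should be close to the corresponding row of P
```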
EXAMPLE 5.2 (Ehrenfest Diffusion Model)  In 1907, P. and T. Ehrenfest
described a conceptual experiment for the movement of N molecules between
two containers A and B. The state of the system at any given time is the number
of molecules in A, so that S = {0, 1, ..., N}. At any given time, a molecule
is chosen at random from among the N and moved to the other container.
This chance mechanism is repeated indefinitely. Since the mechanism does
not depend upon how many changes have occurred, the process has stationary
transition probabilities given by p_{i,i−1} = i/N, p_{i,i+1} = 1 − (i/N), and
p_{i,j} = 0 otherwise. In this case,

    P = [ 0      1        0            0            ...
          1/N    0        1 − (1/N)    0            ...
          0      2/N      0            1 − (2/N)    ...
          ...
          ...    0        0            1            0 ]. ■
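The Ehrenfest matrix can be built programmatically and checked to be stochastic; a sketch (our code, for a small illustrative N):

```python
from fractions import Fraction

def ehrenfest_matrix(N):
    """Stationary transition matrix for the Ehrenfest model on S = {0, 1, ..., N}."""
    P = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i > 0:
            P[i][i - 1] = Fraction(i, N)       # a molecule leaves container A
        if i < N:
            P[i][i + 1] = 1 - Fraction(i, N)   # a molecule enters container A
    return P

P = ehrenfest_matrix(4)
print(P[1])                              # row 1: p_{1,0} = 1/4, p_{1,2} = 3/4
print(all(sum(row) == 1 for row in P))   # True: every row sums to 1
```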
EXAMPLE 5.3 (Random Walk on the Integers)  Consider the space S =
{..., −2, −1, 0, 1, 2, ...}. Let {Y_j}_{j=0}^∞ be a sequence of independent random
variables such that Y_0 has a specified density π_0 and P(Y_j = 1) = p, P(Y_j =
−1) = q, where j ≥ 1, p, q > 0, p + q = 1. For n ≥ 0, let X_n = Σ_{j=0}^{n} Y_j.
Then the sequence {X_n}_{n=0}^∞ is a Markov chain with stationary transition
probabilities

p_{i,j} = P(X_n = j | X_{n−1} = i) = p if j = i + 1
                                  = q if j = i − 1
                                  = 0 otherwise.
To show that {X_n}_{n=0}^∞ is a Markov chain, consider

P(X_n = j | X_0 = j_0, ..., X_{n−1} = j_{n−1}).

Note that X_0 = j_0, X_1 = j_1, ..., X_{n−1} = j_{n−1} if and only if Y_0 = j_0, Y_1 =
j_1 − j_0, ..., Y_{n−1} = j_{n−1} − j_{n−2}. By independence,

P(X_n = j | X_0 = j_0, X_1 = j_1, ..., X_{n−1} = j_{n−1})
  = P(Σ_{i=0}^n Y_i = j | Y_0 = j_0, Y_1 = j_1 − j_0, ..., Y_{n−1} = j_{n−1} − j_{n−2})
  = P(Y_0 = j_0, ..., Y_{n−1} = j_{n−1} − j_{n−2}, Y_n = j − j_{n−1}) / P(Y_0 = j_0, ..., Y_{n−1} = j_{n−1} − j_{n−2})
  = P(Y_n = j − j_{n−1}).

Also by independence and Lemma 3.3.3,

P(X_n = j | X_{n−1} = j_{n−1})
  = P(Σ_{i=0}^n Y_i = j | Σ_{i=0}^{n−1} Y_i = j_{n−1})
  = P(Σ_{i=0}^{n−1} Y_i = j_{n−1}, Y_n = j − j_{n−1}) / P(Σ_{i=0}^{n−1} Y_i = j_{n−1})
  = P(Y_n = j − j_{n−1}).
5.2
149
MARKOV CHAINS
Therefore,

P(X_n = j | X_0 = j_0, ..., X_{n−1} = j_{n−1}) = P(X_n = j | X_{n−1} = j_{n−1}),

and the sequence {X_n}_{n=0}^∞ is a Markov chain. Since

p_{i,j} = P(X_n = j | X_{n−1} = i) = P(Y_n = j − i) = { p   if j = i + 1
                                                       q   if j = i − 1
                                                       0   otherwise,

the chain has stationary transition probabilities. The chain {X_n}_{n=0}^∞ is
interpreted as follows. A particle starts off at an initial position j_0 in accordance
with the initial density π_0. The particle will then jump to j_0 + 1 with
probability p or jump to j_0 − 1 with probability q; in general, if after n jumps it
is at i, it will then jump to i + 1 with probability p or to i − 1 with probability
q, independently of n. ■
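A few lines of code make the mechanism of Example 5.3 concrete. This sketch is our own (with the initial density π_0 concentrated at 0) and simply samples a path of the chain:

```python
import random

def random_walk_path(p, n_steps, x0, rng):
    """Sample X_0, X_1, ..., X_n of the random walk: the particle starts
    at x0 and jumps +1 with probability p, -1 with probability q = 1 - p."""
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + (1 if rng.random() < p else -1))
    return path

path = random_walk_path(0.5, 10, 0, random.Random(42))
# Successive positions always differ by exactly one unit.
one_step = all(abs(b - a) == 1 for a, b in zip(path, path[1:]))
```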
Let {X_n}_{n=0}^∞ be a Markov chain with stationary transition probabilities. The
conditional probabilities

p_{i,j}(n) = P(X_{m+n} = j | X_m = i)

are independent of m (see Exercise 5.2.11) and are called n-step transition
probabilities. More generally, if m, n ≥ 1 and j_0, j_1, ..., j_n ∈ S, the conditional
probabilities

P(X_{m+1} = j_1, ..., X_{m+n} = j_n | X_m = j_0)

are independent of m. This property of a Markov chain with stationary
transition probabilities is called the stationarity property. The property simply
means that an integer m ≥ 1 can be subtracted from all the indices appearing
in a conditional probability; e.g.,

P(X_4 = j_4, X_5 = j_5, X_6 = j_6 | X_3 = j_3)
  = P(X_1 = j_4, X_2 = j_5, X_3 = j_6 | X_0 = j_3).

Letting P(n) = [p_{i,j}(n)], P(n) is called the n-step transition matrix. We
define

p_{i,j}(0) = { 1   if i = j
              0   if i ≠ j.

If n = 1, then p_{i,j}(1) = P(X_{m+1} = j | X_m = i) = p_{i,j}, and therefore
P(1) = P.
Theorem 5.2.1 (Chapman-Kolmogorov Equation)  For all m, n ≥ 1 and i, j ∈ S,

p_{i,j}(m + n) = Σ_k p_{i,k}(m) p_{k,j}(n).        (5.4)
PROOF: We can assume that P(X_0 = i) > 0, because otherwise p_{i,j}(m + n) =
P(X_{m+n} = j | X_0 = i) = 0, p_{i,k}(m) = P(X_m = k | X_0 = i) = 0, and both
sides are zero. Suppose 1 ≤ m < n, j_0, ..., j_n ∈ S, and P(X_0 = j_0) > 0.
Then it follows from Equation 5.3 that

P(X_0 = j_0, ..., X_m = j_m, ..., X_n = j_n)
  = P(X_{m+1} = j_{m+1}, ..., X_n = j_n | X_0 = j_0, ..., X_m = j_m)
    × P(X_0 = j_0, ..., X_m = j_m)
  = P(X_{m+1} = j_{m+1}, ..., X_n = j_n | X_m = j_m)
    × P(X_1 = j_1, ..., X_m = j_m | X_0 = j_0) P(X_0 = j_0).

Summing over j_1, ..., j_{m−1}, j_{m+1}, ..., j_{n−1},

P(X_0 = j_0, X_m = j_m, X_n = j_n)
  = P(X_n = j_n | X_m = j_m) P(X_m = j_m | X_0 = j_0) P(X_0 = j_0).        (5.5)

Replacing n by m + n, j_0 by i, j_m by k, and j_n by j in Equation 5.5,

p_{i,j}(m + n) = P(X_{m+n} = j | X_0 = i)
  = Σ_k P(X_{m+n} = j, X_m = k | X_0 = i)
  = Σ_k P(X_{m+n} = j, X_m = k, X_0 = i) / P(X_0 = i)
  = Σ_k P(X_{m+n} = j | X_m = k) P(X_m = k | X_0 = i)
  = Σ_k p_{k,j}(n) p_{i,k}(m). ■
Equation 5.4 can be interpreted in terms of matrix multiplication. Noting
that the first factors p_{i,1}(m), p_{i,2}(m), ... constitute the ith row of P(m) and
the second factors p_{1,j}(n), p_{2,j}(n), ... constitute the jth column of P(n), the
sum on the right of Equation 5.4 is the product of the elements of the ith row
of P(m) and the corresponding elements of the jth column of P(n); but this is
just the definition of the element in the ith row and jth column of the product
of the two matrices P(m) and P(n). In terms of matrix multiplication, the
Chapman-Kolmogorov equation simply says that P(m + n) = P(m)P(n).
Since P(1) = P, P(n + 1) = P(n)P for all n ≥ 1; iterating this result, we see
FIGURE 5.1  Stopping and restarting a chain.
that P(n) = P^n; i.e., P(n) is simply the nth power of the transition matrix P.
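The matrix form of the Chapman-Kolmogorov equation is easy to check numerically. The sketch below uses an arbitrary stochastic matrix of our own choosing (not from the text):

```python
import numpy as np

# An arbitrary 3-state stochastic matrix (our choice, not from the text).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

m, n = 3, 5
P_m = np.linalg.matrix_power(P, m)        # P(m) = P^m
P_n = np.linalg.matrix_power(P, n)        # P(n) = P^n
P_mn = np.linalg.matrix_power(P, m + n)   # P(m + n)

# Chapman-Kolmogorov in matrix form: P(m + n) = P(m) P(n),
# and every P(n) is again a stochastic matrix.
ck_holds = np.allclose(P_mn, P_m @ P_n)
stochastic = np.allclose(P_mn.sum(axis=1), 1.0)
```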
Note also that P(n) is a stochastic matrix for each n ≥ 1. Since P(1) = P,
the statement is true for n = 1. Assuming the statement is true for n − 1, it
follows from Equation 5.4 and Theorem 3.4.3 that

Σ_j p_{i,j}(n) = Σ_j Σ_k p_{i,k}(n − 1) p_{k,j}(1)
             = Σ_k p_{i,k}(n − 1) Σ_j p_{k,j}(1)
             = Σ_k p_{i,k}(n − 1) = 1

and that P(n) has nonnegative entries. It follows from the principle of
mathematical induction that P(n) is a stochastic matrix for every n ≥ 1.
Figure 5.1 is a graphical illustration of the Chapman-Kolmogorov equation.
If at some time the Markov chain is in the state i, the probability of going
from i to j in m + n steps can be obtained by stopping the chain after m steps
in state k, restarting the chain with initial state k, ending up in state j after n
additional steps, and summing over k.
In the case of the Ehrenfest diffusion model, Example 5.2, it is reasonable
to ask how the molecules will be distributed between the two containers after
much time has elapsed; i.e., what happens to p_{i,j}(n), the probability that starting
from state i the chain will be in state j after n transitions, as n → ∞. In
general, determining the limiting behavior of the p_{i,j}(n) can be difficult. To
keep things as simple as possible in this chapter, we will limit the remaining
discussion to Markov chains having a finite state space S = {1, 2, ..., r} and
r × r transition matrix P = [p_{i,j}]. We assume that r ≥ 2 to avoid the trivial
case of a chain with just one state.
Theorem 5.2.2  If there is an integer N such that p_{i,j}(N) > 0 for 1 ≤ i, j ≤ r, then

lim_{n→∞} p_{i,j}(n) = π_j,    j = 1, ..., r,

exists and is independent of i.
PROOF: We first prove the result assuming that N = 1; i.e., p_{i,j} > 0, 1 ≤
i, j ≤ r. If r = 2 and

P = | 1/2  1/2 |
    | 1/2  1/2 |,

then P^n = P for all n ≥ 1 and the assertion is true with π_1 = π_2 = 1/2.
We can therefore assume that

δ = min_{1≤i,j≤r} p_{i,j} < 1/2.

Consider a fixed j and define

m_n = min_{1≤i≤r} p_{i,j}(n),    M_n = max_{1≤i≤r} p_{i,j}(n).
If we can show that the sequence {m_n} increases, the sequence {M_n} decreases,
and lim_{n→∞}(M_n − m_n) = 0, then there would be a number π_j such that

lim_{n→∞} M_n = lim_{n→∞} m_n = π_j,

and since m_n ≤ p_{i,j}(n) ≤ M_n, it would follow that

lim_{n→∞} p_{i,j}(n) = π_j,    j = 1, ..., r.

Since m_n is the minimum of a finite collection of numbers, it must be one of
them, say

m_n = p_{k_n,j}(n).
By Equation 5.4,

M_{n+1} = max_{1≤i≤r} p_{i,j}(n + 1) = max_{1≤i≤r} Σ_{k=1}^r p_{i,k} p_{k,j}(n)
        = max_{1≤i≤r} [ p_{i,k_n} p_{k_n,j}(n) + Σ_{k≠k_n} p_{i,k} p_{k,j}(n) ]
        ≤ max_{1≤i≤r} [ p_{i,k_n} m_n + M_n Σ_{k≠k_n} p_{i,k} ]
        = max_{1≤i≤r} ( p_{i,k_n} m_n + M_n (1 − p_{i,k_n}) )
        = max_{1≤i≤r} ( M_n − (M_n − m_n) p_{i,k_n} )
        = M_n − (M_n − m_n) min_{1≤i≤r} p_{i,k_n}
        ≤ M_n − (M_n − m_n) δ.

Therefore,

M_{n+1} ≤ M_n − (M_n − m_n) δ ≤ M_n,

and the sequence {M_n} is decreasing. A similar argument shows that the
sequence {m_n} is increasing, and

m_{n+1} ≥ m_n + (M_n − m_n) δ ≥ m_n.

By combining the last two inequalities,

M_{n+1} − m_{n+1} ≤ (M_n − m_n) − 2δ(M_n − m_n) = (1 − 2δ)(M_n − m_n).

Since M_1 − m_1 ≤ 1,

0 ≤ M_n − m_n ≤ (1 − 2δ)^{n−1},    n ≥ 1,

and therefore M_n − m_n → 0 as n → ∞. Thus, lim_{n→∞} p_{i,j}(n) = π_j exists.
Since lim_{n→∞} p_{i,j}(n) = lim_{n→∞} M_n and the latter does not depend on i,
π_j is independent of i. Suppose now that N is a positive integer for which
p_{i,j}(N) > 0 for 1 ≤ i, j ≤ r, let P̂ = [p̂_{i,j}] = P^N = [p_{i,j}(N)], and let
P̂(n) = P̂^n = P^{nN} = [p_{i,j}(nN)]. Since P̂(1) = P̂ is a stochastic matrix
with p̂_{i,j}(1) > 0, 1 ≤ i, j ≤ r, the first part of the proof implies that

lim_{n→∞} p̂_{i,j}(n) = lim_{n→∞} p_{i,j}(nN) = π_j
exists for 1 ≤ i, j ≤ r, independently of i. This means that given ε > 0, for
each pair i, j there is an N_{i,j} ≥ 1 such that

|p_{i,j}(nN) − π_j| < ε

whenever n ≥ N_{i,j}. Letting N_ε = max_{1≤i,j≤r} N_{i,j},

|p_{i,j}(nN) − π_j| < ε

simultaneously for all i, j with 1 ≤ i, j ≤ r whenever n ≥ N_ε. Any positive
integer n can be written n = k(n)N + ℓ(n) where lim_{n→∞} k(n) = +∞ and
0 ≤ ℓ(n) < N. By Equation 5.4,

p_{i,j}(n) = Σ_k p_{i,k}(ℓ(n)) p_{k,j}(k(n)N).

Since Σ_k p_{i,k}(ℓ(n)) = 1,

p_{i,j}(n) − π_j = Σ_k p_{i,k}(ℓ(n)) (p_{k,j}(k(n)N) − π_j).

Since lim_{n→∞} k(n) = +∞, there is an M ≥ 1 such that k(n) ≥ N_ε whenever
n ≥ M. Thus, for n ≥ M, k(n) ≥ N_ε and

|p_{i,j}(n) − π_j| ≤ Σ_k p_{i,k}(ℓ(n)) |p_{k,j}(k(n)N) − π_j| < ε Σ_k p_{i,k}(ℓ(n)) = ε.

Therefore,

lim_{n→∞} p_{i,j}(n) = π_j,    1 ≤ j ≤ r,

independently of i. ■
Definition 5.2  (1) The state j can be reached from the state i if there is a positive integer n such
that p_{i,j}(n) > 0; (2) the transition matrix P or chain {X_n}_{n=0}^∞ is irreducible if
each state can be reached from every other state. ■
Consider a Markov chain with irreducible transition matrix P = [p_{i,j}].
There is then a positive integer N such that p_{i,j}(N) > 0 for all i, j ∈ S,
and ν_j = lim_{n→∞} p_{i,j}(n) is defined for all i, j ∈ S, independently of i,
according to Theorem 5.2.2. Taking n = 1 and letting m → ∞ in the
Chapman-Kolmogorov equation,

ν_j = Σ_{k=1}^r ν_k p_{k,j},    j = 1, ..., r.        (5.6)
The ν_j are clearly nonnegative. Since Σ_{j=1}^r p_{i,j}(n) = 1 and

Σ_{j=1}^r ν_j = Σ_{j=1}^r lim_{n→∞} p_{i,j}(n) = lim_{n→∞} Σ_{j=1}^r p_{i,j}(n) = 1,

{ν_j}_{j=1}^r is a probability density called the asymptotic distribution or limiting
distribution. It is left as an exercise to show that the density {ν_j}_{j=1}^r satisfying
Equation 5.6 is unique. According to Equation 5.6, the determination of
the ν_j amounts to solving a system of linear equations.
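That linear system can be solved mechanically. A minimal sketch of our own (the 2-state matrix used as a check case is our invention, not from the text): one redundant equation of the singular system is replaced by the normalization condition.

```python
import numpy as np

def asymptotic_distribution(P):
    """Solve nu_j = sum_k nu_k p_{k,j} (Equation 5.6) together with the
    normalization sum_j nu_j = 1, by replacing one redundant equation
    of the singular system (P^T - I) nu = 0 with the normalization."""
    P = np.asarray(P, dtype=float)
    r = P.shape[0]
    A = P.T - np.eye(r)
    A[-1, :] = 1.0          # last equation becomes nu_1 + ... + nu_r = 1
    b = np.zeros(r)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# A 2-state illustration (our own matrix); by hand, nu = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
nu = asymptotic_distribution(P)
```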
EXAMPLE 5.4  Consider a Markov chain with state space S = {1, 2, 3}
and stationary transition matrix

P = | 1/2   0   1/2 |
    | 1/2  1/2   0  |
    |  0    1    0  |.

In this case, Equation 5.6 becomes

ν_1 = (1/2)ν_1 + (1/2)ν_2
ν_2 = (1/2)ν_2 + ν_3
ν_3 = (1/2)ν_1.

These equations are not linearly independent, and one of them must be
discarded and replaced by the equation

ν_1 + ν_2 + ν_3 = 1.

If this is done and the resulting equations are solved, we find

ν_1 = 2/5,    ν_2 = 2/5,    ν_3 = 1/5.  ■
EXAMPLE 5.5  Consider the Ehrenfest diffusion model with S = {0,
1, ..., N}, p_{i,i−1} = i/N, p_{i,i+1} = 1 − (i/N), and p_{i,j} = 0 otherwise. In this
case, Equation 5.6 reads

ν_j = Σ_{k=0}^N ν_k p_{k,j},    j = 0, ..., N.

The equation corresponding to j = 0 is

ν_0 = (1/N) ν_1,

and the equation corresponding to j = N is

ν_N = (1/N) ν_{N−1}.

For 1 ≤ j ≤ N − 1,

ν_j = (1 − (j − 1)/N) ν_{j−1} + ((j + 1)/N) ν_{j+1}.

Disregarding the N in the denominators on the right side, the equations suggest
that ν_j has a form that allows the j + 1 and N − j + 1 coefficients to be cancelled.
This suggests that the ν_j have the form c·C(N, j); but since the sum of all the ν_j is
equal to 1 and Σ_{j=0}^N C(N, j) = 2^N, the ν_j must have the form

ν_j = C(N, j) / 2^N,    j = 0, ..., N.

It is easily verified that the ν_j given by this equation do in fact satisfy the
above equations and represent the asymptotic distribution. The interpretation
is that whatever the initial number of molecules in container A, ultimately
each of the N molecules is assigned to one of the two containers with equal
probabilities. ■
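The claim that ν_j = C(N, j)/2^N satisfies the above equations can be checked numerically. A sketch for a small N of our choosing:

```python
import numpy as np
from math import comb

N = 6
# Ehrenfest transition probabilities: p_{i,i-1} = i/N, p_{i,i+1} = 1 - i/N.
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i > 0:
        P[i, i - 1] = i / N
    if i < N:
        P[i, i + 1] = 1 - i / N

# Candidate distribution nu_j = C(N, j) / 2^N.
nu = np.array([comb(N, j) / 2 ** N for j in range(N + 1)])

# nu satisfies Equation 5.6: nu = nu P.
satisfies_56 = np.allclose(nu @ P, nu)
```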
EXERCISES 5.2
1.
A gambler and his adversary have a combined capital of N units. In
successive wagers, the gambler can win one unit with probability p or
lose one unit with probability q = 1 - p. Describe the transition
matrix.
2. Consider a Markov chain with state space S = {1, 2, 3, 4} and transition
matrix

P = | 1/2   0    0   1/2 |
    |  0   1/3   0   2/3 |
    |  0    0    1    0  |
    |  0   1/4  3/4   0  |

Find the smallest integer n such that p_{i,j}(n) > 0, i, j = 1, 2, 3, 4, and
determine the asymptotic distribution of the chain.
3.
Determine the asymptotic distribution of the binary information source
of Example 5.1.
4. Consider a Markov chain with state space S = {1, 2, 3} and transition
matrix

P = |  0   1/2  1/2 |
    | 1/2   0   1/2 |
    | 1/2  1/2   0  |

Find the smallest integer n for which p_{i,j}(n) > 0 for i, j = 1, 2, 3 and
find the asymptotic distribution of the chain.
5. Let P = [p_{i,j}] be an N × N transition matrix and suppose that
μ_j = Σ_{k=1}^N μ_k p_{k,j}, j = 1, ..., N. Show that for each n ≥ 1,

μ_j = Σ_{k=1}^N μ_k p_{k,j}(n),    j = 1, ..., N.

6. An N × N transition matrix P = [p_{i,j}] is doubly stochastic if Σ_{i=1}^N
p_{i,j} = 1 for j = 1, ..., N. Assuming that the limits exist, show that

ν_j = lim_{n→∞} p_{i,j}(n) = 1/N,    j = 1, ..., N.

7. Consider a Markov chain with state space S = {1, 2, 3, 4, 5} and
transition matrix
P = | 1/3  1/6  1/6  1/6  1/6 |
    | 1/6  1/3  1/6  1/6  1/6 |
    | 1/6  1/6  1/3  1/6  1/6 |
    | 1/6  1/6  1/6  1/3  1/6 |
    | 1/6  1/6  1/6  1/6  1/3 |
Find the asymptotic distribution of the chain.
8.
Let P = [p_{i,j}] be an N × N transition matrix for which there is an
n ≥ 1 such that p_{i,j}(n) > 0 for all i, j = 1, ..., N. Let {μ_j}_{j=1}^N be a
probability density that satisfies the equation

μ_j = Σ_{k=1}^N μ_k p_{k,j},    j = 1, ..., N.

Show that {μ_j}_{j=1}^N is unique.
9. Consider a Markov chain with state space S = {1, 2, 3} and transition
matrix

P = |  0   1/2  1/2 |
    |  1    0    0  |
    |  1    0    0  |

Calculate P(2n) and P(2n − 1) for all n ≥ 1 and draw conclusions
about the asymptotic distribution of the chain.
10. N red balls and N white balls are placed into two containers A and B so
that each contains N balls. The number of red balls in A is the state of a
system. At each step of a continuing process, a ball is selected at random
from each container and transferred to the other container. Determine
the transition matrix P and find the asymptotic distribution.
11. Let {X_n}_{n=0}^∞ be a Markov chain with stationary transition probabilities.
If m, n ≥ 1 and i, j ∈ S, show that P(X_{m+n} = j | X_m = i) is independent
of m.
The next two problems require mathematical software such as Mathematica
or Maple V.
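A free alternative to such software is sketched below: since P(n) = P^n, computing a high matrix power approximates the asymptotic distribution. The 2-state matrix here is our own check case, not one of the exercise matrices.

```python
import numpy as np

def limit_distribution(P, n=200):
    """Approximate the asymptotic distribution by computing a high power
    of the transition matrix; each row of P^n approximates (nu_1, ..., nu_r)."""
    Pn = np.linalg.matrix_power(np.asarray(P, dtype=float), n)
    return Pn[0]

# A small matrix of our own; solving Equation 5.6 by hand gives (0.4, 0.6).
pi = limit_distribution([[0.25, 0.75],
                         [0.50, 0.50]])
```

Applying `limit_distribution` to the matrices of Exercises 12 and 13 yields their asymptotic distributions directly.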
12. Consider a Markov chain with state space S = {1, 2, 3, 4, 5} and
transition matrix

P = | .1  .2  .1  .5  .1 |
    | .2  .2  .3  .1  .2 |
    | .2  .2  .2  .2  .2 |
    | .3  .3  .2  .2   0 |
    | .3  .3  .3   0  .1 |

Find the asymptotic distribution of the chain.
13. Consider a Markov chain with state space S = {1, 2, 3, 4, 5, 6} and
transition matrix

P = |  0   .12  .38   0   .40  .10 |
    | .11   0   .29  .20   0   .40 |
    | .10  .10   0   .15  .25  .40 |
    | .40   0    0    0   .30  .30 |
    | .05  .05  .30  .40   0   .20 |
    |  0    0    0   .30  .30  .40 |

Find the asymptotic distribution of the chain.
5.3  RANDOM WALKS
The probability model for the gambler's ruin problem consists of an infinite
sequence of independent random variables {Y_j}_{j=1}^∞ with P(Y_j = 1) = p and
P(Y_j = −1) = q = 1 − p, where 0 < p < 1. Fixing integers x and a with
1 ≤ x ≤ a − 1, S_n = x + Y_1 + ⋯ + Y_n is the gambler's capital as of the nth
play of a game, with x representing the gambler's initial capital. The S_n can
also serve as a model for a particle taking a random walk on the integer points
of the line. We interpret x as the initial position of the particle; starting
at x, the particle will jump one unit to the right with probability p and one
unit to the left with probability q; i.e., its position after the first jump will be
S_1 = x + Y_1. Starting from this new position, the particle will jump one unit
to the right with probability p and one unit to the left with probability q, so that
its position after the second jump will be S_2 = x + Y_1 + Y_2. After the nth jump,
its position will be S_n = x + Y_1 + ⋯ + Y_n. Let q_x be the probability that the
particle will reach 0 before reaching a. Since only the interpretation of the S_n
has changed and not the probability model, the q_x are given by Equations 3.14
and 3.15. In the language of random walks, 0 and a are boundary points for
the interval of integers {0, 1, ..., a}, and q_x is the probability of absorption
at 0.
Let {Y_j} be an infinite sequence of independent random variables as
described above and let S_n = Y_1 + ⋯ + Y_n. Then S_n can be interpreted
as the position of a particle after n moves, with the particle starting at 0 and
successively jumping one unit to the right with probability p and one unit to
the left with probability q = 1 − p. After the first jump, the particle is no
longer at 0 and may or may not eventually return to 0. We will endeavor
to calculate the probability that the particle will eventually return to 0. The
notation introduced above will be used throughout this section. Recall that Z
is the set of integers {..., −2, −1, 0, 1, 2, ...}.
Definition 5.3  If p = q, the sequence {S_n}_{n=1}^∞ is called a symmetric random walk on Z; if
p ≠ q, the sequence {S_n}_{n=1}^∞ is called a random walk on Z with drift: to the left
if q > p and to the right if p > q. ■
We will introduce two sequences of numbers {u_j} and {f_j} by defining

u_j = P(S_j = 0),    j ≥ 1,
f_j = P(S_1 ≠ 0, ..., S_{j−1} ≠ 0, S_j = 0),    j ≥ 1.

Since the numbers u_0 and f_0 have no meaning, we are free to define them
however we choose and put u_0 = 1, f_0 = 0. Since |u_j| ≤ 1 and |f_j| ≤ 1 for
all j ≥ 0, the generating functions

U(s) = Σ_{j=0}^∞ u_j s^j,    F(s) = Σ_{j=0}^∞ f_j s^j
converge absolutely in the interval (−1, 1). The probability that the particle
will eventually return to the origin is P(S_n = 0 for some n ≥ 1). Since this
event can be stratified according to the first time the particle reaches the origin,

P(S_n = 0 for some n ≥ 1) = Σ_{j=1}^∞ P(S_1 ≠ 0, ..., S_{j−1} ≠ 0, S_j = 0)
                          = Σ_{j=1}^∞ f_j.
Letting f = Σ_{j=1}^∞ f_j, 1 − f is the probability that the particle will never return
to the origin. The {u_j} and {f_j} sequences are related by the equation

u_j = f_0 u_j + f_1 u_{j−1} + ⋯ + f_j u_0,    j ≥ 1.        (5.8)
Note that the first term on the right is zero since f_0 = 0. This equation follows
from the fact that

(S_j = 0) = ⋃_{k=1}^j (S_1 ≠ 0, ..., S_{k−1} ≠ 0, S_k = 0, S_j = 0)

and the following argument. The event (S_1 ≠ 0, ..., S_{k−1} ≠ 0, S_k = 0)
depends only upon Y_1, ..., Y_k, and since S_k = Y_1 + ⋯ + Y_k = 0, the
condition S_j = 0 is the same as the condition Y_{k+1} + ⋯ + Y_j = 0, which
depends only upon Y_{k+1}, ..., Y_j. By independence and the fact that the joint
density of Y_{k+1}, ..., Y_j is the same as the joint density of Y_1, ..., Y_{j−k},

u_j = P(S_j = 0)
    = Σ_{k=1}^j P(S_1 ≠ 0, ..., S_{k−1} ≠ 0, S_k = 0) P(Y_{k+1} + ⋯ + Y_j = 0)
    = Σ_{k=1}^j P(S_1 ≠ 0, ..., S_{k−1} ≠ 0, S_k = 0) P(Y_1 + ⋯ + Y_{j−k} = 0)
    = Σ_{k=1}^j f_k u_{j−k} = Σ_{k=0}^j f_k u_{j−k}.
This establishes Equation 5.8. Multiplying both sides of Equation 5.8 by s^j,
summing over j ≥ 1, and using the fact that f_0 = 0,

Σ_{j=1}^∞ u_j s^j = Σ_{j=1}^∞ (f_0 u_j + ⋯ + f_j u_0) s^j.

Since u_0 = 1, the left side of this equation is U(s) − 1, and according to the
discussion preceding Definition 3.10, the right side is equal to

(Σ_{j=0}^∞ f_j s^j)(Σ_{j=0}^∞ u_j s^j) = U(s)F(s).
Thus, U(s) − 1 = F(s)U(s), and the generating functions are related as
follows:

U(s) = 1 / (1 − F(s)).        (5.9)

Since the u_j and f_j are nonnegative, by Abel's theorem (Theorem 4.2.3) both
limits lim_{s→1−} U(s) = Σ_{j=0}^∞ u_j and lim_{s→1−} F(s) = Σ_{j=0}^∞ f_j = f
exist, even if the first is infinite.
Theorem 5.3.1  f < 1 if and only if Σ_{j=0}^∞ u_j < +∞.

PROOF: Note that 0 ≤ f < 1 or f = 1. When f < 1,

Σ_{j=0}^∞ u_j = lim_{s→1−} U(s) = 1 / (1 − lim_{s→1−} F(s)) = 1 / (1 − f) < ∞;

when f = 1, lim_{s→1−} F(s) = 1 and lim_{s→1−} U(s) = Σ_{j=0}^∞ u_j = +∞. In the
latter case, Σ_{j=0}^∞ u_j < +∞ implies that f < 1. ■
This theorem gives us a workable criterion for deciding if the particle will
eventually return to the origin with probability 1. Before applying the criterion,
it is necessary to take up approximations to factorials.
A sequence {a_j}_{j=1}^∞ is said to be asymptotically equivalent to the sequence
{b_j}_{j=1}^∞, written a_j ~ b_j, if lim_{j→∞} a_j/b_j = 1. It is easy to see that if a_j ~ b_j
and c_j ~ d_j, then a_j/c_j ~ b_j/d_j. The following relationship is known as
Stirling's formula:

n! ~ √(2π) n^{n+(1/2)} e^{−n}.        (5.10)
An elementary proof of this result can be found in the book by R. Ash listed at
the end of this chapter.
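Stirling's formula is easy to test numerically; a small sketch of our own:

```python
import math

def stirling(n):
    """Right side of Equation 5.10: sqrt(2*pi) * n**(n + 1/2) * e**(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# The ratio n! / stirling(n) tends to 1 from above (roughly 1 + 1/(12n)).
ratios = [math.factorial(n) / stirling(n) for n in (5, 20, 80)]
```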
Returning to the series Σ_{j=0}^∞ u_j, note that u_j = 0 whenever j is odd, because
a return to the origin can occur only in an even number of jumps (i.e., the
number of jumps to the right must be equal to the number of jumps to the
left). Consider u_{2n} for n ≥ 1. Since the number of jumps to the left and to
the right must be equal,

u_{2n} = C(2n, n) p^n q^n,    n ≥ 1.

By Stirling's formula,

u_{2n} ~ (4pq)^n / √(πn).

Since

lim_{n→∞} u_{2n} / ((4pq)^n / √(πn)) = 1,

there is an N ≥ 1 such that

u_{2n} < 2 (4pq)^n / √(πn)    for all n ≥ N,

and since √(πn) ≥ 1 for all positive integers n,

u_{2n} < 2 (4pq)^n    for all n ≥ N.
We now take up the p ≠ q and p = q cases separately. Suppose first
that p ≠ q. In this case, 4pq = 4p(1 − p) < 1, since the maximum value
1/4 of p(1 − p) is attained only when p = q = 1/2. By the comparison test
for positive series, the series Σ_{n=0}^∞ u_{2n} converges since the geometric series
Σ_{n=0}^∞ (4pq)^n converges. Thus, p ≠ q implies that Σ_{j=0}^∞ u_j converges. By
Theorem 5.3.1, f < 1. This means that in the p ≠ q case, there is a positive
probability that the particle will never return to the origin. Consider now the
p = q = 1/2 case. Then 4pq = 1 and u_{2n} ~ 1/√(πn). That is,

lim_{n→∞} u_{2n} / (1/√(πn)) = 1,

and there is an N ≥ 1 such that

u_{2n} > (1/2) · 1/√(πn)    for all n ≥ N,

or

u_{2n} > 1/(2√(πn))    for all n ≥ N.

By comparison with the divergent p-series Σ_{n=1}^∞ 1/n^{1/2}, the series Σ_{n=0}^∞ u_{2n}
diverges. Thus, p = q = 1/2 implies that Σ_{j=0}^∞ u_j diverges. By Theorem 5.3.1,
f = 1 and the particle will return to the origin with probability 1. In summary,
we have the following theorem.
Theorem 5.3.2  If p ≠ q, there is a positive probability that the random walk {S_n}_{n=1}^∞ will never
return to the origin; if p = q, the random walk {S_n}_{n=1}^∞ will return to the origin
with probability 1.
Can the probability of eventually returning to the origin be determined in
the p ≠ q case? With a little more work we can answer this question, since the
u_j are known. In fact,

U(s) = Σ_{j=0}^∞ u_{2j} s^{2j} = Σ_{j=0}^∞ C(2j, j) (pq)^j s^{2j}.

Using the easily verified fact that

C(2j, j) = (−4)^j C(−1/2, j),

U(s) = Σ_{j=0}^∞ C(−1/2, j) (−4pqs²)^j = (1 − 4pqs²)^{−1/2}.

By Equation 5.9,

F(s) = 1 − (1 − 4pqs²)^{1/2}.

Since F(1) = 1 − (1 − 4pq)^{1/2} and also F(1) = Σ_{j=0}^∞ f_j = f, we have f =
1 − (1 − 4pq)^{1/2}. Noting that 1 − 4pq = 1 − 4p(1 − p) = (1 − 2p)² = (q − p)²,
f = 1 − |q − p|.
Theorem 5.3.3  The random walk {S_n}_{n=1}^∞ will return to the origin with probability f =
1 − |q − p|.
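Theorem 5.3.3 can be checked by simulation. The sketch below is our own code, with an arbitrary choice of p and a truncation at 1000 jumps (which slightly underestimates f, since a few returns occur later):

```python
import random

def returns_to_zero(p, max_steps, rng):
    """Run the walk until its first return to 0, giving up after max_steps."""
    s = 0
    for _ in range(max_steps):
        s += 1 if rng.random() < p else -1
        if s == 0:
            return True
    return False

p = 0.7                      # then f = 1 - |q - p| = 1 - 0.4 = 0.6
rng = random.Random(1)
trials = 20_000
est = sum(returns_to_zero(p, 1000, rng) for _ in range(trials)) / trials
```

The estimate lands close to the theoretical value 0.6.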
We can also use the previous result to determine the expected number
of jumps to return to the origin. Define a waiting time random variable
T by putting T = n on (S_1 ≠ 0, ..., S_{n−1} ≠ 0, S_n = 0) for n ≥ 1
and T = +∞ otherwise. Then P(T = n) = P(S_1 ≠ 0, ..., S_{n−1} ≠ 0,
S_n = 0) = f_n. Consider the p ≠ q case. Since f = Σ_{n=0}^∞ f_n < 1,
P(T = +∞) = 1 − P(T < ∞) = 1 − f > 0 and E[T] = +∞. Now
consider the p = q = 1/2 case. This time P(T < +∞) = f = 1. Note that
f̂_T(s) = Σ_{n=0}^∞ P(T = n) s^n = Σ_{n=0}^∞ f_n s^n = F(s). Therefore,

E[T] = F′(1) = lim_{s→1−} s(1 − s²)^{−1/2} = +∞.

In summary, we have the following theorem.
Theorem 5.3.4  The expected number of jumps for return to the origin is +∞ for the one-dimensional
random walk {S_n}_{n=1}^∞.

In the symmetric case, the random walk will return to the origin with
probability 1, but the expected time for doing so is infinite.
A two-dimensional random walk on the points in the plane with integer
coordinates can be described as follows. If at a given time a particle is at
a point (x, y) with integer coordinates, then it will jump to one of the four
neighboring points (x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1) with specified
probabilities, independently of what has taken place previously. A three-
dimensional random walk on the points in 3-space with integer coordinates
can be described similarly, except that jumps to six neighboring points will be
allowed. To simplify the discussion, we will consider only symmetric two- and
three-dimensional random walks.
For each j ≥ 1, let (X_j, Y_j) be an ordered pair of random variables with
joint density function

f_{X_j,Y_j}(x_j, y_j) = { 1/4   if (x_j, y_j) = (±1, 0) or (0, ±1)
                       0     otherwise.

We will assume that a probability space (Ω, ℱ, P) can be constructed so that
the pairs (X_1, Y_1), (X_2, Y_2), ... are independent; i.e., for every n ≥ 1,

f_{X_1,Y_1,...,X_n,Y_n}(x_1, y_1, ..., x_n, y_n) = f_{X_1,Y_1}(x_1, y_1) × ⋯ × f_{X_n,Y_n}(x_n, y_n).

Note that for each j ≥ 1, the random variables X_j and Y_j are not independent.
For each n ≥ 1, let S_n = X_1 + ⋯ + X_n and T_n = Y_1 + ⋯ + Y_n. Then
the sequence of pairs {(S_n, T_n)}_{n=1}^∞ describes a two-dimensional random walk
starting at the origin on the points in the plane with integer coordinates. A
particle taking such a random walk can be at the origin as of the nth jump if
and only if both S_n = 0 and T_n = 0. As before, we can define

u_n = P(S_n = 0, T_n = 0),    n ≥ 1,
with u_0 = 1, and also define

f_n = P(|S_1| + |T_1| ≠ 0, ..., |S_{n−1}| + |T_{n−1}| ≠ 0, S_n = 0, T_n = 0),    n ≥ 1,

with f_0 = 0. The probability f_n is the probability that the particle will return to
the origin for the first time on the nth jump, and f = Σ_{j=1}^∞ f_j is the probability
that the particle will eventually return to the origin. The generating functions
U(s) and F(s) are related as in Equation 5.9, so that again f < 1 if and only
if Σ_{n=0}^∞ u_n < +∞. As in the one-dimensional random walk, a return to the
origin can occur only in an even number of jumps. For S_{2n} = 0 and T_{2n} = 0,
the number k of jumps to the right must be equal to the number k of jumps
to the left, and the number n − k of jumps up must be equal to the number
n − k of jumps down, where k = 0, ..., n. By the multinomial density and
Equation 1.11,

u_{2n} = Σ_{k=0}^n (2n)! / (k! k! (n − k)! (n − k)!) (1/4)^{2n}.

By Stirling's formula,

u_{2n} ~ 1/(πn).

Thus, there is a positive integer N ≥ 1 such that

u_{2n} ≥ (1/2) · 1/(πn)    for all n ≥ N.

Since the series Σ_{n=1}^∞ 1/n diverges, the series Σ_{n=0}^∞ u_n diverges and therefore
f = 1. Thus, a symmetric random walk in the plane will return to the origin
with probability 1.
The situation changes, however, in higher dimensions. In the three-
dimensional case, the random walk starting at the origin takes place on the
points in 3-space with integer coordinates, and the particle will jump to any
one of its nearest neighbors with probability 1/6. As in the previous cases, a
return to the origin can occur only in an even number of jumps. For this to
happen, the number of jumps in the positive x-direction must be equal to the
number of jumps in the negative x-direction, and the same for the y-direction
and z-direction. In this case,
u_{2n} = Σ_{j,k} (2n)! / (j! j! (k − j)! (k − j)! (n − k)! (n − k)!) (1/6)^{2n}
      = (1/2^{2n}) C(2n, n) Σ_{j,k} ( n! / (3^n j! (k − j)! (n − k)!) )².

Since the quantity in the parentheses is the general term of a multinomial
density,

u_{2n} ≤ (1/2^{2n}) C(2n, n) [ max_{j,k} n! / (3^n j! (k − j)! (n − k)!) ] Σ_{j,k} n! / (3^n j! (k − j)! (n − k)!)
      = (1/2^{2n}) C(2n, n) max_{j,k} ( n! / (3^n j! (k − j)! (n − k)!) ).

The indicated maximum will be achieved when the three factorials in the
denominator are equal and, since their sum is n, when each is equal to (n/3)!,
assuming that n/3 is an integer. Putting aside such technical details,

u_{2n} ≤ (1/2^{2n}) C(2n, n) · n! / (3^n ((n/3)!)³).

Applying Stirling's formula, Equation 5.10, to the factorials,

u_{2n} ≤ (1/2^{2n}) C(2n, n) · n! / (3^n ((n/3)!)³) ~ 3√3 / (2π√π n^{3/2}).

By the comparison test, the series Σ_{n=0}^∞ u_n can be compared with the
convergent p-series Σ_{n=1}^∞ 1/n^{3/2} with p = 3/2 > 1, and therefore the series
Σ_{n=0}^∞ u_n converges. In this case, f < 1 and there is a positive probability that
a return to the origin will never occur.
The technical details glossed over previously can be taken care of by using
the fact that for 0 ≤ j ≤ k ≤ n,

j!(k − j)!(n − k)! ≥ (Γ((n/3) + 1))³,

where Γ is the gamma function (see Section 6.5), and making use of known
estimates of the gamma function for large values of the argument.
EXERCISES 5.3
The following terminology will be used in connection with a particle taking a
random walk on the integers {0, 1, ..., a}, a ≥ 2. The boundary point 0 is an
elastic barrier for the walk if there is a number δ with 0 < δ < 1 such that the
particle upon reaching 1 will jump to 2 with probability p, remain at 1 with
probability δq, or jump to 0 with probability (1 − δ)q.
1. Verify that

C(2n, n) = (−4)^n C(−1/2, n)

for every positive integer n.
2. Consider a random walk on Z that jumps two units to the right with
probability p and one unit to the left with probability q, 0 < p <
1, p + q = 1. If a particle starts at 0, for what values of p is return to 0
certain?
3. Let q_x be the probability that a random walk on {0, 1, ..., a} with elastic
barrier at 0 as described above will hit 0 before hitting a. Find a
difference equation for the q_x, find boundary conditions, and determine q_x.
4. Let T_x, 1 ≤ x ≤ a − 1, be the waiting time for the random walk
of the previous problem to hit either 0 or a. Calculate D_x = E[T_x],
1 ≤ x ≤ a − 1, in the p ≠ q case.
5.4  BRANCHING PROCESSES
If a neutron collides with the nucleus of an atom, the nucleus may split
and give rise to new neutrons, which in turn may collide with other nuclei
and give rise to more neutrons, and so forth. This is an example of a
branching process. Another commonly cited example involves the survival of
family names, assumed to be passed on to male offspring. Starting with one
individual, k offspring may be produced with probability pk, k = 0,1, ... .
The number of offspring is a random variable X_1 that describes the size of
the first generation. Each of the X_1 offspring can then produce k offspring
with probability p_k, k = 0, 1, 2, ..., independently of X_1 and independently
of the number of offspring of individuals of the same generation. The total
number of the offspring of the X_1 individuals is then a random variable X_2
that describes the size of the second generation, and so forth. Continuing in
this way, there is a sequence of random variables X_0, X_1, ... where X_0 is the
size of the initial generation and X_j describes the size of the jth generation.
A careful construction of a branching process in terms of random variables
requires an infinite sequence of independent nonnegative integer-valued random
variables all having the same density p(k) = p_k, k = 0, 1, . . . . We will
assume that there is such a sequence of random variables.
We commence with the density function p just described and assume
throughout that X_0 = 1. Let X_1 be a random variable having density function
p and let Y_1^{(1)}, Y_2^{(1)}, ... be a sequence of independent random variables that all
have the same density p and that are also independent of X_1. We then let

X_2 = Y_1^{(1)} + Y_2^{(1)} + ⋯ + Y_{X_1}^{(1)};

i.e., X_2 is the sum of a random number of random variables. Letting p̂ denote
the generating function of the density p, by Theorem 3.4.5,

f̂_{X_2}(t) = f̂_{X_1}(p̂(t)).

Since f̂_{X_1} = p̂,

f̂_{X_2}(t) = p̂(p̂(t)).

Now let Y_1^{(2)}, Y_2^{(2)}, ... be a sequence of independent random variables all
having the same density and independent of all previously mentioned random
variables, and let

X_3 = Y_1^{(2)} + Y_2^{(2)} + ⋯ + Y_{X_2}^{(2)}.

Again by Theorem 3.4.5,

f̂_{X_3}(t) = f̂_{X_2}(p̂(t)) = p̂(p̂(p̂(t))).

Continuing in this manner, a sequence X_1, X_2, ... is obtained whose generating
functions satisfy

f̂_{X_{j+1}}(t) = f̂_{X_j}(p̂(t))    for all j ≥ 1.        (5.11)

We will now show using mathematical induction that

f̂_{X_{j+1}}(t) = p̂(f̂_{X_j}(t))    for all j ≥ 1.        (5.12)

Since f̂_{X_2}(t) = f̂_{X_1}(p̂(t)) and f̂_{X_1} = p̂, f̂_{X_2}(t) = p̂(f̂_{X_1}(t)) and Equation 5.12 is
true for j = 1. Suppose Equation 5.12 is true for j − 1. By Equation 5.11,

f̂_{X_{j+1}}(t) = f̂_{X_j}(p̂(t)) = p̂(f̂_{X_{j−1}}(p̂(t))) = p̂(f̂_{X_j}(t)),

and the assertion is true for j. It follows from the principle of mathematical
induction that Equation 5.12 is true for all j ≥ 1.
The parameters μ = E[X_1] = p̂′(1) and σ² = var X_1 = p̂″(1) + p̂′(1) − (p̂′(1))²,
assumed to be finite, are useful in describing qualitative properties of
the branching process. Since E[X_{j+1}] = f̂′_{X_{j+1}}(1), by Equation 5.12,

f̂′_{X_{j+1}}(t) = p̂′(f̂_{X_j}(t)) f̂′_{X_j}(t)

and

E[X_{j+1}] = p̂′(f̂_{X_j}(1)) f̂′_{X_j}(1) = p̂′(1) f̂′_{X_j}(1) = μ E[X_j].

Iterating this result, E[X_{j+1}] = μ^{j+1}. Therefore,

E[X_j] = μ^j    for all j ≥ 1,

and the expected size of the jth generation increases or decreases geometrically
according to whether μ > 1 or μ < 1.
Consider now the probability that the branching process will eventually
terminate; i.e., P(X_n = 0 for some n ≥ 1). We will want to exclude from
consideration some special cases. Suppose first that p_0 = 0; i.e., the probability
that an individual will have zero offspring is zero; in this case, extinction will
never occur and we can henceforth assume that p_0 > 0. Suppose now that
p_0 = 1. Then extinction will occur with the first generation, and the p_0 = 1
case will be excluded. Henceforth, we will assume that 0 < p_0 < 1.
Consider the probability q_j that the size of the jth generation will be zero;
i.e.,

q_j = P(X_j = 0) = f̂_{X_j}(0).

By Equation 5.12, q_{j+1} = f̂_{X_{j+1}}(0) = p̂(f̂_{X_j}(0)) = p̂(q_j). Thus, the q_j's are
related by the equation

q_{j+1} = p̂(q_j)    for all j ≥ 1.        (5.13)

Since p̂ is given as part of the data describing the branching process
and q_1 = P(X_1 = 0) = p_0, in principle Equation 5.13 can be used to
determine the sequence {q_j}_{j=1}^∞ by iteration. In general, however, p̂(s) will
have nonlinear terms s^j, j ≥ 2, which makes it difficult to find a formula for
the q_j. As an alternative, we might examine the long-range behavior of the q_j
by considering lim_{j→∞} q_j, if it exists. Assuming that the limit exists and using
the fact that p̂(s) is continuous on [0, 1],

q = lim_{j→∞} q_{j+1} = lim_{j→∞} p̂(q_j) = p̂(q);

i.e., q is a solution of the equation s = p̂(s). Note that s = 1 solves this
equation since p̂(1) = 1, but there may be other solutions as well.
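Equation 5.13 is easy to iterate numerically. The sketch below is our own, with an offspring density of our choosing; the iterates q_j increase to the extinction probability q, the smallest root of s = p̂(s) in [0, 1].

```python
def extinction_probability(probs, n_iter=500):
    """Iterate Equation 5.13, q_{j+1} = phat(q_j), starting from q_1 = p_0."""
    phat = lambda s: sum(pk * s ** k for k, pk in enumerate(probs))
    q = probs[0]
    for _ in range(n_iter):
        q = phat(q)
    return q

# Offspring density of our own choosing: p0 = 1/4, p1 = 1/4, p2 = 1/2.
# Here s = phat(s) reads 2s^2 - 3s + 1 = 0, with roots 1/2 and 1, so q = 1/2.
q = extinction_probability([0.25, 0.25, 0.5])
```

Note that μ = p̂′(1) = 1/4 + 2·(1/2) = 5/4 > 1 for this density, so the extinction probability is strictly less than 1, as the discussion below confirms for the p_0 + p_1 < 1 case.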
5
STOCHASTIC PROCESSES
FIGURE 5.2 Graphs of t = s and t = p(s), p₀ + p₁ = 1.
To determine other roots of the equation s = p(s), we will examine the function p(·) in greater detail. Since 0 < p₀ < 1, p_j > 0 for some j ≥ 1, and since p′(s) = Σ_{j=1}^∞ j p_j s^{j−1}, p′(s) > 0 on (0,1) and p(s) is strictly increasing on [0,1]. Since q₁ = p₀ > 0, q₂ = p(q₁) > p(0) = p₀ = q₁. Assume that q_j > q_{j−1}. Then q_{j+1} = p(q_j) > p(q_{j−1}) = q_j. By mathematical induction, q_j < q_{j+1} for all j ≥ 1. Thus, {q_j}_{j=1}^∞ is a monotone increasing sequence that is bounded above by 1, and therefore q = lim_{j→∞} q_j exists with 0 < q ≤ 1.
An alternative approach to solutions of the equation s = p(s) is to look at points of intersection of the graphs of the equations t = s and t = p(s), 0 ≤ s ≤ 1, since the s-coordinate of a point of intersection is a solution of the equation s = p(s). It will be necessary to consider two cases. Suppose first that p₀ + p₁ = 1. Thus, p(s) = p₀ + p₁s with 0 < p₁ < 1, and the two graphs are as depicted in Figure 5.2. In this case, it is clear that there is only one solution to the equation s = p(s); namely, s = 1, so that q = 1. This means that for large j, the probability that extinction will occur with the jth generation is very close to 1. Note that μ = p′(1) = p₁ < 1 in the p₀ + p₁ = 1 case. Suppose now that p₀ + p₁ < 1 so that p_j > 0 for some j ≥ 2. In this case, p″(s) = Σ_{j=2}^∞ j(j−1) p_j s^{j−2} > 0 on (0,1) and the function p(s) is convex and strictly increasing as depicted in Figure 5.3. In this case, it is clear that there are at most two solutions of the equation s = p(s). We will now show that q is the smallest solution of this equation. Let r > 0 be any solution. Then q₁ = p₀ = p(0) < p(r) = r. Assume that q_{j−1} < r. Then q_j = p(q_{j−1}) < p(r) = r. Thus, the sequence {q_j}_{j=1}^∞ is bounded above by r, and therefore q = lim_{j→∞} q_j ≤ r; i.e., q is the smallest solution of the equation s = p(s).
EXAMPLE 5.6  According to a statistical study by A. J. Lotka, the number of male offspring of an American male is given by the modified geometric density p₀ = .4823 and p_k = (.2126)(.5893)^{k−1}, k ≥ 1. The generating function p is then

p(t) = .4823 + .2126t/(1 − .5893t).

Using software such as Mathematica or Maple V to approximate the solution of the equation

.4823 + .2126t/(1 − .5893t) = t,

the probability of extinction is .8183. ■
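The computation of Example 5.6 can also be carried out by simple iteration of Equation 5.13, q_{j+1} = p(q_j), starting from q₁ = p(0) = p₀; a minimal Python sketch (used here in place of the Mathematica or Maple V mentioned above):

```python
# Sketch: approximate the extinction probability of Example 5.6 by
# iterating q_{j+1} = p(q_j) (Equation 5.13). The generating function
# below is Lotka's, taken from the example.

def p(t):
    return .4823 + .2126 * t / (1 - .5893 * t)

q = 0.0
for _ in range(200):          # iterate until the fixed point stabilizes
    q = p(q)

print(round(q, 4))            # approximately .818
```

Because q_j increases monotonically to the smallest solution of s = p(s), the iteration converges to the extinction probability rather than to the other root s = 1.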
We will now relate the probability of ultimate extinction to the expected number of offspring of a single individual in the p₀ + p₁ < 1 case. If q < 1, then there is a point s₀ in (q, 1) such that p′(s₀) = 1 by the mean value theorem; since p′(s) is strictly increasing on [0,1] and continuous from the left at 1 by Abel's theorem (Theorem 4.2.3), μ = p′(1) > 1. Thus, if μ = p′(1) ≤ 1, then q = 1. Suppose now that q = 1 so that the graph of t = p(s) intersects the graph of t = s in only one point. It follows that μ = p′(1) ≤ 1. Thus, q = 1 if and only if μ ≤ 1. In the p₀ + p₁ = 1 case, q = 1 and μ ≤ 1. We thus have the following theorem.

Theorem 5.4.1
q = lim_{j→∞} q_j = 1 if and only if μ ≤ 1.
EXAMPLE 5.7  Suppose p₀ = 1/8, p₁ = 1/4, and p₂ = 5/8. Then p(s) = 1/8 + (1/4)s + (5/8)s² and the equation s = p(s) has the two solutions 1/5, 1. Therefore, the probability of ultimate extinction is 1/5 with μ = 3/2. ■

FIGURE 5.3 Graphs of t = s and t = p(s), p₀ + p₁ < 1.
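As a sketch, Example 5.7 can be checked numerically: the equation s = p(s) reduces to the quadratic 5s² − 6s + 1 = 0, whose smallest root is the extinction probability q, while μ = p′(1).

```python
# Sketch: verify Example 5.7. s = p(s) with p(s) = 1/8 + (1/4)s + (5/8)s^2
# reduces to 5s^2 - 6s + 1 = 0; the smallest root is q (Theorem 5.4.1
# applies since mu = p'(1) = 3/2 > 1, so q < 1).
import math

a, b, c = 5.0, -6.0, 1.0                    # 5s^2 - 6s + 1 = 0
disc = math.sqrt(b * b - 4 * a * c)
roots = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
q = roots[0]                                # smallest solution of s = p(s)
mu = 1 / 4 + 2 * (5 / 8)                    # mu = p'(1)

print(roots, q, mu)                         # [0.2, 1.0] 0.2 1.5
```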
EXERCISES 5.4
1. What is the probability of ultimate extinction q for a branching process with

p(s) = 3/8 + (3/8)s + (1/8)s² + (1/8)s³?
2. Consider the branching process with p₀ = 1/8, p₁ = 3/8, p₂ = 3/8, p₃ = 1/8, and p_n = 0 for all n ≥ 4. Calculate the probability q₃ that extinction will occur with the third generation.

3. Find a formula for the probability of ultimate extinction q for a branching process with p(s) = α + βs², where 0 < α < 1 and α + β = 1.

4. If μ = E[X₁] and σ² = var X₁, show that

var X_{j+1} = μ² var X_j + μ^j σ²  for j ≥ 1.

5. Show that

var X_j = σ²(μ^{2j−2} + μ^{2j−3} + ⋯ + μ^{j−1})  for j ≥ 1.
The following problems require mathematical software such as Mathematica or Maple V.

6. Consider the branching process with p₀ = 1/4, p₁ = 1/2, p₂ = 1/8, p₃ = 3/32, p₄ = 1/32, and p_n = 0 for all n ≥ 5. Approximate the probability of ultimate extinction q.

7. Consider a branching process for which the number of offspring of an individual has a Poisson density with parameter λ = 2. Approximate the probability of ultimate extinction q.

8. Consider a branching process for which the number of offspring of an individual has a binomial density with parameters n = 5 and p = .25. Approximate the probability of ultimate extinction q.

9. Consider the branching process with p₀ = 1/8, p₁ = 3/8, p₂ = 3/8, p₃ = 1/8, and p_n = 0 for all n ≥ 4. Calculate q₁ through q₁₀.
5.5
PREDICTION THEORY
Consider a sequence {X_j} of random variables with finite second moments where j is allowed to range from −∞ to +∞. The sequence may correspond to a random process that has been going on for some time. Suppose the index n corresponds to the present and the random variables …, X_{n−2}, X_{n−1} correspond to observations in the past. How can the past observations …, X_{n−2}, X_{n−1} be used to predict X_n? That is, is there some function ψ(…, X_{n−2}, X_{n−1}) of the past that predicts X_n? Because prediction entails some probability of error, there must be some criterion for choosing a predictor. There also must be some internal coherence in the sequence {X_j}. For example, if X_n is independent of the past, then the past is of no use for predicting X_n.
Throughout this section, {X_j} will denote a two-sided sequence of random variables {X_j}_{j=−∞}^{+∞} having finite second moments. The construction of such sequences is similar to the construction of infinite sequences of Bernoulli random variables.
Definition 5.4
The sequence {X_j} is a stationary sequence if for each finite sequence of integers j₁ < j₂ < ⋯ < j_k and integers n,

f_{X_{j₁+n},…,X_{j_k+n}}(x_{j₁}, …, x_{j_k}) = f_{X_{j₁},…,X_{j_k}}(x_{j₁}, …, x_{j_k}). ■

We have seen that a (one-sided) sequence of Bernoulli random variables has this property for positive integers n. Stationarity is stronger than what is required for this section.
Definition 5.5
The sequence of random variables {X_j} is weakly stationary if E[X_j] = μ independently of j and the covariance E[(X_j − μ)(X_k − μ)] depends only upon |j − k|, −∞ < j, k < +∞. ■

Since E[(X_j − μ)(X_j − μ)] is independent of j, σ² = var X₀ = E[(X₀ − μ)²] = E[(X_j − μ)²] = var X_j, −∞ < j < +∞. We will assume throughout that σ² > 0.
If {X_j} is a weakly stationary process, the function

R(n) = E[(X_j − μ)(X_{j+n} − μ)],  −∞ < n < +∞,

is independent of j and is called the covariance function of the sequence. Note that R(0) = σ² and that

R(−n) = E[(X_j − μ)(X_{j−n} − μ)] = E[(X_{j−n} − μ)(X_j − μ)] = R(n),

and so R(n) = R(−n) = R(|n|). The function

ρ(n) = R(n)/σ²,  −∞ < n < +∞,

is called the correlation function of the sequence {X_j}. Note that ρ(0) = 1.
EXAMPLE 5.8  Let {X_j} be a two-sided sequence of independent random variables having the same density function and let σ² be the common variance. Then cov(X_j, X_k) = 0 whenever j ≠ k and cov(X_j, X_j) = var X_j = σ². Thus,

R(n) = σ²  if n = 0
     = 0   if n ≠ 0.

The sequence {X_j} is both stationary and weakly stationary. ■
The sequence of the previous example can be used to construct other weakly
stationary sequences.
EXAMPLE 5.9 (Moving Average Process)  Let {Y_j} be a two-sided sequence of independent random variables with finite second moments having the same density function, and let μ = E[Y₀], σ² = var Y₀. Now let a₀, …, a_{m−1} be a finite sequence of real numbers and define

X_j = a₀Y_j + a₁Y_{j−1} + ⋯ + a_{m−1}Y_{j−m+1},  −∞ < j < +∞.

Each X_j has finite second moments by Lemma 4.4.1, and

E[X_j] = E[Σ_{k=0}^{m−1} a_k Y_{j−k}] = μ Σ_{k=0}^{m−1} a_k.
This shows that E[X_j] is independent of j. We will now show that cov(X_i, X_{i+n}) is independent of i. Define a_j = 0 for j ∉ {0, 1, …, m − 1} and assume for the time being that n > 0. Since the Y_j are independent random variables and E[(Y_{i−j} − μ)(Y_{i+n−k} − μ)] = 0 except when j = k − n,

cov(X_i, X_{i+n}) = E[(Σ_j a_j (Y_{i−j} − μ))(Σ_k a_k (Y_{i+n−k} − μ))]
                  = Σ_{j,k} a_j a_k E[(Y_{i−j} − μ)(Y_{i+n−k} − μ)]
                  = Σ_{k=n}^{m−1} a_{k−n} a_k var Y_{i+n−k}.
Therefore,

cov(X_i, X_{i+n}) = σ²(a₀a_n + ⋯ + a_{m−1−n}a_{m−1})  if n ≤ m − 1
                  = 0                                   if n > m − 1.

Clearly, cov(X_i, X_{i+n}) is independent of i. This is also true if n is replaced by −n, because then cov(X_i, X_{i−n}) = cov(X_{i−n}, X_i), which is independent of i. In the particular case that a_k = 1/√m, k = 0, …, m − 1,

R(n) = σ²(1 − (|n|/m))  if |n| ≤ m − 1
     = 0                if |n| ≥ m. ■
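The covariance formula above can be checked directly by summing the products a_k a_{k+|n|}; a small Python sketch with illustrative values m = 5 and σ² = 2 (both chosen here for the example, not from the text):

```python
# Sketch: compute the moving-average covariance function
# R(n) = sigma^2 * (a_0 a_n + ... + a_{m-1-n} a_{m-1}) for the special
# coefficients a_k = 1/sqrt(m), and confirm it equals
# sigma^2 * (1 - |n|/m) for |n| <= m - 1 and 0 for |n| >= m.
import math

m, sigma2 = 5, 2.0                 # illustrative assumptions
a = [1 / math.sqrt(m)] * m

def R(n):
    n = abs(n)
    if n >= m:
        return 0.0
    return sigma2 * sum(a[k] * a[k + n] for k in range(m - n))

for n in range(-m, m + 1):
    expected = sigma2 * (1 - abs(n) / m) if abs(n) <= m - 1 else 0.0
    print(n, R(n), expected)
```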
If we want to predict X_n using past observations …, X_{n−2}, X_{n−1}, there are many ways to choose a predictor X̂_n = φ(…, X_{n−2}, X_{n−1}), and we must formulate some criterion for deciding which is the best. One possible criterion for choosing a best predictor X̂_n is to choose X̂_n so that E[(X_n − X̂_n)²] is a minimum, where E[(X_n − X̂_n)²] is a measure of the distance between X_n and X̂_n, called the mean square error.

Consider a finite collection of random variables Y, Y₁, …, Y_p having finite second moments and zero means and let ℒ be the collection of all linear combinations of the Y₁, …, Y_p; i.e.,

ℒ = {Σ_{i=1}^p a_i Y_i : a₁, …, a_p ∈ R}.

An element of ℒ will be denoted by Ŷ and called a linear predictor of Y. It is easy to see that if Ŷ₁ and Ŷ₂ are in ℒ and a, b are any two real numbers, then aŶ₁ + bŶ₂ is in ℒ.
A proof of the following theorem would take us too far astray from probability theory. Proofs can be found in books on measure theory or Hilbert space theory.

Theorem 5.5.1
There is a Y* ∈ ℒ such that

E[(Y − Y*)²] ≤ E[(Y − Ŷ)²]  for all Ŷ ∈ ℒ.  (5.14)

The Y* of this theorem is called a minimum mean square linear predictor of Y. The quantity E[(Y − Y*)²] is called the minimum mean square error. It is possible to prove this result using calculus by writing

E[(Y − Ŷ)²] = E[Y²] − 2 Σ_{i=1}^p a_i E[YY_i] + Σ_{i,j=1}^p a_i a_j E[Y_i Y_j]

and minimizing the expression on the right as a quadratic function of the variables a₁, …, a_p.
Theorem 5.5.2
A predictor Y* has minimum mean square error if and only if E[(Y − Y*)Ŷ] = 0 for every linear predictor Ŷ in ℒ. Moreover, if Y*₁ and Y*₂ are any two linear predictors with minimum mean square error, then Y*₁ = Y*₂ with probability 1.

PROOF: Suppose first that E[(Y − Y*)Ŷ] = 0 for all Ŷ ∈ ℒ. Then

E[(Y − Ŷ)²] = E[(Y − Y* + Y* − Ŷ)²]
            = E[(Y − Y*)²] + 2E[(Y − Y*)(Y* − Ŷ)] + E[(Y* − Ŷ)²].

Since Y* − Ŷ ∈ ℒ, E[(Y − Y*)(Y* − Ŷ)] = 0 by hypothesis. Therefore,

E[(Y − Ŷ)²] ≥ E[(Y − Y*)²]

for all Ŷ ∈ ℒ, and Y* has minimum mean square error. Now let Y* be the predictor of Theorem 5.5.1 with minimum mean square error. Note that if Ŷ ∈ ℒ and E[Ŷ²] = var Ŷ = 0, then Ŷ = 0 with probability 1, and therefore E[(Y − Y*)Ŷ] = 0. We can therefore assume that E[Ŷ²] ≠ 0. Suppose that

E[(Y − Y*)Ŷ] = A ≠ 0  for some Ŷ ∈ ℒ.

Consider

Z = Y* + (A/E[Ŷ²])Ŷ,

which is in ℒ since Ŷ and Y* are in ℒ. Writing

Z − Y* = (A/E[Ŷ²])Ŷ,

E[(Y − Z)²] = E[((Y − Y*) + (Y* − Z))²]
            = E[(Y − Y*)²] + 2E[(Y − Y*)(Y* − Z)] + E[(Y* − Z)²]
            = E[(Y − Y*)²] − 2E[(Y − Y*)(A/E[Ŷ²])Ŷ] + E[(A²/E[Ŷ²]²)Ŷ²]
            = E[(Y − Y*)²] − 2(A/E[Ŷ²])E[(Y − Y*)Ŷ] + (A²/E[Ŷ²]²)E[Ŷ²]
            = E[(Y − Y*)²] − A²/E[Ŷ²]
            < E[(Y − Y*)²].

But this contradicts the fact that Y* minimizes the mean square error. The assumption that E[(Y − Y*)Ŷ] = A ≠ 0 for some Ŷ ∈ ℒ leads to a contradiction, and therefore E[(Y − Y*)Ŷ] = 0 for all Ŷ ∈ ℒ. Finally, suppose that Y*₁ and Y*₂ are both minimum mean square linear predictors of Y. Then

0 = E[(Y − Y*ᵢ)Ŷ] = E[YŶ] − E[Y*ᵢŶ]  for i = 1, 2 and Ŷ ∈ ℒ.

Therefore, E[(Y*₁ − Y*₂)Ŷ] = 0 for all Ŷ ∈ ℒ. Since the first factor is in ℒ, we can replace Ŷ by Y*₁ − Y*₂ to obtain E[(Y*₁ − Y*₂)²] = 0, and therefore Y*₁ = Y*₂ with probability 1 (see Exercise 4.3.11). ■
We now return to the two-sided weakly stationary sequence {X_j} and the problem of predicting X_n using the past …, X_{n−2}, X_{n−1}. Computationally, we cannot expect to use all of the past …, X_{n−2}, X_{n−1} to predict X_n and must decide upon how many observations in the immediate past we will use. Suppose it has been decided to use just p observations X_{n−p}, …, X_{n−1}. The most general predictor, not necessarily linear, will then have the form X̂_n = ψ(X_{n−p}, …, X_{n−1}). It is true, but cannot be proved here, that if we put

ψ(x_{n−p}, …, x_{n−1}) = E[X_n | X_{n−p} = x_{n−p}, …, X_{n−1} = x_{n−1}],

then X̂_n = ψ(X_{n−p}, …, X_{n−1}) is the minimum mean square predictor of X_n. This is not the same as the minimum mean square linear predictor of X_n. In practice, the computation of E[X_n | X_{n−p} = x_{n−p}, …, X_{n−1} = x_{n−1}] requires complete knowledge of the joint density f_{X_{n−p},…,X_{n−1},X_n}; even if the joint density were known, the calculation of the conditional density might be intractable. The prediction problem is easier to handle if we limit ourselves to linear prediction.
To apply the above theorems, we must assume that the random variables X_j have been centered; i.e., that E[X_j] = 0, −∞ < j < +∞. Let p and n be fixed positive integers and let ℒ be the collection of all linear combinations of X_{n−p}, …, X_{n−1}. A typical element of ℒ will be denoted by X̂_n. Let X* = Σ_{j=1}^p a_j X_{n−j} be a minimum mean square linear predictor of X_n. Then

E[(X_n − X*)X̂_n] = 0  for all X̂_n ∈ ℒ

if and only if

E[(X_n − X*)X_{n−j}] = 0  for j = 1, …, p.

These equations hold if and only if

E[(X_n − a₁X_{n−1} − ⋯ − a_pX_{n−p})X_{n−j}] = 0  for j = 1, …, p,

or

E[X_nX_{n−j}] = a₁E[X_{n−1}X_{n−j}] + ⋯ + a_pE[X_{n−p}X_{n−j}]  for j = 1, …, p.

Thus, the above condition is equivalent to

R(j) = a₁R(1 − j) + ⋯ + a_pR(p − j)  for j = 1, …, p;  (5.15)

i.e., the a₁, …, a_p must satisfy the linear equations

R(1) = a₁R(0) + ⋯ + a_pR(p − 1)
R(2) = a₁R(−1) + ⋯ + a_pR(p − 2)
⋮
R(p) = a₁R(1 − p) + ⋯ + a_pR(0).

Since there is at least one minimum mean square linear predictor X*, there is at least one solution a₁, …, a_p of this system of equations.
We can also calculate the minimum mean square error σ²_p, using X*, as follows:

σ²_p = E[(X_n − X*)²]
     = E[(X_n − X*)(X_n − X*)]
     = E[(X_n − X*)X_n] − E[(X_n − X*)X*].

The second term on the right is zero since X* ∈ ℒ. Thus,

σ²_p = E[X_nX_n] − E[Σ_{j=1}^p a_j X_{n−j} X_n] = R(0) − Σ_{j=1}^p a_j R(j).
EXAMPLE 5.10  Let {X_j} be a two-sided weakly stationary process with covariance function

R(n) = 1 − (|n|/3)  for n = 0, ±1, ±2
     = 0            otherwise.

Then R(0) = 1, R(1) = 2/3, and R(2) = 1/3. Suppose we take p = 2 so that the minimum mean square linear predictor X* = a₁X_{n−1} + a₂X_{n−2} will be used to predict X_n. The coefficients a₁, a₂ must then satisfy the equations

2/3 = a₁ + (2/3)a₂
1/3 = (2/3)a₁ + a₂.

Solving for a₁ and a₂,

X* = (4/5)X_{n−1} − (1/5)X_{n−2},

and the minimum mean square error is σ²₂ = 8/15. ■
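A short Python sketch confirming Example 5.10 by solving the two normal equations exactly with rational arithmetic (Cramer's rule on the 2×2 system):

```python
# Sketch: solve the normal equations of Example 5.10 and compute the
# minimum mean square error sigma_2^2 = R(0) - a1*R(1) - a2*R(2).
from fractions import Fraction as F

R = {0: F(1), 1: F(2, 3), 2: F(1, 3)}   # covariance values from the example

# System:  R(1) = a1*R(0) + a2*R(1)
#          R(2) = a1*R(1) + a2*R(0)
det = R[0] * R[0] - R[1] * R[1]
a1 = (R[1] * R[0] - R[2] * R[1]) / det
a2 = (R[0] * R[2] - R[1] * R[1]) / det
mse = R[0] - a1 * R[1] - a2 * R[2]

print(a1, a2, mse)                      # 4/5 -1/5 8/15
```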
EXERCISES 5.5

1. Let {Y_j} be a sequence of independent random variables having the same density function with E[Y_j] = 0 and var Y_j = 1. For each j ≥ 1, let X_j = (1/4)Y_j + (1/2)Y_{j−1} + (1/4)Y_{j−2}. Find the covariance function for the {X_j} sequence, a minimum mean square linear predictor X* of X_n based on the last three observations X_{n−1}, X_{n−2}, X_{n−3}, and the minimum mean square error σ²₃.

2. Would there be any improvement in the minimum mean square error in Problem 1 if the last four observations X_{n−1}, X_{n−2}, X_{n−3}, X_{n−4} were used to predict X_n? Verify your answer by determining X* and σ²₄.
3. Consider a stationary sequence {X_n}_{n=−∞}^{+∞} that satisfies the equation

X_n = aX_{n−1} + ε_n,  −∞ < n < +∞,

where {ε_n} is a stationary process with σ²_ε = var ε_n > 0 for which E[ε_nX_m] = 0 for all integers m and n. Show that |a| < 1.
4. Consider a stationary process {X_n}_{n=−∞}^{+∞} that satisfies the equation

X_{n+1} = a₁X_n + a₂X_{n−1} + ε_{n+1},

where {ε_n} is a stationary process with E[ε_n] = 0, σ²_ε = var ε_n > 0, and E[ε_nX_m] = 0 for all integers m and n. If ρ is the correlation function of the X_n process, show that

ρ(1) = a₁ + a₂ρ(1)
ρ(2) = a₁ρ(1) + a₂

and determine a₁ and a₂ in terms of ρ(1) and ρ(2).
Solving the following problem without the benefit of mathematical software such as Mathematica or Maple V would be extremely tedious.

5. As a result of a statistical study of a stationary process, the values R(0), R(1), R(2), R(3), and R(4) of the covariance function R(n) have been estimated to be 2, −1.68, −1.46, 1.22, and 1.08, respectively. If the last four observed values of the process are, in the order observed, −2.25, −1.25, .25, and 3.75, what is the minimum mean square linear predictor of the next value?
SUPPLEMENTAL READING LIST
R. B. Ash (1970). Basic Probability Theory. New York: Wiley.
C. W. Helstrom (1991). Probability and Stochastic Processes for Engineers, 2nd
ed. New York: Macmillan.
S. Karlin and H. M. Taylor (1975). A First Course in Stochastic Processes, 2nd ed.
New York: Academic Press.
M. Kendall and J. K. Ord (1990). Time Series, 3rd ed. New York: Oxford
University Press.
6
CONTINUOUS RANDOM VARIABLES
6.1
INTRODUCTION
At one time, a chance variable or random variable X was an undefined entity with an associated function F(x), called the distribution function of X, that specified the probability that X ≤ x. Probability theory at that time dealt with properties of distribution functions. The concept of probability space did not enter into the picture. Rapidly expanding applications of probability theory eventually necessitated a renewed look at the foundations. Most of this chapter discusses random variables as they were dealt with before the development of the probability space model.
Familiarity with the evaluation of double integrals by means of iterated integrals will be taken for granted. There will be situations in which it is necessary to interchange the order of integration of iterated integrals. The following statement justifies this procedure. Let f : R² → R be a nonnegative real-valued function that is Riemann integrable on each finite rectangle, and let (a, b), (c, d) be two intervals of real numbers, finite or infinite. Then

∫_a^b (∫_c^d f(x, y) dy) dx = ∫_c^d (∫_a^b f(x, y) dx) dy.

Proofs of this result can be found in most calculus books.
6.2
RANDOM VARIABLES
The random variables considered in the previous chapters are customarily called discrete random variables, meaning that their ranges are countable sets. But because there are meaningful numerical attributes of outcomes of experiments not having this property, we must look at the concept of random variables anew.

Let (Ω, ℱ, P) be a probability space. In previous chapters, a mapping X : Ω → R was called a random variable if its range is a countable set {x₁, x₂, …} and (X = x) ∈ ℱ for all x in the range of X. Since (X ≤ x) = ∪_{x_i ≤ x}(X = x_i) ∈ ℱ, events of the type (X ≤ x), x ∈ R, also belong to ℱ. We will take the latter property as the definition of a random variable.
Definition 6.1
A mapping X : Ω → R is called a random variable if (X ≤ x) = {ω : X(ω) ≤ x} ∈ ℱ for all x ∈ R. ■
In some instances, we allow X to take on the value +∞, which is not in R, particularly for waiting times; in this case, X is called an extended real-valued random variable. The criterion is exactly the same; i.e., (X ≤ x) ∈ ℱ for all x ∈ R.

The fact that (X ≤ x) ∈ ℱ for x ∈ R means that P(X ≤ x) is defined for all x ∈ R and defines a function on R.
Definition 6.2
If X is a random variable, the function F_X : R → R defined by

F_X(x) = P(X ≤ x),  x ∈ R,

is called the distribution function of the random variable X. ■
Consider a, b ∈ R with a ≤ b. Since P(a < X ≤ b) = P((X ≤ b) ∩ (X ≤ a)^c) = P(X ≤ b) − P(X ≤ a), probabilities of the type P(a < X ≤ b) can be calculated using F_X by the equation

P(a < X ≤ b) = F_X(b) − F_X(a).  (6.1)
To illustrate these concepts, we need to enlarge our collection of probability spaces. In many experimental situations, the outcome of the experiment is a real number, and it is natural to take Ω = R. It should be permissible to speak of the outcome being in some interval of real numbers. This means that ℱ should at least include all intervals of the form (a, b), [a, b), (a, b], [a, b], (a, +∞), [a, +∞), and so forth. It is a fact, but cannot be proved here, that there is a smallest σ-algebra ℱ of subsets of R that contains all intervals of the type just described. ℱ, however, does not contain all subsets of R. The reader may take comfort in the fact that any subset of R encountered at this level will be in ℱ. As usual, subsets of R in ℱ are called events. Now that we have settled on Ω and ℱ, what do we do for a probability function P?
EXAMPLE 6.1  Consider a conceptual experiment in which a number is selected at random from the interval (0,1). What does this mean? It should mean that the probability that the number selected will be in the interval (1/8, 1/4) should be the same as the probability that it will be in the interval (7/8, 1) and also that it is twice as likely to be in the interval (1/8, 3/8). This suggests that P should be determined by the length of the interval, provided the interval is a subinterval of (0,1); i.e., P((a, b)) = b − a whenever 0 ≤ a ≤ b ≤ 1. Since no probability should be assigned to points outside (0,1), if (a, b) is any interval, P((a, b)) should be equal to P((a, b) ∩ (0,1)); e.g., P((−3, 1/2)) = P((0, 1/2)) = 1/2. It can be shown that there is a probability function P defined on ℱ with these properties. ■
Example 6.1 can be modified by replacing the interval (0,1) by an interval (a, b) with −∞ < a < b < +∞ and defining

P((c, d)) = (d − c)/(b − a)  (6.2)

whenever a ≤ c ≤ d ≤ b, P((−∞, a)) = 0, and P((b, +∞)) = 0. P is then called a uniform probability measure on (a, b). For this example and Example 6.1, it should be noted that P({x}) = 0 for all x ∈ R. For example, if x ∈ (a, b), then {x} ⊂ (x − (1/n), x + (1/n)) ⊂ (a, b) for large n, and so

0 ≤ P({x}) ≤ P((x − 1/n, x + 1/n)) = (2/n)/(b − a) → 0 as n → ∞.

Since P({x}) does not depend upon n, P({x}) = 0. Similar arguments can be used to show the same when x ≤ a or x ≥ b. Since single points are assigned zero probability, P((c, d]) = P([c, d)) = P([c, d]) = P((c, d)).

Another way to modify Example 6.1, in addition to replacing (0,1) by (a, b), is to take Ω = (a, b) and define P only for intervals (c, d) ⊂ (a, b) by Equation 6.2.
Every reader is familiar with experiments for which the above model is appropriate. The familiar pointer mounted on a circular disk is an example of an experiment in which a number between 0 and 2π is selected at random, although the interpretation of the outcome usually involves digitizing the outcome by assigning digits to equal sectors of the disk.

With such examples in mind, we can now exhibit nondiscrete random variables.
EXAMPLE 6.2  Let (Ω, ℱ, P) be a probability measure space where Ω = R and P is the uniform probability measure on (a, b), −∞ < a < b < +∞. For each ω ∈ Ω, let X(ω) = ω. If x ∈ R, then (X ≤ x) = {ω : X(ω) ≤ x} = {ω : ω ≤ x} = (−∞, x] ∈ ℱ and X is a random variable. X is not discrete since it can take on every value in R. The distribution function F_X of X can be calculated as follows:

(i) If x < a, then F_X(x) = P(X ≤ x) = P((−∞, x]) = P((−∞, x] ∩ (a, b)) = P(∅) = 0 since (−∞, x] ∩ (a, b) = ∅.

(ii) If a ≤ x < b, then F_X(x) = P(X ≤ x) = P((−∞, x] ∩ (a, b)) = P((a, x]) = P((a, x)) = (x − a)/(b − a).

(iii) If x ≥ b, then F_X(x) = P(X ≤ x) = P((−∞, x] ∩ (a, b)) = P((a, b)) = 1.

Thus,

F_X(x) = 0                if x ≤ a
       = (x − a)/(b − a)  if a < x < b
       = 1                if x ≥ b. ■
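A brief sketch of the distribution function just derived, with illustrative endpoints a = 1 and b = 3 (values chosen here for the example):

```python
# Sketch: the uniform distribution function of Example 6.2 for an
# arbitrary interval (a, b); the values a = 1, b = 3 are illustrative.

def uniform_cdf(x, a, b):
    if x <= a:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0

a, b = 1.0, 3.0
print([uniform_cdf(x, a, b) for x in (0.0, 1.0, 2.0, 3.0, 4.0)])
# [0.0, 0.0, 0.5, 1.0, 1.0]
```

By Equation 6.1, probabilities of intervals follow as differences: P(1.5 < X ≤ 2.5) = uniform_cdf(2.5, a, b) − uniform_cdf(1.5, a, b) = 0.5.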
The random variable of this example is called a continuous random variable.
The choice of “continuous” as a modifier is a traditional but poor one in that
continuous in this context is analogous to a continuous distribution of mass as
opposed to a discrete distribution and is in no way related to the concept of
continuity of a function as studied in the calculus.
Other examples of continuous random variables can be constructed as follows. Let f : R → R be a real-valued nonnegative function that is Riemann integrable on every subinterval of R such that the improper integral ∫_{−∞}^{+∞} f(t) dt is defined and equal to 1. Let Ω = R, let ℱ be the smallest σ-algebra containing all intervals of real numbers, and define P(A) for A ∈ ℱ by putting

P((a, b)) = ∫_a^b f(t) dt

for any interval (a, b). The integral on the right is also equal to P((a, b]), P([a, b)), and P([a, b]). Consider, for example, P((a, b]). Since

P((a, b)) ≤ P((a, b]) ≤ P((a, b + 1/n)) = ∫_a^{b+(1/n)} f(t) dt

and the Riemann integral ∫_a^x f(t) dt is a continuous function of its upper limit,

P((a, b]) ≤ lim_{n→∞} ∫_a^{b+(1/n)} f(t) dt = ∫_a^b f(t) dt = P((a, b)).

Therefore, P((a, b]) = P((a, b)). In calculating probabilities P(I) for an interval I, we can remove or adjoin endpoints to I without affecting the probabilities. Now define X : Ω → R by putting X(ω) = ω for all ω ∈ R. Then (X ≤ x) = {ω : X(ω) ≤ x} = {ω : ω ≤ x} = (−∞, x] and

F_X(x) = P(X ≤ x) = P((−∞, x]) = P((−∞, x)) = ∫_{−∞}^x f(t) dt.

Caveat: Generally speaking, endpoints can be removed or adjoined in this way only when probabilities are computed by integrating a Riemann integrable function.
EXAMPLE 6.3  Consider the function

f(x) = 0       if x < 0
     = e^{−x}  if x ≥ 0.

Since f is nonnegative and

∫_0^∞ e^{−t} dt = lim_{b→+∞} ∫_0^b e^{−t} dt = lim_{b→+∞} [−e^{−b} + 1] = 1,

there is a random variable X with distribution function

F_X(x) = 0           if x < 0
       = 1 − e^{−x}  if x ≥ 0.

The graphs of f and F_X are shown in Figure 6.1. The functions f and F_X are called the exponential density function and the exponential distribution function, respectively. ■
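The computation in Example 6.3 can be checked by approximating ∫_0^x e^{−t} dt with a Riemann sum; a minimal Python sketch:

```python
# Sketch: check F_X(x) = 1 - e^{-x} of Example 6.3 by Riemann-summing
# the exponential density f(t) = e^{-t}, t >= 0, up to x.
import math

def F_numeric(x, steps=100000):
    if x <= 0:
        return 0.0
    h = x / steps                      # midpoint Riemann sum of f on [0, x]
    return sum(math.exp(-(k + 0.5) * h) for k in range(steps)) * h

for x in (0.5, 1.0, 2.0):
    print(x, F_numeric(x), 1 - math.exp(-x))   # the two columns agree
```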
As in the discrete case, it is necessary to perform various algebraic operations on random variables and deal with functions of random variables. Let (Ω, ℱ, P) be a probability space. Given random variables X and Y, we can define X + Y by putting (X + Y)(ω) = X(ω) + Y(ω) for ω ∈ Ω and define XY by putting (XY)(ω) = X(ω)Y(ω) for ω ∈ Ω. More generally, if φ is a function of n real variables x₁, …, x_n and X₁, …, X_n are random variables, we can define φ(X₁, …, X_n)(ω) = φ(X₁(ω), …, X_n(ω)); the above sum and product operations are special cases by taking φ(x, y) = x + y and φ(x, y) = xy, respectively.
FIGURE 6.1 Exponential density and distribution functions: f(x) = e^{−x}, x ≥ 0; F_X(x) = 1 − e^{−x}, x ≥ 0.

Lemma 6.2.1
If X is a random variable, then (X < x), (X ≥ x), (X > x) ∈ ℱ for all x ∈ R. If a, b ∈ R with a < b, then (a < X ≤ b) ∈ ℱ.

PROOF: Let Q = {r₁, r₂, …} be the countable collection of rational numbers. Suppose x ∈ R and X(ω) < x. Then there is an r_j ∈ Q such that X(ω) < r_j < x. Conversely, if r_j < x and X(ω) ≤ r_j, then X(ω) < x. Thus,

(X < x) = ∪_{r_j ∈ Q, r_j < x} (X ≤ r_j) ∈ ℱ

by definition of a random variable. Since (X ≤ x) ∈ ℱ,

(X > x) = (X ≤ x)^c ∈ ℱ.

By the first part of the proof, (X < x) ∈ ℱ, and so (X < x)^c = (X ≥ x) ∈ ℱ. If a < b, then (a < X ≤ b) = (X ≤ b) ∩ (X > a) ∈ ℱ. ■

It is easily seen that (a < X < b), (a ≤ X < b), and (a ≤ X ≤ b) also belong to ℱ.
Theorem 6.2.2
If X, Y are random variables and a, b ∈ R, then aX + bY, XY, and |X| are all random variables.

PROOF: We first show that aX is a random variable. If a = 0, then aX = 0 and (aX ≤ x) = ∅ if x < 0 and (aX ≤ x) = Ω if x ≥ 0, and aX is a random variable. If a > 0, then (aX ≤ x) = (X ≤ x/a) ∈ ℱ for all x ∈ R, and if a < 0, then (aX ≤ x) = (X ≥ x/a) ∈ ℱ for all x ∈ R. Thus, aX is a random variable. We now show that X + Y is a random variable. Consider (X + Y > z), z ∈ R. If ω is in this set, then X(ω) > z − Y(ω) and there is a rational number r_j such that X(ω) > r_j > z − Y(ω). The converse is also true. Therefore,

(X + Y > z) = ∪_j ((X > r_j) ∩ (Y > z − r_j)) ∈ ℱ,

and therefore (X + Y ≤ z) ∈ ℱ. To show that XY is a random variable, we first show that X² is a random variable. If x < 0, then (X² ≤ x) = ∅ ∈ ℱ; if x ≥ 0, then (X² ≤ x) = (−√x ≤ X ≤ √x) ∈ ℱ, as was to be proved. Since XY = (1/4)((X + Y)² − (X − Y)²), XY is a random variable by the previous steps. Since (|X| ≤ x) = ∅ for x < 0 and (|X| ≤ x) = (−x ≤ X ≤ x) for x ≥ 0, |X| is a random variable. ■
It follows from this theorem that if n is a positive integer, X is a random variable, and a₀, …, a_n are constants, then p(X) = a₀Xⁿ + a₁Xⁿ⁻¹ + ⋯ + a_n is a random variable; i.e., a polynomial function of a random variable is again a random variable. This result can be extended to continuous functions. That is, if φ : R → R is a continuous function and X is a random variable, then φ(X) is a random variable. Likewise, if φ is a continuous function of n variables x₁, …, x_n and X₁, …, X_n are random variables, then φ(X₁, …, X_n) is a random variable. There is, however, trouble lurking beyond this point. In the case of a discrete random variable X, φ(X) is a random variable for any function φ : R → R. This fact need not be true for nondiscrete random variables. But since we will have no need to go beyond continuous functions of random variables, we will leave this matter where it belongs; namely, in a graduate course in real analysis.
One of the central problems we will take up has to do with finding the distribution function of Y = φ(X) knowing the distribution function of X.

EXAMPLE 6.4  Let X be a random variable having the distribution function

F_X(x) = ∫_{−∞}^x f(t) dt

where

f(t) = 1  if 0 < t < 1
     = 0  otherwise.

Then

F_X(x) = 0  if x < 0
       = x  if 0 ≤ x < 1
       = 1  if x ≥ 1.

If Y = X², what is the distribution function of Y? If y < 0, then F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(∅) = 0. If 0 ≤ y < 1, then F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} f(t) dt = ∫_0^{√y} dt = √y. If y ≥ 1, then F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = ∫_{−√y}^{√y} f(t) dt = 1. Therefore,

F_Y(y) = 0   if y < 0
       = √y  if 0 ≤ y < 1
       = 1   if y ≥ 1.

The graph of F_Y is shown in Figure 6.2. ■

FIGURE 6.2 Distribution function of Y = X².
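A Monte Carlo sketch of Example 6.4: sample X uniformly on (0, 1) and compare the empirical distribution of Y = X² with √y (the sample size and seed are arbitrary choices for the illustration):

```python
# Sketch: Monte Carlo check of Example 6.4. For X uniform on (0, 1)
# and Y = X^2, the distribution function is F_Y(y) = sqrt(y) on [0, 1].
import math
import random

random.seed(7)
n = 200_000
samples = [random.random() ** 2 for _ in range(n)]   # draws of Y = X^2

for y in (0.04, 0.25, 0.81):
    empirical = sum(s <= y for s in samples) / n     # fraction with Y <= y
    print(y, empirical, math.sqrt(y))                # empirical close to sqrt(y)
```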
Consider a conceptual experiment in which a point is chosen at random from a region S in the plane; e.g., a region encompassed by a simple closed curve. "At random" should mean that the probabilities that the chosen point will be in congruent subregions of S should be the same and that the probability that the chosen point will be in disjoint subregions should be the sum of the probabilities of being in each. These criteria suggest that probabilities should be determined by areas; e.g., if A ⊂ S is the shaded region depicted in Figure 6.3, then the probability that the chosen point will be in A is given by

FIGURE 6.3 Geometric probabilities.
P(A) = |A|/|S|,

where |A| denotes the area of A. More generally, if S is a region in the n-dimensional space Rⁿ and A is a subregion, then P(A) is defined by the same equation, with |A| representing the n-dimensional volume of A. Probabilities defined in this way are called geometric probabilities.
EXAMPLE 6.5  Suppose a point is chosen at random from a region S in the plane consisting of points (x, y) with x² + y² ≤ 1; i.e., a point in a disk of radius 1 having center at (0,0). Let A be the set of points in a disk of radius 1/2 having the same center. In this case, |A| = π(1/2)² = π/4, |S| = π,
and

P(A) = |A|/|S| = (π/4)/π = 1/4. ■
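A Monte Carlo sketch of Example 6.5: points are drawn uniformly from the unit disk S by rejection from the enclosing square, and the fraction landing in A estimates |A|/|S| = 1/4 (sample size and seed are arbitrary choices):

```python
# Sketch: Monte Carlo version of Example 6.5. Choose points at random
# from the unit disk S and count how many land in the concentric disk A
# of radius 1/2; the fraction should approach |A|/|S| = 1/4.
import random

random.seed(1)
hits_S = hits_A = 0
while hits_S < 100_000:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:                 # point landed in S
        hits_S += 1
        if x * x + y * y <= 0.25:          # and also in A
            hits_A += 1

print(hits_A / hits_S)                     # approximately 0.25
```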
EXERCISES 6.2
1. An experiment consists of choosing a point at random from a disk D in the plane with center at (0,0) and radius 1. If X is the distance of the point from the origin, find the distribution function of X.
2. An experiment consists of selecting a point at random from a ball in
3-space with center at the origin and radius 1. If X is the distance from
the origin, find the distribution function of X.
3.
By choosing a point X at random from the interval [0,1], the line
segment [0,1] is broken into two line segments [0,X] and [X, 1]. What
is the probability that the length of the shorter segment will be less than
or equal to one-fourth of the length of the longer segment?
4. A point X is chosen at random from the interval [0,1]. What is the
probability that the roots of the equation 4y2 + 7Xy + 1 = 0 will be
real?
5. If

f(x) = 0      if x < −1
     = x + 1  if −1 ≤ x < 0
     = 1 − x  if 0 ≤ x < 1
     = 0      if x ≥ 1,

calculate F(x) = ∫_{−∞}^x f(t) dt for each real number x.
6. Consider a function g(x), x ∈ R. Find a constant c such that ∫_{−∞}^{+∞} f(t) dt = 1, where f(t) = cg(t). If X is a random variable having distribution function F_X(x) = ∫_{−∞}^x f(t) dt, calculate P(−1 < X < 1).
7. Let

f(t) = 1/2  if −1 < t < 1
     = 0    otherwise,

F(x) = ∫_{−∞}^x f(t) dt, x ∈ R, and X be a random variable having distribution function F. Calculate F(x) for each x ∈ R. If Y = X¹², what is the distribution function of Y?
8. Let X be a random variable having the distribution function Fx (x) =
J xf(t)dt where
*
f(t) =
ifO < x < 1
otherwise.
1
0
If Y = X3, what is the distribution function of Y?
9. If

   F(x) = 0             if x < 0
          x²            if 0 ≤ x < 1/2
          x − (1/4)     if 1/2 ≤ x < 5/4
          1             if x ≥ 5/4,

   find a function f(x), x ∈ R, such that F(x) = ∫_{−∞}^{x} f(t) dt for all x ∈ R.
10. If X : Ω → R, show that the following statements are equivalent.

    (a) (X ≤ x) ∈ ℱ for all x ∈ R.
    (b) (X < x) ∈ ℱ for all x ∈ R.
    (c) (X ≥ x) ∈ ℱ for all x ∈ R.
    (d) (X > x) ∈ ℱ for all x ∈ R.
6.3 DISTRIBUTION FUNCTIONS

Let (Ω, ℱ, P) be a probability space and let X be a random variable. The distribution function F_X of X was defined in the previous section and is given by

   F_X(x) = P(X ≤ x),    x ∈ R.

If the random variable X is known from context, the subscript X will be suppressed. Before looking at properties of distribution functions, we review some definitions from the calculus. Consider a function f : R → R and let a ∈ R.
1. If there is a number L with the property that for each ε > 0 there is a δ > 0 such that

   |f(x) − L| < ε whenever a < x < a + δ,

   we write lim_{x→a+} f(x) = L. L is usually denoted by f(a+).
2. If there is a number l with the property that for each ε > 0 there is a δ > 0 such that

   |f(x) − l| < ε whenever a − δ < x < a,

   we write lim_{x→a−} f(x) = l. l is usually denoted by f(a−).
3. If there is a number L with the property that for each ε > 0 there is an M ∈ R such that

   |f(x) − L| < ε whenever x > M,

   we write lim_{x→+∞} f(x) = L. L is usually denoted by f(+∞).
4. If there is a number l with the property that for each ε > 0 there is an m ∈ R such that

   |f(x) − l| < ε whenever x < m,

   we write lim_{x→−∞} f(x) = l. l is usually denoted by f(−∞).
Theorem 6.3.1 If F is a distribution function, then

(i) 0 ≤ F(x) ≤ 1 for all x ∈ R.
(ii) F(x) ≤ F(y) whenever x ≤ y.
(iii) F(−∞) = 0 and F(+∞) = 1.
(iv) F(x+) = lim_{y→x+} F(y) exists for each x ∈ R with F(x) = F(x+), and F(x−) = lim_{y→x−} F(y) exists for each x ∈ R.
(v) F is right-continuous at each x ∈ R; i.e., F(x) = F(x+) for all x ∈ R. In addition, F(x−) = P(X < x).
PROOF:

(i) Since F(x) is a probability, (i) is trivially true.

(ii) If x ≤ y, then (X ≤ x) ⊂ (X ≤ y) and F(x) = P(X ≤ x) ≤ P(X ≤ y) = F(y), so that (ii) is true.

(iii) We will prove only the second part of (iii), the proof of the first part being similar. We first prove that lim_{n→∞} F(n) = 1. Note that {(X ≤ n)}_{n=1}^{∞} is an increasing sequence of events. For any ω ∈ Ω, X(ω) is a real number and there is an n such that X(ω) ≤ n; i.e., Ω ⊂ ∪_{n=1}^{∞}(X ≤ n). Since the opposite relation is always true, the events (X ≤ n) increase to Ω. Therefore, lim_{n→∞} F(n) = lim_{n→∞} P(X ≤ n) = P(Ω) = 1 by Theorem 2.5.3. Thus, for each ε > 0 there is a positive integer N such that |F(N) − 1| < ε, which
implies that 1 − ε < F(N) ≤ 1. If x ≥ N, then 1 − ε < F(N) ≤ F(x) ≤ 1 < 1 + ε, and so |F(x) − 1| < ε whenever x ≥ N. This proves that F(+∞) = lim_{x→+∞} F(x) = 1.
(iv) Fix x ∈ R. Since the sequence of events {(X ≤ x + (1/n))}_{n=1}^{∞} is a decreasing sequence and (X ≤ x) = ∩_{n=1}^{∞}(X ≤ x + (1/n)), F(x) = lim_{n→∞} F(x + (1/n)) by Theorem 2.5.3. Thus, for each ε > 0 there is a positive integer N such that |F(x) − F(x + (1/N))| < ε, which implies that F(x + (1/N)) < F(x) + ε. Suppose x < y < x + (1/N). Then

   F(x) − ε < F(x) ≤ F(y) ≤ F(x + (1/N)) < F(x) + ε;

i.e., |F(y) − F(x)| < ε whenever x < y < x + (1/N). This shows that F(x) = F(x+) = lim_{y→x+} F(y). In the second part of (iv), we can show only that the left limit lim_{y→x−} F(y) exists; it need not be equal to F(x). To show that the left limit at x exists, note that the sequence of events {(X ≤ x − (1/n))}_{n=1}^{∞} is an increasing sequence with ∪_{n=1}^{∞}(X ≤ x − (1/n)) = (X < x). By Theorem 2.5.3,

   lim_{n→∞} F(x − (1/n)) = lim_{n→∞} P(X ≤ x − (1/n)) = P(X < x).

Thus, given ε > 0, there is a positive integer N such that

   P(X < x) − ε < F(x − (1/N)) ≤ P(X < x).

Let δ = 1/N. Suppose x − δ = x − (1/N) < y < x. Let M be a positive integer such that y < x − (1/M) < x. Then

   P(X < x) − ε < F(x − (1/N)) ≤ F(y) ≤ F(x − (1/M)) ≤ P(X < x);

i.e., |F(y) − P(X < x)| < ε. We have shown that for each ε > 0 there is a δ > 0 such that |F(y) − P(X < x)| < ε whenever x − δ < y < x; i.e., we have shown not only that the left limit at x exists but also that F(x−) = lim_{y→x−} F(y) = P(X < x).
(v) Statement (v) is just a restatement of the first part of (iv). ■

Corollary 6.3.2 For each x ∈ R, P(X = x) = F(x+) − F(x−) = F(x) − F(x−).

PROOF: P(X = x) = P((X ≤ x) ∩ (X < x)ᶜ) = P(X ≤ x) − P(X < x) = F(x) − F(x−). ■
Since P(X < x) ≤ P(X ≤ x), we always have

   F(x−) ≤ F(x) = F(x+),

but there may not be equality on the left. Note that F is continuous at x if and only if F(x−) = F(x) and that F can have jump discontinuities only as depicted in Figure 6.4, the magnitude of the jump at x being F(x) − F(x−).
How large is the set of points of discontinuity of F?
Theorem 6.3.3 The set of points of discontinuity of a distribution function F is at most countable.

PROOF: For each n ≥ 1, let D_n = {x : F(x) − F(x−) ≥ 1/n}. Each D_n is empty or finite, because otherwise the sum of the jumps of F at points in D_n would exceed 1, which cannot happen since the total increase in F is 1. Since D = ∪_{n=1}^{∞} D_n is the set of points at which F has a jump discontinuity, D is at most countable, by Theorem 2.3.1. ■
EXAMPLE 6.6 Consider a random variable X with distribution function

   F(x) = 0                 if x < 1
          (1/4)(x − 1)      if 1 ≤ x < 2
          1/2               if 2 ≤ x < 4
          (1/2)(x − 3)      if 4 ≤ x < 5
          1                 if x ≥ 5.

Clearly, F(2−) = 1/4 and F(2) = F(2+) = 1/2. F is not continuous at 2, and P(X = 2) = 1/4. ■
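Corollary 6.3.2 can be seen at work on Example 6.6 numerically. The sketch below (our own helpers; the small step h approximating the left limit is an implementation choice) evaluates F and its jump at 2.

```python
def F(x):
    """Distribution function of Example 6.6."""
    if x < 1:
        return 0.0
    if x < 2:
        return 0.25 * (x - 1)
    if x < 4:
        return 0.5
    if x < 5:
        return 0.5 * (x - 3)
    return 1.0

def left_limit(F, x, h=1e-9):
    """Approximate F(x-) by evaluating F just to the left of x."""
    return F(x - h)

# P(X = 2) = F(2) - F(2-) by Corollary 6.3.2
jump_at_2 = F(2) - left_limit(F, 2)
print(jump_at_2)  # about 1/4
```

The jump 1/2 − 1/4 = 1/4 agrees with P(X = 2) in the example.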
Theorem 6.3.4 Given a function F : R → R with properties (i)–(v) in Theorem 6.3.1, there is a probability space (Ω, ℱ, P) and a random variable X having F as its distribution function.

Sketch of Proof: Take Ω = R and take ℱ to be the smallest σ-algebra containing all intervals of real numbers. For each ω ∈ Ω, let X(ω) = ω. Since (X ≤ a) = {ω : X(ω) ≤ a} = {ω : ω ≤ a} = (−∞, a] ∈ ℱ, X is a random variable. For an interval (a, b], define P((a, b]) = F(b) − F(a). The function P can then be extended to all events in ℱ. Thus, F_X(x) = P(X ≤ x) = P((−∞, x]) = lim_{n→−∞} P((n, x]) = lim_{n→−∞}(F(x) − F(n)) = F(x). ■

FIGURE 6.4 Jump discontinuity.
If f : R → R is nonnegative and Riemann integrable on every interval of real numbers with ∫_{−∞}^{+∞} f(t) dt = 1, then Theorem 6.3.4 applies to the function

   F(x) = ∫_{−∞}^{x} f(t) dt.

Thus, there is a probability space (Ω, ℱ, P) and a random variable X having F as its distribution function. The function f is called a density function for F and for X. Is the converse true? That is, given a random variable X with distribution function F, is there a nonnegative Riemann integrable function f : R → R such that the above equation holds? If there were such a function f, F would certainly have to be a continuous function, since the indefinite integral of f is a continuous function of its upper limit. In Example 6.6, the distribution function F is not continuous and consequently does not have a density function. A positive answer to the question just posed requires that F be at least continuous. But that is not enough to ensure that F has a density function. There is a criterion for determining if there is such a function.
Definition 6.3 The function F : R → R is absolutely continuous if for each ε > 0 there is a δ > 0 such that

   Σ_{i=1}^{n} |F(βᵢ) − F(αᵢ)| < ε

whenever n ≥ 1 and (α₁, β₁), …, (αₙ, βₙ) are nonoverlapping intervals with Σ_{i=1}^{n} |βᵢ − αᵢ| < δ. ■

If F is absolutely continuous, then F is continuous, as can be seen by taking n = 1 in this definition.
The following theorem settles the question asked above.
Theorem 6.3.5 Let F be a distribution function on R. Then F is the indefinite integral of a function f if and only if F is absolutely continuous.

The function f of this theorem can be identified, at most points of R, as the derivative F′(x) of F. We will not elaborate on what is meant by "most points" at this stage. Such matters, as well as the proof of this theorem, are best left to advanced analysis courses. It should also be noted that the function f is not unique. If g : R → R agrees with f except at a finite number of points (or even at countably many points), then F(x) = ∫_{−∞}^{x} g(t) dt also. In practice, the following fact can be used to establish that the distribution function F has a density function: if f(x) = F′(x) wherever the derivative is defined and ∫_{−∞}^{+∞} f(t) dt = 1, then f is a density function for F.
Consider a situation in which it is known that the random variable X has a density function f and we want to find a density function, if there is one, for the random variable Y = φ(X) where φ : R → R is continuous. The procedure for doing this is best illustrated by means of examples.
EXAMPLE 6.7 Suppose the random variable X has the density

   f(x) = e^{−x}    if x ≥ 0
          0         if x < 0.

If Y = X², what is the density of Y? We first calculate the distribution function of X. For x < 0, F_X(x) = 0; for x ≥ 0, F_X(x) = ∫_0^x e^{−t} dt = 1 − e^{−x}. Thus,

   F_X(x) = 0           if x < 0
            1 − e^{−x}  if x ≥ 0.

Let G be the distribution function of Y. For y < 0, G(y) = P(Y ≤ y) = 0. Suppose y ≥ 0. Then G(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y). Therefore,

   G(y) = 0                        if y < 0
          F_X(√y) − F_X(−√y)       if y ≥ 0.

Since −√y ≤ 0 for y ≥ 0, F_X(−√y) = 0 and

   G(y) = 0              if y < 0
          1 − e^{−√y}    if y ≥ 0.

The density g(y) = G′(y) is therefore given by

   g(y) = 0                       if y < 0
          (1/(2√y)) e^{−√y}       if y > 0. ■

EXAMPLE 6.8 Suppose the random variable X has the density

   f(x) = 1    if 0 < x < 1
          0    otherwise

and let Y = √X. Then

   F_X(x) = 0    if x < 0
            x    if 0 ≤ x < 1
            1    if x ≥ 1.
We can assume that X takes on values in [0,1]. The same will be true of Y. Therefore, if G is the distribution function of Y, then G(y) = 0 if y < 0 and G(y) = 1 if y > 1. Suppose 0 ≤ y ≤ 1. Then G(y) = P(Y ≤ y) = P(0 ≤ √X ≤ y) = P(0 ≤ X ≤ y²) = F_X(y²) − F_X(0) = F_X(y²) = y². Hence,

   G(y) = 0     if y < 0
          y²    if 0 ≤ y ≤ 1
          1     if y > 1.

The density g(y) = G′(y) is therefore given by

   g(y) = 2y    if 0 < y < 1
          0     otherwise. ■
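Example 6.8 can be checked by simulation. The sketch below (our own names and sample size) draws X uniformly on (0, 1), forms Y = √X, and compares the empirical distribution function of Y with the computed G(y) = y².

```python
import random

# Empirical check of Example 6.8: if X is uniform on (0,1) and Y = sqrt(X),
# then G(y) = P(Y <= y) should equal y**2 on [0,1].
rng = random.Random(7)
sample = [rng.random() ** 0.5 for _ in range(100_000)]  # draws of Y = sqrt(X)

def G_hat(y, sample=sample):
    """Empirical distribution function of Y."""
    return sum(v <= y for v in sample) / len(sample)

for y in (0.3, 0.6, 0.9):
    print(y, G_hat(y), y ** 2)   # the two right-hand columns nearly agree
```

The agreement illustrates the change-of-variables step P(√X ≤ y) = P(X ≤ y²).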
In both of these examples, the distribution function of Y = φ(X) is obtained by converting P(Y ≤ y) into a probability statement about X using the properties of the function φ.
EXAMPLE 6.9 (Cauchy Density) A source of light is mounted on one of two parallel walls, which are a unit distance apart as depicted in Figure 6.5. An angle Θ is chosen at random from the interval (−π/2, π/2), measured from the perpendicular to the wall at the source, and a light beam is cast in that direction. The density function of Θ is then

   f_Θ(θ) = 1/π    if −π/2 < θ < π/2
            0      otherwise,

and the distribution function F_Θ(θ) is given by

   F_Θ(θ) = 0                     if θ < −π/2
            (1/π)(θ + (π/2))      if −π/2 ≤ θ < π/2
            1                     if θ ≥ π/2.
FIGURE 6.6 X = tan Θ.

Let X be the directed distance from the nearest point on the opposite wall to the point where the light beam hits the opposite wall. Then X = tan Θ. Since Θ can take on values between −π/2 and π/2, X can take on values between −∞ and +∞. For −∞ < x < +∞,

   F_X(x) = P(X ≤ x) = P(tan Θ ≤ x).

To determine P(tan Θ ≤ x), we must convert the statement tan Θ ≤ x into a statement about Θ. Since arctan x is an increasing function on (−∞, +∞), Θ = arctan(tan Θ) ≤ arctan x whenever tan Θ ≤ x, and conversely. Thus, P(tan Θ ≤ x) = P(Θ ≤ arctan x), but we should keep in mind that Θ takes on values between −π/2 and π/2, so that

   F_X(x) = P(tan Θ ≤ x) = P(−π/2 < Θ ≤ arctan x)
          = F_Θ(arctan x) − F_Θ(−π/2)
          = (1/π) arctan x + 1/2.

Therefore,

   f_X(x) = F_X′(x) = 1/(π(1 + x²)),    −∞ < x < +∞.    (6.3)

Equation 6.3 can be obtained by looking at the graph of the tangent function in Figure 6.6; for tan Θ ≤ x, Θ must be in the interval from −π/2 to arctan x,
and according to the definition of "random,"

   F_X(x) = P(tan Θ ≤ x) = P(−π/2 < Θ ≤ arctan x) = (arctan x − (−π/2))/π,

so that

   f_X(x) = F_X′(x) = 1/(π(1 + x²)),    −∞ < x < +∞.

This function is known as the Cauchy density. ■
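The light-beam construction of Example 6.9 translates directly into a simulator. The sketch below (our own names and sample size) draws Θ uniformly on (−π/2, π/2), sets X = tan Θ, and compares the empirical distribution of X with F_X(x) = (1/π) arctan x + 1/2.

```python
import math
import random

# Simulation of Example 6.9: theta uniform on (-pi/2, pi/2), X = tan(theta).
rng = random.Random(3)
xs = [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(100_000)]

def F_cauchy(x):
    """Distribution function of the Cauchy density, Equation 6.3 integrated."""
    return math.atan(x) / math.pi + 0.5

for x in (-2.0, 0.0, 1.0):
    empirical = sum(v <= x for v in xs) / len(xs)
    print(x, empirical, F_cauchy(x))   # the two columns nearly agree
```

By symmetry F_cauchy(0) = 1/2, and F_cauchy(1) = 3/4 since arctan 1 = π/4.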
EXERCISES 6.3

1. Calculate F′(x) for the distribution function of Example 6.6 and verify that F is not the indefinite integral of F′.
2. Let X be a random variable with distribution function

   F(x) = 0                if x < 0
          x²/4             if 0 ≤ x < 1
          1/2              if 1 ≤ x < 2
          (1/2)(x − 1)     if 2 ≤ x < 3
          1                if x ≥ 3.

   Calculate
   (a) P(0 < X < 1).
   (b) P(0 < X ≤ 1).
   (c) P(X = 1).
   (d) P(1/2 < X ≤ 5/2).
3. Let X be a random variable having density function

   f(x) = 1/π    if −π/2 < x < π/2
          0      otherwise

   and let Y = sin X. Find a density function for Y.
4. Let X be a random variable with distribution function

   F(x) = 0           if x < 0
          1 − e^{−x}  if x ≥ 0.

   If M > 0, let Y = min(X, M). Determine the distribution function of Y. Does Y have a density function?
5. Let X be a random variable with distribution function

   F(x) = 0           if x < 0
          1 − e^{−x}  if x ≥ 0

   and let Y = √X. Since P(X ≥ 0) = 1, √X is defined. What is the density function of Y?
6. Let X be a random variable having a density function

   f_X(x) = e^{−x}    if x ≥ 0
            0         if x < 0

   and let Y = log X. What is the density of Y?
7. Consider a searchlight that is mounted on a wall. An angle Θ is chosen at random between −π/2 and π/2, as measured from a perpendicular to the wall, and the light beam falls on an object 100 units away. If X denotes the distance of the object from the wall, what are the distribution and density functions of X?
6.4 JOINT DISTRIBUTION FUNCTIONS

It is not unusual in experimental situations to consider two or more numerical attributes of an outcome simultaneously. For the time being, we will consider only two attributes.

Let X, Y be two random variables on the probability space (Ω, ℱ, P). Then P(X ≤ x, Y ≤ y) is meaningful and defines a function F_{X,Y} of two real variables.

Definition 6.4 If x, y ∈ R, the function of two real variables

   F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)

is called the joint distribution function of the pair (X, Y). ■
EXAMPLE 6.10 Suppose a point is chosen at random from the unit square U in the plane with opposite vertices at (0,0) and (1,1). Clearly, we should take Ω = U, and "random" should mean that the probabilities that the outcome will be in two congruent regions in Ω will be the same. This suggests that the probability that the outcome will be in a region A within Ω should be equal to the area of the region divided by the area of the unit square, which is 1. Thus, P(A) = Area(A). If ω = (x, y) ∈ Ω, let X(ω) = x and let Y(ω) = y. Then X and Y should be random variables with

   F_{X,Y}(x, y) = 0     if x < 0 or y < 0
                   x     if 0 ≤ x ≤ 1, y > 1
                   y     if x > 1, 0 ≤ y ≤ 1
                   xy    if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
                   1     if x > 1, y > 1.

As an example of the computation of F_{X,Y}(x, y), suppose 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Then F_{X,Y}(x, y) is the area of the shaded rectangle with opposite vertices at (0,0) and (x, y) as depicted in Figure 6.7. Since the area is xy, F_{X,Y}(x, y) = xy. ■
FIGURE 6.7 F_{X,Y}(x, y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
Joint distribution functions have properties similar to those listed in Theo­
rem 6.3.1 for a distribution function of a single random variable.
Theorem 6.4.1 If F is a joint distribution function, then

(i) 0 ≤ F(x, y) ≤ 1 for all (x, y) ∈ R².
(ii) If (x₁, y₁) and (x₂, y₂) are any two points in R² with x₁ ≤ x₂ and y₁ ≤ y₂, then F(x₂, y₂) − F(x₁, y₂) − F(x₂, y₁) + F(x₁, y₁) ≥ 0.
(iii) lim_{x→+∞, y→+∞} F(x, y) = 1; lim_{x→−∞, y→−∞} F(x, y) = 0; for each y ∈ R, lim_{x→−∞} F(x, y) = 0; and for each x ∈ R, lim_{y→−∞} F(x, y) = 0.
(iv) For each (a, b) ∈ R², lim_{x→a+, y→b+} F(x, y) = F(a, b), lim_{x→a+} F(x, b) = F(a, b), and lim_{y→b+} F(a, y) = F(a, b).
(v) For each (a, b) ∈ R², lim_{x→a−} F(x, b) and lim_{y→b−} F(a, y) exist.
The inequality of (ii) is easily reconstructed using Figure 6.8 by associating with each of the vertices a + or − sign, starting with a + at the upper right vertex and alternating signs, and then applying the signs to the value of F at the corresponding point. Except for (ii), proofs of these statements are similar to the proofs of the statements in Theorem 6.3.1.
Proof of (ii): Let (x₁, y₁) and (x₂, y₂) be two arbitrary points in the plane with x₁ ≤ x₂ and y₁ ≤ y₂. Then

   F(x₂, y₂) − F(x₁, y₂) − F(x₂, y₁) + F(x₁, y₁) ≥ 0.    (6.4)

This inequality follows from the fact that

   0 ≤ P(x₁ < X ≤ x₂, y₁ < Y ≤ y₂)
     = P((X ≤ x₂) ∩ (y₁ < Y ≤ y₂) ∩ (X ≤ x₁)ᶜ)
     = P(X ≤ x₂, y₁ < Y ≤ y₂) − P(X ≤ x₁, y₁ < Y ≤ y₂)
     = P((X ≤ x₂) ∩ (Y ≤ y₂) ∩ (Y ≤ y₁)ᶜ) − P((X ≤ x₁) ∩ (Y ≤ y₂) ∩ (Y ≤ y₁)ᶜ)
     = P(X ≤ x₂, Y ≤ y₂) − P(X ≤ x₂, Y ≤ y₁) − P(X ≤ x₁, Y ≤ y₂) + P(X ≤ x₁, Y ≤ y₁)
     = F_{X,Y}(x₂, y₂) − F_{X,Y}(x₂, y₁) − F_{X,Y}(x₁, y₂) + F_{X,Y}(x₁, y₁),

since (X ≤ x₂) ∩ (Y ≤ y₂) ∩ (Y ≤ y₁) = (X ≤ x₂) ∩ (Y ≤ y₁), and so forth. ■
In the case of a single random variable X, Theorem 6.3.5 gives a necessary
and sufficient condition that Fx have a density function; namely, that Fx
be absolutely continuous. This concept can be extended to joint distribution
functions, but it is best left to more advanced courses.
Definition 6.5 A nonnegative Riemann integrable function f : R² → R is a density function for the joint distribution function F if

   F(x, y) = ∬_{A_{x,y}} f(u, v) da    for all x, y ∈ R,

where A_{x,y} = {(u, v) : u ≤ x, v ≤ y} and da denotes integration with respect to area. ■

In practice, the preceding double integral over A_{x,y} is calculated using iterated integrals, since

   ∬_{A_{x,y}} f(u, v) da = ∫_{−∞}^{x} ( ∫_{−∞}^{y} f(u, v) dv ) du = ∫_{−∞}^{y} ( ∫_{−∞}^{x} f(u, v) du ) dv.

We will assume in the remainder of this section that all joint distribution functions have density functions.

FIGURE 6.8 Alternating signs.
Most distribution functions come about by starting with a nonnegative Riemann integrable function f : R² → R with total integral 1 and constructing a probability space (Ω, ℱ, P) and a pair of random variables (X, Y) such that F_{X,Y} has density function f by imitating the construction of the previous section as follows. Let Ω = R², let ℱ be the smallest σ-algebra containing all rectangles in R², and for ω = (x, y) ∈ Ω let X(ω) = x, Y(ω) = y. Then X and Y are random variables. For any rectangle I ⊂ R², define

   P(I) = ∬_I f(u, v) da,

noting that including or excluding the edges of I has no effect on the value of the double integral. The probability function P can then be extended to all events in ℱ. Since (X ≤ x, Y ≤ y) = {ω : X(ω) ≤ x, Y(ω) ≤ y} = {(u, v) : u ≤ x, v ≤ y},

   F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = ∬_{A_{x,y}} f(u, v) da,

where A_{x,y} = {(u, v) : u ≤ x, v ≤ y}. It follows that the pair (X, Y) has f as density function. It can be shown that for A ∈ ℱ,

   P((X, Y) ∈ A) = ∬_A f(u, v) da.

Without getting involved deeply in integration theory, for computational purposes we must limit the class of regions A for which this probability can be calculated. For example, if a < b, φ₁, φ₂ are continuous functions on [a, b] with φ₁(x) ≤ φ₂(x), x ∈ [a, b], and A = {(u, v) : a ≤ u ≤ b, φ₁(u) ≤ v ≤ φ₂(u)}, then

   P((X, Y) ∈ A) = ∬_A f(u, v) da = ∫_a^b ( ∫_{φ₁(u)}^{φ₂(u)} f(u, v) dv ) du.
EXAMPLE 6.11 Let X and Y be random variables with joint density function

   f_{X,Y}(x, y) = 1    if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
                   0    otherwise.

Suppose we want to calculate P(X² + Y² ≤ 1). We must first define a region A ⊂ R² such that (X² + Y² ≤ 1) = ((X, Y) ∈ A). This is done by formally replacing X and Y by typical values u and v, respectively. In this case, we let

   A = {(u, v) : u² + v² ≤ 1},

as shown in Figure 6.9. It is easy to see that (X(ω), Y(ω)) ∈ A if and only if X²(ω) + Y²(ω) ≤ 1. Therefore,

   P(X² + Y² ≤ 1) = P((X, Y) ∈ A) = ∬_A f_{X,Y}(u, v) da.

Recalling that the integrand vanishes outside the unit square U and is equal to 1 inside, the integral is equal to Area(A ∩ U) = π/4. Thus, P(X² + Y² ≤ 1) = π/4. ■

FIGURE 6.9 P(X² + Y² ≤ 1).
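The double integral in Example 6.11 can also be approximated directly. The sketch below (our own helper; the grid size and integration limits are implementation choices) evaluates ∬_A f da by a midpoint Riemann sum over a grid, with an indicator function selecting the region A.

```python
import math

def prob_region(f, indicator, a, b, n=400):
    """Approximate P((X, Y) in A) = double integral of f over A by a
    midpoint Riemann sum on the square [a, b] x [a, b]; `indicator`
    tests membership in A."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * h
        for j in range(n):
            v = a + (j + 0.5) * h
            if indicator(u, v):
                total += f(u, v) * h * h
    return total

# Uniform joint density on the unit square, region A = quarter disk.
f_unit_square = lambda u, v: 1.0 if 0 <= u <= 1 and 0 <= v <= 1 else 0.0
p = prob_region(f_unit_square, lambda u, v: u * u + v * v <= 1, 0.0, 1.0)
print(p, math.pi / 4)   # the two values nearly agree
```

Refining the grid (larger n) shrinks the discrepancy with π/4.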
Knowing the joint distribution function or density function of the random variables X and Y, the corresponding individual distribution or density functions can be obtained.
Theorem 6.4.2
IfX andY have the joint distribution function Fx,y> then
(i)
Fx(x) = limy_+ooFx,y(x,y).
(ii)
Fy(y) = limx_+ooFx,y(x,y).
In addition, ifX and Y have a joint density fx,y> then
(Hi) fxM = IZfx.ytx^dv,
x G R,and
(iv) fdy} = ^fx.y^yWu,
y&R,
are densities for X and Y, respectively.
PROOF: The first step in proving (i) is to show that

   F_X(x) = lim_{n→+∞} F_{X,Y}(x, n).

Since the sequence of events {(X ≤ x, Y ≤ n)} increases to the event (X ≤ x) as n → +∞ for each x ∈ R,

   lim_{n→+∞} F_{X,Y}(x, n) = lim_{n→+∞} P(X ≤ x, Y ≤ n) = P(X ≤ x) = F_X(x).

The rest of the proof of (i) is the same as the corresponding part of the proof of (iii) in Theorem 6.3.1. To prove (iii), let

   g(x) = ∫_{−∞}^{+∞} f_{X,Y}(x, v) dv.

Since

   F_X(x) = P(X ≤ x, −∞ < Y < +∞) = ∫_{−∞}^{x} ( ∫_{−∞}^{+∞} f_{X,Y}(u, v) dv ) du,

it follows that g is a density for X. ■

When f_X and f_Y are obtained in this way, they are called marginal density functions.
EXAMPLE 6.12 Suppose a point is chosen at random from a disk D with center (0,0) and radius 1. Let X be the x-coordinate of the chosen point and Y the y-coordinate. The joint density f_{X,Y} is then

   f_{X,Y}(u, v) = 1/π    if u² + v² ≤ 1
                   0      otherwise.

The density of X is then given by f_X(x) = ∫_{−∞}^{+∞} f_{X,Y}(x, v) dv, −∞ < x < +∞. To evaluate the integral, it is necessary to consider the cases x < −1, −1 ≤ x ≤ 1, and x > 1 separately. When x < −1, the function f_{X,Y}(x, v) vanishes on the vertical line through x on the u-axis, and so f_X(x) = 0. The same is true when x > 1. When −1 ≤ x ≤ 1, f_{X,Y}(x, ·) is equal to 1/π on the line segment joining (x, −√(1 − x²)) to (x, √(1 − x²)) as depicted in Figure 6.10 and equal to 0 at other points of the vertical line through x, so that

   f_X(x) = ∫_{−√(1−x²)}^{√(1−x²)} (1/π) dv = (2/π)√(1 − x²).

FIGURE 6.10 Marginal density.

Therefore,

   f_X(x) = (2/π)√(1 − x²)    if −1 ≤ x ≤ 1
            0                 otherwise. ■
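The marginal computation in Example 6.12 reduces to a one-dimensional integral over a vertical line, which can be done numerically. The sketch below (our own helper names and grid size) integrates the joint density of the disk over v and compares with (2/π)√(1 − x²).

```python
import math

def f_joint(u, v):
    """Joint density of a point chosen at random from the unit disk."""
    return 1.0 / math.pi if u * u + v * v <= 1 else 0.0

def f_X_numeric(x, n=20_000):
    """Marginal density f_X(x) = integral of f_joint(x, v) dv over v,
    approximated by a midpoint rule on [-1, 1]."""
    a, b = -1.0, 1.0
    h = (b - a) / n
    return sum(f_joint(x, a + (i + 0.5) * h) for i in range(n)) * h

x = 0.5
print(f_X_numeric(x), 2 * math.sqrt(1 - x * x) / math.pi)
```

At x = 0.5 both values are near 0.551, the length of the chord times 1/π.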
This example illustrates a technique for finding the density function of a random variable X. The consideration of a second random variable Y can result in the determination of the joint density f_{X,Y}, from which f_X can be obtained as above.
Suppose now that we are given the joint density function of two random variables X and Y and we would like to find the density f_Z of the sum Z = X + Y, assuming there is such a density.

Theorem 6.4.3 Let X and Y be random variables having a joint density f_{X,Y}, and let Z = X + Y. The density of Z exists and is given by

   f_Z(z) = ∫_{−∞}^{+∞} f_{X,Y}(u, z − u) du = ∫_{−∞}^{+∞} f_{X,Y}(z − v, v) dv,    z ∈ R.    (6.5)

PROOF: Consider first the distribution function F_Z. Since F_Z(z) = P(Z ≤ z) = P(X + Y ≤ z),

   F_Z(z) = ∬_{{u+v≤z}} f_{X,Y}(u, v) da = ∫_{−∞}^{+∞} ( ∫_{−∞}^{z−u} f_{X,Y}(u, v) dv ) du.

With u fixed, let w = v + u in the inner integral to obtain

   F_Z(z) = ∫_{−∞}^{+∞} ( ∫_{−∞}^{z} f_{X,Y}(u, w − u) dw ) du = ∫_{−∞}^{z} ( ∫_{−∞}^{+∞} f_{X,Y}(u, w − u) du ) dw.

If we let

   g(z) = ∫_{−∞}^{+∞} f_{X,Y}(x, z − x) dx = ∫_{−∞}^{+∞} f_{X,Y}(u, z − u) du,

then F_Z(z) = ∫_{−∞}^{z} g(w) dw. Therefore, g is a density for the random variable Z. ■
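Equation 6.5 can be evaluated numerically for any joint density. The sketch below (our own helper; the integration limits and grid size are implementation choices) applies it to the uniform joint density on the unit square from Example 6.11, for which the sum X + Y has the triangular density z on [0, 1] and 2 − z on [1, 2].

```python
def f_Z_numeric(f_joint, z, lo=-10.0, hi=10.0, n=20_000):
    """Numerical version of Equation 6.5:
    f_Z(z) is approximately the sum of f_joint(u, z - u) * du (midpoint rule)."""
    du = (hi - lo) / n
    return sum(f_joint(lo + (i + 0.5) * du, z - (lo + (i + 0.5) * du))
               for i in range(n)) * du

# Uniform joint density on the unit square.
f_square = lambda u, v: 1.0 if 0 <= u <= 1 and 0 <= v <= 1 else 0.0
print(f_Z_numeric(f_square, 0.5), f_Z_numeric(f_square, 1.5))  # both near 0.5
```

At z = 0.5 the integrand is 1 exactly for u in [0, 0.5], giving 0.5, and similarly at z = 1.5.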
The formula for the density of Z in Theorem 6.4.3 takes on a more usable form in the case of independent random variables.

Definition 6.6 The random variables X and Y are independent if

   P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y)

for all x, y ∈ R. ■

This definition is clearly equivalent to the requirement that

   F_{X,Y}(x, y) = F_X(x) F_Y(y)

for all x, y ∈ R.

If X and Y are independent random variables and φ and ψ are continuous real-valued functions on R, then φ(X) and ψ(Y) are also independent random variables. As in the discrete case, the proof of this fact will be omitted.
If X and Y have a joint density f_{X,Y}, then we know that X has a density f_X and Y has a density f_Y.

Theorem 6.4.4 Let X and Y be random variables having a joint density. Then X and Y are independent if and only if f_X(x) f_Y(y) is a joint density for X and Y.

PROOF: Suppose f_X(x) f_Y(y) is a joint density for X and Y. Then

   P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ( ∫_{−∞}^{y} f_X(u) f_Y(v) dv ) du
                   = ∫_{−∞}^{x} f_X(u) ( ∫_{−∞}^{y} f_Y(v) dv ) du
                   = ∫_{−∞}^{x} f_X(u) P(Y ≤ y) du
                   = P(Y ≤ y) ∫_{−∞}^{x} f_X(u) du
                   = P(Y ≤ y) P(X ≤ x),

and therefore X and Y are independent random variables. On the other hand, if X and Y are independent, then

   F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)
                 = P(X ≤ x) P(Y ≤ y)
                 = ( ∫_{−∞}^{x} f_X(u) du ) ( ∫_{−∞}^{y} f_Y(v) dv )
                 = ∫_{−∞}^{x} ∫_{−∞}^{y} f_X(u) f_Y(v) dv du,

so that f_X(x) f_Y(y) is a joint density for X and Y. ■
Theorem 6.4.5 If the random variables X and Y are independent with densities f_X and f_Y, respectively, and Z = X + Y, then Z has the density

   f_Z(z) = ∫_{−∞}^{+∞} f_X(x) f_Y(z − x) dx = ∫_{−∞}^{+∞} f_X(z − y) f_Y(y) dy,    z ∈ R.

PROOF: Theorem 6.4.3. ■

The above equations take on simpler forms if, in addition, X and Y are nonnegative random variables. In this case,

   f_Z(z) = ∫_0^z f_X(x) f_Y(z − x) dx    if z ≥ 0
            0                             if z < 0,    (6.6)

with a similar result holding for integration with respect to y instead of x. Since Z is nonnegative, f_Z(z) = 0 for z < 0. For z ≥ 0, the integral over (−∞, +∞) can be replaced by the integral over (0, +∞) since f_X(x) = 0 for x < 0; since f_Y(z − x) = 0 when x > z, the integral over (0, +∞) can be replaced by the integral over (0, z).

Caveat: In real life, independence of random variables is the exception rather than the rule.
EXAMPLE 6.13 Let X and Y have the joint density

   f_{X,Y}(x, y) = e^{−x−y}    if x ≥ 0, y ≥ 0
                   0           otherwise.

Since the joint density vanishes outside the first quadrant, the pair (X, Y) is in the first quadrant with probability 1 and outside with probability 0. Thus, f_X(x) = 0 if x < 0 and f_Y(y) = 0 if y < 0. Suppose x ≥ 0. Then

   f_X(x) = ∫_{−∞}^{+∞} f_{X,Y}(x, v) dv = ∫_0^{+∞} e^{−x−v} dv = e^{−x}.

Thus,

   f_X(x) = e^{−x}    if x ≥ 0
            0         if x < 0.

Similarly,

   f_Y(y) = e^{−y}    if y ≥ 0
            0         if y < 0.

Since f_{X,Y}(x, y) = f_X(x) f_Y(y), the random variables X and Y are independent. Now let Z = X + Y. Again f_Z(z) = 0 if z < 0. For z ≥ 0,

   f_Z(z) = ∫_{−∞}^{+∞} f_X(x) f_Y(z − x) dx = ∫_0^{+∞} e^{−x} f_Y(z − x) dx.

Since f_Y(z − x) = 0 when z − x < 0,

   f_Z(z) = ∫_0^z e^{−x} e^{−(z−x)} dx = z e^{−z}.

Therefore,

   f_Z(z) = z e^{−z}    if z ≥ 0
            0           if z < 0. ■
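The conclusion of Example 6.13 can be checked by simulation. Integrating f_Z(z) = z e^{−z} gives F_Z(z) = 1 − e^{−z}(1 + z), and the sketch below (our own names and sample size; exponential variates are generated by inverting the distribution function) compares that with the empirical distribution of a sum of two simulated exponentials.

```python
import math
import random

# Simulation check of Example 6.13: for independent exponentials X and Y
# (parameter 1), Z = X + Y should satisfy P(Z <= z) = 1 - e^{-z}(1 + z).
rng = random.Random(11)
# -log(1 - U) is exponential with parameter 1 when U is uniform on [0, 1).
zs = [-math.log(1.0 - rng.random()) - math.log(1.0 - rng.random())
      for _ in range(100_000)]

z = 1.0
empirical = sum(v <= z for v in zs) / len(zs)
exact = 1 - math.exp(-z) * (1 + z)
print(empirical, exact)   # both near 0.264
```

The agreement reflects Theorem 6.4.5 applied to two unit exponentials.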
EXERCISES 6.4

1. Let X and Y be independent random variables having density functions

   f_X(x) = 1    if 0 < x < 1
            0    otherwise

   f_Y(y) = 1    if 0 < y < 1
            0    otherwise.

   Calculate P(Y ≤ X).
2. Let X and Y be independent random variables having density functions

   f_X(x) = 2e^{−2x}    if x ≥ 0
            0           if x < 0

   f_Y(y) = 3e^{−3y}    if y ≥ 0
            0           if y < 0.

   Calculate P(Y < X).
3. Let

   f_{X,Y}(x, y) = c(1 − x² − y²)    if 0 ≤ x² + y² ≤ 1, x ≥ 0, y ≥ 0
                   0                 otherwise

   be a density function. Find the value of c and calculate P(X ≥ 1/2).
4. Let

   f_{X,Y}(x, y) = xe^{…}    if x ≥ 0, y ≥ 0
                   0          otherwise.

   Find f_X(x) and f_Y(y).
5. Let X and Y be independent random variables with densities

   f_X(x) = 1    if 0 < x < 1
            0    otherwise

   f_Y(y) = …    if y > 0
            0    if y < 0.

   What is the density of Z = X + Y?
6. Let X and Y be the random variables of Problem 2 and let Z = min(X, Y). Find the density of Z.

7. Let X and Y be the random variables of Problem 2 and let Z = X + Y. Find the density of Z.

8. Let X and Y be the random variables of Problem 1 and let Z = X + Y. Find the density of Z. (Hint: Calculate the distribution function by considering the intersection of the region {(u, v) : v ≤ z − u} with the unit square U in the uv-plane.)
6.5 COMPUTATIONS WITH DENSITIES

We begin this section by cataloging several common density functions, special cases of which were seen in the previous section.
EXAMPLE 6.14 A density commonly used for many board games that employ a spinner is the uniform density on [a, b] defined by

   f(x) = 1/(b − a)    if a ≤ x ≤ b
          0            otherwise. ■
EXAMPLE 6.15 The continuous version of the discrete geometric density is the exponential density with parameter λ > 0 defined by

   f(x) = λe^{−λx}    if x ≥ 0
          0           if x < 0. ■

Waiting times connected with continuously varying random processes sometimes have an exponential density.
One of the best known density functions is the normal density (or Gauss density or Laplace density). Consider the function e^{−x²/2} for x ≥ 0 and let c = ∫_0^{+∞} e^{−x²/2} dx. The constant c can be determined indirectly by calculating

   c² = ∬_A e^{−(x²+y²)/2} da,

where A = {(x, y) : x ≥ 0, y ≥ 0} is the first quadrant in the plane. Transforming to polar coordinates,

   c² = ∫_0^{π/2} ( ∫_0^{+∞} e^{−r²/2} r dr ) dθ = π/2.

Thus, c = ∫_0^{+∞} e^{−x²/2} dx = √(π/2), and so ∫_{−∞}^{+∞} e^{−x²/2} dx = √(2π).

EXAMPLE 6.16 If we define

   φ(x) = (1/√(2π)) e^{−x²/2},    x ∈ R,    (6.7)

then φ can serve as the density function of a random variable. The density φ is called the standard normal density. The graph of the standard normal density is depicted in Figure 6.11. ■
FIGURE 6.11 Standard normal density.
The corresponding distribution function defined by Φ(x) = ∫_{−∞}^{x} φ(t) dt, x ∈ R, is called the standard normal distribution function. Values of Φ(x) have been calculated using Maple V software and are given in the Standard Normal Distribution Function table (see p. 346). Φ(x) can be determined for negative values of x from this table by using the fact that φ is a symmetric function, so that Φ(x) = 1 − Φ(−x). For example, Φ(−.75) = 1 − Φ(.75) = .2266.
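Today Φ need not be read from a table: it can be written in terms of the error function available in the Python standard library. The sketch below is a convenience of ours, not something the book uses, but it reproduces the tabulated values.

```python
import math

def Phi(x):
    """Standard normal distribution function, expressed via the error
    function: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(Phi(0.75), 4))    # table value 0.7734
print(round(Phi(-0.75), 4))   # 1 - Phi(0.75) = 0.2266
```

The symmetry relation Φ(x) = 1 − Φ(−x) holds exactly for this formula.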
If X is any random variable with density f_X and Y = aX + b, where a, b ∈ R and a ≠ 0, then f_Y can be expressed in terms of f_X as follows. Suppose first that a > 0. Since F_Y(y) = P(Y ≤ y) = P(X ≤ (y − b)/a) = F_X((y − b)/a), f_Y(y) = (1/a) f_X((y − b)/a). If a < 0, then F_Y(y) = P(X ≥ (y − b)/a) = 1 − F_X((y − b)/a) and f_Y(y) = (−1/a) f_X((y − b)/a). Since −a = |a| when a < 0, the two cases can be combined by writing

   f_Y(y) = (1/|a|) f_X((y − b)/a).    (6.8)
EXAMPLE 6.17 Consider a random variable X having a standard normal density φ(x) and Y = σX + μ where μ, σ ∈ R, σ > 0. Then f_Y(y) = (1/σ)φ((y − μ)/σ), y ∈ R. Therefore,

   f_Y(y) = (1/(√(2π) σ)) e^{−(y−μ)²/2σ²},    y ∈ R.

A random variable Y having this density is said to have a normal density n(μ, σ²) with parameters μ and σ, called the mean and standard deviation, respectively. The latter terms will be justified later. For the time being, μ and σ are just parameters. ■
If X and Y are independent random variables having normal densities, what can we say about the density of the sum?

Theorem 6.5.1 Let X and Y be independent random variables having normal densities n(μ_X, σ_X²) and n(μ_Y, σ_Y²), respectively. Then Z = X + Y has a normal density n(μ_X + μ_Y, σ_X² + σ_Y²).

Sketch of Proof: By Theorem 6.4.3,

   f_Z(z) = ∫_{−∞}^{+∞} (1/(√(2π) σ_X)) e^{−(x−μ_X)²/2σ_X²} (1/(√(2π) σ_Y)) e^{−(z−x−μ_Y)²/2σ_Y²} dx.

The next step is to combine the two exponents and then complete the square on x. The result is that a factor of 1/(√(2π) √(σ_X² + σ_Y²)) along with an exponential function of a quadratic in z can be taken outside the integral, leaving the total integral of a normal density, which is 1. Although the algebra is tedious, readers should carry out these steps at least once in their lifetime. ■
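Short of doing the algebra, Theorem 6.5.1 can be checked by simulation. The sketch below (parameters and sample size are our own choices) adds draws from n(1, 2²) and n(2, 3²) and estimates the mean and variance of the sum, which should be near 3 and 2² + 3² = 13.

```python
import random

# Simulation check of Theorem 6.5.1: the sum of independent n(1, 4) and
# n(2, 9) variables should behave like n(3, 13).
rng = random.Random(5)
zs = [rng.gauss(1, 2) + rng.gauss(2, 3) for _ in range(200_000)]

mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
print(mean, var)   # near 3 and 13
```

Note that the variances add, not the standard deviations, exactly as the theorem states.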
If the random variable Y has a normal density with parameters μ and σ, then probabilities of the type P(a < Y ≤ b) can be expressed in terms of Φ as follows. Since Y = σX + μ, where X has a standard normal density,

   P(a < Y ≤ b) = P(a < σX + μ ≤ b) = P((a − μ)/σ < X ≤ (b − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ).

EXAMPLE 6.18 Suppose the random variable X has a normal density with parameters μ = 100 and σ = 10. According to the Standard Normal Distribution Function table (see p. 346) and the fact that Φ(x) = 1 − Φ(−x), x ∈ R,

   P(75 < X ≤ 125) = Φ((125 − 100)/10) − Φ((75 − 100)/10) = Φ(2.5) − Φ(−2.5)
                   = Φ(2.5) − (1 − Φ(2.5)) = 2Φ(2.5) − 1 = .9876. ■
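The standardization step in Example 6.18 is mechanical enough to package as a function. The sketch below (our own helper, using the error-function form of Φ rather than the book's table) reproduces the value .9876.

```python
import math

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_prob(a, b, mu, sigma):
    """P(a < Y <= b) for Y having a normal density n(mu, sigma**2)."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

print(round(normal_prob(75, 125, 100, 10), 4))   # 2*Phi(2.5) - 1 = 0.9876
```

The call standardizes both endpoints and differences the two Φ values, exactly as in the displayed derivation.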
If X is any random variable with density f_X and Y = X², then the density of Y can be obtained as follows. We first express F_Y in terms of F_X. Since Y is nonnegative, F_Y(y) = 0 if y < 0. Suppose y > 0. Then F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y). Since the derivative of the latter expression is (1/(2√y))(F_X′(√y) + F_X′(−√y)),

   f_Y(y) = 0                                    if y < 0
            (1/(2√y))(f_X(√y) + f_X(−√y))        if y > 0.    (6.9)

EXAMPLE 6.19 Let X have a standard normal density and let Y = X². Applying the formula above,

   f_Y(y) = 0                       if y < 0
            (1/√(2πy)) e^{−y/2}     if y > 0. ■    (6.10)
The last density is a special case of a family of density functions having the form x^{α−1} e^{−λx} for positive x, except for a multiplicative constant, where α and λ are positive parameters. To determine the multiplicative constant, we must evaluate the integral

   ∫_0^{+∞} x^{α−1} e^{−λx} dx.

Letting y = λx, this integral becomes λ^{−α} ∫_0^{+∞} y^{α−1} e^{−y} dy. Putting aside the evaluation of the latter integral for the time being, for each α > 0 let

   Γ(α) = ∫_0^{+∞} y^{α−1} e^{−y} dy.

Then

   ∫_0^{+∞} x^{α−1} e^{−λx} dx = Γ(α)/λ^α.

The reciprocal of this constant is therefore the required multiplicative constant.

EXAMPLE 6.20 A random variable having the density Γ(α, λ) defined by

   Γ(α, λ)(x) = 0                                 if x < 0
                (λ^α/Γ(α)) x^{α−1} e^{−λx}        if x ≥ 0    (6.11)

is called a gamma density with parameters α and λ. ■
Returning to Γ(α) as a function of α, the recurrence relation

   Γ(α + 1) = αΓ(α),    α > 0,    (6.12)

can be obtained by applying integration by parts to the integral

   Γ(α + 1) = ∫_0^{+∞} x^{(α+1)−1} e^{−x} dx.

Since Γ(1) = ∫_0^{+∞} e^{−y} dy = 1, it is easy to show by an induction argument that for every positive integer n,

   Γ(n + 1) = n!.

Since the density given in Equation 6.10 is a Γ(1/2, 1/2) density, it follows that the multiplicative constants in Equations 6.10 and 6.11 must be equal in this case. Therefore,

   Γ(1/2) = √π.
6
CONTINUOUS RANDOM VARIABLES
From this result, Γ(α) can be calculated using Equation 6.12 for any α > 0 that is an odd multiple of 1/2. For example,

Γ(5/2) = (3/2)Γ(3/2) = (3/2)(1/2)Γ(1/2) = (3/4)√π.
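The recurrence (6.12), together with Γ(1/2) = √π, determines Γ at every odd multiple of 1/2. A short sketch (not part of the text) comparing this recursion with Python's built-in gamma function:

```python
import math

def gamma_half_integer(k: int) -> float:
    """Gamma(k + 1/2) computed from Gamma(1/2) = sqrt(pi) via Gamma(a+1) = a*Gamma(a)."""
    value = math.sqrt(math.pi)  # Gamma(1/2)
    a = 0.5
    for _ in range(k):
        value *= a  # apply the recurrence Gamma(a+1) = a * Gamma(a)
        a += 1.0
    return value

# Gamma(5/2) = (3/2)(1/2)Gamma(1/2) = (3/4)sqrt(pi)
print(gamma_half_integer(2), 0.75 * math.sqrt(math.pi))
```

The two printed values agree to machine precision with `math.gamma(2.5)`.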
EXAMPLE 6.21 Let X have a normal density n(0, σ²) and let Y = X². Then f_Y(y) = 0 if y ≤ 0. Suppose y > 0. Then

f_Y(y) = (1/(σ√(2πy))) e^{−y/(2σ²)}.

Since √π = Γ(1/2),

f_Y(y) = { ((1/(2σ²))^{1/2}/Γ(1/2)) y^{(1/2)−1} e^{−y/(2σ²)} if y > 0; 0 if y ≤ 0.

It follows that Y has a Γ(1/2, 1/(2σ²)) density. ■
The exponential density is a special case of the gamma density with α = 1 and λ = λ. According to Example 6.13, if X and Y are independent and have exponential densities with the same parameter λ = 1, then Z = X + Y has a Γ(2, 1) density. This suggests the following theorem.
Theorem 6.5.2 Let X and Y be independent random variables having gamma densities Γ(α₁, λ) and Γ(α₂, λ), respectively. If Z = X + Y, then Z has a Γ(α₁ + α₂, λ) density.

PROOF: By Equation 6.6, for z > 0,

f_Z(z) = ∫₀^z (λ^{α₁}/Γ(α₁)) x^{α₁−1} e^{−λx} (λ^{α₂}/Γ(α₂)) (z − x)^{α₂−1} e^{−λ(z−x)} dx
= (λ^{α₁+α₂}/(Γ(α₁)Γ(α₂))) e^{−λz} ∫₀^z x^{α₁−1}(z − x)^{α₂−1} dx.

Making the substitution y = x/z in the integral,

∫₀^z x^{α₁−1}(z − x)^{α₂−1} dx = z^{α₁+α₂−1} ∫₀^1 y^{α₁−1}(1 − y)^{α₂−1} dy.

The latter integral is a constant that will be denoted by B(α₁, α₂). Therefore,

f_Z(z) = { (λ^{α₁+α₂}/(Γ(α₁)Γ(α₂))) B(α₁, α₂) z^{α₁+α₂−1} e^{−λz} if z > 0; 0 if z ≤ 0.
FIGURE 6.12 The bell curve.
Disregarding the fractional constant in the description of f_Z(z), this function has the form of a gamma density and therefore must be a Γ(α₁ + α₂, λ) density. This concludes the proof. ■
The fact that the above function is a gamma density permits us to determine the constant B(α₁, α₂). Since the function is a gamma density Γ(α₁ + α₂, λ), we must have

B(α₁, α₂)/(Γ(α₁)Γ(α₂)) = 1/Γ(α₁ + α₂),

and therefore

B(α₁, α₂) = ∫₀^1 x^{α₁−1}(1 − x)^{α₂−1} dx = Γ(α₁)Γ(α₂)/Γ(α₁ + α₂).

The reader should observe the remarkable fact that we have evaluated a family of integrals using probability methods.
EXAMPLE 6.22 Taking α₁ = α₂ = 1/2,

∫₀^1 x^{−1/2}(1 − x)^{−1/2} dx = Γ(1/2)Γ(1/2)/Γ(1) = π. ■
Returning to the latter part of the above proof, consider the function

β(α₁, α₂)(x) = { (Γ(α₁ + α₂)/(Γ(α₁)Γ(α₂))) x^{α₁−1}(1 − x)^{α₂−1} if 0 < x < 1; 0 otherwise.

Since this function is nonnegative and the multiplicative constant has been chosen so that its total integral is 1, it is a density function called the beta density with parameters α₁ and α₂.
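The identity B(α₁, α₂) = Γ(α₁)Γ(α₂)/Γ(α₁ + α₂) can be verified numerically. A minimal sketch, not from the text (the midpoint rule and the sample parameters are illustrative choices):

```python
import math

def beta_integral(a1: float, a2: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of B(a1, a2) = integral_0^1 x^(a1-1)(1-x)^(a2-1) dx."""
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** (a1 - 1) * (1 - (i + 0.5) * h) ** (a2 - 1)
               for i in range(n)) * h

a1, a2 = 3.0, 2.0
numeric = beta_integral(a1, a2)
exact = math.gamma(a1) * math.gamma(a2) / math.gamma(a1 + a2)  # = 2!*1!/4! = 1/12
print(numeric, exact)
```

For α₁ = 3, α₂ = 2 the integrand is a polynomial, so the quadrature is very accurate.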
A final remark about the graph of the standard normal density shown in Figure 6.11. The standard normal density is shown there using the same unit on both axes. The usual practice is to use a much larger unit on the y-axis, which produces the distorted view of the standard normal density seen in Figure 6.12. Figure 6.11 shows that the normal density is spread out more uniformly than the distorted graph suggests. The distorted graph is commonly called the “bell curve” and is a favorite of book cover designers.
EXERCISES 6.5

1. If the random variable X has a normal density n(μ, σ²) and Y = aX + b, what are the form and parameters of the density of Y?

2. A random variable X has a normal density n(μ, σ²) and it is known that P(X ≤ 100) = .9938 and P(X ≤ 60) = .9332. Find μ and σ.

3. Let X and Y be independent random variables with standard normal densities. Assuming that X² and Y² are independent, find the density of Z = X² + Y².

4. Let X and Y be independent random variables having standard normal densities. Calculate the probability that the pair (X, Y) will be between the lines through the origin making angles π/6 and π/4 radians with the x-axis and also in the first quadrant.

5. A point in the plane is chosen in such a way that its x-coordinate X is n(0, σ²), its y-coordinate Y is n(0, σ²), and the two are independent. Find the density of the distance Z = √(X² + Y²) of the point from the origin. The density of Z is called the Rayleigh density.

6. If the random variable X has a uniform density on the interval [a, b], a < b, find a function φ such that Y = φ(X) has a uniform density on [0, 1].

7. Let X have a standard normal density n(0, 1). Find a function φ such that Y = φ(X) has a uniform density on (0, 1). (Hint: Φ is strictly increasing and has an inverse function Φ⁻¹.)
MULTIVARIATE AND CONDITIONAL DENSITIES
Given discrete random variables X and Y, the conditional density of Y given X = x was defined in Section 4.5 to be the function

f_Y|X(y | x) = P(Y = y | X = x).

This is not possible for continuous random variables, because if x is a point of continuity of F_X, then P(X = x) = 0 and the above conditional probability is not defined. In particular, if F_X is continuous at every point, then the conditional probability is not defined for any x. Even if P(X = x) > 0, P(Y = y | X = x) would still be equal to zero for most y, since P(Y = y) = 0
whenever F_Y is continuous at y. In the case of continuous random variables, it is necessary to avoid events of the type (Y = y) or (X = x).
To keep the discussion at the introductory level, we will assume that X and Y are random variables with a joint density function. Consider the conditional probability P(Y ≤ y | x ≤ X ≤ x + Δx). Assuming that the given event has positive probability,

P(Y ≤ y | x ≤ X ≤ x + Δx) = P(Y ≤ y, x ≤ X ≤ x + Δx)/P(x ≤ X ≤ x + Δx)
= ∫_{−∞}^y ∫_x^{x+Δx} f_X,Y(u, v) du dv / ∫_x^{x+Δx} f_X(u) du.

We might try defining the conditional distribution of Y given X = x by

F_Y|X(y | X = x) = lim_{Δx→0} P(Y ≤ y | x ≤ X ≤ x + Δx).

Assuming that the limit can be taken under the first integral sign,

F_Y|X(y | X = x) = ∫_{−∞}^y lim_{Δx→0} [ (1/Δx)∫_x^{x+Δx} f_X,Y(u, v) du / (1/Δx)∫_x^{x+Δx} f_X(u) du ] dv.

This suggests that the conditional density of Y given that X = x should be defined as

f_Y|X(y | x) = lim_{Δx→0} (1/Δx)∫_x^{x+Δx} f_X,Y(u, y) du / (1/Δx)∫_x^{x+Δx} f_X(u) du.

If the functions f_X and f_X,Y are continuous at x and (x, y), respectively, then the denominator and numerator will have limits f_X(x) and f_X,Y(x, y), respectively, since they represent the average values of these functions near the points of continuity. In this case, we would have

f_Y|X(y | x) = f_X,Y(x, y)/f_X(x),

provided f_X(x) > 0.

Definition 6.7 If X and Y are random variables having a joint density, the conditional density of Y given X is defined for x, y ∈ R by

f_Y|X(y | x) = { f_X,Y(x, y)/f_X(x) if f_X(x) > 0; 0 if f_X(x) = 0. ■
Note that the definition does not require any of the assumptions made in
the above heuristic argument. Note also that the equation
f_X,Y(x, y) = f_Y|X(y | x)f_X(x),  x, y ∈ R,

holds at all points x for which f_X(x) > 0.
As in the discrete case, experiments are sometimes defined in terms of
densities and conditional densities.
EXAMPLE 6.23 A point X is chosen at random from the interval [0, 1]. Given that X = x, a point Y is then chosen at random from the interval [0, x]. Suppose we are required to find the density of Y. The density of X is clearly uniform on [0, 1]; i.e.,

f_X(x) = { 1 if 0 ≤ x ≤ 1; 0 otherwise.

The information concerning Y is given in conditional form; namely,

f_Y|X(y | x) = { 1/x if 0 ≤ y ≤ x; 0 otherwise.

That is, given that X = x, Y is uniform on [0, x]. The joint density f_X,Y(x, y) is given by

f_X,Y(x, y) = f_Y|X(y | x)f_X(x) = { 1/x if 0 ≤ y ≤ x, 0 < x ≤ 1; 0 otherwise.

Since Y takes on values in [0, 1], f_Y(y) = 0 if y < 0 or y > 1. Suppose 0 < y < 1. Then

f_Y(y) = ∫_{−∞}^{+∞} f_X,Y(x, y) dx = ∫_y^1 (1/x) dx = −ln y.

Thus,

f_Y(y) = { −ln y if 0 < y < 1; 0 otherwise. ■
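The density f_Y(y) = −ln y can be checked by simulating the two-stage experiment. A hedged sketch, not from the text (sample size and the evaluation point t = 1/2 are arbitrary choices):

```python
import math
import random

random.seed(1)

def sample_y() -> float:
    x = random.random()          # X uniform on [0, 1]
    return random.uniform(0, x)  # given X = x, Y uniform on [0, x]

n = 200_000
t = 0.5
empirical = sum(sample_y() <= t for _ in range(n)) / n
# F_Y(t) = integral_0^t (-ln y) dy = t - t*ln(t)
exact = t - t * math.log(t)
print(empirical, exact)  # both near 0.8466
```

The empirical distribution function agrees with t − t ln t to within Monte Carlo error.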
EXAMPLE 6.24 To assess the reliability of a component of a system, fatigue or wear and tear must be taken into consideration. Suppose it is known that if a component has survived up to time t, then it will fail in a small time interval (t, t + Δt) with probability approximately proportional to the length Δt of the interval, where the constant of proportionality depends upon t; i.e., the conditional probability is approximately equal to β(t)Δt for some nonnegative function β(t) on (0, ∞). Let T denote the time at which the component will fail. We will use an intuitive argument to determine the distribution function of T from the given data β(t). According to the above assumption,

P(t < T ≤ t + Δt | T > t) ≈ β(t)Δt.
On the other hand, if we let F_T denote the distribution function of T, then

P(t < T ≤ t + Δt | T > t) = P(t < T ≤ t + Δt)/P(T > t) = (F_T(t + Δt) − F_T(t))/(1 − F_T(t)).

Assuming that F_T has a continuous density function f_T, by the mean value theorem of the integral calculus,

P(t < T ≤ t + Δt | T > t) ≈ f_T(t)Δt/(1 − F_T(t)).

Thus,

β(t) = lim_{Δt→0} (1/Δt) P(t < T ≤ t + Δt | T > t) = f_T(t)/(1 − F_T(t)).

Therefore,

β(t) = −(d/dt) ln(1 − F_T(t)).

Integrating from 0 to t and using the fact that F_T(0) = 0,

ln(1 − F_T(t)) = −∫₀^t β(s) ds,

so that

F_T(t) = 1 − e^{−∫₀^t β(s) ds}

and

f_T(t) = β(t) e^{−∫₀^t β(s) ds}.
Since any real-life component will eventually fail, we should require that lim_{t→+∞} F_T(t) = 1 and therefore that

∫₀^{+∞} β(s) ds = +∞. ■
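The passage from a failure rate β(t) to F_T(t) = 1 − e^{−∫₀ᵗ β(s) ds} can be illustrated numerically. A sketch, not from the text, assuming the linear rate β(s) = 2s (an illustrative choice), for which ∫₀ᵗ β(s) ds = t²:

```python
import math

def failure_cdf(beta, t: float, n: int = 100_000) -> float:
    """F_T(t) = 1 - exp(-integral_0^t beta(s) ds), integral by the midpoint rule."""
    h = t / n
    integral = sum(beta((i + 0.5) * h) for i in range(n)) * h
    return 1.0 - math.exp(-integral)

beta = lambda s: 2.0 * s   # linearly increasing (wear-out) failure rate
t = 1.5
print(failure_cdf(beta, t), 1.0 - math.exp(-t * t))
```

For this β the closed form is F_T(t) = 1 − e^{−t²}, and the two printed values coincide.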
EXAMPLE 6.25 Consider a system made up of two components that are connected in series with associated failure rates β₁(t) and β₂(t). Let T be the time of failure of the system and let T₁, T₂ be the times of failure of the two components. If the two components fail independently of each other, what is the failure rate β(t) of the system? Since the components are connected in series, T = min(T₁, T₂). By the assumed independence,

P(T > t) = P(min(T₁, T₂) > t)
= P(T₁ > t, T₂ > t)
= P(T₁ > t)P(T₂ > t)
= (1 − F_T₁(t))(1 − F_T₂(t)).

Thus,

1 − F_T(t) = e^{−∫₀^t β₁(s) ds} e^{−∫₀^t β₂(s) ds} = e^{−∫₀^t (β₁(s)+β₂(s)) ds}

and

F_T(t) = 1 − e^{−∫₀^t (β₁(s)+β₂(s)) ds},

and therefore β(t) = β₁(t) + β₂(t). ■
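For constant rates the component lifetimes are exponential, and the conclusion β(t) = β₁(t) + β₂(t) predicts E[T] = 1/(λ₁ + λ₂) for the series system. A simulation sketch, not from the text (the rates 1 and 3 are illustrative):

```python
import random

random.seed(2)
lam1, lam2 = 1.0, 3.0   # constant failure rates, i.e. exponential lifetimes
n = 200_000
# T = min(T1, T2); the system rate should be lam1 + lam2, so E[T] = 1/(lam1 + lam2)
mean_t = sum(min(random.expovariate(lam1), random.expovariate(lam2))
             for _ in range(n)) / n
print(mean_t)  # near 1/4
```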
We need not limit ourselves to just two random variables. If X₁, ..., X_n are n random variables, we can define the joint distribution function F_X₁,...,X_n by the equation

F_X₁,...,X_n(x₁, ..., x_n) = P(X₁ ≤ x₁, ..., X_n ≤ x_n).

The distribution function F_X₁,...,X_n and random variables X₁, ..., X_n are said to have the Riemann integrable function f_X₁,...,X_n : Rⁿ → R as joint density function if

P(a₁ < X₁ ≤ b₁, ..., a_n < X_n ≤ b_n) = ∫_{a₁}^{b₁} ··· ∫_{a_n}^{b_n} f_X₁,...,X_n(x₁, ..., x_n) dx_n ··· dx₁.
More generally, if A ⊂ Rⁿ and A is in the smallest σ-algebra of subsets of Rⁿ containing all n-dimensional rectangles in Rⁿ, then

P((X₁, ..., X_n) ∈ A) = ∫···∫_A f_X₁,...,X_n(x₁, ..., x_n) dV_n,

where dV_n denotes integration with respect to n-dimensional volume. To calculate a probability of the type P(g(X₁, ..., X_n) ≤ a), where g is some function of n variables, the probability is put into the form

P(g(X₁, ..., X_n) ≤ a) = P((X₁, ..., X_n) ∈ A) = ∫···∫_A f_X₁,...,X_n(x₁, ..., x_n) dV_n,

where A = {(x₁, ..., x_n) : g(x₁, ..., x_n) ≤ a}.
EXAMPLE 6.26 Consider the unit cube U ⊂ Rⁿ defined by

U = {(x₁, ..., x_n) : 0 ≤ x_i ≤ 1, i = 1, ..., n}.

An experiment consists of choosing a point at random from U. The latter statement means that the probability that the outcome will be in a region A, if A is in the σ-algebra described above, is taken to be the volume of A ∩ U, Vol(A ∩ U), divided by the volume of U, which is 1. If X₁, ..., X_n denote the first, second, ..., nth coordinates, respectively, of the chosen point, then

P((X₁, ..., X_n) ∈ A) = Vol(A ∩ U).

In particular, if we want to evaluate P(X₁ ≤ X₂ ≤ ··· ≤ X_n), we put A = {(x₁, ..., x_n) : x₁ ≤ x₂ ≤ ··· ≤ x_n}, so that

P(X₁ ≤ X₂ ≤ ··· ≤ X_n) = ∫···∫_{A∩U} 1 dV_n.

The multiple integral can be calculated using iterated integrals by fixing x₁, ..., x_{n−1} and integrating with respect to x_n between x_{n−1} and 1, then integrating with respect to x_{n−1} between x_{n−2} and 1, and so forth. Thus,

P(X₁ ≤ X₂ ≤ ··· ≤ X_n) = ∫₀^1 ··· (∫_{x_{n−2}}^1 (∫_{x_{n−1}}^1 dx_n) dx_{n−1}) ··· dx₁
= ∫₀^1 ··· (∫_{x_{n−2}}^1 (1 − x_{n−1}) dx_{n−1}) ··· dx₁
= ∫₀^1 ((1 − x₁)^{n−1}/(n − 1)!) dx₁ = 1/n!. ■
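The value P(X₁ ≤ X₂ ≤ ··· ≤ X_n) = 1/n! can be approximated by sampling points from the unit cube. A sketch, not from the text, for n = 3 (the sample size is arbitrary):

```python
import math
import random

random.seed(3)
n_dim, trials = 3, 300_000
hits = 0
for _ in range(trials):
    p = [random.random() for _ in range(n_dim)]  # point chosen at random from U
    if p[0] <= p[1] <= p[2]:                     # coordinates in increasing order
        hits += 1
print(hits / trials, 1 / math.factorial(n_dim))  # both near 1/6
```

By symmetry each of the 3! orderings is equally likely, which is what the estimate reflects.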
Independence of continuous random variables is defined as before; i.e., the random variables X₁, ..., X_n are independent if

P(X₁ ≤ x₁, ..., X_n ≤ x_n) = P(X₁ ≤ x₁) × ··· × P(X_n ≤ x_n)

for all x₁, ..., x_n ∈ R or, equivalently,

F_X₁,...,X_n(x₁, ..., x_n) = F_X₁(x₁) × ··· × F_X_n(x_n)

for all x₁, ..., x_n ∈ R. Assuming that X₁, ..., X_n have a joint density function f_X₁,...,X_n, each X_i has a density function f_X_i, i = 1, ..., n, and the random variables are independent if and only if

f_X₁,...,X_n(x₁, ..., x_n) = f_X₁(x₁) × ··· × f_X_n(x_n)

for all x₁, ..., x_n ∈ R; more precisely, if and only if f_X₁(x₁) × ··· × f_X_n(x_n) is a joint density for X₁, ..., X_n. As in the discrete case, if X₁, ..., X_n are independent random variables and φ₁, ..., φ_n are real-valued continuous functions on R, then the random variables φ₁(X₁), ..., φ_n(X_n) are independent random variables, as are the two random variables X₁ + ··· + X_{n−1} and X_n.
EXAMPLE 6.27 Let X, Y, and Z be independent random variables with each having a uniform density on [0, 1]. The joint density is the product of the three densities and is given by

f_X,Y,Z(x, y, z) = { 1 if 0 ≤ x, y, z ≤ 1; 0 otherwise.

Consider P(Z ≥ XY). This probability can be calculated by defining A = {(x, y, z) : z ≥ xy}, so that

P(Z ≥ XY) = ∫∫∫_A f_X,Y,Z(x, y, z) dV₃.

The region of integration consists of all points (x, y, z) ∈ R³ that lie above the surface z = xy. Since the integrand vanishes outside the unit cube, we can integrate over the region consisting of points (x, y, z) that are above the surface z = xy, below the surface z = 1, and above the unit square in the xy-plane. By fixing x and y with 0 ≤ x, y ≤ 1, we can integrate with respect to z from xy to 1, so that

P(Z ≥ XY) = ∫₀^1 ∫₀^1 (∫_{xy}^1 1 dz) dx dy
= ∫₀^1 ∫₀^1 (1 − xy) dx dy
= 3/4. ■
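P(Z ≥ XY) = 3/4 is easy to estimate by simulation. A sketch, not from the text (the sample size is arbitrary):

```python
import random

random.seed(4)
trials = 200_000
# Z, X, Y independent uniform on [0, 1]; count the event Z >= X*Y
hits = sum(random.random() >= random.random() * random.random()
           for _ in range(trials))
print(hits / trials)  # near 3/4
```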
The proofs of the following theorems will be reserved as exercises.

Theorem 6.6.1 Let X₁, ..., X_n be independent random variables. If X_i has a gamma density Γ(α_i, λ), i = 1, ..., n, then X₁ + ··· + X_n has the gamma density Γ(α₁ + ··· + α_n, λ).

Theorem 6.6.2 Let X₁, ..., X_n be independent random variables. If X_i has a normal density n(μ_i, σ_i²), then X₁ + ··· + X_n has an n(μ, σ²) density where μ = μ₁ + ··· + μ_n and σ² = σ₁² + ··· + σ_n².
Let X₁, ..., X_n be random variables having a joint density function f_X₁,...,X_n. If 1 ≤ m < n, define

f_X_{m+1},...,X_n|X₁,...,X_m(x_{m+1}, ..., x_n | x₁, ..., x_m) = f_X₁,...,X_n(x₁, ..., x_n)/f_X₁,...,X_m(x₁, ..., x_m),

provided the denominator is positive. As before,

f_X₁,...,X_n(x₁, ..., x_n) = f_X_{m+1},...,X_n|X₁,...,X_m(x_{m+1}, ..., x_n | x₁, ..., x_m) f_X₁,...,X_m(x₁, ..., x_m),

provided the second factor on the right is positive.
EXAMPLE 6.28 A point X is chosen at random from the interval [0, 1]. Given that X = x, a point Y is chosen at random from the interval [0, x]. Finally, given that Y = y, a point Z is chosen at random from the interval [0, y]. What is the density of Z? The given information specifies the following densities and conditional densities:

f_X(x) = { 1 if 0 ≤ x ≤ 1; 0 otherwise.

f_Y|X(y | x) = { 1/x if 0 < y ≤ x ≤ 1; 0 otherwise.

f_Z|X,Y(z | x, y) = f_Z|Y(z | y) = { 1/y if 0 < z ≤ y ≤ x ≤ 1; 0 otherwise.

Since

f_X,Y,Z(x, y, z) = f_Z|X,Y(z | x, y)f_X,Y(x, y) = f_Z|Y(z | y)f_Y|X(y | x)f_X(x)

provided 0 < z ≤ y ≤ x ≤ 1,

f_X,Y,Z(x, y, z) = { 1/(xy) if 0 < z ≤ y ≤ x ≤ 1; 0 otherwise.

For 0 < z ≤ 1,

f_Z(z) = ∫_z^1 (∫_y^1 f_X,Y,Z(x, y, z) dx) dy = ∫_z^1 (∫_y^1 (1/(xy)) dx) dy = (1/2)(ln z)²,

and f_Z(z) = 0 otherwise. ■
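The density (1/2)(ln z)² integrates to F_Z(z) = z((ln z)²/2 − ln z + 1), which a simulation of the three-stage experiment should reproduce. A sketch, not from the text (the evaluation point and sample size are arbitrary):

```python
import math
import random

random.seed(5)

def sample_z() -> float:
    x = random.random()          # X uniform on [0, 1]
    y = random.uniform(0, x)     # given X = x, Y uniform on [0, x]
    return random.uniform(0, y)  # given Y = y, Z uniform on [0, y]

trials = 200_000
z0 = 0.5
empirical = sum(sample_z() <= z0 for _ in range(trials)) / trials
# F_Z(z) = integral_0^z (1/2)(ln t)^2 dt = z*((ln z)^2/2 - ln z + 1)
exact = z0 * ((math.log(z0) ** 2) / 2 - math.log(z0) + 1)
print(empirical, exact)  # both near 0.967
```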
EXERCISES 6.6

1. Let X₁, ..., X_n be independent random variables each having an exponential density with parameter λ > 0. Find the density of Z = X₁ + ··· + X_n.

2. Let X₁, ..., X_n be independent random variables each having a standard normal density. Find the density of Z = √(X₁² + X₂² + ··· + X_n²).

3. Write out complete proofs of Theorems 6.6.1 and 6.6.2.

4. Consider a disk D with center at (0, 0) and radius 1. A point with x-coordinate X is chosen at random from the line segment joining (−1, 0) to (1, 0), and then a point is chosen at random from the line segment joining (x, −√(1 − x²)) to (x, √(1 − x²)). Let Y be the y-coordinate of the latter point. What is the density of Y?

5. Let X₁, X₂, X₃ be independent random variables each having an exponential density with the same parameter λ = 1. Calculate the probability P(X₁ < 2X₂ < 3X₃).

6. Random variables X₁, X₂, ..., X_n are defined as follows. A point X₁ is selected at random from the interval [0, 1]; given that X₁ = x₁, a point X₂ is selected at random from the interval [0, x₁]; and at the last step, given that X_{n−1} = x_{n−1}, a point X_n is selected at random from the interval [0, x_{n−1}]. What is the density of X_n?

7. Let X₁, ..., X_n be independent random variables each having a uniform density on [0, 1]. Let U = min(X₁, ..., X_n) and V = max(X₁, ..., X_n). Find the joint distribution function of U and V and verify that f_U,V(u, v) = n(n − 1)(v − u)^{n−2} for 0 ≤ u < v ≤ 1, and 0 otherwise, is the joint density of U and V.

8. Let X, Y, and Z be random variables with joint density

f_X,Y,Z(x, y, z) = { z²e^{−z(1+x+y)} if x ≥ 0, y ≥ 0, z ≥ 0; 0 otherwise.

Find f_X, f_Y, f_Z, f_X,Y, and f_X,Y|Z.

9. A system consists of two components operating in parallel with associated failure times T₁ and T₂ and failure rates β₁(t) and β₂(t), respectively. Let T be the time of failure of the system. Assuming that the two components fail independently of each other, what is the density of T?
7
EXPECTATION REVISITED
INTRODUCTION
Although this chapter discusses most of the concepts introduced in Chapter 4,
the introduction of the Riemann-Stieltjes integral makes it possible to formu­
late a definition of expected value that combines the discrete and continuous
cases. An additional condition, however, must be imposed to have an effective
means of calculating expected values using the calculus.
One of the hallmarks of probability theory is a classic theorem known as
the DeMoivre-Laplace limit theorem, which states that a sum of binomial
probabilities can be approximated by the integral of a function known as the
“bell-shaped” normal density. Although the proof of this result is tedious,
a complete proof is given with enough detail that a postcalculus student can
follow the arguments. Reading the proof at this stage is not essential for
learning probability.
The chapter concludes with applications to certain types of sequences of
random variables called stationary processes, which occur in filtering theory
and prediction theory. Processes of this type have their origin in the works of
G. U. Yule and E. Slutsky during the period 1920-1940.
RIEMANN-STIELTJES INTEGRAL
If X is a discrete random variable with range {x₁, x₂, ...} and density function f_X, the expected value of X was defined to be E[X] = Σ_j x_j f_X(x_j), provided
the series converges absolutely. This definition cannot be used for continuous
random variables for several reasons. What is needed is a formulation of
expected value that applies to continuous random variables and that reduces to
226
7.2
227
RIEMANN-STIELTJES INTEGRAL
the definition above in the case of discrete random variables. Fortunately, there
is a type of integral that can do both, called the Riemann-Stieltjes integral.
Let us quickly review the Riemann integral. Let [a, b] ⊂ R, a < b, be a finite interval and let φ : [a, b] → R. A collection of points {x₀, x₁, ..., x_n} with a = x₀ ≤ x₁ ≤ x₂ ≤ ··· ≤ x_n = b is called a partition of [a, b] and is denoted by π. The norm of the partition π is denoted by |π| and defined by |π| = max{x₁ − x₀, x₂ − x₁, ..., x_n − x_{n−1}}. For i = 1, ..., n, let ξ_i be any point of the interval [x_{i−1}, x_i] and let Δx_i = x_i − x_{i−1}. The Riemann integral of φ over [a, b] is then defined as

∫_a^b φ(t) dt = lim_{|π|→0} Σ_{i=1}^n φ(ξ_i) Δx_i,

provided the limit on the right exists. There is nothing sacred about using the weight Δx_i for the subinterval [x_{i−1}, x_i]. We could just as well use some other weighting scheme.
Let F be any distribution function on R. Using the same notation as above, we could replace Δx_i by ΔF_i = F(x_i) − F(x_{i−1}) and define an integral of φ over [a, b] with respect to F by

∫_a^b φ(t) dF(t) = lim_{|π|→0} Σ_{i=1}^n φ(ξ_i) ΔF_i,

provided the limit on the right exists. To be more precise about the existence of the limit, if π₁ and π₂ are any two partitions of [a, b], we say that π₂ is finer than π₁ if π₁ ⊂ π₂.
Definition 7.1 The function φ : [a, b] → R is Riemann-Stieltjes integrable over [a, b] with respect to F if there is a real number L such that for every ε > 0 there is a partition π_ε of [a, b] for which

|Σ_{i=1}^n φ(ξ_i) ΔF_i − L| < ε

for all partitions π finer than π_ε. The number L is denoted by

∫_a^b φ(t) dF(t) or ∫_a^b φ dF. ■
The Riemann-Stieltjes integral shares many of the properties of the Riemann
integral:
1. If φ₁, φ₂ are Riemann-Stieltjes integrable over [a, b] with respect to F and c₁, c₂ ∈ R, then c₁φ₁ + c₂φ₂ is Riemann-Stieltjes integrable over [a, b] with respect to F and

∫_a^b (c₁φ₁ + c₂φ₂) dF = c₁ ∫_a^b φ₁ dF + c₂ ∫_a^b φ₂ dF.

2. Let a < c < b and φ : [a, b] → R. If two of the following three integrals exist, then so does the third and

∫_a^b φ dF = ∫_a^c φ dF + ∫_c^b φ dF.

3. If a < b and φ is Riemann-Stieltjes integrable over [a, b] with respect to F, then

∫_b^a φ dF = −∫_a^b φ dF.

4. If φ is continuous on [a, b], then φ is Riemann-Stieltjes integrable over [a, b] with respect to F.

There is also an integration by parts formula that will be omitted because it will not be needed here. Proofs of the above statements can be found in the book by Apostol listed at the end of this chapter.
EXAMPLE 7.1 Let φ be continuous on [a, b] and let

F(x) = { 0 if x < c; 1 if x ≥ c,

where a < c < b. Then

∫_a^b φ(t) dF(t) = φ(c).

This can be seen as follows. Since φ is continuous at c, given any ε > 0 there is a δ > 0 such that

|φ(x) − φ(c)| < ε whenever |x − c| < δ, x ∈ [a, b].

Fix a partition π_ε of [a, b] with |π_ε| < δ. Consider any partition π = {x₀, x₁, ..., x_n} finer than π_ε. Then c ∈ [x_{i₀−1}, x_{i₀}] for some i₀ = 1, 2, ..., n. Since F(x_i) − F(x_{i−1}) = 0 for i ≠ i₀ and F(x_{i₀}) − F(x_{i₀−1}) = 1,

Σ_{i=1}^n φ(ξ_i) ΔF_i = φ(ξ_{i₀}),

where ξ_{i₀} ∈ [x_{i₀−1}, x_{i₀}]. Since |ξ_{i₀} − c| ≤ |π| ≤ |π_ε| < δ, |φ(ξ_{i₀}) − φ(c)| < ε, and therefore

|Σ_{i=1}^n φ(ξ_i) ΔF_i − φ(c)| = |φ(ξ_{i₀}) − φ(c)| < ε

for any partition π finer than π_ε. Thus, L = φ(c) satisfies the above definition and ∫_a^b φ(t) dF(t) = φ(c). ■
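The defining limit of the Riemann-Stieltjes integral can be imitated on a uniform partition. A sketch, not from the text, reproducing Example 7.1 numerically (φ(t) = t² and c = 1/2 are illustrative choices; the sum picks up φ only where F jumps):

```python
def rs_sum(phi, F, a: float, b: float, n: int = 100_000) -> float:
    """Riemann-Stieltjes sum: sum of phi(xi_i) * (F(x_i) - F(x_{i-1})) on a uniform partition."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x_prev, x_cur = a + (i - 1) * h, a + i * h
        # evaluate phi at the right endpoint of each subinterval
        total += phi(x_cur) * (F(x_cur) - F(x_prev))
    return total

c = 0.5
F = lambda x: 0.0 if x < c else 1.0   # distribution with a single jump at c
phi = lambda t: t * t
print(rs_sum(phi, F, 0.0, 1.0))  # converges to phi(c) = 0.25
```

Only the subinterval containing c contributes, exactly as in the argument above.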
Note that continuity of φ was required only at the point c in this example. The same type of argument can be used to show that if φ : [a, b] → R is continuous and there are a finite number of points c₁, c₂, ..., c_n with a < c₁ < c₂ < ··· < c_n < b such that F only increases by jumps at c₁, c₂, ..., c_n, then

∫_a^b φ(t) dF(t) = Σ_{i=1}^n φ(c_i)(F(c_i) − F(c_i−)).

In particular, if X is a discrete random variable with range {c₁, ..., c_n} and F = F_X, then P(X = c_i) = F_X(c_i) − F_X(c_i−) = f_X(c_i) and

E[X] = Σ_{i=1}^n c_i f_X(c_i) = ∫_a^b t dF_X(t).
Improper integrals of the type ∫_{−∞}^b φ dF and ∫_a^{+∞} φ dF can be defined in the usual way as the limits

lim_{a→−∞} ∫_a^b φ dF and lim_{b→+∞} ∫_a^b φ dF,

respectively, provided the limits exist in R. If both limits do exist, then ∫_{−∞}^{+∞} φ dF is defined by the equation

∫_{−∞}^{+∞} φ dF = ∫_{−∞}^a φ dF + ∫_a^{+∞} φ dF,  a ∈ R.

Definition 7.2 The improper integral ∫_{−∞}^{+∞} φ dF is absolutely convergent if the improper integral ∫_{−∞}^{+∞} |φ| dF is defined. ■
We can now formulate the definition of expected value of any random
variable.
Definition 7.3 The expected value E[X] of the random variable X is defined by

E[X] = ∫_{−∞}^{+∞} x dF_X(x),

provided the improper integral is absolutely convergent; i.e., ∫_{−∞}^{+∞} |x| dF_X(x) < +∞. ■

If X is a discrete random variable, this definition of E[X] agrees with the definition given in Section 4.2.
We have seen that if the distribution function F_X is absolutely continuous, then there is a density function f_X such that

F_X(x) = ∫_{−∞}^x f_X(t) dt,  x ∈ R,

and

F_X(b) − F_X(a) = ∫_a^b f_X(t) dt.

Suppose, in addition, that f_X is continuous on the interval [a, b]. Then by the mean value theorem of the integral calculus, there is a number c with a < c < b such that

F_X(b) − F_X(a) = ∫_a^b f_X(t) dt = f_X(c)(b − a).

Suppose now that the function φ : [a, b] → R is continuous on [a, b] and let π = {x₀, ..., x_n} be any partition of [a, b]. Then

Σ_{i=1}^n φ(ξ_i) ΔF_i = Σ_{i=1}^n φ(ξ_i) ∫_{x_{i−1}}^{x_i} f_X(t) dt = Σ_{i=1}^n φ(ξ_i) f_X(η_i) Δx_i,

where η_i ∈ [x_{i−1}, x_i], i = 1, ..., n. Then

∫_a^b φ(t) dF_X(t) = lim_{|π|→0} Σ_{i=1}^n φ(ξ_i) ΔF_i = lim_{|π|→0} Σ_{i=1}^n φ(ξ_i) f_X(η_i) Δx_i.

If the ξ_i and η_i were the same in the second sum, the latter limit would exist, since φ is Riemann-Stieltjes integrable with respect to F_X, and would be equal to ∫_a^b φ(t)f_X(t) dt; the same result is valid even if the ξ_i and η_i are not the same, by a result known as Duhamel's principle. Therefore, assuming that F_X has a density f_X that is continuous on [a, b] and φ is also continuous on [a, b],

∫_a^b φ(x) dF_X(x) = ∫_a^b φ(x)f_X(x) dx.

Theorem 7.2.1 Let X be a random variable with E[X] defined and having a continuous density. Then

E[X] = ∫_{−∞}^{+∞} x f_X(x) dx.
PROOF: Since ∫₀^b |x| f_X(x) dx = ∫₀^b |x| dF_X(x) for all b > 0, ∫₀^{+∞} |x| f_X(x) dx = ∫₀^{+∞} |x| dF_X(x) < +∞. Similarly, ∫_{−∞}^0 |x| f_X(x) dx = ∫_{−∞}^0 |x| dF_X(x) < +∞. Since the integrals ∫_{−∞}^0 x f_X(x) dx and ∫₀^{+∞} x f_X(x) dx are absolutely convergent,

E[X] = ∫_{−∞}^{+∞} x dF_X(x) = ∫_{−∞}^0 x dF_X(x) + ∫₀^{+∞} x dF_X(x)
= ∫_{−∞}^0 x f_X(x) dx + ∫₀^{+∞} x f_X(x) dx
= ∫_{−∞}^{+∞} x f_X(x) dx. ■
If Z = φ(X) and we want to calculate E[Z], we must first find the density function of Z, assuming there is one. The following theorem allows us to bypass this step by using f_X rather than f_Z as a weight function. The proof will be omitted; proofs require approximating the random variable X by a discrete random variable and applying Theorem 4.2.1.

Theorem 7.2.2 Let X be a random variable having a continuous density and let φ : R → R be a continuous function. If the integral ∫_{−∞}^{+∞} φ(x)f_X(x) dx converges absolutely, then E[φ(X)] is defined and

E[φ(X)] = ∫_{−∞}^{+∞} φ(x)f_X(x) dx.

EXAMPLE 7.2
Let X have a uniform density on [a, b], a < b. Then
E[X] = ∫_{−∞}^{+∞} x f_X(x) dx = ∫_a^b x/(b − a) dx = (a + b)/2.

Note that E[X] is the midpoint of the interval [a, b]. If we take φ(x) = x², then

E[X²] = ∫_{−∞}^{+∞} x² f_X(x) dx = ∫_a^b x²/(b − a) dx = (1/3)(b² + ab + a²). ■
To calculate E[φ(X)], the integral is set up by replacing X in the argument of φ by a typical value x, multiplying by the density of X, and integrating with respect to x.
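The two integrals in Example 7.2 can be checked by numerical integration against the closed forms (a + b)/2 and (b² + ab + a²)/3. A sketch, not from the text (the interval [2, 5] is an arbitrary choice):

```python
def expect_uniform(phi, a: float, b: float, n: int = 100_000) -> float:
    """E[phi(X)] = integral_a^b phi(x) / (b - a) dx, by the midpoint rule."""
    h = (b - a) / n
    return sum(phi(a + (i + 0.5) * h) for i in range(n)) * h / (b - a)

a, b = 2.0, 5.0
print(expect_uniform(lambda x: x, a, b), (a + b) / 2)                 # E[X] = 3.5
print(expect_uniform(lambda x: x * x, a, b), (b*b + a*b + a*a) / 3)   # E[X^2] = 13
```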
EXAMPLE 7.3 Let X be a random variable having a Γ(1/2, 1/2) density. Then

E[X²] = ∫₀^{+∞} x² Γ(1/2, 1/2)(x) dx = (1/√(2π)) ∫₀^{+∞} x² x^{(1/2)−1} e^{−x/2} dx = (1/√(2π)) ∫₀^{+∞} x^{(5/2)−1} e^{−x/2} dx.

Note that the integrand looks like a Γ(5/2, 1/2) density. It would be if it were multiplied by the constant

(1/2)^{5/2}/Γ(5/2) = 1/(3√(2π)).

Thus,

E[X²] = 3 ∫₀^{+∞} (1/(3√(2π))) x^{(5/2)−1} e^{−x/2} dx.

Since the Γ(5/2, 1/2) density is zero for x < 0, the last integral is the total integral of Γ(5/2, 1/2) over (−∞, +∞) and has value 1. Therefore, E[X²] = 3. ■
There are random variables for which E[X] is not defined according to the criterion in Definition 7.3.

EXAMPLE 7.4 Let X be a random variable having the Cauchy density

f_X(x) = (1/π) · 1/(1 + x²),  x ∈ R.

Since

∫₀^{+∞} (1/π) · x/(1 + x²) dx = +∞,

the integral

∫_{−∞}^{+∞} x (1/π) · 1/(1 + x²) dx

is not absolutely convergent, and therefore E[X] is not defined according to Definition 7.3. ■
If the random variable X takes on only nonnegative values with probability 1, then f_X(x) = 0 for x < 0 and the integral defining E[X] has the form ∫₀^{+∞} x f_X(x) dx. Absolute convergence and convergence are the same in this case. If the integral is not convergent, then, as in calculus, we write E[X] = ∫₀^{+∞} x f_X(x) dx = +∞. In addition, if T is a waiting time that takes on the value +∞ with positive probability, we then put E[T] = +∞.
EXERCISES 7.2

1. Let X be a random variable having a uniform density on [0, π]. Calculate E[sin X].

2. Let X be a random variable having the standard normal density φ(x) = (1/√(2π))e^{−x²/2}, −∞ < x < ∞. Calculate E[|X|].

3. Let X be a random variable having a uniform density on [0, 1]. Calculate E[min(X, 1/2)] and E[max(X, 1/2)].

4. Let X₁, ..., X_n be independent random variables each having a uniform density on [0, 1], let U = min(X₁, ..., X_n), and let V = max(X₁, ..., X_n). Calculate E[U], E[U²], E[V], and E[V²].

5. Let X be a random variable having an exponential density with parameter λ > 0. Calculate E[X] and E[X²], if defined.

6. Let X be a random variable having a gamma density Γ(α, λ) where α, λ > 0. If defined, calculate E[X^r] where r is a positive integer.

7. Let X be a random variable having a beta density β(α₁, α₂) where α₁, α₂ > 0. If defined, calculate E[X^r] where r is a positive integer.

8. Let F be a distribution function that increases only by jumps at the points c₁ < c₂ < ··· < c_m and let φ : R → R be continuous at c₁, ..., c_m. Prove that

∫_{−∞}^{+∞} φ(t) dF(t) = Σ_{i=1}^m φ(c_i)(F(c_i) − F(c_i−)).

9. Let X be a random variable having a density function f_X for which f_X(x) = 0 if x < 0. Assuming that E[X] is defined, show that

E[X] = ∫₀^{+∞} P(X > x) dx = ∫₀^{+∞} (1 − F_X(x)) dx.

(See Exercise 4.2.6.)
10. If X and Y are random variables having densities with Y ≥ X ≥ 0 with probability 1, show that E[Y] ≥ E[X] ≥ 0.
EXPECTATION AND CONDITIONAL EXPECTATION
The reason for defining the expected value of a random variable in terms of a
Riemann-Stieltjes integral in the previous section was to convince the reader
that there is a way of treating the discrete and continuous cases simultaneously
and also of treating random variables that are neither discrete nor continuous.
Operationally, the discrete case uses summation and the continuous case uses
integration.
Let X₁, ..., X_n be n random variables, let φ : Rⁿ → R be a real-valued continuous function of n variables, and let Z = φ(X₁, ..., X_n). Then the expected value E[Z] = ∫_{−∞}^{+∞} z dF_Z(z) is defined, provided the integral is absolutely convergent. The definition, however, requires the determination of F_Z knowing the probability characteristics of X₁, ..., X_n. The expected value of Z can be calculated using the following theorem.
Theorem 7.3.1 Let X₁, ..., X_n be n random variables having a joint density f_X₁,...,X_n and let φ be a real-valued continuous function of n variables. If Z = φ(X₁, ..., X_n), then

E[Z] = ∫_{Rⁿ} φ(x₁, ..., x_n) f_X₁,...,X_n(x₁, ..., x_n) dV_n,

where dV_n denotes integration with respect to volume, provided the integral converges absolutely.
In a slightly more advanced course dealing with probability measures, this
theorem is relatively easy to prove.
EXAMPLE 7.5 Let (X, Y) be the coordinates of a point chosen at random from the unit square U = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} in the plane. Then

f_X,Y(x, y) = { 1 if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise

and

E[XY] = ∫∫_{R²} xy f_X,Y(x, y) dA = ∫₀^1 (∫₀^1 xy dy) dx = 1/4.
In this example, note that the integral for calculating E[XY] is obtained by replacing X and Y by x and y, respectively, to form the integrand xy, multiplying by the joint density, and then integrating with respect to area.
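E[XY] = 1/4 for a point chosen at random from the unit square can be confirmed by simulation. A sketch, not from the text (the sample size is arbitrary):

```python
import random

random.seed(6)
trials = 200_000
# (X, Y) uniform on the unit square; average the product XY
mean_xy = sum(random.random() * random.random() for _ in range(trials)) / trials
print(mean_xy)  # near 1/4
```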
Granted Theorem 7.3.1, we can establish properties of the expected value.

Theorem 7.3.2 Let X and Y be random variables having a joint density with E[X] and E[Y] defined, and let a, b ∈ R.

(i) If P(X ≥ 0) = 1, then E[X] ≥ 0.

(ii) The expected value of aX + bY is defined, and E[aX + bY] = aE[X] + bE[Y].

(iii) If P(X ≥ Y) = 1, then E[X] ≥ E[Y].

(iv) |E[X]| ≤ E[|X|].

PROOF:

(i) If P(X ≥ 0) = 1, then f_X(x) = 0 for x < 0 and

E[X] = ∫_{−∞}^{+∞} x f_X(x) dx = ∫₀^{+∞} x f_X(x) dx ≥ 0.

(ii) Putting aside the question of whether or not E[aX + bY] is defined,

E[aX + bY] = ∫∫_{R²} (ax + by) f_X,Y(x, y) dA
= a ∫_{−∞}^{+∞} (∫_{−∞}^{+∞} x f_X,Y(x, y) dy) dx + b ∫_{−∞}^{+∞} (∫_{−∞}^{+∞} y f_X,Y(x, y) dx) dy
= a ∫_{−∞}^{+∞} x f_X(x) dx + b ∫_{−∞}^{+∞} y f_Y(y) dy
= aE[X] + bE[Y].

The steps involved in showing that E[aX + bY] is defined are similar, using the inequality |ax + by| ≤ |a||x| + |b||y|.

(iii) Since P(X ≥ Y) = 1, P(X − Y ≥ 0) = 1 and E[X] − E[Y] = E[X − Y] ≥ 0 by (ii) and (i).

(iv) Since −|X| ≤ X ≤ |X|, −E[|X|] ≤ E[X] ≤ E[|X|] by (iii), and therefore |E[X]| ≤ E[|X|]. ■
EXAMPLE 7.6 Let (X, Y) be the coordinates of a point chosen at random from the part of a disk D with center (0, 0) and radius 1 in the first quadrant. The joint density is

f_X,Y(x, y) = { 4/π if x² + y² ≤ 1, x ≥ 0, y ≥ 0; 0 otherwise,

and the marginal densities are

f_X(x) = { (4/π)√(1 − x²) if 0 ≤ x ≤ 1; 0 otherwise

f_Y(y) = { (4/π)√(1 − y²) if 0 ≤ y ≤ 1; 0 otherwise.

Since f_X(x)f_Y(y) ≠ f_X,Y(x, y) for (x, y) in a small disk centered at (0, 0), X and Y are not independent random variables. Since

E[X] = E[Y] = ∫₀^1 y (4/π)√(1 − y²) dy = 4/(3π),

E[X + Y] = 4/(3π) + 4/(3π) = 8/(3π). ■
Theorem 7.3.3 Let X and Y be independent random variables for which E[X] and E[Y] are defined. Then E[XY] is defined and

E[XY] = E[X]E[Y].

PROOF: Since

∫∫_{R²} |xy| f_X(x)f_Y(y) dA = ∫_{−∞}^{+∞} |x| f_X(x) (∫_{−∞}^{+∞} |y| f_Y(y) dy) dx
= (∫_{−∞}^{+∞} |x| f_X(x) dx)(∫_{−∞}^{+∞} |y| f_Y(y) dy) < +∞,

E[XY] is defined. The same calculation with |xy| replaced by xy shows that E[XY] = E[X]E[Y]. ■
7.3 EXPECTATION AND CONDITIONAL EXPECTATION
EXAMPLE 7.7 Let $X$ and $Y$ be independent random variables having exponential densities with parameters $\lambda_1$ and $\lambda_2$, respectively. By Exercise 7.2.5, $E[X] = 1/\lambda_1$ and $E[Y] = 1/\lambda_2$. By Theorem 7.3.3, $E[XY] = 1/(\lambda_1\lambda_2)$. ■
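A seeded Monte Carlo sketch of this product rule; the parameter values $\lambda_1 = 2$ and $\lambda_2 = 1/2$ are illustrative choices, not from the text.

```python
import random

# Monte Carlo check of Example 7.7: for independent exponential X, Y with
# parameters lam1 and lam2, E[XY] = E[X]E[Y] = 1/(lam1 * lam2).
rng = random.Random(3)
lam1, lam2, trials = 2.0, 0.5, 200000
acc = 0.0
for _ in range(trials):
    acc += rng.expovariate(lam1) * rng.expovariate(lam2)
est = acc / trials
print(est, 1.0 / (lam1 * lam2))  # both near 1.0
```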
The preceding theorem can be extended to functions of $X$ and $Y$ as follows.

Theorem 7.3.4 Let $X$ and $Y$ be independent random variables and let $\phi$ and $\psi$ be continuous real-valued functions with $E[\phi(X)]$ and $E[\psi(Y)]$ defined. Then $E[\phi(X)\psi(Y)]$ is defined, and
$$E[\phi(X)\psi(Y)] = E[\phi(X)]E[\psi(Y)].$$
All of the theorems and corollaries of Section 4.4 hold for arbitrary random variables. The proof of Schwarz's inequality (Equation 4.8) is precisely the same as in Section 4.4 as soon as it is established that $E[X^2] = 0$ implies that $P(X = 0) = 1$. Consider the case that $X$ has a density function $f_X$. For every positive integer $n$,
$$0 = E[X^2] = \int_{-\infty}^{+\infty} x^2 f_X(x)\,dx \ge \int_{|x| \ge 1/n} x^2 f_X(x)\,dx \ge \frac{1}{n^2}\,P\!\left(|X| \ge \frac{1}{n}\right).$$
Thus, $P(|X| \ge 1/n) = 0$ for every positive integer $n$, and since the events $(|X| \ge 1/n)$ increase to the event $(|X| > 0)$ as $n \to \infty$, $P(X = 0) = P(|X| = 0) = 1$. This is all that is needed to replicate the proof of Schwarz's inequality. It is not essential that $X$ have a density function.
The following example can also be interpreted as a random walk in the plane if the bonds are thought of as instantaneous displacements of a randomly moving particle. For a more comprehensive treatment of chain molecules, see the book by P. J. Flory listed at the end of the chapter.

EXAMPLE 7.8 (Chain Molecules) Consider a chain molecule formed in the following way. Let $\ell$ be a fixed positive number representing the distance between two molecules or the length of the bond between the two. Starting with an initial molecule at the origin of the $xy$-plane, a bond is formed between it and molecule #1 of length $\ell$ making an angle $\theta_1$ with the positive $x$-axis, where $\theta_1$ is chosen at random from the interval $[0, 2\pi]$; starting from the position of molecule #1, a bond is formed between it and molecule #2 of length $\ell$ making an angle $\theta_2$ with the positive $x$-axis, where $\theta_2$ is chosen at random from the interval $[0, 2\pi]$, independently of $\theta_1$, and so forth, as shown in Figure 7.1. If there are $n$ bonds in the chain with an initial molecule at the origin, what is the expected value of the square of the distance of the $n$th molecule from the origin? For $i = 1, \ldots, n$, let $(X_i, Y_i)$ be the change in the coordinates in going from the $(i - 1)$st molecule to the $i$th molecule. Then
$$X_i = \ell\cos\theta_i, \qquad Y_i = \ell\sin\theta_i.$$
FIGURE 7.1 Chain molecule.

The position of molecule #$n$ is then $\left(\sum_{i=1}^n X_i, \sum_{i=1}^n Y_i\right)$. If $D$ denotes the distance of molecule #$n$ from the initial molecule, then
$$D^2 = \left(\sum_{i=1}^n X_i\right)^2 + \left(\sum_{i=1}^n Y_i\right)^2 = \sum_{i=1}^n\sum_{j=1}^n X_i X_j + \sum_{i=1}^n\sum_{j=1}^n Y_i Y_j.$$
Note that
$$E[X_i^2] = E[\ell^2\cos^2\theta_i] = \frac{\ell^2}{2\pi}\int_0^{2\pi}\cos^2\theta_i\,d\theta_i = \frac{\ell^2}{2\pi}\int_0^{2\pi}\frac{1 + \cos 2\theta_i}{2}\,d\theta_i = \frac{\ell^2}{2}$$
and that for $i \ne j$,
$$E[X_i X_j] = \ell^2 E[\cos\theta_i\cos\theta_j] = \ell^2 E[\cos\theta_i]E[\cos\theta_j] = \ell^2\left(\frac{1}{2\pi}\int_0^{2\pi}\cos\theta_i\,d\theta_i\right)\left(\frac{1}{2\pi}\int_0^{2\pi}\cos\theta_j\,d\theta_j\right) = 0$$
by the independence of $\theta_i$ and $\theta_j$ for $i \ne j$ and Theorem 7.3.4. Similarly, $E[Y_i^2] = \ell^2/2$ and $E[Y_i Y_j] = 0$ for $i \ne j$. Therefore,
$$E[D^2] = n\,\frac{\ell^2}{2} + n\,\frac{\ell^2}{2} = n\ell^2.$$
It should be emphasized that this is not the square of the expected value but the expected value of the square. By Schwarz's inequality, $(E[D])^2 \le E[D^2] = n\ell^2$, from which it follows that $E[D] \le \ell\sqrt{n}$. ■
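The conclusion $E[D^2] = n\ell^2$ lends itself to a quick simulation; the bond count $n = 20$, length $\ell = 1$, and trial count below are illustrative choices.

```python
import math
import random

# Monte Carlo check of E[D^2] = n * l^2 for the chain molecule of Example
# 7.8: n bonds of length l with independent angles uniform on [0, 2*pi].
def mean_square_distance(n_bonds, length, trials, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = y = 0.0
        for _ in range(n_bonds):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x += length * math.cos(theta)
            y += length * math.sin(theta)
        total += x * x + y * y
    return total / trials

est = mean_square_distance(n_bonds=20, length=1.0, trials=20000)
print(est)  # close to 20 * 1.0**2 = 20
```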
In considering a single random variable $X$, most of the information concerning $X$ is embodied in its density function. For some purposes, we would like to summarize that information in a few parameters.

Consider a random variable $X$ that has a finite second moment and density function $f_X$; i.e., $E[X^2] = \int_{-\infty}^{+\infty} x^2 f_X(x)\,dx < +\infty$. In Section 4.3, we saw that $|x| \le x^2 + 1$ for all $x \in \mathbb{R}$, so that
$$\int_{-\infty}^{+\infty} |x| f_X(x)\,dx \le \int_{-\infty}^{+\infty} (x^2 + 1) f_X(x)\,dx = \int_{-\infty}^{+\infty} x^2 f_X(x)\,dx + 1 < +\infty,$$
and it follows that $E[X]$ is defined. Thus,
$$\mu_X = E[X]$$
is finite and is called the mean of $X$. Since $(x - \mu_X)^2 = x^2 - 2\mu_X x + \mu_X^2$,
$$\int_{-\infty}^{+\infty} (x - \mu_X)^2 f_X(x)\,dx = \int_{-\infty}^{+\infty} x^2 f_X(x)\,dx - 2\mu_X\int_{-\infty}^{+\infty} x f_X(x)\,dx + \mu_X^2 = E[X^2] - \mu_X^2 < +\infty.$$
Thus, $E[(X - \mu_X)^2]$ is finite, and we define a second parameter
$$\sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2 = E[X^2] - (E[X])^2,$$
called the variance of $X$. The variance of $X$ is also denoted by $\operatorname{var} X$. It is easily checked that
$$\operatorname{var}(cX) = c^2\operatorname{var} X, \qquad \operatorname{var}(X + c) = \operatorname{var} X.$$

EXAMPLE 7.9 Let $X$ have a uniform density on $[a, b]$, $a < b$. It was shown in the previous section that $E[X] = (a + b)/2$ and $E[X^2] = (1/3)(b^2 + ab + a^2)$. Thus $\mu_X = (a + b)/2$ and $\sigma_X^2 = \operatorname{var} X = (b - a)^2/12$. ■
EXAMPLE 7.10 Let $X$ have a $\Gamma(\alpha, \lambda)$ density. By Exercise 7.2.6, for each positive integer $r$,
$$E[X^r] = \frac{(\alpha + r - 1)(\alpha + r - 2)\times\cdots\times\alpha}{\lambda^r}.$$
In particular, $E[X] = \alpha/\lambda$ and $E[X^2] = ((\alpha + 1)\alpha)/\lambda^2$, so that $\mu_X = \alpha/\lambda$ and $\operatorname{var} X = \alpha/\lambda^2$. ■

EXAMPLE 7.11 Let $X$ have an exponential density with parameter $\lambda > 0$. Since this density is the same as the $\Gamma(1, \lambda)$ density, $\mu_X = 1/\lambda$ and $\operatorname{var} X = 1/\lambda^2$ by Example 7.10. ■
EXAMPLE 7.12 Let $X$ be a random variable having a standard normal density. Since
$$\int_{-\infty}^{+\infty} |x|\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 2\int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx < +\infty,$$
$E[X]$ is defined and
$$E[X] = \int_{-\infty}^{+\infty} x\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 0,$$
since the integrand is an odd function. To find $\operatorname{var} X = E[X^2] - (E[X])^2 = E[X^2]$, we need only determine $E[X^2]$. Letting $Y = X^2$, by Example 6.19 $Y$ has a $\Gamma(1/2, 1/2)$ density. By Example 7.10, $E[Y] = 1$. Thus, $\mu_X = E[X] = 0$ and $\sigma_X^2 = \operatorname{var} X = 1$. ■

Now let $Y$ be a random variable having an $n(\mu, \sigma^2)$ density. By definition, $Y = \sigma X + \mu$ where $X$ has a standard normal density. It follows that $\mu_Y = E[Y] = \sigma E[X] + \mu = \mu$ and $\sigma_Y^2 = \operatorname{var} Y = \operatorname{var}(\sigma X + \mu) = \sigma^2\operatorname{var} X = \sigma^2$. From Example 6.17,
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(y - \mu)^2/2\sigma^2},$$
and $\mu_Y$ and $\sigma_Y$ can be readily identified by examining the parameters $\mu$ and $\sigma$ in the function $f_Y$.
Although the random variable $X$ is not required to have a density in the next lemma, the proof will assume a density.

Lemma 7.3.5 (Markov's Inequality) If $X$ is any random variable for which $E[X]$ is defined and $t > 0$, then
$$P(|X| \ge t) \le \frac{E[|X|]}{t}.$$
PROOF: $E[|X|] = \int_{-\infty}^{+\infty} |x| f_X(x)\,dx \ge \int_{|x| \ge t} |x| f_X(x)\,dx \ge t\,P(|X| \ge t)$. ■
As in the discrete case, Chebyshev's inequality is an easy consequence of Markov's inequality.

Theorem 7.3.6 (Chebyshev's Inequality) Let $X$ be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then
$$P(|X - \mu| \ge \delta) \le \frac{\sigma^2}{\delta^2}$$
for all $\delta > 0$.
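Chebyshev's inequality is typically far from tight. As a sketch, for a standard normal random variable ($\mu = 0$, $\sigma = 1$) the exact two-sided tail $P(|Z| \ge \delta)$ can be computed from the error function and compared with the bound $1/\delta^2$.

```python
import math

# Compare Chebyshev's bound P(|Z| >= d) <= 1/d^2 with the exact tail
# P(|Z| >= d) = erfc(d / sqrt(2)) for a standard normal Z.
for d in (1.0, 2.0, 3.0):
    exact = math.erfc(d / math.sqrt(2.0))
    bound = 1.0 / (d * d)
    print(d, round(exact, 4), round(bound, 4))
```

At $\delta = 2$, for instance, the exact tail is about .0455 against the bound .25.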
Suppose now that the random variables $Y, X_1, \ldots, X_n$ have a joint density. We can then consider the conditional density of $Y$ given $X_1, \ldots, X_n$ and define the conditional expectation of $Y$ given $X_1, \ldots, X_n$ by the equation
$$E[Y \mid X_1 = x_1, \ldots, X_n = x_n] = \int_{-\infty}^{+\infty} y\,f_{Y|X_1,\ldots,X_n}(y \mid x_1, \ldots, x_n)\,dy,$$
provided the integral is absolutely convergent.

Theorem 7.3.7 If $Y, X_1, \ldots, X_n$ have a joint density and $E[Y]$ is defined, then $E[Y \mid X_1 = x_1, \ldots, X_n = x_n]$ is defined, and
$$E[Y] = \int\cdots\int_{\mathbb{R}^n} E[Y \mid X_1 = x_1, \ldots, X_n = x_n]\,f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1\ldots dx_n.$$
Sketch of Proof: Since $E[Y]$ is defined,
$$+\infty > E[|Y|] = \int_{-\infty}^{+\infty} |y| f_Y(y)\,dy = \int\cdots\int_{\mathbb{R}^n}\int_{-\infty}^{+\infty} |y|\,f_{Y,X_1,\ldots,X_n}(y, x_1, \ldots, x_n)\,dy\,dx_1\ldots dx_n$$
$$= \int\cdots\int_{\mathbb{R}^n}\left(\int_{-\infty}^{+\infty} |y|\,f_{Y|X_1,\ldots,X_n}(y \mid x_1, \ldots, x_n)\,dy\right) f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1\ldots dx_n.$$
Thus, the integral within parentheses is finite for "most" points $(x_1, \ldots, x_n)$ in $\mathbb{R}^n$. The same calculation with $|y|$ replaced by $y$ will establish the final result. ■
EXAMPLE 7.13 Let $(X, Y)$ be the coordinates of a point chosen at random from a triangle with vertices at $(0,0)$, $(1,0)$, and $(1,1)$. Intuitively, given that $X = x$ with $0 < x \le 1$, the point $(x, Y)$ is then chosen at random from the line segment joining $(x, 0)$ to $(x, x)$; i.e., given that $X = x$, $Y$ has a uniform density on $(0, x)$, and therefore $E[Y \mid X = x] = x/2$ for $0 < x \le 1$. This intuitive argument can be justified by the following formal calculations. The joint density of $X$ and $Y$ is
$$f_{X,Y}(x,y) = \begin{cases} 2 & \text{if } 0 < y < x,\ 0 < x \le 1\\ 0 & \text{otherwise.}\end{cases}$$
Clearly, $f_X(x) = 0$ for $x < 0$ and $x > 1$. For $0 < x \le 1$,
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy = \int_0^x 2\,dy = 2x.$$
Therefore,
$$f_X(x) = \begin{cases} 2x & \text{if } 0 < x \le 1\\ 0 & \text{otherwise}\end{cases}$$
and
$$f_{Y|X}(y \mid x) = \begin{cases} 1/x & \text{if } 0 < y < x,\ 0 < x \le 1\\ 0 & \text{otherwise.}\end{cases}$$
For $0 < x \le 1$,
$$E[Y \mid X = x] = \int_{-\infty}^{+\infty} y\,f_{Y|X}(y \mid x)\,dy = \int_0^x y\,\frac{1}{x}\,dy = \frac{1}{x}\cdot\frac{x^2}{2} = \frac{x}{2}. \;\blacksquare$$
Care must be taken in making heuristic arguments of the type appearing in this example. Such arguments can allow one to discover facts, but they should be formally verified as above before staking one's reputation on the result.
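The conditional expectation $E[Y \mid X = x] = x/2$ can also be approximated by conditioning on a narrow band around a chosen $x$; the value $x_0 = 0.6$ and the band width below are illustrative.

```python
import random

# Monte Carlo sketch of E[Y | X = x] = x/2 for Example 7.13: sample points
# uniformly from the triangle with vertices (0,0), (1,0), (1,1) by rejection,
# then average Y over samples whose X lies in a narrow band around x0.
rng = random.Random(1)
x0, half_width = 0.6, 0.01
total, count = 0.0, 0
while count < 5000:
    x, y = rng.random(), rng.random()
    if y > x:                      # outside the triangle: reject
        continue
    if abs(x - x0) < half_width:   # keep only samples with X near x0
        total += y
        count += 1
print(total / count)  # close to x0 / 2 = 0.3
```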
The last theorem can be stated in a more general context. Suppose $Y_1, \ldots, Y_m, X_1, \ldots, X_n$ are random variables having a joint density and $\psi$ is a continuous real-valued function of $m$ variables. If $E[\psi(Y_1, \ldots, Y_m)]$ is defined, then
$$E[\psi(Y_1, \ldots, Y_m) \mid X_1 = x_1, \ldots, X_n = x_n] = \int\cdots\int_{\mathbb{R}^m}\psi(y_1, \ldots, y_m)\,f_{Y_1,\ldots,Y_m|X_1,\ldots,X_n}(y_1, \ldots, y_m \mid x_1, \ldots, x_n)\,dy_1\ldots dy_m.$$
Starting with this result, it is possible to develop such concepts as the conditional variance of $Y$ given $X_1 = x_1, \ldots, X_n = x_n$, denoted by $\operatorname{var}(Y \mid X_1 = x_1, \ldots, X_n = x_n)$, and so forth.
EXERCISES 7.3

1. Let $X$ and $Y$ be independent random variables, both having an exponential density with parameter $\lambda = 1$. If $Z = \max(X, Y)$, calculate $E[Z]$ without using the density of $Z$.

2. A point $(X, Y)$ is selected at random from the unit square $\{(x,y) : 0 \le x \le 1,\ 0 \le y \le 1\}$. Given that $X = x$ and $Y = y$, a point $(U, V)$ is selected at random from the rectangle $\{(u,v) : 0 \le u \le x,\ 0 \le v \le y\}$. Calculate $E[U \mid X = x, Y = y]$.

3. Let $\Lambda$ be a random variable having an exponential density with parameter 1. Given that $\Lambda = \lambda$, the random variables $X_1, \ldots, X_n$ are independent random variables each having an exponential density with parameter $\lambda$. Find $E[\Lambda \mid X_1 = x_1, \ldots, X_n = x_n]$.

4. Let $X$ and $Y$ be random variables having the joint density
$$f_{X,Y}(x,y) = \begin{cases} 8xy & \text{if } 0 \le y \le x \le 1\\ 0 & \text{otherwise.}\end{cases}$$
Find $E[Y \mid X = x]$ and $E[X \mid Y = y]$.

5. Let $\Lambda$ be a random variable having a gamma density $\Gamma(\alpha, \beta)$ and, given that $\Lambda = \lambda$, let $X$ have an exponential density with parameter $\lambda$. Find the density of $X$ and the conditional density of $\Lambda$ given $X = x$.

6. Let $U$ and $V$ be the random variables of Exercises 6.6.7 and 7.2.4 and let $R = V - U$ be the range of the $X_1, \ldots, X_n$. Calculate $E[R]$ and $\operatorname{var} R$.

7. Consider the random variables $U$ and $V$ of the previous problem. Determine $f_{U|V}(u \mid v)$ and calculate $E[U \mid V = v]$.

8. Consider the random variables $X$, $Y$, and $Z$ having the joint density of Exercise 6.6.8. Determine $f_{Z|X,Y}(z \mid x, y)$ and calculate $E[Z \mid X = x, Y = y]$.

9. Let $X$ and $Y$ be the coordinates of a point chosen at random from the triangle in the plane with vertices at $(-1,0)$, $(0,1)$, and $(1,0)$. Determine $E[Y \mid X = x]$ without calculations.

10. A number $X$ is chosen at random from the interval $[0,1]$. Given that $X = x$, a number $Y$ is chosen at random from the interval $[0, x]$. Calculate $E[X \mid Y = y]$.
NORMAL DENSITY
Consider a sequence $\{X_j\}_{j=1}^n$ of $n$ Bernoulli trials with probability of success $p = 1/2$ and let $S_n = \sum_{j=1}^n X_j$. We know that $S_n$ has the binomial density function
$$b\!\left(k; n, \tfrac{1}{2}\right) = \binom{n}{k}2^{-n}, \qquad k = 0, 1, \ldots, n,$$
with $E[S_n] = n/2$ and $\sigma_{S_n} = \sqrt{n}/2$. If we were to examine a bar graph of this density, it would be centered about the point $x = n/2$ on the $x$-axis, and the variance $n/4$ would be large for large $n$, indicating that the density is spread out far from the mean. As $n$ becomes large, the individual probabilities would also become small. To eliminate the spreading effect, we consider the normalized sum
$$S_n^* = \frac{S_n - (n/2)}{\sqrt{n}/2},$$
which is centered about $E[S_n^*] = 0$ and has variance $\operatorname{var} S_n^* = \operatorname{var}((S_n - (n/2))/(\sqrt{n}/2)) = (4/n)\operatorname{var} S_n = 1$. The bar graph of $S_{36}^*$ is depicted in Figure 7.2. The area of the rectangle centered above 0 represents the probability
$$P(S_{36}^* = 0) = P(S_{36} = 18) = \binom{36}{18}2^{-36} \approx .132.$$

FIGURE 7.2 $P(S_{36}^* = x_i)$.

If we compare Figure 7.2 with the graph of the standard normal density depicted in Figure 6.11, it might appear that one of the two could be used to approximate the other. At one time it was impractical to calculate $P(\alpha \le S_n \le \beta)$ because of the large number of arithmetic operations required even when $n$ is moderately large, and so the normal density was used to approximate such probabilities. With the advent of fast computers, such calculations can now be done in fractions of a second for moderate values of $n$. Even though approximation of
binomial probabilities is not as important as it once was, there are valid reasons
for looking into normal approximations to the binomial density.
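The probability quoted above for the central bar of Figure 7.2 is easy to verify exactly:

```python
from math import comb

# Check P(S_36 = 18) = C(36, 18) * 2^-36, the probability of the central
# rectangle in Figure 7.2.
p = comb(36, 18) / 2 ** 36
print(round(p, 3))  # 0.132
```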
The following theorem was first proved by DeMoivre (1667–1754). The proof of the theorem involves only elementary facts from the calculus. Reading the proof is not essential for learning probability at this stage.

Theorem 7.4.1 (Central Limit Theorem) Let $\{X_j\}_{j=1}^n$ be a sequence of $n$ Bernoulli trials with probability of success $p = 1/2$ and let $S_n = \sum_{j=1}^n X_j$. Then
$$\lim_{n\to\infty} P\!\left(\left|S_n - \frac{n}{2}\right| \le x\,\frac{\sqrt{n}}{2}\right) = \frac{1}{\sqrt{2\pi}}\int_{-x}^x e^{-t^2/2}\,dt.$$
The probability on the left is the same as $P(|S_n^*| \le x)$. The proof of the theorem requires the following approximations.
Lemma 7.4.2 There are functions $\gamma$ and $\delta$ such that

(i) $\ln(1 + x) = x(1 + \gamma(x))$ if $|x| < 1$.

(ii) $|\gamma(x)| \le \delta(|x|)$ if $|x| < 1$.

(iii) $\delta$ is nondecreasing on $[0, 1)$ and $\lim_{x\to 0^+}\delta(x) = 0$.

PROOF: The function $\ln(1 + x)$ has the Maclaurin series expansion
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = x(1 + \gamma(x))$$
for $|x| < 1$, where $\gamma(x) = -(x/2) + (x^2/3) - (x^3/4) + \cdots$. Let
$$\delta(x) = \frac{x}{2} + \frac{x^2}{3} + \frac{x^3}{4} + \cdots.$$
Since the series defining $\delta$ converges absolutely in $(-1, +1)$ and the sum of a power series is a continuous function on its interval of convergence, $\lim_{x\to 0^+}\delta(x) = \delta(0) = 0$. It is clear that $\delta$ is increasing on $[0, 1)$. ■
We will also need the following discrete version of the mean value theorem
of the integral calculus.
Lemma 7.4.3 If $a_1, \ldots, a_n$ are nonnegative real numbers and $b_1, \ldots, b_n$ are any real numbers, then there is an $m$ with $|m| \le \max_{1\le j\le n}|b_j|$ such that
$$\sum_{j=1}^n a_j b_j = m\sum_{j=1}^n a_j.$$
PROOF: We can assume that some $a_j > 0$, because otherwise we can take $m = 0$. Since
$$\left|\sum_{j=1}^n a_j b_j\right| \le \sum_{j=1}^n a_j|b_j| \le \left(\max_{1\le j\le n}|b_j|\right)\sum_{j=1}^n a_j,$$
we can take
$$m = \frac{\sum_{j=1}^n a_j b_j}{\sum_{j=1}^n a_j}. \;\blacksquare$$
We will also need to approximate the exponential function. It follows from the Maclaurin series expansion of $e^y$ that $e^y = 1 + \theta(y)$ where $\theta(y) = y + y^2/2! + y^3/3! + \cdots$ satisfies (1) $\lim_{y\to 0}\theta(y) = 0$, (2) $|\theta(y)| \le \theta(|y|)$, and (3) $\theta(y)$ is nondecreasing on $[0, +\infty)$.
Proof of Theorem 7.4.1 We prove the theorem for an even number of trials first. The number $x$ will be fixed throughout the proof. Consider only those $n$ for which $x\sqrt{n/2} < n$. Let
$$P_n(x) = P\!\left(|S_{2n} - n| \le x\sqrt{n/2}\right).$$
Since $S_{2n}$ has the binomial density $b(\cdot\,; 2n, 1/2)$,
$$P_n(x) = \sum_{|k - n| \le x\sqrt{n/2}}\binom{2n}{k}2^{-2n}.$$
Substituting $j$ for $k - n$,
$$P_n(x) = \sum_{|j| \le x\sqrt{n/2}}\binom{2n}{n + j}2^{-2n}.$$
Note that $\binom{2n}{n+j} = \binom{2n}{n-j}$, so that the term in the sum corresponding to $j$ is equal to the term corresponding to $-j$. Consider the terms of the sum for which $j \ge 0$. Since
$$\binom{2n}{n+j} = \frac{(2n)!}{(n+j)!(n-j)!}, \qquad (n+j)! = (n+j)\times\cdots\times(n+1)\,n!,$$
we can write
$$\binom{2n}{n+j}2^{-2n} = \frac{(2n)!}{n!\,n!}\,2^{-2n}\,\frac{n(n-1)\times\cdots\times(n-j+1)}{(n+j)(n+j-1)\times\cdots\times(n+1)}.$$
By the above remark, this equation holds for $j < 0$ as well. Therefore,
$$P_n(x) = \sum_{|j| \le x\sqrt{n/2}}\frac{(2n)!}{n!\,n!}\,2^{-2n}\,\frac{n(n-1)\times\cdots\times(n-j+1)}{(n+j)(n+j-1)\times\cdots\times(n+1)}.$$
Letting
$$P_n = \frac{(2n)!}{n!\,n!}\,2^{-2n}$$
and applying Stirling's formula, Equation 5.10, to the factorials, $P_n \sim 1/\sqrt{\pi n}$. This means that $P_n/(1/\sqrt{\pi n}) \to 1$ as $n \to \infty$; i.e.,
$$P_n = \frac{1}{\sqrt{\pi n}}(1 + \delta_n)$$
where $\lim_{n\to\infty}\delta_n = 0$. Thus, we can write
$$P_n(x) = \sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}(1 + \delta_n)\,D_{n,j} \tag{7.1}$$
where, for $|j| \le x\sqrt{n/2}$,
$$D_{n,j} = \frac{n(n-1)\times\cdots\times(n-j+1)}{(n+j)(n+j-1)\times\cdots\times(n+1)} = \frac{1}{(1 + (j/n))(1 + (j/(n-1)))\times\cdots\times(1 + (j/(n-j+1)))}.$$
Taking natural logarithms and approximating the $\ln$ function using the $\gamma$ function of Lemma 7.4.2,
$$\ln D_{n,j} = -\sum_{i=0}^{j-1}\ln\!\left(1 + \frac{j}{n-i}\right) = -\sum_{i=0}^{j-1}\frac{j}{n-i}\left(1 + \gamma\!\left(\frac{j}{n-i}\right)\right).$$
Writing $j/(n - i) = (j/n)(1 + (i/(n - i)))$,
$$\ln D_{n,j} = -\frac{j^2}{n} - \frac{j}{n}\sum_{i=0}^{j-1}\left[\frac{i}{n-i} + \gamma\!\left(\frac{j}{n-i}\right) + \frac{i}{n-i}\,\gamma\!\left(\frac{j}{n-i}\right)\right].$$
Applying Lemma 7.4.3 to the second sum on the right,
$$\ln D_{n,j} = -\frac{j^2}{n} - \frac{j^2}{n}\,\gamma_{n,j}$$
where
$$|\gamma_{n,j}| \le \max_{0\le i\le j-1}\left|\frac{i}{n-i} + \gamma\!\left(\frac{j}{n-i}\right) + \frac{i}{n-i}\,\gamma\!\left(\frac{j}{n-i}\right)\right|.$$
Recalling the $\delta$ function of Lemma 7.4.2, $|\gamma(j/(n-i))| \le \delta(|j|/(n-i)) \le \delta(|j|/(n-|j|))$. Noting that
$$\frac{|j|}{n - |j|} \le \frac{x\sqrt{n/2}}{n - x\sqrt{n/2}}$$
and letting $\Delta_{n,x}$ denote the quantity on the right, $\lim_{n\to\infty}\Delta_{n,x} = 0$ and $|\gamma(j/(n-i))| \le \delta(\Delta_{n,x})$. Also note that $|i/(n-i)| \le |j|/(n - |j|) \le \Delta_{n,x}$. Thus,
$$|\gamma_{n,j}| \le \delta(\Delta_{n,x}) + \Delta_{n,x} + \Delta_{n,x}\,\delta(\Delta_{n,x}).$$
Letting $\Lambda_{n,x}$ denote the quantity on the right, $|\gamma_{n,j}| \le \Lambda_{n,x}$ where $\lim_{n\to\infty}\Lambda_{n,x} = 0$. Thus,
$$D_{n,j} = e^{-j^2/n}\,e^{-\gamma_{n,j}(j^2/n)}.$$
Using the approximation $e^y = 1 + \theta(y)$ discussed after Lemma 7.4.3,
$$D_{n,j} = e^{-j^2/n}\left(1 + \theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right)\right).$$
Thus, Equation 7.1 can be written
$$P_n(x) = \sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n}(1 + \delta_n)\left(1 + \theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right)\right)$$
$$= \sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n} + \sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n}\left(\delta_n + \theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right) + \delta_n\,\theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right)\right).$$
We will now show that the second sum on the right has the limit zero as $n \to \infty$. Applying Lemma 7.4.3 to the second sum, it can be written
$$\Lambda_n\sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n} \tag{7.2}$$
where
$$|\Lambda_n| \le \max_{|j| \le x\sqrt{n/2}}\left|\delta_n + \theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right) + \delta_n\,\theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right)\right|.$$
Since $|\theta(y)| \le \theta(|y|)$ and $\theta(y)$ is nondecreasing on $[0, \infty)$, and $j^2/n \le x^2/2$,
$$\left|\theta\!\left(-\gamma_{n,j}\,\frac{j^2}{n}\right)\right| \le \theta\!\left(|\gamma_{n,j}|\,\frac{j^2}{n}\right) \le \theta\!\left(\Lambda_{n,x}\,\frac{x^2}{2}\right)$$
and
$$|\Lambda_n| \le \delta_n + \theta\!\left(\Lambda_{n,x}\,\frac{x^2}{2}\right) + \delta_n\,\theta\!\left(\Lambda_{n,x}\,\frac{x^2}{2}\right) \to 0 \text{ as } n \to \infty,$$
and therefore $\lim_{n\to\infty}\Lambda_n = 0$. Note also that the sum
$$\sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n}$$
is bounded in $n$, and therefore the quantity in 7.2 has the limit zero as $n \to \infty$. It follows that
$$\lim_{n\to\infty}P_n(x) = \lim_{n\to\infty}\sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n},$$
provided the limit on the right exists. If we let $x_j = j\sqrt{2/n}$, then the points $\{x_j : |j| \le x\sqrt{n/2}\}$ constitute a partition of the interval $[-x, x]$ into subintervals $[x_{j-1}, x_j]$ of length $\Delta x_j = \sqrt{2/n}$, and
$$\sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{\pi n}}\,e^{-j^2/n} = \sum_{|j| \le x\sqrt{n/2}}\frac{1}{\sqrt{2\pi}}\,e^{-x_j^2/2}\,\Delta x_j.$$
Since the sum on the right is just a Riemann sum defining the integral $\int_{-x}^x(1/\sqrt{2\pi})e^{-t^2/2}\,dt$,
$$\lim_{n\to\infty}P_n(x) = \frac{1}{\sqrt{2\pi}}\int_{-x}^x e^{-t^2/2}\,dt.$$
This completes the proof for even $n$. To take care of odd $n$ it is necessary to do an epsilon argument. We will use $\phi(x)$ in place of $(1/\sqrt{2\pi})e^{-x^2/2}$ for the rest of the proof. Since $\int_{-x}^x\phi(t)\,dt$ is a continuous function of $x$, given $\epsilon > 0$ there is an $h > 0$ such that
$$\int_{-x-h}^{x+h}\phi(t)\,dt < \int_{-x}^{x}\phi(t)\,dt + \epsilon \quad\text{and}\quad \int_{-x+h}^{x-h}\phi(t)\,dt > \int_{-x}^{x}\phi(t)\,dt - \epsilon.$$
Since
$$\lim_{n\to\infty}P\!\left(|S_{2n} - n| \le (x - h)\sqrt{n/2}\right) = \int_{-x+h}^{x-h}\phi(t)\,dt$$
and
$$\lim_{n\to\infty}P\!\left(|S_{2n} - n| \le (x + h)\sqrt{n/2}\right) = \int_{-x-h}^{x+h}\phi(t)\,dt,$$
there is a positive integer $N$ such that for all $n \ge N$:

(i) $(1/2) - (h/2)\sqrt{2n} < 0$.

(ii) $P(|S_{2n} - n| \le (x - h)\sqrt{n/2}) > \int_{-x+h}^{x-h}\phi(t)\,dt - \epsilon$.

(iii) $P(|S_{2n} - n| \le (x + h)\sqrt{n/2}) < \int_{-x-h}^{x+h}\phi(t)\,dt + \epsilon$.

Since $X_i(\omega) = 0$ or 1, $|X_i(\omega) - (1/2)| = 1/2$ for all $i$, and therefore
$$\left|S_{2n+1}(\omega) - \frac{2n+1}{2}\right| = \left|S_{2n}(\omega) + X_{2n+1}(\omega) - n - \frac{1}{2}\right| \le |S_{2n}(\omega) - n| + \left|X_{2n+1}(\omega) - \frac{1}{2}\right| = |S_{2n}(\omega) - n| + \frac{1}{2}.$$
Suppose $|S_{2n}(\omega) - n| \le (x - h)\sqrt{n/2}$. By (i), for $n \ge N$,
$$\left|S_{2n+1}(\omega) - \frac{2n+1}{2}\right| \le (x - h)\sqrt{n/2} + \frac{1}{2} < x\sqrt{\frac{2n+1}{4}}.$$
Similarly, $|S_{2(n+1)}(\omega) - (n+1)| \le |S_{2n+1}(\omega) - (2n+1)/2| + 1/2$, so that $|S_{2n+1}(\omega) - (2n+1)/2| \le x\sqrt{(2n+1)/4}$ implies, again by (i),
$$|S_{2(n+1)}(\omega) - (n+1)| \le x\sqrt{\frac{2n+1}{4}} + \frac{1}{2} < (x + h)\sqrt{\frac{n+1}{2}}.$$
It follows from these relations, (ii), and (iii) with $n$ replaced by $n + 1$ that for $n \ge N$,
$$\int_{-x}^{x}\phi(t)\,dt - 2\epsilon < \int_{-x+h}^{x-h}\phi(t)\,dt - \epsilon < P\!\left(|S_{2n} - n| \le (x - h)\sqrt{n/2}\right) \le P\!\left(\left|S_{2n+1} - \frac{2n+1}{2}\right| \le x\sqrt{\frac{2n+1}{4}}\right)$$
$$\le P\!\left(|S_{2(n+1)} - (n+1)| \le (x + h)\sqrt{\frac{n+1}{2}}\right) < \int_{-x-h}^{x+h}\phi(t)\,dt + \epsilon < \int_{-x}^{x}\phi(t)\,dt + 2\epsilon.$$
Since $\epsilon > 0$ was arbitrary, this shows that
$$\lim_{n\to\infty}P\!\left(\left|S_{2n+1} - \frac{2n+1}{2}\right| \le x\sqrt{\frac{2n+1}{4}}\right) = \int_{-x}^{x}\phi(t)\,dt. \;\blacksquare$$
The original central limit theorem has a tendency to underestimate the binomial probabilities, as can be seen from Table 7.1 in the $n = 36$, $p = 1/2$ case.

The central limit theorem was improved upon by Laplace (1749–1827). Let $\{X_j\}_{j=1}^n$ be a sequence of $n$ Bernoulli random variables with probability of success $p$ and let $S_n = \sum_{j=1}^n X_j$. The following result gives a better approximation of binomial probabilities. The proof belongs in a more advanced text:
$$P(a \le S_n \le b) \approx \Phi\!\left(x_b + \frac{h}{2}\right) - \Phi\!\left(x_a - \frac{h}{2}\right) \tag{7.3}$$
where $h = 1/\sqrt{npq}$ and $x_t = (t - np)h$.

EXAMPLE 7.14 Suppose $n = 36$ and $p = 1/2$. According to Table 7.1, $P(13 \le S_n \le 23) = .9347$. If we use Equation 7.3 to approximate this probability, then $h = 1/3$, $x_{13} = -5/3$, $x_{23} = 5/3$, and
$$P(13 \le S_n \le 23) \approx \Phi\!\left(\frac{5}{3} + \frac{1}{6}\right) - \Phi\!\left(-\frac{5}{3} - \frac{1}{6}\right) \approx .9332,$$
Number of Successes    Probability    Normal Approximation
17 ≤ S_n ≤ 19          .3833          .2611
16 ≤ S_n ≤ 20          .5950          .4950
15 ≤ S_n ≤ 21          .7570          .6827
14 ≤ S_n ≤ 22          .8675          .8176
13 ≤ S_n ≤ 23          .9347          .9044
12 ≤ S_n ≤ 24          .9711          .9545
11 ≤ S_n ≤ 25          .9887          .9804
10 ≤ S_n ≤ 26          .9960          .9923
 9 ≤ S_n ≤ 27          .9988          .9973

TABLE 7.1 Normal Approximation of Binomial Probabilities
a much better approximation than that given in Table 7.1. ■
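Both numbers in Example 7.14 can be reproduced directly: the exact probability is a finite binomial sum, and the approximation applies Equation 7.3 with $\Phi$ computed from the error function.

```python
import math
from math import comb

# Reproduce Example 7.14: exact P(13 <= S_36 <= 23) for p = 1/2 versus the
# continuity-corrected normal approximation of Equation 7.3.
exact = sum(comb(36, k) for k in range(13, 24)) / 2 ** 36

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

h = 1.0 / math.sqrt(36 * 0.5 * 0.5)                  # 1/3
approx = Phi(5 / 3 + h / 2) - Phi(-5 / 3 - h / 2)
print(round(exact, 4), round(approx, 4))
```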
The following theorem is a weakened version of the last approximation but has the advantage of being easier to apply in some situations.

Theorem 7.4.4 (DeMoivre–Laplace Limit Theorem) Let $\{X_j\}_{j=1}^n$ be a sequence of $n$ Bernoulli random variables with probability of success $p$ and let $S_n = \sum_{j=1}^n X_j$. Then for fixed $a < b$,
$$P(a < S_n^* \le b) \approx \Phi(b) - \Phi(a). \tag{7.4}$$

EXAMPLE 7.15
A survey is undertaken to determine how many voters in a population of eligible voters favor candidate A. Assume that the unknown proportion of voters who favor A is $p$ and that voters act independently of one another. Suppose we want to determine how many should be polled so that the observed proportion of favorable voters is within .05 of $p$ with probability at least .95. We can look upon the polling as a sequence of Bernoulli trials $\{X_j\}_{j=1}^n$ with unknown probability of success $p$. The observed proportion of favorable voters will then be $S_n/n$, and we want to choose $n$ so that
$$P\!\left(\left|\frac{S_n}{n} - p\right| \le .05\right) \ge .95.$$
We could proceed as in Section 4.3 by using Inequality 4.6 to require that
$$P\!\left(\left|\frac{S_n}{n} - p\right| \ge .05\right) \le \frac{1}{4n(.05)^2} \le .05;$$
i.e., that $n \ge 2000$. Since Inequality 4.6 assumes virtually nothing about the density of $S_n/n$, a better result might be obtained by invoking Theorem 7.4.4. Note that
$$P\!\left(\left|\frac{S_n}{n} - p\right| \le .05\right) = P\!\left(\frac{|S_n - np|}{\sqrt{npq}} \le (.05)\sqrt{n/pq}\right) = P\!\left(|S_n^*| \le (.05)\sqrt{n/pq}\right).$$
By the Standard Normal Distribution Function table (see page 346), the approximate solution of the equation $\Phi(x) - \Phi(-x) = 2\Phi(x) - 1 = .95$ is $x = 1.96$. Since $P(|S_n^*| \le x) \approx \Phi(x) - \Phi(-x) = .95$, we should choose $n$ so that
$$(.05)\sqrt{n/pq} \ge 1.96,$$
in which case we would have $P(|S_n^*| \le (.05)\sqrt{n/pq}) \ge .95$. This requires that
$$n \ge \left(\frac{1.96}{.05}\right)^2 pq.$$
Since $pq = p(1 - p) \le 1/4$, if we choose $n$ so that
$$n \ge \left(\frac{1.96}{.05}\right)^2\frac{1}{4} \ge \left(\frac{1.96}{.05}\right)^2 pq,$$
we would then have $P(|(S_n/n) - p| \le .05) \ge .95$. We therefore take $n$ to be the smallest integer for which
$$n \ge \left(\frac{1.96}{.05}\right)^2\frac{1}{4} = 384.16;$$
i.e., we take $n = 385$. By polling 385 eligible voters and using $S_n/n$ to estimate the unknown $p$, we know that our estimate will be within .05 of $p$ 19 times out of 20. ■
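The arithmetic at the end of Example 7.15 amounts to one line:

```python
import math

# Sample-size computation from Example 7.15: smallest n with
# n >= (1.96 / 0.05)^2 * (1/4), using pq <= 1/4.
n = math.ceil((1.96 / 0.05) ** 2 * 0.25)
print(n)  # 385
```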
The central limit theorem is valid in much more general situations than those dealt with here. For example, if $\{X_j\}$ is a sequence of independent random variables having the same distribution function with $\mu = E[X_1]$, $\sigma^2 = \operatorname{var} X_1 < +\infty$, and $S_n = \sum_{j=1}^n X_j$, then
$$\lim_{n\to\infty}P\!\left(a < \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \Phi(b) - \Phi(a). \tag{7.5}$$

EXERCISES 7.4

Equations 7.4 and 7.5 were used to obtain answers to the following problems.
1. Approximate $\sum_i\binom{64}{i}2^{-64}$.

2. Approximate $\sum_{i=30}\binom{128}{i}(1/4)^i(3/4)^{128-i}$.
3. A jumbo jet with a seating capacity of 360 passengers is allowed a maximum weight of 59,000 pounds for passengers. If the average weight of a passenger is 160 pounds with a standard deviation of $\sigma = 48$ pounds, what is the approximate probability that the weight limit will be exceeded, assuming the 360 passengers that board are a random sample from the population?
4. A national polling agency would like to determine the percentage of
eligible voters who favor their client within 3 percentage points with 90
percent confidence. How many eligible voters should be polled?
5. Consider a particle taking a random walk on the integers starting at 0
with p = 1/2. What is the approximate probability that the particle
will be within 30 units of 0 after 1000 steps?
6.
Consider a particle taking a random walk on the integers starting at 0
with p = .45. What is the approximate probability that the particle will
be to the right of -50 after 1000 steps?
7. The $n$ real numbers $a_1, \ldots, a_n$ are rounded off to the nearest integers $a_1 + X_1, \ldots, a_n + X_n$, respectively, where the round-off errors $X_1, \ldots, X_n$ are assumed to be independent and have a uniform density on $[-1/2, 1/2]$. Use the central limit theorem to find a number $A > 0$, depending upon $n$, such that $P(|\sum_{j=1}^n X_j| \le A) \approx .99$. (See the final paragraph of this section.)
8. A programmer decides to carry $m$ significant figures to the right of the decimal point and round off the result of any addition, multiplication, or division operation to that many figures. Assume that $10^6$ elementary operations are performed, that successive round-off errors are independent and have a uniform density on $[-(1/2)10^{-m}, (1/2)10^{-m}]$, and that the final error is the sum of all the round-off errors. Find an upper bound, which does not depend upon $m$, for the probability that the final error will be less than $5\times 10^{-m+2}$ in absolute value.
COVARIANCE AND COVARIANCE FUNCTIONS

The covariance $\operatorname{cov}(X, Y)$ between two random variables with finite second moments was defined in Section 4.4. All the definitions, lemmas, and theorems of that section are valid for any random variables: discrete, continuous, or a mixture of the two. We can and will use the properties of $\operatorname{cov}(X, Y)$ described in Section 4.4.

Assuming that the random variables $X, Y$ have finite second moments with $\sigma_X > 0$ and $\sigma_Y > 0$, the correlation between $X$ and $Y$ is defined just as in the discrete case by the equation
$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X\sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{\operatorname{var} X}\sqrt{\operatorname{var} Y}}.$$
As pointed out above, Inequality 4.8, Schwarz's inequality, holds for any random variables with finite second moments.

Theorem 7.5.1 (Schwarz's Inequality) If $X$ and $Y$ are any random variables with finite second moments, then
$$(E[XY])^2 \le E[X^2]E[Y^2] \tag{7.6}$$
with equality holding if and only if $P(X = 0) = 1$ or $P(Y = aX) = 1$ for some constant $a$.

As in the discrete case, $|\rho(X, Y)| \le 1$ with equality if and only if there are constants $a, b \in \mathbb{R}$ such that $P(Y = aX + b) = 1$.
If $\mathbf{x} = (x_1, \ldots, x_n)$ is a point in $\mathbb{R}^n$, the length of the vector $\mathbf{x}$ is the quantity $(\sum_{i=1}^n x_i^2)^{1/2}$. By analogy, if $\{x_1, \ldots, x_n\}$ is the range of a random variable $X$, we could define the length of $X$, written $\|X\|$, by the equation
$$\|X\| = \left(\sum_{i=1}^n x_i^2\,f_X(x_i)\right)^{1/2} = \sqrt{E[X^2]}.$$
Since the quantity on the right makes sense for any random variable with finite second moment, we can extend this notion as follows.

Definition 7.4 If $X$ is any random variable with finite second moment, define $\|X\| = \sqrt{E[X^2]}$; $\|X\|$ is called the norm of $X$. ■

What is to be made of the equation $\|X\| = \sqrt{E[X^2]} = 0$? Putting $Y = 1$ in Inequality 7.6,
$$(E[X])^2 \le E[X^2] = 0$$
and $E[X] = 0$. According to the discussion following Theorem 7.3.4, $\operatorname{var} X = 0$, and therefore $P(X = 0) = 1$. Note that this does not mean that $X(\omega) = 0$ for every $\omega \in \Omega$. If we have two random variables $X$ and $Y$ with $\|X - Y\| = \sqrt{E[(X - Y)^2]} = 0$, we can conclude only that $P(X = Y) = 1$. Two random variables $X$ and $Y$ with this property are said to be equal in the probability sense and will be regarded as the same in this section.
As in vector calculus, once we have a concept of length, we can go on to distances.

Definition 7.5 If $X$ and $Y$ are random variables with finite second moments, the mean square distance between $X$ and $Y$ is the quantity
$$\|X - Y\| = \sqrt{E[(X - Y)^2]}. \;\blacksquare$$
The following inequality is the analog of the geometrical fact that the length of a side of a triangle is less than or equal to the sum of the lengths of the other two sides.

Lemma 7.5.2 (Triangle Inequality) If $X$ and $Y$ are any random variables with finite second moments, then
$$\|X + Y\| \le \|X\| + \|Y\|. \tag{7.7}$$
PROOF: Since $\|X + Y\|^2 = E[(X + Y)^2] = E[X^2] + 2E[XY] + E[Y^2]$ and $E[XY] \le \sqrt{E[X^2]}\sqrt{E[Y^2]}$ by Schwarz's inequality,
$$\|X + Y\|^2 \le E[X^2] + 2\sqrt{E[X^2]}\sqrt{E[Y^2]} + E[Y^2] = \left(\sqrt{E[X^2]} + \sqrt{E[Y^2]}\right)^2 = (\|X\| + \|Y\|)^2,$$
and therefore $\|X + Y\| \le \|X\| + \|Y\|$. ■

Replacing $X$ by $X - Z$ and $Y$ by $Z - Y$ in 7.7, we obtain the inequality
$$\|X - Y\| \le \|X - Z\| + \|Z - Y\|, \tag{7.8}$$
which is also referred to as the triangle inequality. Another inequality can be obtained from Inequality 7.7 by replacing $X$ by $X - Y$ to obtain $\|X\| \le \|X - Y\| + \|Y\|$, or $\|X - Y\| \ge \|X\| - \|Y\|$. Interchanging $X$ and $Y$ in the latter inequality and using the fact that $\|X - Y\| = \|Y - X\|$, $\|X - Y\| \ge \|Y\| - \|X\|$. We thus obtain another version of the triangle inequality:
$$\|X - Y\| \ge \big|\,\|X\| - \|Y\|\,\big|. \tag{7.9}$$
With the above definitions in mind, we can now discuss convergence of sequences of random variables.

Definition 7.6 Let $X, X_1, X_2, \ldots$ be random variables having finite second moments. The sequence $\{X_n\}_{n=1}^\infty$ converges in mean square to $X$, written $\text{ms-lim}_{n\to\infty}X_n = X$, if
$$\lim_{n\to\infty}\|X_n - X\| = 0$$
or, equivalently,
$$\lim_{n\to\infty}E[(X_n - X)^2] = 0. \;\blacksquare$$
We can use Inequality 7.8 to show that the mean square limit of a sequence $\{X_n\}$ is unique in the probability sense if it exists. Suppose
$$\lim_{n\to\infty}\|X_n - X\| = 0 \quad\text{and}\quad \lim_{n\to\infty}\|X_n - X'\| = 0.$$
Since
$$0 \le \|X - X'\| \le \|X - X_n\| + \|X_n - X'\| \to 0$$
as $n \to \infty$, $\|X - X'\| = 0$, and therefore $X = X'$ with probability 1.

Now that we have a means for taking limits, we can deal with infinite series. If $\{X_j\}_{j=1}^\infty$ is an infinite sequence of random variables having finite second moments, we can form the infinite series $\sum_{j=1}^\infty X_j$. Letting $S_n = \sum_{j=1}^n X_j$, $n \ge 1$, if there is a random variable $S$ with finite second moment such that $S = \text{ms-lim}_{n\to\infty}S_n$, we write $S = \text{ms-}\sum_{j=1}^\infty X_j$ and say that the series $\sum_{j=1}^\infty X_j$ converges in the mean square sense.
Convergence in mean square also implies convergence of means.

Lemma 7.5.3 Let $X, X_1, X_2, \ldots$ and $Y, Y_1, Y_2, \ldots$ be random variables with finite second moments. If $\text{ms-lim}_{n\to\infty}X_n = X$ and $\text{ms-lim}_{n\to\infty}Y_n = Y$, then

(i) $\lim_{n\to\infty}E[X_n] = E[X]$.

(ii) $\lim_{n\to\infty}E[X_n Y] = E[XY]$.

(iii) $\lim_{n\to\infty}E[X_n^2] = E[X^2]$.

(iv) $\lim_{n\to\infty}E[X_n Y_n] = E[XY]$.

PROOF: Replacing $X$ and $Y$ in Inequality 7.6 by 1 and $|X_n - X|$, respectively, and using Theorem 7.3.2,
$$|E[X_n] - E[X]|^2 \le E[|X_n - X|]^2 \le E[(X_n - X)^2] \to 0$$
as $n \to \infty$. Thus, $\lim_{n\to\infty}E[X_n] = E[X]$ and (i) is proved. By Schwarz's inequality,
$$0 \le |E[X_n Y] - E[XY]|^2 = |E[(X_n - X)Y]|^2 \le E[(X_n - X)^2]E[Y^2] \to 0$$
as $n \to \infty$, and $\lim_{n\to\infty}E[X_n Y] = E[XY]$, so that (ii) is true. Part (iii) is the same as the statement that $\lim_{n\to\infty}\|X_n\|^2 = \|X\|^2$. If we can show that $\lim_{n\to\infty}\|X_n\| = \|X\|$, then (iii) would be proved by continuity of the function $f(x) = x^2$. By Inequality 7.9, $|\,\|X_n\| - \|X\|\,| \le \|X_n - X\| \to 0$ as $n \to \infty$, and (iii) is true. To prove (iv), by Schwarz's inequality,
$$|E[X_n Y_n] - E[XY]| = |E[(X_n - X)Y_n] + E[X(Y_n - Y)]| \le E[|(X_n - X)Y_n|] + E[|X(Y_n - Y)|]$$
$$\le \sqrt{E[(X_n - X)^2]}\sqrt{E[Y_n^2]} + \sqrt{E[X^2]}\sqrt{E[(Y_n - Y)^2]}.$$
Since $\lim_{n\to\infty}E[(X_n - X)^2] = 0$ and $\lim_{n\to\infty}E[(Y_n - Y)^2] = 0$ by hypothesis, and $\lim_{n\to\infty}E[Y_n^2] = E[Y^2]$ by (iii), $\lim_{n\to\infty}E[X_n Y_n] = E[XY]$. ■
In the remainder of this section, we will consider a family of random variables $\{X_t : t \in T\}$ having finite second moments, where $T$ is the set of all real numbers $\mathbb{R}$ or the set of integers $\mathbb{Z}$, assuming that such a family exists.

Definition 7.7 The family or process $\{X_t : t \in T\}$ is weakly stationary if

1. $E[X_s] = E[X_t]$ for all $s, t \in T$.

2. $\operatorname{cov}(X_s, X_t) = \operatorname{cov}(X_{s+h}, X_{t+h})$ for all $h, s, t \in T$. ■

The function
$$r(h) = \operatorname{cov}(X_t, X_{t+h}) = \operatorname{cov}(X_0, X_h), \qquad h \in T,$$
is called the covariance function of the process. To avoid trivialities, we assume that $r(0) = \operatorname{cov}(X_0, X_0) = \operatorname{var} X_0 = \sigma^2 > 0$. Note that for $h > 0$, $r(-h) = \operatorname{cov}(X_t, X_{t-h}) = \operatorname{cov}(X_{t-h}, X_t) = r(h)$, and therefore
$$r(h) = r(-h) = r(|h|), \qquad h \in T.$$
The normalized covariance function
$$\rho(h) = \frac{r(h)}{r(0)}$$
is called the correlation function of the $\{X_t : t \in T\}$ process.
EXAMPLE 7.16 Let $\{X_j\}_{j=-\infty}^{+\infty}$ be a sequence of independent random variables having the same distribution function, finite second moments, and $\sigma^2 = \operatorname{var} X_0$. Then for any $\nu \in \mathbb{Z}$,
$$r(\nu) = E[(X_j - E[X_j])(X_{j+\nu} - E[X_{j+\nu}])] = \begin{cases}\sigma^2 & \text{if } \nu = 0\\ 0 & \text{if } \nu \ne 0.\end{cases} \;\blacksquare$$
EXAMPLE 7.17 Let $U$ and $V$ be uncorrelated random variables (i.e., $\rho(U, V) = 0$) with zero means and unit variances. For $\lambda \in \mathbb{R}$, let
$$X_t = U\cos\lambda t + V\sin\lambda t, \qquad t \in \mathbb{R}.$$
Since $E[X_t] = E[U]\cos\lambda t + E[V]\sin\lambda t = 0$, the covariance function is given by
$$r(h) = \operatorname{cov}(X_t, X_{t+h}) = E[X_t X_{t+h}]$$
$$= E[(U\cos\lambda t + V\sin\lambda t)(U\cos\lambda(t+h) + V\sin\lambda(t+h))]$$
$$= \cos\lambda t\cos\lambda(t+h)E[U^2] + \cos\lambda t\sin\lambda(t+h)E[UV] + \sin\lambda t\cos\lambda(t+h)E[UV] + \sin\lambda t\sin\lambda(t+h)E[V^2].$$
Since $E[U^2] = 1$, $E[V^2] = 1$, and $E[UV] = 0$,
$$r(h) = \cos\lambda t\cos\lambda(t+h) + \sin\lambda t\sin\lambda(t+h) = \cos\lambda h.$$
It follows that the process $\{X_t : t \in \mathbb{R}\}$ is weakly stationary. Since we can write
$$c_1\cos x + c_2\sin x = \sqrt{c_1^2 + c_2^2}\,\sin(x + \theta)$$
where $\theta = \arctan(c_1/c_2)$, $X_t$ can be written
$$X_t = \sqrt{U^2 + V^2}\,\sin(\lambda t + \Theta)$$
where $\Theta = \arctan(U/V)$, a random variable. Thus, $X_t$ is a periodic function with random amplitude $\sqrt{U^2 + V^2}$, random phase shift $\Theta$, and fixed frequency $\lambda/2\pi$. ■
We now consider an example that can serve as a model for the sound
produced by n different tuning forks that are struck at random times.
EXAMPLE 7.18  Let U_0, ..., U_n, V_0, ..., V_n be uncorrelated random
variables with zero means. Assume that U_i and V_i have common variance
σ_i², i = 0, ..., n, and let σ² = σ_0² + ··· + σ_n². Also let λ_0, ..., λ_n be distinct
real numbers. Set

    X_t = Σ_{j=0}^{n} (U_j cos λ_j t + V_j sin λ_j t),    t ∈ R.

Note that E[U_i U_j] = E[V_i V_j] = E[U_i V_j] = 0 whenever i ≠ j and
E[U_i²] = E[V_i²] = σ_i². Clearly,

    E[X_t] = Σ_{j=0}^{n} (E[U_j] cos λ_j t + E[V_j] sin λ_j t) = 0

and

    E[X_t X_{t+h}] = E[ (Σ_{j=0}^{n} (U_j cos λ_j t + V_j sin λ_j t))
                        (Σ_{k=0}^{n} (U_k cos λ_k (t + h) + V_k sin λ_k (t + h))) ]
                  = Σ_{j=0}^{n} (E[U_j²] cos λ_j t cos λ_j (t + h)
                                + E[V_j²] sin λ_j t sin λ_j (t + h))
                  = Σ_{j=0}^{n} σ_j² cos λ_j h.

The process {X_t : t ∈ R} is therefore weakly stationary with covariance
function

    r(h) = Σ_{j=0}^{n} σ_j² cos λ_j h.

Since r(0) = Σ_{j=0}^{n} σ_j² = σ², the correlation function is given by

    ρ(h) = Σ_{j=0}^{n} (σ_j²/σ²) cos λ_j h,    h ∈ R.    (7.10)

As in Example 7.17, X_t can be thought of as a mixture of n + 1 sound waves with
random amplitudes √(U_j² + V_j²), random phase shifts Θ_j with tan Θ_j = −V_j/U_j,
and fixed frequencies λ_j/2π, j = 0, ..., n. ■
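The same kind of simulation check applies to this mixture; the three frequencies and variances below are made-up values:

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 200_000
lams = np.array([1.0, 2.5, 4.0])    # distinct frequencies lambda_j
sig2 = np.array([1.0, 0.5, 0.25])   # variances sigma_j^2
t, h = 0.8, 0.6

U = rng.standard_normal((trials, 3)) * np.sqrt(sig2)
V = rng.standard_normal((trials, 3)) * np.sqrt(sig2)

def X(time):
    """One draw of X_t per trial (sum over the three frequencies)."""
    return (U * np.cos(lams * time) + V * np.sin(lams * time)).sum(axis=1)

r_hat  = np.mean(X(t) * X(t + h))
theory = np.sum(sig2 * np.cos(lams * h))   # r(h) = sum sigma_j^2 cos(lambda_j h)
print(r_hat, theory)
```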
Equation 7.10 suggests that the correlation function ρ(h) of a weakly
stationary process {X_t : t ∈ T} has a representation

    ρ(h) = ∫_{−∞}^{∞} cos λh dF(λ),    h ∈ R,    (7.11)

where F is a distribution function on R. Without going into details, such a
function exists and is called the spectral distribution function. In Example 7.18,
the function F increases only at the points λ_j, by jumps of σ_j²/σ², j = 0, ..., n.

For a weakly stationary process of the type {X_j}_{j=−∞}^{∞}, the correlation
function ρ(v) is defined only for v ∈ Z, and in this case the integrand
cos λv in Equation 7.11 is a periodic function of λ of period 2π since
cos (λ + 2nπ)v = cos λv. In this case, Equation 7.11 can be written

    ρ(v) = Σ_{j=−∞}^{∞} ∫_{(2j−1)π}^{(2j+1)π} cos λv dF(λ).

It can be shown, but will not be done here, that the latter equation can be
written

    ρ(v) = ∫_{(−π,π]} cos λv dF(λ),    v ∈ Z,

where F is a distribution function with F(−π) = 0 and F(π) = 1. In the
particular case that Σ_{v=−∞}^{∞} |ρ(v)| < ∞, the spectral distribution function F
has a density function f, called the spectral density function, so that

    ρ(v) = ∫_{−π}^{π} f(λ) cos λv dλ;

moreover,

    f(λ) = 1/(2π) + (1/π) Σ_{v=1}^{∞} ρ(v) cos λv,    −π < λ < π.    (7.12)
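Equation 7.12 can be tested numerically against the inversion formula above it. The correlation function ρ(v) = a^{|v|} used below is a hypothetical absolutely summable choice, not taken from the text:

```python
import numpy as np

a = 0.5

def rho(v):
    return a ** np.abs(v)    # hypothetical correlation function, |a| < 1

def f(lam, terms=200):
    """Spectral density from Equation 7.12, series truncated at `terms`."""
    v = np.arange(1, terms + 1)
    return 1 / (2 * np.pi) + np.sum(rho(v) * np.cos(lam * v)) / np.pi

# Recover rho(3) by integrating f(lam) cos(3 lam) over a full period
lams = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
recovered = np.sum([f(l) * np.cos(3 * l) for l in lams]) * (2 * np.pi / 20000)
print(recovered, rho(3))    # both close to 0.125
```

Because the truncated density is a trigonometric polynomial, the uniform Riemann sum over one period recovers the coefficient essentially exactly.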
It is sometimes useful to smooth data by using a moving average, as in the
next example.

EXAMPLE 7.19  Consider a sequence {X_j}_{j=−∞}^{∞} of uncorrelated random
variables having the same means μ and variances σ². Fix m ≥ 1 and for each
integer n, let

    Y_n = a_1 X_n + a_2 X_{n−1} + ··· + a_m X_{n−m+1} = Σ_{j=1}^{m} a_j X_{n−j+1},

where a_1, ..., a_m are constants. Put a_j = 0 if j ∉ {1, 2, ..., m}. The sequence
{Y_j}_{j=−∞}^{∞} is called a moving average process. Note that

    E[Y_n] = μ(a_1 + ··· + a_m)   and   var Y_n = σ²(a_1² + ··· + a_m²).

To show that the Y_n process is weakly stationary, fix v ≥ 0 and consider

    cov(Y_n, Y_{n+v}) = Σ_{i=1}^{m} Σ_{j=1}^{m} a_i a_j E[(X_{n−i+1} − μ)(X_{n+v−j+1} − μ)].

Since the terms of this sum are nonzero only when the subscripts of the X's
are equal, the sum is equal to

    σ² Σ_{j=1}^{m} a_{j−v} a_j.

Since this quantity does not depend upon n, the {Y_j}_{j=−∞}^{∞} sequence is weakly
stationary with covariance function

    r(v) = σ² Σ_{j=1}^{m} a_{j−v} a_j.

Since j − v ≥ 1 is required for the first factor to be nonzero, we must have
v ≤ j − 1 ≤ m − 1, and therefore

    r(v) = σ² Σ_{j=v+1}^{m} a_{j−v} a_j.

Therefore,

    r(v) = { σ²(a_{m−v} a_m + a_{m−1−v} a_{m−1} + ··· + a_1 a_{v+1})  if 0 ≤ v ≤ m − 1
           { 0                                                       if v ≥ m.

In particular, if a_j = 1/√m, j = 1, ..., m, then for v ≥ 0,

    r(v) = { σ²(1 − (v/m))  if 0 ≤ v ≤ m − 1
           { 0              if v ≥ m.

Since r(−v) = r(v) = r(|v|),

    r(v) = { σ²(1 − (|v|/m))  if |v| ≤ m − 1
           { 0                if |v| ≥ m. ■
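A simulation sketch of the a_j = 1/√m case (σ² = 2 and m = 5 are arbitrary choices), comparing sample covariances of the smoothed sequence with the triangular covariance function:

```python
import numpy as np

rng = np.random.default_rng(3)
m, sigma2, n = 5, 2.0, 400_000
x = rng.normal(0.0, np.sqrt(sigma2), size=n)                  # uncorrelated X_j
y = np.convolve(x, np.full(m, 1 / np.sqrt(m)), mode="valid")  # Y_n with a_j = 1/sqrt(m)

def r_hat(v):
    """Sample covariance of the moving-average sequence at lag v."""
    yc = y - y.mean()
    if v == 0:
        return np.mean(yc * yc)
    return np.mean(yc[:-v] * yc[v:])

for v in [0, 2, 5]:
    theory = sigma2 * (1 - v / m) if v <= m - 1 else 0.0
    print(v, r_hat(v), theory)
```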
Let {X_j}_{j=−∞}^{∞} be a weakly stationary process. Suppose there is a real number
λ with |λ| < 1 such that

    X_n = λX_{n−1} + N_n,    n ∈ Z,

where the N_j's, representing a "noise" component, are uncorrelated with zero
means and variance σ². By iteration,

    X_n = λ(λX_{n−2} + N_{n−1}) + N_n
        = λ²X_{n−2} + λN_{n−1} + N_n
        ⋮
        = λ^j X_{n−j} + Σ_{i=0}^{j−1} λ^i N_{n−i}.    (7.13)

Thus,

    E[(λ^j X_{n−j})²] = λ^{2j} E[X_{n−j}²].

Since E[X_{n−j}²] = var X_{n−j} + (E[X_{n−j}])² and the {X_j}_{j=−∞}^{∞} process is weakly
stationary, the quantities on the right are independent of n − j, and so

    E[(λ^j X_{n−j})²] = λ^{2j} c

for some constant c ≥ 0. Since |λ| < 1, lim_{j→∞} λ^{2j} = 0, and so

    X_n = ms-lim_{j→∞} Σ_{i=0}^{j−1} λ^i N_{n−i} = Σ_{i=0}^{∞} λ^i N_{n−i}.

The significance of this equation lies in the fact that the {X_j} process has a
"representation" in terms of a sequence of random variables that are much
easier to analyze. This is apparent in the following calculation of the covariance
function of the {X_j}_{j=−∞}^{∞} process. By Lemma 7.5.3,

    E[X_n²] = Σ_{i=0}^{∞} λ^{2i} E[N_{n−i}²] = σ²/(1 − λ²),

since E[N_{n−i}²] = σ² and E[N_{n−i} N_{n−j}] = 0 when i ≠ j. To calculate
cov(X_n, X_{n+k}), note that cov(X_n, X_{n+k}) = E[X_n X_{n+k}], since E[X_n] = 0 for
all n ∈ Z. By Equation 7.13, for k ≥ 1,

    X_{n+k} = λ^k X_n + Σ_{i=0}^{k−1} λ^i N_{n+k−i},

so that

    E[X_n X_{n+k}] = λ^k E[X_n²] + lim_{j→∞} Σ_{l=0}^{j} Σ_{i=0}^{k−1} λ^{l+i} E[N_{n−l} N_{n+k−i}].

Since n + k − i ≥ n + 1 for i = 0, ..., k − 1 and n − l ≤ n for l = 0, 1, ...,
all of the terms in the double sum are zero. Thus,

    E[X_n X_{n+k}] = λ^k E[X_n²] = λ^k σ²/(1 − λ²).

The covariance function of the {X_j}_{j=−∞}^{∞} process is

    r(k) = (σ²/(1 − λ²)) λ^{|k|},    k ∈ Z,

and the correlation function is given by

    ρ(k) = λ^{|k|},    k ∈ Z.
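The covariance and correlation functions just derived can be checked by simulating the recursion directly; λ = 0.6 and Gaussian unit-variance noise are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, sigma2, n = 0.6, 1.0, 300_000
noise = rng.normal(0.0, np.sqrt(sigma2), size=n)

x = np.empty(n)
x[0] = 0.0
for i in range(1, n):                 # X_n = lam * X_{n-1} + N_n
    x[i] = lam * x[i - 1] + noise[i]
x = x[1000:]                          # discard burn-in so the start is forgotten

print(x.var(), sigma2 / (1 - lam**2))        # E[X_n^2] = sigma^2/(1 - lam^2)
for k in [1, 2, 3]:
    rho_hat = np.mean(x[:-k] * x[k:]) / x.var()
    print(k, rho_hat, lam**k)                # rho(k) = lam^|k|
```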
EXERCISES 7.5

1. Let a_1, ..., a_m, b_1, ..., b_m be positive constants, let Z_1, ..., Z_m be
   independent random variables having a uniform density on [0, 2π],
   and let

       X_n = Σ_{j=1}^{m} a_j cos (nb_j + Z_j).

   Show that the sequence {X_j}_{j=−∞}^{∞} is weakly stationary and determine
   its correlation function.

2. Let {X_j}_{j=−∞}^{∞} be a weakly stationary process, let a_1, ..., a_m be constants,
   and let

       Y_n = Σ_{j=1}^{m} a_j X_{n−j+1}.

   Show that the process {Y_j}_{j=−∞}^{∞} is weakly stationary and determine
   its correlation function in terms of the correlation function of the X_j
   process.

3. Consider a sequence of random variables {X_j}_{j=−∞}^{∞} defined as the
   moving average

       X_j = N_j + aN_{j−1},    −∞ < j < ∞,

   where {N_j}_{j=−∞}^{∞} is a weakly stationary sequence of uncorrelated random
   variables having unit variances. Find the spectral density function of
   the X_j process.

4. Consider a sequence of random variables {X_j}_{j=−∞}^{∞} defined as the
   moving average

       X_j = N_j + aN_{j−1} + βN_{j−2},    −∞ < j < ∞,

   where {N_j}_{j=−∞}^{∞} is a weakly stationary sequence having unit variances.
   Find the spectral density of the X_j process.

5. Let Λ and Θ be independent random variables where Λ takes on the
   values λ_1, ..., λ_m with probabilities p_1, ..., p_m and Θ has a uniform
   density on [0, 2π], and let

       X_t = cos (Λt + Θ),    t ∈ R.

   Show that the process {X_t : t ∈ R} is weakly stationary and determine
   its correlation function.
SUPPLEMENTAL READING LIST

1. T. M. Apostol (1957). Mathematical Analysis. Reading, Mass.: Addison-Wesley.
2. P. J. Flory (1969). Statistical Mechanics of Chain Molecules. New York: Wiley/Interscience.
8  CONTINUOUS PARAMETER MARKOV PROCESSES

8.1 INTRODUCTION
In this chapter, we will pursue a path that is less dependent upon the structure
of the probability space and more dependent upon macro properties of
nonrandom functions.
In the next section, starting with a few heuristic principles governing the
probabilistic behavior of an evolving system in small time intervals, a system
of differential equations is derived and solved, resulting in time dependent
probability functions that describe a process known as the Poisson process.
This process plays an important role in waiting time models.
The Poisson process is a special case of a more general class of processes that
can be described by time dependent probability functions, called continuous
parameter Markov chains. Starting with a set of equations that the probability
functions must satisfy, it is shown that the functions satisfy a system of
differential equations. Using matrix calculus, a method is developed for solv­
ing such systems.
8.2 POISSON PROCESS
Consider an experimental situation in which events occur at random times.
For example, calls to a mainframe computer may arrive at random times
0 < t_1 ≤ t_2 ≤ ···. If a counter is initiated at time 0, it will increase to 1
at time t_1, increase to 2 at time t_2, and so forth. The outcome in this case is
FIGURE 8.1 Counter outcome.
a function ω(t) on [0, ∞) with graph as depicted in Figure 8.1. A probability
space for this type of experimental situation would consist of all such ω. The
construction of such a probability space is better left to more advanced texts.
A different approach, which avoids such constructions, will be followed here.
This approach entails the derivation of some equations based on heuristic
arguments.
In the following discussion, o(h) (read "little o of h") will be a generic
symbol for a function of h that satisfies the condition

    lim_{h→0+} o(h)/h = 0.
We will assume the following properties of the counter process just described.
Independently of the number of occurrences of events in the interval (0, t), for
small h > 0:
(i) The probability that an event will occur in the
interval (t, t + h) is λh + o(h),
(ii) The probability that no event will occur in the
interval (t, t + h) is 1 − λh + o(h), and
(iii) The probability of two or more events occurring
in the interval (t, t + h) is o(h),
where λ is a positive constant. These might be reasonable assumptions for the
situation described above for periods when saturation is unlikely.
Assuming there is an appropriate probability space for which these assumptions
are valid, let P_n(t) be the probability that n events will occur in the time
interval (0, t). Consider adjacent time intervals (0, t) and (t, t + h). If n ≥ 1
events occur in the interval (0, t + h), then one of three things must be true:
(1) n of the events occur in (0, t) and none occur in (t, t + h), (2) n — 1 events
occur in (0, t) and one occurs in (t, t + h), or (3) two or more of the n events
occur in (t, t + h). Since these are mutually exclusive possibilities and there is
independence between events occurring in (0, t) and (t, t + h),

    P_n(t + h) = P_n(t)(1 − λh + o(h)) + P_{n−1}(t)(λh + o(h)) + o(h)

or

    P_n(t + h) = P_n(t)(1 − λh) + P_{n−1}(t)λh + o(h),

where the o(h) in the first equation has been replaced by an o(h) of the form
P_n(t)o(h) + P_{n−1}(t)o(h). Therefore,

    (P_n(t + h) − P_n(t))/h = −λP_n(t) + λP_{n−1}(t) + o(h)/h.

Letting h → 0+,

    P_n'(t) = −λP_n(t) + λP_{n−1}(t),    n ≥ 1, t > 0,    (8.1)

assuming, of course, that the P_n(t) are differentiable. It is necessary to consider
the n = 0 case separately since only (1) holds in this case. Thus,

    P_0(t + h) = P_0(t)(1 − λh + o(h)),

so that

    (P_0(t + h) − P_0(t))/h = −λP_0(t) + P_0(t) o(h)/h.

Letting h → 0+, we obtain the differential equation

    P_0'(t) = −λP_0(t),    t > 0.    (8.2)
In deriving Equations 8.1 and 8.2, we let h —> 0+ so that the above derivatives
are really right derivatives. By considering the intervals (0, t — h) and (t — h, t),
these equations can be seen to hold for left derivatives also and therefore for
unrestricted derivatives. The P_n(t), n ≥ 0, must satisfy the initial conditions

    P_0(0) = 1    (8.3)

and

    P_n(0) = 0,    n ≥ 1.    (8.4)

We will now undertake to solve these differential equations subject to
the stated initial conditions. Consider first the differential equation P_0'(t) =
−λP_0(t) subject to the initial condition P_0(0) = 1. It is easily seen that the
solution is
    P_0(t) = e^{−λt},    t ≥ 0.

Putting n = 1 in Equation 8.1 and substituting e^{−λt} for P_0(t), we find that
P_1(t) satisfies the equation

    P_1'(t) = −λP_1(t) + λe^{−λt}

and the initial condition P_1(0) = 0. To solve the equation, write it as

    P_1'(t) + λP_1(t) = λe^{−λt}.

Multiplying both sides by the factor e^{λt}, the equation can be written

    d/dt (e^{λt} P_1(t)) = λ.

Integrating, P_1(t) = λte^{−λt} + ce^{−λt}. The integration constant c must be zero
to satisfy the initial condition P_1(0) = 0. Thus,

    P_1(t) = λte^{−λt}.

Repeating the same steps for the n = 2 case, we find

    P_2(t) = (λ²t²/2!) e^{−λt}.

By mathematical induction,

    P_n(t) = (λⁿtⁿ/n!) e^{−λt},    n ≥ 0, t ≥ 0.    (8.5)
The heuristic description has resulted in a collection of specific functions.
If there is any validity to this procedure, we should be able to start with the end
product—namely, the P_n(t) functions—and construct a probability model
from which the differential equations above can be deduced.
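One way to test the heuristic assumptions (i)–(iii) is to simulate the counter on a fine grid, letting each cell of width h contain an event with probability λh, and compare the counts at time t with Equation 8.5 (the values of λ, t, and h below are arbitrary):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(5)
lam, t, h = 2.0, 2.0, 0.01
steps, trials = int(t / h), 30_000

# assumption (i): one event per small interval with probability lam * h
events = rng.random((trials, steps)) < lam * h
X_t = events.sum(axis=1)

for n in range(5):
    empirical = np.mean(X_t == n)
    poisson = (lam * t) ** n / factorial(n) * exp(-lam * t)   # Equation 8.5
    print(n, empirical, poisson)
```

The grid count is binomial rather than Poisson, but for small λh the two distributions are close, as the heuristic derivation predicts.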
To construct such a process, we would take Ω to consist of all real-valued
nondecreasing step functions ω on [0, ∞) with ω(0) = 0 that increase only
by unit jumps. The probability function P would be defined as follows.
For each t ≥ 0, let X_t(ω) = ω(t). If 0 ≤ t_1 < t_2 < ··· < t_k and
0 ≤ n_1 ≤ n_2 ≤ ··· ≤ n_k, let

    P(X_{t_1} = n_1, ..., X_{t_k} = n_k)
      = P(X_{t_1} − X_0 = n_1, X_{t_2} − X_{t_1} = n_2 − n_1, ..., X_{t_k} − X_{t_{k−1}} = n_k − n_{k−1})
      = P_{n_1}(t_1) × P_{n_2−n_1}(t_2 − t_1) × ··· × P_{n_k−n_{k−1}}(t_k − t_{k−1});
i.e., probabilities are assigned so that the increments

    X_{t_1} − X_0, X_{t_2} − X_{t_1}, ..., X_{t_k} − X_{t_{k−1}}

are independent random variables with

    P(X_{t_j} − X_{t_{j−1}} = n) = P_n(t_j − t_{j−1}) = (λⁿ(t_j − t_{j−1})ⁿ/n!) e^{−λ(t_j − t_{j−1})},
    j = 1, ..., k.

We can now state the formal definition of a Poisson process on a given
probability space (Ω, ℱ, P).

Definition 8.1
The family of random variables {X_t : t ≥ 0} is a Poisson process with rate λ > 0
if

1. P(X_0 = 0) = 1,
2. X_{t_2} − X_{t_1}, ..., X_{t_k} − X_{t_{k−1}} are independent
   random variables whenever 0 ≤ t_1 < t_2 < ··· < t_k, and
3. P(X_t − X_s = n) = (λⁿ(t − s)ⁿ/n!) e^{−λ(t−s)}, n ∈ N,
   whenever 0 ≤ s < t. ■
Let {X_t : t ≥ 0} be a Poisson process with parameter λ > 0. Taking s = 0
in (3), X_t has a Poisson density with parameter λt, so that E[X_t] = λt and
var X_t = λt (see Example 4.15). We can consider the time at which the first
event occurs by letting W_1 be the first time t that X_t = 1. Assuming that
W_1 is a random variable, the density of W_1 can be obtained as follows. If
t ≥ 0, then W_1 ≤ t if and only if X_t ≥ 1, so that F_{W_1}(t) = P(W_1 ≤ t) =
P(X_t ≥ 1) = P(X_t − X_0 ≥ 1), and therefore

    F_{W_1}(t) = Σ_{n=1}^{∞} (λⁿtⁿ/n!) e^{−λt} = e^{−λt}(e^{λt} − 1) = 1 − e^{−λt}.

It follows that the density of W_1 is given by

    f_{W_1}(t) = { λe^{−λt}  if t ≥ 0
                { 0         if t < 0;

i.e., W_1 has an exponential density with parameter λ, and it follows from
Exercise 7.2.5 that E[W_1] = 1/λ. The parameter λ is called the rate. The
greater the rate of occurrence of events, the smaller the waiting time for an
event to occur.
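The exponential law for W_1 can be checked with the same small-interval picture: the first event falls in the k-th cell of width h with geometric probability, so W_1 is approximately h times a geometric variable (the parameter choices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, h, trials = 2.0, 1e-4, 200_000

# index of the first small interval containing an event, times the cell width h
W1 = h * rng.geometric(p=lam * h, size=trials)

print(W1.mean(), 1 / lam)                      # E[W_1] = 1/lambda
print(np.mean(W1 <= 1.0), 1 - np.exp(-lam))    # F_{W_1}(1) = 1 - e^{-lambda}
```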
EXERCISES 8.2

1. Let {X_t : t ≥ 0} be a Poisson process with rate λ > 0. If 0 < s < t
   and 0 ≤ k ≤ n, calculate P(X_s = k | X_t = n).

2. Let {X_t^{(i)} : t ≥ 0}, i = 1, ..., n, be independent Poisson processes with
   the same rate λ > 0. Find the density of the waiting time for all n of the
   processes to have at least one event occur.

3. Let {X_t : t ≥ 0} be a Poisson process with rate λ > 0. By writing
   X_t = Σ_{k=1}^{[t]} (X_k − X_{k−1}) + (X_t − X_{[t]}), where [t] denotes the largest
   integer less than or equal to t, show that P(lim_{t→∞} X_t = +∞) = 1 by
   considering the events A_k = (X_k − X_{k−1} = 1).

4. Suppose the rate λ = λ(t) is a nonnegative function on [0, +∞) that is
   Riemann integrable on each finite interval and define P_n(t) as before.
   Then P_n(t) satisfies Equations 8.1, 8.3, and 8.4. Let

       G(t, s) = Σ_{n=0}^{∞} P_n(t) sⁿ,    −1 < s < 1, t ≥ 0,

   be the generating function of the sequence {P_n(t)}_{n=0}^{∞}.
   (a) Verify that G(t, s) satisfies the equation

           ∂G/∂t = −λ(t)(1 − s)G,    −1 < s < 1, t > 0,

       and the initial condition

           G(0, s) = 1,    −1 < s < 1.

   (b) Verify that G(t, s) = e^{−(1−s) ∫_0^t λ(u) du} satisfies these conditions.
   (c) Verify that

           P_n(t) = (1/n!) (∫_0^t λ(u) du)ⁿ e^{−∫_0^t λ(u) du}

       and that

           E[X_t] = ∫_0^t λ(u) du.

5. Use the results of the previous problem to determine P_n(t), n ≥ 0, and
   E[X_t] for λ(t) = 1/(1 + t).

6. Use the result of Problem 5 to approximate P_10(100) = P(X_100 = 10)
   when λ(t) = 1/(1 + t).
8.3 BIRTH AND DEATH PROCESSES
The Poisson process is an example of a birth process in which the population
size only increases. Realistic models for population growth must not only
incorporate deaths but also allow the possibility that birth and death rates
depend upon population size.
Assuming that there is a probability space (Ω, ℱ, P) and a process
{X_t : t ≥ 0} reflecting population growth, we will assume that for h > 0,

1. P(X_{t+h} − X_t = 1 | X_t = n) = β_n h + o(h),
2. P(X_{t+h} − X_t = −1 | X_t = n) = δ_n h + o(h),
3. P(X_{t+h} − X_t = 0 | X_t = n) = 1 − (β_n + δ_n)h + o(h), and
4. P(|X_{t+h} − X_t| ≥ 2) = o(h),

where β_n ≥ 0, δ_0 = 0, and δ_n ≥ 0 for all n ≥ 1. We will also assume that
X_0 = n_0 where n_0 ≥ 1 is the initial population size, so that X_t represents
the population size at time t. Note that there is no mention of independence
between population size in (0, t) and changes in (t, t + h); in fact, independence
will be lacking because birth and death rates in (t, t + h) can depend upon
population size in (0, t). Letting P_n(t) = P(X_t = n), a system of differential
equations for the P_n(t) can be derived as follows. Note first of all that
P_{n_0}(0) = P(X_0 = n_0) = 1 and that P_n(0) = P(X_0 = n) = 0 for all
n ≠ n_0. Consider first P_0(t). For h > 0,
    P_0(t + h) = P(X_{t+h} = 0)
      = P(X_{t+h} = 0, X_t = 0) + P(X_{t+h} = 0, X_t = 1) + P(X_{t+h} = 0, X_t ≥ 2)
      = P(X_{t+h} = 0 | X_t = 0)P(X_t = 0) + P(X_{t+h} = 0 | X_t = 1)P(X_t = 1)
        + P(X_{t+h} = 0, X_t ≥ 2)
      = (1 − β_0 h + o(h))P_0(t) + (δ_1 h + o(h))P_1(t) + o(h).

Therefore,

    (P_0(t + h) − P_0(t))/h = −β_0 P_0(t) + δ_1 P_1(t) + o(h)/h.

Letting h → 0+, we obtain

    P_0'(t) = −β_0 P_0(t) + δ_1 P_1(t),    t > 0.

The derivative in this equation is a right derivative, but by replacing t by t − h,
the same equation holds for the left derivative and therefore for the derivative.
Consider now P_n(t) for n ≥ 1. Proceeding as before,
    P_n(t + h) = Σ_{k=0}^{∞} P(X_{t+h} = n, X_t = k)
      = Σ_{k=0}^{∞} P(X_{t+h} − X_t = n − k, X_t = k)
      = P(X_{t+h} − X_t = −1, X_t = n + 1)
        + P(X_{t+h} − X_t = 0, X_t = n)
        + P(X_{t+h} − X_t = 1, X_t = n − 1)
        + Σ_{|k−n| ≥ 2} P(X_{t+h} − X_t = n − k, X_t = k).

Since Σ_{|k−n| ≥ 2} P(X_{t+h} − X_t = n − k, X_t = k) ≤ P(|X_{t+h} − X_t| ≥ 2) = o(h),

    P_n(t + h) = (δ_{n+1}h + o(h)) P_{n+1}(t) + (1 − (β_n + δ_n)h + o(h)) P_n(t)
                 + (β_{n−1}h + o(h)) P_{n−1}(t) + o(h).

Therefore,

    (P_n(t + h) − P_n(t))/h = δ_{n+1} P_{n+1}(t) − (β_n + δ_n) P_n(t)
                              + β_{n−1} P_{n−1}(t) + o(h)/h.

Letting h → 0+ (and also letting h → 0+ after replacing t by t − h),

    P_n'(t) = δ_{n+1} P_{n+1}(t) − (β_n + δ_n) P_n(t) + β_{n−1} P_{n−1}(t).

The functions P_0(t), P_1(t), ... therefore satisfy the system of differential
equations

    { P_0'(t) = −β_0 P_0(t) + δ_1 P_1(t),                                      t > 0
    { P_n'(t) = δ_{n+1} P_{n+1}(t) − (β_n + δ_n) P_n(t) + β_{n−1} P_{n−1}(t),  n ≥ 1, t > 0    (8.6)

subject to the initial conditions

    P_{n_0}(0) = 1,    P_n(0) = 0 for n ≠ n_0.    (8.7)

We will now specialize by considering a pure birth process for which
δ_n = 0, n ≥ 0. In this case, the system of differential equations becomes

    { P_0'(t) = −β_0 P_0(t)
    { P_n'(t) = −β_n P_n(t) + β_{n−1} P_{n−1}(t),    n ≥ 1.    (8.8)
Suppose n_0 > 0. The general solution of the first equation has the form
P_0(t) = c_0 e^{−β_0 t}; to satisfy the initial condition in Equation 8.7, we must have
c_0 = 0, in which case P_0(t) = 0 and the second equation in 8.8 for n = 1
reduces to

    P_1'(t) = −β_1 P_1(t).

If n_0 > 1, again P_1(t) = 0 and the second equation in 8.8 reduces to

    P_2'(t) = −β_2 P_2(t).

Continuing in this way, we arrive at the fact that P_n(t) = 0 for all 0 ≤ n < n_0
and that

    P_{n_0}(t) = e^{−β_{n_0} t}.

Consider the second equation in 8.8 for n > n_0; after multiplying both
sides by e^{β_n t}, it can be written

    d/dt (e^{β_n t} P_n(t)) = β_{n−1} e^{β_n t} P_{n−1}(t).

Integrating from 0 to t and using the initial condition P_n(0) = 0 for n ≠ n_0,
we obtain the recurrence relation

    P_n(t) = β_{n−1} e^{−β_n t} ∫_0^t e^{β_n s} P_{n−1}(s) ds.    (8.9)

Since P_{n_0}(t) is known, this equation can be used to generate the P_n(t)
successively. Note that P_n(t) ≥ 0 for all n ≥ 1, t ≥ 0.
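As a sanity check on the recurrence, take constant rates β_n = λ with n_0 = 0: the pure birth process is then the Poisson process of Section 8.2, so iterating Equation 8.9 numerically with a trapezoid rule should reproduce Equation 8.5 (λ, the horizon, and the grid size below are arbitrary):

```python
import numpy as np
from math import exp, factorial

lam, T, steps = 1.5, 2.0, 20_000
t = np.linspace(0.0, T, steps + 1)
dt = T / steps

P = np.exp(-lam * t)                 # P_{n_0}(t) = e^{-lam t} with n_0 = 0
for n in range(1, 6):
    # Equation 8.9: P_n(t) = lam e^{-lam t} * integral_0^t e^{lam s} P_{n-1}(s) ds
    integrand = np.exp(lam * t) * P
    cum = np.concatenate(([0.0],
        np.cumsum((integrand[1:] + integrand[:-1]) * dt / 2)))
    P = lam * np.exp(-lam * t) * cum

exact = (lam * T) ** 5 / factorial(5) * exp(-lam * T)   # Equation 8.5, n = 5
print(P[-1], exact)
```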
It is conceivable that in certain populations the birth rates might be so great
that the population "explodes" or becomes infinite in a finite time interval, and
it is of interest to consider the probability

    P(X_t = +∞) = 1 − Σ_{n=0}^{∞} P(X_t = n) = 1 − Σ_{n=0}^{∞} P_n(t).

Thus, P(X_t = +∞) > 0 if and only if Σ_{n=0}^{∞} P_n(t) < 1. An explosion will not
occur if P(X_t = +∞) = 0 for all t > 0; i.e., if Σ_{n=0}^{∞} P_n(t) = 1 for all t > 0.
A criterion for the latter is given by the following theorem.
Theorem 8.3.1
Σ_{n=0}^{∞} P_n(t) = 1 for all t ≥ 0 if and only if the series Σ_{n=0}^{∞} 1/β_n diverges.

PROOF: Letting S_k(t) = Σ_{n=0}^{k} P_n(t), S_k'(t) = Σ_{n=0}^{k} P_n'(t). By Equation 8.8,

    S_k'(t) = −β_0 P_0(t) + Σ_{n=1}^{k} (−β_n P_n(t) + β_{n−1} P_{n−1}(t)) = −β_k P_k(t).
Integrating from 0 to t and using the initial conditions in 8.7,

    1 − S_k(t) = β_k ∫_0^t P_k(s) ds.    (8.10)

Since the terms defining the sums S_k(t) are nonnegative, for each t the left side
decreases monotonically as k increases, and so the right side decreases in the
same way. Let

    μ(t) = lim_{k→∞} β_k ∫_0^t P_k(s) ds ≥ 0.

Thus,

    ∫_0^t P_k(s) ds ≥ μ(t)/β_k.

Using the fact that S_k(t) = Σ_{n=0}^{k} P_n(t) = Σ_{n=0}^{k} P(X_t = n) = P(0 ≤ X_t ≤ k) ≤ 1
and summing for k = 0, ..., n,

    t ≥ ∫_0^t S_n(s) ds ≥ μ(t) Σ_{k=0}^{n} 1/β_k.

If the series Σ_{k=0}^{∞} 1/β_k diverges, then μ(t) must be zero for all t,

    lim_{k→∞} (1 − S_k(t)) = lim_{k→∞} β_k ∫_0^t P_k(s) ds = μ(t) = 0,

and therefore Σ_{n=0}^{∞} P_n(t) = lim_{k→∞} S_k(t) = 1 for all t ≥ 0. On the other
hand, by Equation 8.10,

    ∫_0^t S_k(s) ds = Σ_{n=0}^{k} ∫_0^t P_n(s) ds = Σ_{n=0}^{k} (1 − S_n(t))/β_n ≤ Σ_{n=0}^{k} 1/β_n.

Since S_k(s) increases with k, the integral on the left increases with k and

    lim_{k→∞} ∫_0^t S_k(s) ds ≤ Σ_{n=0}^{∞} 1/β_n.

If the limit can be taken past the integral sign (which is permissible in this
case),

    ∫_0^t (lim_{k→∞} S_k(s)) ds ≤ Σ_{n=0}^{∞} 1/β_n.

If lim_{k→∞} S_k(t) = 1 for all t ≥ 0, we would have

    t ≤ Σ_{n=0}^{∞} 1/β_n

for all t > 0. Since t can be arbitrarily large, Σ_{n=0}^{∞} 1/β_n = +∞. Thus,
Σ_{n=0}^{∞} P_n(t) = 1 for all t ≥ 0 implies that the series Σ_{n=0}^{∞} 1/β_n diverges. ■
Generally speaking, the differential equations in 8.6 for a birth and death
process are difficult to solve because they must be solved simultaneously, as
opposed to the pure birth case where they can be solved sequentially starting
with the first equation in 8.8. There are methods of obtaining qualitative
information about population growth even if the equations in 8.6 cannot be
solved.
EXAMPLE 8.1  Consider a birth and death process with β_n = βn and
δ_n = δn, n ≥ 0, where β, δ > 0. We will assume that n_0 = 1, so that
X_0 = 1. The initial conditions are then

    P_1(0) = 1,    P_n(0) = 0,    n ≠ 1.

Consider M(t) = E[X_t] = Σ_{n=0}^{∞} n P(X_t = n) = Σ_{n=0}^{∞} n P_n(t). Note that
M(0) = Σ_{n=0}^{∞} n P_n(0) = 1 and that M'(t) = Σ_{n=1}^{∞} n P_n'(t), formally at
least. Multiplying both sides of the second equation in 8.6 by n and summing
over n = 1, 2, ...,

    M'(t) = δ Σ_{n=1}^{∞} n(n + 1) P_{n+1}(t) − (β + δ) Σ_{n=1}^{∞} n² P_n(t)
            + β Σ_{n=1}^{∞} n(n − 1) P_{n−1}(t)
          = (β − δ) Σ_{n=1}^{∞} n P_n(t)
          = (β − δ) M(t).

The average population size M(t) therefore satisfies the differential equation

    M'(t) = (β − δ) M(t)

subject to the initial condition M(0) = 1. Thus,

    M(t) = { e^{(β−δ)t}  if β ≠ δ
           { 1           if β = δ.
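The mean-growth law can be checked with a small event-driven simulation of the linear birth and death process; the event-by-event (Gillespie-style) stepping below is an implementation device, not something taken from the text, and the rates and horizon are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
beta, delta, T, trials = 1.0, 0.5, 1.0, 20_000

def population_at(T):
    """Simulate X_T for beta_n = beta*n, delta_n = delta*n, X_0 = 1."""
    n, t = 1, 0.0
    while n > 0:
        # with population n, the total event rate is (beta + delta) * n
        t += rng.exponential(1.0 / ((beta + delta) * n))
        if t > T:
            break
        # the event is a birth with probability beta/(beta + delta)
        n += 1 if rng.random() < beta / (beta + delta) else -1
    return n    # n = 0 is absorbing: the population stays extinct

mean_hat = np.mean([population_at(T) for _ in range(trials)])
print(mean_hat, np.exp((beta - delta) * T))    # M(1) = e^{(beta-delta) * 1}
```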
This function also gives qualitative information about the long-term behavior
of population size, since

    lim_{t→∞} M(t) = { +∞  if β > δ
                     { 0    if β < δ
                     { 1    if β = δ. ■

EXERCISES 8.3

1. Consider a pure birth process with β_n = βn, δ_n = 0, n ≥ 0, and
   X_0 = 1. Calculate P_1(t), P_2(t), and P_3(t).

2. Consider the pure birth process of the previous problem. Use mathematical
   induction to prove that P_n(t) = e^{−βt}(1 − e^{−βt})^{n−1}, n ≥ 2.

3. Explain what happens to the pure birth equations in 8.8 when β_i >
   0, i = 1, ..., n − 1, and β_n = 0.

4. Calculate E[X_t] for a birth and death process with β_n = α + βn,
   δ_n = δn, n ≥ 0, and X_0 = 1.

5. Consider the pure birth process for which δ_n = 0, β_n = βn, n ≥ 0.
   Then the P_n(t) satisfy Equation 8.8 and the initial conditions in 8.7.
   Let

       G(t, s) = Σ_{n=n_0}^{∞} P_n(t) sⁿ,    −1 < s < 1, t ≥ 0,

   be the generating function of the sequence {P_n(t)}_{n=0}^{∞}.
   (a) Verify that G(t, s) satisfies the equation

           ∂G/∂t + βs(1 − s) ∂G/∂s = 0

       and the initial condition G(0, s) = s^{n_0}.
   (b) Verify that

           G(t, s) = s^{n_0} (e^{−βt} / (1 − s(1 − e^{−βt})))^{n_0}

       satisfies the conditions of the previous problem.
   (c) Determine E[X_t].
   (d) Determine P_n(t), n ≥ 0, in the n_0 = 1 case.
8.4 MARKOV CHAINS

Let S = {s_1, s_2, ...} be a finite or countably infinite collection of objects called
states. We will now describe a model for moving among the states in a random
way.

Definition 8.2
A collection of random variables {X_t : t ≥ 0} is called a Markov chain if

    P(X_{t_n} = s_{i_n} | X_{t_1} = s_{i_1}, ..., X_{t_{n−1}} = s_{i_{n−1}})
      = P(X_{t_n} = s_{i_n} | X_{t_{n−1}} = s_{i_{n−1}})

whenever 0 ≤ t_1 < t_2 < ··· < t_n and s_{i_1}, ..., s_{i_n} ∈ S. ■

This defining property is called the Markov property. If we regard t_{n−1}
as the present time, this property requires that the probability of a future event
given the past does not depend upon the remote past or, put another way, the
process does not have a memory.
Definition 8.3
The transition function of the Markov chain {X_t : t ≥ 0} is defined by

    p_{i,j}(s, t) = P(X_t = s_j | X_s = s_i),    0 ≤ s < t, s_i, s_j ∈ S.

The chain has stationary transition functions if the p_{i,j}(s, t) depend only upon
t − s. ■

We will consider only Markov chains {X_t : t ≥ 0} with stationary transition
function

    p_{i,j}(t) = P(X_{t+s} = s_j | X_s = s_i),    i, j ≥ 1, s, t ≥ 0,    (8.11)

which satisfies the additional continuity condition that

    lim_{t→0+} p_{i,j}(t) = δ_{i,j},    (8.12)

where δ_{i,j} = 1 if i = j and δ_{i,j} = 0 if i ≠ j. For fixed t ≥ 0, the numbers
p_{i,j}(t) can be displayed in the following matrix form:

    P(t) = [ p_{1,1}(t)  p_{1,2}(t)  ···
             p_{2,1}(t)  p_{2,2}(t)  ···
             ⋮           ⋮
             p_{i,1}(t)  p_{i,2}(t)  ···
             ⋮           ⋮               ],

which is called the stationary transition matrix. If S is finite, this matrix is
|S| × |S|; if S is countably infinite, it has an infinite number of rows and
columns.
EXAMPLE 8.2  Let {X_t : t ≥ 0} be a Poisson process with rate λ > 0,
let 0 ≤ t_1 < ··· < t_k and 0 ≤ n_1 ≤ n_2 ≤ ··· ≤ n_k, and let
P_n(t) = P(X_t = n) as in Section 8.2. Since the process has independent
increments,

    P(X_{t_k} = n_k | X_{t_1} = n_1, ..., X_{t_{k−1}} = n_{k−1})
      = P(X_{t_1} − X_0 = n_1, X_{t_2} − X_{t_1} = n_2 − n_1, ..., X_{t_k} − X_{t_{k−1}} = n_k − n_{k−1})
        / P(X_{t_1} − X_0 = n_1, ..., X_{t_{k−1}} − X_{t_{k−2}} = n_{k−1} − n_{k−2})
      = P(X_{t_k} − X_{t_{k−1}} = n_k − n_{k−1})
      = P_{n_k − n_{k−1}}(t_k − t_{k−1}).

Calculating P(X_{t_k} = n_k | X_{t_{k−1}} = n_{k−1}) in the same way, it too is equal to
P_{n_k − n_{k−1}}(t_k − t_{k−1}). According to Equation 8.5,

    P(X_{t_k} = n_k | X_{t_1} = n_1, ..., X_{t_{k−1}} = n_{k−1})
      = P(X_{t_k} = n_k | X_{t_{k−1}} = n_{k−1})
      = P_{n_k − n_{k−1}}(t_k − t_{k−1})
      = (λ^{n_k − n_{k−1}} (t_k − t_{k−1})^{n_k − n_{k−1}} / (n_k − n_{k−1})!) e^{−λ(t_k − t_{k−1})},

so that {X_t : t ≥ 0} is a Markov chain with stationary transition function

    p_{m,n}(t) = { (λ^{n−m} t^{n−m} / (n − m)!) e^{−λt}  if n ≥ m, t ≥ 0
                 { 0                                    otherwise. ■
The stationary transition functions p_{i,j} inherit an important property from
the Markov property. The equation in (iii) of the following theorem is known
as the Chapman-Kolmogorov equation.
Theorem 8.4.1
The transition functions p_{i,j} have the following properties:

(i) p_{i,j}(0) = δ_{i,j}, i, j ≥ 1.
(ii) Σ_j p_{i,j}(t) = 1 for all t ≥ 0.
(iii) p_{i,j}(s + t) = Σ_k p_{i,k}(s) p_{k,j}(t) if s, t ≥ 0.

PROOF: We first prove (iii). By the Markov property,

    p_{i,j}(s + t) = P(X_{t+s} = s_j | X_0 = s_i)
      = Σ_k P(X_{t+s} = s_j, X_s = s_k, X_0 = s_i) / P(X_0 = s_i)
      = Σ_k P(X_{t+s} = s_j | X_0 = s_i, X_s = s_k) P(X_s = s_k | X_0 = s_i)
      = Σ_k P(X_{t+s} = s_j | X_s = s_k) P(X_s = s_k | X_0 = s_i)
      = Σ_k p_{i,k}(s) p_{k,j}(t).

Since P(X_0 = s_j | X_0 = s_i) = P(X_0 = s_j, X_0 = s_i)/P(X_0 = s_i) is 0 or 1
according to whether i ≠ j or i = j, respectively, p_{i,j}(0) = δ_{i,j} and (i) holds.
Finally,

    Σ_j p_{i,j}(t) = Σ_j P(X_t = s_j | X_0 = s_i) = P(X_t ∈ S | X_0 = s_i) = 1,

and (ii) holds. ■
The three properties listed in this theorem in conjunction with the continuity
assumption 8.12 imply a good deal more about the transition functions
p_{i,j}. Additional properties will be reviewed briefly for the purpose of clarifying
applications. Proofs of these facts require a little more mathematical
background than is presupposed here. Complete details are available in Chapter 14
of the book by Karlin and Taylor listed at the end of the chapter. The first facts
concern differentiability of the transition functions p_{i,j}.

Theorem 8.4.2
(i) For every i,

    q_{i,i} = p_{i,i}'(0) = lim_{t→0+} (p_{i,i}(t) − 1)/t

exists but may be −∞.
(ii) For all i, j with i ≠ j,

    q_{i,j} = p_{i,j}'(0) = lim_{t→0+} p_{i,j}(t)/t

exists and is finite.

Since Σ_j p_{i,j}(t) = 1, it might be tempting to take the derivative term by
term to show that

    Σ_j q_{i,j} = 0.

This is certainly a valid step if the sum Σ_j p_{i,j}(t) is a finite sum, as it is in many
applications, but in general the most that can be proved is that for each i,

    −q_{i,i} ≥ Σ_{j≠i} q_{i,j}.
We will assume in the remainder of this section that

    each q_{i,i} is finite and Σ_j q_{i,j} = 0 for every i.    (8.13)

This requirement means that all of the entries in the following matrix, called
the q-matrix, are finite and that each row sum is zero:

    Q = [ q_{1,1}  q_{1,2}  ···
          q_{2,1}  q_{2,2}  ···
          ⋮        ⋮            ].

In matrix notation, Q1 = 0 where 1 is a column vector all of whose entries
are 1.

Not only are the p_{i,j}(t) differentiable at 0, but p_{i,j}'(t) is defined for all t > 0.
Consider the equations

    p_{i,j}(t + s) − p_{i,j}(t) = Σ_k p_{i,k}(s) p_{k,j}(t) − p_{i,j}(t)
                               = Σ_{k≠i} p_{i,k}(s) p_{k,j}(t) + (p_{i,i}(s) − 1) p_{i,j}(t).

Proceeding formally by dividing by s and letting s → 0+, we obtain the
following equations, which are known as the Kolmogorov backward equations:

    p_{i,j}'(t) = Σ_{k≠i} q_{i,k} p_{k,j}(t) + q_{i,i} p_{i,j}(t),    t ≥ 0.    (8.14)
On the other hand, we can write

    p_{i,j}(t + s) − p_{i,j}(s) = Σ_k p_{i,k}(s) p_{k,j}(t) − Σ_k p_{i,k}(s) δ_{k,j}
                               = Σ_k p_{i,k}(s) (p_{k,j}(t) − δ_{k,j}).

Operating formally again by dividing by t and letting t → 0+, we obtain the
following equations, which are known as the Kolmogorov forward equations:

    p_{i,j}'(t) = Σ_{k≠j} p_{i,k}(t) q_{k,j} + q_{j,j} p_{i,j}(t),    t ≥ 0.    (8.15)
If the state space S is infinite, then both sets of forward and backward
equations represent an infinite system of differential equations that must be
solved simultaneously. Both sets of equations take on deceptively simple forms
if expressed in matrix notation. Letting P'(t) = [p_{i,j}'(t)] be the matrix with
entries p_{i,j}'(t), the backward equations take on the form

    P'(t) = QP(t),    t ≥ 0,    (8.16)

and the forward equations take on the form

    P'(t) = P(t)Q,    t ≥ 0,    (8.17)

with both equations subject to the initial continuity condition 8.12.
EXAMPLE 8.3  Consider a Poisson process {X_t : t ≥ 0} with rate λ > 0.
In this case, the transition functions p_{i,j} are given by

    p_{i,j}(t) = (λ^{j−i} t^{j−i} / (j − i)!) e^{−λt},    j ≥ i ≥ 0,

and it is easily seen that q_{i,i} = p_{i,i}'(0) = −λ, q_{i,i+1} = p_{i,i+1}'(0) = λ, and
q_{i,j} = 0 for j ∉ {i, i + 1} and all i ≥ 0. The q-matrix is given by

    Q = [ −λ   λ   0  ···
           0  −λ   λ  ···
           0   0  −λ  ···
           ⋮   ⋮   ⋮      ].
Application of these results lies in the choice of the q-matrix. From the
definition of the q_{i,j}, for all i,

    p_{i,i}(h) = 1 + q_{i,i}h + o(h),

and for i ≠ j,

    p_{i,j}(h) = q_{i,j}h + o(h).

These equations frequently suggest how the q_{i,j} should be chosen. It is then a
matter of solving the Kolmogorov backward or forward equations for the p_{i,j}(t).
EXAMPLE 8.4  Consider a system consisting of a single unit that has a
failure rate of μ and a repair rate of λ; i.e., if the unit is in operation at time t,
the probability that it will fail in the interval (t, t + h) is μh + o(h); if not in
operation at time t, the probability that it will be repaired and put back into
operation in the interval (t, t + h) is λh + o(h). This system can be in one of
the two states 0, 1 with the q-matrix

    Q = [ −λ   λ
           μ  −μ ];

e.g., the probability of going from state 1 to state 0 in a time interval of length h
is approximately q_{1,0}h = μh. In this case, there are four transition functions
to be determined: p_{0,0}, p_{0,1}, p_{1,0}, p_{1,1}. The Kolmogorov forward equations are
    p_{i,0}'(t) = μ p_{i,1}(t) − λ p_{i,0}(t),    i = 0, 1,
    p_{i,1}'(t) = λ p_{i,0}(t) − μ p_{i,1}(t),    i = 0, 1.

Since p_{i,1}(t) = 1 − p_{i,0}(t) and p_{i,0}(t) = 1 − p_{i,1}(t), the equations can be
written

    p_{i,0}'(t) + (λ + μ) p_{i,0}(t) = μ,
    p_{i,1}'(t) + (λ + μ) p_{i,1}(t) = λ.

Multiplying both sides of the first equation by e^{(λ+μ)t}, it can be written

    d/dt (e^{(λ+μ)t} p_{i,0}(t)) = μ e^{(λ+μ)t}.

Integrating and then multiplying both sides by e^{−(λ+μ)t},

    p_{i,0}(t) = μ/(λ + μ) + c_{i,0} e^{−(λ+μ)t},

where the integration constants must be chosen to satisfy the continuity
condition 8.12. After so choosing the constants and applying the same procedure
to the p_{i,1}(t), we obtain
    p_{0,0}(t) = (μ + λe^{−(λ+μ)t})/(λ + μ),    p_{0,1}(t) = (λ − λe^{−(λ+μ)t})/(λ + μ),
    p_{1,0}(t) = (μ − μe^{−(λ+μ)t})/(λ + μ),    p_{1,1}(t) = (λ + μe^{−(λ+μ)t})/(λ + μ). ■

EXERCISES 8.4

1. Consider the q-matrix

       Q = [ −λ   λ   0  ···
              0  −λ   λ  ···
              0   0  −λ  ···
              ⋮   ⋮   ⋮      ]

   and the corresponding Kolmogorov forward equations for P(t) = [p_{i,j}(t)].
   (a) Use the forward equations to show that p_{i,0}(t) = 0 for all i ≥ 1.
   (b) For each i ≥ 1, use an induction argument to show that p_{i,j}(t) = 0
       for j ≤ i − 1.
   (c) Calculate p_{i,j}(t) for j ≥ i.
2.
A system consists of m components, some of which are in operation and
some of which are not at any given time t. If there are k components
in operation at time t, the probability that one of them will fail in the
interval (t, t + h) is p.kh + 0(h), the probability that one will be put
back into operation in the interval is Ah + o (h), and the probability that
two or more changes will occur is o(h). If the state of the system is
the number of components in operation, what is the q-matrix for the
system?
3.	A system consists of two components that are connected in parallel with
one online and the other on standby. The one in operation at time t will
fail in the interval (t, t + h) with probability λh + o(h). A component
cannot fail while on standby. A component in failed condition at time t
will be repaired in the interval (t, t + h) with probability μh + o(h). The
probability of two or more changes taking place in an interval of length
h is o(h). If the state of the system is the number of failed components,
what is the q-matrix for this system?
4.	Suppose the system of the previous problem is modified so that a
component on standby at time t will fail in the interval (t, t + h) with
probability λ_s h + o(h). If the number of failed components is the state,
what is the q-matrix for the system?
8.5  MATRIX CALCULUS

In the previous sections, we have seen that the differential equation p'(t) =
ap(t) has the solution p(t) = ce^{at}. We have seen also that the transition
matrix P(t) = [p_{i,j}(t)] satisfies the matrix equation P'(t) = QP(t), where
Q, the q-matrix, is a constant matrix. We could try to solve the equation
P'(t) = QP(t) by means of the function P(t) = e^{tQ} if only we knew what
e^{tQ} represents. Since

e^{at} = 1 + at + a²t²/2! + a³t³/3! + ···,

we might try replacing a by Q to obtain

e^{tQ} = I + tQ + t²Q²/2! + t³Q³/3! + ···,

where it is necessary to replace the number 1 by the identity matrix I = [δ_{i,j}]. The
individual terms involving powers of Q on the right side make sense. Except
for the fact that the right side is a sum of an infinite series all of whose terms
are matrices, this equation can be taken as the definition of e^{tQ}. Since sums
of infinite series are defined in terms of limits, we must digress to discuss
sequences of matrices and limits of such sequences.
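Before the formal digression, the idea can be previewed numerically. The following sketch (using NumPy; the rates λ = 1, μ = 2 and the time t = 0.7 are arbitrary illustrative choices) compares partial sums of the series I + tQ + t²Q²/2! + ··· with the closed-form two-state transition matrix obtained in Section 8.4:

```python
import numpy as np

lam, mu = 1.0, 2.0                      # arbitrary positive rates
Q = np.array([[-lam,  lam],
              [  mu,  -mu]])

def etQ_series(Q, t, terms=30):
    """Partial sum I + tQ + (tQ)^2/2! + ... of the defining series."""
    S = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for n in range(1, terms):
        term = term @ (t * Q) / n       # (tQ)^n / n!
        S = S + term
    return S

# Closed-form two-state solution from Section 8.4.
t = 0.7
s = lam + mu
e = np.exp(-s * t)
P_closed = np.array([[(mu + lam * e) / s, (lam - lam * e) / s],
                     [(mu - mu * e) / s,  (lam + mu * e) / s]])
print(np.allclose(etQ_series(Q, t), P_closed))   # True
```

The partial sums converge entrywise, which is exactly the notion of matrix limit made precise below.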
In keeping within the bounds of introductory material, we will limit
ourselves to matrices with a finite number of rows and columns. We will deal
with r × s matrices A = [a_{i,j}] with r rows and s columns. Either {A(n)} or
A(1), A(2), ... will denote an infinite sequence of such matrices, all of size r × s.
The dependence upon n is specified by putting the n in parentheses, since the
notation A^n stands for the nth power of A when A is a square matrix. Thus,
A(n) = [a_{i,j}^{(n)}]. If lim_{n→∞} a_{i,j}^{(n)} exists and is equal to a_{i,j} for each i and j, we say
that the sequence {A(n)} converges to A = [a_{i,j}] and write

lim_{n→∞} A(n) = A.

Alternatively,

lim_{n→∞} [a_{i,j}^{(n)}] = [lim_{n→∞} a_{i,j}^{(n)}] = [a_{i,j}].
EXAMPLE 8.5

lim_{n→∞} [ 2^{1/n}          (n² + 1)/(n³ + n² + 1)      =  [ 1   0
            (1 + (1/n))^n    1                      ]          e   1 ].
Let {A(n)} be a convergent sequence of r × s matrices and let {B(n)}
be a convergent sequence of s × t matrices with lim_{n→∞} A(n) = A and
lim_{n→∞} B(n) = B, respectively. Then the sequence {A(n)B(n)} converges and
lim_{n→∞} A(n)B(n) = AB. This follows from the fact that the entry in the ith
row and jth column of A(n)B(n) is Σ_{k=1}^{s} a_{i,k}^{(n)} b_{k,j}^{(n)} with limit Σ_{k=1}^{s} a_{i,k} b_{k,j},
which is the corresponding entry in AB.

Given a sequence of matrices {A(n)}, all of the same size, we can form the
expression Σ_{n=1}^{∞} A(n), since matrices of the same size can be added. The
definition of the sum of the infinite series Σ_{n=1}^{∞} A(n) mimics the calculus
definition. For n ≥ 1, let

S(n) = [s_{i,j}^{(n)}] = Σ_{k=1}^{n} A(k) = A(1) + A(2) + ··· + A(n)
8.5
287
MATRIX CALCULUS
where s_{i,j}^{(n)} = Σ_{k=1}^{n} a_{i,j}^{(k)}. If the sequence {S(n)} has a limit S = [s_{i,j}], we say
that the series Σ_{n=1}^{∞} A(n) converges and has sum S. Note that

s_{i,j} = Σ_{n=1}^{∞} a_{i,j}^{(n)}.

In what follows, an infinite series will begin with a zero term as in Σ_{n=0}^{∞} A(n).
If the nth term is A^n for some matrix A, A^0 by convention will be the identity
matrix I = [δ_{i,j}]. We will also adopt the notation A^n = [a_{i,j}^{(n)}].

Definition 8.4
If A is an r × r matrix, we define e^A = Σ_{n=0}^{∞} A^n/n! provided each of the series
Σ_{n=0}^{∞} a_{i,j}^{(n)}/n! converges absolutely, i,j = 1, ..., r. ■
Lemma 8.5.1
Let A = [a_{i,j}] be an r × r matrix and let A^n = [a_{i,j}^{(n)}]. If there is a
constant M such that |a_{i,j}^{(n)}| ≤ M^n for 1 ≤ i,j ≤ r and n ≥ 1, then
e^A is defined. Moreover, if a_{i,j} ≥ 0 for 1 ≤ i,j ≤ r, then a_{i,j}^{(n)} ≥ 0 for
1 ≤ i,j ≤ r, n ≥ 1, and Σ_{n=0}^{∞} a_{i,j}^{(n)}/n! ≥ 0.

PROOF: Since the series Σ_{n=0}^{∞} M^n/n! converges, the series Σ_{n=0}^{∞} a_{i,j}^{(n)}/n!
converges absolutely by the comparison test, and e^A is defined. Since

a_{i,j}^{(2)} = Σ_{k=1}^{r} a_{i,k} a_{k,j} ≥ 0 for 1 ≤ i,j ≤ r,

it is easily seen that a_{i,j}^{(n)} ≥ 0
for 1 ≤ i,j ≤ r, n ≥ 1, using mathematical induction. ■
If a and b are real numbers, then ab = ba and e^a e^b = e^{a+b}; but if A and B
are r × r matrices, it need not be true that AB = BA nor that e^A e^B = e^{A+B}
when the latter are defined. For example, if

A = [ 1  0       and    B = [ 0  1
      1  1 ]                  1  1 ],

then AB ≠ BA, as is easily checked.

Granted even that AB = BA, we are confronted immediately with the
binomial theorem in dealing with e^{A+B}.
Lemma 8.5.2
Let A and B be r × r matrices such that AB = BA.
Then

(A + B)^n = Σ_{k=0}^{n} C(n, k) A^k B^{n−k}.

PROOF: Since Σ_{k=0}^{1} C(1, k) A^k B^{1−k} = B + A = A + B, the assertion is true for
n = 1. Assume it is true for n − 1. Then

(A + B)^n = (A + B)(A + B)^{n−1} = (A + B) Σ_{k=0}^{n−1} C(n − 1, k) A^k B^{n−1−k}.

Since BA^k B^{n−1−k} = BAA^{k−1}B^{n−1−k} = ABA^{k−1}B^{n−1−k} = ··· =
A^k BB^{n−1−k} = A^k B^{n−k},

(A + B)^n = Σ_{k=0}^{n−1} C(n − 1, k) A^{k+1} B^{n−1−k} + Σ_{k=0}^{n−1} C(n − 1, k) A^k B^{n−k}
          = Σ_{j=1}^{n} C(n − 1, j − 1) A^j B^{n−j} + Σ_{j=0}^{n−1} C(n − 1, j) A^j B^{n−j}.

Since the sum of the binomial coefficients is C(n − 1, j − 1) + C(n − 1, j) = C(n, j) (see Exercise 1.3.2),

(A + B)^n = A^n + Σ_{j=1}^{n−1} C(n, j) A^j B^{n−j} + B^n = Σ_{j=0}^{n} C(n, j) A^j B^{n−j},

and the assertion is true for n. By the principle of mathematical induction, the
assertion is true for all n ≥ 1. ■
For the proof of the following lemma, the entry in the ith row and jth
column of the matrix A will be denoted by (A)_{i,j}.

Lemma 8.5.3
Let A and B be r × r matrices for which e^A and e^B are defined and AB = BA.
Then e^{A+B} is defined and e^{A+B} = e^A e^B.
PROOF: Since e^A = Σ_{n=0}^{∞} A^n/n! and e^B = Σ_{n=0}^{∞} B^n/n!, and since the
indicated series are absolutely convergent, the product of their sums can be
written Σ_{n=0}^{∞} D(n), where

(D(n))_{i,j} = Σ_{ℓ=0}^{n} Σ_{k=1}^{r} (a_{i,k}^{(ℓ)}/ℓ!)(b_{k,j}^{(n−ℓ)}/(n − ℓ)!) = ( (1/n!) Σ_{ℓ=0}^{n} (n!/(ℓ!(n − ℓ)!)) A^ℓ B^{n−ℓ} )_{i,j}.

By Lemma 8.5.2,

Σ_{ℓ=0}^{n} (n!/(ℓ!(n − ℓ)!)) A^ℓ B^{n−ℓ} = (A + B)^n,

so that

e^A e^B = Σ_{n=0}^{∞} (A + B)^n/n! = e^{A+B},

and in particular e^{A+B} is defined. ■
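A quick numerical check (a NumPy sketch; `mat_exp` and the example matrices are illustrative choices, not from the text) shows the identity e^{A+B} = e^A e^B holding for a commuting pair and failing for a non-commuting one:

```python
import numpy as np

def mat_exp(M, terms=40):
    """e^M via its defining power series (adequate for small matrices)."""
    S = np.eye(M.shape[0]); term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        S += term
    return S

# Commuting pair: any matrix commutes with a scalar multiple of itself.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = 2 * A                       # AB = BA
print(np.allclose(mat_exp(A + B), mat_exp(A) @ mat_exp(B)))   # True

# Non-commuting pair: the identity generally fails.
C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(mat_exp(C + D), mat_exp(C) @ mat_exp(D)))   # False
```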
Theorem 8.5.4
Let Q = [q_{i,j}] be an r × r matrix for which q_{i,j} ≥ 0 if i ≠ j and
Σ_{j=1}^{r} q_{i,j} = 0, i = 1, ..., r. Then P(t) = [p_{i,j}(t)] = e^{tQ} is defined, and the
p_{i,j}(t) have the following properties:

(i)   lim_{t→0+} p_{i,j}(t) = δ_{i,j}.
(ii)  p_{i,j}(t) ≥ 0, t ≥ 0.
(iii) Σ_{j=1}^{r} p_{i,j}(t) = 1, t ≥ 0.
(iv)  p_{i,j}(s + t) = Σ_{k=1}^{r} p_{i,k}(s) p_{k,j}(t).
(v)   p'_{i,j}(t) = Σ_{k=1}^{r} q_{i,k} p_{k,j}(t); i.e., P'(t) = QP(t).
PROOF: We first show that e^{tQ} is defined by showing that there is a constant
M such that

|q_{i,j}^{(n)}| ≤ M^n,    1 ≤ i,j ≤ r, n ≥ 1.

To see this, let M = max_{1≤i≤r} Σ_{j=1}^{r} |q_{i,j}|. Since |q_{i,j}| ≤ M, the assertion
is true for n = 1. Assume it is true for n − 1; i.e., |q_{i,j}^{(n−1)}| ≤ M^{n−1} for
1 ≤ i,j ≤ r. Since

|q_{i,j}^{(n)}| = | Σ_{k=1}^{r} q_{i,k} q_{k,j}^{(n−1)} | ≤ Σ_{k=1}^{r} |q_{i,k}| M^{n−1} ≤ M^n,

the assertion is true for n whenever it is true for n − 1. It is therefore true
for all n ≥ 1 by mathematical induction. Since the series

Σ_{n=0}^{∞} t^n M^n/n!

converges, the series

Σ_{n=0}^{∞} (t^n/n!) q_{i,j}^{(n)}

converges absolutely for 1 ≤ i,j ≤ r. Thus, e^{tQ} is defined for all t ≥ 0. In
fact, this argument shows that this power series in t has (−∞, ∞) as its interval
of convergence. Since

p_{i,j}(t) = δ_{i,j} + Σ_{n=1}^{∞} (t^n/n!) q_{i,j}^{(n)}

and the sum of a power series is continuous on its interval of convergence,
lim_{t→0+} p_{i,j}(t) = δ_{i,j}, 1 ≤ i,j ≤ r. This proves (i). Consider the matrix
Q + aI. The off-diagonal elements of Q + aI are the same as those of Q.
The diagonal elements of Q are negative, but an a > 0 can be chosen so that
Q + aI has nonnegative elements. Writing e^{tQ} = e^{−atI} e^{t(Q+aI)}, note that the
entries of the matrix e^{t(Q+aI)} are all nonnegative by Lemma 8.5.1. Since it is
easily checked that e^{−atI} = e^{−at} I, the same is true for the entries of e^{−atI}.
Thus, the entries of P(t) = [p_{i,j}(t)] = e^{tQ} are nonnegative. This proves (ii).
Since Σ_{j=1}^{r} q_{i,j} = 0, an easy induction shows that Σ_{j=1}^{r} q_{i,j}^{(n)} = 0 for all
n ≥ 1, and therefore

Σ_{j=1}^{r} p_{i,j}(t) = Σ_{j=1}^{r} δ_{i,j} + Σ_{n=1}^{∞} (t^n/n!) Σ_{j=1}^{r} q_{i,j}^{(n)} = 1,

and (iii) is proved. Since e^{sQ} and e^{tQ} are defined, so is e^{(s+t)Q}, and e^{(s+t)Q} =
e^{sQ} e^{tQ}; i.e.,

p_{i,j}(s + t) = (e^{(s+t)Q})_{i,j} = Σ_{k=1}^{r} (e^{sQ})_{i,k} (e^{tQ})_{k,j} = Σ_{k=1}^{r} p_{i,k}(s) p_{k,j}(t),

and (iv) is satisfied. It remains only to show that P'(t) = [p'_{i,j}(t)] = QP(t).
Since a power series can be differentiated term by term within its interval of
convergence,

p'_{i,j}(t) = Σ_{n=1}^{∞} (t^{n−1}/(n − 1)!) q_{i,j}^{(n)} = Σ_{n=1}^{∞} (t^{n−1}/(n − 1)!) Σ_{k=1}^{r} q_{i,k} q_{k,j}^{(n−1)} = Σ_{k=1}^{r} q_{i,k} p_{k,j}(t),

and (v) is true. ■
Theorem 8.5.4 provides the answer to whether or not a q-matrix determines
a Markov chain. Direct application of the theorem is impractical except possibly
for the simplest cases, as will be seen in the following example, and useful
information about the resulting Markov chain must be obtained indirectly, as
in the next section.
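The conclusions of Theorem 8.5.4 can nevertheless be checked numerically for any particular q-matrix. The following sketch (NumPy; the 3 × 3 q-matrix is an arbitrary illustration) evaluates e^{tQ} by its power series and checks properties (i)–(v), with (v) approximated by a difference quotient:

```python
import numpy as np

def mexp(M, terms=60):
    """e^M via the defining power series; adequate for small matrices."""
    S = np.eye(M.shape[0]); term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        S += term
    return S

# An arbitrary 3x3 q-matrix: nonnegative off-diagonal entries, zero row sums.
Q = np.array([[-3.0,  1.0,  2.0],
              [ 1.0, -2.0,  1.0],
              [ 2.0,  2.0, -4.0]])
P = lambda t: mexp(t * Q)

s, t = 0.4, 0.9
print(np.allclose(P(0.0), np.eye(3)))        # (i)   P(0) = I
print((P(t) >= -1e-12).all())                # (ii)  entries nonnegative
print(np.allclose(P(t).sum(axis=1), 1.0))    # (iii) each row sums to 1
print(np.allclose(P(s + t), P(s) @ P(t)))    # (iv)  semigroup property
h = 1e-6                                     # (v)   P'(t) = Q P(t)
print(np.allclose((P(t + h) - P(t)) / h, Q @ P(t), atol=1e-4))
```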
EXAMPLE 8.6
Consider the q-matrix

Q = [ −1   1   0
       0  −1   1
       1   0  −1 ].

Computing successive powers,

Q^4 = [ −3  −3   6       Q^5 = [  9   0  −9       Q^6 = [ −18   9    9
         6  −3  −3              −9   9   0                 9  −18    9
        −3   6  −3 ],            0  −9   9 ],              9    9  −18 ].

Upon calculating Q^7, it is seen that Q^7 = (−27)Q, and from this it follows
that

Q^{k+6n} = (−27)^n Q^k,    1 ≤ k ≤ 6, n ≥ 0.

Thus,

e^{tQ} = I + Σ_{k=1}^{6} Σ_{n=0}^{∞} ((−27)^n t^{k+6n}/(k + 6n)!) Q^k = I + Σ_{k=1}^{6} ( Σ_{n=0}^{∞} (−27)^n t^{k+6n}/(k + 6n)! ) Q^k.

In particular,

p_{1,3}(t) = Σ_{n=0}^{∞} (−27)^n ( t^{2+6n}/(2 + 6n)! − 3 t^{3+6n}/(3 + 6n)! + 6 t^{4+6n}/(4 + 6n)! − 9 t^{5+6n}/(5 + 6n)! + 9 t^{6+6n}/(6 + 6n)! ). ■
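The claims of this example are easy to confirm numerically (a NumPy sketch; the time t = 0.8 is an arbitrary test value):

```python
import numpy as np
from math import factorial

Q = np.array([[-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0]])

Q7 = np.linalg.matrix_power(Q, 7)
print(np.allclose(Q7, -27 * Q))              # Q^7 = (-27) Q

def mexp(M, terms=60):
    S = np.eye(M.shape[0]); term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        S += term
    return S

def p13_series(t, N=8):
    """The series for p_{1,3}(t) given in the example."""
    total = 0.0
    for n in range(N):
        c = (-27.0) ** n
        total += c * (     t ** (2 + 6*n) / factorial(2 + 6*n)
                     - 3 * t ** (3 + 6*n) / factorial(3 + 6*n)
                     + 6 * t ** (4 + 6*n) / factorial(4 + 6*n)
                     - 9 * t ** (5 + 6*n) / factorial(5 + 6*n)
                     + 9 * t ** (6 + 6*n) / factorial(6 + 6*n))
    return total

t = 0.8
print(np.isclose(p13_series(t), mexp(t * Q)[0, 2]))   # same as (e^{tQ})_{1,3}
```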
EXERCISES 8.5

1.	Show that e^{atI} = e^{at} I.

2.	If λ, μ > 0 and

Q = [ −λ   λ
       μ  −μ ],

solve the matrix equation P'(t) = QP(t), t ≥ 0, using the methods of
this section.

3.	Solve the matrix equation P'(t) = QP(t), t ≥ 0, using the methods
of this section, where

Q = [ −1   1   0   0
       1  −1   0   0
       0   0  −1   1
       0   0   1  −1 ].
8.6  STATIONARY DISTRIBUTIONS

Let Q = [q_{i,j}] be an r × r q-matrix and let P(t) = [p_{i,j}(t)] be the associated
matrix of transition functions. One of the problems related to the P(t) is
the long-range behavior of the p_{i,j}(t) as t → ∞. In general, the long-range
behavior can be complicated, and we will deal only with the simplest case.
Letting Q^n = [q_{i,j}^{(n)}], by definition of matrix multiplication,

q_{i,j}^{(2)} = Σ_{k=1}^{r} q_{i,k} q_{k,j}.

Writing Q^3 = QQ^2,

q_{i,j}^{(3)} = Σ_{k=1}^{r} q_{i,k} q_{k,j}^{(2)} = Σ_{k=1}^{r} Σ_{ℓ=1}^{r} q_{i,k} q_{k,ℓ} q_{ℓ,j}.

More generally,

q_{i,j}^{(n)} = Σ_{i_1=1}^{r} Σ_{i_2=1}^{r} ··· Σ_{i_{n−1}=1}^{r} q_{i,i_1} q_{i_1,i_2} × ··· × q_{i_{n−1},j}.    (8.18)
Definition 8.5
The q-matrix Q = [q_{i,j}] is irreducible if for every pair i,j ∈ {1, ..., r} with
i ≠ j either q_{i,j} > 0 or there is a finite sequence i_1, ..., i_m such that
q_{i,i_1} × q_{i_1,i_2} × ··· × q_{i_m,j} > 0. ■

We can assume that the i, i_1, ..., i_m, j in this definition are distinct, because
if i_p = i_q for some p < q, then the factor

q_{i_p,i_{p+1}} × ··· × q_{i_{q−1},i_q}

can be deleted, since it is preceded by q_{i_{p−1},i_p} and followed by q_{i_q,i_{q+1}}; the
resulting product will remain nonzero. Since q_{k,ℓ} ≥ 0 whenever k ≠ ℓ, we
can assume that

q_{i,i_1} × q_{i_1,i_2} × ··· × q_{i_m,j} > 0

in addition to the i, i_1, ..., i_m, j being distinct.
EXAMPLE 8.7
The q-matrix

Q = [ −1   1   0
       0  −1   1
       1   0  −1 ]

is irreducible since q_{1,2} = 1 > 0, q_{1,2} q_{2,3} = 1 > 0, q_{2,3} = 1 > 0,
q_{2,3} q_{3,1} = 1 > 0, q_{3,1} = 1 > 0, and q_{3,1} q_{1,2} = 1 > 0. ■
There is an easy way to determine if an r × r q-matrix is irreducible by
drawing a diagram as in Figure 8.2. The numbers in the circles represent the
states 1, 2, ..., 6. The arrow connecting 2 to 4 signifies that q_{2,4} > 0. If it is
possible to find a path connecting all the states by following the arrows, then
Q is irreducible. In this case, the matrix Q is irreducible since there is a path
connecting all the states.
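The diagram test amounts to a reachability check on the directed graph with an arrow i → j whenever q_{i,j} > 0. A small sketch (NumPy; `is_irreducible` is a hypothetical helper name, and the two test matrices are illustrative):

```python
import numpy as np

def is_irreducible(Q):
    """Reachability test: an arrow i -> j whenever q_{i,j} > 0; Q is
    irreducible when every state can be reached from every other state."""
    r = Q.shape[0]
    adj = ((Q > 0) & ~np.eye(r, dtype=bool)).astype(int)
    reach = adj.copy()
    for _ in range(r):                      # crude transitive closure
        reach = np.minimum(reach + reach @ adj, 1)
    return all(reach[i, j] > 0 for i in range(r) for j in range(r) if i != j)

Q_cyclic = np.array([[-1.0,  1.0,  0.0],
                     [ 0.0, -1.0,  1.0],
                     [ 1.0,  0.0, -1.0]])   # the q-matrix of Example 8.7
print(is_irreducible(Q_cyclic))             # True

Q_split = np.array([[-1.0,  1.0,  0.0],
                    [ 1.0, -1.0,  0.0],
                    [ 1.0,  0.0, -1.0]])    # state 3 unreachable from 1 and 2
print(is_irreducible(Q_split))              # False
```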
Theorem 8.6.1
If the q-matrix Q = [q_{i,j}] is irreducible, then p_{i,j}(t) > 0 for all t > 0,
1 ≤ i,j ≤ r.
PROOF: We first show that p_{i,i}(t) > 0 for t > 0. Since lim_{t→0+} p_{i,i}(t) = 1,
there is a δ > 0 such that p_{i,i}(t) > 0 for all 0 < t ≤ δ. Consider any t > 0
and choose k such that kδ ≤ t < (k + 1)δ. Since P(t) = P(kδ)P(t − kδ) =
P(δ)P(δ) × ··· × P(δ)P(t − kδ), p_{i,j}(t) is a sum of terms as in Equation 8.18,
with the q_{k,ℓ} replaced by p_{k,ℓ}, and

p_{i,i}(t) ≥ p_{i,i}(δ)p_{i,i}(δ) × ··· × p_{i,i}(δ)p_{i,i}(t − kδ);

since t − kδ < δ, all the factors on the right are positive and therefore p_{i,i}(t)
> 0 for all t > 0. Suppose now that i ≠ j. Let m be the smallest integer for
which there is a finite sequence i_1, ..., i_m such that q_{i,i_1} × ··· × q_{i_m,j} > 0.
Then q_{i,j}^{(n)} = 0 for all n ≤ m. Consider

q_{i,j}^{(m+1)} = Σ_{i_1=1}^{r} ··· Σ_{i_m=1}^{r} q_{i,i_1} q_{i_1,i_2} × ··· × q_{i_m,j}.

Each term on the right must be nonnegative, because if some term were strictly
negative then some factor q_{i_ℓ,i_{ℓ+1}} would be strictly negative, and this can only
happen if i_ℓ = i_{ℓ+1}, contradicting the minimality of m. Because all terms are
nonnegative and at least one is positive, q_{i,j}^{(m+1)} > 0. But q_{i,j}^{(n)} = 0 for all n ≤ m
implies that

p_{i,j}(t) = Σ_{n=m+1}^{∞} (t^n/n!) q_{i,j}^{(n)}.

Since

((m + 1)!/t^{m+1}) p_{i,j}(t) = q_{i,j}^{(m+1)} + Σ_{n=m+2}^{∞} ((m + 1)!/n!) t^{n−m−1} q_{i,j}^{(n)} → q_{i,j}^{(m+1)}  as t → 0+,

p_{i,j}(t) has the same sign as q_{i,j}^{(m+1)} for all small t, and therefore p_{i,j}(t) > 0 for
all small t. Choosing δ > 0 such that p_{i,j}(t) > 0 for 0 < t < δ, p_{i,j}(t + s) ≥
p_{i,j}(t)p_{j,j}(s) > 0 for 0 < t < δ and all s > 0. Therefore, p_{i,j}(t) > 0 for all
t > 0. ■
To study the long-range behavior of the p_{i,j}(t), we will recall the discrete
parameter version first. Let P = [p_{i,j}] be an r × r stochastic matrix; i.e.,
p_{i,j} ≥ 0 for 1 ≤ i,j ≤ r and Σ_{j=1}^{r} p_{i,j} = 1, 1 ≤ i ≤ r. Moreover,
let P^n = [p_{i,j}^{(n)}] be the matrix of n-step transition probabilities. According
to Theorem 5.2.2, if there is a positive integer N such that p_{i,j}^{(N)} > 0 for
1 ≤ i,j ≤ r, then lim_{n→∞} p_{i,j}^{(n)} exists and is independent of i. It should be
noted that the conclusion of the following theorem does not require that the
q-matrix be irreducible, but the limit may depend upon i.
Theorem 8.6.2
Let Q = [q_{i,j}] be an irreducible r × r q-matrix and let P(t) = [p_{i,j}(t)] be
the associated matrix of Markov transition functions. Then there is a probability
density π = {π_1, ..., π_r} such that

lim_{t→∞} p_{i,j}(t) = π_j,    j = 1, ..., r,

independently of i.
PROOF: Fix j ∈ {1, ..., r} and let ε > 0. Since lim_{h→0+} p_{i,j}(h) = δ_{i,j}, 1 ≤
i,j ≤ r, there is an h_0 > 0 such that

Σ_{i=1}^{r} |p_{i,j}(h) − δ_{i,j}| < ε/2

whenever 0 < h < h_0. Fixing h with 0 < h < h_0, we will now show that
π_j = lim_{n→∞} p_{i,j}(nh) exists independently of i. To see this, note that P(h) =
[p_{i,j}(h)] is a stochastic matrix and that p_{i,j}(h) > 0 by Theorem 8.6.1, since Q
is irreducible. Moreover, [p_{i,j}(nh)] = P(nh) = [P(h)]^n = [p_{i,j}^{(n)}(h)], and
π_j = lim_{n→∞} p_{i,j}(nh) exists and is independent of i, since this is true of the
p_{i,j}^{(n)}(h) by Theorem 5.2.2. Any t > 0 can be written t = n(t)h + r(t) where
n(t) is a nonnegative integer and 0 ≤ r(t) < h. From the inequalities

|p_{i,j}(t) − π_j| ≤ |p_{i,j}(t) − p_{i,j}(n(t)h)| + |p_{i,j}(n(t)h) − π_j|

and

|p_{i,j}(t) − p_{i,j}(n(t)h)| = | Σ_{k=1}^{r} p_{i,k}(n(t)h)p_{k,j}(r(t)) − Σ_{k=1}^{r} p_{i,k}(n(t)h)δ_{k,j} |
≤ Σ_{k=1}^{r} |p_{k,j}(r(t)) − δ_{k,j}| < ε/2,

|p_{i,j}(t) − π_j| ≤ ε/2 + |p_{i,j}(n(t)h) − π_j|.

Since n(t) → ∞ as t → ∞, the second term on the right can be made less
than ε/2 for large t; i.e., lim_{t→∞} p_{i,j}(t) = π_j. Clearly, π_j ≥ 0, 1 ≤ j ≤ r.
Since Σ_{j=1}^{r} p_{i,j}(t) = 1 and there are only a finite number of terms in the sum,
Σ_{j=1}^{r} π_j = 1. ■
Except for relatively simple q-matrices, using limits to calculate π is difficult.
Fortunately, there is a simple algebraic method for finding π, assuming that
the q-matrix is irreducible.
Definition 8.6
The probability density π on {1, ..., r} is a stationary density for the Markov
transition functions p_{i,j}(t) if

π_j = Σ_{k=1}^{r} π_k p_{k,j}(t),    t ≥ 0, j = 1, ..., r. ■
Theorem 8.6.3
Let Q = [q_{i,j}] be an r × r irreducible q-matrix and let P(t) = [p_{i,j}(t)] be
the associated matrix of transition functions. Then P(t) has a unique stationary
density π that satisfies the equation

Σ_{i=1}^{r} π_i q_{i,j} = 0,    j = 1, ..., r.    (8.19)
:=1
PROOF: Let π_j = lim_{t→∞} p_{i,j}(t). Since

p_{i,j}(s + t) = Σ_{k=1}^{r} p_{i,k}(s) p_{k,j}(t),

letting s → ∞ we obtain π_j = Σ_{k=1}^{r} π_k p_{k,j}(t), and π is a stationary density
for P(t). Let σ be a second stationary density. Since

σ_j = Σ_{k=1}^{r} σ_k p_{k,j}(t),    t ≥ 0,

letting t → ∞,

σ_j = Σ_{k=1}^{r} σ_k π_j = π_j Σ_{k=1}^{r} σ_k = π_j,    j = 1, ..., r,

and σ = π. Thus π is unique. Since

π_j = Σ_{k=1}^{r} π_k p_{k,j}(t) = Σ_{k=1}^{r} π_k Σ_{n=0}^{∞} (t^n/n!) q_{k,j}^{(n)} = Σ_{n=0}^{∞} (t^n/n!) ( Σ_{k=1}^{r} π_k q_{k,j}^{(n)} ),

the sum of the power series in t on the right is a constant, and therefore all the
coefficients of t, t², ... must be zero; in particular,

Σ_{k=1}^{r} π_k q_{k,j}^{(1)} = Σ_{k=1}^{r} π_k q_{k,j} = 0. ■
Equation 8.19 alone does not determine the stationary density π uniquely.
The equation Σ_{j=1}^{r} π_j = 1 must be used in conjunction with Equation 8.19.
EXAMPLE 8.8
Consider the q-matrix

Q = [ −4   1   2   1
       1  −4   2   1
       2   1  −6   3
       0   0   1  −1 ].

It is easily seen that Q is irreducible. In this case, Equation 8.19 becomes

−4π_1 + π_2 + 2π_3 = 0
π_1 − 4π_2 + π_3 = 0
2π_1 + 2π_2 − 6π_3 + π_4 = 0
π_1 + π_2 + 3π_3 − π_4 = 0.

Applying the usual row and column operations, these equations are easily seen
to be linearly dependent, and the equation

π_1 + π_2 + π_3 + π_4 = 1

must be used to find the stationary density π. In this case, π_1 = 1/10, π_2 =
1/15, π_3 = 1/6, π_4 = 2/3. If the p_{i,j}(t) are the transition functions corre­sponding to the above q-matrix, then lim_{t→∞} p_{i,1}(t) = 1/10, lim_{t→∞} p_{i,2}(t) =
1/15, lim_{t→∞} p_{i,3}(t) = 1/6, lim_{t→∞} p_{i,4}(t) = 2/3 for i = 1, 2, 3, 4. ■
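The computation in this example can be reproduced by solving the linear system of Theorem 8.6.3 directly; a NumPy sketch (one redundant equation of 8.19 is replaced by the normalization Σπ_j = 1):

```python
import numpy as np

# The q-matrix of Example 8.8.
Q = np.array([[-4.0,  1.0,  2.0,  1.0],
              [ 1.0, -4.0,  2.0,  1.0],
              [ 2.0,  1.0, -6.0,  3.0],
              [ 0.0,  0.0,  1.0, -1.0]])

# Equation 8.19 says pi Q = 0, i.e. Q^T pi = 0; the four equations are
# linearly dependent, so replace the last one by the normalization sum = 1.
A = Q.T.copy()
A[-1, :] = 1.0
b = np.zeros(4); b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(np.allclose(pi, [1/10, 1/15, 1/6, 2/3]))   # True
```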
EXERCISES 8.6

1.	Which of the following q-matrices are irreducible?

Q_1 = [ −1   1   0   0   0
         0  −2   0   1   1
         0   1  −1   0   0
         0   0   1  −1   0
         1   0   0   0  −1 ]

Q_2 = [ −1   1   0   0   0
         1  −1   0   0   0
         1   0  −2   1   0
         0   0   0  −1   1
         0   0   1   0  −1 ]
2.	Consider the system of Exercise 8.4.3. If the failure rate is λ = .01 and
the repair rate is μ = 2, what is the limiting distribution of the system?
3.	Determine the limiting distribution of the transition functions
corresponding to the following q-matrix:

Q = [ −3   0   1   1   1
       1  −3   0   1   1
       1   1  −4   1   1
       1   0   1  −2   0
       0   0   0   1  −1 ]
The following problems require mathematical software such as Mathematica
or Maple V.

4.	Consider the Markov chain P(t) determined by the q-matrix

Q = [ −1   1/2  1/2
      1/3  −1   2/3
      1/4  3/4  −1 ].

Approximate with three-place accuracy the Markov transition function
P(t) = [p_{i,j}(t)] when t = 2, and determine π_j = lim_{t→∞} p_{i,j}(t),
1 ≤ i,j ≤ 3.
5.	Approximate with three-place accuracy the limiting distribution of the
transition function P(t) = [p_{i,j}(t)] corresponding to the q-matrix

Q = [ −1   .10  .25  .45  .20
      .05  −1   .15  .65  .15
      .10  .10  −1   .50  .30
      .25  .25  .25  −1   .25
      .50  .10  .20  .20  −1 ]
SUPPLEMENTAL READING LIST
S. Karlin and H. M. Taylor (1981). A Second Course in Stochastic Processes. New
York: Academic Press.
SOLUTIONS TO EXERCISES
CHAPTER 1
Exercises 1.2
1.
P(A) = 3/8.
2.	|Ω| = 2^n.
3. P(A) = 3/8.
4.
P(A) = 1/4.
5.
21 configurations.
6.	Let P_1(n) be the statement that 1 + 2 + ··· + n = n(n + 1)/2. Since
1 = 1(1 + 1)/2, P_1(1) is true. Assume P_1(n) is true. Then

1 + 2 + ··· + n + (n + 1) = n(n + 1)/2 + (n + 1) = (n + 1)(n + 2)/2,

which is just the statement P_1(n + 1). Therefore, P_1(n + 1) is true
whenever P_1(n) is true, and P_1(n) is true for all n ≥ 1 by the principle of
mathematical induction. Now let P_2(n) be the statement 1² + 2² + ··· +
n² = n(n + 1)(2n + 1)/6. Since 1² = 1(1 + 1)(2 + 1)/6, P_2(1) is true.
Assume P_2(n) is true. Then

1² + 2² + ··· + n² + (n + 1)² = n(n + 1)(2n + 1)/6 + (n + 1)²
= (n + 1)((2n² + n)/6 + n + 1)
= (n + 1)(2n² + n + 6n + 6)/6
= (n + 1)(n + 2)(2n + 3)/6,

which is just the statement P_2(n + 1). Therefore, P_2(n + 1) is true
whenever P_2(n) is true, and P_2(n) is true for all n ≥ 1 by the principle
of mathematical induction.
Exercises 1.3
1.	(a) is obvious. (b) Since the number of ways of selecting n individuals
out of m for inclusion in a sample is the same as the number of ways
of selecting m − n to not be included, the two binomial coefficients are
equal.

2.	Express C(n − 1, r) and C(n − 1, r − 1) in terms of factorials, take the
sum, and simplify the resulting equation.

3.	(a) f(0) = 1 and for k ≤ n, f^{(k)}(0) = n(n − 1) × ··· × (n − k + 1) =
(n)_k, so f^{(k)}(0)/k! = (n)_k/k! = C(n, k); for k > n, f^{(k)}(0) = 0, which is
also equal to C(n, k) since k > n. Therefore, f(t) = Σ_{k=0}^{n} C(n, k) t^k.
4.	The kth derivative of f(t) at t = 0 is a(a − 1) × ··· × (a − k + 1) = (a)_k.

5.	Let a = 1 and b = t in Equation 1.9, differentiate with respect to t, and
set t = 1.

6.	Put a = 1 and b = t in Equation 1.9, differentiate twice with respect to
t, and put t = 1.

7.	Let A be the collection of outcomes not having a 1. Then A is an ordered
sample of size n with replacement from the population {2,3,4,5,6} of size
5. The number of such samples is 5^n. Thus, P(A) = 5^n/6^n.
8.	Let A be the collection of outcomes for which the number of heads and
tails are equal. The total number of outcomes is 2^{2n}. To count the
number of outcomes in A, select n out of the 2n positions in a label to be
filled with H's, which can be done in C(2n, n) ways, and fill the remaining
positions with T's. Thus, P(A) = C(2n, n)/2^{2n}.
9.	(a) C(3n − 1, n). (b) Let A be the collection of outcomes for which boxes
1, 2, ..., n are empty. The n particles are then distributed among the
remaining boxes numbered n + 1, ..., 2n in C(2n − 1, n) ways. Thus,
P(A) = C(2n − 1, n)/C(3n − 1, n).

10.	C(−x, k) = (−x)(−x − 1) × ··· × (−x − k + 1)/k! = (−1)^k (x + k − 1) × ··· × (x)/k! = (−1)^k C(x + k − 1, k).

11.	.176197.

12.	53,130.
Exercises 1.4
1.	p(2) = 1/16, p(3) = 1/8, p(4) = 3/16, p(5) = 1/4, p(6) = 3/16,
p(7) = 1/8, p(8) = 1/16.

2.	p(3) = 1/64, p(4) = 3/64, p(5) = 6/64, p(6) = 10/64, p(7) = 12/64,
p(8) = 12/64, p(9) = 10/64, p(10) = 6/64, p(11) = 3/64, p(12) = 1/64.

3.	p(3) = 1/216, p(4) = 3/216, p(5) = 6/216, p(6) = 10/216, p(7) =
15/216, p(8) = 21/216, p(9) = 25/216, p(10) = 27/216, p(11) =
27/216, p(12) = 25/216, p(13) = 21/216, p(14) = 15/216, p(15) =
10/216, p(16) = 6/216, p(17) = 3/216, p(18) = 1/216.

4.	1/C(54, 6) ≈ 3.8719 × 10^{−8}; or one chance in 25,827,165.

5.	1/(48)_6 ≈ .11318 × 10^{−9}.

6.	C(50, 2)C(950, 8)/C(1000, 10), or, what is the same,
C(10, 2)C(990, 48)/C(1000, 50).

7.	(12)_4/12^4 = .573, accurate to three decimal places.

8.	13 × 12 × C(4, 3)C(4, 2)/C(52, 5).

9.	10 · (…)/(…).

10.	C(100, 5)C(n − 100, 95)/C(n, 100). One would guess 2000.
Exercises 1.5
1.
8/7.
2.
1/(15 · 16³).
3.
6/11.
4.
5/12.
5.	An outcome is an ordered sample of size 10 from a population of size 2
with replacement. (a) P(A_1) = P(A_2) = 2^9/2^{10} = 1/2, P(A_1 ∩ A_2) =
2^8/2^{10} = 1/4, P(A_2 | A_1) = 1/2. (b) P(A_1 ∩ A_2) = P(A_1)P(A_2). (c)
P(A_2 | A_1) = P(A_2). (d) If 10 is replaced by 20, there is no change in any
of these probabilities.
6.
1/2.
7.
22/63.
8.	From Figure 1.2, P(A ∩ B) = 1/6, P(B) = 1/2, so that P(A | B) = 1/3.
Since P(A) = 1/3, the partial information does not change the probability
of A.

9.	What does random mean in this exercise? If all n keys are placed in a row
and tried one at a time, then we are dealing with an ordered sample of
size n without replacement from a population of size n, and there are n!
outcomes. The number of outcomes with the good key in the rth place is
(n − 1)!. The required probability is (n − 1)!/n! = 1/n.

10.	(a) 0 ≤ p(k) ≤ 1. (b) Since

Σ_k p(k) = lim_{n→∞} (1 − 1/(n + 1)) = 1,

the p(k) can be used as weights in a probability model.
CHAPTER 2
Exercises 2.2
1.	(a) Correct. (b) Incorrect since 2 is not a set. (c) Incorrect since {1,2,3}
has no sets as members. (d) Correct.

2.	The proposition is "(i,j) ∈ Ω and i > j." A = {(i,j) : (i,j) ∈ Ω and
i > j} = {(2,1), (3,1), (4,1), (5,1), (6,1), (3,2), (4,2), (5,2), (6,2),
(4,3), (5,3), (6,3), (5,4), (6,4), (6,5)}.

3.	Since 2^{1/n} > 1, [0,1] ⊂ A_n for all n ≥ 1. Thus, [0,1] ⊂ ∩A_n. Since
lim_{n→∞} 2^{1/n} = 1, there is no x > 1 in ∩A_n. Thus, ∩A_n = [0,1].
4.	Examine the graph of the equation y = x^n. ∩A_n = {(x,y) : 0 ≤ x < 1,
y = 0}.

5.	∩A_n = {(x,y) : 0 ≤ x < 1, y = 0} ∪ {(x,y) : x = 1, 0 ≤ y ≤ 1}.

6.	If ω ∈ A, then ω is not in A^c; i.e., ω ∈ (A^c)^c. If ω ∈ (A^c)^c, then ω is not
in A^c; i.e., ω ∈ A. Thus, (A^c)^c ⊂ A and A = (A^c)^c.

7.	Assume A ⊂ B. If ω ∈ B^c, then ω cannot be in A since if it were it would
be in B also, which is impossible. Thus, ω ∈ A^c and B^c ⊂ A^c. Assume
now that B^c ⊂ A^c. If ω ∈ A, then ω ∉ B^c since if it were then it would
be in A^c, which is impossible; thus, ω ∈ (B^c)^c = B, and so A ⊂ B.

8.	Not true in general. Let U = {1,2,3,4}, X = {1,2,3}, Y = {3,4}, Z =
{2,3,4}. Then Y ∪ Z = {2,3,4}, X ∩ (Y ∪ Z) = {2,3}, whereas
X ∩ Y = {3}, (X ∩ Y) ∪ Z = {2,3,4}. Clearly, X ∩ (Y ∪ Z) ≠ (X ∩ Y) ∪ Z.

9.	By de Morgan's laws, the distributive laws, and the facts that X ∩ X^c =
∅, Y ∩ Y^c = ∅, (X ∪ Y) ∩ (X ∩ Y)^c = (X ∪ Y) ∩ (X^c ∪ Y^c) =
((X ∪ Y) ∩ X^c) ∪ ((X ∪ Y) ∩ Y^c) = (X ∩ X^c) ∪ (Y ∩ X^c) ∪ (X ∩
Y^c) ∪ (Y ∩ Y^c) = (Y ∩ X^c) ∪ (X ∩ Y^c).

10.	If n is odd, A_n = [0,1]; if n is even, A_n = [0,0] = {0}. For each
n ≥ 1, ∩_{k≥n} A_k = {0}, and ∪_{n=1}^{∞} (∩_{k≥n} A_k) = ∪_{n=1}^{∞} {0} = {0}. Also,
∪_{k≥n} A_k = [0,1], and ∩_{n=1}^{∞} (∪_{k≥n} A_k) = [0,1].
Exercises 2.3
1.	The domain is [−1,1]. The range is [0,1].

2.	f = {(x,y) : x ∈ R, y ∈ R, y = √(1 − x⁴), −1 ≤ x ≤ 1}. The domain
is [−1,1]. The range is [0,1].

3.	f = {(x,y) : x ∈ R, y ∈ R, y = 1/√(1 − x²), −1 < x < 1}. The domain
is (−1,1), and the range is [1, ∞).
4.	Define α : N → X by putting α(p) = p/q. Then X is the range of α,
and X is countable.
5.	For i = 1, 2, ..., m, let X_i = {x_{i1}, x_{i2}, ...}. Since each X_i is countably
infinite, there is a mapping α_i : N → X_i having X_i as its range. Consider
the array

x_{11}  x_{12}  x_{13}  ···
x_{21}  x_{22}  x_{23}  ···
x_{31}  x_{32}  x_{33}  ···
  ⋮
x_{m1}  x_{m2}  x_{m3}  ···

The elements of this array can be arranged in a sequence {α(n)}_{n=1}^{∞} by
going down the first column, then down the second column, and so forth.
More precisely, each n ∈ N has a unique
representation n = pm + q where p ∈ N ∪ {0} and q ∈ {1, 2, ..., m}.
Letting X = ∪X_i, define α : N → X as follows. If n = pm + q as
above, put

α(n) = x_{q(p+1)} = α_q(p + 1).

Then X is the range of α.
6.	Assume X is countable; i.e., X is the range of an infinite sequence {x_n}_{n=1}^{∞}
where each x_n is an infinite sequence of 0's and 1's. Let x_n = {x_{n,k}}_{k=1}^{∞}.
Consider the infinite sequence y = {y_k}_{k=1}^{∞} where y_k = 1 − x_{k,k}. Since
y_n ≠ x_{n,n} for each n ≥ 1, y ≠ x_n for all n ≥ 1. This is a contradiction
since y ∈ X but is not in the range of {x_n}_{n=1}^{∞}. Therefore, X is not
countable.
7.	N × N = ∪_{k=2}^{∞} A_k. Since each A_k is countable, N × N is countable by
Theorem 2.3.1.

8.	We can assume that A and B are ranges of infinite sequences {a_m}_{m=1}^{∞} and
{b_n}_{n=1}^{∞}, respectively. Then A × B = ∪_{k=2}^{∞} {(a_m, b_n) : m + n = k}.
9.
All three are countable.
10.	Suppose X_i = {x_{i1}, x_{i2}, ...}, i ≥ 1. Fix n ∈ N and consider the set
A = {m : m ∈ N, (m + 1)(m + 2)/2 ≥ n}. A ⊂ N. Since

(n + 1)(n + 2)/2 ≥ 3n/2 ≥ n,

n ∈ A and therefore A ≠ ∅. It follows from the well-ordering property
that A has a least element m(n) for which

(m(n) + 1)(m(n) + 2)/2 ≥ n.

Since m(n) is the smallest integer with this property, m(n) − 1 does not
have this property, and so

m(n)(m(n) + 1)/2 < n.

Now define j(n) so that

n = m(n)(m(n) + 1)/2 + j(n),

where 1 ≤ j(n) ≤ m(n) + 1. Define α : N → X = ∪_{n=1}^{∞} X_n by
putting

α(n) = x_{m(n)+2−j(n), j(n)}.

Then X is the range of α.
Exercises 2.4
1.	Consider the statement

P(n) : A_1, ..., A_n ∈ 𝒜 implies ∪_{j=1}^{n} A_j ∈ 𝒜.

When n = 1, the statement simply says that A_1 ∈ 𝒜 implies A_1 ∈ 𝒜,
which is trivially true. Assume that P(n) is true and consider P(n + 1).
Let A_1, ..., A_n, A_{n+1} ∈ 𝒜. By the induction hypothesis that P(n) is true,
A_1 ∪ ··· ∪ A_n ∈ 𝒜. Since 𝒜 is an algebra, A_1 ∪ ··· ∪ A_n ∪ A_{n+1} =
(A_1 ∪ ··· ∪ A_n) ∪ A_{n+1} ∈ 𝒜. Therefore, P(n + 1) is true. It follows from
the principle of mathematical induction that P(n) is true for all positive
integers n.
2.	Suppose A_1, ..., A_n ∈ 𝒜. Since 𝒜 is closed under complementation,
A_1^c, ..., A_n^c ∈ 𝒜. Since 𝒜 is closed under finite unions by Problem 1,
∪_{j=1}^{n} A_j^c ∈ 𝒜. Since 𝒜 is closed under complementation, (∪_{j=1}^{n} A_j^c)^c ∈
𝒜. By de Morgan's laws, ∩_{j=1}^{n} A_j = (∪_{j=1}^{n} A_j^c)^c ∈ 𝒜.

3.	Let {A_j} be a finite or infinite sequence in the σ-algebra ℱ. If finite, the
intersection is in ℱ by Problem 2 since ℱ is an algebra. Assume {A_j} is an
infinite sequence. Since ℱ is closed under complementation, the sequence
{A_j^c} is in ℱ and ∪A_j^c is in ℱ. Thus, ∩A_j = (∪A_j^c)^c ∈ ℱ.
4.	Ω ∈ ℱ since Ω^c = ∅ is countable. Suppose A ∈ ℱ. If A is countable,
then the complement of A^c is countable and A^c ∈ ℱ; if A is not countable,
then A^c is countable and A^c ∈ ℱ. In either case, A^c ∈ ℱ. Let {A_j} be a
sequence in ℱ. If every A_j is countable, then ∪_j A_j is countable and belongs
to ℱ; if some A_{j_0} is not countable, then A_{j_0}^c is countable; (∪A_j)^c = ∩A_j^c
is countable since it is a subset of the countable A_{j_0}^c, and so ∪A_j ∈ ℱ. In
either case, ∪A_j ∈ ℱ.
5.	For each n ≥ 1, let A_{2n} be the event "success occurs for the first time on
trial numbered 2n." Let A = ∪_{n=1}^{∞} A_{2n}. Since the A_{2n} are disjoint events,
P(A) = Σ_{n=1}^{∞} P(A_{2n}). But P(A_{2n}) = q^{2n−1}p and

P(A) = Σ_{n=1}^{∞} q^{2n−1}p = (p/q) Σ_{n=1}^{∞} (q²)^n = (p/q) · q²/(1 − q²) = pq/(1 − q²) = q/(1 + q).
6.	(5/36) + (31/36)²(5/36) + (31/36)⁴(5/36) + (31/36)⁶(5/36) + ··· =
.5373.

7.	Let W_n, R_n, and B_n be the events that a white chip, a red chip, and a
black chip are chosen on the nth drawing, respectively. The event A of
interest can be decomposed into disjoint events according to the trial at
which a white chip appears for the first time while being preceded by
red chips; i.e., A = W_1 ∪ (R_1 ∩ W_2) ∪ (R_1 ∩ R_2 ∩ W_3) ∪ ··· and
P(A) = P(W_1) + P(R_1 ∩ W_2) + P(R_1 ∩ R_2 ∩ W_3) + ··· = w/(w + b).
8.	Define the sequence {B_j}_{j=1}^{∞} by B_1 = A_1 and B_j = A_j ∩ A_{j−1}^c for j ≥ 2.
The B_j are disjoint. Since each B_j ⊂ A_j, j ≥ 1, ∪B_j ⊂ ∪A_j. Suppose
ω ∈ ∪A_j. Then there is a smallest k ≥ 1 such that ω ∈ A_k, so that
ω ∉ ∪_{j=1}^{k−1} A_j and ω ∈ A_k ∩ A_{k−1}^c = B_k ⊂ ∪B_j. Thus, ∪A_j ⊂ ∪B_j
and the two are equal.

9.	Fix n ≥ 1. Since A = ∩_{j=1}^{∞} A_j ⊂ A_n, A_n = A ∪ (A_n ∩ A^c). Let
B_n = A_n ∩ A^c. Since A_{n+1} ⊂ A_n, B_{n+1} = A_{n+1} ∩ A^c ⊂ A_n ∩ A^c = B_n
and {B_j}_{j=1}^{∞} is a decreasing sequence. Now ∩_{j=1}^{∞} B_j = ∩_{j=1}^{∞} (A_j ∩ A^c). It
is easy to check that the last set is equal to A^c ∩ ∩_{j=1}^{∞} A_j = A^c ∩ A = ∅.
Exercises 2.5
1.	Let A be the event "outer diameter of the sleeve ω is too large" and
let B be the event "inner diameter of the sleeve ω is too large." Then
P(A) = .05 and P(B) = .03. Since we know nothing about P(A ∩ B),
P(A ∪ B) ≤ P(A) + P(B) = .08. Since .05 = P(A) ≤ P(A ∪ B) and
.03 = P(B) ≤ P(A ∪ B), .05 ≤ P(A ∪ B) ≤ P(A) + P(B) = .08.
2.	Let A, B, and C be three arbitrary events. Then, by Equation 2.11,

P(A ∪ B ∪ C) = P((A ∪ B) ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C).

But

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

and

P((A ∪ B) ∩ C) = P((A ∩ C) ∪ (B ∩ C))
= P(A ∩ C) + P(B ∩ C) − P((A ∩ C) ∩ (B ∩ C))
= P(A ∩ C) + P(B ∩ C) − P(A ∩ B ∩ C).

Thus,

P(A ∪ B ∪ C) = P(A) + P(B) − P(A ∩ B) + P(C)
− P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
3.	65/96.

4.	(a) P = 4 · (1000/10⁴) − 6 · (100/10⁴) + 4 · (10/10⁴) − (1/10⁴) =
.3439. (b) P = 1 − (9/10)⁴ = .3439.

5.	The required probability is P((A ∩ B^c) ∪ (B ∩ A^c)) = P(A ∩ B^c) +
P(B ∩ A^c) − P((A ∩ B^c) ∩ (B ∩ A^c)) = P(A) − P(A ∩ B) + P(B) −
P(A ∩ B) = P(A) + P(B) − 2P(A ∩ B).
6.	No. The probability of getting at least one ace with the throw of four dice
is

1 − (5/6)⁴ ≈ .51775,

whereas the probability of getting a double ace in 24 throws of a pair of
dice is

1 − (35/36)²⁴ ≈ .49140.
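The two probabilities can be checked with a few lines of Python:

```python
p_four_dice = 1 - (5 / 6) ** 4        # at least one ace in four throws
p_double_ace = 1 - (35 / 36) ** 24    # at least one double ace in 24 throws
print(round(p_four_dice, 4))          # 0.5177
print(round(p_double_ace, 4))         # 0.4914
```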
7.	Suppose ω_0 = {δ_n}_{n=1}^{∞} where each δ_i is a 1 or a 0. For each n ≥ 1,
let A_n = {ω : ω = {x_j}_{j=1}^{∞}, x_1 = δ_1, ..., x_n = δ_n}. Then {ω_0} =
∩_{j=1}^{∞} A_j ⊂ A_n for all n ≥ 1. Let α = max(p, q) < 1. Since

0 < P(A_n) ≤ α^n

and lim_{n→∞} α^n = 0, lim_{n→∞} P(A_n) = 0, and so P({ω_0}) = 0.
8.	Since A ∩ B ⊂ A and A ∩ B ⊂ B, P(A ∩ B) ≤ P(A) and P(A ∩ B) ≤ P(B),
so that P(A ∩ B) ≤ min(P(A), P(B)). On the other hand, P(A ∩ B) =
P(A) + P(B) − P(A ∪ B) ≥ P(A) + P(B) − 1 since P(A ∪ B) ≤ 1.

9.	The inequality is trivially true when n = 1. Assume the inequality is true
for n. By Problem 8,

P(A_1 ∩ ··· ∩ A_n ∩ A_{n+1}) ≥ P(A_1 ∩ ··· ∩ A_n) + P(A_{n+1}) − 1
≥ (P(A_1) + ··· + P(A_n) − (n − 1)) + P(A_{n+1}) − 1
= P(A_1) + ··· + P(A_{n+1}) − n,

and the inequality is true for n + 1 whenever it is true for n. By the
principle of mathematical induction, it is true for all n ≥ 1.
Exercises 2.6
1.	By mutual independence, the pairs A and B, A and C, B and C are inde­pendent. By Theorem 2.6.1, the pairs A and B^c, B^c and C are independent.
Thus, the pairs A and B^c, A and C, B^c and C are independent. We need
only show that P(A ∩ B^c ∩ C) = P(A)P(B^c)P(C). By Equation 2.8, P(A ∩
B^c ∩ C) = P((A ∩ C) ∩ B^c) = P(A ∩ C) − P(A ∩ C ∩ B) = P(A)P(C) −
P(A)P(B)P(C) = P(A)P(C)(1 − P(B)) = P(A)P(B^c)P(C). Thus, A,
B^c, and C are mutually independent.
2.	Let A_1, ..., A_n be mutually independent. This means that if 1 ≤ i_1 <
··· < i_k ≤ n, then

P(A_{i_1} ∩ ··· ∩ A_{i_k}) = Π_{j=1}^{k} P(A_{i_j}).

If we can show that this equation remains valid whenever some A_{i_j} is
replaced by its complement, then this procedure can be repeated as often
as necessary to arrive at the B_j. Just by interchanging the positions of
the A_{i_j}, we can assume that we want to replace A_{i_1} by its complement. By
Equation 2.8,

P(A_{i_1}^c ∩ A_{i_2} ∩ ··· ∩ A_{i_k}) = P(A_{i_2} ∩ ··· ∩ A_{i_k}) − P(A_{i_1} ∩ ··· ∩ A_{i_k})
= Π_{j=2}^{k} P(A_{i_j}) − Π_{j=1}^{k} P(A_{i_j})
= ( Π_{j=2}^{k} P(A_{i_j}) )(1 − P(A_{i_1}))
= P(A_{i_1}^c)P(A_{i_2}) × ··· × P(A_{i_k}).
3.	(a) P(A ∩ B ∩ C) = P(A | B ∩ C)P(B ∩ C) = P(A | B ∩ C)P(B | C)P(C).
(b) P(A_1 ∩ ··· ∩ A_n) = P(A_n | A_1 ∩ ··· ∩ A_{n−1})P(A_{n−1} | A_1 ∩ ··· ∩
A_{n−2}) × ··· × P(A_2 | A_1)P(A_1), provided the conditional probabilities
are defined.

4.	The only outcomes with positive probabilities are (11,12,13),
(11,12,11), (11,10,11), (11,10,9), (9,10,11), (9,10,9), (9,8,9), (9,8,7).
Let R_i be the event "the ith ball selected is red," i = 1, 2, 3. Then

P(11,12,13) = P(R_1^c ∩ R_2^c ∩ R_3^c) = P(R_3^c | R_1^c ∩ R_2^c)P(R_2^c | R_1^c)P(R_1^c) = 9/100.

Similarly, P(11,12,11) = 27/200, P(11,10,11) = 11/80, P(11,10,9) =
11/80, P(9,10,11) = 11/80, P(9,10,9) = 11/80, P(9,8,9) = 27/200,
P(9,8,7) = 9/100.
5.
6.
7.
We are given P(A) = P(B) = P(C) = 1/2, P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = P(A ∩ B ∩ C) = 1/4. (a) Since P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), and P(B ∩ C) = P(B)P(C), A, B, and C are pairwise independent. (b) Since P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A)P(B)P(C), the three events are not mutually independent.
Let A be the event “1 is sent” and let B be the event “1 is received.” By
Bayes’ rule,
(a)P(AjB) = .9896.
(b)P(Ac|Bc) = .9519.
For i = 1, 2,3, let A, be the event “the ith digit sent is 1” and let B, be
the event “the ith digit received is 1.” By independence and the previous
problem,
P(A_1 ∩ A_2 ∩ A_3 | B_1 ∩ B_2^c ∩ B_3)
= P((A_1 ∩ B_1) ∩ (A_2 ∩ B_2^c) ∩ (A_3 ∩ B_3)) / P(B_1 ∩ B_2^c ∩ B_3)
= P(A_1 ∩ B_1)P(A_2 ∩ B_2^c)P(A_3 ∩ B_3) / (P(B_1)P(B_2^c)P(B_3))
= P(A_1 | B_1)P(A_2 | B_2^c)P(A_3 | B_3)
= (.9896)^2(.0481) = .0471.
8.
Let Ct be the event “Chest i is selected” and let G be the event “Gold
coin is observed." The given facts are P(C_1) = P(C_2) = P(C_3) = 1/3, P(G | C_1) = 1, P(G | C_2) = 1/2, P(G | C_3) = 0. Given that the
observed coin is gold, the only way that the other coin can be gold is for
the outcome to be in Chest 1, which means that we are required to find
P(C_1 | G). Apply Bayes' rule: P(C_1 | G) = 2/3.
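Problem 8's Bayes computation is easy to verify directly; a minimal sketch using the priors and likelihoods exactly as stated in the solution:

```python
# Priors and likelihoods as given: P(C_i) = 1/3 and P(G | C_i).
priors = {1: 1/3, 2: 1/3, 3: 1/3}
p_gold = {1: 1.0, 2: 0.5, 3: 0.0}

# Bayes' rule: P(C_1 | G) = P(G | C_1)P(C_1) / sum_i P(G | C_i)P(C_i)
total = sum(p_gold[i] * priors[i] for i in priors)
posterior_c1 = p_gold[1] * priors[1] / total
```

Here `posterior_c1` evaluates to 2/3, matching the solution.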
9.
P((A ∪ B) ∩ (C ∩ D)) = P((A ∩ C ∩ D) ∪ (B ∩ C ∩ D)) = P(A ∩ C ∩ D) + P(B ∩ C ∩ D) − P(A ∩ B ∩ C ∩ D) = P(A)P(C)P(D) + P(B)P(C)P(D) − P(A)P(B)P(C)P(D) = P(C)P(D){P(A) + P(B) − P(A)P(B)} = P(C ∩ D){P(A) + P(B) − P(A ∩ B)} = P(C ∩ D)P(A ∪ B).
10. Since the events A ∩ A_j, j ≥ 1, are disjoint, P(A ∩ (∪A_j)) = P(∪(A ∩ A_j)) = Σ P(A ∩ A_j) = Σ P(A)P(A_j) = P(A)P(∪A_j).
11. (a) .9007. (b) .9021. Not much is gained by adding B3.
12. Three of the B, and two of the Cj at a cost of $460.
Exercises 2.7
1.
28,319/44,800 ≈ .6321.
2.
(a) For equalization to occur on the 2nth trial, there must be n heads and n tails. (b) Using the fact that p(1 − p) < 1/4 and the limit ratio test, the infinite series
Σ_{n=1}^∞ P(A_{2n}) = Σ_{n=1}^∞ C(2n, n) p^n (1 − p)^n
converges, and so P(A_{2n} i.o.) = 0.
3.
Let B_{n,r} be the event consisting of those outcomes for which there is a run of length r beginning on the nth trial. Since P(B_{n,r}) = 1/2^{r+1}, the events B_{r+2,r}, B_{2(r+2),r}, B_{3(r+2),r}, … are independent events, and
Σ_{n=1}^∞ P(B_{n(r+2),r}) = Σ_{n=1}^∞ 1/2^{r+1} = +∞,
so P(B_{n(r+2),r} i.o.) = 1. Since (B_{n(r+2),r} i.o.) ⊂ (B_{n,r} i.o.) ⊂ (A_{n,r} i.o.), P(A_{n,r} i.o.) = 1.
4.
Since
Σ_{n=1}^∞ P(A_{n,r_n}) = Σ_{n=1}^∞ 2^{−(1+δ) log_2 n} = Σ_{n=1}^∞ n^{−(1+δ)} < ∞,
P(A_{n,r_n} i.o.) = 0.
5.
For Ω, take all sequences ω = {x_i}_{i=1}^∞ where each x_i ∈ {2, 3, …, 12}. For n ≥ 1, 1 ≤ i_1 < i_2 < ··· < i_n, and δ_1, …, δ_n ∈ {2, 3, …, 12}, let
A_{i_1,…,i_n}(δ_1, …, δ_n) = {ω : ω = {x_i}_{i=1}^∞, x_{i_1} = δ_1, …, x_{i_n} = δ_n}
and let F be the smallest σ-algebra containing all such sets. Define
P(A_{i_1,…,i_n}(δ_1, …, δ_n)) = p(δ_1) × ··· × p(δ_n) = ∏_{j=1}^{n} p(δ_j),
where p is the weight function given in Figure 1.3.
6.
Σ_{n=1}^∞ (5/36)(25/36)^{n−1}(5/36) = 25/(11·36).
7.
6/36 + 2/36 + 2{1/36 + 2/45 + 25/(11·36)} ≈ .4929.
8.
The probability of losing is 1/36 + 2/36 + 1/36 + 2{1/18 + 1/15 + 5/66} ≈ .507. There is probability 1 that the game will terminate, since the sum of the probabilities of winning and losing is 1.
9.
.63210558.
10. Let T be the number of purchases required to obtain a complete set of
collectibles. Then
P(T > m) = Σ_{r=1}^{8} (−1)^{r−1} C(8, r)(.3 + .7(1 − r/8))^m.
Since P(T > 55) > .05 and P(T > 56) < .05, 56 purchases are required.
CHAPTER 3
Exercises 3.2
1.
.4050.
2.
47.
3. .2202.
4.
f_X(x) = 2(n + 1 − x)/(n(n + 1)) if x = 1, 2, …, n, and 0 otherwise.
f_Y(y) = 2(n + 1 − y)/(n(n + 1)) if y = 1, 2, …, n, and 0 otherwise.
5.
Let z ∈ {1, 2, …, 6}. By Equation 3.6,
6.
The recursion formula is
b(k; n, p)/b(k − 1; n, p) = (n − k + 1)p/(kq) = 1 + ((n + 1)p − k)/(kq), k = 1, 2, … .
Let m = [(n + 1)p], the greatest integer less than or equal to (n + 1)p. If (n + 1)p is not an integer, the b(k; n, p) are strictly increasing for k ≤ m and then are strictly decreasing for k ≥ m, so that the maximum value is b(m; n, p). If (n + 1)p is an integer, then both b(m; n, p) and b(m − 1; n, p) are maximum values.
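The ratio formula and the location of the mode at m = [(n + 1)p] are easy to confirm numerically; a sketch with illustrative parameters n = 10, p = .3 (not from the text):

```python
from math import comb

def b(k, n, p):
    """Binomial density b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
q = 1 - p
m = int((n + 1) * p)          # greatest integer <= (n+1)p; here m = 3

# The ratio formula from the solution, checked at k = m.
ratio = b(m, n, p) / b(m - 1, n, p)
assert abs(ratio - (1 + ((n + 1) * p - m) / (m * q))) < 1e-9

# The maximum of b(k; n, p) over k is attained at m.
mode = max(range(n + 1), key=lambda k: b(k, n, p))
assert mode == m
```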
7. f_X is a Poisson density with parameter α, and f_Y is a Poisson density with parameter β.
8. f_X(x_i) = c f(x_i) where c = Σ_{j=1}^∞ g(y_j), and f_Y(y_j) = d g(y_j) where d = Σ_{i=1}^∞ f(x_i).
9. f_X(x) = .25 for x = 1, 2, 3, 4. f_Y(1) = .12, f_Y(2) = .27, f_Y(3) = .20, f_Y(4) = .25, f_Y(5) = .16. P(Y > X) = .70.
10. Think of Ω as points (i, j) in the plane with integer coordinates 1 ≤ i, j ≤ 50, except that points on the diagonal i = j are not included. Points on the two diagonals adjacent to the diagonal are not included in the event, so the probability is 2352/2450 = .96.
11. .9341.
12. 372.
Exercises 3.3
1.
For all real t, (1 + t)^{a+b} = (1 + t)^a(1 + t)^b. By the generalized binomial theorem,
Σ_{z=0}^∞ C(a + b, z)t^z = (Σ_{x=0}^∞ C(a, x)t^x)(Σ_{y=0}^∞ C(b, y)t^y) = Σ_{z=0}^∞ c_z t^z
where
c_z = Σ_{x=0}^{z} C(a, x)C(b, z − x).
If two power series agree on an open interval about 0, then their coefficients must be equal. Thus,
C(a + b, z) = Σ_{x=0}^{z} C(a, x)C(b, z − x).
2.
For z = 0, 1, 2, …,
f_Z(z) = Σ_{x=0}^{z} C(r + x − 1, x)p^r q^x C(s + z − x − 1, z − x)p^s q^{z−x}
= p^{r+s} q^z Σ_{x=0}^{z} C(r + x − 1, x)C(s + z − x − 1, z − x)
= C(r + s + z − 1, z)p^{r+s} q^z.
Thus, Z has a negative binomial density with parameters r + s and p.
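The convolution identity of Problem 2 can be verified termwise; a minimal sketch using the failures-before-the-rth-success form of the negative binomial, with illustrative parameters r = 2, s = 3, p = .6:

```python
from math import comb

def nb(x, r, p):
    """Negative binomial density: C(r + x - 1, x) p^r q^x."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

r, s, p = 2, 3, 0.6
for z in range(8):
    conv = sum(nb(x, r, p) * nb(z - x, s, p) for x in range(z + 1))
    # Convolution of NB(r, p) and NB(s, p) is NB(r + s, p).
    assert abs(conv - nb(z, r + s, p)) < 1e-12
```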
3. P(X > Y) = (m + 1)/2n, P(X = Y) = 1/n.
4. f_{X,Y}(x, y) = P(X = x, Y = y) = P(X = x, Y = y, N = x + y)
= P(X = x, Y = y | N = x + y)P(N = x + y)
= C(x + y, x)p^x q^y e^{−λ} λ^{x+y}/(x + y)!
= (e^{−λp}(λp)^x/x!)(e^{−λq}(λq)^y/y!).
Since X ≤ N,
f_X(x) = P(X = x) = Σ_{n≥x} P(X = x | N = n)P(N = n)
= Σ_{n=x}^∞ C(n, x)p^x q^{n−x} e^{−λ} λ^n/n!
= (e^{−λ}(λp)^x/x!) Σ_{n=x}^∞ (λq)^{n−x}/(n − x)!
= e^{−λp}(λp)^x/x!.
Similarly, P(Y = y) = e^{−λq}(λq)^y/y!. Thus, f_{X,Y}(x, y) = f_X(x)f_Y(y), and X and Y are independent random variables.
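The factorization in Problem 4 (Poisson thinning) can be checked numerically; a sketch with illustrative values λ = 3, p = .4:

```python
from math import comb, exp, factorial

lam, p = 3.0, 0.4

def joint(x, y):
    """P(X = x, Y = y): Poisson(lam) total N, split binomially with parameter p."""
    n = x + y
    return exp(-lam) * lam**n / factorial(n) * comb(n, x) * p**x * (1 - p)**y

def pois(k, mu):
    return exp(-mu) * mu**k / factorial(k)

# The joint density factors into Poisson(lam*p) and Poisson(lam*q) marginals.
for x in range(8):
    for y in range(8):
        target = pois(x, lam * p) * pois(y, lam * (1 - p))
        assert abs(joint(x, y) - target) < 1e-12
```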
5.
P(X ≥ Y) = 1/(2 − p), P(X = Y) = p/(2 − p).
6.
For z = 2, 3, …,
f_Z(z) = Σ_{x=1}^{z−1} f_X(x)f_Y(z − x) = Σ_{x=1}^{z−1} pq^{x−1} pq^{z−x−1} = (z − 1)p^2 q^{z−2}.
7. f_Z(z) = (z − 1)/n^2 if 2 ≤ z ≤ n + 1, (2n − z + 1)/n^2 if n + 1 < z ≤ 2n, and 0 otherwise.
8.
Suppose the ranges of X and Y are {x_1, x_2, …} and {y_1, y_2, …}, respectively. Suppose φ_i and ψ_j are in the ranges of φ(X) and ψ(Y), respectively. Let {x_{i_1}, x_{i_2}, …} be the set of values of X such that φ(x_{i_m}) = φ_i and let {y_{j_1}, y_{j_2}, …} be the set of values of Y such that ψ(y_{j_n}) = ψ_j. Then (φ(X) = φ_i, ψ(Y) = ψ_j) = ∪_{m,n}(X = x_{i_m}, Y = y_{j_n}). Since the latter sets are disjoint,
P(φ(X) = φ_i, ψ(Y) = ψ_j) = Σ_{m,n} P(X = x_{i_m}, Y = y_{j_n})
= (Σ_m P(X = x_{i_m}))(Σ_n P(Y = y_{j_n}))
= P(∪_m(X = x_{i_m}))P(∪_n(Y = y_{j_n}))
= P(φ(X) = φ_i)P(ψ(Y) = ψ_j).
9. f_Z(1) = f_Z(13) = .000, f_Z(2) = f_Z(12) = .004, f_Z(3) = f_Z(11) = .018, f_Z(4) = f_Z(10) = .057, f_Z(5) = f_Z(9) = .122, f_Z(6) = f_Z(8) = .189, f_Z(7) = .219.
Exercises 3.4
1. f_X(t) = t(1 − t^n)/(n(1 − t)).
2. Since f(t) = 1 − (1 − t^2)^{1/2} = 1 − Σ_{j=0}^∞ C(1/2, j)(−1)^j t^{2j}, a_{2j+1} = 0 for all j ≥ 0, and a_{2j} = (−1)^{j+1} C(1/2, j) for all j ≥ 1.
3. (a) Geometric density with p = 1/3. (b) Poisson density with λ = 1/4.
(c) Negative binomial density with r = 5 and p = 1/8.
4. X = X_1 + ··· + X_N where X_1, X_2, … is an infinite sequence of independent random variables with f_{X_1}(t) = (1/2) + (1/2)t, N has generating function f_N(t) = (t/6)((1 − t^6)/(1 − t)), and the random variables N, X_1, X_2, … are independent. Thus,
f_X(t) = f_N(f_{X_1}(t)) = ((.5 + .5t)/6) × (1 − (.5 + .5t)^6)/(1 − (.5 + .5t)).
5. f(t) is the generating function of a random number of random variables S_N = X_1 + ··· + X_N where the X_i's are Bernoulli random variables with p = 1/3 and N has a uniform density on {1, 2, …, 6}. This could arise from tossing a die to determine how many times a basic S or F trial with p = 1/3 should be performed.
6. f_X(2x) = (2^x e^{−2})/x! and f_X(2x + 1) = 0 for x = 0, 1, … .
7. Letting m = [n/2], m is the largest integer such that 2m ≤ n. Write
E_n = (Σ_{j=1}^n X_j ∈ {0, 2, …, 2m}).
Stratifying E_n using the values of X_1,
E_n = (X_1 = 0, Σ_{j=1}^n X_j ∈ {0, 2, …, 2m}) ∪ (X_1 = 1, Σ_{j=1}^n X_j ∈ {0, 2, …, 2m})
= (X_1 = 0, Σ_{j=2}^n X_j ∈ {0, 2, …, 2m}) ∪ (X_1 = 1, Σ_{j=2}^n X_j ∉ {0, 2, …, 2m}),
and so
p_n = P(E_n) = P(X_1 = 0, Σ_{j=2}^n X_j ∈ {0, 2, …, 2m}) + P(X_1 = 1, Σ_{j=2}^n X_j ∉ {0, 2, …, 2m}).
Since X_1 and Σ_{j=2}^n X_j are independent by Lemma 3.3.3,
p_n = q P(Σ_{j=2}^n X_j ∈ {0, 2, …, 2m}) + p(1 − P(Σ_{j=2}^n X_j ∈ {0, 2, …, 2m})).
Since P(Σ_{j=2}^n X_j ∈ {0, 2, …, 2m}) = P(Σ_{j=1}^{n−1} X_j ∈ {0, 2, …, 2m}) = p_{n−1},
p_n = q p_{n−1} + p(1 − p_{n−1}).
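The recursion p_n = qp_{n−1} + p(1 − p_{n−1}) of Problem 7 (the probability that a binomial count is even) is linear, and solving it gives the closed form p_n = (1 + (q − p)^n)/2; a sketch comparing the two with an illustrative p = .3:

```python
p = 0.3
q = 1 - p

pn = 1.0                      # p_0 = 1: an empty sum of Bernoulli trials is even
for n in range(1, 21):
    pn = q * pn + p * (1 - pn)          # the recursion from Problem 7
    closed = (1 + (q - p)**n) / 2       # closed-form solution of the recursion
    assert abs(pn - closed) < 1e-12
```

As n grows, (q − p)^n → 0, so p_n → 1/2 whatever the value of p ∈ (0, 1).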
8.
The difference equation is q_n = (1/2)q_{n−1} + (1/4)q_{n−2}, n ≥ 2, subject to the initial conditions q_0 = q_1 = 1. The solution is
q_n = ((5 + 3√5)/10)((1 + √5)/4)^n + ((5 − 3√5)/10)((1 − √5)/4)^n.
9. .03262.
10. f_Z(1) = 1/192, f_Z(2) = 1/32, f_Z(3) = 1/12, f_Z(4) = 13/96, f_Z(5) = 31/192, f_Z(6) = 1/6, f_Z(7) = 31/192, f_Z(8) = 13/96, f_Z(9) = 1/12, f_Z(10) = 1/32, f_Z(11) = 1/192.
11. f_X(0) = .00790, f_X(1) = .04972, f_X(2) = .13998, f_X(3) = .23204, f_X(4) = .25083, f_X(5) = .18474, f_X(6) = .09390, f_X(7) = .03252, f_X(8) = .00735, f_X(9) = .00098, f_X(10) = .00006.
Exercises 3.5
1.
Add the right sides of Equations 3.14 and 3.16 to verify that p_x + q_x = 1 in the p ≠ q case and the right sides of Equations 3.15 and 3.17 to verify that p_x + q_x = 1 in the p = q = 1/2 case.
2. lim_{a→∞} q_x = 1 is the probability of eventual ruin against an infinitely rich adversary in the unfair situation q > p.
3.
If Y_2 + ··· + Y_j = y for some j ≥ 2, then there is a smallest integer for which this is true; i.e.,
(Y_2 + ··· + Y_j = y for some j ≥ 2)
= (Y_2 = y) ∪ ∪_{k=0}^∞ (Y_2 ≠ y, …, Y_2 + ··· + Y_{2+k} ≠ y, Y_2 + ··· + Y_{3+k} = y).
Since the events on the right side are disjoint and each is independent of (Y_1 = 1), the events (Y_1 = 1) and (Y_2 + ··· + Y_j = y for some j ≥ 2) are independent (see Exercise 2.6.9).
4.
The difference equation for the probability of ruin q_x is
q_x = αq_{x+1} + βq_x + γq_{x−1}, 1 ≤ x ≤ a − 1,
subject to the boundary conditions
q_0 = 1, q_a = 0.
The equation is the same as the difference equation
q_x = (α/(1 − β))q_{x+1} + (γ/(1 − β))q_{x−1}.
The solution to this equation is obtained by replacing p by α/(1 − β) and q by γ/(1 − β) in Equations 3.14 and 3.15 to obtain
q_x = ((γ/α)^a − (γ/α)^x)/((γ/α)^a − 1), 1 ≤ x ≤ a − 1,
in the α ≠ γ case and
q_x = 1 − x/a, 1 ≤ x ≤ a − 1,
in the α = γ case.
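With β = 0 the formula above reduces to the classical ruin probability q_x = ((q/p)^a − (q/p)^x)/((q/p)^a − 1), which can be checked against a direct simulation of the random walk; a sketch with illustrative values x = 3, a = 10, p = .45:

```python
import random

def ruin_exact(x, a, p):
    """q_x = ((q/p)^a - (q/p)^x) / ((q/p)^a - 1), valid for p != q."""
    r = (1 - p) / p
    return (r**a - r**x) / (r**a - 1)

def ruin_mc(x, a, p, trials=100_000, seed=1):
    """Monte Carlo estimate: fraction of walks from x absorbed at 0 before a."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        pos = x
        while 0 < pos < a:
            pos += 1 if rng.random() < p else -1
        ruined += pos == 0
    return ruined / trials

exact = ruin_exact(3, 10, 0.45)
approx = ruin_mc(3, 10, 0.45)
assert abs(exact - approx) < 0.01
```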
CHAPTER 4
Exercises 4.2
1.
E[X^2] = (1 + q)/p^2.
2. E[X2]=A2 + A.
3. E[X] = 2.
4. E[1/(X + 1)] = (1 — e-A)/A.
5.
E[1/(X + 1)] = p/(q(r — 1)).
6. E[X] = Σ_{x=0}^∞ P(X > x) = Σ_{x=0}^∞ P(X ≥ x + 1) = Σ_{x=1}^∞ P(X ≥ x).
7. The generating function of S_N is given by f_{S_N}(t) = f_N(f_{X_1}(t)) and E[S_N] = f'_{S_N}(1). Since f'_{S_N}(t) = f'_N(f_{X_1}(t))f'_{X_1}(t), E[N] = f'_N(1), E[X_1] = f'_{X_1}(1), and f_{X_1}(1) = 1,
E[S_N] = f'_N(f_{X_1}(1))f'_{X_1}(1) = f'_N(1)f'_{X_1}(1) = E[N]E[X_1].
8.
Let T = n if the 10-digit combination occurs on the nth trial for the first
time. Then
and E[T] = g'_T(1) = 2046.
9.
P(T ≤ 11) = 1 − P(T > 11) = 279/1024.
Exercises 4.3
1.
E[X^2] = 154/9.
2. E[U] = 91/36.
3. var X = q/p^2.
4. var X = 32.
5. var X = r(q/p^2).
6. d = 377.
7. n = 750.
8. E[V] = V_1 r(n_1/n) + ··· + V_s r(n_s/n).
9. 106.5 pounds.
10. var X_i = r(n_i/n)((n − r)(n − n_i)/(n(n − 1))).
11. Let {x_1, x_2, …} be the range of X. By hypothesis, Σ_j (x_j − μ)^2 f_X(x_j) = 0. For any j with x_j − μ ≠ 0, f_X(x_j) = 0. Since Σ_j f_X(x_j) = 1, there is some x_j in the range of X with f_X(x_j) > 0 that cannot be different from μ. Therefore, x_j = μ, f_X(μ) > 0, and f_X(μ) = 1; i.e., P(X = μ) = 1.
12. var X = g''(1) + g'(1) − [g'(1)]^2.
13. σ_T = 27.09.
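Problem 12's identity var X = g''(1) + g'(1) − [g'(1)]^2 is easy to confirm numerically; a sketch for the geometric-type density f(x) = pq^x (the setting of Problems 1 and 3), truncated where the tail is negligible:

```python
p = 0.4
q = 1 - p
xs = range(200)                         # q**200 is negligibly small
f = [p * q**x for x in xs]

g1 = sum(x * fx for x, fx in zip(xs, f))            # g'(1)  = E[X]
g2 = sum(x * (x - 1) * fx for x, fx in zip(xs, f))  # g''(1) = E[X(X-1)]
var_from_g = g2 + g1 - g1**2

assert abs(g1 - q / p) < 1e-9             # E[X] = q/p
assert abs(var_from_g - q / p**2) < 1e-9  # var X = q/p^2, as in Problem 3
```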
Exercises 4.4
1. μ_X = .5, σ_X^2 = 1.05, μ_Y = 2.6, σ_Y^2 = .94, E[XY] = 1.7, ρ(X, Y) ≈ .40.
2.
var(X+2T) = 6 + 3^.
3. Note that X + Y = 2. Since cov(X, 2) = 0 and cov(X, X) = σ_X^2, cov(X, Y) = cov(X, 2 − X) = cov(X, 2) − cov(X, X) = −σ_X^2. Since σ_{2−X} = σ_X, ρ(X, Y) = −σ_X^2/(σ_X σ_{2−X}) = −1.
4.
ρ(X_1 + 2X_2 − X_3, 3X_1 − X_2 + X_3) = 5/(2√170).
5.
ρ(X, Y) = −1/2.
6.
(a) E[X_i] = (r − 1)^n/r^n = (1 − (1/r))^n.
(b) E[X_i X_j] = (r − 2)^n/r^n = (1 − (2/r))^n for i ≠ j.
(c) E[S_r] = r((r − 1)^n/r^n) = r(1 − (1/r))^n.
(d) Since X_i^2 = X_i, var X_i = (1 − (1/r))^n − (1 − (1/r))^{2n}. For i ≠ j, cov(X_i, X_j) = (1 − (2/r))^n − (1 − (1/r))^{2n}. Thus,
var S_r = Σ_{i=1}^{r} var X_i + 2 Σ_{1≤i<j≤r} cov(X_i, X_j).
7.
(a) E[I_{i,k} I_{j,k}] = 0 for i ≠ j.
(b) E[I_{i,k} I_{j,l}] = p_i p_j for k ≠ l, i ≠ j.
(c) E[Y_i] = np_i and E[Y_i Y_j] = n(n − 1)p_i p_j for i ≠ j.
(d) var Y_i = np_i(1 − p_i).
(e) cov(Y_i, Y_j) = −np_i p_j and ρ(Y_i, Y_j) = −√(p_i p_j/((1 − p_i)(1 − p_j))) for i ≠ j.
8.
Suppose the range of X is {a, b} and the range of Y is {c, d}. If E is any event, let I_E be the indicator function of E; i.e., I_E(ω) = 1 or 0 according to whether ω is in E or not. Let A = {ω : X(ω) = a}, C = {ω : Y(ω) = c}. Then X = aI_A + bI_{A^c}, Y = cI_C + dI_{C^c}, and XY = acI_{A∩C} + adI_{A∩C^c} + bcI_{A^c∩C} + bdI_{A^c∩C^c}. Thus,
E[XY] = (a − b)(c − d)P(A ∩ C) + d(a − b)P(A) + b(c − d)P(C) + bd
and
E[X]E[Y] = (a − b)(c − d)P(A)P(C) + d(a − b)P(A) + b(c − d)P(C) + bd.
Since cov(X, Y) = E[XY] − E[X]E[Y] = 0, P(A ∩ C) = P(A)P(C). Therefore, the pair {A, C} are independent events, as are the pairs {A^c, C}, {A, C^c}, and {A^c, C^c}. Thus, X and Y are independent.
9. Minimize the function of two variables g(c, d) = E[(Y − cX − d)^2]. Then
c = cov(X, Y)/σ_X^2, d = E[Y] − (cov(X, Y)/σ_X^2)E[X].
Exercises 4.5
1. E[Y] = 25.75, var Y = 490.1875.
2. E[Y] = 25, var Y = 500.
3. Since X_1 + X_2 has a Poisson density p(·; λ_1 + λ_2) by Theorem 3.3.4, if n is any positive integer,
P(X_1 = x | X_1 + X_2 = n) = P(X_1 = x, X_1 + X_2 = n)/P(X_1 + X_2 = n)
= P(X_1 = x)P(X_2 = n − x)/P(X_1 + X_2 = n)
= p(x; λ_1)p(n − x; λ_2)/p(n; λ_1 + λ_2)
= C(n, x)(λ_1/(λ_1 + λ_2))^x(λ_2/(λ_1 + λ_2))^{n−x}.
4. By Theorem 3.3.4, X + Y has a binomial density b(·; m + n, p). For z = 0, …, m + n and x = 0, …, z,
f_{X|X+Y}(x | z) = P(X = x, X + Y = z)/P(X + Y = z)
= P(X = x)P(Y = z − x)/P(X + Y = z)
= b(x; m, p)b(z − x; n, p)/b(z; m + n, p)
= C(m, x)C(n, z − x)/C(m + n, z).
By Equation 1.11,
E[X | X + Y = z] = Σ_x x C(m, x)C(n, z − x)/C(m + n, z) = (m/(m + n))z.
5. E[X] = 50.
6. Since N and X_n are independent, f_{X_n|N}(x | n) = f_{X_n}(x) and
E[X_n | N = n] = Σ_x x f_{X_n|N}(x | n) = Σ_x x f_{X_n}(x) = E[X_n].
7. By Problem 6,
E[S_N] = Σ_{n=0}^∞ E[S_N | N = n]f_N(n) = Σ_{n=0}^∞ E[S_n]f_N(n) = Σ_{n=0}^∞ nμ f_N(n) = μE[N].
Also,
E[S_N^2] = Σ_{n=0}^∞ E[S_n^2]f_N(n) = Σ_{n=0}^∞ (var S_n + (E[S_n])^2)f_N(n) = Σ_{n=0}^∞ (nσ^2 + n^2μ^2)f_N(n) = σ^2E[N] + μ^2E[N^2].
Therefore, var S_N = σ^2E[N] + μ^2E[N^2] − μ^2(E[N])^2 = σ^2E[N] + μ^2 var N.
8. Let Z = c. Then
f_{Z|X}(z | x) = P(Z = z, X = x)/P(X = x) = 1 if z = c and 0 if z ≠ c,
whenever f_X(x) > 0. Therefore,
E[Z | X = x] = Σ_z z f_{Z|X}(z | x) = c
whenever f_X(x) > 0.
9. By Inequality 4.8, φ(X)Y has finite expectation and E[φ(X)Y | X = x] is defined whenever f_X(x) > 0. Suppose f_X(x) > 0. Letting Z = φ(X)Y, note that
E[Z | X = x] = Σ_z z f_{Z|X}(z | x) = Σ_{z≠0} z f_{Z|X}(z | x).
For z ≠ 0,
f_{Z|X}(z | x) = P(φ(X)Y = z, X = x)/P(X = x).
Suppose φ(x) ≠ 0. Then
f_{Z|X}(z | x) = P(Y = z/φ(x), X = x)/P(X = x)
and
E[Z | X = x] = Σ_{z≠0} z f_{Z|X}(z | x) = φ(x) Σ_{z≠0} (z/φ(x)) f_{Y|X}(z/φ(x) | x) = φ(x)E[Y | X = x].
If φ(x) = 0, then f_{Z|X}(z | x) = P(0 = z, X = x)/P(X = x) = 0 for z ≠ 0, and
E[Z | X = x] = Σ_{z≠0} z f_{Z|X}(z | x) = 0 = φ(x)E[Y | X = x]
whenever f_X(x) > 0.
10. D_x will satisfy the difference equation
D_x = αD_{x+1} + βD_x + γD_{x−1} + 1, 1 ≤ x ≤ a − 1,
and the boundary conditions D_0 = 0, D_a = 0. We can assume that β < 1, since otherwise neither gambler nor adversary ever wins or loses. Thus, D_x satisfies
D_x = (α/(1 − β))D_{x+1} + (γ/(1 − β))D_{x−1} + 1/(1 − β)
subject to the boundary conditions D_0 = 0, D_a = 0. Letting D~_x = (1 − β)D_x, D~_x satisfies the equation
D~_x = p~D~_{x+1} + q~D~_{x−1} + 1
and the boundary conditions D~_0 = 0, D~_a = 0, where p~ = α/(1 − β) and q~ = γ/(1 − β). Thus, D~_x satisfies Equations 4.12 and 4.13 with p and q replaced by p~ and q~, respectively. D~_x is therefore given by Equations 4.16 and 4.17, so that in the α ≠ γ case,
D_x = x/(γ − α) − (a/(γ − α))·(1 − (γ/α)^x)/(1 − (γ/α)^a), 1 ≤ x ≤ a − 1,
and in the α = γ case, D_x = x(a − x)/(1 − β).
Exercises 4.6
1.
H(X) = 15/8 bits.
2. (a) H(X) = −Σ_{x=1}^∞ (1/2)^x log(1/2)^x = Σ_{x=1}^∞ x(1/2)^x = (1/2)Σ_{x=1}^∞ x(1/2)^{x−1} = 2 (see Example 4.3). (b) 2 bits.
3. 1 bit.
4. Use the equation H(X, Y) = H(Y | X) + H(X). Since H(X) = log n and H(Y | X = i) = log i, i = 1, …, n, H(Y | X) = Σ_{i=1}^n (1/n) log i = (log(n!))/n, so H(X, Y) = (log(n!))/n + log n.
5. H(X) « 3.2744.
6. 5.7 bits.
7. 2 bits.
8.
By Lemma 4.6.1,
H(X | Y) − H(X) = −Σ_{i,j} f_{X,Y}(x_i, y_j) log f_{X|Y}(x_i | y_j) + Σ_{i,j} f_{X,Y}(x_i, y_j) log f_X(x_i)
= (1/ln 2) Σ_{i,j} f_{X,Y}(x_i, y_j) ln(f_X(x_i)/f_{X|Y}(x_i | y_j))
≤ (1/ln 2) Σ_{i,j} f_{X,Y}(x_i, y_j)(f_X(x_i)/f_{X|Y}(x_i | y_j) − 1) = 0.
9.
pi ~ .10307, p2 « .12273, p3 « .14615, p4 ~ .17403, p5 « .20724,
p6 « .24678.
CHAPTER 5
Exercises 5.2
2. n = 5 and v_1 = 4/59, v_2 = 8/59, v_3 = 24/59, v_4 = 23/59.
3. v_1 = q/(p + q), v_2 = p/(p + q).
4. n = 2 and v_1 = v_2 = v_3 = 1/3.
5.
The result is true for n = 1 by hypothesis. Assume that the result is true for n. Then
Σ_{k=1}^N μ_k p_{k,j}(n + 1) = Σ_{k=1}^N μ_k Σ_{l=1}^N p_{k,l}(n)p_{l,j} = Σ_{l=1}^N (Σ_{k=1}^N μ_k p_{k,l}(n))p_{l,j} = Σ_{l=1}^N μ_l p_{l,j} = μ_j,
and the result is true for n + 1. By the principle of mathematical induction, the result is true for all n ≥ 1.
6.
First show that each P(n) = [p_{i,j}(n)] is doubly stochastic as follows. The result is true for n = 1 by hypothesis. Assume that P(n) is doubly stochastic. Then
Σ_{i=1}^N p_{i,j}(n + 1) = Σ_{i=1}^N Σ_{k=1}^N p_{i,k}p_{k,j}(n) = Σ_{k=1}^N p_{k,j}(n) Σ_{i=1}^N p_{i,k} = Σ_{k=1}^N p_{k,j}(n) = 1,
and the result is true for n + 1. By the principle of mathematical induction, P(n) is doubly stochastic for every n ≥ 1. Since Σ_{i=1}^N p_{i,j}(n) = 1, n ≥ 1,
1 = lim_{n→∞} Σ_{i=1}^N p_{i,j}(n) = Σ_{i=1}^N lim_{n→∞} p_{i,j}(n) = Σ_{i=1}^N v_j = Nv_j,
and therefore v_j = 1/N, j = 1, …, N.
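The conclusion of Problem 6 — powers of a doubly stochastic transition matrix converge to the uniform distribution — can be seen by direct power iteration; a sketch with an illustrative 3 × 3 doubly stochastic matrix (not from the text):

```python
P = [[0.2, 0.5, 0.3],
     [0.5, 0.3, 0.2],
     [0.3, 0.2, 0.5]]        # rows and columns each sum to 1
N = len(P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

Pn = P
for _ in range(50):          # compute a high power of P
    Pn = matmul(Pn, P)

# Every row of P^n is close to the uniform vector (1/N, ..., 1/N).
for row in Pn:
    for entry in row:
        assert abs(entry - 1 / N) < 1e-9
```

Since the matrix has all entries positive, the chain is irreducible and aperiodic, so the convergence asserted in the solution applies.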
7. Since P is doubly stochastic, v_j = 1/5, j = 1, …, 5, by the previous exercise.
8. By Problem 5, μ_j = Σ_{k=1}^N μ_k p_{k,j}(n) for all n ≥ 1. Letting v_j = lim_{n→∞} p_{i,j}(n), j = 1, …, N,
μ_j = Σ_{k=1}^N μ_k v_j = v_j, j = 1, …, N.
Thus, {μ_j}_{j=1}^N is the asymptotic distribution.
9.
P(2n − 1) = P^{2n−1} = [p_{i,j}(2n − 1)] and P(2n) = P^{2n} = [p_{i,j}(2n)] alternate between two fixed matrices for n ≥ 1. The asymptotic distribution is not defined since lim_{n→∞} p_{i,j}(n) does not exist.
10. P = [p_{i,j}] where p_{0,1} = 1, p_{N,N−1} = 1, and, for 1 ≤ i ≤ N − 1,
p_{i,i−1} = i^2/N^2, p_{i,i} = 2i(N − i)/N^2, p_{i,i+1} = (N − i)^2/N^2,
with all other entries 0.
11.
P(X_{m+n} = j | X_m = i) = (1/P(X_m = i)) Σ_{j_1,…,j_{n−1}} P(X_m = i, X_{m+1} = j_1, …, X_{m+n−1} = j_{n−1}, X_{m+n} = j).
By Equation 5.2,
P(X_{m+n} = j | X_m = i) = (1/P(X_m = i)) Σ_{j_1,…,j_{n−1}} P(X_m = i) p_{i,j_1} × ··· × p_{j_{n−1},j}
= Σ_{j_1,…,j_{n−1}} p_{i,j_1} × ··· × p_{j_{n−1},j},
and the latter does not depend upon m.
12.
v_1 = .2126372201, v_2 = .2339009422, v_3 = .2144524159, v_4 = .2157489844, v_5 = .1232604374.
13. v_1 = .1098019693, v_2 = .0365008905, v_3 = .1201612072, v_4 = .2061555430, v_5 = .2261706688, v_6 = .3012097212.
Exercises 5.3
1. C(2n, n) = (2n)(2n − 1)(2n − 2) × ··· × 2·1/(n! n!)
= 2^n(2n − 1)(2n − 3) × ··· × 3·1/n!
= (−1)^n 2^{2n}(−1/2)(−3/2) × ··· × ((1/2) − n)/n!
= (−4)^n(−1/2)(−3/2) × ··· × (−(1/2) − n + 1)/n!
= (−4)^n C(−1/2, n).
2.
p = 1/3.
3.
q_x satisfies the difference equation
q_x = pq_{x+1} + qq_{x−1}, 2 ≤ x ≤ a − 1,
subject to the boundary conditions q_a = 0, q_0 − δq_1 = 1 − δ. If p ≠ q,
If p = q,
q_x = ((1 − δ)/(a(1 − δ) + δ))(a − x).
4. D_x satisfies the difference equation
D_x = pD_{x+1} + qD_{x−1} + 1, 1 ≤ x ≤ a − 1,
subject to the boundary conditions D_a = 0, δD_1 = D_0. In the p ≠ q case,
D_x = E[T_x] = A + B(q/p)^x + x/(q − p), 0 ≤ x ≤ a,
where the constants A and B are determined by the boundary conditions.
Exercises 5.4
1.
q = −2 + √5 ≈ .236.
2.
q = .204325.
3.
q = (1 − |α − β|)/(2β).
4.
Start with the equation f_{X_{j+1}}(s) = f_{X_j}(f(s)), differentiate twice, and set s = 1 to obtain
f''_{X_{j+1}}(1) = f''_{X_j}(1)(f'(1))^2 + f'_{X_j}(1)f''(1).
Use the facts that f'(1) = μ, f''(1) = var X_1 − μ + μ^2 = σ^2 − μ + μ^2, f'_{X_j}(1) = μ^j, and f''_{X_j}(1) = var X_j − μ^j + μ^{2j} to obtain
var X_{j+1} = f''_{X_{j+1}}(1) + f'_{X_{j+1}}(1) − (f'_{X_{j+1}}(1))^2 = μ^2 var X_j + σ^2 μ^j.
5.
For; S: 1, let P(j) be the proposition
varXj = <r2(/z2;-2 + /z2;-3 + • • • + p?-1).
Since varXi = a2 and P(l) is the proposition varXi ~ <r2(/z°) ~ a2,
P(I) is true. Assume P(j — 1) is true. By Problem 3,
varX; = /z2 varXj-i+/J-1<r2
= /z2<r2(/z2^-4 + /z2j"3
45 + • • • + /z'"2) + p)~x<y2
= <r2(/z2>-2 + /z2>-3 + • • • + pj + /z'-1),
and it follows that P(j) is true. Therefore, P( j) is true for all j S: 1 by
the principle of mathematical induction.
6.
q = .706420.
7.
q ~ .203188.
8.
q = .552719.
9. q_1 = .125, q_2 = .177979, q_3 = .204325, q_4 = .218344, q_5 = .226058, q_6 = .230379, q_7 = .232824, q_8 = .234214, q_9 = .235007, q_10 = .235461.
Exercises 5.5
1.
P(0) = 3/8,P(+l) = l/4,P( + 2) = 1/16,P(n) = 0 otherwise. X
*
(6/5)X„-i - (9/10)X„-2 + (2/5)X„_3 and aj = 21/160 « .131.
2.
The independence of Xn = (l/4)y„+(l/2)y„-i+(l/4)y„_2andXn-4 =
(l/4)y„-4 + (l/2)y„-5 + (l/4)y„-6 would seem to indicate that nothing
would be gained by including X„-4 in X
.
*
There is, however, a reduction
in the mean square error by including X„-4. cr\ « .117, and
Xn
Xn — 1
5
_Xn—2+ _Xn-3
5
□
=
_Xn-4.
3
3.
Letting
= varX„, a2
x = a2<r^ +<r2 or <r^(I — a2) = a2 > 0. Thus,
a2 < 1, and so | a | <1.
4.
ai = (pi(l - p2))/(l ~ P?)»«2 = (P2-p?)/(l “Pi)-
5.
Predicted value = 2.946.
330
SOLUTIONS TO EXERCISES
.CHAPTER 6.
Exercises 6.2
F(x) =
ifx < 0
if 0 < x < 1
ifx > 1.
FxW =
ifx < 0
if 0 < x < 1
ifx >: 1.
1.
2.
3.
2/5.
4.
3/7.
5.
F(x) =
6.
7.
0
(l/2)(x + I)2
1 - (1/2)(1 - x)2
1
c = l/2andP(-l < X < 1) = 1 - Me.
0
ifx < —1
F(x) = { (l/2)(x + 1)
if — 1 < x < 1
1
ifx S: 1.
Fy(y) =
if y < 0
ifO <y < 1
ify S: 1.
Fr(y) =
if y > 0
if 0 < y < 1
if y > 1.
8.
9.
10.
ifx
—1
if - 1 < x < 0
ifO < x < 1
ifx =2 1.
0
2x
1
0
ifx < 0
ifO < x < 1/2
if 1/2 < x < 5/4
ifx >: 5/4.
(a)
(b) For any x 6 R,(X < x) = Ur>GQ,rj<x(X
r,) 6 9? since
each (X
rj) G S' by (a).
(b)
(c) For any x £ R, (X >: x) = (X < x)c £ 9? since
(X < x) G 9? by (b).
(c)
(d) Foranyx G R, (X > x) = U^q.^JX > r,) 6 since
each (X S: x) 6 S' by (c).
(d)
(a). For any x 6 R, (X < x) = (X > x)c 6 S5 since
(X > x) 6 S' by (d).
331
SOLUTIONS TO EXERCISES
Exercises 6.3
1.
F'M =
1/4
1/2
if 1 < x < 2
if 4 < x < 5
otherwise.
0
(l/4)(x - 1)
1/4
(1/4) + (l/2)(x - 4)
3/4
F'(f)dt =
ifx < 1
if 1 < x < 2
if 2 < x < 4
if 4 < x < 5
if x > 5
Since F(5) = 1 5^3/4 = j2ooF'(t) dt, F' is not a density for F.
2.
3.
(a)
P(0 < X < 1) = 1/4.
(b)
P(0 < X < 1) = 1/2.
(c)
P(X = 1) = 1/4.
(d)
P(l/2 < X < 5/2) = 11/16.
Let G(y) = P(T
y) = P(sinX < y). If y < -l,G(y) = 0 and
if y > 1, G(y) = 1. Suppose — 1 < y < 1. Then G(y) = P(sinX
y) = P( —1 sinX < y) = P(arcsin( —1) < X
arcsiny) =
P(arcsiny) — F%( —tt/2). Since
0
FXM = < (l/7r)(x + (77/2))
ifx < — tt/2
if — tt/2
x < tt/2
ifx S: tt/2,
0
G(y) = < (1/tt) (arcsiny + (tt/2))
1
and
g(y) =
4.
l/(-n- Vl -y2)
0
if - 1 <y < 1
otherwise.
Since Y takes on values in [0,+°°] with probability 1, G(y) = P(Y s
y) = 0 whenever y < 0 and G(y) = P(T ~ y) = 1 whenever
y S: M. Suppose 0 <y < M. Then G(y) = P(min(X, M) < y) =
1 - P(min(X,M) >y) = 1 - P(X > y,M > y) = 1 - P(X > y) =
P(X < y) = 1 - e~y. Thus,
0
1
-e~>
G(y) = <
1
ify < 0
if 0 < y < M
ify S M.
332
SOLUTIONS TO EXERCISES
5.
Since the graph of G has a jump at M, G is not continuous and does not
have a density function.
ify < 0
0
2ye
if y == 0.
6- /y(y) =
eZ7.
f
0
Fx(x) = * 1 — (2/tt) arccos(x/100)
1
ifx < 0
ifO < x < 100
if x — 100.
(2/tt)(1/ 71002 -x2)
0
fxM =
ifO < x < 100
otherwise.
Exercises 6.4
1.
P(Y < X) = 1/2.
2. P(Y < X) = 3/5.
3. c = 8/tt, P(X > 1/2) = (2/3) - (3^3/477)
4. r , x _
5. [
e~x
0
ifx S: 0
ifx < 0
=
if z < 0
if 0
z < 1
ifz >: 1
0
/z(z) = S
1 ~ e~z
[ e~z(e — 1)
if z > 0
if z < 0.
6.
7.
1/(1+y)2
0
, , .
6(e 22 — e 3z)
0
fz(z) =
0
8.
F2(2) = <
z2/2
-1 + 2z - (z2/2)
1
Z
/z(2) =
2—z
0
if z > 0
if z < 0.
if z < 0
if 0
z < 1
if 1
z <2
if z > 2.
if 0 < z < 1
if 1 < z < 2
otherwise.
ify > 0
ify < 0.
333
SOLUTIONS TO EXERCISES
Exercises 6.5
1.
SinceX = <rZ + /z where Z has a standard normal density, Y = aX + b =
a <tZ + ap, + b, and Y has a m (a/z + b, a2 a2) density.
2.
Since P(X < 100) = #((100 - /z)/a) = .9938 and P(X < 60) =
#((60 — /z)/<r) = .9332, using the table of the normal distribution
function,
100 — /z
-------- -- = 2.5
60 — fj.
Solving for /z and a, /z = 0, <r = 40.
3.
According to Example 6.19, X2 and Y2 both have F(l/2, 1/2) densities. By
Theorem 6.5.2, Z = X2 + Y2 has a F(l, 1/2) density.
4.
1/24. Integrate by transforming to polar coordinates.
5.
By Equation 6.9, X2 and Y2 both have F(l/2, l/2<r2) densities. By
Theorem 6.5.2, W = X2 + Y2 has a F(l, l/2<r2) density. For z <
0, Fz(z) = 0. Forz > 0,Fz(z) = P(Z < z) = P{jw < z) =
P(W
z2) = Fw(z2). Thus,/Z(z) = 2z/w(z2), and
/z(z) =
6.
0
(z/<r2)e-^2/2<r^
The function </>(s) = (x — a)/(b — a) maps the interval (a, b) onto the
interval (0,1). Let Y = </>(X). Since X takes on values between a and
b, Y takes on values between 0 and 1. Thus, /y(y) = 0 if y < 0 or
y > 1. IfO <y < l,thenFy(y) = P(T < y) = P((X-«)/(&-«) <
y) = P(X
y(b — a) + a) = Fx(y(b ~ a) + a), and therefore
/r(y) = *
(y(b
F
- a) + a)(b - a) = 1. Thus,
/r(y) =
7.
if z < 0
ifz > 0.
1
0
ifO <y < 1
otherwise.
Let Y = #(X). Since # takes on values between 0 and 1, the same
is true of Y, and so /y(y) = 0 if y < 0 or y > 1. For 0 < y <
l,Fy(y) = P(#(X) < y) = P(X < #'1(y)) = #(#-I(y)) = y.
Thus./y (y) = 1 for 0 < y < 1 and Y has a uniform density on (0,1).
Exercises 6.6
1.
2.
Since the exponential density is the same as the F(l, A) density, Z =
Xi + • •• + X„ hasaF(n.A) density by Theorem 6.6.1.
Since each X, has a standard normal density, each X2 has a F(l/2,1/2)
density. By Theorem 6.6.1, W = X2 + • • • + X2 has a F(n/2,1/2) density.
334
SOLUTIONS TO EXERCISES
Since Z —
Therefore,
= 0 if z < 0. For z > 0,/z(z) = 2z/w(z2)-
(2/2"/2)r(n/2)z”-Ie-z2/2
3.
if z > 0
if z < 0.
(a) Proof of Theorem 6.6.1. Consider the proposition
P(m) :%!+••• +X„ has a F(«i + • • • + a„, A) density.
P(l) is trivially true sinceXi has a F(ai, A) density by hypothesis. Assume
that P(n — 1) is true. Then
+ • • • + X„-! has a T(ai + ••• + «„-!, A)
density. By Theorem 6.5.2, (Xi + - • -+Xn-|)+Xn hasaF(ai + • •• + «„, A)
density. Therefore, P (n) is true whenever P (n — 1) is true. By the principle
of mathematical induction, P(n) is true for all h S 1.
(b) Proof of Theorem 6.6.2. Consider the proposition
P(n) :Xi + •• • +Xn hasa n(/zi + • • • + /z„, <r2 + • • • + <r2) density.
P(l) is trivially true since Xi has a n(/zi,<7|) density by hypothesis.
Assume P(n - 1) is true. Then X! + • • • + XM-! has a m(/z1 + • • • +
P-n-i, <r2 + • • • +a2_1) density. By Theorem 6.5.1, (Xj + • • • +X„-i)+X„
has a n(/zi + • • • + /z„, a] + • • • + <r2) density. Therefore, P(h) is true
whenever P(n — 1) is true. By the principle of mathematical induction,
P(m) is true for all m S: 1.
fy(y) =
(l/2)arcsin ^/l — y2
0
if — 1 < y < 1
otherwise.
c
/ r/ r '^00
J0
\Jx1/2 \J(2/3)x2
\
\
dx3]dx2]dXl
/
/
P(Xj < 2X2 < 3X3) =
55'
6.
7.
Use a mathematical induction argument to show that
Fu.v(m,v) =
(!/«!)(-logx)"
if 0 < x < 1
otherwise.
vn — (v — u)n
0
if 0 £ H < V S 1
otherwise.
335
SOLUTIONS TO EXERCISES
For 0 s m < v < 1,
ru
n(n — l)(y — x)" 2dydx
JoJx ,
o
n(v — x)n~l dx
~ vn — (y ~ u}n.
AW = { 1/(10+x)2
8.
ifx > 0
ifx < 0.
AW = j l/(’0+^
ify
0
if y < 0.
if Z S: 0
if z < 0.
/z(z) =
ifx
0,y 5: 0
otherwise.
/x.y(x,y) =
^(x+y)
fx.Ylz(x,ylz) =
9.
For t
0
if x,y,z S: 0
otherwise.
0,
FtW = P(T
t) = P(max(TbT2) < t)
= P(Ti < t,T2 < t)
= PfTi < t)P(T2 < t)
fTW = Fr^fT.W+fT^PT^
= (1 -
+ j8i(t)e"^'^,(j)rfj(l - e~lo'fi2(s)ds)-
CHAPTER 7.
Exercises 7.2
1. £[sinX] = 2/tt.
2. £[|X|]=2/^.
3.
E[min(X,l/2)] = 3/8, £ [max (X, 1/2)] = 5/8.
4.
£[U] = l/(n + 1), £[U2] = 2/(n+ l)(n+2), £[V] = n/(n + 1),
£[ V2] = n/(n +2).
£[X] = 1/A, £[X2] = 2/A2.
5.
336
SOLUTIONS TO EXERCISES
6.
E[Xr] = ((a + r - l)(a + r - 2) X • • • X a)/Ar
7.
E[Xr] =
Jo
i(ai)r(a2)
~x)^dx
r(ai + a2) r(ai + r)r(a2)
P(ai)r(a2) Haj + a2 + r)
_______ (ai + r ~~ l)(<
*i
+ r — 2) X • • • X _______
(oti + a2 + r ~ l)(c
*i
+ a2 + r — 2) X • • • X (a1 + a2)’
since the second integral is equal to 1.
8.
Choose a < b so that a < c, < b,i =
x < a and F(x) = 1 for x > b and
Then F(x) = 0 for
</>(t) dF(t) — jab
dF(t).
Since </> is continuous at each c,-» given e > 0, there is a 8; >0 such that
| </>(%) — </>(g) | < e whenever |x — c, | <8,. Let
8 = min{8i,..8m,c2 - q,...,cm - cm-i}.
Then
| </>(%) — <A(g) | < e whenever |x — c< | < 8, i — 1,... ,m.
Let 7re be any partition of [a, 8] such that | 7re | < 8. Let rr = {x<),..., x„ }
be any partition of [a, b] finer that tte. Then each c, belongs to one and
onlyone interval (x;;_!, x;-], i = 1,..m. Note that F(xjt)-F(xjt- 1) = 0
if k is not one of the j/s. Let & be any point of (x
* —i, JCjt], k =
Then
n
m
2L<A(^)(F(xjt) - Fte-O) - y </>(c,)(F(c,) - F(c,—))
k=l
i=l
m
m
= 2>(&)(F(^)-FtXj^Y)
-F(Ci-))
i=1
i=1
m
m
= X <A(^)(F(G) - F(C,- -)) - y <A(G )(E(G ) - F(C,- -))
i =1
i~1
m
y, I
) | (F(Ci) - F(Ci -))
:=1
tn
< e^(F(a) -F(Ci-)) = e.
«=!
This shows that
</>(t)dF(t) = f/ </>(t) dF{tj =
lim^|_>oS2 = i ^(^)AFjt = Xi"=i ci(p(c<) “ F(cf-)).
337
SOLUTIONS TO EXERCISES
9.
£[X] = f0*“ xfx(x)dx = fo+°°(loxfx(x)dy)dx. Interchanging the order
of integration,
•+00
fx(x)dx jdy =
£[X] =
Jo
P(X>y')dy
Jo
r +<»
=
10.
Jo
(i - Fx(y))dy-
Since Y
X, (X > x) C (7 > y) and 1 — FxCx)
1 — Fy(x). By
Problem 5, £[X] = f0+"(l - Fx(x))dx < - Fr(x))dx = £[/].
Exercises 7.3
1.
£[max(X,/)] = 3/2.
2.
£[U |X = x,Y = y] = x/2.
3.
£[A | Xi = xi,...,Xn = x„] = (n + 1)/(1 + xi + • • • + x„).
E[Y |X = x] =
£[X|y =y] =
5.
(2/3)x
0
ifO<x<l
otherwise.
ifO <y < 1
otherwise.
(2/3)(y2 + y + l)/(y + 1)
0
The density of X is
(a^3“)/(x +/3)a+1
ifx > 0
ifx == 0.
The conditional density of A given that X = x is a F(a + l,x + j3) density.
6.
7.
£[RJ = (n- l)/(n + l),varR = 2(n - l)/((n + l)2(n + 2)).
fu\v(u I v) =
(n — l)(v — m)" 2
vn-l
£[U| V = v] = { V/Qn
ifO < v < 1
otherwise.
fz\x.Y(z\x,y') = |(l+x+y)3z2e~z(1+x+/)
£[Z|X = x,Y = y] =
ifx,y,z
3
1 +x +y"
0.
338
SOLUTIONS TO EXERCISES
(1 +x)/2
E[K |X = x] = <! (1 -x)/2
£[X I V = y] =
(/ - l)/(lny)
0
if - 1 < X < 0
if 0 < x < 1
otherwise.
ifO <y < 1
otherwise.
Exercises 7.4
1.
.7745.
2.
.5819.
3.
P(S360
4.
n = 757.
5.
P( |S10oo I
6.
7.
P(Siooo
-50) = .056.
A = 2.575 Jn/12.
8.
Let m = 106. Since/z = E[Xy] = 0 and a2 = 10“m/12» P( |S„ | <
5 • 10“m+2) < P( | *S | < ^3) « 2<I>( 73) - 1 « .9168.
59000) = .0622.
30) = .6574.
Exercises 7.5
The identities
1 + cos 2a
2
cos a =
1.
E[X„] — Xy"=1 djE[cos(nbj + Zj)] =
E[X2] = E I
^(1/2^) j02n cos(nbj + z) dz
a, cos(nb,
= 22«;2E[cos2(nb,- +Zj)]
+ ^/atajE[cos(nbi + Zj)]E[cos(nbj + Zj)]
339
SOLUTIONS TO EXERCISES
m
c2tt
cos2 (nb i + z}dz
)o
frv
1
2,7
'27r 1 +COS(2(m&, + z)) ,
•----------- ------------ dz
2
o
m
-y«i2^
For v S: 1,
r(r) = £[X„X„,J
/m
\/ m
= E I
a, cos(m bj + Z,) j
a, cos((n + v)bj + Zj)
m
= ^^a2E[cos{nb, +Z,)cos((m + p)b,- +Z,)]
i=l
+ ^^ajajE[cos(nbi +Z,)]E[cos((m + v)bj +Z;-)]
i^j
m a)
=
— E[cos((2m + v)bj + 2Z,) + cos(pb,)]
i=i 2
_^L 7,2
_fL 7,2
1
f27T
cos((2m + v)bi + 2z) dz
o
=i
7,2
m
=
2
cos vbi.
Since r(p) = r(~v) = r( | v|),
7
,
C0S vbi
£7=1
pM =
2.
Let/z = E[X0]andr(p) = cov(X0,X„). ThenE[K„] = MZf=i«jand
m
m
\2
\
E
m
\/ w
fli(Xn-i + i ~ /z) Il
=1
m
" fly(X„-y+i
~ ft)
/\J = 1
m
^(Xn—j + l
M)]
|
340
SOLUTIONS TO EXERCISES
m
m
- j),
=
:=lj=l
which is independent of n. For v 2: 1,
m
m
= ^y^aiajE[(Xn-i+i - /J-KKn+v-j+i ~ /z)]
:=1j=l
m
m
-
-j + v),
i=x 3 =x
which is independent of n. Since r(p) = r( —v) = r(|v|),
2X i X)"= i fli
“MM)
Xr=1Xr=I«,«;r(x--» ’
1 + a1
2 + 2 cos ak
2tt(1 + a2)
3.
4.
'
5.
v G Z.
— 77 < A < 77.
_ 1 + a2 + j32 + 2(a + aj3)cosA + 2j3 cos2A
~
277(1 + a2 + j32)
— 77 < A < TT.
E [XJ = Ej”=! Pj(1/277) f02,r cos (kj t + 0) d e = 0, and
m
। /• 2tt
BIX}] = Xft—
cos2(A; t + 0) d0
; = 1 277 Jo
= V> I [2'1±^W±2«^
y=l
277 Jo
2
For h > 0,
cov(XnXf+^) = £[cos(At + 0) cos(A(t + h) + 0)]
= ^B[cos(A(2t + /i) + 20) + cosA/i]
1 J”
i
f2,r
(cos(Aj(2t + h) + 20) + cosAi/j)d0
=
=
1 m
cosXjh-
SOLUTIONS TO EXERCISES
341
Since r(h} = r(-h') = r(|h|),r(h) = (l/2)Sj”=iPj cosAyli, and
therefore p(fi) = S”=iPj cos A; h.
CHAPTER
Exercises 8.2
1.
P(XS = k |Xf = «) = ( n *
d
)(s/t)
- (s/t))"~k.
2.
For i = 1,...,«, let
be the first time that
= 1 and let
W = max(W},...,W(1")). Then P(W < t) = (1 - e~At)n and
fw(t) = «Ae-Af(l — e-Af)"-1 for t >: 0 and = 0 for t
0.
3. Note that P(Xf-X[f) < 0) = 0. Thus,P(Xf > Xi'L j *(X — Xjt-i)) = 1,
and it suffices to show that P(limf_+ooX[f] = +°°) = 1. Since the events
Ab Az,... are independent events and *
P(A ) = Ae-A, X”=1 P(Ak) = °°.
By the Borel-Cantelli lemma, P(A„i.o.) = 1 and P(^“=l(Xk —Xjt-i) =
+00) = 1. Since limf_«>X[f] = ^=l(Xk - Xjt-i),P(limf*_ooX( f] =
oo) = 1.
5-
11
P„(t) = _(ln(l + t))"—
nl
1 +1
£[Xf] = ln(l + t).
Pio(lOO) « .012.
6.
Exercises 8.3

1. P_1(t) = e^{-βt}, P_2(t) = e^{-βt} - e^{-2βt}, P_3(t) = e^{-βt} - 2e^{-2βt} + e^{-3βt}.

2. By Problem 1, P_1(t) = e^{-βt} and the assertion is true for n = 1. Assume that the assertion is true for n - 1; i.e., P_{n-1}(t) = e^{-βt}(1 - e^{-βt})^{n-2}. By Equation 8.9,

   P_n(t) = β(n - 1)e^{-βnt} ∫_0^t e^{βns} e^{-βs}(1 - e^{-βs})^{n-2} ds
          = (n - 1)e^{-βnt} ∫_0^t (e^{βs} - 1)^{n-2} βe^{βs} ds
          = e^{-βnt}(e^{βt} - 1)^{n-1}
          = e^{-βt}(1 - e^{-βt})^{n-1}.

Thus, the assertion is true for n whenever it is true for n - 1, and it follows from the principle of mathematical induction that the assertion is true for all n ∈ N.

3. By Equation 8.9,

   P_{n+1}(t) = β_n e^{-β_{n+1}t} ∫_0^t e^{β_{n+1}s} P_n(s) ds = 0

since β_n = 0. Thus, P'_{n+1}(t) = 0 and so P_{n+1}(t) = c. Using the initial condition P_{n+1}(0) = 0, c = 0 and P(X_t = n + 1) = P_{n+1}(t) = 0 for all t, as would be expected without any calculations.

4. As in Example 8.1,

   M'(t) = Σ_{n=1}^∞ n P'_n(t).

If there were no deaths in the population, by Theorem 8.3.1 we would have P(X_t < +∞) = Σ_{n=1}^∞ P_n(t) = 1 since the series Σ_{n=1}^∞ 1/(α + βn) diverges. A fortiori, P(X_t < +∞) = Σ_{n=1}^∞ P_n(t) = 1 in the presence of deaths. Thus,

   M'(t) = (β - δ)M(t) + α,

and so

   M(t) = n_0 e^{(β-δ)t} + (α/(β - δ))(e^{(β-δ)t} - 1)   (for β ≠ δ).

5. (c) E[X_t] = n_0 e^{βt}. (d) P_0(t) = 0, P_n(t) = e^{-βt}(1 - e^{-βt})^{n-1} as in the n_0 = 1 case.
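The formula established in Problem 2 says that X_t for the Yule (pure birth) process started at 1 is geometrically distributed with success probability p = e^{-βt}. Two quick consequences, checked numerically below with arbitrary test values of β and t: the P_n(t) sum to 1, and E[X_t] = 1/p = e^{βt}.

```python
import math

# Check of Exercises 8.3: for the Yule process started at 1,
# P_n(t) = e^{-beta t} (1 - e^{-beta t})^{n-1}, n >= 1, is a geometric
# distribution with p = e^{-beta t}; hence it sums to 1 and E[X_t] = e^{beta t}.
# beta and t below are arbitrary test values.

def P(n, beta, t):
    p = math.exp(-beta * t)
    return p * (1 - p) ** (n - 1)

beta, t = 0.8, 2.0
total = sum(P(n, beta, t) for n in range(1, 2000))
mean = sum(n * P(n, beta, t) for n in range(1, 2000))
assert abs(total - 1.0) < 1e-9
assert abs(mean - math.exp(beta * t)) < 1e-6
print("P_n(t) is geometric with p = e^{-beta t}; E[X_t] = e^{beta t}")
```

The truncation at n = 2000 is harmless here because the geometric tail decays exponentially.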
Exercises 8.4

1. (a) Fix i ≥ 1. The forward equation for p_{i,0}(t) is

   p'_{i,0}(t) = Σ_{k≠0} p_{i,k}(t) q_{k,0} + q_{0,0} p_{i,0}(t).

Since q_{k,0} = 0 if k ≥ 1, the equation reduces to

   p'_{i,0}(t) = -λ p_{i,0}(t),

which has the general solution p_{i,0}(t) = c_i e^{-λt}, where c_i is a constant. Since the continuity condition requires that lim_{t→0+} p_{i,0}(t) = 0, c_i = 0 and so p_{i,0}(t) = 0.

(b) Fix i ≥ 1. From part (a), p_{i,0}(t) = 0. Suppose p_{i,k}(t) = 0 for all k ≤ j - 1 ≤ i - 1. The forward equation for p_{i,j}(t) is

   p'_{i,j}(t) = Σ_{k≠j} p_{i,k}(t) q_{k,j} + q_{j,j} p_{i,j}(t).

Since q_{j+1,j} = 0, q_{j-1,j} = λ, and q_{k,j} = 0 for all other cases for which k ≠ j,

   p'_{i,j}(t) = λ p_{i,j-1}(t) - λ p_{i,j}(t).

But p_{i,j-1}(t) = 0 by the induction hypothesis. Thus,

   p'_{i,j}(t) = -λ p_{i,j}(t)

and p_{i,j}(t) = 0 as in (a), and therefore p_{i,j}(t) = 0 whenever j < i.

(c) Suppose j ≥ i. Then the forward equation for p_{i,j}(t) is

   p'_{i,j}(t) = λ p_{i,j-1}(t) - λ p_{i,j}(t),

so that

   (e^{λt} p_{i,j}(t))' = λ e^{λt} p_{i,j-1}(t).

Integrating from 0 to t,

   e^{λt} p_{i,j}(t) - δ_{i,j} = λ ∫_0^t e^{λs} p_{i,j-1}(s) ds

and

   p_{i,j}(t) = δ_{i,j} e^{-λt} + λ e^{-λt} ∫_0^t e^{λs} p_{i,j-1}(s) ds.

Since p_{i,i-1}(t) = 0, p_{i,i}(t) = e^{-λt}. Thus,

   p_{i,i+1}(t) = λ e^{-λt} ∫_0^t e^{λs} e^{-λs} ds = λt e^{-λt}

and

   p_{i,i+2}(t) = λ e^{-λt} ∫_0^t e^{λs} λs e^{-λs} ds = (λ²t²/2) e^{-λt},

and so forth. By induction,

   p_{i,j}(t) = (λ^{j-i} t^{j-i} / (j - i)!) e^{-λt}.
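The transition function obtained in part (c) is the Poisson law shifted to start at i. A standard sanity check, run below with arbitrary test values of λ, s, and t, is the Chapman-Kolmogorov equation p_{i,j}(s + t) = Σ_k p_{i,k}(s) p_{k,j}(t), which here reduces to the binomial theorem.

```python
import math

# Check of Exercise 8.4.1(c): the transition function
#   p_{i,j}(t) = ((lambda t)^{j-i} / (j-i)!) e^{-lambda t} for j >= i,
# and 0 for j < i, satisfies Chapman-Kolmogorov:
#   p_{i,j}(s+t) = sum_k p_{i,k}(s) p_{k,j}(t).
# lambda, s, t and the state range below are arbitrary test values.

def p(i, j, t, lam):
    if j < i:
        return 0.0
    return (lam * t) ** (j - i) / math.factorial(j - i) * math.exp(-lam * t)

lam, s, t = 1.7, 0.6, 1.1
for i in range(3):
    for j in range(i, 8):
        lhs = p(i, j, s + t, lam)
        rhs = sum(p(i, k, s, lam) * p(k, j, t, lam) for k in range(i, j + 1))
        assert abs(lhs - rhs) < 1e-12
print("Chapman-Kolmogorov holds for the Poisson transition function")
```

The inner sum stops at k = j because p_{k,j}(t) = 0 for k > j, mirroring part (b).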
2. Q =
   | -λ      λ        0       0    ... |
   |  μ   -(λ+μ)      λ       0    ... |
   |  0     2μ     -(λ+2μ)    λ    ... |
   |  :      :        :       :        |

3. Q =
   | -λ   λ |
   |  μ  -μ |

4. Q =
   | -λ      λ       0    ...    0     0  |
   |  μ   -(λ+μ)     λ    ...    0     0  |
   |  :      :       :           :     :  |
   |  0      0       0    ...   mμ   -mμ  |
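The generator matrices above share a birth-death structure: constant birth rate λ one step up, death rate proportional to the state one step down, and diagonal entries chosen so that each row sums to 0. A minimal sketch of assembling such a Q (the size and rates are illustrative, not those of the exercises):

```python
# Sketch of assembling a birth-death generator matrix like those in
# Problems 2-4: birth rate lam in every state, death rate k*mu in state k.
# Every row of a generator must sum to 0, with nonnegative off-diagonal
# entries. The chain is truncated at n_states for illustration.

def birth_death_Q(n_states, lam, mu):
    Q = [[0.0] * n_states for _ in range(n_states)]
    for k in range(n_states):
        if k + 1 < n_states:
            Q[k][k + 1] = lam          # birth: k -> k+1 at rate lam
        if k > 0:
            Q[k][k - 1] = k * mu       # death: k -> k-1 at rate k*mu
        Q[k][k] = -sum(Q[k])           # diagonal makes the row sum to 0
    return Q

Q = birth_death_Q(5, 2.0, 0.5)
assert all(abs(sum(row)) < 1e-12 for row in Q)
assert Q[1][1] == -(2.0 + 0.5)         # -(lam + mu) in state 1
print(Q[0])
```

The truncation makes the top state lose its birth rate, which matches the -mμ bottom-right entry of the finite matrix in Problem 4.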
Exercises 8.5

1. Since I^n = I for all n ≥ 1, e^{tI} = Σ_{n=0}^∞ (t^n/n!) I^n = e^t I.

2. Since Q² = -(λ+μ)Q, Q³ = (λ+μ)²Q, ..., Q^n = (-1)^{n-1}(λ+μ)^{n-1}Q, n ≥ 1,

   P(t) = I + ((1 - e^{-(λ+μ)t}) / (λ+μ)) Q.

If P(t) = {p_{i,j}(t)}, then

   p_{1,1}(t) = (μ + λe^{-(λ+μ)t})/(λ+μ),   p_{1,2}(t) = (λ - λe^{-(λ+μ)t})/(λ+μ),
   p_{2,1}(t) = (μ - μe^{-(λ+μ)t})/(λ+μ),   p_{2,2}(t) = (λ + μe^{-(λ+μ)t})/(λ+μ).

3. Writing Q as a block matrix,

   Q = | A  O |,   Q^n = | A^n   O  |,   n ≥ 1,
       | O  D |          |  O   D^n |

where

   A = D = | -1   1 |.
           |  1  -1 |

By Problem 2, for n ≥ 1, A^n = (-1)^{n-1} 2^{n-1} A and D^n = (-1)^{n-1} 2^{n-1} D. Thus,

   Q^n = | (-1)^{n-1} 2^{n-1} A            O           |
         |           O            (-1)^{n-1} 2^{n-1} D |

and

   P(t) = I + | ((-1/2) Σ_{n=1}^∞ (-2t)^n/n!) A                 O                |
              |                O                ((-1/2) Σ_{n=1}^∞ (-2t)^n/n!) D |
        = I + ((1 - e^{-2t})/2) | A  O |.
                                | O  D |

Identifying entries in P(t),

   p_{1,1}(t) = p_{3,3}(t) = (1 + e^{-2t})/2,   p_{1,2}(t) = p_{3,4}(t) = (1 - e^{-2t})/2,
   p_{2,1}(t) = p_{4,3}(t) = (1 - e^{-2t})/2,   p_{2,2}(t) = p_{4,4}(t) = (1 + e^{-2t})/2.

For all other pairs (i, j), p_{i,j}(t) = 0.
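The closed form in Problem 2 can be verified directly: build P(t) = I + ((1 - e^{-(λ+μ)t})/(λ+μ))Q for the two-state generator and compare it with the four entry formulas, checking also that each row sums to 1. The values of λ, μ, t below are arbitrary test choices.

```python
import math

# Check of Exercise 8.5.2: for Q = [[-lam, lam], [mu, -mu]],
# Q^n = (-1)^{n-1} (lam+mu)^{n-1} Q for n >= 1, so the exponential series
# sums to P(t) = I + ((1 - e^{-(lam+mu)t})/(lam+mu)) Q, giving
# p_{1,1}(t) = (mu + lam e^{-(lam+mu)t})/(lam+mu), etc.
# lam, mu, t below are arbitrary test values.

lam, mu, t = 1.2, 0.7, 0.9
s = lam + mu
factor = (1 - math.exp(-s * t)) / s
P = [[1 - lam * factor, lam * factor],
     [mu * factor, 1 - mu * factor]]

# Closed-form entries from the solution:
p11 = (mu + lam * math.exp(-s * t)) / s
p12 = (lam - lam * math.exp(-s * t)) / s
p21 = (mu - mu * math.exp(-s * t)) / s
p22 = (lam + mu * math.exp(-s * t)) / s

assert abs(P[0][0] - p11) < 1e-12 and abs(P[0][1] - p12) < 1e-12
assert abs(P[1][0] - p21) < 1e-12 and abs(P[1][1] - p22) < 1e-12
assert abs(sum(P[0]) - 1) < 1e-12 and abs(sum(P[1]) - 1) < 1e-12
print("P(t) = I + ((1 - e^{-(lam+mu)t})/(lam+mu)) Q matches the closed forms")
```

As t → ∞ the factor tends to 1/(λ+μ), so both rows of P(t) converge to (μ, λ)/(λ+μ), foreshadowing the stationary densities of Exercises 8.6.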
Exercises 8.6

1. Q_1 is irreducible; Q_2 is not irreducible.

2. P(2) = {p_{i,j}(2)} = | .285  .365  .350 |
                         | .214  .419  .367 |
                         | .205  .391  .404 |

3. lim_{t→∞} p_{1,1}(t) = 1/6, lim_{t→∞} p_{1,2}(t) = 1/24, lim_{t→∞} p_{1,3}(t) = 1/8,
   lim_{t→∞} p_{1,4}(t) = 1/3, lim_{t→∞} p_{1,5}(t) = 1/3.

4. π_1 = .995000, π_2 = .004975, π_3 = .000025;
   π_1 = 12/53, π_2 = 21/53, π_3 = 20/53.

5. π_1 = .195, π_2 = .132, π_3 = .182, π_4 = .302, π_5 = .189.
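The π vectors above are stationary densities, i.e., solutions of πQ = 0 with Σ π_i = 1. A minimal sketch of the computation for the two-state generator of Exercises 8.5 (the rates are arbitrary test values, not those of these exercises):

```python
# Sketch of the computation behind Exercises 8.6: a stationary density pi
# satisfies pi Q = 0 together with sum(pi) = 1. For the two-state generator
# Q = [[-lam, lam], [mu, -mu]] the solution is pi = (mu, lam)/(lam + mu).
# The rates below are arbitrary test values.

lam, mu = 1.2, 0.7
Q = [[-lam, lam], [mu, -mu]]
pi = [mu / (lam + mu), lam / (lam + mu)]

# pi Q = 0 componentwise, and pi sums to 1:
for j in range(2):
    assert abs(sum(pi[i] * Q[i][j] for i in range(2))) < 1e-12
assert abs(sum(pi) - 1) < 1e-12
print("pi =", pi)
```

For larger irreducible generators the same two conditions give a linear system: replace one column of the equations πQ = 0 by the normalization Σ π_i = 1 and solve.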
STANDARD NORMAL DISTRIBUTION FUNCTION

   Φ(x) = (1/√(2π)) ∫_{-∞}^{x} e^{-t²/2} dt,   x ≥ 0.

For x < 0, use the relation Φ(x) = 1 - Φ(-x).

  x    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
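The table can be reproduced from the error function, since Φ(x) = (1/2)(1 + erf(x/√2)). A few spot checks against tabled entries (rounded to four places), plus the reflection relation stated above:

```python
import math

# Reproducing the standard normal distribution table via the error function:
# Phi(x) = (1/2)(1 + erf(x / sqrt(2))). Spot checks against tabled entries,
# plus the reflection Phi(x) = 1 - Phi(-x) used for x < 0.

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

assert round(Phi(0.00), 4) == 0.5000
assert round(Phi(1.96), 4) == 0.9750
assert round(Phi(2.33), 4) == 0.9901
assert abs(Phi(-1.5) - (1 - Phi(1.5))) < 1e-12
print("Phi matches the tabled values")
```

This is how such tables are typically regenerated today; the tabulated digits are Φ(x) rounded to four decimal places.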
SYMBOLS

∈, 26
|A|, 4, 188
{A_n : i.o.}, 53
b(k : n, p), 61
C(n, r), 7
C(x, r), 9
cov(X, Y), 119, 225
D_X, 130
E[·], 100, 230
E[Y | X_1 = x_1, ..., X_n = x_n], 128, 241
f_X, 61, 194
f_{X,Y}, 68, 201
f_{X_1,...,X_n}, 69, 220
f_X, 82
f_{Y|X}(y | x), 126, 127
f_{Y|X_1,...,X_n}(y | x_1, ..., x_n), 126
F_X, 182
F_{X,Y}, 201
𝔉, 37
ĝ_X, 104
g_X, 104
N, 26
N(A), 4
(n)_r, 7
n(μ, σ²), 211
o(h), 268
p(k : λ), 65
p, 39
p_x, 94
P, 2, 37
P(A | B), 22, 45
q, 39
q_x, 92
Q, 34
R, 34
R(n), 173
var X, 112
(X = x), 61
Z, 34
⊂, 26
∪, 27
∩, 27
X ∪ Y, 27
X ∩ Y, 27
Aᶜ, 27
β(α_1, α_2), 215
Γ(α), 213
Γ(α, λ), 213
Ω, 18, 36
(Ω, 𝔉, P), 37
μ_X, 111
ρ(n), 259
ρ(X, Y), 255
φ(x), 210
Φ(x), 211
σ_X, 112, 239
~, 161
INDEX
cr-algebra, 37
smallest, 37
Abel’s theorem, 103
absolutely continuous, 194
absolutely convergent, 229
absorption, 159
algebra, 36
Archimedean property, 28
associative laws, 29
asymptotic distribution, 155
asymptotically equivalent, 161
Bayes’ rule, 46
Bernoulli random variables, 76,127,130
Bernoulli trials, 19,39, 55, 73, 86,92, 105, 109,114,
115,243
beta density, 215
binary information source, 147
binomial density, 61, 77, 79
binomial theorem, 8
birth and death processes, 273
birthday problem, 14
bit, 134
Boole’s inequality, 43
Borel-Cantelli lemma, 53
branching process, 167
extinction, 169
Cantor’s diagonalization procedure, 34
Cardano, 1, 3
Cauchy density, 196, 232
central limit theorem, 245
chain molecule, 273
Chapman-Kolmogorov equation, 150, 280
Chebyshev’s inequality, 114, 241
commutative laws, 28
complement, 27
composite function, 66
compound probabilities, 45
conditional density, 126, 217
conditional entropy, 139
conditional expectation, 128,241
conditional probability, 21,22,45
conditional probability function, 45
conditional uncertainty, 139
configuration, 5,10
continuous random variable, 184
converges in mean square, 257
correlation, 122, 255
correlation function, 173, 259
countable, 32
countably infinite, 33
coupon collector problem, 57
covariance, 119
covariance function, 173, 259
de Morgan’s laws, 30
DeMoivre, 245
DeMoivre-Laplace limit theorem, 253
density function, 61,194
beta, 215
binomial, 61
Cauchy, 232
conditional, 217
exponential, 210
gamma, 213, 223
Gauss, 210
geometric, 62
joint, 68, 69, 220
Laplace, 210
marginal, 204
multinomial, 74
negative binomial, 63
normal, 210,223
Poisson, 64,65
Rayleigh, 216
spectral, 261
standard normal density, 210
uniform, 66,210
difference equation, 93
discrete random variable, 61
disjoint, 27
distribution function, 182
joint, 220
spectral, 261
standard normal, 211
distributive laws, 29,30
domain, 31
double sequence, 87
double series, 88
doubly stochastic, 157
Ehrenfest diffusion model, 147,155
elastic barrier, 167
empirical law, 2
empty set, 26
entropy, 134
conditional, 139
joint, 138
maximum principle, 140
equalization, 58
events, 37,182
expected value, 99,100,111,230
binomial density, 100
geometric density, 101
negative binomial density, 105
Poisson density, 101
uniform density, 100
exponential density, 210
mean, 240
variance, 240
extended real-valued random variable, 62,182
Fermat, 2
finite, 32
finite second moment, 111
finite sequence, 32
functional
gambler’s ruin problem, 92, 130,159
gamma density, 213
mean, 240
variance, 240
gamma density function, 223
garage door opener, 52, 55, 105, 107
Gauss density, 210
generating function, 81,82
binomial density, 83
geometric density, 83
negative binomial density, 83
Poisson density, 83
random variable, 82
sequence, 81
sum of random variables, 84
geometric density function, 62
geometric probabilities, 188
Huygens, 2
inclusion/exclusion principle, 56, 57
independence, continuous random
variables, 222
independent, 49
independent random variables, 72,
75, 206
inequality
Chebyshev’s, 114
Markov’s, 113
Schwarz’s, 118, 255
triangle, 256
infinite sequence, 32
infinitely often, 53
information, 133
initial density, 146
intersection, 27
irreducible, 154,293
joint conditional density, 217
joint density function, 68, 69,220
joint distribution function, 198, 220
joint entropy, 138
joint uncertainty, 138
Kolmogorov backward equations, 282
Kolmogorov forward equations, 282
Laplace density, 210
limiting distribution, 155
linear dependence, 123
linear predictor, 175
mapping, 31
marginal density function, 204
Markov chain, 145, 279
Markov property, 279
Markov’s inequality, 113, 240
match, 55
maximum entropy principle, 140
mean, 111,211,239
binomial density, 100
geometric density, 101
negative binomial density, 105
Poisson density, 101
uniform density, 100
mean square convergence, 257
mean square distance, 256
mean square error, 175
moment, finite second, 111
moving average process, 174, 262
multinomial density, 75
multinomial density function, 74
mutually exclusive, 27
mutually independent, 50, 53
n-step transition matrix, 149
n-step transition probabilities, 149
negative binomial density, 63, 79
nonnegative integer-valued random variable, 77
norm, 256
normal density, 210, 211
mean, 240
variance, 240
normal density function, 223
one-step transition probabilities, 145
ordered pairs, 5
ordered r-tuple, 6
ordered sample
with replacement, 6
without replacement, 6
pairwise independence, 50
Pascal, 2
password problem, 55,106
paternity index, 47, 48
Poisson density, 64, 77,79, 91
Poisson density function, 65
Poisson process, 271
poker hand, 14
population, 6
probability
conditional, 21
of eventual ruin, 92,94
of failure, 39
space, 37
of success, 39
pure birth process, 274
q-matrix, 282
random sample, 15
random variable, 61, 182
continuous, 184
discrete, 61
entropy, 134
extended real-valued, 62, 182
nonnegative integer-valued, 77
uncertainty, 134
independent, 72, 75, 110,127,206, 222
random walk, 148
drift, 159
symmetric, 159
three-dimensional, 164
two-dimensional, 164
range, 32
Rayleigh density, 216
relative frequency, 2
reliability, 218
run, 58
sample
random, 15
unordered, 7
Schwarz’s inequality, 118,255
set theory, 26
associative laws, 29
commutative laws, 28
de Morgan’s laws, 30
disjoint, 27
distributive laws, 29, 30
empty set, 26
equal, 26
intersection, 27
membership, 26
mutually exclusive, 27
subset, 26
union, 27
sets, 26
spectral density function, 261
spectral distribution function, 261
standard deviation, 112, 211
standard normal density, 210
standard normal distribution function, 211
state space, 145
states, 145,278
stationarity property, 149
stationary density, 296
stationary sequence, 173
stationary transition function, 279
stationary transition matrix, 279
stationary transition probabilities, 145
Stirling’s formula, 161
stochastic matrix, 146
stratification, 46
subset, 26
sum of random number of random variables, 89
symmetric random walk, 159
tail probability function, 104
transition function, 279
transition matrix, n-step, 149
transition probabilities
n-step, 149
stationary, 145
triangle inequality, 256
uncertainty, 133,134
conditional, 139
joint, 138
uniform density, 79, 210, 231
variance, 239
mean, 239
uniform density function, 66
uniform probability measure, 183
union, 27
unit square, 199
universe, 26
unordered sample, 7
varX, 239
variance, 112,239
binomial density, 112
Poisson density, 113
of a sum, 119
uniform density, 112
Venn diagram, 27
weak law of large numbers, 115
weakly stationary, 173,258
well-ordering property, 35
MATHEMATICS
Introduction to
PROBABILITY THEORY
with
CONTEMPORARY APPLICATIONS
Lester L. Helms
This introduction to probability theory transforms a highly abstract subject into a series of coherent concepts. Its extensive discussions and clear examples, written in plain language, expose students to the rules and methods of probability. Suitable for an introductory probability course, this volume requires abstract and conceptual thinking skills and a background in calculus.
Topics include classical probability, set theory, axioms, probability functions, random and independent random variables, expected values, and covariance and correlations. Additional subjects include stochastic processes, continuous random variables, expectation and conditional expectation, and continuous parameter Markov processes. Numerous exercises foster the development of problem-solving skills, and all problems feature step-by-step solutions.
9780486474182
ISBN-10: 0-486-47418-6
SEE EVERY DOVER BOOK IN PRINT AT
WWW.DOVERPUBLICATIONS.COM