Week 6 notes : Continuous random variables and their probability densities
uniform, normal, gamma, exponential, chi-squared distributions, normal approximation to the binomial
Uniform [0,1] random variable : For a simple example of a continuous random variable we consider choosing a value between 0 and 1 (hence lying in the interval [0,1]) in which any real number in this interval is equally likely to be chosen. To obtain such a uniform [0,1] random variable we define the probability $P(a \le X \le b)$ that X lies between the values a and b, with $0 \le a < b \le 1$, to be the portion of the area lying between the values x = a and x = b under the curve having constant height 1 for x between 0 and 1 and height 0 elsewhere: $f(x) = 1$ if $0 \le x \le 1$ ($f(x) = 0$ elsewhere). In this way the probability depends only on the length $b-a$ of the sub-interval [a,b], hence is uniformly spread over the interval [0,1].
Uniform [a,b] random variable : Generalizing this we can next consider a uniform [a,b] random variable as one having density which is constant on the interval [a,b] and 0 elsewhere. The constant density $f(x) = \frac{1}{b-a}$ for $a < x < b$ is determined from the condition that the total area under the density curve $y = f(x)$ (= total probability) equals 1. (A rectangle of width $b-a$ having area 1 determines the height $f(x)$ of the rectangle.)
Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function $f(x) = f_X(x)$, meaning again that the probability that X lies between two values a and b is the integral of the density $f(x)$ over this interval, which is the area under the density curve. In symbols
$$P(a \le X \le b) = \int_a^b f(x)\,dx = \text{area under the curve } y = f(x) \text{ between } a \text{ and } b.$$
We can regard this as the probability measure of the set (event) which is the interval [a,b]. There will in general be certain sets which cannot be obtained as countable unions or intersections of intervals and for which one is not able to define a probability. However, in practice this rarely comes up. (Technically the sample space can be abstract, and what we have really defined is the probability (acting on the actual sample space) of the inverse image under the function X; this inverse image maps sets of reals back to sets in the sample space. This induces a probability measure $P \circ X^{-1}$ defined on sets of reals, and so for all practical purposes we can view events as sets of real numbers without loss of generality.)
For a continuous random variable the probability that X equals any particular value is 0, since according to our above definition this is $P(X = a) = P(a \le X \le a)$, which is given by the integral of the density $f(x)$ from a to a, which is 0. Consequently, for a continuous random variable, whether or not we include an endpoint of an interval in calculating the probability makes no difference:
$$P(a \le X \le b) = P(a < X < b).$$
Since probabilities are determined by densities and vice versa, from our axioms of probability we must require that
1) $f(x) \ge 0$ (the density is non-negative)
2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
The first requirement ensures that we don't get negative probabilities, which are forbidden, and the second just says that the total probability is 1.
The interpretation of the density is that it gives the local probability per unit length along the x-axis, in the sense that for X lying in a small enough interval, say $[a, a+\Delta x]$ of width $\Delta x$, provided the density is a continuous function at $x = a$, the probability that the random variable X lies in this interval is approximately $f(a)\cdot\Delta x \approx f(x)\cdot\Delta x$ for x near a:
$$P(a < X < a+\Delta x) = \int_a^{a+\Delta x} f(x)\,dx \approx f(a)\cdot\Delta x.$$
Said differently, if we define the (cumulative) distribution function (cdf)
$$F(x) = F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
(Johnson just calls it the distribution function), for which one easily verifies that
$$F(b) - F(a) = P(a < X \le b)$$
(i.e. area to the left of b under the density curve minus area to the left of a equals area between a and b), then by the fundamental theorem of calculus the derivative of the cdf $F(x)$ is just the density $f(x)$:
$$\frac{d}{dx}F(x) = f(x) = \lim_{\Delta x \to 0} \frac{F(a+\Delta x) - F(a)}{\Delta x} = \lim_{\Delta x \to 0} \frac{P(a < X < a+\Delta x)}{\Delta x} \quad \text{(at } x = a\text{)}$$
where we have recalled the definition of the derivative of $F(x)$. We see that the above approximation for the probability holds exactly in the limit $\Delta x \to 0$.
Definition of expected value of X : The main change in passing from discrete to continuous random variables is the replacement of the discrete sum by the continuous integral. For a continuous random variable X with density $f(x)$, the expected value is defined as
$$\mu = E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
and one can show that for any random variable which is a measurable function $h(X)$ of the random variable X (which we won't define precisely here but which will be the case for almost any function occurring in practice) the expected value is
$$E[h(X)] = \int_{-\infty}^{\infty} h(x)\,f(x)\,dx.$$
In particular this gives for the variance the formula
$$V(X) = E[(X-\mu)^2] = E[X^2] - \mu^2, \quad \text{since} \quad E[(X-\mu)^2] = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = E[X^2] - \mu^2.$$
EXAMPLE 1 : (like HW problems 5.2, 5.3, 5.13) : Consider a random variable X having the density function $f(x) = k x^5$ for $0 < x < 1$ ($f(x) = 0$ elsewhere).
a) Find the constant k which makes this a density and then find $P(\frac{1}{3} < X < \frac{2}{3})$ :
The requirement that the total probability is 1 translates as $1 = k\int_0^1 x^5\,dx = \frac{k}{6}$, which gives $k = 6$.
$$P\left(\tfrac{1}{3} < X < \tfrac{2}{3}\right) = \int_{1/3}^{2/3} 6x^5\,dx = \left(\tfrac{2}{3}\right)^6 - \left(\tfrac{1}{3}\right)^6 = \tfrac{7}{81}.$$
b) Find the mean and variance of X :
$$E[X] = \int_0^1 x\cdot 6x^5\,dx = \int_0^1 6x^6\,dx = \tfrac{6}{7}, \qquad E[X^2] = \int_0^1 6x^7\,dx = \tfrac{3}{4}.$$
Thus
$$\sigma^2 = E[X^2] - \mu^2 = \tfrac{3}{4} - \left(\tfrac{6}{7}\right)^2 = \tfrac{3}{196}$$
gives the variance.
c) Find the cumulative distribution function of X :
$$F(x) = \int_0^x 6t^5\,dt = x^6 \quad \text{for } 0 < x < 1$$
($F(x) = 0$ for $x \le 0$ and $F(x) = 1$ for $x \ge 1$).
d) Using this distribution function find $P(.3 < X < .5)$ :
$$P(.3 < X < .5) = F(.5) - F(.3) = \left(\tfrac{5}{10}\right)^6 - \left(\tfrac{3}{10}\right)^6 = .014896.$$
Also find $P(X > .9)$ :
$$P(X > .9) = 1 - P(X \le .9) = 1 - F(.9) = 1 - \left(\tfrac{9}{10}\right)^6 = .468559.$$
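As a quick numerical sanity check of Example 1, here is a minimal sketch (not part of the original notes) assuming Python with scipy is available; the variable names are ours:

from scipy import integrate

f = lambda x: 6 * x**5        # density f(x) = 6 x^5 on (0, 1)
F = lambda x: x**6            # cdf F(x) = x^6 on (0, 1)

print(integrate.quad(f, 0, 1)[0])                      # ~1.0, confirming k = 6
print(integrate.quad(f, 1/3, 2/3)[0], 7/81)            # P(1/3 < X < 2/3) = 7/81 ~ 0.0864
mean = integrate.quad(lambda x: x * f(x), 0, 1)[0]     # E[X] = 6/7
ex2 = integrate.quad(lambda x: x**2 * f(x), 0, 1)[0]   # E[X^2] = 3/4
print(mean, ex2 - mean**2)                             # ~0.857 and 3/196 ~ 0.0153
print(F(0.5) - F(0.3))                                 # ~0.014896
print(1 - F(0.9))                                      # ~0.468559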
A Standard Normal random variable Z is one with mean 0 and variance 1 having density given by the symmetric bell-shaped curve:
$$f(x; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
For such a normally distributed random variable we write $Z \sim N(0,1)$. It is common to use the letter Z to indicate a standard normal.
Normal random variable : A random variable X has normal distribution with parameters mean $\mu$ and variance $\sigma^2$, in symbols $X \sim N(\mu, \sigma^2)$, if its density is of the form
$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}.$$
One can show by the definition of expected value that the parameters $\mu$ and $\sigma^2$ really do give the mean and variance of this density.
It is a fact that if $X \sim N(\mu, \sigma^2)$ is normally distributed with parameters $\mu$ and $\sigma^2$, then the standardized random variable
$$Z = \frac{X - \mu}{\sigma} \sim N(0,1)$$
has a standard normal distribution with mean 0 and variance 1.
That we get the stated mean and variance is because constants factor outside expectations and the expected value of a sum or difference is the sum or difference of the expected values, so that
$$E\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma}\left(E[X] - \mu\right) = 0 \quad \text{since } E[X] = \mu.$$
Also, using the definition of the variance, we find that when computing the variance of a constant times a random variable, the constant gets squared before pulling it outside the variance, so that
$$V\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma^2}\, V[X - \mu] = 1$$
as claimed (since $V[X - \mu] = V[X] = \sigma^2$).
To see that the standardized variable is still in fact normally distributed, observe that the cumulative distribution functions of the random variables Z and X are related by:
$$F_Z(z) = P\left(\frac{X-\mu}{\sigma} \le z\right) = P(X \le \mu + z\sigma) = F_X(\mu + z\sigma) = F_X(x)$$
where $x = \mu + z\sigma$. Then by the chain rule, and noting that $(x-\mu)^2/2\sigma^2 = z^2/2$, one has
$$f_Z(z) = \frac{d}{dz}F_Z(z) = \frac{d}{dx}F_X(x)\cdot\frac{dx}{dz} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z^2/2}\cdot\sigma = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
which is the standard normal density.
There is a trick used to show that the constant $\frac{1}{\sqrt{2\pi}}$ is the correct one to use in the standard normal density. Writing
$$I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx,$$
one notes that this integral stays the same when one changes the dummy variable of integration from x to y, so that one has
$$I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dy\,dx.$$
Re-arranging the order of integration can be done since the y integral is a constant with respect to x. But now if we change to polar coordinates, with the radius r going from 0 to $\infty$, one has $x^2 + y^2 = r^2$, and after breaking up the plane into circular donut-shaped annulus regions of radius r and infinitesimal thickness dr and integrating out the angle $\theta$, the area element $dx\,dy$ turns into $2\pi r\,dr$, so we find the above integral becomes
$$I^2 = 2\pi\int_0^{\infty} e^{-r^2/2}\, r\,dr = 2\pi$$
which verifies the claim, since we must choose the constant k so that $kI = 1$, i.e. $k = 1/\sqrt{2\pi}$.
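This normalization can also be checked numerically; a minimal sketch (assuming scipy, not part of the original notes):

import numpy as np
from scipy import integrate

I = integrate.quad(lambda x: np.exp(-x**2 / 2), -np.inf, np.inf)[0]
print(I, np.sqrt(2 * np.pi))    # both ~2.5066, so the normalizing constant is 1/sqrt(2*pi)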
That the mean of the standard normal is 0 is a consequence of symmetry, since
$$\sqrt{2\pi}\cdot E[X] = \int_{-\infty}^{\infty} x\, e^{-x^2/2}\,dx = \lim_{R\to\infty}\left(\int_{-R}^{0} x\, e^{-x^2/2}\,dx + \int_{0}^{R} x\, e^{-x^2/2}\,dx\right) = 0.$$
(To see that the first integral on the right is minus the second, we change variable from x to −x in that integral, yielding the product of 3 minus signs, which is an overall minus the 2nd integral: the first minus comes from replacing x by −x, the second from replacing dx by −dx, and the third because the limits of integration for −x have changed from −R to 0 for x to R to 0 for −x, which requires the interpretation that we must add a minus sign when flipping the limits from R to 0 back to 0 to R.)
To see that the variance of the standard normal really is 1 : since the mean is 0 the variance is just $I = E[X^2]$, where $\sqrt{2\pi}\, I = \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx$, and we can either do a polar coordinate trick,
$$2\pi I^2 = \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx \int_{-\infty}^{\infty} y^2 e^{-y^2/2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^2 y^2 e^{-(x^2+y^2)/2}\,dy\,dx = \int_0^{2\pi}\int_0^{\infty} (r\cos\theta)^2 (r\sin\theta)^2 e^{-r^2/2}\, r\,dr\,d\theta$$
like we did before { but this now involves an integration over the angle $\theta$ and requires the trig identities $\sin 2\theta = 2\sin\theta\cos\theta$ and $\sin^2 2\theta = (1-\cos 4\theta)/2$. The $\cos 4\theta$ term vanishes (gives 0) after doing the $\theta$ integration. One changes variables to $u = r^2/2$ so $du = r\,dr$. This is a good exercise in integration. }
or : However it is considerably easier to simply integrate by parts, namely
$$\sqrt{2\pi}\,\sigma^2 = \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx = \int_{-\infty}^{\infty} x\cdot d\left(-e^{-x^2/2}\right)$$
is of the form for which integration by parts applies, with $u = x$ and $v = -e^{-x^2/2}$, so that using $\int u\,dv = uv - \int v\,du$ the uv term vanishes at the limits of integration, and the integral reduces to
$$\sigma^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \,/\, \sqrt{2\pi} = 1$$
since this is just the total area under the standard normal curve, which we know equals 1. (This integral was evaluated earlier by the polar coordinate trick above.)
To see that the mean of the normal distribution is really $\mu$, we have
$$E[X] = \int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/2\sigma^2}\,dx \,/\, (\sigma\sqrt{2\pi})$$
which by the change of variables $z = (x-\mu)/\sigma$, hence $dx = \sigma\,dz$, becomes
$$\int_{-\infty}^{\infty} (\mu + \sigma z)\, e^{-z^2/2}\,dz \,/\, \sqrt{2\pi} = \sigma\int_{-\infty}^{\infty} z\, e^{-z^2/2}\,dz \,/\, \sqrt{2\pi} + \mu\int_{-\infty}^{\infty} e^{-z^2/2}\,dz \,/\, \sqrt{2\pi} = \sigma\cdot 0 + \mu\cdot 1 = \mu$$
which proves the claim, since we already saw that the first integral on the right vanishes by symmetry and the second is just the total area under the standard normal curve, which is 1.
To see that the variance of the normal distribution really is $\sigma^2$, we have
$$V[X] = \int_{-\infty}^{\infty} (x-\mu)^2\, e^{-(x-\mu)^2/2\sigma^2}\,dx \,/\, (\sigma\sqrt{2\pi})$$
which by the same change of variable as above becomes
$$\int_{-\infty}^{\infty} (\sigma z)^2\, e^{-z^2/2}\,\sigma\,dz \,/\, (\sigma\sqrt{2\pi}) = \sigma^2\int_{-\infty}^{\infty} z^2\, e^{-z^2/2}\,dz \,/\, \sqrt{2\pi} = \sigma^2$$
since the rightmost integral is just the variance of the standard normal, which we have seen is 1.
EXAMPLE 2 : a) Problem 5.31 from the text : The specifications for a certain job call for washers with an inside diameter of $.300 \pm .005$ inch. If the inside diameter is normally distributed with mean $\mu = .302$ and standard deviation $\sigma = .003$, what percentage of the washers will meet the specifications?
Let X = inside diameter of a randomly selected washer. Then the standardized variable $Z = \frac{X - .302}{.003}$ will be standard normal N(0,1). The question translates into: what is the probability
$$P(.300 - .005 < X < .300 + .005) = P\left(\frac{.300-.005-.302}{.003} < Z = \frac{X-.302}{.003} < \frac{.300+.005-.302}{.003}\right)$$
$$= P(-7/3 < Z < 1) = F(1) - F(-2.333) = .8413 - .0098 = .8315 = 83.15\%.$$
Note that in using the cumulative distribution function for the standard normal given in table 3 of Appendix B at the back of the book, the more accurate value .0098 for F(−2.333) is obtained by linear interpolation of the values F(−2.33) = .0099 and F(−2.34) = .0096, since it lies one third of the way between these two and one third of .0003 is .0001, which we subtract from .0099.
b) For the above problem, for what value of the inside diameter is the probability 98% that the washer's inside diameter will exceed this value?
We wish to find the diameter $x_0$, say, for which $P(X > x_0) = .98$, or equivalently we first find the standardized value $z_0 = \frac{x_0 - .302}{.003}$ for which the standardized variable $Z = \frac{X - .302}{.003}$ satisfies $P(Z > z_0) = .98$. Then we have $P(Z \le z_0) = 1 - .98 = .02$, which from table 3 of Appendix B is the linear interpolation between $P(Z < -2.05) = .0202$ and $P(Z < -2.06) = .0197$. The value .02 is .0002 less than the first value, which lies .0005 above the 2nd value, so we want to go 2/5 or 4 tenths of the way between the two values, or $z_0 = -2.054$, which says
$$x_0 = .003\cdot z_0 + .302 = .003\cdot(-2.054) + .302 = -.006162 + .302 = .2958 \text{ inch.}$$
c) What is the probability that the inside diameter is at least .305?
$$P(X > .305) = P\left(Z = \frac{X-.302}{.003} > \frac{.305-.302}{.003} = 1\right) = P(Z > 1) = 1 - P(Z < 1) = 1 - .8413 = .1587, \text{ or } 15.87\%.$$
Note that by the symmetry of the standard normal curve, we could have also obtained the answer via
$$P(Z > 1) = P(Z < -1) = .1587,$$
which can be found directly from table 3.
Had we asked what is the probability that the inside diameter is at most .305, we would have wanted
$$P(X < .305) = 1 - P(X > .305) = P(Z < 1) = .8413.$$
Note that the event that a standard normal lies at least 1 standard deviation from the mean has double the probability that the inside diameter is at least .305, namely
$$P(Z < -1 \text{ or } Z > 1) = 2\,P(Z < -1) = 2(.1587) = .3174.$$
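For readers who prefer software to the printed table, here is a hedged sketch of Example 2 (assuming scipy.stats; the exact decimals differ slightly from the interpolated table values):

from scipy.stats import norm

mu, sigma = 0.302, 0.003
# a) fraction of washers with inside diameter between .295 and .305
print(norm.cdf(0.305, mu, sigma) - norm.cdf(0.295, mu, sigma))     # ~0.8315
# b) diameter exceeded with probability .98
print(norm.ppf(0.02, mu, sigma))                                   # ~0.2958 inch
# c) P(X > .305) and P(X < .305)
print(1 - norm.cdf(0.305, mu, sigma), norm.cdf(0.305, mu, sigma))  # ~0.1587 and ~0.8413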
Review question : For a Poisson random variable with parameter $\lambda = 5$, what is the standard deviation?
The parameter for a Poisson equals both the mean = 5 and the variance = 5, so the standard deviation is by definition just the square root of the variance, which is $\sqrt{5}$. Recall the above Poisson random variable has distribution
$$P(X = k) = \frac{5^k}{k!}\, e^{-5}.$$
The z-critical value $z_\alpha$ is the $100(1-\alpha)$ percentile of the standard normal distribution, defined by
$$\alpha = P(Z > z_\alpha)$$
so that $1 - \alpha = P(Z \le z_\alpha)$. That is, $\alpha$ is the area under the standard normal curve to the right of $z_\alpha$ and $1-\alpha$ gives the area to the left of $z_\alpha$. By symmetry of the standard normal curve, the z-value for which the area to the left of this value is $\alpha$ (hence the area to the right is $1-\alpha$) is
$$z_{1-\alpha} = -z_\alpha.$$
From table 3 of Appendix B we find $z_{.01} = 2.33$ and $z_{.05} = 1.645$.
Note this says that there is a 98% chance that a standard normal random variable lies within 2.33 standard deviations of 0 (the mean), which is to say that Z lies between $-z_{.01} = -2.33$ and $z_{.01} = 2.33$, since .02 is the probability of the complementary event that Z is either greater than 2.33 or less than −2.33; by symmetry these two disjoint events have the same probability .01.
Note that to say that roughly 95% of the area under the standard normal lies within two standard deviations of the mean is to say that $z_{.025}$ is close to 2, or equivalently that the cumulative distribution function F(−2) is close to .025 (table 3 of Appendix B gives this as F(−1.96)). From this table it would thus be more accurate to say that 95.44% lies within two standard deviations and that 95% lies within 1.96 standard deviations of 0 (the mean).
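The z-critical values can be read off with software instead of table 3; a minimal sketch assuming scipy.stats (norm.ppf is the inverse cdf, so $z_\alpha$ = ppf(1 − alpha)):

from scipy.stats import norm

for alpha in (0.01, 0.025, 0.05):
    z = norm.ppf(1 - alpha)       # z_alpha, the 100(1 - alpha) percentile
    print(alpha, round(z, 3))     # ~2.326, ~1.960, ~1.645

print(norm.cdf(2) - norm.cdf(-2)) # ~0.9545: about 95.4% lies within two standard deviations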
Normal Approximation to the Binomial distribution
We have already encountered the Poisson approximation to the binomial distribution, which was derived in the regime for which the number of trials or sample size n is large (exactly Poisson in the limit $n \to \infty$) while the success probability p is small, so that the mean number of successes in n trials $np = \lambda$ is fixed. The normal approximation works in the different regime where both $np > 15$ and $n(1-p) > 15$. For n sufficiently large it is possible that both regimes overlap, so that both approximations are valid and yield approximately the same answer.
Recall that for a binomial random variable with parameters n and p,
X = the number of successes in n Bernoulli trials,
we can regard X as a sum of n independent identically distributed (i.i.d.) Bernoulli (0 or 1 valued) random variables each having expected value the success probability p. Since the mean and standard deviation of this binomial random variable are, as we know, $E[X] = np$ and $\sigma_X = \sqrt{np(1-p)}$, one then has
Theorem 5.1 of the text (normal approximation to the binomial distribution) :
The standardized variable
$$Z = \frac{X - np}{\sqrt{np(1-p)}}$$
is approximately standard normal and becomes exactly so in the limit as $n \to \infty$. This will generally be a good approximation provided both $np > 15$ and $n(1-p) > 15$, so that implicitly $n > 30$. (Jay Devore's Statistics for Engineers book uses $\ge 10$ instead of > 15.) This is a special case of the Central Limit Theorem (Theorem 6.2 of the text). Any sum $S_n = \sum_{i=1}^{n} X_i$ of a large number of i.i.d. (independent identically distributed) random variables $X_i$, each having mean $\mu = E[X_i]$ and finite variance $V[X_i] = \sigma^2$ (n > 30 in practice is usually large enough), is approximately normally distributed. As we have seen, the standardized variable will then be (approximately) standard normal.
Specifically
$$Z = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is approximately standard normal for large n.
To get the second expression on the right we have divided numerator and denominator of the quotient on the left by n. Here $\bar{x} = \frac{S_n}{n}$ is the sample mean. Since $\bar{x}$ is a sum it is approximately normal for large n. Since constants factor outside expectations and the expected value of a sum is the sum of the expected values, we have
$$E[\bar{x}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\, n\mu = \mu,$$
which says that the expected value of the sample mean is the population mean: $E[\bar{x}] = \mu$ (said differently, the sample mean is an unbiased estimator for the population mean).
Similarly, since the variance of a sum of independent random variables is the sum of the variances, but constants get squared before being pulled outside the variance, we have
$$V[\bar{x}] = \frac{1}{n^2}\, V[S_n] = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$
so that the standard deviation of the sample mean $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
The continuity correction is typically used when approximating a discrete count random variable, like the binomial random variable X, by a continuous random variable, as with the standard normally distributed Z in the above normal approximation to the binomial. That is, we view the discrete event, say
$$\{X = 14\}$$
as equivalent to the continuous event
$$\{13.5 < X < 14.5\}$$
for purposes of approximating the standardized variable by a standard normal one by Theorem 5.1. Similarly for the events
$$\{X \le 14\} = \{X \le 14.5\} \quad \text{and} \quad \{X \ge 14\} = \{X \ge 13.5\}.$$
We would have, for example, the approximation that a standard normal random variable Z satisfies
$$P(X = 14) = P\left(\frac{13.5 - np}{\sqrt{np(1-p)}} < Z = \frac{X - np}{\sqrt{np(1-p)}} < \frac{14.5 - np}{\sqrt{np(1-p)}}\right).$$
To see that some kind of continuity correction is needed, note that since continuous random variables take specific values with probability 0, there is 0 probability that a standard normal takes the specific value $\frac{14 - np}{\sqrt{np(1-p)}}$ corresponding to the single value X = 14. However, the density function of the standard normal evaluated at this value in the middle of the above interval does make sense, and when this function value is multiplied by the length of the small interval $1/\sqrt{np(1-p)}$ (proportional to $1/\sqrt{n}$) we get an approximation to the integral of the nearly constant density. The central limit theorem applied to discrete integer-valued random variables, possibly re-scaled (or "lattice valued"), is usually stated in this way, by evaluating the density at a specific location and multiplying by the length of the interval. Higher order corrections involving a polynomial times the standard normal density are at times used.
EXAMPLE 3 : Thus when we flip a fair coin 36 times, so n = 36 and p = 1/2, the chance of seeing 14 heads is approximately (with $np = 36/2 = 18$ and $\sqrt{np(1-p)} = 3$), using table 3 of Appendix B for the standard normal cumulative distribution function F(x), which is the area to the left of x under the standard normal curve:
$$P(X = 14) = P\left(\frac{13.5 - 18}{3} < Z = \frac{X - 18}{3} < \frac{14.5 - 18}{3}\right) = P(-3/2 < Z < -7/6) = F(-7/6) - F(-3/2) = F(-1.166) - F(-1.5)$$
$$= .1216 - .0668 = .0548,$$
where we have used linear interpolation to arrive at F(−1.166), which is 2/3 of the way between the two values F(−1.16) = .1230 and F(−1.17) = .1210: 2/3 of their difference .0020 is roughly .0014, and .1230 − .0014 = .1216. Note again that the probability
$$P(a < Z < b) = F(b) - F(a)$$
equals the difference of the cumulative distribution function F at these values, since the probability that the standard normal variable Z lies between a and b is just the area under the standard normal curve between a and b, which is the area F(b) to the left of b minus F(a) (the area to the left of a).
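A hedged check of Example 3 (assuming scipy.stats; the names are ours): the exact binomial probability and the continuity-corrected normal approximation agree to about two decimal places.

from scipy.stats import binom, norm

n, p = 36, 0.5
mu, sd = n * p, (n * p * (1 - p)) ** 0.5                           # 18 and 3
exact = binom.pmf(14, n, p)                                        # exact P(X = 14) ~ 0.0552
approx = norm.cdf((14.5 - mu) / sd) - norm.cdf((13.5 - mu) / sd)   # ~0.0549
print(exact, approx)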
EXAMPLE 4 : Problem 5.37 of the text : The probability that an electronic component will fail in less than 1000 hours of continuous use is 0.25. Use the normal approximation to find the probability that among 200 such components, fewer than 45 will fail in less than 1000 hours of continuous use.
The probability of failure in less than 1000 hours of continuous use is the binomial parameter p = 0.25, while the sample size parameter is n = 200. Our binomial random variable is then
X = the number of the 200 components which fail in less than 1000 hours of continuous use,
which has mean $E[X] = np = 50$ and standard deviation $\sigma_X = \sqrt{np(1-p)} = \sqrt{50\cdot 3/4} = \tfrac{5}{2}\sqrt{6}$. We want to find (using the continuity correction)
$$P(X < 45) = P(X < 44.5) = P\left(Z = \frac{X - 50}{\tfrac{5}{2}\sqrt{6}} < \frac{44.5 - 50}{\tfrac{5}{2}\sqrt{6}} = \frac{-11/2}{\tfrac{5}{2}\sqrt{6}} = -.8981\right).$$
(Be careful here to use 44.5 and not 45.5!) From table 3 of Appendix B,
$$P(Z < -.8981) \approx P(Z < -.90) = .1841.$$
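Again a sketch assuming scipy.stats, comparing the normal approximation above with the exact binomial answer:

from scipy.stats import binom, norm

n, p = 200, 0.25
mu, sd = n * p, (n * p * (1 - p)) ** 0.5     # 50 and (5/2)*sqrt(6) ~ 6.124
print(norm.cdf((44.5 - mu) / sd))            # continuity-corrected approximation, ~0.1846
print(binom.cdf(44, n, p))                   # exact P(X < 45) = P(X <= 44), close to the value above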
The mean and variance of a uniform [a,b] random variable X :
To see that the mean of a uniform [a,b] is
$$\mu = \frac{a+b}{2}$$
and the variance of a uniform [a,b] is
$$\sigma^2 = \frac{1}{12}(b-a)^2,$$
recall such a uniform random variable has constant density on the interval of x values from a to b, of width $b-a$, so since the total probability = 1 is just the area under this rectangle, the constant density, which is the height $f(x)$ of the rectangle, must equal $1/(b-a)$. Since the density is 0 outside of the interval [a,b], we have for the mean
$$\mu = E[X] = \int_a^b x\, f(x)\,dx = \frac{1}{b-a}\int_a^b x\,dx = \frac{1}{b-a}\left(\frac{b^2}{2} - \frac{a^2}{2}\right) = \frac{a+b}{2}$$
and to compute the variance we have
$$E[X^2] = \int_a^b x^2 f(x)\,dx = \frac{1}{b-a}\int_a^b x^2\,dx = \frac{1}{b-a}\left(\frac{b^3}{3} - \frac{a^3}{3}\right) = \frac{a^2 + ab + b^2}{3}.$$
Subtracting off
$$\mu^2 = \frac{(a+b)^2}{4}$$
and simplifying gives the result (since $\sigma^2 = E[X^2] - \mu^2$).
EXAMPLE 5 : Problem 5.47 of the text : From experience Mr. Harris has found that the low bid on a construction job can be regarded as a random variable X having the uniform density
$$f(x) = \frac{3}{4C} \quad \text{for } \frac{2C}{3} < x < 2C$$
and $f(x) = 0$ elsewhere, where C is his own estimate of the cost of the job (what it will cost him). What percentage z should Mr. Harris add to his cost estimate to maximize his expected profit?
Suppose Mr. Harris bids (1+z)C. If the low bid X is less than his bid he will make no profit (and have no costs), but he will be paid (1+z)C if the low bid is greater than his bid of (1+z)C, so that he wins the job; it will then cost him C, so his net profit is zC. Thus his expected profit is just his profit zC times the probability of obtaining that profit (which is the probability that the low bid exceeds his own), or
$$zC\cdot P((1+z)C < X < 2C) = zC\cdot\frac{3}{4C}\left(2C - (1+z)C\right) = \frac{3}{4}\, z(1-z)\, C.$$
This will be maximized when z(1−z) is maximized, so when he charges a z = 1/2 = 50% mark-up over his actual cost estimate.
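A small sketch (assuming numpy) that maximizes the expected profit numerically; it simply evaluates the 3/4·z(1−z)·C expression above, with C set to 1 since C only scales the answer:

import numpy as np

z = np.linspace(0, 1, 10001)
expected_profit = 0.75 * z * (1 - z)      # (3/4) z (1 - z) C with C = 1
print(z[np.argmax(expected_profit)])      # ~0.5, i.e. a 50% mark-up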
The log-normal distribution arises when we have a random variable X whose logarithm log X has a normal distribution with mean $\mu$ and variance $\sigma^2$, so that the standardized variable $Z = \frac{\log X - \mu}{\sigma}$ is standard normal. Note that for such a random variable X > 0, since otherwise the log would not exist.
The density of a log-normal random variable X is
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, x^{-1}\, e^{-(\log x - \mu)^2/2\sigma^2} \quad \text{for } x > 0,\ \sigma > 0$$
and is 0 elsewhere.
To see this, note that the cumulative distribution function of X is related to the cdf of a standard normal random variable Z by
$$F_X(x) = P(X \le x) = P\left(Z = \frac{\log X - \mu}{\sigma} \le \frac{\log x - \mu}{\sigma}\right) = F_Z\left(\frac{\log x - \mu}{\sigma}\right) = F_Z(z)$$
where $z = \frac{\log x - \mu}{\sigma}$. Recall that by the fundamental theorem of calculus the density is the derivative of the cumulative distribution function, so
$$f_X(x) = \frac{d}{dx}F_X(x) = \frac{d}{dx}F_Z\left(\frac{\log x - \mu}{\sigma}\right) = \frac{d}{dz}F_Z(z)\cdot\frac{dz}{dx} = f_Z(z)\,\frac{dz}{dx} = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\cdot\frac{1}{\sigma x},$$
since by the chain rule we first compute the derivative of $F_Z(z)$ with respect to z, which gives the standard normal density in the variable $z = \frac{\log x - \mu}{\sigma}$, multiplied by the derivative of z with respect to x. Plugging in the expression for z in terms of x gives the result.
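A hedged check of the derived density (assuming log means the natural logarithm and that scipy.stats is available; scipy's lognorm uses s = sigma and scale = exp(mu), and the values of mu and sigma below are illustrative, not from the notes):

import numpy as np
from scipy.stats import lognorm

mu, sigma = 1.0, 0.5
x = np.array([0.5, 1.0, 2.0, 5.0])
ours = np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))
scipy_pdf = lognorm.pdf(x, s=sigma, scale=np.exp(mu))
print(np.allclose(ours, scipy_pdf))       # True: the formula above matches scipy's log-normal pdf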
The Gamma distribution and its special cases, the exponential and chi-squared distributions :
This distribution has density
$$f(x) = \frac{1}{\beta^\alpha\,\Gamma(\alpha)}\, x^{\alpha-1}\, e^{-x/\beta} \quad \text{for } x > 0,\ \alpha > 0,\ \beta > 0$$
and is 0 elsewhere. Here $\Gamma(\alpha)$ is the gamma function defined by
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$$
which satisfies $\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1)$, and hence $\Gamma(\alpha) = (\alpha-1)!$ when $\alpha$ is a positive integer, which can be seen by integration by parts.
The mean of the gamma distribution is $\mu = \alpha\beta$ and the variance of the gamma distribution is $\sigma^2 = \alpha\beta^2$.
The exponential distribution corresponds to the special case where $\alpha = 1$, which gives for the density of an exponential random variable:
$$f(x) = \frac{1}{\beta}\, e^{-x/\beta} \quad \text{for } x > 0,\ \beta > 0$$
and f(x) = 0 elsewhere.
The mean and variance of an exponential random variable are then $\mu = \beta$ and $\sigma^2 = \beta^2$.
Alternate form of the density of an exponential random variable in terms of the parameter $\lambda = 1/\beta$:
$$f(x) = \lambda\, e^{-\lambda x} \quad \text{for } x > 0,\ \lambda > 0$$
so that the mean and variance are then $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$.
The cumulative distribution function for an exponentially distributed random variable :
$$F_X(x) = P(X \le x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} = 1 - e^{-x/\beta}.$$
Note this says that
$$P(X > x) = 1 - F_X(x) = e^{-\lambda x}.$$
To good approximation the lifetime T of a tungsten filament light bulb is exponentially distributed. There we would replace x above by a time t, in which case $P(T > t) = e^{-\lambda t}$ gives the probability that the lifetime of the bulb exceeds the time t.
The exponential distribution is the only continuous one which has the
Memoryless Property : an exponentially distributed lifetime random variable T is the only (continuous) distribution which has the property that for positive times t and s
$$P(T > t + s \mid T > s) = P(T > t) \quad \text{(memoryless property)}.$$
This says, for example, that if the light bulb has lived for 100 years, the distribution of the remaining lifetime left is the same as if the bulb were brand new!
We won't show the "only" part, but the memoryless property itself is a simple consequence of the definition of conditional probability, since
$$P(T > t + s \mid T > s) = \frac{P(T > t + s)}{P(T > s)} = \frac{e^{-\lambda(t+s)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t).$$
Note above that the intersection event $\{T > t+s \text{ and } T > s\}$ is the same as the event $\{T > t+s\}$, since the latter implies $\{T > s\}$ as well.
Relationship between Poisson process and exponential random variables : Recall a Poisson process
X(t) = number of events occurring in a time interval of length t
with mean arrival rate $\alpha$ (= the expected number of events occurring per unit time) is one with
$$P(X(t) = k) = \frac{(\alpha t)^k}{k!}\, e^{-\alpha t}.$$
Then the waiting time until the first arrival (first event), or the time between successive arrivals (events), has an exponential distribution with parameter $\lambda = 1/\beta = \alpha$. Again, waiting times until the first event or
Waiting times between events in a Poisson process are exponentially distributed.
To see this, note that for the waiting time $T = T_1$ until the first event, saying that T exceeds time t is the same as saying that 0 events have occurred in the Poisson process up to time t:
$$P(T > t) = P(X(t) = 0) = e^{-\alpha t}$$
but this is exactly the tail probability of an exponential random variable with parameter $\lambda = 1/\beta = \alpha$. (The complementary event gives exactly the cumulative distribution function of an exponential.)
Similarly, for the waiting time $T = T_2$ between the first and second events we have
$$P(T_2 > t \mid T_1 = s) = P(\text{0 events in } (s, s+t] \mid T_1 = s) = P(\text{0 events in } (s, s+t]) = e^{-\alpha t}.$$
The last two equations follow from the independence and the stationarity (i.e. identical distribution) of disjoint equal time increments assumed for the Poisson process. But then by the law of total probability
$$P(T_2 > t) = \sum_s P(T_1 = s)\, P(T_2 > t \mid T_1 = s) = e^{-\alpha t}\sum_s P(T_1 = s) = e^{-\alpha t}.$$
(The first sum is a kind of conditional expectation $= E[\,P(T_2 > t \mid T_1)\,]$.) Similarly one sees that all the inter-arrival times (waiting times between two successive events) are independent and identically distributed exponential random variables.
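A small simulation sketch (assuming numpy; not part of the original notes) illustrating this: build a Poisson process from Bernoulli trials on a fine time grid, exactly as in the binomial-limit construction below, and check that the gaps between events look exponential with mean 1/alpha.

import numpy as np

rng = np.random.default_rng(0)
alpha, dt, T = 2.0, 1e-3, 1000.0                   # rate 2 events per unit time, fine grid
hits = rng.random(int(T / dt)) < alpha * dt        # one Bernoulli trial per small sub-interval
times = np.flatnonzero(hits) * dt                  # approximate event times
gaps = np.diff(times)                              # inter-arrival times
print(gaps.mean(), 1 / alpha)                      # sample mean of gaps ~ 1/alpha = 0.5
print((gaps > 1.0).mean(), np.exp(-alpha * 1.0))   # tail P(gap > 1) ~ e^{-2} ~ 0.135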
The Poisson assumptions are often satisfied, hence Poisson processes arise naturally in the real world, and so exponential random variables often occur as waiting times between Poisson events. Recall the
Assumptions used to obtain the Poisson distribution as a limit of Binomial probabilities were :
1) the probability of 1 event occurring in a small time interval is proportional to the length $\Delta t = t/n$ of the time interval, with constant $\alpha$, but does not depend on when the interval starts, so:
2) identically distributed on each time sub-interval (1 event happens with probability $p = \alpha\,\Delta t$)
(a process not depending on time is called a stationary or time homogeneous process),
3) events in disjoint time intervals are independent,
4) the probability of more than one event occurring in a small time interval is negligible: this is what ensures only two possible outcomes (either 0 events happen or 1 happens) in a small enough time interval, hence Bernoulli 0 or 1 valued trials in the small sub-intervals of time; the sum of a large number of Bernoulli r.v.'s then gives a Binomial number of events in the large time interval.
Interpretation of the parameter of an exponential random variable : We can view the parameter $\lambda$ of an exponential random variable as the rate per unit time at which events happen (in the light bulb case, the event that the light bulb dies), whereas $\beta = 1/\lambda$ gives the mean time until the event. This is similar to the situation for geometric random variables, where the probability 1/6 that a roll of a six-sided fair die produces a 3 (success) can be regarded as the rate at which successes (3's) happen per roll of the die, while the expected time 6 until a 3 is rolled corresponds to the mean time until the event.
EXAMPLE 6 : A skillful typist has a low probability of making a typo on a typed page of text. There are a large number n of characters on a page, n = 2500 say. Assuming typo errors occur independently for each character with the same small probability p, the binomial (2500t, p) number of typos on t pages is then well approximated by a Poisson process with parameter $\lambda t$ giving the mean number of errors for t pages, and $\lambda = np = 2500p$ the mean number of errors on a single page.
The waiting time T (measured in number of pages typed, which may be a fraction of a page) until the first error occurs, or between two successive errors, is then approximately exponential with parameter $\lambda$. Note that if $\lambda = 3$ gives the mean of the Poisson number of errors on a page, then we expect to wait for "time" $1/3 = 1/\lambda$ of a page until the first error occurs, which agrees with the formula for the mean of an exponential, in this case the exponential waiting time until the first Poisson event (a typo) occurs. We are viewing time as continuous when in fact the smallest fraction of a page here is 1 character, or 1/2500 of (the number of characters making up) a page. In reality the time we have to wait, in units of number of characters typed or 1/2500 of a page, is a geometric random variable with small parameter p and is only approximately exponential.
$$P(X = k) = P(T = t) = P\left(t - \tfrac{1}{2}\Delta t < T < t + \tfrac{1}{2}\Delta t\right) \quad (\text{continuity correction})$$
= the probability we'll wait for k characters to be typed until the first error,
with X a geometric random variable; $T = X/n$ is the waiting time, with $t = k/n$ measured in pages, and
$$P(X = k) = (1-p)^{k-1} p \quad \text{where } p = \lambda/n = \lambda/2500.$$
By the same approximations used to approximate a binomial by a Poisson random variable (n large, p small, $np = \lambda$ fixed), for "time" t pages typed the number of characters typed is $k = nt = 2500t$, so that with $\Delta t = 1/n = 1/2500$ of a page giving the "time" to type one character, $t = k/2500 = k/n$ and
$$P(X = k) = (1 - \lambda/n)^{nt-1}(\lambda/n) = P(T = t) \approx (\lambda e^{-\lambda t})\,\Delta t.$$
Relationship between exponential and geometric r.v.'s : The geometric random variable is the discrete analogue of the exponential and becomes exponential in the limit in which we take $k = [nt]$, the greatest integer less than or equal to nt, so that time is then approximately $t \approx \frac{k}{n}$, and $\Delta t = \frac{1}{n}$ gives the time interval between successive values of k, while the probability per unit time scales as in the Poisson process as $p = \frac{\lambda}{n}$ (same assumptions and same approximations used), so that
$$P(X = k) = (1-p)^{k-1} p = \left(1 - \frac{\lambda}{n}\right)^{[nt]-1}\frac{\lambda}{n} \approx e^{-\lambda t}\,\frac{\lambda}{n} = (\lambda e^{-\lambda t})\,\Delta t$$
$$= P\left(\frac{k - 1/2}{n} < \frac{X}{n} < \frac{k + 1/2}{n}\right) = P\left(t - \tfrac{\Delta t}{2} < T = \tfrac{X}{n} < t + \tfrac{\Delta t}{2}\right)$$
using the continuity correction. In the limit as $n \to \infty$ with $np = \lambda$ fixed, we see the geometric random variable X re-scaled by 1/n is then approximated by an exponential r.v. T with parameter $\lambda$.
EXAMPLE 7 : Suppose calls arrive at a telephone switchboard according to a Poisson process with rate one call every four minutes on average (so the mean number of calls per minute is .25). Then
X = the number of minutes between successive calls
is an exponential random variable with mean rate $\lambda = .25$ (or equivalently $\beta = 4$), so
a) the probability that more than 4 minutes elapse between successive calls is
$$P(X > 4) = e^{-.25(4)} = e^{-1} = .368.$$
We could also have obtained this via the complementary event :
$$P(X > 4) = 1 - P(X \le 4) = 1 - F_X(4) = 1 - (1 - e^{-.25(4)}) = e^{-1}.$$
b) The mean time between successive calls is
$$E[X] = \frac{1}{\lambda} = 4 \text{ minutes}.$$
For exponential random variables the mean is also the standard deviation, so the standard deviation of X, the time between successive calls, is also
$$\sigma_X = \frac{1}{\lambda} = 4 \text{ minutes}.$$
c) Find the probability that a call arrives in less than 3 minutes :
$$P(X < 3) = F(3) = 1 - e^{-.25(3)} = .5276.$$
d) Find the probability that no calls arrive in an 8 minute interval :
$$P(X > 8) = e^{-.25(8)} = e^{-2} = .1353.$$
Note that this is also the probability of deviating by at least 1 standard deviation above the mean.
e) Find the probability that the next call arrives sometime between the 2nd and 3rd minute after the last call :
$$P(2 < X < 3) = F(3) - F(2) = (1 - e^{-.25(3)}) - (1 - e^{-.25(2)}) = e^{-.25(2)} - e^{-.25(3)} = .60653 - .47236 = .1342.$$
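The same numbers can be reproduced with scipy's exponential distribution (a sketch only; scipy.stats.expon is parametrized by scale = beta = 1/lambda):

from scipy.stats import expon

beta = 4.0                    # mean minutes between calls (lambda = .25)
X = expon(scale=beta)
print(X.sf(4))                # a) P(X > 4) = e^{-1} ~ 0.368
print(X.mean(), X.std())      # b) both are 4 minutes
print(X.cdf(3))               # c) P(X < 3) ~ 0.528
print(X.sf(8))                # d) P(X > 8) = e^{-2} ~ 0.135
print(X.cdf(3) - X.cdf(2))    # e) P(2 < X < 3) ~ 0.134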
EXAMPLE 8 : Consider 4 identical components connected in series, each of which has an exponential lifetime $X_i$ with parameter $\lambda = .01$, independent of the lifetimes of the other components.
----1----2----3----4---- The system fails as soon as any of the components fails. Let
X = lifetime of the system = $\min(X_i;\ i = 1,2,3,4)$ = the minimum of the lifetimes of the components.
a) Express the event $\{X > t\}$ that the system is functioning at time t in terms of the events $\{X_i > t\}$, $i = 1,2,3,4$ :
$$\{X > t\} = \{X_1 > t\}\cap\{X_2 > t\}\cap\{X_3 > t\}\cap\{X_4 > t\}.$$
b) What is the probability that the system functions at time t ?
By independence and the identical nature of the components this is just
$$P(X > t) = P(X_1 > t)\,P(X_2 > t)\,P(X_3 > t)\,P(X_4 > t) = P(X_1 > t)^4 = \left(e^{-\lambda t}\right)^4 = e^{-4\lambda t} = e^{-.04 t}$$
which is exactly the tail probability of an exponential random variable with parameter $4\lambda$. Thus the system life is exponential with parameter $4\lambda$.
(Note : There is nothing special about the number of components being 4 here. Moreover the above discussion illustrates how to find the distribution of the minimum of a collection of independent random variables from the distribution of each of them.)
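A short simulation sketch (assuming numpy) of the minimum of 4 independent exponential lifetimes, checking that the system lifetime behaves like an exponential with rate 4·lambda = .04:

import numpy as np

rng = np.random.default_rng(1)
lam = 0.01
lifetimes = rng.exponential(scale=1 / lam, size=(100_000, 4))   # 4 components per simulated system
system = lifetimes.min(axis=1)                                  # series system fails at the first failure
print(system.mean(), 1 / (4 * lam))                             # ~25 = 1/(4*lambda)
t = 50.0
print((system > t).mean(), np.exp(-4 * lam * t))                # tail probability ~ e^{-2} ~ 0.135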
It can be shown that the sum of independent gamma random variables, each with the same $\beta$ parameter but arbitrary values $\alpha_i$ of the $\alpha$ parameter, will also have a gamma distribution with the same $\beta$ and with parameter $\alpha = \sum_i \alpha_i$ = the sum of the $\alpha_i$'s. Thus the
sum of n independent exponential random variables with the same parameter $\lambda = 1/\beta$ (which are gamma with $\alpha = 1$) is not exponential, but rather a special case of the gamma distribution known as the Erlang distribution (gamma with parameters $\alpha = n$ and $\beta = 1/\lambda$).
The chi-squared distribution with parameter $\nu$ corresponds to the special case of the gamma distribution with $\alpha = \nu/2$ and $\beta = 2$, and is the distribution of the $\chi^2$ random variable
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} \quad \text{when} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})^2$$
is the sample variance of a sample of size n drawn from a normal population having variance $\sigma^2$. Here the parameter $\nu = n-1$ is called the number of degrees of freedom. The chi-squared distribution is thus essentially the distribution of a sum of squares of normally distributed random variables and can easily be used to give the distribution of the sample variance of a sample drawn from a normally distributed population. We only mention here its relation to the gamma distribution but will postpone further discussion until chapter 6.
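The stated gamma/chi-squared relationship can be checked directly (a sketch assuming scipy.stats; the degrees of freedom below are illustrative): a chi-squared with nu degrees of freedom should have the same density as a gamma with alpha = nu/2 and beta = 2.

import numpy as np
from scipy.stats import chi2, gamma

nu = 5
x = np.linspace(0.1, 20, 200)
print(np.allclose(chi2.pdf(x, df=nu), gamma.pdf(x, a=nu/2, scale=2)))   # True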