
Topic 4: Discrete Random Variables and Probability Distributions
CEE 11 Spring 2002
Dr. Amelia Regan
These notes draw liberally from the class text, Probability and Statistics for
Engineering and the Sciences by Jay L. Devore, Duxbury 1995 (4th edition)
Definition


For a given sample space S of some experiment, a random variable is any rule that associates a number with each outcome in S.
Random variables may take on finite or infinite sets of values.
Examples: {0, 1}; the number of sequential tosses of a coin until the outcome "heads" is observed.


A set is discrete either if it consists of a finite number of elements
or if its elements may be listed in sequence so that there is a first
element, a second element, a third element, and so on, in the list
A random variable is said to be discrete if its set of possible
values is a discrete set.
Definition

The probability distribution or probability mass function (pmf) of a discrete random variable X is defined for every number x by

p(x) = P(X = x) = P(\text{all } s \in S : X(s) = x)

The cumulative distribution function (cdf) F(x) of a discrete rv X with pmf p(x) is defined for every number x by

F(x) = P(X \le x) = \sum_{y : y \le x} p(y)
Examples
0.3
0.12


p(x)= 0.25
0.33


0
x=0
x=1
x=2
x=3
otherwise
0.5 x = 0

p(x) = 0.5 x = 1
0 otherwise

1/n x = 1,2,...n
p(x) = 
0 otherwise
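These pmfs are easy to check numerically. Below is a minimal sketch in plain Python (the dictionary pmf and the helper cdf are illustrative names, not from the notes) that encodes the first pmf and evaluates its cdf directly from the definition F(x) = P(X \le x):

```python
# Sketch: the first example pmf above, encoded as a dictionary.
# p(x) = 0 for any x not listed.
pmf = {0: 0.30, 1: 0.12, 2: 0.25, 3: 0.33}

def cdf(x, pmf):
    """F(x) = P(X <= x) = sum of p(y) over all y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)

assert abs(sum(pmf.values()) - 1.0) < 1e-12  # a valid pmf sums to 1
print(cdf(1, pmf))  # ~0.42
print(cdf(3, pmf))  # ~1.0
```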
Definition

Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted E(X) or \mu_X, is given by:

E(X) = \mu_X = \sum_{x \in D} x \, p(x)

If the rv X has a set of possible values D and pmf p(x), then the expected value of any function h(X), denoted by E[h(X)] or \mu_{h(X)}, is computed by:

E[h(X)] = \sum_{x \in D} h(x) \, p(x)

Note: according to Ross (1988), p. 255, this is known as the law of the unconscious statistician.
Example

Let X be a discrete rv with the following pmf and corresponding cdf:

p(x) = \begin{cases}
  0.10 & x = 0 \\
  0.25 & x = 1 \\
  0.35 & x = 2 \\
  0.20 & x = 3 \\
  0.10 & x = 4 \\
  0 & \text{otherwise}
\end{cases}

F(x) = \begin{cases}
  0 & x < 0 \\
  0.10 & 0 \le x < 1 \\
  0.35 & 1 \le x < 2 \\
  0.70 & 2 \le x < 3 \\
  0.90 & 3 \le x < 4 \\
  1.00 & x \ge 4
\end{cases}

E(X) = \sum_{x \in D} x \, p(x) = 0(0.10) + 1(0.25) + 2(0.35) + 3(0.20) + 4(0.10) = 1.95

Now let h(X) = X^2 + 2:

E[h(X)] = \sum_{x \in D} h(x) \, p(x) = 2(0.10) + 3(0.25) + 6(0.35) + 11(0.20) + 18(0.10) = 7.05
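A quick numerical check of both expectations (a sketch; the function name expectation is illustrative):

```python
# Sketch: verifying E(X) = 1.95 and E[h(X)] = 7.05 for the pmf above.
pmf = {0: 0.10, 1: 0.25, 2: 0.35, 3: 0.20, 4: 0.10}

def expectation(h, pmf):
    """Law of the unconscious statistician: E[h(X)] = sum of h(x) p(x)."""
    return sum(h(x) * p for x, p in pmf.items())

print(expectation(lambda x: x, pmf))         # ~1.95
print(expectation(lambda x: x**2 + 2, pmf))  # ~7.05
```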
Class exercise


Let X be a discrete rv with the following pmf:

p(x) = \begin{cases}
  0.15 & x = 0 \\
  0.30 & x = 1 \\
  0.55 & x = 2 \\
  0 & \text{otherwise}
\end{cases}

Calculate the cdf of X:

F(x) = ?

Calculate E(X):

E(X) = \sum_{x \in D} x \, p(x) = ?

Now let h(X) = 3X^3 - 100. Calculate E[h(X)]:

E[h(X)] = \sum_{x \in D} h(x) \, p(x) = ?
Definition

For linear functions of X we have simple rules:

E(aX + b) = aE(X) + b
E(aX) = aE(X)
E(X + b) = E(X) + b
Variance of a random
variable

If X is a random variable with mean \mu, then the variance of X, denoted Var(X), is defined by

Var(X) = E[(X - \mu)^2]

Recall that we previously defined the variance of a
population as the average of the squared deviations
from the mean. The expected value is nothing other
than the average or mean so this form corresponds
exactly to the one we used earlier.
Variance of a random
variable

It's often convenient to use a different form of the variance which, applying the rules of expected value we just learned and remembering that E[X] = \mu, we derive in the following way:

Var(X) = E[(X - \mu)^2]
       = E[X^2 - 2\mu X + \mu^2]
       = E[X^2] - E[2\mu X] + E[\mu^2]
       = E[X^2] - 2\mu E[X] + \mu^2
       = E[X^2] - 2\mu^2 + \mu^2
       = E[X^2] - \mu^2
       = E[X^2] - (E[X])^2
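As a sanity check, the definition and the shortcut agree on the example pmf used earlier; a minimal sketch (names illustrative):

```python
# Sketch: E[(X - mu)^2] and E[X^2] - E[X]^2 give the same variance.
pmf = {0: 0.10, 1: 0.25, 2: 0.35, 3: 0.20, 4: 0.10}

E = lambda h: sum(h(x) * p for x, p in pmf.items())
mu = E(lambda x: x)                       # 1.95

var_def = E(lambda x: (x - mu)**2)        # from the definition
var_shortcut = E(lambda x: x**2) - mu**2  # from the shortcut form
assert abs(var_def - var_shortcut) < 1e-9
print(var_def)  # ~1.2475
```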
The Bernoulli distribution

Any random variable whose only two possible values are 0 and 1 is called a Bernoulli random variable.
Example: Suppose a set of buildings is examined in a western city for compliance with new, stricter earthquake engineering specifications. After 25% of the city's buildings are examined at random, 12% are found to be out of code while 88% are found to conform to the new specifications, so it is supposed that buildings in the region have a 12% likelihood of being out of code.
Let X = 1 if the next randomly selected building is within code and X = 0 otherwise.
The status of a randomly selected building is then a Bernoulli random variable with parameter p = 0.88 (and 1 - p = 0.12).
The Bernoulli distribution

We write this as follows:

X = \begin{cases}
  1 & \text{if the building is "to code"} \\
  0 & \text{otherwise}
\end{cases}

p(0) = 0.12
p(1) = 0.88
p(x) = P(X = x) = 0 for x \ne 0, 1

p(x) = \begin{cases}
  0.12 & x = 0 \\
  0.88 & x = 1 \\
  0 & \text{otherwise}
\end{cases}
The Bernoulli distribution
In general form, \alpha is the parameter of the Bernoulli distribution, but we usually refer to this parameter as p, the probability of success:

p(x; \alpha) = \begin{cases}
  1 - \alpha & x = 0 \\
  \alpha & x = 1 \\
  0 & \text{otherwise}
\end{cases}

The mean of the Bernoulli distribution with parameter p is
E(X) = \sum x \, p(x) = (1)p + (0)(1 - p) = p
The variance of the Bernoulli distribution with parameter p is
Var(X) = E(X^2) - E(X)^2 = (1^2)p + (0^2)(1 - p) - p^2 = p - p^2 = p(1 - p)
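A short simulation sketch (standard library only; the sample size is arbitrary) comparing these formulas to the building-code example with p = 0.88:

```python
# Sketch: E(X) = p and Var(X) = p(1 - p) for a Bernoulli rv, checked
# against a simple simulation.
import random

p = 0.88
print(p, p * (1 - p))  # exact mean and variance: 0.88, 0.1056

draws = [1 if random.random() < p else 0 for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)       # should land close to 0.88 and 0.1056
```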
The Binomial distribution


Now consider a random variable that is made up of successive (independent) Bernoulli trials, and define the random variable X as the number of successes among n trials.
The Binomial distribution has the following probability mass function:

p(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n

Remembering what we learned about combinations, this makes intuitive sense. The binomial coefficient represents the number of ways to distribute the x successes among n trials. p^x represents the probability that we have x successes in the n trials, while (1 - p)^{n - x} represents the probability that we have n - x failures in the n trials.
The Binomial distribution

Computing the mean and the variance of the binomial distribution is straightforward. First remember that the binomial random variable is the sum of the number of successes in n consecutive Bernoulli trials. Therefore

X = X_1 + X_2 + \ldots + X_n
E(X) = E(X_1) + E(X_2) + \ldots + E(X_n)
E(X) = p + p + \ldots + p = np
Var(X) = (1)^2 Var(X_1) + (1)^2 Var(X_2) + \ldots + (1)^2 Var(X_n)
Var(X) = p(1 - p) + p(1 - p) + \ldots + p(1 - p)
Var(X) = np(1 - p)
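The pmf and the np, np(1 - p) formulas can be cross-checked directly; a sketch using math.comb (Python 3.8+), with n = 5 and p = 0.3 chosen arbitrarily:

```python
# Sketch: binomial pmf from the formula, then mean and variance
# computed by brute force and compared with np and np(1 - p).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.3
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum(x**2 * binom_pmf(x, n, p) for x in range(n + 1)) - mean**2
print(mean, n * p)           # both ~1.5
print(var, n * p * (1 - p))  # both ~1.05
```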
The Binomial distribution

[Figures: bar charts of the binomial pmf b(x; 5, p) for p = 0.1, 0.3, 0.5, 0.7, and 0.9, each plotting p(X = x) against x = 0, 1, ..., 5.]
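The original bar charts can be regenerated in a few lines; a sketch using matplotlib (not part of the original notes):

```python
# Sketch: reproduce bar charts of b(x; 5, p) for several values of p.
from math import comb
import matplotlib.pyplot as plt

n = 5
ps = (0.1, 0.3, 0.5, 0.7, 0.9)
fig, axes = plt.subplots(1, len(ps), figsize=(15, 3))
for ax, p in zip(axes, ps):
    xs = list(range(n + 1))
    ax.bar(xs, [comb(n, x) * p**x * (1 - p)**(n - x) for x in xs])
    ax.set_title(f"b(x; {n}, {p})")
    ax.set_xlabel("x")
axes[0].set_ylabel("p(X = x)")
plt.tight_layout()
plt.show()
```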
Class exercise




A software engineer has historically delivered completed code to clients on schedule 40% of the time. If her performance record continues, the probability of the number of on-schedule completions in the next 6 jobs can be described by the binomial distribution.
Calculate the probability that exactly four jobs will be completed on schedule.
Calculate the probability that at most 5 jobs will be completed on schedule.
Calculate the probability that at least two jobs will be completed on schedule.
The Hypergeometric
distribution





The binomial distribution is made up of independent trials in which the probability of success does not change.
Another distribution in which the random variable X represents the number of successes among n trials is the hypergeometric distribution.
The hypergeometric distribution assumes a fixed population in which the proportion or number of successes is known.
We think of the hypergeometric distribution as involving trials without replacement and the binomial distribution as involving trials with replacement.
The classic illustration of the difference between the hypergeometric and the binomial distribution is that of black and white balls in an urn. Assume the proportion of black balls is p. The distribution of the number of black balls selected in n trials is binomial b(x; n, p) if we put the balls back in the urn after selection and hypergeometric h(x; n, M, N) if we set them aside after selection. (Engineering examples to come.)
The Hypergeometric
distribution

The probability mass function of the hypergeometric distribution is given by the following, where M is the number of possible successes in the population, N is the total size of the population, and x is the number of successes in the n trials:

p(x; n, M, N) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}}
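A direct transcription of this formula (a sketch; hypergeom_pmf is an illustrative name, and math.comb requires Python 3.8+):

```python
# Sketch: hypergeometric pmf straight from the formula above.
from math import comb

def hypergeom_pmf(x, n, M, N):
    """P(x successes in n draws, without replacement, from a
    population of N items of which M are successes)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# e.g. 2 black balls in 3 draws from an urn with 2 black, 3 white:
print(hypergeom_pmf(2, 3, 2, 5))  # 0.3
```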
The Hypergeometric
distribution

Example: Suppose out of 100 bridges in a region, 30 have been recently retrofitted to be more secure during earthquakes. Ten bridges are selected randomly from the 100 for inspection. What is the probability that at least three of these bridges will be retrofitted?

P(X \ge 3) = 1 - \frac{\binom{30}{0}\binom{70}{10} + \binom{30}{1}\binom{70}{9} + \binom{30}{2}\binom{70}{8}}{\binom{100}{10}}
The Hypergeometric
distribution

The mean and variance of the hypergeometric distribution are given below:

E(X) = \frac{nM}{N}

Var(X) = \frac{N - n}{N - 1} \cdot n \cdot \frac{M}{N}\left(1 - \frac{M}{N}\right)

Sometimes we use p to refer to M/N (the proportion of the population with a particular characteristic). Then E(X) = np and

Var(X) = \frac{N - n}{N - 1} \, np(1 - p)
The Hypergeometric and the
Binomial distributions

Note the following:
Sometimes we refer to M/N, the proportion of the population with a particular characteristic, as p. Then E(X) = np and

Var(X) = \frac{N - n}{N - 1} \, np(1 - p)

This is very close to E[X] = np and Var(X) = np(1 - p), which are the mean and variance of the Binomial distribution. In fact we can think of (N - n)/(N - 1) as a correction term which accounts for the fact that in the hypergeometric distribution we sample without replacement.
Question: Should the variance of the Hypergeometric distribution with proportion p be greater than or less than the variance of the Binomial with parameter p? Can you give an intuitive reason why this is so?
The Hypergeometric and the
Binomial distributions
Example: Binomial, p = 0.40, three trials vs.
Hypergeometric with M/N = 2/5 = 0.40, three trials
3
P ( X  0)    (0.40) 0 (0.60) 3
 0
 3
P ( X  1)    (0.40)1 (0.60) 2
1 
3
P ( X  2)    (0.40) 2 (0.60)1
 2
 3
P ( X  3)    (0.40)3 (0.60) 0
 3
 2  3 
  
0 3
P( X  0)    
 5
 
 3
 2  3 
  
1 2
P( X  1)    
 5
 
 3
 2  3 
  
2 1
P( X  2)    
 5
 
 3
P( X  3)  0
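Tabulating both columns side by side (a sketch; math.comb returns 0 when x exceeds M, which handles the impossible P(X = 3) case automatically):

```python
# Sketch: binomial(n=3, p=0.4) vs. hypergeometric with M/N = 2/5,
# three draws, printed side by side.
from math import comb

n, p, M, N = 3, 0.4, 2, 5
for x in range(n + 1):
    b = comb(n, x) * p**x * (1 - p)**(n - x)
    h = comb(M, x) * comb(N - M, n - x) / comb(N, n)
    print(x, round(b, 4), round(h, 4))
# binomial:       0.216, 0.432, 0.288, 0.064
# hypergeometric: 0.1,   0.6,   0.3,   0.0
```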
The Hypergeometric and the
Binomial distributions



If n, the number of trials, is small and N, the number in the population, is large, then we can approximate the hypergeometric distribution with the binomial distribution in which p = M/N.
This used to be very important. As recently as five years ago, computers and hand calculators could not calculate large factorials. The calculator I use cannot calculate 70!. A few years ago 50! was out of the question.
Despite improvements in calculators, it's still important to know that if the ratio of n to N is less than 5% (we are sampling 5% of the population or less), then we can approximate the hypergeometric distribution with the binomial.

Check your calculators now -- what is the maximum factorial they can handle?
Class exercise




The system administrator in charge of a campus computer lab has identified 9 machines out of 80 with defective motherboards.
While he is on vacation, the lab is moved to a different room in order to make room for graduate student offices.
The administrator kept notes about which machines were bad, based on their location in the old lab. During the move the computers were placed on their new desks in a random fashion, so all of the machines must be checked again.
If the administrator checks three of the machines for defects, what is the probability that one of the three will be defective?

Calculate using both the hypergeometric distribution and the binomial approximation to the hypergeometric.
The Geometric
distribution
The geometric distribution refers to the random variable represented by the number of consecutive Bernoulli trials until a success is achieved.

Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a success occurs. If we let X equal the number of trials prior to the success, then

p(x; p) = (1 - p)^x p, \quad x = 0, 1, 2, \ldots

The above definition is the one used in our textbook. A more common definition is the following: let X equal the number of trials including the last trial, which by definition is a success. Then we get the following:

p(x; p) = (1 - p)^{x - 1} p, \quad x = 1, 2, \ldots
x = 1,2,...
The geometric
distribution
The expected value and variance of the geometric distribution are given for the first form by:

E(X) = \frac{1 - p}{p}, \quad Var(X) = \frac{1 - p}{p^2}

For the second form, the expected value has more intuitive appeal. Can you convince yourself that the value is correct?

E(X) = \frac{1}{p}, \quad Var(X) = \frac{1 - p}{p^2}

Please note that the variance is the same in both cases.
Explain why this is so.
The geometric
distribution
Remember that E(X) = \sum_{x \in D} x \, p(x).

Trial | Probability of first success on trial
1     | p
2     | p(1 - p)
3     | p(1 - p)^2
...   | ...

We can derive E(X) in the following way (writing q = 1 - p):

E(X) = p + 2pq + 3pq^2 + \ldots
qE(X) = pq + 2pq^2 + 3pq^3 + \ldots
E(X) - qE(X) = p + pq + pq^2 + \ldots = p/(1 - q) = 1
E(X)(1 - q) = 1
E(X)(p) = 1
E(X) = \frac{1}{p}
The geometric
distribution
Let p = 1 - q and remember that E(X) = \sum_{x \in D} x \, p(x).

E(X) = (0)p(X = 0) + (1)p(X = 1) + (2)p(X = 2) + \ldots
E(X) = p + 2pq + 3pq^2 + \ldots
qE(X) = pq + 2pq^2 + 3pq^3 + \ldots
E(X) - qE(X) = p(1 + q + q^2 + q^3 + \ldots)
E(X)(1 - q) = p\left(\frac{1}{1 - q}\right)
E(X)(p) = 1
E(X) = \frac{1}{p}

These steps are arranged so that we can work with the geometric series 1 + x + x^2 + x^3 + \ldots = 1/(1 - x), so p(1 + q + q^2 + \ldots) = p(1/(1 - q)). In the last two lines we just substitute p for 1 - q.
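The same conclusion can be reached numerically by summing enough terms of the series (a sketch; p = 0.25 is arbitrary):

```python
# Sketch: partial sums of E(X) = sum of x (1-p)^(x-1) p approach 1/p.
p = 0.25
partial = sum(x * (1 - p) ** (x - 1) * p for x in range(1, 10_000))
print(partial, 1 / p)  # both ~4.0
```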
The geometric
distribution
Try deriving the variance of the geometric distribution by finding E(X^2).
Poisson Distribution





One of the most useful distributions for many branches of engineering is the Poisson distribution.
The Poisson distribution is often used to model the number of occurrences during a given time interval or within a specified region.
The time interval involved can have a variety of lengths, e.g., a second, minute, hour, day, year, and multiples thereof.
Poisson processes may be temporal or spatial. The region in question can also be a line segment, an area, a volume, or some n-dimensional space.
Poisson processes or experiments have the following characteristics:
Poisson Distribution
1. The number of outcomes occurring in any given time interval or region is
independent of the number of outcomes occurring in any other disjoint time
interval or region.
2. The probability of a single outcome occurring in a very short time interval
or very small region is proportional to the length of the time interval or the
size of the region. This value is not affected by the number of outcomes
occurring outside this particular time interval or region.
3. The probability of more than one outcome occurring in a very short time
interval or very small region is negligible.

Taken together, the first two characteristics are known as the
“memoryless” property of Poisson processes.
Transportation engineers often assume that the number of vehicles passing
by a particular point on a road is approximately Poisson distributed.
Do you think that this model is more appropriate for a rural highway or a city street?
Poisson Distribution
The pmf of the Poisson distribution is the following:

P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots \text{ for some } \lambda > 0

The parameter \lambda is equal to \alpha t, where \alpha is the intensity of the process (the average number of events per time unit) and t is the number of time units in question. In a spatial Poisson process, \alpha represents the average number of events per unit of space and t represents the number of spatial units in question.


For example, the number of vehicles crossing a bridge in a rural area might be modeled as a Poisson process. If the average number of vehicles per hour during the hours of 10:00 AM to 3:00 PM is 20, we might be interested in the probability that fewer than three vehicles cross from 12:30 to 12:45 PM. In this case \lambda = (20 per hour)(0.25 hours) = 5.

P(X < 3) = \frac{e^{-5} 5^0}{0!} + \frac{e^{-5} 5^1}{1!} + \frac{e^{-5} 5^2}{2!} = e^{-5}\left(1 + 5 + \frac{5^2}{2}\right) \approx 0.125

The expected value and the variance of the Poisson distribution are identical and are equal to \lambda.
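The bridge-traffic calculation, done numerically (a sketch using only the standard library):

```python
# Sketch: P(X < 3) for a Poisson rv with lambda = (20/hour)(0.25 hours).
from math import exp, factorial

lam = 20 * 0.25  # = 5
p_less_than_3 = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))
print(p_less_than_3)  # ~0.1247
```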
Class exercise





An urban planner believes that the number of gas stations in an urban area is approximately Poisson distributed with parameter \alpha = 3 per square mile. Let's assume she is correct in her assumption.
Calculate the expected number of gas stations in a four-square-mile region of the urban area, as well as the variance of this number.
Calculate the probability that this region of four square miles has fewer than six gas stations.
Calculate the probability that, in four adjacent regions of one square mile each, at least two of the four regions contain more than three gas stations.
Do you think the situation is accurately modeled by a Poisson process? Why or why not?
Some random variables that
typically obey the Poisson
probability law (Ross, p.130)

The number of misprints on a page (or group of pages) of a book

The number of people in a community living to 100 years of age

The number of wrong telephone numbers that are dialed in a day

The number of packages of dog biscuits sold in a particular store each day




The number of customers entering a post office (bank, store) in a given time period
The number of vacancies occurring during a year in the Supreme Court
The number of particles discharged in a fixed period of time from some
radioactive material
WHAT ABOUT ENGINEERING?

Poisson processes are the heart of queueing theory -- which is one of the most important topics in transportation and logistics. There are lots of other applications too -- water, structures, geotech, etc.
The Poisson Distribution as an
approximation to the Binomial


When n is large and p is small, the Poisson distribution with parameter np is a very good approximation to the binomial (the number of successes in n independent trials when the probability of success is equal to p for each trial).
Example -- Suppose that the probability that an item produced by a certain machine will be defective is 0.10. Find the probability that a sample of 10 items will contain at most one defective item.
10 
10 
0
10
P( X  1)    (0.10) (0.90)    (0.10)1 (0.90)9  0.7361
0 
1 
np = 0.10*10=1.0
0  np
1  np
(np) e
(np) e
P( X  1) 

 0.7357
0!
1!
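Both numbers are easy to reproduce (a sketch; range(2) covers x = 0 and x = 1):

```python
# Sketch: exact binomial P(X <= 1) vs. its Poisson approximation.
from math import comb, exp, factorial

n, p = 10, 0.10
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(2))
lam = n * p
approx = sum(exp(-lam) * lam**x / factorial(x) for x in range(2))
print(exact, approx)  # ~0.7361 and ~0.7358
```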
References

Ross, S. (1988). A First Course in Probability. Macmillan.