Notes 11

Statistics 510: Notes 10
Reading: Section 4.7.
Schedule:
Thursday night (10/13): I will e-mail Homework 5 which
will be due Friday, October 21st.
Monday, 10/17: Fall break, no class.
Wednesday, 10/19: Chapters 4.8-4.9
Monday, 10/24: Chapters 5.1-5.2
Wednesday, 10/26: Midterm. The midterm will cover
Chapters 1-4.
I will provide review materials and review problems for the
midterm by Wednesday 10/19. I will hold extended office
hours the week before the midterm.
I. Poisson Random Variables
A random variable X taking on one of the values 0, 1, 2, ...
is said to be a Poisson random variable with parameter $\lambda$ if
for some $\lambda > 0$,
$$p(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^i}{i!}, \quad i = 0, 1, 2, \ldots$$
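As a quick numerical illustration of this definition, the sketch below evaluates p(i) directly and adds the probabilities over a long truncated range. The function name `poisson_pmf` and the value `lam = 2.5` are illustrative choices, not part of the notes.

```python
import math

def poisson_pmf(i: int, lam: float) -> float:
    """P{X = i} for a Poisson random variable with parameter lam > 0."""
    return math.exp(-lam) * lam**i / math.factorial(i)

lam = 2.5  # illustrative parameter value (an assumption, not from the notes)

# Truncate the infinite sum at 100 terms; the tail beyond that is negligible here.
total = sum(poisson_pmf(i, lam) for i in range(100))
print(f"sum of p(i) for i = 0..99: {total:.12f}")
```

The truncated sum should be indistinguishable from 1 at floating-point precision for moderate values of the parameter.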
This is a pmf because
$$\sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1.$$
Poisson random variables provide an approximation for a
binomial random variable when n is large and p is small
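enough so that $np = \lambda$ is of moderate size.
Let X be a binomial random variable with parameters (n,p)
and let $\lambda = np$. Then
$$\begin{aligned}
P(X = i) &= \frac{n!}{(n-i)!\,i!}\, p^i (1-p)^{n-i} \\
&= \frac{n!}{(n-i)!\,i!} \left(\frac{\lambda}{n}\right)^{i} \left(1 - \frac{\lambda}{n}\right)^{n-i} \\
&= \frac{n(n-1)\cdots(n-i+1)}{n^i} \, \frac{\lambda^i}{i!} \, \frac{(1-\lambda/n)^n}{(1-\lambda/n)^i}
\end{aligned}$$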
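For n large, $\lambda$ moderate and i considerably smaller than n,
$$\left(1 - \frac{\lambda}{n}\right)^{n} \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^i} \approx 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{i} \approx 1.$$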
Hence for n large and $\lambda$ moderate,
$$P(X = i) \approx e^{-\lambda} \frac{\lambda^i}{i!}.$$
(Note that if i is not considerably smaller than n
and p is small, then $P(X = i) \approx 0$ for a binomial random
variable and $P(X = i) \approx 0$ for a Poisson random variable
since i would be much greater than $\lambda = np$ if i is not
considerably smaller than n).
The following tables give an idea of the accuracy of the
Poisson approximation to the binomial. For n=100,
p=1/100, the Poisson approximation is remarkably good.
Binomial probabilities and Poisson probabilities for n=5
and p=1/5 ($\lambda = 1$)

x     Binomial                          Poisson
      $\binom{5}{x}(.2)^x(.8)^{5-x}$    $e^{-1}(1)^x / x!$
0     0.328                             0.368
1     0.410                             0.368
2     0.205                             0.184
3     0.051                             0.061
4     0.006                             0.015
5     0.000                             0.003
6     0                                 0.001
Binomial probabilities and Poisson probabilities for n=100
and p=1/100 ($\lambda = 1$)

x     Binomial                                  Poisson
      $\binom{100}{x}(.01)^x(.99)^{100-x}$      $e^{-1}(1)^x / x!$
0     0.366032                                  0.367879
1     0.369730                                  0.367879
2     0.184865                                  0.183940
3     0.060999                                  0.061313
4     0.014942                                  0.015328
5     0.002898                                  0.003066
6     0.000463                                  0.000511
7     0.000063                                  0.000073
8     0.000007                                  0.000009
9     0.000001                                  0.000001
10    0.000000                                  0.000000
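The two tables above can be recomputed with a few lines of code. The following is a minimal sketch using only the Python standard library; the helper names `binom_pmf` and `poisson_pmf` are illustrative and not part of the notes.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial(n, p) random variable (0 when x > n)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# (n, p, largest x to print, decimal places), matching the two tables.
for n, p, x_max, digits in [(5, 0.2, 6, 3), (100, 0.01, 10, 6)]:
    lam = n * p  # both tables have lam = 1
    print(f"n = {n}, p = {p}, lambda = {lam}")
    for x in range(x_max + 1):
        b = binom_pmf(x, n, p)
        q = poisson_pmf(x, lam)
        print(f"{x:3d}  {b:.{digits}f}  {q:.{digits}f}")
```

Both tables share the same Poisson column because the parameter is 1 in each case; only the binomial column changes as n grows and p shrinks.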
Applications of Poisson random variables: The Poisson
family of random variables provides a good model for the
number of successes in an experiment consisting of a large
number of independent trials with a small probability of
success for each trial (since the number of successes is a
binomial random variable with n large and p small).
Examples of random phenomena that are accurately
modeled as Poisson random variables include:
• The number of misprints on a page (or a group of
pages) of a book
• The number of people in a community living to 100
years of age
• The number of wrong telephone numbers that are
dialed in a day
Example 1: A chromosome mutation believed to be linked
with colorblindness is known to occur, on the average, once
in every 10,000 births. If 20,000 babies are born this year
in a certain city, what is the probability that at least one will
develop color blindness?
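One way to work this example is sketched below. It assumes that each birth independently carries the mutation with probability 1/10,000 and that "at least one" refers to at least one baby carrying the mutation. The number of such babies is then binomial with n = 20,000 and p = 1/10,000, which is approximately Poisson with parameter np = 2. The variable names are illustrative.

```python
import math

n = 20_000        # births this year
p = 1 / 10_000    # assumed probability of the mutation per birth
lam = n * p       # Poisson parameter, np = 2

# P(at least one) = 1 - P(none)
exact_binomial = 1 - (1 - p) ** n
poisson_approx = 1 - math.exp(-lam)

print(f"exact binomial : {exact_binomial:.6f}")
print(f"Poisson approx : {poisson_approx:.6f}")
```

Under these assumptions the Poisson answer is 1 - e^{-2}, roughly 0.865, and the exact binomial value agrees with it to about four decimal places.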
Expected Value and Variance of Poisson Random
Variables
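Let X be a Poisson($\lambda$) random variable.
$$\begin{aligned}
E(X) &= \sum_{i=0}^{\infty} \frac{i\, e^{-\lambda} \lambda^i}{i!} \\
&= \lambda \sum_{i=1}^{\infty} \frac{e^{-\lambda} \lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \quad \text{(by letting } j = i - 1\text{)} \\
&= \lambda e^{-\lambda} e^{\lambda} \\
&= \lambda
\end{aligned}$$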
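$$\begin{aligned}
E(X^2) &= \sum_{i=0}^{\infty} \frac{i^2 e^{-\lambda} \lambda^i}{i!} \\
&= \lambda \sum_{i=1}^{\infty} \frac{i\, e^{-\lambda} \lambda^{i-1}}{(i-1)!} \\
&= \lambda \sum_{j=0}^{\infty} \frac{(j+1)\, e^{-\lambda} \lambda^{j}}{j!} \quad \text{(by letting } j = i - 1\text{)} \\
&= \lambda \left[ \sum_{j=0}^{\infty} \frac{j\, e^{-\lambda} \lambda^{j}}{j!} + \sum_{j=0}^{\infty} \frac{e^{-\lambda} \lambda^{j}}{j!} \right] \\
&= \lambda(\lambda + 1)
\end{aligned}$$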
where the final equality follows since the first sum is the
expected value of a Poisson random variable with
parameter $\lambda$ and the second is the sum of the probabilities
of this random variable.
Therefore,
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda.$$
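The mean and variance can also be checked numerically by summing the series directly. The sketch below builds the probabilities with the recurrence p(i) = p(i-1) * lam / i, which avoids large factorials. The function name `poisson_moments`, the truncation at 200 terms, and the parameter value 3.0 are illustrative assumptions.

```python
import math

def poisson_moments(lam: float, terms: int = 200) -> tuple[float, float]:
    """Return (mean, variance) of Poisson(lam) from a truncated series."""
    probs = [math.exp(-lam)]              # p(0) = e^{-lam}
    for i in range(1, terms):
        probs.append(probs[-1] * lam / i)  # p(i) = p(i-1) * lam / i
    mean = sum(i * q for i, q in enumerate(probs))
    second_moment = sum(i * i * q for i, q in enumerate(probs))
    return mean, second_moment - mean**2

mean, var = poisson_moments(3.0)
print(f"mean = {mean:.10f}, variance = {var:.10f}")
```

Both numbers should come out essentially equal to the parameter, in line with the derivation.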
Poisson random variables for number of events occurring in
a time period
Another use of the Poisson probability distribution, besides
approximating the binomial for large n, small p, is to model
the number of “events” occurring in a certain period of
time, e.g.,
• the number of earthquakes occurring during some
fixed time span
• the number of wars per year
• the number of electrons emitted from a radioactive
source during a given period of time
• the number of freak accidents, such as falls in the
shower, for a large population during a given period of
time (used by insurance companies)
• the number of vehicles that pass a marker on a roadway
during a given period of time.
Let X denote the number of events occurring in a certain
period of time. Suppose for a positive constant $\lambda$, the
following assumptions hold true:
1. The probability that exactly 1 event occurs in a given
interval of length h is equal to $\lambda h + o(h)$, where $o(h)$ stands
for any function $f(h)$ that is such that $\lim_{h \to 0} f(h)/h = 0$ [for
instance, $f(h) = h^2$ is $o(h)$ whereas $f(h) = h$ is not.]
2. The probability that 2 or more events occur in an
interval of length h is equal to $o(h)$.
3. For any integers $n, j_1, j_2, \ldots, j_n$ and any set of n
nonoverlapping intervals, if we define $E_i$ to be the event
that exactly $j_i$ of the events under consideration occur in
the ith of these intervals, then events $E_1, \ldots, E_n$ are
independent.
Under Assumptions 1-3, the number of events occurring in
any interval of length t is a Poisson random variable with
parameter $\lambda t$.
Proof: Let $N(t)$ denote the number of events occurring in
the interval $[0, t]$. To obtain an expression for
$P\{N(t) = k\}$, we start by breaking the interval $[0, t]$ into n
non-overlapping subintervals of length $t/n$. Now,
$$\begin{aligned}
P\{N(t) = k\} = {} & P\{k \text{ of the subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
& + P\{N(t) = k \text{ and at least one subinterval contains 2 or more events}\} \qquad (1.1)
\end{aligned}$$
Let A and B denote the two mutually exclusive events on
the right hand side of the above equation. We have
$$\begin{aligned}
P(B) &\le P(\text{at least one subinterval contains 2 or more events}) \\
&= P\Big( \bigcup_{i=1}^{n} \{ i\text{th subinterval contains 2 or more events} \} \Big) \\
&\le \sum_{i=1}^{n} P(i\text{th subinterval contains 2 or more events}) \\
&= \sum_{i=1}^{n} o(t/n) \\
&= n\, o(t/n) \\
&= t \left[ \frac{o(t/n)}{t/n} \right]
\end{aligned}$$
Now for any t, $t/n \to 0$ as $n \to \infty$ and so
$o(t/n)/(t/n) \to 0$ as $n \to \infty$ by the definition of $o$. Hence,
$$P(B) \to 0 \text{ as } n \to \infty. \qquad (1.2)$$
On the other hand, since assumptions 1 and 2 imply that
$$P(0 \text{ events occur in an interval of length } h) = 1 - [\lambda h + o(h) + o(h)] = 1 - \lambda h - o(h),$$
we see from Assumption 3 that
$$\begin{aligned}
P(A) &= P\{k \text{ of the subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
&= \binom{n}{k} \left[ \frac{\lambda t}{n} + o\!\left(\frac{t}{n}\right) \right]^{k} \left[ 1 - \frac{\lambda t}{n} - o\!\left(\frac{t}{n}\right) \right]^{n-k}
\end{aligned}$$
However, since
$$n \left[ \frac{\lambda t}{n} + o\!\left(\frac{t}{n}\right) \right] = \lambda t + t \left[ \frac{o(t/n)}{t/n} \right] \to \lambda t \text{ as } n \to \infty,$$
it follows by the same argument that verified the Poisson
approximation to the binomial that
$$P(A) \to e^{-\lambda t} \frac{(\lambda t)^k}{k!} \text{ as } n \to \infty. \qquad (1.3)$$
Thus, by letting $n \to \infty$ and using (1.1), (1.2) and (1.3), we
obtain
$$P\{N(t) = k\} = \frac{e^{-\lambda t} (\lambda t)^k}{k!}, \quad k = 0, 1, \ldots$$
Hence, if assumptions 1-3 are satisfied, the number of
events occurring in any fixed interval of length t is a
Poisson random variable with mean $\lambda t$. The value $\lambda$ is the
rate per unit time at which events occur.
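The limiting step in the proof can be illustrated numerically. The sketch below ignores the o(t/n) terms and event B, and simply treats each of the n subintervals as an independent trial with success probability (rate * t) / n. It then compares the resulting binomial probability of k events with the Poisson probability as n grows. The values of `rate`, `t` and `k`, and the helper names, are illustrative assumptions.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success prob p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, mu: float) -> float:
    """P(N = k) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

rate, t, k = 2.0, 1.5, 3          # illustrative values (assumptions)
target = poisson_pmf(k, rate * t)  # limit claimed by the proof

errors = []
for n in (10, 100, 1_000, 10_000):
    approx = binom_pmf(k, n, rate * t / n)  # n subintervals of length t/n
    errors.append(abs(approx - target))
    print(f"n = {n:6d}  binomial = {approx:.6f}  |error| = {errors[-1]:.2e}")
print(f"Poisson limit = {target:.6f}")
```

The gap between the two probabilities should shrink roughly in proportion to 1/n as the interval is cut into finer pieces.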
Example 2: In the 432 years from 1500 to 1931, war broke
out somewhere in the world a total of 299 times (By
definition, a military action was a war if it either was
legally declared, involved over 50,000 troops or resulted in
significant boundary realignments. To achieve greater
uniformity from war to war, major confrontations were
split into smaller “subwars”: World War I, for example,
was treated as five separate wars).
The following table gives the observed number of years in
which x wars broke out and the expected frequencies for a
Poisson ($\lambda = 299/432 \approx 0.69$) random variable.
Number of wars beginning    Observed     Expected
in a given year             Frequency    Frequency
0                           223          217
1                           142          149
2                           48           52
3                           15           12
4+                          4            2
Total                       432          432
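The expected frequencies are the Poisson probabilities multiplied by the 432 years. A minimal sketch of that calculation follows. It uses the rounded parameter 0.69 from the notes and puts all remaining probability in the "4+" row; the variable names are illustrative.

```python
import math

lam = 0.69    # parameter used in the notes (299 wars / 432 years, rounded)
years = 432

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

labels = ["0", "1", "2", "3", "4+"]
observed = [223, 142, 48, 15, 4]   # observed frequencies from the table

expected = [years * poisson_pmf(k, lam) for k in range(4)]
expected.append(years - sum(expected))  # "4+" row takes the remaining mass

for label, obs, exp in zip(labels, observed, expected):
    print(f"{label:>2}  observed = {obs:3d}  expected = {exp:6.1f}")
```

The computed values should agree with the Expected Frequency column to within rounding; the table's entries appear to have been rounded so that they total 432.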
Example 3: Suppose that earthquakes occur in the western
portion of the United States in accordance with
assumptions 1, 2 and 3 with $\lambda = 2$ and with 1 week as the
unit of time (that is, earthquakes occur in accordance with
the three assumptions at the rate of 2 per week).
(a) Find the probability that at least 3 earthquakes occur
during the next 2 weeks.
(b) Find the probability distribution of the time, starting
from now, until the next earthquake.
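One way to work this example, sketched using only the result proved above: N(t), the number of earthquakes in t weeks, is Poisson with mean 2t. The symbol T for the waiting time until the next earthquake is introduced here for illustration.

```latex
% (a) Over 2 weeks the mean is \lambda t = 2 \cdot 2 = 4.
\begin{aligned}
P\{N(2) \ge 3\} &= 1 - P\{N(2) = 0\} - P\{N(2) = 1\} - P\{N(2) = 2\} \\
&= 1 - e^{-4}\left(1 + 4 + \frac{4^2}{2}\right) \\
&= 1 - 13e^{-4} \approx 0.762
\end{aligned}

% (b) The wait T exceeds t exactly when no earthquake occurs in [0, t].
\begin{aligned}
P\{T > t\} &= P\{N(t) = 0\} = e^{-2t}, \\
F(t) = P\{T \le t\} &= 1 - e^{-2t}, \quad t \ge 0
\end{aligned}
```

Under this reading, the time until the next earthquake has an exponential distribution with rate 2 per week.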