Probabilities associated with the Poisson Distribution

Module H1 Practical 7
1. In this exercise, you will use the same data set as in the previous practical, i.e. data
corresponding to a subset of information collected from the Tanga region in Tanzania.
The variable of interest here is HHsize, i.e. the number of members living in the household.
Objectives:

To investigate whether the data on household size follow a Poisson distribution,
and if so, to determine the “best” estimate for the Poisson parameter λ.
Using the value of λ above, to determine the probability that a randomly selected
household will have more than 5 members. This latter question is of interest in a
family planning campaign aimed at investigating factors which lead rural families to
have more children.

To address the above objectives, you will need to assume that the family size is a random
variable resulting in independent outcomes from household to household.
You are encouraged to attempt this practical exercise in pairs or in groups of three persons.
After completion of the exercise, some groups will be randomly chosen to present the
answers to one question to the rest of the class.
(a) To address the objectives above, open the Excel workbook named H1_data.xls and
move to the sheet named Poisson. In this worksheet, the variable HHsize is already
summarized into a frequency table. The following are the descriptions of the variables:




X = HHsize – the random variable representing household size, i.e. the number of
persons living in the household.
Obs_freq – the observed number of households out of 3223 sampled that are of
the given HHsize.
Poi_prob – assuming X has a Poisson distribution with mean λ = 3, the
probability of getting a household of size X = HHsize.
Poi_freq – the expected frequency of households of size HHsize calculated using
the Poisson probability Poi_prob.
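The Poi_prob and Poi_freq columns can also be reproduced outside Excel. The following is a minimal Python sketch, assuming λ = 3 (the worksheet's starting value) and the stated sample of 3223 households. The function names and the range of household sizes (0 to 10) are illustrative assumptions, not taken from the workbook.

```python
import math

N_HOUSEHOLDS = 3223  # sample size stated in the practical


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam (Poi_prob)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_frequency(k, lam, n=N_HOUSEHOLDS):
    """Expected number of households of size k out of n (Poi_freq)."""
    return n * poisson_pmf(k, lam)


lam = 3.0  # starting value of the mean in cell D22
for k in range(0, 11):  # the range 0..10 is an assumption for illustration
    print(f"{k:2d}  {poisson_pmf(k, lam):.4f}  {expected_frequency(k, lam):8.1f}")
```

Each expected frequency is simply the sample size multiplied by the corresponding Poisson probability.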
SADC Course in Statistics
The Poisson mean (or parameter λ) is given in cell D22 with red borders, currently set at
λ = 3. The Poisson distribution fits the data well if the bar chart of observed
frequencies is very close to that of the expected frequencies, calculated by assuming a
Poisson distribution with an appropriate value for its parameter λ.
(a) Change the mean value λ in cell D22 until the yellow bars (expected values) appear
quite close to the brown bars (observed frequencies). Note that the value of λ can be any
positive value.
Note down below the value of lambda (λ) that appears most appropriate to describe
household size in terms of a Poisson distribution.
Mean of Poisson = λ =
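The trial-and-error adjustment of λ can be mimicked numerically. The sketch below is one possible approach, not the method used in the workbook: it scans a grid of candidate values and keeps the one with the smallest sum of squared gaps between observed and expected frequencies. It also prints the sample mean, which is the maximum likelihood estimate of λ for Poisson data. The frequencies used here are SYNTHETIC (generated from a Poisson distribution with mean 4), not the Tanga data, and all function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def sum_squared_gap(observed, lam):
    """Sum over sizes k of (observed - expected)^2, with expected = n * P(X = k)."""
    n = sum(observed.values())
    return sum((obs - n * poisson_pmf(k, lam)) ** 2 for k, obs in observed.items())


def best_lambda_on_grid(observed, candidates):
    """Candidate value of lambda with the smallest sum of squared gaps."""
    return min(candidates, key=lambda lam: sum_squared_gap(observed, lam))


def sample_mean(observed):
    """Mean household size computed from a frequency table {size: count}."""
    n = sum(observed.values())
    return sum(k * obs for k, obs in observed.items()) / n


# SYNTHETIC frequencies generated from a Poisson with mean 4; NOT the Tanga data.
synthetic = {k: round(3223 * poisson_pmf(k, 4.0)) for k in range(0, 16)}
grid = [i / 10 for i in range(10, 81)]  # candidate values 1.0, 1.1, ..., 8.0
print(best_lambda_on_grid(synthetic, grid), sample_mean(synthetic))
```

To use this with the practical's data, replace the synthetic dictionary with the HHsize and Obs_freq columns from the Poisson sheet.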
(b) Using the best fit value for λ as found above, compute the probability that a household
in Tanga region will have exactly 5 household members. Do this first by “hand” using the
formula for the probability distribution of a Poisson random variable, i.e.

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad \text{for } k = 0, 1, 2, 3, \ldots$$

You will need to use this formula by substituting the appropriate values for k and λ. Note
down your answer below.
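As an illustration of the hand calculation, assume λ = 3 (the worksheet's starting value, which is not necessarily the best-fit value you found). Substituting k = 5 then gives approximately:

$$P(X = 5) = \frac{3^{5} e^{-3}}{5!} = \frac{243 \times 0.049787}{120} \approx 0.1008$$

Repeat the substitution with your own value of λ.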
(c) Now take note of the Excel formula for the calculation of Poisson probabilities. Go to
Cell C2. Use the down arrow to go down the column and take note of the changes in the
formula. Work out, in discussion within your group, what the elements of the formula
represent (ignore the element FALSE for now). Use this formula to check your
answer to part (b) above.
(d) Now find the answer to the second objective stated at the start of this practical. You will need to find
P(X > 5) = 1 − P(X = 0) − P(X = 1) − P(X = 2) − P(X = 3) − P(X = 4) − P(X = 5)
Use the Excel function POISSON to find the answer and note it down below.
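The complement calculation in part (d) can be cross-checked with a short script. This is a minimal Python sketch, assuming λ = 3 as a placeholder (substitute your own best-fit value). The function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def prob_more_than(threshold, lam):
    """P(X > threshold) = 1 - [P(X = 0) + ... + P(X = threshold)]."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(threshold + 1))


lam = 3.0  # placeholder value; replace with your best-fit lambda
print(round(prob_more_than(5, lam), 4))
```

Subtracting the six individual probabilities from 1 is easier than summing the infinitely many terms for X = 6, 7, 8 and so on.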
2. Note that the mean of the Poisson random variable HHsize is denoted by ‘a’ in the
worksheet Poisson. Here ‘a’ is the name given to cell D22 (click on cell D22 and you will
see the name ‘a’ shown below the tool bar on the top left above the cells of the
worksheet).
The probability values are given by:
POISSON(A4,a,FALSE) = P(X=A4).
However, if you replace FALSE by TRUE in the formula, what you obtain are the
cumulative probabilities, e.g. POISSON(A4,a,TRUE) = P(X≤A4). Use this information to
determine the answer to the second objective (as in 1(d) above) and verify you get the same
answer as you got in part 1(d).
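The effect of the FALSE/TRUE argument can be sketched in Python. The function below is a hypothetical stand-in that mimics Excel's POISSON(x, mean, cumulative) for non-negative integer x. The value a = 3 is an assumed placeholder for the mean in cell D22.

```python
import math


def excel_poisson(x, mean, cumulative):
    """Mimic Excel's POISSON(x, mean, cumulative) for non-negative integer x."""
    def pmf(k):
        return mean ** k * math.exp(-mean) / math.factorial(k)

    if cumulative:
        # TRUE: cumulative probability P(X <= x)
        return sum(pmf(k) for k in range(x + 1))
    # FALSE: point probability P(X = x)
    return pmf(x)


a = 3.0  # placeholder for the value in cell D22
via_cumulative = 1.0 - excel_poisson(5, a, True)
via_terms = 1.0 - sum(excel_poisson(k, a, False) for k in range(6))
print(via_cumulative, via_terms)
```

Both routes should agree up to floating-point rounding, which is the check asked for in question 2.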
3. If you have time, create a new worksheet (using the menu sequence Insert,
Worksheet…) and see if you can create a bar-chart of the type shown in sheet Poisson.