Module H1 Practical 7

Probabilities associated with the Poisson Distribution

1. In this exercise, you will use the same data set as in the previous practical, i.e. data corresponding to a subset of information collected from the Tanga region in Tanzania. The variable of interest here is HHsize, i.e. the number of members living in the household.

Objectives:

To investigate whether the data on household size follow a Poisson distribution, and if so, to determine the “best” estimate for the Poisson parameter λ.

Using the value of λ above, to determine the probability that a randomly selected household will have more than 5 members. This latter question is of interest in a family planning campaign aimed at investigating factors which lead rural families to have more children.

To address the above objectives, you will need to assume that the family size is a random variable resulting in independent outcomes from household to household.

You are encouraged to attempt this practical exercise in pairs or in groups of three persons. After completion of the exercise, some groups will be randomly chosen to present the answers to one question to the rest of the class.

(a) To address the objectives above, open the Excel workbook named H1_data.xls and move to the sheet named Poisson. In this worksheet, the variable HHsize is already summarized into a frequency table. The following are the descriptions of the variables:

X = HHsize – the random variable representing household size, i.e. the number of persons living in the household.
Obs_freq – the observed number of households, out of 3223 sampled, that are of the given HHsize.
Poi_prob – assuming X has a Poisson distribution with parameter λ = 3, the probability of getting a household of size X = HHsize.
Poi_freq – the expected frequency of households of size HHsize, calculated using the Poisson probability Poi_prob.
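The Poi_prob and Poi_freq columns described above can also be reproduced outside Excel. The following is a minimal Python sketch, assuming λ = 3 and a sample of 3223 households as stated above. The function names and the range of household sizes (0 to 10) are illustrative choices, not taken from the workbook.

```python
import math

N_HOUSEHOLDS = 3223  # number of sampled households quoted in the practical


def poisson_prob(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_freq(k, lam, n=N_HOUSEHOLDS):
    """Expected number of households of size k among n households."""
    return n * poisson_prob(k, lam)


lam = 3.0  # assumed value of the Poisson mean; change it to try other fits

print("HHsize   Poi_prob   Poi_freq")
for k in range(0, 11):  # illustrative range of household sizes
    print(f"{k:>6} {poisson_prob(k, lam):10.4f} {expected_freq(k, lam):10.1f}")
```

Changing lam in this sketch plays the same role as changing the Poisson mean in the worksheet: the expected frequencies move, and they can then be compared with the observed frequencies.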
SADC Course in Statistics Module H1 Practical 7 – Page 1

The Poisson mean (or parameter λ) is given in Cell D22 with red borders, currently set at λ = 3. The Poisson distribution fits the data well if the bar chart of observed frequencies is very close to the expected frequencies obtained by assuming a Poisson distribution with an appropriate value for its parameter λ.

(a) Change the mean value in cell D22 until the yellow bars (expected values) appear quite close to the brown bars (observed frequencies). Note that λ can take any positive value. Note down below the value of lambda (λ) that appears most appropriate to describe household size in terms of a Poisson distribution.

Mean of Poisson = λ =

(b) Using the best-fit value for λ found above, compute the probability that a household in Tanga region will have exactly 5 household members. Do this first by “hand” using the formula for the probability distribution of a Poisson random variable, i.e.

$P(X = k) = \dfrac{\lambda^{k} e^{-\lambda}}{k!}$, for $k = 0, 1, 2, 3, \ldots$

Substitute the appropriate values for k and λ into this formula. Note down your answer below.

(c) Now take note of the Excel formula for the calculation of Poisson probabilities. Go to Cell C2. Use the down arrow to go down the column and take note of the changes in the formula. Work out, in discussion within your group, what the elements of the formula represent, but ignore the element FALSE for now. Use this formula to check your answer to part (b) above.

(d) Now find the answer to the second objective stated on page 1. You will need to find

P(X > 5) = 1 – P(X=0) – P(X=1) – P(X=2) – P(X=3) – P(X=4) – P(X=5)

Use the Excel function POISSON to find the answer and note it down below.

2. Note that the mean of the Poisson random variable HHsize is denoted by ‘a’ in the worksheet Poisson.
Here ‘a’ is the name given to cell D22 (click on cell D22 and you will see the name ‘a’ shown below the toolbar at the top left, above the cells of the worksheet). The probability values are given by POISSON(A4,a,FALSE) = P(X=A4). However, if you replace FALSE by TRUE in the formula, you obtain the cumulative probabilities, e.g. POISSON(A4,a,TRUE) = P(X≤A4). Use this information to determine the answer to the second objective (as in 1(d) above) and verify that you get the same answer as in part 1(d).

3. If you have time, create a new worksheet (using the menu sequence Insert, Worksheet…) and see if you can create a bar chart of the type shown in sheet Poisson.
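For groups who wish to cross-check their Excel answers, the calculations in questions 1(b), 1(d) and 2 can be sketched in Python as below. This is only an illustration under stated assumptions: it uses λ = 3 (the starting value in the worksheet, not necessarily the best fit), and the function names poisson_pmf and poisson_cdf are hypothetical helpers written for this sketch, mirroring POISSON(k,a,FALSE) and POISSON(k,a,TRUE) respectively.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k): counterpart of POISSON(k, a, FALSE) in the worksheet."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k): counterpart of POISSON(k, a, TRUE) in the worksheet."""
    return sum(poisson_pmf(j, lam) for j in range(k + 1))


lam = 3.0  # assumed value; replace with the best-fit value from part 1(a)

# Question 1(b): probability of exactly 5 household members.
p_exactly_5 = poisson_pmf(5, lam)

# Question 1(d): subtract the individual probabilities P(X=0), ..., P(X=5) from 1.
p_more_than_5_direct = 1.0
for k in range(0, 6):
    p_more_than_5_direct -= poisson_pmf(k, lam)

# Question 2: use the cumulative probability, P(X > 5) = 1 - P(X <= 5).
p_more_than_5_cumulative = 1.0 - poisson_cdf(5, lam)

print(f"P(X = 5)                 = {p_exactly_5:.4f}")
print(f"P(X > 5), term by term   = {p_more_than_5_direct:.4f}")
print(f"P(X > 5), via cumulative = {p_more_than_5_cumulative:.4f}")
```

The two values of P(X > 5) should agree up to rounding, which is the same check requested in question 2.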