Objectives, Data and Measurements

Module H2 Practical 16
Goodness-of-fit tests
Objectives:
By the end of this practical you should be able to:
• carry out a goodness-of-fit test to determine whether a measurement variable follows a Poisson, binomial or normal distribution
• take appropriate action to improve the chi-square approximation
• present and state conclusions emerging from a chi-square test.
1. The aim of this exercise is to demonstrate how goodness-of-fit tests can be carried
out using Excel facilities and to provide some revision of the Poisson distribution.
Open the Excel workbook named H2_data.xls and move to the sheet named
KiliWomenPoisson. In this worksheet, data on household size for women-headed
households from the Kilimanjaro region of Tanzania (see sheet KilijaroWomen) have
been summarized into a frequency table. The variables are as follows:

X = HHsize – the random variable representing household size, i.e. the number of persons living in the household.
obsfreq – the observed number of households, out of the 94 sampled, that are of the given HHsize.
PoiProb – assuming X has a Poisson distribution whose parameter λ is set to the sample mean, λ = 4.4, this column gives the probability that a randomly selected household has size X = HHsize.
expfreq – the expected frequency of households of size HHsize, calculated from the Poisson probability PoiProb (i.e. 94 × PoiProb).
Chisquare – the contributions to the chi-square statistic, i.e. the values (O − E)²/E, where O = obsfreq and E = expfreq.


The Poisson mean (or parameter ) is given in cell C16 with red borders, currently set at
=4.4. By eye, the appropriateness of the Poisson distribution can be judged according to
how close the bar chart of the distribution of observed frequencies is to the expected
frequencies based on assuming a Poisson distribution. More appropriately, the “best”  is
the one that minimises the chi-square test statistic shown in cell C18 with blue borders.
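The column calculations described above can be mirrored outside Excel as a check. The sketch below is a minimal Python version, assuming the household sizes and observed counts are typed in as two lists (the names `sizes` and `obsfreq` and the function name are hypothetical, not taken from the workbook). Following the description of PoiProb, it uses only the Poisson probability of each listed size; if the worksheet pools tail probabilities into the end categories, its numbers will differ slightly.

```python
import math


def poisson_columns(sizes, obsfreq, lam):
    """Reproduce the PoiProb, expfreq and Chisquare columns for a given lambda.

    sizes   : household sizes (values of X)
    obsfreq : observed number of households of each size
    lam     : Poisson parameter (the value in cell C16)
    """
    n = sum(obsfreq)  # total number of households sampled
    poiprob = [lam ** k * math.exp(-lam) / math.factorial(k) for k in sizes]
    expfreq = [n * p for p in poiprob]
    # Contribution (O - E)^2 / E of each size to the chi-square statistic
    contrib = [(o - e) ** 2 / e for o, e in zip(obsfreq, expfreq)]
    return poiprob, expfreq, contrib, sum(contrib)
```

Calling it with λ = 4.4 and the worksheet's categories and counts should give a value comparable to cell C18, provided the sheet uses the same categories.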
SADC Course in Statistics
Module H2 Practical 16 – Page 1
(a) First check that you understand how columns C, D and E have been created by clicking on
one of the cells in each column and looking at the formula given. Note that the probability
function of the Poisson distribution with parameter λ is given by
P( X  k ) 
 k e 
k!
,
for k  0,1, 2,3,
where k! = (k)(k-1)(k-2)……(3)(2)(1).
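When filling a column of Poisson probabilities, it can help to note that successive terms satisfy P(X = k) = (λ/k) P(X = k − 1), starting from P(X = 0) = e^(−λ); this follows directly from the formula above. The short Python sketch below builds the probabilities this way and checks them against the direct formula for λ = 4.4. It is only an illustration: the workbook may instead use Excel's POISSON function, and the function name here is hypothetical.

```python
import math


def poisson_probs(lam, k_max):
    """P(X = k) for k = 0..k_max, via P(0) = e^(-lam) and P(k) = P(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs


# Compare the recurrence with the direct formula lam^k e^(-lam) / k!
lam = 4.4
for k, p in enumerate(poisson_probs(lam, 12)):
    direct = lam ** k * math.exp(-lam) / math.factorial(k)
    print(k, round(p, 4), round(direct, 4))
```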
(Note: Module H1, the prerequisite for Module H2, gives further details of this
distribution.)
(b) Check also the formulae given in cells E14, C18 and B20. Note down these formulae
below and explain how they relate to the goodness-of-fit tests explained in the lecture
presentation.
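The p-value of a chi-square test is the upper-tail probability of the chi-square distribution beyond the observed statistic. In Excel this is what CHIDIST(x, df) (CHISQ.DIST.RT in Excel 2010 and later) returns, and a formula of that kind is what one would expect to find in cell B20. As an independent check, the sketch below computes the same tail probability in plain Python using the power series of the regularised incomplete gamma function; the function name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), like Excel's CHIDIST(x, df).

    Computes the regularised lower incomplete gamma P(a, z) with a = df/2,
    z = x/2 by its power series and returns 1 - P(a, z). Very small
    p-values (below about 1e-15) come out as 0 because of rounding.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # First series term: e^(-z) z^a / Gamma(a + 1), computed on the log scale
    term = math.exp(-z + a * math.log(z) - math.lgamma(a + 1))
    if term == 0.0:
        # Underflow: x is either far out in the upper tail or far below the mean
        return 0.0 if z > a else 1.0
    total, n = term, 1
    while term > 1e-16 * total:
        term *= z / (a + n)  # ratio of successive series terms
        total += term
        n += 1
    return max(0.0, 1.0 - total)


# Example: the statistic 8.36 quoted in part (e), on 10 degrees of freedom
print(round(chi2_upper_tail(8.36, 10), 4))
```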
(c) Explain why the p-value in cell B20, corresponding to the chi-square statistic in
cell C18, is based on 10 degrees of freedom.
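For reference, the usual rule for the degrees of freedom ν of a chi-square goodness-of-fit test is stated below; check it against the version given in the lecture presentation.

$$\nu = k - 1 - m$$

Here k is the number of categories (after any pooling) that contribute to the chi-square sum, and m is the number of parameters estimated from the data. When the Poisson mean λ is set to the sample mean, m = 1.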
(d) Change the mean value  and note how the graph and the chi-square value changes.
Do you get any improvement in the closeness of the yellow and brown bars in the graph?
(e) Now change the mean value  by small amounts(no more than 0.1 or less at a time) and
see if you can make the chi-square statistic any smaller than the value 2 = 8.36 resulting
when the sample mean = 4.4 was used. Comment below on the ease with which a
goodness-of-fit test for a Poisson distribution can be carried out using Excel.
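Part (e) is a manual search over λ. The same idea can be automated as a grid search: evaluate the chi-square statistic at λ values a fixed step apart and keep the smallest. The sketch below assumes the household sizes and observed counts are supplied as two lists, `sizes` and `obsfreq` (hypothetical names), and, like the PoiProb column, uses only the Poisson probabilities of the listed sizes.

```python
import math


def chisq_for_lambda(sizes, obsfreq, lam):
    """Chi-square statistic sum((O - E)^2 / E) for a Poisson(lam) fit."""
    n = sum(obsfreq)
    total = 0.0
    for k, o in zip(sizes, obsfreq):
        e = n * lam ** k * math.exp(-lam) / math.factorial(k)
        total += (o - e) ** 2 / e
    return total


def best_lambda(sizes, obsfreq, lo, hi, step=0.1):
    """Grid value of lambda in [lo, hi] giving the smallest chi-square statistic."""
    n_steps = int(round((hi - lo) / step))
    # Rounding keeps grid points such as 4.3 from drifting to 4.300000000000001
    grid = [round(lo + i * step, 6) for i in range(n_steps + 1)]
    return min(grid, key=lambda lam: chisq_for_lambda(sizes, obsfreq, lam))
```

For example, `best_lambda(sizes, obsfreq, 3.5, 5.5, 0.01)` searches in steps of 0.01 around the sample mean of 4.4. The minimum-chi-square λ need not coincide with the sample mean.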
2. The worksheet Tete-Jan-raindays in file H2-data.xls has information on the number
of rain days over 5-day and 10-day periods in January (days 1-30) for the years from 1953
to 2005. Also included is data on the total annual rainfall in mm (see last column).
(a) Complete the table below to show an estimate for the chance (probability) of rain per
day in each of the specified time periods in January.
Period in January    Estimated probability of rain per day
Days 1-5             ________
Days 6-10            ________
Days 11-15           ________
Days 16-20           ________
Days 21-25           ________
Days 26-30           ________
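One way to estimate the chance of rain per day in a period is to pool all the years: divide the total number of rain days in the period by the total number of days observed (days per period × number of years). The Python sketch below does this either from a list of yearly counts (as in the worksheet columns) or from a frequency table of the form shown in part (c). The function names are hypothetical, and years with missing values would need to be excluded first.

```python
def rain_prob_from_years(raindays_per_year, days_in_period=5):
    """Total rain days divided by total days observed across all years."""
    return sum(raindays_per_year) / (days_in_period * len(raindays_per_year))


def rain_prob_from_table(freq, days_in_period=5):
    """Same estimate from a frequency table: freq[k] = number of years with k rain days."""
    rain_days = sum(k * f for k, f in enumerate(freq))
    years = sum(freq)
    return rain_days / (days_in_period * years)


# Days 1-5 frequencies for 0, 1, ..., 5 rain days, from the table in part (c)
print(round(rain_prob_from_table([9, 7, 14, 15, 6, 1]), 3))  # 0.419
```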
(b) Set up an Excel spreadsheet with observed and expected frequencies to determine
whether the number of rain days in the first five days of January follows a binomial
distribution. Carry out a goodness-of-fit test for this purpose and comment on the results
obtained.
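A minimal Python sketch of the observed-versus-expected set-up for part (b) follows. It assumes the number of rain days in a 5-day period is binomial with n = 5 and a per-day probability p estimated from the data, and it uses the days 1-5 frequencies given in the table in part (c). In the spreadsheet, BINOMDIST(k, 5, p, FALSE) (BINOM.DIST in newer Excel versions) gives the same probabilities. The function name `binomial_gof` is hypothetical.

```python
import math


def binomial_gof(freq, n_trials=5):
    """Observed vs expected frequencies for a binomial(n_trials, p) fit.

    freq[k] is the number of years with k rain days, k = 0, ..., n_trials.
    p is estimated from the data as total rain days / (n_trials * years).
    """
    years = sum(freq)
    p = sum(k * f for k, f in enumerate(freq)) / (n_trials * years)
    expected = [
        years * math.comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]
    # (O - E)^2 / E for each number of rain days
    contrib = [(o - e) ** 2 / e for o, e in zip(freq, expected)]
    return p, expected, sum(contrib)


# Days 1-5: number of years with 0, 1, ..., 5 rain days (table in part (c))
observed = [9, 7, 14, 15, 6, 1]
p, expected, chisq = binomial_gof(observed)
print(f"estimated p = {p:.3f}")
for k, (o, e) in enumerate(zip(observed, expected)):
    print(k, o, round(e, 2))
print(f"chi-square = {chisq:.2f}")
```

With these data the expected frequencies for 0, 4 and 5 rain days all come out below 5 (the one for 5 rain days is below 1), so pooling categories, as mentioned in part (d), should be considered before the p-value is trusted. Because p is estimated, the degrees of freedom are the number of categories used minus 2.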
(c) To carry out the test above, you will have needed the number of years in which there were
0, 1, 2, 3, 4 or 5 rain days in the first 5 days of January. These frequencies are shown in the
table below. Complete the rest of the table to show the corresponding frequencies for the
remaining 5-day periods in January.
Number of    Days   Days   Days    Days    Days    Days
rain days    1-5    6-10   11-15   16-20   21-25   26-30
0            9
1            7
2            14
3            15
4            6
5            1
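To fill in the remaining columns, count how many years had 0, 1, ..., 5 rain days in each period. In Excel, COUNTIF (or FREQUENCY) on the relevant column does this; the Python sketch below shows the same tabulation, assuming the yearly counts for one period are available as a list. The function name and the example list are hypothetical, not the Tete data.

```python
from collections import Counter


def frequency_table(raindays_per_year, days_in_period=5):
    """Number of years with 0, 1, ..., days_in_period rain days."""
    counts = Counter(raindays_per_year)
    return [counts.get(k, 0) for k in range(days_in_period + 1)]


# Hypothetical yearly counts for one 5-day period
print(frequency_table([0, 2, 3, 3, 5, 1, 2]))  # [1, 1, 2, 2, 0, 1]
```

A quick check on each completed column is that its frequencies add up to the number of years with data.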
(d) Now explore, by making appropriate substitutions in the spreadsheet you set up,
whether any of the remaining 5-day totals follow a binomial distribution. Remember that
you may need to combine (pool) adjacent categories if some expected frequencies are too
small for the chi-square approximation to be valid.
Note down your conclusions.
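A common rule of thumb is that each expected frequency should be at least 5; check the lecture notes for the exact rule used on this course. The Python sketch below pools adjacent categories from left to right until each group's expected frequency reaches a chosen threshold, folding any small leftover at the upper end into the last group. It is one simple pooling scheme among several, and the function name and example numbers are hypothetical.

```python
def pool_small(observed, expected, min_expected=5.0):
    """Merge adjacent categories so every pooled expected frequency >= min_expected."""
    pooled_o, pooled_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:  # group is large enough: close it
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0:  # leftover tail: merge into the last group if there is one
        if pooled_e:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e


# Hypothetical observed and expected frequencies for 0..4 rain days
obs, exp_ = pool_small([2, 8, 10, 8, 2], [2.0, 8.0, 10.0, 8.0, 2.0])
chisq = sum((o - e) ** 2 / e for o, e in zip(obs, exp_))
df = len(obs) - 1 - 1  # one more lost because p was estimated from the data
```

After pooling, the chi-square statistic and its degrees of freedom are recomputed from the pooled groups, not the original categories.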
3. IF YOU HAVE TIME, TRY ALSO THE FOLLOWING:
In the same data set you used in Question 2, the annual rainfall total is also given (in the
column named TotRain). Investigate whether these data follow a normal distribution.
You may begin with a normal probability plot and then proceed to a goodness-of-fit
test. Remember that you will have to group your data and obtain the observed frequency in
each group in order to perform the chi-square goodness-of-fit test.
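For the grouped test, the expected frequency of each class is n times the normal probability of that class, using the sample mean and standard deviation of TotRain. The Python sketch below relies on `statistics.NormalDist` (Python 3.8+) and makes the two outer classes open-ended so that the expected frequencies add up to n. The class boundaries are yours to choose, the use of the sample (n − 1) standard deviation is an assumption, and the function name is hypothetical. Since two parameters are estimated, the degrees of freedom are the number of classes minus 3.

```python
from statistics import NormalDist, mean, stdev


def normal_gof(values, edges):
    """Chi-square statistic for a normal fit with mean and SD estimated from the data.

    edges are interior class boundaries b1 < b2 < ...; the classes are
    (-inf, b1], (b1, b2], ..., (b_last, inf), so expected counts add up to n.
    """
    n = len(values)
    dist = NormalDist(mean(values), stdev(values))
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    observed = [sum(lo < v <= hi for v in values) for lo, hi in zip(bounds, bounds[1:])]
    cdf = [0.0] + [dist.cdf(b) for b in edges] + [1.0]
    expected = [n * (c2 - c1) for c1, c2 in zip(cdf, cdf[1:])]
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 2  # two parameters (mean, SD) estimated
    return observed, expected, chisq, df
```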