Identifying Response and Explanatory Variables

Module H8 Practical 12
Interpreting the anova model involving one factor
Objectives:
By the end of this practical you should be able to:



• understand and interpret the output from fitting a linear model involving one categorical factor
• make comparisons between group means by examining results concerning the parameter estimates
• conduct a residual analysis to check anova assumptions
1.
In this practical you will again use data from the site Tete in Mozambique,
corresponding to total annual rainfall (total), number of rainy days (numdays) and
maximum dry spell (maxspell).
Recall that the data are available in the sheet tete in the Excel workbook H8_data.xls,
and also in the Stata file tete.dta. The main categorical factor of interest is named nino
which identifies each year as being an “elnino” year or a “lanina” year or a “normal” year.
Recall the second of the analysis objectives:
To analyse the data to answer the following two questions:


• Are Elnino years drier than normal years?
• Are Lanina years wetter than normal years?
Here you will address the second of these questions using the number of rainy days.
(a) First determine the mean and standard errors for variable numdays across each of the
nino levels. Note down the results in the table below.
Nino level    Mean    Standard error    Sample size
Elnino                                  n1 =
Lanina                                  n2 =
Normal                                  n3 =
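The summary asked for in part (a) can be sketched in Python as below. This is only an illustration: the counts are made-up placeholder values, not the Tete data, and the names `numdays_by_nino` and `group_summary` are hypothetical. Each standard error here is the group's own standard deviation divided by the square root of its sample size.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical rainy-day counts, NOT the Tete data. Replace them with the
# numdays values read from tete.dta or the tete sheet of H8_data.xls.
numdays_by_nino = {
    "elnino": [20, 24, 22, 26],
    "lanina": [30, 34, 32, 36, 33],
    "normal": [25, 27, 26, 28, 24, 26],
}

def group_summary(values):
    """Return (mean, standard error of the mean, sample size) for one group."""
    n = len(values)
    return mean(values), stdev(values) / sqrt(n), n

summary = {level: group_summary(v) for level, v in numdays_by_nino.items()}
for level, (m, se, n) in summary.items():
    print(f"{level:8s} mean={m:6.2f}  s.e.={se:5.2f}  n={n}")
```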
SADC Course in Statistics
(b) Consider fitting the model
$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}, \quad i = 1, 2, 3,$
to data corresponding to the number of rainy days, where $\beta_0$ is a constant and the $g_i$
represent an effect due to variable nino (i=1 for Elnino, i=2 for Lanina, i=3 for Normal).
The parameter estimates from fitting the above model appear below, first using the Stata
software, then using SPSS software.
Estimates using Stata software
Variable name    Parameter estimate    Std. error    t-value    t-probability
g1 (Elnino)       0
g2 (Lanina)      10.6                  3.156          3.36      0.002
g3 (Normal)       3.169                2.557          1.24      0.221
β0 (constant)    22.8                  2.232         10.22      0.000

Estimates using SPSS software
Variable name    Parameter estimate    Std. error    t-value    t-probability
g1 (Elnino)      -3.169                2.557         -1.239     0.221
g2 (Lanina)       7.431                2.557          2.906     0.005
g3 (Normal)       0
β0 (constant)    25.969                1.248         20.813     0.000
Try and produce the results above using the software package available to you. Do your
results correspond to results of one of the tables above? Which one? Note down the steps
you undertook to produce the results.
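The two sets of estimates differ only in which nino level is set to zero (Elnino in the Stata output, Normal in the SPSS output). The sketch below, using the Stata estimates from the table above, shows how estimates for one reference level can be re-expressed for another. The function name `rebase` and the dictionary keys are illustrative choices, not part of either package.

```python
# Stata-style estimates from the table above (Elnino is the reference level).
stata = {"const": 22.8, "elnino": 0.0, "lanina": 10.6, "normal": 3.169}

def rebase(estimates, new_reference):
    """Re-express one-factor estimates relative to a different reference level."""
    shift = estimates[new_reference]
    # Every group effect is measured from the new reference level ...
    rebased = {k: round(v - shift, 3) for k, v in estimates.items() if k != "const"}
    # ... and the constant becomes the mean of the new reference level.
    rebased["const"] = round(estimates["const"] + shift, 3)
    return rebased

spss_like = rebase(stata, "normal")
print(spss_like)
```

The fitted group means (constant plus group effect) are unchanged by the choice of reference level; only the interpretation of each parameter changes.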
(c) Consider now the second stated question, i.e. are Lanina years wetter than normal
years?
Formulate a null hypothesis, using variable numdays, which will allow you to answer
this question. First carry out a t-test, by “hand”, to test your null hypothesis. The
steps you need are outlined below.
(i) State the null hypothesis H0:
(ii) State the alternative hypothesis: H1:
(iii) Compute by “hand” the difference in means and the standard error for this
difference using the formula:
$\text{s.e.d.} = \sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}} =$
Here $s^2$ refers to the residual mean square from your anova and $n_1$, $n_2$ correspond to
the number of observations for Lanina and normal years respectively (these are labelled
n2 and n3 in the table of part (a)).
Now compute the t-statistic given by
t = difference in means / std. error of difference =
Compare your results above with those in the SPSS table in part (b). Are your
results the same as those from the SPSS software? If so, what conclusions do you
draw concerning the question about whether Lanina years are wetter than normal
years?
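The hand calculation in step (iii) can be checked with a short sketch such as the one below. The function name `sed_and_t` is hypothetical, and the inputs are made-up values chosen for easy arithmetic, not results from the Tete data; substitute your own group means, residual mean square and sample sizes.

```python
from math import sqrt

def sed_and_t(mean_a, mean_b, s2, n_a, n_b):
    """Return the standard error of the difference and the t-statistic.

    s2 is the residual mean square from the anova; n_a and n_b are the
    numbers of observations in the two groups being compared.
    """
    sed = sqrt(s2 / n_a + s2 / n_b)
    return sed, (mean_a - mean_b) / sed

# Hypothetical inputs for illustration only.
sed, t = sed_and_t(mean_a=30.0, mean_b=24.0, s2=8.0, n_a=4, n_b=4)
print(f"s.e.d. = {sed:.3f}, t = {t:.3f}")
```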
(d) Now consider how you might produce the same results as above using output from
Stata software. First write down below the comparisons that are possible with the
output from Stata (see the Stata table in part (b)). You will find that neither of these
corresponds to a comparison of Lanina years with normal years.
(e) Stata can be used to produce the variance-covariance matrix of the model estimates.
This matrix is produced below.
var-covar:
          β0        g2        g3
β0      4.982
g2     -4.982     9.963
g3     -4.982     4.982     6.538
Compute the standard error of the difference between Lanina and normal years by
either using the formula given in part (c) above, or (preferably) by using the expression
$\operatorname{Var}(g_2 - g_3) = \operatorname{Var}(g_2) + \operatorname{Var}(g_3) - 2\operatorname{Cov}(g_2, g_3) =$
In this particular example, where you have only one categorical factor, either approach
will give the same answer. However, note that in more complex models, the general
expression above using model estimates will still hold, whereas the formula used in part
(c) will be incorrect.
Select the appropriate elements from the variance-covariance matrix to compute the
expression above, and take the square root of the result to get the standard error.
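As a sketch of this calculation, the variance of any linear combination of the estimates can be obtained from the variance-covariance matrix as c'Vc, where c holds the coefficients of the combination. The example below uses the matrix printed above, with the upper triangle filled in by symmetry; the parameter order (constant, g2, g3) and the function name `contrast_variance` are choices made for this illustration.

```python
from math import sqrt

# Variance-covariance matrix from the Stata output above, in the parameter
# order (constant, g2, g3). The upper triangle is filled in by symmetry.
V = [
    [ 4.982, -4.982, -4.982],
    [-4.982,  9.963,  4.982],
    [-4.982,  4.982,  6.538],
]

def contrast_variance(V, c):
    """Variance of sum(c[i] * estimate[i]), computed as c' V c."""
    k = len(c)
    return sum(c[i] * V[i][j] * c[j] for i in range(k) for j in range(k))

# g2 - g3 corresponds to coefficients (0, 1, -1).
var_diff = contrast_variance(V, [0, 1, -1])
se_diff = sqrt(var_diff)
print(f"Var(g2 - g3) = {var_diff:.3f}, s.e. = {se_diff:.3f}")
```

Writing the comparison as a coefficient vector means the same function works for any other comparison of the estimates, which is why this approach carries over to more complex models.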
Now compute the difference in means, using parameter estimates given in the Stata
results in part (b), i.e.
g2 - g3 =
Verify that this is the same as the corresponding difference in means computed from
the table of results you completed in part (a). What does this tell you?
Next compute the t-statistic for comparing Lanina and normal years.
Compare your results with the SPSS output in part (b). Are they the same as those in the
row corresponding to g2 (Lanina)?
What conclusions do you draw from the analyses above about the ease with which two
means may be compared?
(f) Finally consider using the model equation for predictions.
For example, for a normal year, what would you predict as being the mean number of rainy
days?
Can you also find the standard error of your prediction?
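As a hint for the standard error, the general expressions can be written out as below. This is a sketch in which the constant is written as $\beta_0$ (an assumed symbol) and hats denote estimates; the variances and covariance are taken from the variance-covariance matrix of the estimates.

```latex
\hat{y}_i = \hat{\beta}_0 + \hat{g}_i,
\qquad
\operatorname{Var}(\hat{y}_i)
  = \operatorname{Var}(\hat{\beta}_0) + \operatorname{Var}(\hat{g}_i)
  + 2\operatorname{Cov}(\hat{\beta}_0, \hat{g}_i),
\qquad
\text{s.e.}(\hat{y}_i) = \sqrt{\operatorname{Var}(\hat{y}_i)}
```

For the reference level, where the group effect is fixed at zero, the last two terms vanish and the standard error of the prediction is simply the standard error of the constant.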
(g) Also conduct a residual analysis. Do you have any concerns about the validity of your
model assumptions?
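A minimal numerical sketch of a residual analysis for a one-factor model is given below. It assumes the fitted value for each year is its group mean, uses made-up placeholder counts rather than the Tete data, and flags standardised residuals larger than 2 in absolute value, a common rule of thumb rather than a formal test. In practice you would also plot the residuals against the fitted values and produce a normal probability plot.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical rainy-day counts, NOT the Tete data.
numdays_by_nino = {
    "elnino": [20, 24, 22, 26],
    "lanina": [30, 34, 32, 36, 33],
    "normal": [25, 27, 26, 28, 24, 26],
}

# Residual = observation minus its fitted value (the group mean).
residuals = {
    level: [y - mean(values) for y in values]
    for level, values in numdays_by_nino.items()
}

# Residual mean square: residual sum of squares over residual degrees of freedom.
n_total = sum(len(v) for v in numdays_by_nino.values())
ss_res = sum(r * r for rs in residuals.values() for r in rs)
s2 = ss_res / (n_total - len(numdays_by_nino))

# Rough checks: similar spread in each group, no large standardised residuals.
group_sd = {level: stdev(values) for level, values in numdays_by_nino.items()}
large = [
    (level, r)
    for level, rs in residuals.items()
    for r in rs
    if abs(r) / sqrt(s2) > 2
]
print("group standard deviations:", group_sd)
print("residual mean square:", round(s2, 3))
print("large standardised residuals:", large)
```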