SADC Course in Statistics
Module H8 Practical 12

Interpreting the anova model involving one factor

Objectives:

By the end of this practical you should be able to:

- understand and interpret the output from fitting a linear model involving one categorical factor
- make comparisons between group means by examining results concerning the parameter estimates
- conduct a residual analysis to check anova assumptions

1. In this practical you will again use data from the site Tete in Mozambique, corresponding to total annual rainfall (total), number of rainy days (numdays) and maximum dry spell (maxspell). Recall that the data are available in the sheet tete in the Excel workbook H8_data.xls, and also in the Stata file tete.dta.

The main categorical factor of interest is named nino, which identifies each year as being an “elnino” year, a “lanina” year or a “normal” year.

Recall the second of the analysis objectives: to analyse the data to answer the following two questions:

- Are Elnino years drier than normal years?
- Are Lanina years wetter than normal years?

Here you will address the second of these questions using the number of rainy days.

(a) First determine the mean and standard error for variable numdays at each of the nino levels. Note down the results in the table below.

    Level of nino    Mean    Standard error    Sample size
    Elnino                                     n1 =
    Lanina                                     n2 =
    Normal                                     n3 =

(b) Consider fitting the model

    $y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$,  $i = 1, 2, 3$

to data corresponding to the number of rainy days, where $\beta_0$ is a constant and the $g_i$ represent an effect due to variable nino (i=1 for Elnino, i=2 for Lanina, i=3 for normal). The parameter estimates from fitting the above model appear below, first using the Stata software, then using SPSS software.

Estimates using Stata software

    Variable name    Parameter estimate    Std. error    t-value    t-probability
    g1 (Elnino)        0
    g2 (Lanina)       10.6                 3.156           3.36     0.002
    g3 (Normal)        3.169               2.557           1.24     0.221
    β0 (constant)     22.8                 2.232          10.22     0.000

Estimates using SPSS software

    Variable name    Parameter estimate    Std. error    t-value    t-probability
    g1 (Elnino)       -3.169               2.557          -1.239    0.221
    g2 (Lanina)        7.431               2.557           2.906    0.005
    g3 (Normal)        0
    β0 (constant)     25.969               1.248          20.813    0.000

Try to produce the results above using the software package available to you. Do your results correspond to the results in one of the tables above? Which one? Note down the steps you undertook to produce the results.

(c) Consider now the second stated question, i.e. are Lanina years wetter than normal years? Formulate a null hypothesis, using variable numdays, which will allow you to answer this question. First carry out a t-test, by “hand”, to test your null hypothesis. The steps you need are outlined below.

(i) State the null hypothesis: H0:

(ii) State the alternative hypothesis: H1:

(iii) Compute by “hand” the difference in means and the standard error of this difference (s.e.d.) using the formula:

    $\text{s.e.d.} = \sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}$

Here $s^2$ refers to the Residual mean square from your anova, and $n_1$, $n_2$ correspond to the number of observations for Lanina and normal years respectively (these are the sample sizes labelled n2 and n3 in the table in part (a)).

Now compute the t-statistic given by

    t = difference in means / std. error of difference =

Compare your results above with the SPSS output in part (b). Are your results the same as those from the SPSS software? If so, what conclusions do you draw concerning the question about whether Lanina years are wetter than normal years?

(d) Now consider how you might produce the same results as above using output from Stata software. First write down below the comparisons that are possible with the output from Stata (see the Stata results in part (b)). You will find that neither of these corresponds to a comparison of Lanina years with normal years.
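
The relationship between the two tables in part (b) can be checked with a short Python sketch. It assumes only the parameter estimates printed in those tables; the helper name group_means and the dictionary layout are illustrative choices, not part of either package's output. In both parameterisations the fitted mean of a group is the constant plus that group's effect, so the two sets of estimates should describe the same three means even though the level set to zero differs.

```python
# Parameter estimates copied from the two tables in part (b).
# The Stata output sets g1 (Elnino) to 0; the SPSS output sets g3 (Normal) to 0.
stata = {"const": 22.8, "Elnino": 0.0, "Lanina": 10.6, "Normal": 3.169}
spss = {"const": 25.969, "Elnino": -3.169, "Lanina": 7.431, "Normal": 0.0}

LEVELS = ("Elnino", "Lanina", "Normal")


def group_means(est):
    """Fitted mean for each nino level: constant plus that level's effect."""
    return {level: round(est["const"] + est[level], 3) for level in LEVELS}


stata_means = group_means(stata)
spss_means = group_means(spss)

# Print the fitted means side by side for comparison.
for level in LEVELS:
    print(level, stata_means[level], spss_means[level])
```

Each non-zero effect in a table is therefore a difference between that group's mean and the mean of the level set to zero, which is why the comparisons read directly from the two outputs are not the same.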

(e) Stata can be used to produce the variance-covariance matrix of the model estimates. This matrix is shown below (lower triangle only).

    var-covar:      β0         g2        g3
    β0             4.982
    g2            -4.982     9.963
    g3            -4.982     4.982     6.538

Compute the standard error of the difference between Lanina and normal years by either using the formula given in part (c) above, or (preferably) by using the expression

    $\mathrm{Var}(g_2 - g_3) = \mathrm{Var}(g_2) + \mathrm{Var}(g_3) - 2\,\mathrm{Cov}(g_2, g_3) =$

In this particular example, where you have only one categorical factor, either approach will give the same answer. However, note that in more complex models the general expression above, based on the model estimates, will still hold, whereas the formula used in part (c) will be incorrect.

Select the appropriate elements from the variance-covariance matrix to compute the expression above, and take the square root of the result to get the standard error.

Now compute the difference in means, using the parameter estimates given in the Stata results in part (b), i.e.

    $g_2 - g_3 =$

Verify that this is the same as the corresponding difference in means computed from the table of results you completed in part (a). What does this tell you?

Next compute the t-statistic for comparing Lanina and normal years. Compare your results with the SPSS output in part (b). Are they the same as those in the row corresponding to g2 (Lanina)?

What conclusions do you draw from the analyses above about the ease with which two means may be compared?

(f) Finally, consider using the model equation for predictions. For example, for a normal year, what would you predict as being the mean number of rainy days? Can you also find the standard error of your prediction?

(g) Also conduct a residual analysis. Do you have any concerns about the validity of your model assumptions?
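
The variance calculations in parts (e) and (f) can be sketched in Python as below. This is a minimal sketch and not output from Stata or SPSS. It assumes the Stata estimates from part (b) and the variance-covariance matrix from part (e), with the upper triangle filled in by symmetry. The function name contrast and the variables V and b are illustrative. Any linear combination of the estimates with coefficient vector c has variance c'Vc, which reduces to the expression for Var(g2 - g3) in part (e) when c = (0, 1, -1).

```python
import math

# Variance-covariance matrix of the Stata estimates from part (e),
# in the order (constant, g2 Lanina, g3 Normal). Only the lower triangle
# is printed in the practical; the upper triangle is filled by symmetry.
V = [
    [4.982, -4.982, -4.982],
    [-4.982, 9.963, 4.982],
    [-4.982, 4.982, 6.538],
]

# Stata parameter estimates from part (b), in the same order.
b = [22.8, 10.6, 3.169]


def contrast(c, b, V):
    """Return (estimate, standard error) of the linear combination c'b."""
    k = len(c)
    est = sum(ci * bi for ci, bi in zip(c, b))
    var = sum(c[i] * V[i][j] * c[j] for i in range(k) for j in range(k))
    return est, math.sqrt(var)


# Lanina minus normal: g2 - g3
diff, se_diff = contrast([0, 1, -1], b, V)
t_stat = diff / se_diff

# Predicted mean number of rainy days in a normal year: constant + g3
pred, se_pred = contrast([1, 0, 1], b, V)

print("difference:", round(diff, 3), "s.e.:", round(se_diff, 3), "t:", round(t_stat, 3))
print("prediction:", round(pred, 3), "s.e.:", round(se_pred, 3))
```

The printed values can be set against the g2 (Lanina) and constant rows of the SPSS table in part (b). Small differences in the last decimal place are to be expected because the inputs are rounded to three decimals.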