Curvature and Nonconstant Variance

Testing for Curvature ~ Black Cherry Tree Data

We begin by examining a scatterplot matrix for these data. The marginal response plot of Vol vs. D shows evidence of nonlinearity, and the Ht vs. D panel suggests that Ht and D are linearly related.

It is also worthwhile to examine a 3-D spin plot of Vol vs. (Ht, D). Try fitting a plane and examining the residuals graphically, and also try removing the linear trend from the plot and spinning it. Is there evidence of lack of fit for the planar model in the 3-D plot? (Yes; check it out.)

We will now examine the results of fitting the mean function

$E(\mathrm{Vol} \mid \mathrm{Ht}, \mathrm{D}) = \beta_0 + \beta_1 \mathrm{Ht} + \beta_2 \mathrm{D}$

The regression summary is given below:

Data set = Trees, Name of Fit = L1
Normal Regression
Kernel mean function = Identity
Response = Vol
Terms = (D Ht)

Coefficient Estimates
Label       Estimate    Std. Error   t-value
Constant    -57.9877    8.63823      -6.713
D             4.70816   0.264265     17.816
Ht            0.339251  0.130151      2.607

R Squared:            0.94795
Sigma hat:            3.88183
Number of cases:      31
Degrees of freedom:   28

Summary Analysis of Variance Table
Source        df   SS        MS        F        p-value
Regression     2   7684.16   3842.08   254.97   0.0000
Residual      28   421.921   15.0686
Lack of fit   26   421.716   16.2199   158.24   0.0063
Pure Error     2   0.205     0.1025

Note that the p-value for the lack-of-fit (LOF) test, p = 0.0063, suggests that this model may be inappropriate. The LOF F statistic is the ratio of the lack-of-fit and pure-error mean squares, F = 16.2199/0.1025 = 158.24 on 26 and 2 df.

We will now examine a plot of the residuals vs. the fitted values. There is clear evidence of lack of fit: the linear mean function above is not supported by this residual plot.

Testing for Curvature ~ Tukey's Test for Nonadditivity

Following the procedure outlined in class, we first fit the linear model above and save the fitted values by selecting the Add to dataset … option from the pull-down menu for the model. Then use the Transform… option to square the fitted values. Now perform the regression of Vol on Ht, D, and L1.Fit-Values^2.
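The augmented-regression procedure just described can be sketched in plain Python. This is a minimal illustration under our own assumptions, not Arc's code: the helper names (`invert`, `ols`, `tukey_nonadditivity`) are hypothetical, the demo uses synthetic data rather than the cherry tree measurements, and the fit uses the normal equations, which is adequate for a handful of terms but not the most numerically stable choice. The test statistic is the t-value of the squared-fitted-values term in the augmented model, with n minus (number of terms + 2) residual df.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Least squares of y on the columns of X (X must include the intercept column).

    Returns (coefficients, fitted values, sigma^2 hat, (X'X)^{-1}).
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, rss / (n - p), xtx_inv


def tukey_nonadditivity(terms, y):
    """Tukey's one-df test: t-value of (fitted values)^2 added to the linear model.

    terms: list of predictor tuples, e.g. (D, Ht); y: list of responses.
    Returns (t, residual df of the augmented model).
    """
    X = [[1.0] + list(row) for row in terms]
    _, fitted, _, _ = ols(X, y)                           # step 1: fit the linear model
    X_aug = [row + [f * f] for row, f in zip(X, fitted)]  # step 2: add squared fitted values
    beta, _, s2, xtx_inv = ols(X_aug, y)                  # step 3: refit with the extra term
    se = math.sqrt(s2 * xtx_inv[-1][-1])
    return beta[-1] / se, len(y) - len(X_aug[0])


if __name__ == "__main__":
    # Synthetic data (NOT the cherry tree data): y is quadratic in x1 + x2,
    # so a planar mean function should show strong nonadditivity.
    random.seed(1)
    terms = [(random.uniform(0, 5), random.uniform(0, 5)) for _ in range(50)]
    y = [(1 + a + b) ** 2 + random.gauss(0, 0.1) for a, b in terms]
    t, df = tukey_nonadditivity(terms, y)
    print(f"Tukey nonadditivity: t = {t:.2f} on {df} df")
```

With the 31 trees and terms (D, Ht), the residual df would be 31 - 4 = 27, matching the Arc output for the augmented fit.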
The results are shown below:

Normal Regression
Kernel mean function = Identity
Response = Vol
Terms = (D Ht L1.Fit-Values^2)

Coefficient Estimates
Label             Estimate     Std. Error   t-value
Constant          -16.5517     8.93717      -1.852
D                   1.48988    0.560572      2.658
Ht                  0.207008   0.0891259     2.323
L1.Fit-Values^2     0.00971504 0.00160720    6.045

R Squared:            0.977882
Sigma hat:            2.5769
Number of cases:      31
Degrees of freedom:   27

Summary Analysis of Variance Table
Source        df   SS        MS        F        p-value
Regression     3   7926.79   2642.26   397.91   0.0000
Residual      27   179.291   6.64042
Lack of fit   25   179.086   7.16346   69.89    0.0142
Pure Error     2   0.205     0.1025

t dist. with 27 df, value = 6.045, two-tail probability = 1.87924e-06

The test statistic for Tukey's test for nonadditivity is the t-value for the L1.Fit-Values^2 term in this model. Here t = 6.045, referred to a t distribution with 27 df (two-tailed p-value = 1.88e-06), so we reject the NH and conclude that there is significant evidence of nonadditivity.

An easier way to obtain these results is to use the Residual plots… option from the model menu. Below is a plot of the residuals vs. the fitted values obtained in this manner. Notice that the result of Tukey's test for nonadditivity is given above the plot.

You can also test for nonadditivity as a function of the individual terms in the model. By clicking on the slider for the horizontal axis, we can obtain plots of the residuals vs. Ht and D, along with the corresponding nonadditivity tests. The results are shown below: the test for nonadditivity as a function of diameter (D) is significant, while the test for height (Ht) is only mildly significant (not shown, p < .10). Based on these results, we conclude that significant curvature is present and that some reformulation of the mean function is needed. For example, after log-transforming Vol, Ht, and D, the nonadditivity test is no longer significant.

Testing for Nonconstant Variance ~ Transaction Data

Below is a plot of the residuals vs.
the fitted values from fitting the mean function

$E(\mathrm{Time} \mid T_1, T_2) = \beta_0 + \beta_1 T_1 + \beta_2 T_2$

In working with these data previously, you used weighted least squares to account for the fact that Var(Time|T1,T2) is not constant. We will now examine formal tests to determine whether nonconstant variance is present and to gain some insight into the form of the variance function.

First we will test whether the variance depends significantly on the estimated mean function. In Arc, select Nonconstant variance plot… from the model menu, which gives the plot and test result shown below. The plot clearly shows nonconstant error variance. The score test for nonconstant variance suggests that the variance changes with the value of the estimated mean, E(Time|T1,T2) (df = 1, p = .000).

You can specify a different linear combination of T1 and T2 to use in testing for nonconstant variance by clicking on the Variance terms menu to the left of the NCV plot. Here we can specify that the variance depends on T1 and T2, but not necessarily through the exact linear combination given by the estimated mean function. The result is shown below. The score test again suggests that the variance is not constant (df = 2, p = .000).

Which choice of v is more appropriate for these data? We can perform a hypothesis test to answer this question.

NH: the variance changes with the estimated mean E(Time|T1,T2):
$\log \mathrm{Var}(\mathrm{Time} \mid T_1, T_2) = \log \sigma^2 + \lambda \, E(\mathrm{Time} \mid T_1, T_2)$

AH: the variance changes as a function of T1 and T2:
$\log \mathrm{Var}(\mathrm{Time} \mid T_1, T_2) = \log \sigma^2 + \lambda_1 T_1 + \lambda_2 T_2$

The test statistic is the difference between the two score (chi-square) statistics above:
diff = (score statistic for v = (T1, T2)) - (score statistic for v = fitted values) = 21.27, with df = 2 - 1 = 1.

The associated p-value for the test statistic is:
Chisq dist. with 1 df, value = 21.27, upper-tail probability = .000

Thus we reject the NH and conclude that the estimated mean does not adequately model the variance; an arbitrary linear combination of T1 and T2 is needed. In similar fashion, we could test whether the total number of transactions, S = T1 + T2, adequately models the variance.
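The p-value for this difference test can be reproduced without statistical software. For integer df the chi-square upper tail has closed forms: erfc(sqrt(x/2)) for 1 df, exp(-x/2) for 2 df, and a simple recurrence for larger df. The sketch below is our own; `chi2_sf` and `score_difference_test` are hypothetical helper names, and the inputs are assumed to be the score statistics read off Arc's NCV output.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Uses the closed forms for df = 1 and df = 2 and the recurrence
    Q(df + 2, x) = Q(df, x) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    if x <= 0:
        return 1.0
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def score_difference_test(stat_alt, df_alt, stat_null, df_null):
    """Compare nested variance models by the difference of their score statistics."""
    diff = stat_alt - stat_null
    ddf = df_alt - df_null
    return diff, ddf, chi2_sf(diff, ddf)


if __name__ == "__main__":
    # Difference of the two NCV score statistics reported above: 21.27 on 2 - 1 = 1 df.
    print(f"p = {chi2_sf(21.27, 1):.2e}")
```

For the transaction data, chi2_sf(21.27, 1) is roughly 4 × 10^-6, which Arc displays as .000.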
(Note: to do this in Arc, simply add the variate S = T1 + T2 to the data set.) The results of this test suggest that S = T1 + T2 does not adequately model the variance either.
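The score tests used in this section can be sketched in plain Python, assuming the Cook and Weisberg variance model $\mathrm{Var}(y \mid x) = \sigma^2 \exp(\lambda^\top v)$: fit the mean function by OLS, divide the squared residuals by their average (the ML estimate of $\sigma^2$), regress these scaled values on v, and take half the regression sum of squares as a chi-square statistic with df equal to the number of variance terms. This is our own sketch, not Arc's source; the function names are hypothetical, and the demo uses synthetic data in which only T1 drives the variance, so its numbers will not match the transaction-data output. Testing S = T1 + T2 as the single variance term corresponds to passing one column containing T1 + T2.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ols_fitted(X, y):
    """Fitted values from least squares of y on the columns of X (X includes an intercept)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


def chi2_sf(x, df):
    """Chi-square upper tail for integer df (closed forms for df = 1, 2 plus recurrence)."""
    if x <= 0:
        return 1.0
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def ncv_score_test(terms, y, vterms=None):
    """Score test of constant variance against Var(y|x) = sigma^2 * exp(lambda' v).

    terms:  predictor tuples for the mean function, e.g. (T1, T2)
    vterms: variance-term tuples v; None means v = fitted values (1 df).
    Returns (score statistic, df, upper-tail chi-square p-value).
    """
    n = len(y)
    X = [[1.0] + list(row) for row in terms]
    fitted = ols_fitted(X, y)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    sigma2 = sum(e2) / n                      # ML estimate of sigma^2: divide by n
    u = [e / sigma2 for e in e2]              # scaled squared residuals
    if vterms is None:
        vterms = [(f,) for f in fitted]
    V = [[1.0] + list(row) for row in vterms]
    ufit = ols_fitted(V, u)
    ubar = sum(u) / n
    score = sum((f - ubar) ** 2 for f in ufit) / 2   # half the regression sum of squares
    df = len(V[0]) - 1
    return score, df, chi2_sf(score, df)


if __name__ == "__main__":
    # Synthetic stand-in for the transaction data (NOT the real data):
    # the error SD grows with t1 only.
    random.seed(2)
    t1 = [random.uniform(0, 4) for _ in range(200)]
    t2 = [random.uniform(0, 4) for _ in range(200)]
    y = [1 + a + b + random.gauss(0, math.exp(0.5 * a)) for a, b in zip(t1, t2)]
    terms = list(zip(t1, t2))
    s_mean = ncv_score_test(terms, y)                               # v = fitted values
    s_both = ncv_score_test(terms, y, terms)                        # v = (T1, T2)
    s_sum = ncv_score_test(terms, y, [(a + b,) for a, b in terms])  # v = S = T1 + T2
    for label, (stat, df, p) in [("v = fitted values", s_mean),
                                 ("v = (T1, T2)", s_both),
                                 ("v = S = T1 + T2", s_sum)]:
        print(f"{label:18s} score = {stat:7.2f}  df = {df}  p = {p:.3g}")
    diff = s_both[0] - s_sum[0]
    print(f"(T1, T2) vs S: diff = {diff:.2f} on 1 df, p = {chi2_sf(diff, 1):.3g}")
```

Because the fitted values and S are both linear combinations of T1 and T2, the statistic for v = (T1, T2) can never be smaller than the statistic for either one-term choice, which is why the difference of the two statistics is a sensible nested comparison.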