Curvature and Nonconstant Variance

Testing for Curvature ~ Black Cherry Tree Data
We begin by examining a scatterplot matrix for these data.
The marginal response plot of Vol vs. D shows evidence of a nonlinearity. The Ht vs. D panel suggests
that Ht and D are linearly related. It is also worthwhile examining a 3-D spin plot of Vol vs. (Ht,D). Try
fitting a plane and examining the residuals graphically. Also try removing linear trend from the plot and
spinning. Is there evidence of lack of fit for the planar model exhibited in the 3-D plot? (yes, check it out)
We will now examine the results of fitting the mean function
E(Vol|Ht,D) = η0 + η1*Ht + η2*D
The regression summary is given below:
Data set = Trees, Name of Fit = L1
Normal Regression
Kernel mean function = Identity
Response       = Vol
Terms          = (D Ht)
Coefficient Estimates
Label        Estimate     Std. Error    t-value
Constant     -57.9877     8.63823       -6.713
D            4.70816      0.264265      17.816
Ht           0.339251     0.130151      2.607

R Squared:             0.94795
Sigma hat:             3.88183
Number of cases:       31
Degrees of freedom:    28

Summary Analysis of Variance Table
Source          df    SS         MS         F         p-value
Regression      2     7684.16    3842.08    254.97    0.0000
Residual        28    421.921    15.0686
  Lack of fit   26    421.716    16.2199    158.24    0.0063
  Pure Error    2     0.205      0.1025
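To see where the lack-of-fit line comes from, the short Python sketch below rebuilds it from the Residual and Pure Error rows of the table. It is an illustration only, not part of Arc. The function names are our own, and the p-value shortcut is a closed form that holds only when the pure-error df is exactly 2, as it is here.

```python
def lack_of_fit_f(ss_resid, df_resid, ss_pure, df_pure):
    """Split the residual SS into lack of fit and pure error; return (F, df_lof, df_pure)."""
    ss_lof = ss_resid - ss_pure        # lack-of-fit sum of squares
    df_lof = df_resid - df_pure        # lack-of-fit degrees of freedom
    ms_lof = ss_lof / df_lof
    ms_pure = ss_pure / df_pure
    return ms_lof / ms_pure, df_lof, df_pure


def f_upper_tail_denominator_df2(f_value, df_num):
    """Upper-tail probability of an F(df_num, 2) variable.

    Assumption: the denominator df is exactly 2, in which case the
    tail area has the closed form 1 - (df_num*f / (2 + df_num*f))**(df_num/2).
    """
    ratio = df_num * f_value / (2.0 + df_num * f_value)
    return 1.0 - ratio ** (df_num / 2.0)


# Values copied from the Residual and Pure Error rows of fit L1
f_stat, df_lof, df_pure = lack_of_fit_f(421.921, 28, 0.205, 2)
p_value = f_upper_tail_denominator_df2(f_stat, df_lof)
print(round(f_stat, 2), df_lof, df_pure, round(p_value, 4))
```

With only 2 pure-error df the denominator mean square is estimated very imprecisely, so this test is best read together with the residual plots.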
Note that the p-value for the lack-of-fit (LOF) test (p = 0.0063) suggests that this model may be inappropriate. We will now
examine a plot of the residuals vs. the fitted values.
There is definitely evidence of lack of fit. The linear mean function above is not supported by this residual
plot.
Testing for Curvature ~ Tukey's Test for Nonadditivity
Following the procedure outlined in class, we first fit the linear model above and save the fitted values by
selecting the Add to dataset… option from the pull-down menu for the model. Then use the Transform…
option to square the fitted values. Now regress Vol on Ht, D, and L1.Fit-Values^2. The
results are shown below:
Normal Regression
Kernel mean function = Identity
Response       = Vol
Terms          = (D Ht L1.Fit-Values^2)
Coefficient Estimates
Label              Estimate      Std. Error    t-value
Constant           -16.5517      8.93717       -1.852
D                  1.48988       0.560572      2.658
Ht                 0.207008      0.0891259     2.323
L1.Fit-Values^2    0.00971504    0.00160720    6.045

R Squared:             0.977882
Sigma hat:             2.5769
Number of cases:       31
Degrees of freedom:    27

Summary Analysis of Variance Table
Source          df    SS         MS         F         p-value
Regression      3     7926.79    2642.26    397.91    0.0000
Residual        27    179.291    6.64042
  Lack of fit   25    179.086    7.16346    69.89     0.0142
  Pure Error    2     0.205      0.1025
t dist. with 27 df, value = 6.045, two-tail probability = 1.87924e-06
The test statistic for Tukey's Test for Nonadditivity is the t-value for the L1.Fit-Values^2 term in this
model. Here t = 6.045 on 27 df (two-tailed p-value = .0000019), so we reject the NH and conclude there is
sufficient evidence of nonadditivity.
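The same steps (fit the linear model, square the fitted values, refit with the squared term added) can be sketched outside Arc. The Python below is a minimal illustration under stated assumptions. The helper names solve, ols and tukey_t are our own, the data are synthetic (generated to mimic volume growing like D^2*Ht) and are not the Trees data, and the t-value is recovered from the drop in residual SS, using the fact that t^2 = F when a single term is added.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(columns, y):
    """Least squares fit of y on an intercept plus the given columns.

    Returns (coefficients, fitted values, residual sum of squares).
    """
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, rss


def tukey_t(columns, y):
    """t-value for the squared fitted values when added to the linear fit."""
    _, fitted, rss_linear = ols(columns, y)
    squared = [f * f for f in fitted]
    beta, _, rss_full = ols(list(columns) + [squared], y)
    df_resid = len(y) - len(beta)
    f_stat = max((rss_linear - rss_full) / (rss_full / df_resid), 0.0)
    # For one added term t^2 = F; take the sign from the new coefficient.
    return math.copysign(math.sqrt(f_stat), beta[-1])


# Synthetic stand-in data (NOT the Trees data): volume grows like D^2 * Ht
rng = random.Random(0)
d = [8 + 12 * rng.random() for _ in range(31)]
ht = [60 + 30 * rng.random() for _ in range(31)]
vol = [0.002 * di * di * hi + rng.gauss(0, 0.5) for di, hi in zip(d, ht)]
t_synthetic = tukey_t([d, ht], vol)

# Check with the residual SS and MS reported in the two fits above
t_from_tables = math.sqrt((421.921 - 179.291) / 6.64042)
print(round(t_from_tables, 3))
```

The last two lines use only numbers from the two regression summaries: the drop in residual SS divided by the residual MS of the larger model, then the square root, agrees with the reported t-value for L1.Fit-Values^2 to three decimals.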
An easier way to obtain these results is to use the Residual plots… option from the model menu. Below is
a plot of the residuals vs. the fitted values obtained in this manner.
Notice that the results of Tukey's Test for Nonadditivity are given above the plot. You can also test for
significant nonadditivity as a function of the individual terms in the model. By clicking on the slider for
the horizontal axis we can obtain plots of the residuals vs. Ht and vs. D, along with the corresponding
nonadditivity tests. The results are shown below:
The test for nonadditivity as a function of diameter (D) is significant, while the nonadditivity test for height
(Ht) is mildly significant (not shown, p < .10). Based on these results we would conclude there is
significant curvature present and some reformulation of the mean function is needed. For example, log
transforming Vol, Ht, and D does not result in significant nonadditivity.
Testing for Nonconstant Variance ~ Transaction Data
Below is a plot of the residuals vs. the fitted values from fitting the mean function
E(Time|T1,T2) = T1 + T2
In working with these data previously you used weighted least squares to account for the fact that
Var(Time|T1,T2) is not constant. We will now examine formal tests to determine whether nonconstant variance
is present and, secondly, gain some insight into the form of the variance function. First we will test whether
the variance depends significantly on the estimated mean function. In Arc select Nonconstant variance
plot… from the model menu, which gives the plot and test result shown below.
The plot clearly shows nonconstant error variance. The score test for nonconstant variance (df = 1, p < .001)
suggests that the variance changes with the value of the estimated mean, E(Time|T1,T2).
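For readers who want to see what such a score test computes, here is a minimal Python sketch of the standard score test for nonconstant variance: regress the scaled squared residuals on the chosen variance terms and take half the regression sum of squares. We assume, but have not verified, that this matches the statistic Arc reports. The helper names are our own and the data are synthetic, not the Transaction data.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_fitted(columns, y):
    """Fitted values from least squares of y on an intercept plus the given columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def score_test(mean_terms, y, variance_terms=None):
    """Score statistic and df for the NH of constant variance.

    variance_terms=None tests whether the variance depends on the fitted mean.
    """
    n = len(y)
    fitted = ols_fitted(mean_terms, y)
    resid_sq = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    sigma2 = sum(resid_sq) / n               # ML variance estimate (divides by n)
    u = [e2 / sigma2 for e2 in resid_sq]     # scaled squared residuals, mean exactly 1
    z = [fitted] if variance_terms is None else list(variance_terms)
    u_fit = ols_fitted(z, u)
    ss_reg = sum((f - 1.0) ** 2 for f in u_fit)
    return ss_reg / 2.0, len(z)              # compare with chi-square on len(z) df


# Synthetic stand-in data (NOT the Transaction data): error sd grows with the mean
rng = random.Random(1)
t1 = [rng.uniform(0, 10) for _ in range(200)]
t2 = [rng.uniform(0, 20) for _ in range(200)]
mean = [10 + 5 * a + 2 * b for a, b in zip(t1, t2)]
time = [m + rng.gauss(0, 0.1 * m) for m in mean]

stat_mean, df_mean = score_test([t1, t2], time)            # variance vs. fitted mean
stat_terms, df_terms = score_test([t1, t2], time, [t1, t2])  # variance vs. T1 and T2
print(df_mean, df_terms)
```

Because the fitted mean is itself a linear combination of T1 and T2, the two-term statistic can never be smaller than the one-term statistic, so the two models for the variance are nested.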
You can specify a different linear combination of T1 and T2 to use for testing the nonconstancy of the
variance by clicking on the Variance terms menu to the left of the NCV plot. Here we can specify that the
variance depends on T1 and T2, but not in the exact form given by the estimated mean function. The
result is shown below.
The score test again suggests that the variance is not constant (df = 2, p < .001). Which choice
of variance function is more appropriate for these data? We can perform a hypothesis test to answer this question.
NH: variance changes with the estimated mean E(Time|T1,T2)
log(Var(Time|T1,T2)) = log(σ^2) + λ*E(Time|T1,T2)
AH: variance changes as a function of T1 and T2
log(Var(Time|T1,T2)) = log(σ^2) + λ1*T1 + λ2*T2
The test statistic is the difference in the chi-square statistics above.
χ^2 diff = 21.27, df = (2 - 1) = 1
The associated p-value for the test statistic is:
Chisq dist. with 1 df, value = 21.27, upper-tail probability = .000
Thus we reject the NH and conclude that the estimated mean does not adequately model the variance and
an arbitrary linear combination of T1 and T2 is needed. In similar fashion we could test if the total number
of transactions, S = T1 + T2, adequately models the variance. (Note: to do this in Arc simply add the
variate S = T1 + T2 to the data set.) The results of the test suggest that S = T1 + T2 does not adequately
model the variance either.
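The upper-tail probability that Arc prints as .000 for the difference test can be computed to more digits. The sketch below is a convenience of our own and not an Arc function; it uses the closed-form chi-square tail areas for 1 and 2 df only.

```python
import math


def chi2_upper_tail(value, df):
    """Upper-tail probability of a chi-square variable (closed forms for df = 1 or 2 only)."""
    if df == 1:
        # A chi-square on 1 df is a squared standard normal
        return math.erfc(math.sqrt(value / 2.0))
    if df == 2:
        # A chi-square on 2 df is exponential with mean 2
        return math.exp(-value / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


# Difference in score statistics reported above: 21.27 on (2 - 1) = 1 df
p_diff = chi2_upper_tail(21.27, 2 - 1)
print(p_diff)
```

The result is a few parts in a million, which is why the output shows .000 at three decimals.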