Statistics 512 Study Guide 2 Spring 2002

A) One-way ANOVA with Polynomial Effects

1. A substantial percentage of the potatoes raised in this country never have a chance to reach the table. Instead they fall victim to potato rot while being stored for later use. To find out what could be done to reduce this loss, an experiment was carried out at the University of Wisconsin. Potatoes were injected with a bacterium known to cause rot, then stored. After 5 days the diameter of the rotted portion on each potato was measured (in millimeters). The levels of injection were low, medium and high, with 18 potatoes receiving each level of treatment. The ANOVA table is below:

Analysis of Variance For: rot

Source   df   Sum of Squares   Mean Square   F-ratio   Prob
bact      2      651.815         325.907     8.0873    0.0009
Error    51     2055.22           40.2985
Total    53     2707.04

Summary statistics for: low
NumNumeric = 18   Mean = 5.2778   Standard Deviation = 4.1981

Summary statistics for: medium
NumNumeric = 18   Mean = 9.1667   Standard Deviation = 6.6177

Summary statistics for: high
NumNumeric = 18   Mean = 13.778   Standard Deviation = 7.7121

Normal probability plots were drawn for each group, and the data appear to be normally distributed.

a) Does the level of bacteria injected have an effect on the diameter of rot in the potato?
b) Does the amount of rot at the low level differ significantly from the amount at the medium level?
c) Does the amount of rot at the high level differ significantly from the average amount at the other two levels?
d) Are the two contrasts above orthogonal?
e) What are the sums of squares for the contrasts? What is the sum of the two sums of squares?
f) The investigator wasn't sure of the exact inoculum level in the injections. However, the levels were determined by dilution of a liquid medium containing the bacteria. Therefore, she was fairly certain that the medium level is about 2 times the low level, and the high level is about 4 times the low level.
How can she test whether the relationship between inoculum level and rot diameter is linear?
g) What is the highest degree polynomial that can be fitted to these data?
h) What is the difference in this case between fitting a linear regression and testing the lack of fit statistic, and fitting a quadratic regression and determining if the quadratic term is significant?
i) Below is the regression output from regressing rot diameter on inoculum level (1, 2 or 4). Compute the lack of fit sum of squares. Is there evidence of lack of fit?

Dependent variable is: rot
R2 = 23.6%   R2(adjusted) = 22.1%
s = 6.306 with 54 - 2 = 52 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression      638.922        1     639          16.1
Residual       2068.12        52      39.7714

Variable   Coefficient   s.e. of Coeff   t-ratio
Constant     2.97222        1.821          1.63
level        2.75794        0.6881         4.01

Solutions

1.
a. The null hypothesis is either $\tau_1=\tau_2=0$, or $\tau_1=\tau_2=\tau_3$, or $\mu_1=\mu_2=\mu_3$, depending on how the model is parameterized. In any case, the question is answered by the F-test in the ANOVA table. F* = 8.0873, and under the null hypothesis it should be compared to an F(2, 51). The P-value is 0.0009. There appears to be a highly significant effect.

b. $H_0: \mu_1=\mu_2$ versus not equal. This can most readily be determined by a contrast:

$$t^* = \frac{\bar{y}_1-\bar{y}_2}{\mathrm{s.e.}(\bar{y}_1-\bar{y}_2)} = \frac{5.2778-9.1667}{\sqrt{40.2985\left(\frac{1}{18}+\frac{1}{18}\right)}} = -1.84$$

t(.95,51) = 1.68 and t(.975,51) = 2.00. Since |t*| lies between these two values, the two-sided test rejects at the 10% level but not at the 5% level. The difference between the low and medium levels is marginally significant.

c. $H_0: \frac{\mu_1+\mu_2}{2}-\mu_3=0$ versus not equal. This can most readily be determined by a contrast:

$$t^* = \frac{\frac{\bar{y}_1+\bar{y}_2}{2}-\bar{y}_3}{\mathrm{s.e.}\left(\frac{\bar{y}_1+\bar{y}_2}{2}-\bar{y}_3\right)} = \frac{\frac{5.2778+9.1667}{2}-13.778}{\sqrt{40.2985\left(\frac{1}{4\cdot 18}+\frac{1}{4\cdot 18}+\frac{1}{18}\right)}} = \frac{-6.55575}{1.8325} = -3.577$$

The contrast is highly significant.

d. The contrasts are orthogonal, since the products of their coefficients $c_i$ and $d_i$, each divided by the sample size $n_i$, sum to zero:

$$\sum_i \frac{c_i d_i}{n_i} = (1)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{18}\right) + (-1)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{18}\right) + (0)(-1)\left(\tfrac{1}{18}\right) = 0.$$

e.
The sum of squares for the first contrast is

$$\frac{(5.2778-9.1667)^2}{\frac{1}{18}+\frac{1}{18}} = 136.11$$

The sum of squares for the second contrast is

$$\frac{\left(\frac{5.2778+9.1667}{2}-13.778\right)^2}{\frac{1}{4\cdot 18}+\frac{1}{4\cdot 18}+\frac{1}{18}} = 515.73$$

Since the contrasts are orthogonal, they should sum to the SSR, which is the bact sum of squares, 651.815, in the ANOVA table. 136.11 + 515.73 = 651.84. The difference is due to round-off error in the reported means.

f. It is not necessary to know the "units" of measure (i.e. how much inoculum is in the "low" injection) in order to fit a polynomial regression, as long as the relative sizes of the levels are known. That is, we know there is some concentration, C, so that the levels are 1C, 2C, and 4C. The regression coefficients will vary with C, but the fitted values will not. We can determine whether there is a linear (or polynomial) fit by regressing on the continuous variable with values 1, 2, and 4, and then checking for lack of fit (by testing lack of fit, and looking at the residual plots).

g. Since there are 3 distinct values of the independent variable, a quadratic polynomial is the highest degree that can be fit.

h. There is no difference between these methods. If a quadratic polynomial is fit, the sequential sum of squares for the quadratic term will be the same as the lack of fit sum of squares. If there were more levels, so that a higher degree polynomial could be fit, the lack of fit statistic would be the simultaneous test of whether any of the higher order terms improve the fit.

i. The lack of fit sum of squares is:

$$SS(\text{lack of fit}) = SSR(\text{categorical}) - SSR(\text{polynomial}) = 651.815 - 638.922 = 12.893$$

The degrees of freedom for lack of fit is:

$$df(\text{lack of fit}) = df(\text{categorical}) - df(\text{polynomial}) = 2 - 1 = 1$$

The test for lack of fit is:

$$F^* = \frac{SS(\text{lack of fit})/df(\text{lack of fit})}{MSE(\text{categorical})} = \frac{12.893/1}{40.2985} = .320$$

This should be compared to F(.95,1,51) = 4.0. (However, an F-statistic must be greater than 1 to be significant, so we don't really need to use the table.) Since F* is well below 1, there is no evidence of lack of fit.
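The arithmetic in solutions b through e and i can be rechecked from the summary statistics alone. The following is a minimal sketch in Python using only the standard library; the helper `contrast` and all variable names are illustrative assumptions and are not part of the original output. Because it works from the rounded means printed in the problem, the results may differ in the last digits from values obtained with the raw data.

```python
import math

# Summary statistics copied from the problem's ANOVA and regression output.
n = 18                                # potatoes per treatment level
ybar = [5.2778, 9.1667, 13.778]       # means for low, medium, high
mse = 40.2985                         # error mean square (51 df)
ss_treatment = 651.815                # "bact" sum of squares (2 df)
ss_linear = 638.922                   # regression SS from the linear fit (1 df)


def contrast(coefs, means, n, mse):
    """Hypothetical helper: estimate, s.e., t statistic and SS of a contrast.

    Assumes equal sample size n in every group.
    """
    estimate = sum(c * m for c, m in zip(coefs, means))
    scale = sum(c * c / n for c in coefs)      # sum of c_i^2 / n_i
    se = math.sqrt(mse * scale)
    return estimate, se, estimate / se, estimate ** 2 / scale


c1 = [1, -1, 0]        # low versus medium
c2 = [0.5, 0.5, -1]    # average of low and medium versus high

est1, se1, t1, ss1 = contrast(c1, ybar, n, mse)
est2, se2, t2, ss2 = contrast(c2, ybar, n, mse)

# Orthogonality check: sum of c_i * d_i / n_i should be zero.
orthogonality = sum(a * b / n for a, b in zip(c1, c2))

# Lack of fit: categorical treatment SS minus linear regression SS.
ss_lof = ss_treatment - ss_linear
df_lof = 2 - 1
f_lof = (ss_lof / df_lof) / mse

print(f"contrast 1: t = {t1:.3f}, SS = {ss1:.2f}")
print(f"contrast 2: t = {t2:.3f}, SS = {ss2:.2f}")
print(f"orthogonality sum = {orthogonality:.3g}")
print(f"sum of contrast SS = {ss1 + ss2:.2f} (treatment SS = {ss_treatment})")
print(f"lack of fit: SS = {ss_lof:.3f}, F = {f_lof:.3f}")
```

The critical values t(.975,51) and F(.95,1,51) are not computed here, since that would need a statistics library; the sketch only reproduces the test statistics and sums of squares.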