Analysis of Covariance

1. Model

$$Y_{ij} = \mu_j + (\beta_1 + \beta_{1j})(X_{ij} - \bar{X}) + e_{ij}$$
$$Y_{ij} = \mu + \tau_j + (\beta_1 + \beta_{1j})(X_{ij} - \bar{X}) + e_{ij}$$
$$Y_{ij} = \beta_0 + \tau_j + \beta_1 X_{ij} + \beta_{1j}(X_{ij} - \bar{X}) + e_{ij}, \qquad j = 1, 2, \ldots, k, \quad i = 1, 2, \ldots, n_j \qquad (1)$$

Translation to JMP > Fit Model specification

Math notation:  $Y_{ij} = \beta_0 + \tau_j + \beta_1 X_{ij} + \beta_{1j}(X_{ij} - \bar{X}) + e_{ij}$
JMP notation:   Y = TRT + X + TRT*X        (2)

Variables of Interest
$Y_{ij}$ = quantitative response of observation i at qualitative treatment level j
$X_{ij}$ = quantitative covariate of observation i at qualitative treatment level j
j = qualitative treatment or factor level

Parameters of Interest
$\beta_0$ = intercept of the JMP parameterization; the grand mean response is $\mu = \beta_0 + \beta_1 \bar{X}$, defined below
$\tau_j$ = effect of qualitative treatment j, with $\sum_{j=1}^{k} \tau_j = 0$
$\beta_1$ = slope of the regression line if all lines are parallel
Document1 1 2/8/2016
$\beta_{1j}$ = effect of treatment j on the slope of the regression line, with $\sum_{j=1}^{k} \beta_{1j} = 0$
$\mu = \beta_0 + \beta_1 \bar{X}$ = the grand mean
$\mu_j = \beta_0 + \tau_j + \beta_1 \bar{X}$ = the treatment means adjusted for the covariate X

Unobservable Random Variable
$e_{ij}$ = random error or residual of observation i of treatment j. Assume the $e_{ij}$ are distributed with (i) a mean of 0, (ii) a homogeneous standard deviation $\sigma$, and (iii) a normal distribution.

2. Hypotheses

1. $H_{01}: \beta_{1j} = 0$ for all j versus $H_{11}: \beta_{1j} \neq 0$ for some j

The term $\beta_{1j}(X_{ij} - \bar{X})$ in model (1), represented by TRT*X in the JMP model (2), is an interaction between the qualitative factor, j, and the quantitative covariate, $X_{ij}$.
If $H_{01}$ is rejected,
  o stop testing hypotheses and estimate the parameters of the full model;
  o the effects of the qualitative factor vary with the covariate.
If $H_{01}$ is not rejected,
  o adopt the reduced model
    $$Y_{ij} = \beta_0 + \tau_j + \beta_1 X_{ij} + e_{ij} \qquad (3)$$
  o and test hypotheses 2 and 3, which follow.

2. $H_{02}: \beta_1 = 0$ versus $H_{12}: \beta_1 \neq 0$

3. $H_{03}: \tau_j = 0$ for all j versus $H_{13}: \tau_j \neq 0$ for some j

3. Formulate

Test the hypotheses with F-tests in Fit Model [see equation (2) above]:
  Y = Response
  Model Effects = TRT, the qualitative explanatory factor
                  X, the quantitative explanatory covariate
                  TRT*X, the interaction between the qualitative factor and the covariate
  > OK (with defaults)

4. Design

α = ___
k = number of treatments (j = 1, 2, …, k) in equation (1)
$n_j$ = number of observations at treatment level j
$X_{ij}$ = values of the covariate. In an experiment, the covariate levels are sometimes set by the investigator and sometimes random. In an observational study, the covariate values are random.

5. Perform the Study and then Analyze the Sample

Example: Ott and Longnecker Exercise 16.5. Perform an anova on the following study.
A researcher wants to evaluate the difference in the mean film thickness of a coating placed on silicon wafers using three different processes. (1) Six wafers are randomly assigned to each of the processes. (2) The film thickness (micro-m) and the temperature (C) in the lab during the coating process are recorded on each wafer. (3) The researcher is concerned that fluctuations in temperature may affect the thickness of the coating. (4) Test whether the processes have a difference in mean. (5)

Note: interpretation by numbered sentence
(1) The parameters of interest are the mean film thicknesses (units?) for three processes, $\mu_1, \mu_2, \mu_3$. Process (1, 2, 3) appears to be an explanatory factor, k = 3. Wafer appears to be the observational unit.
(2) N = kn = 3 × 6 = 18. It's an experiment, and wafer is the experimental unit as well as the observational unit.
(3) We now know that the unit of the response, film thickness, is micrometers. Temperature (C) is also an explanatory variable.
(4) Temperature (C) is a covariate, i.e., the quantitative version of a qualitative blocking variable.
(5) We have a research objective.
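For this example (k = 3), the JMP translation in equation (2) can be made concrete by building one row of the full-model design matrix. The sketch below makes two assumptions. First, it uses JMP-style effect coding for the nominal factor: the last level is coded −1 in every column, which enforces $\sum \tau_j = 0$. Second, it centers the covariate only inside the TRT*X interaction. The second assumption follows the term names JMP prints in the Expanded Estimates, such as Process[1]*(Temp (C)-28.2778). The helper names are hypothetical; this illustrates the parameterization and is not JMP's internal code.

```python
# Sketch (hypothetical helper names): one row of the design matrix for the
# JMP model Y = TRT + X + TRT*X in equation (2). Assumes JMP-style effect
# coding for TRT and a covariate centered only inside the TRT*X term.


def effect_codes(level: int, k: int) -> list[int]:
    """Effect (sum-to-zero) coding of treatment level 1..k in k-1 columns.

    Level j < k gets +1 in column j and 0 elsewhere; the last level k gets
    -1 in every column, which is what forces sum(tau_j) = 0.
    """
    if not 1 <= level <= k:
        raise ValueError("level must be between 1 and k")
    if level == k:
        return [-1] * (k - 1)
    return [1 if col == level else 0 for col in range(1, k)]


def design_row(level: int, x: float, x_bar: float, k: int,
               interaction: bool = True) -> list[float]:
    """Columns: intercept, TRT codes, X (uncentered), TRT codes * (X - x_bar)."""
    codes = effect_codes(level, k)
    row = [1.0] + [float(c) for c in codes] + [float(x)]
    if interaction:
        # Interaction columns use the centered covariate, as in beta_1j (X_ij - Xbar).
        row += [c * (x - x_bar) for c in codes]
    return row


if __name__ == "__main__":
    X_BAR = 28.2778  # mean temperature (C) that appears in JMP's term names
    for level in (1, 2, 3):
        # A hypothetical wafer at 30 C in each process
        print(level, design_row(level, 30.0, X_BAR, k=3))
```

With k = 3 the full model has 2k = 6 columns: the intercept plus the Model's 5 degrees of freedom. Setting interaction=False leaves the 4 columns of the reduced model (3).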
We now test Hypothesis 1: Analyze > Fit Model > Run.

[Figure: Regression Plot]

From this we can construct a traditional Anova table.

Analysis of Covariance (Ancova), full model (k = 3, N = 18)

Source          df      SS      MS     F       P       R2
Process          2    4129    2065    56    <.0001    25%
Temp (C)         1    5679    5679   154    <.0001    34%
Process*Temp     2     304     152     4    0.0431     2%
Model            5   16396    3279    89    <.0001    97%
Error           12     441      37                     3%
Total           17   16838                           100%

Notice that
• The degrees of freedom are calculated in the usual way.
  o The 3 categories of Process have (3 − 1) = 2 degrees of freedom.
  o The first-degree (simple linear) regressor, Temp (C), has 1 degree of freedom.
  o The interaction has (2 × 1) = 2 degrees of freedom.
  o The Model has (2 + 1 + 2) = 5 degrees of freedom.
  o The Total has (N − 1) = (18 − 1) = 17 degrees of freedom.
  o The Error has (17 − 5) = 12 degrees of freedom.
• The sums of squares and partial R2 are not additive, because Process and Temperature are not orthogonal: 4129 + 5679 + 304 = 10112, well short of the Model SS of 16396. Part of what is explained by Process is also explained by Temperature, as you can see from the scatterplot.
• First we consider the interactions.
  o The interactions are marginally significant (P = 0.0431). However, they explain only 2% (partial R2) of the variation above and beyond that explained by Process and Temperature. Process and Temperature are highly significant and together explain about 97% − 2% = 95% of the total variation in the response, film thickness.
  o This justifies dropping the interaction from the model.

To the right of the Fit Model output we find the Details for Process, the treatment variable. Note that
• the least squares means are different from the means;
• however, the least squares means are not meaningful when there is an interaction in the model;
• more about these later.

Having decided to drop the interaction from the model, we proceed to the tests of Hypotheses 2 and 3. In JMP, we revisit the Model Specification dialog box of Fit Model (shown above) and remove the Process*Temp (C) interaction effect.
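The degrees-of-freedom, MS, F and partial R² bookkeeping above can be checked with a few lines of arithmetic. This sketch takes df and SS as printed in the full-model table. Those values are rounded to integers, so a recomputed F can differ slightly from the printed one (about 154.5 for Temp (C) versus the printed 154).

For the P-value, the sketch uses the closed-form upper tail of an F distribution with 2 numerator degrees of freedom, $(1 + 2F/df_2)^{-df_2/2}$. This formula applies to Process and Process*Temp. It does not apply to Temp (C), which has 1 numerator df.

```python
# Sketch: rebuild MS, F and partial R^2 for the full-model Ancova table from
# the printed df and SS, and check the interaction P-value.

SS_TOTAL = 16838.0            # Total sum of squares from the JMP output
DF_ERROR, SS_ERROR = 12, 441.0
MSE = SS_ERROR / DF_ERROR     # error mean square, 36.75

# (source, df, SS) as printed for the full model
ROWS = [
    ("Process", 2, 4129.0),
    ("Temp (C)", 1, 5679.0),
    ("Process*Temp", 2, 304.0),
    ("Model", 5, 16396.0),
]


def f_sf_df1_2(f: float, df2: int) -> float:
    """Upper-tail P(F > f) for F(2, df2).

    Closed form valid ONLY when the numerator df is 2:
    P = (1 + 2 f / df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def ancova_table() -> dict:
    """Map source -> (df, SS, MS, F, partial R^2 = SS / SS_TOTAL)."""
    table = {}
    for source, df, ss in ROWS:
        ms = ss / df
        table[source] = (df, ss, ms, ms / MSE, ss / SS_TOTAL)
    return table


if __name__ == "__main__":
    t = ancova_table()
    for source, (df, ss, ms, f, r2) in t.items():
        print(f"{source:13s} {df:2d} {ss:8.0f} {ms:8.1f} {f:7.1f} {r2:6.1%}")
    p_int = f_sf_df1_2(t["Process*Temp"][3], DF_ERROR)
    print("Process*Temp P =", round(p_int, 4))
```

The recomputed interaction P-value is about 0.043, matching JMP's 0.0431 up to the rounding of the SS values.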
With the interaction removed, > Run fits the reduced model (3) and produces a scatterplot with parallel regression lines for each of the three processes (j = 1, 2, 3). The accompanying statistics include the Ancova table and the Least Squares Means for Process. From the Process hotspot, LS Means Tukey HSD produces the usual multiple-comparison results.

From this we can construct a traditional Anova table and a table of the thickness means by process adjusted for temperature.

Analysis of Covariance (Ancova), reduced model (k = 3, N = 18)

Source      df      SS      MS     F       P        R2
Process      2    4371    2185    41    <0.0001    26%
Temp (C)     1    6011    6011   113    <0.0001    36%
Model        3   16092    5364   101    <0.0001    96%
Error       14     745      53                      4%
C. Total    17   16838                            100%

Notice that
• The least squares means are the treatment mean responses adjusted for the covariate, e.g., the mean thickness (micro-m) by process (1, 2, 3) adjusted for temperature (C).
• The least-squares process means adjusted for temperature are the heights of the parallel regression lines at the mean value of the covariate, temperature, which is 28.3 °C.
• The least-squares (adjusted) means are the estimates of

  $$\mu_j = \mu + \tau_j \qquad (4)$$

  in model (1). The estimate of the grand mean, $\mu$, is the Mean of Response in the Summary of Fit of the JMP Fit Model output. Estimates of the $\tau_j$ are retrieved from Fit Model hotspot > Estimates > Expanded Estimates. See the revised output in Figure 1 and Figure 2.

Table 1. Mean film thickness (micrometers) by process (1, 2, 3) adjusted for temperature (C).

                 Thickness (micro-m)
Process   n   Adjusted mean*     SE
1         6   116.2 b           2.98
2         6   135.9 a           3.06
3         6    95.1 c           3.11

* Adjusted means not followed by the same letter are significantly different at the 0.05 level of significance, experimentwise, using the Tukey-Kramer HSD (Zar, 2010).

Slope Estimation

We can estimate the common slope of the parallel regression lines in the reduced model (3) by the coefficient of the covariate, Temp (C), in the Parameter Estimates section of the Fit Model output.
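How that coefficient, and the adjusted means in Table 1, arise from raw data can be sketched as follows, assuming (process, covariate, response) triples are available. By the Frisch–Waugh–Lovell result, the covariate coefficient of the parallel-lines model equals the pooled within-group slope. Each adjusted mean is then the height of that group's line at the overall covariate mean, $\bar{y}_j - \hat\beta_1(\bar{x}_j - \bar{x})$. The data below are hypothetical, chosen so the answer is easy to verify; they are not the wafer measurements.

```python
# Sketch: common slope and covariate-adjusted (least-squares) means for the
# parallel-lines model (3). HYPOTHETICAL_DATA is invented for illustration;
# it is not the Ott and Longnecker wafer data.
from collections import defaultdict
from statistics import fmean

# (group, covariate x, response y): two groups on exact lines of slope 2
HYPOTHETICAL_DATA = [
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 3, 26), ("B", 4, 28), ("B", 5, 30),
]


def parallel_lines_fit(data):
    """Return (common_slope, {group: adjusted_mean}).

    common_slope = pooled within-group S_xy / pooled within-group S_xx,
    which equals the covariate coefficient in the reduced model (3).
    adjusted mean_j = ybar_j - slope * (xbar_j - xbar), the height of
    group j's fitted line at the overall covariate mean xbar.
    """
    groups = defaultdict(list)
    for g, x, y in data:
        groups[g].append((x, y))

    # Group means of the covariate and the response
    means = {g: (fmean(x for x, _ in pts), fmean(y for _, y in pts))
             for g, pts in groups.items()}

    sxy = sxx = 0.0
    for g, pts in groups.items():
        xbar_g, ybar_g = means[g]
        sxy += sum((x - xbar_g) * (y - ybar_g) for x, y in pts)
        sxx += sum((x - xbar_g) ** 2 for x, _ in pts)
    slope = sxy / sxx

    xbar = fmean(x for _, x, _ in data)  # overall covariate mean
    adjusted = {g: ybar_g - slope * (xbar_g - xbar)
                for g, (xbar_g, ybar_g) in means.items()}
    return slope, adjusted


if __name__ == "__main__":
    slope, adjusted = parallel_lines_fit(HYPOTHETICAL_DATA)
    print("common slope:", slope)
    print("adjusted means:", adjusted)
```

With these hypothetical points the common slope is 2. The raw group means are 14 and 28, a difference of 14. The adjusted means are 16 and 26, a difference of 10. Group B was observed at higher covariate values, and the adjustment removes that advantage. This is the same reason the least squares means differ from the raw means.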
We see that

$$\hat\beta_1 = 3.19 \text{ micrometers/degree C} \qquad (5)$$

The standard error of $\hat\beta_1$ is given there also, as 0.300 micrometers/degree C. To get a confidence interval for the slope, we multiply the standard error by the Student's t critical value with the degrees of freedom of the residual error (14 in the reduced model), then add the result to and subtract it from $\hat\beta_1$. In JMP, we can obtain confidence intervals for all estimates in Fit Model by Hotspot > Regression Reports > Show All Confidence Intervals. See the revised output in Figure 1 and Figure 2.

[Figure 1. Fit Model output for the reduced model after invoking Hotspot > Regression Reports > Show All Confidence Intervals and Hotspot > Estimates > Expanded Estimates]

[Figure 2. Fit Model output for the reduced model after invoking Hotspot > Regression Reports > Show All Confidence Intervals and Hotspot > Estimates > Expanded Estimates]

[Figure 3. Fit Model output for the reduced model after invoking Hotspot > Regression Reports > Show All Confidence Intervals and Hotspot > Estimates > Sequential Tests]

6. Full Report

Methods

The effects of process (1, 2, 3) on mean film thickness (micrometers), adjusted for ambient laboratory temperature (C), are studied by analysis of covariance (Zar, 2010) of a balanced design with 6 wafers per process, 18 wafers in all.

Results

The interactions between process (1, 2, 3) and temperature (C) were marginally significant (P = 0.043). However, they explained only 2% of the variation in film thickness (partial R2), so they were removed from the analysis of covariance model. The effects of process are statistically significant, i.e., mean film thickness adjusted for temperature varied significantly across processes (P < 0.0001). Temperature was also statistically significant (P < 0.0001), accounting for an increase of 3.19 micrometers of film thickness per degree C (SE = 0.300 micrometers/degree C).
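The 95% confidence interval for the common slope described under Slope Estimation follows from the estimate and standard error just quoted. This sketch hard-codes the two-sided 95% Student's t critical value for 14 error degrees of freedom (about 2.145, from a t table), because the Python standard library has no t quantile function. With SciPy, `scipy.stats.t.ppf(0.975, 14)` gives the same value. The result should agree, up to rounding, with the interval that Show All Confidence Intervals adds to the Parameter Estimates.

```python
# Sketch: 95% confidence interval for the common slope beta_1 of the
# reduced model (3), from the estimate and SE reported by JMP.

BETA1_HAT = 3.19   # micrometers per degree C (Parameter Estimates, Temp (C))
SE_BETA1 = 0.300   # standard error of BETA1_HAT
DF_ERROR = 14      # error df of the reduced model
# Two-sided 95% Student's t critical value with 14 df, taken from a t table
# (assumed value; the standard library has no t quantile function).
T_CRIT_975_14 = 2.145


def slope_ci(estimate: float, se: float, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) = estimate -/+ t_crit * se."""
    margin = t_crit * se
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    lo, hi = slope_ci(BETA1_HAT, SE_BETA1, T_CRIT_975_14)
    print(f"95% CI for beta_1 ({DF_ERROR} df): ({lo:.2f}, {hi:.2f}) micrometers/degree C")
```

The interval runs from roughly 2.55 to 3.83 micrometers/degree C. It excludes 0, which is consistent with rejecting $H_{02}$.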
Process explains 26% (partial R2) of the variation in film thickness above and beyond that explained by temperature. Temperature explains 36% of the variation above and beyond that explained by process. The model comprising process and temperature (without the interactions) explains R2 = 96% of the variation in film thickness, leaving only 4% unexplained. See Table 2.

Table 2. Mean film thickness (micrometers) by process (1, 2, 3) adjusted for temperature (C). [This is the same as Table 1]

                 Thickness (micro-m)
Process   n   Adjusted mean*     SE
1         6   116.2 b           2.98
2         6   135.9 a           3.06
3         6    95.1 c           3.11

* Adjusted means not followed by the same letter are significantly different at the 0.05 level of significance, experimentwise, using the Tukey-Kramer HSD (Zar, 2010).

Estimation of non-parallel slopes in the full model if the interactions between treatment and covariate are significant and important

Suppose that, after testing the full model in the film thickness study, we had decided to retain the interaction term, i.e., that the interactions are statistically significant (P = 0.043) and important enough (partial R2 = 2%). Calling 2% important is a bit of a stretch, but let's see how that would have changed the analysis and the full report.

If the interactions are retained, then the regression lines are not parallel, and the adjusted (least-squares) means are irrelevant. The interesting results are the slopes of the lines, which, by virtue of the significant interactions, are significantly different. The estimates of the slopes are derived from the JMP Fit Model > Hotspot > Estimates > Expanded Estimates table. If we right-click on the table and choose Make Into Data Table, we can (i) create a new column and (ii) use its Formula Editor to calculate the slope estimates. Alternatively, we can do so by hand.

Copied from JMP:

Process, j   Parameter   Term                              Estimate   Slope = Temp (C) + Estimate
                                                                      (β_1 + β_1j)
             β_1         Temp (C)                             3.546   n/a
1            β_11        Process[1]*(Temp (C)-28.2778)        0.696   4.242
2            β_12        Process[2]*(Temp (C)-28.2778)        0.238   3.784
3            β_13        Process[3]*(Temp (C)-28.2778)       -0.935   2.611

Results

There are statistically significant interactions between process (1, 2, 3) and temperature (C) (P = 0.043), so the interactions are not removed from the analysis of covariance model. The model comprising process, temperature and their interactions explains R2 = 97% of the variation in film thickness, leaving only 3% unexplained. See Table 3 and Figure 4.

Table 3. Change in film thickness (micrometers/degree C) by process (1, 2, 3)

Process   n   Slope (micrometers/°C)
1         6   4.2
2         6   3.8
3         6   2.6

[Figure 4. Film thickness (micrometers) as a function of temperature (C) for three processes (1, 2, 3)]
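Each value in the Slope column of the table copied from JMP is $\hat\beta_1 + \hat\beta_{1j}$. This sketch reproduces that column by hand from the Expanded Estimates values, as an alternative to the Formula Editor column; it is not JMP output. It also checks the effect-coding constraint $\sum_j \beta_{1j} = 0$. The rounded estimates sum to −0.001.

```python
# Sketch: per-process slopes in the full model (1), beta_1 + beta_1j,
# from the Expanded Estimates values copied from JMP.

BETA1 = 3.546  # Temp (C)
# beta_1j: Process[j]*(Temp (C)-28.2778)
BETA1J = {1: 0.696, 2: 0.238, 3: -0.935}


def process_slopes(beta1: float, beta1j: dict[int, float]) -> dict[int, float]:
    """Slope of process j's line = common slope + process-specific deviation."""
    return {j: beta1 + dev for j, dev in beta1j.items()}


if __name__ == "__main__":
    # Effect coding forces the beta_1j to sum to zero (up to rounding).
    print("sum of beta_1j:", round(sum(BETA1J.values()), 3))
    for j, s in process_slopes(BETA1, BETA1J).items():
        print(f"Process {j}: {s:.3f} micrometers/degree C")
```

Because the $\beta_{1j}$ sum to zero, the average of the three process slopes recovers $\beta_1$, apart from rounding.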