AnCova01

Analysis of Covariance
1. Model
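$$Y_{ij} = \mu_j + (\beta_1 + \beta_{1j})(X_{ij} - \bar{X}) + e_{ij}$$
$$Y_{ij} = \mu + \tau_j + (\beta_1 + \beta_{1j})(X_{ij} - \bar{X}) + e_{ij}$$
$$Y_{ij} = \beta_0 + \tau_j + \beta_1 X_{ij} + \beta_{1j}(X_{ij} - \bar{X}) + e_{ij} \qquad (1)$$
$$j = 1, 2, \ldots, k, \qquad i = 1, 2, \ldots, n_j$$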
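Translation to JMP > Fit Model specification
Math Notation: $Y_{ij} = \beta_0 + \tau_j + \beta_1 X_{ij} + \beta_{1j}(X_{ij} - \bar{X}) + e_{ij}$
JMP Notation: Y = TRT + X + TRT*X        (2)
so that $Y_{ij}$ corresponds to Y, $\tau_j$ to TRT, $\beta_1 X_{ij}$ to X, and $\beta_{1j}(X_{ij} - \bar{X})$ to TRT*X; the intercept $\beta_0$ and the error $e_{ij}$ have no explicit JMP term.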
Variables of Interest
$Y_{ij}$ = Quantitative response to observation i of qualitative treatment level j
$X_{ij}$ = Quantitative covariate of observation i of qualitative treatment level j
j = Qualitative treatment or factor
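Parameters of Interest
$\beta_0$ = intercept, so that the grand mean response is $\mu = \beta_0 + \beta_1\bar{X}$ (see below)
$\tau_j$ = effect of qualitative treatment j, with $\sum_{j=1}^{k} \tau_j = 0$
$\beta_1$ = slope of the regression line if all lines are parallel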
Document1
1
2/8/2016
$\beta_{1j}$ = effect of treatment j on the slope of the regression line, with $\sum_{j=1}^{k} \beta_{1j} = 0$
$\mu = \beta_0 + \beta_1\bar{X}$ = the grand mean
$\mu_j = \mu + \tau_j = \beta_0 + \tau_j + \beta_1\bar{X}$ = the treatment means adjusted for the covariate X
Unobservable Random Variable
$e_{ij}$ = random error or residual of observation i of treatment j
Assume the $e_{ij}$ are distributed
(i) with a mean of 0,
(ii) with a homogeneous standard deviation σ, and
(iii) normally.
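Substituting the definitions of $\mu$ and $\mu_j$ given above into the first form of the model shows that the three forms of equation (1) describe the same model, and that the regression line for treatment j has slope $\beta_1 + \beta_{1j}$:

```latex
\begin{aligned}
Y_{ij} &= \mu_j + (\beta_1 + \beta_{1j})(X_{ij} - \bar{X}) + e_{ij} \\
       &= \mu + \tau_j + (\beta_1 + \beta_{1j})(X_{ij} - \bar{X}) + e_{ij} \\
       &= (\beta_0 + \beta_1\bar{X}) + \tau_j + \beta_1 X_{ij} - \beta_1\bar{X} + \beta_{1j}(X_{ij} - \bar{X}) + e_{ij} \\
       &= \beta_0 + \tau_j + \beta_1 X_{ij} + \beta_{1j}(X_{ij} - \bar{X}) + e_{ij}.
\end{aligned}
```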
2. Hypotheses
1. H01: $\beta_{1j} = 0$ for all j versus H11: $\beta_{1j} \neq 0$ for some j

The term $\beta_{1j}(X_{ij} - \bar{X})$ in the model, equation (1), represented by TRT*X
in the JMP model, equation (2), is an interaction between the qualitative
factor, j, and the quantitative covariate, $X_{ij}$.

If H01 is rejected,
o Stop testing hypotheses and estimate the parameters of the full
model.
o The effects of the qualitative factor vary with the covariate.

If H01 is not rejected,
o adopt the reduced model
$$Y_{ij} = \beta_0 + \tau_j + \beta_1 X_{ij} + e_{ij} \qquad (3)$$
o and test Hypotheses 2 and 3, which follow.
2. H02: $\beta_1 = 0$ versus H12: $\beta_1 \neq 0$
and
3. H03: $\tau_j = 0$ for all j versus H13: $\tau_j \neq 0$ for some j
3. Formulate
Test the hypotheses with F-Tests in Fit Model [see equation (2) above]
Y = Response
Model Effects =

TRT, the qualitative explanatory factor

X, the quantitative explanatory covariate

TRT*X, the interaction between the qualitative factor and the covariate
> OK (with defaults)
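What Fit Model builds from the effects TRT, X, and TRT*X can be sketched as one design-matrix row per observation. This sketch assumes effect coding for the nominal factor (k − 1 columns, with the last level coded −1 in every column) and a covariate centered at its mean inside the crossed term, as the term names such as Process[1]*(Temp (C)-28.2778) in the output later in these notes suggest. The helper `jmp_style_row` is hypothetical, not a JMP function.

```python
def jmp_style_row(level, x, x_bar, k):
    """Return one design-matrix row for Y = TRT + X + TRT*X (plus intercept).

    Hypothetical helper; assumed coding:
      - TRT:   k - 1 effect-coded columns; level k gets -1 in every column
      - X:     the raw (uncentered) covariate value
      - TRT*X: each TRT column multiplied by the centered covariate (x - x_bar)
    """
    if not 1 <= level <= k:
        raise ValueError("level must be in 1..k")
    if level < k:
        trt = [1.0 if col == level else 0.0 for col in range(1, k)]
    else:
        trt = [-1.0] * (k - 1)
    crossed = [c * (x - x_bar) for c in trt]
    return [1.0] + trt + [x] + crossed


# k = 3 processes; 28.2778 is the covariate mean shown in the JMP term names
print(jmp_style_row(1, 30.0, 28.2778, 3))
print(jmp_style_row(3, 30.0, 28.2778, 3))
```

With k = 3, each row has 2k = 6 entries: the intercept plus the 5 model degrees of freedom (2 for TRT, 1 for X, 2 for TRT*X).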
4. Design
α = ___
k = number of treatments (j = 1, 2, …, k) in equation (1).
$n_j$ = number of observations at treatment level j.
$X_{ij}$ = values of the covariate. In an experiment, covariate levels are sometimes set
by the investigator and are sometimes random. In an observational study,
the covariate values are random.
5. Perform the Study and then Analyze the Sample
Example: Ott and Longnecker Exercise 16.5.
Perform an anova on the following study. A researcher wants to evaluate the
difference in the mean film thickness of a coating placed on silicon wafers using
three different processes.(1) Six wafers are randomly assigned to each of the
processes.(2) The film thickness (micro-m) and the temperature (C) in the lab
during the coating process are recorded on each wafer.(3) The researcher is
concerned that fluctuations in temperature may affect the thickness of the
coating.(4) Test whether the processes have a difference in mean.(5)
Note: interpretation by numbered sentences
(1)
Parameters of interest are mean film thickness (units?) for three
processes, $\mu_1$, $\mu_2$, $\mu_3$.

Process (1, 2, 3) appears to be an explanatory factor, k = 3.

Wafer appears to be the observational unit.
(2) N = k n = 3 x 6 = 18. It’s an experiment, and wafer is the experimental unit as well
as the observational unit.
(3) We know

the unit of the response, film thickness, is micrometers

temperature (C) is also an explanatory variable
(4) Temperature (C) is a covariate, i.e., the quantitative version of a qualitative
blocking variable.
(5) We have a research objective.
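Note (2) rests on random assignment of wafers to processes. One way to carry this out for a balanced, completely randomized design is to shuffle the wafer labels and deal them out six per process. The sketch below assumes the wafers are simply labeled 1 to 18; the function name and seed are illustrative.

```python
import random


def randomize_wafers(n_per_process, processes, seed=None):
    """Completely randomized design: shuffle wafer IDs, then deal them out
    in equal blocks of n_per_process, one block per process."""
    rng = random.Random(seed)
    wafers = list(range(1, n_per_process * len(processes) + 1))
    rng.shuffle(wafers)
    return {
        p: sorted(wafers[i * n_per_process:(i + 1) * n_per_process])
        for i, p in enumerate(processes)
    }


assignment = randomize_wafers(6, [1, 2, 3], seed=2016)
for process, wafer_ids in assignment.items():
    print(process, wafer_ids)
```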
We now test Hypothesis 1:
Analyze > Fit Model
> Run
Regression Plot
From this we can construct a traditional Anova Table
Analysis of Covariance (Ancova)
Source             df      SS      MS      F        P      R2
Process (k = 3)     2    4129    2065     56   <.0001     25%
Temp (C)            1    5679    5679    154   <.0001     34%
Process*Temp        2     304     152      4   0.0431      2%
Model               5   16396    3279     89   <.0001     97%
Error              12     441      37                       3%
Total (N = 18)     17   16838                             100%
Notice that

The degrees of freedom are calculated in the usual way.
o The 3 categories of Process have (3 – 1) = 2 degrees of freedom.
o The first-degree (linear) covariate has 1 degree of freedom.
o The interaction has (2 x 1) = 2 degrees of freedom.
o The Model has (2 + 1 + 2) = 5 degrees of freedom.
o The Total has (N – 1) = (18 – 1) = 17 degrees of freedom.
o The Error has (17 – 5) = 12 degrees of freedom.

The sums of squares and partial R2 are not additive, because Process
and Temperature are not orthogonal. Part of what is explained by process
is also explained by temperature, as you can see from the scatterplot.

First we consider the interactions.
o Although the interactions are slightly significant (P = 0.0431), they explain only
(partial R2 =) 2% of the variation above and beyond that which is
explained by Process and Temperature, which are highly
significant and which together explain
$97\% - 2\% = 95\%$
of the total variation in the response, film thickness.
o This justifies dropping the interaction from the model.
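The derived columns of the full-model table follow from its df and SS alone. A minimal sketch, assuming the table's "partial R2" is each source's SS divided by the total SS (an assumption that matches every printed percentage):

```python
# df and SS copied from the full-model Ancova table (k = 3 processes, N = 18)
k, N = 3, 18
total_ss = 16838
error_ss = 441

rows = {
    "Process": (k - 1, 4129),
    "Temp (C)": (1, 5679),                # one linear covariate
    "Process*Temp": ((k - 1) * 1, 304),    # factor df x covariate df
    "Model": ((k - 1) + 1 + (k - 1), 16396),
}
error_df = (N - 1) - rows["Model"][0]
mse = error_ss / error_df

summary = {}
for source, (df, ss) in rows.items():
    ms = ss / df
    summary[source] = {
        "df": df,
        "MS": ms,
        "F": ms / mse,
        "R2": ss / total_ss,  # the table's "partial R2": SS as a share of total SS
    }

for source, s in summary.items():
    print(f"{source:13s} df={s['df']:2d}  MS={s['MS']:8.1f}  "
          f"F={s['F']:6.1f}  R2={s['R2']:.1%}")
print("Error df:", error_df, "  MSE:", round(mse, 2))

# Model SS minus interaction SS: what Process and Temp explain together
without_interaction = (rows["Model"][1] - rows["Process*Temp"][1]) / total_ss
print(f"Without interaction: {without_interaction:.1%}")
```

The F values agree with the table to rounding. Computed from the SS rather than from rounded percentages, the share explained without the interaction is about 95.6%, and 16396 − 304 = 16092 is exactly the Model SS of the reduced-model table that follows.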
To the right of the Fit Model Output we find the Details for Process, the treatment
variable:
Note that

The Least Squares Means are different from the Means

However, the Least Squares Means are not meaningful when there is an
interaction in the model

More about these later
Having decided to drop the interaction from the model, we proceed to the
Tests of Hypotheses 2 and 3
In JMP, we revisit the Model Specification dialog box of Fit Model (shown
above), and remove the Process*Temp (C) interaction effects.
> Run
This fits the reduced model and produces a scatterplot with parallel regression lines for each of the three
processes (j = 1, 2, 3).
The accompanying statistics include the Ancova table and the Least Squares Means
for Process. From the Process hotspot, LS Means Tukey HSD produces the
usual results.
From this we can construct a traditional Anova Table, and a Table of the
Thickness Means by Process Adjusted for Temperature.
Analysis of Covariance (Ancova)
Source              df      SS      MS      F         P      R2
Process (k = 3)      2    4371    2185     41   <0.0001     26%
Temp (C)             1    6011    6011    113   <0.0001     36%
Model                3   16092    5364    101   <0.0001     96%
Error               14     745      53                        4%
C. Total (N = 18)   17   16838                              100%
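Hypothesis 1 can also be checked by comparing the two fitted models directly: the extra error sum of squares left over when TRT*X is dropped, divided by its degrees of freedom and then by the full model's mean square error, is the F statistic for the interaction. The sketch below uses only the Error rows of the full-model table (441 on 12 df) and of the reduced-model table (745 on 14 df); the variable names are illustrative.

```python
# Error SS and df of the two fitted models, as reported in the Ancova tables
sse_full, df_full = 441, 12         # Process, Temp (C), Process*Temp
sse_reduced, df_reduced = 745, 14   # Process, Temp (C)

extra_ss = sse_reduced - sse_full   # SS attributable to Process*Temp
extra_df = df_reduced - df_full     # number of interaction parameters
f_interaction = (extra_ss / extra_df) / (sse_full / df_full)

print("extra SS:", extra_ss, "on", extra_df, "df")
print("F =", round(f_interaction, 2))
```

The extra SS of 304 on 2 df and F of about 4.1 reproduce the Process*Temp row of the full-model table; that SS is the roughly 2% of total variation (304/16838) given up by dropping the interaction.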
Notice that

The Least Squares Means are the treatment mean responses adjusted for
the covariate, e.g., the mean thickness (micro-m) by process (1, 2, 3)
adjusted for temperature (C).

The least-squares process means adjusted for temperature are the
thickness levels on the regression lines at the mean value of the covariate
temperature, which is 28.3 degrees C.
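The statement above can be checked by hand. In the parallel-lines model (3), the least-squares common slope is the pooled within-treatment slope, and each adjusted mean is the treatment mean slid along its line to the overall covariate mean, $\bar{y}_j - \hat{\beta}_1(\bar{x}_j - \bar{x})$. A sketch on a tiny hypothetical data set (not the wafer data); the function name is illustrative:

```python
def adjusted_means(groups):
    """Parallel-lines ANCOVA by hand.

    groups: dict mapping treatment level -> list of (x, y) pairs.
    Returns (common slope, overall covariate mean, {level: adjusted mean}).
    """
    all_x = [x for pts in groups.values() for x, _ in pts]
    x_bar = sum(all_x) / len(all_x)

    sxy = sxx = 0.0
    group_means = {}
    for level, pts in groups.items():
        xm = sum(x for x, _ in pts) / len(pts)
        ym = sum(y for _, y in pts) / len(pts)
        group_means[level] = (xm, ym)
        sxy += sum((x - xm) * (y - ym) for x, y in pts)
        sxx += sum((x - xm) ** 2 for x, _ in pts)
    slope = sxy / sxx  # pooled within-treatment slope

    adjusted = {
        level: ym - slope * (xm - x_bar)  # height of line j at x_bar
        for level, (xm, ym) in group_means.items()
    }
    return slope, x_bar, adjusted


# Hypothetical points lying exactly on y = 10 + 2x and y = 20 + 2x
demo = {1: [(1, 12), (2, 14), (3, 16)], 2: [(3, 26), (4, 28), (5, 30)]}
slope, x_bar, adj = adjusted_means(demo)
print(slope, x_bar, adj)  # -> 2.0 3.0 {1: 16.0, 2: 26.0}
```

The raw treatment means are 14 and 28, but the adjusted means are 16 and 26: treatment 2 was observed at higher covariate values, so part of its raw advantage is credited to the covariate. This is why the Least Squares Means and the Means differ in the Fit Model output.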
The least-squares, i.e., adjusted, means are the estimates of
$$\mu_j = \mu + \tau_j \qquad (4)$$
in model (1). The estimate of the grand mean, $\mu$, is the Mean of Response in the Summary of
Fit of the JMP Fit Model output. Estimates of the $\tau_j$ are retrieved from the Fit Model
Hotspot > Estimates > Expanded Estimates
See the revision in Figure 1 on page 13 and Figure 2 on page 14.
Table 1
Mean film thickness (micrometers) by process (1, 2, 3) adjusted for
temperature (C).
                  Thickness (micro-m)
Process    n    Adjusted Mean*    SE
1          6    116.2b            2.98
2          6    135.9a            3.06
3          6     95.1c            3.11
* Adjusted means not followed by the same superscript are significantly
different at the 0.05 level of significance experimentwise using the
Kramer-Tukey HSD (Zar, 2010).
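Because the $\tau_j$ are constrained to sum to zero, $\hat{\mu}$ is the average of the adjusted means, and in a balanced design such as this one that average also equals the overall Mean of Response. The sketch below recovers approximate $\hat{\mu}$ and $\hat{\tau}_j$ from the rounded adjusted means in Table 1, so its values may differ slightly from JMP's Expanded Estimates.

```python
# Adjusted means from Table 1 (micrometers); balanced design, n_j = 6
adjusted = {1: 116.2, 2: 135.9, 3: 95.1}

# With sum(tau_j) = 0, mu is the average of the adjusted means,
# and tau_j = adjusted mean_j - mu.
mu_hat = sum(adjusted.values()) / len(adjusted)
tau_hat = {j: m - mu_hat for j, m in adjusted.items()}

print(round(mu_hat, 2))
print({j: round(t, 2) for j, t in tau_hat.items()})
```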
Slope Estimation
We can estimate the common slope of the parallel regression lines in the reduced model
(3) by the coefficient of the covariate, which is Temp (C) in the Parameter Estimates
section of the Fit Model output. We see that
$$\hat{\beta}_1 = 3.19 \text{ micrometers/degree C} \qquad (5)$$
The standard error of $\hat{\beta}_1$ is also given there as 0.300 micrometers/degree C.
To get a confidence interval for the slope, we can simply multiply the standard error by the
Student's t critical value with the degrees of freedom of the residual error (14 in the reduced
model), then add this margin to, and subtract it from, $\hat{\beta}_1$. In JMP, we
can obtain confidence intervals for all estimates in Fit Model by
Hotspot > Regression Reports > Show All Confidence Intervals
See the revision in Figure 1 on page 13 and Figure 2 on page 14.
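The 95% interval described above, as a sketch: the critical value $t_{0.975,14} \approx 2.1448$ is hard-coded from a t table (SciPy's `scipy.stats.t.ppf(0.975, 14)` returns the same quantity if SciPy is installed), and the estimate and standard error are the rounded values quoted above, so the last digit may differ from JMP's Show All Confidence Intervals output.

```python
# Reduced-model slope estimate and standard error (micrometers/degree C)
b1, se_b1 = 3.19, 0.300
error_df = 14  # residual-error df of the reduced model

# Two-sided 95% critical value t(0.975, 14), taken from a t table
t_crit = 2.1448

half_width = t_crit * se_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"95% CI for beta_1: ({lower:.2f}, {upper:.2f}) micrometers/degree C")
```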
Figure 1
Fit Model output for the reduced model after invoking Hotspot >
Regression > Show All Confidence Intervals and Hotspot > Estimates
> Expanded Estimates
Figure 2
Fit Model output for the reduced model after invoking Hotspot >
Regression > Show All Confidence Intervals and Hotspot > Estimates
> Expanded Estimates
Figure 3
Fit Model output for the reduced model after invoking Hotspot >
Regression > Show All Confidence Intervals and Hotspot > Estimates
> Sequential Tests
6. Full Report
Methods
The effects of process (1, 2, 3) on mean film thickness (micrometers) adjusted for
ambient laboratory temperature (C) are studied by analysis of covariance (Zar, 2010) of a
balanced design with 6 wafers per process, 18 wafers in all.
Results
The interactions between processes (1, 2, 3) and temperature (C) are only marginally
significant (P = 0.043) and explain only 2% of the variation in film thickness, so they are
removed from the analysis of covariance model. The effects of process are statistically
significant, i.e., mean film thickness adjusted for temperature varied significantly across
processes (P < 0.0001). Temperature was also statistically significant (P < 0.0001),
accounting for an increase of 3.19 micrometers of film thickness per degree C (SE = 0.300
micrometers/degree C). Process explains (partial R2 =) 26% of the variation in film
thickness above and beyond that which is explained by temperature, while temperature
explains 36% of the variation above and beyond that which is explained by process. The
model comprising process and temperature (without the interactions) explains R2 = 96% of
the variation in film thickness, while only 4% is unexplained. See Table 2.
Table 2
Mean film thickness (micrometers) by process (1, 2, 3) adjusted for
temperature (C). [This is the same as Table 1]
                  Thickness (micro-m)
Process    n    Adjusted Mean*    SE
1          6    116.2b            2.98
2          6    135.9a            3.06
3          6     95.1c            3.11
* Adjusted means not followed by the same superscript are significantly
different at the 0.05 level of significance experimentwise using the
Kramer-Tukey HSD (Zar, 2010).
Estimation of non-parallel slopes in the full model if the interactions
between treatment and covariate are significant and important
Suppose that, after testing the full model in the film thickness study, we had decided to
retain the interaction term, i.e., that the interactions are statistically significant (P = 0.04)
and important enough (R2 = 2%). That 2% is important is a bit of a stretch, but let’s see
how that would have changed the analysis and the full report.
If the interactions are retained, then the regression lines are not parallel, and the adjusted
(least-squares) means are irrelevant. The interesting results are the slopes of the lines,
which, by virtue of the significant interactions, are significantly different.
The estimates of the slopes are derived from the JMP
Fit Model > Hotspot > Estimates > Expanded Estimates
table:
If we right-click the table and choose Make Into Data Table, we can (i) create a new column
and (ii) use its Formula Editor to calculate the slope estimates, or do so by hand.
Copied from JMP (Term and Estimate); Slope = Temp (C) Estimate + Estimate
Process j   Parameter       Term                              Estimate   Slope
n/a         $\beta_1$       Temp (C)                          3.546      n/a
1           $\beta_{11}$    Process[1]*(Temp (C)-28.2778)     0.696      4.242
2           $\beta_{12}$    Process[2]*(Temp (C)-28.2778)     0.238      3.784
3           $\beta_{13}$    Process[3]*(Temp (C)-28.2778)     -0.935     2.611
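The arithmetic in the Slope column, as a sketch: each process's slope is the common Temp (C) estimate plus that process's interaction estimate, $\beta_1 + \beta_{1j}$. The variable names are illustrative.

```python
# Expanded Estimates from the full model: Temp (C) and Process[j]*(Temp (C)-28.2778)
beta_1 = 3.546
beta_1j = {1: 0.696, 2: 0.238, 3: -0.935}

slopes = {j: beta_1 + b for j, b in beta_1j.items()}  # slope of line j
print({j: round(s, 3) for j, s in slopes.items()})

# Effect coding makes the interaction effects sum to zero (up to rounding)
print(round(sum(beta_1j.values()), 3))
```

Because the interaction estimates sum to zero (apart from rounding), the full-model Temp (C) estimate, 3.546, is the average of the three process slopes.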
Results
There are statistically significant interactions between processes (1, 2, 3) and temperature
(C) (P = 0.043), so the interactions are not removed from the analysis of covariance model.
The model comprising process, temperature, and their interactions explains R2 = 97%
of the variation in film thickness, while only 3% is unexplained. See Table 3 and Figure
4.
Table 3
Change in film thickness (micrometers/degree C) by process (1, 2, 3)
Process    n    Slope (micrometers/°C)
1          6    4.2
2          6    3.8
3          6    2.6

Figure 4
Film thickness (micrometers) as a function of temperature (C) for
three processes (1, 2, 3)