DOE Tutorial - Winona State University

advertisement
DOE Tutorial – Regression and Analysis of Covariance
Regression Example: Power and Etch Rate (revisited)
Data File: Power-Etch (Ex 3-1).JMP
We have analyzed there data previously by treating the power setting as a categorical
factors with 4 levels using ANOVA. That is analysis is fine, but it does not allow for us
to make inferences about what the etch rate might be if we used a power setting of
210 W. Because the levels of the factor are numeric we could treat them as such and use
regression analysis to develop a model that relates etch rate to the power setting in watts.
In regression we develop models for:
E (Y | X )  the mean of a response Y given X
and
V (Y | X )  the variance of a response Y given X  sometimes denoted Var(Y|X)
Here X represents a set of potential explanatory/predictor variables or factors that we
think might be related to the response Y. Virtually every analysis we will perform in this
class is actually a regression analysis. For example putting the one-way ANOVA (single
factor) model into this regression context we have,
E (Y | X )     i for i  1,..., a
V (Y | X )   2 i.e., the variance is constant and does not depend on X.
To relate a response Y to a single numeric predictor X we often times use a simple linear
regression model. Simple means a single predictor is used and linear means the model is
linear in X.
E (Y | X )   o  1 X
Note:  o  y-intercept and 1  slope of the line
V (Y | X )   2  not required but we only focus on constant variance case here
To fit the simple linear regression model for the power/etch rate experiment make sure
that power level is interpreted as continuous © , then select Analyze > Fit Y by X
placing Etch Rate in the Y box and Power in the X box.
From the Bivariate Fit... pull-down menu select Fit Line to fit the simple linear
regression model for these data. This process will fit the following model to our data.
yi   o  1 xi   i i  1,...., n Note: here n = 20 the total number of observations.
The estimate regression line is
yˆ i  ˆo  ˆ1 xi  137.62 + 2.527 xi
1
Does a line fit these data well?
It does not appear so, it almost
looks the etch rate is a quadratic
function of the power supplied.
Summary Statistics and Tests Results for the Fitted Simple Linear Regression Model
R-square – is the percent of variation in response
that can be explained by the fitted model.
( R 2  88.4% )
Lack of Fit Test – is statistical test used to
determine if assumed model is adequately fits the
data. Here the p-value = .0279 which suggests that
this does not adequately model the relationship
between the etch rate and the power setting.
ANOVA table – in the regression setting this pvalue is used to answer the question: Does the
current model explain a significant amount of
variation in the response? Here the answer is yes
(p < .0001), but this does not mean that this is the
“best” or even an appropriate model.
Parameter p-values – come from testing whether
the population parameter is different from 0. Here
we conclude neither parameter is 0.
The residual plot clearly shows that our model is deficient, as there is an obvious
nonlinear pattern in the residuals. Basically, this plot suggests that we have fit a line to a
curvilinear relationship. To obtain this residual plot select Plot Residuals from the
Linear Fit pull-down menu located below the scatterplot.
A better model for the etch rate as a function of the power setting may be quadratic in
power level, i.e.
E (Y | X )   o  1 X   2 X 2
2
To fit the quadratic model
2
yi   o  1 xi   2 xi   i i  1,..., n
in JMP select Fit Polynomial > 2, quadratic from the Bivariate Fit pull-down menu.
The results of this fit are shown below.
Summary Statistics and Test Results for Quadratic Model
R-square – the quadratic model explain 92% of the
variation in the observed etch rates.
Lack of Fit test – the lack of fit test provides no
evidence against the quadratic model.
ANOVA table – yes this model is useful also
(p < .0001).
The fitted model is given by...
yˆ i  123.43  2.527 xi  .028375( xi  190) 2
Notice the quadratic term is mean centered. This prevents
the estimated coefficient from being small, but does not
change the performance of the model in way. We can use
this equation to estimate etch rate for virtually any power
setting, but we should restrict our predictions to be within the
range of the settings used during the experiment, namely
between 160 W and 220 W.
The residual plot still looks like it has some nonrandom
structure. We could potentially consider using a higher
degree polynomial model. The general philosophy in model
building is that of parsimony, i.e. use the simplest model that
adequately models or explains a given situation. Here the
quadratic model seems sufficient.
3
Example 2 - Analysis of Covariance – Monofilament Fiber Strength
(Example 15-5 pg. 578)
Data File: Fiber-Stength.JMP
The variables in this data file are:
 Machine – machine used make fiber (the effect of interest)
 Diameter – diameter of fiber (covariate, varies from sample to sample)
 Breaking Strength – response (lbs.)
To visualize the analysis of covariance process select Fit Y by X from the Analyze menu
and place Diameter in the X box and Breaking Strength in the Y box. This will create
a scatter plot of tensile strength vs. glue thickness. To color code the points according to
the machine that produced the fiber select Color/Marker by Col... from the Rows menu,
highlight Machine, and the click OK. Then select Grouping Variable... from the
Bivariate Fit... pull-down menu. Highlight Machine as the variable to group by in the
window that opens. Finally select Fit Line from the Fitting menu. This will add a
separate linear fit for breaking strength vs. fiber diameter for each machine. The
resulting plot is shown below.
There does not appear to be any major differences between the machines, however
machine 2 looks like it might produce stronger fibers than the rest. Keep in mind that
there are very few replicates for each machine so differences will have to be “large” to be
statistically significant! Clearly fiber diameter is related to breaking strength.
To perform a formal analysis of covariance of the machine and diameter effects we need
to use the Fit Model approach. To do this in JMP select Fit Model from the Analyze
menu and place Breaking Strength in the Y box and both Machine and Diameter in the
Effects in Model box. The relevant computer output is shown below.
4
Fitting an ANCOVA model
To fit lines with different intercepts and
slopes (i.e. unrelated lines) we highlight both
the factor and the covariate then select Full
Factorial from the Macros pull-down menu
as shown to the left.
The text only discusses the equal slope case
(i.e. parallel lines) in which case we simply
add both Machine and Diameter to the Model
Effects box, but NOT the interaction term.
This means we do not mess with the Macro
pull-down menu at all.
The Effect Test output from the running the “unrelated lines” model is shown below.
From the p-value for the Machine*Diameter interaction (p = .6293) we see that a parallel
line model would be sufficient for these data.
Fitting the parallel line ANCOVA model we have the following output.
These plots give a visualization of the individual effect tests by showing
the relationship between breaking strength and the effect adjusting for
the other. Here we see that Machine as almost no effect after taking
diameter into account, while the opposite is true for the fiber diameter.
The p-values for the effect test are reported below the plots.
The p-value for diameter (p < .0001) indicates obvious significance. This should not be
surprising given the obvious relationship between breaking strength and fiber diameter
exhibited in the scatterplot above. There is little separation between the machine-
5
dependent regression lines suggesting that the machine effect is not large. The p-value for
machine is .1181 providing very weak evidence of a machine differences. Keep in mind
the number of replicates is small so the observed differences would have to be “large”
before they would be detected as significant. With enough replicates we would
eventually conclude that there are machine differences, but given what we see here it is
unlikely they are large enough to be of practical interest.
As with any analysis were a model is fit you should examine the residuals to check for
any assumption violations. A plot of the residuals vs. the fitted values from this model is
shown below, with the exception of the one case with a small predicted breaking strength
nothing stands out as unusual here.
To assess normality first select Save Studentized Residual (i.e., standardized residuals)
from Response Breaking Stength > Save Columns pull-out menu and then use
Distribution to examine their distribution via a histogram and normal quantile plot.
The residuals appear to be approximately normally distributed.
Note: The ANCOVA models
yij     i   ( xij  x )   ij
i  1,..., a
j  1,..., n
This model stated simply says that...
Observed response = overall mean + treatment effect + covariate effect + random error
For separate slopes model we have,
yij     i   i ( xij  x )   ij  notice the slope now has treatment subscript
6
Download