Hsuhl DAEClass Chap3

Chapter 3 Analysis of Variance
(ANOVA)
許湘伶
Applied Linear Regression Models
(Kutner, Nachtsheim, Neter, Li)
Chapter 16
Design and Analysis of Experiments
(Douglas C. Montgomery)
hsuhl (NUK)
DAE Chap. 3
1 / 46
Part I
Supplement
Relation between Regression and Analysis of Variance
Regression model:
$$y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i, \qquad i = 1, \ldots, n$$
ANOVA model (one-way model):
$$y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n$$
Relation between Regression and Analysis of Variance (cont.)
Analysis of variance models differ from ordinary regression models in
two key respects:
1. The explanatory or predictor variables in ANOVA models may be qualitative.
2. If the predictor variables are quantitative, no assumption is made in ANOVA models about the nature of the statistical relation between the Xs and Y.
Relation between Regression and Analysis of Variance (cont.)
When the qualitative predictors are coded as indicator (0/1) variables in a regression model, the regression results are identical to those obtained with the ANOVA model.
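The equivalence between indicator-variable regression and ANOVA can be checked numerically. The sketch below uses a small made-up data set (the values and the names `dat`, `x_B`, `x_C` are illustrative assumptions, not from the slides) and compares the overall F statistic from the two fits.

```r
# Hypothetical data: three treatment groups, four observations each.
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 4)),
  y = c(10, 12, 11, 13, 15, 17, 16, 18, 20, 19, 22, 21)
)

# Regression with explicit 0/1 indicator variables (group A is the baseline).
dat$x_B <- as.numeric(dat$group == "B")
dat$x_C <- as.numeric(dat$group == "C")
fit_reg <- lm(y ~ x_B + x_C, data = dat)

# One-way ANOVA on the same data.
fit_aov <- aov(y ~ group, data = dat)

# Overall F statistic from each fit; the two values should agree.
f_reg <- unname(summary(fit_reg)$fstatistic["value"])
f_aov <- anova(fit_aov)[["F value"]][1]
c(regression = f_reg, anova = f_aov)
```

The fitted values of both models are the group averages, which is why the sums of squares and the F statistic coincide.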
Relation between Regression and Analysis of Variance (cont.)
Figure: Regression model
Figure 16.4: Illustration of the partitioning of total deviations $Y_{ij} - \bar{Y}_{\cdot\cdot}$
Part II
Chapter 3 The Analysis of Variance
Outline
1. Example
2. The analysis of variance
3. Analysis of the fixed effects model
4. Model adequacy checking
5. Practical interpretation of results
6. Sample computer output
7. Determining sample size
8. Other examples of single-factor experiments
9. The random effects model
10. The regression approach to the ANOVA
11. Nonparametric methods in the ANOVA
Example
Methods for the design and analysis of single-factor experiments with a levels of the factor (or a treatments)
Assume: completely randomized design
Experimental unit: wafer
Relationship: RF power setting vs. the etch rate
- RF power: 4 levels: 160, 180, 200, 220 W
- Etch rate: how quickly material is removed from the wafer surface
n = 5 replicates: 20 runs in random order
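A completely randomized run order for this layout can be generated as follows. This is a minimal sketch; the seed and the names `design` and `run_order` are arbitrary choices made for illustration.

```r
set.seed(2024)  # arbitrary seed, only so the sketch is reproducible

power_levels <- c(160, 180, 200, 220)  # RF power settings (W)
n_rep <- 5                              # replicates per level

# One row per run: 4 levels x 5 replicates = 20 runs.
design <- data.frame(RF = rep(power_levels, each = n_rep))

# Assign a random permutation of 1..20 as the run order, then sort by it.
design$run_order <- sample(nrow(design))
design <- design[order(design$run_order), ]
head(design)
```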
Example (cont.)
There is no strong evidence to suggest that the variability in the etch rate around the average depends on the power setting.
Test: differences between the mean etch rates at the a = 4 levels of RF power
1. t-tests for all six possible pairs of means: inflates the type I error
2. The analysis of variance
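The size of the type I error inflation can be illustrated with a rough calculation. The sketch assumes the six pairwise tests are independent, which is not exactly true for t-tests sharing the same data, so the result is only an approximation of the overall error rate.

```r
a <- 4          # number of treatments
alpha <- 0.05   # per-test significance level

n_pairs <- choose(a, 2)   # number of pairwise comparisons

# Probability of at least one false rejection if the tests were independent.
familywise <- 1 - (1 - alpha)^n_pairs
c(pairs = n_pairs, familywise_error = familywise)
```

Under that independence assumption the overall error rate is roughly 0.26 instead of the nominal 0.05, which motivates a single overall test.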
ANOVA
a treatments of a single factor
yij : the jth observation taken under treatment i
means model:
$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
ANOVA (cont.)
Model:
$$y_{ij} = \mu_i + \varepsilon_{ij} \quad \text{(means model)}$$
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij} \quad \text{(effects model)}$$
$$i = 1, 2, \ldots, a, \qquad j = 1, 2, \ldots, n$$
yij : the (i, j)th observation
µi : the mean of the ith factor level
µ: overall mean
τi : the ith treatment effect
εij : the random error component; sources of variability:
- measurement
- variability from uncontrolled factors
- differences between the experimental units
- noise in the process
ANOVA (cont.)
$$y_{ij} = \mu_i + \varepsilon_{ij} \;\text{(means model)} \;=\; \mu + \tau_i + \varepsilon_{ij} \;\text{(effects model)}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
- both are linear statistical models
- one-way or single-factor analysis of variance model
- the effects model is more widely encountered in the experimental design literature
Objectives:
- test hypotheses about the treatment means
- estimate the model parameters (µ, τi , σ²)
$$\varepsilon_{ij} \sim NID(0, \sigma^2) \;\Rightarrow\; y_{ij} \sim N(\mu + \tau_i, \sigma^2)$$
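To make the effects model concrete, one can simulate observations from it. In the sketch below every numeric value (`mu`, `tau`, `sigma`) is a hypothetical choice for illustration; the treatment effects are chosen to sum to zero.

```r
set.seed(42)  # arbitrary seed for reproducibility

a <- 4                          # number of treatments
n <- 5                          # replicates per treatment
mu <- 600                       # hypothetical overall mean
tau <- c(-60, -30, 10, 80)      # hypothetical treatment effects, sum to zero
sigma <- 20                     # hypothetical error standard deviation

trt <- factor(rep(seq_len(a), each = n))

# y_ij = mu + tau_i + eps_ij with eps_ij ~ NID(0, sigma^2)
y <- mu + tau[as.integer(trt)] + rnorm(a * n, mean = 0, sd = sigma)

# Treatment averages should scatter around mu + tau_i.
tapply(y, trt, mean)
```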
ANOVA (cont.)
fixed effects model: the a treatments are specifically chosen by the experimenter
random effects model (components of variance model) (Sec. 3.9; Chap. 13): the a treatments are a random sample from a larger population of treatments
Notation
ȳi· : the average of the observations under the ith treatment
y·· : the grand total of all the observations
ȳ·· : the grand average of all the observations
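yi· : the total of the observations under the ith treatment
$$y_{i\cdot} = \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{i\cdot} = y_{i\cdot}/n, \qquad i = 1, 2, \ldots, a$$
$$y_{\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{\cdot\cdot} = y_{\cdot\cdot}/N, \qquad N = an$$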
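The dot notation maps directly onto grouped sums and means. The raw etch-rate observations are not listed in these slides; the 20 values below are an assumption taken from the textbook's plasma etching example, and they reproduce the overall mean (617.75) quoted later in the slides.

```r
# Assumed plasma etching data (not listed in the slides).
etch <- data.frame(
  RF = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
n <- 5
N <- nrow(etch)

y_i_dot <- tapply(etch$rate, etch$RF, sum)   # treatment totals  y_i.
ybar_i_dot <- y_i_dot / n                    # treatment averages
y_dot_dot <- sum(etch$rate)                  # grand total       y_..
ybar_dot_dot <- y_dot_dot / N                # grand average

ybar_i_dot
ybar_dot_dot
```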
Testing
Testing the equality of the a treatment means, E(yij ) = µ + τi = µi :
Hypotheses:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \qquad H_a: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$
or, equivalently,
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \qquad H_a: \tau_i \neq 0 \text{ for at least one } i$$
because
$$\frac{\sum_{i=1}^{a} \mu_i}{a} = \mu \iff \sum_{i=1}^{a} \tau_i = 0$$
The appropriate procedure for testing the equality of a treatment means is the analysis of variance.
Decomposition of the Total Sum of Squares
ANOVA: derived from a partitioning of total variability into its
component parts
1. SST : the total corrected sum of squares
2. SSTreatment : the sum of squares due to treatments (between treatments)
3. SSE : the sum of squares due to error (within treatments)
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 = SS_{Treatment} + SS_E$$
Degrees of freedom: (N − 1) = (a − 1) + (N − a)
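The identity follows from adding and subtracting the treatment average inside the square; the step that makes the cross-product term vanish can be written out as below (a sketch of the standard argument, using the notation defined earlier).

```latex
\begin{align*}
\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij}-\bar{y}_{\cdot\cdot})^2
  &= \sum_{i=1}^{a}\sum_{j=1}^{n}
     \bigl[(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}) + (y_{ij}-\bar{y}_{i\cdot})\bigr]^2 \\
  &= n\sum_{i=1}^{a} (\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot})^2
     + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij}-\bar{y}_{i\cdot})^2
     + 2\sum_{i=1}^{a} (\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot})
        \sum_{j=1}^{n} (y_{ij}-\bar{y}_{i\cdot}), \\
\sum_{j=1}^{n} (y_{ij}-\bar{y}_{i\cdot})
  &= y_{i\cdot} - n\,\bar{y}_{i\cdot} = 0
  \quad\text{for every } i .
\end{align*}
```

Because the inner sum is zero for each treatment, the cross-product term drops out and only the two sums of squares remain.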
Decomposition of the Total Sum of Squares (cont.)
Total variability can be partitioned into:
1. the total corrected sum of squares
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$
2. a sum of squares of the differences between the treatment averages and the grand average
$$SS_{Treatment} = n\sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \frac{1}{n}\sum_{i=1}^{a} y_{i\cdot}^2 - \frac{y_{\cdot\cdot}^2}{N}$$
3. a sum of squares of the differences of observations within treatments from the treatment average
$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 = SS_T - SS_{Treatment}$$
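The computing formulas can be evaluated directly. The sketch below again assumes the textbook's plasma etching observations (they are not listed in the slides); with those values the results agree with the sums of squares shown in the R output later in the deck.

```r
# Assumed plasma etching data (not listed in the slides).
etch <- data.frame(
  RF = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
y <- etch$rate
g <- factor(etch$RF)
n <- 5
N <- length(y)

correction <- sum(y)^2 / N                                  # y..^2 / N
ss_total <- sum(y^2) - correction                           # SS_T
ss_treatment <- sum(tapply(y, g, sum)^2) / n - correction   # SS_Treatment
ss_error <- ss_total - ss_treatment                         # SS_E by subtraction

c(SST = ss_total, SSTreatment = ss_treatment, SSE = ss_error)
```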
Decomposition of the Total Sum of Squares (cont.)
1. $SS_E/(N-a)$: a pooled estimate of the common variance within each of the a treatments
2. $SS_{Treatment}/(a-1)$: an estimate of σ² if the µi are all equal
3. ANOVA identity: provides two estimates of σ²
Decomposition of the Total Sum of Squares (cont.)
Error mean square (MSE):
1. $MS_E = \dfrac{SS_E}{N-a}$
2. $E(MS_E) = \sigma^2$
Treatment mean square:
1. $MS_{Treatment} = \dfrac{SS_{Treatment}}{a-1}$
2. $E(MS_{Treatment}) = \sigma^2 + \dfrac{n\sum_{i=1}^{a} \tau_i^2}{a-1}$
3. if there are no differences in treatment means (i.e. all τi = 0), MSTreatment also estimates σ²
A test of the hypothesis of no difference in treatment means can be performed by comparing MSTreatment and MSE.
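The two expected mean squares can be checked by simulation. This is a Monte Carlo sketch with hypothetical parameter values (`mu`, `sigma`, `tau` are all assumptions made for illustration); the simulated averages should land close to the theoretical values but will not match them exactly.

```r
set.seed(1)  # arbitrary seed for reproducibility

a <- 4; n <- 5; N <- a * n
mu <- 50; sigma <- 10
tau <- c(-10, -5, 5, 10)   # hypothetical effects, sum to zero

one_experiment <- function() {
  # Rows are treatments: tau is recycled down each column of the matrix.
  y <- matrix(mu + tau + rnorm(N, sd = sigma), nrow = a)
  ybar_i <- rowMeans(y)
  ms_treatment <- n * sum((ybar_i - mean(y))^2) / (a - 1)
  ms_error <- sum((y - ybar_i)^2) / (N - a)
  c(MSTreatment = ms_treatment, MSE = ms_error)
}

sims <- replicate(5000, one_experiment())

expected <- c(MSTreatment = sigma^2 + n * sum(tau^2) / (a - 1),
              MSE = sigma^2)
rbind(simulated = rowMeans(sims), expected = expected)
```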
Statistical Analysis
Assumptions
$$\varepsilon_{ij} \sim NID(0, \sigma^2) \;\Rightarrow\; y_{ij} \sim NID(\mu + \tau_i, \sigma^2)$$
Cochran’s Theorem
SST : a sum of squares in normally distributed r.v.
1. $SS_T/\sigma^2 \sim \chi^2_{N-1}$
2. $SS_{Treatment}/\sigma^2 \sim \chi^2_{a-1}$ if H0 : τi = 0 is true
3. $SS_E/\sigma^2 \sim \chi^2_{N-a}$
4. $SS_{Treatment}/\sigma^2$ and $SS_E/\sigma^2$ are independent χ² r.v.
⇒ test statistic:
$$F_0 = \frac{SS_{Treatment}/(a-1)}{SS_E/(N-a)} = \frac{MS_{Treatment}}{MS_E} \overset{H_0}{\sim} F_{a-1,\,N-a}$$
Statistical Analysis (cont.)
Cochran’s Theorem
Let Zi ∼ NID(0, 1), i = 1, . . . , ν, and
$$\sum_{i=1}^{\nu} Z_i^2 = \sum_{i=1}^{s} Q_i,$$
where s ≤ ν and Qi has νi d.f. (i = 1, . . . , s). Then Qi , i = 1, . . . , s, are independent $\chi^2_{\nu_i}$ r.v. if and only if
$$\nu = \sum_{i=1}^{s} \nu_i$$
Statistical Analysis (cont.)
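If H0 is false, E(MSTreatment ) > E(MSE ), so F0 tends to be large
⇒ reject H0 if F0 is too large, i.e., F0 > Fα,a−1,N−a
ANOVA table:
Source of variation    Sum of squares   d.f.     Mean square    F0
Between treatments     SSTreatment      a − 1    MSTreatment    MSTreatment / MSE
Error (within)         SSE              N − a    MSE
Total                  SST              N − 1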
Statistical Analysis (cont.)
The Plasma Etching Experiment
H0 : µ1 = µ2 = µ3 = µ4 vs. H1 : some means are different
Statistical Analysis (cont.)
## ANOVA table
etch$FRF <- as.factor(etch$RF)            # treat RF power as a qualitative factor
etch.aov <- aov(rate ~ FRF, data = etch)  # one-way fixed effects ANOVA
summary(etch.aov)
            Df   Sum Sq  Mean Sq F value Pr(>F)
FRF          3 66870.55 22290.18   66.80 0.0000
Residuals   16  5339.20   333.70
F0 = 66.80 > F(0.99, 3, 16) = 5.29, so H0 is rejected at α = 0.01
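The critical value and p-value can be obtained from the F distribution functions in base R. The sketch starts from the sums of squares printed in the ANOVA table above; the variable names are illustrative.

```r
ss_treatment <- 66870.55   # from the ANOVA table
ss_error <- 5339.20        # from the ANOVA table
a <- 4                     # treatments
N <- 20                    # total runs

f0 <- (ss_treatment / (a - 1)) / (ss_error / (N - a))     # test statistic
f_crit <- qf(0.99, df1 = a - 1, df2 = N - a)              # upper 1% point
p_value <- pf(f0, df1 = a - 1, df2 = N - a, lower.tail = FALSE)

c(F0 = f0, F_crit = f_crit, p_value = p_value)
```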
Estimation of the Model Parameters
Model:
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n$$
Parameters: µ, τi , σ²
Estimates:
- overall mean: µ̂ = ȳ··
- treatment effect: τ̂i = ȳi· − ȳ·· , i = 1, . . . , a
- µi : µ̂i = µ̂ + τ̂i = ȳi·
- σ² : σ̂² = MSE
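Estimation of the Model Parameters (cont.)
$$\varepsilon_{ij} \sim NID(0, \sigma^2) \;\Rightarrow\; \bar{y}_{i\cdot} \sim N(\mu_i, \sigma^2/n)$$
100(1 − α)% confidence interval for a treatment mean µi :
$$\bar{y}_{i\cdot} - t_{\alpha/2,\,N-a}\sqrt{\frac{MS_E}{n}} \;\le\; \mu_i \;\le\; \bar{y}_{i\cdot} + t_{\alpha/2,\,N-a}\sqrt{\frac{MS_E}{n}}$$
100(1 − α)% confidence interval for a difference µi − µj :
$$\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - t_{\alpha/2,\,N-a}\sqrt{\frac{2MS_E}{n}} \;\le\; \mu_i - \mu_j \;\le\; \bar{y}_{i\cdot} - \bar{y}_{j\cdot} + t_{\alpha/2,\,N-a}\sqrt{\frac{2MS_E}{n}}$$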
Estimation of the Model Parameters (cont.)
Ex 3.3
overall mean: µ̂ = 617.75
treatment effects:
i            1        2        3       4
RF power     160      180      200     220
τ̂i          -66.55   -30.35   7.65    89.25
95% confidence interval for µ4 (one-at-a-time):
689.6815 ≤ µ4 ≤ 724.3185
Bonferroni method: for r simultaneous intervals, use level α/(2r) in each tail, i.e. replace tα/2,N−a with tα/(2r),N−a
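The interval for µ4 can be reproduced from quantities already on the slides: µ̂ = 617.75, τ̂4 = 89.25 and MSE = 333.70. The Bonferroni half-width for r = 4 simultaneous intervals is included as an illustration; the choice r = 4 is an assumption, not something the slides fix.

```r
ms_e <- 333.70           # MSE from the ANOVA table
n <- 5                   # replicates per treatment
a <- 4
N <- 20
alpha <- 0.05

ybar_4 <- 617.75 + 89.25  # mu-hat + tau-hat_4 = treatment average at 220 W

# One-at-a-time 95% confidence interval for mu_4.
half_width <- qt(1 - alpha / 2, df = N - a) * sqrt(ms_e / n)
ci_mu4 <- ybar_4 + c(-1, 1) * half_width

# Bonferroni-adjusted half-width for r simultaneous intervals (r assumed).
r <- 4
half_width_bonf <- qt(1 - alpha / (2 * r), df = N - a) * sqrt(ms_e / n)

ci_mu4
c(one_at_a_time = half_width, bonferroni = half_width_bonf)
```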
Unbalanced data
ni observations under treatment i (i = 1, . . . , a)
$N = \sum_{i=1}^{a} n_i$ : total sample size
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$
$$SS_{Treatment} = \sum_{i=1}^{a} \frac{y_{i\cdot}^2}{n_i} - \frac{y_{\cdot\cdot}^2}{N}$$
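The unbalanced formulas can be checked against R's own ANOVA table. The data below are made up purely for illustration (three groups with 3, 4 and 2 observations).

```r
# Hypothetical unbalanced data.
y <- c(12, 15, 11, 20, 22, 19, 24, 30, 28)
g <- factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C"))
N <- length(y)

totals <- tapply(y, g, sum)      # y_i.
n_i <- tapply(y, g, length)      # n_i

ss_total <- sum(y^2) - sum(y)^2 / N
ss_treatment <- sum(totals^2 / n_i) - sum(y)^2 / N
ss_error <- ss_total - ss_treatment

# Reference values from the fitted one-way model.
tab <- anova(lm(y ~ g))
rbind(by_hand = c(ss_treatment, ss_error), from_lm = tab[["Sum Sq"]])
```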
Model Adequacy Checking
ŷij : estimate (fitted value) of yij
$$\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}$$
residual eij : used to investigate violations of the basic assumptions and model adequacy
$$e_{ij} = y_{ij} - \hat{y}_{ij}$$
- The checking should be automatic
- Model is adequate ⇒ the eij should be structureless
- graphical analysis
- how to deal with commonly occurring abnormalities
standardized residual: $d_{ij} = \dfrac{e_{ij}}{\sqrt{MS_E}}$
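Residuals and standardized residuals follow directly from the treatment averages. The sketch assumes the textbook's plasma etching observations (not listed in the slides) and compares the hand computation with `residuals()` from the fitted model.

```r
# Assumed plasma etching data (not listed in the slides).
etch <- data.frame(
  RF = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
a <- 4
N <- nrow(etch)

fitted_by_hand <- ave(etch$rate, etch$RF)   # y-hat_ij = treatment average
e <- etch$rate - fitted_by_hand             # residuals e_ij
ms_e <- sum(e^2) / (N - a)                  # MSE
d <- e / sqrt(ms_e)                         # standardized residuals d_ij

fit <- aov(rate ~ factor(RF), data = etch)
range(d)
```

Standardized residuals well inside ±3 give no indication of outliers.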
Model Adequacy Checking (cont.)
Residual plot
## Residual plot
opar <- par(mfrow = c(2, 2), cex = .8)  # 2 x 2 panel layout; save old settings
plot(etch.aov)                          # standard diagnostic plots for the fit
par(opar)                               # restore the previous settings
Model Adequacy Checking (cont.)
eij vs. time (run order): checks the independence assumption
eij vs. ŷij : checks for nonconstant variance; if present, consider a variance-stabilizing transformation
Statistical Tests for Equality of Variance
Bartlett’s test:
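$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_a^2$$
Ha : above not true for at least one σi²
Test statistic:
$$\chi_0^2 = 2.3026\,\frac{q}{c} \overset{H_0}{\sim} \chi^2_{a-1}$$
$$q = (N - a)\log_{10} S_p^2 - \sum_{i=1}^{a} (n_i - 1)\log_{10} S_i^2$$
$$c = 1 + \frac{1}{3(a-1)}\left(\sum_{i=1}^{a} (n_i - 1)^{-1} - (N - a)^{-1}\right)$$
$$S_p^2 = \frac{\sum_{i=1}^{a} (n_i - 1) S_i^2}{N - a}$$
where Si² is the sample variance of the ith treatment.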
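Bartlett's statistic can be computed step by step and compared with `bartlett.test()`. The sketch assumes the textbook's plasma etching observations (not listed in the slides) and uses `log(10)` in place of the rounded constant 2.3026.

```r
# Assumed plasma etching data (not listed in the slides).
etch <- data.frame(
  RF = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
y <- etch$rate
g <- factor(etch$RF)

a <- nlevels(g)
n_i <- tapply(y, g, length)
N <- sum(n_i)
s2_i <- tapply(y, g, var)                       # S_i^2
s2_p <- sum((n_i - 1) * s2_i) / (N - a)         # pooled variance S_p^2

q <- (N - a) * log10(s2_p) - sum((n_i - 1) * log10(s2_i))
c_factor <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - a)) / (3 * (a - 1))
chi0 <- log(10) * q / c_factor                  # log(10) is about 2.3026

reference <- unname(bartlett.test(y, g)$statistic)
c(by_hand = chi0, bartlett.test = reference)
```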
Statistical Tests for Equality of Variance (cont.)
Reject H0 if χ0² > χ²α,a−1
very sensitive to the normality assumption
> bartlett.test(rate ~ RF, data = etch)
Bartlett test of homogeneity of variances
data: rate by RF
Bartlett’s K-squared = 0.4335, df = 3, p-value = 0.9332
> qchisq(0.95, 3)
[1] 7.814728
Since 0.4335 < 7.8147, H0 is not rejected at α = 0.05.
Statistical Tests for Equality of Variance (cont.)
Modified Levene test:
robust to departures from normality
uses the absolute deviation of yij from the treatment median ỹi· :
$$d_{ij} = |y_{ij} - \tilde{y}_{i\cdot}|, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
The test statistic for Levene’s test is simply the usual ANOVA F statistic for testing equality of means, applied to the absolute deviations dij .
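Because the modified Levene statistic is just an ANOVA F on absolute deviations from the group medians, it can be computed with base R alone. The sketch below applies it to the assumed plasma etching observations (not listed in the slides), not to the peak discharge data.

```r
# Assumed plasma etching data (not listed in the slides).
etch <- data.frame(
  RF = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)
y <- etch$rate
g <- factor(etch$RF)

# Absolute deviations from each treatment median.
d <- abs(y - ave(y, g, FUN = median))

# Usual one-way ANOVA F test applied to the deviations.
levene_tab <- anova(lm(d ~ g))
f_levene <- levene_tab[["F value"]][1]
p_levene <- levene_tab[["Pr(>F)"]][1]
c(F = f_levene, p_value = p_levene)
```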
Statistical Tests for Equality of Variance (cont.)
Peak Discharge Data
Statistical Tests for Equality of Variance (cont.)
> library(car)   # leveneTest() is provided by the car package
> peak.aov <- aov(Observ ~ as.factor(Method), data = peak)
> summary(peak.aov)
                  Df Sum Sq Mean Sq F value   Pr(>F)
as.factor(Method)  3  708.3   236.1   76.07 4.11e-11 ***
Residuals         20   62.1     3.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> leveneTest(peak$Observ, as.factor(peak$Method))
Levene’s Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)
group  3  4.5684 0.01357 *
      20
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Statistical Tests for Equality of Variance (cont.)
Transformation: $y^*_{ij} = \sqrt{y_{ij}}$
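Why a square-root transformation can stabilize the variance is easiest to see on simulated data. The sketch assumes Poisson-distributed observations, for which the variance equals the mean; this distributional assumption is made only for illustration and is not stated in the slides.

```r
set.seed(7)  # arbitrary seed for reproducibility

lambda <- c(4, 16, 64)   # hypothetical group means
n <- 5000                # large n so the sample variances are stable

g <- factor(rep(lambda, each = n))
y <- rpois(length(g), lambda = rep(lambda, each = n))

var_raw <- tapply(y, g, var)          # grows with the mean
var_sqrt <- tapply(sqrt(y), g, var)   # roughly constant after transformation

rbind(raw = var_raw, sqrt_transformed = var_sqrt)
```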
Statistical Tests for Equality of Variance (cont.)
Formal method: Box-Cox Method
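## Box-Cox Method
library(MASS)
# Profile log-likelihood for the power-transformation parameter lambda;
# Method is treated as a factor so that the one-way model is fitted.
boxcox(Observ ~ as.factor(Method), data = peak, lambda = seq(-1, 1, length.out = 10))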
Comparing Among Treatment Means
ANOVA:
- rejecting H0 ⇒ there are differences between the treatment means
- exactly which means differ is not specified
- ⇒ use multiple comparison methods
$$\bar{y}_{i\cdot} \sim N(\mu_i, \sigma^2/n), \qquad \hat{\sigma}^2 = MS_E$$
⇒ µ1 ≠ µ2 ≠ µ3 ≠ µ4
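One simple multiple comparison approach is pairwise t-tests with a Bonferroni adjustment and the pooled variance estimate. This is only one of several possible methods, chosen here as an illustration; the data are again the assumed textbook plasma etching observations, which are not listed in the slides.

```r
# Assumed plasma etching data (not listed in the slides).
etch <- data.frame(
  RF = rep(c(160, 180, 200, 220), each = 5),
  rate = c(575, 542, 530, 539, 570,
           565, 593, 590, 579, 610,
           600, 651, 610, 637, 629,
           725, 700, 715, 685, 710)
)

# All pairwise comparisons, pooled SD (i.e. sqrt(MSE)), Bonferroni-adjusted.
pw <- pairwise.t.test(etch$rate, factor(etch$RF),
                      p.adjust.method = "bonferroni", pool.sd = TRUE)

# Lower-triangular matrix of adjusted p-values, one per pair of power levels.
pw$p.value
```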