The Analysis of Variance

advertisement
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
1
What If There Are More Than
Two Factor Levels?
• The t-test does not directly apply
• There are lots of practical situations where there are
either more than two levels of interest, or there are
several factors of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate
analysis “engine” for these types of experiments
• The ANOVA was developed by Fisher in the early
1920s, and initially applied to agricultural experiments
• Used extensively today for industrial experiments
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
2
An Example (See pg. 61)
• An engineer is interested in investigating the relationship
between the RF power setting and the etch rate for this tool. The
objective of an experiment like this is to model the relationship
between etch rate and RF power, and to specify the power
setting that will give a desired target etch rate.
• The response variable is etch rate.
• She is interested in a particular gas (C2F6) and gap (0.80 cm),
and wants to test four levels of RF power: 160W, 180W, 200W,
and 220W. She decided to test five wafers at each level of RF
power.
• The experimenter chooses 4 levels of RF power 160W, 180W,
200W, and 220W
• The experiment is replicated 5 times – runs made in random
order
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
3
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
4
An Example (See pg. 62)
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
5
• Does changing the power change the
mean etch rate?
• Is there an optimum level for power?
• We would like to have an objective
way to answer these questions
• The t-test really doesn’t apply here –
more than two factor levels
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
6
The Analysis of Variance (Sec. 3.2, pg. 62)
• In general, there will be a levels of the factor, or a treatments,
and n replicates of the experiment, run in random order…a
completely randomized design (CRD)
• N = an total runs
• We consider the fixed effects case…the random effects case
will be discussed later
• Objective is to test hypotheses about the equality of the a
treatment means
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
7
The Analysis of Variance
• The name “analysis of variance” stems from a
partitioning of the total variability in the
response variable into components that are
consistent with a model for the experiment
• The basic single-factor ANOVA model is
 i  1, 2,..., a
yij     i   ij , 
 j  1, 2,..., n
  an overall mean,  i  ith treatment effect,
 ij  experimental error, NID(0,  2 )
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
8
Models for the Data
There are several ways to write a model
for the data:
yij     i   ij is called the effects model
Let i     i , then
yij  i   ij is called the means model
Regression models can also be employed
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
9
Efeitos fixos ou efeitos aleatórios?
• O modelo estatístico Yij =+i +ij pode descrever duas
situações diferentes com respeito aos efeitos de
tratamento.
• Primeiro, os a tratamentos poderiam ser
especificamente escolhidos pelo experimentador.
• Nessa situação, deseja-se testar hipóteses sobre as
médias de tratamento, e nossas conclusões se aplicam
somente aos níveis do fator considerados na análise.
• As conclusões não podem ser estendidas para
tratamentos similares que não foram explicitamente
considerados.
• Podemos também estimar os parâmetros do modelo
(,i,2).
• Esse é o chamado modelo de efeitos fixos.
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
10
Efeitos fixos ou efeitos aleatórios?
• Alternativamente, os a tratamentos poderiam
representar uma amostra aleatória de uma população
maior de tratamentos.
• Nessa situação, deveríamos ser capazes de estender as
conclusões – baseadas numa amostra de tratamentos para todos os tratamentos na população, tendo eles sido
explicitamente considerados ou não na análise.
• Os i’s são variáveis aleatórias.
• Nesse caso testamos hipóteses sobre a variabilidade
dos i’s e tentamos estimar essa variabilidade.
• Modelo de efeitos aleatórios ou modelo de componentes
de variância (Capítulo 13).
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
11
The Analysis of Variance of the
Fixed Effects Model
• Total variability is measured by the total
sum of squares:
a
n
SST   ( yij  y.. )
2
i 1 j 1
• The basic ANOVA partitioning is:
a
n
a
n
2
(
y

y
)

[(
y

y
)

(
y

y
)]
 ij ..  i. .. ij i.
2
i 1 j 1
i 1 j 1
a
a
n
 n ( yi.  y.. ) 2   ( yij  yi. ) 2
i 1
i 1 j 1
SST  SSTreatments  SS E
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
12
The Analysis of Variance
SST  SSTreatments  SSE
• A large value of SSTreatments reflects large differences in
treatment means
• A small value of SSTreatments likely indicates no
differences in treatment means
• Formal statistical hypotheses are:
H 0 : 1  2 
 a
H1 : At least one mean is different
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
13
The Analysis of Variance
• While sums of squares cannot be directly compared
to test the hypothesis of equal means, mean
squares can be compared.
• A mean square is a sum of squares divided by its
degrees of freedom:
dfTotal  dfTreatments  df Error
an  1  a  1  a(n  1)
SSTreatments
SS E
MSTreatments 
, MS E 
a 1
a(n  1)
• If the treatment means are equal, the treatment and
error mean squares will be (theoretically) equal.
• If treatment means differ, the treatment mean square
will be larger than the error mean square.
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
14
Teorema de Cochran
Z i ~ NID(0,1) para i  1,2,...,n
n
Z
i 1
2
i
 Q1  Q2  ...  Qv , com ν  n
e Q j t em n j graus de liberdade ( j  1,..., ).
Então,
Q1 , Q2 ,...,Q são independentement edist ribuídas com Q j ~  n2j
se, e somentese,
n  n1  n2  ...  nv
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
15
The Analysis of Variance is
Summarized in a Table
• Computing…see text, pp 69
• The reference distribution for F0 is the Fa-1, a(n-1) distribution
• Reject the null hypothesis (equal treatment means) if
F0  F ,a1,a( n1)
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
16
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
17
ANOVA Table
Example 3-1
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
18
The Reference Distribution:
P-value
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
19
ANOVA calculations are usually done via
computer
• Text exhibits sample calculations from
three very popular software packages,
Design-Expert, JMP and Minitab
• See pages 98-100
• Text discusses some of the summary
statistics provided by these packages
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
20
ANOVA NO R
• No R está disponível a função aov(analysis of variance).
aov(stats)
R Documentation
Fit an Analysis of Variance Model
Description
Fit an analysis of variance model by a call to lm for each stratum.
Usage
aov(formula, data = NULL, projections = FALSE, qr = TRUE,
contrasts = NULL, ...)
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
21
formula
a formula specifying the model.
data
A data frame in which the variables specified in the
formula will be found. If missing, the variables are
searched for in the standard way.
projections
Logical flag: should the projections be returned?
qr
Logical flag: should the QR decomposition be
returned?
contrasts
A list of contrasts to be used for some of the factors in
the formula. These are not used for any Error term,
and supplying contrasts for factors only in the Error
term will give a warning.
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
22
dados=read.table(“m://flavia//etche.txt”,header=T)
boxplot(dados$y~dados$rf)
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
23
summary(aov(dados$y~dados$rf))
Df Sum Sq Mean Sq F value Pr(>F)
dados$rf
3 66871 22290
66.797 2.883e-09 ***
Residuals 16 5339
334
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
24
Estimação dos parâmetros do modelo
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
25
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
26
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
27
Intervalos de confiança simultâneos
• As expressões dadas em 3.12 e 3.13 valem separadamente. Isto é,
o nível de confiança 1- aplica-se somente a uma particular
estimativa.
• Porém, em muitos problemas, o experimentador pode desejar
calcular vários intervalos de confiança.
• Se existem r intervalos de nível 1- , então a probabilidade de que
esses r intervalos sejam válidos conjuntamente é pelo menos 1-r .
• O número de intervalos simultâneos, r, não deve ser muito grande.
• Por exemplo, se r=5 e =0,05, o intervalo simultâneo teria confiança
de pelo menos 75% e se r=10, com o mesmo , o nível seria de
pelo menos 50%.
• Uma abordagem que assegura que o nível de confiança simultâneo
não seja tão pequeno é substituir o /2 por /(2r) em cada uma das
experssões 3.12, 3.13.
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
28
Intervalos de confiança simultâneos
• Esse método é conhecido como método de Bonferroni.
• Essencialmente, se existe um conjunto de r intervalos de
confiança a serem construídos o método propõe
substituir α/2 por α/(2r).
• Isso produz um conjunto de r intervalos de confiança
para os quais o nível de confiança simultâneo é pelo
menos 100(1-α)%.
• A justificativa para o método é devida à desigualdade de
Bonferroni:
r
 r

P Eic   1 


i

1



Chapter 3
 PE 
i
i 1
Design & Analysis of Experiments
7E 2009 Montgomery
29
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
30
Model Adequacy Checking in the ANOVA
Text reference, Section 3.4, pg. 75
•
•
•
•
•
•
Checking assumptions is important
Normality
Constant variance
Independence
Have we fit the right model?
Later we will talk about what to do if
some of these assumptions are
violated
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
31
Model Adequacy Checking in the ANOVA
• Examination of
residuals (see text, Sec.
3-4, pg. 75)
eij  yij  yˆij
 yij  yi.
• Computer software
generates the residuals
• Residual plots are very
useful
• Normal probability plot
of residuals
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
32
Other Important Residual Plots
Chapter 3
Design & Analysis of Experiments
7E 2009 Montgomery
33
Download