The Simplest Possible Experiment*

advertisement
Handout 4: The Statistical Model
Example 4.1 Consider the following etch rate data. The goal is to make comparisons across the
power settings.
Outcomes in Run Order, i.e. random order
Outcomes in Standard Order
Summaries for this data.
Overall Average
Average for each power setting
1
The Statistical Model
In the last handout, we carried out the ANOVA and intuitively obtained measures of error by
computing the sums of squares “by hand.” These calculations are made much simpler using the
framework for a general linear model. The general linear model approach does require the use
of matrices and some linear algebra operations.
In the previous handout, the goal was to compare different levels of a single factor. One way to
write the statistical model for our example is as follows:
y ij  μ  τ i  ε ij ,
where


i=1, 2, 3, and 4 identifies the treatment level, and
j=1, 2, 3, 4, 5 denotes the replicate.
Identify the meaning of each of the terms in the model.
y ij :
μ:
τi :
ε ij :
2
The Model in Matrix Notation
The easiest way to work with such a model is through the use of matrices. The model for our
simple example can be expressed in matrix form as follows.
𝑦11
𝜇
𝜏1
𝜀11
𝑦12
𝜏1
𝜀12
𝜇
𝑦13
𝜇
𝜏1
𝜀13
𝑦14
𝜏1
𝜀14
𝜇
𝑦15
𝜇
𝜏1
𝜀15
𝑦21
𝜏2
𝜀21
𝜇
𝑦22
𝜇
𝜏2
𝜀22
𝑦23
𝜏2
𝜀23
𝜇
𝑦24
𝜇
𝜏2
𝜀24
𝑦25
𝜏2
𝜀25
𝜇
𝑦31 = 𝜇 + 𝜏3 + 𝜀31
𝑦32
𝜇
𝜏3
𝜀32
𝑦33
𝜇
𝜏3
𝜀33
𝑦34
𝜇
𝜏3
𝜀34
𝑦35
𝜏3
𝜀35
𝜇
𝑦41
𝜇
𝜏4
𝜀41
𝑦42
𝜏4
𝜀42
𝜇
𝑦43
𝜇
𝜏4
𝜀43
𝑦44
𝜏4
𝜀44
𝜇
[𝑦45 ] [𝜇] [𝜏4 ] [𝜀45 ]
which is equivalent to
3
Here, y represents the response vector and X represents the design matrix. The estimated
model parameters can be obtained as follows -- this approach to estimating the model
parameter is very general and in fact works for any linear model.
𝑀𝑜𝑑𝑒𝑙 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑠 = (X′ X)−1 X′ y
Compute (X ′ X)−1
Problem: The columns of X ′ X are not linearly independent; thus, the inverse of X ′ X cannot be
computed.
Solution: We need to re-parameterize the model so that the model parameters can be
estimated. Consider the following re-parameterization. This re-parameterization utilizes the
“Sum to Zero” restriction 𝜏1 + 𝜏2 + 𝜏3 + 𝜏4 = 0.
Given this parameterization,
𝜏4 = −(𝜏1 + 𝜏2 + 𝜏3 )
Design matrix using the sum-to-zero parameterization.
4
Computing these estimates in Minitab
Select Stat > ANOVA > General Linear Model.
5
Getting the Predicted Outcomes using our General Linear Model
The predicted response vector from our simply contains the mean for each treatment level.
Compute the predicted response vector for our data.
̂, is given by
Predicted response, often identified as 𝒚
̂
𝒚
= 𝑿(𝑿′ 𝑿)−𝟏 𝑿′ 𝒀
(𝑿′ 𝑿)−𝟏 𝑿′ 𝒀
= 𝑿⏟
𝝁
̂
̂
= 𝑿𝝁
μ̂
𝜏̂1
= 𝑿[ ]
𝜏̂ 2
𝜏̂ 3
For our design matrix and estimated vector of model coefficients we have the following
predicted outcomes.
𝑦̂11
1
𝑦̂12
1
𝑦̂13
1
𝑦̂14
1
𝑦̂15
1
𝑦̂21
1
𝑦̂22
1
𝑦̂23
1
𝑦̂24
1
𝑦̂25
1
̂=
𝒚
=
𝑦̂31
1
𝑦̂32
1
1
𝑦̂33
1
𝑦̂34
1
𝑦̂35
1
𝑦̂41
1
𝑦̂42
1
𝑦̂43
1
𝑦̂44
[1
[𝑦̂45 ]
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
−1
−1
−1
−1
−1
0
0
0
0
0
1
1
1
1
1
0
0
0
0
0
−1
−1
−1
−1
−1
0
551.2
0
551.2
0
551.2
0
551.2
0
551.2
0
587.4
0
587.4
0
587.4
0 617.75
587.4
0 −66.55
587.4
[
]=
1 −30.35
625.4
1
7.65
625.4
1
625.4
1
625.4
1
625.4
−1
707.0
−1
707.0
−1
707.0
−1
707.0
[707.0]
−1]
6
Getting the Residuals, i.e. Errors, using our General Linear Model
The amount of error present in our model is simply the difference between the original
response vector and the predicted response vector. The residual vector is computed as follows.
̂)
𝒓̂ = (𝒚 − 𝒚
Getting the residual vector for our example.
23.8
575
551.2
−21.2
530
551.2
−9.2
542
551.2
−12.2
539
551.2
18.8
570
551.2
22.6
610
587.4
5.6
593
587.4
−22.4
565
587.4
2.6
590
587.4
−8.4
579
587.4
𝒓̂ =
−
=
−15.4
610
625.4
25.6
651
625.4
−25.4
600
625.4
3.6
629
625.4
11.6
637
625.4
8.0
715
707.0
18.0
725
707.0
3.0
710
707.0
−7.0
700
707.0
[685] [707.0] [−22.0]
The sums-of-squared error can easily be computed as follows
𝑆𝑆𝐸 = 𝒓̂′ 𝒓̂
The average amount of squared error can be computed as
𝜎̂ 2 = 𝑀𝑆𝐸 =
𝑆𝑆𝐸
𝒓̂′ 𝒓̂
=
𝑑𝑓𝑒𝑟𝑟𝑜𝑟 𝑑𝑓𝑒𝑟𝑟𝑜𝑟
For our example, we get
𝜎̂ 2 =
𝒓̂′ 𝒓̂
5339
=
= 334
𝑑𝑓𝑒𝑟𝑟𝑜𝑟
16
7
The Standard Error of Model Estimates
The estimate of 𝜎̂ 2 is used in computing the standard errors of the estimated model
parameters. The standard error estimates of our model parameters are necessary for
hypothesis testing and for computing confidence intervals (which we used in the previous
handout).
The variance-covariance matrix of the estimated parameter vector is given by
̂ ) = 𝜎̂ 2 (𝑿′ 𝑿)−𝟏
Var(𝝁
Computing the variance of the estimated model parameters for our example.
μ̂
𝜏̂1
̂ ) = 𝑉𝑎𝑟 ([ ]) = 𝜎̂ 2 (𝑿′ 𝑿)−1
Var(𝝁
𝜏̂ 2
𝜏̂ 3
20 0
0
0 −𝟏
0 10 5
5
= 334 [
]
0
5 10 5
0
5
5 10
0.05
0
0
0
0
0.15 −0.05 −0.05
= 334 [
]
0
−0.05 0.15 −0.05
0
−0.05 −0.05 0.15
The standard errors for the estimated model parameters obtained from Minitab.
The diagonal elements of this matrix are the variances for each of the estimated model
parameters. The off-diagonal elements are the covariances. A nonzero covariance implies that
variation in of the estimated model parameters influences the variation in another. The design
matrix results in, i.e. causes, a covariance structure amongst the 𝜏̂ ′𝑠.
This covariance structure implies that the design choice will impact our ability to estimate one
treatment level effect independent of another. Designs that permit one treatment level effect
to be estimated independent of another are often considered optimal designs.
8
Next, we will consider the standard error necessary to compare across treatment levels.
̂ ) = 𝒄[σ
Var(𝒄 ∗ 𝝁
̂2 (𝑿′ 𝑿)−1 ] 𝒄′
The following output from Minitab contains the standard error Difference quantities for the
treatment level comparisons.
Next, we will confirm the calculations for one of these standard errors, say the comparison for
Power Setting 180 against Power Setting 160.

Computing the Difference of Means
𝐿𝑆𝑀𝐸𝐴𝑁180 − 𝐿𝑆𝑀𝐸𝐴𝑁160

=
=
=
=
(𝜇̂ + 𝜏̂ 2 ) − (𝜇̂ + 𝜏̂1 )
(𝜏̂ 2 − 𝜏̂1 )
−30.35 − (−66.55)
36.20
The above Difference in Means is simply a linear combination, say 𝒄, of the estimated
̂ . In particular,
parameter vector, 𝝁
μ̂
𝜏̂
̂ = [0 −1 1 0] ∗ [ 1 ] = (𝜏̂ 2 − 𝜏̂1 ) = 36.20
𝒄∗𝝁
𝜏̂ 2
𝜏̂ 3
9

The associated SE of Difference can be computed using the fact that
̂ ) = 𝒄[σ
Var(𝒄 ∗ 𝝁
̂2 (𝑿′ 𝑿)−1 ] 𝒄′
with 𝐜 = [0 −1 1
̂ ) = 𝑉𝑎𝑟([0
Var(𝒄 ∗ 𝝁
0] yields
̂ ) = [0 −1 1
−1 1 0] ∗ 𝝁
= [0 −1 1
0
−1
̂) ∗ [ ]
0] ∗ 𝑉𝑎𝑟(𝝁
1
0
0
0.05
0
0
0
−1
0
0.15
−0.05
−0.05
]∗[ ]
0] ∗ 334 [
1
0
−0.05 0.15 −0.05
0
0
−0.05 −0.05 0.15
= 133.60
The standard error is simply the square root of this quantity.
𝑆𝐸 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒
̂)
= √Var(𝒄 ∗ 𝝁
= √133.60
= 11.55
10
Example 4.2 A manufacturer of television sets is interested in the effect on tube connectivity of
four different types of coating for color picture tubes. A completely randomized experiment is
conducted and the following conductivity data are obtained.
Complete the following:
1. Write out the general linear model (using matrix notation) for this data. In particular,
clearly specify the response vector y, the design matrix X, the vector of model
parameters , and the error vector . You should use the set-to-zero parameterization
for the model as in done in Minitab.
11
Consider the following Minitab output from an appropriate analysis.
2. Verify ❶. That is, use your design matrix X and the response vector y to compute the
estimated model parameters.
̂ = (X ′ X)−1 X′ y
𝝁
3. Use your estimated model parameters to verify at least one of the Least Square Means
given in ❷.
4. Compute the estimated residual vector, 𝒓̂, or your model. (See page 7).
5. Verify ❸. In particular, use your residual vector to compute Adj MS.
6. Use the following to verify ❹on the Minitab output.
σˆ  MSE 
SSE
dferror
12
7. Use your design matrix X and
̂ ) = 𝜎̂ 2 (𝑿′ 𝑿)−𝟏
Var(𝝁
to verify at least one of the quantities in ❺.
8. Verify at least one of the quantities in ❻. You will need to use the following
relationship.
̂ ) = 𝒄[σ
Var(𝒄 ∗ 𝝁
̂2 (𝑿′ 𝑿)−1 ] 𝒄′
Consider the following output from the pairwise comparison portion of the Minitab output.
9. Verify ❶. What is the appropriate c vector for this calculation?
10. Verify ❷.
13
Download