Two-way MANOVA

• We now consider designs with two factors. Factor 1 has g
levels and factor 2 has b levels.
• If $X_{ikr}$ is the p × 1 vector of measurements on the rth unit
in the ith level of factor 1 and the kth level of factor 2, the model is
$$X_{ikr} = \mu + \alpha_i + \tau_k + \gamma_{ik} + e_{ikr},$$
with i = 1, ..., g, k = 1, ..., b and r = 1, ..., n, where all terms are p × 1
vectors.
• Here $\gamma_{ik}$ is a vector of interaction effects.
• To simplify notation, we assume that there are n units in
each of the gb combinations of factor levels, but everything
we will say holds as well for the case where there are $n_{ik}$ units
in the (i, k)th factor combination.
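• The vector of measurements taken on the r-th unit in the
treatment group distinguished by the i-th level of factor 1
and the k-th level of factor 2 can be expressed as
$$
\begin{bmatrix} X_{ikr1} \\ X_{ikr2} \\ \vdots \\ X_{ikrp} \end{bmatrix}
=
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix}
+
\begin{bmatrix} \alpha_{i1} \\ \alpha_{i2} \\ \vdots \\ \alpha_{ip} \end{bmatrix}
+
\begin{bmatrix} \tau_{k1} \\ \tau_{k2} \\ \vdots \\ \tau_{kp} \end{bmatrix}
+
\begin{bmatrix} \gamma_{ik1} \\ \gamma_{ik2} \\ \vdots \\ \gamma_{ikp} \end{bmatrix}
+
\begin{bmatrix} e_{ikr1} \\ e_{ikr2} \\ \vdots \\ e_{ikrp} \end{bmatrix},
$$
where $e_{ikr} \sim NID_p(0, \Sigma)$, that is, the error vectors are independent p-variate normal with mean vector 0 and covariance matrix Σ.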
• We will impose the SAS restrictions:
$\alpha_{g\ell} = 0$ for $\ell = 1, 2, \ldots, p$
$\tau_{b\ell} = 0$ for $\ell = 1, 2, \ldots, p$
$\gamma_{gk\ell} = 0$ for $k = 1, 2, \ldots, b$ and $\ell = 1, 2, \ldots, p$
$\gamma_{ib\ell} = 0$ for $i = 1, 2, \ldots, g$ and $\ell = 1, 2, \ldots, p$
• By making the responses from different units the rows of a
response matrix, the model for the entire set of data can be
written in matrix notation as follows.
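$$
\begin{bmatrix}
X_{1111} & X_{1112} & \cdots & X_{111p} \\
X_{1121} & X_{1122} & \cdots & X_{112p} \\
\vdots & \vdots & & \vdots \\
X_{gbn1} & X_{gbn2} & \cdots & X_{gbnp}
\end{bmatrix}
= A
\begin{bmatrix}
\mu_1 & \mu_2 & \cdots & \mu_p \\
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1p} \\
\vdots & \vdots & & \vdots \\
\alpha_{g-1,1} & \alpha_{g-1,2} & \cdots & \alpha_{g-1,p} \\
\tau_{11} & \tau_{12} & \cdots & \tau_{1p} \\
\vdots & \vdots & & \vdots \\
\tau_{b-1,1} & \tau_{b-1,2} & \cdots & \tau_{b-1,p} \\
\gamma_{111} & \gamma_{112} & \cdots & \gamma_{11p} \\
\vdots & \vdots & & \vdots \\
\gamma_{g-1,b-1,1} & \gamma_{g-1,b-1,2} & \cdots & \gamma_{g-1,b-1,p}
\end{bmatrix}
+ E
$$
Here A is the gbn × gb design matrix of 0s and 1s: a column of 1s for the mean, followed by indicator columns for levels 1, ..., g − 1 of factor 1, for levels 1, ..., b − 1 of factor 2, and for the (g − 1)(b − 1) interaction combinations. The rows of the gbn × p error matrix E are the transposed error vectors $e_{ikr}'$.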
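To make the 0/1 design matrix concrete, the following minimal Python sketch builds it under the SAS restrictions (effects of the last level of each factor set to zero). The function name `design_matrix` and the column ordering (mean, factor 1, factor 2, interaction) are illustrative assumptions, not part of the original notes; the dimensions g = 3, b = 2, n = 2 are those of the peanut example discussed later.

```python
from itertools import product

def design_matrix(g, b, n):
    """Design matrix for the two-way model with last-level effects set to zero.

    Rows follow the order of the response matrix: unit r varies fastest,
    then level k of factor 2, then level i of factor 1.
    Columns: mean, factor-1 indicators (levels 1..g-1),
    factor-2 indicators (levels 1..b-1), interaction products.
    """
    rows = []
    for i, k, _r in product(range(1, g + 1), range(1, b + 1), range(n)):
        alpha = [1 if i == level else 0 for level in range(1, g)]
        tau = [1 if k == level else 0 for level in range(1, b)]
        gamma = [a * t for a in alpha for t in tau]
        rows.append([1] + alpha + tau + gamma)
    return rows

A = design_matrix(g=3, b=2, n=2)
for row in A:
    print(row)
```

Each row has 1 + (g − 1) + (b − 1) + (g − 1)(b − 1) = gb entries, so the matrix has as many columns as there are factor level combinations.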
• If the interaction is non-zero, then factor effects are not additive and the effect of one factor may depend on the level
of the other factor.
• See next figure: in the top panel the effect of factor 2 depends
on the level of factor 1. When factor 1 is at level 2, there is relatively little effect of factor 2 on the outcome variable.
• The bottom panel shows parallel profiles, an example with
no interaction.
Sum of squares and cross-product matrices
• As in the one-factor model, we can decompose the overall
variability into different sources. Note that
$$x_{ikr} - \bar{x} = (\bar{x}_{i\cdot} - \bar{x}) + (\bar{x}_{\cdot k} - \bar{x}) + (\bar{x}_{ik} - \bar{x}_{i\cdot} - \bar{x}_{\cdot k} + \bar{x}) + (x_{ikr} - \bar{x}_{ik}),$$
where $\bar{x}$ is the overall p × 1 mean vector, $\bar{x}_{i\cdot}$ is the p × 1 mean vector of observations at the ith level
of factor 1, $\bar{x}_{\cdot k}$ is the p × 1 mean vector of observations at the
kth level of factor 2 and $\bar{x}_{ik}$ is the p × 1 mean vector of observations at the ith level of factor 1 and the kth level of
factor 2.
• Multiplying both sides of the expression above by the corresponding transposed vectors and summing over r, k and i, the cross-product terms vanish and we get the
usual decomposition.
Source        SS and CP matrix   Degrees of freedom
Factor 1      $SSP_{fac1}$       g − 1
Factor 2      $SSP_{fac2}$       b − 1
Interaction   $SSP_{int}$        (g − 1)(b − 1)
Residual      $SSE$              gb(n − 1)
Corr. total   $SSP_{ctot}$       gbn − 1

Expressions for the various SS and CP matrices are given next.
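$$SSP_{fac1} = \sum_{i=1}^{g} bn\,(\bar{x}_{i\cdot} - \bar{x})(\bar{x}_{i\cdot} - \bar{x})'$$
$$SSP_{fac2} = \sum_{k=1}^{b} gn\,(\bar{x}_{\cdot k} - \bar{x})(\bar{x}_{\cdot k} - \bar{x})'$$
$$SSP_{int} = \sum_{i=1}^{g}\sum_{k=1}^{b} n\,(\bar{x}_{ik} - \bar{x}_{i\cdot} - \bar{x}_{\cdot k} + \bar{x})(\bar{x}_{ik} - \bar{x}_{i\cdot} - \bar{x}_{\cdot k} + \bar{x})'$$
$$SSE = \sum_{i=1}^{g}\sum_{k=1}^{b}\sum_{r=1}^{n} (x_{ikr} - \bar{x}_{ik})(x_{ikr} - \bar{x}_{ik})'$$
$$SSP_{ctot} = \sum_{i=1}^{g}\sum_{k=1}^{b}\sum_{r=1}^{n} (x_{ikr} - \bar{x})(x_{ikr} - \bar{x})'$$
All matrices are p × p dimensional.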
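As a rough numerical check of these formulas, the sketch below evaluates the five SS and CP matrices for the peanut data listed in the example later in these notes, taking variety as factor 1 and location as factor 2. It is a plain standard-library Python illustration, not the computation SAS performs internally; the helper names `mean`, `sscp` and `det3` are made up for this example. It also forms the ratio |SSE| / |SSP_int + SSE|, which comes out at about 0.0743, in line with the Wilks' Lambda value reported for location*variety in the SAS output.

```python
# Peanut data as listed in the SAS output of these notes:
# (location, variety, [x1, x2, x3])
OBS = [
    (1, 5, [195.3, 153.1, 51.4]), (1, 5, [194.3, 167.7, 53.7]),
    (2, 5, [189.7, 139.5, 55.5]), (2, 5, [180.4, 121.1, 44.4]),
    (1, 6, [203.0, 156.8, 49.8]), (1, 6, [195.9, 166.0, 45.8]),
    (2, 6, [202.7, 166.1, 60.4]), (2, 6, [197.6, 161.8, 54.1]),
    (1, 8, [193.5, 164.5, 57.8]), (1, 8, [187.0, 165.1, 58.6]),
    (2, 8, [201.5, 166.8, 65.0]), (2, 8, [200.0, 173.8, 67.2]),
]
P = 3  # number of response variables

def mean(vectors):
    """Componentwise mean of a collection of length-P vectors."""
    vectors = list(vectors)
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(P)]

def sscp(devs):
    """Sum of outer products d d' over an iterable of deviation vectors."""
    total = [[0.0] * P for _ in range(P)]
    for d in devs:
        for a in range(P):
            for c in range(P):
                total[a][c] += d[a] * d[c]
    return total

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

grand = mean(x for _, _, x in OBS)
var_mean = {v: mean(x for _, w, x in OBS if w == v) for v in {w for _, w, _ in OBS}}
loc_mean = {l: mean(x for m, _, x in OBS if m == l) for l in {m for m, _, _ in OBS}}
cell_mean = {(l, v): mean(x for m, w, x in OBS if (m, w) == (l, v))
             for l in loc_mean for v in var_mean}

# Summing over all gbn observations supplies the weights bn, gn and n
# that appear in the formulas (this shortcut relies on the balanced design).
ssp_fac1 = sscp([var_mean[v][j] - grand[j] for j in range(P)] for _, v, _ in OBS)
ssp_fac2 = sscp([loc_mean[l][j] - grand[j] for j in range(P)] for l, _, _ in OBS)
ssp_int = sscp([cell_mean[l, v][j] - loc_mean[l][j] - var_mean[v][j] + grand[j]
                for j in range(P)] for l, v, _ in OBS)
sse = sscp([x[j] - cell_mean[l, v][j] for j in range(P)] for l, v, x in OBS)
ssp_ctot = sscp([x[j] - grand[j] for j in range(P)] for _, _, x in OBS)

h_plus_e = [[ssp_int[a][c] + sse[a][c] for c in range(P)] for a in range(P)]
wilks_int = det3(sse) / det3(h_plus_e)

print("SSE diagonal:", [round(sse[j][j], 4) for j in range(P)])
print("Wilks' Lambda for interaction:", round(wilks_int, 5))
```

The diagonal elements of each matrix are the univariate sums of squares, so they can be compared directly with the "Dependent Variable" tables in the SAS output.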
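Testing hypotheses in the two-way model
• We first test for the presence or absence of interaction
effects:
$$H_0: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{gb} = 0,$$
versus $H_1$: at least one $\gamma_{ik} \neq 0$.
• The test is based on Wilks' Λ statistic
$$\Lambda = \frac{|SSE|}{|SSP_{int} + SSE|}.$$
• A multiple of −ln Λ has an asymptotic χ² distribution under H0, so for large samples we reject H0 at level α if
$$-\left[ gb(n-1) - \frac{p + 1 - (g-1)(b-1)}{2} \right] \ln \Lambda \ge \chi^2_{p(g-1)(b-1)}(\alpha),$$
where $\chi^2_{p(g-1)(b-1)}(\alpha)$ is the upper α quantile of the chi-square distribution with p(g − 1)(b − 1) degrees of freedom.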
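As a small worked illustration of the rejection rule, the sketch below plugs in the dimensions of the peanut example from later in these notes (g = 3, b = 2, n = 2, p = 3) and the Wilks' Lambda value 0.07430 reported by SAS for the interaction. The function name `bartlett_statistic` is an illustrative choice, and 12.592 is the tabulated upper 5% point of the chi-square distribution with 6 degrees of freedom.

```python
import math

def bartlett_statistic(wilks, p, error_df, hyp_df):
    """Scaled -ln(Lambda): -[error_df - (p + 1 - hyp_df) / 2] * ln(Lambda)."""
    return -(error_df - (p + 1 - hyp_df) / 2) * math.log(wilks)

g, b, n, p = 3, 2, 2, 3          # peanut example dimensions
wilks_int = 0.07430              # Wilks' Lambda for location*variety (SAS output)

stat = bartlett_statistic(wilks_int, p, g * b * (n - 1), (g - 1) * (b - 1))
df = p * (g - 1) * (b - 1)
CHI2_6_05 = 12.592               # tabulated upper 5% point, chi-square with 6 df
reject = stat >= CHI2_6_05

print(f"statistic = {stat:.3f} on {df} df, reject H0 at 5%: {reject}")
```

With only n = 2 replicates the large-sample approximation is rough: the statistic is just above the 5% critical value, whereas the exact F test in the SAS output gives a p-value of 0.0508, so the evidence for interaction is borderline either way.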
• A more accurate p-value is obtained from Rao's F-approximation.
• If we reject the null hypothesis of no interaction effects,
then interpreting the effects of factors 1 and 2
gets complicated.
• One recommended approach in this case is to focus on the
p variables individually (perhaps by fitting the p univariate
ANOVA models) to see whether interactions are present on
all outcome variables or only on some of them.
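The notes do not spell out Rao's F-approximation, so here is a hedged sketch of the standard form of that approximation, written as a small Python function (`rao_f` is an illustrative name). Here q denotes the hypothesis degrees of freedom and `error_df` the error degrees of freedom gb(n − 1). The Wilks' Lambda values are taken from the SAS output of the peanut example later in these notes.

```python
import math

def rao_f(wilks, p, q, error_df):
    """Rao's F approximation to the distribution of Wilks' Lambda.

    p: number of response variables, q: hypothesis degrees of freedom,
    error_df: error degrees of freedom.
    Returns (F, numerator df, denominator df).
    """
    pq = p * q
    denom = p ** 2 + q ** 2 - 5
    # Usual convention: t = 1 when p^2 + q^2 - 5 is not positive
    t = math.sqrt((pq ** 2 - 4) / denom) if denom > 0 else 1.0
    r = error_df - (p - q + 1) / 2
    u = (pq - 2) / 4
    df1, df2 = pq, r * t - 2 * u
    root = wilks ** (1 / t)
    return (1 - root) / root * df2 / df1, df1, df2

error_df = 3 * 2 * (2 - 1)  # gb(n - 1) with g = 3, b = 2, n = 2
f_int, df1_int, df2_int = rao_f(0.07430, p=3, q=2, error_df=error_df)
f_var, df1_var, df2_var = rao_f(0.01244, p=3, q=2, error_df=error_df)
f_loc, df1_loc, df2_loc = rao_f(0.10651620, p=3, q=1, error_df=error_df)

for name, f, d1, d2 in [("location*variety", f_int, df1_int, df2_int),
                        ("variety", f_var, df1_var, df2_var),
                        ("location", f_loc, df1_loc, df2_loc)]:
    print(f"{name}: F = {f:.2f} on ({d1}, {d2:g}) df")
```

These F values and degrees of freedom agree with the Wilks' Lambda rows of the SAS MANOVA tables (3.56 on 6 and 8 df, 10.62 on 6 and 8 df, 11.18 on 3 and 4 df); the p-values themselves require the F distribution function, which is left out to keep the sketch dependency-free.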
• Interpretation of results can be aided by constructing p profile
plots (one for each of the p response variables) with the
sample mean at each combination of factor levels substituted
for the corresponding population mean.
• If we fail to reject the null hypothesis of no interaction effects,
then we proceed with hypothesis tests for additive effects
of factors 1 and 2 using the appropriate multivariate test
statistics.
• The null hypothesis of no additive effect of factor 1 is
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_g = 0,$$
versus $H_1$: at least one $\alpha_i \neq 0$. Wilks' Λ* statistic is
$$\Lambda^* = \frac{|SSE|}{|SSP_{fac1} + SSE|},$$
and the null is rejected at level α if
$$-\left[ gb(n-1) - \frac{p + 1 - (g-1)}{2} \right] \ln \Lambda^* \ge \chi^2_{p(g-1)}(\alpha).$$
• Similarly, the null hypothesis of no additive effect of factor 2
is
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_b = 0,$$
versus $H_1$: at least one $\tau_k \neq 0$. Wilks' Λ* statistic is
$$\Lambda^* = \frac{|SSE|}{|SSP_{fac2} + SSE|},$$
and the null is rejected at level α if
$$-\left[ gb(n-1) - \frac{p + 1 - (b-1)}{2} \right] \ln \Lambda^* \ge \chi^2_{p(b-1)}(\alpha).$$
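The two main-effect tests can be illustrated in the same way. The sketch below is a minimal standard-library Python example that uses the Wilks' Lambda values reported by SAS for the peanut data (variety as factor 1 with g = 3, location as factor 2 with b = 2) and attaches approximate p-values through a closed-form chi-square tail probability for integer degrees of freedom. The function names `chi2_sf` and `bartlett` are assumptions made for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    z = x / 2
    if df % 2 == 0:
        return math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(df // 2))
    tail = sum(z ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * tail

def bartlett(wilks, p, error_df, hyp_df):
    """Return the scaled -ln(Lambda) statistic and its chi-square df."""
    stat = -(error_df - (p + 1 - hyp_df) / 2) * math.log(wilks)
    return stat, p * hyp_df

g, b, n, p = 3, 2, 2, 3
error_df = g * b * (n - 1)

# Wilks' Lambda values from the SAS output, with hypothesis df g - 1 and b - 1
tests = {"variety": (0.01244, g - 1), "location": (0.10651620, b - 1)}
results = {}
for name, (wilks, hyp_df) in tests.items():
    stat, df = bartlett(wilks, p, error_df, hyp_df)
    results[name] = (stat, df, chi2_sf(stat, df))
    print(f"{name}: statistic = {stat:.2f}, df = {df}, approx. p = {results[name][2]:.4f}")
```

Both approximate p-values fall below 0.05, which matches the conclusion of the exact F tests in the SAS output, although the numerical p-values differ somewhat because the chi-square result is a large-sample approximation.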
Simultaneous confidence intervals
• As in the one-factor case, we may wish to explore differences
across factor levels for each of the p variables.
• For example, simultaneous 100(1 − α)% Bonferroni confidence intervals for differences between the g(g − 1)/2 pairs
of levels of factor 1, $\alpha_{\ell j} - \alpha_{mj}$, for all j = 1, 2, ..., p response
variables are constructed as
$$(\bar{x}_{\ell\cdot j} - \bar{x}_{m\cdot j}) \pm t_\nu\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{E_{jj}}{\nu}\,\frac{2}{bn}},$$
where ν = gb(n − 1), $E_{jj}$ is the (j, j)th diagonal element of the
error SS and CP matrix, and $(\bar{x}_{\ell\cdot j} - \bar{x}_{m\cdot j})$ is the jth element
of the p × 1 vector of sample mean differences $(\bar{x}_{\ell\cdot} - \bar{x}_{m\cdot})$.
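The interval formula can be traced through numerically. The following sketch computes Bonferroni intervals for the three pairwise variety differences on x1 in the peanut example from later in these notes, treating variety as factor 1 (g = 3) and using E11 = 104.205 from the SAS output. Because the Python standard library has no t quantile function, the code obtains one by Simpson integration and bisection; `t_upper_tail` and `t_quantile` are illustrative helpers, and a statistics package would normally be used instead.

```python
import math
from itertools import combinations

def t_upper_tail(t, nu, steps=4000):
    """P(T > t) for t >= 0, integrating the Student t density by Simpson's rule."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def pdf(x):
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    h = t / steps
    s = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - s * h / 3

def t_quantile(tail, nu, upper=100.0):
    """Upper-tail t quantile by bisection; assumes the answer lies in [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

g, b, n, p, alpha = 3, 2, 2, 3, 0.05
nu = g * b * (n - 1)
E11 = 104.205  # error sum of squares for x1 (SAS output)

# x1 values grouped by variety, from the data listing
x1_by_variety = {5: [195.3, 194.3, 189.7, 180.4],
                 6: [203.0, 195.9, 202.7, 197.6],
                 8: [193.5, 187.0, 201.5, 200.0]}
means = {v: sum(xs) / len(xs) for v, xs in x1_by_variety.items()}

t_crit = t_quantile(alpha / (p * g * (g - 1)), nu)
half = t_crit * math.sqrt(E11 / nu * 2 / (b * n))

intervals = {}
for l, m in combinations(sorted(means), 2):
    diff = means[l] - means[m]
    intervals[(l, m)] = (diff - half, diff + half)
    print(f"variety {l} - variety {m}: {diff:.3f} +/- {half:.3f}")
```

Since the interaction appears to be present in this example, intervals for main-effect differences like these are shown only to illustrate the arithmetic; the notes recommend comparing factor level combinations in that situation.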
• Similarly, simultaneous 100(1 − α)% Bonferroni confidence
intervals for differences between the b(b − 1)/2 pairs of levels
of factor 2, $\tau_{kj} - \tau_{qj}$, for all j = 1, 2, ..., p response variables
are constructed as
$$(\bar{x}_{\cdot kj} - \bar{x}_{\cdot qj}) \pm t_\nu\left(\frac{\alpha}{pb(b-1)}\right)\sqrt{\frac{E_{jj}}{\nu}\,\frac{2}{gn}},$$
where ν = gb(n − 1), $E_{jj}$ is the (j, j)th diagonal element of the
error SS and CP matrix, and $(\bar{x}_{\cdot kj} - \bar{x}_{\cdot qj})$ is the jth element
of the p × 1 vector of sample mean differences $(\bar{x}_{\cdot k} - \bar{x}_{\cdot q})$.
• We need to consider combinations of factor levels if the interaction effects are not negligible.
• For example, simultaneous 100(1 − α)% Bonferroni confidence intervals for differences between the g(g − 1)/2 pairs
of levels of factor 1, $(\alpha_{\ell j} + \gamma_{\ell kj}) - (\alpha_{mj} + \gamma_{mkj})$, at each of the
k = 1, 2, ..., b levels of factor 2 and all j = 1, 2, ..., p response
variables are constructed as
$$(\bar{x}_{\ell kj} - \bar{x}_{mkj}) \pm t_\nu\left(\frac{\alpha}{pbg(g-1)}\right)\sqrt{\frac{E_{jj}}{\nu}\,\frac{2}{n}},$$
where ν = gb(n − 1), $E_{jj}$ is the (j, j)th diagonal element of the
error SS and CP matrix, and $(\bar{x}_{\ell kj} - \bar{x}_{mkj})$ is the jth element
of the p × 1 vector of sample mean differences $(\bar{x}_{\ell k} - \bar{x}_{mk})$.
Some Comments
• Note: if n = 1, that is, we do not have any replications within
factor level combinations, we will not be able to estimate an
error SS and CP matrix, because the error degrees of freedom gb(n − 1) are zero. In that case, we can only make formal inferences about the additive effects model for factors 1
and 2.
• Note 2: The extension to designs with more than two factors
is straightforward.
• It is possible to fit interactions of third, fourth and higher
orders when three, four or more factors are included in the
experiment.
• Interpretations are complicated if higher order interactions
are not negligible.
Example: Peanuts
• See Problem 6.31 in the textbook.
• Plant scientists conducted an experiment to examine three
traits of peanuts. The two factors in the experiment were
variety (three levels) and location (two levels), so there are
g × b = 3 × 2 = 6 factor level combinations.
• Two replications n = 2 were included for each of the 6 combinations of factor levels.
• Scientists measured three variables on each plot:
– X1 = yield (plot weight)
– X2 = weight in grams of sound mature kernels - 250
grams
– X3 = seed size measured as weight in grams of 100 seeds
• Fit a two-way model with an interaction using SAS.
• It is useful to first obtain the profile plots for the three
variables to see whether there may be interaction effects of
location and variety.
• We plot the means of each variable in each variety by
locations.
• See figures that follow. There are clear interactions between
location and variety, and the interactions appear to be
significant for all three variables.
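Example: Peanuts (SAS Code)
/* This program performs a two-way MANOVA
   on the peanut data posted as peanuts.dat.
   The code is posted as peanuts.sas */
options linesize=64 nocenter nonumber;

/* Read location, variety and the three response variables */
data set1;
  infile "c:\stat501\data\peanuts.dat";
  input location variety x1 x2 x3;
  /* Optional variable labels:
  label x1 = yield
        x2 = SdMatKer
        x3 = SeedSize; */
run;

/* List the data */
proc print data=set1;
run;

/* Two-way MANOVA with interaction */
proc glm data=set1;
  class location variety;
  /* P prints predicted values and residuals; SOLUTION prints parameter estimates */
  model x1-x3 = location variety location*variety / p solution;
  /* Multivariate tests; PRINTH and PRINTE print the H and E SSCP matrices */
  manova h=location*variety / printh printe;
  manova h=variety / printh printe;
  manova h=location / printh printe;
  /* Treat the three traits as repeated measures; PRINTM prints the transformation matrix */
  repeated traits 3 profile / printm;
run;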
Example: Peanuts (SAS Output)

Obs  location  variety     x1     x2    x3
  1         1        5  195.3  153.1  51.4
  2         1        5  194.3  167.7  53.7
  3         2        5  189.7  139.5  55.5
  4         2        5  180.4  121.1  44.4
  5         1        6  203.0  156.8  49.8
  6         1        6  195.9  166.0  45.8
  7         2        6  202.7  166.1  60.4
  8         2        6  197.6  161.8  54.1
  9         1        8  193.5  164.5  57.8
 10         1        8  187.0  165.1  58.6
 11         2        8  201.5  166.8  65.0
 12         2        8  200.0  173.8  67.2
Dependent Variable: x1

                            Sum of
Source             DF      Squares   Mean Square
Model               5  401.9175000    80.3835000
Error               6  104.2050000    17.3675000
Corrected Total    11  506.1225000

Source             DF    Type I SS   Mean Square  F Value  Pr > F
location            1    0.7008333     0.7008333     0.04  0.8474
variety             2  196.1150000    98.0575000     5.65  0.0418
location*variety    2  205.1016667   102.5508333     5.90  0.0382
Dependent Variable: x2

                            Sum of
Source             DF      Squares   Mean Square
Model               5  2031.777500    406.355500
Error               6   352.105000     58.684167
Corrected Total    11  2383.882500

Source             DF    Type I SS   Mean Square  F Value  Pr > F
location            1   162.067500    162.067500     2.76  0.1476
variety             2  1089.015000    544.507500     9.28  0.0146
location*variety    2   780.695000    390.347500     6.65  0.0300
Dependent Variable: x3

                            Sum of
Source             DF      Squares   Mean Square
Model               5  442.5741667    88.5148333
Error               6   94.8350000    15.8058333
Corrected Total    11  537.4091667

Source             DF    Type I SS   Mean Square  F Value  Pr > F
location            1   72.5208333    72.5208333     4.59  0.0759
variety             2  284.1016667   142.0508333     8.99  0.0157
location*variety    2   85.9516667    42.9758333     2.72  0.1443
MANOVA Test Criteria and F Approximations for the
Hypothesis of No Overall location*variety Effect
H = Type III SSCP Matrix for location*variety
E = Error SSCP Matrix
S=2  M=0  N=1

Statistic                  Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda            0.07430     3.56       6       8  0.0508
Pillai's Trace           1.29086     3.03       6      10  0.0587
Hotelling-Lawley Trace   7.54429     5.03       6       4  0.0699
Roy's Greatest Root      6.82409    11.37       3       5  0.0113

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
MANOVA Test Criteria and F Approximations for
the Hypothesis of No Overall variety Effect
H = Type III SSCP Matrix for variety
E = Error SSCP Matrix
S=2  M=0  N=1

Statistic                   Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda             0.01244    10.62       6       8  0.0019
Pillai's Trace            1.70911     9.79       6      10  0.0011
Hotelling-Lawley Trace   21.37568    14.25       6       4  0.0113
Roy's Greatest Root      18.18761    30.31       3       5  0.0012

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of No Overall location Effect
H = Type III SSCP Matrix for location
E = Error SSCP Matrix
S=1  M=0.5  N=1

Statistic                     Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda            0.10651620    11.18       3       4  0.0205
Pillai's Trace           0.89348380    11.18       3       4  0.0205
Hotelling-Lawley Trace   8.38824348    11.18       3       4  0.0205
Roy's Greatest Root      8.38824348    11.18       3       4  0.0205
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of no traits Effect
H = Type III SSCP Matrix for traits
E = Error SSCP Matrix
S=1  M=0  N=1.5

Statistic                        Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda                 0.000089  28198.5       2       5  <.0001
Pillai's Trace                0.999911  28198.5       2       5  <.0001
Hotelling-Lawley Trace    11279.385332  28198.5       2       5  <.0001
Roy's Greatest Root       11279.385332  28198.5       2       5  <.0001
MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of no traits*location Effect
H = Type III SSCP Matrix for traits*location
E = Error SSCP Matrix
S=1  M=0  N=1.5

Statistic                  Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda            0.11262    19.70       2       5  0.0043
Pillai's Trace           0.88738    19.70       2       5  0.0043
Hotelling-Lawley Trace   7.87949    19.70       2       5  0.0043
Roy's Greatest Root      7.87949    19.70       2       5  0.0043
MANOVA Test Criteria and F Approximations for
the Hypothesis of no traits*variety Effect
H = Type III SSCP Matrix for traits*variety
E = Error SSCP Matrix
S=2  M=-0.5  N=1.5

Statistic                   Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda             0.02064    14.90       4      10  0.0003
Pillai's Trace            1.55258    10.41       4      12  0.0007
Hotelling-Lawley Trace   19.67760    24.15       4  5.1429  0.0016
Roy's Greatest Root      18.14717    54.44       2       6  0.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
Repeated Measures Analysis of Variance
MANOVA Test Criteria and F Approximations for the
Hypothesis of no traits*location*variety Effect
H = Type III SSCP Matrix for traits*location*variety
E = Error SSCP Matrix
S=2  M=-0.5  N=1.5

Statistic                  Value  F Value  Num DF  Den DF  Pr > F
Wilks' Lambda            0.09547     5.59       4      10  0.0125
Pillai's Trace           1.19307     4.44       4      12  0.0198
Hotelling-Lawley Trace   6.45248     7.92       4  5.1429  0.0204
Roy's Greatest Root      5.94400    17.83       2       6  0.0030

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source             DF  Type III SS  Mean Square  F Value  Pr > F
location            1     3.802500     3.802500     0.07  0.8067
variety             2  1071.387222   535.693611     9.21  0.0148
location*variety    2   841.031667   420.515833     7.23  0.0252
Error               6   348.941667    58.156944
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

Source                    DF  Type III SS  Mean Square
traits                     2  126097.2156   63048.6078
traits*location            2     231.4867     115.7433
traits*variety             4     497.8444     124.4611
traits*location*variety    4     230.7167      57.6792
Error(traits)             12     202.2033      16.8503
                                               Adj Pr > F
Source                   F Value  Pr > F    G - G    H - F
traits                   3741.70  <.0001   <.0001   <.0001
traits*location             6.87  0.0103   0.0337   0.0103
traits*variety              7.39  0.0031   0.0188   0.0031
traits*location*variety     3.42  0.0436   0.0919   0.0436

Greenhouse-Geisser Epsilon  0.5583
Huynh-Feldt Epsilon         1.1672