Comparing Mean Vectors for Several Populations

• Compare mean vectors for g treatments (or populations).
• Randomly assign $n_\ell$ units to the $\ell$-th treatment (or take independent random samples from the g populations).
• Measure p characteristics of each unit. The observation vectors for the $\ell$-th population,
$$\text{Pop } \ell: \; x_{\ell 1}, x_{\ell 2}, \ldots, x_{\ell n_\ell}, \qquad \ell = 1, \ldots, g,$$
are $p \times 1$ vectors of measurements. We use $\bar{x}_\ell$ to denote the sample mean vector for the $\ell$-th treatment and $S_\ell$ to denote the estimated covariance matrix in the $\ell$-th group.
• Each unit responds independently of any other unit.
• We will use n to denote the total sample size: $n = \sum_\ell n_\ell$.
Comparing Several Mean Vectors
• If all $n_\ell - p$ are large, the following assumptions are all we need to make inferences about differences between treatments:
1. $X_{\ell 1}, X_{\ell 2}, \ldots, X_{\ell n_\ell}$ is a random sample from a p-variate distribution with mean vector $\mu_\ell$ and covariance matrix $\Sigma_\ell$.
2. Each unit responds independently of any other unit (units are randomly allocated to the g treatment groups).
3. Covariance matrices are homogeneous: $\Sigma_\ell = \Sigma$ for all groups.
• When sample sizes are small, we need one more assumption:
1. Distributions are multivariate normal.
Pooled estimate of the covariance matrix
• If all population covariance matrices are the same, then all group-level matrices of sums of squares and cross-products estimate the same quantity.
• It is then reasonable to combine the group-level covariance matrices into a single estimate by computing their weighted average, with weights proportional to the degrees of freedom $n_\ell - 1$ in each treatment group.
• The pooled estimate of the common covariance matrix is
$$S_{\text{pool}} = \frac{\sum_{\ell=1}^{g} (n_\ell - 1)\, S_\ell}{\sum_{j=1}^{g} (n_j - 1)}.$$
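As a quick illustration, $S_{\text{pool}}$ can be computed directly from its definition. A minimal sketch on simulated data with hypothetical group sizes:

```python
# Sketch: pooled covariance S_pool = sum_l (n_l - 1) S_l / sum_l (n_l - 1).
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(size=(n_l, 2)) for n_l in (12, 15, 9)]  # g = 3, p = 2

num = sum((len(x) - 1) * np.cov(x, rowvar=False) for x in groups)
den = sum(len(x) - 1 for x in groups)   # equals n - g
S_pool = num / den
print(S_pool)
```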
Analysis of Variance (ANOVA)
• To develop approaches to compare g multivariate means, it
will be convenient to make use of the usual decomposition
of the variability in the sample response vectors into two
sources:
1. Variability due to differences in treatment mean vectors
(between-group variation)
2. Variability due to measurement error or differences among units within treatment groups (within-group variation)
• We review some of these concepts in the univariate setting,
when p = 1.
ANOVA (cont’d)
• If an observation $X_{\ell j} \sim N(\mu_\ell, \sigma^2)$, we can write down a model
$$X_{\ell j} = \mu_\ell + e_{\ell j} = \mu + \tau_\ell + e_{\ell j},$$
where $\mu$ is an overall mean, $\tau_\ell$ is the effect of the $\ell$-th treatment, and $e_{\ell j} \sim N(0, \sigma^2)$.
• A test of the null hypothesis of no differences among treatment means consists of testing
$$H_0: \mu + \tau_1 = \mu + \tau_2 = \cdots = \mu + \tau_g,$$
which is equivalent to $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$.
• For identifiability reasons, we typically impose a restriction such as
$$\sum_\ell \tau_\ell = 0 \qquad \text{or} \qquad \tau_g = 0.$$
ANOVA (cont’d)
• Note that because $\mu_\ell = \mu + \tau_\ell$, it follows that $\tau_\ell = \mu_\ell - \mu$, so a treatment effect really indicates a deviation of the group-level mean from $\mu$.
• We can decompose an observation in a similar manner,
$$x_{\ell j} = \bar{x} + (\bar{x}_\ell - \bar{x}) + (x_{\ell j} - \bar{x}_\ell),$$
by adding and subtracting $\bar{x}$ and $\bar{x}_\ell$.
• Note that
$$\underbrace{(x_{\ell j} - \bar{x})}_{\text{overall variability}} = \underbrace{(\bar{x}_\ell - \bar{x})}_{\text{between-group var.}} + \underbrace{(x_{\ell j} - \bar{x}_\ell)}_{\text{within-group var.}}.$$
ANOVA (cont’d)
• Squaring both sides of the above expression and summing over all $n_\ell$ observations in each group and over all groups, we have
$$(x_{\ell j} - \bar{x})^2 = (\bar{x}_\ell - \bar{x})^2 + (x_{\ell j} - \bar{x}_\ell)^2 + 2(\bar{x}_\ell - \bar{x})(x_{\ell j} - \bar{x}_\ell)$$
and, since the cross-product term vanishes when summed within each group (because $\sum_j (x_{\ell j} - \bar{x}_\ell) = 0$),
$$\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x})^2 = \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\bar{x}_\ell - \bar{x})^2 + \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)^2 = \sum_{\ell=1}^{g} n_\ell (\bar{x}_\ell - \bar{x})^2 + \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)^2 = SS_{\text{Treatments}} + SS_{\text{Error}}.$$
ANOVA (cont’d)
• The null hypothesis of equal treatment means is rejected at level $\alpha$ if
$$F = \frac{SS_{\text{Treatments}}/(g - 1)}{SS_{\text{Error}}/(n - g)} > F_{g-1,\, n-g}(\alpha).$$
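The univariate decomposition and F test are easy to verify numerically. This sketch uses simulated data with made-up group means and checks the hand computation against scipy.stats.f_oneway:

```python
# Sketch: one-way ANOVA sums of squares and F statistic by hand.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = [rng.normal(loc=m, size=n) for m, n in [(0.0, 10), (0.5, 12), (1.0, 8)]]

n = sum(len(s) for s in samples)
g = len(samples)
grand = np.concatenate(samples).mean()
ss_trt = sum(len(s) * (s.mean() - grand) ** 2 for s in samples)
ss_err = sum(((s - s.mean()) ** 2).sum() for s in samples)

F = (ss_trt / (g - 1)) / (ss_err / (n - g))
p_value = stats.f.sf(F, g - 1, n - g)
print(F, p_value)
print(stats.f_oneway(*samples))  # should agree
```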
MANOVA: Multivariate Analysis of Variance
• We now extend ANOVA to the case where the observations $x_{\ell j}$ are p-dimensional vectors.
• A one-way linear model similar to the one we wrote for the one-dimensional case is now
$$\begin{bmatrix} x_{\ell j 1} \\ x_{\ell j 2} \\ \vdots \\ x_{\ell j p} \end{bmatrix} =
\begin{bmatrix} \mu_1 + \tau_{\ell 1} \\ \mu_2 + \tau_{\ell 2} \\ \vdots \\ \mu_p + \tau_{\ell p} \end{bmatrix} +
\begin{bmatrix} e_{\ell j 1} \\ e_{\ell j 2} \\ \vdots \\ e_{\ell j p} \end{bmatrix}.$$
• In vector form, the observation for the j-th unit in the $\ell$-th treatment group is written as
$$x_{\ell j} = \mu + \tau_\ell + e_{\ell j},$$
where all terms are p-dimensional vectors and $e_{\ell j} \sim N_p(0, \Sigma)$, with a common covariance matrix $\Sigma$ as assumed earlier.
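To make the model concrete, one can simulate from it. The parameter values below are made up for illustration (with $\tau_g = 0$ as the reference level):

```python
# Sketch: simulate x_lj = mu + tau_l + e_lj with e_lj ~ N_p(0, Sigma).
import numpy as np

rng = np.random.default_rng(5)
p = 2
mu = np.array([1.0, -0.5])
taus = [np.array([0.3, 0.0]), np.array([-0.1, 0.4]), np.zeros(p)]  # tau_g = 0
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
sizes = (10, 14, 8)

groups = [mu + tau + rng.multivariate_normal(np.zeros(p), Sigma, size=n_l)
          for tau, n_l in zip(taus, sizes)]
```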
MANOVA: Multivariate Analysis of Variance
• A data matrix X for all units in all groups has dimension $n \times p$, where $n = \sum_\ell n_\ell$. Each row of X is a unit and each column represents a measurement:
$$X_{n \times p} = \begin{bmatrix}
x_{111} & x_{112} & \cdots & x_{11p} \\
\vdots & \vdots & & \vdots \\
x_{1 n_1 1} & x_{1 n_1 2} & \cdots & x_{1 n_1 p} \\
x_{211} & x_{212} & \cdots & x_{21p} \\
\vdots & \vdots & & \vdots \\
x_{2 n_2 1} & x_{2 n_2 2} & \cdots & x_{2 n_2 p} \\
\vdots & \vdots & & \vdots \\
x_{g n_g 1} & x_{g n_g 2} & \cdots & x_{g n_g p}
\end{bmatrix}.$$
MANOVA: Multivariate Analysis of Variance
• We can write the multivariate linear model as
$$X_{n \times p} = A_{n \times (g+1)}\, \beta_{(g+1) \times p} + \epsilon_{n \times p},$$
where, in more detail,
$$A = \begin{bmatrix}
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{bmatrix}, \qquad
\beta = \begin{bmatrix}
\mu_1 & \mu_2 & \cdots & \mu_p \\
\tau_{11} & \tau_{12} & \cdots & \tau_{1p} \\
\tau_{21} & \tau_{22} & \cdots & \tau_{2p} \\
\vdots & \vdots & & \vdots \\
\tau_{g1} & \tau_{g2} & \cdots & \tau_{gp}
\end{bmatrix}, \qquad
\epsilon = \begin{bmatrix}
\epsilon_{11}' \\ \epsilon_{12}' \\ \vdots \\ \epsilon_{1 n_1}' \\ \epsilon_{21}' \\ \vdots \\ \epsilon_{2 n_2}' \\ \vdots \\ \epsilon_{g n_g}'
\end{bmatrix}.$$
The first column of A is all ones; column $\ell + 1$ indicates membership in the $\ell$-th treatment group.
MANOVA (cont’d)
• Each column of the matrix β corresponds to a variable (or measured trait).
• Each row of the error matrix ε is the transpose of a $p \times 1$ error vector $e_{\ell j}$.
• As written, the $n \times (g+1)$ design matrix A has linearly dependent columns. To deal with this, SAS imposes the restriction
$$\tau_{g1} = \tau_{g2} = \cdots = \tau_{gp} = 0,$$
so that the last row of β and the last column of A are eliminated. Under this restriction,
$$E(x_{gj}) = \mu, \qquad \tau_\ell = \mu_\ell - \mu_g = E(x_{\ell j}) - E(x_{gj}).$$
MANOVA (cont’d)
• With this restriction, A becomes an $n \times g$ matrix of full column rank, and the MLE of the $g \times p$ matrix β is
$$\hat{\beta}_{g \times p} = (A'_{g \times n} A_{n \times g})^{-1} A'_{g \times n} X_{n \times p}.$$
• When we set $\tau_g = 0$, $\hat{\beta}$ (as estimated by SAS) is
$$\hat{\beta} = \begin{bmatrix} \hat{\mu}' \\ \hat{\tau}_1' \\ \vdots \\ \hat{\tau}_{g-1}' \end{bmatrix}
= \begin{bmatrix} \bar{x}_g' \\ (\bar{x}_1 - \bar{x}_g)' \\ \vdots \\ (\bar{x}_{g-1} - \bar{x}_g)' \end{bmatrix}.$$
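A small numerical check of this closed form, using the $\tau_g = 0$ (reference-cell) design and ordinary least squares on made-up data:

```python
# Sketch: beta_hat from the reference-cell design matches xbar_g and
# the differences xbar_l - xbar_g. Data and group sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
sizes = [7, 9, 6]                      # n_1, ..., n_g
g, p = len(sizes), 3
X = rng.normal(size=(sum(sizes), p))   # n x p response matrix

labels = np.repeat(np.arange(g), sizes)
A = np.column_stack([np.ones(len(labels))] +
                    [(labels == l).astype(float) for l in range(g - 1)])

beta_hat, *_ = np.linalg.lstsq(A, X, rcond=None)
xbar = [X[labels == l].mean(axis=0) for l in range(g)]
print(np.allclose(beta_hat[0], xbar[g - 1]))           # mu_hat = xbar_g
print(np.allclose(beta_hat[1], xbar[0] - xbar[g - 1])) # tau_1_hat
```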
MANOVA (cont’d)
• For the k-th measurement (the k-th column of β, k = 1, ..., p) we have
$$\hat{\beta}_k \sim N_g\big(\beta_k,\, \sigma_{kk} (A'A)^{-1}\big),$$
and
$$\text{cov}(\hat{\beta}_k, \hat{\beta}_i) = \sigma_{ki} (A'A)^{-1}.$$
• Estimates of the $\sigma_{kk}$ and $\sigma_{ki}$ are obtained from the decomposition of the total sums of squares and cross-products into the matrix of treatment SS and CP and the matrix of error SS and CP.
Sums of squares and cross-products matrices
• As in the univariate case, we can write a p-dimensional observation vector as a sum of deviations:
$$(x_{\ell j} - \bar{x}) = (\bar{x}_\ell - \bar{x}) + (x_{\ell j} - \bar{x}_\ell).$$
• Note that
$$(x_{\ell j} - \bar{x})(x_{\ell j} - \bar{x})' = (\bar{x}_\ell - \bar{x})(\bar{x}_\ell - \bar{x})' + (\bar{x}_\ell - \bar{x})(x_{\ell j} - \bar{x}_\ell)' + (x_{\ell j} - \bar{x}_\ell)(\bar{x}_\ell - \bar{x})' + (x_{\ell j} - \bar{x}_\ell)(x_{\ell j} - \bar{x}_\ell)'.$$
Sums of squares and cross-products matrices (cont’d)
• Within any treatment group, $\sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell) = 0$, so that
$$\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\bar{x}_\ell - \bar{x})(x_{\ell j} - \bar{x}_\ell)' = 0 \quad \text{and} \quad \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)(\bar{x}_\ell - \bar{x})' = 0.$$
• It follows that
$$\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x})(x_{\ell j} - \bar{x})' = \sum_{\ell=1}^{g} n_\ell (\bar{x}_\ell - \bar{x})(\bar{x}_\ell - \bar{x})' + \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)(x_{\ell j} - \bar{x}_\ell)'.$$
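The decomposition is easy to verify numerically; the sketch below forms the two right-hand matrices and the corrected total SSP from their definitions (simulated data, hypothetical group sizes):

```python
# Sketch: verify total SSP = between + within on simulated data (g = 3, p = 2).
import numpy as np

rng = np.random.default_rng(3)
groups = [rng.normal(size=(n_l, 2)) for n_l in (10, 14, 8)]

allx = np.vstack(groups)
grand = allx.mean(axis=0)

B = sum(len(x) * np.outer(x.mean(axis=0) - grand, x.mean(axis=0) - grand)
        for x in groups)
W = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in groups)
T = (allx - grand).T @ (allx - grand)
print(np.allclose(T, B + W))  # True
```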
Sums of squares and cross-products matrices (cont’d)
• The matrix on the left of the equal sign is called the corrected total sums of squares and cross-products matrix.
• The matrices on the right-hand side are called, respectively, the treatment sums of squares and cross-products matrix, denoted by B, and the error sums of squares and cross-products matrix, denoted by W (for “within groups”).
• Notice that we can re-write the W matrix as
$$W = \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)(x_{\ell j} - \bar{x}_\ell)' = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g.$$
Sums of squares and cross-products matrices (cont’d)
• If the g population covariance matrices are homogeneous, then $S_1, S_2, \ldots, S_g$ all estimate the same quantity. Then
$$W = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g = \Big[\sum_\ell (n_\ell - 1)\Big] S_{\text{pool}},$$
and an estimate of the pooled covariance matrix is given by
$$S_{\text{pool}} = \frac{W}{\sum_\ell (n_\ell - 1)} = \frac{W}{n - g}.$$
• The diagonal elements of $W/(n - g)$ estimate the p variances, and the off-diagonal elements are estimates of the covariances.
Sums of squares and cross-products matrices (cont’d)
• Using the linear model set-up, we can extend some of the results from linear model theory. With $P_A = A(A'A)^{-1}A'$ the usual idempotent projection matrix onto the column space of A and $P_1 = \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'$ the projection onto the intercept,
$$B = X'(P_A - P_1)X, \qquad W = X'[I - P_A]X.$$
(Subtracting $X'P_1X = n\,\bar{x}\bar{x}'$ corrects for the overall mean, so that B is the between-groups SSP matrix.)
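These projection expressions can be checked numerically against the direct sums. A sketch on simulated data; the cell-means design used here (one indicator per group) spans the same column space as the intercept-plus-indicators design:

```python
# Sketch: B and W via projection matrices, matching the direct-sum forms.
import numpy as np

rng = np.random.default_rng(4)
sizes = (10, 14, 8)
groups = [rng.normal(size=(n_l, 2)) for n_l in sizes]
X = np.vstack(groups)
n = X.shape[0]
labels = np.repeat(np.arange(len(sizes)), sizes)

# One indicator column per group (full rank, same span as [1, indicators]).
A = np.stack([(labels == l).astype(float) for l in range(len(sizes))], axis=1)
P_A = A @ np.linalg.inv(A.T @ A) @ A.T        # projection onto col(A)
P_1 = np.full((n, n), 1.0 / n)                # projection onto the 1 vector

B = X.T @ (P_A - P_1) @ X                     # between-groups SSP
W = X.T @ (np.eye(n) - P_A) @ X               # within-groups SSP
```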
Hypothesis Testing in MANOVA
• We often wish to test $H_0: \tau_1 = \tau_2 = \cdots = \tau_g$ versus $H_1$: at least two of the $\tau_\ell$ are not equal.
• Compare the relative sizes of B and W:

| Source of variation | Matrix of sums of squares and cross-products (SSP) | Degrees of freedom (d.f.) |
| --- | --- | --- |
| Treatment | $B = \sum_\ell n_\ell (\bar{x}_\ell - \bar{x})(\bar{x}_\ell - \bar{x})'$ | $g - 1$ |
| Residual | $W = \sum_\ell \sum_j (x_{\ell j} - \bar{x}_\ell)(x_{\ell j} - \bar{x}_\ell)'$ | $n - g$ |
| Total (corrected) | $B + W = \sum_\ell \sum_j (x_{\ell j} - \bar{x})(x_{\ell j} - \bar{x})'$ | $n - 1$ |
Hypothesis Testing in MANOVA (cont’d)
• One test of the null hypothesis is carried out using a statistic called Wilks’ Λ (a likelihood ratio test):
$$\Lambda = \frac{|W|}{|B + W|}.$$
• If B is “small” relative to W, then Λ will be close to 1; otherwise, Λ will be small.
• We reject the null hypothesis when Λ is small.
• SAS uses different notation: it calls the B matrix H and the W matrix E, for “hypothesis” and “error”, respectively.
Hypothesis Testing in MANOVA (cont’d)
• The exact sampling distribution of Wilks’ Λ can be derived only in special cases (see the table below).
• In general, for large n and under $H_0$, Bartlett showed that, approximately,
$$-\left(n - 1 - \frac{p + g}{2}\right) \ln \Lambda \sim \chi^2_{p(g-1)}.$$
Thus, we reject $H_0$ at level α when
$$-\left(n - 1 - \frac{p + g}{2}\right) \ln \Lambda \geq \chi^2_{p(g-1)}(\alpha).$$
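Bartlett’s approximation is straightforward to apply once B and W are in hand. A minimal sketch, assuming B and W have been computed as in the earlier sketches:

```python
# Sketch: Wilks' Lambda with Bartlett's chi-square approximation.
import numpy as np
from scipy import stats

def wilks_bartlett(B, W, n, p, g):
    lam = np.linalg.det(W) / np.linalg.det(B + W)
    stat = -(n - 1 - (p + g) / 2.0) * np.log(lam)
    p_value = stats.chi2.sf(stat, p * (g - 1))  # upper-tail chi-square
    return lam, stat, p_value

# e.g., with the simulated B, W from earlier (n = 32, p = 2, g = 3):
# lam, stat, pv = wilks_bartlett(B, W, n=32, p=2, g=3)
```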
Exact distribution of Wilks’ Λ
| No. of variables | No. of groups | Sampling distribution for multivariate normal data |
| --- | --- | --- |
| $p = 1$ | $g \geq 2$ | $\dfrac{n-g}{g-1} \cdot \dfrac{1-\Lambda}{\Lambda} \sim F_{g-1,\, n-g}$ |
| $p = 2$ | $g \geq 2$ | $\dfrac{n-g-1}{g-1} \cdot \dfrac{1-\sqrt{\Lambda}}{\sqrt{\Lambda}} \sim F_{2(g-1),\, 2(n-g-1)}$ |
| $p \geq 1$ | $g = 2$ | $\dfrac{n-p-1}{p} \cdot \dfrac{1-\Lambda}{\Lambda} \sim F_{p,\, n-p-1}$ |
| $p \geq 1$ | $g = 3$ | $\dfrac{n-p-2}{p} \cdot \dfrac{1-\sqrt{\Lambda}}{\sqrt{\Lambda}} \sim F_{2p,\, 2(n-p-2)}$ |
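The exact-F rows above can be packaged into a small helper. This sketch (the function name wilks_exact_F is ours, not from any package) covers only the four listed cases:

```python
# Sketch: exact F transformations of Wilks' Lambda for the special cases
# tabulated above; raises an error for any other (p, g) combination.
import numpy as np
from scipy import stats

def wilks_exact_F(lam, n, p, g):
    if p == 1:
        F, df = (n - g) / (g - 1) * (1 - lam) / lam, (g - 1, n - g)
    elif p == 2:
        r = np.sqrt(lam)
        F, df = (n - g - 1) / (g - 1) * (1 - r) / r, (2 * (g - 1), 2 * (n - g - 1))
    elif g == 2:
        F, df = (n - p - 1) / p * (1 - lam) / lam, (p, n - p - 1)
    elif g == 3:
        r = np.sqrt(lam)
        F, df = (n - p - 2) / p * (1 - r) / r, (2 * p, 2 * (n - p - 2))
    else:
        raise ValueError("no exact F distribution for this (p, g)")
    return F, df, stats.f.sf(F, *df)  # statistic, d.f., p-value
```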
Other Tests
• Most packages (including SAS) will compute Wilks’ Λ and some other statistics.
• Note that
$$\Lambda = \frac{|W|}{|B + W|} = |W|\,|B + W|^{-1} = |BW^{-1} + I|^{-1}.$$
• Lawley-Hotelling trace: with $T_0^2 = \text{tr}(BW^{-1})$, reject the null hypothesis of no treatment differences at level α if, approximately for large n,
$$n\,T_0^2 = n\,\text{tr}(BW^{-1}) \geq \chi^2_{p(g-1)}(\alpha).$$
• Pillai trace: $V = \text{tr}[B(B + W)^{-1}]$.
Other Tests (cont’d)
• Roy’s maximum root: the test statistic is the largest eigenvalue of $BW^{-1}$. (The F-distribution used by SAS for this statistic is not accurate.)
• The power of the Wilks’, Lawley-Hotelling, and Pillai statistics is similar. Roy’s statistic has higher power only when one of the g treatments is very different from the rest.
• Limited simulation results suggest that Pillai’s trace may be slightly more robust to departures from multivariate normality.
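All four statistics are simple functions of the eigenvalues $\lambda_1, \ldots, \lambda_p$ of $W^{-1}B$. A minimal sketch (the function name manova_stats is ours, not from any package):

```python
# Sketch: the four MANOVA test statistics from the eigenvalues of W^{-1} B.
import numpy as np

def manova_stats(B, W):
    # Eigenvalues of W^{-1} B; real and nonnegative for valid SSP matrices.
    lams = np.real(np.linalg.eigvals(np.linalg.solve(W, B)))
    return {
        "Wilks":            np.prod(1.0 / (1.0 + lams)),  # |W| / |B + W|
        "Pillai":           np.sum(lams / (1.0 + lams)),  # tr[B (B + W)^{-1}]
        "Lawley-Hotelling": np.sum(lams),                 # tr(B W^{-1})
        "Roy":              lams.max(),                   # largest eigenvalue
    }
```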