Comparing Mean Vectors for Several Populations

• Compare mean vectors for $g$ treatments (or populations).
• Randomly assign $n_\ell$ units to the $\ell$-th treatment (or take independent random samples from $g$ populations).
• Measure $p$ characteristics of each unit. The observation vectors for the $\ell$-th population,
$$\text{Pop } \ell: \; \mathbf{x}_{\ell 1}, \mathbf{x}_{\ell 2}, \ldots, \mathbf{x}_{\ell n_\ell}, \qquad \ell = 1, \ldots, g,$$
are $p \times 1$ vectors of measurements. We use $\bar{\mathbf{x}}_\ell$ to denote the sample mean vector for the $\ell$-th treatment, and $S_\ell$ to denote the estimated covariance matrix in the $\ell$-th group.
• Each unit responds independently of any other unit.
• We use $n$ to denote the total sample size: $n = \sum_\ell n_\ell$.

Comparing Several Mean Vectors

• If all $n_\ell - p$ are large, the following assumptions are all we need to make inferences about differences between treatments:
1. $X_{\ell 1}, X_{\ell 2}, \ldots, X_{\ell n_\ell} \sim$ a $p$-variate distribution with mean vector $\mu_\ell$ and covariance matrix $\Sigma_\ell$.
2. Each unit responds independently of any other unit (units are randomly allocated to the $g$ treatment groups).
3. Covariance matrices are homogeneous: $\Sigma_\ell = \Sigma$ for all groups.
• When sample sizes are small, we need one additional assumption:
4. Distributions are multivariate normal.

Pooled estimate of the covariance matrix

• If all population covariance matrices are the same, then all group-level matrices of sums of squares and cross-products estimate the same quantity.
• It is then reasonable to combine the group-level covariance matrices into a single estimate by computing their weighted average, with weights proportional to the number of units in each treatment group.
• The pooled estimate of the common covariance matrix is
$$S_{\text{pool}} = \frac{\sum_{\ell=1}^{g} (n_\ell - 1) S_\ell}{\sum_{j=1}^{g} (n_j - 1)}.$$

Analysis of Variance (ANOVA)

• To develop approaches for comparing $g$ multivariate means, it is convenient to use the usual decomposition of the variability in the sample response vectors into two sources:
1. Variability due to differences in treatment mean vectors (between-group variation).
2. Variability due to measurement error or differences among units within treatment groups (within-group variation).
• We first review these concepts in the univariate setting, where $p = 1$.

ANOVA (cont’d)

• If an observation $X_{\ell j} \sim N(\mu_\ell, \sigma^2)$, we can write down the model
$$X_{\ell j} = \mu_\ell + e_{\ell j} = \mu + \tau_\ell + e_{\ell j},$$
where $\mu$ is an overall mean, $\tau_\ell$ is the effect of the $\ell$-th treatment, and $e_{\ell j} \sim N(0, \sigma^2)$.
• A test of the null hypothesis of no differences among treatment means consists of testing
$$H_0: \mu + \tau_1 = \mu + \tau_2 = \cdots = \mu + \tau_g,$$
which, under the identifiability restriction below, is equivalent to
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0.$$
• For identifiability reasons, we typically impose a restriction like
$$\sum_\ell \tau_\ell = 0 \quad \text{or} \quad \tau_g = 0.$$

ANOVA (cont’d)

• Note that because $\mu_\ell = \mu + \tau_\ell$, it follows that $\tau_\ell = \mu_\ell - \mu$, so a treatment effect is really a deviation of the group-level mean from $\mu$.
• We can decompose an observation in a similar manner, by adding and subtracting $\bar{x}$ and $\bar{x}_\ell$:
$$x_{\ell j} = \bar{x} + (\bar{x}_\ell - \bar{x}) + (x_{\ell j} - \bar{x}_\ell).$$
• Equivalently,
$$\underbrace{(x_{\ell j} - \bar{x})}_{\text{overall deviation}} = \underbrace{(\bar{x}_\ell - \bar{x})}_{\text{between-group}} + \underbrace{(x_{\ell j} - \bar{x}_\ell)}_{\text{within-group}}.$$

ANOVA (cont’d)

• If we square both sides of the above expression, we have
$$(x_{\ell j} - \bar{x})^2 = (\bar{x}_\ell - \bar{x})^2 + (x_{\ell j} - \bar{x}_\ell)^2 + 2(\bar{x}_\ell - \bar{x})(x_{\ell j} - \bar{x}_\ell).$$
Summing over the $n_\ell$ observations in each group and over all groups, the cross-product term vanishes (within each group, $\sum_j (x_{\ell j} - \bar{x}_\ell) = 0$), leaving
$$\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x})^2 = \sum_{\ell=1}^{g} n_\ell (\bar{x}_\ell - \bar{x})^2 + \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)^2 = SS_{\text{Treatments}} + SS_{\text{Error}}.$$
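To make the decomposition concrete, here is a minimal numerical sketch in Python. It assumes numpy and scipy are available, and the group data are made-up illustrative values, not from the course: it verifies that the total sum of squares splits into the treatment and error pieces, and forms the F ratio used in the test on the next slide.

```python
import numpy as np
from scipy import stats

# Illustrative data (made-up values): g = 3 treatment groups, p = 1 response
groups = [np.array([9.0, 6.0, 9.0]),
          np.array([0.0, 2.0]),
          np.array([3.0, 1.0, 2.0])]

g = len(groups)
n = sum(len(x) for x in groups)                 # total sample size n
grand_mean = np.concatenate(groups).mean()      # overall mean x-bar

# Between-group (treatment) and within-group (error) sums of squares
ss_tr = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in groups)
ss_err = sum(((x - x.mean()) ** 2).sum() for x in groups)
ss_tot = ((np.concatenate(groups) - grand_mean) ** 2).sum()
assert np.isclose(ss_tot, ss_tr + ss_err)       # SS_total = SS_Treatments + SS_Error

# F ratio and level-0.05 critical value (the test on the next slide)
F = (ss_tr / (g - 1)) / (ss_err / (n - g))
print(F, stats.f.ppf(0.95, g - 1, n - g))
```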
ANOVA (cont’d)

• The null hypothesis of equal treatment means is rejected at level $\alpha$ if
$$F = \frac{SS_{\text{Treatments}}/(g-1)}{SS_{\text{Error}}/(n-g)} > F_{(g-1,\,n-g)}(\alpha).$$

MANOVA: Multivariate Analysis of Variance

• We now extend ANOVA to the case where the observations $\mathbf{x}_{\ell j}$ are $p$-dimensional vectors.
• A one-way linear model similar to the one we wrote for the one-dimensional case is
$$\begin{bmatrix} x_{\ell j 1} \\ x_{\ell j 2} \\ \vdots \\ x_{\ell j p} \end{bmatrix} = \begin{bmatrix} \mu_1 + \tau_{\ell 1} \\ \mu_2 + \tau_{\ell 2} \\ \vdots \\ \mu_p + \tau_{\ell p} \end{bmatrix} + \begin{bmatrix} e_{\ell j 1} \\ e_{\ell j 2} \\ \vdots \\ e_{\ell j p} \end{bmatrix}.$$
• In vector form, the observation for the $j$-th unit in the $\ell$-th treatment group is written as
$$\mathbf{x}_{\ell j} = \mu + \tau_\ell + \mathbf{e}_{\ell j},$$
where all terms are $p$-dimensional vectors and $\mathbf{e}_{\ell j} \sim N_p(0, \Sigma_\ell)$.

MANOVA: Multivariate Analysis of Variance

• A data matrix $X$ for all units in all groups has dimension $n \times p$, where $n = \sum_\ell n_\ell$. Each row of $X$ is a unit and each column represents a measurement:
$$X_{n \times p} = \begin{bmatrix} x_{111} & x_{112} & \cdots & x_{11p} \\ \vdots & \vdots & & \vdots \\ x_{1 n_1 1} & x_{1 n_1 2} & \cdots & x_{1 n_1 p} \\ x_{211} & x_{212} & \cdots & x_{21p} \\ \vdots & \vdots & & \vdots \\ x_{2 n_2 1} & x_{2 n_2 2} & \cdots & x_{2 n_2 p} \\ \vdots & \vdots & & \vdots \\ x_{g n_g 1} & x_{g n_g 2} & \cdots & x_{g n_g p} \end{bmatrix}.$$

MANOVA: Multivariate Analysis of Variance

• We can write the multivariate linear model as
$$X_{n \times p} = A_{n \times (g+1)} \, \beta_{(g+1) \times p} + \epsilon_{n \times p},$$
where $A$ has a leading column of ones followed by one indicator column per treatment group, $\beta$ stacks the overall mean and the treatment effects, and $\epsilon$ stacks the unit-level error vectors:
$$A = \begin{bmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{bmatrix}, \qquad \beta = \begin{bmatrix} \mu_1 & \mu_2 & \cdots & \mu_p \\ \tau_{11} & \tau_{12} & \cdots & \tau_{1p} \\ \tau_{21} & \tau_{22} & \cdots & \tau_{2p} \\ \vdots & \vdots & & \vdots \\ \tau_{g1} & \tau_{g2} & \cdots & \tau_{gp} \end{bmatrix}, \qquad \epsilon = \begin{bmatrix} \mathbf{e}_{11}' \\ \vdots \\ \mathbf{e}_{1 n_1}' \\ \mathbf{e}_{21}' \\ \vdots \\ \mathbf{e}_{g n_g}' \end{bmatrix}.$$

MANOVA (cont’d)

• Each column of the matrix $\beta$ corresponds to a variable (or measured trait).
• Each row of the error matrix $\epsilon$ is the transpose of a $p \times 1$ error vector.
• As written, the $n \times (g+1)$ design matrix $A$ has linearly dependent columns. To deal with this, SAS imposes the restriction $\tau_{g1} = \tau_{g2} = \cdots = \tau_{gp} = 0$, so that the last row of $\beta$ and the corresponding column of $A$ are eliminated. Under this restriction,
$$E(\mathbf{x}_{gj}) = \mu, \quad \text{and} \quad \tau_\ell = \mu_\ell - \mu_g = E(\mathbf{x}_{\ell j}) - E(\mathbf{x}_{gj}).$$

MANOVA (cont’d)

• With this restriction, $A$ becomes an $n \times g$ matrix of full column rank, and the MLE of the $g \times p$ matrix $\beta$ is
$$\hat{\beta}_{g \times p} = (A'_{g \times n} A_{n \times g})^{-1} A'_{g \times n} X_{n \times p}.$$
• When we set $\tau_g = 0$, $\hat{\beta}$ (as estimated by SAS) is
$$\hat{\beta} = \begin{bmatrix} \hat{\mu}' \\ \hat{\tau}_1' \\ \vdots \\ \hat{\tau}_{g-1}' \end{bmatrix} = \begin{bmatrix} \bar{\mathbf{x}}_g' \\ (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_g)' \\ \vdots \\ (\bar{\mathbf{x}}_{g-1} - \bar{\mathbf{x}}_g)' \end{bmatrix}.$$

MANOVA (cont’d)

• For the $k$-th measurement ($k$-th column of $\beta$, $k = 1, \ldots, p$) we have
$$\hat{\beta}_k \sim N_g\big(\beta_k, \, \sigma_{kk} (A'A)^{-1}\big), \quad \text{and} \quad \text{cov}(\hat{\beta}_k, \hat{\beta}_i) = \sigma_{ki} (A'A)^{-1}.$$
• Estimates of the $\sigma_{kk}$ and $\sigma_{ki}$ are obtained from the decomposition of the total sums of squares and cross-products into the matrix of treatment SS and CP and the matrix of error SS and CP.

Sums of squares and cross-products matrices

• As in the univariate case, we can write a $p$-dimensional observation vector as a sum of deviations:
$$(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}) = (\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}}) + (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell).$$
• Note that
$$(\mathbf{x}_{\ell j} - \bar{\mathbf{x}})(\mathbf{x}_{\ell j} - \bar{\mathbf{x}})' = (\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})(\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})' + (\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)' + (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)(\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})' + (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)'.$$

Sums of squares and cross-products matrices (cont’d)

• Within any treatment group, $\sum_{j=1}^{n_\ell} (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell) = 0$. Then
$$\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)' = 0 \quad \text{and} \quad \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)(\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})' = 0.$$
• It follows that
$$\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\mathbf{x}_{\ell j} - \bar{\mathbf{x}})(\mathbf{x}_{\ell j} - \bar{\mathbf{x}})' = \sum_{\ell=1}^{g} n_\ell (\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})(\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})' + \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)'.$$

Sums of squares and cross-products matrices (cont’d)

• The matrix on the left-hand side is called the corrected total sums of squares and cross-products matrix. The matrices on the right-hand side are called, respectively, the treatment sums of squares and cross-products matrix, denoted by $B$, and the error sums of squares and cross-products matrix, denoted by $W$ (for "within groups").
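As a concrete check of this matrix decomposition, the following sketch (again assuming numpy, with made-up illustrative data for $g = 3$ groups and $p = 2$ measurements) computes $B$ and $W$ directly from grouped observations and verifies that they add up to the corrected total SSCP matrix.

```python
import numpy as np

# Illustrative data (made-up): g = 3 groups, p = 2 measurements per unit
groups = [np.array([[9., 3.], [6., 2.], [9., 7.]]),
          np.array([[0., 4.], [2., 0.]]),
          np.array([[3., 8.], [1., 9.], [2., 7.]])]

X = np.vstack(groups)                      # n x p data matrix
n, p = X.shape
xbar = X.mean(axis=0)                      # overall mean vector

B = np.zeros((p, p))                       # treatment (between-group) SSCP
W = np.zeros((p, p))                       # error (within-group) SSCP
for x in groups:
    d = x.mean(axis=0) - xbar
    B += len(x) * np.outer(d, d)           # n_l (xbar_l - xbar)(xbar_l - xbar)'
    W += (x - x.mean(axis=0)).T @ (x - x.mean(axis=0))

T = (X - xbar).T @ (X - xbar)              # corrected total SSCP matrix
assert np.allclose(T, B + W)               # the decomposition above
```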
• Notice that we can re-write the $W$ matrix as
$$W = \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)' = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g.$$

Sums of squares and cross-products matrices (cont’d)

• If the $g$ population covariance matrices are homogeneous, then $S_1, S_2, \ldots, S_g$ estimate the same quantity. Then
$$W = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g = \Big[\sum_\ell (n_\ell - 1)\Big] S_{\text{pool}},$$
and an estimate of the pooled covariance matrix is given by
$$S_{\text{pool}} = \frac{W}{\sum_\ell (n_\ell - 1)} = \frac{W}{n - g}.$$
• The diagonal elements of $W/(n-g)$ estimate the $p$ variances, and the off-diagonal elements are estimates of the covariances.

Sums of squares and cross-products matrices (cont’d)

• Using the linear model set-up, we can extend some results from linear model theory and note that
$$B = X'(P_A - P_1)X, \qquad W = X'(I - P_A)X,$$
where $P_A = A(A'A)^{-1}A'$ is the usual idempotent projection matrix onto the column space of $A$, and $P_1 = n^{-1}\mathbf{1}\mathbf{1}'$ projects onto the constant vector. Subtracting $P_1$ centers the fitted group means about $\bar{\mathbf{x}}$, so that $B$ is the corrected treatment SSCP matrix.

Hypothesis Testing in MANOVA

• We often wish to test
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_g \quad \text{versus} \quad H_1: \text{at least two } \tau_\ell \text{ are not equal},$$
by comparing the relative sizes of $B$ and $W$.

Source of variation | Matrix of sums of squares and cross-products (SSP) | Degrees of freedom (d.f.)
Treatment | $B = \sum_\ell n_\ell (\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})(\bar{\mathbf{x}}_\ell - \bar{\mathbf{x}})'$ | $g - 1$
Residual | $W = \sum_\ell \sum_j (\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)(\mathbf{x}_{\ell j} - \bar{\mathbf{x}}_\ell)'$ | $n - g$
Total corrected | $B + W = \sum_\ell \sum_j (\mathbf{x}_{\ell j} - \bar{\mathbf{x}})(\mathbf{x}_{\ell j} - \bar{\mathbf{x}})'$ | $n - 1$

Hypothesis Testing in MANOVA (cont’d)

• One test of the null hypothesis is carried out using a statistic called Wilks' $\Lambda$ (a likelihood ratio test):
$$\Lambda = \frac{|W|}{|B + W|}.$$
• If $B$ is "small" relative to $W$, then $\Lambda$ will be close to 1. Otherwise, $\Lambda$ will be small. We reject the null hypothesis when $\Lambda$ is small.
• SAS uses different notation: it calls the $B$ matrix $H$ and the $W$ matrix $E$, for "hypothesis" and "error", respectively.

Hypothesis Testing in MANOVA (cont’d)

• The exact sampling distribution of Wilks' $\Lambda$ can be derived only for special cases (see the table below).
• In general, for large $n$ and under $H_0$, Bartlett showed that, approximately,
$$-\left(n - 1 - \frac{p + g}{2}\right) \ln \Lambda \sim \chi^2_{p(g-1)}.$$
Thus, we reject $H_0$ at level $\alpha$ when
$$-\left(n - 1 - \frac{p + g}{2}\right) \ln \Lambda \ge \chi^2_{p(g-1)}(\alpha).$$

Exact distribution of Wilks' $\Lambda$

No. of variables | No. of groups | Sampling distribution for multivariate normal data
$p = 1$ | $g \ge 2$ | $\dfrac{n-g}{g-1} \cdot \dfrac{1 - \Lambda}{\Lambda} \sim F_{g-1,\, n-g}$
$p = 2$ | $g \ge 2$ | $\dfrac{n-g-1}{g-1} \cdot \dfrac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \sim F_{2(g-1),\, 2(n-g-1)}$
$p \ge 1$ | $g = 2$ | $\dfrac{n-p-1}{p} \cdot \dfrac{1 - \Lambda}{\Lambda} \sim F_{p,\, n-p-1}$
$p \ge 1$ | $g = 3$ | $\dfrac{n-p-2}{p} \cdot \dfrac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \sim F_{2p,\, 2(n-p-2)}$

Other Tests

• Most packages (including SAS) will compute Wilks' $\Lambda$ and some other statistics.
• Note that
$$\Lambda = \frac{|W|}{|B + W|} = |W||B + W|^{-1} = |BW^{-1} + I|^{-1}.$$
• Lawley–Hotelling trace: $T_0^2 = \text{tr}(BW^{-1})$. For large $n$, reject the null hypothesis of no treatment differences at level $\alpha$ if
$$n\,T_0^2 = n\,\text{tr}(BW^{-1}) \ge \chi^2_{p(g-1)}(\alpha).$$
• Pillai's trace: $V = \text{tr}[B(B + W)^{-1}]$.

Other Tests

• Roy's maximum root: the test statistic is the largest eigenvalue of $BW^{-1}$. (The F-distribution used by SAS for this statistic is not accurate.)
• The power of the Wilks', Lawley–Hotelling, and Pillai statistics is similar. Roy's statistic has higher power only when one of the $g$ treatments is very different from the rest.
• Limited simulation results suggest that Pillai's trace may be slightly more robust to departures from multivariate normality.
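All four statistics are functions of the eigenvalues of $W^{-1}B$, so they are easy to compute side by side. Here is a minimal sketch (assuming numpy and scipy, and reusing the made-up illustrative data from the $B$ and $W$ example above):

```python
import numpy as np
from scipy import stats

# Recompute B and W from the same made-up illustrative data as above
groups = [np.array([[9., 3.], [6., 2.], [9., 7.]]),
          np.array([[0., 4.], [2., 0.]]),
          np.array([[3., 8.], [1., 9.], [2., 7.]])]
X = np.vstack(groups)
n, p, g = X.shape[0], X.shape[1], len(groups)
xbar = X.mean(axis=0)
B = sum(len(x) * np.outer(x.mean(0) - xbar, x.mean(0) - xbar) for x in groups)
W = sum((x - x.mean(0)).T @ (x - x.mean(0)) for x in groups)

# Eigenvalues of W^{-1}B (real for these matrices; drop numerical noise)
lam = np.linalg.eigvals(np.linalg.solve(W, B)).real

wilks = np.prod(1.0 / (1.0 + lam))          # |W| / |B + W|
pillai = np.sum(lam / (1.0 + lam))          # tr[B(B + W)^{-1}]
lawley_hotelling = np.sum(lam)              # tr(BW^{-1})
roy = lam.max()                             # largest eigenvalue of BW^{-1}

# Bartlett's large-sample chi-square approximation for Wilks' Lambda
chi2_stat = -(n - 1 - (p + g) / 2.0) * np.log(wilks)
print(wilks, pillai, lawley_hotelling, roy)
print(chi2_stat, stats.chi2.ppf(0.95, p * (g - 1)))
```

Working from the eigenvalues, rather than forming the determinants and traces directly, computes all four statistics from a single decomposition and mirrors the identities on the "Other Tests" slides.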