Inferences about a Mean Vector
• In the following lectures, we test hypotheses about a $p \times 1$ population mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$.
• We could test $p$ separate hypotheses (one for each $\mu_j$ in $\mu$), but that would not take advantage of the correlations between the measured traits $(X_1, X_2, \ldots, X_p)$.
• We first review hypothesis testing in the univariate case, and then develop the multivariate Hotelling's $T^2$ statistic and the likelihood ratio statistic for multivariate hypothesis testing.
• We consider applications to repeated measures (longitudinal) studies.
• We also consider situations when data are incomplete (data are missing at random).

Approaches to Multivariate Inference
• Define a reasonable distance measure. An estimated mean vector that is too "far away" from the hypothesized mean vector $\mu_0$ gives evidence against the null hypothesis.
• Construct a likelihood ratio test based on the multivariate normal distribution.
• "Union-Intersection" approach: Consider a univariate test of $H_0: a'\mu = a'\mu_0$ versus $H_a: a'\mu \neq a'\mu_0$ for some linear combination of the traits $a'X$. Optimize over possible values of $a$.

Review of Univariate Hypothesis Testing
• Is a given value $\mu_0$ a plausible value for the population mean $\mu$?
• We formulate the problem as a hypothesis testing problem. The competing hypotheses are $H_0: \mu = \mu_0$ and $H_a: \mu \neq \mu_0$.
• Given a sample $X_1, \ldots, X_n$ from a normal population, we compute the test statistic
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$
• If $|t|$ is "small", then $\bar{X}$ and $\mu_0$ are close relative to the standard error $s/\sqrt{n}$, and we fail to reject $H_0$.

Univariate Hypothesis Testing (cont'd)
• When $H_0$ is true, the statistic $t$ has a Student's t distribution with $n - 1$ degrees of freedom. We reject the null hypothesis at level $\alpha$ when $|t| > t_{n-1}(\alpha/2)$.
• Notice that rejecting $H_0$ when $|t|$ is large is equivalent to rejecting it when the squared standardized distance
$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0)$$
is large.
• We reject $H_0$ when
$$n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0) > t^2_{n-1}(\alpha/2),$$
i.e., when the squared standardized distance exceeds the upper $100\alpha$th percentile of a central F-distribution with 1 and $n - 1$ degrees of freedom.

Univariate Hypothesis Testing (cont'd)
• If we fail to reject $H_0$, we conclude that $\mu_0$ is close (in units of standard deviations of $\bar{X}$) to $\bar{X}$, and thus is a plausible value for $\mu$.
• The set of plausible values for $\mu$ is the set of all values $\mu_0$ that lie in the $100(1 - \alpha)\%$ confidence interval for $\mu$:
$$\bar{x} - t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}} \le \mu_0 \le \bar{x} + t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}}.$$
• The confidence interval consists of all the $\mu_0$ values that would not be rejected by the level $\alpha$ test of $H_0: \mu = \mu_0$.
• Before collecting the data, the interval is random and has probability $1 - \alpha$ of containing $\mu$.

Hotelling's $T^2$ Statistic
• Consider now the problem of testing whether the $p \times 1$ vector $\mu_0$ is a plausible value for the population mean vector $\mu$.
• The squared distance
$$T^2 = (\bar{X} - \mu_0)' \left(\frac{1}{n} S\right)^{-1} (\bar{X} - \mu_0) = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0)$$
is called the Hotelling $T^2$ statistic.
• In the expression above,
$$\bar{X} = \frac{1}{n}\sum_i X_i, \qquad S = \frac{1}{n-1}\sum_i (X_i - \bar{X})(X_i - \bar{X})'.$$

Hotelling's $T^2$ Statistic (cont'd)
• If the observed $T^2$ value is "large", we reject $H_0: \mu = \mu_0$.
• To decide how large is large, we need the sampling distribution of $T^2$ when the hypothesized mean vector is correct:
$$T^2 \sim \frac{(n-1)p}{n-p} F_{p,n-p}.$$
• We reject the null hypothesis $H_0: \mu = \mu_0$ for the $p$-dimensional vector $\mu$ at level $\alpha$ when
$$T^2 > \frac{(n-1)p}{n-p} F_{p,n-p}(\alpha),$$
where $F_{p,n-p}(\alpha)$ is the upper $100\alpha$th percentile of the central F distribution with $p$ and $n - p$ degrees of freedom.

Hotelling's $T^2$ Statistic (cont'd)
• As we noted earlier,
$$T^2 = (\bar{X} - \mu_0)' \left(\frac{1}{n} S\right)^{-1} (\bar{X} - \mu_0) = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0)$$
has an approximate central chi-square distribution with $p$ degrees of freedom when $\mu_0$ is correct and $n$ is large. When $\Sigma$ is known and used in place of $S$, the chi-square distribution is exact under normality.
• The exact F-distribution relies on the normality assumption.
• Note that
$$\frac{(n-1)p}{n-p} F_{p,n-p}(\alpha) > \chi^2_p(\alpha),$$
but these quantities are nearly equal for large values of $n - p$.

Example 5.2: Female Sweat Data
• Perspiration from a sample of 20 healthy females was analyzed. Three variables were measured for each woman:
  $X_1$ = sweat rate
  $X_2$ = sodium content
  $X_3$ = potassium content
• The question is whether $\mu_0 = [4, 50, 10]'$ is plausible for the population mean vector.

Example 5.2: Sweat Data (cont'd)
• At level $\alpha = 0.1$, we reject the null hypothesis if
$$T^2 = 20(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0) > \frac{(n-1)p}{n-p} F_{p,n-p}(0.1) = \frac{19(3)}{17} F_{3,17}(0.1) = 8.18.$$
• From the data displayed in Table 5.1:
$$\bar{x} = \begin{bmatrix} 4.64 \\ 45.4 \\ 9.96 \end{bmatrix} \quad \text{and} \quad \bar{x} - \mu_0 = \begin{bmatrix} 4.64 - 4 \\ 45.4 - 50 \\ 9.96 - 10 \end{bmatrix} = \begin{bmatrix} 0.64 \\ -4.6 \\ -0.04 \end{bmatrix}.$$

Example 5.2: Sweat Data (cont'd)
• After computing the inverse $S^{-1}$ of the $3 \times 3$ sample covariance matrix, we can compute the value of the $T^2$ statistic as
$$T^2 = 20 \begin{bmatrix} 0.64 & -4.6 & -0.04 \end{bmatrix} \begin{bmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{bmatrix} \begin{bmatrix} 0.64 \\ -4.60 \\ -0.04 \end{bmatrix} = 9.74.$$
• Since $9.74 > 8.18$, we reject $H_0$ and conclude that $\mu_0$ is not a plausible value for $\mu$ at the 10% level.
• At this point, we do not know which of the three hypothesized mean values is not supported by the data.

The Female Sweat Data: R code – sweat.R

    # Read the sweat data: subject ID followed by the three measured traits
    sweat <- read.table(file = "http://www.public.iastate.edu/~maitra/stat501/datasets/sweat.dat",
                        header = FALSE, col.names = c("subject", "x1", "x2", "x3"))
    library(ICSNP)
    # Hypothesized mean vector mu0 from Example 5.2 (it must be defined before the call)
    nullmean <- c(4, 50, 10)
    HotellingsT2(X = sweat[, -1], mu = nullmean)
    # Hotelling's one sample T2-test
    # data: sweat[, -1]
    # T.2 = 2.9045, df1 = 3, df2 = 17, p-value = 0.06493
    # alternative hypothesis: true location is not equal to c(4,50,10)
    # Note: the reported T.2 appears to be on the F scale,
    # (n - p) / ((n - 1) p) * T^2 = (17 / 57) * 9.74, which is about 2.90

Invariance property of Hotelling's $T^2$
• The $T^2$ statistic is invariant to changes in units of measurement of the form
$$Y_{p \times 1} = C_{p \times p} X_{p \times 1} + d_{p \times 1},$$
with $C$ non-singular. An example of such a transformation is the conversion of temperature measurements from Fahrenheit to Celsius.
• Note that given observations $x_1, \ldots, x_n$, we find that $\bar{y} = C\bar{x} + d$ and $S_y = C S C'$.
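• Similarly, $E(Y) = C\mu + d$ and the hypothesized value is $\mu_{Y,0} = C\mu_0 + d$.

Invariance property of Hotelling's $T^2$ (cont'd)
• We now show that $T_y^2 = T_x^2$:
$$\begin{aligned}
T_y^2 &= n(\bar{y} - \mu_{Y,0})' S_y^{-1} (\bar{y} - \mu_{Y,0}) \\
&= n(C(\bar{x} - \mu_0))' (C S C')^{-1} (C(\bar{x} - \mu_0)) \\
&= n(\bar{x} - \mu_0)' C' (C')^{-1} S^{-1} C^{-1} C (\bar{x} - \mu_0) \\
&= n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0).
\end{aligned}$$
• The Hotelling $T^2$ test is the most powerful test in the class of tests that are invariant to full-rank linear transformations.

Likelihood Ratio Test and Hotelling's $T^2$
• Compare the maximum value of the multivariate normal likelihood function under no restrictions against the restricted maximized value with the mean vector held at $\mu_0$. The hypothesized value $\mu_0$ will be plausible if it produces a likelihood value almost as large as the unrestricted maximum.
• To test $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$ we construct the ratio
$$\text{Likelihood ratio} = \Lambda = \frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)} = \left( \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} \right)^{n/2},$$
where the numerator is the likelihood at the MLE of $\Sigma$ given that $\mu = \mu_0$, and the denominator is the likelihood at the unrestricted MLEs of both $\mu$ and $\Sigma$.

Likelihood Ratio Test and Hotelling's $T^2$
• Since
$$\begin{aligned}
\hat{\Sigma}_0 &= n^{-1} \sum_i (x_i - \mu_0)(x_i - \mu_0)' && \text{under } H_0, \\
\hat{\mu} &= n^{-1} \sum_i x_i = \bar{x} && \text{under } H_0 \cup H_1, \\
\hat{\Sigma} &= n^{-1} \sum_i (x_i - \bar{x})(x_i - \bar{x})' = n^{-1} A && \text{under } H_0 \cup H_1,
\end{aligned}$$
then under the assumption of multivariate normality
$$\Lambda = \frac{|\hat{\Sigma}_0|^{-n/2} \exp\{-\mathrm{tr}[\hat{\Sigma}_0^{-1} \sum_i (x_i - \mu_0)(x_i - \mu_0)']/2\}}{|\hat{\Sigma}|^{-n/2} \exp\{-\mathrm{tr}[\hat{\Sigma}^{-1} A]/2\}}.$$
The constant $(2\pi)^{-np/2}$ appears in both the numerator and the denominator and cancels.

Derivation of Likelihood Ratio Test
• Numerator (likelihood maximized under $H_0$):
$$\begin{aligned}
& |\hat{\Sigma}_0|^{-n/2} \exp\Big\{-\frac{1}{2} \sum_i (x_i - \mu_0)' \hat{\Sigma}_0^{-1} (x_i - \mu_0)\Big\} \\
&= |\hat{\Sigma}_0|^{-n/2} \exp\Big\{-\frac{1}{2} \sum_i \mathrm{tr}\big[(x_i - \mu_0)' \hat{\Sigma}_0^{-1} (x_i - \mu_0)\big]\Big\} \\
&= |\hat{\Sigma}_0|^{-n/2} \exp\Big\{-\frac{1}{2} \mathrm{tr}\Big[\hat{\Sigma}_0^{-1} \sum_i (x_i - \mu_0)(x_i - \mu_0)'\Big]\Big\} \\
&= |\hat{\Sigma}_0|^{-n/2} \exp\Big\{-\frac{1}{2} \mathrm{tr}\big[\hat{\Sigma}_0^{-1} \hat{\Sigma}_0 n\big]\Big\} \\
&= |\hat{\Sigma}_0|^{-n/2} \exp\Big\{-\frac{np}{2}\Big\}.
\end{aligned}$$

Derivation of Likelihood Ratio Test
• Denominator (unrestricted maximized likelihood):
$$\begin{aligned}
& |\hat{\Sigma}|^{-n/2} \exp\Big\{-\frac{1}{2} \sum_i (x_i - \bar{x})' \hat{\Sigma}^{-1} (x_i - \bar{x})\Big\} \\
&= |\hat{\Sigma}|^{-n/2} \exp\Big\{-\frac{1}{2} \sum_i \mathrm{tr}\big[(x_i - \bar{x})' \hat{\Sigma}^{-1} (x_i - \bar{x})\big]\Big\} \\
&= |\hat{\Sigma}|^{-n/2} \exp\Big\{-\frac{1}{2} \mathrm{tr}\Big[\hat{\Sigma}^{-1} \sum_i (x_i - \bar{x})(x_i - \bar{x})'\Big]\Big\} \\
&= |\hat{\Sigma}|^{-n/2} \exp\Big\{-\frac{1}{2} \mathrm{tr}\big[\hat{\Sigma}^{-1} \hat{\Sigma} n\big]\Big\} \\
&= |\hat{\Sigma}|^{-n/2} \exp\Big\{-\frac{np}{2}\Big\}.
\end{aligned}$$

Derivation of Likelihood Ratio Test
$$\Lambda = \frac{|\hat{\Sigma}_0|^{-n/2} \exp\{-\frac{np}{2}\}}{|\hat{\Sigma}|^{-n/2} \exp\{-\frac{np}{2}\}} = \frac{|\hat{\Sigma}_0|^{-n/2}}{|\hat{\Sigma}|^{-n/2}} = \frac{|\hat{\Sigma}|^{n/2}}{|\hat{\Sigma}_0|^{n/2}} = \left( \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} \right)^{n/2}.$$
• $\mu_0$ is a plausible value for $\mu$ if $\Lambda$ is close to one.

Relationship between $\Lambda$ and $T^2$
• It is just a matter of algebra to show that
$$\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \left[ 1 + \frac{T^2}{n-1} \right]^{-1}, \quad \text{or} \quad \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} = 1 + \frac{T^2}{n-1}.$$
• For large $T^2$, the likelihood ratio is small, and both lead to rejection of $H_0$.

Relationship between $\Lambda$ and $T^2$
• From the previous equation,
$$T^2 = \left[ \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1 \right] (n-1),$$
which provides another way to compute $T^2$ that does not require inverting a covariance matrix.
• When $H_0: \mu = \mu_0$ is true, the exact distribution of the likelihood ratio test statistic is obtained from
$$T^2 = \left[ \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} - 1 \right] (n-1) \sim \frac{p(n-1)}{n-p} F_{p,n-p}.$$

Union-Intersection Derivation of $T^2$
• Consider a reduction from $p$-dimensional observation vectors to univariate observations
$$Y_j = a'X_j = a_1 X_{j1} + a_2 X_{j2} + \cdots + a_p X_{jp} \sim NID(a'\mu, a'\Sigma a),$$
where $a' = (a_1, a_2, \ldots, a_p)$ and NID means normally and independently distributed.
• The null hypothesis $H_0: \mu = \mu_0$ is true if and only if all null hypotheses of the form $H_{(0,a)}: a'\mu = a'\mu_0$ are true.
• Test $H_{(0,a)}: a'\mu = a'\mu_0$ versus $H_{(A,a)}: a'\mu \neq a'\mu_0$ with
$$t^2_{(a)} = \left[ \frac{\bar{Y} - a'\mu_0}{s_{\bar{Y}}} \right]^2 = \left[ \frac{a'\bar{X} - a'\mu_0}{\sqrt{\frac{1}{n} a' S a}} \right]^2.$$

Union-Intersection Derivation of $T^2$
• If you cannot reject the null hypothesis for the $a$ that maximizes $t^2_{(a)}$, you cannot reject any of the univariate null hypotheses and you cannot reject the multivariate null hypothesis $H_0: \mu = \mu_0$.
• From previous results, a vector that maximizes $t^2_{(a)}$ is $a = S^{-1}(\bar{X} - \mu_0)$.
• Consequently, the maximum squared t-statistic is
$$T^2 = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0).$$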
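
The determinant identity and the union-intersection result can be checked numerically. The following is a minimal sketch in R on simulated data, not the sweat data; the variable names (`Sigma_hat`, `Sigma0_hat`, `a_star`, `t2`) and the simulation settings are illustrative assumptions rather than part of the lecture code. It computes $T^2$ from its definition, recomputes it from the ratio of determinants, and compares $t^2_{(a)}$ at $a = S^{-1}(\bar{x} - \mu_0)$ with its value at randomly drawn vectors $a$.

```r
# Sketch on simulated data: all names and settings here are illustrative.
set.seed(1)
n <- 20
p <- 3
mu0 <- c(4, 50, 10)                          # hypothesized mean vector
X <- sweep(matrix(rnorm(n * p), n, p), 2, mu0, "+")  # simulated n x p data

xbar <- colMeans(X)
S <- cov(X)                                  # sample covariance, divisor n - 1
d <- xbar - mu0

# Hotelling's T^2 from the definition n (xbar - mu0)' S^{-1} (xbar - mu0)
T2 <- n * drop(t(d) %*% solve(S, d))

# T^2 from the determinant identity (|Sigma0_hat| / |Sigma_hat| - 1)(n - 1)
Sigma_hat  <- crossprod(sweep(X, 2, xbar)) / n   # MLE of Sigma, mean unrestricted
Sigma0_hat <- crossprod(sweep(X, 2, mu0)) / n    # MLE of Sigma with mean fixed at mu0
T2_det <- (det(Sigma0_hat) / det(Sigma_hat) - 1) * (n - 1)

# Squared univariate t statistic for the linear combination a'X
t2 <- function(a) n * sum(a * d)^2 / drop(t(a) %*% S %*% a)
a_star <- solve(S, d)                        # candidate maximizer S^{-1}(xbar - mu0)

# t^2 evaluated at 1000 random directions, for comparison with T^2
t2_random <- apply(matrix(rnorm(1000 * p), 1000, p), 1, t2)

# F-scaled statistic and p-value under the normality assumption
Fstat <- (n - p) / ((n - 1) * p) * T2
pval <- pf(Fstat, p, n - p, lower.tail = FALSE)

print(c(T2 = T2, T2_det = T2_det, t2_at_a_star = t2(a_star),
        max_random_t2 = max(t2_random), p_value = pval))
```

If the identities hold, `T2`, `T2_det`, and `t2(a_star)` agree up to rounding error, and no random direction gives a larger $t^2_{(a)}$ than $T^2$. The determinant route avoids an explicit matrix inverse, although `solve(S, d)` is still used here for the direct computation.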