Confidence Regions

• Confidence regions are multivariate extensions of univariate confidence intervals.
• Recall the definition of a 100(1 − α)% CI for a parameter θ: for $X \sim f(x \mid \theta)$, $\theta \in \Theta$, the interval $(t_1(X), t_2(X))$ is a 100(1 − α)% CI for θ if
$$\Pr[t_1(X) \le \theta \le t_2(X)] = 1 - \alpha.$$
• If θ represents a univariate mean µ, a 100(1 − α)% CI for µ is given by
$$\left[\bar{X} - t_{n-1,\alpha/2}\sqrt{s^2/n},\ \ \bar{X} + t_{n-1,\alpha/2}\sqrt{s^2/n}\right].$$
• Similarly, for $\theta_{p \times 1}$, the region R(X) is a 100(1 − α)% confidence region for θ if
$$\Pr[R(X) \text{ will cover the true } \theta] = 1 - \alpha.$$
• For the mean vector $\mu_{p \times 1}$, we know that before the sample is selected,
$$\Pr\left[\, n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu) \le \frac{(n-1)p}{(n-p)} F_{p,n-p}(\alpha) \right] = 1 - \alpha,$$
meaning that X̄ is within statistical distance $[(n-1)p\,F_{p,n-p}(\alpha)/(n-p)]^{1/2}$ from µ with probability 1 − α.
• Once a sample is obtained and x̄, S are computed, the set of values µ satisfying
$$n(\bar{x} - \mu)' S^{-1} (\bar{x} - \mu) \le \frac{(n-1)p}{(n-p)} F_{p,n-p}(\alpha)$$
defines an ellipsoidal region R(X) that is likely to cover µ.
• To decide whether a hypothesized value $\mu_0$ is contained in the confidence region, we evaluate $n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)$ and compare it to the scaled F value above. If the squared distance from x̄ to $\mu_0$ is larger than $(n-1)p\,F_{p,n-p}(\alpha)/(n-p)$, then $\mu_0$ is not in the confidence region.
• This is exactly equivalent to testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ using Hotelling's $T^2$ statistic.
• Thus, the 100(1 − α)% confidence region is composed of all values $\mu_0$ for which the $T^2$ test would NOT reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ at level α.
• What can we say about the shape of the confidence region? It is a p-dimensional ellipsoid centered at the sample mean vector X̄.
• Recall that if $(\lambda_i, e_i)$ is an eigenvalue-eigenvector pair of S, then letting $c^2 = (n-1)p\,F_{p,n-p}(\alpha)/(n-p)$, the ith axis of the confidence ellipse has half length
$$c\,\sqrt{\frac{\lambda_i}{n}} = \sqrt{\frac{\lambda_i}{n}}\,\sqrt{\frac{(n-1)p\,F_{p,n-p}(\alpha)}{(n-p)}}$$
along the $e_i$ direction.
• Thus, beginning from the center of the ellipse at x̄, the axes of the confidence ellipse extend
$$\pm \sqrt{\lambda_i}\,\sqrt{\frac{(n-1)p}{n(n-p)} F_{p,n-p}(\alpha)}$$
along the directions of the eigenvectors of S.
• Since the second term is constant for all axes, the ratios of the $\sqrt{\lambda_i}$ reflect the relative elongations.
• Larger differences in the sample variances across the p measurements (due to 'real' causes or to differences in the scale of the measurements) will create larger ratios of eigenvalues (correlations are also involved).

Example: Microwave Ovens

• Recall the microwave oven radiation data in Tables 4.1 and 4.5, where two radiation measurements, $x_1$ and $x_2$, were obtained from n = 42 ovens. Here, the $x_j$ denote the Box-Cox transformed radiation measurements, using a power λ = 0.25.
• Sample statistics for those data are:
$$\bar{x} = \begin{bmatrix} 0.564 \\ 0.603 \end{bmatrix}, \quad S = \begin{bmatrix} 0.014 & 0.012 \\ 0.012 & 0.015 \end{bmatrix}, \quad S^{-1} = \begin{bmatrix} 203.02 & -163.39 \\ -163.39 & 200.23 \end{bmatrix}.$$
• Eigenvalue and eigenvector pairs for S are
$$\lambda_1 = 0.026,\ \ e_1' = [0.704,\ 0.710]; \qquad \lambda_2 = 0.002,\ \ e_2' = [-0.710,\ 0.704].$$
• The 95% CR for µ is given by all values $\mu_1, \mu_2$ that satisfy:
$$42\,[\,0.564 - \mu_1 \ \ \ 0.603 - \mu_2\,] \begin{bmatrix} 203.02 & -163.39 \\ -163.39 & 200.23 \end{bmatrix} \begin{bmatrix} 0.564 - \mu_1 \\ 0.603 - \mu_2 \end{bmatrix} \le 6.62,$$
where
$$\frac{2(41)}{40} F_{2,40}(0.05) = \frac{2(41)}{40}\,(3.23) = 6.62.$$
• Is $\mu_0 = [0.562,\ 0.589]'$ a plausible value for µ? To check, plug $\mu_0$ into the expression above and see whether it satisfies the inequality. In this case we get 1.30, which is less than 6.62, and conclude that $\mu_0$ is plausible at the 95% level.
• The joint confidence ellipsoid is centered at $\bar{x} = [0.564,\ 0.603]'$ and the half lengths of the two axes are
$$\sqrt{0.026}\,\sqrt{\frac{2(41)}{42(40)}\,3.23} = 0.064, \qquad \sqrt{0.002}\,\sqrt{\frac{2(41)}{42(40)}\,3.23} = 0.018.$$
• The axes lie in the directions of the two eigenvectors when x̄ is taken as the origin.
• The ratio
$$\frac{\sqrt{0.026}}{\sqrt{0.002}} = 3.6$$
indicates that the major axis is 3.6 times longer than the minor axis.
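
The microwave oven calculations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`scaled_f`, `t2_statistic`, `axis_half_lengths`) are hypothetical, the inputs are the rounded values quoted on the slides, and the critical value 3.23 is taken from the slides rather than computed.

```python
import math

# Values quoted in the slides for the microwave oven data (n = 42, p = 2).
n, p = 42, 2
xbar = (0.564, 0.603)
S_inv = ((203.02, -163.39),
         (-163.39, 200.23))
eigenvalues = (0.026, 0.002)
F_CRIT = 3.23  # F_{2,40}(0.05), as quoted in the slides (not computed here)


def scaled_f(n, p, f_crit):
    """Return c^2 = (n - 1) p F_{p,n-p}(alpha) / (n - p)."""
    return (n - 1) * p * f_crit / (n - p)


def t2_statistic(n, xbar, s_inv, mu0):
    """Return n (xbar - mu0)' S^{-1} (xbar - mu0)."""
    d = [xb - m for xb, m in zip(xbar, mu0)]
    quad = sum(d[i] * s_inv[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return n * quad


def axis_half_lengths(n, eigenvalues, c2):
    """Return the half length sqrt(lambda_i) * sqrt(c^2 / n) of each axis."""
    return [math.sqrt(lam * c2 / n) for lam in eigenvalues]


c2 = scaled_f(n, p, F_CRIT)
t2 = t2_statistic(n, xbar, S_inv, (0.562, 0.589))
half = axis_half_lengths(n, eigenvalues, c2)

# mu0 lies inside the 95% confidence region exactly when T^2 <= c^2.
print(f"c^2 = {c2:.2f}, T^2 = {t2:.2f}, inside region: {t2 <= c2}")
print("axis half lengths:", [round(h, 3) for h in half])
```

Up to rounding, this should reproduce the slide values: a scaled F value of 6.62, a squared distance of 1.30 for the hypothesized mean, and half lengths of 0.064 and 0.018.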

Simultaneous Confidence Statements

• Often we are interested in drawing inference about each $\mu_j$.
• One possibility is to construct ordinary confidence intervals
$$\bar{x}_j \pm t_{n-1}(\alpha/2)\,\sqrt{\frac{s_{jj}}{n}}$$
for each $\mu_j$. One problem is that the combined set of individual intervals results in a simultaneous confidence level that is less than the nominal 1 − α.
• There are various ways of constructing a collection of individual confidence intervals so that the joint confidence level for the family of parameters remains at 1 − α.
• Intuitively, CIs that protect against erosion of the confidence level will be wider than the individual 100(1 − α)% CIs.
• Suppose that we have p variables. The population mean of the first variable, $\mu_1$, can be written as $a_1'\mu = [1\ 0\ \dots\ 0]\,\mu$, and in general $\mu_j = a_j'\mu$, where $a_j'$ is the 1 × p row vector with a one in the jth position and zeros in all other positions.
• Given a sample $x_1, x_2, \dots, x_n$ of p-dimensional vectors, an estimator of $\mu_j$ is $a_j'\bar{x}$, with an estimated variance of $a_j' S a_j / n$.
• Then, an ordinary 100(1 − α)% CI for $\mu_j$ can be written as
$$a_j'\bar{x} \pm t_{n-1}(\alpha/2)\,\sqrt{\frac{a_j' S a_j}{n}}.$$
• An alternative way to interpret the ordinary 100(1 − α)% confidence interval is as follows: the CI is the set of values of $a_j'\mu$ for which
$$|t| = \left| \frac{\sqrt{n}\,(a_j'\bar{x} - a_j'\mu)}{\sqrt{a_j' S a_j}} \right| \le t_{n-1}(\alpha/2),$$
or, equivalently,
$$t^2 = \frac{n\,(a_j'\bar{x} - a_j'\mu)^2}{a_j' S a_j} = \frac{n\,(a_j'(\bar{x} - \mu))^2}{a_j' S a_j} \le t^2_{n-1}(\alpha/2).$$
• Intuitively, if we wish to construct a set of intervals for many different vectors a and have confidence level 1 − α that all intervals will cover the true $a'\mu$, we will need a larger critical value on the right-hand side of the inequality.
• What is the maximum value that the statistic $t^2$ can reach for some vector a?
$$\max_a t^2 = \max_a \frac{n\,(a'(\bar{x} - \mu))^2}{a' S a} = n(\bar{x} - \mu)' S^{-1} (\bar{x} - \mu) = T^2,$$
using the maximization lemma (2.50) on page 80 of your textbook (you checked this on an assignment).
• The maximum $T^2$ is achieved when a is proportional to $S^{-1}(\bar{x} - \mu)$.
• Let $X_1, \dots, X_n$ be a sample from $N_p(\mu, \Sigma)$. Then, simultaneously for all a, the intervals given by
$$a'\bar{X} \pm \sqrt{\frac{p(n-1)}{(n-p)} F_{p,n-p}(\alpha)\,\frac{a' S a}{n}}$$
will cover $a'\mu$ with probability at least 1 − α.
Proof: recall that
$$T^2 = n(\bar{x} - \mu)' S^{-1} (\bar{x} - \mu) \le c^2 \ \Longrightarrow\ \frac{n\,(a'(\bar{x} - \mu))^2}{a' S a} \le c^2 \quad \text{for every } a.$$
• Equivalently:
$$a'\bar{x} - c\,\sqrt{a' S a / n} \ \le\ a'\mu \ \le\ a'\bar{x} + c\,\sqrt{a' S a / n} \quad \text{for all } a.$$
• Choosing $c^2 = p(n-1)F_{p,n-p}(\alpha)/(n-p)$ results in intervals that contain $a'\mu$ with probability no smaller than $1 - \alpha = \Pr(T^2 \le c^2)$.
• The intervals we just defined are called $T^2$ intervals because their length is determined by the sampling distribution of $T^2$.
• For a the vector with zeros everywhere and 1 in the jth position, the $T^2$ interval is
$$\bar{x}_j - \sqrt{\frac{p(n-1)}{(n-p)} F_{p,n-p}(\alpha)}\,\sqrt{\frac{s_{jj}}{n}} \ \le\ \mu_j \ \le\ \bar{x}_j + \sqrt{\frac{p(n-1)}{(n-p)} F_{p,n-p}(\alpha)}\,\sqrt{\frac{s_{jj}}{n}}.$$
• Note that for a the vector with 1 in the jth position, −1 in the kth position and zeros everywhere else, the interval corresponds to $\mu_j - \mu_k$. In this case, $a'\bar{x} = \bar{x}_j - \bar{x}_k$ and $a' S a = s_{jj} - 2s_{jk} + s_{kk}$.

Example: Microwave Ovens (cont'd)

• Earlier we obtained a simultaneous 95% confidence ellipsoid for $\mu_1$ and $\mu_2$, the means of the fourth root of radiation with door closed and door open.
• We now compute 95% $T^2$ intervals for the two means. First note that the factor
$$\sqrt{\frac{p(n-1)}{n(n-p)} F_{p,n-p}(0.05)} = \sqrt{\frac{2(41)}{42(40)}\,3.23} = 0.397$$
is common to both intervals.
• For $\mu_1$ and $\mu_2$:
$$\bar{x}_1 \pm 0.397\sqrt{s_{11}} \ \Rightarrow\ 0.564 \pm (0.397 \times 0.12) \ \Rightarrow\ 0.564 \pm 0.0476$$
$$\bar{x}_2 \pm 0.397\sqrt{s_{22}} \ \Rightarrow\ 0.603 \pm (0.397 \times 0.121) \ \Rightarrow\ 0.603 \pm 0.048.$$
• For the difference between doors closed and open:
$$\bar{x}_1 - \bar{x}_2 \pm 0.397\sqrt{s_{11} - 2s_{12} + s_{22}} \ \Rightarrow\ -0.039 \pm (0.397 \times 0.0748) \ \Rightarrow\ [-0.069,\ -0.009],$$
suggesting that closing the door significantly reduces the (fourth root) radiation emitted by the ovens.
• The $T^2$ intervals are shadows or projections of the confidence ellipse onto the component axes.

Comparison of simultaneous and ordinary t intervals

• The ordinary one-at-a-time t intervals each have coverage probability 1 − α, but the joint coverage probability of the p intervals is in general not known.
• In the special case where the covariance matrix Σ is diagonal, the joint coverage probability of p ordinary t intervals is $(1 - \alpha)^p$.
• Clearly, to guarantee 1 − α joint coverage probability, the t intervals need to be made wider.
• How much wider depends on p, n and α.
• The multipliers of $(s_{jj}/n)^{1/2}$ in the simultaneous intervals and in the t intervals are, respectively,
$$\sqrt{\frac{p(n-1)}{(n-p)} F_{p,n-p}(\alpha)} \quad \text{and} \quad t_{n-1}(\alpha/2).$$
• For example, for α = 0.05, n = 15 and p = 4, the multipliers are 4.14 and 2.145, so the simultaneous intervals are
$$\frac{4.14 - 2.145}{2.145} \times 100\% = 93\%$$
wider.
• A one-at-a-time t interval is the correct choice if we are interested in only one of the components of µ.
• While simultaneous $T^2$ intervals have the correct joint coverage probability, they tend to be too conservative if we are only interested in the p components of µ (as opposed to all possible linear combinations of the components).
• Note that for p = 2, the two $T^2$ intervals define a rectangle that contains the ellipse (with 95% coverage probability) and more.
• Thus, the rectangle formed by the two $T^2$ intervals has more than 1 − α coverage probability.
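
As a rough check on the $T^2$ interval arithmetic above, the following Python sketch evaluates the interval $a'\bar{x} \pm \sqrt{p(n-1)F_{p,n-p}(\alpha)/(n-p)}\,\sqrt{a'Sa/n}$ for three choices of a. The helper names are hypothetical. The entries of `S` are an assumption: they are less-rounded values chosen to be consistent with the square roots 0.12, 0.121 and 0.0748 used on the slides, since the slides print S only to three decimals.

```python
import math

n, p = 42, 2
F_CRIT = 3.23  # F_{2,40}(0.05), as quoted in the slides
xbar = (0.564, 0.603)
# ASSUMPTION: less-rounded covariance entries, chosen so that sqrt(s11) = 0.12,
# sqrt(s22) ~ 0.121 and sqrt(s11 - 2*s12 + s22) ~ 0.0748 as on the slides.
S = ((0.0144, 0.0117),
     (0.0117, 0.0146))


def t2_multiplier(n, p, f_crit):
    """Return sqrt(p (n - 1) F_{p,n-p}(alpha) / (n - p))."""
    return math.sqrt(p * (n - 1) * f_crit / (n - p))


def t2_interval(a, xbar, S, n, mult):
    """Return the simultaneous T^2 interval for the linear combination a'mu."""
    center = sum(ai * xi for ai, xi in zip(a, xbar))
    quad = sum(a[i] * S[i][j] * a[j]
               for i in range(len(a)) for j in range(len(a)))
    half_width = mult * math.sqrt(quad / n)
    return center - half_width, center + half_width


mult = t2_multiplier(n, p, F_CRIT)
ci_mu1 = t2_interval((1, 0), xbar, S, n, mult)    # mu_1
ci_mu2 = t2_interval((0, 1), xbar, S, n, mult)    # mu_2
ci_diff = t2_interval((1, -1), xbar, S, n, mult)  # mu_1 - mu_2

# Relative extra width of T^2 intervals over t intervals for n = 15, p = 4,
# using the two multipliers quoted in the slides.
extra_width = (4.14 - 2.145) / 2.145

for name, ci in (("mu1", ci_mu1), ("mu2", ci_mu2), ("mu1 - mu2", ci_diff)):
    print(f"{name}: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"extra width: {100 * extra_width:.0f}%")
```

Writing the interval as a function of a general vector a, rather than of a single component, mirrors the result that the same multiplier covers every linear combination simultaneously.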

The Bonferroni method for multiple comparisons

• The Bonferroni method is useful when we wish to make a small number m of comparisons for linear combinations $a_1'\mu, \dots, a_m'\mu$.
• Let $C_i$ denote a confidence statement about $a_i'\mu$ such that $\Pr(C_i \text{ true}) = 1 - \alpha_i$. Then
$$\begin{aligned} \Pr(\text{all } C_i \text{ true}) &= 1 - \Pr(\text{at least one } C_i \text{ false}) \\ &\ge 1 - \sum_i \Pr(C_i \text{ false}) = 1 - \sum_i \bigl(1 - \Pr(C_i \text{ true})\bigr) \\ &= 1 - (\alpha_1 + \alpha_2 + \dots + \alpha_m). \end{aligned}$$
• Consider, for example, m individual t intervals for $\mu_1, \dots, \mu_m$, with $\alpha_i = \alpha/m$. From the Bonferroni inequality, we have that:
$$\Pr\left[\bar{X}_i \pm t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s_{ii}}{n}} \ \text{contains } \mu_i, \text{ for all } i\right] \ \ge\ 1 - \sum_{i=1}^{m} \frac{\alpha}{m} = 1 - \alpha.$$
• In general, to make simultaneous confidence statements about p means, we divide the significance level α by p, the number of intervals we want to construct.
• Microwave ovens: see $T^2$ and Bonferroni intervals in next figure.

[Figure: $T^2$ and Bonferroni confidence intervals]
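
The Bonferroni bound and the resulting intervals can be sketched in Python as follows. Several inputs here are assumptions that do not appear on the slides: the critical value `T_CRIT = 2.327` is an approximate table value for $t_{41}(0.0125)$, and the variances 0.0144 and 0.0146 are less-rounded values consistent with the square roots 0.12 and 0.121 used in the $T^2$ example. The function names are hypothetical.

```python
import math

n, m, alpha = 42, 2, 0.05
xbar = (0.564, 0.603)
# ASSUMPTION: less-rounded sample variances s11, s22 (the slides print 0.014, 0.015).
variances = (0.0144, 0.0146)
# ASSUMPTION: approximate table value of t_{41}(alpha / (2m)) = t_41(0.0125);
# this number is not given in the slides.
T_CRIT = 2.327


def bonferroni_lower_bound(alphas):
    """Lower bound 1 - (alpha_1 + ... + alpha_m) on the joint coverage."""
    return 1 - sum(alphas)


def bonferroni_intervals(xbar, variances, n, t_crit):
    """Return the intervals xbar_i +/- t_crit * sqrt(s_ii / n)."""
    return [(x - t_crit * math.sqrt(s / n), x + t_crit * math.sqrt(s / n))
            for x, s in zip(xbar, variances)]


bound = bonferroni_lower_bound([alpha / m] * m)
intervals = bonferroni_intervals(xbar, variances, n, T_CRIT)
half_widths = [(hi - lo) / 2 for lo, hi in intervals]

print(f"guaranteed joint coverage >= {bound:.2f}")
for i, (lo, hi) in enumerate(intervals, start=1):
    print(f"mu{i}: [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the Bonferroni half widths come out near 0.043, narrower than the $T^2$ half widths of about 0.048 computed earlier, which fits the remark that $T^2$ intervals are conservative when only the p component means are of interest.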