Chapter 5 Statistical Inference: Estimation and Testing Hypotheses

5.1 Data Sets & the Matrix Normal Distribution

The data matrix of $n$ observations on $p$ variables is
$$
X=\begin{pmatrix}x_{11}&\cdots&x_{1p}\\ \vdots& &\vdots\\ x_{n1}&\cdots&x_{np}\end{pmatrix}
\qquad(n\text{ observations, }p\text{ variables}),
$$
where the $n$ rows $X_1',\dots,X_n'$ are i.i.d. $N_p(\mu,\Sigma)$.

$\mathrm{Vec}(X')$ is an $np\times1$ random vector with mean vector $1_n\otimes\mu$ and covariance matrix $\mathrm{diag}(\Sigma,\dots,\Sigma)=I_n\otimes\Sigma$. We write
$$
X\sim N_{n\times p}(1_n\mu',\,I_n\otimes\Sigma).
$$
More generally, we can define the matrix normal distribution.

Definition 5.1.1 An $n\times p$ random matrix $X$ is said to follow a matrix normal distribution $N_{n\times p}(M,\,W\otimes V)$ if $\mathrm{Vec}(X')\sim N_{np}(\mu,\,W\otimes V)$, where $\mu=\mathrm{Vec}(M')$. In this case $X=M+BYA'$, where $W=BB'$, $V=AA'$, and $Y$ has i.i.d. elements, each following $N(0,1)$.

Theorem 5.1.1 The density function of $X\sim N_{n\times p}(M,\,W\otimes V)$ with $W>0$, $V>0$ is given by
$$
(2\pi)^{-np/2}\,|W|^{-p/2}\,|V|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12 W^{-1}(X-M)V^{-1}(X-M)'\right),
$$
where $\mathrm{etr}(A)=\exp(\mathrm{tr}(A))$.

Corollary 1 Let $X$ be the matrix of $n$ observations from $N_p(\mu,\Sigma)$. Then the density function of $X$ is
$$
(2\pi)^{-np/2}\,|\Sigma|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}A\right),
\qquad\text{where }A=\sum_{j=1}^n(x_j-\mu)(x_j-\mu)'.
$$

5.2 Maximum Likelihood Estimation

A. Review (univariate case)

$X_1,\dots,X_n$ are i.i.d. $N(\mu,\sigma^2)$.

Step 1. The likelihood function:
$$
L(\mu,\sigma^2)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}
=(2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2\right).
$$

Step 2. Domain (parameter space): $H=\{(\mu,\sigma^2):\mu\in\mathbb{R},\ \sigma^2>0\}$. The MLE of $(\mu,\sigma^2)$ maximizes $L(\mu,\sigma^2)$ over $H$.

Step 3. Maximization. Since $\sum_i(x_i-\mu)^2=\sum_i(x_i-\bar x)^2+n(\bar x-\mu)^2$,
$$
L(\mu,\sigma^2)=(2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\bar x)^2\right)\exp\!\left(-\frac{n(\bar x-\mu)^2}{2\sigma^2}\right)\le L(\bar x,\sigma^2),
$$
which implies $\hat\mu=\bar x$. Let $a=\sum_{i=1}^n(x_i-\bar x)^2$ and $g(\sigma^2)=(\sigma^2)^{-n/2}\exp\!\left(-\frac{a}{2\sigma^2}\right)$. Setting $g'(\sigma^2)=0$ gives
$$
\hat\sigma^2=\frac{a}{n}=\frac1n\sum_{i=1}^n(x_i-\bar x)^2.
$$
(Cf. Result 4.9, p. 168 of the textbook.)

B. Multivariate population

$X_1,\dots,X_n$ are samples from $N_p(\mu,\Sigma)$.

Step 1. The likelihood function:
$$
L(\mu,\Sigma)=(2\pi)^{-np/2}\,|\Sigma|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}A\right),
\qquad A=\sum_{j=1}^n(x_j-\mu)(x_j-\mu)'.
$$

Step 2. Domain: $H=\{(\mu,\Sigma):\mu\in\mathbb{R}^p,\ \Sigma\ p\times p,\ \Sigma>0\}$.

Step 3. Maximization

(a)
$$
\max_{\mu,\,\Sigma>0}L(\mu,\Sigma)=\max_{\Sigma>0}L(\bar x,\Sigma)
=\max_{\Sigma>0}(2\pi)^{-np/2}\,|\Sigma|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}B\right),
$$
where $B=\sum_{j=1}^n(x_j-\bar x)(x_j-\bar x)'$. We can prove that $P(B>0)=1$ if $n>p$.

(b) Let $B=CC'$, $C:p\times p$, $|C|\ne0$, and set $\Sigma=C\Sigma^*C'$, so that $\Sigma^{*-1}=C'\Sigma^{-1}C$. Then
$$
\mathrm{tr}(\Sigma^{-1}B)=\mathrm{tr}(\Sigma^{-1}CC')=\mathrm{tr}(C'\Sigma^{-1}C)=\mathrm{tr}(\Sigma^{*-1}),
\qquad |\Sigma|=|B|\,|\Sigma^*|,
$$
and hence
$$
\max_{\Sigma>0}|\Sigma|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}B\right)
=|B|^{-n/2}\max_{\Sigma^*>0}|\Sigma^*|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{*-1}\right).
$$

(c) Let $\lambda_1,\dots,\lambda_p$ be the eigenvalues of $\Sigma^*$. Then
$$
\max_{\Sigma^*>0}|\Sigma^*|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{*-1}\right)
=\max_{\lambda_j>0,\ j=1,\dots,p}\ \prod_{j=1}^p\lambda_j^{-n/2}e^{-1/(2\lambda_j)}.
$$
The function $g(\lambda)=\lambda^{-n/2}e^{-1/(2\lambda)}$ attains its maximum at $\lambda=1/n$. Hence the maximum is attained at $\lambda_1=\dots=\lambda_p=1/n$, i.e.
$$
\hat\Sigma^*=\frac1n I_p.
$$

(d) The MLE of $\Sigma$ is
$$
\hat\Sigma=C\hat\Sigma^*C'=\frac1n CC'=\frac1n B.
$$

Theorem 5.2.1 Let $X_1,\dots,X_n$ be a sample from $N_p(\mu,\Sigma)$ with $n>p$ and $\Sigma>0$. Then the MLEs of $\mu$ and $\Sigma$ are
$$
\hat\mu=\bar x
\qquad\text{and}\qquad
\hat\Sigma=\frac1n\sum_{j=1}^n(x_j-\bar x)(x_j-\bar x)',
$$
respectively, and the maximum likelihood is
$$
L(\bar x,\hat\Sigma)=(2\pi)^{-np/2}\,\left|\tfrac1n B\right|^{-n/2}e^{-np/2}.
$$

Theorem 5.2.2 Under the above notation:
a) $\bar x$ and $\hat\Sigma$ are independent;
b) $\bar x\sim N_p\!\left(\mu,\tfrac1n\Sigma\right)$;
c) $\hat\Sigma$ is a biased estimator of $\Sigma$:
$$
E(\hat\Sigma)=\frac{n-1}{n}\Sigma.
$$
The recommended unbiased estimator of $\Sigma$ is
$$
S=\frac{1}{n-1}\sum_{j=1}^n(x_j-\bar x)(x_j-\bar x)',
$$
called the sample covariance matrix.

Theorem 5.2.3 Let $\hat\theta$ be the MLE of $\theta$ and $f(\theta)$ be a measurable function. Then $f(\hat\theta)$ is the MLE of $f(\theta)$.

Corollary 1 The MLE of the correlation $\rho_{ij}$ is
$$
r_{ij}=\frac{b_{ij}}{\sqrt{b_{ii}b_{jj}}},
\qquad\text{where }B=(b_{ij}).
$$

Matlab code: mean, cov, corrcoef
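As an illustration of Theorem 5.2.1 and its corollary, the following minimal MATLAB sketch computes the MLEs of $\mu$ and $\Sigma$ together with the unbiased estimator $S$ and the correlation matrix. The data here are simulated placeholders, and the assumption is that the rows of X hold the $n$ observations.

```matlab
% Minimal sketch of Section 5.2 (assumes rows of X are the n observations).
n = 50; p = 3;
X = randn(n, p);                 % simulated sample, here from N_p(0, I_p)

xbar   = mean(X)';               % MLE of mu (Theorem 5.2.1)
S      = cov(X);                 % unbiased sample covariance (divides by n-1)
SigHat = (n-1)/n * S;            % MLE of Sigma: B/n = ((n-1)/n) S
R      = corrcoef(X);            % MLE of the correlation matrix (Corollary 1)
```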
5.3 Wishart Distribution

A. Chi-square distribution

Let $X_1,\dots,X_n$ be i.i.d. $N(0,1)$. Then $Y=X_1^2+\dots+X_n^2\sim\chi^2_n$, the chi-square distribution with $n$ degrees of freedom. Equivalently:

Definition 5.3.1 If $x\sim N_n(0,I_n)$, then $Y=x'x$ is said to have a chi-square distribution with $n$ degrees of freedom, and we write $Y\sim\chi^2_n$.

• If $x\sim N_n(0,\sigma^2I_n)$, then $Y=\frac{1}{\sigma^2}x'x\sim\chi^2_n$.
• If $x\sim N_n(0,\Sigma)$, then $Y=x'\Sigma^{-1}x\sim\chi^2_n$.

B. Wishart distribution (obtained by Wishart in 1928)

Definition 5.3.2 Let $X\sim N_{n\times p}(0,\,I_n\otimes\Sigma)$. Then we say that $W=X'X$ is distributed according to a Wishart distribution $W_p(n,\Sigma)$.

• When $p=1$, $W_1(n,\sigma^2)=\sigma^2\chi^2_n$, where $\Sigma=\sigma^2$.
• The density of $W_p(n,\Sigma)$, $n\ge p$, $\Sigma>0$, is
$$
p_W(W)=\begin{cases}
C\,|W|^{(n-p-1)/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}W\right), & W>0,\\[2pt]
0, & \text{otherwise},
\end{cases}
$$
where $C$ is a normalizing constant.
• $B=\sum_{j=1}^n(x_j-\bar x)(x_j-\bar x)'\sim W_p(n-1,\Sigma)$.

5.4 Discussion on Estimation

A. Unbiasedness

Let $\hat\theta$ be an estimator of $\theta$. If $E(\hat\theta)=\theta$, then $\hat\theta$ is called an unbiased estimator of $\theta$.

Theorem 5.4.1 Let $X_1,\dots,X_n$ be a sample from $N_p(\mu,\Sigma)$. Then
$$
\bar x=\frac1n\sum_{j=1}^n x_j
\qquad\text{and}\qquad
S=\frac{1}{n-1}\sum_{j=1}^n(x_j-\bar x)(x_j-\bar x)'
$$
are unbiased estimators of $\mu$ and $\Sigma$, respectively.

Matlab code: mean, cov, corrcoef

B. Decision theory

$t(x)$: an estimator of $\theta$ based on the sample $X$.
$L(\theta,t)$: a loss function.
$p_\theta(x)$: the density of $X$ with parameter $\theta$.
The average loss
$$
R(\theta,t)=E_\theta L(\theta,t)=\int L(\theta,t(x))\,p_\theta(x)\,dx
$$
is called the risk function, and $\max_\theta R(\theta,t)$ is the maximum risk if $t$ is employed.

Definition 5.4.2 An estimator $t(X)$ is called a minimax estimator of $\theta$ if
$$
\max_\theta R(\theta,t)=\min_{t^*}\max_\theta R(\theta,t^*).
$$

Example 1 Under the loss function $L(\theta,t)=(\theta-t)'(\theta-t)$, the sample mean $\bar x$ is a minimax estimator of $\mu$.

C. Admissible estimation

Definition 5.4.3 An estimator $t_1(x)$ is said to be at least as good as another $t_2(x)$ if
$$
R(\theta,t_1)\le R(\theta,t_2)\quad\text{for all }\theta,
$$
and $t_1$ is said to be better than (or to strictly dominate) $t_2$ if the inequality is strict for at least one $\theta$.

Definition 5.4.4 An estimator $t^*$ is said to be inadmissible if there exists another estimator $t^{**}$ that is better than $t^*$. An estimator $t^*$ is admissible if it is not inadmissible.

Admissibility is a weak requirement. Under the loss $L(\mu,t)=(\mu-t)'(\mu-t)$, the sample mean $\bar x$ is inadmissible if the population is $N_p(\mu,\Sigma)$ and $p\ge3$. James & Stein pointed out that
$$
\hat\mu=\left(1-\frac{p-2}{n\,\bar x'\bar x}\right)\bar x
$$
is better than $\bar x$ when $p\ge3$. The estimator $\hat\mu$ is called the James-Stein estimator.

5.5 Inferences about a Mean Vector (Ch. 5, Textbook)

Let $X_1,\dots,X_n$ be i.i.d. samples from $N_p(\mu,\Sigma)$. We test
$$
H_0:\mu=\mu_0
\qquad\text{vs.}\qquad
H_1:\mu\ne\mu_0.
$$

Case A: $\Sigma$ is known.

a) $p=1$:
$$
u=\frac{\sqrt n\,(\bar x-\mu_0)}{\sigma}\sim N(0,1).
$$

b) $p>1$: use
$$
T_0^2=n(\bar x-\mu_0)'\Sigma^{-1}(\bar x-\mu_0).
$$
Under the hypothesis $H_0$, $\bar x\sim N_p(\mu_0,\tfrac1n\Sigma)$. Then
$$
y=\sqrt n\,\Sigma^{-1/2}(\bar x-\mu_0)\sim N_p(0,I_p),
$$
so that
$$
T_0^2=n(\bar x-\mu_0)'\Sigma^{-1}(\bar x-\mu_0)=y'y\sim\chi^2_p.
$$

Theorem 5.5.1 Let $X_1,\dots,X_n$ be a sample from $N_p(\mu,\Sigma)$, where $\Sigma$ is known. The null distribution of $T_0^2$ under $H_0:\mu=\mu_0$ is $\chi^2_p$, and the rejection region is $T_0^2\ge\chi^2_p(\alpha)$.
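A minimal MATLAB sketch of the Case A test, assuming $\Sigma$ is known and that chi2inv is available (Statistics and Machine Learning Toolbox); the data, Sigma, and mu0 below are simulated placeholders.

```matlab
% Case A sketch: test H0: mu = mu0 with known Sigma (Theorem 5.5.1).
n = 40; p = 3; alpha = 0.05;
Sigma = eye(p);                        % assumed known covariance (placeholder)
mu0   = zeros(p, 1);                   % hypothesized mean (placeholder)
X     = randn(n, p);                   % simulated sample
xbar  = mean(X)';

T0sq = n * (xbar - mu0)' / Sigma * (xbar - mu0);  % n (xbar-mu0)' Sigma^{-1} (xbar-mu0)
crit = chi2inv(1 - alpha, p);                     % chi-square critical value
rejectH0 = T0sq >= crit;
```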
Case B: $\Sigma$ is unknown.

a) Suggestion: replace $\Sigma$ by the sample covariance matrix $S$ in $T_0^2$, i.e.
$$
T^2=n(\bar x-\mu_0)'S^{-1}(\bar x-\mu_0)=n(n-1)(\bar x-\mu_0)'B^{-1}(\bar x-\mu_0),
$$
where
$$
S=\frac{1}{n-1}B=\frac{1}{n-1}\sum_{j=1}^n(x_j-\bar x)(x_j-\bar x)'.
$$
There are many theoretical approaches to finding a suitable statistic. One of them is the likelihood ratio criterion.

The Likelihood Ratio Criterion (LRC)

Step 1 The likelihood function:
$$
L(\mu,\Sigma)=(2\pi)^{-np/2}\,|\Sigma|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}A\right),
\qquad A=\sum_{j=1}^n(x_j-\mu)(x_j-\mu)'.
$$

Step 2 Domains:
$$
\Omega=\{(\mu,\Sigma):\mu\in\mathbb{R}^p,\ \Sigma>0\},
\qquad
\omega=\{(\mu,\Sigma):\mu=\mu_0,\ \Sigma>0\},
$$
$$
\lambda=\frac{\max_{\omega}L(\mu,\Sigma)}{\max_{\Omega}L(\mu,\Sigma)}.
$$

Step 3 Maximization. We have already obtained
$$
\max_{\Omega}L(\mu,\Sigma)=(2\pi)^{-np/2}\,\left|\tfrac1n B\right|^{-n/2}e^{-np/2}.
$$
In a similar way we can find
$$
\max_{\omega}L(\mu,\Sigma)=(2\pi)^{-np/2}\,\left|\tfrac1n A_0\right|^{-n/2}e^{-np/2},
$$
where
$$
A_0=\sum_{j=1}^n(x_j-\mu_0)(x_j-\mu_0)'=B+n(\bar x-\mu_0)(\bar x-\mu_0)'.
$$
Then the LRC is
$$
\lambda=\left(\frac{|A_0|}{|B|}\right)^{-n/2}
=\left(\frac{|B+n(\bar x-\mu_0)(\bar x-\mu_0)'|}{|B|}\right)^{-n/2}.
$$
Note that
$$
\frac{|B+n(\bar x-\mu_0)(\bar x-\mu_0)'|}{|B|}
=1+n(\bar x-\mu_0)'B^{-1}(\bar x-\mu_0)
=1+\frac{T^2}{n-1}.
$$
Finally,
$$
\lambda=\left(1+\frac{T^2}{n-1}\right)^{-n/2}.
$$

Remark: Let $t(x)$ be a statistic for the hypothesis and $f(u)$ a strictly monotone function. Then $\varphi(x)=f(t(x))$ is a statistic equivalent to $t(x)$; we write $\varphi(x)\sim t(x)$. In this sense, $\lambda\sim T^2$.

5.6 T²-Statistic

Definition 5.6.1 Let $W\sim W_p(n,\Sigma)$ and $\mu\sim N_p(0,\Sigma)$ be independent with $n>p$. The distribution of $T^2=n\,\mu'W^{-1}\mu$ is called the $T^2$ distribution.

• The distribution of $T^2$ does not depend on $\Sigma$; we write $T^2\sim T^2_{p,n}$.
• $\dfrac{n-p+1}{np}\,T^2\sim F_{p,\,n-p+1}$.
• Since $\sqrt n(\bar x-\mu_0)\sim N_p(0,\Sigma)$ and $B\sim W_p(n-1,\Sigma)$,
$$
T^2=(n-1)\,[\sqrt n(\bar x-\mu_0)]'B^{-1}[\sqrt n(\bar x-\mu_0)]\sim T^2_{p,n-1},
$$
and
$$
\frac{n-p}{p(n-1)}\,T^2\sim F_{p,\,n-p}.
$$

Theorem 5.6.1 Under $H_0:\mu=\mu_0$, $T^2\sim T^2_{p,n-1}$ and $\dfrac{n-p}{p(n-1)}\,T^2\sim F_{p,\,n-p}$.

Theorem 5.6.2 The distribution of $T^2$ is invariant under all affine transformations
$$
y=Gx+d,\qquad G:p\times p,\ |G|\ne0,\qquad d:p\times1
$$
of the observations and the hypothesis.

Confidence Region

• A $100(1-\alpha)\%$ confidence region for the mean of a $p$-dimensional normal distribution is the ellipsoid determined by all $\mu$ such that
$$
n(\bar x-\mu)'S^{-1}(\bar x-\mu)\le\frac{p(n-1)}{n-p}F_{p,\,n-p}(\alpha).
$$

Proof of Theorem 5.6.2. Under the transformation $y_j=Gx_j+d$, $j=1,\dots,n$:
• mean: $\mu\mapsto\mu^*=G\mu+d$; given mean: $\mu_0\mapsto\mu_0^*=G\mu_0+d$; so $H_0:\mu=\mu_0$ becomes $\mu^*=\mu_0^*$;
• sample mean: $\bar y=G\bar x+d$; sample covariance matrix: $S_y=GSG'$.
Hence
$$
T_y^2=n(\bar y-\mu_0^*)'S_y^{-1}(\bar y-\mu_0^*)
=n[G(\bar x-\mu_0)]'(GSG')^{-1}[G(\bar x-\mu_0)]
=n(\bar x-\mu_0)'S^{-1}(\bar x-\mu_0)=T_x^2.
$$

Example 5.6.1 (Example 5.2 in the textbook) Perspiration from 20 healthy females was analyzed.

SWEAT DATA
Individual  X1 (Sweat rate)  X2 (Sodium)  X3 (Potassium)
     1           3.7             48.5          9.3
     2           5.7             65.1          8.0
     3           3.8             47.2         10.9
     4           3.2             53.2         12.0
     5           3.1             55.5          9.7
     6           4.6             36.1          7.9
     7           2.4             24.8         14.0
     8           7.2             33.1          7.6
     9           6.7             47.4          8.5
    10           5.4             54.1         11.3
    11           3.9             36.9         12.7
    12           4.5             58.8         12.3
    13           3.5             27.8          9.8
    14           4.5             40.2          8.4
    15           1.5             13.5         10.1
    16           8.5             56.4          7.1
    17           4.5             71.6          8.2
    18           6.5             52.8         10.9
    19           4.1             44.4         11.2
    20           5.5             40.9          9.4
Source: Courtesy of Dr. Gerald Bargman.

$$
H_0:\mu=\begin{pmatrix}4\\50\\10\end{pmatrix},
\qquad
H_1:\mu\ne\begin{pmatrix}4\\50\\10\end{pmatrix}.
$$
Computer calculations provide:
$$
\bar x=\begin{pmatrix}4.640\\45.400\\9.965\end{pmatrix},
\qquad
S=\begin{pmatrix}2.879&10.010&-1.810\\10.010&199.788&-5.640\\-1.810&-5.640&3.628\end{pmatrix},
$$
and
$$
S^{-1}=\begin{pmatrix}.586&-.022&.258\\-.022&.006&-.002\\.258&-.002&.402\end{pmatrix}.
$$
We evaluate
$$
T^2=20\,(4.640-4,\ 45.400-50,\ 9.965-10)\,S^{-1}\begin{pmatrix}4.640-4\\45.400-50\\9.965-10\end{pmatrix}
=20\,(.640,\ -4.600,\ -.035)\begin{pmatrix}.467\\-.042\\.160\end{pmatrix}=9.74.
$$
Comparing the observed $T^2=9.74$ with the critical value
$$
\frac{(n-1)p}{n-p}F_{p,\,n-p}(.10)=\frac{19\cdot3}{17}F_{3,17}(.10)=3.353\times2.44=8.18,
$$
we see that $T^2=9.74>8.18$, and consequently we reject $H_0$ at the 10% level of significance.

Mahalanobis Distance

Definition 5.6.2 Let $x$ and $y$ be samples of a population $G$ with mean $\mu$ and covariance matrix $\Sigma>0$. The quadratic forms
$$
D_M^2(x,y)=(x-y)'\Sigma^{-1}(x-y)
\qquad\text{and}\qquad
D_M^2(x,G)=(x-\mu)'\Sigma^{-1}(x-\mu)
$$
are called the Mahalanobis distance (M-distance) between $x$ and $y$, and between $x$ and $G$, respectively. It can be verified that:
• $D_M(x,y)\ge0$, and $D_M(x,y)=0\iff x=y$;
• $D_M(x,y)=D_M(y,x)$;
• $D_M(x,y)\le D_M(x,z)+D_M(z,y)$ for all $x,y,z$.
Note that
$$
T_0^2=n(\bar x-\mu_0)'\Sigma^{-1}(\bar x-\mu_0)=n\,D_M^2(\bar x,G).
$$

5.7 Two-Sample Problems (Section 6.3, Textbook)

We have two samples from the two populations
$$
G_1:\ N_p(\mu_1,\Sigma),\quad x_1,\dots,x_n,\ n>p;
\qquad
G_2:\ N_p(\mu_2,\Sigma),\quad y_1,\dots,y_m,\ m>p,
$$
where $\mu_1$, $\mu_2$, and $\Sigma$ are unknown. We test
$$
H_0:\mu_1=\mu_2
\qquad\text{vs.}\qquad
H_1:\mu_1\ne\mu_2.
$$
The LRC is equivalent to
$$
T^2=\frac{nm}{n+m}\,(\bar x-\bar y)'S_{\mathrm{pooled}}^{-1}(\bar x-\bar y),
$$
where
$$
\bar x=\frac1n\sum_{i=1}^n x_i,
\qquad
\bar y=\frac1m\sum_{j=1}^m y_j,
$$
$$
S_{\mathrm{pooled}}=\frac{1}{n+m-2}\left[\sum_{i=1}^n(x_i-\bar x)(x_i-\bar x)'+\sum_{j=1}^m(y_j-\bar y)(y_j-\bar y)'\right].
$$
Under the hypothesis,
$$
T^2\sim T^2_{p,\,n+m-2}
\qquad\text{and}\qquad
\frac{n+m-p-1}{(n+m-2)p}\,T^2\sim F_{p,\,n+m-p-1}.
$$
The $100(1-\alpha)\%$ confidence region for $a'(\mu_1-\mu_2)$ is
$$
a'(\bar x-\bar y)-\sqrt{\frac{n+m}{nm}\,T^2\,a'S_{\mathrm{pooled}}\,a}
\ \le\ a'(\mu_1-\mu_2)\ \le\
a'(\bar x-\bar y)+\sqrt{\frac{n+m}{nm}\,T^2\,a'S_{\mathrm{pooled}}\,a},
$$
where $T^2=T^2_{p,\,n+m-2}(\alpha)$.
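Before the worked example, here is a minimal MATLAB sketch of the two-sample test, assuming equal covariance matrices in the two populations and that finv is available (Statistics and Machine Learning Toolbox); the samples X and Y are simulated placeholders.

```matlab
% Two-sample T^2 sketch (Section 5.7), assuming a common covariance matrix.
n = 24; m = 24; p = 3; alpha = 0.01;
X = randn(n, p); Y = randn(m, p);      % simulated samples (placeholders)

xbar = mean(X)'; ybar = mean(Y)';
Spooled = ((n-1)*cov(X) + (m-1)*cov(Y)) / (n + m - 2);

T2 = (n*m/(n+m)) * (xbar - ybar)' / Spooled * (xbar - ybar);
F  = (n + m - p - 1) / ((n + m - 2)*p) * T2;   % ~ F_{p, n+m-p-1} under H0
rejectH0 = F >= finv(1 - alpha, p, n + m - p - 1);
```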
Example 5.7.1 (pp. 338-339) Jolicoeur and Mosimann (1960) studied the relationship of size and shape for painted turtles. The following table contains their measurements on the carapaces of 24 female and 24 male turtles.

              Female                              Male
Length(x1)  Width(x2)  Height(x3)   Length(x1)  Width(x2)  Height(x3)
    98          81         38            93          74        37
   103          84         38            94          78        35
   103          86         42            96          80        35
   105          86         42           101          84        39
   109          88         44           102          85        38
   123          92         50           103          81        37
   123          95         46           104          83        39
   133          99         51           106          83        39
   133         102         51           107          82        38
   133         102         51           112          89        40
   134         100         48           113          88        40
   136         102         49           114          86        40
   138          98         51           116          90        43
   138          99         51           117          90        41
   141         105         53           117          91        41
   147         108         57           119          93        41
   149         107         55           120          89        40
   153         107         56           120          93        44
   155         115         63           121          95        42
   155         117         60           125          93        45
   158         115         62           127          96        45
   159         118         63           128          95        45
   162         124         61           131          95        46
   177         132         67           135         106        47

Computer calculations provide:
$$
\bar x=\begin{pmatrix}136.0417\\102.5833\\52.0417\end{pmatrix},
\qquad
\bar y=\begin{pmatrix}113.3750\\88.2917\\40.7083\end{pmatrix},
$$
$$
S_{\mathrm{pooled}}=\begin{pmatrix}295.1431&175.0607&101.6649\\175.0607&110.8869&61.7491\\101.6649&61.7491&37.9982\end{pmatrix},
$$
$$
T^2=\frac{24\cdot24}{24+24}\,(\bar x-\bar y)'S_{\mathrm{pooled}}^{-1}(\bar x-\bar y)=72.3816,
$$
$$
F=\frac{24+24-3-1}{(24+24-2)\cdot3}\,T^2=23.0782>F_{3,44}(0.01)=4.30,
$$
so $H_0:\mu_1=\mu_2$ is rejected at the 1% level.

5.8 Multivariate Analysis of Variance

A. Review

There are $k$ normal populations:
$$
G_a:\ N(\mu_a,\sigma^2),\quad x_1^{(a)},\dots,x_{n_a}^{(a)},\quad \bar x_a,
\qquad a=1,\dots,k.
$$
One wants to test equality of the means $\mu_1,\dots,\mu_k$:
$$
H_0:\mu_1=\dots=\mu_k,
\qquad
H_1:\mu_i\ne\mu_j\ \text{for some }i\ne j.
$$
The analysis of variance employs the decomposition of the sum of squares:
$$
SSTR=\sum_{a=1}^k n_a(\bar x_a-\bar x)^2
\qquad\text{(sum of squares among treatments)},
$$
$$
SSE=\sum_{a=1}^k\sum_{j=1}^{n_a}(x_j^{(a)}-\bar x_a)^2
\qquad\text{(sum of squares within groups)},
$$
$$
SST=\sum_{a=1}^k\sum_{j=1}^{n_a}(x_j^{(a)}-\bar x)^2
\qquad\text{(total sum of squares)},
$$
where
$$
\bar x_a=\frac{1}{n_a}\sum_{j=1}^{n_a}x_j^{(a)},
\qquad
\bar x=\frac1n\sum_{a=1}^k\sum_{j=1}^{n_a}x_j^{(a)},
\qquad
n=n_1+\dots+n_k.
$$
The test statistic is
$$
F=\frac{SSTR/(k-1)}{SSE/(n-k)}\ \overset{H_0}{\sim}\ F_{k-1,\,n-k}.
$$

B. Multivariate populations (pp. 295-305)

$$
G_a:\ N_p(\mu_a,\Sigma),\quad x_1^{(a)},\dots,x_{n_a}^{(a)},
\qquad a=1,\dots,k,
$$
where $\Sigma$ is unknown. One wants to test
$$
H_0:\mu_1=\dots=\mu_k,
\qquad
H_1:\mu_i\ne\mu_j\ \text{for some }i\ne j.
$$

I. The likelihood ratio criterion

Step 1 The likelihood function:
$$
L(\mu_1,\dots,\mu_k,\Sigma)=(2\pi)^{-np/2}\,|\Sigma|^{-n/2}\,\mathrm{etr}\!\left(-\tfrac12\Sigma^{-1}A\right),
\qquad
A=\sum_{a=1}^k\sum_{j=1}^{n_a}(x_j^{(a)}-\mu_a)(x_j^{(a)}-\mu_a)'.
$$

Step 2 The domains:
$$
\Omega=\{(\mu_1,\dots,\mu_k,\Sigma):\mu_a\in\mathbb{R}^p,\ a=1,\dots,k,\ \Sigma>0\},
$$
$$
\omega=\{(\mu_1,\dots,\mu_k,\Sigma):\mu_1=\dots=\mu_k\in\mathbb{R}^p,\ \Sigma>0\}.
$$

Step 3 Maximization:
$$
\max_{\Omega}L(\mu_1,\dots,\mu_k,\Sigma)=(2\pi)^{-np/2}\,\left|\tfrac1n E\right|^{-n/2}e^{-np/2},
\qquad
\max_{\omega}L(\mu_1,\dots,\mu_k,\Sigma)=(2\pi)^{-np/2}\,\left|\tfrac1n T\right|^{-n/2}e^{-np/2},
$$
where
$$
T=\sum_{a=1}^k\sum_{j=1}^{n_a}(x_j^{(a)}-\bar x)(x_j^{(a)}-\bar x)',
\qquad
E=\sum_{a=1}^k\sum_{j=1}^{n_a}(x_j^{(a)}-\bar x_a)(x_j^{(a)}-\bar x_a)'
$$
are the total sum of squares and products matrix and the error sum of squares and products matrix, respectively, and
$$
\bar x_a=\frac{1}{n_a}\sum_{j=1}^{n_a}x_j^{(a)},
\qquad
\bar x=\frac1n\sum_{a=1}^k\sum_{j=1}^{n_a}x_j^{(a)}.
$$
The treatment sum of squares and products matrix is
$$
B=T-E=\sum_{a=1}^k n_a(\bar x_a-\bar x)(\bar x_a-\bar x)'.
$$
The LRC is
$$
\lambda=\left(\frac{|E|}{|T|}\right)^{n/2},
\qquad\text{equivalently}\qquad
\Lambda=\frac{|E|}{|T|}=\frac{|E|}{|E+B|}.
$$

Definition 5.8.1 Assume $A\sim W_p(n,\Sigma)$ and $B\sim W_p(m,\Sigma)$ are independent, where $n\ge p$, $\Sigma>0$. The distribution of
$$
\Lambda=\frac{|A|}{|A+B|}
$$
is called the Wilks $\Lambda$-distribution, and we write $\Lambda\sim\Lambda_{p,n,m}$.

Theorem 5.8.1 Under $H_0$ we have:
1) $T\sim W_p(n-1,\Sigma)$, $E\sim W_p(n-k,\Sigma)$, $B\sim W_p(k-1,\Sigma)$;
2) $E$ and $B$ are independent;
3) the LRC statistic satisfies $\Lambda\sim\Lambda_{p,\,n-k,\,k-1}$.

Special cases of the Wilks $\Lambda$-distribution ($\Lambda\sim\Lambda_{p,n,m}$):
• $m=1$: $\dfrac{n-p+1}{p}\cdot\dfrac{1-\Lambda}{\Lambda}\sim F_{p,\,n-p+1}$
• $m=2$: $\dfrac{n-p+1}{p}\cdot\dfrac{1-\sqrt\Lambda}{\sqrt\Lambda}\sim F_{2p,\,2(n-p+1)}$
• $p=1$: $\dfrac{n}{m}\cdot\dfrac{1-\Lambda}{\Lambda}\sim F_{m,\,n}$
• $p=2$: $\dfrac{n-1}{m}\cdot\dfrac{1-\sqrt\Lambda}{\sqrt\Lambda}\sim F_{2m,\,2(n-1)}$

See pp. 300-305 of the textbook for an example.
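A minimal one-way MANOVA sketch in MATLAB, assuming $k$ groups stacked row-wise in X with group labels in g; the data here are simulated placeholders. The Statistics and Machine Learning Toolbox function manova1(X, g) performs one-way MANOVA based on the same matrices.

```matlab
% One-way MANOVA sketch (Section 5.8): Wilks Lambda = |E| / |E + B|.
k = 3; na = 20; p = 2; n = k*na;
X = randn(n, p);  g = repelem((1:k)', na);   % placeholder data and labels

xbar = mean(X);  E = zeros(p);  Bm = zeros(p);
for a = 1:k
    Xa = X(g == a, :);  xa = mean(Xa);
    E  = E  + (size(Xa,1)-1) * cov(Xa);                % within (error) SSP
    Bm = Bm + size(Xa,1) * (xa - xbar)' * (xa - xbar); % between (treatment) SSP
end
Lambda = det(E) / det(E + Bm);   % ~ Lambda_{p, n-k, k-1} under H0
```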
II. Union-Intersection Decision Rule

For $H_0:\mu_1=\dots=\mu_k$, consider the projection hypotheses
$$
H_0^a:\ a'\mu_1=\dots=a'\mu_k,\qquad a\in\mathbb{R}^p,\ a\ne0,
\qquad\text{so that}\qquad
H_0=\bigcap_{a\ne0}H_0^a.
$$
For each $a$, the projected data
$$
G_1^a:\ N(a'\mu_1,\,a'\Sigma a):\ a'x_1^{(1)},\dots,a'x_{n_1}^{(1)};
\quad\dots;\quad
G_k^a:\ N(a'\mu_k,\,a'\Sigma a):\ a'x_1^{(k)},\dots,a'x_{n_k}^{(k)}
$$
yield
$$
SSTR_a=a'Ba,\qquad SSE_a=a'Ea,\qquad SST_a=a'Ta,
$$
and the F-statistic
$$
F_a=\frac{a'Ba/(k-1)}{a'Ea/(n-k)}\ \overset{H_0^a}{\sim}\ F_{k-1,\,n-k}.
$$
The rejection region for $H_0^a$ is $R_a=\{F_a\ge F_{k-1,n-k}(\alpha^*)\}$. The union-intersection rejection region for $H_0$ is
$$
R=\bigcup_{a\in\mathbb{R}^p,\,a\ne0}R_a
=\left\{\max_{a\ne0}F_a\ge F_{k-1,n-k}(\alpha^*)\right\},
$$
which implies that the test statistic is
$$
\max_{a\ne0}\frac{a'Ba}{a'Ea}.
$$

Lemma 1 Let $A$ be a symmetric matrix of order $p$. Denote by $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_p$ the eigenvalues of $A$, and by $l_1,\dots,l_p$ the associated eigenvectors. Then
$$
\max_{x\ne0}\frac{x'Ax}{x'x}=\max_{\|x\|=1}x'Ax=\lambda_1,
\qquad
\min_{x\ne0}\frac{x'Ax}{x'x}=\min_{\|x\|=1}x'Ax=\lambda_p.
$$

Lemma 2 Let $A$ and $B$ be two $p\times p$ matrices with $A'=A$ and $B>0$. Denote by $\lambda_1\ge\dots\ge\lambda_p$ and $l_1,\dots,l_p$ the eigenvalues and associated eigenvectors of $B^{-1/2}AB^{-1/2}$. Then
$$
\max_{x\ne0}\frac{x'Ax}{x'Bx}=\lambda_1,
\qquad
\max_{x\ne0,\ x'l_i=0,\ i=1,\dots,k}\frac{x'Ax}{x'Bx}=\lambda_{k+1},
\quad k=1,\dots,p-1.
$$

Remark 1: $\lambda_1,\dots,\lambda_p$ are the eigenvalues of $|A-\lambda B|=0$.

Remark 2: The union-intersection statistic is the largest eigenvalue $\lambda_1$ of $|B-\lambda E|=0$ (the largest root).

Remark 3: Let $\lambda_1\ge\dots\ge\lambda_p$ be the eigenvalues of $E^{-1/2}BE^{-1/2}$. The Wilks $\Lambda$-statistic can be expressed as
$$
\Lambda=\frac{|E|}{|E+B|}=\prod_{i=1}^p\frac{1}{1+\lambda_i}.
$$
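A short MATLAB sketch of Remarks 1-3, computing both the union-intersection (largest root) statistic and the Wilks $\Lambda$-statistic from the same generalized eigenvalue problem; it assumes the between and within SSP matrices Bm and E from the MANOVA sketch above.

```matlab
% Both test statistics from the eigenvalues of |Bm - lambda*E| = 0 (Remarks 1-3).
lambda = eig(Bm, E);                     % generalized eigenvalues: Bm*v = lambda*E*v
largestRoot = max(lambda);               % union-intersection statistic (Remark 2)
wilksLambda = prod(1 ./ (1 + lambda));   % equals det(E)/det(E + Bm) (Remark 3)
```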