AMS 572 Lecture Notes Oct. 3, 2013.

Topic 1: Let $W \sim \chi^2_k$. $W$ is in fact a special Gamma random variable.

Gamma distribution: $X \sim \mathrm{Gamma}(r, \lambda)$ (some books use $\beta = 1/\lambda$). The p.d.f. is
$$f(x) = \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0,$$
where $\int_0^\infty f(x)\,dx = 1$ and the gamma function is $\Gamma(r) = \int_0^\infty x^{r-1} e^{-x}\,dx$. If $r$ is a positive integer, then $\Gamma(r) = (r-1)!$.

m.g.f.: $M_X(t) = \left(\dfrac{\lambda}{\lambda - t}\right)^r = \left(1 - \dfrac{t}{\lambda}\right)^{-r}$, $t < \lambda$.
$$E(X) = \frac{r}{\lambda}, \qquad \mathrm{Var}(X) = \frac{r}{\lambda^2}.$$

Special case: when $r = \frac{k}{2}$, $\lambda = \frac{1}{2}$, $X \sim \chi^2_k$.
Special case: when $r = 1$, $X \sim \exp(\lambda)$.

Review. $X \sim \exp(\lambda)$: p.d.f. $f(x) = \lambda e^{-\lambda x}$, $x > 0$; m.g.f. $M_X(t) = \dfrac{\lambda}{\lambda - t}$, $t < \lambda$.

e.g. Let $X_i \overset{i.i.d.}{\sim} \exp(\lambda)$, $i = 1, \dots, n$. What is the distribution of $\sum_{i=1}^n X_i$?
Solution.
$$M_{\sum X_i}(t) = \prod_{i=1}^n M_{X_i}(t) = \left(\frac{\lambda}{\lambda - t}\right)^n \;\Rightarrow\; \sum_{i=1}^n X_i \sim \mathrm{Gamma}(r = n, \lambda).$$

e.g. Let $W \sim \chi^2_k$. What is the m.g.f. of $W$?
Solution.
$$M_W(t) = \left(\frac{1/2}{1/2 - t}\right)^{k/2} = (1 - 2t)^{-k/2}, \quad t < \tfrac{1}{2}.$$

e.g. Let $W_1 \sim \chi^2_{k_1}$, $W_2 \sim \chi^2_{k_2}$, and let $W_1$ and $W_2$ be independent. What is the distribution of $W_1 + W_2$?
Solution.
$$M_{W_1 + W_2}(t) = M_{W_1}(t)\, M_{W_2}(t) = (1-2t)^{-k_1/2}(1-2t)^{-k_2/2} = (1-2t)^{-(k_1+k_2)/2} \;\Rightarrow\; W_1 + W_2 \sim \chi^2_{k_1 + k_2}.$$

Topic 2. Methods for deriving point estimators: the Maximum Likelihood Estimator (MLE) and the Method of Moments Estimator (MOME).

Example. $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, $i = 1, \dots, n$.
1. Derive the MLE for $\mu$ and $\sigma^2$.
2. Derive the MOME for $\mu$ and $\sigma^2$.

Solution.
1. MLE
[i] $f(x_i) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\dfrac{(x_i - \mu)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{(x_i - \mu)^2}{2\sigma^2}\right]$, $x_i \in \mathbb{R}$, $i = 1, \dots, n$.
[ii] Likelihood function:
$$L = f(x_1, x_2, \dots, x_n) = \prod_{i=1}^n f(x_i) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right].$$
[iii] Log-likelihood function:
$$l = \ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}.$$
[iv] Setting the partial derivatives to zero:
$$\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2} = 0, \qquad \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^4} = 0$$
$$\Rightarrow\; \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}.$$

2. MOME

Order   Population moment   Sample moment
1st     $E(X)$              $(X_1 + X_2 + \cdots + X_n)/n$
2nd     $E(X^2)$            $(X_1^2 + X_2^2 + \cdots + X_n^2)/n$
kth     $E(X^k)$            $(X_1^k + X_2^k + \cdots + X_n^k)/n$

Setting the population moments equal to the sample moments:
$$E(X) = \mu = \bar{X}, \qquad E(X^2) = \sigma^2 + \mu^2 = \frac{\sum_{i=1}^n X_i^2}{n}$$
$$\Rightarrow\; \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^n X_i^2}{n} - (\bar{X})^2.$$

We can also see that:
$$\frac{\sum_{i=1}^n X_i^2}{n} - (\bar{X})^2 = \frac{\sum_{i=1}^n (X_i - \bar{X} + \bar{X})^2}{n} - (\bar{X})^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2 + 2\bar{X}\sum_{i=1}^n (X_i - \bar{X}) + n(\bar{X})^2}{n} - (\bar{X})^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}.$$
Therefore, the MLE and MOME for $\sigma^2$ are the same for the normal population.

$$E(\hat{\sigma}^2) = E\left[\frac{\sum (X_i - \bar{X})^2}{n}\right] = \frac{n-1}{n}\, E\left[\frac{\sum (X_i - \bar{X})^2}{n-1}\right] = \frac{n-1}{n}\, E(S^2) = \frac{n-1}{n}\,\sigma^2 \quad \text{(asymptotically unbiased)}.$$

Example. Let $X_i \overset{i.i.d.}{\sim} \mathrm{Bernoulli}(p)$, $i = 1, \dots, n$. Please derive:
1. The MLE of $p$.
2. The MOME of $p$.

Solution.
1. MLE
[i] $f(x_i) = p^{x_i}(1-p)^{1-x_i}$, $i = 1, \dots, n$.
[ii] $L = \prod_{i=1}^n f(x_i) = p^{\sum x_i}(1-p)^{\,n - \sum x_i}$.
[iii] $l = \ln L = \left(\sum x_i\right)\ln p + \left(n - \sum x_i\right)\ln(1-p)$.
[iv] $\dfrac{dl}{dp} = \dfrac{\sum x_i}{p} - \dfrac{n - \sum x_i}{1-p} = 0 \;\Rightarrow\; \hat{p} = \dfrac{\sum x_i}{n} = \bar{X}$.

2. MOME
$$E(X) = p = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X} \;\Rightarrow\; \hat{p} = \bar{X}.$$
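The results above are easy to check numerically. Below is a minimal simulation sketch (not part of the original notes) using NumPy; the seed, $\lambda$, sample sizes, and replication count are arbitrary illustrative choices. It verifies that a sum of $n$ i.i.d. $\exp(\lambda)$ draws has the $\mathrm{Gamma}(n, \lambda)$ mean $r/\lambda$ and variance $r/\lambda^2$, and that the normal MLE and MOME of $\sigma^2$ coincide with expectation $\frac{n-1}{n}\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(seed=572)          # arbitrary seed

# Topic 1 check: a sum of n i.i.d. Exp(lam) draws is Gamma(r=n, lam),
# so its mean should be r/lam and its variance r/lam^2.
lam, n, reps = 2.0, 5, 200_000                 # illustrative values
sums = rng.exponential(scale=1/lam, size=(reps, n)).sum(axis=1)
print(sums.mean(), n / lam)                    # ~2.5  vs  E(X) = r/lam
print(sums.var(),  n / lam**2)                 # ~1.25 vs  Var(X) = r/lam^2

# Topic 2 check: for normal data the MLE and MOME of sigma^2 coincide,
# and E(sigma_hat^2) = (n-1)/n * sigma^2 (biased, asymptotically unbiased).
mu, sigma, m = 1.0, 3.0, 10
x = rng.normal(mu, sigma, size=(reps, m))
xbar = x.mean(axis=1, keepdims=True)
mle  = ((x - xbar)**2).mean(axis=1)             # sum (Xi - Xbar)^2 / n
mome = (x**2).mean(axis=1) - x.mean(axis=1)**2  # sum Xi^2 / n - Xbar^2
print(np.allclose(mle, mome))                   # True: identical estimators
print(mle.mean(), (m - 1) / m * sigma**2)       # ~8.1 vs (n-1)/n * sigma^2
```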
Topic 4. Likelihood Ratio Test (for one population mean, normal population)

1. Please derive the likelihood ratio test for H0: μ = μ0 versus Ha: μ ≠ μ0, when the population is normal and the population variance σ² is known.

Solution: First we write down the two parameter spaces as follows:
(1) The parameter space under H0: $\omega = \{\mu : \mu = \mu_0\}$, and
(2) The unrestricted parameter space: $\Omega = \{\mu : -\infty < \mu < \infty\}$.

The likelihoods are:
(1) The likelihood under H0:
$$L(\omega) = L(\mu_0) = f(x_1, x_2, \dots, x_n; \mu_0) = \prod_{i=1}^n f(x_i; \mu_0) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_i - \mu_0)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{\sum_{i=1}^n (x_i - \mu_0)^2}{2\sigma^2}\right].$$
The MLE for μ under H0 is $\hat{\mu}_\omega = \mu_0$. Thus the maximum value of the likelihood $L(\omega)$ is $L(\hat{\omega}) = L(\hat{\mu}_\omega) = L(\mu_0)$.

(2) The unrestricted likelihood is:
$$L(\Omega) = L(\mu) = f(x_1, x_2, \dots, x_n; \mu) = \prod_{i=1}^n f(x_i; \mu) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right].$$
There is only one free parameter, μ, in $L(\Omega)$. Now we shall derive the MLE for μ; that is, we will find the value of μ that maximizes the log-likelihood
$$\ln L(\Omega) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}.$$
By solving $\dfrac{d \ln L(\Omega)}{d\mu} = \dfrac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0$, we have $\hat{\mu} = \bar{X}$. It is easy to verify that $\hat{\mu} = \bar{X}$ indeed maximizes the log-likelihood, and thus the likelihood function. So it is the MLE. Thus the maximum value of the likelihood $L(\Omega)$ is $L(\hat{\Omega}) = L(\hat{\mu}) = L(\bar{x})$.

(***Note: the unrestricted MLE is simply the usual MLE we discussed in the point estimator section. Therefore, to simplify the notation, we shall not insert any subscript for the point estimators under Ω.***)

Therefore, using the identity $\sum_{i=1}^n (x_i - \mu_0)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2$, the likelihood ratio is:
$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\max_\omega L}{\max_\Omega L} = \frac{(2\pi\sigma^2)^{-n/2}\exp\left[-\frac{\sum_{i=1}^n (x_i - \mu_0)^2}{2\sigma^2}\right]}{(2\pi\sigma^2)^{-n/2}\exp\left[-\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{2\sigma^2}\right]} = \exp\left[-\frac{\sum_{i=1}^n (x_i - \mu_0)^2 - \sum_{i=1}^n (x_i - \bar{x})^2}{2\sigma^2}\right] = \exp\left[-\frac{1}{2}\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)^2\right] = \exp\left(-\frac{z_0^2}{2}\right).$$
Therefore, the likelihood ratio test that rejects H0 when $\lambda \le \lambda^*$ is equivalent to the z-test that rejects H0 when $|z_0| \ge c$, where c can be determined by the significance level α as $c = z_{\alpha/2}$.

2. Please derive the likelihood ratio test for H0: μ = μ0 versus Ha: μ ≠ μ0, when the population is normal and the population variance σ² is unknown.

Solution: Now that we have two unknown parameters, the parameter spaces are:
$$\omega = \{(\mu, \sigma^2) : \mu = \mu_0,\; 0 < \sigma^2 < \infty\} \quad \text{and} \quad \Omega = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\; 0 < \sigma^2 < \infty\}.$$

The likelihood under the null hypothesis is:
$$L(\omega) = L(\mu_0, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x_i-\mu_0)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu_0)^2\right].$$
There is one free parameter, σ², in $L(\omega)$. Now we shall find the value of σ² that maximizes the log-likelihood
$$\ln L(\omega) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu_0)^2.$$
By solving $\dfrac{d \ln L(\omega)}{d\sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu_0)^2 = 0$, we have $\hat{\sigma}^2_\omega = \dfrac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2$. It is easy to verify that this solution indeed maximizes the log-likelihood, and thus the likelihood function.

The unrestricted likelihood is:
$$L(\Omega) = L(\mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right].$$
There are two free parameters, μ and σ², in $L(\Omega)$. Now we shall find the values of μ and σ² that maximize the log-likelihood
$$\ln L(\Omega) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2.$$
By solving the equation system
$$\frac{\partial \ln L(\Omega)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0 \quad \text{and} \quad \frac{\partial \ln L(\Omega)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2 = 0,$$
we have $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2_\Omega = \dfrac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$. It is easy to verify that this solution indeed maximizes the log-likelihood, and thus the likelihood function.

Plugging each variance MLE back into its likelihood makes the exponent equal $-n/2$, so the likelihood ratio is:
$$\lambda = \frac{\max_\omega L(\mu_0, \sigma^2)}{\max_\Omega L(\mu, \sigma^2)} = \frac{L(\mu_0, \hat{\sigma}^2_\omega)}{L(\hat{\mu}, \hat{\sigma}^2_\Omega)} = \frac{(2\pi\hat{\sigma}^2_\omega)^{-n/2}\exp(-n/2)}{(2\pi\hat{\sigma}^2_\Omega)^{-n/2}\exp(-n/2)} = \left(\frac{\hat{\sigma}^2_\Omega}{\hat{\sigma}^2_\omega}\right)^{n/2} = \left[\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \mu_0)^2}\right]^{n/2} = \left[1 + \frac{n(\bar{x} - \mu_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]^{-n/2} = \left[1 + \frac{t_0^2}{n-1}\right]^{-n/2}.$$
Therefore, the likelihood ratio test that rejects H0 when $\lambda \le \lambda^*$ is equivalent to the t-test that rejects H0 when $|t_0| \ge c$, where c can be determined by the significance level α as $c = t_{n-1, \alpha/2}$.
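As a numerical sanity check (not in the original notes), the sketch below simulates one sample and verifies that the likelihood ratio computed directly from the two maximized likelihoods equals $[1 + t_0^2/(n-1)]^{-n/2}$, so rejecting for small λ is the same as rejecting for large $|t_0|$. NumPy and SciPy are assumed available; the seed, sample size, and distribution parameters are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=572)          # arbitrary seed
n, mu0, alpha = 20, 5.0, 0.05                  # illustrative values
x = rng.normal(loc=5.5, scale=2.0, size=n)     # true mean differs from mu0

xbar = x.mean()
# Restricted and unrestricted MLEs of sigma^2 (divisor n, as derived above)
s2_omega = ((x - mu0)**2).mean()
s2_Omega = ((x - xbar)**2).mean()

# Likelihood ratio directly as a ratio of maximized likelihoods:
# lambda = (sigma2_Omega / sigma2_omega)^(n/2)
lam_direct = (s2_Omega / s2_omega)**(n / 2)

# The same ratio written in terms of the one-sample t statistic
s = x.std(ddof=1)                              # sample sd, divisor n-1
t0 = (xbar - mu0) / (s / np.sqrt(n))
lam_via_t = (1 + t0**2 / (n - 1))**(-n / 2)

print(np.isclose(lam_direct, lam_via_t))              # True: the forms agree
print(abs(t0) >= stats.t.ppf(1 - alpha/2, df=n - 1))  # reject H0 at level alpha?
```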
3. Optimal property of the Likelihood Ratio Test

Theorem (Neyman-Pearson Lemma). The likelihood ratio test for a simple null hypothesis $H_0: \theta = \theta'$ versus a simple alternative hypothesis $H_a: \theta = \theta''$ is a most powerful test.

[Photos: Egon Sharpe Pearson (August 11, 1895 – June 12, 1980); Karl Pearson (March 27, 1857 – April 27, 1936); Maria Pearson (née Sharpe); Sigrid Letitia Sharpe Pearson.]

In 1901, with Weldon and Galton, Karl Pearson founded the journal Biometrika, whose object was the development of statistical theory. He edited this journal until his death. In 1911, Karl Pearson founded the world's first university statistics department at University College London. His only son, Egon Pearson, became an eminent statistician himself, establishing the Neyman-Pearson lemma. He succeeded his father as head of the Applied Statistics Department at University College. http://en.wikipedia.org/wiki/Karl_Pearson

Jerzy Neyman (April 16, 1894 – August 5, 1981) is best known for the Neyman-Pearson Lemma. He also developed the concept of the confidence interval (1937) and contributed significantly to sampling theory. He published many books dealing with experiments and statistics, and devised the way the FDA tests medicines today. Jerzy Neyman was also the founder of the Department of Statistics at the University of California, Berkeley, in 1955. http://en.wikipedia.org/wiki/Jerzy_Neyman

Note: Under certain conditions, the likelihood ratio test is also a uniformly most powerful test for a simple null hypothesis versus a composite alternative hypothesis, or for a composite null hypothesis versus a composite alternative hypothesis. (See, for example, the Karlin-Rubin theorem.)

Topic 4. Likelihood Ratio Test (for two population means, both populations are normal, the population variances are unknown but equal, and we have two independent samples)

Derivation of the Pooled Variance T-Test (2-sided test)

Given two independent random samples from two normal populations with equal but unknown variances, we now derive the likelihood ratio test for:
$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_a: \mu_1 \ne \mu_2.$$
Let $\mu_1 = \mu_2 = \mu$ under H0; then
$$\omega = \{(\mu_1, \mu_2, \sigma^2) : -\infty < \mu_1 = \mu_2 = \mu < \infty,\; 0 < \sigma^2 < \infty\}, \qquad \Omega = \{(\mu_1, \mu_2, \sigma^2) : -\infty < \mu_1, \mu_2 < \infty,\; 0 < \sigma^2 < \infty\}.$$

$$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right],$$
and there are two parameters (μ and σ²).
$$\ln L(\omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right).$$
Since it contains two parameters, we take the partial derivatives with respect to μ and σ² and set them equal to 0. Then we have:
$$\hat{\mu} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}, \qquad \hat{\sigma}^2_\omega = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right].$$

$$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right],$$
and there are three parameters.
$$\ln L(\Omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right).$$
We take the partial derivatives with respect to μ1, μ2, and σ², and set them all equal to 0. Then we have:
$$\hat{\mu}_1 = \bar{x}, \qquad \hat{\mu}_2 = \bar{y}, \qquad \hat{\sigma}^2_\Omega = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right].$$

At this point, all the parameter estimates are in hand. Then, after some cancellations/simplifications, we have:
$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{(2\pi\hat{\sigma}^2_\omega)^{-\frac{n_1+n_2}{2}}\,e^{-\frac{n_1+n_2}{2}}}{(2\pi\hat{\sigma}^2_\Omega)^{-\frac{n_1+n_2}{2}}\,e^{-\frac{n_1+n_2}{2}}} = \left(\frac{\hat{\sigma}^2_\Omega}{\hat{\sigma}^2_\omega}\right)^{\frac{n_1+n_2}{2}} = \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sum_{i=1}^{n_1}\left(x_i - \frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2}\right)^2}\right]^{\frac{n_1+n_2}{2}} = \left[1 + \frac{t_0^2}{n_1+n_2-2}\right]^{-\frac{n_1+n_2}{2}},$$
where the last equality follows from $\bar{x} - \hat{\mu} = \frac{n_2(\bar{x}-\bar{y})}{n_1+n_2}$ and $\bar{y} - \hat{\mu} = -\frac{n_1(\bar{x}-\bar{y})}{n_1+n_2}$, so that the denominator equals $\sum(x_i - \bar{x})^2 + \sum(y_j - \bar{y})^2 + \frac{n_1 n_2}{n_1+n_2}(\bar{x}-\bar{y})^2$, and $t_0$ is the test statistic of the pooled variance t-test. Therefore, $\lambda \le \lambda^*$ is equivalent to $|t_0| \ge c$. Thus at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \ge c = t_{n_1+n_2-2,\, \alpha/2}$.
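The same kind of numerical check works here (not in the original notes): the sketch below verifies that the two-sample likelihood ratio equals $[1 + t_0^2/(n_1+n_2-2)]^{-(n_1+n_2)/2}$ and cross-checks $t_0$ against SciPy's pooled-variance t-test. The seed, sample sizes, and population parameters are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=572)          # arbitrary seed
n1, n2, alpha = 12, 15, 0.05                   # illustrative values
x = rng.normal(loc=10.0, scale=2.0, size=n1)   # sample 1
y = rng.normal(loc=11.0, scale=2.0, size=n2)   # sample 2, same sigma

xbar, ybar = x.mean(), y.mean()
mu_hat = (n1*xbar + n2*ybar) / (n1 + n2)       # common-mean MLE under H0

# Restricted and unrestricted MLEs of sigma^2 (divisor n1+n2, as derived above)
s2_omega = (((x - mu_hat)**2).sum() + ((y - mu_hat)**2).sum()) / (n1 + n2)
s2_Omega = (((x - xbar)**2).sum() + ((y - ybar)**2).sum()) / (n1 + n2)
lam_direct = (s2_Omega / s2_omega)**((n1 + n2) / 2)

# Pooled-variance t statistic
sp2 = (((x - xbar)**2).sum() + ((y - ybar)**2).sum()) / (n1 + n2 - 2)
t0 = (xbar - ybar) / np.sqrt(sp2 * (1/n1 + 1/n2))
lam_via_t = (1 + t0**2 / (n1 + n2 - 2))**(-(n1 + n2) / 2)

print(np.isclose(lam_direct, lam_via_t))                  # True: the forms agree
print(abs(t0) >= stats.t.ppf(1 - alpha/2, df=n1+n2-2))    # reject H0 at level alpha?
print(stats.ttest_ind(x, y, equal_var=True).statistic, t0)  # matches SciPy
```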