Stat 550 Notes 19

I. Schedule

Friday, Dec. 5th: I will e-mail take home final.
Monday, Dec. 8th: Lunch, Houston Hall, 12 pm. I’ll have my cell phone with me (215-850-6393) if you can’t find the group.
Tuesday, Dec. 9th, 5 pm: Homework 10 due.
Monday, Dec. 15th, 5 pm: Take home final due.

II. Confidence Interval Example from last class

Suppose $X_1, \ldots, X_n$ iid $N(\theta, 1)$. Consider the confidence interval $S(X) = [-|\bar{X}|, |\bar{X}|]$. The coverage probability of $S(X)$ for $\theta$ is

$$P_\theta(\theta \in S(X)) = P_\theta\left(-|\bar{X}| \le \theta \le |\bar{X}|\right).$$

For $\theta \ge 0$, the coverage probability is

$$P_\theta\left(-|\bar{X}| \le \theta \le |\bar{X}|\right) = P_\theta(\bar{X} \ge \theta) + P_\theta(\bar{X} \le -\theta) = 0.5 + \Phi\left(\frac{-2\theta}{1/\sqrt{n}}\right),$$

where $\Phi$ is the standard normal CDF. For $\theta < 0$, the coverage probability is

$$P_\theta\left(-|\bar{X}| \le \theta \le |\bar{X}|\right) = P_\theta(\bar{X} \le \theta) + P_\theta(\bar{X} \ge -\theta) = 0.5 + 1 - \Phi\left(\frac{-2\theta}{1/\sqrt{n}}\right).$$

The confidence coefficient for $S(X)$ is

$$\inf_\theta P_\theta(\theta \in S(X)) = \min\left\{ \inf_{\theta \ge 0} \left[0.5 + \Phi\left(\frac{-2\theta}{1/\sqrt{n}}\right)\right],\ \inf_{\theta < 0} \left[0.5 + 1 - \Phi\left(\frac{-2\theta}{1/\sqrt{n}}\right)\right] \right\} = 0.5.$$

III. Homework 8 solution correction

For Problem 1 on Homework 8 (Bickel and Doksum, Problem 3.2.4), the Bayes rule only exists, i.e., the posterior risk is finite, if and only if s n 4 . The posterior risk for an action a is proportional to s (1 ) ns 4 (a 1) a d 2 (see solutions to Homework 8). This integral is finite if and only if s n 4 .

IV. Generalized Likelihood Ratio Tests (Chapter 4.9)

For many testing problems, there is no UMP test. The generalized likelihood ratio test statistic is a test statistic which generally has reasonable properties and is asymptotically most powerful in a certain sense (described in Section 5.4.4 of Bickel and Doksum).

Consider testing $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_1$. The generalized likelihood ratio test statistic is:

$$\lambda(x) = \frac{\sup\{p(x \mid \theta) : \theta \in \Theta_0 \cup \Theta_1\}}{\sup\{p(x \mid \theta) : \theta \in \Theta_0\}}.$$

We reject $H_0$ for large values of $\lambda(x)$.

To find the critical region of the test, we often look for a statistic $T(x)$ which is a monotone strictly increasing function of $\lambda(x)$ and for which the distribution of $T(x)$ under the null hypothesis can be found. Then rejecting for large values of $\lambda(x)$ is equivalent to rejecting for large values of $T(x)$, and the critical value for the test in terms of $T(x)$ can be determined.
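As a minimal sketch of this recipe, assume the simpler model $X_1, \ldots, X_n$ iid $N(\theta, 1)$ with $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$ (a model chosen here only for illustration). In that case the unrestricted supremum is attained at $\bar{x}$, so $\log \lambda(x) = n(\bar{x} - \theta_0)^2 / 2$, which is a strictly increasing function of $T(x) = \sqrt{n}\,|\bar{x} - \theta_0|$. The data values and the function names below are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def log_lik(xs, theta):
    """Log-likelihood of iid N(theta, 1) data (assumed model)."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in xs)

def log_glr(xs, theta0):
    """log lambda(x): the sup over all theta is attained at the sample mean."""
    return log_lik(xs, fmean(xs)) - log_lik(xs, theta0)

def t_stat(xs, theta0):
    """T(x) = sqrt(n) * |xbar - theta0|; here log lambda(x) = T(x)^2 / 2."""
    return math.sqrt(len(xs)) * abs(fmean(xs) - theta0)

def critical_value(alpha):
    """Under H0, sqrt(n) * (xbar - theta0) ~ N(0, 1), so the level-alpha
    critical value for T is the 1 - alpha/2 standard normal quantile."""
    return NormalDist().inv_cdf(1 - alpha / 2)

xs = [0.3, -1.2, 0.8, 1.5, 0.1, 2.0, -0.4, 0.9]  # made-up data for illustration
theta0 = 0.0
T = t_stat(xs, theta0)

# The direct log-likelihood difference and the closed form T^2 / 2 should agree.
print(log_glr(xs, theta0), T ** 2 / 2)
# Reject H0 at level 0.05 only if T exceeds the critical value.
print(T > critical_value(0.05))
```

Because $\log \lambda(x)$ and $T(x)$ order samples the same way, the critical value only needs to be computed on the $T(x)$ scale, where the null distribution is known.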
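Example: Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown. Suppose we want to test $H_0: \mu = \mu_0$ versus the alternative $H_1: \mu \ne \mu_0$.

We showed when covering maximum likelihood that the maximum likelihood estimates over the whole parameter space are

$$\hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2.$$

Under the null hypothesis, only $\sigma^2$ is unknown. The log-likelihood is

$$l_x(\sigma) = -\frac{n}{2} \log 2\pi - n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu_0)^2,$$

and the likelihood equation is

$$\frac{\partial}{\partial \sigma} l_x(\sigma) = -\frac{n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu_0)^2}{\sigma^3} = 0.$$

Solving the likelihood equation and checking that the solution is a maximum using the second derivative shows the MLE under the null hypothesis is

$$\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu_0)^2.$$

Rejecting for large values of $\lambda(x)$ is equivalent to rejecting for large values of $\log \lambda(x)$, which equals

$$\log \lambda(x) = \log p(x \mid \hat{\mu}, \hat{\sigma}^2) - \log p(x \mid \mu_0, \hat{\sigma}_0^2)$$
$$= -\frac{n}{2}\left[(\log 2\pi) + \log \hat{\sigma}^2\right] - \frac{n}{2} + \frac{n}{2}\left[(\log 2\pi) + \log \hat{\sigma}_0^2\right] + \frac{n}{2}$$
$$= \frac{n}{2} \log\left(\hat{\sigma}_0^2 / \hat{\sigma}^2\right).$$

The generalized likelihood ratio test therefore rejects for large values of $\hat{\sigma}_0^2 / \hat{\sigma}^2$. To simplify further, we use the following equation, which can be established by writing $\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X} + \bar{X} - \mu_0)^2$:

$$\hat{\sigma}_0^2 = \hat{\sigma}^2 + (\bar{X} - \mu_0)^2.$$

Therefore,

$$\hat{\sigma}_0^2 / \hat{\sigma}^2 = 1 + (\bar{X} - \mu_0)^2 / \hat{\sigma}^2.$$

The sample variance is $s^2 = (n-1)^{-1} \sum_{i=1}^n (x_i - \bar{x})^2 = n \hat{\sigma}^2 / (n-1)$. Substituting $\hat{\sigma}^2 = (n-1) s^2 / n$ gives

$$\hat{\sigma}_0^2 / \hat{\sigma}^2 = 1 + \frac{T_n^2}{n-1}, \qquad \text{where } T_n = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s},$$

so $\hat{\sigma}_0^2 / \hat{\sigma}^2$ is a monotone increasing function of $|T_n|$. Thus, rejecting for large values of $\lambda(x)$ is equivalent to rejecting for large values of $|T_n|$.

The distribution of $T_n$ under $H_0$ is the t-distribution with $n-1$ degrees of freedom (see Example 4.4 in Bickel and Doksum on page 235). Thus, the generalized likelihood ratio level $\alpha$ test rejects the null hypothesis for $|T_n|$ greater than the $1 - \alpha/2$ quantile of the t-distribution with $n-1$ degrees of freedom. For example, for $n = 25$, $\alpha = 0.05$, we would reject $H_0$ if and only if $|T_n| > 2.064$.

Large sample distribution of generalized likelihood ratio statistic: For $X_1, \ldots, X_n$ iid, the distribution of $2 \log \lambda(x)$ converges to a chi-squared distribution under the null hypothesis as the sample size $n \to \infty$. See Section 6.3 of Bickel and Doksum.

IV.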
Course Summary:

Basic Statistical Inference Problem: We observe data $X$. We assume the data has been generated from the model $X \sim P(X \mid \theta)$, where $\theta$ is unknown. We want to make inferences about $\theta$.

Three Statistical Inference Problems:
(1) Point estimation – best estimate of $\theta$.
(2) Hypothesis testing – distinguish whether $\theta$ is in one subset of the parameter space (the null hypothesis) or its complement (the alternative hypothesis).
(3) Set estimation – find a set that has a high guaranteed probability of containing $\theta$.

The focus of this course was to develop good decision procedures (functions of the data $X$ used to make the statistical inferences) for these three statistical inference problems.

Evaluating Decision Procedures:

Decision Theory Framework: We evaluate decision procedures by defining a loss function that quantifies the loss involved when the decision procedure makes the wrong decision (e.g., squared error loss for point estimation, 0-1 loss for hypothesis testing).

Within the decision theory framework, we considered two basic frameworks for evaluating decision procedures.
(1) Frequentist framework: We evaluate the decision procedure based on its expected loss in repeated samples under the true parameter $\theta$ (the risk). Typically, we seek decision procedures which perform reasonably for all $\theta$ (e.g., minimax estimators, confidence sets that have guaranteed $(1 - \alpha)$ coverage probability for all $\theta$).
(2) Bayesian framework: We specify a prior distribution of our beliefs about $\theta$. We then choose decision procedures that have good properties based on our prior beliefs about $\theta$ and the data we have observed (e.g., choose the decision procedure which minimizes the Bayes risk, credibility intervals).

Methods for Finding Decision Procedures

Point Estimation:
1. Method of Moments
2. Maximum Likelihood (we showed this has certain asymptotically optimal properties for large sample sizes).
3. Bayes estimators
4. Minimax estimators
5. Uniformly minimum variance unbiased (UMVU) estimators

Hypothesis Testing:
1. Bayes tests
2. Likelihood ratio test (Neyman-Pearson lemma shows this is optimal for simple vs. simple hypotheses)
3. Generalized likelihood ratio test

Set estimation:
1. Confidence Sets
   a. Inversion of hypothesis tests.
2. Credibility Sets
   a. Highest posterior density sets

Other Important Concepts
1. Sufficiency: Can reduce the dimension of the data that needs to be considered in formulating decision procedures.
2. Exponential Families: Important class of statistical models with several nice properties.
3. Information Inequality: Provides lower bound on the variance of an estimator for a given model.
4. Methods of Computing Maximum Likelihood Estimates: Bisection Method, Coordinate Ascent Method.

V. Follow-up courses:

Stat 551 (Linear Models): Detailed development of properties of the regression model (Example 1.1.4 of Bickel and Doksum) and related models. Professor Brown will offer this in the spring.

Stat 552 (Asymptotics): Focuses on properties of statistical procedures as the sample size goes to infinity. Will cover the contents of Chapters 5-6 of Bickel and Doksum (but uses a different book) and additional material. Offered in the fall.

Stat 541 (Applied Statistics): Focuses on exploratory data analysis and regression methods for analyzing data. Offered in the fall.

Stat 542 (Bayesian Statistics): Focuses on computation of Bayesian inferences and Bayesian modeling of more complicated settings than we have considered in the course (e.g., hierarchical models). Offered in the spring.