Statistics 550 Notes 15

Reading: Section 3.3

For finding minimax estimators there is no constructive approach comparable to the one for finding Bayes estimators, and it is often difficult to find minimax estimators. However, there are some tools that allow us to find minimax estimators in particular settings.

I. Finding Minimax Procedures

The minimax criterion minimizes the worst possible risk. That is, we prefer $\delta$ to $\delta'$ if and only if
$$\sup_\theta R(\theta, \delta) \le \sup_\theta R(\theta, \delta').$$

A procedure $\delta^*$ is minimax (over a class of considered decision procedures) if it satisfies
$$\sup_\theta R(\theta, \delta^*) = \inf_\delta \sup_\theta R(\theta, \delta).$$

Let $\delta_\pi$ denote the Bayes estimator with respect to the prior $\pi(\theta)$, and let
$$r_\pi = r(\pi, \delta_\pi) = E_\pi\left[ E[l(\theta, \delta_\pi(X)) \mid \theta] \right] = E_\pi[R(\theta, \delta_\pi)]$$
denote the Bayes risk of the Bayes estimator for the prior $\pi(\theta)$.

A prior distribution $\pi$ is least favorable if $r_\pi \ge r_{\pi'}$ for all prior distributions $\pi'$. This is the prior distribution that causes the statistician the greatest average loss, assuming the statistician uses the Bayes estimator.

Theorem 2: Suppose that $\pi$ is a prior distribution on $\Theta$ and $\delta_\pi$ is a Bayes estimator with respect to $\pi$ such that
$$r_\pi = \int R(\theta, \delta_\pi)\, d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi). \qquad (1.1)$$
Then:
(i) $\delta_\pi$ is minimax.
(ii) If $\delta_\pi$ is the unique Bayes solution with respect to $\pi$, it is the unique minimax procedure.
(iii) $\pi$ is a least favorable prior.

Proof: (i) Let $\delta$ be any other procedure. Then
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\, d\pi(\theta) \ge \int R(\theta, \delta_\pi)\, d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi).$$
(ii) This follows by replacing $\ge$ with $>$ in the second inequality of the proof of (i).
(iii) Let $\pi'$ be some other distribution on $\Theta$. Then
$$r_{\pi'} = \int R(\theta, \delta_{\pi'})\, d\pi'(\theta) \le \int R(\theta, \delta_\pi)\, d\pi'(\theta) \le \sup_\theta R(\theta, \delta_\pi) = r_\pi.$$

Corollary 1: If a Bayes procedure has constant risk, then it is minimax.

Proof: If $\delta_\pi$ has constant risk, then (1.1) is clearly satisfied.

Corollary 2 (Theorem 3.3.2): Suppose $\delta^*$ has $\sup_\theta R(\theta, \delta^*) = r$. If there exists a prior $\pi^*$ such that $\delta^*$ is Bayes for $\pi^*$ and $\pi^*\{\theta : R(\theta, \delta^*) = r\} = 1$, then $\delta^*$ is minimax.

Example 1 (Example 3.3.1, Problem 3.3.4): Suppose $X_1, \ldots, X_n$ are iid Bernoulli($\theta$) and we want to estimate $\theta$. Consider the squared error loss function $l(\theta, a) = (\theta - a)^2$. For squared error loss and a Beta($r$, $s$) prior, we showed in Notes 16 that the Bayes estimator is
$$\hat\theta_{r,s} = \frac{r + \sum_{i=1}^n x_i}{r + s + n}.$$
We now seek to choose $r$ and $s$ so that $\hat\theta_{r,s}$ has constant risk. The risk of $\hat\theta_{r,s}$ is
$$R(\theta, \hat\theta_{r,s}) = E\left[\left(\frac{r + \sum_{i=1}^n X_i}{r+s+n} - \theta\right)^2\right] = \mathrm{Var}\left(\frac{r + \sum_{i=1}^n X_i}{r+s+n}\right) + \left[E\left(\frac{r + \sum_{i=1}^n X_i}{r+s+n}\right) - \theta\right]^2$$
$$= \frac{n\theta(1-\theta)}{(r+s+n)^2} + \left(\frac{r + n\theta}{r+s+n} - \theta\right)^2 = \frac{n\theta(1-\theta) + \left[r - (r+s)\theta\right]^2}{(r+s+n)^2}.$$
The coefficient on $\theta^2$ in the numerator is $(r+s)^2 - n$ and the coefficient on $\theta$ in the numerator is $n - 2r(r+s)$. We choose $r$ and $s$ so that both of these coefficients are zero:
$$(r+s)^2 - n = 0, \qquad n - 2r(r+s) = 0.$$
Solving these equations gives $r = s = \frac{\sqrt{n}}{2}$. The unique minimax estimator is
$$\hat\theta_{\text{minimax}} = \hat\theta_{\frac{\sqrt n}{2}, \frac{\sqrt n}{2}} = \frac{\sum_{i=1}^n x_i + \frac{\sqrt n}{2}}{n + \sqrt n},$$
which has constant risk $\frac{1}{4(1+\sqrt n)^2}$, compared to $\frac{\theta(1-\theta)}{n}$ for the MLE $\bar X$. For small $n$, the minimax estimator is better than the MLE for a large range of $\theta$. For large $n$, the minimax estimator is better than the MLE only for a small range of $\theta$ near 0.5.

Note: For large $n$, the least favorable prior Beta($\frac{\sqrt n}{2}$, $\frac{\sqrt n}{2}$) concentrates nearly all of its mass in a neighborhood of $\theta = \frac12$, the value of $\theta$ for which accurate estimation is most difficult; this leads to poor performance relative to the MLE in other neighborhoods.
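As a numerical sanity check (a sketch added here, not part of the original notes; the choices $n = 25$, the grid of $\theta$ values, the replication count, and the seed are arbitrary), the following Python snippet estimates the risk of $\hat\theta_{\text{minimax}}$ by Monte Carlo at several values of $\theta$ and compares it with the closed-form constant risk and with the MLE's risk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 200_000

def minimax_estimate(x):
    # Minimax estimator (sum x_i + sqrt(n)/2) / (n + sqrt(n))
    return (x.sum(axis=-1) + np.sqrt(n) / 2) / (n + np.sqrt(n))

# Closed-form constant risk 1 / (4 * (1 + sqrt(n))^2)
const_risk = 1 / (4 * (1 + np.sqrt(n)) ** 2)

for theta in (0.1, 0.3, 0.5):
    x = rng.binomial(1, theta, size=(reps, n))       # reps Bernoulli samples
    mc_risk = np.mean((minimax_estimate(x) - theta) ** 2)
    mle_risk = theta * (1 - theta) / n               # risk of the MLE xbar
    print(f"theta={theta}: MC risk={mc_risk:.5f}, "
          f"constant risk={const_risk:.5f}, MLE risk={mle_risk:.5f}")
```

The Monte Carlo estimates should match $\frac{1}{4(1+\sqrt n)^2}$ at every $\theta$, while the MLE's risk $\frac{\theta(1-\theta)}{n}$ exceeds the constant risk only on an interval around $\theta = \frac12$.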
Minimax as limit of Bayes rules: If the parameter space is not bounded, minimax rules are often not Bayes rules but instead can be obtained as limits of Bayes rules. To deal with such situations we need an extension of Theorem 3.3.2.

Theorem 3.3.3: Let $\delta^*$ be a decision rule such that $\sup_\theta R(\theta, \delta^*) = r$. Let $\{\pi_k\}$ be a sequence of prior distributions and let $r_k$ be the Bayes risk of the Bayes rule with respect to the prior $\pi_k$. If $r_k \to r$ as $k \to \infty$, then $\delta^*$ is minimax.

Proof: Suppose $\delta$ is any other estimator. Then
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\, d\pi_k(\theta) \ge r_k,$$
and this holds for every $k$. Letting $k \to \infty$ gives $\sup_\theta R(\theta, \delta) \ge r = \sup_\theta R(\theta, \delta^*)$, and hence $\delta^*$ is minimax.

Note: Unlike Theorem 3.3.2, even if the Bayes estimators for the priors $\pi_k$ are unique, the theorem does not guarantee that $\delta^*$ is the unique minimax estimator.

Example 2 (Example 3.3.3): $X_1, \ldots, X_n$ iid $N(\theta, 1)$, $\theta \in \mathbb{R}$. Suppose we want to estimate $\theta$ with squared error loss. We will show that $\bar X$ is minimax.

First, note that $\bar X$ has constant risk $\frac1n$. Consider the sequence of priors $\pi_k = N(0, k)$. In Notes 16, we showed that the Bayes estimator for squared error loss with respect to the prior $\pi_k$ is
$$\hat\theta_k = \frac{n}{n + \frac1k}\, \bar X.$$
The risk function of $\hat\theta_k$ is
$$R(\theta, \hat\theta_k) = E\left[\left(\frac{n}{n+\frac1k}\bar X - \theta\right)^2\right] = [\mathrm{Bias}(\hat\theta_k)]^2 + \mathrm{Var}(\hat\theta_k) = \left(\frac{n}{n+\frac1k} - 1\right)^2\theta^2 + \left(\frac{n}{n+\frac1k}\right)^2 \mathrm{Var}(\bar X) = \frac{\left(\frac1k\right)^2\theta^2 + n}{\left(n+\frac1k\right)^2}.$$
The Bayes risk of $\hat\theta_k$ with respect to $\pi_k$ is
$$r_k = \int_{-\infty}^{\infty} \frac{\left(\frac1k\right)^2\theta^2 + n}{\left(n+\frac1k\right)^2}\, \frac{1}{\sqrt{2\pi k}} \exp\left(-\frac{\theta^2}{2k}\right) d\theta = \frac{\left(\frac1k\right)^2 k + n}{\left(n+\frac1k\right)^2} = \frac{n + \frac1k}{\left(n+\frac1k\right)^2} = \frac{1}{n + \frac1k},$$
using the fact that $\int \theta^2\, d\pi_k(\theta) = k$.

As $k \to \infty$, $r_k \to \frac1n$, which is the constant risk of $\bar X$. Thus, by Theorem 3.3.3, $\bar X$ is minimax.
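To see Theorem 3.3.3 in action numerically (again a sketch, not from the notes; the choices $n = 10$, the grid of $k$ values, the replication count, and the seed are arbitrary), the following Python snippet estimates the Bayes risk $r_k$ by Monte Carlo, drawing $\theta$ from $\pi_k = N(0, k)$ and using the fact that $\bar X \mid \theta \sim N(\theta, \frac1n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

for k in (1, 10, 100, 1000):
    theta = rng.normal(0.0, np.sqrt(k), size=reps)   # theta ~ N(0, k)
    xbar = rng.normal(theta, 1 / np.sqrt(n))         # xbar | theta ~ N(theta, 1/n)
    bayes_est = n / (n + 1 / k) * xbar               # Bayes estimator (posterior mean)
    mc_risk = np.mean((bayes_est - theta) ** 2)      # Monte Carlo Bayes risk r_k
    print(f"k={k:5}: MC Bayes risk={mc_risk:.5f}, formula={1 / (n + 1 / k):.5f}")

print(f"constant risk of xbar: 1/n = {1 / n:.5f}")   # the limit of r_k as k -> infinity
```

The Monte Carlo values should track $\frac{1}{n + \frac1k}$ and approach $\frac1n$ as $k$ grows, matching the limit argument above.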