Statistics 550 Notes 15
Reading: Section 3.3
For finding minimax estimators, there is no constructive approach as there is for finding Bayes estimators, and it is often difficult to find minimax estimators. However, there are some tools that allow us to find minimax estimators in particular settings.
I. Finding Minimax Procedures
The minimax criterion minimizes the worst possible risk. That is, we prefer $\delta$ to $\delta'$ if and only if
$$\sup_\theta R(\theta, \delta) < \sup_\theta R(\theta, \delta').$$
A procedure $\delta^*$ is minimax (over a class of considered decision procedures) if it satisfies
$$\sup_\theta R(\theta, \delta^*) = \inf_\delta \sup_\theta R(\theta, \delta).$$
Let $\delta_\pi$ denote the Bayes estimator with respect to the prior $\pi(\theta)$, and let
$$r_\pi = E_\pi\left[ E[\, l(\theta, \delta_\pi(X)) \mid \theta \,] \right] = E_\pi[R(\theta, \delta_\pi)]$$
denote the Bayes risk of the Bayes estimator for the prior $\pi(\theta)$.
A prior distribution $\pi$ is least favorable if $r_\pi \geq r_{\pi'}$ for all prior distributions $\pi'$.
This is the prior distribution that causes the statistician the greatest average loss, assuming the statistician uses the Bayes estimator.
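To make these definitions concrete, here is a small numerical sketch (the two-point parameter space, two decision rules, and all risk values are made up for illustration; none of this is from the text) comparing worst-case risks and locating a least favorable prior on a grid:

```python
import numpy as np

# Hypothetical two-point parameter space {theta_1, theta_2} and two
# decision rules; risk[i, j] = R(theta_i, delta_j). Values are made up.
risk = np.array([[0.30, 0.10],
                 [0.20, 0.50]])

# Worst-case risk of each rule: the minimax rule has the smaller sup-risk.
print("sup-risk of each rule:", risk.max(axis=0))   # delta_1 wins here

# Bayes risk of the Bayes rule under a prior (p, 1-p); a least favorable
# prior maximizes this quantity over priors.
grid = np.linspace(0, 1, 1001)
r_pi = [min(p * risk[0] + (1 - p) * risk[1]) for p in grid]
i = int(np.argmax(r_pi))
print(f"least favorable p ~ {grid[i]:.2f}, Bayes risk {r_pi[i]:.3f}")
```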
Theorem 2: Suppose that $\pi$ is a prior distribution on $\Theta$ and $\delta_\pi$ is a Bayes estimator with respect to $\pi$ such that
$$r_\pi(\delta_\pi) = \int R(\theta, \delta_\pi)\, d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi). \quad (1.1)$$
Then:
(i) $\delta_\pi$ is minimax.
(ii) If $\delta_\pi$ is the unique Bayes solution with respect to $\pi$, it is the unique minimax procedure.
(iii) $\pi$ is a least favorable prior.
Proof:
(i) Let $\delta$ be any other procedure. Then,
$$\sup_\theta R(\theta, \delta) \geq \int R(\theta, \delta)\, d\pi(\theta) \geq \int R(\theta, \delta_\pi)\, d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi).$$
(ii) This follows by replacing $\geq$ by $>$ in the second inequality of the proof of (i).
(iii) Let $\pi'$ be some other distribution of $\theta$. Then,
$$r_{\pi'}(\delta_{\pi'}) = \int R(\theta, \delta_{\pi'})\, d\pi'(\theta) \leq \int R(\theta, \delta_\pi)\, d\pi'(\theta) \leq \sup_\theta R(\theta, \delta_\pi) = r_\pi(\delta_\pi).$$
Corollary 1: If a Bayes procedure $\delta_\pi$ has constant risk, then it is minimax.
Proof: If $\delta_\pi$ has constant risk, then (1.1) is clearly satisfied.
Corollary 2 (Theorem 3.3.2): Suppose $\delta^*$ has $\sup_\theta R(\theta, \delta^*) = r < \infty$. If there exists a prior $\pi^*$ such that $\delta^*$ is Bayes for $\pi^*$ and $\pi^*\{\theta : R(\theta, \delta^*) = r\} = 1$, then $\delta^*$ is minimax.
Example 1 (Example 3.3.1, Problem 3.3.4): Suppose $X_1, \ldots, X_n$ are iid Bernoulli($\theta$) and we want to estimate $\theta$. Consider the squared error loss function $l(\theta, a) = (\theta - a)^2$. For squared error loss and a Beta(r, s) prior, we showed in Notes 16 that the Bayes estimator is
$$\hat{\theta}_{r,s} = \frac{r + \sum_{i=1}^n x_i}{r + s + n}.$$
We now seek to choose r and s so that $\hat{\theta}_{r,s}$ has constant risk. The risk of $\hat{\theta}_{r,s}$ is
$$
\begin{aligned}
R(\theta, \hat{\theta}_{r,s}) &= E\left[\left(\frac{r + \sum_{i=1}^n x_i}{r+s+n} - \theta\right)^2\right] \\
&= \mathrm{Var}\left(\frac{r + \sum_{i=1}^n x_i}{r+s+n}\right) + \left[E\left(\frac{r + \sum_{i=1}^n x_i}{r+s+n}\right) - \theta\right]^2 \\
&= \frac{n\theta(1-\theta)}{(r+s+n)^2} + \left(\frac{r + n\theta}{r+s+n} - \theta\right)^2 \\
&= \frac{n\theta(1-\theta) + \left(r + n\theta - \theta(r+s+n)\right)^2}{(r+s+n)^2} \\
&= \frac{n\theta(1-\theta) + \left(r - \theta(r+s)\right)^2}{(r+s+n)^2}.
\end{aligned}
$$
Expanding the numerator gives $r^2 + \theta\,[n - 2r(r+s)] + \theta^2\,[(r+s)^2 - n]$. The coefficient on $\theta^2$ in the numerator is $(r+s)^2 - n$ and the coefficient on $\theta$ in the numerator is $n - 2r(r+s)$. We choose r and s so that both these coefficients are zero:
$$(r+s)^2 - n = 0, \quad n - 2r(r+s) = 0.$$
Solving these equations gives $r = s = \frac{\sqrt{n}}{2}$.
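As a sanity check on this algebra (a sketch using sympy; not part of the original notes), we can expand the numerator and solve for r and s symbolically:

```python
import sympy as sp

theta, r, s, n = sp.symbols('theta r s n', positive=True)

# Numerator of the risk: n*theta*(1-theta) + (r - theta*(r+s))**2.
numer = sp.expand(n*theta*(1 - theta) + (r - theta*(r + s))**2)
c2, c1, c0 = sp.Poly(numer, theta).all_coeffs()
print(c2, c1, c0)   # (r+s)^2 - n,  n - 2r(r+s),  r^2 (in expanded form)

# Setting the theta^2 and theta coefficients to zero and solving:
print(sp.solve([c2, c1], [r, s], dict=True))   # r = s = sqrt(n)/2
```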
The unique minimax estimator is
$$\hat{\theta}_{\mathrm{minimax}} = \hat{\theta}_{\frac{\sqrt{n}}{2}, \frac{\sqrt{n}}{2}} = \frac{\frac{\sqrt{n}}{2} + \sum_{i=1}^n x_i}{n + \sqrt{n}},$$
which has constant risk $\frac{1}{4(1+\sqrt{n})^2}$, compared to $\frac{\theta(1-\theta)}{n}$ for the MLE $\bar{X}$.
For small n, the minimax estimator is better than the MLE for a large range of $\theta$. For large n, the minimax estimator is better than the MLE for only a small range of $\theta$ near 0.5.
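The following sketch (illustrative, not from the notes) computes, for a few values of n, the interval of $\theta$ on which the minimax rule's constant risk beats the MLE's risk $\theta(1-\theta)/n$:

```python
import numpy as np

def crossover(n):
    """Interval of theta where the minimax risk 1/(4(1+sqrt(n))^2) is
    below the MLE risk theta*(1-theta)/n."""
    c = n / (4 * (1 + np.sqrt(n))**2)       # solve theta*(1-theta) = c
    half = np.sqrt(1 - 4 * c) / 2
    return 0.5 - half, 0.5 + half

for n in [4, 100, 10000]:
    lo, hi = crossover(n)
    print(f"n={n}: minimax beats MLE for theta in ({lo:.3f}, {hi:.3f})")
```

The interval shrinks toward 0.5 as n grows, matching the claim above.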
Note: For large $n$, the least favorable prior concentrates nearly its entire attention on the neighborhood of $\theta = \frac{1}{2}$, for which accurate estimation of $\theta$ is most difficult, leading to poor performance relative to the MLE for other neighborhoods.
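By part (iii) of Theorem 2, the least favorable prior in this example is Beta($\frac{\sqrt{n}}{2}, \frac{\sqrt{n}}{2}$). A quick check (a sketch) of how tightly it concentrates around 1/2 as n grows:

```python
import numpy as np

# Beta(a, a) with a = sqrt(n)/2 has mean 1/2 and variance 1/(4(2a + 1)).
for n in [4, 100, 10000]:
    a = np.sqrt(n) / 2
    sd = np.sqrt(1 / (4 * (2 * a + 1)))
    print(f"n={n:>5}: Beta({a:.1f}, {a:.1f}) prior, sd = {sd:.3f}")
```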
Minimax as limit of Bayes rules:
If the parameter space $\Theta$ is not bounded, minimax rules are often not Bayes rules but instead can be obtained as limits of Bayes rules. To deal with such situations we need an extension of Theorem 3.3.2.
Theorem 3.3.3: Let $\delta^*$ be a decision rule such that $\sup_\theta R(\theta, \delta^*) = r < \infty$. Let $\{\pi_k\}$ be a sequence of prior distributions and let $r_k$ be the Bayes risk of the Bayes rule with respect to the prior $\pi_k$. If $r_k \to r$ as $k \to \infty$, then $\delta^*$ is minimax.
Proof: Suppose $\delta$ is any other estimator. Then,
$$\sup_\theta R(\theta, \delta) \geq \int R(\theta, \delta)\, d\pi_k(\theta) \geq r_k,$$
and this holds for every k. Letting $k \to \infty$, $\sup_\theta R(\theta, \delta) \geq r = \sup_\theta R(\theta, \delta^*)$, and $\delta^*$ is minimax.
Note: Unlike Theorem 3.3.2, even if the Bayes estimators for the priors $\pi_k$ are unique, the theorem does not guarantee that $\delta^*$ is the unique minimax estimator.
Example 2 (Example 3.3.3): $X_1, \ldots, X_n$ iid $N(\mu, 1)$, $-\infty < \mu < \infty$. Suppose we want to estimate $\mu$ with squared error loss. We will show that $\bar{X}$ is minimax.
First, note that $\bar{X}$ has constant risk $\frac{1}{n}$. Consider the sequence of priors $\pi_k = N(0, k)$. In Notes 16, we showed that the Bayes estimator for squared error loss with respect to the prior $\pi_k$ is
$$\hat{\mu}_k = \frac{n}{n + \frac{1}{k}} \bar{X}.$$
The risk function of $\hat{\mu}_k$ is
$$
\begin{aligned}
R(\mu, \hat{\mu}_k) &= E\left[\left(\frac{n}{n + \frac{1}{k}} \bar{X} - \mu\right)^2\right] \\
&= [\mathrm{Bias}(\hat{\mu}_k)]^2 + \mathrm{Var}(\hat{\mu}_k)
 = \left(\frac{n}{n + \frac{1}{k}}\,\mu - \mu\right)^2 + \left(\frac{n}{n + \frac{1}{k}}\right)^2 \mathrm{Var}(\bar{X}) \\
&= \frac{\left(\frac{1}{k}\right)^2 \mu^2}{\left(n + \frac{1}{k}\right)^2} + \frac{n}{\left(n + \frac{1}{k}\right)^2}
 = \frac{n + \frac{\mu^2}{k^2}}{\left(n + \frac{1}{k}\right)^2}.
\end{aligned}
$$
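A quick Monte Carlo check of this risk formula (a sketch; the values of n, k, and $\mu$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, mu = 10, 5.0, 1.5                    # arbitrary illustrative values
shrink = n / (n + 1 / k)                   # Bayes shrinkage factor

# Monte Carlo estimate of E[(shrink * Xbar - mu)^2] vs. the closed form.
xbar = rng.normal(mu, 1 / np.sqrt(n), size=200_000)
mc = np.mean((shrink * xbar - mu) ** 2)
exact = (n + mu**2 / k**2) / (n + 1 / k) ** 2
print(mc, exact)                           # should agree to ~3 decimals
```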
The Bayes risk of $\hat{\mu}_k$ with respect to $\pi_k$ is
$$r_k = \int_{-\infty}^{\infty} \frac{n + \frac{\mu^2}{k^2}}{\left(n + \frac{1}{k}\right)^2} \cdot \frac{1}{\sqrt{2\pi k}} \exp\left(-\frac{\mu^2}{2k}\right) d\mu = \frac{n + \frac{1}{k}}{\left(n + \frac{1}{k}\right)^2} = \frac{1}{n + \frac{1}{k}},$$
using $E_{\pi_k}[\mu^2] = k$.
As $k \to \infty$, $r_k \to \frac{1}{n}$, which is the constant risk of $\bar{X}$. Thus, by Theorem 3.3.3, $\bar{X}$ is minimax.
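Numerically (a small sketch, with n chosen arbitrarily), the Bayes risks $r_k$ approach the constant risk $1/n$ of $\bar{X}$:

```python
# r_k = 1/(n + 1/k) increases to 1/n as k grows.
n = 10
for k in [1, 10, 100, 1000, 10**6]:
    print(f"k={k}: r_k = {1 / (n + 1/k):.6f}   (1/n = {1/n:.6f})")
```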