1.4 Decision Principles
The principles used to select a sensible decision are:
(a) Conditional Bayes Decision Principle
(b) Frequentist Decision Principle.
(a) Conditional Bayes Decision Principle:
Choose an action a ∈ A minimizing ρ(π, a). Such an a will be called a Bayes action and will be denoted a^π.
Example 4 (continued):
Let A = {a1, a2}, π(θ1) = 0.99, π(θ2) = 0.01. Thus,
	ρ(π, a1) = -485,  ρ(π, a2) = -294.
Therefore, a^π = a1.
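As a quick numerical check of the conditional Bayes principle, the sketch below recomputes the Bayes expected losses. The individual losses L(θ, a) are hypothetical assumptions, chosen only to reproduce the stated values ρ(π, a1) = -485 and ρ(π, a2) = -294.

```python
# Conditional Bayes decision principle: choose the action minimizing
# the prior expected loss rho(pi, a) = E^pi[L(theta, a)].
# ASSUMPTION: the individual losses below are hypothetical, chosen only
# to reproduce rho(pi, a1) = -485 and rho(pi, a2) = -294.

prior = {"theta1": 0.99, "theta2": 0.01}
loss = {
    "a1": {"theta1": -500.0, "theta2": 1000.0},
    "a2": {"theta1": -300.0, "theta2": 300.0},
}

def expected_loss(action):
    """rho(pi, a) = sum over theta of pi(theta) * L(theta, a)."""
    return sum(prior[t] * loss[action][t] for t in prior)

rho = {a: expected_loss(a) for a in loss}
bayes_action = min(rho, key=rho.get)  # a^pi: the action minimizing rho(pi, .)
print(rho)           # {'a1': -485.0, 'a2': -294.0}
print(bayes_action)  # a1
```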
(b) Frequentist Decision Principle:
The three most important frequentist decision principles are:
- Bayes risk principle
- Minimax principle
- Invariance principle
(1) Bayes Risk Principle:
Let D be the class of decision rules. Then, for δ1, δ2 ∈ D, a decision rule δ1 is preferred to a rule δ2 based on the Bayes risk principle if
	r(π, δ1) < r(π, δ2).
A decision rule minimizing r(π, δ) among all decision rules in the class D is called a Bayes rule and will be denoted as δ^π. The quantity
	r(π) = r(π, δ^π)
is called the Bayes risk for π.
Example 5 (continued):
X ~ N(θ, 1), π ~ N(0, τ²), D = {δ_c(x) = cx : c is any constant}.
Let δ_c(X) = cX. Then,
	R(θ, δ_c) = E_θ[(θ - cX)²]
	          = E_θ[(cX - cθ + cθ - θ)²]
	          = E_θ[(cX - cθ)² + 2(cX - cθ)(c - 1)θ + (c - 1)²θ²]
	          = c² Var_θ(X) + (c - 1)²θ²
	          = c² + (c - 1)²θ²
and
	r(π, δ_c) = E^π[R(θ, δ_c)] = E^π[c² + (c - 1)²θ²]
	          = c² + (c - 1)² E^π[θ²]
	          = c² + (c - 1)²τ².
Note that r(π, δ_c) is a function of c. r(π, δ_c) attains its minimum at
	c = τ²/(1 + τ²)
(f(c) = r(π, δ_c) = c² + (c - 1)²τ², f′(c) = 2c + 2(c - 1)τ² = 0 ⟹ c = τ²/(1 + τ²)).
Thus,
	δ^π(X) = [τ²/(1 + τ²)] X
is the Bayes estimator. In addition,
	r(π) = r(π, δ^π) = [τ²/(1 + τ²)]² + [τ²/(1 + τ²) - 1]² τ²
	     = τ⁴/(1 + τ²)² + τ²/(1 + τ²)²
	     = τ²(τ² + 1)/(1 + τ²)²
	     = τ²/(1 + τ²).
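The calculus step above can be double-checked numerically; this sketch evaluates r(π, δ_c) = c² + (c - 1)²τ² on a grid and compares the grid minimizer with τ²/(1 + τ²). The value τ² = 2 is an illustrative assumption.

```python
# Grid check that r(pi, delta_c) = c^2 + (c - 1)^2 * tau^2 is minimized
# at c = tau^2 / (1 + tau^2).  ASSUMPTION: tau^2 = 2 is illustrative.

tau2 = 2.0

def bayes_risk(c):
    return c**2 + (c - 1)**2 * tau2

grid = [i / 10000 for i in range(10001)]  # candidate c in [0, 1]
c_star = min(grid, key=bayes_risk)

print(c_star)              # ~ 0.6667, i.e. tau2 / (1 + tau2)
print(tau2 / (1 + tau2))
print(bayes_risk(c_star))  # ~ 0.6667, the Bayes risk r(pi) = tau2 / (1 + tau2)
```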
Example 4 (continued):
Let D = {a1, a2}, π(θ1) = 0.99, π(θ2) = 0.01. Then,
	r(π, a1) = E^π[R(θ, a1)] = E^π[L(θ, a1)] = -485
and
	r(π, a2) = E^π[R(θ, a2)] = E^π[L(θ, a2)] = -294.
Thus, a1 is the Bayes action.
Note:
In a no-data problem, the Bayes risk (frequentist principle) is equivalent to the Bayes expected loss (conditional Bayes principle). Further, the Bayes risk principle will give the same answer as the conditional Bayes decision principle.
Definition:
Let X = (X1, X2, …, Xn) have the probability distribution function (or probability density function) f(x|θ) = f(x1, x2, …, xn|θ) with prior density π(θ) and prior cumulative distribution F^π(θ), respectively. Then, the marginal density or distribution of X = (X1, X2, …, Xn) is
	m(x) = m(x1, x2, …, xn) = ∫ f(x|θ) dF^π(θ)
	     = ∫_Θ f(x|θ)π(θ) dθ   (continuous case)
	     = Σ_θ f(x|θ)π(θ)      (discrete case).
The posterior density or distribution of θ given x is
	f(θ|x) = f(θ|x1, x2, …, xn) = π(θ)f(x|θ)/m(x).
The posterior expectation of g(θ) given x is
	E^{f(θ|x)}[g(θ)] = ∫_Θ g(θ) f(θ|x) dθ = ∫_Θ g(θ)π(θ)f(x|θ) dθ / m(x)   (continuous case)
	                 = Σ_θ g(θ) f(θ|x) = Σ_θ g(θ)π(θ)f(x|θ) / m(x)          (discrete case).
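The discrete-case formulas can be sketched concretely. The two-point prior and Bernoulli likelihood below are illustrative assumptions, not part of the definition.

```python
# Discrete-case sketch of the marginal m(x), the posterior f(theta|x),
# and the posterior expectation of g(theta) = theta.
# ASSUMPTION: the two-point prior and Bernoulli likelihood are illustrative.

prior = {0.25: 0.5, 0.75: 0.5}   # pi(theta) on two candidate values of theta

def likelihood(x, theta):
    """f(x | theta): Bernoulli likelihood of one observation x in {0, 1}."""
    return theta if x == 1 else 1 - theta

x = 1

# m(x) = sum_theta f(x|theta) * pi(theta)
m = sum(likelihood(x, t) * p for t, p in prior.items())

# f(theta|x) = pi(theta) * f(x|theta) / m(x)
posterior = {t: p * likelihood(x, t) / m for t, p in prior.items()}

# E^{f(theta|x)}[theta] = sum_theta theta * f(theta|x)
post_mean = sum(t * q for t, q in posterior.items())

print(m)          # 0.5
print(posterior)  # {0.25: 0.25, 0.75: 0.75}
print(post_mean)  # 0.625
```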
Very Important Result:
Let X = (X1, X2, …, Xn) have the probability distribution function (or probability density function) f(x|θ) = f(x1, x2, …, xn|θ) with prior density π(θ) and prior cumulative distribution F^π(θ), respectively. Suppose the following two assumptions hold:
(a) There exists an estimator δ0 with finite Bayes risk.
(b) For almost all x, there exists a value δ^π(x) minimizing
	ρ(π, δ^π(x)) = E^{f(θ|x)}[L(θ, δ^π(x))] = ∫ L(θ, δ^π(x)) f(θ|x) dθ   (or Σ_θ L(θ, δ^π(x)) f(θ|x) in the discrete case).
Then,
(a) if L(θ, a) = (a - g(θ))², then
	δ^π(x) = E^{f(θ|x)}[g(θ)] = ∫ g(θ) f(θ|x) dθ   (or Σ_θ g(θ) f(θ|x))
and, more generally, if L(θ, a) = w(θ)(a - g(θ))², then
	δ^π(x) = E^{f(θ|x)}[w(θ)g(θ)] / E^{f(θ|x)}[w(θ)]
	       = ∫ w(θ)g(θ) f(θ|x) dθ / ∫ w(θ) f(θ|x) dθ   (or Σ_θ w(θ)g(θ) f(θ|x) / Σ_θ w(θ) f(θ|x)).
(b) if L(θ, a) = |a - θ|, then δ^π(x) is the median of the posterior density or distribution f(θ|x) of θ given x. Further, if
	L(θ, a) = k0(θ - a)   if θ - a ≥ 0,
	        = k1(a - θ)   if θ - a < 0,
then δ^π(x) is the k0/(k0 + k1) percentile of the posterior density or distribution f(θ|x) of θ given x.
(c) if
	L(θ, a) = 0   when |a - θ| ≤ c,
	        = 1   when |a - θ| > c,
then δ^π(x) is the midpoint of the interval I of length 2c which maximizes
	P(θ ∈ I | x) = ∫_I f(θ|x) dθ   (or Σ_{θ∈I} f(θ|x)).
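Result (a) with w(θ) ≡ 1 can be illustrated on a discrete posterior: minimizing the posterior expected squared-error loss over a grid of actions recovers the posterior mean. The posterior weights below are an arbitrary illustration.

```python
# Illustration of result (a) with w(theta) = 1: under squared-error loss
# L(theta, a) = (a - theta)^2, the Bayes action is the posterior mean.
# ASSUMPTION: the posterior weights are an arbitrary illustration.

posterior = {1.0: 0.2, 2.0: 0.5, 4.0: 0.3}   # f(theta | x)

def post_exp_loss(a):
    """rho(pi(.|x), a) = sum_theta (a - theta)^2 * f(theta|x)."""
    return sum((a - t)**2 * q for t, q in posterior.items())

grid = [i / 1000 for i in range(5001)]        # candidate actions in [0, 5]
a_star = min(grid, key=post_exp_loss)

post_mean = sum(t * q for t, q in posterior.items())
print(a_star)     # ~ 2.4, the grid minimizer
print(post_mean)  # ~ 2.4, the posterior mean
```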
[Outline of proof]
(a)
	ρ(π, a) = E^{f(θ|x)}[L(θ, a)] = E^{f(θ|x)}[w(θ)(a - g(θ))²]
	        = E^{f(θ|x)}[w(θ)(a² - 2ag(θ) + g²(θ))]
	        = E^{f(θ|x)}[w(θ)] a² - 2E^{f(θ|x)}[g(θ)w(θ)] a + E^{f(θ|x)}[g²(θ)w(θ)].
Thus
	∂ρ(π, a)/∂a = 2E^{f(θ|x)}[w(θ)] a - 2E^{f(θ|x)}[g(θ)w(θ)] = 0
	⟹ a = E^{f(θ|x)}[g(θ)w(θ)] / E^{f(θ|x)}[w(θ)].
(b)
Without loss of generality, assume m is the median of f(θ|x). We want to prove
	ρ(π, m) - ρ(π, a) = E^{f(θ|x)}[L(θ, m) - L(θ, a)] ≤ 0
for a > m. Since
	L(θ, m) - L(θ, a) = |m - θ| - |a - θ|
	                  = m - a,          if θ ≤ m,
	                  = 2θ - (m + a),   if m < θ ≤ a,
	                  = a - m,          if θ > a,
and 2θ - (m + a) ≤ a - m for m < θ ≤ a, then
	E^{f(θ|x)}[L(θ, m) - L(θ, a)]
	  = (m - a) P(θ ≤ m | x) + E^{f(θ|x)}[(2θ - (m + a)) I(m < θ ≤ a)] + (a - m) P(θ > a | x)
	  ≤ (m - a) P(θ ≤ m | x) + (a - m) P(m < θ ≤ a | x) + (a - m) P(θ > a | x)
	  = (m - a) P(θ ≤ m | x) + (a - m) P(θ > m | x)
	  ≤ (m - a)/2 + (a - m)/2
	  = 0,
where the last inequality holds because m is the median: P(θ ≤ m | x) ≥ 1/2 with m - a < 0, and P(θ > m | x) ≤ 1/2 with a - m > 0.
[Intuition of the above proof:]
[Figure: three points a1 < a2 < a3 on a line, together with candidate points c1 < a1, a1 < c2 < a3, and c3 > a3.]
We want to find a point c such that Σ_{i=1}^{3} |c - ai| achieves its minimum. At c = a2,
	Σ_{i=1}^{3} |c - ai| = |a2 - a1| + |a2 - a2| + |a2 - a3| = a3 - a1.
At c = c1 (c1 < a1),
	Σ_{i=1}^{3} |c1 - ai| ≥ |c1 - a3| = a3 - c1 > a3 - a1.
At c = c2 (a1 < c2 < a3),
	Σ_{i=1}^{3} |c2 - ai| = (c2 - a1) + |c2 - a2| + (a3 - c2) = (a3 - a1) + |c2 - a2| ≥ a3 - a1.
At c = c3 (c3 > a3),
	Σ_{i=1}^{3} |c3 - ai| ≥ |c3 - a1| = c3 - a1 > a3 - a1.
Therefore, at c = a2, Σ_{i=1}^{3} |c - ai| achieves its minimum.
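This intuition is easy to verify numerically; the sketch below (with three illustrative points) confirms that Σ|c - ai| is minimized at the middle point, in line with result (b) for absolute-error loss.

```python
# Sum of absolute deviations sum_i |c - a_i| is minimized at the median
# of the points (here a2).  ASSUMPTION: the three points are illustrative.

a = [1.0, 3.0, 8.0]                    # a1 < a2 < a3

def total_abs_dev(c):
    return sum(abs(c - ai) for ai in a)

grid = [i / 100 for i in range(1001)]  # candidate c in [0, 10]
c_star = min(grid, key=total_abs_dev)

print(c_star)                 # 3.0 (= a2, the median)
print(total_abs_dev(c_star))  # 7.0 (= a3 - a1)
```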
(2) Minimax Principle:
A decision rule δ1 is preferred to a rule δ2 based on the minimax principle if
	sup_θ R(θ, δ1) < sup_θ R(θ, δ2).
A decision rule δ^M minimizing sup_θ R(θ, δ) among all decision rules in the class D is called a minimax decision rule, i.e.,
	sup_θ R(θ, δ^M) = inf_{δ∈D} sup_θ R(θ, δ).
Example 5 (continued):
D = {δ_c(x) = cx : c is any constant} and R(θ, δ_c) = c² + (c - 1)²θ².
Thus,
	sup_θ R(θ, δ_c) = sup_θ [c² + (1 - c)²θ²] = 1   if c = 1,
	                                          = ∞   if c ≠ 1.
Therefore, δ^M = δ_1(X) = X is the minimax decision rule.
Example 4 (continued):
D = {a1, a2}. Then,
	sup_θ R(θ, a1) = sup_θ L(θ, a1) = 1000
and
	sup_θ R(θ, a2) = sup_θ L(θ, a2) = 300.
Thus, δ^M = a2.
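The minimax comparison in this no-data problem can be sketched directly: pick the action whose worst-case loss is smallest. The loss table is a hypothetical assumption, chosen to be consistent with the stated suprema 1000 and 300.

```python
# Minimax principle in a no-data problem: choose the action with the
# smallest worst-case loss sup_theta L(theta, a).
# ASSUMPTION: the loss table is hypothetical, consistent with the stated
# suprema sup L(theta, a1) = 1000 and sup L(theta, a2) = 300.

loss = {
    "a1": {"theta1": -500, "theta2": 1000},
    "a2": {"theta1": -300, "theta2": 300},
}

worst_case = {act: max(loss[act].values()) for act in loss}
minimax_action = min(worst_case, key=worst_case.get)

print(worst_case)      # {'a1': 1000, 'a2': 300}
print(minimax_action)  # a2
```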
(3) Invariance Principle:
If two problems have identical formal structures (i.e., the same sample
space, parameter space, density, and loss function), the same decision
rule should be obtained based on the invariance principle.
Example 6:
X: the decay time of a certain atomic particle (in seconds).
Let X be exponentially distributed with mean θ,
	f(x|θ) = (1/θ) e^{-x/θ},  0 < x < ∞.
Suppose we want to estimate the mean θ. Thus, a sensible loss function is
	L(θ, a) = (θ - a)²/θ² = (1 - a/θ)².
Suppose
Y: the decay time of the same atomic particle (in minutes).
Then,
	Y = X/60,  f(y|η) = (1/η) e^{-y/η},  0 < y < ∞,  η = θ/60.
Thus,
	L(η, a*) = (1 - a*/η)² = (1 - (a/60)/(θ/60))² = (1 - a/θ)² = L(θ, a),
where a* = a/60.
Let
	δ(X): the decision rule used to estimate θ
and
	δ*(Y): the decision rule used to estimate η.
Since the two problems have identical formal structures, the invariance principle gives δ* = δ and
	δ(X)/60 = δ*(Y) = δ(X/60),  i.e.,  δ(X) = 60 δ(X/60).
The above argument holds for any transformation of the form Y = cX, c > 0, based on the invariance principle. Then,
	δ(X) = (1/c) δ(cX).
Taking c = 1/X gives
	δ(X) = X δ(1) = KX,  where K = δ(1).
Thus, δ(X) = KX is the decision rule based on the invariance principle.
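The conclusion can be checked by Monte Carlo: under the scale-invariant loss L(θ, a) = (1 - a/θ)², the rule δ(X) = KX has a risk that does not depend on θ, because X/θ ~ Exp(1). The values θ = 1, θ = 60, and K = 0.5 below are illustrative assumptions.

```python
# Monte Carlo check: under L(theta, a) = (1 - a/theta)^2, the rule
# delta(X) = K*X has risk free of theta, since X/theta ~ Exp(1).
# ASSUMPTION: theta = 1, theta = 60 and K = 0.5 are illustrative choices.

import random

random.seed(0)

def risk(theta, K, n=200_000):
    """Monte Carlo estimate of R(theta, K*X) = E_theta[(1 - K*X/theta)^2]."""
    total = 0.0
    for _ in range(n):
        x = random.expovariate(1.0 / theta)  # exponential with mean theta
        total += (1.0 - K * x / theta) ** 2
    return total / n

K = 0.5
r_theta1 = risk(1.0, K)
r_theta60 = risk(60.0, K)
print(r_theta1, r_theta60)  # both ~ 0.5, i.e. 1 - 2K + 2K^2
```

Analytically, with Z = X/θ ~ Exp(1), the risk is E[(1 - KZ)²] = 1 - 2K + 2K² for every θ, consistent with δ(X) = KX being a constant-risk (invariant) rule.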