1.4 Decision Principles
The principles used to select a sensible decision are:
(a) Conditional Bayes Decision Principle
(b) Frequentist Decision Principle.
(a) Conditional Bayes Decision Principle:
Choose an action $a \in \mathcal{A}$ minimizing the posterior expected loss $\rho(\pi, a)$. Such an $a$ will be called a Bayes action and will be denoted $a^{\pi}$.
Example 4 (continued):
Let $\mathcal{A} = \{a_1, a_2\}$, $\pi(\theta_1) = 0.99$, $\pi(\theta_2) = 0.01$. Thus,
$$\rho(\pi, a_1) = -485, \qquad \rho(\pi, a_2) = -294.$$
Therefore, $a^{\pi} = a_1$.
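As a quick sketch, the Bayes action can be computed directly from the prior and a loss table. The loss matrix below is an assumption: it is not restated in this section, but it is chosen to be consistent with the values $\rho(\pi,a_1) = -485$, $\rho(\pi,a_2) = -294$ and with the suprema $1000$ and $300$ used later in the minimax example.

```python
# No-data Bayes action for Example 4.
# Loss matrix is an ASSUMPTION consistent with the stated expected losses.
prior = {"theta1": 0.99, "theta2": 0.01}
loss = {
    ("theta1", "a1"): -500, ("theta2", "a1"): 1000,
    ("theta1", "a2"): -300, ("theta2", "a2"): 300,
}

def expected_loss(action):
    """rho(pi, a) = sum over theta of L(theta, a) * pi(theta)."""
    return sum(loss[(t, action)] * p for t, p in prior.items())

rho = {a: expected_loss(a) for a in ("a1", "a2")}
bayes_action = min(rho, key=rho.get)
print(rho)           # {'a1': -485.0, 'a2': -294.0}
print(bayes_action)  # a1
```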
(b)
Frequentist Decision Principle:
The three most important frequentist decision principles are:
Bayes risk principle
Minimax principle
Invariance principle
(1) Bayes Risk Principle:
Let $\mathcal{D}$ be the class of decision rules. Then, for $\delta_1, \delta_2 \in \mathcal{D}$, a decision rule $\delta_1$ is preferred to a rule $\delta_2$ based on the Bayes risk principle if
$$r(\pi, \delta_1) < r(\pi, \delta_2).$$
A decision rule minimizing $r(\pi, \delta)$ among all decision rules in the class $\mathcal{D}$ is called a Bayes rule and will be denoted as $\delta^{\pi}$. The quantity
$$r(\pi) = r(\pi, \delta^{\pi})$$
is called the Bayes risk for $\pi$.
Example 5 (continued):
$X \sim N(\theta, 1)$, $\theta \sim N(0, \tau^2)$, $\mathcal{D} = \{\delta_c(X) = cX : c \text{ is any constant}\}$. Then,
$$
\begin{aligned}
R(\theta, \delta_c) &= E_\theta\left[(cX - \theta)^2\right]
= E_\theta\left[(cX - c\theta + c\theta - \theta)^2\right]\\
&= c^2 E_\theta\left[(X - \theta)^2\right] + 2c(c-1)\theta\, E_\theta\left[X - \theta\right] + (c-1)^2\theta^2\\
&= c^2 \operatorname{Var}(X) + (c-1)^2\theta^2\\
&= c^2 + (c-1)^2\theta^2
\end{aligned}
$$
and
$$r(\pi, \delta_c) = E^{\pi}\left[R(\theta, \delta_c)\right] = c^2 + (c-1)^2 E^{\pi}\left[\theta^2\right] = c^2 + (c-1)^2\tau^2.$$
Note that $r(\pi, \delta_c)$ is a function of $c$ alone. $r(\pi, \delta_c)$ attains its minimum at
$$c = \frac{\tau^2}{1+\tau^2}$$
(let $f(c) = c^2 + (c-1)^2\tau^2$; then $f'(c) = 2c + 2(c-1)\tau^2 = 0 \;\Rightarrow\; c = \frac{\tau^2}{1+\tau^2}$).
Thus,
$$\delta^{\pi}(X) = \frac{\tau^2}{1+\tau^2}\,X$$
is the Bayes estimator.
In addition,
$$
r(\pi) = r(\pi, \delta^{\pi})
= \left(\frac{\tau^2}{1+\tau^2}\right)^2 + \left(\frac{\tau^2}{1+\tau^2} - 1\right)^2 \tau^2
= \frac{\tau^4}{(1+\tau^2)^2} + \frac{\tau^2}{(1+\tau^2)^2}
= \frac{\tau^2(\tau^2 + 1)}{(1+\tau^2)^2}
= \frac{\tau^2}{1+\tau^2}.
$$
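The minimization above can be checked numerically. The sketch below evaluates $r(\pi,\delta_c) = c^2 + (c-1)^2\tau^2$ on a grid and confirms that the minimizer and the minimum agree with $\tau^2/(1+\tau^2)$; the value $\tau^2 = 2$ is an arbitrary illustrative choice.

```python
# Grid check for Example 5: r(pi, delta_c) = c^2 + (c-1)^2 * tau^2.
tau2 = 2.0  # tau^2; arbitrary illustrative value

def bayes_risk(c, tau2):
    return c**2 + (c - 1)**2 * tau2

grid = [i / 10000 for i in range(10001)]  # c in [0, 1]
c_star = min(grid, key=lambda c: bayes_risk(c, tau2))
print(c_star)                               # ~ tau2 / (1 + tau2) = 0.6667
print(bayes_risk(tau2 / (1 + tau2), tau2))  # ~ tau2 / (1 + tau2)
```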
Example 4 (continued):
Let $\mathcal{D} = \{a_1, a_2\}$, $\pi(\theta_1) = 0.99$, $\pi(\theta_2) = 0.01$. Then,
$$r(\pi, a_1) = E^{\pi}\left[R(\theta, a_1)\right] = E^{\pi}\left[L(\theta, a_1)\right] = -485$$
and
$$r(\pi, a_2) = E^{\pi}\left[R(\theta, a_2)\right] = E^{\pi}\left[L(\theta, a_2)\right] = -294.$$
Thus, $a_1$ is the Bayes rule.
Note:
In a no-data problem, the Bayes risk (frequentist principle) is
equivalent to the Bayes expected loss (conditional Bayes principle).
Further, the Bayes risk principle will give the same answer as the
conditional Bayes decision principle.
Definition:
Let $X = (X_1, X_2, \ldots, X_n)$ have the probability distribution function (or probability density function) $f(x \mid \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)$ with prior density $\pi(\theta)$ and prior cumulative distribution $F(\theta)$, respectively. Then, the marginal density or distribution of $X = (X_1, X_2, \ldots, X_n)$ is
$$
m(x) = m(x_1, x_2, \ldots, x_n) = \int f(x \mid \theta)\, dF(\theta) =
\begin{cases}
\displaystyle\int f(x \mid \theta)\, \pi(\theta)\, d\theta & \text{(continuous case)}\\[6pt]
\displaystyle\sum_{\theta} f(x \mid \theta)\, \pi(\theta) & \text{(discrete case)}.
\end{cases}
$$
The posterior density or distribution of $\theta$ given $x$ is
$$\pi(\theta \mid x) = \pi(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}.$$
The posterior expectation of $g(\theta)$ given $x$ is
$$
E^{\pi(\theta \mid x)}\left[g(\theta)\right] =
\begin{cases}
\displaystyle\int g(\theta)\,\pi(\theta \mid x)\, d\theta = \frac{\int g(\theta)\, f(x \mid \theta)\,\pi(\theta)\, d\theta}{m(x)} & \text{(continuous case)}\\[10pt]
\displaystyle\sum_{\theta} g(\theta)\,\pi(\theta \mid x) = \frac{\sum_{\theta} g(\theta)\, f(x \mid \theta)\,\pi(\theta)}{m(x)} & \text{(discrete case)}.
\end{cases}
$$
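The definition above can be illustrated with a small discrete-prior computation: the posterior is the likelihood times the prior, normalized by the marginal $m(x)$. The model below ($X \sim N(\theta,1)$ with a three-point uniform prior) is an illustrative assumption, not part of the examples in the notes.

```python
import math

# Posterior over a discrete prior: pi(theta | x) = f(x|theta) pi(theta) / m(x).
# Illustrative model: X ~ N(theta, 1), prior uniform on {-1, 0, 1}.
thetas = [-1.0, 0.0, 1.0]
prior = {t: 1 / 3 for t in thetas}

def likelihood(x, theta):
    """N(theta, 1) density at x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

def posterior(x):
    m_x = sum(likelihood(x, t) * prior[t] for t in thetas)  # marginal m(x)
    return {t: likelihood(x, t) * prior[t] / m_x for t in thetas}

post = posterior(0.8)
post_mean = sum(t * p for t, p in post.items())  # posterior expectation of theta
print(post)       # mass concentrates near theta = 1
print(post_mean)
```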
Very Important Result:
Let $X = (X_1, X_2, \ldots, X_n)$ have the probability distribution function (or probability density function) $f(x \mid \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)$ with prior density $\pi(\theta)$ and prior cumulative distribution $F(\theta)$, respectively. Suppose the following two assumptions hold:
(a) There exists an estimator $\delta_0$ with finite Bayes risk.
(b) For almost all $x$, there exists a value $\delta^{\pi}(x)$ minimizing
$$
E^{\pi(\theta \mid x)}\left[L(\theta, \delta(x))\right] =
\begin{cases}
\displaystyle\int L(\theta, \delta(x))\,\pi(\theta \mid x)\, d\theta & \text{(continuous case)}\\[6pt]
\displaystyle\sum_{\theta} L(\theta, \delta(x))\,\pi(\theta \mid x) & \text{(discrete case)}.
\end{cases}
$$
Then,
(a) if $L(\theta, a) = (a - g(\theta))^2$, then
$$\delta^{\pi}(x) = E^{\pi(\theta \mid x)}\left[g(\theta)\right] =
\begin{cases}
\displaystyle\int g(\theta)\,\pi(\theta \mid x)\, d\theta\\[6pt]
\displaystyle\sum_{\theta} g(\theta)\,\pi(\theta \mid x)
\end{cases}$$
and, more generally, if $L(\theta, a) = w(\theta)\,(a - g(\theta))^2$, then
$$\delta^{\pi}(x) = \frac{E^{\pi(\theta \mid x)}\left[w(\theta)\, g(\theta)\right]}{E^{\pi(\theta \mid x)}\left[w(\theta)\right]} =
\begin{cases}
\dfrac{\int w(\theta)\, g(\theta)\,\pi(\theta \mid x)\, d\theta}{\int w(\theta)\,\pi(\theta \mid x)\, d\theta}\\[10pt]
\dfrac{\sum_{\theta} w(\theta)\, g(\theta)\,\pi(\theta \mid x)}{\sum_{\theta} w(\theta)\,\pi(\theta \mid x)}
\end{cases}$$
(b) if $L(\theta, a) = |\theta - a|$, then $\delta^{\pi}(x)$ is the median of the posterior density or distribution $\pi(\theta \mid x)$ of $\theta$ given $x$. Further, if
$$L(\theta, a) =
\begin{cases}
k_0(\theta - a), & \theta - a \ge 0\\
k_1(a - \theta), & \theta - a < 0,
\end{cases}$$
then $\delta^{\pi}(x)$ is the $\dfrac{k_0}{k_0 + k_1}$ percentile of the posterior density or distribution $\pi(\theta \mid x)$ of $\theta$ given $x$.
(c) if
$$L(\theta, a) =
\begin{cases}
0 & \text{when } |\theta - a| \le c\\
1 & \text{when } |\theta - a| > c,
\end{cases}$$
then $\delta^{\pi}(x)$ is the midpoint of the interval $I$ of length $2c$ which maximizes
$$P(\theta \in I \mid x) =
\begin{cases}
\displaystyle\int_{I} \pi(\theta \mid x)\, d\theta\\[6pt]
\displaystyle\sum_{\theta \in I} \pi(\theta \mid x).
\end{cases}$$
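Parts (a) and (b) can be sanity-checked numerically: on a discrete posterior, a grid search over actions should recover the posterior mean under squared-error loss and the posterior median under absolute-error loss. The three-point posterior below is an arbitrary illustrative choice.

```python
# Numerical check of parts (a) and (b) on an arbitrary discrete posterior.
post = {0.0: 0.2, 1.0: 0.5, 3.0: 0.3}

def exp_loss(a, loss):
    """Posterior expected loss E[L(theta, a) | x]."""
    return sum(loss(t, a) * p for t, p in post.items())

grid = [i / 1000 for i in range(-1000, 4001)]  # candidate actions in [-1, 4]
a_sq = min(grid, key=lambda a: exp_loss(a, lambda t, b: (b - t) ** 2))
a_abs = min(grid, key=lambda a: exp_loss(a, lambda t, b: abs(t - b)))

post_mean = sum(t * p for t, p in post.items())  # 0*0.2 + 1*0.5 + 3*0.3 = 1.4
print(a_sq)   # ~1.4, the posterior mean
print(a_abs)  # 1.0, the posterior median (P(theta <= 1 | x) = 0.7 >= 0.5)
```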
[Outline of proof]
(a)
$$
\begin{aligned}
\rho(\pi, a) &= E^{\pi(\theta \mid x)}\left[L(\theta, a)\right] = E^{\pi(\theta \mid x)}\left[w(\theta)\,(a - g(\theta))^2\right]\\
&= E^{\pi(\theta \mid x)}\left[w(\theta)\left(a^2 - 2a\,g(\theta) + g(\theta)^2\right)\right]\\
&= a^2\, E^{\pi(\theta \mid x)}\left[w(\theta)\right] - 2a\, E^{\pi(\theta \mid x)}\left[g(\theta)\,w(\theta)\right] + E^{\pi(\theta \mid x)}\left[g(\theta)^2\,w(\theta)\right].
\end{aligned}
$$
Thus
$$\frac{\partial \rho(\pi, a)}{\partial a} = 2a\, E^{\pi(\theta \mid x)}\left[w(\theta)\right] - 2\, E^{\pi(\theta \mid x)}\left[g(\theta)\,w(\theta)\right] = 0
\;\Longrightarrow\;
a = \frac{E^{\pi(\theta \mid x)}\left[g(\theta)\,w(\theta)\right]}{E^{\pi(\theta \mid x)}\left[w(\theta)\right]}.$$
(b)
Without loss of generality, assume $m$ is the median of $\pi(\theta \mid x)$. We want to prove
$$\rho(\pi, m) - \rho(\pi, a) = E^{\pi(\theta \mid x)}\left[L(\theta, m) - L(\theta, a)\right] \le 0$$
for $a > m$ (the case $a < m$ is analogous). Since, for $a > m$,
$$L(\theta, m) - L(\theta, a) = |\theta - m| - |\theta - a| =
\begin{cases}
m - a, & \theta \le m\\
2\theta - m - a \;(\le a - m), & m < \theta \le a\\
a - m, & \theta > a,
\end{cases}$$
then
$$
\begin{aligned}
E^{\pi(\theta \mid x)}\left[L(\theta, m) - L(\theta, a)\right]
&= (m - a)\, P(\theta \le m \mid x) + E^{\pi(\theta \mid x)}\left[(2\theta - m - a)\, 1_{\{m < \theta \le a\}}\right] + (a - m)\, P(\theta > a \mid x)\\
&\le (m - a)\, P(\theta \le m \mid x) + (a - m)\, P(m < \theta \le a \mid x) + (a - m)\, P(\theta > a \mid x)\\
&= (m - a)\, P(\theta \le m \mid x) + (a - m)\, P(\theta > m \mid x)\\
&\le -\frac{a - m}{2} + \frac{a - m}{2} = 0,
\end{aligned}
$$
where the last inequality uses $P(\theta \le m \mid x) \ge \tfrac{1}{2}$ and $P(\theta > m \mid x) \le \tfrac{1}{2}$.
[Intuition of the above proof:]
[Figure: three points $a_1 < a_2 < a_3$ on a line, together with candidate points $c_1 < a_1$, $a_2 < c_2 < a_3$, and $c_3 > a_3$.]
We want to find a point $c$ such that $\sum_{i=1}^{3} |c - a_i|$ achieves its minimum. At $c = a_2$,
$$\sum_{i=1}^{3} |a_2 - a_i| = (a_2 - a_1) + (a_2 - a_2) + (a_3 - a_2) = a_3 - a_1.$$
At $c = c_1$,
$$\sum_{i=1}^{3} |c_1 - a_i| \ge |c_1 - a_3| = a_3 - c_1 > a_3 - a_1.$$
At $c = c_2$,
$$\sum_{i=1}^{3} |c_2 - a_i| = (c_2 - a_1) + (c_2 - a_2) + (a_3 - c_2) = (a_3 - a_1) + (c_2 - a_2) > a_3 - a_1.$$
At $c = c_3$,
$$\sum_{i=1}^{3} |c_3 - a_i| \ge c_3 - a_1 > a_3 - a_1.$$
Therefore, at $c = a_2$ (the median of the three points), $\sum_{i=1}^{3} |c - a_i|$ achieves its minimum.
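The intuition above can be sketched in a few lines: minimize the sum of absolute deviations over a grid and confirm the minimizer is the middle point. The three values are arbitrary illustrative choices.

```python
# The sum of absolute deviations over a1 < a2 < a3 is minimized at a2.
points = [1.0, 4.0, 9.0]

def total_abs_dev(c):
    return sum(abs(c - a) for a in points)

grid = [i / 100 for i in range(0, 1001)]  # c in [0, 10]
c_star = min(grid, key=total_abs_dev)
print(c_star)                  # 4.0 (= a2, the median)
print(total_abs_dev(c_star))   # 8.0 (= a3 - a1)
```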
(2) Minimax Principle:
A decision rule $\delta_1$ is preferred to a rule $\delta_2$ based on the minimax principle if
$$\sup_{\theta} R(\theta, \delta_1) < \sup_{\theta} R(\theta, \delta_2).$$
A decision rule $\delta^{M}$ minimizing $\sup_{\theta} R(\theta, \delta)$ among all decision rules in the class $\mathcal{D}$ is called a minimax decision rule, i.e.,
$$\sup_{\theta} R(\theta, \delta^{M}) = \inf_{\delta \in \mathcal{D}} \sup_{\theta} R(\theta, \delta).$$
Example 5 (continued):
$\mathcal{D} = \{\delta_c(X) = cX : c \text{ is any constant}\}$ and $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$. Thus,
$$\sup_{\theta} R(\theta, \delta_c) = \sup_{\theta}\left[c^2 + (c-1)^2\theta^2\right] =
\begin{cases}
1 & \text{if } c = 1\\
\infty & \text{if } c \ne 1.
\end{cases}$$
Therefore,
$$\delta^{M}(X) = \delta_1(X) = X$$
is the minimax decision rule.
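The supremum computation can be illustrated numerically: evaluating $R(\theta,\delta_c)$ over a wide grid of $\theta$ values shows the risk stays bounded (at $1$) only for $c = 1$ and grows like $\theta^2$ otherwise. The grid is of course only an assumption standing in for the supremum over all $\theta$.

```python
# Sketch of the minimax comparison in Example 5.
def risk(theta, c):
    """R(theta, delta_c) = c^2 + (c-1)^2 * theta^2."""
    return c**2 + (c - 1) ** 2 * theta**2

thetas = [i * 10.0 for i in range(101)]  # theta in [0, 1000]; proxy for sup

def sup_risk(c):
    return max(risk(t, c) for t in thetas)

print(sup_risk(1.0))  # 1.0 -> bounded, so delta^M(X) = X
print(sup_risk(0.9))  # large: grows without bound as theta^2
```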
Example 4 (continued):
$\mathcal{D} = \{a_1, a_2\}$. Then,
$$\sup_{\theta} R(\theta, a_1) = \sup_{\theta} L(\theta, a_1) = 1000$$
and
$$\sup_{\theta} R(\theta, a_2) = \sup_{\theta} L(\theta, a_2) = 300.$$
Thus, $\delta^{M} = a_2$.
(3) Invariance Principle:
If two problems have identical formal structures (i.e., the same sample
space, parameter space, density, and loss function), the same decision
rule should be obtained based on the invariance principle.
Example 6:
$X$: the decay time of a certain atomic particle (in seconds).
Let $X$ be exponentially distributed with mean $\theta$,
$$f(x \mid \theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad 0 < x < \infty.$$
Suppose we want to estimate the mean $\theta$. Thus, a sensible loss function is
$$L(\theta, a) = \left(1 - \frac{a}{\theta}\right)^2 = \frac{(a - \theta)^2}{\theta^2}.$$
Suppose
$Y$: the decay time of a certain atomic particle (in minutes).
Then,
$$Y = \frac{X}{60}, \qquad f(y \mid \eta) = \frac{1}{\eta}\, e^{-y/\eta}, \quad 0 < y < \infty, \qquad \eta = \frac{\theta}{60}.$$
Thus,
$$L(\eta, a^*) = \left(1 - \frac{a^*}{\eta}\right)^2 = \left(1 - \frac{a/60}{\theta/60}\right)^2 = \left(1 - \frac{a}{\theta}\right)^2 = L(\theta, a),$$
where $a^* = a/60$.
Let
$\delta(X)$: the decision rule used to estimate $\theta$, and
$\delta^*(Y)$: the decision rule used to estimate $\eta$.
Since the two problems have identical formal structures, the invariance principle requires $\delta^* = \delta$, and since $\eta = \theta/60$, the estimates must satisfy
$$\delta^*(Y) = \delta\!\left(\frac{X}{60}\right) = \frac{\delta(X)}{60}
\;\Longrightarrow\;
\delta(X) = 60\,\delta\!\left(\frac{X}{60}\right).$$
The above argument holds for any transformation of the form $Y = cX$, $c > 0$, based on the invariance principle. Then,
$$\delta(cX) = c\,\delta(X) \text{ for all } c > 0
\;\Longrightarrow\;
\left(\text{taking } c = \frac{1}{X}\right)\;
\delta(1) = \frac{\delta(X)}{X}
\;\Longrightarrow\;
\delta(X) = \delta(1)\, X = KX, \quad K = \delta(1).$$
Thus, $\delta(X) = KX$ is the decision rule based on the invariance principle.
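A minimal check of the conclusion: rules of the form $\delta(X) = KX$ are scale equivariant, so measuring the decay time in minutes instead of seconds rescales the estimate consistently, $\delta(cX) = c\,\delta(X)$. The constant $K = 1$ below is illustrative only; the invariance argument fixes the form $KX$ but not the value of $K$.

```python
# Scale equivariance of delta(X) = K * X (Example 6).
K = 1.0  # illustrative; invariance determines only the form K * X

def delta(x):
    return K * x

x_seconds = 120.0
x_minutes = x_seconds / 60.0  # same measurement in minutes
print(delta(x_seconds))        # estimate of theta (in seconds)
print(60 * delta(x_minutes))   # identical after converting back to seconds
```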