1.4 Decision Principles

The principles used to select a sensible decision are:

(a) Conditional Bayes Decision Principle
(b) Frequentist Decision Principles

(a) Conditional Bayes Decision Principle: Choose an action $a \in \mathcal{A}$ which minimizes the Bayes expected loss $\rho(\pi, a)$. Such an $a$ will be called a Bayes action and will be denoted $a^{\pi}$.

Example 4 (continued): Let $\mathcal{A} = \{a_1, a_2\}$, $\pi(\theta_1) = 0.99$, $\pi(\theta_2) = 0.01$. Thus,
\[
\rho(\pi, a_1) = 0.99\,L(\theta_1, a_1) + 0.01\,L(\theta_2, a_1) = 0.99(-500) + 0.01(1000) = -485
\]
and
\[
\rho(\pi, a_2) = 0.99\,L(\theta_1, a_2) + 0.01\,L(\theta_2, a_2) = 0.99(-300) + 0.01(300) = -294 .
\]
Since $\rho(\pi, a_1) < \rho(\pi, a_2)$, we obtain $a^{\pi} = a_1$.

(b) Frequentist Decision Principles: The three most important frequentist decision principles are:

- Bayes risk principle
- Minimax principle
- Invariance principle

(1) Bayes Risk Principle: Let $D$ be the class of decision rules. Then, for rules $\delta_1, \delta_2 \in D$, the rule $\delta_1$ is preferred to $\delta_2$ based on the Bayes risk principle if
\[
r(\pi, \delta_1) < r(\pi, \delta_2) .
\]
A decision rule minimizing $r(\pi, \delta)$ among all decision rules in the class $D$ is called a Bayes rule and will be denoted $\delta^{\pi}$. The quantity $r(\pi) = r(\pi, \delta^{\pi})$ is called the Bayes risk for $\pi$.

Example 5 (continued): $X \sim N(\theta, 1)$, $\theta \sim N(0, \tau^2)$, $D = \{\delta_c : \delta_c(x) = cx,\ c \text{ is any constant}\}$. Then
\[
R(\theta, \delta_c) = E_\theta\big[(cX - \theta)^2\big]
= \operatorname{Var}(cX) + \big(E_\theta[cX] - \theta\big)^2
= c^2 \operatorname{Var}(X) + (c-1)^2 \theta^2
= c^2 + (c-1)^2 \theta^2
\]
and
\[
r(\pi, \delta_c) = E^{\pi}\big[R(\theta, \delta_c)\big]
= c^2 + (c-1)^2 E^{\pi}[\theta^2]
= c^2 + (c-1)^2 \tau^2 .
\]
Note that $r(\pi, \delta_c)$ is a function of $c$ alone. It attains its minimum at
\[
c = \frac{\tau^2}{1 + \tau^2} ,
\]
since with $f(c) = c^2 + (c-1)^2 \tau^2$, setting $f'(c) = 2c + 2(c-1)\tau^2 = 0$ gives $c = \tau^2/(1+\tau^2)$. Thus
\[
\delta^{\pi}(X) = \frac{\tau^2}{1 + \tau^2}\, X
\]
is the Bayes estimator. In addition,
\[
r(\pi) = r(\pi, \delta^{\pi})
= \left(\frac{\tau^2}{1+\tau^2}\right)^2 + \left(\frac{\tau^2}{1+\tau^2} - 1\right)^2 \tau^2
= \frac{\tau^4}{(1+\tau^2)^2} + \frac{\tau^2}{(1+\tau^2)^2}
= \frac{\tau^2}{1+\tau^2} .
\]

Example 4 (continued): Let $D = \{a_1, a_2\}$, $\pi(\theta_1) = 0.99$, $\pi(\theta_2) = 0.01$. Then,
\[
r(\pi, a_1) = E^{\pi}\big[R(\theta, a_1)\big] = E^{\pi}\big[L(\theta, a_1)\big] = -485
\]
and
\[
r(\pi, a_2) = E^{\pi}\big[R(\theta, a_2)\big] = E^{\pi}\big[L(\theta, a_2)\big] = -294 .
\]
Thus, $a_1$ is the Bayes rule.

Note: In a no-data problem, the Bayes risk (frequentist principle) is equivalent to the Bayes expected loss (conditional Bayes principle). Further, the Bayes risk principle will give the same answer as the conditional Bayes decision principle.

Definition: Let $X = (X_1, X_2, \ldots, X_n)$ have the probability distribution function (or probability density function) $f(x \mid \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)$, with prior density $\pi(\theta)$ and prior cumulative distribution function $F^{\pi}(\theta)$. Then, the marginal density (or distribution) of $X = (X_1, X_2, \ldots, X_n)$ is
\[
m(x) = m(x_1, x_2, \ldots, x_n) = \int f(x \mid \theta)\, dF^{\pi}(\theta)
= \begin{cases}
\displaystyle \int f(x \mid \theta)\, \pi(\theta)\, d\theta & \text{(continuous case)} \\[1ex]
\displaystyle \sum_{\theta} f(x \mid \theta)\, \pi(\theta) & \text{(discrete case).}
\end{cases}
\]
The posterior density (or distribution) of $\theta$ given $x$ is
\[
\pi(\theta \mid x) = \pi(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x \mid \theta)\, \pi(\theta)}{m(x)} .
\]
The posterior expectation of $g(\theta)$ given $x$ is
\[
E^{\pi(\theta \mid x)}\big[g(\theta)\big] = \int g(\theta)\, \pi(\theta \mid x)\, d\theta
= \frac{\int g(\theta)\, f(x \mid \theta)\, \pi(\theta)\, d\theta}{m(x)} .
\]

Very Important Result: Let $X = (X_1, X_2, \ldots, X_n)$ have the probability distribution function (or probability density function) $f(x \mid \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)$, with prior density $\pi(\theta)$ and prior cumulative distribution function $F^{\pi}(\theta)$. Suppose the following two assumptions hold:

(a) There exists an estimator $\delta_0$ with finite Bayes risk.
(b) For almost all $x$, there exists a value $\delta^{\pi}(x)$ minimizing
\[
E^{\pi(\theta \mid x)}\big[L\big(\theta, \delta^{\pi}(x)\big)\big] = \int L\big(\theta, \delta^{\pi}(x)\big)\, \pi(\theta \mid x)\, d\theta .
\]

Then:

(a) If $L(\theta, a) = \big(a - g(\theta)\big)^2$, then
\[
\delta^{\pi}(x) = E^{\pi(\theta \mid x)}\big[g(\theta)\big] = \int g(\theta)\, \pi(\theta \mid x)\, d\theta ,
\]
and, more generally, if $L(\theta, a) = w(\theta)\big(a - g(\theta)\big)^2$, then
\[
\delta^{\pi}(x) = \frac{E^{\pi(\theta \mid x)}\big[w(\theta)\, g(\theta)\big]}{E^{\pi(\theta \mid x)}\big[w(\theta)\big]}
= \frac{\int w(\theta)\, g(\theta)\, \pi(\theta \mid x)\, d\theta}{\int w(\theta)\, \pi(\theta \mid x)\, d\theta} .
\]

(b) If $L(\theta, a) = |\theta - a|$, then $\delta^{\pi}(x)$ is the median of the posterior density (or distribution) $\pi(\theta \mid x)$ of $\theta$ given $x$. Further, if
\[
L(\theta, a) = \begin{cases}
k_0 (\theta - a), & \theta - a \ge 0 \\
k_1 (a - \theta), & \theta - a < 0 ,
\end{cases}
\]
then $\delta^{\pi}(x)$ is the $k_0/(k_0 + k_1)$ percentile of the posterior density (or distribution) $\pi(\theta \mid x)$ of $\theta$ given $x$.

(c) If
\[
L(\theta, a) = \begin{cases}
0, & |\theta - a| \le c \\
1, & |\theta - a| > c ,
\end{cases}
\]
then $\delta^{\pi}(x)$ is the midpoint of the interval $I$ of length $2c$ which maximizes
\[
P(\theta \in I \mid x) = \int_I \pi(\theta \mid x)\, d\theta .
\]
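Before the proof outline, here is a small numerical check of parts (a) and (b), written as a minimal Python sketch (not part of the original notes). It uses the normal-normal model of Example 5, where $\theta \mid x \sim N\big(\tau^2 x/(1+\tau^2),\ \tau^2/(1+\tau^2)\big)$; the values of $\tau^2$ and $x$ and the helper name `bayes_action` are assumptions chosen for illustration.

```python
# Minimal sketch: numerically minimize a -> E[L(theta, a) | x] over a,
# then compare with the closed forms predicted by the result above.
# Model (Example 5): X ~ N(theta, 1), theta ~ N(0, tau^2), so that
# theta | x ~ N(tau^2 x / (1 + tau^2), tau^2 / (1 + tau^2)).
import numpy as np
from scipy import optimize, stats

tau2, x = 4.0, 1.5                         # assumed values for illustration
post_mean = tau2 * x / (1.0 + tau2)        # posterior mean = 1.2
post_sd = np.sqrt(tau2 / (1.0 + tau2))     # posterior s.d. = sqrt(0.8)
posterior = stats.norm(post_mean, post_sd)

rng = np.random.default_rng(0)
theta = rng.normal(post_mean, post_sd, size=200_000)   # posterior draws

def bayes_action(loss):
    """Minimize a -> Monte Carlo estimate of E[L(theta, a) | x]."""
    res = optimize.minimize_scalar(lambda a: loss(theta, a).mean(),
                                   bounds=(-10.0, 10.0), method="bounded")
    return res.x

# (a) squared-error loss: the Bayes action should be the posterior mean.
print(bayes_action(lambda t, a: (a - t) ** 2), "vs", post_mean)

# (b) absolute-error loss: the Bayes action should be the posterior median.
print(bayes_action(lambda t, a: np.abs(t - a)), "vs", posterior.ppf(0.5))

# (b) asymmetric linear loss, k0 = 3, k1 = 1: the k0/(k0+k1) = 0.75 quantile.
k0, k1 = 3.0, 1.0
def loss_b(t, a):
    return np.where(t - a >= 0, k0 * (t - a), k1 * (a - t))
print(bayes_action(loss_b), "vs", posterior.ppf(k0 / (k0 + k1)))
```

In each case the numerically minimized action agrees (up to Monte Carlo error) with the posterior mean, median, or quantile given by the result.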
[Outline of proof]

(a) With $L(\theta, a) = w(\theta)\big(a - g(\theta)\big)^2$,
\[
\rho(\pi, a) = E^{\pi(\theta \mid x)}\big[L(\theta, a)\big]
= E^{\pi(\theta \mid x)}\big[w(\theta)\big(a^2 - 2 a\, g(\theta) + g^2(\theta)\big)\big]
= a^2 E^{\pi(\theta \mid x)}\big[w(\theta)\big] - 2a\, E^{\pi(\theta \mid x)}\big[g(\theta)\, w(\theta)\big] + E^{\pi(\theta \mid x)}\big[g^2(\theta)\, w(\theta)\big] .
\]
Thus
\[
\frac{\partial \rho(\pi, a)}{\partial a} = 2a\, E^{\pi(\theta \mid x)}\big[w(\theta)\big] - 2\, E^{\pi(\theta \mid x)}\big[g(\theta)\, w(\theta)\big] = 0
\quad \Longrightarrow \quad
a = \frac{E^{\pi(\theta \mid x)}\big[g(\theta)\, w(\theta)\big]}{E^{\pi(\theta \mid x)}\big[w(\theta)\big]} .
\]

(b) Without loss of generality, assume $m$ is the median of $\pi(\theta \mid x)$. We want to prove
\[
\rho(\pi, m) - \rho(\pi, a) = E^{\pi(\theta \mid x)}\big[L(\theta, m) - L(\theta, a)\big] \le 0, \quad \text{for } a > m .
\]
Since, for $a > m$,
\[
L(\theta, m) - L(\theta, a) = |\theta - m| - |\theta - a|
= \begin{cases}
m - a, & \theta \le m \\
2\theta - m - a \ (\le a - m), & m < \theta \le a \\
a - m, & \theta > a ,
\end{cases}
\]
then
\[
E^{\pi(\theta \mid x)}\big[L(\theta, m) - L(\theta, a)\big]
\le (m - a)\, P(\theta \le m \mid x) + (a - m)\, P(\theta > m \mid x)
\le \frac{m - a}{2} + \frac{a - m}{2} = 0 ,
\]
where the last inequality uses $m - a < 0$, $P(\theta \le m \mid x) \ge 1/2$, and $P(\theta > m \mid x) \le 1/2$ (both properties of the median). The case $a < m$ is analogous.

[Intuition of the above proof:]

(Figure: points $a_1 < a_2 < a_3$ on a line, together with trial points $c_1 < a_1$, $a_1 \le c_2 \le a_3$, and $c_3 > a_3$.)

We want to find a point $c$ such that $\sum_{i=1}^{3} |c - a_i|$ achieves its minimum. At $c = a_2$,
\[
\sum_{i=1}^{3} |a_2 - a_i| = (a_2 - a_1) + 0 + (a_3 - a_2) = a_3 - a_1 .
\]
At $c = c_1 < a_1$,
\[
\sum_{i=1}^{3} |c_1 - a_i| = (a_1 - c_1) + (a_2 - c_1) + (a_3 - c_1) > a_3 - a_1 .
\]
At $c = c_2$ with $a_1 \le c_2 \le a_3$,
\[
\sum_{i=1}^{3} |c_2 - a_i| = (c_2 - a_1) + |c_2 - a_2| + (a_3 - c_2) = (a_3 - a_1) + |c_2 - a_2| \ge a_3 - a_1 .
\]
At $c = c_3 > a_3$,
\[
\sum_{i=1}^{3} |c_3 - a_i| = (c_3 - a_1) + (c_3 - a_2) + (c_3 - a_3) > a_3 - a_1 .
\]
Therefore, $\sum_{i=1}^{3} |c - a_i|$ achieves its minimum at $c = a_2$, the median of the three points.

(2) Minimax Principle: A decision rule $\delta_1$ is preferred to a rule $\delta_2$ based on the minimax principle if
\[
\sup_{\theta} R(\theta, \delta_1) < \sup_{\theta} R(\theta, \delta_2) .
\]
A decision rule $\delta^M$ minimizing $\sup_{\theta} R(\theta, \delta)$ among all decision rules in the class $D$ is called a minimax decision rule, i.e.,
\[
\sup_{\theta} R(\theta, \delta^M) = \inf_{\delta \in D} \sup_{\theta} R(\theta, \delta) .
\]

Example 5 (continued): $D = \{\delta_c : \delta_c(x) = cx,\ c \text{ is any constant}\}$ and $R(\theta, \delta_c) = c^2 + (c-1)^2 \theta^2$. Thus,
\[
\sup_{\theta} R(\theta, \delta_c) = \sup_{\theta} \big[c^2 + (c-1)^2 \theta^2\big]
= \begin{cases}
1, & c = 1 \\
\infty, & c \ne 1 .
\end{cases}
\]
Therefore, $\delta^M(X) = \delta_1(X) = X$ is the minimax decision rule.

Example 4 (continued): $D = \{a_1, a_2\}$. Then,
\[
\sup_{\theta} R(\theta, a_1) = \sup_{\theta} L(\theta, a_1) = 1000
\quad \text{and} \quad
\sup_{\theta} R(\theta, a_2) = \sup_{\theta} L(\theta, a_2) = 300 .
\]
Thus, $\delta^M = a_2$.

(3) Invariance Principle: If two problems have identical formal structures (i.e., the same sample space, parameter space, density, and loss function), the same decision rule should be obtained based on the invariance principle.

Example 6: Let $X$ be the decay time of a certain atomic particle (in seconds), exponentially distributed with mean $\theta$:
\[
f(x \mid \theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad 0 < x < \infty,\ \theta > 0 .
\]
Suppose we want to estimate the mean $\theta$. A sensible loss function is
\[
L(\theta, a) = \left(1 - \frac{a}{\theta}\right)^2 = \frac{(a - \theta)^2}{\theta^2} .
\]
Let $Y$ be the decay time of the same atomic particle measured in minutes. Then
\[
Y = \frac{X}{60}, \qquad f(y \mid \eta) = \frac{1}{\eta}\, e^{-y/\eta}, \quad 0 < y < \infty ,
\]
where $\eta = \theta / 60$. Thus,
\[
L(\eta, a^*) = \left(1 - \frac{a^*}{\eta}\right)^2 = \left(1 - \frac{a/60}{\theta/60}\right)^2 = L(\theta, a) ,
\]
where $a^* = a/60$. Let $\delta_X$ be the decision rule used to estimate $\theta$ and $\delta_Y$ the decision rule used to estimate $\eta$. The two problems have identical formal structures, so based on the invariance principle $\delta_X$ and $\delta_Y$ should be the same rule, and since $\eta = \theta/60$,
\[
\delta_X(x) = 60\, \delta_Y(y) = 60\, \delta_X\!\left(\frac{x}{60}\right) .
\]
The above argument holds for any transformation of the form $Y = cX$, $c > 0$, based on the invariance principle. Then,
\[
\delta_X(x) = \frac{1}{c}\, \delta_X(cx) \quad \text{for all } c > 0 ,
\]
and taking $x = 1$ with $k = \delta_X(1)$ gives $\delta_X(c) = kc$ for every $c > 0$. Thus, $\delta(X) = kX$ is the form of decision rule obtained from the invariance principle.
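As a closing check (a sketch, not part of the notes): for invariant rules $\delta(X) = kX$ in Example 6, the risk does not depend on $\theta$, since $X/\theta \sim \mathrm{Exp}(1)$ has first and second moments $1$ and $2$, so $R(\theta, kX) = E_\theta\big[(1 - kX/\theta)^2\big] = 1 - 2k + 2k^2$. The short Monte Carlo sketch below (the values of $k$ and $\theta$ are arbitrary choices for illustration) verifies this constancy across scales, including the seconds-versus-minutes change of units.

```python
# Monte Carlo sketch: the risk of the invariant rule delta(X) = k*X under
# L(theta, a) = (1 - a/theta)^2 is constant in theta, equal to 1 - 2k + 2k^2.
import numpy as np

rng = np.random.default_rng(1)

def risk(k, theta, n=1_000_000):
    """Monte Carlo estimate of R(theta, kX) = E[(1 - k*X/theta)^2]."""
    x = rng.exponential(scale=theta, size=n)   # X ~ Exp(mean theta)
    return np.mean((1.0 - k * x / theta) ** 2)

k = 0.8
for theta in (0.5, 1.0, 60.0):        # e.g. the seconds vs. minutes scales
    print(theta, risk(k, theta))      # all close to 1 - 2*k + 2*k**2 = 0.68
```

Since $1 - 2k + 2k^2$ is minimized at $k = 1/2$, the best rule within this invariant class would be $\delta(X) = X/2$, with constant risk $1/2$; the notes themselves stop at the general form $\delta(X) = kX$.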