2.1 Posterior Distribution

I. Discrete case:

Motivating example 1: Let $\theta = 0$ denote "no AIDS" and $\theta = 1$ denote "AIDS", with prior probabilities $\pi(0) = 0.99$ and $\pi(1) = 0.01$. Let
$$X = \begin{cases} 1: & \text{positive (indicates AIDS)} \\ 0: & \text{negative (indicates no AIDS)} \end{cases}$$
From past experience and records, we know
$$f(0 \mid 0) = 0.97, \quad f(1 \mid 0) = 0.03, \quad f(0 \mid 1) = 0.02, \quad f(1 \mid 1) = 0.98.$$
Objective: find $f(0 \mid 1)$, the conditional probability that a patient really has no AIDS given that the test indicates AIDS.

Note: high $f(0 \mid 0)$ and $f(1 \mid 1)$ imply that the test is accurate in judging whether a patient is infected with AIDS.

Bayes's Theorem (two events):
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B \mid A)}{P(A)P(B \mid A) + P(A^c)P(B \mid A^c)}.$$

[Derivation of Bayes's theorem (two events)]:

[Venn diagram: the sample space is partitioned into $A$ and $A^c$; the event $B$ overlaps both, splitting into $B \cap A$ and $B \cap A^c$.]

We want to know $P(A \mid B) = P(B \cap A)/P(B)$. Since $P(B \cap A) = P(A)P(B \mid A)$, and
$$P(B) = P(B \cap A) + P(B \cap A^c) = P(A)P(B \mid A) + P(A^c)P(B \mid A^c),$$
thus,
$$P(A \mid B) = \frac{P(B \cap A)}{P(B)} = \frac{P(B \cap A)}{P(B \cap A) + P(B \cap A^c)} = \frac{P(A)P(B \mid A)}{P(A)P(B \mid A) + P(A^c)P(B \mid A^c)}.$$

Similarly, if the parameter $\theta$ takes two values $\theta_0$ and $\theta_1$ and the data $X$ takes values $c_1, c_2, \ldots, c_n, \ldots$, then let $\{\theta = \theta_0\} \leftrightarrow A$, $\{\theta = \theta_1\} \leftrightarrow A^c$, and $\{X = c_k\} \leftrightarrow B$. Since
$$P(X = c_k, \theta = \theta_0) = P(\theta = \theta_0)P(X = c_k \mid \theta = \theta_0) = \pi(\theta_0) f(c_k \mid \theta_0),$$
and
$$P(X = c_k) = P(X = c_k, \theta = \theta_0) + P(X = c_k, \theta = \theta_1) = \pi(\theta_0) f(c_k \mid \theta_0) + \pi(\theta_1) f(c_k \mid \theta_1),$$
thus,
$$f(\theta_0 \mid c_k) = P(\theta = \theta_0 \mid X = c_k) = \frac{P(X = c_k, \theta = \theta_0)}{P(X = c_k)} = \frac{\pi(\theta_0) f(c_k \mid \theta_0)}{\pi(\theta_0) f(c_k \mid \theta_0) + \pi(\theta_1) f(c_k \mid \theta_1)}.$$

Motivating example 1 (continued): By Bayes's theorem,
$$f(0 \mid 1) = P(\theta = 0 \mid X = 1) = \frac{\pi(0) f(1 \mid 0)}{\pi(0) f(1 \mid 0) + \pi(1) f(1 \mid 1)} = \frac{0.99 \times 0.03}{0.99 \times 0.03 + 0.01 \times 0.98} \approx 0.7519.$$
A patient with a positive test still has a high probability (0.7519) of not having AIDS.

Motivating example 2: Let
$\theta_1$: the finance of the company being good,
$\theta_2$: the finance of the company being O.K.,
$\theta_3$: the finance of the company being bad,
and
$X = 1$: good finance assessment for the company,
$X = 2$: O.K. finance assessment for the company,
$X = 3$: bad finance assessment for the company.
From the past records, we know
$$\pi(\theta_1) = 0.5, \quad \pi(\theta_2) = 0.2, \quad \pi(\theta_3) = 0.3, \quad f(1 \mid \theta_1) = 0.9, \quad f(1 \mid \theta_2) = 0.05, \quad f(1 \mid \theta_3) = 0.05.$$
That is, we know the chances of the different finance situations of the company and the conditional probabilities of the different assessments given the true finance situation of the company; for example, $f(1 \mid \theta_1) = 0.9$ indicates a 90% chance that a good finance year of the company is predicted correctly by the finance assessment. Our objective is to obtain $f(\theta_1 \mid 1)$, i.e., the conditional probability that the finance of the company will be good in the coming year given a good finance assessment for the company this year.

Bayes's Theorem (general): Let $A_1, A_2, \ldots, A_n$ be mutually exclusive events with $A_1 \cup A_2 \cup \cdots \cup A_n = S$. Then
$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)P(B \mid A_i)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \cdots + P(A_n)P(B \mid A_n)}, \quad i = 1, 2, \ldots, n.$$

[Derivation of Bayes's theorem (general)]:

[Venn diagram: the sample space is partitioned into $A_1, A_2, \ldots, A_n$; the event $B$ splits into $B \cap A_1, B \cap A_2, \ldots, B \cap A_n$.]

Since $P(B \cap A_i) = P(A_i)P(B \mid A_i)$, and
$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n) = P(A_1)P(B \mid A_1) + \cdots + P(A_n)P(B \mid A_n),$$
thus,
$$P(A_i \mid B) = \frac{P(B \cap A_i)}{P(B)} = \frac{P(A_i)P(B \mid A_i)}{P(A_1)P(B \mid A_1) + \cdots + P(A_n)P(B \mid A_n)}.$$

Similarly, if the parameter $\theta$ takes $n$ values $\theta_1, \theta_2, \ldots, \theta_n$ and the data $X$ takes values $c_1, c_2, \ldots, c_n, \ldots$, then let $\{\theta = \theta_1\} \leftrightarrow A_1$, $\{\theta = \theta_2\} \leftrightarrow A_2$, $\ldots$, $\{\theta = \theta_n\} \leftrightarrow A_n$, and $\{X = c_k\} \leftrightarrow B$. Then
$$f(\theta_i \mid c_k) = P(\theta = \theta_i \mid X = c_k) = \frac{P(X = c_k, \theta = \theta_i)}{P(X = c_k)} = \frac{\pi(\theta_i) f(c_k \mid \theta_i)}{\pi(\theta_1) f(c_k \mid \theta_1) + \pi(\theta_2) f(c_k \mid \theta_2) + \cdots + \pi(\theta_n) f(c_k \mid \theta_n)}.$$
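The discrete posterior formula is easy to check numerically. Below is a minimal Python sketch (the function `discrete_posterior` and its argument names are our own, purely illustrative) that reproduces the AIDS calculation above and previews the Example 2 calculation completed below.

```python
# Minimal sketch: discrete Bayes' theorem. f(theta_i | x) is proportional to
# pi(theta_i) * f(x | theta_i), normalized over all parameter values.

def discrete_posterior(prior, likelihood):
    """prior: {theta: pi(theta)}; likelihood: {theta: f(x | theta)} for the
    observed x. Returns {theta: f(theta | x)}."""
    joint = {t: prior[t] * likelihood[t] for t in prior}  # pi(theta) f(x | theta)
    marginal = sum(joint.values())                        # m(x)
    return {t: joint[t] / marginal for t in joint}

# Motivating example 1: probability of "no AIDS" given a positive test.
post1 = discrete_posterior(prior={0: 0.99, 1: 0.01},
                           likelihood={0: 0.03, 1: 0.98})       # f(X=1 | theta)
print(round(post1[0], 4))   # 0.7519

# Motivating example 2: probability of good finance given a good assessment.
post2 = discrete_posterior(prior={1: 0.5, 2: 0.2, 3: 0.3},
                           likelihood={1: 0.9, 2: 0.05, 3: 0.05})  # f(X=1 | theta)
print(round(post2[1], 4))   # 0.9474 (about 0.95)
```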
Example 2 (continued): By the general Bayes's theorem,
$$f(\theta_1 \mid 1) = \frac{\pi(\theta_1) f(1 \mid \theta_1)}{\pi(\theta_1) f(1 \mid \theta_1) + \pi(\theta_2) f(1 \mid \theta_2) + \pi(\theta_3) f(1 \mid \theta_3)} = \frac{0.5 \times 0.9}{0.5 \times 0.9 + 0.2 \times 0.05 + 0.3 \times 0.05} \approx 0.95.$$
A company with a good finance assessment has a very high probability (about 0.95) of a good finance situation in the coming year.

II. Continuous case:

Let $h(x, \theta) = f(x \mid \theta)\pi(\theta)$ be the joint density, and let
$$m(x) = \int f(x \mid \theta)\pi(\theta)\, d\theta = \int h(x, \theta)\, d\theta$$
be the marginal density. Thus, the posterior is
$$f(\theta \mid x) = \frac{h(x, \theta)}{m(x)} = \frac{f(x \mid \theta)\pi(\theta)}{m(x)} \propto l(\theta \mid x)\pi(\theta).$$

Example 1: Let $X_i \sim \text{Poisson}(\theta)$, $i = 1, 2, \ldots, n$, and $\theta \sim \text{gamma}(\alpha, \beta)$,
$$\pi(\theta) = \frac{\theta^{\alpha-1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^\alpha}.$$
Then
$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n \frac{e^{-\theta}\theta^{x_i}}{x_i!} = \frac{e^{-n\theta}\theta^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!}.$$
Thus,
$$f(\theta \mid x_1, \ldots, x_n) \propto f(x_1, \ldots, x_n \mid \theta)\pi(\theta) \propto e^{-n\theta}\theta^{\sum_{i=1}^n x_i}\, \theta^{\alpha-1} e^{-\theta/\beta} = \theta^{\sum_{i=1}^n x_i + \alpha - 1}\, e^{-\theta(n + 1/\beta)}$$
$$\sim \text{gamma}\left(\sum_{i=1}^n x_i + \alpha, \; \frac{1}{n + 1/\beta}\right).$$

Note: the MLE (maximum likelihood estimate) based on $f(x_1, \ldots, x_n \mid \theta) = l(\theta \mid x_1, \ldots, x_n)$ is $\hat{\theta} = \bar{x} = \sum_{i=1}^n x_i / n$, while the posterior mean (Bayes estimate under square loss function) is
$$\frac{\sum_{i=1}^n x_i + \alpha}{n + 1/\beta} = \frac{n}{n + 1/\beta}\,\bar{x} + \frac{1/\beta}{n + 1/\beta}\,\alpha\beta.$$
The posterior mean incorporates the information obtained from the data (sample mean, $\bar{x}$) with the information obtained from the prior (prior mean, $\alpha\beta$).

Example 2: Recall: $X \sim b(n, p)$,
$$f(x \mid p) = \binom{n}{x} p^x (1-p)^{n-x},$$
and $p \sim \text{beta}(a, b)$,
$$\pi(p) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, p^{a-1}(1-p)^{b-1}.$$
Then
$$f(p \mid x) \sim \text{beta}(x + a, \; n - x + b).$$

Extension: $(X_1, \ldots, X_p) \sim \text{mul}(n, \theta_1, \ldots, \theta_p, \theta_{p+1})$, where $\theta_{p+1} = 1 - \sum_{i=1}^p \theta_i$,
$$f(x_1, \ldots, x_p \mid \theta) = \frac{n!}{x_1! \cdots x_p! \left(n - \sum_{i=1}^p x_i\right)!}\; \theta_1^{x_1} \cdots \theta_p^{x_p} \left(1 - \sum_{i=1}^p \theta_i\right)^{n - \sum_{i=1}^p x_i},$$
and $\theta \sim \text{Dirichlet}(a_1, \ldots, a_p, a_{p+1})$,
$$\pi(\theta) = \frac{\Gamma\left(\sum_{i=1}^{p+1} a_i\right)}{\prod_{i=1}^{p+1} \Gamma(a_i)}\; \theta_1^{a_1 - 1} \cdots \theta_p^{a_p - 1} \left(1 - \sum_{i=1}^p \theta_i\right)^{a_{p+1} - 1}.$$
Then
$$f(\theta \mid x_1, \ldots, x_p) \propto f(x_1, \ldots, x_p \mid \theta)\pi(\theta) \propto \theta_1^{x_1 + a_1 - 1} \cdots \theta_p^{x_p + a_p - 1} \left(1 - \sum_{i=1}^p \theta_i\right)^{n - \sum_{i=1}^p x_i + a_{p+1} - 1}$$
$$\sim \text{Dirichlet}\left(x_1 + a_1, \ldots, x_p + a_p, \; n - \sum_{i=1}^p x_i + a_{p+1}\right).$$

Note: the mean of $X_i$ is $n\theta_i$, and thus the MLE for $\theta_i$ based on $f(x_1, \ldots, x_p \mid \theta) = l(\theta \mid x_1, \ldots, x_p)$ is $x_i/n$, while the posterior mean (Bayes estimate under square loss function) is
$$\frac{x_i + a_i}{\sum_{k=1}^p (x_k + a_k) + n - \sum_{k=1}^p x_k + a_{p+1}} = \frac{x_i + a_i}{n + \sum_{k=1}^{p+1} a_k}.$$
The posterior mean incorporates the information obtained from the data (MLE, $x_i/n$) with the information obtained from the prior (prior mean, $a_i / \sum_{k=1}^{p+1} a_k$).

Example 3: Let $X_1, \ldots, X_n \sim N(\theta, 1)$,
$$f(x_1, \ldots, x_n \mid \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2},$$
and $\theta \sim N(\mu, 1)$. Then
$$f(\theta \mid x_1, \ldots, x_n) \propto f(x_1, \ldots, x_n \mid \theta)\pi(\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(\theta - \mu)^2}{2}}$$
$$\propto \exp\left\{-\frac{1}{2}\left[(n+1)\theta^2 - 2\theta\left(\sum_{i=1}^n x_i + \mu\right)\right]\right\} \propto \exp\left\{-\frac{n+1}{2}\left(\theta - \frac{\sum_{i=1}^n x_i + \mu}{n+1}\right)^2\right\}$$
$$\sim N\left(\frac{\sum_{i=1}^n x_i + \mu}{n+1}, \; \frac{1}{n+1}\right).$$

Note: the MLE (maximum likelihood estimate) based on $f(x_1, \ldots, x_n \mid \theta) = l(\theta \mid x_1, \ldots, x_n)$ is $\hat{\theta} = \bar{x} = \sum_{i=1}^n x_i / n$, while the posterior mean (Bayes estimate under square loss function) is
$$\frac{\sum_{i=1}^n x_i + \mu}{n+1} = \frac{n}{n+1}\,\bar{x} + \frac{1}{n+1}\,\mu.$$
The posterior mean incorporates the information obtained from the data (sample mean, $\bar{x}$) with the information obtained from the prior (prior mean, $\mu$). The variance of $\bar{x}$ is $1/n$ and the variance of the prior is $1$. Intuitively, the variation of the prior is $n$ times that of $\bar{x}$. Therefore, we put more weight on the more stable estimate $\bar{x}$.

A Useful Result: Let $T(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for the parameter $\theta$ with density $g(t \mid \theta)$. If $T(x_1, x_2, \ldots, x_n) = t$, then
$$f(\theta \mid x_1, \ldots, x_n) = f(\theta \mid t) = \frac{g(t \mid \theta)\pi(\theta)}{m(t)},$$
where $m(t)$ is the marginal density of $T(X_1, X_2, \ldots, X_n)$.

Example 3 (continued):
$$f(x_1, \ldots, x_n \mid \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2} = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\left(n\theta^2 - 2\theta \sum_{i=1}^n x_i + \sum_{i=1}^n x_i^2\right)}$$
$$= e^{-\frac{n}{2}(\theta - \bar{x})^2} \cdot \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right)}, \quad t(x) = \bar{x}.$$
By the factorization theorem, $t(x) = \bar{x}$ is a sufficient statistic.
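As a quick numerical check on these closed-form updates, here is a short Python sketch (the function names and the sample values are our own, purely illustrative). It computes the Poisson-gamma and normal-normal posteriors and confirms that the posterior mean is a weighted average of the sample mean and the prior mean, as derived above.

```python
# Minimal sketch of the closed-form conjugate updates derived above.

def poisson_gamma_update(xs, alpha, beta):
    """Poisson data, gamma(alpha, beta) prior (mean alpha*beta).
    Posterior: gamma(sum(xs) + alpha, 1 / (n + 1/beta))."""
    shape = sum(xs) + alpha
    scale = 1.0 / (len(xs) + 1.0 / beta)
    return shape, scale

def normal_update(xs, mu):
    """N(theta, 1) data, N(mu, 1) prior.
    Posterior: N((sum(xs) + mu) / (n + 1), 1 / (n + 1))."""
    n = len(xs)
    return (sum(xs) + mu) / (n + 1), 1.0 / (n + 1)

xs = [3, 5, 4, 6, 2]                      # hypothetical Poisson counts
shape, scale = poisson_gamma_update(xs, alpha=2.0, beta=1.0)
post_mean = shape * scale                 # mean of gamma(shape, scale)
n, xbar, prior_mean = len(xs), sum(xs) / len(xs), 2.0 * 1.0
w = n / (n + 1.0)                         # weight on the data when beta = 1
print(post_mean, w * xbar + (1 - w) * prior_mean)   # both 3.6666...

mu_post, var_post = normal_update([0.5, 1.2, -0.3], mu=0.0)
print(mu_post, var_post)                  # 0.35, 0.25
```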
By the above useful result, since $\bar{x} \sim N(\theta, 1/n)$,
$$f(\theta \mid x_1, \ldots, x_n) = f(\theta \mid \bar{x}) \sim N\left(\frac{n\bar{x} + \mu}{n+1}, \; \frac{1}{n+1}\right).$$

Definition (Conjugate family): Let $F$ denote the class of density functions $f(x \mid \theta)$. A class $P$ of prior distributions is said to be a conjugate family for $F$ if the posterior $f(\theta \mid x)$ is in the class $P$ for all $f(x \mid \theta) \in F$ and all priors $\pi \in P$.

A Useful Result: Let
$$f(x \mid \theta) = h(x)\, c(\theta)\, e^{\theta x}.$$
If the prior of $\theta$ is $\pi(\theta) = k(\alpha, \beta)\,[c(\theta)]^\beta e^{\theta\alpha}$, then the posterior is
$$f(\theta \mid x) = k(\alpha + x, \beta + 1)\,[c(\theta)]^{\beta+1} e^{\theta(\alpha + x)}.$$

[proof:]
$$f(\theta \mid x) \propto f(x \mid \theta)\pi(\theta) = h(x)\, c(\theta)\, e^{\theta x} \cdot k(\alpha, \beta)\,[c(\theta)]^\beta e^{\theta\alpha} \propto [c(\theta)]^{\beta+1} e^{\theta(x + \alpha)}.$$
Since
$$\int \pi(\theta)\, d\theta = 1 \;\Longrightarrow\; k(\alpha, \beta) = \left[\int [c(\theta)]^\beta e^{\theta\alpha}\, d\theta\right]^{-1},$$
thus
$$k(\alpha + x, \beta + 1) = \left[\int [c(\theta)]^{\beta+1} e^{\theta(\alpha + x)}\, d\theta\right]^{-1}.$$
Therefore,
$$f(\theta \mid x) = \frac{[c(\theta)]^{\beta+1} e^{\theta(x + \alpha)}}{\int [c(\theta)]^{\beta+1} e^{\theta(x + \alpha)}\, d\theta} = k(\alpha + x, \beta + 1)\,[c(\theta)]^{\beta+1} e^{\theta(\alpha + x)}.$$

Note: Some commonly used conjugate families are the following: normal-normal, Poisson-gamma, normal-gamma, binomial-beta, multinomial-Dirichlet, gamma-gamma, and negative binomial-beta.
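To see the conjugate-family property in action, the following Python sketch (all names are our own, purely illustrative) checks the binomial-beta pair from Example 2 numerically: it compares the closed-form $\text{beta}(x + a, n - x + b)$ posterior density against a brute-force grid normalization of likelihood times prior.

```python
# Minimal sketch: verify the binomial-beta conjugate update numerically.
import math

def beta_pdf(p, a, b):
    """Density of beta(a, b) at p."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

def binom_pmf(x, n, p):
    """Probability of x successes under b(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, x, a, b = 20, 7, 2.0, 3.0                    # hypothetical data and prior
grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoint grid on (0, 1)

# Unnormalized posterior on the grid: f(x | p) * pi(p).
unnorm = [binom_pmf(x, n, p) * beta_pdf(p, a, b) for p in grid]
m_x = sum(unnorm) / 1000                        # numerical marginal m(x)

# The grid posterior matches beta(x + a, n - x + b) pointwise.
p0 = grid[300]
print(unnorm[300] / m_x)                        # numerical posterior at p0
print(beta_pdf(p0, x + a, n - x + b))           # closed-form posterior at p0
```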