2.1 Posterior Distribution
I. Discrete case:

Motivating example 1:
Let $\theta$ indicate the true disease status,
$$\theta = \begin{cases} 0, & \text{no AIDS} \\ 1, & \text{AIDS} \end{cases} \qquad \pi(0) = 0.99, \quad \pi(1) = 0.01,$$
and let $X$ be the test result,
$$X = \begin{cases} 1, & \text{positive (indicates AIDS)} \\ 0, & \text{negative (indicates no AIDS)}. \end{cases}$$
From past experience and records, we know
$$f(0 \mid \theta = 0) = 0.97, \quad f(1 \mid \theta = 0) = 0.03, \quad f(0 \mid \theta = 1) = 0.02, \quad f(1 \mid \theta = 1) = 0.98.$$
Objective: find $f(0 \mid 1)$, the conditional probability that a patient really has no AIDS given that the test indicates AIDS.
Note: high $f(0 \mid 0)$ and $f(1 \mid 1)$ imply the test is accurate in judging infection with AIDS.
Bayes’s Theorem (two events):
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c)}.$$
[Derivation of Bayes’s theorem (two events)]:
(Figure: the event $B$ partitioned into $B \cap A$ and $B \cap A^c$ by $A$ and $A^c$.)
We want to know $P(A \mid B) = \dfrac{P(B \cap A)}{P(B)}$. Since
$$P(B \cap A) = P(A)\,P(B \mid A),$$
and
$$P(B) = P(B \cap A) + P(B \cap A^c) = P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c),$$
thus
$$P(A \mid B) = \frac{P(B \cap A)}{P(B \cap A) + P(B \cap A^c)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c)}.$$
Similarly, if the parameter $\theta$ takes two values $\theta_0$ and $\theta_1$ and the data $X$ takes values $c_1, c_2, \ldots, c_n, \ldots$, then let
$$\{\theta = \theta_0\} = A, \qquad \{\theta = \theta_1\} = A^c, \qquad \{X = c_k\} = B.$$
(Figure: the event $\{X = c_k\}$ partitioned into $\{X = c_k\} \cap \{\theta = \theta_0\}$ and $\{X = c_k\} \cap \{\theta = \theta_1\}$.)
Since
$$P(\{X = c_k\} \cap \{\theta = \theta_0\}) = P(\theta = \theta_0)\,P(X = c_k \mid \theta = \theta_0) = \pi(\theta_0)\,f(c_k \mid \theta_0),$$
and
$$P(X = c_k) = P(\{X = c_k\} \cap \{\theta = \theta_0\}) + P(\{X = c_k\} \cap \{\theta = \theta_1\}) = \pi(\theta_0)\,f(c_k \mid \theta_0) + \pi(\theta_1)\,f(c_k \mid \theta_1),$$
thus
$$f(\theta_0 \mid c_k) = P(\theta = \theta_0 \mid X = c_k) = \frac{P(\{X = c_k\} \cap \{\theta = \theta_0\})}{P(X = c_k)} = \frac{\pi(\theta_0)\,f(c_k \mid \theta_0)}{\pi(\theta_0)\,f(c_k \mid \theta_0) + \pi(\theta_1)\,f(c_k \mid \theta_1)}.$$
Motivating example 1 (continued):
By Bayes’s theorem,
$$f(0 \mid 1) = P(\theta = 0 \mid X = 1) = \frac{\pi(0)\,f(1 \mid \theta = 0)}{\pi(0)\,f(1 \mid \theta = 0) + \pi(1)\,f(1 \mid \theta = 1)} = \frac{0.99 \times 0.03}{0.99 \times 0.03 + 0.01 \times 0.98} \approx 0.7519.$$
A patient with a positive test still has a high probability (0.7519) of not having AIDS.
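A minimal numerical check of this computation, as a sketch in Python (the prior and likelihood values are exactly the ones given above):

# Posterior probability of no AIDS given a positive test, via Bayes' theorem.
prior = {0: 0.99, 1: 0.01}                 # pi(theta)
likelihood = {(0, 0): 0.97, (1, 0): 0.03,  # likelihood[(x, theta)] = f(x | theta)
              (0, 1): 0.02, (1, 1): 0.98}

x = 1                                                          # positive test
marginal = sum(prior[t] * likelihood[(x, t)] for t in prior)   # P(X = 1)
posterior_no_aids = prior[0] * likelihood[(x, 0)] / marginal
print(round(posterior_no_aids, 4))                             # 0.7519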
Motivating example 2:
Let
$\theta_1$: the finance of the company is good;
$\theta_2$: the finance of the company is O.K.;
$\theta_3$: the finance of the company is bad;
and
$X = 1$: good finance assessment for the company;
$X = 2$: O.K. finance assessment for the company;
$X = 3$: bad finance assessment for the company.
From the past records, we know
$$\pi(\theta_1) = 0.5, \quad \pi(\theta_2) = 0.2, \quad \pi(\theta_3) = 0.3,$$
$$f(1 \mid \theta_1) = 0.9, \quad f(1 \mid \theta_2) = 0.05, \quad f(1 \mid \theta_3) = 0.05.$$
That is, we know the chances of the different finance situations of the company and the conditional probabilities of the different assessments given the true finance of the company. For example, $f(1 \mid \theta_1) = 0.9$ indicates that a good finance year has a 90% chance of being predicted correctly by the finance assessment.
Our objective is to obtain $f(\theta_1 \mid 1)$, i.e., the conditional probability that the finance of the company will be good in the coming year given a good finance assessment this year.
Bayes’s Theorem (general):
Let $A_1, A_2, \ldots, A_n$ be mutually exclusive events with $A_1 \cup A_2 \cup \cdots \cup A_n = S$. Then
$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n)}, \qquad i = 1, 2, \ldots, n.$$
[Derivation of Bayes’s theorem (general)]:
(Figure: the event $B$ partitioned into $B \cap A_1,\, B \cap A_2,\, \ldots,\, B \cap A_n$.)
Since
$$P(B \cap A_i) = P(A_i)\,P(B \mid A_i),$$
and
$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n) = P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n),$$
thus
$$P(A_i \mid B) = \frac{P(B \cap A_i)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n)}.$$
Similarly, if the parameter $\theta$ takes $n$ values $\theta_1, \theta_2, \ldots, \theta_n$ and the data $X$ takes values $c_1, c_2, \ldots, c_n, \ldots$, then let
$$\{\theta = \theta_1\} = A_1, \quad \{\theta = \theta_2\} = A_2, \quad \ldots, \quad \{\theta = \theta_n\} = A_n, \quad \text{and} \quad \{X = c_k\} = B.$$
Then,
$$f(\theta_i \mid c_k) = P(\theta = \theta_i \mid X = c_k) = \frac{P(\{X = c_k\} \cap \{\theta = \theta_i\})}{P(X = c_k)} = \frac{\pi(\theta_i)\,f(c_k \mid \theta_i)}{\pi(\theta_1)\,f(c_k \mid \theta_1) + \pi(\theta_2)\,f(c_k \mid \theta_2) + \cdots + \pi(\theta_n)\,f(c_k \mid \theta_n)}.$$
(Figure: the event $\{X = c_k\}$ partitioned by $\{\theta = \theta_1\},\, \{\theta = \theta_2\},\, \ldots,\, \{\theta = \theta_n\}$.)
Motivating example 2 (continued):
$$f(\theta_1 \mid 1) = \frac{\pi(\theta_1)\,f(1 \mid \theta_1)}{\pi(\theta_1)\,f(1 \mid \theta_1) + \pi(\theta_2)\,f(1 \mid \theta_2) + \pi(\theta_3)\,f(1 \mid \theta_3)} = \frac{0.5 \times 0.9}{0.5 \times 0.9 + 0.2 \times 0.05 + 0.3 \times 0.05} \approx 0.95.$$
A company with a good finance assessment has a very high probability (about 0.95) of a good finance situation in the coming year.
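The same computation generalizes directly to any finite number of parameter values; a small sketch in Python (the numbers are those of the finance example):

# General discrete Bayes: posterior over theta_1, ..., theta_n given X = x.
def posterior(prior, likelihood_at_x):
    """prior[i] = pi(theta_i); likelihood_at_x[i] = f(x | theta_i)."""
    joint = [p * l for p, l in zip(prior, likelihood_at_x)]
    m = sum(joint)                    # marginal probability of the observed x
    return [j / m for j in joint]

# Finance example: pi = (0.5, 0.2, 0.3) and f(1 | theta_i) = (0.9, 0.05, 0.05).
print(posterior([0.5, 0.2, 0.3], [0.9, 0.05, 0.05])[0])   # ~0.9474

The first entry is $f(\theta_1 \mid 1) \approx 0.947$, which the notes round to 0.95.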
II. Continuous case:
$$h(x, \theta) = \pi(\theta)\,f(x \mid \theta): \text{ the joint density},$$
$$m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta = \int h(x, \theta)\,d\theta: \text{ the marginal density}.$$
Thus, the posterior is
$$f(\theta \mid x) = \frac{h(x, \theta)}{m(x)} = \frac{\pi(\theta)\,f(x \mid \theta)}{m(x)} \propto \pi(\theta)\,f(x \mid \theta) = \pi(\theta)\,l(\theta \mid x).$$
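A grid-based sketch of the continuous case in Python; the Exp(1) prior and the single Poisson observation $x = 3$ are hypothetical choices, purely for illustration:

import math

# Grid approximation of f(theta | x) = pi(theta) f(x | theta) / m(x).
step = 0.01
grid = [i * step for i in range(1, 1500)]            # theta > 0
prior = [math.exp(-t) for t in grid]                 # hypothetical Exp(1) prior
lik = [t**3 * math.exp(-t) / 6 for t in grid]        # f(x = 3 | theta), Poisson

joint = [p * l for p, l in zip(prior, lik)]          # h(x, theta) on the grid
m = sum(joint) * step                                # m(x) by a Riemann sum
post = [j / m for j in joint]                        # posterior density values
print(sum(post) * step)                              # ~1, a sanity check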
Example 1:
Let
$$X_i \sim \text{Poisson}(\theta), \quad i = 1, 2, \ldots, n,$$
and
$$\pi(\theta) \sim \text{gamma}(\alpha, \beta), \qquad \pi(\theta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, \theta^{\alpha - 1} e^{-\theta/\beta}.$$
Then,
$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} \frac{\theta^{x_i} e^{-\theta}}{x_i!} = \frac{\theta^{\sum_{i=1}^{n} x_i}\, e^{-n\theta}}{\prod_{i=1}^{n} x_i!}.$$
Thus,
$$f(\theta \mid x_1, \ldots, x_n) \propto \pi(\theta)\,f(x_1, \ldots, x_n \mid \theta) = \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}} \cdot \frac{\theta^{\sum_{i=1}^{n} x_i}\, e^{-n\theta}}{\prod_{i=1}^{n} x_i!} \propto \theta^{\sum_{i=1}^{n} x_i + \alpha - 1}\, e^{-\left(n + \frac{1}{\beta}\right)\theta}$$
$$\sim \text{gamma}\!\left(\sum_{i=1}^{n} x_i + \alpha,\ \left(n + \frac{1}{\beta}\right)^{-1}\right).$$
Note: the MLE (maximum likelihood estimate) based on $f(x_1, \ldots, x_n \mid \theta) = l(\theta \mid x_1, \ldots, x_n)$ is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},$$
while the posterior mean (Bayes estimate under squared error loss) is
$$\frac{\sum_{i=1}^{n} x_i + \alpha}{n + \frac{1}{\beta}}.$$
The posterior mean incorporates the information obtained from the data (sample mean, $\bar{x}$) with the information obtained from the prior (prior mean, $\alpha\beta$).
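A small sketch of this conjugate update in Python; the counts and the hyperparameters alpha, beta below are hypothetical:

# Poisson data with a gamma(alpha, beta) prior (shape/scale): the posterior
# is gamma(sum(x) + alpha, 1 / (n + 1/beta)).
x = [3, 5, 4, 2, 6]                    # hypothetical Poisson counts
alpha, beta = 2.0, 1.5                 # hypothetical prior hyperparameters

n, s = len(x), sum(x)
post_mean = (s + alpha) / (n + 1.0 / beta)
print(s / n, post_mean, alpha * beta)  # MLE, posterior mean, prior mean

With these numbers the posterior mean (about 3.88) falls between the MLE (4.0) and the prior mean (3.0), illustrating the weighting described above.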
Example 2:
Recall:
$$X \sim b(n, p), \qquad f(x \mid p) = \binom{n}{x} p^{x} (1 - p)^{n - x},$$
and
$$\pi(p) \sim \text{beta}(a, b), \qquad \pi(p) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, p^{a - 1} (1 - p)^{b - 1}.$$
Then,
$$f(p \mid x) \sim \text{beta}(x + a,\ n - x + b).$$
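A one-line check of this update in Python; the data $n, x$ and the prior parameters $a, b$ are hypothetical:

# Binomial: x successes in n trials with a beta(a, b) prior gives a
# beta(x + a, n - x + b) posterior, whose mean is (x + a) / (n + a + b).
n, x, a, b = 20, 13, 2.0, 2.0
print((x + a) / (n + a + b))   # posterior mean; compare the MLE x / n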
Extension:
$$(X_1, \ldots, X_p) \sim \text{mul}(n, \theta_1, \ldots, \theta_p), \qquad \theta = (\theta_1, \ldots, \theta_p),$$
$$f(x_1, \ldots, x_p \mid \theta) = \frac{n!}{x_1! \cdots x_p!\,\left(n - \sum_{i=1}^{p} x_i\right)!}\; \theta_1^{x_1} \cdots \theta_p^{x_p} \left(1 - \sum_{i=1}^{p} \theta_i\right)^{n - \sum_{i=1}^{p} x_i},$$
and
$$\pi(\theta) \sim \text{Dirichlet}(a_1, \ldots, a_p, a_{p+1}), \qquad \pi(\theta) = \frac{\Gamma\!\left(\sum_{i=1}^{p+1} a_i\right)}{\prod_{i=1}^{p+1} \Gamma(a_i)}\; \theta_1^{a_1 - 1} \cdots \theta_p^{a_p - 1} \left(1 - \sum_{i=1}^{p} \theta_i\right)^{a_{p+1} - 1}.$$
Then,
$$f(\theta \mid x_1, \ldots, x_p) \propto f(x_1, \ldots, x_p \mid \theta)\,\pi(\theta) \propto \theta_1^{x_1} \cdots \theta_p^{x_p} \left(1 - \sum_{i=1}^{p} \theta_i\right)^{n - \sum_{i=1}^{p} x_i} \cdot\, \theta_1^{a_1 - 1} \cdots \theta_p^{a_p - 1} \left(1 - \sum_{i=1}^{p} \theta_i\right)^{a_{p+1} - 1}$$
$$= \theta_1^{x_1 + a_1 - 1} \cdots \theta_p^{x_p + a_p - 1} \left(1 - \sum_{i=1}^{p} \theta_i\right)^{n - \sum_{i=1}^{p} x_i + a_{p+1} - 1} \sim \text{Dirichlet}\!\left(x_1 + a_1, \ldots, x_p + a_p,\ n - \sum_{i=1}^{p} x_i + a_{p+1}\right).$$
Note: the mean of $X_i$ is $n\theta_i$, and thus the MLE (maximum likelihood estimate) for $\theta_i$ based on $f(x_1, \ldots, x_p \mid \theta) = l(\theta \mid x_1, \ldots, x_p)$ is
$$\frac{x_i}{n},$$
while the posterior mean (Bayes estimate under squared error loss) is
$$\frac{x_i + a_i}{\sum_{k=1}^{p} (x_k + a_k) + \left(n - \sum_{k=1}^{p} x_k + a_{p+1}\right)} = \frac{x_i + a_i}{n + \sum_{k=1}^{p+1} a_k}.$$
The posterior mean incorporates the information obtained from the data (MLE, $\dfrac{x_i}{n}$) with the information obtained from the prior (prior mean, $\dfrac{a_i}{\sum_{k=1}^{p+1} a_k}$).
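A sketch of the multinomial-Dirichlet update in Python; the counts and hyperparameters are hypothetical:

# Multinomial counts with a Dirichlet(a_1, ..., a_{p+1}) prior: the
# posterior parameters are the prior parameters plus the cell counts.
x = [12, 7]             # hypothetical counts for the first p = 2 cells
n = 30                  # total trials; the last cell has n - sum(x) = 11
a = [1.0, 2.0, 1.5]     # hypothetical prior parameters a_1, ..., a_{p+1}

counts = x + [n - sum(x)]                        # all p + 1 cell counts
post = [c + ak for c, ak in zip(counts, a)]      # posterior Dirichlet parameters
total = n + sum(a)                               # normalizer n + sum_k a_k

print([xi / n for xi in x])                      # MLE x_i / n
print([pk / total for pk in post])               # posterior means
print([ak / sum(a) for ak in a])                 # prior means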
Example 3:
Let
$$X_1, \ldots, X_n \sim N(\theta, 1), \qquad f(x_1, \ldots, x_n \mid \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\frac{\sum_{i=1}^{n} (x_i - \theta)^2}{2}\right),$$
and $\pi(\theta) \sim N(\mu, 1)$. Then,
$$f(\theta \mid x_1, \ldots, x_n) \propto f(x_1, \ldots, x_n \mid \theta)\,\pi(\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\frac{\sum_{i=1}^{n} (x_i - \theta)^2}{2}\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(\theta - \mu)^2}{2}\right)$$
$$= \left(\frac{1}{\sqrt{2\pi}}\right)^{n+1} \exp\!\left(-\frac{1}{2}\left[n\theta^2 - 2\theta \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2 + \theta^2 - 2\mu\theta + \mu^2\right]\right)$$
$$= \left(\frac{1}{\sqrt{2\pi}}\right)^{n+1} \exp\!\left(-\frac{1}{2}\left[(n+1)\theta^2 - 2\left(\sum_{i=1}^{n} x_i + \mu\right)\theta + \left(\sum_{i=1}^{n} x_i^2 + \mu^2\right)\right]\right)$$
$$= \left(\frac{1}{\sqrt{2\pi}}\right)^{n+1} \exp\!\left(-\frac{n+1}{2}\left[\theta^2 - 2\,\frac{\sum_{i=1}^{n} x_i + \mu}{n+1}\,\theta\right] - \frac{\sum_{i=1}^{n} x_i^2 + \mu^2}{2}\right)$$
$$\propto \exp\!\left(-\frac{n+1}{2}\left(\theta - \frac{\sum_{i=1}^{n} x_i + \mu}{n+1}\right)^2\right) \sim N\!\left(\frac{\sum_{i=1}^{n} x_i + \mu}{n+1},\ \frac{1}{n+1}\right),$$
where completing the square contributes only a constant factor $\exp\!\left(\frac{n+1}{2}\left(\frac{\sum_{i=1}^{n} x_i + \mu}{n+1}\right)^2 - \frac{\sum_{i=1}^{n} x_i^2 + \mu^2}{2}\right)$ not involving $\theta$.
Note: the MLE (maximum likelihood estimate) based on $f(x_1, \ldots, x_n \mid \theta) = l(\theta \mid x_1, \ldots, x_n)$ is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},$$
while the posterior mean (Bayes estimate under squared error loss) is
$$\frac{\sum_{i=1}^{n} x_i + \mu}{n+1} = \left(\frac{n}{n+1}\right)\bar{x} + \left(\frac{1}{n+1}\right)\mu.$$
The posterior mean incorporates the information obtained from the data (sample mean, $\bar{x}$) with the information obtained from the prior (prior mean, $\mu$). The variance of $\bar{x}$ is $1/n$ and the variance of the prior is $1$; intuitively, the variation of the prior is $n$ times that of $\bar{x}$. Therefore, we put more weight on the more stable estimate.
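A sketch of this normal-normal shrinkage in Python; the observations and the prior mean are hypothetical:

# N(theta, 1) data with an N(mu, 1) prior: the posterior is
# N((sum(x) + mu) / (n + 1), 1 / (n + 1)).
x = [1.2, 0.7, 1.9, 1.1]     # hypothetical observations
mu = 0.0                     # hypothetical prior mean

n = len(x)
xbar = sum(x) / n
post_mean = (sum(x) + mu) / (n + 1)   # = (n/(n+1)) * xbar + (1/(n+1)) * mu
print(xbar, post_mean, 1 / (n + 1))   # xbar is shrunk toward mu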
A Useful Result:
Let $T(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for the parameter $\theta$ with density $g(t \mid \theta)$. If $T(x_1, x_2, \ldots, x_n) = t$, then
$$f(\theta \mid x_1, \ldots, x_n) = f(\theta \mid t) = \frac{\pi(\theta)\,g(t \mid \theta)}{m(t)},$$
where $m(t)$ is the marginal density for $T(X_1, X_2, \ldots, X_n)$.
Example 3 (continued):
$$f(x_1, \ldots, x_n \mid \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\frac{\sum_{i=1}^{n} (x_i - \theta)^2}{2}\right) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\frac{1}{2}\left(n\theta^2 - 2\theta \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2\right)\right)$$
$$= \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{n\bar{x}\theta - \frac{n\theta^2}{2}}\, e^{-\frac{\sum_{i=1}^{n} x_i^2}{2}} = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{nt\theta - \frac{n\theta^2}{2}}\, e^{-\frac{\sum_{i=1}^{n} x_i^2}{2}}, \qquad t = \bar{x}.$$
By the factorization theorem, $t = \bar{x}$ is a sufficient statistic. By the above useful result,
$$f(\theta \mid x_1, \ldots, x_n) = f(\theta \mid \bar{x}) \sim N\!\left(\frac{\bar{x} + \frac{\mu}{n}}{1 + \frac{1}{n}},\ \frac{\frac{1}{n}}{1 + \frac{1}{n}}\right) = N\!\left(\frac{\sum_{i=1}^{n} x_i + \mu}{n+1},\ \frac{1}{n+1}\right).$$
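A numerical sketch of this equivalence in Python (grid approximation, hypothetical data): the posterior computed from the full likelihood matches the posterior computed from the density of the sufficient statistic $\bar{x} \sim N(\theta, 1/n)$.

import math

def norm_pdf(z, m, v):
    return math.exp(-(z - m)**2 / (2 * v)) / math.sqrt(2 * math.pi * v)

x = [1.2, 0.7, 1.9, 1.1]; mu = 0.0; n = len(x)    # hypothetical data
grid = [i / 1000 for i in range(-3000, 4001)]     # theta grid

# Unnormalized posterior from the full sample likelihood.
full = [math.exp(-sum((xi - t)**2 for xi in x) / 2) * norm_pdf(t, mu, 1.0)
        for t in grid]
# Unnormalized posterior from the sufficient statistic t = xbar ~ N(theta, 1/n).
xbar = sum(x) / n
suff = [norm_pdf(xbar, t, 1.0 / n) * norm_pdf(t, mu, 1.0) for t in grid]

normalize = lambda w: [wi / sum(w) for wi in w]
print(max(abs(a - b) for a, b in zip(normalize(full), normalize(suff))))  # ~0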
Definition (Conjugate family):
Let $\mathcal{F}$ denote the class of density functions $f(x \mid \theta)$. A class $P$ of prior distributions is said to be a conjugate family for $\mathcal{F}$ if the posterior $f(\theta \mid x)$ is in the class $P$ for all $f(x \mid \theta) \in \mathcal{F}$ and all $\pi \in P$.
A Useful Result:
Let
$$f(x \mid \theta) = h(x)\, e^{\theta x - \psi(\theta)}.$$
If the prior of $\theta$ is $\pi(\theta) = k(\mu, \nu)\, e^{\mu\theta - \nu\psi(\theta)}$, then the posterior is
$$f(\theta \mid x) = k(\mu + x,\ \nu + 1)\, e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)}.$$
[Proof:]
$$f(\theta \mid x) \propto \pi(\theta)\,f(x \mid \theta) = k(\mu, \nu)\, e^{\mu\theta - \nu\psi(\theta)}\, h(x)\, e^{\theta x - \psi(\theta)} = k(\mu, \nu)\, h(x)\, e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)} \propto e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)}.$$
Since
$$\int \pi(\theta)\,d\theta = 1 \;\Rightarrow\; \int k(\mu, \nu)\, e^{\mu\theta - \nu\psi(\theta)}\,d\theta = 1 \;\Rightarrow\; k(\mu, \nu) = \frac{1}{\int e^{\mu\theta - \nu\psi(\theta)}\,d\theta},$$
thus
$$k(\mu + x,\ \nu + 1) = \frac{1}{\int e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)}\,d\theta}.$$
Therefore,
$$f(\theta \mid x) = \frac{e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)}}{\int e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)}\,d\theta} = k(\mu + x,\ \nu + 1)\, e^{(\mu + x)\theta - (\nu + 1)\psi(\theta)}.$$
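As an illustration of this result (a standard example, not taken from the notes above): the Poisson family fits this exponential form after reparameterizing by $\theta = \log\lambda$,
$$f(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{1}{x!}\, e^{x \log\lambda - \lambda} = h(x)\, e^{\theta x - \psi(\theta)}, \qquad h(x) = \frac{1}{x!}, \quad \psi(\theta) = e^{\theta},$$
and a prior $\pi(\theta) \propto e^{\mu\theta - \nu e^{\theta}}$ transforms, under $\lambda = e^{\theta}$ (with Jacobian $d\theta = d\lambda/\lambda$), into $\pi(\lambda) \propto \lambda^{\mu - 1} e^{-\nu\lambda}$, i.e., a gamma prior on $\lambda$. The update $(\mu, \nu) \mapsto (\mu + x, \nu + 1)$ then reproduces the Poisson-gamma posterior of Example 1.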
Note:
Some commonly used conjugate families are the following: Normal-normal, Poisson-gamma, Normal-gamma, Binomial-beta, Multinomial-Dirichlet, Gamma-gamma, and Negative binomial-beta.