Improved Cross Entropy Method For Estimation

Presented by: Alex & Yanna
This presentation is based on the paper
“Improved Cross-Entropy Method for Estimation”
by Dirk P. Kroese & Joshua C. Chan
Rare Events Estimation

We wish to estimate
$\ell = P(S(X) \ge \gamma)$
where $X$ is a random vector taking values in some set $\mathcal{X}$,
and $S$ is a real-valued function on $\mathcal{X}$.
Rare Events Estimation

We can rewrite it as
$\ell = P(S(X) \ge \gamma) = E_f\left[ I\{S(X) \ge \gamma\} \right] = \int I\{S(x) \ge \gamma\}\, f(x)\, dx$
and estimate it with crude Monte Carlo:
$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I\{S(X_i) \ge \gamma\}, \quad X_i \sim f$
Rare Events Estimation

Let's say, for example, that $X = (X_1, \ldots, X_{10})$, $X_i \sim \mathrm{Ber}(0.1)$.

Direct calculation:
$\ell = P\left( \sum_{i=1}^{10} X_i \ge 8 \right) = 3.736 \times 10^{-7}$

Simulated:
$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I\left\{ \sum_{j=1}^{10} X_{ij} \ge 8 \right\} = 0, \quad X_{ij} \sim \mathrm{Ber}(0.1)$
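To see the failure mode concretely, here is a minimal crude Monte Carlo sketch in Python/NumPy (our own illustration, not from the slides): even $10^5$ samples almost never hit the event, so the estimate is typically exactly zero.

```python
# Minimal sketch (not from the paper): crude Monte Carlo on the
# Ber(0.1) example above.  The event {sum >= 8} has probability
# 3.736e-7, so 100,000 samples almost never contain a single hit.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
X = rng.random((N, 10)) < 0.1            # X_ij ~ Ber(0.1)
ell_hat = np.mean(X.sum(axis=1) >= 8)    # crude MC estimate
print(ell_hat)                           # typically 0.0
```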
Rare Events Estimation
SOLUTION…
Importance Sampling
Importance Sampling

$\ell = E_f\left[ I\{S(X) \ge \gamma\} \right] = \int I\{S(x) \ge \gamma\}\, \frac{f(x)}{k(x)}\, k(x)\, dx = E_k\left[ I\{S(X) \ge \gamma\}\, \frac{f(X)}{k(X)} \right]$

And the importance sampling estimator will be
$\hat{\ell}_{IS} = \frac{1}{N} \sum_{i=1}^{N} I\{S(X_i) \ge \gamma\}\, \frac{f(X_i)}{k(X_i)}, \quad X_i \sim k$
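As a minimal sketch (our own illustration), the same Bernoulli example becomes tractable under importance sampling; the per-component proposal $\mathrm{Ber}(0.8)$ is an ad-hoc assumption here, chosen only to show the estimator at work, whereas the CE method below selects the proposal systematically.

```python
# Minimal sketch: importance sampling for P(sum X_i >= 8), X_i ~ Ber(0.1),
# using the ad-hoc proposal k = Ber(0.8) per component (an assumption made
# for illustration only).
import numpy as np

rng = np.random.default_rng(0)
N, n, p, q = 100_000, 10, 0.1, 0.8
X = (rng.random((N, n)) < q).astype(float)    # X_i ~ k
# likelihood ratio f(X)/k(X) for independent Bernoulli coordinates
w = np.prod(np.where(X == 1.0, p / q, (1 - p) / (1 - q)), axis=1)
ell_is = np.mean((X.sum(axis=1) >= 8) * w)
print(ell_is)                                 # close to 3.736e-7
```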
Importance Sampling

What would be a good choice for the importance density $k(x)$?
Importance Sampling

We shall take a look at the Kullback-Leibler divergence:
$D\left( g^*, f(\cdot; v) \right) = \int g^*(x) \log g^*(x)\, dx - \int g^*(x) \log f(x; v)\, dx$
where
$g^*(x) = f(x \mid S(x) \ge \gamma) \propto I\{S(x) \ge \gamma\}\, f(x)$ is the zero-variance density, and
$f(x; v)$ is a density from the parametric family of $f(x)$ with parameter $v$.
Importance Sampling

CE Algorithm

In the article, two problematic issues were mentioned regarding the multilevel CE:
• The parametric family within which the optimal importance density is sought might not be large enough to capture $g^*$.
• When the dimension of the problem is large, the likelihood ratio involved in obtaining $v_T$ becomes unstable.

Solution:
• Sample directly from $g^*$.
Importance Sampling

Our goal is to find

Deterministic version:
$v^* = \arg\max_v \int g^*(x) \log f(x; v)\, dx$

Stochastic version:
$\hat{v}^* = \arg\max_v \sum_{i=1}^{N} \log f(X_i; v), \quad X_i \sim g^*$
Importance Sampling

But how the hell are we supposed to sample from $g^*(x)$???

$g^*(x) = f(x \mid S(x) \ge \gamma) \propto I\{S(x) \ge \gamma\}\, f(x)$

This observation grants us the opportunity to apply the useful tool of Gibbs sampling.
Gibbs Sampler in Brief

• An algorithm to generate a sequence of samples from a joint probability distribution.
• Gibbs sampling is a special case of the Metropolis–Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm.
• Gibbs sampling is applicable when the joint distribution is not known explicitly, but the conditional distribution of each variable is known.
• It can be shown that the sequence of samples constitutes a Markov chain, and the stationary distribution of that Markov chain is just the sought-after joint distribution.
Gibbs Sampler in Brief

The Gibbs sampler algorithm:
Given $X^t = (X_1^t, \ldots, X_n^t)$:
  Generate $Y_1 \sim g^*(x_1 \mid X_2^t, \ldots, X_n^t)$
  Generate $Y_i \sim g^*(x_i \mid Y_1, \ldots, Y_{i-1}, X_{i+1}^t, \ldots, X_n^t)$
  Generate $Y_n \sim g^*(x_n \mid Y_1, \ldots, Y_{n-1})$
Return $X^{t+1} = Y$
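A minimal sketch of one sweep, assuming a user-supplied function `sample_conditional` (a hypothetical stand-in for the model-specific draw from $g^*(x_i \mid x_{-i})$ derived in the examples that follow):

```python
# Minimal sketch of one Gibbs sweep.  `sample_conditional(i, y)` is a
# hypothetical user-supplied function drawing from g*(x_i | x_{-i});
# deriving it is the model-specific part of the method.
def gibbs_sweep(x, sample_conditional):
    y = list(x)
    for i in range(len(y)):
        y[i] = sample_conditional(i, y)   # Y_i ~ g*(x_i | Y_1..Y_{i-1}, X_{i+1}..X_n)
    return y                              # this is X^{t+1}
```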
Improved Cross Entropy

The improved CE consists of 3 steps:
1. Generate, via the Gibbs sampler, $N$ RVs $X_i \sim g^*$.
2. Solve $v^* = \arg\max_v \sum_{i=1}^{N} \log f(X_i; v)$.
3. Estimate $\hat{\ell} = \frac{1}{M} \sum_{i=1}^{M} I\{S(X_i) \ge \gamma\}\, \frac{f(X_i)}{f(X_i; v^*)}, \quad X_i \sim f(\cdot; v^*)$.
Improved Cross Entropy

Consider $X = (X_1, \ldots, X_n)$ where $X_i \sim \mathrm{Ber}(p_i)$,
and we would like to estimate
$\ell = P\left( \sum_{i=1}^{n} X_i \ge \gamma n \right)$
under the improved cross-entropy scheme.
Improved Cross Entropy

Let's set $n = 50$, $p_i = 0.1$, $\gamma = 0.7$
and apply the newly proposed algorithm.
Improved Cross Entropy

Step 1 – generate RVs from $g^*$

First we need to find
$g^*(x) = f(x \mid S(x) \ge \gamma)$
whose single-coordinate conditionals are
$g^*(x_i \mid x_{-i}) \sim \mathrm{Ber}(p_i)$ if $\sum_{j \ne i} x_j \ge \gamma n$, and $x_i = 1$ otherwise.
Improved Cross Entropy

Step 1 – generate RVs from $g^*$ (cont.)

Set $X_i^0 \sim \mathrm{Ber}(p_i)$
For $t \in \{1, \ldots, N\}$:
  Generate $Y_1 \sim g^*(x_1 \mid X_2^{t-1}, \ldots, X_n^{t-1})$
  Generate $Y_j \sim g^*(x_j \mid Y_1, \ldots, Y_{j-1}, X_{j+1}^{t-1}, \ldots, X_n^{t-1})$
  Generate $Y_n \sim g^*(x_n \mid Y_1, \ldots, Y_{n-1})$
  Set $X^t = Y$
where
$g^*(x_j \mid x_{-j}) \sim \mathrm{Ber}(p_j)$ if $\sum_{i \ne j} x_i \ge \gamma n$, and $x_j = 1$ otherwise.
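A minimal runnable sketch of this step (our own illustration; starting at the all-ones vector is an assumption that merely guarantees the chain begins inside the event):

```python
# Minimal sketch of Step 1 for n=50, p=0.1, gamma=0.7: Gibbs sampling from
# g*(x) = f(x | sum x_j >= gamma*n).  Starting at the all-ones vector (an
# assumption) puts the chain inside the event from the first sweep.
import numpy as np

rng = np.random.default_rng(1)
n, p, gamma, N = 50, 0.1, 0.7, 1_000
threshold = gamma * n                      # = 35

x = np.ones(n)
samples = []
for t in range(N):
    for i in range(n):
        if x.sum() - x[i] >= threshold:    # constraint holds without x_i
            x[i] = rng.random() < p        # ordinary Ber(p) draw
        else:
            x[i] = 1.0                     # x_i is forced to 1
    samples.append(x.copy())
X = np.array(samples)                      # N (correlated) draws from g*
```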
Improved Cross Entropy

Step 2 – solve the optimization problem

$q^* = \arg\max_q \sum_{i=1}^{N} \log f(X_i; q)
     = \arg\max_q \sum_{i=1}^{N} \log \prod_{j=1}^{n} q_j^{X_{ij}} (1 - q_j)^{1 - X_{ij}}
     = \arg\max_q \sum_{i=1}^{N} \sum_{j=1}^{n} \left[ X_{ij} \log q_j + (1 - X_{ij}) \log(1 - q_j) \right]$

which gives
$q_j^* = \frac{\sum_{i=1}^{N} X_{ij}}{N}$
Improved Cross Entropy

Step 3 – estimate via importance sampling

$\hat{\ell}_{IS} = \frac{1}{N} \sum_{i=1}^{N} I\left\{ S(X_i) \ge \gamma n \right\} \frac{f(X_i)}{f(X_i; q^*)}, \quad X_i \sim f(\cdot; q^*)$
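Continuing the Step 1 sketch above (which defined `rng`, `n`, `p`, `threshold` and the Gibbs draws `X`), Steps 2 and 3 take only a few lines; the clipping of `q_star` is our own safeguard against degenerate components:

```python
# Minimal sketch of Steps 2-3, continuing from the Step 1 code above.
q_star = np.clip(X.mean(axis=0), 1e-6, 1 - 1e-6)   # Step 2: q_j* = sum_i X_ij / N

M = 100_000                                         # Step 3: IS under f(.; q*)
Y = (rng.random((M, n)) < q_star).astype(float)
w = np.prod(np.where(Y == 1.0, p / q_star, (1 - p) / (1 - q_star)), axis=1)
ell_hat = np.mean((Y.sum(axis=1) >= threshold) * w)
print(ell_hat)                                      # estimate of P(sum X_i >= gamma*n)
```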
Improved Cross Entropy

Multilevel CE vs. Improved CE

[Figure: F(q | x), values from 0 to 1, plotted for q between 0.5 and 0.7.]
Multilevel CE (to obtain $v_T$):
• N = 10000
• $\rho$ = 0.01
• 4 iterations
• Total budget: 40000

Gibbs sampler:
• 10 parallel chains
• Each of length 1000
• Total budget: 10000
Credit Risk Model

$n$: obligors
$p_i$: probability of the $i$-th obligor to default
$X_i$: latent variable with $p_i = P(X_i > x_i)$ for a given threshold $x_i$
$c_i$: monetary loss if the $i$-th obligor defaults

$L(X) = c_1 I\{X_1 > x_1\} + \ldots + c_n I\{X_n > x_n\}$
$\ell = P(L(X) \ge \gamma) = P(L(X) \ge bn), \quad b > 0$
t Copula Model

$X_i = \Lambda^{-1/2} \left( \rho Z + \sqrt{1 - \rho^2}\, \eta_i \right)$
$\rho > 0; \quad Z \sim N(0,1); \quad \eta_i \sim N(0, \sigma_N^2); \quad \Lambda \sim \mathrm{Gamma}\left( \tfrac{\nu}{2}, \tfrac{\nu}{2} \right)$
$X$ is marginally multivariate $t(\nu)$.
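A minimal sketch of simulating losses under this model (the values of $n$, $\rho$, $\nu$, the thresholds and the unit losses are illustrative assumptions, not values from the paper):

```python
# Minimal sketch: simulating the loss L(X) under the t-copula model.
# n, rho, nu, x and c below are illustrative assumptions, not paper values.
import numpy as np

rng = np.random.default_rng(2)
n, rho, nu = 10, 0.5, 4.0
x = np.full(n, 3.0)                         # default thresholds x_i
c = np.ones(n)                              # monetary losses c_i

def sample_loss():
    z = rng.normal()                        # Z ~ N(0,1)
    eta = rng.normal(size=n)                # eta_i ~ N(0,1) (sigma_N = 1 assumed)
    lam = rng.gamma(nu / 2, 2 / nu)         # Lambda ~ Gamma(nu/2, rate nu/2)
    X = (rho * z + np.sqrt(1 - rho**2) * eta) / np.sqrt(lam)
    return np.sum(c * (X > x))              # L(X) = sum_i c_i I{X_i > x_i}

losses = np.array([sample_loss() for _ in range(10_000)])
```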
Known methods for the rare-event estimation:

Exponential Change of Measure:
• Bounded relative error
• Needs to generate RVs from non-standard distributions

Hazard Rate Twisting:
• Logarithmically efficient
• 10 times more variance reduction than ECM
The Improved CE for Estimating the Probability of a Rare Loss

$X$ is derived from
$f(z, \lambda, \boldsymbol{\eta}) = f_N(z; 0, 1)\, f_G\left( \lambda; \tfrac{\nu}{2}, \tfrac{\nu}{2} \right) \prod_{i=1}^{n} f_N(\eta_i; 0, \sigma_N^2)$
where $f_N(\cdot; a, b)$ denotes the $N(a, b)$ density and $f_G(\cdot; c, d)$ the $\mathrm{Gamma}(c, d)$ density.
Step I – Sampling from $g^*$

$g^*(z, \lambda, \boldsymbol{\eta}) = f(z, \lambda, \boldsymbol{\eta} \mid L(x) \ge \gamma) \propto f(z, \lambda, \boldsymbol{\eta})\, I\{L(x) \ge \gamma\}$

Now we will show how to find the conditionals of $g^*$, so that the Gibbs sampler can be applied to generate RVs from $g^*$.
Sampling from $g^*$: the conditional $g^*(z \mid \lambda, \boldsymbol{\eta})$

• Define $G_i = \frac{\sqrt{\lambda}\, x_i - \sqrt{1 - \rho^2}\, \eta_i}{\rho}$ and arrange them in ascending order.
• Let $G_{(i)}$ denote the $i$-th ordered value and $c_{(i)}$ the corresponding loss.
• Then the event $L(x) \ge \gamma$ occurs iff $Z \ge G_{(k)}$, where $k := \min\left\{ l : \gamma \le \sum_{i=1}^{l} c_{(i)} \right\}$.

$g^*(z \mid \lambda, \boldsymbol{\eta}) \propto f_N(z; 0, 1)\, I\{z \ge G_{(k)}\}$

Via inverse transform.
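The inverse-transform draw is a one-liner; a minimal sketch assuming SciPy:

```python
# Minimal sketch: inverse-transform draw of Z ~ N(0,1) truncated to
# [G_(k), inf), as required by g*(z | lambda, eta).
from scipy.stats import norm

def sample_z(G_k, rng):
    u = rng.uniform(norm.cdf(G_k), 1.0)   # U ~ Uniform(Phi(G_k), 1)
    return norm.ppf(u)                    # Phi^{-1}(U) has the truncated law
```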

Sampling from $g^*$: the conditional $g^*(\lambda \mid z, \boldsymbol{\eta})$

• Define $H_i = \frac{\rho z + \sqrt{1 - \rho^2}\, \eta_i}{x_i}$ and arrange them in ascending order.
• Let $H_{(i)}$ denote the $i$-th ordered value and $c_{(i)}$ the corresponding loss.
• Then the event $L(x) \ge \gamma$ occurs iff $\Lambda \le \left( \max(H_{(n-k)}, 0) \right)^2$, where $k := \min\left\{ l : \gamma \le \sum_{i=1}^{l} c_{(i)} \right\}$.

$g^*(\lambda \mid z, \boldsymbol{\eta}) \propto f_G\left( \lambda; \tfrac{\nu}{2}, \tfrac{\nu}{2} \right) I\left\{ \lambda \le \left( \max(H_{(n-k)}, 0) \right)^2 \right\}$

Via inverse transform.
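The matching truncated-Gamma draw, again via inverse transform (a minimal sketch assuming SciPy; `H_upper` stands for the truncation bound derived above):

```python
# Minimal sketch: inverse-transform draw of Lambda ~ Gamma(nu/2, nu/2)
# truncated to (0, H_upper], as required by g*(lambda | z, eta).
from scipy.stats import gamma

def sample_lambda(H_upper, nu, rng):
    g = gamma(a=nu / 2, scale=2 / nu)     # Gamma(nu/2, rate nu/2)
    u = rng.uniform(0.0, g.cdf(H_upper))  # U ~ Uniform(0, F(H_upper))
    return g.ppf(u)                       # F^{-1}(U) has the truncated law
```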
Sampling from $g^*$: the conditional $g^*(\boldsymbol{\eta} \mid \lambda, z)$

• A multivariate truncated normal distribution.
• Sequentially draw from $g^*(\eta_i \mid \boldsymbol{\eta}_{-i}, \lambda, z)$:
• If $\sum_{j \ne i} c_j\, I\left\{ \rho z + \sqrt{1 - \rho^2}\, \eta_j \ge \sqrt{\lambda}\, x_j \right\} \ge \gamma$, then
  $g^*(\eta_i \mid \boldsymbol{\eta}_{-i}, \lambda, z) = f_N(\eta_i; 0, \sigma_N^2)$
• Else
  $g^*(\eta_i \mid \boldsymbol{\eta}_{-i}, \lambda, z) \propto f_N(\eta_i; 0, \sigma_N^2)\, I\left\{ \eta_i \ge \frac{\sqrt{\lambda}\, x_i - \rho z}{\sqrt{1 - \rho^2}} \right\}$
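A minimal sketch of a single $\eta_i$ update covering both cases (function and argument names are our own):

```python
# Minimal sketch of the eta_i update in g*(eta | lambda, z): unconstrained
# N(0, sigma_N^2) when the other obligors already produce loss >= gamma,
# otherwise the same normal truncated to [t, inf) via inverse transform.
import numpy as np
from scipy.stats import norm

def sample_eta_i(i, eta, z, lam, x, c, rho, gamma_, sigma_N, rng):
    defaults = rho * z + np.sqrt(1 - rho**2) * eta >= np.sqrt(lam) * x
    loss_without_i = np.sum(c * defaults) - c[i] * defaults[i]
    if loss_without_i >= gamma_:                 # eta_i is unconstrained
        return rng.normal(0.0, sigma_N)
    t = (np.sqrt(lam) * x[i] - rho * z) / np.sqrt(1 - rho**2)
    u = rng.uniform(norm.cdf(t / sigma_N), 1.0)  # truncate N(0, sigma_N^2) to [t, inf)
    return sigma_N * norm.ppf(u)
```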



After we have obtained $\{Z_i, \Lambda_i, \boldsymbol{\eta}_i\}_{i=1}^{N}$ we are ready to move on to the next step…
Step II – Solving the Optimization Problem

$\mathcal{F} = \left\{ f(z, \lambda, \boldsymbol{\eta}; v) = f_N(z; \mu_z, \sigma_z^2)\, f_G(\lambda; \alpha, \beta) \prod_{i=1}^{n} f_N(\eta_i; \mu_\eta, \sigma_N^2) \right\}, \quad v = (\mu_z, \sigma_z^2, \alpha, \beta, \mu_\eta)$

In our model $u = \left( 0, 1, \tfrac{\nu}{2}, \tfrac{\nu}{2}, 0 \right)$.

Since any member of the family is a product of densities, standard techniques of maximum likelihood estimation can be applied to find the optimal $v^*$:
$\mu_z^* = \frac{1}{N} \sum_{i=1}^{N} Z_i, \quad
\sigma_z^{2*} = \frac{1}{N} \sum_{i=1}^{N} \left( Z_i - \mu_z^* \right)^2, \quad
\alpha^* = \frac{\bar{\Lambda}^2}{S^2}, \quad
\beta^* = \frac{\bar{\Lambda}}{S^2}, \quad
\mu_\eta^* = \frac{1}{nN} \sum_{i=1}^{N} \sum_{j=1}^{n} \eta_{i,j}$

where $\bar{\Lambda}$ and $S^2$ are the sample mean and variance of $\Lambda_1, \ldots, \Lambda_N$.
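As a minimal sketch, the whole of Step II reduces to a few sample statistics (matching the Gamma parameters by moments, consistent with the formulas above):

```python
# Minimal sketch of Step II: closed-form fitting of v* = (mu_z, sigma2_z,
# alpha, beta, mu_eta) from the Gibbs draws {Z_i, Lambda_i, eta_i}.
import numpy as np

def fit_v_star(Z, Lam, Eta):
    """Z: (N,), Lam: (N,), Eta: (N, n) arrays of draws from g*."""
    mu_z = Z.mean()
    sigma2_z = Z.var()                    # (1/N) sum (Z_i - mu_z)^2
    lam_bar, s2 = Lam.mean(), Lam.var()   # sample mean & variance of Lambda
    alpha = lam_bar**2 / s2               # Gamma shape (moment matching)
    beta = lam_bar / s2                   # Gamma rate  (moment matching)
    mu_eta = Eta.mean()                   # average over all i and j
    return mu_z, sigma2_z, alpha, beta, mu_eta
```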
Solving Opt. Problem

Once we obtain the optimal importance density $f(\cdot; v^*)$,
we move on to Step 3.
Step III – Importance Sampling

$\hat{\ell} = \frac{1}{M} \sum_{i=1}^{M} I\{L(X_i) \ge \gamma\}\, \frac{f(Z_i, \Lambda_i, \boldsymbol{\eta}_i; u)}{f(Z_i, \Lambda_i, \boldsymbol{\eta}_i; v^*)}, \quad (Z_i, \Lambda_i, \boldsymbol{\eta}_i) \sim f(\cdot; v^*)$
Some Results
Pros and Cons

Improved CE

Pros:
• Handles rare events
• 3 basic steps
• Appropriate in multi-dimensional settings
• Less simulation effort than the multilevel CE

Cons:
• Problematic for a general performance function
• $g^*(x_i \mid x_{-i})$ is not trivial to derive
• The Gibbs sampler requires warm-up time
Further Research

• A Gibbs sampler for the general performance function
• Applying Sequential Monte Carlo methods for sampling from $g^*$