Improved Cross Entropy Method for Estimation
Presented by: Alex & Yanna
This presentation is based on the paper "Improved Cross-Entropy Method for Estimation" by Dirk P. Kroese & Joshua C. Chan.

Rare Events Estimation

We wish to estimate
  ℓ = P(S(X) ≥ γ),
where X is a random vector taking values in some set, and S is a real-valued (performance) function of X.

We can rewrite it as
  ℓ = P(S(X) ≥ γ) = E_f[ I{S(X) ≥ γ} ] = ∫ I{S(x) ≥ γ} f(x) dx
and estimate it with crude Monte Carlo:
  ℓ̂ = (1/N) Σ_{i=1}^N I{S(X_i) ≥ γ},   X_i ~ f.

For example, let X = (X_1, ..., X_10) with X_i ~ Ber(0.1) i.i.d. Direct calculation gives
  P(Σ_{i=1}^{10} X_i ≥ 8) = 3.736 × 10⁻⁷,
while the crude Monte Carlo estimate
  ℓ̂ = (1/N) Σ_{i=1}^N I{ Σ_{j=1}^{10} X_j^{(i)} ≥ 8 },   X_j^{(i)} ~ Ber(0.1),
typically returns 0 for moderate sample sizes N, since the event is almost never observed.

SOLUTION… Importance Sampling

Importance Sampling

  ℓ = E_f[ I{S(X) ≥ γ} ] = ∫ I{S(x) ≥ γ} (f(x)/k(x)) k(x) dx = E_k[ I{S(X) ≥ γ} f(X)/k(X) ],
and the importance sampling estimator is
  ℓ̂_IS = (1/N) Σ_{i=1}^N I{S(X_i) ≥ γ} f(X_i)/k(X_i),   X_i ~ k.

What would be a good choice for the importance density k(x)?

We shall take a look at the Kullback–Leibler divergence:
  D(g*, f(·; v)) = ∫ g*(x) log g*(x) dx − ∫ g*(x) log f(x; v) dx,
where
  g*(x): the zero-variance density, g*(x) = f(x | S(x) ≥ γ) = I{S(x) ≥ γ} f(x) / ℓ,
  f(x; v): a density from the parametric family of f(x), with parameter v.

CE Algorithm

In the article, two problematic issues are raised regarding the multilevel CE:
• The parametric family within which the optimal importance density g* is sought might not be large enough.
• When the dimension of the problem is large, the likelihood ratio involved in obtaining v_T becomes unstable.

Solution
• Sample directly from g*.

Our goal is to find
  Deterministic version:  v* = argmax_v ∫ g*(x) log f(x; v) dx
  Stochastic version:     v̂* = argmax_v Σ_{i=1}^N log f(X_i; v),   X_i ~ g*.

But how are we supposed to sample from g*(x)?

  g*(x) = f(x | S(x) ≥ γ) ∝ I{S(x) ≥ γ} f(x).
This observation grants us the opportunity to apply the useful tool of Gibbs sampling.

Gibbs Sampler In Brief

• An algorithm to generate a sequence of samples from a joint probability distribution.
• Gibbs sampling is a special case of the Metropolis–Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm.
• Gibbs sampling is applicable when the joint distribution is not known explicitly, but the conditional distribution of each variable is known.
• It can be shown that the sequence of samples constitutes a Markov chain, and the stationary distribution of that Markov chain is just the sought-after joint distribution.

The Gibbs sampler algorithm
Given X_t = (X_{t,1}, ..., X_{t,n}):
  Generate Y_1 ~ g*(x_1 | X_{t,2}, ..., X_{t,n})
  Generate Y_i ~ g*(x_i | Y_1, ..., Y_{i−1}, X_{t,i+1}, ..., X_{t,n}),   i = 2, ..., n − 1
  Generate Y_n ~ g*(x_n | Y_1, ..., Y_{n−1})
  Return X_{t+1} = Y = (Y_1, ..., Y_n).
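As a quick illustration of the systematic sweep above, here is a minimal sketch on a target whose full conditionals are known in closed form: a bivariate normal with correlation rho. The target, the parameter names (rho, n_sweeps, x_init) and the burn-in length are illustrative choices only, not the g* of the paper.

```python
# Systematic-sweep Gibbs sampler for (X1, X2) jointly standard normal with
# correlation rho. Full conditionals: X1 | X2 = x2 ~ N(rho*x2, 1 - rho**2),
# and symmetrically for X2.
import numpy as np

def gibbs_bivariate_normal(rho, n_sweeps, x_init=(0.0, 0.0), seed=0):
    rng = np.random.default_rng(seed)
    x1, x2 = x_init
    samples = np.empty((n_sweeps, 2))
    for t in range(n_sweeps):
        # Update each coordinate from its full conditional, given the most
        # recent value of the other coordinate.
        x1 = rng.normal(rho * x2, np.sqrt(1.0 - rho**2))
        x2 = rng.normal(rho * x1, np.sqrt(1.0 - rho**2))
        samples[t] = (x1, x2)
    return samples

# After discarding a burn-in, the draws behave like (dependent) samples from
# the joint distribution.
draws = gibbs_bivariate_normal(rho=0.8, n_sweeps=5000)[500:]
print(draws.mean(axis=0), np.corrcoef(draws.T)[0, 1])  # roughly (0, 0) and 0.8
```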
Improved Cross Entropy

The improved CE consists of 3 steps:
1. Generate, via the Gibbs sampler, N RVs X_i ~ g*.
2. Solve v* = argmax_v Σ_{i=1}^N log f(X_i; v).
3. Estimate ℓ̂ = (1/M) Σ_{i=1}^M I{S(X_i) ≥ γ} f(X_i)/f(X_i; v*),   X_i ~ f(·; v*).

Consider X = (X_1, ..., X_n) with X_i ~ Ber(p_i), and suppose we would like to estimate
  ℓ = P(Σ_{i=1}^n X_i ≥ nγ)
under the improved cross-entropy scheme. Let us set n = 50, p_i = 0.1, γ = 0.7 and apply the newly proposed algorithm.

Step 1 – generate RVs from g*
First we need the conditionals of g*(x) = f(x | Σ_j x_j ≥ nγ):
  g*(x_i | x_1, ..., x_{i−1}, x_{i+1}, ..., x_n) ~ Ber(p_i)   if Σ_{j≠i} x_j ≥ nγ,
  x_i = 1                                                      otherwise (x_i is forced to 1 so that the event still holds).

Step 1 – generate RVs from g* (cont.)
Set X^0 = (X^0_1, ..., X^0_n) with X^0_j ~ Ber(p_j).
For i ∈ {1, ..., N}:
  Generate Y_1 ~ g*(x_1 | X^{i−1}_2, ..., X^{i−1}_n)
  Generate Y_j ~ g*(x_j | Y_1, ..., Y_{j−1}, X^{i−1}_{j+1}, ..., X^{i−1}_n),   j = 2, ..., n − 1
  Generate Y_n ~ g*(x_n | Y_1, ..., Y_{n−1})
  Set X^i = Y,
where g*(x_j | x_{−j}) ~ Ber(p_j) if Σ_{k≠j} x_k ≥ nγ, and x_j = 1 otherwise.

Step 2 – Solve the optimization problem
  q* = argmax_q Σ_{i=1}^N log f(X^i; q)
     = argmax_q Σ_{i=1}^N Σ_{j=1}^n log( q_j^{X^i_j} (1 − q_j)^{1 − X^i_j} )
     = argmax_q Σ_{i=1}^N Σ_{j=1}^n [ X^i_j log q_j + (1 − X^i_j) log(1 − q_j) ],
which has the closed-form solution
  q*_j = (1/N) Σ_{i=1}^N X^i_j.

Step 3 – Estimate ℓ = P(Σ_{i=1}^n X_i ≥ nγ) via importance sampling
  ℓ̂_IS = (1/N) Σ_{i=1}^N I{Σ_j X_{i,j} ≥ nγ} f(X_i)/f(X_i; q*),   X_i ~ f(·; q*).

Multilevel CE vs. Improved CE
[Figure: F(q | x) for the multilevel CE and the improved CE, plotted over the range 0.5 to 0.7.]
• Multilevel CE: N = 10000 samples per level, rarity parameter ρ = 0.01, 4 iterations to reach v_T; total budget 40000.
• Gibbs sampler (improved CE): 10 parallel chains, each of length 1000; total budget 10000.

A Credit Risk Model
  n: number of obligors
  p_i: probability that the i-th obligor defaults
  X_i: latent variable of the i-th obligor; p_i = P(X_i ≥ x_i) for a given threshold x_i
  c_i: monetary loss if the i-th obligor defaults
  L(X) = c_1 I{X_1 ≥ x_1} + ... + c_n I{X_n ≥ x_n}
We wish to estimate ℓ = P(L(X) ≥ γ) for a given loss threshold γ.

t-Copula Model
  X_i = (ρ Z + √(1 − ρ²) η_i) / √Ψ,   ρ ∈ (0, 1),
with Z ~ N(0, 1), η_i ~ N(0, σ²_η) i.i.d., and Ψ ~ Gamma(ν/2, ν/2) independent of Z and the η_i.
X is then marginally multivariate t with ν degrees of freedom.

Known methods for the rare-event estimation
• Exponential Change of Measure (ECM): bounded relative error; needs to generate RVs from a non-standard distribution.
• Hazard Rate Twisting: logarithmically efficient; about 10 times more variance reduction than ECM.

The Improved CE for Estimating the Probability of a Rare Loss

X is derived from (Z, Ψ, η), whose joint density is
  f(z, ψ, η) = f_N(z; 0, 1) · f_G(ψ; ν/2, ν/2) · Π_{i=1}^n f_N(η_i; 0, σ²_η),
where f_N(·; a, b) denotes the N(a, b) density and f_G(·; c, d) denotes the Gamma(c, d) density.

Step I – Sampling from g*

  g*(z, ψ, η) = f(z, ψ, η | L(X) ≥ γ) ∝ f(z, ψ, η) I{L(X) ≥ γ}.
Now we show how to find the conditional densities of g*, so that the Gibbs sampler can be applied to generate RVs from g*.

Sampling from g*: g*(z | ψ, η)
• Define G_i = (x_i √ψ − √(1 − ρ²) η_i) / ρ, the value of Z above which obligor i defaults, and arrange them in ascending order.
• Let G_(i) denote the i-th ordered value and c_(i) the corresponding loss.
• Then the event {L(X) ≥ γ} occurs iff Z ≥ G_(k), where k := min{ l : Σ_{i=1}^l c_(i) ≥ γ }.
Hence
  g*(z | ψ, η) ∝ f_N(z; 0, 1) I{z ≥ G_(k)},
a truncated normal, sampled via the inverse transform.

Sampling from g*: g*(ψ | z, η)
• Define H_i = ((ρ z + √(1 − ρ²) η_i) / x_i)², the threshold below which Ψ must fall for obligor i to default, and arrange them in ascending order.
• Let H_(i) denote the i-th ordered value and c_(i) the corresponding loss.
• Then the event {L(X) ≥ γ} occurs iff Ψ ≤ H_(n−k+1), where k := min{ l : Σ_{i=1}^l c_(n−i+1) ≥ γ } (losses are accumulated from the largest H downwards, since those obligors default first as Ψ decreases).
Hence
  g*(ψ | z, η) ∝ f_G(ψ; ν/2, ν/2) I{0 < ψ ≤ H_(n−k+1)},
a truncated gamma, again sampled via the inverse transform.

Sampling from g*: g*(η | ψ, z)
• This is a multivariate truncated normal distribution; we draw from it sequentially, one coordinate at a time, from g*(η_i | η_{−i}, ψ, z).
• If Σ_{j≠i} c_j I{ρ z + √(1 − ρ²) η_j ≥ x_j √ψ} ≥ γ (the other obligors already produce a loss of at least γ), then
    g*(η_i | η_{−i}, ψ, z) = f_N(η_i; 0, σ²_η).
• Otherwise obligor i must default, and
    g*(η_i | η_{−i}, ψ, z) ∝ f_N(η_i; 0, σ²_η) I{η_i ≥ (x_i √ψ − ρ z) / √(1 − ρ²)}.
After we have obtained {(Z_i, Ψ_i, η_i)}_{i=1}^N we are ready to move on to the next step…

Step II – Solving the Optimization Problem

The parametric family is
  𝓕 = { f(z, ψ, η; v) = f_N(z; μ_Z, σ²_Z) · f_G(ψ; α, β) · Π_{i=1}^n f_N(η_i; μ_η, σ²_η) },
with v = (μ_Z, σ²_Z, α, β, μ_η, σ²_η); in our model the nominal parameter is u = (0, 1, ν/2, ν/2, 0, σ²_η).

Since every member of the family is a product of densities, the components of v* can be estimated separately by standard techniques (maximum likelihood for the normal components, moment matching for the gamma component):
  μ*_Z = (1/N) Σ_{i=1}^N Z_i,    σ²*_Z = (1/N) Σ_{i=1}^N (Z_i − μ*_Z)²,
  α* = S̄² / S²,   β* = S̄ / S²,   where S̄ and S² are the sample mean and variance of Ψ_1, ..., Ψ_N,
  μ*_η = (1/(nN)) Σ_{i=1}^N Σ_{j=1}^n η_{i,j}   (and σ²*_η analogously).
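A minimal sketch of how the Step II updates can be computed from the Gibbs output, assuming the draws are stored as arrays z (length N), psi (length N) and eta (N × n). The gamma update via moment matching (α = m²/s², β = m/s², shape/rate parameterization) is one standard choice consistent with the sample mean and variance used above; treat the exact formulas as an assumption rather than the paper's definitive update.

```python
# Step II parameter updates from Gibbs draws (sketch under the assumptions above).
import numpy as np

def step2_updates(z, psi, eta):
    mu_z = z.mean()
    sigma2_z = z.var()            # (1/N) * sum (z_i - mu_z)^2
    m, s2 = psi.mean(), psi.var()
    alpha = m**2 / s2             # Gamma shape via moment matching
    beta = m / s2                 # Gamma rate via moment matching
    mu_eta = eta.mean()           # pooled over all N * n idiosyncratic draws
    sigma2_eta = eta.var()
    return dict(mu_z=mu_z, sigma2_z=sigma2_z, alpha=alpha, beta=beta,
                mu_eta=mu_eta, sigma2_eta=sigma2_eta)

# Dummy usage with placeholder arrays standing in for the Gibbs output:
rng = np.random.default_rng(1)
v_star = step2_updates(rng.normal(2, 1, 1000), rng.gamma(2, 0.5, 1000),
                       rng.normal(0, 1, (1000, 25)))
print(v_star)
```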
Once we obtain the optimal importance density f(·; v*), we move on to Step III.

Step III – Importance Sampling

  ℓ̂ = (1/M) Σ_{i=1}^M I{L(X_i) ≥ γ} f(Z_i, Ψ_i, η_i; u) / f(Z_i, Ψ_i, η_i; v*),   (Z_i, Ψ_i, η_i) ~ f(·; v*).

Some Results

Pros and Cons of the Improved CE

Pros
• Designed for rare events.
• Only 3 basic steps.
• Appropriate in multi-dimensional settings.
• Requires less simulation effort than the multilevel CE.

Cons
• Problematic for a general performance function: the conditionals g*(x_i | x_{−i}) are not trivial to derive.
• The Gibbs sampler requires a warm-up (burn-in) period.

Further research
• A Gibbs sampler for a general performance function.
• Applying Sequential Monte Carlo methods for sampling from g*.
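To tie the three steps together, here is a minimal end-to-end sketch of the improved CE for the Bernoulli example from the earlier slides (n = 50, p_i = 0.1, γ = 0.7). The simulation budget, the all-ones starting state, the clipping of q*, and the SciPy reference check are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, gamma = 50, 0.1, 0.7
level = int(round(n * gamma))                 # = 35
N_sweeps, burn_in, M = 2000, 200, 100_000     # illustrative budget

# Step 1: Gibbs sampling from g*(x) proportional to f(x) * I{sum_j x_j >= level}
x = np.ones(n, dtype=int)                     # start inside the rare-event set
gibbs_draws = []
for _ in range(N_sweeps):
    for i in range(n):
        if x.sum() - x[i] >= level:           # event holds regardless of x_i: draw Ber(p)
            x[i] = rng.random() < p
        else:                                 # otherwise x_i is forced to 1
            x[i] = 1
    gibbs_draws.append(x.copy())
X_gibbs = np.array(gibbs_draws[burn_in:])

# Step 2: closed-form CE update q*_j = average of the j-th coordinate
q = np.clip(X_gibbs.mean(axis=0), 1e-6, 1 - 1e-6)   # clip as a numerical safeguard

# Step 3: importance sampling with X ~ prod_j Ber(q*_j)
X = (rng.random((M, n)) < q).astype(float)
log_w = (X * np.log(p / q) + (1 - X) * np.log((1 - p) / (1 - q))).sum(axis=1)
ell_hat = np.mean((X.sum(axis=1) >= level) * np.exp(log_w))
print("improved-CE estimate:", ell_hat)

# Reference value from the exact Binomial(n, p) tail:
from scipy.stats import binom
print("exact:", binom.sf(level - 1, n, p))
```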