Monte Carlo Solution of Integral Equations (of the second kind)

Adam M. Johansen
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/

March 7th, 2011
MIRaW – Monte Carlo Methods Day

Outline

- Background
  - Importance Sampling (IS)
  - Fredholm Equations of the 2nd Kind
- Methodology
  - Sequential IS and 2nd Kind Fredholm Equations
  - A Path-space Interpretation
  - MCMC for 2nd Kind Fredholm Equations
- Results
  - A Toy Example
  - An Asset-Pricing Example

The Monte Carlo Method

- Consider estimating:
      I_\varphi = E_f[\varphi(X)] = \int f(x)\varphi(x)\,dx.
- Given X_1, \dots, X_N \overset{\text{iid}}{\sim} f:
      \frac{1}{N}\sum_{i=1}^{N}\varphi(X_i) \to I_\varphi  as N \to \infty.
- Alternatively, approximate:
      \hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N}\delta(x - X_i)
  and use E_{\hat{f}}[\varphi \mid X_1, \dots, X_N] \approx E_f[\varphi].

Importance Sampling

- If f \ll g (i.e. g(x) = 0 \Rightarrow f(x) = 0):
      I_\varphi = \int \frac{f(x)}{g(x)}\varphi(x)\,g(x)\,dx
                = E_g\Big[\underbrace{\tfrac{f(X)}{g(X)}}_{w(X)}\,\varphi(X)\Big].
- Given X_1, \dots, X_N \overset{\text{iid}}{\sim} g:
      \frac{1}{N}\sum_{i=1}^{N} w(X_i)\,\varphi(X_i) \to I_\varphi  as N \to \infty.
- Alternatively, approximate:
      \hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N} w(X_i)\,\delta(x - X_i)
  and use E_{\hat{f}}[\varphi \mid X_1, \dots, X_N] \approx E_f[\varphi].

Fredholm Equations of the Second Kind

- Consider the equation
      f(x) = \int_E f(y)K(x, y)\,dy + g(x)
  in which
  - the source g(x) and the transition kernel K(x, y) are known, but
  - f(x) is unknown.
- This is a Fredholm equation of the second kind.
- Finding f(x) is an inverse problem.

The Von Neumann Expansion

- We have the expansion:
      f(x) = \int_E f(y)K(x, y)\,dy + g(x)
           = \int_E \Big[\int_E f(z)K(y, z)\,dz + g(y)\Big] K(x, y)\,dy + g(x)
           = \dots
           = g(x) + \sum_{n=1}^{\infty} \int K^n(x, y)\,g(y)\,dy,
  where
      K^1(x, y) = K(x, y),   K^n(x, y) = \int_E K^{n-1}(x, z)K(z, y)\,dz.
- For convergence, it is sufficient that
      |g(x)| + \sum_{n=1}^{\infty} \Big|\int K^n(x, y)\,g(y)\,dy\Big| < \infty.

A Sampling Approach

- Choose \mu, a pdf on E.
- Define E' := E \cup \{\dagger\}.
- Let M(x, y) be a Markov transition kernel on E' such that:
  - \dagger is absorbing,
  - M(x, \dagger) = P_d, and
  - M(x, y) > 0 whenever K(x, y) > 0.
- The pair (\mu, M) defines a Markov chain on E'.

Sequential Importance Sampling (SIS) Algorithm

- Simulate N paths \{X^{(i)}_{0:k^{(i)}+1}\}_{i=1}^{N} until X^{(i)}_{k^{(i)}+1} = \dagger.
- Calculate weights:
      W(X^{(i)}_{0:k^{(i)}}) = \frac{g(X^{(i)}_{k^{(i)}})}{P_d\,\mu(X^{(i)}_0)}
        \prod_{k=1}^{k^{(i)}} \frac{K(X^{(i)}_{k-1}, X^{(i)}_k)}{M(X^{(i)}_{k-1}, X^{(i)}_k)}   if k^{(i)} \geq 1,
      W(X^{(i)}_{0:k^{(i)}}) = \frac{g(X^{(i)}_0)}{P_d\,\mu(X^{(i)}_0)}   if k^{(i)} = 0.
- Approximate f(x_0) with:
      \hat{f}(x_0) = \frac{1}{N}\sum_{i=1}^{N} W(X^{(i)}_{0:k^{(i)}})\,\delta(x_0 - X^{(i)}_0).

Von Neumann Representation Revisited

- Starting from
      f(x_0) = g(x_0) + \sum_{n=1}^{\infty} \int_E K^n(x_0, x_n)\,g(x_n)\,dx_n,
- we have directly:
      f(x_0) = g(x_0) + \sum_{n=1}^{\infty} \int_{E^n} \prod_{p=1}^{n} K(x_{p-1}, x_p)\,g(x_n)\,dx_1 \dots dx_n
             = f_0(x_0) + \sum_{n=1}^{\infty} \int_{E^n} f_n(x_{0:n})\,dx_1 \dots dx_n,
  with f_n(x_{0:n}) := g(x_n)\prod_{p=1}^{n} K(x_{p-1}, x_p) and f_0(x_0) := g(x_0).

A Path-Space Interpretation of SIS

- Define
      \pi(n, x_{0:n}) = p_n\,\pi_n(x_{0:n})
  on F = \cup_{k=0}^{\infty} \{k\} \times E^{k+1}, where
      p_n = P(X_{0:n} \in E^{n+1}, X_{n+1} = \dagger) = (1 - P_d)^n P_d,
      \pi_n(x_{0:n}) = \frac{\mu(x_0)\prod_{k=1}^{n} M(x_{k-1}, x_k)}{(1 - P_d)^n}.
- Define also
      f(k, x_{0:k}) = f_0(x_0)\,\delta_{0,k} + \sum_{n=1}^{\infty} f_n(x_{0:n})\,\delta_{n,k}.
- Approximate with:
      \hat{f}(x_0) = \frac{1}{N}\sum_{i=1}^{N} \frac{f(k^{(i)}, X^{(i)}_{0:k^{(i)}})}{\pi(k^{(i)}, X^{(i)}_{0:k^{(i)}})}\,\delta(x_0 - X^{(i)}_0).
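The following is not part of the original slides: a minimal runnable Python sketch of the SIS scheme just described, under some illustrative assumptions. It borrows the toy kernel K(x, y) = exp(x − y)/3 and source g(x) = 2 exp(x)/3 used in the toy example later in the talk, takes \mu = U[0, 1], lets M propose uniformly on [0, 1] with death probability P_d = 0.5, and reads off a point estimate by averaging the weights of paths whose X_0 falls in a small window around x; the function names and these particular choices are mine, not the talk's.

```python
import numpy as np

# Minimal sketch of the SIS scheme above, specialised to the toy Fredholm
# equation used later in the talk: K(x, y) = exp(x - y)/3 on E = [0, 1] and
# g(x) = 2 exp(x)/3, whose solution is f(x) = exp(x).
# Proposal: mu = U[0, 1]; M proposes uniformly on [0, 1], with death prob P_d.

def K(x, y):
    return np.exp(x - y) / 3.0

def g(x):
    return 2.0 * np.exp(x) / 3.0

def sis_paths(N, P_d=0.5, seed=0):
    """Simulate N killed chains; return (X_0, weight) pairs."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(N):
        x0 = rng.uniform()                  # X_0 ~ mu = U[0, 1]
        w = 1.0 / (P_d * 1.0)               # 1 / (P_d * mu(x0)); mu density = 1
        x_prev = x0
        while rng.uniform() >= P_d:         # survive this step with prob 1 - P_d
            x_new = rng.uniform()
            # K / M ratio; M's density on E is (1 - P_d) * 1 for a uniform move
            w *= K(x_prev, x_new) / (1.0 - P_d)
            x_prev = x_new
        w *= g(x_prev)                      # g evaluated at the last surviving state
        samples.append((x0, w))
    return samples

def point_estimate(x, samples, h=0.05):
    """Crude estimate of f(x): average weight of paths with X_0 within h of x."""
    total = sum(w for (x0, w) in samples if abs(x0 - x) < h)
    return total / (len(samples) * 2 * h)

samples = sis_paths(50_000)
print(point_estimate(0.5, samples), np.exp(0.5))   # estimate vs. truth exp(0.5)
```

The window average is only one crude way of turning the \delta-mixture \hat{f} into a point estimate; the smoothed estimator discussed later avoids it.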
Limitations of SIS

- Arbitrary geometric distribution for k.
- Arbitrary initial distribution.
- Importance weights
      \frac{g(X^{(i)}_{k^{(i)}})}{P_d\,\mu(X^{(i)}_0)} \prod_{k=1}^{k^{(i)}} \frac{K(X^{(i)}_{k-1}, X^{(i)}_k)}{M(X^{(i)}_{k-1}, X^{(i)}_k)}.
- Resampling will not help.

Optimal Importance Distribution

- Minimize the variance of the absolute value of the weights.
- Consider choosing
      \pi(n, x_{0:n}) = \sum_{k=0}^{\infty} p_k\,\delta_{k,n}\,\pi_k(x_{0:k}),
      \pi_n(x_{0:n}) = c_n^{-1}\,|f_n(x_{0:n})|,
      p_n = c_n / c,
      c_n = \int_{E^{n+1}} |f_n(x_{0:n})|\,dx_{0:n},
      c = \sum_{n=0}^{\infty} c_n.
- We have:
      f(x_0) = c\,\mathrm{sgn}(f_0(x_0))\,\pi(0, x_0) + \sum_{n=1}^{\infty} \int_{E^n} c\,\mathrm{sgn}(f_n(x_{0:n}))\,\pi(n, x_{0:n})\,dx_{1:n}.

That's where the MCMC comes in

- Obtain a sample approximation of \pi ...
  - using your favourite sampler,
  - Reversible Jump MCMC [2], for instance.
- Use the samples to approximate the optimal proposal.
- Combine with importance sampling.
- Estimate the 'normalising constant', c.

A Simple RJMCMC Algorithm

Initialization.
- Set (k^{(1)}, X^{(1)}_{0:k^{(1)}}) randomly or deterministically.

Iteration i \geq 2.
- Sample U \sim U[0, 1].
- If U \leq u_{k^{(i-1)}}:
  - (k^{(i)}, X^{(i)}_{0:k^{(i)}}) \leftarrow Update Move.
- Else if U \leq u_{k^{(i-1)}} + b_{k^{(i-1)}}:
  - (k^{(i)}, X^{(i)}_{0:k^{(i)}}) \leftarrow Birth Move.
- Else:
  - (k^{(i)}, X^{(i)}_{0:k^{(i)}}) \leftarrow Death Move.

Very Simple Update Move

- Set k^{(i)} = k^{(i-1)}; sample J \sim U\{0, 1, \dots, k^{(i)}\} and X^{\ast}_J \sim q_u(X^{(i-1)}_J, \cdot).
- With probability
      \min\left\{1, \frac{\pi(k^{(i)}, X^{(i-1)}_{0:J-1}, X^{\ast}_J, X^{(i-1)}_{J+1:k^{(i)}})\,q_u(X^{\ast}_J, X^{(i-1)}_J)}{\pi(k^{(i)}, X^{(i-1)}_{0:k^{(i)}})\,q_u(X^{(i-1)}_J, X^{\ast}_J)}\right\}
  set X^{(i)}_{0:k^{(i)}} = (X^{(i-1)}_{0:J-1}, X^{\ast}_J, X^{(i-1)}_{J+1:k^{(i)}});
  otherwise set X^{(i)}_{0:k^{(i)}} = X^{(i-1)}_{0:k^{(i-1)}}.
- Acceptance probabilities involve ratios of the form
      \frac{\pi(l, x_{0:l})}{\pi(k, x_{0:k})} = \frac{c_l\,\pi_l(x_{0:l})}{c_k\,\pi_k(x_{0:k})} = \frac{|f_l(x_{0:l})|}{|f_k(x_{0:k})|}.

Very Simple Birth Move

- Sample J \sim U\{0, 1, \dots, k^{(i-1)}\} and X^{\ast}_J \sim q_b(\cdot).
- With probability
      \min\left\{1, \frac{\pi(k^{(i-1)}+1, X^{(i-1)}_{0:J-1}, X^{\ast}_J, X^{(i-1)}_{J:k^{(i-1)}})\,d_{k^{(i-1)}+1}}{\pi(k^{(i-1)}, X^{(i-1)}_{0:k^{(i-1)}})\,q_b(X^{\ast}_J)\,b_{k^{(i-1)}}}\right\}
  set k^{(i)} = k^{(i-1)} + 1 and X^{(i)}_{0:k^{(i)}} = (X^{(i-1)}_{0:J-1}, X^{\ast}_J, X^{(i-1)}_{J:k^{(i-1)}});
  otherwise set k^{(i)} = k^{(i-1)}, X^{(i)}_{0:k^{(i)}} = X^{(i-1)}_{0:k^{(i-1)}}.

Very Simple Death Move

- Sample J \sim U\{0, 1, \dots, k^{(i-1)}\}. With probability
      \min\left\{1, \frac{\pi(k^{(i-1)}-1, X^{(i-1)}_{0:J-1}, X^{(i-1)}_{J+1:k^{(i-1)}})\,q_b(X^{(i-1)}_J)\,b_{k^{(i-1)}-1}}{\pi(k^{(i-1)}, X^{(i-1)}_{0:k^{(i-1)}})\,d_{k^{(i-1)}}}\right\}
  set k^{(i)} = k^{(i-1)} - 1 and X^{(i)}_{0:k^{(i)}} = (X^{(i-1)}_{0:J-1}, X^{(i-1)}_{J+1:k^{(i-1)}});
  otherwise set k^{(i)} = k^{(i-1)}, X^{(i)}_{0:k^{(i)}} = X^{(i-1)}_{0:k^{(i-1)}}.

Estimating the Normalising Constant

- Estimate p_n as:
      \hat{p}_n = \frac{1}{N}\sum_{i=1}^{N} \delta_{n,k^{(i)}}.
- Estimate directly, also,
      \tilde{c}_0 \approx \int g(x)\,dx,
  and hence
      \hat{c} = \tilde{c}_0 / \hat{p}_0,   \hat{c}_n = \hat{c}\,\hat{p}_n.

Toy Example

- The simple algorithm was applied to
      f(x) = \int_0^1 \underbrace{\tfrac{1}{3}\exp(x - y)}_{K(x,y)} f(y)\,dy + \underbrace{\tfrac{2}{3}\exp(x)}_{g(x)}.
- The solution is f(x) = \exp(x).
- Birth, death and update probabilities were set to 1/3.
- A uniform distribution over the unit interval was used for all proposals.

Point Estimation for the Toy Example

[Figure: "Point Estimates and their Standard Deviations" — point estimates of f(x_0) plotted against x_0 for N = 100, N = 1,000 and N = 10,000, together with the truth.]
(Fix x_0 and simulate everything else.)

Estimating f(x) from Samples

- Given \hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N} W^{(i)}\,\delta(x - X^{(i)}_0)
- such that
      \int_E \hat{f}(x)\varphi(x)\,dx \approx \int f(x)\varphi(x)\,dx.
- Simple option: a histogram (\varphi(x) = \mathbb{I}_A(x)).
- More subtle option:
      \tilde{f}(x) = \int K(x, y)\hat{f}(y)\,dy + g(x) = g(x) + \frac{1}{N}\sum_{i=1}^{N} W^{(i)} K(x, X^{(i)}_0).
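As an illustration (not taken from the original slides), here is a minimal Python sketch of the simple RJMCMC sampler and the smoothed estimator above, specialised to the toy problem: uniform proposals on [0, 1], update/birth/death probabilities of 1/3, and \tilde{c}_0 computed exactly as \int_0^1 g(x)\,dx = \tfrac{2}{3}(e - 1). One detail the slides leave implicit is how the insertion position is chosen; this sketch picks it uniformly among all k + 2 slots so that the position-proposal factors cancel and every acceptance probability reduces to a ratio of f_n values. All function names are mine.

```python
import numpy as np

# Minimal sketch of the simple RJMCMC sampler for the toy problem, targeting
# pi(n, x_{0:n}) proportional to |f_n(x_{0:n})| = g(x_n) * prod_p K(x_{p-1}, x_p),
# together with the normalising-constant and smoothed estimates of f.
# Uniform proposals on [0, 1]; update, birth and death probabilities all 1/3.
# The birth move picks the insertion slot uniformly among the k + 2 possible
# positions (an implementation choice) so that the position-proposal factors
# cancel and every acceptance ratio is just a ratio of f_n values.

def K(x, y):
    return np.exp(x - y) / 3.0

def g(x):
    return 2.0 * np.exp(x) / 3.0

def f_n(path):
    """f_n(x_{0:n}) for the toy problem (positive, so |f_n| = f_n)."""
    val = g(path[-1])
    for a, b in zip(path[:-1], path[1:]):
        val *= K(a, b)
    return val

def rjmcmc(n_iter, seed=0):
    rng = np.random.default_rng(seed)
    path = [rng.uniform()]                       # start at k = 0
    chain = []
    for _ in range(n_iter):
        u = rng.uniform()
        if u < 1.0 / 3.0:                        # update move
            j = rng.integers(len(path))
            prop = list(path)
            prop[j] = rng.uniform()              # symmetric q_u => ratio of f_n only
            if rng.uniform() < min(1.0, f_n(prop) / f_n(path)):
                path = prop
        elif u < 2.0 / 3.0:                      # birth move
            j = rng.integers(len(path) + 1)      # one of k + 2 insertion slots
            prop = path[:j] + [rng.uniform()] + path[j:]
            if rng.uniform() < min(1.0, f_n(prop) / f_n(path)):
                path = prop
        elif len(path) > 1:                      # death move (rejected at k = 0)
            j = rng.integers(len(path))
            prop = path[:j] + path[j + 1:]
            if rng.uniform() < min(1.0, f_n(prop) / f_n(path)):
                path = prop
        chain.append(list(path))
    return chain

def smoothed_estimate(x, chain):
    """f~(x) = g(x) + c_hat * mean_i K(x, X_0^(i)); all weights equal c_hat here."""
    p0_hat = np.mean([len(p) == 1 for p in chain])   # \hat p_0
    c0 = (2.0 / 3.0) * (np.e - 1.0)                  # \int_0^1 g(x) dx, exactly
    c_hat = c0 / p0_hat                              # \hat c = c~_0 / \hat p_0
    return g(x) + c_hat * np.mean([K(x, p[0]) for p in chain])

chain = rjmcmc(100_000)
for x in (0.1, 0.5, 0.9):
    print(x, smoothed_estimate(x, chain), np.exp(x))  # estimate vs. truth exp(x)
```

Since f_n > 0 for this problem, \mathrm{sgn}(f_{k^{(i)}}) = 1 and the importance weights W^{(i)} are all equal to \hat{c}.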
Estimated f(x) for the Toy Problem

[Figure: two estimates of f(x) for the toy problem ("Estimate 1" and "Estimate 2") plotted against the truth over [0, 1].]

An Asset-Pricing Problem

- The rational expectation pricing model (cf. [4]) requires that the price V(s) of an asset in some state s \in E satisfies
      V(s) = \pi(s) + \beta \int_E V(t)\,p(t|s)\,dt,
  where
  - \pi(s) denotes the return on investment,
  - \beta is a discount factor, and
  - p(t|s) is a Markov kernel which models the state evolution.
- Simple example:
      E = [0, 1],   \beta = 0.85,
      p(t|s) = \frac{\phi\!\left(\frac{t - (as + b)}{\sqrt{\lambda}}\right)}{\Phi\!\left(\frac{1 - (as + b)}{\sqrt{\lambda}}\right) - \Phi\!\left(-\frac{as + b}{\sqrt{\lambda}}\right)},
  with a = 0.05, b = 0.85 and \lambda = 100. (A short numerical sketch of this kernel is given after the references.)

Estimated f(x) for the Asset-Pricing Case

[Figure: estimated value function for the asset-pricing example over [0, 1].]

Simulated Chain Lengths for Different Starting Points

[Figure: empirical distributions of the simulated chain lengths for two different starting points.]

Conclusions

- Many Monte Carlo methods can be viewed as importance sampling.
- Many problems can be recast as the calculation of expectations.
- Approximate solution of Fredholm equations.
- Also applicable to Volterra equations [3].
- Any Monte Carlo method could be used.
- Some room for further investigation of this method.

References

[1] A. Doucet, A. M. Johansen, and V. B. Tadić. On solving integral equations using Markov chain Monte Carlo. Applied Mathematics and Computation, 216:2869–2889, 2010.
[2] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.
[3] G. W. Peters, A. M. Johansen, and A. Doucet. Simulation of the annual loss distribution in operational risk via Panjer recursions and Volterra integral equations for value at risk and expected shortfall estimation. Journal of Operational Risk, 2(3):29–58, Fall 2007.
[4] J. Rust, J. F. Traub, and H. Woźniakowski. Is there a curse of dimensionality for contraction fixed points in the worst case? Econometrica, 70(1):285–329, 2002.
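Finally, a small Python sketch (not part of the original talk) of the state-transition density p(t|s) from the asset-pricing example, as reconstructed above: a Gaussian centred at as + b with scale \sqrt{\lambda}, truncated to [0, 1]. A 1/\sqrt{\lambda} prefactor is assumed here so that p(\cdot|s) integrates to one; the constants a = 0.05, b = 0.85, \lambda = 100 and \beta = 0.85 are those quoted on the slide, while the function names and the numerical check are mine. With a concrete return function \pi(s) (not specified on these slides), V solves a Fredholm equation of the second kind with kernel K(s, t) = \beta\,p(t|s) and source g = \pi, so the same machinery applies.

```python
from math import erf, exp, pi, sqrt

# Sketch of the asset-pricing state kernel reconstructed above: a Gaussian
# centred at a*s + b with scale sqrt(lambda), truncated to [0, 1].  The
# 1/sqrt(lambda) prefactor below is assumed so that p(. | s) is a proper
# density; a, b, lambda and beta are the values quoted on the slide.

A, B, LAM, BETA = 0.05, 0.85, 100.0, 0.85

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal pdf."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def p(t, s):
    """Truncated-Gaussian transition density p(t | s) on [0, 1]."""
    m, sd = A * s + B, sqrt(LAM)
    return phi((t - m) / sd) / (sd * (Phi((1.0 - m) / sd) - Phi(-m / sd)))

def K(s, t):
    """Kernel of the Fredholm form V(s) = pi(s) + int_0^1 K(s, t) V(t) dt."""
    return BETA * p(t, s)

# Riemann-sum check that p(. | s) integrates to (approximately) one on [0, 1].
grid = [i / 10_000 for i in range(10_001)]
for s in (0.0, 0.5, 1.0):
    print(s, sum(p(t, s) for t in grid) / len(grid))
```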