Monte Carlo Solution of Integral Equations (of the second kind)

Adam M. Johansen
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/
March 7th, 2011
MIRaW – Monte Carlo Methods Day
Outline

- Background
  - Importance Sampling (IS)
  - Fredholm Equations of the 2nd Kind
- Methodology
  - Sequential IS and 2nd Kind Fredholm Equations
  - A Path-space Interpretation
  - MCMC for 2nd Kind Fredholm Equations
- Results
  - A Toy Example
  - An Asset-Pricing Example
The Monte Carlo Method

- Consider estimating:
  $I_\varphi = \mathbb{E}_f[\varphi(X)] = \int f(x)\varphi(x)\,dx$

- Given $X_1, \dots, X_N \overset{\text{iid}}{\sim} f$:
  $\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^N \varphi(X_i) = I_\varphi$

- Alternatively, approximate:
  $\hat{f}(x) = \frac{1}{N}\sum_{i=1}^N \delta(x - X_i)$
  and use $\mathbb{E}_{\hat{f}}[\varphi \mid X_1, \dots, X_N] \approx \mathbb{E}_f[\varphi]$.
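The plain Monte Carlo estimator above is easy to exercise numerically. A minimal sketch (not from the talk; the target, test function and sample size are illustrative choices): take $f$ to be a standard normal and $\varphi(x) = x^2$, so $I_\varphi = 1$.

```python
import random

def monte_carlo(phi, sample_f, N):
    """Estimate E_f[phi(X)] by averaging phi over N iid draws from f."""
    return sum(phi(sample_f()) for _ in range(N)) / N

random.seed(1)
# f = standard normal, phi(x) = x^2, so I_phi = Var(X) = 1.
estimate = monte_carlo(lambda x: x * x,
                       lambda: random.gauss(0.0, 1.0), N=100_000)
assert abs(estimate - 1.0) < 0.05
```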
Importance Sampling

- If $f \ll g$ (i.e. $g(x) = 0 \Rightarrow f(x) = 0$):
  $I_\varphi = \int g(x)\,\underbrace{\frac{f(x)}{g(x)}}_{w(x)}\,\varphi(x)\,dx = \mathbb{E}_g[w(X)\varphi(X)]$

- Given $X_1, \dots, X_N \overset{\text{iid}}{\sim} g$:
  $\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^N w(X_i)\varphi(X_i) = I_\varphi$

- Alternatively, approximate:
  $\hat{f}(x) = \frac{1}{N}\sum_{i=1}^N w(X_i)\,\delta(x - X_i)$
  and use $\mathbb{E}_{\hat{f}}[\varphi \mid X_1, \dots, X_N] \approx \mathbb{E}_f[\varphi]$.
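A hedged numerical sketch of the weighted estimator (the target, proposal and test function are illustrative, not from the talk): target $f = \mathrm{Exp}(1)$, proposal $g = \mathrm{Exp}(1/2)$, $\varphi(x) = x$, so $I_\varphi = 1$.

```python
import math
import random

def importance_sampling(phi, f_pdf, g_pdf, sample_g, N):
    """Estimate E_f[phi(X)] using draws from g, weighted by w = f/g."""
    total = 0.0
    for _ in range(N):
        x = sample_g()
        total += (f_pdf(x) / g_pdf(x)) * phi(x)
    return total / N

random.seed(2)
# Target f = Exp(1); proposal g = Exp(1/2); phi(x) = x, so I_phi = 1.
f_pdf = lambda x: math.exp(-x)
g_pdf = lambda x: 0.5 * math.exp(-0.5 * x)
est = importance_sampling(lambda x: x, f_pdf, g_pdf,
                          lambda: random.expovariate(0.5), N=200_000)
assert abs(est - 1.0) < 0.05
```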
Fredholm Equations of the Second Kind

- Consider the equation:
  $f(x) = \int_E f(y)K(x,y)\,dy + g(x)$

- in which
  - the source $g(x)$ and
  - the transition $K(x,y)$
  are known, but
  - $f(x)$ is unknown.

- This is a Fredholm equation of the second kind.
- Finding $f(x)$ is an inverse problem.
The Von Neumann Expansion

- We have the expansion:
  $f(x) = \int_E f(y)K(x,y)\,dy + g(x)$
  $\phantom{f(x)} = \int_E \left[\int_E f(z)K(y,z)\,dz + g(y)\right] K(x,y)\,dy + g(x)$
  $\phantom{f(x)} \;\;\vdots$
  $\phantom{f(x)} = g(x) + \sum_{n=1}^\infty \int K^n(x,y)\,g(y)\,dy$

  where:
  $K^1(x,y) = K(x,y), \qquad K^n(x,y) = \int_E K^{n-1}(x,z)K(z,y)\,dz$

- For convergence, it is sufficient that:
  $|g(x)| + \sum_{n=1}^\infty \left|\int K^n(x,y)\,g(y)\,dy\right| < \infty$
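The expansion can be checked deterministically by summing the series on a quadrature grid. A sketch, borrowing the kernel of the toy example used later in the talk, $K(x,y) = \tfrac{1}{3}e^{x-y}$ and $g(x) = \tfrac{2}{3}e^x$ on $[0,1]$, whose solution is $e^x$ (the grid size and truncation level are illustrative choices):

```python
import math

# Sum the truncated Von Neumann series g + Kg + K^2 g + ... on a midpoint
# grid, for K(x,y) = exp(x-y)/3, g(x) = (2/3)exp(x); the solution is exp(x).
M = 200                          # grid points for the quadrature
h = 1.0 / M
xs = [(j + 0.5) * h for j in range(M)]
g = [2.0 / 3.0 * math.exp(x) for x in xs]

f = list(g)                      # running partial sum of the series
term = list(g)
for n in range(1, 30):           # add the n-th term, K^n g
    term = [h * sum(math.exp(x - y) / 3.0 * t for y, t in zip(xs, term))
            for x in xs]
    f = [a + b for a, b in zip(f, term)]

# The truncated series matches exp(x) closely on the grid.
assert max(abs(a - math.exp(x)) for a, x in zip(f, xs)) < 1e-3
```

Here each term is a factor $1/3$ smaller than the last, so thirty terms are ample; the geometric decay is exactly the sufficient condition on the previous slide.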
A Sampling Approach

- Choose $\mu$, a pdf on $E$.
- Define $E' := E \cup \{\dagger\}$.
- Let $M(x,y)$ be a Markov transition kernel on $E'$ such that:
  - $\dagger$ is absorbing,
  - $M(x,\dagger) = P_d$, and
  - $M(x,y) > 0$ whenever $K(x,y) > 0$.
- The pair $(\mu, M)$ defines a Markov chain on $E'$.
Sequential Importance Sampling (SIS) Algorithm

- Simulate $N$ paths $\big\{X^{(i)}_{0:k^{(i)}+1}\big\}_{i=1}^N$ until $X^{(i)}_{k^{(i)}+1} = \dagger$.

- Calculate weights:
  $W\big(X^{(i)}_{0:k^{(i)}}\big) =
  \begin{cases}
  \dfrac{1}{\mu(X^{(i)}_0)} \left[\prod_{k=1}^{k^{(i)}} \dfrac{K\big(X^{(i)}_{k-1}, X^{(i)}_k\big)}{M\big(X^{(i)}_{k-1}, X^{(i)}_k\big)}\right] \dfrac{g\big(X^{(i)}_{k^{(i)}}\big)}{P_d} & \text{if } k^{(i)} \ge 1, \\[2ex]
  \dfrac{g\big(X^{(i)}_0\big)}{\mu\big(X^{(i)}_0\big)\,P_d} & \text{if } k^{(i)} = 0.
  \end{cases}$

- Approximate $f(x_0)$ with:
  $\hat{f}(x_0) = \frac{1}{N}\sum_{i=1}^N W\big(X^{(i)}_{0:k^{(i)}}\big)\,\delta\big(x_0 - X^{(i)}_0\big)$
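A runnable sketch of the SIS algorithm for the toy problem used later in the talk ($K(x,y) = \tfrac{1}{3}e^{x-y}$, $g(x) = \tfrac{2}{3}e^x$, solution $e^x$). As the later slides note, $x_0$ can be fixed and everything else simulated, so the $\mu$ term drops from the weight; the killing probability and sample size are illustrative choices.

```python
import math
import random

def sis_estimate(x0, N=50_000, p_d=0.4, seed=3):
    """SIS estimate of f(x0) for the toy Fredholm equation
       f(x) = int_0^1 (1/3)e^{x-y} f(y) dy + (2/3)e^x   (solution: e^x).
    x0 is fixed; each chain proposes uniform moves on [0,1] and is
    killed with probability p_d at every step."""
    K = lambda x, y: math.exp(x - y) / 3.0
    g = lambda x: 2.0 / 3.0 * math.exp(x)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        x, w = x0, 1.0
        while rng.random() >= p_d:        # survive with probability 1 - p_d
            y = rng.random()              # M's continuous part is U[0, 1]
            w *= K(x, y) / (1.0 - p_d)    # density of a surviving move
            x = y
        total += w * g(x) / p_d           # terminal factor g(X_k) / P_d
    return total / N

est = sis_estimate(0.5)
assert abs(est - math.exp(0.5)) < 0.1
```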
Von Neumann Representation Revisited

- Starting from
  $f(x_0) = g(x_0) + \sum_{n=1}^\infty \int_E K^n(x_0, x_n)\,g(x_n)\,dx_n$

- We have directly:
  $f(x_0) = g(x_0) + \sum_{n=1}^\infty \int_{E^n} \left[\prod_{p=1}^n K(x_{p-1}, x_p)\right] g(x_n)\,dx_1 \dots dx_n$
  $\phantom{f(x_0)} = f_0(x_0) + \sum_{n=1}^\infty \int_{E^n} f_n(x_{0:n})\,dx_1 \dots dx_n$

  with $f_0(x_0) := g(x_0)$ and $f_n(x_{0:n}) := g(x_n) \prod_{p=1}^n K(x_{p-1}, x_p)$.
A Path-Space Interpretation of SIS

- Define:
  $\pi(n, x_{0:n}) = p_n\,\pi_n(x_{0:n})$
  on $F = \bigcup_{k=0}^\infty \{k\} \times E^{k+1}$, where
  $p_n = \mathbb{P}\big(X_{0:n} \in E^{n+1},\, X_{n+1} = \dagger\big) = (1 - P_d)^n P_d$
  $\pi_n(x_{0:n}) = \dfrac{\mu(x_0) \prod_{k=1}^n M(x_{k-1}, x_k)}{(1 - P_d)^n}$

- Define also:
  $f(k, x_{0:k}) = f_0(x_0)\,\delta_{0,k} + \sum_{n=1}^\infty f_n(x_{0:n})\,\delta_{n,k}$

- Approximate with:
  $\hat{f}(x_0) = \dfrac{1}{N}\sum_{i=1}^N \dfrac{f\big(k^{(i)}, X^{(i)}_{0:k^{(i)}}\big)}{\pi\big(k^{(i)}, X^{(i)}_{0:k^{(i)}}\big)}\,\delta\big(x_0 - X^{(i)}_0\big)$
Limitations of SIS

- Arbitrary geometric distribution for $k$.
- Arbitrary initial distribution.
- Importance weights:
  $\dfrac{1}{\mu\big(X^{(i)}_0\big)} \left[\prod_{k=1}^{k^{(i)}} \dfrac{K\big(X^{(i)}_{k-1}, X^{(i)}_k\big)}{M\big(X^{(i)}_{k-1}, X^{(i)}_k\big)}\right] \dfrac{g\big(X^{(i)}_{k^{(i)}}\big)}{P_d}$
- Resampling will not help.
Optimal Importance Distribution

- Minimize the variance of the absolute value of the weights.
- Consider choosing
  $\pi(n, x_{0:n}) = \sum_{k=0}^\infty p_k\,\delta_{k,n}\,\pi_k(x_{0:k})$
  $\pi_n(x_{0:n}) = c_n^{-1} |f_n(x_{0:n})|$
  $p_n = c_n / c$
  $c_n = \int_{E^{n+1}} |f_n(x_{0:n})|\,dx_{0:n}$
  $c = \sum_{n=0}^\infty c_n.$

- We have:
  $f(x_0) = c\,\mathrm{sgn}\big(f_0(x_0)\big)\,\pi(0, x_0) + \sum_{n=1}^\infty \int_{E^n} c\,\mathrm{sgn}\big(f_n(x_{0:n})\big)\,\pi(n, x_{0:n})\,dx_{1:n}.$
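For the toy example considered later in the talk ($K(x,y) = \tfrac{1}{3}e^{x-y}$, $g(x) = \tfrac{2}{3}e^x$ on $[0,1]$), the optimal quantities can be written in closed form; a quick check, not on the slides:

```latex
f_n(x_{0:n}) = g(x_n)\prod_{p=1}^{n} K(x_{p-1},x_p)
             = \tfrac{2}{3}e^{x_n}\prod_{p=1}^{n} \tfrac{1}{3}e^{x_{p-1}-x_p}
             = \tfrac{2}{3}\left(\tfrac{1}{3}\right)^{n} e^{x_0},
\qquad
c_n = \tfrac{2}{3}\left(\tfrac{1}{3}\right)^{n}(e-1),
\qquad
c = e - 1.
```

Hence $p_n = c_n/c = \tfrac{2}{3}(\tfrac{1}{3})^n$: the optimal chain-length distribution is geometric with success probability $2/3$, and under $\pi_n$ the coordinate $x_0$ has density $e^{x_0}/(e-1)$ with the remaining coordinates uniform.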
That’s where the MCMC comes in

- Obtain a sample-approximation of $\pi$ . . .
  - using your favourite sampler,
  - Reversible Jump MCMC [2], for instance.
- Use the samples to approximate the optimal proposal.
- Combine with importance sampling.
- Estimate the ‘normalising constant’, $c$.
A Simple RJMCMC Algorithm

Initialization.
- Set $\big(k^{(1)}, X^{(1)}_{0:k^{(1)}}\big)$ randomly or deterministically.

Iteration $i \ge 2$.
- Sample $U \sim \mathcal{U}[0, 1]$.
- If $U \le u_{k^{(i-1)}}$:
  - $\big(k^{(i)}, x^{(i)}_{0:k^{(i)}}\big) \leftarrow$ Update Move.
- Else if $U \le u_{k^{(i-1)}} + b_{k^{(i-1)}}$:
  - $\big(k^{(i)}, x^{(i)}_{0:k^{(i)}}\big) \leftarrow$ Birth Move.
- Else:
  - $\big(k^{(i)}, x^{(i)}_{0:k^{(i)}}\big) \leftarrow$ Death Move.
Very Simple Update Move

- Set $k^{(i)} = k^{(i-1)}$; sample
  $J \sim \mathcal{U}\{0, 1, \dots, k^{(i)}\}$ and $X^*_J \sim q_u\big(X^{(i-1)}_J, \cdot\big)$.

- With probability
  $\min\left\{1,\; \dfrac{\pi\big(k^{(i)}, \big(X^{(i-1)}_{0:J-1}, X^*_J, X^{(i-1)}_{J+1:k^{(i)}}\big)\big)\,q_u\big(X^*_J, X^{(i-1)}_J\big)}{\pi\big(k^{(i)}, X^{(i-1)}_{0:k^{(i)}}\big)\,q_u\big(X^{(i-1)}_J, X^*_J\big)}\right\}$
  set $X^{(i)}_{0:k^{(i)}} = \big(X^{(i-1)}_{0:J-1}, X^*_J, X^{(i-1)}_{J+1:k^{(i)}}\big)$;
  otherwise set $X^{(i)}_{0:k^{(i)}} = X^{(i-1)}_{0:k^{(i-1)}}$.

- Acceptance probabilities involve:
  $\dfrac{\pi(l, x_{0:l})}{\pi(k, x_{0:k})} = \dfrac{c_l\,\pi_l(x_{0:l})}{c_k\,\pi_k(x_{0:k})} = \dfrac{|f_l(x_{0:l})|}{|f_k(x_{0:k})|}.$
Very Simple Birth Move

- Sample $J \sim \mathcal{U}\{0, 1, \dots, k^{(i-1)}\}$ and $X^*_J \sim q_b(\cdot)$.
- With probability
  $\min\left\{1,\; \dfrac{\pi\big(k^{(i-1)}+1, \big(X^{(i-1)}_{0:J-1}, X^*_J, X^{(i-1)}_{J:k^{(i-1)}}\big)\big)\,d_{k^{(i-1)}+1}}{\pi\big(k^{(i-1)}, X^{(i-1)}_{0:k^{(i-1)}}\big)\,q_b\big(X^*_J\big)\,b_{k^{(i-1)}}}\right\}$
  set $k^{(i)} = k^{(i-1)} + 1$, $X^{(i)}_{0:k^{(i)}} = \big(X^{(i-1)}_{0:J-1}, X^*_J, X^{(i-1)}_{J:k^{(i-1)}}\big)$;
  otherwise set $k^{(i)} = k^{(i-1)}$, $X^{(i)}_{0:k^{(i)}} = X^{(i-1)}_{0:k^{(i-1)}}$.
Very Simple Death Move

- Sample $J \sim \mathcal{U}\{0, 1, \dots, k^{(i-1)}\}$.
- With probability
  $\min\left\{1,\; \dfrac{\pi\big(k^{(i-1)}-1, \big(X^{(i-1)}_{0:J-1}, X^{(i-1)}_{J+1:k^{(i-1)}}\big)\big)\,q_b\big(X^{(i-1)}_J\big)\,b_{k^{(i-1)}-1}}{\pi\big(k^{(i-1)}, X^{(i-1)}_{0:k^{(i-1)}}\big)\,d_{k^{(i-1)}}}\right\}$
  set $k^{(i)} = k^{(i-1)} - 1$, $X^{(i)}_{0:k^{(i)}} = \big(X^{(i-1)}_{0:J-1}, X^{(i-1)}_{J+1:k^{(i-1)}}\big)$;
  otherwise set $k^{(i)} = k^{(i-1)}$, $X^{(i)}_{0:k^{(i)}} = X^{(i-1)}_{0:k^{(i-1)}}$.
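The three moves above can be sketched for the toy problem, where $|f_k(x_{0:k})| = g(x_k)\prod_p K(x_{p-1},x_p) = \tfrac{2}{3}(\tfrac{1}{3})^k e^{x_0}$, so the target marginal of $k$ is geometric with $p_0 = 2/3$. This is a sketch, not the authors' implementation: uniform proposals and $u = b = d = 1/3$ follow the toy-example settings, and for exact reversibility this version lets the death move delete $X_J$ only for $J < k$, so that birth and death are exact inverses and the position-selection factors cancel.

```python
import math
import random

# Unnormalised target pi(k, x_{0:k}) proportional to |f_k(x_{0:k})| for the
# toy problem: K(x,y) = exp(x-y)/3, g(x) = (2/3)exp(x) on [0, 1].
K = lambda x, y: math.exp(x - y) / 3.0
g = lambda x: 2.0 / 3.0 * math.exp(x)

def f_k(path):
    """|f_k| evaluated along a path (positive for this problem)."""
    val = g(path[-1])
    for a, b in zip(path, path[1:]):
        val *= K(a, b)
    return val

def rjmcmc(n_iters=60_000, seed=4):
    rng = random.Random(seed)
    path = [rng.random()]            # start at k = 0
    ks = []
    for _ in range(n_iters):
        u, k = rng.random(), len(path) - 1
        if u < 1 / 3:                # update: resample coordinate J
            J = rng.randrange(k + 1)
            prop = path[:J] + [rng.random()] + path[J + 1:]
            if rng.random() < f_k(prop) / f_k(path):
                path = prop
        elif u < 2 / 3:              # birth: insert a uniform draw before J
            J = rng.randrange(k + 1)
            prop = path[:J] + [rng.random()] + path[J:]
            if rng.random() < f_k(prop) / f_k(path):
                path = prop
        elif k >= 1:                 # death: remove X_J, J < k (inverse of birth)
            J = rng.randrange(k)
            prop = path[:J] + path[J + 1:]
            if rng.random() < f_k(prop) / f_k(path):
                path = prop
        ks.append(len(path) - 1)
    return ks

ks = rjmcmc()
# The stationary marginal of k is geometric with p_0 = 2/3.
assert abs(ks.count(0) / len(ks) - 2 / 3) < 0.04
```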
Estimating the Normalising Constant

- Estimate $p_n$ as:
  $\hat{p}_n = \dfrac{1}{N}\sum_{i=1}^N \delta_{n, k^{(i)}}$

- Estimate directly, also,
  $\tilde{c}_0 \approx \int g(x)\,dx$
  and hence
  $\hat{c} = \tilde{c}_0 / \hat{p}_0, \qquad \hat{c}_n = \hat{c}\,\hat{p}_n.$
Toy Example

- The simple algorithm was applied to
  $f(x) = \int_0^1 \underbrace{\tfrac{1}{3}\exp(x - y)}_{K(x,y)}\,f(y)\,dy + \underbrace{\tfrac{2}{3}\exp(x)}_{g(x)}.$

- The solution is $f(x) = \exp(x)$.
- Birth, death and update probabilities were set to $1/3$.
- A uniform distribution over the unit interval was used for all proposals.
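That $\exp(x)$ solves the toy equation can be verified directly (a quick deterministic check via midpoint quadrature; the grid size is an arbitrary choice):

```python
import math

# Check that f(x) = exp(x) satisfies
#   f(x) = int_0^1 (1/3)exp(x-y) f(y) dy + (2/3)exp(x).
K = lambda x, y: math.exp(x - y) / 3.0
g = lambda x: 2.0 / 3.0 * math.exp(x)
M, h = 1000, 1.0 / 1000
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    integral = h * sum(K(x, (j + 0.5) * h) * math.exp((j + 0.5) * h)
                       for j in range(M))
    assert abs(integral + g(x) - math.exp(x)) < 1e-6
```

The integrand $K(x,y)e^{y} = \tfrac{1}{3}e^{x}$ is constant in $y$, so the quadrature is exact up to floating-point error: the integral is $e^{x}/3$, and adding $\tfrac{2}{3}e^{x}$ recovers $e^{x}$.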
Point Estimation for the Toy Example

[Figure: “Point Estimates and their Standard Deviations” — estimates of f(x0) with error bars for N = 100, 1,000 and 10,000, plotted against x0 alongside the truth.]

(Fix x0 and simulate everything else.)
Estimating f(x) from Samples

- Given $\hat{f}(x) = \frac{1}{N}\sum_{i=1}^N W^{(i)}\,\delta\big(x - X^{(i)}_0\big)$,
- such that
  $\int_E \hat{f}(x)\varphi(x)\,dx \approx \int f(x)\varphi(x)\,dx$
- Simple option: histogram ($\varphi(x) = \mathbb{I}_A(x)$).
- More subtle option:
  $\tilde{f}(x) = \int K(x,y)\hat{f}(y)\,dy + g(x) = g(x) + \frac{1}{N}\sum_{i=1}^N W^{(i)} K\big(x, X^{(i)}_0\big).$
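A sketch of the "more subtle" smoothed estimator for the toy problem: generate weighted samples $(W^{(i)}, X^{(i)}_0)$ by SIS with $\mu = \mathcal{U}[0,1]$, then evaluate $\tilde f(x) = g(x) + \frac{1}{N}\sum_i W^{(i)} K(x, X^{(i)}_0)$. The killing probability and sample size are illustrative choices, not from the talk.

```python
import math
import random

K = lambda x, y: math.exp(x - y) / 3.0
g = lambda x: 2.0 / 3.0 * math.exp(x)

def weighted_samples(N=100_000, p_d=0.4, seed=5):
    """SIS with mu = U[0,1]: return (weight, X_0) pairs for the toy problem."""
    rng = random.Random(seed)
    samples = []
    for _ in range(N):
        x0 = rng.random()                  # X_0 ~ mu = U[0,1], density 1
        x, w = x0, 1.0
        while rng.random() >= p_d:         # survive with probability 1 - p_d
            y = rng.random()
            w *= K(x, y) / (1.0 - p_d)
            x = y
        samples.append((w * g(x) / p_d, x0))
    return samples

samples = weighted_samples()
f_tilde = lambda x: g(x) + sum(w * K(x, x0) for w, x0 in samples) / len(samples)
# The smoothed estimate is close to the true solution exp(x).
for x in (0.2, 0.5, 0.8):
    assert abs(f_tilde(x) - math.exp(x)) < 0.1
```

Unlike the histogram, $\tilde f$ inherits the smoothness of $K$ and $g$, so it can be evaluated at any $x$ without binning.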
Estimated f(x) for the Toy Problem

[Figure: two estimates of f(x) on [0, 1] (“Estimate 1” and “Estimate 2”) plotted against the truth.]
An Asset-Pricing Problem

- The rational expectation pricing model (cf. [4]) requires that:
  - the price $V(s)$ of an asset
  - in some state $s \in E$
  satisfies
  $V(s) = \pi(s) + \beta \int_E V(t)\,p(t|s)\,dt.$

- Where:
  - $\pi(s)$ denotes the return on investment,
  - $\beta$ is a discount factor, and
  - $p(t|s)$ is a Markov kernel which models the state evolution.

- Simple example:
  $E = [0, 1], \qquad \beta = 0.85,$
  $p(t|s) = \dfrac{\phi\!\left(\frac{t - [as+b]}{\sqrt{\lambda}}\right)}{\Phi\!\left(\frac{1 - [as+b]}{\sqrt{\lambda}}\right) - \Phi\!\left(-\frac{as+b}{\sqrt{\lambda}}\right)}$
  with $a = 0.05$, $b = 0.85$ and $\lambda = 100$.
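The state-evolution kernel above is a Gaussian centred at $as + b$ and truncated to $E = [0,1]$. A sketch of such a kernel (the scale parameter `sigma` is treated as a free, illustrative choice rather than read off the slide's $\lambda$, and the normalisation is computed explicitly so that $p(\cdot|s)$ integrates to one):

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p(t, s, a=0.05, b=0.85, sigma=0.1):
    """Truncated-Gaussian transition density on [0, 1], centred at a*s + b."""
    m = a * s + b
    norm = Phi((1 - m) / sigma) - Phi(-m / sigma)
    return phi((t - m) / sigma) / (sigma * norm)

# Sanity check: p(.|s) is a probability density on [0, 1].
h = 1e-4
total = h * sum(p((j + 0.5) * h, 0.5) for j in range(10_000))
assert abs(total - 1.0) < 1e-3
```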
Estimated f(x) for the Asset-Pricing Case

[Figure: the estimated solution plotted over [0, 1].]
Simulated Chain Lengths for Different Starting Points

[Figure: empirical distributions of the simulated chain length for two starting points (legend entries x0 = 0 and x0 = 1), with lengths ranging up to about 18.]
Conclusions

- Many Monte Carlo methods can be viewed as importance sampling.
- Many problems can be recast as the calculation of expectations.
- Approximate solution of Fredholm equations.
- Also applicable to Volterra equations [3].
- Any Monte Carlo method could be used.
- Some room for further investigation of this method.
References
[1] A. Doucet, A. M. Johansen, and V. B. Tadić. On solving
integral equations using Markov Chain Monte Carlo. Applied
Mathematics and Computation, 216:2869–2889, 2010.
[2] P. J. Green. Reversible jump Markov Chain Monte Carlo
computation and Bayesian model determination.
Biometrika, 82:711–732, 1995.
[3] G. W. Peters, A. M. Johansen, and A. Doucet. Simulation
of the annual loss distribution in operational risk via Panjer
recursions and Volterra integral equations for value at risk
and expected shortfall estimation. Journal of Operational
Risk, 2(3):29–58, Fall 2007.
[4] J. Rust, J. F. Traub, and H. Woźniakowski. Is there a curse
of dimensionality for contraction fixed points in the worst
case? Econometrica, 70(1):285–329, 2002.
32