Differential Geometric Simulation Methods for Uncertainty Quantification in Complex Model Systems

advertisement
Differential Geometric Simulation Methods for Uncertainty
Quantification in Complex Model Systems
Mark Girolami
Department of Statistics
University of Warwick
Warwick Centre for Predictive Modelling
February 2015
Never Mind the Big Data here’s the Big Models
Never Mind the Big Data here’s the Big Models
Never Mind the Big Data here’s the Big Models
I
Statistical inference in mathematical models (Finite Elements) emerging
area of rapid development and growing importance in statistical
methodology
Never Mind the Big Data here’s the Big Models
I
Statistical inference in mathematical models (Finite Elements) emerging
area of rapid development and growing importance in statistical
methodology
I
Quantifying and propagating uncertainty in models e.g. Partial
Differential Equations describing natural phenomena
Never Mind the Big Data here’s the Big Models
I
Statistical inference in mathematical models (Finite Elements) emerging
area of rapid development and growing importance in statistical
methodology
I
Quantifying and propagating uncertainty in models e.g. Partial
Differential Equations describing natural phenomena
I
Examples include Shallow Water Components of Global Climate models
(UK MET Office)
Never Mind the Big Data here’s the Big Models
I
Statistical inference in mathematical models (Finite Elements) emerging
area of rapid development and growing importance in statistical
methodology
I
Quantifying and propagating uncertainty in models e.g. Partial
Differential Equations describing natural phenomena
I
Examples include Shallow Water Components of Global Climate models
(UK MET Office)
I
Ground water flow, subsurface multi-phase flow e.g. oil aquifers (BP,
Schlumberger)
Never Mind the Big Data here’s the Big Models
I
Statistical inference in mathematical models (Finite Elements) emerging
area of rapid development and growing importance in statistical
methodology
I
Quantifying and propagating uncertainty in models e.g. Partial
Differential Equations describing natural phenomena
I
Examples include Shallow Water Components of Global Climate models
(UK MET Office)
I
Ground water flow, subsurface multi-phase flow e.g. oil aquifers (BP,
Schlumberger)
I
Statistical inference in such models hugely challenging due to size of
models and complexity of induced probability measures
Never Mind the Big Data here’s the Big Models
I
Statistical inference in mathematical models (Finite Elements) emerging
area of rapid development and growing importance in statistical
methodology
I
Quantifying and propagating uncertainty in models e.g. Partial
Differential Equations describing natural phenomena
I
Examples include Shallow Water Components of Global Climate models
(UK MET Office)
I
Ground water flow, subsurface multi-phase flow e.g. oil aquifers (BP,
Schlumberger)
I
Statistical inference in such models hugely challenging due to size of
models and complexity of induced probability measures
I
This talk will concentrate on the issues associated with sampling from
the induced measures from such models and associated data
System Identification: Nonlinear ODE Oscillator Model
Clock
mRNA
...
Clock
Protein
dx1
dt
dx2
dt
Protein
=
k1
− m1 x1
1 + xnρ
=
k2 x1 − m2 x2
..
.
dxn
dt
=
kn xn−1 − mn xn
Inhibition
Protein
Systems Identification - Posterior Inference
Mixing of Markov Chains
5
4
3
2
1
1
2
3
4
5
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
−∇ · (eu ∇w)
=
0
−eu ∇w · n
=
u Bi
=
−1
u
−e ∇w · n
in
Ω
on
on
∂Ω/ ΓR
ΓR
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
−∇ · (eu ∇w)
=
0
−eu ∇w · n
=
u Bi
=
−1
u
−e ∇w · n
I
in
Ω
on
on
∂Ω/ ΓR
ΓR
Forward state is w and u the logarithm of distributed thermal conductivity
on Ω, n the unit outward normal on Ω, and Bi the Biot number.
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
−∇ · (eu ∇w)
=
0
−eu ∇w · n
=
u Bi
=
−1
u
−e ∇w · n
I
in
Ω
on
on
∂Ω/ ΓR
ΓR
Forward state is w and u the logarithm of distributed thermal conductivity
on Ω, n the unit outward normal on Ω, and Bi the Biot number.
Illustrative Heat Conduction Problem
4
3.5
3
2.5
2
1.5
1
0.5
0
−3
−2
−1
0
1
2
3
Illustrative Heat Conduction Problem
Illustrative Heat Conduction Problem
I
Take one finite element discretisation of domain and one observation at
leftmost boundary
Illustrative Heat Conduction Problem
I
Take one finite element discretisation of domain and one observation at
leftmost boundary
I
Consider induced bivariate posterior for varying forms of prior Gaussian
measure
πpost
5
u2
0
−5
−10
−10
−5
u1
0
5
MCMC from Diffusions and Geodesics
I
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
Girolami, M. & Calderhead, B.
J.R.Statist. Soc. B, with discussion, (2011), 73, 2, 123 - 214.
http://www2.warwick.ac.uk/mgirolami
Talk Outline
I
Motivation to improve MCMC capability for challenging problems
Talk Outline
I
Motivation to improve MCMC capability for challenging problems
I
Exploring differential geometric concepts in MCMC methodology
Talk Outline
I
Motivation to improve MCMC capability for challenging problems
I
Exploring differential geometric concepts in MCMC methodology
I
Diffusions across Riemann manifold as basis for MCMC
Talk Outline
I
Motivation to improve MCMC capability for challenging problems
I
Exploring differential geometric concepts in MCMC methodology
I
Diffusions across Riemann manifold as basis for MCMC
I
Geodesic flows on manifold form basis of MCMC methods
Talk Outline
I
Motivation to improve MCMC capability for challenging problems
I
Exploring differential geometric concepts in MCMC methodology
I
Diffusions across Riemann manifold as basis for MCMC
I
Geodesic flows on manifold form basis of MCMC methods
I
Illustrative examples
Talk Outline
I
Motivation to improve MCMC capability for challenging problems
I
Exploring differential geometric concepts in MCMC methodology
I
Diffusions across Riemann manifold as basis for MCMC
I
Geodesic flows on manifold form basis of MCMC methods
I
Illustrative examples
I
Further Work and Conclusions
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
I
Draw θ n from ergodic Markov process with stationary distribution π(θ)
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
I
Draw θ n from ergodic Markov process with stationary distribution π(θ)
I
Construct transition kernel as product of two components
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
I
Draw θ n from ergodic Markov process with stationary distribution π(θ)
I
Construct transition kernel as product of two components
I
Propose a move θ → θ 0 with probability pp (θ 0 |θ)
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
I
Draw θ n from ergodic Markov process with stationary distribution π(θ)
I
Construct transition kernel as product of two components
I
I
Propose a move θ → θ 0 with probability pp (θ 0 |θ)
accept or reject proposal with probability
π(θ 0 )pp (θ|θ 0 )
pa (θ 0 |θ) = min 1,
0
π(θ)pp (θ |θ)
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
I
Draw θ n from ergodic Markov process with stationary distribution π(θ)
I
Construct transition kernel as product of two components
I
I
I
Propose a move θ → θ 0 with probability pp (θ 0 |θ)
accept or reject proposal with probability
π(θ 0 )pp (θ|θ 0 )
pa (θ 0 |θ) = min 1,
0
π(θ)pp (θ |θ)
Convergence rate and asymptotic variance dependent on pp (θ 0 |θ)
Motivation Simulation Based Inference
I
Monte Carlo method employs samples from π(θ) to obtain estimate
Z
1
1 X
φ(θ)π(θ)dθ =
φ(θ n ) + O(N − 2 )
n
N
I
Draw θ n from ergodic Markov process with stationary distribution π(θ)
I
Construct transition kernel as product of two components
I
I
Propose a move θ → θ 0 with probability pp (θ 0 |θ)
accept or reject proposal with probability
π(θ 0 )pp (θ|θ 0 )
pa (θ 0 |θ) = min 1,
0
π(θ)pp (θ |θ)
I
Convergence rate and asymptotic variance dependent on pp (θ 0 |θ)
I
Success of MCMC reliant upon appropriate proposal design
Differential Geometric Concepts in MCMC
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
I
Jeffreys, 1948 to first order
Z
p(y; θ + δθ)
dy ≈ δθ T G(θ)δθ
D(θ||δθ) = p(y; θ + δθ) log
p(y; θ)
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
I
Jeffreys, 1948 to first order
Z
p(y; θ + δθ)
dy ≈ δθ T G(θ)δθ
D(θ||δθ) = p(y; θ + δθ) log
p(y; θ)
I
Expected Fisher Information G(θ) is metric tensor of a Riemann manifold
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
I
Jeffreys, 1948 to first order
Z
p(y; θ + δθ)
dy ≈ δθ T G(θ)δθ
D(θ||δθ) = p(y; θ + δθ) log
p(y; θ)
I
Expected Fisher Information G(θ) is metric tensor of a Riemann manifold
I
Non-Euclidean geometry - invariants, connections, curvature, geodesics
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
I
Jeffreys, 1948 to first order
Z
p(y; θ + δθ)
dy ≈ δθ T G(θ)δθ
D(θ||δθ) = p(y; θ + δθ) log
p(y; θ)
I
Expected Fisher Information G(θ) is metric tensor of a Riemann manifold
I
Non-Euclidean geometry - invariants, connections, curvature, geodesics
I
Asymptotic statistical analysis. e.g. Amari, 1981; Murray & Rice, 1993;
Critchley et al, 1993; Kass, 1989; Dawid, 1975; Lauritsen, 1989
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
I
Jeffreys, 1948 to first order
Z
p(y; θ + δθ)
dy ≈ δθ T G(θ)δθ
D(θ||δθ) = p(y; θ + δθ) log
p(y; θ)
I
Expected Fisher Information G(θ) is metric tensor of a Riemann manifold
I
Non-Euclidean geometry - invariants, connections, curvature, geodesics
I
Asymptotic statistical analysis. e.g. Amari, 1981; Murray & Rice, 1993;
Critchley et al, 1993; Kass, 1989; Dawid, 1975; Lauritsen, 1989
I
Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
Differential Geometric Concepts in MCMC
I
Denote expected Fisher Information as G(θ) = cov(∇θ L(θ))
I
Rao, 1945 to first order
Z
|p(y; θ + δθ) − p(y; θ)|2
χ2 (δθ) =
dy ≈ δθ T G(θ)δθ
p(y; θ)
I
Jeffreys, 1948 to first order
Z
p(y; θ + δθ)
dy ≈ δθ T G(θ)δθ
D(θ||δθ) = p(y; θ + δθ) log
p(y; θ)
I
Expected Fisher Information G(θ) is metric tensor of a Riemann manifold
I
Non-Euclidean geometry - invariants, connections, curvature, geodesics
I
Asymptotic statistical analysis. e.g. Amari, 1981; Murray & Rice, 1993;
Critchley et al, 1993; Kass, 1989; Dawid, 1975; Lauritsen, 1989
I
Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I
Can geometric structure be employed in Monte Carlo methodology?
Manifolds
A manifold is a smooth, curved surface: A set embedded in Rd , that locally
looks like Rn (n < d).
Example: the unit sphere (2-sphere): d = 3, n = 2
X 2
S2 = {x ∈ R3 :
xi = 1}
i
Manifolds
A manifold is a smooth, curved surface: A set embedded in Rd , that locally
looks like Rn (n < d).
Example: the unit sphere (2-sphere): d = 3, n = 2
X 2
S2 = {x ∈ R3 :
xi = 1}
i
x ∈ Rd are the embedded coordinates
Coordinate systems and Riemannian metrics
Can parameterise the manifold with a coordinate system in q ∈ Rn
q2
q1
(sin q1 sin q2 , cos q1 sin q2 , cos q2 ),
q1 ∈ [0, 2π], q2 ∈ [0, π]
Coordinate systems and Riemannian metrics
Can parameterise the manifold with a coordinate system in q ∈ Rn
q2
q1
(sin q1 sin q2 , cos q1 sin q2 , cos q2 ),
q1 ∈ [0, 2π], q2 ∈ [0, π]
The Euclidean metric k · k induces a Riemannian metric G in the coordinate
system:
X
kdxk2 =
G(q)dqi dqj
i,j
M.C. Escher, Heaven and Hell, 1960
Geodesics
Geodesics are the paths of shortest distance.
I
On a sphere, these are the great circles
q2
q1
Geodesics
Geodesics are the paths of shortest distance.
I
On a sphere, these are the great circles
q2
q1
Given an initial velocity v (0) ⊥ x(0), we have a nice explicit form
1
cos(αt) − sin(αt) 1
x(t) v (t) = x(0) v (0)
−1
sin(αt)
cos(αt)
α
where α = kv (0)k.
α
Geometric Concepts in MCMC
Geometric Concepts in MCMC
I
Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl
Geometric Concepts in MCMC
I
Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl
I
Christoffel symbols - characterise Levi-Civita connection on manifold
Geometric Concepts in MCMC
I
Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl
I
Christoffel symbols - characterise Levi-Civita connection on manifold
1 X im ∂gmk
∂gml
∂gkl
Γikl =
g
+
−
2 m
∂θl
∂θk
∂θm
Geometric Concepts in MCMC
I
Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl
I
Christoffel symbols - characterise Levi-Civita connection on manifold
1 X im ∂gmk
∂gml
∂gkl
Γikl =
g
+
−
2 m
∂θl
∂θk
∂θm
I
Geodesics - shortest path between two points on manifold
Geometric Concepts in MCMC
I
Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl
I
Christoffel symbols - characterise Levi-Civita connection on manifold
1 X im ∂gmk
∂gml
∂gkl
Γikl =
g
+
−
2 m
∂θl
∂θk
∂θm
I
Geodesics - shortest path between two points on manifold
X i dθk dθl
d 2 θi
+
Γkl
=0
2
dt
dt dt
k,l
Fisher–Rao metric
A family of probability densities {p(· | θ) : θ ∈ Θ}, the Fisher information
forms a natural Riemannian metric, known as Fisher–Rao metric:
∂2
log p(Y | θ)
gij = EY |θ −
∂θi ∂θj
Fisher–Rao metric
A family of probability densities {p(· | θ) : θ ∈ Θ}, the Fisher information
forms a natural Riemannian metric, known as Fisher–Rao metric:
∂2
log p(Y | θ)
gij = EY |θ −
∂θi ∂θj
Examples:
N(µ, σ 2 )
σ
µ
Fisher–Rao metric
A family of probability densities {p(· | θ) : θ ∈ Θ}, the Fisher information
forms a natural Riemannian metric, known as Fisher–Rao metric:
∂2
log p(Y | θ)
gij = EY |θ −
∂θi ∂θj
Examples:
N(µ, σ 2 )
σ
µ
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
I
Local inner product on tangent space defined by metric tensor, i.e.
δθ T G(θ)δθ, where θ = (µ, σ)T
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
I
Local inner product on tangent space defined by metric tensor, i.e.
δθ T G(θ)δθ, where θ = (µ, σ)T
I
Metric is Expected Fisher Information
−2
σ
G(µ, σ) =
0
0
2σ −2
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
I
Local inner product on tangent space defined by metric tensor, i.e.
δθ T G(θ)δθ, where θ = (µ, σ)T
I
Metric is Expected Fisher Information
−2
σ
G(µ, σ) =
0
I
0
2σ −2
Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 )
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
I
Local inner product on tangent space defined by metric tensor, i.e.
δθ T G(θ)δθ, where θ = (µ, σ)T
I
Metric is Expected Fisher Information
−2
σ
G(µ, σ) =
0
0
2σ −2
I
Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 )
I
Metric on tangent space
δθ T G(θ)δθ =
(δµ2 + 2δσ 2 )
σ2
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
I
Local inner product on tangent space defined by metric tensor, i.e.
δθ T G(θ)δθ, where θ = (µ, σ)T
I
Metric is Expected Fisher Information
−2
σ
G(µ, σ) =
0
0
2σ −2
I
Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 )
I
Metric on tangent space
δθ T G(θ)δθ =
I
(δµ2 + 2δσ 2 )
σ2
Metric tensor for univariate Normal defines a Hyperbolic Space
Illustration of Geometric Concepts
I
Consider Normal density p(x|µ, σ) = Nx (µ, σ)
I
Local inner product on tangent space defined by metric tensor, i.e.
δθ T G(θ)δθ, where θ = (µ, σ)T
I
Metric is Expected Fisher Information
−2
σ
G(µ, σ) =
0
0
2σ −2
I
Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 )
I
Metric on tangent space
δθ T G(θ)δθ =
(δµ2 + 2δσ 2 )
σ2
I
Metric tensor for univariate Normal defines a Hyperbolic Space
I
Consider densities N (0, 1) & N (1, 1) and N (0, 2) & N (1, 2)
Normal Density - Euclidean Parameter space
σ
N (0, 2)
N (1, 2)
N (0, 1)
N (1, 1)
µ
Normal Density - Riemannian Functional space
σ
N (0, 2)
N (1, 2)
N (0, 1)
N (1, 1)
µ
Langevin Diffusion on Riemannian manifold
I
Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ))
D
dθ =
X
p
1 −1
G−1 (θ)dW
G (θ)∇θ L(θ) −
G(θ)−1
ij Γij +
2
i,j
Langevin Diffusion on Riemannian manifold
I
Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ))
D
dθ =
X
p
1 −1
G−1 (θ)dW
G (θ)∇θ L(θ) −
G(θ)−1
ij Γij +
2
i,j
I
Discretised Langevin diffusion on manifold defines proposal mechanism
θ 0d = θ d +
D
p
X
2 −1
d
G (θ)∇θ L(θ) − 2
G(θ)−1
G−1 (θ)z
ij Γij + 2
d
d
i,j
Langevin Diffusion on Riemannian manifold
I
Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ))
D
dθ =
X
p
1 −1
G−1 (θ)dW
G (θ)∇θ L(θ) −
G(θ)−1
ij Γij +
2
i,j
I
Discretised Langevin diffusion on manifold defines proposal mechanism
I
D
p
X
2 −1
d
G (θ)∇θ L(θ) − 2
G(θ)−1
G−1 (θ)z
ij Γij + 2
d
d
i,j
Manifold with constant curvature then proposal mechanism reduces to
θ 0d = θ d +
θ0 = θ +
p
2 −1
G (θ)∇θ L(θ) + G−1 (θ)z
2
Langevin Diffusion on Riemannian manifold
I
Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ))
D
dθ =
X
p
1 −1
G−1 (θ)dW
G (θ)∇θ L(θ) −
G(θ)−1
ij Γij +
2
i,j
I
Discretised Langevin diffusion on manifold defines proposal mechanism
I
D
p
X
2 −1
d
G (θ)∇θ L(θ) − 2
G(θ)−1
G−1 (θ)z
ij Γij + 2
d
d
i,j
Manifold with constant curvature then proposal mechanism reduces to
I
p
2
θ 0 = θ + G −1 (θ)∇θ L(θ) + G−1 (θ)z
2
MALA proposal with preconditioning
θ 0d = θ d +
θ0 = θ +
√
2
M∇θ L(θ) + Mz
2
Langevin Diffusion on Riemannian manifold
I
Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ))
D
dθ =
X
p
1 −1
G−1 (θ)dW
G (θ)∇θ L(θ) −
G(θ)−1
ij Γij +
2
i,j
I
Discretised Langevin diffusion on manifold defines proposal mechanism
I
D
p
X
2 −1
d
G (θ)∇θ L(θ) − 2
G(θ)−1
G−1 (θ)z
ij Γij + 2
d
d
i,j
Manifold with constant curvature then proposal mechanism reduces to
I
p
2
θ 0 = θ + G −1 (θ)∇θ L(θ) + G−1 (θ)z
2
MALA proposal with preconditioning
I
√
2
M∇θ L(θ) + Mz
2
Proposal and acceptance probability
θ 0d = θ d +
θ0 = θ +
pp (θ 0 |θ) = N (θ 0 |µ(θ, ), 2 G−1 (θ))
π(θ 0 )pp (θ|θ 0 )
pa (θ 0 |θ) = min 1,
π(θ)pp (θ 0 |θ)
Langevin Diffusion on Riemannian manifold
I
Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ))
D
dθ =
X
p
1 −1
G−1 (θ)dW
G (θ)∇θ L(θ) −
G(θ)−1
ij Γij +
2
i,j
I
Discretised Langevin diffusion on manifold defines proposal mechanism
I
D
p
X
2 −1
d
G (θ)∇θ L(θ) − 2
G(θ)−1
G−1 (θ)z
ij Γij + 2
d
d
i,j
Manifold with constant curvature then proposal mechanism reduces to
I
p
2
θ 0 = θ + G −1 (θ)∇θ L(θ) + G−1 (θ)z
2
MALA proposal with preconditioning
I
√
2
M∇θ L(θ) + Mz
2
Proposal and acceptance probability
θ 0d = θ d +
θ0 = θ +
pp (θ 0 |θ) = N (θ 0 |µ(θ, ), 2 G−1 (θ))
π(θ 0 )pp (θ|θ 0 )
pa (θ 0 |θ) = min 1,
π(θ)pp (θ 0 |θ)
I
Proposal mechanism diffuses approximately along the manifold
Langevin Diffusion on Riemannian manifold
35
35
30
30
25
25
!
40
!
40
20
20
15
15
10
10
5
−5
0
µ
5
5
−5
0
µ
5
Langevin Diffusion on Riemannian manifold
20
20
15
15
!
25
!
25
10
10
5
5
−10
0
µ
10
−10
0
µ
10
Second-Order Diffusions
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
Second-Order Diffusions
I
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
Vector function γ has components γk =
P
i,j
Γkij v i v j
p
2G−1 (xt )dWt
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
P
p
2G−1 (xt )dWt
Γkij v i v j
I
Vector function γ has components γk =
I
Each Γki,j being the Christofell symbols of the Levi-Civita connection
corresponding to the metric tensor G(xt )
i,j
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
P
p
2G−1 (xt )dWt
Γkij v i v j
I
Vector function γ has components γk =
I
Each Γki,j being the Christofell symbols of the Levi-Civita connection
corresponding to the metric tensor G(xt )
I
Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the
contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt )
i,j
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
P
p
2G−1 (xt )dWt
Γkij v i v j
I
Vector function γ has components γk =
I
Each Γki,j being the Christofell symbols of the Levi-Civita connection
corresponding to the metric tensor G(xt )
I
Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the
contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt )
I
Physically defines acceleration of a particle across manifold under the
influence of potential field
i,j
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
P
p
2G−1 (xt )dWt
Γkij v i v j
I
Vector function γ has components γk =
I
Each Γki,j being the Christofell symbols of the Levi-Civita connection
corresponding to the metric tensor G(xt )
I
Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the
contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt )
I
Physically defines acceleration of a particle across manifold under the
influence of potential field
I
A unit friction or viscous force is included by the term v and the
fluctuating forcing is modelled by the Brownian motion
i,j
Second-Order Diffusions
I
Define a second-order SDE on a Riemann manifold with the diffusion
defined on vector field v to obtain
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
P
p
2G−1 (xt )dWt
Γkij v i v j
I
Vector function γ has components γk =
I
Each Γki,j being the Christofell symbols of the Levi-Civita connection
corresponding to the metric tensor G(xt )
I
Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the
contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt )
I
Physically defines acceleration of a particle across manifold under the
influence of potential field
I
A unit friction or viscous force is included by the term v and the
fluctuating forcing is modelled by the Brownian motion
I
What has this to do with MCMC?
i,j
Second-Order Diffusions
Second-Order Diffusions
I
What is invariant measure of this diffusion?
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
dp
∂
∂2p
∂p
a
a
b c
ab ∂φ
Γ
+
v
p + g ab a b
= − va a +
v
v
+
g
bc
a
b
dt
∂x
∂v
∂x
∂v ∂v
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
dp
∂
∂2p
∂p
a
a
b c
ab ∂φ
Γ
+
v
p + g ab a b
= − va a +
v
v
+
g
bc
a
b
dt
∂x
∂v
∂x
∂v ∂v
I
Some schoolboy algebra shows that for dp/dt = 0
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
dp
∂
∂2p
∂p
a
a
b c
ab ∂φ
Γ
+
v
p + g ab a b
= − va a +
v
v
+
g
bc
a
b
dt
∂x
∂v
∂x
∂v ∂v
I
Some schoolboy algebra shows that for dp/dt = 0
p
∂
1 ∂gab a b
∂φ
dp
v
v
+
log
dx c
= −
|g|
−
p
2 ∂x c
∂x c
∂x x
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
dp
∂
∂2p
∂p
a
a
b c
ab ∂φ
Γ
+
v
p + g ab a b
= − va a +
v
v
+
g
bc
a
b
dt
∂x
∂v
∂x
∂v ∂v
I
Some schoolboy algebra shows that for dp/dt = 0
p
∂
1 ∂gab a b
∂φ
dp
v
v
+
log
dx c
= −
|g|
−
p
2 ∂x c
∂x c
∂x x
I
Therefore the above second-order SDE is satisfied by invariant densities
t → ∞ of the form
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
dp
∂
∂2p
∂p
a
a
b c
ab ∂φ
Γ
+
v
p + g ab a b
= − va a +
v
v
+
g
bc
a
b
dt
∂x
∂v
∂x
∂v ∂v
I
Some schoolboy algebra shows that for dp/dt = 0
p
∂
1 ∂gab a b
∂φ
dp
v
v
+
log
dx c
= −
|g|
−
p
2 ∂x c
∂x c
∂x x
I
Therefore the above second-order SDE is satisfied by invariant densities
t → ∞ of the form
1
p(x, v) ∝ det(G(x)) exp − vT G(x)v − φ(x)
2
Second-Order Diffusions
I
What is invariant measure of this diffusion?
I
The Fokker-Planck equation for the diffusion follows as
dp
∂
∂2p
∂p
a
a
b c
ab ∂φ
Γ
+
v
p + g ab a b
= − va a +
v
v
+
g
bc
a
b
dt
∂x
∂v
∂x
∂v ∂v
I
Some schoolboy algebra shows that for dp/dt = 0
p
∂
1 ∂gab a b
∂φ
dp
v
v
+
log
dx c
= −
|g|
−
p
2 ∂x c
∂x c
∂x x
I
Therefore the above second-order SDE is satisfied by invariant densities
t → ∞ of the form
1
p(x, v) ∝ det(G(x)) exp − vT G(x)v − φ(x)
2
I
So for φ(x) = − log π(x) +
1
2
log det(G(x)) then it follows that marginally
p(x) = exp (−φ(x)) = π(x)
I
Numerical integration forms basis of MCMC scheme..... however
Second-Order Diffusions
Second-Order Diffusions
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
Second-Order Diffusions
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
Second-Order Diffusions
I
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
is transformed to
p
2G−1 (xt )dWt
Second-Order Diffusions
I
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
is transformed to
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
Second-Order Diffusions
I
I
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
is transformed to
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
where vector function ν has elements −∂i g ab pa pb
p
2G(xt )dWt
Second-Order Diffusions
I
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
is transformed to
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
I
where vector function ν has elements −∂i g ab pa pb
I
Invariant measure with density
p
2G(xt )dWt
1 T −1
1
p(x, p) ∝
exp − p G (x)p − φ(x)
det(G(x))
2
Second-Order Diffusions
I
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
is transformed to
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
I
where vector function ν has elements −∂i g ab pa pb
I
Invariant measure with density
p
2G(xt )dWt
1 T −1
1
p(x, p) ∝
exp − p G (x)p − φ(x)
det(G(x))
2
I
Stochastic Hamiltonian on Manifold - Lie-Trotter Splitting of Hamiltonian
(deterministic and stochastic OU Process) and using symplectic
integrator - samples drawn from invariant measure via RMHMC
Second-Order Diffusions
I
I
Transform the system of SDE’s from the configuration to phase space
such that transformed variable pt = G(xt )vt
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
is transformed to
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
I
where vector function ν has elements −∂i g ab pa pb
I
Invariant measure with density
p
2G(xt )dWt
1 T −1
1
p(x, p) ∝
exp − p G (x)p − φ(x)
det(G(x))
2
I
Stochastic Hamiltonian on Manifold - Lie-Trotter Splitting of Hamiltonian
(deterministic and stochastic OU Process) and using symplectic
integrator - samples drawn from invariant measure via RMHMC
I
Proposals follow local geodesics on manifold
Second-Order Diffusions
Second-Order Diffusions
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
Second-Order Diffusions
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
Second-Order Diffusions
I
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
Employ symplectic integrator e.g. RMHMC for drift term
Second-Order Diffusions
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
I
Employ symplectic integrator e.g. RMHMC for drift term
I
Note that OU diffusion process corresponds to ad-hoc partial momentum
updating
Second-Order Diffusions
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
I
Employ symplectic integrator e.g. RMHMC for drift term
I
Note that OU diffusion process corresponds to ad-hoc partial momentum
updating
I
Solid theoretical basis amenable to analysis of algorithm performance
Second-Order Diffusions
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
I
Employ symplectic integrator e.g. RMHMC for drift term
I
Note that OU diffusion process corresponds to ad-hoc partial momentum
updating
I
Solid theoretical basis amenable to analysis of algorithm performance
I
Phase space defined on co-tangent bundle with symplectic forms in
place
Second-Order Diffusions
I
Stochastic Hamiltonian on Manifold driven by conditional
Ornstein-Uhlenbeck process
dxt
=
G−1 (xt )pt dt
dpt
=
−ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt +
p
2G(xt )dWt
I
Employ symplectic integrator e.g. RMHMC for drift term
I
Note that OU diffusion process corresponds to ad-hoc partial momentum
updating
I
Solid theoretical basis amenable to analysis of algorithm performance
I
Phase space defined on co-tangent bundle with symplectic forms in
place
I
What of the configuration space?
Second-Order Diffusions
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
Second-Order Diffusions
I
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
Drift function can be seen to describe Lagrangian dynamics
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
I
Drift function can be seen to describe Lagrangian dynamics
I
OU diffusion driving overall process
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
I
Drift function can be seen to describe Lagrangian dynamics
I
OU diffusion driving overall process
I
Construct explicit integrator for dynamics - with volume correction
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
I
Drift function can be seen to describe Lagrangian dynamics
I
OU diffusion driving overall process
I
Construct explicit integrator for dynamics - with volume correction
I
Preserves original Hamiltonian dynamics - high proposal acceptance
rates
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
I
Drift function can be seen to describe Lagrangian dynamics
I
OU diffusion driving overall process
I
Construct explicit integrator for dynamics - with volume correction
I
Preserves original Hamiltonian dynamics - high proposal acceptance
rates
I
Speed improvement over implicit symplectic integrator between 2x to 10x
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
I
Drift function can be seen to describe Lagrangian dynamics
I
OU diffusion driving overall process
I
Construct explicit integrator for dynamics - with volume correction
I
Preserves original Hamiltonian dynamics - high proposal acceptance
rates
I
Speed improvement over implicit symplectic integrator between 2x to 10x
Though non-symplectic geometry limits sampling - see The Geometry
Foundations of Hamiltonian Monte Carlo
http://arxiv.org/pdf/1410.5110.pdf
Second-Order Diffusions
I
This SDE defined on co-tangent bundle - non-symplectic forms
dxt
=
vt dt
dvt
=
−γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt +
p
2G−1 (xt )dWt
I
Drift function can be seen to describe Lagrangian dynamics
I
OU diffusion driving overall process
I
Construct explicit integrator for dynamics - with volume correction
I
Preserves original Hamiltonian dynamics - high proposal acceptance
rates
I
Speed improvement over implicit symplectic integrator between 2x to 10x
Though non-symplectic geometry limits sampling - see The Geometry
Foundations of Hamiltonian Monte Carlo
http://arxiv.org/pdf/1410.5110.pdf
I
Markov chain Monte Carlo from Lagrangian Dynamics
Journal of Comp.Graph.Stats, 2014
Gaussian Mixture Model
I
Univariate finite mixture model
p(x|µ, σ 2 ) = 0.7 × N (x|0, σ 2 ) + 0.3 × N (x|µ, σ 2 )
Gaussian Mixture Model
Univariate finite mixture model
p(x|µ, σ 2 ) = 0.7 × N (x|0, σ 2 ) + 0.3 × N (x|µ, σ 2 )
4
3
2
1
lnσ
I
0
−1
−2
−3
−4
−4
−3
−2
−1
0
µ
1
2
3
4
Figure : Arrows correspond to the gradients and ellipses to the inverse metric
tensor. Dashed lines are isocontours of the joint log density
Gaussian Mixture Model
MALA path
Simplified mMALA path
mMALA path
4
4
4
−1410
−1410
−1410
−1310
3
−1310
3
−1210
−1110
−1010
1
1
−910
0
−1
lnσ
−910
lnσ
lnσ
−910
0
−1410
−2
−1
0
µ
1
2
−1410
−2
−2910
−3
−4
−4
4
−3
−2
MALA Sample Autocorrelation Function
−1
0
µ
1
2
3
−4
−4
4
0.2
0
0.6
0.4
0.2
0
6
8
10
Lag
12
14
16
18
20
−0.2
−2
−1
0
µ
1
2
3
4
0.8
Sample Autocorrelation
Sample Autocorrelation
0.4
4
−3
Simplified mMALA Sample Autocorrelation Function
0.8
0.6
2
−2910
−3
mMALA Sample Autocorrelation Function
0.8
0
−1410
−2
−2910
−3
3
0
−1
−1
−2
Sample Autocorrelation
−1010
−1010
1
−0.2
−1110
2
2
−3
−1210
−1110
2
−4
−4
−1310
3
−1210
0.6
0.4
0.2
0
0
2
4
6
8
10
Lag
12
14
16
18
20
−0.2
0
2
4
6
8
10
Lag
12
14
16
18
20
Gaussian Mixture Model
HMC path
RMHMC path
4
Gibbs path
4
4
−1410
−1410
−1310
3
−1410
−1310
3
−1210
−1110
2
−1010
1
1
−1
−910
lnσ
−910
lnσ
lnσ
−910
0
−1
−1410
−2
−2
−1
0
µ
1
2
−1410
−4
−4
4
−3
−2
HMC Sample Autocorrelation Function
−1
0
µ
1
2
3
−4
−4
4
0.2
0
0.6
0.4
0.2
0
6
8
10
Lag
12
14
−2
16
18
20
−0.2
−1
0
µ
1
2
3
4
0.8
Sample Autocorrelation
Sample Autocorrelation
0.4
4
−3
Gibbs Sample Autocorrelation Function
0.8
0.6
2
−2910
−3
RMHMC Sample Autocorrelation Function
0.8
0
−1410
−2
−2910
−3
3
0
−1
−2
−2910
−3
Sample Autocorrelation
−1010
1
0
−0.2
−1110
2
−1010
−3
−1210
−1110
2
−4
−4
−1310
3
−1210
0.6
0.4
0.2
0
0
2
4
6
8
10
Lag
12
14
16
18
20
−0.2
0
2
4
6
8
10
Lag
12
14
16
18
20
Log-Gaussian Cox Point Process with Latent Field
I
The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝
Y64
i,j
exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1
θ (x−µ1)/2)
Log-Gaussian Cox Point Process with Latent Field
I
The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝
I
Y64
i,j
exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1
θ (x−µ1)/2)
Metric tensors
G(θ)i,j
=
1
∂Σθ −1 ∂Σθ
Σθ
trace Σ−1
θ
2
∂θi
∂θj
G(x)
=
Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ + (Σθ )i,i )
Log-Gaussian Cox Point Process with Latent Field
I
The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝
I
Y64
i,j
exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1
θ (x−µ1)/2)
Metric tensors
G(θ)i,j
=
1
∂Σθ −1 ∂Σθ
Σθ
trace Σ−1
θ
2
∂θi
∂θj
G(x)
=
Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ + (Σθ )i,i )
I
Latent field metric tensor defining flat manifold is 4096 × 4096, O(N 3 )
obtained from parameter conditional
Log-Gaussian Cox Point Process with Latent Field
I
The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝
I
Y64
i,j
exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1
θ (x−µ1)/2)
Metric tensors
G(θ)i,j
=
1
∂Σθ −1 ∂Σθ
Σθ
trace Σ−1
θ
2
∂θi
∂θj
G(x)
=
Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ + (Σθ )i,i )
I
Latent field metric tensor defining flat manifold is 4096 × 4096, O(N 3 )
obtained from parameter conditional
I
MALA requires transformation of latent field to sample - with separate
tuning in transient and stationary phases of Markov chain
RMHMC for Log-Gaussian Cox Point Processes
10
10
10
10
20
20
20
20
30
30
30
30
40
40
40
40
50
50
50
50
60
60
60
20
40
60
20
40
60
60
20
40
60
10
10
10
10
20
20
20
20
30
30
30
30
40
40
40
40
50
50
50
50
60
60
60
20
40
60
20
40
60
20
40
60
20
40
60
60
20
40
60
Figure : Data, Latent Field, Poisson Mean Field
RMHMC for Log-Gaussian Cox Point Processes
Table : Sampling the latent variables of a Log-Gaussian Cox Process - Comparison of
sampling methods
Method
MALA (Transient)
MALA (Stationary)
mMALA
RMHMC
Time
31,577
31,118
634
2936
ESS (Min, Med, Max)
(3, 8, 50)
(4, 16, 80)
(26, 84, 174)
(1951, 4545, 5000)
s/Min ESS
10,605
7836
24.1
1.5
Rel. Speed
×1
×1.35
×440
×7070
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
−∇ · (eu ∇w)
=
0
−eu ∇w · n
=
u Bi
=
−1
u
−e ∇w · n
in
Ω
on
on
∂Ω/ ΓR
ΓR
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
−∇ · (eu ∇w)
=
0
−eu ∇w · n
=
u Bi
=
−1
u
−e ∇w · n
I
in
Ω
on
on
∂Ω/ ΓR
ΓR
Forward state is w and u the logarithm of distributed thermal conductivity
on Ω, n the unit outward normal on Ω, and Bi the Biot number.
Illustrative Heat Conduction Problem
I
Heat conduction problem governed by an elliptic partial differential
equation in the open and bounded domain Ω:
−∇ · (eu ∇w)
=
0
−eu ∇w · n
=
u Bi
=
−1
u
−e ∇w · n
I
in
Ω
on
on
∂Ω/ ΓR
ΓR
Forward state is w and u the logarithm of distributed thermal conductivity
on Ω, n the unit outward normal on Ω, and Bi the Biot number.
Illustrative Heat Conduction Problem
4
3.5
3
2.5
2
1.5
1
0.5
0
−3
−2
−1
0
1
2
3
Illustrative Heat Conduction Problem
Illustrative Heat Conduction Problem
I
Take one finite element discretisation of domain and one observation at
leftmost boundary
Illustrative Heat Conduction Problem
I
Take one finite element discretisation of domain and one observation at
leftmost boundary
I
Consider induced bivariate posterior for varying forms of prior Gaussian
measure
πpost
5
u2
0
−5
−10
−10
−5
u1
0
5
Hamiltonian on Riemann Manifold
The dynamics for k -th component of u is given by Hamiltons equations
duk
∂H
= G(u)−1 p
=
dt
∂pk
k
∂ G(u)
dpk
∂H
1
=−
= −∇k J (u) − Tr G(u)−1
dt
∂uk
2
∂uk
∂ G(u)
1
+ pT G(u)−1
G(u)−1 p
2
∂uk
Gradient
Gradient
Z
ũeu ∇w · ∇λ dΩ,
h∇J (u) , ũi =
Ω
First Order Forward
Z
eu ∇w · ∇λ̂ dΩ +
Ω
Z
Z
Bi w λ̂ ds =
∂Ω\ΓR
λ̂ ds,
ΓR
First Order Adjoint
Z
Ω
eu ∇λ · ∇ŵ dΩ +
Z
Bi λŵ ds = −
∂Ω\ΓR
K
1 X
(w (xj ) − dj ) ŵ (xj ) ,
σ2
j=1
Hamiltonian on Riemann Manifold
The dynamics for k -th component of u is given by Hamiltons equations
duk
∂H
= G(u)−1 p
=
dt
∂pk
k
∂ G(u)
dpk
∂H
1
=−
= −∇k J (u) − Tr G(u)−1
dt
∂uk
2
∂uk
∂ G(u)
1
+ pT G(u)−1
G(u)−1 p
2
∂uk
Hessian/Fisher Information Matrix
Fisher Matrix-Vector Product
D
E Z
ũeu ∇w · ∇λ˜2 dΩ,
hG (u) , ũi , u 2 =
Ω
Second Order Forward
Z
eu ∇w 2 · ∇λ̂ dΩ +
Z
Ω
Bi w 2 λ̂ ds = −
∂Ω\ΓR
Z
u 2 eu ∇w · ∇λ̂ dΩ,
Ω
Second Order Adjoint
Z
eu ∇λ˜2 · ∇ŵ dΩ +
Ω
Z
Bi λ˜2 ŵ ds = −
∂Ω\ΓR
K
1 X 2
w (xj ) ŵ (xj ) .
σ2
j=1
Hamiltonian on Riemann Manifold
The dynamics for k -th component of u is given by Hamiltons equations
duk
∂H
= G(u)−1 p
=
dt
∂pk
k
∂ G(u)
dpk
∂H
1
=−
= −∇k J (u) − Tr G(u)−1
dt
∂uk
2
∂uk
∂ G(u)
1
+ pT G(u)−1
G(u)−1 p
2
∂uk
Derivative of Fisher Information Matrix
Derivative of Fisher Matrix-Matrix product
DD
E
E
D D
E
E
hT (u) , ũi , u 2 , u 3 := ∇ hG (u) , ũi , u 2 , u 3
Z
Z
Z
ũeu ∇w · ∇λ2,3 dΩ
ũeu ∇w 3 · ∇λ˜2 dΩ +
ũu 3 eu ∇w · ∇λ˜2 dΩ +
=
Ω
Ω
Ω
Third Order Forward
Z
u
Z
3
Z
3
e ∇w · ∇λ̂ dΩ +
3 u
u e ∇w · ∇λ̂ dΩ.
Bi w λ̂ ds = −
Ω
∂Ω\ΓR
Ω
Third Order Adjoint
Z
u
2,3
e ∇λ
Ω
Z
· ∇ŵ dΩ +
2,3
Bi λ
ŵ ds = −
∂Ω\ΓR
K
1 X
σ 2 j=1
Z
w
2,3
xj ŵ xj
u e ∇λ˜2 · ∇ŵ dΩ,
3 u
−
Ω
Two-parameter Case
trace of ui
sRMMALA
Sample mean
ACF of ui
1
5
0.5
1
0
0
0
−5
0
0.5
1
1000
3000
5000
0
50
RMMALA
Lag
1
5
0.5
1
0
0
0
−5
0
0.5
1
1000
3000
5000
0
50
sRMHMC
Lag
1
5
0.5
1
0
0
0
−5
0
0.5
1
1000
3000
5000
0
50
RMHMC
Lag
1
5
0.5
1
0
0
0
−5
0
0.5
1
1000
3000
5000
0
50
Lag
I
ε = 0.7 for simRMMALA and RMMALA
I
ε = 0.02, L = 100 for simRMHMC and RMHMC
1025-parameter Case
u1
Low rank Fisher
5
0
0
0
u2
5000
0
−5
0
2
u1024
Exact full Hessian
5
−5
0
5
5000
5000
−5
0
2
5000
−2
0
2
5000
−2
0
5000
−5
0
2
5000
0
5000
0
5000
−5
0
5
0
0
0
−2
0
−5
0
5
0
0
−2
0
2
u1025
Exact Fisher
5
−2
0
2
5000
0
5000
−2
0
5000
1025-parameter Case
Low rank Fisher
Exact Fisher
u1
1
u2
u1024
1
0
0
0
−1
0
−1
0
−1
0
5
Lag
10
1
5
Lag
10
1
0
0
0
−1
0
−1
0
5
Lag
10
5
Lag
10
1
1
0
0
0
−1
0
−1
0
−1
0
5
Lag
10
5
Lag
10
5
Lag
10
5
Lag
10
5
Lag
10
1
0
0
0
−1
0
−1
0
−1
0
10
10
1
1
5
Lag
5
Lag
1
−1
0
1
u1025
Exact full Fisher
1
5
Lag
10
Bui-Thanh, T., and Girolami, M., Solving Large-scale PDE using Riemann
Manifold Hamiltonian MCMC, Inverse Problems, 30, 92014) 114014, IoP
Publishing.
Conclusion and Discussion
I
Geometry of statistical models harnessed in Monte Carlo methods
Conclusion and Discussion
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
Conclusion and Discussion
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
Conclusion and Discussion
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
Conclusion and Discussion
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
I
Geodesic Monte Carlo on Embedded Manifolds
http://arxiv.org/pdf/1301.6064.pdf
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
I
Geodesic Monte Carlo on Embedded Manifolds
http://arxiv.org/pdf/1301.6064.pdf
I
Investigate alternative manifold structures
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
I
Geodesic Monte Carlo on Embedded Manifolds
http://arxiv.org/pdf/1301.6064.pdf
I
Investigate alternative manifold structures
I
Design and effect of metric and connection
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
I
Geodesic Monte Carlo on Embedded Manifolds
http://arxiv.org/pdf/1301.6064.pdf
I
Investigate alternative manifold structures
I
Design and effect of metric and connection
I
Optimality of Hamiltonian flows as local geodesics
Conclusion and Discussion
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
I
Geodesic Monte Carlo on Embedded Manifolds
http://arxiv.org/pdf/1301.6064.pdf
I
Investigate alternative manifold structures
I
Design and effect of metric and connection
I
Optimality of Hamiltonian flows as local geodesics
I
Surrogate geometries based on emulators for PDEs
Conclusion and Discussion
I
I
I
Geometry of statistical models harnessed in Monte Carlo methods
I
Diffusions that respect structure and curvature of space - Manifold MALA
I
Geodesic flows on model manifold - RMHMC - generalisation of HMC
I
Assessed on correlated & high-dimensional latent variable models
I
Promising capability of methodology
Ongoing development
I
Theoretical analysis of convergence
I
Geodesic Monte Carlo on Embedded Manifolds
http://arxiv.org/pdf/1301.6064.pdf
I
Investigate alternative manifold structures
I
Design and effect of metric and connection
I
Optimality of Hamiltonian flows as local geodesics
I
Surrogate geometries based on emulators for PDEs
No silver bullet or cure all - new powerful methodology for MC toolkit
Funding Acknowledgments
I
EPSRC Established Research Career Fellowship
I
EPSRC Programme Grant - Enabling Quantification of Uncertainty in
Inverse Problems
I
Royal Society Wolfson Research Merit Award
Download