Differential Geometric Simulation Methods for Uncertainty Quantification in Complex Model Systems Mark Girolami Department of Statistics University of Warwick Warwick Centre for Predictive Modelling February 2015 Never Mind the Big Data here’s the Big Models Never Mind the Big Data here’s the Big Models Never Mind the Big Data here’s the Big Models I Statistical inference in mathematical models (Finite Elements) emerging area of rapid development and growing importance in statistical methodology Never Mind the Big Data here’s the Big Models I Statistical inference in mathematical models (Finite Elements) emerging area of rapid development and growing importance in statistical methodology I Quantifying and propagating uncertainty in models e.g. Partial Differential Equations describing natural phenomena Never Mind the Big Data here’s the Big Models I Statistical inference in mathematical models (Finite Elements) emerging area of rapid development and growing importance in statistical methodology I Quantifying and propagating uncertainty in models e.g. Partial Differential Equations describing natural phenomena I Examples include Shallow Water Components of Global Climate models (UK MET Office) Never Mind the Big Data here’s the Big Models I Statistical inference in mathematical models (Finite Elements) emerging area of rapid development and growing importance in statistical methodology I Quantifying and propagating uncertainty in models e.g. Partial Differential Equations describing natural phenomena I Examples include Shallow Water Components of Global Climate models (UK MET Office) I Ground water flow, subsurface multi-phase flow e.g. oil aquifers (BP, Schlumberger) Never Mind the Big Data here’s the Big Models I Statistical inference in mathematical models (Finite Elements) emerging area of rapid development and growing importance in statistical methodology I Quantifying and propagating uncertainty in models e.g. Partial Differential Equations describing natural phenomena I Examples include Shallow Water Components of Global Climate models (UK MET Office) I Ground water flow, subsurface multi-phase flow e.g. oil aquifers (BP, Schlumberger) I Statistical inference in such models hugely challenging due to size of models and complexity of induced probability measures Never Mind the Big Data here’s the Big Models I Statistical inference in mathematical models (Finite Elements) emerging area of rapid development and growing importance in statistical methodology I Quantifying and propagating uncertainty in models e.g. Partial Differential Equations describing natural phenomena I Examples include Shallow Water Components of Global Climate models (UK MET Office) I Ground water flow, subsurface multi-phase flow e.g. oil aquifers (BP, Schlumberger) I Statistical inference in such models hugely challenging due to size of models and complexity of induced probability measures I This talk will concentrate on the issues associated with sampling from the induced measures from such models and associated data System Identification: Nonlinear ODE Oscillator Model Clock mRNA ... Clock Protein dx1 dt dx2 dt Protein = k1 − m1 x1 1 + xnρ = k2 x1 − m2 x2 .. . dxn dt = kn xn−1 − mn xn Inhibition Protein Systems Identification - Posterior Inference Mixing of Markov Chains 5 4 3 2 1 1 2 3 4 5 Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: −∇ · (eu ∇w) = 0 −eu ∇w · n = u Bi = −1 u −e ∇w · n in Ω on on ∂Ω/ ΓR ΓR Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: −∇ · (eu ∇w) = 0 −eu ∇w · n = u Bi = −1 u −e ∇w · n I in Ω on on ∂Ω/ ΓR ΓR Forward state is w and u the logarithm of distributed thermal conductivity on Ω, n the unit outward normal on Ω, and Bi the Biot number. Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: −∇ · (eu ∇w) = 0 −eu ∇w · n = u Bi = −1 u −e ∇w · n I in Ω on on ∂Ω/ ΓR ΓR Forward state is w and u the logarithm of distributed thermal conductivity on Ω, n the unit outward normal on Ω, and Bi the Biot number. Illustrative Heat Conduction Problem 4 3.5 3 2.5 2 1.5 1 0.5 0 −3 −2 −1 0 1 2 3 Illustrative Heat Conduction Problem Illustrative Heat Conduction Problem I Take one finite element discretisation of domain and one observation at leftmost boundary Illustrative Heat Conduction Problem I Take one finite element discretisation of domain and one observation at leftmost boundary I Consider induced bivariate posterior for varying forms of prior Gaussian measure πpost 5 u2 0 −5 −10 −10 −5 u1 0 5 MCMC from Diffusions and Geodesics I Riemann manifold Langevin and Hamiltonian Monte Carlo Methods Girolami, M. & Calderhead, B. J.R.Statist. Soc. B, with discussion, (2011), 73, 2, 123 - 214. http://www2.warwick.ac.uk/mgirolami Talk Outline I Motivation to improve MCMC capability for challenging problems Talk Outline I Motivation to improve MCMC capability for challenging problems I Exploring differential geometric concepts in MCMC methodology Talk Outline I Motivation to improve MCMC capability for challenging problems I Exploring differential geometric concepts in MCMC methodology I Diffusions across Riemann manifold as basis for MCMC Talk Outline I Motivation to improve MCMC capability for challenging problems I Exploring differential geometric concepts in MCMC methodology I Diffusions across Riemann manifold as basis for MCMC I Geodesic flows on manifold form basis of MCMC methods Talk Outline I Motivation to improve MCMC capability for challenging problems I Exploring differential geometric concepts in MCMC methodology I Diffusions across Riemann manifold as basis for MCMC I Geodesic flows on manifold form basis of MCMC methods I Illustrative examples Talk Outline I Motivation to improve MCMC capability for challenging problems I Exploring differential geometric concepts in MCMC methodology I Diffusions across Riemann manifold as basis for MCMC I Geodesic flows on manifold form basis of MCMC methods I Illustrative examples I Further Work and Conclusions Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N I Draw θ n from ergodic Markov process with stationary distribution π(θ) Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N I Draw θ n from ergodic Markov process with stationary distribution π(θ) I Construct transition kernel as product of two components Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N I Draw θ n from ergodic Markov process with stationary distribution π(θ) I Construct transition kernel as product of two components I Propose a move θ → θ 0 with probability pp (θ 0 |θ) Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N I Draw θ n from ergodic Markov process with stationary distribution π(θ) I Construct transition kernel as product of two components I I Propose a move θ → θ 0 with probability pp (θ 0 |θ) accept or reject proposal with probability π(θ 0 )pp (θ|θ 0 ) pa (θ 0 |θ) = min 1, 0 π(θ)pp (θ |θ) Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N I Draw θ n from ergodic Markov process with stationary distribution π(θ) I Construct transition kernel as product of two components I I I Propose a move θ → θ 0 with probability pp (θ 0 |θ) accept or reject proposal with probability π(θ 0 )pp (θ|θ 0 ) pa (θ 0 |θ) = min 1, 0 π(θ)pp (θ |θ) Convergence rate and asymptotic variance dependent on pp (θ 0 |θ) Motivation Simulation Based Inference I Monte Carlo method employs samples from π(θ) to obtain estimate Z 1 1 X φ(θ)π(θ)dθ = φ(θ n ) + O(N − 2 ) n N I Draw θ n from ergodic Markov process with stationary distribution π(θ) I Construct transition kernel as product of two components I I Propose a move θ → θ 0 with probability pp (θ 0 |θ) accept or reject proposal with probability π(θ 0 )pp (θ|θ 0 ) pa (θ 0 |θ) = min 1, 0 π(θ)pp (θ |θ) I Convergence rate and asymptotic variance dependent on pp (θ 0 |θ) I Success of MCMC reliant upon appropriate proposal design Differential Geometric Concepts in MCMC Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) I Jeffreys, 1948 to first order Z p(y; θ + δθ) dy ≈ δθ T G(θ)δθ D(θ||δθ) = p(y; θ + δθ) log p(y; θ) Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) I Jeffreys, 1948 to first order Z p(y; θ + δθ) dy ≈ δθ T G(θ)δθ D(θ||δθ) = p(y; θ + δθ) log p(y; θ) I Expected Fisher Information G(θ) is metric tensor of a Riemann manifold Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) I Jeffreys, 1948 to first order Z p(y; θ + δθ) dy ≈ δθ T G(θ)δθ D(θ||δθ) = p(y; θ + δθ) log p(y; θ) I Expected Fisher Information G(θ) is metric tensor of a Riemann manifold I Non-Euclidean geometry - invariants, connections, curvature, geodesics Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) I Jeffreys, 1948 to first order Z p(y; θ + δθ) dy ≈ δθ T G(θ)δθ D(θ||δθ) = p(y; θ + δθ) log p(y; θ) I Expected Fisher Information G(θ) is metric tensor of a Riemann manifold I Non-Euclidean geometry - invariants, connections, curvature, geodesics I Asymptotic statistical analysis. e.g. Amari, 1981; Murray & Rice, 1993; Critchley et al, 1993; Kass, 1989; Dawid, 1975; Lauritsen, 1989 Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) I Jeffreys, 1948 to first order Z p(y; θ + δθ) dy ≈ δθ T G(θ)δθ D(θ||δθ) = p(y; θ + δθ) log p(y; θ) I Expected Fisher Information G(θ) is metric tensor of a Riemann manifold I Non-Euclidean geometry - invariants, connections, curvature, geodesics I Asymptotic statistical analysis. e.g. Amari, 1981; Murray & Rice, 1993; Critchley et al, 1993; Kass, 1989; Dawid, 1975; Lauritsen, 1989 I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998 Differential Geometric Concepts in MCMC I Denote expected Fisher Information as G(θ) = cov(∇θ L(θ)) I Rao, 1945 to first order Z |p(y; θ + δθ) − p(y; θ)|2 χ2 (δθ) = dy ≈ δθ T G(θ)δθ p(y; θ) I Jeffreys, 1948 to first order Z p(y; θ + δθ) dy ≈ δθ T G(θ)δθ D(θ||δθ) = p(y; θ + δθ) log p(y; θ) I Expected Fisher Information G(θ) is metric tensor of a Riemann manifold I Non-Euclidean geometry - invariants, connections, curvature, geodesics I Asymptotic statistical analysis. e.g. Amari, 1981; Murray & Rice, 1993; Critchley et al, 1993; Kass, 1989; Dawid, 1975; Lauritsen, 1989 I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998 I Can geometric structure be employed in Monte Carlo methodology? Manifolds A manifold is a smooth, curved surface: A set embedded in Rd , that locally looks like Rn (n < d). Example: the unit sphere (2-sphere): d = 3, n = 2 X 2 S2 = {x ∈ R3 : xi = 1} i Manifolds A manifold is a smooth, curved surface: A set embedded in Rd , that locally looks like Rn (n < d). Example: the unit sphere (2-sphere): d = 3, n = 2 X 2 S2 = {x ∈ R3 : xi = 1} i x ∈ Rd are the embedded coordinates Coordinate systems and Riemannian metrics Can parameterise the manifold with a coordinate system in q ∈ Rn q2 q1 (sin q1 sin q2 , cos q1 sin q2 , cos q2 ), q1 ∈ [0, 2π], q2 ∈ [0, π] Coordinate systems and Riemannian metrics Can parameterise the manifold with a coordinate system in q ∈ Rn q2 q1 (sin q1 sin q2 , cos q1 sin q2 , cos q2 ), q1 ∈ [0, 2π], q2 ∈ [0, π] The Euclidean metric k · k induces a Riemannian metric G in the coordinate system: X kdxk2 = G(q)dqi dqj i,j M.C. Escher, Heaven and Hell, 1960 Geodesics Geodesics are the paths of shortest distance. I On a sphere, these are the great circles q2 q1 Geodesics Geodesics are the paths of shortest distance. I On a sphere, these are the great circles q2 q1 Given an initial velocity v (0) ⊥ x(0), we have a nice explicit form 1 cos(αt) − sin(αt) 1 x(t) v (t) = x(0) v (0) −1 sin(αt) cos(αt) α where α = kv (0)k. α Geometric Concepts in MCMC Geometric Concepts in MCMC I Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl Geometric Concepts in MCMC I Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl I Christoffel symbols - characterise Levi-Civita connection on manifold Geometric Concepts in MCMC I Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl I Christoffel symbols - characterise Levi-Civita connection on manifold 1 X im ∂gmk ∂gml ∂gkl Γikl = g + − 2 m ∂θl ∂θk ∂θm Geometric Concepts in MCMC I Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl I Christoffel symbols - characterise Levi-Civita connection on manifold 1 X im ∂gmk ∂gml ∂gkl Γikl = g + − 2 m ∂θl ∂θk ∂θm I Geodesics - shortest path between two points on manifold Geometric Concepts in MCMC I Tangent space - local metric defined by δθ T G(θ)δθ = gkl δθk δθl I Christoffel symbols - characterise Levi-Civita connection on manifold 1 X im ∂gmk ∂gml ∂gkl Γikl = g + − 2 m ∂θl ∂θk ∂θm I Geodesics - shortest path between two points on manifold X i dθk dθl d 2 θi + Γkl =0 2 dt dt dt k,l Fisher–Rao metric A family of probability densities {p(· | θ) : θ ∈ Θ}, the Fisher information forms a natural Riemannian metric, known as Fisher–Rao metric: ∂2 log p(Y | θ) gij = EY |θ − ∂θi ∂θj Fisher–Rao metric A family of probability densities {p(· | θ) : θ ∈ Θ}, the Fisher information forms a natural Riemannian metric, known as Fisher–Rao metric: ∂2 log p(Y | θ) gij = EY |θ − ∂θi ∂θj Examples: N(µ, σ 2 ) σ µ Fisher–Rao metric A family of probability densities {p(· | θ) : θ ∈ Θ}, the Fisher information forms a natural Riemannian metric, known as Fisher–Rao metric: ∂2 log p(Y | θ) gij = EY |θ − ∂θi ∂θj Examples: N(µ, σ 2 ) σ µ Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) I Local inner product on tangent space defined by metric tensor, i.e. δθ T G(θ)δθ, where θ = (µ, σ)T Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) I Local inner product on tangent space defined by metric tensor, i.e. δθ T G(θ)δθ, where θ = (µ, σ)T I Metric is Expected Fisher Information −2 σ G(µ, σ) = 0 0 2σ −2 Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) I Local inner product on tangent space defined by metric tensor, i.e. δθ T G(θ)δθ, where θ = (µ, σ)T I Metric is Expected Fisher Information −2 σ G(µ, σ) = 0 I 0 2σ −2 Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 ) Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) I Local inner product on tangent space defined by metric tensor, i.e. δθ T G(θ)δθ, where θ = (µ, σ)T I Metric is Expected Fisher Information −2 σ G(µ, σ) = 0 0 2σ −2 I Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 ) I Metric on tangent space δθ T G(θ)δθ = (δµ2 + 2δσ 2 ) σ2 Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) I Local inner product on tangent space defined by metric tensor, i.e. δθ T G(θ)δθ, where θ = (µ, σ)T I Metric is Expected Fisher Information −2 σ G(µ, σ) = 0 0 2σ −2 I Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 ) I Metric on tangent space δθ T G(θ)δθ = I (δµ2 + 2δσ 2 ) σ2 Metric tensor for univariate Normal defines a Hyperbolic Space Illustration of Geometric Concepts I Consider Normal density p(x|µ, σ) = Nx (µ, σ) I Local inner product on tangent space defined by metric tensor, i.e. δθ T G(θ)δθ, where θ = (µ, σ)T I Metric is Expected Fisher Information −2 σ G(µ, σ) = 0 0 2σ −2 I Components of connection ∂µ G = 0 and ∂σ G = − diag(2σ −3 , 4σ −3 ) I Metric on tangent space δθ T G(θ)δθ = (δµ2 + 2δσ 2 ) σ2 I Metric tensor for univariate Normal defines a Hyperbolic Space I Consider densities N (0, 1) & N (1, 1) and N (0, 2) & N (1, 2) Normal Density - Euclidean Parameter space σ N (0, 2) N (1, 2) N (0, 1) N (1, 1) µ Normal Density - Riemannian Functional space σ N (0, 2) N (1, 2) N (0, 1) N (1, 1) µ Langevin Diffusion on Riemannian manifold I Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ)) D dθ = X p 1 −1 G−1 (θ)dW G (θ)∇θ L(θ) − G(θ)−1 ij Γij + 2 i,j Langevin Diffusion on Riemannian manifold I Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ)) D dθ = X p 1 −1 G−1 (θ)dW G (θ)∇θ L(θ) − G(θ)−1 ij Γij + 2 i,j I Discretised Langevin diffusion on manifold defines proposal mechanism θ 0d = θ d + D p X 2 −1 d G (θ)∇θ L(θ) − 2 G(θ)−1 G−1 (θ)z ij Γij + 2 d d i,j Langevin Diffusion on Riemannian manifold I Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ)) D dθ = X p 1 −1 G−1 (θ)dW G (θ)∇θ L(θ) − G(θ)−1 ij Γij + 2 i,j I Discretised Langevin diffusion on manifold defines proposal mechanism I D p X 2 −1 d G (θ)∇θ L(θ) − 2 G(θ)−1 G−1 (θ)z ij Γij + 2 d d i,j Manifold with constant curvature then proposal mechanism reduces to θ 0d = θ d + θ0 = θ + p 2 −1 G (θ)∇θ L(θ) + G−1 (θ)z 2 Langevin Diffusion on Riemannian manifold I Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ)) D dθ = X p 1 −1 G−1 (θ)dW G (θ)∇θ L(θ) − G(θ)−1 ij Γij + 2 i,j I Discretised Langevin diffusion on manifold defines proposal mechanism I D p X 2 −1 d G (θ)∇θ L(θ) − 2 G(θ)−1 G−1 (θ)z ij Γij + 2 d d i,j Manifold with constant curvature then proposal mechanism reduces to I p 2 θ 0 = θ + G −1 (θ)∇θ L(θ) + G−1 (θ)z 2 MALA proposal with preconditioning θ 0d = θ d + θ0 = θ + √ 2 M∇θ L(θ) + Mz 2 Langevin Diffusion on Riemannian manifold I Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ)) D dθ = X p 1 −1 G−1 (θ)dW G (θ)∇θ L(θ) − G(θ)−1 ij Γij + 2 i,j I Discretised Langevin diffusion on manifold defines proposal mechanism I D p X 2 −1 d G (θ)∇θ L(θ) − 2 G(θ)−1 G−1 (θ)z ij Γij + 2 d d i,j Manifold with constant curvature then proposal mechanism reduces to I p 2 θ 0 = θ + G −1 (θ)∇θ L(θ) + G−1 (θ)z 2 MALA proposal with preconditioning I √ 2 M∇θ L(θ) + Mz 2 Proposal and acceptance probability θ 0d = θ d + θ0 = θ + pp (θ 0 |θ) = N (θ 0 |µ(θ, ), 2 G−1 (θ)) π(θ 0 )pp (θ|θ 0 ) pa (θ 0 |θ) = min 1, π(θ)pp (θ 0 |θ) Langevin Diffusion on Riemannian manifold I Continuous Langevin diffusion with invariant measure π(θ) ≡ exp(L(θ)) D dθ = X p 1 −1 G−1 (θ)dW G (θ)∇θ L(θ) − G(θ)−1 ij Γij + 2 i,j I Discretised Langevin diffusion on manifold defines proposal mechanism I D p X 2 −1 d G (θ)∇θ L(θ) − 2 G(θ)−1 G−1 (θ)z ij Γij + 2 d d i,j Manifold with constant curvature then proposal mechanism reduces to I p 2 θ 0 = θ + G −1 (θ)∇θ L(θ) + G−1 (θ)z 2 MALA proposal with preconditioning I √ 2 M∇θ L(θ) + Mz 2 Proposal and acceptance probability θ 0d = θ d + θ0 = θ + pp (θ 0 |θ) = N (θ 0 |µ(θ, ), 2 G−1 (θ)) π(θ 0 )pp (θ|θ 0 ) pa (θ 0 |θ) = min 1, π(θ)pp (θ 0 |θ) I Proposal mechanism diffuses approximately along the manifold Langevin Diffusion on Riemannian manifold 35 35 30 30 25 25 ! 40 ! 40 20 20 15 15 10 10 5 −5 0 µ 5 5 −5 0 µ 5 Langevin Diffusion on Riemannian manifold 20 20 15 15 ! 25 ! 25 10 10 5 5 −10 0 µ 10 −10 0 µ 10 Second-Order Diffusions Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt Second-Order Diffusions I I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + Vector function γ has components γk = P i,j Γkij v i v j p 2G−1 (xt )dWt Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + P p 2G−1 (xt )dWt Γkij v i v j I Vector function γ has components γk = I Each Γki,j being the Christofell symbols of the Levi-Civita connection corresponding to the metric tensor G(xt ) i,j Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + P p 2G−1 (xt )dWt Γkij v i v j I Vector function γ has components γk = I Each Γki,j being the Christofell symbols of the Levi-Civita connection corresponding to the metric tensor G(xt ) I Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt ) i,j Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + P p 2G−1 (xt )dWt Γkij v i v j I Vector function γ has components γk = I Each Γki,j being the Christofell symbols of the Levi-Civita connection corresponding to the metric tensor G(xt ) I Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt ) I Physically defines acceleration of a particle across manifold under the influence of potential field i,j Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + P p 2G−1 (xt )dWt Γkij v i v j I Vector function γ has components γk = I Each Γki,j being the Christofell symbols of the Levi-Civita connection corresponding to the metric tensor G(xt ) I Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt ) I Physically defines acceleration of a particle across manifold under the influence of potential field I A unit friction or viscous force is included by the term v and the fluctuating forcing is modelled by the Brownian motion i,j Second-Order Diffusions I Define a second-order SDE on a Riemann manifold with the diffusion defined on vector field v to obtain dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + P p 2G−1 (xt )dWt Γkij v i v j I Vector function γ has components γk = I Each Γki,j being the Christofell symbols of the Levi-Civita connection corresponding to the metric tensor G(xt ) I Note dvt + γ(xt , vt )dt is the covariant time derivative of velocity and the contravariant form of the potential gradient is simply G−1 (xt )∇xt φ(xt ) I Physically defines acceleration of a particle across manifold under the influence of potential field I A unit friction or viscous force is included by the term v and the fluctuating forcing is modelled by the Brownian motion I What has this to do with MCMC? i,j Second-Order Diffusions Second-Order Diffusions I What is invariant measure of this diffusion? Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as dp ∂ ∂2p ∂p a a b c ab ∂φ Γ + v p + g ab a b = − va a + v v + g bc a b dt ∂x ∂v ∂x ∂v ∂v Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as dp ∂ ∂2p ∂p a a b c ab ∂φ Γ + v p + g ab a b = − va a + v v + g bc a b dt ∂x ∂v ∂x ∂v ∂v I Some schoolboy algebra shows that for dp/dt = 0 Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as dp ∂ ∂2p ∂p a a b c ab ∂φ Γ + v p + g ab a b = − va a + v v + g bc a b dt ∂x ∂v ∂x ∂v ∂v I Some schoolboy algebra shows that for dp/dt = 0 p ∂ 1 ∂gab a b ∂φ dp v v + log dx c = − |g| − p 2 ∂x c ∂x c ∂x x Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as dp ∂ ∂2p ∂p a a b c ab ∂φ Γ + v p + g ab a b = − va a + v v + g bc a b dt ∂x ∂v ∂x ∂v ∂v I Some schoolboy algebra shows that for dp/dt = 0 p ∂ 1 ∂gab a b ∂φ dp v v + log dx c = − |g| − p 2 ∂x c ∂x c ∂x x I Therefore the above second-order SDE is satisfied by invariant densities t → ∞ of the form Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as dp ∂ ∂2p ∂p a a b c ab ∂φ Γ + v p + g ab a b = − va a + v v + g bc a b dt ∂x ∂v ∂x ∂v ∂v I Some schoolboy algebra shows that for dp/dt = 0 p ∂ 1 ∂gab a b ∂φ dp v v + log dx c = − |g| − p 2 ∂x c ∂x c ∂x x I Therefore the above second-order SDE is satisfied by invariant densities t → ∞ of the form 1 p(x, v) ∝ det(G(x)) exp − vT G(x)v − φ(x) 2 Second-Order Diffusions I What is invariant measure of this diffusion? I The Fokker-Planck equation for the diffusion follows as dp ∂ ∂2p ∂p a a b c ab ∂φ Γ + v p + g ab a b = − va a + v v + g bc a b dt ∂x ∂v ∂x ∂v ∂v I Some schoolboy algebra shows that for dp/dt = 0 p ∂ 1 ∂gab a b ∂φ dp v v + log dx c = − |g| − p 2 ∂x c ∂x c ∂x x I Therefore the above second-order SDE is satisfied by invariant densities t → ∞ of the form 1 p(x, v) ∝ det(G(x)) exp − vT G(x)v − φ(x) 2 I So for φ(x) = − log π(x) + 1 2 log det(G(x)) then it follows that marginally p(x) = exp (−φ(x)) = π(x) I Numerical integration forms basis of MCMC scheme..... however Second-Order Diffusions Second-Order Diffusions I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt Second-Order Diffusions I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt Second-Order Diffusions I I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + is transformed to p 2G−1 (xt )dWt Second-Order Diffusions I I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt is transformed to dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt Second-Order Diffusions I I I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt is transformed to dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + where vector function ν has elements −∂i g ab pa pb p 2G(xt )dWt Second-Order Diffusions I I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt is transformed to dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + I where vector function ν has elements −∂i g ab pa pb I Invariant measure with density p 2G(xt )dWt 1 T −1 1 p(x, p) ∝ exp − p G (x)p − φ(x) det(G(x)) 2 Second-Order Diffusions I I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt is transformed to dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + I where vector function ν has elements −∂i g ab pa pb I Invariant measure with density p 2G(xt )dWt 1 T −1 1 p(x, p) ∝ exp − p G (x)p − φ(x) det(G(x)) 2 I Stochastic Hamiltonian on Manifold - Lie-Trotter Splitting of Hamiltonian (deterministic and stochastic OU Process) and using symplectic integrator - samples drawn from invariant measure via RMHMC Second-Order Diffusions I I Transform the system of SDE’s from the configuration to phase space such that transformed variable pt = G(xt )vt dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt is transformed to dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + I where vector function ν has elements −∂i g ab pa pb I Invariant measure with density p 2G(xt )dWt 1 T −1 1 p(x, p) ∝ exp − p G (x)p − φ(x) det(G(x)) 2 I Stochastic Hamiltonian on Manifold - Lie-Trotter Splitting of Hamiltonian (deterministic and stochastic OU Process) and using symplectic integrator - samples drawn from invariant measure via RMHMC I Proposals follow local geodesics on manifold Second-Order Diffusions Second-Order Diffusions I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process Second-Order Diffusions I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt Second-Order Diffusions I I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt Employ symplectic integrator e.g. RMHMC for drift term Second-Order Diffusions I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt I Employ symplectic integrator e.g. RMHMC for drift term I Note that OU diffusion process corresponds to ad-hoc partial momentum updating Second-Order Diffusions I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt I Employ symplectic integrator e.g. RMHMC for drift term I Note that OU diffusion process corresponds to ad-hoc partial momentum updating I Solid theoretical basis amenable to analysis of algorithm performance Second-Order Diffusions I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt I Employ symplectic integrator e.g. RMHMC for drift term I Note that OU diffusion process corresponds to ad-hoc partial momentum updating I Solid theoretical basis amenable to analysis of algorithm performance I Phase space defined on co-tangent bundle with symplectic forms in place Second-Order Diffusions I Stochastic Hamiltonian on Manifold driven by conditional Ornstein-Uhlenbeck process dxt = G−1 (xt )pt dt dpt = −ν(xt , pt )dt − ∇xt φ(xt )dt − pt dt + p 2G(xt )dWt I Employ symplectic integrator e.g. RMHMC for drift term I Note that OU diffusion process corresponds to ad-hoc partial momentum updating I Solid theoretical basis amenable to analysis of algorithm performance I Phase space defined on co-tangent bundle with symplectic forms in place I What of the configuration space? Second-Order Diffusions Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt Second-Order Diffusions I I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt Drift function can be seen to describe Lagrangian dynamics Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt I Drift function can be seen to describe Lagrangian dynamics I OU diffusion driving overall process Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt I Drift function can be seen to describe Lagrangian dynamics I OU diffusion driving overall process I Construct explicit integrator for dynamics - with volume correction Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt I Drift function can be seen to describe Lagrangian dynamics I OU diffusion driving overall process I Construct explicit integrator for dynamics - with volume correction I Preserves original Hamiltonian dynamics - high proposal acceptance rates Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt I Drift function can be seen to describe Lagrangian dynamics I OU diffusion driving overall process I Construct explicit integrator for dynamics - with volume correction I Preserves original Hamiltonian dynamics - high proposal acceptance rates I Speed improvement over implicit symplectic integrator between 2x to 10x Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt I Drift function can be seen to describe Lagrangian dynamics I OU diffusion driving overall process I Construct explicit integrator for dynamics - with volume correction I Preserves original Hamiltonian dynamics - high proposal acceptance rates I Speed improvement over implicit symplectic integrator between 2x to 10x Though non-symplectic geometry limits sampling - see The Geometry Foundations of Hamiltonian Monte Carlo http://arxiv.org/pdf/1410.5110.pdf Second-Order Diffusions I This SDE defined on co-tangent bundle - non-symplectic forms dxt = vt dt dvt = −γ(xt , vt )dt − G−1 (xt )∇xt φ(xt )dt − vt dt + p 2G−1 (xt )dWt I Drift function can be seen to describe Lagrangian dynamics I OU diffusion driving overall process I Construct explicit integrator for dynamics - with volume correction I Preserves original Hamiltonian dynamics - high proposal acceptance rates I Speed improvement over implicit symplectic integrator between 2x to 10x Though non-symplectic geometry limits sampling - see The Geometry Foundations of Hamiltonian Monte Carlo http://arxiv.org/pdf/1410.5110.pdf I Markov chain Monte Carlo from Lagrangian Dynamics Journal of Comp.Graph.Stats, 2014 Gaussian Mixture Model I Univariate finite mixture model p(x|µ, σ 2 ) = 0.7 × N (x|0, σ 2 ) + 0.3 × N (x|µ, σ 2 ) Gaussian Mixture Model Univariate finite mixture model p(x|µ, σ 2 ) = 0.7 × N (x|0, σ 2 ) + 0.3 × N (x|µ, σ 2 ) 4 3 2 1 lnσ I 0 −1 −2 −3 −4 −4 −3 −2 −1 0 µ 1 2 3 4 Figure : Arrows correspond to the gradients and ellipses to the inverse metric tensor. Dashed lines are isocontours of the joint log density Gaussian Mixture Model MALA path Simplified mMALA path mMALA path 4 4 4 −1410 −1410 −1410 −1310 3 −1310 3 −1210 −1110 −1010 1 1 −910 0 −1 lnσ −910 lnσ lnσ −910 0 −1410 −2 −1 0 µ 1 2 −1410 −2 −2910 −3 −4 −4 4 −3 −2 MALA Sample Autocorrelation Function −1 0 µ 1 2 3 −4 −4 4 0.2 0 0.6 0.4 0.2 0 6 8 10 Lag 12 14 16 18 20 −0.2 −2 −1 0 µ 1 2 3 4 0.8 Sample Autocorrelation Sample Autocorrelation 0.4 4 −3 Simplified mMALA Sample Autocorrelation Function 0.8 0.6 2 −2910 −3 mMALA Sample Autocorrelation Function 0.8 0 −1410 −2 −2910 −3 3 0 −1 −1 −2 Sample Autocorrelation −1010 −1010 1 −0.2 −1110 2 2 −3 −1210 −1110 2 −4 −4 −1310 3 −1210 0.6 0.4 0.2 0 0 2 4 6 8 10 Lag 12 14 16 18 20 −0.2 0 2 4 6 8 10 Lag 12 14 16 18 20 Gaussian Mixture Model HMC path RMHMC path 4 Gibbs path 4 4 −1410 −1410 −1310 3 −1410 −1310 3 −1210 −1110 2 −1010 1 1 −1 −910 lnσ −910 lnσ lnσ −910 0 −1 −1410 −2 −2 −1 0 µ 1 2 −1410 −4 −4 4 −3 −2 HMC Sample Autocorrelation Function −1 0 µ 1 2 3 −4 −4 4 0.2 0 0.6 0.4 0.2 0 6 8 10 Lag 12 14 −2 16 18 20 −0.2 −1 0 µ 1 2 3 4 0.8 Sample Autocorrelation Sample Autocorrelation 0.4 4 −3 Gibbs Sample Autocorrelation Function 0.8 0.6 2 −2910 −3 RMHMC Sample Autocorrelation Function 0.8 0 −1410 −2 −2910 −3 3 0 −1 −2 −2910 −3 Sample Autocorrelation −1010 1 0 −0.2 −1110 2 −1010 −3 −1210 −1110 2 −4 −4 −1310 3 −1210 0.6 0.4 0.2 0 0 2 4 6 8 10 Lag 12 14 16 18 20 −0.2 0 2 4 6 8 10 Lag 12 14 16 18 20 Log-Gaussian Cox Point Process with Latent Field I The joint density for Poisson counts and latent Gaussian field p(y, x|µ, σ, β) ∝ Y64 i,j exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1 θ (x−µ1)/2) Log-Gaussian Cox Point Process with Latent Field I The joint density for Poisson counts and latent Gaussian field p(y, x|µ, σ, β) ∝ I Y64 i,j exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1 θ (x−µ1)/2) Metric tensors G(θ)i,j = 1 ∂Σθ −1 ∂Σθ Σθ trace Σ−1 θ 2 ∂θi ∂θj G(x) = Λ + Σ−1 θ where Λ is diagonal with elements m exp(µ + (Σθ )i,i ) Log-Gaussian Cox Point Process with Latent Field I The joint density for Poisson counts and latent Gaussian field p(y, x|µ, σ, β) ∝ I Y64 i,j exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1 θ (x−µ1)/2) Metric tensors G(θ)i,j = 1 ∂Σθ −1 ∂Σθ Σθ trace Σ−1 θ 2 ∂θi ∂θj G(x) = Λ + Σ−1 θ where Λ is diagonal with elements m exp(µ + (Σθ )i,i ) I Latent field metric tensor defining flat manifold is 4096 × 4096, O(N 3 ) obtained from parameter conditional Log-Gaussian Cox Point Process with Latent Field I The joint density for Poisson counts and latent Gaussian field p(y, x|µ, σ, β) ∝ I Y64 i,j exp{yi,j xi,j −m exp(xi,j )} exp(−(x−µ1)T Σ−1 θ (x−µ1)/2) Metric tensors G(θ)i,j = 1 ∂Σθ −1 ∂Σθ Σθ trace Σ−1 θ 2 ∂θi ∂θj G(x) = Λ + Σ−1 θ where Λ is diagonal with elements m exp(µ + (Σθ )i,i ) I Latent field metric tensor defining flat manifold is 4096 × 4096, O(N 3 ) obtained from parameter conditional I MALA requires transformation of latent field to sample - with separate tuning in transient and stationary phases of Markov chain RMHMC for Log-Gaussian Cox Point Processes 10 10 10 10 20 20 20 20 30 30 30 30 40 40 40 40 50 50 50 50 60 60 60 20 40 60 20 40 60 60 20 40 60 10 10 10 10 20 20 20 20 30 30 30 30 40 40 40 40 50 50 50 50 60 60 60 20 40 60 20 40 60 20 40 60 20 40 60 60 20 40 60 Figure : Data, Latent Field, Poisson Mean Field RMHMC for Log-Gaussian Cox Point Processes Table : Sampling the latent variables of a Log-Gaussian Cox Process - Comparison of sampling methods Method MALA (Transient) MALA (Stationary) mMALA RMHMC Time 31,577 31,118 634 2936 ESS (Min, Med, Max) (3, 8, 50) (4, 16, 80) (26, 84, 174) (1951, 4545, 5000) s/Min ESS 10,605 7836 24.1 1.5 Rel. Speed ×1 ×1.35 ×440 ×7070 Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: −∇ · (eu ∇w) = 0 −eu ∇w · n = u Bi = −1 u −e ∇w · n in Ω on on ∂Ω/ ΓR ΓR Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: −∇ · (eu ∇w) = 0 −eu ∇w · n = u Bi = −1 u −e ∇w · n I in Ω on on ∂Ω/ ΓR ΓR Forward state is w and u the logarithm of distributed thermal conductivity on Ω, n the unit outward normal on Ω, and Bi the Biot number. Illustrative Heat Conduction Problem I Heat conduction problem governed by an elliptic partial differential equation in the open and bounded domain Ω: −∇ · (eu ∇w) = 0 −eu ∇w · n = u Bi = −1 u −e ∇w · n I in Ω on on ∂Ω/ ΓR ΓR Forward state is w and u the logarithm of distributed thermal conductivity on Ω, n the unit outward normal on Ω, and Bi the Biot number. Illustrative Heat Conduction Problem 4 3.5 3 2.5 2 1.5 1 0.5 0 −3 −2 −1 0 1 2 3 Illustrative Heat Conduction Problem Illustrative Heat Conduction Problem I Take one finite element discretisation of domain and one observation at leftmost boundary Illustrative Heat Conduction Problem I Take one finite element discretisation of domain and one observation at leftmost boundary I Consider induced bivariate posterior for varying forms of prior Gaussian measure πpost 5 u2 0 −5 −10 −10 −5 u1 0 5 Hamiltonian on Riemann Manifold The dynamics for k -th component of u is given by Hamiltons equations duk ∂H = G(u)−1 p = dt ∂pk k ∂ G(u) dpk ∂H 1 =− = −∇k J (u) − Tr G(u)−1 dt ∂uk 2 ∂uk ∂ G(u) 1 + pT G(u)−1 G(u)−1 p 2 ∂uk Gradient Gradient Z ũeu ∇w · ∇λ dΩ, h∇J (u) , ũi = Ω First Order Forward Z eu ∇w · ∇λ̂ dΩ + Ω Z Z Bi w λ̂ ds = ∂Ω\ΓR λ̂ ds, ΓR First Order Adjoint Z Ω eu ∇λ · ∇ŵ dΩ + Z Bi λŵ ds = − ∂Ω\ΓR K 1 X (w (xj ) − dj ) ŵ (xj ) , σ2 j=1 Hamiltonian on Riemann Manifold The dynamics for k -th component of u is given by Hamiltons equations duk ∂H = G(u)−1 p = dt ∂pk k ∂ G(u) dpk ∂H 1 =− = −∇k J (u) − Tr G(u)−1 dt ∂uk 2 ∂uk ∂ G(u) 1 + pT G(u)−1 G(u)−1 p 2 ∂uk Hessian/Fisher Information Matrix Fisher Matrix-Vector Product D E Z ũeu ∇w · ∇λ˜2 dΩ, hG (u) , ũi , u 2 = Ω Second Order Forward Z eu ∇w 2 · ∇λ̂ dΩ + Z Ω Bi w 2 λ̂ ds = − ∂Ω\ΓR Z u 2 eu ∇w · ∇λ̂ dΩ, Ω Second Order Adjoint Z eu ∇λ˜2 · ∇ŵ dΩ + Ω Z Bi λ˜2 ŵ ds = − ∂Ω\ΓR K 1 X 2 w (xj ) ŵ (xj ) . σ2 j=1 Hamiltonian on Riemann Manifold The dynamics for k -th component of u is given by Hamiltons equations duk ∂H = G(u)−1 p = dt ∂pk k ∂ G(u) dpk ∂H 1 =− = −∇k J (u) − Tr G(u)−1 dt ∂uk 2 ∂uk ∂ G(u) 1 + pT G(u)−1 G(u)−1 p 2 ∂uk Derivative of Fisher Information Matrix Derivative of Fisher Matrix-Matrix product DD E E D D E E hT (u) , ũi , u 2 , u 3 := ∇ hG (u) , ũi , u 2 , u 3 Z Z Z ũeu ∇w · ∇λ2,3 dΩ ũeu ∇w 3 · ∇λ˜2 dΩ + ũu 3 eu ∇w · ∇λ˜2 dΩ + = Ω Ω Ω Third Order Forward Z u Z 3 Z 3 e ∇w · ∇λ̂ dΩ + 3 u u e ∇w · ∇λ̂ dΩ. Bi w λ̂ ds = − Ω ∂Ω\ΓR Ω Third Order Adjoint Z u 2,3 e ∇λ Ω Z · ∇ŵ dΩ + 2,3 Bi λ ŵ ds = − ∂Ω\ΓR K 1 X σ 2 j=1 Z w 2,3 xj ŵ xj u e ∇λ˜2 · ∇ŵ dΩ, 3 u − Ω Two-parameter Case trace of ui sRMMALA Sample mean ACF of ui 1 5 0.5 1 0 0 0 −5 0 0.5 1 1000 3000 5000 0 50 RMMALA Lag 1 5 0.5 1 0 0 0 −5 0 0.5 1 1000 3000 5000 0 50 sRMHMC Lag 1 5 0.5 1 0 0 0 −5 0 0.5 1 1000 3000 5000 0 50 RMHMC Lag 1 5 0.5 1 0 0 0 −5 0 0.5 1 1000 3000 5000 0 50 Lag I ε = 0.7 for simRMMALA and RMMALA I ε = 0.02, L = 100 for simRMHMC and RMHMC 1025-parameter Case u1 Low rank Fisher 5 0 0 0 u2 5000 0 −5 0 2 u1024 Exact full Hessian 5 −5 0 5 5000 5000 −5 0 2 5000 −2 0 2 5000 −2 0 5000 −5 0 2 5000 0 5000 0 5000 −5 0 5 0 0 0 −2 0 −5 0 5 0 0 −2 0 2 u1025 Exact Fisher 5 −2 0 2 5000 0 5000 −2 0 5000 1025-parameter Case Low rank Fisher Exact Fisher u1 1 u2 u1024 1 0 0 0 −1 0 −1 0 −1 0 5 Lag 10 1 5 Lag 10 1 0 0 0 −1 0 −1 0 5 Lag 10 5 Lag 10 1 1 0 0 0 −1 0 −1 0 −1 0 5 Lag 10 5 Lag 10 5 Lag 10 5 Lag 10 5 Lag 10 1 0 0 0 −1 0 −1 0 −1 0 10 10 1 1 5 Lag 5 Lag 1 −1 0 1 u1025 Exact full Fisher 1 5 Lag 10 Bui-Thanh, T., and Girolami, M., Solving Large-scale PDE using Riemann Manifold Hamiltonian MCMC, Inverse Problems, 30, 92014) 114014, IoP Publishing. Conclusion and Discussion I Geometry of statistical models harnessed in Monte Carlo methods Conclusion and Discussion I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA Conclusion and Discussion I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC Conclusion and Discussion I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models Conclusion and Discussion I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence I Geodesic Monte Carlo on Embedded Manifolds http://arxiv.org/pdf/1301.6064.pdf Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence I Geodesic Monte Carlo on Embedded Manifolds http://arxiv.org/pdf/1301.6064.pdf I Investigate alternative manifold structures Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence I Geodesic Monte Carlo on Embedded Manifolds http://arxiv.org/pdf/1301.6064.pdf I Investigate alternative manifold structures I Design and effect of metric and connection Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence I Geodesic Monte Carlo on Embedded Manifolds http://arxiv.org/pdf/1301.6064.pdf I Investigate alternative manifold structures I Design and effect of metric and connection I Optimality of Hamiltonian flows as local geodesics Conclusion and Discussion I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence I Geodesic Monte Carlo on Embedded Manifolds http://arxiv.org/pdf/1301.6064.pdf I Investigate alternative manifold structures I Design and effect of metric and connection I Optimality of Hamiltonian flows as local geodesics I Surrogate geometries based on emulators for PDEs Conclusion and Discussion I I I Geometry of statistical models harnessed in Monte Carlo methods I Diffusions that respect structure and curvature of space - Manifold MALA I Geodesic flows on model manifold - RMHMC - generalisation of HMC I Assessed on correlated & high-dimensional latent variable models I Promising capability of methodology Ongoing development I Theoretical analysis of convergence I Geodesic Monte Carlo on Embedded Manifolds http://arxiv.org/pdf/1301.6064.pdf I Investigate alternative manifold structures I Design and effect of metric and connection I Optimality of Hamiltonian flows as local geodesics I Surrogate geometries based on emulators for PDEs No silver bullet or cure all - new powerful methodology for MC toolkit Funding Acknowledgments I EPSRC Established Research Career Fellowship I EPSRC Programme Grant - Enabling Quantification of Uncertainty in Inverse Problems I Royal Society Wolfson Research Merit Award