Adaptive Geometric Monte Carlos using Gaussian Process emulation for computation intensive models

Shiwei Lan, Mark Girolami
Department of Statistics, University of Warwick
March 19, 2015

Shiwei Lan, Mark Girolami (Warwick), GP emulative Geometric Monte Carlos, 03/19/2015

Geometric Monte Carlo: adapt to local geometry
[Figure: four panels titled RWM, HMC, LMC and RHMC; axes θ1 and θ2.]

Geometric Monte Carlo: but computationally...
[Figure: the same four panels (RWM, HMC, LMC, RHMC); axes θ1 and θ2.]

Big data/models: how?

Gaussian Process emulation/surrogate
[Figure: two panels titled "Truth" and "Emulation by GP"; axes θ1 and θ2.]

Outline
1 Geometric Monte Carlos
    Random Walk
    + Gradient
    + Metric
2 Gaussian Process emulation
    Emulation of potential energy
    Emulation with derivative information
    Emulation of gradient and metric
3 Auto-refinement: online updating design pool
    Adaptation through Regeneration
    Mutual Information for Computer Experiments (MICE)
4 Experiments
    Banana-Biscuit-Doughnut distribution
    Elliptic PDE
    Teal South oil reservoir
5 Conclusion and Discussion

1 Geometric Monte Carlos

Random Walk

Random Walk Metropolis
Sample from a probability distribution $P(\theta)$ with density $\pi(\theta)$.
- Given the current state $\theta$, sample a random direction $p \sim N(0, I)$.
- Make a proposal $\theta^*$ along this direction $p$, like a random walk:
$$ d\theta_t = dW_t, \qquad \theta^* = \theta + \varepsilon p $$
- Accept $\theta^*$ according to the Metropolis acceptance probability
$$ \alpha = \min\left\{1, \frac{\pi(\theta^*|D)/q(\theta^*|\theta)}{\pi(\theta|D)/q(\theta|\theta^*)}\right\} = \min\left\{1, \frac{\pi(\theta^*|D)}{\pi(\theta|D)}\right\} $$
  where $q(\theta^*|\theta) = q(\theta|\theta^*) = \pi_{N}(\theta^*; \theta, \varepsilon^2 I)$.
- Otherwise reject $\theta^*$ and stay at the current state for the next sample.

+ Gradient

Metropolis Adjusted Langevin Algorithm
Definition 1 (Langevin dynamics)
$$ d\theta_t = \frac{1}{2}\nabla \log\pi(\theta_t)\,dt + dW_t $$
$$ \theta^* = \theta + \frac{\varepsilon^2}{2}\nabla \log\pi(\theta) + \varepsilon p, \qquad p \sim N(0, I) $$
- Accept or reject $\theta^*$ according to the Metropolis probability $\alpha$.
- Converges to the invariant distribution $\pi(\theta)$.
- Suppresses the random walk behavior, but isotropic diffusion is inefficient for correlated distributions.

Hamiltonian Monte Carlo: Multi-step MALA
$$ \dot\theta = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial \theta} $$
[Figure: potential energy U(θ) plotted against θ.]
- Position $\theta \in \mathbb{R}^D$ ⇐ variables of interest
- Momentum $p \in \mathbb{R}^D$ ⇐ fictitious, usually $\sim N(0, M)$
- Potential energy $U(\theta)$ ⇐ minus log of target density $\pi(\cdot)$
- Kinetic energy $K(p)$ ⇐ minus log of momentum density
- Hamiltonian $H(\theta, p) = U(\theta) + K(p)$ ⇐ constant.
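Going back to the Langevin proposal of Definition 1, the following is a minimal NumPy sketch of one MALA transition, not the authors' implementation. The function names (`mala_step`, `run_mala`), the toy Gaussian target and the step size are assumptions made for illustration. Because the Langevin proposal is not symmetric, the sketch uses the general form of α with the ratio π/q rather than the simplified random-walk form.

```python
import numpy as np

def mala_step(theta, log_pi, grad_log_pi, eps, rng):
    """One MALA transition from `theta`; returns (next_state, accepted)."""
    def mean(x):
        # Mean of the Gaussian proposal started at x: x + (eps^2 / 2) * grad log pi(x)
        return x + 0.5 * eps**2 * grad_log_pi(x)

    def log_q(x_to, x_from):
        # log N(x_to; mean(x_from), eps^2 I), up to an additive constant
        d = x_to - mean(x_from)
        return -0.5 * (d @ d) / eps**2

    prop = mean(theta) + eps * rng.standard_normal(theta.shape)
    # log of [pi(prop)/q(prop|theta)] / [pi(theta)/q(theta|prop)]
    log_alpha = (log_pi(prop) - log_q(prop, theta)) - (log_pi(theta) - log_q(theta, prop))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return theta, False

def run_mala(theta0, log_pi, grad_log_pi, eps, n_iter, seed=0):
    """Run a MALA chain; returns (samples, acceptance rate)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_iter, theta.size))
    n_accepted = 0
    for t in range(n_iter):
        theta, accepted = mala_step(theta, log_pi, grad_log_pi, eps, rng)
        samples[t] = theta
        n_accepted += accepted
    return samples, n_accepted / n_iter

# Toy target (an assumption for illustration): standard bivariate Gaussian.
samples, acc_rate = run_mala(np.zeros(2),
                             log_pi=lambda x: -0.5 * x @ x,
                             grad_log_pi=lambda x: -x,
                             eps=0.8, n_iter=5000)
```

Setting the gradient term to zero in `mean` recovers Random Walk Metropolis, in which case the two `log_q` terms cancel.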

Hamiltonian Monte Carlo
Definition 2 (Hamiltonian dynamics)
$$ \dot\theta = \frac{\partial}{\partial p} H(\theta, p) = M^{-1} p $$
$$ \dot p = -\frac{\partial}{\partial \theta} H(\theta, p) = -\nabla_\theta U(\theta) $$
Leapfrog: numerical integrator
$$ p(t + \varepsilon/2) = p(t) - (\varepsilon/2)\nabla_\theta U(\theta(t)) $$
$$ \theta(t + \varepsilon) = \theta(t) + \varepsilon M^{-1} p(t + \varepsilon/2) $$
$$ p(t + \varepsilon) = p(t + \varepsilon/2) - (\varepsilon/2)\nabla_\theta U(\theta(t + \varepsilon)) $$
Run for $L$ steps and accept the joint proposal of $z := (\theta, p)$ with
$$ \alpha = \min\{1, \exp(-H(z^*) + H(z))\} $$

Hamiltonian Monte Carlo
[Diagram: HMC workflow. From θ, draw a random p; evolve by dθ/dt = ∂H/∂p, dp/dt = −∂H/∂θ using the leapfrog steps p(t+ε/2) = p(t) − (ε/2) ∂H/∂θ(θ(t)), θ(t+ε) = θ(t) + ε ∂H/∂p(p(t+ε/2)), p(t+ε) = p(t+ε/2) − (ε/2) ∂H/∂θ(θ(t+ε)); propose θ*; accept/reject. Annotations: reversibility, volume preservation, energy conservation, choice of ε and L, direction of p.]

Geometry plays a role!
[Figure: three panels titled "Sampling Path of RWM", "Sampling Path of HMC" and "Sampling Path of RHMC"; axes θ1 and θ2.]

+ Metric

Riemannian Hamiltonian dynamics
On the manifold $\{\pi(\cdot\,; \theta)\}$ with metric $G(\theta) = -E_{x|\theta}[\nabla^2_\theta \log\pi(x; \theta)]$:
$$ H(\theta, p) = U(\theta) + K(p, \theta) = -\log\pi(\theta) + \frac{1}{2}\log\det G(\theta) + \frac{1}{2} p^T G(\theta)^{-1} p \equiv \phi(\theta) + \frac{1}{2} p^T G(\theta)^{-1} p $$
where $p|\theta \sim N(0, G(\theta))$.
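When the metric is constant, G(θ) ≡ M, the Hamiltonian above reduces (up to an additive constant) to that of standard HMC, and the leapfrog scheme of Definition 2 applies. The following is a minimal sketch of that special case, assuming NumPy; the function names (`leapfrog`, `hmc_step`) and the quadratic toy potential are illustrative assumptions, not code from the talk.

```python
import numpy as np

def leapfrog(theta, p, grad_U, eps, L, M_inv):
    """L leapfrog steps of size eps for H = U(theta) + p^T M^{-1} p / 2."""
    theta = np.array(theta, dtype=float)
    p = np.array(p, dtype=float)
    p -= 0.5 * eps * grad_U(theta)            # initial half step for momentum
    for step in range(L):
        theta += eps * (M_inv @ p)            # full step for position
        # Interior momentum updates merge two half steps; the last one is a half step.
        p -= (eps if step < L - 1 else 0.5 * eps) * grad_U(theta)
    return theta, p

def hmc_step(theta, U, grad_U, eps, L, M, rng):
    """One HMC transition; returns (next_state, accepted)."""
    M_inv = np.linalg.inv(M)
    p = rng.multivariate_normal(np.zeros(theta.size), M)   # p ~ N(0, M)
    H = lambda th, mom: U(th) + 0.5 * mom @ M_inv @ mom
    theta_new, p_new = leapfrog(theta, p, grad_U, eps, L, M_inv)
    # Accept with probability min{1, exp(-H(z*) + H(z))}
    if np.log(rng.uniform()) < H(theta, p) - H(theta_new, p_new):
        return theta_new, True
    return theta, False

# Toy potential (an assumption for illustration): U(theta) = |theta|^2 / 2.
U = lambda th: 0.5 * th @ th
grad_U = lambda th: th
rng = np.random.default_rng(1)
theta1, accepted = hmc_step(np.array([1.0, -0.5]), U, grad_U,
                            eps=0.1, L=20, M=np.eye(2), rng=rng)
```

Negating the momentum and integrating again returns the trajectory to its starting point, which is the reversibility property the acceptance rule relies on.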
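Girolami and Calderhead (2011) propose:
Definition 3 (Riemannian Hamiltonian dynamics)
$$ \dot\theta = \frac{\partial}{\partial p} H(\theta, p) = G(\theta)^{-1} p $$
$$ \dot p = -\frac{\partial}{\partial \theta} H(\theta, p) = -\nabla_\theta \phi(\theta) + \frac{1}{2} p^T G(\theta)^{-1}\,\partial G(\theta)\,G(\theta)^{-1} p $$

Riemannian Hamiltonian Monte Carlo
Generalized leapfrog (written with the same force term as Definition 3):
$$ p^{(n+\frac12)} = p^{(n)} - \frac{\varepsilon}{2}\left[\nabla_\theta \phi(\theta^{(n)}) - \frac{1}{2}(p^{(n+\frac12)})^T G(\theta^{(n)})^{-1}\,\partial G(\theta^{(n)})\,G(\theta^{(n)})^{-1} p^{(n+\frac12)}\right] $$
$$ \theta^{(n+1)} = \theta^{(n)} + \frac{\varepsilon}{2}\left[G^{-1}(\theta^{(n)}) + G^{-1}(\theta^{(n+1)})\right] p^{(n+\frac12)} $$
$$ p^{(n+1)} = p^{(n+\frac12)} - \frac{\varepsilon}{2}\left[\nabla_\theta \phi(\theta^{(n+1)}) - \frac{1}{2}(p^{(n+\frac12)})^T G(\theta^{(n+1)})^{-1}\,\partial G(\theta^{(n+1)})\,G(\theta^{(n+1)})^{-1} p^{(n+\frac12)}\right] $$
- Time reversible
- Volume preserving
- But the first two updates are implicit, so the scheme is time consuming and occasionally unstable.

Lagrangian dynamics
Definition 4 (Lagrangian Dynamics)
Start from the Riemannian Hamiltonian dynamics
$$ \dot\theta = G(\theta)^{-1} p, \qquad \dot p = -\nabla_\theta \phi(\theta) + \frac{1}{2} p^T G(\theta)^{-1}\,\partial G(\theta)\,G(\theta)^{-1} p $$
and change variables $p \mapsto v = G(\theta)^{-1} p$ to obtain the Lagrangian dynamics
$$ \dot\theta = v, \qquad \dot v = -v^T \Gamma(\theta) v - G(\theta)^{-1}\nabla_\theta \phi(\theta) $$
where $\Gamma(\theta)$ denotes the Christoffel symbols of the metric $G(\theta)$.
- Not Hamiltonian dynamics of $(\theta, v)$!
- Computational time is mainly spent on finding the direction $v$.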

Second-order Langevin Diffusion
Consider the stochastic differential equation
$$ d\theta_t = v_t\,dt $$
$$ dv_t = -v_t^T \Gamma(\theta_t) v_t\,dt - G^{-1}(\theta_t)\nabla_{\theta_t}\phi(\theta_t)\,dt - v_t\,dt + \sqrt{2 G^{-1}(\theta_t)}\,dW_t $$
The Fokker-Planck equation for the evolution of the density $q(\theta, v)$ is
$$ \frac{dq}{dt} = -v^k \frac{\partial q}{\partial \theta^k} + \frac{\partial}{\partial v^k}\left[\left(\Gamma^k_{ij} v^i v^j + g^{kl}\frac{\partial \phi}{\partial \theta^l} + v^k\right) q\right] + g^{kl}\frac{\partial^2 q}{\partial v^k \partial v^l} $$
The stationary density as $t \to \infty$ (where $dq/dt = 0$) is
$$ q(\theta, v) = \frac{|g(\theta)|}{(2\pi)^{D/2}} \exp\left\{-\frac{1}{2} v^T G(\theta) v - \phi(\theta)\right\} = \pi(\theta)\,N(0, G(\theta)^{-1}) $$

Lagrangian Monte Carlo
Explicit time-reversible integrator
$$ v^{(n+\frac12)} = \left[I + \frac{\varepsilon}{2}(v^{(n)})^T \Gamma(\theta^{(n)})\right]^{-1}\left[v^{(n)} - \frac{\varepsilon}{2} G(\theta^{(n)})^{-1}\nabla_\theta \phi(\theta^{(n)})\right] $$
$$ \theta^{(n+1)} = \theta^{(n)} + \varepsilon v^{(n+\frac12)} $$
$$ v^{(n+1)} = \left[I + \frac{\varepsilon}{2}(v^{(n+\frac12)})^T \Gamma(\theta^{(n+1)})\right]^{-1}\left[v^{(n+\frac12)} - \frac{\varepsilon}{2} G(\theta^{(n+1)})^{-1}\nabla_\theta \phi(\theta^{(n+1)})\right] $$
Proposition 1 (Detailed Balance with Volume Correction)
$$ \tilde\alpha(z, z')P(dz) = \tilde\alpha(z', z)P(dz'), \qquad \tilde\alpha(z, z') = \min\left\{1, \frac{\exp(-H(z'))}{\exp(-H(z))}\,|\det \hat T_L|\right\} $$
Numerical convergence
$$ e_n = \|z(t_n) - z^{(n)}\| = \|(\theta(t_n), v(t_n)) - (\theta^{(n)}, v^{(n)})\| \to 0, \qquad \varepsilon \to 0 $$

Efficiency Measurement
Definition 5 (Effective Sample Size)
For $S$ samples, the effective sample size is calculated as
$$ \mathrm{ESS} = S\left[1 + 2\sum_{k=1}^{K} \rho(k)\right]^{-1} $$
where $\rho(k)$ is the autocorrelation function at lag $k$, and $K \gg 1$.
- Performance is measured by time-normalized ESS.
- ESS is interpreted as the number of nearly independent samples.
- Use the minimum ESS over parameters, normalized by CPU time: min(ESS)/s.
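Definition 5 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the estimator used for the tables in the talk: the function names are hypothetical, and the truncation rule (stop the sum at the first non-positive sample autocorrelation, capped at 1000 lags) is an assumption standing in for the unspecified choice of K.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelations rho(1), ..., rho(max_lag) of a 1-D chain."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[:-k] @ x[k:]) / denom for k in range(1, max_lag + 1)])

def ess(x, max_lag=None):
    """ESS = S / (1 + 2 * sum_k rho(k)) for a 1-D chain of S samples."""
    S = len(x)
    K = min(S - 1, 1000) if max_lag is None else max_lag
    rho = autocorr(x, K)
    # Assumed truncation heuristic: keep lags up to the first non-positive rho(k).
    nonpos = np.flatnonzero(rho <= 0)
    if nonpos.size:
        rho = rho[:nonpos[0]]
    return S / (1.0 + 2.0 * rho.sum())

def min_ess_per_second(samples, cpu_seconds):
    """Minimum ESS across the columns of `samples`, divided by CPU time."""
    return min(ess(samples[:, d]) for d in range(samples.shape[1])) / cpu_seconds

# Illustration: an i.i.d. chain versus a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(0)
iid_chain = rng.standard_normal(5000)
ar_chain = np.zeros(5000)
for t in range(1, 5000):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.standard_normal()
ess_iid, ess_ar = ess(iid_chain), ess(ar_chain)
```

For an AR(1) chain with coefficient 0.9 the theoretical ratio ESS/S is (1 − 0.9)/(1 + 0.9) ≈ 0.053, so the correlated chain is worth far fewer independent samples than its length suggests.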

Banana-shaped distribution
[Figure: panels titled RHMC, sLMC and LMC, shown for θ1 (top row) and θ2 (bottom row).]

Method   AP     s/Iter     ESS                 min(ESS)/s
HMC      0.79   6.96e-04   (288,614,941)       20.65
RHMC     0.78   4.56e-03   (4514,5779,7044)    49.50
sLMC     0.84   7.90e-04   (2195,3476,4757)    138.98
LMC      0.73   7.27e-04   (1139,2409,3678)    78.32

Banana-shaped distribution
[Figure: three panels titled "Sampling Path of RHMC", "Sampling Path of sLMC" and "Sampling Path of LMC"; axes θ1 and θ2.]

Links between geometric Monte Carlos
[Diagram relating RWM, MALA, HMC, RHMC and LMC, annotated with "Gradient descent" and "Newton's method".]

2 Gaussian Process emulation

Emulation of potential energy

GP emulation of potential energy (log-posterior)
All MCMC algorithms need the log-posterior:
$$ U(\theta) = -\log L(x; \theta) - \log P(\theta) $$
- Big data: the log-likelihood has a huge number of terms to add up.
- Complex models: they are computationally expensive to simulate.
We need cheaper substitutes: Gaussian Process emulation.
$$ U(\cdot) \sim \mathcal{GP}(\mu(\cdot), \mathcal{C}(\cdot, \cdot)) $$
$$ \mu(\theta) = h(\theta)\beta, \qquad h(\theta) := [1, \theta^T] \ \text{a } 1 \times (D+1) \text{ vector} $$
$$ \mathcal{C}(\cdot, \cdot) = \sigma^2 C(\cdot, \cdot), \qquad C(\theta_i, \theta_j) := \exp\{-(\theta_i - \theta_j)^T \mathrm{diag}(\rho)(\theta_i - \theta_j)\} $$
Other parametrizations: $\rho = r^{-2}$ ($r$ is the correlation length), $\rho = e^{-\tau}$.

GP emulation of potential energy
Given design points $\mathrm{De} := \{\theta_1, \cdots, \theta_n\}$, and conditioned on the functional outputs $u_D := U(\mathrm{De})$, we can predict $U(\theta^*)$ at $E := \{\theta^*_1, \cdots, \theta^*_m\}$, denoted as $u_E$. Assume $p(\beta, \sigma^2) \propto \sigma^{-2}$. Then
$$ u_E \,|\, u_D, \rho \sim T_{n-(D+1)}(\mu^{**}, \hat\sigma^2 C^{**}) $$
$$ \mu^{**} = H_E \hat\beta + C_{ED} C_D^{-1}(u_D - H_D \hat\beta) $$
$$ C^{**} = C_E - \begin{bmatrix} H_E & C_{ED} \end{bmatrix} \begin{bmatrix} 0 & H_D^T \\ H_D & C_D \end{bmatrix}^{-1} \begin{bmatrix} H_E^T \\ C_{DE} \end{bmatrix} $$
$$ \hat\beta = (H_D^T C_D^{-1} H_D)^{-1} H_D^T C_D^{-1} u_D $$
$$ \hat\sigma^2 = (n - (D+1) - 2)^{-1} u_D^T Q_D u_D $$

GP emulation of potential energy
$$ B_D := (H_D^T C_D^{-1} H_D)^{-1}, \qquad P_D := B_D H_D^T C_D^{-1} $$
$$ Q_D := C_D^{-1}[I - H_D P_D], \qquad L_E := H_E P_D + C_{ED} Q_D $$
Best Linear Unbiased Predictor (BLUP)
$$ u_E \,|\, u_D, \rho \approx \mu^{**} = L_E u_D $$
- $\rho$ is fixed at its MLE.
- $P_D H_D = I$, $\quad H_D^T Q_D = Q_D H_D = 0$.

Emulation with derivative information

GP emulation with derivative information
Derivative information, $du_D = \nabla \otimes U(\mathrm{De})$, helps GP emulation.
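The BLUP formulas above (B_D, P_D, Q_D, L_E) translate almost line by line into NumPy. The following is a minimal sketch without derivative information; the function names, the small `jitter` term, the grid design and the toy potential `U_true` are assumptions for illustration, and ρ is simply fixed by hand rather than set to its MLE.

```python
import numpy as np

def corr(A, B, rho):
    """C(a, b) = exp{-(a - b)^T diag(rho) (a - b)} for all rows a of A, b of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.einsum('ijk,k->ij', diff**2, rho))

def basis(A):
    """Rows h(theta) = [1, theta^T] of the linear mean function."""
    return np.hstack([np.ones((A.shape[0], 1)), A])

def fit_blup(design, u_D, rho, jitter=1e-10):
    """Precompute P_D and Q_D; return a predictor E -> L_E u_D."""
    n = len(design)
    H_D = basis(design)
    C_D = corr(design, design, rho) + jitter * np.eye(n)  # jitter: assumed safeguard
    Cinv_H = np.linalg.solve(C_D, H_D)                    # C_D^{-1} H_D
    B_D = np.linalg.inv(H_D.T @ Cinv_H)                   # (H_D^T C_D^{-1} H_D)^{-1}
    P_D = B_D @ Cinv_H.T                                  # B_D H_D^T C_D^{-1}
    Q_D = np.linalg.solve(C_D, np.eye(n) - H_D @ P_D)     # C_D^{-1} [I - H_D P_D]

    def predict(E):
        L_E = basis(E) @ P_D + corr(E, design, rho) @ Q_D
        return L_E @ u_D

    return predict, P_D, Q_D

# Illustration on a 5 x 5 grid design with a banana-like toy potential (assumed).
grid = np.linspace(-2.0, 2.0, 5)
design = np.array([[a, b] for a in grid for b in grid])
U_true = lambda th: 0.5 * (th[:, 0] + th[:, 1] ** 2) ** 2
u_D = U_true(design)
predict, P_D, Q_D = fit_blup(design, u_D, rho=np.array([1.0, 1.0]))
u_E = predict(np.array([[0.5, 0.5], [-1.2, 0.3]]))
```

Since P_D and Q_D depend only on the design, each new prediction costs one correlation evaluation against the design points and two small matrix products, which is what makes the emulator cheap inside an MCMC loop.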
The differential operator is linear, thus $dU(\cdot)$ is still a Gaussian Process (Papoulis and Pillai, 2002). With superscripts indexing design points and subscripts indexing components:
$$ E\left[\frac{\partial U(\theta^i)}{\partial \theta^i_k}\right] = \frac{\partial}{\partial \theta^i_k} E[U(\theta^i)] $$
$$ \mathrm{Cor}\left(\frac{\partial U(\theta^i)}{\partial \theta^i_k}, U(\theta^j)\right) = \frac{\partial}{\partial \theta^i_k} C(\theta^i, \theta^j) = -2\rho_k(\theta^i_k - \theta^j_k)\,C(\theta^i, \theta^j) $$
$$ \mathrm{Cor}\left(\frac{\partial U(\theta^i)}{\partial \theta^i_k}, \frac{\partial U(\theta^j)}{\partial \theta^j_l}\right) = \frac{\partial^2}{\partial \theta^i_k \partial \theta^j_l} C(\theta^i, \theta^j) = [2\rho_k \delta_{kl} - 4\rho_k \rho_l(\theta^i_k - \theta^j_k)(\theta^i_l - \theta^j_l)]\,C(\theta^i, \theta^j) $$

Effect of derivative information on emulation
[Figure: four panels titled "30pts (no derivatives)", "60pts (no derivatives)", "30pts (with derivatives)" and "True density"; axes θ1 and θ2.]
Proposition 2
Denote $\tilde u_D = [u_D^T, du_D^T]^T$. Given the same design set $\mathrm{De}$, we have
$$ E[(U(\theta^*) - \hat U(\theta^*)\,|\,\tilde u_D)^2] \le E[(U(\theta^*) - \hat U(\theta^*)\,|\,u_D)^2] $$

Emulation of gradient and metric

GP emulation of gradient and Hessian
Geometric MCMC algorithms also require the gradient/Hessian of the potential energy and the Fisher metric (and its derivatives), which are emulated similarly. The joint covariance blocks are (block sizes in parentheses):
$$
\begin{array}{c|ccccc}
 & d^2u_E\,(mD^2) & du_E\,(mD) & u_E\,(m) & u_D\,(n) & du_D\,(nD) \\ \hline
d^2u_E\,(mD^2) & C_{E^2} & & & C_{E^2D^0} & C_{E^2D^1} \\
du_E\,(mD) & & C_{E^1} & & C_{E^1D^0} & C_{E^1D^1} \\
u_E\,(m) & & & C_{E} & C_{E^0D^0} & C_{E^0D^1} \\
u_D\,(n) & C_{D^0E^2} & C_{D^0E^1} & C_{D^0E^0} & C_{D^0D^0} & C_{D^0D^1} \\
du_D\,(nD) & C_{D^1E^2} & C_{D^1E^1} & C_{D^1E^0} & C_{D^1D^0} & C_{D^1D^1}
\end{array}
$$
Denote $\tilde B_D := (\tilde H_D^T \tilde C_D^{-1} \tilde H_D)^{-1}$, $\tilde P_D := \tilde B_D \tilde H_D^T \tilde C_D^{-1}$, $\tilde Q_D := \tilde C_D^{-1}(I - \tilde H_D \tilde P_D)$. All predictions can be written as a linear map of the extended information based on the design points:
$$ E[u_{E^\alpha} \,|\, \tilde u_D] = \tilde L_{E^\alpha} \tilde u_D, \qquad \tilde L_{E^\alpha} := H_{E^\alpha} \tilde P_D + C_{E^\alpha \tilde D} \tilde Q_D, \qquad \alpha = 0, 1, 2 $$

GP emulation of Fisher metric
Direct emulation of the expected Fisher information is impossible:
$$ \mathrm{FI}(\theta^* | \mathrm{De}) = \int \nabla^2 U(x, \theta^* | \mathrm{De}) \exp(-U(x, \theta^* | \mathrm{De}))\,dx $$
Consider the empirical Fisher information instead:
$$ \mathrm{eFI}(\theta^* | \mathrm{De}) = DU(\theta^* | \mathrm{De})\,J_N\,DU(\theta^* | \mathrm{De})^T $$
$$ DU(\theta^* | \mathrm{De})_{i,j} = \frac{\partial}{\partial \theta^*_i} U(x_j, \theta^* | \mathrm{De}), \qquad J_N := I_N - 1_N 1_N^T / N $$
Assume a GP for $U(x_j, \cdot)$ across the different $x_j$'s:
$$ U(x_j, \cdot) \sim \mathcal{GP}(\mu(\cdot), \mathcal{C}(\cdot, \cdot)) $$
We emulate the empirical Fisher information:
$$ E[\mathrm{eFI}(\theta^*) \,|\, \tilde U_D] = \tilde L_{E^1} \tilde U_D J_N \tilde U_D^T \tilde L_{E^1}^T =: \tilde L_{E^1}\,\widetilde{\mathrm{FI}}_D\,\tilde L_{E^1}^T $$

Effect of design size on emulation
[Figure: two panels titled "Emulated gradient and Fisher information" and "Emulation in the field"; axes θ1 and θ2. Legend: density, test pts, true gradient, true Fisher, 6 design pts, emulation with 6 pts, 15 design pts, emulation with 15 pts, 30 design pts, emulation with 30 pts.]
Proposition 3 (Benjamin Haaland, Vaibhav Maheshwari)
For design sets $\mathrm{De}_1 \subseteq \mathrm{De}_2$, we have
$$ E[(U(\theta^*) - \hat U(\theta^* | \mathrm{De}_2))^2] \le E[(U(\theta^*) - \hat U(\theta^* | \mathrm{De}_1))^2] $$

3 Auto-refinement: online updating design pool

Adaptation through Regeneration

Regeneration
Adaptation: refine the design set and update the GP emulator.
$$ T(\theta_{t+1} | \theta_t) = S(\theta_t) Q(\theta_{t+1}) + (1 - S(\theta_t)) R(\theta_{t+1} | \theta_t) $$
Key: regard the transition kernel as a mixture of two kernels; with a certain probability, view the state as coming from the independence kernel ⟹ REGENERATE.

Regeneration
Sample $\theta_{t+1} | \theta_t \sim T(\cdot | \theta_t)$, but decide whether $\theta_{t+1} \sim Q(\theta_{t+1})$ with
$$ B_{t+1} \sim \mathrm{Bern}(r(\theta_t, \theta_{t+1})), \qquad r(\theta_t, \theta_{t+1}) = \frac{S(\theta_t) Q(\theta_{t+1})}{T(\theta_{t+1} | \theta_t)} $$
Split a transition kernel other than GPeMC, here an independence sampler:
$$ T(\theta_{t+1} | \theta_t) = q(\theta_{t+1}) \min\left\{1, \frac{\pi(\theta_{t+1})/q(\theta_{t+1})}{\pi(\theta_t)/q(\theta_t)}\right\} $$
$$ S(\theta_t) = \min\{1, c / [\pi(\theta_t)/q(\theta_t)]\} $$
$$ Q(\theta_{t+1}) = q(\theta_{t+1}) \min\{1, [\pi(\theta_{t+1})/q(\theta_{t+1})] / c\} $$
$q(\cdot)$: density of a mixture of Gaussians centered at the design points.

Mutual Information for Computer Experiments (MICE)

Mutual Information for Computer Experiments
Definition 6 (Mutual Information)
$$ I(U; U') = D_{KL}(p_{UU'} \,\|\, p_U p_{U'}) = H(U) - H(U | U') $$
Given a current design $\mathrm{De}$ and a candidate set $\Theta_{cand}$, to choose the most informative subset $E$, Guestrin et al. (2005) propose the sequential optimization of mutual information:
$$ \theta^* = \arg\max_{\theta \in \Theta_{cand}} I(U(\mathrm{De} \cup \{\theta\}); U(\mathrm{De}^c \setminus \{\theta\})) - I(U(\mathrm{De}); U(\mathrm{De}^c)) $$
$$ \phantom{\theta^*} = \arg\max_{\theta \in \Theta_{cand}} \mathrm{Var}(U(\theta) | u_D) \,/\, \mathrm{Var}(U(\theta) | U(\mathrm{De}^c \setminus \{\theta\})) $$
Beck and Guillas (2014) improve it for computer experiments.
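The variance-ratio criterion above can be illustrated with a short NumPy sketch. It implements the Guestrin-style ratio as written on the slide, not the modification of Beck and Guillas (2014). The assumptions are loud ones: a zero-mean, unit-variance GP with the squared-exponential correlation used earlier, the complement De^c approximated by the remaining candidates, a small nugget for numerical stability, and hypothetical function names.

```python
import numpy as np

def corr(A, B, rho):
    """C(a, b) = exp{-(a - b)^T diag(rho) (a - b)} for all rows a of A, b of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.einsum('ijk,k->ij', diff**2, rho))

def cond_var(x, S, rho, nugget=1e-6):
    """Var(U(x) | U(S)) for a zero-mean, unit-variance GP (assumed simplification)."""
    if len(S) == 0:
        return 1.0
    C_S = corr(S, S, rho) + nugget * np.eye(len(S))
    c = corr(x[None, :], S, rho).ravel()
    # Clamp at a tiny positive value to guard against round-off.
    return max(1.0 - c @ np.linalg.solve(C_S, c), 1e-12)

def next_design_point(design, candidates, rho):
    """Index of the candidate maximizing Var(.|design) / Var(.|other candidates)."""
    scores = []
    for i, x in enumerate(candidates):
        others = np.delete(candidates, i, axis=0)   # stands in for De^c \ {theta}
        scores.append(cond_var(x, design, rho) / cond_var(x, others, rho))
    return int(np.argmax(scores)), scores

# Illustration: one design point at the origin and four candidates.
design = np.array([[0.0, 0.0]])
candidates = np.array([[0.01, 0.0], [2.0, 2.0], [2.1, 2.0], [-2.0, 2.0]])
best, scores = next_design_point(design, candidates, rho=np.array([1.0, 1.0]))
```

In this toy configuration the candidate next to the existing design point scores lowest, while a candidate that is far from the design but close to another unselected candidate scores highest: the criterion rewards points that are poorly explained by the current design and highly informative about the remaining candidates.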

Refining design set
- Start with an arbitrary design and run GP emulative Monte Carlo.
- When the chain regenerates, refine the design by MICE, with the candidate set formed by the MCMC samples between regenerations.
- $\{U(\theta) \,|\, \theta \in \Theta_{cand}\}$ have already been calculated at the accept/reject step.
[Figure: four panels titled "Initial Design", "Adapting...", "Final Design" and "True density"; axes θ1 and θ2.]

Mutual learning system
[Diagram.]

4 Experiments

Banana-Biscuit-Doughnut distribution

Banana-Biscuit-Doughnut distribution
[Figure: three panels titled "Banana", "Biscuit" and "Doughnut"; axes are pairs of θ1, θ2, θ3, θ4.]
Generalization of the 2d banana-shaped distribution:
$$ y \,|\, \theta \sim N(\mu_y, \sigma_y^2), \qquad \mu_y := \sum_{k=1}^{\lceil D/2 \rceil} \theta_{2k-1} + \sum_{k=1}^{\lfloor D/2 \rfloor} \theta_{2k}^2 $$
$$ \theta_i \overset{iid}{\sim} N(0, \sigma_\theta^2) $$
Consider $D = 4$, and generate $N = 3 \times 10^6$ data $y_n$ with $\mu_y = 0$, $\sigma_y = 10^4$, and $\sigma_\theta = 1$ to make it data intensive.

Banana-Biscuit-Doughnut distribution
[Figure: grid of pairwise plots for the four parameters.]

Banana-Biscuit-Doughnut distribution

Algorithm   AP     s/iter     ESS               minESS/s   spdup
RWM         0.68   9.84E-03   (3,4,8)           0.03       1.00
HMC         0.84   8.28E-02   (552,873,1102)    0.67       21.97
RHMC        0.85   1.05E-01   (914,1040,1135)   0.87       28.75
LMC         0.91   3.91E-02   (884,1119,1239)   2.26       74.37
GPeHMC      0.78   2.84E-02   (622,653,754)     2.19       72.14
GPeRHMC     0.76   1.32E-01   (319,560,756)     0.24       7.95
GPeLMC      0.78   2.46E-02   (537,587,647)     2.18       71.76

Table: Sampling Efficiency in BBD distribution. AP is the acceptance probability, s/iter is the CPU time (seconds) for each iteration, ESS is reported as (min., med., max.) and minESS/s is the time-normalized ESS. Spdup is the speed-up of minESS/s with RWM as the baseline. Results are summarized for 100000 samples after burn-in.
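To make the cost structure of the BBD model defined in this section concrete, here is a minimal NumPy sketch of its potential energy U(θ) (minus log-posterior up to a constant) and gradient. The function names are hypothetical, and the example uses a small synthetic data set with illustrative values of σ_y and N rather than the N = 3 × 10^6 setting of the experiment.

```python
import numpy as np

def mu_y(theta):
    """mu_y = sum of odd-indexed theta (1-based) + sum of squared even-indexed theta."""
    return theta[0::2].sum() + (theta[1::2] ** 2).sum()

def potential(theta, y, sigma_y, sigma_theta):
    """U(theta) = sum_n (y_n - mu_y)^2 / (2 sigma_y^2) + |theta|^2 / (2 sigma_theta^2)."""
    r = y - mu_y(theta)                      # one residual per data point: O(N) work
    return 0.5 * (r @ r) / sigma_y**2 + 0.5 * (theta @ theta) / sigma_theta**2

def grad_potential(theta, y, sigma_y, sigma_theta):
    """Gradient of U via the chain rule through mu_y."""
    dmu = np.ones_like(theta)
    dmu[1::2] = 2.0 * theta[1::2]            # d mu_y / d theta_{2k} = 2 theta_{2k}
    return -(y - mu_y(theta)).sum() / sigma_y**2 * dmu + theta / sigma_theta**2

# Small synthetic example (sizes and scales are illustrative assumptions).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 2.0, size=1000)
theta = np.array([0.3, -0.7, 1.1, 0.5])
U_val = potential(theta, y, sigma_y=2.0, sigma_theta=1.0)
g_val = grad_potential(theta, y, sigma_y=2.0, sigma_theta=1.0)
```

Every evaluation of `potential` or `grad_potential` touches all N data points, which is the per-iteration cost that the emulator is meant to avoid.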

Banana-Biscuit-Doughnut distribution
[Figure: two panels titled "Error Reducing"; y-axes "Relative Error of Mean" and "Relative Error of Covariance" (log scale), x-axis "Seconds" (0 to 1000). Legend: RWM, HMC, GPeHMC, RHMC, GPeRHMC, LMC, GPeLMC.]

Elliptic PDE

Elliptic PDE
Consider a canonical inverse problem involving inference of the diffusion coefficient $c$ in the following elliptic PDE on $[0, 1]^2$:
$$ \nabla_x \cdot (c(x, \theta)\nabla_x u(x, \theta)) = 0 $$
$$ u(x, \theta)|_{x_2 = 0} = x_1, \qquad u(x, \theta)|_{x_2 = 1} = 1 - x_1 $$
$$ \left.\frac{\partial u(x, \theta)}{\partial x_1}\right|_{x_1 = 0} = \left.\frac{\partial u(x, \theta)}{\partial x_1}\right|_{x_1 = 1} = 0 $$
The observations $\{y_i\}$ arise from the solutions on an $11 \times 11$ grid:
$$ y_i = u(x_i, \theta) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 0.1^2) $$
$c(x)$ has a log-Gaussian process prior. Karhunen-Loève expansion:
$$ c(x, \theta) \approx \exp\left(\sum_{d=1}^{D} \theta_d \sqrt{\lambda_d}\,c_d(x)\right), \qquad \theta_d \sim N(0, 1) $$

Elliptic PDE
[Figure: grid of pairwise plots for the parameters.]

Elliptic PDE

Algorithm   AP     s/iter     ESS                 minESS/s   spdup
RWM         0.57   2.70E-02   (69,94,249)         0.25       1.00
HMC         0.76   4.35E-01   (3169,4357,5082)    0.73       2.87
RHMC        0.87   2.22E+00   (5073,5802,6485)    0.23       0.90
LMC         0.72   3.92E-01   (5170,5804,6214)    1.32       5.19
GPeHMC      0.57   1.56E-02   (609,1328,2265)     3.91       15.38
GPeRHMC     0.65   7.35E-02   (614,1224,1457)     0.83       3.29
GPeLMC      0.64   2.65E-02   (774,1427,1754)     2.92       11.48
adpGPeLMC   0.94   8.71E-02   (3328,4058,4543)    3.82       15.03

Table: Sampling Efficiency in Elliptic PDE

Elliptic PDE
[Figure: two panels titled "Error Reducing"; y-axes "Relative Error of Mean" and "Relative Error of Covariance" (log scale), x-axis "Seconds" (0 to 1000). Legend: RWM, HMC, GPeHMC, RHMC, GPeRHMC, LMC, GPeLMC.]

Teal South oil reservoir

Teal South oil reservoir
[Figure: Teal South oil reservoir.]

Teal South oil reservoir
- a single well producing oil, water and gas
- 9 parameters $\theta$: $k_h$ for each of the 5 layers of the field, $k_v$, aquifer strength, rock compressibility and porosity
- a set of PDEs to simulate the field oil production rate (FOPR)
- 42 observations available in 1200 days starting from Nov. 1996
$$ \mathrm{Misfit} = \frac{1}{2\sigma_{FOPR}^2} \sum_{i=1}^{N} (\mathrm{FOPR}_{obs,i} - \mathrm{FOPR}_{sim,i}(\theta))^2, \qquad \sigma_{FOPR} = 100\ \mathrm{stb/d} $$
- inference based on 1000 models simulated by tNavigator

Teal South oil reservoir
[Figure: four trace plots titled RWM, GPeHMC, GPeRHMC and GPeLMC; y-axis "parameter" (−40 to 40), x-axis "iteration (thinned)" (0 to 1000).]

Teal South oil reservoir

Algorithm   AP     s/iter     ESS                 minESS/s   spdup
RWM         0.77   1.35E-03   (43,81,116)         3.20       1.00
GPeHMC      0.89   4.38E-03   (5808,6023,6124)    132.73     41.49
GPeRHMC     0.84   1.32E-02   (40,1173,3242)      0.30       0.09
GPeLMC      0.84   5.13E-03   (40,1173,3242)      0.78       0.24

Table: Sampling Efficiency in Teal South oil reservoir problem.

5 Conclusion and Discussion

Conclusion
- Geometry helps MCMC samplers by guiding the exploration of parameter space, yet computing the geometric quantities is a challenge for big data/models.
- Gaussian Process emulation can alleviate this challenge, but the quality of the emulator critically depends on the design set.
- Regeneration legitimately determines the times at which to refine the design and adapt the Markov chain.
- Experimental design algorithms, e.g. MICE, are helpful in refining the design set.

Future directions
- Emulator design for higher dimensions, which demand a larger number of design points.
    - Factorization of the covariance matrix: entries decay away from the diagonal.
    - Local predictor: use only a subset of the design points for prediction.
- Natural parallelization: simultaneously emulate on multiple points, $|E| > 1$, and exchange information among chains.
- Application to larger scale oil reservoir problems.

Thank you!