Adaptive Geometric Monte Carlos using Gaussian Process emulation for computation intensive models

Adaptive Geometric Monte Carlos
using Gaussian Process emulation
for computation intensive models
Shiwei Lan
Mark Girolami
Department of Statistics
University of Warwick
March 19, 2015
Shiwei Lan
Mark Girolami (Warwick)
GP emulative Geometric Monte Carlos
03/19/2015
1 / 56
Geometric Monte Carlo – adapt to local geometry
[Figure: four panels titled RWM, HMC, LMC and RHMC; axes θ1 and θ2, each from −2 to 2.]
Geometric Monte Carlo – but computationally...
[Figure: four panels titled RWM, HMC, LMC and RHMC; axes θ1 and θ2, each from −2 to 2.]
Big data/models – how?
Gaussian Process emulation/surrogate
[Figure: two panels titled "Truth" and "Emulation by GP"; axes θ1 and θ2, each from −2 to 2.]
Geometric Monte Carlos
1. Geometric Monte Carlos
   Random Walk
   + Gradient
   + Metric
2. Gaussian Process emulation
   Emulation of potential energy
   Emulation with derivative information
   Emulation of gradient and metric
3. Auto-refinement: online updating design pool
   Adaptation through Regeneration
   Mutual Information for Computer Experiments (MICE)
4. Experiments
   Banana-Biscuit-Doughnut distribution
   Elliptic PDE
   Teal South oil reservoir
5. Conclusion and Discussion
Geometric Monte Carlos
Random Walk
Random Walk Metropolis
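Sample from a probability distribution P(θ) with density π(θ).
Given the current state θ, sample a random direction $p \sim N(0, I)$.
Make a proposal $\theta^*$ by following this direction p like a random walk:
\[ d\theta_t = dW_t, \qquad \theta^* = \theta + \varepsilon p \]
Accept $\theta^*$ according to the Metropolis acceptance probability:
\[ \alpha = \min\left\{1, \frac{\pi(\theta^*|D)/q(\theta^*|\theta)}{\pi(\theta|D)/q(\theta|\theta^*)}\right\} = \min\left\{1, \frac{\pi(\theta^*|D)}{\pi(\theta|D)}\right\} \]
where $q(\theta^*|\theta) = q(\theta|\theta^*) = \pi_N(\theta^*; \theta, \varepsilon^2 I)$.
Otherwise reject $\theta^*$ and stay at the current state for the next sample.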
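The Random Walk Metropolis steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rwm`, its arguments and the use of NumPy are choices made for this sketch, and `log_pi` stands for any function returning log π(θ) up to an additive constant.

```python
import numpy as np

def rwm(log_pi, theta0, eps, n_samples, rng):
    """Random Walk Metropolis with proposal theta* = theta + eps * p, p ~ N(0, I)."""
    theta = np.asarray(theta0, dtype=float)
    logp = log_pi(theta)
    samples = np.empty((n_samples, theta.size))
    n_acc = 0
    for i in range(n_samples):
        p = rng.standard_normal(theta.size)      # random direction
        prop = theta + eps * p                   # random-walk proposal
        logp_prop = log_pi(prop)
        # symmetric proposal: accept with probability min{1, pi(prop) / pi(theta)}
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
            n_acc += 1
        samples[i] = theta                       # on rejection the state is repeated
    return samples, n_acc / n_samples
```

Working with log densities avoids underflow when π(θ) is very small, which is the usual situation for posteriors built from many data points.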
+ Gradient
Metropolis Adjusted Langevin Algorithm
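Definition 1 (Langevin dynamics)
\[ d\theta_t = \tfrac{1}{2}\nabla \log \pi(\theta_t)\,dt + dW_t \]
Discretizing with step size ε gives the proposal
\[ \theta^* = \theta + \frac{\varepsilon^2}{2}\nabla \log \pi(\theta) + \varepsilon p, \qquad p \sim N(0, I) \]
Accept or reject $\theta^*$ according to the Metropolis probability α.
The chain converges to the invariant distribution π(θ).
The gradient suppresses the random walk behavior, but isotropic diffusion is inefficient for correlated distributions.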
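A single MALA transition can be sketched as follows. This is an illustrative sketch only: the names `mala_step`, `log_pi` and `grad_log_pi` are assumptions of this example. Because the Langevin proposal is not symmetric, the sketch uses the full Metropolis-Hastings ratio with the Gaussian proposal density q.

```python
import numpy as np

def mala_step(theta, log_pi, grad_log_pi, eps, rng):
    """One Metropolis Adjusted Langevin step with step size eps."""
    def drift_mean(x):
        # mean of the Gaussian proposal started from x
        return x + 0.5 * eps**2 * grad_log_pi(x)

    def log_q(x_to, x_from):
        # log N(x_to; drift_mean(x_from), eps^2 I) up to a constant
        d = x_to - drift_mean(x_from)
        return -0.5 * (d @ d) / eps**2

    prop = drift_mean(theta) + eps * rng.standard_normal(theta.size)
    log_alpha = (log_pi(prop) + log_q(theta, prop)) - (log_pi(theta) + log_q(prop, theta))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return theta, False
```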
Hamiltonian Monte Carlo: Multi-step MALA
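\[ \dot\theta = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial \theta} \]
[Figure: potential energy U(θ) plotted against θ.]
Position $\theta \in \mathbb{R}^D$ ⇐ variables of interest
Momentum $p \in \mathbb{R}^D$ ⇐ fictitious, usually $\sim N(0, M)$
Potential energy U(θ) ⇐ minus log of target density π(·)
Kinetic energy K(p) ⇐ minus log of momentum density
Hamiltonian H(θ, p) = U(θ) + K(p) ⇐ constant.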
Hamiltonian Monte Carlo
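Definition 2 (Hamiltonian dynamics)
\[ \dot\theta = \frac{\partial}{\partial p} H(\theta, p) = M^{-1} p \]
\[ \dot p = -\frac{\partial}{\partial \theta} H(\theta, p) = -\nabla_\theta U(\theta) \]
Leapfrog: numerical integrator
\[ p(t + \varepsilon/2) = p(t) - (\varepsilon/2)\nabla_\theta U(\theta(t)) \]
\[ \theta(t + \varepsilon) = \theta(t) + \varepsilon M^{-1} p(t + \varepsilon/2) \]
\[ p(t + \varepsilon) = p(t + \varepsilon/2) - (\varepsilon/2)\nabla_\theta U(\theta(t + \varepsilon)) \]
Run for L steps and accept the joint proposal $z^*$ of z := (θ, p) with
\[ \alpha = \min\{1, \exp(-H(z^*) + H(z))\} \]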
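The leapfrog integrator and the acceptance rule can be sketched as below. This is a hedged sketch rather than the authors' code: the function names `leapfrog` and `hmc_step` and their argument lists are assumptions of this example, and the interior half-steps for the momentum are merged into full steps, which gives the same trajectory as applying the three leapfrog updates L times.

```python
import numpy as np

def leapfrog(theta, p, grad_U, eps, L, M_inv):
    """Integrate Hamiltonian dynamics for L leapfrog steps of size eps."""
    theta, p = theta.copy(), p.copy()
    p -= 0.5 * eps * grad_U(theta)                # initial half step for momentum
    for step in range(L):
        theta += eps * (M_inv @ p)                # full step for position
        # full momentum step between position updates, half step at the very end
        scale = eps if step < L - 1 else 0.5 * eps
        p -= scale * grad_U(theta)
    return theta, p

def hmc_step(theta, U, grad_U, eps, L, M, rng):
    """One HMC transition with mass matrix M and acceptance min{1, exp(-H(z*) + H(z))}."""
    M_inv = np.linalg.inv(M)
    p = rng.multivariate_normal(np.zeros(theta.size), M)   # fictitious momentum
    H_old = U(theta) + 0.5 * p @ M_inv @ p
    theta_new, p_new = leapfrog(theta, p, grad_U, eps, L, M_inv)
    H_new = U(theta_new) + 0.5 * p_new @ M_inv @ p_new
    if np.log(rng.uniform()) < H_old - H_new:
        return theta_new, True
    return theta, False
```

Running `leapfrog` forward, negating the momentum and running it again returns to the starting point, which is a quick numerical check of time reversibility.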
[Diagram: one HMC iteration. From the current θ, sample a random momentum p; integrate dθ/dt = ∂H/∂p, dp/dt = −∂H/∂θ with the leapfrog updates
p(t+ε/2) = p(t) − (ε/2) ∂H/∂θ(θ(t)), θ(t+ε) = θ(t) + ε ∂H/∂p(p(t+ε/2)), p(t+ε) = p(t+ε/2) − (ε/2) ∂H/∂θ(θ(t+ε));
propose θ*, then accept or reject. Annotations: reversibility, volume preservation, energy conservation, choice of ε and L, direction of p.]
Geometry plays a role!
[Figure: three panels, "Sampling Path of RWM", "Sampling Path of HMC" and "Sampling Path of RHMC"; axes θ1 and θ2, each from −2 to 2.]
+ Metric
Riemannian Hamiltonian dynamics
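On the manifold $\{\pi(\cdot; \theta)\}$ with metric $G(\theta) = -E_{x|\theta}[\nabla^2_\theta \log \pi(x; \theta)]$:
\[ H(\theta, p) = U(\theta) + K(p, \theta) = -\log \pi(\theta) + \tfrac{1}{2}\log\det G(\theta) + \tfrac{1}{2} p^T G(\theta)^{-1} p \equiv \phi(\theta) + \tfrac{1}{2} p^T G(\theta)^{-1} p \]
where $p|\theta \sim N(0, G(\theta))$. Girolami and Calderhead (2011) propose:
Definition 3 (Riemannian Hamiltonian dynamics)
\[ \dot\theta = \frac{\partial}{\partial p} H(\theta, p) = G(\theta)^{-1} p \]
\[ \dot p = -\frac{\partial}{\partial \theta} H(\theta, p) = -\nabla_\theta \phi(\theta) + \tfrac{1}{2} p^T G(\theta)^{-1} \partial G(\theta) G(\theta)^{-1} p \]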
Riemannian Hamiltonian Monte Carlo
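Generalized Leapfrog
\[ p^{(n+\frac12)} = p^{(n)} - \frac{\varepsilon}{2}\left[\nabla_\theta \phi(\theta^{(n)}) - \tfrac{1}{2}(p^{(n+\frac12)})^T G(\theta^{(n)})^{-1} \partial G(\theta^{(n)}) G(\theta^{(n)})^{-1} p^{(n+\frac12)}\right] \]
\[ \theta^{(n+1)} = \theta^{(n)} + \frac{\varepsilon}{2}\left[G^{-1}(\theta^{(n)}) + G^{-1}(\theta^{(n+1)})\right] p^{(n+\frac12)} \]
\[ p^{(n+1)} = p^{(n+\frac12)} - \frac{\varepsilon}{2}\left[\nabla_\theta \phi(\theta^{(n+1)}) - \tfrac{1}{2}(p^{(n+\frac12)})^T G(\theta^{(n+1)})^{-1} \partial G(\theta^{(n+1)}) G(\theta^{(n+1)})^{-1} p^{(n+\frac12)}\right] \]
The quadratic term is written with $G^{-1}\partial G\, G^{-1} = -\nabla_\theta G^{-1}$ so that the signs match Definition 3.
Time reversible
Volume preserving
But time consuming (the first two updates are implicit in $p^{(n+\frac12)}$ and $\theta^{(n+1)}$), and occasionally unstable.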
Lagrangian dynamics
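Definition 4 (Lagrangian Dynamics)
Start from the Riemannian Hamiltonian dynamics
\[ \dot\theta = G(\theta)^{-1} p, \qquad \dot p = -\nabla_\theta \phi(\theta) + \tfrac{1}{2} p^T G(\theta)^{-1} \partial G(\theta) G(\theta)^{-1} p \]
The change of variables $p \mapsto v := G(\theta)^{-1} p$ gives the Lagrangian dynamics
\[ \dot\theta = v, \qquad \dot v = -v^T \Gamma(\theta) v - G(\theta)^{-1} \nabla_\theta \phi(\theta) \]
where Γ(θ) denotes the Christoffel symbols of the metric G(θ).
Not Hamiltonian dynamics of (θ, v)!
Computational time is mainly spent on finding the direction v.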
Second-order Langevin Diffusion
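Consider the stochastic differential equation
\[ d\theta_t = v_t\,dt \]
\[ dv_t = -v_t^T \Gamma(\theta_t) v_t\,dt - G^{-1}(\theta_t)\nabla_\theta \phi(\theta_t)\,dt - v_t\,dt + \sqrt{2G^{-1}(\theta_t)}\,dW_t \]
The Fokker-Planck equation for the evolution of the density q(θ, v) is
\[ \frac{dq}{dt} = -v^k \frac{\partial q}{\partial \theta^k} + \frac{\partial}{\partial v^k}\left[\left(\Gamma^k_{ij} v^i v^j + g^{kl}\frac{\partial \phi}{\partial \theta^l} + v^k\right) q\right] + g^{kl}\frac{\partial^2 q}{\partial v^k \partial v^l} \]
The stationary density as t → ∞, where dq/dt = 0, is
\[ q(\theta, v) = \frac{|g(\theta)|}{(2\pi)^{D/2}} \exp\left\{-\tfrac{1}{2} v^T G(\theta) v - \phi(\theta)\right\} = \pi(\theta)\,N(0, G(\theta)^{-1}) \]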
Lagrangian Monte Carlo
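Explicit time-reversible integrator
\[ v^{(n+\frac12)} = \left[I + \frac{\varepsilon}{2}(v^{(n)})^T \Gamma(\theta^{(n)})\right]^{-1}\left[v^{(n)} - \frac{\varepsilon}{2} G(\theta^{(n)})^{-1} \nabla_\theta \phi(\theta^{(n)})\right] \]
\[ \theta^{(n+1)} = \theta^{(n)} + \varepsilon v^{(n+\frac12)} \]
\[ v^{(n+1)} = \left[I + \frac{\varepsilon}{2}(v^{(n+\frac12)})^T \Gamma(\theta^{(n+1)})\right]^{-1}\left[v^{(n+\frac12)} - \frac{\varepsilon}{2} G(\theta^{(n+1)})^{-1} \nabla_\theta \phi(\theta^{(n+1)})\right] \]
Proposition 1 (Detailed Balance with Volume Correction)
\[ \tilde\alpha(z, z')\,P(dz) = \tilde\alpha(z', z)\,P(dz'), \qquad \tilde\alpha(z, z') = \min\left\{1, \frac{\exp(-H(z'))}{\exp(-H(z))}\,|\det \hat T_L|\right\} \]
Numerical convergence
\[ e_n = \|z(t_n) - z^{(n)}\| = \|(\theta(t_n), v(t_n)) - (\theta^{(n)}, v^{(n)})\| \to 0 \quad \text{as } \varepsilon \to 0 \]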
Efficiency Measurement
Definition 5 (Effective Sample Size)
For S samples, the effective sample size is calculated as follows:
\[ \mathrm{ESS} = S\left[1 + 2\sum_{k=1}^{K} \rho(k)\right]^{-1} \]
where ρ(k) is the autocorrelation function at lag k, and $K \gg 1$.
Performance is measured by time-normalized ESS.
ESS is interpreted as the number of nearly independent samples.
Use the minimum ESS normalized by CPU time: min(ESS)/s.
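The ESS formula can be evaluated directly from a chain of scalar samples. The sketch below is one simple way to do it, assuming a fixed cutoff lag K chosen by the user; the function name `ess` is an assumption of this example and the slides do not say how K or ρ(k) were computed in practice.

```python
import numpy as np

def ess(x, K):
    """Effective sample size S / (1 + 2 * sum_{k=1}^K rho(k)) for a 1-d chain x."""
    x = np.asarray(x, dtype=float)
    S = x.size
    xc = x - x.mean()
    var = xc @ xc / S
    # empirical autocorrelation at lags 1..K
    rho = np.array([(xc[:-k] @ xc[k:]) / (S * var) for k in range(1, K + 1)])
    return S / (1.0 + 2.0 * rho.sum())
```

For a multivariate chain, this would be applied to each coordinate, and the minimum over coordinates divided by the CPU time gives min(ESS)/s.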
Banana-shaped distribution
[Figure: panels titled RHMC, sLMC and LMC; axes θ1 and θ2.]

Method   AP     s/iter     ESS (min, med, max)    min(ESS)/s
HMC      0.79   6.96e-04   (288, 614, 941)         20.65
RHMC     0.78   4.56e-03   (4514, 5779, 7044)      49.50
sLMC     0.84   7.90e-04   (2195, 3476, 4757)     138.98
LMC      0.73   7.27e-04   (1139, 2409, 3678)      78.32
[Figure: three panels, "Sampling Path of RHMC", "Sampling Path of LMC" and "Sampling Path of sLMC", on the banana-shaped distribution; axes θ1 and θ2, each from −2 to 2.]
Links between geometric Monte Carlos
[Diagram: RWM, MALA and HMC are linked to gradient descent; RHMC and LMC are linked to Newton's method.]
Gaussian Process emulation
Emulation of potential energy
GP emulation of potential energy (log-posterior)
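We need the log-posterior for all MCMC algorithms:
\[ U(\theta) = -\log L(x; \theta) - \log P(\theta) \]
Big data: the log-likelihood has a huge number of terms to add up.
Complex models: they are computationally expensive to simulate.
We need cheaper substitutes: Gaussian Process emulation.
\[ U(\cdot) \sim GP(\mu(\cdot), \mathcal{C}(\cdot, \cdot)) \]
\[ \mu(\theta) = h(\theta)\beta, \qquad h(\theta) := [1, \theta^T] \ \text{a } 1 \times (D+1) \text{ vector} \]
\[ \mathcal{C}(\cdot, \cdot) = \sigma^2 C(\cdot, \cdot), \qquad C(\theta_i, \theta_j) := \exp\{-(\theta_i - \theta_j)^T \mathrm{diag}(\rho)(\theta_i - \theta_j)\} \]
Other parametrizations: $\rho = r^{-2}$ (r the correlation length), $\rho = e^{-\tau}$.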
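The correlation function C(θ_i, θ_j) can be evaluated for whole sets of points at once. The following is a small sketch assuming NumPy; the function name `sq_exp_corr` and the array layout (one point per row) are choices made for this example.

```python
import numpy as np

def sq_exp_corr(A, B, rho):
    """Matrix with entries exp{-(a_i - b_j)^T diag(rho) (a_i - b_j)}.

    A has shape (n, D), B has shape (m, D), rho has shape (D,).
    """
    A = np.atleast_2d(A)
    B = np.atleast_2d(B)
    diff = A[:, None, :] - B[None, :, :]          # shape (n, m, D)
    return np.exp(-np.einsum('ijk,k,ijk->ij', diff, rho, diff))
```

With the correlation-length parametrization, `rho` would be computed as `r ** -2.0` before calling the function.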
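Given design points $De := \{\theta_1, \dots, \theta_n\}$, and conditioned on the functional outputs $u_D := U(De)$, we can predict $U(\theta^*)$ at $E := \{\theta^*_1, \dots, \theta^*_m\}$, denoted $u_E$. Assume $p(\beta, \sigma^2) \propto \sigma^{-2}$.
\[ u_E \mid u_D, \rho \sim T_{n-(D+1)}(\mu^{**}, \hat\sigma^2 C^{**}) \]
\[ \mu^{**} = H_E \hat\beta + C_{ED} C_D^{-1}(u_D - H_D \hat\beta) \]
\[ C^{**} = C_E - \begin{bmatrix} H_E & C_{ED} \end{bmatrix} \begin{bmatrix} 0 & H_D^T \\ H_D & C_D \end{bmatrix}^{-1} \begin{bmatrix} H_E^T \\ C_{DE} \end{bmatrix} \]
\[ \hat\beta = (H_D^T C_D^{-1} H_D)^{-1} H_D^T C_D^{-1} u_D \]
\[ \hat\sigma^2 = (n - (D+1) - 2)^{-1} u_D^T Q_D u_D \]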
\[ B_D := (H_D^T C_D^{-1} H_D)^{-1} \]
\[ P_D := B_D H_D^T C_D^{-1} \]
\[ Q_D := C_D^{-1}[I - H_D P_D] \]
\[ L_E := H_E P_D + C_{ED} Q_D \]
Best Linear Unbiased Predictor (BLUP):
\[ u_E \mid u_D, \rho \approx \mu^{**} = L_E u_D \]
ρ is fixed at its MLE.
\[ P_D H_D = I, \qquad H_D^T Q_D = Q_D H_D = 0 \]
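The matrices B_D, P_D, Q_D and the BLUP operator L_E can be assembled directly from the design points. The sketch below is illustrative only: the function names, the explicit matrix inverses and the small `jitter` added to C_D for numerical stability are assumptions of this example, and ρ is taken as given rather than fitted by maximum likelihood.

```python
import numpy as np

def corr(A, B, rho):
    """Squared-exponential correlation exp{-(a - b)^T diag(rho) (a - b)}."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', diff, rho, diff))

def blup_matrices(theta_D, theta_E, rho, jitter=1e-10):
    """Return L_E, P_D, Q_D, H_D for design points theta_D and test points theta_E."""
    n = theta_D.shape[0]
    H_D = np.hstack([np.ones((n, 1)), theta_D])                      # rows h(theta) = [1, theta^T]
    H_E = np.hstack([np.ones((theta_E.shape[0], 1)), theta_E])
    C_D = corr(theta_D, theta_D, rho) + jitter * np.eye(n)
    C_ED = corr(theta_E, theta_D, rho)
    C_D_inv = np.linalg.inv(C_D)
    B_D = np.linalg.inv(H_D.T @ C_D_inv @ H_D)
    P_D = B_D @ H_D.T @ C_D_inv
    Q_D = C_D_inv @ (np.eye(n) - H_D @ P_D)
    L_E = H_E @ P_D + C_ED @ Q_D
    return L_E, P_D, Q_D, H_D
```

Since L_E depends only on the design and test locations, it can be computed once and reused for any vector of outputs: the prediction is simply `L_E @ u_D`.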
Emulation with derivative information
GP emulation with derivative information
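Derivative information, $du_D = \nabla \otimes U(De)$, helps GP emulation.
The differential operator is linear, thus dU(·) is still a Gaussian Process (Papoulis and Pillai, 2002):
\[ E\left[\frac{\partial U(\theta^i)}{\partial \theta^i_k}\right] = \frac{\partial}{\partial \theta^i_k} E[U(\theta^i)] \]
\[ \mathrm{Cor}\left[\frac{\partial U(\theta^i)}{\partial \theta^i_k}, U(\theta^j)\right] = \frac{\partial}{\partial \theta^i_k} C(\theta^i, \theta^j) = -2\rho_k(\theta^i_k - \theta^j_k)\,C(\theta^i, \theta^j) \]
\[ \mathrm{Cor}\left[\frac{\partial U(\theta^i)}{\partial \theta^i_k}, \frac{\partial U(\theta^j)}{\partial \theta^j_l}\right] = \frac{\partial^2}{\partial \theta^i_k \partial \theta^j_l} C(\theta^i, \theta^j) = \left[2\rho_k \delta_{kl} - 4\rho_k \rho_l(\theta^i_k - \theta^j_k)(\theta^i_l - \theta^j_l)\right] C(\theta^i, \theta^j) \]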
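The two derivative formulas for the squared-exponential correlation are easy to code and to check against finite differences. This sketch assumes NumPy and uses hypothetical function names (`corr`, `dcorr`, `ddcorr`) for a single pair of points a = θ^i and b = θ^j.

```python
import numpy as np

def corr(a, b, rho):
    """C(a, b) = exp{-sum_k rho_k (a_k - b_k)^2}."""
    d = a - b
    return np.exp(-np.sum(rho * d * d))

def dcorr(a, b, rho):
    """Vector over k of Cor[dU(a)/da_k, U(b)] = -2 rho_k (a_k - b_k) C(a, b)."""
    return -2.0 * rho * (a - b) * corr(a, b, rho)

def ddcorr(a, b, rho):
    """Matrix over (k, l) of Cor[dU(a)/da_k, dU(b)/db_l]."""
    d = a - b
    return (2.0 * np.diag(rho) - 4.0 * np.outer(rho * d, rho * d)) * corr(a, b, rho)
```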
Effect of derivative information on emulation
[Figure: four panels, "30pts (no derivatives)", "60pts (no derivatives)", "30pts (with derivatives)" and "True density"; axes θ1 and θ2, each from −2 to 2.]
Proposition 2
Denote $\tilde u_D = [u_D^T, du_D^T]^T$. Given the same design set De, we have
\[ E[(U(\theta^*) - \hat U(\theta^* \mid \tilde u_D))^2] \le E[(U(\theta^*) - \hat U(\theta^* \mid u_D))^2] \]
Emulation of gradient and metric
GP emulation of gradient and Hessian
Geometric MCMCs also require the gradient/Hessian of the potential energy and the Fisher metric (derivatives), which are emulated similarly.
[Diagram: joint covariance matrix of $(d^2u_E, du_E, u_E, u_D, du_D)$, with block sizes $(mD^2, mD, m, n, nD)$, partitioned into blocks $C_{E^\alpha}$, $C_{E^\alpha D^\beta}$, $C_{D^\beta E^\alpha}$ and $C_{D^\beta D^{\beta'}}$ for α ∈ {0, 1, 2} and β, β' ∈ {0, 1}.]
Denote $\tilde B_D := (\tilde H_D^T \tilde C_D^{-1} \tilde H_D)^{-1}$, $\tilde P_D := \tilde B_D \tilde H_D^T \tilde C_D^{-1}$, $\tilde Q_D := \tilde C_D^{-1}(I - \tilde H_D \tilde P_D)$. All predictions can be written as a linear map of the extended information based on design points:
\[ E[u_{E^\alpha} \mid \tilde u_D] = \tilde L_{E^\alpha} \tilde u_D, \qquad \tilde L_{E^\alpha} := H_{E^\alpha} \tilde P_D + C_{E^\alpha \tilde D} \tilde Q_D, \qquad \alpha = 0, 1, 2 \]
GP emulation of Fisher metric
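Direct emulation of the expected Fisher information is impossible:
\[ \mathrm{FI}(\theta^* \mid De) = \int \nabla^2 U(x, \theta^* \mid De) \exp(-U(x, \theta^* \mid De))\,dx \]
Consider the empirical Fisher information instead:
\[ \mathrm{eFI}(\theta^* \mid De) = DU(\theta^* \mid De)\,J_N\,DU(\theta^* \mid De)^T \]
\[ DU(\theta^* \mid De)_{i,j} = \frac{\partial}{\partial \theta^*_i} U(x_j, \theta^* \mid De), \qquad J_N := I_N - 1_N 1_N^T / N \]
Assume a GP for $U(x_j, \cdot)$ across different $x_j$'s:
\[ U(x_j, \cdot) \sim GP(\mu(\cdot), \mathcal{C}(\cdot, \cdot)) \]
We emulate the empirical Fisher information:
\[ E[\mathrm{eFI}(\theta^*) \mid \tilde U_D] = \tilde L_{E^1} \tilde U_D J_N \tilde U_D^T \tilde L_{E^1}^T =: \tilde L_{E^1}\,\mathrm{gFI}_D\,\tilde L_{E^1}^T \]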
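The empirical Fisher information DU J_N DU^T is a centered outer product of per-datum gradients. A minimal sketch, assuming the D × N matrix of gradients `DU` is already available (how it is obtained, exactly or by emulation, is outside this example, and the function name is hypothetical):

```python
import numpy as np

def empirical_fisher(DU):
    """eFI = DU J_N DU^T with J_N = I_N - 1 1^T / N.

    DU has shape (D, N): column j holds the gradient of U(x_j, theta) in theta.
    """
    N = DU.shape[1]
    J_N = np.eye(N) - np.ones((N, N)) / N     # centering matrix
    return DU @ J_N @ DU.T
```

Because J_N centers the columns, the result equals N times the (biased) sample covariance of the per-datum gradients, so it is symmetric and positive semi-definite. For large N it would be cheaper to subtract the column mean than to form the N × N matrix J_N.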
Effect of design size on emulation
[Figure: two panels, "Emulated gradient and Fisher information" and "Emulation in the field"; legend: density, test pts, true gradient, true Fisher, 6 design pts, emulation with 6 pts, 15 design pts, emulation with 15 pts, 30 design pts, emulation with 30 pts; axes θ1 and θ2, each from −2 to 2.]
Proposition 3 (Benjamin Haaland, Vaibhav Maheshwari)
For design sets $De_1 \subseteq De_2$, we have
\[ E[(U(\theta^*) - \hat U(\theta^* \mid De_2))^2] \le E[(U(\theta^*) - \hat U(\theta^* \mid De_1))^2] \]
Auto-refinement: online updating design pool
Adaptation through Regeneration
Regeneration
Adaptation: refine the design set and update the GP emulator.
\[ T(\theta_{t+1} \mid \theta_t) = S(\theta_t)\,Q(\theta_{t+1}) + (1 - S(\theta_t))\,R(\theta_{t+1} \mid \theta_t) \]
Key: regard the transition kernel as a mixture of two kernels; view the state as coming from the independence kernel Q with a certain probability ⟹ REGENERATE.
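Sample $\theta_{t+1} \mid \theta_t \sim T(\cdot \mid \theta_t)$, but decide whether $\theta_{t+1} \sim Q(\theta_{t+1})$ with
\[ B_{t+1} \sim \mathrm{Bern}(r(\theta_t, \theta_{t+1})), \qquad r(\theta_t, \theta_{t+1}) = \frac{S(\theta_t)\,Q(\theta_{t+1})}{T(\theta_{t+1} \mid \theta_t)} \]
The split is applied to a separate transition kernel (an independence sampler), not to the GPeMC kernel itself:
\[ T(\theta_{t+1} \mid \theta_t) = q(\theta_{t+1}) \min\left\{1, \frac{\pi(\theta_{t+1})/q(\theta_{t+1})}{\pi(\theta_t)/q(\theta_t)}\right\} \]
\[ S(\theta_t) = \min\{1, c/[\pi(\theta_t)/q(\theta_t)]\} \]
\[ Q(\theta_{t+1}) = q(\theta_{t+1}) \min\{1, [\pi(\theta_{t+1})/q(\theta_{t+1})]/c\} \]
q(·): density of a mixture of Gaussians centered at the design points.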
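For an accepted independence-sampler move, the regeneration probability r depends only on the importance weights w = π/q at the current and proposed states and on the constant c. The sketch below evaluates it on the log scale; the function name and the log-weight arguments are assumptions of this example.

```python
import math

def regeneration_prob(log_w_cur, log_w_new, log_c):
    """r = S(theta_t) Q(theta_{t+1}) / T(theta_{t+1} | theta_t) for an accepted move.

    log_w_cur and log_w_new are log[pi/q] at theta_t and theta_{t+1};
    the common factor q(theta_{t+1}) cancels between Q and T.
    """
    log_S = min(0.0, log_c - log_w_cur)          # S(theta_t)
    log_Q = min(0.0, log_w_new - log_c)          # Q(theta_{t+1}) / q(theta_{t+1})
    log_T = min(0.0, log_w_new - log_w_cur)      # T(theta_{t+1} | theta_t) / q(theta_{t+1})
    return math.exp(log_S + log_Q - log_T)
```

For example, with w_t = 4, w_{t+1} = 2 and c = 1 the three factors are S = 1/4, Q/q = 1 and T/q = 1/2, so r = 1/2; when the two weights lie on opposite sides of c the chain regenerates with probability 1.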
Mutual Information for Computer Experiments (MICE)
Mutual Information for Computer Experiments
Definition 6 (Mutual Information)
\[ I(U; U') = D_{KL}(p_{UU'} \,\|\, p_U\,p_{U'}) = H(U) - H(U \mid U') \]
Given a current design De and a candidate set $\Theta_{cand}$, to choose the most informative subset E, Guestrin et al (2005) propose the sequential optimization of mutual information:
\[ \theta^* = \arg\max_{\theta \in \Theta_{cand}} I(U(De \cup \{\theta\}); U(De^c \setminus \{\theta\})) - I(U(De); U(De^c)) = \arg\max_{\theta \in \Theta_{cand}} \frac{\mathrm{Var}(U(\theta) \mid u_D)}{\mathrm{Var}(U(\theta) \mid U(De^c \setminus \{\theta\}))} \]
Beck and Guillas (2014) improve it for computer experiments.
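The variance-ratio criterion can be illustrated with a toy greedy selection step. This is a simplified sketch, not the MICE algorithm itself: it assumes a zero-mean, unit-variance GP with the squared-exponential correlation, takes the complement De^c to be the candidate set, and adds a small `nugget` to every covariance matrix purely for numerical stability. All function names are hypothetical.

```python
import numpy as np

def corr(A, B, rho):
    """Squared-exponential correlation between the rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.einsum('ijk,k,ijk->ij', diff, rho, diff))

def cond_var(x, S, rho, nugget):
    """Var(U(x) | U(S)) for a zero-mean, unit-variance GP."""
    if S.shape[0] == 0:
        return 1.0
    K = corr(S, S, rho) + nugget * np.eye(S.shape[0])
    k = corr(x[None, :], S, rho)[0]
    return 1.0 - k @ np.linalg.solve(K, k)

def select_next(design, cand, rho, nugget=1e-6):
    """Index of the candidate maximizing Var(U | design) / Var(U | other candidates)."""
    scores = []
    for i in range(cand.shape[0]):
        rest = np.delete(cand, i, axis=0)
        num = cond_var(cand[i], design, rho, nugget)
        den = cond_var(cand[i], rest, rho, nugget)
        scores.append(num / den)
    scores = np.array(scores)
    return int(np.argmax(scores)), scores
```

The ratio favors points that are poorly explained by the current design (large numerator) yet representative of the remaining candidates (small denominator), which is why a point in the middle of an unexplored cluster scores higher than an isolated outlier.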
Refining design set
Start with an arbitrary design, run GP emulative Monte Carlo.
When the chain regenerates, refine the design by MICE, with
candidate set formed by MCMC samples between regenerations.
The values $\{U(\theta) \mid \theta \in \Theta_{cand}\}$ have already been calculated at the accept/reject step.
[Figure: four panels, "Initial Design", "Adapting...", "Final Design" and "True density"; axes θ1 and θ2, each from −2 to 2.]
Mutual learning system
Experiments
Banana-Biscuit-Doughnut distribution
[Figure: three panels, "Banana" (axes θ1, θ2), "Biscuit" (axes θ1, θ3) and "Doughnut" (axes θ2, θ4); each axis from −2 to 2.]
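Generalization of the 2d banana-shaped distribution:
\[ y \mid \theta \sim N(\mu_y, \sigma_y^2), \qquad \mu_y := \sum_{k=1}^{\lceil D/2 \rceil} \theta_{2k-1} + \sum_{k=1}^{\lfloor D/2 \rfloor} \theta_{2k}^2 \]
\[ \theta_i \overset{iid}{\sim} N(0, \sigma_\theta^2) \]
Consider D = 4, and generate $N = 3 \times 10^6$ data $y_n$ with $\mu_y = 0$, $\sigma_y = 10^4$, and $\sigma_\theta = 1$ to make it data intensive.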
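The potential energy of the Banana-Biscuit-Doughnut model follows directly from its Gaussian likelihood and prior. The sketch below assumes the second argument of N(·, ·) is a variance, drops additive constants, and uses hypothetical function names; each evaluation touches all N data points, which is what makes the exact potential expensive when N is in the millions.

```python
import numpy as np

def mu_y(theta):
    """Sum of odd-indexed theta (1-based) plus sum of squared even-indexed theta."""
    theta = np.asarray(theta, dtype=float)
    return theta[0::2].sum() + (theta[1::2] ** 2).sum()

def potential(theta, y, sigma_y, sigma_theta):
    """U(theta) = -log likelihood - log prior, up to an additive constant."""
    theta = np.asarray(theta, dtype=float)
    resid = np.asarray(y, dtype=float) - mu_y(theta)
    neg_log_lik = 0.5 * (resid @ resid) / sigma_y**2
    neg_log_prior = 0.5 * (theta @ theta) / sigma_theta**2
    return neg_log_lik + neg_log_prior
```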
[Figure: 4 × 4 grid of panels with axis range −4 to 2; panel titles and axis labels did not survive extraction.]
Algorithm   AP     s/iter     ESS (min, med, max)   minESS/s   spdup
RWM         0.68   9.84E-03   (3, 4, 8)             0.03        1.00
HMC         0.84   8.28E-02   (552, 873, 1102)      0.67       21.97
RHMC        0.85   1.05E-01   (914, 1040, 1135)     0.87       28.75
LMC         0.91   3.91E-02   (884, 1119, 1239)     2.26       74.37
GPeHMC      0.78   2.84E-02   (622, 653, 754)       2.19       72.14
GPeRHMC     0.76   1.32E-01   (319, 560, 756)       0.24        7.95
GPeLMC      0.78   2.46E-02   (537, 587, 647)       2.18       71.76

Table: Sampling efficiency for the BBD distribution. AP is the acceptance probability; s/iter is the CPU time (seconds) per iteration; ESS lists (min., med., max.); minESS/s is the minimum ESS normalized by CPU time; spdup is the speed-up in minESS/s with RWM as the baseline. Results are summarized for 100000 samples after burn-in.
[Figure: "Error Reducing", two panels showing Relative Error of Mean and Relative Error of Covariance against Seconds (0 to 1000), for RWM, HMC, GPeHMC, RHMC, GPeRHMC, LMC and GPeLMC; vertical axis on a log scale.]
Elliptic PDE
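Consider a canonical inverse problem involving inference of the diffusion coefficient c in the following elliptic PDE on $[0, 1]^2$:
\[ \nabla_x \cdot (c(x, \theta)\nabla_x u(x, \theta)) = 0 \]
\[ u(x, \theta)|_{x_2 = 0} = x_1, \qquad u(x, \theta)|_{x_2 = 1} = 1 - x_1 \]
\[ \left.\frac{\partial u(x, \theta)}{\partial x_1}\right|_{x_1 = 0} = \left.\frac{\partial u(x, \theta)}{\partial x_1}\right|_{x_1 = 1} = 0 \]
The observations $\{y_i\}$ arise from the solutions on an 11 × 11 grid:
\[ y_i = u(x_i, \theta) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 0.1^2) \]
c(x) has a log-Gaussian process prior. Karhunen-Loève expansion:
\[ c(x, \theta) \approx \exp\left(\sum_{d=1}^{D} \theta_d \sqrt{\lambda_d}\,c_d(x)\right), \qquad \theta_d \sim N(0, 1) \]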
[Figure: 6 × 6 grid of panels with axis range −4 to 4; panel titles and axis labels did not survive extraction.]
Algorithm    AP     s/iter     ESS (min, med, max)    minESS/s   spdup
RWM          0.57   2.70E-02   (69, 94, 249)          0.25        1.00
HMC          0.76   4.35E-01   (3169, 4357, 5082)     0.73        2.87
RHMC         0.87   2.22E+00   (5073, 5802, 6485)     0.23        0.90
LMC          0.72   3.92E-01   (5170, 5804, 6214)     1.32        5.19
GPeHMC       0.57   1.56E-02   (609, 1328, 2265)      3.91       15.38
GPeRHMC      0.65   7.35E-02   (614, 1224, 1457)      0.83        3.29
GPeLMC       0.64   2.65E-02   (774, 1427, 1754)      2.92       11.48
adpGPeLMC    0.94   8.71E-02   (3328, 4058, 4543)     3.82       15.03

Table: Sampling efficiency for the elliptic PDE problem.
[Figure: "Error Reducing", two panels showing Relative Error of Mean and Relative Error of Covariance against Seconds (0 to 1000), for RWM, HMC, GPeHMC, RHMC, GPeRHMC, LMC and GPeLMC; vertical axis on a log scale.]
Teal South oil reservoir
a single well producing oil, water and gas
9 parameters θ: $k_h$ for each of the 5 layers of the field, $k_v$, aquifer strength, rock compressibility and porosity
a set of PDEs to simulate the field oil production rate (FOPR)
42 observations available in 1200 days starting from Nov. 1996
\[ \mathrm{Misfit} = \frac{1}{2}\sum_{i=1}^{N} \frac{(\mathrm{FOPR}^{obs}_i - \mathrm{FOPR}^{sim}_i(\theta))^2}{\sigma_{FOPR}^2}, \qquad \sigma_{FOPR} = 100 \text{ stb/d} \]
inference based on 1000 models simulated by tNavigator
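The misfit is a scaled sum of squared differences between observed and simulated production rates. A minimal sketch, where the function name and the array arguments are assumptions of this example and the simulated rates would come from the reservoir simulator:

```python
import numpy as np

def misfit(fopr_obs, fopr_sim, sigma_fopr=100.0):
    """0.5 * sum_i (obs_i - sim_i)^2 / sigma^2, with sigma in stb/d."""
    r = np.asarray(fopr_obs, dtype=float) - np.asarray(fopr_sim, dtype=float)
    return 0.5 * np.sum(r**2) / sigma_fopr**2
```

With Gaussian observation noise of standard deviation σ_FOPR, this misfit is the negative log-likelihood up to an additive constant, so it plays the role of the data term of the potential U(θ).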
[Figure: four trace plots, RWM, GPeHMC, GPeRHMC and GPeLMC; horizontal axis "iteration (thinned)" from 0 to 1000, vertical axis "parameter" from −40 to 40.]
Algorithm   AP     s/iter     ESS (min, med, max)    minESS/s   spdup
RWM         0.77   1.35E-03   (43, 81, 116)            3.20      1.00
GPeHMC      0.89   4.38E-03   (5808, 6023, 6124)     132.73     41.49
GPeRHMC     0.84   1.32E-02   (40, 1173, 3242)         0.30      0.09
GPeLMC      0.84   5.13E-03   (40, 1173, 3242)         0.78      0.24

Table: Sampling efficiency for the Teal South oil reservoir problem.
Conclusion and Discussion
Conclusion
Geometry helps MCMC samplers explore the parameter space, yet computing geometric quantities is challenging for big data and complex models.
Gaussian Process emulation can alleviate this challenge, but the quality of the emulator depends critically on the design set.
Regeneration can legitimately determine the times at which to refine the design and adapt the Markov chain.
Experimental design algorithms, e.g. MICE, are helpful in refining the design set.
Future directions
Emulator design for higher dimensions, which demands a larger number of design points.
Factorization of the covariance matrix: entries decay away from the diagonal.
Local predictor: using only a subset of design points for prediction.
Natural parallelization: simultaneously emulating at multiple points (|E| > 1) and exchanging information among chains.
Application to larger-scale oil reservoir problems.
Thank you !