Introduction Standard RML Augmented RML Examples Summary References Metropolized Randomized Maximum Likelihood for sampling from multimodal distributions Dean Oliver Uni Research CIPR 22 February 2016 1/45 Introduction Standard RML Augmented RML Examples Summary References Problem statement Sample from a target distribution π that is a product of a prior distribution (Gaussian) and a likelihood function with additive Gaussian errors • π may be multimodal • Dimension of model space may be relatively large (targeted at geoscience problems). 2/45 Introduction Standard RML Augmented RML Examples Summary References Motivation Numerical models are used for forecasting and decisions. • Many parameters (105 –106 ). • Observations are generally sparse (1 km apart, but daily observations for several years). • Likelihood function evaluations expensive (0.1–10 hour) Figure source: http://www.sintef.no/ 3/45 Introduction Standard RML Augmented RML Examples Summary References Current methodology Currently use iterative ensemble smoothers for reservoir “history matching”. Multiple approximations: 1. Based on Randomized Maximum Likelihood without weighting 2. Updates are computed from correlations between data and model variables 3. Ignore model error 4. . . . Layer 1 F3 10 B1B C3 K3 D2 B4D C4 E2 E2A B4B B1 D3B D3A B2 20 Layer 1 E3A B3 B4 E3 E3C E1 F1 D4A F2 D4 D1 C2 D1C C1 30 C4A F4 E4A 40 20 40 60 80 realization 1 100 8.50 7.95 7.40 6.86 6.31 5.76 5.21 4.66 4.12 3.57 3.02 F3 10 B1B C3 K3 D2 B4D E3A B3 B4 C4 E2 E2A B4B B1 D3B D3A B2 20 E3 E3C E1 F1 D4A F2 D4 D1 C2 D1C C1 30 C4A F4 E4A 40 20 40 60 80 8.50 7.95 7.40 6.86 6.31 5.76 5.21 4.66 4.12 3.57 3.02 100 realization 2 4/45 Introduction Standard RML Augmented RML Examples Summary References Background Several methods use optimization to place candidate states in regions of high probability. Quasi-Linear Estimation did not apply a MH test to correct the sampling for nonlinear (Kitanidis, 1995). Randomized Maximum Likelihood applied MH test to obtain correct sampling but required marginalization (Oliver et al., 1996). Randomize-Then-Optimize MH acceptance test limited the application to distributions with a single mode (Bardsley et al., 2014). In this paper, the RML method is modified so that marginalization is not required. Applicable to multimodal distributions. 5/45 Introduction Standard RML Augmented RML Examples Summary References Simulation methods based on minimization Many methods can be used to generate realizations from multi-normal distributions. Two are particularly interesting because of the contrast in approaches. • One approach is to generate a “rough” field (an unconditional simulation) with the same covariance as the true field, then to subtract a smooth correction that forces the simulated field to pass through the data. • A second approach is to compute a “smooth” estimate that passes through the data, then use the LU decomposition of the estimation error covariance to add a stochastic component to the estimate. It is known that these methods are equivalent for data without errors (Krzanowski, 1988) and for data with errors (Oliver, 1996). 6/45 Introduction Standard RML Augmented RML Examples Summary References Smooth plus rough — Gauss-linear problem The maximum a posteriori estimate minimizes −1 mmap = argmin(m − mprior )T CM (m − mprior ) + (Gm − dobs )T CD−1 (Gm − dobs ), = mprior + CM G T (GCM G T + CD )−1 (dobs − Gmprior ). Samples from posterior can be generated mi = mmap + LZi for i = 1, . . . , N where Zi ∼ N[0, I ], LLT = CM0 , and −1 CM0 = (CM + G T CD−1 G )−1 7/45 Introduction Standard RML Augmented RML Examples Summary References Rough plus smooth — Gauss-linear problem In this algorithm, we first generate unconditional realizations of the model (and data), then make a correction to the unconditional model. • Sample mu ∼ N[mprior , CM ]. • Sample du ∼ N[dobs , CD ]. • Compute the model mc that minimizes −1 mc = argmin(m − mu )T CM (m − mu ) + (Gm − du )T CD−1 (Gm − du ), = mu − CM G T [GCM G T + CD ]−1 (Gmu − du ) Does not require mmap or posteriori covariance. 8/45 Introduction Standard RML Augmented RML Examples Summary References Illustration: Truth 2.6 2.4 2.2 2.0 1.8 1.6 0 10 20 30 40 50 The truth with three (inaccurate) observations. 9/45 Introduction Standard RML Augmented RML Examples Summary References Smooth plus rough — Compute maximum a posteriori estimate 2.8 2.6 Truth 2.4 2.2 2.0 1.8 MAP estimate 1.6 0 10 20 30 40 50 h −1 mmap = argmin (m − mprior )T CM (m − mprior ) i + (Gm − dobs )T CD−1 (Gm − dobs ) 10/45 Introduction Standard RML Augmented RML Examples Summary References Smooth plus rough 0.5 0.4 0.3 0.2 0.1 10 20 30 40 50 The MAP estimate of the standard deviation in the estimate (square root of posteriori variance). 11/45 Introduction Standard RML Augmented RML Examples Summary References Smooth plus rough 3 2 1 10 20 1/2 mi = mmap + CM 0 Zi 30 40 50 for i = 1, . . . , N 12/45 Introduction Standard RML Augmented RML Examples Summary References Alternative: Rough plus smooth 3.5 3.0 2.5 2.0 1.5 1.0 0.5 10 20 30 40 50 Ten unconditional realizations. 1/2 mi = mprior + CM Zi for i = 1, . . . , N 13/45 Introduction Standard RML Augmented RML Examples Summary References Illustration: Rough plus smooth 2.5 2.0 1.5 1.0 0.5 10 20 30 40 Add a smooth correction to an unconditional realization to make it conditional to data. δm = −CM G T [GCM G T + CD ]−1 (Gmu − du ). 14/45 Introduction Standard RML Augmented RML Examples Summary References Illustration: Rough plus smooth 3.5 3.0 2.5 2.0 1.5 1.0 0.5 10 20 30 40 50 The 10 conditional realizations resulting from smooth corrections to the unconditional realizations. 15/45 Introduction Standard RML Augmented RML Examples Summary References Smooth plus rough and rough plus smooth— Summary Both algorithms are valid methods for sampling the posterior for linear problems with gaussian measurement errors and a gaussian prior PDF for the model variables. Neither is correct for nonlinear problems, but the method of adding a smooth correction to unconditional realizations appeared to be more robust to nonlinearity. 16/45 Introduction Standard RML Augmented RML Examples Summary References Application to nonlinear problems Let X be multivariate normal with mean µ and covariance Cx such that the prior probability density for X is 1 p(x) = cp exp − (x − µ)T Cx−1 (x − µ) . 2 Observations d o = g (x) + d with d ∼ N(0, Cd ) are assimilated resulting in a posterior density 1 π(x) ∝ exp − (x − µ)T Cx−1 (x − µ) 2 1 o T −1 o − (g (x) − d ) Cd (g (x) − d ) . 2 17/45 Introduction Standard RML Augmented RML Examples Summary References Standard RML for MCMC 1. Generate candidate state x∗ : 1.1 Independently sample xuc ∼ N[xpr , Cx ] and duc ∼ N[dobs , Cd ]. 1.2 Minimize a nonlinear least-squares functional: Model parameter mismatch }| { h1 z x∗ = argmin (x − xuc )T Cx−1 (x − xuc ) 2 x i 1 + (g (x) − duc )T Cd−1 (g (x) − duc ) 2| {z } Sum of squared data mismatch (Need to compute the probability of proposing x∗ .) 18/45 Introduction Standard RML Augmented RML Examples Summary References Standard RML for MCMC (2) 2. Compute probability of proposing (x∗ , duc ). 2.1 Probability of proposing (xuc , duc ) 1 f (xuc , duc ) ∝ exp − (xuc − µ)T Cx−1 (xuc − µ) 2 1 − (duc − dobs )T Cd−1 (duc − dobs ) 2 2.2 Probability of proposing (x∗ , duc ) q(x∗ , duc ) = f (xuc (x∗ , duc ), duc )| det J| where xuc = x∗ + Cx G (x∗ )T Cd−1 (g (x∗ ) − duc ) from the necessary condition for x∗ to be a minimum. G (x∗ ) = ∂g /∂x|x∗ . 19/45 Introduction Standard RML Augmented RML Examples Summary References Standard RML for MCMC (3) 3. Compute marginal probability of proposing x∗ . Z qx (x∗ ) = q(x∗ , duc ) dduc . D 4. Accept proposed state x∗ with probability π(x∗ )qx (x) α(x, x∗ ) = min 1, . π(x)qx (x∗ ) else retain state x (Metropolis-Hastings). 20/45 Introduction Standard RML Augmented RML Examples EXAMPLE FROM 1996 1.8 Summary 2. 2.2 References 2.4 1.2 4 Posterior 1.2 1. 1. dUC 3 2 ∂mu ∂m∗ 0.8 Prior 0.8 1 0.6 0.6 1.8 1.5 2.0 2. 2.5 2.2 2.4 m - calibrated 0.06 1.2 Proposal 0.05 1.0 Acceptance rate 0.74 0.04 0.8 0.03 0.6 Target 0.02 0.4 0.01 0.2 1.8 1.9 2.0 2.1 2.2 2.3 2.4 1.8 2.0 2.2 2.4 21/45 Introduction Standard RML Augmented RML Examples Summary References Comments on Randomized Maximum Likelihood • The proposal density was typically close to the target density (identical for linear observations). • Acceptance rate was high in multimodal experiments • MH acceptance ratio was difficult to compute • In practice — ignored MH test (accepted all transitions) • Formed the basis for several iterative variants of ensemble Kalman filter-like methods 22/45 Introduction Standard RML Augmented RML Examples Summary References Augmented state RML1 1. Augment the state with data variables 2. Modify the target joint pdf such that the marginal pdf for model variables is identical to the true posterior pdf for model variables 3. Modify the proposal density to improve MH acceptance 1 Oliver (2015) 23/45 Introduction Standard RML Augmented RML Examples Summary References Modify the target joint pdf Define the target joint probability for the augmented state is h 1 1 π(x, d) ∝ exp − (x−µ)T Cx−1 (x−µ)− (g (x)−d)T Cd−1 (g (x)−d) 2 2γ i 1 (d − dobs )T Cd−1 (d − dobs ) . − 2(1 − γ) 24/45 Introduction Standard RML Augmented RML Examples Summary References Modify the target joint pdf Define the target joint probability for the augmented state is h 1 1 π(x, d) ∝ exp − (x−µ)T Cx−1 (x−µ)− (g (x)−d)T Cd−1 (g (x)−d) 2 2γ i 1 (d − dobs )T Cd−1 (d − dobs ) . − 2(1 − γ) The marginal target density for model variable x can be shown to be h 1 i 1 π(x) ∝ exp − (x−µ)T Cx−1 (x−µ)− (g (x)−dobs )T Cd−1 (g (x)−dobs ) 2 2 independent of γ. 24/45 Introduction Standard RML Augmented RML Examples Summary References Target joint pdf 25 3.3 3.25 γ = 0.02 75 20 175 60 30 20 225 3.1 60 3.12 3.15 225 d γ = 0.90 3.14 3.20 125 3.2 3.16 γ = 0.40 3.10 3.10 100 225 80 200 3.0 50 3.05 150 3.08 40 40 100 3.06 3.00 50 2.9 1.45 10 1.50 1.55 x 1.60 1.65 1.50 1.55 x 1.60 1.65 1.50 1.55 1.60 1.65 x Figure 1: Dependence of the joint density for (x, d) on magnitude of the modelization error γ. 25/45 Introduction Standard RML Augmented RML Examples Summary References Generate independent candidates for MCMC 1. Draw unconditional samples from the prior distribution of model and data variables, xuc ∼ N[µ, Cx ] and duc ∼ N[dobs , Cd ]. 2. Candidate joint states are obtained by minimizing a nonlinear least squares function h (x∗ , d∗ ) = argmin (x − xuc )T Cx−1 (x − xuc ) x,d 1 + (g (x) − d)T Cd−1 (g (x) − d) ρ i 1 + (d − duc )T Cd−1 (d − duc ) (1 − ρ) 26/45 Introduction Standard RML Augmented RML Examples Summary References Proposal probability density The inverse transformation is straightforward, 1 xuc = x∗ + Cx G T Cd−1 (g (x∗ ) − d∗ ) ρ and duc = 1 d∗ − ρ 1−ρ ρ g (x∗ ). So the joint proposal density is q(x∗ , d∗ ) = p (xuc (x∗ , d∗ ), duc (x∗ , d∗ )) | det J| 1 = cp exp − (xuc (x∗ , d∗ ) − µ)T Cx−1 (xuc (x∗ , d∗ ) − µ) 2 1 T −1 − (duc (x∗ , d∗ ) − dobs ) Cd (duc (x∗ , d∗ ) − dobs ) | det J| 2 27/45 Introduction Standard RML Augmented RML Examples Summary References Dependence of proposal density on ρ 500 3.30 3.3 ρ = 0.10 3.25 ρ = 0.50 50 1500 3.25 ρ = 0.99 100 60 3.20 2000 1500 2000 3.20 3.2 150 2500 3.15 100 200 3.15 d∗ 2500 2000 3.10 3.1 1500 3.10 3.05 175 2500 3.05 80 2000 125 3.00 3.0 3.00 1500 1500 1000 40 75 2.95 2.95 20 25 1.50 1.55 1.60 x∗ 1.65 1.50 1.55 1.60 1.65 x∗ 1.50 1.55 1.60 1.65 x∗ Figure 2: Dependence of the joint density for proposed transitions (x∗ , d∗ ) on magnitude of ρ. Increasing ρ provides a wider proposal density. 28/45 Introduction Standard RML Augmented RML Examples Summary References Bimodal example: problem definition ��� � d = g (x) ��� posterior � ��� � ��� � ��� ��� ��� x ��� ��� prior � ��� ��� ��� ��� x Observation operator is quadratic: two identical peaks in the likelihood. 29/45 Introduction Standard RML Augmented RML Examples Summary References Bimodal example: augmented state space 1.0 1.0 joint posterior 0.9 0.9 d 0.8 0.8 0.7 0.7 0.6 proposal density 0.6 1.8 2.0 2.2 2.4 1.8 1.9 2.0 x γ = 0.02 2.1 2.2 2.3 2.4 x and ρ = 0.65 30/45 Introduction Standard RML Augmented RML Examples Summary References Marginal distribution of model state from Metropolized RML 2.4 4 2.3 2.2 3 2.1 2 2.0 1 1.9 1.8 0 1.8 2.0 2.2 2.4 Distribution of samples of x 100 200 300 400 500 600 Markov chain for x Acceptance rate for independent proposals is 64% (almost independent of ρ for 0.5 ≤ ρ ≤ 0.8). 31/45 Introduction Standard RML Augmented RML Examples Summary References Example with many modes 6 4 5 4 2 3 0 2 -2 1 -4 -4 -2 0 2 True posterior pdf for x. 4 -3 -2 -1 1 2 3 Mapping xuc to x∗ Two model variables and two nonlinear observations. sin[2πx1 ] g [x1 , x2 ] = sin[2πx2 ] σD = 0.2, xpr = (0.0, 0.0) and σX = 1., dobs = (0., 0.) 32/45 Introduction Standard RML Augmented RML Examples Summary References Proposed transitions 4 2 -3 -2 1 -1 2 3 -2 -4 Sample independently from the prior distribution. 33/45 Introduction Standard RML Augmented RML Examples Summary References Proposed transitions 0.5 -0.6 -0.4 0.2 -0.2 0.4 0.6 -0.5 Solve a minimization problem which maps samples from the prior to samples from a proposal distribution. 34/45 Introduction Standard RML Augmented RML Examples Summary References Distribution of proposed transitions 2 -3 -2 1 -1 2 3 -2 -4 Need to apply Metropolis-Hastings test for samples of x∗ , d∗ . 35/45 Introduction Standard RML Augmented RML Examples Summary References MCMC samples 4 2 -4 2 -2 4 -2 -4 Samples from MH independence sampler with 40,000 elements. Acceptance rate = 0.875 ± 0.002. 36/45 Introduction Standard RML Augmented RML Examples Summary References Compare sampling to exact pdf 11 0.06 0.04 0.02 0.00 -0.02 2 1 1 -0.04 3 45 3 5 4 3 4 5 4 4 2 2 3 3 5 55 4 3 2 11 2 2 -0.06 -0.4 0.0 -0.2 0.2 0.4 6 5 4 3 2 1 -0.6 -0.4 -0.2 0.2 0.4 0.6 Red is true model density. Black is density estimated by kernel smoothing (bandwidth 0.01) of 4200 samples in the regions of three central peaks. 37/45 Introduction Standard RML Augmented RML Examples Summary References True prob Error RML w MH -0.00075 -0.00050 -0.00025 0 0.00025 0.00050 -0.00075 -0.00050 -0.00025 0 0.00025 0.00050 0 0.01 0.02 0.03 Benefit of Metropolization Error RML wo MH Small difference in total absolute error (0.0178 vs 0.0169). 38/45 Introduction Standard RML Augmented RML Examples Summary References Summary • A new augmented variable independence Metropolis sampler • Minimization to place proposals in regions of high probability • Relatively high acceptance rate, even in multimodal distributions (rapid mixing). • Open issues • Computation of the Jacobian determinant in high dimensions • Requirement for obtaining global minimum • Generalization to nongaussian priors • Applicability with EnKF-like methods 39/45 Introduction Standard RML Augmented RML Examples Summary References Acknowledgements Primary support has been provided by the cooperative research project “4D Seismic History Matching” which is funded by industry partners Eni, Petrobras, and Total, as well as the Research Council of Norway (PETROMAKS). 40/45 Introduction Standard RML Augmented RML Examples Summary References References I Bardsley, J., Solonen, A., Haario, H., and Laine, M. (2014). Randomize-Then-Optimize: A method for sampling from posterior distributions in nonlinear inverse problems. SIAM Journal on Scientific Computing, 36(4):A1895–A1910. Kitanidis, P. K. (1995). Quasi-linear geostatistical theory for inversing. Water Resour. Res., 31(10):2411–2419. Krzanowski, W. J. (1988). Principles of Multivariate Analysis: A User’s Perspective. Clarendon Press, Oxford. 563 p. Oliver, D. S. (1996). On conditional simulation to inaccurate data. Math. Geology, 28(6):811–817. Oliver, D. S. (2015). Metropolized Randomized Maximum Likelihood for sampling from multimodal distributions. ArXiv e-prints, (arXiv:1507.08563). 41/45 Introduction Standard RML Augmented RML Examples Summary References References II Oliver, D. S., He, N., and Reynolds, A. C. (1996). Conditioning permeability fields to pressure data. In European Conference for the Mathematics of Oil Recovery, V, pages 1–11. 42/45 Introduction Standard RML Augmented RML Examples Summary References Example: non-Gaussian prior ��� ��� ��� Prior Posterior ��� � � ��� ��� ��� ��� ��� ��� ��� -� � � � � � � � � � � � Prior and posterior model variable distributions for Example 3. 43/45 Introduction Standard RML Augmented RML Examples Summary References The Jacobian of transformation 2.5 2.5 Prior density 2.5 Target density 2.0 Proposal density 2.0 2.0 1.5 1.5 1.5 1.0 1.0 1.0 0.5 0.5 0.5 0.0 0.0 -0.5 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 -1.5 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Joint prior, posterior, and proposal distributions for Gaussian model variable and data variable. Model variable on horizontal axis. 44/45 Introduction Standard RML Augmented RML Examples Summary References Example: non-Gaussian prior ● ���� 3.0 � 2.5 � 2.0 ���� � ����������� ������� ���� � 1.5 � ���� ���� 1.0 -� 0.5 � � � � � � Distribution samples ● ● ●● ● ●● ● ● ● ● ●● ● ●● ● ● ●●● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●●●● ●● ●●●● ● ● ●● ● ● ●● ●●●● ●● ● ●● ● ● ● ●● ● ● ● ● ●● ●●●●●● ●● ● ●● ● ●● ● ● ●●●● ● ●●●●● ● ● ● ● ●● ● ●● ●● ●● ● ●● ●● ●● ● ●● ●● ●● ● ● ● ●● ●● ● ● ● ●●●●●● ●● ● ●● ●● ● ● ● ● ●● ● ●● ● ●● ●● ● ●●●●● ●● ●● ● ● ●●● ●● ● ● ● ● ●●● ●●● ● ●●● ● ● ● ●●●●● ● ● ● ●● ● ● ● ●●● ●● ● ●●●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● 500 1000 1500 2000 First 2000 elements ● ● � � ●● ● ● ● ● ● ● ● ● � � � (xuc , duc ) to (x∗ , d∗ ) Results from Metropolized RML with variable transformation. Acceptance rate is 74%. 45/45