Minimization for generating conditional realizations from unconditional realizations

Dean Oliver
Uni Centre for Integrated Petroleum Research
17 June 2013
Organization

- Data assimilation — geoscience applications
  - Large numbers of parameters to be estimated (10^5 – 10^7)
  - Large amounts of data (10^3 – 10^5)
  - Evaluation of the likelihood function is expensive (~1 hr)
- Sample updating — Bayes' rule + another condition
- Examples (very small)
- Comments
Initial uncertainty in reservoir model parameters --(data)--> Final uncertainty in reservoir model parameters

[Figure: prior pdf (relatively simple) and posterior pdf (relatively complex)]

Bayes' rule relates the prior pdf to the posterior pdf. For high model dimensions, Monte Carlo methods are generally needed for the integration.
Initial ensemble of reservoir model parameters --(data)--> Ensemble of conditional reservoir model parameters

[Figure: scatter plots of the initial (prior) ensemble and the conditional (posterior) ensemble]
Ensembles of particles represent prior pdf and posterior pdf.
Compute expectations by summing over samples.
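
A minimal sketch (not from the slides) of computing an expectation by summing over ensemble members; the synthetic ensemble and the quantity of interest below are illustrative assumptions.

```python
# Monte Carlo estimate of E[f(x)] from an ensemble: E[f(x)] ≈ (1/N) Σ_i f(x_i)
import numpy as np

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(1000, 2))   # N = 1000 members, 2 model parameters

f = lambda x: x[:, 0] ** 2              # assumed quantity of interest: first parameter squared
expectation = f(ensemble).mean()        # sum over samples, divide by N
print(expectation)                      # ≈ 1.0 for this synthetic standard-normal prior
```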
Sampling from the conditional pdf
[Figure: example conditional pdfs]
Samples need to be distributed correctly. MCMC? Rejection/acceptance sampling? These scale poorly in high model/data dimensions.
Update the ensemble of samples
[Figure: scatter plots of the prior ensemble (left) and the posterior ensemble (right)]
Each sample from prior is updated, not resampled from posterior.
How to update?
How to update samples?
“The Ensemble Kalman Filter (EnKF) [is] a continuous
implementation of the Bayesian update rule” (Li and Caers, 2011).
“These methods use . . . ‘samples’ that are drawn independently
from the given initial distribution and assigned equal weights.
. . . When observations become available, Bayes’ rule is applied
either to individual samples . . . ” (Park and Xu, 2009)
Note: Bayes' rule explains how to update probabilities, but not how
to update samples.
(Particle filters do update the probability of each sample using
Bayes' rule.)
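
As a minimal illustration of that parenthetical point, the sketch below (not from the slides) applies Bayes' rule to particle weights rather than to the particles themselves; the Gaussian likelihood and the observation operator h are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=1000)              # equally weighted prior samples
weights = np.full(particles.size, 1.0 / particles.size)

h = lambda x: x**2                                        # assumed observation operator
d_obs, sigma_d = 1.0, 0.2

# Bayes' rule applied to each sample's probability: multiply the prior weight by
# the likelihood of the observation, then renormalize. Particle positions are unchanged.
weights *= np.exp(-0.5 * ((d_obs - h(particles)) / sigma_d) ** 2)
weights /= weights.sum()
```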
Two transformations from prior to posterior pdf

[Figure: prior samples mapped to posterior samples under each of the two transformations]

  y = µ_y + L_y [L_y Σ_x L_y]^{-1/2} L_y (x − µ_x)

  y = µ_y + L_y L_x^{-1} (x − µ_x)
Two equally valid transformations from prior to posterior for the linear-Gaussian problem. Bayes' rule is not sufficient to specify the transformation.
Why it matters
1. If prior is Gaussian and posterior is Gaussian, then any linear
transformation of variables that obtains the correct posterior
mean and covariance is OK. (Any version of EnKF or EnSRF.)
2. What transformation to use when the observation operator is
nonlinear?
2.1 Randomized maximum likelihood (Kitanidis, 1995; Oliver
et al., 1996)
2.2 Optimal map (Sei, 2011; El Moselhy and Marzouk, 2012)
2.3 Implicit filters (Chorin et al., 2010; Morzfeld et al., 2012)
2.4 Continuous data assimilation or multiple data assimilation
(Reich, 2011; Emerick and Reynolds, 2013)
2.5 Ensemble-based iterative filters/smoothers
Monge's transport problem

- Let x ∼ p(x)
- Let y = s(x)
  - then y ∼ q_s(y) = p(s^{-1}(y)) |det(Ds^{-1}(y))|
  - q(y) is the target pdf of the transformed variables
- Solve for the optimal transformation s*(·) that minimizes

  s* = arg min_s ∫ ‖x − s(x)‖² p(x) dx   such that q_s(y) = q(y)
Example: Gaussian to Gaussian

- Prior pdf:  p(x) = N(µ_x, Σ_x)
- Target pdf: q(y) = N(µ_y, Σ_y)

  y = s(x) := µ_y + L_y [L_y Σ_x L_y]^{-1/2} L_y (x − µ_x)

  where L_x = Σ_x^{1/2} and L_y = Σ_y^{1/2}.

This map minimizes the expected value of the squared distance ‖x − y‖².

(Olkin and Pukelsheim, 1982; Knott and Smith, 1984; Rüschendorf, 1990)
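
A minimal numerical sketch (not from the slides) of this map; the dimensions, means, and covariances are illustrative assumptions, and the check simply verifies that the mapped samples have approximately the target mean and covariance.

```python
import numpy as np

def psd_power(A, power):
    """Symmetric matrix power of a positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.T

rng = np.random.default_rng(0)
mu_x, Sigma_x = np.zeros(2), np.array([[1.0, 0.5], [0.5, 1.0]])
mu_y, Sigma_y = np.array([1.0, -1.0]), np.array([[0.5, 0.1], [0.1, 0.3]])

L_y = psd_power(Sigma_y, 0.5)
M = L_y @ psd_power(L_y @ Sigma_x @ L_y, -0.5) @ L_y   # optimal linear map

x = rng.multivariate_normal(mu_x, Sigma_x, size=100_000)   # prior samples
y = mu_y + (x - mu_x) @ M.T                                 # transformed samples

print(y.mean(axis=0))              # ≈ mu_y
print(np.cov(y, rowvar=False))     # ≈ Sigma_y
```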
Speculation — why would the minimum distance solution
be desirable?
Obtaining a transformation with optimal transport properties may only be important insofar as it simplifies the sampling problem, although the 'natural' optimal transport solution might also be more robust to deviations from ideality.

Examples include those in which samples from the prior are complex geological models, in which case making as small a change as possible might be beneficial.
Transformations between spaces of unequal dimensions

[Figure: samples x ∈ R^N and transformed samples y = s(x) ∈ R^M]

Find s(·) such that

  ∫ ‖x − A s(x)‖²_{Σ_x} p(x) dx

is minimized and x and y = s(x) are distributed as

  x ∼ N(µ_x, Σ_x),   y ∼ N(µ_y, Σ_y).
Transformations between spaces of unequal dimensions

[Figure: prior samples x with covariance Σ_x compared with the back-projected samples A y]

Find s(·) such that

  ∫ ‖x − A s(x)‖²_{Σ_x} p(x) dx

is minimized and x and y = s(x) are distributed as

  x ∼ N(µ_x, Σ_x),   y ∼ N(µ_y, Σ_y).
Transformations between spaces of unequal dimensions

Note that if

  A = [I; H]   and   Σ_x = [C_x 0; 0 C_d]

then

  y = [A^T Σ_x^{-1} A]^{-1} A^T Σ_x^{-1} [x; d]
    = x + C_x H^T [H C_x H^T + C_d]^{-1} (d − Hx)

is an optimal mapping of [x, d] to y.
Note that this is the perturbed-observation form of the EnKF (Burgers et al., 1998), or RML for a linear inverse problem.
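
A minimal sketch (not from the slides) of this linear update applied to each member of an ensemble with perturbed observations; the sizes, observation operator, and covariances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, ne = 4, 2, 500                         # model dim, data dim, ensemble size

C_x = np.eye(n)                              # prior covariance
C_d = 0.1 * np.eye(m)                        # observation-error covariance
H = rng.standard_normal((m, n))              # assumed linear observation operator
d_obs = rng.standard_normal(m)               # observed data

X = rng.multivariate_normal(np.zeros(n), C_x, size=ne)          # prior ensemble
D = d_obs + rng.multivariate_normal(np.zeros(m), C_d, size=ne)  # perturbed observations

# y_i = x_i + C_x H^T (H C_x H^T + C_d)^{-1} (d_i - H x_i) for each member i
K = C_x @ H.T @ np.linalg.inv(H @ C_x @ H.T + C_d)
Y = X + (D - X @ H.T) @ K.T                                      # conditional ensemble
```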
Nonlinear Transformations

Consider now the problem of determining the transformation s* that satisfies

  s* = arg min_s ∫ ‖ [x; d] − [s(x, d); h(s(x, d))] ‖²_{Σ_xd} p(x, d) dx dd

subject to y = s(x, d) being distributed as q(y).

The prior is Gaussian but the data relationship is nonlinear.
Nonlinear Transformations

[Figure: prior samples x and transformed samples y = s(x) for a nonlinear observation operator]
Nonlinear Transformations

[Figure: back-projected samples A y compared with prior samples x and the prior covariance Σ_x]
Approximate solution

Consider the transformation defined by the solution of

  y* = arg min_y ( [x*; d*] − [y; h(y)] )^T [C_x 0; 0 C_d]^{-1} ( [x*; d*] − [y; h(y)] )

If h is differentiable, then the variables x*, d*, and y* are related as

  x* = y* + C_x H_*^T C_d^{-1} [h(y*) − d*]
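
A minimal sketch (not from the slides) of this minimization-based transformation on a scalar problem; the observation operator h(y) = y² (which produces a bimodal posterior, as in Example 2 below) and the noise levels are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
sigma_x, sigma_d = 1.0, 0.1          # C_x = sigma_x^2, C_d = sigma_d^2
d_obs = 1.0
h = lambda y: y**2                   # assumed nonlinear observation operator

def transform(x_star, d_star):
    """Map one prior sample x* and perturbed datum d* to y* by minimization."""
    def obj(y_arr):
        y = y_arr[0]
        return (x_star - y)**2 / sigma_x**2 + (d_star - h(y))**2 / sigma_d**2
    return minimize(obj, x0=[x_star]).x[0]    # start the search at the prior sample

x_prior = rng.normal(0.0, sigma_x, size=2000)
d_pert = d_obs + rng.normal(0.0, sigma_d, size=2000)
y_post = np.array([transform(x, d) for x, d in zip(x_prior, d_pert)])
```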
Approximate solution

For small y − y*, the pdf of transformed variables

  log(q_s(y)) ≈ −(1/2) (y − µ)^T C_x^{-1} (y − µ)
                − (1/2) (h(y) − d^o)^T C_d^{-1} (h(y) − d^o)
                + u(δ̂, d*, y*) + log|J|

is approximately equal to the target pdf.

u(δ̂, d*, y*) comprises terms that do not depend on y.

The Jacobian determinant of the transformation is

  J = det( ∂(x, d)/∂(y, δ) ) = det [ I  −C_x H_*^T C_d^{-1} ; H_*  I ]
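
A minimal sketch (continuing the assumptions of the previous code block) of the Jacobian-determinant weight. In the scalar case the block determinant reduces to J = 1 + C_x h'(y*)² / C_d; since the transformed samples are approximately distributed as the target pdf times |J|, one plausible importance correction is a weight proportional to 1/|J|.

```python
import numpy as np

sigma_x, sigma_d = 1.0, 0.1
h_prime = lambda y: 2.0 * y                 # derivative of the assumed h(y) = y^2

def jacobian_det(y_star):
    """Scalar version of J = det([[1, -C_x h'(y*)/C_d], [h'(y*), 1]])."""
    return 1.0 + sigma_x**2 * h_prime(y_star)**2 / sigma_d**2

# e.g. weights for the transformed samples y_post from the previous sketch:
# weights = 1.0 / np.abs(jacobian_det(y_post)); weights /= weights.sum()
```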
Examples

1. univariate: posterior pdf is unimodal but skewed
2. univariate: posterior pdf is bimodal
3. bivariate: posterior pdf is bimodal
4. 1000-D: posterior pdf has 4 modes
Example 2: single variable, bimodal

[Figure 1: (a) Likelihood of h(x) = d; (b) prior and posterior pdfs. The observation tells us that x ≈ ±1; the prior says that x ≈ 0.8. We will sample from the prior and transform to the posterior.]
Example 2: single variable, bimodal

[Figure 2: Transformed samples from the approximate posterior vs samples from the prior: (a) y vs x; (b) y vs (x, d).]

  y* = arg min_y ( [x*; d*] − [y; h(y)] )^T [C_x 0; 0 C_d]^{-1} ( [x*; d*] − [y; h(y)] )
Example 2: single variable, bimodal

[Figure 3: Transformed samples from the approximate posterior vs samples from the prior: (a) y vs x; (b) y vs (x, d). The red curve shows the (correct) optimal map defined by minimizing the expected distance between x and y.]
Example 2: single variable, bimodal

[Figure 4: Distributions of transformed samples for two values of observation noise, (a) σ_d = 0.1 and (b) σ_d = 0.5. Samples from the prior distribution (gray curve) were mapped using the minimization transformation. The blue solid curve shows the true pdf. The region between the modes is clearly under-sampled.]
Example 2: single variable, bimodal

[Figure 5: Same as the previous slide, (a) σ_d = 0.1 and (b) σ_d = 0.5, but the dashed curve shows the product of the true pdf and the Jacobian determinant of the transformation.]
Example 3: two variables, bimodal

1. The locations of the modes in 20 experiments were randomly sampled.
2. Bimodality is established through the nonlinearity of one observation of x_1^2 + x_2^2.
3. A second observation is made of a linear combination of the two variables.
4. Approximately 4000 samples were drawn for each experiment.
Example 3: two variables, bimodal

[Figure: (a) Example A: modes relatively close, good sampling. (b) Example B: modes relatively distant, good sampling. (c) Example C: modes relatively close, poor sampling. (d) Example B: mapping of individual samples from prior to posterior.]
Example 3: Quantitative comparison

[Figure: true pdf (left) compared with the sampled distribution (right)]

For each example, we compute the local mean for each of the modes of the distribution and the total probability for each mode.
Example 3: Summary results

[Figure 7: (a) Comparison of the sampled estimate of the local mean for the modes (empirical local mean) with the true local mean. (b) Comparison of the sampled estimate of the weight on one of the modes (empirical weight) with the true weight. Approximately 4000 optimization-based samples are used to compute the approximate weights and means for both modes of the sampled distributions. Labeled points refer to experiments shown on a previous slide.]
Example 4: 4 modes, high dimension

The prior probability density is Gaussian with independent variables and unit variance:

  x ∼ A exp[−0.5 x^T x]

The likelihood is a sum of delta functions with random weights:

  L[x | d] = Σ_{i=1}^{4} b_i δ(x − α_i)

so that the posterior pdf that we wish to sample is

  p[x = α_i | d] ∝ b_i exp[−0.5 α_i^T α_i]

The α_i were normally distributed with mean 0 but small variance.
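
A minimal sketch (not from the slides) of the true mode probabilities implied by this posterior; the dimension, mode locations, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 1000
alphas = 0.05 * rng.standard_normal((4, dim))   # four modes: mean 0, small variance
b = rng.uniform(size=4)                          # random (unnormalized) mode weights

# p[x = alpha_i | d] ∝ b_i exp(-0.5 alpha_i^T alpha_i), computed in log space
log_p = np.log(b) - 0.5 * np.einsum('ij,ij->i', alphas, alphas)
p = np.exp(log_p - log_p.max())
p /= p.sum()
print(p)                                         # true posterior probability of each mode
```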
Example 4: random test pdf

[Figure: 1-D and 3-D projections of the random test pdf]

Computations in dimensions as high as 2000.
Example 4: Summary results

[Figure: total absolute error in probability vs dimension of the model space (5 to 1000)]

1000 random samples of the α_i and b_i. 10,000 samples were used to compute the error for each set of α_i and b_i.
Key points
1. Bayes’ rule does not specify how to update individual samples.
Need some other criterion (such as ‘minimum expected
distance’ or ‘maximum correlation’).
2. For some types of problems, can simplify the optimal
transport problem (for probability density) by careful choice of
a cost function.
3. RML sampling is approximate for nonlinear observation
operators — should probably weight samples according to the
Jacobian determinant of the transformation.
Remaining questions
1. Defining a cost function was key to getting a good approximation of the optimal transport solution – how to generalize to a non-Gaussian prior?
2. How to efficiently compute the Jacobian of the transformation for weighting of updated samples?
3. Usefulness in ensemble-based approaches?
Burgers, G., van Leeuwen, P. J., and Evensen, G. (1998). Analysis
scheme in the ensemble Kalman filter. Monthly Weather
Review, 126(6):1719–1724.
Chorin, A. J., Morzfeld, M., and Tu, X. (2010). Implicit particle
filters for data assimilation. Communications in Applied
Mathematics and Computational Science, 5(2):221–240.
El Moselhy, T. A. and Marzouk, Y. M. (2012). Bayesian inference
with optimal maps. Journal of Computational Physics,
231(23):7815–7850.
Emerick, A. A. and Reynolds, A. C. (2013). Ensemble smoother
with multiple data assimilation. Computers & Geosciences,
55:3–15.
Kitanidis, P. K. (1995). Quasi-linear geostatistical theory for
inversing. Water Resour. Res., 31(10):2411–2419.
Knott, M. and Smith, C. S. (1984). On the optimal mapping of
distributions. Journal of Optimization Theory and Applications,
43(1):39–49.
Li, H. and Caers, J. (2011). Geological modelling and history
matching of multi-scale flow barriers in channelized reservoirs:
methodology and application. Petroleum Geoscience, 17:17–34.
Morzfeld, M., Tu, X., Atkins, E., and Chorin, A. J. (2012). A
random map implementation of implicit filters. Journal of
Computational Physics, 231(4):2049–2066.
Oliver, D. S., He, N., and Reynolds, A. C. (1996). Conditioning
permeability fields to pressure data. In European Conference for
the Mathematics of Oil Recovery, V, pages 1–11.
Olkin, I. and Pukelsheim, F. (1982). The distance between two
random vectors with given dispersion matrices. Linear Algebra
and its Applications, 48(0):257–263.
Reich, S. (2011). A dynamical systems framework for intermittent
data assimilation. BIT Numer Math, 51(1):235–249.
Rüschendorf, L. (1990). Corrigendum. J. Multivar. Anal.,
34(1):156.
Sei, T. (2011). Gradient modeling for multivariate quantitative
data. Annals of the Institute of Statistical Mathematics,
63:675–688.
Example 1: unimodal but skewed

[Figure: prior pdf for the model variable (left) and the observation operator d = h(x) = tanh(x) (right)]

10,000 independent samples were drawn from the prior distribution and mapped to the posterior.
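
A minimal sketch (not from the slides) of this experiment using the minimization-based transformation; the prior standard deviation, the noise level, and the 'true' model used to generate the datum are illustrative assumptions (the prior mean of 1.5 is taken from Figure 8).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
mu_x, sigma_x = 1.5, 0.5                 # prior mean from Figure 8; std is assumed
sigma_d = 0.10
d_obs = np.tanh(1.2)                     # datum generated from an assumed true x = 1.2

def transform(x_star, d_star):
    """Map one prior sample and perturbed datum to the approximate posterior."""
    obj = lambda y: ((x_star - y) / sigma_x)**2 + ((d_star - np.tanh(y)) / sigma_d)**2
    return minimize_scalar(obj, bounds=(-5.0, 5.0), method='bounded').x

x_prior = rng.normal(mu_x, sigma_x, size=10_000)
d_pert = d_obs + rng.normal(0.0, sigma_d, size=10_000)
y_post = np.array([transform(x, d) for x, d in zip(x_prior, d_pert)])

print(y_post.mean(), y_post.std())       # empirical posterior moments (cf. Figure 9)
```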
Example 1: unimodal but skewed

[Figure 8: Distributions of transformed samples for σ_d = 0.010, 0.025, 0.050, 0.10, 0.25, and 0.50. Samples from the Gaussian prior distribution (mean 1.5) were mapped using the minimization transformation. The blue solid curve shows the true pdf. The dashed curve shows the product of the true pdf and the Jacobian determinant of the transformation.]
Example 1: unimodal but skewed

[Figure 9: (a) Absolute error in the estimate of the mean and (b) absolute error in the estimate of the standard deviation, plotted against the standard deviation of the data noise. Comparison of empirical moments from 10,000 independent samples using minimization-based sampling with the true moments. The magenta dots are 2 standard deviations in the error of samples of 100 realizations from the true distribution.]