A012 A REAL PARAMETER GENETIC ALGORITHM FOR CLUSTER IDENTIFICATION IN HISTORY MATCHING
Jonathan N Carter and Pedro J Ballester
Dept Earth Science and Engineering, Imperial College, London
Abstract
Non-linear inverse problems, by their very nature, can be expected to yield multiple solutions.
This will occur even when the problem is well defined, in the sense that the number of
measurements is significantly greater than the number of free parameters. These solutions will
manifest themselves as local optima for some objective function, and will be separated by
regions of poor objective function value.
In history matching the challenge is to identify all of the high quality local optima, and sample
the parameter space around them. Within a Bayesian framework this allows us to estimate the
likelihood and quantify the uncertainty associated with a solution. Algorithms, such as Markov Chain Monte Carlo (MCMC), allow us to do this. However, in practice they are not very efficient and are not suitable for practical problems.
In this paper we present a real parameter Genetic Algorithm that has been designed to search for
multiple local optima and to sample the parameter space around the optima. The methodology
has been implemented within a non-generational steady-state scheme. Possible solutions
generated by the Genetic Algorithm are evaluated in parallel on a cluster of computers. All of the
solutions generated are finally clustered using a new clustering algorithm. This algorithm does
not need the user to specify the number of clusters to be identified, unlike most other clustering
algorithms.
The application of the algorithms is illustrated on two inverse problems. The first is a simple
three parameter cross-sectional model, which was already known to have multiple solutions. The
second is a real world case study, with 82 free parameters. In each case it is shown that the
Genetic Algorithm can find multiple optima and that the results can be clustered with the
clustering algorithm.
Introduction
In the petroleum industry the business pressure to make faster decisions with less risk makes the understanding of subsurface uncertainty increasingly important. To do this we need to understand which of the many possible models that might represent the reservoir are consistent with the available measurement data. In the language of history matching, we are no longer looking for the single set of model parameters that gives the best match between simulated and measured history, but must instead find all those areas in parameter space that give acceptable matches to the measurements. This means that it is not sufficient to simply find all of the local optima on the response surface; we also need to sample the space around each of the optima.
Work by Oliver et al.[1] and the PUNQ group[2] has demonstrated the importance of exploring the whole of the parameter space; additional reasons to favour this approach have been discussed by Sambridge[3]. Due to errors within the modelling process, the measurement process, and possibly insufficient measurements, it is likely that the response surface that characterizes the history matching problem has multiple high quality optima[4].
These features make clear the need for a method that preferentially samples regions of the
parameter space that fit the data well. Among the algorithms commonly used to search a
parameter space, Markov Chain Monte Carlo (MCMC) is probably the best known. This
approach has fallen out of favour as it has become clear that it is only effective for problems with a
few parameters [5,6,7]. A recent technique, the Neighbourhood Algorithm[8], has quickly gained
popularity in the earth sciences. It is based on the idea of using all previously evaluated models
to guide the search for new models. It has been used to provide important new results in
geophysics[9]. Despite their success and growing acceptance, global search methods still have
room for improvement.
Genetic Algorithms are a group of optimization algorithms that are inspired by the ideas of
Darwinian evolution and genetics to generate solutions to problems. GAs have been used in the
petroleum industry for 20 years[10,11], although their use for history matching has been limited. The literature describing the many ways of implementing a GA is extensive, and there are a number of introductory texts. Mosegaard and Sambridge[12] have identified GAs as a promising
approach to sampling the optimal regions of a parameter space.
In this paper we present a real parameter non-generational GA, designed to run on a heterogeneous computing cluster, that identifies and samples multiple local optima on the
response surface. We also present a clustering algorithm that is used to isolate the clusters
identified by the GA. Below we describe both these algorithms and give some preliminary
results from two reservoir characterization problems.
Genetic Algorithm
The GA that we have used has a number of particular properties: it uses real numbers as genes
(rather than the common binary representation), it operates on a non-generational basis, and
works without modification on a multi-processor computer cluster. The general structure of our
GA is shown in figure 1. There are three independent activities going on: the breeding of
children from parents selected from the adult population, the testing (training) of these children,
and their initiation into the adult population. The most time-consuming element of this process is the testing and evaluation of the children. Depending on the problem being considered, and the
computer being used, this may take anything from minutes to hours to complete for each child.
The two other tasks operate as background activities, so as to ensure that there are always
children waiting in the training queue, and that children are initiated quickly into the adult
population. One of the advantages of this set-up is that the multi-processor system can be any
collection of computers, from a group of networked PCs to a dedicated computing cluster. The
number of computers can range from one to hundreds, and need not be constant during the
process. All that needs to happen is that, as a processor becomes available, it collects a child from the training queue; when the evaluation is complete, the result is placed in the initiation queue. Each computer operates completely independently, as illustrated in the sketch below.
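The following is a minimal sketch of the worker loop that each processor could run, assuming thread-safe training and initiation queues and a placeholder evaluate_fitness routine standing in for the reservoir simulation; it is an illustrative sketch under those assumptions, not the authors' implementation.

```python
import queue
import threading

# Shared, thread-safe queues; the breeding task fills training_queue and
# the initiation task drains initiation_queue, as described in the text.
training_queue: queue.Queue = queue.Queue()
initiation_queue: queue.Queue = queue.Queue()

def evaluate_fitness(child):
    """Placeholder for the expensive reservoir simulation (minutes to hours)."""
    raise NotImplementedError

def worker_loop():
    """Run independently on each available processor."""
    while True:
        child = training_queue.get()        # collect a child when the processor is free
        if child is None:                   # sentinel value used here to stop the worker
            break
        fitness = evaluate_fitness(child)   # the time-consuming evaluation step
        initiation_queue.put((child, fitness))  # hand the result back for initiation

# Each computer in the cluster runs one or more worker loops, for example:
# threading.Thread(target=worker_loop, daemon=True).start()
```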
The key components of this GA have been designed to work efficiently for optimization
problems with only real variables and where we expect there to be multiple local optima which
will be of interest. There are three elements of the implementation that appear to be very
important to its effectiveness:
• The crossover operator does a gene-by-gene recombination to obtain the child [13].
• Our parental selection policy is completely random, with no bias towards the better members of the population; this is contrary to normal practice.
• Our culling policy is designed to maintain selection pressure and to allow the development of niche populations. The child competes in a probabilistic two-person tournament with an adult for the right to be part of the adult population, the loser being stored in an archive. The adult selected is one of R randomly chosen adults from the population; from these R adults, the one nearest the child in parameter space enters the tournament. A sketch of these three elements is given below.
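The sketch below illustrates the three elements in Python. The per-gene blend, the tournament probability p_win, the default R, and the helper names (blend via random.uniform, euclidean) are illustrative assumptions; the operators actually used are those described in [13].

```python
import math
import random

def euclidean(a, b):
    """Distance between two real-valued chromosomes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def breed_child(population):
    """Parents are chosen completely at random: no bias towards fitter adults."""
    p1, p2 = random.sample(population, 2)
    # Gene-by-gene recombination; a simple per-gene blend is assumed here,
    # the operator actually used is described in [13].
    return [random.uniform(min(a, b), max(a, b)) for a, b in zip(p1, p2)]

def initiate_child(child, child_fit, population, fitness, archive, R=5, p_win=0.8):
    """Culling: the child meets the nearest of R randomly chosen adults in a
    probabilistic two-person tournament; the loser is stored in the archive."""
    candidates = random.sample(range(len(population)), R)
    rival = min(candidates, key=lambda i: euclidean(child, population[i]))
    child_better = child_fit < fitness[rival]              # assuming minimisation
    child_wins = child_better if random.random() < p_win else not child_better
    if child_wins:
        archive.append((population[rival], fitness[rival]))
        population[rival], fitness[rival] = child, child_fit
    else:
        archive.append((child, child_fit))
```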
A version of the algorithm has been implemented using a generational approach and tested on a
wide range of test problems. The results of these tests can be found elsewhere [13,14,15].
Figure 1: The structure of the parallel GA used in this study, showing the breeding task (which keeps the training queue filled), the fitness evaluation on the multi-processor system, and the initiation of children from the initiation queue into the adult population.
Cluster Analysis
The output from the GA described in the previous section is the final population which, if the algorithm has been run for sufficient time, will consist of a number of subpopulations plus some outlier points not associated with any of the subpopulations. It is assumed that each of the subpopulations will be associated with a different local optimum. The challenge is to separate out the various groups, which will then allow further analysis.
A review of the literature shows that none of the many cluster analysis techniques that have been devised are well suited to our problem. In general they require the user to make one of several possible assumptions about the characteristics of the data: e.g. the number of clusters, the density of points within clusters, or the minimum separation between clusters. There is also a lack of algorithms that claim to be able to deal with what is known as “full dimensional clustering in high dimensional spaces”. For the clusters that may exist within the final population of the GA we face several problems that make the existing algorithms difficult to use. We cannot
be sure about the number of clusters that will have formed, and we have the problem of an
unknown number of outlier points. We have no guarantee that the point density inside each cluster is similar, nor do we know the likely separation between clusters. Finally, we cannot predict in how many dimensions clustering might be occurring.
We have developed an algorithm that overcomes all of the difficulties that we have described. It
is fully described elsewhere[16] and is subject to a patent application[17].
Results
We have tested our methods on two problems. The first is a simple three parameter cross-sectional model of a reservoir; the second is an 82 parameter model of a real world reservoir provided to us by British Petroleum PLC.
IC Fault Model
Our model is a cross-section of a simple layered reservoir, with a single vertical fault midway
between an injector producer pair, as shown in figure 3. The model that we calibrate has three
parameters: the vertical displacement (throw) of the fault; the permeability of the poor quality
sand; and the permeability of the good quality sand. The geological layers are assumed to be
homogeneous (i.e. they have constant physical properties). The “truth” case, which is used to generate the measurements for the calibration, is a variant of the calibration model, but with fixed parameter values. In the case of no model error, the “truth” case is a member of the set
of all possible calibration models. The size and type of model error is chosen by how a specific
calibration model is perturbed to obtain the truth case. In the work presented in this paper, the
model error is obtained by introducing small variations into the spatial properties of the
geological layers. The permeability and porosity in each grid block are randomly perturbed. The
maximum variation allowed is 1% of the unperturbed mean values. These perturbations are much lower than would be expected for a real world rock that had been classified as homogeneous. A more extensive description of the model can be found in papers that deal with estimating model errors[18,19]; the data set has also been made available electronically[20].
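As a concrete illustration of this perturbation scheme, the sketch below perturbs per-cell permeability (and, in the same way, porosity) by at most 1% of the unperturbed layer mean. The array names, grid dimensions, values and the uniform perturbation law are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def perturb_layer(values: np.ndarray, rng: np.random.Generator,
                  max_fraction: float = 0.01) -> np.ndarray:
    """Randomly perturb grid-block properties (permeability or porosity)
    by at most max_fraction (here 1%) of the unperturbed mean of the layer."""
    limit = max_fraction * values.mean()
    noise = rng.uniform(-limit, limit, size=values.shape)
    return values + noise

# Example: a nominally homogeneous layer of good-quality sand
rng = np.random.default_rng(seed=0)
perm_good = np.full((100, 12), 200.0)        # mD; hypothetical grid and value
perm_truth = perturb_layer(perm_good, rng)   # "truth" case including model error
```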
Figure 3: Schematic of the IC fault model, showing the good quality and poor quality sand layers.
5
0.7
0.6
exp(−Dm/0.15)
0.5
0.4
0.3
0.2
0.1
0
0
20
40
60
kp
60
50
30
40
20
10
0
h
This figure shows the sampling achieved (in two of the three dimensions) with a single run of the GA. The algorithm has sampled more densely in the region of highest likelihood, and in a single run has identified several local optima. It was not necessary to use the clustering algorithm to identify the clusters around the local optima.
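A plot of this kind could be produced as in the sketch below, assuming the GA samples are available as columns of the fault throw h, the poor sand permeability kp and the mismatch Dm; the file name and variable names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file holding the GA samples for the IC Fault model:
# columns are h (fault throw), kp (poor sand permeability) and Dm (mismatch).
h, kp, Dm = np.loadtxt("ga_samples.txt", unpack=True)

likelihood = np.exp(-Dm / 0.15)   # the quantity shown on the third axis

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(kp, h, likelihood, s=5)
ax.set_xlabel("kp")
ax.set_ylabel("h")
ax.set_zlabel("exp(-Dm/0.15)")
plt.show()
```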
Midge Reservoir Model
The Midge reservoir is a fractured and faulted chalk above a salt diapir. The key uncertainties are the volumes of oil in the various compartments, the transmissibility across the faults and the allocation of water to the injection wells. In total there are 82 variables, each with an upper and lower bound defined by the reservoir engineer, and the data available for history matching are primarily well bottom hole pressures and RFT measurements. Each simulation takes about 20 minutes to complete on a Sun Ultra 5. Analysis by BP suggested that at least four significant
local optima exist for this problem.
Our testing to date on this problem has been to use the GA to search for individual local optima
and to sample the space around each optimum identified. We then used the clustering algorithm
to check whether the local optima found in eight runs were the same or different. Each time we ran the history match we completed 9,500 simulations and then kept the best 25 function evaluations, giving 200 samples in total.
The clustering algorithm was applied to the 200 samples generated in this way; the table below shows the relative importance of the four components of the objective function for each of the eight optima identified. In each case the match to the measured BHP and GOR is very similar. The RFT measurements were considered very important by the reservoir engineering team, and the key measurements are shown in figure 9. The penalty is a measure of the number of months for which wells failed to meet prescribed production targets.
Component    Opt 1   Opt 2   Opt 3   Opt 4   Opt 5   Opt 6   Opt 7   Opt 8
RFT           27.8    36.8    17.8    17.1    26.7    24.2    21.1    19.9
BHP           35.2    35.1    35.2    35.2    35.2    35.2    35.2    35.2
GOR           25.7    25.6    25.7    25.7    25.6    25.6    25.6    25.6
Penalty      126.4    54.5   114.3   114.4    35.2    18.0    18.3    20.3
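For illustration, the sketch below shows one way the four components could be combined into a single objective value. The equal weighting and the MatchComponents/objective names are assumptions; the paper does not state how the components were combined in the study.

```python
from dataclasses import dataclass

@dataclass
class MatchComponents:
    """The four contributions to the objective function for one model."""
    rft: float      # mismatch to RFT pressure measurements
    bhp: float      # mismatch to well bottom hole pressures
    gor: float      # mismatch to gas-oil ratio data
    penalty: float  # months of missed production targets

def objective(c: MatchComponents) -> float:
    """A simple unweighted sum of the components; the weighting actually
    used in the study is not stated in the paper."""
    return c.rft + c.bhp + c.gor + c.penalty

# Example: the second optimum from the table above
print(objective(MatchComponents(rft=36.8, bhp=35.1, gor=25.6, penalty=54.5)))
```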
The left hand figure shows the key RFT measurements for the second optimum, and the right hand figure shows the RFT measurements for all eight optima.
Conclusions
In this paper we have shown the application of two complementary algorithms to the
identification of multiple optima in two history matching problems. It has been observed that multiple high quality optima seem to be quite common in history matching problems. This suggests that searching for multiple solutions is more important than is commonly assumed.
References
[1] Oliver, D., Reynolds, A., Bi, Z. and Abacioglu, Y., Integration of production data into reservoir models, Petroleum Geoscience 7, S65-S73, 2001.
[2] Floris, F., Bush, M., Cuypers, M., Roggero, F. and Syversveen, A.R., Methods for quantifying the uncertainty of production forecasts: a comparative study, Petroleum Geoscience 7, S87-S96, 2001.
[3] Sambridge, M., An ensemble view of Earth's inner core, Science 299, 529-530, 2003.
[4] Oliver, D., Cunha, L. and Reynolds, A., Markov chain Monte Carlo methods for conditioning a permeability field to pressure data, Mathematical Geology 29, 61-91, 1997.
[5] Sambridge, M., Geophysical inversion with a neighbourhood algorithm - I. Searching a parameter space, Geophysical Journal International 138, 479-494, 1999.
[6] Mosegaard, K. and Sambridge, M., Monte Carlo analysis of inverse problems, Inverse Problems 18, R29-R54, 2002.
[7] Oliver, D., Cunha, L. and Reynolds, A., Markov chain Monte Carlo methods for conditioning a permeability field to pressure data, Mathematical Geology 29, 61-91, 1997.
[8] Sambridge, M., Geophysical inversion with a neighbourhood algorithm - I. Searching a parameter space, Geophysical Journal International 138, 479-494, 1999.
[9] Beghein, C. and Trampert, J., Robust normal mode constraints on inner-core anisotropy from model space search, Science 299, 552-555, 2003.
[10] Goldberg, D., Computer aided gas pipeline operation using genetic algorithms and rule learning, part 1, SPE 14590, 1985.
[11] Goldberg, D., Computer aided gas pipeline operation using genetic algorithms and rule learning, part 2, SPE 14591, 1985.
[12] Mosegaard, K. and Sambridge, M., Monte Carlo analysis of inverse problems, Inverse Problems 18, R29-R54, 2002.
[13] Ballester, P.J. and Carter, J.N., Real parameter genetic algorithms for finding multiple optimal solutions in multi-modal optimization, Lecture Notes in Computer Science 2723, Springer, 706-717, 2003.
[14] Ballester, P.J. and Carter, J.N., An effective real parameter genetic algorithm for multi-modal optimization, in Adaptive Computing in Design and Manufacture, Ed. I. Parmee, Springer, 359-364, 2004.
[15] Ballester, P.J. and Carter, J.N., An effective real parameter genetic algorithm with parent centric normal crossover for multi-modal optimization, Genetic and Evolutionary Computation Conference, Ed. K. Deb, Springer, 2004.
[16] Ballester, P.J. and Carter, J.N., An algorithm to identify clusters of solutions in multimodal optimisation, Experimental and Efficient Algorithms International Workshop, Lecture Notes in Computer Science 3059, Springer, 42-56, 2004.
[17] Method for managing a database, UK Patent Application, filed 5th August 2003.
[18] Carter, J.N., Using Bayesian statistics to capture the effects of modelling errors in inverse problems, Mathematical Geology 36, 187, 2004.
[19] Carter, Ballester, Tavassoli and King, Our calibrated model has no predictive value, Proc. Sensitivity Analysis of Model Output Conference (SAMO 2004), http://webfarm.jrc.cec.eu.int/uasa/events/SAMO2004/Papers/samo04-45(J_ Carter).pdf, 2004.
[20] IC Fault Model, http://www.ese.ic.ac.uk/userfiles/PDF/g290_index_ICFM.html