Uploaded by cem.celal

A review of adjoint methods for sensitivity analysis, uncertainty quantification and optimization in numerical codes

advertisement
A review of adjoint methods for sensitivity analysis,
uncertainty quantification and optimization in numerical
codes
Grégoire Allaire
To cite this version:
Grégoire Allaire. A review of adjoint methods for sensitivity analysis, uncertainty quantification and
optimization in numerical codes. Ingénieurs de l’Automobile, SIA, 2015, 836, pp.33-36. �hal-01242950�
HAL Id: hal-01242950
https://hal.archives-ouvertes.fr/hal-01242950
Submitted on 14 Dec 2015
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
A review of adjoint methods for sensitivity analysis, uncertainty
quantification and optimization in numerical codes
G. Allaire1
1: CMAP, Ecole Polytechnique, 91128 Palaiseau, France
Abstract: The goal of this paper is to briefly recall
the importance of the adjoint method in many
problems of sensitivity analysis, uncertainty
quantification and optimization when the model is a
differential equation. We illustrate this notion with
some recent examples. As is well known, from a
computational point of view the adjoint method is
intrusive, meaning that it requires some changes in
the numerical codes. Therefore we advocate that
any new software development must take into
account this issue, right from its inception.
Keywords: adjoint method, sensitivity, uncertainty,
optimization.
1. Introduction
The adjoint method originates in the theory of
Lagrange multipliers in optimization. As an example,
consider the minimization of a function J from Rn into
R, under the equality constraint F, a function from R n
into Rm,
min J(x) such that F(x) = 0
(1)
Introducing the Lagrange multilplier p in R m and the
Lagrangian
L(x,p) = J(x) + p.F(x)
(2)
the optimality condition for (1) (under some
qualification conditions) is the stationarity of the
Lagragian, namely
J'(x*) + p*.F'(x*) = 0
(3)
The adjoint method is an extension of this approach
in the framework of optimal control theory. In this
context, the variable x is the union of a state variable
y and a control variable u, while the constraint
F(y,u)=0 is the state equation which gives y in terms
of u. In such a case the Lagrange multiplier p is
called the adjoint state. This approach was
developed by Pontryagin [20] for ordinary differential
equations and by Lions [17] for distributed systems,
i.e., partial differential equations.
2. Adjoint method
2.1 Setting of the problem
For simplicity in the exposition we consider a finite
dimensional model problem. Denote by u in R k a
control or a parameter which is the ultimate
optimization variable. The state of the system is
denoted by y in Rn and is defined as the solution of
the following state equation
A(u) y = b
(4)
where b is a given right hand side in Rn and A(u) is
an invertible n by n matrix. Since the matrix depends
on u, so does the solution y. The goal is to minimize,
over all admissible controls u, an objective function
J(u,y) under the constraint (4). The difficulty of the
problem is that y depends nonlinearly on u. We
assume here that u is a continuous variable (the
case of discrete variables is much more subtle). To
numerically minimize the objective function J, the
most efficient algorithms are by far those based on
derivative informations. Therefore, a key issue is to
compute the gradient of J(u,y(u)). This is the
purpose of the next section.
2.2 Computation of a gradient
By the chain rule lemma the gradient of j(u)=J(u,y(u))
is
j'(u) = ∂uJ(u,y(u)) + y'(u) ∂yJ(u,y(u))
(5)
Differentiating (4) with respect to u yields
A(u) y'(u) = -A'(u) y(u)
(6)
From (6) and (5) we deduce the so-called direct
gradient
j'(u) = ∂uJ(u,y(u)) – A(u)-1A'(u) y(u) ∂yJ(u,y(u))
(7)
If the dimension k of the vector variable u is small,
formula (7) is viable in the sense that it is not too
costly for computing j'(u). Indeed, using (7) requires
k linear solving of (6) (recall that the notation ' is the
gradient with respect to u, a vector in Rk). Solving
(6) is a costly operation since A(u) is an n by n
matrix and n is usually very large (it is the number of
degrees of freedom in the state equation, ranging
from 104 up to 107 in the most recent applications).
However, if k is as large as n (like in structural
optimization,
parameter
identification,
data
assimilation, etc.), then formula (7) is useless !
Rather one should use the adjoint method which
goes as follows. Introduce the so-called adjoint
equation
AT(u) p = -∂yJ(u,y(u))
(8)
T
where A (u) is the adjoint or transposed matrix.
Multiplying (8) by y'(u) leads to
AT(u)p.y'(u) = -y'(u).∂yJ(u,y(u))
(9)
while multiplying (6) by p gives
A(u)y'(u).p = -A'(u)y(u).p
(10)
Page 1/5
Comparing (9) and (10) which have equal left hand
side yields to a simplification of (5) as
j'(u) = ∂uJ(u,y(u)) + A'(u)y(u).p
(11)
On the contrary of (7), formula (11) is cheap to
evaluate, whatever the size of k: it requires only to
solve the additional adjoint equation (8). It largely
outperforms (7) and it is thus the method of choice
for computing the gradient of the objective function
j(u)=J(u,y(u)). There are at least two ways for
introducing an adjoint in a computer code. Either, a
so-called analytic adjoint, i.e. (8), is implemented. Or
a program for the adjoint is obtained by automatic
differentiation [14], [23] of the code solving the state
equation (4).
The only drawback of (11) (and of the whole adjoint
approach) is that it is intrusive from a computational
software point of view, namely it requires the
development of an adjoint solver. This is quite easy if
you know well your software. It is impossible if your
software is a black box with no access to the source
code. Even, if the source files are available, it could
be a nightmare if they are not documented and/or
not written in a clear and systematic manner...
3.1 Optimal design
A classical application of the adjoint approach is the
optimal design of structures or systems. The number
of optimization variables describing the system is so
large that it is the only viable approach. For example
in fluid mechanics, the number of variables is related
to the number of parameters, necessary to describe
the boundary of an airplane [13], [18]. In the context
of topology optimization for mechanical structures,
the number of design variables is even larger since
any cell of a “hold-all” computational domain is
potentially a design variable: either it is full of
material or empty. For any topology optimization
method (be it homogenization [4], SIMP [10], phase
field [24] or level set [5]) the number of design
variables is so large that again the adjoint method is
the only way for computing a gradient and perform
efficient optimization. Topology optimization (using
an adjoint approach) has been very successful those
last years and we present two results obtained with
the level set method [5], [6] in Figures 1 and 2.
Figure 2: Optimal chair obtained with the level set
method and an exact mesh of the structure
Figure 1: Optimal mast obtained with the level set
method for shape and topology optimization
3. Applications
There are many applications of the adjoint approach.
The most classical ones are in control theory [17],
[20], optimization [1], sensitivity or perturbation
analysis, uncertainty quantification, inverse problems
[22] and data assimilation [15]. I will review some of
these applications, in connection with my own
experience, therefore not trying to be exhaustive.
The goal of this list is to convince the reader that,
whatever the field of application, any numerical
simulation code will sooner or later need an adjoint.
Therefore, it is crucial to design any new software
with this view in mind.
Figure 3: Topological derivative or sensitivity for
adding a vortex generator in an air duct
Page 2/5
3.2 Perturbation analysis
The adjoint approach is also very helpful for
sensitivity analysis, i.e., in the determination of the
effect of small perturbations of the data on the
solution. This is a very old and classical topic in
nuclear reactor physics [19] where the adjoint state
is called the importance function. It is a classical tool
in meteorology too [11].
Figure 4: Boundary conditions for the mast test case
Compared to the setting of section 2, the variable u
is now interpreted as a coefficient or a data in the
model (4) and j(u)=J(u,y(u)) is a quantity of interest,
the sensitivity of which has to be computed. Formula
(11) is precisely the prediction of the sensitivity of the
output j(u) with respect to perturbations of the data u.
Figure 5: Optimal mast without uncertainties
A variant of this perturbation analysis appeared
recently under the name of topological derivative
[12], [21]. It amounts to compute a criterion which
predicts if the nucleation of a small hole or inclusion
is going to decrease or not the objective function. It
has been very successful, as an additional topology
optimization tool, in structural mechanics [3], inverse
problems [8] and fluid mechanics. In figure 3 an
example from [7] is displayed: the blue (negative)
region indicates where it is advantageous to put a
vortex generator (a small winglet) on the boundary of
an air duct in order to obtain a homogeneous flow
distribution at the outlet.
3.3 Uncertainty quantification
Estimating the variations of a computational results
in terms of possible uncertainties of the data or
parameters is a very important topic which has
attracted a lot of attention those last years. Very
often, this task is associated to some probabilistic
representation of the parameters and of the results.
Besides the direct approach of Monte-Carlo
simulation (which is very CPU time-consuming) one
can use spectral methods or polynomial chaos [16].
Still these methods are quite intensive in terms of
CPU. Another possibility is to rely on a worst-case
approach which makes the problem deterministic if
the worst-case scenario can be identified (which is
not always possible). Recently a linearization
approach for small uncertainties was suggested in
[2] and [9] which makes the problem fully
deterministic and explicit. Here is the main idea.
Suppose that small perturbations δ occur in the right
hand side of (4), namely
A(u) y(u,δ) = b + δ
(12)
The perturbed solution is y(u,δ) = y(u) + z with
z=A(u)-1δ. Let us assume that the quantity of interest
is
j(u,δ) = J(u,y(u,δ))
(13)
Assuming that the norm of the perturbation is
bounded by a small number m>0, the worst-case
function is
w(u) = max|| δ|| < m j(u,δ)
(14)
Evaluating (14) is a difficult task in general.
However, if m is small we can linearize (13) and
compute (14) as
w(u) ≈ j(u,0) + max|| δ|| < m ∂y J(u,y(u,0)) . z
(15)
-1
which yields, since z=A(u) δ,
w(u) ≈ j(u) + m || A(u)-T ∂y J(u,y(u)) ||
(16)
As usual inverting the matrix AT(u) is too costly, so
we resort to the adjoint approach again. Multiplying
the adjoint equation (8) by z and comparing with the
equation A(u)z=δ multiplied by p leads to
- ∂y J(u,y(u,0)) . z = δ . p
(17)
which in turn gives the worst-case function
w(u) ≈ j(u) + m || p ||
(18)
In other words the worst case deviates from the
standard ouput by the norm of the adjoint p
multiplied by the magnitude m of the uncertainties.
This is a very simple (albeit linearized) way of
assessing and quantifying uncertainties which,
again, relies on the notion of adjoint.
In [2] we used formula (18) to perform shape and
topology optimization of mechanical structures under
various type of perturbations (on loads, material
Page 3/5
properties or geometry). In Figures 4, 5 and 6 are
displayed boundary conditions and optimal shapes
for minimal compliance under a volume constraint
when uncertain vertical loads are applied in the blue
zone. As can be seen here, these perturbations can
have a drastic effect on the optimal topology.
or the “clean” programming of the routines that will
be submitted to an automatic differentiation tool like
Tapenade [23]. These minor precautions will later
save a huge amount of development time, not to
speak of the even larger savings in computational
efficiency compared to any other method, either not
relying
on
gradients or computing
mere
approximations (like finite differences).
5. Acknowledgement
Part of this work was performed in the framework of
the RODIN project (sponsored by FUI AAP 13). This
work was done in collaboration with many
colleagues, among them M. Albertelli, Ch. Dapogny,
G. Delgado, P. Frey, F. Jouve, G. Michailidis. The
author is also a member of the DEFI project at INRIA
Saclay Ile-de-France.
6. References
[1]
[2]
Figure 6: Optimal mast with uncertainties
[3]
4. Conclusion
In this paper we briefly recalled the adjoint theory for
computing derivatives or sensitivities of an objective
function depending on the solution of a state
equation. In an increasingly large variety of cases
(as illustrated in the figures above), the number of
variables is so large that there is no viable
alternative to the adjoint method. We also collected
various problems where the adjoint is a key
technique: of course optimal design, control and
inverse problems, but also sensitivity analysis and
uncertainty quantification. These types of problems
occur in virtually all fields of science and
engineering. Despite its efficiency the adjoint
approach suffers from one major drawback which is
the difficulty of implementing an adjoint solver (be it
analytic or by automatic differentiation) in a preexisting code which was not designed for
accommodating an adjoint solver. Note however that
programming an adjoint along the early development
of a numerical software is a rather easy task.
Therefore we advocate that any new software
planning should incorporate, in its early stages, all
the necessary tools for later incorporating an adjoint
(when the need will be felt and, surely, it will happen
at some times). These ingredients include various
data structures, the possibility of defining “dummy”
loads, similar to the right hand sides of the adjoint
equation (8), the definition of the linearized adjoint
operator (if an analytic adjoint is going to be chosen)
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
G. Allaire: "Conception optimale de structures",
Collection: Mathématiques et Applications, Vol. 58,
Springer, 2007.
G. Allaire, Ch. Dapogny: "A linearized approach to
worst-case design in parametric and geometric
shape optimization", M3AS, 24(11) pp.2199-2257,
2014.
G. Allaire, F. de Gournay, F. Jouve, A.-M. Toader:
"Structural optimization using topological and
shape sensitivity via a level set method", Control
and Cybernetics 34, 59-80, 2005.
G.
Allaire:
"Shape
optimization
by
the
homogenization method", Springer, 2001.
G. Allaire, F. Jouve, A.-M. Toader: "Structural
optimization using sensitivity analysis and a levelset method", J. Comp. Phys. Vol 194/1, 363-393,
2004.
G. Allaire, Ch. Dapogny, P. Frey: "Shape
optimization with a level set based mesh evolution
method", CMAME 282, 22-53, 2014.
G. Allaire, J. Chetboun: "Flow Control of Curved Air
Ducts using Topological Derivatives", Proceedings
of the 8th World Congress on Structural and
Multidisciplinary Optimization, H.C. Rodrigues et al.
eds., ISSMO, 2009.
H. Ammari: "An introduction to mathematics of
emerging
biomedical
imaging",
Collection:
Mathématiques et Applications, Vol. 62, Springer,
2008.
I. Babuška, F. Nobile and R. Tempone: "Worst
case scenario analysis for elliptic problems with
uncertainty", Numer. Math., 101(2), pp.185-219,
2005.
M. Bendsoe, O. Sigmund: "Topology Optimization.
Theory, Methods, and Applications", Springer
Verlag, 2003.
R. Errico: "What is an adjoint model ?", Bulletin of
the American Meteorological Society, 78, 25772591, 1997.
S. Garreau, P. Guillaume, M. Masmoudi: "The
topological asymptotic for PDE systems: the
Page 4/5
[13]
[14]
[15]
[16]
[17]
elasticity case", SIAM J. Control Optim., 39(6),
1756-1778, 2001.
M. Giles, N. Pierce: "An Introduction to the Adjoint
Approach to Design", Flow, Turbulence and
Combustion 65: 393-415, 2000.
A. Griewank, A. Walther: "Evaluating derivatives.
Principles
and
techniques
of
algorithmic
differentiation”, Second edition, Society for
Industrial and Applied Mathematics (SIAM), 2008.
F.X. Le Dimet, O. Talagrand: "Variational
algorithms for analysis and assimilation of
meteorological observations: theoretical aspects",
Tellus A, 38A(2), 97-110, 1986.
O. Le Maître, O. Knio: "Spectral methods for
uncertainty quantification. With applications to
computational
fluid
dynamics",
Scientific
computation, Springer, New York, 2010.
J.L. Lions: "Optimal control of systems governed by
partial differential equations”, Die Grundlehren der
mathematischen Wissenschaften, 170, SpringerVerlag, 1971.
[18]
[19]
[20]
[21]
[22]
[23]
[24]
B. Mohammadi, O. Pironneau: "Applied shape
optimization for fluids", Clarendon Press, Oxford,
2001.
J. Planchard: "Méthodes mathématiques en
neutronique", Collection de la Direction des Etudes
et Recherches d'EDF, Eyrolles, 1995.
L.S.
Pontryagin,
V.G.
Boltyanskii,
R.V.
Gamkrelidze: "The mathematical theory of optimal
processes", Pergamon Press, Oxford, 1964.
J. Sokolowski, A. Zochowski: "On the topological
derivative in shape optimization", SIAM J. Control
Optim. 37, 1251--1272 , 1999.
A. Tikhonov, V. Arsenin: "Solutions of ill-posed
problems", John Wiley, 1977.
L. Hascoët, V. Pascual: "The Tapenade automatic
differentiation tool: principles, model, and
specification", ACM Transactions On Mathematical
Software, 39(3), 2013.
A. Chambolle, B. Bourdin: "Design-dependent
loads in topology optimization", ESAIM Control
Optim. Calc. Var., 9, 19-48, 2003.
Page 5/5
Download