Second Order Differentiation
Yi Heng
Bommerholz – 14.08.2006
Summer School 2006
Outline
Background
What are derivatives?
Where do we need derivatives?
How to compute derivatives?
Basics of Automatic Differentiation
Introduction
Forward mode strategy
Reverse mode strategy
Second-Order Automatic Differentiation Module
Introduction
Forward mode strategy
Taylor Series strategy
Hessian Performance
An Application in Optimal Control Problems
Summary
Background
What are derivatives?
Jacobian Matrix:
The differential of $f : \mathbb{R}^m \to \mathbb{R}^n$, $x \mapsto f(x)$, is described by the Jacobian matrix

$$J = \frac{\partial f}{\partial x} = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \cdots & \dfrac{\partial f_n}{\partial x_m} \end{pmatrix}.$$

Tangents (directional derivatives): $dY = J \cdot dX$

Gradients (adjoints): $\bar{X} = \bar{Y} \cdot J$
Background
What are derivatives?
Hessian Matrix:
The second order partial derivatives of a function $f : \mathbb{R}^m \to \mathbb{R}$ constitute its Hessian matrix

$$H(f) = \left( \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right)_{i,j = 1,\ldots,m}.$$
Background
Where do we need derivatives?
• Linear approximation
• Bending and acceleration (second derivatives)
• Solving algebraic and differential equations
• Curve fitting
• Optimization problems
• Sensitivity analysis
• Inverse problems (data assimilation)
• Parameter identification
Background
How to compute derivatives?
Symbolic differentiation
• Derivatives can be computed to machine precision
• Computational work is expensive
• For complicated functions, the representation of the final expression may be an unaffordable overhead

Divided differences
• Easy to implement (the definition of the derivative is used directly)
• Only the original computer program is required (no formula necessary)
• The approximation contains truncation error
Background
How to compute derivatives?
Automatic differentiation
• Derivatives accurate to machine precision can be obtained
• Computational work is cheaper
• Only the original computer program is required
To be continued ...
Basics of Automatic Differentiation
Introduction
Automatic differentiation ...
• Is also known as computational differentiation, algorithmic differentiation, and differentiation of algorithms;
• Is a systematic application of the familiar rules of calculus to computer programs, yielding programs for the propagation of numerical values of first, second, or higher order derivatives;
• Traverses the code list (or computational graph) in the forward mode, the reverse mode, or a combination of the two;
• Typically is implemented by using either source code transformation or operator overloading;
• Is a process for evaluating derivatives which depends only on an algorithmic specification of the function to be differentiated.
Basics of Automatic Differentiation
Introduction
Rules of arithmetic operations for gradient vector
u, v are scalar functions of m independent input variables.

$$\nabla(u \pm v) = \nabla u \pm \nabla v,$$
$$\nabla(uv) = u \nabla v + v \nabla u,$$
$$\nabla(u / v) = \left(\nabla u - (u/v)\,\nabla v\right) / v, \quad v \neq 0,$$
$$\nabla(\phi(u)) = \phi'(u)\,\nabla u,$$

for differentiable functions $\phi$ (such as the standard functions) with known derivatives.
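These propagation rules translate directly into an operator-overloading implementation. Below is a minimal Python sketch (the class name GradNum, the helper cos, and the sample values are illustrative assumptions, not part of any particular AD tool): every intermediate value carries its gradient vector, and each overloaded operation applies the corresponding rule.

```python
import math

class GradNum:
    """A value together with its gradient vector w.r.t. the independent inputs."""
    def __init__(self, value, grad):
        self.value = value
        self.grad = list(grad)

    @staticmethod
    def _lift(other, n):
        # Wrap plain numbers as constants with zero gradient.
        return other if isinstance(other, GradNum) else GradNum(float(other), [0.0] * n)

    def __add__(self, other):
        other = GradNum._lift(other, len(self.grad))
        return GradNum(self.value + other.value,
                       [a + b for a, b in zip(self.grad, other.grad)])
    __radd__ = __add__

    def __mul__(self, other):
        other = GradNum._lift(other, len(self.grad))
        # grad(u v) = u grad(v) + v grad(u)
        return GradNum(self.value * other.value,
                       [self.value * b + other.value * a
                        for a, b in zip(self.grad, other.grad)])
    __rmul__ = __mul__

    def __truediv__(self, other):
        other = GradNum._lift(other, len(self.grad))
        # grad(u / v) = (grad(u) - (u/v) grad(v)) / v,  v != 0
        q = self.value / other.value
        return GradNum(q, [(a - q * b) / other.value
                           for a, b in zip(self.grad, other.grad)])

def cos(u):
    # grad(phi(u)) = phi'(u) grad(u), here with phi = cos
    return GradNum(math.cos(u.value), [-math.sin(u.value) * g for g in u.grad])

# Independent variables are seeded with Cartesian basis vectors:
x = GradNum(1.0, [1.0, 0.0, 0.0])
y = GradNum(2.0, [0.0, 1.0, 0.0])
z = GradNum(0.5, [0.0, 0.0, 1.0])
f = (x * y + cos(z)) * (x * x + 2 * y * y + 3 * z * z)
print(f.value, f.grad)   # function value and full gradient in one forward sweep
```

Seeding the inputs with the basis vectors yields the complete gradient in a single forward sweep, which is exactly the forward-mode strategy illustrated on the next slides (the example function is the one used there).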
Basics of Automatic Differentiation
Forward mode & Reverse mode
An example
Given the function $f(x, y, z) = (xy + \cos z)(x^2 + 2y^2 + 3z^2)$, the partial derivatives are

$$\frac{\partial f}{\partial x} = y\,(x^2 + 2y^2 + 3z^2) + (xy + \cos z)\cdot 2x = 3x^2 y + 2y^3 + 3yz^2 + 2x\cos z,$$
$$\frac{\partial f}{\partial y} = x\,(x^2 + 2y^2 + 3z^2) + (xy + \cos z)\cdot 4y = x^3 + 6xy^2 + 3xz^2 + 4y\cos z,$$
$$\frac{\partial f}{\partial z} = -\sin z\,(x^2 + 2y^2 + 3z^2) + (xy + \cos z)\cdot 6z = -x^2\sin z - 2y^2\sin z - 3z^2\sin z + 6xyz + 6z\cos z.$$

$$\nabla f = \left[\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z}\right]^T$$
Basics of Automatic Differentiation
Forward mode & Reverse mode - Forward mode
Code list:
$u_1 = x,\; u_2 = y,\; u_3 = z,\; u_4 = u_1 u_2,\; u_5 = \cos u_3,\; u_6 = u_4 + u_5,$
$u_7 = u_1^2,\; u_8 = 2u_2^2,\; u_9 = 3u_3^2,\; u_{10} = u_7 + u_8 + u_9,\; u_{11} = u_6 u_{10}.$

Gradient entries:
$\nabla u_1 = [1, 0, 0], \quad \nabla u_2 = [0, 1, 0], \quad \nabla u_3 = [0, 0, 1],$
$\nabla u_4 = u_1 \nabla u_2 + u_2 \nabla u_1 = [0, u_1, 0] + [u_2, 0, 0] = [u_2, u_1, 0],$
$\nabla u_5 = (-\sin u_3)\,\nabla u_3 = [0, 0, -\sin u_3],$
$\nabla u_6 = \nabla u_4 + \nabla u_5 = [u_2, u_1, -\sin u_3],$
$\nabla u_7 = 2u_1 \nabla u_1 = [2u_1, 0, 0],$
$\nabla u_8 = 4u_2 \nabla u_2 = [0, 4u_2, 0],$
$\nabla u_9 = 6u_3 \nabla u_3 = [0, 0, 6u_3],$
$\nabla u_{10} = \nabla u_7 + \nabla u_8 + \nabla u_9 = [2u_1, 4u_2, 6u_3],$
$\nabla u_{11} = u_6 \nabla u_{10} + u_{10} \nabla u_6 = [2u_6 u_1 + u_{10} u_2,\; 4u_6 u_2 + u_{10} u_1,\; 6u_6 u_3 - u_{10}\sin u_3].$

$$\nabla f(x, y, z) = \nabla u_{11} = [3x^2 y + 2x\cos z + 2y^3 + 3yz^2,\;\; 6xy^2 + 4y\cos z + x^3 + 3xz^2,\;\; 6xyz + 6z\cos z - x^2\sin z - 2y^2\sin z - 3z^2\sin z].$$
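The same sweep can also be written out literally, one assignment per code-list entry. This short Python sketch (variable names and the test point are assumptions made for illustration) carries a (value, gradient) pair for each u_i and checks the result against the analytic partial derivative derived earlier.

```python
import math

def grad_forward(x, y, z):
    """Forward-mode sweep over the code list; each u_i carries (value, gradient)."""
    u1,  g1  = x, (1.0, 0.0, 0.0)
    u2,  g2  = y, (0.0, 1.0, 0.0)
    u3,  g3  = z, (0.0, 0.0, 1.0)
    u4,  g4  = u1 * u2, (u2, u1, 0.0)                      # grad(u1 u2)
    u5,  g5  = math.cos(u3), (0.0, 0.0, -math.sin(u3))     # grad(cos u3)
    u6,  g6  = u4 + u5, tuple(a + b for a, b in zip(g4, g5))
    u7,  g7  = u1 ** 2, (2 * u1, 0.0, 0.0)
    u8,  g8  = 2 * u2 ** 2, (0.0, 4 * u2, 0.0)
    u9,  g9  = 3 * u3 ** 2, (0.0, 0.0, 6 * u3)
    u10, g10 = u7 + u8 + u9, tuple(a + b + c for a, b, c in zip(g7, g8, g9))
    u11, g11 = u6 * u10, tuple(u6 * b + u10 * a for a, b in zip(g6, g10))
    return u11, g11

x, y, z = 1.0, 2.0, 0.5
val, grad = grad_forward(x, y, z)
# Check against the analytic partial derivative w.r.t. x:
assert abs(grad[0] - (3*x**2*y + 2*y**3 + 3*y*z**2 + 2*x*math.cos(z))) < 1e-12
print(val, grad)
```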
Basics of Automatic Differentiation
Forward mode & Reverse mode - Reverse mode
Code list (as before):
$u_1 = x,\; u_2 = y,\; u_3 = z,\; u_4 = u_1 u_2,\; u_5 = \cos u_3,\; u_6 = u_4 + u_5,$
$u_7 = u_1^2,\; u_8 = 2u_2^2,\; u_9 = 3u_3^2,\; u_{10} = u_7 + u_8 + u_9,\; u_{11} = u_6 u_{10}.$

Adjoints:
$\dfrac{\partial u_{11}}{\partial u_{11}} = 1,$
$\dfrac{\partial u_{11}}{\partial u_{10}} = u_6, \quad \dfrac{\partial u_{11}}{\partial u_9} = \dfrac{\partial u_{11}}{\partial u_{10}}\dfrac{\partial u_{10}}{\partial u_9} = u_6, \quad \dfrac{\partial u_{11}}{\partial u_8} = \dfrac{\partial u_{11}}{\partial u_{10}}\dfrac{\partial u_{10}}{\partial u_8} = u_6, \quad \dfrac{\partial u_{11}}{\partial u_7} = \dfrac{\partial u_{11}}{\partial u_{10}}\dfrac{\partial u_{10}}{\partial u_7} = u_6,$
$\dfrac{\partial u_{11}}{\partial u_6} = u_{10}, \quad \dfrac{\partial u_{11}}{\partial u_5} = \dfrac{\partial u_{11}}{\partial u_6}\dfrac{\partial u_6}{\partial u_5} = u_{10}, \quad \dfrac{\partial u_{11}}{\partial u_4} = \dfrac{\partial u_{11}}{\partial u_6}\dfrac{\partial u_6}{\partial u_4} = u_{10},$
$\dfrac{\partial u_{11}}{\partial u_3} = \dfrac{\partial u_{11}}{\partial u_9}\dfrac{\partial u_9}{\partial u_3} + \dfrac{\partial u_{11}}{\partial u_5}\dfrac{\partial u_5}{\partial u_3} = 6u_6 u_3 - u_{10}\sin u_3,$
$\dfrac{\partial u_{11}}{\partial u_2} = \dfrac{\partial u_{11}}{\partial u_4}\dfrac{\partial u_4}{\partial u_2} + \dfrac{\partial u_{11}}{\partial u_8}\dfrac{\partial u_8}{\partial u_2} = u_{10} u_1 + 4u_6 u_2,$
$\dfrac{\partial u_{11}}{\partial u_1} = \dfrac{\partial u_{11}}{\partial u_4}\dfrac{\partial u_4}{\partial u_1} + \dfrac{\partial u_{11}}{\partial u_7}\dfrac{\partial u_7}{\partial u_1} = u_{10} u_2 + 2u_6 u_1.$

$$\nabla f(x, y, z) = \left[\frac{\partial u_{11}}{\partial u_1},\ \frac{\partial u_{11}}{\partial u_2},\ \frac{\partial u_{11}}{\partial u_3}\right] = [3x^2 y + 2x\cos z + 2y^3 + 3yz^2,\;\; 6xy^2 + 4y\cos z + x^3 + 3xz^2,\;\; 6xyz + 6z\cos z - x^2\sin z - 2y^2\sin z - 3z^2\sin z].$$
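Spelled out as a program, the reverse mode needs one forward evaluation of the code list followed by a single backward pass that accumulates the adjoints ∂u_11/∂u_i. A small illustrative Python sketch (names and test point are assumptions):

```python
import math

def grad_reverse(x, y, z):
    """One forward sweep to evaluate the code list, one reverse sweep for the adjoints."""
    # Forward sweep (same code list as on the forward-mode slide)
    u1, u2, u3 = x, y, z
    u4 = u1 * u2
    u5 = math.cos(u3)
    u6 = u4 + u5
    u7 = u1 ** 2
    u8 = 2 * u2 ** 2
    u9 = 3 * u3 ** 2
    u10 = u7 + u8 + u9
    u11 = u6 * u10

    # Reverse sweep: b_i = d u11 / d u_i, seeded with b11 = 1
    b11 = 1.0
    b10 = b11 * u6                               # u11 = u6 * u10
    b6  = b11 * u10
    b9  = b10                                    # u10 = u7 + u8 + u9
    b8  = b10
    b7  = b10
    b5  = b6                                     # u6 = u4 + u5
    b4  = b6
    b3  = b9 * 6 * u3 + b5 * (-math.sin(u3))     # u9 = 3 u3^2,  u5 = cos u3
    b2  = b4 * u1 + b8 * 4 * u2                  # u4 = u1 u2,   u8 = 2 u2^2
    b1  = b4 * u2 + b7 * 2 * u1                  # u4 = u1 u2,   u7 = u1^2
    return u11, (b1, b2, b3)

print(grad_reverse(1.0, 2.0, 0.5))
```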
Second-Order AD Module
Introduction
Divided differences

First order differentiation

Forward differentiation:
$$\frac{\partial f}{\partial x_m} = \frac{f(x_1, \ldots, x_m + h, \ldots, x_n) - f(x_1, \ldots, x_m, \ldots, x_n)}{h} + O(h)$$

Backward differentiation:
$$\frac{\partial f}{\partial x_m} = \frac{f(x_1, \ldots, x_m, \ldots, x_n) - f(x_1, \ldots, x_m - h, \ldots, x_n)}{h} + O(h)$$

Centered differentiation:
$$\frac{\partial f}{\partial x_m} = \frac{f(x_1, \ldots, x_m + h, \ldots, x_n) - f(x_1, \ldots, x_m - h, \ldots, x_n)}{2h} + O(h^2)$$

Second order differentiation
$$\frac{\partial^2 f}{\partial x_m^2} = \frac{f(x_1, \ldots, x_m + h, \ldots, x_n) - 2 f(x_1, \ldots, x_m, \ldots, x_n) + f(x_1, \ldots, x_m - h, \ldots, x_n)}{h^2} + O(h^2)$$
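The divided-difference formulas above are simple to code. A small Python sketch (the example function and step sizes are chosen only for illustration) of the centered first-order formula and the second-order formula, both subject to the truncation error noted above:

```python
def first_derivative_central(f, x, m, h=1e-5):
    """Centered difference for df/dx_m, truncation error O(h^2)."""
    xp, xm = list(x), list(x)
    xp[m] += h
    xm[m] -= h
    return (f(xp) - f(xm)) / (2 * h)

def second_derivative(f, x, m, h=1e-4):
    """Second-order divided difference for d^2 f / dx_m^2, truncation error O(h^2)."""
    xp, xm = list(x), list(x)
    xp[m] += h
    xm[m] -= h
    return (f(xp) - 2 * f(x) + f(xm)) / h ** 2

# Example: f(x, y) = x^2 + 2 y^2, so d^2 f / dy^2 = 4
f = lambda x: x[0] ** 2 + 2 * x[1] ** 2
print(second_derivative(f, [1.0, 2.0], 1))   # approximately 4, up to truncation/rounding error
```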
Second-Order AD Module
Introduction
Rules of arithmetic operations for Hessian matrices
H (u  v)  H (u )  H (v),
H (uv)  uH (v)  u T v  vT u  vH (u ),
H (u / v)  ( H (u )  (u / v)T v  vT (u / v)  (u / v) H (v)) / v, v  0,
H ( (u ))   ''(u )u T u   '(u ) H (u ),
for twice differentiable functions  such as the standard functions.
Second-Order AD Module
Forward Mode Strategy
An example
Given the function $f(x, y) = x^2 + 2y^2 + \sin(xy)$, the second order partial derivatives are

$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\bigl(2x + y\cos(xy)\bigr) = 2 - y^2\sin(xy), \qquad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial y}\bigl(2x + y\cos(xy)\bigr) = \cos(xy) - xy\sin(xy),$$
$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial x}\bigl(4y + x\cos(xy)\bigr) = \cos(xy) - xy\sin(xy), \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\bigl(4y + x\cos(xy)\bigr) = 4 - x^2\sin(xy).$$

The Hessian matrix is
$$H(f) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\ \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}.$$
Second-Order AD Module
Forward Mode Strategy
Code list:
$u_1 = x,\; u_2 = y,\; u_3 = u_1^2,\; u_4 = 2u_2^2,\; u_5 = \sin(u_1 u_2),\; u_6 = u_3 + u_4 + u_5.$

Gradient entries:
$\nabla u_1 = [1, 0], \quad \nabla u_2 = [0, 1],$
$\nabla u_3 = 2u_1 \nabla u_1 = [2u_1, 0],$
$\nabla u_4 = 4u_2 \nabla u_2 = [0, 4u_2],$
$\nabla u_5 = \nabla \sin(u_1 u_2) = u_2\cos(u_1 u_2)\,\nabla u_1 + u_1\cos(u_1 u_2)\,\nabla u_2 = [u_2\cos(u_1 u_2),\; u_1\cos(u_1 u_2)],$
$\nabla u_6 = \nabla u_3 + \nabla u_4 + \nabla u_5 = [2u_1 + u_2\cos(u_1 u_2),\; 4u_2 + u_1\cos(u_1 u_2)].$
Second-Order AD Module
Forward Mode Strategy
Hessian matrix entries:

$$H(u_1) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad H(u_2) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
$$H(u_3) = H(u_1^2) = u_1 H(u_1) + \nabla u_1^T \nabla u_1 + \nabla u_1^T \nabla u_1 + u_1 H(u_1) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix},$$
$$H(u_4) = H(2u_2^2) = 2\bigl[u_2 H(u_2) + \nabla u_2^T \nabla u_2 + \nabla u_2^T \nabla u_2 + u_2 H(u_2)\bigr] = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix},$$
$$H(u_5) = H(\sin(u_1 u_2)) = -\sin(u_1 u_2)\,\nabla(u_1 u_2)^T \nabla(u_1 u_2) + \cos(u_1 u_2)\,H(u_1 u_2) = -\sin(u_1 u_2)\begin{pmatrix} u_2^2 & u_1 u_2 \\ u_1 u_2 & u_1^2 \end{pmatrix} + \cos(u_1 u_2)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$
$$H(u_6) = H(u_3) + H(u_4) + H(u_5) = \begin{pmatrix} 2 - u_2^2\sin(u_1 u_2) & \cos(u_1 u_2) - u_1 u_2\sin(u_1 u_2) \\ \cos(u_1 u_2) - u_1 u_2\sin(u_1 u_2) & 4 - u_1^2\sin(u_1 u_2) \end{pmatrix}.$$
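The whole second-order forward sweep for this example fits in a short Python sketch that carries a (value, gradient, Hessian) triple through the code list and applies the rules of arithmetic for Hessian matrices given earlier (the helper names and the evaluation point are illustrative assumptions):

```python
import math

def hessian_forward(x, y):
    """Forward-mode sweep carrying (value, gradient, Hessian) for f = x^2 + 2 y^2 + sin(x y)."""
    def outer(a, b):                 # a^T b for row vectors a, b
        return [[ai * bj for bj in b] for ai in a]
    def madd(*ms):                   # elementwise sum of 2x2 matrices
        return [[sum(m[i][j] for m in ms) for j in range(2)] for i in range(2)]
    def mscale(c, m):
        return [[c * mij for mij in row] for row in m]

    u1, g1, H1 = x, [1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    u2, g2, H2 = y, [0.0, 1.0], [[0.0, 0.0], [0.0, 0.0]]

    # u3 = u1^2: product rule H(uv) with u = v = u1
    u3, g3 = u1 ** 2, [2 * u1, 0.0]
    H3 = madd(mscale(u1, H1), outer(g1, g1), outer(g1, g1), mscale(u1, H1))

    # u4 = 2 u2^2
    u4, g4 = 2 * u2 ** 2, [0.0, 4 * u2]
    H4 = mscale(2, madd(mscale(u2, H2), outer(g2, g2), outer(g2, g2), mscale(u2, H2)))

    # w = u1 u2, then u5 = sin(w): H(phi(w)) = phi''(w) grad(w)^T grad(w) + phi'(w) H(w)
    w, gw = u1 * u2, [u2, u1]
    Hw = madd(mscale(u1, H2), outer(g1, g2), outer(g2, g1), mscale(u2, H1))
    u5 = math.sin(w)
    g5 = [math.cos(w) * gi for gi in gw]
    H5 = madd(mscale(-math.sin(w), outer(gw, gw)), mscale(math.cos(w), Hw))

    u6, g6, H6 = u3 + u4 + u5, [a + b + c for a, b, c in zip(g3, g4, g5)], madd(H3, H4, H5)
    return u6, g6, H6

x, y = 0.7, 1.3
val, grad, H = hessian_forward(x, y)
# Analytic check: d^2 f / dx^2 = 2 - y^2 sin(x y)
assert abs(H[0][0] - (2 - y ** 2 * math.sin(x * y))) < 1e-12
print(H)
```

The final H(u6) reproduces the matrix above; the assert compares its (1,1) entry with the analytic second derivative.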
Second-Order AD Module
Forward Mode Strategy

Hessian Type     Cost
H(f)             O(n^2)        (H(f): n-by-n matrix)
H(f)·V           O(n·n_v)      (V: n-by-n_v matrix)
V^T·H(f)·V       O(n_v^2)      (W: n-by-n_w matrix)
V^T·H(f)·W       O(n_v·n_w)
Second-Order AD Module
Taylor Series Strategy
We consider $f$ as a scalar function $f(x^0 + t\mathbf{u})$ of $t$. Its Taylor series, up to second order, is

$$f(x^0 + t\mathbf{u}) = f(x^0) + \left.\frac{\partial f}{\partial t}\right|_{t=0} t + \frac{1}{2}\left.\frac{\partial^2 f}{\partial t^2}\right|_{t=0} t^2 = f + f_t\, t + f_{tt}\, t^2,$$

where $f_t$ and $f_{tt}$ are the first and second order Taylor coefficients.

The uniqueness of the Taylor series implies that for $\mathbf{u} = e_i$, the $i$-th basis vector, we obtain

$$f_t = \left.\frac{\partial f}{\partial x_i}\right|_{x = x^0}, \qquad f_{tt} = \frac{1}{2}\left.\frac{\partial^2 f}{\partial x_i^2}\right|_{x = x^0}.$$
Second-Order AD Module
Taylor Series Strategy
To compute the $(i, j)$ off-diagonal entry in the Hessian, we set $\mathbf{u} = e_i + e_j$. The uniqueness of the Taylor expansion implies

$$f_t = \left.\frac{\partial f}{\partial x_i}\right|_{x = x^0} + \left.\frac{\partial f}{\partial x_j}\right|_{x = x^0},$$

$$f_{tt} = \frac{1}{2}\left.\frac{\partial^2 f}{\partial x_i^2}\right|_{x = x^0} + \left.\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right|_{x = x^0} + \frac{1}{2}\left.\frac{\partial^2 f}{\partial x_j^2}\right|_{x = x^0}.$$
Second-Order AD Module
Hessian Performance
• Twice ADIFOR: first produces a gradient code with ADIFOR 2.0, and then runs the gradient code through ADIFOR again.
• Forward: implements the forward mode.
• Adaptive Forward: uses the forward mode, with preaccumulation at a statement level where deemed appropriate.
• Sparse Taylor Series: uses the Taylor series mode to compute the needed entries.
An Application in OCPs
Problem Definition and Theoretical Analysis
Consider the following problem:

$$f(\dot{x}, x, u, v) = 0_n, \qquad t \in [t_0, t_f] \qquad (1a)$$

with the following set of consistent and non-redundant initial conditions:

$$x(t_0) = x_0(v) \qquad (1b)$$

where $x, \dot{x} \in \mathbb{R}^n$ are the state (output) variables and their time derivatives, respectively, $u$ are the control (input) variables, and $v$ are the time-invariant parameters. Depending on the implementation, the control variables may be approximated by some type of discretization which involves some (or all) of the parameters in the set $v$:

$$u = u(v) \qquad (1c)$$
An Application in OCPs
The first order sensitivity equations
f x f x f u f



 0n  , t  [t0 ,t f ]
x v x v u v v
(2a)
with the initial conditions:
x
x
(t0 )  0 ( v )
v
v
(2b)
An Application in OCPs
The second order sensitivity equations
$$\left(\frac{\partial f}{\partial \dot{x}} \otimes I\right)\frac{\partial^2 \dot{x}}{\partial v^2} + \left(\frac{\partial f}{\partial x} \otimes I\right)\frac{\partial^2 x}{\partial v^2} + \left(I_n \otimes \frac{\partial \dot{x}}{\partial v}\right)^T \left[\frac{\partial^2 f}{\partial \dot{x}^2}\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x\,\partial \dot{x}}\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u\,\partial \dot{x}}\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v\,\partial \dot{x}}\right]$$
$$+ \left(I_n \otimes \frac{\partial x}{\partial v}\right)^T \left[\frac{\partial^2 f}{\partial \dot{x}\,\partial x}\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x^2}\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u\,\partial x}\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v\,\partial x}\right] + \left(\frac{\partial f}{\partial u} \otimes I\right)\frac{\partial^2 u}{\partial v^2}$$
$$+ \left(I_n \otimes \frac{\partial u}{\partial v}\right)^T \left[\frac{\partial^2 f}{\partial \dot{x}\,\partial u}\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x\,\partial u}\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u^2}\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v\,\partial u}\right]$$
$$+ \left[\frac{\partial^2 f}{\partial \dot{x}\,\partial v}\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x\,\partial v}\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u\,\partial v}\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v^2}\right] = 0_n, \qquad t \in [t_0, t_f] \qquad (3a)$$

with initial conditions given by:

$$\frac{\partial^2 x}{\partial v^2}(t_0) = \frac{\partial^2 x_0}{\partial v^2}(v) \qquad (3b)$$
An Application in OCPs
The second order sensitivity equations
The result of Eq. (3a) is post-multiplied by a vector $p$, obtaining:

$$\{\,\cdot\,\}\; p = 0_{n \times 1} \qquad (4)$$

By comparing terms, the equivalent form is derived:

$$\left(\frac{\partial f}{\partial \dot{x}}\right)^T \dot{Z} + \left(\frac{\partial f}{\partial x}\right)^T Z + A(\dot{x}, x, u, v) = 0_n \qquad (5a)$$

with $Z$ and $\dot{Z}$ being the matrices whose columns are respectively given by the matrix-vector products $z_i$ and $\dot{z}_i$, with

$$z_i = \frac{\partial^2 x_i}{\partial v^2}\, p, \qquad \dot{z}_i = \frac{\partial^2 \dot{x}_i}{\partial v^2}\, p, \qquad i = 1, 2, \ldots, n, \quad t \in [t_0, t_f]. \qquad (5b)$$

Finally, the set of initial conditions for these is:

$$(Z(t_0))^T = \left[\; p^T\,\frac{\partial^2 x_{0_1}}{\partial v^2}(v) \;\; \cdots \;\; p^T\,\frac{\partial^2 x_{0_n}}{\partial v^2}(v) \;\right] \qquad (5c)$$
An Application in OCPs
Optimal control problem
Find the control vector $u(t)$ over $t \in [t_0, t_f]$ to minimize (or maximize) a performance index $J$:

$$J(x, u) = \Phi(x(t_f)) \qquad (6)$$

subject to a set of ordinary differential equations:

$$\frac{dx}{dt} = f(x(t), u(t), t) \qquad (7)$$

where $x$ is the vector of state variables, with initial conditions $x(t_0) = x_0$. An additional set of inequality constraints are the lower and upper bounds on the control variables:

$$u_L \leq u(t) \leq u_U \qquad (8)$$
An Application in OCPs
Truncated Newton method for the solution of the NLP
The truncated Newton method uses an iterative scheme, usually a conjugate gradient method, to approximately solve the Newton equations of the optimization problem:

$$H(x)\, p = -g(x) \qquad (9)$$

where $H(x)$ is the Hessian matrix, $p$ is the search direction, and $g(x)$ is the gradient vector.
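A minimal sketch of the inner iteration (plain conjugate gradients, written in Python with illustrative names; production truncated-Newton codes add preconditioning and more refined stopping rules). The key point is that only Hessian-vector products H(x)·v are needed, never the full Hessian:

```python
def truncated_newton_direction(hess_vec, grad, max_iter=50, tol=1e-8):
    """Approximately solve H p = -g with linear conjugate gradients.

    hess_vec(v) must return the Hessian-vector product H(x) v.
    """
    n = len(grad)
    p = [0.0] * n
    r = [-gi for gi in grad]              # residual of H p + g = 0 at p = 0
    d = list(r)
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Hd = hess_vec(d)
        curvature = sum(di * hdi for di, hdi in zip(d, Hd))
        if curvature <= 0:                # truncate on non-positive curvature
            break
        alpha = rs_old / curvature
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hdi for ri, hdi in zip(r, Hd)]
        rs_new = sum(ri * ri for ri in r)
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return p

# Quadratic test problem: H = diag(2, 10), g = (2, -30)  =>  p = (-1, 3)
H = [[2.0, 0.0], [0.0, 10.0]]
hv = lambda v: [sum(H[i][j] * v[j] for j in range(2)) for i in range(2)]
print(truncated_newton_direction(hv, [2.0, -30.0]))
```

These products can come from a finite-difference approximation of the gradient or, as discussed on the next slides, from the exact second-order sensitivity system.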
An Application in OCPs
Implementation Details
Step 1
• Automatic derivation of the first and second order sensitivity equations to construct a full augmented IVP.
• Creation of the corresponding program subroutines in a format suitable for a standard IVP solver.

Step 2
• Numerical solution of the outer NLP using a truncated Newton method which solves bound-constrained problems.
An Application in OCPs
Two approaches with TN method
TN algorithm with finite difference scheme
• Gradient evaluation requires the solution of the first order sensitivity system.
• Gradient information is used to approximate the Hessian-vector product with a finite difference scheme (see the sketch below).

TN algorithm with exact Hessian-vector product calculation
• Uses the second order sensitivity equations defined in Eq. (5a) to obtain the exact Hessian-vector product. (Earlier methods of the CVP type were based on first order sensitivities only, i.e. mostly gradient-based algorithms.)
• This approach has been shown to be more robust and reliable due to the use of exact second order information.
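For the first variant, the Hessian-vector product is approximated by differencing the gradient along the direction v, H(x)·v ≈ (g(x + εv) − g(x))/ε, at the cost of one extra gradient (first-order sensitivity) evaluation per product. A small Python sketch (the analytic gradient of the earlier example f(x, y) = x² + 2y² + sin(xy) and the step ε are assumptions made for illustration); the second variant replaces this approximation by the exact product from Eq. (5a):

```python
import math

def hessp_fd(grad, x, v, eps=1e-6):
    """Finite-difference Hessian-vector product: H(x) v ~ (g(x + eps v) - g(x)) / eps."""
    g0 = grad(x)
    g1 = grad([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

# Gradient of the earlier example f(x, y) = x^2 + 2 y^2 + sin(x y), written analytically:
def grad(x):
    return [2 * x[0] + x[1] * math.cos(x[0] * x[1]),
            4 * x[1] + x[0] * math.cos(x[0] * x[1])]

x, v = [0.7, 1.3], [1.0, -1.0]
print(hessp_fd(grad, x, v))   # one extra gradient evaluation per product
```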
Summary
• Basics of derivatives
  - Definition of derivatives
  - Application of derivatives
  - Methods to compute derivatives
• Basics of AD
  - Compute first order derivatives with forward mode
  - Compute first order derivatives with reverse mode
• Second Order Differentiation
  - Compute second order derivatives with forward mode strategy
  - Compute second order derivatives with Taylor Series strategy
  - Hessian Performance
• An Application in Optimal Control Problems
  - First order and second order sensitivity equations of DAE
  - Solve optimal control problem with CVP method
  - Solve nonlinear programming problems with truncated Newton method
  - Truncated Newton method with exact Hessian vector product calculation
References
• Abate, Bischof, Roh, Carle, "Algorithms and Design for a Second-Order Automatic Differentiation Module"
• Eva Balsa-Canto, Julio R. Banga, Antonio A. Alonso, Vassilios S. Vassiliadis, "Restricted second order information for the solution of optimal control problems using control vector parameterization"
• Louis B. Rall, George F. Corliss, "An Introduction to Automatic Differentiation"
• Andreas Griewank, "Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation"
• Stephen G. Nash, "A Survey of Truncated-Newton Methods"