Quadratically Converging Control Sequence Optimisation Algorithms
Newton-GRAPE; Hessian Calculations & Regularisation
David L. Goodwin & Ilya Kuprov
d.goodwin@soton.ac.uk
QUAINT@Swansea, Wednesday 1st July 2015
Two equations...

$$\frac{\partial^2 J}{\partial c_{n_2}^{(k)}\,\partial c_{n_1}^{(k)}}
= \langle\sigma|\,\hat{\hat{P}}_N \hat{\hat{P}}_{N-1}\cdots
\frac{\partial}{\partial c_{n_2}^{(k)}}\hat{\hat{P}}_{n_2}\cdots
\frac{\partial}{\partial c_{n_1}^{(k)}}\hat{\hat{P}}_{n_1}\cdots
\hat{\hat{P}}_2 \hat{\hat{P}}_1\,|\psi_0\rangle \quad (1)$$

$$\exp\begin{pmatrix}
-i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_{n_1}^{(k_1)}\Delta t & 0\\
0 & -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_{n_2}^{(k_2)}\Delta t\\
0 & 0 & -i\hat{\hat{L}}\Delta t
\end{pmatrix}
= \begin{pmatrix}
e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_{n_1}^{(k_1)}}e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial^2}{\partial c_{n_1}^{(k_1)}\partial c_{n_2}^{(k_2)}}e^{-i\hat{\hat{L}}\Delta t}\\
0 & e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_{n_2}^{(k_2)}}e^{-i\hat{\hat{L}}\Delta t}\\
0 & 0 & e^{-i\hat{\hat{L}}\Delta t}
\end{pmatrix} \quad (2)$$
Outline:
01 Newton-Raphson method
02 GRAPE; gradient and Hessian
03 Van Loan's Augmented Exponentials
04 Regularisation
Abstract

We report an efficient algorithm for computing the full Hessian of the optimal control objective function in a large class of practically relevant cases, and use it to analyse the convergence properties of Newton-Raphson optimisation methods using the full Hessian matrix.
Newton-Raphson method
The Newton-Raphson method
Taylor's Theorem
- Taylor series approximated to second order[1].
- If $f$ is continuously differentiable:
  $$f(x + p) = f(x) + \nabla f(x + tp)^{T} p$$
- If $f$ is twice continuously differentiable:
  $$f(x + p) = f(x) + \nabla f(x)^{T} p + \tfrac{1}{2}\, p^{T} \nabla^2 f(x + \alpha p)\, p$$
- 1st order necessary condition: $\nabla f(x^*) = 0$
- 2nd order necessary condition: $\nabla^2 f(x^*)$ is positive semidefinite
Finding a minimum of the function using tangents of the gradient

[Figure: three panels showing Newton-Raphson iterates $x_0, x_1, x_2$ on the gradient $f'(x) = x^2 - 1$ of the function $f(x) = \frac{x^3}{3} - x + 1$; each tangent, with slope $f''(x_n) = 2x_n$, locates the next iterate, converging to the stationary point at $x = 1$.]

[1] B. Taylor. Inny, 1717; J. Nocedal and S. J. Wright. 1999.
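As a minimal sketch of the iteration in the figure (Python, written for this transcript rather than taken from the talk):

```python
# Newton-Raphson on the example above: minimise f(x) = x**3/3 - x + 1
# by iterating x <- x - f'(x)/f''(x) on the gradient f'(x) = x**2 - 1.
def newton_minimise(x, steps=8):
    for _ in range(steps):
        grad = x**2 - 1        # f'(x)
        curv = 2 * x           # f''(x)
        x = x - grad / curv    # Newton step from the tangent of f'
    return x

print(newton_minimise(2.0))    # quadratic convergence to the minimum at x = 1
```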
The Hessian Matrix

The Newton step: $p_k^N = -H_k^{-1} \nabla f_k$

- $\nabla^2 f_k = H_k$ is the Hessian matrix of second order partial derivatives[2]:
$$H = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}\\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$
- The steepest descent method results from the same equation when we set $H$ to the identity matrix.
- Quasi-Newton methods initialise $H$ to the identity matrix, then approximate it from an update formula using a gradient history.
- The Hessian proper must be positive definite (and quite well conditioned) to admit an inverse; an indefinite Hessian results in non-descent search directions.
[2] L. O. Hesse. Chelsea, 1972.
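To make the relationship between these methods concrete, here is a small Python sketch on a hypothetical two-variable objective (not from the talk): it takes a Newton step when the Hessian is positive definite and falls back to steepest descent otherwise.

```python
import numpy as np

# Hypothetical objective f(x) = x0^4 + x0*x1 + (1 + x1)^2 with analytic
# gradient and Hessian, used only to illustrate the Newton step.
def grad(x):
    return np.array([4*x[0]**3 + x[1], x[0] + 2*(1 + x[1])])

def hess(x):
    return np.array([[12*x[0]**2, 1.0],
                     [1.0,        2.0]])

x = np.array([0.75, -1.25])
for _ in range(10):
    H, g = hess(x), grad(x)
    if np.all(np.linalg.eigvalsh(H) > 0):
        p = -np.linalg.solve(H, g)   # Newton direction p = -H^{-1} grad
    else:
        p = -g                       # H := identity gives steepest descent
    x = x + p
print(x, grad(x))                    # gradient ~ 0 at convergence
```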
GRAPE; gradient and Hessian

GRAPE gradient

- Split the Liouvillian into controllable and uncontrollable parts[3]:
$$\hat{\hat{L}}(t) = \hat{\hat{H}}_0 + \sum_k c^{(k)}(t)\, \hat{\hat{H}}_k$$
- Maximise the fidelity measure (with a time-ordered exponential):
$$J = \mathrm{Re}\,\langle\hat\sigma|\,\exp_{(\mathrm{o})}\!\left[-i \int_0^T \hat{\hat{L}}(t)\, dt\right] |\hat\rho(0)\rangle$$
- Optimality conditions: $\partial J / \partial c_k(t) = 0$ at a minimum, and the Hessian matrix should be positive definite.
- Gradient found from forward and backward propagation:
$$\frac{\partial J}{\partial c_n^{(k)}} = \langle\sigma|\,\hat{\hat{P}}_N \hat{\hat{P}}_{N-1}\cdots \frac{\partial}{\partial c_n^{(k)}}\hat{\hat{P}}_n \cdots \hat{\hat{P}}_2 \hat{\hat{P}}_1\,|\psi_0\rangle$$
- Propagator over a time slice:
$$\hat{\hat{P}}_n = \exp\!\left[-i\left(\hat{\hat{H}}_0 + \sum_k c_n^{(k)} \hat{\hat{H}}_k\right)\Delta t\right]$$
[3] N. Khaneja et al. In: Journal of Magnetic Resonance 172.2 (2005), pp. 296–305.
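A minimal forward-propagation sketch of this fidelity functional in Python (random Hermitian stand-ins for the drift and a single control operator, not the paper's spin system):

```python
import numpy as np
from scipy.linalg import expm

# Piecewise-constant forward propagation of the fidelity J.
rng = np.random.default_rng(0)
dim, N, dt = 4, 50, 0.1
H0 = rng.standard_normal((dim, dim)); H0 = H0 + H0.T   # drift
H1 = rng.standard_normal((dim, dim)); H1 = H1 + H1.T   # control operator
c = rng.standard_normal(N)                             # control amplitudes c_n

psi   = np.eye(dim)[:, 0].astype(complex)  # initial state |psi_0>
sigma = np.eye(dim)[:, 1].astype(complex)  # target state |sigma>

for n in range(N):
    P = expm(-1j*(H0 + c[n]*H1)*dt)        # slice propagator P_n
    psi = P @ psi
J = np.real(np.vdot(sigma, psi))           # J = Re <sigma|P_N...P_1|psi_0>
print(J)
```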
GRAPE Hessian

- (Block) diagonal elements:
$$\frac{\partial^2 J}{\partial c_n^{(k)\,2}} = \langle\sigma|\,\hat{\hat{P}}_N \hat{\hat{P}}_{N-1}\cdots \frac{\partial^2}{\partial c_n^{(k)\,2}}\hat{\hat{P}}_n \cdots \hat{\hat{P}}_2 \hat{\hat{P}}_1\,|\psi_0\rangle$$
- Off-diagonal elements:
$$\frac{\partial^2 J}{\partial c_{n_2}^{(k)}\,\partial c_{n_1}^{(k)}} = \langle\sigma|\,\hat{\hat{P}}_N \hat{\hat{P}}_{N-1}\cdots \frac{\partial}{\partial c_{n_2}^{(k)}}\hat{\hat{P}}_{n_2}\cdots \frac{\partial}{\partial c_{n_1}^{(k)}}\hat{\hat{P}}_{n_1}\cdots \hat{\hat{P}}_2 \hat{\hat{P}}_1\,|\psi_0\rangle$$

All propagator derivatives appearing in the off-diagonal blocks have already been calculated within the gradient calculation and can be reused; only the diagonal blocks need to be computed afresh. A sketch of this reuse follows.
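The reuse pattern can be sketched as follows (Python, single control channel, random model operators; the propagator derivatives DP[n] are taken by finite difference here for brevity, whereas the augmented exponentials of the next section compute them exactly):

```python
import numpy as np
from scipy.linalg import expm

# Off-diagonal Hessian assembly for a single control channel.
rng = np.random.default_rng(1)
d, N, dt, h = 3, 6, 0.1, 1e-6
H0 = rng.standard_normal((d, d)); H0 = H0 + H0.T   # drift
H1 = rng.standard_normal((d, d)); H1 = H1 + H1.T   # control operator
c = rng.standard_normal(N)                         # control amplitudes
psi0  = np.eye(d)[:, 0].astype(complex)
sigma = np.eye(d)[:, 1].astype(complex)

P  = [expm(-1j*(H0 + cn*H1)*dt) for cn in c]
DP = [(expm(-1j*(H0 + (cn + h)*H1)*dt)
     - expm(-1j*(H0 + (cn - h)*H1)*dt)) / (2*h) for cn in c]

# Forward states F[n] = P_n...P_1|psi0> and costates B[n] = <sigma|P_N...P_{n+1}
# are both already available from the gradient calculation -- reuse them.
F = [psi0]
for n in range(N):
    F.append(P[n] @ F[-1])
B = [None]*(N + 1)
B[N] = sigma.conj()
for n in range(N - 1, -1, -1):
    B[n] = B[n + 1] @ P[n]

Hess = np.zeros((N, N))
for n1 in range(N):
    v = DP[n1] @ F[n1]          # first derivative injected at slice n1
    for n2 in range(n1 + 1, N):
        Hess[n1, n2] = Hess[n2, n1] = np.real(B[n2 + 1] @ (DP[n2] @ v))
        v = P[n2] @ v           # carry forward to the next slice
# Diagonal elements need second derivative propagators d^2 P_n / dc_n^2,
# computed exactly with the 3x3 augmented exponential of the next section.
```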
Van Loan's Augmented Exponentials

Efficient Gradient Calculation
Augmented Exponentials

- Among the many complicated functions encountered in the magnetic resonance simulation context, chained exponential integrals involving square matrices $A_k$ and $B_k$ occur particularly often:
$$\int_0^t \! dt_1 \int_0^{t_1} \! dt_2 \cdots \int_0^{t_{n-2}} \! dt_{n-1}\; e^{A_1(t - t_1)} B_1\, e^{A_2(t_1 - t_2)} B_2 \cdots B_{n-1}\, e^{A_n t_{n-1}}$$
- A method for computing integrals of this general type was proposed by Van Loan in 1978[4] (pointed out to us by Sophie Schirmer[5]).
- Using this augmented exponential technique, we can write an upper-triangular block matrix exponential as
$$\exp\!\left[\begin{pmatrix} A & B\\ 0 & A \end{pmatrix} t\right] = \begin{pmatrix} e^{At} & \int_0^t e^{A(t-s)} B\, e^{As}\, ds\\ 0 & e^{At} \end{pmatrix}, \qquad
\exp\begin{pmatrix} A & B\\ 0 & A \end{pmatrix} = \begin{pmatrix} e^{A} & \int_0^1 e^{A(1-s)} B\, e^{As}\, ds\\ 0 & e^{A} \end{pmatrix}$$
[4] C. F. Van Loan. In: IEEE Transactions on Automatic Control 23.3 (1978), pp. 395–404.
[5] F. F. Floether, P. de Fouquieres and S. G. Schirmer. In: New Journal of Physics 14.7 (2012), p. 073023.
Efficient Gradient Calculation
Augmented Exponentials

- Find the derivative of the propagator with respect to the control pulse at a specific time point.
- Set $A = -i\hat{\hat{L}}\Delta t$ and $B = -i\hat{\hat{H}}_n^{(k)}\Delta t$, so that
$$\int_0^1 e^{A(1-s)} B\, e^{As}\, ds = \frac{\partial}{\partial c_n^{(k)}} \exp\!\left(-i\hat{\hat{L}}\Delta t\right)$$
- leading to an efficient calculation of the gradient element:
$$\exp\begin{pmatrix} -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_n^{(k)}\Delta t\\ 0 & -i\hat{\hat{L}}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_n^{(k)}} e^{-i\hat{\hat{L}}\Delta t}\\ 0 & e^{-i\hat{\hat{L}}\Delta t} \end{pmatrix}$$
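This identity is easy to verify numerically. The following Python sketch (random Hermitian stand-ins for the Liouvillian and control operator, not the paper's system) compares the top-right block of the augmented exponential with a finite-difference derivative of the propagator:

```python
import numpy as np
from scipy.linalg import expm

# Build A = -i*L*dt (Liouvillian slice) and B = -i*Hk*dt (control operator)
# from random Hermitian matrices -- illustrative stand-ins only.
rng = np.random.default_rng(2)
d, dt = 4, 0.2
L  = rng.standard_normal((d, d)); L  = L + L.T
Hk = rng.standard_normal((d, d)); Hk = Hk + Hk.T
A, B = -1j*L*dt, -1j*Hk*dt

# Augmented exponential: exp([[A, B], [0, A]]) has the propagator e^A on the
# diagonal and the directional derivative of e^A in the top-right block.
M = np.block([[A, B], [np.zeros((d, d)), A]])
deriv_block = expm(M)[:d, d:]

# Compare with a central finite difference of expm(-i*(L + c*Hk)*dt) at c = 0.
h = 1e-6
fd = (expm(-1j*(L + h*Hk)*dt) - expm(-1j*(L - h*Hk)*dt)) / (2*h)
print(np.allclose(deriv_block, fd, atol=1e-8))   # True
```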
Efficient Hessian Calculation
Augmented Exponentials

- Second order derivatives can be calculated with a 3 × 3 augmented exponential[6].
- Set $A = -i\hat{\hat{L}}\Delta t$ and $B_n = -i\hat{\hat{H}}_n^{(k)}\Delta t$, so that
$$\int_0^1\!\!\int_0^s e^{A(1-s)} B_{n_1}\, e^{A(s-r)} B_{n_2}\, e^{Ar}\, dr\, ds = D^2_{c_{n_1} c_{n_2}} \exp\!\left(-i\hat{\hat{L}}\Delta t\right)$$
- giving the efficient Hessian element calculation:
$$\exp\begin{pmatrix}
-i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_{n_1}^{(k_1)}\Delta t & 0\\
0 & -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_{n_2}^{(k_2)}\Delta t\\
0 & 0 & -i\hat{\hat{L}}\Delta t
\end{pmatrix}
= \begin{pmatrix}
e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_{n_1}^{(k_1)}} e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial^2}{\partial c_{n_1}^{(k_1)} \partial c_{n_2}^{(k_2)}} e^{-i\hat{\hat{L}}\Delta t}\\
0 & e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_{n_2}^{(k_2)}} e^{-i\hat{\hat{L}}\Delta t}\\
0 & 0 & e^{-i\hat{\hat{L}}\Delta t}
\end{pmatrix}$$
[6] T. F. Havel, I. Najfeld and J. X. Yang. In: Proceedings of the National Academy of Sciences 91.17 (1994), pp. 7962–7966; I. Najfeld and T. Havel. In: Advances in Applied Mathematics 16.3 (1995), pp. 321–375.
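A numerical sketch of the 3 × 3 construction (again with random Hermitian stand-ins, not the paper's system): the corner block carries one time-ordering of the nested integral, so for $B_1 = B_2$ it equals half of the symmetric second derivative.

```python
import numpy as np
from scipy.linalg import expm

# Second derivative of the slice propagator from a 3x3 augmented exponential
# (here B1 = B2 = -i*Hk*dt, i.e. a diagonal Hessian block).
rng = np.random.default_rng(3)
d, dt = 4, 0.2
L  = rng.standard_normal((d, d)); L  = L + L.T
Hk = rng.standard_normal((d, d)); Hk = Hk + Hk.T
A, B = -1j*L*dt, -1j*Hk*dt
Z = np.zeros((d, d))

M = np.block([[A, B, Z],
              [Z, A, B],
              [Z, Z, A]])
corner = expm(M)[:d, 2*d:]   # (1,3) block: the nested Van Loan integral

# For B1 = B2 the corner is half of the symmetric second derivative; compare
# with a central second finite difference of expm(-i*(L + c*Hk)*dt) at c = 0.
h = 1e-4
d2 = (expm(-1j*(L + h*Hk)*dt) - 2*expm(-1j*L*dt)
      + expm(-1j*(L - h*Hk)*dt)) / h**2
print(np.allclose(2*corner, d2, atol=1e-6))   # True
```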
Overtone excitation
Simulation

- Excite $^{14}$N from the state $\hat{T}_{1,0}$ to $\hat{T}_{2,2}$.
- Solid state powder average, with the objective functional weighted over the crystalline orientations (rank 17 Lebedev grid, 110 points).
- Nuclear quadrupolar interaction.
- 400 time points for a total pulse duration of 40 µs.

[Figure: spherical tensor state diagram, $\hat{T}_{1,0} \rightarrow \hat{T}_{2,2}$.]
Overtone excitation
Comparison of BFGS and Newton-Raphson

[Figure: convergence of BFGS vs. Newton; 1-fidelity (0.5 to 1) against iterates (0 to 100).]
Regularisation
Problems

Singularities everywhere!

- BFGS (using the DFP formula) is guaranteed to produce a positive definite Hessian update.
- The Newton-Raphson method does not: $p_k^N = -H_k^{-1} \nabla f_k$
- Properties of the Hessian matrix:
  1. Must be symmetric, $\frac{\partial^2}{\partial c^{(i)} \partial c^{(j)}} = \frac{\partial^2}{\partial c^{(j)} \partial c^{(i)}}$; this is not guaranteed numerically when the control operators do not commute.
  2. Must be sufficiently positive definite; non-singular; invertible.
  3. The Hessian is diagonally dominant.
Regularisation
Avoiding Singularities

- When negative eigenvalues are present, it is common to regularise the Hessian to be positive definite[7].
- Check the eigenvalues by performing an eigendecomposition of the Hessian matrix:
$$H = Q \Lambda Q^{-1}$$
- Add the magnitude of the most negative eigenvalue to all eigenvalues, then reform the Hessian with the initial eigenvectors:
$$\lambda_{\min} = \max\left[0, -\min(\Lambda)\right], \qquad H_{\mathrm{reg}} = Q\left(\Lambda + \lambda_{\min} \hat{1}\right) Q^{-1}$$

TRM: Introduce a constant $\delta$, the radius of a region we trust to give a sufficiently positive definite Hessian:
$$H_{\mathrm{reg}} = Q\left(\Lambda + \delta \lambda_{\min} \hat{1}\right) Q^{-1}$$
However, if $\delta$ is too small, the Hessian will become ill-conditioned.

RFO: The method proceeds to construct an augmented Hessian matrix:
$$H_{\mathrm{aug}} = \begin{pmatrix} \delta^2 H & \delta \vec{g}\\ \delta \vec{g}^{\,T} & 0 \end{pmatrix} = Q \Lambda Q^{-1}$$
[7] X. P. Resina. PhD thesis. Universitat Autónoma de Barcelona, 2004.
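A compact Python sketch of these regularisation schemes (illustrative 2 × 2 indefinite Hessian; `delta` plays the role of $\delta$ above):

```python
import numpy as np

# Eigendecomposition-based regularisation: shift the spectrum so that all
# eigenvalues become positive, then rebuild H with the original eigenvectors.
def regularise(H, delta=1.1):
    H = (H + H.T) / 2                    # enforce the symmetry property
    lam, Q = np.linalg.eigh(H)           # H = Q diag(lam) Q^T
    lam_min = max(0.0, -lam.min())       # magnitude of most negative eigenvalue
    # TRM-style shift: delta = 1 pushes the smallest eigenvalue exactly to
    # zero (singular); delta too close to 1 leaves H_reg ill-conditioned.
    return Q @ np.diag(lam + delta * lam_min) @ Q.T

H = np.array([[2.0, 0.5],
              [0.5, -1.0]])              # indefinite: one negative eigenvalue
print(np.linalg.eigvalsh(regularise(H))) # all eigenvalues now positive

# RFO-style augmented Hessian built from H and the gradient g.
g = np.array([0.3, -0.2]); delta = 0.5
H_aug = np.block([[delta**2 * H, delta * g[:, None]],
                  [delta * g[None, :], np.zeros((1, 1))]])
```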
State transfer

[Figure: molecular structure of the hydrofluorocarbon fragment used in the simulations, with the H, F and C nuclei labelled and the field along z.]

- Controls := $\left\{ \hat{\hat{L}}_x^{(\mathrm{H})}, \hat{\hat{L}}_y^{(\mathrm{H})}, \hat{\hat{L}}_x^{(\mathrm{C})}, \hat{\hat{L}}_y^{(\mathrm{C})}, \hat{\hat{L}}_x^{(\mathrm{F})}, \hat{\hat{L}}_y^{(\mathrm{F})} \right\}$
- Valid vs. invalid parametrisation of Lie groups.
- Interaction parameters of a hydrofluorocarbon molecular group used in state transfer simulations, with a magnetic induction of 9.4 Tesla.
State transfer
Comparison of BFGS and Newton-Raphson I

[Figure: 1-fidelity on a log scale ($10^0$ to $10^{-1}$) against iterations (0 to 150) and against wall clock time (0 to 1000 s).]
State transfer
Comparison of BFGS and Newton-Raphson II

[Figure: 1-fidelity on a log scale ($10^0$ to $10^{-10}$) against iterations (0 to 100) and against wall clock time (0 to 2000 s).]
Closing Remarks

- Hessian calculation that scales with O(n) computations.
- Efficient directional derivative calculation with augmented exponentials.
- Better regularisation and conditioning.
- Infeasible start points?
- Different line search methods?
- Forced symmetry of the Hessian.