Quadratically Converging Control Sequence Optimisation Algorithms
Newton-GRAPE; Hessian Calculations & Regularisation

David L. Goodwin & Ilya Kuprov
d.goodwin@soton.ac.uk
QUAINT@Swansea, Wednesday 1st July 2015

Two equations...

$$\frac{\partial^2 J}{\partial c_{n_2}^{(k_2)}\,\partial c_{n_1}^{(k_1)}} = \langle\sigma|\,\hat{\hat{P}}_N\hat{\hat{P}}_{N-1}\cdots\frac{\partial}{\partial c_{n_2}^{(k_2)}}\hat{\hat{P}}_{n_2}\cdots\frac{\partial}{\partial c_{n_1}^{(k_1)}}\hat{\hat{P}}_{n_1}\cdots\hat{\hat{P}}_2\hat{\hat{P}}_1\,|\psi_0\rangle \tag{1}$$

$$\exp\begin{pmatrix} -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_{n_1}^{(k_1)}\Delta t & 0 \\ 0 & -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_{n_2}^{(k_2)}\Delta t \\ 0 & 0 & -i\hat{\hat{L}}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_{n_1}^{(k_1)}}e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial^2}{\partial c_{n_1}^{(k_1)}\partial c_{n_2}^{(k_2)}}e^{-i\hat{\hat{L}}\Delta t} \\ 0 & e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_{n_2}^{(k_2)}}e^{-i\hat{\hat{L}}\Delta t} \\ 0 & 0 & e^{-i\hat{\hat{L}}\Delta t} \end{pmatrix} \tag{2}$$

Outline
01 Newton-Raphson method
02 GRAPE; gradient and Hessian
03 Van Loan's Augmented Exponentials
04 Regularisation

Abstract
We report an efficient algorithm for computing the full Hessian of the optimal control objective function in a large class of practically relevant cases, and use it to analyse the convergence properties of Newton-Raphson optimisation methods using the full Hessian matrix.

Newton-Raphson method: Taylor's Theorem
- Taylor series approximated to second order[1].
- If f is continuously differentiable, then for some t in (0, 1):
$$f(x+p) = f(x) + \nabla f(x+tp)^T p$$
- If f is twice continuously differentiable, then for some \alpha in (0, 1):
$$f(x+p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x+\alpha p)\, p$$
- 1st order necessary condition: \nabla f(x^*) = 0
- 2nd order necessary condition: \nabla^2 f(x^*) is positive semidefinite

[Figure: finding a minimum of f(x) = x^3/3 - x + 1 using tangents of the gradient f'(x) = x^2 - 1; the Newton iterates x_0, x_1, x_2 are generated using f''(x_k) = 2x_k. A numerical sketch of this one-dimensional iteration follows the GRAPE Hessian discussion below.]

[1] B. Taylor, Innys, 1717; J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 1999.

The Hessian Matrix
The Newton step:
$$p_k^N = -H_k^{-1}\nabla f_k$$
- \nabla^2 f_k = H_k is the Hessian matrix of second-order partial derivatives[2]:
$$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$
- The steepest descent method results from the same equation when H is set to the identity matrix.
- Quasi-Newton methods initialise H to the identity matrix, then approximate it with an update formula that uses the gradient history.
- The Hessian proper must be positive definite (and reasonably well conditioned) for its inverse to be useful; an indefinite Hessian produces non-descent search directions.

[2] L. O. Hesse, Chelsea, 1972.

GRAPE; gradient and Hessian

GRAPE gradient
- Split the Liouvillian into controllable and uncontrollable parts[3]:
$$\hat{\hat{L}}(t) = \hat{\hat{H}}_0 + \sum_k c^{(k)}(t)\,\hat{\hat{H}}_k$$
- Maximise the fidelity measure:
$$J = \mathrm{Re}\,\langle\hat\sigma|\exp_{(0)}\!\left[-i\int_0^T \hat{\hat{L}}(t)\,dt\right]|\hat\rho(0)\rangle$$
- Optimality conditions: \partial J/\partial c_k(t) = 0 at a minimum, and the Hessian matrix should be positive definite.
- Gradient found from forward and backward propagation:
$$\frac{\partial J}{\partial c_n^{(k)}} = \langle\sigma|\,\hat{\hat{P}}_N\hat{\hat{P}}_{N-1}\cdots\frac{\partial}{\partial c_n^{(k)}}\hat{\hat{P}}_n\cdots\hat{\hat{P}}_2\hat{\hat{P}}_1\,|\psi_0\rangle$$
- Propagator over a time slice:
$$\hat{\hat{P}}_n = \exp\!\left[-i\left(\hat{\hat{H}}_0 + \sum_k c_n^{(k)}\hat{\hat{H}}_k\right)\Delta t\right]$$

[3] N. Khaneja et al., Journal of Magnetic Resonance 172.2 (2005), pp. 296-305.

GRAPE Hessian
- (Block) diagonal elements:
$$\frac{\partial^2 J}{\partial (c_n^{(k)})^2} = \langle\sigma|\,\hat{\hat{P}}_N\hat{\hat{P}}_{N-1}\cdots\frac{\partial^2}{\partial (c_n^{(k)})^2}\hat{\hat{P}}_n\cdots\hat{\hat{P}}_2\hat{\hat{P}}_1\,|\psi_0\rangle$$
- Off-diagonal elements:
$$\frac{\partial^2 J}{\partial c_{n_2}^{(k)}\,\partial c_{n_1}^{(k)}} = \langle\sigma|\,\hat{\hat{P}}_N\hat{\hat{P}}_{N-1}\cdots\frac{\partial}{\partial c_{n_2}^{(k)}}\hat{\hat{P}}_{n_2}\cdots\frac{\partial}{\partial c_{n_1}^{(k)}}\hat{\hat{P}}_{n_1}\cdots\hat{\hat{P}}_2\hat{\hat{P}}_1\,|\psi_0\rangle$$
- All propagator derivatives entering the off-diagonal blocks have already been calculated during the gradient calculation and can be reused; only the diagonal blocks remain to be computed.
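Returning to the Newton-Raphson slides: to make the tangent construction in the figure concrete, here is a minimal Python sketch (not part of the talk; the function and derivatives are those printed on the slide) that minimises f(x) = x^3/3 - x + 1 by driving the gradient f'(x) = x^2 - 1 to zero. In n dimensions the same step reads p_k^N = -H_k^{-1} \nabla f_k.

```python
# Minimal sketch: Newton-Raphson minimisation of f(x) = x**3/3 - x + 1
# by applying tangent steps to the gradient f'(x) = x**2 - 1, as in the
# figure above. In n dimensions the step is p = -H^{-1} grad(f).

def newton_minimise(x0, n_iter=8):
    x = x0
    for _ in range(n_iter):
        g = x**2 - 1.0        # gradient  f'(x)
        h = 2.0 * x           # Hessian   f''(x_k) = 2*x_k; must stay positive
        x = x - g / h         # Newton step x_{k+1} = x_k - f'(x_k)/f''(x_k)
    return x

print(newton_minimise(2.0))   # converges quadratically to the minimum x* = 1
```

Started near x = -1, where f''(x) < 0, the same iteration converges to a maximum rather than a minimum: this is exactly the non-descent pathology of an indefinite Hessian that the regularisation section of the talk addresses.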
Van Loan's Augmented Exponentials

Efficient Gradient Calculation: Augmented Exponentials
- Among the many complicated functions encountered in the magnetic resonance simulation context, chained exponential integrals involving square matrices A_k and B_k occur particularly often:
$$\int_0^t \! dt_1 \int_0^{t_1}\! dt_2 \cdots \int_0^{t_{n-2}}\! dt_{n-1}\, \left\{ e^{A_1(t-t_1)} B_1\, e^{A_2(t_1-t_2)} B_2 \cdots B_{n-1}\, e^{A_n t_{n-1}} \right\}$$
- A method for computing integrals of this general type was proposed by Van Loan in 1978[4] (pointed out to us by Sophie Schirmer[5]).
- Using this augmented exponential technique, we can write an upper-triangular block matrix exponential as
$$\exp\!\left[\begin{pmatrix} A & B \\ 0 & A \end{pmatrix} t\right] = \begin{pmatrix} e^{At} & \int_0^t e^{A(t-s)}\, B\, e^{As}\, ds \\ 0 & e^{At} \end{pmatrix}, \qquad \exp\begin{pmatrix} A & B \\ 0 & A \end{pmatrix} = \begin{pmatrix} e^{A} & \int_0^1 e^{A(1-s)}\, B\, e^{As}\, ds \\ 0 & e^{A} \end{pmatrix}$$

[4] C. F. Van Loan, IEEE Transactions on Automatic Control 23.3 (1978), pp. 395-404.
[5] F. F. Floether, P. de Fouquieres and S. G. Schirmer, New Journal of Physics 14.7 (2012), p. 073023.

Efficient Gradient Calculation: Augmented Exponentials
- Find the derivative of the propagator with respect to the control pulse at a specific time point.
- Set A = -i\hat{\hat{L}}\Delta t and B = -i\hat{\hat{H}}_n^{(k)}\Delta t, so that
$$\int_0^1 e^{A(1-s)}\, B\, e^{As}\, ds = D_{c_n^{(k)}} \exp\!\left(-i\hat{\hat{L}}\Delta t\right)$$
- leading to an efficient calculation of the gradient element:
$$\exp\begin{pmatrix} -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}_n^{(k)}\Delta t \\ 0 & -i\hat{\hat{L}}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{\hat{L}}\Delta t} & \frac{\partial}{\partial c_n^{(k)}}\, e^{-i\hat{\hat{L}}\Delta t} \\ 0 & e^{-i\hat{\hat{L}}\Delta t} \end{pmatrix}$$

Efficient Hessian Calculation: Augmented Exponentials
- Second-order derivatives can be calculated with a 3 x 3 augmented exponential[6].
- Set A = -i\hat{\hat{L}}\Delta t, B_{n_1} = -i\hat{\hat{H}}_{n_1}^{(k_1)}\Delta t and B_{n_2} = -i\hat{\hat{H}}_{n_2}^{(k_2)}\Delta t, so that
$$\int_0^1\!\!\int_0^s e^{A(1-s)}\, B_{n_1}\, e^{A(s-r)}\, B_{n_2}\, e^{Ar}\, dr\, ds = D^2_{c_{n_1} c_{n_2}} \exp\!\left(-i\hat{\hat{L}}\Delta t\right)$$
- giving the efficient Hessian element calculation of Equation (2) above. (A numerical sketch of both augmented-exponential constructions follows the regularisation discussion below.)

[6] T. F. Havel, I. Najfeld and J. X. Yang, Proceedings of the National Academy of Sciences 91.17 (1994), pp. 7962-7966; I. Najfeld and T. Havel, Advances in Applied Mathematics 16.3 (1995), pp. 321-375.

Overtone excitation: Simulation
- Excite 14N from the state T_{1,0} to T_{2,2}.
- Solid-state powder average, with the objective functional weighted over the crystalline orientations (rank-17 Lebedev grid, 110 points).
- Nuclear quadrupolar interaction.
- 400 time points for a total pulse duration of 40 μs.

Overtone excitation: Comparison of BFGS and Newton-Raphson
[Figure: 1-fidelity against iterates (0-100) for BFGS and Newton-Raphson.]

Regularisation

Problems: Singularities everywhere!
- BFGS (using the DFP formula) is guaranteed to produce a positive definite Hessian update.
- The Newton-Raphson method carries no such guarantee:
$$p_k^N = -H_k^{-1}\nabla f_k$$
- Properties of the Hessian matrix:
  1. It must be symmetric, \frac{\partial^2}{\partial c^{(i)}\partial c^{(j)}} = \frac{\partial^2}{\partial c^{(j)}\partial c^{(i)}}, which is not automatic in the numerical calculation unless the control operators commute.
  2. It must be sufficiently positive definite: non-singular and invertible.
  3. The Hessian is diagonally dominant.

Avoiding Singularities
- Negative eigenvalues are common; regularise the Hessian to make it positive definite[7]. (A sketch follows at the end of this section.)
- Check the eigenvalues by performing an eigendecomposition of the Hessian matrix: H = Q\Lambda Q^{-1}
- Add the magnitude of the smallest negative eigenvalue to all eigenvalues, then re-form the Hessian with the original eigenvectors:
$$\lambda_{\min} = \max\left[0, -\min(\Lambda)\right], \qquad H_{\mathrm{reg}} = Q(\Lambda + \lambda_{\min}\hat{1})Q^{-1}$$

TRM
Introduce a constant \delta, the radius of a region we trust to give a sufficiently positive definite Hessian:
$$H_{\mathrm{reg}} = Q(\Lambda + \delta\lambda_{\min}\hat{1})Q^{-1}$$
However, if \delta is too small, the Hessian will become ill-conditioned.
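Returning to the augmented exponentials: the 2 x 2 and 3 x 3 block constructions translate directly into code. Below is a minimal numpy/scipy sketch (not from the talk; the dimensions, time step, and random Hermitian stand-ins for the Liouvillian and control Hamiltonians are arbitrary choices) that reads the propagator derivatives out of the corner blocks of a single expm call and checks the first derivative against a finite difference.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
dim, dt = 8, 0.1   # arbitrary toy dimensions (assumption)

def rand_herm(n):
    """Random Hermitian matrix standing in for L or H_k (assumption)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

L, Hk1, Hk2 = rand_herm(dim), rand_herm(dim), rand_herm(dim)
A, B1, B2 = -1j * L * dt, -1j * Hk1 * dt, -1j * Hk2 * dt
Z = np.zeros((dim, dim), dtype=complex)

# 2x2 augmented exponential: the top-right block is the directional
# derivative of P = expm(A) along B1, i.e. dP/dc for the control on Hk1.
dP = expm(np.block([[A, B1], [Z, A]]))[:dim, dim:]

# 3x3 augmented exponential: the top-right block is the chained integral
# int_0^1 int_0^s e^{A(1-s)} B1 e^{A(s-r)} B2 e^{Ar} dr ds of eq. (2).
# For the mixed second derivative with non-commuting controls, the two
# orderings (B1, B2) and (B2, B1) must be summed ("forced symmetry").
d2P = expm(np.block([[A, B1, Z], [Z, A, B2], [Z, Z, A]]))[:dim, 2*dim:]

# Finite-difference check of the first derivative
eps = 1e-6
fd = (expm(A + eps * B1) - expm(A - eps * B1)) / (2 * eps)
print(np.allclose(dP, fd, atol=1e-8))   # expected: True
```

The one-expm-per-derivative structure is what gives the O(n) Hessian cost mentioned in the closing remarks: the expensive matrix exponentials are shared between the gradient and the off-diagonal Hessian blocks.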
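The eigenvalue-shift and TRM formulas above amount to a few lines of linear algebra. Here is a minimal sketch (hypothetical helper name, numpy assumed); since the Hessian is symmetric, or forced to be, H = Q\Lambda Q^{-1} reduces to an orthogonal eigendecomposition.

```python
import numpy as np

def regularise_hessian(H, delta=1.1):
    """Eigenvalue-shift regularisation from the slides (sketch):
    H_reg = Q (Lambda + delta * lambda_min * I) Q^{-1}, with
    lambda_min = max(0, -min(Lambda)).  delta = 1 merely lifts the most
    negative eigenvalue to zero; delta > 1 (the TRM constant) makes the
    result strictly positive definite, while a delta too close to 1
    leaves it ill-conditioned."""
    H = (H + H.T) / 2                    # force symmetry
    evals, Q = np.linalg.eigh(H)         # H = Q diag(evals) Q^T
    lam_min = max(0.0, -evals.min())
    return Q @ np.diag(evals + delta * lam_min) @ Q.T

# Usage: an indefinite toy Hessian
H = np.array([[2.0, 0.0], [0.0, -1.0]])
print(np.linalg.eigvalsh(regularise_hessian(H)))  # [0.1, 3.1] -- positive
```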
RFO
The method proceeds by constructing an augmented Hessian matrix:
$$H_{\mathrm{aug}} = \begin{pmatrix} \delta^2 H & \delta\vec{g} \\ \delta\vec{g}^{\,T} & 0 \end{pmatrix} = Q\Lambda Q^{-1}$$
(A numerical sketch of the resulting step appears after the closing remarks.)

[7] X. P. Resina, PhD thesis, Universitat Autònoma de Barcelona, 2004.

State transfer
[Figure: hydrofluorocarbon molecular fragment (1H, 13C, 19F spins) with its scalar couplings, used in the state transfer simulations.]
- Controls:
$$\left\{\hat{\hat{L}}_x^{(H)},\, \hat{\hat{L}}_y^{(H)},\, \hat{\hat{L}}_x^{(C)},\, \hat{\hat{L}}_y^{(C)},\, \hat{\hat{L}}_x^{(F)},\, \hat{\hat{L}}_y^{(F)}\right\}$$
- Valid vs. invalid parametrisation of Lie groups.
- Interaction parameters of a hydrofluorocarbon molecular group used in the state transfer simulations, with a magnetic induction of 9.4 Tesla.

State transfer: Comparison of BFGS and Newton-Raphson I
[Figure: 1-fidelity against iterations (0-150) and against wall clock time (0-1000 s).]

State transfer: Comparison of BFGS and Newton-Raphson II
[Figure: 1-fidelity on a logarithmic scale (10^0 down to 10^-10) against iterations (0-100) and against wall clock time (0-2000 s).]

Closing Remarks
- A Hessian calculation that scales with O(n) computations.
- Efficient directional derivative calculation with augmented exponentials.
- Better regularisation and conditioning.
- Infeasible start points?
- Different line search methods?
- Forced symmetry of the Hessian.
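As a final illustration of the RFO construction above, here is a minimal sketch (one common RFO convention; the eigenvector-rescaling step is an assumption, as the slide only shows the augmented matrix) that extracts a step from the lowest eigenpair of H_aug. The resulting step solves (\delta^2 H - \lambda I)p = -\vec{g} with a level shift \lambda \le 0, so it is a descent direction even for an indefinite Hessian.

```python
import numpy as np

def rfo_step(H, g, delta=1.0):
    """Rational function optimisation step (sketch, assumed convention).
    Builds H_aug = [[delta^2 H, delta g], [delta g^T, 0]] as on the slide,
    takes the eigenvector (v, w) of its lowest eigenvalue lambda, and
    returns p = v / (delta * w), which solves (delta^2 H - lambda I) p = -g.
    Since lambda <= 0 whenever g != 0, the shift makes p a descent
    direction even when H itself is indefinite."""
    n = len(g)
    H_aug = np.zeros((n + 1, n + 1))
    H_aug[:n, :n] = delta**2 * H
    H_aug[:n, n] = H_aug[n, :n] = delta * g
    evals, Q = np.linalg.eigh(H_aug)     # symmetric eigendecomposition
    v, w = Q[:n, 0], Q[n, 0]             # eigenvector of lowest eigenvalue
    return v / (delta * w)

# Usage on an indefinite toy Hessian: the step still points downhill
H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 1.0])
p = rfo_step(H, g)
print(p, g @ p < 0)                      # gradient-step inner product < 0
```

Unlike the plain eigenvalue shift, the level shift here is chosen implicitly by the eigenproblem itself, which is why RFO needs no hand-tuned \delta to stay positive definite.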