ss-NMR, Noise Resilience, and a Scalable Newton's Method
D.L. Goodwin, iMR-CDT 1st Year Presentation, University of Nottingham.
3rd December 2014
Optimal Control at Southampton

1. Introduction to Optimal Control: GRadient Ascent Pulse Engineering
2. Solid State Nuclear Magnetic Resonance: crystallite orientation ensemble average
3. Noise Resilience of Optimal Control
4. GRAPE: Efficient Hessian Calculations
5. Numerical Optimisation

Numerical Optimisation
Basic extrema finding
- Optimisation is the process of finding the minima or maxima (extrema) of functions.
- The Newton-Raphson method uses the gradient at a trial point to incrementally approach an extremum of an objective function:

\[ x_{k+1} = x_k - \frac{\nabla f(x_k)}{\nabla^2 f(x_k)} \]
Numerical Optimisation
Basic extrema finding
- A basic method uses a step length, $\alpha$, and a step direction, $p$, to iteratively find an extremum.
- We calculate a step direction, then decide how far to move in that direction:

\[ x_{k+1} = x_k + \alpha_k p_k \]

- For minimisation, the step direction should be a descent direction:

\[ p_k = -B_k^{-1} \nabla f_k \]

Gradient Descent: $B_k$ is the identity matrix.
Newton's Method: $B_k$ is the exact Hessian matrix $\nabla^2 f(x_k)$.
Quasi-Newton Methods: $B_k$ is an approximation to the Hessian.
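A minimal NumPy sketch of this update rule (illustrative; the function name `descent_step` and the quadratic example are my own, not from the slides):

```python
import numpy as np

def descent_step(x, grad, hess=None, alpha=1.0):
    """One iteration of x_{k+1} = x_k + alpha_k * p_k, with p_k = -B_k^{-1} grad(f).

    hess=None gives gradient descent (B_k = identity); passing the exact
    Hessian gives Newton's method.
    """
    g = grad(x)
    if hess is None:
        p = -g                              # gradient descent direction
    else:
        p = -np.linalg.solve(hess(x), g)    # Newton direction
    return x + alpha * p

# Example on the quadratic f(x, y) = x^2 + 10*y^2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])

x = descent_step(np.array([1.0, 1.0]), grad, hess)  # one Newton step reaches [0, 0]
```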
Numerical Optimisation
Comparison of Methods
Gradient Descent: cheap, scalable, slow convergence.
Newton's Method: can be expensive, good convergence.
BFGS: requires a line search, no matrix inversion, each update in $O(n^2)$.
l-BFGS: requires a line search, no matrix inversion, each update in $O(n)$.
Numerical Optimisation
Example: the Rosenbrock function
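The slides show this example as a figure; a sketch of reproducing it with SciPy (an assumption: the slides do not say which software was used), comparing a quasi-Newton and a Newton-type method:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])  # classic Rosenbrock starting point

# Quasi-Newton (BFGS) needs only the gradient
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# Newton-CG uses the exact Hessian
res_newton = minimize(rosen, x0, jac=rosen_der, hess=rosen_hess, method="Newton-CG")

print(res_bfgs.nit, res_newton.nit)  # compare iteration counts
```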
Optimal Control Theory
- The idea of optimal control is to use all controllable parts of the full Hamiltonian to steer an initial quantum state into a target state.
- A measure of "success" for this control is fidelity: the overlap between the controlled final state and the target state (1 being identical states, 0 being orthogonal states).
- The control parts of magnetic resonance Hamiltonians are rf pulses for NMR and mw pulses for ESR.
- The optimal solutions to the stated optimisation problem are a set of discrete-time control pulses.
Optimal Control Theory
- Separate the full Hamiltonian into the part we cannot control (the drift), $\hat{H}_0$, and the parts we can control, $\hat{H}^{(k)}$:

\[ \hat{H}(t) = \hat{H}_0 + \sum_k c^{(k)}(t)\,\hat{H}^{(k)} \]

- The control operators (e.g. electromagnetic fields) each have a coefficient, $c^{(k)}(t)$, dictating the controllable time evolution.
- Find $\{c^{(k)}(t)\}$ to take our system from an initial state, $|\psi_0\rangle$, to a target state, $|\sigma\rangle$.
Optimal Control Theory
- Use fidelity as a metric to measure the overlap between the target state and the final state (1 = total overlap, 0 = no overlap):

\[ J\{c^{(k)}(t)\} = \mathrm{Re}\,\langle\sigma|\,\exp_{(0)}\!\left[-i\int_0^T \left(\hat{H}_0 + \sum_k c^{(k)}(t)\,\hat{H}^{(k)}\right)\mathrm{d}t\right]|\psi_0\rangle \]

  (where $\exp_{(0)}$ denotes the time-ordered exponential).
- The problem is to find the maximum of the fidelity (or the minimum of 1 − fidelity).
GRAPE
- Assumption: the control pulse sequence is piecewise constant.
- The time-ordered exponential is now a sequential multiplication of propagators:

\[ J\{c^{(k)}(t)\} = \mathrm{Re}\,\big\langle\sigma\big|\,\hat{U}_N\hat{U}_{N-1}\cdots\hat{U}_2\hat{U}_1\,\big|\psi_0\big\rangle \]

\[ \hat{U}_n = \exp\!\left[-i\left(\hat{H}_0 + \sum_k c^{(k)}(t_n)\,\hat{H}^{(k)}\right)\Delta t\right] \]
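A minimal NumPy/SciPy sketch of this piecewise-constant fidelity (illustrative; names such as `grape_fidelity` are my own):

```python
import numpy as np
from scipy.linalg import expm

def grape_fidelity(H0, Hk, c, dt, psi0, sigma):
    """J = Re <sigma| U_N ... U_1 |psi0> for piecewise-constant controls.

    H0  : drift Hamiltonian (d x d)
    Hk  : list of control operators (each d x d)
    c   : control amplitudes, shape (number of controls, N time slices)
    """
    psi = psi0
    for n in range(c.shape[1]):
        H = H0 + sum(c[k, n] * Hk[k] for k in range(len(Hk)))
        psi = expm(-1j * H * dt) @ psi   # apply U_n
    return np.real(np.vdot(sigma, psi))
```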
Ensemble Average
Crystallite orientation, Lebedev grid, Floquet space
- Take an ensemble average over the crystallite orientations, and feed it into a numerical optimisation through GRAPE:
  - average over a Lebedev grid;
  - initialise each member of the ensemble to the same random guess;
  - calculate the Liouvillian in a Floquet basis at each of the grid points;
  - find the local cost and its gradient;
  - take the weighted average of the cost and gradient (sketched below).
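A sketch of that weighted ensemble average (the callback `local_cost_and_grad`, standing in for the Floquet-space GRAPE evaluation at one orientation, is hypothetical):

```python
import numpy as np

def ensemble_cost_and_grad(c, grid, local_cost_and_grad):
    """Weighted average of cost and gradient over crystallite orientations.

    grid : iterable of (orientation, weight) pairs, e.g. a Lebedev grid
    """
    J_avg, g_avg = 0.0, np.zeros_like(c)
    for orientation, weight in grid:
        J, g = local_cost_and_grad(c, orientation)  # hypothetical per-orientation GRAPE call
        J_avg += weight * J
        g_avg += weight * g
    return J_avg, g_avg
```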
Ensemble Average
HCF simulations
Figure: Interaction parameters (couplings of +140 Hz and −160 Hz between the H, C, and F nuclei) of a molecular group used in HCF small-system state transfer simulations, with a magnetic induction of 9.4 Tesla.
Protein backbone simulations

Figure: Interaction parameters of a typical protein backbone (¹³C chemical shifts of 30, 55, and 175 ppm; J-couplings of −11 Hz, 55 Hz, 7 Hz, and −15 Hz between the backbone C, N, O, and H nuclei) used in large-system state transfer simulations, with a magnetic induction of 9.4 Tesla.
Ensemble Average
- In real experimental apparatus the control channels will include some level of noise, as most electrical systems do.
- In simulating this noise, the solution is to create an ensemble of systems, each with its own instance of noise affecting the control channels.
- The noise is additive Gaussian, and is defined at the start of the simulation for each noise instance.
- Just as with the ssNMR optimisation, we average over the ensemble of systems; each system has the same initial random guess.
- We then optimise over the ensemble (sketched below).
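A sketch of the noisy ensemble (the `cost` callback is hypothetical; the noise level and ensemble size are placeholders):

```python
import numpy as np

def noisy_ensemble_cost(c, noise_instances, cost):
    """Average the cost over an ensemble of fixed additive-Gaussian noise
    instances on the control amplitudes."""
    return np.mean([cost(c + eta) for eta in noise_instances])

# Each noise instance is drawn once, at the start of the simulation
rng = np.random.default_rng(seed=0)
noise_instances = [rng.normal(scale=0.1, size=(2, 128)) for _ in range(16)]
```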
Ensemble Average
Figure: Convergence of the noise simulations with the number of noise instances (N = 2³, 2⁴, 2⁵, 2⁶), as a function of noise level (0.01 to 1.00).
Ensemble Average
Figure: Population of states local at each spin of HCF (upper-left), total population of each state (lower-left), total populations of correlation orders in the system (upper-right), and single-quantum and double-quantum coherence orders (lower-right), against time point.

- In a direct product basis set the full state space, $L$, is a direct sum of correlation order subspaces:

\[ L = L_0 \oplus L_1 \oplus L_2 \oplus L_3 \]

- The population of a correlation order, $p_k$, is given by the projection onto its subspace:

\[ p_k = \langle L_k | \hat{\rho} \rangle \]
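A sketch of this projection (assuming a state vector in a direct product basis with a known correlation order per basis state; reporting the squared-norm projection is my assumption):

```python
import numpy as np

def correlation_order_populations(rho, orders, max_order=3):
    """Population p_k of each correlation-order subspace L_k, as the
    squared-norm projection of the state vector onto that subspace.

    rho    : state vector in a direct product basis
    orders : integer array, orders[i] = correlation order of basis state i
    """
    orders = np.asarray(orders)
    return [np.sum(np.abs(rho[orders == k]) ** 2) for k in range(max_order + 1)]
```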
Ensemble Average
Figure: Measures of similarity between the trajectory of a noisy system and that of a similar system without noise, against time point (0 to 120).
Objective function and Gradient Calculation
- The gradient calculation required by first- and second-order methods reduces to three steps:
  1. propagate forwards from the source;
  2. propagate backwards from the target;
  3. compute the expectation of the derivative:

\[ \Big\langle \sigma \Big|\, \hat{U}_N \hat{U}_{N-1} \cdots \frac{\partial \hat{U}_{n=t}}{\partial c^{(k)}_{n=t}} \cdots \hat{U}_2 \hat{U}_1 \,\Big| \psi_0 \Big\rangle \]
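A sketch of the forward/backward scheme (illustrative; for brevity it uses the first-order approximation dU/dc ≈ −i Δt H_k U rather than the exact augmented-exponential derivative introduced on the next slide):

```python
import numpy as np
from scipy.linalg import expm

def grape_gradient(H0, Hk, c, dt, psi0, sigma):
    """Gradient of J = Re <sigma|U_N...U_1|psi0> via one forward and one
    backward propagation sweep."""
    N, K = c.shape[1], len(Hk)
    U = [expm(-1j * (H0 + sum(c[k, n] * Hk[k] for k in range(K))) * dt)
         for n in range(N)]
    fwd = [psi0]                       # fwd[n] = U_n ... U_1 |psi0>
    for n in range(N):
        fwd.append(U[n] @ fwd[-1])
    bwd = [sigma.conj()]               # bwd[j] = <sigma| U_N ... U_{j+1}
    for n in reversed(range(N)):
        bwd.insert(0, bwd[0] @ U[n])
    grad = np.zeros((K, N))
    for n in range(N):
        for k in range(K):
            dU = -1j * dt * Hk[k] @ U[n]   # approximate derivative of U_n
            grad[k, n] = np.real(bwd[n + 1] @ dU @ fwd[n])
    return grad
```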
Augmented Exponential
Gradient Calculation
- Following the work of C. Van Loan (1970s) and Sophie Schirmer, the derivative of the propagator with respect to a control pulse at a specific time point is obtained from a single augmented matrix exponential:

\[
\exp\!\begin{bmatrix} -i\hat{L}\,\Delta t & -i\hat{H}_k\,\Delta t \\ 0 & -i\hat{L}\,\Delta t \end{bmatrix}
= \begin{bmatrix} e^{-i\hat{L}\,\Delta t} & \dfrac{\partial}{\partial c^{(j)}_k} e^{-i\hat{L}\,\Delta t} \\ 0 & e^{-i\hat{L}\,\Delta t} \end{bmatrix}
\]
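A sketch of this block-matrix trick (function and variable names are my own):

```python
import numpy as np
from scipy.linalg import expm

def propagator_and_derivative(L, Hk, dt):
    """Propagator exp(-i*L*dt) and its exact derivative with respect to the
    control amplitude multiplying Hk, from one augmented exponential."""
    d = L.shape[0]
    aug = np.zeros((2 * d, 2 * d), dtype=complex)
    aug[:d, :d] = -1j * L * dt     # top-left block
    aug[:d, d:] = -1j * Hk * dt    # top-right block carries the derivative
    aug[d:, d:] = -1j * L * dt     # bottom-right block
    big = expm(aug)
    return big[:d, :d], big[:d, d:]   # (U, dU/dc)
```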
Hessian Matrix
Avoiding Singularities
- Symmetric.
- Non-singular.
- Its size is the number of time points multiplied by the number of control channels.
- Diagonally dominant.
Hessian Matrix
Calculating the Hessian elements
- Calculation of the Hessian elements requires the second-order derivatives.
- This scales as $O(n^2)$ calculations (compared with $O(n)$ for a gradient calculation).
- Second-order derivatives can be calculated with a 3 × 3 augmented exponential:

\[
\exp\!\begin{bmatrix}
-i\hat{L}\,\Delta t & -i\hat{H}^{(k_1)}_{n_1}\,\Delta t & 0 \\
0 & -i\hat{L}\,\Delta t & -i\hat{H}^{(k_2)}_{n_2}\,\Delta t \\
0 & 0 & -i\hat{L}\,\Delta t
\end{bmatrix}
=
\begin{bmatrix}
e^{-i\hat{L}\,\Delta t} & \dfrac{\partial}{\partial c^{(k_1)}_{n_1}} e^{-i\hat{L}\,\Delta t} & \dfrac{\partial^2}{\partial c^{(k_1)}_{n_1}\,\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\,\Delta t} \\
0 & e^{-i\hat{L}\,\Delta t} & \dfrac{\partial}{\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\,\Delta t} \\
0 & 0 & e^{-i\hat{L}\,\Delta t}
\end{bmatrix}
\]
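A sketch of the corresponding 3 × 3 block construction (note: depending on convention, the full mixed derivative is the sum of this block over both orderings of the two control operators; that bookkeeping is my assumption, not spelled out on the slide):

```python
import numpy as np
from scipy.linalg import expm

def second_derivative_block(L, H1, H2, dt):
    """Top-right block of the 3x3 augmented exponential, containing the
    second-order derivative information for controls H1 and H2."""
    d = L.shape[0]
    aug = np.zeros((3 * d, 3 * d), dtype=complex)
    for i in range(3):                                # diagonal blocks
        aug[i*d:(i+1)*d, i*d:(i+1)*d] = -1j * L * dt
    aug[:d, d:2*d] = -1j * H1 * dt                    # first super-diagonal
    aug[d:2*d, 2*d:] = -1j * H2 * dt
    return expm(aug)[:d, 2*d:]                        # top-right d x d block
```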
Hessian Calculation
Expectation of second order derivatives
- We can reduce the computation to scale as $O(n)$ by storing the propagators from the gradient calculation, then performing an extra set of 3 × 3 augmented exponentials for the (block-)diagonal elements. The required expectation values are:

\[ \Big\langle \sigma \Big|\, \hat{U}_N \hat{U}_{N-1} \cdots \frac{\partial^2 \hat{U}_{n=t}}{\partial \big(c^{(k)}_{n=t}\big)^2} \cdots \hat{U}_2 \hat{U}_1 \,\Big| \psi_0 \Big\rangle \]

\[ \Big\langle \sigma \Big|\, \hat{U}_N \hat{U}_{N-1} \cdots \frac{\partial \hat{U}_{n=t+1}}{\partial c^{(k)}_{n=t+1}} \frac{\partial \hat{U}_{n=t}}{\partial c^{(k)}_{n=t}} \cdots \hat{U}_2 \hat{U}_1 \,\Big| \psi_0 \Big\rangle \]

\[ \Big\langle \sigma \Big|\, \hat{U}_N \hat{U}_{N-1} \cdots \frac{\partial \hat{U}_{n=t_2}}{\partial c^{(k)}_{n=t_2}} \cdots \frac{\partial \hat{U}_{n=t_1}}{\partial c^{(k)}_{n=t_1}} \cdots \hat{U}_2 \hat{U}_1 \,\Big| \psi_0 \Big\rangle \]
Hessian Regularisation
Avoiding Singularities
- A singular matrix is one that is not invertible.
- Problems are common when the Hessian has negative eigenvalues (the Newton step is then not guaranteed to be a descent direction).
- We can regularise the Hessian matrix:
  - TRM: add a multiple of the identity to the Hessian (sketched below),
    \[ H_{\mathrm{reg}} = H + \lambda I, \quad \text{where } \lambda > \max\{0, -\min[\mathrm{eig}(H)]\} \]
  - RFO: prevents the algorithm from taking large steps.
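A sketch of the TRM shift (the small offset `delta` enforces the strict inequality and is my own choice):

```python
import numpy as np

def regularise_trm(H, delta=1e-6):
    """H_reg = H + lambda*I with lambda > max{0, -min[eig(H)]}, making the
    regularised Hessian positive definite."""
    min_eig = np.min(np.linalg.eigvalsh(H))   # H is symmetric
    lam = max(0.0, -min_eig) + delta
    return H + lam * np.eye(H.shape[0])
```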
Results
Comparing the Newton method
Figure: Convergence comparison over 250 iterations (logarithmic scale, 10⁰ down to 10⁻⁶) for steepest descent, BFGS, L-BFGS, a finite-difference Hessian, Newton (fmincon.m), and Newton-TRM.
Acknowledgements
Marina Carravetta University of Southampton
Sophie Schirmer Swansea University
Frank Langbein Cardiff University
Gavin Morley University of Warwick
Anton Tcholakov University of Warwick
Sean Wright University of Southampton
Spinach
Ilya Kuprov University of Southampton
Hannah Hogben University of Oxford
Luke Edwards University of Oxford
Matthew Krzystyniak Oxford e-Research Centre
Peter Hore University of Oxford
Gareth Charnock Oxford e-Research Centre
Dmitry Savostyanov University of Southampton
Sergey Dolgov Max-Planck Institute for Mathematics in the Sciences
Zenawi Welderufael University of Southampton