Fast Algorithms for the Computation of Oscillatory Integrals

Emmanuel Candès
California Institute of Technology

EPSRC Symposium Capstone Conference
Warwick Mathematics Institute, July 2009

Collaborators: Lexing Ying, Laurent Demanet


Our problem

Evaluate numerically a Fourier integral operator (FIO)

$$(Tf)(x) = \int_{\mathbb{R}^d} a(x,k)\, e^{2\pi i \Phi(x,k)}\, \hat f(k)\, dk$$

at points $x$ given on a Cartesian grid.

- $k \in \mathbb{R}^d$: frequency variable, with $\hat f(k) = \int_{\mathbb{R}^d} e^{-2\pi i x\cdot k} f(x)\, dx$
- $a(x,k)$: (smooth) amplitude
- $\Phi(x,k)$: (smooth) phase function, homogeneous in $k$ and therefore as large as $|k|$:
  $\Phi(x,\lambda k) = \lambda\,\Phi(x,k)$ for $\lambda > 0$, e.g. $\Phi(x,k) = g(x)|k|$


A motivating example: wave propagation

$$\frac{\partial^2 u}{\partial t^2}(x,t) = c^2 \Delta u(x,t), \qquad u(x,0) = u_0(x), \qquad \frac{\partial u}{\partial t}(x,0) = 0$$

Solution operator is

$$u(x,t) = \frac{1}{2}\left( \int_{\mathbb{R}^2} e^{2\pi i (x\cdot k + c|k|t)}\, \hat u_0(k)\, dk + \int_{\mathbb{R}^2} e^{2\pi i (x\cdot k - c|k|t)}\, \hat u_0(k)\, dk \right)$$

- Two FIOs with phase functions $\Phi_\pm(x,k) = x\cdot k \pm c|k|t$
- Inhomogeneous medium $c(x)$ → solution operator = sum of two FIOs (small times)


Importance of FIOs

- Arise in many (inverse) problems
- Applying FIOs is often the computational bottleneck


Example in seismics

Marine survey

Kirchhoff migration

- Wave measurements $f_s(t, x_r)$ parametrized by
  - time $t$
  - receiver location $x_r$
  - source coordinate $x_s$
- Forward map: $F\,\delta c = \delta p$
- Imaging operator is $F^*$: FIO under general assumptions
- Approximations by generalized Radon transform (GRT): integration of $f_s$ over a fixed set of curves parametrized by travel times

$$g_s(x) = \int \delta\big(t - \tau(x,x_r) - \tau(x,x_s)\big)\, f_s(t,x_r)\, dt\, dx_r$$

- Followed by stack operation over the $s$ index


Other examples

- Transmission electron microscopy
- Radar imaging
- Ultrasound imaging


Discrete computational problem

Discrete grids

$$X = \{(i_1/N, i_2/N) : 0 \le i_1, i_2 < N\} \subset [0,1]^2$$
$$\Omega = \{(k_1/N, k_2/N) : -N/2 \le k_1, k_2 < N/2\} \subset [-1/2, 1/2]^2$$

Given input $\{f(k)\}_{k\in\Omega}$, evaluate

$$(Tf)(x) := \frac{1}{N} \sum_{k\in\Omega} a(x,k)\, e^{2\pi i N \Phi(x,k)}\, f(k)$$

at all $x \in X$, with $\Phi$ smooth and homogeneous in $k$.


Peek at the results

$$(Tf)(x) := \frac{1}{N} \sum_{k\in\Omega} a(x,k)\, e^{2\pi i N \Phi(x,k)}\, f(k), \qquad x \in X$$

- Kernel is not analytic and is highly oscillatory
- Naive evaluation: $O(N^4)$ (for $O(N^2)$ inputs/outputs)
- Algorithm for fast summation: $O(N^{2.5}\log N)$ (C., Demanet and Ying, '06)

Today: novel algorithm with optimal complexity for accurate summation

- $O(N^2 \log N)$ flops
- $O(N^2)$ storage


Agenda

- The butterfly structure
- Fast butterfly algorithm for the evaluation of FIOs


The Butterfly Structure


Butterfly algorithm

General algorithmic structure for evaluating certain types of integrals

$$u_i = \sum_j K(x_i, p_j)\, f_j$$

- Introduced by Michielssen and Boag ('96)
- Generalized by O'Neil and Rokhlin ('07)

Example: $K(x,p) = e^{2\pi i N x p}$

- $\{x_i\}$: $N$ points in $[0,1]$
- $\{p_j\}$: $N$ points in $[0,1]$
- $\{f_j\}$: sources at $\{p_j\}$

Applications

- FFT
- nonuniform FFTs
- many others

Kernel is dense and oscillatory.


Low-rank approximation ($K(x,p) = \exp(2\pi i N x p)$)

- $A$: interval in $x$
- $B$: interval in $p$
- obeying $\mathrm{length}(A) \times \mathrm{length}(B) \le 1/N$

The submatrix $\{K(x_i,p_j) : x_i \in A,\ p_j \in B\}$ has approximately low rank:

$$\Big| K(x,p) - \sum_{t=1}^{r} a_t(x)\, b_t(p) \Big| \le \epsilon \qquad \text{with } r = O(\log(1/\epsilon))$$

Example

- $10^6 \times 10^6$ DFT
- Top left $10^3 \times 10^3$ block

[Figure: top singular values of the block, log scale from $10^0$ down to $10^{-20}$, indices 0 to 100]


Interpolative decompositions

O'Neil and Rokhlin suggest using interpolative decompositions

$$a_t(x) := K(x, p_t), \qquad p_t \in B$$

- Rank-revealing QR decomposition: Gu and Eisenstat ('96)
- Interpolative decomp.: Cheng, Gimbutas, Martinsson and Rokhlin ('05)

Interpolative representation

$$K(x,p) \approx \sum_{t=1}^{r} K(x, p_t)\, b_t(p)$$

Cost for an $m \times n$ matrix is $O(mn^2)$.


Multiscale decompositions

- Compute low-rank approximations of all submatrices obeying $\mathrm{length}(A) \times \mathrm{length}(B) = 1/N$
- Use two-scale relations for efficiency


Definition of partial sums and equivalent sources

Partial sums

$$u^B(x) = \sum_{p_j \in B} K(x, p_j)\, f_j$$

Approximation for $x \in A$ and $\mathrm{length}(A) \times \mathrm{length}(B) \le 1/N$

$$u^B(x) \approx \sum_{t=1}^{r} K(x, p_t^{AB}) \sum_{p_j \in B} b_t^{AB}(p_j)\, f_j, \qquad x \in A$$

Equivalent sources for $(A,B)$: $\{f_t^{AB}\}_{1\le t\le r}$

$$f_t^{AB} = \sum_{p_j \in B} b_t^{AB}(p_j)\, f_j$$

Compact representation of $K^{AB} : B \to A$

$$u^B(x) \approx \sum_{t=1}^{r} K(x, p_t^{AB})\, f_t^{AB}$$


Butterfly structure

Recursive computation of $\{p_t^{AB}\}$, $\{f_t^{AB}\}$ for $\mathrm{length}(A) \times \mathrm{length}(B) = 1/N$


Initialization

For all $(A,B)$ with $\ell(A) = 1$ and $\ell(B) = 1/N$, construct $\{p_t^{AB}\}_{1\le t\le r}$ and $\{f_t^{AB}\}_{1\le t\le r}$

$$f_t^{AB} = \sum_{p_j \in B} b_t^{AB}(p_j)\, f_j$$

- cost of constructing $\{p_t^{AB}\}$ is $O(r^2 N)$/pair
- cost of constructing $\{f_t^{AB}\}$ is $O(r^2)$/pair
- ⇒ total cost is $O(r^2 N^2)$

Interpretation: $\{p_t^{AB}\}$ selected columns, $\{f_t^{AB}\}$ column weights


Recursion: "merge, split, compress"

For all $(A,B)$ with $\ell(A) = 1/2$ and $\ell(B) = 2/N$, get $\{p_t^{AB}\}_{1\le t\le r}$ and $\{f_t^{AB}\}_{1\le t\le r}$.
Here $A_p$ denotes the parent of $A$ and $B_1, B_2$ the children of $B$.

$$u^B(x) \approx \sum_{i=1}^{2} \sum_{t=1}^{r} K(x, p_t^{A_p B_i})\, f_t^{A_p B_i}, \qquad x \in A \subset A_p$$

- Can reduce the rank of $K(x, p_t^{A_p B_i})$, $x \in A$: $\{p_t^{A_p B_i}\}_{t,i} \to K(x, p_t^{AB})$
- Treat $\{f_t^{A_p B_i}\}_{t,i}$ as sources at $\{p_t^{A_p B_i}\}_{t,i} \subset B$

$$f_t^{AB} = \sum_{i=1}^{2} \sum_{s=1}^{r} b_t^{AB}(p_s^{A_p B_i})\, f_s^{A_p B_i}$$

- cost of constructing $\{p_t^{AB}\}$ is $O(r^2 N)$/pair
- cost of constructing $\{f_t^{AB}\}$ is $O(r^2)$/pair
- ⇒ total cost is $O(r^2 N^2)$


Schematic representation

$$u^{B_i}(x) = \sum_{t=1}^{r} K(x, p_t^{A_p B_i})\, f_t^{A_p B_i} \qquad\longrightarrow\qquad u^B(x) = \sum_{t=1}^{r} K(x, p_t^{AB})\, f_t^{AB}$$

Next step ... until $\ell(B) = 1$ and $\ell(A) = 1/N$


Termination

In the end, $\ell(A) = 1/N$ and $\ell(B) = 1$, and

$$u^B(x) \approx \sum_{t=1}^{r} K(x, p_t^{AB})\, f_t^{AB}$$

- cost of evaluating $u^B(x)$ is $O(r)$/pair
- ⇒ total cost is $O(rN)$


Multiscale recursion


Summary: complexity analysis

If low-rank expansions available

- $O(r^2 N \log N)$ evaluation time
- $O(r^2 N \log N)$ storage

If not

- $O(r^2 N^2)$ evaluation time
- $O(r^2 N \log N)$ storage

Powerful architecture but

- Precomputation time may be prohibitive
- Storage may be prohibitive

Our contribution: evaluation time is $O(r^2 N \log N)$ and storage complexity is $O(r^2 N)$


Fast Evaluation of FIOs


Recall oscillatory integral

$$u(x) = \sum_{k\in\Omega} e^{2\pi i N \Phi(x,k)}\, f_k$$

[Figure: the grids $x \in X$ and $k \in \Omega$, each drawn on the unit square]


Polar coordinates for frequency variable k

Phase may be singular at $k = 0$ because of homogeneity (e.g. $\Phi(x,k) = |k|$)

Polar coordinates $p = (p_1, p_2)$

$$k_1 = p_1 \cos(2\pi p_2), \qquad k_2 = p_1 \sin(2\pi p_2)$$

[Figure: panels "Cartesian" and "Polar"]

Slight abuse of notations

$$u(x) = \sum_{p\in\Omega} e^{2\pi i N \Psi(x,p)}\, f_p$$

Partitioning in $k$ ⇒ smooth phase $\Psi$


Hierarchical structure

$$\ell(A) \times \ell(B) = 1/N$$


Low-rank interactions

If $\ell(A)\,\ell(B) \le 1/N$, the kernel $e^{2\pi i N \Psi(x,p)}$, $x \in A$, $p \in B$, has approximately low rank.

Assume wlog $A$ and $B$ centered at 0.

Residual phase in 1D (for simplicity)

$$R^{AB}(x,p) = \Psi(x,p) - \Psi(0,p) - \Psi(x,0) + \Psi(0,0) = \partial_x \partial_p \Psi(x^*, p^*)\, x p = O(1/N)$$

- Same calculation in higher dimensions
- Shows that $e^{2\pi i N R^{AB}(x,p)}$ has approximately low rank

Factorization

$$e^{2\pi i N \Psi(x,p)} = e^{-2\pi i N \Psi(0,0)}\, e^{2\pi i N \Psi(x,0)} \left[ e^{2\pi i N R^{AB}(x,p)} \right] e^{2\pi i N \Psi(0,p)}$$


Oscillatory Chebyshev interpolation

Chebyshev interpolation of $e^{2\pi i N R^{AB}(x,p)}$ in

- $x$ when $\ell(A) \le 1/\sqrt{N}$
- $p$ when $\ell(B) \le 1/\sqrt{N}$

E.g. interpolation in $p$

- $\{p_t^B\}$: tensor-Chebyshev grid in $B$
- $L_t^B(p)$: Lagrange interpolant with inputs at $\{p_t^B\}$, $L_t^B(p_{t'}^B) = \delta(t = t')$

With grid of logarithmic size in $1/\epsilon$

$$\Big| e^{2\pi i N R^{AB}(x,p)} - \sum_t e^{2\pi i N R^{AB}(x, p_t^B)}\, L_t^B(p) \Big| \le \epsilon \qquad \text{on } A \times B$$

When interpolating in $x$, the low-rank approximation is

$$\sum_t e^{2\pi i N R^{AB}(x_t^A, p)}\, L_t^A(x)$$


Demodulation/Interpolation/Remodulation

From $R^{AB}(x,p) = \Psi(x,p) - \Psi(0,p) - \Psi(x,0) + \Psi(0,0)$

$$\Big| e^{2\pi i N \Psi(x,p)} - \sum_t \underbrace{e^{2\pi i N \Psi(x, p_t^B)}}_{a_t(x)}\ \underbrace{e^{-2\pi i N \Psi(0, p_t^B)}\, L_t^B(p)\, e^{2\pi i N \Psi(0,p)}}_{b_t(p)} \Big| \le \epsilon$$

Similar structure (with different interpretation) when interpolating kernel in $x$


Overall structure and two-scale relation

1. Initialize equivalent sources
2. Propagate equivalent sources (interpolation in $p$) until mid-level
3. Switch representation at mid-level
4. Propagate equivalent sources (interpolation in $x$)
5. Terminate by evaluating output

Step 2: propagation of equivalent sources

$$f_t^{AB} = \sum_{c,t'} b_t^B(p_{t'}^{B_c})\, f_{t'}^{A_p B_c} = e^{-2\pi i N \Psi(x_0(A),\, p_t^B)} \sum_c \sum_{t'} L_t^B(p_{t'}^{B_c})\, e^{2\pi i N \Psi(x_0(A),\, p_{t'}^{B_c})}\, f_{t'}^{A_p B_c}$$

Step 4: similar


Finer points and summary

- $\{p_t^B\}$ and polynomials $L_t(p)$ are computed all at once
- Only need to store equivalent sources at pairs of consecutive scales
- Separation rank higher than that of the interpolative decomposition
- Complexity is $O(\mathrm{polylog}(1/\epsilon)\, N^2 \log N)$
- Storage is $O(\mathrm{polylog}(1/\epsilon)\, N^2)$ for $\epsilon$-accurate computation
- Easy extensions to varying amplitudes → $a(x,k)\, e^{2\pi i N \Phi(x,k)}$


Numerical results

Generalized Radon transform integrating $f$ along ellipses centered at $x$ and with axes of length $c_1(x)$ and $c_2(x)$ is the sum of two FIOs with phases

$$\Phi_\pm(x,k) = x\cdot k \pm \sqrt{c_1^2(x)\, k_1^2 + c_2^2(x)\, k_2^2}$$

First example (constant amplitude)

$$u(x) = \sum_{k\in\Omega} e^{2\pi i N \Phi_+(x,k)}\, \hat f(k)$$

with

$$c_1(x) = \tfrac{1}{3}\big(2 + \sin(2\pi x_1)\sin(2\pi x_2)\big), \qquad c_2(x) = \tfrac{1}{3}\big(2 + \cos(2\pi x_1)\cos(2\pi x_2)\big)$$

(N, Chby. grid)   Alg. Time   Dir. Time   Speedup    Error
(1024, 5)         1.48e+3     9.44e+4     6.37e+1    1.26e-2
(2048, 5)         6.57e+3     1.53e+6     2.32e+2    1.75e-2
(4096, 5)         3.13e+4     2.43e+7     7.74e+2    1.75e-2
(1024, 7)         2.76e+3     9.48e+4     3.44e+1    6.45e-4
(2048, 7)         1.23e+4     1.46e+6     1.19e+2    8.39e-4
(4096, 7)         5.80e+4     2.31e+7     3.99e+2    8.18e-4
(1024, 9)         4.95e+3     9.44e+4     1.91e+1    3.45e-5
(2048, 9)         2.21e+4     1.48e+6     6.71e+1    4.01e-5
(4096, 9)         1.02e+5     2.23e+7     2.18e+2    4.21e-5
(1024, 11)        8.33e+3     9.50e+4     1.14e+1    5.23e-7
(2048, 11)        3.48e+4     1.49e+6     4.27e+1    5.26e-7

Conclusion: scales like $O(\log(1/\epsilon)) \times O(N^2 \log N)$


Numerical results II

Second example (variable amplitude): exact integration

$$u(x) = \sum_{k\in\Omega} a_+(x,k)\, e^{2\pi i N \Phi_+(x,k)}\, \hat f(k) + \sum_{k\in\Omega} a_-(x,k)\, e^{2\pi i N \Phi_-(x,k)}\, \hat f(k)$$

$$a_\pm(x,k) = \big(J_0(2\pi c(x)|k|) \pm i\, Y_0(2\pi c(x)|k|)\big)\, e^{\mp 2\pi i c(x)|k|}$$

$$\Phi_\pm(x,k) = x\cdot k \pm c(x)|k|$$

$J_0$ and $Y_0$ are Bessel functions, and the spheres' radii are

$$c(x) = \tfrac{1}{4}\big(3 + \sin(2\pi x_1)\sin(2\pi x_2)\big)$$

(N, Chby. grid)   Alg. Time   Dir. Time   Speedup    Error
(256, 5)          1.39e+2     3.20e+3     2.31e+1    1.48e-2
(512, 5)          7.25e+2     5.20e+4     7.17e+1    1.62e-2
(1024, 5)         3.45e+3     8.34e+5     2.42e+2    1.90e-2
(256, 7)          2.69e+2     3.21e+3     1.19e+1    4.71e-4
(512, 7)          1.38e+3     5.20e+4     3.78e+1    7.30e-4
(1024, 7)         6.43e+3     8.35e+5     1.30e+2    6.35e-4
(256, 9)          5.23e+2     3.20e+3     6.12e+0    1.59e-5
(512, 9)          2.49e+3     5.17e+4     2.08e+1    2.97e-5
(1024, 9)         1.15e+4     8.32e+5     7.25e+1    1.75e-5
(256, 11)         1.04e+3     3.18e+3     3.06e+0    8.03e-7
(512, 11)         4.10e+3     5.11e+4     1.24e+1    9.38e-7
(1024, 11)        1.84e+4     8.38e+5     4.57e+1    8.01e-7

Conclusion: scales like $O(\log(1/\epsilon)) \times O(N^2 \log N)$


Summary

Accurate and near-optimal numerical evaluation of FIOs

Operating characteristics

- Butterfly structure
- Residue phase
- Oscillatory Chebyshev interpolation

Many applications/extensions

- Other types of oscillatory kernels $K(x,p)$
- Other types of problems: e.g. sparse Fourier transform (Ying, '08)

Reference: E. J. Candès, L. Demanet and L. Ying (2008). "A Fast Butterfly Algorithm for the Computation of Fourier Integral Operators," to appear in Mult. Model. Sim
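
Numerical check of the low-rank property

The low-rank claim for the kernel K(x,p) = exp(2πiNxp) on blocks with length(A) × length(B) ≤ 1/N can be checked numerically on a small case. The sketch below is only an illustration, not code from the talk: the helper name block_numerical_rank, the sizes N = 4096, 64 and 256, and the tolerance 1e-10 are all assumptions chosen to keep the SVD cheap. It counts singular values above a relative threshold for a block that satisfies the condition and for a larger block that violates it.

```python
import numpy as np


def block_numerical_rank(N, n_x, n_p, eps=1e-10):
    """Numerical rank of the top-left block of K(x, p) = exp(2*pi*i*N*x*p).

    The block uses x_i = i/N for i < n_x and p_j = j/N for j < n_p, so that
    length(A) is about n_x/N and length(B) is about n_p/N.
    (Hypothetical helper written for this illustration.)
    """
    x = np.arange(n_x) / N
    p = np.arange(n_p) / N
    K = np.exp(2j * np.pi * N * np.outer(x, p))
    # Singular values are returned in descending order.
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > eps * s[0]))


N = 4096

# length(A) * length(B) = (64/4096)^2 = 1/N: the condition holds.
rank_small = block_numerical_rank(N, 64, 64)

# length(A) * length(B) = (256/4096)^2 = 16/N: the condition is violated.
rank_large = block_numerical_rank(N, 256, 256)

print("64 x 64 block, numerical rank:", rank_small)
print("256 x 256 block, numerical rank:", rank_large)
```

Under these assumptions the 64 × 64 block should have a numerical rank well below its dimension, while the rank of the 256 × 256 block grows with length(A) × length(B) × N. That growth is the reason the butterfly hierarchy halves one interval whenever it doubles the other.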