Geophysical Inverse Problems with a focus on seismic tomography CIDER2012- KITP- Santa Barbara Seismic travel time tomography Principles of travel time tomography 1) In the background, “reference” model: Travel time T along a ray g: 1 T=ò ds = g v (s) 0 ò g u (s)ds 0 v0(s) velocity at point s on the ray u= 1/v is the “slowness” The ray path g is determined by the velocity structure using Snell’s law. Ray theory. 2) Suppose the slowness u is perturbed by an amount du small enough that the ray path g is not changed. The travel time is changed by: 1 1 dv dT = ò d u ds = - ò 2 d vds = - ò ds g g v0 g v0 v0 j=M æ dv ö 1 dv dTi = - ò (s)ds = å Gij ç ÷ è v0 ø j g v0 (s) v0 j=1 where: G ij = - lij v 0j lij is the distance travelled by ray i in block j v0j is the reference velocity (“starting model”) in block j Solving the problem: “Given a set of travel time perturbations dTi on an ensemble of rays {i=1…N}, determine the perturbations (dv/v0)j in a 3D model parametrized in blocks (j=1…M}” is solving an inverse problem of the form: d d = Gd m or M d di = åGijd m j j=1 d= data vector= travel time pertubations dT m= model vector = perturbations in velocity i = 1, N d d = Gd m G has dimensions M x N or Usually N (number of rays) > M (number of blocks): “over determined system” M d di = åGijd m j i = 1, N j=1 We write: G T d d = G T Gd m GTG is a square matrix of dimensions MxM If it is invertible, we can write the solution as: dmˆ = (G G) G dd T -1 T where (GTG)-1 is the inverse of GTG In the sense that (GTG)-1(GTG) = I, I= identity matrix “least squares solution” – equivalent to minimizing ||d-Gm||2 dmˆ = (G G) G dd T -1 T “””least squares solution” Minimizes ||d-Gm||2 - G contains assumptions/choices: - Theory of wave propagation (ray theory) - Parametrization (i.e. blocks of some size) In practice, things are more complicated because GTG, in general, is singular: M d di = åGijd m j i =1, N j=1 Some Gij are null ( lij=0)-> infinite elements in the inverse matrix How to choose a solution? • Special solution that maximizes or minimizes some desireable property through a norm • For example: – Model with the smallest size (norm): mTm=||m||2=(m12+m22+m32+…mM2)1/2 – Closest possible solution to a preconceived model <m>: minimize ||m-<m>||2 regularization • Minimize some combination of the misfit and the solution size: (m) e e m m T 2 T e=d-Gm • Then the solution is the “damped least squares solution”: 1 ˆ G G I G d m T 2 T Tikhonov regularization • We can choose to minimize the model size, – eg ||m||2 =[m]T[m] - “norm damping” • Generalize to other norms. – Example: minimize roughness, i.e. difference between adjacent model parameters. – Consider ||Dm||2 instead of ||m||2 and minimize: Dm Dm mT DT Dm mTWmm T – More generally, minimize: (m m )T Wm (m m ) é-1 1 ùé m1 ù ê úê ú -1 1 m2 ú ê ú ê Dm = ê -1 1 úê ú ê úê ú -11 ë ûë mM û <m> reference model Weighted damped least squares • More generally, the solution has the form: m est m [G T WeG 2Wm ]1 G TWe [d G m ] or, m est equivalently : m W G [GW G W ] [d G m ] 1 m T 1 m T 2 1 1 e For more rigorous and complete treatment (incl. non-linear): See Tarantola (1985) Inverse problem theory Tarantola and Valette (1982) Concept of ‘Generalized Inverse’ • Generalized inverse (G-g) is the matrix in the linear inverse problem that multiplies the data to provide an estimate of the model parameters; g ˆ G d m – For Least Squares G g – For Damped Least Squares – Note : Generally G-g ≠G-1 1 G G G G T g T 1 G G I GT T 2 • How to choose so that model is not overly biased? • Leads to idea of trade-off analysis. “L curve” η m m 2 • As you increase the damping parameter , more priority is given to model-norm part of functional. – Increases Prediction Error – Decreases model structure – Model will be biased toward smooth solution Gm d 2 Model Resolution Matrix • How accurately is the value of an inversion parameter recovered? • How small of an object can be imaged ? • Model resolution matrix R: -g mˆ = G d obs -g = G Gmtrue = Rmtrue – R can be thought of as a spatial filter that is applied to the true model to produce the estimated values. • Often just main diagonal analyzed to determine how spatial resolution changes with position in the image. • Off-diagonal elements provide the ‘filter functions’ for every parameter. Masters, CIDER 2010 Checkerboard test 80% mˆ = Rmtrue After Masters, CIDER 2010 R = G -gG R contains theoretical assumptions on wave propagation, parametrization And assumes the problem is linear Ingredients of an inversion • Importance of sampling/coverage – mixture of data types • Parametrization – Physical (Vs, Vp, ρ, anisotropy, attenuation) – Geometry (local versus global functions, size of blocks) • Theory of wave propagation – e.g. for travel times: banana-donut kernels/ray theory Surface waves P S SS P, PP S, SS Arrivals well separated on the seismogram, suitable for travel time measurements Generally: -Ray theory -Iterative back projection techniques - Parametrization in blocks 50 mn P velocity tomography Slabs…… Van der Hilst et al., 1998 ...and plumes Montelli et al., 2004 P Travel Time Tomography: Ray Density maps Vasco and Johnson,1998 Checkerboard tests Karason and van der Hilst, 2000 05 Honshu Fukao and Obayashi 2011 ±1.5 % 0 6 12 410 660 410 660 1000 07 ±1.5 % 13 northern Bonin 08 14 09 11 15 15 06 11 07 12 Tonga 410 660 1000 ±1.5% 13 08 ±1.5% 09 14 15 10 Fukao and Obayashi 2011 Kermadec 400 660 1000 Fukao and Obayashi, 2011 Tonga S40RTS Ritsema et al., 2011 PRI-S05 Montelli et al., 2005 South Pacific superswell EPR Rayleigh wave overtones By including overtones, we can see into the transition zone and the top of the lower mantle. after Ritsema et al, 2004 Models from different data subsets 120 km 600 km 1600 km 2800 km After Ritsema et al., 2004 The travel time dataset in this model includes: Sdiff ScS2 Multiple ScS: ScSn Coverage of S and P After Masters, CIDER 2010 P S SS Surface waves Full Waveform Tomography Long period (30s-400s) 3- component seismic waveforms Subdivided into wavepackets and compared in time domain to synthetics. u(x,t) = G(m) du = A dm A= ∂u/∂m contains Fréchet derivatives of G UC B e r k e l e y SS Sdiff PAVA NACT Li and Romanowicz , 1995 PAVA NACT 2800 km depth from Kustowski, 2006 Waveforms only, T>32 s! 20,000 wavepackets NACT To et al, 2005 Indian Ocean Paths - Sdiffracted Corner frequencies: 2sec, 5sec, 18 sec To et al, 2005 To et al., EPSL, 2005 Full Waveform Tomography using SEM: Data Synthetics Replace mode synthetics by numerical synthetics computed using the Spectral Element Method (SEM) UC B e r k e l e y SEMum (Lekic and Romanowicz, 2011) -12% S20RTS (Ritsema et al. 2004) -7% 70 km +8% +6% -7% -6% 125 km +9% +8% -6% -4% 180 km +8% +6% -5% -3.5% 250 km +5% +3% French et al, 2012, in prep. Courtesy of Scott French Fukao and Obayashi, 2011 Tonga South Pacific superswell Macdonald Samoa SEMum2 French et al., 2012 S40RTS Ritsema et al., 2011 EPR Easter Island Summary: what’s important in global mantle tomography • Sampling: improved by inclusion of different types of data: surface waves, overtones, body waves, diffracted waves… • Theory: to constrain better amplitudes of lateral variations as well as smaller scale features (especially in low velocity regions) • Physical parametrization: effects of anisotropy!! • Geographical parametrization: local/global basis functions • Error estimation