Singular value decomposition

If only the first p singular values are nonzero we write

    G = [U_p | U_o] \begin{bmatrix} S_p & 0 \\ 0 & 0 \end{bmatrix} [V_p | V_o]^T

where U_p is the first p columns of U, U_o is the last N - p columns of U, V_p is the first p columns of V, and V_o is the last M - p columns of V. U_o spans a data null space; V_o spans a model null space.

Properties:

    U_p^T U_o = 0,   U_o^T U_p = 0,   V_p^T V_o = 0,   V_o^T V_p = 0
    U_p^T U_p = I,   U_o^T U_o = I,   V_p^T V_p = I,   V_o^T V_o = I

Since the columns of V_o and U_o multiply by zeros, we get the compact form for G:

    G = U_p S_p V_p^T

Model null space

Consider a vector made up of a linear combination of the columns of V_o:

    m_v = \sum_{i=p+1}^{M} \lambda_i v_i

This model lies in the space spanned by the columns of V_o, and since V_p^T v_i = 0 for i > p,

    G m_v = \sum_{i=p+1}^{M} \lambda_i U_p S_p V_p^T v_i = 0

So any model of this type has no effect on the data: it lies in the model null space! Where have we seen this before?

Consequence: if any solution exists to the inverse problem, then an infinite number will. Assume the model m_ls fits the data, G m_ls = d_obs. Then

    G(m_ls + m_v) = G m_ls + G m_v = d_obs + 0

This is the uniqueness question of Backus and Gilbert: the data cannot constrain models in the model null space.

Data null space

Consider a data vector with at least one component in U_o:

    d_obs = d_o + \lambda_i u_i   (i > p)

For any model-space vector m we have

    d_pre = G m = U_p S_p V_p^T m = U_p a

so for the model to fit the data (d_obs = d_pre) we would need

    d_o + \lambda_i u_i = \sum_{j=1}^{p} a_j u_j

which is impossible, because u_i (i > p) is orthogonal to every column of U_p. Where have we seen this before? Data of this type cannot be fit by any model: the data has a component in the data null space!

Consequence: no model exists that can fit the data. This is the existence question of Backus and Gilbert. All of this depends on the structure of the kernel matrix G!

Moore-Penrose generalized inverse

    G^† = V_p S_p^{-1} U_p^T

The generalized inverse combines the features of the least squares and minimum length solutions. In a purely over-determined problem it is equivalent to the least squares solution:

    m^† = G^† d = (G^T G)^{-1} G^T d

In a purely under-determined problem it is equivalent to the minimum length solution:

    m^† = G^† d = G^T (G G^T)^{-1} d

In general problems it minimizes the data prediction error while also producing the minimum length solution:

    \phi(m^†) = (d - G m^†)^T (d - G m^†),   L(m^†) = m^{†T} m^†

Covariance and Resolution of the pseudo inverse

How does data noise propagate into the model? The model covariance matrix for the generalized inverse is

    C_M = G^† C_d (G^†)^T

For the case C_d = \sigma^2 I, substituting G^† = V_p S_p^{-1} U_p^T gives

    C_M = \sigma^2 G^† (G^†)^T = \sigma^2 V_p S_p^{-2} V_p^T   (prove this)

Recall that S_p is a diagonal matrix of ordered singular values, S_p = diag[s_1, s_2, ..., s_p], so

    C_M = \sigma^2 \sum_{i=1}^{p} \frac{v_i v_i^T}{s_i^2}   (prove this)

What is the effect of singular values on the model covariance? As the number of singular values, p, increases, the variance of the model parameters increases!

Covariance and Resolution of the pseudo inverse

How is the estimated model related to the true model? The model resolution matrix R is defined by m^† = R m_true, and for the generalized inverse

    R = G^† G = V_p S_p^{-1} U_p^T U_p S_p V_p^T = V_p V_p^T

As p increases the model null space shrinks; as p -> M, V_p -> V and R -> I.

What is the effect of singular values on the resolution matrix? As the number of singular values, p, increases, the resolution of the model parameters increases! We see the trade-off between variance and resolution.

Worked example: tomography

    \delta d = G \delta m

Using rays 1-4 (rows 3 and 4 correspond to diagonal rays, with path length \sqrt{2} through each cell):

    G = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \sqrt{2} & 0 & 0 & \sqrt{2} \\ 0 & \sqrt{2} & \sqrt{2} & 0 \end{bmatrix},
    G^T G = \begin{bmatrix} 3 & 0 & 1 & 2 \\ 0 & 3 & 2 & 1 \\ 1 & 2 & 3 & 0 \\ 2 & 1 & 0 & 3 \end{bmatrix}

This has eigenvalues 0, 2, 4, 6.
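These numbers are easy to verify numerically. Below is a minimal NumPy sketch (Python rather than the MATLAB used elsewhere in the course, and using the four-ray G as reconstructed above): it confirms that the squared singular values of G are 6, 4, 2, 0 and that the null-space vector is annihilated by G.

```python
import numpy as np

# Ray matrix for the four-cell example, as written above:
# rows 1-2 are straight rays, rows 3-4 are diagonal rays (length sqrt(2) per cell).
r2 = np.sqrt(2.0)
G = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [r2,  0.0, 0.0, r2 ],
              [0.0, r2,  r2,  0.0]])

U, s, Vt = np.linalg.svd(G)      # full SVD: G = U S V^T

print(np.round(s**2, 10))        # eigenvalues of G^T G: [6. 4. 2. 0.]

v_o = Vt[-1]                     # right singular vector for s_4 = 0
print(np.round(G @ v_o, 10))     # [0. 0. 0. 0.]: this model is invisible to the data
```

The rows of `Vt` reproduce, up to sign, the eigenvectors listed next.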
The eigenvectors of G^T G give

    V_p = \begin{bmatrix} 0.5 & -0.5 & -0.5 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & -0.5 \\ 0.5 & -0.5 & 0.5 \end{bmatrix}   (s_1^2 = 6, s_2^2 = 4, s_3^2 = 2)

    V_o = \begin{bmatrix} 0.5 \\ 0.5 \\ -0.5 \\ -0.5 \end{bmatrix}   (s_4^2 = 0, and G v_o = 0)

Worked example: Eigenvectors

[Figure: the columns of V_p (s_1^2 = 6, s_2^2 = 4, s_3^2 = 2) and of V_o (s_4^2 = 0) displayed as patterns on the model cells.]

Worked example: tomography

Using all non-zero eigenvalues (s_1, s_2 and s_3) the resolution matrix becomes

    \delta m = R \delta m_true = V_p V_p^T \delta m_true,
    R = \begin{bmatrix} 0.75 & -0.25 & 0.25 & 0.25 \\ -0.25 & 0.75 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.75 & -0.25 \\ 0.25 & 0.25 & -0.25 & 0.75 \end{bmatrix}

[Figure: input model and recovered model.]

Worked example: tomography

Using eigenvalues s_1, s_2 and s_3 the model covariance becomes

    C_M = \sigma^2 \sum_{i=1}^{p} \frac{v_i v_i^T}{s_i^2}

With s_3^2 = 2, s_2^2 = 4 and s_1^2 = 6 (each v_i v_i^T is 1/4 times a matrix of entries +/-1),

    C_M = \frac{\sigma^2}{4} \left\{ \frac{1}{2}\begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix} + \frac{1}{4}\begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} + \frac{1}{6}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \right\}

    C_M = \frac{\sigma^2}{48}\begin{bmatrix} 11 & -7 & 5 & -1 \\ -7 & 11 & -1 & 5 \\ 5 & -1 & 11 & -7 \\ -1 & 5 & -7 & 11 \end{bmatrix}

Worked example: tomography

Repeat using only the largest singular value (s_1^2 = 6, so p = 1 and V_p is the single column v_1).

Model resolution matrix:

    R = V_p V_p^T = \frac{1}{4}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix},   V_p = \begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \end{bmatrix}

Model covariance matrix:

    C_M = \sigma^2 \sum_{i=1}^{p} \frac{v_i v_i^T}{s_i^2} = \frac{\sigma^2}{24}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}

[Figure: input model and output model.]

Recap: Singular value decomposition

There may exist a model null space: models that cannot be constrained by the data.

There may exist a data null space: data that cannot be fit by any model.

The general linear discrete inverse problem may be simultaneously under- and over-determined (mix-determined). Singular value decomposition is a framework for dealing with such ill-posed problems.

The pseudo inverse is constructed using SVD and provides a unique model with desirable properties: it fits the data in a least squares sense, and it gives a minimum length model (no component in the null space).

Model resolution and covariance can be traded off by choosing the number of eigenvalues used in the reconstruction.

Ill-posedness = sensitivity to noise

Look at what happens when the eigenvalues are small but positive. The truncated SVD solution is

    m^† = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \frac{u_i \cdot d}{s_i} v_i

Noise in the data is amplified in the model if s_i << 1 (the stability question of Backus and Gilbert); whether the coefficients u_i · d decay faster than the s_i is the discrete Picard condition. The eigenvalue spectrum needs to be truncated by reducing p.

TSVD: choose the smallest p such that the data fit is acceptable:

    ||G m - d||_2 <= \delta

As N or M increase, the computational cost increases significantly! (See example 4.3 of Aster et al., 2005.)

SVD Example: The Shaw problem

    m(\theta) = intensity of light incident on a slit at angle \theta,   -\pi/2 <= \theta <= \pi/2
    d(s) = measured diffracted light intensity at angle s,   -\pi/2 <= s <= \pi/2

Shaw problem: given d(s), find m(\theta), where

    d(s) = \int_{-\pi/2}^{\pi/2} (\cos s + \cos\theta)^2 \left( \frac{\sin(\pi(\sin s + \sin\theta))}{\pi(\sin s + \sin\theta)} \right)^2 m(\theta) \, d\theta

Is this a continuous or discrete inverse problem? Is this a linear or nonlinear inverse problem?

SVD Example: The Shaw problem

Let's discretize the inverse problem. Sample the data d(s) and the model m(\theta) at n equal angles:

    s_i = \theta_i = \frac{(i - 0.5)\pi}{n} - \frac{\pi}{2},   d_i = d(s_i),   m_j = m(\theta_j)   (i, j = 1, 2, ..., n)

This gives a system of n x n linear equations, d = G m, where

    G_{ij} = \Delta s \, (\cos s_i + \cos\theta_j)^2 \left( \frac{\sin(\pi(\sin s_i + \sin\theta_j))}{\pi(\sin s_i + \sin\theta_j)} \right)^2,   \Delta s = \frac{\pi}{n}

See the MATLAB routine `shaw'.
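As a sketch of what that discretization looks like in code, here is a NumPy translation of the formula above (this is not the MATLAB `shaw` routine itself, and the function name `shaw_matrix` is invented for illustration):

```python
import numpy as np

def shaw_matrix(n=20):
    """Discretized Shaw forward operator G (n x n), midpoint rule."""
    i = np.arange(1, n + 1)
    angles = (i - 0.5) * np.pi / n - np.pi / 2   # s_i = theta_i
    s = angles[:, None]                          # data angles down the rows
    theta = angles[None, :]                      # model angles across the columns
    y = np.sin(s) + np.sin(theta)
    # np.sinc(y) = sin(pi*y)/(pi*y), exactly the factor in the kernel
    K = (np.cos(s) + np.cos(theta))**2 * np.sinc(y)**2
    return (np.pi / n) * K                       # Delta_s = pi/n

G = shaw_matrix(20)
s_vals = np.linalg.svd(G, compute_uv=False)
print(s_vals[0] / s_vals[-1])   # huge condition number, as discussed next
```

Using `np.sinc` sidesteps the 0/0 case that arises whenever sin(s_i) + sin(\theta_j) = 0.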
Example: Ill-posedness

Ill-posedness means solution sensitivity to noise:

    m^† = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \frac{u_i \cdot d}{s_i} v_i,   d = G m

[Figure: eigenvalue spectrum s_i versus index i for the Shaw problem with 20 data and 20 unknowns (N = M = 20).]

The condition number, the ratio of the largest to smallest singular value, is about 10^14 here. A large condition number means severe ill-posedness.

Example: Ill-posedness

[Figure: eigenvectors for different singular values of the Shaw problem, in model units. One panel shows v_1, the eigenvector for the largest singular value; the other shows v_18, the eigenvector for the smallest non-zero singular value.]

Test inversion without noise

    d = G m,   m^† = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \frac{u_i \cdot d}{s_i} v_i

[Figure: input spike model, the data it generates, and the recovered model.]

Test inversion with noise

Add Gaussian noise (\sigma = 10^{-6}) to the data and repeat.

[Figure: input spike model, the noisy data, and the recovered model.]

The presence of small eigenvalues means sensitivity of the solution to noise.

Shaw problem with p = 10

Use the first 10 eigenvalues only.

[Figure: input spike model, the no-noise solution, and the noise solution.]

Truncating eigenvalues reduces the sensitivity to noise, but also the resolving power of the data.

Shaw problem Picard plot

A guide to choosing the SVD truncation level p (the number of eigenvalues retained):

    m^† = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \frac{u_i \cdot d}{s_i} v_i

[Figure: Picard plot for the Shaw problem, showing the eigenvalues s_i and the coefficients u_i \cdot d against index i, with the truncation level marked.]
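To make the truncation experiment concrete, here is a hedged sketch of the p = 10 test above. The helper names `tsvd_solve` and `picard_terms`, the spike position, and the random seed are illustrative choices, not from the original; `shaw_matrix` is the sketch given earlier.

```python
import numpy as np

def tsvd_solve(G, d, p):
    """Truncated-SVD solution: keep only the p largest singular values."""
    U, s, Vt = np.linalg.svd(G)
    coeff = (U.T @ d)[:p] / s[:p]   # (u_i . d) / s_i for i = 1..p
    return Vt[:p].T @ coeff         # sum of coeff_i * v_i

def picard_terms(G, d):
    """Ingredients of a Picard plot: s_i, |u_i . d|, and their ratio."""
    U, s, _ = np.linalg.svd(G)
    utd = np.abs(U.T @ d)
    return s, utd, utd / s

G = shaw_matrix(20)                     # from the earlier sketch
m_true = np.zeros(20)
m_true[9] = 1.0                         # spike model
rng = np.random.default_rng(0)
d = G @ m_true + 1e-6 * rng.standard_normal(20)   # sigma = 1e-6 noise

m10 = tsvd_solve(G, d, p=10)            # stable but band-limited recovery
s, utd, ratio = picard_terms(G, d)      # ratio blows up past the noise floor
```

Plotting s_i and |u_i \cdot d| together shows where the coefficients flatten out at the noise level; truncating just before that point is the graphical reading of the Picard plot above.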