Numerical Methods I: Nonlinear equations and nonlinear least squares
Georg Stadler, Courant Institute, NYU
stadler@cims.nyu.edu
Oct. 9 and Oct. 15, 2015

Organization
- Homework 2 is due tomorrow morning; please hand in a printout, if possible.
- Homework 3 will be posted tomorrow.
- Today: review of linear least squares; solving nonlinear equations; nonlinear least-squares problems.
- Questions?

Sources/reading
Main reference: Section 4 in Deuflhard/Hohmann.

Overview
Review of linear least squares and QR

Review: Least-squares problems
Given data points/measurements (t_i, b_i), i = 1, ..., m, and a model function φ that relates t and b,
  b = φ(t; x_1, ..., x_n),
where x_1, ..., x_n are model parameters. If the model is supposed to describe the data, the deviations/errors
  Δ_i = b_i − φ(t_i; x_1, ..., x_n)
should be small. Thus, to fit the model to the measurements, one must choose x_1, ..., x_n appropriately.

Review: Linear least squares
We assume (for now) that the model depends linearly on x_1, ..., x_n, e.g.,
  φ(t; x_1, ..., x_n) = a_1(t) x_1 + ... + a_n(t) x_n.
Choosing the least-squares error, this results in
  min_x ||Ax − b||,
where x = (x_1, ..., x_n)^T, b = (b_1, ..., b_m)^T, and a_ij = a_j(t_i). In the following, we study the over-determined case, i.e., m ≥ n.

Linear least-squares problems -- QR factorization
Consider non-square matrices A ∈ R^{m×n} with m ≥ n and rank(A) = n. Then the system Ax = b does, in general, not have a solution (more equations than unknowns). We thus instead solve the minimization problem
  min_x ||Ax − b||^2.
The minimizer x̄ of this optimization problem is characterized by the normal equations
  A^T A x̄ = A^T b.

Solving the normal equations A^T A x̄ = A^T b has two drawbacks:
- computing A^T A costs O(mn^2) operations;
- the condition number of A^T A is the square of the condition number of A (problematic for the Cholesky factorization).

One would therefore like to avoid the multiplication A^T A and use a suitable factorization of A that aids in solving the normal equations: the QR factorization
  A = QR = [Q_1, Q_2] [R_1; 0] = Q_1 R_1,
where Q ∈ R^{m×m} is an orthogonal matrix (Q Q^T = I) and R ∈ R^{m×n} consists of an upper triangular block R_1 ∈ R^{n×n} stacked on a block of zeros.

How can the QR factorization be used to solve the normal equations?
  min_x ||Ax − b||^2 = min_x ||Q^T (Ax − b)||^2 = min_x || (b_1 − R_1 x ; b_2) ||^2,
where Q^T b = (b_1; b_2) with b_1 ∈ R^n and b_2 ∈ R^{m−n}. Thus, the least-squares solution is x = R_1^{-1} b_1 and the residual norm is ||b_2||.

How can we compute the QR factorization?
Givens rotations: use a sequence of rotations in 2D subspaces.
- For m ≈ n: ~n^2/2 square roots and ~(4/3) n^3 multiplications.
- For m ≫ n: ~nm square roots and ~2mn^2 multiplications.
Householder reflections: use a sequence of reflections across hyperplanes.
- For m ≈ n: ~(2/3) n^3 multiplications.
- For m ≫ n: ~2mn^2 multiplications.
These methods compute an orthonormal basis of the column space of A. An alternative would be the Gram-Schmidt method; however, Gram-Schmidt is unstable and thus sensitive to rounding errors.
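As a small illustration (a sketch only, assuming NumPy is available; the linear model and the data below are made up), one can solve an over-determined linear least-squares problem through the QR factorization rather than by forming A^T A:

```python
import numpy as np

# Hypothetical example: fit phi(t; x1, x2) = x1 + x2*t to noisy data (t_i, b_i).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)                            # measurement points t_i
b = 2.0 + 3.0 * t + 0.01 * rng.standard_normal(t.size)   # data b_i with noise

# Design matrix A with a_ij = a_j(t_i); here a_1(t) = 1 and a_2(t) = t.
A = np.column_stack([np.ones_like(t), t])                 # shape (m, n), m >= n

# Reduced QR factorization: A = Q1 R1 with Q1 in R^{m x n}, R1 upper triangular.
Q1, R1 = np.linalg.qr(A, mode='reduced')

# Least-squares solution from R1 x = Q1^T b (a triangular solve would suffice;
# a general solver is used here for brevity).
x = np.linalg.solve(R1, Q1.T @ b)

# Residual norm ||A x - b||, which equals ||b_2|| in the notation above.
print(x, np.linalg.norm(A @ x - b))
```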
Overview
Nonlinear systems

Fixed point ideas
We intend to solve the nonlinear equation f(x) = 0, x ∈ R.
Reformulation as a fixed point equation: x = Φ(x).
Corresponding iteration: choose x_0 (initialization) and compute x_1, x_2, ... from
  x_{k+1} = Φ(x_k).
When does this iteration converge?

Example: Solve the nonlinear equation 2x − tan(x) = 0.
- Iteration #1: x_{k+1} = Φ_1(x_k) = 0.5 tan(x_k)
- Iteration #2: x_{k+1} = Φ_2(x_k) = arctan(2 x_k)
- Iteration #3: x_{k+1} = Φ_3(x_k) = x_k − (2 x_k − tan(x_k)) / (1 − tan^2(x_k))

Convergence of fixed point methods
A mapping Φ : [a, b] → R is called contractive on [a, b] if there is a 0 ≤ Θ < 1 such that
  |Φ(x) − Φ(y)| ≤ Θ |x − y|   for all x, y ∈ [a, b].
If Φ is continuously differentiable on [a, b], then
  sup_{x,y ∈ [a,b], x ≠ y} |Φ(x) − Φ(y)| / |x − y| = sup_{z ∈ [a,b]} |Φ'(z)|.

Let Φ : [a, b] → [a, b] be contractive with constant Θ < 1. Then:
- There exists a unique fixed point x̄ with x̄ = Φ(x̄).
- For any starting guess x_0 ∈ [a, b], the fixed point iteration converges to x̄, with
    |x_{k+1} − x_k| ≤ Θ |x_k − x_{k−1}|   (linear convergence)
  and
    |x̄ − x_k| ≤ Θ^k / (1 − Θ) · |x_1 − x_0|.
  The second estimate allows one to estimate the required number of iterations.

Newton's method
In one dimension, solve f(x) = 0: start with x_0 and compute x_1, x_2, ... from
  x_{k+1} = x_k − f(x_k) / f'(x_k),   k = 0, 1, ...
This requires f'(x_k) ≠ 0 to be well defined (i.e., the tangent has nonzero slope).

Let F : R^n → R^n, n ≥ 1, and solve F(x) = 0. Taylor expansion about the starting point x_0:
  F(x) = F(x_0) + F'(x_0)(x − x_0) + o(||x − x_0||)   for x → x_0.
Hence:
  x_1 = x_0 − F'(x_0)^{-1} F(x_0).
Newton iteration: start with x_0 ∈ R^n and for k = 0, 1, ... compute
  F'(x_k) Δx_k = −F(x_k),   x_{k+1} = x_k + Δx_k.
This requires that F'(x_k) ∈ R^{n×n} is invertible. Newton's method is affine invariant.

Convergence of Newton's method
Assumptions on F: D ⊂ R^n open and convex, F : D → R^n continuously differentiable with F'(x) invertible for all x ∈ D, and there exists ω ≥ 0 such that
  ||F'(x)^{-1} (F'(x + sv) − F'(x)) v|| ≤ s ω ||v||^2
for all s ∈ [0, 1], x ∈ D, v ∈ R^n with x + v ∈ D.
Assumptions on x* and x_0: there exists a solution x* ∈ D and a starting point x_0 ∈ D such that
  ρ := ||x* − x_0|| ≤ 2/ω   and   B_ρ(x*) ⊂ D.
Theorem: Then the Newton sequence x_k stays in B_ρ(x*), lim_{k→∞} x_k = x*, and
  ||x_{k+1} − x*|| ≤ (ω/2) ||x_k − x*||^2.

Newton's method in practice
Monotonicity test: ||F(x_{k+1})|| ≤ Θ̄ ||F(x_k)||, Θ̄ < 1.
Monotonicity test (affine invariant): ||F'(x_k)^{-1} F(x_{k+1})|| ≤ Θ̄ ||F'(x_k)^{-1} F(x_k)||, Θ̄ < 1.
Damping: x_{k+1} = x_k + λ_k Δx_k, 0 < λ_k ≤ 1. For difficult problems, start with small λ_k and increase it later in the iteration (close to the solution, λ_k should be 1).
Approximate Jacobians: use approximate Jacobians F̃'(x_k), e.g., computed through finite differences.
(A small sketch combining damping with the affine-invariant test follows below.)
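A minimal sketch of these ideas (assuming NumPy; the choice of Θ̄, the halving of λ, and the tolerances are illustrative, not Deuflhard's actual damping strategy): Newton's method with simple damping governed by the affine-invariant monotonicity test, applied to the example 2x − tan(x) = 0 from above.

```python
import numpy as np

def damped_newton(F, dF, x0, theta_bar=0.9, tol=1e-10, max_iter=50):
    """Newton's method with simple damping and an affine-invariant
    monotonicity test (a rough sketch, not Deuflhard's full algorithm)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        J = dF(x)
        dx = np.linalg.solve(J, -F(x))                  # Newton correction
        if np.linalg.norm(dx) < tol:
            return x
        lam = 1.0
        while lam > 1e-8:
            x_trial = x + lam * dx
            # simplified correction F'(x_k)^{-1} F(x_trial), reusing the old Jacobian
            dx_bar = np.linalg.solve(J, -F(x_trial))
            if np.linalg.norm(dx_bar) <= theta_bar * np.linalg.norm(dx):
                break
            lam *= 0.5                                  # reduce the damping parameter
        x = x + lam * dx
    return x

# Example: the scalar equation 2x - tan(x) = 0, written as a 1-dimensional system.
F  = lambda x: np.array([2*x[0] - np.tan(x[0])])
dF = lambda x: np.array([[2.0 - 1.0/np.cos(x[0])**2]])
print(damped_newton(F, dF, np.array([1.2])))            # root near 1.1656
```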
Convergence of Newton's method in practice
- The "more nonlinear" a problem is, the harder it is to solve.
- The computation of the Jacobian can be costly/complicated.
- There is no reliable black-box solver for nonlinear problems; at least for higher-dimensional problems, the structure of the problem must be taken into account.
- Sometimes continuation ideas must be used to find good initializations: solve simpler problems first and use their solutions as starting points for harder problems.

Role of initialization
The choice of the initialization x_0 is critical. Depending on the initialization, the Newton iteration might
- not converge (it could "blow up" or "oscillate" between two points),
- converge to different solutions,
- fail because it hits a point where the Jacobian is not invertible (this cannot happen if the conditions of the convergence theorem are satisfied),
- ...

Nonlinear versus linear problems
"Classification of mathematical problems as linear and nonlinear is like classification of the Universe as bananas and non-bananas."
Or (according to Stanislav Ulam): using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals.

Overview
Nonlinear least squares -- Gauss-Newton

Nonlinear least-squares problems
Assume a least-squares problem in which the parameters x do not enter linearly into the model. Instead of
  min_{x ∈ R^n} ||Ax − b||^2,
we have, with F : D → R^m, D ⊂ R^n,
  min_{x ∈ R^n} g(x) := (1/2) ||F(x)||^2,   where F(x)_i = φ(t_i; x) − b_i, 1 ≤ i ≤ m.
A (local) minimum x* of this optimization problem satisfies g'(x*) = 0 and g''(x*) positive semidefinite (positive definiteness of g''(x*) is sufficient for a strict local minimum).

The derivative of g(·) is
  G(x) := g'(x) = F'(x)^T F(x).
This is a nonlinear system in x, G : D → R^n. Let's try to solve it using Newton's method:
  G'(x_k) Δx_k = −G(x_k),   x_{k+1} = x_k + Δx_k,
where
  G'(x) = F'(x)^T F'(x) + F''(x)^T F(x).

F''(x) is a tensor. It is often neglected, for the following reasons:
- It is difficult to compute, and we may use an approximate Jacobian in Newton's method.
- If the data are compatible with the model, then F(x*) = 0 and the term involving F''(x) drops out. If ||F(x*)|| is small, neglecting that term might not slow down the convergence by much.
- We know that g''(x*) must be positive semidefinite. If F'(x_k) has full rank, then F'(x_k)^T F'(x_k) is positive definite and invertible.

Nonlinear least-squares problems -- Gauss-Newton
The resulting Newton-type method for the nonlinear least-squares problem is called the Gauss-Newton method: initialize x_0 and for k = 0, 1, ...
  solve F'(x_k)^T F'(x_k) Δx_k = −F'(x_k)^T F(x_k),   (solve)
  set x_{k+1} = x_k + Δx_k.   (update step)
The solve step is the normal equation of the linear least-squares problem
  min_{Δx} ||F'(x_k) Δx + F(x_k)||.
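A small illustration of a plain Gauss-Newton iteration (a sketch assuming NumPy; the exponential model and its data are hypothetical): each step solves the linearized least-squares problem min_Δx ||F'(x_k) Δx + F(x_k)|| with a factorization-based solver rather than the explicitly formed normal equations.

```python
import numpy as np

# Hypothetical data for the model phi(t; x) = x1 * exp(x2 * t).
t = np.linspace(0.0, 1.0, 15)
b = 2.0 * np.exp(-1.3 * t)                     # "measurements" (noise-free, so F(x*) = 0)

def F(x):                                      # residual vector F(x)_i = phi(t_i; x) - b_i
    return x[0] * np.exp(x[1] * t) - b

def J(x):                                      # Jacobian F'(x), shape (m, n)
    return np.column_stack([np.exp(x[1] * t),
                            x[0] * t * np.exp(x[1] * t)])

x = np.array([1.0, 0.0])                       # starting guess x0
for k in range(20):
    # Gauss-Newton step: solve min_dx ||F'(x_k) dx + F(x_k)||.
    # lstsq uses an SVD-based solver, avoiding the explicitly formed normal equations.
    dx = np.linalg.lstsq(J(x), -F(x), rcond=None)[0]
    x = x + dx
    if np.linalg.norm(dx) < 1e-12:
        break
print(x)                                       # approximately [2.0, -1.3]
```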
Convergence of the Gauss-Newton method
Assumptions on F: D ⊂ R^n open and convex, F : D → R^m, m ≥ n, continuously differentiable with F'(x) of full rank for all x ∈ D, and let ω ≥ 0 and 0 ≤ κ* < 1 be such that
  ||F'(x)^+ (F'(x + sv) − F'(x)) v|| ≤ s ω ||v||^2
for all s ∈ [0, 1], x ∈ D, v ∈ R^n with x + v ∈ D.
Assumptions on x* and x_0: assume there exists a solution x* ∈ D of the least-squares problem and a starting point x_0 ∈ D such that
  ||F'(x)^+ F(x*)|| ≤ κ* ||x − x*||   for x ∈ D,
and
  ρ := ||x* − x_0|| ≤ 2(1 − κ*)/ω =: σ.
Theorem: Then the sequence x_k stays in B_ρ(x*), lim_{k→∞} x_k = x*, and
  ||x_{k+1} − x*|| ≤ (ω/2) ||x_k − x*||^2 + κ* ||x_k − x*||.

Remarks on the convergence of Gauss-Newton:
- Role of κ*: it represents the omission of F''(x).
  - If κ* can be chosen as 0, we obtain local quadratic convergence.
  - If κ* > 0, the convergence is linear (thus we require κ* < 1).
- Damping strategies as before (better: a line search to make guaranteed progress in the minimization problem).
- There can, in principle, be multiple solutions.

Overview
Nonlinear systems depending on parameters

Nonlinear systems depending on parameters
Consider F(x, λ) with F : R^n × R^p → R^n, where λ is a single parameter or a vector of parameters. This is a family of parametrized nonlinear systems. We would like to either
- explore the solution x_λ for all λ, or
- find x_{λ0} for a specific λ0 that is difficult to solve directly using Newton's method (e.g., the solution of the Navier-Stokes equations at high Reynolds number).
Example: F(x, λ) = x(x^3 − x − λ).

Regularity assumption: F'(x, λ) has full rank when F(x, λ) = 0. This excludes bifurcation points (due to the implicit function theorem). The study of bifurcation points (often very interesting!) requires additional work.
Continuation methods can exploit the local fast convergence of Newton's method: given a solution (x_0, λ_0), find a solution (x_1, λ_1) for λ_1 close to λ_0.

Continuation methods
- Classical continuation: use x_0 as the starting guess in Newton's method for λ_1 (see the sketch below).
- Tangent continuation: differentiate F(x(s), λ_0 + s) with respect to s and evaluate at s = 0. This leads to a better initial guess for Newton's method to find x_1.
These basic ideas additionally require some robustification to result in reliable algorithms; see Deuflhard/Hohmann.
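A minimal sketch of classical continuation (assuming NumPy; the λ grid, the target value, and the branch followed are illustrative choices) for the example F(x, λ) = x(x^3 − x − λ) from above: march λ from an easy value to the target, reusing each solution as the Newton starting guess for the next parameter value.

```python
import numpy as np

def newton_1d(f, df, x0, tol=1e-12, max_iter=50):
    """Plain 1-D Newton iteration (no damping), used inside the continuation."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Parametrized problem from the slides: F(x, lam) = x*(x^3 - x - lam).
F  = lambda x, lam: x * (x**3 - x - lam)
dF = lambda x, lam: 4*x**3 - 2*x - lam

# Classical continuation: start from the known solution x = 1 at lam = 0 and
# follow the branch x^3 - x = lam up to the target parameter lam = 5,
# using each solution as the Newton starting guess for the next lam.
x = 1.0
for lam in np.linspace(0.0, 5.0, 11):
    x = newton_1d(lambda s: F(s, lam), lambda s: dF(s, lam), x)
    print(f"lam = {lam:4.1f}   x = {x:.6f}")
```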