Scientific Computing: Solving Nonlinear Equations
Aleksandar Donev, Courant Institute, NYU
donev@courant.nyu.edu
Course MATH-GA.2043 / CSCI-GA.2112, Spring 2012
Lecture VI, March 1st, 2011

Outline

1. Basics of Nonlinear Solvers
2. One Dimensional Root Finding
3. Systems of Non-Linear Equations

Homework Notes

Homework 1 has been graded and grades posted on BlackBoard. Homework 3 has been posted; it is due March 25th (after Spring break) and has an extra credit part. Make sure to follow the instructions for submission: no Word files, no tar/rar files, no subdirectories. Submit a simple (flat) zip file with the PDF and original MATLAB codes. Plots are the vehicle for delivering your solution, so be careful about the choice of domains, scales, and titles. The questions have lots of hints (which often reveal half the answer!) and rather detailed step-by-step instructions. Do not include unnecessary/redundant figures/tables: get your point across effectively (it is a useful skill).

Basics of Nonlinear Solvers

Fundamentals

The simplest problem is root finding in one dimension:

$$f(x) = 0 \quad \text{with} \quad x \in [a, b],$$

or, more generally, solving a square system of nonlinear equations:

$$\mathbf{f}(\mathbf{x}) = \mathbf{0} \quad \Leftrightarrow \quad f_i(x_1, x_2, \dots, x_n) = 0 \quad \text{for } i = 1, \dots, n.$$

There can be no closed-form answer, so just as for eigenvalues, we need iterative methods. Most generally, starting from $m \geq 1$ initial guesses $x^0, x^1, \dots, x^m$, iterate:

$$x^{k+1} = \phi(x^k, x^{k-1}, \dots, x^{k-m}).$$

Order of convergence

Consider one-dimensional root finding and let the actual root be $\alpha$, $f(\alpha) = 0$. A sequence of iterates $x^k$ that converges to $\alpha$ has order of convergence $p \geq 1$ if, as $k \to \infty$,

$$\frac{|x^{k+1} - \alpha|}{|x^k - \alpha|^p} = \frac{|e^{k+1}|}{|e^k|^p} \to C = \text{const},$$

where $C$ is the convergence factor; for linear convergence ($p = 1$) we need $0 < C < 1$.

A method should at least converge linearly, that is, the error should be reduced by at least a constant factor every iteration; for example, with $C = 0.1$ the number of accurate digits increases by 1 every iteration. A good method for root finding converges quadratically, that is, the number of accurate digits doubles every iteration!

Local vs. global convergence

A good initial guess is extremely important in nonlinear solvers! Assume we are looking for a unique root $a \leq \alpha \leq b$ starting with an initial guess $a \leq x^0 \leq b$.

A method has local convergence if it converges to a given root $\alpha$ for any initial guess that is sufficiently close to $\alpha$ (in a neighborhood of the root). A method has global convergence if it converges to the root for any initial guess.

General rule: global convergence requires a slower (more careful) method, but is safer. It is best to combine a global method to first find a good initial guess close to $\alpha$, and then use a faster local method.

Conditioning of root finding

A perturbation $\delta f$ in the function values moves the root by $\delta\alpha$:

$$f(\alpha + \delta\alpha) \approx f(\alpha) + f'(\alpha)\,\delta\alpha = \delta f$$

$$|\delta\alpha| \approx \frac{|\delta f|}{|f'(\alpha)|} \quad \Rightarrow \quad \kappa_{\text{abs}} = \left|f'(\alpha)\right|^{-1}.$$

The problem of finding a simple root is well-conditioned when $|f'(\alpha)|$ is far from zero. Finding roots with multiplicity $m > 1$ is ill-conditioned:

$$f'(\alpha) = \dots = f^{(m-1)}(\alpha) = 0 \quad \Rightarrow \quad |\delta\alpha| \approx \left[\frac{m!\,|\delta f|}{|f^{(m)}(\alpha)|}\right]^{1/m}.$$

A code illustration of this sensitivity follows below. Note that finding roots of algebraic equations (polynomials) is a separate subject of its own that we skip.
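To make the conditioning estimate concrete, here is a small MATLAB sketch (our own illustration, not part of the original slides): it perturbs a simple root and a double root of $f(x) = (x-1)^m$ by the same $\delta f$ and compares the observed root shift against the prediction $|\delta\alpha| \approx (m!\,|\delta f|/|f^{(m)}(\alpha)|)^{1/m}$.

% Illustration (not from the slides): sensitivity of a root to a
% perturbation delta in the function values, for f(x) = (x-1)^m.
delta = 1e-10;                % size of the perturbation |delta f|

% Simple root (m = 1): the root of (x-1) - delta shifts by ~delta.
alpha1 = fzero(@(x) (x - 1) - delta, 0.5);
fprintf('m=1: observed %.2e, predicted %.2e\n', abs(alpha1 - 1), delta);

% Double root (m = 2): f''(1) = 2, so the prediction is
% (2!*delta/2)^(1/2) = sqrt(delta), vastly larger than delta itself.
alpha2 = fzero(@(x) (x - 1)^2 - delta, 1);   % start near the root
fprintf('m=2: observed %.2e, predicted %.2e\n', abs(alpha2 - 1), sqrt(delta));

With delta = 1e-10 the double root moves by about 1e-5: roughly half of the significant digits are lost, exactly the ill-conditioning described above.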
One Dimensional Root Finding

The bisection and Newton algorithms.

Bisection

The first step is to locate a root by searching for a sign change, i.e., finding $a^0$ and $b^0$ such that $f(a^0) f(b^0) < 0$. Then simply bisect the interval: for $k = 0, 1, \dots$, take

$$x^k = \frac{a^k + b^k}{2}$$

and choose the half in which the function changes sign, i.e., either $a^{k+1} = x^k,\ b^{k+1} = b^k$ or $b^{k+1} = x^k,\ a^{k+1} = a^k$, so that $f(a^{k+1}) f(b^{k+1}) < 0$. Observe that each step needs one new function evaluation, $f(x^k)$, but only its sign matters. The convergence is essentially linear because the bracketing interval, and hence the error bound, is halved at every step:

$$|x^k - \alpha| \leq \frac{b^k - a^k}{2} = \frac{b - a}{2^{k+1}}.$$

(A minimal code sketch of bisection is given at the end of this section.)

Newton's Method

Bisection is a slow but sure method: it uses only the sign of the function, no information about its value or its derivatives. Better convergence, of order $p = (1 + \sqrt{5})/2 \approx 1.63$ (the golden ratio), can be achieved by using the value of the function at two points, as in the secant method. Achieving second-order convergence requires also evaluating the function's derivative. Linearize the function around the current guess using a Taylor series:

$$f(x^{k+1}) \approx f(x^k) + (x^{k+1} - x^k) f'(x^k) = 0$$

$$\Rightarrow \quad x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}.$$

Convergence of Newton's method

A Taylor series with remainder gives

$$f(\alpha) = 0 = f(x^k) + (\alpha - x^k) f'(x^k) + \frac{1}{2}(\alpha - x^k)^2 f''(\xi), \quad \text{for some } \xi \in [x^k, \alpha].$$

After dividing by $f'(x^k) \neq 0$ we get

$$\left[x^k - \frac{f(x^k)}{f'(x^k)}\right] - \alpha = \frac{1}{2}(\alpha - x^k)^2 \frac{f''(\xi)}{f'(x^k)}$$

$$x^{k+1} - \alpha = e^{k+1} = \frac{1}{2}\left(e^k\right)^2 \frac{f''(\xi)}{f'(x^k)},$$

which shows second-order convergence:

$$\frac{|x^{k+1} - \alpha|}{|x^k - \alpha|^2} = \frac{|e^{k+1}|}{|e^k|^2} = \frac{|f''(\xi)|}{2|f'(x^k)|} \to \frac{|f''(\alpha)|}{2|f'(\alpha)|}.$$

Proof of Local Convergence

$$\frac{|x^{k+1} - \alpha|}{|x^k - \alpha|^2} = \frac{|f''(\xi)|}{2|f'(x^k)|} \leq M \approx \frac{|f''(\alpha)|}{2|f'(\alpha)|}$$

$$|e^{k+1}| = |x^{k+1} - \alpha| \leq M |x^k - \alpha|^2 = \left(M|e^k|\right)|e^k|,$$

which will converge, $|e^{k+1}| < |e^k|$, if $M|e^k| < 1$. This will be true for all $k > 0$ if $|e^0| < M^{-1}$, leading us to conclude that Newton's method always converges quadratically if we start sufficiently close to a simple root; more precisely, if

$$|x^0 - \alpha| = |e^0| < M^{-1} \approx \left|\frac{2 f'(\alpha)}{f''(\alpha)}\right|.$$

Stopping Criteria

A good library function for root finding has to implement careful termination criteria. An obvious option is to terminate when the residual becomes small,

$$|f(x^k)| < \varepsilon,$$

which is only good for very well-conditioned problems, $|f'(\alpha)| \sim 1$. Another option is to terminate when the increment becomes small,

$$|x^{k+1} - x^k| < \varepsilon,$$

which is only good for rapidly converging iterations.

In practice

A robust but fast algorithm for root finding would combine bisection with Newton's method. Specifically, a method like Newton's, which can easily take huge steps in the wrong direction and land far from the current point, must be safeguarded by a method that ensures one does not leave the search interval and that the zero is not missed. Once $x^k$ is close to $\alpha$, the safeguard will not be used and quadratic or faster convergence will be achieved. Newton's method requires first-order derivatives, so often other methods are preferred that require function evaluations only. (Sketches of bisection and of a safeguarded Newton iteration follow below.)
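The following minimal MATLAB sketch implements the bisection loop just described; the function name, tolerance handling, and error message are our choices rather than anything prescribed by the slides.

function x = bisect(f, a, b, tol)
% Bisection sketch: assumes a sign change, f(a)*f(b) < 0, on entry.
if f(a)*f(b) >= 0, error('need f(a)*f(b) < 0'); end
while (b - a)/2 > tol
    x = (a + b)/2;            % midpoint x^k
    if f(a)*f(x) < 0          % sign change in the left half
        b = x;
    else                      % sign change in the right half (or f(x) = 0)
        a = x;
    end
end
x = (a + b)/2;
end

For example, bisect(@(x) x.^2 - 2, 1, 2, 1e-12) returns sqrt(2) to about 12 digits, gaining one bit of accuracy per function evaluation, i.e., linear convergence with factor 1/2.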
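And here is a sketch of the safeguarded combination described above (again our own illustration, with hypothetical names and tolerances): take the Newton step when it stays inside the current bracket, otherwise fall back to a bisection step, and terminate on either a small bracket or a small residual, the two stopping criteria discussed earlier.

function x = safe_newton(f, fp, a, b, tol)
% Safeguarded Newton sketch: assumes f(a)*f(b) < 0; fp is f'.
x = (a + b)/2;
while (b - a) > tol && abs(f(x)) > tol
    xn = x - f(x)/fp(x);                  % trial Newton step
    if ~isfinite(xn) || xn <= a || xn >= b
        xn = (a + b)/2;                   % bad or escaping step: bisect instead
    end
    if f(a)*f(xn) < 0                     % keep the half with the sign change
        b = xn;
    else
        a = xn;
    end
    x = xn;
end
end

Near a simple root the Newton step is always accepted and the iteration converges quadratically; far from the root the bracket still shrinks, so the zero cannot be missed.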
MATLAB's function fzero combines bisection, the secant method, and inverse quadratic interpolation, and is "fail-safe".

Find zeros of $a \sin(x) + b \exp(-x^2/2)$ in MATLAB:

% f = @mfile uses a function in an m-file
% Parameterized functions are created with:
a = 1; b = 2;
f = @(x) a*sin(x) + b*exp(-x^2/2); % handle

figure(1);
ezplot(f, [-5, 5]); grid

x1 = fzero(f, [-2, 0])
[x2, f2] = fzero(f, 2.0)

x1 = -1.227430849357917
x2 = 3.155366415494801
f2 = -2.116362640691705e-16

[Figure: plot of $f(x) = a\sin(x) + b\exp(-x^2/2)$ over $x \in [-5, 5]$, showing the sign changes at the two roots found above.]

Systems of Non-Linear Equations

Multi-Variable Taylor Expansion

We are after solving a square system of nonlinear equations for some variables $\mathbf{x}$:

$$\mathbf{f}(\mathbf{x}) = \mathbf{0} \quad \Leftrightarrow \quad f_i(x_1, x_2, \dots, x_n) = 0 \quad \text{for } i = 1, \dots, n.$$

It is convenient to focus on one of the equations, i.e., consider a scalar function $f(\mathbf{x})$. The usual Taylor series is replaced by

$$f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \mathbf{g}^T (\Delta\mathbf{x}) + \frac{1}{2} (\Delta\mathbf{x})^T \mathbf{H}\, (\Delta\mathbf{x}),$$

where the gradient vector is

$$\mathbf{g} = \nabla_{\mathbf{x}} f = \left[\frac{\partial f}{\partial x_1},\ \frac{\partial f}{\partial x_2},\ \cdots,\ \frac{\partial f}{\partial x_n}\right]^T$$

and the Hessian matrix is

$$\mathbf{H} = \nabla^2_{\mathbf{x}} f = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{ij}.$$

Newton's Method for Systems of Equations

It is much harder, if not impossible, to construct globally convergent methods like bisection in higher dimensions! A good initial guess is therefore a must when solving systems, and Newton's method can be used to refine the guess. The first-order Taylor series is

$$\mathbf{f}\left(\mathbf{x}^k + \Delta\mathbf{x}\right) \approx \mathbf{f}\left(\mathbf{x}^k\right) + \mathbf{J}\left(\mathbf{x}^k\right) \Delta\mathbf{x} = \mathbf{0},$$

where the Jacobian $\mathbf{J}$ has the gradients of $f_i(\mathbf{x})$ as rows:

$$\left[\mathbf{J}(\mathbf{x})\right]_{ij} = \frac{\partial f_i}{\partial x_j}.$$

So taking a Newton step requires solving a linear system:

$$\mathbf{J}\left(\mathbf{x}^k\right) \Delta\mathbf{x} = -\mathbf{f}\left(\mathbf{x}^k\right)$$

$$\mathbf{x}^{k+1} = \mathbf{x}^k + \Delta\mathbf{x} = \mathbf{x}^k - \mathbf{J}^{-1}\,\mathbf{f}\left(\mathbf{x}^k\right), \quad \text{where we denote } \mathbf{J} \equiv \mathbf{J}\left(\mathbf{x}^k\right).$$

(A small worked example in MATLAB follows at the end of this section.)

Convergence of Newton's method

Newton's method converges quadratically if started sufficiently close to a root $\mathbf{x}^\star$ at which the Jacobian is not singular:

$$\mathbf{e}^{k+1} = \mathbf{x}^{k+1} - \mathbf{x}^\star = \mathbf{x}^k - \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right) - \mathbf{x}^\star = \mathbf{e}^k - \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right),$$

but using a second-order Taylor series around $\mathbf{x}^\star$,

$$\mathbf{f}\left(\mathbf{x}^k\right) \approx \mathbf{f}\left(\mathbf{x}^\star\right) + \mathbf{J}\,\mathbf{e}^k + \frac{1}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\,\mathbf{e}^k = \mathbf{J}\,\mathbf{e}^k + \frac{1}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\,\mathbf{e}^k,$$

so that

$$\mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right) \approx \mathbf{e}^k + \frac{\mathbf{J}^{-1}}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\,\mathbf{e}^k$$

$$\left\|\mathbf{e}^{k+1}\right\| = \left\|\frac{\mathbf{J}^{-1}}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\,\mathbf{e}^k\right\| \leq \frac{\left\|\mathbf{J}^{-1}\right\| \left\|\mathbf{H}\right\|}{2} \left\|\mathbf{e}^k\right\|^2.$$

Problems with Newton's method

Newton's method requires solving many linear systems, which can become complicated when there are many variables. It also requires computing a whole matrix of derivatives, which can be expensive or hard to do (differentiation by hand?). Newton's method converges fast if the Jacobian $\mathbf{J}(\mathbf{x}^\star)$ is well-conditioned, otherwise it can "blow up". For large systems one can use so-called quasi-Newton methods:

- Approximate the Jacobian with another matrix $\tilde{\mathbf{J}}$ and solve $\tilde{\mathbf{J}}\,\Delta\mathbf{x} = -\mathbf{f}(\mathbf{x}^k)$.
- Damp the step by a step length $\alpha^k \lesssim 1$: $\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha^k\,\Delta\mathbf{x}$.
- Update $\tilde{\mathbf{J}}$ by a simple update, e.g., a rank-1 update (recall homework 2).

In practice

It is much harder to construct general robust solvers in higher dimensions, and some problem-specific knowledge is required. There is no built-in function for solving nonlinear systems in core MATLAB, but the Optimization Toolbox has fsolve.
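To illustrate the Newton step for systems described above, here is a small MATLAB example of our own (the test system, the intersection of the unit circle with a cubic, is not from the slides); note that the step solves $\mathbf{J}\,\Delta\mathbf{x} = -\mathbf{f}$ with backslash rather than forming $\mathbf{J}^{-1}$.

% Newton's method for the 2x2 system (our illustrative example):
%   f1 = x1^2 + x2^2 - 1 = 0   (unit circle)
%   f2 = x2 - x1^3       = 0   (cubic)
f = @(x) [x(1)^2 + x(2)^2 - 1; x(2) - x(1)^3];
J = @(x) [2*x(1), 2*x(2); -3*x(1)^2, 1];   % Jacobian: rows are gradients of f_i
x = [1; 1];                                % a good initial guess is a must!
for k = 1:20
    dx = -J(x) \ f(x);                     % solve J(x^k) dx = -f(x^k)
    x = x + dx;
    if norm(dx) < 1e-12, break; end        % increment-based stopping criterion
end
disp(x')   % approximately [0.8260, 0.5636]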
In many practical situations there is some continuity in the problem, so that a previous solution can be used as an initial guess. For example, implicit methods for differential equations have a time-dependent Jacobian $\mathbf{J}(t)$, and in many cases the solution $\mathbf{x}(t)$ evolves smoothly in time. For large problems, specialized sparse-matrix solvers need to be used. In many cases derivatives are not provided, but there are techniques for automatic differentiation.

Conclusions/Summary

Root finding is well-conditioned for simple roots (unit multiplicity), ill-conditioned otherwise. Methods for solving nonlinear equations are always iterative, and the order of convergence matters: second order is usually good enough. A good method uses a higher-order unsafe method, such as Newton's method, near the root, but safeguards it with something like the bisection method. Newton's method is second-order but requires derivative/Jacobian evaluations. In higher dimensions, having a good initial guess for Newton's method becomes very important. Quasi-Newton methods can alleviate the cost of solving the Jacobian linear system at every step (a rank-1 update sketch follows below).
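As a concrete instance of the rank-1 update idea, here is a minimal sketch of Broyden's method in MATLAB (our own illustration on the same 2x2 test system used earlier; the slides do not prescribe this particular code). The Jacobian is formed only once, at the initial guess, and is thereafter corrected by rank-1 updates instead of being recomputed.

% Broyden's quasi-Newton method (sketch) on the circle/cubic test system.
f  = @(x) [x(1)^2 + x(2)^2 - 1; x(2) - x(1)^3];
x  = [1; 1];
Jt = [2*x(1), 2*x(2); -3*x(1)^2, 1];   % exact Jacobian, but only at x^0
fx = f(x);
for k = 1:50
    dx = -Jt \ fx;                     % quasi-Newton step: Jt dx = -f(x^k)
    x  = x + dx;
    fnew = f(x);
    % Rank-1 (Broyden) update, chosen so that the new Jt maps the step dx
    % to the observed change in f: Jt_new * dx = fnew - fx.
    Jt = Jt + ((fnew - fx - Jt*dx) * dx') / (dx' * dx);
    fx = fnew;
    if norm(dx) < 1e-12, break; end
end
disp(x')   % converges to the same root as Newton, superlinearly

Each iteration costs one function evaluation and no new derivatives, which is the trade-off quasi-Newton methods offer for large systems.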