18.330 Lecture Notes: Numerical Differentiation

Homer Reid

February 25, 2014

Suppose we have a black-box function f(x). We can query this function for its value at any x, and we will get back a number, but we don't have an analytical formula for f(x). How do we estimate values of f'(x)?

Contents

1 Finite-difference approximations of the first derivative
  1.1 Forward differencing
  1.2 Backward differencing
  1.3 Centered differencing
  1.4 Higher-order finite-difference formulas
2 Finite-difference approximations of higher derivatives
3 Finite-differencing of multivariable functions
4 Finite-differencing as matrix-vector multiplication

1 Finite-difference approximations of the first derivative

1.1 Forward differencing

The definition of the derivative of a function f(x) at a point x is

    f'(x) \equiv \left.\frac{df}{dx}\right|_{x} = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}.    (1)

The simplest approach to numerical differentiation is simply to arrest the limiting process here and evaluate the RHS of (1) at a finite value of h. This defines what is known as the forward-finite-difference (FFD) (or just forward-difference) approximation to the derivative:

    f'_{FFD}(h; x) \equiv \frac{f(x+h) - f(x)}{h}.    (2)

It's easy to assess the error incurred by the forward-difference procedure. Recall that the Taylor-series expression for the quantity f(x+h) is

    f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + O(h^3).

Inserting this into (2), we find

    f'_{FFD}(h; x) = f'(x) + \frac{h}{2} f''(x) + O(h^2).    (3)

The first term on the RHS here is the quantity we are trying to compute, and everything else is an error term. Thus we have

    f'_{FFD}(h; x) - f'(x) = \frac{h}{2} f''(x) + O(h^2).    (4)

As usual in error analysis, this equation is not useful for giving us an actual number for the error, because we don't know how to evaluate f''(x). The only important thing is the h dependence: the error is linear in h, i.e. we have a first-order method. To obtain one more digit of accuracy (i.e. 10× smaller error) we must use a 10× smaller value of h.

1.2 Backward differencing

It may happen that values of f(x+h) are not available for positive h. This may happen, for example, if the point x lies at the right endpoint of the interval over which our function is computable or measurable. (I mean measurable in the experimental sense, not the sense of Lebesgue integration. Think of f(x) as a quantity reported by an experimental apparatus on which we can't turn the dial any further than some x_max.) Of course values of f(x+h) must exist for at least some nonzero range of positive h, since otherwise the derivative at x would not be defined, but those values may not be accessible to us for one reason or another. In this case, we can use backward differencing:

    f'_{BD}(h; x) \equiv \frac{f(x) - f(x-h)}{h}.    (5)

It's easy to show that backward-differencing, like forward-differencing, is a first-order method.
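Here is a minimal sketch of how to check this numerically, assuming f(x) = sin x as the test function (the helper names fwd_diff and bwd_diff are just illustrative, not part of the notes):

import numpy as np

def fwd_diff(f, x, h):
    # Forward-difference approximation (2) to f'(x).
    return (f(x + h) - f(x)) / h

def bwd_diff(f, x, h):
    # Backward-difference approximation (5) to f'(x).
    return (f(x) - f(x - h)) / h

x, exact = 1.0, np.cos(1.0)   # f(x) = sin(x), so f'(1) = cos(1)
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    err_f = abs(fwd_diff(np.sin, x, h) - exact)
    err_b = abs(bwd_diff(np.sin, x, h) - exact)
    print(f"h = {h:.0e}   forward error = {err_f:.2e}   backward error = {err_b:.2e}")

Each tenfold reduction in h reduces both errors by roughly a factor of ten, as predicted by (4).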
1.3 Centered differencing

Consider the Taylor-series expansions of f(x+h) and f(x-h):

    f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(x) + O(h^4)    (6a)

    f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(x) + O(h^4)    (6b)

Careful scrutiny of these equations reveals that by subtracting them and dividing by 2 we can kill off the second-derivative term (and in fact all even-derivative terms) that appeared in (3):

    \frac{f(x+h) - f(x-h)}{2} = h f'(x) + \frac{h^3}{6} f'''(x) + O(h^5).

Now just divide by h to obtain the centered-difference approximation to the derivative:

    f'_{CD}(h; x) \equiv \frac{f(x+h) - f(x-h)}{2h}.    (7)

The above analysis shows that

    f'_{CD}(h; x) - f'(x) = \frac{h^2}{6} f'''(x) + O(h^4).    (8)

Thus centered-differencing is a method of order 2.

1.4 Higher-order finite-difference formulas

Formulas like (2), (5), and (7) are known as finite-difference stencils: they are linear combinations of n function samples that approximate the derivative with pth-order convergence. The forward-difference, backward-difference, and centered-difference stencils have (n, p) = (2, 1), (2, 1), and (2, 2) respectively. By increasing the number of function samples n that we are willing to compute, it is easy to construct finite-difference stencils that achieve any desired convergence order p. All you have to do is write down the Taylor expansions of the quantities

    ..., f(x-2h), f(x-h), f(x), f(x+h), f(x+2h), ...

and construct clever weighted combinations of these that kill off successively higher-order terms in the error estimates of equations (3) and (8).

However, we generally don't carry out finite-differencing beyond the centered-difference case. The reason is that in constructing formulas of this type we are essentially constructing and differentiating a polynomial interpolant through data samples at uniformly-spaced intervals. As we have noted many times, this procedure is badly behaved due to the Runge phenomenon: the more you try to bend and squeeze a high-order polynomial to fit through evenly-spaced data points, the more it will bulge out in between the points. If you need a numerical-differentiation stencil that achieves a rapid convergence rate, a better idea is to use non-uniformly spaced points to construct and differentiate a polynomial interpolant. We will revisit this topic when we consider Chebyshev interpolation later in the course.

2 Finite-difference approximations of higher derivatives

We can play similar games to write down approximate formulas for higher derivatives. For example, go back to equations (6) and suppose that we add the two equations together instead of subtracting them:

    f(x+h) + f(x-h) = 2 f(x) + h^2 f''(x) + \frac{h^4}{12} f''''(x) + \cdots

Clearly all we have to do is subtract off 2f(x) and divide by h^2 to obtain an approximation to the second derivative:

    f''_{CD}(h; x) \equiv \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} = f''(x) + \frac{h^2}{12} f''''(x) + \cdots    (9)

We call this the "centered-difference" approximation to the second derivative; evidently it achieves second-order convergence.
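Here is a minimal sketch confirming the second-order convergence of (9), again assuming f(x) = sin x as the test function (the helper name second_deriv_cd is just illustrative):

import numpy as np

def second_deriv_cd(f, x, h):
    # Centered-difference approximation (9) to f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

x, exact = 1.0, -np.sin(1.0)   # f(x) = sin(x), so f''(1) = -sin(1)
for h in [1e-1, 1e-2, 1e-3]:
    err = abs(second_deriv_cd(np.sin, x, h) - exact)
    print(f"h = {h:.0e}   error = {err:.2e}")

Each tenfold reduction in h now reduces the error by roughly a factor of one hundred, consistent with the h^2 error term in (9).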
3 Finite-differencing of multivariable functions

Next suppose we want to differentiate a function of more than one variable, say f(x, y). If we are only interested in partial derivatives with respect to a single variable, we can just apply the formulas for the one-dimensional case with the other variables held fixed. For example:

    \left.\frac{\partial f}{\partial x}\right|_{(x,y)} \approx \frac{f(x+h, y) - f(x, y)}{h}    (first-order convergence)

    \left.\frac{\partial f}{\partial y}\right|_{(x,y)} \approx \frac{f(x, y+h) - f(x, y-h)}{2h}    (second-order convergence)

    \left.\frac{\partial^2 f}{\partial x^2}\right|_{(x,y)} \approx \frac{f(x-h, y) - 2f(x, y) + f(x+h, y)}{h^2}    (second-order convergence)

However, things get a little more interesting when we go to compute mixed partial derivatives. Consider, for example, the simultaneous double Taylor expansion of f(x, y):

    f(x+\Delta_x, y+\Delta_y) = f(x, y) + \Delta_x f_x(x, y) + \Delta_y f_y(x, y) + \frac{\Delta_x^2}{2} f_{xx}(x, y) + \Delta_x \Delta_y f_{xy}(x, y) + \frac{\Delta_y^2}{2} f_{yy}(x, y) + O(\Delta^3)

By writing out this equation for various possible choices of \Delta_x and \Delta_y and taking linear combinations of the results, it is possible to kill off various terms on the RHS to obtain stencils for various partial derivatives. You will explore this possibility in your problem set this week.

4 Finite-differencing as matrix-vector multiplication

Consider an interval [a, b] and a function f(x) that vanishes at the endpoints, i.e. f(a) = f(b) = 0. Suppose we have samples of f at a set of N evenly-spaced points between a and b. More specifically, break up the interval into N+1 segments of width

    h = \frac{b-a}{N+1}

and denote the endpoints of these segments and the values of f at those points by

    x_n = a + nh,   f_n = f(x_n),   n = 1, 2, ..., N.

(For convenience we will also use the notation x_0 = a and x_{N+1} = b.)

Figure 1: A set of N = 5 equally-spaced points in the interior of an interval [a, b]. The spacing between the points is h = (b - a)/(N + 1).

Now suppose we try to compute the second derivative of f at the points x_n using the second-order finite-difference stencil (9). We find

    f''(x_1) \approx \frac{1}{h^2}\Big[ f(x_0) - 2f(x_1) + f(x_2) \Big] = \frac{1}{h^2}\Big[ -2f(x_1) + f(x_2) \Big]    (10a)

(where we used the boundary condition f(x_0) = 0),

    f''(x_2) \approx \frac{1}{h^2}\Big[ f(x_1) - 2f(x_2) + f(x_3) \Big]    (10b)

    f''(x_3) \approx \frac{1}{h^2}\Big[ f(x_2) - 2f(x_3) + f(x_4) \Big]    (10c)

    ...

    f''(x_{N-1}) \approx \frac{1}{h^2}\Big[ f(x_{N-2}) - 2f(x_{N-1}) + f(x_N) \Big]    (10d)

    f''(x_N) \approx \frac{1}{h^2}\Big[ f(x_{N-1}) - 2f(x_N) \Big]    (10e)

where in the last line we used the boundary condition f(x_{N+1}) = 0.

It's convenient to write equations (10) in the form of a matrix-vector product:

    \begin{pmatrix} f_1'' \\ f_2'' \\ f_3'' \\ \vdots \\ f_{N-1}'' \\ f_N'' \end{pmatrix}
    \approx \frac{1}{h^2}
    \begin{pmatrix}
      -2 &  1 &  0 & \cdots &  0 &  0 \\
       1 & -2 &  1 & \cdots &  0 &  0 \\
       0 &  1 & -2 & \cdots &  0 &  0 \\
       \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
       0 &  0 &  0 & \cdots & -2 &  1 \\
       0 &  0 &  0 & \cdots &  1 & -2
    \end{pmatrix}
    \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{N-1} \\ f_N \end{pmatrix}    (11)

which we may write using matrix-vector notation in the form

    f'' = A f    (12)

where A is the N-by-N matrix in (11) (including the 1/h^2 prefactor) and f'' and f are the vectors of samples of the second derivative of f and of f itself.

The point of equations (11) and (12) is that the operation that takes f into f'' may be thought of as matrix multiplication. Among the important consequences of this observation is that it makes it easy to invert that operation, i.e. to recover f from f'':

    f = A^{-1} f''    (13)

The primary use of formulas like (13) is in the application of finite-difference differentiation to the solution of boundary-value problems and higher-order PDEs; the technique is known in the PDE world as the finite-difference method.
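Here is a minimal sketch of the matrix viewpoint, assuming the test function f(x) = sin(pi x) on [0, 1], which vanishes at both endpoints (the helper name second_diff_matrix and the choice N = 50 are just illustrative):

import numpy as np

def second_diff_matrix(N, h):
    # The N x N matrix A of equations (11)-(12), including the 1/h^2 prefactor.
    A = (np.diag(-2.0 * np.ones(N))
         + np.diag(np.ones(N - 1), k=1)
         + np.diag(np.ones(N - 1), k=-1))
    return A / h**2

a, b, N = 0.0, 1.0, 50
h = (b - a) / (N + 1)
x = a + h * np.arange(1, N + 1)            # interior points x_1, ..., x_N
f = np.sin(np.pi * x)                      # satisfies f(a) = f(b) = 0
f2_exact = -np.pi**2 * np.sin(np.pi * x)   # exact second derivative

A = second_diff_matrix(N, h)
print("max |A f - f''|      =", np.max(np.abs(A @ f - f2_exact)))              # equation (12)
print("max |A^(-1) f'' - f| =", np.max(np.abs(np.linalg.solve(A, f2_exact) - f)))   # equation (13)

Both errors are O(h^2). The second line is the sense in which (13) solves a boundary-value problem: given samples of f'', a single linear solve recovers samples of f.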
Extension to nontrivial boundary conditions

In the example above, we used the boundary conditions f(a) = f(b) = 0. This simplified equations (10a) and (10e). What if instead we have nontrivial boundary conditions

    f(a) = f_a,   f(b) = f_b

where f_a, f_b are some given numbers? In this case, equations (10a) and (10e) are respectively modified to read

    f''(x_1) \approx \frac{1}{h^2}\Big[ f_a - 2f(x_1) + f(x_2) \Big]    (14a)

    f''(x_N) \approx \frac{1}{h^2}\Big[ f(x_{N-1}) - 2f(x_N) + f_b \Big]    (14b)

and equation (11) is modified to look like this:

    \underbrace{\begin{pmatrix} f_1'' \\ f_2'' \\ f_3'' \\ \vdots \\ f_{N-1}'' \\ f_N'' \end{pmatrix}}_{f''}
    - \frac{1}{h^2}
    \underbrace{\begin{pmatrix} f_a \\ 0 \\ 0 \\ \vdots \\ 0 \\ f_b \end{pmatrix}}_{\Delta}
    \approx
    \underbrace{\frac{1}{h^2}\begin{pmatrix}
      -2 &  1 &  0 & \cdots &  0 &  0 \\
       1 & -2 &  1 & \cdots &  0 &  0 \\
       0 &  1 & -2 & \cdots &  0 &  0 \\
       \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
       0 &  0 &  0 & \cdots & -2 &  1 \\
       0 &  0 &  0 & \cdots &  1 & -2
    \end{pmatrix}}_{A}
    \underbrace{\begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{N-1} \\ f_N \end{pmatrix}}_{f}    (15)

What we have done here is to swing the terms involving f_a and f_b in (14) over to the other side of the equation in (15), that is, away from the side containing the unknowns and onto the side on which the known quantities reside. Note that the matrix A and the vectors f, f'' in this equation are unchanged from equations (11) and (12). All that happens is that the RHS of equation (13) is now augmented by an additional term:

    f = A^{-1}\Big[ f'' - \frac{1}{h^2}\Delta \Big]    (16)

where \Delta is the sparse vector in (15) containing the boundary values of f.
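Here is a minimal sketch of equation (16), assuming the test case f(x) = x^2 on [0, 1], for which f''(x) = 2 everywhere and the boundary values are f_a = 0, f_b = 1 (second differences of a quadratic are exact, so the recovery should succeed to machine precision):

import numpy as np

a, b, fa, fb = 0.0, 1.0, 0.0, 1.0        # boundary data for f(x) = x^2
N = 20
h = (b - a) / (N + 1)
x = a + h * np.arange(1, N + 1)

# The matrix A of (11)-(12), including the 1/h^2 prefactor.
A = (np.diag(-2.0 * np.ones(N))
     + np.diag(np.ones(N - 1), k=1)
     + np.diag(np.ones(N - 1), k=-1)) / h**2

Delta = np.zeros(N)                      # the sparse vector of boundary values in (15)
Delta[0], Delta[-1] = fa, fb

f2 = 2.0 * np.ones(N)                    # exact second derivative of x^2
f = np.linalg.solve(A, f2 - Delta / h**2)    # equation (16)
print(np.max(np.abs(f - x**2)))          # agrees with x^2 to machine precision

This is the basic template for the finite-difference solution of a two-point boundary-value problem: the differential equation supplies f'', the boundary conditions supply \Delta, and one linear solve produces f.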