Some Surprising Errors in Numerical Differentiation

Sheldon P. Gordon
Department of Mathematics
Farmingdale State College
Farmingdale, NY 11735
gordonsp@farmingdale.edu

Abstract

Data analysis methods, both numerical and visual, are used to discover a variety of surprising patterns in the errors associated with successive approximations to the derivatives of sinusoidal and exponential functions based on the Newton difference quotient. l'Hôpital's Rule and Taylor polynomial approximations are then used to explain why these surprising patterns occur.

Keywords: numerical differentiation, error analysis, Taylor approximations, l'Hôpital's Rule

Introduction

As Richard Hamming, one of the giants of modern numerical analysis, put it, "The purpose of computing is insight, not numbers" [3]. The ideas presented in this article are certainly in the spirit of that comment, and many of them are usually encountered in a first course in numerical analysis. However, there are good pedagogical reasons to incorporate some of them into first-year calculus as well. For one, numerical methods is one of the most important and useful fields of modern mathematics, and it is desirable to expose students to it early in their mathematical experiences. Moreover, these ideas can provide students with some different perspectives and deeper insight into topics that they do see in freshman calculus.

Unfortunately, relatively few mathematicians have had any training in numerical analysis or have had the opportunity to teach a course in the subject. As such, they often are not acquainted with, or comfortable with, some lovely ideas that can enrich courses such as calculus while exposing students early on to a deeper appreciation of numerical methods. In this article, we focus on ideas that can be incorporated into freshman calculus in conjunction with a discussion of numerical differentiation and, subsequently, in discussions of l'Hôpital's Rule and Taylor polynomial approximations. In the process, we hope to highlight the kinds of important mathematical insights that can be gained by examining the errors in numerical processes.

The conventional wisdom in both baseball and mathematics education is that errors are terrible. Coaches constantly impress on their players, and mathematics instructors on their students, that errors are to be avoided at all costs. While this may be sound advice on the baseball field, it is not always correct in mathematics. In particular, in numerical analysis, understanding errors is an extremely valuable tool for gaining insight into iterative methods, which often leads to the creation of more effective numerical tools that converge more rapidly.

We will look at a number of simple examples that typically arise in freshman calculus in the process of approximating the derivative of a function at a point using the Newton difference quotient. However, rather than focusing on the "answer", we instead look at the errors involved in the successive approximations and find some rather surprising patterns and results when we apply some standard methods of data analysis. We subsequently come back to each example to use some slightly more sophisticated ideas from calculus to explain why these patterns actually occur.

Some Examples

In each of the following examples, we use the sequence of values h = 0.1, 0.05, 0.025, .... We leave it to the reader to investigate what, if anything, might change if another sequence of values for h were used instead.

Example 1: The derivative of f(x) = sin x at the origin.
We use the Newton difference quotient
\[
f'(x) \approx \frac{\sin(x+h) - \sin(x)}{h}
\]
to approximate the derivative of the sine function at x = 0, so that
\[
f'(0) \approx \frac{\sin h}{h}.
\]
With the indicated sequence of values for h, we obtain the approximations shown in Table 1.

Table 1: Approximations to the derivative of sin x

    h           sin h / h
    0.1         0.99833417
    0.05        0.99958339
    0.025       0.99989584
    0.0125      0.99997396
    0.00625     0.99999349
    0.003125    0.99999837
    0.001563    0.99999959
    0.000781    0.99999990
    0.000391    0.99999997
    0.000195    0.99999999

It is clear that the ratio converges, and rather quickly at that, to 1. However, instead of examining the values of the difference quotient, let us look at the absolute values of the successive errors in these approximations,
\[
\left| \frac{\sin h}{h} - f'(0) \right| = \left| \frac{\sin h}{h} - 1 \right|,
\]
as shown in Table 2, where we also include an extra column showing the ratios of successive error values (each ratio is |E(h/2)| / |E(h)| for the value of h in that row).

Table 2: Approximations to the derivative of sin x, the size of the errors, and the ratios of successive errors

    h           sin h / h     |error|       ratio
    0.1         0.99833417    0.0016658     0.250094
    0.05        0.99958339    0.0004166     0.250023
    0.025       0.99989584    0.0001042     0.250006
    0.0125      0.99997396    2.604E-05     0.250001
    0.00625     0.99999349    6.51E-06      0.25
    0.003125    0.99999837    1.628E-06     0.25
    0.001563    0.99999959    4.069E-07     0.25
    0.000781    0.99999990    1.017E-07     0.25
    0.000391    0.99999997    2.543E-08     0.25
    0.000195    0.99999999    6.358E-09

Clearly, the error values converge rapidly to 0, which is what we expect. However, some very important insights arise if we use the notions and methods of data analysis. These ideas are among several significant new topics that have been introduced into school mathematics and into many reform versions of college courses such as college algebra and precalculus. In many of these courses, these topics are the unifying theme around which the entire course is built. Unfortunately, this thread tends to disappear completely when students reach calculus. By utilizing data analysis ideas as part of these investigations of limits and approximations to the derivative, we can build on this increasingly important aspect of the precursor courses and hence build still deeper understanding of the new material.

Consider the scatterplot of the error values plotted against the values of the step size h, as shown in Figure 1. The points not only approach 0 as h → 0, but they do so in a very distinct manner, one that suggests a power-function pattern E = Ah^p (since the points appear to pass through the origin). Moreover, the pattern suggests that the power p is greater than 1 (since the pattern is that of an increasing, concave-up function).

[Figure 1: Errors in the derivative approximations vs. h]

When we fit a power function to these data using the power regression routine in Excel or on any graphing calculator, we get the function E = 0.166667h^2. The graph of this function, superimposed over the points in Figure 1, appears to be an excellent fit. Furthermore, the corresponding correlation coefficient is r = 1, which indicates a virtually perfect fit between the log E and log h values, and hence suggests that the power-function fit of E versus h is extremely good. That is, the error values as a function of the step size seemingly lie on the curve E = h^2/6.

Moreover, when you examine the third column in Table 2, which shows the absolute values of the errors, you will likely notice that each entry appears to be roughly one-quarter of the previous entry. We therefore included a fourth column in Table 2 to show the ratios of the successive errors. It is evident from the entries in this column that the successive ratios start very close to ¼ and converge to it extremely rapidly. That is, each error is essentially one-quarter of the preceding error whenever the step size is reduced by one-half.
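For readers who would like to reproduce Tables 1 and 2, a few lines of code suffice. The sketch below is in Python; the helper `table` and its formatting are our own, not something the article itself uses (the original computations could equally well be done in a spreadsheet or on a calculator).

```python
import math

def table(f, fprime0, hs):
    """Print the difference-quotient approximations to f'(0), the absolute
    errors, and the ratios of successive errors, one row per step size."""
    approxs = [(f(h) - f(0.0)) / h for h in hs]   # Newton difference quotients at x = 0
    errors = [abs(a - fprime0) for a in approxs]  # absolute errors
    for i, (h, a, e) in enumerate(zip(hs, approxs, errors)):
        ratio = f"{errors[i + 1] / e:.6f}" if i + 1 < len(errors) else ""
        print(f"{h:<10.6f} {a:<12.8f} {e:<12.3e} {ratio}")

hs = [0.1 / 2**k for k in range(10)]   # h = 0.1, 0.05, 0.025, ...
table(math.sin, 1.0, hs)               # errors shrink by a factor of about 1/4 per halving
```

Substituting `math.cos` with derivative value 0.0, or `math.exp` with derivative value 1.0, reproduces the corresponding tables in the later examples, since the same difference quotient (f(h) − f(0))/h applies in each case.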
These apparent results suggest some related questions. What happens at points other than the origin? What happens if the sequence of values for h is different? We consider the latter issue later, but leave the former for the interested reader to investigate.

Example 2: The derivative of f(x) = cos x at the origin.

The Newton difference quotient used to approximate the derivative of the cosine function at x = 0 is
\[
f'(0) \approx \frac{\cos(h) - 1}{h}.
\]
The corresponding values for our sequence of h-values, as well as the associated errors and the ratios of the errors, are displayed in Table 3.

Table 3: Approximations to the derivative of cos x, the size of the errors, and the ratios of successive errors

    h           (cos h - 1)/h    |error|      ratio
    0.1         -0.04995835      0.0499583    0.500313
    0.05        -0.02499479      0.0249948    0.500078
    0.025       -0.01249935      0.0124993    0.50002
    0.0125      -0.00624992      0.0062499    0.500005
    0.00625     -0.00312499      0.003125     0.500001
    0.003125    -0.0015625       0.0015625    0.5
    0.001563    -0.00078125      0.0007812    0.5
    0.000781    -0.00039062      0.0003906    0.5
    0.000391    -0.00019531      0.0001953    0.5
    0.000195    -9.7656E-05      9.766E-05

Clearly, the successive difference-quotient approximations approach 0 as h → 0, as expected, and the errors likewise approach 0. We show the scatterplot of the error values plotted against h in Figure 2 and see that the points seem to lie on a line through the origin.

[Figure 2: Errors in the derivative approximations vs. h]

The corresponding regression line is E = 0.4996h + 0.000003, as shown in Figure 2, and the associated correlation coefficient is r = 1. Alternatively, the power regression equation is E = 0.4997h^0.9999, also with a correlation coefficient of r = 1. Thus, either way, the line E = ½h is an almost perfect fit to the error values. Furthermore, from Table 3, the successive ratios of the errors converge to ½ quite rapidly. So, in contrast to the sine function, each error is effectively one-half of the previous error whenever the step size is reduced by half.

Again, these results suggest a number of questions. Why is the pattern in the errors linear here while it was quadratic for the sine function? What happens at points other than the origin? What happens if the sequence of values for h changes?

Example 3: The derivative of f(x) = e^x at x = 0.

We next consider the successive approximations
\[
f'(0) \approx \frac{e^h - 1}{h}
\]
to the derivative of the exponential function f(x) = e^x at x = 0. The values based on our sequence of h-values are displayed in Table 4. As expected, the approximations clearly converge to 1 and the errors quickly approach 0. Further, the ratios of the successive errors clearly approach ½.

We show the error values plotted against the step size in Figure 3 and observe that the pattern is seemingly linear.

[Figure 3: Errors in the approximations to the derivative of e^x vs. h]

The resulting linear regression equation is E = 0.516h − 0.00009, with a correlation coefficient of r = 1, so the linear function is essentially a perfect fit. Alternatively, the power regression equation is E = 0.5147h^1.0041, with an associated correlation coefficient of r = 1. For values of h slightly above 0, this power function is virtually indistinguishable from the linear regression function; the two certainly diverge from one another, albeit very slowly, as h increases. Either way, we conclude that the pattern in the errors as a function of h is the linear relationship E ≈ 0.515h when h is reasonably small.

Table 4: Approximations to the derivative of e^x, the size of the errors, and the ratios of successive errors

    h           (e^h - 1)/h      |error|      ratio
    0.1         1.051709181      0.0517092    0.491633
    0.05        1.025421928      0.0254219    0.495825
    0.025       1.012604821      0.0126048    0.497915
    0.0125      1.006276123      0.0062761    0.498958
    0.00625     1.003131521      0.0031315    0.499479
    0.003125    1.001564129      0.0015641    0.49974
    0.001563    1.000781657      0.0007817    0.49987
    0.000781    1.000390727      0.0003907    0.499935
    0.000391    1.000195338      0.0001953    0.499967
    0.000195    1.000097663      9.766E-05
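The power and linear fits quoted in these examples come from the regression routines in Excel or on a graphing calculator; the same numbers can be recovered with an ordinary least-squares line through the points (log h, log E), which is how those routines implement power regression. Here is a minimal sketch assuming NumPy is available; `power_fit` is our own hypothetical helper, not a routine from any of the tools named above.

```python
import numpy as np

def power_fit(hs, errors):
    """Fit E = A * h**p by least squares on the logged data, the way
    calculator power-regression routines do; also return the correlation
    coefficient of log E versus log h."""
    logh, logE = np.log(hs), np.log(errors)
    p, logA = np.polyfit(logh, logE, 1)     # slope = power p, intercept = log A
    r = np.corrcoef(logh, logE)[0, 1]
    return np.exp(logA), p, r

hs = [0.1 / 2**k for k in range(10)]
errs_sin = [abs(np.sin(h) / h - 1.0) for h in hs]
errs_exp = [abs((np.exp(h) - 1.0) / h - 1.0) for h in hs]
print(power_fit(hs, errs_sin))   # A ≈ 1/6, p ≈ 2: the curve E = h^2/6
print(power_fit(hs, errs_exp))   # A ≈ 0.51, p ≈ 1: essentially the line E = 0.515h
```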
Explaining the "Surprises"

We now revisit each of the three examples above to see precisely why the results we found occur. In the process, we will also address some of the extended issues raised in conjunction with each of the examples.

The derivative of the sine function

We begin with the question of why the successive ratios of the errors converge to ¼. For a given value of h, the associated error E(h) is
\[
E(h) = \frac{\sin h}{h} - f'(0) = \frac{\sin h}{h} - 1,
\]
and the subsequent error E(½h) based on a step of ½h is
\[
E(\tfrac12 h) = \frac{\sin(\tfrac12 h)}{\tfrac12 h} - 1.
\]
Consequently, the ratio of the successive errors is
\[
\frac{E(\tfrac12 h)}{E(h)}
= \frac{\dfrac{\sin(\tfrac12 h)}{\tfrac12 h} - 1}{\dfrac{\sin h}{h} - 1}
= \frac{\dfrac{2\sin(\tfrac12 h) - h}{h}}{\dfrac{\sin h - h}{h}}
= \frac{2\sin(\tfrac12 h) - h}{\sin h - h}.
\]
To find the limit of this ratio as h → 0, we apply l'Hôpital's Rule repeatedly to get
\[
\lim_{h \to 0} \frac{2\sin(\tfrac12 h) - h}{\sin h - h}
= \lim_{h \to 0} \frac{\cos(\tfrac12 h) - 1}{\cos h - 1}
= \lim_{h \to 0} \frac{\tfrac12 \sin(\tfrac12 h)}{\sin h}
= \lim_{h \to 0} \frac{\tfrac14 \cos(\tfrac12 h)}{\cos h}
= \tfrac14.
\]
Thus, in terms of the absolute values of the errors, we see that the limit of the successive ratios is indeed ¼.

While l'Hôpital's Rule is great for finding the value of the limit, it really does not provide an understanding of why that value occurs. A far more insightful approach is to use Taylor approximations. We then get, for the error based on a step of h,
\[
E(h) = \frac{\sin h}{h} - 1
= \frac{h - \frac{h^3}{3!} + \frac{h^5}{5!} - \cdots}{h} - 1
= -\frac{h^2}{3!} + \frac{h^4}{5!} - \cdots, \tag{1}
\]
and with step ½h,
\[
E(\tfrac12 h) = \frac{\sin(\tfrac12 h)}{\tfrac12 h} - 1
= \frac{\tfrac12 h - \frac{h^3}{8(3!)} + \frac{h^5}{32(5!)} - \cdots}{\tfrac12 h} - 1
= -\frac{h^2}{4(3!)} + \frac{h^4}{16(5!)} - \cdots.
\]
Therefore, when h is small, the ratio of the error terms is approximately
\[
\frac{E(\tfrac12 h)}{E(h)}
= \frac{\frac{h^2}{4(3!)} - \frac{h^4}{16(5!)} + \cdots}{\frac{h^2}{3!} - \frac{h^4}{5!} + \cdots}
\approx \frac{h^2/4(3!)}{h^2/3!} = \frac14, \tag{2}
\]
since all higher-order terms tend to zero much more rapidly.

Incidentally, suppose that the sequence of values for h is chosen with a different pattern, say where each successive value is kh, where k is some fraction other than ½. In the expansion of E(kh), the 4 = 2² in the denominator of the h² term would be replaced by (1/k)², the 16 = 2⁴ in the denominator of the h⁴ term would be replaced by (1/k)⁴, and so on. As a result, the expression for the ratio of the errors would become
\[
\frac{E(kh)}{E(h)}
= \frac{\frac{k^2 h^2}{3!} - \frac{k^4 h^4}{5!} + \cdots}{\frac{h^2}{3!} - \frac{h^4}{5!} + \cdots}
\approx \frac{k^2 h^2/3!}{h^2/3!} = k^2,
\]
and so the limiting value for the ratio of the errors will be k² instead of ¼ = (½)².

We next consider the question of why the pattern in the absolute values of the errors versus the step size turns out to be roughly E = h²/6. This follows immediately from Equation (1): when h is reasonably small, all of the higher-order terms in the power series on the right approach 0 very rapidly, and so
\[
E(h) = -\frac{h^2}{3!} + \frac{h^4}{5!} - \cdots \approx -\frac{h^2}{3!} = -\frac{h^2}{6},
\qquad\text{so that}\qquad |E(h)| \approx \frac{h^2}{6}.
\]
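Equation (1) can also be checked numerically: since E(h) ≈ −h²/6 for small h, the quotient |E(h)|/(h²/6) should approach 1 as h shrinks. A quick check in Python (the loop and its formatting are ours):

```python
import math

# Equation (1) predicts E(h) = sin(h)/h - 1 ≈ -h^2/6 for small h,
# so |E(h)| / (h^2/6) should approach 1 as h -> 0.
for h in [0.1 / 2**k for k in range(8)]:
    E = abs(math.sin(h) / h - 1.0)
    print(f"h = {h:.6f}   |E(h)| / (h^2/6) = {E / (h**2 / 6):.8f}")
```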
The derivative of the cosine function

For a given step size h, the associated error in the approximation is
\[
E(h) = \frac{\cos h - 1}{h} - f'(0) = \frac{\cos h - 1}{h} - 0 = \frac{\cos h - 1}{h}.
\]
Therefore, using the Taylor approximation to the cosine, we find that
\[
E(h) = \frac{\left(1 - \frac{h^2}{2!} + \frac{h^4}{4!} - \cdots\right) - 1}{h}
= -\frac{h}{2!} + \frac{h^3}{4!} - \cdots.
\]
As h approaches 0, all higher-order terms approach 0 much more rapidly and so, for small h, we see that E ≈ −½h; as such, when we look at the absolute values, we get |E| ≈ ½h.

Next, let us look at the ratio of successive error terms. Corresponding to a step size of ½h, we have
\[
E(\tfrac12 h) = \frac{\cos(\tfrac12 h) - 1}{\tfrac12 h},
\]
and so the ratio of successive error terms is
\[
\frac{E(\tfrac12 h)}{E(h)}
= \frac{\dfrac{\cos(\tfrac12 h) - 1}{\tfrac12 h}}{\dfrac{\cos h - 1}{h}}
= \frac{2\left(\cos(\tfrac12 h) - 1\right)}{\cos h - 1}.
\]
When we introduce the Taylor approximations, we find that
\[
\frac{E(\tfrac12 h)}{E(h)}
= \frac{2\left(-\frac{h^2}{4(2!)} + \frac{h^4}{16(4!)} - \cdots\right)}{-\frac{h^2}{2!} + \frac{h^4}{4!} - \cdots}.
\]
When h is small, the higher-order terms become insignificant and so the ratio of the successive errors is approximately
\[
\frac{2\,h^2/4(2!)}{h^2/2!} = \frac12.
\]
Furthermore, if the sequence of values for h is such that each successive value is a fraction k of the current value instead of ½, it is clear from the above that the ratio of the errors will approach k.

The derivative of the exponential function

We now consider the previous results on the errors associated with the derivative of the exponential function at x = 0. For a given step size h, the associated error in the approximation is
\[
E(h) = \frac{e^h - 1}{h} - f'(0) = \frac{e^h - 1}{h} - 1.
\]
Therefore, using the Taylor approximation to the exponential function, we find that
\[
E(h) = \frac{\left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots\right) - 1}{h} - 1
= \frac{h}{2!} + \frac{h^2}{3!} + \cdots.
\]
As h approaches 0 and all the higher-order terms approach 0, we see that E ≈ ½h. Also, when we consider the ratios of the successive errors, we have, for a step size of ½h,
\[
E(\tfrac12 h) = \frac{e^{h/2} - 1}{\tfrac12 h} - 1,
\]
and so the ratio of successive error terms is
\[
\frac{E(\tfrac12 h)}{E(h)}
= \frac{\dfrac{e^{h/2} - 1 - \tfrac12 h}{\tfrac12 h}}{\dfrac{e^h - 1 - h}{h}}
= \frac{2\left(e^{h/2} - 1 - \tfrac12 h\right)}{e^h - 1 - h}.
\]
When we introduce the Taylor approximations, we find that
\[
\frac{E(\tfrac12 h)}{E(h)}
= \frac{2\left(\frac{h^2}{4(2!)} + \frac{h^3}{8(3!)} + \cdots\right)}{\frac{h^2}{2!} + \frac{h^3}{3!} + \cdots}
\approx \frac{2\,h^2/4(2!)}{h^2/2!} = \frac12.
\]
When h is small, the higher powers of h become insignificant, and so the ratio of the successive errors is approximately ½, as we discovered above.
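All three limiting ratios, together with the generalization from ½ to an arbitrary fraction k, can be confirmed numerically. The sketch below assumes h is small enough for the leading Taylor term to dominate yet large enough to avoid floating-point cancellation; `ratio` is our own helper.

```python
import math

# The Taylor analysis predicts that if each step is k times the previous one,
# the successive error ratios approach k^2 for sin x at 0 and k for cos x and e^x.
def ratio(err, h, k):
    return err(k * h) / err(h)   # |E(kh)| / |E(h)|

h = 1e-4
for k in (0.5, 0.1):
    print(ratio(lambda t: abs(math.sin(t) / t - 1.0), h, k))          # ≈ k^2
    print(ratio(lambda t: abs((math.cos(t) - 1.0) / t), h, k))        # ≈ k
    print(ratio(lambda t: abs((math.exp(t) - 1.0) / t - 1.0), h, k))  # ≈ k
```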
Pedagogical Considerations

The ideas presented above certainly can be, and in fact often are, introduced in a first course in numerical analysis in the context of studying methods to approximate the derivative at a point. However, the author strongly believes that there are a variety of good reasons to introduce some of these ideas in calculus. For one, as Hamming said, the purpose of computing is insight, not numbers; one can clearly extend this philosophy to state that the purpose of mathematics is insight, not numbers. A standard assignment in calculus consists of a score of exercises in which the students are asked to evaluate limits using l'Hôpital's Rule. In each of these exercises, the objective is merely finding a number, not gaining insight. In comparison, a single problem of the sort discussed above involving l'Hôpital's Rule provides as much practice as half a dozen exercises, but there is a purpose for going through all that work: finding out why the apparent limit is in fact correct.

Similarly, in a discussion of Taylor approximations, the students are typically presented with a list of a dozen or more exercises asking them to find the Taylor polynomial approximations to a variety of functions, but there is little indication of why one should want to know those polynomials. The value, both in mathematics and in the many other disciplines that use such approximations, is the insight that the approximation provides about a process or a mathematical model. The approach outlined above can give calculus students a better appreciation of that kind of mathematical analysis than the repetitive process of constructing approximations to functions that few, if any, people ever use in practice.

Moreover, as mentioned previously, numerical methods is one of the most important branches of modern mathematics. As such, there is much to be gained by exposing students, particularly those who might become interested in majoring in mathematics or a related field, to such an important area.

Finally, as mentioned above, the ideas of data analysis have become extremely prevalent throughout the high school curriculum and in many college courses below calculus. They provide some extremely powerful tools that give a very different perspective on the practice and the learning of mathematics. However, their use comes to a grinding halt when students reach calculus, which is very unfortunate, since most students see these methods as being extremely useful and many feel highly empowered by the ability to create functions based on data. The author has previously investigated a variety of ways that data analysis ideas can be extended into the calculus curriculum, including ways to discover the Fundamental Theorem of Calculus [1] and to motivate derivative formulas [2]. The ideas discussed here represent another way to extend this data analysis theme up the mathematics curriculum.

Perhaps the ideal way to introduce the ideas discussed in this article would be in a course where the instructor has the class for the full year. In that case, it would be natural to introduce one or possibly two of the examples discussed here in class in the context of introducing the notion of the derivative at a point via the Newton difference quotient early in Calculus I. Additional examples of this nature could then be assigned as homework problems or as a small group project. The instructor could devote a little extra time to looking at the errors associated with the successive approximations via the kinds of initial explorations we carried out at the start of this article. This would provide the students with a deeper understanding of what is happening in terms of the difference quotient, as well as reinforce some fundamental notions of limits in a practical setting. The reasons those results arise could not be addressed at this point, however; the best that could be done is to regard the results as mysteries that require some deeper applications of calculus, providing intriguing teasers that motivate the need to develop more ideas in calculus.

Later in the course, in the context of l'Hôpital's Rule, one can come back to some of these examples to provide partial answers to the mysterious results previously discovered; these would certainly be more "practical" than the usual collection of exercises with l'Hôpital's Rule. Letting students see how the rule might actually be needed is certainly a rather convincing argument for its value and importance.
Finally, in the context of Taylor series and Taylor approximations, one can likewise come back to these examples to provide, once again, several more practical applications that demonstrate the importance of the ideas and that simultaneously give students additional opportunities to become familiar, and more comfortable, with the Taylor approximation formulas. It would also tie together ideas from early in calculus with what many consider the climax of the first year of calculus.

References

1. Gordon, S. P. (2003). Using Data Analysis to Discover the Fundamental Theorem of Calculus. PRIMUS, XIII(1), 85-91.
2. Gordon, S. P. and Gordon, F. S. (2002). Using Data Analysis to Motivate Derivative Formulas. Mathematics and Computer Education, 36(3), 247-253.
3. Hamming, R. W. (1973). Numerical Methods for Scientists and Engineers, 2nd ed. New York: McGraw-Hill.