Some Surprising Errors in Numerical Differentiation
Sheldon P. Gordon
Department of Mathematics
Farmingdale State College
Farmingdale, NY 11735
gordonsp@farmingdale.edu
Abstract: Data analysis methods, both numerical and visual, are used to discover a
variety of surprising patterns in the errors associated with successive approximations to
the derivatives of sinusoidal and exponential functions based on the Newton difference-quotient.
l'Hôpital's Rule and Taylor polynomial approximations are then used to
explain why these surprising patterns occur.

Keywords: Numerical differentiation, error analysis, Taylor approximations, l'Hôpital's Rule
Introduction
As Richard Hamming, one of the giants of modern numerical analysis, put it, "The
purpose of computing is insight, not numbers" [3]. The ideas presented in this article are
certainly in the spirit of that comment, and many of them are usually encountered in a first
course in numerical analysis.
However, there are good pedagogical reasons to
incorporate some of them in first year calculus as well. For one, numerical methods is
one of the most important and useful fields of modern mathematics and it is desirable to
expose students to it early in their mathematical experiences. Moreover, these ideas can
provide students with some different perspectives and deeper insight into topics that they
do see in freshman calculus.
Unfortunately, relatively few mathematicians have had any training in numerical
analysis or have had the opportunity to teach a course in the subject. As such, they often
are not acquainted with, or comfortable with, some lovely ideas that can enrich courses
such as calculus while exposing students early on to a deeper appreciation of numerical
methods. In this article, we focus on ideas that can be incorporated into freshman
calculus in conjunction with a discussion of numerical differentiation and in subsequent
discussions of l'Hôpital's Rule and Taylor polynomial approximations. In the process,
we hope to highlight the kinds of important mathematical insights that can be gained by
examining the errors in numerical processes.
The conventional wisdom in both baseball and mathematics education is that
errors are terrible.
Coaches constantly impress on their players, and mathematics
instructors on their students, that errors are to be avoided at all costs. While this may be
sound advice on the baseball field, it is not always correct in mathematics. In particular,
in numerical analysis, understanding errors is an extremely valuable tool for gaining
insight into iterative methods, which often leads to the creation of more effective
numerical tools that converge more rapidly.
We will look at a number of simple examples that typically arise in freshman
calculus in the process of approximating the derivative of a function at a point using the
Newton difference-quotient. However, rather than focusing on the “answer”, we instead
look at the errors involved in the successive approximations and find some rather
surprising patterns and results when we apply some standard methods of data analysis.
We subsequently come back to each example to use some slightly more sophisticated
ideas from calculus to explain why these patterns actually occur.
Some Examples
In each of the following examples, we use the sequence of values
h = 0.1, 0.05, 0.025, … We leave it to the reader to investigate
what, if anything, might change if another sequence of values
for h were used instead.
Example 1: The derivative of f(x) = sin x at the origin. We use the Newton
difference-quotient

f'(x) \approx \frac{\sin(x + h) - \sin(x)}{h}

to approximate the derivative of the sine function at x = 0, so that

f'(0) \approx \frac{\sin(h)}{h}.

Table 1: Approximations to the derivative of sin x

    h          sin h/h
    0.1        0.99833417
    0.05       0.99958339
    0.025      0.99989584
    0.0125     0.99997396
    0.00625    0.99999349
    0.003125   0.99999837
    0.001563   0.99999959
    0.000781   0.99999990
    0.000391   0.99999997
    0.000195   0.99999999
With the indicated sequence of values for h, we obtain the approximations shown in
Table 1. It is clear that the ratio converges, and rather quickly at that, to 1. However,
instead of examining the values of the difference-quotient, let’s look at the resulting
values for the absolute values of the successive errors in these approximations,
\left| \frac{\sin h}{h} - f'(0) \right| = \left| \frac{\sin h}{h} - 1 \right|,

as shown in Table 2. In addition, we include an extra column showing the ratios of the
successive error values. Clearly, the error values converge rapidly to 0, which is what
we expect.
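Readers who wish to reproduce Table 2 can do so in a few lines of code. Here is a
minimal sketch in Python (the variable names are our own; a spreadsheet or graphing
calculator works just as well):

    import math

    # Step sizes h = 0.1, 0.05, 0.025, ... (each half the previous one)
    hs = [0.1 / 2**k for k in range(10)]

    # Difference-quotient approximations sin(h)/h to f'(0) = 1,
    # the absolute errors, and the ratios of successive errors
    approx = [math.sin(h) / h for h in hs]
    errors = [abs(a - 1.0) for a in approx]
    ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

    for k, h in enumerate(hs):
        ratio = f"{ratios[k]:.6f}" if k < len(ratios) else ""
        print(f"{h:10.6f}  {approx[k]:.8f}  {errors[k]:.4e}  {ratio}")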
However, some very important insights arise if we use the notion and methods of
data analysis. These ideas are among several significant new topics that have been
introduced into school mathematics and many reform versions of college courses such as
college algebra and precalculus. In many of these courses, these topics are the unifying
theme around which the entire course is built.
Unfortunately, this thread tends to
disappear completely when students reach calculus. By utilizing the data analysis ideas
as part of these investigations on limits and approximations to the derivative, we can
build on this increasingly important aspect of the precursor courses and hence build still
deeper understanding of the new material.
Consider the scatterplot of the error values plotted against the values of the step h,
as shown in Figure 1. The points not only approach 0 as h → 0, but they do so in a very
distinct manner, one that might suggest a power function pattern (since the points
appear to pass through the origin). Moreover, the pattern suggests that the power function
E = Ah^p has a power p that is greater than 1 (since the pattern is that of an increasing,
concave up function).

[Figure 1: Errors in the derivative approximations vs. h]

When we fit a power function to this data using the power
regression routine in Excel or on any graphing calculator, we get the function E =
0.166667h^2. The graph of this function is shown superimposed over the points in Figure
1 and it appears to be an excellent fit. Furthermore, the corresponding correlation
coefficient is r = 1, which indicates a virtually perfect fit between the log E and the log h
values, and hence suggests that the E vs. h power function fit is extremely good. That is,
the error values as a function of the step size seemingly lie on the curve E = h^2/6.
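The power regression itself amounts to fitting a line to the (log h, log E) pairs. A brief
sketch using NumPy's standard polyfit routine (again with our own variable names)
recovers both the coefficient and the power:

    import numpy as np

    hs = np.array([0.1 / 2**k for k in range(10)])
    errors = np.abs(np.sin(hs) / hs - 1.0)

    # Fit log E = p log h + log A, i.e., the power model E = A h^p
    p, logA = np.polyfit(np.log(hs), np.log(errors), 1)
    print(f"E ~ {np.exp(logA):.6f} * h^{p:.4f}")  # approximately 0.166667 * h^2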
Moreover, when you examine the third column in Table 2, which shows the
absolute values of the errors, you will likely notice that each entry appears to be roughly
one-quarter of the previous entry. We therefore included a fourth column in Table 2 to
show the ratios of the successive errors. It is evident from the entries in this column that
the successive ratios start very close to ¼ and converge to it extremely rapidly. That is,
each error is essentially one-quarter the preceding error whenever the step size is reduced
by one-half.
These apparent results suggest some related questions. What happens at points
other than the origin? What happens if the sequence of values for h is different? We
consider the latter issue later, but leave the former for the interested reader to investigate.
Table 2: Approximations to derivative of sin x, the size of the errors, and the ratios of successive errors

    h          sin h/h       |error|      ratio
    0.1        0.99833417    0.0016658    0.250094
    0.05       0.99958339    0.0004166    0.250023
    0.025      0.99989584    0.0001042    0.250006
    0.0125     0.99997396    2.604E-05    0.250001
    0.00625    0.99999349    6.51E-06     0.25
    0.003125   0.99999837    1.628E-06    0.25
    0.001563   0.99999959    4.069E-07    0.25
    0.000781   0.99999990    1.017E-07    0.25
    0.000391   0.99999997    2.543E-08    0.25
    0.000195   0.99999999    6.358E-09
Example 2: The derivative of f(x) = cos x at the origin. The Newton difference-quotient
for the cosine function to approximate its derivative when x = 0 is

f'(0) \approx \frac{\cos(h) - 1}{h}.
The corresponding values for our sequence of h-values, as well as the associated errors
and the ratios of the errors, are displayed in Table 3.
Table 3: Approximations to derivative of cos x, the size of the errors, and the ratios of successive errors

    h          (cos h - 1)/h   |error|      ratio
    0.1        -0.04995835     0.0499583    0.500313
    0.05       -0.02499479     0.0249948    0.500078
    0.025      -0.01249935     0.0124993    0.50002
    0.0125     -0.00624992     0.0062499    0.500005
    0.00625    -0.00312499     0.003125     0.500001
    0.003125   -0.0015625      0.0015625    0.5
    0.001563   -0.00078125     0.0007812    0.5
    0.000781   -0.00039062     0.0003906    0.5
    0.000391   -0.00019531     0.0001953    0.5
    0.000195   -9.7656E-05     9.766E-05
Clearly, the successive difference-quotient approximations approach 0 as h → 0, as
expected, and the errors likewise approach 0 as h → 0. We show the scatterplot of the
error values plotted against h in Figure 2 and see that the points seem to lie on a line
through the origin.

[Figure 2: Errors in derivative approximations vs. h]

The corresponding regression line is E = 0.4996h + 0.000003, as shown in Figure 2. The
associated correlation coefficient is r = 1. Alternatively, the power regression equation is
E = 0.4997h^0.9999 with an associated correlation coefficient of r = 1. Thus, either way, the
line E = ½h is an almost perfect fit to the error values.
Furthermore, from Table 3, the successive ratios of the errors converge to ½ quite
rapidly. So, in contrast to the sine function, each error is effectively one-half the previous
error whenever the step-size is reduced by half.
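A few lines of Python, a sketch along the same lines as before, confirm both observations
at once:

    import math

    hs = [0.1 / 2**k for k in range(10)]
    # Error |(cos h - 1)/h - f'(0)| with f'(0) = 0 for the cosine
    errors = [abs((math.cos(h) - 1.0) / h) for h in hs]

    for k in range(len(hs) - 1):
        print(f"h={hs[k]:.6f}  |E|/h={errors[k]/hs[k]:.6f}  "
              f"ratio={errors[k+1]/errors[k]:.6f}")
    # Both |E|/h and the ratio of successive errors approach 0.5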
Again, these results suggest a number of questions. Why is the pattern in the
errors linear here while it was quadratic for the sine function? What happens at points
other than the origin? What happens if the sequence of values for h changes?
Example 3: The derivative of f(x) = e^x at x = 0. We next consider the successive
approximations

f'(0) \approx \frac{e^h - 1}{h}

to the derivative of the exponential function f(x) = e^x at x = 0. The values based on our
sequence of h-values are displayed in Table 4. As expected, the approximations clearly
converge to 1 and the errors quickly approach 0. Further, the ratios of the successive
errors clearly approach ½. We show the error values plotted against the step-size in
Figure 3 and observe that the pattern seemingly is a linear one.

[Figure 3: Errors in approximations to the derivative of e^x vs. h]

The resulting linear regression equation is E = 0.516h - 0.00009, with a correlation
coefficient of r = 1, so the linear function is essentially a perfect fit. Alternatively, the
power regression equation is E = 0.5147h^1.0041 with an associated correlation coefficient
of r = 1. For values of h slightly above 0, this power function is virtually indistinguishable
from the linear regression function; the two certainly diverge from one another, albeit very
slowly, as h increases. Either way, we conclude that the pattern in the errors as a function
of h is the linear relationship E ≈ 0.515h when h is reasonably small.
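The analogous computation for the exponential function is a one-line change in the
earlier sketch:

    import math

    hs = [0.1 / 2**k for k in range(10)]
    # Error |(e^h - 1)/h - f'(0)| with f'(0) = 1 for e^x
    errors = [abs((math.exp(h) - 1.0) / h - 1.0) for h in hs]
    ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
    print(ratios)  # the entries increase toward 0.5, matching Table 4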
Table 4: Approximations to derivative of e^x, the size of the errors, and the ratios of successive errors

    h          (e^h - 1)/h     |error|      ratio
    0.1        1.051709181     0.0517092    0.491633
    0.05       1.025421928     0.0254219    0.495825
    0.025      1.012604821     0.0126048    0.497915
    0.0125     1.006276123     0.0062761    0.498958
    0.00625    1.003131521     0.0031315    0.499479
    0.003125   1.001564129     0.0015641    0.49974
    0.001563   1.000781657     0.0007817    0.49987
    0.000781   1.000390727     0.0003907    0.499935
    0.000391   1.000195338     0.0001953    0.499967
    0.000195   1.000097663     9.766E-05
Explaining the “Surprises”
We now revisit each of the above three examples to see precisely why the results we
found occur. In the process, we will also address some of the extended issues raised in
conjunction with each of the examples.
The derivative of the sine function: We begin with the question of why the successive
ratios of the errors converge to ¼. For a given value of h, the associated error E(h) is

E(h) = \frac{\sin h}{h} - f'(0) = \frac{\sin h}{h} - 1,

and the subsequent error E(h/2) based on a step of h/2 is

E(h/2) = \frac{\sin(h/2)}{h/2} - 1.
Consequently, the ratio of the successive errors is

\frac{E(h/2)}{E(h)} = \frac{\dfrac{\sin(h/2)}{h/2} - 1}{\dfrac{\sin h}{h} - 1} = \frac{2\sin(h/2) - h}{\sin h - h}.
To find the limit of this ratio as h → 0, we apply l'Hôpital's Rule repeatedly to get

\lim_{h \to 0} \frac{2\sin(h/2) - h}{\sin h - h} = \lim_{h \to 0} \frac{\cos(h/2) - 1}{\cos h - 1} = \lim_{h \to 0} \frac{-\tfrac{1}{2}\sin(h/2)}{-\sin h} = \lim_{h \to 0} \frac{-\tfrac{1}{4}\cos(h/2)}{-\cos h} = \frac{1}{4}.

Thus, in terms of the absolute values of the errors, we see that the limit of the successive
ratios is indeed ¼.
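Before (or after) invoking l'Hôpital's Rule, the limit can be checked numerically with a
quick sketch:

    import math

    for h in [0.1, 0.01, 0.001]:
        print(h, (2 * math.sin(h / 2) - h) / (math.sin(h) - h))
    # The printed ratios approach 0.25 as h shrinks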
While l’Hpital’s Rule is great for finding the value of the limit, it really does not
provide an understanding of why that value occurs. A far more insightful approach is to
use Taylor approximations. We then get, for the error based on a step of h,


h  h3!  h5!  ...  h  h3  h5  ...
sin h
2
4
E ( h) 
1 
 3! 5!
  h3!  h5!  ... ,
h
h
h
3
5
(1)
and with step h/2,

E(h/2) = \frac{\sin(h/2)}{h/2} - 1 = \frac{\left( \frac{h}{2} - \frac{h^3}{2^3 \, 3!} + \frac{h^5}{2^5 \, 5!} - \cdots \right) - \frac{h}{2}}{h/2} = -\frac{h^2}{4(3!)} + \frac{h^4}{16(5!)} - \cdots.

Therefore, when h is small, the ratio of the error terms is approximately

\frac{E(h/2)}{E(h)} = \frac{-\frac{h^2}{4(3!)} + \frac{h^4}{16(5!)} - \cdots}{-\frac{h^2}{3!} + \frac{h^4}{5!} - \cdots} \approx \frac{h^2 / (4 \cdot 3!)}{h^2 / 3!} = \frac{1}{4},   (2)

since all higher order terms tend to zero much more rapidly.
Incidentally, suppose that the sequence of values for h is chosen with a different
pattern, say where each successive value is h/k, where k is some number other than 2.
Consider the expression in the numerator of the first term of Equation (2). In the
denominator of the h^2 term, the 4 = 2^2 would be replaced by k^2; in the denominator of
the h^4 term, the 16 = 2^4 would be replaced by k^4; and so on. As a result, the expression
for the ratio of the errors would become

\frac{E(h/k)}{E(h)} = \frac{-\frac{h^2}{k^2(3!)} + \frac{h^4}{k^4(5!)} - \cdots}{-\frac{h^2}{3!} + \frac{h^4}{5!} - \cdots} \approx \frac{h^2 / (k^2 \cdot 3!)}{h^2 / 3!} = \frac{1}{k^2},

and so the limiting value for the ratio of the errors will be 1/k^2 instead of ¼.
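The 1/k² behavior is equally easy to test numerically; for instance, dividing h by k = 3 at
each step (a sketch):

    import math

    k = 3
    hs = [0.1 / k**j for j in range(8)]
    errors = [abs(math.sin(h) / h - 1.0) for h in hs]
    ratios = [errors[j + 1] / errors[j] for j in range(len(errors) - 1)]
    print(ratios)  # each ratio is close to 1/k**2 = 1/9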
We next consider the question of why the pattern in the absolute value of the error
values versus the step size turns out to be roughly E = h^2/6. This actually follows
immediately from Equation (1) above. When h is reasonably small, all of the higher
order terms in the power series on the right approach 0 very rapidly and so

\frac{\sin h}{h} - 1 = -\frac{h^2}{3!} + \frac{h^4}{5!} - \cdots \approx -\frac{h^2}{3!} = -\frac{h^2}{6}.
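This expansion is easy to confirm with a computer algebra system, for instance using
SymPy's series command:

    import sympy as sp

    h = sp.symbols('h')
    print(sp.series(sp.sin(h) / h - 1, h, 0, 6))
    # -h**2/6 + h**4/120 + O(h**6)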
The derivative of the cosine function: For a given step size h, the associated error in the
approximation is

E(h) = \frac{\cos h - 1}{h} - f'(0) = \frac{\cos h - 1}{h} - 0.
Therefore, using the Taylor approximation to the cosine, we find that

E(h) = \frac{\cos h - 1}{h} - 0 = \frac{\left( 1 - \frac{h^2}{2!} + \frac{h^4}{4!} - \cdots \right) - 1}{h} = -\frac{h}{2!} + \frac{h^3}{4!} - \cdots.

As h approaches 0, all higher order terms approach 0 much more rapidly and so, for small
h, we see that E ≈ -½h; as such, when we look at the absolute values, we get |E| ≈ ½h.
Next, let's look at the ratio of successive error terms. Corresponding to a step size
of h/2, we have

E(h/2) = \frac{\cos(h/2) - 1}{h/2},

and so the ratio of successive error terms is

\frac{E(h/2)}{E(h)} = \frac{\dfrac{\cos(h/2) - 1}{h/2}}{\dfrac{\cos h - 1}{h}} = 2 \left( \frac{\cos(h/2) - 1}{\cos h - 1} \right).
When we introduce the Taylor approximations, we find that

\frac{E(h/2)}{E(h)} = 2 \left( \frac{\cos(h/2) - 1}{\cos h - 1} \right) = 2 \left( \frac{\left( 1 - \frac{h^2}{4 \cdot 2!} + \frac{h^4}{16 \cdot 4!} - \cdots \right) - 1}{\left( 1 - \frac{h^2}{2!} + \frac{h^4}{4!} - \cdots \right) - 1} \right) = 2 \left( \frac{-\frac{h^2}{4 \cdot 2!} + \frac{h^4}{16 \cdot 4!} - \cdots}{-\frac{h^2}{2!} + \frac{h^4}{4!} - \cdots} \right).

When h is small, the higher order terms become insignificant and so the ratio of the
successive errors is approximately

2 \left( \frac{-\frac{h^2}{4 \cdot 2!}}{-\frac{h^2}{2!}} \right) = 2 \cdot \frac{1}{4} = \frac{1}{2}.
Furthermore, if the sequence of values for h is such that each successive value is a
fraction k of the current value instead of ½, it is clear from the above that the ratio of the
errors will approach k.
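Again, this is easy to test: with each successive step one-third of the previous one, for
example, the error ratios for the cosine computation settle near 1/3 (a sketch):

    import math

    k = 1.0 / 3.0
    hs = [0.1 * k**j for j in range(8)]
    errors = [abs((math.cos(h) - 1.0) / h) for h in hs]
    ratios = [errors[j + 1] / errors[j] for j in range(len(errors) - 1)]
    print(ratios)  # each ratio is close to k = 1/3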
The derivative of the exponential function: We now consider the previous results on
the errors associated with the derivative of the exponential function at x = 0. For a given
step size h, the associated error in the approximation is

E(h) = \frac{e^h - 1}{h} - f'(0) = \frac{e^h - 1}{h} - 1.
Therefore, using the Taylor approximation to the exponential function, we find that

E(h) = \frac{e^h - 1}{h} - 1 = \frac{\left( 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots \right) - 1}{h} - 1 = \left( 1 + \frac{h}{2!} + \frac{h^2}{3!} + \cdots \right) - 1 = \frac{h}{2!} + \frac{h^2}{3!} + \cdots.

As h approaches 0 and all the higher order terms approach 0, we see that E ≈ ½h.
Also, when we consider the ratios of the successive errors, we have, for a step size
of h/2,

E(h/2) = \frac{e^{h/2} - 1}{h/2} - 1,

and so the ratio of successive error terms is

\frac{E(h/2)}{E(h)} = \frac{\dfrac{e^{h/2} - 1}{h/2} - 1}{\dfrac{e^h - 1}{h} - 1} = \frac{2 \left( e^{h/2} - 1 - \frac{h}{2} \right)}{e^h - 1 - h}.
When we introduce the Taylor approximations, we find that

\frac{E(h/2)}{E(h)} = \frac{2 \left( e^{h/2} - 1 - \frac{h}{2} \right)}{e^h - 1 - h} = \frac{2 \left( \frac{h^2}{4 \cdot 2!} + \frac{h^3}{8 \cdot 3!} + \cdots \right)}{\frac{h^2}{2!} + \frac{h^3}{3!} + \cdots} = \frac{\frac{h^2}{2 \cdot 2!} + \frac{h^3}{4 \cdot 3!} + \cdots}{\frac{h^2}{2!} + \frac{h^3}{3!} + \cdots}.

When h is small, the higher order terms become insignificant and so the ratio of the
successive errors is approximately ½, as we discovered above.
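As a final check, a computer algebra system will evaluate the limit of this ratio directly;
in SymPy:

    import sympy as sp

    h = sp.symbols('h', positive=True)
    E = lambda s: (sp.exp(s) - 1) / s - 1  # error in the difference quotient
    print(sp.limit(E(h / 2) / E(h), h, 0))  # prints 1/2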
Pedagogical Considerations
The ideas presented above certainly can be, and in fact often are, introduced in a
first course in numerical analysis in the context of studying methods to approximate the
derivative at a point. However, the author strongly believes that there are a variety of
good reasons to introduce some of these ideas in calculus.
For one, as Hamming said, the purpose of computing is insight, not numbers. But
one can clearly extend this philosophy to state that the purpose of mathematics is insight,
not numbers. A standard assignment in calculus consists of a score of exercises in which
the students are asked to "evaluate the following limits using l'Hôpital's Rule." In each
one of these exercises, the objective is merely finding a number, not gaining insight. In
comparison, a single problem of the sort discussed above using l'Hôpital's Rule
provides as much practice as half a dozen exercises, but there is a purpose for going
through all that work: finding out why the seeming limit is in fact correct.
Similarly, in a discussion of Taylor approximations, the students are typically
presented with a list of a dozen or more exercises asking them to find the Taylor
polynomial approximation to a variety of functions, but there is little indication of why
one should want to know those polynomials. The value, both in mathematics and in
many other disciplines that use such approximations, is the insight that the approximation
provides about a process or a mathematical model. The approach outlined above can
provide calculus students with a better appreciation of that kind of mathematical analysis
compared to the repetitive process of constructing approximations to functions that few,
if any, people ever use in practice.
Moreover, as mentioned previously, numerical methods is one of the most
important branches of modern mathematics. As such, there is much to be gained by
exposing students, particularly those who might become interested in majoring in
mathematics or a related field, to such an important area.
Finally, as was mentioned above, the ideas on data analysis have become
extremely prevalent throughout the high school curriculum and in many college courses
below calculus. They provide some extremely powerful tools that give a very different
perspective on the practice and the learning of mathematics. However, their use comes to
a grinding halt when students reach calculus, which is very unfortunate, since most
students see these methods as being extremely useful and many feel highly empowered
by the ability to create functions based on data. The author has previously investigated a
variety of ways that data analysis ideas can be extended into the calculus curriculum,
including ways to discover the fundamental theorem of calculus [1] and to motivate
derivative formulas [2]. The ideas discussed here represent another way to extend this
data analysis theme up the mathematics curriculum.
Perhaps the ideal way to introduce the ideas discussed in this article would take
place if an instructor has a class for the full year. In that case, it would be natural to
introduce one or possibly two of the examples discussed here in class in the context of
introducing the notion of the derivative at a point via the Newton difference-quotient
early in Calculus I.
Additional examples of this nature could then be assigned as
homework problems or as a small group project. The instructor could devote a little extra
time looking at the errors associated with the successive approximations via the kinds of
initial explorations we did at the start of the article. This would provide the students with
a deeper understanding of what is happening in terms of the difference-quotient, as well
as to reinforce some fundamental notions of limits in a practical setting. However, the
reasons why those results arise could not yet be addressed. The best that could be done
at this stage is to treat the results as mysteries that require some deeper applications of
calculus, which would provide the students with some intriguing teasers about the need
to develop further ideas in calculus.
Later in the course, in the context of l'Hôpital's Rule, one can come back to
some of these examples to provide partial answers to the mysterious results previously
discovered; these would certainly be more "practical" than the usual collection of
exercises with l'Hôpital's Rule. Letting students see how the rule might actually be
needed is certainly a rather convincing argument for its value and importance.
Finally, in the context of Taylor series and Taylor approximations, one can
likewise come back to these examples to provide, once again, several more practical
applications that demonstrate the importance of the ideas and that simultaneously provide
additional opportunities for them to become familiar with, and more comfortable with,
the Taylor approximation formulas. It would also tie together ideas from early in calculus
to what many consider the climax of the first year of calculus.
References
1. Gordon, S. P. 2003. Using Data Analysis to Discover the Fundamental Theorem of
Calculus. PRIMUS, XIII(1), 85-91.
2. Gordon, S. P. and F. S. Gordon. 2002. Using Data Analysis to Motivate Derivative
Formulas. Mathematics and Computer Education, 36(3), 247-253.
3. Hamming, R. W. 1973. Numerical Methods for Scientists and Engineers, 2nd Ed.
New York: McGraw-Hill.