APPLIED ANALYSIS
GREGORY D. LANDWEBER
Contents
1. The Real Numbers
2. Ordered Fields
3. Metric Spaces
4. Constructing √2
5. The Archimedean Property
6. Decimal Representations and Computation
7. Sequences
8. Cauchy Sequences
9. Convergence Theorems
10. Limits and Continuity
11. The Intermediate Value Theorem
12. The Extreme Value Theorem
13. Derivatives
14. The Mean Value Theorem
15. Taylor’s Theorem
15.1. Generalizing the Mean Value Theorem
15.2. Error Estimates
16. Series
17. Numerical Differentiation
17.1. The Derivative
17.2. Higher Derivatives
17.3. Richardson Extrapolation
18. Contractions
19. Fixed Point Iteration
20. Root Approximation
20.1. Newton’s Method
20.2. Secant Method
21. Numerical Integration
21.1. Riemann Sums
21.2. Trapezoid Method
21.3. Advanced Quadrature
22. Numerical ODEs
Date: May 10, 2014.
22.1. Euler’s Method
22.2. Taylor Series Method
22.3. Runge-Kutta Methods
1. The Real Numbers
What are the real numbers? In this course we will see several equivalent definitions of the
real numbers, but first we need to define the natural numbers, the integers, and the rational
numbers. We start with the natural numbers, which are our standard counting numbers
starting with 1. (There is some debate over whether the natural numbers should start with
1 or 0, with computer scientists preferring 0. Personally, I don’t care, as long as you make
it clear what you mean if it makes any difference.)
Definition 1.1 (The Peano Postulates). The set N of natural numbers satisfies the following
four axioms:
• There is an element 1 ∈ N.
• There is a successor map, succ : N → N, which is injective.
• The element 1 is not in the image of succ.
• The only subset S ⊂ N satisfying both (a) 1 ∈ S and (b) if n ∈ S then succ(n) ∈ S is S = N itself.
These axioms are what you might expect if you think about the natural numbers. When
you count, you typically start with 1, and you successively add 1, which is what the successor
map does. As you count, you never repeat any numbers, which is the injectivity, and you
never get back to 1 again, which among other things eliminates modular arithmetic. The
final axiom is the most complicated, and it says that you get all of the natural numbers
if you start with 1 and repeatedly apply the successor function. Or stating it differently,
the natural numbers are the smallest set satisfying the first three axioms. Without the last
axiom, the set of all positive real numbers would satisfy the other axioms, as would the union
of the natural numbers with the integers shifted by 1/2.
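As an aside, the successor-map picture can be toyed with directly in code. The following sketch (a toy model; all names are ours, not part of the notes) represents a natural number as either 1 or succ of another natural number, and defines addition recursively:

```python
# A toy model of the Peano postulates (illustrative only).
# Here 1 is the empty tuple, and succ wraps its argument in a
# one-element tuple, which is an injective operation.
ONE = ()

def succ(n):
    return (n,)

def add(m, n):
    # Recursive definition: m + 1 = succ(m), and m + succ(k) = succ(m + k).
    if n == ONE:
        return succ(m)
    return succ(add(m, n[0]))

two = succ(ONE)
three = succ(two)
five = succ(succ(three))
assert add(two, three) == five  # 2 + 3 = 5 in this model
```

The induction axiom is what guarantees that a recursion like `add`, defined at 1 and at each successor, is defined on all of N.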
With extra work, we could also define addition and multiplication of natural numbers and
show that they satisfy all their expected properties. To be complete, we should also show
that a set satisfying the Peano Postulates exists and is unique. However, this is neither a
course in foundations nor in algebra, so let’s continue on to the integers.
Although we can add and multiply natural numbers, we cannot always subtract them,
which requires negative numbers, as well as an additive identity 0. (In algebraic terms,
the natural numbers are a semi-ring, which satisfies all the properties of a ring except that it need not have an additive identity or additive inverses.) To construct the integers, we could
define them by simply tacking on 0 and negative numbers to the natural numbers:
Z = N ∪ {0} ∪ −N.
However, this makes working with addition and multiplication unnecessarily difficult, as all
our proofs would have to deal constantly with the three separate cases. Instead we construct
the integers using formal differences.
Definition 1.2. The integers Z are equivalence classes in N × N, writing ordered pairs (m, n) as formal differences m − n for m, n ∈ N, modulo the equivalence relation
(1.1) m − n ∼ p − q iff m + q = p + n
for m, n, p, q ∈ N.
We wanted subtraction, and this definition gives it to us. In fact, every integer is viewed as
a difference of two natural numbers. For instance the negative integer −2 can be represented
as the differences 1−3 or 2−4 or 3−5, etc., all of which are equivalent. In order to show that
two such formal differences are equivalent, the relation (1.1) algebraically transforms the new
operation of subtraction into addition, which we can do just fine using natural numbers.
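The relation (1.1) is mechanical to check. A short sketch (helper names are ours, not the text's) verifying that the representations of −2 above are all equivalent:

```python
# Integers as formal differences (m, n), read as m - n, per relation (1.1):
# (m, n) ~ (p, q) iff m + q = p + n, which uses only natural-number addition.
def equivalent(d1, d2):
    (m, n), (p, q) = d1, d2
    return m + q == p + n

# -2 represented as 1 - 3, 2 - 4, 3 - 5: all equivalent.
assert equivalent((1, 3), (2, 4))
assert equivalent((2, 4), (3, 5))
assert not equivalent((1, 3), (4, 2))  # 1 - 3 is not 4 - 2
```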
Although we can subtract in the integers (i.e., they are a ring with an additive identity
and additive inverses), we still cannot necessarily divide any two integers. However, we can
use essentially the same formal ordered pair technique that we used to construct the integers,
now constructing the rational numbers as formal quotients of integers.
Definition 1.3. The rational numbers Q are equivalence classes in Z × (Z − {0}), writing ordered pairs (p, q) as formal quotients p/q for p, q ∈ Z with q ≠ 0, modulo the equivalence relation
(1.2) m/n ∼ p/q iff mq = pn
for m, n, p, q ∈ Z.
This definition of the rational numbers is, in fact, exactly how we think of fractions. Every
fraction has a numerator and a denominator, where the numerator is an integer, and the
denominator is a non-zero integer. We know that fractions can be written multiple ways,
such as 1/2 = 2/4 = 3/6 etc., all of which are equivalent via the relation (1.2), which is just
cross-multiplying. And reducing a fraction to lowest terms is, in this language, choosing a
representative of the equivalence class. They never taught you that in elementary school!
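Cross-multiplication and reduction to lowest terms can likewise be sketched in code (helper names are ours; the gcd picks out the canonical representative):

```python
from math import gcd

# Formal quotients (p, q) with q != 0, per relation (1.2):
# m/n ~ p/q iff m*q == p*n, i.e., cross-multiplication.
def equivalent(f1, f2):
    (m, n), (p, q) = f1, f2
    return m * q == p * n

def lowest_terms(f):
    # Choosing the reduced representative of the equivalence class.
    p, q = f
    g = gcd(p, q)
    return (p // g, q // g)

assert equivalent((1, 2), (2, 4)) and equivalent((2, 4), (3, 6))
assert lowest_terms((3, 6)) == (1, 2)
```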
Now that we have made sense of fractions as equivalence classes of ordered pairs, go back
and take another look at Definition 1.2 to see if it makes any more sense to you. So far,
we started with the natural numbers, constructed the integers by formally subtracting, and
constructed the rational numbers by formally dividing. Algebraically, the rational numbers
are a field, which means we can do any kind of arithmetic. However, we run into problems with algebra, such as if we try to find the roots of the equation x² − 2 = 0. You probably showed in Proofs & Fundamentals that √2 is irrational. In addition, you probably know
that transcendental numbers such as π and e are irrational. But the real numbers are much
more than just the roots of polynomials (which are called algebraic numbers), or the short
list of popular irrational numbers arising from the study of geometry and compound interest.
To really understand the real numbers, we must first ask ourselves what do we need to do
using real numbers that we cannot already do with rational numbers? It turns out you have
already known the answer for years: it’s limits and calculus!
When we wanted to subtract, we defined the integers as formal differences of natural
numbers. When we wanted to divide, we defined the rational numbers as formal quotients of
integers (except for disallowing dividing by 0). Now that we want to be able to take limits,
we can define the real numbers as the formal limits of appropriate sequences of rational
numbers. But what does it mean for a sequence to converge? The standard definition is:
Definition 1.4. A sequence {an}∞n=1 converges to L, written limn→∞ an = L, if
∀ε > 0, ∃N ∈ N such that n ≥ N =⇒ |an − L| < ε.
This may look complicated to you now, with its double quantifiers and Greek letter epsilon.
In fact, a large part of understanding real analysis is mastering how to manipulate such
double quantifiers expressed in terms of ε, N, and later δ. We will study this definition in
much more detail later, but for now it means that as we progress through the sequence,
the terms get closer and closer to the limiting value. The problem with trying to use this
definition of a convergent sequence to construct the real numbers is that it is stated in terms
of the value L to which the sequence converges. This is fine when considering sequences that
converge to rational numbers, but we want to consider sequences of rational numbers that
may converge to irrational numbers. So how do we know if a rational sequence converges, if
there is no rational number that it converges to? Instead of actually converging, we consider
sequences that “should converge”:
Definition 1.5. A Cauchy sequence is a sequence {an}∞n=1 satisfying the property
∀ε > 0, ∃N ∈ N such that m, n ≥ N =⇒ |an − am| < ε.
Basically, as we progress through a Cauchy sequence, the terms get closer and closer to
each other. Although such sequences look like they should converge, a Cauchy sequence
of rational numbers does not necessarily converge to a rational number. For example, the
sequence of rational numbers
3, 3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141592, 3.1415926, 3.14159265, 3.141592653, . . .
does not converge to a rational number, since π is irrational. To get such sequences to
converge, we explicitly construct the real numbers out of sequences:
Definition 1.6. The real numbers R are equivalence classes of Cauchy sequences of rational numbers, modulo the equivalence relation
{an}∞n=1 ∼ {bn}∞n=1 iff limn→∞ (an − bn) = 0.
In other words, real numbers are sequences of rational numbers that should converge,
where two sequences are equivalent if they should converge to the same value.
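Definition 1.5 can be checked numerically for the truncated decimal expansions of π above: any two terms beyond the N-th agree to within 10^(1−N). A sketch using Python's Fraction for exact rational arithmetic (our code, not part of the notes):

```python
from fractions import Fraction

# Truncations of pi = 3.14159265358979... as exact rational numbers.
digits = "314159265358979"

def a(n):  # a(1) = 3, a(2) = 3.1, a(3) = 3.14, ...
    return Fraction(int(digits[:n]), 10 ** (n - 1))

# Cauchy check: for m, n >= N, the terms share the first N digits,
# so |a(n) - a(m)| < 10**(1 - N).
N = 5
for m in range(N, 12):
    for n in range(N, 12):
        assert abs(a(n) - a(m)) < Fraction(1, 10 ** (N - 1))
```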
2. Ordered Fields
The rational numbers and the real numbers are both fields, meaning they have addition and
multiplication operations, both of which satisfy the properties of an abelian group operation
(if you exclude 0 when considering multiplication), and which satisfy a distributive law. Or
in other words, all the standard rules of arithmetic hold. In addition, the rational numbers
and the real numbers are ordered, and we will now define precisely what that means.
Definition 2.1. An ordered field is a field with a < relation satisfying
• If x < y and y < z, then x < z (the Transitive Property).
• For any x and y, precisely one of x < y, x = y, or y < x must be true (the Trichotomy
Property).
• If x < y, then x + z < y + z (the Additive Property).
• If x < y and z > 0, then xz < yz (the Multiplicative Property).
The transitive property is a standard property of relations, and the additive and multiplicative properties describe how the less-than relation interacts with the field operations.
The trichotomy property tells us that an ordered field is totally ordered, in that we can
directly compare any two elements. This is in contrast to partially ordered sets, or posets,
where any two elements are not necessarily directly comparable.
Of course, an ordered field not only has a < relation, but also the corresponding relations
>, ≤, ≥. With these relations, we can define intervals, such as the open interval
(a, b) = {x | a < x and x < b},
and absolute values
|x| = x if x ≥ 0, and |x| = −x if x < 0.
Theorem 2.2. It follows immediately from the definition of an ordered field that:
• |x| ≥ 0,
• if 0 < x < y, then 1/x > 1/y (taking reciprocals of positive numbers reverses inequalities),
• if x < y and z < 0, then xz > yz (multiplying by a negative number reverses inequalities), and
• x² ≥ 0 (so in particular the complex numbers C cannot be an ordered field).
Proof. The proofs of these elementary facts are left to the reader.
The following two results often come in handy in real analysis:
Lemma 2.3. If x ≤ ε for all ε > 0, then x ≤ 0.
Proof. Suppose that x > 0. Taking ε = x/2, the multiplicative property gives us x/2 > 0, and the additive property gives us x > x/2. Then by the trichotomy property we cannot have x ≤ x/2 = ε.
Corollary 2.4. If |x| ≤ ε for all ε > 0, then x = 0.
It turns out that every ordered field contains the rational numbers. After all, a field must
contain 0 and 1, and it follows from the axioms of an ordered field that all elements of the
form 1 + 1 + · · · + 1 must be distinct. That gives us the natural numbers, and since a field
has additive and multiplicative inverses, we get the rational numbers as well. What is more
difficult to show is that the real numbers are in fact the only complete ordered field, where
complete means that all Cauchy sequences converge.
3. Metric Spaces
Definitions 1.4 and 1.5 are stated in terms of the absolute values |an − L| and |an − am |,
respectively. However, they really are not about absolute values in the sense of switching
minus signs to plus signs. Rather, they use the fact that |a − b| is the distance between a and
b. In fact, most of the definitions and results in this course do not require that we work with
the real numbers or even an ordered field, and can be converted from statements involving
absolute values to more general results about distances. To do this, we need a rigorous set
of axioms to characterize distance.
Definition 3.1. A metric space is a set X together with a distance function d : X × X → R
satisfying
• d(x, y) ≥ 0,
• d(x, y) = 0 if and only if x = y,
• d(x, y) = d(y, x) (the symmetric property), and
• d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality).
In our case we use the distance function d(x, y) = |x − y|. However, if we are working in R² we can use the Pythagorean theorem, defining the distance as the length of the hypotenuse
d((x1, y1), (x2, y2)) = √((x1 − x2)² + (y1 − y2)²),
and for a general Rⁿ, the distance between two vectors x and y is
d(x, y) = √(Σⁿᵢ₌₁ (xi − yi)²).
In fact, viewing the real numbers as R = R¹ we can rewrite the absolute value as
d(x, y) = |x − y| = √((x − y)²).
The Euclidean spaces Rn are not the only examples of metric spaces. We can also build
metric spaces by measuring distances on curved surfaces. For example, to measure the
distance between two points on the spherical surface of the earth, we measure the distance
along a great circle (a circle centered at the center of the earth, which may be familiar to you
as the route that airplanes fly). Understanding how curvature affects distance, and how to
determine the shortest paths between points (called a geodesic) is a big part of differential
geometry and general relativity.
We can also concoct strange and interesting distance functions satisfying the axioms of Definition 3.1. For example, we can define a discrete metric space by defining the distance to be
d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y,
which indeed satisfies all the axioms. With a discrete metric, all points are separated from
each other, which makes taking limits rather pointless. The field of topology is an axiomatic
study of distance functions and open sets, which generalize the open intervals we use on the
real line. Topology explores much broader notions of limits and continuity, with surprising
and beautiful results.
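For a finite set of points, the axioms of Definition 3.1 can be spot-checked mechanically. A small sketch (our code, not the text's) verifying them for the discrete metric:

```python
# Spot-check of the metric space axioms for the discrete metric
# d(x, y) = 0 if x == y, and 1 otherwise.
def d(x, y):
    return 0 if x == y else 1

points = ["a", "b", "c"]
for x in points:
    for y in points:
        assert d(x, y) >= 0                      # non-negativity
        assert (d(x, y) == 0) == (x == y)        # zero iff equal
        assert d(x, y) == d(y, x)                # symmetry
        for z in points:
            assert d(x, z) <= d(x, y) + d(y, z)  # triangle inequality
```

The triangle inequality holds here because the only way d(x, z) can exceed d(x, y) + d(y, z) is if the right side is 0, which forces x = y = z and hence d(x, z) = 0.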
4. Constructing √2
In Section 1, we constructed the real numbers as sequences of rational numbers. We know that √2 is an irrational number, so what is a sequence of rational numbers that converges to √2? There are many ways to do this, and we will see at least four different constructions of √2 later in the course.
The simplest way to construct √2 is to start with the two closest integers. Since 1² = 1 and 2² = 4, we know that √2 is between 1 and 2. If I were doing this, I would figure out the next digit, noticing that 1.4² = 1.96 and 1.5² = 2.25, so √2 lies between 1.4 and 1.5. On the other hand, if we want to be methodical about it, if we know that √2 is between 1 and 2, we could split the difference and consider the midpoint 1.5. Since 1.5² = 2.25, we know that √2 is between 1 and 1.5. Taking the midpoint again, we see that 1.25² = 1.5625, so √2 is between 1.25 and 1.5. Continuing this process, we obtain the sequence:
1, 1.5, 1.25, 1.375, 1.4375, 1.40625, . . .
which indeed converges to √2 = 1.41421356237309 . . ..
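The midpoint process is easy to automate. A sketch of the bisection method for x² = 2 (variable names are ours), which reproduces the midpoints 1.5, 1.25, 1.375, . . . that follow the initial endpoint 1:

```python
# Bisection method for sqrt(2): maintain a**2 < 2 < b**2 and halve the interval.
a, b = 1.0, 2.0
midpoints = []
for _ in range(6):
    m = (a + b) / 2
    midpoints.append(m)
    if m * m < 2:
        a = m  # sqrt(2) lies in [m, b]
    else:
        b = m  # sqrt(2) lies in [a, m]

# Matches the sequence in the text after the initial term 1.
assert midpoints[:5] == [1.5, 1.25, 1.375, 1.4375, 1.40625]
```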
This process is called the bisection method, since at each stage we bisect the interval. From a theoretical point of view, in order to do this we need to know that if a² < 2 and b² > 2, then a < √2 < b. This is a consequence of:
Theorem 4.1 (Intermediate Value Theorem). If the function f : [a, b] → R is continuous,
and d is between f (a) and f (b), then there exists c ∈ [a, b] such that f (c) = d.
We will define the notion of continuity later, but at this point you probably have an
intuitive idea of what it means from Calculus.
Now that the bisection method gives us a sequence that potentially converges to √2, we can use Definition 1.4 to prove that it converges. Given any ε > 0, we need to find an N such that |an − √2| < ε whenever n > N. Here a1 = 1, a2 = 1.5, a3 = 1.25, etc. At each stage, we know that √2 lies between an and an+1, and we also note that |an − an+1| = 1/2ⁿ, so we have
|an − √2| < 1/2ⁿ.
We now need to find N such that
1/2ᴺ < ε.
Solving for N, we obtain
N > − log₂ ε.
Since it is always possible to find a natural number N greater than − log₂ ε, we have shown that this sequence converges to √2.
5. The Archimedean Property
Actually, how do we know that given any real number, there always exists a greater
natural number? This is called the Archimedean Property, and while it seems obvious, it is
non-trivial to prove. We start with the analogous statement for the rational numbers:
Theorem 5.1 (Archimedean Property of Q). Let p/q ∈ Q be a rational number. Then there
exists a natural number n ∈ N such that n > p/q.
Proof. Let us assume that p/q > 0, since otherwise we have p/q < 0 < 1 and we are done.
Since p/q is positive, let us take both p > 0 and q > 0, and we observe that
p/q > 0 =⇒ 2p/q > p/q > 0.
Similarly, we obtain
3p/q > p/q > 0 and 4p/q > p/q > 0 etc.
Adding p/q a total of q times gives us
p = q · (p/q) > p/q,
where p is a positive integer, and thus a natural number.
Now that we have established the Archimedean Property for the rational numbers, we can
prove it for the real numbers using the fact that every real number is the limit of a sequence
of rational numbers.
Theorem 5.2 (Archimedean Property of R). For any real number x ∈ R, there exists a
natural number n ∈ N such that n > x.
Proof. Let x be a real number. We then have x = limn→∞ an, where {an}∞n=1 is a sequence of rational numbers. Since this sequence converges, we can choose ε = 1, and then there exists an N ∈ N so that whenever n > N, we have |an − x| < 1. It follows that x < an + 1. Since an is rational, we know that there exists a natural number M ∈ N such that an < M, and thus x < an + 1 < M + 1.
This proof introduces a technique that we will use repeatedly. Since the reals are limits
of sequences of rational numbers, given a real number we can always find a rational number
that is arbitrarily close. The technical term for this is that the rational numbers are dense
in the real numbers.
Instead of trying to find natural numbers that are larger than any given real number, we
can invert the problem and find rational numbers that are smaller than any given positive
real number.
Corollary 5.3. Given any ε > 0, there exists N ∈ N such that 1/N < ε.
This corollary is very useful in proving that sequences converge. Here are two examples,
both showing convergence to 0, although the Archimedean property can be used to prove
convergence for sequences with limits other than 0, too.
Example 5.4. To show that
limn→∞ 1/n = 0,
suppose we are given an ε > 0. The corollary to the Archimedean property says that there exists a natural number N such that 1/N < ε. Then for all n > N we have
1/n < 1/N < ε,
which proves convergence.
Example 5.5. To show that
limn→∞ 1/2ⁿ = 0,
suppose we are given an ε > 0. We would like to find N so that 1/2ᴺ < ε, and solving for N we obtain N > − log₂ ε. The Archimedean property says that we can always find such an N. Then for n ≥ N we have
1/2ⁿ ≤ 1/2ᴺ < ε,
which proves convergence. We could also prove that this sequence converges to 0 by using the Sequence Comparison Test (a corollary of the Squeeze Theorem) below. We notice that
0 < 1/2ⁿ < 1/n,
and taking limits gives us
0 ≤ limn→∞ 1/2ⁿ ≤ limn→∞ 1/n = 0,
which squeezes the limit between two values, both of which are 0.
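The ε-to-N bookkeeping in these examples can be done mechanically: given a tolerance ε, any integer N > − log₂ ε works for Example 5.5. A sketch (variable names are ours):

```python
import math

# For a given eps, Example 5.5 asks for N with 1/2**N < eps,
# i.e., any integer N > -log2(eps).
eps = 1e-6
N = math.floor(-math.log2(eps)) + 1  # smallest integer exceeding -log2(eps)
assert 1 / 2**N < eps

# Every later term is at least as small, as in the proof.
for n in range(N, N + 10):
    assert 1 / 2**n <= 1 / 2**N < eps
```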
6. Decimal Representations and Computation
Our Definition 1.6 of the real numbers in terms of Cauchy sequences is the most commonly used construction of the real numbers. The process of extending a metric space by
considering equivalence classes of Cauchy sequences is called taking the Cauchy completion.
It is, however, not the only way to define the real numbers. The real numbers can also be
defined axiomatically, for instance as a complete ordered field, in which all Cauchy sequences
converge. That’s a little different than constructing the reals out of Cauchy sequences, and
proving that our Cauchy completion is in fact complete is surprisingly non-trivial.
Another axiomatic definition of the real numbers is as an ordered field satisfying the least upper bound property, where any set that is bounded above in fact has a least upper
bound. You can also construct the real numbers as Dedekind cuts, which split the entirety
of the rational numbers into lower and upper sets.
A more practical construction of the real numbers is via infinite decimal expansions:
nk . . . n3 n2 n1 .d1 d2 d3 d4 . . . ,
where there are finitely many digits to the left of the decimal place and countably infinitely
many digits to the right. Of course all the digits are in the range from 0 through 9. We
note that a terminating decimal can still be considered to be an infinite decimal expansion,
where all the di digits are 0 past some point.
There are two problems with constructing real numbers via such infinite decimals. First,
it turns out that infinite decimals are not unique. For instance, we can show that
0.9999 . . . = 1.0000 . . . .
Fortunately, all pairs of distinct decimal expansions corresponding to the same number are
of this form, and we can fix that by taking the appropriate equivalence classes.
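One way to see this identity: the truncation with k nines, 0.9…9, equals 1 − 10⁻ᵏ, so the truncations of 0.999… converge to 1.000… . A quick check with exact rational arithmetic (our sketch, not from the notes):

```python
from fractions import Fraction

# Partial sums of 0.9 + 0.09 + 0.009 + ... ; the k-th sum is 1 - 10**-k,
# so the sequence of truncations of 0.999... converges to 1.
s = Fraction(0)
for k in range(1, 20):
    s += Fraction(9, 10 ** k)
    assert Fraction(1) - s == Fraction(1, 10 ** k)
```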
The other problem with infinite decimals is that they are expressed in base 10, a consequence of the fact that most humans have 10 fingers and 10 toes. That is an arbitrary choice,
and we could just as easily express numbers as infinite strings of digits in other bases. For
instance, a computer uses binary representations of floating point numbers in base 2. It does
not matter what base you use; the real numbers are the same regardless.
But what is really happening when we consider a decimal expansion? As we saw before,
the number
π = 3.1415926535 . . .
is the limit of the Cauchy sequence
{3, 3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141592, 3.1415926, 3.14159265, 3.141592653, . . .}.
The same is true for every infinite decimal expansion, so such decimal numbers are indeed
real numbers. The converse is more difficult, showing that every Cauchy sequence of rational
numbers is equivalent to a decimal expansion.
Another way to view the decimal expansion of a real number is as giving a sequence of
intervals containing that number. For example, when we say that π ≈ 3, we mean that π is
between 3 and 4, and when we say that π ≈ 3.1, we mean that π is between 3.1 and 3.2. In
fact, we can express π as the unique point in the intersection of all such intervals:
{π} = [3, 4) ∩ [3.1, 3.2) ∩ [3.14, 3.15) ∩ [3.141, 3.142) ∩ [3.1415, 3.1416) ∩ · · · .
Later, we will prove the Nested Intervals Theorem, which says that any such sequence of
nested (closed) intervals whose sizes shrink to 0 contains precisely one real number. In fact,
we could even define the real numbers as an ordered field satisfying a Nested Intervals Axiom.
This approach of bounding a real number inside ever shrinking intervals mirrors how real
numbers are used in real life. Have you ever actually used a real number? No measurement you have ever taken has been an irrational number, but instead was a fraction or a
finite decimal. Why is that? It is because your measuring equipment does not have infinite accuracy, so you are forced to use an approximation. In chemistry, people talk about
significant figures. If you say you have 2.54 moles of a molecule, it is understood that the
actual value is between 2.535 and 2.545 moles, so your approximation has an error of at most
0.005. In statistics, you might conclude that Democrats are leading Republicans by 53% to
47%, but that there is a margin of error of 2%, and most good statistical estimates usually
include a standard deviation term. Other scientific measurements typically come with an
error estimate, too, due to the limitations of the equipment. Part of the scientific method is
that experiments must be reproducible, with the understanding that a repeated experiment
should give a value within your original margin of error, and that as you use more and more
accurate measuring equipment, your margin of error decreases. Sound familiar?
Is the problem with science that we have not yet figured out how to build a contraption that
can measure with infinite precision? Surprisingly, quantum mechanics tells us that it will
never be possible to measure anything with infinite accuracy, because particles themselves
are not localized at specific points, but rather smeared over small regions as wave functions.
So, is science doomed? No, that’s just how real numbers work! If we define the real numbers
in terms of a Nested Intervals Axiom, the reals are numbers that can be approximated to
within whatever error you are willing to tolerate, but which can always be refined further.
This is the approach we will take with our numerical computations. Every computational
algorithm we will discuss not only provides a sequence of better and better approximations
to whatever we are trying to compute, but also comes with a bound on the error at each
step. If we are given an error tolerance, such as wanting to compute π to 10 decimal places,
we find a sequence that approaches π and keep computing until our error bound is less than
10−10 . This way we can compute any real value we want by providing a sequence of ever
improving approximations, since that is what real numbers are.
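As an illustration of this style of computation (our example, not one from the notes): the alternating series 1 − 1/2 + 1/3 − ⋯ converges to ln 2, and the alternating series error estimate bounds the error of a partial sum by the first omitted term, so we can keep adding terms until that bound drops below a given tolerance:

```python
import math

# Approximate ln(2) = 1 - 1/2 + 1/3 - ... , stopping once the guaranteed
# error bound (the size of the first omitted term) is below the tolerance.
tol = 1e-5
s = 0.0
n = 1
while 1.0 / n >= tol:        # error after the terms so far is at most 1/n
    s += (-1) ** (n + 1) / n
    n += 1
assert abs(s - math.log(2)) < tol
```

The point is not efficiency (this particular series converges slowly) but the structure shared by all the algorithms to come: an approximating sequence together with a computable error bound.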
7. Sequences
A sequence is an ordered, countably infinite collection of numbers, which is typically
denoted by a subscripted variable, as in
a1 , a2 , a3 , a4 , a5 , . . . .
More compactly, we can write such a sequence as {an}∞n=1. Although sequences are typically indexed by natural numbers starting with 1, we often encounter sequences with indexes starting at 0, particularly when we consider series. In general, the index we start with is of
little consequence, and surprisingly the order of the sequence is not terribly important either.
What really matters is that the sequence contains a countably infinite number of elements.
Recall from Section 1 our Definition 1.4 of the limit of a sequence, using double quantifiers.
Here we explore some immediate consequences of that definition.
Lemma 7.1 (Sign-Preserving Property). If an ≥ 0 for all n ∈ N, then limn→∞ an ≥ 0.
Proof. The proof is left to the reader. This lemma is a corollary of the slightly simpler result:
if limn→∞ an > 0, then there exists M > 0 and N ∈ N so that an > M for all n > N .
Theorem 7.2. If {an}∞n=1 and {bn}∞n=1 both converge, then
limn→∞ (an ± bn) = limn→∞ an ± limn→∞ bn.
The proof of this theorem is a straightforward example of what is called an ε/2 argument and demonstrates several common techniques that we will use throughout real analysis. To illustrate these techniques we will go into rather more detail than is really necessary.
Long-Winded Proof. Since we know that {an}∞n=1 and {bn}∞n=1 converge, they have limits
limn→∞ an = La and limn→∞ bn = Lb.
To prove that limn→∞ (an + bn) = La + Lb, we must show that given any ε > 0, we can produce N ∈ N such that |(an + bn) − (La + Lb)| < ε whenever n ≥ N. Using Definition 1.4 of the limit of a sequence, we know that given our ε > 0 there must exist
• Na ∈ N such that |an − La| < ε whenever n ≥ Na, and
• Nb ∈ N such that |bn − Lb| < ε whenever n ≥ Nb.
We rewrite these absolute value inequalities in terms of the open intervals (La − ε, La + ε) and (Lb − ε, Lb + ε), both of size 2ε, centered at La and Lb respectively. This gives us
La − ε < an < La + ε and Lb − ε < bn < Lb + ε.
Adding these inequalities, we obtain
La + Lb − 2ε < an + bn < La + Lb + 2ε,
which in terms of absolute values gives us
(7.1) |(an + bn) − (La + Lb)| < 2ε.
We know this is true whenever both n ≥ Na and n ≥ Nb. Constructing N = max(Na, Nb), we observe that N ≥ Na and N ≥ Nb, so whenever n ≥ N we must have both n ≥ N ≥ Na and n ≥ N ≥ Nb. This gives us our desired result. Well, almost. We wanted the distance in (7.1) to be bounded by ε, not 2ε. How do we fix that? Just go back to our definitions of Na and Nb and use ε/2 in place of ε.
If we wanted a more concise proof, we could have written:
Concise Proof. Let limn→∞ an = La and limn→∞ bn = Lb. There exist Na and Nb such that |an − La| < ε/2 and |bn − Lb| < ε/2 whenever n > Na and n > Nb, respectively. Letting N = max(Na, Nb), we then have |(an + bn) − (La + Lb)| < ε whenever n > N.
Or if we really know what we are doing and are a tad flippant:
Over-Confident Proof. This is an ε/2 argument.
Here are some results involving sequences bounded by numbers or by other sequences.
Theorem 7.3. Every convergent sequence is bounded.
Proof. Suppose limn→∞ an = L. Taking ε = 1, we find that there exists N ∈ N so that
|an − L| < 1 =⇒ L − 1 < an < L + 1
for all n > N . This bounds all of the terms in the sequence after the aN term. To bound
the entire sequence, we just need to extend our bounds so they work for the first N terms
as well. To do this we observe that
min(a1 , a2 , . . . , aN , L − 1) ≤ an ≤ max(a1 , a2 , . . . , aN , L + 1)
for all n ∈ N, and so the entire sequence is bounded.
Both this proof and our long-winded proof of Theorem 7.2 above take advantage of the
min/max trick, the simple observation that
a < min(b, c) if and only if a < b and a < c, and
a > max(b, c) if and only if a > b and a > c.
In practice, the converse of Theorem 7.3 is used far more often.
Corollary 7.4. If a sequence is unbounded, then it diverges.
You may recall the following lemma from Calculus, if only because of its interesting name.
Theorem 7.5 (Squeeze Lemma). Given two convergent sequences with the same limit
limn→∞ an = limn→∞ cn = L,
suppose a third sequence {bn}∞n=1 is squeezed between them, in that there exists N ∈ N such that an ≤ bn ≤ cn for all n > N. Then limn→∞ bn = L as well.
Proof. Given ε > 0, there exist Na, Nc ∈ N so that
|am − L| < ε and |cn − L| < ε
whenever m > Na and n > Nc. We want to find Nb so that |bn − L| < ε whenever n > Nb. For any n ∈ N satisfying both n > Na and n > Nc, as well as n > N from the statement of the theorem, we have
L − ε < an ≤ bn ≤ cn < L + ε,
so |bn − L| < ε. In order that n be greater than all three of Na, Nc, and N, we can require
n > Nb = max(Na, Nc, N),
which shows that limn→∞ bn = L.
Notice that once again we have used the min/max trick. The most commonly used application of the Squeeze Lemma is the comparison test for positive sequences.
Corollary 7.6 (Sequence Comparison Test). Suppose that limn→∞ bn = 0 and {an}∞n=1 is a sequence with positive terms satisfying an ≤ bn for all n > N for some N ∈ N. Then limn→∞ an = 0 as well.
Proof. Since the an terms are all positive, we have 0 < an, and so we can squeeze the {an}∞n=1 sequence between the constant sequence {0}∞n=1 and the sequence {bn}∞n=1, both of which converge to 0.
Exercise 7.7. Suppose the sequences {an}∞n=1 and {bn}∞n=1 are precisely the same set of points, but in a different order. Prove that limn→∞ an = L if and only if limn→∞ bn = L. (Hint: You can prove this using a straightforward ε-N argument, together with the min/max trick.)
8. Cauchy Sequences
Theorem 8.1. Every convergent sequence is a Cauchy sequence.
Proof. Suppose limn→∞ an = L. Given any ε > 0, we can find N ∈ N so that for all n > N we have |an − L| < ε/2, or in other words
L − ε/2 < an < L + ε/2.
For any m, n > N, both am and an are contained in the same interval of size ε, and so
|am − an| < ε.
This is another example of an ε/2 argument. We are saying that two terms of the sequence that are close to L must be close to each other. However, if we used the standard condition |a_n − L| < ε, then we would find that the distance between the two terms would be bounded by |a_m − a_n| < 2ε. However, the definition of the limit of a sequence applies to all ε, so we can divide all our ε’s by 2 and the argument still works. We will see many examples of ε/2 arguments throughout this course, as well as the occasional ε/3 argument, and even perhaps an ε/4 or ε/5 argument. Ultimately, all such arguments can be viewed as consequences of the triangle inequality for metric spaces.
Theorem 8.2 (Cauchy Completeness Theorem). Every Cauchy sequence of real numbers
converges.
Proof. Surprisingly hard, considering that we defined the real numbers via Cauchy sequences
over the rational numbers.
9. Convergence Theorems
Recall that we defined the real numbers via the Cauchy Completeness Axiom, so all
Cauchy sequences converge. In this section we prove several theorems about convergence,
all of which follow from that one axiom. All of these proofs can be reduced to showing that
the sequence in question is, in fact, a Cauchy sequence. However, we will take an indirect route in our proofs, instead relying on the following workhorse theorem:
Theorem 9.1 (Nested Intervals Theorem). Suppose we have a sequence of closed intervals
[a_1, b_1] ⊃ [a_2, b_2] ⊃ [a_3, b_3] ⊃ · · ·,
each nested inside the previous one, and suppose the size of these intervals shrinks to zero, i.e., lim_{n→∞}(b_n − a_n) = 0. Then their intersection is a single point,
∩_{n=1}^∞ [a_n, b_n] = {c},
for some c ∈ R, and furthermore the sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ converge, with limit
lim_{n→∞} a_n = lim_{n→∞} b_n = c.
Proof. We first show that both {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ are Cauchy sequences, which means they converge. Consider the sequence {a_n}_{n=1}^∞, and suppose we are given ε > 0. Since lim_{n→∞}(b_n − a_n) = 0, we know that there exists N ∈ N so that |a_n − b_n| < ε whenever n > N. Since the intervals are nested, we have
a_1 ≤ a_2 ≤ a_3 ≤ · · · ≤ a_n ≤ · · · ≤ b_n ≤ · · · ≤ b_3 ≤ b_2 ≤ b_1.
It follows that for any m > N we have
a_N ≤ a_m ≤ b_N < a_N + ε,
and the same is true for a_n whenever n > N. Since both a_m and a_n are contained in the same interval of size ε, we have |a_n − a_m| < ε whenever m, n > N. This shows that {a_n}_{n=1}^∞ is a Cauchy sequence, which converges by the Cauchy Completeness Axiom, and so lim_{n→∞} a_n = c_a for some c_a ∈ R. Using the same argument to show that {b_n}_{n=1}^∞ is a Cauchy sequence, we find that lim_{n→∞} b_n = c_b for some c_b ∈ R. With both sequences converging, we have
c_b − c_a = lim_{n→∞} b_n − lim_{n→∞} a_n = lim_{n→∞}(b_n − a_n) = 0,
and thus c_a = c_b. From here on we write c = c_a = c_b.
Next, we consider the intersection of all the nested intervals. We must show that it contains c, but not any other real numbers. In order to show that the intersection contains c, we must show that a_n ≤ c ≤ b_n for all n ∈ N. Suppose that c < a_n for some n ∈ N. Taking ε = a_n − c, we find that |a_n − c| ≥ ε, and for any m > n we have c < a_n ≤ a_m and so |a_m − c| ≥ ε as well. This means that lim_{n→∞} a_n ≠ c, which is a contradiction. Thus a_n ≤ c. Similarly, we can show that c ≤ b_n.
Suppose that the intersection also contains some d ≠ c. Then each of the intervals [a_n, b_n] must contain both c and d, and in particular we have b_n − a_n > |c − d| > 0 for all n. This contradicts that lim_{n→∞}(b_n − a_n) = 0, so there is no such d in the intersection.
This theorem says that if you are trying to compute a value and are able to bound it inside an interval, and if you successively shrink the size of that interval to 0, then the bounds of the interval approach your desired value from both sides. This is what we did when we used the bisection method to approximate √2.
Theorem 9.2 (Monotone Convergence Theorem). Suppose the sequence {a_n}_{n=1}^∞ is monotone increasing, i.e., a_n < a_{n+1} for all n ∈ N. If {a_n}_{n=1}^∞ is bounded above, then it must converge. The same holds for sequences that are monotone non-decreasing (a_n ≤ a_{n+1}) and bounded above, as well as for sequences that are bounded below and are monotone decreasing (a_n > a_{n+1}) or monotone non-increasing (a_n ≥ a_{n+1}).
You might think that we will prove the Monotone Convergence Theorem by showing that
the monotone sequence is, in fact, a Cauchy sequence. Actually, our proof will do so only
indirectly, instead following from the Nested Intervals Theorem. But first we need a lemma,
the sequence version of the Least Upper Bound property of the real numbers.
Lemma 9.3 (Least Upper Bound Lemma). If a set A ⊂ R is bounded above, then there exists a sequence {a_n}_{n=1}^∞ with values in A and lim_{n→∞} a_n = L, such that L is the least upper bound of A.
Proof. We construct a chain of nested intervals and invoke the Nested Intervals Theorem above. Let M_1 be an upper bound of A, and let a_1 ∈ A be any element. Now construct the midpoint m = (a_1 + M_1)/2, and consider the following two cases:
(1) If m is an upper bound of A, we set a_2 = a_1, and we take M_2 = m.
(2) If m is not an upper bound of A, then there exists a_2 ∈ A such that a_2 > m, and we set M_2 = M_1.
Repeating this process, we can construct a sequence of nested intervals
[a_1, M_1] ⊃ [a_2, M_2] ⊃ [a_3, M_3] ⊃ · · ·.
Also, since at each stage we take the midpoint of the previous interval and the next interval is contained in either the left or right half, the size of these intervals shrinks to 0. By the Nested Intervals Theorem, there exists a single L ∈ R contained in all of these intervals, such that
L = lim_{n→∞} a_n = lim_{n→∞} M_n.
We must now show that L is the least upper bound of A. If L is not an upper bound of A, then there exists a ∈ A with a > L. However, since the upper bounds converge to L, there must exist upper bounds M_n satisfying |M_n − L| < a − L, and thus a > M_n, which contradicts that M_n is an upper bound. If L is not the least possible upper bound of A, then there exists another upper bound K < L. But then there exist elements a_n satisfying |a_n − L| < L − K, and thus a_n > K, which contradicts that K is an upper bound.
While constructing the least upper bound was straightforward, showing that it is a least upper bound was more difficult. It is best to think of this in terms of a number line diagram. If L is not an upper bound, there is an element of A greater than one of the upper bound right endpoints, and if L is not least, then there exists an upper bound less than one of the sequence left endpoints. So we really need from the Nested Intervals Theorem that the two endpoint sequences converge to the same limit from both sides.
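Run numerically, the halving construction in this proof becomes an algorithm for computing suprema. The Python sketch below is my own illustration, not from the notes: the function names and the example set A = {x : x² < 2} are assumptions, and for simplicity case (2) slides the left endpoint to the midpoint rather than picking an actual element of A, which still brackets the supremum since some element of A lies above the midpoint.

```python
def least_upper_bound(is_upper_bound, a1, M1, tol=1e-12):
    """Halving construction from the Least Upper Bound Lemma.

    a1 is an element of the set A, M1 is any upper bound of A, and
    is_upper_bound(m) reports whether m is an upper bound of A.
    """
    a, M = a1, M1
    while M - a > tol:
        m = (a + M) / 2
        if is_upper_bound(m):
            M = m      # case (1): keep the left endpoint, shrink the bound
        else:
            a = m      # case (2): some element of A exceeds m
    return (a + M) / 2

# Example: A = {x in R : x^2 < 2}; m is an upper bound iff m > 0 and m^2 >= 2,
# so the least upper bound is sqrt(2).
sup_A = least_upper_bound(lambda m: m > 0 and m * m >= 2, 1.0, 2.0)
print(sup_A)  # approximately 1.41421356...
```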
Proof of the Monotone Convergence Theorem. We use the previous lemma, applied to the bounded set A = {a_n}_{n=1}^∞. This produces a convergent subsequence lim_{k→∞} a_{n_k} = L. We must now show that this least upper bound L is, in fact, the limit of the entire monotone sequence {a_n}_{n=1}^∞. Given any ε > 0, there exists K ∈ N so that L − a_{n_k} < ε for all k > K. Letting N = n_{K+1} and using the fact that our sequence is monotone increasing, we have
a_n > a_{n_{K+1}} > L − ε =⇒ |a_n − L| < ε
for all n > N, and thus lim_{n→∞} a_n = L.
In general, having a convergent subsequence does not imply that the full sequence converges. For instance, the sequence +1, −1, +1, −1, +1, −1, . . . has subsequences that converge
to +1 and −1, but the entire sequence does not converge. However, it is indeed true for
monotone sequences that the convergence of a subsequence implies the convergence of the
whole sequence, as our proof above shows.
In the following exercise, we consider the converse of this statement, which is true even
without the assumption of monotonicity.
Exercise 9.4. Given a convergent sequence, show that any subsequence converges to the same value as the original sequence. In other words, if lim_{n→∞} a_n = L, then lim_{k→∞} a_{n_k} = L.
On the other hand, it is not hard to find convergent subsequences if the original sequence
is bounded, as the following theorem shows:
Theorem 9.5 (Bolzano-Weierstrass Theorem). Every bounded sequence has a convergent
subsequence. Furthermore, if the sequence is contained in a closed interval I, then the limit
of any convergent subsequence is also contained in I.
Proof. Our proof uses the Nested Intervals Theorem. Divide the interval I into two closed subintervals at its midpoint. Since our original sequence was infinite, either the left half-interval or the right half-interval (or both half-intervals) contains infinitely many elements. Choose a closed half-interval that contains infinitely many elements of the original sequence. Then subdivide that interval and choose its closed half-interval containing infinitely many elements of the original sequence, and so on. Since each iteration of this process cuts the size of the interval in half, we obtain a sequence of nested closed intervals whose sizes shrink to zero. Choosing a subsequence of our original sequence with one element in each of these nested intervals, this subsequence is then squeezed between the sequences of left and right endpoints, which both converge to the same value by the Nested Intervals Theorem. Our subsequence must then converge to that same value by the Squeeze Lemma.
Now, suppose that we have a convergent sequence {a_n}_{n=1}^∞ contained in a closed interval I = [c, d]. We then have c ≤ a_n ≤ d for all n, and by the Sign-Preserving Lemma, we see that c ≤ lim_{n→∞} a_n ≤ d.
Note that the second statement of this theorem is not necessarily true if I were an open interval, since the limit could be one of the endpoints. Also note that the Monotone Convergence Theorem can be viewed as a corollary of the Bolzano-Weierstrass Theorem. If you have a monotone increasing sequence that is bounded above, then the sequence is also bounded below by its first element. It therefore has a convergent subsequence, just as we showed using the Least Upper Bound Lemma. The converse is also true, that the Bolzano-Weierstrass Theorem is a corollary of the Monotone Convergence Theorem. Indeed, all four of the results in this section are equivalent to one another.
10. Limits and Continuity
You probably have an intuitive idea about what limits are from Calculus. Here we present
a definition of limits of functions in terms of limits of sequences.
Definition 10.1. Let f : I − {c} → R be a function defined on an open interval I with a point c ∈ I removed. We say the limit of f(x) as x approaches c is lim_{x→c} f(x) = L if, for every convergent sequence in I − {c} with lim_{n→∞} a_n = c, we have lim_{n→∞} f(a_n) = L.
This sequence-based definition of limits likely agrees with your intuition from calculus. If you want to compute lim_{x→2} x², you might compute 1.9², 1.99², 1.999², . . ., or perhaps 2.1², 2.01², 2.001², . . .. In other words, you choose a sequence of numbers approaching 2, apply
the function to each of those numbers, and determine the limit of the resulting sequence. Our definition is very much a discrete approach to limits, in contrast with the standard continuous ε–δ approach, which we show is equivalent in the following theorem.
Theorem 10.2. We have lim_{x→c} f(x) = L if and only if for all ε > 0, there exists δ > 0 such that for any x ≠ c satisfying |x − c| < δ, we have |f(x) − L| < ε.
Proof. Suppose the ε–δ definition of the limit holds, and consider any sequence such that lim_{n→∞} a_n = c. Given ε > 0, there exists a δ such that |f(a_n) − L| < ε whenever |a_n − c| < δ. However, we can find N ∈ N so that |a_n − c| < δ whenever n > N. It follows that lim_{n→∞} f(a_n) = L.
Suppose the ε–δ definition of the limit fails. In that case, there exists an ε > 0 such that for all δ > 0 we have some x ≠ c satisfying both |x − c| < δ and |f(x) − L| ≥ ε. We construct a sequence by choosing terms a_n satisfying these conditions for δ = 1/n for each n ∈ N. We clearly have lim_{n→∞} a_n = c, but lim_{n→∞} f(a_n) ≠ L.
Definition 10.3. Let f : I → R be a function defined on an open interval I. If c ∈ I, we say that f is continuous at c if lim_{n→∞} f(a_n) = f(c) for every sequence with lim_{n→∞} a_n = c. We say that f is continuous on I if it is continuous at all c ∈ I.
This definition of continuity says that taking a limit of a sequence commutes with applying
the function, i.e., that applying the function and taking the limit of the result is the same
as taking the limit first and then applying the function. In light of our definitions of limits
above, we also have an equivalent and simpler definition for continuity.
Definition 10.4. A function f : I → R is continuous at c ∈ I if lim_{x→c} f(x) = f(c).
11. The Intermediate Value Theorem
At the very start of this course, we showed how you can use the bisection method to compute √2, proving that the resulting sequence converged. At the time, we pointed out that the bisection method works because of the Intermediate Value Theorem from Calculus. In this section, we are going to consider the bisection method in general, and then use it to actually prove the Intermediate Value Theorem. No, this is not circular reasoning. Rather, our understanding of the real numbers has shifted from our Calculus-based intuition to a rigorous treatment in terms of Cauchy sequences and the Nested Intervals Theorem.
The bisection method is our first example of a root approximation algorithm. Given a real-valued function f we often want to find its roots, the solutions of the equation f(x) = 0. If we cannot solve for the roots directly, it may be useful to approximate them. In our earlier √2 example, we were looking for a root of the function f(x) = x² − 2.
Algorithm 11.1 (Bisection Method). If f : [a, b] → R is a continuous function and f (a) and
f (b) have opposite signs, then we can approximate a root of f (x) via the following algorithm:
(1) Begin with the closed interval [a, b].
(2) Consider the midpoint m = (a + b)/2. There are two possibilities:
(a) If f (m) = 0, we are done, as we have found a root.
(b) If f(m) ≠ 0, choose either the subinterval [a, m] or [m, b] so that the function f(x) takes opposite signs at the two endpoints.
(3) Return to step (2). Lather, rinse, repeat.
The midpoints generated in step (2) then form a sequence which converges to a root of f .
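Algorithm 11.1 translates directly into code. The sketch below is my own illustration (the function name and the tolerance-based stopping rule are assumptions; the algorithm as stated runs forever, producing a sequence of midpoints):

```python
def bisect(f, a, b, tol=1e-12):
    """Bisection method: approximate a root of a continuous f on [a, b],
    assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:        # step (2a): found an exact root
            return m
        if fa * fm < 0:    # step (2b): the sign change is in [a, m]
            b, fb = m, fm
        else:              # otherwise the sign change is in [m, b]
            a, fa = m, fm
    return (a + b) / 2

# The sqrt(2) example: a root of f(x) = x^2 - 2 on [1, 2].
root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)  # approximately 1.41421356...
```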
Proof. This algorithm produces a chain of nested intervals. Furthermore, the size of the interval at the nth iteration is (b − a)/2^n, which shrinks to 0 as n → ∞. We may therefore invoke the Nested Intervals Theorem to produce a value c ∈ [a, b], which is the limit of the sequences of both the left and the right endpoints of those intervals.
We must still verify that c is a root, i.e., that f (c) = 0. To do this, we construct two
new sequences. For one sequence, from each successive interval we take the endpoint where
f (x) is positive. For the other sequence, for each interval we take the endpoint where f (x) is
negative. Both sequences must converge to c, and by continuity, if we apply the function f to
the entries of these sequences, the resulting sequences must converge to f (c). However, one of
those sequences has positive values, while the other has negative values. By Lemma 7.1, the
limit of the positive sequence must be non-negative, and the limit of the negative sequence
must be non-positive. Together, that means the limit must be f (c) = 0.
While we presented the bisection method as a method for root approximation, solving
f (x) = 0, we can also use it to approximate the solution of any equation of the form
f (x) = d, which is a root of the equation f (x) − d = 0. The Bisection Method therefore
gives us a constructive method of proving the Intermediate Value Theorem. This stands
in contrast to the approach taught in most Calculus and Real Analysis courses, where the
Intermediate Value Theorem is touted as an example of an existence theorem which does
not actually give you an explicit solution.
Corollary 11.2 (Intermediate Value Theorem). If the function f : [a, b] → R is continuous,
and d is between f (a) and f (b), then there exists c ∈ [a, b] such that f (c) = d.
Exercise 11.3. Use the Bisection Method to compute the Golden Ratio φ, which satisfies the equation 1 + 1/φ = φ, to three digits after the decimal point.
12. The Extreme Value Theorem
One of the main applications of calculus is to minimize or maximize the value of a differentiable function. In this section, we show that every continuous function on a closed interval achieves its minimum and maximum. First we need a lemma, showing that every continuous function on a closed interval is bounded.
Lemma 12.1. A continuous function f : I → R on a closed interval I is bounded.
Proof. Suppose that f is not bounded above on the closed interval I. Then for any potential bound M ∈ R, there must exist some a ∈ I with f(a) > M. In particular, we can find a sequence {a_n}_{n=1}^∞ in I with f(a_1) > 1, f(a_2) > 2, f(a_3) > 3, etc. Clearly the sequence {f(a_n)}_{n=1}^∞ diverges, as does any subsequence. However, the Bolzano-Weierstrass Theorem
tells us that the bounded sequence {a_n}_{n=1}^∞ contains a convergent subsequence {a_{n_k}}_{k=1}^∞ with limit in I, and since f is continuous we have
lim_{k→∞} f(a_{n_k}) = f( lim_{k→∞} a_{n_k} ),
which contradicts that all subsequences of {f(a_n)}_{n=1}^∞ diverge.
Theorem 12.2 (Extreme Value Theorem). If a function f : I → R on a closed interval I is continuous, then there exist a, b ∈ I where f attains its maximum and minimum values:
f(a) = max_{x∈I} f(x) and f(b) = min_{x∈I} f(x).
Proof. By the above lemma, the continuous function f is bounded on the closed interval. To find the maximum and minimum of the bounded set A = {f(x) | x ∈ I}, we use the Least Upper Bound Lemma, which tells us that there exist sequences {f(a_n)}_{n=1}^∞ and {f(b_n)}_{n=1}^∞ converging to the least upper bound and greatest lower bound of A, respectively. The sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ in I may not converge, but since I is a closed interval, the Bolzano-Weierstrass Theorem tells us that there exist convergent subsequences
lim_{k→∞} a_{n_k} = a and lim_{k→∞} b_{n_k} = b,
with a, b ∈ I. Furthermore, by continuity we have
lim_{k→∞} f(a_{n_k}) = f(a) and lim_{k→∞} f(b_{n_k}) = f(b).
But these are subsequences of {f(a_n)}_{n=1}^∞ and {f(b_n)}_{n=1}^∞, which we already know converge to the least upper bound and greatest lower bound, so f(a) and f(b) must themselves be the maximum and minimum, respectively.
13. Derivatives
Definition 13.1. If f : I → R is a continuous function on an interval I, the derivative of f at a point c ∈ I is
f′(c) = lim_{x→c} (f(x) − f(c))/(x − c) = lim_{h→0} (f(c + h) − f(c))/h.
We say that f is differentiable at c if this limit exists, and that f is differentiable in an interval I if it is differentiable for all c ∈ I. In that case the first derivative of f can be viewed as a function f′ : I → R.
Theorem 13.2. Differentiability implies continuity.
Proof. Suppose f : I → R is discontinuous at c ∈ I. Then there exists ε > 0 such that for all δ > 0, there exists x with |x − c| < δ and |f(x) − f(c)| ≥ ε. For such an x we have
|f(x) − f(c)| / |x − c| > ε/δ,
but since δ was arbitrary, we see that the slope (f(x) − f(c))/(x − c) is unbounded as x → c, so the derivative cannot exist.
Theorem 13.3 (Product Rule). If f, g are both differentiable at x, then so is their product, and we have (fg)′(x) = f′(x)g(x) + f(x)g′(x).
Proof. By some clever algebra, we rearrange the limit as
(fg)′(x) = lim_{h→0} [f(x + h)g(x + h) − f(x)g(x)] / h
= lim_{h→0} [ ((f(x + h) − f(x))/h) g(x + h) + f(x) ((g(x + h) − g(x))/h) ]
= f′(x)g(x) + f(x)g′(x),
where we use the fact that a limit of a sum or product is the sum or product of the limits, provided those limits exist, together with the continuity of g (Theorem 13.2), which gives lim_{h→0} g(x + h) = g(x).
Another way to view the derivative is to extend the real numbers by adjoining an additional infinitesimal element dx which is so small that (dx)² = 0. Such a dx encapsulates the idea of lim_{h→0} h. While there is no actual real number dx, we can work instead with the polynomial ring R[dx] in the variable dx, and then take the quotient R[dx]/⟨dx²⟩ by the ideal generated by dx². Elements of this quotient ring can be written in the form a + b dx, and in particular the derivative of a function satisfies
f(x + dx) = f(x) + f′(x) dx.
Note that if we replace dx by h in this formula, we get the formula for the linear approximation to the function f(x) at the point (x, f(x)). However, we assert that if dx is not just small but infinitesimal, then this formula becomes an equality, not an approximation.
Theoretical mathematicians will be quick to point out that working with infinitesimals is very tricky, and that there are subtleties which we are ignoring. They are absolutely right, but that should not stop us from exploiting this surprisingly useful idea, which works extremely well in the applied setting. As an example, we can recast our proof of the product rule using infinitesimals:
(fg)(x + dx) = f(x + dx) g(x + dx)
= (f(x) + f′(x) dx)(g(x) + g′(x) dx)
= f(x)g(x) + (f′(x)g(x) + f(x)g′(x)) dx + f′(x)g′(x)(dx)²
= (fg)(x) + (f′(x)g(x) + f(x)g′(x)) dx,
where in the final step we recall that (dx)² = 0. Note that the coefficient of dx is precisely the formula for the product rule!
We can also use infinitesimals to give a simple proof of the chain rule:
(f ∘ g)(x + dx) = f(g(x + dx)) = f(g(x) + g′(x) dx) = f(g(x)) + f′(g(x))g′(x) dx,
where the coefficient of dx is the formula for the chain rule. Here we used the fact that if dx is an infinitesimal satisfying (dx)² = 0, then g′(x) dx is also an infinitesimal.
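This infinitesimal arithmetic is exactly what "dual numbers" implement in forward-mode automatic differentiation. The sketch below is my own illustration (the class name and helper are not from the notes): it represents a + b·dx as a pair of floats, and the product rule falls out of the multiplication once the (dx)² term is dropped.

```python
class Dual:
    """Numbers of the form a + b*dx, where dx is infinitesimal: dx**2 = 0."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b    # value and coefficient of dx

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 dx)(a2 + b2 dx) = a1 a2 + (a1 b2 + b1 a2) dx,
        # since the (dx)^2 term vanishes
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def derivative(f, x):
    """Evaluate f(x + dx) = f(x) + f'(x) dx and read off f'(x)."""
    return f(Dual(x, 1.0)).b

# The derivative of f(x) = x * x * x at x = 2 is 3 * 2^2 = 12.
print(derivative(lambda x: x * x * x, 2.0))  # 12.0
```

Composing functions built from these operations automatically applies the chain rule, for the same reason as in the computation above.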
14. The Mean Value Theorem
In this section we prove the most important theorem in all of Numerical Analysis: the error
estimate for Taylor series. Why is this so important? In Numerical Analysis, we not only
find approximations, but we also need to provide error bounds so we know how good those
approximations are. Most of the time, the functions we are working with can be expressed
in terms of their Taylor series, and nearly all of our error estimates are derived from the
Taylor series error term. But it gets better than that. Sometimes, just knowing that a Taylor series exists for our functions is enough to show us how to manipulate our approximations to
squeeze out even more accuracy. We will return to that idea later. Before we get to Taylor’s
Theorem, we warm up with a lemma and two special cases, all familiar to us from Calculus.
Lemma 14.1 (Critical Point). If f : I → R is differentiable at c ∈ I, and if f(c) ≥ f(x) for all x ∈ I, then f′(c) = 0.
Proof. In the definition of the derivative f′(c), the numerator satisfies f(x) − f(c) ≤ 0 for all x ∈ I. If we take lim_{x→c+}, then the denominator satisfies x − c > 0, so we must have f′(c) ≤ 0. However, if we take lim_{x→c−}, then the denominator satisfies x − c < 0, so we must have f′(c) ≥ 0. It follows that f′(c) = 0.
This lemma tells us that the extrema of a differentiable function are critical points where
the derivative vanishes. This is the basic idea behind optimization problems in Calculus,
where you attempt to find the extrema of a function by setting its derivative equal to zero.
Theorem 14.2 (Rolle’s Theorem). If f : [a, b] → R is differentiable and f(a) = f(b), then there exists c ∈ [a, b] where f′(c) = 0.
Proof. If f is differentiable, then it is also continuous. By the Extreme Value Theorem, there
exist points c, d ∈ [a, b] such that f (c) is maximal and f (d) is minimal. If these maxima and
minima occur at the endpoints a and b, then since f (a) = f (b) we see that the function is
in fact constant, and so its derivative vanishes across the entire interval. Otherwise, there
is a minimum or maximum at some point in the interior of the interval, and by the Critical
Point Lemma we see that the derivative vanishes there.
Theorem 14.3 (Mean Value Theorem). If f : [a, b] → R is differentiable, then there exists c ∈ [a, b] where
f′(c) = (f(b) − f(a))/(b − a).
Proof. Consider the slope-adjusted function
g(x) = f(x) − ((f(b) − f(a))/(b − a)) (x − a).
We note that g(a) = g(b) = f(a). Applying Rolle’s Theorem, there exists c where
0 = g′(c) = f′(c) − (f(b) − f(a))/(b − a),
which gives us our desired result.
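The theorem is easy to test numerically. The sketch below is my own illustration, not a construction from the notes: it assumes f′ is continuous and that f′(c) minus the secant slope changes sign on [a, b], so the bisection idea from the start of the course locates a point c guaranteed by the Mean Value Theorem.

```python
def mvt_point(f_prime, a, b, fa, fb, tol=1e-12):
    """Find c in [a, b] with f'(c) equal to the secant slope (fb - fa)/(b - a),
    by bisecting on g(c) = f'(c) - slope (assumes g changes sign on [a, b])."""
    slope = (fb - fa) / (b - a)
    g = lambda c: f_prime(c) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("g must change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:   # sign change in [lo, mid]
            hi = mid
        else:                     # sign change in [mid, hi]
            lo = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 1]: the secant slope is 1, and f'(c) = 3c^2 = 1 at c = 1/sqrt(3).
c = mvt_point(lambda x: 3 * x * x, 0.0, 1.0, 0.0, 1.0)
print(c)  # approximately 0.57735
```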
15. Taylor’s Theorem
15.1. Generalizing the Mean Value Theorem. Another way to look at the Mean Value
Theorem is that it limits how far away from each other f (a) and f (b) can get. If the derivative
is small, then f (a) and f (b) are close to each other, while if the derivative is very large then
f(a) and f(b) can be far apart. Replacing a, b, c with c, x, ξ respectively, we can rewrite the Mean Value Theorem as
(15.1)    f(x) = f(c) + (x − c) f′(ξ),
for some ξ between c and x. This looks once again very much like the formula for the linear
approximation of f(x) at the point (x, f(x)). However, the difference here is that this is an equality, not an approximation, but the price we pay is that we do not take f′(c) but rather f′(ξ) at some unknown point ξ between c and x.
Suppose we know the value of f(c) and we would like to use f(c) as a (very simple) estimate for the value of some other f(x). If we know that the derivative f′(ξ) is small, or perhaps bounded by |f′(ξ)| < M for all ξ between c and x, then we have |f(x) − f(c)| < M|x − c|. This is the essence of a standard ε–δ argument, where we take δ = ε/M, letting M be an upper bound on the derivative. As far as error estimates go, that was pretty lame. However, if we replace the idea of a linear approximation with a Taylor series, we get even better approximations.
Theorem 15.1 (Taylor’s Theorem). Suppose f : I → R is differentiable at least n + 1 times. Then for any c, x in I we have
f(x) = f(c) + (x − c)f′(c) + ((x − c)²/2) f″(c) + · · · + ((x − c)^n/n!) f^{(n)}(c) + ((x − c)^{n+1}/(n + 1)!) f^{(n+1)}(ξ)
for some ξ between c and x. Furthermore, if |f^{(n+1)}(ξ)| < M for all ξ between c and x, then the error in the nth degree Taylor series is bounded by
|f(x) − Σ_{k=0}^{n} ((x − c)^k/k!) f^{(k)}(c)| < (|x − c|^{n+1}/(n + 1)!) M.
Proof. Fixing x ≠ c, we solve for the coefficient of (x − c)^{n+1} that we are looking for:
d = ( f(x) − Σ_{k=0}^{n} ((x − c)^k/k!) f^{(k)}(c) ) / (x − c)^{n+1}.
Our job is then to show that this coefficient is, in fact,
d = f^{(n+1)}(ξ)/(n + 1)!
for some ξ between c and x. Consider the function
g(y) = ( Σ_{k=0}^{n} ((x − y)^k/k!) f^{(k)}(y) ) + d(x − y)^{n+1}.
For y = x, the only term that survives is f^{(0)}(x), so we have
g(x) = f(x).
And for y = c, because of our choice of d we also obtain
g(c) = f(x).
By Rolle’s Theorem, we then know that there exists some ξ between c and x such that g′(ξ) = 0. Computing the derivative of g(y), we have
g′(y) = Σ_{k=0}^{n} ( (−k)((x − y)^{k−1}/k!) f^{(k)}(y) + ((x − y)^k/k!) f^{(k+1)}(y) ) − (n + 1)d(x − y)^n
= ((x − y)^n/n!) f^{(n+1)}(y) − (n + 1)d(x − y)^n,
where the −((x − y)^{k−1}/(k − 1)!) f^{(k)}(y) piece of each term cancels against the ((x − y)^{k−1}/(k − 1)!) f^{(k)}(y) piece arising from the previous term, leaving only the last f^{(n+1)} term uncanceled. Setting g′(ξ) = 0 gives us
((x − ξ)^n/n!) f^{(n+1)}(ξ) = (n + 1)d(x − ξ)^n,
and solving for d we obtain
d = f^{(n+1)}(ξ)/(n + 1)!,
which completes our proof.
Actually, the proof of Taylor’s Theorem is not that different from the proof of the Mean
Value Theorem. In both cases, we constructed a function g(x) which takes the same value
at both endpoints of the interval and then invoked Rolle’s Theorem. In fact, the Mean
Value Theorem, particularly when written in the form of (15.1), is a special case of Taylor’s
Theorem where n = 0. To see this, let’s redo the Mean Value Theorem and its proof in the form
of Taylor’s Theorem. Recall that the variables c, x, ξ in Taylor’s Theorem correspond to the
variables a, b, c in the Mean Value Theorem, respectively. Also, the variable d in the proof
of Taylor’s Theorem is a generalization of the slope of the secant line in the Mean Value
Theorem.
Corollary 15.2 (Mean Value Theorem Redux). If f is differentiable in the closed interval between c and x, then
f(x) = f(c) + f′(ξ)(x − c)
for some ξ between c and x.
Proof. Solving for the coefficient of x − c, we obtain
d = (f(x) − f(c))/(x − c),
which is the slope of the secant line for f(x) between c and x. We want to show that d = f′(ξ) for some ξ between c and x. Noting that the slope-adjusted function
g(y) = f(y) + d(x − y)
satisfies g(c) = g(x) = f(x), Rolle’s theorem tells us that there exists ξ between c and x such that g′(ξ) = 0. Computing the derivative of g, we have
g′(y) = f′(y) − d,
so 0 = g′(ξ) = f′(ξ) − d implies f′(ξ) = d.
15.2. Error Estimates. Taylor’s Theorem is one of the most important results in numerical analysis. Most of our error estimates will arise by approximating our functions by polynomials, and then analyzing the Taylor series error term. In Calculus, you likely encountered Taylor series, showing that they allow you to approximate many functions by an infinite series. We will consider such infinite series in the next section. What Taylor’s Theorem tells us is how well a function can be approximated by a finite Taylor polynomial. We see that the error in the nth degree Taylor polynomial is proportional to (x − c)^{n+1}. We typically say that the nth degree Taylor polynomial approximates the function up to order x^{n+1}, and we write this in the form
f(x) = p_n(x) + O((x − c)^{n+1}),
which means that there exists some constant of proportionality M > 0 such that
|f(x) − p_n(x)| < M|x − c|^{n+1}.
In the case of Taylor’s Theorem, we can take
M = max_{ξ∈[c,x]} |f^{(n+1)}(ξ)| / (n + 1)!,
or if we do not know the maximum of f^{(n+1)}(ξ), we can instead take any upper bound.
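As a quick numerical check (my own illustration, not from the notes), we can compare the actual error of a Taylor polynomial against the bound from Taylor's Theorem. Take f(x) = e^x centered at c = 0, where every derivative is again e^ξ, which on [0, x] is bounded above by e^x itself:

```python
from math import exp, factorial

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x centered at c = 0."""
    return sum(x**k / factorial(k) for k in range(n + 1))

x, n = 0.5, 4
actual_error = abs(exp(x) - taylor_exp(x, n))
# Taylor's Theorem bound: |f^(n+1)(xi)| <= e^x for xi in [0, x], so take M = e^x.
bound = exp(x) * abs(x) ** (n + 1) / factorial(n + 1)
print(actual_error <= bound)  # True
```

Here the actual error is roughly 2.8 × 10⁻⁴, safely below the bound of roughly 4.3 × 10⁻⁴.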
In practice, how do we find the maximum of f^{(n+1)}(ξ) that we need for the Taylor’s Theorem error bound? In some cases, we know enough about the derivatives of the function that we can come up with a good upper bound. For instance, we note that e < 3, or that |sin x| ≤ 1 and |cos x| ≤ 1, as we will use in examples in the next chapter. In general, we typically take x very close to the center c of the Taylor series, and most of the time we find that f^{(n+1)}(ξ) is monotone between c and x. (We should still be careful of the exceptions where f^{(n+1)}(ξ) has a critical point between c and x and is neither increasing nor decreasing over the entire range, which is the case if we consider a Taylor series for cos x or sin x.) If the (n + 1)-st derivative is monotone, then it takes its maximum value between c and x at either the point ξ = c or ξ = x, so we need only compute its value at both and take the larger:
|f^{(n+1)}(ξ)| ≤ max( |f^{(n+1)}(c)|, |f^{(n+1)}(x)| ).
In particular, if |f^{(n+1)}(c)| is larger, then the Taylor’s Theorem error bound gives us
|f(x) − p_n(x)| ≤ (|f^{(n+1)}(c)|/(n + 1)!) |x − c|^{n+1},
which is precisely the absolute value of the next term in the Taylor series! This usually happens if the Taylor series alternates between positive and negative terms, and it is a special case of the Alternating Series Test discussed in the next section. However, if the Taylor series is not alternating, then usually the maximum is at the other point, ξ = x, and the error becomes
|f(x) − p_n(x)| ≤ (|f^{(n+1)}(x)|/(n + 1)!) |x − c|^{n+1}.
This takes a bit more computation than just the next term of the Taylor series, and often such error estimates are not terribly helpful, but it may be the best we can do. We will see examples of both error estimates in the next section. One last thing to keep in mind about these estimates is that which one we use often depends on whether x > c or x < c, as one of the series is alternating and the other is not.
It is worth noting that while the notation and indeed the definition of big-O notation is
the same in mathematics and computer science, the usage is different. In computer science,
you typically consider expressions like O(n2 ) or O(n log n), which describe the time required
to perform an algorithm as the size n of the problem gets large, or approaches infinity. In
this context, big-O notation means “up to a constant and ignoring lower order terms”. In
contrast, when analyzing Taylor series in mathematics, we consider expressions like O(x²)
where x is small or approaches 0, and the notation essentially means “up to a constant and
ignoring higher order terms”. This approach makes sense if we think in terms of infinite
Taylor series, where the terms get successively smaller with higher and higher powers of
x − c. However, in light of Taylor’s Theorem, we are not necessarily ignoring higher order
terms, since the error can actually be expressed using a single term of degree n + 1.
There are always two things to consider when analyzing the convergence of a Taylor series.
First, since the error is O((x − c)^(n+1)), Taylor series typically work well when x is close to
the center c. For instance, if x − c = 0.1, then you would expect the error in the quadratic
Taylor polynomial to be proportional to 0.1^3 = 0.001, and if you wanted d digits after the
decimal place, you would consider adding up the first d terms of the Taylor series. However,
we must also consider the constant M, which measures how wild the (n + 1)-st derivative is.
If the (n + 1)-st derivative is small, then the (x − c)^(n+1) factor dominates. However, the (n + 1)-st
derivative may be large, sometimes dominating the error term. So a complete error analysis
of Taylor series convergence must always examine the derivatives of the function.
Finally, I want to explain how Taylor polynomials fit with the idea of an infinitesimal ε satisfying ε² = 0 that we introduced at the end of Section 13. If we generalize our notion of infinitesimals, instead using the nilpotent condition ε^(n+1) = 0 and assuming that lower powers of ε do not vanish, then we can write our Taylor polynomial exactly, without requiring an error term:

f(c + ε) = f(c) + ε f′(c) + (ε²/2) f″(c) + · · · + (ε^n/n!) f^(n)(c),

because the error term is proportional to ε^(n+1) = 0. Looking at infinitesimals differently, working with such an infinitesimal is equivalent to ignoring terms of degree n + 1 or higher, or working up to O(x^(n+1)).
16. Series
To better understand Taylor series, we consider series in general.
Definition 16.1. A series is an infinite summation, and we say that a series converges,

∑_{n=1}^∞ a_n = L,

if the sequence of partial sums s_n = ∑_{k=1}^n a_k converges to lim_{n→∞} s_n = L.
Notice that the convergence of a series is expressed in terms of the convergence of the
sequence of partial sums. In fact, studying series is equivalent to studying sequences. Not
only can every series be expressed in terms of a sequence of partial sums, but also every
sequence {s_n}_{n=1}^∞ can be expressed as the partial sums of the series with a_1 = s_1 and
a_n = s_n − s_{n−1}. In other words, we can take the series whose terms are the successive
differences of the terms in our sequence. Several of our standard theorems about the
convergence of sequences have immediate analogues for series, and in particular, the Series
Comparison Test and Alternating Series Test below are the series versions of the Monotone
Convergence Theorem and the Nested Intervals Theorem, respectively.
So why do we make a big deal about series? It turns out that many interesting sequences
can be expressed most simply as series, particularly Taylor series. Taylor series are actually
series-valued functions, called power series.
Definition 16.2. A power series is a function of the form
p(x) = ∑_{n=0}^∞ a_n x^n,

or more generally, a power series centered at c ∈ R is a function of the form

p(x) = ∑_{n=0}^∞ a_n (x − c)^n.
Example 16.3. The simplest power series is a geometric series, which converges to
∑_{n=0}^∞ x^n = 1/(1 − x)
for |x| < 1. To prove convergence, we consider the partial sums
s_n = 1 + x + x² + · · · + x^n = (1 − x^(n+1))/(1 − x) = 1/(1 − x) − x^(n+1)/(1 − x),
and taking the limit we obtain
∑_{n=0}^∞ x^n = lim_{n→∞} ( 1/(1 − x) − x^(n+1)/(1 − x) ) = 1/(1 − x) − lim_{n→∞} x^(n+1)/(1 − x) = 1/(1 − x).
Recalling our proof that limn→∞ 1/2n = 0, we see that limn→∞ xn = 0 whenever |x| < 1.
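For the curious reader, this convergence is easy to check numerically. Here is a short Python sketch (the function name is ours, not part of the course) comparing the partial sums against the closed form and the limit 1/(1 − x):

```python
def geometric_partial_sum(x, n):
    """Compute s_n = 1 + x + x^2 + ... + x^n by direct summation."""
    return sum(x**k for k in range(n + 1))

# The closed form s_n = (1 - x^(n+1)) / (1 - x) matches the direct sum,
x, n = 0.5, 10
closed_form = (1 - x**(n + 1)) / (1 - x)
print(abs(geometric_partial_sum(x, n) - closed_form))  # essentially zero

# and for |x| < 1 the partial sums approach the limit 1/(1 - x).
print(abs(geometric_partial_sum(0.5, 50) - 1 / (1 - 0.5)))  # tiny
```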
Definition 16.4. The Taylor series for an infinitely differentiable (called smooth or C ∞ )
function f (x) centered at c ∈ R is the series
∑_{n=0}^∞ ( f^(n)(c)/n! ) (x − c)^n.
We say that a function f : I → R is real analytic if its Taylor series centered at some point
c ∈ I converges for all x ∈ I.
Oddly enough, most real analysis courses do not spend much time discussing real analytic
functions. However, from a numerical point of view, real analytic functions are precisely the
sort of well-behaved functions that are best suited for approximation. Not only are such
functions and all their derivatives continuous, but we can also approximate
them by Taylor polynomials, with error bounded by Taylor’s Theorem. In some cases, simply
knowing that a convergent Taylor series exists for a given function will allow us to refine our
algorithms to extract even more accuracy, even without explicitly computing the error.
Example 16.5. The Taylor series for the exponential function is particularly easy to compute,
since e^x is its own derivative. With f^(n)(x) = e^x, we have

e^x = ∑_{n=0}^∞ x^n/n! = 1 + x + x²/2 + x³/3! + · · · .
This series converges for all x ∈ R. While it is immediately clear that it converges for small
x, such as x = 0.1:

e^0.1 = 1 + 0.1 + 0.005 + 0.000166 + · · · ≈ 1.10517,

it is not immediately clear that it converges for large x, such as x = 10:

e^10 = 1 + 10 + 50 + 167 + 417 + · · · .
We can see why this converges by comparing two adjacent terms of the Taylor series. The
term x^n/n! differs from x^(n−1)/(n − 1)! by a factor of x/n. By the Archimedean property, for
any x ∈ R, there exists n ∈ N such that n > x, which gives us x/n < 1. So even if x is very
large, eventually the terms we are adding to the Taylor series start to decrease, and in fact
they decrease faster than the terms in the geometric series.
Let us now estimate the error in the Taylor series for e = e^1. Since f^(n+1)(ξ) = e^ξ is an
increasing function, it achieves its maximum for ξ ∈ [0, 1] at ξ = 1. That maximum
is e^1 = e. (Note that at the other endpoint ξ = 0, we have e^0 = 1, which is clearly smaller.)
This isn’t very useful in making an error estimate because it is the same value we are trying
to compute, and we cannot assume we already know it. However, we do know that e < 3,
so we can use 3 as an upper bound for |f (n+1) (ξ)|. We therefore have
| e − (1 + 1 + 1/2 + 1/6 + · · · + 1/n!) | < 3/(n + 1)!,

where we have left out the factor |1 − 0|^(n+1) = 1.
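As a numerical illustration (a Python sketch; the helper name is ours), we can verify that the partial sums of this series stay within the 3/(n + 1)! bound:

```python
import math

def exp1_partial(n):
    """Partial sum 1 + 1 + 1/2! + ... + 1/n! of the Taylor series for e at x = 1."""
    return sum(1 / math.factorial(k) for k in range(n + 1))

# The Taylor's Theorem bound above: |e - s_n| < 3 / (n+1)!
for n in range(1, 12):
    error = abs(math.e - exp1_partial(n))
    bound = 3 / math.factorial(n + 1)
    assert error < bound
print(exp1_partial(10))  # already accurate to roughly 8 digits
```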
Example 16.6. Consider the function f (x) = 1/(1 − x). Computing its Taylor series centered
at 0, we have:
f(x) = (1 − x)^(−1),          f(0) = 1
f′(x) = (1 − x)^(−2),         f′(0) = 1
f″(x) = 2(1 − x)^(−3),        f″(0) = 2
f‴(x) = 3!(1 − x)^(−4),       f‴(0) = 3!
f^(n)(x) = n!(1 − x)^(−n−1),  f^(n)(0) = n!
The corresponding Taylor series is then just the geometric series,
1/(1 − x) = 1 + x + x² + x³ + · · · = ∑_{n=0}^∞ x^n,
since the f^(n)(0) = n! factor cancels the n! in the denominator of the Taylor series formula.
Taking x = 0.1, we compute the Taylor series error estimate for the series 1.111 · · · = 10/9.
The function f^(n+1)(ξ) = (n + 1)!(1 − ξ)^(−n−2) is increasing, so it takes its maximum value on
the interval ξ ∈ [0, 0.1] at ξ = 0.1. There we have

f^(n+1)(0.1) = (n + 1)!/0.9^(n+2),
and our error estimate becomes
| 10/9 − (1 + 0.1 + 0.01 + · · · + 0.1^n) | < 0.1^(n+1)/0.9^(n+2) = 10/9^(n+2),
where the (n + 1)! terms in the numerator and denominator cancel. In particular, the n = 0
estimate of 1 is within an error of 10/81 = 0.123, and the n = 1 estimate of 1.1 is within an
error of 10/729 = 0.014. Also, since the series consists of entirely positive terms, we know
that the limit of the series is always greater than the estimate.
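Both claims can be confirmed numerically: the error stays below 10/9^(n+2), and the limit exceeds every partial sum. A small Python sketch (names ours):

```python
def s(n):
    """Partial sum 1 + 0.1 + 0.01 + ... + 0.1^n of the geometric series at x = 0.1."""
    return sum(0.1**k for k in range(n + 1))

limit = 10 / 9
for n in range(10):
    error = limit - s(n)      # positive: the limit is greater than every estimate
    assert 0 < error < 10 / 9**(n + 2)
print(limit - s(0), 10 / 81)  # error about 0.111, bound about 0.123
```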
Example 16.7. Consider the function f (x) = cos x. Computing its Taylor series centered at
0, we have
f(x) = cos x,       f(0) = 1
f′(x) = − sin x,    f′(0) = 0
f″(x) = − cos x,    f″(0) = −1
f‴(x) = sin x,      f‴(0) = 0
f^(4)(x) = cos x,   f^(4)(0) = 1,
and the corresponding series is

cos x = 1 − x²/2 + x⁴/4! − x⁶/6! + · · · .

Like e^x, this series also converges for all x ∈ R. Analyzing the convergence via Taylor's
Theorem, we note that we always have | sin x| ≤ 1 and | cos x| ≤ 1. We can therefore bound
any derivative of cos x by 1, so the error in the nth degree Taylor polynomial is at most

| cos x − p_n(x)| ≤ |x|^(n+1)/(n + 1)!.
In particular, since the 2n-th degree Taylor polynomial is the same as the (2n + 1)-st degree
Taylor polynomial (since the degree 2n + 1 term vanishes), its error bound is

| cos x − p_{2n}(x)| ≤ x^(2n+2)/(2n + 2)!,
which is precisely the next term in the series! This is common for Taylor series with alternating positive and negative signs, and illustrates the Alternating Series Test below.
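This alternating behavior is easy to check numerically. Here is a Python sketch (the function name is ours) comparing the error in p_{2m} against the next term of the series:

```python
import math

def cos_taylor(x, m):
    """Degree-2m Taylor polynomial of cos at 0: sum of (-1)^k x^(2k) / (2k)!."""
    return sum((-1)**k * x**(2 * k) / math.factorial(2 * k) for k in range(m + 1))

x = 1.0
for m in range(1, 6):
    error = abs(math.cos(x) - cos_taylor(x, m))
    next_term = x**(2 * m + 2) / math.factorial(2 * m + 2)
    assert error <= next_term  # the error never exceeds the next term
```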
Exercise 16.8. Compute the Taylor series for f (x) = ln x centered at c = 1. Using this
Taylor series, plug in x = 0 to show that the corresponding series for ln 0 diverges. Then
plug in x = 2 and use Taylor's Theorem to show that the corresponding series converges
to ln 2.
Exercise 16.9. Compute the Taylor series for f(x) = √x centered at c = 1, finding a general
expression for the terms. Use this Taylor series to compute √1.1 to three digits after the
decimal point. Then try to compute √2 using 10 terms of the Taylor series. What do you
notice about the convergence?
Theorem 16.10 (Alternating Series Test). Suppose that the terms a_n of a series alternate
between positive and negative, that the |a_n| are decreasing (or non-increasing), and that
lim_{n→∞} a_n = 0. Then the series ∑_{n=1}^∞ a_n converges. Furthermore, the error in the partial
sum s_n is bounded by the next term of the series, |a_{n+1}|.
Proof. This is a corollary of the Nested Interval Theorem. Suppose an is positive. Then
since the series is alternating we have an+1 ≤ 0, and looking at the partial sums, we see
that sn+1 = sn + an+1 ≤ sn . By our alternating and decreasing assumptions, we have
an+2 ≤ −an+1 , or equivalently an+1 + an+2 ≤ 0, and so sn+2 = sn + an+1 + an+2 ≤ sn .
Continuing this process, we see that sm ≤ sn for all m > n. Similarly, if an is negative we see
that sm ≥ sn for all m > n. We therefore get a chain of nested intervals (assuming without
loss of generality that a1 is positive):
[s2 , s1 ] ⊃ [s2 , s3 ] ⊃ [s4 , s3 ] ⊃ [s4 , s5 ] ⊃ [s6 , s5 ] ⊃ [s6 , s7 ] ⊃ · · · .
Since limn→∞ an = 0, these nested intervals shrink to zero size, and so there is a unique value
L which is the limit of the sn for both the left (even) endpoints and right (odd) endpoints.
It follows that ∑_{n=1}^∞ a_n = L. Furthermore, since L lies in each of the intervals, we see that
the error |s_n − L| is at most the size of the interval between s_n and s_{n+1}, which is |a_{n+1}|.

In general, alternating series work extremely well for approximations. Instead of using
Taylor’s Theorem, which requires bounding successively larger derivatives to determine the
error, we need only look at the next term of the series! In fact, if a Taylor series is alternating,
then the Taylor series error bound is often the absolute value of the next term of the series,
as we saw at the end of the previous section.
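For instance (a Python sketch, names ours), the alternating series ∑ (−1)^n/n converges to −ln 2, and its partial sums obey the |a_{n+1}| error bound:

```python
import math

def alt_partial(n):
    """Partial sum of sum_{k=1}^{n} (-1)^k / k, which converges to -ln 2."""
    return sum((-1)**k / k for k in range(1, n + 1))

L = -math.log(2)
for n in range(1, 200):
    # Alternating Series Test: the error is at most the next term, 1/(n+1).
    assert abs(alt_partial(n) - L) <= 1 / (n + 1)
print(alt_partial(100))  # within 1/101 of -ln 2, but converging slowly
```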
Exercise 16.11. Using the Taylor series for ln x centered at c = 1 from Exercise 16.8, compute
both ln 0.9 and ln 1.1 to three digits after the decimal point. Estimate the error in each case
and show that it is less than 0.001. Note that one of the series will be an alternating series,
so you can use the Alternating Series Test. However, the other series will not be alternating,
so you will need to use Taylor’s Theorem to estimate the error.
Alternating series are much more likely to converge than series that are entirely positive.
We make this more precise with the following definition and lemmas, which explore the
differences between positive and alternating series.
Definition 16.12. The series ∑_{n=1}^∞ a_n is called absolutely convergent if the positive series
of absolute values ∑_{n=1}^∞ |a_n| converges.
Lemma 16.13 (Series Comparison Test). Suppose 0 ≤ a_n ≤ b_n for all n ∈ N and ∑_{n=1}^∞ b_n
converges. Then ∑_{n=1}^∞ a_n converges.
Proof. We note that the sequence of partial sums {s_n^a}_{n=1}^∞ for the a_n series is monotone
non-decreasing. If ∑_{n=1}^∞ b_n = L, we see that s_n^a ≤ s_n^b ≤ L, where s_n^b is a partial sum for the
b_n sequence. Since the sequence {s_n^a}_{n=1}^∞ is monotone and bounded, it must converge by
the Monotone Convergence Theorem.
Lemma 16.14. Absolute convergence implies convergence.
Proof. Suppose that ∑_{n=1}^∞ |a_n| converges. Multiplying all terms by 2, we see that ∑_{n=1}^∞ 2|a_n|
converges as well. However, we also have 0 ≤ |a_n| + a_n ≤ 2|a_n|. By the Series Comparison
Test, this tells us that the series

∑_{n=1}^∞ ( |a_n| + a_n )

converges. But then the difference of two convergent series

∑_{n=1}^∞ a_n = ∑_{n=1}^∞ ( |a_n| + a_n ) − ∑_{n=1}^∞ |a_n|

must converge as well.
The converse of this lemma is not true. For example, the alternating series

∑_{n=1}^∞ (−1)^n/n = − ln 2

converges, but taking the absolute values of its terms gives the divergent harmonic series

∑_{n=1}^∞ 1/n.
Series that are convergent but not absolutely convergent are called conditionally convergent,
and they typically converge more slowly than absolutely convergent series. For example, the
Taylor series you computed for √2 is conditionally convergent and converged slowly.
The convergence of a power series ∑_{n=0}^∞ a_n x^n depends on the value of the variable x.
In general, a power series converges on some interval, which may be a degenerate interval
consisting of just the single point x = 0, or it may be the entire real line. In the interior
of this interval of convergence, we will show that the power series converges absolutely by
comparing the power series to a geometric series.
Lemma 16.15. If a power series ∑_{n=0}^∞ a_n x^n converges for some x = r, then it converges
absolutely for all x with |x| < |r|.
Proof. Since ∑_{n=0}^∞ a_n r^n converges, we know that its terms a_n r^n are bounded. Suppose that
|a_n r^n| < M for all n ∈ N. Then we have

0 ≤ |a_n x^n| = |a_n r^n| |x/r|^n < M |x/r|^n.

If |x| < |r|, then |x/r| < 1, and the series

∑_{n=0}^∞ M |x/r|^n = M ∑_{n=0}^∞ |x/r|^n

is a geometric series which converges. Then by the comparison test we see that ∑_{n=0}^∞ |a_n x^n|
converges, and so ∑_{n=0}^∞ a_n x^n converges absolutely.
Theorem 16.16 (Radius of Convergence). There are three possibilities for the convergence
of a power series of the form ∑_{n=0}^∞ a_n (x − c)^n:
(1) it converges absolutely for all x ∈ R, in which case it is called an entire function,
(2) it converges only for x = c, where it is just the constant series a_0, or
(3) there is a radius of convergence R such that it converges absolutely when |x − c| < R,
and it diverges when |x − c| > R.
Proof. Assuming that neither (1) nor (2) holds, we know that the power series diverges at
some z 6= c, and by the contrapositive of the above lemma, it must diverge for all x with
|x − c| > |z − c|. Consider the set
S = {r ∈ R | the series diverges for |x − c| > r}.
The set S contains |z − c|, so it is non-empty, and all elements of S are positive, so S is
bounded below. Let R be the greatest lower bound of the set S.
If |x−c| > R, then there is some element r ∈ S so that |x−c| > r > R, and thus the power
series diverges at x. However, if |x − c| < R, then there exists r ∉ S with |x − c| < r < R,
and some y ∈ R with |y − c| > r where the power series converges. But then |x − c| < |y − c|
and by the above lemma the power series converges absolutely at x.
Note that at the radius of convergence, i.e., at the points c ± R, this theorem does not say
whether the series converges or not. In some cases the series may converge at both of these
points, while in other cases it may diverge at one or both. Even if the series converges at one
of these points while diverging at the other, that convergence cannot be absolute, since
absolute convergence depends only on |x − c| = R. So what course must you take to make sense of this? Complex
Analysis! It turns out that if you consider complex-valued power series, there is a theorem
which says that the function always has at least one singular point on the complex circle
|z − c| = R. However, that point may not lie along the real line.
In practice, if we want to determine the radius of convergence of a power series, we use the
Ratio Test from calculus, which once again compares the given series to a geometric series.
P
Theorem 16.17 (Ratio Test). Consider the series ∞
n=0 an , and take the limit of the ratios
of successive coefficients
an+1 .
L = lim n→∞
an APPLIED ANALYSIS
33
If L < 1, then the series converges absolutely, and if L > 1 or L = ∞, then the power series
diverges.
Proof. If L < 1, then there exists r ∈ R such that L < r < 1. There then exists some
N ∈ N such that for n ≥ N we have |an+1 /an | < r, or equivalently |an+1 | < r|an |. The
terms of the series ∑_{n=1}^∞ |a_n| are thus eventually less than the terms of a geometric series
with 0 < r < 1, and by the comparison test we see that ∑_{n=1}^∞ |a_n| converges. It follows that
∑_{n=1}^∞ a_n converges absolutely.
If L > 1 or L = ∞, then there exists r ∈ R such that 1 < r < L, and similarly there
exists N ∈ N such that |a_{n+1}/a_n| > r for n ≥ N. With |a_{n+1}| > r|a_n|, the terms a_n of our
series are unbounded, and so the series cannot converge.
Note that this theorem says nothing if the limit of the ratios is 1, which corresponds to
the fact that a power series may converge or diverge at its radius of convergence.
Example 16.18. Consider the power series

∑_{n=1}^∞ x^n/n.
Applying the ratio test, we obtain

lim_{n→∞} | (x^(n+1)/(n + 1)) / (x^n/n) | = |x| lim_{n→∞} n/(n + 1) = |x| · 1 = |x|.
So the power series converges absolutely when |x| < 1 and diverges when |x| > 1.
In general, given a power series ∑_{n=0}^∞ a_n x^n, we compute

lim_{n→∞} | a_{n+1} x^(n+1) / (a_n x^n) | = |x| lim_{n→∞} |a_{n+1}/a_n|,

and the radius of convergence is then

R = lim_{n→∞} |a_n/a_{n+1}| = 1 / lim_{n→∞} |a_{n+1}/a_n|.
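This reciprocal-ratio formula can be explored numerically. A Python sketch (names ours) estimating R from finite ratios of coefficients:

```python
import math

def radius_estimate(a, n):
    """Estimate R = lim |a_n / a_{n+1}| using the coefficients at a single index n."""
    return abs(a(n) / a(n + 1))

# For sum x^n / n, the ratios (n+1)/n approach 1, so R = 1.
print(radius_estimate(lambda n: 1 / n, 1000))  # close to 1

# For sum x^n / n!, the ratios are n + 1, growing without bound: R is infinite.
print(radius_estimate(lambda n: 1 / math.factorial(n), 50))  # large and growing
```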
So why do we care so much about the radius of convergence of a Taylor series? It is
because it is very tempting to assume that Taylor series converge everywhere. Since they
typically don’t, it is vital to know where a Taylor series converges before you attempt to use
it, so you don’t accidentally try to compute the value of a divergent series. Also, a Taylor
series will converge more slowly as you approach its radius of convergence, and knowing
when that happens will help you analyze the convergence rate.
17. Numerical Differentiation
17.1. The Derivative. We recall the definition of the derivative

f′(x) = lim_{h→0} ( f(x + h) − f(x) ) / h.
Instead of taking the limit and evaluating derivatives by the usual set of algebraic rules, in
this section we compute derivatives numerically. Given the value of f (x) at two points, such
as f (x + h) and f (x), we can compute the slope of the secant line as an estimate of the
derivative f 0 (x). But how good an estimate is it? We will bound the error and show how to
minimize it. We start with our standard assumption that the function is real analytic, i.e.,
that it can be expressed as a Taylor series

f(x + h) = f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + · · · + h^n f^(n)(x)/n! + · · · .
Subtracting f (x), we can estimate the error for the standard divided difference formula for
the derivative
( f(x + h) − f(x) ) / h = ( h f′(x) + O(h²) ) / h = f′(x) + O(h),
where the O(h) term involves the second derivative. This means that we expect the error in
the approximation (f (x + h) − f (x))/h of the derivative to be proportional to h itself. What
we do not see in the O(h) notation is the constant of proportionality, which is controlled by
the second derivative f″(ξ) for ξ between x and x + h. If that derivative is nicely bounded,
then the h dominates the expression. However, if that derivative is wild or unbounded, then
the error may be too large to be useful.
We note that the secant line we have been using goes through the point (x, f (x)) and
looks either forward or back to the point (x + h, f (x + h)). But what if we look forward and
back simultaneously, using both +h and −h and taking the average of the slopes of the two
secant lines? This gives us the central difference formula
(17.1)    (1/2) ( ( f(x + h) − f(x) ) / h + ( f(x) − f(x − h) ) / h ) = ( f(x + h) − f(x − h) ) / (2h).
This central difference formula takes just as much effort to compute as the standard divided
difference formula, using the value of the function at just two points. However, when we
analyze the error, we find that all the even powered terms in the two Taylor series cancel,
f(x + h) − f(x − h) = ( f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + · · · )
                    − ( f(x) − h f′(x) + h² f″(x)/2 − h³ f‴(x)/6 + · · · )
                    = 2h f′(x) + h³ f‴(x)/3 + · · · ,
and we have
( f(x + h) − f(x − h) ) / (2h) = ( 2h f′(x) + O(h³) ) / (2h) = f′(x) + O(h²),
where the O(h²) term involves the third derivative. What this means is that if we use the
central difference formula rather than the standard divided difference formula, the error is
much smaller. In particular, since the error is proportional to h² rather than h, we get
twice as many digits of accuracy with the central difference formula as we do with the
standard divided difference formula, all without working any harder. (Provided that the
third derivative f‴(ξ) is well controlled for ξ between x and x + h.) This is our first glimpse
of the magical power of numerical analysis! By simply adjusting our computation slightly
we can often get significantly more accuracy.
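Here is a small Python experiment (function names ours, not part of the course) showing the payoff for f(x) = sin x at x = 1, where the true derivative is cos 1:

```python
import math

def forward_diff(f, x, h):
    """Standard divided difference: error O(h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Central difference: error O(h^2) for the same amount of work."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1.0, 1e-4
err_forward = abs(forward_diff(math.sin, x, h) - math.cos(x))
err_central = abs(central_diff(math.sin, x, h) - math.cos(x))
print(err_forward)  # roughly 4e-5, proportional to h
print(err_central)  # roughly 1e-9, proportional to h^2
```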
17.2. Higher Derivatives. What about higher derivatives? If we want to compute the
second derivative, we can approximate f″(x) by

f″(x) ≈ ( f′(x + h) − f′(x) ) / h
      = ( ( f(x + 2h) − f(x + h) ) / h + O(h) − ( f(x + h) − f(x) ) / h − O(h) ) / h
      = ( f(x + 2h) − 2f(x + h) + f(x) ) / h² + O(h).
Here the two O(h) terms do not cancel completely. Rather, their difference eliminates the h
terms in the Taylor series, leaving the h² terms and higher, giving us O(h) − O(h) = O(h²).
Furthermore, the O(h) term involves the third derivative of f . If we want to verify this
explicitly from the Taylor series, we compute
f(x + 2h) − 2f(x + h) + f(x) = ( f(x) + 2h f′(x) + 4h² f″(x)/2 + 8h³ f‴(x)/6 + · · · )
                             − 2( f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + · · · )
                             + f(x)
                             = h² f″(x) + O(h³),
noting that the f(x) and f′(x) terms all cancel.
For even higher derivatives, we obtain the following results
f(x) = f(x),
f′(x) = ( f(x + h) − f(x) ) / h + O(h),
f″(x) = ( f(x + 2h) − 2f(x + h) + f(x) ) / h² + O(h),
f‴(x) = ( f(x + 3h) − 3f(x + 2h) + 3f(x + h) − f(x) ) / h³ + O(h),
f^(n)(x) = (1/h^n) ∑_{k=0}^n (−1)^(n−k) (n choose k) f(x + kh) + O(h).
These formulæ are not difficult to verify once we observe that the coefficients are taken from
Pascal’s Triangle, and we leave this as an exercise for the interested reader.
So far, we have considered only the standard divided difference formulae for the higher
derivatives. What about the higher derivative version of the central difference formula? We
notice that our formula for f″(x) is computed using f(x), f(x + h), and f(x + 2h). If we
instead center those three points around f (x) itself, we obtain the central difference formula
for the second derivative
f″(x) = ( f(x + h) − 2f(x) + f(x − h) ) / h² + O(h²),
which once again replaces the O(h) error with a much better O(h²) error, giving us twice as
many digits of accuracy. Once again, we verify this via the Taylor series,
f(x + h) − 2f(x) + f(x − h) = ( f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + h⁴ f^(4)(x)/4! + · · · )
                            − 2f(x)
                            + ( f(x) − h f′(x) + h² f″(x)/2 − h³ f‴(x)/6 + h⁴ f^(4)(x)/4! − · · · )
                            = h² f″(x) + O(h⁴),
where we observe that all the odd powers of h cancel between f (x + h) and f (x − h).
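To see the O(h²) behavior in practice, here is a Python sketch (names ours) applying this formula to f = exp at x = 1, where f″(1) = e; halving h should cut the error by about a factor of 4:

```python
import math

def second_central(f, x, h):
    """Central difference for f''(x): (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x = 1.0
err1 = abs(second_central(math.exp, x, 1e-2) - math.e)
err2 = abs(second_central(math.exp, x, 5e-3) - math.e)
print(err1 / err2)  # close to 4, as expected for an O(h^2) method
```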
How do these central difference formulae work? Why is it that simply centering the points
where we evaluate the function around f(x) eliminates the O(h) term in the Taylor series,
giving us a much better O(h²) error? It is because the central difference expressions are
even functions of h. Recall that an even function is a function satisfying g(−h) = g(h),
and that the Taylor series for an even function g(h) has only even powers of h. Since our
central difference formulae for f′(x) and f″(x) do not change when we replace h with −h,
their Taylor series expansions have no odd powers of h. In particular, the O(h) error term
involves h¹, which is an odd power, and so it must vanish, leaving an even O(h²) error term.
Presto! No additional work, but a much better error estimate.
In general, the central difference versions of our higher derivative formulæ are of the form
f(x) = f(x),
f′(x) = ( f(x + h) − f(x − h) ) / (2h) + O(h²),
f″(x) = ( f(x + h) − 2f(x) + f(x − h) ) / h² + O(h²),
f‴(x) = ( f(x + 3h) − 3f(x + h) + 3f(x − h) − f(x − 3h) ) / (2h)³ + O(h²),
f^(n)(x) = (1/h^n) ∑_{k=0}^n (−1)^(n−k) (n choose k) f(x + kh − nh/2) + O(h²)
         = (1/(2h)^n) ∑_{k=0}^n (−1)^(n−k) (n choose k) f(x + 2kh − nh) + O(h²),
where once again the coefficients come from Pascal’s triangle. The second version of the general formula simply replaces h with 2h. This gives better looking formulæ for odd derivatives,
since we end up evaluating our function at points such as f (x + h) rather than f (x + h/2).
17.3. Richardson Extrapolation. But it gets better. Much better. Suppose we want to
compute a first derivative using
g(h) = ( f(x + h) − f(x) ) / h = f′(x) + ch + dh² + · · · ,

where c and d are some constants coming from the Taylor series expansion. While this may
give us a reasonable estimate for f′(x), if we want to improve our estimate we could plug in
a smaller value of h:

g(h/2) = f′(x) + ch/2 + dh²/4 + · · · .

However, by taking a linear combination of g(h) and g(h/2) we can get an even better
estimate that eliminates the ch term in the Taylor series expansion:

g(h) − 2g(h/2) = −f′(x) + dh²/2 + · · · ,
which gives us

f′(x) = 2g(h/2) − g(h) + O(h²) = ( −f(x + h) + 4f(x + h/2) − 3f(x) ) / h + O(h²).
While this does give us a formula involving f (x + h), f (x + h/2), and f (x), the much more
important observation is about the formula involving g(h) and g(h/2). This formula is a
weighted average of two quantities of the form ax + by where the weights satisfy a + b = 1.
Such weighted averages are often used for linear interpolation, estimating a value between x
and y, in which case we would take 0 ≤ a, b ≤ 1. In our case, we have a = 2 and b = −1,
meaning that the g(h/2) term is doubly overweighted. This is an example of extrapolation.
We are effectively connecting g(h/2) and g(h) with a straight line, and extending that line
to estimate g(0) = f 0 (x). Doing this explicitly, the point-point form of this line is
y = ( g(h) − g(h/2) ) / (h/2) · (x − h/2) + g(h/2) = ( g(h) − g(h/2) ) / (h/2) · x + 2g(h/2) − g(h),
and the y-intercept gives our formula 2g(h/2) − g(h).
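A Python sketch of the extrapolation (names ours), again differentiating sin at x = 1:

```python
import math

def g(f, x, h):
    """Forward difference estimate g(h) = f'(x) + c h + d h^2 + ..."""
    return (f(x + h) - f(x)) / h

def richardson(f, x, h):
    """Extrapolate the line through (h, g(h)) and (h/2, g(h/2)) to h = 0."""
    return 2 * g(f, x, h / 2) - g(f, x, h)

x, h = 1.0, 1e-3
print(abs(g(math.sin, x, h) - math.cos(x)))           # O(h) error
print(abs(richardson(math.sin, x, h) - math.cos(x)))  # O(h^2) error, far smaller
```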
Exercise 17.1. Suppose we are given a function with f (−1) = 1, f (0) = 0 and f (1) = 3.
(1) Using the central difference formula, extrapolation formula, and second derivative
formula, give the best possible estimates for f′(−1), f′(0), f′(1) and f″(0).
(2) Find the unique quadratic polynomial y = p(x) = ax2 + bx + c passing through the
points (−1, 1), (0, 0), (1, 3). (Hint. Plug in each of the (x, y) pairs to get a system of
3 equations in the 3 variables a, b, c.)
(3) Compute p′(−1), p′(0), p′(1), p″(0).
18. Contractions
Recall from our discussion of the Intermediate Value Theorem that any continuous function
f : [a, b] → [a, b] from a closed interval back to itself has a fixed point, i.e., a point x ∈ [a, b]
such that f (x) = x. To see this, consider the continuous function g(x) = f (x) − x. At a we
have g(a) ≥ 0 and at b we have g(b) ≤ 0. So by the Intermediate Value Theorem, there must
exist x ∈ [a, b] such that g(x) = 0, and thus f (x) = x. Note that while the Intermediate
Value Theorem tells us that a fixed point exists, there may in fact be more than one.
Fixed points are useful, since many computations can be restated in terms of finding the
fixed points of well chosen functions. In this section, we demonstrate a simple technique to
find the fixed point of a function, and we consider a class of functions called contractions
which are guaranteed to have a unique fixed point.
Definition 18.1. A contraction is a function f : I → I satisfying
|f (x) − f (y)| ≤ c |x − y|
for all x, y ∈ I, where c < 1 is a constant.
Such a function is called a contraction because all of the points in the domain get closer
to each other when you apply the function. In other words, applying the function causes the
space to contract. In the analysis literature, contractions are usually defined not only for
real functions in terms of absolute values, but also for functions on a general metric space,
with a distance function d(x, y). In that language, the condition is
d (f (x), f (y)) ≤ c d(x, y).
The simplest example of a contraction is a linear function f (x) = mx + b with slope |m| < 1.
In that case we have
|f (x) − f (y)| = |mx + b − my − b| = |mx − my| = |m||x − y|.
Indeed, the condition for a contraction can be restated in terms of the slope of secant lines:

| ( f(y) − f(x) ) / (y − x) | ≤ c < 1,
so we see that when our function is a contraction, the slopes of all secant lines have absolute
values less than 1, or more precisely are bounded below 1 by some fixed c. The slopes for a
contraction are therefore well controlled, so such functions cannot oscillate wildly.
Lemma 18.2. Contractions are continuous.
Proof. Let f : I → I be a contraction. Given any ε > 0, we take δ = ε/c. Then if
|x − y| < δ = ε/c, we have |f(x) − f(y)| ≤ c |x − y| < c · ε/c = ε.
The proof we gave here is very similar to that of the homework problem showing that a
linear function f (x) = mx + b is continuous. To show continuity, we do not need the slope to
be a constant. Instead, it is sufficient that the slope of any secant line is bounded. This proof
even works if the slopes of the secant lines are bounded by some M ≥ 1 (taking δ = ε/M),
although the definition of a contraction requires c < 1.
Also note that the δ we chose in the proof does not actually depend on the point x. Usually,
we talk about a function being continuous at a particular point x, so for every ε > 0, we
can find a δ > 0 that works for points close to x. The δ typically changes depending on the
value of x, but in this case, a single δ works for all x. This is a stronger form of continuity
called uniform continuity.
We could also have proved Lemma 18.2 using Definition 10.3 of continuity via sequences.
Suppose that lim_{n→∞} a_n = L. Then for any ε > 0, there exists N ∈ N such that n ≥ N
implies |a_n − L| < ε. But if f is a contraction, we have

|f(a_n) − f(L)| ≤ c |a_n − L| < cε < ε,
and thus limn→∞ f (an ) = f (L). In other words, when the elements of our original sequence
are close enough to their limit, they get even closer when we apply the contraction.
Lemma 18.3. If a contraction admits a fixed point, then that fixed point is unique.
Proof. Suppose both x and y are fixed points of a contraction f . Then f (x) = x and
f (y) = y, and so |f (x) − f (y)| = |x − y|. However, the contraction condition tells us that
|f(x) − f(y)| ≤ c |x − y| with c < 1. The only way that both could be true is if x = y.

In fact, every contraction admits a fixed point by our Intermediate Value Theorem argument above, and here is how you compute it.
Theorem 18.4 (Contraction Mapping Theorem). If f : I → I is a contraction, then the
sequence x_0, x_1 = f(x_0), x_2 = f(x_1) = f(f(x_0)), . . . , x_n = f(x_{n−1}) = f^n(x_0), . . . converges
to the unique fixed point of f , regardless of the initial choice of x0 ∈ I.
We provide two proofs of this theorem. For our first proof, we assume that we already
know that a fixed point exists. Since a contraction is continuous, our Intermediate Value
Theorem argument above tells us that one exists.
Proof assuming a fixed point exists. Suppose f (x) = x. For any initial estimate x0 , we have
|x_1 − x| = |f(x_0) − f(x)| ≤ c |x_0 − x|,
|x_2 − x| = |f(x_1) − f(x)| ≤ c² |x_0 − x|,
|x_n − x| ≤ c^n |x_0 − x|.
Given any ε > 0, we must choose N so that c^N |x_0 − x| < ε. Solving for N, we require
N > log_c( ε / |x_0 − x| ),
which we know exists by the Archimedean property. (Note that taking the log base c reverses
the inequality. This is because c < 1, and logc is a decreasing function. If you don’t buy
that, you can instead take ln of both sides, and then you need to divide by ln c < 0 to solve
for N.) Then for all n ≥ N, we have
|x_n − x| ≤ c^n |x_0 − x| ≤ c^N |x_0 − x| < ε,
which shows that lim_{n→∞} x_n = x.
If we want to prove the Contraction Mapping Theorem on a more general metric space
with a distance function, we do not have the benefit of the Intermediate Value Theorem
to produce our fixed point. Instead, we need to construct the fixed point by showing that
iterating the function produces a Cauchy sequence.
Proof not assuming a fixed point exists. We show that the sequence {x_n}_{n=0}^∞ is Cauchy. Let
d = |x_0 − x_1| be the distance between the first two points. Then we have
|x_1 − x_2| = |f(x_0) − f(x_1)| ≤ c|x_0 − x_1| = cd,
|x_2 − x_3| = |f(x_1) − f(x_2)| ≤ c|x_1 − x_2| ≤ c²d,
and in general
|x_n − x_{n+1}| ≤ c^n d.
By the triangle inequality, for m < n we obtain
|x_m − x_n| ≤ |x_m − x_{m+1}| + |x_{m+1} − x_{m+2}| + ··· + |x_{n−1} − x_n|
≤ c^m d + c^{m+1} d + ··· + c^{n−1} d
= d c^m (1 + c + ··· + c^{n−m−1})
= d c^m (1 − c^{n−m})/(1 − c)
= (d/(1 − c)) (c^m − c^n).
So we see that as m and n get larger, the elements x_m and x_n get closer and closer together.
To make this more precise, suppose we are given ε > 0. We want to find N so that for all
m, n > N we have |x_m − x_n| < ε. To do this, we must choose N such that
d c^N/(1 − c) < ε  ⟹  c^N < ε(1 − c)/d.
Taking logarithms of both sides, we obtain
N ≥ log_c( ε(1 − c)/d ),
and we know we can find such an N by the Archimedean property. (As before, taking the
log base c reverses the inequality, since c < 1 and log_c is a decreasing function.)
Given m, n > N, we then have
|x_m − x_n| ≤ (d/(1 − c))(c^m − c^n) ≤ (d/(1 − c)) max(c^m, c^n) ≤ (d/(1 − c)) c^N < ε,
and so the sequence {x_n}_{n=0}^∞ is Cauchy. It therefore converges by the Cauchy Completeness
Theorem to lim_{n→∞} x_n = x. Finally, we show that x is a fixed point of f. Since applying f
simply shifts the elements of the sequence by one, by continuity we observe that
f(x) = f(lim_{n→∞} x_n) = lim_{n→∞} f(x_n) = lim_{n→∞} x_{n+1} = x,
and so x is indeed the unique fixed point of f.
These proofs not only show that the sequence obtained by iterating the function f converges to the fixed point, but they also give us a way to estimate the convergence and control
the error. From our first proof, we see that
|x_n − x| ≤ c |x_{n−1} − x|,
so each iteration of the function improves the estimate by a factor of c < 1. This is called
linear convergence, and it is similar to the convergence rate of the bisection method, where
each successive estimate had an error bounded by half the error of the previous estimate.
Provided that c < 1, a sequence with linear convergence is guaranteed to converge. The
difficulty with this method is coming up with the initial error |x − x0 |. We do not know
where the fixed point is (if we did, we would not be trying to estimate it), so all we have to
go on is that the fixed point is somewhere in the domain I of the function.
Our second proof gives us a more precise estimate of the error. Given any initial estimate
x_0, we consider the distance d = |x_0 − f(x_0)| between our initial estimate and the next
estimate obtained by applying the function f. We then have
|x_n − x| = lim_{m→∞} |x_n − x_m| ≤ lim_{m→∞} (d/(1 − c))(c^n − c^m) = (d/(1 − c)) c^n.
This gives us a good error bound, controlled both by the initial distance d and by the number
of times n we have iterated the function f . Although the sequence will always converge, we
get much better error estimates if the initial distance d is small.
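To make this concrete, here is a minimal sketch in Python (the helper name and the example contraction are ours, not from the notes): it uses the a priori bound d c^n/(1 − c) from the second proof to decide, before iterating, how many steps guarantee a given tolerance.

```python
import math

def iterate_contraction(f, x0, c, tol):
    """Iterate x_{n+1} = f(x_n), choosing n in advance so that the
    a priori error bound d * c**n / (1 - c) falls below tol."""
    d = abs(x0 - f(x0))
    if d == 0:
        return x0, 0
    # Solve d * c**n / (1 - c) < tol for n; log base c reverses the inequality.
    n = max(0, math.ceil(math.log(tol * (1 - c) / d, c)))
    x = x0
    for _ in range(n):
        x = f(x)
    return x, n

# Example: f(x) = x/2 + 1 is a contraction with c = 1/2 and fixed point 2.
x, n = iterate_contraction(lambda x: x / 2 + 1, 0.0, 0.5, 1e-9)
```

Note that the iteration count n is computed entirely from d, c, and the tolerance, before we know where the fixed point is.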
This process of iterating the function f(x) over and over again is one of the core ideas of
dynamical systems. It is basically a feedback loop, where the output of one step is the input
in the next. With the Contraction Mapping Theorem we have what is called a stable fixed
point, or an attractor, where the dynamical system sucks in all (nearby) values to the fixed
point. Stable fixed points are very forgiving. If you move slightly away from the fixed point,
applying the function a few times will take you back to the fixed point. This is in contrast
to an unstable fixed point, also called a repeller, where if you move slightly away from the
fixed point, applying the function takes you even farther away.
19. Fixed Point Iteration
So how do we know that a given function is a contraction? We do not usually verify the
contraction condition for all x, y ∈ I, unless there is an obvious reason why all the slopes
of the secant lines are small, such as if the function is linear. Instead, we switch from using
the slopes of secant lines to using derivatives, i.e., the slopes of tangent lines, via the Mean
Value Theorem.
Theorem 19.1 (Fixed Point Iteration Theorem). Suppose f(x) = x, and that f is continuously differentiable on some open interval containing x. If |f′(x)| < 1, then there exists an
open interval I containing x on which f : I → I is a contraction.
Proof. Since |f′(x)| < 1, we have −1 < f′(x) < 1. Since f′ is continuous, we can use the
Sign Preserving Property for continuous functions to show that |f′(y)| is bounded away
from 1 in a small enough interval containing x. More precisely, there exists a constant c₊
and a δ₊ > 0 such that f′(y) < c₊ < 1 whenever |x − y| < δ₊, and there exists another
constant c₋ and a δ₋ > 0 such that −1 < c₋ < f′(y) whenever |x − y| < δ₋. Combining
these, taking c = max(|c₊|, |c₋|) and δ = min(δ₊, δ₋), we see that |f′(y)| < c < 1 for all
y ∈ I = (x − δ, x + δ).
For any two points z, w ∈ I, the Mean Value Theorem tells us that there exists y between z and w with
(f(z) − f(w))/(z − w) = f′(y).
However, y is also in I, and taking absolute values and using the result of the previous
paragraph, we have
|f(z) − f(w)| = |f′(y)| |z − w| < c |z − w|.
In particular, taking w = x, since f(x) = x this gives |f(z) − x| < c|z − x| < δ, so f maps I
into itself. It follows that f : I → I is a contraction.
Although this proof may seem complicated, the fundamental idea behind the Fixed Point
Iteration Theorem is fairly intuitive. If we want to bound the slopes of secant lines, the
Mean Value Theorem tells us that for a differentiable function, the slopes of secant lines are
actually given by derivatives of the function. So we want to bound the derivative. But if the
derivative is continuous and satisfies the appropriate bound at the fixed point, then it must
also satisfy that bound at other nearby points, in a sufficiently small interval.
Example 19.2. We will compute the fixed point of cos x (in radians). From looking at
where the graphs of y = cos x and y = x cross, we see that cos x has a unique fixed point,
somewhere around 0.7. The function cos x is almost but not quite a contraction. If we
consider the slopes of tangent lines, i.e., derivatives, we note that the derivative of cos x is
−sin x. While |sin x| ≤ 1 for all x, we do not have |sin x| ≤ c < 1 for a fixed c, so the slopes of
the tangent lines, and hence of the secant lines, are not bounded below 1.
If we consider the Fixed Point Iteration Theorem, we must show that |f′(x)| = |sin x| < 1
at the fixed point x. We do know that |sin x| ≤ 1 for all x, so we just need to show that
|sin x| ≠ 1 at the fixed point. However, we have sin x = ±1 only at x = π/2 + kπ for k ∈ Z,
and none of those points is the fixed point of cos x.
So we now take any x0 sufficiently close to the fixed point (in fact, ANY initial value will
do), and start iterating cos x. Starting at x0 = 0, we obtain the sequence:
0, 1, 0.54030230586814, 0.857553215846393, 0.65428979049778, 0.793480358742565,
0.701368773622757, 0.763959682900654, 0.722102425026708, 0.75041776176376,
0.73140404242251, 0.744237354900557, 0.735604740436347, 0.741425086610109,
0.737506890513243, 0.740147335567876, 0.738369204122323, 0.739567202212256, . . .
As far as our computational algorithms go, this is the easiest we have seen! All we need to
do is keep mashing the cos button on our calculator until all the digits stay the same.
So why doesn’t it matter what our initial value is? Even though cos x is not a contraction
in general, it is a contraction on any interval where |f 0 (x)| = | sin x| < 1. In particular, it is
a contraction on the interval (−π/2, +π/2). For any starting value x0 , we have
cos x0 ∈ [−1, +1] ⊂ (−π/2, +π/2),
so after just one iteration of cos x, our iterates are indeed inside an interval where cos x is a
contraction.
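This button-mashing procedure is easy to script. A minimal sketch in Python (variable names ours):

```python
import math

# Iterate cos starting from x0 = 0 until successive values agree to 12 digits.
x = 0.0
for _ in range(200):
    x_next = math.cos(x)
    if abs(x_next - x) < 1e-12:
        x = x_next
        break
    x = x_next
```

The loop settles at x ≈ 0.7390851332, the unique fixed point of cos.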
Example 19.3. Consider the Golden Ratio φ, approximately 1.6, which is a root of the
quadratic equation φ² − φ − 1 = 0. We can rewrite this equation to exhibit φ as a fixed point of
the function f(x) = x² − 1. What happens if we iterate this function near 1.6? Starting at
x0 = 1, we obtain
1, 0, −1, 0, −1, 0, −1, 0, . . . .
That sequence clearly does not converge! What if we start even closer to the Golden Ratio,
at x_0 = 1.6? We compute the sequence
1.6, 1.56, 1.4336, 1.0552, 0.1134, . . . ,
whose terms are getting farther away from φ and from each other, so it also diverges. What
is going wrong? Computing the derivative at the fixed point φ, we have f′(x) = 2x and
f′(φ) = 2φ, which is approximately 3.2. So we fail the Fixed Point Iteration Theorem
condition on the derivative. Indeed, the argument of the Fixed Point Iteration Theorem can
be modified to show that if |f′(x)| > 1, then x is an unstable fixed point or repeller.
Let’s compute the Golden Ratio a different way. Rewriting its quadratic equation, we
have
φ = 1 + 1/φ.
Letting f(x) = 1 + 1/x, we can iterate:
1, 2, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, … .
This sequence does converge, and we notice that each fraction is a ratio of successive terms of
the Fibonacci sequence, which is one of the many cool properties of the Golden Ratio. How fast
does this converge? We compute f′(x) = −1/x², and near the Golden Ratio of 1.6 we have
|−1/(1.6)²| ≈ 0.4. So we expect to find linear convergence with a constant of c ≈ 0.4. Note
that this fixed point iteration sequence will converge to the Golden Ratio φ slightly faster
than solving the quadratic equation by the bisection method, which gives linear convergence
with a constant of c = 1/2.
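We can watch these Fibonacci ratios appear by carrying out the iteration in exact rational arithmetic; a short sketch in Python (setup ours):

```python
from fractions import Fraction

# Iterate f(x) = 1 + 1/x starting from 1, keeping exact fractions.
x = Fraction(1)
iterates = [x]
for _ in range(7):
    x = 1 + 1 / x
    iterates.append(x)
# iterates: 1, 2, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21 -- ratios of Fibonacci numbers
```

Each iterate is the ratio of two successive Fibonacci numbers, and the values approach φ ≈ 1.618.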
Exercise 19.4. Suppose we iterate the function f (x) = 1 + 1/x as in the example above to
compute the Golden Ratio φ. But suppose that instead of starting with x0 = 1, we choose
two arbitrary real numbers f0 6= 0 and f1 ≥ f0 and start with initial estimate x0 = f1 /f0 ≥ 1.
Show that the successive terms of the fixed point iteration are all ratios of the form
x_n = f_{n+1} / f_n,
where the f_n satisfy the Fibonacci relation
f_n = f_{n−1} + f_{n−2}.
Show that although this is not necessarily the usual Fibonacci sequence, the ratios of its successive terms still converge to the Golden Ratio φ.
Since we are most interested in functions that are real analytic, i.e., functions that can be
approximated by Taylor series, we will now use Taylor’s Theorem to analyze the Fixed Point
Iteration error. In some cases, this analysis will show that the convergence is even better
than the linear convergence guaranteed by the Contraction Mapping Theorem.
If c is a fixed point of f, so that f(c) = c, then Taylor's Theorem gives us
f(x) = c + f′(ξ)(x − c) = c + O(h),
where ξ is between c and x, and h = |x − c|. This means that
|f(x) − c| = |f′(ξ)| |x − c|.
If we have a bound M such that |f′(ξ)| ≤ M < 1 for all ξ between c and our initial estimate,
then we obtain the linear convergence
|f(x) − c| ≤ M |x − c|.
Since M < 1, each successive estimate is closer to the fixed point c than the previous, so our
initial bound M on f′(ξ) holds for subsequent estimates as well. In our proof of the Fixed
Point Iteration Theorem, we assumed that |f′(c)| < 1, so the Sign Preserving Property of
continuous functions gave us |f′(ξ)| < M < 1 for ξ sufficiently close to c.
If f′(c) = 0, then we get even faster convergence. In that case Taylor's Theorem gives us
f(x) = c + (f″(ξ)/2)(x − c)² = c + O(h²),
where ξ is between c and x. This means that
|f(x) − c| = (|f″(ξ)|/2) |x − c|².
If we have a bound M such that |f″(ξ)|/2 ≤ M for all ξ between c and our initial estimate,
then we obtain quadratic convergence
|f(x) − c| ≤ M |x − c|².
In general, if the derivatives f′(c) = f″(c) = ··· = f^{(n−1)}(c) = 0 all vanish at the fixed
point, then the fixed point iteration has degree n convergence, with
|f(x) − c| ≤ M |x − c|^n,
where M is an upper bound on |f^{(n)}(ξ)|/n!.
20. Root Approximation
We have already seen how to use the Bisection Method to approximate the roots of a
function, solving f(x) = 0. The Bisection Method is an iterative method with linear convergence and a constant of c = 1/2. The strength of the Bisection Method is that it always
works, as it is guaranteed by the Intermediate Value Theorem. However, it is relatively slow,
providing only one binary digit of accuracy at each stage, and taking a little over 3 steps
for each decimal digit of accuracy. In this section we discuss other iterative methods of root
approximation with better convergence.
20.1. Newton’s Method. If we are working with a function that is differentiable, where
we can actually compute the derivative, we can use Newton’s Method to approximate its
roots. If we have an estimate xn for the root, we replace f (x) with its linear approximation
l(x) at the point (xn , f (xn )). By the point-slope form of the line, we have
l(x) = f′(x_n)(x − x_n) + f(x_n).
Solving for the root of l(x), we obtain our next estimate
0 = f′(x_n)(x_{n+1} − x_n) + f(x_n)  ⟹  x_{n+1} = x_n − f(x_n)/f′(x_n).
This gives us a fixed point iteration for the function
g(x) = x − f(x)/f′(x).
How fast does this converge? Using the Fixed Point Iteration Theorem, we compute g′(x)
and obtain
(20.1)  g′(x) = 1 − (f′(x)² − f(x)f″(x))/f′(x)² = f(x)f″(x)/f′(x)².
At the root we have f(x) = 0, so g′ vanishes there. This means that g(x) is a
contraction sufficiently close to the root, and furthermore we expect to see at least quadratic
convergence. Computing the second derivative g″(x), we obtain
(20.2)  g″(x) = (f′(x)³ f″(x) + f(x) f′(x)² f‴(x) − 2 f(x) f′(x) f″(x)²) / f′(x)⁴
              = (f′(x)² f″(x) + f(x) f′(x) f‴(x) − 2 f(x) f″(x)²) / f′(x)³,
and at the root, where f(x) = 0, we obtain g″(x) = f″(x)/f′(x). We therefore expect quadratic
convergence, giving us
|x_{n+1} − x| ≤ c |x_n − x|²,
with
c ≈ |f″(x) / 2f′(x)|.
Alternatively, we can analyze the iteration of Newton's Method directly via Taylor's Theorem. Suppose y is a root, so that f(y) = 0, and we have an approximation x. Expanding the Taylor
series at x, we obtain
0 = f(y) = f(x) + f′(x)(y − x) + (f″(ξ)/2)(y − x)²,
where ξ is between x and y. Dividing by f′(x), we obtain
0 = f(x)/f′(x) + y − x + (f″(ξ)/2f′(x))(y − x)²,
and rearranging the terms shows us that the next iteration of Newton's Method is
x − f(x)/f′(x) = y + (f″(ξ)/2f′(x))(y − x)²,
and the distance to the root changes according to
|x − f(x)/f′(x) − y| = (|f″(ξ)|/2|f′(x)|) |x − y|²,
giving us our expected quadratic convergence.
Example 20.1. Let's compute √2, which is a root of the function f(x) = x² − 2. By Newton's
Method, we iterate the function
g(x) = x − (x² − 2)/(2x) = x/2 + 1/x.
Starting at x_0 = 1, we obtain
1, 1.5, 1.4166667, 1.4142157, 1.4142137,
which gets us very quickly to √2 ≈ 1.4142136. Notice that we are approximately doubling
the number of digits of accuracy at each step of the iteration. This quadratic convergence
is much faster than the linear convergence of the bisection method, which takes around 10
steps just for every 3 digits of accuracy. For our error estimate, we have f″(x) = 2 and
f′(x) = 2x ≈ 3 near the root, so we obtain quadratic convergence with c ≈ 1/3 sufficiently
close to the root.
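The doubling of correct digits is easy to observe by running the iteration; a minimal sketch in Python (names ours):

```python
# Newton's Method for f(x) = x**2 - 2: iterate g(x) = x/2 + 1/x.
x = 1.0
history = [x]
for _ in range(5):
    x = x / 2 + 1 / x
    history.append(x)
# history runs 1, 1.5, 1.41666..., 1.4142157..., then machine-precision sqrt(2)
```

Comparing successive entries of `history` against √2 shows the error roughly squaring at each step.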
Exercise 20.2. Use Newton's Method to construct the function you need to iterate to compute
√n for any n ∈ N. Iterate this function to compute √5 to the maximum precision allowed
by your calculator or computer. Then compute the Golden Ratio
φ = (1 + √5)/2.
Exercise 20.3. Use Newton's Method to find x such that
cos(x/4) = sin(x/4).
Compute it to the maximum precision allowed by your calculator or computer.
Exercise 20.4. Suppose we are using Newton's Method to find a root r of a function f(x)
where f″(r) = 0. In this case, the function g(x) that we iterate satisfies not only the usual
g(r) = r and g′(r) = 0, but also g″(r) = 0. With the second derivative vanishing, our
discussion at the end of the last section shows that fixed point iteration in this case has at
least cubic convergence, with
|x_{n+1} − r| ≈ |g‴(r)/6| |x_n − r|³.
Compute the constant |g‴(r)/6|. (Hint: Differentiate (20.2) and remember that any terms
involving g′(x) or g″(x) will vanish when you plug in x = r at the end of the computation.)
Newton’s Method works very well to approximate roots provided that our initial estimate
is close enough to the root, or that f 00 (ξ)/f 0 (ξ) never gets too large. In particular, we can
run into problems if the function has a local minimum or maximum (or other critical point)
near our initial estimate, or indeed anywhere between our initial estimate and the root we
are trying to approximate. Also, Newton’s Method is not guaranteed to converge. Unlike
with the bisection method, we do not have the benefit of the Intermediate Value Theorem
telling us that our continuous function must have a root inside some interval.
Example 20.5. Let's use Newton's Method to approximate the root of f(x) = x². Iterating
g(x) = x − x²/(2x) = x − x/2 = x/2,
we see that it does indeed converge to the root 0. However, the convergence is not quadratic!
Instead we have linear convergence
|x_{n+1} − 0| = (1/2) |x_n − 0|.
This example illustrates that Newton's Method actually converges linearly, not quadratically, whenever we have f′(x) = 0 at the root x. Why does that happen? It is because the
quadratic convergence constant is |f″(x)/2f′(x)|, which blows up when f′(x) = 0. Viewed in
terms of finding the roots of successive tangent lines, the tangent line is horizontal where the
derivative vanishes, so the tangent lines at nearby points have roots farther away than we
expect. So Newton's Method works best when the function crosses the x-axis at an oblique
angle at the root. In the special case where the derivative vanishes at the root, we recall
from (20.1) that the fixed point iteration contraction constant is
g′(x) = f(x)f″(x)/f′(x)² = 0/0
at the root, where now both f(x) = 0 and f′(x) = 0. To resolve this indeterminate expression, we use L'Hôpital's rule (differentiating both numerator and denominator) twice,
giving
lim_{y→x} g′(y) = lim_{y→x} (f′(y)f″(y) + f(y)f‴(y)) / (2f′(y)f″(y))
= lim_{y→x} (f″(y)² + f′(y)f‴(y) + f′(y)f‴(y) + f(y)f⁽⁴⁾(y)) / (2f″(y)² + 2f′(y)f‴(y))
= 1/2,
so we find that we have linear convergence with constant 1/2, as we computed directly above.
There are variants of Newton's Method that are once again quadratically convergent in this
case, as well as for roots of multiplicity greater than 2.
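We can observe this degraded, exactly-halving convergence directly; a quick sketch in Python (names ours):

```python
# Newton's Method applied to f(x) = x**2 has iteration g(x) = x/2,
# so the error is only halved at each step: linear, not quadratic, convergence.
x = 1.0
errors = [abs(x)]
for _ in range(10):
    x = x - x**2 / (2 * x)   # the Newton step; simplifies to x/2
    errors.append(abs(x))
ratios = [errors[i + 1] / errors[i] for i in range(10)]
# every ratio is exactly 0.5: the hallmark of linear convergence with c = 1/2
```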
20.2. Secant Method. Another potential problem with Newton’s Method is that to use
it we must be able to compute the derivative f 0 (x). This is easy if the function f (x) is
given to us symbolically, since we can differentiate pretty much anything. However, if we
do not have a symbolic expression for our function, the best we can do is approximate the
derivative using the slopes of secant lines. This may be the case if the values of our function
are determined by performing an experiment or measurement.
We can perform the same basic construction as Newton’s Method, but rather than replacing our original function f (x) with a tangent line at a single point, we use a secant
line intersecting our function at two points. This gives rise to the Secant Method of root
approximation. Given estimates xn−1 and xn−2 , the secant line intersecting the graph of our
function f(x) at the points (x_{n−1}, f(x_{n−1})) and (x_{n−2}, f(x_{n−2})) is given by the point-point
form of the line:
l(x) = ((f(x_{n−1}) − f(x_{n−2})) / (x_{n−1} − x_{n−2})) (x − x_{n−1}) + f(x_{n−1}).
Solving for a root x_n of l(x) and performing some algebra, we obtain the iterative formula
x_n = −f(x_{n−1}) (x_{n−1} − x_{n−2}) / (f(x_{n−1}) − f(x_{n−2})) + x_{n−1}
    = (f(x_{n−1}) x_{n−2} − f(x_{n−2}) x_{n−1}) / (f(x_{n−1}) − f(x_{n−2})).
Note that this formula expresses the next iterate xn as a sort of weighted average of the prior
two iterates xn−1 and xn−2 . If f (xn−2 ) and f (xn−1 ) are both positive or both negative, then
this formula is an overweighted average that extrapolates xn , like we did with Richardson
extrapolation in Section 17.3. If however f (xn−1 ) and f (xn−2 ) have opposite signs, then xn is
interpolated between xn−2 and xn−1 . In this case, the Secant Method behaves like a smarter
version of the Bisection Method, choosing a new approximation not at the midpoint of the
interval, but rather taking into account the values of the function at the two endpoints.
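A direct implementation of this iteration, as a sketch in Python (the function name, stopping rule, and tolerances are ours, not prescribed by the notes):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant Method: iterate the weighted-average formula until the
    step size falls below tol."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                 # horizontal secant line: cannot proceed
            break
        x2 = (f1 * x0 - f0 * x1) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Example: the root of f(x) = x**2 - 2 between 1 and 2 is sqrt(2).
root = secant(lambda x: x * x - 2, 1.0, 2.0)
```

Unlike Newton's Method, no derivative of f is ever evaluated; only the two most recent function values are kept.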
Lemma 20.6. Let r be a root of the function f(x). The Secant Method has convergence
governed by the error estimate
|x_n − r| ≈ |f″(r)/2f′(r)| |x_{n−1} − r| |x_{n−2} − r|.
Proof. Computing the left hand side and then factoring out the error terms |x_{n−1} − r| and
|x_{n−2} − r|, we obtain
|x_n − r| = | (f(x_{n−1}) x_{n−2} − f(x_{n−2}) x_{n−1}) / (f(x_{n−1}) − f(x_{n−2})) − r |
= | (f(x_{n−1})(x_{n−2} − r) − f(x_{n−2})(x_{n−1} − r)) / (f(x_{n−1}) − f(x_{n−2})) |
= | ( f(x_{n−1})/(x_{n−1} − r) − f(x_{n−2})/(x_{n−2} − r) ) / (f(x_{n−1}) − f(x_{n−2})) | · |x_{n−1} − r| |x_{n−2} − r|.
Expanding the Taylor series of f(x) centered at r, we have
f(x) ≈ f′(r)(x − r) + (f″(r)/2)(x − r)²,
where the constant term f(r) = 0 vanishes because r is a root. Dividing by x − r gives
f(x)/(x − r) ≈ f′(r) + (f″(r)/2)(x − r).
Plugging this into our approximate expression for the error |x_n − r|, we obtain
|x_n − r| ≈ | ( f′(r) + (f″(r)/2)(x_{n−1} − r) − f′(r) − (f″(r)/2)(x_{n−2} − r) ) / (f(x_{n−1}) − f(x_{n−2})) | · |x_{n−1} − r| |x_{n−2} − r|
= | (f″(r)/2)(x_{n−1} − x_{n−2}) / (f(x_{n−1}) − f(x_{n−2})) | · |x_{n−1} − r| |x_{n−2} − r|
≈ |f″(r)/2f′(r)| |x_{n−1} − r| |x_{n−2} − r|,
where in the final step we observe that the slope of the secant line is approximately the
derivative f′(r).
Note that the convergence of the Secant Method is quite similar to that of Newton’s
Method. Both have the same constant |f 00 (r)/2f 0 (r)|. Also, both involve the product of
two errors on the right hand side. For Newton’s Method, that product is |xn−1 − r|2 , giving
quadratic convergence, while for the Secant Method it is |xn−1 − r| |xn−2 − r|, multiplying
the two previous errors. In fact, since the tangent line can be viewed as the limit of secant
lines as the two points converge to the same value, the convergence bound for Newton’s
Method can be viewed as the limit of the convergence bound for the Secant Method.
Although we stated the convergence of the Secant Method in terms of an approximate
error, we could indeed refine it to give a precise error bound in terms of f 0 (ξ) and f 00 (η)
for some values of ξ and η appropriately close to the estimates xn−1 , xn−2 and the root r.
We would then need to refine our proof, using Taylor’s Theorem to replace the approximate
second order Taylor expansion of f (x) with a precise equality, and converting the slope of
the secant line to a derivative via the Mean Value Theorem.
Although the above Lemma quite nicely describes the error of the Secant Method in terms
of the product of the two previous errors, it is customary to describe the convergence of each
term in terms of just one previous error. This gives us a surprising and interesting result.
Theorem 20.7. Let r be a root of the function f(x). The Secant Method has super-linear
convergence (i.e., convergence of order greater than 1), given by
|x_n − r| ≈ |f″(r)/2f′(r)|^(φ−1) |x_{n−1} − r|^φ,
where the order is the Golden Ratio φ ≈ 1.6.
Sketch of Proof. By the above Lemma, we have
|x_n − r| ≈ M |x_{n−1} − r| |x_{n−2} − r|,
where M = |f″(r)/2f′(r)|. Multiplying both sides by M, we obtain
M|x_n − r| ≈ (M|x_{n−1} − r|)(M|x_{n−2} − r|),
and taking the natural log of both sides gives us
ln(M|x_n − r|) ≈ ln(M|x_{n−1} − r|) + ln(M|x_{n−2} − r|).
Defining f_n = ln(M|x_n − r|), we obtain the Fibonacci relation
f_n ≈ f_{n−1} + f_{n−2}.
However, any sequence f_n satisfying the Fibonacci relation can be approximated by powers
of the Golden Ratio φ:
f_n ≈ C φ^n.
Indeed, we have seen in Exercise 19.4 that the ratio of successive Fibonacci numbers tends
to φ, and this is true regardless of the initial values of f0 and f1 (except in a special case
that does not apply here where the ratio tends instead to φ − 1). We then have
f_n ≈ φ f_{n−1},
or in terms of our error estimates,
ln(M|x_n − r|) ≈ φ ln(M|x_{n−1} − r|).
Exponentiating both sides gives
M|x_n − r| ≈ (M|x_{n−1} − r|)^φ,
from which we obtain our desired approximation.
Exercise 20.8. Use the Secant Method to approximate the Golden Ratio φ satisfying the
equation φ2 − φ − 1 = 0. Let 1 and 2 be your initial estimates. Compute the first few
estimates as fractions and try to discern the pattern. Then compute your answer numerically
to the limit of your calculator or your computer’s floating point precision. How fast does
this converge?
Exercise 20.9. Use the Secant Method to approximate π by calculating the root of the
equation sin x = 0 between 3 and 4. Compute your answer to the limit of your calculator or
your computer’s floating point precision. How fast does this converge?
Example 20.10. Let’s use the Secant Method to approximate the root of f (x) = x2 . In this
case, at our root 0 we have not only f (x) = 0, but also f 0 (0) = 0. In our error formulæ from
Lemma 20.6 and Theorem 20.7, the denominator in the constant vanishes, so the constant
becomes infinite. This means that we expect the convergence to be slower than usual, just
like how Newton’s Method reverted from quadratic to linear convergence for this function.
Given two estimates, x_{n−1} and x_{n−2}, our next estimate is given by the formula
x_n = (x_{n−1}² x_{n−2} − x_{n−2}² x_{n−1}) / (x_{n−1}² − x_{n−2}²).
Starting with initial estimates 2 and 1, we obtain
x_1 = 2 = 2/1,
x_2 = 1 = 2/2,
x_3 = (2 − 4)/(1 − 4) = 2/3,
x_4 = (4/9 − 2/3)/(4/9 − 1) = 2/5,
x_5 = (4/25 · 2/3 − 4/9 · 2/5)/(4/25 − 4/9) = 1/4 = 2/8,
and we notice that each term is of the form x_n = 2/f_n, where f_n is the nth Fibonacci number.
Recalling that the ratios of successive Fibonacci numbers satisfy
lim_{n→∞} f_{n−1}/f_n = 1/φ = φ − 1,
we find that
|x_n − 0| ≈ (φ − 1) |x_{n−1} − 0|,
so this sequence has linear convergence, with constant φ − 1 ≈ 0.618. Comparing this to the
linear convergence with constant 0.5 that we saw in Example 20.5 for this function with
Newton's Method, the Secant Method performs slightly worse than Newton's Method did,
which is what we have come to expect from the Secant Method.
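Carrying out this computation in exact rational arithmetic confirms the 2/f_n pattern; a short sketch in Python (setup ours):

```python
from fractions import Fraction

# Secant Method for f(x) = x**2 from exact starting values 2 and 1:
# each iterate turns out to be 2 divided by a Fibonacci number.
x0, x1 = Fraction(2), Fraction(1)
iterates = [x0, x1]
for _ in range(6):
    x2 = (x1**2 * x0 - x0**2 * x1) / (x1**2 - x0**2)
    iterates.append(x2)
    x0, x1 = x1, x2

fib = [1, 2, 3, 5, 8, 13, 21, 34]   # Fibonacci denominators f_n
```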
Exercise 20.11. Prove that x_n = 2/f_n in the previous example.
21. Numerical Integration
21.1. Riemann Sums. We recall the Riemann integral, which computes the area under a
curve. The simplest method of approximating the integral
∫ₐᵇ f(x) dx
is via Riemann sums, dividing the interval from a to b into n subintervals, and adding up
the areas of rectangles with heights given by the value of the function,
∫ₐᵇ f(x) dx = lim_{n→∞} Σ_{k=1}^{n} ((b − a)/n) f(x_k).
Here we take the limit as the number n of rectangles tends to infinity, which shrinks the
widths (b − a)/n of the rectangles to 0. The points x_k at which we evaluate f(x_k) can be
taken anywhere in each of the subintervals. Various conventions include letting x_k be the left
endpoint, the right endpoint, the midpoint, or the point where f (x) takes its maximum or
minimum on the interval (assuming such points exist, which is guaranteed by the Extreme
Value Theorem if f(x) is continuous). In the limit, it does not matter which x_k we take; all of
these methods converge to the same value, provided that the function f (x) is sufficiently well
behaved. (Actually, this is circular reasoning. We say that a function is integrable precisely
when the limit is the same regardless of the choice of the points x_k in each subinterval. But
integrability is a subtle issue which we will not explore in these notes.)
Let’s analyze the error in this method of integration by Riemann sums. We start with the
error in estimating the integral by a single rectangle. We know that the actual area under
the curve is bounded by
(b − a) min_{x∈[a,b]} f(x) ≤ ∫ₐᵇ f(x) dx ≤ (b − a) max_{x∈[a,b]} f(x).
If f (x) is continuous, then the Extreme Value Theorem tells us that f (x) achieves both its
minimum and maximum values. Also, the Intermediate Value Theorem tells us that there
must exist some c ∈ [a, b] where
(21.1)  ∫ₐᵇ f(x) dx = (b − a) f(c).
This f (c) is called the average value of f (x) on the interval [a, b], and (21.1) is known as the
integral form of the Mean Value Theorem.
So if we choose the right point c ∈ [a, b] to evaluate our function, our rectangular estimate
of the area matches the integral right on the nose. But we’re usually not so lucky. How far
off can we be? Recalling that the Mean Value Theorem says that
f(x) = f(c) + f′(ξ)(x − c),
for some ξ between c and x, we see that the difference between any two values of the function
f (x) on the interval [a, b] is at most
|f(c) − f(x)| ≤ M (b − a),
where |f′(ξ)| ≤ M is an upper bound on the derivative for all ξ ∈ [a, b]. Multiplying this
possible difference in height by the width b − a of the interval gives us an error estimate
M (b − a)2 for a single rectangle.
Next, suppose we have multiple rectangles, with the interval [a, b] divided into n equal
subintervals, each with size h = (b − a)/n. Let |f′(ξ)| < M be a bound on the derivative
over the entire interval [a, b]. Adding up the errors on each of the subintervals, our total
error for a Riemann sum becomes
Σ_{i=1}^{n} M h² = n M h² = M (b − a) h = O(h).
Here, we use the fact that nh = b − a, and we treat both b − a and M as constants. This tells
us that the error is proportional to the size of the intervals h, which makes sense since as we
increase the number of rectangles, we shrink the size of the intervals to 0, and the Riemann
sum converges to the integral.
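The O(h) behavior is easy to check numerically. A minimal sketch in Python (names and test integrand ours), using left-endpoint sums for ∫₀¹ x² dx = 1/3, where halving h should roughly halve the error:

```python
# Left-endpoint Riemann sums for f(x) = x**2 on [0, 1]; the exact integral is 1/3.
# Doubling n (halving h) should roughly halve the error, confirming O(h).

def left_riemann(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

errors = [abs(left_riemann(lambda x: x * x, 0.0, 1.0, n) - 1 / 3)
          for n in (100, 200, 400)]
```

Each successive error in the list is close to half the previous one.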
21.2. Trapezoid Method. So what convention for computing Riemann sums should we
use? Left endpoints? Right endpoints? Midpoints? All of them have about the same rate
of convergence, each with error O(h).
We chose rectangles for Riemann sums because they are the simplest objects to measure
the area of. However, we also have formulæ for the areas of other geometric shapes. A slightly
nicer technique often taught in Calculus classes is the Trapezoid Method for approximating
integrals. For each rectangle, instead of choosing a single height at one point, we use the
heights at both the left and the right endpoints of the interval. Connecting the corresponding
two points on the graph with a straight line, we form a trapezoid.
Working with a single trapezoid on the entire interval [a, b], we recall that the area of a
trapezoid is the width times the average of the two heights. This gives us the area
(b − a) (f(a) + f(b))/2 = ((b − a) f(a) + (b − a) f(b))/2,
which we note is actually the average of the area computed by the left endpoint method and the
area computed by the right endpoint method. Another way to look at what we are doing is
that we are replacing our original function f(x) with the secant line l(x) passing through the
graph at the points a and b and linearly interpolating the values of the function on the interior
of the interval. This is in contrast to Riemann sums, which replace the original function
with a constant function on each subinterval.
So how far off is our secant line l(x) from our original function f(x)? There is a formula
for the error in linear interpolation, which tells us that

f(x) − l(x) = (f''(ξ)/2)(x − a)(x − b)

for some ξ ∈ [a, b]. (This is analogous to the Taylor's Theorem error for the linear approximation of a function, which is of the form f''(ξ)(x − c)²/2, except that instead of having
a single center c, we have two endpoints a and b.) Bounding the second derivative term as
f''(ξ)/2 < M, and noting that |(x − a)(x − b)| ≤ (b − a)² for x ∈ [a, b], we obtain

|f(x) − l(x)| < M(b − a)²
(actually, if we do this carefully, we can improve this error estimate by a factor of 4), and
integrating this constant error bound over the whole interval [a, b], we find that the error in
a single trapezoid is bounded by M(b − a)³. If we now take n equally spaced subintervals of
size (b − a)/n, the error in measuring the areas of the corresponding n trapezoids is then

nMh³ = M(b − a)h² = O(h²).
This is a dramatic improvement over standard Riemann sums, which had an O(h) error.
This is all the more surprising given that the Trapezoid Method is just the average of the
Left Endpoint Method and the Right Endpoint Method! So, with nearly no extra work, we
have significantly increased our accuracy. We have seen this sort of thing before. Recall from
Section 17.1 that the standard definition of the derivative is O(h), while the central difference
formula, which averages the forward-looking and backward-looking derivative estimates, is
actually O(h²). Among integration methods, the Trapezoid Method plays the same role: it is
symmetric with respect to the two endpoints of each subinterval, just as the central difference
formula is symmetric about the point where the derivative is taken.
In terms of the formulæ, writing the size of the intervals as h = (b − a)/n, we have the
Left Endpoint Method

L(h) = h ∑_{k=0}^{n−1} f(a + kh) = ∫_a^b f(x) dx + O(h),

the Right Endpoint Method

R(h) = h ∑_{k=1}^{n} f(a + kh) = ∫_a^b f(x) dx + O(h),

and their average, the Trapezoid Method

T(h) = h ( f(a)/2 + ∑_{k=1}^{n−1} f(a + kh) + f(b)/2 ) = ∫_a^b f(x) dx + O(h²).
We observe that the right endpoint of a trapezoid is the same as the left endpoint of the
next trapezoid. Each of the internal points is counted twice, while the two endpoints are
counted only once. In fact T (h) = T (−h) is an even function of h, with only even powers of
h appearing in its power series expansion.
The more trapezoids we take, the better an estimate we get. We note that if we double
the number of trapezoids, we can use the estimate we already have, and add the new points:
(21.2)    T(h/2) = (1/2) T(h) + (h/2) ∑_{k=0}^{n/2−1} f(a + h/2 + kh).
Another way to think of this is that we average the trapezoid rule approximation T (h) with
a Riemann sum evaluating the function at the midpoint of each interval.
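As a sketch of how one might implement this in Python (the function names `trapezoid` and `refine` are ours), the doubling formula (21.2) lets us halve the step size while reusing all the function values we have already computed:

```python
def trapezoid(f, a, b, n):
    """Trapezoid rule T(h) with n subintervals of size h = (b - a)/n."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + k * h) for k in range(1, n)) + f(b) / 2)

def refine(f, a, b, n, t_h):
    """Formula (21.2): given t_h = T(h) computed with n subintervals,
    return T(h/2) by adding in the function values at the new midpoints."""
    h = (b - a) / n
    midpoints = sum(f(a + h / 2 + k * h) for k in range(n))
    return t_h / 2 + (h / 2) * midpoints
```

Each call to `refine` evaluates f only at the n new midpoints, rather than at all 2n sample points.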
21.3. Advanced Quadrature. To obtain an even better estimate, we use Simpson’s rule,
which approximates the area under the curve by measuring the areas under parabolas. It
turns out that instead of computing directly with quadratic approximations, we can obtain
Simpson’s rule using Romberg extrapolation to cancel the O(h2 ) term in the error for T (h),
leaving us with an O(h4 ) error. We have
T(h) = ∫_a^b f(x) dx + Ch² + O(h⁴),

T(h/2) = ∫_a^b f(x) dx + Ch²/4 + O(h⁴).
From this, we compute

4T(h/2) − T(h) = 3 ∫_a^b f(x) dx + O(h⁴),
giving us the Simpson’s rule estimate
(21.3)
4T (h/2) − T (h)
,
3
S(h) =
which satisfies
Z
S(h) =
a
b
f (x) dx + O(h4 ).
Taking the Romberg extrapolation a step further, we have

S(h) = ∫_a^b f(x) dx + Dh⁴ + O(h⁶),

S(h/2) = ∫_a^b f(x) dx + Dh⁴/16 + O(h⁶),

which gives us

(21.4)    R(h) = (16S(h/2) − S(h)) / 15 = ∫_a^b f(x) dx + O(h⁶).

Continuing, we obtain

(64R(h/2) − R(h)) / 63 = ∫_a^b f(x) dx + O(h⁸).
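This extrapolation ladder is short to write in code. Here is a Python sketch under the same assumptions as the text, with T, S, and R computed directly from formulas (21.3) and (21.4); the function names are ours:

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoid rule with n subintervals."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + k * h) for k in range(1, n)) + f(b) / 2)

def simpson(f, a, b, n):
    """S(h) = (4 T(h/2) - T(h)) / 3, cancelling the C h^2 error term."""
    return (4 * trapezoid(f, a, b, 2 * n) - trapezoid(f, a, b, n)) / 3

def romberg(f, a, b, n):
    """R(h) = (16 S(h/2) - S(h)) / 15, cancelling the D h^4 error term."""
    return (16 * simpson(f, a, b, 2 * n) - simpson(f, a, b, n)) / 15
```

Each level of extrapolation reuses the lower-level estimates, so the only new cost is the extra function evaluations in the finer trapezoid sums.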
Exercise 21.1. Estimate π, the area of the unit circle, by approximating
∫_{−1}^{1} 2√(1 − x²) dx.
Compute the trapezoid rule estimates T (2), T (1), and T (1/2) with 1, 2, and 4 trapezoids,
respectively. Each can be computed from the previous by adding in the new sample points,
as shown in (21.2). Next, compute the Simpson’s rule estimates S(2) and S(1) using the
extrapolation formula (21.3). Finally, compute the O(h6 ) Romberg extrapolation estimate
R(2) using (21.4). (Note. This actually doesn’t give a very good estimate for π.)
22. Numerical ODEs
Our final topic is numerical methods for solving first order ordinary differential equations
of the form
y'(t) = f(t, y).
This is a small subset of differential equations in general, but it includes many cases that
arise in the real world. In these cases, the rate of change of the quantity y is determined by
a function f which depends on both time t and the quantity y itself. Included in this class
are differential equations where y' depends only on y, such as

y'(t) = ky,
which describes simple population models or radioactive decay. We also note that if y'
depends only on t, the equation

y'(t) = f(t)

is just an integration problem with solution

y(t) = y_0 + ∫_{t_0}^{t} f(t) dt,
which we can solve using the quadrature techniques of the previous section.
In all of our differential equation solving methods, we start at an initial time t0 with
an initial value y0 = y(t0 ). Given these initial conditions, we want to find the value of
y(t_f) at some final time t_f. It makes sense that we should be able to compute, or at least
approximate, y(t_f), since we are given an initial value for y, and the derivative y' tells us
how y changes with respect to time t. To do this, we inch forward from t_0 to t_f, dividing the
range of times into n small intervals, each of width h = (t_f − t_0)/n.
22.1. Euler’s Method. The simplest method of solving first order differential equations of
the form y 0 (t) = f (t, y) is Euler’s method. Since we know the initial value y0 , and since the
derivative at t0 is y 0 (t0 ) = f (t0 , y0 ), we can determine the next value of y using the linear
approximation
(22.1)
y1 = y(t0 + h) = y(t0 ) + hy 0 (t0 ) + O(h2 )
= y0 + hf (t0 , y0 ) + O(h2 ).
Repeating this process, we iterate the computations

t_{k+1} = t_k + h,    y_{k+1} = y(t_{k+1}) = y_k + h f(t_k, y_k) + O(h²)

until we obtain y_n = y(t_f). Note that while we have an O(h²) error at each step, there are n
steps total. Since nh = t_f − t_0 is a constant, the overall error for all n steps is actually O(h).
Doing this by hand, you would typically construct a table with the three columns

t_k    y_k    f(t_k, y_k)

As we fill in the rows of the table, we add h to each t_k, compute y_k using the previous row's
y_k and f(t_k, y_k) values, and then plug them both into f(t_k, y_k). Easy peasy.
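The table-filling procedure above is exactly a loop. Here is a minimal Python sketch (the function name `euler` is ours):

```python
def euler(f, t0, y0, tf, n):
    """Euler's method: n steps of size h = (tf - t0)/n, iterating
    t_{k+1} = t_k + h and y_{k+1} = y_k + h * f(t_k, y_k)."""
    h = (tf - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)  # linear approximation using the slope at t_k
        t = t + h
    return y
```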
Example 22.1. Suppose we want to approximate the solution to the differential equation
y' = f(t). Given starting time t_0 = a and initial condition y(a) = 0, the Fundamental
Theorem of Calculus tells us that the solution for time t_f = b is the definite integral

y(b) = ∫_a^b f(t) dt.

Using Euler's method, we compute

y_1 = y_0 + h f(t_0) = h f(a),
y_2 = y_1 + h f(t_1) = h (f(a) + f(a + h)),
⋮
y(b) ≈ y_n = h (f(a) + f(a + h) + ··· + f(a + (n − 1)h)).
But this is precisely what we get if we compute the integral using the left endpoint method,
which we know gives an O(h) approximation.
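We can check this correspondence numerically. In the Python sketch below (names ours), Euler's method applied to y' = f(t) with y(a) = 0 agrees with the left endpoint sum on the same subdivision:

```python
import math

def euler_quadrature(f, a, b, n):
    """Euler's method for y' = f(t), y(a) = 0; the result approximates
    the integral of f over [a, b]."""
    h = (b - a) / n
    t, y = a, 0.0
    for _ in range(n):
        y = y + h * f(t)
        t = t + h
    return y

def riemann_left(f, a, b, n):
    """Left endpoint Riemann sum with the same subdivision."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

# The two computations add up the same terms.
diff = abs(euler_quadrature(math.sin, 0.0, 1.0, 25) - riemann_left(math.sin, 0.0, 1.0, 25))
```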
In light of this example, we can interpret Euler’s method as a generalization of Riemann
sums for differential equations y 0 = f (t, y), where f now depends not only on t but also
y. Historically, though, Euler’s method predated Riemann sums, and for our purposes
Riemann sums are better viewed as a special case of Euler’s method. To put this in the
proper historical context, note that Euler’s method involves derivatives, while Riemann
sums involve integrals, viewed as the area under a curve. It took the Fundamental Theorem
of Calculus for mathematicians to realize that these two computations are related to
one another.
22.2. Taylor Series Method. When performing Euler's method, we compute y(t + h) using
a linear approximation with O(h²) error. If we want a smaller error, or equivalently a higher
order error, we can replace the linear approximation (22.1) with a higher order Taylor series
approximation. Using a second order Taylor series, we have

(22.2)    y(t + h) = y(t) + h y'(t) + (h²/2) y''(t) + O(h³)
                   = y(t) + h f(t, y) + (h²/2) (d/dt) f(t, y) + O(h³).

In order to do this, we must be able to differentiate the function f(t, y). This is harder than
it looks, since y is itself a function of t. We must therefore use the multivariable chain rule
f'(t, y) = (d/dt) f(t, y) = (∂/∂t) f(t, y) + (∂/∂y) f(t, y) · (dy/dt)
         = (∂/∂t) f(t, y) + f(t, y) (∂/∂y) f(t, y),

where in the final step we recall that our original differential equation gives us y'(t) = f(t, y).
Since the error at each step is O(h³), and recalling that nh = t_f − t_0 is constant, the total
error for all n steps of this Taylor series method of order 2 is therefore O(h²).
In practice, to apply a Taylor series method, we first compute f'(t, y) symbolically,
then we iterate

t_{k+1} = t_k + h,    y_{k+1} = y_k + h f(t_k, y_k) + (h²/2) f'(t_k, y_k).
Building a table by hand, we have columns

t_k    y_k    f(t_k, y_k)    f'(t_k, y_k)

and we fill in each row by adding h to the previous t_k, computing y_k from the values in
the previous row, and then plugging t_k and y_k into the f(t, y) and f'(t, y) functions.
Although the Taylor series method improves on the O(h) error we had with Euler’s method,
it is a more complicated process. We must compute a symbolic multivariable derivative at
the outset, and each step of the Taylor series method takes longer than the corresponding
steps of Euler’s method.
Example 22.2. Consider the differential equation y' = ty with initial condition t_0 = 0 and
y_0 = 1. Let's approximate y(3) with three intervals, so we set the size of each interval to
h = 1. Trying it first with Euler's method, we build a table

t    y    f(t, y) = ty
0    1    0
1    1    1
2    2    4
3    6    18

so we get y(3) ≈ 6.
Next, let’s try the Taylor series method of order 2. We first compute the derivative
f 0 (t, y) =
d
ty = y + ty 0 = y + t2 y.
dt
Now we can build our table

t    y        f(t, y) = ty    f'(t, y) = y + t²y
0    1        0               1
1    1.5      1.5             3
2    4.5      9               22.5
3    24.75

so we get y(3) ≈ 24.75. (Notice that we did not need to compute the f(t, y) or f'(t, y) values
for the last row of the table.) That's rather higher than the approximation we got with
Euler's method.
Solving this same differential equation using separation of variables, we have

∫ dy/y = ∫ t dt  =⇒  ln |y| = t²/2 + C  =⇒  y = C e^{t²/2},

and with our initial condition y(0) = 1, we have C = 1, so y(t) = e^{t²/2}. Plugging in t = 3
gives us y(3) = e^{4.5} ≈ 90. So neither of our estimates is very good, but the Taylor series
method is much better. Why did we do so poorly? Because we took h = 1, which is relatively
large. This was an example to show how to use these two methods, rather than to give an
accurate computation.
With the Taylor series method, we are not limited to Taylor series of order 2. We can use
a Taylor series of any degree d, and the corresponding method would approximate the solution
of the differential equation to O(h^d). For instance, if we wanted to use the Taylor series
method of degree 4, we would iterate

y_{k+1} = y_k + h f(t_k, y_k) + (h²/2) f'(t_k, y_k) + (h³/6) f''(t_k, y_k) + (h⁴/24) f'''(t_k, y_k),

giving us an approximation with O(h⁴) error. Doing this by hand involves computing the
higher total derivatives of f(t, y), and building a table with columns

t_k    y_k    f(t_k, y_k)    f'(t_k, y_k)    f''(t_k, y_k)    f'''(t_k, y_k)
which is tedious, but more accurate.
22.3. Runge-Kutta Methods. In practice, rather than taking higher derivatives, people
often prefer to use approximation methods that require evaluating the function f (t, y) at multiple points and taking the weighted average. These are called Runge-Kutta methods. This
gives significantly greater accuracy, with relatively little additional work, and the weighted
averages we take are analogous to those that we used when integrating via the Trapezoid
method, Simpson’s method, and Romberg extrapolation.
The simplest is the Runge-Kutta method of order 2. To derive it, we start with the Taylor
series method of order 2, but instead of computing f'(t, y) symbolically, we approximate the
derivative as

f'(t, y) = ( f(t + h, y(t + h)) − f(t, y) ) / h + O(h).

Since we don't have an exact value for y(t + h), we approximate that via Euler's method as

y(t + h) = y(t) + h f(t, y).

Plugging these expressions into formula (22.2) for the Taylor series method of order 2,

y(t + h) = y(t) + h f(t, y) + (h²/2) · ( f(t + h, y(t) + h f(t, y)) − f(t, y) ) / h + O(h³)
         = y(t) + (h/2) f(t, y) + (h/2) f(t + h, y + h f(t, y)) + O(h³).

Notice that we still have the same O(h³) error that we had with a single iteration of the
Taylor series method of order 2. Even though we are replacing the derivative f'(t, y) with
an O(h) approximation, that approximation is multiplied by h²/2, and so gives an error that
is still O(h³).
In practice, we build a table with columns
t_k    y_k    f(t_k, y_k)    e_k    f(t_k + h, e_k)

where

e_k = y_k + h f(t_k, y_k)

is the next Euler method estimate, and

y_{k+1} = y_k + (h/2) ( f(t_k, y_k) + f(t_{k+1}, e_k) ).

What is really going on here? We are performing the standard Euler's method to get a new
value of y, but then we refine our estimate. Instead of computing the new value of y based
on the old value and the derivative at time t, we look ahead and average the derivatives at
both the left endpoint and right endpoint of each interval. This smarter way of estimating
the derivative over the whole interval is reminiscent of the central difference formula (17.1)
for the derivative.
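In code, the table collapses to a short loop. Here is a Python sketch (the function name `rk2` is ours):

```python
def rk2(f, t0, y0, tf, n):
    """Runge-Kutta method of order 2: take a trial Euler step
    e_k = y_k + h f(t_k, y_k), then average the slopes at both endpoints:
    y_{k+1} = y_k + (h/2) (f(t_k, y_k) + f(t_{k+1}, e_k))."""
    h = (tf - t0) / n
    t, y = t0, y0
    for _ in range(n):
        s = f(t, y)    # slope at the left endpoint
        e = y + h * s  # Euler estimate for y(t + h)
        y = y + (h / 2) * (s + f(t + h, e))
        t = t + h
    return y
```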
Example 22.3. What does the Runge-Kutta method of order 2 give us if we apply it to an
integration problem y' = f(t)? Letting t_0 = a and y_0 = 0, we compute

y_1 = (h/2) ( f(a) + f(a + h) ),
y_2 = y_1 + (h/2) ( f(a + h) + f(a + 2h) ) = h ( (1/2) f(a) + f(a + h) + (1/2) f(a + 2h) ),
⋮
y_n = h ( (1/2) f(a) + f(a + h) + f(a + 2h) + ··· + f(a + (n − 1)h) + (1/2) f(b) ),
which is precisely the O(h²) estimate of the integral via the Trapezoid rule. Note that we
did not need to compute the Euler's method estimates e_k at all here. Those appear in the
expression f(t_{k+1}, e_k), but this f depends only on t and not on the y parameter.
Example 22.4. Let's return to the differential equation y' = ty with y(0) = 1. To compute
y(3) in three steps with h = 1, our Runge-Kutta method of degree 2 table becomes:

t    y        f(t, y) = ty    e = y + ty    f(t + h, e) = (t + 1)(y + ty)
0    1        0               1             1
1    1.5      1.5             3             6
2    5.25     10.5            15.75         47.25
3    34.125

giving us y(3) ≈ 34.125, which is comparable to the 24.75 we obtained using the Taylor series
method of degree 2. That makes sense, as both methods have O(h²) error.
The most commonly used Runge-Kutta method is in degree 4, which is based on Simpson's
rule for integration. At each stage it iterates:

(22.3)    t_{k+1} = t_k + h,    y_{k+1} = y_k + (h/6)(s1 + 2s2 + 2s3 + s4),

where we take a weighted average of the derivative function f(t, y) evaluated at four points:

s1 = f(t_k, y_k),
s2 = f(t_k + h/2, y_k + (h/2) s1),
s3 = f(t_k + h/2, y_k + (h/2) s2),
s4 = f(t_{k+1}, y_k + h s3).

Doing this by hand requires columns for

t_k    y_k    s1    t_k + h/2    y_k + (h/2) s1    s2    y_k + (h/2) s2    s3    y_k + h s3    s4

which seems like a lot of work for each step, but it's worth it because the error is O(h⁴).
People tend not to use Runge-Kutta methods of order higher than 4, though, since the
amount of work it takes to perform each step grows so quickly that it outweighs the benefit
of the smaller error estimate.
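A Python sketch of the classical degree 4 method (the function name `rk4` is ours), implementing formula (22.3) directly:

```python
import math

def rk4(f, t0, y0, tf, n):
    """Classical Runge-Kutta method of degree 4, formula (22.3):
    y_{k+1} = y_k + (h/6)(s1 + 2 s2 + 2 s3 + s4)."""
    h = (tf - t0) / n
    t, y = t0, y0
    for _ in range(n):
        s1 = f(t, y)
        s2 = f(t + h / 2, y + (h / 2) * s1)
        s3 = f(t + h / 2, y + (h / 2) * s2)
        s4 = f(t + h, y + h * s3)
        y = y + (h / 6) * (s1 + 2 * s2 + 2 * s3 + s4)
        t = t + h
    return y

# Sanity check: y' = y with y(0) = 1 gives y(1) = e.
approx_e = rk4(lambda t, y: y, 0.0, 1.0, 1.0, 10)
```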
Notice that if we are considering an integration problem y' = f(t) where the function f
does not depend on y, then s2 = s3, and the formula (22.3) becomes

y_{k+1} = y_k + h ( (1/6) f(t_k) + (2/3) f(t_k + h/2) + (1/6) f(t_k + h) ),

which is precisely the weighted average we used with Simpson's method, which computes the
area under a parabola. We are not going to derive the Runge-Kutta method of degree 4 here.
To do so would require starting with a Taylor series method of order 3 and approximating
both the f 0 (t, y) term (as we did with the Runge-Kutta method of degree 2) as well as the
f 00 (t, y) term via numerical differentiation. Although that would appear to give us a method
with error O(h3 ), we actually get a better error of O(h4 ). Just like with Simpson’s method,
the error series for the Runge-Kutta method of degree 4 has only even powers of h.
Exercise 22.5. Use the methods described above for the differential equation y 0 = y and
initial value y(0) = 1 to approximate the value of y(1) = e. Use
(1) 4 iterations of Euler’s method,
(2) 2 iterations of the Taylor series method of order 2,
(3) 1 iteration of the Taylor series method of order 4,
(4) 2 iterations of the Runge-Kutta method of order 2, and
(5) 1 iteration of the Runge-Kutta method of order 4.
Exercise 22.6. Compute e to 7 digits after the decimal point by approximating the root of
the equation ln x = 1. You may use any method of root approximation that you want. Show
all of your work.
Exercise 22.7. Approximate ln 2 to 3 digits after the decimal point by computing the integral
ln 2 = ∫_1^2 (1/x) dx
using any method of numerical integration you want. Show all of your work.
Bard College, P.O. Box 5000, Annandale-on-Hudson, NY 12504-5000
E-mail address: gregland@bard.edu
URL: http://math.bard.edu/greg/