APPLIED ANALYSIS
GREGORY D. LANDWEBER
Contents
1. The Real Numbers
2. Ordered Fields
3. Metric Spaces
4. Constructing √2
5. The Archimedean Property
6. Decimal Representations and Computation
7. Sequences
8. Cauchy Sequences
9. Convergence Theorems
10. Limits and Continuity
11. The Intermediate Value Theorem
12. The Extreme Value Theorem
13. Derivatives
14. The Mean Value Theorem
15. Taylor’s Theorem
15.1. Generalizing the Mean Value Theorem
15.2. Error Estimates
16. Series
17. Numerical Differentiation
17.1. The Derivative
17.2. Higher Derivatives
17.3. Richardson Extrapolation
18. Contractions
19. Fixed Point Iteration
20. Root Approximation
20.1. Newton’s Method
20.2. Secant Method
21. Numerical Integration
21.1. Riemann Sums
21.2. Trapezoid Method
21.3. Advanced Quadrature
22. Numerical ODEs
Date: May 10, 2014.
22.1. Euler’s Method
22.2. Taylor Series Method
22.3. Runge-Kutta Methods
1. The Real Numbers
What are the real numbers? In this course we will see several equivalent definitions of the
real numbers, but first we need to define the natural numbers, the integers, and the rational
numbers. We start with the natural numbers, which are our standard counting numbers
starting with 1. (There is some debate over whether the natural numbers should start with
1 or 0, with computer scientists preferring 0. Personally, I don’t care, as long as you make
it clear what you mean if it makes any difference.)
Definition 1.1 (The Peano Postulates). The set N of natural numbers satisfies the following
four axioms:
• There is an element 1 ∈ N.
• There is a successor map, succ : N → N, which is injective.
• The element 1 is not in the image of succ.
• The only subset S ⊂ N satisfying both (a) 1 ∈ S and (b) if n ∈ S then succ(n) ∈ S is S = N itself.
These axioms are what you might expect if you think about the natural numbers. When
you count, you typically start with 1, and you successively add 1, which is what the successor
map does. As you count, you never repeat any numbers, which is the injectivity, and you
never get back to 1 again, which among other things eliminates modular arithmetic. The
final axiom is the most complicated, and it says that you get all of the natural numbers
if you start with 1 and repeatedly apply the successor function. Or stating it differently,
the natural numbers are the smallest set satisfying the first three axioms. Without the last
axiom, the set of all positive real numbers would satisfy the other axioms, as would the union
of the natural numbers with the integers shifted by 1/2.
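As an aside, the successor-map picture can be toyed with directly in code. The following sketch (a toy model; all names are ours, not part of the notes) represents a natural number as either 1 or succ of another natural number, and defines addition recursively:

```python
# A toy model of the Peano postulates (illustrative only).
# Here 1 is the empty tuple, and succ wraps its argument in a
# one-element tuple, which is an injective operation.
ONE = ()

def succ(n):
    return (n,)

def add(m, n):
    # Recursive definition: m + 1 = succ(m), and m + succ(k) = succ(m + k).
    if n == ONE:
        return succ(m)
    return succ(add(m, n[0]))

two = succ(ONE)
three = succ(two)
five = succ(succ(three))
assert add(two, three) == five  # 2 + 3 = 5 in this model
```

The induction axiom is what guarantees that a recursion like `add`, defined at 1 and at each successor, is defined on all of N.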
With extra work, we could also define addition and multiplication of natural numbers and
show that they satisfy all their expected properties. To be complete, we should also show
that a set satisfying the Peano Postulates exists and is unique. However, this is neither a
course in foundations nor in algebra, so let’s continue on to the integers.
Although we can add and multiply natural numbers, we cannot always subtract them,
which requires negative numbers, as well as an additive identity 0. (In algebraic terms,
the natural numbers are a semi-ring, which satisfies all the properties of a ring except that it need not have an additive identity or additive inverses.) To construct the integers, we could
define them by simply tacking on 0 and negative numbers to the natural numbers:
Z = N ∪ {0} ∪ −N.
However, this makes working with addition and multiplication unnecessarily difficult, as all
our proofs would have to deal constantly with the three separate cases. Instead we construct
the integers using formal differences.
Definition 1.2. The integers Z are equivalence classes in N × N, writing ordered pairs (m, n) as formal differences m − n for m, n ∈ N, modulo the equivalence relation
(1.1) m − n ∼ p − q iff m + q = p + n
for m, n, p, q ∈ N.
We wanted subtraction, and this definition gives it to us. In fact, every integer is viewed as
a difference of two natural numbers. For instance the negative integer −2 can be represented
as the differences 1−3 or 2−4 or 3−5, etc., all of which are equivalent. In order to show that
two such formal differences are equivalent, the relation (1.1) algebraically transforms the new
operation of subtraction into addition, which we can do just fine using natural numbers.
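The relation (1.1) is mechanical to check. A short sketch (helper names are ours, not the text's) verifying that the representations of −2 above are all equivalent:

```python
# Integers as formal differences (m, n), read as m - n, per relation (1.1):
# (m, n) ~ (p, q) iff m + q = p + n, which uses only natural-number addition.
def equivalent(d1, d2):
    (m, n), (p, q) = d1, d2
    return m + q == p + n

# -2 represented as 1 - 3, 2 - 4, 3 - 5: all equivalent.
assert equivalent((1, 3), (2, 4))
assert equivalent((2, 4), (3, 5))
assert not equivalent((1, 3), (4, 2))  # 1 - 3 is not 4 - 2
```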
Although we can subtract in the integers (i.e., they are a ring with an additive identity
and additive inverses), we still cannot necessarily divide any two integers. However, we can
use essentially the same formal ordered pair technique that we used to construct the integers,
now constructing the rational numbers as formal quotients of integers.
Definition 1.3. The rational numbers Q are equivalence classes in Z × (Z − {0}), writing ordered pairs (p, q) as formal quotients p/q for p, q ∈ Z with q ≠ 0, modulo the equivalence relation
(1.2) m/n ∼ p/q iff mq = pn
for m, n, p, q ∈ Z.
This definition of the rational numbers is, in fact, exactly how we think of fractions. Every
fraction has a numerator and a denominator, where the numerator is an integer, and the
denominator is a non-zero integer. We know that fractions can be written multiple ways,
such as 1/2 = 2/4 = 3/6 etc., all of which are equivalent via the relation (1.2), which is just
cross-multiplying. And reducing a fraction to lowest terms is, in this language, choosing a
representative of the equivalence class. They never taught you that in elementary school!
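Cross-multiplication and reduction to lowest terms can likewise be sketched in code (helper names are ours; the gcd picks out the canonical representative):

```python
from math import gcd

# Formal quotients (p, q) with q != 0, per relation (1.2):
# m/n ~ p/q iff m*q == p*n, i.e., cross-multiplication.
def equivalent(f1, f2):
    (m, n), (p, q) = f1, f2
    return m * q == p * n

def lowest_terms(f):
    # Choosing the reduced representative of the equivalence class.
    p, q = f
    g = gcd(p, q)
    return (p // g, q // g)

assert equivalent((1, 2), (2, 4)) and equivalent((2, 4), (3, 6))
assert lowest_terms((3, 6)) == (1, 2)
```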
Now that we have made sense of fractions as equivalence classes of ordered pairs, go back
and take another look at Definition 1.2 to see if it makes any more sense to you. So far,
we started with the natural numbers, constructed the integers by formally subtracting, and
constructed the rational numbers by formally dividing. Algebraically, the rational numbers
are a field, which means we can do any kind of arithmetic. However, we run into problems with algebra, such as if we try to find the roots of the equation x² − 2 = 0. You probably showed in Proofs & Fundamentals that √2 is irrational. In addition, you probably know
that transcendental numbers such as π and e are irrational. But the real numbers are much
more than just the roots of polynomials (which are called algebraic numbers), or the short
list of popular irrational numbers arising from the study of geometry and compound interest.
To really understand the real numbers, we must first ask ourselves what do we need to do
using real numbers that we cannot already do with rational numbers? It turns out you have
already known the answer for years: it’s limits and calculus!
When we wanted to subtract, we defined the integers as formal differences of natural
numbers. When we wanted to divide, we defined the rational numbers as formal quotients of
integers (except for disallowing dividing by 0). Now that we want to be able to take limits,
we can define the real numbers as the formal limits of appropriate sequences of rational
numbers. But what does it mean for a sequence to converge? The standard definition is:
Definition 1.4. A sequence {an}∞n=1 converges to L, written limn→∞ an = L, if
∀ε > 0, ∃N ∈ N such that n ≥ N =⇒ |an − L| < ε.
This may look complicated to you now, with its double quantifiers and Greek letter epsilon.
In fact, a large part of understanding real analysis is mastering how to manipulate such
double quantifiers expressed in terms of ε, N, and later δ. We will study this definition in
much more detail later, but for now it means that as we progress through the sequence,
the terms get closer and closer to the limiting value. The problem with trying to use this
definition of a convergent sequence to construct the real numbers is that it is stated in terms
of the value L to which the sequence converges. This is fine when considering sequences that
converge to rational numbers, but we want to consider sequences of rational numbers that
may converge to irrational numbers. So how do we know if a rational sequence converges, if
there is no rational number that it converges to? Instead of actually converging, we consider
sequences that “should converge”:
Definition 1.5. A Cauchy sequence is a sequence {an}∞n=1 satisfying the property
∀ε > 0, ∃N ∈ N such that m, n ≥ N =⇒ |an − am| < ε.
Basically, as we progress through a Cauchy sequence, the terms get closer and closer to
each other. Although such sequences look like they should converge, a Cauchy sequence
of rational numbers does not necessarily converge to a rational number. For example, the
sequence of rational numbers
3, 3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141592, 3.1415926, 3.14159265, 3.141592653, . . .
does not converge to a rational number, since π is irrational. To get such sequences to
converge, we explicitly construct the real numbers out of sequences:
Definition 1.6. The real numbers R are equivalence classes of Cauchy sequences of rational numbers, modulo the equivalence relation
{an}∞n=1 ∼ {bn}∞n=1 iff limn→∞ (an − bn) = 0.
In other words, real numbers are sequences of rational numbers that should converge,
where two sequences are equivalent if they should converge to the same value.
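Definition 1.5 can be checked numerically for the truncated decimal expansions of π above: any two terms beyond the N-th agree to within 10^(1−N). A sketch using Python's Fraction for exact rational arithmetic (our code, not part of the notes):

```python
from fractions import Fraction

# Truncations of pi = 3.14159265358979... as exact rational numbers.
digits = "314159265358979"

def a(n):  # a(1) = 3, a(2) = 3.1, a(3) = 3.14, ...
    return Fraction(int(digits[:n]), 10 ** (n - 1))

# Cauchy check: for m, n >= N, the terms share the first N digits,
# so |a(n) - a(m)| < 10**(1 - N).
N = 5
for m in range(N, 12):
    for n in range(N, 12):
        assert abs(a(n) - a(m)) < Fraction(1, 10 ** (N - 1))
```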
2. Ordered Fields
The rational numbers and the real numbers are both fields, meaning they have addition and
multiplication operations, both of which satisfy the properties of an abelian group operation
(if you exclude 0 when considering multiplication), and which satisfy a distributive law. Or
in other words, all the standard rules of arithmetic hold. In addition, the rational numbers
and the real numbers are ordered, and we will now define precisely what that means.
Definition 2.1. An ordered field is a field with a < relation satisfying
• If x < y and y < z, then x < z (the Transitive Property).
• For any x and y, precisely one of x < y, x = y, or y < x must be true (the Trichotomy
Property).
• If x < y, then x + z < y + z (the Additive Property).
• If x < y and z > 0, then xz < yz (the Multiplicative Property).
The transitive property is a standard property of relations, and the additive and multiplicative properties describe how the less-than relation interacts with the field operations.
The trichotomy property tells us that an ordered field is totally ordered, in that we can
directly compare any two elements. This is in contrast to partially ordered sets, or posets,
where any two elements are not necessarily directly comparable.
Of course, an ordered field not only has a < relation, but also the corresponding relations
>, ≤, ≥. With these relations, we can define intervals, such as the open interval
(a, b) = {x | a < x and x < b},
and absolute values
|x| = x if x ≥ 0, and |x| = −x if x < 0.
Theorem 2.2. It follows immediately from the definition of an ordered field that:
• |x| ≥ 0,
• if 0 < x < y, then 1/x > 1/y (taking reciprocals of positive numbers reverses inequalities),
• if x < y and z < 0, then xz > yz (multiplying by a negative number reverses inequalities), and
• x² ≥ 0 (so in particular the complex numbers C cannot be an ordered field).
Proof. The proofs of these elementary facts are left to the reader.
The following two results often come in handy in real analysis:
Lemma 2.3. If x ≤ ε for all ε > 0, then x ≤ 0.
Proof. Suppose that x > 0. Taking ε = x/2, the multiplicative property gives us x/2 > 0, and the additive property gives us x > x/2. Then by the trichotomy property we cannot have x ≤ x/2 = ε.
Corollary 2.4. If |x| ≤ ε for all ε > 0, then x = 0.
It turns out that every ordered field contains the rational numbers. After all, a field must
contain 0 and 1, and it follows from the axioms of an ordered field that all elements of the
form 1 + 1 + · · · + 1 must be distinct. That gives us the natural numbers, and since a field
has additive and multiplicative inverses, we get the rational numbers as well. What is more
difficult to show is that the real numbers are in fact the only complete ordered field, where
complete means that all Cauchy sequences converge.
3. Metric Spaces
Definitions 1.4 and 1.5 are stated in terms of the absolute values |an − L| and |an − am |,
respectively. However, they really are not about absolute values in the sense of switching
minus signs to plus signs. Rather, they use the fact that |a − b| is the distance between a and
b. In fact, most of the definitions and results in this course do not require that we work with
the real numbers or even an ordered field, and can be converted from statements involving
absolute values to more general results about distances. To do this, we need a rigorous set
of axioms to characterize distance.
Definition 3.1. A metric space is a set X together with a distance function d : X × X → R
satisfying
• d(x, y) ≥ 0,
• d(x, y) = 0 if and only if x = y,
• d(x, y) = d(y, x) (the symmetric property), and
• d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality).
In our case we use the distance function d(x, y) = |x − y|. However, if we are working in R² we can use the Pythagorean theorem, defining the distance as the length of the hypotenuse
d((x1, y1), (x2, y2)) = √((x1 − x2)² + (y1 − y2)²),
and for a general Rⁿ, the distance between two vectors x and y is
d(x, y) = √(Σⁿᵢ₌₁ (xi − yi)²).
In fact, viewing the real numbers as R = R¹ we can rewrite the absolute value as
d(x, y) = |x − y| = √((x − y)²).
The Euclidean spaces Rn are not the only examples of metric spaces. We can also build
metric spaces by measuring distances on curved surfaces. For example, to measure the
distance between two points on the spherical surface of the earth, we measure the distance
along a great circle (a circle centered at the center of the earth, which may be familiar to you
as the route that airplanes fly). Understanding how curvature affects distance, and how to
determine the shortest paths between points (called a geodesic) is a big part of differential
geometry and general relativity.
We can also concoct strange and interesting distance functions satisfying the axioms of Definition 3.1. For example, we can define a discrete metric space by defining the distance to be
d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y,
which indeed satisfies all the axioms. With a discrete metric, all points are separated from
each other, which makes taking limits rather pointless. The field of topology is an axiomatic
study of distance functions and open sets, which generalize the open intervals we use on the
real line. Topology explores much broader notions of limits and continuity, with surprising
and beautiful results.
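For a finite set of points, the axioms of Definition 3.1 can be spot-checked mechanically. A small sketch (our code, not the text's) verifying them for the discrete metric:

```python
# Spot-check of the metric space axioms for the discrete metric
# d(x, y) = 0 if x == y, and 1 otherwise.
def d(x, y):
    return 0 if x == y else 1

points = ["a", "b", "c"]
for x in points:
    for y in points:
        assert d(x, y) >= 0                      # non-negativity
        assert (d(x, y) == 0) == (x == y)        # zero iff equal
        assert d(x, y) == d(y, x)                # symmetry
        for z in points:
            assert d(x, z) <= d(x, y) + d(y, z)  # triangle inequality
```

The triangle inequality holds here because the only way d(x, z) can exceed d(x, y) + d(y, z) is if the right side is 0, which forces x = y = z and hence d(x, z) = 0.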
4. Constructing √2
In Section 1, we constructed the real numbers as sequences of rational numbers. We know that √2 is an irrational number, so what is a sequence of rational numbers that converges to √2? There are many ways to do this, and we will see at least four different constructions of √2 later in the course.
The simplest way to construct √2 is to start with the two closest integers. Since 1² = 1 and 2² = 4, we know that √2 is between 1 and 2. If I were doing this, I would figure out the next digit, noticing that 1.4² = 1.96 and 1.5² = 2.25, so √2 lies between 1.4 and 1.5. On the other hand, if we want to be methodical about it, if we know that √2 is between 1 and 2, we could split the difference and consider the midpoint 1.5. Since 1.5² = 2.25, we know that √2 is between 1 and 1.5. Taking the midpoint again, we see that 1.25² = 1.5625, so √2 is between 1.25 and 1.5. Continuing this process, we obtain the sequence:
1, 1.5, 1.25, 1.375, 1.4375, 1.40625, . . .
which indeed converges to √2 = 1.41421356237309 . . ..
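The midpoint process is easy to automate. A sketch of the bisection method for x² = 2 (variable names are ours), which reproduces the midpoints 1.5, 1.25, 1.375, . . . that follow the initial endpoint 1:

```python
# Bisection method for sqrt(2): maintain a**2 < 2 < b**2 and halve the interval.
a, b = 1.0, 2.0
midpoints = []
for _ in range(6):
    m = (a + b) / 2
    midpoints.append(m)
    if m * m < 2:
        a = m  # sqrt(2) lies in [m, b]
    else:
        b = m  # sqrt(2) lies in [a, m]

# Matches the sequence in the text after the initial term 1.
assert midpoints[:5] == [1.5, 1.25, 1.375, 1.4375, 1.40625]
```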
This process is called the bisection method, since at each stage we bisect the interval. From a theoretical point of view, in order to do this we need to know that if a² < 2 and b² > 2, then a < √2 < b. This is a consequence of:
Theorem 4.1 (Intermediate Value Theorem). If the function f : [a, b] → R is continuous,
and d is between f (a) and f (b), then there exists c ∈ [a, b] such that f (c) = d.
We will define the notion of continuity later, but at this point you probably have an
intuitive idea of what it means from Calculus.
Now that the bisection method gives us a sequence that potentially converges to √2, we can use Definition 1.4 to prove that it converges. Given any ε > 0, we need to find an N such that |an − √2| < ε whenever n > N. Here a1 = 1, a2 = 1.5, a3 = 1.25, etc. At each stage, we know that √2 lies between an and an+1, and we also note that |an − an+1| = 1/2ⁿ, so we have
|an − √2| < 1/2ⁿ.
We now need to find N such that
1/2ᴺ < ε.
Solving for N, we obtain
N > − log₂ ε.
Since it is always possible to find a natural number N greater than − log₂ ε, we have shown that this sequence converges to √2.
5. The Archimedean Property
Actually, how do we know that given any real number, there always exists a greater
natural number? This is called the Archimedean Property, and while it seems obvious, it is
non-trivial to prove. We start with the analogous statement for the rational numbers:
Theorem 5.1 (Archimedean Property of Q). Let p/q ∈ Q be a rational number. Then there
exists a natural number n ∈ N such that n > p/q.
Proof. Let us assume that p/q > 0, since otherwise we have p/q < 0 < 1 and we are done.
Since p/q is positive, let us take both p > 0 and q > 0, and we observe that
p/q > 0 =⇒ 2p/q > p/q > 0.
Similarly, we obtain
3p/q > p/q > 0 and 4p/q > p/q > 0 etc.
Adding p/q a total of q times gives us
p = q · (p/q) > p/q,
where p is a positive integer, and thus a natural number.
Now that we have established the Archimedean Property for the rational numbers, we can
prove it for the real numbers using the fact that every real number is the limit of a sequence
of rational numbers.
Theorem 5.2 (Archimedean Property of R). For any real number x ∈ R, there exists a
natural number n ∈ N such that n > x.
Proof. Let x be a real number. We then have x = limn→∞ an, where {an}∞n=1 is a sequence of rational numbers. Since this sequence converges, we can choose ε = 1, and then there exists an N ∈ N so that whenever n > N, we have |an − x| < 1. It follows that x < an + 1. Since an is rational, we know that there exists a natural number M ∈ N such that an < M, and thus x < an + 1 < M + 1.
This proof introduces a technique that we will use repeatedly. Since the reals are limits
of sequences of rational numbers, given a real number we can always find a rational number
that is arbitrarily close. The technical term for this is that the rational numbers are dense
in the real numbers.
Instead of trying to find natural numbers that are larger than any given real number, we
can invert the problem and find rational numbers that are smaller than any given positive
real number.
Corollary 5.3. Given any ε > 0, there exists N ∈ N such that 1/N < ε.
This corollary is very useful in proving that sequences converge. Here are two examples,
both showing convergence to 0, although the Archimedean property can be used to prove
convergence for sequences with limits other than 0, too.
Example 5.4. To show that
limn→∞ 1/n = 0,
suppose we are given an ε > 0. The corollary to the Archimedean property says that there exists a natural number N such that 1/N < ε. Then for all n > N we have
1/n < 1/N < ε,
which proves convergence.
Example 5.5. To show that
limn→∞ 1/2ⁿ = 0,
suppose we are given an ε > 0. We would like to find N so that 1/2ᴺ < ε, and solving for N we obtain N > − log₂ ε. The Archimedean property says that we can always find such an N. Then for n ≥ N we have
1/2ⁿ ≤ 1/2ᴺ < ε,
which proves convergence. We could also prove that this sequence converges to 0 by using the Sequence Comparison Test (a corollary of the Squeeze Theorem) below. We notice that
0 < 1/2ⁿ < 1/n,
and taking limits gives us
0 ≤ limn→∞ 1/2ⁿ ≤ limn→∞ 1/n = 0,
which squeezes the limit between two values, both of which are 0.
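The ε-to-N bookkeeping in these examples can be done mechanically: given a tolerance ε, any integer N > − log₂ ε works for Example 5.5. A sketch (variable names are ours):

```python
import math

# For a given eps, Example 5.5 asks for N with 1/2**N < eps,
# i.e., any integer N > -log2(eps).
eps = 1e-6
N = math.floor(-math.log2(eps)) + 1  # smallest integer exceeding -log2(eps)
assert 1 / 2**N < eps

# Every later term is at least as small, as in the proof.
for n in range(N, N + 10):
    assert 1 / 2**n <= 1 / 2**N < eps
```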
6. Decimal Representations and Computation
Our Definition 1.6 of the real numbers in terms of Cauchy sequences is the most commonly used construction of the real numbers. The process of extending a metric space by
considering equivalence classes of Cauchy sequences is called taking the Cauchy completion.
It is, however, not the only way to define the real numbers. The real numbers can also be
defined axiomatically, for instance as a complete ordered field, in which all Cauchy sequences
converge. That’s a little different than constructing the reals out of Cauchy sequences, and
proving that our Cauchy completion is in fact complete is surprisingly non-trivial.
Another axiomatic definition of the real numbers is as an ordered field satisfying the least upper bound property, where any set that is bounded above in fact has a least upper
bound. You can also construct the real numbers as Dedekind cuts, which split the entirety
of the rational numbers into lower and upper sets.
A more practical construction of the real numbers is via infinite decimal expansions:
nk . . . n3 n2 n1 .d1 d2 d3 d4 . . . ,
where there are finitely many digits to the left of the decimal place and countably infinitely
many digits to the right. Of course all the digits are in the range from 0 through 9. We
note that a terminating decimal can still be considered to be an infinite decimal expansion,
where all the di digits are 0 past some point.
There are two problems with constructing real numbers via such infinite decimals. First,
it turns out that infinite decimals are not unique. For instance, we can show that
0.9999 . . . = 1.0000 . . . .
Fortunately, all pairs of distinct decimal expansions corresponding to the same number are
of this form, and we can fix that by taking the appropriate equivalence classes.
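One way to see this identity: the truncation with k nines, 0.9…9, equals 1 − 10⁻ᵏ, so the truncations of 0.999… converge to 1.000… . A quick check with exact rational arithmetic (our sketch, not from the notes):

```python
from fractions import Fraction

# Partial sums of 0.9 + 0.09 + 0.009 + ... ; the k-th sum is 1 - 10**-k,
# so the sequence of truncations of 0.999... converges to 1.
s = Fraction(0)
for k in range(1, 20):
    s += Fraction(9, 10 ** k)
    assert Fraction(1) - s == Fraction(1, 10 ** k)
```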
The other problem with infinite decimals is that they are expressed in base 10, a consequence of the fact that most humans have 10 fingers and 10 toes. That is an arbitrary choice,
and we could just as easily express numbers as infinite strings of digits in other bases. For
instance, a computer uses binary representations of floating point numbers in base 2. It does
not matter what base you use; the real numbers are the same regardless.
But what is really happening when we consider a decimal expansion? As we saw before,
the number
π = 3.1415926535 . . .
is the limit of the Cauchy sequence
{3, 3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141592, 3.1415926, 3.14159265, 3.141592653, . . .}.
The same is true for every infinite decimal expansion, so such decimal numbers are indeed
real numbers. The converse is more difficult, showing that every Cauchy sequence of rational
numbers is equivalent to a decimal expansion.
Another way to view the decimal expansion of a real number is as giving a sequence of
intervals containing that number. For example, when we say that π ≈ 3, we mean that π is
between 3 and 4, and when we say that π ≈ 3.1, we mean that π is between 3.1 and 3.2. In
fact, we can express π as the unique point in the intersection of all such intervals:
{π} = [3, 4) ∩ [3.1, 3.2) ∩ [3.14, 3.15) ∩ [3.141, 3.142) ∩ [3.1415, 3.1416) ∩ · · · .
Later, we will prove the Nested Intervals Theorem, which says that any such sequence of
nested (closed) intervals whose sizes shrink to 0 contains precisely one real number. In fact,
we could even define the real numbers as an ordered field satisfying a Nested Intervals Axiom.
This approach of bounding a real number inside ever shrinking intervals mirrors how real
numbers are used in real life. Have you ever actually used a real number? No measurement you have ever taken has been an irrational number, but instead was a fraction or a
finite decimal. Why is that? It is because your measuring equipment does not have infinite accuracy, so you are forced to use an approximation. In chemistry, people talk about
significant figures. If you say you have 2.54 moles of a molecule, it is understood that the
actual value is between 2.535 and 2.545 moles, so your approximation has an error of at most
0.005. In statistics, you might conclude that Democrats are leading Republicans by 53% to
47%, but that there is a margin of error of 2%, and most good statistical estimates usually
include a standard deviation term. Other scientific measurements typically come with an
error estimate, too, due to the limitations of the equipment. Part of the scientific method is
that experiments must be reproducible, with the understanding that a repeated experiment
should give a value within your original margin of error, and that as you use more and more
accurate measuring equipment, your margin of error decreases. Sound familiar?
Is the problem with science that we have not yet figured out how to build a contraption that
can measure with infinite precision? Surprisingly, quantum mechanics tells us that it will
never be possible to measure anything with infinite accuracy, because particles themselves
are not localized at specific points, but rather smeared over small regions as wave functions.
So, is science doomed? No, that’s just how real numbers work! If we define the real numbers
in terms of a Nested Intervals Axiom, the reals are numbers that can be approximated to
within whatever error you are willing to tolerate, but which can always be refined further.
This is the approach we will take with our numerical computations. Every computational
algorithm we will discuss not only provides a sequence of better and better approximations
to whatever we are trying to compute, but also comes with a bound on the error at each
step. If we are given an error tolerance, such as wanting to compute π to 10 decimal places,
we find a sequence that approaches π and keep computing until our error bound is less than
10−10 . This way we can compute any real value we want by providing a sequence of ever
improving approximations, since that is what real numbers are.
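As an illustration of this style of computation (our example, not one from the notes): the alternating series 1 − 1/2 + 1/3 − ⋯ converges to ln 2, and the alternating series error estimate bounds the error of a partial sum by the first omitted term, so we can keep adding terms until that bound drops below a given tolerance:

```python
import math

# Approximate ln(2) = 1 - 1/2 + 1/3 - ... , stopping once the guaranteed
# error bound (the size of the first omitted term) is below the tolerance.
tol = 1e-5
s = 0.0
n = 1
while 1.0 / n >= tol:        # error after the terms so far is at most 1/n
    s += (-1) ** (n + 1) / n
    n += 1
assert abs(s - math.log(2)) < tol
```

The point is not efficiency (this particular series converges slowly) but the structure shared by all the algorithms to come: an approximating sequence together with a computable error bound.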
7. Sequences
A sequence is an ordered, countably infinite collection of numbers, which is typically
denoted by a subscripted variable, as in
a1 , a2 , a3 , a4 , a5 , . . . .
More compactly, we can write such a sequence as {an}∞n=1. Although sequences are typically indexed by natural numbers starting with 1, we often encounter sequences with indexes starting at 0, particularly when we consider series. In general, the index we start with is of
little consequence, and surprisingly the order of the sequence is not terribly important either.
What really matters is that the sequence contains a countably infinite number of elements.
Recall from Section 1 our Definition 1.4 of the limit of a sequence, using double quantifiers.
Here we explore some immediate consequences of that definition.
Lemma 7.1 (Sign-Preserving Property). If an ≥ 0 for all n ∈ N, then limn→∞ an ≥ 0.
Proof. The proof is left to the reader. This lemma is a corollary of the slightly simpler result:
if limn→∞ an > 0, then there exists M > 0 and N ∈ N so that an > M for all n > N .
Theorem 7.2. If {an}∞n=1 and {bn}∞n=1 both converge, then
limn→∞ (an ± bn) = limn→∞ an ± limn→∞ bn.
The proof of this theorem is a straightforward example of what is called an ε/2 argument and demonstrates several common techniques that we will use throughout real analysis. To illustrate these techniques we will go into rather more detail than is really necessary.
Long-Winded Proof. Since we know that {an}∞n=1 and {bn}∞n=1 converge, they have limits
limn→∞ an = La and limn→∞ bn = Lb.
To prove that limn→∞ (an + bn) = La + Lb, we must show that given any ε > 0, we can produce N ∈ N such that |(an + bn) − (La + Lb)| < ε whenever n ≥ N. Using Definition 1.4 of the limit of a sequence, we know that given our ε > 0 there must exist
• Na ∈ N such that |an − La| < ε whenever n ≥ Na, and
• Nb ∈ N such that |bn − Lb| < ε whenever n ≥ Nb.
We rewrite these absolute value inequalities in terms of the open intervals (La − ε, La + ε) and (Lb − ε, Lb + ε), both of size 2ε, centered at La and Lb respectively. This gives us
La − ε < an < La + ε and Lb − ε < bn < Lb + ε.
Adding these inequalities, we obtain
La + Lb − 2ε < an + bn < La + Lb + 2ε,
which in terms of absolute values gives us
(7.1) |(an + bn) − (La + Lb)| < 2ε.
We know this is true whenever both n ≥ Na and n ≥ Nb. Constructing N = max(Na, Nb), we observe that N ≥ Na and N ≥ Nb, so whenever n ≥ N we must have both n ≥ N ≥ Na and n ≥ N ≥ Nb. This gives us our desired result. Well, almost. We wanted the distance in (7.1) to be bounded by ε, not 2ε. How do we fix that? Just go back to our definitions of Na and Nb and use ε/2 in place of ε.
If we wanted a more concise proof, we could have written:
Concise Proof. Let limn→∞ an = La and limn→∞ bn = Lb. There exist Na and Nb such that |an − La| < ε/2 and |bn − Lb| < ε/2 whenever n > Na and n > Nb, respectively. Letting N = max(Na, Nb), we then have |(an + bn) − (La + Lb)| < ε whenever n > N.
Or if we really know what we are doing and are a tad flippant:
Over-Confident Proof. This is an ε/2 argument.
Here are some results involving sequences bounded by numbers or by other sequences.
Theorem 7.3. Every convergent sequence is bounded.
Proof. Suppose limn→∞ an = L. Taking ε = 1, we find that there exists N ∈ N so that
|an − L| < 1 =⇒ L − 1 < an < L + 1
for all n > N . This bounds all of the terms in the sequence after the aN term. To bound
the entire sequence, we just need to extend our bounds so they work for the first N terms
as well. To do this we observe that
min(a1 , a2 , . . . , aN , L − 1) ≤ an ≤ max(a1 , a2 , . . . , aN , L + 1)
for all n ∈ N, and so the entire sequence is bounded.
Both this proof and our long-winded proof of Theorem 7.2 above take advantage of the
min/max trick, the simple observation that
a < min(b, c) if and only if a < b and a < c, and
a > max(b, c) if and only if a > b and a > c.
In practice, the converse of Theorem 7.3 is used far more often.
Corollary 7.4. If a sequence is unbounded, then it diverges.
You may recall the following lemma from Calculus, if only because of its interesting name.
Theorem 7.5 (Squeeze Lemma). Given two convergent sequences with the same limit
limn→∞ an = limn→∞ cn = L,
suppose a third sequence {bn}∞n=1 is squeezed between them, in that there exists N ∈ N such that an ≤ bn ≤ cn for all n > N. Then limn→∞ bn = L as well.
Proof. Given ε > 0, there exist Na, Nc ∈ N so that
|am − L| < ε and |cn − L| < ε
whenever m > Na and n > Nc. We want to find Nb so that |bn − L| < ε whenever n > Nb. For any n ∈ N satisfying both n > Na and n > Nc, as well as n > N from the statement of the theorem, we have
L − ε < an ≤ bn ≤ cn < L + ε,
so |bn − L| < ε. In order that n be greater than all three of Na, Nc, and N, we can require
n > Nb = max(Na, Nc, N),
which shows that limn→∞ bn = L.
Notice that once again we have used the min/max trick. The most commonly used application of the Squeeze Lemma is the comparison test for positive sequences.
Corollary 7.6 (Sequence Comparison Test). Suppose that limn→∞ bn = 0 and {an}∞n=1 is a sequence with positive terms satisfying an ≤ bn for all n > N for some N ∈ N. Then limn→∞ an = 0 as well.
Proof. Since the an terms are all positive, we have 0 < an, and so we can squeeze the {an}∞n=1 sequence between the constant sequence {0}∞n=1 and the sequence {bn}∞n=1, both of which converge to 0.
Exercise 7.7. Suppose the sequences {an}∞n=1 and {bn}∞n=1 are precisely the same set of points, but in a different order. Prove that limn→∞ an = L if and only if limn→∞ bn = L. (Hint: You can prove this using a straightforward ε-N argument, together with the min/max trick.)
8. Cauchy Sequences
Theorem 8.1. Every convergent sequence is a Cauchy sequence.
Proof. Suppose limn→∞ an = L. Given any ε > 0, we can find N ∈ N so that for all n > N we have |an − L| < ε/2, or in other words
L − ε/2 < an < L + ε/2.
For any m, n > N, both am and an are contained in the same interval of size ε, and so
|am − an| < ε.
This is another example of an ε/2 argument. We are saying that two terms of the sequence that are close to L must be close to each other. However, if we used the standard condition |a_n − L| < ε, then we would find that the distance between the two terms would be bounded by |a_m − a_n| < 2ε. However, the definition of the limit of a sequence applies to all ε, so we can divide all our ε’s by 2 and the argument still works. We will see many examples of ε/2 arguments throughout this course, as well as the occasional ε/3 argument, and even perhaps an ε/4 or ε/5 argument. Ultimately, all such arguments can be viewed as consequences of the triangle inequality for metric spaces.
Theorem 8.2 (Cauchy Completeness Theorem). Every Cauchy sequence of real numbers
converges.
Proof. Surprisingly hard, considering that we defined the real numbers via Cauchy sequences
over the rational numbers.
9. Convergence Theorems
Recall that we defined the real numbers via the Cauchy Completeness Axiom, so all
Cauchy sequences converge. In this section we prove several theorems about convergence,
all of which follow from that one axiom. All of these proofs can be reduced to showing that
the sequence in question is, in fact, a Cauchy sequence. However, we will take an indirect route in our proofs, instead relying on the following workhorse theorem:
Theorem 9.1 (Nested Intervals Theorem). Suppose we have a sequence of closed intervals
[a_1, b_1] ⊃ [a_2, b_2] ⊃ [a_3, b_3] ⊃ · · ·,
each nested inside the previous one, and suppose the size of these intervals shrinks to zero, i.e., lim_{n→∞}(b_n − a_n) = 0. Then their intersection is a single point,
∩_{n=1}^∞ [a_n, b_n] = {c},
for some c ∈ R, and furthermore the sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ converge, with limit
lim_{n→∞} a_n = lim_{n→∞} b_n = c.
Proof. We first show that both {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ are Cauchy sequences, which means they converge. Consider the sequence {a_n}_{n=1}^∞, and suppose we are given ε > 0. Since lim_{n→∞}(b_n − a_n) = 0, we know that there exists N ∈ N so that |a_n − b_n| < ε whenever n > N. Since the intervals are nested, we have
a_1 ≤ a_2 ≤ a_3 ≤ · · · ≤ a_n ≤ · · · ≤ b_n ≤ · · · ≤ b_3 ≤ b_2 ≤ b_1.
It follows that for any m > N we have
a_N ≤ a_m ≤ b_N < a_N + ε,
and the same is true for a_n whenever n > N. Since both a_m and a_n are contained in the same interval of size ε, we have |a_n − a_m| < ε whenever m, n > N. This shows that {a_n}_{n=1}^∞ is a Cauchy sequence, which converges by the Cauchy Completeness Axiom, and so lim_{n→∞} a_n = c_a for some c_a ∈ R. Using the same argument to show that {b_n}_{n=1}^∞ is a Cauchy sequence, we find that lim_{n→∞} b_n = c_b for some c_b ∈ R. With both sequences converging, we have
c_b − c_a = lim_{n→∞} b_n − lim_{n→∞} a_n = lim_{n→∞}(b_n − a_n) = 0,
and thus c_a = c_b. From here on we write c = c_a = c_b.
Next, we consider the intersection of all the nested intervals. We must show that it contains c, but not any other real numbers. In order to show that the intersection contains c, we must show that a_n ≤ c ≤ b_n for all n ∈ N. Suppose that c < a_n for some n ∈ N. Taking ε = a_n − c, we find that |a_n − c| ≥ ε, and for any m > n we have c < a_n ≤ a_m and so |a_m − c| ≥ ε as well. This means that lim_{n→∞} a_n ≠ c, which is a contradiction. Thus a_n ≤ c. Similarly, we can show that c ≤ b_n.
Suppose that the intersection also contains some d ≠ c. Then each of the intervals [a_n, b_n] must contain both c and d, and in particular we have b_n − a_n > |c − d| > 0 for all n. This contradicts that lim_{n→∞}(b_n − a_n) = 0, so there is no such d in the intersection.
This theorem says that if you are trying to compute a value and are able to bound it inside an interval, and if you successively shrink the size of that interval to 0, then the bounds of the interval approach your desired value from both sides. This is what we did when we used the bisection method to approximate √2.
Theorem 9.2 (Monotone Convergence Theorem). Suppose the sequence {a_n}_{n=1}^∞ is monotone increasing, i.e., a_n < a_{n+1} for all n ∈ N. If {a_n}_{n=1}^∞ is bounded above, then it must converge. The same holds for sequences that are monotone non-decreasing (a_n ≤ a_{n+1}) and bounded above, as well as for sequences that are bounded below and are monotone decreasing (a_n > a_{n+1}) or monotone non-increasing (a_n ≥ a_{n+1}).
You might think that we will prove the Monotone Convergence Theorem by showing that
the monotone sequence is, in fact, a Cauchy sequence. Actually, our proof will do so only
indirectly, instead following from the Nested Intervals Theorem. But first we need a lemma,
the sequence version of the Least Upper Bound property of the real numbers.
Lemma 9.3 (Least Upper Bound Lemma). If a set A ⊂ R is bounded above, then there exists a sequence {a_n}_{n=1}^∞ with values in A and lim_{n→∞} a_n = L, such that L is the least upper bound of A.
Proof. We construct a chain of nested intervals and invoke the Nested Intervals Theorem above. Let M_1 be an upper bound of A, and let a_1 ∈ A be any element. Now construct the midpoint m = (a_1 + M_1)/2, and consider the following two cases:
(1) If m is an upper bound of A, we set a_2 = a_1, and we take M_2 = m.
(2) If m is not an upper bound of A, then there exists a_2 ∈ A such that a_2 > m, and we set M_2 = M_1.
Repeating this process, we can construct a sequence of nested intervals
[a_1, M_1] ⊃ [a_2, M_2] ⊃ [a_3, M_3] ⊃ · · ·.
Also, since at each stage we take the midpoint of the previous interval and the next interval is contained in either the left or right half, the size of these intervals shrinks to 0. By the Nested Intervals Theorem, there exists a single L ∈ R contained in all of these intervals, such that
L = lim_{n→∞} a_n = lim_{n→∞} M_n.
We must now show that L is the least upper bound of A. If L is not an upper bound of A, then there exists a ∈ A with a > L. However, since the upper bounds converge to L, there must exist upper bounds M_n satisfying |M_n − L| < a − L, and thus a > M_n, which contradicts that M_n is an upper bound. If L is not the least possible upper bound of A, then there exists another upper bound K < L. But then there exist elements a_n satisfying |a_n − L| < L − K, and thus a_n > K, which contradicts that K is an upper bound.
While constructing the least upper bound was straightforward, showing that it is a least upper bound was more difficult. It is best to think of this in terms of a number line diagram. If L is not an upper bound, there is an element of A greater than one of the upper bound right endpoints, and if L is not least, then there exists an upper bound less than one of the sequence left endpoints. So we really need from the Nested Intervals Theorem that the two endpoint sequences converge to the same limit from both sides.
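Run numerically, the halving construction in this proof becomes an algorithm for computing suprema. The Python sketch below is my own illustration, not from the notes: the function names and the example set A = {x : x² < 2} are assumptions, and for simplicity case (2) slides the left endpoint to the midpoint rather than picking an actual element of A, which still brackets the supremum since some element of A lies above the midpoint.

```python
def least_upper_bound(is_upper_bound, a1, M1, tol=1e-12):
    """Halving construction from the Least Upper Bound Lemma.

    a1 is an element of the set A, M1 is any upper bound of A, and
    is_upper_bound(m) reports whether m is an upper bound of A.
    """
    a, M = a1, M1
    while M - a > tol:
        m = (a + M) / 2
        if is_upper_bound(m):
            M = m      # case (1): keep the left endpoint, shrink the bound
        else:
            a = m      # case (2): some element of A exceeds m
    return (a + M) / 2

# Example: A = {x in R : x^2 < 2}; m is an upper bound iff m > 0 and m^2 >= 2,
# so the least upper bound is sqrt(2).
sup_A = least_upper_bound(lambda m: m > 0 and m * m >= 2, 1.0, 2.0)
print(sup_A)  # approximately 1.41421356...
```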
Proof of the Monotone Convergence Theorem. We use the previous lemma, applied to the bounded set A = {a_n}_{n=1}^∞. This produces a convergent subsequence lim_{k→∞} a_{n_k} = L. We must now show that this least upper bound L is, in fact, the limit of the entire monotone sequence {a_n}_{n=1}^∞. Given any ε > 0, there exists K ∈ N so that L − a_{n_k} < ε for all k > K. Letting N = n_{K+1} and using the fact that our sequence is monotone increasing, we have
a_n > a_{n_{K+1}} > L − ε =⇒ |a_n − L| < ε
for all n > N, and thus lim_{n→∞} a_n = L.
In general, having a convergent subsequence does not imply that the full sequence converges. For instance, the sequence +1, −1, +1, −1, +1, −1, . . . has subsequences that converge
to +1 and −1, but the entire sequence does not converge. However, it is indeed true for
monotone sequences that the convergence of a subsequence implies the convergence of the
whole sequence, as our proof above shows.
In the following exercise, we consider the converse of this statement, which is true even
without the assumption of monotonicity.
Exercise 9.4. Given a convergent sequence, show that any subsequence converges to the same value as the original sequence. In other words, if lim_{n→∞} a_n = L, then lim_{k→∞} a_{n_k} = L.
On the other hand, it is not hard to find convergent subsequences if the original sequence
is bounded, as the following theorem shows:
Theorem 9.5 (Bolzano-Weierstrass Theorem). Every bounded sequence has a convergent
subsequence. Furthermore, if the sequence is contained in a closed interval I, then the limit
of any convergent subsequence is also contained in I.
Proof. Our proof uses the Nested Intervals Theorem. Divide the interval I into two closed subintervals at its midpoint. Since our original sequence was infinite, either the left half-interval or the right half-interval (or both half-intervals) contains infinitely many elements. Choose a closed half-interval that contains infinitely many elements of the original sequence. Then subdivide that interval and choose its closed half-interval containing infinitely many elements of the original sequence, and so on. Since each iteration of this process cuts the size of the interval in half, we obtain a sequence of nested closed intervals whose sizes shrink to zero. Choosing a subsequence of our original sequence with one element in each of these nested intervals, this subsequence is then squeezed between the sequences of left and right endpoints, which both converge to the same value by the Nested Intervals Theorem. Our subsequence must then converge to that same value by the Squeeze Lemma.
Now, suppose that we have a convergent sequence {a_n}_{n=1}^∞ contained in a closed interval I = [c, d]. We then have c ≤ a_n ≤ d for all n, and by the Sign-Preserving Lemma, we see that c ≤ lim_{n→∞} a_n ≤ d.
Note that the second statement of this theorem is not necessarily true if I were an open interval, since the limit could be one of the endpoints. Also note that the Monotone Convergence Theorem can be viewed as a corollary of the Bolzano-Weierstrass Theorem. If you have a monotone increasing sequence that is bounded above, then the sequence is also bounded below by its first element. It therefore has a convergent subsequence, just as we showed using the Least Upper Bound Lemma. The converse is also true, that the Bolzano-Weierstrass Theorem is a corollary of the Monotone Convergence Theorem. Indeed, all four of the results in this section are equivalent to one another.
10. Limits and Continuity
You probably have an intuitive idea about what limits are from Calculus. Here we present
a definition of limits of functions in terms of limits of sequences.
Definition 10.1. Let f : I − {c} → R be a function defined on an open interval I with a point c ∈ I removed. We say the limit of f(x) as x approaches c is lim_{x→c} f(x) = L if, for every convergent sequence in I − {c} with lim_{n→∞} a_n = c, we have lim_{n→∞} f(a_n) = L.
This sequence-based definition of limits likely agrees with your intuition from calculus. If you want to compute lim_{x→2} x², you might compute 1.9², 1.99², 1.999², . . ., or perhaps 2.1², 2.01², 2.001², . . .. In other words, you choose a sequence of numbers approaching 2, apply
the function to each of those numbers, and determine the limit of the resulting sequence. Our definition is very much a discrete approach to limits, in contrast with the standard continuous ε–δ approach, which we show is equivalent in the following theorem.
Theorem 10.2. We have lim_{x→c} f(x) = L if and only if for all ε > 0, there exists δ > 0 such that for any x ≠ c satisfying |x − c| < δ, we have |f(x) − L| < ε.
Proof. Suppose the ε–δ definition of the limit holds, and consider any sequence such that lim_{n→∞} a_n = c. Given ε > 0, there exists a δ such that |f(a_n) − L| < ε whenever |a_n − c| < δ. However, we can find N ∈ N so that |a_n − c| < δ whenever n > N. It follows that lim_{n→∞} f(a_n) = L.
Suppose the ε–δ definition of the limit fails. In that case, there exists an ε > 0 such that for all δ > 0 we have some x ≠ c satisfying both |x − c| < δ and |f(x) − L| ≥ ε. We construct a sequence by choosing terms a_n satisfying these conditions for δ = 1/n for each n ∈ N. We clearly have lim_{n→∞} a_n = c, but lim_{n→∞} f(a_n) ≠ L.
Definition 10.3. Let f : I → R be a function defined on an open interval I. If c ∈ I, we say that f is continuous at c if lim_{n→∞} f(a_n) = f(c) for every sequence with lim_{n→∞} a_n = c. We say that f is continuous on I if it is continuous at all c ∈ I.
This definition of continuity says that taking a limit of a sequence commutes with applying
the function, i.e., that applying the function and taking the limit of the result is the same
as taking the limit first and then applying the function. In light of our definitions of limits
above, we also have an equivalent and simpler definition for continuity.
Definition 10.4. A function f : I → R is continuous at c ∈ I if lim_{x→c} f(x) = f(c).
11. The Intermediate Value Theorem
At the very start of this course, we showed how you can use the bisection method to compute √2, proving that the resulting sequence converged. At the time, we pointed out that the bisection method works because of the Intermediate Value Theorem from Calculus. In this section, we are going to consider the bisection method in general, and then use it to actually prove the Intermediate Value Theorem. No, this is not circular reasoning. Rather, our understanding of the real numbers has shifted from our Calculus-based intuition to a rigorous treatment in terms of Cauchy sequences and the Nested Intervals Theorem.
The bisection method is our first example of a root approximation algorithm. Given a real-valued function f we often want to find its roots, the solutions of the equation f(x) = 0. If we cannot solve for the roots directly, it may be useful to approximate them. In our earlier √2 example, we were looking for a root of the function f(x) = x² − 2.
Algorithm 11.1 (Bisection Method). If f : [a, b] → R is a continuous function and f (a) and
f (b) have opposite signs, then we can approximate a root of f (x) via the following algorithm:
(1) Begin with the closed interval [a, b].
(2) Consider the midpoint m = (a + b)/2. There are two possibilities:
(a) If f (m) = 0, we are done, as we have found a root.
(b) If f(m) ≠ 0, choose either the subinterval [a, m] or [m, b] so that the function f(x) takes opposite signs at the two endpoints.
(3) Return to step (2). Lather, rinse, repeat.
The midpoints generated in step (2) then form a sequence which converges to a root of f .
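Algorithm 11.1 translates directly into code. The sketch below is my own illustration (the function name and the tolerance-based stopping rule are assumptions; the algorithm as stated runs forever, producing a sequence of midpoints):

```python
def bisect(f, a, b, tol=1e-12):
    """Bisection method: approximate a root of a continuous f on [a, b],
    assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:        # step (2a): found an exact root
            return m
        if fa * fm < 0:    # step (2b): the sign change is in [a, m]
            b, fb = m, fm
        else:              # otherwise the sign change is in [m, b]
            a, fa = m, fm
    return (a + b) / 2

# The sqrt(2) example: a root of f(x) = x^2 - 2 on [1, 2].
root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)  # approximately 1.41421356...
```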
Proof. This algorithm produces a chain of nested intervals. Furthermore, the size of the interval at the nth iteration is (b − a)/2^n, which shrinks to 0 as n → ∞. We may therefore invoke the Nested Intervals Theorem to produce a value c ∈ [a, b], which is the limit of the sequences of both the left and the right endpoints of those intervals.
We must still verify that c is a root, i.e., that f (c) = 0. To do this, we construct two
new sequences. For one sequence, from each successive interval we take the endpoint where
f (x) is positive. For the other sequence, for each interval we take the endpoint where f (x) is
negative. Both sequences must converge to c, and by continuity, if we apply the function f to
the entries of these sequences, the resulting sequences must converge to f (c). However, one of
those sequences has positive values, while the other has negative values. By Lemma 7.1, the
limit of the positive sequence must be non-negative, and the limit of the negative sequence
must be non-positive. Together, that means the limit must be f (c) = 0.
While we presented the bisection method as a method for root approximation, solving
f (x) = 0, we can also use it to approximate the solution of any equation of the form
f (x) = d, which is a root of the equation f (x) − d = 0. The Bisection Method therefore
gives us a constructive method of proving the Intermediate Value Theorem. This stands
in contrast to the approach taught in most Calculus and Real Analysis courses, where the
Intermediate Value Theorem is touted as an example of an existence theorem which does
not actually give you an explicit solution.
Corollary 11.2 (Intermediate Value Theorem). If the function f : [a, b] → R is continuous,
and d is between f (a) and f (b), then there exists c ∈ [a, b] such that f (c) = d.
Exercise 11.3. Use the Bisection Method to compute the Golden Ratio φ, which satisfies the equation 1 + 1/φ = φ, to three digits after the decimal point.
12. The Extreme Value Theorem
One of the main applications of calculus is to minimize or maximize the value of a differentiable function. In this section, we show that every continuous function on a closed interval achieves its minimum and maximum. First we need a lemma, showing that every continuous function on a closed interval is bounded.
Lemma 12.1. A continuous function f : I → R on a closed interval I is bounded.
Proof. Suppose that f is not bounded above on the closed interval I. Then for any potential bound M ∈ R, there must exist some a ∈ I with f(a) > M. In particular, we can find a sequence {a_n}_{n=1}^∞ in I with f(a_1) > 1, f(a_2) > 2, f(a_3) > 3, etc. Clearly the sequence {f(a_n)}_{n=1}^∞ diverges, as does any subsequence. However, the Bolzano-Weierstrass Theorem
tells us that the bounded sequence {a_n}_{n=1}^∞ contains a convergent subsequence {a_{n_k}}_{k=1}^∞ with limit in I, and since f is continuous we have
lim_{k→∞} f(a_{n_k}) = f( lim_{k→∞} a_{n_k} ),
which contradicts that all subsequences of {f(a_n)}_{n=1}^∞ diverge.
Theorem 12.2 (Extreme Value Theorem). If a function f : I → R on a closed interval I is continuous, then there exist a, b ∈ I where f attains its maximum and minimum values:
f(a) = max_{x∈I} f(x) and f(b) = min_{x∈I} f(x).
Proof. By the above lemma, the continuous function f is bounded on the closed interval. To find the maximum and minimum of the bounded set A = {f(x) | x ∈ I}, we use the Least Upper Bound Lemma, which tells us that there exist sequences {f(a_n)}_{n=1}^∞ and {f(b_n)}_{n=1}^∞ converging to the least upper bound and greatest lower bound of A, respectively. The sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ in I may not converge, but since I is a closed interval, the Bolzano-Weierstrass Theorem tells us that there exist convergent subsequences
lim_{k→∞} a_{n_k} = a and lim_{k→∞} b_{n_k} = b,
with a, b ∈ I. Furthermore, by continuity we have
lim_{k→∞} f(a_{n_k}) = f(a) and lim_{k→∞} f(b_{n_k}) = f(b).
But these are subsequences of {f(a_n)}_{n=1}^∞ and {f(b_n)}_{n=1}^∞, which we already know converge to the least upper bound and greatest lower bound, so f(a) and f(b) must themselves be the maximum and minimum, respectively.
13. Derivatives
Definition 13.1. If f : I → R is a continuous function on an interval I, the derivative of f at a point c ∈ I is
f′(c) = lim_{x→c} (f(x) − f(c))/(x − c) = lim_{h→0} (f(c + h) − f(c))/h.
We say that f is differentiable at c if this limit exists, and that f is differentiable in an interval I if it is differentiable for all c ∈ I. In that case the first derivative of f can be viewed as a function f′ : I → R.
Theorem 13.2. Differentiability implies continuity.
Proof. Suppose f : I → R is discontinuous at c ∈ I. Then there exists ε > 0 such that for all δ > 0, there exists x with |x − c| < δ and |f(x) − f(c)| ≥ ε. For such an x we have
|f(x) − f(c)| / |x − c| > ε/δ,
but since δ was arbitrary, we see that the slope (f(x) − f(c))/(x − c) is unbounded as x → c, so the derivative cannot exist.
Theorem 13.3 (Product Rule). If f, g are both differentiable at x, then so is their product, and we have (fg)′(x) = f′(x)g(x) + f(x)g′(x).
Proof. By some clever algebra, we rearrange the limit as
(fg)′(x) = lim_{h→0} [f(x + h)g(x + h) − f(x)g(x)] / h
= lim_{h→0} [ ((f(x + h) − f(x))/h) g(x + h) + f(x) ((g(x + h) − g(x))/h) ]
= f′(x)g(x) + f(x)g′(x),
where we use the fact that a limit of a sum or product is the sum or product of the limits, provided those limits exist, together with the continuity of g (Theorem 13.2), which gives lim_{h→0} g(x + h) = g(x).
Another way to view the derivative is to extend the real numbers by adjoining an additional infinitesimal element dx which is so small that (dx)² = 0. Such a dx encapsulates the idea of lim_{h→0} h. While there is no actual real number dx, we can work instead with the polynomial ring R[dx] in the variable dx, and then take the quotient R[dx]/⟨dx²⟩ by the ideal generated by dx². Elements of this quotient ring can be written in the form a + b dx, and in particular the derivative of a function satisfies
f(x + dx) = f(x) + f′(x) dx.
Note that if we replace dx by h in this formula, we get the formula for the linear approximation to the function f(x) at the point (x, f(x)). However, we assert that if dx is not just small but infinitesimal, then this formula becomes an equality, not an approximation.
Theoretical mathematicians will be quick to point out that working with infinitesimals is very tricky, and that there are subtleties which we are ignoring. They are absolutely right, but that should not stop us from exploiting this surprisingly useful idea, which works extremely well in the applied setting. As an example, we can recast our proof of the product rule using infinitesimals:
(fg)(x + dx) = f(x + dx) g(x + dx)
= (f(x) + f′(x) dx)(g(x) + g′(x) dx)
= f(x)g(x) + (f′(x)g(x) + f(x)g′(x)) dx + f′(x)g′(x)(dx)²
= (fg)(x) + (f′(x)g(x) + f(x)g′(x)) dx,
where in the final step we recall that (dx)² = 0. Note that the coefficient of dx is precisely the formula for the product rule!
We can also use infinitesimals to give a simple proof of the chain rule:
(f ∘ g)(x + dx) = f(g(x + dx)) = f(g(x) + g′(x) dx) = f(g(x)) + f′(g(x))g′(x) dx,
where the coefficient of dx is the formula for the chain rule. Here we used the fact that if dx is an infinitesimal satisfying (dx)² = 0, then g′(x) dx is also an infinitesimal.
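This infinitesimal arithmetic is exactly what "dual numbers" implement in forward-mode automatic differentiation. The sketch below is my own illustration (the class name and helper are not from the notes): it represents a + b·dx as a pair of floats, and the product rule falls out of the multiplication once the (dx)² term is dropped.

```python
class Dual:
    """Numbers of the form a + b*dx, where dx is infinitesimal: dx**2 = 0."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b    # value and coefficient of dx

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 dx)(a2 + b2 dx) = a1 a2 + (a1 b2 + b1 a2) dx,
        # since the (dx)^2 term vanishes
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def derivative(f, x):
    """Evaluate f(x + dx) = f(x) + f'(x) dx and read off f'(x)."""
    return f(Dual(x, 1.0)).b

# The derivative of f(x) = x * x * x at x = 2 is 3 * 2^2 = 12.
print(derivative(lambda x: x * x * x, 2.0))  # 12.0
```

Composing functions built from these operations automatically applies the chain rule, for the same reason as in the computation above.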
14. The Mean Value Theorem
In this section we prove the most important theorem in all of Numerical Analysis: the error
estimate for Taylor series. Why is this so important? In Numerical Analysis, we not only
find approximations, but we also need to provide error bounds so we know how good those
approximations are. Most of the time, the functions we are working with can be expressed
in terms of their Taylor series, and nearly all of our error estimates are derived from the
Taylor series error term. But it gets better than that. Sometimes, just knowing that a Taylor series exists for our functions is enough to show us how to manipulate our approximations to
squeeze out even more accuracy. We will return to that idea later. Before we get to Taylor’s
Theorem, we warm up with a lemma and two special cases, all familiar to us from Calculus.
Lemma 14.1 (Critical Point). If f : I → R is differentiable at c ∈ I, and if f(c) ≥ f(x) for all x ∈ I, then f′(c) = 0.
Proof. In the definition of the derivative f′(c), the numerator satisfies f(x) − f(c) ≤ 0 for all x ∈ I. If we take lim_{x→c+}, then the denominator satisfies x − c > 0, so we must have f′(c) ≤ 0. However, if we take lim_{x→c−}, then the denominator satisfies x − c < 0, so we must have f′(c) ≥ 0. It follows that f′(c) = 0.
This lemma tells us that the extrema of a differentiable function are critical points where
the derivative vanishes. This is the basic idea behind optimization problems in Calculus,
where you attempt to find the extrema of a function by setting its derivative equal to zero.
Theorem 14.2 (Rolle’s Theorem). If f : [a, b] → R is differentiable and f(a) = f(b), then there exists c ∈ [a, b] where f′(c) = 0.
Proof. If f is differentiable, then it is also continuous. By the Extreme Value Theorem, there
exist points c, d ∈ [a, b] such that f (c) is maximal and f (d) is minimal. If these maxima and
minima occur at the endpoints a and b, then since f (a) = f (b) we see that the function is
in fact constant, and so its derivative vanishes across the entire interval. Otherwise, there
is a minimum or maximum at some point in the interior of the interval, and by the Critical
Point Lemma we see that the derivative vanishes there.
Theorem 14.3 (Mean Value Theorem). If f : [a, b] → R is differentiable, then there exists c ∈ [a, b] where
f′(c) = (f(b) − f(a))/(b − a).
Proof. Consider the slope-adjusted function
g(x) = f(x) − ((f(b) − f(a))/(b − a)) (x − a).
We note that g(a) = g(b) = f(a). Applying Rolle’s Theorem, there exists c where
0 = g′(c) = f′(c) − (f(b) − f(a))/(b − a),
which gives us our desired result.
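The theorem is easy to test numerically. The sketch below is my own illustration, not a construction from the notes: it assumes f′ is continuous and that f′(c) minus the secant slope changes sign on [a, b], so the bisection idea from the start of the course locates a point c guaranteed by the Mean Value Theorem.

```python
def mvt_point(f_prime, a, b, fa, fb, tol=1e-12):
    """Find c in [a, b] with f'(c) equal to the secant slope (fb - fa)/(b - a),
    by bisecting on g(c) = f'(c) - slope (assumes g changes sign on [a, b])."""
    slope = (fb - fa) / (b - a)
    g = lambda c: f_prime(c) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("g must change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:   # sign change in [lo, mid]
            hi = mid
        else:                     # sign change in [mid, hi]
            lo = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 1]: the secant slope is 1, and f'(c) = 3c^2 = 1 at c = 1/sqrt(3).
c = mvt_point(lambda x: 3 * x * x, 0.0, 1.0, 0.0, 1.0)
print(c)  # approximately 0.57735
```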
15. Taylor’s Theorem
15.1. Generalizing the Mean Value Theorem. Another way to look at the Mean Value
Theorem is that it limits how far away from each other f (a) and f (b) can get. If the derivative
is small, then f (a) and f (b) are close to each other, while if the derivative is very large then
f(a) and f(b) can be far apart. Replacing a, b, c with c, x, ξ respectively, we can rewrite the Mean Value Theorem as
(15.1)    f(x) = f(c) + (x − c) f′(ξ),
for some ξ between c and x. This looks once again very much like the formula for the linear
approximation of f(x) at the point (x, f(x)). However, the difference here is that this is an equality, not an approximation, but the price we pay is that we do not take f′(c) but rather f′(ξ) at some unknown point ξ between c and x.
Suppose we know the value of f(c) and we would like to use f(c) as a (very simple) estimate for the value of some other f(x). If we know that the derivative f′(ξ) is small, or perhaps bounded by |f′(ξ)| < M for all ξ between c and x, then we have |f(x) − f(c)| < M|x − c|. This is the essence of a standard ε–δ argument, where we take δ = ε/M, letting M be an upper bound on the derivative. As far as error estimates go, that was pretty lame. However, if we replace the idea of a linear approximation with a Taylor series, we get even better approximations.
Theorem 15.1 (Taylor’s Theorem). Suppose f : I → R is differentiable at least n + 1 times. Then for any c, x in I we have
f(x) = f(c) + (x − c)f′(c) + ((x − c)²/2) f″(c) + · · · + ((x − c)^n/n!) f^{(n)}(c) + ((x − c)^{n+1}/(n + 1)!) f^{(n+1)}(ξ)
for some ξ between c and x. Furthermore, if |f^{(n+1)}(ξ)| < M for all ξ between c and x, then the error in the nth degree Taylor series is bounded by
|f(x) − Σ_{k=0}^{n} ((x − c)^k/k!) f^{(k)}(c)| < (|x − c|^{n+1}/(n + 1)!) M.
Proof. Fixing x ≠ c, we solve for the coefficient of (x − c)^{n+1} that we are looking for:
d = ( f(x) − Σ_{k=0}^{n} ((x − c)^k/k!) f^{(k)}(c) ) / (x − c)^{n+1}.
Our job is then to show that this coefficient is, in fact,
d = f^{(n+1)}(ξ)/(n + 1)!
for some ξ between c and x. Consider the function
g(y) = ( Σ_{k=0}^{n} ((x − y)^k/k!) f^{(k)}(y) ) + d(x − y)^{n+1}.
For y = x, the only term that survives is f^{(0)}(x), so we have
g(x) = f(x).
And for y = c, because of our choice of d we also obtain
g(c) = f(x).
By Rolle’s Theorem, we then know that there exists some ξ between c and x such that g′(ξ) = 0. Computing the derivative of g(y), we have
g′(y) = Σ_{k=0}^{n} ( (−k)((x − y)^{k−1}/k!) f^{(k)}(y) + ((x − y)^k/k!) f^{(k+1)}(y) ) − (n + 1)d(x − y)^n
= ((x − y)^n/n!) f^{(n+1)}(y) − (n + 1)d(x − y)^n,
where the −((x − y)^{k−1}/(k − 1)!) f^{(k)}(y) piece of each term cancels against the ((x − y)^{k−1}/(k − 1)!) f^{(k)}(y) piece arising from the previous term, leaving only the last f^{(n+1)} term uncanceled. Setting g′(ξ) = 0 gives us
((x − ξ)^n/n!) f^{(n+1)}(ξ) = (n + 1)d(x − ξ)^n,
and solving for d we obtain
d = f^{(n+1)}(ξ)/(n + 1)!,
which completes our proof.
Actually, the proof of Taylor’s Theorem is not that different from the proof of the Mean
Value Theorem. In both cases, we constructed a function g(x) which takes the same value
at both endpoints of the interval and then invoked Rolle’s Theorem. In fact, the Mean
Value Theorem, particularly when written in the form of (15.1), is a special case of Taylor’s
Theorem where n = 0. To see this, let’s redo the Mean Value Theorem and its proof in the form
of Taylor’s Theorem. Recall that the variables c, x, ξ in Taylor’s Theorem correspond to the
variables a, b, c in the Mean Value Theorem, respectively. Also, the variable d in the proof
of Taylor’s Theorem is a generalization of the slope of the secant line in the Mean Value
Theorem.
Corollary 15.2 (Mean Value Theorem Redux). If f is differentiable in the closed interval between c and x, then
f(x) = f(c) + f′(ξ)(x − c)
for some ξ between c and x.
Proof. Solving for the coefficient of x − c, we obtain
d = (f(x) − f(c))/(x − c),
which is the slope of the secant line for f(x) between c and x. We want to show that d = f′(ξ) for some ξ between c and x. Noting that the slope-adjusted function
g(y) = f(y) + d(x − y)
satisfies g(c) = g(x) = f(x), Rolle’s theorem tells us that there exists ξ between c and x such that g′(ξ) = 0. Computing the derivative of g, we have
g′(y) = f′(y) − d,
so 0 = g′(ξ) = f′(ξ) − d implies f′(ξ) = d.
15.2. Error Estimates. Taylor’s Theorem is one of the most important results in numerical analysis. Most of our error estimates will arise by approximating our functions by polynomials, and then analyzing the Taylor series error term. In Calculus, you likely encountered Taylor series, showing that they allow you to approximate many functions by an infinite series. We will consider such infinite series in the next section. What Taylor’s Theorem tells us is how well a function can be approximated by a finite Taylor polynomial. We see that the error in the nth degree Taylor polynomial is proportional to (x − c)^{n+1}. We typically say that the nth degree Taylor polynomial approximates the function up to order x^{n+1}, and we write this in the form
f(x) = p_n(x) + O((x − c)^{n+1}),
which means that there exists some constant of proportionality M > 0 such that
|f(x) − p_n(x)| < M|x − c|^{n+1}.
In the case of Taylor’s Theorem, we can take
M = max_{ξ∈[c,x]} |f^{(n+1)}(ξ)| / (n + 1)!,
or if we do not know the maximum of f^{(n+1)}(ξ), we can instead take any upper bound.
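As a quick numerical check (my own illustration, not from the notes), we can compare the actual error of a Taylor polynomial against the bound from Taylor's Theorem. Take f(x) = e^x centered at c = 0, where every derivative is again e^ξ, which on [0, x] is bounded above by e^x itself:

```python
from math import exp, factorial

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x centered at c = 0."""
    return sum(x**k / factorial(k) for k in range(n + 1))

x, n = 0.5, 4
actual_error = abs(exp(x) - taylor_exp(x, n))
# Taylor's Theorem bound: |f^(n+1)(xi)| <= e^x for xi in [0, x], so take M = e^x.
bound = exp(x) * abs(x) ** (n + 1) / factorial(n + 1)
print(actual_error <= bound)  # True
```

Here the actual error is roughly 2.8 × 10⁻⁴, safely below the bound of roughly 4.3 × 10⁻⁴.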
In practice, how do we find the maximum of f^{(n+1)}(ξ) that we need for the Taylor’s Theorem error bound? In some cases, we know enough about the derivatives of the function that we can come up with a good upper bound. For instance, we note that e < 3, or that |sin x| ≤ 1 and |cos x| ≤ 1, as we will use in examples in the next chapter. In general, we typically take x very close to the center c of the Taylor series, and most of the time we find that f^{(n+1)}(ξ) is monotone between c and x. (We should still be careful of the exceptions where f^{(n+1)}(ξ) has a critical point between c and x and is neither increasing nor decreasing over the entire range, which is the case if we consider a Taylor series for cos x or sin x.) If the (n + 1)-st derivative is monotone, then it takes its maximum value between c and x at either the point ξ = c or ξ = x, so we need only compute its value at both and take the larger:
|f^{(n+1)}(ξ)| ≤ max( |f^{(n+1)}(c)|, |f^{(n+1)}(x)| ).
In particular, if |f^{(n+1)}(c)| is larger, then the Taylor’s Theorem error bound gives us
|f(x) − p_n(x)| ≤ (|f^{(n+1)}(c)|/(n + 1)!) |x − c|^{n+1},
which is precisely the absolute value of the next term in the Taylor series! This usually happens if the Taylor series alternates between positive and negative terms, and it is a special case of the Alternating Series Test discussed in the next section. However, if the Taylor series is not alternating, then usually the maximum is at the other point, ξ = x, and the error becomes
|f(x) − p_n(x)| ≤ (|f^{(n+1)}(x)|/(n + 1)!) |x − c|^{n+1}.
This takes a bit more computation than just the next term of the Taylor series, and often such error estimates are not terribly helpful, but it may be the best we can do. We will see examples of both error estimates in the next section. One last thing to keep in mind about these estimates is that which one we use often depends on whether x > c or x < c, as one of the series is alternating and the other is not.
It is worth noting that while the notation and indeed the definition of big-O notation is
the same in mathematics and computer science, the usage is different. In computer science,
you typically consider expressions like O(n2 ) or O(n log n), which describe the time required
to perform an algorithm as the size n of the problem gets large, or approaches infinity. In
this context, big-O notation means “up to a constant and ignoring lower order terms”. In
contrast, when analyzing Taylor series in mathematics, we consider expressions like O(x²)
where x is small or approaches 0, and the notation essentially means “up to a constant and
ignoring higher order terms”. This approach makes sense if we think in terms of infinite
Taylor series, where the terms get successively smaller with higher and higher powers of
x − c. However, in light of Taylor’s Theorem, we are not necessarily ignoring higher order
terms, since the error can actually be expressed using a single term of degree n + 1.
There are always two things to consider when analyzing the convergence of a Taylor series.
First, since the error is O((x − c)^(n+1)), Taylor series typically work well when x is close to
the center c. For instance, if x − c = 0.1, then you would expect the error in the quadratic
Taylor polynomial to be proportional to 0.1^3 = 0.001, and if you wanted d digits after the
decimal place, you would consider adding up the first d terms of the Taylor series. However,
we must also consider the constant M, which measures how wild the (n + 1)-st derivative is.
If the (n + 1)-st derivative is small, then the (x − c)^(n+1) factor dominates. However, the (n + 1)-st
derivative may be large, sometimes dominating the error term. So a complete error analysis
of Taylor series convergence must always examine the derivatives of the function.
Finally, I want to explain how Taylor polynomials fit with the idea of an infinitesimal ε satisfying ε² = 0 that we introduced at the end of Section 13. If we generalize our notion of infinitesimals, instead using the nilpotent condition ε^(n+1) = 0 and assuming that lower powers of ε do not vanish, then we can write our Taylor polynomial exactly, without requiring an error term:

f(c + ε) = f(c) + ε f′(c) + (ε²/2) f″(c) + · · · + (ε^n/n!) f^(n)(c),

because the error term is proportional to ε^(n+1) = 0. Looking at infinitesimals differently, working with such an infinitesimal is equivalent to ignoring terms of degree n + 1 or higher, or working up to O(x^(n+1)).
16. Series
To better understand Taylor series, we consider series in general.
Definition 16.1. A series is an infinite summation, and we say that a series converges,

∑_{n=1}^∞ a_n = L,

if the sequence of partial sums s_n = ∑_{k=1}^n a_k converges to lim_{n→∞} s_n = L.
Notice that the convergence of a series is expressed in terms of the convergence of the
sequence of partial sums. In fact, studying series is equivalent to studying sequences. Not
only can every series be expressed in terms of a sequence of partial sums, but also every
sequence {s_n}_{n=1}^∞ can be expressed as the partial sums of the series with a_1 = s_1 and
a_n = s_n − s_{n−1}. In other words, we can take the series whose terms are the successive
differences of the terms in our sequence. Several of our standard theorems about the
convergence of sequences have immediate analogues for series, and in particular, the Series
Comparison Test and Alternating Series Test below are the series versions of the Monotone
Convergence Theorem and the Nested Intervals Theorem, respectively.
So why do we make a big deal about series? It turns out that many interesting sequences
can be expressed most simply as series, particularly Taylor series. Taylor series are actually
series-valued functions, called power series.
Definition 16.2. A power series is a function of the form
p(x) = ∑_{n=0}^∞ a_n x^n,

or more generally, a power series centered at c ∈ R is a function of the form

p(x) = ∑_{n=0}^∞ a_n (x − c)^n.
Example 16.3. The simplest power series is a geometric series, which converges to
∑_{n=0}^∞ x^n = 1/(1 − x)
for |x| < 1. To prove convergence, we consider the partial sums
s_n = 1 + x + x² + · · · + x^n = (1 − x^(n+1))/(1 − x) = 1/(1 − x) − x^(n+1)/(1 − x),
and taking the limit we obtain
∑_{n=0}^∞ x^n = lim_{n→∞} ( 1/(1 − x) − x^(n+1)/(1 − x) ) = 1/(1 − x) − lim_{n→∞} x^(n+1)/(1 − x) = 1/(1 − x).
Recalling our proof that limn→∞ 1/2n = 0, we see that limn→∞ xn = 0 whenever |x| < 1.
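For the curious reader, this convergence is easy to check numerically. Here is a short Python sketch (the function name is ours, not part of the course) comparing the partial sums against the closed form and the limit 1/(1 − x):

```python
def geometric_partial_sum(x, n):
    """Compute s_n = 1 + x + x^2 + ... + x^n by direct summation."""
    return sum(x**k for k in range(n + 1))

# The closed form s_n = (1 - x^(n+1)) / (1 - x) matches the direct sum,
x, n = 0.5, 10
closed_form = (1 - x**(n + 1)) / (1 - x)
print(abs(geometric_partial_sum(x, n) - closed_form))  # essentially zero

# and for |x| < 1 the partial sums approach the limit 1/(1 - x).
print(abs(geometric_partial_sum(0.5, 50) - 1 / (1 - 0.5)))  # tiny
```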
Definition 16.4. The Taylor series for an infinitely differentiable (called smooth or C ∞ )
function f (x) centered at c ∈ R is the series
∑_{n=0}^∞ ( f^(n)(c)/n! ) (x − c)^n.
We say that a function f : I → R is real analytic if its Taylor series centered at some point
c ∈ I converges for all x ∈ I.
Oddly enough, most real analysis courses do not spend much time discussing real analytic
functions. However, from a numerical point of view, real analytic functions are precisely the
sort of well-behaved functions that are best suited for approximation. Not only are such
functions and all their derivatives continuous, but we can also approximate
them by Taylor polynomials, with error bounded by Taylor’s Theorem. In some cases, simply
knowing that a convergent Taylor series exists for a given function will allow us to refine our
algorithms to extract even more accuracy, even without explicitly computing the error.
Example 16.5. The Taylor series for the exponential function is particularly easy to compute,
since e^x is its own derivative. With f^(n)(x) = e^x, we have

e^x = ∑_{n=0}^∞ x^n/n! = 1 + x + x²/2 + x³/3! + · · · .
This series converges for all x ∈ R. While it is immediately clear that it converges for small
x, such as x = 0.1:

e^0.1 = 1 + 0.1 + 0.005 + 0.000166 + · · · ≈ 1.10517,

it is not immediately clear that it converges for large x, such as x = 10:

e^10 = 1 + 10 + 50 + 167 + 417 + · · · .
We can see why this converges by comparing two adjacent terms of the Taylor series. The
term x^n/n! differs from x^(n−1)/(n − 1)! by a factor of x/n. By the Archimedean property, for
any x ∈ R, there exists n ∈ N such that n > x, which gives us x/n < 1. So even if x is very
large, eventually the terms we are adding to the Taylor series start to decrease, and in fact
they decrease faster than the terms in the geometric series.
Let us now estimate the error in the Taylor series for e = e^1. Since f^(n+1)(ξ) = e^ξ is an
increasing function, it achieves its maximum for ξ ∈ [0, 1] at ξ = 1. That maximum
is e^1 = e. (Note that at the other endpoint ξ = 0, we have e^0 = 1, which is clearly smaller.)
This isn’t very useful in making an error estimate because it is the same value we are trying
to compute, and we cannot assume we already know it. However, we do know that e < 3,
so we can use 3 as an upper bound for |f (n+1) (ξ)|. We therefore have
| e − (1 + 1 + 1/2 + 1/6 + · · · + 1/n!) | < 3/(n + 1)!,

where we have left out the factor |1 − 0|^(n+1) = 1.
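As a numerical illustration (a Python sketch; the helper name is ours), we can verify that the partial sums of this series stay within the 3/(n + 1)! bound:

```python
import math

def exp1_partial(n):
    """Partial sum 1 + 1 + 1/2! + ... + 1/n! of the Taylor series for e at x = 1."""
    return sum(1 / math.factorial(k) for k in range(n + 1))

# The Taylor's Theorem bound above: |e - s_n| < 3 / (n+1)!
for n in range(1, 12):
    error = abs(math.e - exp1_partial(n))
    bound = 3 / math.factorial(n + 1)
    assert error < bound
print(exp1_partial(10))  # already accurate to roughly 8 digits
```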
Example 16.6. Consider the function f (x) = 1/(1 − x). Computing its Taylor series centered
at 0, we have:
f(x) = (1 − x)^(−1),          f(0) = 1
f′(x) = (1 − x)^(−2),         f′(0) = 1
f″(x) = 2(1 − x)^(−3),        f″(0) = 2
f‴(x) = 3!(1 − x)^(−4),       f‴(0) = 3!
f^(n)(x) = n!(1 − x)^(−n−1),  f^(n)(0) = n!
The corresponding Taylor series is then just the geometric series,
1/(1 − x) = 1 + x + x² + x³ + · · · = ∑_{n=0}^∞ x^n,
since the f^(n)(0) = n! factor cancels the n! in the denominator of the Taylor series formula.
Taking x = 0.1, we compute the Taylor series error estimate for the series 1.111 · · · = 10/9.
The function f^(n+1)(ξ) = (n + 1)!(1 − ξ)^(−n−2) is increasing, so it takes its maximum value on
the interval ξ ∈ [0, 0.1] at ξ = 0.1. There we have

f^(n+1)(0.1) = (n + 1)!/0.9^(n+2),
and our error estimate becomes
| 10/9 − (1 + 0.1 + 0.01 + · · · + 0.1^n) | < 0.1^(n+1)/0.9^(n+2) = 10/9^(n+2),
where the (n + 1)! terms in the numerator and denominator cancel. In particular, the n = 0
estimate of 1 is within an error of 10/81 = 0.123, and the n = 1 estimate of 1.1 is within an
error of 10/729 = 0.014. Also, since the series consists of entirely positive terms, we know
that the limit of the series is always greater than the estimate.
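Both claims can be confirmed numerically: the error stays below 10/9^(n+2), and the limit exceeds every partial sum. A small Python sketch (names ours):

```python
def s(n):
    """Partial sum 1 + 0.1 + 0.01 + ... + 0.1^n of the geometric series at x = 0.1."""
    return sum(0.1**k for k in range(n + 1))

limit = 10 / 9
for n in range(10):
    error = limit - s(n)      # positive: the limit is greater than every estimate
    assert 0 < error < 10 / 9**(n + 2)
print(limit - s(0), 10 / 81)  # error about 0.111, bound about 0.123
```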
Example 16.7. Consider the function f (x) = cos x. Computing its Taylor series centered at
0, we have
f(x) = cos x,       f(0) = 1
f′(x) = − sin x,    f′(0) = 0
f″(x) = − cos x,    f″(0) = −1
f‴(x) = sin x,      f‴(0) = 0
f^(4)(x) = cos x,   f^(4)(0) = 1,
and the corresponding series is

cos x = 1 − x²/2 + x⁴/4! − x⁶/6! + · · · .

Like e^x, this series also converges for all x ∈ R. Analyzing the convergence via Taylor's
Theorem, we note that we always have | sin x| ≤ 1 and | cos x| ≤ 1. We can therefore bound
any derivative of cos x by 1, so the error in the nth degree Taylor polynomial is at most

| cos x − p_n(x)| ≤ |x|^(n+1)/(n + 1)!.
In particular, since the 2n-th degree Taylor polynomial is the same as the (2n + 1)-st degree
Taylor polynomial (since the degree 2n + 1 term vanishes), its error bound is

| cos x − p_{2n}(x)| ≤ x^(2n+2)/(2n + 2)!,
which is precisely the next term in the series! This is common for Taylor series with alternating positive and negative signs, and illustrates the Alternating Series Test below.
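This alternating behavior is easy to check numerically. Here is a Python sketch (the function name is ours) comparing the error in p_{2m} against the next term of the series:

```python
import math

def cos_taylor(x, m):
    """Degree-2m Taylor polynomial of cos at 0: sum of (-1)^k x^(2k) / (2k)!."""
    return sum((-1)**k * x**(2 * k) / math.factorial(2 * k) for k in range(m + 1))

x = 1.0
for m in range(1, 6):
    error = abs(math.cos(x) - cos_taylor(x, m))
    next_term = x**(2 * m + 2) / math.factorial(2 * m + 2)
    assert error <= next_term  # the error never exceeds the next term
```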
Exercise 16.8. Compute the Taylor series for f (x) = ln x centered at c = 1. Using this
Taylor series, plug in x = 0 to show that the corresponding series for ln 0 diverges. Then
plug in x = 2 and use Taylor's Theorem to show that the corresponding series converges
to ln 2.
Exercise 16.9. Compute the Taylor series for f(x) = √x centered at c = 1, finding a general
expression for the terms. Use this Taylor series to compute √1.1 to three digits after the
decimal point. Then try to compute √2 using 10 terms of the Taylor series. What do you
notice about the convergence?
Theorem 16.10 (Alternating Series Test). Suppose that the terms a_n of a series alternate
between positive and negative, that the |a_n| are decreasing (or non-increasing), and that
lim_{n→∞} a_n = 0. Then the series ∑_{n=1}^∞ a_n converges. Furthermore, the error in the partial
sum s_n is bounded by the next term of the series, |a_{n+1}|.
Proof. This is a corollary of the Nested Interval Theorem. Suppose an is positive. Then
since the series is alternating we have an+1 ≤ 0, and looking at the partial sums, we see
that sn+1 = sn + an+1 ≤ sn . By our alternating and decreasing assumptions, we have
an+2 ≤ −an+1 , or equivalently an+1 + an+2 ≤ 0, and so sn+2 = sn + an+1 + an+2 ≤ sn .
Continuing this process, we see that sm ≤ sn for all m > n. Similarly, if an is negative we see
that sm ≥ sn for all m > n. We therefore get a chain of nested intervals (assuming without
loss of generality that a1 is positive):
[s2 , s1 ] ⊃ [s2 , s3 ] ⊃ [s4 , s3 ] ⊃ [s4 , s5 ] ⊃ [s6 , s5 ] ⊃ [s6 , s7 ] ⊃ · · · .
Since limn→∞ an = 0, these nested intervals shrink to zero size, and so there is a unique value
L which is the limit of the sn for both the left (even) endpoints and right (odd) endpoints.
It follows that ∑_{n=1}^∞ a_n = L. Furthermore, since L lies in each of the intervals, we see that
the error |s_n − L| is at most the size of the interval between s_n and s_{n+1}, which is |a_{n+1}|.

In general, alternating series work extremely well for approximations. Instead of using
Taylor’s Theorem, which requires bounding successively larger derivatives to determine the
error, we need only look at the next term of the series! In fact, if a Taylor series is alternating,
then the Taylor series error bound is often the absolute value of the next term of the series,
as we saw at the end of the previous section.
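For instance (a Python sketch, names ours), the alternating series ∑ (−1)^n/n converges to −ln 2, and its partial sums obey the |a_{n+1}| error bound:

```python
import math

def alt_partial(n):
    """Partial sum of sum_{k=1}^{n} (-1)^k / k, which converges to -ln 2."""
    return sum((-1)**k / k for k in range(1, n + 1))

L = -math.log(2)
for n in range(1, 200):
    # Alternating Series Test: the error is at most the next term, 1/(n+1).
    assert abs(alt_partial(n) - L) <= 1 / (n + 1)
print(alt_partial(100))  # within 1/101 of -ln 2, but converging slowly
```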
Exercise 16.11. Using the Taylor series for ln x centered at c = 1 from Exercise 16.8, compute
both ln 0.9 and ln 1.1 to three digits after the decimal point. Estimate the error in each case
and show that it is less than 0.001. Note that one of the series will be an alternating series,
so you can use the Alternating Series Test. However, the other series will not be alternating,
so you will need to use Taylor’s Theorem to estimate the error.
Alternating series are much more likely to converge than series that are entirely positive.
We make this more precise with the following definition and lemmas, which explore the
differences between positive and alternating series.
Definition 16.12. The series ∑_{n=1}^∞ a_n is called absolutely convergent if the positive series
of absolute values ∑_{n=1}^∞ |a_n| converges.
Lemma 16.13 (Series Comparison Test). Suppose 0 ≤ a_n ≤ b_n for all n ∈ N and ∑_{n=1}^∞ b_n
converges. Then ∑_{n=1}^∞ a_n converges.
Proof. We note that the sequence of partial sums {s_n^a}_{n=1}^∞ for the a_n series is monotone
non-decreasing. If ∑_{n=1}^∞ b_n = L, we see that s_n^a ≤ s_n^b ≤ L, where s_n^b is a partial sum for the
b_n sequence. Since the sequence {s_n^a}_{n=1}^∞ is monotone and bounded, it must converge by
the Monotone Convergence Theorem.
Lemma 16.14. Absolute convergence implies convergence.
Proof. Suppose that ∑_{n=1}^∞ |a_n| converges. Multiplying all terms by 2, we see that ∑_{n=1}^∞ 2|a_n|
converges as well. However, we also have 0 ≤ |a_n| + a_n ≤ 2|a_n|. By the Series Comparison
Test, this tells us that the series

∑_{n=1}^∞ ( |a_n| + a_n )

converges. But then the difference of two convergent series

∑_{n=1}^∞ a_n = ∑_{n=1}^∞ ( |a_n| + a_n ) − ∑_{n=1}^∞ |a_n|

must converge as well.
The converse of this lemma is not true. For example, the alternating series

∑_{n=1}^∞ (−1)^n/n = − ln 2

converges, but taking the absolute values of its terms gives the divergent harmonic series

∑_{n=1}^∞ 1/n.
Series that are convergent but not absolutely convergent are called conditionally convergent,
and they typically converge more slowly than absolutely convergent series. For example, the
Taylor series you computed for √2 is conditionally convergent and converged slowly.
The convergence of a power series ∑_{n=0}^∞ a_n x^n depends on the value of the variable x.
In general, a power series converges on some interval, which may be a degenerate interval
consisting of just the single point x = 0, or it may be the entire real line. In the interior
of this interval of convergence, we will show that the power series converges absolutely by
comparing the power series to a geometric series.
Lemma 16.15. If a power series ∑_{n=0}^∞ a_n x^n converges for some x = r, then it converges
absolutely for all x with |x| < |r|.
Proof. Since ∑_{n=0}^∞ a_n r^n converges, we know that its terms a_n r^n are bounded. Suppose that
|a_n r^n| < M for all n ∈ N. Then we have

0 ≤ |a_n x^n| = |a_n r^n| |x/r|^n < M |x/r|^n.

If |x| < |r|, then |x/r| < 1, and the series

∑_{n=0}^∞ M |x/r|^n = M ∑_{n=0}^∞ |x/r|^n

is a geometric series which converges. Then by the comparison test we see that ∑_{n=0}^∞ |a_n x^n|
converges, and so ∑_{n=0}^∞ a_n x^n converges absolutely.
Theorem 16.16 (Radius of Convergence). There are three possibilities for the convergence
of a power series of the form ∑_{n=0}^∞ a_n (x − c)^n:
(1) it converges absolutely for all x ∈ R, in which case it is called an entire function,
(2) it converges only for x = c, where it is just the constant series a_0, or
(3) there is a radius of convergence R such that it converges absolutely when |x − c| < R,
and it diverges when |x − c| > R.
Proof. Assuming that neither (1) nor (2) holds, we know that the power series diverges at
some z 6= c, and by the contrapositive of the above lemma, it must diverge for all x with
|x − c| > |z − c|. Consider the set
S = {r ∈ R | the series diverges for |x − c| > r}.
The set S contains |z − c|, so it is non-empty, and all elements of S are positive, so S is
bounded below. Let R be the greatest lower bound of the set S.
If |x−c| > R, then there is some element r ∈ S so that |x−c| > r > R, and thus the power
series diverges at x. However, if |x − c| < R, then there exists r ∉ S with |x − c| < r < R,
and some y ∈ R with |y − c| > r where the power series converges. But then |x − c| < |y − c|
and by the above lemma the power series converges absolutely at x.
Note that at the radius of convergence, i.e., at the points c ± R, this theorem does not say
whether the series converges or not. In some cases the series may converge at both of these
points, while in other cases it may diverge at one or both. Even if the series converges at one
of these points while diverging at the other, that convergence cannot be absolute, since
absolute convergence depends only on |x − c| = R. So what course must you take to make sense of this? Complex
Analysis! It turns out that if you consider complex-valued power series, there is a theorem
which says that the function always has at least one singular point on the complex circle
|z − c| = R. However, that point may not lie along the real line.
In practice, if we want to determine the radius of convergence of a power series, we use the
Ratio Test from calculus, which once again compares the given series to a geometric series.
P
Theorem 16.17 (Ratio Test). Consider the series ∞
n=0 an , and take the limit of the ratios
of successive coefficients
an+1 .
L = lim n→∞
an APPLIED ANALYSIS
33
If L < 1, then the series converges absolutely, and if L > 1 or L = ∞, then the power series
diverges.
Proof. If L < 1, then there exists r ∈ R such that L < r < 1. There then exists some
N ∈ N such that for n ≥ N we have |an+1 /an | < r, or equivalently |an+1 | < r|an |. The
terms of the series ∑_{n=1}^∞ |a_n| are thus eventually less than the terms of a geometric series
with 0 < r < 1, and by the comparison test we see that ∑_{n=1}^∞ |a_n| converges. It follows that
∑_{n=1}^∞ a_n converges absolutely.
If L > 1 or L = ∞, then there exists r ∈ R such that 1 < r < L, and similarly there
exists N ∈ N such that |a_{n+1}/a_n| > r for n ≥ N. With |a_{n+1}| > r|a_n|, the terms a_n of our
series are unbounded, and so the series cannot converge.
Note that this theorem says nothing if the limit of the ratios is 1, which corresponds to
the fact that a power series may converge or diverge at its radius of convergence.
Example 16.18. Consider the power series

∑_{n=1}^∞ x^n/n.
Applying the ratio test, we obtain

lim_{n→∞} | (x^(n+1)/(n + 1)) / (x^n/n) | = |x| lim_{n→∞} n/(n + 1) = |x| · 1 = |x|.
So the power series converges absolutely when |x| < 1 and diverges when |x| > 1.
In general, given a power series ∑_{n=0}^∞ a_n x^n, we compute

lim_{n→∞} | a_{n+1} x^(n+1) / (a_n x^n) | = |x| lim_{n→∞} |a_{n+1}/a_n|,

and the radius of convergence is then

R = lim_{n→∞} |a_n/a_{n+1}| = 1 / lim_{n→∞} |a_{n+1}/a_n|.
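This reciprocal-ratio formula can be explored numerically. A Python sketch (names ours) estimating R from finite ratios of coefficients:

```python
import math

def radius_estimate(a, n):
    """Estimate R = lim |a_n / a_{n+1}| using the coefficients at a single index n."""
    return abs(a(n) / a(n + 1))

# For sum x^n / n, the ratios (n+1)/n approach 1, so R = 1.
print(radius_estimate(lambda n: 1 / n, 1000))  # close to 1

# For sum x^n / n!, the ratios are n + 1, growing without bound: R is infinite.
print(radius_estimate(lambda n: 1 / math.factorial(n), 50))  # large and growing
```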
So why do we care so much about the radius of convergence of a Taylor series? It is
because it is very tempting to assume that Taylor series converge everywhere. Since they
typically don’t, it is vital to know where a Taylor series converges before you attempt to use
it, so you don’t accidentally try to compute the value of a divergent series. Also, a Taylor
series will converge more slowly as you approach its radius of convergence, and knowing
when that happens will help you analyze the convergence rate.
17. Numerical Differentiation
17.1. The Derivative. We recall the definition of the derivative

f′(x) = lim_{h→0} ( f(x + h) − f(x) ) / h.
Instead of taking the limit and evaluating derivatives by the usual set of algebraic rules, in
this section we compute derivatives numerically. Given the value of f (x) at two points, such
as f (x + h) and f (x), we can compute the slope of the secant line as an estimate of the
derivative f 0 (x). But how good an estimate is it? We will bound the error and show how to
minimize it. We start with our standard assumption that the function is real analytic, i.e.,
that it can be expressed as a Taylor series

f(x + h) = f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + · · · + h^n f^(n)(x)/n! + · · · .
Subtracting f (x), we can estimate the error for the standard divided difference formula for
the derivative
( f(x + h) − f(x) ) / h = ( h f′(x) + O(h²) ) / h = f′(x) + O(h),
where the O(h) term involves the second derivative. This means that we expect the error in
the approximation (f (x + h) − f (x))/h of the derivative to be proportional to h itself. What
we do not see in the O(h) notation is the constant of proportionality, which is controlled by
the second derivative f″(ξ) for ξ between x and x + h. If that derivative is nicely bounded,
then the h dominates the expression. However, if that derivative is wild or unbounded, then
the error may be too large to be useful.
We note that the secant line we have been using goes through the point (x, f (x)) and
looks either forward or back to the point (x + h, f (x + h)). But what if we look forward and
back simultaneously, using both +h and −h and taking the average of the slopes of the two
secant lines? This gives us the central difference formula
(17.1)    (1/2) ( ( f(x + h) − f(x) ) / h + ( f(x) − f(x − h) ) / h ) = ( f(x + h) − f(x − h) ) / (2h).
This central difference formula takes just as much effort to compute as the standard divided
difference formula, using the value of the function at just two points. However, when we
analyze the error, we find that all the even powered terms in the two Taylor series cancel,
f(x + h) − f(x − h) = ( f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + · · · )
                    − ( f(x) − h f′(x) + h² f″(x)/2 − h³ f‴(x)/6 + · · · )
                    = 2h f′(x) + h³ f‴(x)/3 + · · · ,
and we have
( f(x + h) − f(x − h) ) / (2h) = ( 2h f′(x) + O(h³) ) / (2h) = f′(x) + O(h²),
where the O(h²) term involves the third derivative. What this means is that if we use the
central difference formula rather than the standard divided difference formula, the error is
much smaller. In particular, since the error is proportional to h² rather than h, we get
twice as many digits of accuracy with the central difference formula as we do with the
standard divided difference formula, all without working any harder. (Provided that the
third derivative f‴(ξ) is well controlled for ξ between x and x + h.) This is our first glimpse
of the magical power of numerical analysis! By simply adjusting our computation slightly
we can often get significantly more accuracy.
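Here is a small Python experiment (function names ours, not part of the course) showing the payoff for f(x) = sin x at x = 1, where the true derivative is cos 1:

```python
import math

def forward_diff(f, x, h):
    """Standard divided difference: error O(h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Central difference: error O(h^2) for the same amount of work."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1.0, 1e-4
err_forward = abs(forward_diff(math.sin, x, h) - math.cos(x))
err_central = abs(central_diff(math.sin, x, h) - math.cos(x))
print(err_forward)  # roughly 4e-5, proportional to h
print(err_central)  # roughly 1e-9, proportional to h^2
```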
17.2. Higher Derivatives. What about higher derivatives? If we want to compute the
second derivative, we can approximate f″(x) by

f″(x) ≈ ( f′(x + h) − f′(x) ) / h
      = ( ( f(x + 2h) − f(x + h) ) / h + O(h) − ( f(x + h) − f(x) ) / h − O(h) ) / h
      = ( f(x + 2h) − 2f(x + h) + f(x) ) / h² + O(h).
Here the two O(h) terms do not cancel completely. Rather, their difference eliminates the h
terms in the Taylor series, leaving the h² terms and higher, giving us O(h) − O(h) = O(h²).
Furthermore, the O(h) term involves the third derivative of f . If we want to verify this
explicitly from the Taylor series, we compute
f(x + 2h) − 2f(x + h) + f(x) = ( f(x) + 2h f′(x) + 4h² f″(x)/2 + 8h³ f‴(x)/6 + · · · )
                             − 2( f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + · · · )
                             + f(x)
                             = h² f″(x) + O(h³),
noting that the f(x) and f′(x) terms all cancel.
For even higher derivatives, we obtain the following results
f(x) = f(x),
f′(x) = ( f(x + h) − f(x) ) / h + O(h),
f″(x) = ( f(x + 2h) − 2f(x + h) + f(x) ) / h² + O(h),
f‴(x) = ( f(x + 3h) − 3f(x + 2h) + 3f(x + h) − f(x) ) / h³ + O(h),
f^(n)(x) = (1/h^n) ∑_{k=0}^n (−1)^(n−k) (n choose k) f(x + kh) + O(h).
These formulæ are not difficult to verify once we observe that the coefficients are taken from
Pascal’s Triangle, and we leave this as an exercise for the interested reader.
So far, we have considered only the standard divided difference formulae for the higher
derivatives. What about the higher derivative version of the central difference formula? We
notice that our formula for f″(x) is computed using f(x), f(x + h), and f(x + 2h). If we
instead center those three points around f (x) itself, we obtain the central difference formula
for the second derivative
f″(x) = ( f(x + h) − 2f(x) + f(x − h) ) / h² + O(h²),
which once again replaces the O(h) error with a much better O(h²) error, giving us twice as
many digits of accuracy. Once again, we verify this via the Taylor series,
f(x + h) − 2f(x) + f(x − h) = ( f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + h⁴ f^(4)(x)/4! + · · · )
                            − 2f(x)
                            + ( f(x) − h f′(x) + h² f″(x)/2 − h³ f‴(x)/6 + h⁴ f^(4)(x)/4! − · · · )
                            = h² f″(x) + O(h⁴),
where we observe that all the odd powers of h cancel between f (x + h) and f (x − h).
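To see the O(h²) behavior in practice, here is a Python sketch (names ours) applying this formula to f = exp at x = 1, where f″(1) = e; halving h should cut the error by about a factor of 4:

```python
import math

def second_central(f, x, h):
    """Central difference for f''(x): (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x = 1.0
err1 = abs(second_central(math.exp, x, 1e-2) - math.e)
err2 = abs(second_central(math.exp, x, 5e-3) - math.e)
print(err1 / err2)  # close to 4, as expected for an O(h^2) method
```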
How do these central difference formulae work? Why is it that simply centering the points
where we evaluate the function around f(x) eliminates the O(h) term in the Taylor series,
giving us a much better O(h²) error? It is because the central difference expressions are
even functions of h. Recall that an even function is a function satisfying g(−h) = g(h),
and that the Taylor series for an even function g(h) has only even powers of h. Since our
central difference formulae for f′(x) and f″(x) do not change when we replace h with −h,
their Taylor series expansions have no odd powers of h. In particular, the O(h) error term
involves h¹, which is an odd power, and so it must vanish, leaving an even O(h²) error term.
Presto! No additional work, but a much better error estimate.
In general, the central difference versions of our higher derivative formulæ are of the form
f(x) = f(x),
f′(x) = ( f(x + h) − f(x − h) ) / (2h) + O(h²),
f″(x) = ( f(x + h) − 2f(x) + f(x − h) ) / h² + O(h²),
f‴(x) = ( f(x + 3h) − 3f(x + h) + 3f(x − h) − f(x − 3h) ) / (2h)³ + O(h²),
f^(n)(x) = (1/h^n) ∑_{k=0}^n (−1)^(n−k) (n choose k) f(x + kh − nh/2) + O(h²)
         = (1/(2h)^n) ∑_{k=0}^n (−1)^(n−k) (n choose k) f(x + 2kh − nh) + O(h²),
where once again the coefficients come from Pascal’s triangle. The second version of the general formula simply replaces h with 2h. This gives better looking formulæ for odd derivatives,
since we end up evaluating our function at points such as f (x + h) rather than f (x + h/2).
17.3. Richardson Extrapolation. But it gets better. Much better. Suppose we want to
compute a first derivative using
g(h) = ( f(x + h) − f(x) ) / h = f′(x) + ch + dh² + · · · ,

where c and d are some constants coming from the Taylor series expansion. While this may
give us a reasonable estimate for f′(x), if we want to improve our estimate we could plug in
a smaller value of h:

g(h/2) = f′(x) + ch/2 + dh²/4 + · · · .

However, by taking a linear combination of g(h) and g(h/2) we can get an even better
estimate that eliminates the ch term in the Taylor series expansion:

g(h) − 2g(h/2) = −f′(x) + dh²/2 + · · · ,
which gives us

f′(x) = 2g(h/2) − g(h) + O(h²) = ( −f(x + h) + 4f(x + h/2) − 3f(x) ) / h + O(h²).
While this does give us a formula involving f (x + h), f (x + h/2), and f (x), the much more
important observation is about the formula involving g(h) and g(h/2). This formula is a
weighted average of two quantities of the form ax + by where the weights satisfy a + b = 1.
Such weighted averages are often used for linear interpolation, estimating a value between x
and y, in which case we would take 0 ≤ a, b ≤ 1. In our case, we have a = 2 and b = −1,
meaning that the g(h/2) term is doubly overweighted. This is an example of extrapolation.
We are effectively connecting g(h/2) and g(h) with a straight line, and extending that line
to estimate g(0) = f 0 (x). Doing this explicitly, the point-point form of this line is
y = ( g(h) − g(h/2) ) / (h/2) · (x − h/2) + g(h/2) = ( g(h) − g(h/2) ) / (h/2) · x + 2g(h/2) − g(h),
and the y-intercept gives our formula 2g(h/2) − g(h).
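A Python sketch of the extrapolation (names ours), again differentiating sin at x = 1:

```python
import math

def g(f, x, h):
    """Forward difference estimate g(h) = f'(x) + c h + d h^2 + ..."""
    return (f(x + h) - f(x)) / h

def richardson(f, x, h):
    """Extrapolate the line through (h, g(h)) and (h/2, g(h/2)) to h = 0."""
    return 2 * g(f, x, h / 2) - g(f, x, h)

x, h = 1.0, 1e-3
print(abs(g(math.sin, x, h) - math.cos(x)))           # O(h) error
print(abs(richardson(math.sin, x, h) - math.cos(x)))  # O(h^2) error, far smaller
```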
Exercise 17.1. Suppose we are given a function with f (−1) = 1, f (0) = 0 and f (1) = 3.
(1) Using the central difference formula, extrapolation formula, and second derivative
formula, give the best possible estimates for f′(−1), f′(0), f′(1) and f″(0).
(2) Find the unique quadratic polynomial y = p(x) = ax2 + bx + c passing through the
points (−1, 1), (0, 0), (1, 3). (Hint. Plug in each of the (x, y) pairs to get a system of
3 equations in the 3 variables a, b, c.)
(3) Compute p′(−1), p′(0), p′(1), p″(0).
18. Contractions
Recall from our discussion of the Intermediate Value Theorem that any continuous function
f : [a, b] → [a, b] from a closed interval back to itself has a fixed point, i.e., a point x ∈ [a, b]
such that f (x) = x. To see this, consider the continuous function g(x) = f (x) − x. At a we
have g(a) ≥ 0 and at b we have g(b) ≤ 0. So by the Intermediate Value Theorem, there must
exist x ∈ [a, b] such that g(x) = 0, and thus f (x) = x. Note that while the Intermediate
Value Theorem tells us that a fixed point exists, there may in fact be more than one.
Fixed points are useful, since many computations can be restated in terms of finding the
fixed points of well chosen functions. In this section, we demonstrate a simple technique to
find the fixed point of a function, and we consider a class of functions called contractions
which are guaranteed to have a unique fixed point.
Definition 18.1. A contraction is a function f : I → I satisfying
|f (x) − f (y)| ≤ c |x − y|
for all x, y ∈ I, where c < 1 is a constant.
Such a function is called a contraction because all of the points in the domain get closer
to each other when you apply the function. In other words, applying the function causes the
space to contract. In the analysis literature, contractions are usually defined not only for
real functions in terms of absolute values, but also for functions on a general metric space,
with a distance function d(x, y). In that language, the condition is
d (f (x), f (y)) ≤ c d(x, y).
The simplest example of a contraction is a linear function f (x) = mx + b with slope |m| < 1.
In that case we have
|f (x) − f (y)| = |mx + b − my − b| = |mx − my| = |m||x − y|.
Indeed, the condition for a contraction can be restated in terms of the slope of secant lines:

| ( f(y) − f(x) ) / (y − x) | ≤ c < 1,
so we see that when our function is a contraction, the slopes of all secant lines have absolute
values less than 1, or more precisely are bounded below 1 by some fixed c. The slopes for a
contraction are therefore well controlled, so such functions cannot oscillate wildly.
Lemma 18.2. Contractions are continuous.
Proof. Let f : I → I be a contraction. Given any ε > 0, we take δ = ε/c. Then if
|x − y| < δ = ε/c, we have |f(x) − f(y)| ≤ c |x − y| < c · ε/c = ε.
The proof we gave here is very similar to that of the homework problem showing that a
linear function f (x) = mx + b is continuous. To show continuity, we do not need the slope to
be a constant. Instead, it is sufficient that the slope of any secant line is bounded. This proof
even works if the slopes of the secant lines are bounded by some M ≥ 1 (taking δ = ε/M),
although the definition of a contraction requires c < 1.
Also note that the δ we chose in the proof does not actually depend on the point x. Usually,
we talk about a function being continuous at a particular point x, so for every ε > 0, we
can find a δ > 0 that works for points close to x. The δ typically changes depending on the
value of x, but in this case, a single δ works for all x. This is a stronger form of continuity
called uniform continuity.
We could also have proved Lemma 18.2 using Definition 10.3 of continuity via sequences.
Suppose that lim_{n→∞} a_n = L. Then for any ε > 0, there exists N ∈ N such that n ≥ N
implies |a_n − L| < ε. But if f is a contraction, we have

|f(a_n) − f(L)| ≤ c |a_n − L| < cε < ε,
and thus limn→∞ f (an ) = f (L). In other words, when the elements of our original sequence
are close enough to their limit, they get even closer when we apply the contraction.
Lemma 18.3. If a contraction admits a fixed point, then that fixed point is unique.
Proof. Suppose both x and y are fixed points of a contraction f . Then f (x) = x and
f (y) = y, and so |f (x) − f (y)| = |x − y|. However, the contraction condition tells us that
|f(x) − f(y)| ≤ c |x − y| with c < 1. The only way that both could be true is if x = y.

In fact, every contraction admits a fixed point by our Intermediate Value Theorem argument above, and here is how you compute it.
Theorem 18.4 (Contraction Mapping Theorem). If f : I → I is a contraction, then the
sequence x_0, x_1 = f(x_0), x_2 = f(x_1) = f(f(x_0)), . . . , x_n = f(x_{n−1}) = f^n(x_0), . . . converges
to the unique fixed point of f , regardless of the initial choice of x0 ∈ I.
We provide two proofs of this theorem. For our first proof, we assume that we already
know that a fixed point exists. Since a contraction is continuous, our Intermediate Value
Theorem argument above tells us that one exists.
Proof assuming a fixed point exists. Suppose f (x) = x. For any initial estimate x0 , we have
|x_1 − x| = |f(x_0) − f(x)| ≤ c |x_0 − x|,
|x_2 − x| = |f(x_1) − f(x)| ≤ c² |x_0 − x|,
|x_n − x| ≤ c^n |x_0 − x|.
Given any ε > 0, we must choose N so that c^N |x_0 − x| < ε. Solving for N, we require
N > log_c( ε / |x_0 − x| ),
which we know exists by the Archimedean property. (Note that taking the log base c reverses
the inequality. This is because c < 1, and logc is a decreasing function. If you don’t buy
that, you can instead take ln of both sides, and then you need to divide by ln c < 0 to solve
for N.) Then for all n ≥ N, we have
|x_n − x| ≤ c^n |x_0 − x| ≤ c^N |x_0 − x| < ε,
which shows that lim_{n→∞} x_n = x.
If we want to prove the Contraction Mapping Theorem on a more general metric space
with a distance function, we do not have the benefit of the Intermediate Value Theorem
to produce our fixed point. Instead, we need to construct the fixed point by showing that
iterating the function produces a Cauchy sequence.
Proof not assuming a fixed point exists. We show that the sequence {x_n}_{n=0}^∞ is Cauchy. Let
d = |x_0 − x_1| be the distance between the first two points. Then we have
|x_1 − x_2| = |f(x_0) − f(x_1)| ≤ c|x_0 − x_1| = cd,
|x_2 − x_3| = |f(x_1) − f(x_2)| ≤ c|x_1 − x_2| ≤ c²d,
and in general
|x_n − x_{n+1}| ≤ c^n d.
By the triangle inequality, for m < n we obtain
|x_m − x_n| ≤ |x_m − x_{m+1}| + |x_{m+1} − x_{m+2}| + ··· + |x_{n−1} − x_n|
≤ c^m d + c^{m+1} d + ··· + c^{n−1} d
= d c^m (1 + c + ··· + c^{n−m−1})
= d c^m (1 − c^{n−m})/(1 − c)
= (d/(1 − c)) (c^m − c^n).
So we see that as m and n get larger, the elements x_m and x_n get closer and closer together.
To make this more precise, suppose we are given ε > 0. We want to find N so that for all
m, n > N we have |x_m − x_n| < ε. To do this, we must choose N such that
d c^N/(1 − c) < ε  ⟹  c^N < ε(1 − c)/d.
Taking logarithms of both sides, we obtain
N ≥ log_c( ε(1 − c)/d ),
and we know we can find such an N by the Archimedean property. (As before, taking the
log base c reverses the inequality, since c < 1 and log_c is a decreasing function.)
Given m, n > N, we then have
|x_m − x_n| ≤ (d/(1 − c))(c^m − c^n) ≤ (d/(1 − c)) max(c^m, c^n) ≤ (d/(1 − c)) c^N < ε,
and so the sequence {x_n}_{n=0}^∞ is Cauchy. It therefore converges by the Cauchy Completeness
Theorem to lim_{n→∞} x_n = x. Finally, we show that x is a fixed point of f. Since applying f
simply shifts the elements of the sequence by one, by continuity we observe that
f(x) = f(lim_{n→∞} x_n) = lim_{n→∞} f(x_n) = lim_{n→∞} x_{n+1} = x,
and so x is indeed the unique fixed point of f.
These proofs not only show that the sequence obtained by iterating the function f converges to the fixed point, but they also give us a way to estimate the convergence and control
the error. From our first proof, we see that
|x_n − x| ≤ c |x_{n−1} − x|,
so each iteration of the function improves the estimate by a factor of c < 1. This is called
linear convergence, and it is similar to the convergence rate of the bisection method, where
each successive estimate had an error bounded by half the error of the previous estimate.
Provided that c < 1, a sequence with linear convergence is guaranteed to converge. The
difficulty with this method is coming up with the initial error |x − x0 |. We do not know
where the fixed point is (if we did, we would not be trying to estimate it), so all we have to
go on is that the fixed point is somewhere in the domain I of the function.
Our second proof gives us a more precise estimate of the error. Given any initial estimate
x_0, we consider the distance d = |x_0 − f(x_0)| between our initial estimate and the next
estimate obtained by applying the function f. We then have
|x_n − x| = lim_{m→∞} |x_n − x_m| ≤ lim_{m→∞} (d/(1 − c))(c^n − c^m) = (d/(1 − c)) c^n.
This gives us a good error bound, controlled both by the initial distance d and by the number
of times n we have iterated the function f . Although the sequence will always converge, we
get much better error estimates if the initial distance d is small.
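To make this concrete, here is a minimal sketch in Python (the helper name and the example contraction are ours, not from the notes): it uses the a priori bound d c^n/(1 − c) from the second proof to decide, before iterating, how many steps guarantee a given tolerance.

```python
import math

def iterate_contraction(f, x0, c, tol):
    """Iterate x_{n+1} = f(x_n), choosing n in advance so that the
    a priori error bound d * c**n / (1 - c) falls below tol."""
    d = abs(x0 - f(x0))
    if d == 0:
        return x0, 0
    # Solve d * c**n / (1 - c) < tol for n; log base c reverses the inequality.
    n = max(0, math.ceil(math.log(tol * (1 - c) / d, c)))
    x = x0
    for _ in range(n):
        x = f(x)
    return x, n

# Example: f(x) = x/2 + 1 is a contraction with c = 1/2 and fixed point 2.
x, n = iterate_contraction(lambda x: x / 2 + 1, 0.0, 0.5, 1e-9)
```

Note that the iteration count n is computed entirely from d, c, and the tolerance, before we know where the fixed point is.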
This process of iterating the function f(x) over and over again is one of the core ideas of
dynamical systems. It is basically a feedback loop, where the output of one step is the input
in the next. With the Contraction Mapping Theorem we have what is called a stable fixed
point, or an attractor, where the dynamical system sucks in all (nearby) values to the fixed
point. Stable fixed points are very forgiving. If you move slightly away from the fixed point,
applying the function a few times will take you back to the fixed point. This is in contrast
to an unstable fixed point, also called a repeller, where if you move slightly away from the
fixed point, applying the function takes you even farther away.
19. Fixed Point Iteration
So how do we know that a given function is a contraction? We do not usually verify the
contraction condition for all x, y ∈ I, unless there is an obvious reason why all the slopes
of the secant lines are small, such as if the function is linear. Instead, we switch from using
the slopes of secant lines to using derivatives, i.e., the slopes of tangent lines, via the Mean
Value Theorem.
Theorem 19.1 (Fixed Point Iteration Theorem). Suppose f(x) = x, and that f is continuously differentiable on some open interval containing x. If |f′(x)| < 1, then there exists an
open interval I containing x on which f : I → I is a contraction.
Proof. Since |f′(x)| < 1, we have −1 < f′(x) < 1. Since f′ is continuous, we can use the
Sign Preserving Property for continuous functions to show that |f′(y)| is bounded away
from 1 in a small enough interval containing x. More precisely, there exists a constant c₊
and a δ₊ > 0 such that f′(y) < c₊ < 1 whenever |x − y| < δ₊, and there exists another
constant c₋ and a δ₋ > 0 such that −1 < c₋ < f′(y) whenever |x − y| < δ₋. Combining
these, taking c = max(|c₊|, |c₋|) and δ = min(δ₊, δ₋), we see that |f′(y)| < c < 1 for all
y ∈ I = (x − δ, x + δ).
For any two points z, w ∈ I, the Mean Value Theorem tells us that there exists y between z and w with
(f(z) − f(w))/(z − w) = f′(y).
However, y is also in I, and taking absolute values and using the result of the previous
paragraph, we have
|f(z) − f(w)| = |f′(y)| |z − w| < c |z − w|.
In particular, taking w = x, since f(x) = x this gives |f(z) − x| < c|z − x| < δ, so f maps I
into itself. It follows that f : I → I is a contraction.
Although this proof may seem complicated, the fundamental idea behind the Fixed Point
Iteration Theorem is fairly intuitive. If we want to bound the slopes of secant lines, the
Mean Value Theorem tells us that for a differentiable function, the slopes of secant lines are
actually given by derivatives of the function. So we want to bound the derivative. But if the
derivative is continuous and satisfies the appropriate bound at the fixed point, then it must
also satisfy that bound at other nearby points, in a sufficiently small interval.
Example 19.2. We will compute the fixed point of cos x (in radians). From looking at
where the graphs of y = cos x and y = x cross, we see that cos x has a unique fixed point,
somewhere around 0.7. The function cos x is almost but not quite a contraction. If we
consider the slopes of tangent lines, i.e., derivatives, we note that the derivative of cos x is
−sin x. While |sin x| ≤ 1 for all x, we do not have |sin x| ≤ c < 1 for a fixed c, so the slopes of
the tangent lines, and hence of the secant lines, are not bounded below 1.
If we consider the Fixed Point Iteration Theorem, we must show that |f′(x)| = |sin x| < 1
at the fixed point x. We do know that |sin x| ≤ 1 for all x, so we just need to show that
|sin x| ≠ 1 at the fixed point. However, we have sin x = ±1 only at x = π/2 + kπ for k ∈ Z,
and none of those points is the fixed point of cos x.
So we now take any x0 sufficiently close to the fixed point (in fact, ANY initial value will
do), and start iterating cos x. Starting at x0 = 0, we obtain the sequence:
0, 1, 0.54030230586814, 0.857553215846393, 0.65428979049778, 0.793480358742565,
0.701368773622757, 0.763959682900654, 0.722102425026708, 0.75041776176376,
0.73140404242251, 0.744237354900557, 0.735604740436347, 0.741425086610109,
0.737506890513243, 0.740147335567876, 0.738369204122323, 0.739567202212256, . . .
As far as our computational algorithms go, this is the easiest we have seen! All we need to
do is keep mashing the cos button on our calculator until all the digits stay the same.
So why doesn’t it matter what our initial value is? Even though cos x is not a contraction
in general, it is a contraction on any interval where |f 0 (x)| = | sin x| < 1. In particular, it is
a contraction on the interval (−π/2, +π/2). For any starting value x0 , we have
cos x0 ∈ [−1, +1] ⊂ (−π/2, +π/2),
so after just one iteration of cos x, our iterates are indeed inside an interval where cos x is a
contraction.
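This button-mashing procedure is easy to script. A minimal sketch in Python (variable names ours):

```python
import math

# Iterate cos starting from x0 = 0 until successive values agree to 12 digits.
x = 0.0
for _ in range(200):
    x_next = math.cos(x)
    if abs(x_next - x) < 1e-12:
        x = x_next
        break
    x = x_next
```

The loop settles at x ≈ 0.7390851332, the unique fixed point of cos.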
Example 19.3. Consider the Golden Ratio φ, approximately 1.6, which is a root of the
quadratic equation φ² − φ − 1 = 0. We can rewrite this equation to exhibit φ as a fixed point of
the function f(x) = x² − 1. What happens if we iterate this function near 1.6? Starting at
x0 = 1, we obtain
1, 0, −1, 0, −1, 0, −1, 0, . . . .
That sequence clearly does not converge! What if we start even closer to the Golden Ratio,
at x_0 = 1.6? We compute the sequence
1.6, 1.56, 1.4336, 1.0552, 0.1134, . . . ,
whose terms are getting farther away from φ and from each other, so it also diverges. What
is going wrong? Computing the derivative at the fixed point φ, we have f′(x) = 2x and
f′(φ) = 2φ, which is approximately 3.2. So we fail the Fixed Point Iteration Theorem
condition on the derivative. Indeed, the argument of the Fixed Point Iteration Theorem can
be modified to show that if |f′(x)| > 1, then x is an unstable fixed point or repeller.
Let’s compute the Golden Ratio a different way. Rewriting its quadratic equation, we
have
φ = 1 + 1/φ.
Letting f(x) = 1 + 1/x, we can iterate:
1, 2, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, … .
This sequence does converge, and we notice that each fraction is a ratio of successive terms of
the Fibonacci sequence, which is one of the many cool properties of the Golden Ratio. How fast
does this converge? We compute f′(x) = −1/x², and near the Golden Ratio of 1.6 we have
|−1/(1.6)²| ≈ 0.4. So we expect to find linear convergence with a constant of c ≈ 0.4. Note
that this fixed point iteration sequence will converge to the Golden Ratio φ slightly faster
than solving the quadratic equation by the bisection method, which gives linear convergence
with a constant of c = 1/2.
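We can watch these Fibonacci ratios appear by carrying out the iteration in exact rational arithmetic; a short sketch in Python (setup ours):

```python
from fractions import Fraction

# Iterate f(x) = 1 + 1/x starting from 1, keeping exact fractions.
x = Fraction(1)
iterates = [x]
for _ in range(7):
    x = 1 + 1 / x
    iterates.append(x)
# iterates: 1, 2, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21 -- ratios of Fibonacci numbers
```

Each iterate is the ratio of two successive Fibonacci numbers, and the values approach φ ≈ 1.618.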
Exercise 19.4. Suppose we iterate the function f (x) = 1 + 1/x as in the example above to
compute the Golden Ratio φ. But suppose that instead of starting with x0 = 1, we choose
two arbitrary real numbers f0 6= 0 and f1 ≥ f0 and start with initial estimate x0 = f1 /f0 ≥ 1.
Show that the successive terms of the fixed point iteration are all ratios of the form
x_n = f_{n+1} / f_n,
where the f_n satisfy the Fibonacci relation
f_n = f_{n−1} + f_{n−2}.
Show that although this is not necessarily the usual Fibonacci sequence, the ratios of its successive terms still converge to the Golden Ratio φ.
Since we are most interested in functions that are real analytic, i.e., functions that can be
approximated by Taylor series, we will now use Taylor’s Theorem to analyze the Fixed Point
Iteration error. In some cases, this analysis will show that the convergence is even better
than the linear convergence guaranteed by the Contraction Mapping Theorem.
If c is a fixed point of f, so that f(c) = c, then Taylor's Theorem gives us
f(x) = c + f′(ξ)(x − c) = c + O(h),
where ξ is between c and x, and h = |x − c|. This means that
|f(x) − c| = |f′(ξ)| |x − c|.
If we have a bound M such that |f′(ξ)| ≤ M < 1 for all ξ between c and our initial estimate,
then we obtain the linear convergence
|f(x) − c| ≤ M |x − c|.
Since M < 1, each successive estimate is closer to the fixed point c than the previous, so our
initial bound M on f′(ξ) holds for subsequent estimates as well. In our proof of the Fixed
Point Iteration Theorem, we assumed that |f′(c)| < 1, so the Sign Preserving Property of
continuous functions gave us |f′(ξ)| < M < 1 for ξ sufficiently close to c.
If f′(c) = 0, then we get even faster convergence. In that case Taylor's Theorem gives us
f(x) = c + (f″(ξ)/2)(x − c)² = c + O(h²),
where ξ is between c and x. This means that
|f(x) − c| = (|f″(ξ)|/2) |x − c|².
If we have a bound M such that |f″(ξ)|/2 ≤ M for all ξ between c and our initial estimate,
then we obtain quadratic convergence
|f(x) − c| ≤ M |x − c|².
In general, if the derivatives f′(c) = f″(c) = ··· = f^{(n−1)}(c) = 0 all vanish at the fixed
point, then the fixed point iteration has degree n convergence, with
|f(x) − c| ≤ M |x − c|^n,
where M is an upper bound on |f^{(n)}(ξ)|/n!.
20. Root Approximation
We have already seen how to use the Bisection Method to approximate the roots of a
function, solving f(x) = 0. The Bisection Method is an iterative method with linear convergence and a constant of c = 1/2. The strength of the Bisection Method is that it always
works, as it is guaranteed by the Intermediate Value Theorem. However, it is relatively slow,
providing only one binary digit of accuracy at each stage, and taking a little over 3 steps
for each decimal digit of accuracy. In this section we discuss other iterative methods of root
approximation with better convergence.
20.1. Newton’s Method. If we are working with a function that is differentiable, where
we can actually compute the derivative, we can use Newton’s Method to approximate its
roots. If we have an estimate xn for the root, we replace f (x) with its linear approximation
l(x) at the point (xn , f (xn )). By the point-slope form of the line, we have
l(x) = f′(x_n)(x − x_n) + f(x_n).
Solving for the root of l(x), we obtain our next estimate
0 = f′(x_n)(x_{n+1} − x_n) + f(x_n)  ⟹  x_{n+1} = x_n − f(x_n)/f′(x_n).
This gives us a fixed point iteration for the function
g(x) = x − f(x)/f′(x).
How fast does this converge? Using the Fixed Point Iteration Theorem, we compute g′(x)
and obtain
(20.1)  g′(x) = 1 − (f′(x)² − f(x)f″(x))/f′(x)² = f(x)f″(x)/f′(x)².
At the root we have f(x) = 0, so g′ vanishes there. This means that g(x) is a
contraction sufficiently close to the root, and furthermore we expect to see at least quadratic
convergence. Computing the second derivative g″(x), we obtain
(20.2)  g″(x) = (f′(x)³ f″(x) + f(x) f′(x)² f‴(x) − 2 f(x) f′(x) f″(x)²) / f′(x)⁴
              = (f′(x)² f″(x) + f(x) f′(x) f‴(x) − 2 f(x) f″(x)²) / f′(x)³,
and at the root, where f(x) = 0, we obtain g″(x) = f″(x)/f′(x). We therefore expect quadratic
convergence, giving us
|x_{n+1} − x| ≤ c |x_n − x|²,
with
c ≈ |f″(x) / 2f′(x)|.
Alternatively, we can analyze the iteration of Newton's Method directly via Taylor's Theorem. Suppose y is a root, so that f(y) = 0, and we have an approximation x. Expanding the Taylor
series at x, we obtain
0 = f(y) = f(x) + f′(x)(y − x) + (f″(ξ)/2)(y − x)²,
where ξ is between x and y. Dividing by f′(x), we obtain
0 = f(x)/f′(x) + y − x + (f″(ξ)/2f′(x))(y − x)²,
and rearranging the terms shows us that the next iteration of Newton's Method is
x − f(x)/f′(x) = y + (f″(ξ)/2f′(x))(y − x)²,
and the distance to the root changes according to
|x − f(x)/f′(x) − y| = (|f″(ξ)|/2|f′(x)|) |x − y|²,
giving us our expected quadratic convergence.
Example 20.1. Let's compute √2, which is a root of the function f(x) = x² − 2. By Newton's
Method, we iterate the function
g(x) = x − (x² − 2)/(2x) = x/2 + 1/x.
Starting at x_0 = 1, we obtain
1, 1.5, 1.4166667, 1.4142157, 1.4142137,
which gets us very quickly to √2 ≈ 1.4142136. Notice that we are approximately doubling
the number of digits of accuracy at each step of the iteration. This quadratic convergence
is much faster than the linear convergence of the bisection method, which takes around 10
steps just for every 3 digits of accuracy. For our error estimate, we have f″(x) = 2 and
f′(x) = 2x ≈ 3 near the root, so we obtain quadratic convergence with c ≈ 1/3 sufficiently
close to the root.
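The doubling of correct digits is easy to observe by running the iteration; a minimal sketch in Python (names ours):

```python
# Newton's Method for f(x) = x**2 - 2: iterate g(x) = x/2 + 1/x.
x = 1.0
history = [x]
for _ in range(5):
    x = x / 2 + 1 / x
    history.append(x)
# history runs 1, 1.5, 1.41666..., 1.4142157..., then machine-precision sqrt(2)
```

Comparing successive entries of `history` against √2 shows the error roughly squaring at each step.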
Exercise 20.2. Use Newton's Method to construct the function you need to iterate to compute
√n for any n ∈ N. Iterate this function to compute √5 to the maximum precision allowed
by your calculator or computer. Then compute the Golden Ratio
φ = (1 + √5)/2.
Exercise 20.3. Use Newton's Method to find x such that
cos(x/4) = sin(x/4).
Compute it to the maximum precision allowed by your calculator or computer.
Exercise 20.4. Suppose we are using Newton's Method to find a root r of a function f(x)
where f″(r) = 0. In this case, the function g(x) that we iterate satisfies not only the usual
g(r) = r and g′(r) = 0, but also g″(r) = 0. With the second derivative vanishing, our
discussion at the end of the last section shows that fixed point iteration in this case has at
least cubic convergence, with
|x_{n+1} − r| ≈ |g‴(r)/6| |x_n − r|³.
Compute the constant |g‴(r)/6|. (Hint: Differentiate (20.2) and remember that any terms
involving g′(x) or g″(x) will vanish when you plug in x = r at the end of the computation.)
Newton’s Method works very well to approximate roots provided that our initial estimate
is close enough to the root, or that f 00 (ξ)/f 0 (ξ) never gets too large. In particular, we can
run into problems if the function has a local minimum or maximum (or other critical point)
near our initial estimate, or indeed anywhere between our initial estimate and the root we
are trying to approximate. Also, Newton’s Method is not guaranteed to converge. Unlike
with the bisection method, we do not have the benefit of the Intermediate Value Theorem
telling us that our continuous function must have a root inside some interval.
Example 20.5. Let's use Newton's Method to approximate the root of f(x) = x². Iterating
g(x) = x − x²/(2x) = x − x/2 = x/2,
we see that it does indeed converge to the root 0. However, the convergence is not quadratic!
Instead we have linear convergence
|x_{n+1} − 0| = (1/2) |x_n − 0|.
This example illustrates that Newton's Method actually converges linearly, not quadratically, whenever we have f′(x) = 0 at the root x. Why does that happen? It is because the
quadratic convergence constant is |f″(x)/2f′(x)|, which blows up when f′(x) = 0. Viewed in
terms of finding the roots of successive tangent lines, the tangent line is horizontal where the
derivative vanishes, so the tangent lines at nearby points have roots farther away than we
expect. So Newton's Method works best when the function crosses the x-axis at an oblique
angle at the root. In the special case where the derivative vanishes at the root, we recall
from (20.1) that the fixed point iteration contraction constant is
g′(x) = f(x)f″(x)/f′(x)² = 0/0
at the root, where now both f(x) = 0 and f′(x) = 0. To resolve this indeterminate expression, we use L'Hôpital's rule (differentiating both numerator and denominator) twice,
giving
lim_{y→x} g′(y) = lim_{y→x} (f′(y)f″(y) + f(y)f‴(y)) / (2f′(y)f″(y))
= lim_{y→x} (f″(y)² + f′(y)f‴(y) + f′(y)f‴(y) + f(y)f⁽⁴⁾(y)) / (2f″(y)² + 2f′(y)f‴(y))
= 1/2,
so we find that we have linear convergence with constant 1/2, as we computed directly above.
There are variants of Newton's Method that are once again quadratically convergent in this
case, as well as for roots of multiplicity greater than 2.
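We can observe this degraded, exactly-halving convergence directly; a quick sketch in Python (names ours):

```python
# Newton's Method applied to f(x) = x**2 has iteration g(x) = x/2,
# so the error is only halved at each step: linear, not quadratic, convergence.
x = 1.0
errors = [abs(x)]
for _ in range(10):
    x = x - x**2 / (2 * x)   # the Newton step; simplifies to x/2
    errors.append(abs(x))
ratios = [errors[i + 1] / errors[i] for i in range(10)]
# every ratio is exactly 0.5: the hallmark of linear convergence with c = 1/2
```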
20.2. Secant Method. Another potential problem with Newton’s Method is that to use
it we must be able to compute the derivative f 0 (x). This is easy if the function f (x) is
given to us symbolically, since we can differentiate pretty much anything. However, if we
do not have a symbolic expression for our function, the best we can do is approximate the
derivative using the slopes of secant lines. This may be the case if the values of our function
are determined by performing an experiment or measurement.
We can perform the same basic construction as Newton’s Method, but rather than replacing our original function f (x) with a tangent line at a single point, we use a secant
line intersecting our function at two points. This gives rise to the Secant Method of root
approximation. Given estimates xn−1 and xn−2 , the secant line intersecting the graph of our
function f(x) at the points (x_{n−1}, f(x_{n−1})) and (x_{n−2}, f(x_{n−2})) is given by the point-point
form of the line:
l(x) = ((f(x_{n−1}) − f(x_{n−2})) / (x_{n−1} − x_{n−2})) (x − x_{n−1}) + f(x_{n−1}).
Solving for a root x_n of l(x) and performing some algebra, we obtain the iterative formula
x_n = −f(x_{n−1}) (x_{n−1} − x_{n−2}) / (f(x_{n−1}) − f(x_{n−2})) + x_{n−1}
    = (f(x_{n−1}) x_{n−2} − f(x_{n−2}) x_{n−1}) / (f(x_{n−1}) − f(x_{n−2})).
Note that this formula expresses the next iterate xn as a sort of weighted average of the prior
two iterates xn−1 and xn−2 . If f (xn−2 ) and f (xn−1 ) are both positive or both negative, then
this formula is an overweighted average that extrapolates xn , like we did with Richardson
extrapolation in Section 17.3. If however f (xn−1 ) and f (xn−2 ) have opposite signs, then xn is
interpolated between xn−2 and xn−1 . In this case, the Secant Method behaves like a smarter
version of the Bisection Method, choosing a new approximation not at the midpoint of the
interval, but rather taking into account the values of the function at the two endpoints.
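A direct implementation of this iteration, as a sketch in Python (the function name, stopping rule, and tolerances are ours, not prescribed by the notes):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant Method: iterate the weighted-average formula until the
    step size falls below tol."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                 # horizontal secant line: cannot proceed
            break
        x2 = (f1 * x0 - f0 * x1) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Example: the root of f(x) = x**2 - 2 between 1 and 2 is sqrt(2).
root = secant(lambda x: x * x - 2, 1.0, 2.0)
```

Unlike Newton's Method, no derivative of f is ever evaluated; only the two most recent function values are kept.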
Lemma 20.6. Let r be a root of the function f(x). The Secant Method has convergence
governed by the error estimate
|x_n − r| ≈ |f″(r)/2f′(r)| |x_{n−1} − r| |x_{n−2} − r|.
Proof. Computing the left hand side and then factoring out the error terms |x_{n−1} − r| and
|x_{n−2} − r|, we obtain
|x_n − r| = | (f(x_{n−1}) x_{n−2} − f(x_{n−2}) x_{n−1}) / (f(x_{n−1}) − f(x_{n−2})) − r |
= | (f(x_{n−1})(x_{n−2} − r) − f(x_{n−2})(x_{n−1} − r)) / (f(x_{n−1}) − f(x_{n−2})) |
= | ( f(x_{n−1})/(x_{n−1} − r) − f(x_{n−2})/(x_{n−2} − r) ) / (f(x_{n−1}) − f(x_{n−2})) | · |x_{n−1} − r| |x_{n−2} − r|.
Expanding the Taylor series of f(x) centered at r, we have
f(x) ≈ f′(r)(x − r) + (f″(r)/2)(x − r)²,
where the constant term f(r) = 0 vanishes because r is a root. Dividing by x − r gives
f(x)/(x − r) ≈ f′(r) + (f″(r)/2)(x − r).
Plugging this into our approximate expression for the error |x_n − r|, we obtain
|x_n − r| ≈ | ( f′(r) + (f″(r)/2)(x_{n−1} − r) − f′(r) − (f″(r)/2)(x_{n−2} − r) ) / (f(x_{n−1}) − f(x_{n−2})) | · |x_{n−1} − r| |x_{n−2} − r|
= | (f″(r)/2)(x_{n−1} − x_{n−2}) / (f(x_{n−1}) − f(x_{n−2})) | · |x_{n−1} − r| |x_{n−2} − r|
≈ |f″(r)/2f′(r)| |x_{n−1} − r| |x_{n−2} − r|,
where in the final step we observe that the slope of the secant line is approximately the
derivative f′(r).
Note that the convergence of the Secant Method is quite similar to that of Newton’s
Method. Both have the same constant |f 00 (r)/2f 0 (r)|. Also, both involve the product of
two errors on the right hand side. For Newton’s Method, that product is |xn−1 − r|2 , giving
quadratic convergence, while for the Secant Method it is |xn−1 − r| |xn−2 − r|, multiplying
the two previous errors. In fact, since the tangent line can be viewed as the limit of secant
lines as the two points converge to the same value, the convergence bound for Newton’s
Method can be viewed as the limit of the convergence bound for the Secant Method.
Although we stated the convergence of the Secant Method in terms of an approximate
error, we could indeed refine it to give a precise error bound in terms of f 0 (ξ) and f 00 (η)
for some values of ξ and η appropriately close to the estimates xn−1 , xn−2 and the root r.
We would then need to refine our proof, using Taylor’s Theorem to replace the approximate
second order Taylor expansion of f (x) with a precise equality, and converting the slope of
the secant line to a derivative via the Mean Value Theorem.
Although the above Lemma quite nicely describes the error of the Secant Method in terms
of the product of the two previous errors, it is customary to describe the convergence of each
term in terms of just one previous error. This gives us a surprising and interesting result.
Theorem 20.7. Let r be a root of the function f(x). The Secant Method has super-linear
convergence (i.e., convergence of order greater than 1), given by
|x_n − r| ≈ |f″(r)/2f′(r)|^(φ−1) |x_{n−1} − r|^φ,
where the order is the Golden Ratio φ ≈ 1.6.
Sketch of Proof. By the above Lemma, we have
|x_n − r| ≈ M |x_{n−1} − r| |x_{n−2} − r|,
where M = |f″(r)/2f′(r)|. Multiplying both sides by M, we obtain
M|x_n − r| ≈ (M|x_{n−1} − r|)(M|x_{n−2} − r|),
and taking the natural log of both sides gives us
ln(M|x_n − r|) ≈ ln(M|x_{n−1} − r|) + ln(M|x_{n−2} − r|).
Defining f_n = ln(M|x_n − r|), we obtain the Fibonacci relation
f_n ≈ f_{n−1} + f_{n−2}.
However, any sequence f_n satisfying the Fibonacci relation can be approximated by powers
of the Golden Ratio φ:
f_n ≈ C φ^n.
Indeed, we have seen in Exercise 19.4 that the ratio of successive Fibonacci numbers tends
to φ, and this is true regardless of the initial values of f0 and f1 (except in a special case
that does not apply here where the ratio tends instead to φ − 1). We then have
f_n ≈ φ f_{n−1},
or in terms of our error estimates,
ln(M|x_n − r|) ≈ φ ln(M|x_{n−1} − r|).
Exponentiating both sides gives
M|x_n − r| ≈ (M|x_{n−1} − r|)^φ,
from which we obtain our desired approximation.
Exercise 20.8. Use the Secant Method to approximate the Golden Ratio φ satisfying the
equation φ2 − φ − 1 = 0. Let 1 and 2 be your initial estimates. Compute the first few
estimates as fractions and try to discern the pattern. Then compute your answer numerically
to the limit of your calculator or your computer’s floating point precision. How fast does
this converge?
Exercise 20.9. Use the Secant Method to approximate π by calculating the root of the
equation sin x = 0 between 3 and 4. Compute your answer to the limit of your calculator or
your computer’s floating point precision. How fast does this converge?
Example 20.10. Let’s use the Secant Method to approximate the root of f (x) = x2 . In this
case, at our root 0 we have not only f (x) = 0, but also f 0 (0) = 0. In our error formulæ from
Lemma 20.6 and Theorem 20.7, the denominator in the constant vanishes, so the constant
becomes infinite. This means that we expect the convergence to be slower than usual, just
like how Newton’s Method reverted from quadratic to linear convergence for this function.
Given two estimates, x_{n−1} and x_{n−2}, our next estimate is given by the formula
x_n = (x_{n−1}² x_{n−2} − x_{n−2}² x_{n−1}) / (x_{n−1}² − x_{n−2}²).
Starting with initial estimates 2 and 1, we obtain
x_1 = 2 = 2/1,
x_2 = 1 = 2/2,
x_3 = (2 − 4)/(1 − 4) = 2/3,
x_4 = (4/9 − 2/3)/(4/9 − 1) = 2/5,
x_5 = (4/25 · 2/3 − 4/9 · 2/5)/(4/25 − 4/9) = 1/4 = 2/8,
and we notice that each term is of the form x_n = 2/f_n, where f_n is the nth Fibonacci number.
Recalling that the ratios of successive Fibonacci numbers satisfy
lim_{n→∞} f_{n−1}/f_n = 1/φ = φ − 1,
we find that
|x_n − 0| ≈ (φ − 1) |x_{n−1} − 0|,
so this sequence has linear convergence, with constant φ − 1 ≈ 0.618. Comparing this to the
linear convergence with constant 0.5 that we saw in Example 20.5 for this function with
Newton's Method, the Secant Method performs slightly worse than Newton's Method did,
which is what we have come to expect from the Secant Method.
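Carrying out this computation in exact rational arithmetic confirms the 2/f_n pattern; a short sketch in Python (setup ours):

```python
from fractions import Fraction

# Secant Method for f(x) = x**2 from exact starting values 2 and 1:
# each iterate turns out to be 2 divided by a Fibonacci number.
x0, x1 = Fraction(2), Fraction(1)
iterates = [x0, x1]
for _ in range(6):
    x2 = (x1**2 * x0 - x0**2 * x1) / (x1**2 - x0**2)
    iterates.append(x2)
    x0, x1 = x1, x2

fib = [1, 2, 3, 5, 8, 13, 21, 34]   # Fibonacci denominators f_n
```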
Exercise 20.11. Prove that x_n = 2/f_n in the previous example.
21. Numerical Integration
21.1. Riemann Sums. We recall the Riemann integral, which computes the area under a
curve. The simplest method of approximating the integral
∫ₐᵇ f(x) dx
is via Riemann sums, dividing the interval from a to b into n subintervals, and adding up
the areas of rectangles with heights given by the value of the function,
∫ₐᵇ f(x) dx = lim_{n→∞} Σ_{k=1}^{n} ((b − a)/n) f(x_k).
Here we take the limit as the number n of rectangles tends to infinity, which shrinks the
widths (b − a)/n of the rectangles to 0. The points x_k at which we evaluate f(x_k) can be
taken anywhere in each of the subintervals. Various conventions include letting x_k be the left
endpoint, the right endpoint, the midpoint, or the point where f (x) takes its maximum or
minimum on the interval (assuming such points exist, which is guaranteed by the Extreme
Value Theorem if f(x) is continuous). In the limit, it does not matter which x_k we take; all of
these methods converge to the same value, provided that the function f (x) is sufficiently well
behaved. (Actually, this is circular reasoning. We say that a function is integrable precisely
when the limit is the same regardless of the choice of the points x_k in each subinterval. But
integrability is a subtle issue which we will not explore in these notes.)
Let’s analyze the error in this method of integration by Riemann sums. We start with the
error in estimating the integral by a single rectangle. We know that the actual area under
the curve is bounded by
(b − a) min_{x∈[a,b]} f(x) ≤ ∫ₐᵇ f(x) dx ≤ (b − a) max_{x∈[a,b]} f(x).
If f (x) is continuous, then the Extreme Value Theorem tells us that f (x) achieves both its
minimum and maximum values. Also, the Intermediate Value Theorem tells us that there
must exist some c ∈ [a, b] where
(21.1)  ∫ₐᵇ f(x) dx = (b − a) f(c).
This f (c) is called the average value of f (x) on the interval [a, b], and (21.1) is known as the
integral form of the Mean Value Theorem.
So if we choose the right point c ∈ [a, b] to evaluate our function, our rectangular estimate
of the area matches the integral right on the nose. But we’re usually not so lucky. How far
off can we be? Recalling that the Mean Value Theorem says that
f(x) = f(c) + f′(ξ)(x − c),
for some ξ between c and x, we see that the difference between any two values of the function
f (x) on the interval [a, b] is at most
|f(c) − f(x)| ≤ M (b − a),
where |f′(ξ)| ≤ M is an upper bound on the derivative for all ξ ∈ [a, b]. Multiplying this
possible difference in height by the width b − a of the interval gives us an error estimate
M (b − a)2 for a single rectangle.
Next, suppose we have multiple rectangles, with the interval [a, b] divided into n equal
subintervals, each with size h = (b − a)/n. Let |f′(ξ)| < M be a bound on the derivative
over the entire interval [a, b]. Adding up the errors on each of the subintervals, our total
error for a Riemann sum becomes
Σ_{i=1}^{n} M h² = n M h² = M (b − a) h = O(h).
Here, we use the fact that nh = b − a, and we treat both b − a and M as constants. This tells
us that the error is proportional to the size of the intervals h, which makes sense since as we
increase the number of rectangles, we shrink the size of the intervals to 0, and the Riemann
sum converges to the integral.
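The O(h) behavior is easy to check numerically. A minimal sketch in Python (names and test integrand ours), using left-endpoint sums for ∫₀¹ x² dx = 1/3, where halving h should roughly halve the error:

```python
# Left-endpoint Riemann sums for f(x) = x**2 on [0, 1]; the exact integral is 1/3.
# Doubling n (halving h) should roughly halve the error, confirming O(h).

def left_riemann(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

errors = [abs(left_riemann(lambda x: x * x, 0.0, 1.0, n) - 1 / 3)
          for n in (100, 200, 400)]
```

Each successive error in the list is close to half the previous one.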
21.2. Trapezoid Method. So what convention for computing Riemann sums should we
use? Left endpoints? Right endpoints? Midpoints? All of them have about the same rate
of convergence, each with error O(h).
We chose rectangles for Riemann sums because they are the simplest objects to measure
the area of. However, we also have formulæ for the areas of other geometric shapes. A slightly
nicer technique often taught in Calculus classes is the Trapezoid Method for approximating
integrals. For each rectangle, instead of choosing a single height at one point, we use the
heights at both the left and the right endpoints of the interval. Connecting the corresponding
two points on the graph with a straight line, we form a trapezoid.
Working with a single trapezoid on the entire interval [a, b], we recall that the area of a
trapezoid is the width times the average of the two heights. This gives us the area
(b − a) (f(a) + f(b))/2 = ((b − a) f(a) + (b − a) f(b))/2,
which we note is actually the average of the area computed by the left endpoint method and the
area computed by the right endpoint method. Another way to look at what we are doing is
that we are replacing our original function f(x) with the secant line l(x) passing through the
graph at the points a and b and linearly interpolating the values of the function on the interior
of the interval. This is in contrast to Riemann sums, which replace the original function
with a constant function on each subinterval.
So how far off is our secant line l(x) from our original function f(x)? There is a formula
for the error in linear interpolation, which tells us that

f(x) − l(x) = (f''(ξ)/2)(x − a)(x − b)

for some ξ ∈ [a, b]. (This is analogous to the Taylor's Theorem error for the linear approximation of a function, which is of the form f''(ξ)(x − c)²/2, except that instead of having
a single center c, we have two endpoints a and b.) Bounding the second derivative term as
f''(ξ)/2 < M, and noting that |(x − a)(x − b)| ≤ (b − a)² for x ∈ [a, b], we obtain

|f(x) − l(x)| < M(b − a)²
(actually, if we do this carefully, we can improve this error estimate by a factor of 4), and
integrating this constant error bound over the whole interval [a, b], we find that the error in
a single trapezoid is bounded by M(b − a)³. If we now take n equally spaced subintervals of
size (b − a)/n, the error in measuring the areas of the corresponding n trapezoids is then

nMh³ = M(b − a)h² = O(h²).
This is a dramatic improvement over standard Riemann sums, which had an O(h) error.
This is all the more surprising given that the Trapezoid Method is just the average of the
Left Endpoint Method and the Right Endpoint Method! So, with nearly no extra work, we
have significantly increased our accuracy. We have seen this sort of thing before. Recall from
Section 17.1 that the standard definition of the derivative is O(h), while the central difference
formula, which averages the forward-looking and backward-looking derivative estimates, is
actually O(h²). Among integration methods, the Trapezoid Method plays the same role: it is
symmetric with respect to the two endpoints of each subinterval, just as the central difference
formula is symmetric about the point where the derivative is taken.
In terms of the formulæ, writing the size of the intervals as h = (b − a)/n, we have the
Left Endpoint Method

L(h) = h ∑_{k=0}^{n−1} f(a + kh) = ∫_a^b f(x) dx + O(h),

the Right Endpoint Method

R(h) = h ∑_{k=1}^{n} f(a + kh) = ∫_a^b f(x) dx + O(h),

and their average, the Trapezoid Method

T(h) = h ( f(a)/2 + ∑_{k=1}^{n−1} f(a + kh) + f(b)/2 ) = ∫_a^b f(x) dx + O(h²).
We observe that the right endpoint of a trapezoid is the same as the left endpoint of the
next trapezoid. Each of the internal points is counted twice, while the two endpoints are
counted only once. In fact T (h) = T (−h) is an even function of h, with only even powers of
h appearing in its power series expansion.
The more trapezoids we take, the better an estimate we get. We note that if we double
the number of trapezoids, we can use the estimate we already have, and add the new points:
(21.2)    T(h/2) = (1/2) T(h) + (h/2) ∑_{k=0}^{n/2−1} f(a + h/2 + kh).
Another way to think of this is that we average the trapezoid rule approximation T (h) with
a Riemann sum evaluating the function at the midpoint of each interval.
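As a sketch of how one might implement this in Python (the function names `trapezoid` and `refine` are ours), the doubling formula (21.2) lets us halve the step size while reusing all the function values we have already computed:

```python
def trapezoid(f, a, b, n):
    """Trapezoid rule T(h) with n subintervals of size h = (b - a)/n."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + k * h) for k in range(1, n)) + f(b) / 2)

def refine(f, a, b, n, t_h):
    """Formula (21.2): given t_h = T(h) computed with n subintervals,
    return T(h/2) by adding in the function values at the new midpoints."""
    h = (b - a) / n
    midpoints = sum(f(a + h / 2 + k * h) for k in range(n))
    return t_h / 2 + (h / 2) * midpoints
```

Each call to `refine` evaluates f only at the n new midpoints, rather than at all 2n sample points.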
21.3. Advanced Quadrature. To obtain an even better estimate, we use Simpson’s rule,
which approximates the area under the curve by measuring the areas under parabolas. It
turns out that instead of computing directly with quadratic approximations, we can obtain
Simpson’s rule using Romberg extrapolation to cancel the O(h2 ) term in the error for T (h),
leaving us with an O(h4 ) error. We have
T(h) = ∫_a^b f(x) dx + Ch² + O(h⁴),

T(h/2) = ∫_a^b f(x) dx + Ch²/4 + O(h⁴).
From this, we compute

4T(h/2) − T(h) = 3 ∫_a^b f(x) dx + O(h⁴),
giving us the Simpson’s rule estimate
(21.3)
4T (h/2) − T (h)
,
3
S(h) =
which satisfies
Z
S(h) =
a
b
f (x) dx + O(h4 ).
Taking the Romberg extrapolation a step further, we have

S(h) = ∫_a^b f(x) dx + Dh⁴ + O(h⁶),

S(h/2) = ∫_a^b f(x) dx + Dh⁴/16 + O(h⁶),

which gives us

(21.4)    R(h) = (16S(h/2) − S(h)) / 15 = ∫_a^b f(x) dx + O(h⁶).

Continuing, we obtain

(64R(h/2) − R(h)) / 63 = ∫_a^b f(x) dx + O(h⁸).
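This extrapolation ladder is short to write in code. Here is a Python sketch under the same assumptions as the text, with T, S, and R computed directly from formulas (21.3) and (21.4); the function names are ours:

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoid rule with n subintervals."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + k * h) for k in range(1, n)) + f(b) / 2)

def simpson(f, a, b, n):
    """S(h) = (4 T(h/2) - T(h)) / 3, cancelling the C h^2 error term."""
    return (4 * trapezoid(f, a, b, 2 * n) - trapezoid(f, a, b, n)) / 3

def romberg(f, a, b, n):
    """R(h) = (16 S(h/2) - S(h)) / 15, cancelling the D h^4 error term."""
    return (16 * simpson(f, a, b, 2 * n) - simpson(f, a, b, n)) / 15
```

Each level of extrapolation reuses the lower-level estimates, so the only new cost is the extra function evaluations in the finer trapezoid sums.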
Exercise 21.1. Estimate π, the area of the unit circle, by approximating
∫_{−1}^{1} 2√(1 − x²) dx.
Compute the trapezoid rule estimates T (2), T (1), and T (1/2) with 1, 2, and 4 trapezoids,
respectively. Each can be computed from the previous by adding in the new sample points,
as shown in (21.2). Next, compute the Simpson’s rule estimates S(2) and S(1) using the
extrapolation formula (21.3). Finally, compute the O(h6 ) Romberg extrapolation estimate
R(2) using (21.4). (Note. This actually doesn’t give a very good estimate for π.)
22. Numerical ODEs
Our final topic is numerical methods for solving first order ordinary differential equations
of the form
y'(t) = f(t, y).
This is a small subset of differential equations in general, but it includes many cases that
arise in the real world. In these cases, the rate of change of the quantity y is determined by
a function f which depends on both time t and the quantity y itself. Included in this class
are differential equations where y' depends only on y, such as

y'(t) = ky,
which describes simple population models or radioactive decay. We also note that if y'
depends only on t, the equation

y'(t) = f(t)

is just an integration problem with solution

y(t) = y_0 + ∫_{t_0}^{t} f(t) dt,
which we can solve using the quadrature techniques of the previous section.
In all of our differential equation solving methods, we start at an initial time t0 with
an initial value y0 = y(t0 ). Given these initial conditions, we want to find the value of
y(t_f) at some final time t_f. It makes sense that we should be able to compute, or at least
approximate, y(t_f), since we are given an initial value for y, and the derivative y' tells us
how y changes with respect to time t. To do this, we inch forward from t_0 to t_f, dividing the
range of times into n small intervals, each of width h = (t_f − t_0)/n.
22.1. Euler’s Method. The simplest method of solving first order differential equations of
the form y 0 (t) = f (t, y) is Euler’s method. Since we know the initial value y0 , and since the
derivative at t0 is y 0 (t0 ) = f (t0 , y0 ), we can determine the next value of y using the linear
approximation
(22.1)
y1 = y(t0 + h) = y(t0 ) + hy 0 (t0 ) + O(h2 )
= y0 + hf (t0 , y0 ) + O(h2 ).
Repeating this process, we iterate the computations

t_{k+1} = t_k + h,    y_{k+1} = y(t_{k+1}) = y_k + h f(t_k, y_k) + O(h²)

until we obtain y_n = y(t_f). Note that while we have an O(h²) error at each step, there are n
steps total. Since nh = t_f − t_0 is a constant, the overall error for all n steps is actually O(h).
Doing this by hand, you would typically construct a table with the three columns

t_k    y_k    f(t_k, y_k)

As we fill in the rows of the table, we add h to each t_k, compute y_k using the previous row's
y_k and f(t_k, y_k) values, and then plug them both into f(t_k, y_k). Easy peasy.
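The table-filling procedure above is exactly a loop. Here is a minimal Python sketch (the function name `euler` is ours):

```python
def euler(f, t0, y0, tf, n):
    """Euler's method: n steps of size h = (tf - t0)/n, iterating
    t_{k+1} = t_k + h and y_{k+1} = y_k + h * f(t_k, y_k)."""
    h = (tf - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)  # linear approximation using the slope at t_k
        t = t + h
    return y
```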
Example 22.1. Suppose we want to approximate the solution to the differential equation
y' = f(t). Given starting time t_0 = a and initial condition y(a) = 0, the Fundamental
Theorem of Calculus tells us that the solution for time t_f = b is the definite integral

y(b) = ∫_a^b f(t) dt.

Using Euler's method, we compute

y_1 = y_0 + h f(t_0) = h f(a),
y_2 = y_1 + h f(t_1) = h (f(a) + f(a + h)),
⋮
y(b) ≈ y_n = h (f(a) + f(a + h) + ··· + f(a + (n − 1)h)).
But this is precisely what we get if we compute the integral using the left endpoint method,
which we know gives an O(h) approximation.
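We can check this correspondence numerically. In the Python sketch below (names ours), Euler's method applied to y' = f(t) with y(a) = 0 agrees with the left endpoint sum on the same subdivision:

```python
import math

def euler_quadrature(f, a, b, n):
    """Euler's method for y' = f(t), y(a) = 0; the result approximates
    the integral of f over [a, b]."""
    h = (b - a) / n
    t, y = a, 0.0
    for _ in range(n):
        y = y + h * f(t)
        t = t + h
    return y

def riemann_left(f, a, b, n):
    """Left endpoint Riemann sum with the same subdivision."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

# The two computations add up the same terms.
diff = abs(euler_quadrature(math.sin, 0.0, 1.0, 25) - riemann_left(math.sin, 0.0, 1.0, 25))
```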
In light of this example, we can interpret Euler’s method as a generalization of Riemann
sums for differential equations y 0 = f (t, y), where f now depends not only on t but also
y. Historically, though, Euler’s method predated Riemann sums, and for our purposes
Riemann sums are better viewed as a special case of Euler’s method. To put this in the
proper historical context, note that Euler’s method involves derivatives, while Riemann
sums involve integrals, viewed as the area under a curve. It took the Fundamental Theorem
of Calculus for mathematicians to realize that these two computations are related to
one another.
22.2. Taylor Series Method. When performing Euler's method, we compute y(t + h) using
a linear approximation with O(h²) error. If we want a smaller error, or equivalently a higher
order error, we can replace the linear approximation (22.1) with a higher order Taylor series
approximation. Using a second order Taylor series, we have

(22.2)    y(t + h) = y(t) + h y'(t) + (h²/2) y''(t) + O(h³)
                   = y(t) + h f(t, y) + (h²/2) (d/dt) f(t, y) + O(h³).

In order to do this, we must be able to differentiate the function f(t, y). This is harder than
it looks, since y is itself a function of t. We must therefore use the multivariable chain rule
f'(t, y) = (d/dt) f(t, y) = (∂/∂t) f(t, y) + (∂/∂y) f(t, y) · (dy/dt)
         = (∂/∂t) f(t, y) + f(t, y) (∂/∂y) f(t, y),

where in the final step we recall that our original differential equation gives us y'(t) = f(t, y).
Since the error at each step is O(h³), and recalling that nh = t_f − t_0 is constant, the total
error for all n steps of this Taylor series method of order 2 is therefore O(h²).
In practice, to apply a Taylor series method, we first compute f'(t, y) symbolically,
then we iterate

t_{k+1} = t_k + h,    y_{k+1} = y_k + h f(t_k, y_k) + (h²/2) f'(t_k, y_k).
Building a table by hand, we have columns

t_k    y_k    f(t_k, y_k)    f'(t_k, y_k)

and we fill in each row by adding h to the previous t_k, computing y_k from the values in
the previous row, and then plugging t_k and y_k into the f(t, y) and f'(t, y) functions.
Although the Taylor series method improves on the O(h) error we had with Euler’s method,
it is a more complicated process. We must compute a symbolic multivariable derivative at
the outset, and each step of the Taylor series method takes longer than the corresponding
steps of Euler’s method.
Example 22.2. Consider the differential equation y' = ty with initial condition t_0 = 0 and
y_0 = 1. Let's approximate y(3) with three intervals, so we set the size of each interval to
h = 1. Trying it first with Euler's method, we build a table

t    y    f(t, y) = ty
0    1    0
1    1    1
2    2    4
3    6    18

so we get y(3) ≈ 6.
Next, let’s try the Taylor series method of order 2. We first compute the derivative
f 0 (t, y) =
d
ty = y + ty 0 = y + t2 y.
dt
Now we can build our table

t    y        f(t, y) = ty    f'(t, y) = y + t²y
0    1        0               1
1    1.5      1.5             3
2    4.5      9               22.5
3    24.75

so we get y(3) ≈ 24.75. (Notice that we did not need to compute the f(t, y) or f'(t, y) values
for the last row of the table.) That's rather higher than the approximation we got with
Euler's method.
Solving this same differential equation using separation of variables, we have

∫ dy/y = ∫ t dt  =⇒  ln |y| = t²/2 + C  =⇒  y = C e^{t²/2},

and with our initial condition y(0) = 1, we have C = 1, so y(t) = e^{t²/2}. Plugging in t = 3
gives us y(3) = e^{4.5} ≈ 90. So neither of our estimates is very good, but the Taylor series
method is much better. Why did we do so poorly? Because we took h = 1, which is relatively
large. This was an example to show how to use these two methods, rather than to give an
accurate computation.
With the Taylor series method, we are not limited to Taylor series of order 2. We can use
a Taylor series of any degree d, and the corresponding method would approximate the solution
of the differential equation to O(h^d). For instance, if we wanted to use the Taylor series
method of degree 4, we would iterate

y_{k+1} = y_k + h f(t_k, y_k) + (h²/2) f'(t_k, y_k) + (h³/6) f''(t_k, y_k) + (h⁴/24) f'''(t_k, y_k),

giving us an approximation with O(h⁴) error. Doing this by hand involves computing the
higher total derivatives of f(t, y), and building a table with columns

t_k    y_k    f(t_k, y_k)    f'(t_k, y_k)    f''(t_k, y_k)    f'''(t_k, y_k)
which is tedious, but more accurate.
22.3. Runge-Kutta Methods. In practice, rather than taking higher derivatives, people
often prefer to use approximation methods that require evaluating the function f (t, y) at multiple points and taking the weighted average. These are called Runge-Kutta methods. This
gives significantly greater accuracy, with relatively little additional work, and the weighted
averages we take are analogous to those that we used when integrating via the Trapezoid
method, Simpson’s method, and Romberg extrapolation.
The simplest is the Runge-Kutta method of order 2. To derive it, we start with the Taylor
series method of order 2, but instead of computing f'(t, y) symbolically, we approximate the
derivative as

f'(t, y) = ( f(t + h, y(t + h)) − f(t, y) ) / h + O(h).

Since we don't have an exact value for y(t + h), we approximate that via Euler's method as

y(t + h) = y(t) + h f(t, y).

Plugging these expressions into formula (22.2) for the Taylor series method of order 2,

y(t + h) = y(t) + h f(t, y) + (h²/2) · ( f(t + h, y(t) + h f(t, y)) − f(t, y) ) / h + O(h³)
         = y(t) + (h/2) f(t, y) + (h/2) f(t + h, y + h f(t, y)) + O(h³).

Notice that we still have the same O(h³) error that we had with a single iteration of the
Taylor series method of order 2. Even though we are replacing the derivative f'(t, y) with
an O(h) approximation, that approximation is multiplied by h²/2, and so gives an error that
is still O(h³).
In practice, we build a table with columns
t_k    y_k    f(t_k, y_k)    e_k    f(t_k + h, e_k)

where

e_k = y_k + h f(t_k, y_k)

is the next Euler method estimate, and

y_{k+1} = y_k + (h/2) ( f(t_k, y_k) + f(t_{k+1}, e_k) ).

What is really going on here? We are performing the standard Euler's method to get a new
value of y, but then we refine our estimate. Instead of computing the new value of y based
on the old value and the derivative at time t, we look ahead and average the derivatives at
both the left endpoint and right endpoint of each interval. This smarter way of estimating
the derivative over the whole interval is reminiscent of the central difference formula (17.1)
for the derivative.
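In code, the table collapses to a short loop. Here is a Python sketch (the function name `rk2` is ours):

```python
def rk2(f, t0, y0, tf, n):
    """Runge-Kutta method of order 2: take a trial Euler step
    e_k = y_k + h f(t_k, y_k), then average the slopes at both endpoints:
    y_{k+1} = y_k + (h/2) (f(t_k, y_k) + f(t_{k+1}, e_k))."""
    h = (tf - t0) / n
    t, y = t0, y0
    for _ in range(n):
        s = f(t, y)    # slope at the left endpoint
        e = y + h * s  # Euler estimate for y(t + h)
        y = y + (h / 2) * (s + f(t + h, e))
        t = t + h
    return y
```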
Example 22.3. What does the Runge-Kutta method of order 2 give us if we apply it to an
integration problem y' = f(t)? Letting t_0 = a and y_0 = 0, we compute

y_1 = (h/2) ( f(a) + f(a + h) ),
y_2 = y_1 + (h/2) ( f(a + h) + f(a + 2h) ) = h ( (1/2) f(a) + f(a + h) + (1/2) f(a + 2h) ),
⋮
y_n = h ( (1/2) f(a) + f(a + h) + f(a + 2h) + ··· + f(a + (n − 1)h) + (1/2) f(b) ),
which is precisely the O(h²) estimate of the integral via the Trapezoid rule. Note that we
did not need to compute the Euler's method estimates e_k at all here. Those appear in the
expression f(t_{k+1}, e_k), but this f depends only on t and not on the y parameter.
Example 22.4. Let's return to the differential equation y' = ty with y(0) = 1. To compute
y(3) in three steps with h = 1, our Runge-Kutta method of degree 2 table becomes:

t    y        f(t, y) = ty    e = y + ty    f(t + h, e) = (t + 1)(y + ty)
0    1        0               1             1
1    1.5      1.5             3             6
2    5.25     10.5            15.75         47.25
3    34.125

giving us y(3) ≈ 34.125, which is comparable to the 24.75 we obtained using the Taylor series
method of degree 2. That makes sense, as both methods have O(h²) error.
The most commonly used Runge-Kutta method is in degree 4, which is based on Simpson's
rule for integration. At each stage it iterates:

(22.3)    t_{k+1} = t_k + h,    y_{k+1} = y_k + (h/6)(s1 + 2s2 + 2s3 + s4),

where we take a weighted average of the derivative function f(t, y) evaluated at four points:

s1 = f(t_k, y_k),
s2 = f(t_k + h/2, y_k + (h/2) s1),
s3 = f(t_k + h/2, y_k + (h/2) s2),
s4 = f(t_{k+1}, y_k + h s3).

Doing this by hand requires columns for

t_k    y_k    s1    t_k + h/2    y_k + (h/2) s1    s2    y_k + (h/2) s2    s3    y_k + h s3    s4

which seems like a lot of work for each step, but it's worth it because the error is O(h⁴).
People tend not to use Runge-Kutta methods of order higher than 4, though, since the
amount of work it takes to perform each step grows so quickly that it outweighs the benefit
of the smaller error estimate.
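A Python sketch of the classical degree 4 method (the function name `rk4` is ours), implementing formula (22.3) directly:

```python
import math

def rk4(f, t0, y0, tf, n):
    """Classical Runge-Kutta method of degree 4, formula (22.3):
    y_{k+1} = y_k + (h/6)(s1 + 2 s2 + 2 s3 + s4)."""
    h = (tf - t0) / n
    t, y = t0, y0
    for _ in range(n):
        s1 = f(t, y)
        s2 = f(t + h / 2, y + (h / 2) * s1)
        s3 = f(t + h / 2, y + (h / 2) * s2)
        s4 = f(t + h, y + h * s3)
        y = y + (h / 6) * (s1 + 2 * s2 + 2 * s3 + s4)
        t = t + h
    return y

# Sanity check: y' = y with y(0) = 1 gives y(1) = e.
approx_e = rk4(lambda t, y: y, 0.0, 1.0, 1.0, 10)
```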
Notice that if we are considering an integration problem y' = f(t) where the function f
does not depend on y, then s2 = s3, and the formula (22.3) becomes

y_{k+1} = y_k + h ( (1/6) f(t_k) + (2/3) f(t_k + h/2) + (1/6) f(t_k + h) ),

which is precisely the weighted average we used with Simpson's method, which computes the
area under a parabola. We are not going to derive the Runge-Kutta method of degree 4 here.
To do so would require starting with a Taylor series method of order 3 and approximating
both the f 0 (t, y) term (as we did with the Runge-Kutta method of degree 2) as well as the
f 00 (t, y) term via numerical differentiation. Although that would appear to give us a method
with error O(h3 ), we actually get a better error of O(h4 ). Just like with Simpson’s method,
the error series for the Runge-Kutta method of degree 4 has only even powers of h.
Exercise 22.5. Use the methods described above for the differential equation y 0 = y and
initial value y(0) = 1 to approximate the value of y(1) = e. Use
(1) 4 iterations of Euler’s method,
(2) 2 iterations of the Taylor series method of order 2,
(3) 1 iteration of the Taylor series method of order 4,
(4) 2 iterations of the Runge-Kutta method of order 2, and
(5) 1 iteration of the Runge-Kutta method of order 4.
Exercise 22.6. Compute e to 7 digits after the decimal point by approximating the root of
the equation ln x = 1. You may use any method of root approximation that you want. Show
all of your work.
Exercise 22.7. Approximate ln 2 to 3 digits after the decimal point by computing the integral
ln 2 = ∫_1^2 (1/x) dx
using any method of numerical integration you want. Show all of your work.
Bard College, P.O. Box 5000, Annandale-on-Hudson, NY 12504-5000
E-mail address: gregland@bard.edu
URL: http://math.bard.edu/greg/