MATH 409: Advanced Calculus I
Paul Skoufranis
April 29, 2016

Preface:
This is the first edition of these lecture notes for MATH 409.
Consequently, there may be several typographical errors. Furthermore, these
notes will not contain much additional material outside the topics covered in
class. However, due to time constraints, some subsections may be skipped in
class. We leave those subsections as part of these notes for the curious
student, but students will not be responsible for those sections.
Contents

1 Axioms of Number Systems
  1.1 Set Notation
  1.2 The Natural Numbers
    1.2.1 Peano's Axioms
    1.2.2 The Principle of Mathematical Induction
    1.2.3 The Well-Ordering Principle
  1.3 The Real Numbers
    1.3.1 Fields
    1.3.2 Partially Ordered Sets
    1.3.3 The Triangle Inequality
    1.3.4 The Least Upper Bound Property
    1.3.5 Constructing the Real Numbers

2 Sequences of Real Numbers
  2.1 The Limit of a Sequence
    2.1.1 Definition of a Limit
    2.1.2 Uniqueness of the Limit
  2.2 The Monotone Convergence Theorem
  2.3 Limit Theorems
    2.3.1 Limit Arithmetic
    2.3.2 Diverging to Infinity
    2.3.3 The Squeeze Theorem
    2.3.4 Limit Supremum and Limit Infimum
  2.4 The Bolzano–Weierstrass Theorem
    2.4.1 Subsequences
    2.4.2 The Peak Point Lemma
    2.4.3 The Bolzano–Weierstrass Theorem

3 An Introduction to Topology
  3.1 Completeness of the Real Numbers
    3.1.1 Cauchy Sequences
    3.1.2 Convergence of Cauchy Sequences
  3.2 Topology of the Real Numbers
    3.2.1 Open Sets
    3.2.2 Closed Sets
  3.3 Compactness
    3.3.1 Definition of Compactness
    3.3.2 The Heine–Borel Theorem
    3.3.3 Sequential Compactness
    3.3.4 The Finite Intersection Property

4 Cardinality of Sets
  4.1 Functions
    4.1.1 The Axiom of Choice
    4.1.2 Bijections
  4.2 Cardinality
    4.2.1 Definition of Cardinality
    4.2.2 The Cantor–Schröder–Bernstein Theorem
    4.2.3 Countable and Uncountable Sets
    4.2.4 Zorn's Lemma
    4.2.5 Comparability of Cardinals

5 Continuity
  5.1 Limits of Functions
    5.1.1 Definition of a Limit
    5.1.2 Limit Theorems for Functions
    5.1.3 One-Sided Limits
    5.1.4 Limits at and to Infinity
  5.2 Continuity of Functions
  5.3 The Intermediate Value Theorem
  5.4 Uniform Continuity

6 Differentiation
  6.1 The Derivative
    6.1.1 Definition of a Derivative
    6.1.2 Rules of Differentiation
  6.2 Inverse Functions
    6.2.1 Monotone Functions
    6.2.2 Inverse Function Theorem
  6.3 Extreme Values of Functions
  6.4 The Mean Value Theorem
    6.4.1 Proof of the Mean Value Theorem
    6.4.2 Anti-Derivatives
    6.4.3 Monotone Functions and Derivatives
    6.4.4 L'Hôpital's Rule
    6.4.5 Taylor's Theorem

7 Integration
  7.1 The Riemann Integral
    7.1.1 Riemann Sums
    7.1.2 Definition of the Riemann Integral
    7.1.3 Some Integrable Functions
    7.1.4 Properties of the Riemann Integral
  7.2 The Fundamental Theorems of Calculus
Chapter 1
Axioms of Number Systems
To discuss advanced calculus, we must return to many of the basic structures
that are taken for granted in previous courses. In particular, what exactly
are the natural numbers and the real numbers, and what properties do these
number systems have that we may use?
1.1 Set Notation
All mathematics must contain some notation in order for one to adequately
describe the objects of study. As such, we begin by developing the notation
for one of the ‘simplest’ constructs in mathematics.
Heuristic Definition. A set is a collection of distinct objects.
Our first task is to develop notation to adequately describe sets and
symbols to represent sets that will be common in this course. The following
table lists several sets, the symbol used to represent each set, and set
notation describing the set.
Set                Symbol   Set Notation
natural numbers    N        {1, 2, 3, 4, ...}
integers           Z        {0, 1, −1, 2, −2, 3, −3, ...}
real numbers       R        {real numbers}
rational numbers   Q        {a/b | a, b ∈ Z, b ≠ 0}
Notice two different types of notation are used in the above table to describe
sets: namely {objects} and {objects | conditions on the objects}. Furthermore, the symbol ∅ will denote the empty set; that is, the set with no
elements.
Given a set X and an object x, we need notation to describe when x
belongs to X. In particular, we say that x is an element of X, denoted
x ∈ X, when x is one of the objects that make up X. Furthermore,
we will use x ∉ X when x is not an element of X. For example, √2 ∈ R yet √2 ∉ Q
and 0 ∈ Z but 0 ∉ N. Furthermore, given two sets X and Y, we say that Y
is a subset of X, denoted Y ⊆ X, if each element of Y is an element of X;
that is, if a ∈ Y then a ∈ X. For example, N ⊆ Z ⊆ Q ⊆ R. Furthermore,
note if X ⊆ Y and Y ⊆ X, then X = Y .
Given two sets X and Y , there are various operations one can perform
on these two sets. Three such operations are as follows:
• The union of X and Y , denoted X ∪ Y , is the set
X ∪ Y = {a | a ∈ X or a ∈ Y };
that is, the union of X and Y consists of joining the two sets into one.
• The intersection of X and Y , denoted X ∩ Y , is the set
X ∩ Y = {a | a ∈ X and a ∈ Y };
that is, the intersection of X and Y is the set of elements contained in
both X and Y .
• The set difference of X and Y , denoted X \ Y , is the set
X \ Y = {a | a ∈ X and a ∉ Y};
that is, the set of all elements of X that are not elements of Y .
For example, if X = {1, 2, 3} and Y = {2, 4, 6}, then
X ∪ Y = {1, 2, 3, 4, 6},
X ∩ Y = {2},
and
X \ Y = {1, 3}.
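These three operations correspond directly to the operators on Python's built-in set type; a quick illustration (not part of the notes) using the sets from the example above:

```python
# The sets from the example above.
X = {1, 2, 3}
Y = {2, 4, 6}

union = X | Y         # {a | a in X or a in Y}
intersection = X & Y  # {a | a in X and a in Y}
difference = X - Y    # {a | a in X and a not in Y}

assert union == {1, 2, 3, 4, 6}
assert intersection == {2}
assert difference == {1, 3}
```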
In this course, we will often have a set X (usually R) and will be considering
subsets of X. Consequently, given a subset Y of X, the set difference
X \ Y will be called the complement of Y (in X) and will be denoted Y c for
convenience.
Sets will play an important role in this course. However, one important
question that has not been addressed is, “What exactly is a set?” This
question must be asked as we have not provided a rigorous definition of a
set. This leads to some interesting questions, such as, “Does the collection
of all sets form a set?”
Let us suppose that there is a set of all sets; that is
Z = {X | X is a set}
makes sense. Note Z has the interesting property that Z ∈ Z. Furthermore,
if Z exists, then
Y = {X | X is a set and X ∉ X}
would be a valid subset of Z. However, we clearly have two disjoint cases:
either Y ∈ Y or Y ∉ Y (that is, either Y is an element of Y or Y is not an
element of Y).
If Y ∈ Y, then the definition of Y implies Y ∉ Y, which is a contradiction
since we cannot have both Y ∈ Y and Y ∉ Y. Thus Y ∈ Y is false, and
it must be the case that Y ∉ Y.
However, Y ∉ Y implies by the definition of Y that Y ∈ Y. Again this is
a contradiction since we cannot have both Y ∉ Y and Y ∈ Y. This argument
is known as Russell’s Paradox and demonstrates that there cannot be a set
of all sets.
The above paradox illustrates the necessity of a rigorous definition of a
set. However, said definition takes us beyond the scope of this class. Instead
we will focus on two questions not yet addressed: “What are the natural numbers?”
and “How do we define the natural numbers?”
1.2 The Natural Numbers
As seen through Russell’s Paradox, rigorous definitions are required to prevent
misconceptions with the objects we desire to study. As such, we need to
discuss what exactly the natural numbers are.
1.2.1 Peano's Axioms
The following, known as Peano’s Axioms, completely characterize the natural
numbers.
Definition 1.2.1. The natural numbers, denoted N, are the unique number
system satisfying the following five axioms:
1. There is a number, denoted 1, such that 1 ∈ N.
2. For each number n ∈ N, there is a number S(n) ∈ N called the successor
of n (i.e. S(n) = n + 1).
3. The number 1 is not the successor of any number in N.
4. If m, n ∈ N and S(n) = S(m), then n = m.
5. (Induction Axiom) If X ⊆ N has the properties
(a) 1 ∈ X, and
(b) if k ∈ N and k ∈ X, then S(k) ∈ X,
then X = N.
Each of the above five axioms are necessary. The following examples
demonstrate the necessity of the third, fourth, and fifth axioms.
Example 1.2.2. Consider the set X = {1, 2} where we define S(1) = 2
and S(2) = 1. One may verify that X satisfies all but the third of Peano’s
Axioms.
Example 1.2.3. Consider the set X = {1, 2} where we define S(1) = 2 and
S(2) = 2. One may verify that X satisfies all but the fourth of Peano’s
Axioms.
Example 1.2.4. Consider the set N² = {(n, m) | n, m ∈ N} where we
define 1 = (1, 1) and S(n, m) = (n + 1, m + 1). One may verify that N²
satisfies all but the fifth of Peano’s Axioms since X = {(n, n) | n ∈ N} has
properties (a) and (b) but is not all of N².
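The checks in Examples 1.2.2 and 1.2.3 can be carried out mechanically for any finite successor table; a minimal sketch (the encoding of S as a dictionary and the function names are our choices, not the notes'):

```python
def violates_axiom_3(S, one=1):
    """Axiom 3 fails exactly when 1 is the successor of some element."""
    return one in S.values()

def violates_axiom_4(S):
    """Axiom 4 fails exactly when two distinct elements share a successor."""
    return len(set(S.values())) < len(S)

# Example 1.2.2: S(1) = 2, S(2) = 1 -- only the third axiom fails.
S1 = {1: 2, 2: 1}
assert violates_axiom_3(S1) and not violates_axiom_4(S1)

# Example 1.2.3: S(1) = 2, S(2) = 2 -- only the fourth axiom fails.
S2 = {1: 2, 2: 2}
assert violates_axiom_4(S2) and not violates_axiom_3(S2)
```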
The axioms of the natural numbers provide some nice properties. The
next subsection will focus on applications of the fifth axiom. For now, we note
that the other axioms give us a nice ‘ordering’ on N, which is consistent with
the ordering one expects. In particular, for n, m ∈ N, we define n < m if m
can be obtained by taking (possibly multiple) successors of n. Furthermore,
we define n ≤ m if n < m or n = m. The notion of ordering will play an
essential role in the construction of the real numbers (see Subsection 1.3.2).
1.2.2 The Principle of Mathematical Induction
The Induction Axiom of the natural numbers leads to the following principle.
Theorem 1.2.5 (The Principle of Mathematical Induction). For each
k ∈ N, let Pk be a mathematical statement that is either true or false. Suppose
1. (base case) P1 is true, and
2. (inductive step) if k ∈ N and Pk is true, then Pk+1 is true.
Then Pn is true for all n ∈ N.
Proof. Let
X = {n ∈ N | Pn is true}.
By assumption we see that 1 ∈ X as P1 is true.
Assume that k ∈ X. By the definition of X, we know Pk is true. By
the assumptions in the statement of the theorem, Pk+1 is true and hence
k + 1 ∈ X by the definition of X. Hence the Induction Axiom in Definition
1.2.1 implies X = N. Hence Pn is true for all n.
The Principle of Mathematical Induction is an essential method for
proving mathematical statements. The following is a specific example.
Example 1.2.6. For each n ∈ N, we claim that
∑_{m=1}^{n} m = 1 + 2 + 3 + · · · + n = n(n + 1)/2.
To see this result is true, for each n ∈ N let Pn be the statement that
∑_{k=1}^{n} k = n(n + 1)/2. To show that Pn is true for all n ∈ N, we will apply the
Principle of Mathematical Induction. To do so, we must demonstrate the
two conditions in Theorem 1.2.5.
Base Case: To see that P1 is true, notice that when n = 1,
n(n + 1)/2 = 1(1 + 1)/2 = 1 = ∑_{m=1}^{1} m.
Hence P1 is true.
Inductive Step: Suppose that Pk is true; that is, suppose ∑_{m=1}^{k} m = k(k + 1)/2
(this assumption is known as the induction hypothesis). To see that
Pk+1 is true, notice
∑_{m=1}^{k+1} m = (k + 1) + ∑_{m=1}^{k} m
= (k + 1) + k(k + 1)/2 (by the induction hypothesis)
= (2(k + 1) + (k² + k))/2
= (k² + 3k + 2)/2 = ((k + 1)(k + 2))/2 = ((k + 1)((k + 1) + 1))/2.
Hence Pk+1 is true.
Therefore, as we have demonstrated the base case and the inductive step,
the result follows by the Principle of Mathematical Induction.
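Induction proves the identity for every n at once; still, a quick numerical spot-check of the closed form is reassuring (purely illustrative, not part of the notes):

```python
def triangular(n):
    """Closed form for 1 + 2 + ... + n from Example 1.2.6."""
    return n * (n + 1) // 2

# Compare against a direct sum for the first hundred values of n.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == triangular(n)
```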
Some people (mainly computer scientists) argue that the Induction Axiom
must be false as it would take an infinite amount of time for a computer to
verify Pn is true for all n by using the fact P1 is true and Pk is true implies
Pk+1 is true. We will not adopt this notion. In fact, often one wants to
assume more than just Pk is true in order to show that Pk+1 is true.
Theorem 1.2.7 (Strong Induction). Suppose X ⊆ N. If
1. 1 ∈ X, and
2. if k ∈ N and {1, 2, . . . , k} ⊆ X, then k + 1 ∈ X,
then X = N.
Proof. For each n ∈ N, let Pn be the statement that {1, . . . , n} ⊆ X. We
claim that Pn is true for all n ∈ N. To show this, we will apply the Principle
of Mathematical Induction.
Base Case: As 1 ∈ X by assumption, clearly P1 is true.
Inductive Step: Suppose that Pk is true; that is, {1, 2, . . . , k} ⊆ X. By
assumption on X, k + 1 ∈ X. Hence {1, . . . , k, k + 1} ⊆ X so Pk+1 is true.
Hence, by the Principle of Mathematical Induction, {1, . . . , n} ⊆ X for
all n ∈ N. In particular, n ∈ X for all n ∈ N. Hence X = N.
Theorem 1.2.8 (The Principle of Strong Mathematical Induction).
For each k ∈ N, let Pk be a mathematical statement that is either true or
false. Suppose
1. P1 is true, and
2. if k ∈ N and Pm is true for all m ≤ k, then Pk+1 is true.
Then Pn is true for all n ∈ N.
Proof. The proof of this result is nearly identical to that of Theorem 1.2.5.
Let
X = {n ∈ N | Pn is true}.
By assumption we see that 1 ∈ X as P1 is true.
Assume that {1, . . . , k} ⊆ X. By the definition of X, we know Pm is true
for all m ≤ k. By the assumptions in the statement of the theorem, Pk+1 is
true and hence k + 1 ∈ X by the definition of X. Hence Strong Induction
implies X = N. Hence Pn is true for all n ∈ N.
1.2.3 The Well-Ordering Principle
There is one additional form of the Principle of Mathematical Induction that
is quite useful.
Theorem 1.2.9 (The Well-Ordering Principle). Every non-empty subset of N has a least element; that is, if Y ⊆ N and Y ≠ ∅, then there is an
element m ∈ Y such that m ≤ k for all k ∈ Y.
Proof. Suppose Y is a non-empty subset of N that does not have a least
element. Let
X = N \ Y = {n ∈ N | n ∉ Y}.
We will apply Strong Induction to show that X = N. This will complete
the proof since X = N implies Y = ∅, which contradicts the fact that Y
is non-empty. To apply Strong Induction, we must demonstrate the two
necessary assumptions in Theorem 1.2.7.
Base Case: Since Y does not have a least element, we know that 1 ∉ Y
or else 1 would be the least element of Y. Hence 1 ∈ X.
Inductive Step: Suppose k ∈ N and {1, . . . , k} ⊆ X. Then each element
of {1, . . . , k} is not in Y. Hence k + 1 ∉ Y, for otherwise k + 1 would be
the least element of Y since none of 1, . . . , k are in Y. Hence k + 1 ∈ X as
k + 1 ∉ Y.
Hence, by Strong Induction, X = N thereby completing the proof by
earlier discussions.
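The scanning argument in the proof can be mirrored computationally for finite subsets of N; a minimal sketch (the function name is ours):

```python
def least_element(Y):
    """Return the least element of a non-empty subset Y of N by
    scanning 1, 2, 3, ... -- mirroring the proof of Theorem 1.2.9."""
    if not Y:
        raise ValueError("the empty set has no least element")
    k = 1
    while k not in Y:
        k += 1
    return k

assert least_element({7, 3, 12}) == 3
```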
In the above, we assumed the Induction Axiom as one of Peano’s Axioms,
deduced Strong Induction, and used Strong Induction to deduce the Well-Ordering
Principle. In fact, the Induction Axiom and the Well-Ordering
Principle are logically equivalent; that is, if one replaces the Induction Axiom
with the Well-Ordering Principle in Definition 1.2.1, one may deduce the
Induction Axiom (see the homework).
1.3 The Real Numbers
With a rigorous construction of the natural numbers now complete, we turn
our attention to the real numbers. In particular, how does one construct the
real numbers, what properties do the real numbers have, and are there any
number systems with the same properties as the real numbers?
1.3.1 Fields
To begin our discussion of the real numbers, we note there are some common
operations we may apply to the real numbers: namely addition, subtraction,
multiplication, and division. These operations have specific properties that
we shall explore.
We begin with addition and multiplication. Recall that addition and
multiplication are operations on pairs of real numbers; that is, for every
x, y ∈ R there are numbers, denoted x + y and x · y, which are elements
of R. Furthermore, there are two properties we require for addition and
multiplication to behave well, and one property that says addition and
multiplication play together nicely:
(F1) (Commutativity) x + y = y + x and x · y = y · x for all x, y ∈ R.
(F2) (Associativity) (x + y) + z = x + (y + z) and (x · y) · z = x · (y · z) for
all x, y, z ∈ R.
(F3) (Distributivity) x · (y + z) = (x · y) + (x · z) for all x, y, z ∈ R.
To introduce the operations of subtraction and division, we must understand what these operations are and how they may be derived from addition
and multiplication. For example, what does subtracting 3 from 4 mean in
terms of addition? Well, it really means add the number −3 to 4. And how
are 3 and −3 related? Well, −3 is the unique number x such that 3 + x = 0.
And what is 0 in terms of addition? Well, 0 is the unique number y that
when you add y to any number z, you end up with z.
Similarly, what does dividing by 7 mean in terms of multiplication? Well,
it really means multiply by 1/7. And how are 7 and 1/7 related? Well, 1/7 is the
unique number x such that 7x = 1. And what is 1 in terms of multiplication?
Well, 1 is the unique number y that when you multiply y to any number z,
you end up with z.
Using the above, we add the following properties to our list of properties
defining R:
(F4) (Existence of Identities) There are numbers 0, 1 ∈ R with 0 ≠ 1 such
that 0 + x = x and 1 · x = x for all x ∈ R.
(F5) (Existence of Inverses) For all x, y ∈ R with y ≠ 0, there exist
−x, y⁻¹ ∈ R such that x + (−x) = 0 and y · y⁻¹ = 1.
Using these two properties, one then defines subtraction and division via
x − y = x + (−y) and x ÷ z = x · z⁻¹ for all x, y, z ∈ R with z ≠ 0. Furthermore,
it is possible to show that all of the numbers listed in (F4) and (F5) are
unique (that is, any number with the same properties as one of 0, 1, −x, or
y⁻¹ must be the corresponding number).
Although the real numbers have the above five properties, they are
not the only number system that has all five properties. For example,
clearly the rational numbers Q (which are not equal to the real numbers
by the homework) also satisfy all five properties when we replace R with Q.
Consequently, we make the following definition.
Definition 1.3.1. A field is a set F together with two operations + and ·
such that a + b ∈ F and a · b ∈ F for all a, b ∈ F, and + and · satisfy (F1),
(F2), (F3), (F4), and (F5) as written above (replacing R with F).
Notice if one is given a field F and a subset E of F that has the property
that a + b ∈ E and a · b ∈ E for all a, b ∈ E, then E is a field with the
operations + and · provided 0, 1 ∈ E and −x, z⁻¹ ∈ E for all x, z ∈ E with
z ≠ 0. In this case, we call E a subfield of F. For example,
Q[√2] := {x + y√2 | x, y ∈ Q}
is a subfield of R.
However, there are fields that look strikingly different from R.
Example 1.3.2. Consider Z2 = {0, 1} with the following rules for addition
and multiplication:
+ | 0 1          · | 0 1
0 | 0 1          0 | 0 0
1 | 1 0          1 | 0 1
(think of 0 as all even numbers and 1 as all odd numbers; an odd plus an
odd is even, an odd times an even is even, etc.). One can verify that Z2 is a
field with the above operations.
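Since Z2 is finite, the field axioms can be verified by exhausting all cases; a small sketch (not part of the notes) using arithmetic mod 2, which realizes the tables above:

```python
# Addition and multiplication in Z2 are arithmetic mod 2.
Z2 = [0, 1]

def add(a, b):
    return (a + b) % 2

def mul(a, b):
    return (a * b) % 2

for a in Z2:
    for b in Z2:
        assert add(a, b) == add(b, a)  # (F1) commutativity
        assert mul(a, b) == mul(b, a)
        for c in Z2:
            assert add(add(a, b), c) == add(a, add(b, c))  # (F2) associativity
            assert mul(mul(a, b), c) == mul(a, mul(b, c))
            assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))  # (F3)
    assert add(0, a) == a and mul(1, a) == a  # (F4) identities
# (F5) inverses: -0 = 0, -1 = 1, and the inverse of 1 is 1.
assert add(1, 1) == 0 and mul(1, 1) == 1
```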
All of the above properties listed are algebraic properties. Are there other
properties of R we can include to distinguish R from other fields?
1.3.2 Partially Ordered Sets
One notion that exists for the real numbers that does not exist for other
fields is the notion of an ordering; that is, given two numbers, we have a
notion which tells us which number is bigger. We begin with the following
concept.
Definition 1.3.3. Let X be a set. A relation ⪯ on the elements of X is
called a partial ordering if:
1. (reflexivity) a ⪯ a for all a ∈ X,
2. (antisymmetry) if a ⪯ b and b ⪯ a, then a = b for all a, b ∈ X, and
3. (transitivity) if a, b, c ∈ X are such that a ⪯ b and b ⪯ c, then a ⪯ c.
Clearly ≤ (as usually defined) is a partial ordering on R. Here is another
example:
Example 1.3.4. Let
P(R) := {X | X ⊆ R}.
The set P(R) is known as the power set of R and consists of all subsets of R.
We define a relation ⪯ on P(R) as follows: given X, Y ∈ P(R),
X ⪯ Y if and only if X ⊆ Y.
It is not difficult to verify that ⪯ is a partial ordering on P(R).
The partial ordering in the previous example is not as nice as our ordering
on R. To see this, consider the sets X = {1} and Y = {2}. Then X ⋠ Y
and Y ⋠ X; that is, we cannot use the partial ordering to compare X and Y.
However, if x, y ∈ R, then either x ≤ y or y ≤ x. Consequently, we desire to
add in this additional property to our ordering:
Definition 1.3.5. Let X be a set. A partial ordering ⪯ on X is called a
total ordering if for all x, y ∈ X, either x ⪯ y or y ⪯ x (or both).
The ordering one usually considers on R is clearly a total ordering.
However, it is also easy to place a total ordering on Z2 .
Example 1.3.6. Let Z2 be as in Example 1.3.2. Define 0 ⪯ 0, 0 ⪯ 1, 1 ⪯ 1,
and 1 ⋠ 0. It is easy to verify that ⪯ is a total ordering on Z2.
The problem with the ordering on Z2 is that addition and multiplication
do not interact well with respect to the ordering. The following describes
fields with ‘nice’ orderings:
Definition 1.3.7. An ordered field is a field F together with a total ordering
⪯ such that for all x, y, z ∈ F with x ⪯ y, the following two properties hold:
• (Additive Property) x + z ⪯ y + z.
• (Multiplicative Property) x · z ⪯ y · z provided 0 ⪯ z, and y · z ⪯ x · z
provided z ⪯ 0.
In any ordered field, it must be the case that 0 ⪯ 1. Indeed, if 1 ⪯ 0,
then the Multiplicative Property implies 0 · 1 ⪯ 1 · 1, so 0 ⪯ 1 and 1 ⪯ 0, and
therefore antisymmetry implies 0 = 1, which contradicts (F4).
Note the ordering on Z2 given in Example 1.3.6 does not make Z2 into
an ordered field since 0 ⪯ 1 yet 0 + 1 ⋠ 1 + 1 (so this total ordering does not
satisfy the Additive Property).
It is clear that R is an ordered field. However, it is also clear that any subfield
of R (such as Q and Q[√2]) is then also an ordered field. Consequently, we
still need a way to distinguish R from its subfields.
1.3.3 The Triangle Inequality
Before discussing how R differs from its subfields, we will analyze a useful
concept the ordering on R provides.
Definition 1.3.8. Given x ∈ R, the absolute value of x is
|x| = x if x ≥ 0, and |x| = −x if x < 0.
The absolute value has many important properties. For example, clearly
| − x| = |x|
for all x ∈ R (split the proof into two cases: x ≥ 0 and x < 0). Furthermore,
since x = ±|x| for all x ∈ R, it is not difficult to check that
|xy| = |x||y|
for all x, y ∈ R (split the proof into four cases: the two cases x ≥ 0 and x < 0,
each of which has the two cases y ≥ 0 and y < 0). However, the absolute
value is not important just for its properties, but for what it represents.
Notice that |x| represents the distance from x to 0. Consequently, we
can also see that |b − a| represents the distance from b to a for all a, b ∈ R.
Furthermore, for all a, δ ∈ R with δ > 0, the set
{x ∈ R | |x − a| < δ}
describes all points in R whose distance to a is strictly less than δ. Notice
|x−a| < δ if and only if −δ < x−a < δ if and only if a−δ < x < a+δ, which
provides an alternate description of the above set without using absolute
values. Such sets are quite important in this course so we make the following
notation.
Notation 1.3.9. For all a, b ∈ R with a ≤ b, we define
(a, b) := {x ∈ R | a < x < b}
[a, b) := {x ∈ R | a ≤ x < b}
(a, b] := {x ∈ R | a < x ≤ b}
[a, b] := {x ∈ R | a ≤ x ≤ b}.
For the first two, we permit ∞ to replace b, and, for the first and third, we
permit −∞ to replace a. Each of the above sets is called an interval with
(a, b) called an open interval and [a, b] called a closed interval.
In order to have a well-defined notion of distance in mathematics, several
properties need to be satisfied. Notice that if a, b ∈ R, then the distance
from b to a is zero exactly when |b − a| = 0, which is the same as saying
b = a. Furthermore, since |b − a| = | − (b − a)| = |a − b|, the distance from
b to a is the same as the distance from a to b. Finally, the last property
required to have a well-defined notion of distance is as follows:
Theorem 1.3.10 (The Triangle Inequality). Let x, y, z ∈ R. Then
|x − y| ≤ |x − z| + |z − y|.
That is, the distance from x to y is no more than the sum of the distance
from x to z and the distance from z to y.
(Figure: the points x, y, and z on the real line.)
Proof. If x = y, the result is trivial to verify. Consequently we will assume
x < y (if y < x, we can relabel y with x and x with y to run the following
proof). We have three cases to consider.
(Figure: the three cases on the real line. Case 1: z < x < y; Case 2: x < y < z; Case 3: x ≤ z ≤ y.)
Case 1. z < x: In this case, notice
|x − y| ≤ |z − y| = 0 + |z − y| ≤ |x − z| + |z − y|
as desired.
Case 2. y < z: In this case, notice
|x − y| ≤ |x − z| = |x − z| + 0 ≤ |x − z| + |z − y|
as desired.
Case 3. x ≤ z ≤ y: In this case, we easily see that
|x − y| = |x − z| + |z − y|.
Hence, as we have exhausted all cases (up to flipping x and y), the proof
is complete.
The Triangle Inequality is an incredibly useful tool in analysis. Furthermore, there are many other forms of the Triangle Inequality. For example,
letting x = a, y = −b, and z = 0 produces
|a + b| ≤ |a| + |b|
for all a, b ∈ R.
In addition, if we let x = a, y = 0, and z = b, we obtain
|a| ≤ |a − b| + |b|
so
|a| − |b| ≤ |a − b|,
and if we let x = b, y = 0, and z = a, we obtain
|b| ≤ |a − b| + |a|
so
− (|a| − |b|) ≤ |a − b|.
Consequently, we obtain that
||a| − |b|| ≤ |a − b|
for all a, b ∈ R.
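Both the Triangle Inequality and the reverse form derived above are easy to sanity-check numerically; a quick sketch (the sample values are arbitrary, and chosen to be exactly representable as floats):

```python
import itertools

# A few sample reals; halves and quarters are exact in binary floating point.
values = [-3.5, -1.0, 0.0, 0.25, 2.0, 7.0]

# Theorem 1.3.10: |x - y| <= |x - z| + |z - y|.
for x, y, z in itertools.product(values, repeat=3):
    assert abs(x - y) <= abs(x - z) + abs(z - y)

# Derived forms: |a + b| <= |a| + |b| and ||a| - |b|| <= |a - b|.
for a, b in itertools.product(values, repeat=2):
    assert abs(a + b) <= abs(a) + abs(b)
    assert abs(abs(a) - abs(b)) <= abs(a - b)
```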
1.3.4 The Least Upper Bound Property
We have seen that R (along with each of its subfields) is an ordered field. We now
begin the discussion of how to use this ordering to obtain the final property
needed to distinguish R from all other fields!
Definition 1.3.11. Let X ⊆ R. An element α ∈ R is said to be an upper
bound for X if x ≤ α for all x ∈ X. An element α ∈ R is said to be a lower
bound for X if α ≤ x for all x ∈ X. Finally, X is said to be bounded above if
X has an upper bound, bounded below if X has a lower bound, and bounded
if X has both an upper and lower bound.
Example 1.3.12. Let X = (0, 1). Then 1 is an upper bound of X and 0 is
a lower bound of X. Thus X is bounded. Furthermore, note that 5 is also
an upper bound of X and −7 is a lower bound of X.
Example 1.3.13. Let X = ∅. Then every number in R is both an upper
and lower bound of X vacuously (that is, there are no elements of X against
which to check the defining property).
Notice that N is bounded below as 1 is a lower bound (as is −2, 0, 0.5,
etc.). Does N have an upper bound? Our intuition says no; that is, N is not
bounded above. However, how do we prove this?
To tackle the above problem (in addition to describing the property
required to distinguish R from other ordered fields), notice that in the above
examples there were special upper/lower bounds that were ‘optimal’.
Definition 1.3.14. Let X ⊆ R. An element α ∈ R is said to be the least
upper bound of X if
• α is an upper bound of X, and
• if β is an upper bound of X, then α ≤ β.
We write lub(X) in place of α, provided α exists.
Similarly, an element α ∈ R is said to be the greatest lower bound of X if
• α is a lower bound of X, and
• if β is a lower bound of X, then β ≤ α.
We write glb(X) in place of α, provided α exists.
In the above definition, notice we have used the term ‘the least upper
bound’ instead of ‘a least upper bound’. This is because it is elementary
to show that a set with a least upper bound has exactly one least upper
bound. Indeed if α1 and α2 are both least upper bounds of a set X, then
α1 ≤ α2 and α2 ≤ α1 by the two defining properties of a least upper bound,
so α1 = α2 .
Example 1.3.15. Let X = [0, 1] and let Y = (0, 1). Then lub(X) =
lub(Y) = 1 and glb(X) = glb(Y) = 0. However, notice 0, 1 ∈ X whereas
0, 1 ∉ Y. This demonstrates that the least upper bound and greatest lower
bound may or may not be in the set.
Example 1.3.16. Clearly a set that is not bounded above cannot have a
least upper bound and a set that is not bounded below cannot have a greatest
lower bound. Consequently ∅ has no least upper bound nor greatest lower
bound.
Example 1.3.17. Let
X = {x ∈ Q | x ≥ 0 and x² < 2}.
Clearly glb(X) = 0 and lub(X) = √2.
The above example emphasizes the difference between Q and R. Notice
that X ⊆ Q. However, if we only consider numbers in Q, then X does not
have a least upper bound in Q as, if b ∈ Q and √2 < b, there is always
an r ∈ Q such that √2 < r < b (see homework). The following property
guarantees that R does not have such pitfalls.
Theorem 1.3.18 (The Least Upper Bound Property). Every non-empty subset of R that is bounded above has a least upper bound.
Note the term ‘non-empty’ must be included because of Example 1.3.16.
Furthermore, this completes our discussion of how to distinguish R from
other number systems since it is possible to show that any ordered field with
the Least Upper Bound Property is R! We will not demonstrate this fact as
it would detour us from the goals of this course.
The Least Upper Bound Property is an amazing property that makes all
of analysis on R possible. In fact, we note the following corollaries of the
Least Upper Bound Property.
Corollary 1.3.19 (The Greatest Lower Bound Property). Every non-empty subset of R that is bounded below has a greatest lower bound.
Proof Sketch. Let X be a non-empty subset of R that is bounded below. Let
Y = {−x | x ∈ X}.
One can verify that if a ∈ R, then a is an upper bound for Y if and only
if −a is a lower bound for X. Consequently Y is bounded above (as X is
bounded below) and thus Y has a least upper bound by the Least Upper
Bound Property. Furthermore, it is not difficult to check that −lub(Y ) is
then the greatest lower bound of X.
Corollary 1.3.20 (The Archimedean Property). The natural numbers
are not bounded above in R.
Proof. Suppose N is bounded above in R. Then N must have a least upper bound, say α, by the Least Upper Bound Property. Since α is an upper bound of N, we know that n ≤ α for all n ∈ N. In particular, since n + 1 ∈ N whenever n ∈ N, we have n + 1 ≤ α and thus n ≤ α − 1 for all n ∈ N. Thus α − 1 is an upper bound for N, which contradicts the fact that α is the least upper bound of N as α − 1 < α.
1.3.5 Constructing the Real Numbers
In the previous section, we claimed that the real numbers are the unique
ordered field with the Least Upper Bound Property. However, how do we
know the real numbers exist at all? There are two main constructions of
the reals. The first uses equivalence classes of Cauchy sequences (see Chapter 3) of rational numbers. The other is more complicated and is quickly sketched below. The interested reader may consult https://en.wikipedia.org/wiki/Construction_of_the_real_numbers.
In Section 1.2 we rigorously constructed the natural numbers. From N
we can construct the integers Z by adding a symbol −n for all n ∈ N. One
then must define + and · using the notion of successors in Definition 1.2.1
and verify all of the desired properties. One must also extend the notion of
< from N to Z in the obvious way.
From Z we can then construct Q by defining Q to be the set with elements of the form a/b where a, b ∈ Z with b ≠ 0, where we define a/b = c/d whenever ad = bc. Care must be taken in subsequent definitions as there are multiple ways to write a rational number. One then defines + and · as one does with fractions, and then verifies that Q is a field. To extend < to Q, if a, b, c, d are all positive, we define a/b < c/d whenever ad < bc, and similar definitions are provided in other cases. One then verifies that Q is an ordered field.
The real numbers may then be defined to be the set

R = { X ⊆ Q | X is non-empty and bounded above,
              X contains no greatest element, and
              if x ∈ X then y ∈ X for all y < x }.
It remains to show that R is an ordered field with the Least Upper Bound
Property. This requires defining +, ·, and ≤, and verifying all of the above
properties, which can be quite time consuming. As an example, one defines
addition via
X + Y := {x + y | x ∈ X, y ∈ Y }
and then must check (F1), (F2), and that the zero element of R is
{q ∈ Q | q < 0}.
Furthermore, one obtains the least upper bound of a bounded collection of elements of R, which are being viewed as subsets of Q, by taking the union of those subsets.
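To make the cut construction concrete, here is a small computational sketch (the helper names are ours, and finite bisection only approximates the cut, so this is an illustration rather than part of the formal development): the cut corresponding to √2 is represented by its membership predicate over Q, and its boundary is located by bisection with exact rational arithmetic.

```python
from fractions import Fraction

def in_sqrt2_cut(q: Fraction) -> bool:
    """Membership in the cut X = {q in Q | q < 0 or q^2 < 2}."""
    return q < 0 or q * q < 2

def approximate_lub(steps: int = 40) -> Fraction:
    """Bisect [0, 2]; lo always stays inside the cut, hi stays outside."""
    lo, hi = Fraction(0), Fraction(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_sqrt2_cut(mid):
            lo = mid
        else:
            hi = mid
    return lo

approx = approximate_lub()  # a rational approximation of sqrt(2)
```

Since lub(X) = √2 is irrational, no rational q satisfies q² = 2; the bisection merely squeezes the boundary from both sides, mirroring how a cut pins down a real number.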
Chapter 2
Sequences of Real Numbers
One of the most important concepts in calculus is the notion of converging
sequences. Knowing that a sequence converges to a number allows one to
use elements of the sequence to better and better approximate the number.
However, having a precise definition of a limit allows one to better understand
what limits really are.
2.1 The Limit of a Sequence

2.1.1 Definition of a Limit
Before discussing limits, we must ask, “What is a sequence?”
Definition 2.1.1. A sequence of real numbers is an ordered list of real
numbers indexed by the natural numbers.
If we have ak ∈ R for all k ∈ N, we will use (an)n≥1 or (a1, a2, a3, . . .) to denote a sequence whose first element is a1, whose second element is a2, etc.
Example 2.1.2. If c ∈ R and an = c for all n ∈ N, then the sequence
(an )n≥1 is the constant sequence with value c.
Example 2.1.3. For all n ∈ N, let an = 1/n. Then (an)n≥1 is the sequence (1, 1/2, 1/3, 1/4, . . .).
Example 2.1.4. For all n ∈ N, let an = (−1)n+1 . Then (an )n≥1 is the
sequence (1, −1, 1, −1, 1, −1, . . .).
Example 2.1.5. Let a1 = 1 and a2 = 1. For n ∈ N with n ≥ 3, let
an = an−1 + an−2 . Then (an )n≥1 is the sequence
(1, 1, 2, 3, 5, 8, 13, . . .).
This sequence is known as the Fibonacci sequence and is an example of a
recursively defined sequence (a sequence where subsequent terms are defined
using the previous terms under a fixed pattern).
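A recursively defined sequence translates directly into a loop. A minimal sketch (the function name is ours) generating the first terms of the Fibonacci sequence:

```python
def fibonacci_terms(count: int) -> list:
    """First `count` terms of a1 = a2 = 1, an = a(n-1) + a(n-2)."""
    terms = [1, 1]
    while len(terms) < count:
        # each new term is the sum of the two previous terms
        terms.append(terms[-1] + terms[-2])
    return terms[:count]
```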
With the above notion of a sequence, we turn to the notion of limits. If we consider the sequence (1/n)n≥1, we intuitively know that as n gets larger and larger, the sequence gets closer and closer to zero. Thus we would want to use this to say that 0 is the limit of (1/n)n≥1. This may lead us to take the following as our definition of a limit:

“A sequence (an)n≥1 has limit L (as n tends to infinity) if as n gets larger and larger, an gets closer to L.”

However, the fault in the above idea of a limit is that (1/n)n≥1 also gets ‘closer and closer’ to −1. We prefer 0 over −1 as the limit of (1/n)n≥1 since 1/n better and better approximates 0 whereas we intuitively know that (1/n)n≥1 cannot approximate −1. This leads us to the following better idea of what a limit is:
Heuristic Definition. A sequence (an )n≥1 has limit L if the terms of
(an )n≥1 are eventually all as close to L as we would like.
Using the above as a guideline, we obtain a rigorous, mathematical
definition of the limit of a sequence of real numbers.
Definition 2.1.6. Let (an)n≥1 be a sequence of real numbers. A number L ∈ R is said to be the limit of (an)n≥1 if for every ε > 0 there exists an N ∈ N (which depends on ε) such that |an − L| < ε for all n ≥ N.
If (an )n≥1 has limit L, we say that (an )n≥1 converges to L and write
L = limn→∞ an . Otherwise we say that (an )n≥1 diverges.
Example 2.1.7. Consider the constant sequence (an)n≥1 where an = c for all n ∈ N and some c ∈ R. Notice for all ε > 0, |an − c| = 0 < ε for all n ∈ N. Hence (an)n≥1 converges to c.
Example 2.1.8. To see that limn→∞ 1/n = 0 using the definition of the limit, let ε > 0 be arbitrary. Then (by the homework) there exists an N ∈ N such that 0 < 1/N < ε. Therefore, for all n ≥ N we obtain that 0 < 1/n ≤ 1/N < ε. Hence |1/n − 0| < ε for all n ≥ N. Hence limn→∞ 1/n = 0. Note that (1/n)n≥1 has limit zero, but no term in the sequence is zero.
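The ε–N bookkeeping of Example 2.1.8 can be sanity-checked numerically (this illustrates the definition, it is not a proof; the helper names are ours): given ε, the witness N = ⌊1/ε⌋ + 1 satisfies 1/N < ε, and then every sampled n ≥ N satisfies |1/n − 0| < ε.

```python
import math

def witness_N(eps: float) -> int:
    """An N in the naturals with 0 < 1/N < eps, as in Example 2.1.8."""
    return math.floor(1 / eps) + 1

def check_definition(eps: float, trials: int = 1000) -> bool:
    """Check |1/n - 0| < eps for the first `trials` indices n >= N."""
    N = witness_N(eps)
    return all(abs(1 / n - 0) < eps for n in range(N, N + trials))
```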
Example 2.1.9. Using the definition of a limit, we see that a sequence (an)n≥1 does not converge if for all L ∈ R there is an ε > 0 (depending on the L) such that for every N ∈ N there is an n ≥ N such that |an − L| ≥ ε.

Using the above paragraph, we can show that ((−1)n+1)n≥1 does not converge. Indeed let L ∈ R be arbitrary and let ε = 1/2. Suppose there exists an N ∈ N such that |(−1)n+1 − L| < ε for all n ≥ N. Since there exists an odd number n greater than N, we obtain that |1 − L| < ε. Therefore, since ε = 1/2, we obtain that L ∈ (1/2, 3/2). Similarly, since there exists an even number n greater than N, we obtain that |−1 − L| < ε. Therefore, since ε = 1/2, we obtain that L ∈ (−3/2, −1/2). Hence L ∈ (1/2, 3/2) ∩ (−3/2, −1/2) = ∅ which is absurd. Hence we have a contradiction so L is not the limit of ((−1)n+1)n≥1. Therefore, since L ∈ R was arbitrary, ((−1)n+1)n≥1 does not converge.
2.1.2 Uniqueness of the Limit
Notice in the definition of ‘the’ limit of a sequence, we used ‘the’ instead of ‘a’;
that is, how do we know that there is at most one limit to a sequence? The
following justifies the use of the word ‘the’ and demonstrates one important
proof technique when dealing with limits.
Proposition 2.1.10. Let (an )n≥1 be a sequence of real numbers. If L and
K are limits of (an )n≥1 , then L = K.
Proof. Suppose that L ≠ K. Let ε = |L − K|/2. Since L ≠ K, we know that ε > 0.

Since L is a limit of (an)n≥1, we know by the definition of a limit that there exists an N1 ∈ N such that if n ≥ N1 then |an − L| < ε. Similarly, since K is a limit of (an)n≥1, we know by the definition of a limit that there exists an N2 ∈ N such that if n ≥ N2 then |an − K| < ε.

Let n = max{N1, N2}. By the above paragraph, we have that |an − L| < ε and |an − K| < ε. Hence by the Triangle Inequality

|L − K| ≤ |L − an| + |an − K| < ε + ε = 2ε = |L − K|

which is absurd (i.e. x < x is false for all x ∈ R). Thus we have obtained a contradiction so it must be the case that L = K.
To conclude this section, we note the following that demonstrates that |an − L| < ε may be replaced with |an − L| ≤ ε in the definition of the limit of a sequence. This can be useful on occasion and also establishes an important idea in handling limits: ε is simply a constant and may be modified.
Proposition 2.1.11. Let (an)n≥1 be a sequence of real numbers and let L ∈ R. Then (an)n≥1 converges to L if and only if for all ε > 0 there exists an N ∈ N such that |an − L| ≤ ε for all n ≥ N.

Proof. Suppose (an)n≥1 converges to L. Let ε > 0 be arbitrary. By the definition of the limit, there exists an N ∈ N such that |an − L| < ε for all n ≥ N. As this implies |an − L| ≤ ε for all n ≥ N and as ε > 0 was arbitrary, one direction of the proof is complete.

For the other direction, assume that (an)n≥1 and L have the property listed in the statement. Let ε > 0 be arbitrary. Let ε′ = ε/2. Since ε′ > 0, the assumptions of this direction imply that there exists an N ∈ N such that |an − L| ≤ ε′ for all n ≥ N. Hence |an − L| ≤ ε′ < ε for all n ≥ N. As ε > 0 was arbitrary, (an)n≥1 converges to L by the definition of the limit.
Remark 2.1.12. By analyzing the above proof, we see that the definition of the limit can be modified to involve a constant multiple of ε. That is, if (an)n≥1 is a sequence of real numbers, L ∈ R, and k > 0, then L = limn→∞ an if and only if for all ε > 0 there exists an N ∈ N such that |an − L| < kε for all n ≥ N. It is very important to note that the constant k CANNOT depend on n nor ε.
2.2 The Monotone Convergence Theorem
With the above, there are two main questions for us to ask: “What types of
sequences converge?” and “How can we find the limits of sequences without
always appealing to the definition?” The goal of this section is to look at
the first question.
First let us ask, “Does the sequence (n)n≥1 converge?” Intuitively the
answer is no since this sequence does not approximate a number. To make
this rigorous, consider the following.
Definition 2.2.1. A sequence (an )n≥1 of real numbers is said to be bounded
if the set {an | n ∈ N} is bounded.
Proposition 2.2.2. Every convergent sequence is bounded.
Proof. Let (an)n≥1 be a sequence of real numbers that converges to a number L ∈ R. Let ε = 1. By the definition of a limit, there exists an N ∈ N such that |an − L| < 1 for all n ≥ N. Hence |an| ≤ |L| + 1 for all n ≥ N by the Triangle Inequality.

Let M = max{|a1|, |a2|, . . . , |aN|, |L| + 1}. Using the above paragraph, we see that |an| ≤ M for all n ∈ N. Hence −M ≤ an ≤ M for all n ∈ N so (an)n≥1 is bounded.
The above shows us that boundedness is a requirement for convergence of a
sequence. However, a bounded sequence need not converge. Indeed Example
2.1.9 shows that the sequence ((−1)n+1 )n≥1 (which is clearly bounded) does
not converge. However, a natural question to ask is, “Is there a condition we
may place on a sequence so that boundedness implies convergence?” Indeed
there is!
Definition 2.2.3. A sequence (an )n≥1 of real numbers is said to be
• increasing if an < an+1 for all n ∈ N,
• non-decreasing if an ≤ an+1 for all n ∈ N,
• decreasing if an > an+1 for all n ∈ N,
• non-increasing if an ≥ an+1 for all n ∈ N, and
• monotone if (an )n≥1 is non-decreasing or non-increasing.
Theorem 2.2.4 (Monotone Convergence Theorem). A monotone sequence (an )n≥1 of real numbers converges if and only if (an )n≥1 is bounded.
Proof. By Proposition 2.2.2, if (an )n≥1 converges, then (an )n≥1 is bounded.
For the other direction, suppose that (an )n≥1 is a monotone sequence
that is bounded. We will assume that (an )n≥1 is a non-decreasing sequence
for the remainder of the proof as the case when (an )n≥1 is non-increasing
can be demonstrated using similar arguments.
Since (an)n≥1 is bounded, {an | n ∈ N} has a least upper bound, say α, by the Least Upper Bound Property (Theorem 1.3.18). We claim that α is the limit of (an)n≥1. To see this, let ε > 0 be arbitrary. Since α is the least upper bound of {an | n ∈ N}, we know that an ≤ α for all n ∈ N and α − ε is not an upper bound of {an | n ∈ N}. Hence there exists an N ∈ N such that α − ε < aN. Since (an)n≥1 is non-decreasing, we obtain for all n ≥ N that α − ε < aN ≤ an ≤ α, which implies |an − α| < ε for all n ≥ N. Since ε > 0 was arbitrary, we obtain that α is the limit of (an)n≥1 by definition. Hence (an)n≥1 converges.
Example 2.2.5. Consider the sequence (an)n≥1 defined recursively via a1 = 1 and an+1 = √(3 + 2an) for all n ≥ 1. In the homework, it was demonstrated that 0 ≤ an ≤ an+1 ≤ 3 for all n ∈ N. Hence (an)n≥1 converges by the Monotone Convergence Theorem. The question remains, “What is the limit of (an)n≥1?” By the proof of the Monotone Convergence Theorem, we know the answer is lub({an | n ∈ N}), which is at most 3. But is the answer 3 or a number less than 3?
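A quick numerical experiment (suggestive only; the actual bounds are established in the homework, and the function name is ours) agrees that the computed terms are non-decreasing and stay below 3:

```python
import math

def recursive_terms(count: int) -> list:
    """Terms of a1 = 1, a(n+1) = sqrt(3 + 2*an), in floating point."""
    terms = [1.0]
    for _ in range(count - 1):
        terms.append(math.sqrt(3 + 2 * terms[-1]))
    return terms

terms = recursive_terms(50)
# small tolerances guard against floating-point rounding near the limit
monotone = all(b >= a - 1e-12 for a, b in zip(terms, terms[1:]))
bounded = all(0 <= a <= 3 + 1e-12 for a in terms)
```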
2.3 Limit Theorems
To answer the above question and to aid us in our computation of limits, there are several theorems we may explore.
2.3.1 Limit Arithmetic
Our first goal is to determine how limits behave with respect to the simplest
operations on R.
Theorem 2.3.1. Let (an)n≥1 and (bn)n≥1 be sequences of real numbers such that L = limn→∞ an and K = limn→∞ bn exist. Then

a) limn→∞ an + bn = L + K.

b) limn→∞ an bn = LK.

c) limn→∞ can = cL for all c ∈ R.

d) limn→∞ 1/bn = 1/K whenever K ≠ 0 (see proof for technicality).

e) limn→∞ an/bn = L/K whenever K ≠ 0 (see proof for technicality).
Proof. a) Let ε > 0 be arbitrary. Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε/2 for all n ≥ N1. Similarly, since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε/2 for all n ≥ N2. Let N = max{N1, N2}. Hence, using the Triangle Inequality, for all n ≥ N,

|(an + bn) − (L + K)| ≤ |an − L| + |bn − K| < ε/2 + ε/2 = ε.

Hence (an + bn)n≥1 converges to L + K by definition.

b) Let ε > 0 be arbitrary. First note that 0 ≤ |K| < |K| + 1 so 0 ≤ |K|/(|K| + 1) ≤ 1 (we will use this later). Next, since (an)n≥1 converges, (an)n≥1 is bounded by Proposition 2.2.2. Hence there exists an M > 0 such that |an| < M for all n ∈ N.

Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε/(2(|K| + 1)) for all n ≥ N1 (as 1/(2(|K| + 1)) > 0 is a constant). Similarly, since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε/(2M) for all n ≥ N2 (as 1/(2M) is a constant). Let N = max{N1, N2}. Hence, using the Triangle Inequality, for all n ≥ N,

|an bn − LK| = |(an bn − an K) + (an K − LK)|
            ≤ |an bn − an K| + |an K − LK|
            ≤ |an||bn − K| + |K||an − L|
            ≤ M|bn − K| + |K||an − L|
            ≤ M(ε/(2M)) + |K|(ε/(2(|K| + 1)))
            ≤ ε/2 + ε/2 = ε.

Hence (an bn)n≥1 converges to LK by definition.
c) Apply part (b) with bn = c for all n ∈ N.
d) The one technicality here is that if bn = 0, then 1/bn does not make sense. However, since K = limn→∞ bn and since |K|/2 > 0 as K ≠ 0, there exists an N1 ∈ N such that |bn − K| < |K|/2 for all n ≥ N1. Therefore, by the Triangle Inequality, |bn| ≥ |K| − |K|/2 = |K|/2 > 0 for all n ≥ N1. Hence, if n ≥ N1 we have that |bn| > 0 and thus 1/bn is well-defined for suitably large n. Furthermore, since limits depend only on the behaviour for large n, it makes sense to consider the sequence (1/bn)n≥1.

Let ε > 0 be arbitrary. In the above paragraph, we saw that |bn| ≥ |K|/2 for all n ≥ N1 and thus 1/|bn| ≤ 2/|K| for all n ≥ N1 (as K ≠ 0). Since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < (|K|²/2)ε for all n ≥ N2 (as |K|²/2 > 0 is a constant). Therefore, for all n ≥ max{N1, N2},

|1/bn − 1/K| = |K − bn|/(|bn||K|)
             ≤ (|K|²ε/2)(1/(|bn||K|))
             = (|K|ε/2)(1/|bn|)
             ≤ (|K|ε/2)(2/|K|) = ε.

Hence (1/bn)n≥1 converges to 1/K by definition.

e) By part (d), limn→∞ 1/bn = 1/K. Hence, as limn→∞ an = L, part (b) implies that limn→∞ an(1/bn) = L(1/K) = L/K, completing the proof.
Example 2.3.2. Consider the sequence (an)n≥1 defined recursively via a1 = 1 and an+1 = √(3 + 2an) for all n ≥ 1. In Example 2.2.5, we used the Monotone Convergence Theorem (Theorem 2.2.4) along with the fact that 0 ≤ an ≤ an+1 ≤ 3 to show that (an)n≥1 converges. It remains to compute the limit of this sequence.

Let L = limn→∞ an. Since an+1 = √(3 + 2an) for all n ≥ 1, we have that a²n+1 = 3 + 2an for all n ∈ N. Therefore, using Theorem 2.3.1, we obtain that

3 + 2L = limn→∞ 3 + 2an = limn→∞ a²n+1
       = limn→∞ a²n          (an index shift does not change the limit)
       = (limn→∞ an)² = L².

Hence L² − 2L − 3 = 0 so (L − 3)(L + 1) = 0 so L = 3 or L = −1. However, since −1 < 0 < 1 = a1 ≤ an for all n ∈ N, |an − (−1)| ≥ 2 for all n ∈ N and thus −1 cannot be the limit of (an)n≥1 by the definition of the limit. Hence limn→∞ an = 3.
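Iterating the recursion numerically agrees with the computed limit L = 3 (a sanity check, not part of the argument):

```python
import math

# iterate a1 = 1, a(n+1) = sqrt(3 + 2*an) many times
a = 1.0
for _ in range(100):
    a = math.sqrt(3 + 2 * a)
# after many iterations a is numerically indistinguishable from 3
```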
Example 2.3.3. Consider the sequence (an)n≥1 where an = (5n² + 2n)/(3n² − n + 4) for all n ∈ N. Does (an)n≥1 converge and, if so, what is its limit?

To answer this question, notice that

an = (5n² + 2n)/(3n² − n + 4) = (n²(5 + 2/n))/(n²(3 − 1/n + 4/n²)) = (5 + 2/n)/(3 − 1/n + 4/n²).

Since limn→∞ 1/n = 0 by Example 2.1.8, and since limn→∞ 1/n² = 0 (see homework), we obtain that

limn→∞ 5 + 2/n = 5 and limn→∞ 3 − 1/n + 4/n² = 3 so limn→∞ an = 5/3.
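Evaluating the formula at large n agrees with the limit 5/3 (a numerical check only; the function name is ours):

```python
def a(n: int) -> float:
    """an = (5n^2 + 2n) / (3n^2 - n + 4) from Example 2.3.3."""
    return (5 * n**2 + 2 * n) / (3 * n**2 - n + 4)

gap_million = abs(a(10**6) - 5 / 3)
gap_billion = abs(a(10**9) - 5 / 3)
# the gap to 5/3 shrinks as n grows (roughly like 11/(9n))
```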
In part (e) of Theorem 2.3.1, it was required in the proof that the
denominator does not converge to 0. This is due to the fact that there are
many different types of behaviour that may occur when the denominator of
a sequence of fractions tends to zero.
For two examples, first consider the sequences (an)n≥1 and (bn)n≥1 where an = bn = 1/n for all n ∈ N. Then clearly limn→∞ an = 0 = limn→∞ bn, and

an/bn = (1/n)/(1/n) = 1

for all n ∈ N. Hence limn→∞ an/bn = 1.

Alternatively, consider the sequences (an)n≥1 and (bn)n≥1 where an = 1 and bn = 1/n for all n ∈ N. Then clearly limn→∞ an = 1 and limn→∞ bn = 0, yet

an/bn = 1/(1/n) = n

does not converge as (n)n≥1 is not bounded (see Proposition 2.2.2).

Thus, if (an)n≥1 and (bn)n≥1 are sequences and limn→∞ bn = 0, it is possible that (an/bn)n≥1 does not converge. However, if limn→∞ an/bn exists, then by part (b) of Theorem 2.3.1 we must have that

limn→∞ an = limn→∞ (an/bn)bn = (limn→∞ an/bn)(limn→∞ bn) = (limn→∞ an/bn)(0) = 0.

Thus a necessary condition for limn→∞ an/bn to exist when limn→∞ bn = 0 is limn→∞ an = 0.

2.3.2 Diverging to Infinity
We have seen several examples of sequences that do not converge. In particular, Proposition 2.2.2 says that unbounded sequences have no chance to
converge. However, it is useful to discuss specific notions of divergence for
unbounded sequences.
Definition 2.3.4. A sequence (an)n≥1 of real numbers is said to diverge to infinity, denoted limn→∞ an = ∞, if for every M ∈ R there exists an N ∈ N such that an ≥ M for all n ≥ N.
Similarly, a sequence (an )n≥1 of real numbers is said to diverge to negative
infinity, denoted limn→∞ an = −∞, if for every M ∈ R there exists an N ∈ N
such that an ≤ M for all n ≥ N .
Example 2.3.5. It is clear that limn→∞ n = ∞.
Using the same proof ideas as in Theorem 2.3.1, we obtain the following.
Theorem 2.3.6. Let (an )n≥1 and (bn )n≥1 be sequences of real numbers.
Suppose that (bn )n≥1 diverges to ∞ (respectively −∞). Then
a) If (an )n≥1 is bounded below (respectively above), then limn→∞ an +bn = ∞
(respectively limn→∞ an + bn = −∞).
b) If there exists an M > 0 such that an ≥ M for all n ∈ N, then
limn→∞ an bn = ∞ (respectively limn→∞ an bn = −∞).
c) If (an)n≥1 is bounded, then limn→∞ an/bn = 0.
Proof. See the homework.
The above theorem aids us in computing limits of fractions where the
denominator grows faster than the numerator.
Example 2.3.7. Consider the sequence (an)n≥1 where an = (2n + 1)/(n² + 3) for all n ∈ N. Then

an = (n(2 + 1/n))/(n(n + 3/n)) = (2 + 1/n)/(n + 3/n).

Therefore, since limn→∞ 3/n = 0 so (3/n)n≥1 is bounded, and since limn→∞ n = ∞, we have limn→∞ n + 3/n = ∞. Hence since limn→∞ 2 + 1/n = 2 so (2 + 1/n)n≥1 is bounded, we have that limn→∞ (2n + 1)/(n² + 3) = 0.
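Numerically, the terms of this example shrink toward 0 as the quadratic denominator outgrows the linear numerator (a sanity check; the function name is ours):

```python
def a(n: int) -> float:
    """an = (2n + 1) / (n^2 + 3) from Example 2.3.7."""
    return (2 * n + 1) / (n**2 + 3)

# sample at n = 10, 100, ..., 10^6 to watch the decay
samples = [a(10**k) for k in range(1, 7)]
```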
2.3.3 The Squeeze Theorem
Using Theorem 2.3.6, it is possible to show that (cos(n)/n)n≥1 converges to zero. Indeed, if an = cos(n) and bn = n for all n ∈ N, then (an)n≥1 is bounded (above by 1 and below by −1) and limn→∞ bn = ∞, so part (c) of Theorem 2.3.6 implies limn→∞ cos(n)/n = 0. Alternatively, we can show limn→∞ cos(n)/n = 0 by noting that −1/n ≤ cos(n)/n ≤ 1/n for all n ∈ N and by applying the following useful theorem (which may be used to prove part (c) of Theorem 2.3.6).
Theorem 2.3.8 (Squeeze Theorem). Let (an )n≥1 , (bn )n≥1 and (cn )n≥1
be sequences of real numbers such that there exists an N0 ∈ N such that
an ≤ bn ≤ cn
for all n ≥ N0 .
If limn→∞ an = limn→∞ cn = L, then (bn )n≥1 converges and limn→∞ bn = L.
Proof. Let ε > 0 be arbitrary. Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε for all n ≥ N1. Hence L − ε < an for all n ≥ N1. Similarly, since L = limn→∞ cn, there exists an N2 ∈ N such that |cn − L| < ε for all n ≥ N2. Hence cn < L + ε for all n ≥ N2. Therefore, for all n ≥ max{N0, N1, N2}, we have that

L − ε < an ≤ bn ≤ cn < L + ε.

Hence L − ε < bn < L + ε for all n ≥ max{N0, N1, N2}, which implies −ε < bn − L < ε and thus |bn − L| < ε for all n ≥ max{N0, N1, N2}. Hence (bn)n≥1 converges and limn→∞ bn = L by definition.
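The squeeze −1/n ≤ cos(n)/n ≤ 1/n used above for (cos(n)/n)n≥1 can be checked numerically over many indices (an illustration only, not a proof):

```python
import math

def squeezed(n: int) -> bool:
    """Check -1/n <= cos(n)/n <= 1/n for a single index n."""
    return -1 / n <= math.cos(n) / n <= 1 / n

all_squeezed = all(squeezed(n) for n in range(1, 10001))
tail_term = abs(math.cos(10**6) / 10**6)  # a far-out term, already tiny
```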
2.3.4 Limit Supremum and Limit Infimum
There are several sequences that neither converge nor diverge to ±∞. For
example, the sequence ((−1)n+1 )n≥1 has been shown to not converge and
clearly does not diverge to ±∞ as it is bounded. Consequently, we may ask,
“Is it possible to obtain some information about this sequence as n tends to
infinity?”
Clearly everything we want to know about the sequence ((−1)n+1 )n≥1
can be obtained by taking the least upper bound and greatest lower bound
of its elements. Consequently, we extend the notions of least upper bound
and greatest lower bound to include infinities.
Definition 2.3.9. Let X be a set of real numbers. The supremum of X, denoted sup(X), is defined to be

sup(X) :=  −∞       if X = ∅
           lub(X)   if X ≠ ∅ and X is bounded above
           ∞        if X is not bounded above.

Similarly, the infimum of X, denoted inf(X), is defined to be

inf(X) :=  ∞        if X = ∅
           glb(X)   if X ≠ ∅ and X is bounded below
           −∞       if X is not bounded below.
The infimum and supremum of sequences are not the objects we are after since we are more interested in the behaviour of sequences as n gets large. For example, consider the sequence ((−1)n+1(1 + 1/n))n≥1. It is not difficult to see that 2 is the supremum of this sequence and −3/2 is the infimum of this sequence. However, as n gets larger and larger, the largest values of the sequence are very close to 1 and the smallest values of the sequence are very close to −1. How can we express this notion for arbitrary sequences mathematically?
Let (an)n≥1 be a sequence. To see how the largest values of (an)n≥1 behave as n grows, we can take the supremum after we ignore the first few terms. Consequently, we define a new sequence (bn)n≥1 defined by

bn = sup{ak | k ≥ n}.
It is not difficult to see that b1 ≥ b2 ≥ b3 ≥ · · · as the supremum may only
get smaller as we remove terms from the set from which we are taking the
supremum. Consequently we see that (bn )n≥1 is a monotone sequence. Since
(bn )n≥1 is non-increasing, (bn )n≥1 either converges to a number, diverges to
−∞, or bn = ∞ for all n.
Applying the same idea with the sequence (cn )n≥1 where
cn = inf{ak | k ≥ n}
we arrive at the following.
Definition 2.3.10. The limit supremum of a sequence (an)n≥1 of real numbers, denoted lim supn→∞ an, is

lim supn→∞ an = limn→∞ sup{ak | k ≥ n} ∈ R ∪ {±∞}.

Similarly, the limit infimum of a sequence (an)n≥1 of real numbers, denoted lim inf n→∞ an, is

lim inf n→∞ an = limn→∞ inf{ak | k ≥ n} ∈ R ∪ {±∞}.
To see that all values are possible, it is not difficult to see that

lim supn→∞ (−1)n+1(1 + 1/n) = 1 and lim inf n→∞ (−1)n+1(1 + 1/n) = −1

whereas

lim supn→∞ n = ∞ and lim inf n→∞ n = ∞.
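The tail suprema bn = sup{ak | k ≥ n} and tail infima cn can be approximated by truncating the infinite tail at a large cutoff (an approximation we must assume, since a computer cannot inspect an infinite set); for (−1)^(n+1)(1 + 1/n) the late values are near 1 and −1 as expected:

```python
def a(n: int) -> float:
    """an = (-1)^(n+1) * (1 + 1/n)."""
    return (-1) ** (n + 1) * (1 + 1 / n)

CUTOFF = 100000  # truncation point standing in for the infinite tail
terms = [a(n) for n in range(1, CUTOFF + 1)]

def tail_sup(n: int) -> float:
    """Approximate bn = sup{ak | k >= n} using the truncated tail."""
    return max(terms[n - 1:])

def tail_inf(n: int) -> float:
    """Approximate cn = inf{ak | k >= n} using the truncated tail."""
    return min(terms[n - 1:])

b_late, c_late = tail_sup(50000), tail_inf(50000)
```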
Unsurprisingly, there is a solid connection between lim inf, lim sup, and
lim. To see this connection, we require the following.
Theorem 2.3.11 (Comparison Theorem). Let (an )n≥1 and (bn )n≥1 be
convergent sequences of real numbers. Suppose that there exists an N0 ∈ N
such that an ≤ bn for all n ≥ N0 . Then limn→∞ an ≤ limn→∞ bn .
Proof. Let L = limn→∞ an and let K = limn→∞ bn. Suppose that K < L. Therefore if ε = (L − K)/2, then ε > 0.

Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε for all n ≥ N1. Hence L < an + ε for all n ≥ N1. Similarly, since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε for all n ≥ N2. Hence bn ≤ K + ε for all n ≥ N2.

Therefore, if n ≥ max{N1, N2, N0}, we obtain that

L < an + ε ≤ bn + ε ≤ K + 2ε = K + |K − L|.

Hence L − K < |K − L|. However, this is impossible as we are assuming that K < L which would imply |K − L| = L − K. Hence we have obtained a contradiction in the case that K < L so it must be the case that L ≤ K.
Proposition 2.3.12. Let (an)n≥1 be a sequence of real numbers such that

lim inf n→∞ an, lim supn→∞ an ∈ R.

Then

lim inf n→∞ an ≤ lim supn→∞ an.

In addition, (an)n≥1 converges if and only if lim inf n→∞ an = lim supn→∞ an. In this case

limn→∞ an = lim inf n→∞ an = lim supn→∞ an.
Proof. For the remainder of the proof, for each n ∈ N let

bn = sup{ak | k ≥ n} ∈ R and cn = inf{ak | k ≥ n} ∈ R.

Clearly

lim supn→∞ an = limn→∞ bn, lim inf n→∞ an = limn→∞ cn, and cn ≤ an ≤ bn for all n.

Hence, the Comparison Theorem (Theorem 2.3.11) implies lim inf n→∞ an ≤ lim supn→∞ an.

Next, suppose that lim inf n→∞ an = lim supn→∞ an. Therefore, since cn ≤ an ≤ bn for all n ∈ N, we obtain that (an)n≥1 converges and

limn→∞ an = lim inf n→∞ an = lim supn→∞ an

by the Squeeze Theorem (Theorem 2.3.8).

Finally, suppose L = limn→∞ an exists. Let ε > 0. Hence there exists an N ∈ N such that |an − L| < ε for all n ≥ N. Thus L − ε ≤ an ≤ L + ε for all n ≥ N. Therefore L − ε ≤ cn ≤ bn ≤ L + ε for all n ≥ N by the definition of bn and cn. Hence, the Comparison Theorem (Theorem 2.3.11) implies that

L − ε ≤ limn→∞ cn ≤ limn→∞ bn ≤ L + ε

for all ε > 0. In particular,

L − 1/m ≤ limn→∞ cn ≤ limn→∞ bn ≤ L + 1/m

for all m ∈ N. Therefore, since limm→∞ 1/m = 0, the above is only possible (for example, by the Squeeze Theorem (Theorem 2.3.8)) if

L = limn→∞ bn = limn→∞ cn.
2.4 The Bolzano–Weierstrass Theorem
We have seen that many sequences do not converge. The lim inf and lim sup
do provide us with some information about the sequence. However, one
natural question to ask is, “If we have a sequence that does not converge,
can we remove terms from the sequence to make it converge?” Of course for
convergence, our new sequence must be bounded by Proposition 2.2.2. Thus
perhaps a better question is, “If we have a bounded sequence that does not
converge, can we remove terms from the sequence to make it converge?”
2.4.1 Subsequences
To answer the above question, we must describe what we mean by ‘remove
terms from a sequence’. This is made precise by the mathematical notion of
a subsequence.
Definition 2.4.1. A subsequence of a sequence (an )n≥1 of real numbers is
any sequence (bn )n≥1 of real numbers such that there exists an increasing
sequence of natural numbers (kn )n≥1 so that bn = akn for all n ∈ N.
For example, if (an)n≥1 is our favourite sequence an = (−1)n+1 for all n ∈ N and if we choose kn = 2n − 1 for all n ∈ N, then (akn)n≥1 is the sequence (1, 1, 1, . . .). Similarly, if (bn)n≥1 is the sequence where bn = 1/n for all n ∈ N and if we choose kn = n² for all n ∈ N, then (bkn)n≥1 is the sequence (1, 1/4, 1/9, . . .) = (1/n²)n≥1.
In the above paragraph, notice that the sequence (an )n≥1 diverges whereas
the given subsequence converges. Thus it is possible that divergent sequences
have convergent subsequences. Furthermore, (bn )n≥1 and the given subsequence both converge to 0. It is not difficult to see that every subsequence
of (bn )n≥1 converges to zero and this is no coincidence.
Proposition 2.4.2. Let (an )n≥1 be a sequence of real numbers that converges
to L. Every subsequence of (an )n≥1 converges to L.
Proof. Let (akn)n≥1 be a subsequence of (an)n≥1. Let ε > 0. Since L = limn→∞ an, there exists an N ∈ N such that |an − L| < ε for all n ≥ N. Since (kn)n≥1 is an increasing sequence of natural numbers, there exists an N0 ∈ N such that kn ≥ N for all n ≥ N0. Hence |akn − L| < ε for all n ≥ N0. Therefore, as ε > 0 was arbitrary, we obtain that limn→∞ akn = L by the definition of the limit.
2.4.2 The Peak Point Lemma
It is natural to ask, “Given a sequence, is there a ‘nice’ subsequence?” Of
course ‘nice’ is ambiguous, but the following demonstrates a specific form of
subsequence we may always construct.
Lemma 2.4.3 (The Peak Point Lemma). Every sequence of real numbers
has a monotone subsequence.
In order to prove the above lemma (and to explain how it gets its name), we will use the following notion:
Definition 2.4.4. Let (an )n≥1 be a sequence of real numbers. An index
n0 ∈ N is said to be a peak point for the sequence (an )n≥1 if an ≤ an0 for all
n ≥ n0 .
Proof of Lemma 2.4.3. Let (an )n≥1 be a sequence of real numbers. The
proof is divided into two cases:
Case 1. (an)n≥1 has an infinite number of peak points: By assumption there exist indices k1 < k2 < k3 < · · · such that kj is a peak point for all j ∈ N. Therefore, we have by the definition of a peak point that akn ≥ akn+1 for all n ∈ N. Hence (akn)n≥1 is a non-increasing subsequence of (an)n≥1.
Case 2. (an )n≥1 has a finite number (or no) peak points: Let n0 be the
largest peak point of (an )n≥1 (or n0 = 1 if (an )n≥1 has no peak points),
and let k1 = n0 + 1. Thus k1 is not a peak point of (an )n≥1 . Therefore
there exists a k2 > k1 = n0 + 1 such that ak2 > ak1 . Subsequently, since
k2 > k1 > n0 , k2 is not a peak point. Therefore there exists a k3 > k2 such
that ak3 > ak2 . Repeating this process ad nauseam, we obtain a sequence
of indices k1 < k2 < k3 < · · · such that akn+1 > akn for all n ∈ N. Hence
(akn )n≥1 is an increasing subsequence of (an )n≥1 .
As in either case a monotone subsequence can be constructed, the result
follows.
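For finite data, the two cases of the proof can be imitated algorithmically (a sketch only: ‘peak point’ here is computed relative to the finite list, which only approximates the true definition over an infinite tail):

```python
def monotone_subsequence(xs: list) -> list:
    """Extract a monotone subsequence of a finite list, following the
    two cases of the Peak Point Lemma."""
    n = len(xs)
    # indices i with xs[j] <= xs[i] for all j >= i (peaks of the list)
    peaks = [i for i in range(n) if all(xs[j] <= xs[i] for j in range(i, n))]
    if len(peaks) >= 2:
        # Case 1: the peak values form a non-increasing subsequence
        return [xs[i] for i in peaks]
    # Case 2: greedily build an increasing subsequence
    result = [xs[0]]
    for x in xs[1:]:
        if x > result[-1]:
            result.append(x)
    return result
```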
2.4.3 The Bolzano–Weierstrass Theorem
Combining the Peak Point Lemma together with the Monotone Convergence
Theorem, we easily obtain the following.
Theorem 2.4.5 (The Bolzano–Weierstrass Theorem). Every bounded sequence of real numbers has a convergent subsequence.
Proof. Let (an )n≥1 be a bounded sequence of real numbers. By the Peak
Point Lemma (Lemma 2.4.3), there exists a monotone subsequence (akn )n≥1
of (an)n≥1. Since (an)n≥1 is bounded, (akn)n≥1 is also bounded and thus converges by the Monotone Convergence Theorem (Theorem 2.2.4).
Chapter 3
An Introduction to Topology
With the above study of sequences complete, we can turn our attention to
analyzing properties of the real numbers and their subsets through convergent
sequences. One of the most important properties of the real numbers is
the notion of completeness, which implies the convergence of specific types
of sequences. Furthermore, when a subset of real numbers has specific
properties, the limit of a convergent sequence of real numbers from the subset
must also be in the subset.
3.1 Completeness of the Real Numbers
Currently, one difficulty with determining when a sequence converges is that
one must have an idea of what the limit of the sequence is in order to prove
convergence. This even holds for bounded monotone sequences as intuition
(and results) tell us the limit is either the least upper bound or greatest
lower bound of the sequence. Thus it is natural to ask, “Is there a way to
determine whether a sequence converges?”
3.1.1 Cauchy Sequences
If a sequence were to converge, then eventually all terms in the sequence
are as close to the limit as we would like. In particular, by the Triangle
Inequality, eventually all terms in the sequence are as close to each other as
we would like. This leads us to the notion of a Cauchy sequence.
Heuristic Definition. A sequence (an )n≥1 is said to be Cauchy if the terms
of (an )n≥1 are as close to each other as we would like as long as n is large
enough.
As with the definition of the limit of a sequence, the notion of Cauchy
sequence can be made mathematically precise.
Definition 3.1.1. A sequence (an)n≥1 of real numbers is said to be Cauchy if for all ε > 0 there exists an N ∈ N such that |an − am| < ε for all n, m ≥ N.
Note that it is possible that a sequence (an )n≥1 satisfies limn→∞ (an+1 − an ) = 0 but is not Cauchy. Indeed if an = 1/1 + 1/2 + · · · + 1/n for each n ∈ N, then an+1 − an = 1/(n + 1), which clearly converges to zero. However, it is possible to show that (an )n≥1 diverges to infinity. Although we cannot prove this divergence at this time, many students will have seen series in previous courses, and techniques from the last chapter of this course will enable this proof.
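Although the divergence is deferred, the behaviour can be checked numerically. The following sketch (not part of the notes; a computational illustration only) computes the harmonic partial sums and verifies the classical doubling bound a_{2^k} ≥ 1 + k/2, which is how the divergence is usually proved.

```python
# Numerical illustration (not from the notes): for the harmonic partial sums
# a_n = 1 + 1/2 + ... + 1/n, the gaps a_{n+1} - a_n = 1/(n+1) tend to zero,
# yet the doubling bound a_{2^k} >= 1 + k/2 shows (a_n) is unbounded,
# so it cannot be Cauchy.

def harmonic_partial_sums(n):
    """Return the list [a_1, ..., a_n] with a_m = sum of 1/k for k = 1..m."""
    sums, total = [], 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
        sums.append(total)
    return sums

a = harmonic_partial_sums(2 ** 12)

# consecutive gaps shrink to zero (up to floating-point rounding)...
assert abs((a[1000] - a[999]) - 1.0 / 1001) < 1e-12

# ...but a_{2^k} >= 1 + k/2, so the sequence grows without bound
for k in range(1, 13):
    assert a[2 ** k - 1] >= 1 + k / 2
```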
As our definition of Cauchy sequence was motivated by convergence, the
following result should not be surprising (and provides a plethora of examples
of Cauchy sequences).
Theorem 3.1.2. Every convergent sequence of real numbers is Cauchy.
Proof. Let (an )n≥1 be a convergent sequence of real numbers. Let L = limn→∞ an . To see that (an )n≥1 is Cauchy, let ε > 0 be arbitrary. Since L = limn→∞ an , there exists an N ∈ N such that |an − L| < ε/2 for all n ≥ N . Therefore, for all n, m ≥ N ,

|an − am | ≤ |an − L| + |L − am | < ε/2 + ε/2 = ε.

Thus, as ε > 0 was arbitrary, (an )n≥1 is Cauchy by definition.
3.1.2 Convergence of Cauchy Sequences
As our motivation for the study of Cauchy sequences was to find a method for demonstrating the convergence of a sequence without knowing its limit, it is natural to ask, "Does every Cauchy sequence converge?" One method for gaining intuition about the answer to this question is to see whether Cauchy sequences share properties with convergent sequences. In particular, analyzing Proposition 2.2.2 and its proof, we obtain the following.
Lemma 3.1.3. Every Cauchy sequence is bounded.

Proof. Let (an )n≥1 be a Cauchy sequence. Since (an )n≥1 is Cauchy, there exists an N ∈ N such that |an − am | < 1 for all n, m ≥ N . Hence, by letting m = N , we obtain that |an | ≤ |aN | + 1 for all n ≥ N by the Triangle Inequality.

Let M = max{|a1 |, |a2 |, . . . , |aN −1 |, |aN | + 1}. Using the above paragraph, we see that |an | ≤ M for all n ∈ N. Hence −M ≤ an ≤ M for all n ∈ N so (an )n≥1 is bounded.
As further intuition towards whether all Cauchy sequences converge, recall Proposition 2.4.2 demonstrates that every subsequence of a convergent sequence must converge. The following demonstrates the converse is true if our sequence is assumed to be Cauchy.
Lemma 3.1.4. Let (an )n≥1 be a Cauchy sequence. If a subsequence of
(an )n≥1 converges, then (an )n≥1 converges.
Proof. Let (an )n≥1 be a Cauchy sequence with a convergent subsequence (akn )n≥1 and let L = limn→∞ akn . We claim that limn→∞ an = L. To see this, let ε > 0 be arbitrary. Since (an )n≥1 is Cauchy, there exists an N ∈ N such that |an − am | < ε/2 for all n, m ≥ N . Furthermore, since L = limn→∞ akn , there exists a kj ≥ N such that |akj − L| < ε/2. Hence, if n ≥ N then

|an − L| ≤ |an − akj | + |akj − L| < ε/2 + ε/2 = ε.

Thus, as ε > 0 was arbitrary, (an )n≥1 converges to L by definition.
Using Lemma 3.1.4, we easily obtain the following.
Theorem 3.1.5 (Completeness of the Real Numbers). Every Cauchy
sequence of real numbers converges.
Proof. Let (an )n≥1 be a Cauchy sequence. By Lemma 3.1.3, (an )n≥1 is
bounded. Therefore the Bolzano-Weierstrass Theorem (Theorem 2.4.5)
implies that (an )n≥1 has a convergent subsequence. Hence Lemma 3.1.4
implies that (an )n≥1 converges.
Theorem 3.1.5 demonstrates that the real numbers form a complete space (a space where every Cauchy sequence converges). The terminology comes from the fact that complete spaces have no 'holes' in them. In fact, the Completeness of the Real Numbers is logically equivalent to the Least Upper Bound Property (i.e. if, instead of asking for the real numbers to have the Least Upper Bound Property, we asked for them to be complete, we would still end up with the real numbers).
3.2 Topology of the Real Numbers
Often when dealing with limits, it is useful to think of convergence in terms of open intervals. Indeed notice that |an − L| < ε for all n ≥ N if and only if an ∈ (L − ε, L + ε) for all n ≥ N . Consequently, open intervals have an important connection to properties of the real numbers. The goal of this section is to study how specific subsets of real numbers can give us results about real numbers.
3.2.1 Open Sets
Instead of just studying open intervals, we desire a larger class of sets. These
sets are characterized by the fact that each of their elements is contained in
an open interval which is contained in the set.
Definition 3.2.1. A set U ⊆ R is said to be open if whenever x ∈ U there exists an ε > 0 such that (x − ε, x + ε) ⊆ U .
Example 3.2.2. Unsurprisingly, each open interval is open. To see this, suppose a, b ∈ R are such that a < b. To see that (a, b) is open, let x ∈ (a, b) be arbitrary. If ε = min{x − a, b − x}, then (x − ε, x + ε) ⊆ (a, b). Thus, as x ∈ (a, b) was arbitrary, (a, b) is open.
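The choice ε = min{x − a, b − x} can be sanity-checked numerically. The sketch below is an illustration only (not part of the notes); the helper name is hypothetical.

```python
# Illustration (assumption: a < b): for each x in (a, b), the radius
# eps = min(x - a, b - x) keeps the interval (x - eps, x + eps) inside (a, b).

def open_ball_radius(a, b, x):
    """Largest eps with (x - eps, x + eps) contained in (a, b)."""
    assert a < x < b
    return min(x - a, b - x)

a, b = 0.0, 1.0
for i in range(1, 100):
    x = i / 100
    eps = open_ball_radius(a, b, x)
    assert eps > 0
    # the endpoints of (x - eps, x + eps) never leave [a, b]
    assert a <= x - eps and x + eps <= b
```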
Using similar arguments, it is possible to show that if a = −∞ and/or
b = ∞, then (a, b) is open. Consequently, (−∞, ∞) = R is open.
Example 3.2.3. The empty set is open since the definition of open is
vacuously true for ∅ (as there are no elements in the empty set).
Example 3.2.4. If a, b ∈ R are such that a < b, then [a, b) is not open. Indeed for all ε > 0, (a − ε, a + ε) is not a subset of [a, b) since a − ε/2 ∉ [a, b). Similar arguments can be used to show that (a, b] and [a, b] are not open.
With the above definition and examples, it is natural to ask “Can we
describe all open subsets of the real numbers?” The following gives us a
method for constructing open sets from other open sets.
Proposition 3.2.5. Let I be a non-empty set and for each i ∈ I let Ui be an open subset of R. Then

• ∪i∈I Ui is open in R, and

• ∩i∈I Ui is open in R provided I has a finite number of elements.

Proof. To see that ∪i∈I Ui is open, let x ∈ ∪i∈I Ui be arbitrary. Then x ∈ Ui0 for some i0 ∈ I. Therefore, as Ui0 is open, there exists an ε > 0 such that (x − ε, x + ε) ⊆ Ui0 . Hence (x − ε, x + ε) ⊆ ∪i∈I Ui . Since x ∈ ∪i∈I Ui was arbitrary, ∪i∈I Ui is open.

To see that ∩i∈I Ui is open in R provided I has a finite number of elements, let x ∈ ∩i∈I Ui be arbitrary. Hence x ∈ Ui for each i ∈ I. Since each Ui is open, for each i ∈ I there exists an εi > 0 such that (x − εi , x + εi ) ⊆ Ui . Let ε = min{εi | i ∈ I}. Since I has a finite number of elements, ε > 0. Furthermore, by the definition of ε, (x − ε, x + ε) ⊆ Ui for all i ∈ I. Hence (x − ε, x + ε) ⊆ ∩i∈I Ui . Since x ∈ ∩i∈I Ui was arbitrary, ∩i∈I Ui is open.
Note the proof of the intersection part of Proposition 3.2.5 does not work if I has an infinite number of elements, as the infimum of the εi may then be 0. The failure of the proof does not by itself mean the result is false when I has an infinite number of elements; nevertheless, the result is indeed false in that case. To see this, for each n ∈ N let Un = (−1/n, 1/n). Then

∩n≥1 Un = {0}

which is clearly not an open set.
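The counterexample can be checked pointwise: 0 lies in every Un, while any t ≠ 0 escapes Un once n ≥ 1/|t|. A small sketch (not from the notes):

```python
# Illustration (not from the notes): with U_n = (-1/n, 1/n), the point 0 lies
# in every U_n, while any t != 0 escapes U_n as soon as n >= 1/|t|, so the
# intersection of all the U_n is exactly {0}.

def in_U(n, t):
    """Is t in the open interval U_n = (-1/n, 1/n)?"""
    return -1.0 / n < t < 1.0 / n

assert all(in_U(n, 0.0) for n in range(1, 10_000))   # 0 is in every U_n

t = 0.001
assert in_U(999, t)          # t is still inside U_999
assert not in_U(1001, t)     # but t lies outside U_1001
```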
Proposition 3.2.5 shows us that the union of any number of open intervals
is an open set. In fact, our next result (Theorem 3.2.7) will demonstrate
that every open subset of the real numbers is a union of open intervals. To
prove this result, we will need the following mathematical construct. This
mathematical construct relaxes the notion of what it means for two objects
to be the same.
Definition 3.2.6. Let X be a set. A relation ∼ on the elements of X is
said to be an equivalence relation if:
1. (reflexive) x ∼ x for all x ∈ X,
2. (symmetric) if x ∼ y, then y ∼ x for all x, y ∈ X, and
3. (transitive) if x ∼ y and y ∼ z, then x ∼ z for all x, y, z ∈ X.
Given an x ∈ X, the set {y ∈ X | y ∼ x} is called the equivalence class of x.
Equivalence relations will be an essential component of the subsequent
chapter. For now, our first example of an equivalence relation is contained
in the proof of the following result.
Theorem 3.2.7. Every (non-empty) open subset of R is a union of open
intervals.
Proof. Let U be an open subset of R. Define an equivalence relation on U as
follows: if x, y ∈ U , then x ∼ y if and only if [x, y] ⊆ U and [y, x] ⊆ U (one
of these intervals is usually empty). It is not too difficult to see that ∼ is
an equivalence relation on U (indeed ∼ is clearly reflexive and symmetric; transitivity takes a moment of thought, as there are four cases to check).
For each x ∈ U , let Ex denote the equivalence class of x with respect to
∼. Clearly

U = ∪x∈U Ex

as x ∈ Ex for all x ∈ U . Hence if each Ex is an open interval, the proof will be complete.
Let x ∈ U be fixed. Let αx = inf(Ex ) and βx = sup(Ex ). We claim that Ex = (αx , βx ).
First, we claim that αx < βx . To see this, notice that x ∈ Ex ⊆ U . Hence, as U is open, there exists an ε > 0 such that (x − ε, x + ε) ⊆ U . Clearly y ∼ x for all y ∈ (x − ε, x + ε) so

αx ≤ x − ε < x + ε ≤ βx .
To see that (αx , βx ) ⊆ Ex , let y ∈ (αx , βx ) be arbitrary. Since αx < y < βx , by the definition of inf and sup there exist z1 , z2 ∈ Ex such that

αx ≤ z1 < y < z2 ≤ βx .

Since z1 , z2 ∈ Ex , we have z1 ∼ x and z2 ∼ x. Thus z1 ∼ z2 so [z1 , z2 ] ⊆ U . Hence y ∈ [z1 , z2 ] ⊆ U . Therefore, as y ∈ (αx , βx ) was arbitrary, (αx , βx ) ⊆ Ex .
To see that Ex ⊆ (αx , βx ), note that Ex ⊆ (αx , βx ) ∪ {αx , βx } by the definition of αx and βx . Thus it suffices to show that αx , βx ∉ Ex . Suppose βx ∈ Ex (this implies βx ≠ ∞). Then βx ∈ U so there exists an ε > 0 so that (βx − ε, βx + ε) ⊆ U . Hence βx + ε/2 ∼ βx ∼ x (as βx ∈ Ex ). Hence βx + ε/2 ∈ Ex . However βx + ε/2 > βx so βx + ε/2 ∈ Ex contradicts the fact that βx = sup(Ex ). Hence we have obtained a contradiction so βx ∉ Ex . Similar arguments show that αx ∉ Ex . Hence Ex = (αx , βx ), thereby completing the proof.
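The equivalence classes Ex are exactly the maximal open intervals of U. When U is presented as a finite union of open intervals, these classes can be computed by merging overlapping pieces. The sketch below is an illustration (not from the notes; the helper name is hypothetical).

```python
# Illustration: given an open set U presented as a finite union of open
# intervals, compute its maximal open intervals -- the equivalence classes
# E_x from the proof of Theorem 3.2.7.

def maximal_intervals(intervals):
    """Merge a list of open intervals (a, b) into maximal disjoint ones.

    Open intervals merge only when they genuinely overlap; (0, 1) and
    (1, 2) stay separate because the point 1 is in neither."""
    pieces = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in pieces:
        if merged and a < merged[-1][1]:      # strict overlap with last class
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

assert maximal_intervals([(0, 2), (1, 3), (5, 6)]) == [(0, 3), (5, 6)]
assert maximal_intervals([(0, 1), (1, 2)]) == [(0, 1), (1, 2)]
```

The strict inequality in the overlap test is the key design point: closed-interval merging would use `a <= merged[-1][1]`, but touching open intervals leave the shared endpoint uncovered.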
Analyzing the above proof, a natural question to ask is, “How many
open intervals do we need in the union?” Clearly instead of unioning over
all elements in the open set, we can take a union of all of the equivalence
classes. Consequently, we can index the open intervals by a single element
in each equivalence class. Hence, as each open interval contains a rational
number (by the homework), each open set of real numbers is a union of open
intervals indexed by a subset of the rational numbers. Consequently, it is
natural to ask, “How many rational numbers are there?” This question will
be a focus of the next chapter.
To conclude this subsection and to give motivation for the study of open
sets, we note the following result (whose proof takes a moment of thought).
Proposition 3.2.8. Let (an )n≥1 be a sequence of real numbers. A number
L ∈ R is the limit of (an )n≥1 if and only if for every open set U ⊆ R such
that L ∈ U , there exists an N ∈ N such that an ∈ U for all n ≥ N .
The above demonstrates an alternative definition for the limit of a sequence of real numbers. This characterization is useful in generalizing limits to abstract spaces where one defines a good notion of open sets (a topology, which satisfies the conclusions of Proposition 3.2.5) which then determines which sequences converge.
3.2.2 Closed Sets
Although the notion of open sets is important in future courses, the following
notion is far more important for this course.
Definition 3.2.9. A set F ⊆ R is said to be closed if F^c is open.
Using our notion of open sets, we easily see that ∅ and R are both closed sets (each is the complement of the other, and both are open). Furthermore, for all a, b ∈ R with a < b, we see that [a, b] is closed since

[a, b]^c = (−∞, a) ∪ (b, ∞)

which is a union of open sets and thus open by Proposition 3.2.5.
It is important to note that there are subsets of R that are neither open nor closed. Indeed if a, b ∈ R are such that a < b, then [a, b) is not open and not closed: [a, b) is not open by Example 3.2.4, and since

[a, b)^c = (−∞, a) ∪ [b, ∞)

we see that [a, b)^c is not open and thus [a, b) is not closed. However, since

[a, ∞)^c = (−∞, a)

and

(−∞, b]^c = (b, ∞)

are open sets, [a, ∞) and (−∞, b] are closed sets.
Due to the nature of the complement of a set, the following trivially
follows from Proposition 3.2.5.
Proposition 3.2.10. Let I be a non-empty set and for each i ∈ I let Fi be a closed subset of R. Then

• ∩i∈I Fi is closed in R, and

• ∪i∈I Fi is closed in R provided I has a finite number of elements.

Proof. Since

(∩i∈I Fi )^c = ∪i∈I Fi ^c

and

(∪i∈I Fi )^c = ∩i∈I Fi ^c

by the homework, the result follows by the definition of a closed set along with Proposition 3.2.5.
Example 3.2.11. Proposition 3.2.10 can be used to show that Z is closed
in R as {n} is a closed set for each n ∈ Z.
The reason we are interested in closed sets is the following result that
shows that closed sets contain all of their limits.
Theorem 3.2.12. A set F ⊆ R is closed if and only if whenever (an )n≥1
is a convergent sequence of real numbers with an ∈ F for all n ∈ N, then
limn→∞ an ∈ F .
Proof. We divide the proof into two cases: either F is closed, or F is not
closed.
Suppose F ⊆ R is a closed set. Let (an )n≥1 be a convergent sequence of
real numbers with an ∈ F for all n ∈ N and let L = limn→∞ an . Suppose L ∉ F . Hence L ∈ F^c. Since F is closed, F^c is open. Therefore L ∈ F^c implies there exists an ε > 0 such that (L − ε, L + ε) ⊆ F^c. However, since L = limn→∞ an , there exists an N ∈ N such that aN ∈ (L − ε, L + ε) ⊆ F^c. Hence aN ∈ F^c and aN ∈ F , which is a contradiction. Therefore, it must be the case that L ∈ F .
Suppose F is not a closed set. Hence F^c is not an open set. Therefore there exists an L ∈ F^c such that for each n ∈ N there exists a number an ∈ (L − 1/n, L + 1/n) with an ∉ F^c. Hence (an )n≥1 is a sequence of real numbers with an ∈ F and

L − 1/n ≤ an ≤ L + 1/n

for all n ∈ N. Therefore, by the Squeeze Theorem (Theorem 2.3.8), (an )n≥1 converges to L. Since an ∈ F for all n ∈ N and L ∉ F , the proof is complete.
Example 3.2.13. The set

X = {1/n | n ∈ N}

is not closed since 1/n ∈ X for all n ∈ N and 0 = limn→∞ 1/n yet 0 ∉ X. However, one can show that

{1/n | n ∈ N} ∪ {0}

is closed by showing that every convergent sequence whose elements are in this set (of which there are lots) converges to a number in this set.
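The failure of closedness can be exhibited concretely with the sequence an = 1/n. The sketch below (a numerical illustration, not part of the notes) uses a finite sample of the set X.

```python
# Illustration (not from the notes) of Theorem 3.2.12 and Example 3.2.13: the
# sequence a_n = 1/n lies in X = {1/n | n in N}, converges to 0, and 0 is not
# in X -- witnessing that X is not closed.  Adding 0 repairs this.

def a(n):
    return 1.0 / n

X_sample = {a(n) for n in range(1, 10_001)}   # finite sample of X

limit = 0.0
assert limit not in X_sample                  # the limit escapes X...
assert abs(a(10_000) - limit) < 1e-3          # ...even though terms approach it
assert limit in (X_sample | {0.0})            # X union {0} contains the limit
```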
3.3 Compactness
There are sets, known as compact sets, that are nicer than closed sets in
many ways. In particular, compact sets are far more useful in future studies.
3.3.1 Definition of Compactness
To define the notion of a compact set, we will need the following definition.
Definition 3.3.1. Let X ⊆ R. A collection {Ui | i ∈ I} of subsets of R is said to be an open cover of X if Ui is open for all i ∈ I and X ⊆ ∪i∈I Ui .

For example, if for each n ∈ N we let Un = (−n, n), then {Un | n ∈ N} is an open cover of R (and of any subset of R). In addition, {(1/n, 1) | n ∈ N} is an open cover of (0, 1).
Definition 3.3.2. A set K ⊆ R is said to be compact if every open cover has a finite subcover; that is, if {Ui | i ∈ I} is an open cover of K, then there exist an n ∈ N and i1 , . . . , in ∈ I such that K ⊆ Ui1 ∪ · · · ∪ Uin .
Example 3.3.3. Let

K = {0} ∪ {1/n | n ∈ N}.

We claim that K is a compact set.

To see this, let {Ui | i ∈ I} be any open cover of K. Since 0 ∈ ∪i∈I Ui , there exists an i0 ∈ I such that 0 ∈ Ui0 . Therefore there exists an ε > 0 so that (−ε, ε) ⊆ Ui0 .

Since limn→∞ 1/n = 0, there exists an N ∈ N such that 1/n ∈ (−ε, ε) ⊆ Ui0 for all n ≥ N . Furthermore, since K ⊆ ∪i∈I Ui , for each n < N we may choose an in ∈ I such that 1/n ∈ Uin . Hence, by construction,

K ⊆ Ui0 ∪ Ui1 ∪ · · · ∪ UiN −1

so {Ui0 , . . . , UiN −1 } is a finite subcover of K.
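The size of the finite subcover can be made concrete: once a cover set contains (−ε, ε), only the finitely many points 1/n with 1/n ≥ ε remain. The sketch below (an illustration, not from the notes; it assumes a cover consisting of one set around 0 plus one set per point 1/n) counts them.

```python
import math

# Illustration: for K = {0} U {1/n | n in N}, suppose one cover set contains
# (-eps, eps).  The points 1/n outside it are exactly those with 1/n >= eps,
# i.e. n <= 1/eps, so finitely many extra cover sets suffice.

def finite_subcover_size(eps):
    """Number of cover sets needed: one for (-eps, eps) plus one per 1/n >= eps."""
    assert 0 < eps < 1
    n_max = math.floor(1 / eps)           # 1/n >= eps  iff  n <= 1/eps
    return 1 + n_max

# with eps = 1/8, the points 1, 1/2, ..., 1/8 each need a set, plus (-eps, eps)
assert finite_subcover_size(0.125) == 9
```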
It is natural to ask whether R is compact. Since the open cover {(−n, n) |
n ∈ N} of R clearly has no finite subcover, we see that R is not compact.
Using the same open cover, we obtain the following.
Theorem 3.3.4. If K ⊆ R is compact, then K is bounded.
Proof. Let K ⊆ R be compact. For each n ∈ N, let Un = (−n, n). Therefore, since ∪n≥1 Un = R, we have that {Un | n ∈ N} is an open cover of K. Since K is compact, there exist numbers k1 , . . . , km ∈ N such that {Uk1 , . . . , Ukm } is an open cover of K. Therefore, if M = max{k1 , . . . , km }, then

K ⊆ Uk1 ∪ · · · ∪ Ukm = (−M, M ).

Hence K is bounded.
Analyzing the open cover {(1/n, 1) | n ∈ N} of (0, 1), we see that this open cover has no finite subcover and thus (0, 1) is not compact. Using similar ideas, we obtain the following.
Theorem 3.3.5. If K ⊆ R is compact, then K is closed.
Proof. Let K ⊆ R be compact. Suppose K is not closed. By Theorem 3.2.12, there exists a convergent sequence (an )n≥1 such that an ∈ K for all n ∈ N yet L = limn→∞ an ∉ K. We will use the sequence (an )n≥1 to obtain a contradiction to the fact that K is compact.
For each n ∈ N let Un = [L − 1/n, L + 1/n]^c. Notice that each Un is open and

∪n≥1 Un = R \ {L}.

Hence, as L ∉ K, {Un | n ∈ N} is an open cover of K.
We claim that {Un | n ∈ N} does not contain a finite subcover of K. To see this, suppose to the contrary that Uk1 , . . . , Ukm is a finite subcover of K for some k1 , . . . , km ∈ N. Let M = max{k1 , . . . , km }. Then

K ⊆ Uk1 ∪ · · · ∪ Ukm ⊆ [L − 1/M, L + 1/M ]^c.

Thus (L − 1/M, L + 1/M ) ⊆ K^c. However, since L = limn→∞ an , there exists an N ∈ N such that aN ∈ (L − 1/M, L + 1/M ). Therefore, as aN ∈ K by assumption and we have demonstrated that aN ∉ K, we have obtained a contradiction. Therefore {Un | n ∈ N} does not contain a finite subcover of K, thereby contradicting the fact that K is compact. Therefore, as we have obtained a contradiction, it must be the case that K is closed.
3.3.2 The Heine-Borel Theorem
Theorems 3.3.4 and 3.3.5 show that compact subsets of R are closed and
bounded. In fact, the following theorem shows these are the only compact
subsets of R.
Theorem 3.3.6 (The Heine-Borel Theorem). A set K ⊆ R is compact
if and only if K is closed and bounded.
Proof. If K is a compact subset of R, then K is bounded and closed by
Theorems 3.3.4 and 3.3.5 respectively.
Let K ⊆ R be closed and bounded. To see that K is compact, let {Ui | i ∈ I} be an arbitrary open cover of K. We claim that {Ui | i ∈ I} has a finite subcover of K. To see this, suppose to the contrary that {Ui | i ∈ I} does not have a finite subcover of K.
Since K is bounded, there exists an M ∈ R such that K ⊆ [−M, M ].
Since {Ui | i ∈ I} is an open cover that does not have a finite subcover of
K, either
K ∩ [−M, 0] or K ∩ [0, M ]
does not have a finite subcover (as if each has a finite subcover, then combining
the finite subcovers yields a finite subcover of K). Choose I1 = [a1 , b1 ] from
{[−M, 0], [0, M ]} so that K ∩ I1 does not have a finite subcover. Note that
|b1 − a1 | = M .
Using the ideas in the above paragraph, there must exist closed intervals I1 ⊇ I2 ⊇ I3 ⊇ · · · such that K ∩ In does not have a finite subcover for all n ∈ N and, if In = [an , bn ], then |bn − an | = M/2^(n−1). Since K ∩ In does not have a finite subcover for any n ∈ N, K ∩ In ≠ ∅ for all n ∈ N. Hence, for each n ∈ N, we can choose a cn ∈ K ∩ In .

We claim that the sequence (cn )n≥1 is Cauchy. To see this, let ε > 0 be arbitrary. Since limn→∞ 1/2^n = 0 (see the homework), there exists an N ∈ N such that M/2^(N−1) < ε. Therefore, if n, m ≥ N , then cn , cm ∈ IN so

|cn − cm | ≤ |bN − aN | = M/2^(N−1) < ε.

Hence, as ε > 0 was arbitrary, (cn )n≥1 is Cauchy. Hence L = limn→∞ cn exists by the Completeness of the Real Numbers (Theorem 3.1.5). Furthermore, since K is closed by assumption, L ∈ K by Theorem 3.2.12. Furthermore, note that L ∈ In for all n ∈ N by Theorem 3.2.12 since In is closed and cm ∈ In for all m ≥ n.

Since {Ui | i ∈ I} is an open cover of K, there exists an i0 ∈ I so that L ∈ Ui0 . Hence, since Ui0 is open, there exists an ε > 0 so that (L − ε, L + ε) ⊆ Ui0 . Since limn→∞ 1/2^n = 0, there exists an N ∈ N such that |bN − aN | = M/2^(N−1) < ε. Hence, since L ∈ IN so aN ≤ L ≤ bN , it must be the case that IN ⊆ (L − ε, L + ε) ⊆ Ui0 . Hence {Ui0 } is a finite subcover of K ∩ IN , which contradicts the fact that K ∩ IN does not admit a finite subcover. Hence we have obtained a contradiction, so {Ui | i ∈ I} must admit a finite subcover of K. Since {Ui | i ∈ I} was an arbitrary open cover of K, K is compact by definition.
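The halving mechanism in the proof forces the chosen points together geometrically. The sketch below (a numerical illustration, not from the notes; M and the "always keep the left half" choice are illustrative assumptions) tracks the interval lengths M/2^(n−1) and checks the Cauchy estimate.

```python
# Illustration (not from the notes) of the bisection step in the proof of the
# Heine-Borel Theorem: starting from one half of [-M, M], repeatedly keep one
# half, so the n-th interval I_n = [a_n, b_n] has length M / 2**(n-1) and
# points chosen inside the nested intervals form a Cauchy sequence.

M = 8.0                                   # illustrative bound with K in [-M, M]
intervals = [(-M, 0.0)]                   # I_1: one half of [-M, M], length M
for _ in range(30):
    a, b = intervals[-1]
    mid = (a + b) / 2
    intervals.append((a, mid))            # keep (arbitrarily) the left half

# |b_n - a_n| = M / 2**(n-1) for each n
for n, (a, b) in enumerate(intervals, start=1):
    assert abs((b - a) - M / 2 ** (n - 1)) < 1e-12

# points c_n in I_n are Cauchy: for n >= N they stay within length(I_N) of c_N
c = [(a + b) / 2 for a, b in intervals]   # pick c_n = midpoint of I_n
N = 20
for n in range(N, len(c)):
    assert abs(c[n] - c[N - 1]) <= M / 2 ** (N - 1)
```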
The Heine-Borel Theorem shows us that compact subsets of R are precisely
the closed and bounded sets. In other topological spaces, this need not be
the case. However, compact subsets in those spaces play the same role as
closed, bounded sets play in this course.
3.3.3 Sequential Compactness
In topological spaces, there are other notions of compactness. In particular,
the following is a notion which requires that each sequence in the set has a
convergent subsequence in the set.
Definition 3.3.7. A set K ⊆ R is said to be sequentially compact if whenever (an )n≥1 is a sequence of real numbers with an ∈ K for all n ∈ N, there exists a subsequence of (an )n≥1 that converges to an element of K.
Perhaps unsurprisingly, sequentially compact and compact are the same notion for subsets of the real numbers.
Theorem 3.3.8. A set K ⊆ R is sequentially compact if and only if K is
compact.
Proof. Suppose K is compact. Thus K is closed and bounded by the Heine-Borel Theorem (Theorem 3.3.6). Let (an )n≥1 be an arbitrary sequence of real numbers with an ∈ K for all n ∈ N. Thus (an )n≥1 must be bounded and thus has a convergent subsequence (akn )n≥1 by the Bolzano-Weierstrass Theorem (Theorem 2.4.5). Since an ∈ K for all n ∈ N and since K is closed, the limit of (akn )n≥1 must be in K. Hence, as (an )n≥1 was arbitrary, K is sequentially compact.
Suppose K is sequentially compact. We claim that K must be bounded. Indeed, if K is not bounded above (a similar argument applies if K is not bounded below), then for all n ∈ N there exists an an ∈ K such that an ≥ n. Therefore, since every subsequence of (an )n≥1 is unbounded and thus cannot converge by Proposition 2.2.2, (an )n≥1 does not have a convergent subsequence. As this contradicts the fact that K is sequentially compact, we must have that K is bounded.
Next we claim that K is closed. Indeed, if K is not closed, then by Theorem 3.2.12 there exists a convergent sequence (an )n≥1 with an ∈ K for all n ∈ N such that limn→∞ an ∉ K. Therefore Proposition 2.4.2 implies that every subsequence of (an )n≥1 converges to limn→∞ an ∉ K. As this contradicts the fact that K is sequentially compact, we must have that K is closed.
Hence K being sequentially compact implies K is closed and bounded.
Hence the Heine-Borel Theorem (Theorem 3.3.6) implies that K is compact.
3.3.4 The Finite Intersection Property
There is one additional notion related to compactness that is particularly
useful. To begin, we need a method for taking a set X and making the
smallest possible closed set containing X.
Definition 3.3.9. The closure of a subset X of R, denoted X̄, is the smallest closed subset of R containing X.

By Proposition 3.2.10, if

FX = {Y ⊆ R | X ⊆ Y, Y is closed}

then

X̄ = ∩Y ∈FX Y.

Consequently, we obtain that the closure of (0, 1) is [0, 1]. Furthermore, the closure of a closed set X is just X.
Example 3.3.10. The closure of the rational numbers in the real numbers is the real numbers; that is, Q̄ = R. To see this, we note each element of R is a limit of elements of Q (by the homework). Consequently, by Theorem 3.2.12, the only closed set containing Q is R.
Generalizing the idea in the above example, we obtain the following
characterization of the closure of a set of real numbers.
Lemma 3.3.11. Let X ⊆ R. If z ∈ X̄, then for all ε > 0 there exists an x ∈ X so that |x − z| < ε.

Proof. Let z ∈ X̄. Suppose to the contrary that there exists an ε > 0 so that |x − z| ≥ ε for all x ∈ X. Then (z − ε, z + ε) ∩ X = ∅. Hence X ⊆ (−∞, z − ε] ∪ [z + ε, ∞). Since (−∞, z − ε] ∪ [z + ε, ∞) is a closed set containing X, we have X̄ ⊆ (−∞, z − ε] ∪ [z + ε, ∞) by the definition of the closure. As X̄ ⊆ (−∞, z − ε] ∪ [z + ε, ∞) contradicts the fact that z ∈ X̄, the result follows.
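The lemma says points of the closure are at distance zero from X. The sketch below (an illustration, not from the notes; the helper is hypothetical) exhibits this for X = (0, 1) and z = 0.

```python
# Illustration (not from the notes) of Lemma 3.3.11 with X = (0, 1): the point
# z = 0 lies in the closure [0, 1], and for every eps > 0 the hypothetical
# helper below produces an x in X with |x - z| < eps.

def witness_in_X(z, eps):
    """A point of X = (0, 1) within eps of z = 0 (hypothetical helper)."""
    x = min(eps / 2, 0.5)
    assert 0 < x < 1                      # x really belongs to X
    return x

z = 0.0
for eps in [1.0, 0.1, 0.01, 1e-6]:
    x = witness_in_X(z, eps)
    assert abs(x - z) < eps
```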
To describe the final property related to compactness, we require the
following definition.
Definition 3.3.12. A collection {Ai | i ∈ I} of subsets of R is said to have the finite intersection property if whenever J ⊆ I has a finite number of elements, ∩j∈J Aj ≠ ∅.
Theorem 3.3.13. A set K ⊆ R is compact if and only if whenever {Fi | i ∈ I} is a collection of closed subsets of K with the finite intersection property, then ∩i∈I Fi ≠ ∅.
Proof. Let K be a compact subset of R. Let {Fi | i ∈ I} be a collection of closed subsets of K with the finite intersection property. We must show that ∩i∈I Fi ≠ ∅. Suppose to the contrary that ∩i∈I Fi = ∅. For each i ∈ I, let Ui = Fi ^c. Then

∪i∈I Ui = ∪i∈I Fi ^c = (∩i∈I Fi )^c = ∅^c = R

by the homework. Hence {Ui | i ∈ I} is an open cover of K. However, for any n ∈ N and i1 , . . . , in ∈ I, we have that

Ui1 ∪ · · · ∪ Uin = Fi1 ^c ∪ · · · ∪ Fin ^c = (Fi1 ∩ · · · ∩ Fin )^c.

However, by the assumptions on {Fi | i ∈ I},

∅ ≠ Fi1 ∩ · · · ∩ Fin ⊆ K

so

K ⊄ (Fi1 ∩ · · · ∩ Fin )^c.

Thus {Ui | i ∈ I} is an open cover of K without any finite subcover. As this contradicts the fact that K is compact, it must have been the case that ∩i∈I Fi ≠ ∅.
For the other direction, we will show that K is sequentially compact,
which implies K is compact by Theorem 3.3.8. To see that K is sequentially
compact, let (an )n≥1 be an arbitrary sequence of real numbers such that
an ∈ K for all n ∈ N. Furthermore, for each m ∈ N, let
Fm be the closure of {an | n ≥ m}. Therefore F = {Fn | n ∈ N} is a collection of closed subsets of K (closed as we took the closure, and subsets of K since an ∈ K for all n ∈ N and by the definition of the closure). Furthermore, F has the finite intersection property since if n1 , . . . , nm ∈ N and k = max{n1 , . . . , nm }, then

Fn1 ∩ · · · ∩ Fnm = Fk .
Therefore, by the assumptions of this direction of the proof, there exists an L ∈ R such that L ∈ ∩n≥1 Fn . Since Fn ⊆ K for all n ∈ N, we obtain that L ∈ K.
We claim there exists a subsequence of (an )n≥1 that converges to L. To construct such a subsequence, first note that L ∈ F1 , the closure of {an | n ≥ 1}. Hence, Lemma 3.3.11 implies there exists a k1 ∈ N such that |ak1 − L| < 1.

Now, suppose we have constructed k1 < k2 < · · · < kn such that |akj − L| < 1/j for all 1 ≤ j ≤ n. To construct kn+1 , we note that L ∈ Fkn +1 , the closure of {am | m ≥ kn + 1}. Hence, Lemma 3.3.11 implies there exists a kn+1 ∈ N such that kn+1 > kn and |akn+1 − L| < 1/(n + 1).
By recursion, we obtain a subsequence (akn )n≥1 of (an )n≥1 such that |akn − L| < 1/n for all n ∈ N. Therefore, since limn→∞ 1/n = 0, we obtain that
(akn )n≥1 converges to L ∈ K. Therefore, since (an )n≥1 was an arbitrary
sequence of real numbers such that an ∈ K for all n ∈ N, we obtain that K
is sequentially compact by definition. Hence the proof is complete.
Chapter 4
Cardinality of Sets
In the previous chapter, it was shown in Theorem 3.2.7 that every open subset of R is a union of open intervals. In fact, as we can choose the intervals to have empty intersection and as we can choose one rational number from each interval, each open subset of R is a union of open intervals indexed by a subset of the rational numbers. The question is, "How many rational numbers are there?"
This question leads us to the notion of the cardinality of a set; that is, a measure of how many elements a set contains. In particular, we will see that there are different types of infinities. This notion of various types of infinities was the life's work of the mathematician Cantor and eventually drove him insane. Thus we should tread carefully.
4.1 Functions
To discuss how we can compare the size of two sets, we must introduce a
mathematical object that we have avoided until this point: functions.
4.1.1 The Axiom of Choice
The most useful and accurate method for defining functions is to use the
following operation on sets.
Definition 4.1.1. Given two non-empty sets X and Y , the Cartesian product
of X and Y , denoted X × Y , is the set
X × Y = {(x, y) | x ∈ X, y ∈ Y }.
Definition 4.1.2. Given two non-empty sets X and Y , a function f from X to Y , denoted f : X → Y , is a subset S of X × Y such that for each x ∈ X there is a unique element denoted f (x) ∈ Y such that (x, f (x)) ∈ S (that is, a function is defined by its graph).
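The graph definition can be encoded directly. The sketch below (an illustration, not from the notes; the helper name is hypothetical) represents a graph as a set of pairs and checks the defining "exactly one output per input" condition.

```python
# Illustration: a function X -> Y as a subset S of X x Y such that each
# x in X appears in exactly one pair (x, f(x)).

def is_function_graph(S, X, Y):
    """Check that the set of pairs S defines a function from X to Y."""
    if any(x not in X or y not in Y for x, y in S):
        return False
    outputs = {}
    for x, y in S:
        if x in outputs:                  # two outputs for one input: not a function
            return False
        outputs[x] = y
    return set(outputs) == set(X)         # every input must get an output

X, Y = {1, 2, 3}, {"a", "b"}
assert is_function_graph({(1, "a"), (2, "a"), (3, "b")}, X, Y)
assert not is_function_graph({(1, "a"), (1, "b"), (2, "a"), (3, "b")}, X, Y)
assert not is_function_graph({(1, "a"), (2, "a")}, X, Y)   # 3 has no output
```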
Many of the ‘operations’ and ‘relations’ discussed in previous chapters in
these notes are actually functions in disguise.
Example 4.1.3. Notice f, g : R × R → R defined by f ((x, y)) = x + y
and g((x, y)) = xy are functions. Hence the operations of addition and
multiplication on R are functions.
Example 4.1.4. Sequences of real numbers are actually functions. Indeed each sequence (an )n≥1 can be described via a function f : N → R where f (n) = an . Conversely, given a function f : N → R, we may define the sequence (f (n))n≥1 .
In the subsequent two examples, we remind the reader of two mathematical objects that will be essential in what follows.
Example 4.1.5. Let X be a non-empty set and let ⪯ be a partial ordering (Definition 1.3.3) on X. Notice we can define a function f : X × X → {0, 1} by f ((x1 , x2 )) = 1 if x1 ⪯ x2 and f ((x1 , x2 )) = 0 otherwise. Notice the fact that ⪯ is a partial ordering implies that

• f ((x, x)) = 1 for all x ∈ X,

• if f ((x, y)) = f ((y, x)) = 1, then x = y, and

• if f ((x, y)) = f ((y, z)) = 1, then f ((x, z)) = 1.

Conversely, if g : X × X → {0, 1} has the above three properties, then we can define a partial ordering ⪯ on X by x1 ⪯ x2 if and only if g((x1 , x2 )) = 1. Consequently, a partial ordering on X can be viewed as a function on X × X with specific properties. Furthermore, said ordering is a total ordering precisely when either f ((x, y)) = 1 or f ((y, x)) = 1 for all x, y ∈ X.
Example 4.1.6. Let X be a non-empty set and let ∼ be an equivalence relation (Definition 3.2.6) on X. Notice we can define a function f : X × X → {0, 1} by f ((x1 , x2 )) = 1 if x1 ∼ x2 and f ((x1 , x2 )) = 0 otherwise. Notice the fact that ∼ is an equivalence relation implies that

• f ((x, x)) = 1 for all x ∈ X,

• if f ((x, y)) = 1, then f ((y, x)) = 1 for all x, y ∈ X, and

• if f ((x, y)) = f ((y, z)) = 1, then f ((x, z)) = 1.
Conversely, if g : X × X → {0, 1} has the above three properties, then
we can define an equivalence relation on X by x1 ∼ x2 if and only if
g((x1 , x2 )) = 1. Consequently, an equivalence relation on X can be viewed
as a function on X × X with specific properties.
Example 4.1.7. Given two non-empty sets X and Y , there is a natural way
to view
X × Y = {f : {1, 2} → X ∪ Y | f (1) ∈ X, f (2) ∈ Y }.
Indeed, a function f : {1, 2} → X ∪ Y is uniquely determined by the values
f (1) and f (2). Consequently, an f : {1, 2} → X ∪ Y as defined in the above
set can be viewed as the pair (f (1), f (2)). Conversely a pair (x, y) ∈ X × Y
can be represented by the function f : {1, 2} → X ∪ Y defined by f (1) = x
and f (2) = y.
In Examples 4.1.5, 4.1.6, and 4.1.7, we have exhibited equivalent ways of
looking at a single object. In doing so, we have created a nice correspondence
between the various forms of the objects. Fully describing what we mean by
such a correspondence will be postponed to the next subsection. For now,
we desire to extend the notion of the products of sets.
Let X1 , . . . , Xn be non-empty sets. We define the product of these sets
to be
X1 × · · · × Xn = {(x1 , . . . , xn ) | xj ∈ Xj for all j ∈ {1, . . . , n}}.
Notice we can view X1 × · · · × Xn as a set of functions in a similar manner to Example 4.1.7. Indeed

X1 × · · · × Xn = {f : {1, . . . , n} → X1 ∪ · · · ∪ Xn | f (j) ∈ Xj for all j ∈ {1, . . . , n}}.
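For finitely many finite sets, this product-as-functions picture can be made completely explicit, with no extra axiom needed: the choice functions can simply be listed. A sketch using Python's itertools (an illustration, not from the notes):

```python
from itertools import product

# Illustration: for finitely many finite sets, the set of choice functions
# f : {1, ..., n} -> union of the X_k with f(k) in X_k is exactly the
# Cartesian product, and it can be enumerated outright.

X1, X2, X3 = {0, 1}, {"a"}, {True, False}

# each tuple from product(...) becomes a dict {1: f(1), 2: f(2), 3: f(3)}
choice_functions = [dict(enumerate(t, start=1)) for t in product(X1, X2, X3)]

assert len(choice_functions) == len(X1) * len(X2) * len(X3)   # 2 * 1 * 2 = 4
for f in choice_functions:
    assert f[1] in X1 and f[2] in X2 and f[3] in X3
```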
But what happens if we want to take a product of an infinite number of sets?
Given a non-empty set I and a collection of non-empty sets {Xi | i ∈ I}, we would like to define the product

∏i∈I Xi = {f : I → ∪i∈I Xi | f (k) ∈ Xk for all k ∈ I}.
However, we must ask, “Is the above set non-empty?” That is, how do we
know there is always such a function? The answer is, because we add an
axiom to make it so.
Axiom 4.1.8 (The Axiom of Choice). Given a non-empty set I and a collection of non-empty sets {Xi | i ∈ I}, the product ∏i∈I Xi is non-empty. Any function f ∈ ∏i∈I Xi is called a choice function.
One may ask, “Why Mr. Anderson? Why? Why do we include the
Axiom of Choice?” The short answer is, of course, “Because I choose to.”
It turns out that the Axiom of Choice is independent from the axioms
of (Zermelo–Fraenkel) set theory. This means that if one starts with the
standard axioms of set theory, one can neither prove nor disprove the Axiom
of Choice. Thus we have the option on whether to include or exclude the
Axiom of Choice from our theory.
We will allow the use of the Axiom of Choice. In fact, we have already used
a form of the Axiom of Choice (called countable choice) when constructing
sequences in the past two chapters. Of course, mathematicians like to see
what can be done if one excludes certain assumptions from their theories.
By allowing only certain forms of the Axiom of Choice, one obtains various
forms of calculus where some of the results in these notes are either false, or
far more difficult to prove. But let’s choose to make the correct decision and
include the Axiom of Choice.
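For a finite index set no axiom is needed: a choice function can simply be computed, as the following Python sketch (with illustrative names) shows. It is only for infinite collections, where no uniform rule for choosing is available, that the axiom earns its keep.

```python
# Build a choice function for a finite collection {X_i | i in I},
# represented as a dict mapping each index i to a non-empty set X_i.
def choice_function(sets):
    return {i: next(iter(X_i)) for i, X_i in sets.items()}  # pick any element

sets = {1: {10, 20}, 2: {"a"}, 3: {3.5, 7.25}}
f = choice_function(sets)
assert all(f[i] in sets[i] for i in sets)  # f(i) lies in X_i for each index i
```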
4.1.2
Bijections
As we need to deal with functions throughout the remainder of the course,
we will need some notation and definitions.
Given a function f : X → Y and an A ⊆ X, we define
f (A) = {f (x) | x ∈ A} ⊆ Y.
Definition 4.1.9. Given a function f : X → Y , the range of f is f (X).
Using the notion of the range, we can define an important property we
may desire our functions to have.
Definition 4.1.10. A function f : X → Y is said to be surjective (or onto) if
f (X) = Y ; that is, for each y ∈ Y there exists an x ∈ X such that f (x) = y.
To illustrate when a function is surjective or not, consider the following
diagrams.
(Diagram: two arrow diagrams f : X → Y ; in the first, f is surjective; in the second, f is not surjective.)
4.1. FUNCTIONS
Example 4.1.11. Consider the function f : [0, 1] → [0, 2] defined by f (x) =
x2 . Notice f is not surjective since f (x) ≠ 2 for all x ∈ [0, 1]. However, the
function g : [0, 1] → [0, 1] defined by g(x) = x2 is surjective. Consequently,
the target set (known as the co-domain) matters.
One useful tool when dealing with functions is to be able to describe all
points in the initial space that map into a predetermined set. Thus we make
the following definition.
Definition 4.1.12. Given a function f : X → Y and a B ⊆ Y , the preimage
of B under f is the set
f −1 (B) = {x ∈ X | f (x) ∈ B} ⊆ X.
Note the notation used for the preimage does not assume the existence
of an inverse of f (see Theorem 4.1.16). Using preimages, we can define an
important property we may desire our functions to have.
Definition 4.1.13. A function f : X → Y is said to be injective (or one-to-one) if for all y ∈ Y , the preimage f −1 ({y}) has at most one element; that
is, if x1 , x2 ∈ X are such that f (x1 ) = f (x2 ), then x1 = x2 .
To illustrate when a function is injective or not, consider the following
diagrams.
(Diagram: two arrow diagrams f : X → Y ; in the first, f is injective; in the second, f is not injective.)
Example 4.1.14. Consider the function f : [−1, 1] → [0, 1] defined by
f (x) = x2 . Notice f is not injective since f (−1) = f (1). However, the
function g : [0, 1] → [0, 1] defined by g(x) = x2 is injective. Consequently,
the initial set (known as the domain) matters.
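For functions between finite sets, Definitions 4.1.10 and 4.1.13 can be checked mechanically; the following Python sketch (names illustrative) mirrors Examples 4.1.11 and 4.1.14 on a small domain.

```python
# A function is encoded as a dict from its domain to its values.
def is_surjective(f, codomain):
    return set(f.values()) == set(codomain)

def is_injective(f):
    return len(set(f.values())) == len(f)  # no two inputs share a value

f = {x: x * x for x in (-1, 0, 1)}  # squaring on {-1, 0, 1}
g = {x: x * x for x in (0, 1)}      # squaring on {0, 1}
assert not is_injective(f)          # f(-1) = f(1)
assert is_injective(g)
assert is_surjective(g, {0, 1})     # both the domain and co-domain matter
```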
We desire to combine the notions of injective and surjective.
Definition 4.1.15. A function f : X → Y is said to be a bijection if f is
injective and surjective.
Using the above examples, we have seen several functions that are not bijective. Furthermore, we have seen that f : [0, 1] → [0, 1] defined by f (x) = x2 is bijective. One way to observe that f is bijective is to consider the function g : [0, 1] → [0, 1] defined by g(x) = √x. Notice that f and g ‘undo’ what the other function does. In fact, this is true of all bijections.
Theorem 4.1.16. A function f : X → Y is a bijection if and only if there
exists a function g : Y → X such that
• g(f (x)) = x for all x ∈ X, and
• f (g(y)) = y for all y ∈ Y .
Furthermore, if f is a bijection, there is exactly one function g : Y → X that
satisfies these properties, which is called the inverse of f and is denoted by
f −1 : Y → X. Notice this implies f −1 is also a bijection with (f −1 )−1 = f .
Proof. Suppose that f is a bijection. Since f is surjective, for each y ∈ Y
there exists an zy ∈ X such that f (zy ) = y. Furthermore, note zy is the
unique element of X that f maps to y since f is injective.
Define g : Y → X by g(y) = zy . Clearly g is a well-defined function.
To see that g satisfies the two properties, first let x ∈ X be arbitrary.
Then y = f (x) ∈ Y . However, since f (zy ) = y = f (x), it must be the case
that zy = x as f is injective. Therefore
g(f (x)) = g(y) = zy = x
as desired. For the second property, let y ∈ Y be arbitrary. Then
f (g(y)) = f (zy ) = y
by the definition of zy . Hence g satisfies the desired properties.
Conversely, suppose g : Y → X satisfies the two properties. To see that
f is injective, suppose x1 , x2 ∈ X are such that f (x1 ) = f (x2 ). Then
x1 = g(f (x1 )) = g(f (x2 )) = x2
as desired. To see that f is surjective, let y ∈ Y be arbitrary. Then g(y) ∈ X
so
y = f (g(y)) ∈ f (X).
Since y ∈ Y is arbitrary, we have Y ⊆ f (X). Hence f (X) = Y so f is
surjective. Therefore, as f is both injective and surjective, f is bijective by
definition.
Finally, suppose f is bijective and g : Y → X satisfies the above properties.
Suppose h : Y → X is another function such that h(f (x)) = x for all x ∈ X,
and f (h(y)) = y for all y ∈ Y . Then for all y ∈ Y ,
h(y) = g(f (h(y))) = g(y)
(where we have used g(f (x1 )) = x1 when x1 = h(y) and f (h(y)) = y).
Therefore g = h as desired.
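For finite sets, Theorem 4.1.16 can be seen in action: a bijection encoded as a dict is inverted by swapping keys and values, and the two composition identities can be verified directly. A Python sketch (illustrative names):

```python
# Invert a dict-encoded bijection by swapping keys and values.
def inverse(f):
    g = {y: x for x, y in f.items()}
    assert len(g) == len(f), "f is not injective, so no inverse exists"
    return g

f = {1: 3, 2: "pi", 3: 2, 4: 7}      # a bijection from {1, 2, 3, 4}
g = inverse(f)
assert all(g[f[x]] == x for x in f)  # g(f(x)) = x for all x
assert all(f[g[y]] == y for y in g)  # f(g(y)) = y for all y
```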
Remark 4.1.17. If f : X → Y is injective, consider the function g : X →
f (X) defined by g(x) = f (x) for all x ∈ X. Clearly g is injective since f is,
and, by construction, g is surjective. Hence g is bijective and thus has an
inverse g −1 : f (X) → X. The function g −1 is called the inverse of f on its
image.
4.2
Cardinality
We turn our attention to trying to determine how large a given set is. For
finite sets, we can count the number of elements to determine whether two
sets have the same number of elements or whether one set has more elements
than the other. The problem is, “How do we count the number of elements
in an infinite set?”
4.2.1
Definition of Cardinality
An alternative way to look at the above problem is to use functions. For
example, one way to see that {1, 2, 3} and {5, π, 42} have the same number
of elements is that we can pair up the elements via {(1, 5), (3, π), (2, 42)} for
example. However, we can see that {1, 2, 3} and {5, π, 42, 29} do not have
the same number of elements since there is no such pairing.
Saying that there is such a pairing is precisely saying that there exists
a bijection from one set to the other. Consequently, we define a relation ∼
on the ‘collection’ of all sets by X ∼ Y if and only if there exists a bijection
f : X → Y . Notice that ∼ ‘is’ an equivalence relation. Indeed, to see that
∼ satisfies the properties in Definition 3.2.6, first notice that X ∼ X as the
function f : X → X defined by f (x) = x for all x ∈ X is a bijection. Next,
if f : X → Y is a bijection, then f −1 : Y → X is a bijection so X ∼ Y
implies Y ∼ X. Finally, if X ∼ Y and Y ∼ Z, then there exist bijections
f : X → Y and g : Y → Z. If we define h : X → Z to be the composition
of f and g, denoted g ◦ f , which is the function defined by h(x) = g(f (x)),
it is not difficult to see that h is a bijection (either check h is injective and
surjective directly, or check that h−1 = f −1 ◦ g −1 ) so X ∼ Z.
Consequently, given a set X, we will use |X| to denote the equivalence class of X under the above equivalence relation. As opposed to always referring to this equivalence relation, we make the following definition.
Definition 4.2.1. Given two sets X and Y , it is said that X and Y have
the same cardinality (or are equinumerous), denoted |X| = |Y |, if there exists
a bijection f : X → Y .
Example 4.2.2. Notice that the sets X = {3, 7, π, 2} and Y = {1, 2, 3, 4}
have the same cardinality via the function f : Y → X defined by f (1) = 3,
f (2) = π, f (3) = 2, and f (4) = 7.
Example 4.2.3. We claim that |N| = |Z| (which may seem odd as N ⊆ Z).
To see this, define f : N → Z by
f (n) = 0 if n = 1, f (n) = n/2 if n is even, and f (n) = −(n − 1)/2 if n is odd and n ≥ 3.
(For example f (1) = 0, f (2) = 1, f (3) = −1, f (4) = 2, f (5) = −2, etc.) It is
not difficult to verify that f is a bijection.
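The bijection of Example 4.2.3 is easy to implement and to spot-check on an initial segment of N; a Python sketch:

```python
# The map f : N -> Z from Example 4.2.3 (with N starting at 1).
def f(n):
    if n == 1:
        return 0
    if n % 2 == 0:
        return n // 2        # even n |-> positive integers
    return -((n - 1) // 2)   # odd n >= 3 |-> negative integers

assert [f(n) for n in range(1, 6)] == [0, 1, -1, 2, -2]
values = [f(n) for n in range(1, 1001)]
assert len(set(values)) == len(values)  # no value repeats: injectivity check
```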
Using bijections gives us a method for determining when two sets have
the same size. However, how can we determine when one set has fewer
elements than another?
We have already seen that {1, 2, 3} and {5, π, 42, 29} do not have the
same number of elements. We know that {1, 2, 3} has fewer elements than
{5, π, 42, 29}. One way to see this is that we can define a function from {1, 2, 3} to {5, π, 42, 29} that is as optimal as possible; that is, we try to form a bijective pairing, but we only obtain an injective function as we cannot hit all of the elements of the latter set. Consequently:
Definition 4.2.4. Given two sets X and Y , it is said that X has cardinality less than or equal to that of Y , denoted |X| ≤ |Y |, if there exists an injective function f : X → Y .
Example 4.2.5. Let n, m ∈ N be such that n < m. Then {1, . . . , n} has
cardinality less than {1, . . . , m} as f : {1, . . . , n} → {1, . . . , m} defined by
f (k) = k is injective.
Example 4.2.6. Since the function f : N → Q defined by f (n) = n is
injective, we see that |N| ≤ |Q|. More generally, if X ⊆ Y , then |X| ≤ |Y |.
When determining that {1, 2, 3} has fewer elements than {5, π, 42, 29},
we could have thought of things in a different light. In particular, we could
define a function from {5, π, 42, 29} to {1, 2, 3} that was onto. This should
imply that {5, π, 42, 29} has more elements than {1, 2, 3}, which is the case
by the next result.
Proposition 4.2.7. Let X and Y be non-empty sets. If f : X → Y is
surjective, then |Y | ≤ |X|.
Proof. For each y ∈ Y , let
Ay = f −1 ({y}).
Since f is surjective, Ay ≠ ∅ for all y ∈ Y . Hence, by the Axiom of Choice (Axiom 4.1.8), there exists a function g ∈ ∏y∈Y Ay ; that is, g : Y → ⋃y∈Y Ay ⊆ X is such that g(y) ∈ Ay for all y ∈ Y .
We claim that g is injective. To see this, suppose y1 , y2 ∈ Y are such
that g(y1 ) = g(y2 ). Let x = g(y1 ) = g(y2 ) ∈ X. By the properties of g, it
must be the case that x ∈ Ay1 and x ∈ Ay2 . Since x ∈ Ay1 , we must have
f (x) = y1 by the definition of Ay1 . Similarly, since x ∈ Ay2 , we must have
f (x) = y2 . Therefore y1 = y2 as desired.
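For finite sets the proof of Proposition 4.2.7 becomes an algorithm: pick one preimage element for each value. A Python sketch (illustrative names; no axiom is needed in the finite case):

```python
# From a surjection f : X -> Y (a dict), build an injective g : Y -> X
# by recording one element of f^{-1}({y}) for each y.
def right_inverse(f):
    g = {}
    for x, y in f.items():
        g.setdefault(y, x)  # keep the first preimage element seen for y
    return g

f = {1: "a", 2: "a", 3: "b"}           # surjective onto {"a", "b"}
g = right_inverse(f)
assert all(f[g[y]] == y for y in g)    # g(y) lies in the preimage of {y}
assert len(set(g.values())) == len(g)  # g is injective
```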
4.2.2
Cantor–Schröder–Bernstein Theorem
One natural question is, “Is ≤ a partial ordering (see Definition 1.3.3) on
the ‘collection’ of possible cardinalities?” We want our ordering to be (at
least) a partial ordering, as the properties defining a partial ordering are the
minimal properties one would like for a ‘nice’ ordering. Clearly reflexivity
and transitivity hold (the composition of injective functions is injective), but
does antisymmetry?
In Example 4.2.6, it was shown that |N| ≤ |Q|. However, notice if

N = { m/n | m ≥ 0, n > 0, m and n have no common divisors } and
P = { m/n | m < 0, n > 0, m and n have no common divisors },
then P ∩ N = ∅ and P ∪ N = Q. Furthermore, we may define f : Q → N by
f (q) = 1 if m = 0, f (q) = 2^m 3^n if m > 0 and n > 0, and f (q) = 5^(−m) 7^n if m < 0 and n > 0,

where q = m/n is the unique way to write q as an element of P or N . Using
the uniqueness of prime factorization (something not covered in this course),
we see f is an injective function. Hence |Q| ≤ |N|!
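The prime-coding injection above can be spot-checked in Python; `fractions.Fraction` conveniently stores a rational in lowest terms with positive denominator (a sketch, not part of the notes):

```python
from fractions import Fraction

# The injection f : Q -> N sketched above, for q = m/n in lowest terms.
def f(q):
    m, n = q.numerator, q.denominator  # lowest terms, n > 0
    if m == 0:
        return 1
    if m > 0:
        return 2**m * 3**n
    return 5**(-m) * 7**n

samples = [Fraction(a, b) for a in range(-9, 10) for b in range(1, 10)]
codes = {f(q) for q in samples}
# Distinct rationals receive distinct codes (unique prime factorization).
assert len(codes) == len(set(samples))
```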
As |N| ≤ |Q| and |Q| ≤ |N|, is |Q| = |N|? It seems difficult to construct
a bijective function f : N → Q, so what hope do we have?
To answer this question, we have the following result (alternatively, we
could construct such a function, but it is not nice to define). Notice that if X and Y are sets such that there exist injective functions f : X → Y and g : Y → X, then we may invoke the following theorem with A = g(Y ) and
B = f (X) to obtain that |X| = |Y |.
Theorem 4.2.8 (Cantor–Schröder–Bernstein Theorem). Let X and Y be non-empty sets. Suppose A ⊆ X and B ⊆ Y are such that there exist bijective functions f : X → B and g : Y → A. Then |X| = |Y |.
Proof. Let A0 = X and A1 = A. Define h = g ◦ f : A0 → A0 by h(x) =
g(f (x)). Notice h is injective as f and g are injective.
Let A2 = h(A0 ). Notice
A2 = h(A0 ) = g(f (A0 )) = g(B) ⊆ g(Y ) = A1 .
Hence A2 ⊆ A1 ⊆ A0 . Next let A3 = h(A1 ). Then
A3 = h(A1 ) ⊆ h(A0 ) = A2 .
Consequently, if for each n ≥ 2 we recursively define An = h(An−2 ), then, by recursion (formally, we should apply the Principle of Mathematical Induction),

An = h(An−2 ) ⊆ h(An−3 ) = An−1

for all n ≥ 3. Hence we have constructed a sequence A0 ⊇ A1 ⊇ A2 ⊇ · · · with An = h(An−2 ) for all n ≥ 2.
We claim that |A| = |X|. To see this, notice that
X = A0 = (A0 \ A1 ) ∪ (A1 \ A2 ) ∪ (A2 \ A3 ) ∪ (A3 \ A4 ) ∪ · · · ∪ (⋂∞n=1 An ) and
A = A1 = (A1 \ A2 ) ∪ (A2 \ A3 ) ∪ (A3 \ A4 ) ∪ (A4 \ A5 ) ∪ · · · ∪ (⋂∞n=1 An ).
Furthermore, notice that any two distinct sets chosen from either union have
empty intersection as A0 ⊇ A1 ⊇ A2 ⊇ · · · .
Since h is injective
h(A2n \ A2n+1 ) = h(A2n ) \ h(A2n+1 ) = A2n+2 \ A2n+3
for all n ∈ N ∪ {0}. Therefore, as the sets in the union description of X are
disjoint, we may define h0 : A0 → A1 via
h0 (x) = h(x) if x ∈ A2n \ A2n+1 for some n ∈ N ∪ {0}, h0 (x) = x if x ∈ A2n−1 \ A2n for some n ∈ N, and h0 (x) = x if x ∈ ⋂∞n=1 An .
Since

• h0 maps A2n \ A2n+1 to A2n+2 \ A2n+3 bijectively for all n ∈ N ∪ {0},
• h0 maps A2n−1 \ A2n to A2n−1 \ A2n bijectively for all n ∈ N, and
• h0 maps ⋂∞n=1 An to ⋂∞n=1 An bijectively,
we obtain that h0 is a bijection (any two distinct sets chosen from either
union have empty intersection). Hence |A| = |X| as claimed.
However |A| = |Y | as g : Y → A is a bijection. Hence |Y | = |X| as having equal cardinality is an equivalence relation (see the discussion at the beginning of Subsection 4.2.1).
Since we have shown |N| ≤ |Q| and |Q| ≤ |N|, we have by the Cantor–Schröder–Bernstein Theorem (Theorem 4.2.8) that |N| = |Q|; that is, N and Q have the same number of elements!
4.2.3
Countable and Uncountable Sets
One nice corollary of |N| = |Q| is that we can make a list of all rational numbers; that is, as there is a bijective function f : N → Q, we can form the sequence of all rational numbers (f (n))n≥1 . Consequently, sets that are equinumerous to the natural numbers are particularly nice sets as we can index such sets by N. This leads us to the study of such sets.
Definition 4.2.9. A set X is said to be
• countable if X is finite or |X| = |N|,
• countably infinite if |X| = |N|,
• uncountable if X is not countable.
A natural question is, “Under what operations is the countability of sets
preserved?” The following demonstrates that subsets (and thus intersections)
of countable sets are countable.
Lemma 4.2.10. If X is a countable set, then any subset of X must also be
countable.
Proof. If |X| = |N|, then there is a bijection between X and N which induces a bijection between the subsets of X and the subsets of N. Thus we may assume that X = N. Let Y be a subset of N. Using the Well-Ordering Principle, it is not difficult to see that Y is either finite, or can be listed as a sequence (and thus is equinumerous with N). Indeed, choose a1 to be the least element of Y . Then clearly 1 ≤ a1 . Next, let a2 be the least element of Y \ {a1 }. Hence we must have a2 ≥ 2. Recursively let an be the least element of Y \ {a1 , . . . , an−1 } (which implies an ≥ n). This process either stops (in which case Y is finite) or continues and must list all of the elements of Y as an ≥ n for all n ∈ N.
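The recursion in this proof is an algorithm when the subset is finite: repeatedly extract the least element. A Python sketch:

```python
# Enumerate a finite subset of N in increasing order, as in the proof:
# a_n is the least element of what remains, so a_n >= n.
def enumerate_increasing(subset):
    remaining = set(subset)
    listing = []
    while remaining:
        a = min(remaining)   # least element (Well-Ordering Principle)
        listing.append(a)
        remaining.remove(a)
    return listing

listing = enumerate_increasing({8, 3, 15, 1, 4})
assert listing == [1, 3, 4, 8, 15]
assert all(a >= n for n, a in enumerate(listing, start=1))  # a_n >= n
```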
The following, which, simply stated, says that a countable union of countable sets is countable, is a nice example of why it is useful to be able to write countable sets as sequences.
Theorem 4.2.11. For each n ∈ N, let Xn be a countable set. Then X = ⋃∞n=1 Xn is countable.
Proof. We first desire to restrict to the case that our countable sets are
disjoint.
Let B1 = X1 and for each k ≥ 2 let

Bk = Xk \ (X1 ∪ · · · ∪ Xk−1 ).
Clearly Bk ∩ Bj = ∅ for all j ≠ k and X = ⋃∞n=1 Bn . Since Bn ⊆ Xn for all n, each Bn is countable by Lemma 4.2.10. Consequently, for each n ∈ N, we may write

Bn = (bn,1 , bn,2 , bn,3 , . . . ).
We desire to define a function f : X → N by
f (bn,m ) = 2^n 3^m .
Note such a function is well-defined since Bk ∩ Bj = ∅ for all j ≠ k. Since f
is injective by the uniqueness of the prime decomposition of natural numbers,
we obtain that |X| ≤ |N|. Hence X is countable.
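The coding in this proof is concrete enough to run: with disjoint blocks Bn listed as sequences, the m-th element of Bn goes to 2^n 3^m. A Python sketch (illustrative data):

```python
# Send the m-th element of the n-th disjoint block B_n to 2^n * 3^m.
def code(n, m):
    return 2**n * 3**m

blocks = {1: ["a", "b"], 2: ["c"], 3: ["d", "e", "f"]}  # disjoint B_n
codes = {code(n, m): x
         for n, block in blocks.items()
         for m, x in enumerate(block, start=1)}
# Distinct elements get distinct codes (unique prime factorization),
# so the union injects into N.
assert len(codes) == sum(len(b) for b in blocks.values())
```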
Corollary 4.2.12. If X and Y are countable sets, X ∪ Y is a countable set.
Proof. Apply Theorem 4.2.11 where X1 = X, X2 = Y , and Xn = ∅ for all
n ≥ 3.
In Chapter 1, when we were using set notation to describe sets, we had
a hard time ‘listing’ the real numbers. Thus, one might ask, “Is there a
sequence of all real numbers?” We know that |N| ≤ |R| via the injective
function f : N → R defined by f (n) = n for all n ∈ N. However, is |N| = |R|?
To demonstrate that |N| < |R|, we will use the following.
Theorem 4.2.13. The open interval (0, 1) is uncountable.
Proof. The following proof is known as Cantor’s diagonalization argument
and has a wide variety of uses. Suppose that (0, 1) is countable. Then we
may write (0, 1) = {xn | n ∈ N}. By the homework, there exist numbers
{ai,j | i, j ∈ N} ⊆ {0, 1, . . . , 9} such that
xj = limn→∞ ∑nk=1 ak,j /10^k
for all j ∈ N. Note that the sequence (ak,j )k≥1 in the above expression for
xj represents the decimal expansion of xj ; that is
xj = 0.a1,j a2,j a3,j a4,j a5,j · · · .
Consequently, this representation need not be unique due to the possibility
of repeating 9s (and this is the only reason).
For each k ∈ N, define

yk = 3 if ak,k = 7 and yk = 7 otherwise,

and let y = limn→∞ ∑nk=1 yk /10^k . It is not difficult to see that y ∈ (0, 1).
Furthermore y ≠ xn for all n ∈ N (as y and xn will disagree in the nth decimal place and this is not because of repeating 9s). Therefore, since (0, 1) = {xn | n ∈ N}, we must have that y ∉ (0, 1), which contradicts the fact that y ∈ (0, 1).
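The diagonal construction in this proof is effective: given any (finite chunk of a) list of digit expansions, it produces a row differing from the n-th row in the n-th digit. A Python sketch:

```python
# Cantor's diagonalization on rows of decimal digits. Using only the
# digits 3 and 7 sidesteps the repeating-9 ambiguity noted above.
def diagonal(digit_rows):
    return [3 if row[k] == 7 else 7 for k, row in enumerate(digit_rows)]

rows = [
    [1, 4, 1, 5, 9],
    [7, 7, 7, 7, 7],
    [0, 0, 0, 0, 0],
    [9, 9, 9, 9, 9],
    [5, 7, 5, 7, 5],
]
y = diagonal(rows)
assert y == [7, 3, 7, 7, 7]
assert all(y[k] != rows[k][k] for k in range(len(rows)))  # differs everywhere
```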
Proposition 4.2.14. A set containing an uncountable subset is uncountable.
Consequently, by Theorem 4.2.13, R is uncountable.
Proof. Let X be a set such that there exists an uncountable subset Y of
X. Suppose X was countable. Then Y would be countable by Lemma
4.2.10, which contradicts the fact that Y is uncountable. Hence X must be
uncountable.
Corollary 4.2.15. The set of irrational numbers R \ Q is uncountable.
Proof. Suppose R \ Q is a countable set. Since Q is countable and R =
Q ∪ (R \ Q), it would need to be the case that R is countable by Theorem
4.2.11. Since R is uncountable by Proposition 4.2.14, we have obtained a
contradiction so R \ Q is an uncountable set.
Since R is uncountable, |N| < |R| so there does not exist a list of real
numbers. However, is R the ‘smallest’ set larger than N? In particular:
Axiom 4.2.16 (The Continuum Hypothesis). If X ⊆ R is uncountable,
must it be the case that |X| = |R|?
The Continuum Hypothesis was originally postulated by Cantor, who spent many years (at the cost of his own health and possibly sanity) trying to prove the hypothesis. Consequently, we will not try. In fact, the reason for Cantor’s difficulty is that there is no proof. However, nor is there any counterexample. Like the Axiom of Choice, the Continuum Hypothesis is independent of (Zermelo–Fraenkel) set theory, even if the Axiom of Choice is included. Most results in analysis do not require an assertion as to whether the Continuum Hypothesis is true or false. Thus we move on.
4.2.4
Zorn’s Lemma
Using the Cantor–Schröder–Bernstein Theorem (Theorem 4.2.8), we saw that cardinality gives a partial ordering on the size of sets. However, is it a total ordering (Definition 1.3.5)? That is, if X and Y are non-empty sets, must it be the case that |X| ≤ |Y | or |Y | ≤ |X|?
The above is a desirable property since it makes the ordering nicer. However, when given two sets, it is not clear whether there always exists an injection from one set to the other. The goal of this subsection is to develop the necessary tools in order to answer this problem in the subsequent subsection. The tools we require are related to partial orderings, so the following definition is made.
Definition 4.2.17. A partially ordered set (or poset) is a pair (X, ≼) where X is a non-empty set and ≼ is a partial ordering on X.
For examples of posets, we refer the reader back to Subsection 1.3.2. Our
main focus is a ‘result’ about totally ordered subsets of partially ordered
sets:
Definition 4.2.18. Let (X, ≼) be a partially ordered set. A non-empty subset Y ⊆ X is said to be a chain if Y is totally ordered with respect to ≼; that is, if a, b ∈ Y , then either a ≼ b or b ≼ a.
Clearly any non-empty subset of a totally ordered set is a chain. Here is
a less obvious example.
Example 4.2.19. Recall from Example 1.3.4 that the power set P(R) of R
has a partial ordering where
A ≼ B ⇐⇒ A ⊆ B.
If Y = {An | n ∈ N} ⊆ P(R) is such that An ⊆ An+1 for all n ∈ N, then Y is a chain.
Like our initial study of the real numbers in Chapter 1, upper bounds
play an important role with respect to chains.
Definition 4.2.20. Let (X, ≼) be a partially ordered set. A non-empty subset Y ⊆ X is said to be bounded above if there exists a z ∈ X such that y ≼ z for all y ∈ Y . Such an element z is said to be an upper bound for Y .
Example 4.2.21. Recall from Example 4.2.19 that if Y = {An | n ∈ N} ⊆ P(R) is such that An ⊆ An+1 for all n ∈ N, then Y is a chain with respect to the partial ordering defined by inclusion. If
A = ⋃∞n=1 An
then clearly A ∈ P(R) and An ⊆ A for all n ∈ N. Hence A is an upper
bound for Y .
As in Chapter 1, there are optimal upper bounds of subsets of R which
we called least upper bounds. We saw that least upper bounds need not be
in the subset. Thus we desire a slightly different object when it comes to
partially ordered sets as the lack of a total ordering means there may not be
a unique ‘optimal’ upper bound.
Definition 4.2.22. Let X be a non-empty set and let ≼ be a partial ordering on X. An element x ∈ X is said to be maximal if there does not exist a y ∈ X \ {x} such that x ≼ y; that is, there is no element of X that is larger than x with respect to ≼.
Notice that R together with its usual ordering ≤ does not have a maximal
element (by, for example, the Archimedean Property). However, many
partially ordered sets do have maximal elements. For example ([0, 1], ≤) has
1 as a maximal element (although ((0, 1), ≤) does not).
For an example involving a partial ordering that is not a total ordering, suppose X = {x, y, z, w} and ≼ is defined such that a ≼ a for all a ∈ X, a ≼ b for all a ∈ {x, y} and b ∈ {z, w}, and a ⋠ b for all other pairs (a, b) ∈ X × X. It is not difficult to see that z and w are maximal elements and x and y are not maximal elements. Thus it is possible, when dealing with a partial ordering that is not a total ordering, to have multiple maximal elements.
The result we require for the next subsection may now be stated using
the above notions.
Axiom 4.2.23 (Zorn’s Lemma). Let (X, ≼) be a partially ordered set. If every chain in X has an upper bound, then X has a maximal element.
We will not prove Zorn’s Lemma. To do so, we would need to use the
Axiom of Choice. In fact, Zorn’s Lemma and the Axiom of Choice are
logically equivalent; that is, assuming the axioms of (Zermelo–Fraenkel) set
theory, one may use the Axiom of Choice to prove Zorn’s Lemma, and one
may use Zorn’s Lemma to prove the Axiom of Choice.
4.2.5
Comparability of Cardinals
Using Zorn’s Lemma, we may finally demonstrate that the ordering on
cardinals is a total ordering.
Theorem 4.2.24. Let X and Y be non-empty sets. Then either |X| ≤ |Y |
or |Y | ≤ |X|.
Proof. Let
F = {(A, B, f ) | A ⊆ X, B ⊆ Y, f : A → B is a bijection}.
Notice that F is non-empty since, by assumption, there exists an x ∈ X and
a y ∈ Y so we may select A = {x}, B = {y}, and f : A → B defined by
f (x) = y.
Given (A1 , B1 , f1 ), (A2 , B2 , f2 ) ∈ F, define (A1 , B1 , f1 ) ≼ (A2 , B2 , f2 ) if and only if

A1 ⊆ A2 , B1 ⊆ B2 , and f2 (x) = f1 (x) for all x ∈ A1 .

It is not difficult to verify that ≼ is a partial ordering on F.
We desire to invoke Zorn’s Lemma (Axiom 4.2.23) in order to obtain a
maximal element of F. To invoke Zorn’s Lemma, it must be demonstrated
that every chain in (F, ≼) has an upper bound. Let C = {(Ai , Bi , fi ) | i ∈ I} be an arbitrary chain in (F, ≼). Let
A = ⋃i∈I Ai and B = ⋃i∈I Bi .
We desire to define f : A → B such that f (x) = fi (x) whenever x ∈ Ai .
The question is, “Will such an f be well-defined as each x could be in
multiple Ai ?” To see that f is well-defined, suppose x ∈ Ai and x ∈ Aj for some i, j ∈ I. Since C is a chain, either (Ai , Bi , fi ) ≼ (Aj , Bj , fj ) or (Aj , Bj , fj ) ≼ (Ai , Bi , fi ). If (Ai , Bi , fi ) ≼ (Aj , Bj , fj ), then Ai ⊆ Aj and fj (x) = fi (x). As the case that (Aj , Bj , fj ) ≼ (Ai , Bi , fi ) is the same (reversing i and j), we obtain that f is well-defined.
In order for (A, B, f ) to be an upper bound for C, we must first demonstrate that (A, B, f ) ∈ F. Clearly A ⊆ X, B ⊆ Y , and f : A → B is a
function. It remains to check that f is a bijection.
To see that f is injective, suppose x1 , x2 ∈ A are such that f (x1 ) = f (x2 ). Since A = ⋃i∈I Ai , there exist i, j ∈ I such that x1 ∈ Ai and x2 ∈ Aj . Since C is a chain, we must have either (Ai , Bi , fi ) ≼ (Aj , Bj , fj ) or (Aj , Bj , fj ) ≼ (Ai , Bi , fi ). In the former case, we obtain that fj (x1 ) = f (x1 ) = f (x2 ) = fj (x2 ). Therefore, since fj is injective, it must be the case that x1 = x2 . As the case that (Aj , Bj , fj ) ≼ (Ai , Bi , fi ) is the same (reversing i and j), we obtain that f is injective.
To see that f is surjective, let y ∈ B be arbitrary. Since B = ⋃i∈I Bi , there exists an i ∈ I such that y ∈ Bi . Since fi is surjective, there exists an x ∈ Ai such that fi (x) = y. Hence x ∈ A and f (x) = fi (x) = y. Therefore, as y was arbitrary, f is surjective. Hence f is a bijection and (A, B, f ) ∈ F.
As (A, B, f ) ∈ F, it is easy to see that (A, B, f ) is an upper bound for C by the definition of (A, B, f ) and the partial ordering ≼. Hence, as C was an arbitrary chain, every chain in F has an upper bound. Thus Zorn’s Lemma implies that (F, ≼) has a maximal element.
Let (A0 , B0 , f0 ) ∈ F be a maximal element. We claim that either A0 = X or B0 = Y . To see this, suppose otherwise that A0 ≠ X and B0 ≠ Y . Then there exist x0 ∈ X \ A0 and y0 ∈ Y \ B0 . Let A′ = A0 ∪ {x0 }, B ′ = B0 ∪ {y0 }, and let g : A′ → B ′ be defined by g(x0 ) = y0 and g(x) = f0 (x) for all x ∈ A0 . Clearly g is a well-defined bijection by construction so (A′ , B ′ , g) ∈ F. However, (A0 , B0 , f0 ) ≼ (A′ , B ′ , g) and (A0 , B0 , f0 ) ≠ (A′ , B ′ , g), which contradicts the fact that (A0 , B0 , f0 ) is a maximal element of F. Hence either A0 = X or B0 = Y .
If A0 = X, then f0 : X → B0 ⊆ Y is injective so |X| ≤ |Y | by definition. Otherwise, if B0 = Y , then f0 : A0 → Y is surjective. Choose y ∈ Y and
define h : X → Y by
h(x) = f0 (x) if x ∈ A0 , and h(x) = y if x ∉ A0 .
Clearly h is a well-defined surjective function so |Y | ≤ |X| by Proposition
4.2.7.
Chapter 5
Continuity
In the previous chapter, we saw the use of functions in comparing the sizes of sets. However, functions have a vast range of applications. In particular, the focus of this chapter is to begin to examine functions from subsets of the real numbers to the real numbers. However, our goal is not to plainly study such functions, but to study how such functions interact with the properties of the real numbers we have investigated in this course. In particular, we will begin with a focus on limits of functions. This study will lead us to the all-important theory of continuous functions.
5.1
Limits of Functions
To study analytic properties of functions on subsets of the real numbers, we first must modify the definition of the limit of a sequence to be able to examine the limit of a function.
5.1.1
Definition of a Limit
Given a function f : R → R and a point a ∈ R, we desire to describe the behaviour of f (x) as x gets ‘closer and closer’ to a. However, f (a) exists, so this concept might seem weird; that is, why do we want to know how f behaves as x gets ‘closer and closer’ to a when we know f (a)? The short answer is that, due to things like fluctuations, f may behave very differently as x gets ‘closer and closer’ to a than it does at x = a. This leads us to the following heuristic concept.
Heuristic Definition. A number L is said to be the limit of a function f as x tends to a if the values of f (x) approximate L provided that x is very close, but not equal, to a.
Since we are only interested in the behaviour of f as x tends to a, it is not necessary that f (a) is well-defined. Furthermore, we do not need f
to be defined on all of R, but just near a; that is, on an open interval that
contains a, except for possibly at a. Using this and our heuristic definition,
we arrive at our definition of a limit (and the reason this course is often called a first course in ε-δ).
Definition 5.1.1. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. A number L ∈ R is said to be the limit of f as x tends to a if for every ε > 0 there exists a δ > 0 (which depends on ε) such that if 0 < |x − a| < δ then |f (x) − L| < ε.
If L is the limit of f as x tends to a, we say the limit of f (x) as x tends
to a exists and write L = limx→a f (x). Otherwise we say the limit does not
exist.
Note the assumption that f is defined on an open interval I containing a
is necessary to ensure that f (x) is well-defined provided 0 < |x − a| < δ and
δ is chosen sufficiently small.
As it took some time for us to get used to the ε-N definition of the limit of a sequence, let’s provide some examples of checking the ε-δ definition of the limit of a function.
Example 5.1.2. Let f (x) = 3x + 1 for all x ∈ R. Does limx→2 3x + 1 exist?
Our intuition says yes; as x tends to 2, we expect f (x) to tend to 3(2) + 1 = 7.
To see this using the definition of the limit, let ε > 0 be arbitrary. Let δ = ε/3 > 0. Then if 0 < |x − 2| < δ, then

|f (x) − 7| = |(3x + 1) − 7| = |3x − 6| = 3|x − 2| < 3(ε/3) = ε.

Hence, as ε > 0 was arbitrary, we obtain that limx→2 3x + 1 = 7 by the
definition of the limit.
Furthermore, if we define
g(x) = 3x + 1 if x ≠ 2 and g(x) = 100 if x = 2,
then it is still the case that limx→2 g(x) = 7.
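The choice δ = ε/3 can be sanity-checked numerically (a check of samples, not a proof); a Python sketch:

```python
# Sample points x with 0 < |x - 2| < delta and confirm |f(x) - 7| < eps.
def f(x):
    return 3 * x + 1

for eps in (1.0, 0.1, 0.001):
    delta = eps / 3
    xs = [2 + delta * t / 100 for t in range(-99, 100) if t != 0]
    assert all(abs(f(x) - 7) < eps for x in xs)
```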
Example 5.1.3. Let f (x) = x2 for all x ∈ R. Does limx→3 x2 exist? Our
intuition says yes; as x tends to 3, we expect f (x) to tend to 32 = 9.
To see this using the definition of the limit, let ε > 0 be arbitrary. Let δ = min{1, ε/7} > 0. To see why we chose this δ, we would first do the computation

|x2 − 9| = |(x + 3)(x − 3)| = |x + 3||x − 3|.

Thus, we know we can make |x − 3| < δ, so provided we can find an upper bound of |x + 3|, then we will be fine. Thus, with our choice of δ, we see
that if 0 < |x − 3| < δ, then |x − 3| ≤ 1 so x ∈ [2, 4]. Hence 0 < |x − 3| < δ implies |x + 3| ≤ 7 and thus

|x2 − 9| = |x + 3||x − 3| ≤ 7|x − 3| < 7(ε/7) = ε.

Hence, as ε > 0 was arbitrary, we obtain that limx→3 x2 = 9 by the definition of the limit.
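The choice δ = min{1, ε/7} can likewise be sanity-checked on samples (again a check, not a proof); a Python sketch:

```python
# Sample points x with 0 < |x - 3| < delta and confirm |x^2 - 9| < eps.
for eps in (10.0, 1.0, 0.01):
    delta = min(1.0, eps / 7)
    xs = [3 + delta * t / 100 for t in range(-99, 100) if t != 0]
    assert all(abs(x * x - 9) < eps for x in xs)
```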
Example 5.1.4. Let f (x) = x/|x| for x ≠ 0. Does limx→0 f (x) exist? Well, if x > 0 then f (x) = 1 whereas if x < 0 then f (x) = −1. Thus if x is close to 0, it is possible that f (x) is either ±1 so we do not expect the limit to exist. To see this via our definition, suppose L = limx→0 f (x). Let ε = 1 and let δ > 0 be such that |f (x) − L| < ε for all 0 < |x| < δ. Therefore |f (δ/2) − L| < 1 and |f (−δ/2) − L| < 1, which implies |1 − L| < 1 (i.e. L ∈ (0, 2)) and | − 1 − L| < 1 (i.e. L ∈ (−2, 0)). As this is impossible, we see that the limit of f as x tends to 0 does not exist.
As with sequences, we used the word ‘the’ in the definition of the limit
of a function. Again we must demonstrate that this is warranted.
Proposition 5.1.5. Let a ∈ R, let I be an open interval containing a, and
let f be a function defined on I except at possibly a. If L and K are limits of
f as x tends to a, then L = K.
Proof. Suppose that L ≠ K. Let ε = |L − K|/2. Since L ≠ K, we know that ε > 0.
Since L is a limit of f as x approaches a, we know by the definition of a limit that there exists a δ1 > 0 such that if 0 < |x − a| < δ1 then |f(x) − L| < ε. Similarly, since K is a limit of f as x approaches a, we know by the definition of a limit that there exists a δ2 > 0 such that if 0 < |x − a| < δ2 then |f(x) − K| < ε.
Let δ = min{δ1, δ2} > 0. By the above paragraph, we have that 0 < |x − a| < δ implies |f(x) − L| < ε and |f(x) − K| < ε. Choose x0 ∈ I such that 0 < |x0 − a| < δ (such an x0 exists since I is an open interval containing a). Hence by the Triangle Inequality

|L − K| ≤ |L − f(x0)| + |f(x0) − K| < ε + ε = 2ε = |L − K|,

which is absurd (i.e. x < x is false for all x ∈ R). Thus we have obtained a contradiction so it must be the case that L = K.
Notice the proof of Proposition 5.1.5 is quite similar to that of Proposition
2.1.10 where the limit of a sequence was shown to be unique. Coincidence, I
think not! The limit of a function f as x tends to a is intimately connected
with the limit of sequences obtained by applying f to sequences converging
to a.
CHAPTER 5. CONTINUITY
Theorem 5.1.6. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. Then L = limx→a f(x) if and only if whenever (xn)n≥1 has the properties that xn ≠ a for all n ∈ N and limn→∞ xn = a, then limn→∞ f(xn) = L.
Proof. First, suppose L = limx→a f(x). Let (xn)n≥1 be such that xn ≠ a for all n ∈ N and limn→∞ xn = a. We desire to show that limn→∞ f(xn) = L. Let ε > 0 be arbitrary. Since L = limx→a f(x), there exists a δ > 0 such that if 0 < |x − a| < δ, then |f(x) − L| < ε. Since δ > 0 and limn→∞ xn = a, there exists an N ∈ N such that |xn − a| < δ for all n ≥ N. Hence, if n ≥ N then |xn − a| < δ so |f(xn) − L| < ε (as xn ≠ a). Therefore, as ε > 0 was arbitrary, limn→∞ f(xn) = L as desired.
Conversely, suppose that f does not converge to L as x tends to a. Thus there exists an ε > 0 such that for all δ > 0 there exists an x ∈ I such that 0 < |x − a| < δ yet |f(x) − L| ≥ ε. For each n ∈ N, choose xn ∈ I such that 0 < |xn − a| < 1/n yet |f(xn) − L| ≥ ε. Then (xn)n≥1 is a sequence with the property that xn ≠ a for all n ∈ N. Furthermore, since 0 < |xn − a| < 1/n for all n ∈ N, we obtain that limn→∞ xn = a. However, since |f(xn) − L| ≥ ε for all n ∈ N, we see that (f(xn))n≥1 does not converge to L.
Theorem 5.1.6 provides an alternate way of defining the limit of a function;
instead of using ε-δ, we use sequences. The sequential definition of a limit
will have many applications for us. For example, pretty much all of our
theorems from sequences will extend easily to functions.
Example 5.1.7. Let a > 0. Then f(x) = √x is defined on an open interval containing a. Furthermore, by the homework, limx→a √x = √a.
Furthermore, the sequential definition of the limit of a function is particularly useful in showing that limits do not exist. This is because we need only construct two sequences converging to the point x = a that have different limits once f is applied to them.
Example 5.1.8. The function f : R → R defined by

f(x) = { 0 if x = 0; sin(1/x) if x ≠ 0 }

has no limit as x tends to 0. To see this, consider the sequences (an)n≥1 and (bn)n≥1 defined by an = 2/(π(4n + 1)) and bn = 2/(π(4n − 1)) for all n ∈ N. Clearly, limn→∞ an = limn→∞ bn = 0. However

limn→∞ f(an) = limn→∞ 1 = 1 and limn→∞ f(bn) = limn→∞ −1 = −1.

Thus, as the above limits differ, limx→0 f(x) does not exist.
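The two sequences can be checked in floating point; the sketch below (an illustration of ours, not part of the notes) confirms that both sequences shrink toward 0 while f sends one to the constant 1 and the other to the constant −1:

```python
import math

f = lambda x: math.sin(1 / x)

# a_n and b_n both tend to 0, yet f(a_n) = 1 and f(b_n) = -1 for every n,
# so the limit of f at 0 cannot exist
a = [2 / (math.pi * (4 * n + 1)) for n in range(1, 51)]
b = [2 / (math.pi * (4 * n - 1)) for n in range(1, 51)]

print(all(math.isclose(f(x), 1.0, abs_tol=1e-9) for x in a)
      and all(math.isclose(f(y), -1.0, abs_tol=1e-9) for y in b))
```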
Although the sequential definition of a limit will be quite useful as we
have built up our theory of limits of sequences, the ε-δ definition will be
useful for more theoretical applications in the pages to come.
5.1.2 Limit Theorems for Functions
Using Theorem 5.1.6, we easily import results from Chapter 2.
Theorem 5.1.9. Let a ∈ R, let I be an open interval containing a, and let f and g be functions defined on I except at possibly a. If L = limx→a f(x) and K = limx→a g(x), then
a) limx→a f(x) + g(x) = L + K.
b) limx→a f(x)g(x) = LK.
c) limx→a cf(x) = cL for all c ∈ R.
d) limx→a f(x)/g(x) = L/K whenever K ≠ 0.
Proof. Combine Theorem 2.3.1 together with Theorem 5.1.6.
Example 5.1.10. For each c, a ∈ R, we easily see that limx→a c = c and limx→a x = a (where the latter comes from taking δ = ε in Definition 5.1.1). Consequently, by applying (b) and (c) of Theorem 5.1.9, we obtain that limx→a cxⁿ = caⁿ for all n ∈ N and all c ∈ R. Therefore, by applying (a) of Theorem 5.1.9, we obtain that limx→a p(x) = p(a) for all polynomials p.
Example 5.1.11. Let f(x) = p(x)/q(x) where p and q are polynomials and q is not the zero polynomial. Such a function is said to be a rational function. If a ∈ R is such that q(a) ≠ 0, then (d) of Theorem 5.1.9 implies that limx→a f(x) = f(a).
As with sequences, given two functions f and g such that limx→a g(x) = 0, one may ask whether limx→a f(x)/g(x) exists. Clearly if limx→a f(x)/g(x) = L and limx→a g(x) = 0, then Theorem 5.1.9 implies limx→a f(x) exists and

limx→a f(x) = (limx→a f(x)/g(x))(limx→a g(x)) = L(0) = 0.

Like with sequences, if limx→a g(x) = 0 yet limx→a f(x) ≠ 0 (or does not exist), there are many possible behaviours, some of which we will examine in the next section. For now, we continue to note results from sequences hold for functions.
Theorem 5.1.12 (Squeeze Theorem). Let a ∈ R, let I be an open interval
containing a, and let f, g, and h be functions defined on I except at possibly
a. Suppose for each x ∈ I \ {a} that
g(x) ≤ f (x) ≤ h(x).
If limx→a g(x) and limx→a h(x) exist and L = limx→a g(x) = limx→a h(x),
then limx→a f (x) exists and limx→a f (x) = L.
Proof. Combine Theorem 2.3.8 together with Theorem 5.1.6.
Again, the Squeeze Theorem has its uses when dealing with difficult
functions that may be compared to simple ones.
Example 5.1.13. Consider the function

f(x) = { x sin(1/x) if x ≠ 0; 0 if x = 0 }.

In Example 5.1.8 we saw that limx→0 (1/x)f(x) = limx→0 sin(1/x) did not exist. However, since

−|x| ≤ f(x) ≤ |x|

(as −1 ≤ sin(1/x) ≤ 1 for all x ∈ R \ {0}), and since limx→0 |x| = limx→0 −|x| = 0, we see that limx→0 f(x) = 0 by the Squeeze Theorem.
Finally, the Comparison Theorem is also useful when comparing limits of
functions.
Theorem 5.1.14 (Comparison Theorem). Let a ∈ R, let I be an open
interval containing a, and let f and g be functions defined on I except at
possibly a. Suppose for each x ∈ I \ {a} that
g(x) ≤ f (x).
If L = limx→a f (x) and K = limx→a g(x) exist, then K ≤ L.
Proof. Combine Theorem 2.3.11 together with Theorem 5.1.6.
5.1.3 One-Sided Limits
The limits of functions we have been dealing with so far may be called two-sided limits. The rationale behind this terminology is that one must examine numbers (or sequences with terms) that are both larger and smaller than the target point x = a. Thus, one must examine the behaviour of the function on both sides of the target. Restricting to one side or the other weakens the requirement for the limit to exist at the cost of slightly less information.
Definition 5.1.15. Let a ∈ R, let I be an open interval with a as the left
endpoint, and let f be a function defined on I. A number L ∈ R is said to
be the right-sided limit of f as x tends to a if for every ε > 0 there exists a δ > 0 (which depends on ε) such that if a < x < a + δ then |f(x) − L| < ε.
In this case, we say that f (x) converges to L as x approaches a from above
and write L = limx→a+ f (x).
Definition 5.1.16. Let a ∈ R, let I be an open interval with a as the right
endpoint, and let f be a function defined on I. A number L ∈ R is said to
be the left-sided limit of f as x tends to a if for every ε > 0 there exists a δ > 0 (which depends on ε) such that if a − δ < x < a then |f(x) − L| < ε.
In this case, we say that f (x) converges to L as x approaches a from below
and write L = limx→a− f (x).
Example 5.1.17. Let f(x) = x/|x| for x ≠ 0. Recall from Example 5.1.4 that limx→0 f(x) did not exist. However, limx→0+ f(x) and limx→0− f(x) do exist. To see this, we notice that f(x) = 1 for all x > 0, so clearly limx→0+ f(x) = 1. Similarly, f(x) = −1 for all x < 0 so limx→0− f(x) = −1.
As with two-sided limits, the definitions of one-sided limits can be phrased
in terms of sequences.
Theorem 5.1.18. Let a ∈ R, let I be an open interval with a as the left
(right) endpoint, and let f be a function defined on I. Then L = limx→a+ f (x)
(L = limx→a− f (x)) if and only if whenever (xn )n≥1 has the properties that
xn > a (xn < a) for all n ∈ N and limn→∞ xn = a, then limn→∞ f (xn ) = L.
Proof. Repeat the ideas of Theorem 5.1.6.
Again, as with two-sided limits, the results pertaining to limits of sequences easily import to the one-sided limit setting.
Corollary 5.1.19. The conclusions of Theorems 5.1.9, 5.1.12, and 5.1.14 hold when two-sided limits are replaced with one-sided limits (under the necessary modifications to the hypotheses).
It is clear that for limx→a f (x) to exist, we must have limx→a+ f (x) and
limx→a− f (x) existing. However, Example 5.1.17 demonstrates the existence
of limx→a+ f (x) and limx→a− f (x) is not enough. Of course, the problem
with Example 5.1.17 is simply that limx→0+ f(x) ≠ limx→0− f(x).
Theorem 5.1.20. Let a ∈ R, let I be an open interval containing a, and
let f be a function defined on I except at possibly a. Then limx→a f(x) exists
if and only if limx→a+ f (x) and limx→a− f (x) exist and limx→a+ f (x) =
limx→a− f (x). Furthermore, if limx→a f (x) exists, then
limx→a f(x) = limx→a+ f(x) = limx→a− f(x).
Proof. For the first direction, suppose limx→a f(x) exists and let L = limx→a f(x). We desire to show that limx→a+ f(x) and limx→a− f(x) exist and are both equal to L. To see this, let ε > 0 be arbitrary. Since L = limx→a f(x), there exists a δ > 0 such that if 0 < |x − a| < δ, then |f(x) − L| < ε. Hence if a < x < a + δ then |f(x) − L| < ε. Thus, as ε > 0 was arbitrary, by the definition of the right-sided limit limx→a+ f(x) exists and equals L. Similarly if a − δ < x < a then |f(x) − L| < ε. Thus, as ε > 0 was arbitrary, by the definition of the left-sided limit limx→a− f(x) exists and equals L. Hence this direction of the proof is complete.
For the other direction, suppose limx→a+ f(x) and limx→a− f(x) exist and L = limx→a+ f(x) = limx→a− f(x). To see that limx→a f(x) exists and equals L, let ε > 0 be arbitrary. Since L = limx→a+ f(x), there exists a δ1 > 0 such that if a < x < a + δ1, then |f(x) − L| < ε. Similarly, since L = limx→a− f(x), there exists a δ2 > 0 such that if a − δ2 < x < a, then |f(x) − L| < ε. Therefore, if δ = min{δ1, δ2} > 0, then 0 < |x − a| < δ implies either a < x < a + δ1 or a − δ2 < x < a and thus |f(x) − L| < ε. Therefore, since ε > 0 was arbitrary, we have by the definition of the limit that limx→a f(x) exists and equals L.
The benefit of Theorem 5.1.20 is that it is often easier to deal with one side of the limit at a time, and then combine the results at the end.
Theorem 5.1.21 (The Fundamental Trigonometric Limit).

limθ→0 sin(θ)/θ = 1.
Proof. First, suppose 0 < θ < π/2. Consider the following diagrams, where the specified point is (cos(θ), sin(θ)):

[Diagrams omitted: three pictures of the unit circle showing the triangle with vertices (0, 0), (1, 0), (cos(θ), sin(θ)); the circular sector of angle θ; and the triangle with vertices (0, 0), (1, 0), (1, tan(θ)).]
Note, as the coordinates of the point on the circle are (cos(θ), sin(θ)), the area of the first triangle is (1/2)cos(θ)sin(θ) and the area of the second triangle is (1/2)(1)tan(θ). The area of the region subtended by the arc is θ/(2π) times the area of the circle, which is π. Hence the area of the region subtended by the arc is θ/2. Hence we see that

(1/2)cos(θ)sin(θ) ≤ (1/2)θ ≤ (1/2)(sin(θ)/cos(θ)).

Therefore, since (1/2)sin(θ) > 0 when 0 < θ < π/2, we obtain that

0 < cos(θ) ≤ θ/sin(θ) ≤ 1/cos(θ).

Hence, by taking reciprocals, we obtain that

1/cos(θ) ≥ sin(θ)/θ ≥ cos(θ).

However, since limθ→0 cos(θ) = 1 and thus limθ→0 1/cos(θ) = 1, we obtain by the Squeeze Theorem that limθ→0+ sin(θ)/θ = 1 (where we only get the right-sided limit as we assumed 0 < θ < π/2). Since limθ→0− sin(θ)/θ = 1 by similar arguments (or because sin(−θ) = −sin(θ)), the result follows by Theorem 5.1.20.
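The squeeze cos(θ) ≤ sin(θ)/θ ≤ 1/cos(θ) can also be observed numerically; the sketch below (our own check, not part of the notes) samples small positive θ and watches the ratio approach 1:

```python
import math

# sin(theta)/theta sits between cos(theta) and 1/cos(theta) for small
# positive theta, and the ratio approaches 1 as theta shrinks
thetas = [10.0 ** -k for k in range(1, 9)]
ratios = [math.sin(t) / t for t in thetas]

assert all(math.cos(t) <= r <= 1 / math.cos(t) for t, r in zip(thetas, ratios))
print(abs(ratios[-1] - 1.0) < 1e-9)
```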
5.1.4 Limits at and to Infinity
There are many more types of limits we may examine. Much of the theory
follows along the same lines as the previous results in this section, so we will
only summarize the definitions and results, and provide a few examples.
First, instead of requiring a ∈ R, we may ask for limits as x tends to
±∞.
Definition 5.1.22. Let f be a function defined on an interval (c, ∞). A number L ∈ R is said to be the limit of f as x tends to ∞ if for every ε > 0 there exists an M > 0 (which depends on ε) such that if M ≤ x then |f(x) − L| < ε. In this case, we say that f(x) converges to L as x tends to ∞ and write L = limx→∞ f(x).
Definition 5.1.23. Let f be a function defined on an interval (−∞, c). A number L ∈ R is said to be the limit of f as x tends to −∞ if for every ε > 0 there exists an M > 0 (which depends on ε) such that if x ≤ −M then |f(x) − L| < ε. In this case, we say that f(x) converges to L as x tends to −∞ and write L = limx→−∞ f(x).
Theorem 5.1.24. Let f be a function defined on an interval (c, ∞). Then
L = limx→∞ f (x) if and only if whenever (xn )n≥1 has the properties that
xn > c for all n ∈ N and limn→∞ xn = ∞, then limn→∞ f (xn ) = L.
Proof. Repeat the ideas of Theorem 5.1.6.
Corollary 5.1.25. The conclusions of Theorems 5.1.9, 5.1.12, and 5.1.14 hold when a = ±∞ (under the necessary modifications to the hypotheses).
Example 5.1.26. It is not difficult to verify based on the definitions that limx→∞ 1/x = 0.
Example 5.1.27. Let f(x) = (3x² − 2x + 1)/(2x² + 5x − 2). Then, for sufficiently large x,

f(x) = [x²(3 − 2/x + 1/x²)] / [x²(2 + 5/x − 2/x²)] = (3 − 2/x + 1/x²)/(2 + 5/x − 2/x²).

Hence

limx→∞ f(x) = (3 − 2(0) + (0)(0))/(2 + 5(0) − 2(0)(0)) = 3/2.
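The factor-out-the-leading-power computation can be sanity-checked by evaluating f at large inputs; the sketch below (our own illustration, not from the notes) confirms the values approach 3/2:

```python
# Values of the rational function from Example 5.1.27 approach 3/2
# as x grows; the error shrinks roughly like 1/x.
def f(x):
    return (3 * x**2 - 2 * x + 1) / (2 * x**2 + 5 * x - 2)

print(all(abs(f(10.0 ** k) - 1.5) < 10.0 ** -(k - 1) for k in range(2, 8)))
```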
Example 5.1.28. We claim that limx→∞ sin(x)/x = 0. Indeed, since −1 ≤ sin(x) ≤ 1 for all x ∈ R, we see that

−1/x ≤ sin(x)/x ≤ 1/x

for all x > 0. Hence, since limx→∞ 1/x = limx→∞ −1/x = 0, we obtain that limx→∞ sin(x)/x = 0 by the Squeeze Theorem.
Like with sequences, we can discuss limits of unbounded functions.
Definition 5.1.29. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. The function f is said to diverge to infinity (negative infinity) as x tends to a if for every M > 0 there exists a δ > 0 (which depends on M) such that if 0 < |x − a| < δ then f(x) ≥ M (f(x) ≤ −M). In this case we write limx→a f(x) = ∞ (limx→a f(x) = −∞).
Example 5.1.30. Notice that limx→0 1/|x| = ∞. Indeed if M > 0 and 0 < |x| < 1/M, then 1/|x| > M. However, limx→0 1/x ≠ ∞ since if x < 0, then 1/x < 0 < M.
Of course, we may combine Definition 5.1.29 with Definitions 5.1.22 and
5.1.23. Furthermore, we may discuss one-sided versions of Definitions 5.1.22
and 5.1.23. We could go on and on showing the same results hold and doing
more examples. Instead, let's move on to bigger and better things.
5.2 Continuity of Functions
With our discussion of limits complete, we may start using them to study
far more interesting objects. In particular, we desire to examine functions
for which limits exist at each point and are equal to evaluating the function
at each point.
Definition 5.2.1. A function f defined on an open interval containing
a number a ∈ R is said to be continuous at a if limx→a f (x) exists and
limx→a f (x) = f (a).
Of course, as we had various definitions of the limit of a function, we can
rephrase the definition of continuity in various ways. Using the ε-δ version
of a limit, we obtain the following equivalent formulation of continuity.
Definition 5.2.2 (ε-δ Definition of Continuity). A function f defined on an open interval containing a number a ∈ R is said to be continuous at a if for all ε > 0 there exists a δ > 0 such that if |x − a| < δ then |f(x) − f(a)| < ε.
Similarly, using Theorem 5.1.6, we obtain the following equivalent formulation of continuity.
Definition 5.2.3 (Sequential Definition of Continuity). A function
f defined on an open interval containing a number a ∈ R is said to be
continuous at a if whenever (xn )n≥1 converges to a, the sequence (f (xn ))n≥1
converges to f (a).
Example 5.2.4. Using Example 5.1.10, we see that if p(x) is a polynomial,
then p(x) is continuous at a for all a ∈ R. Similarly, using Example 5.1.11,
we see that if p(x) and q(x) are polynomials, then p(x)/q(x) is continuous at a provided q(a) ≠ 0.
Example 5.2.5. The function f(x) = √x is continuous at a for all a > 0 by Example 5.1.7.
Remark 5.2.6. We will assume throughout the remainder of the course
that sin(x), cos(x), and ex are continuous at all points in R. Furthermore,
we will assume that ln(x) (the inverse of ex ), is continuous for all a > 0. The
idea behind the proofs that all of these functions are continuous (modulo
ln which will be demonstrated in the next section) are based on ideas from
Section 5.4 together with a notion of convergence of functions (which we will
not deal with in this course).
Of course, once one has continuous functions, we expect we can form new continuous functions using the old ones.
Theorem 5.2.7. Let f (x) and g(x) be functions on an open interval I
containing a number a ∈ R. If f and g are continuous at a, then
a) f (x) + g(x) is continuous at a.
b) f (x)g(x) is continuous at a.
c) cf (x) is continuous at a for all c ∈ R.
d) f(x)/g(x) is continuous at a provided g(a) ≠ 0.
Proof. Apply Theorem 5.1.9 together with the definition of continuity.
One other nice operation we have seen for functions is composition.
Indeed, if f, g : R → R are functions, we can consider the function g ◦ f and
ask whether or not g ◦ f is continuous at a point. Of course, we will want
to extend to functions that are not defined on all of R, so we will need to
impose conditions to guarantee the composition is well-defined. In fact the
following is the best we can do to obtain continuity at a point (see later
examples).
Theorem 5.2.8. Let f be a function defined on an open interval I1 containing a number a ∈ R. Let g be a function defined on an open interval I2 which contains f(I1). If f is continuous at a and g is continuous at f(a), then g ◦ f is continuous at a.
Proof. Let (xn )n≥1 be a sequence such that limn→∞ xn = a. Since f is
continuous at a, we know that limn→∞ f (xn ) = f (a). Since g is continuous
at f (a) and (f (xn ))n≥1 is a sequence that converges to f (a), we obtain that
limn→∞ g(f (xn )) = g(f (a)). Hence, by definition g ◦ f is continuous at a.
Of course, there are many functions that are not continuous at points
which we now desire to discuss.
Example 5.2.9. Consider the function

f(x) = { x if x > 0; 1 if x = 0; −x² if x < 0 }.

Then f is continuous at a precisely when a ≠ 0. Indeed f is not continuous at 0 since limx→0 f(x) = 0, yet f(0) = 1 ≠ 0. Such a discontinuity (one where limx→a f(x) exists, but is not equal to f(a)) is said to be removable as one could simply redefine the value of f at a to obtain a function that is continuous at a (i.e. we can remove the discontinuity by redefining a single point).
Of course, if limx→a f (x) does not exist, then clearly f is discontinuous
at a. The following are some basic examples of how limx→a f (x) may fail to
exist.
Example 5.2.10. Consider the function

f(x) = { x/|x| if x ≠ 0; 1 if x = 0 }.

Then limx→0 f(x) does not exist. Since limx→0+ f(x) and limx→0− f(x) exist, such a discontinuity is called a jump discontinuity.
Example 5.2.11. The function

f(x) = { 1/x if x ≠ 0; 0 if x = 0 }

is discontinuous at 0 since limx→0+ f(x) = ∞. This is an example of an unbounded discontinuity.
Example 5.2.12. The function f : R → R defined by

f(x) = { 0 if x = 0; sin(1/x) if x ≠ 0 }

is discontinuous at 0 since limx→0+ f(x) does not exist. This is an example of an oscillating discontinuity.
Of course, a discussion would not be complete without examining some very bizarre functions.
Example 5.2.13. Consider the function

f(x) = { 1 if x ∈ Q; 0 if x ∈ R \ Q }.
If a ∈ R, then f is discontinuous at a. Indeed if a ∈ R \ Q, we know (from
the homework) that there exists a sequence (xn )n≥1 of rational numbers
such that limn→∞ xn = a. However, since f (xn ) = 1 for all n yet f (a) = 0,
we obtain that f is discontinuous at a. Similarly, if a ∈ Q, we know (from the homework) that there exists a sequence (xn)n≥1 of irrational numbers such
that limn→∞ xn = a. However, since f (xn ) = 0 for all n yet f (a) = 1, we
obtain that f is discontinuous at a.
Example 5.2.14. Consider the function

f(x) = { 1/n if x = m/n where n ∈ N and m ∈ Z have no common divisors; 0 if x ∈ R \ Q }.
By the same arguments as the previous example, f is discontinuous at each
rational number. However, f is continuous at each irrational number (see
the homework).
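The function in Example 5.2.14 is often called Thomae's function. Its rational values are easy to compute with exact arithmetic; the sketch below (our own illustration, not from the notes) uses Python's Fraction type, whose automatic reduction to lowest terms supplies the denominator n:

```python
from fractions import Fraction

def thomae(x):
    """Value of the function of Example 5.2.14 at an exact rational x:
    if x = m/n in lowest terms, the value is 1/n. (Irrational inputs,
    where the value is 0, cannot be represented by a Fraction.)"""
    q = Fraction(x)  # reduces to lowest terms automatically
    return Fraction(1, q.denominator)

print(thomae(Fraction(3, 6)), thomae(Fraction(5)), thomae(Fraction(22, 7)))
```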
To complete this section, we note that continuity and compositions do
not play well together if there are some discontinuities.
Remark 5.2.15. If f is discontinuous on Q and g is discontinuous at a
single point, it is possible that g ◦ f is nowhere continuous. Indeed let
g(x) = { 1 if x ≠ 0; 0 if x = 0 }
and let f be as in Example 5.2.14. Then g ◦ f is the function in Example
5.2.13, which was discontinuous at all points.
There are many other questions one may ask pertaining to discontinuities.
For example, is there a function that is continuous on Q but discontinuous
on R \ Q? The answer to this question turns out to be no. However, although
we could pursue this question, we will not for the sake of studying functions
that are continuous on a large collection of points and their properties.
5.3 The Intermediate Value Theorem
Suppose two people start walking on a path at the same time, one starting at one end, the other starting at the other. If they walk until they both reach the end of the path opposite to where they started, must they eventually meet as time progresses? Of course, logic says they must. But how can we mathematically prove said result?
Of course, specific assumptions must be made in the above problem. For
example, we are assuming that position is a function of time (no time travel
permitted via DeLoreans and TARDISes). Furthermore, we must assume that
our functions are continuous at each point (i.e. no teleportation). As such,
we desire to study functions that are continuous on open/closed intervals.
Definition 5.3.1. A function f defined on (a, b) is said to be continuous on
(a, b) if f is continuous at all points in (a, b).
Definition 5.3.2. A function f defined on [a, b] is said to be continuous on
[a, b] if f is continuous on (a, b), limx→a+ f (x) = f (a), and limx→b− f (x) =
f (b).
Notice that polynomials, exponentials, sine, and cosine are continuous
functions on any open or closed interval. Furthermore, Theorems 5.2.7 and
5.2.8 may be used to construct additional continuous functions from known
continuous functions.
For more examples of continuous functions on intervals, the function f(x) = √x is continuous on any open or closed subinterval of [0, ∞). Similarly, 1/x and sin(1/x) are continuous on (0, 1) but cannot be extended to be continuous
functions on [0, 1] (that is, there is no way to define values of these functions at 0 to make them continuous).
Continuous functions on intervals will play a major role in the remainder
of this course. In fact, there is a deep connection between continuous
functions and open sets (as discussed in Chapter 3). Indeed one can show
that a function f : R → R is continuous on R if and only if f −1 (U ) is open
for every open set U ⊆ R. This enables one to define the notion of continuous
functions on arbitrary sets provided one has a nice notion of ‘open sets’.
Instead of pursuing abstractification of continuous functions, we will
examine properties of continuous functions on (generally closed) intervals
of R. We begin by obtaining our first piece of the Triforce of theory about
continuous functions.
Theorem 5.3.3 (The Intermediate Value Theorem). If f : [a, b] → R
is continuous on [a, b] and if α ∈ R is such that f (a) < α < f (b) (or
f (b) < α < f (a)), then there exists a c ∈ (a, b) such that f (c) = α.
Proof. Let f : [a, b] → R be a continuous function on a closed interval [a, b]
and let α ∈ R be such that f (a) < α < f (b) (the proof when f (b) < α < f (a)
is similar). Define g : [a, b] → R by g(x) = f (x) − α for all x ∈ [a, b].
Hence it suffices to show there exists a c ∈ (a, b) such that g(c) = 0. Notice
g(a) < 0 < g(b) and that g is continuous on [a, b].
Let
S = {x ∈ [a, b] | g(x) ≤ 0}.
Since g(a) < 0, we see that a ∈ S so S is non-empty. Furthermore, since S
is bounded above by b, the Least Upper Bound Property (Theorem 1.3.18)
implies that c = lub(S) exists. Notice c ≠ b since g(b) > 0 so b ∉ S. Hence c ∈ [a, b).
We claim that g(c) = 0. Note this would imply c ≠ a (so c ∈ (a, b) as
desired) since g(a) < 0. To see that g(c) = 0, first recall (by the homework)
that there exists a sequence (xn )n≥1 such that xn ∈ S for all n ∈ N and
limn→∞ xn = lub(S) = c. Therefore, since g is continuous, we obtain that
g(c) = limn→∞ g(xn ). Since xn ∈ S for all n ∈ N, we know that g(xn ) ≤ 0 for
all n ∈ N and thus g(c) = limn→∞ g(xn ) ≤ 0. Thus, to show that g(c) = 0,
it suffices to show that g(c) ≥ 0.
Since c < b, for each n ∈ N there exists a yn ∈ [a, b] such that c < yn < c + 1/n. Since yn > c, we have that yn ∉ S for all n ∈ N. Since limn→∞ yn = c by the Squeeze Theorem, we obtain that g(c) = limn→∞ g(yn) as g is continuous. Since yn ∉ S for all n ∈ N, g(yn) > 0 for all n ∈ N and thus g(c) = limn→∞ g(yn) ≥ 0. Hence g(c) = 0 so the proof is complete.
The Intermediate Value Theorem has a wide range of applications. For
example, it may be used to solve the question posed at the beginning of
this section (modulo modes of transit that have yet to be invented).
To see this, suppose the path is c units long. Suppose that f (t) and g(t)
represent the distance along the path from the start for both of the two
people. Then the values of the functions at 0 are 0 and c; that is, we may
assume that f (0) = 0 and g(0) = c. Eventually, when t is some number
b, the people are at the opposite ends of the path. Thus f (b) = c and
g(b) = 0. If we consider the function h(t) = g(t) − f (t), then h(0) = c > 0
whereas h(b) = −c < 0. Consequently, assuming that h is continuous, the
Intermediate Value Theorem implies that there exists a time t0 such that
h(t0 ) = 0; that is, g(t0 ) = f (t0 ) and the two people are at the same point on
the path.
Time for one more example:
Example 5.3.4. We claim there exists a z ∈ [0, π/2] such that cos(z) = z. To see this, consider the function f(x) = x − cos(x). Since

f(0) = 0 − 1 = −1 < 0 and f(π/2) = π/2 − 0 = π/2 > 0,

and since f is continuous, the Intermediate Value Theorem implies there exists a z ∈ [0, π/2] such that f(z) = 0 (and thus cos(z) = z).
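The Intermediate Value Theorem also underlies the bisection method for locating such a point numerically: halving the interval while keeping a sign change traps a root. A minimal sketch (our own illustration, assuming only the hypotheses of the theorem) approximates the z of Example 5.3.4:

```python
import math

def bisect(g, lo, hi, tol=1e-12):
    """If g is continuous with g(lo) < 0 < g(hi), the Intermediate Value
    Theorem guarantees a root in (lo, hi); repeatedly halve the interval,
    keeping the endpoint signs opposite, until the interval is tiny."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = bisect(lambda x: x - math.cos(x), 0.0, math.pi / 2)
print(round(z, 6))  # the unique fixed point of cos in [0, pi/2]
```

Every halving preserves a sign change across the subinterval, so the IVT applies at each step; the iterates converge to a point with cos(z) = z.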
5.4 Uniform Continuity
There is a stronger form of continuity that we may wish to examine in this
course.
Definition 5.4.1. A function f defined on an interval I is said to be
uniformly continuous on I if for all ε > 0 there exists a δ > 0 such that if x, y ∈ I and |x − y| < δ then |f(x) − f(y)| < ε.
Note the difference between Definitions 5.2.2 and 5.4.1 is that for a fixed ε, the δ > 0 need only work for a given point in Definition 5.2.2 whereas for uniformly continuous functions, Definition 5.4.1 enforces that the same δ works for all points in the interval!
It is clear from the definition of continuity that constant functions are uniformly continuous (for an ε > 0, we can let δ be anything). Furthermore, it is clear that if f(x) = x for all x ∈ R, then f is uniformly continuous on R (for an ε > 0, let δ = ε in the definition). For other functions, uniform continuity is harder to determine.
Example 5.4.2. We claim that f(x) = x² is uniformly continuous on (−1, 1). To see this, let ε > 0 be arbitrary and let δ = ε/2 > 0. Then if x, y ∈ (−1, 1) and |x − y| < δ, we notice that

|f(x) − f(y)| = |x² − y²| = |x + y||x − y| ≤ 2|x − y| < 2δ = ε

where we have used the fact that |x + y| ≤ 2 for all x, y ∈ (−1, 1). Hence, as ε > 0 was arbitrary, the result follows.
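The single δ = ε/2 can be stress-tested by random sampling over the whole interval; the sketch below (our own check, not from the notes) draws many pairs with |x − y| < δ and confirms |x² − y²| < ε every time:

```python
import random

def check_uniform_square(eps, trials=10000, seed=0):
    """Randomly sample pairs x, y in (-1, 1) with |x - y| < delta = eps/2
    and confirm |x^2 - y^2| < eps; one delta serves the whole interval."""
    rng = random.Random(seed)
    delta = eps / 2
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)
        y = x + rng.uniform(-delta, delta)
        if -1.0 < y < 1.0 and abs(x * x - y * y) >= eps:
            return False  # a counterexample pair was found
    return True

print(all(check_uniform_square(e) for e in (1.0, 0.1, 1e-3)))
```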
Example 5.4.3. We claim that f(x) = x² is not uniformly continuous on R. To see this, we claim Definition 5.4.1 fails for ε = 2. To see this, we must demonstrate that for all δ > 0 there exist x, y ∈ R such that |x − y| < δ yet |f(x) − f(y)| ≥ 2. In particular, for n ∈ N, let xn = n and yn = n + 1/n. Then |xn − yn| = 1/n yet

|f(xn) − f(yn)| = |n² − (n + 1/n)²| = 2 + 1/n² ≥ 2.

Hence, given δ > 0, choosing n ∈ N with 1/n < δ produces points with |xn − yn| < δ yet |f(xn) − f(yn)| ≥ 2. Hence f is not uniformly continuous on R.
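The witness pairs xn = n, yn = n + 1/n are easy to tabulate; the sketch below (our own illustration, not part of the notes) confirms that the inputs squeeze together while the outputs stay at least 2 apart:

```python
# Witness pairs for Example 5.4.3: the inputs get arbitrarily close,
# yet f(x) = x^2 keeps their images at distance >= 2.
pairs = [(float(n), n + 1.0 / n) for n in range(1, 1001)]
gaps = [(abs(x - y), abs(x * x - y * y)) for x, y in pairs]

print(min(d for d, _ in gaps) <= 0.001        # inputs squeeze together
      and all(D >= 2 for _, D in gaps))       # outputs stay 2 apart
```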
The above examples show how uniform continuity may be more desirable
than simple continuity; having a δ that works for the whole interval seems
far more powerful than at a single point. However, we have seen that even
x² is not uniformly continuous on all of R as x² grows too quickly as x tends
to infinity. Consequently, one may ask, “Are things much nicer if we restrict
to intervals?” For closed intervals, yes!
Theorem 5.4.4. If a function f is continuous on a closed interval [a, b],
then f is uniformly continuous on [a, b].
Proof. Let f be a continuous function on [a, b]. Suppose to the contrary that f is not uniformly continuous on [a, b]. Thus, by the definition of uniform continuity, there exists an ε > 0 such that for all δ > 0 there exist x, y ∈ [a, b] with |x − y| < δ yet |f(x) − f(y)| ≥ ε. Hence, for each n ∈ N there exist xn, yn ∈ [a, b] with |xn − yn| < 1/n yet |f(xn) − f(yn)| ≥ ε.
Since [a, b] is closed and bounded, [a, b] is sequentially compact by Theorem 3.3.8. Therefore there exists a subsequence (xkn)n≥1 of (xn)n≥1 that converges to some number L ∈ [a, b]. Consider the subsequence (ykn)n≥1 of (yn)n≥1. Notice for all n ∈ N that

|ykn − L| ≤ |ykn − xkn| + |xkn − L| ≤ 1/kn + |xkn − L| ≤ 1/n + |xkn − L|.

Therefore, since limn→∞ |xkn − L| = 0 and limn→∞ 1/n = 0, we obtain that limn→∞ ykn = L by the Squeeze Theorem.
Since limn→∞ xkn = L and since f is continuous, there exists an N1 ∈ N such that |f(xkn) − f(L)| < ε/2 for all n ≥ N1. Similarly, since limn→∞ ykn = L and since f is continuous, there exists an N2 ∈ N such that |f(ykn) − f(L)| < ε/2 for all n ≥ N2. Therefore, if n = max{N1, N2}, we obtain that

|f(xkn) − f(ykn)| ≤ |f(xkn) − f(L)| + |f(L) − f(ykn)| < ε/2 + ε/2 = ε,

which contradicts the fact that |f(xkn) − f(ykn)| ≥ ε. Hence, as we have obtained a contradiction, it must have been the case that f is uniformly continuous on [a, b].
Note the essential part of the above proof is that [a, b] is compact. In
fact, if one extends the notion of continuous function from functions with
intervals as domains to functions with arbitrary sets as domains, then any
continuous function on a compact set will be uniformly continuous.
Using Theorem 5.4.4, we can demonstrate additional functions on R are
uniformly continuous.
Example 5.4.5. We claim that the function f(x) = cos(x) is uniformly continuous on R. To see this, let ε > 0 be arbitrary. Since f is uniformly continuous on [−2π, 2π], there exists a δ > 0 such that |f(x) − f(y)| < ε whenever x, y ∈ [−2π, 2π] and |x − y| < δ. Due to the fact that cos(x + 2π) = cos(x) for all x ∈ R, it is then easy to see that if x, y ∈ R are such that |x − y| < δ, then |f(x) − f(y)| < ε.
Example 5.4.6. We claim that the function f(x) = x/(x² + 1) is uniformly continuous on R. Indeed let ε > 0 be arbitrary. Since limx→∞ f(x) = 0, there exists an M1 > 0 such that |f(x)| < ε/2 for all x > M1. Similarly, since limx→−∞ f(x) = 0, there exists an M2 > 0 such that |f(x)| < ε/2 for all x < −M2. Since f is continuous on [−3M2, 3M1], f is uniformly continuous there by Theorem 5.4.4 and thus there exists a δ0 > 0 such that if x, y ∈ [−3M2, 3M1] and |x − y| < δ0, then |f(x) − f(y)| < ε.
Let δ = min{δ0, M1, M2} > 0. We claim that δ works for this ε. To see this, suppose x, y ∈ R are such that |x − y| < δ. If x ∈ [−2M2, 2M1], then |x − y| < δ implies y ∈ [−3M2, 3M1] so |f(x) − f(y)| < ε by the choice of δ0. If x > 2M1 then |x − y| < δ implies y > M1 so

|f(x) − f(y)| ≤ |f(x)| + |f(y)| < ε/2 + ε/2 = ε.

Finally, if x < −2M2 then |x − y| < δ implies y < −M2 so

|f(x) − f(y)| ≤ |f(x)| + |f(y)| < ε/2 + ε/2 = ε.

Hence, as we have exhausted all possible cases for x, the result follows.
To see that Theorem 5.4.4 fails on open intervals, we note the following
two examples.
Example 5.4.7. We claim that f(x) = 1/x is not uniformly continuous on
(0, 1). To see this, we claim Definition 5.4.1 fails for ε = 1. Thus we
must demonstrate that for all δ > 0 there exist x, y ∈ (0, 1) such that
|x − y| < δ yet |f(x) − f(y)| ≥ 1 (choosing n ≥ 3 with 1/n < δ below will do).
In particular, for n ∈ N with n ≥ 3, let xn = 1/n and yn = 2/n. Then
|xn − yn| = 1/n yet
|f(xn) − f(yn)| = |n − n/2| = n/2 ≥ 1.
Hence f is not uniformly continuous on (0, 1).
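To see the failure numerically (an illustration, not part of the notes), one can compute the pairs xn = 1/n, yn = 2/n and observe that they become arbitrarily close while their images under f stay at least 1 apart:

```python
# Numerical sketch: f(x) = 1/x on (0, 1) maps arbitrarily close pairs
# to values that stay at least 1 apart, so no single delta can work.
def f(x):
    return 1.0 / x

for n in [3, 10, 100, 1000]:
    x_n, y_n = 1.0 / n, 2.0 / n          # both lie in (0, 1) for n >= 3
    gap = abs(x_n - y_n)                  # = 1/n, shrinks to 0
    jump = abs(f(x_n) - f(y_n))           # = n/2, grows without bound
    assert jump >= 1.0
    print(n, gap, jump)
```

Any candidate δ is eventually beaten: pick n with 1/n < δ and the pair (xn, yn) violates the uniform-continuity estimate for ε = 1.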
5.4. UNIFORM CONTINUITY
Example 5.4.8. We claim that f(x) = sin(1/x) is not uniformly continuous
on (0, 1). To see this, we claim Definition 5.4.1 fails for ε = 1. Thus
we must demonstrate that for all δ > 0 there exist x, y ∈ (0, 1) such that
|x − y| < δ yet |f(x) − f(y)| ≥ 1. In particular, for n ∈ N, let
xn = 1/(2πn + π/2) and yn = 1/(2πn + 3π/2). Since
limn→∞ xn = 0 = limn→∞ yn,
for any δ > 0 we can find an N ∈ N such that |xN| < δ/2 and |yN| < δ/2.
Hence |xN − yN| < δ yet
|f(xN) − f(yN)| = |1 − (−1)| = 2 ≥ 1.
Hence f is not uniformly continuous on (0, 1).
The following result shows that the reason the above functions fail to be
uniformly continuous on bounded open intervals is that their one-sided
limits at the endpoints do not exist.
Theorem 5.4.9. Let f : (a, b) → R where a, b ∈ R. Then f is uniformly
continuous on (a, b) if and only if there exists a continuous function g :
[a, b] → R such that f (x) = g(x) for all x ∈ (a, b).
Proof. Suppose there exists a continuous function g : [a, b] → R such that
f (x) = g(x) for all x ∈ (a, b). Then, as g(x) is uniformly continuous on [a, b]
by Theorem 5.4.4, we clearly obtain that f is uniformly continuous on (a, b)
(i.e. for ε > 0, the δ that works for g also works for f ).
For the other direction, suppose f is uniformly continuous on (a, b). To
complete the result, we need to find a continuous g : [a, b] → R such that
f (x) = g(x) for all x ∈ (a, b). In particular, we need only define g(a) and
g(b). However, we will need
g(a) = limx→a+ f(x)   and   g(b) = limx→b− f(x).
Hence, provided the above two one-sided limits exist, the result will be
complete.
To see that limx→a+ f(x) exists via Theorem 5.1.18, we must show
that (f(xn))n≥1 converges for every sequence (xn)n≥1 with xn > a for all n
and limn→∞ xn = a, AND that (f (xn ))n≥1 converges to the same number
for every choice of (xn )n≥1 .
First, let (xn )n≥1 be such that a < xn < b for all n and limn→∞ xn = a.
To see that (f (xn ))n≥1 converges, we claim that (f (xn ))n≥1 is Cauchy (and
thus converges by Theorem 3.1.5). To see this, let ε > 0 be arbitrary. Since f
is uniformly continuous on (a, b) there exists a δ > 0 such that if x, y ∈ (a, b)
and |x − y| < δ, then |f(x) − f(y)| < ε. Since limn→∞ xn = a, we know that
(xn)n≥1 is Cauchy and thus there exists an N ∈ N such that |xn − xm| < δ
for all n, m ≥ N. Hence, if n, m ≥ N, we obtain that |f(xn) − f(xm)| < ε as
desired. Hence (f(xn))n≥1 is Cauchy.
Suppose that (xn )n≥1 and (yn )n≥1 are such that a < xn , yn < b for all
n ∈ N and limn→∞ xn = limn→∞ yn = a. Thus L = limn→∞ f (xn ) and
K = limn→∞ f(yn) exist by the previous paragraph. To see that L = K, let
ε > 0 be arbitrary. Since f is uniformly continuous on (a, b) there exists a
δ > 0 such that if x, y ∈ (a, b) and |x − y| < δ, then |f(x) − f(y)| < ε/3. Since
limn→∞ (xn − yn) = 0, there exists an N1 ∈ N such that |xn − yn| < δ for
all n ≥ N1. Since limn→∞ f(xn) = L, there exists an N2 ∈ N such that
|f(xn) − L| < ε/3 for all n ≥ N2. Similarly, since limn→∞ f(yn) = K, there
exists an N3 ∈ N such that |f(yn) − K| < ε/3 for all n ≥ N3. Thus, if
N = max{N1, N2, N3}, then |xN − yN| < δ so
|L − K| ≤ |L − f(xN)| + |f(xN) − f(yN)| + |f(yN) − K| < ε/3 + ε/3 + ε/3 = ε.
Thus, we have shown that |L − K| < ε for all ε > 0. This implies |L − K| = 0
and thus L = K as desired.
Hence we may define g(a) = limx→a+ f(x). Similar arguments show
that we may define g(b) as desired, thereby completing the proof.
In the above proof, we have used what is known as a three-ε argument.
Said argument is quite useful. For example, one can use a three-ε argument
to show that a 'uniform limit of continuous functions is a continuous function'.
This enables us to define the exponential function along with sine and cosine
as uniform limits of polynomials, thereby proving they are continuous. The
construction of such polynomials will be motivated in the next chapter.
Chapter 6
Differentiation
With the above study of continuity, we may turn our attention to studying
another important concept in calculus: differentiation. Constructed to be an
approximation to the slope of the tangent line of the graph of a function at a
point, derivatives are essential to studying the rates of change of dynamical
systems. Furthermore, the theory of derivatives can aid in computing limits
of functions and in approximating functions with polynomials.
6.1
The Derivative
To begin our study of the theory of differentiation, we first examine a
formal definition of the derivative and some of the basic rules for computing
derivatives.
6.1.1
Definition of a Derivative
Given a function f defined on an open interval I containing a, we desire to
define the derivative to be the slope of the tangent line of the graph of f at
a. As an approximation, we may pick any point x ∈ I and look at the slope
of the line from (x, f (x)) to (a, f (a)). The slope of said line is
(f(x) − f(a))/(x − a).
Furthermore, as x gets closer and closer to a, the slope of the line from
(x, f (x)) to (a, f (a)) should better and better approximate the slope of
the tangent line to f at a. Alternatively, one can think of the slopes
better and better approximating the instantaneous rate of change of f at a.
Consequently, we define the derivative as follows.
Definition 6.1.1. Let f be a function defined on an open interval I containing
a. It is said that f is differentiable at a if
limx→a (f(x) − f(a))/(x − a)
exists. If f is differentiable at a, we use f′(a) to denote the above limit.
If f is differentiable at each point x in I, then the function f′ : I → R
whose value at x is f′(x) is called the derivative of f on I.
Of course, although our motivation for defining the derivative was to find
the slope of the tangent line to the graph of f at a, we have seen many odd
functions so far. Thus we should not rely too much on our intuition.
In addition, there is another way to formulate the derivative of a function
f at a. Indeed, if x is tending to a, then x − a tends to 0. Substituting
h = x − a, we see x = a + h so
limx→a (f(x) − f(a))/(x − a) = limh→0 (f(a + h) − f(a))/h.
This alternate formulation of the derivative is often useful for computations.
Example 6.1.2. If c ∈ R and f(x) = c for all x ∈ R, then f′(x) = 0 for all
x ∈ R. Indeed for all a ∈ R,
limx→a (f(x) − f(a))/(x − a) = limx→a (c − c)/(x − a) = limx→a 0 = 0.
Hence f′(a) exists for all a ∈ R and f′(a) = 0.
Example 6.1.3. Let n ∈ N and let f (x) = xn for all x ∈ R. Then for all
a ∈ R, we have (by a homework problem) that
limh→0 (f(a + h) − f(a))/h = limh→0 ((a + h)^n − a^n)/h
= limh→0 (1/h) ( Σ_{k=0}^{n} C(n, k) a^{n−k} h^k − a^n )
= limh→0 (1/h) Σ_{k=1}^{n} C(n, k) a^{n−k} h^k
= limh→0 Σ_{k=1}^{n} C(n, k) a^{n−k} h^{k−1}
= C(n, 1) a^{n−1}    (as limh→0 h^{k−1} = 0 for all k > 1)
= n a^{n−1},
where C(n, k) denotes the binomial coefficient.
Hence f′(a) exists for all a ∈ R and f′(a) = n a^{n−1}.
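As a numerical sanity check (not part of the notes; the values n = 5 and a = 2 are arbitrary), the difference quotients of f(x) = x^n approach n a^{n−1} as h shrinks:

```python
# Difference quotients ((a+h)^n - a^n)/h approach the derivative n*a^(n-1).
def diff_quotient(n, a, h):
    return ((a + h) ** n - a ** n) / h

n, a = 5, 2.0
exact = n * a ** (n - 1)                 # 5 * 2^4 = 80
for h in [1e-1, 1e-3, 1e-5]:
    approx = diff_quotient(n, a, h)
    print(h, approx, abs(approx - exact))
```

The error shrinks roughly linearly in h, matching the limit computed above.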
Example 6.1.4. Let f(x) = x^{1/2} for all x > 0. Then for all a > 0, we have
that
limx→a (f(x) − f(a))/(x − a) = limx→a (√x − √a)/(x − a)
= limx→a (√x − √a)(√x + √a)/[(x − a)(√x + √a)]
= limx→a 1/(√x + √a) = 1/(2√a).
Hence f′(a) exists for all a > 0 and f′(a) = 1/(2√a). Note it does not make
sense to discuss f′(0) as f is not defined on an open interval around 0.
Example 6.1.5. Let f(x) = |x| for all x ∈ R. It is not difficult to see that
f′(x) = 1 if x > 0 and f′(x) = −1 if x < 0 (as f(x) = x if x > 0 and
f(x) = −x if x < 0). However, f is not differentiable at 0. Indeed
limh→0+ (f(h) − f(0))/h = limh→0+ h/h = 1
whereas
limh→0− (f(h) − f(0))/h = limh→0− (−h)/h = −1.
Thus limh→0 (f(h) − f(0))/h does not exist, so f is not differentiable at 0.
From this point forward, we will assume that f(x) = e^x, g(x) = sin(x),
and h(x) = cos(x) are all differentiable on R with
f′(x) = e^x,   g′(x) = cos(x),   and   h′(x) = −sin(x),
and that k(x) = ln(x) and j(x) = x^p for p ∈ R are differentiable for x > 0
with k′(x) = 1/x and j′(x) = p x^{p−1}. In fact, much of the theory of this
chapter is devoted to showing that e^x is an injective function and that ln(x)
is continuous, differentiable, and has derivative 1/x.
Remark 6.1.6. To study the natural logarithm properly, we will need to
know that ex is continuous, differentiable, and has itself as its derivative.
One way to do this would be to define e^x to be the unique function f such
that f(0) = 1 and f′(x) = f(x). However, we would then need to prove
that such a function exists. Another way (which is easiest in my opinion) to
define e^x is to define
e^x = limn→∞ Σ_{k=0}^{n} x^k/k!
for all x ∈ R. Of course, one must show this limit exists, but it is possible
to show this using Cauchy sequences (see the homework for convergence of
infinite sums).
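The partial sums above converge rapidly; the following script (an illustration, not part of the notes) compares them with the library exponential:

```python
import math

# Partial sums s_n = sum_{k=0}^{n} x^k / k! of the exponential series.
def exp_partial_sum(x, n):
    total, term = 0.0, 1.0               # term holds x^k / k!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)              # next term: multiply by x/(k+1)
    return total

x = 1.5
for n in [5, 10, 20]:
    print(n, exp_partial_sum(x, n), math.exp(x))
```

Already at n = 20 the partial sum agrees with math.exp to machine precision for moderate x, reflecting the factorial decay of the terms.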
To study ex , we would then need to rigorously study the convergence of
sums of real numbers (i.e. series). Unfortunately, we will not have time to do
so. If we did, one must then show that the function f(x) = e^x is continuous.
To do this, it is first simpler to show that e^{a+b} = e^a e^b for all a, b ∈ R. This
can be done by carefully expanding the formula for e^{a+b}, exchanging terms
in the sums, and showing convergence. This is possible since the sums converge
in a very special manner (i.e. the sum is absolutely summable). As it is easy
to show that e^x > 0 if x ≥ 0 (i.e. it is a sum of non-negative terms bounded
below by 1), this implies e^x > 0 everywhere, for if y < 0 were such that e^y ≤ 0,
then 0 ≥ e^y e^{−y} = e^0 = 1, which is a contradiction.
If one wants to show that f is continuous at a, we may then notice that
|e^x − e^a| = |e^a||e^{x−a} − 1| = |e^a||e^{x−a} − e^0|,
so provided e^x is continuous at x = 0, we can show that e^x is continuous on
all of R. Again, dealing with the sums, we can see that e^x is continuous at
0. Similar arguments show that f(x) = e^x is differentiable everywhere with
f′(x) = e^x. Using the fact that e^x > 0 everywhere, we may use a result later
in this chapter (Theorem 6.4.6) to prove e^x is injective.
Moving forward, we notice that all of the above functions are continuous.
Coincidence, I think not!
Theorem 6.1.7. Let f be a function defined on an open interval containing
a. If f is differentiable at a, then f is continuous at a.
Proof. Suppose that f′(a) exists. Therefore limx→a (f(x) − f(a))/(x − a)
exists. Since limx→a (x − a) exists and since
f(x) − f(a) = [(f(x) − f(a))/(x − a)] (x − a),
we obtain that limx→a f(x) − f(a) exists and
limx→a f(x) − f(a) = [limx→a (f(x) − f(a))/(x − a)] [limx→a (x − a)] = f′(a) · 0 = 0.
Hence limx→a f (x) = f (a) so f is continuous at a.
Consequently, if one demonstrates a function is differentiable at a point,
then it must be continuous. Conversely, if a function is not continuous at
a point, then it is not differentiable there. Using the previous chapter, this
provides many examples of functions that are not differentiable at points.
However, note that continuity does not imply differentiability. Indeed, we
have seen that |x| is continuous on R but not differentiable at 0. In fact,
there exists a function that is continuous on R and differentiable at no point
of R (although we do not yet have the technology to construct it).
6.1.2
Rules of Differentiation
With the above construction of derivatives complete, we turn our attention
to rules that may be used to compute derivatives using known derivatives.
Theorem 6.1.8. If c ∈ R and f is differentiable at a point a, then the
function cf defined via (cf )(x) = cf (x) for all x ∈ R is differentiable at a
and
(cf)′(a) = c f′(a).
Proof. Since
limx→a [(cf)(x) − (cf)(a)]/(x − a) = limx→a c(f(x) − f(a))/(x − a) = c limx→a (f(x) − f(a))/(x − a) = c f′(a),
the proof is complete.
Theorem 6.1.9. If f and g are differentiable at a point a, then the function
f + g defined via (f + g)(x) = f (x) + g(x) for all x ∈ R is differentiable at a
and
(f + g)′(a) = f′(a) + g′(a).
Proof. Since
limx→a [(f + g)(x) − (f + g)(a)]/(x − a) = limx→a [(f(x) − f(a))/(x − a) + (g(x) − g(a))/(x − a)] = f′(a) + g′(a),
the proof is complete.
Using the previous two results, we see that if n ∈ N and a0 , a1 , . . . , an ∈ R,
then
(a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0)′ = n a_n x^{n−1} + (n − 1) a_{n−1} x^{n−2} + ··· + a_1 + 0.
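The polynomial formula can be spot-checked numerically; the snippet below (an illustration with arbitrarily chosen coefficients) compares the termwise derivative of a cubic against a difference quotient:

```python
# Check (a3 x^3 + a2 x^2 + a1 x + a0)' = 3 a3 x^2 + 2 a2 x + a1 numerically.
coeffs = [2.0, -1.0, 3.0, 0.5]           # a0, a1, a2, a3 (arbitrary values)

def p(x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def p_prime(x):
    # Termwise derivative given by the sum and constant-multiple rules.
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)

x, h = 1.7, 1e-6
approx = (p(x + h) - p(x)) / h
print(p_prime(x), approx)
```

The two values agree to several digits, as the sum and constant-multiple rules predict.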
Theorem 6.1.10 (Product Rule). If f and g are differentiable at a point
a, then the function f g defined via (f g)(x) = f (x)g(x) for all x ∈ R is
differentiable at a and
(fg)′(a) = f′(a)g(a) + f(a)g′(a).
Proof. First, notice that
[(fg)(x) − (fg)(a)]/(x − a) = [f(x)g(x) − f(a)g(a)]/(x − a)
= [f(x)g(x) − f(x)g(a)]/(x − a) + [f(x)g(a) − f(a)g(a)]/(x − a).
Since f′(a) exists, f is continuous at a by Theorem 6.1.7. Therefore
limx→a f(x) = f(a). Since limx→a (g(x) − g(a))/(x − a) = g′(a), we obtain
that
limx→a [f(x)g(x) − f(x)g(a)]/(x − a) = [limx→a f(x)] [limx→a (g(x) − g(a))/(x − a)] = f(a) g′(a).
Since
limx→a [f(x)g(a) − f(a)g(a)]/(x − a) = g(a) limx→a (f(x) − f(a))/(x − a) = g(a) f′(a),
we obtain that
limx→a [(fg)(x) − (fg)(a)]/(x − a) = f(a)g′(a) + g(a)f′(a)
thereby completing the proof.
Using the product rule, we see that
(cos(x) sin(x))′ = −sin(x) sin(x) + cos(x) cos(x) = cos²(x) − sin²(x) = cos(2x).
Furthermore, note the product rule can be applied recursively; that is, if f ,
g, and h are all differentiable at a, then
(fgh)′(a) = f′(a)(gh)(a) + f(a)(gh)′(a)
= f′(a)g(a)h(a) + f(a)(g′(a)h(a) + g(a)h′(a))
= f′(a)g(a)h(a) + f(a)g′(a)h(a) + f(a)g(a)h′(a).
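As a numerical spot-check of the product rule (an illustration, not from the notes; the choices f = sin, g = exp and the point a = 0.8 are arbitrary):

```python
import math

# Product rule check: (f g)'(a) = f'(a) g(a) + f(a) g'(a)
# for f = sin, g = exp at an arbitrary point a.
a, h = 0.8, 1e-6
fg = lambda x: math.sin(x) * math.exp(x)
approx = (fg(a + h) - fg(a)) / h
exact = math.cos(a) * math.exp(a) + math.sin(a) * math.exp(a)
print(approx, exact)
```

The difference quotient of the product matches the product-rule value to several digits.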
Next, how do we take the derivative of the reciprocal of a function?
Lemma 6.1.11. If f is differentiable at a point a and f(a) ≠ 0, then the
function h defined via h(x) = 1/f(x) is differentiable at a and
h′(a) = −f′(a)/(f(a))².
Proof. First, we must demonstrate that h(x) is well-defined in an open
interval containing a. Notice since f 0 (a) exists, f is continuous at a by
Theorem 6.1.7. Hence, by the homework, there exists an open interval I
containing a such that f (x) 6= 0 for all x ∈ I. Thus h(x) is well-defined on I
so it makes sense to discuss whether h0 (a) exists.
Notice that
[h(x) − h(a)]/(x − a) = [1/f(x) − 1/f(a)]/(x − a) = [(f(a) − f(x))/(f(a)f(x))]/(x − a) = −[f(x) − f(a)]/[f(x)f(a)(x − a)].
Since f′(a) exists, f is continuous at a by Theorem 6.1.7, so limx→a f(x) =
f(a). Therefore, as f(a) ≠ 0, limx→a 1/f(x) = 1/f(a). Hence
limx→a [h(x) − h(a)]/(x − a) = −(1/f(a)) [limx→a 1/f(x)] [limx→a (f(x) − f(a))/(x − a)]
= −(1/f(a))(1/f(a)) f′(a) = −f′(a)/(f(a))²,
thereby completing the proof.
As an example of the above result,
(sec(x))′ = (1/cos(x))′ = −(−sin(x))/cos²(x) = tan(x) sec(x).
Combining the above result with the product rule, we obtain:
Theorem 6.1.12 (Quotient Rule). If f and g are differentiable at a point
a and g(a) ≠ 0, then the function h defined via h(x) = f(x)/g(x) is
differentiable at a and
h′(a) = [f′(a)g(a) − f(a)g′(a)]/(g(a))².
Proof. By Lemma 6.1.11 and Theorem 6.1.10, we obtain that h(x) is
differentiable at a and
h′(a) = f′(a)(1/g(a)) + f(a)(−g′(a)/(g(a))²) = [f′(a)g(a) − f(a)g′(a)]/(g(a))².
For example,
(tan(x))′ = (sin(x)/cos(x))′ = [cos(x)cos(x) − sin(x)(−sin(x))]/cos²(x) = 1/cos²(x) = sec²(x).
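The computation (tan x)′ = sec²(x) can be verified numerically (an illustration, not part of the notes; the sample point is arbitrary):

```python
import math

# Quotient rule check: (tan x)' = 1 / cos(x)^2 at an arbitrary point.
x, h = 0.5, 1e-6
approx = (math.tan(x + h) - math.tan(x)) / h
exact = 1.0 / math.cos(x) ** 2
print(approx, exact)
```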
For our final rule, the Chain Rule, we desire to compute the derivative
of the composition of functions, provided the composition makes sense and
derivatives exist. Most proofs of the Chain Rule seen in elementary calculus
have a large flaw in them. To rigorously prove the Chain Rule, we will need
the following.
Theorem 6.1.13 (Carathéodory's Theorem). Let f be a function defined
on an open interval I containing a. Then f is differentiable at a if and
only if there exists a function ϕ defined on I such that ϕ is continuous at a
and
f(x) = f(a) + ϕ(x)(x − a)
for all x ∈ I. Furthermore, f′(a) = ϕ(a).
Proof. First, suppose there is a function ϕ defined on I such that ϕ is continuous
at a and
f(x) = f(a) + ϕ(x)(x − a).
To see that f is differentiable at a, notice if x ≠ a then
[f(x) − f(a)]/(x − a) = ϕ(x)(x − a)/(x − a) = ϕ(x).
Therefore, since limx→a ϕ(x) exists (and equals ϕ(a)), we obtain that
limx→a [f(x) − f(a)]/(x − a)
exists (and equals ϕ(a)) so f is differentiable at a.
Conversely, suppose that f is differentiable at a. Define ϕ : I → R via
ϕ(x) = f′(a) if x = a, and ϕ(x) = (f(x) − f(a))/(x − a) if x ≠ a,
for all x ∈ I. Clearly f(x) = f(a) + ϕ(x)(x − a) for all x ∈ I. Furthermore,
since
limx→a ϕ(x) = limx→a [f(x) − f(a)]/(x − a) = f′(a) = ϕ(a),
ϕ is continuous at a as desired.
Theorem 6.1.14 (Chain Rule). Let I and J be open intervals, let g :
I → R, and let f : J → R be such that f(J) ⊆ I. Suppose that a ∈ J, f is
differentiable at a, and g is differentiable at f(a). Then g ◦ f : J → R is
differentiable at a and
(g ◦ f)′(a) = g′(f(a)) f′(a).
Proof. Since f′(a) and g′(f(a)) exist, we have that limx→a f(x) = f(a) and
limx→f(a) g(x) = g(f(a)) by Theorem 6.1.7. Furthermore, by Theorem 6.1.13,
there exist functions ϕ : J → R and ψ : I → R such that
• ϕ is continuous at a,
• f(x) = f(a) + ϕ(x)(x − a) for all x ∈ J,
• f′(a) = ϕ(a),
• ψ is continuous at f(a),
• g(x) = g(f(a)) + ψ(x)(x − f(a)) for all x ∈ I, and
• g′(f(a)) = ψ(f(a)).
Therefore
g(f (x)) − g(f (a)) = ψ(f (x))(f (x) − f (a)) = ψ(f (x))ϕ(x)(x − a).
However, since limx→a f (x) = f (a) and since ψ is continuous at f (a),
lim ψ(f (x)) = ψ(f (a))
x→a
so ψ ◦ f is continuous at a. Furthermore, since ϕ is continuous at a, the
function h : J → R defined by h(x) = ψ(f (x))ϕ(x) is continuous at a. Since
g(f (x)) = g(f (a)) + h(x)(x − a), we obtain that g ◦ f is differentiable at a
with derivative
h(a) = ψ(f(a))ϕ(a) = g′(f(a)) f′(a)
by Theorem 6.1.13.
As an example of the Chain Rule, we notice that
(cos(x³))′ = (−sin(x³))(3x²).
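A numerical check of this example (an illustration; the sample point x = 1.1 is arbitrary):

```python
import math

# Chain rule check: d/dx cos(x^3) = -sin(x^3) * 3x^2 at an arbitrary point.
x, h = 1.1, 1e-6
approx = (math.cos((x + h) ** 3) - math.cos(x ** 3)) / h
exact = -math.sin(x ** 3) * 3 * x ** 2
print(approx, exact)
```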
6.2
Inverse Functions
In this section, we will examine the inverse of functions on closed intervals.
In particular, our theory will apply to the natural logarithm ln(x) (being the
inverse of ex on R), and inverse trigonometric functions. Unfortunately, the
technology to prove certain functions are invertible (specifically Theorem
6.4.6) is not available at this time, so we will make assumptions of invertibility
of the necessary functions.
6.2.1
Monotone Functions
One of the simplest ways to construct injective functions on R is via the
following notions, which will be of greater interest to us than injectivity itself.
Definition 6.2.1. Let I be an interval. A function f defined on I is said to
be
• increasing on I if f (x1 ) < f (x2 ) whenever x1 , x2 ∈ I and x1 < x2 ,
• non-decreasing on I if f (x1 ) ≤ f (x2 ) whenever x1 , x2 ∈ I and x1 < x2 ,
• decreasing on I if f (x1 ) > f (x2 ) whenever x1 , x2 ∈ I and x1 < x2 ,
• non-increasing on I if f (x1 ) ≥ f (x2 ) whenever x1 , x2 ∈ I and x1 < x2 ,
• monotone on I if f is non-decreasing or non-increasing.
Indeed, if we include continuity hypotheses, there is little difference
between injective and monotone functions.
Proposition 6.2.2. Let I be an interval and let f : I → R. If f is continuous
on I and f is injective, then either f is increasing or decreasing on I.
Proof. Notice that since f is injective on I, if c, d ∈ I are such that c < d,
then f (c) 6= f (d).
Suppose that f is not increasing nor decreasing on I. Therefore, there
must exist three points x1 , x2 , x3 ∈ I with x1 < x2 < x3 such that either
• f (x1 ) < f (x2 ) and f (x3 ) < f (x2 ), or
• f (x1 ) > f (x2 ) and f (x3 ) > f (x2 ).
Suppose that f(x1) < f(x2) and f(x3) < f(x2). Choose α ∈ R such
that f(x1) < α < f(x2) and f(x3) < α < f(x2) (this is possible by letting
β = max{f(x1), f(x3)} and letting α = (β + f(x2))/2). Since f is continuous
on [x1, x2], the Intermediate Value Theorem (Theorem 5.3.3) implies there
exists a c ∈ (x1, x2) such that f(c) = α. Similarly, since f is continuous on
[x2, x3], the Intermediate Value Theorem (Theorem 5.3.3) implies there exists
a d ∈ (x2, x3) such that f(d) = α. As c < d, f(c) = α = f(d) contradicts the
fact that f is injective on I. Hence we have a contradiction in this case.
Since the case that f (x1 ) > f (x2 ) and f (x3 ) > f (x2 ) also creates a
contradiction by similar arguments, we have obtained a contradiction. Hence
f is either increasing or decreasing on I.
It is important to note that the above result is false if f is not continuous.
Indeed if
f(x) = x if x ∈ Q and f(x) = −x if x ∈ R \ Q,
then f is injective on R, yet f is neither increasing nor decreasing.
Our next result demonstrates that it is easy to check whether monotone functions are continuous. In particular, to check that monotone functions are
continuous, we just need to check that such functions do not have jump
discontinuities.
Theorem 6.2.3. Let f : [a, b] → R be monotone. Then f is continuous
on [a, b] if and only if whenever α ∈ R is such that f (a) < α < f (b) or
f (b) < α < f (a) there exists a c ∈ [a, b] such that f (c) = α (that is, f is
continuous if and only if the conclusions of the Intermediate Value Theorem
hold).
Proof. If f is continuous on [a, b], then whenever α ∈ R is such that f (a) <
α < f (b) or f (b) < α < f (a) there exists a c ∈ [a, b] such that f (c) = α by
the Intermediate Value Theorem (Theorem 5.3.3).
For the converse, suppose that f is non-decreasing, as the proof when f
is non-increasing holds by similar arguments. Let x0 ∈ (a, b) be arbitrary.
To see that f is continuous at x0, let ε > 0 be arbitrary.
Let α = max{f(a), f(x0) − ε}. If α = f(a), let c1 = a and notice that if
c1 < x < x0, then
0 ≤ f(x0) − f(x) ≤ f(x0) − f(a) = f(x0) − α ≤ f(x0) − (f(x0) − ε) = ε
as f is non-decreasing. Otherwise f(a) < α = f(x0) − ε. Therefore, by
assumption, there exists a c1 ∈ [a, b] such that f(c1) = α. Since f(c1) = α =
f(x0) − ε < f(x0), it must be the case that c1 < x0 as f is non-decreasing.
Furthermore, if c1 < x < x0, then
0 ≤ f(x0) − f(x) ≤ f(x0) − f(c1) = f(x0) − α = f(x0) − (f(x0) − ε) = ε.
Hence, in either case, there exists a c1 ∈ [a, x0) such that |f(x) − f(x0)| ≤ ε
for all x ∈ (c1, x0).
Let β = min{f(b), f(x0) + ε}. If β = f(b), let c2 = b and notice that if
x0 < x < b, then
0 ≤ f(x) − f(x0) ≤ f(b) − f(x0) = β − f(x0) ≤ (f(x0) + ε) − f(x0) = ε
as f is non-decreasing. Otherwise f(b) > β = f(x0) + ε. Therefore, by
assumption, there exists a c2 ∈ [a, b] such that f(c2) = β. Since f(c2) = β =
f(x0) + ε > f(x0), it must be the case that x0 < c2 as f is non-decreasing.
Furthermore, if x0 < x < c2, then
0 ≤ f(x) − f(x0) ≤ f(c2) − f(x0) = β − f(x0) = (f(x0) + ε) − f(x0) = ε.
Hence, in either case, there exists a c2 ∈ (x0, b] such that |f(x) − f(x0)| ≤ ε
for all x ∈ (x0, c2).
Therefore, if x ∈ (c1, c2) then x ∈ [a, b] and |f(x) − f(x0)| ≤ ε. Let
δ = min{x0 − c1, c2 − x0} > 0. Then, if |x − x0| < δ, then x ∈ (c1, c2) so
|f(x) − f(x0)| ≤ ε. Hence f is continuous at x0.
Since x0 ∈ (a, b) was arbitrary, f is continuous at each point in (a, b). To
see that f is continuous at b, apply the α-part of the argument with x0 = b
to obtain limx→b− f(x) = f(b). Similarly, to see that f is continuous at a,
apply the β-part of the argument with x0 = a to obtain limx→a+ f(x) = f(a).
Hence f is continuous on [a, b].
6.2.2
Inverse Function Theorem
Although Theorem 6.2.3 may seem an odd thing to prove, it is the essential
tool in demonstrating the following (which tells us the continuity of ex implies
the continuity of ln(x)).
Corollary 6.2.4. If f : [a, b] → R is injective and continuous on [a, b],
then f ([a, b]) is a closed interval and the inverse of f on its image, f −1 :
f ([a, b]) → [a, b], is continuous.
Proof. Note f is either increasing or decreasing by Proposition 6.2.2. We
will assume that f is increasing, as the proof when f is decreasing follows
by similar arguments (or by noting that if f is decreasing, then the function
g : [a, b] → R defined by g(x) = −f(x) is an increasing continuous injective
function).
Since f is increasing, we obtain f (a) < f (b). Since f is continuous, we
obtain by the Intermediate Value Theorem (Theorem 5.3.3) that
f ([a, b]) = [f (a), f (b)].
We claim that f −1 is increasing. To see this, suppose y1 , y2 ∈ f ([a, b])
are such that y1 < y2 . Choose x1 , x2 ∈ [a, b] such that f (x1 ) = y1 and
f (x2 ) = y2 . Since f (x1 ) < f (x2 ), it must be the case that x1 < x2 as f was
increasing. Hence
f −1 (y1 ) = f −1 (f (x1 )) = x1 < x2 = f −1 (f (x2 )) = f −1 (y2 ).
Hence f −1 is increasing.
Hence f −1 : [f (a), f (b)] → [a, b] is an increasing function such that
f −1 ([f (a), f (b)]) = [a, b]. Therefore f −1 is continuous by Theorem 6.2.3 as
f −1 satisfies the conclusions of the Intermediate Value Theorem.
Since inverse functions of continuous functions are continuous, it is natural
to ask whether they are differentiable. In particular, the following tells us how
to compute derivatives of inverse functions.
Theorem 6.2.5 (Inverse Function Theorem). Suppose a, b ∈ R with
a < b and f : [a, b] → R is injective and continuous on [a, b]. Let g :
f ([a, b]) → [a, b] be the inverse of f on its image. If c ∈ (a, b) and f is
differentiable at c with f′(c) ≠ 0, then g is differentiable at f(c) and
g′(f(c)) = 1/f′(c).
Proof. Since f is injective and continuous, f ([a, b]) is a closed interval by
Corollary 6.2.4. Furthermore, g is continuous on f ([a, b]) by the same
corollary.
To see that
limx→f(c) [g(x) − g(f(c))]/(x − f(c))
exists, suppose (xn)n≥1 is such that xn ∈ f([a, b]) \ {f(c)} for all n ∈ N and
limn→∞ xn = f(c). Let yn = g(xn) (so f(yn) = xn ≠ f(c)). Then
[g(xn) − g(f(c))]/(xn − f(c)) = (yn − c)/(f(yn) − f(c)) = 1/[(f(yn) − f(c))/(yn − c)].
Since limn→∞ xn = f(c) and since g is continuous at f(c), we obtain that
limn→∞ yn = limn→∞ g(xn) = g(f(c)) = c.
Hence, as f′(c) exists and f′(c) ≠ 0,
limn→∞ 1/[(f(yn) − f(c))/(yn − c)] = 1/f′(c).
Hence
limn→∞ [g(xn) − g(f(c))]/(xn − f(c)) = 1/f′(c).
Since this holds for all (xn)n≥1 such that xn ∈ f([a, b]) \ {f(c)} for all n ∈ N
and limn→∞ xn = f(c), we obtain that g′(f(c)) exists and equals 1/f′(c).
Example 6.2.6. Let f(x) = e^x for all x ∈ R. Then g(x) = ln(x) for x > 0
is the inverse of f. Since f′(x) = e^x, if a ∈ R then
g′(e^a) = g′(f(a)) = 1/f′(a) = 1/e^a.
Since if x > 0 we can write x = e^a for some a ∈ R, we obtain that g′(x) = 1/x
as desired.
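Numerically, the difference quotient of ln matches 1/x, as the Inverse Function Theorem predicts (an illustration, not part of the notes; the sample point is arbitrary):

```python
import math

# Inverse Function Theorem check: (ln)'(x) = 1 / exp'(ln x) = 1/x.
x, h = 3.0, 1e-6
approx = (math.log(x + h) - math.log(x)) / h
exact = 1.0 / x
print(approx, exact)
```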
Using the above (and the theory of exponentials), for a fixed b > 0 we can
compute the derivative of b^x. Indeed we know that b = e^{ln(b)} and thus b^x =
(e^{ln(b)})^x = e^{x ln(b)}. Hence, by the Chain Rule, (b^x)′ = ln(b) e^{x ln(b)} = ln(b) b^x.
In particular, using the definition of the derivative, we obtain that
ln(b) = limh→0 (b^h − 1)/h.
Example 6.2.7. We know that cos(x), sin(x), and tan(x) are not injective
functions on R. However, we can show that f : [0, π] → [−1, 1], g : [−π/2, π/2] →
[−1, 1], and h : (−π/2, π/2) → R defined by
f(x) = cos(x),   g(x) = sin(x),   and   h(x) = tan(x)
are injective continuous functions with non-zero derivatives (on the open
intervals). Consequently, their inverses, denoted arccos : [−1, 1] → [0, π],
arcsin : [−1, 1] → [−π/2, π/2], and arctan : R → (−π/2, π/2) respectively, are
continuous by Corollary 6.2.4 and differentiable by Theorem 6.2.5. Let's
compute their derivatives.
First, for f(x), we notice that arccos(−1) = π and arccos(1) = 0. Also
(arccos(x))′ = 1/f′(arccos(x)) = 1/(−sin(arccos(x))) = −1/√(1 − x²),
where we have used a right triangle with angle θ = arccos(x), hypotenuse 1,
adjacent side x, and opposite side √(1 − x²).
Next, for g(x), we notice that arcsin(−1) = −π/2 and arcsin(1) = π/2. Also
(arcsin(x))′ = 1/g′(arcsin(x)) = 1/cos(arcsin(x)) = 1/√(1 − x²),
where we have used a right triangle with angle θ = arcsin(x), hypotenuse 1,
opposite side x, and adjacent side √(1 − x²).
Finally, for h(x), we notice that tan(x) tends to −∞ as x tends to −π/2
from the right whereas tan(x) tends to ∞ as x tends to π/2 from the left. Also
(arctan(x))′ = 1/h′(arctan(x)) = 1/sec²(arctan(x)) = 1/(1 + x²),
where we have used a right triangle with angle θ = arctan(x), opposite side x,
adjacent side 1, and hypotenuse √(1 + x²).
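All three formulas can be spot-checked numerically (an illustration, not part of the notes; the sample point x = 0.3 is arbitrary):

```python
import math

# Spot-check arccos', arcsin', arctan' at x = 0.3 against the formulas.
x, h = 0.3, 1e-6
s = math.sqrt(1 - x * x)
checks = [
    (math.acos, -1.0 / s),               # (arccos x)' = -1/sqrt(1 - x^2)
    (math.asin, 1.0 / s),                # (arcsin x)' =  1/sqrt(1 - x^2)
    (math.atan, 1.0 / (1 + x * x)),      # (arctan x)' =  1/(1 + x^2)
]
for func, exact in checks:
    approx = (func(x + h) - func(x)) / h
    print(func.__name__, approx, exact)
    assert abs(approx - exact) < 1e-4
```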
6.3
Extreme Values of Functions
Next we turn our attention to another problem, which may be motivated by
physics: If an object travels from the Earth to the moon, how do we know
that there is a point where it obtains its maximum speed? Of course, we
must make assumptions that speed is a well-defined continuous function (no
teleporters) in order for us to answer this question in the affirmative.
A similar question is to analyze elements of R for which a function f on R
obtains its extremal values. Consequently, we make the following definition.
Definition 6.3.1. Let I be an interval and let f : I → R. It is said that f
has a
• global maximum at c if f (x) ≤ f (c) for all x ∈ I,
• global minimum at c if f (x) ≥ f (c) for all x ∈ I,
• local maximum at c if there exists an open interval J ⊆ I such that
c ∈ J and f (x) ≤ f (c) for all x ∈ J,
• local minimum at c if there exists an open interval J ⊆ I such that
c ∈ J and f(x) ≥ f(c) for all x ∈ J.
Given a function f , it is clear that f must have the following property in
order for f to have a global maximum or minimum.
Definition 6.3.2. A function f defined on an interval I is said to be bounded
if f (I) is a bounded set.
Using the Bolzano-Weierstrass Theorem (Theorem 2.4.5) or our results
about sequentially compact sets (see Theorem 3.3.8), we can obtain the
second piece of our Triforce. Consequently, the following result is really a
result about continuous functions on compact sets.
Theorem 6.3.3 (Extreme Value Theorem). Let f : [a, b] → R be
continuous on [a, b]. Then there exist points x1, x2 ∈ [a, b] such that
f (x1 ) ≤ f (x) ≤ f (x2 ) for all x ∈ [a, b]; that is, f has a global maximum and
minimum on [a, b].
Proof. Since f : [a, b] → R is continuous on [a, b], f is bounded.
Let α = glb(f([a, b])). Hence, for each n ∈ N there exists a yn ∈ [a, b]
such that
α ≤ f(yn) < α + 1/n.
Hence limn→∞ f (yn ) = α by the Squeeze Theorem.
Since (yn)n≥1 is a sequence such that yn ∈ [a, b] and since [a, b] is sequentially compact (see Theorem 3.3.8), there exists a subsequence (ykn)n≥1 such
that limn→∞ ykn exists and is an element of [a, b]. Let x1 = limn→∞ ykn ∈
[a, b]. Then, since f is continuous on [a, b],
f(x1) = limn→∞ f(ykn) = α.
Hence f (x1 ) ≤ f (x) for all x ∈ [a, b] by the definition of α.
Similar arguments show that if β = lub(f ([a, b])), then there exists an
x2 ∈ [a, b] such that f (x2 ) = β. Hence f (x) ≤ f (x2 ) for all x ∈ [a, b] by the
definition of β.
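The Extreme Value Theorem only asserts existence, but for a concrete continuous function the extreme points can be approximated by sampling (a crude numerical illustration using the arbitrary choice f(x) = x sin(x) on [0, 2π], not a proof):

```python
import math

# Approximate the global max/min of f(x) = x*sin(x) on [0, 2*pi] by sampling.
f = lambda x: x * math.sin(x)
a, b, steps = 0.0, 2 * math.pi, 100000
xs = [a + (b - a) * i / steps for i in range(steps + 1)]
x_min = min(xs, key=f)
x_max = max(xs, key=f)
print(x_max, f(x_max))                   # max near x ≈ 2.03
print(x_min, f(x_min))                   # min near x ≈ 4.91
```

Both extreme points land in the interior here, so (per Proposition 6.3.4 below) they are points where the derivative vanishes.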
Of course, the Extreme Value Theorem says that the maximum and minimum
are obtained, but provides no method for computing them. In the case our
functions are differentiable, we are in luck.
Proposition 6.3.4. Let I be an interval and let f : I → R. If f has a local
maximum or local minimum at c ∈ I, and if f′(c) exists, then f′(c) = 0.
Proof. Let c ∈ I be such that f′(c) exists. We will assume that f has a local
maximum at c, as the proof when f has a local minimum at c is similar.
Since f has a local maximum at c, there exists an open interval J ⊆ I
such that c ∈ J and f(x) ≤ f(c) for all x ∈ J. If x ∈ J and x > c, then

    (f(x) − f(c))/(x − c) ≤ 0

as the numerator is non-positive whereas the denominator is positive.
Therefore, as J is an open interval containing c,

    lim_{x→c+} (f(x) − f(c))/(x − c) ≤ 0.

Similarly, if x ∈ J and x < c, then

    (f(x) − f(c))/(x − c) ≥ 0

as the numerator is non-positive whereas the denominator is negative.
Therefore, as J is an open interval containing c,

    lim_{x→c−} (f(x) − f(c))/(x − c) ≥ 0.

Since f′(c) exists,

    lim_{x→c−} (f(x) − f(c))/(x − c) = f′(c) = lim_{x→c+} (f(x) − f(c))/(x − c).

Hence the above inequalities show 0 ≤ f′(c) ≤ 0 and thus f′(c) = 0.
Using the above, we may describe an algorithm for deducing the maximal
and minimal values a function achieves. Given a function f defined on either
[a, b] or R:
1. Find all points where f is differentiable.
2. Find the value of f at all points x where f′(x) = 0.
3. Find the value of f at all points x where f′(x) does not exist (if f is not
differentiable on a large set, we are pretty much out of luck).
4. If f is defined on [a, b], find f(a) and f(b). Otherwise, if f is defined on
R, find lim_{x→∞} f(x) and lim_{x→−∞} f(x). We must do this to analyze
the behaviour of f at the boundary values, for which we have no information
pertaining to derivatives.
5. Compare the values to find the maximal/minimal values and where
they occur.
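For the curious student, the algorithm above can be imitated numerically. The following Python sketch (an illustration only, not part of the course material; the function f(x) = xe^x and the interval [−4, 1] are our own choices) carries out steps 2–5, locating zeros of f′ by scanning for sign changes and bisecting, under the assumption that f′ is continuous so that sign changes bracket its zeros.

```python
import math

def bisect_zero(g, lo, hi, tol=1e-12):
    """Standard bisection: assumes g(lo) and g(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def extreme_values(f, fprime, a, b, num_samples=10_000):
    """Steps 2-5 of the algorithm for an f differentiable on [a, b]:
    collect the endpoints and the zeros of f', then compare values."""
    candidates = [a, b]
    xs = [a + (b - a) * i / num_samples for i in range(num_samples + 1)]
    for lo, hi in zip(xs, xs[1:]):
        if fprime(lo) == 0.0:
            candidates.append(lo)            # a critical point on the grid
        elif fprime(lo) * fprime(hi) < 0:    # a sign change brackets a zero of f'
            candidates.append(bisect_zero(fprime, lo, hi))
    values = [(f(x), x) for x in candidates]
    return min(values), max(values)

# Illustrative example: f(x) = x e^x on [-4, 1], where f'(x) = (x + 1)e^x.
f = lambda x: x * math.exp(x)
fp = lambda x: (x + 1) * math.exp(x)
(min_val, argmin), (max_val, argmax) = extreme_values(f, fp, -4.0, 1.0)
```

Here the minimum occurs at the interior critical point x = −1 while the maximum occurs at the endpoint b = 1, exactly as step 4 anticipates.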
The following two examples demonstrate that a maximum/minimum can occur
at a point where f′ does not exist, and that f′(x) can exist and equal zero
at a point x that is not a local maximum or minimum.
Example 6.3.5. If f(x) = |x|, we can see that f has no maximum on R, yet
f has a minimal value of 0 at x = 0. However, f′(0) does not exist.
Example 6.3.6. If f(x) = x³, we can see that f′(x) = 3x² is zero only
when x = 0. However, f does not have a local maximum nor a local minimum
at 0 as f(x) < 0 if x < 0 and f(x) > 0 if x > 0.
The above examples illustrate that one tool is missing from our repertoire:
the ability to determine whether a point is a local maximum or local minimum
based on the derivative. For example, consider the function f(x) = xe^x
defined for all x ∈ R. We notice, via the Product Rule, that

    f′(x) = e^x + xe^x = (x + 1)e^x.

Hence f′(x) is zero only when x = −1. However, how can we tell whether f
has a local maximum or minimum at x = −1, or whether we are in a case
similar to that of 0 for x³?
6.4 The Mean Value Theorem
To develop a tool to answer the above question, we turn our attention to
obtaining our third and final piece of our Triforce theorems. This final
essential theorem is motivated by the following problem: Suppose one drove
from College Station to Houston (approximately 96 miles) in one hour. How
can we prove that at some point in the journey the driver was speeding?
Of course we must make some similar assumptions again: that is, distance
is a function of time (no time travel) and distance is a continuous function of
time (no teleporters). Furthermore, to be able to measure the speed of the
vehicle at any instant in time, the distance function must be differentiable.
Our theorem (Theorem 6.4.2) will demonstrate that there must be a
point where the instantaneous speed of the vehicle is the mean (or average)
value of the speed; namely 96 miles per hour in this case. Consequently, at
some point in the journey, the vehicle must have been speeding.
6.4.1 Proof of the Mean Value Theorem
To prove our main theorem, we start with a lemma that is easier to prove
and will enable us to prove the desired theorem via a simple translation.
Lemma 6.4.1 (Rolle's Theorem). If f : [a, b] → R is continuous on [a, b],
differentiable on (a, b), and f(a) = f(b) = 0, then there exists a c ∈ (a, b)
such that f′(c) = 0.
Proof. The proof will be divided into three cases.
Case 1: f(x) = 0 for all x ∈ (a, b). Clearly f′(x) = 0 for all x ∈ (a, b) by
Example 6.1.2.
Case 2: There is an x0 ∈ (a, b) with f(x0) > 0. By the Extreme Value
Theorem (Theorem 6.3.3) there exists a c ∈ [a, b] such that f(c) ≥ f(x) for
all x ∈ [a, b]. Thus f(c) ≥ f(x0) > 0 so c ≠ a, b. Since f(c) ≥ f(x) for all
x ∈ [a, b], c must be a local maximum of f on (a, b) and thus f′(c) = 0 by
Proposition 6.3.4.
Case 3: There is an x0 ∈ (a, b) with f(x0) < 0. By the Extreme Value
Theorem (Theorem 6.3.3) there exists a c ∈ [a, b] such that f(c) ≤ f(x) for all
x ∈ [a, b]. Thus f(c) ≤ f(x0) < 0 so c ≠ a, b. Since f(c) ≤ f(x) for all
x ∈ [a, b], c must be a local minimum of f on (a, b) and thus f′(c) = 0 by
Proposition 6.3.4.
As at least one of the above three cases must always be true, the result
follows.
To use Rolle’s Theorem to obtain our third piece of the Triforce, we will
translate our function. To do this, notice that, given a function f : [a, b] → R,
the function

    g(x) = f(a) + ((f(b) − f(a))/(b − a))(x − a)

is a line with slope (f(b) − f(a))/(b − a) that passes through the points
(a, f(a)) and (b, f(b)).
Theorem 6.4.2 (Mean Value Theorem). If f : [a, b] → R is continuous
on [a, b] and differentiable on (a, b), then there exists a c ∈ (a, b) such that

    f′(c) = (f(b) − f(a))/(b − a).
Proof. Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b).
Define h : [a, b] → R by

    h(x) = f(x) − f(a) − ((f(b) − f(a))/(b − a))(x − a)

for all x ∈ [a, b]. By previous results h is continuous on [a, b] and differentiable
on (a, b) with

    h′(x) = f′(x) − (f(b) − f(a))/(b − a).

Notice that

    h(a) = f(a) − f(a) − ((f(b) − f(a))/(b − a))(a − a) = 0

and

    h(b) = f(b) − f(a) − ((f(b) − f(a))/(b − a))(b − a) = 0.

Hence, by Rolle’s Theorem (Lemma 6.4.1), there exists a c ∈ (a, b) such that
h′(c) = 0. Hence

    0 = f′(c) − (f(b) − f(a))/(b − a)

and thus the result follows.
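The proof is constructive enough to imitate numerically: bisecting h′(x) = f′(x) − (f(b) − f(a))/(b − a) locates a point c as in the theorem whenever h′ changes sign on (a, b) (Rolle's Theorem guarantees a zero of h′ but not a sign change, so this Python sketch is an illustration for well-behaved examples only; the choice f(x) = x³ on [0, 2] is our own).

```python
def mvt_point(fprime, a, b, secant_slope, tol=1e-12):
    """Locate c in (a, b) with f'(c) equal to the secant slope by
    bisecting h'(x) = f'(x) - secant_slope, assuming h' changes sign."""
    h = lambda x: fprime(x) - secant_slope
    lo, hi = a, b
    if h(lo) * h(hi) > 0:
        raise ValueError("h' does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Example: f(x) = x^3 on [0, 2]; the secant slope is (f(2) - f(0))/(2 - 0) = 4.
c = mvt_point(lambda x: 3 * x**2, 0.0, 2.0, 4.0)
```

Here c should satisfy 3c² = 4, i.e. c = 2/√3, which indeed lies in (0, 2).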
Now that our Triforce is complete, any result we wish to prove is ours!
Before we move on, we note that the conclusion of the Mean Value
Theorem can fail if f fails to be differentiable at even a single point. Indeed, if
f : [−1, 1] → R is defined by f(x) = |x|, then f is continuous on [−1, 1] and
differentiable on (−1, 1) \ {0}. However, there is no point c ∈ (−1, 1) such that

    f′(c) = (f(1) − f(−1))/(1 − (−1)) = (1 − 1)/2 = 0.
6.4.2 Anti-Derivatives
For our first application of the Mean Value Theorem, we will examine how
zero derivatives affect functions. In particular, this enables us to study
functions which have the same derivative.
Corollary 6.4.3. Let I be an open interval and suppose f : I → R is
differentiable on I with f′(x) = 0 for all x ∈ I. Then there exists an α ∈ R
such that f(x) = α for all x ∈ I.
Proof. Let a, b ∈ I be such that a < b. Then f is continuous on [a, b] (by
Theorem 6.1.7) and differentiable on (a, b). Therefore, by the Mean Value
Theorem (Theorem 6.4.2), there exists a c ∈ (a, b) such that
    (f(b) − f(a))/(b − a) = f′(c) = 0 ⟹ f(b) = f(a)

(as f′(x) = 0 for all x ∈ I).
Choose x0 ∈ I and let α = f (x0 ). If x ∈ I and x 6= x0 , then either x > x0
or x < x0 . In either case the above shows that f (x) = f (x0 ) = α. Hence
f (x) = α for all x ∈ I.
Corollary 6.4.4. Let I be an open interval and suppose f, g : I → R are
differentiable on I with f′(x) = g′(x) for all x ∈ I. Then there exists an
α ∈ R such that f(x) = g(x) + α for all x ∈ I.
Proof. Let h : I → R be defined by h(x) = f(x) − g(x). Then h is
differentiable on I and

    h′(x) = f′(x) − g′(x) = 0

for all x ∈ I. Hence, by Corollary 6.4.3, there exists an α ∈ R such that
h(x) = α for all x ∈ I. Hence f(x) = g(x) + α for all x ∈ I.
Based on the above, we make the following definition.
Definition 6.4.5. Let I be an open interval and let f : I → R. A function
F : I → R is said to be an anti-derivative of f on I if F is differentiable on
I and F′(x) = f(x) for all x ∈ I.
Thus Corollary 6.4.4 implies that if F is an anti-derivative of f , then every
anti-derivative of f is of the form G(x) = F(x) + c for some fixed constant
c ∈ R (and clearly every such G is an anti-derivative of f ). Anti-derivatives
are important tools in integration, as we will see in the next chapter.
6.4.3 Monotone Functions and Derivatives
For our next application of the Mean Value Theorem, we will see how we
may deduce a function is increasing from its derivative. This will also enable
us to construct a test to see if an extreme value of a function is either a
maximum, a minimum, or neither.
Theorem 6.4.6 (Increasing Function Theorem). Let f : [a, b] → R be
continuous on [a, b] and differentiable on (a, b). If f′(x) ≥ 0 for all x ∈ (a, b),
then f is non-decreasing on [a, b]. Similarly, if f′(x) > 0 for all x ∈ (a, b),
then f is increasing on [a, b].
Proof. Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b)
such that f′(x) ≥ 0 for all x ∈ (a, b). Let x1, x2 ∈ [a, b] be such that x1 < x2.
Since f is continuous on [x1, x2] and differentiable on (x1, x2), the Mean
Value Theorem (Theorem 6.4.2) implies that there exists a c ∈ (x1, x2) such
that

    f′(c) = (f(x2) − f(x1))/(x2 − x1).

By the assumptions on f , f′(c) ≥ 0. Therefore, since x1 < x2, we must have
that

    f(x2) − f(x1) = f′(c)(x2 − x1) ≥ 0.

Hence f must be non-decreasing on [a, b] as desired.
The proof in the case that f′(x) > 0 for all x ∈ (a, b) is identical.
A similar proof shows the following.
Theorem 6.4.7 (Decreasing Function Theorem). Let f : [a, b] → R be
continuous on [a, b] and differentiable on (a, b). If f′(x) ≤ 0 for all x ∈ (a, b),
then f is non-increasing on [a, b]. Similarly, if f′(x) < 0 for all x ∈ (a, b),
then f is decreasing on [a, b].
Note, since we said that one of the defining properties of f(x) = e^x is that
f′(x) = f(x), and since f(x) > 0 for all x ∈ R, we see that e^x is an increasing
function on R. Therefore e^x is injective on R. Thus all that is missing in
our theory of e^x and ln(x) are some of the details in Remark 6.1.6. Similarly,
we can now show sin(x), cos(x), and tan(x) are increasing/decreasing on
certain intervals (assuming we know what their derivatives are).
At the end of Section 6.3, we demonstrated that if f(x) = xe^x, then
f′(x) = (x + 1)e^x, so the only possible local maximum or minimum of f will
occur at x = −1. However, we did not have the ability to determine if f did
have a local maximum or minimum at −1. Thanks to the following, we do.
Note the following does not require the existence of the second derivative
(see Definition 6.4.13), nor does it require the function to be differentiable at
the point c in question.
Theorem 6.4.8 (First Derivative Test). Let f : (a, b) → R be continuous
on (a, b). Suppose c ∈ (a, b) has the property that there exists a δ > 0
such that
• f′(x) exists and f′(x) > 0 for all x ∈ (c, c + δ) ⊆ (a, b), and
• f′(x) exists and f′(x) < 0 for all x ∈ (c − δ, c) ⊆ (a, b).
Then f has a local minimum at c.
Similarly, suppose c ∈ (a, b) has the property that there exists a δ > 0
such that
• f′(x) exists and f′(x) < 0 for all x ∈ (c, c + δ) ⊆ (a, b), and
• f′(x) exists and f′(x) > 0 for all x ∈ (c − δ, c) ⊆ (a, b).
Then f has a local maximum at c.
Proof. Let f : (a, b) → R be continuous on (a, b). Suppose c ∈ (a, b) has
the property that there exists a δ > 0 such that
• f′(x) exists and f′(x) > 0 for all x ∈ (c, c + δ), and
• f′(x) exists and f′(x) < 0 for all x ∈ (c − δ, c).
To see that f has a local minimum at c, first let x ∈ (c, c + δ) ⊆ (a, b) be
arbitrary. Since f is continuous on [c, x] and differentiable on (c, x), the
Mean Value Theorem (Theorem 6.4.2) implies there exists a d ∈ (c, x) such
that

    f′(d) = (f(x) − f(c))/(x − c).

Since d ∈ (c, c + δ), f′(d) > 0. Hence the above equation implies f(x) > f(c)
for all x ∈ (c, c + δ). Similarly, let x ∈ (c − δ, c) ⊆ (a, b) be arbitrary. Since f
is continuous on [x, c] and differentiable on (x, c), the Mean Value Theorem
(Theorem 6.4.2) implies there exists a d ∈ (x, c) such that

    f′(d) = (f(c) − f(x))/(c − x).

Since d ∈ (c − δ, c), f′(d) < 0. Hence the above equation implies f(x) > f(c)
for all x ∈ (c − δ, c). Therefore, f has a local minimum at c by definition.
The proof of the second portion of this result is similar to the first.
Consequently, if f(x) = xe^x, then f′(x) = (x + 1)e^x. If x < −1 then
x + 1 < 0 whereas e^x > 0, so f′(x) < 0 if x < −1. Furthermore, if x > −1 then
x + 1 > 0 and e^x > 0, so f′(x) > 0. Therefore, f has a local minimum at −1
by Theorem 6.4.8. Note f(−1) = −1/e. Is −1/e a global minimum of f ? It is
not difficult to see that lim_{x→∞} xe^x = ∞ as both x and e^x tend to infinity as
x tends to infinity (note by the definition of e^x in Remark 6.1.6, e^x > 1 + (1/2)x
for all x > 0). However, lim_{x→−∞} xe^x is not as clear, as lim_{x→−∞} x = −∞
whereas

    lim_{x→−∞} e^x = lim_{a→∞} e^{−a} = 0.

Thus, does xe^x converge as x tends to −∞? It turns out the limit will be 0,
thereby showing that f has a global minimum of −1/e at x = −1.
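A quick numerical sanity check of these claims (an illustration, not a proof), using Python's math.exp for e^x and a sample grid of our own choosing:

```python
import math

f = lambda x: x * math.exp(x)

min_value = f(-1.0)                                  # the local minimum value -1/e
tail = [f(x) for x in (-10.0, -50.0, -100.0)]        # x e^x for large negative x
samples = [f(-5.0 + 0.01 * k) for k in range(1101)]  # a grid on [-5, 6]
```

The tail values are negative but already tiny, consistent with the limit 0 at −∞, and no sampled value dips below f(−1) = −1/e, consistent with −1/e being the global minimum.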
6.4.4 L’Hôpital’s Rule
Notice that, if the limits exist,

    lim_{x→−∞} xe^x = − lim_{a→∞} ae^{−a} = − lim_{a→∞} a/e^a.

However, this does not help us as lim_{a→∞} a = ∞ = lim_{a→∞} e^a, so we
again have no useful information. If only we had a way to compute such limits!
To develop a method for computing limits of the form 0/0 or ∞/∞, we will
need to place our Mean Value Theorem on steroids:
Theorem 6.4.9 (Cauchy’s Mean Value Theorem). Let f, g : [a, b] → R
be continuous on [a, b] and differentiable on (a, b) with g′(x) ≠ 0 for all
x ∈ (a, b). Then there exists a c ∈ (a, b) such that

    (f(b) − f(a))/(g(b) − g(a)) = f′(c)/g′(c).

(Note: If g(x) = x for all x ∈ [a, b], this is precisely the Mean Value
Theorem.)
Proof. By the Mean Value Theorem (Theorem 6.4.2) there exists a d ∈ (a, b)
such that

    g′(d) = (g(b) − g(a))/(b − a).

As g′(d) ≠ 0, we obtain that g(b) − g(a) ≠ 0.
Define h : [a, b] → R by

    h(x) = ((f(b) − f(a))/(g(b) − g(a)))(g(x) − g(a)) − f(x) + f(a)

for all x ∈ [a, b]. Note that h makes sense as g(b) − g(a) ≠ 0. Furthermore,
notice that h is continuous on [a, b] and differentiable on (a, b) with

    h′(x) = ((f(b) − f(a))/(g(b) − g(a)))g′(x) − f′(x).

Furthermore, notice

    h(a) = ((f(b) − f(a))/(g(b) − g(a)))(g(a) − g(a)) − f(a) + f(a) = 0

whereas

    h(b) = ((f(b) − f(a))/(g(b) − g(a)))(g(b) − g(a)) − f(b) + f(a)
         = (f(b) − f(a)) − f(b) + f(a) = 0.

Hence by Rolle’s Theorem (Lemma 6.4.1) or, alternatively, by the Mean
Value Theorem, there exists a c ∈ (a, b) such that h′(c) = 0. Hence

    0 = ((f(b) − f(a))/(g(b) − g(a)))g′(c) − f′(c).

Therefore

    (f(b) − f(a))/(g(b) − g(a)) = f′(c)/g′(c)

as g′(c) ≠ 0.
Using Cauchy’s Mean Value Theorem, we can obtain the following result
which is extremely useful for computing limits. This result is commonly
believed to be first proved by Bernoulli.
Theorem 6.4.10 (L’Hôpital’s Rule). Suppose that −∞ < a < b < ∞,
that f, g : (a, b) → R are differentiable on (a, b), that g′(x) ≠ 0 for all
x ∈ (a, b), and that either
i) lim_{x→a+} f(x) = lim_{x→a+} g(x) = 0, or
ii) lim_{x→a+} f(x) = lim_{x→a+} g(x) = ±∞.
Then:
a) If lim_{x→a+} f′(x)/g′(x) = L ∈ R, then lim_{x→a+} f(x)/g(x) = L.
b) If lim_{x→a+} f′(x)/g′(x) = ±∞, then lim_{x→a+} f(x)/g(x) = ±∞.
Similarly, the result holds with a+ exchanged with b−, ∞, or −∞.
Proof. For all cases, we claim that there exists at most one point x in (a, b)
such that g(x) = 0. To see this, notice for all x1, x2 ∈ (a, b) with x1 < x2
that g is continuous on [x1, x2] (by Theorem 6.1.7) and differentiable on
(x1, x2). Hence, by the Mean Value Theorem (Theorem 6.4.2), there exists a
d ∈ (x1, x2) such that

    g′(d) = (g(x2) − g(x1))/(x2 − x1).

As g′(d) ≠ 0, we obtain that g(x2) − g(x1) ≠ 0. As this holds for all
x1, x2 ∈ (a, b) with x1 < x2, there exists at most one point, say γ, in (a, b)
such that g(γ) = 0.
For the proof of part (a), suppose lim_{x→a+} f′(x)/g′(x) = L. Let ε > 0 be
arbitrary. Therefore, there exists a b′ ∈ (a, b) such that

    |f′(x)/g′(x) − L| < ε

for all x ∈ (a, b′). If γ exists as in the previous paragraph, we may assume
that b′ < γ by decreasing b′ if necessary.
Let α and β be arbitrary numbers such that a < α < β < b′. Since f
and g are continuous on [α, β], differentiable on (α, β), and g′(x) ≠ 0 for all
x ∈ (α, β), Cauchy’s Mean Value Theorem (Theorem 6.4.9) implies there
exists a c ∈ (α, β) such that

    f′(c)/g′(c) = (f(β) − f(α))/(g(β) − g(α)).

Hence, as c ∈ (α, β) ⊆ (a, b′), we obtain that

    |(f(β) − f(α))/(g(β) − g(α)) − L| = |f′(c)/g′(c) − L| < ε.

For case (i), notice the above inequality holds for any β ∈ (a, b′) and for
all α ∈ (a, β). Hence, by fixing a β ∈ (a, b′) and taking the limit as α tends
to a from the right, we obtain that

    f(β)/g(β) = lim_{α→a+} (f(β) − f(α))/(g(β) − g(α))

as g(β) ≠ 0 and lim_{α→a+} f(α) = 0 = lim_{α→a+} g(α). Hence, for all β ∈ (a, b′),
we have that

    |f(β)/g(β) − L| ≤ ε.

Since ε > 0 was arbitrary, we obtain that

    lim_{x→a+} f(x)/g(x) = L

as desired.
For case (ii) (that is, lim_{x→a+} f(x) = lim_{x→a+} g(x) = ±∞), notice

    f(β)/g(α) − f(α)/g(α) = (1/g(α))(f(β) − f(α))
                          = (1/g(α))(g(β) − g(α)) · f′(c)/g′(c)
                          = (g(β)/g(α)) · f′(c)/g′(c) − f′(c)/g′(c)

so

    f(α)/g(α) = f′(c)/g′(c) + f(β)/g(α) − (g(β)/g(α)) · f′(c)/g′(c).

Hence

    |f(α)/g(α) − L| ≤ |f′(c)/g′(c) − L| + |f(β)/g(α)| + |g(β)/g(α)| · |f′(c)/g′(c)|
                    ≤ ε + |f(β)/g(α)| + |g(β)/g(α)|(|L| + ε)

for all β ∈ (a, b′) and for all α ∈ (a, β). However, for any fixed β, we know
that

    lim_{α→a+} (|f(β)/g(α)| + |g(β)/g(α)|(|L| + ε)) = 0

as lim_{α→a+} g(α) = ±∞ (note we really do not need to know lim_{x→a+} f(x) =
±∞ for this to work). Hence, there exists a δ > 0 such that if a < α < a + δ,
then

    0 ≤ |f(β)/g(α)| + |g(β)/g(α)|(|L| + ε) < ε.

Hence, if a < α < a + δ, then

    |f(α)/g(α) − L| < 2ε.

Since ε > 0 was arbitrary, we obtain that

    lim_{x→a+} f(x)/g(x) = L

as desired.
For part (b), suppose lim_{x→a+} f′(x)/g′(x) = ∞, as the proof when the limit is
−∞ is similar. Let M > 0 be arbitrary. Therefore, there exists a b′ ∈ (a, b)
such that

    f′(x)/g′(x) > M

for all x ∈ (a, b′). If γ exists as in the previous paragraph, we may assume
that b′ < γ by decreasing b′ if necessary.
Let α and β be arbitrary numbers such that a < α < β < b′. Since f
and g are continuous on [α, β], differentiable on (α, β), and g′(x) ≠ 0 for all
x ∈ (α, β), Cauchy’s Mean Value Theorem (Theorem 6.4.9) implies there
exists a c ∈ (α, β) such that

    f′(c)/g′(c) = (f(β) − f(α))/(g(β) − g(α)).

Hence, as c ∈ (α, β) ⊆ (a, b′), we obtain that

    (f(β) − f(α))/(g(β) − g(α)) = f′(c)/g′(c) > M.

For case (i), notice the above inequality holds for any β ∈ (a, b′) and for
all α ∈ (a, β). Hence, by fixing a β ∈ (a, b′) and taking the limit as α tends
to a from the right, we obtain that

    f(β)/g(β) = lim_{α→a+} (f(β) − f(α))/(g(β) − g(α))

as g(β) ≠ 0 and lim_{α→a+} f(α) = 0 = lim_{α→a+} g(α). Hence, for all β ∈ (a, b′),
we have that

    f(β)/g(β) ≥ M.

Since M > 0 was arbitrary, we obtain that

    lim_{x→a+} f(x)/g(x) = ∞

as desired.
For case (ii), we may repeat the computation in part (a) to obtain that

    f(α)/g(α) = f′(c)/g′(c) + f(β)/g(α) − (g(β)/g(α)) · f′(c)/g′(c)
              = f′(c)/g′(c) · (1 − g(β)/g(α)) + f(β)/g(α).

Notice that as lim_{x→a+} f(x) = lim_{x→a+} g(x) = ±∞, there exists a
δ1 > 0 such that

    f(β)/g(α) > 0    and    g(β)/g(α) > 0

whenever a < α < β < a + δ1 (i.e. f(β), g(α), and g(β) must eventually all
be of the same sign). Fix β ∈ (a, b′) with β < a + δ1. Since lim_{α→a+} g(α) = ±∞,
we know that

    lim_{α→a+} g(β)/g(α) = 0.

Hence we may find a 0 < δ < δ1 such that if a < α < a + δ, then

    1 − g(β)/g(α) ≥ 1/2.

Thus, if a < α < a + δ, then, as f′(c)/g′(c) > M and f(β)/g(α) > 0,

    f(α)/g(α) = f′(c)/g′(c) · (1 − g(β)/g(α)) + f(β)/g(α) ≥ M · (1/2) + 0 = M/2.

Since M > 0 was arbitrary, we obtain that

    lim_{x→a+} f(x)/g(x) = ∞

as desired.
The proof is nearly identical when we replace a+ with b− (interchanging the
roles of α and β). To complete the proof, we will run part (a), case (i) when we
replace a+ with ∞, as replacing with −∞ is similar and all other parts/cases
are similar.
Suppose lim_{x→∞} f(x) = lim_{x→∞} g(x) = 0 and lim_{x→∞} f′(x)/g′(x) = L ∈ R.
By increasing a if necessary, we may assume a > 0. Let h(x) = f(1/x) and
k(x) = g(1/x) for all x ∈ (0, 1/a). Notice

    lim_{x→0+} h(x) = lim_{x→∞} f(x) = 0 = lim_{x→∞} g(x) = lim_{x→0+} k(x).

Also notice that h and k are differentiable on (0, 1/a) via the Chain Rule
(Theorem 6.1.14) with

    h′(x) = −(1/x²)f′(1/x)    and    k′(x) = −(1/x²)g′(1/x).

Therefore

    lim_{x→0+} h′(x)/k′(x) = lim_{x→0+} f′(1/x)/g′(1/x) = lim_{x→∞} f′(x)/g′(x) = L.

Hence, by our previous proofs, we obtain that

    lim_{x→0+} h(x)/k(x) = L.

Since the existence of the above limit implies lim_{x→∞} f(x)/g(x) exists and since

    lim_{x→∞} f(x)/g(x) = lim_{x→0+} h(x)/k(x),

the result follows.
Using L’Hôpital’s rule, we can determine lim_{x→∞} x/e^x. Indeed, since
(x)′ = 1 and (e^x)′ = e^x, and since

    lim_{x→∞} 1/e^x = 0,

we obtain by L’Hôpital’s rule that lim_{x→∞} x/e^x = 0. Similarly, using induction,
we can use L’Hôpital’s rule to show that

    lim_{x→∞} x^n/e^x = 0

for all n ∈ N; that is, e^x grows substantially faster than any power of x!
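As a numerical sanity check (no substitute for the inductive proof), the ratio x^n/e^x shrinks along growing x and is already minuscule at x = 100 for several powers n; the sample points below are our own choices:

```python
import math

def ratio(n, x):
    """x^n / e^x, which L'Hopital's rule (applied n times) sends to 0."""
    return x**n / math.exp(x)

decay = [ratio(10, x) for x in (10.0, 50.0, 100.0)]   # shrinking along x
vals = {n: ratio(n, 100.0) for n in (1, 5, 10)}       # tiny at x = 100
```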
We can use L’Hôpital’s rule to prove that a plethora of limits exist and to
compute said limits. Furthermore, there are many other forms of L’Hôpital’s
rule.
Example 6.4.11 (Fundamental Logarithmic Limit). The Fundamental
Logarithmic Limit is the limit

    lim_{x→0+} x ln(x).

Although it does not appear that we may apply L’Hôpital’s rule, we write
x ln(x) = ln(x)/(1/x). Then we may apply L’Hôpital’s rule: lim_{x→0+} ln(x) = −∞
whereas lim_{x→0+} 1/x = ∞, so, up to multiplication by −1, the hypotheses of
L’Hôpital’s rule hold. Since (ln(x))′ = 1/x and (1/x)′ = −1/x², and since

    lim_{x→0+} (1/x)/(−1/x²) = lim_{x→0+} −x = 0,

we obtain that lim_{x→0+} x ln(x) = 0 by L’Hôpital’s rule.
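A numerical illustration of the Fundamental Logarithmic Limit (not a proof): x ln(x) is negative for small x > 0 and shrinks toward 0 as x does, at the sample points of our choosing below.

```python
import math

xs = (1e-2, 1e-4, 1e-8, 1e-16)
vals = [x * math.log(x) for x in xs]   # x ln(x) for x shrinking toward 0+
```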
Example 6.4.12. Using similar techniques, we may compute

    lim_{x→∞} (1 + 1/x)^x.

To begin, we will instead first compute

    lim_{x→∞} ln((1 + 1/x)^x) = lim_{x→∞} x ln(1 + 1/x) = lim_{x→∞} ln(1 + 1/x)/(1/x).

First notice that lim_{x→∞} 1/x = 0 and lim_{x→∞} ln(1 + 1/x) = ln(1) = 0. Furthermore,
since (1/x)′ = −1/x² and (ln(1 + 1/x))′ = (1/(1 + 1/x))(−1/x²), and since

    lim_{x→∞} [(1/(1 + 1/x))(−1/x²)]/(−1/x²) = lim_{x→∞} 1/(1 + 1/x) = 1,

we obtain that lim_{x→∞} ln((1 + 1/x)^x) = 1 by L’Hôpital’s rule. Therefore, as
e^x is continuous,

    lim_{x→∞} (1 + 1/x)^x = lim_{x→∞} e^{ln((1 + 1/x)^x)} = e^1 = e.

Hence, using the sequential definition of limits, we obtain the well-known
limit

    lim_{n→∞} (1 + 1/n)^n = e.
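Numerically, the convergence of (1 + 1/n)^n to e is easy to observe (though slow); the following Python snippet, with sample values of n chosen by us, is an illustration only:

```python
import math

approx = [(1.0 + 1.0 / n)**n for n in (10, 1_000, 100_000)]
errors = [abs(a - math.e) for a in approx]   # should shrink as n grows
```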
6.4.5 Taylor’s Theorem
Our final application of the Mean Value Theorem is, given a differentiable
function f , to approximate f pointwise using polynomials. However, to obtain
better and better approximations, we will need polynomials of larger and
larger degree, and more and more derivatives of f . Thus we define the
following.
Definition 6.4.13. Suppose f : (a, b) → R is differentiable on (a, b). If f′ is
differentiable at c ∈ (a, b), the derivative of f′ at c is called the second derivative
of f at c and is denoted f′′(c). In particular,

    f′′(c) = lim_{h→0} (f′(c + h) − f′(c))/h.

In general, for any n ∈ N, the (n + 1)st derivative of f is

    f^(n+1)(c) = lim_{h→0} (f^(n)(c + h) − f^(n)(c))/h

provided f^(n) exists. For convenience, f^(0) = f .
Definition 6.4.14. Assuming that f is n-times differentiable at a (which
means it is (n − 1)-times differentiable in an open interval containing a), the
nth-degree Taylor polynomial of f centred at a is

    P_{f,a,n}(x) = Σ_{k=0}^{n} (f^(k)(a)/k!)(x − a)^k.
For example, if f(x) = x², then P_{f,0,n}(x) = x² for all n ≥ 2. Alternatively,
if f(x) = e^x for all x ∈ R, then

    P_{f,0,n}(x) = Σ_{k=0}^{n} (1/k!)x^k,

which are the polynomials that define e^x up to a limit in Remark 6.1.6.
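These polynomials are easy to evaluate directly. The following Python sketch (the helper name taylor_poly is our own; an illustration only) checks that the degree-10 Taylor polynomial of e^x centred at 0, evaluated at 1, is already very close to e:

```python
import math

def taylor_poly(derivs_at_a, a, x):
    """Evaluate P_{f,a,n}(x) = sum_{k=0}^{n} (f^(k)(a)/k!)(x - a)^k from
    the list [f(a), f'(a), ..., f^(n)(a)] of derivatives at a."""
    return sum(d / math.factorial(k) * (x - a)**k
               for k, d in enumerate(derivs_at_a))

# For f(x) = e^x every derivative at 0 equals 1, so P_{f,0,10} is the
# degree-10 polynomial from Remark 6.1.6.
p_at_one = taylor_poly([1.0] * 11, 0.0, 1.0)
```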
The reason for examining Taylor polynomials is the following result, which
says that f(x) is almost P_{f,a,n}(x).
Theorem 6.4.15 (Taylor’s Theorem). Let I be an open interval containing
a point a and let f : I → R be (n + 1)-times differentiable on I. If
x ∈ I \ {a}, then there exists a cx ∈ (a, x) ∪ (x, a) such that

    f(x) = P_{f,a,n}(x) + (f^(n+1)(cx)/(n + 1)!)(x − a)^{n+1}.
Proof. Consider the function g : I → R defined by

    g(t) = f(x) − f(t) − Σ_{k=1}^{n} (f^(k)(t)/k!)(x − t)^k.

Therefore

    g(a) = f(x) − P_{f,a,n}(x)    and    g(x) = 0.

Notice that g is continuous on I and differentiable on I. Since, under d/dt,

    f(t) ↦ f′(t),
    f′(t)(x − t) ↦ −f′(t) + f′′(t)(x − t),
    (f′′(t)/2!)(x − t)² ↦ −f′′(t)(x − t) + (f′′′(t)/2!)(x − t)²,
    (f′′′(t)/3!)(x − t)³ ↦ −(f′′′(t)/2!)(x − t)² + (f′′′′(t)/3!)(x − t)³,

et cetera, the sum defining g telescopes upon differentiation and we see that

    g′(t) = −(f^(n+1)(t)/n!)(x − t)^n.
Let h : I → R be defined by

    h(t) = g(t) − ((x − t)/(x − a))^{n+1} g(a).

Therefore

    h(a) = g(a) − g(a) = 0    and    h(x) = g(x) − 0 = 0.

Since h is differentiable on I, h is continuous on [a, x] ∪ [x, a] and differentiable
on (a, x) ∪ (x, a). Hence by Rolle’s Theorem (Lemma 6.4.1) or by the Mean
Value Theorem (Theorem 6.4.2), there exists a c ∈ (a, x) ∪ (x, a) such that
h′(c) = 0. Since

    h′(t) = −(f^(n+1)(t)/n!)(x − t)^n + ((n + 1)/(x − a))((x − t)/(x − a))^n g(a),

we obtain that

    (f^(n+1)(c)/n!)(x − c)^n = ((n + 1)/(x − a))((x − c)/(x − a))^n g(a).

Hence

    (f^(n+1)(c)/(n + 1)!)(x − a)^{n+1} = g(a) = f(x) − P_{f,a,n}(x),

as desired.
One important use of Taylor’s Theorem can be obtained if one knows
bounds for f^(n+1)(cx). Indeed, if one knows that |f^(n+1)(c)| ≤ M for all
c ∈ (a − δ, a + δ) for some M > 0, then we have that

    |f(x) − P_{f,a,n}(x)| ≤ (M/(n + 1)!)|x − a|^{n+1}

for all x ∈ (a − δ, a + δ). Consequently, provided we can estimate M
well, we can approximate f(x) with P_{f,a,n}(x) on this interval! This can be
quite useful, as dealing with polynomials is substantially easier than dealing
with an arbitrary function.
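This error bound is easy to test numerically. For f(x) = sin(x) and a = 0, every derivative of sin is bounded by M = 1, so |sin(x) − P_{sin,0,n}(x)| ≤ |x|^{n+1}/(n + 1)!. The following Python sketch (an illustration only; the sample points are our own choices) verifies the bound:

```python
import math

def sin_taylor(x, n):
    """P_{sin,0,n}(x): the derivatives of sin at 0 cycle through 0, 1, 0, -1."""
    cycle = (0.0, 1.0, 0.0, -1.0)
    return sum(cycle[k % 4] / math.factorial(k) * x**k for k in range(n + 1))

n = 7
xs = (-1.0, -0.5, 0.3, 0.9)
errs = [abs(math.sin(x) - sin_taylor(x, n)) for x in xs]
# |sin^(n+1)(c)| <= 1 for every c, so M = 1 in the bound above.
bounds = [abs(x)**(n + 1) / math.factorial(n + 1) for x in xs]
```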
Chapter 7
Integration
For our final chapter, we will study what will be shown to be the opposite
of differentiation; namely, integration. Integration has a wide variety of
uses in calculus, as it allows the computation of the area under a curve
and permits the averaging of the values obtained by a function over an
interval. Consequently, the purpose of this chapter is to formally define the
Riemann integral, develop the basic properties of the Riemann integral, and
demonstrate the connections between differentiation and integration through
our Fundamental Theorems of Calculus (Theorems 7.2.2 and 7.2.3).
7.1 The Riemann Integral
The formal definition of the Riemann integral is modelled on trying to approximate the area under the graph of a function. The idea of approximating
this area is to divide up the interval one wants to integrate over into small
bits, and approximate the area under the graph via rectangles. Thus we
must make such constructions formal. Once this is done, we must decide
whether or not these approximations are good approximations. If they are,
the resulting limit will be the Riemann integral.
7.1.1 Riemann Sums
In order to ‘divide up the interval into small bits’, we will use the following
notion.
Definition 7.1.1. A partition of a closed interval [a, b] is a finite list of real
numbers {t_k}_{k=0}^{n} such that

    a = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = b.
Eventually, we will want to ensure that |tk − tk−1 | is small for all k in
order to obtain better and better approximations to the area under a graph.
To obtain a lower bound for the area under a graph, we can choose our
approximating rectangles to have the largest possible height while remaining
completely under the graph. This leads us to the following notion.
Definition 7.1.2. Let P = {t_k}_{k=0}^{n} be a partition of [a, b] and let
f : [a, b] → R be bounded. The lower Riemann sum of f associated to P,
denoted L(f, P), is

    L(f, P) = Σ_{k=1}^{n} m_k(t_k − t_{k−1})

where, for all k ∈ {1, . . . , n},

    m_k = inf{f(x) | x ∈ [t_{k−1}, t_k]}.
Example 7.1.3. If f : [0, 1] → R is defined by f(x) = x for all x ∈ [0, 1]
and if P = {t_k}_{k=0}^{n} is a partition of [0, 1], it is easy to see that

    L(f, P) = Σ_{k=1}^{n} t_{k−1}(t_k − t_{k−1})

as f attains its minimum on [t_{k−1}, t_k] at t_{k−1}.
If it so happens that t_k = k/n for all k ∈ {0, 1, . . . , n}, we see that

    L(f, P) = Σ_{k=1}^{n} ((k − 1)/n)(k/n − (k − 1)/n)
            = Σ_{k=1}^{n} (1/n²)(k − 1)
            = (1/n²) Σ_{j=1}^{n−1} j
            = (1/n²) · n(n − 1)/2
            = (1 − 1/n)/2

by the homework. Clearly, as n tends to infinity, L(f, P) tends to 1/2 for these
particular partitions, which happens to be the area under the graph of f on
[0, 1].
Although lower Riemann sums accurately estimated the area under the
graph of the function in the previous example, perhaps we also need an upper
bound for the area under the graph. By choosing our approximating rectangles
to have the smallest possible height while remaining completely above the
graph, we obtain the following notion.
Definition 7.1.4. Let P = {t_k}_{k=0}^{n} be a partition of [a, b] and let
f : [a, b] → R be bounded. The upper Riemann sum of f associated to P,
denoted U(f, P), is

    U(f, P) = Σ_{k=1}^{n} M_k(t_k − t_{k−1})

where, for all k ∈ {1, . . . , n},

    M_k = sup{f(x) | x ∈ [t_{k−1}, t_k]}.
Example 7.1.5. If f : [0, 1] → R is defined by f(x) = x for all x ∈ [0, 1]
and if P = {t_k}_{k=0}^{n} is a partition of [0, 1], it is easy to see that

    U(f, P) = Σ_{k=1}^{n} t_k(t_k − t_{k−1})

as f attains its maximum on [t_{k−1}, t_k] at t_k.
If it so happens that t_k = k/n for all k ∈ {0, 1, . . . , n}, we see that

    U(f, P) = Σ_{k=1}^{n} (k/n)(k/n − (k − 1)/n)
            = Σ_{k=1}^{n} (1/n²)k
            = (1/n²) Σ_{k=1}^{n} k
            = (1/n²) · n(n + 1)/2
            = (1 + 1/n)/2

by the homework. Clearly, as n tends to infinity, U(f, P) tends to 1/2 for these
particular partitions, which happens to be the area under the graph of f on
[0, 1].
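For the curious student, these two examples can be checked directly in a few lines of Python (an illustration only; the helper name is our own). Since f(x) = x is non-decreasing, on each cell [t_{k−1}, t_k] the inf is f(t_{k−1}) and the sup is f(t_k), so the sums below are exact:

```python
def riemann_sums_increasing(f, partition):
    """L(f, P) and U(f, P) for a non-decreasing f: on each cell
    [t_{k-1}, t_k] the inf is f(t_{k-1}) and the sup is f(t_k)."""
    lower = sum(f(a) * (b - a) for a, b in zip(partition, partition[1:]))
    upper = sum(f(b) * (b - a) for a, b in zip(partition, partition[1:]))
    return lower, upper

n = 1000
P = [k / n for k in range(n + 1)]                # the partition t_k = k/n
L, U = riemann_sums_increasing(lambda x: x, P)   # should be (1 -/+ 1/n)/2
```

Both sums straddle 1/2, the area under the graph, and agree with the closed forms (1 − 1/n)/2 and (1 + 1/n)/2 computed above.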
Although we have been able to approximate the area under the graph of
f(x) = x using upper and lower Riemann sums, how do we know whether
we can accurately do so for other functions? To analyze this question, we must
first decide whether we can compare the upper and lower Riemann sums of a
function. Clearly we have that L(f, P) ≤ U(f, P) for any bounded function
f : [a, b] → R and any partition P of [a, b]. However, if Q is another partition
of [a, b], is it the case that L(f, Q) ≤ U(f, P)? Of course our intuition using
‘areas under a graph’ says this should be so, but how do we prove it?
To answer the above question and provide some ‘sequence-like’ structure
to partitions, we define an ordering on the set of partitions.
Definition 7.1.6. Let P and Q be partitions of [a, b]. It is said that Q is a
refinement of P, denoted P ≤ Q, if P ⊆ Q; that is, Q has all of the points
that P has, and possibly more.
It is not difficult to check that refinement defines a partial ordering
(Definition 1.3.3) on the set of all partitions of [a, b]. Furthermore, the
following says that if Q is a refinement of P, we should have better upper
and lower bounds for the area under the graph of a function if we use Q
instead of P.
Lemma 7.1.7. Let P and Q be partitions of [a, b] and let f : [a, b] → R be
bounded. If Q is a refinement of P, then

    L(f, P) ≤ L(f, Q) ≤ U(f, Q) ≤ U(f, P).

Proof. Note the inequality L(f, Q) ≤ U(f, Q) follows from earlier discussions.
Thus it suffices to show that L(f, P) ≤ L(f, Q) and U(f, Q) ≤ U(f, P). Write
P = {t_k}_{k=0}^{n} where

    a = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = b.

First suppose Q = P ∪ {t′} where t′ ∈ [a, b] is such that t_{q−1} < t′ < t_q
for some q ∈ {1, . . . , n}. Therefore, if

    m_k = inf{f(x) | x ∈ [t_{k−1}, t_k]}    and    M_k = sup{f(x) | x ∈ [t_{k−1}, t_k]},

then

    L(f, P) = Σ_{k=1}^{n} m_k(t_k − t_{k−1})    and    U(f, P) = Σ_{k=1}^{n} M_k(t_k − t_{k−1}).

However, if

    m′_q = inf{f(x) | x ∈ [t_{q−1}, t′]},
    m″_q = inf{f(x) | x ∈ [t′, t_q]},
    M′_q = sup{f(x) | x ∈ [t_{q−1}, t′]}, and
    M″_q = sup{f(x) | x ∈ [t′, t_q]},

then we easily see that m_q ≤ m′_q, m″_q, that M′_q, M″_q ≤ M_q,

    L(f, Q) = m′_q(t′ − t_{q−1}) + m″_q(t_q − t′) + Σ_{k=1, k≠q}^{n} m_k(t_k − t_{k−1}),    and
    U(f, Q) = M′_q(t′ − t_{q−1}) + M″_q(t_q − t′) + Σ_{k=1, k≠q}^{n} M_k(t_k − t_{k−1}).

Thus

    L(f, Q) − L(f, P) = m′_q(t′ − t_{q−1}) + m″_q(t_q − t′) − m_q(t_q − t_{q−1})
                      ≥ m_q(t′ − t_{q−1}) + m_q(t_q − t′) − m_q(t_q − t_{q−1}) = 0

so L(f, P) ≤ L(f, Q). Similarly

    U(f, Q) − U(f, P) = M′_q(t′ − t_{q−1}) + M″_q(t_q − t′) − M_q(t_q − t_{q−1})
                      ≤ M_q(t′ − t_{q−1}) + M_q(t_q − t′) − M_q(t_q − t_{q−1}) = 0

so U(f, Q) ≤ U(f, P). Hence the result follows when Q = P ∪ {t′}.
To complete the proof, let Q be an arbitrary refinement of P. Thus we
can write Q = P ∪ {t′_k}_{k=1}^{m} for some {t′_k}_{k=1}^{m} ⊆ (a, b). Thus, by
the first part of the proof,

    L(f, P) ≤ L(f, P ∪ {t′_1}) ≤ L(f, P ∪ {t′_1, t′_2}) ≤ · · · ≤ L(f, Q)

and

    U(f, P) ≥ U(f, P ∪ {t′_1}) ≥ U(f, P ∪ {t′_1, t′_2}) ≥ · · · ≥ U(f, Q),

which completes the proof.
In order to answer our question of whether L(f, Q) ≤ U (f, P) for all
partitions P and Q, we can use Lemma 7.1.7 provided we have a partition
that is a refinement of both P and Q:
Definition 7.1.8. Given two partitions P and Q of [a, b], the common
refinement of P and Q is the partition P ∪ Q of [a, b].
Clearly, given two partitions P and Q, P ∪ Q is a refinement of P and Q.
Consequently, if f : [a, b] → R is bounded, then Lemma 7.1.7 implies that
L(f, P) ≤ L(f, P ∪ Q) ≤ U (f, P ∪ Q) ≤ U (f, Q).
Hence any lower bound for the area under a curve is smaller than any upper
bound for the area under a curve.
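This chain of inequalities is easy to check numerically. The following sketch is our own illustration (the helper names `lower_sum` and `upper_sum` are not from the notes); it uses f(x) = x², whose monotonicity means the inf and sup on each subinterval sit at the endpoints:

```python
# Numerical illustration of L(f, P) <= L(f, P ∪ Q) <= U(f, P ∪ Q) <= U(f, Q).
# The inf/sup on each subinterval are taken at the endpoints, which is exact
# here because x^2 is monotone on [0, 1].

def lower_sum(f, part):
    # L(f, P) = sum of m_k (t_k - t_{k-1}), m_k the inf over [t_{k-1}, t_k]
    return sum(min(f(t0), f(t1)) * (t1 - t0) for t0, t1 in zip(part, part[1:]))

def upper_sum(f, part):
    # U(f, P) = sum of M_k (t_k - t_{k-1}), M_k the sup over [t_{k-1}, t_k]
    return sum(max(f(t0), f(t1)) * (t1 - t0) for t0, t1 in zip(part, part[1:]))

f = lambda x: x * x                  # monotone, so endpoints give inf and sup
P = [0.0, 0.25, 0.5, 1.0]
Q = [0.0, 0.4, 0.7, 1.0]
R = sorted(set(P) | set(Q))          # the common refinement P ∪ Q

assert lower_sum(f, P) <= lower_sum(f, R) <= upper_sum(f, R) <= upper_sum(f, P)
assert lower_sum(f, P) <= upper_sum(f, Q)  # any lower bound <= any upper bound
```

Any lower sum is bounded by any upper sum exactly because both partitions refine to P ∪ Q.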
Before moving on, we note the above shows that the partially ordered
set of partitions of a closed interval [a, b] is a directed set (that is, a partially
ordered set with the property that if P and Q are elements of the partially
ordered set, then there exists an element R such that P ≤ R and Q ≤ R).
A set of real numbers indexed by a directed set is called a net, and one can
discuss the convergence of nets in R as we did with sequences. It turns out
nothing new is gained by using nets instead of sequences, and we can avoid
the discussion of nets in our discussion of integrals (although they exist in
the background).
7.1.2 Definition of the Riemann Integral
In order to define the Riemann integral of a function on a closed interval,
we want the upper and lower Riemann sums to approximate a common number
arbitrarily well. Using the above observations, we notice that if
f : [a, b] → R is bounded, then

    sup{L(f, P) | P a partition of [a, b]} ≤ inf{U(f, P) | P a partition of [a, b]}.
Therefore, in order for there to be no reasonable discrepancy between our
approximations, we would like equality in the above inequality (in which
case, the value obtained should be the area under the graph). Unfortunately,
as the following example shows, this is not always the case.
Example 7.1.9. Let f : [0, 1] → R be defined by

    f(x) = 1 if x ∈ Q   and   f(x) = 0 if x ∈ R \ Q

for all x ∈ [0, 1]. Since each open interval always contains at least one element
from each of Q and R \ Q by the homework, we easily see that L(f, P) = 0
and U(f, P) = 1 for all partitions P of [0, 1]. Consequently, the upper and
lower Riemann sums do not allow us to approximate the area under the
graph of f.
Consequently, we will restrict our attention to the following type of
function.

Definition 7.1.10. Let f : [a, b] → R be bounded. It is said that f is
Riemann integrable on [a, b] if

    sup{L(f, P) | P a partition of [a, b]} = inf{U(f, P) | P a partition of [a, b]}.

If f is Riemann integrable on [a, b], the Riemann integral of f from a to b,
denoted ∫_a^b f(x) dx, is

    ∫_a^b f(x) dx = sup{L(f, P) | P a partition of [a, b]}
                  = inf{U(f, P) | P a partition of [a, b]}.
Remark 7.1.11. Notice that if f is Riemann integrable on [a, b], then

    L(f, P) ≤ ∫_a^b f(x) dx ≤ U(f, P)

for every partition P of [a, b] by the definition of the Riemann integral.
Clearly the function f in Example 7.1.9 is not Riemann integrable.
However, which types of functions are Riemann integrable, and how can we
compute the integral? To illustrate the definition, we note the following
simple examples (note if the first example did not work out the way it does,
we clearly would not have a well-defined notion of area under a graph using
Riemann integrals).
Example 7.1.12. Let c ∈ R and let f : [a, b] → R be defined by f(x) = c
for all x ∈ [a, b]. If P = {t_k}_{k=0}^n is a partition of [a, b], we see that

    L(f, P) = U(f, P) = Σ_{k=1}^n c(t_k − t_{k−1}) = c Σ_{k=1}^n (t_k − t_{k−1}) = c(t_n − t_0) = c(b − a).

Hence f is Riemann integrable and ∫_a^b f(x) dx = c(b − a). (Was there any
doubt?)
Example 7.1.13. Let f : [0, 1] → R be defined by f(x) = x for all x ∈ [0, 1].
For each n ∈ N, note Example 7.1.3 demonstrates the existence of a partition
P_n such that L(f, P_n) = (1 − 1/n)/2. Hence

    sup{L(f, P) | P a partition of [0, 1]} ≥ lim_{n→∞} (1 − 1/n)/2 = 1/2.

Similarly, for each n ∈ N, Example 7.1.5 demonstrates the existence of a
partition Q_n such that U(f, Q_n) = (1 + 1/n)/2. Hence

    inf{U(f, P) | P a partition of [0, 1]} ≤ lim_{n→∞} (1 + 1/n)/2 = 1/2.

Therefore, since

    sup{L(f, P) | P a partition of [0, 1]} ≤ inf{U(f, P) | P a partition of [0, 1]},

the above computations show both the inf and the sup must be 1/2. Hence f is
Riemann integrable on [0, 1] and ∫_0^1 x dx = 1/2.
Example 7.1.14. Let f : [0, 1] → R be defined by f(x) = x² for all x ∈ [0, 1].
We claim that f is Riemann integrable on [0, 1] and ∫_0^1 x² dx = 1/3. To see this,
let n ∈ N and let P_n = {t_k}_{k=0}^n be the partition of [0, 1] such that t_k = k/n for
all k ∈ {0, . . . , n}. Then, by the first assignment,

    L(f, P_n) = Σ_{k=1}^n ((k − 1)²/n²)(k/n − (k − 1)/n)
              = (1/n³) Σ_{k=1}^n (k − 1)²
              = (1/n³) Σ_{j=1}^{n−1} j²
              = (1/n³) · (n − 1)(n)(2(n − 1) + 1)/6
              = (2n³ − 3n² + n)/(6n³)

and

    U(f, P_n) = Σ_{k=1}^n (k²/n²)(k/n − (k − 1)/n)
              = (1/n³) Σ_{k=1}^n k²
              = (1/n³) · n(n + 1)(2n + 1)/6
              = (2n³ + 3n² + n)/(6n³).

Hence, since lim_{n→∞} (2n³ − 3n² + n)/(6n³) = lim_{n→∞} (2n³ + 3n² + n)/(6n³) = 1/3,
we see that

    1/3 ≤ sup{L(f, P) | P a partition of [0, 1]}
        ≤ inf{U(f, P) | P a partition of [0, 1]} ≤ 1/3.

Hence the inequalities must be equalities, so f is Riemann integrable on [0, 1]
by definition with ∫_0^1 x² dx = 1/3.
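The closed forms for these lower and upper sums can be verified numerically; the sketch below is our own (the helper name `sums_for_x_squared` is hypothetical) and checks that the sums for the uniform partition squeeze 1/3:

```python
# For f(x) = x^2 on [0, 1] with the uniform partition t_k = k/n, the lower sum
# is sum (k-1)^2 / n^3 and the upper sum is sum k^2 / n^3; both tend to 1/3.

def sums_for_x_squared(n):
    lower = sum((k - 1) ** 2 for k in range(1, n + 1)) / n ** 3
    upper = sum(k ** 2 for k in range(1, n + 1)) / n ** 3
    return lower, upper

n = 1000
lower, upper = sums_for_x_squared(n)
assert lower <= 1 / 3 <= upper
# The gap U - L telescopes to exactly 1/n for this function and partition.
assert abs(upper - lower - 1 / n) < 1e-12
# Both sums match the closed forms derived above.
assert abs(lower - (2 * n**3 - 3 * n**2 + n) / (6 * n**3)) < 1e-12
assert abs(upper - (2 * n**3 + 3 * n**2 + n) / (6 * n**3)) < 1e-12
```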
Note in the previous two examples, the functions were demonstrated
to be Riemann integrable on [0, 1] via partitions P such that L(f, P) and
U(f, P) were as close as one would like. Coincidence? I think not!
Theorem 7.1.15. Let f : [a, b] → R be bounded. Then f is Riemann
integrable if and only if for every ε > 0 there exists a partition P of [a, b]
such that

    0 ≤ U(f, P) − L(f, P) < ε.
Proof. Note we must have that 0 ≤ U(f, P) − L(f, P) for any partition P
by earlier discussions.
First suppose f is Riemann integrable. Hence, if I = ∫_a^b f(x) dx, we have
by the definition of the integral that

    I = sup{L(f, P) | P a partition of [a, b]} = inf{U(f, P) | P a partition of [a, b]}.

Let ε > 0 be arbitrary. By the definition of the supremum, there exists a
partition P_1 of [a, b] such that

    I − L(f, P_1) < ε/2.

Similarly, by the definition of the infimum, there exists a partition P_2 of
[a, b] such that

    U(f, P_2) − I < ε/2.

Let P = P_1 ∪ P_2, which is a partition of [a, b]. Since P is a refinement of
both P_1 and P_2, we obtain that

    L(f, P_1) ≤ L(f, P) ≤ U(f, P) ≤ U(f, P_2)

by Lemma 7.1.7. Hence

    U(f, P) − L(f, P) ≤ U(f, P_2) − L(f, P_1)
                      = (U(f, P_2) − I) + (I − L(f, P_1))
                      < ε/2 + ε/2 = ε.

Therefore, since ε > 0 was arbitrary, this direction of the proof is complete.
For the other direction, suppose for every ε > 0 there exists a partition P of
[a, b] such that

    0 ≤ U(f, P) − L(f, P) < ε.

In particular, for each n ∈ N there exists a partition P_n of [a, b] such that

    0 ≤ U(f, P_n) − L(f, P_n) < 1/n.

Let

    L = sup{L(f, P) | P a partition of [a, b]}   and
    U = inf{U(f, P) | P a partition of [a, b]}.

Then L, U ∈ R are such that L ≤ U. However, for each n ∈ N,

    0 ≤ U − L ≤ U(f, P_n) − L(f, P_n) < 1/n.

Therefore, as the above holds for all n ∈ N, it must be the case (by the
homework/Archimedean property) that U = L. Hence f is Riemann integrable
on [a, b] by definition.
Using Theorem 7.1.15, we can obtain an easier method for approximating
a Riemann integral provided we know the function is Riemann integrable.
Indeed suppose P = {t_k}_{k=0}^n is a partition of [a, b] with

    a = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = b

and let f : [a, b] → R be bounded. For each k, suppose x_k ∈ [t_{k−1}, t_k] and let

    R(f, P, {x_k}_{k=1}^n) = Σ_{k=1}^n f(x_k)(t_k − t_{k−1}).

The sum R(f, P, {x_k}_{k=1}^n) is called a Riemann sum. Clearly

    L(f, P) ≤ R(f, P, {x_k}_{k=1}^n) ≤ U(f, P).

In particular, if f is Riemann integrable, we obtain via Theorem 7.1.15 that
for any ε > 0 there exists a partition P′ of [a, b] such that
U(f, P′) − L(f, P′) < ε. Since

    L(f, P′) ≤ ∫_a^b f(x) dx ≤ U(f, P′),

it follows that

    |∫_a^b f(x) dx − R(f, P′, {x_k}_{k=1}^n)| < ε

for any choice of {x_k}_{k=1}^n. Consequently, if one knows that f is Riemann
integrable, one may approximate ∫_a^b f(x) dx using Riemann sums as opposed
to lower/upper Riemann sums. This is occasionally useful as convenient choices
of {x_k}_{k=1}^n may make computing the sum much easier.
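As an illustration of this freedom in choosing sample points (our own sketch, with hypothetical helper names), taking each x_k to be the midpoint of its subinterval already approximates ∫_0^1 x² dx = 1/3 well:

```python
# Riemann sum R(f, P, {x_k}) = sum f(x_k)(t_k - t_{k-1}); once f is known to
# be integrable, any choice of x_k in [t_{k-1}, t_k] converges to the integral.

def riemann_sum(f, part, points):
    return sum(f(x) * (t1 - t0)
               for (t0, t1), x in zip(zip(part, part[1:]), points))

n = 100
part = [k / n for k in range(n + 1)]            # uniform partition of [0, 1]
midpoints = [(k + 0.5) / n for k in range(n)]   # a convenient choice of x_k

approx = riemann_sum(lambda x: x * x, part, midpoints)
assert abs(approx - 1 / 3) < 1e-4               # midpoint error is O(1/n^2)
```

The midpoint choice converges noticeably faster than the lower or upper sums themselves, which is why it is convenient in practice.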
Of course, our next question is, “Which classes of functions that have
been studied in this course are Riemann integrable?”
7.1.3 Some Integrable Functions
If the theory of Riemann integration is to be of use to us, we must have a
wide variety of functions that are Riemann integrable. We start with the
following.
Theorem 7.1.16. If f : [a, b] → R is monotonic and bounded, then f is
Riemann integrable on [a, b].
Proof. Let f : [a, b] → R be monotone and bounded. We will assume that f
is non-decreasing as the proof when f is non-increasing is similar.
Fix n ∈ N. Let P_n = {t_k}_{k=0}^n be the partition such that

    t_k = a + (k/n)(b − a)

for all k ∈ {0, . . . , n}. Notice t_k − t_{k−1} = (1/n)(b − a) for all k (and thus we call
P_n the uniform partition of [a, b] into n intervals). Since f is non-decreasing, if

    m_k = inf{f(x) | x ∈ [t_{k−1}, t_k]}   and   M_k = sup{f(x) | x ∈ [t_{k−1}, t_k]},

then m_k = f(t_{k−1}) and M_k = f(t_k). Hence

    0 ≤ U(f, P_n) − L(f, P_n)
      = Σ_{k=1}^n M_k(t_k − t_{k−1}) − Σ_{k=1}^n m_k(t_k − t_{k−1})
      = Σ_{k=1}^n f(t_k)(1/n)(b − a) − Σ_{k=1}^n f(t_{k−1})(1/n)(b − a)
      = f(t_n)(1/n)(b − a) − f(t_0)(1/n)(b − a) = (1/n)(b − a)(f(b) − f(a)).

Since lim_{n→∞} (1/n)(b − a)(f(b) − f(a)) = 0, for each ε > 0 there exists an
N ∈ N such that

    0 ≤ U(f, P_N) − L(f, P_N) = (1/N)(b − a)(f(b) − f(a)) < ε.

Hence Theorem 7.1.15 implies that f is Riemann integrable on [a, b].
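The telescoping in this proof can be observed numerically; this is our own sketch (the monotone step function chosen is just one example), and it checks that U − L equals (b − a)(f(b) − f(a))/n on the uniform partition:

```python
import math

# For a non-decreasing f on [a, b] and the uniform partition into n pieces,
# m_k = f(t_{k-1}) and M_k = f(t_k), so U - L telescopes to
# (b - a)(f(b) - f(a)) / n. Note f need not be continuous.

def gap(f, a, b, n):
    ts = [a + k * (b - a) / n for k in range(n + 1)]
    upper = sum(f(t1) * (t1 - t0) for t0, t1 in zip(ts, ts[1:]))  # M_k = f(t_k)
    lower = sum(f(t0) * (t1 - t0) for t0, t1 in zip(ts, ts[1:]))  # m_k = f(t_{k-1})
    return upper - lower

f = lambda x: math.floor(3 * x)   # monotone but not continuous
for n in (10, 100, 1000):
    expected = (1 - 0) * (f(1) - f(0)) / n
    assert abs(gap(f, 0, 1, n) - expected) < 1e-9
```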
Of course, if continuous functions were not Riemann integrable, Riemann
integration would be next to worthless.
Theorem 7.1.17. If f : [a, b] → R is continuous, then f is Riemann
integrable on [a, b].
Proof. Let f : [a, b] → R be continuous. Therefore f is bounded on [a, b] by
the homework. In order to invoke Theorem 7.1.15, let ε > 0 be arbitrary.
Since f : [a, b] → R is continuous, f is uniformly continuous on [a, b]
by Theorem 5.4.4. Hence there exists a δ > 0 such that if x, y ∈ [a, b] and
|x − y| < δ then |f(x) − f(y)| < ε/(b − a).
Choose n ∈ N such that (b − a)/n < δ (homework/Archimedean property),
and let P be the uniform partition of [a, b] into n intervals; that is, let
P = {t_k}_{k=0}^n be the partition such that

    t_k = a + (k/n)(b − a)

for all k ∈ {0, . . . , n}. Let

    m_k = inf{f(x) | x ∈ [t_{k−1}, t_k]}   and   M_k = sup{f(x) | x ∈ [t_{k−1}, t_k]}.

Since |t_k − t_{k−1}| = (b − a)/n < δ, so |x − y| < δ for all x, y ∈ [t_{k−1}, t_k], it must be the
case that M_k − m_k = |M_k − m_k| ≤ ε/(b − a) (in fact, < by the Extreme Value
Theorem) for all k ∈ {1, . . . , n}. Hence

    0 ≤ U(f, P) − L(f, P) = Σ_{k=1}^n (M_k − m_k)(t_k − t_{k−1})
                          ≤ (ε/(b − a)) Σ_{k=1}^n (t_k − t_{k−1})
                          = (ε/(b − a))(b − a) = ε.

Thus, as ε > 0 was arbitrary, f is Riemann integrable on [a, b] by Theorem
7.1.15.
We have seen that continuity is a lot to ask. However, many functions one
sees and deals with in real-world applications are continuous at almost every
point. In particular, the following shows that if our functions are piecewise
continuous, then they are Riemann integrable.
Corollary 7.1.18. If f : [a, b] → R is bounded on [a, b], and continuous on
[a, b] except at a finite number of points, then f is Riemann integrable on
[a, b].
Proof. Suppose f : [a, b] → R is bounded and continuous except at a finite
number of points. Let {a_k}_{k=0}^q contain all of the points at which f is
not continuous and be such that

    a = a_0 < a_1 < a_2 < · · · < a_q = b.

The idea of the proof is to construct a partition such that each interval of the
partition contains at most one a_k, and if an interval of the partition contains
an a_k, then its length is really small.
Let ε > 0 be arbitrary. Let

    L = sup{f(x) − f(y) | x, y ∈ [a, b]}.

Since f([a, b]) is bounded, we obtain that 0 ≤ L < ∞. Let

    δ = ε/(2(q + 1)(L + 1)) > 0.

It is not difficult to see that there exists a partition P′ = {t_k}_{k=0}^{2q+1} with

    a = t_0 < t_1 < t_2 < · · · < t_{2q+1} = b

such that t_{2k+1} − t_{2k} < δ for all k ∈ {0, . . . , q} and t_{2k} < a_k < t_{2k+1} for all
k ∈ {1, . . . , q − 1}. Let

    m_k = inf{f(x) | x ∈ [t_{k−1}, t_k]}   and   M_k = sup{f(x) | x ∈ [t_{k−1}, t_k]}.

Thus M_k − m_k ≤ L for all k ∈ {1, . . . , 2q + 1}.
Since f is continuous on [t_{2k−1}, t_{2k}] for all k ∈ {1, . . . , q}, f is Riemann
integrable on [t_{2k−1}, t_{2k}] by Theorem 7.1.17. Hence, by the definition of
Riemann integration, there exist partitions P_k of [t_{2k−1}, t_{2k}] such that

    0 ≤ U(f, P_k) − L(f, P_k) < ε/(2q).

Let P = P′ ∪ ⋃_{k=1}^q P_k. Then P is a partition of [a, b] such that

    0 ≤ U(f, P) − L(f, P)
      = Σ_{k=1}^q (U(f, P_k) − L(f, P_k)) + Σ_{k=0}^q (M_{2k+1} − m_{2k+1})(t_{2k+1} − t_{2k})

(that is, on each [t_{2k−1}, t_{2k}] the partition behaves like P_k and thus so do the
sums, and the parts of the partition remaining are of the form [t_{2k}, t_{2k+1}],
each of which contains at most one a_j). Hence

    0 ≤ U(f, P) − L(f, P) ≤ Σ_{k=1}^q ε/(2q) + Σ_{k=0}^q Lδ
                          ≤ ε/2 + (q + 1)Lδ
                          = ε/2 + (q + 1)L · ε/(2(q + 1)(L + 1))
                          ≤ ε/2 + ε/2 = ε.

Thus, as ε > 0 was arbitrary, f is Riemann integrable on [a, b] by Theorem
7.1.15.
Notice the main idea of the above proof is really to construct a finite
number of open intervals which contain all of the discontinuities such that
the sum of the lengths of the open intervals is as small as possible. Therefore,
the same argument can be used to show that if the set of discontinuities of a
function f has measure zero (where ‘measure’ is as defined on the homework),
then f is Riemann integrable (see the homework for an example question).
7.1.4 Properties of the Riemann Integral
Now that we know several functions are Riemann integrable, we desire
properties of the Riemann integral. Hence we begin with the following which
is simple to state, yet technical to prove.
Theorem 7.1.19. Let f, g : [a, b] → R be bounded, Riemann integrable
functions on [a, b]. Then:

a) If α ∈ R, then αf is Riemann integrable on [a, b] and

    ∫_a^b (αf)(x) dx = α ∫_a^b f(x) dx.

b) f + g is Riemann integrable on [a, b] and

    ∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx.

c) If a ≤ c ≤ b, then f is Riemann integrable on [a, c] and [c, b] with

    ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.

d) If f(x) ≤ g(x) for all x ∈ [a, b], then

    ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

e) If m ≤ f(x) ≤ M for all x ∈ [a, b], then

    m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a).
Proof. For part (a), let P be any partition of [a, b]. If α ≥ 0, it is not difficult
to see that

    L(αf, P) = αL(f, P)   and   U(αf, P) = αU(f, P).

Furthermore, if α < 0, then it is not difficult to see that

    L(αf, P) = αU(f, P)   and   U(αf, P) = αL(f, P)

(i.e. if X is a bounded subset of R, inf(−X) = −sup(X)).
Since f is integrable on [a, b], we obtain by definition that

    ∫_a^b f(x) dx = sup{L(f, P) | P a partition of [a, b]}
                  = inf{U(f, P) | P a partition of [a, b]}.

Thus, the above computations imply that

    α ∫_a^b f(x) dx = sup{L(αf, P) | P a partition of [a, b]}
                    = inf{U(αf, P) | P a partition of [a, b]}.

Hence αf is Riemann integrable on [a, b] with

    ∫_a^b (αf)(x) dx = α ∫_a^b f(x) dx.
For part (b), let P be any partition of [a, b]. Since

    sup{f(x) + g(x) | x ∈ [c, d]} ≤ sup{f(x) | x ∈ [c, d]} + sup{g(x) | x ∈ [c, d]}

and

    inf{f(x) + g(x) | x ∈ [c, d]} ≥ inf{f(x) | x ∈ [c, d]} + inf{g(x) | x ∈ [c, d]}

for all c, d ∈ [a, b], we see that

    L(f, P) + L(g, P) ≤ L(f + g, P) ≤ U(f + g, P) ≤ U(f, P) + U(g, P).

Let ε > 0 be arbitrary. Since f is Riemann integrable on [a, b], there
exists a partition P_1 of [a, b] such that

    L(f, P_1) ≤ ∫_a^b f(x) dx ≤ U(f, P_1) ≤ L(f, P_1) + ε/2.

Similarly, since g is Riemann integrable on [a, b], there exists a partition P_2
of [a, b] such that

    L(g, P_2) ≤ ∫_a^b g(x) dx ≤ U(g, P_2) ≤ L(g, P_2) + ε/2.

Let P = P_1 ∪ P_2. Then P is a partition of [a, b] such that

    L(f, P) ≤ ∫_a^b f(x) dx ≤ U(f, P) ≤ L(f, P) + ε/2

and

    L(g, P) ≤ ∫_a^b g(x) dx ≤ U(g, P) ≤ L(g, P) + ε/2.

Hence, since we know that

    L(f, P) + L(g, P) ≤ L(f + g, P) ≤ U(f + g, P) ≤ U(f, P) + U(g, P),

we obtain that

    L(f, P) + L(g, P) ≤ L(f + g, P) ≤ U(f + g, P) ≤ L(f, P) + L(g, P) + ε

and

    ∫_a^b f(x) dx + ∫_a^b g(x) dx − ε ≤ L(f + g, P)
                                      ≤ U(f + g, P)
                                      ≤ ∫_a^b f(x) dx + ∫_a^b g(x) dx + ε.

Hence 0 ≤ U(f + g, P) − L(f + g, P) ≤ ε. Thus, as ε was arbitrary, Theorem
7.1.15 implies that f + g is Riemann integrable on [a, b]. Furthermore, for
each ε > 0, the above computation produced a partition P such that

    ∫_a^b f(x) dx + ∫_a^b g(x) dx − ε ≤ L(f + g, P)
                                      ≤ ∫_a^b (f + g)(x) dx
                                      ≤ U(f + g, P)
                                      ≤ ∫_a^b f(x) dx + ∫_a^b g(x) dx + ε.

Hence

    |∫_a^b (f + g)(x) dx − (∫_a^b f(x) dx + ∫_a^b g(x) dx)| ≤ ε.

Therefore, as ε > 0 was arbitrary, we obtain that

    ∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx.
For part (c), first let us show that f is Riemann integrable on [a, c] and
[c, b]. To see this, let ε > 0. By Theorem 7.1.15, there exists a partition P of
[a, b] such that

    L(f, P) ≤ ∫_a^b f(x) dx ≤ U(f, P) ≤ L(f, P) + ε.

Therefore, if P_0 = P ∪ {c}, then P_0 is a partition of [a, b] containing c such
that

    L(f, P_0) ≤ ∫_a^b f(x) dx ≤ U(f, P_0) ≤ L(f, P_0) + ε.

Let

    P_1 = P_0 ∩ [a, c]   and   P_2 = P_0 ∩ [c, b].

Then P_1 is a partition of [a, c] and P_2 is a partition of [c, b]. Furthermore,
due to the nature of these partitions, we easily see that

    L(f, P_0) = L(f, P_1) + L(f, P_2)   and   U(f, P_0) = U(f, P_1) + U(f, P_2).

Hence

    0 ≤ (U(f, P_1) − L(f, P_1)) + (U(f, P_2) − L(f, P_2)) = U(f, P_0) − L(f, P_0) ≤ ε.

Hence, as 0 ≤ U(f, P_1) − L(f, P_1) and 0 ≤ U(f, P_2) − L(f, P_2), it must be
the case that

    0 ≤ U(f, P_1) − L(f, P_1) ≤ ε   and   0 ≤ U(f, P_2) − L(f, P_2) ≤ ε.

Hence f is integrable on both [a, c] and [c, b] by Theorem 7.1.15.
To see that

    ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx,

let ε > 0 be arbitrary. Since f is integrable on both [a, c] and [c, b], there
exist partitions P_1 and P_2 of [a, c] and [c, b] respectively such that

    L(f, P_1) ≤ ∫_a^c f(x) dx ≤ U(f, P_1) ≤ L(f, P_1) + ε/2

and

    L(f, P_2) ≤ ∫_c^b f(x) dx ≤ U(f, P_2) ≤ L(f, P_2) + ε/2.

Let P = P_1 ∪ P_2, which is a partition of [a, b]. Then, as before,

    L(f, P) = L(f, P_1) + L(f, P_2)   and   U(f, P) = U(f, P_1) + U(f, P_2).

Hence

    ∫_a^c f(x) dx + ∫_c^b f(x) dx − ε ≤ L(f, P_1) + L(f, P_2)
                                      = L(f, P)
                                      ≤ ∫_a^b f(x) dx
                                      ≤ U(f, P)
                                      = U(f, P_1) + U(f, P_2)
                                      ≤ ∫_a^c f(x) dx + ∫_c^b f(x) dx + ε.

Hence

    |∫_a^b f(x) dx − (∫_a^c f(x) dx + ∫_c^b f(x) dx)| ≤ ε.

Therefore, since ε > 0 was arbitrary, we obtain that

    ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
For part (d), suppose f(x) ≤ g(x) for all x ∈ [a, b]. Let ε > 0. By
Theorem 7.1.15, there exists a partition P of [a, b] such that

    L(f, P) ≤ ∫_a^b f(x) dx ≤ U(f, P) ≤ L(f, P) + ε.

However, since f(x) ≤ g(x) for all x ∈ [a, b], it must be the case that

    inf{f(x) | x ∈ [c, d]} ≤ inf{g(x) | x ∈ [c, d]}

for any c, d ∈ [a, b]. Therefore L(f, P) ≤ L(g, P). Hence

    ∫_a^b f(x) dx − ε ≤ L(f, P) ≤ L(g, P) ≤ ∫_a^b g(x) dx.

Since ε > 0 was arbitrary, we have

    ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx + ε

for all ε > 0. Hence it must be the case that

    ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

For part (e), we have by part (d) and Example 7.1.12 that

    m(b − a) = ∫_a^b m dx ≤ ∫_a^b f(x) dx ≤ ∫_a^b M dx = M(b − a).
Note that Theorem 7.1.19 does not produce a formula for the Riemann
integral of the product of Riemann integrable functions. Indeed it is almost
always the case that ∫_a^b (fg)(x) dx ≠ (∫_a^b f(x) dx)(∫_a^b g(x) dx). For example,
using Examples 7.1.13 and 7.1.14, we see that

    ∫_0^1 x² dx = 1/3   whereas   (∫_0^1 x dx)² = 1/4.

However, it is still possible to show that if f and g are Riemann integrable
on [a, b], then so too is fg. To prove this, we first note the following which
has its own uses.
Theorem 7.1.20. Let f : [a, b] → R be a bounded, Riemann integrable
function on [a, b]. Then the function |f| : [a, b] → R defined by |f|(x) = |f(x)|
for all x ∈ [a, b] is Riemann integrable on [a, b] and

    |∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx.
Proof. Let ε > 0. By Theorem 7.1.15, there exists a partition P of [a, b] such
that

    0 ≤ U(f, P) − L(f, P) < ε.

Write P = {t_k}_{k=0}^n where

    a = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = b.

For each k ∈ {1, . . . , n} let

    m_k(f) = inf{f(x) | x ∈ [t_{k−1}, t_k]},
    M_k(f) = sup{f(x) | x ∈ [t_{k−1}, t_k]},
    m_k(|f|) = inf{|f(x)| | x ∈ [t_{k−1}, t_k]}, and
    M_k(|f|) = sup{|f(x)| | x ∈ [t_{k−1}, t_k]}.

We claim that

    M_k(|f|) − m_k(|f|) ≤ M_k(f) − m_k(f)

for all k ∈ {1, . . . , n}. Indeed notice if x, y ∈ [t_{k−1}, t_k] are such that:

• f(x), f(y) ≥ 0, then |f(x)| − |f(y)| = f(x) − f(y) ≤ M_k(f) − m_k(f).

• f(x) ≥ 0 ≥ f(y), then |f(x)| − |f(y)| = f(x) + f(y) ≤ f(x) − f(y) ≤ M_k(f) − m_k(f).

• f(y) ≥ 0 ≥ f(x), then |f(x)| − |f(y)| = −f(x) − f(y) ≤ f(y) − f(x) ≤ M_k(f) − m_k(f).

• f(x), f(y) ≤ 0, then |f(x)| − |f(y)| = f(y) − f(x) ≤ M_k(f) − m_k(f).

Since x and y were arbitrary, the above inequalities and definitions imply that

    M_k(|f|) − m_k(|f|) ≤ M_k(f) − m_k(f).

Hence

    U(|f|, P) − L(|f|, P) = Σ_{k=1}^n (M_k(|f|) − m_k(|f|))(t_k − t_{k−1})
                          ≤ Σ_{k=1}^n (M_k(f) − m_k(f))(t_k − t_{k−1})
                          = U(f, P) − L(f, P) < ε.

Hence, as ε > 0 was arbitrary, |f| is Riemann integrable on [a, b] by Theorem
7.1.15.
By Theorem 7.1.19, −|f| is also Riemann integrable. Since

    −|f(x)| ≤ f(x) ≤ |f(x)|

for all x ∈ [a, b], Theorem 7.1.19 also implies that

    −∫_a^b |f(x)| dx ≤ ∫_a^b f(x) dx ≤ ∫_a^b |f(x)| dx.

Hence

    |∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx,

which completes the proof.
As a step toward proving that if f and g are Riemann integrable, then
f g is Riemann integrable, we first prove that f 2 is Riemann integrable.
Lemma 7.1.21. Let f : [a, b] → R be a bounded, Riemann integrable function
on [a, b]. Then the function f 2 : [a, b] → R defined by f 2 (x) = (f (x))2 for
all x ∈ [a, b] is Riemann integrable on [a, b].
Proof. Since f is bounded, let K = sup{|f(x)| | x ∈ [a, b]} < ∞. To see
that f² is Riemann integrable, let ε > 0 be arbitrary. Since |f| is Riemann
integrable by Theorem 7.1.20, by Theorem 7.1.15 there exists a partition P
of [a, b] such that

    0 ≤ U(|f|, P) − L(|f|, P) < ε/(2(K + 1)).

Write P = {t_k}_{k=0}^n where

    a = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = b.

For each k ∈ {1, . . . , n} let

    m_k(|f|) = inf{|f(x)| | x ∈ [t_{k−1}, t_k]},
    M_k(|f|) = sup{|f(x)| | x ∈ [t_{k−1}, t_k]},
    m_k(f²) = inf{(f(x))² | x ∈ [t_{k−1}, t_k]}, and
    M_k(f²) = sup{(f(x))² | x ∈ [t_{k−1}, t_k]}.

Since f² = |f|² and since X ⊆ [0, ∞) implies

    sup({x² | x ∈ X}) = (sup({x | x ∈ X}))²,

we see that

    M_k(f²) − m_k(f²) = M_k(|f|)² − m_k(|f|)²
                      = (M_k(|f|) + m_k(|f|))(M_k(|f|) − m_k(|f|))
                      ≤ 2K(M_k(|f|) − m_k(|f|)).

Hence

    0 ≤ U(f², P) − L(f², P) ≤ 2K(U(|f|, P) − L(|f|, P)) ≤ 2K · ε/(2(K + 1)) < ε.

Hence f² is Riemann integrable by Theorem 7.1.15.
Theorem 7.1.22. Let f, g : [a, b] → R be bounded, Riemann integrable
functions on [a, b]. Then f g : [a, b] → R is Riemann integrable on [a, b].
Proof. Since

    f(x)g(x) = (1/2)((f(x) + g(x))² − f(x)² − g(x)²)

and since f + g, f², g², and (f + g)² are Riemann integrable by Theorem
7.1.19 and Lemma 7.1.21, it follows by Theorem 7.1.19 that fg is Riemann
integrable.
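The pointwise identity driving this proof can be sanity-checked directly (a trivial sketch of our own):

```python
# fg = ((f+g)^2 - f^2 - g^2)/2 pointwise, which reduces integrability of a
# product to integrability of sums and squares.

def polarize(u, v):
    return 0.5 * ((u + v) ** 2 - u ** 2 - v ** 2)

for u in (-2.0, 0.0, 1.5):
    for v in (-1.0, 0.25, 3.0):
        assert abs(polarize(u, v) - u * v) < 1e-12
```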
7.2 The Fundamental Theorems of Calculus
For our final section of the course, we note that although we have developed
the Riemann integral and its properties, we still lack a simple way to compute
the integral of even some of the most basic functions. Indeed the only integrals
we have been able to compute were in Examples 7.1.13 and 7.1.14 where
specific sums were used.
The goal of this final section is to prove what are known as the Fundamental
Theorems of Calculus. Said theorems are named as such since they provide
the ultimate connection between integration and differentiation via
antiderivatives as introduced in Subsection 6.4.2. To study these theorems, we
will need to define some functions based on integrals.
To begin, suppose f : [a, b] → R is bounded and Riemann integrable on
[a, b]. For simplicity, let us define

    ∫_a^a f(x) dx = 0.

Therefore, if we define F : [a, b] → R by

    F(x) = ∫_a^x f(t) dt

for all x ∈ [a, b], we see that F is well-defined since f is Riemann integrable
on [a, x] by Theorem 7.1.19.
Lemma 7.2.1. Let f : [a, b] → R be a bounded, Riemann integrable function
on [a, b] and let F : [a, b] → R be defined by

    F(x) = ∫_a^x f(t) dt

for all x ∈ [a, b]. Then F is continuous on [a, b].

Proof. Since f([a, b]) is bounded, M = sup{|f(x)| | x ∈ [a, b]} < ∞. If
x_1, x_2 ∈ [a, b] and x_1 < x_2, then we easily see by Theorem 7.1.19 that

    −M|x_2 − x_1| ≤ ∫_{x_1}^{x_2} f(t) dt ≤ M|x_2 − x_1|.

Since

    F(x_2) − F(x_1) = ∫_a^{x_2} f(t) dt − ∫_a^{x_1} f(t) dt = ∫_{x_1}^{x_2} f(t) dt

for all x_1 < x_2, it easily follows that F is continuous (i.e. as x_2 tends to x_1
(from any appropriate direction), M|x_2 − x_1| tends to zero so F(x_2) tends
to F(x_1) by the Squeeze Theorem).
As the function F is continuous, we may ask, “Is F differentiable?” The
First Fundamental Theorem of Calculus shows this is true and enables us
to compute the derivative. In fact, the following shows that if we integrate
a function f to obtain F, then the derivative of F is f; that is, derivatives
undo integration in a certain sense.
Theorem 7.2.2 (The Fundamental Theorem of Calculus, I). Let
f : [a, b] → R be continuous on [a, b] and let F : [a, b] → R be defined by

    F(x) = ∫_a^x f(t) dt

for all x ∈ [a, b]. Then F is differentiable on (a, b) and F′(x) = f(x) for all
x ∈ (a, b).
Proof. Fix x ∈ (a, b). To see that

    lim_{h→0} (F(x + h) − F(x))/h = f(x),

let ε > 0. Since f is continuous at x, there exists a δ > 0 such that if
|t − x| < δ then |f(t) − f(x)| < ε. Suppose 0 < |h| < δ. If 0 < h < δ, then

    |(F(x + h) − F(x))/h − f(x)|
        = |(1/h) ∫_x^{x+h} f(t) dt − f(x)|
        = |(1/h) ∫_x^{x+h} (f(t) − f(x)) dt|
        ≤ (1/h) ∫_x^{x+h} |f(t) − f(x)| dt
        ≤ (1/h) ∫_x^{x+h} ε dt          (as |t − x| < δ for all t in the integral)
        = (ε/h)((x + h) − x) = ε.

Similarly, if −δ < h < 0, then

    |(F(x + h) − F(x))/h − f(x)|
        = |−(1/h) ∫_{x+h}^x f(t) dt − f(x)|
        = |−(1/h) ∫_{x+h}^x (f(t) − f(x)) dt|
        ≤ −(1/h) ∫_{x+h}^x |f(t) − f(x)| dt
        ≤ −(1/h) ∫_{x+h}^x ε dt         (as |t − x| < δ for all t in the integral)
        = −(ε/h)(x − (x + h)) = ε.

Hence, for all h with 0 < |h| < δ,

    |(F(x + h) − F(x))/h − f(x)| ≤ ε.

Thus, as ε was arbitrary,

    lim_{h→0} (F(x + h) − F(x))/h = f(x).

Hence F′(x) exists and F′(x) = f(x).
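Numerically, FTC I says that difference quotients of the cumulative integral recover the integrand. The sketch below is our own illustration (the choice f = cos and all step sizes are ours); F is computed with midpoint Riemann sums:

```python
import math

# F(x) = ∫_0^x cos(t) dt computed via midpoint Riemann sums; its difference
# quotient at x should approach cos(x), as FTC I predicts (here F(x) = sin(x)).

def F(x, n=20000):
    h = x / n
    return sum(math.cos((k + 0.5) * h) * h for k in range(n))

x, h = 0.7, 1e-3
quotient = (F(x + h) - F(x)) / h
assert abs(quotient - math.cos(x)) < 1e-2   # difference quotient ≈ f(x)
assert abs(F(x) - math.sin(x)) < 1e-8       # and F itself ≈ sin(x)
```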
As the First Fundamental Theorem of Calculus shows that derivatives
undo integration, the Second Fundamental Theorem of Calculus shows that
integration undoes derivatives in a certain sense. In particular, the Second
Fundamental Theorem of Calculus shows us that if we know the antiderivative
of a function f , then we can compute the Riemann integral of f .
Theorem 7.2.3 (The Fundamental Theorem of Calculus, II). Let
f, g : [a, b] → R be such that f is Riemann integrable on [a, b], g is continuous
on [a, b], g is differentiable on (a, b), and g′(x) = f(x) for all x ∈ (a, b).
Then

    ∫_a^b f(t) dt = g(b) − g(a).
Proof of Theorem 7.2.3 when f is continuous. Define F : [a, b] → R by

    F(x) = ∫_a^x f(t) dt

for all x ∈ [a, b]. By the First Fundamental Theorem of Calculus (Theorem
7.2.2), F is differentiable on (a, b) with

    F′(x) = f(x) = g′(x)

for all x ∈ (a, b). Hence, by Corollary 6.4.4, there exists a constant α ∈ R
such that F(x) = g(x) + α for all x ∈ (a, b). Since F is continuous on [a, b]
by Lemma 7.2.1 and since g is continuous on [a, b] by assumption, we have
that F(x) = g(x) + α for all x ∈ [a, b]. Hence

    ∫_a^b f(t) dt = F(b) − 0 = F(b) − F(a) = (g(b) + α) − (g(a) + α) = g(b) − g(a).
Proof of Theorem 7.2.3, no additional assumptions. Notice g is continuous
on [a, b] by Theorem 6.1.7.
Let ε > 0. By Theorem 7.1.15 there exists a partition P of [a, b] such
that

    L(f, P) ≤ ∫_a^b f(t) dt ≤ U(f, P) ≤ L(f, P) + ε.

Write P = {t_k}_{k=0}^n where

    a = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = b.

Since g is continuous on [a, b], for each k ∈ {1, . . . , n} the Mean Value
Theorem (Theorem 6.4.2) implies there exists an x_k ∈ (t_{k−1}, t_k) such that

    (g(t_k) − g(t_{k−1}))/(t_k − t_{k−1}) = g′(x_k) = f(x_k),

that is, g(t_k) − g(t_{k−1}) = f(x_k)(t_k − t_{k−1}). Notice

    Σ_{k=1}^n f(x_k)(t_k − t_{k−1}) = Σ_{k=1}^n (g(t_k) − g(t_{k−1})) = g(t_n) − g(t_0) = g(b) − g(a).

Furthermore, since

    L(f, P) ≤ Σ_{k=1}^n f(x_k)(t_k − t_{k−1}) ≤ U(f, P) ≤ L(f, P) + ε,

we obtain that

    L(f, P) ≤ g(b) − g(a) ≤ L(f, P) + ε.

Since

    L(f, P) ≤ ∫_a^b f(t) dt ≤ U(f, P) ≤ L(f, P) + ε,

we obtain that

    |g(b) − g(a) − ∫_a^b f(t) dt| ≤ ε.

Therefore, as ε > 0 was arbitrary, the result follows.
Of course, we do not have the time in this course to pursue all of the
possible common methods of computing Riemann integrals. However, using
the Second Fundamental Theorem of Calculus, we easily can compute some
integrals:

    ∫_0^x t^n dt = x^{n+1}/(n + 1) − 0^{n+1}/(n + 1) = x^{n+1}/(n + 1)
    ∫_1^x (1/t) dt = ln(x) − ln(1) = ln(x)
    ∫_0^x e^t dt = e^x − e^0 = e^x − 1
    ∫_0^x sin(t) dt = −cos(x) − (−cos(0)) = −cos(x) + 1
    ∫_0^x cos(t) dt = sin(x) − sin(0) = sin(x)
    ∫_0^x sec²(t) dt = tan(x) − tan(0) = tan(x)
    ∫_0^x sec(t) tan(t) dt = sec(x) − sec(0) = sec(x) − 1
    ∫_0^x 1/√(1 − t²) dt = arcsin(x) − arcsin(0) = arcsin(x)
    ∫_0^x −1/√(1 − t²) dt = arccos(x) − arccos(0) = arccos(x) − π/2

and so on.
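A couple of these antiderivative formulas can be checked against midpoint Riemann sums (our own sketch; the tolerance and sample point x = 1.3 are arbitrary):

```python
import math

# Check two rows of the table numerically:
#   ∫_0^x e^t dt = e^x - 1   and   ∫_0^x sin(t) dt = 1 - cos(x).

def integrate(f, a, b, n=100000):
    # composite midpoint Riemann sum on the uniform partition
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * h for k in range(n))

x = 1.3
assert abs(integrate(math.exp, 0, x) - (math.exp(x) - 1)) < 1e-6
assert abs(integrate(math.sin, 0, x) - (1 - math.cos(x))) < 1e-6
```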
To complete our course, let us prove one of the most common methods
of integration (if you feel substitution is more common, note the method of
substitution easily follows from the Chain Rule and the Second Fundamental
Theorem of Calculus whereas the following requires most of the theory we
have developed in this course).
Corollary 7.2.4 (Integration by Parts). Suppose [a, b] ⊆ (c, d), that
f, g : (c, d) → R are continuous and differentiable on [a, b], and f′, g′ : [a, b] → R
are Riemann integrable. Then

    ∫_a^b f(x)g′(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f′(x)g(x) dx.

Proof. Let h : [a, b] → R be defined by

    h(x) = f′(x)g(x) + f(x)g′(x).

Since h is Riemann integrable on [a, b] by Theorems 7.1.17, 7.1.19, and 7.1.22,
and since fg is differentiable on (c, d) with (fg)′(x) = h(x) for all x ∈ [a, b] by
the Product Rule (Theorem 6.1.10), we obtain by the Second Fundamental
Theorem of Calculus (Theorem 7.2.3) and by Theorem 7.1.19 that

    f(b)g(b) − f(a)g(a) = ∫_a^b h(x) dx
                        = ∫_a^b (f′(x)g(x) + f(x)g′(x)) dx
                        = ∫_a^b f′(x)g(x) dx + ∫_a^b f(x)g′(x) dx.

Thus the result follows.
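As a sanity check (our own sketch, not from the notes), taking f(x) = x and g(x) = sin(x) on [0, π] in the integration by parts formula yields ∫_0^π x cos(x) dx = 0 − ∫_0^π sin(x) dx = −2:

```python
import math

# Integration by parts with f(x) = x, g(x) = sin(x) on [0, π]:
# ∫ f g' = f(b)g(b) - f(a)g(a) - ∫ f' g, i.e. ∫_0^π x cos(x) dx = -2.

def integrate(f, a, b, n=100000):
    # composite midpoint Riemann sum on the uniform partition
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * h for k in range(n))

a, b = 0.0, math.pi
lhs = integrate(lambda x: x * math.cos(x), a, b)
rhs = b * math.sin(b) - a * math.sin(a) - integrate(math.sin, a, b)
assert abs(lhs - rhs) < 1e-6
assert abs(lhs - (-2.0)) < 1e-6
```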
Index
absolute value, 10
anti-derivative, 102
Archimedean Property, 15
Axiom of Choice, 47
Bolzano-Weierstrass Theorem, 30
bounded above, general, 58
bounded below, 13
bounded, above, 13
bounded, subset of R, 13
Cantor–Schröder–Bernstein Theorem, 53
Carathéodory’s Theorem, 89
cardinality, 51
cardinality, less than or equal to, 52
Cartesian product, 45
Cartesian product, multiple, 47
Cauchy’s Mean Value Theorem, 104
chain, 58
chain rule, 90
common refinement, 119
Comparison Theorem, functions, 68
Comparison Theorem, sequences, 27
Completeness of R, 33
continuous function, 73
continuous, uniformly, 78
countable, 55
countably infinite, 55
Decreasing Function Theorem, 103
derivative, 84
differentiable, 83
equinumerous, 51
equivalence class, 35
equivalence relation, 35
Extreme Value Theorem, 97
field, 8
field, ordered, 10
field, subfield, 8
finite intersection property, 43
First Derivative Test, 103
function, 45
function, x → −∞, 71
function, x → ∞, 71
function, bijection, 49
function, co-domain, 49
function, composition, 51
function, continuous (closed interval), 76
function, continuous (open interval), 76
function, continuous (point), 73
function, decreasing, 91
function, diverge to infinity, 72
function, domain, 49
function, global maximum, 97
function, global minimum, 97
function, increasing, 91
function, injective, 49
function, left-sided limit, 69
function, limit, 64
function, local maximum, 97
function, local minimum, 97
function, monotone, 91
function, non-decreasing, 91
function, non-increasing, 91
function, one-to-one, 49
function, onto, 48
function, preimage, 49
function, range, 48
function, right-sided limit, 68
function, surjective, 48
function, two-sided limit, 68
Fundamental Theorem of Calculus, I, 136
Fundamental Theorem of Calculus, II, 137
Fundamental Trigonometric Limit, 70
greatest lower bound, 13
Greatest Lower Bound Property, 14
Heine–Borel Theorem, 40
Increasing Function Theorem, 102
infimum, 26
integration by parts, 139
Intermediate Value Theorem, 77
interval, 11
interval, closed, 11
interval, open, 11
Inverse Function Theorem, 94
jump discontinuity, 75
L’Hôpital’s Rule, 105
least upper bound, 13
Least Upper Bound Property, 14
limit infimum, 27
limit supremum, 27
limit, function, 64
limit, sequence, 18
lower bound, R, 13
maximal element, 58
Mean Value Theorem, 100
Monotone Convergence Theorem, 21
open cover, 38
partial ordering, 9
partially ordered set, 57
partition, 115
peak point, 30
Peak Point Lemma, 30
Peano’s Axioms, 3
poset, 57
Principle of Mathematical Induction, 4
Principle of Strong Induction, 6
product rule, 87
quotient rule, 89
rational function, 67
refinement, 117
removable discontinuity, 74
Riemann integrable, 120
Riemann sum, 123
Riemann sum, lower, 116
Riemann sum, upper, 116
Rolle’s Theorem, 100
second derivative, 111
sequence, 17
sequence, bounded, 20
sequence, Cauchy, 32
sequence, constant, 17
sequence, converging, 18
sequence, decreasing, 20
sequence, diverging, 18
sequence, diverging to infinity, 24
sequence, Fibonacci, 17
sequence, increasing, 20
sequence, limit, 18
sequence, monotone, 21
sequence, non-decreasing, 20
sequence, non-increasing, 20
sequence, recursively defined, 17
set, 1
set, closed, 36
set, closure, 42
set, compact, 39
set, complement, 2
set, difference, 2
set, element, 1
set, intersection, 2
set, open, 34
set, power set R, 9
set, sequentially compact, 41
set, subset, 2
set, union, 2
Squeeze Theorem, functions, 67
Squeeze Theorem, sequences, 25
subfield, 8
subsequence, 29
supremum, 26
Taylor polynomial, 111
Taylor’s Theorem, 112
total ordering, 9
uncountable, 55
uniform partition, 124
upper bound, R, 13
upper bound, arbitrary, 58
Well-Ordering Principle, 6
Zorn’s Lemma, 59