MATH 409 Advanced Calculus I
Paul Skoufranis
April 29, 2016

Preface: These are the first edition of these lecture notes for MATH 409. Consequently, there may be several typographical errors. Furthermore, these notes will not contain much additional material outside the topics covered in class. However, due to time constraints, some subsections may be skipped in class. We leave those subsections as part of these notes for the curious student, but students will not be responsible for those sections.

Contents

1 Axioms of Number Systems
  1.1 Set Notation
  1.2 The Natural Numbers
    1.2.1 Peano's Axioms
    1.2.2 The Principle of Mathematical Induction
    1.2.3 The Well-Ordering Principle
  1.3 The Real Numbers
    1.3.1 Fields
    1.3.2 Partially Ordered Sets
    1.3.3 The Triangle Inequality
    1.3.4 The Least Upper Bound Property
    1.3.5 Constructing the Real Numbers

2 Sequences of Real Numbers
  2.1 The Limit of a Sequence
    2.1.1 Definition of a Limit
    2.1.2 Uniqueness of the Limit
  2.2 The Monotone Convergence Theorem
  2.3 Limit Theorems
    2.3.1 Limit Arithmetic
    2.3.2 Diverging to Infinity
    2.3.3 The Squeeze Theorem
    2.3.4 Limit Supremum and Limit Infimum
  2.4 The Bolzano–Weierstrass Theorem
    2.4.1 Subsequences
    2.4.2 The Peak Point Lemma
    2.4.3 The Bolzano–Weierstrass Theorem

3 An Introduction to Topology
  3.1 Completeness of the Real Numbers
    3.1.1 Cauchy Sequences
    3.1.2 Convergence of Cauchy Sequences
  3.2 Topology of the Real Numbers
    3.2.1 Open Sets
    3.2.2 Closed Sets
  3.3 Compactness
    3.3.1 Definition of Compactness
    3.3.2 The Heine–Borel Theorem
    3.3.3 Sequential Compactness
    3.3.4 The Finite Intersection Property

4 Cardinality of Sets
  4.1 Functions
    4.1.1 The Axiom of Choice
    4.1.2 Bijections
  4.2 Cardinality
    4.2.1 Definition of Cardinality
    4.2.2 The Cantor–Schröder–Bernstein Theorem
    4.2.3 Countable and Uncountable Sets
    4.2.4 Zorn's Lemma
    4.2.5 Comparability of Cardinals

5 Continuity
  5.1 Limits of Functions
    5.1.1 Definition of a Limit
    5.1.2 Limit Theorems for Functions
    5.1.3 One-Sided Limits
    5.1.4 Limits at and to Infinity
  5.2 Continuity of Functions
  5.3 The Intermediate Value Theorem
  5.4 Uniform Continuity

6 Differentiation
  6.1 The Derivative
    6.1.1 Definition of a Derivative
    6.1.2 Rules of Differentiation
  6.2 Inverse Functions
    6.2.1 Monotone Functions
    6.2.2 Inverse Function Theorem
  6.3 Extreme Values of Functions
  6.4 The Mean Value Theorem
    6.4.1 Proof of the Mean Value Theorem
    6.4.2 Anti-Derivatives
    6.4.3 Monotone Functions and Derivatives
    6.4.4 L'Hôpital's Rule
    6.4.5 Taylor's Theorem

7 Integration
  7.1 The Riemann Integral
    7.1.1 Riemann Sums
    7.1.2 Definition of the Riemann Integral
    7.1.3 Some Integrable Functions
    7.1.4 Properties of the Riemann Integral
  7.2 The Fundamental Theorems of Calculus

Chapter 1: Axioms of Number Systems

To discuss advanced calculus, we must return to many of the basic structures that are taken for granted in previous courses. In particular, what exactly are the natural numbers and the real numbers, and what properties do these number systems have that we may use?

1.1 Set Notation

All mathematics must contain some notation in order for one to adequately describe the objects of study.
As such, we begin by developing the notation for one of the 'simplest' constructs in mathematics.

Heuristic Definition. A set is a collection of distinct objects.

Our first task is to develop notation to adequately describe sets and symbols to represent sets that will be common in this course. The following table lists several sets, the symbol used to represent each set, and a set notational way to describe the set.

Set                Symbol   Set Notation
natural numbers    N        {1, 2, 3, 4, . . .}
integers           Z        {0, 1, −1, 2, −2, 3, −3, . . .}
real numbers       R        {real numbers}
rational numbers   Q        {a/b | a, b ∈ Z, b ≠ 0}

Notice two different types of notation are used in the above table to describe sets: namely {objects} and {objects | conditions on the objects}. Furthermore, the symbol ∅ will denote the empty set; that is, the set with no elements.

Given a set X and an object x, we need notation to describe when x belongs to X. In particular, we say that x is an element of X, denoted x ∈ X, when x is one of the objects that make up X. Furthermore, we will use x ∉ X when x is not an element of X. For example, √2 ∈ R yet √2 ∉ Q, and 0 ∈ Z but 0 ∉ N. Furthermore, given two sets X and Y, we say that Y is a subset of X, denoted Y ⊆ X, if each element of Y is an element of X; that is, if a ∈ Y then a ∈ X. For example, N ⊆ Z ⊆ Q ⊆ R. Furthermore, note if X ⊆ Y and Y ⊆ X, then X = Y.

Given two sets X and Y, there are various operations one can perform on these two sets. Three such operations are as follows:

• The union of X and Y, denoted X ∪ Y, is the set X ∪ Y = {a | a ∈ X or a ∈ Y}; that is, the union of X and Y consists of joining the two sets into one.

• The intersection of X and Y, denoted X ∩ Y, is the set X ∩ Y = {a | a ∈ X and a ∈ Y}; that is, the intersection of X and Y is the set of elements contained in both X and Y.
• The set difference of X and Y, denoted X \ Y, is the set X \ Y = {a | a ∈ X and a ∉ Y}; that is, the set of all elements of X that are not elements of Y.

For example, if X = {1, 2, 3} and Y = {2, 4, 6}, then X ∪ Y = {1, 2, 3, 4, 6}, X ∩ Y = {2}, and X \ Y = {1, 3}.

In this course, we will often have a set X (usually R) and will be considering subsets of X. Consequently, given a subset Y of X, the set difference X \ Y will be called the complement of Y (in X) and will be denoted Y^c for convenience.

Sets will play an important role in this course. However, one important question that has not been addressed is, "What exactly is a set?" This question must be asked as we have not provided a rigorous definition of a set. This leads to some interesting questions, such as, "Does the collection of all sets form a set?"

Let us suppose that there is a set of all sets; that is,

    Z = {X | X is a set}

makes sense. Note Z has the interesting property that Z ∈ Z. Furthermore, if Z exists, then

    Y = {X | X is a set and X ∉ X}

would be a valid subset of Z. However, we clearly have two disjoint cases: either Y ∈ Y or Y ∉ Y (that is, either Y is an element of Y or Y is not an element of Y). If Y ∈ Y, then the definition of Y implies Y ∉ Y, which is a contradiction since we cannot have both Y ∈ Y and Y ∉ Y. Thus, if Y ∈ Y is false, then it must be the case that Y ∉ Y. However, Y ∉ Y implies by the definition of Y that Y ∈ Y. Again this is a contradiction since we cannot have both Y ∉ Y and Y ∈ Y. This argument is known as Russell's Paradox and demonstrates that there cannot be a set of all sets.

The above paradox illustrates the necessity of a rigorous definition of a set. However, said definition takes us beyond the study of this class.
Instead we will focus on two unforeseen questions: "What are the natural numbers?" and "How do we define the natural numbers?"

1.2 The Natural Numbers

As seen through Russell's Paradox, rigorous definitions are required to prevent misconceptions about the objects we desire to study. As such, we need to discuss what exactly the natural numbers are.

1.2.1 Peano's Axioms

The following, known as Peano's Axioms, completely characterize the natural numbers.

Definition 1.2.1. The natural numbers, denoted N, are the unique number system satisfying the following five axioms:

1. There is a number, denoted 1, such that 1 ∈ N.
2. For each number n ∈ N, there is a number S(n) ∈ N called the successor of n (i.e. S(n) = n + 1).
3. The number 1 is not the successor of any number in N.
4. If m, n ∈ N and S(n) = S(m), then n = m.
5. (Induction Axiom) If X ⊆ N has the properties
   (a) 1 ∈ X, and
   (b) if k ∈ N and k ∈ X, then S(k) ∈ X,
   then X = N.

Each of the above five axioms is necessary. The following examples demonstrate the necessity of the third, fourth, and fifth axioms.

Example 1.2.2. Consider the set X = {1, 2} where we define S(1) = 2 and S(2) = 1. One may verify that X satisfies all but the third of Peano's Axioms.

Example 1.2.3. Consider the set X = {1, 2} where we define S(1) = 2 and S(2) = 2. One may verify that X satisfies all but the fourth of Peano's Axioms.

Example 1.2.4. Consider the set N² = {(n, m) | n, m ∈ N} where we define 1 = (1, 1) and S(n, m) = (n + 1, m + 1). One may verify that N² satisfies all but the fifth of Peano's Axioms since X = {(n, n) | n ∈ N} has properties (a) and (b) but is not all of N².

The axioms of the natural numbers provide some nice properties. The next subsection will focus on applications of the fifth axiom. For now, we note that the other axioms give us a nice 'ordering' on N, which is consistent with the ordering one expects.
In particular, for n, m ∈ N, we define n < m if m can be obtained by taking (possibly multiple) successors of n. Furthermore, we define n ≤ m if n < m or n = m. The notion of ordering will play an essential role in the construction of the real numbers (see Subsection 1.3.2).

1.2.2 The Principle of Mathematical Induction

The Induction Axiom of the natural numbers leads to the following principle.

Theorem 1.2.5 (The Principle of Mathematical Induction). For each k ∈ N, let Pk be a mathematical statement that is either true or false. Suppose

1. (base case) P1 is true, and
2. (inductive step) if k ∈ N and Pk is true, then Pk+1 is true.

Then Pn is true for all n ∈ N.

Proof. Let X = {n ∈ N | Pn is true}. By assumption we see that 1 ∈ X as P1 is true. Assume that k ∈ X. By the definition of X, we know Pk is true. By the assumptions in the statement of the theorem, Pk+1 is true and hence k + 1 ∈ X by the definition of X. Hence the Induction Axiom in Definition 1.2.1 implies X = N. Hence Pn is true for all n.

The Principle of Mathematical Induction is an essential method for proving mathematical statements. The following is a specific example.

Example 1.2.6. For each n ∈ N, we claim that

    ∑_{m=1}^{n} m = 1 + 2 + 3 + · · · + n = n(n + 1)/2.

To see this result is true, for each n ∈ N let Pn be the statement that ∑_{m=1}^{n} m = n(n + 1)/2. To show that Pn is true for all n ∈ N, we will apply the Principle of Mathematical Induction. To do so, we must demonstrate the two conditions in Theorem 1.2.5.

Base Case: To see that P1 is true, notice that when n = 1,

    n(n + 1)/2 = 1(1 + 1)/2 = 1 = ∑_{m=1}^{1} m.

Hence P1 is true.

Inductive Step: Suppose that Pk is true; that is, suppose ∑_{m=1}^{k} m = k(k + 1)/2 (this assumption is known as the induction hypothesis). To see that Pk+1 is true, notice

    ∑_{m=1}^{k+1} m = (k + 1) + ∑_{m=1}^{k} m
                    = (k + 1) + k(k + 1)/2        by the induction hypothesis
                    = (2(k + 1) + (k² + k))/2
                    = (k² + 3k + 2)/2
                    = (k + 1)(k + 2)/2
                    = (k + 1)((k + 1) + 1)/2.

Hence Pk+1 is true.

Therefore, as we have demonstrated the base case and the inductive step, the result follows by the Principle of Mathematical Induction.

Some people (mainly computer scientists) argue that the Induction Axiom must be false as it would take an infinite amount of time for a computer to verify Pn is true for all n by using the fact P1 is true and Pk is true implies Pk+1 is true. We will not adopt this notion.

In fact, often one wants to assume more than just Pk is true in order to show that Pk+1 is true.

Theorem 1.2.7 (Strong Induction). Suppose X ⊆ N. If

1. 1 ∈ X, and
2. if k ∈ N and {1, 2, . . . , k} ⊆ X, then k + 1 ∈ X,

then X = N.

Proof. For each n ∈ N, let Pn be the statement that {1, . . . , n} ⊆ X. We claim that Pn is true for all n ∈ N. To show this, we will apply the Principle of Mathematical Induction.
Base Case: As 1 ∈ X by assumption, clearly P1 is true.
Inductive Step: Suppose that Pk is true; that is, {1, 2, . . . , k} ⊆ X. By assumption on X, k + 1 ∈ X. Hence {1, . . . , k, k + 1} ⊆ X so Pk+1 is true.
Hence, by the Principle of Mathematical Induction, {1, . . . , n} ⊆ X for all n ∈ N. In particular, n ∈ X for all n ∈ N. Hence X = N.

Theorem 1.2.8 (The Principle of Strong Mathematical Induction). For each k ∈ N, let Pk be a mathematical statement that is either true or false. Suppose

1. P1 is true, and
2. if k ∈ N and Pm is true for all m ≤ k, then Pk+1 is true.

Then Pn is true for all n ∈ N.

Proof. The proof of this result is nearly identical to that of Theorem 1.2.5. Let X = {n ∈ N | Pn is true}. By assumption we see that 1 ∈ X as P1 is true. Assume that {1, . . . , k} ⊆ X. By the definition of X, we know Pm is true for all m ≤ k. By the assumptions in the statement of the theorem, Pk+1 is true and hence k + 1 ∈ X by the definition of X. Hence Strong Induction implies X = N. Hence Pn is true for all n ∈ N.
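Induction proves the formula of Example 1.2.6 for every n at once, whereas a computer can only ever check finitely many cases. Still, a finite check is a useful sanity test. The following short script (an illustration of ours, not part of the notes' formal development) compares the closed form n(n + 1)/2 against the brute-force sum for the first thousand values of n.

```python
def direct_sum(n):
    """Compute 1 + 2 + ... + n by adding the terms directly."""
    return sum(range(1, n + 1))


def closed_form(n):
    """The formula proved by induction in Example 1.2.6."""
    return n * (n + 1) // 2


# Check the two agree for n = 1, ..., 1000 (a finite check, not a proof).
for n in range(1, 1001):
    assert direct_sum(n) == closed_form(n)
print("formula verified for n = 1, ..., 1000")
```

Of course, no finite run of this loop replaces the inductive argument; the point of Theorem 1.2.5 is precisely that the two conditions certify all infinitely many cases simultaneously.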
1.2.3 The Well-Ordering Principle

There is one additional form of the Principle of Mathematical Induction that is quite useful.

Theorem 1.2.9 (The Well-Ordering Principle). Every non-empty subset of N has a least element; that is, if Y ⊆ N and Y ≠ ∅, then there is an element m ∈ Y such that m ≤ k for all k ∈ Y.

Proof. Suppose Y is a non-empty subset of N that does not have a least element. Let X = N \ Y = {n ∈ N | n ∉ Y}. We will apply Strong Induction to show that X = N. This will complete the proof since X = N implies Y = ∅, which contradicts the fact that Y is non-empty. To apply Strong Induction, we must demonstrate the two necessary assumptions in Theorem 1.2.7.
Base Case: Since Y does not have a least element, we know that 1 ∉ Y, or else 1 would be the least element of Y. Hence 1 ∈ X.
Inductive Step: Suppose k ∈ N and {1, . . . , k} ⊆ X. Then each element of {1, . . . , k} is not in Y. Hence k + 1 ∉ Y, for otherwise k + 1 would be the least element of Y since none of 1, . . . , k are in Y. Hence k + 1 ∈ X as k + 1 ∉ Y.
Hence, by Strong Induction, X = N, thereby completing the proof by earlier discussions.

In the above, we assumed the Induction Axiom as one of Peano's Axioms, deduced Strong Induction, and used Strong Induction to deduce the Well-Ordering Principle. In fact, the Induction Axiom and the Well-Ordering Principle are logically equivalent; that is, if one replaces the Induction Axiom with the Well-Ordering Principle in Definition 1.2.1, one may deduce the Induction Axiom (see the homework).

1.3 The Real Numbers

With a rigorous construction of the natural numbers now complete, we turn our attention to the real numbers. In particular, how does one construct the real numbers, what properties do the real numbers have, and are there any number systems with the same properties as the real numbers?
1.3.1 Fields

To begin our discussion of the real numbers, we note there are some common operations we may apply to the real numbers: namely addition, subtraction, multiplication, and division. These operations have specific properties that we shall explore.

We begin with addition and multiplication. Recall that addition and multiplication are operations on pairs of real numbers; that is, for every x, y ∈ R there are numbers, denoted x + y and x · y, which are elements of R. Furthermore, there are two properties we require for addition and multiplication to behave well, and one property that says addition and multiplication play together nicely:

(F1) (Commutativity) x + y = y + x and x · y = y · x for all x, y ∈ R.

(F2) (Associativity) (x + y) + z = x + (y + z) and (x · y) · z = x · (y · z) for all x, y, z ∈ R.

(F3) (Distributivity) x · (y + z) = (x · y) + (x · z) for all x, y, z ∈ R.

To introduce the operations of subtraction and division, we must understand what these operations are and how they may be derived from addition and multiplication. For example, what does subtracting 3 from 4 mean in terms of addition? Well, it really means add the number −3 to 4. And how are 3 and −3 related? Well, −3 is the unique number x such that 3 + x = 0. And what is 0 in terms of addition? Well, 0 is the unique number y that when you add y to any number z, you end up with z.

Similarly, what does dividing by 7 mean in terms of multiplication? Well, it really means multiply by 1/7. And how are 7 and 1/7 related? Well, 1/7 is the unique number x such that 7x = 1. And what is 1 in terms of multiplication? Well, 1 is the unique number y that when you multiply y to any number z, you end up with z.

Using the above, we add the following properties to our list of properties defining R:

(F4) (Existence of Identities) There are numbers 0, 1 ∈ R with 0 ≠ 1 such that 0 + x = x and 1 · x = x for all x ∈ R.
(F5) (Existence of Inverses) For all x, y ∈ R with y ≠ 0, there exist −x, y⁻¹ ∈ R such that x + (−x) = 0 and y · y⁻¹ = 1.

Using these two properties, one then defines subtraction and division via x − y = x + (−y) and x ÷ z = x · z⁻¹ for all x, y, z ∈ R with z ≠ 0. Furthermore, it is possible to show that all of the numbers listed in (F4) and (F5) are unique (that is, any number with the same properties as one of 0, 1, −x, or y⁻¹ must be the corresponding number).

Although the real numbers have the above five properties, they are not the only number system that has all five properties. For example, clearly the rational numbers Q (which are not equal to the real numbers by the homework) also satisfy all five properties when we replace R with Q. Consequently, we make the following definition.

Definition 1.3.1. A field is a set F together with two operations + and · such that a + b ∈ F and a · b ∈ F for all a, b ∈ F, and + and · satisfy (F1), (F2), (F3), (F4), and (F5) as written above (replacing R with F).

Notice if one is given a field F and a subset E of F that has the property that a + b ∈ E and a · b ∈ E for all a, b ∈ E, then E is a field with the operations + and · provided 0, 1 ∈ E and −x, z⁻¹ ∈ E for all x, z ∈ E with z ≠ 0. In this case, we call E a subfield of F. For example,

    Q[√2] := {x + y√2 | x, y ∈ Q}

is a subfield of R. However, there are fields that look strikingly different from R.

Example 1.3.2. Consider Z2 = {0, 1} with the following rules for addition and multiplication:

    + | 0 1        · | 0 1
    0 | 0 1        0 | 0 0
    1 | 1 0        1 | 0 1

(think of 0 as all even numbers and 1 as all odd numbers; an odd plus an odd is even, an odd times an even is even, etc.). One can verify that Z2 is a field with the above operations.

All of the above properties listed are algebraic properties. Are there other properties of R we can include to distinguish R from other fields?
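The verification suggested in Example 1.3.2 is a finite computation, so it can be carried out mechanically. The script below (an illustration of ours, assuming only that the tables above are addition and multiplication mod 2) checks (F1) through (F3) over every choice of elements, together with the identity and inverse requirements of (F4) and (F5).

```python
# Model Z2 = {0, 1} from Example 1.3.2 with arithmetic mod 2.
Z2 = [0, 1]


def add(a, b):
    """Addition in Z2: matches the + table (1 + 1 = 0)."""
    return (a + b) % 2


def mul(a, b):
    """Multiplication in Z2: matches the · table."""
    return (a * b) % 2


# Exhaustively check the field axioms over the (finitely many) elements.
for a in Z2:
    assert add(0, a) == a and mul(1, a) == a                # (F4) identities
    for b in Z2:
        assert add(a, b) == add(b, a)                        # (F1) commutativity of +
        assert mul(a, b) == mul(b, a)                        # (F1) commutativity of ·
        for c in Z2:
            assert add(add(a, b), c) == add(a, add(b, c))    # (F2) associativity of +
            assert mul(mul(a, b), c) == mul(a, mul(b, c))    # (F2) associativity of ·
            assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))  # (F3) distributivity

assert add(1, 1) == 0   # (F5): 1 is its own additive inverse
assert mul(1, 1) == 1   # (F5): 1 is its own multiplicative inverse
print("Z2 satisfies the field axioms")
```

Exhaustive checking works here only because Z2 is finite; for R or Q the axioms must be proved, not enumerated.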
1.3.2 Partially Ordered Sets

One notion that exists for the real numbers that does not exist for other fields is the notion of an ordering; that is, given two numbers, we have a notion which tells us which number is bigger. We begin with the following concept.

Definition 1.3.3. Let X be a set. A relation ⪯ on the elements of X is called a partial ordering if:

1. (reflexivity) a ⪯ a for all a ∈ X,
2. (antisymmetry) if a ⪯ b and b ⪯ a, then a = b for all a, b ∈ X, and
3. (transitivity) if a, b, c ∈ X are such that a ⪯ b and b ⪯ c, then a ⪯ c.

Clearly ≤ (as usually defined) is a partial ordering on R. Here is another example:

Example 1.3.4. Let P(R) := {X | X ⊆ R}. The set P(R) is known as the power set of R and consists of all subsets of R. We define a relation ⪯ on P(R) as follows: given X, Y ∈ P(R),

    X ⪯ Y if and only if X ⊆ Y.

It is not difficult to verify that ⪯ is a partial ordering on P(R).

The partial ordering in the previous example is not as nice as our ordering on R. To see this, consider the sets X = {1} and Y = {2}. Then neither X ⪯ Y nor Y ⪯ X holds; that is, we cannot use the partial ordering to compare X and Y. However, if x, y ∈ R, then either x ≤ y or y ≤ x. Consequently, we desire to add in this additional property to our ordering:

Definition 1.3.5. Let X be a set. A partial ordering ⪯ on X is called a total ordering if for all x, y ∈ X, either x ⪯ y or y ⪯ x (or both).

The ordering one usually considers on R is clearly a total ordering. However, it is also easy to place a total ordering on Z2.

Example 1.3.6. Let Z2 be as in Example 1.3.2. Define 0 ⪯ 0, 0 ⪯ 1, and 1 ⪯ 1 (but not 1 ⪯ 0). It is easy to verify that ⪯ is a total ordering on Z2.

The problem with the ordering on Z2 is that addition and multiplication do not interact well with respect to the ordering. The following describes fields with 'nice' orderings:

Definition 1.3.7.
An ordered field is a field F together with a total ordering ⪯ such that for all x, y, z ∈ F with x ⪯ y, the following two properties hold:

• (Additive Property) x + z ⪯ y + z.
• (Multiplicative Property) x · z ⪯ y · z provided 0 ⪯ z, and y · z ⪯ x · z provided z ⪯ 0.

In any ordered field, it must be the case that 0 ⪯ 1. Indeed, if 1 ⪯ 0, then the Multiplicative Property implies 0 · 1 ⪯ 1 · 1, so 0 ⪯ 1 and 1 ⪯ 0, and therefore antisymmetry implies 0 = 1, which contradicts (F4).

Note the ordering on Z2 given in Example 1.3.6 does not make Z2 into an ordered field since 0 ⪯ 1 yet 0 + 1 ⪯ 1 + 1 fails, as 0 + 1 = 1, 1 + 1 = 0, and 1 ⪯ 0 does not hold (so this total ordering does not satisfy the Additive Property).

It is clear that R is an ordered field. However, it is then clear that any subfield of R (such as Q and Q[√2]) is also an ordered field. Consequently, we still need a way to distinguish R from its subfields.

1.3.3 The Triangle Inequality

Before discussing how R differs from its subfields, we will analyze a useful concept the ordering on R provides.

Definition 1.3.8. Given x ∈ R, the absolute value of x is

    |x| = { x    if x ≥ 0,
          { −x   if x < 0.

The absolute value has many important properties. For example, clearly |−x| = |x| for all x ∈ R (split the proof into two cases: x ≥ 0 and x < 0). Furthermore, since x = ±|x| for all x ∈ R, it is not difficult to check that |xy| = |x||y| for all x, y ∈ R (split the proof into four cases: the two cases x ≥ 0 and x < 0, each of which has the two cases y ≥ 0 and y < 0). However, the absolute value is not important just for its properties, but for what it represents.

Notice that |x| represents the distance from x to 0. Consequently, we can also see that |b − a| represents the distance from b to a for all a, b ∈ R. Furthermore, for all a, δ ∈ R with δ > 0, the set

    {x ∈ R | |x − a| < δ}

describes all points in R whose distance to a is strictly less than δ.
Notice |x − a| < δ if and only if −δ < x − a < δ if and only if a − δ < x < a + δ, which provides an alternate description of the above set without using absolute values. Such sets are quite important in this course so we make the following notation.

Notation 1.3.9. For all a, b ∈ R with a ≤ b, we define

    (a, b) := {x ∈ R | a < x < b}
    [a, b) := {x ∈ R | a ≤ x < b}
    (a, b] := {x ∈ R | a < x ≤ b}
    [a, b] := {x ∈ R | a ≤ x ≤ b}.

For the first two, we permit ∞ to replace b, and, for the first and third, we permit −∞ to replace a. Each of the above sets is called an interval, with (a, b) called an open interval and [a, b] called a closed interval.

In order to have a well-defined notion of distance in mathematics, several properties need to be satisfied. Notice that if a, b ∈ R, then the distance from b to a is zero exactly when |b − a| = 0, which is the same as saying b = a. Furthermore, since |b − a| = |−(b − a)| = |a − b|, the distance from b to a is the same as the distance from a to b. Finally, the last property required to have a well-defined notion of distance is as follows:

Theorem 1.3.10 (The Triangle Inequality). Let x, y, z ∈ R. Then

    |x − y| ≤ |x − z| + |z − y|.

That is, the distance from x to y is no more than the sum of the distance from x to z and the distance from z to y.

[figure: x, z, and y marked on a number line]

Proof. If x = y, the result is trivial to verify. Consequently we will assume x < y (if y < x, we can relabel y with x and x with y to run the following proof). We have three cases to consider.

[figure: the three cases on a number line — Case 1: z < x < y; Case 2: x < y < z; Case 3: x ≤ z ≤ y]

Case 1. z < x: In this case, notice

    |x − y| ≤ |z − y| = 0 + |z − y| ≤ |x − z| + |z − y|

as desired.

Case 2. y < z: In this case, notice

    |x − y| ≤ |x − z| = |x − z| + 0 ≤ |x − z| + |z − y|

as desired.

Case 3. x ≤ z ≤ y: In this case, we easily see that |x − y| = |x − z| + |z − y|.

Hence, as we have exhausted all cases (up to flipping x and y), the proof is complete.
The Triangle Inequality is an incredibly useful tool in analysis. Furthermore, there are many other forms of the Triangle Inequality. For example, letting x = a, y = −b, and z = 0 produces

    |a + b| ≤ |a| + |b|

for all a, b ∈ R. In addition, if we let x = a, y = 0, and z = b, we obtain

    |a| ≤ |a − b| + |b|   so   |a| − |b| ≤ |a − b|,

and if we let x = b, y = 0, and z = a, we obtain

    |b| ≤ |a − b| + |a|   so   −(|a| − |b|) ≤ |a − b|.

Consequently, we obtain that

    ||a| − |b|| ≤ |a − b|

for all a, b ∈ R.

1.3.4 The Least Upper Bound Property

We have seen that R (along with its subfields) are ordered fields. We now begin the discussion of how to use this ordering to construct the final property needed to distinguish R from all other fields!

Definition 1.3.11. Let X ⊆ R. An element α ∈ R is said to be an upper bound for X if x ≤ α for all x ∈ X. An element α ∈ R is said to be a lower bound for X if α ≤ x for all x ∈ X. Finally, X is said to be bounded above if X has an upper bound, bounded below if X has a lower bound, and bounded if X has both an upper and lower bound.

Example 1.3.12. Let X = (0, 1). Then 1 is an upper bound of X and 0 is a lower bound of X. Thus X is bounded. Furthermore, note that 5 is also an upper bound of X and −7 is a lower bound of X.

Example 1.3.13. Let X = ∅. Then every number in R is both an upper and lower bound of X vacuously (that is, there are no elements of X against which to check the defining property).

Notice that N is bounded below as 1 is a lower bound (as is −2, 0, 0.5, etc.). Does N have an upper bound? Our intuition says no, so that N is not bounded above. However, how do we prove this?

To tackle the above problem (in addition to describing the property required to distinguish R from other ordered fields), one probably has noticed in the above examples there were special upper/lower bounds that were 'optimal'.

Definition 1.3.14. Let X ⊆ R.
An element α ∈ R is said to be the least upper bound of X if

• α is an upper bound of X, and
• if β is an upper bound of X, then α ≤ β.

We write lub(X) in place of α, provided α exists. Similarly, an element α ∈ R is said to be the greatest lower bound of X if

• α is a lower bound of X, and
• if β is a lower bound of X, then β ≤ α.

We write glb(X) in place of α, provided α exists.

In the above definition, notice we have used the term 'the least upper bound' instead of 'a least upper bound'. This is because it is elementary to show that a set with a least upper bound has exactly one least upper bound. Indeed, if α1 and α2 are both least upper bounds of a set X, then α1 ≤ α2 and α2 ≤ α1 by the two defining properties of a least upper bound, so α1 = α2.

Example 1.3.15. Let X = [0, 1] and let Y = (0, 1). Then lub(X) = lub(Y) = 1 and glb(X) = glb(Y) = 0. However, notice 0, 1 ∈ X whereas 0, 1 ∉ Y. This demonstrates that the least upper bound and greatest lower bound may or may not be in the set.

Example 1.3.16. Clearly a set that is not bounded above cannot have a least upper bound and a set that is not bounded below cannot have a greatest lower bound. Furthermore, although ∅ is bounded, every number in R is an upper bound of ∅ by Example 1.3.13, so there is no least upper bound among them; by the same reasoning applied to lower bounds, ∅ has no least upper bound nor greatest lower bound.

Example 1.3.17. Let X = {x ∈ Q | x ≥ 0 and x² < 2}. Clearly glb(X) = 0 and lub(X) = √2.

The above example emphasizes the difference between Q and R. Notice that X ⊆ Q. However, if we only consider numbers in Q, then X does not have a least upper bound in Q, as if b ∈ Q and √2 < b, there is always an r ∈ Q such that √2 < r < b (see homework). The following property guarantees that R does not have such pitfalls.

Theorem 1.3.18 (The Least Upper Bound Property). Every non-empty subset of R that is bounded above has a least upper bound.

Note the term 'non-empty' must be included because of Example 1.3.16.
Furthermore, this completes our discussion of how to distinguish R from other number systems, since it is possible to show that any ordered field with the Least Upper Bound Property is R! We will not demonstrate this fact as it detours us from the goals of this course.

The Least Upper Bound Property is an amazing property that makes all of analysis on R possible. In fact, we note the following corollaries of the Least Upper Bound Property.

Corollary 1.3.19 (The Greatest Lower Bound Property). Every non-empty subset of R that is bounded below has a greatest lower bound.

Proof Sketch. Let X be a non-empty subset of R that is bounded below. Let Y = {−x | x ∈ X}. One can verify that if a ∈ R, then a is an upper bound for Y if and only if −a is a lower bound for X. Consequently Y is bounded above (as X is bounded below) and thus Y has a least upper bound by the Least Upper Bound Property. Furthermore, it is not difficult to check that −lub(Y) is then the greatest lower bound of X.

Corollary 1.3.20 (The Archimedean Property). The natural numbers are not bounded above in R.

Proof. Suppose N were bounded above in R. Then N must have a least upper bound, say α, by the Least Upper Bound Property. Since α is an upper bound of N, we know that n ≤ α for all n ∈ N. Hence n + 1 ≤ α, and thus n ≤ α − 1, for all n ∈ N. Thus α − 1 is an upper bound for N, which contradicts the fact that α is the least upper bound of N as α − 1 < α.

1.3.5 Constructing the Real Numbers

In the previous section, we claimed that the real numbers are the unique ordered field with the Least Upper Bound Property. However, how do we know the real numbers exist at all? There are two main constructions of the reals. The first uses equivalence classes of Cauchy sequences (see Chapter 3) of rational numbers. The other is more complicated and is quickly sketched below. The interested reader may consult https://en.wikipedia.org/wiki/Construction_of_the_real_numbers.
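Before the construction sketch, it may help to see the completeness gap of Example 1.3.17 numerically. The helper below is our own illustration (the function name and bisection approach are assumptions, not from the notes): it repeatedly halves an interval, always keeping the right endpoint an upper bound of the set, so the endpoints squeeze down onto the least upper bound. This works for the set X = {x ∈ Q | x ≥ 0 and x² < 2} because X contains every non-negative number below √2, so any point of [0, 2] is either in X or an upper bound of X.

```python
def approx_lub(is_in_set, lo, hi, steps=50):
    """Approximate lub of a set via bisection.

    Invariant: hi is always an upper bound of the set and lo is not,
    so the least upper bound stays trapped inside [lo, hi].
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if is_in_set(mid):
            lo = mid   # mid is in the set, so the lub is at least mid
        else:
            hi = mid   # mid is an upper bound, so the lub is at most mid
    return hi


# The set X of Example 1.3.17: non-negative numbers whose square is below 2.
lub_estimate = approx_lub(lambda x: x >= 0 and x * x < 2, 0.0, 2.0)
print(lub_estimate)  # approaches sqrt(2), which is not rational
```

Every intermediate upper bound produced here is a (finite binary) rational, yet the value they converge to is irrational: exactly the phenomenon that Q lacks the Least Upper Bound Property and R repairs.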
In Section 1.2 we rigorously constructed the natural numbers. From N we can construct the integers Z by adding a symbol −n for each n ∈ N. One then must define + and · using the notion of successors in Definition 1.2.1 and verify all of the desired properties. One must also extend the notion of < from N to Z in the obvious way.

From Z we can then construct Q by defining Q to be the set with elements of the form a/b where a, b ∈ Z with b ≠ 0, where we define a/b = c/d whenever ad = bc. Care must be taken in subsequent definitions as there are multiple ways to write a rational number. One then defines + and · as one does with fractions, and then verifies that Q is a field. To extend < to Q, if a, b, c, d are all positive, we define a/b < c/d whenever ad < bc, and similar definitions are provided in the other cases. One then verifies that Q is an ordered field.

The real numbers may then be defined to be the set

R = { X ⊆ Q | X is bounded above, X contains no greatest element, and if x ∈ X then y ∈ X for all y < x }.

It remains to show that R is an ordered field with the Least Upper Bound Property. This requires defining +, ·, and ≤, and verifying all of the above properties, which can be quite time consuming. As an example, one defines addition via X + Y := {x + y | x ∈ X, y ∈ Y} and then must check (F1), (F2), and that the zero element of R is {q ∈ Q | q < 0}. Furthermore, one obtains the least upper bound of a collection of elements of R, which are being viewed as subsets of Q, by taking the union of the subsets.

Chapter 2

Sequences of Real Numbers

One of the most important concepts in calculus is the notion of converging sequences. Knowing that a sequence converges to a number allows one to use elements of the sequence to better and better approximate the number. However, having a precise definition of a limit allows one to better understand what limits really are.
2.1 The Limit of a Sequence

2.1.1 Definition of a Limit

Before discussing limits, we must ask, "What is a sequence?"

Definition 2.1.1. A sequence of real numbers is an ordered list of real numbers indexed by the natural numbers. If we have ak ∈ R for all k ∈ N, we will use (an)n≥1 or (a1, a2, a3, . . .) to denote a sequence whose first element is a1, whose second element is a2, etc.

Example 2.1.2. If c ∈ R and an = c for all n ∈ N, then the sequence (an)n≥1 is the constant sequence with value c.

Example 2.1.3. For all n ∈ N, let an = 1/n. Then (an)n≥1 is the sequence (1, 1/2, 1/3, 1/4, . . .).

Example 2.1.4. For all n ∈ N, let an = (−1)^(n+1). Then (an)n≥1 is the sequence (1, −1, 1, −1, 1, −1, . . .).

Example 2.1.5. Let a1 = 1 and a2 = 1. For n ∈ N with n ≥ 3, let an = an−1 + an−2. Then (an)n≥1 is the sequence (1, 1, 2, 3, 5, 8, 13, . . .). This sequence is known as the Fibonacci sequence and is an example of a recursively defined sequence (a sequence where subsequent terms are defined using the previous terms under a fixed pattern).

With the above notion of a sequence, we turn to the notion of limits. If we consider the sequence (1/n)n≥1, we intuitively know that as n gets larger and larger, the sequence gets closer and closer to zero. Thus we would want to use this to say that 0 is the limit of (1/n)n≥1. This may lead us to take the following as our definition of a limit: "A sequence (an)n≥1 has limit L (as n tends to infinity) if as n gets larger and larger, an gets closer to L." However, the fault in the above idea of a limit is that (1/n)n≥1 also gets 'closer and closer' to −1. We prefer 0 over −1 as the limit of (1/n)n≥1 since 1/n better and better approximates 0 whereas we intuitively know that (1/n)n≥1 cannot approximate −1. This leads us to the following better idea of what a limit is:

Heuristic Definition.
A sequence (an)n≥1 has limit L if the terms of (an)n≥1 are eventually all as close to L as we would like.

Using the above as a guideline, we obtain a rigorous, mathematical definition of the limit of a sequence of real numbers.

Definition 2.1.6. Let (an)n≥1 be a sequence of real numbers. A number L ∈ R is said to be the limit of (an)n≥1 if for every ε > 0 there exists an N ∈ N (which depends on ε) such that |an − L| < ε for all n ≥ N. If (an)n≥1 has limit L, we say that (an)n≥1 converges to L and write L = limn→∞ an. Otherwise we say that (an)n≥1 diverges.

Example 2.1.7. Consider the constant sequence (an)n≥1 where an = c for all n ∈ N and some c ∈ R. Notice for all ε > 0, |an − c| = 0 < ε for all n ∈ N. Hence (an)n≥1 converges to c.

Example 2.1.8. To see that limn→∞ 1/n = 0 using the definition of the limit, let ε > 0 be arbitrary. Then (by the homework) there exists an N ∈ N such that 0 < 1/N < ε. Therefore, for all n ≥ N we obtain that 0 < 1/n ≤ 1/N < ε. Hence |1/n − 0| < ε for all n ≥ N. Hence limn→∞ 1/n = 0. Note that (1/n)n≥1 has limit zero, but no term in the sequence is zero.

Example 2.1.9. Using the definition of a limit, we see that a sequence (an)n≥1 does not converge if for all L ∈ R there is an ε > 0 (depending on the L) such that for every N ∈ N there is an n ≥ N such that |an − L| ≥ ε.

Using the above paragraph, we can show that ((−1)^(n+1))n≥1 does not converge. Indeed let L ∈ R be arbitrary and let ε = 1/2. Suppose there exists an N ∈ N such that |(−1)^(n+1) − L| < ε for all n ≥ N. Since there exists an odd number n greater than N, we obtain that |1 − L| < ε. Therefore, since ε = 1/2, we obtain that L ∈ (1/2, 3/2). Similarly, since there exists an even number n greater than N, we obtain that |−1 − L| < ε. Therefore, since ε = 1/2, we obtain that L ∈ (−3/2, −1/2). Hence L ∈ (1/2, 3/2) ∩ (−3/2, −1/2) = ∅, which is absurd. Hence we have a contradiction so L is not the limit of ((−1)^(n+1))n≥1.
Therefore, since L ∈ R was arbitrary, ((−1)^(n+1))n≥1 does not converge.

2.1.2 Uniqueness of the Limit

Notice in the definition of 'the' limit of a sequence, we used 'the' instead of 'a'; that is, how do we know that there is at most one limit to a sequence? The following justifies the use of the word 'the' and demonstrates one important proof technique when dealing with limits.

Proposition 2.1.10. Let (an)n≥1 be a sequence of real numbers. If L and K are limits of (an)n≥1, then L = K.

Proof. Suppose that L ≠ K. Let ε = |L − K|/2. Since L ≠ K, we know that ε > 0. Since L is a limit of (an)n≥1, we know by the definition of a limit that there exists an N1 ∈ N such that if n ≥ N1 then |an − L| < ε. Similarly, since K is a limit of (an)n≥1, we know by the definition of a limit that there exists an N2 ∈ N such that if n ≥ N2 then |an − K| < ε.

Let n = max{N1, N2}. By the above paragraph, we have that |an − L| < ε and |an − K| < ε. Hence by the Triangle Inequality

|L − K| ≤ |L − an| + |an − K| < ε + ε = 2ε = |L − K|,

which is absurd (i.e. x < x is false for all x ∈ R). Thus we have obtained a contradiction so it must be the case that L = K.

To conclude this section, we note the following result, which demonstrates that |an − L| < ε may be replaced with |an − L| ≤ ε in the definition of the limit of a sequence. This can be useful on occasion and also establishes an important idea in handling limits: ε is simply a constant and may be modified.

Proposition 2.1.11. Let (an)n≥1 be a sequence of real numbers and let L ∈ R. Then (an)n≥1 converges to L if and only if for all ε > 0 there exists an N ∈ N such that |an − L| ≤ ε for all n ≥ N.

Proof. Suppose (an)n≥1 converges to L. Let ε > 0 be arbitrary. By the definition of the limit, there exists an N ∈ N such that |an − L| < ε for all n ≥ N. As this implies |an − L| ≤ ε for all n ≥ N and as ε > 0 was arbitrary, one direction of the proof is complete.

For the other direction, assume that (an)n≥1 and L have the property listed in the statement.
Let ε > 0 be arbitrary. Let ε′ = ε/2. Since ε′ > 0, the assumptions of this direction imply that there exists an N ∈ N such that |an − L| ≤ ε′ for all n ≥ N. Hence |an − L| ≤ ε′ < ε for all n ≥ N. As ε > 0 was arbitrary, (an)n≥1 converges to L by the definition of the limit.

Remark 2.1.12. By analyzing the above proof, we see that the definition of the limit can be modified to involve a constant multiple of ε. That is, if (an)n≥1 is a sequence of real numbers, L ∈ R, and k > 0, then L = limn→∞ an if and only if for all ε > 0 there exists an N ∈ N such that |an − L| < kε for all n ≥ N. It is very important to note that the constant k CANNOT depend on n nor ε.

2.2 The Monotone Convergence Theorem

With the above, there are two main questions for us to ask: "What types of sequences converge?" and "How can we find the limits of sequences without always appealing to the definition?" The goal of this section is to look at the first question.

First let us ask, "Does the sequence (n)n≥1 converge?" Intuitively the answer is no since this sequence does not approximate a number. To make this rigorous, consider the following.

Definition 2.2.1. A sequence (an)n≥1 of real numbers is said to be bounded if the set {an | n ∈ N} is bounded.

Proposition 2.2.2. Every convergent sequence is bounded.

Proof. Let (an)n≥1 be a sequence of real numbers that converges to a number L ∈ R. Let ε = 1. By the definition of a limit, there exists an N ∈ N such that |an − L| < ε = 1 for all n ≥ N. Hence |an| ≤ |L| + 1 for all n ≥ N by the Triangle Inequality.

Let M = max{|a1|, |a2|, . . . , |aN|, |L| + 1}. Using the above paragraph, we see that |an| ≤ M for all n ∈ N. Hence −M ≤ an ≤ M for all n ∈ N so (an)n≥1 is bounded.

The above shows us that boundedness is a requirement for convergence of a sequence. However, a bounded sequence need not converge. Indeed Example 2.1.9 shows that the sequence ((−1)^(n+1))n≥1 (which is clearly bounded) does not converge.
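The ε–N definition can also be illustrated numerically. The sketch below is our own (not from the notes; the helper name `threshold` is ours): for an = 1/n and a given ε, it produces a threshold N as in Example 2.1.8 and then checks a long stretch of the tail of the sequence.

```python
import math

def threshold(eps):
    # Smallest natural number N with 1/N < eps (such an N exists by
    # the Archimedean Property), so |1/n - 0| < eps for all n >= N.
    return math.floor(1 / eps) + 1

eps = 1e-3
N = threshold(eps)
# Verify |a_n - 0| < eps on a long initial stretch of the tail n >= N.
tail_ok = all(abs(1 / n - 0) < eps for n in range(N, N + 10_000))
```

Of course a finite check of the tail proves nothing; the point is only to see the quantifiers of Definition 2.1.6 in action: ε is chosen first, and N responds to it.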
However, a natural question to ask is, "Is there a condition we may place on a sequence so that boundedness implies convergence?" Indeed there is!

Definition 2.2.3. A sequence (an)n≥1 of real numbers is said to be
• increasing if an < an+1 for all n ∈ N,
• non-decreasing if an ≤ an+1 for all n ∈ N,
• decreasing if an > an+1 for all n ∈ N,
• non-increasing if an ≥ an+1 for all n ∈ N, and
• monotone if (an)n≥1 is non-decreasing or non-increasing.

Theorem 2.2.4 (Monotone Convergence Theorem). A monotone sequence (an)n≥1 of real numbers converges if and only if (an)n≥1 is bounded.

Proof. By Proposition 2.2.2, if (an)n≥1 converges, then (an)n≥1 is bounded.

For the other direction, suppose that (an)n≥1 is a monotone sequence that is bounded. We will assume that (an)n≥1 is a non-decreasing sequence for the remainder of the proof as the case when (an)n≥1 is non-increasing can be demonstrated using similar arguments.

Since (an)n≥1 is bounded, {an | n ∈ N} has a least upper bound, say α, by the Least Upper Bound Property (Theorem 1.3.18). We claim that α is the limit of (an)n≥1. To see this, let ε > 0 be arbitrary. Since α is the least upper bound of {an | n ∈ N}, we know that an ≤ α for all n ∈ N and α − ε is not an upper bound of {an | n ∈ N}. Hence there exists an N ∈ N such that α − ε < aN. Since (an)n≥1 is non-decreasing, we obtain for all n ≥ N that α − ε < aN ≤ an ≤ α, which implies |an − α| < ε for all n ≥ N. Since ε > 0 was arbitrary, we obtain that α is the limit of (an)n≥1 by definition. Hence (an)n≥1 converges.

Example 2.2.5. Consider the sequence (an)n≥1 defined recursively via a1 = 1 and an+1 = √(3 + 2an) for all n ≥ 1. In the homework, it was demonstrated that 0 ≤ an ≤ an+1 ≤ 3 for all n ∈ N. Hence (an)n≥1 converges by the Monotone Convergence Theorem. The question remains, "What is the limit of (an)n≥1?" By the proof of the Monotone Convergence Theorem, we know the answer is lub({an | n ∈ N}), which is at most 3.
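The hypotheses of the Monotone Convergence Theorem in Example 2.2.5 can be checked numerically. This sketch is ours, not part of the notes, and floating-point arithmetic only suggests (it cannot prove) the monotonicity and boundedness established in the homework.

```python
import math

# Iterate a_1 = 1, a_{n+1} = sqrt(3 + 2*a_n) from Example 2.2.5.
a = [1.0]
for _ in range(50):
    a.append(math.sqrt(3 + 2 * a[-1]))

# Check 0 <= a_n <= a_{n+1} <= 3 along the computed terms.
non_decreasing = all(x <= y for x, y in zip(a, a[1:]))
bounded_by_3 = all(0 <= x <= 3 for x in a)
```

Both checks pass, consistent with the homework: the iterates climb while staying below 3, exactly the situation in which the Monotone Convergence Theorem applies.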
But is the answer 3 or a number less than 3?

2.3 Limit Theorems

To answer the above question and to aid us in our computation of limits, there are several theorems we may explore.

2.3.1 Limit Arithmetic

Our first goal is to determine how limits behave with respect to the simplest operations on R.

Theorem 2.3.1. Let (an)n≥1 and (bn)n≥1 be sequences of real numbers such that L = limn→∞ an and K = limn→∞ bn exist. Then
a) limn→∞ an + bn = L + K.
b) limn→∞ an bn = LK.
c) limn→∞ can = cL for all c ∈ R.
d) limn→∞ 1/bn = 1/K whenever K ≠ 0 (see proof for technicality).
e) limn→∞ an/bn = L/K whenever K ≠ 0 (see proof for technicality).

Proof. a) Let ε > 0 be arbitrary. Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε/2 for all n ≥ N1. Similarly, since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε/2 for all n ≥ N2. Let N = max{N1, N2}. Hence, using the Triangle Inequality, for all n ≥ N,

|(an + bn) − (L + K)| ≤ |an − L| + |bn − K| < ε/2 + ε/2 = ε.

Hence (an + bn)n≥1 converges to L + K by definition.

b) Let ε > 0 be arbitrary. First note that 0 ≤ |K| < |K| + 1 so 0 ≤ |K|/(|K| + 1) ≤ 1 (we will use this later). Next, since (an)n≥1 converges, (an)n≥1 is bounded by Proposition 2.2.2. Hence there exists an M > 0 such that |an| < M for all n ∈ N.

Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε/(2(|K| + 1)) for all n ≥ N1 (as ε/(2(|K| + 1)) > 0 is a constant). Similarly, since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε/(2M) for all n ≥ N2 (as ε/(2M) is a constant). Let N = max{N1, N2}. Hence, using the Triangle Inequality, for all n ≥ N,

|an bn − LK| = |(an bn − an K) + (an K − LK)|
≤ |an bn − an K| + |an K − LK|
= |an||bn − K| + |K||an − L|
≤ M|bn − K| + |K||an − L|
< M · ε/(2M) + |K| · ε/(2(|K| + 1))
≤ ε/2 + ε/2 = ε.

Hence (an bn)n≥1 converges to LK by definition.

c) Apply part (b) with bn = c for all n ∈ N.

d) The one technicality here is that if bn = 0, then 1/bn does not make sense.
However, since K = limn→∞ bn and since |K|/2 > 0 as K ≠ 0, there exists an N1 ∈ N such that |bn − K| < |K|/2 for all n ≥ N1. Therefore, by the Triangle Inequality, |bn| ≥ |K| − |K|/2 = |K|/2 > 0 for all n ≥ N1. Hence, if n ≥ N1 we have that |bn| > 0 and thus 1/bn is well-defined for suitably large n. Furthermore, since limits depend only on the behaviour for large n, it makes sense to consider the sequence (1/bn)n≥1.

Let ε > 0 be arbitrary. In the above paragraph, we saw that |bn| ≥ |K|/2 for all n ≥ N1 and thus 1/|bn| ≤ 2/|K| for all n ≥ N1 (as K ≠ 0). Since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε|K|²/2 for all n ≥ N2 (as ε|K|²/2 > 0 is a constant). Therefore, for all n ≥ max{N1, N2},

|1/bn − 1/K| = |K − bn|/(|bn||K|) < (1/|bn|)(1/|K|)(ε|K|²/2) ≤ (2/|K|)(1/|K|)(ε|K|²/2) = ε.

Hence (1/bn)n≥1 converges to 1/K by definition.

e) By part (d), limn→∞ 1/bn = 1/K. Hence, as limn→∞ an = L, part (b) implies that limn→∞ an · (1/bn) = L · (1/K) = L/K, completing the proof.

Example 2.3.2. Consider the sequence (an)n≥1 defined recursively via a1 = 1 and an+1 = √(3 + 2an) for all n ≥ 1. In Example 2.2.5, we used the Monotone Convergence Theorem (Theorem 2.2.4) along with the fact that 0 ≤ an ≤ an+1 ≤ 3 to show that (an)n≥1 converges. It remains to compute the limit of this sequence.

Let L = limn→∞ an. Since an+1 = √(3 + 2an) for all n ≥ 1, we have that (an+1)² = 3 + 2an for all n ∈ N. Therefore, using Theorem 2.3.1, we obtain that

3 + 2L = limn→∞ (3 + 2an) = limn→∞ (an+1)² = limn→∞ (an)²   (an index shift does not change the limit)
= (limn→∞ an)² = L².

Hence L² − 2L − 3 = 0 so (L − 3)(L + 1) = 0 so L = 3 or L = −1. However, since −1 < 0 < 1 = a1 ≤ an for all n ∈ N, we have |an − (−1)| ≥ 2 for all n ∈ N and thus −1 cannot be the limit of (an)n≥1 by the definition of the limit. Hence limn→∞ an = 3.

Example 2.3.3. Consider the sequence (an)n≥1 where an = (5n² + 2n)/(3n² − n + 4) for all n ∈ N. Does (an)n≥1 converge and, if so, what is its limit?
To answer this question, notice that

an = (5n² + 2n)/(3n² − n + 4) = (n²(5 + 2/n))/(n²(3 − 1/n + 4/n²)) = (5 + 2/n)/(3 − 1/n + 4/n²).

Since limn→∞ 1/n = 0 by Example 2.1.8, and since limn→∞ 1/n² = 0 (see homework), we obtain that

limn→∞ (5 + 2/n) = 5 and limn→∞ (3 − 1/n + 4/n²) = 3, so limn→∞ an = 5/3.

In part (e) of Theorem 2.3.1, it was required in the proof that the denominator does not converge to 0. This is due to the fact that there are many different types of behaviour that may occur when the denominator of a sequence of fractions tends to zero. For two examples, first consider the sequences (an)n≥1 and (bn)n≥1 where an = bn = 1/n for all n ∈ N. Then clearly limn→∞ an = 0 = limn→∞ bn, and

an/bn = (1/n)/(1/n) = 1

for all n ∈ N. Hence limn→∞ an/bn = 1. Alternatively, consider the sequences (an)n≥1 and (bn)n≥1 where an = 1 and bn = 1/n for all n ∈ N. Then clearly limn→∞ an = 1 and limn→∞ bn = 0, yet

an/bn = 1/(1/n) = n

does not converge as (n)n≥1 is not bounded (see Proposition 2.2.2).

Thus, if (an)n≥1 and (bn)n≥1 are sequences and limn→∞ bn = 0, it is possible that (an/bn)n≥1 does not converge. However, if limn→∞ an/bn exists, then by part (b) of Theorem 2.3.1 we must have that

limn→∞ an = limn→∞ (an/bn) · bn = (limn→∞ an/bn)(limn→∞ bn) = (limn→∞ an/bn) · 0 = 0.

Thus a necessary condition for limn→∞ an/bn to exist when limn→∞ bn = 0 is limn→∞ an = 0.

2.3.2 Diverging to Infinity

We have seen several examples of sequences that do not converge. In particular, Proposition 2.2.2 says that unbounded sequences have no chance to converge. However, it is useful to discuss specific notions of divergence for unbounded sequences.

Definition 2.3.4. A sequence (an)n≥1 of real numbers is said to diverge to infinity, denoted limn→∞ an = ∞, if for every M ∈ R there exists an N ∈ N such that an ≥ M for all n ≥ N.
Similarly, a sequence (an)n≥1 of real numbers is said to diverge to negative infinity, denoted limn→∞ an = −∞, if for every M ∈ R there exists an N ∈ N such that an ≤ M for all n ≥ N.

Example 2.3.5. It is clear that limn→∞ n = ∞.

Using the same proof ideas as in Theorem 2.3.1, we obtain the following.

Theorem 2.3.6. Let (an)n≥1 and (bn)n≥1 be sequences of real numbers. Suppose that (bn)n≥1 diverges to ∞ (respectively −∞). Then
a) If (an)n≥1 is bounded below (respectively above), then limn→∞ an + bn = ∞ (respectively limn→∞ an + bn = −∞).
b) If there exists an M > 0 such that an ≥ M for all n ∈ N, then limn→∞ an bn = ∞ (respectively limn→∞ an bn = −∞).
c) If (an)n≥1 is bounded, then limn→∞ an/bn = 0.

Proof. See the homework.

The above theorem aids us in computing limits of fractions where the denominator grows faster than the numerator.

Example 2.3.7. Consider the sequence (an)n≥1 where an = (2n + 1)/(n² + 3) for all n ∈ N. Then

an = (2n + 1)/(n² + 3) = (n(2 + 1/n))/(n(n + 3/n)) = (2 + 1/n)/(n + 3/n).

Therefore, since limn→∞ 3/n = 0 so (3/n)n≥1 is bounded, and since limn→∞ n = ∞, we have limn→∞ (n + 3/n) = ∞. Hence, since limn→∞ (2 + 1/n) = 2 so (2 + 1/n)n≥1 is bounded, we have that limn→∞ an = 0.

2.3.3 The Squeeze Theorem

Using Theorem 2.3.6, it is possible to show that (cos(n)/n)n≥1 converges to zero. Indeed, if an = cos(n) and bn = n for all n ∈ N, then (an)n≥1 is bounded (above by 1 and below by −1) and limn→∞ bn = ∞, so part (c) of Theorem 2.3.6 implies limn→∞ cos(n)/n = 0. Alternatively, we can show limn→∞ cos(n)/n = 0 by noting that −1/n ≤ cos(n)/n ≤ 1/n for all n ∈ N and by applying the following useful theorem (which may be used to prove part (c) of Theorem 2.3.6).

Theorem 2.3.8 (Squeeze Theorem). Let (an)n≥1, (bn)n≥1 and (cn)n≥1 be sequences of real numbers such that there exists an N0 ∈ N such that an ≤ bn ≤ cn for all n ≥ N0. If limn→∞ an = limn→∞ cn = L, then (bn)n≥1 converges and limn→∞ bn = L.

Proof. Let ε > 0 be arbitrary.
Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε for all n ≥ N1. Hence L − ε < an for all n ≥ N1. Similarly, since L = limn→∞ cn, there exists an N2 ∈ N such that |cn − L| < ε for all n ≥ N2. Hence cn < L + ε for all n ≥ N2. Therefore, for all n ≥ max{N0, N1, N2}, we have that

L − ε < an ≤ bn ≤ cn < L + ε.

Hence L − ε < bn < L + ε for all n ≥ max{N0, N1, N2}, which implies −ε < bn − L < ε and thus |bn − L| < ε for all n ≥ max{N0, N1, N2}. Hence (bn)n≥1 converges and limn→∞ bn = L by definition.

2.3.4 Limit Supremum and Limit Infimum

There are several sequences that neither converge nor diverge to ±∞. For example, the sequence ((−1)^(n+1))n≥1 has been shown to not converge and clearly does not diverge to ±∞ as it is bounded. Consequently, we may ask, "Is it possible to obtain some information about this sequence as n tends to infinity?" Clearly everything we want to know about the sequence ((−1)^(n+1))n≥1 can be obtained by taking the least upper bound and greatest lower bound of its elements. Consequently, we extend the notions of least upper bound and greatest lower bound to include infinities.

Definition 2.3.9. Let X be a set of real numbers. The supremum of X, denoted sup(X), is defined to be

sup(X) := −∞ if X = ∅; lub(X) if X ≠ ∅ and X is bounded above; ∞ if X is not bounded above.

Similarly, the infimum of X, denoted inf(X), is defined to be

inf(X) := ∞ if X = ∅; glb(X) if X ≠ ∅ and X is bounded below; −∞ if X is not bounded below.

The infimum and supremum of sequences are not the objects we are after since we are more interested in the behaviour of sequences as n gets large. For example, consider the sequence ((−1)^(n+1)(1 + 1/n))n≥1. It is not difficult to see that 2 is the supremum of this sequence and −3/2 is the infimum of this sequence. However, as n gets larger and larger, the largest values of the sequence are very close to 1 and the smallest values of the sequence are very close to −1.
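This tail behaviour can be seen numerically. The sketch below is ours (not from the notes), and it truncates the infinite tail sets {ak | k ≥ n} at a large finite index, which only approximates a supremum or infimum over an infinite set.

```python
def a(k):
    # a_k = (-1)^(k+1) * (1 + 1/k)
    return (-1) ** (k + 1) * (1 + 1 / k)

terms = [a(k) for k in range(1, 100_001)]

# Over the whole (truncated) sequence: sup is a_1 = 2, inf is a_2 = -3/2.
whole_sup, whole_inf = max(terms), min(terms)

# Over the tail k >= 1000 the extremes have collapsed toward 1 and -1.
tail = terms[999:]
tail_sup, tail_inf = max(tail), min(tail)
```

Running this, `whole_sup` and `whole_inf` are 2 and −3/2 while `tail_sup` and `tail_inf` sit within 0.002 of 1 and −1, which is exactly the tail behaviour the limit supremum and limit infimum below are designed to capture.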
How can we express this notion for arbitrary sequences mathematically? Let (an)n≥1 be a sequence. To see how the largest values of (an)n≥1 behave as n grows, we can take the supremum after we ignore the first few terms. Consequently, we define a new sequence (bn)n≥1 defined by

bn = sup{ak | k ≥ n}.

It is not difficult to see that b1 ≥ b2 ≥ b3 ≥ · · · as the supremum may only get smaller as we remove terms from the set from which we are taking the supremum. Consequently we see that (bn)n≥1 is a monotone sequence. Since (bn)n≥1 is non-increasing, (bn)n≥1 either converges to a number, diverges to −∞, or bn = ∞ for all n. Applying the same idea with the sequence (cn)n≥1 where

cn = inf{ak | k ≥ n}

we arrive at the following.

Definition 2.3.10. The limit supremum of a sequence (an)n≥1 of real numbers, denoted lim supn→∞ an, is

lim supn→∞ an = limn→∞ sup{ak | k ≥ n} ∈ R ∪ {±∞}.

Similarly, the limit infimum of a sequence (an)n≥1 of real numbers, denoted lim infn→∞ an, is

lim infn→∞ an = limn→∞ inf{ak | k ≥ n} ∈ R ∪ {±∞}.

To see that various values are possible, it is not difficult to see that

lim supn→∞ (−1)^(n+1)(1 + 1/n) = 1 and lim infn→∞ (−1)^(n+1)(1 + 1/n) = −1,

whereas

lim supn→∞ n = ∞ and lim infn→∞ n = ∞.

Unsurprisingly, there is a solid connection between lim inf, lim sup, and lim. To see this connection, we require the following.

Theorem 2.3.11 (Comparison Theorem). Let (an)n≥1 and (bn)n≥1 be convergent sequences of real numbers. Suppose that there exists an N0 ∈ N such that an ≤ bn for all n ≥ N0. Then limn→∞ an ≤ limn→∞ bn.

Proof. Let L = limn→∞ an and let K = limn→∞ bn. Suppose that K < L. Then if ε = (L − K)/2, we have ε > 0. Since L = limn→∞ an, there exists an N1 ∈ N such that |an − L| < ε for all n ≥ N1. Hence L < an + ε for all n ≥ N1. Similarly, since K = limn→∞ bn, there exists an N2 ∈ N such that |bn − K| < ε for all n ≥ N2. Hence bn < K + ε for all n ≥ N2.
Therefore, if n ≥ max{N1, N2, N0}, we obtain that

L < an + ε ≤ bn + ε < K + 2ε = K + |K − L|.

Hence L − K < |K − L|. However, this is impossible as we are assuming that K < L, which would imply |K − L| = L − K. Hence we have obtained a contradiction in the case that K < L so it must be the case that L ≤ K.

Proposition 2.3.12. Let (an)n≥1 be a sequence of real numbers such that

lim infn→∞ an, lim supn→∞ an ∈ R.

Then lim infn→∞ an ≤ lim supn→∞ an. In addition, (an)n≥1 converges if and only if lim infn→∞ an = lim supn→∞ an. In this case

limn→∞ an = lim infn→∞ an = lim supn→∞ an.

Proof. For the remainder of the proof, for each n ∈ N let

bn = sup{ak | k ≥ n} ∈ R and cn = inf{ak | k ≥ n} ∈ R.

Clearly

lim supn→∞ an = limn→∞ bn, lim infn→∞ an = limn→∞ cn, and cn ≤ an ≤ bn for all n.

Hence, the Comparison Theorem (Theorem 2.3.11) implies lim infn→∞ an ≤ lim supn→∞ an.

Next, suppose that lim infn→∞ an = lim supn→∞ an. Therefore, since cn ≤ an ≤ bn for all n ∈ N, we obtain that (an)n≥1 converges and

limn→∞ an = lim infn→∞ an = lim supn→∞ an

by the Squeeze Theorem (Theorem 2.3.8).

Finally, suppose L = limn→∞ an exists. Let ε > 0. Hence there exists an N ∈ N such that |an − L| < ε for all n ≥ N. Thus L − ε ≤ an ≤ L + ε for all n ≥ N. Therefore L − ε ≤ cn ≤ bn ≤ L + ε for all n ≥ N by the definition of bn and cn. Hence, the Comparison Theorem (Theorem 2.3.11) implies that

L − ε ≤ limn→∞ cn ≤ limn→∞ bn ≤ L + ε

for all ε > 0. In particular,

L − 1/m ≤ limn→∞ cn ≤ limn→∞ bn ≤ L + 1/m

for all m ∈ N. Therefore, since limm→∞ 1/m = 0, the above is only possible (for example, by the Squeeze Theorem (Theorem 2.3.8)) if

L = limn→∞ bn = limn→∞ cn.

2.4 The Bolzano–Weierstrass Theorem

We have seen that many sequences do not converge. The lim inf and lim sup do provide us with some information about the sequence.
However, one natural question to ask is, "If we have a sequence that does not converge, can we remove terms from the sequence to make it converge?" Of course for convergence, our new sequence must be bounded by Proposition 2.2.2. Thus perhaps a better question is, "If we have a bounded sequence that does not converge, can we remove terms from the sequence to make it converge?"

2.4.1 Subsequences

To answer the above question, we must describe what we mean by 'remove terms from a sequence'. This is made precise by the mathematical notion of a subsequence.

Definition 2.4.1. A subsequence of a sequence (an)n≥1 of real numbers is any sequence (bn)n≥1 of real numbers such that there exists an increasing sequence of natural numbers (kn)n≥1 so that bn = akn for all n ∈ N.

For example, if (an)n≥1 is our favourite sequence an = (−1)^(n+1) for all n ∈ N and if we choose kn = 2n − 1 for all n ∈ N, then (akn)n≥1 is the sequence (1, 1, 1, . . .). Similarly, if (bn)n≥1 is the sequence where bn = 1/n for all n ∈ N and if we choose kn = n² for all n ∈ N, then (bkn)n≥1 is the sequence (1, 1/4, 1/9, . . .) = (1/n²)n≥1.

In the above paragraph, notice that the sequence (an)n≥1 diverges whereas the given subsequence converges. Thus it is possible that divergent sequences have convergent subsequences. Furthermore, (bn)n≥1 and the given subsequence both converge to 0. It is not difficult to see that every subsequence of (bn)n≥1 converges to zero and this is no coincidence.

Proposition 2.4.2. Let (an)n≥1 be a sequence of real numbers that converges to L. Every subsequence of (an)n≥1 converges to L.

Proof. Let (akn)n≥1 be a subsequence of (an)n≥1. Let ε > 0. Since L = limn→∞ an, there exists an N ∈ N such that |an − L| < ε for all n ≥ N. Since (kn)n≥1 is an increasing sequence of natural numbers, there exists an N0 ∈ N such that kn ≥ N for all n ≥ N0. Hence |akn − L| < ε for all n ≥ N0. Therefore, as ε > 0 was arbitrary, we obtain that limn→∞ akn = L by the definition of the limit.
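A minimal numerical sketch of Definition 2.4.1 and Proposition 2.4.2 (our illustration, not from the notes): extract the subsequence of (1/n)n≥1 along the increasing indices kn = n², and observe that a tail of the subsequence sits inside any given ε-band around the limit 0.

```python
def a(n):
    # Original sequence a_n = 1/n, which converges to 0.
    return 1 / n

def k(n):
    # Increasing index sequence k_n = n^2.
    return n ** 2

subseq = [a(k(n)) for n in range(1, 101)]   # (1, 1/4, 1/9, ...)
indices_increasing = all(k(n) < k(n + 1) for n in range(1, 100))

# As in Proposition 2.4.2: far enough along, the subsequence terms
# also lie within eps of the original limit 0.
eps = 1e-3
tail_in_band = all(abs(x - 0) < eps for x in subseq[32:])
```

The check on `indices_increasing` is the defining requirement of a subsequence; once it holds, the tail of (a_{k_n}) inherits the ε-band of the tail of (a_n), which is the whole content of Proposition 2.4.2.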
2.4.2 The Peak Point Lemma

It is natural to ask, "Given a sequence, is there a 'nice' subsequence?" Of course 'nice' is ambiguous, but the following demonstrates a specific form of subsequence we may always construct.

Lemma 2.4.3 (The Peak Point Lemma). Every sequence of real numbers has a monotone subsequence.

In order to prove the above lemma (and to explain where it gets its name), we will use the following notion:

Definition 2.4.4. Let (an)n≥1 be a sequence of real numbers. An index n0 ∈ N is said to be a peak point for the sequence (an)n≥1 if an ≤ an0 for all n ≥ n0.

Proof of Lemma 2.4.3. Let (an)n≥1 be a sequence of real numbers. The proof is divided into two cases:

Case 1. (an)n≥1 has an infinite number of peak points: By assumption there exist indices k1 < k2 < k3 < · · · such that kj is a peak point for all j ∈ N. Therefore, we have by the definition of a peak point that akn ≥ akn+1 for all n ∈ N. Hence (akn)n≥1 is a non-increasing subsequence of (an)n≥1.

Case 2. (an)n≥1 has a finite number (or no) peak points: Let n0 be the largest peak point of (an)n≥1 (or n0 = 1 if (an)n≥1 has no peak points), and let k1 = n0 + 1. Thus k1 is not a peak point of (an)n≥1. Therefore there exists a k2 > k1 = n0 + 1 such that ak2 > ak1. Subsequently, since k2 > k1 > n0, k2 is not a peak point. Therefore there exists a k3 > k2 such that ak3 > ak2. Repeating this process ad nauseam, we obtain a sequence of indices k1 < k2 < k3 < · · · such that akn+1 > akn for all n ∈ N. Hence (akn)n≥1 is an increasing subsequence of (an)n≥1.

As in either case a monotone subsequence can be constructed, the result follows.

2.4.3 The Bolzano–Weierstrass Theorem

Combining the Peak Point Lemma together with the Monotone Convergence Theorem, we easily obtain the following.

Theorem 2.4.5 (The Bolzano–Weierstrass Theorem). Every bounded sequence of real numbers has a convergent subsequence.

Proof.
Let (an)n≥1 be a bounded sequence of real numbers. By the Peak Point Lemma (Lemma 2.4.3), there exists a monotone subsequence (akn)n≥1 of (an)n≥1. Since (an)n≥1 is bounded, (akn)n≥1 is also bounded and thus converges by the Monotone Convergence Theorem (Theorem 2.2.4).

Chapter 3

An Introduction to Topology

With the above study of sequences complete, we can turn our attention to analyzing properties of the real numbers and their subsets through convergent sequences. One of the most important properties of the real numbers is the notion of completeness, which implies the convergence of specific types of sequences. Furthermore, when a subset of real numbers has specific properties, the limit of a convergent sequence of real numbers from the subset must also be in the subset.

3.1 Completeness of the Real Numbers

Currently, one difficulty with determining when a sequence converges is that one must have an idea of what the limit of the sequence is in order to prove convergence. This even holds for bounded monotone sequences, as intuition (and results) tell us the limit is either the least upper bound or greatest lower bound of the sequence. Thus it is natural to ask, "Is there a way to determine whether a sequence converges without knowing its limit?"

3.1.1 Cauchy Sequences

If a sequence were to converge, then eventually all terms in the sequence are as close to the limit as we would like. In particular, by the Triangle Inequality, eventually all terms in the sequence are as close to each other as we would like. This leads us to the notion of a Cauchy sequence.

Heuristic Definition. A sequence (an)n≥1 is said to be Cauchy if the terms of (an)n≥1 are as close to each other as we would like as long as n is large enough.

As with the definition of the limit of a sequence, the notion of Cauchy sequence can be made mathematically precise.

Definition 3.1.1.
A sequence (an)n≥1 of real numbers is said to be Cauchy if for all ε > 0 there exists an N ∈ N such that |an − am| < ε for all n, m ≥ N.

Note that it is possible a sequence (an)n≥1 satisfies limn→∞ an+1 − an = 0 but is not Cauchy. Indeed if an = Σ_{k=1}^{n} 1/k for each n ∈ N, then an+1 − an = 1/(n + 1), which clearly converges to zero. However, it is possible to show that (an)n≥1 diverges to infinity. Although we cannot prove this divergence at this time, many students will have seen series in previous courses and techniques of the last chapter of this course will enable this proof.

As our definition of Cauchy sequence was motivated by convergence, the following result should not be surprising (and provides a plethora of examples of Cauchy sequences).

Theorem 3.1.2. Every convergent sequence of real numbers is Cauchy.

Proof. Let (an)n≥1 be a convergent sequence of real numbers. Let L = limn→∞ an. To see that (an)n≥1 is Cauchy, let ε > 0 be arbitrary. Since L = limn→∞ an, there exists an N ∈ N such that |an − L| < ε/2 for all n ≥ N. Therefore, for all n, m ≥ N,

|an − am| ≤ |an − L| + |L − am| < ε/2 + ε/2 = ε.

Thus, as ε > 0 was arbitrary, (an)n≥1 is Cauchy by definition.

3.1.2 Convergence of Cauchy Sequences

As our motivation for the study of Cauchy sequences was to find a method for demonstrating the convergence of a sequence without knowing its limit, it is natural to ask, "Does every Cauchy sequence converge?" One method for providing intuition about the answer to this question is to see if Cauchy sequences share similar properties with convergent sequences. In particular, analyzing Proposition 2.2.2 and its proof, we obtain the following.

Lemma 3.1.3. Every Cauchy sequence is bounded.

Proof. Let (an)n≥1 be a Cauchy sequence. Since (an)n≥1 is Cauchy, there exists an N ∈ N such that |an − am| < 1 for all n, m ≥ N. Hence, by letting m = N, we obtain that |an| ≤ |aN| + 1 for all n ≥ N by the Triangle Inequality. Let M = max{|a1|, |a2|, . . . , |aN−1|, |aN| + 1}.
Using the above paragraph, we see that |a_n| ≤ M for all n ∈ ℕ. Hence −M ≤ a_n ≤ M for all n ∈ ℕ, so (a_n)_{n≥1} is bounded.

As further intuition towards whether all Cauchy sequences converge, recall that Proposition 2.4.2 demonstrates that subsequences of a convergent sequence must converge. The following demonstrates that the converse is true if our sequence is assumed to be Cauchy.

Lemma 3.1.4. Let (a_n)_{n≥1} be a Cauchy sequence. If a subsequence of (a_n)_{n≥1} converges, then (a_n)_{n≥1} converges.

Proof. Let (a_n)_{n≥1} be a Cauchy sequence with a convergent subsequence (a_{k_n})_{n≥1} and let L = lim_{n→∞} a_{k_n}. We claim that lim_{n→∞} a_n = L. To see this, let ε > 0 be arbitrary. Since (a_n)_{n≥1} is Cauchy, there exists an N ∈ ℕ such that |a_n − a_m| < ε/2 for all n, m ≥ N. Furthermore, since L = lim_{n→∞} a_{k_n}, there exists a k_j ≥ N such that |a_{k_j} − L| < ε/2. Hence, if n ≥ N, then

|a_n − L| ≤ |a_n − a_{k_j}| + |a_{k_j} − L| < ε/2 + ε/2 = ε.

Thus, as ε > 0 was arbitrary, (a_n)_{n≥1} converges to L by definition.

Using Lemma 3.1.4, we easily obtain the following.

Theorem 3.1.5 (Completeness of the Real Numbers). Every Cauchy sequence of real numbers converges.

Proof. Let (a_n)_{n≥1} be a Cauchy sequence. By Lemma 3.1.3, (a_n)_{n≥1} is bounded. Therefore the Bolzano–Weierstrass Theorem (Theorem 2.4.5) implies that (a_n)_{n≥1} has a convergent subsequence. Hence Lemma 3.1.4 implies that (a_n)_{n≥1} converges.

Theorem 3.1.5 demonstrates that the real numbers form a complete space (a space where every Cauchy sequence converges). The terminology comes from the fact that complete spaces have no 'holes' in them. In fact, the Completeness of the Real Numbers is logically equivalent to the Least Upper Bound Property (i.e. if instead of asking for the real numbers to have the Least Upper Bound Property we asked for them to be complete, we would still end up with the real numbers).

3.2 Topology of the Real Numbers

Often when dealing with limits, it is useful to think of convergence in terms of open intervals.
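Theorem 3.1.5 is exactly what lets one prove convergence without naming the limit in advance. As a numerical sketch (this recursion is my own choice of example, not from the notes): the sequence a_1 = 1, a_{n+1} = (a_n + 2/a_n)/2 can be shown to be Cauchy directly from the recursion, so by completeness it converges; only afterwards does one discover the limit is √2.

```python
def babylonian(n):
    """First n terms of a_1 = 1, a_{k+1} = (a_k + 2/a_k) / 2."""
    terms = [1.0]
    for _ in range(n - 1):
        a = terms[-1]
        terms.append((a + 2.0 / a) / 2.0)
    return terms

terms = babylonian(10)
# Tail terms cluster together (Cauchy behaviour)...
tail_diameter = max(terms[5:]) - min(terms[5:])
# ...so by completeness a limit exists; squaring it reveals sqrt(2).
limit = terms[-1]
```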
Indeed, notice that |a_n − L| < ε for all n ≥ N if and only if a_n ∈ (L − ε, L + ε) for all n ≥ N. Consequently, open intervals have an important connection to properties of the real numbers. The goal of this section is to study how specific subsets of real numbers can give us results about real numbers.

3.2.1 Open Sets

Instead of just studying open intervals, we desire a larger class of sets. These sets are characterized by the fact that each of their elements is contained in an open interval which is contained in the set.

Definition 3.2.1. A set U ⊆ ℝ is said to be open if whenever x ∈ U there exists an ε > 0 such that (x − ε, x + ε) ⊆ U.

Example 3.2.2. Unsurprisingly, each open interval is open. To see this, suppose a, b ∈ ℝ are such that a < b. To see that (a, b) is open, let x ∈ (a, b) be arbitrary. If ε = min{x − a, b − x}, then (x − ε, x + ε) ⊆ (a, b). Thus, as x ∈ (a, b) was arbitrary, (a, b) is open. Using similar arguments, it is possible to show that if a = −∞ and/or b = ∞, then (a, b) is open. Consequently, (−∞, ∞) = ℝ is open.

Example 3.2.3. The empty set is open since the definition of open is vacuously true for ∅ (as there are no elements in the empty set).

Example 3.2.4. If a, b ∈ ℝ are such that a < b, then [a, b) is not open. Indeed, for all ε > 0, (a − ε, a + ε) is not a subset of [a, b) since a − ε/2 ∉ [a, b). Similar arguments can be used to show that (a, b] and [a, b] are not open.

With the above definition and examples, it is natural to ask, "Can we describe all open subsets of the real numbers?" The following gives us a method for constructing open sets from other open sets.

Proposition 3.2.5. Let I be a non-empty set and for each i ∈ I let U_i be an open subset of ℝ. Then

• ⋃_{i∈I} U_i is open in ℝ, and
• ⋂_{i∈I} U_i is open in ℝ provided I has a finite number of elements.

Proof. To see that ⋃_{i∈I} U_i is open, let x ∈ ⋃_{i∈I} U_i be arbitrary. Then x ∈ U_{i_0} for some i_0 ∈ I.
Therefore, as U_{i_0} is open, there exists an ε > 0 such that (x − ε, x + ε) ⊆ U_{i_0}. Hence (x − ε, x + ε) ⊆ ⋃_{i∈I} U_i. Since x ∈ ⋃_{i∈I} U_i was arbitrary, ⋃_{i∈I} U_i is open.

To see that ⋂_{i∈I} U_i is open in ℝ provided I has a finite number of elements, let x ∈ ⋂_{i∈I} U_i be arbitrary. Hence x ∈ U_i for each i ∈ I. Since U_i is open, for each i ∈ I there exists an ε_i > 0 such that (x − ε_i, x + ε_i) ⊆ U_i. Let ε = min{ε_i | i ∈ I}. Since I has a finite number of elements, ε > 0. Furthermore, by the definition of ε, (x − ε, x + ε) ⊆ U_i for all i ∈ I. Hence (x − ε, x + ε) ⊆ ⋂_{i∈I} U_i. Since x ∈ ⋂_{i∈I} U_i was arbitrary, ⋂_{i∈I} U_i is open.

Note the proof of the intersection part of Proposition 3.2.5 does not work if I has an infinite number of elements. This alone does not mean the result is false when I has an infinite number of elements; however, the result is indeed false in that case. To see this, for each n ∈ ℕ let U_n = (−1/n, 1/n). Then

⋂_{n≥1} U_n = {0},

which is clearly not an open set.

Proposition 3.2.5 shows us that the union of any number of open intervals is an open set. In fact, our next result (Theorem 3.2.7) will demonstrate that every open subset of the real numbers is a union of open intervals. To prove this result, we will need the following mathematical construct, which relaxes the notion of what it means for two objects to be the same.

Definition 3.2.6. Let X be a set. A relation ∼ on the elements of X is said to be an equivalence relation if:

1. (reflexive) x ∼ x for all x ∈ X,
2. (symmetric) if x ∼ y, then y ∼ x for all x, y ∈ X, and
3. (transitive) if x ∼ y and y ∼ z, then x ∼ z for all x, y, z ∈ X.

Given an x ∈ X, the set {y ∈ X | y ∼ x} is called the equivalence class of x.

Equivalence relations will be an essential component of the subsequent chapter. For now, our first example of an equivalence relation is contained in the proof of the following result.

Theorem 3.2.7.
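The counterexample ⋂_{n≥1} (−1/n, 1/n) = {0} can be made concrete in code. A sketch (the predicate representation and the cutoff n_max are my own): 0 lies in every U_n, but no interval (−ε, ε) fits inside the intersection, because for any ε > 0 the point ε/2 escapes U_n once 1/n ≤ ε/2.

```python
def in_intersection(x, n_max=10**6):
    """Membership in the intersection of U_n = (-1/n, 1/n) for n = 1..n_max.
    x lies in every such U_n iff |x| < 1/n_max; for the full (infinite)
    intersection this forces x = 0."""
    return abs(x) < 1.0 / n_max

def interval_escapes(eps):
    """Witness that (-eps, eps) is not inside the intersection:
    some U_n with 1/n <= eps/2 fails to contain the point eps/2."""
    n = int(2.0 / eps) + 1            # chosen so that 1/n <= eps/2
    return abs(eps / 2.0) >= 1.0 / n  # eps/2 lies outside U_n
```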
Every (non-empty) open subset of ℝ is a union of open intervals.

Proof. Let U be an open subset of ℝ. Define a relation on U as follows: if x, y ∈ U, then x ∼ y if and only if [x, y] ⊆ U and [y, x] ⊆ U (one of these intervals is usually empty). It is not too difficult to see that ∼ is an equivalence relation on U (indeed, ∼ is clearly reflexive and symmetric; transitivity takes a moment of thought to check the four cases). For each x ∈ U, let E_x denote the equivalence class of x with respect to ∼. Clearly

U = ⋃_{x∈U} E_x

as x ∈ E_x for all x ∈ U. Hence if each E_x is an open interval, the proof will be complete.

Let x ∈ U be fixed. Let α_x = inf(E_x) and β_x = sup(E_x). We claim that E_x = (α_x, β_x).

First, we claim that α_x < β_x. To see this, notice that x ∈ E_x ⊆ U. Hence, as U is open, there exists an ε > 0 such that (x − ε, x + ε) ⊆ U. Clearly y ∼ x for all y ∈ (x − ε, x + ε), so

α_x ≤ x − ε < x + ε ≤ β_x.

To see that (α_x, β_x) ⊆ E_x, let y ∈ (α_x, β_x) be arbitrary. Since α_x < y < β_x, by the definition of inf and sup there exist z_1, z_2 ∈ E_x such that

α_x ≤ z_1 < y < z_2 ≤ β_x.

Since z_1, z_2 ∈ E_x, we have z_1 ∼ x and z_2 ∼ x. Thus z_1 ∼ z_2, so [z_1, z_2] ⊆ U. Hence y ∈ [z_1, z_2] ⊆ U, so y ∼ z_1 ∼ x and thus y ∈ E_x. Therefore, as y ∈ (α_x, β_x) was arbitrary, (α_x, β_x) ⊆ E_x.

To see that E_x ⊆ (α_x, β_x), note that E_x ⊆ (α_x, β_x) ∪ {α_x, β_x} by the definition of α_x and β_x. Thus it suffices to show that α_x, β_x ∉ E_x. Suppose β_x ∈ E_x (this implies β_x ≠ ∞). Then β_x ∈ U, so there exists an ε > 0 so that (β_x − ε, β_x + ε) ⊆ U. Hence β_x + ε/2 ∼ β_x ∼ x (as β_x ∈ E_x). Hence β_x + ε/2 ∈ E_x. However, β_x + ε/2 > β_x, so β_x + ε/2 ∈ E_x contradicts the fact that β_x = sup(E_x). Hence we have obtained a contradiction, so β_x ∉ E_x. Similar arguments show that α_x ∉ E_x. Hence E_x = (α_x, β_x), thereby completing the proof.
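When the open set is a finite union of open intervals, the equivalence classes E_x of Theorem 3.2.7 can be computed explicitly: merging overlapping intervals yields exactly the maximal disjoint open intervals of the union. A sketch (the function name and representation are my own):

```python
def components(intervals):
    """Merge a finite list of open intervals (a, b) into the maximal
    disjoint open intervals of their union -- the classes E_x from
    Theorem 3.2.7 for U = union of the inputs."""
    merged = []
    for a, b in sorted(intervals):
        # Overlapping interiors mean the same equivalence class; intervals
        # that merely touch at an endpoint (not in U) stay separate.
        if merged and a < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return [tuple(c) for c in merged]
```

Note that (0, 1) and (1, 2) are not merged, since the point 1 is missing from the union; this matches the equivalence relation, which requires the whole closed interval [x, y] to lie in U.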
Analyzing the above proof, a natural question to ask is, "How many open intervals do we need in the union?" Clearly, instead of taking the union over all elements in the open set, we can take a union over all of the equivalence classes. Consequently, we can index the open intervals by a single element in each equivalence class. Hence, as each open interval contains a rational number (by the homework), each open set of real numbers is a union of open intervals indexed by a subset of the rational numbers. Consequently, it is natural to ask, "How many rational numbers are there?" This question will be a focus of the next chapter.

To conclude this subsection and to give motivation for the study of open sets, we note the following result (whose proof takes a moment of thought).

Proposition 3.2.8. Let (a_n)_{n≥1} be a sequence of real numbers. A number L ∈ ℝ is the limit of (a_n)_{n≥1} if and only if for every open set U ⊆ ℝ such that L ∈ U, there exists an N ∈ ℕ such that a_n ∈ U for all n ≥ N.

The above demonstrates an alternative definition for the limit of a sequence of real numbers. This is useful in generalizing limits to abstract spaces where one defines a good notion of open sets (a topology, which satisfies the conclusions of Proposition 3.2.5) which then determines which sequences converge.

3.2.2 Closed Sets

Although the notion of open sets is important in future courses, the following notion is far more important for this course.

Definition 3.2.9. A set F ⊆ ℝ is said to be closed if F^c is open.

Using our notion of open sets, we easily see that ∅ and ℝ are both closed sets. Furthermore, for all a, b ∈ ℝ with a < b, we see that [a, b] is closed since

[a, b]^c = (−∞, a) ∪ (b, ∞),

which is a union of open sets and thus open by Proposition 3.2.5.

It is important to note that there are subsets of ℝ that are neither open nor closed. Indeed, if a, b ∈ ℝ are such that a < b, then [a, b) is not open and not closed.
Indeed, [a, b) is not open by Example 3.2.4. Furthermore, since

[a, b)^c = (−∞, a) ∪ [b, ∞),

we see that [a, b)^c is not open and thus [a, b) is not closed. However, since [a, ∞)^c = (−∞, a) and (−∞, b]^c = (b, ∞) are open sets, [a, ∞) and (−∞, b] are closed sets.

Due to the nature of the complement of a set, the following trivially follows from Proposition 3.2.5.

Proposition 3.2.10. Let I be a non-empty set and for each i ∈ I let F_i be a closed subset of ℝ. Then

• ⋂_{i∈I} F_i is closed in ℝ, and
• ⋃_{i∈I} F_i is closed in ℝ provided I has a finite number of elements.

Proof. Since

(⋂_{i∈I} F_i)^c = ⋃_{i∈I} F_i^c and (⋃_{i∈I} F_i)^c = ⋂_{i∈I} F_i^c

by the homework, the result follows from the definition of a closed set along with Proposition 3.2.5.

Example 3.2.11. Proposition 3.2.10 can be used to show that ℤ is closed in ℝ, as {n} is a closed set for each n ∈ ℤ.

The reason we are interested in closed sets is the following result, which shows that closed sets contain all of their limits.

Theorem 3.2.12. A set F ⊆ ℝ is closed if and only if whenever (a_n)_{n≥1} is a convergent sequence of real numbers with a_n ∈ F for all n ∈ ℕ, then lim_{n→∞} a_n ∈ F.

Proof. We divide the proof into two cases: either F is closed, or F is not closed.

Suppose F ⊆ ℝ is a closed set. Let (a_n)_{n≥1} be a convergent sequence of real numbers with a_n ∈ F for all n ∈ ℕ and let L = lim_{n→∞} a_n. Suppose L ∉ F. Hence L ∈ F^c. Since F is closed, F^c is open. Therefore L ∈ F^c implies there exists an ε > 0 such that (L − ε, L + ε) ⊆ F^c. However, since L = lim_{n→∞} a_n, there exists an N ∈ ℕ such that a_N ∈ (L − ε, L + ε) ⊆ F^c. Hence a_N ∈ F^c and a_N ∈ F, which is a contradiction. Therefore, it must be the case that L ∈ F.

Suppose F is not a closed set. Hence F^c is not an open set. Therefore there exists an L ∈ F^c such that for each n ∈ ℕ there exists a number a_n ∈ (L − 1/n, L + 1/n) with a_n ∉ F^c. Hence (a_n)_{n≥1} is a sequence of real numbers with a_n ∈ F and

L − 1/n ≤ a_n ≤ L + 1/n

for all n ∈ ℕ.
Therefore, by the Squeeze Theorem (Theorem 2.3.8), (a_n)_{n≥1} converges to L. Since a_n ∈ F for all n ∈ ℕ and L ∉ F, the proof is complete.

Example 3.2.13. The set

X = {1/n | n ∈ ℕ}

is not closed since 1/n ∈ X for all n ∈ ℕ and 0 = lim_{n→∞} 1/n, yet 0 ∉ X. However, one can show that

{1/n | n ∈ ℕ} ∪ {0}

is closed by showing that every convergent sequence whose elements are in this set (of which there are lots) converges to a number in this set.

3.3 Compactness

There are sets, known as compact sets, that are nicer than closed sets in many ways. In particular, compact sets are far more useful in future studies.

3.3.1 Definition of Compactness

To define the notion of a compact set, we will need the following definition.

Definition 3.3.1. Let X ⊆ ℝ. A collection {U_i | i ∈ I} of subsets of ℝ is said to be an open cover of X if U_i is open for all i ∈ I and X ⊆ ⋃_{i∈I} U_i.

For example, if for each n ∈ ℕ we let U_n = (−n, n), then {U_n | n ∈ ℕ} is an open cover of ℝ (and of any subset of ℝ). In addition, {(1/n, 1) | n ∈ ℕ} is an open cover of (0, 1).

Definition 3.3.2. A set K ⊆ ℝ is said to be compact if every open cover of K has a finite subcover; that is, if {U_i | i ∈ I} is an open cover of K, then there exist an n ∈ ℕ and i_1, . . . , i_n ∈ I such that K ⊆ ⋃_{k=1}^n U_{i_k}.

Example 3.3.3. Let

K = {0} ∪ {1/n | n ∈ ℕ}.

We claim that K is a compact set. To see this, let {U_i | i ∈ I} be any open cover of K. Since 0 ∈ ⋃_{i∈I} U_i, there exists an i_0 ∈ I such that 0 ∈ U_{i_0}. Therefore there exists an ε > 0 so that (−ε, ε) ⊆ U_{i_0}.

Since lim_{n→∞} 1/n = 0, there exists an N ∈ ℕ such that 1/n ∈ (−ε, ε) ⊆ U_{i_0} for all n ≥ N. Furthermore, since K ⊆ ⋃_{i∈I} U_i, for each n < N we may choose an i_n ∈ I such that 1/n ∈ U_{i_n}. Hence, by construction,

K ⊆ ⋃_{k=0}^{N−1} U_{i_k},

so {U_{i_0}, . . . , U_{i_{N−1}}} is a finite subcover of K.

It is natural to ask whether ℝ is compact. Since the open cover {(−n, n) | n ∈ ℕ} of ℝ clearly has no finite subcover, we see that ℝ is not compact.
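The argument of Example 3.3.3 is effectively an algorithm: one interval swallows 0 together with all but finitely many of the points 1/n, and each remaining point needs only one more interval. A sketch (the function, the cover format as pairs of interval endpoints, and the assumption that the input genuinely covers K are all my own):

```python
from fractions import Fraction as F

def finite_subcover(cover):
    """Mimic Example 3.3.3: given an open cover of K = {0} u {1/n : n in N}
    by open intervals (a, b), return a finite subcover as a list of intervals.
    Assumes `cover` (an iterable of endpoint pairs) really covers K."""
    cover = list(cover)

    def contains(iv, x):
        return iv[0] < x < iv[1]

    # Step 1: an interval around 0 swallows all 1/n below its right endpoint.
    base = next(iv for iv in cover if contains(iv, 0))
    eps = base[1]
    chosen = [base]
    # Step 2: the finitely many points 1/n >= eps each need one interval.
    n = 1
    while F(1, n) >= eps:
        point = F(1, n)
        if not any(contains(iv, point) for iv in chosen):
            chosen.append(next(iv for iv in cover if contains(iv, point)))
        n += 1
    return chosen

cover = [(F(-1, 10), F(1, 10))] + [
    (F(1, n) - F(1, 100), F(1, n) + F(1, 100)) for n in range(1, 20)
]
sub = finite_subcover(cover)
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point ambiguity about interval membership.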
Using the same open cover, we obtain the following.

Theorem 3.3.4. If K ⊆ ℝ is compact, then K is bounded.

Proof. Let K ⊆ ℝ be compact. For each n ∈ ℕ, let U_n = (−n, n). Since ⋃_{n≥1} U_n = ℝ, we have that {U_n | n ∈ ℕ} is an open cover of K. Since K is compact, there exist numbers k_1, . . . , k_m ∈ ℕ such that {U_{k_1}, . . . , U_{k_m}} is an open cover of K. Therefore, if M = max{k_1, . . . , k_m}, then

K ⊆ ⋃_{j=1}^m U_{k_j} = (−M, M).

Hence K is bounded.

Analyzing the open cover {(1/n, 1) | n ∈ ℕ} of (0, 1), we see that this open cover has no finite subcover and thus (0, 1) is not compact. Using similar ideas, we obtain the following.

Theorem 3.3.5. If K ⊆ ℝ is compact, then K is closed.

Proof. Let K ⊆ ℝ be compact. Suppose K is not closed. By Theorem 3.2.12, there exists a convergent sequence (a_n)_{n≥1} such that a_n ∈ K for all n ∈ ℕ yet L = lim_{n→∞} a_n ∉ K. We will use the sequence (a_n)_{n≥1} to obtain a contradiction to the fact that K is compact.

For each n ∈ ℕ let U_n = [L − 1/n, L + 1/n]^c. Notice that each U_n is open and

⋃_{n≥1} U_n = ℝ \ {L}.

Hence, as L ∉ K, {U_n | n ∈ ℕ} is an open cover of K. We claim that {U_n | n ∈ ℕ} does not contain a finite subcover of K. To see this, suppose to the contrary that U_{k_1}, . . . , U_{k_m} is a finite subcover of K for some k_1, . . . , k_m ∈ ℕ. Let M = max{k_1, . . . , k_m}. Then

K ⊆ ⋃_{j=1}^m U_{k_j} ⊆ [L − 1/M, L + 1/M]^c.

Thus (L − 1/M, L + 1/M) ⊆ K^c. However, since L = lim_{n→∞} a_n, there exists an N ∈ ℕ such that a_N ∈ (L − 1/M, L + 1/M). Therefore, as a_N ∈ K by assumption and we have demonstrated that a_N ∉ K, we have obtained a contradiction. Therefore {U_n | n ∈ ℕ} does not contain a finite subcover of K, thereby contradicting the fact that K is compact. Therefore, as we have obtained a contradiction, it must be the case that K is closed.

3.3.2 The Heine-Borel Theorem

Theorems 3.3.4 and 3.3.5 show that compact subsets of ℝ are closed and bounded.
In fact, the following theorem shows these are the only compact subsets of ℝ.

Theorem 3.3.6 (The Heine-Borel Theorem). A set K ⊆ ℝ is compact if and only if K is closed and bounded.

Proof. If K is a compact subset of ℝ, then K is bounded and closed by Theorems 3.3.4 and 3.3.5 respectively.

Let K ⊆ ℝ be closed and bounded. To see that K is compact, let {U_i | i ∈ I} be an arbitrary open cover of K. We claim that {U_i | i ∈ I} has a finite subcover of K. To see this, suppose to the contrary that {U_i | i ∈ I} does not have a finite subcover of K.

Since K is bounded, there exists an M ∈ ℝ such that K ⊆ [−M, M]. Since {U_i | i ∈ I} is an open cover that does not have a finite subcover of K, either K ∩ [−M, 0] or K ∩ [0, M] does not have a finite subcover (as if each had a finite subcover, then combining the finite subcovers would yield a finite subcover of K). Choose I_1 = [a_1, b_1] from {[−M, 0], [0, M]} so that K ∩ I_1 does not have a finite subcover. Note that |b_1 − a_1| = M.

Using the ideas in the above paragraph, there must exist closed intervals I_1 ⊇ I_2 ⊇ I_3 ⊇ · · · such that K ∩ I_n does not have a finite subcover for all n ∈ ℕ and, if I_n = [a_n, b_n], then |b_n − a_n| = M/2^{n−1}. Since K ∩ I_n does not have a finite subcover for all n ∈ ℕ, K ∩ I_n ≠ ∅ for all n ∈ ℕ. Hence, for each n ∈ ℕ, we can choose a c_n ∈ K ∩ I_n.

We claim that the sequence (c_n)_{n≥1} is Cauchy. To see this, let ε > 0 be arbitrary. Since lim_{n→∞} 1/2^n = 0 (see the homework), there exists an N ∈ ℕ such that M/2^{N−1} < ε. Therefore, if n, m ≥ N, then c_n, c_m ∈ I_N, so

|c_n − c_m| ≤ |b_N − a_N| = M/2^{N−1} < ε.

Hence, as ε > 0 was arbitrary, (c_n)_{n≥1} is Cauchy. Hence L = lim_{n→∞} c_n exists by the Completeness of the Real Numbers (Theorem 3.1.5). Furthermore, since K is closed by assumption, L ∈ K by Theorem 3.2.12. Furthermore, note that L ∈ I_n for all n ∈ ℕ by Theorem 3.2.12 since I_n is closed and c_m ∈ I_n for all m ≥ n.
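The interval-halving in this proof is the same mechanism as the bisection method: nested closed intervals of length M/2^{n−1} produce a Cauchy sequence whose limit is supplied by completeness. As a sketch of that mechanism only (not of the proof itself; the predicate and names are my own), here bisection keeps whichever half contains a root of x² − 2, and the nested intervals close in on √2:

```python
def bisect_nested(pred, a, b, steps=60):
    """Repeatedly halve [a, b], keeping the half where `pred` holds,
    mirroring the nested intervals I_1 ⊇ I_2 ⊇ ... of length M / 2^(n-1).
    The midpoints form a Cauchy sequence; completeness supplies the limit."""
    for _ in range(steps):
        mid = (a + b) / 2.0
        if pred(a, mid):
            b = mid
        else:
            a = mid
    return (a + b) / 2.0

# pred keeps the half where x^2 - 2 changes sign, so the common point of
# the nested intervals is sqrt(2).
root = bisect_nested(lambda lo, hi: (lo * lo - 2) * (hi * hi - 2) <= 0, 1.0, 2.0)
```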
Since {U_i | i ∈ I} is an open cover of K, there exists an i_0 ∈ I so that L ∈ U_{i_0}. Hence, since U_{i_0} is open, there exists an ε > 0 so that (L − ε, L + ε) ⊆ U_{i_0}. Since lim_{n→∞} 1/2^n = 0, there exists an N ∈ ℕ such that |b_N − a_N| = M/2^{N−1} < ε. Hence, since L ∈ I_N so that a_N ≤ L ≤ b_N, it must be the case that I_N ⊆ (L − ε, L + ε) ⊆ U_{i_0}. Hence {U_{i_0}} is a finite subcover of K ∩ I_N, which contradicts the fact that K ∩ I_N does not admit a finite subcover. Hence we have obtained a contradiction, so {U_i | i ∈ I} must admit a finite subcover of K. Since {U_i | i ∈ I} was an arbitrary open cover of K, K is compact by definition.

The Heine-Borel Theorem shows us that compact subsets of ℝ are precisely the closed and bounded sets. In other topological spaces, this need not be the case. However, compact subsets in those spaces play the same role as closed, bounded sets play in this course.

3.3.3 Sequential Compactness

In topological spaces, there are other notions of compactness. In particular, the following is a notion which requires that each sequence in the set have a convergent subsequence whose limit is in the set.

Definition 3.3.7. A set K ⊆ ℝ is said to be sequentially compact if whenever (a_n)_{n≥1} is a sequence of real numbers with a_n ∈ K for all n ∈ ℕ, then there exists a subsequence of (a_n)_{n≥1} that converges to an element of K.

Perhaps unsurprisingly, sequential compactness and compactness are the same notion for subsets of the real numbers.

Theorem 3.3.8. A set K ⊆ ℝ is sequentially compact if and only if K is compact.

Proof. Suppose K is compact. Thus K is closed and bounded by the Heine-Borel Theorem (Theorem 3.3.6). Let (a_n)_{n≥1} be an arbitrary sequence of real numbers with a_n ∈ K for all n ∈ ℕ. Thus (a_n)_{n≥1} must be bounded and thus has a convergent subsequence (a_{k_n})_{n≥1} by the Bolzano-Weierstrass Theorem (Theorem 2.4.5). Since a_n ∈ K for all n ∈ ℕ and since K is closed, the limit of (a_{k_n})_{n≥1} must be in K.
Hence, as (a_n)_{n≥1} was arbitrary, K is sequentially compact.

Suppose K is sequentially compact. We claim that K must be bounded. Indeed, if K is not bounded above, then for all n ∈ ℕ there exists an a_n ∈ K such that a_n ≥ n. Therefore, since every subsequence of (a_n)_{n≥1} is unbounded and thus cannot converge by Proposition 2.2.2, (a_n)_{n≥1} does not have a convergent subsequence. As this contradicts the fact that K is sequentially compact, we must have that K is bounded.

Next we claim that K is closed. Indeed, if K is not closed, then by Theorem 3.2.12 there exists a convergent sequence (a_n)_{n≥1} with a_n ∈ K for all n ∈ ℕ such that lim_{n→∞} a_n ∉ K. Therefore Proposition 2.4.2 implies that every subsequence of (a_n)_{n≥1} converges to lim_{n→∞} a_n ∉ K. As this contradicts the fact that K is sequentially compact, we must have that K is closed.

Hence K being sequentially compact implies K is closed and bounded. Hence the Heine-Borel Theorem (Theorem 3.3.6) implies that K is compact.

3.3.4 The Finite Intersection Property

There is one additional notion related to compactness that is particularly useful. To begin, we need a method for taking a set X and making the smallest possible closed set containing X.

Definition 3.3.9. The closure of a subset X of ℝ, denoted X̄, is the smallest closed subset of ℝ containing X.

By Proposition 3.2.10, if

F_X = {Y ⊆ ℝ | X ⊆ Y, Y is closed},

then

X̄ = ⋂_{Y ∈ F_X} Y.

Consequently, we obtain that the closure of (0, 1) is [0, 1]. Furthermore, the closure of a closed set X is just X.

Example 3.3.10. The closure of the rational numbers in the real numbers is the real numbers; that is, the closure of ℚ is ℝ. To see this, we note each element of ℝ is a limit of elements of ℚ (by the homework). Consequently, by Theorem 3.2.12, the only closed set containing ℚ is ℝ.

Generalizing the idea in the above example, we obtain the following characterization of the closure of a set of real numbers.

Lemma 3.3.11. Let X ⊆ ℝ.
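The lemma stated next characterizes closure points as points at zero distance from the set, and this can be probed numerically. A rough sketch (truncating the infinite set to finitely many points is my own simplification): for X = {1/n | n ∈ ℕ}, the distance from 0 to X shrinks with the truncation, reflecting 0 ∈ X̄, while a point like 0.55 stays a fixed distance away.

```python
def dist_to_set(z, points):
    """Distance from z to a finite set of points; z lies in the closure of
    the (truncated) set exactly when this infimum is (near) zero."""
    return min(abs(z - x) for x in points)

reciprocals = [1.0 / n for n in range(1, 10**4 + 1)]
d0 = dist_to_set(0.0, reciprocals)       # shrinks as the truncation grows
d_half = dist_to_set(0.55, reciprocals)  # bounded away from zero
```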
If z ∈ X̄, then for all ε > 0 there exists an x ∈ X so that |x − z| < ε.

Proof. Let z ∈ X̄. Suppose to the contrary that there exists an ε > 0 so that |x − z| ≥ ε for all x ∈ X. Then (z − ε, z + ε) ∩ X = ∅. Hence

X ⊆ (−∞, z − ε] ∪ [z + ε, ∞).

Since (−∞, z − ε] ∪ [z + ε, ∞) is a closed set containing X, we have

X̄ ⊆ (−∞, z − ε] ∪ [z + ε, ∞)

by the definition of the closure. As this contradicts the fact that z ∈ X̄, the result follows.

To describe the final property related to compactness, we require the following definition.

Definition 3.3.12. A collection {A_i | i ∈ I} of subsets of ℝ is said to have the finite intersection property if whenever J ⊆ I has a finite number of elements, ⋂_{j∈J} A_j ≠ ∅.

Theorem 3.3.13. A set K ⊆ ℝ is compact if and only if whenever {F_i | i ∈ I} is a collection of closed subsets of K with the finite intersection property, then ⋂_{i∈I} F_i ≠ ∅.

Proof. Let K be a compact subset of ℝ. Let {F_i | i ∈ I} be a collection of closed subsets of K with the finite intersection property. We must show that ⋂_{i∈I} F_i ≠ ∅. Suppose to the contrary that ⋂_{i∈I} F_i = ∅.

For each i ∈ I, let U_i = F_i^c. Then

⋃_{i∈I} U_i = ⋃_{i∈I} F_i^c = (⋂_{i∈I} F_i)^c = ∅^c = ℝ

by the homework. Hence {U_i | i ∈ I} is an open cover of K. However, for any n ∈ ℕ and i_1, . . . , i_n ∈ I, we have that

⋃_{m=1}^n U_{i_m} = ⋃_{m=1}^n F_{i_m}^c = (⋂_{m=1}^n F_{i_m})^c.

However, by the assumptions on {F_i | i ∈ I},

∅ ≠ ⋂_{m=1}^n F_{i_m} ⊆ K, so K ⊄ (⋂_{m=1}^n F_{i_m})^c.

Thus {U_i | i ∈ I} is an open cover of K without any finite subcovers. As this contradicts the fact that K is compact, it must have been the case that ⋂_{i∈I} F_i ≠ ∅.

For the other direction, we will show that K is sequentially compact, which implies K is compact by Theorem 3.3.8. To see that K is sequentially compact, let (a_n)_{n≥1} be an arbitrary sequence of real numbers such that a_n ∈ K for all n ∈ ℕ. Furthermore, for each m ∈ ℕ, let F_m denote the closure of {a_n | n ≥ m}.
Therefore F = {F_n | n ∈ ℕ} is a collection of closed subsets of K (closed as we took closures, and subsets of K since a_n ∈ K for all n ∈ ℕ and by the definition of the closure). Furthermore, F has the finite intersection property since if n_1, . . . , n_m ∈ ℕ and k = max{n_1, . . . , n_m}, then

⋂_{j=1}^m F_{n_j} = F_k.

Therefore, by the assumptions in this direction of the proof, there exists an L ∈ ℝ such that L ∈ ⋂_{n≥1} F_n. Since F_n ⊆ K for all n ∈ ℕ, we obtain that L ∈ K.

We claim there exists a subsequence of (a_n)_{n≥1} that converges to L. To construct such a subsequence, first note that L ∈ F_1, the closure of {a_n | n ≥ 1}. Hence, Lemma 3.3.11 implies there exists a k_1 ∈ ℕ such that |a_{k_1} − L| < 1. Now, suppose we have constructed k_1 < k_2 < · · · < k_n such that |a_{k_j} − L| < 1/j for all 1 ≤ j ≤ n. To construct k_{n+1}, we note that L ∈ F_{k_n + 1}, the closure of {a_m | m ≥ k_n + 1}. Hence, Lemma 3.3.11 implies there exists a k_{n+1} ∈ ℕ such that k_{n+1} > k_n and |a_{k_{n+1}} − L| < 1/(n+1).

By recursion, we obtain a subsequence (a_{k_n})_{n≥1} of (a_n)_{n≥1} such that |a_{k_n} − L| < 1/n for all n ∈ ℕ. Therefore, since lim_{n→∞} 1/n = 0, we obtain that (a_{k_n})_{n≥1} converges to L ∈ K. Therefore, since (a_n)_{n≥1} was an arbitrary sequence of real numbers such that a_n ∈ K for all n ∈ ℕ, we obtain that K is sequentially compact by definition. Hence the proof is complete.

Chapter 4

Cardinality of Sets

In the previous chapter, it was shown in Theorem 3.2.7 that every open subset of ℝ is a union of open intervals. In fact, as we can choose the intervals to have empty intersection and as we can choose one rational number from each interval, each open subset of ℝ is a union of open intervals indexed by a subset of the rational numbers. The question is, "How many rational numbers are there?" This question leads us to the notion of the cardinality of a set; that is, a measure of how many elements a set contains. In particular, we will see that there are different types of infinities.
This notion of various types of infinities was the life's work of the mathematician Cantor and eventually drove him insane. Thus we should tread carefully.

4.1 Functions

To discuss how we can compare the size of two sets, we must introduce a mathematical object that we have avoided until this point: functions.

4.1.1 The Axiom of Choice

The most useful and accurate method for defining functions is to use the following operation on sets.

Definition 4.1.1. Given two non-empty sets X and Y, the Cartesian product of X and Y, denoted X × Y, is the set

X × Y = {(x, y) | x ∈ X, y ∈ Y}.

Definition 4.1.2. Given two non-empty sets X and Y, a function f from X to Y, denoted f : X → Y, is a subset S of X × Y such that for each x ∈ X there is a unique element, denoted f(x) ∈ Y, such that (x, f(x)) ∈ S (that is, a function is defined by its graph).

Many of the 'operations' and 'relations' discussed in previous chapters of these notes are actually functions in disguise.

Example 4.1.3. Notice f, g : ℝ × ℝ → ℝ defined by f((x, y)) = x + y and g((x, y)) = xy are functions. Hence the operations of addition and multiplication on ℝ are functions.

Example 4.1.4. Sequences of real numbers are actually functions. Indeed, each sequence (a_n)_{n≥1} can be described via a function f : ℕ → ℝ where f(n) = a_n. Conversely, given a function f : ℕ → ℝ, we may define the sequence (f(n))_{n≥1}.

In the subsequent two examples, we remind the reader of two mathematical objects that will be essential in that which follows.

Example 4.1.5. Let X be a non-empty set and let ⪯ be a partial ordering (Definition 1.3.3) on X. Notice we can define a function f : X × X → {0, 1} by

f((x_1, x_2)) = 1 if x_1 ⪯ x_2, and f((x_1, x_2)) = 0 otherwise.

Notice the fact that ⪯ is a partial ordering implies that

• f((x, x)) = 1 for all x ∈ X,
• if f((x, y)) = f((y, x)) = 1, then x = y, and
• if f((x, y)) = f((y, z)) = 1, then f((x, z)) = 1.
Conversely, if g : X × X → {0, 1} has the above three properties, then we can define a partial ordering ⪯ on X by x_1 ⪯ x_2 if and only if g((x_1, x_2)) = 1. Consequently, a partial ordering on X can be viewed as a function on X × X with specific properties. Furthermore, said ordering is a total ordering precisely when either f((x, y)) = 1 or f((y, x)) = 1 for all x, y ∈ X.

Example 4.1.6. Let X be a non-empty set and let ∼ be an equivalence relation (Definition 3.2.6) on X. Notice we can define a function f : X × X → {0, 1} by

f((x_1, x_2)) = 1 if x_1 ∼ x_2, and f((x_1, x_2)) = 0 otherwise.

Notice the fact that ∼ is an equivalence relation implies that

• f((x, x)) = 1 for all x ∈ X,
• if f((x, y)) = 1, then f((y, x)) = 1 for all x, y ∈ X, and
• if f((x, y)) = f((y, z)) = 1, then f((x, z)) = 1.

Conversely, if g : X × X → {0, 1} has the above three properties, then we can define an equivalence relation ∼ on X by x_1 ∼ x_2 if and only if g((x_1, x_2)) = 1. Consequently, an equivalence relation on X can be viewed as a function on X × X with specific properties.

Example 4.1.7. Given two non-empty sets X and Y, there is a natural way to view

X × Y = {f : {1, 2} → X ∪ Y | f(1) ∈ X, f(2) ∈ Y}.

Indeed, a function f : {1, 2} → X ∪ Y is uniquely determined by the values f(1) and f(2). Consequently, an f : {1, 2} → X ∪ Y as in the above set can be viewed as the pair (f(1), f(2)). Conversely, a pair (x, y) ∈ X × Y can be represented by the function f : {1, 2} → X ∪ Y defined by f(1) = x and f(2) = y.

In Examples 4.1.5, 4.1.6, and 4.1.7, we have exhibited equivalent ways of looking at a single object. In doing so, we have created a nice correspondence between the various forms of the objects. Fully describing what we mean by such a correspondence will be postponed to the next subsection.

For now, we desire to extend the notion of products of sets. Let X_1, . . . , X_n be non-empty sets. We define the product of these sets to be

X_1 × · · · × X_n = {(x_1, . . .
, x_n) | x_j ∈ X_j for all j ∈ {1, . . . , n}}.

Notice we can view X_1 × · · · × X_n as a set of functions in a similar manner to Example 4.1.7. Indeed,

X_1 × · · · × X_n = {f : {1, . . . , n} → ⋃_{k=1}^n X_k | f(j) ∈ X_j for all j ∈ {1, . . . , n}}.

But what happens if we want to take a product of an infinite number of sets? Given a non-empty set I and a collection of non-empty sets {X_i | i ∈ I}, we would like to define the product

∏_{i∈I} X_i = {f : I → ⋃_{i∈I} X_i | f(k) ∈ X_k for all k ∈ I}.

However, we must ask, "Is the above set non-empty?" That is, how do we know there is always such a function? The answer is: because we add an axiom to make it so.

Axiom 4.1.8 (The Axiom of Choice). Given a non-empty set I and a collection of non-empty sets {X_i | i ∈ I}, the product ∏_{i∈I} X_i is non-empty. Any function f ∈ ∏_{i∈I} X_i is called a choice function.

One may ask, "Why, Mr. Anderson? Why? Why do we include the Axiom of Choice?" The short answer is, of course, "Because I choose to." It turns out that the Axiom of Choice is independent of the axioms of (Zermelo–Fraenkel) set theory. This means that if one starts with the standard axioms of set theory, one can neither prove nor disprove the Axiom of Choice. Thus we have the option of whether to include or exclude the Axiom of Choice from our theory. We will allow the use of the Axiom of Choice. In fact, we have already used a form of the Axiom of Choice (called countable choice) when constructing sequences in the past two chapters.

Of course, mathematicians like to see what can be done if one excludes certain assumptions from their theories. By allowing only certain forms of the Axiom of Choice, one obtains various forms of calculus where some of the results in these notes are either false or far more difficult to prove. But let's choose to make the correct decision and include the Axiom of Choice.
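The identification of tuples with functions on the index set can be made concrete for finite products. A sketch (the dictionary-as-function encoding and the function name are my own): each element of X_1 × · · · × X_n is realized as a "function" j ↦ x_j with x_j drawn from the j-th factor.

```python
from itertools import product

def product_as_functions(*factors):
    """Realize X_1 x ... x X_n as the set of functions
    f : {1, ..., n} -> union of the X_k with f(j) in X_j,
    each f encoded as a dict mapping index j to f(j)."""
    return [dict(zip(range(1, len(factors) + 1), choice))
            for choice in product(*factors)]

fns = product_as_functions({'a', 'b'}, {0, 1})
# Each element is a "function": for f in fns, f[1] is f(1) in X_1
# and f[2] is f(2) in X_2, recovering the pair (f(1), f(2)).
```

No Axiom of Choice is needed here, of course; for finite index sets the product is non-empty by ordinary induction, which is exactly why the axiom only becomes interesting for infinite I.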
4.1.2 Bijections

As we need to deal with functions throughout the remainder of the course, we will need some notation and definitions. Given a function f : X → Y and an A ⊆ X, we define

f(A) = {f(x) | x ∈ A} ⊆ Y.

Definition 4.1.9. Given a function f : X → Y, the range of f is f(X).

Using the notion of the range, we can define an important property we may desire our functions to have.

Definition 4.1.10. A function f : X → Y is said to be surjective (or onto) if f(X) = Y; that is, for each y ∈ Y there exists an x ∈ X such that f(x) = y.

To illustrate when a function is surjective or not, consider the following diagrams.

[Diagram omitted: two arrow diagrams of f : X → Y, one surjective and one not surjective.]

Example 4.1.11. Consider the function f : [0, 1] → [0, 2] defined by f(x) = x². Notice f is not surjective since f(x) ≠ 2 for all x ∈ [0, 1]. However, the function g : [0, 1] → [0, 1] defined by g(x) = x² is surjective. Consequently, the target set (known as the co-domain) matters.

One useful tool when dealing with functions is to be able to describe all points in the initial space that map into a predetermined set. Thus we make the following definition.

Definition 4.1.12. Given a function f : X → Y and a B ⊆ Y, the preimage of B under f is the set

f⁻¹(B) = {x ∈ X | f(x) ∈ B} ⊆ X.

Note the notation used for the preimage does not assume the existence of an inverse of f (see Theorem 4.1.16).

Using preimages, we can define an important property we may desire our functions to have.

Definition 4.1.13. A function f : X → Y is said to be injective (or one-to-one) if for all y ∈ Y the preimage f⁻¹({y}) has at most one element; that is, if x_1, x_2 ∈ X are such that f(x_1) = f(x_2), then x_1 = x_2.

To illustrate when a function is injective or not, consider the following diagrams.

[Diagram omitted: two arrow diagrams of f : X → Y, one injective and one not injective.]

Example 4.1.14. Consider the function f : [−1, 1] → [0, 1] defined by f(x) = x². Notice f is not injective since f(−1) = f(1).
However, the function g : [0, 1] → [0, 1] defined by g(x) = x² is injective. Consequently, the initial set (known as the domain) matters.

We desire to combine the notions of injective and surjective.

Definition 4.1.15. A function f : X → Y is said to be a bijection if f is injective and surjective.

Using the above examples, we have seen several functions that are not bijective. Furthermore, we have seen that f : [0, 1] → [0, 1] defined by f(x) = x² is bijective. One way to observe that f is injective is to consider the function g : [0, 1] → [0, 1] defined by g(x) = √x. Notice that f and g 'undo' what the other function does. In fact, this is true of all bijections.

Theorem 4.1.16. A function f : X → Y is a bijection if and only if there exists a function g : Y → X such that

• g(f(x)) = x for all x ∈ X, and
• f(g(y)) = y for all y ∈ Y.

Furthermore, if f is a bijection, there is exactly one function g : Y → X that satisfies these properties, which is called the inverse of f and is denoted by f⁻¹ : Y → X. Notice this implies f⁻¹ is also a bijection with (f⁻¹)⁻¹ = f.

Proof. Suppose that f is a bijection. Since f is surjective, for each y ∈ Y there exists a zy ∈ X such that f(zy) = y. Furthermore, note zy is the unique element of X that f maps to y since f is injective. Define g : Y → X by g(y) = zy. Clearly g is a well-defined function.

To see that g satisfies the two properties, first let x ∈ X be arbitrary. Then y = f(x) ∈ Y. However, since f(zy) = y = f(x), it must be the case that zy = x as f is injective. Therefore g(f(x)) = g(y) = zy = x as desired. For the second property, let y ∈ Y be arbitrary. Then f(g(y)) = f(zy) = y by the definition of zy. Hence g satisfies the desired properties.

Conversely, suppose g : Y → X satisfies the two properties. To see that f is injective, suppose x1, x2 ∈ X are such that f(x1) = f(x2). Then x1 = g(f(x1)) = g(f(x2)) = x2 as desired.
To see that f is surjective, let y ∈ Y be arbitrary. Then g(y) ∈ X, so y = f(g(y)) ∈ f(X). Since y ∈ Y is arbitrary, we have Y ⊆ f(X). Hence f(X) = Y, so f is surjective. Therefore, as f is both injective and surjective, f is bijective by definition.

Finally, suppose f is bijective and g : Y → X satisfies the above properties. Suppose h : Y → X is another function such that h(f(x)) = x for all x ∈ X and f(h(y)) = y for all y ∈ Y. Then for all y ∈ Y, h(y) = g(f(h(y))) = g(y) (where we have used g(f(x1)) = x1 with x1 = h(y), together with f(h(y)) = y). Therefore g = h as desired.

Remark 4.1.17. If f : X → Y is injective, consider the function g : X → f(X) defined by g(x) = f(x) for all x ∈ X. Clearly g is injective since f is, and, by construction, g is surjective. Hence g is bijective and thus has an inverse g⁻¹ : f(X) → X. The function g⁻¹ is called the inverse of f on its image.

4.2 Cardinality

We turn our attention to trying to determine how large a given set is. For finite sets, we can count the number of elements to determine whether two sets have the same number of elements or whether one set has more elements than the other. The problem is, "How do we count the number of elements in an infinite set?"

4.2.1 Definition of Cardinality

An alternative way to look at the above problem is to use functions. For example, one way to see that {1, 2, 3} and {5, π, 42} have the same number of elements is that we can pair up the elements, via {(1, 5), (3, π), (2, 42)} for example. However, we can see that {1, 2, 3} and {5, π, 42, 29} do not have the same number of elements since there is no such pairing. Saying that there is such a pairing is precisely saying that there exists a bijection from one set to the other. Consequently, we define a relation ∼ on the 'collection' of all sets by X ∼ Y if and only if there exists a bijection f : X → Y. Notice that ∼ 'is' an equivalence relation.
Indeed, to see that ∼ satisfies the properties in Definition 3.2.6, first notice that X ∼ X as the function f : X → X defined by f(x) = x for all x ∈ X is a bijection. Next, if f : X → Y is a bijection, then f⁻¹ : Y → X is a bijection, so X ∼ Y implies Y ∼ X. Finally, if X ∼ Y and Y ∼ Z, then there exist bijections f : X → Y and g : Y → Z. If we define h : X → Z to be the composition of f and g, denoted g ∘ f, which is the function defined by h(x) = g(f(x)), it is not difficult to see that h is a bijection (either check h is injective and surjective directly, or check that h⁻¹ = f⁻¹ ∘ g⁻¹), so X ∼ Z.

Consequently, given a set X, we will use |X| to denote the equivalence class of X under the above equivalence relation. As opposed to always referring to this equivalence relation, we make the following definition.

Definition 4.2.1. Given two sets X and Y, it is said that X and Y have the same cardinality (or are equinumerous), denoted |X| = |Y|, if there exists a bijection f : X → Y.

Example 4.2.2. Notice that the sets X = {3, 7, π, 2} and Y = {1, 2, 3, 4} have the same cardinality via the function f : Y → X defined by f(1) = 3, f(2) = π, f(3) = 2, and f(4) = 7.

Example 4.2.3. We claim that |N| = |Z| (which may seem odd as N ⊆ Z). To see this, define f : N → Z by

f(n) = 0 if n = 1,   f(n) = n/2 if n is even,   f(n) = −(n − 1)/2 if n is odd and n ≥ 3.

(For example f(1) = 0, f(2) = 1, f(3) = −1, f(4) = 2, f(5) = −2, etc.) It is not difficult to verify that f is a bijection.

Using bijections gives us a method for determining when two sets have the same size. However, how can we determine when one set has fewer elements than another? We have already seen that {1, 2, 3} and {5, π, 42, 29} do not have the same number of elements. We know that {1, 2, 3} has fewer elements than {5, π, 42, 29}.
One way to see this is that we can define a function from {1, 2, 3} to {5, π, 42, 29} that is as optimal as possible; that is, we try to form a bijective pairing, but we only obtain an injective function as we cannot hit all of the elements of the latter set. Consequently:

Definition 4.2.4. Given two sets X and Y, it is said that X has cardinality less than Y, denoted |X| ≤ |Y|, if there exists an injective function f : X → Y.

Example 4.2.5. Let n, m ∈ N be such that n < m. Then {1, . . . , n} has cardinality less than {1, . . . , m} as f : {1, . . . , n} → {1, . . . , m} defined by f(k) = k is injective.

Example 4.2.6. Since the function f : N → Q defined by f(n) = n is injective, we see that |N| ≤ |Q|. More generally, if X ⊆ Y, then |X| ≤ |Y|.

When determining that {1, 2, 3} has fewer elements than {5, π, 42, 29}, we could have thought of things in a different light. In particular, we could define a function from {5, π, 42, 29} to {1, 2, 3} that was onto. This should imply that {5, π, 42, 29} has more elements than {1, 2, 3}, which is the case by the next result.

Proposition 4.2.7. Let X and Y be non-empty sets. If f : X → Y is surjective, then |Y| ≤ |X|.

Proof. For each y ∈ Y, let Ay = f⁻¹({y}). Since f is surjective, Ay ≠ ∅ for all y ∈ Y. Hence, by the Axiom of Choice (Axiom 4.1.8), there exists a function g ∈ ∏_{y∈Y} Ay; that is, g : Y → ⋃_{y∈Y} Ay ⊆ X is such that g(y) ∈ Ay for all y ∈ Y.

We claim that g is injective. To see this, suppose y1, y2 ∈ Y are such that g(y1) = g(y2). Let x = g(y1) = g(y2) ∈ X. By the properties of g, it must be the case that x ∈ Ay1 and x ∈ Ay2. Since x ∈ Ay1, we must have f(x) = y1 by the definition of Ay1. Similarly, since x ∈ Ay2, we must have f(x) = y2. Therefore y1 = y2 as desired.
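On finite sets, the argument of Proposition 4.2.7 can be carried out explicitly: choosing one preimage for each y produces an injection in the opposite direction. A Python sketch (the function name and the particular sets are our own illustration):

```python
# From a surjection f: X -> Y (given as a dict), build an injection g: Y -> X
# by picking, for each y, one element of the non-empty preimage f^{-1}({y}).

def injection_from_surjection(f, X, Y):
    g = {}
    for y in Y:
        # A_y = f^{-1}({y}) is non-empty since f is surjective; choose any element.
        g[y] = next(x for x in X if f[x] == y)
    return g

X = {1, 2, 3, 4}
Y = {"a", "b"}
f = {1: "a", 2: "a", 3: "b", 4: "b"}        # a surjection X -> Y
g = injection_from_surjection(f, X, Y)
assert len(set(g.values())) == len(Y)        # g is injective
assert all(f[g[y]] == y for y in Y)          # g(y) really lies in A_y
```

For finite sets this choice needs no axiom; the Axiom of Choice enters only when Y may be infinite.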
4.2.2 The Cantor–Schröder–Bernstein Theorem

One natural question is, "Is ≤ a partial ordering (see Definition 1.3.3) on the 'collection' of possible cardinalities?" We want our ordering to be (at least) a partial ordering, as the properties defining a partial ordering are the minimal properties one would like for a 'nice' ordering. Clearly reflexivity and transitivity hold (the composition of injective functions is injective), but does antisymmetry?

In Example 4.2.6, it was shown that |N| ≤ |Q|. However, notice if

P = { m/n | m ≥ 0, n > 0, m and n have no common divisors },
N = { m/n | m < 0, n > 0, m and n have no common divisors },

then P ∩ N = ∅ and P ∪ N = Q. Furthermore, we may define f : Q → N (here N denotes the natural numbers) by

f(q) = 1 if m = 0,   f(q) = 2^m 3^n if m > 0 and n > 0,   f(q) = 5^{−m} 7^n if m < 0 and n > 0,

where q = m/n is the unique way to write q as an element of P or N. Using the uniqueness of prime factorization (something not covered in this course), we see f is an injective function. Hence |Q| ≤ |N|!

As |N| ≤ |Q| and |Q| ≤ |N|, is |Q| = |N|? It seems difficult to construct a bijective function f : N → Q, so what hope do we have? To answer this question, we have the following result (alternatively, we could construct such a function, but it is not nice to define). Notice that if X and Y are sets such that there exist injective functions f : X → Y and g : Y → X, then we may invoke the following theorem with A = g(Y) and B = f(X) to obtain that |X| = |Y|.

Theorem 4.2.8 (Cantor–Schröder–Bernstein Theorem). Let X and Y be non-empty sets. Suppose A ⊆ X and B ⊆ Y are such that there exist bijective functions f : X → B and g : Y → A. Then |X| = |Y|.

Proof. Let A0 = X and A1 = A. Define h = g ∘ f : A0 → A0 by h(x) = g(f(x)). Notice h is injective as f and g are injective. Let A2 = h(A0). Notice A2 = h(A0) = g(f(A0)) = g(B) ⊆ g(Y) = A1. Hence A2 ⊆ A1 ⊆ A0. Next let A3 = h(A1). Then A3 = h(A1) ⊆ h(A0) = A2.
Consequently, if for each n ≥ 2 we recursively define An = h(An−2), then, by recursion (formally, we should apply the Principle of Mathematical Induction), An = h(An−2) ⊆ h(An−3) = An−1 for all n ≥ 3. Hence we have constructed a sequence A0 ⊇ A1 ⊇ A2 ⊇ · · · with An = h(An−2) for all n ≥ 2.

We claim that |A| = |X|. To see this, notice that

X = A0 = (A0 \ A1) ∪ (A1 \ A2) ∪ (A2 \ A3) ∪ (A3 \ A4) ∪ · · · ∪ ⋂_{n=1}^{∞} An,
A = A1 = (A1 \ A2) ∪ (A2 \ A3) ∪ (A3 \ A4) ∪ (A4 \ A5) ∪ · · · ∪ ⋂_{n=1}^{∞} An.

Furthermore, notice that any two distinct sets chosen from either union have empty intersection as A0 ⊇ A1 ⊇ A2 ⊇ · · · . Since h is injective, h(A2n \ A2n+1) = h(A2n) \ h(A2n+1) = A2n+2 \ A2n+3 for all n ∈ N ∪ {0}. Therefore, as the sets in the union description of X are disjoint, we may define h0 : A0 → A1 via

h0(x) = x if x ∈ ⋂_{n=1}^{∞} An,
h0(x) = x if x ∈ A2n−1 \ A2n for some n ∈ N,
h0(x) = h(x) if x ∈ A2n \ A2n+1 for some n ∈ N ∪ {0}.

Since

• h0 maps A2n \ A2n+1 to A2n+2 \ A2n+3 bijectively for all n ∈ N ∪ {0},
• h0 maps A2n−1 \ A2n to A2n−1 \ A2n bijectively for all n ∈ N, and
• h0 maps ⋂_{n=1}^{∞} An to ⋂_{n=1}^{∞} An bijectively,

we obtain that h0 is a bijection (any two distinct sets chosen from either union have empty intersection). Hence |A| = |X| as claimed. However, |A| = |Y| as g : Y → A is a bijection. Hence |Y| = |X| as having equal cardinality is an equivalence relation (see the discussion at the beginning of Subsection 4.2.1).

Since we have shown |N| ≤ |Q| and |Q| ≤ |N|, we have by the Cantor–Schröder–Bernstein Theorem (Theorem 4.2.8) that |N| = |Q|; that is, N and Q have the same number of elements!

4.2.3 Countable and Uncountable Sets

One nice corollary of |N| = |Q| is that we can make a list of all rational numbers; that is, as there is a bijective function f : N → Q, we can form the sequence of all rational numbers (f(n))_{n≥1}.
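Since |N| = |Q|, the rationals really can be listed. The following Python sketch produces one such list; this diagonal enumeration is our own illustrative choice, not the injection from the notes:

```python
from fractions import Fraction
from math import gcd
import itertools

def rationals():
    """Yield every rational exactly once: 0 first, then +/- p/q by diagonals."""
    yield Fraction(0)
    s = 2
    while True:                  # diagonal: all p/q with p + q = s
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:   # lowest terms only, so each rational appears once
                yield Fraction(p, q)
                yield Fraction(-p, q)
        s += 1

first = list(itertools.islice(rationals(), 11))
assert len(set(first)) == 11          # no repeats among the first terms
assert Fraction(1, 2) in first
```

Every rational m/n in lowest terms eventually appears on the diagonal s = |m| + n, so this generator realizes a sequence containing all of Q.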
Consequently, sets that are equinumerous with the natural numbers are particularly nice sets, as we can index such sets by N. This leads us to the study of such sets.

Definition 4.2.9. A set X is said to be

• countable if X is finite or |X| = |N|,
• countably infinite if |X| = |N|,
• uncountable if X is not countable.

A natural question is, "Under what operations is the countability of sets preserved?" The following demonstrates that subsets (and thus intersections) of countable sets are countable.

Lemma 4.2.10. If X is a countable set, then any subset of X must also be countable.

Proof. If |X| = |N|, then there is a bijection between X and N which induces a bijection between the subsets of X and the subsets of N. Thus we may assume that X = N. Using the Well-Ordering Principle, it is not difficult to see that every subset of N is either finite or can be listed as a sequence (and thus is equinumerous with N). Indeed, let Y be a subset of N and choose a1 to be the least element of Y. Then clearly 1 ≤ a1. Next, let a2 be the least element of Y \ {a1}. Hence we must have a2 ≥ 2. Recursively let an be the least element of Y \ {a1, . . . , an−1} (which implies an ≥ n). This process either stops (in which case Y is finite) or continues and must list all of the elements of Y as an ≥ n for all n ∈ N.

The following, which simply stated says that a countable union of countable sets is countable, is a nice example of why it is useful to be able to write countable sets as a sequence.

Theorem 4.2.11. For each n ∈ N, let Xn be a countable set. Then X = ⋃_{n=1}^{∞} Xn is countable.

Proof. We first desire to restrict to the case that our countable sets are disjoint. Let B1 = X1 and for each k ≥ 2 let

Bk = Xk \ ⋃_{j=1}^{k−1} Xj.

Clearly Bk ∩ Bj = ∅ for all j ≠ k and X = ⋃_{n=1}^{∞} Bn. Since Bn ⊆ Xn for all n, each Bn is countable by Lemma 4.2.10. Consequently, for each n ∈ N, we may write Bn = (bn,1, bn,2, bn,3, . . .). We desire to define a function f : X → N by f(bn,m) = 2^n 3^m.
Note such a function is well-defined since Bk ∩ Bj = ∅ for all j ≠ k. Since f is injective by the uniqueness of the prime decomposition of natural numbers, we obtain that |X| ≤ |N|. Hence X is countable.

Corollary 4.2.12. If X and Y are countable sets, X ∪ Y is a countable set.

Proof. Apply Theorem 4.2.11 where X1 = X, X2 = Y, and Xn = ∅ for all n ≥ 3.

In Chapter 1, when we were using set notation to describe sets, we had a hard time 'listing' the real numbers. Thus, one might ask, "Is there a sequence of all real numbers?" We know that |N| ≤ |R| via the injective function f : N → R defined by f(n) = n for all n ∈ N. However, is |N| = |R|? To demonstrate that |N| < |R|, we will use the following.

Theorem 4.2.13. The open interval (0, 1) is uncountable.

Proof. The following proof is known as Cantor's diagonalization argument and has a wide variety of uses. Suppose that (0, 1) is countable. Then we may write (0, 1) = {xn | n ∈ N}. By the homework, there exist numbers {ai,j | i, j ∈ N} ⊆ {0, 1, . . . , 9} such that

xj = lim_{n→∞} Σ_{k=1}^{n} ak,j / 10^k

for all j ∈ N. Note that the sequence (ak,j)_{k≥1} in the above expression for xj represents the decimal expansion of xj; that is, xj = 0.a1,j a2,j a3,j a4,j a5,j · · · . Consequently, this representation need not be unique due to the possibility of repeating 9s (and this is the only reason).

For each k ∈ N, define

yk = 3 if ak,k = 7,   and   yk = 7 otherwise,

and let y = lim_{n→∞} Σ_{k=1}^{n} yk / 10^k. It is not difficult to see that y ∈ (0, 1). Furthermore y ≠ xn for all n ∈ N (as y and xn will disagree in the nth decimal place, and this is not because of repeating 9s). Therefore, since (0, 1) = {xn | n ∈ N}, we must have that y ∉ (0, 1), which contradicts the fact that y ∈ (0, 1).

Proposition 4.2.14. A set containing an uncountable subset is uncountable. Consequently, by Theorem 4.2.13, R is uncountable.

Proof. Let X be a set such that there exists an uncountable subset Y of X. Suppose X was countable.
Then Y would be countable by Lemma 4.2.10, which contradicts the fact that Y is uncountable. Hence X must be uncountable.

Corollary 4.2.15. The set of irrational numbers R \ Q is uncountable.

Proof. Suppose R \ Q is a countable set. Since Q is countable and R = Q ∪ (R \ Q), it would need to be the case that R is countable by Theorem 4.2.11. Since R is uncountable by Proposition 4.2.14, we have obtained a contradiction, so R \ Q is an uncountable set.

Since R is uncountable, |N| < |R|, so there does not exist a list of all real numbers. However, is R the 'smallest' set larger than N? In particular:

Axiom 4.2.16 (The Continuum Hypothesis). If X ⊆ R is uncountable, must it be the case that |X| = |R|?

The Continuum Hypothesis was originally postulated by Cantor, who spent many years (at the cost of his own health and possibly sanity) trying to prove the hypothesis. Consequently, we will not try. In fact, the reason for Cantor's difficulty is that there is no proof. However, nor is there any counterexample. As with the Axiom of Choice, the Continuum Hypothesis is independent of (Zermelo–Fraenkel) set theory, even if the Axiom of Choice is included. Most results in analysis do not require an assertion as to whether the Continuum Hypothesis is true or false. Thus we move on.

4.2.4 Zorn's Lemma

Using the Cantor–Schröder–Bernstein Theorem (Theorem 4.2.8), we saw that cardinality gives a partial ordering on the size of sets. However, is it a total ordering (Definition 1.3.5)? That is, if X and Y are non-empty sets, must it be the case that |X| ≤ |Y| or |Y| ≤ |X|?

The above is a desirable property since it makes the ordering nicer. However, when given two sets, it is not clear whether there always exists an injection from one set to the other. The goal of this subsection is to develop the necessary tools in order to answer this problem in the subsequent subsection. The tools we require are related to partial orderings, so the following definition is made.

Definition 4.2.17.
A partially ordered set (or poset) is a pair (X, ⪯) where X is a non-empty set and ⪯ is a partial ordering on X.

For examples of posets, we refer the reader back to Subsection 1.3.2. Our main focus is a 'result' about totally ordered subsets of partially ordered sets:

Definition 4.2.18. Let (X, ⪯) be a partially ordered set. A non-empty subset Y ⊆ X is said to be a chain if Y is totally ordered with respect to ⪯; that is, if a, b ∈ Y, then either a ⪯ b or b ⪯ a.

Clearly any non-empty subset of a totally ordered set is a chain. Here is a less obvious example.

Example 4.2.19. Recall from Example 1.3.4 that the power set P(R) of R has a partial ordering ⪯ where A ⪯ B ⟺ A ⊆ B. If Y = {An | n ∈ N} ⊆ P(R) is such that An ⊆ An+1 for all n ∈ N, then Y is a chain.

Like our initial study of the real numbers in Chapter 1, upper bounds play an important role with respect to chains.

Definition 4.2.20. Let (X, ⪯) be a partially ordered set. A non-empty subset Y ⊆ X is said to be bounded above if there exists a z ∈ X such that y ⪯ z for all y ∈ Y. Such an element z is said to be an upper bound for Y.

Example 4.2.21. Recall from Example 4.2.19 that if Y = {An | n ∈ N} ⊆ P(R) is such that An ⊆ An+1 for all n ∈ N, then Y is a chain with respect to the partial ordering defined by inclusion. If

A = ⋃_{n=1}^{∞} An,

then clearly A ∈ P(R) and An ⊆ A for all n ∈ N. Hence A is an upper bound for Y.

As in Chapter 1, there are optimal upper bounds of subsets of R, which we called least upper bounds. We saw that least upper bounds need not be in the subset. Thus we desire a slightly different object when it comes to partially ordered sets, as the lack of a total ordering means there may not be a unique 'optimal' upper bound.

Definition 4.2.22. Let X be a non-empty set and let ⪯ be a partial ordering on X. An element x ∈ X is said to be maximal if there does not exist a y ∈ X \ {x} such that x ⪯ y; that is, there is no element of X that is larger than x with respect to ⪯.
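Maximal elements are easy to experiment with on finite posets. A Python sketch using divisibility as the partial ordering (the function names and the example set are our own illustration):

```python
# x is maximal if no *other* element y of X satisfies x ⪯ y.
def maximal_elements(X, leq):
    return {x for x in X if not any(y != x and leq(x, y) for y in X)}

divides = lambda a, b: b % a == 0     # a ⪯ b iff a divides b

X = {2, 3, 4, 5, 6}
# 2 ⪯ 4 and 3 ⪯ 6, so 2 and 3 are not maximal; 4, 5, and 6 are all maximal.
assert maximal_elements(X, divides) == {4, 5, 6}
```

Note that this poset has three distinct maximal elements, in contrast with a totally ordered finite set, which has exactly one.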
Notice that R together with its usual ordering ≤ does not have a maximal element (by, for example, the Archimedean Property). However, many partially ordered sets do have maximal elements. For example, ([0, 1], ≤) has 1 as a maximal element (although ((0, 1), ≤) does not). For an example involving a partial ordering that is not a total ordering, suppose X = {x, y, z, w} and ⪯ is defined such that a ⪯ a for all a ∈ X, a ⪯ b for all a ∈ {x, y} and b ∈ {z, w}, and a ⪯ b for no other pairs (a, b) ∈ X × X. It is not difficult to see that z and w are maximal elements and x and y are not maximal elements. Thus it is possible, when dealing with a partial ordering that is not a total ordering, to have multiple maximal elements.

The result we require for the next subsection may now be stated using the above notions.

Axiom 4.2.23 (Zorn's Lemma). Let (X, ⪯) be a partially ordered set. If every chain in X has an upper bound, then X has a maximal element.

We will not prove Zorn's Lemma. To do so, we would need to use the Axiom of Choice. In fact, Zorn's Lemma and the Axiom of Choice are logically equivalent; that is, assuming the axioms of (Zermelo–Fraenkel) set theory, one may use the Axiom of Choice to prove Zorn's Lemma, and one may use Zorn's Lemma to prove the Axiom of Choice.

4.2.5 Comparability of Cardinals

Using Zorn's Lemma, we may finally demonstrate that the ordering on cardinals is a total ordering.

Theorem 4.2.24. Let X and Y be non-empty sets. Then either |X| ≤ |Y| or |Y| ≤ |X|.

Proof. Let F = {(A, B, f) | A ⊆ X, B ⊆ Y, f : A → B is a bijection}. Notice that F is non-empty since, by assumption, there exist an x ∈ X and a y ∈ Y, so we may select A = {x}, B = {y}, and f : A → B defined by f(x) = y. Given (A1, B1, f1), (A2, B2, f2) ∈ F, define (A1, B1, f1) ⪯ (A2, B2, f2) if and only if A1 ⊆ A2, B1 ⊆ B2, and f2(x) = f1(x) for all x ∈ A1. It is not difficult to verify that ⪯ is a partial ordering on F.
We desire to invoke Zorn's Lemma (Axiom 4.2.23) in order to obtain a maximal element of F. To invoke Zorn's Lemma, it must be demonstrated that every chain in (F, ⪯) has an upper bound. Let C = {(Ai, Bi, fi) | i ∈ I} be an arbitrary chain in (F, ⪯). Let

A = ⋃_{i∈I} Ai   and   B = ⋃_{i∈I} Bi.

We desire to define f : A → B such that f(x) = fi(x) whenever x ∈ Ai. The question is, "Will such an f be well-defined, as each x could be in multiple Ai?" To see that f is well-defined, suppose x ∈ Ai and x ∈ Aj for some i, j ∈ I. Since C is a chain, either (Ai, Bi, fi) ⪯ (Aj, Bj, fj) or (Aj, Bj, fj) ⪯ (Ai, Bi, fi). If (Ai, Bi, fi) ⪯ (Aj, Bj, fj), then Ai ⊆ Aj and the definition of ⪯ implies that fj(x) = fi(x). As the case that (Aj, Bj, fj) ⪯ (Ai, Bi, fi) is the same (reversing i and j), we obtain that f is well-defined.

In order for (A, B, f) to be an upper bound for C, we must first demonstrate that (A, B, f) ∈ F. Clearly A ⊆ X, B ⊆ Y, and f : A → B is a function. It remains to check that f is a bijection.

To see that f is injective, suppose x1, x2 ∈ A are such that f(x1) = f(x2). Since A = ⋃_{i∈I} Ai, there exist i, j ∈ I such that x1 ∈ Ai and x2 ∈ Aj. Since C is a chain, we must have either (Ai, Bi, fi) ⪯ (Aj, Bj, fj) or (Aj, Bj, fj) ⪯ (Ai, Bi, fi). In the former case, we obtain that fj(x1) = f(x1) = f(x2) = fj(x2). Therefore, since fj is injective, it must be the case that x1 = x2. As the case that (Aj, Bj, fj) ⪯ (Ai, Bi, fi) is the same (reversing i and j), we obtain that f is injective.

To see that f is surjective, let y ∈ B be arbitrary. Since B = ⋃_{i∈I} Bi, there exists an i ∈ I such that y ∈ Bi. Since fi is surjective, there exists an x ∈ Ai such that fi(x) = y. Hence x ∈ A and f(x) = fi(x) = y. Therefore, as y was arbitrary, f is surjective. Hence f is a bijection and (A, B, f) ∈ F.
As (A, B, f) ∈ F, it is easy to see that (A, B, f) is an upper bound for C by the definition of (A, B, f) and the partial ordering ⪯. Hence, as C was an arbitrary chain, every chain in F has an upper bound. Thus Zorn's Lemma implies that (F, ⪯) has a maximal element. Let (A0, B0, f0) ∈ F be a maximal element.

We claim that either A0 = X or B0 = Y. To see this, suppose to the contrary that A0 ≠ X and B0 ≠ Y. Then there exist x0 ∈ X \ A0 and y0 ∈ Y \ B0. Let A′ = A0 ∪ {x0}, B′ = B0 ∪ {y0}, and let g : A′ → B′ be defined by g(x0) = y0 and g(x) = f0(x) for all x ∈ A0. Clearly g is a well-defined bijection by construction, so (A′, B′, g) ∈ F. However, it is elementary to see that (A0, B0, f0) ⪯ (A′, B′, g) and (A0, B0, f0) ≠ (A′, B′, g). As this contradicts the fact that (A0, B0, f0) ∈ F is a maximal element, we have obtained a contradiction. Hence either A0 = X or B0 = Y.

If A0 = X, then f0 : X → B0 ⊆ Y is injective, so |X| ≤ |Y| by definition. Otherwise, if B0 = Y, then f0 : A0 → Y is surjective. Choose y ∈ Y and define h : X → Y by

h(x) = f0(x) if x ∈ A0,   and   h(x) = y if x ∉ A0.

Clearly h is a well-defined surjective function, so |Y| ≤ |X| by Proposition 4.2.7.

Chapter 5

Continuity

In the previous chapter, we saw the use of functions in comparing the size of sets. However, there is a vast range of possible applications for functions. In particular, the focus of this chapter is to begin to examine functions from subsets of the real numbers to the real numbers. However, our goal is not to plainly study such functions, but how such functions interact with the properties of the real numbers we have investigated in this course. In particular, we will begin with a focus on limits of functions. This study will lead us to the all-important theory of continuous functions.
5.1 Limits of Functions

To study analytic properties of functions on subsets of the real numbers, we first must modify the definition of the limit of a sequence to be able to examine the limit of a function.

5.1.1 Definition of a Limit

Given a function f : R → R and a point a ∈ R, we desire to describe the behaviour of f(x) as x gets 'closer and closer' to a. However, f(a) exists, so this concept might seem weird; that is, why do we want to know how f behaves as x gets 'closer and closer' to a since we know f(a)? The short answer is that, due to things like fluctuations, f may behave very differently as x gets 'closer and closer' to a than it does at x = a. This leads us to the following heuristic concept.

Heuristic Definition. A number L is said to be the limit of a function f as x tends to a if the values of f(x) approximate L provided that x is very close, but not equal, to a.

Since we are only interested in the behaviour of f as x tends to a, it is not necessary that f(a) be well-defined. Furthermore, we do not need f to be defined on all of R, but just near a; that is, on an open interval that contains a, except for possibly at a. Using this and our heuristic definition, we arrive at our definition of a limit (and the reason this course is often called a first course in ε-δ).

Definition 5.1.1. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. A number L ∈ R is said to be the limit of f as x tends to a if for every ε > 0 there exists a δ > 0 (which depends on ε) such that if 0 < |x − a| < δ then |f(x) − L| < ε. If L is the limit of f as x tends to a, we say the limit of f(x) as x tends to a exists and write L = limx→a f(x). Otherwise we say the limit does not exist.

Note the assumption that f is defined on an open interval I containing a is necessary to ensure that f(x) is well-defined provided 0 < |x − a| < δ and δ is chosen sufficiently small.
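The quantifier structure of Definition 5.1.1 can be probed numerically. The following Python sketch (the function check_limit and its sampling scheme are our own; sampling finitely many points can support, but never prove, a limit claim) tests whether a proposed δ works for a given ε:

```python
def check_limit(f, a, L, eps, delta, samples=1000):
    """Sample points x with 0 < |x - a| < delta; check |f(x) - L| < eps."""
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)      # 0 < h < delta
        for x in (a - h, a + h):
            if abs(f(x) - L) >= eps:
                return False
    return True

# For f(x) = 3x + 1 at a = 2 with L = 7, the choice delta = eps/3 works,
# since |f(x) - 7| = 3|x - 2| < 3(eps/3) = eps.
eps = 0.01
assert check_limit(lambda x: 3 * x + 1, 2.0, 7.0, eps, eps / 3)
```

A True result only says that no sampled point violates the ε-δ condition; the actual proof is the algebraic estimate in the comment.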
As it took some time for us to get used to the ε-N definition of the limit of a sequence, let's provide some examples of checking the ε-δ definition of the limit of a function.

Example 5.1.2. Let f(x) = 3x + 1 for all x ∈ R. Does limx→2 3x + 1 exist? Our intuition says yes; as x tends to 2, we expect f(x) to tend to 3(2) + 1 = 7. To see this using the definition of the limit, let ε > 0 be arbitrary. Let δ = ε/3 > 0. Then if 0 < |x − 2| < δ, then

|f(x) − 7| = |(3x + 1) − 7| = |3x − 6| = 3|x − 2| < 3(ε/3) = ε.

Hence, as ε > 0 was arbitrary, we obtain that limx→2 3x + 1 = 7 by the definition of the limit. Furthermore, if we define

g(x) = 3x + 1 if x ≠ 2,   and   g(x) = 100 if x = 2,

then it is still the case that limx→2 g(x) = 7.

Example 5.1.3. Let f(x) = x² for all x ∈ R. Does limx→3 x² exist? Our intuition says yes; as x tends to 3, we expect f(x) to tend to 3² = 9. To see this using the definition of the limit, let ε > 0 be arbitrary. Let δ = min{1, ε/7} > 0. To see why we chose this δ, we would first do the computation

|x² − 9| = |(x + 3)(x − 3)| = |x + 3||x − 3|.

Thus, we know we can make |x − 3| < δ, so provided we can find an upper bound for |x + 3|, we will be fine. Thus, with our choice of δ, we see that if 0 < |x − 3| < δ, then |x − 3| ≤ 1, so x ∈ [2, 4]. Hence 0 < |x − 3| < δ implies |x + 3| ≤ 7 and thus

|x² − 9| = |x + 3||x − 3| ≤ 7|x − 3| < 7(ε/7) = ε.

Hence, as ε > 0 was arbitrary, we obtain that limx→3 x² = 9 by the definition of the limit.

Example 5.1.4. Let f(x) = x/|x| for x ≠ 0. Does limx→0 f(x) exist? Well, if x > 0 then f(x) = 1, whereas if x < 0 then f(x) = −1. Thus if x is close to 0, it is possible that f(x) is either ±1, so we do not expect the limit to exist. To see this via our definition, suppose L = limx→0 f(x). Let ε = 1 and let δ > 0 be such that |f(x) − L| < ε for all 0 < |x| < δ. Therefore |f(δ/2) − L| < 1 and |f(−δ/2) − L| < 1, which implies |1 − L| < 1 (i.e. L ∈ (0, 2)) and |−1 − L| < 1 (i.e. L ∈ (−2, 0)).
As this is impossible, we see that the limit of f as x tends to 0 does not exist.

As with sequences, we used the word 'the' in the definition of the limit of a function. Again we must demonstrate that this is warranted.

Proposition 5.1.5. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. If L and K are limits of f as x tends to a, then L = K.

Proof. Suppose that L ≠ K. Let ε = |L − K|/2. Since L ≠ K, we know that ε > 0. Since L is a limit of f as x approaches a, we know by the definition of a limit that there exists a δ1 > 0 such that if 0 < |x − a| < δ1 then |f(x) − L| < ε. Similarly, since K is a limit of f as x approaches a, we know by the definition of a limit that there exists a δ2 > 0 such that if 0 < |x − a| < δ2 then |f(x) − K| < ε.

Let δ = min{δ1, δ2} > 0. By the above paragraph, we have that 0 < |x − a| < δ implies |f(x) − L| < ε and |f(x) − K| < ε. Choose x0 ∈ I such that 0 < |x0 − a| < δ (such an x0 exists since I is an open interval containing a). Hence, by the Triangle Inequality,

|L − K| ≤ |L − f(x0)| + |f(x0) − K| < ε + ε = 2ε = |L − K|,

which is absurd (i.e. x < x is false for all x ∈ R). Thus we have obtained a contradiction, so it must be the case that L = K.

Notice the proof of Proposition 5.1.5 is quite similar to that of Proposition 2.1.10, where the limit of a sequence was shown to be unique. Coincidence? I think not! The limit of a function f as x tends to a is intimately connected with the limits of sequences obtained by applying f to sequences converging to a.

Theorem 5.1.6. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. Then L = limx→a f(x) if and only if whenever (xn)_{n≥1} has the properties that xn ≠ a for all n ∈ N and limn→∞ xn = a, then limn→∞ f(xn) = L.

Proof. First, suppose L = limx→a f(x). Let (xn)_{n≥1} be such that xn ≠ a for all n ∈ N and limn→∞ xn = a. We desire to show that limn→∞ f(xn) = L.
Let ε > 0 be arbitrary. Since L = limx→a f(x), there exists a δ > 0 such that if 0 < |x − a| < δ, then |f(x) − L| < ε. Since δ > 0 and limn→∞ xn = a, there exists an N ∈ N such that |xn − a| < δ for all n ≥ N. Hence, if n ≥ N then 0 < |xn − a| < δ (as xn ≠ a), so |f(xn) − L| < ε. Therefore, as ε > 0 was arbitrary, limn→∞ f(xn) = L as desired.

Conversely, suppose that f does not converge to L as x tends to a. Thus there exists an ε > 0 such that for all δ > 0 there exists an x ∈ I such that 0 < |x − a| < δ yet |f(x) − L| ≥ ε. For each n ∈ N, choose xn ∈ I such that 0 < |xn − a| < 1/n yet |f(xn) − L| ≥ ε. Then (xn)_{n≥1} is a sequence with the property that xn ≠ a for all n ∈ N. Furthermore, since 0 < |xn − a| < 1/n for all n ∈ N, we obtain that limn→∞ xn = a. However, since |f(xn) − L| ≥ ε for all n ∈ N, we see that (f(xn))_{n≥1} does not converge to L.

Theorem 5.1.6 provides an alternate way of defining the limit of a function; instead of using ε-δ, we use sequences. The sequential definition of a limit will have many applications for us. For example, pretty much all of our theorems for sequences will extend easily to functions.

Example 5.1.7. Let a > 0. Then f(x) = √x is defined on an open interval containing a. Furthermore, by the homework, limx→a √x = √a.

Furthermore, the sequential definition of the limit of a function is particularly useful in showing that limits do not exist. This is because we need only construct two sequences converging to the point x = a that have different limits once f is applied to them.

Example 5.1.8. The function f : R → R defined by

f(x) = sin(1/x) if x ≠ 0,   and   f(x) = 0 if x = 0,

has no limit as x tends to 0. To see this, consider the sequences (an)_{n≥1} and (bn)_{n≥1} defined by an = 2/(π(4n + 1)) and bn = 2/(π(4n − 1)) for all n ∈ N. Clearly limn→∞ an = limn→∞ bn = 0. However,

limn→∞ f(an) = limn→∞ 1 = 1   and   limn→∞ f(bn) = limn→∞ −1 = −1.

Thus, as the above limits differ, limx→0 f(x) does not exist.
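The two sequences in Example 5.1.8 can be checked numerically. A short Python sketch (the tolerances are our own, chosen to absorb floating-point error):

```python
import math

f = lambda x: math.sin(1.0 / x)

# a_n = 2/(pi(4n+1)) gives 1/a_n = 2*pi*n + pi/2, so f(a_n) = 1;
# b_n = 2/(pi(4n-1)) gives 1/b_n = 2*pi*n - pi/2, so f(b_n) = -1.
a = [2.0 / (math.pi * (4 * n + 1)) for n in range(1, 50)]
b = [2.0 / (math.pi * (4 * n - 1)) for n in range(1, 50)]

assert all(abs(f(x) - 1.0) < 1e-9 for x in a)
assert all(abs(f(x) + 1.0) < 1e-9 for x in b)
```

Both sequences tend to 0, yet f is constantly 1 along one and constantly −1 along the other, which is exactly the failure described by Theorem 5.1.6.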
Although the sequential definition of a limit will be quite useful as we have built up our theory of limits of sequences, the ε-δ definition will be useful for more theoretical applications in the pages to come.

5.1.2 Limit Theorems for Functions

Using Theorem 5.1.6, we easily import results from Chapter 2.

Theorem 5.1.9. Let a ∈ R, let I be an open interval containing a, and let f and g be functions defined on I except at possibly a. If L = limx→a f(x) and K = limx→a g(x), then

a) limx→a f(x) + g(x) = L + K.
b) limx→a f(x)g(x) = LK.
c) limx→a cf(x) = cL for all c ∈ R.
d) limx→a f(x)/g(x) = L/K whenever K ≠ 0.

Proof. Combine Theorem 2.3.1 together with Theorem 5.1.6.

Example 5.1.10. For each c, a ∈ R, we easily see that limx→a c = c and limx→a x = a (where the latter comes from taking δ = ε in Definition 5.1.1). Consequently, by applying (b) of Theorem 5.1.9, we obtain that limx→a cx^n = ca^n for all n ∈ N and all c ∈ R. Therefore, by applying (a) of Theorem 5.1.9, we obtain that limx→a p(x) = p(a) for all polynomials p.

Example 5.1.11. Let f(x) = p(x)/q(x) where p and q are polynomials and q is not the zero polynomial. Such a function is said to be a rational function. If a ∈ R is such that q(a) ≠ 0, then (d) of Theorem 5.1.9 implies that limx→a f(x) = f(a).

As with sequences, given two functions f and g such that limx→a g(x) = 0, one may ask whether limx→a f(x)/g(x) exists. Clearly if limx→a f(x)/g(x) = L and limx→a g(x) = 0 then Theorem 5.1.9 implies limx→a f(x) exists and

limx→a f(x) = (limx→a f(x)/g(x))(limx→a g(x)) = L(0) = 0.

Like with sequences, if limx→a g(x) = 0 yet limx→a f(x) ≠ 0 (or does not exist), there are many possible behaviours, some of which we will examine in the next section.

For now, we continue to note results from sequences hold for functions.

Theorem 5.1.12 (Squeeze Theorem). Let a ∈ R, let I be an open interval containing a, and let f, g, and h be functions defined on I except at possibly a.
Suppose for each x ∈ I \ {a} that g(x) ≤ f(x) ≤ h(x). If limx→a g(x) and limx→a h(x) exist and L = limx→a g(x) = limx→a h(x), then limx→a f(x) exists and limx→a f(x) = L.

Proof. Combine Theorem 2.3.8 together with Theorem 5.1.6.

Again, the Squeeze Theorem has its uses when dealing with difficult functions that may be compared to simple ones.

Example 5.1.13. Consider the function

f(x) = { x sin(1/x) if x ≠ 0; 0 if x = 0 }.

In Example 5.1.8 we saw that limx→0 sin(1/x) does not exist. However, since −1 ≤ sin(1/x) ≤ 1 for all x ∈ R \ {0}, we have −|x| ≤ f(x) ≤ |x| for all x ∈ R \ {0}, and since limx→0 |x| = limx→0 −|x| = 0, we see that limx→0 f(x) = 0 by the Squeeze Theorem.

Finally, the Comparison Theorem is also useful when comparing limits of functions.

Theorem 5.1.14 (Comparison Theorem). Let a ∈ R, let I be an open interval containing a, and let f and g be functions defined on I except at possibly a. Suppose for each x ∈ I \ {a} that g(x) ≤ f(x). If L = limx→a f(x) and K = limx→a g(x) exist, then K ≤ L.

Proof. Combine Theorem 2.3.11 together with Theorem 5.1.6.

5.1.3 One-Sided Limits

The limits of functions we have been dealing with so far may be called two-sided limits. The rationale behind this terminology is that one must examine numbers (or sequences with terms) that are larger and/or smaller than the target point x = a. Thus, one must examine the behaviour of the function on both sides of the target. Restricting to one side or the other weakens the requirement for the limit to exist at the cost of slightly less information.

Definition 5.1.15. Let a ∈ R, let I be an open interval with a as the left endpoint, and let f be a function defined on I. A number L ∈ R is said to be the right-sided limit of f as x tends to a if for every ε > 0 there exists a δ > 0 (which depends on ε) such that if a < x < a + δ then |f(x) − L| < ε. In this case, we say that f(x) converges to L as x approaches a from above and write L = limx→a+ f(x).
Definition 5.1.16. Let a ∈ R, let I be an open interval with a as the right endpoint, and let f be a function defined on I. A number L ∈ R is said to be the left-sided limit of f as x tends to a if for every ε > 0 there exists a δ > 0 (which depends on ε) such that if a − δ < x < a then |f(x) − L| < ε. In this case, we say that f(x) converges to L as x approaches a from below and write L = limx→a− f(x).

Example 5.1.17. Let f(x) = x/|x| for x ≠ 0. Recall from Example 5.1.4 that limx→0 f(x) did not exist. However, limx→0+ f(x) and limx→0− f(x) do exist. To see this, we notice that f(x) = 1 for all x > 0, so clearly limx→0+ f(x) = 1. Similarly, f(x) = −1 for all x < 0 so limx→0− f(x) = −1.

As with two-sided limits, the definitions of one-sided limits can be phrased in terms of sequences.

Theorem 5.1.18. Let a ∈ R, let I be an open interval with a as the left (right) endpoint, and let f be a function defined on I. Then L = limx→a+ f(x) (L = limx→a− f(x)) if and only if whenever (xn)n≥1 has the properties that xn > a (xn < a) for all n ∈ N and limn→∞ xn = a, then limn→∞ f(xn) = L.

Proof. Repeat the ideas of Theorem 5.1.6.

Again, as with two-sided limits, the results pertaining to limits of sequences easily import to the one-sided limit setting.

Corollary 5.1.19. The conclusions of Theorems 5.1.9, 5.1.12, and 5.1.14 hold when two-sided limits are replaced with one-sided limits (under the necessary modifications to the hypotheses).

It is clear that for limx→a f(x) to exist, we must have limx→a+ f(x) and limx→a− f(x) existing. However, Example 5.1.17 demonstrates the existence of limx→a+ f(x) and limx→a− f(x) is not enough. Of course, the problem with Example 5.1.17 is simply that limx→a+ f(x) ≠ limx→a− f(x).

Theorem 5.1.20. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a.
Then limx→a f(x) exists if and only if limx→a+ f(x) and limx→a− f(x) exist and limx→a+ f(x) = limx→a− f(x). Furthermore, if limx→a f(x) exists, then

limx→a f(x) = limx→a+ f(x) = limx→a− f(x).

Proof. For the first direction, suppose limx→a f(x) exists and let L = limx→a f(x). We desire to show that limx→a+ f(x) and limx→a− f(x) exist and are both equal to L. To see this, let ε > 0 be arbitrary. Since L = limx→a f(x), there exists a δ > 0 such that if 0 < |x − a| < δ, then |f(x) − L| < ε. Hence if a < x < a + δ then |f(x) − L| < ε. Thus, as ε > 0 was arbitrary, by the definition of the right-sided limit limx→a+ f(x) exists and equals L. Similarly if a − δ < x < a then |f(x) − L| < ε. Thus, as ε > 0 was arbitrary, by the definition of the left-sided limit limx→a− f(x) exists and equals L. Hence this direction of the proof is complete.

For the other direction, suppose limx→a+ f(x) and limx→a− f(x) exist and L = limx→a+ f(x) = limx→a− f(x). To see that limx→a f(x) exists and equals L, let ε > 0 be arbitrary. Since L = limx→a+ f(x), there exists a δ1 > 0 such that if a < x < a + δ1, then |f(x) − L| < ε. Similarly, since L = limx→a− f(x), there exists a δ2 > 0 such that if a − δ2 < x < a, then |f(x) − L| < ε. Therefore, if δ = min{δ1, δ2} > 0, then 0 < |x − a| < δ implies either a < x < a + δ1 or a − δ2 < x < a and thus |f(x) − L| < ε. Therefore, since ε > 0 was arbitrary, we have by the definition of the limit that limx→a f(x) exists and equals L.

The benefit of Theorem 5.1.20 is that it is often easier to deal with one side of the limit at a time, and then combine the results at the end.

Theorem 5.1.21 (The Fundamental Trigonometric Limit).

limθ→0 sin(θ)/θ = 1.

Proof. First, suppose 0 < θ < π/2. Consider the following diagrams where the specified point is (cos(θ), sin(θ)):

[Diagrams: the unit circle with the angle θ marked, showing the triangle with vertices (0, 0), (1, 0), (cos(θ), sin(θ)), the circular sector of angle θ, and the triangle with vertices (0, 0), (1, 0), (1, tan(θ)).]
Note, as the coordinates of the point on the circle are (cos(θ), sin(θ)), the area of the first triangle is (1/2)cos(θ)sin(θ) and the area of the second triangle is (1/2)(1)tan(θ). The area of the region subtended by the arc is θ/(2π) times the area of the circle, which is π. Hence the area of the region subtended by the arc is θ/2. Hence we see that

(1/2)cos(θ)sin(θ) ≤ (1/2)θ ≤ (1/2)(sin(θ)/cos(θ)).

Therefore, since (1/2)sin(θ) > 0 when 0 < θ < π/2, we obtain that

cos(θ) ≤ θ/sin(θ) ≤ 1/cos(θ).

Hence, by taking reciprocals, we obtain that

1/cos(θ) ≥ sin(θ)/θ ≥ cos(θ).

However, since limθ→0 cos(θ) = 1 and thus limθ→0 1/cos(θ) = 1, we obtain by the Squeeze Theorem that limθ→0+ sin(θ)/θ = 1 (where we only get the right-sided limit as we assumed 0 < θ < π/2). Since limθ→0− sin(θ)/θ = 1 by similar arguments (or because sin(−θ) = −sin(θ)), the result follows by Theorem 5.1.20.

5.1.4 Limits at and to Infinity

There are many more types of limits we may examine. Much of the theory follows along the same lines as the previous results in this section, so we will only summarize the definitions and results, and provide a few examples. First, instead of requiring a ∈ R, we may ask for limits as x tends to ±∞.

Definition 5.1.22. Let f be a function defined on an interval (c, ∞). A number L ∈ R is said to be the limit of f as x tends to ∞ if for every ε > 0 there exists an M > 0 (which depends on ε) such that if M ≤ x then |f(x) − L| < ε. In this case, we say that f(x) converges to L as x tends to ∞ and write L = limx→∞ f(x).

Definition 5.1.23. Let f be a function defined on an interval (−∞, c). A number L ∈ R is said to be the limit of f as x tends to −∞ if for every ε > 0 there exists an M < 0 (which depends on ε) such that if M ≥ x then |f(x) − L| < ε. In this case, we say that f(x) converges to L as x tends to −∞ and write L = limx→−∞ f(x).

Theorem 5.1.24. Let f be a function defined on an interval (c, ∞).
Then L = limx→∞ f(x) if and only if whenever (xn)n≥1 has the properties that xn > c for all n ∈ N and limn→∞ xn = ∞, then limn→∞ f(xn) = L.

Proof. Repeat the ideas of Theorem 5.1.6.

Corollary 5.1.25. The conclusions of Theorems 5.1.9, 5.1.12, and 5.1.14 hold when a = ±∞ (under the necessary modifications to the hypotheses).

Example 5.1.26. It is not difficult to verify based on the definitions that limx→∞ 1/x = 0.

Example 5.1.27. Let f(x) = (3x² − 2x + 1)/(2x² + 5x − 2). Then, for sufficiently large x,

f(x) = x²(3 − 2/x + 1/x²) / (x²(2 + 5/x − 2/x²)) = (3 − 2/x + 1/x²)/(2 + 5/x − 2/x²).

Hence

limx→∞ f(x) = (3 − 2(0) + (0)(0))/(2 + 5(0) − 2(0)(0)) = 3/2.

Example 5.1.28. We claim that limx→∞ sin(x)/x = 0. Indeed, since −1 ≤ sin(x) ≤ 1 for all x ∈ R, we see that

−1/x ≤ sin(x)/x ≤ 1/x

for all x > 0. Hence, since limx→∞ 1/x = limx→∞ −1/x = 0, we obtain that limx→∞ sin(x)/x = 0 by the Squeeze Theorem.

Like with sequences, we can discuss limits of unbounded functions.

Definition 5.1.29. Let a ∈ R, let I be an open interval containing a, and let f be a function defined on I except at possibly a. The function f is said to diverge to infinity (negative infinity) as x tends to a if for every M > 0 there exists a δ > 0 (which depends on M) such that if 0 < |x − a| < δ then f(x) ≥ M (f(x) ≤ −M). In this case we write limx→a f(x) = ∞ (limx→a f(x) = −∞).

Example 5.1.30. Notice that limx→0 1/|x| = ∞. Indeed if M > 0 and 0 < |x| < 1/M, then 1/|x| > M. However, limx→0 1/x ≠ ∞ since if x < 0, then 1/x < 0 < M.

Of course, we may combine Definition 5.1.29 with Definitions 5.1.22 and 5.1.23. Furthermore, we may discuss one-sided versions of Definitions 5.1.22 and 5.1.23. We could go on and on showing the same results hold and doing more examples. Instead, let us move on to bigger and better things.

5.2 Continuity of Functions

With our discussion of limits complete, we may start using them to study far more interesting objects.
In particular, we desire to examine functions for which the limit exists at each point and equals the value of the function at that point.

Definition 5.2.1. A function f defined on an open interval containing a number a ∈ R is said to be continuous at a if limx→a f(x) exists and limx→a f(x) = f(a).

Of course, as we had various definitions of the limit of a function, we can rephrase the definition of continuity in various ways. Using the ε-δ version of a limit, we obtain the following equivalent formulation of continuity.

Definition 5.2.2 (ε-δ Definition of Continuity). A function f defined on an open interval containing a number a ∈ R is said to be continuous at a if for all ε > 0 there exists a δ > 0 such that if |x − a| < δ then |f(x) − f(a)| < ε.

Similarly, using Theorem 5.1.6, we obtain the following equivalent formulation of continuity.

Definition 5.2.3 (Sequential Definition of Continuity). A function f defined on an open interval containing a number a ∈ R is said to be continuous at a if whenever (xn)n≥1 converges to a, the sequence (f(xn))n≥1 converges to f(a).

Example 5.2.4. Using Example 5.1.10, we see that if p(x) is a polynomial, then p(x) is continuous at a for all a ∈ R. Similarly, using Example 5.1.11, we see that if p(x) and q(x) are polynomials, then p(x)/q(x) is continuous at a provided q(a) ≠ 0.

Example 5.2.5. The function f(x) = √x is continuous at a for all a > 0 by Example 5.1.7.

Remark 5.2.6. We will assume throughout the remainder of the course that sin(x), cos(x), and e^x are continuous at all points in R. Furthermore, we will assume that ln(x) (the inverse of e^x) is continuous at all a > 0. The ideas behind the proofs that all of these functions are continuous (modulo ln, which will be demonstrated in the next section) are based on ideas from Section 5.4 together with a notion of convergence of functions (which we will not deal with in this course).
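For the curious student, Definition 5.2.3 can be watched in action numerically. In the following Python sketch (an illustration only; the polynomial and the sequence are chosen arbitrarily, not taken from the notes), a sequence xn → a is fed through a polynomial and the images approach p(a):

```python
# Sequential definition of continuity (Definition 5.2.3), illustrated for
# the polynomial p(x) = x^3 - 2x + 1 at a = 2, along the sequence x_n = a + 1/n.

def p(x):
    return x**3 - 2 * x + 1

a = 2.0
xs = [a + 1 / n for n in (1, 10, 100, 1000, 10000)]
values = [p(x) for x in xs]

# p(x_n) should approach p(a) = 5 as x_n approaches a.
print(values)
print(p(a))
```

Of course, a finite computation proves nothing; the point is only to see the sequence (p(xn))n≥1 settling down to p(a) as Definition 5.2.3 predicts.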
Of course, once one has continuous functions, we expect we can form new continuous functions using the old ones.

Theorem 5.2.7. Let f(x) and g(x) be functions on an open interval I containing a number a ∈ R. If f and g are continuous at a, then

a) f(x) + g(x) is continuous at a.
b) f(x)g(x) is continuous at a.
c) cf(x) is continuous at a for all c ∈ R.
d) f(x)/g(x) is continuous at a provided g(a) ≠ 0.

Proof. Apply Theorem 5.1.9 together with the definition of continuity.

One other nice operation we have seen for functions is composition. Indeed, if f, g : R → R are functions, we can consider the function g ∘ f and ask whether or not g ∘ f is continuous at a point. Of course, we will want to extend to functions that are not defined on all of R, so we will need to impose conditions to guarantee the composition is well-defined. In fact the following is the best we can do to obtain continuity at a point (see later examples).

Theorem 5.2.8. Let f be a function defined on an open interval I1 containing a number a ∈ R. Let g be a function defined on an open interval I2 which contains f(I1). If f is continuous at a and g is continuous at f(a), then g ∘ f is continuous at a.

Proof. Let (xn)n≥1 be a sequence such that limn→∞ xn = a. Since f is continuous at a, we know that limn→∞ f(xn) = f(a). Since g is continuous at f(a) and (f(xn))n≥1 is a sequence that converges to f(a), we obtain that limn→∞ g(f(xn)) = g(f(a)). Hence, by definition, g ∘ f is continuous at a.

Of course, there are many functions that are not continuous at points, which we now desire to discuss.

Example 5.2.9. Consider the function

f(x) = { x if x > 0; 1 if x = 0; −x² if x < 0 }.

Then f is continuous at a precisely when a ≠ 0. Indeed f is not continuous at 0 since limx→0 f(x) = 0, yet f(0) = 1 ≠ 0.
Such a discontinuity (one where limx→a f(x) exists, but is not equal to f(a)) is said to be removable as one could simply redefine the value of f at a to obtain a function that is continuous at a (i.e. we can remove the discontinuity by redefining a single point).

Of course, if limx→a f(x) does not exist, then clearly f is discontinuous at a. The following are some basic examples of how limx→a f(x) may fail to exist.

Example 5.2.10. Consider the function

f(x) = { x/|x| if x ≠ 0; 1 if x = 0 }.

Then limx→0 f(x) does not exist. Since limx→0+ f(x) and limx→0− f(x) exist, such a discontinuity is called a jump discontinuity.

Example 5.2.11. The function

f(x) = { 1/x if x ≠ 0; 0 if x = 0 }

is discontinuous at 0 since limx→0+ f(x) = ∞. This is an example of an unbounded discontinuity.

Example 5.2.12. The function f : R → R defined by

f(x) = { sin(1/x) if x ≠ 0; 0 if x = 0 }

is discontinuous at 0 since limx→0+ f(x) does not exist. This is an example of an oscillating discontinuity.

Of course, a discussion would not be complete without examining some very bizarre functions.

Example 5.2.13. Consider the function

f(x) = { 1 if x ∈ Q; 0 if x ∈ R \ Q }.

If a ∈ R, then f is discontinuous at a. Indeed if a ∈ R \ Q, we know (from the homework) that there exists a sequence (xn)n≥1 of rational numbers such that limn→∞ xn = a. However, since f(xn) = 1 for all n yet f(a) = 0, we obtain that f is discontinuous at a. Similarly, if a ∈ Q, then we know (from the homework) that there exists a sequence (xn)n≥1 of irrational numbers such that limn→∞ xn = a. However, since f(xn) = 0 for all n yet f(a) = 1, we obtain that f is discontinuous at a.

Example 5.2.14. Consider the function

f(x) = { 1/n if x = m/n where n ∈ N and m ∈ Z have no common divisors; 0 if x ∈ R \ Q }.

By the same arguments as the previous example, f is discontinuous at each rational number. However, f is continuous at each irrational number (see the homework).
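The rule in Example 5.2.14 depends on writing x = m/n in lowest terms. As a small illustration for the curious student (not part of the notes), Python's Fraction type reduces a rational to lowest terms automatically, so the value 1/n is simply the reciprocal of the reduced denominator:

```python
from fractions import Fraction

def thomae(x):
    # The function of Example 5.2.14, restricted to exact rational inputs
    # (irrational inputs cannot be represented exactly in floating point).
    # Fraction reduces m/n to lowest terms, so q.denominator is the n of the example.
    q = Fraction(x)
    return Fraction(1, q.denominator)

print(thomae(Fraction(3, 6)))   # 3/6 reduces to 1/2, so the value is 1/2
print(thomae(Fraction(22, 7)))  # already in lowest terms, so the value is 1/7
```

Evaluating along rationals approaching an irrational number, the reduced denominators necessarily blow up, which is the heart of why f is continuous at each irrational point.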
To complete this section, we note that continuity and compositions do not play well together if there are some discontinuities.

Remark 5.2.15. If f is discontinuous on Q and g is discontinuous at a single point, it is possible that g ∘ f is nowhere continuous. Indeed let

g(x) = { 1 if x ≠ 0; 0 if x = 0 }

and let f be as in Example 5.2.14. Then g ∘ f is the function in Example 5.2.13, which was discontinuous at all points.

There are many other questions one may ask pertaining to discontinuities. For example, is there a function that is continuous on Q but discontinuous on R \ Q? The answer to this question turns out to be no. However, although we could pursue this question, we will not, for the sake of studying functions that are continuous on a large collection of points and their properties.

5.3 The Intermediate Value Theorem

Suppose two people start walking on a path at the same time, one starting at one end, the other starting at the other. If they walk until they both reach the end of the path opposite to where they started, must they eventually meet as time progresses? Of course, logic says they must. But how can we mathematically prove said result?

Of course, specific assumptions must be made in the above problem. For example, we are assuming that position is a function of time (no time travel permitted via DeLoreans and TARDISes). Furthermore, we must assume that our functions are continuous at each point (i.e. no teleportation). As such, we desire to study functions that are continuous on open/closed intervals.

Definition 5.3.1. A function f defined on (a, b) is said to be continuous on (a, b) if f is continuous at all points in (a, b).

Definition 5.3.2. A function f defined on [a, b] is said to be continuous on [a, b] if f is continuous on (a, b), limx→a+ f(x) = f(a), and limx→b− f(x) = f(b).

Notice that polynomials, exponentials, sine, and cosine are continuous functions on any open or closed interval.
Furthermore, Theorems 5.2.7 and 5.2.8 may be used to construct additional continuous functions from known continuous functions. For more examples of continuous functions on intervals, the function f(x) = √x is continuous on any open or closed subinterval of [0, ∞). Similarly, 1/x and sin(1/x) are continuous on (0, 1) but cannot be extended to be continuous functions on [0, 1] (that is, there is no way to define values of these functions at 0 to make them continuous).

Continuous functions on intervals will play a major role in the remainder of this course. In fact, there is a deep connection between continuous functions and open sets (as discussed in Chapter 3). Indeed one can show that a function f : R → R is continuous on R if and only if f⁻¹(U) is open for every open set U ⊆ R. This enables one to define the notion of continuous functions on arbitrary sets provided one has a nice notion of 'open sets'. Instead of pursuing this abstractification of continuous functions, we will examine properties of continuous functions on (generally closed) intervals of R.

We begin by obtaining our first piece of the Triforce of theory about continuous functions.

Theorem 5.3.3 (The Intermediate Value Theorem). If f : [a, b] → R is continuous on [a, b] and if α ∈ R is such that f(a) < α < f(b) (or f(b) < α < f(a)), then there exists a c ∈ (a, b) such that f(c) = α.

Proof. Let f : [a, b] → R be a continuous function on a closed interval [a, b] and let α ∈ R be such that f(a) < α < f(b) (the proof when f(b) < α < f(a) is similar). Define g : [a, b] → R by g(x) = f(x) − α for all x ∈ [a, b]. Hence it suffices to show there exists a c ∈ (a, b) such that g(c) = 0. Notice g(a) < 0 < g(b) and that g is continuous on [a, b].

Let S = {x ∈ [a, b] | g(x) ≤ 0}. Since g(a) < 0, we see that a ∈ S so S is non-empty. Furthermore, since S is bounded above by b, the Least Upper Bound Property (Theorem 1.3.18) implies that c = lub(S) exists.
Notice c ≠ b since g(b) > 0 so b ∉ S. Hence c ∈ [a, b). We claim that g(c) = 0. Note this would imply c ≠ a (so c ∈ (a, b) as desired) since g(a) < 0.

To see that g(c) = 0, first recall (by the homework) that there exists a sequence (xn)n≥1 such that xn ∈ S for all n ∈ N and limn→∞ xn = lub(S) = c. Therefore, since g is continuous, we obtain that g(c) = limn→∞ g(xn). Since xn ∈ S for all n ∈ N, we know that g(xn) ≤ 0 for all n ∈ N and thus g(c) = limn→∞ g(xn) ≤ 0. Thus, to show that g(c) = 0, it suffices to show that g(c) ≥ 0.

Since c < b, for each n ∈ N there exists a yn ∈ [a, b] such that c < yn < c + 1/n. Since yn > c, we have that yn ∉ S for all n ∈ N. Since limn→∞ yn = c by the Squeeze Theorem, we obtain that g(c) = limn→∞ g(yn) as g is continuous. Since yn ∉ S for all n ∈ N, g(yn) > 0 for all n ∈ N and thus g(c) = limn→∞ g(yn) ≥ 0. Hence g(c) = 0 so the proof is complete.

The Intermediate Value Theorem has a wide range of applications. For example, it may be used to solve the question posed at the beginning of this section (modulo modes of transit that have yet to be invented).

To see this, suppose the path is c units long. Suppose that f(t) and g(t) represent the distance along the path from the start for each of the two people. Then the values of the functions at 0 are 0 and c; that is, we may assume that f(0) = 0 and g(0) = c. Eventually, when t is some number b, the people are at the opposite ends of the path. Thus f(b) = c and g(b) = 0. If we consider the function h(t) = g(t) − f(t), then h(0) = c > 0 whereas h(b) = −c < 0. Consequently, assuming that h is continuous, the Intermediate Value Theorem implies that there exists a time t0 such that h(t0) = 0; that is, g(t0) = f(t0) and the two people are at the same point on the path.

Time for one more example:

Example 5.3.4. We claim there exists a z ∈ [0, π/2] such that cos(z) = z. To see this, consider the function f(x) = x − cos(x).
Since

f(0) = 0 − 1 = −1 < 0   and   f(π/2) = π/2 − 0 = π/2 > 0,

and since f is continuous, the Intermediate Value Theorem implies there exists a z ∈ [0, π/2] such that f(z) = 0 (and thus cos(z) = z).

5.4 Uniform Continuity

There is a stronger form of continuity that we may wish to examine in this course.

Definition 5.4.1. A function f defined on an interval I is said to be uniformly continuous on I if for all ε > 0 there exists a δ > 0 such that if x, y ∈ I and |x − y| < δ then |f(x) − f(y)| < ε.

Note the difference between Definitions 5.2.2 and 5.4.1 is that for a fixed ε, the δ > 0 need only work at a given point in Definition 5.2.2 whereas for uniformly continuous functions, Definition 5.4.1 enforces that the same δ works for all points in the interval!

It is clear from the definition of continuity that constant functions are uniformly continuous (for an ε > 0, we can let δ be anything). Furthermore, it is clear that if f(x) = x for all x ∈ R, then f is uniformly continuous on R (for an ε > 0, let δ = ε in the definition). Other functions are harder to tell.

Example 5.4.2. We claim that f(x) = x² is uniformly continuous on (−1, 1). To see this, let ε > 0 be arbitrary and let δ = ε/2 > 0. Then if x, y ∈ (−1, 1) and |x − y| < δ, we notice that

|f(x) − f(y)| = |x² − y²| = |x + y||x − y| ≤ 2|x − y| < 2(ε/2) = ε

where we have used the fact that |x + y| ≤ 2 for all x, y ∈ (−1, 1). Hence, as ε > 0 was arbitrary, the result follows.

Example 5.4.3. We claim that f(x) = x² is not uniformly continuous on R. To see this, we claim Definition 5.4.1 fails for ε = 2. To see this, we must demonstrate that for all δ > 0 there exist x, y ∈ R such that |x − y| < δ yet |f(x) − f(y)| ≥ 2. In particular, for n ∈ N, let xn = n and yn = n + 1/n. Then |xn − yn| = 1/n, which is smaller than any given δ > 0 once n is sufficiently large, yet

|f(xn) − f(yn)| = |n² − (n + 1/n)²| = 2 + 1/n² ≥ 2.

Hence f is not uniformly continuous on R.
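The pairs in Example 5.4.3 can be tabulated numerically for the curious student (an illustration only, not part of the notes): the inputs get arbitrarily close while the outputs stay at least 2 apart.

```python
# Example 5.4.3: for f(x) = x^2 the pairs x_n = n, y_n = n + 1/n get
# arbitrarily close, yet |f(x_n) - f(y_n)| = 2 + 1/n^2 never drops below 2.

def f(x):
    return x * x

for n in (1, 10, 100, 1000):
    x, y = n, n + 1 / n
    # input gap -> 0, but the output gap stays >= 2
    print(n, abs(x - y), abs(f(x) - f(y)))
```

This is exactly the failure of uniform continuity: no single δ can serve ε = 2 across all of R, even though f is continuous at every point.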
The above examples show how uniform continuity may be more desirable than simple continuity; having a δ that works for the whole interval seems far more powerful than one that works at a single point. However, we have seen that even x² is not uniformly continuous on all of R as x² grows too quickly as x tends to infinity. Consequently, one may ask, "Are things much nicer if we restrict to intervals?" For closed intervals, yes!

Theorem 5.4.4. If a function f is continuous on a closed interval [a, b], then f is uniformly continuous on [a, b].

Proof. Let f be a continuous function on [a, b]. Suppose to the contrary that f is not uniformly continuous on [a, b]. Thus, by the definition of uniform continuity, there exists an ε > 0 such that for all δ > 0 there exist x, y ∈ [a, b] with |x − y| < δ yet |f(x) − f(y)| ≥ ε. Hence, for each n ∈ N there exist xn, yn ∈ [a, b] with |xn − yn| < 1/n yet |f(xn) − f(yn)| ≥ ε.

Since [a, b] is closed and bounded, [a, b] is sequentially compact by Theorem 3.3.8. Therefore there exists a subsequence (xkn)n≥1 of (xn)n≥1 that converges to some number L ∈ [a, b]. Consider the subsequence (ykn)n≥1 of (yn)n≥1. Notice for all n ∈ N that

|ykn − L| ≤ |ykn − xkn| + |xkn − L| ≤ 1/kn + |xkn − L| ≤ 1/n + |xkn − L|.

Therefore, since limn→∞ |xkn − L| = 0 and limn→∞ 1/n = 0, we obtain that limn→∞ ykn = L by the Squeeze Theorem.

Since limn→∞ xkn = L and since f is continuous, there exists an N1 ∈ N such that |f(xkn) − f(L)| < ε/2 for all n ≥ N1. Similarly, since limn→∞ ykn = L and since f is continuous, there exists an N2 ∈ N such that |f(ykn) − f(L)| < ε/2 for all n ≥ N2. Therefore, if n = max{N1, N2}, we obtain that

|f(xkn) − f(ykn)| ≤ |f(xkn) − f(L)| + |f(L) − f(ykn)| < ε/2 + ε/2 = ε

which contradicts the fact that |f(xkn) − f(ykn)| ≥ ε. Hence, as we have obtained a contradiction, it must have been the case that f is uniformly continuous on [a, b].

Note the essential part of the above proof is that [a, b] is compact.
In fact, if one extends the notion of continuous function from functions with intervals as domains to functions with arbitrary sets as domains, then any continuous function on a compact set will be uniformly continuous.

Using Theorem 5.4.4, we can demonstrate additional functions on R are uniformly continuous.

Example 5.4.5. We claim that the function f(x) = cos(x) is uniformly continuous on R. To see this, let ε > 0 be arbitrary. Since f is uniformly continuous on [−2π, 2π] by Theorem 5.4.4, there exists a δ > 0 such that |f(x) − f(y)| < ε whenever x, y ∈ [−2π, 2π] and |x − y| < δ (we may also assume δ ≤ π by shrinking δ if necessary). Due to the fact that cos(x + 2π) = cos(x) for all x ∈ R, it is then easy to see that if x, y ∈ R are such that |x − y| < δ, then |f(x) − f(y)| < ε.

Example 5.4.6. We claim that the function f(x) = x/(x² + 1) is uniformly continuous on R. Indeed let ε > 0 be arbitrary. Since limx→∞ f(x) = 0, there exists an M1 > 0 such that |f(x)| < ε/2 for all x > M1. Similarly, since limx→−∞ f(x) = 0, there exists an M2 > 0 such that |f(x)| < ε/2 for all x < −M2. Since f is continuous on [−3M2, 3M1], f is uniformly continuous there by Theorem 5.4.4 and thus there exists a δ0 > 0 such that if x, y ∈ [−3M2, 3M1] and |x − y| < δ0, then |f(x) − f(y)| < ε.

Let δ = min{δ0, M1, M2} > 0. We claim that this δ works for this ε. To see this, suppose x, y ∈ R are such that |x − y| < δ. If x ∈ [−2M2, 2M1], then |x − y| < δ implies y ∈ [−3M2, 3M1] so |f(x) − f(y)| < ε by the choice of δ0. If x > 2M1 then |x − y| < δ implies y > M1 so

|f(x) − f(y)| ≤ |f(x)| + |f(y)| < ε/2 + ε/2 = ε.

Finally, if x < −2M2 then |x − y| < δ implies y < −M2 so

|f(x) − f(y)| ≤ |f(x)| + |f(y)| < ε/2 + ε/2 = ε.

Hence, as we have exhausted all possible cases for x, the result follows.

To see that Theorem 5.4.4 fails on open intervals, we note the following two examples.

Example 5.4.7. We claim that f(x) = 1/x is not uniformly continuous on (0, 1). To see this, we claim Definition 5.4.1 fails for ε = 1.
To see this, we must demonstrate that for all δ > 0 there exist x, y ∈ (0, 1) such that |x − y| < δ yet |f(x) − f(y)| ≥ 1. In particular, for n ∈ N, let xn = 1/n and yn = 2/n. Then |xn − yn| = 1/n, which is smaller than any given δ > 0 once n is sufficiently large, yet

|f(xn) − f(yn)| = |n − n/2| = n/2 ≥ 1

for all n ≥ 2. Hence f is not uniformly continuous on (0, 1).

Example 5.4.8. We claim that f(x) = sin(1/x) is not uniformly continuous on (0, 1). To see this, we claim Definition 5.4.1 fails for ε = 1. To see this, we must demonstrate that for all δ > 0 there exist x, y ∈ (0, 1) such that |x − y| < δ yet |f(x) − f(y)| ≥ 1. In particular, for n ∈ N, let xn = 1/(2πn + π/2) and yn = 1/(2πn + 3π/2). Since

limn→∞ xn = 0 = limn→∞ yn,

for any δ > 0 we can find an N ∈ N such that |xN| < δ/2 and |yN| < δ/2. Hence |xN − yN| < δ yet

|f(xN) − f(yN)| = |1 − (−1)| = 2 ≥ 1.

Hence f is not uniformly continuous on (0, 1).

The following result shows the reason why the above functions are not uniformly continuous on these bounded open intervals is that the one-sided limits at the boundaries do not exist.

Theorem 5.4.9. Let f : (a, b) → R where a, b ∈ R. Then f is uniformly continuous on (a, b) if and only if there exists a continuous function g : [a, b] → R such that f(x) = g(x) for all x ∈ (a, b).

Proof. Suppose there exists a continuous function g : [a, b] → R such that f(x) = g(x) for all x ∈ (a, b). Then, as g(x) is uniformly continuous on [a, b] by Theorem 5.4.4, we clearly obtain that f is uniformly continuous on (a, b) (i.e. for ε > 0, the δ that works for g also works for f).

For the other direction, suppose f is uniformly continuous on (a, b). To complete the result, we need to find a continuous g : [a, b] → R such that f(x) = g(x) for all x ∈ (a, b). In particular, we need only define g(a) and g(b). However, we will need

g(a) = limx→a+ f(x)   and   g(b) = limx→b− f(x).
Hence, provided the above two one-sided limits exist, the result will be complete.

To see that limx→a+ f(x) exists via Theorem 5.1.18, we must show that (f(xn))n≥1 converges for every sequence (xn)n≥1 with xn > a for all n and limn→∞ xn = a, AND that (f(xn))n≥1 converges to the same number for every choice of (xn)n≥1.

First, let (xn)n≥1 be such that a < xn < b for all n and limn→∞ xn = a. To see that (f(xn))n≥1 converges, we claim that (f(xn))n≥1 is Cauchy (and thus converges by Theorem 3.1.5). To see this, let ε > 0 be arbitrary. Since f is uniformly continuous on (a, b) there exists a δ > 0 such that if x, y ∈ (a, b) and |x − y| < δ, then |f(x) − f(y)| < ε. Since limn→∞ xn = a, we know that (xn)n≥1 is Cauchy and thus there exists an N ∈ N such that |xn − xm| < δ for all n, m ≥ N. Hence, if n, m ≥ N, we obtain that |f(xn) − f(xm)| < ε as desired. Hence (f(xn))n≥1 is Cauchy.

Suppose that (xn)n≥1 and (yn)n≥1 are such that a < xn, yn < b for all n ∈ N and limn→∞ xn = limn→∞ yn = a. Thus L = limn→∞ f(xn) and K = limn→∞ f(yn) exist. To see that L = K, let ε > 0 be arbitrary. Since f is uniformly continuous on (a, b) there exists a δ > 0 such that if x, y ∈ (a, b) and |x − y| < δ, then |f(x) − f(y)| < ε/3. Since limn→∞ xn − yn = 0, there exists an N1 ∈ N such that |xn − yn| < δ for all n ≥ N1. Since limn→∞ f(xn) = L, there exists an N2 ∈ N such that |f(xn) − L| < ε/3 for all n ≥ N2. Similarly, since limn→∞ f(yn) = K, there exists an N3 ∈ N such that |f(yn) − K| < ε/3 for all n ≥ N3. Thus, if N = max{N1, N2, N3}, then |xN − yN| < δ so

|L − K| ≤ |L − f(xN)| + |f(xN) − f(yN)| + |f(yN) − K| < ε/3 + ε/3 + ε/3 = ε.

Thus, we have shown that |L − K| < ε for all ε > 0. This implies |L − K| = 0 and thus L = K as desired. Hence we may define g(a) so that g(a) = limx→a+ f(x). Similar arguments show that we may define g(b) as desired, thereby completing the proof.
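As a concrete illustration of Theorem 5.4.9 (the example is a standard one chosen for illustration, not taken from the notes): f(x) = sin(x)/x is uniformly continuous on (0, 1), and setting g(0) = limx→0+ sin(x)/x = 1 (the Fundamental Trigonometric Limit, Theorem 5.1.21) together with g(1) = sin(1) yields the continuous extension g on [0, 1]. Numerically:

```python
import math

def g(x):
    # Continuous extension of f(x) = sin(x)/x from (0, 1) to [0, 1];
    # the value at 0 is the one-sided limit, which is 1 by Theorem 5.1.21.
    return math.sin(x) / x if x != 0 else 1.0

for x in (0.1, 0.01, 0.001, 0.0):
    print(x, g(x))  # the values approach g(0) = 1
```

Contrast this with 1/x and sin(1/x) from Examples 5.4.7 and 5.4.8, where no such value at 0 exists and uniform continuity fails.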
In the above proof, we have used what is known as a three-ε argument. Said argument is quite useful. For example, one can use a three-ε argument to show that a 'uniform limit of continuous functions is a continuous function'. This enables us to define the exponential function along with sine and cosine as uniform limits of polynomials, thereby proving they are continuous. The construction of such polynomials will be motivated in the next chapter.

Chapter 6

Differentiation

With the above study of continuity, we may turn our attention to studying another important concept in calculus: differentiation. Constructed to be an approximation to the slope of the tangent line of the graph of a function at a point, derivatives are essential to studying the rates of change of dynamical systems. Furthermore, the theory of derivatives can aid in computing limits of functions and in approximating functions with polynomials.

6.1 The Derivative

To begin our study of the theory of differentiation, we must begin by studying a formal definition of the derivative and some of the basic rules for computing derivatives.

6.1.1 Definition of a Derivative

Given a function f defined on an open interval I containing a, we desire to define the derivative to be the slope of the tangent line of the graph of f at a. As an approximation, we may pick any point x ∈ I and look at the slope of the line from (x, f(x)) to (a, f(a)). The slope of said line is

(f(x) − f(a))/(x − a).

Furthermore, as x gets closer and closer to a, the slope of the line from (x, f(x)) to (a, f(a)) should better and better approximate the slope of the tangent line to f at a. Alternatively, one can think of the slopes better and better approximating the instantaneous rate of change of f at a. Consequently, we define the derivative as follows.

Definition 6.1.1. Let f be a function defined on an open interval containing a. It is said that f is differentiable at a if

lim_{x→a} (f(x) − f(a))/(x − a)

exists.
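The limit in Definition 6.1.1 can be watched numerically. In this Python sketch (an illustration of ours, not from the notes), the difference quotients of f(x) = x³ at a = 2 approach 12, the value 3a² computed for power functions in the examples that follow.

```python
def f(x):
    return x ** 3

def difference_quotient(a, x):
    # The slope of the secant line from (x, f(x)) to (a, f(a)).
    return (f(x) - f(a)) / (x - a)

a = 2.0
# As x approaches a, the secant slopes approach the derivative f'(a) = 12.
slopes = [difference_quotient(a, a + 10.0 ** (-k)) for k in range(1, 7)]
errors = [abs(s - 12.0) for s in slopes]
assert errors == sorted(errors, reverse=True)   # errors shrink at each step
assert errors[-1] < 1e-4
```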
If f is differentiable at a, we use f'(a) to denote the above limit. If f is differentiable at each point x in I, then the function f' : I → R whose value at x is f'(x) is called the derivative of f on I.

Of course, although our motivation for defining the derivative was to find the slope of the tangent line to the graph of f at a, we have seen many odd functions so far. Thus we should not rely too much on our intuition.

In addition, there is another way to formulate the derivative of a function f at a. Indeed, if x is tending to a, then x − a tends to 0. Substituting h = x − a, we see x = a + h so

lim_{x→a} (f(x) − f(a))/(x − a) = lim_{h→0} (f(a + h) − f(a))/h.

This alternate formulation of the derivative is often useful for computations.

Example 6.1.2. If c ∈ R and f(x) = c for all x ∈ R, then f'(x) = 0 for all x ∈ R. Indeed for all a ∈ R,

lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (c − c)/(x − a) = lim_{x→a} 0 = 0.

Hence f'(a) exists for all a ∈ R and f'(a) = 0.

Example 6.1.3. Let n ∈ N and let f(x) = xⁿ for all x ∈ R. Then for all a ∈ R, we have (by a homework problem) that

lim_{h→0} (f(a + h) − f(a))/h = lim_{h→0} ((a + h)ⁿ − aⁿ)/h
  = lim_{h→0} (1/h) [ Σ_{k=0}^{n} (n choose k) a^{n−k} h^k − aⁿ ]
  = lim_{h→0} (1/h) Σ_{k=1}^{n} (n choose k) a^{n−k} h^k
  = lim_{h→0} Σ_{k=1}^{n} (n choose k) a^{n−k} h^{k−1}
  = (n choose 1) a^{n−1}    (since lim_{h→0} h^{k−1} = 0 for all k > 1)
  = n a^{n−1}.

Hence f'(a) exists for all a ∈ R and f'(a) = n a^{n−1}.

Example 6.1.4. Let f(x) = x^{1/2} for all x > 0. Then for all a > 0, we have that

lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (√x − √a)/(x − a)
  = lim_{x→a} (√x − √a)(√x + √a)/((x − a)(√x + √a))
  = lim_{x→a} 1/(√x + √a) = 1/(2√a).

Hence f'(a) exists for all a > 0 and f'(a) = 1/(2√a). Note it does not make sense to discuss f'(0) as f is not defined on an open interval around 0.

Example 6.1.5. Let f(x) = |x| for all x ∈ R. It is not difficult to see that f'(x) = 1 if x > 0 and f'(x) = −1 if x < 0 (as f(x) = x if x > 0 and f(x) = −x if x < 0). However, f is not differentiable at 0.
Indeed

lim_{h→0+} (f(h) − f(0))/h = lim_{h→0+} h/h = 1

whereas

lim_{h→0−} (f(h) − f(0))/h = lim_{h→0−} (−h)/h = −1.

Thus lim_{h→0} (f(h) − f(0))/h does not exist, so f is not differentiable at 0.

From this point forward, we will assume that f(x) = e^x, g(x) = sin(x), and h(x) = cos(x) are all differentiable on R with f'(x) = e^x, g'(x) = cos(x), and h'(x) = −sin(x), and that k(x) = ln(x) and j(x) = x^p for p ∈ R are differentiable for x > 0 with k'(x) = 1/x and j'(x) = p x^{p−1}. In fact, much of the theory of this chapter is devoted to showing that e^x is an injective function and that ln(x) is continuous, differentiable, and has derivative 1/x.

Remark 6.1.6. To study the natural logarithm properly, we will need to know that e^x is continuous, differentiable, and has itself as its derivative. One way to do this would be to define e^x to be the unique function f such that f(0) = 1 and f'(x) = f(x). However, we would then need to prove that such a function exists. Another way (which is easiest in my opinion) to define e^x is to define

e^x = lim_{n→∞} Σ_{k=0}^{n} (1/k!) x^k

for all x ∈ R. Of course, one must show this limit exists, but it is possible to show this using Cauchy sequences (see the homework for convergence of infinite sums).

To study e^x, we would then need to rigorously study the convergence of sums of real numbers (i.e. series). Unfortunately, we will not have time to do so. If we did, one must then show that the function f(x) = e^x is continuous. To do this, it is first simpler to show that e^{a+b} = e^a e^b for all a, b ∈ R. This can be done by carefully expanding the formula for e^{a+b}, exchanging terms in the sums, and showing convergence. This is possible since the sums converge in a very special manner (i.e. the sum is absolutely summable). As it is easy to show that e^x > 0 if x ≥ 0 (i.e.
it is a sum of positive terms bounded below by 1), this implies e^x > 0 everywhere: for if y < 0 were such that e^y ≤ 0, then 0 ≥ e^y e^{−y} = e^0 = 1, which is a contradiction. To show that f is continuous at a, we may then notice that

|e^x − e^a| = |e^a| |e^{x−a} − 1| = |e^a| |e^{x−a} − e^0|

so, provided e^x is continuous at x = 0, we can show that e^x is continuous on all of R. Again, dealing with the sums, we can see that e^x is continuous at 0. Similar arguments show that f(x) = e^x is differentiable everywhere with f'(x) = e^x. Using the fact that e^x > 0 everywhere, we may use a result later in this chapter (Theorem 6.4.6) to prove e^x is injective.

Moving forward, we notice that all of the above functions are continuous. Coincidence? I think not!

Theorem 6.1.7. Let f be a function defined on an open interval containing a. If f is differentiable at a, then f is continuous at a.

Proof. Suppose that f'(a) exists. Therefore lim_{x→a} (f(x) − f(a))/(x − a) exists. Since lim_{x→a} (x − a) exists and since

f(x) − f(a) = ((f(x) − f(a))/(x − a)) (x − a),

we obtain that lim_{x→a} (f(x) − f(a)) exists and

lim_{x→a} (f(x) − f(a)) = ( lim_{x→a} (f(x) − f(a))/(x − a) ) ( lim_{x→a} (x − a) ) = f'(a) · 0 = 0.

Hence lim_{x→a} f(x) = f(a) so f is continuous at a.

Consequently, if one demonstrates a function is differentiable at a point, then it must be continuous there. Conversely, if a function is not continuous at a point, then it is not differentiable there. Using the previous chapter, this provides many examples of functions that are not differentiable at points. However, note that continuity does not imply differentiability. Indeed, we have seen that |x| is continuous on R but not differentiable at 0. In fact, there exists a function that is continuous on R and differentiable at no point of R (although we do not have the technology to construct it).
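The series definition of e^x from Remark 6.1.6 can be explored numerically. The sketch below (ours, not part of the notes) shows the partial sums approaching the exponential function and the identity e^{a+b} = e^a e^b, which the remark uses to reduce continuity on R to continuity at 0, holding to high accuracy.

```python
import math

def exp_partial(x, n):
    # Partial sum  sum_{k=0}^{n} x**k / k!  from Remark 6.1.6.
    total, term = 0.0, 1.0
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)   # next term: x**(k+1)/(k+1)!
    return total

# The partial sums converge to the exponential function.
for x in (-2.0, 0.5, 3.0):
    assert abs(exp_partial(x, 40) - math.exp(x)) < 1e-10

# The identity e**(a + b) = e**a * e**b, checked on the partial sums.
a, b = 1.2, -0.7
assert abs(exp_partial(a + b, 40) - exp_partial(a, 40) * exp_partial(b, 40)) < 1e-10
```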
6.1.2 Rules of Differentiation

With the above construction of derivatives complete, we turn our attention to rules that may be used to compute derivatives using known derivatives.

Theorem 6.1.8. If c ∈ R and f is differentiable at a point a, then the function cf defined via (cf)(x) = cf(x) for all x ∈ R is differentiable at a and (cf)'(a) = cf'(a).

Proof. Since

lim_{x→a} ((cf)(x) − (cf)(a))/(x − a) = lim_{x→a} c(f(x) − f(a))/(x − a) = c lim_{x→a} (f(x) − f(a))/(x − a) = cf'(a),

the proof is complete.

Theorem 6.1.9. If f and g are differentiable at a point a, then the function f + g defined via (f + g)(x) = f(x) + g(x) for all x ∈ R is differentiable at a and (f + g)'(a) = f'(a) + g'(a).

Proof. Since

lim_{x→a} ((f + g)(x) − (f + g)(a))/(x − a) = lim_{x→a} [ (f(x) − f(a))/(x − a) + (g(x) − g(a))/(x − a) ] = f'(a) + g'(a),

the proof is complete.

Using the previous two results, we see that if n ∈ N and a0, a1, ..., an ∈ R, then

(an xⁿ + a_{n−1} x^{n−1} + ··· + a1 x + a0)' = n an x^{n−1} + (n − 1) a_{n−1} x^{n−2} + ··· + a1.

Theorem 6.1.10 (Product Rule). If f and g are differentiable at a point a, then the function fg defined via (fg)(x) = f(x)g(x) for all x ∈ R is differentiable at a and (fg)'(a) = f'(a)g(a) + f(a)g'(a).

Proof. First, notice that

((fg)(x) − (fg)(a))/(x − a) = (f(x)g(x) − f(a)g(a))/(x − a)
  = (f(x)g(x) − f(x)g(a))/(x − a) + (f(x)g(a) − f(a)g(a))/(x − a).

Since f'(a) exists, f is continuous at a by Theorem 6.1.7. Therefore lim_{x→a} f(x) = f(a). Since lim_{x→a} (g(x) − g(a))/(x − a) = g'(a), we obtain that

lim_{x→a} (f(x)g(x) − f(x)g(a))/(x − a) = ( lim_{x→a} f(x) ) ( lim_{x→a} (g(x) − g(a))/(x − a) ) = f(a)g'(a).

Since

lim_{x→a} (f(x)g(a) − f(a)g(a))/(x − a) = g(a) lim_{x→a} (f(x) − f(a))/(x − a) = g(a)f'(a),

we obtain that

lim_{x→a} ((fg)(x) − (fg)(a))/(x − a) = f(a)g'(a) + g(a)f'(a)

thereby completing the proof.

Using the product rule, we see that

(cos(x) sin(x))' = −sin(x) sin(x) + cos(x) cos(x) = cos²(x) − sin²(x) = cos(2x).
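The product rule computation just performed can be sanity-checked numerically. The Python sketch below (our own illustration, not from the notes) compares a symmetric difference quotient of cos(x)sin(x) with the value cos(2a) predicted above.

```python
import math

def sym_quotient(fn, a, h=1e-6):
    # Symmetric difference quotient, a standard numerical derivative estimate.
    return (fn(a + h) - fn(a - h)) / (2 * h)

# Product Rule check: (cos * sin)'(a) = -sin(a)sin(a) + cos(a)cos(a) = cos(2a).
a = 0.7
numeric = sym_quotient(lambda x: math.cos(x) * math.sin(x), a)
predicted = math.cos(2 * a)
assert abs(numeric - predicted) < 1e-8
```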
Furthermore, note the product rule can be applied recursively; that is, if f, g, and h are all differentiable at a, then

(fgh)'(a) = f'(a)(gh)(a) + f(a)(gh)'(a)
  = f'(a)g(a)h(a) + f(a)(g'(a)h(a) + g(a)h'(a))
  = f'(a)g(a)h(a) + f(a)g'(a)h(a) + f(a)g(a)h'(a).

Next, how do we take the derivative of the reciprocal of a function?

Lemma 6.1.11. If f is differentiable at a point a and f(a) ≠ 0, then the function h defined via h(x) = 1/f(x) is differentiable at a and

h'(a) = −f'(a)/(f(a))².

Proof. First, we must demonstrate that h(x) is well-defined in an open interval containing a. Notice since f'(a) exists, f is continuous at a by Theorem 6.1.7. Hence, by the homework, there exists an open interval I containing a such that f(x) ≠ 0 for all x ∈ I. Thus h(x) is well-defined on I so it makes sense to discuss whether h'(a) exists.

Notice that

(h(x) − h(a))/(x − a) = (1/f(x) − 1/f(a))/(x − a) = ((f(a) − f(x))/(f(a)f(x)))/(x − a) = −(f(x) − f(a))/(f(x)f(a)(x − a)).

Since f'(a) exists, f is continuous at a by Theorem 6.1.7, so lim_{x→a} f(x) = f(a). Therefore, as f(a) ≠ 0, lim_{x→a} 1/f(x) = 1/f(a). Hence

lim_{x→a} (h(x) − h(a))/(x − a) = −(1/f(a)) ( lim_{x→a} 1/f(x) ) ( lim_{x→a} (f(x) − f(a))/(x − a) ) = −(1/f(a))(1/f(a)) f'(a) = −f'(a)/(f(a))²

thereby completing the proof.

As an example of the above result,

(sec(x))' = (1/cos(x))' = −(−sin(x))/cos²(x) = tan(x) sec(x).

Combining the above result with the product rule, we obtain:

Theorem 6.1.12 (Quotient Rule). If f and g are differentiable at a point a and g(a) ≠ 0, then the function h defined via h(x) = f(x)/g(x) is differentiable at a and

h'(a) = (f'(a)g(a) − f(a)g'(a))/(g(a))².

Proof. By Lemma 6.1.11 and Theorem 6.1.10, we obtain that h is differentiable at a and

h'(a) = f'(a)(1/g(a)) + f(a)(−g'(a)/(g(a))²) = (f'(a)g(a) − f(a)g'(a))/(g(a))².
For example,

(tan(x))' = (sin(x)/cos(x))' = (cos(x) cos(x) − sin(x)(−sin(x)))/cos²(x) = 1/cos²(x) = sec²(x).

For our final rule, the Chain Rule, we desire to compute the derivative of the composition of functions, provided the composition makes sense and the derivatives exist. Most proofs of the Chain Rule seen in elementary calculus have a large flaw in them. To rigorously prove the Chain Rule, we will need the following.

Theorem 6.1.13 (Carathéodory's Theorem). Let f be a function defined on an open interval I containing a. Then f is differentiable at a if and only if there exists a function ϕ defined on I such that ϕ is continuous at a and f(x) = f(a) + ϕ(x)(x − a) for all x ∈ I. Furthermore, f'(a) = ϕ(a).

Proof. First, suppose there is a function ϕ defined on I such that ϕ is continuous at a and f(x) = f(a) + ϕ(x)(x − a). To see that f is differentiable at a, notice if x ≠ a then

(f(x) − f(a))/(x − a) = ϕ(x)(x − a)/(x − a) = ϕ(x).

Therefore, since lim_{x→a} ϕ(x) exists (and equals ϕ(a)), we obtain that

lim_{x→a} (f(x) − f(a))/(x − a)

exists (and equals ϕ(a)) so f is differentiable at a.

Conversely, suppose that f is differentiable at a. Define ϕ : I → R via

ϕ(x) = f'(a) if x = a, and ϕ(x) = (f(x) − f(a))/(x − a) if x ≠ a,

for all x ∈ I. Clearly f(x) = f(a) + ϕ(x)(x − a) for all x ∈ I. Furthermore, since

lim_{x→a} ϕ(x) = lim_{x→a} (f(x) − f(a))/(x − a) = f'(a) = ϕ(a),

ϕ is continuous at a as desired.

Theorem 6.1.14 (Chain Rule). Let I and J be open intervals, let g : I → R, and let f : J → R be such that f(J) ⊆ I. Suppose that a ∈ J, f is differentiable at a, and g is differentiable at f(a). Then g ∘ f : J → R is differentiable at a and (g ∘ f)'(a) = g'(f(a))f'(a).

Proof. Since f'(a) and g'(f(a)) exist, we have that lim_{x→a} f(x) = f(a) and lim_{x→f(a)} g(x) = g(f(a)) by Theorem 6.1.7.
Furthermore, by Theorem 6.1.13, there exist functions ϕ : J → R and ψ : I → R such that

• ϕ is continuous at a,
• f(x) = f(a) + ϕ(x)(x − a) for all x ∈ J,
• f'(a) = ϕ(a),
• ψ is continuous at f(a),
• g(x) = g(f(a)) + ψ(x)(x − f(a)) for all x ∈ I, and
• g'(f(a)) = ψ(f(a)).

Therefore

g(f(x)) − g(f(a)) = ψ(f(x))(f(x) − f(a)) = ψ(f(x))ϕ(x)(x − a).

However, since lim_{x→a} f(x) = f(a) and since ψ is continuous at f(a),

lim_{x→a} ψ(f(x)) = ψ(f(a))

so ψ ∘ f is continuous at a. Furthermore, since ϕ is continuous at a, the function h : J → R defined by h(x) = ψ(f(x))ϕ(x) is continuous at a. Since g(f(x)) = g(f(a)) + h(x)(x − a), we obtain that g ∘ f is differentiable at a with derivative

h(a) = ψ(f(a))ϕ(a) = g'(f(a))f'(a)

by Theorem 6.1.13.

As an example of the Chain Rule, we notice that (cos(x³))' = (−sin(x³))(3x²).

6.2 Inverse Functions

In this section, we will examine the inverses of functions on closed intervals. In particular, our theory will apply to the natural logarithm ln(x) (being the inverse of e^x on R) and to inverse trigonometric functions. Unfortunately, the technology to prove certain functions are invertible (specifically Theorem 6.4.6) is not available at this time, so we will make assumptions about the invertibility of the necessary functions.

6.2.1 Monotone Functions

One of the simplest ways to construct injective functions on R is via the following notions, which will be of greater interest to us than injectivity alone.

Definition 6.2.1. Let I be an interval. A function f defined on I is said to be

• increasing on I if f(x1) < f(x2) whenever x1, x2 ∈ I and x1 < x2,
• non-decreasing on I if f(x1) ≤ f(x2) whenever x1, x2 ∈ I and x1 < x2,
• decreasing on I if f(x1) > f(x2) whenever x1, x2 ∈ I and x1 < x2,
• non-increasing on I if f(x1) ≥ f(x2) whenever x1, x2 ∈ I and x1 < x2,
• monotone on I if f is non-decreasing or non-increasing.
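The conditions of Definition 6.2.1 are easy to test on sample points. The sketch below (our own illustration, not from the notes) checks that f(x) = x³ is increasing on a grid in [−1, 1], while f(x) = x² is neither non-decreasing nor non-increasing there; of course, sampling can only suggest, not prove, monotonicity on the whole interval.

```python
def is_increasing(values):
    # Strict inequality of Definition 6.2.1, on consecutive samples.
    return all(a < b for a, b in zip(values, values[1:]))

def is_non_decreasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))

def is_non_increasing(values):
    return all(a >= b for a, b in zip(values, values[1:]))

xs = [k / 10.0 for k in range(-10, 11)]   # sample grid on [-1, 1]
cubes = [x ** 3 for x in xs]
squares = [x * x for x in xs]

assert is_increasing(cubes)                 # x**3 is increasing on [-1, 1]
assert not is_non_decreasing(squares)       # x**2 decreases on [-1, 0] ...
assert not is_non_increasing(squares)       # ... and increases on [0, 1]
```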
Indeed, if we include continuity hypotheses, there is little difference between injective and monotone functions.

Proposition 6.2.2. Let I be an interval and let f : I → R. If f is continuous on I and f is injective, then f is either increasing or decreasing on I.

Proof. Notice that since f is injective on I, if c, d ∈ I are such that c < d, then f(c) ≠ f(d). Suppose that f is neither increasing nor decreasing on I. Therefore, there must exist three points x1, x2, x3 ∈ I with x1 < x2 < x3 such that either

• f(x1) < f(x2) and f(x3) < f(x2), or
• f(x1) > f(x2) and f(x3) > f(x2).

Suppose that f(x1) < f(x2) and f(x3) < f(x2). Choose α ∈ R such that f(x1) < α < f(x2) and f(x3) < α < f(x2) (this is possible by letting β = max{f(x1), f(x3)} and letting α = (β + f(x2))/2). Since f is continuous on [x1, x2], the Intermediate Value Theorem (Theorem 5.3.3) implies there exists a c ∈ (x1, x2) such that f(c) = α. Similarly, since f is continuous on [x2, x3], the Intermediate Value Theorem (Theorem 5.3.3) implies there exists a d ∈ (x2, x3) such that f(d) = α. As c < d, f(c) = α = f(d) contradicts the fact that f is injective on I. Hence we have a contradiction in this case. Since the case that f(x1) > f(x2) and f(x3) > f(x2) also produces a contradiction by similar arguments, we have obtained a contradiction. Hence f is either increasing or decreasing on I.

It is important to note that the above result is false if f is not continuous. Indeed if

f(x) = x if x ∈ Q, and f(x) = −x if x ∈ R \ Q,

then f is injective on R, yet f is neither increasing nor decreasing.

Our next result demonstrates that it is easy to check that monotone functions are continuous. In particular, to check that a monotone function is continuous, we just need to check that the function does not have jump discontinuities.

Theorem 6.2.3. Let f : [a, b] → R be monotone.
Then f is continuous on [a, b] if and only if whenever α ∈ R is such that f(a) < α < f(b) or f(b) < α < f(a) there exists a c ∈ [a, b] such that f(c) = α (that is, f is continuous if and only if the conclusions of the Intermediate Value Theorem hold).

Proof. If f is continuous on [a, b], then whenever α ∈ R is such that f(a) < α < f(b) or f(b) < α < f(a) there exists a c ∈ [a, b] such that f(c) = α by the Intermediate Value Theorem (Theorem 5.3.3).

For the converse, suppose that f is non-decreasing, as the proof when f is non-increasing will hold by similar arguments. Let x0 ∈ (a, b) be arbitrary. To see that f is continuous at x0, let ε > 0 be arbitrary.

Let α = max{f(a), f(x0) − ε}. If α = f(a), let c1 = a and notice that if c1 < x < x0, then

0 ≤ f(x0) − f(x) ≤ f(x0) − f(a) = f(x0) − α ≤ f(x0) − (f(x0) − ε) = ε

as f is non-decreasing. Otherwise f(a) < α = f(x0) − ε. Therefore, by assumption, there exists a c1 ∈ [a, b] such that f(c1) = α. Since f(c1) = α = f(x0) − ε < f(x0), it must be the case that c1 < x0 as f is non-decreasing. Furthermore, if c1 < x < x0, then

0 ≤ f(x0) − f(x) ≤ f(x0) − f(c1) = f(x0) − α = f(x0) − (f(x0) − ε) = ε.

Hence, in either case, there exists a c1 ∈ [a, x0) such that |f(x) − f(x0)| ≤ ε for all x ∈ (c1, x0).

Let β = min{f(b), f(x0) + ε}. If β = f(b), let c2 = b and notice that if x0 < x < b, then

0 ≤ f(x) − f(x0) ≤ f(b) − f(x0) = β − f(x0) ≤ (f(x0) + ε) − f(x0) = ε

as f is non-decreasing. Otherwise f(b) > β = f(x0) + ε. Therefore, by assumption, there exists a c2 ∈ [a, b] such that f(c2) = β. Since f(c2) = β = f(x0) + ε > f(x0), it must be the case that x0 < c2 as f is non-decreasing. Furthermore, if x0 < x < c2, then

0 ≤ f(x) − f(x0) ≤ f(c2) − f(x0) = β − f(x0) = (f(x0) + ε) − f(x0) = ε.

Hence, in either case, there exists a c2 ∈ (x0, b] such that |f(x) − f(x0)| ≤ ε for all x ∈ (x0, c2).
Therefore, if x ∈ (c1, c2) then |f(x) − f(x0)| ≤ ε. Let δ = min{x0 − c1, c2 − x0} > 0. Then, if |x − x0| < δ then x ∈ (c1, c2) so |f(x) − f(x0)| ≤ ε. Hence f is continuous at x0. Since x0 ∈ (a, b) was arbitrary, f is continuous at each point in (a, b). To see that f is continuous at b, apply the α-part of the argument with x0 = b to obtain lim_{x→b−} f(x) = f(b). Similarly, to see that f is continuous at a, apply the β-part of the argument with x0 = a to obtain lim_{x→a+} f(x) = f(a). Hence f is continuous on [a, b].

6.2.2 Inverse Function Theorem

Although Theorem 6.2.3 may seem an odd thing to prove, it is the essential tool in demonstrating the following (which tells us the continuity of e^x implies the continuity of ln(x)).

Corollary 6.2.4. If f : [a, b] → R is injective and continuous on [a, b], then f([a, b]) is a closed interval and the inverse of f on its image, f⁻¹ : f([a, b]) → [a, b], is continuous.

Proof. Note f is either increasing or decreasing by Proposition 6.2.2. We will assume that f is increasing, as the proof when f is decreasing will follow by similar arguments (or can follow since, if f is decreasing, the function g : [a, b] → R defined by g(x) = −f(x) is an increasing continuous injective function).

Since f is increasing, we obtain f(a) < f(b). Since f is continuous, we obtain by the Intermediate Value Theorem (Theorem 5.3.3) that f([a, b]) = [f(a), f(b)].

We claim that f⁻¹ is increasing. To see this, suppose y1, y2 ∈ f([a, b]) are such that y1 < y2. Choose x1, x2 ∈ [a, b] such that f(x1) = y1 and f(x2) = y2. Since f(x1) < f(x2), it must be the case that x1 < x2 as f is increasing. Hence

f⁻¹(y1) = f⁻¹(f(x1)) = x1 < x2 = f⁻¹(f(x2)) = f⁻¹(y2).

Hence f⁻¹ is increasing. Hence f⁻¹ : [f(a), f(b)] → [a, b] is an increasing function such that f⁻¹([f(a), f(b)]) = [a, b].
Therefore f⁻¹ is continuous by Theorem 6.2.3, as f⁻¹ satisfies the conclusions of the Intermediate Value Theorem.

Since inverse functions of continuous functions are continuous, it is possible that they are differentiable. In particular, the following tells us how to compute derivatives of inverse functions.

Theorem 6.2.5 (Inverse Function Theorem). Suppose a, b ∈ R with a < b and f : [a, b] → R is injective and continuous on [a, b]. Let g : f([a, b]) → [a, b] be the inverse of f on its image. If c ∈ (a, b) and f is differentiable at c with f'(c) ≠ 0, then g is differentiable at f(c) and

g'(f(c)) = 1/f'(c).

Proof. Since f is injective and continuous, f([a, b]) is a closed interval by Corollary 6.2.4. Furthermore, g is continuous on f([a, b]) by the same corollary. To see that

lim_{x→f(c)} (g(x) − g(f(c)))/(x − f(c))

exists, suppose (xn)n≥1 is such that xn ∈ f([a, b]) \ {f(c)} for all n ∈ N and lim_{n→∞} xn = f(c). Let yn = g(xn) (so f(yn) = xn ≠ f(c) and thus yn ≠ c). Then

(g(xn) − g(f(c)))/(xn − f(c)) = (yn − c)/(f(yn) − f(c)) = 1/((f(yn) − f(c))/(yn − c)).

Since lim_{n→∞} xn = f(c) and since g is continuous at f(c), we obtain that

lim_{n→∞} yn = lim_{n→∞} g(xn) = g(f(c)) = c.

Hence, as f'(c) exists and f'(c) ≠ 0,

lim_{n→∞} 1/((f(yn) − f(c))/(yn − c)) = 1/f'(c).

Hence

lim_{n→∞} (g(xn) − g(f(c)))/(xn − f(c)) = 1/f'(c).

Since this holds for all (xn)n≥1 such that xn ∈ f([a, b]) \ {f(c)} for all n ∈ N and lim_{n→∞} xn = f(c), we obtain that g'(f(c)) exists and equals 1/f'(c).

Example 6.2.6. Let f(x) = e^x for all x ∈ R. Then g(x) = ln(x) for x > 0 is the inverse of f. Since f'(x) = e^x, if a ∈ R then

g'(e^a) = g'(f(a)) = 1/f'(a) = 1/e^a.

Since if x > 0 we can write x = e^a for some a ∈ R, we obtain that g'(x) = 1/x as desired.

Using the above (and the theory of exponentials), for a fixed b > 0 we can compute the derivative of b^x. Indeed we know that b = e^{ln(b)} and thus b^x = (e^{ln(b)})^x = e^{x ln(b)}.
Hence, by the Chain Rule, (b^x)' = ln(b) e^{x ln(b)} = ln(b) b^x. In particular, using the definition of the derivative, we obtain that

ln(b) = lim_{h→0} (b^h − 1)/h.

Example 6.2.7. We know that cos(x), sin(x), and tan(x) are not injective functions on R. However, we can show that f : [0, π] → [−1, 1], g : [−π/2, π/2] → [−1, 1], and h : (−π/2, π/2) → R defined by

f(x) = cos(x), g(x) = sin(x), and h(x) = tan(x)

are injective continuous functions with non-zero derivatives (on the open intervals). Consequently, their inverses, denoted arccos : [−1, 1] → [0, π], arcsin : [−1, 1] → [−π/2, π/2], and arctan : R → (−π/2, π/2) respectively, are continuous by Corollary 6.2.4 and differentiable by Theorem 6.2.5. Let's compute their derivatives.

First, for f(x), we notice that arccos(−1) = π and arccos(1) = 0. Also

(arccos(x))' = 1/f'(arccos(x)) = −1/sin(arccos(x)) = −1/√(1 − x²),

where we have used a right triangle with angle θ = arccos(x), adjacent side x, opposite side √(1 − x²), and hypotenuse 1.

Next, for g(x), we notice that arcsin(−1) = −π/2 and arcsin(1) = π/2. Also

(arcsin(x))' = 1/g'(arcsin(x)) = 1/cos(arcsin(x)) = 1/√(1 − x²),

where we have used a right triangle with angle θ = arcsin(x), opposite side x, adjacent side √(1 − x²), and hypotenuse 1.

Finally, for h(x), we notice that tan(x) tends to −∞ as x tends to −π/2 from the right, whereas tan(x) tends to ∞ as x tends to π/2 from the left. Also

(arctan(x))' = 1/h'(arctan(x)) = 1/sec²(arctan(x)) = 1/(1 + x²),

where we have used a right triangle with angle θ = arctan(x), opposite side x, adjacent side 1, and hypotenuse √(1 + x²).

6.3 Extreme Values of Functions

Next we turn our attention to another problem, which may be motivated by physics: if an object travels from the Earth to the moon, how do we know that there is a point where it obtains its maximum speed? Of course, we must assume that speed is a well-defined continuous function (no teleporters) in order to answer this question in the affirmative.
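Before moving on, the inverse-function derivative formulas just obtained can be spot-checked numerically. In this Python sketch (ours, not part of the notes), symmetric difference quotients of the inverse trigonometric functions match the closed forms from Example 6.2.7, and the limit formula for ln(b) is checked for b = 10.

```python
import math

def sym_quotient(fn, a, h=1e-6):
    # Symmetric difference quotient, a standard numerical derivative estimate.
    return (fn(a + h) - fn(a - h)) / (2 * h)

x = 0.3
# (arccos(x))' = -1/sqrt(1 - x**2), (arcsin(x))' = 1/sqrt(1 - x**2),
# (arctan(x))' = 1/(1 + x**2), as computed in Example 6.2.7.
assert abs(sym_quotient(math.acos, x) + 1.0 / math.sqrt(1 - x * x)) < 1e-6
assert abs(sym_quotient(math.asin, x) - 1.0 / math.sqrt(1 - x * x)) < 1e-6
assert abs(sym_quotient(math.atan, x) - 1.0 / (1 + x * x)) < 1e-6

# ln(b) = lim_{h -> 0} (b**h - 1)/h, checked for b = 10.
b, h = 10.0, 1e-8
assert abs((b ** h - 1.0) / h - math.log(b)) < 1e-6
```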
A similar question is to analyze the elements of R at which a function f on R obtains its extremal values. Consequently, we make the following definition.

Definition 6.3.1. Let I be an interval and let f : I → R. It is said that f has a

• global maximum at c if f(x) ≤ f(c) for all x ∈ I,
• global minimum at c if f(x) ≥ f(c) for all x ∈ I,
• local maximum at c if there exists an open interval J ⊆ I such that c ∈ J and f(x) ≤ f(c) for all x ∈ J,
• local minimum at c if there exists an open interval J ⊆ I such that c ∈ J and f(x) ≥ f(c) for all x ∈ J.

Given a function f, it is clear that f must have the following property in order for f to have a global maximum or minimum.

Definition 6.3.2. A function f defined on an interval I is said to be bounded if f(I) is a bounded set.

Using the Bolzano–Weierstrass Theorem (Theorem 2.4.5) or our results about sequentially compact sets (see Theorem 3.3.8), we can obtain the second piece of our Triforce. Consequently, the following result is really a result about continuous functions on compact sets.

Theorem 6.3.3 (Extreme Value Theorem). Let f : [a, b] → R be continuous on [a, b]. Then there exist points x1, x2 ∈ [a, b] such that f(x1) ≤ f(x) ≤ f(x2) for all x ∈ [a, b]; that is, f has a global maximum and a global minimum on [a, b].

Proof. Since f : [a, b] → R is continuous on [a, b], f is bounded. Let α = glb(f([a, b])). Hence, for each n ∈ N there exists a yn ∈ [a, b] such that

α ≤ f(yn) < α + 1/n.

Hence lim_{n→∞} f(yn) = α by the Squeeze Theorem. Since (yn)n≥1 is a sequence such that yn ∈ [a, b] and since [a, b] is sequentially compact (see Theorem 3.3.8), there exists a subsequence (ykn)n≥1 such that lim_{n→∞} ykn exists and is an element of [a, b]. Let x1 = lim_{n→∞} ykn ∈ [a, b]. Then, since f is continuous on [a, b],

f(x1) = lim_{n→∞} f(ykn) = α.

Hence f(x1) ≤ f(x) for all x ∈ [a, b] by the definition of α.
Similar arguments show that if β = lub(f([a, b])), then there exists an x2 ∈ [a, b] such that f(x2) = β. Hence f(x) ≤ f(x2) for all x ∈ [a, b] by the definition of β.

Of course, the Extreme Value Theorem says that the maximum and minimum are obtained, but provides no method for computing them. In the case our functions are differentiable, we are in luck.

Proposition 6.3.4. Let I be an interval and let f : I → R. If f has a local maximum or local minimum at c ∈ I, and if f'(c) exists, then f'(c) = 0.

Proof. Let c ∈ I be such that f'(c) exists. We will assume that f has a local maximum at c, as the proof when f has a local minimum at c is similar. Since f has a local maximum at c, there exists an open interval J ⊆ I such that c ∈ J and f(x) ≤ f(c) for all x ∈ J.

If x ∈ J and x > c, then

(f(x) − f(c))/(x − c) ≤ 0

as the numerator is non-positive and the denominator is positive. Therefore, as J is an open interval containing c,

lim_{x→c+} (f(x) − f(c))/(x − c) ≤ 0.

Similarly, if x ∈ J and x < c, then

(f(x) − f(c))/(x − c) ≥ 0

as the numerator is non-positive whereas the denominator is negative. Therefore, as J is an open interval containing c,

lim_{x→c−} (f(x) − f(c))/(x − c) ≥ 0.

Since f'(c) exists,

lim_{x→c−} (f(x) − f(c))/(x − c) = f'(c) = lim_{x→c+} (f(x) − f(c))/(x − c).

Hence the above inequalities show 0 ≤ f'(c) ≤ 0 and thus f'(c) = 0.

Using the above, we may describe an algorithm for deducing the maximal and minimal values a function achieves. Given a function f defined on either [a, b] or R:

1. Find all points where f is differentiable.
2. Find the value of f at all points x where f'(x) = 0.
3. Find the value of f at all points x where f'(x) does not exist (if f is not differentiable on a large set, we are pretty much out of luck).
4. If f is defined on [a, b], find f(a) and f(b). Otherwise, if f is defined on R, find lim_{x→∞} f(x) and lim_{x→−∞} f(x).
We must do this to analyze the behaviour of f at the boundary values, for which we have no information pertaining to derivatives.

5. Compare the values to find the maximal/minimal values and where they occur.

The following two examples demonstrate that a maximum/minimum can occur where f' does not exist, and that f'(x) can exist and equal zero at a point that is not a local maximum or minimum.

Example 6.3.5. If f(x) = |x|, we can see that f has no maximum on R yet f has a minimal value of 0 at x = 0. However, f'(0) does not exist.

Example 6.3.6. If f(x) = x³, we can see that f'(x) = 3x² is zero only when x = 0. However, f does not have a local maximum nor a local minimum at 0 as f(x) < 0 if x < 0 and f(x) > 0 if x > 0.

One tool the above example illustrates is missing from our repertoire: the ability to determine whether a point is a local maximum or local minimum based on the derivative. For example, consider the function f(x) = xe^x defined for all x ∈ R. We notice, via the product rule, that

f'(x) = e^x + xe^x = (x + 1)e^x.

Hence f'(x) is zero only when x = −1. However, how can we tell if f has a local maximum or minimum at x = −1, or whether we are in a similar situation as we were with 0 for x³?

6.4 The Mean Value Theorem

To develop a tool to answer the above question, we turn our attention to obtaining the third and final piece of our Triforce of theorems. This final essential theorem is motivated by the following problem: suppose one drove from College Station to Houston (approximately 96 miles) in one hour. How can we prove that at some point in the journey the driver was speeding? Of course, we must make some similar assumptions again: distance is a function of time (no time travel) and distance is a continuous function of time (no teleporters). Furthermore, to be able to measure the speed of the vehicle at any instant in time, the distance function must be differentiable.
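Numerically, one can already guess the answer to the question raised above about f(x) = xe^x. The sketch below (our own illustration, not from the notes) confirms that f'(−1) = 0 and that every nearby sampled value exceeds f(−1), suggesting a local minimum; the theory developed in this section is what turns such evidence into a proof.

```python
import math

def f(x):
    return x * math.exp(x)

def fprime(x):
    # Computed via the product rule in the text: f'(x) = (x + 1) * e**x.
    return (x + 1) * math.exp(x)

# The only critical point is x = -1.
assert fprime(-1.0) == 0.0

# f(-1) is smaller than f at every nearby sample point, suggesting
# (though not proving) a local minimum at x = -1.
c = -1.0
samples = [c + k * 1e-3 for k in range(-100, 101) if k != 0]
assert all(f(x) > f(c) for x in samples)
```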
Our theorem (Theorem 6.4.2) will demonstrate that there must be a point where the instantaneous speed of the vehicle equals the mean (or average) speed over the journey; namely 96 miles per hour in this case. Consequently, at some point in the journey, the vehicle must have been speeding.

6.4.1 Proof of the Mean Value Theorem

To prove our main theorem, we start with a lemma that is easier to prove and will enable us to prove the desired theorem via a simple translation.

Lemma 6.4.1 (Rolle's Theorem). If f : [a, b] → R is continuous on [a, b], differentiable on (a, b), and f(a) = f(b) = 0, then there exists a c ∈ (a, b) such that f'(c) = 0.

Proof. The proof will be divided into three cases.

Case 1: f(x) = 0 for all x ∈ (a, b). Clearly f'(x) = 0 for all x ∈ (a, b) by Example 6.1.2.

Case 2: There is an x0 ∈ (a, b) with f(x0) > 0. By the Extreme Value Theorem (Theorem 6.3.3) there exists a c ∈ [a, b] such that f(c) ≥ f(x) for all x ∈ [a, b]. Thus f(c) ≥ f(x0) > 0 so c ≠ a, b. Since f(c) ≥ f(x) for all x ∈ [a, b], c must be a local maximum of f on (a, b) and thus f'(c) = 0 by Proposition 6.3.4.

Case 3: There is an x0 ∈ (a, b) with f(x0) < 0. By the Extreme Value Theorem (Theorem 6.3.3) there exists a c ∈ [a, b] such that f(c) ≤ f(x) for all x ∈ [a, b]. Thus f(c) ≤ f(x0) < 0 so c ≠ a, b. Since f(c) ≤ f(x) for all x ∈ [a, b], c must be a local minimum of f on (a, b) and thus f'(c) = 0 by Proposition 6.3.4.

As at least one of the above three cases must always hold, the result follows.

To use Rolle's Theorem to obtain our third piece of the Triforce, we will translate our function. To do this, notice that given a function f : [a, b] → R, the function

g(x) = f(a) + ((f(b) − f(a))/(b − a))(x − a)

is a line with slope (f(b) − f(a))/(b − a) that passes through the points (a, f(a)) and (b, f(b)).

Theorem 6.4.2 (Mean Value Theorem).
If $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists a $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Proof. Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Define $h : [a, b] \to \mathbb{R}$ by
$$h(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a)$$
for all $x \in [a, b]$. By previous results $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with
$$h'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}.$$
Notice that
$$h(a) = f(a) - f(a) - \frac{f(b) - f(a)}{b - a}(a - a) = 0$$
and
$$h(b) = f(b) - f(a) - \frac{f(b) - f(a)}{b - a}(b - a) = 0.$$
Hence, by Rolle’s Theorem (Lemma 6.4.1), there exists a $c \in (a, b)$ such that $h'(c) = 0$. Hence
$$0 = f'(c) - \frac{f(b) - f(a)}{b - a}$$
and thus the result follows.

Now that our Triforce is complete, any result we wish to prove is ours!

Before we move on, we note that the conclusion of the Mean Value Theorem can fail if $f$ fails to be differentiable at even a single point. Indeed, if $f : [-1, 1] \to \mathbb{R}$ is defined by $f(x) = |x|$, then $f$ is continuous on $[-1, 1]$ and differentiable on $(-1, 1) \setminus \{0\}$. However, there is no point $c \in (-1, 1)$ such that
$$f'(c) = \frac{f(1) - f(-1)}{1 - (-1)} = \frac{1 - 1}{2} = 0.$$

6.4.2 Anti-Derivatives

For our first application of the Mean Value Theorem, we will examine how zero derivatives affect functions. In particular, this enables us to study functions which have the same derivative.

Corollary 6.4.3. Let $I$ be an open interval and suppose $f : I \to \mathbb{R}$ is differentiable on $I$ with $f'(x) = 0$ for all $x \in I$. Then there exists an $\alpha \in \mathbb{R}$ such that $f(x) = \alpha$ for all $x \in I$.

Proof. Let $a, b \in I$ be such that $a < b$. Then $f$ is continuous on $[a, b]$ (by Theorem 6.1.7) and differentiable on $(a, b)$. Therefore, by the Mean Value Theorem (Theorem 6.4.2), there exists a $c \in (a, b)$ such that
$$\frac{f(b) - f(a)}{b - a} = f'(c) = 0 \implies f(b) = f(a)$$
(as $f'(x) = 0$ for all $x \in I$). Choose $x_0 \in I$ and let $\alpha = f(x_0)$. If $x \in I$ and $x \neq x_0$, then either $x > x_0$ or $x < x_0$. In either case the above shows that $f(x) = f(x_0) = \alpha$.
Hence $f(x) = \alpha$ for all $x \in I$.

Corollary 6.4.4. Let $I$ be an open interval and suppose $f, g : I \to \mathbb{R}$ are differentiable on $I$ with $f'(x) = g'(x)$ for all $x \in I$. Then there exists an $\alpha \in \mathbb{R}$ such that $f(x) = g(x) + \alpha$ for all $x \in I$.

Proof. Let $h : I \to \mathbb{R}$ be defined by $h(x) = f(x) - g(x)$. Then $h$ is differentiable on $I$ and $h'(x) = f'(x) - g'(x) = 0$ for all $x \in I$. Hence, by Corollary 6.4.3, there exists an $\alpha \in \mathbb{R}$ such that $h(x) = \alpha$ for all $x \in I$. Hence $f(x) = g(x) + \alpha$ for all $x \in I$.

Based on the above, we make the following definition.

Definition 6.4.5. Let $I$ be an open interval and let $f : I \to \mathbb{R}$. A function $F : I \to \mathbb{R}$ is said to be an anti-derivative of $f$ on $I$ if $F$ is differentiable on $I$ and $F'(x) = f(x)$ for all $x \in I$.

Thus Corollary 6.4.4 implies that if $F$ is an anti-derivative of $f$, then the function $G$ defined via $G(x) = F(x) + c$ for all $x$, for some fixed constant $c \in \mathbb{R}$, is also an anti-derivative of $f$. Anti-derivatives are important tools in integration, as we will see in the next chapter.

6.4.3 Monotone Functions and Derivatives

For our next application of the Mean Value Theorem, we will see how we may deduce that a function is increasing from its derivative. This will also enable us to construct a test to determine whether an extreme value of a function is a maximum, a minimum, or neither.

Theorem 6.4.6 (Increasing Function Theorem). Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f'(x) \geq 0$ for all $x \in (a, b)$, then $f$ is non-decreasing on $[a, b]$. Similarly, if $f'(x) > 0$ for all $x \in (a, b)$, then $f$ is increasing on $[a, b]$.

Proof. Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$ such that $f'(x) \geq 0$ for all $x \in (a, b)$. Let $x_1, x_2 \in [a, b]$ be such that $x_1 < x_2$. Since $f$ is continuous on $[x_1, x_2]$ and differentiable on $(x_1, x_2)$, the Mean Value Theorem (Theorem 6.4.2) implies that there exists a $c \in (x_1, x_2)$ such that
$$f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}.$$
By the assumptions on $f$, $f'(c) \geq 0$.
Therefore, since $x_1 < x_2$, we must have that
$$f(x_2) - f(x_1) = f'(c)(x_2 - x_1) \geq 0.$$
Hence $f$ must be non-decreasing on $[a, b]$, as desired. The proof in the case that $f'(x) > 0$ for all $x \in (a, b)$ is identical.

A similar proof shows the following.

Theorem 6.4.7 (Decreasing Function Theorem). Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f'(x) \leq 0$ for all $x \in (a, b)$, then $f$ is non-increasing on $[a, b]$. Similarly, if $f'(x) < 0$ for all $x \in (a, b)$, then $f$ is decreasing on $[a, b]$.

Note that since one of the defining properties of $f(x) = e^x$ is that $f'(x) = f(x)$, and since $f(x) > 0$ for all $x \in \mathbb{R}$, we see that $e^x$ is an increasing function on $\mathbb{R}$. Therefore $e^x$ is injective on $\mathbb{R}$. Thus, all that is missing in our theory of $e^x$ and $\ln(x)$ are some of the details in Remark 6.1.6. Similarly, we can now show $\sin(x)$, $\cos(x)$, and $\tan(x)$ are increasing/decreasing on certain intervals (assuming we know what their derivatives are).

At the end of Section 6.3, we demonstrated that if $f(x) = xe^x$, then $f'(x) = (x+1)e^x$, so the only possible local maximum or minimum of $f$ occurs at $x = -1$. However, we did not have the ability to determine whether $f$ did have a local maximum or minimum at $-1$. Thanks to the following, we do. Note the following does not require the existence of the second derivative (see Definition 6.4.13), nor does it require the function to be differentiable at the point $c$ in question.

Theorem 6.4.8 (First Derivative Test). Let $f : (a, b) \to \mathbb{R}$ be differentiable on $(a, b)$. Suppose $c \in (a, b)$ has the property that there exists a $\delta > 0$ such that

• $f'(x)$ exists and $f'(x) > 0$ for all $x \in (c, c + \delta) \subseteq (a, b)$, and

• $f'(x)$ exists and $f'(x) < 0$ for all $x \in (c - \delta, c) \subseteq (a, b)$.

Then $f$ has a local minimum at $c$.
Similarly, suppose $c \in (a, b)$ has the property that there exists a $\delta > 0$ such that

• $f'(x)$ exists and $f'(x) < 0$ for all $x \in (c, c + \delta) \subseteq (a, b)$, and

• $f'(x)$ exists and $f'(x) > 0$ for all $x \in (c - \delta, c) \subseteq (a, b)$.

Then $f$ has a local maximum at $c$.

Proof. Let $f : (a, b) \to \mathbb{R}$ be differentiable on $(a, b)$. Suppose $c \in (a, b)$ has the property that there exists a $\delta > 0$ such that

• $f'(x)$ exists and $f'(x) > 0$ for all $x \in (c, c + \delta)$, and

• $f'(x)$ exists and $f'(x) < 0$ for all $x \in (c - \delta, c)$.

To see that $f$ has a local minimum at $c$, first let $x \in (c, c + \delta) \subseteq (a, b)$ be arbitrary. Since $f$ is continuous on $[c, x]$ and differentiable on $(c, x)$, the Mean Value Theorem (Theorem 6.4.2) implies there exists a $d \in (c, x)$ such that
$$f'(d) = \frac{f(x) - f(c)}{x - c}.$$
Since $d \in (c, c + \delta)$, $f'(d) > 0$. Hence the above equation implies $f(x) > f(c)$ for all $x \in (c, c + \delta)$.

Similarly, let $x \in (c - \delta, c) \subseteq (a, b)$ be arbitrary. Since $f$ is continuous on $[x, c]$ and differentiable on $(x, c)$, the Mean Value Theorem (Theorem 6.4.2) implies there exists a $d \in (x, c)$ such that
$$f'(d) = \frac{f(c) - f(x)}{c - x}.$$
Since $d \in (c - \delta, c)$, $f'(d) < 0$. Hence the above equation implies $f(x) > f(c)$ for all $x \in (c - \delta, c)$. Therefore, $f$ has a local minimum at $c$ by definition.

The proof of the second portion of this result is similar to the first.

Consequently, if $f(x) = xe^x$, then $f'(x) = (x+1)e^x$. If $x < -1$ then $x + 1 < 0$ whereas $e^x > 0$, so $f'(x) < 0$ if $x < -1$. Furthermore, if $x > -1$ then $x + 1 > 0$ and $e^x > 0$, so $f'(x) > 0$. Therefore, $f$ has a local minimum at $-1$ by Theorem 6.4.8. Note $f(-1) = -\frac{1}{e}$.

Is $-\frac{1}{e}$ a global minimum of $f$? It is not difficult to see that $\lim_{x \to \infty} xe^x = \infty$, as both $x$ and $e^x$ tend to infinity as $x$ tends to infinity (note, by the definition of $e^x$ in Remark 6.1.6, $e^x > 1 + \frac{1}{2}x$ for all $x > 0$). However, $\lim_{x \to -\infty} xe^x$ is not as clear, as $\lim_{x \to -\infty} x = -\infty$ whereas
$$\lim_{x \to -\infty} e^x = \lim_{a \to \infty} e^{-a} = 0.$$
Thus, does $xe^x$ converge as $x$ tends to $-\infty$?
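Before answering, we can probe the question numerically. The sketch below (numerical evidence only, not a proof) checks the sign change of $f'$ at $-1$ from the discussion above and evaluates $xe^x$ at large negative inputs:

```python
import math

# Numerical sketch for f(x) = x*e^x: the sign of f'(x) = (x+1)*e^x flips
# from negative to positive at x = -1 (First Derivative Test), and x*e^x
# appears to shrink toward 0 as x -> -infinity.

f = lambda x: x * math.exp(x)
df = lambda x: (x + 1) * math.exp(x)

assert df(-1.5) < 0 < df(-0.5)                  # sign change at x = -1
samples = [-1 + k / 100 for k in range(-50, 51)]
assert min(f(x) for x in samples) == f(-1.0)    # sampled minimum is at x = -1
print(f(-1.0))                                  # -1/e ≈ -0.36788
print(f(-10.0), f(-50.0))                       # values creeping toward 0
```

This suggests, but does not prove, that the limit as $x \to -\infty$ is $0$; L'Hôpital's rule below settles it.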
It turns out the limit will be $0$, thereby showing that $f$ has a global minimum of $-\frac{1}{e}$ at $x = -1$.

6.4.4 L’Hôpital’s Rule

Notice that, if the limits exist,
$$\lim_{x \to -\infty} xe^x = -\lim_{a \to \infty} ae^{-a} = -\lim_{a \to \infty} \frac{a}{e^a}.$$
However, this does not help us, as $\lim_{a \to \infty} a = \infty = \lim_{a \to \infty} e^a$, so again we have no useful information. If only we had a way to compute such limits! To develop a method for computing limits of the form $\frac{0}{0}$ or $\frac{\infty}{\infty}$, we will need to place our Mean Value Theorem on steroids:

Theorem 6.4.9 (Cauchy’s Mean Value Theorem). Let $f, g : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$ with $g'(x) \neq 0$ for all $x \in (a, b)$. Then there exists a $c \in (a, b)$ such that
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.$$
(Note: If $g(x) = x$ for all $x \in [a, b]$, this is precisely the Mean Value Theorem.)

Proof. By the Mean Value Theorem (Theorem 6.4.2) there exists a $d \in (a, b)$ such that
$$g'(d) = \frac{g(b) - g(a)}{b - a}.$$
As $g'(d) \neq 0$, we obtain that $g(b) - g(a) \neq 0$. Define $h : [a, b] \to \mathbb{R}$ by
$$h(x) = \frac{f(b) - f(a)}{g(b) - g(a)}(g(x) - g(a)) - f(x) + f(a)$$
for all $x \in [a, b]$. Note that $h$ makes sense as $g(b) - g(a) \neq 0$. Furthermore, notice that $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with
$$h'(x) = \frac{f(b) - f(a)}{g(b) - g(a)}g'(x) - f'(x).$$
Furthermore, notice
$$h(a) = \frac{f(b) - f(a)}{g(b) - g(a)}(g(a) - g(a)) - f(a) + f(a) = 0$$
whereas
$$h(b) = \frac{f(b) - f(a)}{g(b) - g(a)}(g(b) - g(a)) - f(b) + f(a) = (f(b) - f(a)) - f(b) + f(a) = 0.$$
Hence, by Rolle’s Theorem (Lemma 6.4.1) or, alternatively, by the Mean Value Theorem, there exists a $c \in (a, b)$ such that $h'(c) = 0$. Hence
$$0 = \frac{f(b) - f(a)}{g(b) - g(a)}g'(c) - f'(c).$$
Therefore
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}$$
as $g'(c) \neq 0$.

Using Cauchy’s Mean Value Theorem, we can obtain the following result, which is extremely useful for computing limits. This result is commonly believed to have been first proved by Bernoulli.

Theorem 6.4.10 (L’Hôpital’s Rule).
Suppose that $-\infty < a < b < \infty$, that $f, g : (a, b) \to \mathbb{R}$ are differentiable on $(a, b)$, that $g'(x) \neq 0$ for all $x \in (a, b)$, and that either

i) $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = 0$, or

ii) $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = \pm\infty$.

Then:

a) If $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$, then $\lim_{x \to a^+} \frac{f(x)}{g(x)} = L$.

b) If $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = \pm\infty$, then $\lim_{x \to a^+} \frac{f(x)}{g(x)} = \pm\infty$.

Similarly, the result holds with $a^+$ exchanged with $b^-$, $\infty$, or $-\infty$.

Proof. For all cases, we claim that there exists at most one point $x$ in $(a, b)$ such that $g(x) = 0$. To see this, notice for all $x_1, x_2 \in (a, b)$ with $x_1 < x_2$ that $g$ is continuous on $[x_1, x_2]$ (by Theorem 6.1.7) and differentiable on $(x_1, x_2)$. Hence, by the Mean Value Theorem (Theorem 6.4.2), there exists a $d \in (x_1, x_2)$ such that
$$g'(d) = \frac{g(x_2) - g(x_1)}{x_2 - x_1}.$$
As $g'(d) \neq 0$, we obtain that $g(x_2) - g(x_1) \neq 0$. As this holds for all $x_1, x_2 \in (a, b)$ with $x_1 < x_2$, there exists at most one point, say $\gamma$, in $(a, b)$ such that $g(\gamma) = 0$.

For the proof of part (a), suppose $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L$. Let $\epsilon > 0$ be arbitrary. Therefore, there exists a $b' \in (a, b)$ such that
$$\left| \frac{f'(x)}{g'(x)} - L \right| < \epsilon$$
for all $x \in (a, b')$. If $\gamma$ exists as in the previous paragraph, we may assume that $b' < \gamma$ by decreasing $b'$ if necessary.

Let $\alpha$ and $\beta$ be arbitrary numbers such that $a < \alpha < \beta < b'$. Since $f$ and $g$ are continuous on $[\alpha, \beta]$, differentiable on $(\alpha, \beta)$, and $g'(x) \neq 0$ for all $x \in (\alpha, \beta)$, Cauchy’s Mean Value Theorem (Theorem 6.4.9) implies there exists a $c \in (\alpha, \beta)$ such that
$$\frac{f'(c)}{g'(c)} = \frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)}.$$
Hence, as $c \in (\alpha, \beta) \subseteq (a, b')$, we obtain that
$$\left| \frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)} - L \right| = \left| \frac{f'(c)}{g'(c)} - L \right| < \epsilon.$$

For case (i), notice the above inequality holds for any $\beta \in (a, b')$ and for all $\alpha \in (a, \beta)$. Hence, by fixing a $\beta \in (a, b')$ and taking the limit as $\alpha$ tends to $a$, we obtain that
$$\frac{f(\beta)}{g(\beta)} = \lim_{\alpha \to a^+} \frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)}$$
as $g(\beta) \neq 0$ and $\lim_{\alpha \to a^+} f(\alpha) = 0 = \lim_{\alpha \to a^+} g(\alpha)$. Hence, for all $\beta \in (a, b')$, we have that
$$\left| \frac{f(\beta)}{g(\beta)} - L \right| \leq \epsilon.$$
Since $\epsilon > 0$ was arbitrary, we obtain that
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = L,$$
as desired.

For case (ii) (that is, $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = \pm\infty$), notice
$$\frac{f(\beta)}{g(\alpha)} - \frac{f(\alpha)}{g(\alpha)} = \frac{1}{g(\alpha)}(f(\beta) - f(\alpha)) = \frac{1}{g(\alpha)}(g(\beta) - g(\alpha))\frac{f'(c)}{g'(c)} = \frac{g(\beta)}{g(\alpha)}\frac{f'(c)}{g'(c)} - \frac{f'(c)}{g'(c)},$$
so
$$\frac{f(\alpha)}{g(\alpha)} = \frac{f'(c)}{g'(c)} + \frac{f(\beta)}{g(\alpha)} - \frac{g(\beta)}{g(\alpha)}\frac{f'(c)}{g'(c)}.$$
Hence
$$\left| \frac{f(\alpha)}{g(\alpha)} - L \right| \leq \left| \frac{f'(c)}{g'(c)} - L \right| + \left| \frac{f(\beta)}{g(\alpha)} \right| + \left| \frac{g(\beta)}{g(\alpha)} \right|\left| \frac{f'(c)}{g'(c)} \right| \leq \epsilon + \left| \frac{f(\beta)}{g(\alpha)} \right| + \left| \frac{g(\beta)}{g(\alpha)} \right|(|L| + \epsilon)$$
for all $\beta \in (a, b')$ and for all $\alpha \in (a, \beta)$. However, for any fixed $\beta$, we know that
$$\lim_{\alpha \to a^+} \left| \frac{f(\beta)}{g(\alpha)} \right| + \left| \frac{g(\beta)}{g(\alpha)} \right|(|L| + \epsilon) = 0$$
as $\lim_{\alpha \to a^+} g(\alpha) = \pm\infty$ (note we really do not need to know $\lim_{x \to a^+} f(x) = \pm\infty$ for this to work). Hence, there exists a $\delta > 0$ such that if $a < \alpha < a + \delta$, then
$$0 \leq \left| \frac{f(\beta)}{g(\alpha)} \right| + \left| \frac{g(\beta)}{g(\alpha)} \right|(|L| + \epsilon) < \epsilon.$$
Hence, if $a < \alpha < a + \delta$, then $\left| \frac{f(\alpha)}{g(\alpha)} - L \right| < 2\epsilon$. Since $\epsilon > 0$ was arbitrary, we obtain that
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = L,$$
as desired.

For part (b), suppose $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = \infty$, as the proof when the limit is $-\infty$ is similar. Let $M > 0$ be arbitrary. Therefore, there exists a $b' \in (a, b)$ such that
$$\frac{f'(x)}{g'(x)} > M$$
for all $x \in (a, b')$. If $\gamma$ exists as in the first paragraph, we may assume that $b' < \gamma$ by decreasing $b'$ if necessary. Let $\alpha$ and $\beta$ be arbitrary numbers such that $a < \alpha < \beta < b'$. Since $f$ and $g$ are continuous on $[\alpha, \beta]$, differentiable on $(\alpha, \beta)$, and $g'(x) \neq 0$ for all $x \in (\alpha, \beta)$, Cauchy’s Mean Value Theorem (Theorem 6.4.9) implies there exists a $c \in (\alpha, \beta)$ such that
$$\frac{f'(c)}{g'(c)} = \frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)}.$$
Hence, as $c \in (\alpha, \beta) \subseteq (a, b')$, we obtain that
$$\frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)} = \frac{f'(c)}{g'(c)} > M.$$
For case (i), notice the above inequality holds for any $\beta \in (a, b')$ and for all $\alpha \in (a, \beta)$.
Hence, by fixing a $\beta \in (a, b')$ and taking the limit as $\alpha$ tends to $a$, we obtain that
$$\frac{f(\beta)}{g(\beta)} = \lim_{\alpha \to a^+} \frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)}$$
as $g(\beta) \neq 0$ and $\lim_{\alpha \to a^+} f(\alpha) = 0 = \lim_{\alpha \to a^+} g(\alpha)$. Hence, for all $\beta \in (a, b')$, we have that
$$\frac{f(\beta)}{g(\beta)} \geq M.$$
Since $M > 0$ was arbitrary, we obtain that
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \infty,$$
as desired.

For case (ii), we may repeat the computation in part (a) to obtain that
$$\frac{f(\alpha)}{g(\alpha)} = \frac{f'(c)}{g'(c)} + \frac{f(\beta)}{g(\alpha)} - \frac{g(\beta)}{g(\alpha)}\frac{f'(c)}{g'(c)}.$$
Notice that as $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = \pm\infty$, there exists a $\delta_1 > 0$ such that
$$\frac{f(\beta)}{g(\alpha)} > 0 \quad \text{and} \quad \frac{g(\beta)}{g(\alpha)} > 0$$
whenever $a < \alpha < \beta < a + \delta_1$ (i.e. $f(\beta)$, $g(\alpha)$, and $g(\beta)$ must eventually all have the same sign). Fix $\beta \in (a, b')$ with $\beta < a + \delta_1$. Since $\lim_{x \to a^+} g(x) = \pm\infty$, we know that
$$\lim_{\alpha \to a^+} \frac{g(\beta)}{g(\alpha)} = 0 \quad \text{and} \quad \lim_{\alpha \to a^+} \left( \frac{f(\beta)}{g(\alpha)} - \frac{g(\beta)}{g(\alpha)}M \right) = 0.$$
Hence we may find a $0 < \delta < \delta_1$ such that if $a < \alpha < a + \delta$, then $\frac{g(\beta)}{g(\alpha)} < 1$ and
$$\frac{f(\beta)}{g(\alpha)} - \frac{g(\beta)}{g(\alpha)}M \geq -\frac{M}{2}.$$
Thus, if $a < \alpha < a + \delta$, then, since $\frac{f'(c)}{g'(c)} > M$ and $1 - \frac{g(\beta)}{g(\alpha)} > 0$,
$$\frac{f(\alpha)}{g(\alpha)} = \frac{f'(c)}{g'(c)}\left(1 - \frac{g(\beta)}{g(\alpha)}\right) + \frac{f(\beta)}{g(\alpha)} \geq M + \frac{f(\beta)}{g(\alpha)} - \frac{g(\beta)}{g(\alpha)}M \geq M - \frac{M}{2} = \frac{M}{2}.$$
Since $M > 0$ was arbitrary, we obtain that
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \infty,$$
as desired.

The proof is nearly identical when we replace $a^+$ with $b^-$ (exchange the roles of $\alpha$ and $\beta$). To complete the proof, we will run part (a), case (i) when we replace $a^+$ with $\infty$, as replacing with $-\infty$ is similar and all other parts/cases are similar. Suppose $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = 0$ and $\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$. Let $h(x) = f\left(\frac{1}{x}\right)$ and $k(x) = g\left(\frac{1}{x}\right)$ for all $x > 0$ with $\frac{1}{x} \in (a, \infty)$. Notice
$$\lim_{x \to 0^+} h(x) = \lim_{x \to \infty} f(x) = 0 = \lim_{x \to \infty} g(x) = \lim_{x \to 0^+} k(x).$$
Also notice that $h$ and $k$ are differentiable via the Chain Rule (Theorem 6.1.14) with
$$h'(x) = -\frac{1}{x^2}f'\left(\frac{1}{x}\right) \quad \text{and} \quad k'(x) = -\frac{1}{x^2}g'\left(\frac{1}{x}\right).$$
Therefore
$$\lim_{x \to 0^+} \frac{h'(x)}{k'(x)} = \lim_{x \to 0^+} \frac{f'\left(\frac{1}{x}\right)}{g'\left(\frac{1}{x}\right)} = \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = L.$$
Hence, by our previous proofs, we obtain that $\lim_{x \to 0^+} \frac{h(x)}{k(x)} = L$.
Since the existence of the above limit implies that $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ exists and
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to 0^+} \frac{h(x)}{k(x)} = L,$$
the result follows.

Using L’Hôpital’s rule, we can determine $\lim_{x \to \infty} \frac{x}{e^x}$. Indeed, since $(x)' = 1$ and $(e^x)' = e^x$, and since
$$\lim_{x \to \infty} \frac{1}{e^x} = 0,$$
we obtain by L’Hôpital’s rule that $\lim_{x \to \infty} \frac{x}{e^x} = 0$. Similarly, using induction, we can use L’Hôpital’s rule to show that
$$\lim_{x \to \infty} \frac{x^n}{e^x} = 0$$
for all $n \in \mathbb{N}$; that is, $e^x$ grows substantially faster than any power of $x$!

We can use L’Hôpital’s rule to prove that a plethora of limits exist and to compute said limits. Furthermore, there are many other forms of L’Hôpital’s rule.

Example 6.4.11 (Fundamental Logarithmic Limit). The Fundamental Logarithmic Limit is the limit
$$\lim_{x \to 0^+} x\ln(x).$$
Although it does not appear that we may apply L’Hôpital’s rule, we may write $x\ln(x) = \frac{\ln(x)}{1/x}$. Then $\lim_{x \to 0^+} \ln(x) = -\infty$ whereas $\lim_{x \to 0^+} \frac{1}{x} = \infty$, so, up to multiplication by $-1$, the hypotheses of L’Hôpital’s rule hold. Since $(\ln(x))' = \frac{1}{x}$ and $\left(\frac{1}{x}\right)' = -\frac{1}{x^2}$, and since
$$\lim_{x \to 0^+} \frac{\frac{1}{x}}{-\frac{1}{x^2}} = \lim_{x \to 0^+} (-x) = 0,$$
we obtain that $\lim_{x \to 0^+} x\ln(x) = 0$ by L’Hôpital’s rule.

Example 6.4.12. Using similar techniques, we may compute
$$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x.$$
To begin, we will instead first compute
$$\lim_{x \to \infty} \ln\left(\left(1 + \frac{1}{x}\right)^x\right) = \lim_{x \to \infty} x\ln\left(1 + \frac{1}{x}\right) = \lim_{x \to \infty} \frac{\ln\left(1 + \frac{1}{x}\right)}{\frac{1}{x}}.$$
First notice that $\lim_{x \to \infty} \frac{1}{x} = 0$ and $\lim_{x \to \infty} \ln\left(1 + \frac{1}{x}\right) = \ln(1) = 0$. Furthermore, since
$$\left(\ln\left(1 + \frac{1}{x}\right)\right)' = \frac{-\frac{1}{x^2}}{1 + \frac{1}{x}} \quad \text{and} \quad \left(\frac{1}{x}\right)' = -\frac{1}{x^2},$$
and since
$$\lim_{x \to \infty} \frac{\frac{-\frac{1}{x^2}}{1 + \frac{1}{x}}}{-\frac{1}{x^2}} = \lim_{x \to \infty} \frac{1}{1 + \frac{1}{x}} = 1,$$
we obtain that $\lim_{x \to \infty} \ln\left(\left(1 + \frac{1}{x}\right)^x\right) = 1$ by L’Hôpital’s rule. Therefore, as $e^x$ is continuous,
$$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x = \lim_{x \to \infty} e^{\ln\left(\left(1 + \frac{1}{x}\right)^x\right)} = e^1 = e.$$
Hence, using the sequential definition of limits, we obtain the well-known limit
$$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e.$$

6.4.5 Taylor’s Theorem

Our final application of the Mean Value Theorem is, given a sufficiently differentiable function $f$, to approximate $f$ pointwise using polynomials.
However, to obtain better and better approximations, we will need polynomials of larger and larger degree, and more and more derivatives of $f$. Thus we define the following.

Definition 6.4.13. Suppose $f : (a, b) \to \mathbb{R}$ is differentiable on $(a, b)$. If $f'$ is differentiable at $c \in (a, b)$, the derivative of $f'$ at $c$ is called the second derivative of $f$ at $c$ and is denoted $f''(c)$. In particular,
$$f''(c) = \lim_{h \to 0} \frac{f'(c + h) - f'(c)}{h}.$$
In general, for any $n \in \mathbb{N}$, the $(n+1)^{\text{st}}$ derivative of $f$ is
$$f^{(n+1)}(c) = \lim_{h \to 0} \frac{f^{(n)}(c + h) - f^{(n)}(c)}{h}$$
provided $f^{(n)}$ exists. For convenience, $f^{(0)} = f$.

Definition 6.4.14. Assuming that $f$ is $n$-times differentiable at $a$ (which means it is $(n-1)$-times differentiable in an open interval containing $a$), the $n^{\text{th}}$-degree Taylor polynomial of $f$ centred at $a$ is
$$P_{f,a,n}(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k.$$

For example, if $f(x) = x^2$, then $P_{f,0,n}(x) = x^2$ for all $n \geq 2$. Alternatively, if $f(x) = e^x$ for all $x \in \mathbb{R}$, then
$$P_{f,0,n}(x) = \sum_{k=0}^n \frac{1}{k!}x^k,$$
which are the polynomials that define $e^x$ up to a limit in Remark 6.1.6.

The reason for examining Taylor polynomials is the following theorem, which says that $f(x)$ is almost $P_{f,a,n}(x)$.

Theorem 6.4.15 (Taylor’s Theorem). Let $I$ be an open interval containing a point $a$ and let $f : I \to \mathbb{R}$ be $n+1$ times differentiable on $I$. If $x \in I \setminus \{a\}$, then there exists a $c_x \in (a, x) \cup (x, a)$ such that
$$f(x) = P_{f,a,n}(x) + \frac{f^{(n+1)}(c_x)}{(n+1)!}(x - a)^{n+1}.$$

Proof. Consider the function $g : I \to \mathbb{R}$ defined by
$$g(t) = f(x) - f(t) - \sum_{k=1}^n \frac{f^{(k)}(t)}{k!}(x - t)^k.$$
Therefore $g(a) = f(x) - P_{f,a,n}(x)$ and $g(x) = 0$. Notice that $g$ is continuous on $I$ and differentiable on $I$. Since
$$\frac{d}{dt}\left[f(t)\right] = f'(t),$$
$$\frac{d}{dt}\left[f'(t)(x - t)\right] = -f'(t) + f''(t)(x - t),$$
$$\frac{d}{dt}\left[\frac{f''(t)}{2!}(x - t)^2\right] = -f''(t)(x - t) + \frac{f'''(t)}{2!}(x - t)^2,$$
$$\frac{d}{dt}\left[\frac{f'''(t)}{3!}(x - t)^3\right] = -\frac{f'''(t)}{2!}(x - t)^2 + \frac{f''''(t)}{3!}(x - t)^3,$$
et cetera, the sum telescopes and we see that
$$g'(t) = -\frac{f^{(n+1)}(t)}{n!}(x - t)^n.$$
Let $h : I \to \mathbb{R}$ be defined by
$$h(t) = g(t) - \left(\frac{x - t}{x - a}\right)^{n+1} g(a).$$
Therefore $h(a) = g(a) - g(a) = 0$ and $h(x) = g(x) - 0 = 0$. Since $h$ is differentiable on $I$, $h$ is continuous on $[a, x] \cup [x, a]$ and differentiable on $(a, x) \cup (x, a)$. Hence by Rolle’s Theorem (Lemma 6.4.1) or by the Mean Value Theorem (Theorem 6.4.2), there exists a $c \in (a, x) \cup (x, a)$ such that $h'(c) = 0$. Since
$$h'(t) = -\frac{f^{(n+1)}(t)}{n!}(x - t)^n + (n+1)\frac{1}{x - a}\left(\frac{x - t}{x - a}\right)^n g(a),$$
we obtain that
$$\frac{f^{(n+1)}(c)}{n!}(x - c)^n = (n+1)\frac{1}{x - a}\left(\frac{x - c}{x - a}\right)^n g(a).$$
Hence
$$\frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1} = g(a) = f(x) - P_{f,a,n}(x),$$
as desired.

One important use of Taylor’s Theorem can be obtained if one knows bounds for $f^{(n+1)}(c_x)$. Indeed, if one knows that $|f^{(n+1)}(c)| \leq M$ for all $c \in (a - \delta, a + \delta)$ for some $M > 0$, then we have that
$$|f(x) - P_{f,a,n}(x)| \leq \frac{M}{(n+1)!}|x - a|^{n+1}$$
for all $x \in (a - \delta, a + \delta)$. Consequently, provided we can approximate $M$ well, we can approximate $f(x)$ with $P_{f,a,n}(x)$ on this interval! This can be quite useful, as dealing with polynomials is substantially easier than dealing with an arbitrary function.

Chapter 7

Integration

For our final chapter, we will study what will be shown to be the opposite of differentiation; namely, integration. Integration has a wide variety of uses in calculus, as it allows the computation of the area under a curve and permits the averaging of the values obtained by a function over an interval. Consequently, the purpose of this chapter is to formally define the Riemann integral, develop the basic properties of the Riemann integral, and demonstrate the connections between differentiation and integration through our Fundamental Theorems of Calculus (Theorems 7.2.2 and 7.2.3).

7.1 The Riemann Integral

The formal definition of the Riemann integral is modelled on trying to approximate the area under the graph of a function.
The idea of approximating this area is to divide the interval over which one wants to integrate into small pieces, and to approximate the area under the graph via rectangles. Thus we must make such constructions formal. Once this is done, we must decide whether or not these approximations are good approximations. If they are, the resulting limit will be the Riemann integral.

7.1.1 Riemann Sums

In order to ‘divide up the interval into small bits’, we will use the following notion.

Definition 7.1.1. A partition of a closed interval $[a, b]$ is a finite list of real numbers $\{t_k\}_{k=0}^n$ such that
$$a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b.$$

Eventually, we will want to ensure that $|t_k - t_{k-1}|$ is small for all $k$ in order to obtain better and better approximations to the area under a graph.

To obtain a lower bound for the area under a graph, we can choose our approximating rectangles to have the largest possible height while remaining completely under the graph. This leads us to the following notion.

Definition 7.1.2. Let $P = \{t_k\}_{k=0}^n$ be a partition of $[a, b]$ and let $f : [a, b] \to \mathbb{R}$ be bounded. The lower Riemann sum of $f$ associated to $P$, denoted $L(f, P)$, is
$$L(f, P) = \sum_{k=1}^n m_k(t_k - t_{k-1})$$
where, for all $k \in \{1, \ldots, n\}$, $m_k = \inf\{f(x) \mid x \in [t_{k-1}, t_k]\}$.

Example 7.1.3. If $f : [0, 1] \to \mathbb{R}$ is defined by $f(x) = x$ for all $x \in [0, 1]$ and if $P = \{t_k\}_{k=0}^n$ is a partition of $[0, 1]$, it is easy to see that
$$L(f, P) = \sum_{k=1}^n t_{k-1}(t_k - t_{k-1})$$
as $f$ attains its minimum on $[t_{k-1}, t_k]$ at $t_{k-1}$. If it so happens that $t_k = \frac{k}{n}$ for all $k \in \{0, 1, \ldots, n\}$, we see that
$$L(f, P) = \sum_{k=1}^n \frac{k-1}{n}\left(\frac{k}{n} - \frac{k-1}{n}\right) = \frac{1}{n^2}\sum_{k=1}^n (k-1) = \frac{1}{n^2}\sum_{j=1}^{n-1} j = \frac{n(n-1)}{2n^2} = \frac{1 - \frac{1}{n}}{2}$$
by the homework. Clearly, as $n$ tends to infinity, $L(f, P)$ tends to $\frac{1}{2}$ for these particular partitions, which happens to be the area under the graph of $f$ on $[0, 1]$.
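The computation in Example 7.1.3 is easy to reproduce numerically. The following sketch (the helper `lower_sum` is ours, not from the notes) evaluates the lower Riemann sum of $f(x) = x$ over the uniform partition and compares it with the closed form $\frac{1}{2}\left(1 - \frac{1}{n}\right)$:

```python
# Sketch reproducing Example 7.1.3: the lower Riemann sum of f(x) = x on
# [0, 1] over the uniform partition t_k = k/n equals (1 - 1/n)/2, which
# tends to the area 1/2 as n grows.

def lower_sum(f, a, b, n):
    """L(f, P) for the uniform n-piece partition of [a, b], assuming f is
    non-decreasing (so the infimum on each piece is at the left endpoint)."""
    ts = [a + k * (b - a) / n for k in range(n + 1)]
    return sum(f(ts[k - 1]) * (ts[k] - ts[k - 1]) for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, lower_sum(lambda x: x, 0.0, 1.0, n), (1 - 1 / n) / 2)
# the computed sums and the closed form agree, and both approach 1/2
```

The assumption that the infimum sits at the left endpoint is what the non-decreasing hypothesis buys; for a general bounded $f$ one would need the actual infimum on each piece.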
Although lower Riemann sums accurately estimated the area under the graph of the function in the previous example, we also need an upper bound for the area under the graph. By choosing our approximating rectangles to have the smallest possible height while remaining completely above the graph, we obtain the following notion.

Definition 7.1.4. Let $P = \{t_k\}_{k=0}^n$ be a partition of $[a, b]$ and let $f : [a, b] \to \mathbb{R}$ be bounded. The upper Riemann sum of $f$ associated to $P$, denoted $U(f, P)$, is
$$U(f, P) = \sum_{k=1}^n M_k(t_k - t_{k-1})$$
where, for all $k \in \{1, \ldots, n\}$, $M_k = \sup\{f(x) \mid x \in [t_{k-1}, t_k]\}$.

Example 7.1.5. If $f : [0, 1] \to \mathbb{R}$ is defined by $f(x) = x$ for all $x \in [0, 1]$ and if $P = \{t_k\}_{k=0}^n$ is a partition of $[0, 1]$, it is easy to see that
$$U(f, P) = \sum_{k=1}^n t_k(t_k - t_{k-1})$$
as $f$ attains its maximum on $[t_{k-1}, t_k]$ at $t_k$. If it so happens that $t_k = \frac{k}{n}$ for all $k \in \{0, 1, \ldots, n\}$, we see that
$$U(f, P) = \sum_{k=1}^n \frac{k}{n}\left(\frac{k}{n} - \frac{k-1}{n}\right) = \frac{1}{n^2}\sum_{k=1}^n k = \frac{n(n+1)}{2n^2} = \frac{1 + \frac{1}{n}}{2}$$
by the homework. Clearly, as $n$ tends to infinity, $U(f, P)$ tends to $\frac{1}{2}$ for these particular partitions, which happens to be the area under the graph of $f$ on $[0, 1]$.

Although we have been able to approximate the area under the graph of $f(x) = x$ using upper and lower Riemann sums, how do we know whether we can accurately do so for other functions? To analyze this question, we must first decide whether we can compare the upper and lower Riemann sums of a function. Clearly we have that $L(f, P) \leq U(f, P)$ for any bounded function $f : [a, b] \to \mathbb{R}$ and any partition $P$ of $[a, b]$. However, if $Q$ is another partition of $[a, b]$, is it the case that $L(f, Q) \leq U(f, P)$? Of course our intuition using ‘areas under a graph’ says this should be so, but how do we prove it?

To answer the above question and provide some ‘sequence-like’ structure to partitions, we define an ordering on the set of partitions.

Definition 7.1.6. Let $P$ and $Q$ be partitions of $[a, b]$.
It is said that $Q$ is a refinement of $P$, denoted $P \leq Q$, if $P \subseteq Q$; that is, $Q$ has all of the points that $P$ has, and possibly more.

It is not difficult to check that refinement defines a partial ordering (Definition 1.3.3) on the set of all partitions of $[a, b]$. Furthermore, the following says that if $Q$ is a refinement of $P$, we obtain better upper and lower bounds for the area under the graph of a function by using $Q$ instead of $P$.

Lemma 7.1.7. Let $P$ and $Q$ be partitions of $[a, b]$ and let $f : [a, b] \to \mathbb{R}$ be bounded. If $Q$ is a refinement of $P$, then
$$L(f, P) \leq L(f, Q) \leq U(f, Q) \leq U(f, P).$$

Proof. Note the inequality $L(f, Q) \leq U(f, Q)$ follows from earlier discussions. Thus it suffices to show that $L(f, P) \leq L(f, Q)$ and $U(f, Q) \leq U(f, P)$.

Write $P = \{t_k\}_{k=0}^n$ where $a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b$. First suppose $Q = P \cup \{t'\}$ where $t' \in [a, b]$ is such that $t_{q-1} < t' < t_q$ for some $q \in \{1, \ldots, n\}$. Therefore, if
$$m_k = \inf\{f(x) \mid x \in [t_{k-1}, t_k]\} \quad \text{and} \quad M_k = \sup\{f(x) \mid x \in [t_{k-1}, t_k]\},$$
then
$$L(f, P) = \sum_{k=1}^n m_k(t_k - t_{k-1}) \quad \text{and} \quad U(f, P) = \sum_{k=1}^n M_k(t_k - t_{k-1}).$$
However, if $m'_q = \inf\{f(x) \mid x \in [t_{q-1}, t']\}$, $m''_q = \inf\{f(x) \mid x \in [t', t_q]\}$, $M'_q = \sup\{f(x) \mid x \in [t_{q-1}, t']\}$, and $M''_q = \sup\{f(x) \mid x \in [t', t_q]\}$, then we easily see that $m_q \leq m'_q, m''_q$, that $M'_q, M''_q \leq M_q$,
$$L(f, Q) = m'_q(t' - t_{q-1}) + m''_q(t_q - t') + \sum_{\substack{k=1 \\ k \neq q}}^n m_k(t_k - t_{k-1}),$$
and
$$U(f, Q) = M'_q(t' - t_{q-1}) + M''_q(t_q - t') + \sum_{\substack{k=1 \\ k \neq q}}^n M_k(t_k - t_{k-1}).$$
Thus
$$L(f, Q) - L(f, P) = m'_q(t' - t_{q-1}) + m''_q(t_q - t') - m_q(t_q - t_{q-1}) \geq m_q(t' - t_{q-1}) + m_q(t_q - t') - m_q(t_q - t_{q-1}) = 0,$$
so $L(f, P) \leq L(f, Q)$. Similarly
$$U(f, Q) - U(f, P) = M'_q(t' - t_{q-1}) + M''_q(t_q - t') - M_q(t_q - t_{q-1}) \leq M_q(t' - t_{q-1}) + M_q(t_q - t') - M_q(t_q - t_{q-1}) = 0,$$
so $U(f, Q) \leq U(f, P)$. Hence the result follows when $Q = P \cup \{t'\}$.

To complete the proof, let $Q$ be an arbitrary refinement of $P$.
Thus we can write $Q = P \cup \{t'_k\}_{k=1}^m$ for some $\{t'_k\}_{k=1}^m \subseteq (a, b)$. Thus, by the first part of the proof,
$$L(f, P) \leq L(f, P \cup \{t'_1\}) \leq L(f, P \cup \{t'_1, t'_2\}) \leq \cdots \leq L(f, Q)$$
and
$$U(f, P) \geq U(f, P \cup \{t'_1\}) \geq U(f, P \cup \{t'_1, t'_2\}) \geq \cdots \geq U(f, Q),$$
which completes the proof.

In order to answer our question of whether $L(f, Q) \leq U(f, P)$ for all partitions $P$ and $Q$, we can use Lemma 7.1.7 provided we have a partition that is a refinement of both $P$ and $Q$:

Definition 7.1.8. Given two partitions $P$ and $Q$ of $[a, b]$, the common refinement of $P$ and $Q$ is the partition $P \cup Q$ of $[a, b]$.

Clearly, given two partitions $P$ and $Q$, $P \cup Q$ is a refinement of both $P$ and $Q$. Consequently, if $f : [a, b] \to \mathbb{R}$ is bounded, then Lemma 7.1.7 implies that
$$L(f, P) \leq L(f, P \cup Q) \leq U(f, P \cup Q) \leq U(f, Q).$$
Hence any lower bound for the area under a curve is smaller than any upper bound for the area under a curve.

Before moving on, we note the above shows that the partially ordered set of partitions of a closed interval $[a, b]$ is a directed set (that is, a partially ordered set with the property that if $P$ and $Q$ are elements of the partially ordered set, then there exists an element $R$ such that $P \leq R$ and $Q \leq R$). A set of real numbers indexed by a directed set is called a net, and one can discuss the convergence of nets in $\mathbb{R}$ as we did with sequences. It turns out nothing new is gained by using nets instead of sequences, and we can avoid the discussion of nets in our treatment of integrals (although they exist in the background).

7.1.2 Definition of the Riemann Integral

In order to define the Riemann integral of a function on a closed interval, we desire that the upper and lower Riemann sums both better and better approximate a single number. Using the above observations, we notice that if $f : [a, b] \to \mathbb{R}$ is bounded, then
$$\sup\{L(f, P) \mid P \text{ a partition of } [a, b]\} \leq \inf\{U(f, P) \mid P \text{ a partition of } [a, b]\}.$$
Therefore, in order for there to be no discrepancy between our approximations, we would like equality in the above inequality (in which case, the value obtained should be the area under the graph). Unfortunately, this is not always the case, as the following example shows.

Example 7.1.9. Let $f : [0, 1] \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases}$$
for all $x \in [0, 1]$. Since each open interval always contains at least one element from each of $\mathbb{Q}$ and $\mathbb{R} \setminus \mathbb{Q}$ by the homework, we easily see that $L(f, P) = 0$ and $U(f, P) = 1$ for all partitions $P$ of $[0, 1]$. Consequently, the upper and lower Riemann sums do not allow us to approximate the area under the graph of $f$.

Consequently, we will restrict our attention to the following type of functions.

Definition 7.1.10. Let $f : [a, b] \to \mathbb{R}$ be bounded. It is said that $f$ is Riemann integrable on $[a, b]$ if
$$\sup\{L(f, P) \mid P \text{ a partition of } [a, b]\} = \inf\{U(f, P) \mid P \text{ a partition of } [a, b]\}.$$
If $f$ is Riemann integrable on $[a, b]$, the Riemann integral of $f$ from $a$ to $b$, denoted $\int_a^b f(x)\,dx$, is
$$\int_a^b f(x)\,dx = \sup\{L(f, P) \mid P \text{ a partition of } [a, b]\} = \inf\{U(f, P) \mid P \text{ a partition of } [a, b]\}.$$

Remark 7.1.11. Notice that if $f$ is Riemann integrable on $[a, b]$, then
$$L(f, P) \leq \int_a^b f(x)\,dx \leq U(f, P)$$
for every partition $P$ of $[a, b]$ by the definition of the Riemann integral.

Clearly the function $f$ in Example 7.1.9 is not Riemann integrable. However, which types of functions are Riemann integrable, and how can we compute the integral? To illustrate the definition, we note the following simple examples (note that if the first example did not work out the way it does, we clearly would not have a well-defined notion of area under a graph using Riemann integrals).

Example 7.1.12. Let $c \in \mathbb{R}$ and let $f : [a, b] \to \mathbb{R}$ be defined by $f(x) = c$ for all $x \in [a, b]$. If $P = \{t_k\}_{k=0}^n$ is a partition of $[a, b]$, we see that
$$L(f, P) = U(f, P) = \sum_{k=1}^n c(t_k - t_{k-1}) = c\sum_{k=1}^n (t_k - t_{k-1}) = c(t_n - t_0) = c(b - a).$$
Hence $f$ is Riemann integrable and $\int_a^b f(x)\,dx = c(b - a)$. (Was there any doubt?)

Example 7.1.13. Let $f : [0, 1] \to \mathbb{R}$ be defined by $f(x) = x$ for all $x \in [0, 1]$. For each $n \in \mathbb{N}$, note Example 7.1.3 demonstrates the existence of a partition $P_n$ such that
$$L(f, P_n) = \frac{1 - \frac{1}{n}}{2}.$$
Hence
$$\sup\{L(f, P) \mid P \text{ a partition of } [0, 1]\} \geq \limsup_{n \to \infty} \frac{1 - \frac{1}{n}}{2} = \frac{1}{2}.$$
Similarly, for each $n \in \mathbb{N}$, Example 7.1.5 demonstrates the existence of a partition $Q_n$ such that
$$U(f, Q_n) = \frac{1 + \frac{1}{n}}{2}.$$
Hence
$$\inf\{U(f, P) \mid P \text{ a partition of } [0, 1]\} \leq \liminf_{n \to \infty} \frac{1 + \frac{1}{n}}{2} = \frac{1}{2}.$$
Therefore, since
$$\sup\{L(f, P) \mid P \text{ a partition of } [0, 1]\} \leq \inf\{U(f, P) \mid P \text{ a partition of } [0, 1]\},$$
the above computations show both the infimum and supremum must be $\frac{1}{2}$. Hence $f$ is Riemann integrable on $[0, 1]$ and $\int_0^1 x\,dx = \frac{1}{2}$.

Example 7.1.14. Let $f : [0, 1] \to \mathbb{R}$ be defined by $f(x) = x^2$ for all $x \in [0, 1]$. We claim that $f$ is Riemann integrable on $[0, 1]$ and $\int_0^1 x^2\,dx = \frac{1}{3}$. To see this, let $n \in \mathbb{N}$ and let $P_n = \{t_k\}_{k=0}^n$ be the partition of $[0, 1]$ such that $t_k = \frac{k}{n}$ for all $k \in \{0, 1, \ldots, n\}$. Then, by the first assignment,
$$L(f, P_n) = \sum_{k=1}^n \frac{(k-1)^2}{n^2}\left(\frac{k}{n} - \frac{k-1}{n}\right) = \frac{1}{n^3}\sum_{j=1}^{n-1} j^2 = \frac{(n-1)n(2(n-1)+1)}{6n^3} = \frac{2n^3 - 3n^2 + n}{6n^3}$$
and
$$U(f, P_n) = \sum_{k=1}^n \frac{k^2}{n^2}\left(\frac{k}{n} - \frac{k-1}{n}\right) = \frac{1}{n^3}\sum_{k=1}^n k^2 = \frac{n(n+1)(2n+1)}{6n^3} = \frac{2n^3 + 3n^2 + n}{6n^3}.$$
Hence, since $\lim_{n \to \infty} \frac{2n^3 - 3n^2 + n}{6n^3} = \lim_{n \to \infty} \frac{2n^3 + 3n^2 + n}{6n^3} = \frac{1}{3}$, we see that
$$\frac{1}{3} \leq \sup\{L(f, P) \mid P \text{ a partition of } [0, 1]\} \leq \inf\{U(f, P) \mid P \text{ a partition of } [0, 1]\} \leq \frac{1}{3}.$$
Hence the inequalities must be equalities, so $f$ is Riemann integrable on $[0, 1]$ by definition with $\int_0^1 x^2\,dx = \frac{1}{3}$.

Note that in the previous two examples, the functions were demonstrated to be Riemann integrable on $[0, 1]$ via partitions $P$ such that $L(f, P)$ and $U(f, P)$ were as close as one would like. Coincidence? I think not!

Theorem 7.1.15. Let $f : [a, b] \to \mathbb{R}$ be bounded.
Then $f$ is Riemann integrable if and only if for every $\varepsilon > 0$ there exists a partition $\mathcal{P}$ of $[a,b]$ such that
\[
0 \le U(f,\mathcal{P}) - L(f,\mathcal{P}) < \varepsilon.
\]

Proof. Note we must have that $0 \le U(f,\mathcal{P}) - L(f,\mathcal{P})$ for any partition $\mathcal{P}$ by earlier discussions.

First suppose $f$ is Riemann integrable. Hence, if $I = \int_a^b f(x)\,dx$, we have by the definition of the integral that
\[
I = \sup\{L(f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\} = \inf\{U(f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\}.
\]
Let $\varepsilon > 0$ be arbitrary. By the definition of the supremum, there exists a partition $\mathcal{P}_1$ of $[a,b]$ such that
\[
I - L(f,\mathcal{P}_1) < \frac{\varepsilon}{2}.
\]
Similarly, by the definition of the infimum, there exists a partition $\mathcal{P}_2$ of $[a,b]$ such that
\[
U(f,\mathcal{P}_2) - I < \frac{\varepsilon}{2}.
\]
Let $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_2$, which is a partition of $[a,b]$. Since $\mathcal{P}$ is a refinement of both $\mathcal{P}_1$ and $\mathcal{P}_2$, we obtain that
\[
L(f,\mathcal{P}_1) \le L(f,\mathcal{P}) \le U(f,\mathcal{P}) \le U(f,\mathcal{P}_2)
\]
by Lemma 7.1.7. Hence
\[
U(f,\mathcal{P}) - L(f,\mathcal{P}) \le U(f,\mathcal{P}_2) - L(f,\mathcal{P}_1) = (U(f,\mathcal{P}_2) - I) + (I - L(f,\mathcal{P}_1)) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
Therefore, since $\varepsilon > 0$ was arbitrary, this direction of the proof is complete.

For the other direction, suppose for every $\varepsilon > 0$ there exists a partition $\mathcal{P}$ of $[a,b]$ such that $0 \le U(f,\mathcal{P}) - L(f,\mathcal{P}) < \varepsilon$. In particular, for each $n \in \mathbb{N}$ there exists a partition $\mathcal{P}_n$ of $[a,b]$ such that
\[
0 \le U(f,\mathcal{P}_n) - L(f,\mathcal{P}_n) < \frac{1}{n}.
\]
Let
\[
L = \sup\{L(f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\} \quad\text{and}\quad U = \inf\{U(f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\}.
\]
Then $L, U \in \mathbb{R}$ are such that $L \le U$. However, for each $n \in \mathbb{N}$,
\[
0 \le U - L \le U(f,\mathcal{P}_n) - L(f,\mathcal{P}_n) < \frac{1}{n}.
\]
Therefore, as the above holds for all $n \in \mathbb{N}$, it must be the case (by the homework/Archimedean Property) that $U = L$. Hence $f$ is Riemann integrable on $[a,b]$ by definition.

Using Theorem 7.1.15, we can obtain an easier method for approximating a Riemann integral provided we know the function is Riemann integrable. Indeed suppose $\mathcal{P} = \{t_k\}_{k=0}^n$ is a partition of $[a,b]$ with
\[
a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b
\]
and let $f : [a,b] \to \mathbb{R}$ be bounded.
For each $k$, suppose $x_k \in [t_{k-1}, t_k]$ and let
\[
R(f,\mathcal{P},\{x_k\}_{k=1}^n) = \sum_{k=1}^n f(x_k)(t_k - t_{k-1}).
\]
The sum $R(f,\mathcal{P},\{x_k\}_{k=1}^n)$ is called a Riemann sum. Clearly
\[
L(f,\mathcal{P}) \le R(f,\mathcal{P},\{x_k\}_{k=1}^n) \le U(f,\mathcal{P}).
\]
In particular, if $f$ is Riemann integrable, we obtain via Theorem 7.1.15 that for any $\varepsilon > 0$ there exists a partition $\mathcal{P}'$ of $[a,b]$ such that
\[
L(f,\mathcal{P}') \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}') \le L(f,\mathcal{P}') + \varepsilon
\]
and thus
\[
\left| \int_a^b f(x)\,dx - R(f,\mathcal{P}',\{x_k\}_{k=1}^n) \right| < \varepsilon
\]
for any choice of $\{x_k\}_{k=1}^n$. Consequently, if one knows that $f$ is Riemann integrable, one may approximate $\int_a^b f(x)\,dx$ using Riemann sums as opposed to lower/upper Riemann sums. This is occasionally useful as convenient choices of $\{x_k\}_{k=1}^n$ may make computing the sum much easier.

Of course, our next question is, "Which classes of functions that have been studied in this course are Riemann integrable?"

7.1.3 Some Integrable Functions

If the theory of Riemann integration is to be of use to us, we must have a wide variety of functions that are Riemann integrable. First we start with the following.

Theorem 7.1.16. If $f : [a,b] \to \mathbb{R}$ is monotonic and bounded, then $f$ is Riemann integrable on $[a,b]$.

Proof. Let $f : [a,b] \to \mathbb{R}$ be monotone and bounded. We will assume that $f$ is non-decreasing, as the proof when $f$ is non-increasing is similar.

Fix $n \in \mathbb{N}$. Let $\mathcal{P}_n = \{t_k\}_{k=0}^n$ be the partition such that
\[
t_k = a + \frac{k}{n}(b-a)
\]
for all $k \in \{0,\dots,n\}$. Notice $t_k - t_{k-1} = \frac{1}{n}(b-a)$ for all $k$ (and thus we call $\mathcal{P}_n$ the uniform partition of $[a,b]$ into $n$ intervals). Since $f$ is non-decreasing, if
\[
m_k = \inf\{f(x) \mid x \in [t_{k-1},t_k]\} \quad\text{and}\quad M_k = \sup\{f(x) \mid x \in [t_{k-1},t_k]\},
\]
then $m_k = f(t_{k-1})$ and $M_k = f(t_k)$. Hence
\begin{align*}
0 \le U(f,\mathcal{P}_n) - L(f,\mathcal{P}_n) &= \sum_{k=1}^n M_k(t_k - t_{k-1}) - \sum_{k=1}^n m_k(t_k - t_{k-1}) \\
&= \sum_{k=1}^n f(t_k)\frac{1}{n}(b-a) - \sum_{k=1}^n f(t_{k-1})\frac{1}{n}(b-a) \\
&= f(t_n)\frac{1}{n}(b-a) - f(t_0)\frac{1}{n}(b-a) = \frac{1}{n}(b-a)(f(b)-f(a)).
\end{align*}
Since $\lim_{n\to\infty} \frac{1}{n}(b-a)(f(b)-f(a)) = 0$, for each $\varepsilon > 0$ there exists an $N \in \mathbb{N}$ such that
\[
0 \le U(f,\mathcal{P}_N) - L(f,\mathcal{P}_N) \le \frac{1}{N}(b-a)(f(b)-f(a)) < \varepsilon.
\]
Hence Theorem 7.1.15 implies that $f$ is Riemann integrable on $[a,b]$.

Of course, if continuous functions were not Riemann integrable, Riemann integration would be next to worthless.

Theorem 7.1.17. If $f : [a,b] \to \mathbb{R}$ is continuous, then $f$ is Riemann integrable on $[a,b]$.

Proof. Let $f : [a,b] \to \mathbb{R}$ be continuous. Therefore $f$ is bounded on $[a,b]$ by the homework.

In order to invoke Theorem 7.1.15, let $\varepsilon > 0$ be arbitrary. Since $f : [a,b] \to \mathbb{R}$ is continuous, $f$ is uniformly continuous on $[a,b]$ by Theorem 5.4.4. Hence there exists a $\delta > 0$ such that if $x, y \in [a,b]$ and $|x - y| < \delta$ then $|f(x) - f(y)| < \frac{\varepsilon}{b-a}$.

Choose $n \in \mathbb{N}$ such that $\frac{b-a}{n} < \delta$ (homework/Archimedean Property), and let $\mathcal{P}$ be the uniform partition of $[a,b]$ into $n$ intervals; that is, $\mathcal{P} = \{t_k\}_{k=0}^n$ is the partition such that
\[
t_k = a + \frac{k}{n}(b-a)
\]
for all $k \in \{0,\dots,n\}$. Let
\[
m_k = \inf\{f(x) \mid x \in [t_{k-1},t_k]\} \quad\text{and}\quad M_k = \sup\{f(x) \mid x \in [t_{k-1},t_k]\}.
\]
Since $|t_k - t_{k-1}| = \frac{b-a}{n} < \delta$, so $|x - y| < \delta$ for all $x, y \in [t_{k-1},t_k]$, it must be the case that $M_k - m_k = |M_k - m_k| \le \frac{\varepsilon}{b-a}$ (in fact, $<$ by the Extreme Value Theorem) for all $k \in \{1,\dots,n\}$. Hence
\[
0 \le U(f,\mathcal{P}) - L(f,\mathcal{P}) = \sum_{k=1}^n (M_k - m_k)(t_k - t_{k-1}) \le \frac{\varepsilon}{b-a} \sum_{k=1}^n (t_k - t_{k-1}) = \frac{\varepsilon}{b-a}(b-a) = \varepsilon.
\]
Thus, as $\varepsilon > 0$ was arbitrary, $f$ is Riemann integrable on $[a,b]$ by Theorem 7.1.15.

We have seen continuity is a lot to ask. However, many functions one sees and deals with in real-world applications are continuous at almost every point. In particular, the following shows that if our functions are piecewise continuous, then they are Riemann integrable.

Corollary 7.1.18. If $f : [a,b] \to \mathbb{R}$ is bounded on $[a,b]$, and continuous on $[a,b]$ except at a finite number of points, then $f$ is Riemann integrable on $[a,b]$.

Proof.
Suppose $f : [a,b] \to \mathbb{R}$ is continuous except at a finite number of points and $f([a,b])$ is bounded. Let $\{a_k\}_{k=0}^q$ contain all of the points at which $f$ is not continuous and be such that
\[
a = a_0 < a_1 < a_2 < \cdots < a_q = b.
\]
The idea of the proof is to construct a partition such that each interval of the partition contains at most one $a_k$, and if an interval of the partition contains an $a_k$, then its length is really small.

Let $\varepsilon > 0$ be arbitrary. Let $L = \sup\{f(x) - f(y) \mid x, y \in [a,b]\}$. Since $f([a,b])$ is bounded, we obtain that $0 \le L < \infty$. Let
\[
\delta = \frac{\varepsilon}{2(q+1)(L+1)} > 0.
\]
It is not difficult to see that there exists a partition $\mathcal{P}' = \{t_k\}_{k=0}^{2q+1}$ with
\[
a = t_0 < t_1 < t_2 < \cdots < t_{2q+1} = b
\]
such that $t_{2k+1} - t_{2k} < \delta$ for all $k \in \{0,\dots,q\}$ and $t_{2k} < a_k < t_{2k+1}$ for all $k \in \{1,\dots,q-1\}$. Let
\[
m_k = \inf\{f(x) \mid x \in [t_{k-1},t_k]\} \quad\text{and}\quad M_k = \sup\{f(x) \mid x \in [t_{k-1},t_k]\}.
\]
Thus $M_k - m_k \le L$ for all $k \in \{1,\dots,2q+1\}$.

Since $f$ is continuous on $[t_{2k-1}, t_{2k}]$ for all $k \in \{1,\dots,q\}$, $f$ is Riemann integrable on $[t_{2k-1}, t_{2k}]$ by Theorem 7.1.17. Hence, by the definition of Riemann integration, there exist partitions $\mathcal{P}_k$ of $[t_{2k-1}, t_{2k}]$ such that
\[
0 \le U(f,\mathcal{P}_k) - L(f,\mathcal{P}_k) < \frac{\varepsilon}{2q}.
\]
Let $\mathcal{P} = \mathcal{P}' \cup \bigcup_{k=1}^q \mathcal{P}_k$. Then $\mathcal{P}$ is a partition of $[a,b]$ such that
\[
0 \le U(f,\mathcal{P}) - L(f,\mathcal{P}) = \sum_{k=1}^q \big( U(f,\mathcal{P}_k) - L(f,\mathcal{P}_k) \big) + \sum_{k=0}^q (M_{2k+1} - m_{2k+1})(t_{2k+1} - t_{2k})
\]
(that is, on each $[t_{2k-1}, t_{2k}]$ the partition behaves like $\mathcal{P}_k$ and thus so do the sums, and the parts of the partition remaining are of the form $[t_{2k}, t_{2k+1}]$, each of which contains at most one $a_j$). Hence
\[
0 \le U(f,\mathcal{P}) - L(f,\mathcal{P}) \le \sum_{k=1}^q \frac{\varepsilon}{2q} + \sum_{k=0}^q L\delta \le \frac{\varepsilon}{2} + (q+1)L\delta \le \frac{\varepsilon}{2} + (q+1)L \frac{\varepsilon}{2(q+1)(L+1)} \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
Thus, as $\varepsilon > 0$ was arbitrary, $f$ is Riemann integrable on $[a,b]$ by Theorem 7.1.15.
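As a quick numerical sketch (not part of the notes), the mechanism behind Corollary 7.1.18 can be seen for a function with a single jump: on a uniform partition, only the subinterval straddling the jump contributes to $U(f,\mathcal{P}) - L(f,\mathcal{P})$, so the gap shrinks like the length of that one interval.

```python
# Illustrative sketch: a bounded function with one jump discontinuity still
# has small U(f,P) - L(f,P), since only the subinterval containing the jump
# contributes to the gap. The inf/sup on each piece are approximated by
# sampling, which is adequate for this step function.

def lower_upper(f, ts, samples=200):
    L = U = 0.0
    for a, b in zip(ts, ts[1:]):
        # sample the subinterval, making sure both exact endpoints are included
        vals = [f(a + j * (b - a) / samples) for j in range(samples)] + [f(b)]
        L += min(vals) * (b - a)
        U += max(vals) * (b - a)
    return L, U

step = lambda x: 0.0 if x < 0.5 else 1.0  # jump of height 1 at x = 1/2
for n in (10, 100, 1000):
    ts = [k / n for k in range(n + 1)]
    L, U = lower_upper(step, ts)
    print(n, U - L)  # only the piece touching x = 1/2 contributes, about 1/n
```

The partition in the proof does exactly this more carefully: it isolates each discontinuity in an interval of length less than $\delta$, so the total contribution of all "bad" intervals is at most $(q+1)L\delta$.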
Notice the main idea of the above proof is really to construct a finite number of open intervals which contain all of the discontinuities such that the sum of the lengths of the open intervals is as small as possible. Therefore, the same argument can be used to show that if the set of discontinuities of a function $f$ has measure zero (where 'measure' is as defined on the homework), then $f$ is Riemann integrable (see the homework for an example question).

7.1.4 Properties of the Riemann Integral

Now that we know several functions are Riemann integrable, we desire properties of the Riemann integral. Hence we begin with the following, which is simple to state yet technical to prove.

Theorem 7.1.19. Let $f, g : [a,b] \to \mathbb{R}$ be bounded, Riemann integrable functions on $[a,b]$. Then:

a) If $\alpha \in \mathbb{R}$, then $\alpha f$ is Riemann integrable on $[a,b]$ and
\[
\int_a^b (\alpha f)(x)\,dx = \alpha \int_a^b f(x)\,dx.
\]

b) $f + g$ is Riemann integrable on $[a,b]$ and
\[
\int_a^b (f+g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.
\]

c) If $a \le c \le b$, then $f$ is Riemann integrable on $[a,c]$ and $[c,b]$ with
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]

d) If $f(x) \le g(x)$ for all $x \in [a,b]$, then
\[
\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.
\]

e) If $m \le f(x) \le M$ for all $x \in [a,b]$, then
\[
m(b-a) \le \int_a^b f(x)\,dx \le M(b-a).
\]

Proof. For part (a), let $\mathcal{P}$ be any partition of $[a,b]$. If $\alpha \ge 0$, it is not difficult to see that
\[
L(\alpha f,\mathcal{P}) = \alpha L(f,\mathcal{P}) \quad\text{and}\quad U(\alpha f,\mathcal{P}) = \alpha U(f,\mathcal{P}).
\]
Furthermore, if $\alpha < 0$, then it is not difficult to see that
\[
L(\alpha f,\mathcal{P}) = \alpha U(f,\mathcal{P}) \quad\text{and}\quad U(\alpha f,\mathcal{P}) = \alpha L(f,\mathcal{P})
\]
(i.e. if $X$ is a bounded subset of $\mathbb{R}$, $\inf(-X) = -\sup(X)$). Since $f$ is integrable on $[a,b]$, we obtain by definition that
\[
\int_a^b f(x)\,dx = \sup\{L(f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\} = \inf\{U(f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\}.
\]
Thus, the above computations imply that
\[
\alpha \int_a^b f(x)\,dx = \sup\{L(\alpha f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\} = \inf\{U(\alpha f,\mathcal{P}) \mid \mathcal{P} \text{ a partition of } [a,b]\}.
\]
Hence $\alpha f$ is Riemann integrable on $[a,b]$ with
\[
\int_a^b (\alpha f)(x)\,dx = \alpha \int_a^b f(x)\,dx.
\]
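The scaling identities used in part (a) can be sketched numerically (this check is illustrative only, not part of the notes): for $\alpha \ge 0$, every lower and upper sum of $\alpha f$ is $\alpha$ times the corresponding sum of $f$.

```python
# Illustrative sketch of part (a) for alpha >= 0: L(alpha*f, P) = alpha*L(f, P)
# and U(alpha*f, P) = alpha*U(f, P). Here f(x) = x is increasing, so for
# alpha >= 0 both f and alpha*f attain their inf/sup at subinterval endpoints.

def lower_upper(f, ts):
    L = sum(f(a) * (b - a) for a, b in zip(ts, ts[1:]))
    U = sum(f(b) * (b - a) for a, b in zip(ts, ts[1:]))
    return L, U

alpha = 3.0
f = lambda x: x
ts = [k / 100 for k in range(101)]          # uniform partition of [0, 1]
L, U = lower_upper(f, ts)
La, Ua = lower_upper(lambda x: alpha * f(x), ts)
print(abs(La - alpha * L), abs(Ua - alpha * U))  # both negligible
```

Since the scaled sums agree, the supremum of the lower sums and infimum of the upper sums scale by $\alpha$ as well, which is exactly the content of part (a).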
For part (b), let $\mathcal{P}$ be any partition of $[a,b]$. Since
\[
\sup\{f(x)+g(x) \mid x \in [c,d]\} \le \sup\{f(x) \mid x \in [c,d]\} + \sup\{g(x) \mid x \in [c,d]\}
\]
and
\[
\inf\{f(x)+g(x) \mid x \in [c,d]\} \ge \inf\{f(x) \mid x \in [c,d]\} + \inf\{g(x) \mid x \in [c,d]\}
\]
for all $c, d \in [a,b]$, we see that
\[
L(f,\mathcal{P}) + L(g,\mathcal{P}) \le L(f+g,\mathcal{P}) \le U(f+g,\mathcal{P}) \le U(f,\mathcal{P}) + U(g,\mathcal{P}).
\]
Let $\varepsilon > 0$ be arbitrary. Since $f$ is Riemann integrable on $[a,b]$, there exists a partition $\mathcal{P}_1$ of $[a,b]$ such that
\[
L(f,\mathcal{P}_1) \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}_1) \le L(f,\mathcal{P}_1) + \frac{\varepsilon}{2}.
\]
Similarly, since $g$ is Riemann integrable on $[a,b]$, there exists a partition $\mathcal{P}_2$ of $[a,b]$ such that
\[
L(g,\mathcal{P}_2) \le \int_a^b g(x)\,dx \le U(g,\mathcal{P}_2) \le L(g,\mathcal{P}_2) + \frac{\varepsilon}{2}.
\]
Let $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_2$. Then $\mathcal{P}$ is a partition of $[a,b]$ such that
\[
L(f,\mathcal{P}) \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}) \le L(f,\mathcal{P}) + \frac{\varepsilon}{2}
\]
and
\[
L(g,\mathcal{P}) \le \int_a^b g(x)\,dx \le U(g,\mathcal{P}) \le L(g,\mathcal{P}) + \frac{\varepsilon}{2}.
\]
Hence, since we know that
\[
L(f,\mathcal{P}) + L(g,\mathcal{P}) \le L(f+g,\mathcal{P}) \le U(f+g,\mathcal{P}) \le U(f,\mathcal{P}) + U(g,\mathcal{P}),
\]
we obtain that
\[
L(f,\mathcal{P}) + L(g,\mathcal{P}) \le L(f+g,\mathcal{P}) \le U(f+g,\mathcal{P}) \le L(f,\mathcal{P}) + L(g,\mathcal{P}) + \varepsilon
\]
and
\[
\int_a^b f(x)\,dx + \int_a^b g(x)\,dx - \varepsilon \le L(f+g,\mathcal{P}) \le U(f+g,\mathcal{P}) \le \int_a^b f(x)\,dx + \int_a^b g(x)\,dx + \varepsilon.
\]
Hence $0 \le U(f+g,\mathcal{P}) - L(f+g,\mathcal{P}) \le \varepsilon$. Thus, as $\varepsilon$ was arbitrary, Theorem 7.1.15 implies that $f + g$ is Riemann integrable on $[a,b]$.

Furthermore, for each $\varepsilon > 0$, the above computation produced a partition $\mathcal{P}$ such that
\[
\int_a^b f(x)\,dx + \int_a^b g(x)\,dx - \varepsilon \le L(f+g,\mathcal{P}) \le \int_a^b (f+g)(x)\,dx \le U(f+g,\mathcal{P}) \le \int_a^b f(x)\,dx + \int_a^b g(x)\,dx + \varepsilon.
\]
Hence
\[
\left| \int_a^b (f+g)(x)\,dx - \left( \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \right) \right| \le \varepsilon.
\]
Therefore, as $\varepsilon > 0$ was arbitrary, we obtain that
\[
\int_a^b (f+g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.
\]

For part (c), first let us show that $f$ is Riemann integrable on $[a,c]$ and $[c,b]$. To see this, let $\varepsilon > 0$. By Theorem 7.1.15, there exists a partition $\mathcal{P}$ of $[a,b]$ such that
\[
L(f,\mathcal{P}) \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}) \le L(f,\mathcal{P}) + \varepsilon.
\]
Therefore, if $\mathcal{P}_0 = \mathcal{P} \cup \{c\}$, then $\mathcal{P}_0$ is a partition of $[a,b]$ containing $c$ such that
\[
L(f,\mathcal{P}_0) \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}_0) \le L(f,\mathcal{P}_0) + \varepsilon.
\]
Let $\mathcal{P}_1 = \mathcal{P}_0 \cap [a,c]$ and $\mathcal{P}_2 = \mathcal{P}_0 \cap [c,b]$. Then $\mathcal{P}_1$ is a partition of $[a,c]$ and $\mathcal{P}_2$ is a partition of $[c,b]$. Furthermore, due to the nature of these partitions, we easily see that
\[
L(f,\mathcal{P}_0) = L(f,\mathcal{P}_1) + L(f,\mathcal{P}_2) \quad\text{and}\quad U(f,\mathcal{P}_0) = U(f,\mathcal{P}_1) + U(f,\mathcal{P}_2).
\]
Hence
\[
0 \le (U(f,\mathcal{P}_1) - L(f,\mathcal{P}_1)) + (U(f,\mathcal{P}_2) - L(f,\mathcal{P}_2)) = U(f,\mathcal{P}_0) - L(f,\mathcal{P}_0) \le \varepsilon.
\]
Hence, as $0 \le U(f,\mathcal{P}_1) - L(f,\mathcal{P}_1)$ and $0 \le U(f,\mathcal{P}_2) - L(f,\mathcal{P}_2)$, it must be the case that
\[
0 \le U(f,\mathcal{P}_1) - L(f,\mathcal{P}_1) \le \varepsilon \quad\text{and}\quad 0 \le U(f,\mathcal{P}_2) - L(f,\mathcal{P}_2) \le \varepsilon.
\]
Hence $f$ is integrable on both $[a,c]$ and $[c,b]$ by Theorem 7.1.15.

To see that
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx,
\]
let $\varepsilon > 0$ be arbitrary. Since $f$ is integrable on both $[a,c]$ and $[c,b]$, there exist partitions $\mathcal{P}_1$ and $\mathcal{P}_2$ of $[a,c]$ and $[c,b]$ respectively such that
\[
L(f,\mathcal{P}_1) \le \int_a^c f(x)\,dx \le U(f,\mathcal{P}_1) \le L(f,\mathcal{P}_1) + \frac{\varepsilon}{2}
\]
and
\[
L(f,\mathcal{P}_2) \le \int_c^b f(x)\,dx \le U(f,\mathcal{P}_2) \le L(f,\mathcal{P}_2) + \frac{\varepsilon}{2}.
\]
Let $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_2$, which is a partition of $[a,b]$. Then, as before,
\[
L(f,\mathcal{P}) = L(f,\mathcal{P}_1) + L(f,\mathcal{P}_2) \quad\text{and}\quad U(f,\mathcal{P}) = U(f,\mathcal{P}_1) + U(f,\mathcal{P}_2).
\]
Hence
\[
\int_a^c f(x)\,dx + \int_c^b f(x)\,dx - \varepsilon \le L(f,\mathcal{P}) \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}) \le \int_a^c f(x)\,dx + \int_c^b f(x)\,dx + \varepsilon.
\]
Hence
\[
\left| \int_a^b f(x)\,dx - \left( \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \right) \right| \le \varepsilon.
\]
Therefore, since $\varepsilon > 0$ was arbitrary, we obtain that
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]

For part (d), suppose $f(x) \le g(x)$ for all $x \in [a,b]$. Let $\varepsilon > 0$. By Theorem 7.1.15, there exists a partition $\mathcal{P}$ of $[a,b]$ such that
\[
L(f,\mathcal{P}) \le \int_a^b f(x)\,dx \le U(f,\mathcal{P}) \le L(f,\mathcal{P}) + \varepsilon.
\]
However, since $f(x) \le g(x)$ for all $x \in [a,b]$, it must be the case that
\[
\inf\{f(x) \mid x \in [c,d]\} \le \inf\{g(x) \mid x \in [c,d]\}
\]
for any $c, d \in [a,b]$. Therefore $L(f,\mathcal{P}) \le L(g,\mathcal{P})$. Hence
\[
\int_a^b f(x)\,dx - \varepsilon \le L(f,\mathcal{P}) \le L(g,\mathcal{P}) \le \int_a^b g(x)\,dx.
\]
Since $\varepsilon > 0$ was arbitrary, we have
\[
\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx + \varepsilon
\]
for all $\varepsilon > 0$. Hence it must be the case that
\[
\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.
\]

For part (e), we have by part (d) and Example 7.1.12 that
\[
m(b-a) = \int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx = M(b-a).
\]

Note that Theorem 7.1.19 does not produce a formula for the Riemann integral of the product of Riemann integrable functions. Indeed it is almost always the case that $\int_a^b (fg)(x)\,dx \ne \left( \int_a^b f(x)\,dx \right)\left( \int_a^b g(x)\,dx \right)$. For example, using Examples 7.1.13 and 7.1.14, we see that
\[
\int_0^1 x^2\,dx = \frac{1}{3} \quad\text{whereas}\quad \left( \int_0^1 x\,dx \right)^2 = \frac{1}{4}.
\]
However, it is still possible to show that if $f$ and $g$ are Riemann integrable on $[a,b]$, then so too is $fg$. To prove this, we first note the following, which has its own uses.

Theorem 7.1.20. Let $f : [a,b] \to \mathbb{R}$ be a bounded, Riemann integrable function on $[a,b]$. Then the function $|f| : [a,b] \to \mathbb{R}$ defined by $|f|(x) = |f(x)|$ for all $x \in [a,b]$ is Riemann integrable on $[a,b]$ and
\[
\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.
\]

Proof. Let $\varepsilon > 0$. By Theorem 7.1.15, there exists a partition $\mathcal{P}$ of $[a,b]$ such that
\[
0 \le U(f,\mathcal{P}) - L(f,\mathcal{P}) < \varepsilon.
\]
Write $\mathcal{P} = \{t_k\}_{k=0}^n$ where $a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b$. For each $k \in \{1,\dots,n\}$ let
\[
m_k(f) = \inf\{f(x) \mid x \in [t_{k-1},t_k]\}, \quad M_k(f) = \sup\{f(x) \mid x \in [t_{k-1},t_k]\},
\]
\[
m_k(|f|) = \inf\{|f(x)| \mid x \in [t_{k-1},t_k]\}, \quad\text{and}\quad M_k(|f|) = \sup\{|f(x)| \mid x \in [t_{k-1},t_k]\}.
\]
We claim that $M_k(|f|) - m_k(|f|) \le M_k(f) - m_k(f)$ for all $k \in \{1,\dots,n\}$. Indeed notice if $x, y \in [t_{k-1},t_k]$ are such that:

• $f(x), f(y) \ge 0$, then $|f(x)| - |f(y)| = f(x) - f(y) \le M_k(f) - m_k(f)$.

• $f(x) \ge 0 \ge f(y)$ or $f(y) \ge 0 \ge f(x)$, then $|f(x)| - |f(y)| \le f(x) - f(y) \le M_k(f) - m_k(f)$.

• $f(x), f(y) \le 0$, then $|f(x)| - |f(y)| = f(y) - f(x) \le M_k(f) - m_k(f)$.

The above inequalities and definitions imply that $M_k(|f|) - m_k(|f|) \le M_k(f) - m_k(f)$.
Hence
\[
U(|f|,\mathcal{P}) - L(|f|,\mathcal{P}) = \sum_{k=1}^n (M_k(|f|) - m_k(|f|))(t_k - t_{k-1}) \le \sum_{k=1}^n (M_k(f) - m_k(f))(t_k - t_{k-1}) = U(f,\mathcal{P}) - L(f,\mathcal{P}) < \varepsilon.
\]
Hence, as $\varepsilon > 0$ was arbitrary, $|f|$ is Riemann integrable on $[a,b]$ by Theorem 7.1.15.

By Theorem 7.1.19, $-|f|$ is also Riemann integrable. Since $-|f(x)| \le f(x) \le |f(x)|$ for all $x \in [a,b]$, Theorem 7.1.19 also implies that
\[
-\int_a^b |f(x)|\,dx \le \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx.
\]
Hence
\[
\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx,
\]
which completes the proof.

As a step toward proving that if $f$ and $g$ are Riemann integrable, then $fg$ is Riemann integrable, we first prove that $f^2$ is Riemann integrable.

Lemma 7.1.21. Let $f : [a,b] \to \mathbb{R}$ be a bounded, Riemann integrable function on $[a,b]$. Then the function $f^2 : [a,b] \to \mathbb{R}$ defined by $f^2(x) = (f(x))^2$ for all $x \in [a,b]$ is Riemann integrable on $[a,b]$.

Proof. Since $f$ is bounded, let $K = \sup\{|f(x)| \mid x \in [a,b]\} < \infty$. To see that $f^2$ is Riemann integrable, let $\varepsilon > 0$ be arbitrary. Since $|f|$ is Riemann integrable by Theorem 7.1.20, by Theorem 7.1.15 there exists a partition $\mathcal{P}$ of $[a,b]$ such that
\[
0 \le U(|f|,\mathcal{P}) - L(|f|,\mathcal{P}) < \frac{\varepsilon}{2(K+1)}.
\]
Write $\mathcal{P} = \{t_k\}_{k=0}^n$ where $a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b$. For each $k \in \{1,\dots,n\}$ let
\[
m_k(|f|) = \inf\{|f(x)| \mid x \in [t_{k-1},t_k]\}, \quad M_k(|f|) = \sup\{|f(x)| \mid x \in [t_{k-1},t_k]\},
\]
\[
m_k(f^2) = \inf\{(f(x))^2 \mid x \in [t_{k-1},t_k]\}, \quad\text{and}\quad M_k(f^2) = \sup\{(f(x))^2 \mid x \in [t_{k-1},t_k]\}.
\]
Since $f^2 = |f|^2$ and since $X \subseteq [0,\infty)$ implies $\sup\{x^2 \mid x \in X\} = (\sup\{x \mid x \in X\})^2$, we see that
\[
M_k(f^2) - m_k(f^2) = M_k(|f|)^2 - m_k(|f|)^2 = (M_k(|f|) + m_k(|f|))(M_k(|f|) - m_k(|f|)) \le 2K(M_k(|f|) - m_k(|f|)).
\]
Hence
\[
0 \le U(f^2,\mathcal{P}) - L(f^2,\mathcal{P}) \le 2K(U(|f|,\mathcal{P}) - L(|f|,\mathcal{P})) \le 2K \frac{\varepsilon}{2(K+1)} < \varepsilon.
\]
Hence $f^2$ is Riemann integrable by Theorem 7.1.15.

Theorem 7.1.22. Let $f, g : [a,b] \to \mathbb{R}$ be bounded, Riemann integrable functions on $[a,b]$. Then $fg : [a,b] \to \mathbb{R}$ is Riemann integrable on $[a,b]$.

Proof.
Since
\[
f(x)g(x) = \frac{1}{2}\left( (f(x)+g(x))^2 - f(x)^2 - g(x)^2 \right)
\]
and since $f+g$, $f^2$, $g^2$, and $(f+g)^2$ are Riemann integrable by Theorem 7.1.19 and Lemma 7.1.21, it follows by Theorem 7.1.19 that $fg$ is Riemann integrable.

7.2 The Fundamental Theorems of Calculus

For our final section of the course, we note that although we have developed the Riemann integral and its properties, we still lack a simple way to compute the integral of even some of the most basic functions. Indeed the only integrals we have been able to compute were in Examples 7.1.13 and 7.1.14, where specific sums were used.

The goal of this final section is to prove what are known as the Fundamental Theorems of Calculus. Said theorems are named as such since they provide the ultimate connection between integration and differentiation via antiderivatives as introduced in Subsection 6.4.2.

To study these theorems, we will need to define some functions based on integrals. To begin, suppose $f : [a,b] \to \mathbb{R}$ is bounded and Riemann integrable on $[a,b]$. For simplicity, let us define
\[
\int_a^a f(x)\,dx = 0.
\]
Therefore, if we define $F : [a,b] \to \mathbb{R}$ by
\[
F(x) = \int_a^x f(t)\,dt
\]
for all $x \in [a,b]$, we see that $F$ is well-defined since $f$ is Riemann integrable on $[a,x]$ by Theorem 7.1.19.

Lemma 7.2.1. Let $f : [a,b] \to \mathbb{R}$ be a bounded, Riemann integrable function on $[a,b]$ and let $F : [a,b] \to \mathbb{R}$ be defined by
\[
F(x) = \int_a^x f(t)\,dt
\]
for all $x \in [a,b]$. Then $F$ is continuous on $[a,b]$.

Proof. Since $f([a,b])$ is bounded, $M = \sup\{|f(x)| \mid x \in [a,b]\} < \infty$. If $x_1, x_2 \in [a,b]$ and $x_1 < x_2$, then we easily see by Theorem 7.1.19 that
\[
-M|x_2 - x_1| \le \int_{x_1}^{x_2} f(t)\,dt \le M|x_2 - x_1|.
\]
Since
\[
F(x_2) - F(x_1) = \int_a^{x_2} f(t)\,dt - \int_a^{x_1} f(t)\,dt = \int_{x_1}^{x_2} f(t)\,dt
\]
for all $x_1 < x_2$, it easily follows that $F$ is continuous (i.e. as $x_2$ tends to $x_1$ (from any appropriate direction), $M|x_2 - x_1|$ tends to zero, so $F(x_2)$ tends to $F(x_1)$ by the Squeeze Theorem).
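The Lipschitz-type bound $|F(x_2) - F(x_1)| \le M|x_2 - x_1|$ behind Lemma 7.2.1 can be checked numerically (an illustrative sketch, not part of the notes), here with $f(t) = \sin(t)$ and $F$ approximated by midpoint Riemann sums.

```python
# Illustrative sketch of Lemma 7.2.1: F(x) = integral of sin from 0 to x is
# Lipschitz with constant M = sup |sin| = 1, hence continuous. F is
# approximated by a midpoint Riemann sum, so a small tolerance is allowed
# for the approximation error.

import math

def F(x, n=2000):
    if x == 0:
        return 0.0
    h = x / n
    return sum(math.sin((k - 0.5) * h) * h for k in range(1, n + 1))

M = 1.0  # sup |sin(t)| on [0, 2]
for x1, x2 in [(0.3, 0.31), (1.0, 1.5), (1.9, 2.0)]:
    assert abs(F(x2) - F(x1)) <= M * abs(x2 - x1) + 1e-6
print("|F(x2) - F(x1)| <= M|x2 - x1| holds (up to approximation error)")
```

The assertion is exactly the inequality from the proof: the change in $F$ over $[x_1, x_2]$ is an integral bounded in absolute value by $M$ times the length of the interval.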
As the function $F$ is continuous, we may ask, "Is $F$ differentiable?" The First Fundamental Theorem of Calculus shows this is true and enables us to compute the derivative. In fact, the following shows that if we integrate a function $f$ to obtain $F$, then the derivative of $F$ is $f$; that is, derivatives undo integration in a certain sense.

Theorem 7.2.2 (The Fundamental Theorem of Calculus, I). Let $f : [a,b] \to \mathbb{R}$ be continuous on $[a,b]$ and let $F : [a,b] \to \mathbb{R}$ be defined by
\[
F(x) = \int_a^x f(t)\,dt
\]
for all $x \in [a,b]$. Then $F$ is differentiable on $(a,b)$ and $F'(x) = f(x)$ for all $x \in (a,b)$.

Proof. Fix $x \in (a,b)$. To see that
\[
\lim_{h\to 0} \frac{F(x+h) - F(x)}{h} = f(x),
\]
let $\varepsilon > 0$. Since $f$ is continuous at $x$, there exists a $\delta > 0$ such that if $|t - x| < \delta$ then $|f(t) - f(x)| < \varepsilon$.

Suppose $|h| < \delta$. If $0 < h < \delta$, then
\begin{align*}
\left| \frac{F(x+h) - F(x)}{h} - f(x) \right| &= \left| \frac{1}{h} \int_x^{x+h} f(t)\,dt - f(x) \right| \\
&= \left| \frac{1}{h} \int_x^{x+h} f(t) - f(x)\,dt \right| \\
&\le \frac{1}{h} \int_x^{x+h} |f(t) - f(x)|\,dt \\
&\le \frac{1}{h} \int_x^{x+h} \varepsilon\,dt \quad \text{as } |t - x| \le \delta \text{ for all } t \text{ in the integral} \\
&= \frac{\varepsilon}{h}((x+h) - x) = \varepsilon.
\end{align*}
Similarly, if $-\delta < h < 0$, then
\begin{align*}
\left| \frac{F(x+h) - F(x)}{h} - f(x) \right| &= \left| -\frac{1}{h} \int_{x+h}^x f(t)\,dt - f(x) \right| \\
&= \left| -\frac{1}{h} \int_{x+h}^x f(t) - f(x)\,dt \right| \\
&\le -\frac{1}{h} \int_{x+h}^x |f(t) - f(x)|\,dt \\
&\le -\frac{1}{h} \int_{x+h}^x \varepsilon\,dt \quad \text{as } |t - x| \le \delta \text{ for all } t \text{ in the integral} \\
&= -\frac{\varepsilon}{h}(x - (x+h)) = \varepsilon.
\end{align*}
Hence, for all $h$ with $0 < |h| < \delta$,
\[
\left| \frac{F(x+h) - F(x)}{h} - f(x) \right| \le \varepsilon.
\]
Thus, as $\varepsilon$ was arbitrary,
\[
\lim_{h\to 0} \frac{F(x+h) - F(x)}{h} = f(x).
\]
Hence $F'(x)$ exists and $F'(x) = f(x)$.

As the First Fundamental Theorem of Calculus shows that derivatives undo integration, the Second Fundamental Theorem of Calculus shows that integration undoes derivatives in a certain sense. In particular, the Second Fundamental Theorem of Calculus shows us that if we know an antiderivative of a function $f$, then we can compute the Riemann integral of $f$.

Theorem 7.2.3 (The Fundamental Theorem of Calculus, II).
Let $f, g : [a,b] \to \mathbb{R}$ be such that $f$ is Riemann integrable on $[a,b]$, $g$ is continuous on $[a,b]$, $g$ is differentiable on $(a,b)$, and $g'(x) = f(x)$ for all $x \in (a,b)$. Then
\[
\int_a^b f(t)\,dt = g(b) - g(a).
\]

Proof of Theorem 7.2.3 when $f$ is continuous. Define $F : [a,b] \to \mathbb{R}$ by
\[
F(x) = \int_a^x f(t)\,dt
\]
for all $x \in [a,b]$. By the First Fundamental Theorem of Calculus (Theorem 7.2.2), $F$ is differentiable on $(a,b)$ with $F'(x) = f(x) = g'(x)$ for all $x \in (a,b)$. Hence, by Corollary 6.4.4, there exists a constant $\alpha \in \mathbb{R}$ such that $F(x) = g(x) + \alpha$ for all $x \in (a,b)$. Since $F$ is continuous on $[a,b]$ by Lemma 7.2.1 and since $g$ is continuous on $[a,b]$ by assumption, we have that $F(x) = g(x) + \alpha$ for all $x \in [a,b]$. Hence
\[
\int_a^b f(t)\,dt = F(b) - 0 = F(b) - F(a) = (g(b) + \alpha) - (g(a) + \alpha) = g(b) - g(a).
\]

Proof of Theorem 7.2.3, no additional assumptions. Notice $g$ is continuous on $[a,b]$ by Theorem 6.1.7.

Let $\varepsilon > 0$. By Theorem 7.1.15 there exists a partition $\mathcal{P}$ of $[a,b]$ such that
\[
L(f,\mathcal{P}) \le \int_a^b f(t)\,dt \le U(f,\mathcal{P}) \le L(f,\mathcal{P}) + \varepsilon.
\]
Write $\mathcal{P} = \{t_k\}_{k=0}^n$ where $a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b$. Since $g$ is continuous on $[a,b]$, for each $k \in \{1,\dots,n\}$ the Mean Value Theorem (Theorem 6.4.2) implies there exists an $x_k \in (t_{k-1}, t_k)$ such that
\[
\frac{g(t_k) - g(t_{k-1})}{t_k - t_{k-1}} = g'(x_k) = f(x_k), \quad\text{so}\quad g(t_k) - g(t_{k-1}) = f(x_k)(t_k - t_{k-1}).
\]
Notice
\[
\sum_{k=1}^n f(x_k)(t_k - t_{k-1}) = \sum_{k=1}^n \big( g(t_k) - g(t_{k-1}) \big) = g(t_n) - g(t_0) = g(b) - g(a).
\]
Furthermore, since
\[
L(f,\mathcal{P}) \le \sum_{k=1}^n f(x_k)(t_k - t_{k-1}) \le U(f,\mathcal{P}) \le L(f,\mathcal{P}) + \varepsilon,
\]
we obtain that
\[
L(f,\mathcal{P}) \le g(b) - g(a) \le L(f,\mathcal{P}) + \varepsilon.
\]
Since
\[
L(f,\mathcal{P}) \le \int_a^b f(t)\,dt \le U(f,\mathcal{P}) \le L(f,\mathcal{P}) + \varepsilon,
\]
we obtain that
\[
\left| g(b) - g(a) - \int_a^b f(t)\,dt \right| \le \varepsilon.
\]
Therefore, as $\varepsilon > 0$ was arbitrary, the result follows.

Of course, we do not have the time in this course to pursue all of the possible common methods of computing Riemann integrals. However, using the Second Fundamental Theorem of Calculus, we easily can compute some
integrals:
\begin{align*}
\int_0^x t^n\,dt &= \frac{x^{n+1}}{n+1} - \frac{0^{n+1}}{n+1} = \frac{x^{n+1}}{n+1} \\
\int_1^x \frac{1}{t}\,dt &= \ln(x) - \ln(1) = \ln(x) \\
\int_0^x e^t\,dt &= e^x - e^0 = e^x - 1 \\
\int_0^x \sin(t)\,dt &= -\cos(x) - (-\cos(0)) = -\cos(x) + 1 \\
\int_0^x \cos(t)\,dt &= \sin(x) - \sin(0) = \sin(x) \\
\int_0^x \sec^2(t)\,dt &= \tan(x) - \tan(0) = \tan(x) \\
\int_0^x \sec(t)\tan(t)\,dt &= \sec(x) - \sec(0) = \sec(x) - 1 \\
\int_0^x \frac{1}{\sqrt{1-t^2}}\,dt &= \arcsin(x) - \arcsin(0) = \arcsin(x) \\
\int_0^x -\frac{1}{\sqrt{1-t^2}}\,dt &= \arccos(x) - \arccos(0) = \arccos(x) - \frac{\pi}{2}
\end{align*}
and so on.

To complete our course, let us prove one of the most common methods of integration (if you feel substitution is more common, note the method of substitution easily follows from the Chain Rule and the Second Fundamental Theorem of Calculus, whereas the following requires most of the theory we have developed in this course).

Corollary 7.2.4 (Integration by Parts). Suppose $[a,b] \subseteq (c,d)$, that $f, g : (c,d) \to \mathbb{R}$ are continuous and differentiable on $[a,b]$, and $f', g' : [a,b] \to \mathbb{R}$ are Riemann integrable. Then
\[
\int_a^b f(x)g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx.
\]

Proof. Let $h : [a,b] \to \mathbb{R}$ be defined by $h(x) = f'(x)g(x) + f(x)g'(x)$. Since $h$ is Riemann integrable on $[a,b]$ by Theorems 7.1.17, 7.1.19, and 7.1.22, and since $fg$ is differentiable on $(c,d)$ with $(fg)'(x) = h(x)$ for all $x \in [a,b]$ by the Product Rule (Theorem 6.1.10), we obtain by the Second Fundamental Theorem of Calculus (Theorem 7.2.3) and by Theorems 7.1.19 and 7.1.22 that
\[
f(b)g(b) - f(a)g(a) = \int_a^b h(x)\,dx = \int_a^b f'(x)g(x) + f(x)g'(x)\,dx = \int_a^b f'(x)g(x)\,dx + \int_a^b f(x)g'(x)\,dx.
\]
Thus the result follows.
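Integration by parts can be checked numerically as well (an illustrative sketch, not part of the notes): with $f(x) = x$ and $g(x) = e^x$ on $[0,1]$, the corollary predicts $\int_0^1 x e^x\,dx = 1 \cdot e - 0 \cdot 1 - \int_0^1 e^x\,dx = e - (e-1) = 1$.

```python
# Illustrative sketch of Corollary 7.2.4 with f(x) = x, g(x) = e^x on [0,1].
# Both sides of the integration-by-parts identity are approximated by
# midpoint Riemann sums, so the comparison holds up to approximation error.

import math

def integral(fn, a, b, n=10_000):
    h = (b - a) / n
    return sum(fn(a + (k - 0.5) * h) * h for k in range(1, n + 1))

lhs = integral(lambda x: x * math.exp(x), 0.0, 1.0)          # ∫ f g'
rhs = 1.0 * math.e - 0.0 * 1.0 - integral(math.exp, 0.0, 1.0)  # f(b)g(b) - f(a)g(a) - ∫ f' g
print(abs(lhs - rhs))  # negligible; the exact common value is 1
```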