ANALYSIS — AN INTRODUCTORY COURSE

Ivan F Wilde
Mathematics Department
King's College London
iwilde@mth.kcl.ac.uk

Contents

1 Sets
2 The Real Numbers
3 Sequences
4 Series
5 Functions
6 Power Series
7 The elementary functions

Chapter 1

Sets

It is very convenient to introduce some notation and terminology from set theory. A set is just a collection of objects — which will usually be certain mathematical objects, such as numbers, points in the plane, functions or some such. If A denotes some given set and x denotes an object belonging to A, then this fact is indicated by the expression

x ∈ A

to be read as "x belongs to A", or "x is a member of A", or "x is an element of A". If x denotes some object which does not belong to the set A, then this is indicated by the symbolism

x ∉ A

and is read as "x does not belong to A", or "x is not a member of A", or "x is not an element of A".

To say that the sets A and B are equal is to say that they have the same elements. In other words, to say that A = B is to say both that if x ∈ A then also x ∈ B and if y ∈ B then also y ∈ A. We can write this as

A = B is the same as:  x ∈ A =⇒ x ∈ B  and  y ∈ B =⇒ y ∈ A.

The verification that given sets A and B are equal is made up of two parts. The first is the verification that every element of A is also an element of B and the second part is the verification that every element of B is also an element of A.

We list a few examples of sets and also introduce some notation.

Examples 1.1.

1. The set consisting of the three integers 2, 3, 4. We write this as { 2, 3, 4 }.

2. The set of natural numbers { 1, 2, 3, 4, 5, 6, . . . } (i.e., all strictly positive integers). This set is denoted by N. Notice that 0 ∉ N.

3. The set of all real numbers, denoted by R. For example, 8, −11, 0, √5, −1/2, 1/3, π are elements of R.

4. The set of complex numbers is denoted by C.

5. The set of all integers (positive, negative and including zero) is denoted by Z.

6. The set of all rational numbers (all real numbers of the form m/n for integers m, n with n ≠ 0) is denoted by Q. For example, the real numbers 3/4, −17/9, 0, 78, −3 belong to Q, but √2 ∉ Q.

7. The set of even natural numbers { 2, 4, 6, 8, . . . }. This could also be written as { n ∈ N : n = 2m for some m ∈ N }. (The colon ':' stands for "such that" (or "with the property that"), so this can be read as "the set of all n in N such that n = 2m for some m in N".)

8. The set { x ∈ R : x > 1 } is the set of all those real numbers strictly greater than 1.

9. The set { z ∈ C : |z| = 1 } is the set of complex numbers with absolute value equal to 1. This is the "unit circle" in C (the circle with centre at the origin and with radius equal to 1).

Certain sets of real numbers, so-called "intervals", are given a special notation with the use of round and square brackets. Let a ∈ R and b ∈ R and suppose that a < b.

{ x ∈ R : a ≤ x ≤ b } is denoted [a, b]   (closed interval)
{ x ∈ R : a < x < b } is denoted (a, b)   (open interval)
{ x ∈ R : a ≤ x < b } is denoted [a, b)   (closed-open interval)
{ x ∈ R : a < x ≤ b } is denoted (a, b]   (open-closed interval)
{ x ∈ R : x ≤ a } is denoted (−∞, a]
{ x ∈ R : x < a } is denoted (−∞, a)
{ x ∈ R : a ≤ x } is denoted [a, ∞)
{ x ∈ R : a < x } is denoted (a, ∞).

It is important to realize that all this is just notation — a useful visual short-hand. In particular, the symbol ∞ is used in four of the cases.
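For instance, each interval can be regarded simply as a condition on x. The short Python sketch below is purely illustrative (the function names are our own choice); it encodes a few of the intervals as membership tests, and no object called "infinity" is needed at any point.

def closed(a, b):
    # the interval [a, b] = { x : a <= x <= b }
    return lambda x: a <= x <= b

def open_interval(a, b):
    # the interval (a, b) = { x : a < x < b }
    return lambda x: a < x < b

def left_ray(a):
    # the interval (-oo, a] = { x : x <= a }
    return lambda x: x <= a

def right_ray(a):
    # the interval (a, oo) = { x : a < x }
    return lambda x: a < x

box = closed(2, 5)
ray = right_ray(3)
print(box(2), box(5.5))   # True False
print(ray(10**12))        # True: every real number greater than 3 lies in (3, oo)

The unbounded intervals are described by one-sided inequalities alone.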
This in no way is meant to imply that ∞ represents a real number — it positively, absolutely, certainly is not. ∞ is not a real number. There is no such real number as ∞. Given sets A and B, we say that A is a subset of B if every element of A is also an element of B, i.e., x ∈ A =⇒ x ∈ B. If this is the case, we write A⊆B — read “A is a subset of B”. By virtue of our earlier discussion of the equality A = B, we can say that A=B ⇐⇒ both A ⊆ B and B ⊆ A. “ is equivalent to ” “ if and only if ” We have N ⊆ R, Q ⊆ R, N ⊆ Z. Definition 1.2. Suppose that A and B are given sets. The union of A and B, denoted by A ∪ B, is the set with elements which belong to either A or B (or both); A ∪ B = { x : x ∈ A or x ∈ B } — read “A union B equals . . . ”. Note that the usage of the word “or” allows “both”. In non-mathematical language, the union A ∪ B is obtained by bundling together everything in A and everything in B. Clearly, by construction, A ⊆ A ∪ B and also B ⊆ A ∪ B. Example 1.3. Suppose that A = { 1, 2, 3 } and B = { 3, 6, 8 }. Then we find that A ∪ B = { 1, 2, 3, 6, 8 }. Definition 1.4. The intersection of A and B, denoted by A ∩ B, is the set with elements which belong to both A and B; A ∩ B = { x : x ∈ A and x ∈ B } — read “A intersect B equals . . . ”. In non-mathematical language, the intersection A ∩ B is got by selecting everything which belongs to both A and B. Clearly, by construction, we see that A ∩ B ⊆ A and also A ∩ B ⊆ B. King’s College London 4 Chapter 1 Example 1.5. With A = { 1, 2, 3 } and B = { 3, 6, 8 }, as in the example above, we see that A ∩ B = { 3 }. If A and B have no elements in common then their intersection A∩B has no elements at all. It is convenient to provide a symbol for this situation. We let ∅ denote the set with no elements. ∅ is called the “empty set”. Then A ∩ B = ∅ if A and B have no common elements. In such a situation, we say that A and B are disjoint. Example 1.6. Let A and B be the intervals in R given as A = (1, 4] and B = (4, 6). Then A ∩ B = ∅ and A ∪ B = (1, 6). Remark 1.7. Let A and B be given sets and consider the truth, or otherwise, of the statement “ A ⊆ B ”. This fails to be true precisely when A possesses an element which is not a member of B. Now suppose that A = ∅. The statement “ ∅ ⊆ B ” is false provided that there is some “nuisance” element of ∅ which is not an element of B. However, ∅ has no elements at all, so there can be no such “nuisance” element. In other words, the statement “ ∅ ⊆ B ” cannot be false and consequently must be true; ∅ obeys ∅ ⊆ B for any set B. This might seem a bit odd, but is just a logical consequence of the formalism. Theorem 1.8. For sets A, B and C, we have (1) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). (2) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). Proof. (1) We must show that lhs ⊆ rhs and that rhs ⊆ lhs. First, we shall show that lhs ⊆ rhs. If lhs = ∅, then we are done, because ∅ is a subset of any set. So now suppose that lhs 6= ∅ and let x ∈ lhs = A ∪ (B ∩ C). Then x ∈ A or x ∈ (B ∩ C) (or both). (i) Suppose x ∈ A. Then x ∈ A ∪ B and also x ∈ A ∪ C and therefore x ∈ (A ∪ B) ∩ (A ∪ C), that is, x ∈ rhs. (ii) Suppose that x ∈ (B ∩ C). Then x ∈ B and x ∈ C and so x ∈ A ∪ B and also x ∈ A ∪ C. Therefore x ∈ (A ∪ B) ∩ (A ∪ C), that is, x ∈ rhs. So in either case (i) or (ii) (and at least one these must be true), we find that x ∈ rhs. Since x ∈ lhs is arbitrary, we deduce that every element of the left hand side is also an element of the right hand side, that is, lhs ⊆ rhs. Now we shall show that rhs ⊆ lhs. 
If rhs = ∅, then there is no more to prove. So suppose that rhs 6= ∅. Let x ∈ rhs. Then x ∈ (A ∪ B) and x ∈ (A ∪ C). Case (i): suppose x ∈ A. Then certainly x ∈ A ∪ (B ∩ C) and so x ∈ lhs. Case (ii): suppose x ∈ / A. Then since x ∈ (A ∪ B), it follows that x ∈ B. Also x ∈ (A ∪ C) and so it follows that x ∈ C. Hence x ∈ B ∩ C and so x ∈ A ∪ (B ∩ C) which tells us that x ∈ lhs. Department of Mathematics 5 Sets We have seen that every element of the right hand side also belongs to the left hand side, that is, rhs ⊆ lhs. Combining these two parts, we have lhs ⊆ rhs and also rhs ⊆ lhs and so it follows that lhs = rhs, as required. (2) This is left as an exercise. The notions of union and intersection extend to the situation with more than just two sets. For example, A1 ∪ A2 ∪ A3 = { x : x ∈ A1 or x ∈ A2 or x ∈ A3 } = { x : x belongs to at least one of the sets A1 , A2 , A3 } = { x : x ∈ Ai for some i = 1 or 2 or 3 } = { x : x ∈ Ai for some i ∈ { 1, 2, 3 } }. More generally, for n sets A1 , A2 , . . . , An , we have A1 ∪ A2 ∪ · · · ∪ An = { x : x ∈ Ai for some i ∈ { 1, 2, . . . , n } }. This union is often denoted by n [ Ai which is i=1 A1 ∪A2 ∪· · ·∪An . Let Λ denote somewhat more concise than the alternative the “index set” { 1, 2, . . . , n }. This is just the set of labels for the collection of sets we are considering. Then the above can be conveniently written as [ Ai = { x : x ∈ Ai for some i ∈ Λ }. i∈Λ This all makes sense for any non-empty index set. Indeed, suppose that we have some collection of sets indexed (that is, labelled,) by a set Λ. Suppose the set with label λ ∈ Λ is denoted by Aλ . The union of all the Aλ s is defined to be [ Aλ = { x : x ∈ Aλ for some λ ∈ Λ }. λ∈Λ If Λ = N, one often writes ∞ [ Ai for i=1 [ Aλ . λ∈Λ Examples 1.9. 1. Suppose that Λ = { 1, 2, 3, . . . , 57, 58 } and Aj = [j, j + 1] for each j ∈ Λ. (So, for example, with j = 7, A7 = [7, 7 + 1] = [7, 8].) Then 58 [ Aj = [1, 59]. j=1 King’s College London 6 Chapter 1 2. Suppose that Λ = N and Aj = [1, j + 1] for j ∈ N. Then ∞ [ Aj = [1, ∞). j=1 S To see this, suppose that x ∈ ∞ j=1 . Then x is an element of at least one of the Aj s, that is, there is some j0 , say, in N such that x ∈ Aj0 . This means that x ∈ [1, j0 + 1], that is, 1 ≤ x ≤ j0 + 1 and so certainly x ∈ [1, ∞). It follows that lhs ⊆ rhs. Now suppose that x ∈ [1, ∞). Then, in particular, x ≥ 1. Let N be any natural number satisfying N > x. Then certainly x satisfies S 1 ≤ x ≤ N + 1 which means that x ∈ AN and so x ∈ ∞ A j=1 j . Hence rhs ⊆ lhs and the equality lhs = rhs follows. 3. Suppose that Λ is the interval (0, 1) and, for each λ ∈ (0, 1), Aλ is given by Aλ = { (x, y) ∈ R2 : x = λ }. In other words, Aλ is the vertical line x = λ in the plane R2 . Then [ Aλ = { (x, y) ∈ R2 : 0 < x < 1 } λ∈Λ which is the vertical strip in R2 with boundary edges given by the lines with x = 0 and x = 1, respectively. Note that these lines (boundary edges) are not part of the union of the Aλ s. 4. Let Λ be the interval [3, 5] and for each λ ∈ [3, 5] let Aλ = { λ }. In other words, Aλ consists of just one point, the real number λ. Then [ Aλ = [3, 5] λ∈Λ which just says that the interval [3, 5] is the union of all its points (as it should be). A similar discussion can be made regarding intersections. A1 ∩ A2 ∩ A3 = { x : x ∈ A1 and x ∈ A2 and x ∈ A3 } = { x : x belongs to every one of the sets A1 , A2 , A3 } = { x : x ∈ Ai for all i = 1 or 2 or 3 } = { x : x ∈ Ai for all i ∈ { 1, 2, 3 } }. 
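For a finite index set, these unions and intersections can be computed directly. The Python sketch below is only an illustration (the family A1, A2, A3 and the index set are our own choice); it follows the descriptions "x ∈ Ai for some i" and "x ∈ Ai for all i" almost word for word.

Lam = {1, 2, 3}                                  # the index set
A = {1: {1, 2, 3}, 2: {2, 3, 6}, 3: {3, 6, 8}}   # the family A_1, A_2, A_3

universe = set().union(*A.values())              # every element appearing in some A_i

# { x : x belongs to A_i for some i in Lam }
union = {x for x in universe if any(x in A[i] for i in Lam)}

# { x : x belongs to A_i for all i in Lam }
intersection = {x for x in universe if all(x in A[i] for i in Lam)}

print(union)          # the set {1, 2, 3, 6, 8}
print(intersection)   # the set {3}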
In general, if { Aλ }Λ is any collection of sets indexed by the (non-empty) set Λ, then the intersection of the Aλ s is \ Aλ = { x : x ∈ Aλ for all λ ∈ Λ }. λ∈Λ Department of Mathematics 7 Sets If Λ = { 1, 2, . . . , n }, we usually write we usually write ∞ \ i=1 Ai for \ n \ Ai for i=1 \ Aλ and if Λ = N, then λ∈Λ Aλ . λ∈Λ Examples 1.10. 1. Suppose that Λ = N and for each j ∈ Λ = N, let Aj = [0, j]. Then \ Aj = [0, 1]. j∈N 2. Let Λ = N and set Aj = [j, j + 1] for j ∈ N. Then ∞ \ Aj = ∅. j=1 3. Let Λ = N and set Aj = [j, ∞) for j ∈ N. Then ∞ \ Aj = ∅. j=1 T∞ To see this, note that x ∈ j=1 provided that x belongs to every Aj . This means that x satisfies j ≤ x ≤ j + 1 for all j ∈ N. But clearly this fails whenever j is a natural number strictly greater than x. In other words, there are no real numbers which satisfy this criterion. 4. Suppose that Λ = N and for each k ∈ N let Ak be the interval given by Ak = [0, 1/k). Then, in this case, ∞ \ Ak = { 0 }. k=1 This follows because the only non-negative real number which is smaller than every 1/k (where k ∈ N) is zero. 5. Let Λ = N and let Ak = [0, 1 + 1/k] for k ∈ N. Then ∞ \ Ak = [0, 1]. k=1 Indeed, [0, 1] ⊆ Ak for every k and if x ∈ / [0, 1] then x must fail to belong to some Ak . King’s College London 8 Chapter 1 Theorem 1.11. Suppose that A and Bλ , for λ ∈ Λ, are given sets. Then \ ¡\ ¢ Bλ = (A ∪ Bλ ). (1) A ∪ λ∈Λ (2) A ∩ ¡[ λ∈Λ λ∈Λ ¢ Bλ = [ (A ∩ Bλ ). λ∈Λ ¡T ¢ that x ∈ A ∪ B . If x ∈ A, then x ∈ A ∪ Bλ for Proof. (1) Suppose λ λ∈Λ T all λTand so x ∈ λ∈Λ (A ∪ Bλ ). If x ∈ / A, then it must T be the case that x ∈ λ∈Λ Bλ , in which case x ∈ B for all λ and so x ∈ λ λ∈Λ (A ∪ Bλ ). We ¡T ¢ T have shown that A ∪ B ⊆ (A ∪ B ). λ λ∈Λ λ λ∈Λ T To establish the reverse inclusion, suppose that x ∈ λ∈Λ (A∪B ¡T λ ). Then ¢ x ∈ A ∪ Bλ for every λ ∈ Λ. If x ∈ A, the certainly x ∈ A ∪ B . If λ λ∈Λ T x∈ / A, then we must have that x¡T∈ Bλ for¢ every λ, that is, x ∈ λ∈Λ Bλ . But then itTfollows that x ∈ A ∪¡T λ∈Λ Bλ¢ . Hence λ∈Λ (A ∪ Bλ ) ⊆ A ∪ λ∈Λ Bλ and so the equality A∪ ¡\ λ∈Λ \ ¢ (A ∪ Bλ ) Bλ = λ∈Λ follows. (2) The proof of this proceeds along similar lines to part (1). Department of Mathematics Chapter 2 The Real Numbers In this chapter, we will discuss the properties of R, the real number system. It might well be appropriate to ask exactly what a real number is? It is the job of mathematics to set out clear descriptions of the objects within its scope, so it is not at all unreasonable to expect an answer to this. One must start somewhere. For example, in geometry, one might take the concept of “point” as a basic undefined object. Lines are then specified by pairs of points — the line passing through them. Beginning with the natural numbers, N, one can construct Z and from Z one constructs the rationals, Q. Finally from Q it is possible to construct the real numbers R. We will not do this here, but rather we will take a close look at the structure and special properties of R. Of course, everybody knows that numbers can be added and multiplied and even subtracted and it makes sense to divide one number by another (as long as the latter, the denominator, is not zero). We can also compare two numbers and discuss which is the larger. It is precisely these properties (or axioms) that we wish to isolate and highlight. Arithmetic To each pair of real numbers a, b ∈ R, there corresponds a third, denoted a + b. This “pairing”, denoted ‘ + ’ and called “addition” obeys (A1) a + (b + c) = (a + b) + c, for all a, b, c ∈ R. (A2) a + b = b + a, for all a, b ∈ R. 
(A3) There is a unique element, denoted 0, in R such that a + 0 = a, for any a ∈ R. (A4) For any a ∈ R, there is a unique element (denoted −a) in R such that a + (−a) = 0. The properties (A1) – (A4) say that R is an abelian group with respect to the binary operation ‘ + ’. 9 10 Chapter 2 Next, we consider multiplication. To each pair a, b ∈ R, there is a third, denoted a.b, the “product” of a and b. The operation ‘ . ’, called multiplication, obeys (A5) a.(b.c) = (a.b).c, for any a, b, c ∈ R. (A6) a.b = b.a, for any a, b ∈ R. (A7) There is a unique element, denoted 1, in R, with 1 6= 0 and such that a.1 = a, for any a ∈ R. (A8) For any a ∈ R with a 6= 0, there is a unique element in R, written a−1 or a1 , such that a.a−1 = 1. The element a−1 is called the (multiplicative) inverse, or reciprocal, of a. (A9) a.(b + c) = a.b + a.c, for all a, b, c ∈ R. Remarks 2.1. 1. 0−1 is not defined. The element 0 has no reciprocal. Such an object simply does not exist in R. 1/0 has no meaning. 2. Subtraction is given by a − b = a + (−b), for a, b ∈ R. 3. Division is defined via a ÷ b = a.(b−1 ) ( = a. 1b = a/b) provided b 6= 0. If it should happen that b = 0, then the expression a/b has no meaning. 4. It is usual to omit the dot and write just ab for the product a.b. There is almost never any confusion from this. All the familiar arithmetic results are consequences of the above properties (A1) – (A9). Examples 2.2. 1. For any x ∈ R, x.0 = 0. Proof. By (A3), 0 + 0 = 0 and so x.(0 + 0) = x.0. Hence, by (A9), x.0 + x.0 = x.0. Adding −(x.0) to both sides gives (x.0 + x.0) + (−(x.0)) = x.0 + (−(x.0)) = 0, by (A4). Hence, by (A1), x.0 + (x.0 + (−(x.0)) = 0 and so, using property (A4) again, we get x.0 + 0 = 0. However, by (A3), x.0 + 0 = x.0 and so by equating these last two expressions for x.0 + 0 we obtain x.0 = 0, as required. Department of Mathematics 11 The Real Numbers 2. For any x, y ∈ R, x.(−y) = −(x.y). Proof. (x.y) + x.(−y) = x.(y + (−y)) , = x.0 , = 0, by (A9), by (A4), by the previous result. By (A4) (uniqueness), −(x.y) must be the same as x.(−y). 3. For any x ∈ R, −(−x) = x. Proof. We have x = x + 0, by (A3), = x + ((−x) + (−(−x))) , by (A4), = (x + (−x)) + (−(−x)) , by (A1), = 0 + (−(−x)) , = −(−x) , by (A4), by (A3) as required. 4. For any x, y ∈ R, x.y = (−x).(−y). Proof. By example 2, above, α.(−β) = −(α, β) for any α, β ∈ R. If we now choose α = −x and β = y, we get (−x).(−y) = −((−x).y) = −(y.(−x)) , by (A6), = −(−(y.x)) , by example 2, above, = −(−(x.y)) , by (A6), = x.y by example 3, above, and we are done. King’s College London 12 Chapter 2 Order properties Here we formalize the idea of one number being greater than another. We can “order” two numbers by thinking of the larger as being the higher in order. More precisely, there is a relation < (read “less than”) between elements of R satisfying the following: (A10) For any a, b ∈ R, exactly one of the following is true: a < b, b < a or a=b (trichotomy). The notation u > a (read “u is greater than a”) means that a < u. (A11) If a < b and b < c, then a < c. (A12) If a < b, then a + c < b + c, for any c ∈ R. (A13) If a < b and γ > 0, then aγ < bγ. Notation We write a ≤ b to signify that either a < b is true or else a = b is true. In view of (A10), we can say that a ≤ b means that it is false that a > b. The notation x ≥ w is used to mean that w ≤ x and as already noted above, x > w is used to mean w < x. By (A10), if x 6= 0, then either x > 0 or else x < 0. 
If x > 0, then x is said to be (strictly) positive and if x < 0, we say that x is (strictly) negative. Thus, if x is not zero, then it is either positive or else it is negative. It is quite common to call a number x positive if it obeys x ≥ 0 or negative if it obeys x ≤ 0. Should it be necessary to indicate that x is not zero, then one adds the adjective ‘strictly’. Examples 2.3. 1. For any x ∈ R, we have x > 0 ⇐⇒ (−x) < 0. Proof. Using (A12), we have 0 < x =⇒ 0 + (−x) < x + (−x) (adding (−x) to both sides), =⇒ (−x) < 0 since rhs = 0, by (A4). Conversely, again from (A12), (−x) < 0 =⇒ (−x) + x < 0 + x (adding x to both sides), =⇒ 0 < x by (A2), (A3) and (A4) and the result follows. Department of Mathematics 13 The Real Numbers 2. For any x 6= 0, we have x2 > 0. Proof. Since x 6= 0, we must have either x > 0 or x < 0, by (A10). If x > 0, then by (A13) we have x2 > 0 (take a = 0, b = γ = x). On the other hand, if x < 0, then −x > 0 by the example above. Hence, by (A13) (with a = 0, b = γ = (−x)), it follows that (−x)(−x) > 0. But we know (from the arithmetic properties) that (−α)(−β) = α β, for any α, β ∈ R and so we have x2 = x x = (−x)(−x) > 0 as required. The number 1 was introduced in (A7). If we set x = 1 here, then we see that 1 = 12 > 0, i.e., 1 > 0. We have deduced that the number 1 is positive. Nobody would doubt this, but we see explicitly that this is a consequence of our set-up. Note that it follows from this, by (A12), that a < a + 1, for any a ∈ R. 3. If a, b ∈ R with a ≤ b, then −a ≥ −b. Proof. If a = b then certainly −a = −b, so we need only consider the case when a < b. a < b =⇒ a + (−a) < b + (−a) by (A12), =⇒ 0 < b + (−a) =⇒ −b < (−b) + b + (−a) by (A12) and (A4), =⇒ −b < −a by (A1), (A2) and (A4) and the result follows. From now on, we will work with real numbers and inequalities just as we normally would — and will not follow through a succession of steps invoking the various listed properties as required as we go. Suffice it to say that we could do so if we wished. Next, we introduce a very important function, the modulus or absolute value. King’s College London 14 Chapter 2 Definition 2.4. For any x ∈ R, the modulus (or absolute value) of x is the number |x| defined according to the rule ( x, if x ≥ 0, |x| = −x, if x < 0. ¯ ¯ For example, |5| = 5, |0| = 0, |−3| = 3 and ¯− 12 ¯ = 12 . Note that |x| is never negative. We also see that |x| = max{ x, −x }. Let f (x) = x and g(x) = −x. Then |x| = f (x) when x ≥ 0 and |x| = g(x) when x < 0. Now, we know what the graphs of y = f (x) = x and y = g(x) = −x look like and so we can sketch the graph of the function |x|. It is made up of two straight lines, meeting at the origin. |x| 6 ¡ ¡ @ @ ¡y = x @ y = −x@ ¡ ¡ @ ¡ @ ¡ ¡ @ @ @¡ 0 - x Figure 2.1: The absolute value function |x|. The basic properties of the absolute value are contained in the following two propositions. They are used time and time again in analysis and it is absolutely essential to be fluent in their use. Proposition 2.5. (i) For any a, b ∈ R, we have |ab| = |a| |b|. (ii) For any a ∈ R and r > 0, the inequality |a| < r is equivalent to the pair of inequalities −r < a < r. Proof. (i) We just consider the various possibilities. If either a or b is zero, then so is the product ab. Hence |ab| = 0 and at least one of |a| or |b| is also zero. Therefore |ab| = 0 = |a| |b|. If both a > 0 and b > 0, then ab > 0 and we have |ab| = ab, |a| = a and |b| = b and so |ab| = |a| |b| in this case. 
Now, if a > 0 but b < 0, then ab < 0 so we have |a| = a, |b| = −b and |ab| = −ab = |a| |b|. The case a < 0 and b > 0 is similar. Finally, suppose that both a < 0 and b < 0. Then ab > 0 and we have |ab| = ab, |a| = −a and |b| = −b. Hence, |ab| = ab = (−a)(−b) = |a| |b|. Department of Mathematics 15 The Real Numbers (ii) Suppose that |a| < r. Then max{ a, −a } < r and so both a < r and −a < r. In other words, a < r and −r < a which can be written as −r < a < a. On the other hand, if −r < a < r, then both a < r and −a < r so that max{ a, −a } < r. That is, |a| < r, as required. Remark 2.6. Putting b = −1 in (i), above, and using the fact that |−1| = 1, we see that |−a| = |a|. Proposition 2.7. For any real numbers a and b, (i) |a + b| ≤ |a| + |b|. (ii) |a − b| ≤ |a| + |b|. (iii) ||a| − |b|| ≤ |a − b|. Proof. (i) We have a + b ≤ |a| + |b| and −(a + b) = −a − b ≤ |a| + |b|. Hence |a + b| = max{ a + b, −(a + b) } ≤ |a| + |b| . (ii) Let c = −b and apply (i) to the real numbers a and c to get the inequality |a + c| ≤ |a| + |c|. But then this means that |a − b| ≤ |a| + |b|. (iii) We have |a| = |(a − b) + b| ≤ |a − b| + |b| by part (i) (with (a − b) replacing a). This implies that |a| − |b| ≤ |a − b|. Swapping around a and b, we have −(|a| − |b|) = |b| − |a| ≤ |b − a| = |a − b| and therefore | |a| − |b| | = max{ |a| − |b| , −(|a| − |b|) } ≤ |a − b| as required. If a and b are real numbers, how far apart are they? For example, if a = 7 and b = 11 then we might say that the distance between a and b is 4. If, on the other hand, a = 10 and b = −6, then we would say that the distance between them is 16. In either case, we notice that the distance is given by |a − b|. It is extremely useful to view |a − b| as the distance between the numbers a and b. For example, to say that |a − b| is “very small” is to say that a and b are “close” to each other. King’s College London 16 Chapter 2 Proposition 2.8. Let a, b ∈ R be given and suppose that for any given ε > 0, a and b obey the inequality a < b + ε. Then a ≤ b. In particular, if x < ε for all ε > 0, then x ≤ 0. Proof. We know that either a ≤ b or else a > b. Suppose the latter were true, namely, a > b. Set ε1 = a − b. Then ε1 > 0 and a = b + ε1 . Taking ε = ε1 , we see that this conflicts with the hypothesis that a < b + ε for every ε > 0 (it fails for the choice ε = ε1 ). We conclude that a > b must be false and so a ≤ b. For the last part, simply set a = x and b = 0 to get the desired conclusion. We have listed a number of properties obeyed by the real numbers: (A1) . . . (A9) — arithmetic (A10) . . . (A13) — order. Is this it? Are there any more to be included? We notice that all of these properties are satisfied by the rational numbers, Q. Are all real numbers rational, i.e., is it true that Q = R? Or do we need to consider yet further properties which distinguish between Q and R? Consider an apparently unrelated question. Do all numbers have square roots? Since a2 is positive for any a ∈ R, it is clear that no negative number can have a square root in R. (Indeed, it is the consideration of C, the complex numbers, which allows for square roots of negative numbers.) So we ask, does every positive real number have a square root? Does every natural number n have a square root in R? In particular, is there such a real number as the square root of 2? It would be nice to think that there is such a real number. In fact, according to Pythagoras’ Theorem, this should be the length of the diagonal of a square whose sides have unit length. 
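A crude numerical experiment already suggests what is about to be proved. The Python sketch below is not a proof (the bound 1000 is an arbitrary choice); it simply looks for natural numbers m and n with m^2 = 2n^2 and reports that none are found.

from math import isqrt

hits = []
for n in range(1, 1001):
    m = isqrt(2 * n * n)        # the integer square root of 2n^2
    if m * m == 2 * n * n:      # is 2n^2 a perfect square?
        hits.append((m, n))
print(hits)                     # prints []: no pair (m, n) was found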
The following proposition tells us that there is certainly no such rational number. Proposition√2.9. There are no integers m, n ∈ N satisfying m2 = 2n2 . In particular, 2 is not a rational number. √ Proof. To say that 2 is rational√is to say that there are integers m and n (with n 6= 0) such that m/n = 2. This means that m2 /n2 = 2 and so m2 = 2n2 for m, n ∈ N. (By replacing m or n by −m or −n, if necessary, we may assume that m and n in this last equality are both positive.) So the √ fact that 2 ∈ / Q is a consequence of the first part of the proposition. Consider the equality m2 = 2n2 (∗) To show that m2 = 2n2 is impossible for any m, n ∈ N, suppose the contrary, namely that there are numbers m and n in N obeying (∗). We will show that this leads to a contradiction. Department of Mathematics 17 The Real Numbers Indeed, if m2 = 2n2 , then m2 is even. The square of an odd number is odd and so it follows that m must also be even. This means that we can express m as m = 2k for some suitable k ∈ N. But then (2k)2 = m2 = 2n2 which means that 2k 2 = n2 and so n2 is even. Arguing as above, we deduce that n can be expressed as n = 2j for some j ∈ N. Substituting, we see that k and j also obey (∗), namely, k 2 = 2j 2 . This tells us that m/2 and n/2 are integers also obeying (∗). Repeating this whole argument with m0 = m/2 and n0 = n/2, we find that both m0 /2 and n0 /2 belong to N and also satisfy (∗). In other words, m/22 and n/22 belong to N and obey (∗). We can keep repeating this argument to deduce that m/2j and n/2j are integers obeying (∗). In particular, m/2j ∈ N implies that m/2j ≥ 1 and so m ≥ 2j . But this holds for any j ∈ N and we can take j as large as we wish. We can take j so large that 2j > m. This leads to a contradiction, as we wanted it to. We finally conclude that there are no natural numbers m and n obeying (∗) and as a consequence, there is no element of Q whose square is equal to 2, that is, √ 2 is not a rational number. Remark 2.10. A somewhat similar argument can be used to show √ that many other numbers do not have square roots in Q. For example, 3 ∈ / Q. In √ fact, one can show that if n ∈ N, then √ either n ∈ N, that is, n is a perfect √ √ square, or else n ∈ / Q. For example, 16 = 4 ∈ N but 17 ∈ / Q. Returning to the discussion of the defining properties of R, we still have to pinpoint the extra property that R has which is not shared by Q. First we need some terminology. Definition 2.11. A non-empty subset S of R is said to be bounded from above if there is some M ∈ R such that a≤M for all a ∈ S. Any such number M is called an upper bound for the set S. Evidently, if M is an upper bound for S, then so is any number greater than M . We say that a non-empty subset S of R is bounded from below if there is some m ∈ R such that m≤a for all a ∈ S. Any such number m is called a lower bound for the set S. If m is a lower bound for S, then so is any number less than m. If S is both bounded from above and from below, then S is said to be bounded. King’s College London 18 Chapter 2 Example 2.12. Consider the set A = (−6, 4] . Then A is bounded because any x ∈ A obeys −6 ≤ x ≤ 4. (In fact, any x ∈ A obeys the inequalities −6 < x ≤ 4.) Any real number greater than or equal to 4 is an upper bound for A and any real number less than or equal to −6 is a lower bound for A. The set A has a maximal element, namely 4, but A does not have a minimal element. Let ∞ [ 3 5 7 B = (1, 2 ) ∪ (2, 2 ) ∪ (3, 2 ) ∪ · · · = (k, k + 12 ). 
k=1 Then the set B is bounded from below (the number 1 is clearly a lower bound for B). However, B contains k + 14 for every k ∈ N and so B is not bounded from above (so B is not bounded). We also see that B does not have a minimal element. Remark 2.13. What does it mean to say that a set S is not bounded from above? Consider the inequality x≤M. (∗) Now, given S and some particular real number M , the inequality (∗) may hold for some elements x in S but may fail for other elements of S. To say that S is bounded from above is to say that there is some M such that (∗) holds for all elements x ∈ S. If S is not bounded from above, then it must be the case that whatever M we try, there will always be some x in S for which (∗) fails, that is, for any given M there will be some x ∈ S such that x > M . In particular, if we try M = 1, then there will be some element (many, in fact) in S greater than 1. Let us pick any such element and label it as x1 . Then we have x1 ∈ S and x1 > 1. We can now try M = 2. Again, (∗) must fail for at least one element in S and it could even happen that x1 > 2. To ensure that we get a new element from S, let M = max{ 2, x1 }. Then there must be at least one element of S greater than this M . Let x2 denote any such element. Then we have x2 ∈ S and x2 > 2 and x2 6= x1 . Now setting M = max{ 3, x1 , x2 }, we may say that there is some element in S, which we choose to denote by x3 , such that x3 > 3 and x3 6= x1 and x3 6= x2 . We can continue to do this and so we see that if S is not bounded from above, then there exist elements x1 , x2 , x3 , . . . , xn , . . . (which are all different) such that xn > n for each n ∈ N. The following concepts play an essential rôle. Department of Mathematics 19 The Real Numbers Definition 2.14. Suppose that S is a non-empty subset of R which is bounded from above. The number M is the least upper bound (lub) of S if (i) a ≤ M for all a ∈ S (i.e., M is an upper bound for S). (ii) If M 0 is any upper bound for S, then M ≤ M 0 . If S is a non-empty subset of R which is bounded from below, then the number m is the greatest lower bound (glb) of S if (i) m ≤ a for all a ∈ S (i.e., m is a lower bound for S). (ii) If m0 is any lower bound for S, then m0 ≤ m. Note that the least upper bound and the greatest lower bound of a set S need not themselves belong to S. They may or they may not. The least upper bound is also called the supremum (sup) and the greatest lower bound is also called the infimum (inf). The ideas are illustrated by some examples. Examples 2.15. 1. Let S be the following set consisting of 4 elements, S = { −3, 1, 2, 5 }. Then clearly S is bounded from above and from below. The least upper bound is 5 and the greatest lower bound is −3. 2. Let S be the interval S = (−6, 4]. Then lub S = 4 and glb S = −6. Note that 4 ∈ S whereas −6 ∈ / S. 3. Let S = (1, ∞). S is not bounded from above and so has no least upper bound. S is bounded from below and we see that glb S = 1. Note that glb S ∈ / S in this case. Remark 2.16. Suppose that M is the lub for a set S. Let δ > 0. Then M − δ < M and since any upper bound M 0 for S has to obey M ≤ M 0 , we see that M − δ cannot be an upper bound for S. But this means that it is false that a ≤ M − δ for all a ∈ S. In other words, there must be some a ∈ S which satisfies M − δ < a. Since M is an upper bound for S, we also have a ≤ M and so a obeys M − δ < a ≤ M. 
So no matter how small δ may be, there will always be some element a ∈ S (possibly depending on δ and there may be many) such that M −δ < a ≤ M , where M = lub S. For any δ > 0, there is a ∈ S such that lub S − δ < a ≤ lub S . King’s College London 20 Chapter 2 Now suppose that m = glb S. Then for any δ > 0 (however small), we note that m < m + δ and so m + δ cannot be a lower bound for S (because all lower bounds for S must be less than or equal to m). Hence, there is some a ∈ S such that a < m + δ, which means that m ≤ a < m + δ. For any δ > 0, there is a ∈ S such that glb S ≤ a < glb S + δ . Remark 2.17. As already noted above, lub S and glb S may or may not belong to the set S. If it should happen that lub S ∈ S, then in this case lub S (or sup S) is the maximum element of S, denoted max S. If glb S ∈ S, then glb S (or inf S) is the minimum element of S, denoted min S. For example, the interval S = (−2, 5] is bounded and, by inspection, we see that sup S = 5 and inf S = −2. Since sup S = 5 ∈ S, the set S does indeed have a maximum element, namely, 5 = sup S. However, inf S ∈ / S and so S has no minimum element. We are now in a position to discuss the final property satisfied by R and it is precisely this last property which distinguishes R from Q. (A14) (The completeness property of R) Any non-empty subset of R which is bounded from above possesses a least upper bound. Any non-empty subset of R which is bounded from below possesses a greatest lower bound. These statements might appear self-evident, but as we will see, they have far-reaching consequences. We note here that these two statements are not independent, in fact, each implies the other, that is, they are equivalent. Remark 2.18. It is very convenient to think of R as the set of points on a line (the real line). Indeed, this is standard procedure when sketching graphs of functions where the coordinate axes represent the real numbers. Department of Mathematics 21 The Real Numbers Imagine now the following situation. ¢¡ | {z A } R | 6 {z } B Figure 2.2: The real line has no gaps. The set A consists of all points on the line (real numbers) to the left of the arrow and B comprises all those points to the right. Numbers are bigger the more they are to the right. The arrow points to the least upper bound of A (which is also the greatest lower bound of B). The completeness property (A14) ensures the existence of the real number in R that the arrow supposedly points to. There are no “gaps” or “missing points” on the real line. We can think of the integers Z or even the rationals Q as collections of dots on a line, but it is property (A14) which allows us to visualize R as the whole “unbroken” line itself. The next result is so obvious that it seems hardly worth noting. However, it is very important and follows from property (A14). Theorem 2.19 (Archimedean Property). For any given x ∈ R, there is some n ∈ N such that n > x. Proof. Let x ∈ R be given. We use the method of “proof by contradiction” — so suppose that there is no n ∈ N obeying n > x. This means that n ≤ x for all n ∈ N, that is, x is an upper bound for N in R. By the completeness property, (A14), N has a least upper bound, α, say. Then α is an upper bound for N so that n≤α (∗) for all n ∈ N. Since α is the least upper bound, α − 1 cannot be an upper bound for N and so there must be some k ∈ N such that α − 1 < k. But we can rewrite this as α < k + 1 which contradicts (∗) since k + 1 ∈ N. We conclude that there is some n ∈ N obeying n > x, as claimed. Corollary 2.20. 
(i) For any given δ > 0, there is some n ∈ N such that 1 < δ. n α < β. n Proof. (i) Let δ > 0 be given. By the Archimedean Property, there is some n ∈ N such that n > 1/δ. But then this gives 1/n < δ, as required. (ii) For given α > 0 and β > 0, set δ = β/α. By (i), there is n ∈ N such that 1/n < δ = β/α and so α/n < β. (ii) For any α > 0, β > 0, there is n ∈ N such King’s College London 22 Chapter 2 The next result is no surprise either. Theorem 2.21. For any a ∈ R, there is a unique integer n ∈ Z such that n ≤ a < n + 1. Proof. Let S = { k ∈ Z : k > a }. By theorem 2.19, S is not empty and is bounded below (by a). Hence, by the completeness property (A14), S has a greatest lower bound α, say, in R. We have a≤α≤k for all k ∈ S. (The inequality a ≤ α follows because a is a lower bound and α is the greatest lower bound and the inequality α ≤ k follows because α is a lower bound of S.) Since α is the greatest lower bound, α + 1 cannot be a lower bound of S and so there is some m ∈ S such that m < α + 1, that is, m − 1 < α. | m−1=n | | a m=n+1 R Figure 2.3: The integer part of a. Now, α is a lower bound for S and m − 1 < α and so m − 1 ∈ / S. But then, by the defining property of S, this means that it is false that m−1 > a. In other words, we have m − 1 ≤ a. But m ∈ S and so m > a and so m satisfies m − 1 ≤ a < m. Putting n = m − 1, we get n ∈ Z and n satisfies the required inequalities n ≤ a < n + 1. To show the uniqueness of such n ∈ Z, suppose that also n0 ∈ Z obeys 0 n ≤ a < n0 +1. Suppose that n < n0 . Then n+1 ≤ n0 and so the inequalities n0 ≤ a and a < n + 1 give n0 ≤ a < n + 1 ≤ n0 giving n0 < n0 which is impossible. Similarly, the assumption that n0 < n would lead to the impossible inequality n < n. We conclude that n = n0 which is to say that n is unique. Remark 2.22. For x ∈ R, let n ∈ Z be the unique integer obeying the inequalities n ≤ x < n+1. Set r = x−n. Then we see that 0 ≤ x−n = r < 1 and so x = n + r with n ∈ Z and where 0 ≤ r < 1. The unique integer n here is called the integer part of the real number x and is denoted by [x] (or sometimes by bxc). Department of Mathematics 23 The Real Numbers Theorem 2.23. Between any pair of real numbers a < b, there are infinitelymany rational numbers and also infinitely-many irrational numbers. Proof. First, we shall show that there is at least one such rational, that is, we shall show that for any given a < b in R, there is some q ∈ Q such that a < q < b. The idea of the proof is as follows. If there is an integer between a and b, then we are done. In any case, we note that since the integers are spread one unit apart, there should certainly be at least one integer between a and b if the distance between a and b is greater than 1. If the distance between a and b is less than 1, then we can “open up the gap” between them by multiplying both by a sufficiently large (positive) integer, n, say. The gap between na and nb is n(b − a). Clearly, if n is large enough, this value is greater than 1. Then there will be some integer m, say, between na and nb, i.e., na < m < nb. But then we see (since n is positive) that a < m/n < b and q = m/n is a rational number which does the job. We shall now write this argument out formally. Let n ∈ N be sufficiently large that n(b − a) > 1 so that na + 1 < nb and let m = [na] + 1. Since [na] ≤ na < [na] + 1, it follows that [na] ≤ na < [na] + 1 ≤ na + 1 < nb | {z } m and so na < m < nb and hence a < m/n < b. (Note that n > 0, so this last step is valid.) 
Setting q = m/n, we have that q ∈ Q and q obeys a < q < b, as required. To see that there are infinitely-many rationals between a and b, we just repeat the above argument but with, say, q and b instead of a and b. This tells us that there is a rational, q2 , say, obeying q < q2 < b. Once again, repeating this argument, there is a rational, q3 , say, obeying q2 < q3 < b. Continuing in this way, we see that for any n ∈ N, there are n rationals, q, q2 , . . . , qn obeying a < q < q2 < q3 < · · · < qn < b . Hence it follows that there are infinitely-many rationals between a and b. To show that there are infinitely-many irrational numbers between a and √ b, we use a trick together with the observation that if r is rational, then r/√ 2 is irrational. The trick is simply to apply the first part to the numbers √ a 2 and b 2 to deduce that for any n ∈ N there are rational numbers r1 , r2 , . . . , rn obeying √ √ a 2 < r1 < r2 < · · · < rn < b 2 . √ Now let µj = rj / 2 for j = 1, 2, . . . , n. Then each µj is irrational and we have a < µ1 < µ2 < · · · < µn < b and the result follows. King’s College London 24 Chapter 2 As a further application of the Completeness Property of R, we shall show that any positive real number has a positive nth root. Theorem 2.24. Let x ≥ 0 and n ∈ N be given. Then there is a unique s ≥ 0 such that sn = x. The real number s is called the (positive) nth root of x and is denoted by x1/n . Proof. If x = 0, then we can take s = 0, so suppose that x > 0. Let A be the set A = { t ≥ 0 : tn < x }. Then 0 ∈ A and so A is not empty and, by the Archimedean Property, there is some integer K with K > x. But then every t ∈ A must obey t < K because otherwise we would have t ≥ K and therefore tn ≥ K n ≥ K > x, which is not possible for any t ∈ A. This means that A is bounded from above. By the Completeness Property of R, A has a least upper bound, lub A = s, say. Note now that, since x > 0, by the Archimedean Property there is some m ∈ N such that m > 1/x. Hence mn ≥ m > 1/x which implies that 1/mn < x so that 1/mn ∈ A. This means that s ≥ 1/mn . In particular, s > 0. Now, exactly one of the statements sn = x, sn < x or sn > x is true. We claim that sn = x and to show this we shall show that the last two statements must be false. Indeed, suppose that sn < x. For k ∈ N, let sk = s(1 + k1 ). Then evidently sk > s and we will show that snk < x for suitably large k. Let d = x − sn . Then d > 0 and ¢ ¡ x − snk = x − sn + sn − snk = d − (snk − sn ) = d − sn (1 + k1 )n − 1 . Now, writing α = (1 + k1 ) and noting that 1 < α ≤ 2, we estimate (1 + k1 )n − 1 = αn − 1 = (α − 1)(αn−1 + αn−2 + · · · + 1) ≤ (α − 1)(2n−1 + 2n−2 + · · · + 1) ≤ (α − 1) n 2n = 1 k n 2n . Hence ¡ ¢ sn n 2n snk − sn = sn (1 + k1 )n − 1 ≤ . k For sufficiently large k, the right hand side of this inequality is less than d and so x − snk = x − sn + sn − snk = d − (snk − sn ) > 0 . It follows that if k is large enough, then sk ∈ A. But sk > s which means that s cannot be the least upper bound of A and we have a contradiction. Hence it must be false that sn < x. Department of Mathematics 25 The Real Numbers Suppose now that sn > x and let δ = sn − x. For given k ∈ N, let tk = s(1 − k1 ). Writing β = 1 − k1 and noting that 0 ≤ β ≤ 1, we estimate that 1 − (1 − k1 )n = 1 − β n = −(β n − 1) = −(β − 1)(β n−1 + β n−2 + · · · + 1) = (1 − β)(β n−1 + β n−2 + · · · + 1) ≤ (1 − β) n = 1 k n. It follows that sn − tnk = sn (1 − (1 − k1 )n ) ≤ 1 k sn n < δ for sufficiently large k. 
But then this means that tnk − x = tnk − sn + sn − x = δ − (sn − tnk ) > 0 for large k. However, tk < s and since s = lub A, it follows that tk is not an upper bound for A. In other words, there is some τ ∈ A such that τ > tk and therefore τ n − x > tnk − x > 0. However, τ ∈ A means that τ n < x which is a contradiction and so it is false that sn > x. We have now shown that sn < x is false and also that sn > x is false and so we conclude that it must be true that sn = x, as required. We have established the existence of some s ≥ 0 such that sn = x and so, finally, we must prove that such an s is unique. If x = 0, then s = 0 obeys sn = 0 = x. No s 6= 0 can obey sn = 0 because sn (1/s)n = 1 6= 0, so s = 0 is the only solution to sn = 0. Now let s > 0 and t > 0. If s > t, then s/t > 1 so that (s/t)n > 1 and we find that sn > tn . Interchanging the rôles of s and t, it follows that if s < t, then sn < tn . We conclude that if sn = x = tn then both s < t and s > t are impossible and so s = t. The proof is complete. King’s College London 26 Chapter 2 Principle of induction Suppose that, for each n ∈ N, P (n) is a statement about the number n such that (i) P (1) is true. (ii) For any k ∈ N, the truth of P (k) implies the truth of P (k + 1). Then P (n) is true for all n. Example 2.25. For any n ∈ N, n(n + 1)(2n + 1) . 6 Proof. For n ∈ N, let P (n) be the statement that 12 + 22 + 32 + · · · + n2 = 12 + 22 + 32 + · · · + n2 = n(n + 1)(2n + 1) . 6 Then P (1) is the statement that 12 = 1(1 + 1)(2 + 1) 6 which is true. Now suppose that k ∈ N and that P (k) is true. We wish to show that P (k + 1) is also true. Since we are assuming that P (k) is true, we see that 12 + 22 + 32 + · · · + k 2 + (k + 1)2 = k(k + 1)(2k + 1) + (k + 1)2 , 6 using the truth of P (k), k(k + 1)(2k + 1) + (k + 1)(6k + 6) 6 2 (k + 1)(2k + k + 6k + 6) = 6 (k + 1)(k + 2)(2k + 3) = 6 which is to say that P (k + 1) is true. By the principle of induction, we conclude that P (n) is true for all n ∈ N. = We can rephrase the principle of induction as follows. Let T be the set given by T = { k ∈ N : P (k) is true }, so k ∈ T if and only if P (k) is true. In particular, P (1) is true if and only if 1 ∈ T . Hence the principle of induction may be rephrased as follows. “ Let T be a set of natural numbers such that 1 ∈ T and such that if T contains k then it also contains k + 1. Then T = N. ” Department of Mathematics 27 The Real Numbers Principle of induction (2nd form) Suppose that Q(n) is a statement about the natural number n such that (i) Q(1) is true. (ii) For any k ∈ N, the truth of all Q(1), Q(2), . . . , Q(k) implies the truth of Q(k + 1). Then Q(n) is true for all n. In a nutshell: Suppose that: Q(1) is true and Q(1) true Q(2) true Q(3) true =⇒ Q(k + 1) true .. . Q(k) true Conclusion: Q(n) is true for all n ∈ N. This follows from the usual form of the principle. To see this, let S = { m ∈ N : Q(m) is true }. We shall use the usual form of induction to show that the hypotheses above imply that S = N. For any n ∈ N, let P (n) be the statement “{ 1, 2, . . . , n } ⊆ S ”. Now, by hypothesis, Q(1) is true and so 1 ∈ S. Hence { 1 } ⊆ S which is to say that P (1) is true. Next, suppose that the truth of P (k) implies that of P (k +1) and assume that P (k) is true. This means that { 1, 2, . . . , k } ⊆ S, that is, each of Q(1), Q(2), . . . , Q(k) is true. But then by the 2nd part of the hypothesis above, Q(k + 1) is true, that is to say, k + 1 ∈ S. Hence { 1, 2, . . . k, k + 1 } ⊆ S. But this just tells us that P (k + 1) is true. 
By induction (usual form), it follows that P (n) is true for all n ∈ N. This means that { 1, 2, . . . , n } ⊆ S for all n. In particular, n ∈ S for every n ∈ N, that is, Q(n) is true for all n ∈ N, which is the content of the 2nd form of the principle.

Chapter 3

Sequences

A sequence of real numbers is just a "listing" a1, a2, a3, . . . of real numbers labelled by N, the set of natural numbers. Thus, to each n ∈ N, there corresponds a real number an. Not surprisingly, an is called the nth term of the sequence.

Figure 3.1: The sequence (an)n∈N (the terms a1, a2, . . . , ak, ak+1, . . . are labelled by N; the kth term is indicated).

Whilst it may seem a trivial comment, it is important to note that the essential thing about a sequence is that it has a notion of "direction" — it makes sense to talk about one term being further down the sequence than another. For example, a101 is further down the sequence than, say, a45. It is convenient to denote the above sequence by (an)n∈N or even simply by (an). Note that there is no requirement that the terms be different. It is quite permissible for aj to be the same as an for different j and n. Indeed, one could have an = α, say, for all n. This is just a sequence with constant terms (all equal to α) — a somewhat trivial sequence, but a sequence nonetheless.

Remark 3.1. On a more formal level, one can think of a sequence of real numbers as nothing but a function from N into R. Indeed, we can define such a function f : N → R by setting f(n) = an for n ∈ N. Conversely, any f : N → R will determine a sequence of real numbers, as above, via the assignment an = f(n). One might wish to consider a finite sequence such as, say, the four term sequence a1, a2, a3, a4. We will use the word sequence to mean an infinite sequence and simply include the adjective "finite" when this is meant.

Examples 3.2.

1. 1, 4, 9, 16, . . . Here the general term an is given by the simple formula an = n^2.

2. 2, 3/2, 4/3, 5/4, 6/5, . . . The general term is an = (n + 1)/n.

3. 2, 0, 2, 0, 2, 0, . . . Here an = 0 if n is even and an = 2 if n is odd. This can also be expressed as an = 1 − (−1)^n.

4. Let an be defined by the prescription a1 = a2 = 1 and an = an−1 + an−2 for n ≥ 3 (each term being the sum of the two preceding ones). The sequence (an) is then 1, 1, 2, 3, 5, 8, 13, . . . These are known as the Fibonacci numbers.

We are usually interested in the "long-term" behaviour of sequences, that is, what happens as we look further and further down the sequence. What happens to an when n gets very large? Do the terms "settle down" or do they get sometimes big, sometimes small, . . . , or what? In examples 3.2.1 and 3.2.4, the terms just get huge. In example 3.2.2, we see, for example, that a99 = 100/99, a10000 = 10001/10000 and, for n = 10^20, an = (10^20 + 1)/10^20, so it looks as though the terms become close to 1. In example 3.2.3, the terms just keep oscillating between the two values 0 and 2.

In example 3.2.2, we would like to say that the sequence approaches 1 as we go further and further down it. Indeed, for example, the difference between the term with n = 10^10 and 1 is that between (10^10 + 1)/10^10 and 1, that is, 10^(−10). How can we formulate this idea of "convergence of a sequence" precisely? We might picture a sequence in two ways, as follows. The first is as the graph of the function n ↦ an. (Notice that we do not join up the dots.)

Figure 3.2: A sequence as a graph (the points (n, an) plotted against n = 1, 2, 3, . . . ).
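Before turning to the second picture, it may help to compute a few of these dots numerically. The Python sketch below is purely illustrative (the chosen values of n are arbitrary); it prints the points (n, an) of example 3.2.2 together with the distance |an − 1|.

for n in [1, 2, 3, 10, 100, 10**6]:
    a_n = (n + 1) / n
    print(n, a_n, abs(a_n - 1))   # the distance from 1 shrinks as n grows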
The second way is just to indicate the values of the sequence on the real line.

Figure 3.3: Plot the values of the sequence on the real line.

The example 3.2.3 above would then be pictured either as

Figure 3.4: Graph with values 0 or 2.

or as

Figure 3.5: The values of an are either 0 or 2.

Returning to the general situation now, how should we formulate the idea that a sequence (an) "converges to α"? According to our first pictorial description, we would want the plotted points of the sequence (the graph) to eventually become very close to the line y = α.

Figure 3.6: The graph gets close to the line y = α.

In terms of the second pictorial description, we would simply demand that the values of the sequence eventually cluster around the value x = α.

Figure 3.7: The values of (an) cluster around x = α.

If we think of the index n as representing time, then we can think of an as the value of the sequence at the time n. The sequence can be considered to have some property "eventually" provided we are prepared to wait long enough for it to become established. It is very convenient to use this word "eventually", so we shall indicate precisely what we mean by it.

We say that a sequence eventually has some particular property if there is some N ∈ N such that all the terms an after aN (i.e., all an with n > N) have the property under consideration. (The number N can be thought of as some offered time after which we are guaranteed that the property under consideration will hold and will continue to hold.)

As an example of this usage, let (an) be the sequence given by the prescription an = 100 − n, for n ∈ N. Then a1 = 99, a2 = 98, . . . etc. It is clear that an is negative whenever n is greater than 100. Thus, we can say that this sequence (an) is eventually negative.

Now we can formulate the notion of convergence of a sequence. The idea is that (an) converges to the number α if eventually it is as "close" to α as desired. That is to say, given some preassigned tolerance ε, no matter how small, we demand that eventually (an) is close to within ε of α. In other words, the distance between an and α (as points on the real line) is eventually smaller than ε.

Definition 3.3. We say that the sequence (an)n∈N of real numbers converges to the real number α if for any given ε > 0, there is some natural number N ∈ N such that |an − α| < ε whenever n > N. α is called the limit of the sequence. In such a situation, we write an → α as n → ∞ or alternatively lim_{n→∞} an = α. The use of the symbol ∞ is just as part of a phrase and it has no meaning in isolation. There is no real number ∞.

Remark 3.4. The positive number ε is the assigned tolerance demanded. Typically, the smaller ε is, the larger we should expect N to have to be. For example, consider the sequence (an) where an = 1/n. We would expect that an → 0 as n → ∞. To see this, let ε > 0 be given. (We are not able to choose this. It is given to us and its actual value is beyond our control.) It will be true that |an − 0| < ε provided n > 1/ε. So after some contemplation, we proceed as follows. We are unable to influence the choice of ε given to us, but once it is given then we can (and must) base our tactics on it. So let N be any natural number larger than 1/ε.
If n > N , then n > N > 1/ε and so 1/n < ε. That is, if n > N , then |an − 0| = 1/n < ε and so, according to our definition, we have shown that an → 0 as n → ∞. Notice that the smaller ε is, the larger N has to be. Note that the statement if n > N then |an − α| < ε can also be written as |an − α| < ε whenever n > N or also as n > N =⇒ |an − α| < ε. Also, we should note that the inequality |an − α| < ε telling that the distance between the real numbers an and α is less than ε can also be expressed by the pair of inequalities −ε < an − α < ε or equivalently by the pair α − ε < an < α + ε. This simply means that an lies on the real line somewhere between the two values α − ε and α + ε. This must happen eventually if the sequence is to be convergent (to α). King’s College London 34 Chapter 3 an lies in here z }| { ( α−ε α ) α+ε R Figure 3.8: The value of an lies within ε of α. Example 3.5. Let (an )n∈N be the sequence with an = 2n + 5 n for n ∈ N. Does (an ) converge? Looking at the first few terms, we find 13 15 17 19 205 (an ) = (7, 29 , 11 3 , 4 , 5 , 6 , 7 , . . . , 100 , . . . ). It seems that an → 2 as n → ∞, but we must prove it. Let ε > 0 be given. We have to show that eventually (an ) is within ε of 2. We have |an − 2| = |(2n + 5)/n − 2| = 5/n. Now, the inequality 5/n < ε is the same as n > 5/ε. Let N be any natural number which obeys N > 5/ε. Then if n > N , we have 5 n>N > ε and so 5/n < ε. This means that if n > N then |an − 2| < ε and we have succeeded in proving that an → 2 as n → ∞. Example 3.6. Let (an )n∈N be the sequence an = 1/n2 . We shall show that an → 0 as n → ∞. Let ε > 0 be given. We wish to show that there is N ∈ N such that if n > N then |an − 0| < ε, that is, |an − 0| = 1/n2 < ε. Now, 1 1 1 ⇐⇒ n > √ < ε ⇐⇒ n2 > n2 ε ε √ so take N ∈ N to be any natural number satisfying N > 1/ ε. Then if √ n > N , it follows that n > 1/ ε and so n2 > 1/ε which in turn implies that 1/n2 < ε and the proof is complete. Alternatively, we note that 1/n2 ≤ 1/n and so if 1/n < ε then it follows that 1/n2 ≤ 1/n < ε. So let N ∈ N be any natural number such that N > 1/ε. Then 1/N < ε and so if n > N we have 1 1 1 1 1 < < ε =⇒ 2 ≤ < < ε. n N n n N Department of Mathematics 35 Sequences Example 3.7. Let (an )n∈N be the sequence an = that an → 0 as n → ∞. 4 1 + √ . We shall show n3 n Let ε > 0 be given. We must show that |an − 0| = 1 4 +√ <ε 3 n n whenever n is large enough. To see this, we note that 4 1 4 1 4 1 5 +√ ≤ +√ ≤√ +√ =√ . n3 n n n n n n If the right √hand side is less than ε, then so is the left hand side. Let N ∈ N satisfy 5/ N < ε, that is, 25/N < ε2 or N > 25/ε2 . Then if n > N , we may say that 25/n < 25/N < ε2 and so 4 1 5 5 +√ ≤ √ < √ <ε 3 n n n N that is, |an − 0| < ε whenever n > N . Example 3.8. Let |x| < 1 and for n ∈ N, let an = xn . Does (an ) converge? Since |x| < 1, |xn | = |x|n gets smaller and smaller as n increases, so we might guess that xn → 0 as n → ∞. Let ε > 0 be given. We must show that eventually |xn − 0| < ε which is the same as showing that eventually |x|n < ε. Set d = |x|. Then we wish to show that eventually dn < ε. Notice that d ≥ 0 and so we no longer have to worry about whether x is positive or negative. We have transferred the problem from one about x to one about d. Consider first the case x = 0. Then also d = 0 and dn = 0 for all n. In particular, if we go through the motions by choosing N = 1, then certainly dn < ε whenever n > N (because dn = 0), which tells us (trivially) that eventually dn < ε and so therefore xn → 0 as n → ∞. 
Now suppose that x 6= 0. Then 0 < |x| < 1, so that 0 < d < 1. Define t by d = 1/(1 + t), that is t = (1 − d)/d. Then t > 0. By the binomial theorem, we have µ ¶ n 2 n (1 + t) = 1 + nt + t + · · · + tn > nt 2 for any n ∈ N. Hence dn = 1 1 < . n (1 + t) nt King’s College London 36 Chapter 3 We shall use this to estimate dn . If the right hand side is less than ε, then so is the left hand side. To carry this through, let N be any natural number obeying N > 1/εt. Then this means that 1/N t < ε. For any n > N , we therefore have the inequality 1/n < 1/N and (since t > 0) we also have dn < 1 1 < < ε. nt Nt In other words, we have shown that eventually dn is less than ε. In terms of x and an , we have |an − 0| = |x|n = 1 1 1 < < <ε (1 + t)n nt Nt whenever n > N . Hence if |x| < 1 then xn → 0 as n → ∞. Is it possible for a sequence to converge to two different limits? To convince ourselves that this is not possible, suppose the contrary. That is, suppose that (an ) is some sequence which has the property that it converges both to α and β, say, with α 6= β. Let ε > 0 be given. Then by definition of convergence, (an ) is eventually within distance ε of α and also (an ) is eventually within distance ε of β. eventually in here z }| { eventually in here z }| { ( ) α−ε α α+ε ( ) β−ε β β+ε R Figure 3.9: The sequence (an ) is eventually within ε of both α and β. As one can see from the figure, if ε is small enough, then the two intervals (α − ε, α + ε) and (β − ε, β + ε) will not overlap and it will not be possible for any terms of the sequence (an ) to belong to both of these intervals simultaneously. We can turn this into a rigorous argument as follows. Theorem 3.9. Suppose that (an )n∈N is a sequence such that an → α and also an → β as n → ∞. Then α = β, that is, a convergent sequence has a unique limit. Proof. Let ε > 0 be given. Since we know that an → α, then we are assured that eventually (an ) is within ε of α. Thus, there is some N1 ∈ N such that if n > N1 then the distance between an and α is less than ε, i.e., if n > N1 then |an − α| < ε. Similarly, we know that an → β as n → ∞ and so eventually (an ) is within ε of β. Thus, there is some N2 ∈ N such that if n > N2 then the distance between an and β is less than ε, i.e., if n > N2 then |an − β| < ε. Department of Mathematics 37 Sequences So far so good. What next? To get both of these happening simultaneously, we let N = max{ N1 , N2 }. Then n > N means that both n > N1 and also n > N2 . Hence we can say that if n > N then both |an − α| < ε and also |an − β| < ε. Now what? We expand out these sets of inequalities. Pick and fix any n > N (for example n = N + 1 would do). Then α − ε < an < α + ε β − ε < an < β + ε. The left hand side of the first pair together with the right hand side of the second pair of inequalities gives α − ε < an < β + ε and so α − ε < β + ε. Similarly, the left hand side of the second pair together with the right hand side of the first pair of inequalities gives β − ε < an < α + ε and so β − ε < α + ε. Combining these we see that −2ε < α − β < 2ε which is to say that |α − β| < 2ε. This happens for any given ε > 0 and so the non-negative number |α − β| must actually be zero. But this means that α = β and the proof is complete. Definition 3.10. We say that the sequence (an )n∈N is bounded from above if there is some M ∈ R such that an ≤ M for all n ∈ N. The sequence (an )n∈N is said to be bounded from below if there is some m ∈ R such that m ≤ an for all n ∈ N. 
If (an ) is bounded both from above and from below, then we say that (an ) is bounded. Examples 3.11. 1. Let an = n + (−1)n n for n ∈ N. Then we see that (an ) is the sequence given by (an ) = (0, 4, 0, 8, 0, 12, 0, . . . ). Evidently (an ) is bounded from below (in fact, an ≥ 0) but (an ) is not bounded from above. (There is no M for which an ≤ M holds for all n. Indeed, for any fixed M whatsoever, if n is any even natural number greater than M , then an = 2n > n > M .) King’s College London 38 Chapter 3 2. Let an = 1/n, n ∈ N. It is clear that an obeys 0 ≤ an ≤ 2 for all n and so (an ) is bounded both from above and from below, that is, (an ) is bounded. Proposition 3.12. The sequence (an ) is bounded if and only if there is some K ≥ 0 such that |an | ≤ K for all n. Proof. Suppose first that (an ) is bounded. Then there is m and M such that m ≤ an ≤ M for all n. We do not know whether m or M are positive or negative. However, we can introduce |m| and |M | as follows. For any x ∈ R, it is true that − |x| ≤ x ≤ |x|. Applying this to m and M in the above inequalities, we see that − |m| ≤ m ≤ an ≤ M ≤ |M | . Let K = max{ |m| , |M | }. Then clearly, −K ≤ − |m| ≤ m ≤ an ≤ M ≤ |M | ≤ K which gives the inequalities −K ≤ an ≤ K so that |an | ≤ K, for all n, as required. For the converse, suppose that there is K ≥ 0 so that |an | ≤ K for all n. Then this can be expressed as −K ≤ an ≤ K for all n and therefore (an ) is bounded (taking m = −K and M = K in the definition). Theorem 3.13. If a sequence converges then it is bounded. Proof. Suppose that (an ) is a convergent sequence, an → α, say, as n → ∞. Then, in particular, (an ) is eventually within distance 1, say, of α. This means that there is some N ∈ N such that if n > N then the distance between an and α is less than 1, i.e., if n > N then |an − α| < 1. We can rewrite this as −1 ≤ an − α ≤ 1 or α − 1 ≤ an ≤ α + 1 whenever n > N . This tells us that the tail (an for n > N ) of the sequence is bounded but what about the whole sequence? This is now easy — we know Department of Mathematics 39 Sequences about an when n > N so we only still need to take into account the beginning of the sequence up to the N th term, that is, the terms a1 , a2 , . . . , aN . Let M = max{ a1 , a2 , . . . , aN , α + 1 } and let m = min{ a1 , a2 , . . . , aN , α − 1 }. Then certainly α + 1 ≤ M and m ≤ α − 1. Hence if n > N , then m ≤ an ≤ M. But by construction of m and M , we also have the inequalities m ≤ an ≤ M for any 1 ≤ n ≤ N . Piecing together these two parts of the argument, we conclude that m ≤ an ≤ M for any n and we have shown that (an ) is bounded, as required. Remark 3.14. The converse of this is false. For example, let (an ) be the sequence with an = (−1)n . Then (an ) = (−1, 1, −1, 1, −1, . . . ) which is bounded (for example, −1 ≤ an ≤ 1 for all n) but does not converge. Definition 3.15. A sequence (an ) of real numbers is said to be (i) increasing if an+1 ≥ an for all n; (ii) strictly increasing if an+1 > an for all n; (iii) decreasing if an+1 ≤ an for all n; (iv) strictly decreasing if an+1 < an for all n. A sequence satisfying any of these conditions is said to be monotonic or monotone. It is strictly monotonic if it satisfies either (ii) or (iv). One reason for an interest in monotonic sequences is the following. Theorem 3.16. If (an ) is an increasing sequence of real numbers and is bounded from above, then it converges. Proof. Suppose then that an ≤ an+1 and that an ≤ M for all n. Let K = lub{ an : n ∈ N }, so that K is well-defined with K ≤ M . 
We claim that an → K as n → ∞. Let ε > 0 be given. We must show that eventually (an ) is within distance ε of K. Now, K is an upper bound for { an : n ∈ N } and so an ≤ K for all n. It is enough then to show that K − ε < an eventually. However, this is true for the following reason. K − ε < K and K is the least upper bound of { an : n ∈ N } and so K − ε is not an upper bound for { an : n ∈ N }. This means that there is some aj , say, with aj > K − ε. But the sequence (an ) is increasing and so an ≥ aj for all n > j. Hence an > K − ε for all n > j. We have shown that K − ε < an ≤ K < K + ε for all n > j. This means that eventually |an − K| < ε and so the proof is complete. King’s College London 40 Chapter 3 Remark 3.17. Note that in the course of the proof of the above result, we have not only shown that (an ) converges but we have actually established what the limit is — it is the least upper bound of the set of real numbers { an : n ∈ N }. Of course, this does not necessarily provide us with the numerical value of the limit. It is also worth noting that from this result and the fact that a convergent sequence is bounded, we can say that an increasing sequence converges if and only if it is bounded. The sequence (an ) with an = n is clearly increasing. It is not bounded and so we can say immediately that it does not converge (which is no surprise, in this case). Corollary 3.18. Any sequence which is decreasing and bounded from below must converge. Proof. Suppose that (bn ) is a sequence which is decreasing and bounded from below. Then bn+1 ≤ bn for all n and there is some k such that bn ≥ k for all n. Set an = −bn and K = −k. Then these inequalities become an ≤ an+1 and an ≤ K for all n, that is, (an ) is increasing and is also bounded from above. By the theorem, we deduce that (an ) converges. Denote its limit by α and let β = −α. We will show that bn → β as n → ∞ (as one might well expect). Let ε > 0 be given. Then there is some N ∈ N such that if n > N then |an − α| < ε. In terms of bn and β, the left hand side becomes |−bn + β| which is equal to |bn − β| and so we have established that |bn − β| < ε whenever n > N , which completes the proof. Example 3.19. Let (an ) be the sequence given by a1 = 1, a2 = 1 + 1, a3 = 1 + 1 + 2!1 , a4 = 1 + 1 + 2!1 + 3!1 , ... an = 1 + 1 + 2!1 + · · · + 1 (n−1)! , ... 1 This can be written more succinctly as a1 = 1 and an = an−1 + (n−1)! for n ≥ 2. Does (an ) converge? It is clear that an+1 > an and so (an ) is increasing (in fact, strictly increasing). If we can show that it is also bounded then we conclude that it must converge. Can we find K such that Department of Mathematics 41 Sequences an ≤ K for all n? We have a1 = 1 and for any n ≥ 1 1 1 1 1 + + + ··· + 2 2.3 2.3.4 2.3 . . . n 1 1 1 1 ≤ 1 + 1 + + 2 + 3 + · · · + n−1 2 ¡ 2 1 n2¢ 2 1 − (2) =1+ , summing the GP, (1 − 21 ) an+1 = 1 + 1 + = 1 + 2(1 − ( 12 )n ) < 1 + 2 = 3. Hence the increasing sequence (an ) is bounded above, by 3. We conclude that (an ) converges. Because it is increasing, we know that its limit is equal to lub{ an : n ∈ N } = α, say. But an obeys an ≤ 3 and so 3 is an upper bound for { an : n ∈ N } and therefore lub{ an : n ∈ N } ≤ 3, that is, α ≤ 3. Of course, α = lub{ an : n ∈ N } ≥ ak for any particular k. Taking k = 3, we get that α ≥ a3 > 2 and so we can say that 2 < α ≤ 3. In fact, α is just e (and e = 2.71828 . . . ). If an → α and bn → β, then we might expect it to be the case that an + bn → α + β. 
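Example 3.19 can also be explored numerically. The short Python sketch below is an illustration only, not part of the original notes: it computes the first twenty terms exactly using rational arithmetic and confirms that they are strictly increasing, bounded above by 3, and already very close to e.

from fractions import Fraction
from math import factorial

# Example 3.19: a_1 = 1 and a_n = a_(n-1) + 1/(n-1)!, so a_n = 1 + 1 + 1/2! + ... + 1/(n-1)!.
partial = []
s = Fraction(0)
for n in range(1, 21):
    s += Fraction(1, factorial(n - 1))
    partial.append(s)

print(all(x < y for x, y in zip(partial, partial[1:])))   # True: strictly increasing
print(all(x < 3 for x in partial))                        # True: bounded above by 3
print(float(partial[-1]))                                 # 2.718281828459045..., close to e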
After all, if (an ) is eventually close to α and (bn ) is eventually close to β, then it seems quite reasonable to guess that (an + bn ) is eventually close to α + β. This is true, but we must take care with the details. Theorem 3.20. Suppose that (an ) and (bn ) are sequences in R. (i) If an → α as n → ∞, then λ an → λ α as n → ∞, for any λ ∈ R. (ii) If an → α as n → ∞ and bn → β as n → ∞, then an + bn → α + β as n → ∞ and also an bn → αβ as n → ∞. (iii) If an → α as n → ∞ and if bn → β as n → ∞ and if bn 6= 0 for all n and if β 6= 0, then an /bn → α/β as n → ∞. Proof. (i) Fix λ ∈ R. Let ε > 0 be given. We must show that |λan − λα| < ε eventually. If λ = 0, then λan = 0 for all n and so it is clear that in this case λan = 0 → 0 = λα as n → ∞. So now suppose that λ 6= 0. Let ε0 > 0. (We will specify ε0 in a moment.) Then since we know that an → α, it follows that there is some N ∈ N such that n > N implies that |an − α| < ε0 . Now, |λan − λα| = |λ| |an − α| < |λ| ε0 King’s College London 42 Chapter 3 whenever n > N . If we choose ε0 = ε/ |λ| then see that |λan − λα| < |λ| ε0 = ε whenever n > N . Hence λan → λα, as required. (ii) Let ε > 0 be given. Suppose ε0 > 0. We will specify the value of ε0 in a moment. There is N1 ∈ N such that n > N1 implies that |an − α| < ε0 . Also, there is N2 ∈ N such that n > N2 implies that |bn − β| < ε0 . Set N = max{ N1 , N2 }. Then if n > N , we see that |an + bn − (α + β)| = |an − α + bn − β| ≤ |an − α| + |bn − β| < ε0 + ε0 = 2ε0 . Setting ε0 = 1 2 ε, it follows that if n > N , then |an + bn − (α + β)| < 2 ε0 = ε, that is, an + bn → α + β as n → ∞. To show that an bn → α β, consider first the case xn → 0 and yn → 0. We shall show that xn yn → 0. Let ε > 0 be given. √ Then we know that there is N1 ∈ N such that if n > N1 then |xn | < ε. √ Similarly, we know that there is N2 ∈ N such that if n > N2 then |yn | < ε. Let N = N1 + N2 . Then if n > N , it follows that |xn yn | < √ √ ε ε=ε that is, xn yn → 0 as n → ∞. Now, in the general case, we simply use previous results to note that an bn = (an − α)(bn − β) + αbn + an β − αβ → 0 + αβ + αβ − αβ = αβ as required. (iii) Now suppose that an → α, bn → β and suppose that bn 6= 0 for all n and that β 6= 0. Let γn = 1/bn and let γ = 1/β. Then an /bn = an γn . To show that an /bn → α/β, we shall show that γn → γ as n → ∞. The desired conclusion will then follow from the second part of (ii), above. We have |γn − γ| = |1/bn − 1/β| |β − bn | . = |bn β| Department of Mathematics 43 Sequences For large enough n, the numerator is small and the denominator is close to |β|2 , so we might hope that the whole expression is small. (Note that it is imperative here that β 6= 0.) We shall show that 1/ |bn | is bounded from above. Indeed, |β| > 0 and so, taking 12 |β| as our “ε”, we can say that there is some N 0 such that n > N 0 implies that |bn − β| < 1 2 |β| . Hence, if n > N 0 , we have |β| = |β − bn + bn | ≤ |β − bn | + |bn | < 1 2 |β| + |bn | and so 12 |β| < |bn |. If we set K = min{ |b1 | , |b2 | , . . . , |bN 0 | , 21 β }, then it is true that K > 0 and |bn | ≥ K for all n. Hence 1/ |bn | ≤ 1/K for all n. Let ε > 0 be given. Let ε0 = ε K |β|. Since bn → β, there is N such that n > N implies that |bn − β| < ε0 . But then, for any n > N , we have |γn − γ| = ε0 |β − bn | < =ε |bn | |β| K |β| and the proof is complete. Examples 3.21. 1. Taking an = 1/n, it follows that λ/n → 0 as n → ∞ for any λ ∈ R. 2. Suppose that an → α as n → ∞. Then it follows immediately that an − α → 0 as n → ∞. 
Indeed, for any given ε > 0, there is some N ∈ N such that n > N implies that |an − α| < ε. But |an − α| = |(an − α) − 0|, so to say that an → 0 is just to say that an − α → 0 as n → ∞. 3. With an = bn , we see that if an → α, then a2n → α2 as n → ∞. Now with bn = a2n , it follows that a3n → α3 as n → ∞. Repeating this (i.e., by induction), we see that if an → α as n → ∞, then akn → αk as n → ∞ for any given k ∈ N. (3 − 4/n2 ) 3n2 − 4 for n ∈ N. We can rewrite a as a = . n n 2n2 + 1 (2 + 1/n2 ) Then we note that −4/n2 → 0 and 1/n2 → 0, so that 3 − 4/n2 → 3 and (3 − 4/n2 ) 2 + 1/n2 → 1 as n → ∞. Finally, it follows that an = → 3/2 (2 + 1/n2 ) as n → ∞. 4. Let an = King’s College London 44 Chapter 3 7n3 − 5n2 + 3n − 9 . The first thing we do is to divide through 3n3 + 4n2 − 8n + 2 by the highest power of n occurring in the numerator or denominator, i.e., in this case, by n3 . So, an can be rewritten as 5. Let an = an = (7 − 5/n + 3/n2 − 9/n3 ) . (3 + 4/n − 8/n2 + 2/n3 ) Then we see that the numerator converges to 7 and the denominator converges to 3 as n → ∞. Hence an → 7/3 as n → ∞. n4 − 8 (1/n3 − 8/n7 ) . Then we have a = and so it follows n n7 + 3 (1 + 3/n7 ) that an → 0/1 = 0 as n → ∞. 6. Let an = 2n5 + 4 (2 + 4/n5 ) . Then a = . Now, the numerator n n3 + 6 (1/n2 + 6/n5 ) converges to 2 whilst the denominator converges to 0 as n → ∞. The above theorem about the convergence of an /bn says nothing about the case when bn or β = lim bn are zero. In this example, we back up and note that, by inspection, we have 7. Let an = an = 2n5 2n5 2n5 + 4 2n2 > ≥ . = n3 + 6 n3 + 6 n3 + 6n3 7 It follows that an is not bounded from above and so cannot converge. 8. Suppose that |x| < 1 and consider the sequence an = xn , for n ∈ N. Then the sequence bn = |an | = |x|n is monotone decreasing and is bounded below (by 0) and so therefore it converges, to `, say: bn → ` as n → ∞. Hence the sequence (b2n ) also converges to `. However, ¯ ¯ b2n = |a2n | = ¯x2n ¯ = |x|n |x|n = bn bn → `2 and so we see that ` = `2 . Therefore either ` = 0 or else ` = 1. The value ` = 1 is not possible because (bn ) converges to its greatest lower bound and the value 1 is not a lower bound. Hence ` = 0 and we conclude that |an | → 0 as n → ∞. Let ε > 0 be given. Then there is some N such that n > N implies that ||an | − 0| = |an | = |an − 0| < ε which shows that xn = an → 0 as n → ∞. The next result is very useful. Department of Mathematics 45 Sequences Proposition 3.22. Suppose that (cn ) is a sequence in R with cn ≥ 0 for all n ∈ N and such that cn → γ as n → ∞. Then γ ≥ 0. In other words, the limit of a convergent positive sequence is positive. (Note that we are using the term positive to mean not strictly negative, so that the value zero is allowed.) Proof. Exactly one of γ < 0, γ = 0 or γ > 0 is true. We wish to show that the first is impossible. To do this, suppose the contrary, that is, suppose that γ < 0. We will obtain a contradiction from this. Let ε = −γ. Then according to our hypothesis, ε > 0. We know that cn → γ as n → ∞ and so we can say that there is some N in N such that n > N implies that |cn − γ| < ε. How can we use this? Fix any n > N , for example we could take n = N + 1. The inequality |cn − γ| < ε is equivalent to the pair of inequalities −ε < cn − γ < ε . Recalling that ε = −γ, we find that cn − γ < ε = −γ . This tells us that cn < 0 which is false. We have obtained our contradiction and so we can conclude that, as claimed, it is true that γ ≥ 0. 
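The limits found in Examples 3.21.4 and 3.21.5 are easy to check numerically. The following Python sketch is illustrative only and not part of the original notes; it evaluates both sequences at a few large indices and measures their distance from the limits 3/2 and 7/3 obtained above.

def a(n):
    # Example 3.21.4: (3n^2 - 4)/(2n^2 + 1) -> 3/2
    return (3 * n**2 - 4) / (2 * n**2 + 1)

def b(n):
    # Example 3.21.5: (7n^3 - 5n^2 + 3n - 9)/(3n^3 + 4n^2 - 8n + 2) -> 7/3
    return (7 * n**3 - 5 * n**2 + 3 * n - 9) / (3 * n**3 + 4 * n**2 - 8 * n + 2)

for n in (10, 1000, 100000):
    print(n, abs(a(n) - 3 / 2), abs(b(n) - 7 / 3))
# The differences shrink towards 0 as n grows, consistent with the limits 3/2 and 7/3.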
It is natural to ask whether strict positivity of every cn implies that of the limit γ, that is, if cn > 0 for all n, can we deduce that necessarily γ > 0? The answer is no. To show this, we just need to exhibit an explicit example. Such an example is provided by the sequence cn = 1/n. It is true here that cn = 1/n > 0 for every n. The sequence (cn ) converges, but its limit is γ = 0. So we have cn > 0 for all n, cn → γ as n → ∞ but γ = 0. The following theorem provides a useful technique for exhibiting convergence of a sequence even under circumstances where we do not know explicitly the values of the terms of the sequence. Theorem 3.23 (Sandwich Principle). Suppose that (an ), (bn ) and (xn ) are sequences in R such that (i) an ≤ xn ≤ bn for all n ∈ N and (ii) both an → µ and bn → µ as n → ∞. Then (xn ) converges and its limit is µ. Proof. Let ε > 0 be given. King’s College London 46 Chapter 3 The inequalities an ≤ xn ≤ bn can be rewritten as 0 ≤ xn − an ≤ bn − an . | {z } | {z } yn zn Since both (an ) and (bn ) converge to µ, it follows that zn = bn − an → µ − µ = 0 as n → ∞. Hence there is some N in N such that n > N implies that |zn | < ε. But since yn = xn − an ≥ 0, we have |yn | = yn and so n > N implies that |yn | = yn ≤ zn = |zn | < ε which means that yn → 0 as n → ∞. To finish the proof, we observe that xn = yn + an → 0 + µ as n → ∞ and we are done. We illustrate this with a proof that any real number can be approximated by rationals. Theorem 3.24. Any real number is the limit of some sequence of rational numbers. Proof. Let a be any given real number. For each n ∈ N, we know that there is a rational number qn , say, lying between the numbers a and a + 1/n. That is, qn satisfies a ≤ qn ≤ a + n1 . Since 1/n → 0, an application of the Sandwich Principle tells us immediately that qn → a as n → ∞, as required. Note that a similar proof shows that any real number is the limit of a sequence of irrational numbers (just replace the adjective “rational” by “irrational”.) The point though is that even though one might think of the irrational numbers as somewhat weird, they can nevertheless be approximated as closely as desired by rational numbers. Subsequences Consider the sequence (an ) given by an = sin( 12 nπ) for n ∈ N. Evidently, an = 0 if n is even and alternates between ±1 for n odd. For example, the first 5 terms are a1 = 1, a2 = 0, a3 = −1, a4 = 0, a5 = 1. Next, consider the sequence (bn ) = (1, 23 , 13 , 54 , 51 , . . . ) . This is given by ( bn = 1 n, n n+1 for n odd , for n even. We notice that the odd terms approach 0 whereas the even terms approach 1. Department of Mathematics 47 Sequences These two examples suggest that we might well be interested in considering certain terms of a sequence in isolation from the original sequence. This idea is formalized in the concept of a subsequence of a sequence. Roughly speaking, a subsequence of a sequence is simply any sequence obtained by leaving out particular terms from the original sequence. For example, the even terms a2 , a4 , a6 , . . . form a subsequence of the sequence (an ). Another subsequence of (an ) is obtained by considering, say, every tenth term, a10 , a20 , a30 , . . . . Definition 3.25. Let (an ) be a given sequence. A subsequence of (an )n∈N is any sequence of the form (an1 , an2 , an3 , . . . ) where n1 < n2 < n3 < . . . is any (strictly increasing) sequence of natural numbers. We can express this somewhat more formally as follows. 
A sequence (bk )k∈N is a subsequence of the sequence (an )n∈N if there is some mapping ϕ : N → N such that i < j implies that ϕ(i) < ϕ(j) (i.e., ϕ is strictly increasing) and such that bk = aϕ(k) for each k ∈ N. This agrees with the above formulation if we simply set ϕ(k) = nk and put bk = aϕ(k) = ank . (It really just amounts to a matter of notation.) Of course, (bk )k∈N is a sequence in its own right and so one can consider subsequences of (bk ). Evidently, a subsequence of (bk ) is also a subsequence of (an )n∈N . This is intuitively clear. We get a subsequence of (bk ) by leaving out some of its terms. However, (bk ) itself was obtained from (an ) by leaving out various terms of (an ), so if we leave out both lots in one step, we get our subsequence of (bk ) directly from (an ). To see this more formally, suppose that (cj )j∈N is a subsequence of (bk )k∈N . Then there is a strictly increasing map ψ : N → N such that cj = bψ(j) for all j ∈ N. However, since (bk ) is a subsequence of (an ), there is a strictly increasing map ϕ : N → N such that bk = aϕ(k) for all k ∈ N. This means that we can write cj as cj = bψ(j) = aϕ(ψ(j)) for j ∈ N. Let γ : N → N be the map γ(j) = ϕ(ψ(j)). Evidently γ is strictly increasing and cj = aγ(j) for j ∈ N. This shows that (cj )j∈N is a subsequence of (an )n∈N . Remark 3.26. Let (anj ) be a subsequence of (an ). It is intuitively clear that, say, the 20th term of (anj ) has to be at least the 20th term of (an ). In general, the term anj is at least as far along the (an ) sequence as the j th or, in other words, nj ≥ j. We will verify this by induction. For j ∈ N, let P (j) be the statement “ nj ≥ j ”. Now, nj ∈ N and so, in particular, n1 ≥ 1, which means that P (1) is true. Fix j ∈ N and suppose that P (j) is true. We will show that this implies that P (j + 1) is also true. Indeed, nj is strictly increasing in j and so we have nj+1 > nj ≥ j , by the induction hypothesis that P (j) is true. King’s College London 48 Chapter 3 Since all quantities under consideration are integer-valued, we deduce that nj+1 ≥ j + 1, i.e., P (j + 1) is true. It follows, by induction, that P (j) is true for all j ∈ N. Proposition 3.27. Suppose that (an ) converges to α. Then so does every subsequence of (an ). Proof. Let (ank )k∈N be any subsequence of (an ) whatsoever. We wish to show that ank → α as n → ∞. Let ε > 0 be given. Now, we know that an → α as n → ∞. Therefore, we are assured that there is some N ∈ N such that n > N implies that |an − α| < ε. But (ank ) is a subsequence of (an ) and so we know that nk ≥ k for all k ∈ N. It follows that if k > N , then certainly nk > N . Hence, k > N implies that |ank − α| < ε and the proof is complete. Remark 3.28. The proposition tells us that if (an ) converges, then so does any subsequence, and to the same limit. Consider the sequence an = (−1)n . Then we see that a2n = 1 for all n, whereas a2n−1 = −1 for all n, so that (an ) certainly possesses two subsequences which both converge but to different limits. Consequently, the original sequence cannot possibly converge. (If it did, every subsequence would have to converge to the same limit, namely, the limit of the original sequence.) Bolzano-Weierstrass Theorem Before we launch into one of the most important results of real analysis, let us make one or two observations regarding upper and lower bounds. Suppose that A and B are subsets of R with A ⊆ B. If M is such that b ≤ M for all b ∈ B, then certainly, in particular, a ≤ M for all a ∈ A. 
In other words, an upper bound for B is also (a fortiori) an upper bound for any subset A of B. Now, lub B is an upper bound for B and so lub B is certainly an upper bound for A. It follows that lub A ≤ lub B . It is possible for the inequality here to be strict. For example, if A is the interval A = [1, 2] and B is the interval B = [0, 3], then A ⊂ B and evidently lub A = 2 whereas lub B = 3, so that lub A < lub B in this case. Similarly, we note that if m is a lower bound for B, then m is also a lower bound for A and so glb B ≤ glb A . With the example A = [1, 2] and B = [0, 3], as above, we see that glb B = 0 and glb A = 1. Department of Mathematics 49 Sequences Theorem 3.29 (Bolzano-Weierstrass Theorem). Any bounded sequence of real numbers possesses a convergent subsequence. (In other words, if (an )n∈N is a bounded sequence in R, then there is a strictly increasing sequence (nk )k∈N of natural numbers such that (ank )k∈N converges.) Proof. Suppose that M and m are upper and lower bounds for (an ), m ≤ an ≤ M (∗) The idea of the proof is to construct a certain bounded monotone decreasing sequence and use the fact that this converges to its greatest lower bound and to drag a suitable subsequence of (an ) along with this. We construct the first element of the auxiliary monotone sequence. Let M1 = lub{ an : n ∈ N }. Then M1 −1 is not an upper bound for { an : n ∈ N } and so there must be some n1 , say, in N such that M1 − 1 < an1 ≤ M1 . (The value 1 subtracted here (from M1 ) is not important. We could have chosen any positive number. However, we shall repeat this process and we require a sequence of positive numbers which converge to 0. The numbers 1, 12 , 13 , . . . suit our purpose.) We note that M1 is an upper bound for (an ) and so, in particular, it is an upper bound for the set { an : n > n1 }. Next, we construct M2 as follows. Let M2 = lub{ an : n > n1 }. Then M2 ≤ M1 and moreover, M2 − 12 is not an upper bound for { an : n > n1 } and so there is some n2 > n1 such that M2 − 1 2 < an2 ≤ M2 . The way ahead is now clear. Let M3 = lub{ an : n > n2 }. Then M3 ≤ M2 and since M3 − 13 is not an upper bound for { an : n > n2 } there must be some n3 > n2 such that M3 − 1 3 < an3 ≤ M3 . Continuing in this way, we construct a sequence (Mj )j∈N and a sequence (nj )j∈N of natural numbers such that Mj+1 ≤ Mj , nj+1 > nj , and Mj − 1 j < anj ≤ Mj for all j ∈ N. Now we note that m ≤ anj ≤ Mj and so (Mj ) is a decreasing sequence which is bounded from below. It follows that Mj → µ as j → ∞, where µ = glb{ Mj : j ∈ N }. We are not interested in the value of this limit µ. King’s College London 50 Chapter 3 All we need to know is that the sequence (Mj )j∈N converges to something. However, by our very construction, Mj − 1 j < anj ≤ Mj and so, by the Sandwich Principle, Theorem 3.23, anj → µ as j → ∞. We have succeeded in exhibiting a convergent subsequence, namely the subsequence (anj )j∈N and the proof is complete. Remark 3.30. Note that the theorem does not tell us anything about the subsequence or its limit. Indeed, it cannot, because we know nothing about our original sequence other than the fact that it is bounded. It can also happen that there are many convergent subsequences with different limits. It is easy to construct such examples. For example, let (un ), (vn ) and (wn ) be any three given convergent sequences, say, un → u, vn → v and wn → w. We construct the sequence (an ) as follows: (a1 , a2 , a3 , a4 , . . . , ) = (u1 , v1 , w1 , u2 , v2 , w2 , u3 , . . . ) . 
In other words, the three sequences (un ), (vn ) and (wn ) are dovetailed to form (an ). Explicitly, for n ∈ N, uk , if n = 3k − 2 for some k ∈ N, an = vk , if n = 3k − 1 for some k ∈ N, wk , if n = 3k for some k ∈ N. Evidently, if u, v and w are different, then the sequences (a3j−2 )j∈N = (uj )j∈N , (a3j−1 )j∈N = (vj )j∈N and (a3j )j∈N = (wj )j∈N are three convergent subsequences of (an )n∈N with different limits. Let us say that a real number µ is a limit point of a given sequence if it is the limit of some convergent subsequence. Then in this terminology, the real numbers u, v and w are limit points of the sequence (an ). Next, we need a little more terminology. Definition 3.31. A sequence (an )n∈N is said to be a Cauchy sequence (also known as a fundamental sequence) if it has the property that for any given ε > 0 there is some N ∈ N such that both n > N and m > N imply that |an − am | < ε . In other words, eventually the distance between any two terms of the sequence is less than ε. Department of Mathematics 51 Sequences Proposition 3.32. Every Cauchy sequence is bounded. Proof. Suppose that (an ) is a Cauchy sequence. Then we know that there is some N ∈ N such that both n > N and m > N imply that |an − am | < 1 . (The value 1 on the right hand side here is not at all critical. We could have selected any positive real number instead, with obvious modifications to the following reasoning.) In particular, for any j > N , |aj | ≤ |aj − aN +1 | + |aN +1 | < 1 + |aN +1 | . It follows that if we let M = 1 + max{ |ai | : 1 ≤ i ≤ N + 1 }, then we have |ak | ≤ M for all k ∈ N. This shows that (an ) is bounded. We have seen that a bounded monotone sequence must converge. The next theorem is very important as it gives us a necessary and sufficient condition for convergence of a sequence. Theorem 3.33. sequence. A sequence converges in R if and only if it is a Cauchy Proof. We must show that any Cauchy sequence has to converge and, conversely, that any convergent sequence is a Cauchy sequence. So suppose that (an )n∈N is a Cauchy sequence. We must show that there is some α such that an → α as n → ∞. At first, this might seem impossible because there is no way of knowing what α might be. However, it turns out that we do not need to know the actual value of α but rather just that it does exist. Indeed, we have seen that a Cauchy sequence is bounded and the Bolzano-Weierstrass Theorem tells us that a bounded sequence possesses a convergent subsequence. We shall show that this is enough to guarantee that the sequence itself converges. Let ε > 0 be given. As noted above, we know that (an ) has some convergent subsequence, say ank → α as k → ∞. We shall show that an → α by an ε/2-argument. Since we know that ank → α as k → ∞, we can say that there is k0 ∈ N such that k > k0 implies that |ank − α| < 21 ε . Since (an ) is a Cauchy sequence, there is N0 such that both n > N0 and m > N0 imply that |an − am | < 21 ε . King’s College London 52 Chapter 3 Let N = max{ k0 , N0 }. Now, if k > N it follows that also nk > N (since nk ≥ k) and so if k > N then |ak − α| ≤ |ak − ank | + |ank − α| < 1 2 ε + 12 ε = ε . Thus ak → α as k → ∞ as required. Next, suppose that (an ) converges. We must show that (an ) is a Cauchy sequence. Let ε > 0 be given. We use an ε/2-argument. Let α denote limn→∞ an . Then there is N ∈ N such that n > N implies that |an − α| < 1 2 ε. 
But then if both n > N and m > N, we have

|a_n − a_m| ≤ |a_n − α| + |α − a_m| < ε/2 + ε/2 = ε

which verifies that (a_n) is indeed a Cauchy sequence, as claimed.

Some special sequences

Example 3.34. What happens to c^{1/n} as n → ∞ for a given fixed c > 0?

To investigate this, let c > 0 and consider the sequence given by (c^{1/n}) = (c, c^{1/2}, c^{1/3}, c^{1/4}, . . . ). Suppose first that c > 1. Then c^{1/n} > 1. For n ∈ N, let d_n be given by d_n = c^{1/n} − 1, so that d_n > 0 and c^{1/n} = 1 + d_n. Hence, by the binomial theorem,

c = (1 + d_n)^n = 1 + n d_n + \binom{n}{2} d_n^2 + · · · + \binom{n}{n−1} d_n^{n−1} + d_n^n ≥ 1 + n d_n.

It follows that c − 1 ≥ n d_n and so we have

0 < d_n ≤ (c − 1)/n.

It follows from the Sandwich Principle that d_n → 0 as n → ∞. Hence, for any c > 1, c^{1/n} = 1 + d_n → 1 as n → ∞.

If c = 1, then evidently c^{1/n} = 1^{1/n} = 1 → 1 as n → ∞.

Now suppose that 0 < c < 1. Set γ = 1/c so that γ > 1. Then, from the above, c^{1/n} = (1/γ)^{1/n} = 1/(γ^{1/n}) → 1 as n → ∞. We conclude that c^{1/n} → 1 as n → ∞ for any fixed c > 0.

c^{1/n} → 1 as n → ∞ for any fixed c > 0

Example 3.35. What happens to n^{1/n} as n → ∞?

There is conflicting behaviour here. Taking the nth root would tend to make things smaller, but one is taking the nth root of n, which itself gets larger. It is not immediately clear what will happen. Define k_n by n^{1/n} = 1 + k_n (so that k_n = n^{1/n} − 1). Then k_n > 0 for all n > 1. We shall show that k_n → 0 as n → ∞. To see this, notice that for any n > 1

n = (1 + k_n)^n = 1 + n k_n + (n(n − 1)/2) k_n^2 + · · · + k_n^n > (n(n − 1)/2) k_n^2.

Hence, for n > 1,

0 < k_n < √2 / √(n − 1)

and by the Sandwich Principle, we deduce that k_n → 0 as n → ∞. Hence n^{1/n} = 1 + k_n → 1 as n → ∞.

n^{1/n} → 1 as n → ∞

Example 3.36. What happens to c^n/n! as n → ∞ for fixed c ∈ R?

If c > 1, then c^n gets large as n grows, but so does the denominator n!. There is conflicting behaviour here, so it is not obvious what does happen. For any c ∈ R, choose an integer k ∈ N such that k > |c|. The (k + m)th term of the sequence is

c^{k+m}/(k + m)! = (c^k/k!) · c^m/((k + 1)(k + 2) . . . (k + m)).

We have

0 ≤ |c^{k+m}/(k + m)!| = (|c^k|/k!) · |c|^m/((k + 1)(k + 2) . . . (k + m)) ≤ (|c^k|/k!) · (|c|^m/k^m) = (|c^k|/k!) γ^m

where γ = |c|/k < 1. Now let a_j = |c^j/j!| for 1 ≤ j ≤ k and let a_j = (|c^k|/k!) γ^m for j = k + m with m ≥ 1. Then evidently a_j → 0 as j → ∞ and

0 ≤ |c^j/j!| ≤ a_j.

By the Sandwich Principle, it follows that |c^j/j!| → 0 and hence we also have c^j/j! → 0 as j → ∞.

c^j/j! → 0 as j → ∞ for any fixed c ∈ R

Example 3.37. What happens to √(n + 1) − √n as n → ∞?

Each of the two terms becomes large, but what about their difference? To see what does happen, we use a trick and write

0 < √(n + 1) − √n = (√(n + 1) − √n)(√(n + 1) + √n)/(√(n + 1) + √n) = ((n + 1) − n)/(√(n + 1) + √n) = 1/(√(n + 1) + √n) < 1/√n

and so by the Sandwich Principle, we deduce that √(n + 1) − √n → 0 as n → ∞.

√(n + 1) − √n → 0 as n → ∞

Example 3.38. Let 0 < a < 1 and let k ∈ N be fixed. What happens to n^k a^n as n → ∞?

The term n^k gets large but the term a^n becomes small as n grows. We have conflicting behaviour. To investigate this, first let us note that n^k = (n^{k/n})^n and also that n^{k/n} = (n^{1/n})^k → 1^k = 1 as n → ∞. It follows that n^{k/n} a → a as n → ∞. Let r obey a < r < 1. Then eventually n^{k/n} a < r (because with ε = r − a, eventually n^{k/n} a − a < ε = r − a). It follows that eventually

0 < n^k a^n = (n^{k/n} a)^n < r^n.

But rn → 0 and so by the Sandwich Principle we conclude that nk an → 0 as n → ∞. (There is N ∈ N such that n > N implies that 0 < nk an = (nk/n a)n < rn . For 1 ≤ n ≤ N , set bn = nk an and for n > N set bn = rn . Then bn → 0 as n → ∞ and we have 0 < nk an ≤ bn and so the Sandwich Principle tells us that nk an → 0 as n → ∞.) nk an → 0 as n → ∞ for any fixed 0 < a < 1 King’s College London 56 Chapter 3 Sequences of functions Just as one can have a sequence of real numbers, so one can have a sequence of functions. By this is simply meant a family of functions labelled by the natural numbers N. Consider, then, a sequence (fn )n∈N of functions. For each given x, the sequence (fn (x))n∈N is just a sequence of real numbers, as considered already. Here, as always, fn (x) is the notation for the value taken by the function fn at the real number x. In this way, we get many sequences — one for each x. Now, for some particular values of x the sequence (fn (x))n∈N may converge whereas for other values of x it may not. Even when it does converge, its limit will, in general, depend on the value of x. These various values of the limit themselves determine a function of x. This leads to the following notion of convergence of a sequence of functions. Definition 3.39. Suppose that (fn )n∈N is a sequence of functions each defined on a particular subset S in R. We say that the sequence (fn )n∈N converges pointwise on S to the function f if for each x ∈ S the sequence (fn (x))n∈N of real numbers converges to the real number f (x). We write fn → f pointwise on S as n → ∞. Some examples will illustrate this important idea. Examples 3.40. 1. Let fn (x) = xn and let S be the open interval S = (−1, 1). We have seen that xn → 0 as n → ∞ for any x with |x| < 1. This simply says that (fn ) converges pointwise to f = 0, the function identically zero on the set (−1, 1). 2. Let fn (x) = xn as above, but now let S = (−1, 1]. Then for |x| < 1, we know that fn (x) = xn → 0 as n → ∞. Furthermore, with x = 1, we have fn (1) = 1n = 1, so that fn (1) → 1 as n → ∞. Let f be the function on S = (−1, 1] given by ( 0, for −1 < x < 1, f (x) = 1, for x = 1. Then we can say that fn → f pointwise on (−1, 1]. 3. Once again, let fn (x) = xn but now let S be the interval S = [−1, 1]. We know that for each x ∈ (−1, 1], the sequence (fn (x)) of real numbers converges. We must investigate what happens for x = −1. We see that fn (−1) = (−1)n , so that the sequence (fn (−1))n∈N of real numbers does not converge. This means that there does not exist a function f on [−1, 1] with the property that fn (−1) → f (−1). The conclusion is that in this case (fn ) does not converge pointwise on [−1, 1] to any function at all. Department of Mathematics Sequences 57 These examples illustrate the obvious but nevertheless crucial point that pointwise convergence of a sequence of functions involves not only a particular sequence of functions but also the set on which the pointwise convergence is to be considered to take place. The notion of “pointwise convergence” only makes sense when used together with the set to which it refers. King’s College London 58 Department of Mathematics Chapter 3 Chapter 4 Series Given a sequence a1 , a2 , . . . we wish to discus the “infinite sum” a1 + a2 + a3 + . . . P Such an expression is called an infinite series and is denoted by ∞ k=1 ak . We shall attempt to interpret such a series as a suitable limiting object. To this end, let sn be the so-called nth partial sum sn = n X ak = a1 + · · · + an . 
k=1 P Then as n becomes larger, so sn looks more like the series ∞ k=1 ak . Of course, there is the matter of convergence to bePconsidered. The point is that one can always write down the expression ∞ k=1 ak but without some extra discussion it is not all clear what it actually means. It is certainly a combination of symbols, but does it have any reasonable interpretation as a real number? P∞ if it happens to be the case that ak = 1 for P For example, see that in this a = every k, then ∞ k k=1 1. What does this mean? WeP k=1 ∞ special case, sn = n which gets large. The answer is that k=1 ak simply P∞ has no meaning in this case. We say that the series k=1 ak diverges. As another example, suppose that ak = (−1)k+1 . Then ak = 1 for odd k and is otherwise equal to −1. Then ∞ X ak = 1 − 1 + 1 − 1 + 1 − 1 + . . . k=1 — which means what exactly? In this example, we see that sn = 1 if n is odd but is zero if n is even. The partial sums flip interminably between the two values 1 and 0. P Definition 4.1. The series ∞ k=1 ak is said to be convergent if the sequence of partial sums (sn )n∈N converges. If sn →P α as n → ∞, then α is said to be the sum of the series and the expression ∞ k=1 ak is defined to be this limit α. A series which is not convergent is said to be divergent. 59 60 Chapter 4 Example 4.2. Let ak = (1/3)k , so that ∞ X ak = k=1 ∞ X 1 . 3k k=1 We see that the partial sums are given by sn = ³ 1 ´n ( 1 − ( 1 )n+1 ) 1 1 1 ³ 1 ´n 1 3 + ··· + − = 3 = → 1 3 3 2 2 3 2 (1 − 3 ) as n → ∞. Hence ∞ X 1 1 = . k 3 2 k=1 Note that the same argument shows that ∞ X k=1 xk = x 1−x (∗) for any x with |x| < 1. (The requirement that |x| < 1 ensures that xn → 0 as n → ∞.) Note that if we were to ignore the fact that it was not valid but go ahead anyway and simply set x = 1 in the above formula (∗), then we would have P∞ 1 on the left hand side and 1/0 on the right hand side — neither of k=1 which have meaning as real numbers. Again, if we ignore the fact that it is invalid but anyway set x = −1 in (∗), then the left hand side becomes −(1 − 1 + 1 − 1 + . . . ) and the right hand side becomes − 12 which might lead one to suggest that 1 − 1 + 1 − 1 + . . . is in some sense equal to 12 . The fact is that 1 − 1 + 1 − 1 + . . . has no sensible interpretation as a real number. Returning to the series itself and setting x = 5, say, we see that the partial sum sn = 5 + · · · + 5n ≥ 5n so that P∞the ksequence (sn ) does not converge (it is not bounded) and therefore P∞ k=1k 5 is divergent. That’s it — nothing more to say. The expression k=1 5 does not represent a real number P∞andkit cannot be manipulated as if it did. (It is tempting to say that k=1 5 has no meaning at all. However, it does implicitly carry with it the discussion here to the effect that the sequence of partial sums does not converge.) The following divergence test allows us to immediately spot certain series as being divergent. Proposition 4.3 (Test for divergence). Suppose P that the sequence (an ) fails to converge to 0 as n → ∞. Then the series ∞ k=1 ak diverges. P Proof.PWe must show that if ∞ k=1 ak is convergent then an → 0. So suppose that ∞ a is convergent with sum α, say. This means that sn → α as k k=1 P n → ∞, where sn = nk=1 ak . Department of Mathematics 61 Series Let ε > 0 be given. We need an ε/2-argument. Since sn → α, we are assured that there is some N 0 ∈ N such that |sn − α| < 21 ε whenever n > N 0 . 
But then |an | = |sn − sn−1 | = |sn − α + α − sn−1 | ≤ |sn − α| + |α − sn−1 | < 12 ε + 21 ε = ε provided n > N 0 and n − 1 > N 0 . So if we set N = N 0 + 1, then if n > N we can be sure that |an | < ε which establishes that an → 0 as n → ∞ and the proof is complete. Remark 4.4. It is very important to understand what this proposition says and what it does not say. It says that if the terms of a series fail to converge to zero, then the series itself is divergent. It is quite possible to find a series whose terms do converge to zero but nevertheless, P∞ the series is divergent. Such an example is provided by the series k=1 ak with ak = 1/k. 1+ 1 1 1 1 + + + + ... 2 3 4 5 is a divergent series Indeed, the sequence of partial sums (sn ) is not bounded. One can see this as follows. That portion of the graph of the function y = 1/x between the values x = k and x = k + 1 lies below the line y = 1/k. Let Rk denote the rectangle with height 1/k and with base on the interval [k, k + 1]. Then the area of Rk is greater than the area under the graph of y = 1/x between the values x = k and x = k + 1, that is, Z k+1 1 1 dx = ln(k + 1) − ln k . area Rk = > k x k Summing from k = 1 to k = n, we get n sn = 1 + 1 1 X + ··· + > (ln(k + 1) − ln k) = ln(n + 1) − ln 1 = ln(n + 1). 2 n k=1 But ln(n + 1) > ln n which becomes arbitrarilyPlarge for large enough n. So we conclude that (sn ) is unbounded and so ∞ k=1 ak , with ak = 1/k, is divergent — despite the fact that an = 1/n → 0 as n → ∞. King’s College London 62 Chapter 4 An alternative argument is as follows. One notices that 1+ 1 2 + 1 1 + 1 + 1 + 1 + 1 + 1 + 1 + · · · + 16 + 1 + ··· + 1 +... |3 {z 4} |5 6 {z 7 8} |9 {z } |17 {z 32} > 1 2 > 4× 18 = 12 1 > 8× 16 = 12 1 > 16× 32 = 21 and so we see that s1 = 1, s2 = s1 + 12 , s4 > s2 + 12 , s8 > s4 + 21 , s16 > s8 + 21 , ... and so it follows that s2j ≥ (j + 2) 12 for j ∈ N. (This inequality is strict for j > 1, but this is of no consequence here.) So the sequence (sn ) is not bounded. The next result tells us that we can do arithmetic with convergent series just as we would expect. P∞ P∞ Proposition 4.5. Suppose that a and k k are convergent series. k=1 k=1 bP P∞ ∞ Then the series k=1 (ak + bk ) are also k=1 λ ak , for any λ ∈ R, and convergent and have sums such that ∞ X k=1 λ ak = λ ∞ X ak and ∞ X (ak + bk ) = k=1 k=1 k=1 ∞ X ak + ∞ X bk . k=1 Proof.PWe just need to look P∞So let P and their limits. P at the partial sums a and β = sn = nk=1 ak and let tn = nk=1 bk and let α = ∞ k=1 ak . k=1 k By hypothesis, we know that sn → α and tn → β as n → ∞. But n X λ ak = λ sn → λ α k=1 and n X (ak + bk ) = sn + tn → α + β k=1 P∞ P∞ as n → ∞. It follows that λ a is convergent with sum λ α = λ k k=1 ak k=1 P∞ and P∞ also that P∞ k=1 (ak + bk ) is convergent with sum given by α + β = a + k k=1 bk , as required. k=1 Example 4.6. For k ∈ N, let ak = 9/10k . Then ∞ X ak = k=1 9 9 9 + 2 + 3 + ... 10 10 10 which is usually referred to as 0.9 . . . recurring. Is this series convergent and if so, what is its sum? We see that ∞ X k=1 Department of Mathematics ak = ∞ ∞ X X 9 1 = 9 . k 10 10k k=1 k=1 63 Series P But we have seen earlier that ∞ k=1 1 1 1 10 /(1 − 10 ) = 9 . It follows that 1 10k is convergent with sum equal to ∞ X 9 1 = 9 × = 1. k 9 10 k=1 ∞ X 9 = 0.9999 · · · = 1 10k k=1 Continuing with this theme, we have the following. Example 4.7. Let (ak )k∈N P be any sequence of integers taking values in the k set { 0, 1, 2, . . . , 9 }. Then ∞ k=1 ak /10 is convergent with sum lying in the interval [0, 1]. 
P ak th partial sum of the series To see this, let sn = nk=1 10 k denote the n P∞ k n+1 ≥ 0 and so the sequence k=1 ak /10 . We note that sn+1 −sn = an+1 /10 of partial sums (sn ) is monotone increasing. Furthermore, since each ak is an integer in the range 0 to 9, it follows that ak /10 ≤ 9/10 and so we can say that ak /10k ≤ 9/10k . Hence, for any n ∈ N, ¡1 ¢ n n 1 n+1 1 X X ak 9 10 ¡− ( 10 ) ¢ 10 ¢ ¡ sn = ≤ = 9 < 9 = 1. 1 1 10k 10k 1 − 10 1 − 10 k=1 k=1 We have shown that the sequence (sn ) is monotone increasing Pand bounded k and therefore it converges. Hence, by definition, the series ∞ k=1 ak /10 is convergent. P k We must now show that ∞ k=1 ak /10 = α, say, lies between 0 and 1. However, each sn ≥ 0 and since sn → α as n → ∞, it follows that it must also be true that α ≥ 0. Furthermore, we have seen that sn < 1 and so we have 1 − sn > 0. But 1 − sn → 1 − α and so it also follows that 1 − α ≥ 0. Hence 0 ≤ α ≤ 1, as claimed. We can interpret this as saying that every infinite decimal represents some real number x lying in the range 0 ≤ x ≤ 1. The converse is also true. Example 4.8. Let x be a real number satisfying 0 ≤ x ≤ 1. Then there is a sequence of integers (ak )k∈N with values in { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } such P k that the series ∞ a k=1 k /10 is convergent with sum equal to x. To show this, we must construct the sequence (ak ). We know how to do this for x = 1, (take ak = 9 for all k) so let us suppose now that 0 ≤ x < 1. If we are told that a1 a2 a3 x= + 2 + 3 + ... 10 10 10 then evidently, a3 a2 + 2 + ... 10 x = a1 + 10 10 King’s College London 64 Chapter 4 so that a1 is the integer part of 10x, a1 = [10x]. Similarly, 100 x = 10a1 + a2 + a3 + ... 10 so that a2 is the integer part of 100x − 10a1 , a2 = [100x − 10[10x]]. In this way, we can write any ak in terms of x. We simply use this idea to construct the ak s. We isolate the following fact: if u is any real number with 0 ≤ u < 1, then there is an integer a in the set { 0, 1, 2, . . . , 8, 9 } and a real number α obeying 0 ≤ α < 1 such that 10u = a+α. To see this, we first note that 0 ≤ 10u < 10 and so [10u], the integer part of 10u, lies in the set { 0, 1, 2, . . . , 8, 9 }. Let a = [10u] and let α = 10u − [10u]. Since 0 ≤ w − [w] < 1 for any real number w, we see that 0 ≤ α < 1 and 10u = a + α as required. Since 0 ≤ x < 1, as noted above, we can write 10x as 10x = a1 + α1 where a1 is an integer in the set { 0, 1, 2, . . . , 8, 9 } and 0 ≤ α1 < 1. Then x= α1 a1 + . 10 10 Now, with α1 instead of x, we can say that we can write α1 as α1 = a2 α2 + 10 10 for some integer a2 in the set { 0, 1, 2, . . . , 8, 9 } and some real number α2 obeying 0 ≤ α2 < 1. Then x= a1 α1 a1 a2 α2 + = + + . 10 10 10 102 102 Repeating this for α2 , we get x= a1 a2 a3 α3 + + + 10 102 103 103 with a3 ∈ { 0, 1, 2, . . . , 8, 9 } and 0 ≤ α3 < 1. Continuing in this way, we construct integers an in the range 0 to 9 and real numbers αn obeying 0 ≤ αn < 1 such that x= a2 a3 an αn a1 + 2 + 3 + ··· + n + n . 10 10 10 10 10 | {z } sn Finally, we note that |x − sn | = αn 9 ≤ n →0 n 10 10 as P∞n → ∞, kthat is, sn → x as n → ∞ and so it follows that the series k=1 ak /10 converges with sum equal to x, and the proof is complete. Department of Mathematics 65 Series This provides another proof that any given real number is the limit of a sequence of rationals. Indeed, for b ∈ R, write b = [b] + x where [b] is the integer part of b and 0 ≤ x < 1. 
As discussed above, x = lim sn where each sn is the partial sum of a series with rational terms of the form ak /10k for suitable ak ∈ { 0, 1, 2, . . . , 9 }. In particular, each sn is rational and so is [b] + sn . However, [b] + sn → [b] + x = b and the result follows. Since 0.99 · · · = 1 = 1.00 . . . it is clear that the decimal expansion of a real number need not be unique. Indeed, further examples are provided by 0.5 = 0.499 . . . or 0.63 = 0.6299 . . . and so on. However, this is the only possible kind of ambiguity as the next theorem shows. Theorem 4.9. Suppose that 0 ≤ x < 1 and that x = 0.a1 a2 · · · = 0.b1 b2 . . . , that is, x= ∞ ∞ X X ak bk = 10k 10k k=1 k=1 where each ak and bk belong to { 0, 1, 2, . . . , 8, 9 }. Then either ak = bk for all k ∈ N or else there is some N ∈ N such that ak = bk for 1 ≤ k < N and either aN = bN + 1 and ak = 0 and bk = 9 for all k > N or bN = aN + 1 and bk = 0 and ak = 9 for all k > N . Proof. We will use the following result. P k Lemma 4.10. Suppose that 0 ≤ αk ≤ 9 and that ∞ k=1 αk /10 = 0. Then αk = 0 for all k ∈ N. Pn k Proof of Lemma. Let sn = k=1 αk /10 denote the partial sums of the P∞ k (sn ) is a series k=1 αk /10 . Then sn+1 − sn = αn+1 /10n+1 ≥ 0 so Pthat n positive increasing sequence. Moreover, each sn obeys sn ≤ k=1 9/10k < 1 P∞ so that (sn ) P converges, that is, k=1 αk /10k is a convergent series. Its value k obeys sn ≤ ∞ k=1 αk /10 . Hence, for any m ∈ N, ∞ X αm 0 ≤ m ≤ sm ≤ αk /10k = 0 10 k=1 so that it follows that αm = 0 as claimed and the proof of the lemma is complete. We turn now to the proof of the theorem. Case 1: x = 0. P k In this case, we have 0 = x = ∞ k=1 ak /10 with ak ∈ { 0, 1, . . . , 9 }. By the Lemma, we conclude that ak = 0 for all k ∈ N. Hence ak = bk = 0 for k ∈ N. King’s College London 66 Chapter 4 Case 2: 0 < x < 1. Suppose that x= ∞ ∞ X X ak bk = k 10 10k k=1 k=1 and it is false that ak = bk for all k ∈ N. Let N be the smallest integer for which ak 6= bk , so that ak = bk for 1 ≤ k < N but aN 6= bN . Suppose that aN > bN . Then 0 ≤ bN < aN ≤ 9 and aN ≥ 1. We have ∞ ∞ X X ak bk 0=x−x= − 10k 10k k=1 = = k=1 ∞ X k=N = k=1 ∞ X (ak − bk ) 10k (ak − bk ) 10k (aN − bN ) (aN +1 − bN +1 ) + + ... 10N 10N +1 Multiplying by 10N , we see that (aN +1 − bN +1 ) + ... 10 ∞ X cn = (aN − bN ) + 10n 0 = (aN − bN ) + n=1 where cn = aN +n − bN +n for all n ∈ N. Now, we can write (aN − bN ) as aN − bN = c + 1 where c is an integer with c ≥ 0. We also note cn belongs to the P that each n , we get set { −9, −8, . . . , 8, 9 }. Hence, writing 1 = ∞ 9/10 n=1 ∞ ∞ X X 9 cn 0=c+ + n 10 10n n=1 that is, 0=c+ n=1 ∞ X (9 + cn ) n=1 10n . Now, 9 + cn ≥ 0 and c ≥ 0 so that both terms on the rightP hand side above (9+cn ) are non-negative. It must be the case that c = 0 and also ∞ n=1 10n = 0. But then cn = −9 for all n ∈ N, by the Lemma. Hence aN = bN + 1 and aN +n − bN +n = −9 which implies that aN +n = 0 and bN +n = 9 for all n ∈ N and the result follows. Department of Mathematics 67 Series Returning now to the general theory, it is clear that the convergence of a series will not be affected by changing the values of a few terms, although of course, this will change the value of its sum. This is confirmed formally in the next proposition. P P∞ Proposition 4.11. Suppose that ∞ k=1 ak is a convergent series and k=1 bk is any series such that bk = ak except for at most finitely-many k. Then P∞ k=1 bk is also convergent. Proof.PAs always, we look at the partial sums, so let sn = tn = nk=1 bk . 
Evidently, tn = n X k=1 bk = n X k=1 ( bk − ak ) + | {z } = ck , say n X ak = k=1 n X Pn k=1 ak and let ck + sn . k=1 P Next, let un = nk=1 ck . Now, by hypothesis, ck = 0 except for at most finitely-many k. In other words, there is some N ∈ N such that ck = 0 for all n > N . This means that un is eventually constant, un = n X k=1 ck = N X ck = uN , k=1 whenever n > N , and so (un ) converges (to the value uN ). But tn = un + sn and since the right hand side converges, so does the left hand side and the result follows. Theorem 4.12 (Comparison P∞ Test for positive series). P∞ Suppose 0 ≤ ak ≤ bk for all k ∈ N and that k=1 bk converges. Then k=1 ak also converges. P P Proof. Let sn = nk=1 ak and tn = nk=1 bk . By hypothesis, (tn ) converges and so (tn ) is a bounded sequence. Therefore there is some M > 0 such that tn ≤ M for all n ∈ N. But since 0 ≤ ak ≤ bk , it follows that sn ≤ tn and so the sequence (sn ) of partial sums is bounded above (by M ). Furthermore, sn+1 − sn = an+1 ≥ 0 and so (sn ) is monotone increasing. However, we know that a monotone increasing sequence which is bounded above must converge. Hence result. King’s College London 68 Chapter 4 ∞ X 1 1 1 1 Example 4.13. Consider the series = 1 + 2 + 2 + 2 + ... . k2 2 3 4 k=1 Pn 1 th partial sum. Then (s ) is increasing. Let sn = n k=1 k2 denote the n Furthermore, for n > 1, we see that 1 1 1 1 + 2 + 2 + ··· + 2 2 2 3 4 n 1 1 1 1 <1+ + + + ··· + 1.2 2.3 3.4 (n − 1)n ³ ´ ³ ´ ³ ³ 1 1 1 1 1 1´ 1´ =1+ 1− + − + − + ··· + − 2 2 3 3 4 n−1 n 1 =2− n <2 sn = 1 + and so (sn ) is P bounded from above and therefore must converge. Hence, 1 by definition, ∞ k=1 k2 is convergent. Note, however, that this discussion gives us no hint as to the value of its sum. This is an example where the convergence of a series can quite sensibly be discussed without actually knowing what its sum is. ∞ X 1 ? Example 4.14. What about the series k4 k=1 1 1 ≤ 2 for all k ∈ N, we can apply the Comparison Test for positive 4 k k ∞ X 1 1 1 series to deduce that is convergent. Indeed, since α ≤ 2 for all 4 k k k k=1 P∞ α k ∈ N for any α ≥ 2, we can say that the series k=1 1/k is convergent whenever α ≥ 2. Since ∞ X k=1 1 k 2+β is convergent for any β ≥ 0 Example 4.15. What about the series ∞ X 1 √ ? k k=1 1 1 ≤ √ , k k for every k ∈ N, together with the Comparison Test for positive series to P conclude that the series ∞ 1/k were convergent. However, we know this k=1 √ P not to be the case. It follows that the series ∞ k=1 1/ k is not convergent. If this series were convergent, then we could use the inequality Department of Mathematics 69 Series Indeed, we can apply this reasoning to the series ∞ X 1 kµ P∞ k=1 1/k µ for any µ ≤ 1. is not convergent for any µ ≤ 1 k=1 ∞ X 1 is convergent for Example 4.16. We have seen above that the series kν k=1 ν ≥ 2 but not convergent for ν ≤ 1. It is natural to ask what happens for values of ν lying in the range 1 < ν < 2. We shall see that the series is convergent for all ν > 1. P Write ν = 1 + ε, where ε > 0 and let sn = nk=1 1/k ν . Evidently (sn ) is an increasing sequence so if we can show that it is bounded, then we will be able to conclude that it converges. The idea is to compare the terms 1/k ν with the integral of the function y = 1/xν over unit intervals. In fact, over the range k ≤ x ≤ k + 1, the function y = 1/x(1+ε) is greater than 1/(k + 1)1+ε and so Z k+1 1 dx ≤ . 
1+ε (k + 1) x1+ε k Summing over k, we find that 1 1 1 + ν + ··· + ν ν 2 3 n Z 2 Z 3 Z n dx dx dx ≤1+ + + ··· + 1+ε 1+ε 1+ε x 2 x n−1 x Z1 n dx =1+ 1+ε x 1 h 1 in =1+ − ε εx 1 1 1 =1+ − ε ε nε 1 ≤1+ . ε sn = 1 + We see that the sequence (sn ) is bounded from above and since it is also increasing, it must converge. ∞ X 1 kν is convergent for all ν > 1 and divergent for all ν ≤ 1 k=1 This technique of comparing terms of a series with integrals can be quite useful. The general idea is contained in the following theorem. King’s College London 70 Chapter 4 Theorem 4.17 (Integral Test). Suppose that ψ : [1, ∞)R → R is a positive n decreasing function such P that the sequence of integrals ( 1 ψ(x) dx)n∈N con∞ verges as n → ∞. Then n=1 ψ(n) is convergent. Pn Proof. Since ψ(x) ≥ 0, the sequence of partial sums sn = k=1 ψ(k) is increasing. Now, because ψ is decreasing, it follows that ψ(k) ≤ ψ(x) for Rk all x ∈ [k − 1, k] for all k ≥ 2. Hence k−1 (ψ(k) − ψ(x)) dx ≤ 0, that is, Rk ψ(k) ≤ k−1 ψ(x) dx. Therefore sn = ψ(1) + ψ(2) + ψ(3) + · · · + ψ(n) Z 2 Z 3 Z n ≤ ψ(1) + ψ(x) dx + ψ(x) dx + · · · + ψ(x) dx 1 2 n−1 Z n = ψ(1) + ψ(x) dx . 1 ¡R n ¢ By hypothesis, the sequence of integrals 1 ψ(x) dx converges and so is bounded. It follows that the sequence of partial sums (sn ) is bounded from above and therefore converges. The result follows. This theorem can be rephrased in a slightly more general form, as follows. Theorem 4.18 (Integral Test). Let (an ) be a sequence of positive real numbers and suppose Rthat there is some positive function ϕ such that the sequence n of integrals ( 1 ϕ(x) dx)n∈N converges as n → ∞ and such that, for each k ≥ 2, ak ≤ ϕ(x) P for all (k − 1) ≤ x ≤ k. Then ∞ k=1 ak is convergent. P Proof. As usual, let sn = nk=1 ak . Then (sn )n∈N is an increasing sequence (because each ak ≥ 0). We need only show that (sn ) is bounded. To see this, note that for k ≥ 2, Z ak = Z k k−1 ak dx ≤ k ϕ(x) dx k−1 and so sn = a1 + a2 + · · · + an Z 2 Z 3 Z n ≤ a1 + ϕ(x) dx + ϕ(x) dx + · · · + ϕ(x) dx 1 2 n−1 Z n = a1 + ϕ(x) dx . 1 Department of Mathematics 71 Series Rn Now, the sequence ( 1 ϕ(x) dx) converges, by hypothesis, and so it is bounded Rn and therefore there is a constant C such that 1 ϕ(x) dx ≤ C for all n ∈ N. Hence, for any n, Z n ϕ(x) dx ≤ a1 + C 0 ≤ sn ≤ a1 + 1 which shows that (sn ) is a bounded sequence and the result follows. The following test for convergence of positive series is very useful. Theorem 4.19 (D’Alembert’s Ratio Test for positive series). Suppose that an > 0 for all n. (i) Suppose that there is some 0 < ρ < 1 and some N ∈ N such that if an+1 n > N then < ρ. an P∞ Then the series n=1 an is convergent. (ii) If there is N 0 ∈ N such that P∞ series n=1 an is divergent. an+1 ≥ 1 for all n > N 0 , then the an Proof. (i) Suppose that 0 < ρ < 1 and that an+1 < ρ for n > N . Then an aN +2 < aN +1 ρ aN +3 < aN +2 ρ < aN +1 ρ2 aN +4 < aN +3 ρ < aN +1 ρ3 .. . aN +k+1 < aN +1 ρk . aN +1 . ρN +1 In other words, we have an < K ρn for all n > N + 1. Now we construct a new sequence (un ) by setting ( 0, n≤N +1 un = an , n > N + 1 . Hence aN +k+1 < K ρN +k+1 for all k ≥ 1, where we have let K = P∞ n Then certainly un < K ρn for all n. Now, we know that n=1 K ρ is convergent (with sum K ρ/(1 − ρ)) because 0 < ρ < 1. P By the Comparison Test, it follows that P∞ n=1 un is convergent. However, an = un eventually and so it follows that ∞ n=1 an is also convergent and the proof of (i) is complete. King’s College London 72 Chapter 4 (ii) Suppose now that an+1 ≥ 1 for all n > N 0 . 
Then for any k ∈ N an aN 0 +k ≥ aN 0 +k−1 ≥ · · · ≥ aN 0 +1 > 0 . This means that it is impossible for an → 0 as n → ∞ P (every term after the (N 0 + 1)th is greater than aN 0 +1 ). We conclude that ∞ n=1 an must be divergent. There is another (weaker) but also very useful version of this theorem. Theorem 4.20 (D’Alembert’s Ratio Test for positive series (2nd version)). an+1 Suppose that an > 0 for all n and that → L as n → ∞. an P (i) If L < 1, then ∞ n=1 an is convergent. (ii) If L > 1, then P∞ n=1 an is divergent. (There is no claim as to what happens when L = 1.) an+1 → L where 0 ≤ L < 1. Then for any ε > 0, an an+1 we may say that eventually ∈ (L − ε, L + ε). an an+1 In particular, < L + ε eventually. Let ε be so small that L + ε < 1, an an+1 < ρ where ρ = L+ε < 1. By the previous version of the Theorem, then an P it follows that ∞ n=1 an is convergent. an+1 (ii) Now suppose that L > 1. Then for any ε > 0, eventually an an+1 belongs to the interval (L − ε, L + ε). In particular, eventually > L − ε. an But L > 1, so if ε > 0 is chosen so small that L − ε > 1, then we may say P∞ an+1 that eventually > L − ε > 1 and so n=1 an is divergent, by the an previous version of the Theorem. Proof. (i) Suppose that Example 4.21. What can be said when L = 1? Without further analysis, the answer is “nothing”. Indeed, there are examples of series which converge when L = 1 and other examples of diverge when L = 1. P series which 2 is convergent and we see that For example, we know that ∞ 1/k k=1 an+1 /an = n2 /(n + 1)2 → 1 as n → ∞, so that L = 1 in this case. P However, we also know that ∞ 1/k is divergent, but here again, we see k=1 that an+1 /an = n/(n + 1) → 1 = L as n → ∞. When L = 1, the Ratio Test tells us nothing. Department of Mathematics 73 Series Example 4.22. For fixed 0 ≤ c < 1, the series P∞ k=1 kc k is convergent. If c = 0, there is nothing to prove, so suppose that 0 < c < 1. Setting an = ncn , we see that an+1 (n + 1)cn+1 (n + 1)c = = →c n an nc n as n → ∞. Since an > 0 forPall n and since L = c < 1, we can apply the k Ratio Test to conclude that ∞ k=1 kc is convergent. P p k The same argument shows that for any power p, the series ∞ k=1 k c is convergent (provided 0 ≤ c < 1). For any 0 ≤ c < 1 and any p ∈ N, the series ∞ X k p ck is convergent. k=1 Theorem 4.23 (nth Root Test). Suppose that an > 0 for all n ∈ N and that (an )1/n → ` as n → ∞. P (i) If ` < 1, the series ∞ k=1 ak is convergent. (ii) If ` > 1, then the series P∞ k=1 ak is divergent. (There is no conclusion when ` = 1.) Proof. Suppose that ` < 1. Choose ρ such that ` < ρ < 1 and set ε = ρ − `. 1/n Then ε > 0 and so there is some N ∈ N such that |an − `| < ε whenever n > N . In particular, (an )1/n − ` < ε = ρ − ` P i.e., an < ρn , whenever n > N . We must show that sn = nk=1 ak converges. Since an > 0, the sequence (sn ) is monotone increasing so it is enough to show that (sn ) is bounded from above. But for any n > N , sn = a1 + a2 + · · · + an = sN + aN +1 + · · · + an < sN + ρN +1 + ρN +2 + · · · + ρn ρN +1 − ρn+1 1−ρ N +1 ρ < sN + . (1 − ρ) = sN + Hence, for any j, sj < sN +j < sN + ρN +j+1 1−ρ King’s College London 74 Chapter 4 which shows that the sequence (sn ) is bounded from above and therefore converges, as claimed. Next, suppose that ` > 1. Choose d such that 1 < d < ` and let ε = `−d. Then ε > 0 and there is N ∈ N such that a1/n ∈ (` − ε, ` + ε) n whenever n > N . In particular, for n > N , ` − ε < a1/n n which means that an > dn . It follows that, for any n > N , sn = sN + aN +1 + · · · + an > an > dn > 1 . 
From this, we see that it is false that an → 0 as n → ∞ and so by the Test P∞ for divergence, k=1 an is divergent. We have considered tests applicable only to positive series. The following is a convergence test for the case when the terms alternate between positive and negative values. Theorem 4.24 (Alternating Series Test). Suppose that (an ) is a positive, decreasing sequence such that an → 0 as n → ∞. Then the (alternating) series ∞ X a1 − a2 + a3 − a4 + . . . = (−1)n+1 an n=1 is convergent. Proof. By hypothesis, an ≥ 0, an+1 ≤ an and an → 0 as n → ∞. Let sn = a1 − a2 + a3 − a4 + · · · + (−1)n+1 an denote the nth partial sum of the series, as usual. We shall consider the two cases when n is even and when n is odd. Suppose that n is even, say n = 2m. Then s2m+2 = s2m + ( a2m+1 − a2m+2 ) | {z } ≥0 and so s2m+2 ≥ s2m . Next, we note that s2m = a1 − a2 + a3 − a4 + a5 − · · · − a2m = a1 − ( a2 − a3 ) − ( a4 − a5 ) − · · · − ( a2m−2 − a2m−1 ) − a2m | {z } | {z } |{z} | {z } ≥0 ≥0 ≥0 ≥0 ≤ a1 . For notational convenience, let xm = s2m . Then we have shown that (xm ) is increasing and bounded from above (by a1 ). It follows that (xm ) converges, say xm → α as m → ∞. Department of Mathematics 75 Series Claim: sn → α as n → ∞. Let ε > 0 be given. Then there is N1 ∈ N such that if m > N1 then |xm − α| < 12 ε. Also, there is N2 ∈ N such that if n > N2 then |an | < 12 ε. Let N = 2(N1 + N2 ). Let n > N and consider |sn − α|. If n is even, say, n = 2m, then n = 2m > N =⇒ 2m > 2(N1 + N2 ) =⇒ m > N1 and so |sn − α| = |s2m − α| = |xm − α| < 1 2 ε < ε. If n is odd, say n = 2k + 1, then n = 2k + 1 > N =⇒ 2k ≥ N = 2(N1 + N2 ) =⇒ k > N1 . Moreover, since N > N2 , we have n > N =⇒ n > N2 and so we see that n = 2k + 1 > N =⇒ both k > N1 and n > N2 . Hence |sn − α| = |s2k+1 − α| = |s2k + a2k+1 − α| = |xk + an − α| ≤ |xk − α| + |an | < 1 2 ε + 12 ε = ε. So regardless of whether n is even or odd, if n > N then |sn − α| < ε. Hence sP n → α as n → ∞, as claimed, and we conclude that the alternating series ∞ n+1 a is convergent. n n=1 (−1) Example 4.25. The series 1 − 12 + 13 − 14 + 15 − 16 + . . . converges. This follows immediately from the Alternating Series Test. P∞ Definition n=1 an is said to converge absolutely if the P∞ 4.26. The series |a | is convergent. series n=1 P n The series ∞ but does n=1 an is said to converge conditionally if it converges P not converge absolutely, i.e., it converges but the series ∞ |a n=1 n | is not convergent. Example 4.27. We have seen that the series ∞ X (−1)n+1 n1 = 1 − 1 2 + 1 3 − 1 4 + 1 5 − 1 6 + ... n=1 P 1 converges. we know that ∞ n=1 n does not converge and so the P∞ However, 1 n+1 series n=1 (−1) n is an example of a conditionally convergent series. King’s College London 76 Chapter 4 Theorem 4.28. Every absolutely convergent series is convergent. P∞ Pn Proof. Suppose Pn that n=1 an is absolutely convergent. Let tnP=∞ k=1 |ak | and sn = k=1 ak . Then we know that tn converges (since n=1 an converges absolutely). It follows that (tn ) is a Cauchy sequence. We shall show that (sn ) is also a Cauchy sequence. Let ε > 0 be given. Then there is N such that n, m > N imply that |tn − tm | < ε . However, for n > m, |sn − sm | = |am+1 + · · · + an | ≤ |am+1 | + · · · + |an | = |tn − tm | and so it follows that |sn − sm | < ε whenever n, m > N , which shows that (sn ) is a Cauchy sequence. But any Cauchy sequence in R converges and the result follows. We know that if a and b are real numbers, then a + b = b + a. More generally, if a1 , . . . 
, am is a collection of m real numbers, then their sum a1 + · · · + am is the same irrespective of the order in which we choose to add P them together. Now, a series ∞ a n=1 n is the result of adding together real numbers, so it is natural to guess that the order of the addition does not matter. To discuss this, we shall need the notion of a rearrangement. P P∞ Definition 4.29. The series ∞ n=1 bn is a rearrangement of the series n=1 an if there is some one-one map ϕ of N onto N such that bn = aϕ(n) for each n ∈ N. In other words, every b is one of the a’s and every a appears as some b. P Theorem 4.30. Suppose that the series ∞ n=1 an converges absolutely. Then every rearrangement also converges, with the same sum. P P∞ Proof. Let ∞ n=1 bn be a rearrangement of n=1 an . Then there is some Pn oneone map ϕ of N onto N such that b = a for every n. Let s = n n ϕ(n) k=1 ak , Pn Pn P∞ tn = k=1 |ak |, rn = k=1 bk and let s = k=1 ak = limn→∞ sn . We must show that (rn ) converges and that its limit is equal to s. Let ε > 0 be given. Since sn → s and (tn ) is a Cauchy sequence, there is some N ∈ N such that n, m ≥ N imply that both |sn − s| < ε/2 and |tn − tm | < ε/2 . Now, the sequence of b’s is a relabelling of the a’s and so for each j there is some kj such that aj = bkj . Let N 0 = max{ kj : 1 ≤ j ≤ N } so that the Department of Mathematics 77 Series collection a1 , a2 , . . . , aN is included in the collection b1 , b2 , . . . , bN 0 . Then for any n > N 0 b1 + b2 + · · · + bn = a1 + a2 + · · · + aN + σn where σn = a`1 +· · ·+a`r for some integers `1 , . . . , `r with N < `1 < · · · < `r . Now |σn | ≤ |a`1 | + · · · + |a`r | ≤ `r X |ak | = t`r − tN < ε/2 k=N +1 and so if n > N 0 |rn − s| = |sn + σn − s| ≤ |sn − s| + |σn | ≤ ε/2 + ε/2 = ε and the proof is complete. Theorem 4.31 (Cauchy’s Condensation Test). Suppose (an )n∈N is a positive, decreasing sequence of real numbers (that is, P an ≥ 0 and an+1 ≤ an ). For each k ∈ N, let bkP= 2k a2k . Then the series ∞ n=1 an is convergent if and ∞ only if the series k=1 bk is convergent. (In other words, either both series converge or neither does.) P P Proof. Let sn = nm=1 am and tk = ki=1 bk be the partial sums of the series under consideration. Since an ≥ 0, the sequences (an )n∈N and (bk )k∈N are increasing sequences. Now, we know that if an increasing sequence in R is bounded from above, then it must converge, so our strategy is to show that the sequences of partial sums are bounded (from above). The idea is to estimate the partial sums of the series in terms of each other by bracketing the an terms into groups of size 2, 4, 8, 16, . . . and using the fact that an ≥ an+1 . We note that 2a4 ≤ a3 + a4 ≤ 2a2 4a8 ≤ a5 + a6 + a7 + a8 ≤ 4a4 8a16 ≤ a9 + a10 + · · · + a15 + a16 ≤ 8a8 .. . Summing, we find that (for k > 1) 2a4 + 4a8 + · · · + 2k−1 a2k ≤ a3 + a4 + · · · + a2k ≤ 2a2 + 4a4 + · · · + 2k−1 a2k−1 . King’s College London 78 Chapter 4 In terms of the bk s, this becomes 1 2 (b2 + · · · + bk ) ≤ a3 + a4 + · · · + a2k ≤ b1 + b2 + · · · + bk−1 giving the pair of inequalities 1 2 tk − b1 ≤ s2k − (a1 + a2 ) ≤ tk−1 (∗) P∞ Suppose P∞ now that the series n=1 an converges and, for clarity, let us write s = n=1 an = limj→∞ sj . Since (sj ) is increasing, it follows that sj ≤ s for all j ∈ N. From (∗), it follows that 1 2 tk − b1 ≤ s2k − (a1 + a2 ) ≤ s − (a1 + a2 ) for all k ∈ N. Hence (tk ) is a bounded, P increasing sequence and so converges. But, by definition, this means that ∞ i=1 bi is convergent. 
Next, suppose that the series ∑_{i=1}^∞ b_i converges and write t for its sum. Then t_k ≤ t for all k ∈ N. Now, for any n ∈ N, it is true that 2^n > n (as can be seen from the Binomial Theorem: 2^n = (1 + 1)^n = 1 + n + n(n − 1)/2 + · · · + 1 > n). Hence s_n ≤ s_{2^n} and so, using (∗), we get
s_n ≤ s_{2^n} ≤ t_{n−1} + (a_1 + a_2) ≤ t + (a_1 + a_2)
for any n ≥ 2. Therefore (s_n) is a bounded, increasing sequence and so converges. Therefore ∑_{j=1}^∞ a_j is convergent.

Example 4.32. We already know that the series ∑_{n=1}^∞ 1/n diverges, but let us consider it again via the Condensation Test. First, we note that a_n = 1/n satisfies the hypotheses required to apply the Condensation Test. Now,
b_k = 2^k a_{2^k} = 2^k / 2^k = 1,
and so it is clear that ∑_k b_k = ∑_k 1 diverges. Applying the Condensation Test, we conclude that ∑_{n=1}^∞ 1/n diverges. (In fact, we have already shown this from first principles using this method of grouping.)

Next, consider ∑_{n=1}^∞ 1/n^{1+δ} for given δ > 0. Once again, a_n = 1/n^{1+δ} satisfies the hypotheses required to apply the Condensation Test. In this case, we have
b_k = 2^k a_{2^k} = 2^k / 2^{k(1+δ)} = 1 / 2^{kδ},
so that ∑_k b_k is a geometric series with common ratio 1/2^δ. This series therefore converges (because 1/2^δ is smaller than 1).

We might say that the series ∑_n 1/n diverges presumably because the terms 1/n do not become "small enough quickly enough". Increasing the power of n from 1 to 1 + δ is sufficient to "speed things up" so that ∑_n 1/n^{1+δ} does converge, no matter how small δ may be. Consider the series ∑_{n=2}^∞ 1/(n ln n) (we cannot start this series with n = 1 because ln 1 = 0). Is the change from a_n = 1/n to a_n = 1/(n ln n) enough to give convergence of the series? To investigate its convergence or otherwise, let a_n = 1/(n ln n) for n ≥ 2 and set a_1 = 5, say, or any value greater than 1/(2 ln 2). This choice of a_1 is not quite arbitrary but is made so that (a_n)_{n∈N} satisfies the hypotheses required to apply the Condensation Test. The series ∑_{n=2}^∞ a_n converges if and only if the series ∑_{n=1}^∞ a_n does, regardless of our choice for a_1. Applying the Condensation Test, we may say that the series ∑_{n=2}^∞ 1/(n ln n) converges if (and only if) ∑_{n=1}^∞ 2^n a_{2^n} does. But
2^n a_{2^n} = 2^n / (2^n ln(2^n)) = 1 / (n ln 2),
and we know that the series ∑_n 1/n does not converge. We can conclude, then, that the series ∑_{n=2}^∞ 1/(n ln n) is divergent.

The series ∑_{n=2}^∞ 1/(n ln n) is divergent.

Chapter 5
Functions

Suppose that x represents the value of the length of a side of a square. Then its area depends on x and, in fact, is given by the formula: area = x^2. The area is a function of x. In general, if S is some given subset of R, then a real-valued function f on S is a rule or assignment by which to each element x ∈ S is associated some real number, denoted by f(x). We write f : S → R, which is read as "f maps S into R". One also writes x ↦ f(x), which is read as "x is mapped to the value f(x)". The set S is called the domain (of definition) of the function f. If x ∉ S, then f(x) has not been given a meaning.

More generally, if A and B are given sets, then a mapping g : A → B is an association a ↦ g(a) of each element a of A to some element g(a) ∈ B. For example, for each 0 ≤ t ≤ 1, let g(t) be the 2 × 2 matrix
g(t) = ( t   1  )
       ( 0   t^3 ).
Then t ↦ g(t) is an example of a mapping from the interval [0, 1] into the set of 2 × 2 real matrices. In general, if B is equal to either R or C, then the mapping is often referred to as a function.
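To make the rule-plus-domain point of view concrete, here is a minimal sketch (our own illustration, not part of the notes) of the matrix-valued mapping g described above; the function name g and the representation of the matrix as nested lists are our own choices.

```python
def g(t):
    """The mapping t -> [[t, 1], [0, t^3]] from the domain [0, 1] into the 2x2 real matrices."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("g(t) is only defined for t in the domain [0, 1]")
    return [[t, 1.0], [0.0, t ** 3]]

# Evaluate the mapping at a few points of its domain.
for t in (0.0, 0.5, 1.0):
    print(t, g(t))
```

Evaluating g at a point outside [0, 1] raises an error, reflecting the convention above that f(x) has not been given a meaning when x lies outside the domain.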
Note that a function may be given by a “pretty formula” but it does not have to be. For example, the function f : R → R with f (x) = 1 + x2 is given by a formula. To get f (x), we just substitute the value of x into the formula. However, the function 2 x < −1, x , x 7→ 1, −1 ≤ x ≤ 0, 3 x + 1, x > 0 is a perfectly good function, but is not given by a formula in the same way as the previous example. In fact, this function seems to be a concoction constructed from the functions x2 , 1 and x3 + 1. A slightly more involved example is if x ∈ /Q, 0, x 7→ 1/n, if x ∈ Q and x = k/n (with k ∈ Z, n ∈ N and where k and n have no common divisors). 81 82 Chapter 5 For a function to be well-defined, there must be specified (i) its domain of definition, (ii) some assignment giving the value it takes at each point of its domain. It is often very useful to consider the visual representation of f given by plotting the points (x, f (x)) in R2 . This is the graph of f . Examples 5.1. 1. Linear functions: x 7→ f (x) = mx + c for constants m, c ∈ R and x ∈ S. 2. Polynomials: x 7→ f (x) = a0 + a1 x + a2 x2 + · · · + an xn for x ∈ S, where the coefficients a0 , a1 , . . . an are constants in R and an 6= 0. n is the degree of such a polynomial. p(x) for x ∈ S where p and q are q(x) polynomials. Note that the right hand side is not defined for any values of x for which q(x) = 0. ( 1/x, x 6= 0 , 4. S = R, f (x) = 3, x = 0. ( x2 , x 6= 0 , 5. S = [−1, 1], f (x) = 2, x = 0. ( 0, x ∈ / Q, 6. S = R, f (x) = 1, x ∈ Q . ( 1, x ∈ / Q, (A thought: let g(x) = and let h = f + g. 0, x ∈ Q 3. Rational functions: x 7→ f (x) = Then we see that h(x) = f (x) + g(x) =R 1 for all x ∈ RR. Certainly R1 1 1 h(x) dx = 1 but what are the values of 0 f (x) dx and 0 g(x) dx and 0 is it true that Z 1 Z 1 Z 1 1= h(x) dx = f (x) dx + g(x) dx ?) 0 7. S = R, 0, 1, f (x) = 41 2, 1, 0 0 x < 0, 0 ≤ x < 1, 1 ≤ x < 6, x ≥ 6. This kind of step-function is familiar from probability theory — it is the cumulative distribution function f (x) = Prob{ X ≤ x } for a random variable X taking the values 0, 1 and 6 with probabilities 14 , 14 and 21 , respectively. Department of Mathematics 83 Functions Let f : S → R be a given function and let A ⊆ S. We say that f is bounded from above on A if there is some M such that f (x) ≤ M for all x ∈ A. Analogously, f is said to be bounded from below on A if there is some m such that f (x) ≥ m for all x ∈ A. If f is both bounded from above and from below on A, then we say that f is bounded on A. ½ ¾ ½ ¾ increasing f (x1 ) ≤ f (x2 ) We say that f is on A if for any x1 , x2 ∈ A decreasing f (x1 ) ≥ f (x2 ) with x1 < x2 . ½ ¾ ½ ¾ increasing f (x1 ) < f (x2 ) We say that f is strictly on A if for any decreasing f (x1 ) > f (x2 ) x1 , x2 ∈ A with x1 < x2 . Examples 5.2. 1. S = R, f (x) = x2 . Then f is bounded from below on R (by m = 0) but f is not bounded from above on R. f is strictly increasing on [0, ∞) and f is strictly decreasing on (−∞, 0]. f is bounded on any bounded interval [a, b]. (We see that 0 ≤ f (x) ≤ max{ a2 , b2 } on [a, b].) 2. S = [1, ∞), f (x) = 1 − x1 for x ∈ S. Then f is increasing and bounded on S. We see that f attains its glb, namely 0 but does not attain its lub, 1. Definition 5.3. Let f : S → R and let x0 ∈ S. We say that f is continuous at the point x0 if for any given ε > 0 there is some δ > 0 such that x ∈ S and |x − x0 | < δ =⇒ |f (x) − f (x0 )| < ε. We say that f is continuous on some given set A if f is continuous at each point of A. What’s going on? 
Note that continuity is defined at some point x0 . The idea is that one is first given a margin of error, this is the ε > 0. For f to be continuous at the specified point x0 , we demand that f (x) be within distance ε of f (x0 ) as long as x is suitably close to x0 , i.e., x is within some suitable distance δ of x0 . It must be possible to find such δ no matter how small ε is. In general, one must expect that the smaller ε is, then the smaller δ will need to be. The requirement that x ∈ S ensures that f (x) actually makes sense in the first place. If we set h = x − x0 , then we demand that f (x0 + h) be within distance ε of f (x0 ) whenever |h| < δ (provided that x0 + h ∈ S). The point x0 and the error value ε must be given first. Then one must be able to find a suitable δ as indicated. King’s College London 84 Chapter 5 We shall illustrate the idea with a simple example (so no surprises here). Example 5.4. Let S = R and set f (x) = x2 . Let x0 be arbitrary (but fixed). We shall show that f is continuous at x0 . The procedure is as follows. Let ε > 0 be given. We must find some δ > 0 such that |f (x) − f (x0 )| < ε whenever |x − x0 | < δ. For convenience, write x = x0 + h. We see that ¯ ¯ ¯ ¯ ¯ ¯ (∗) |f (x) − f (x0 )| = ¯x2 − x20 ¯ = ¯(x0 + h)2 − x20 ¯ = ¯2x0 h + h2 ¯ . How small must h be in order for this to be smaller than ε? We do not need an optimal estimate, any will do. One idea would be to notice that ¯ ¯ ¯2x0 h + h2 ¯ ≤ |2x0 h| + h2 and then try to make each of these two terms smaller than 12 ε, that is, we try to make sure that both |2x0 h| < 12 ε and h2 < 21 ε. This suggests the two requirements that |h| < ε/(4 |x0 |) and p |h| < ε/2. We must be careful here because it might happen that x0 = 0, in which case we cannot divide by |x0 |. To side-step this nuisance, we shall consider the two cases x0 = 0 and x0 6= 0 separately. So first suppose that x0 6= 0. Then p we simply choose δ to be the minimum of the two terms ε/(4 |x0 |) and ε/2. This will ensure that if |h| < δ then |f (x) − f (x0 )| < ε. Next, suppose that x0 = 0. Then the right hand side of (∗) is simply equal √ to h2 . If we choose δ = ε, then |h| < δ implies that h2 < ε and so, by (∗), we have |f (x) − f (x0 )| < ε. In either of the cases x0 = 0 or x0 6= 0, we have exhibited a suitable δ so that |x − x0 | < δ implies that |f (x) − f (x0 )| < ε. We have shown that f (x) = x2 is continuous at any given point x0 ∈ R and the proof is complete. Notice that the δ depends on both ε and x0 . We must always expect this to happen (even though in some trivial situations it might not). The following theorem gives us an extremely useful characterization of continuity. Theorem 5.5. Let f : S → R and let x0 ∈ S. The following two statements are equivalent: (i) f is continuous at x0 ; (ii) if (an )n∈N is any sequence in S such that an → x0 as n → ∞, then the sequence (f (an ))n∈N converges to f (x0 ). Proof. Suppose that statement (i) holds. To show that (ii) is also true, let (an ) be any sequence in S with the property that an → x0 as n → ∞. We must show that f (an ) → f (x0 ) as n → ∞. Department of Mathematics 85 Functions Let ε > 0 be given. By hypothesis, f is continuous at x0 and so there is some δ > 0 such that |x − x0 | < δ =⇒ |f (x) − f (x0 )| < ε. (∗) But an → x0 as n → ∞ and so there exists N ∈ N such that n > N =⇒ |an − x0 | < δ. (∗∗) Evidently, (∗) and (∗∗) together (with x = an ) tell us that n > N =⇒ |f (an ) − f (x0 )| < ε which means that (f (an )) converges to f (x0 ) as n → ∞, as required. 
Now suppose that (ii) holds. We must show that this implies that f is continuous at any x0 ∈ S. Suppose that this were not true, that is, let us suppose that f is not continuous at the point x0 ∈ S. What does this mean? It means that there is some ε0 > 0 such that it is false that there is some δ > 0 so that x ∈ S and |x − x0 | < δ =⇒ |f (x) − f (x0 )| < ε0 . That is, there is some ε0 > 0 such that no matter what δ > 0 we choose, it will be false that x ∈ S and |x − x0 | < δ =⇒ |f (x) − f (x0 )| < ε0 . That is, there is some ε0 > 0 such that for any δ > 0 there is some x ∈ S with |x − x0 | < δ such that it is false that |f (x) − f (x0 )| < ε0 . That is, there is some ε0 > 0 such that for any δ > 0 there is some x ∈ S with |x − x0 | < δ such that |f (x) − f (x0 )| ≥ ε0 . Note that x may well depend on δ. How does this help? For given n ∈ N, set δ = n1 . Then, according to the discussion above, there is some point x ∈ S with |x − x0 | < n1 but such that |f (x) − f (x0 )| ≥ ε0 . The number x could depend on n, so let us relabel it and call it an . Then |an − x0 | < 1 n but |f (an ) − f (x0 )| ≥ ε0 . If we do this for each n ∈ N we get a sequence (an )n∈N in S which clearly converges to x0 . However, because |f (an ) − f (x0 )| ≥ ε0 for all n ∈ N, the sequence (f (an ))n∈N does not converge to f (x0 ). This is a contradiction (we started with the hypothesis that (ii) was true). Therefore our assumption that f was not continuous on S is wrong and we conclude that f is indeed continuous on S. This completes the proof that the truth of statement (ii) implies that of statement (i). King’s College London 86 Chapter 5 We can now apply this theorem, together with various known results about sequences, to establish some (not very surprising but) basic properties of continuous functions. Theorem 5.6. Suppose that f : S → R, g : S → R and that α ∈ R. Suppose that x0 ∈ S and that f and g are continuous at x0 . Then (i) The sum f + g is continuous at x0 . (ii) αf is continuous at x0 . (iii) The product f g is continuous at x0 . (iv) If g does not vanish on S, then the quotient f /g is defined on S and is continuous at x0 . Proof. Suppose that (an ) is any sequence in S with the property that an → x0 as n → ∞. Then we know from the previous theorem that f (an ) → f (x0 ) and also that g(an ) → g(x0 ) as n → ∞. It follows that (i) The sum (f + g)(an ) = f (an ) + g(an ) → f (x0 ) + g(x0 ) = (f + g)(x0 ) as n → ∞. (ii) αf (an ) → αf (x0 ) as n → ∞. (iii) The product (f g)(an ) = f (an )g(an ) → f (x0 )g(x0 ) as n → ∞. (iv) Since g does not vanish on S, the quotient f /g is well-defined on S. Moreover, g(an ) 6= 0 for any n ∈ N and so (f /g)(an ) = f (an )/g(an ) → f (x0 )/g(x0 ) as n → ∞. Now applying the previous theorem once again proves (i)—(iv). Remark 5.7. We could also have proved the above facts directly from the definition of continuity. For example, a proof that f + g is continuous at x0 is as follows. Let ε > 0 be given. Then there is some δ 0 > 0 such that |x − x0 | < δ 0 (and x ∈ S) =⇒ |f (x) − f (x0 )| < 21 ε. (∗) The reason for using 12 ε rather than ε will become clear below. Similarly, there is some δ 00 > 0 such that |x − x0 | < δ 00 (and x ∈ S) =⇒ |g(x) − g(x0 )| < 21 ε. Department of Mathematics (∗∗) 87 Functions Now, let δ = min{ δ 0 , δ 00 }. 
Then, from (∗) and (∗∗), |(f + g)(x) − (f + g)(x0 )| = |f (x) − f (x0 ) + g(x) − g(x0 )| ≤ |f (x) − f (x0 )| + |g(x) − g(x0 )| < 21 ε + 12 ε = ε whenever |x − x0 | < δ (and x ∈ S) and so, by definition, it follows that f + g is continuous at the point x0 in S. Remark 5.8. The function f (x) = x is continuous on R and so with g = f , we deduce from the theorem that f 2 (x) is also continuous on R. This is just the statement that the function x2 is continuous. By induction, we can deduce from the theorem that products of continuous functions and also finite linear combinations of continuous functions are continuous, i.e., if f1 , . . . , fk are each continuous at x0 , then so is the product function f1 f2 . . . fk as well as the linear combination α1 f1 + · · · + αk fk , for any α1 , . . . , αk ∈ R. In particular, any power of a continuous function is continuous and taking f (x) = x, we see that any polynomial a0 + a1 x + · · · + an xn is continuous on R. Example 5.9. The function x 7→ shown as follows. √ x is continuous on [0, ∞). This can be Let x0 ∈ [0, ∞) be fixed and let ε > 0 be given. Suppose first that x0 > 0. For any x ≥ 0, we have √ √ √ | x − x0 | √ √ √ | x + x0 | | x − x0 | = √ √ | x + x0 | | x − x0 | =√ √ x + x0 | x − x0 | < √ x0 <ε √ provided |x − x0 | < δ where we have chosen δ = ε x0 . To conclude, consider the case x0 = 0. Then we simply observe that √ √ √ | x − x0 | = x <ε whenever |x − 0| < δ with δ chosen to be ε2 . King’s College London 88 Chapter 5 Example 5.10. The function x 7→ 1/x, for x > 0, is continuous on (0, ∞). Let f (x) = 1/x for x > 0. To show that f is continuous on (0, ∞), let x0 ∈ (0, ∞) be given and suppose that (an )n∈N is any sequence in (0, ∞) such that an → x0 as n → ∞. We know that this means that 1/an → 1/x0 , that is, f (an ) → f (x0 ) as n → ∞. But this implies that f is continuous at x0 , as required. Note that f is bounded from below (by 0) but f is not bounded from above on (0, ∞). For any M > 0, there is k ∈ N such that k > M , by the Archimedean Property. Hence, if 0 < x < 1/k, then f (x) = 1/x > k > M . It follows that there is no constant M such that f (x) < M for all x ∈ (0, ∞), that is, f is not bounded from above on (0, ∞). From the example above, we see that if f (x) = 1/x for x in any interval of the form (0, b), say, then f is continuous on (0, b) but is not bounded there. This situation cannot happen on closed intervals. This is the content of the following important theorem. Theorem 5.11. Suppose that the function f : [a, b] → R is continuous on the closed interval [a, b]. Then f is bounded on [a, b]. Proof. We argue by contradiction. Suppose that f is continuous on [a, b] but is not bounded. Suppose that f is not bounded from above. This means that for any given M whatsoever, there will be some x ∈ [a, b] such that f (x) > M . In particular, for each n ∈ N (taking M = n) we know that there is some point an , say, in the interval [a, b] such that f (an ) > n. Consider the sequence (an )n∈N . This sequence lies in the bounded interval [a, b] and so, by the Bolzano-Weierstrass Theorem, it has a convergent subsequence (ank )k∈N , say; ank → α as k → ∞. Since a ≤ ank ≤ b for all k, it follows that a ≤ α ≤ b. (The limit of a convergent sequence belonging to a closed interval also belongs to the same closed interval.) But, by hypothesis, f is continuous at α and so ank → α implies that f (ank ) → f (α). It is this that will provide our sought after contradiction. 
By construction, f (ank ) > nk and so it looks rather unlikely that (f (ank )) could converge. To see that this is the situation, we observe that there is some K ∈ N such that |f (ank ) − f (α)| < 1 for all k > K (because f (ank ) → f (α)). But then f (ank ) = f (ank ) − f (α) + f (α) ≤ |f (ank ) − f (α)| + f (α) < 1 + f (α) for all k > K. However, f (ank ) > nk ≥ k so 1 + f (α) > k for all k ∈ N. This is a contradiction and we conclude that f is bounded from above. Department of Mathematics Functions 89 To show that f is also bounded from below, we consider g = −f . Then g is continuous because f is. The argument just presented, applied to g, shows that g is bounded from above. But this just means that f is bounded from below and the proof is complete. Remark 5.12. The two essential ingredients are that f is continuous and that the interval is both closed and bounded. The boundedness was required so that we could invoke the Bolzano-Weierstrass Theorem and the fact that it was closed ensured that α, the limit of the Bolzano-Weierstrass convergent subsequence actually also belonged to the interval. This in turn guaranteed that f was not only defined at α but was continuous there. If we try to relax these requirements, we see that the conclusion of the theorem need no longer be true. For example, we must insist that f be continuous. Indeed, consider the function f on the closed interval [0, 1] given by ( 0, for x = 0 f (x) = 1/x, for 0 < x ≤ 1. Evidently f is not bounded on [0, 1] but then f is not continuous at the point x = 0. Taking f (x) = 1/x for x ∈ (0, 1], we see that again f is not bounded on (0, 1], but then (0, 1] is not a closed interval. Let f (x) = x for x ∈ [0, ∞). Again, f is not bounded on the interval [0, ∞) but this interval is not bounded. We have seen that a continuous function on a closed interval is bounded. The next theorem tells us that it attains its bounds. Theorem 5.13. Suppose that f is continuous on the closed interval [a, b]. Then there is some α ∈ [a, b] and β ∈ [a, b] such that f (α) ≤ f (x) ≤ f (β) for all x ∈ [a, b]. In other words, if ran f = { f (x) : x ∈ [a, b] } is the range of f , then f (α) = inf ran f = min ran f and f (β) = sup ran f = max ran f . Proof. We have seen that f is bounded. Let m = inf ran f and M = sup ran f . By definition of the supremum, there is some sequence (yn ) in ran f such that yn → M as n → ∞. Since yn ∈ ran f , there is some xn ∈ [a, b] such that yn = f (xn ). By the Bolzano-Weierstrass Theorem, (xn ) has a convergent subsequence (xnk )k∈N . Let β = limk xnk . Then β ∈ [a, b]. Since f is continuous on [a, b], it follows that f (xnk ) → f (β) as k → ∞. But f (xnk ) = ynk and (ynk )k∈N is a subsequence of the convergent sequence (yn ). Therefore (ynk )k∈N converges to the same limit, that is, ynk → M as k → ∞. Since ynk = f (xnk ) → f (β) as k → ∞, we deduce that M = f (β). That is, sup ran f = f (β) and so f (x) ≤ f (β) for all x ∈ [a, b]. King’s College London 90 Chapter 5 We can argue in a similar way to show that there is some α ∈ [a, b] such that m = f (α). However, we can draw the same conclusion using the above result as follows. Note that if g = −f , then g is continuous on the interval [a, b] and sup ran g = −m. By the argument above, there is some α ∈ [a, b] such that −m = g(α). This gives the desired result that m = f (α). Alternative Proof. We know that f is bounded. Let M = sup ran f . To show that f achieves its least upper bound M , we suppose not and obtain a contradiction. 
Since M is an upper bound and is not achieved by f , we must have that f (x) < M for all x ∈ [a, b]. In particular, M − f is continuous and strictly positive on [a, b]. It follows that h = 1/(M − f ) is also continuous and positive on [a, b]. But then h is bounded on [a, b] and so there is some constant K such that 0 < h ≤ K on [a, b], that is, 0< 1 ≤K. M −f Hence f ≤ M − 1/K which says that M − 1/K is an upper bound for f on [a, b]. But then this contradicts the fact that M is the least upper bound for f on [a, b]. We conclude that f achieves this bound, i.e., there is β ∈ [a, b] such that f (β) = M = sup ran f . In a similar way, if f does not achieve its greatest lower bound, m, then f − m is continuous and strictly positive on [a, b]. Hence there is L such that 1 0< ≤L f −m on [a, b]. Hence m + 1/L ≤ f and m + 1/L is a lower bound for f on [a, b]. This contradicts the fact that m is the greatest lower bound for f on [a, b] and we can conclude that f does achieve its greatest lower bound, that is, there is α ∈ [a, b] such that f (α) = m. Theorem 5.14 (Intermediate-Value Theorem). Any real-valued function f continuous on the interval [a, b] assumes all values between f (a) and f (b). In other words, if ζ lies between the values f (a) and f (b), then there is some s with a ≤ s ≤ b such that f (s) = ζ. Proof. Suppose f is continuous on [a, b] and let ζ be any value between f (a) and f (b). If ζ = f (a), take s = a and if ζ = f (b) take s = b. Suppose that f (a) < f (b) and let f (a) < ζ < f (b). Let A be the set A = { x ∈ [a, b] : f (x) < ζ }. Then a ∈ A and so A is a non-empty subset of the bounded interval [a, b]. Hence A is bounded and so has a least upper bound, s, say. We shall show that f (s) = ζ. Since s = lub A, there is some sequence (an ) in A such that an ↑ s. But A ⊆ [a, b] and so a ≤ an ≤ b and it follows that a ≤ s ≤ b. Furthermore, Department of Mathematics 91 Functions by the continuity of f at s, it follows that f (an ) → f (s). However, an ∈ A and so f (an ) < ζ for each n and it follows that f (s) ≤ ζ. Since, in addition, ζ < f (b), we see that s 6= b and so we must have a ≤ s < b. Let (tn ) be any sequence in (s, b) such that tn → s. Since tn ∈ [a, b] and tn > s, it must be the case that tn ∈ / A, that is, f (tn ) ≥ ζ. Now, f is continuous at s and so f (tn ) → f (s) which implies that f (s) ≥ ζ. We deduce that f (s) = ζ, as required. Now suppose that f (a) > ζ > f (b). Set g(x) = −f (x). Then we have that g(a) < −ζ < g(b) and applying the above result to g, we can say that there is s ∈ [a, b] such that g(s) = −ζ, that is f (s) = ζ and the proof is complete. Corollary 5.15. Suppose that f is continuous on [a, b]. Then ran f , the range of f , is a closed interval [m, M ]. Proof. We know that f is bounded and that f achieves its bounds, that is, there is α ∈ [a, b] and β ∈ [a, b] such that m = inf ran f = f (α) ≤ f (x) ≤ M = sup ran f = f (β) for all x ∈ [a, b]. Evidently, ran f ⊆ [m, M ]. Let c obey m ≤ c ≤ M . By the Intermediate-Value Theorem, there is some s between α and β such that f (s) = c. In particular, c ∈ ran f and so we conclude that ran f = [m, M ]. Example 5.16. f (x) = x6 + 3x2 − 1 has a zero inside the interval [0, 1]. To see this, we simply notice that f (0) = −1 and f (1) = 3. Since f is continuous on R, it is continuous on [0, 1] and so, by the Intermediate-Value Theorem, f assumes every value between −1 and 3 over the interval [0, 1]. In particular, there is some s ∈ [0, 1] such that f (s) = 0, as claimed. 
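The Intermediate-Value argument of Example 5.16 also suggests a way of locating the zero numerically: repeatedly halve the interval, keeping the half on which f changes sign. The following is a rough sketch of this idea (ours, not from the notes); the function name bisect and the choice of tolerance are our own.

```python
def bisect(f, a, b, tol=1e-10):
    """Locate a zero of a continuous f on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    assert fa * f(b) < 0, "f must change sign on [a, b]"
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:   # a sign change (or an exact zero) lies in [a, m]
            b = m
        else:              # otherwise it lies in [m, b]
            a, fa = m, fm
    return (a + b) / 2

f = lambda x: x ** 6 + 3 * x ** 2 - 1
s = bisect(f, 0.0, 1.0)
print(s, f(s))   # s is approximately 0.5676 and f(s) is zero to within rounding
```

Each halving step is justified by exactly the Intermediate-Value Theorem applied to the subinterval on which the sign change occurs.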
Of course, this argument does not tell us whether such s is unique or not. (In fact, it is, because f is strictly increasing on [0, ∞) and so cannot take any value twice on [0, ∞). A moment's reflection reveals that f(x) ≥ f(0) = −1, f is not bounded from above and f(−x) = f(x). Therefore f assumes every value in the range (−1, ∞) exactly twice and assumes the value −1 at the single point x = 0.)

Example 5.17 (Thomae's function). We wish to exhibit a function which is continuous at each irrational point in [0, 1] but is not continuous at any rational point in [0, 1]. Such a function was constructed by Thomae in 1875.

Any rational number x may be written as x = p/q, where we may assume that p and q are coprime and that p ∈ Z and q ∈ N. This done, we define ϕ : Q → R by setting ϕ(x) = 1/q where x = p/q. For example,
ϕ(x) = 1 for x = 1,
ϕ(x) = 1/2 for x = 1/2,
ϕ(x) = 1/3 for x = 1/3, 2/3,
ϕ(x) = 1/4 for x = 1/4, 3/4,
...
ϕ(x) = 1/11 for x = 1/11, 2/11, ..., 10/11,
and so on. Suppose x ∈ Q obeys 0 < x < 1 and that ϕ(x) = 1/q. Then x must be of the form x = p/q for some p ∈ N with 1 ≤ p ≤ q − 1. In particular, for any given q ∈ N, { x ∈ Q : 0 < x < 1 and ϕ(x) = 1/q } is a finite set of rational numbers.

Next, we define f : [0, 1] → R with the help of ϕ as follows:
f(x) = 1, if x = 0,
f(x) = ϕ(x), if x ∈ Q ∩ [0, 1],
f(x) = 0, if x ∈ [0, 1] and x ∉ Q.

Claim: f is discontinuous at every rational in [0, 1].
Proof. First we note that f(0) = 1 and that f(x) = 1/q when x has the form p/q (with p, q coprime). In any event, f(x) > 0 for any given rational x in [0, 1]. Now let r ∈ Q ∩ [0, 1] be given and let (x_n) be any sequence of irrationals in [0, 1] which converge to r. (For example, if r ≠ 0 we could let x_n = r(1 − 1/(n√2)), but otherwise let x_n = 1/(n√2).) Then f(x_n) = 0 for every n, so it cannot be true that f(x_n) → f(r) (because f(r) > 0); that is, f fails to be continuous at r, as claimed.

Claim: f is continuous at every irrational in [0, 1].
Proof. Let x_0 be any given irrational number with 0 < x_0 < 1. Then f(x_0) = 0. Let ε > 0 be given. We must show that there is some δ > 0 such that
x ∈ [0, 1] and |x − x_0| < δ =⇒ |f(x) − f(x_0)| < ε.   (∗)
(Note that here |f(x) − f(x_0)| = |f(x) − 0| = |f(x)|.) Now f(0) = f(1) = 1 and so (∗) must fail for x = 0 or x = 1 if ε < 1. Furthermore, (∗) fails if x = p/q (p, q coprime) and ϕ(x) = 1/q ≥ ε, that is, q ≤ 1/ε. In other words, (∗) will fail if x = 0, x = 1 or else x ∈ Q ∩ [0, 1] and ϕ(x) = 1/q where q ≤ 1/ε. However, there are only finitely-many numbers q ∈ N obeying q ≤ 1/ε and so the set
A = { r ∈ Q ∩ [0, 1] : r = 0 or ϕ(r) ≥ ε }
is finite. Write A = { r_1, . . . , r_m }. Since x_0 ∉ Q, it follows that x_0 ≠ r_j for any 1 ≤ j ≤ m. For each 1 ≤ j ≤ m, let δ_j = |x_0 − r_j| and let δ = min{ δ_j : 1 ≤ j ≤ m }. Then δ > 0 and if x obeys |x − x_0| < δ, it must be the case that x ≠ r_j for any 1 ≤ j ≤ m. It follows that if x ∈ [0, 1] and obeys |x − x_0| < δ, then either x ∉ Q, and so f(x) = 0, or else x ∈ Q but x ∉ A, and so f(x) = ϕ(x) < ε. In any event, (∗) holds and so f is continuous at x_0, as required.

Differentiability

We know from calculus that the slope of the tangent to the graph of a function f at some point is given by the so-called derivative at the point in question. To find this slope, one considers the limiting behaviour of the Newton quotient
(f(a + h) − f(a)) / h
as h approaches 0. We wish to set this up formally.

Definition 5.18.
We say that the function f is differentiable at the point a (a) if limh→0, h6=0 f (a+h)−f exists, that is, if there is some ξ ∈ R such that for h any ε > 0 there is some δ > 0 such that ¯ f (a + h) − f (a) ¯ ¯ ¯ 0 < |h| < δ =⇒ ¯ − ξ ¯ < ε. h The real number ξ is called the derivative of f at a and is usually written df as f 0 (a) or as dx (a). Remarks 5.19. f (a + h) − f (a) is not defined for h = 0 h and clearly, it will only make any sense if both f (a) and f (a + h) are defined. We shall take it to be part of the definition that this is true, at least for suitably small values of h. That is, we assume that there is some (possibly very small) open interval around a of the form (a − ρ, a + ρ) on which f is defined. This means that if a function is defined only on the integers Z, say, then it will not make any sense to discuss its differentiability. 1. Note that the Newton quotient 2. We see immediately that if f is constant, then f (a + h) = f (a) for any h and so the Newton quotient is zero for all h 6= 0 and therefore f is indeed differentiable at a with derivative f 0 (a) = 0. 3. Suppose that f is differentiable at a with derivative f 0 (a). Let Φf,a be the function given by f (x) − f (a) , x 6= a Φf,a (x) = x−a f 0 (a) , x = a. Then f (a + h) − f (a) , h 6= 0 Φf,a (a + h) = h f 0 (a) , h = 0. Department of Mathematics 95 Functions By definition of differentiability, for any given ε > 0 there is some δ > 0 such that ¯ ¯ 0 < |h| < δ =⇒ ¯Φf,a (a + h) − f 0 (a)¯ < ε , that is, 0 < |h| < δ =⇒ |Φf,a (a + h) − Φf,a (a)| < ε . (∗) Now, (∗) is still valid if we allow h = 0 and so (with x = a + h), we see that |x − a| < δ =⇒ |Φf,a (x) − Φf,a (a)| < ε . In other words, the differentiability of f implies that Φf,a is continuous at x = a. ( x3 , x ≥ 0 Example 5.20. Let f (x) = What is f 0 (x)? x2 , x < 0 . Consider the region x > 0. Here, f (x) = x3 and so f is differentiable with derivative 3x2 for any x > 0. In the region x < 0, f (x) = x2 and so f 0 (x) = 2x for any x < 0. What about x = 0? We must argue from first principles. The Newton quotient (with a = 0) is 3 h − 0 2 f (0 + h) − f (0) h = h = 2 h h − 0 = h h → 0 as h → 0. for h > 0 for h < 0 Hence f is differentiable at x = 0 with derivative f 0 (0) = 0. Proposition 5.21. If f is differentiable at a, then f is continuous at a. Proof. The idea is straightforward. For h 6= 0, we can write f (a + h) − f (a) = ³ f (a + h) − f (a) ´ h h. The first term on the right hand side approaches f 0 (a) as h → 0 and so the whole right hand side should approach zero as h → 0. Looking at the left hand side, this means that f (a + h) approaches f (a) as h → 0. Formally, we have f (x) = f (a) + Φf,a (x) (x − a) . The right hand side is the product of the two functions Φf,a (x) and (x − a), each being continuous at x = a and so the same is true of their product. Therefore the left hand side is continuous at x = a, as required. King’s College London 96 Chapter 5 Example 5.22. The converse to Proposition 5.21 is false. As an example, consider f (x) = |x| for x ∈ R. Then f is continuous at every x ∈ R. However, f is not differentiable at x = 0. Indeed, ( 1, if h > 0 f (0 + h) − f (0) |h + 0| − |0| |h| = = = h h h −1, if h < 0 so the Newton quotient does not have a limit as h → 0 (with h 6= 0) and consequently f is not differentiable at x = 0. The following are familiar and very important rules. Proposition 5.23. Suppose that f and g are differentiable at x0 . (i) For any α ∈ R, αf is differentiable at x0 and (αf )0 (x0 ) = α f 0 (x0 ). 
(ii) The sum f + g is differentiable at x0 and (f + g)0 (x0 ) = f 0 (x0 ) + g 0 (x0 ) . (iii) The product f g is differentiable at x0 and (f g)0 (x0 ) = f 0 (x0 ) g(x0 ) + f (x0 ) g 0 (x0 ) . (iv) Suppose that f 6= 0. Then 1/f is differentiable at x0 and ³ 1 ´0 f (x0 ) = −f 0 (x0 ) . (f (x0 ))2 Proof. In the following, h is small but h 6= 0. (i) We have (α f )(x0 + h) − (α f )(x0 ) α f (x0 + h) − α f (x0 ) = h h = α Φf,x0 (x0 + h) → α f 0 (x0 ) as h → 0. (ii) We have (f + g)(x0 + h) − (f + g)(x0 ) f (x0 + h) + g(x0 + h) − f (x0 ) − g(x0 ) = h h f (x0 + h) − f (x0 ) g(x0 + h) − g(x0 ) = + h h → f 0 (x0 ) + g 0 (x0 ) as h → 0. Department of Mathematics 97 Functions (iii) We have (f g)(x0 + h) − (f g)(x0 ) f (x0 + h) g(x0 + h) − f (x0 ) g(x0 ) = h h f (x0 + h) − f (x0 ) g(x0 + h) − g(x0 ) = g(x0 + h) + f (x0 ) h h → f 0 (x0 ) g(x0 ) + g 0 (x0 ) f (x0 ) as h → 0, since g is continuous at x0 . (iv) We have µ ¶ 1/f (x0 + h) − 1/f (x0 ) 1 1 1 = − h h f (x0 + h) f (x0 ) µ ¶ 1 f (x0 ) − f (x0 + h) = h f (x0 + h) f (x0 ) − Φf (x0 + h) = f (x0 + h) f (x0 ) − f 0 (x0 ) → (f (x0 ))2 as h → 0 since f is continuous at x0 . Recall that f ◦ g denotes the composition x 7→ f (g(x)) (function of a function). Of course, for this to be well-defined the range of g must be contained in the domain of definition of f . In the following, we assume that this is satisfied. Theorem 5.24 (Chain Rule). Suppose that g is differentiable at x0 and that f is differentiable at v0 = g(x0 ). Then the composition f ◦ g is differentiable at x0 and (f ◦ g)0 (x0 ) = f 0 (g(x0 )) g 0 (x0 ) . Proof. Suppose that h is small and that h 6= 0. Let v0 = g(x0 ) and put λ = g(x0 + h) − g(x0 ) so that g(x0 + h) = v0 + λ. Then (f ◦ g)(x0 + h) − (f ◦ g)(x0 ) f (g(x0 + h)) − f (g(x0 )) = h h f (v0 + λ) − f (v0 ) = h 1 = Φf,v0 (v0 + λ) λ (even if λ = 0) h ³ g(x + h) − g(x ) ´ 0 0 = Φf,v0 (v0 + λ) h = Φf,v0 (v0 + λ) Φg,x0 (x0 + h) . King’s College London 98 Chapter 5 Now, g(x0 + h) → g(x0 ) as h → 0 because g is continuous at x0 . In other words, λ = g(x0 + h) − g(x0 ) → 0 as h → 0. It follows that Φf,v0 (v0 + λ) Φg,x0 (x0 + h) → Φf,v0 (v0 ) Φg,x0 (x0 ) = f 0 (v0 ) g 0 (x0 ) = f 0 (g(x0 )) g 0 (x0 ) as h → 0 and the result follows. Imagine a function f (x) on the interval [0, 1], say, which has the property that f (0) = f (1). Can we draw any conclusions about the behaviour of f (x) for x between 0 and 1? It seems clear that either f is constant on [0, 1] or else “goes up and or down” but in any event must have a “turning point”. We know from calculus that this should demand that f 0 be zero somewhere. However, it is clear that f cannot be entirely arbitrary for this to be true. For example, suppose that f (0) = 0 = f (1) and that f (x) = 5x for 0 < x < 1. Evidently f 0 is never zero. In fact, f 0 (x) = 5 for 0 < x < 1. We note that f is not continuous at x = 1. As another example, consider f (x) = 1 − |x| for x ∈ [−1, 1]. We see that f (−1) = 0 = f (1) but is it true that f 0 is zero for x between −1 and 1? No, it is not. We see that f 0 (x) = 1 for −1 < x < 0 and that f 0 (x) = −1 for 0 < x < 1 and f is not differentiable at x = 0. In this example, f is continuous on [−1, 1] but fails to be differentiable on (−1, 1). If we impose suitable continuity and differentiability hypotheses, then what we want will be true. Theorem 5.25 (Rolle’s Theorem). Suppose that f is continuous on the closed interval [a, b] and is differentiable in the open interval (a, b). Suppose further that f (a) = f (b). Then there is some ξ ∈ (a, b) such that f 0 (ξ) = 0. 
(Note that ξ need not be unique.) Proof. Since f is continuous on [a, b], it follows that f is bounded and attains its bounds, by Theorem 5.13. Let m = inf{ f (x) : x ∈ [a, b] } and let M = sup{ f (x) : x ∈ [a, b] }, so that m ≤ f (x) ≤ M , for all x ∈ [a, b]. If m = M , then f is constant on [a, b] and this means that f 0 (x) = 0 for all x ∈ (a, b). In this case, any ξ ∈ (a, b) will do. Suppose now that m 6= M , so that m < M . Since f (a) = f (b) at least one of m or M must be different from this common value f (a) = f (b). Suppose that M 6= f (a) ( = f (b)). As noted above, by Theorem 5.13, there is some ξ ∈ [a, b] such that f (ξ) = M . Now, M 6= f (a) and M 6= f (b) and so ξ 6= a and ξ 6= b. It follows that ξ belongs to the open interval (a, b). Department of Mathematics 99 Functions We shall show that f 0 (ξ) = 0. To see this, we note that f (x) ≤ M = f (ξ) for any x ∈ [a, b] and so (putting x = ξ +h) it follows that f (ξ +h)−f (ξ) ≤ 0 provided |h| is small enough to ensure that ξ + h ∈ [a, b]. Hence f (ξ + h) − f (ξ) ≤ 0 for h > 0 and small h (∗) f (ξ + h) − f (ξ) ≥ 0 for h < 0 and small. h (∗∗) and But (∗) approaches f 0 (ξ) as h ↓ 0 which implies that f 0 (ξ) ≤ 0. On the other hand, (∗∗) approaches f 0 (ξ) as h ↑ 0 and so f 0 (ξ) ≥ 0. Putting these two results together, we see that it must be the case that f 0 (ξ) = 0, as required. It remains to consider the case when M = f (a). This must require that m < f (a) ( = f (b)). We proceed now just as before to deduce that there is some ξ ∈ (a, b) such that f (ξ) = m and so (∗) and (∗∗) hold but with the inequalities reversed. However, the conclusion is the same, namely that f 0 (ξ) = 0. Theorem 5.26 (Mean Value Theorem). Suppose that f is continuous on the closed interval [a, b] and differentiable on the open interval (a, b). Then there is some ξ ∈ (a, b) such that f 0 (ξ) = f (b) − f (a) . b−a Proof. Let y = `(x) = mx + c be the straight line passing through the pair of points (a, f (a)) and (b, f (b)). Then the slope m is equal to the ratio (f (b) − f (a))/(b − a). Let g(x) = f (x) − `(x). Evidently, g is continuous on [a, b] and differentiable on (a, b) (because ` is). Furthermore, since `(a) = f (a) and `(b) = f (b), by construction, we find that g(a) = 0 = g(b). By Rolle’s Theorem, Theorem 5.25, applied to g, there is some ξ ∈ (a, b) such that g 0 (ξ) = 0. However, g 0 (x) = f 0 (x) − m for any x ∈ (a, b) and so f 0 (ξ) = m = f (b) − f (a) b−a and the proof is complete. We know that a function which is constant on an open interval is differentiable and that its derivative is zero. The converse is true (so no surprise there then). King’s College London 100 Chapter 5 Corollary 5.27. Suppose that f is differentiable on the open interval (a, b) and that f 0 (x) = 0 for all x ∈ (a, b). Then f is constant on (a, b). Proof. Let α and β be any pair of points in (a, b). We shall show that f (α) = f (β). By relabelling, if necessary, we may suppose that α < β. By hypothesis, f is differentiable at each point in the closed interval [α, β] and so is also continuous there, by Proposition 5.21. f obeys the hypotheses of the Mean value Theorem on [α, β] and so we can say that there is some ξ ∈ (α, β) such that f (β) − f (α) f 0 (ξ) = . β−α However, f 0 vanishes on (a, b) and so f 0 (ξ) = 0 which means that we must have f (α) = f (β) and the result follows. Remark 5.28. The Mean Value Theorem can sometimes be useful for obtaining inequalities. 
For example, setting f (x) = sin x and assuming standard properties of the trigonometric functions, we can apply the Mean Value Theorem to f on the interval [0, x] for x > 0 to find that f 0 (ξ) = f (x) − f (0) x−0 or cos ξ = sin x x for some ξ ∈ (0, x). However, cos θ ≤ 1 for all θ and so we find that sin x ≤ x for all x > 0. Similarly, applying the Mean Value Theorem to f (x) = ln(1 + x) on the interval [0, x], we find that f 0 (ξ) = f (x) − f (0) x−0 or 1 ln(1 + x) = 1+ξ x for some ξ ∈ (0, x). But then 1/(1 + ξ) < 1 and we find that ln(1 + x) < x for any x > 0. These inequalities could also have easily been obtained from the fact that the integral of a positive function is positive. Indeed, Z x x − sin x = (1 − cos t) dt ≥ 0 . 0 In the same way, Z x − ln(1 + x) = 0 x¡ 1− 1 1+t ¢ dt ≥ 0. In fact, one can show that both integrals are strictly positive if x > 0 so this last method gives the strict inequalities sin x < x and ln(1 + x) < x for all x > 0. (In this connection, note that if ln(1 + x) = x, then 1 + x = ex . This is not possible for any x > 0 as is seen from the series expansion for ex .) Department of Mathematics 101 Functions Suppose that f and g are continuous on [a, b], differentiable on (a, b) and that g 0 is never zero on (a, b). The Mean Value Theorem applied to f and g tells us that there is some ξ and η in (a, b) such that f (b) − f (a) = f 0 (ξ) and b−a g(b) − g(a) = g 0 (η) . b−a Dividing (and noting that g(b) − g(a) 6= 0 since g 0 (η) 6= 0, by hypothesis), gives f (b) − f (a) f 0 (ξ) = 0 . g(b) − g(a) g (η) It is possible to do a little better. Theorem 5.29 (Cauchy’s Mean Value Theorem). Suppose that f and g are continuous on [a, b] and differentiable on (a, b). Suppose further that g 0 is never zero on (a, b). Then there is some ξ ∈ (a, b) such that f (b) − f (a) f 0 (ξ) = 0 . g(b) − g(a) g (ξ) Proof. First, we observe that if g(a) = g(b) then Rolle’s Theorem tells us that g 0 (η) = 0 for some η ∈ (a, b). However, g 0 has no zeros on (a, b), by hypothesis, and so it follows as noted above that g(a) 6= g(b). Set ¡ ¢ ¡ ¢ ϕ(x) = g(b) − g(a) f (x) − f (b) − f (a) g(x) . Then ϕ(a) = g(b) f (a) − f (b) g(a) = ϕ(b) and ϕ satisfies the hypotheses of Rolle’s Theorem. Hence there is some ξ ∈ (a, b) such that ϕ0 (ξ) = 0, that is, ¡ ¢ ¡ ¢ g(b) − g(a) f 0 (ξ) − f (b) − f (a) g 0 (ξ) = 0 or f (b) − f (a) f 0 (ξ) = 0 , g(b) − g(a) g (ξ) as required. Remark 5.30. Notice that interchanging a and b does not affect the left hand side of the above equality. This means that we can slightly rephrase Cauchy’s Mean Value Theorem to say that for any a 6= b there is some ξ between a and b such that f (b) − f (a) f 0 (ξ) = 0 , g(b) − g(a) g (ξ) regardless of whether a < b or a > b. King’s College London 102 Chapter 5 Taylor’s Theorem It is convenient to let f (k) denote the k th -derivative of f (whenever it exists). dk (xj ) dk (xj ) = 0, whereas if k ≤ j, then we see that = dxk dxk j(j − 1) . . . (j − (k − 1))xj−k . This vanishes when x = 0 and so we see that ¯ dk (xj ) ¯¯ =0 dxk ¯ x=0 Now, if k > j, then for any k, j ∈ N. Consider the polynomial p(x) = α0 + α1 x + α2 x2 + · · · + αm xm . Taking derivatives and setting x = 0, we find p(0) = α0 , p0 (0) = α1 , p(2) (0) = 2α2 , p(3) (0) = 3! α3 . In general, p(k) (0) = k! αk . Now consider some general function f (x) and define a0 = f (0), a1 = f 0 (0), a2 = 1 2 f (2) (0), . . . , ak = 1 (k) f (0), . . . , etc. k! Let Pn−1 (x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 Rn (x) = f (x) − Pn−1 (x) . 
If f (x) is a polynomial of degree n − 1, then f (x) = Pn−1 (x) and Rn (x) = 0. So, in general, we can think of Pn−1 (x) as a polynomial approximation to f (x) and Rn (x) as the remainder. The smaller Rn (x) is, so f (x) is closer to a polynomial. The question is, what can be said about Rn (x)? This is the content of Taylor’s Theorem. To begin with, we notice that for k ≤ n − 1, (k) R(k) (0) = f (k) (0) − Pn−1 (0) = f (k) (0) − k! ak = 0 , by our construction of the ak s. We will use this in the following discussion. Now, for x 6= 0, we apply Cauchy’s Mean Value Theorem to the pair of functions Rn (t) and gn (t) = tn , to write Rn (x) − Rn (0) R0 (ζ) = 0n gn (x) − gn (0) gn (ζ) for some ζ lying between 0 and x. (It does not matter whether x > 0 or x < 0.) Now, any such ζ can be expressed in the form ζ = θ1 x for some 0 < θ1 < 1. Hence Rn (x) Rn (x) − Rn (0) R0 (θ1 x) = = 0n gn (x) gn (x) − gn (0) gn (θ1 x) Department of Mathematics 103 Functions for some 0 < θ1 < 1, since both Rn (0) = 0 and gn (0) = 0. (k) (k) We repeat this argument applied successively to Rn (t) and gn (t), and (k) (k) use the facts that Rn (0) = 0 and gn (0) = 0 for k ≤ n − 1, to deduce that Rn (x) Rn (x) − Rn (0) R0 (θ1 x) = = 0n for some 0 < θ1 < 1, gn (x) gn (x) − gn (0) gn (θ1 x) R0 (θ1 x) − Rn0 (0) Rn00 (θ2 θ1 x) = n0 = for some 0 < θ2 < 1, gn (θ1 x) − gn0 (0) gn00 (θ2 θ1 x) (3) = Rn00 (θ2 θ1 x) − Rn00 (0) Rn (θ3 θ2 θ1 x) = (3) 00 00 gn (θ2 θ1 x) − gn (0) gn (θ3 θ2 θ1 x) .. . (n−1) = Rn (n−1) gn (n−1) (θn−1 . . . θ1 x) − Rn (n−1) (θn−1 . . . θ1 x) − gn (0) (0) for some 0 < θ3 < 1, (n) = Rn (θn . . . θ1 x) (n) gn (θn . . . θ1 x) for some 0 < θn < 1. (n) (n) (n) However, Rn (s) = f (n) (s) − Pn−1 (s) = f (n) (s) since Pn−1 (s) = 0 and (n) gn (s) = n! . Let τ = θ1 θ2 . . . θn . Then 0 < τ < 1 and we get that f (n) (τ x) f (x) − Pn−1 (x) Rn (x) = = xn gn (x) n! We can rewrite this to give f (x) = Pn−1 (x) + xn (n) f (τ x) n! for some 0 < τ < 1. We have established the following theorem. Theorem 5.31 (Taylor’s Theorem). Suppose f is defined on some interval (α, β) and has derivatives up to order n at all points in (α, β). Suppose also that 0 ∈ (α, β) and x ∈ (α, β). Then f (x) = f (0) + x f 0 (0) + where Rn (x) = x2 00 xn−1 f (0) + . . . + f (n−1) (0) + Rn (x) 2! (n − 1)! xn (n) f (ξ) for some ξ between 0 and x. n! Remark 5.32. Note that ξ will generally depend on f , x and also n. Example 5.33. Let f (x) = ln(1 + x) on, say (−1, 3). The derivatives of f are given by (−1)k+1 (k − 1)! f (k) (x) = for k ∈ N. (1 + x)k King’s College London 104 Chapter 5 For any x ∈ (−1, 3), by Taylor’s Theorem (up to remainder order n + 1), we may say that ln(1 + x) = x − where Rn+1 (x) = x2 x3 x4 + − + · · · + Rn+1 (x) 2 3 4 xn+1 (−1)n+2 n! xn+1 (−1)n+2 = (n + 1)! (1 + ξ)n+1 (n + 1) (1 + ξ)n+1 for some ξ between 0 and x. Now let x = 1. Then f (1) = ln 2 and so ¡ n ¢ ln 2 − 1 − 12 + 13 − 14 + · · · + (−1) = Rn+1 (1) n (−1)n+2 for some 0 < ξ < 1. (n + 1)(1 + ξ)n+1 1 which means that Rn+1 (1) → 0 as n → ∞. It But |Rn+1 (1)| < n+1 follows that where Rn+1 (1) = ln 2 = 1 − 1 2 + 1 3 − 1 4 + ... = ∞ X (−1)n+1 n=1 n There is a further more general formulation. For fixed a, let g(s) = f (s + a) and apply Taylor’s Theorem to g(s) to get g(s) = g(0) + s g 0 (0) + 12 s2 g 00 (0) + · · · + sn−1 sn (n) g (n−1) (0) + g (ξ) (n − 1)! n! for some ξ between 0 and s. Now, g(s) = f (s + a) and g(0) = f (a). Furthermore, by the chain rule, we find that g (k) (0) = f (k) (a) and g (n) (ξ) = f (n) (ξ + a). 
But if ξ lies between 0 and s, then ξ + a lies between a and s + a. Putting x = s + a, we have s = x − a and so η = ξ + a lies between a and x. We arrive at the following version of Taylor's Theorem.

Theorem 5.34 (Taylor's Theorem for f about a). Suppose f is defined on some interval (α, β) and has derivatives up to order n at all points in (α, β). Suppose also that a ∈ (α, β) and x ∈ (α, β). Then
f(x) = f(a) + (x − a) f′(a) + (x − a)^2/2! f″(a) + · · · + (x − a)^{n−1}/(n − 1)! f^{(n−1)}(a) + R_n(x)
where R_n(x) = (x − a)^n/n! f^{(n)}(η) for some η between a and x.

Chapter 6
Power Series

Definition 6.1. A series of the form ∑_{n=0}^∞ a_n (x − α)^n, where the a_n are constants, is called a power series (about x = α). We notice immediately that such a power series always converges for x = α (in this case, all terms, except possibly for the a_0 term, are zero). What can be said about the convergence of power series? The following results explain the situation. By setting w = x − α, it is often sufficient to consider the case α = 0, so that the powers are simply powers of x, and we will usually do this.

Proposition 6.2. Suppose that the power series ∑_{n=0}^∞ a_n x^n converges for some value x = x_0 with x_0 ≠ 0. Then it converges absolutely for every x satisfying |x| < |x_0|.

Proof. Let S_n(x) = ∑_{k=0}^n a_k x^k. By hypothesis, (S_n(x_0))_{n∈N∪{0}} converges. In particular, (a_k x_0^k) converges (to zero) and so is a bounded sequence; that is, there is some M > 0 such that |a_k x_0^k| < M for all k. We wish to show that ∑_{k=0}^n |a_k x^k| converges for every x with |x| < |x_0|. Suppose, then, that x obeys |x| < |x_0| and set ρ = |x/x_0|. Evidently, 0 ≤ ρ < 1 and so ∑_{k=0}^∞ ρ^k converges. But then
|a_k x^k| = |a_k x_0^k| |x/x_0|^k ≤ M ρ^k,
and so ∑_{k=0}^n |a_k x^k| converges by the Comparison Test.

Radius of Convergence of a Power Series

Consider a given power series ∑_{n=0}^∞ a_n x^n and let
J = { x ∈ R : ∑_{n=0}^∞ a_n x^n converges }.
What can be said about J? Certainly, 0 ∈ J and it could happen that this is the only element of J. For example, if a_n = n^n, then a_n x^n = (nx)^n and so, no matter how small x is, eventually |nx| > 1 provided x ≠ 0. This means that for any given x ≠ 0, it is false that a_n x^n → 0 as n → ∞ and so the power series cannot converge. In this case J = { 0 }.

Suppose that x_0 ∈ J, so that ∑_{n=0}^∞ a_n x_0^n is convergent. Then we know that ∑_{n=0}^∞ a_n x^n also converges (absolutely) for every x obeying |x| < |x_0|. In other words, if x_0 ∈ J, then every point in the interval (−|x_0|, |x_0|) also belongs to J. What does this mean for J? There are three distinct (mutually exclusive) possibilities.
(i) J = { 0 }.
(ii) J is bounded but there is some t ≠ 0 with t ∈ J (that is, J ≠ { 0 } but is bounded).
(iii) J is unbounded.

We can immediately deduce that if J is not bounded, case (iii), then it must be the whole of R. Indeed, to say that J is not bounded is to say that for any r > 0, there is some x ∈ J with |x| > r. Hence [−r, r] ⊆ J for all r > 0 and so J = R.

Now consider case (ii) and let
A = { r > 0 : ∑_{n=0}^∞ a_n x^n converges for x ∈ (−r, r) }.
Evidently, if t ∈ J, then |t| ∈ A, and so A is bounded because J is. Let R = lub A. Then R > 0, otherwise we are in case (i). Suppose 0 < ρ < R. Then, by definition of lub, there is r ∈ A such that ρ < r ≤ R. But then the series ∑_{n=0}^∞ a_n x^n converges (absolutely) for x ∈ (−r, r) and, in particular, for x with |x| = ρ.

Next, suppose that x ∈ R with |x| = ρ > R.
If ∞ n=0 an x were to converge, then we could deduce that (−ρ, ρ) ⊆ J which would mean that ρ A. This contradicts the fact that R is an upper bound for A and so P∈ ∞ n n=0 an x cannot converge for any such x. P n Case (ii) means then that there is some R > 0 such that ∞ n=0 an x converges (absolutely) for all x with |x| < R but diverges for any x with |x| > R. The behaviour of the power series when |x| = R (i.e., x = ±R) requires separate extra discussion and will depend on the particular power series. Anything is possible. This discussion is summarized in the following very important theorem. Department of Mathematics 107 Power Series Theorem 6.3 (Radius Theorem for Power Series). For any P of Convergence n , exactly one of the following three posgiven power series ∞ a (x − α) n n=0 sibilities applies. P∞ n (i) n=0 an (x − α) converges only for x = α. P n (ii) There is R > 0 such that ∞ n=0 an (x − α) converges (absolutely) for all |x − α| < R but diverges for any x with |x − α| > R. (iii) P∞ n=0 an (x − α)n converges (absolutely) for all x. Definition 6.4. The value R above is called the radius of convergence of the power series. In case (iii), one says that the series has an infinite radius of convergence. Examples 6.5. P n 1. Consider ∞ n=0 x . This series converges if |x| < 1 (by the Ratio Test) and otherwise diverges, so R = 1. Note that the series diverges at both of the boundary values x = ±1. 2. Consider ∞ X an xn = 1 + x + n=0 x2 x3 + + ... 2 3 The series converges if |x| < 1 (by Comparison with 1 + x + x2 + . . . ). If x = 1, then it becomes 1 + 1 + 21 + 13 + . . . which we know diverges. It follows that it cannot converge for any x with |x| > 1. When x = −1, it becomes 1 − 1 + 12 − 13 + . . . which converges. So 1 + x + x2 /2 + x3 /3 + . . . converges at x = −1 but diverges at x = 1. Replacing x by −x, we see that the series 1−x+ x2 x3 x4 x5 − + − + ... 2 3 4 5 converges for |x| < 1 and for x = 1 but diverges when x = −1. 3. Formally adding together the two series above, suggests the power series 2+ 2x2 2x4 2x6 x4 x6 + + + · · · = 1 + 1 + x2 + + + ... 2 4 6 2 3 which converges for |x| < 1 = R but diverges when x = ±1. 4. The series x2 x6 x8 + − + ... 2 3 4 converges for |x| < 1 = R and also converges for both x = ±1. 1− King’s College London 108 Chapter 6 5. The series ∞ X xn n=0 n! =1+x+ x2 x3 + + ... 2! 3! converges absolutely for all x ∈ R, by the Ratio Test. P n If the power series P∞ n=0 an x is differentiated term by term, then the ∞ resulting power series is n=1 nan xn−1 . This is called the associated derived series. The next theorem tells us that this makes sense. P∞ n Theorem 6.6. Suppose P∞ that n−1n=0 an x has radius of convergence R > 0. Then the series n=1 nan x also has radius of convergence equal to R. (The possibility of an infinite radius of convergence is included.) Proof. that 0 < |u| < R. Let r > 0 obey 0 < |u| < r < R. Then P∞ Suppose n converges. Since n1/n → 1 as n → ∞, it follows that there is |a | r n=0 n some N ∈ N such that n1/n < r/ |u| for all n > N . Therefore n |an | |un | = |an | (n1/n |u|)n < |an | rn for all n > N . By Comparison, it follows that ∞ X n an u n=1 n−1 = (1/u) ∞ X n an un n=1 P n−1 has converges absolutely. It follows that the power series ∞ n=1 nan x radius of convergence at least equal to R. P n−1 converges absolutely, On the other hand, if the derived series ∞ n=1 nan x then the inequality |an | |x|n ≤ |x| n |an | |x|n−1 P n for n ≥ 1 implies that ∞ n=0 an x converges absolutely, by Comparison. The result follows. Remark 6.7. 
Remark 6.7. By applying the theorem once again, we see that the power series $\sum_{n=2}^{\infty} n(n-1) a_n x^{n-2}$ also has radius of convergence equal to R. Of course, we can now apply the theorem again . . .

The big question is whether the derived series is indeed the derivative of the original power series. We shall now show that this is true.

We recall that Taylor's Theorem, with 2nd order remainder for a function f about x_0, gives
$$ f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{f^{(2)}(c)}{2!}\,(x - x_0)^2 $$
for some c between x and x_0. Setting f(x) = x^k gives the equality
$$ x^k - x_0^k = k (x - x_0)\, x_0^{k-1} + \tfrac{1}{2} k(k-1)\, c_k^{k-2}\, (x - x_0)^2 $$
for some c_k between x_0 and x. Note that c_k may depend on k (as well as x_0 and x). If x = x_0 + h, then this becomes
$$ (x_0 + h)^k - x_0^k = h\, k\, x_0^{k-1} + \tfrac{1}{2} k(k-1)\, c_k^{k-2}\, h^2 \qquad (*) $$
for some c_k between x_0 and x_0 + h.

We can use this to find the derivative of a power series inside its disc of convergence. Indeed, suppose that the power series $f(x) = \sum_{n=0}^{\infty} a_n x^n$ has radius of convergence R > 0. Let x_0 with |x_0| < R be given and let r > 0 obey |x_0| < r < R. Let h ≠ 0 be so small that |x_0| + |h| < r. This means that −r < x_0 + h < r, so that $\sum_{n=0}^{\infty} a_n (x_0 + h)^n$ converges (absolutely). Using (∗), we find that
$$ \frac{f(x_0 + h) - f(x_0)}{h} - \sum_{n=1}^{\infty} n a_n x_0^{n-1} = \tfrac{1}{2}\, h \sum_{n=2}^{\infty} a_n\, n(n-1)\, c_n^{n-2}. $$
Now c_n is between x_0 and x_0 + h, and both of these points lie in the interval (−r, r), and so it follows that c_n ∈ (−r, r), that is, |c_n| < r. But then, by Comparison with the series $\sum_{n=2}^{\infty} |a_n|\, n(n-1)\, r^{n-2}$, the series on the right hand side is (absolutely) convergent. Letting h → 0 gives the desired result that
$$ f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = \sum_{n=1}^{\infty} n a_n x_0^{n-1}. $$
We have proved the following important theorem.

Theorem 6.8 (Differentiation of Power Series). The power series $\sum_{n=0}^{\infty} a_n x^n$ is differentiable at each point x_0 inside its radius of convergence. Moreover, its derivative is given by the derived series $\sum_{n=1}^{\infty} n a_n x_0^{n-1}$.
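A quick numerical check of Theorem 6.8 (an editorial illustration, not part of the notes; the series $\sum x^n/n^2$ and the point x_0 = 0.5 are arbitrary choices): the difference quotient of the power series approaches the value of its derived series as h shrinks.

N = 5000                       # number of terms kept in the partial sums

def f(x):
    return sum(x**n / n**2 for n in range(1, N + 1))

def f_derived(x):
    return sum(x**(n - 1) / n for n in range(1, N + 1))

x0 = 0.5
for h in (1e-2, 1e-4, 1e-6):
    quotient = (f(x0 + h) - f(x0)) / h
    print(h, quotient, f_derived(x0))   # the quotient approaches the derived series value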
Example 6.9. We shall show that
$$ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots $$
for any x ∈ (−1, 1). The radius of convergence, R, of the power series on the right hand side is R = 1.

Let us begin by guessing that ln(1 + x) = a_0 + a_1 x + a_2 x² + . . . . If this is to be true, then putting x = 0, we should have ln 1 = a_0 + 0, that is, a_0 = 0. Differentiating term by term and then setting x = 0, we might guess that $\frac{d}{dx}\ln(1+x)\big|_{x=0} = a_1$. This gives 1 = a_1. Differentiating twice (term by term) and setting x = 0, we might guess that $\frac{d^2}{dx^2}\ln(1+x)\big|_{x=0} = 2a_2$, that is, a_2 = −1/2. Repeating this, we guess that a_k = (−1)^{k+1}/k. So much for the guessing, now let us justify our reasoning.

Let g(x) be the power series
$$ g(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots . $$
We see that this power series converges for x = 1 and so it must converge absolutely for |x| < 1. (This can also be seen directly by the Ratio Test.) The series does not converge when x = −1 and so we deduce that its radius of convergence is R = 1. For any x with |x| < R = 1, the power series can be differentiated and the derivative is that obtained by term by term differentiation. Hence
$$ g'(x) = 1 - x + x^2 - x^3 + \dots $$
for any x with |x| < 1. However, we know that
$$ \frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots $$
for |x| < 1 and so g'(x) = 1/(1 + x) for x ∈ (−1, 1). But $\frac{d}{dx}\ln(1+x) = 1/(1+x)$ for x ∈ (−1, 1) and so ln(1+x) − g(x) has zero derivative on (−1, 1). It follows that ln(1 + x) − g(x) is constant on (−1, 1). Setting x = 0, we see that this constant must be ln 1 − g(0) = 0 and so ln(1 + x) = g(x) on the interval (−1, 1), as required.

Note that we have shown that ln(1 + x) = x − ½x² + ⅓x³ − . . . for any x ∈ (−1, 1). We have already seen (thanks to Taylor's Theorem) that ln 2 = 1 − 1/2 + 1/3 − 1/4 + . . ., which means that this expansion is also valid for x = 1. When x = −1, the left hand side becomes ln 0, which is not defined, and the right hand side becomes the divergent series −1 − 1/2 − 1/3 − 1/4 − . . . .

Chapter 7
The elementary functions

We have already used the elementary functions (the trigonometric functions, the exponential function and the logarithm) as examples to illustrate various aspects of the theory. Now is the time to give their formal definitions. The trigonometric functions sin x and cos x and the exponential function exp x are defined as follows.

Definition 7.1. For any x ∈ R,
$$ \sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots $$
$$ \cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots $$
$$ \exp x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots . $$
Each of these power series converges absolutely for all x ∈ R (by the Ratio Test), so they have an infinite radius of convergence.

Remark 7.2. These are the definitions and so each and every property that these functions possess must be obtainable from these definitions. We can see immediately that sin 0 = 0, cos 0 = 1 and exp 0 = 1. We also note that sin(−x) = − sin x (so sin x is an odd function) and cos(−x) = cos x (so cos x is an even function). Furthermore, by the basic differentiation of power series theorem, Theorem 6.8, we see that these functions are differentiable at every x ∈ R with derivatives given by term by term differentiation, so that
$$ \frac{d}{dx}\sin x = \frac{d}{dx}\Big( x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots \Big) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots = \cos x $$
$$ \frac{d}{dx}\cos x = \frac{d}{dx}\Big( 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots \Big) = -x + \frac{x^3}{3!} - \frac{x^5}{5!} + \dots = -\sin x $$
$$ \frac{d}{dx}\exp x = \frac{d}{dx}\Big( 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots \Big) = 0 + 1 + x + \frac{x^2}{2!} + \dots = \exp x . $$
We shall establish further familiar properties.

Theorem 7.3. For any x ∈ R, sin²x + cos²x = 1.

Proof. Let ϕ(x) = sin²x + cos²x. Then we calculate the derivative
$$ \varphi'(x) = 2 \sin x \cos x - 2 \cos x \sin x = 0 . $$
It follows that ϕ(x) is constant on R. In particular,
$$ \varphi(x) = \varphi(0) = \sin^2 0 + \cos^2 0 = 0 + 1 = 1 , $$
that is, sin²x + cos²x = 1, as required.

Remark 7.4. Since both terms sin²x and cos²x are non-negative, we can say that −1 ≤ sin x ≤ 1 and also −1 ≤ cos x ≤ 1 for all x ∈ R. The functions sin x and cos x are bounded (by ±1). This is not at all obvious just by looking at the power series in their definitions.

Theorem 7.5 (Addition Formulae). For any a, b ∈ R, we have
$$ \sin(a + b) = \sin a \cos b + \cos a \sin b $$
$$ \cos(a + b) = \cos a \cos b - \sin a \sin b . $$

Proof. Let ψ(x) = sin(α − x) cos x + cos(α − x) sin x. Then we see that
$$ \psi'(x) = -\cos(\alpha - x)\cos x - \sin(\alpha - x)\sin x + \sin(\alpha - x)\sin x + \cos(\alpha - x)\cos x = 0 . $$
It follows that ψ(x) is constant on R and so ψ(x) = ψ(0), that is,
$$ \sin(\alpha - x)\cos x + \cos(\alpha - x)\sin x = \sin \alpha . $$
Putting α = a + b and x = b, we obtain the desired formula sin(a + b) = sin a cos b + cos a sin b.

The other formula can be obtained similarly. Indeed, let
$$ \mu(x) = \cos(\alpha - x)\cos x - \sin(\alpha - x)\sin x . $$
Then we find that μ'(x) = 0, so that μ(x) is constant on R. Hence μ(x) = μ(0) = cos α. Again setting α = a + b and x = b, we find that cos(a + b) = cos a cos b − sin a sin b and the proof is complete.
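These power series definitions can be checked numerically. The Python sketch below is an editorial illustration, not part of the notes; it compares partial sums of the defining series with the library functions and verifies the addition formula of Theorem 7.5 at arbitrarily chosen sample points.

import math

def sin_series(x, N=30):
    # partial sum of the defining series for sin
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1) for n in range(N))

def cos_series(x, N=30):
    # partial sum of the defining series for cos
    return sum((-1)**n * x**(2*n) / math.factorial(2*n) for n in range(N))

a, b = 0.7, 1.9
print(sin_series(a), math.sin(a))       # agree to many decimal places
print(cos_series(b), math.cos(b))
lhs = sin_series(a + b)
rhs = sin_series(a) * cos_series(b) + cos_series(a) * sin_series(b)
print(lhs, rhs)                         # addition formula holds numerically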
Remark 7.6. The formulae
$$ \sin(a - b) = \sin a \cos b - \cos a \sin b $$
$$ \cos(a - b) = \cos a \cos b + \sin a \sin b $$
follow by replacing b by −b and using the facts that sin(−b) = − sin b whereas cos(−b) = cos b. Notice further that if we set a = x and b = x in this last formula, then we get cos(x − x) = cos²x + sin²x, that is, we recover the formula sin²x + cos²x = 1.

The number π

The elementary geometric approach to the trigonometric functions is by means of triangles and circles. The number π makes its appearance in the formula relating the circumference and the radius of a circle (or giving the area A = πr² of a circle of radius r). For us here, we must always proceed via the power series definitions of the trigonometric functions. The identification of π begins with some preliminary properties of the functions sin x and cos x.

Lemma 7.7.
(i) sin x > 0 for all x ∈ (0, 2).
(ii) cos 2 < 0.

Proof. (i) Taylor's Theorem (up to order 2) says that
$$ f(x) = f(0) + x f'(0) + \frac{x^2}{2!}\,f''(c) $$
for some c between 0 and x. With f(x) = sin x, we obtain
$$ \sin x = 0 + x - \frac{x^2}{2}\sin(c) \ge x - \frac{x^2}{2} $$
for some c between 0 and x. We have used the facts that sin 0 = 0, cos 0 = 1 and −sin(c) ≥ −1. Hence
$$ \sin x \ge x - \tfrac{1}{2}x^2 = \tfrac{1}{2}\,x(2 - x) > 0 $$
if 0 < x < 2, as claimed.

(ii) Applying Taylor's Theorem (up to order 4), we may say that there is some λ between 0 and x such that
$$ \cos x = 1 + 0 - \frac{x^2}{2!} + 0 + \frac{x^4}{4!}\cos\lambda . $$
But cos λ ≤ 1 and so
$$ \cos x \le 1 - \frac{x^2}{2} + \frac{x^4}{4!} . $$
Putting x = 2 gives
$$ \cos 2 \le 1 - \frac{4}{2} + \frac{16}{24} = -1 + \frac{2}{3} = -\frac{1}{3} , $$
which implies that cos 2 ≤ −1/3 < 0, as required.

Now we come to the crucial part.

Theorem 7.8. There is a unique μ with 0 < μ < 2 such that cos μ = 0.

Proof. We know that cos 0 = 1 and we have just seen that cos 2 < 0. It follows by the Intermediate Value Theorem (applied to the function cos x on the interval [0, 2]) that there is some μ ∈ (0, 2) such that cos μ = 0.

We must now show that there is only one such μ. To see this, suppose that cos β = 0 for some β ∈ (0, 2) with β ≠ μ. Then by Rolle's Theorem, there is some ξ between μ and β such that $\frac{d}{dx}\cos x\big|_{x=\xi} = 0$, that is, sin ξ = 0. But we have shown that sin x > 0 on (0, 2). This gives a contradiction and so we conclude that there can be no such β. In other words, there is a unique μ with 0 < μ < 2 such that cos μ = 0.

Definition 7.9. The real number π is defined to be π = 2μ, where μ is the unique solution in (0, 2) to cos μ = 0.

All we can say at the moment is that 0 < π < 4. It is known that π is irrational and its decimal expansion is known to some two million decimal places. Curiously enough, it seems that each of the digits 0, 1, . . . , 9 appears with about the same frequency in this expansion.
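The zero μ can be located numerically by bisection, exactly in the spirit of the Intermediate Value Theorem argument above. The following Python sketch is an editorial illustration, not part of the notes; it uses a partial sum of the cosine series and compares 2μ with the familiar value of π.

import math

def cos_series(x, N=30):
    return sum((-1)**n * x**(2*n) / math.factorial(2*n) for n in range(N))

lo, hi = 0.0, 2.0          # cos 0 = 1 > 0 and cos 2 < 0 (Lemma 7.7)
for _ in range(60):        # bisection keeps the sign change inside [lo, hi]
    mid = (lo + hi) / 2
    if cos_series(mid) > 0:
        lo = mid
    else:
        hi = mid

mu = (lo + hi) / 2
print(2 * mu, math.pi)     # both approximately 3.14159...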
Theorem 7.10. The number π is such that sin(½π) = 1, cos(2π) = 1 and sin(2π) = 0. Furthermore, for any x ∈ R,
$$ \sin(x + 2\pi) = \sin x \qquad\text{and}\qquad \cos(x + 2\pi) = \cos x . $$

Proof. By its very definition, we know that cos(½π) = 0. But since we have the identity sin²x + cos²x = 1, it follows that sin(½π) = ±1. However, we have seen that sin x > 0 on (0, 2) and so it follows that sin(½π) = 1.

By the addition formulae, sin π = 2 sin(½π) cos(½π) = 0. This then implies that sin(2π) = 2 sin π cos π = 0. To show that cos(2π) = 1, we use the addition formula again to find that
$$ \cos(2x) = \cos^2 x - \sin^2 x = 1 - 2\sin^2 x . $$
Setting x = π, we get cos(2π) = 1 because sin π = 0.

Finally, using the above results together with the addition formulae, we calculate
$$ \sin(x + 2\pi) = \sin x \cos(2\pi) + \cos x \sin(2\pi) = \sin x $$
$$ \cos(x + 2\pi) = \cos x \cos(2\pi) - \sin x \sin(2\pi) = \cos x $$
for any x ∈ R and the proof is complete.

Properties of the exponential function

We now turn to a discussion of the exponential function.

Proposition 7.11. The function exp x enjoys the following properties.
(i) $\frac{d}{dx}\exp x = \exp x$ for all x ∈ R.
(ii) exp 0 = 1.
(iii) For any a, b ∈ R, exp(a + b) = exp a exp b.
(iv) exp(−x) = 1/exp x for all x ∈ R.
(v) exp x > 0 for all x ∈ R.

Proof. (i) As already noted, this follows because the derivative of the power series is the power series got by differentiating term by term.
(ii) Putting x = 0 in the power series gives exp 0 = 1.
(iii) Fix u ∈ R and set ϕ(x) = exp x exp(u − x). Then
$$ \varphi'(x) = \exp x \exp(u - x) - \exp x \exp(u - x) = 0 $$
for all x ∈ R. It follows that ϕ(x) is constant, so that ϕ(x) = ϕ(0). But ϕ(0) = exp u and so ϕ(x) = exp u. Letting u = a + b and x = a, we find that exp a exp b = exp(a + b), as required.
(iv) From the above, we find that exp x exp(−x) = exp 0 = 1 and so exp(−x) = 1/exp x.
(v) Since exp x exp(−x) = 1, it follows that exp x ≠ 0 for any x ∈ R. However, it is clear from the power series that exp x > 0 if x > 0 and so the formula exp x exp(−x) = 1 implies that exp(−x) > 0 too. (Alternatively, one can note that exp x = exp(½x) exp(½x) = (exp(½x))², which is positive.)

Because of the property exp(a + b) = exp a exp b, one often writes e^x for exp x, so this reads e^{a+b} = e^a e^b. However, this notation needs some further discussion. The point is that the symbol e², say, now appears to have two interpretations: firstly as exp(2) and secondly as the square of the number e. The real number e is defined as exp(1) and we see that
$$ e^2 = \exp(1)^2 = \exp(1)\exp(1) = \exp(1 + 1) = \exp(2) , $$
so the two interpretations actually agree. What about, say, e^{1/2}? This is interpreted either as exp(½) or as the square root of e. But exp(½) exp(½) = exp(½ + ½) = exp(1) = e, so exp(½) is the square root of e. This extends to any rational power.

Theorem 7.12. For any r ∈ Q, exp(r) = e^r, where e = exp(1).

Proof. If r = 0, then exp(0) = 1 = e⁰, by definition of the power e⁰. Now suppose that r > 0 and write r = p/q for p and q ∈ N. We have
$$ (\exp r)^q = \underbrace{\exp r \times \dots \times \exp r}_{q\ \text{factors}} = \exp(rq) = \exp p = \exp(\underbrace{1 + 1 + \dots + 1}_{p\ \text{terms}}) = \underbrace{\exp 1 \times \dots \times \exp 1}_{p\ \text{factors}} = e^p $$
and so exp(r) = e^{p/q} = e^r.

Now let r = −s where s ∈ Q and s > 0. The above discussion tells us that exp(s) = e^s so that
$$ \exp(r) = \exp(-s) = \frac{1}{\exp(s)} = \frac{1}{e^s} = e^{-s} = e^r $$
and we are done.

Remark 7.13. This result clarifies the symbolism e^x. This can always be considered as shorthand notation for exp x, but if x is rational, then it can also mean the xth power of the real number e. In this (rational) case, the values are the same, as the theorem shows, so there is no ambiguity.
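As a numerical aside (an editorial illustration, not part of the notes; the sample values are arbitrary), partial sums of the exponential series reproduce the multiplicative property of Proposition 7.11(iii), and exp(½) squares to roughly e, as in the discussion before Theorem 7.12.

import math

def exp_series(x, N=40):
    # partial sum of the defining series for exp
    return sum(x**n / math.factorial(n) for n in range(N))

a, b = 1.3, -0.4
print(exp_series(a + b), exp_series(a) * exp_series(b))   # approximately equal
print(exp_series(0.5)**2, math.e)                          # both approximately e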
Remark 7.14. We have seen that the power series expression for exp x tells us that $\frac{d}{dx}\exp x = \exp x$ and exp 0 = 1. These properties completely determine exp x. In fact, if ψ(x) is the power series ψ(x) = a_0 + a_1 x + a_2 x² + . . . , then the requirement that ψ'(x) = ψ(x) demands that
$$ a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \dots = a_0 + a_1 x + a_2 x^2 + \dots . $$
This holds if k a_k = a_{k−1} for all k = 1, 2, 3, . . ., which means that a_1 = a_0, a_2 = a_1/2 = a_0/2, . . . , a_k = a_{k−1}/k = a_{k−2}/(k(k−1)) = · · · = a_0/k!. If ψ(0) = 1, then a_0 = 1 and a_k = 1/k!, so we find that ψ(x) = exp x.

This holds without assuming that we begin with a power series. Indeed, suppose that ϕ(x) is differentiable on R and that ϕ'(x) = ϕ(x) and ϕ(0) = 1. We shall show that ϕ(x) = exp x. Let g(x) = ϕ(x) exp(−x). Then g is differentiable on R and
$$ g'(x) = \varphi'(x)\exp(-x) - \varphi(x)\exp(-x) = 0 $$
since ϕ'(x) = ϕ(x). Fix u ∈ R and let (a, b) be any interval in R such that both u ∈ (a, b) and 0 ∈ (a, b). Then g' is zero on the interval (a, b) and so g is constant there. In particular, g(u) = g(0). However, by construction, g(0) = ϕ(0) exp 0 = 1 and so g(u) = g(0) = 1. Hence ϕ(u) exp(−u) = 1 and we finally arrive at the required result that ϕ(u) = exp u.

The function exp x has further interesting properties.

Theorem 7.15. The function exp x obeys the following.
(i) The map x ↦ exp x is one-one from R onto (0, ∞). In fact, exp x is strictly increasing on R.
(ii) For any k ∈ N, x^k/exp x → 0 as x → ∞.

Proof. (i) From the power series expression for exp x, we see that if x > 0 then exp x > 1 + x > 1. Suppose that a, b ∈ R and that a < b. Then b − a > 0 so that exp(b − a) > 1. Multiplying by exp a (which is positive), we see that exp(b − a) exp a > exp a, that is, exp b > exp a. It follows that exp x is strictly increasing and so exp a = exp b is only possible if a = b, that is, x ↦ exp x is one-one. (Alternatively, the Mean Value Theorem tells us that (exp b − exp a)/(b − a) is equal to the derivative of exp x evaluated at some point between a and b. This derivative is always positive and so (exp b − exp a) and (b − a) always have the same sign. In particular, exp a = exp b only if a = b.)

We still have to show that exp x maps R onto (0, ∞). To see this, let μ ∈ (0, ∞). We must show that there is some u ∈ R such that exp u = μ. Let α > μ and let β > 1/μ. Then exp α > 1 + α > μ and exp β > 1 + β > 1/μ, so that exp(−β) = 1/exp β < μ. So we have
$$ \exp(-\beta) < \mu < \exp\alpha . $$
Now exp x is continuous on R and so in particular is continuous on the closed interval [−β, α]. By the Intermediate Value Theorem, there is some u between −β and α such that exp u = μ, as required.

(ii) For x > 0, the power series expression for exp x tells us that
$$ \exp x = \sum_{n=0}^{\infty} \frac{x^n}{n!} > \frac{x^{k+1}}{(k+1)!} . $$
Hence 0 < x^k/exp x < (k + 1)!/x for x > 0 and so x^k/exp x → 0 as x → ∞.

Remark 7.16. This last result can be written as x^k exp(−x) → 0 as x → ∞ or as (exp x)/x^k → ∞ as x → ∞, and it implies that x^k exp x → 0 as x → −∞.

It is clear from the power series definition (with x = 1) that e > 1 + 1 = 2. We can easily obtain an upper bound for e via Taylor's Theorem. Indeed, exp^{(k)}(x) = exp(x) for any k ∈ N and exp(0) = 1, so by Taylor's Theorem up to remainder of order 3, we have
$$ \exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}\exp(c_x) $$
for some c_x between 0 and x. If x = 1, then c_1 < 1 and so e^{c_1} < e and we get
$$ e - (1 + 1 + \tfrac{1}{2}) = \tfrac{1}{6}\, e^{c_1} < \tfrac{1}{6}\, e , $$
that is, e < 3.

We can profitably pursue this method of estimation. Taylor's Theorem up to remainder of order m + 1 gives
$$ e^x = 1 + x + \frac{x^2}{2!} + \dots + \frac{x^m}{m!} + R_{m+1} $$
where $R_{m+1} = \frac{x^{m+1}}{(m+1)!}\, e^{c_x}$. Now setting x = 1 and noting the inequalities 0 < e^{c_1} < e < 3, we see that $0 < R_{m+1} < \frac{3}{(m+1)!}$.

However, if m ≥ 3, then $\frac{3}{(m+1)!} < \frac{1}{m!}$ and we deduce that
$$ 0 < e - \Big(1 + 1 + \frac{1}{2!} + \dots + \frac{1}{m!}\Big) < \frac{1}{m!} . \qquad (*) $$
This can be rewritten as
$$ 1 + 1 + \frac{1}{2!} + \dots + \frac{1}{m!} \;<\; e \;<\; 1 + 1 + \frac{1}{2!} + \dots + \frac{1}{m!} + \frac{1}{m!} $$
for any m ≥ 3.
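The bracketing of e in (∗) is easily checked with exact arithmetic. The Python sketch below is an editorial illustration, not part of the notes; m = 8 is an arbitrary choice.

from fractions import Fraction
import math

m = 8
s_m = sum(Fraction(1, math.factorial(k)) for k in range(m + 1))   # 1 + 1 + 1/2! + ... + 1/m!
lower = float(s_m)
upper = float(s_m + Fraction(1, math.factorial(m)))
print(lower, math.e, upper)      # lower < e < upper, as in (*)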
These estimates allow us to prove the following interesting fact.

Theorem 7.17. The real number e is irrational.

Proof. The proof is by contradiction. Suppose it were the case that e ∈ Q and let e = p/q where p, q ∈ N. Let m ∈ N obey m > q + 3 (so that m is greater than both q and 3). Using the estimate (∗) and multiplying through by m!, we see that
$$ 0 < m!\,\Big( \frac{p}{q} - \Big(1 + 1 + \frac{1}{2!} + \dots + \frac{1}{m!}\Big) \Big) < 1 . $$
However,
$$ m!\,\Big( \frac{p}{q} - \Big(1 + 1 + \frac{1}{2!} + \dots + \frac{1}{m!}\Big) \Big) = \frac{m!\,p}{q} - \Big( m! + m! + \frac{m!}{2!} + \dots + \frac{m!}{m!} \Big) , $$
which is an integer because each term is an integer. This gives us our contradiction, since there is no integer lying strictly between 0 and 1. The proof is complete.

In fact, we can prove more, namely that all powers and roots of powers of e are irrational. That is, e^{p/q} is irrational for any p, q ∈ Z with p ≠ 0 and q ≠ 0. (If q = 0, then p/q does not make any sense. If q ≠ 0 but p = 0, then we have e^{p/q} = e⁰ = 1, which is rational.) In order to show this, we need a few preliminary results.

Lemma 7.18. For given n ∈ N, let $f(x) = \dfrac{x^n (1-x)^n}{n!}$. Then
(i) $f(x) = \dfrac{1}{n!} \displaystyle\sum_{m=n}^{2n} c_m x^m$, with c_m ∈ Z.
(ii) If 0 < x < 1, then 0 < f(x) < 1/n!.
(iii) f^{(k)}(0) ∈ Z and f^{(k)}(1) ∈ Z for all k ≥ 0.

Proof. (i) The Binomial Theorem tells us that (1 − x)^n can be written as
$$ (1 - x)^n = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n $$
for suitable integers a_0, a_1, . . . , a_n. In fact, a_0 = 1, a_1 = −n, a_2 = ½n(n − 1) and so on. In general, $a_m = (-1)^m \dfrac{n!}{(n-m)!\,m!}$. Alternatively, this can be proved by induction. Indeed, let P(n) be the statement that (1 − x)^n = a_0 + a_1 x + a_2 x² + · · · + a_n x^n for coefficients a_0, a_1, . . . , a_n ∈ Z. Then with n = 1, we have (1 − x)¹ = 1 − x and we see that P(1) is true. Now suppose that n ∈ N and P(n) is true. Then
$$ (1 - x)^{n+1} = (1 - x)(1 - x)^n = (1 - x)(a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n) $$
for coefficients a_0, a_1, . . . , a_n ∈ Z. Expanding the right hand side gives
$$ (1 - x)^{n+1} = a_0 + (a_1 - a_0)x + (a_2 - a_1)x^2 + \dots + (a_n - a_{n-1})x^n - a_n x^{n+1} . $$
Evidently, the coefficients all belong to Z and so P(n + 1) is true. By induction, it follows that P(n) is true for all n ∈ N. Multiplying by x^n and dividing by n! now gives (i), with c_{n+m} = a_m for m = 0, . . . , n.

(ii) If 0 < x < 1, then also 0 < (1 − x) < 1 and therefore both 0 < x^n < 1 and 0 < (1 − x)^n < 1. Hence 0 < f(x) < 1/n!.

(iii) We first note that differentiating the power x^m k times and then setting x = 0 gives
$$ \frac{d^k x^m}{dx^k}\Big|_{x=0} = \begin{cases} 0, & \text{if } m \ne k \\ k!, & \text{if } m = k. \end{cases} $$
It follows directly from (i) that
$$ f^{(k)}(0) = \begin{cases} k!\,c_k/n!\,, & \text{if } n \le k \le 2n \\ 0, & \text{otherwise.} \end{cases} $$
Furthermore, if k ≥ n, then k!/n! ∈ Z and so we see that f^{(k)}(0) ∈ Z for any k ≥ 0.

Next, we use the relation f(x) = f(1 − x) together with the chain rule to find f^{(k)}(1). Let u = 1 − x. Then du/dx = −1 so that
$$ \frac{d}{dx}\, f(1 - x) = \frac{df(u)}{du}\,\frac{du}{dx} = \frac{df(u)}{du} \times (-1) . $$
Differentiating k times gives
$$ \frac{d^k f(1 - x)}{dx^k} = \frac{d^k f(u)}{du^k} \times (-1)^k . $$
Hence, using the equality f(x) = f(1 − x) = f(u), we get
$$ f^{(k)}(x) = \frac{d^k f(x)}{dx^k} = \frac{d^k f(1 - x)}{dx^k} = (-1)^k\, \frac{d^k f(u)}{du^k} $$
for all x. Putting x = 1 gives u = 0 and so f^{(k)}(1) = (−1)^k f^{(k)}(0). However, we know that f^{(k)}(0) ∈ Z and so (−1)^k f^{(k)}(0) ∈ Z. That is, f^{(k)}(1) ∈ Z, as claimed.
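Lemma 7.18(iii) can be verified in exact arithmetic for small n. The Python sketch below is an editorial illustration, not part of the notes; it uses the expansion from part (i), with the coefficients c_{n+m} = (−1)^m C(n, m) coming from the Binomial Theorem, and n = 5 as an arbitrary choice.

from fractions import Fraction
from math import comb, factorial

n = 5
c = {n + m: (-1)**m * comb(n, m) for m in range(n + 1)}   # integer coefficients c_k

for k in range(0, 2 * n + 3):
    deriv0 = Fraction(factorial(k) * c.get(k, 0), factorial(n))   # f^(k)(0) = k! c_k / n!
    deriv1 = (-1)**k * deriv0                                     # f^(k)(1) = (-1)^k f^(k)(0)
    assert deriv0.denominator == 1 and deriv1.denominator == 1    # both are integers
    print(k, int(deriv0), int(deriv1))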
We are now in a position to prove the result we are interested in concerning the irrationality of various powers of e.

Theorem 7.19. e^r is irrational for every r ∈ Q \ { 0 }.

Proof. We first show that e^s ∉ Q for every s ∈ N. The proof is by contradiction, so suppose the contrary, namely that there is some s ∈ N such that e^s ∈ Q. Then we can write e^s = p/q for p, q ∈ N. Choose and fix n ∈ N obeying the inequality n! > p s^{2n+1} and let f(x) = x^n(1 − x)^n/n!, as in Lemma 7.18. We introduce the function F(x) defined by
$$ F(x) = s^{2n} f(x) - s^{2n-1} f'(x) + s^{2n-2} f''(x) - s^{2n-3} f^{(3)}(x) + \dots - s f^{(2n-1)}(x) + f^{(2n)}(x) . $$
Now, by part (i) of Lemma 7.18, f(x) has degree 2n and so f^{(k)}(x) = 0 for all k > 2n. Hence, differentiating the formula above, we find that
$$ F'(x) = s^{2n} f'(x) - s^{2n-1} f''(x) + s^{2n-2} f^{(3)}(x) - s^{2n-3} f^{(4)}(x) + \dots - s f^{(2n)}(x) + \underbrace{f^{(2n+1)}(x)}_{=0} . $$
Adding s F(x) to this, all the terms cancel in pairs except for s^{2n+1} f(x), so that
$$ F'(x) + s\,F(x) = s^{2n+1} f(x) . $$
It follows that
$$ \frac{d}{dx}\big( e^{sx} F(x) \big) = s\, e^{sx} F(x) + e^{sx} F'(x) = e^{sx}\big( s F(x) + F'(x) \big) = s^{2n+1} e^{sx} f(x) . $$
Hence
$$ I = \int_0^1 s^{2n+1} e^{sx} f(x)\,dx = \Big[ e^{sx} F(x) \Big]_0^1 = e^s F(1) - F(0) = \frac{p}{q}\,F(1) - F(0) , $$
since e^s = p/q. Therefore
$$ q\,I = p\,F(1) - q\,F(0) . $$
Now, s ∈ N and by Lemma 7.18 we know that f^{(k)}(0) ∈ Z and f^{(k)}(1) ∈ Z for all k ≥ 0. It follows from the expression for F(x) that both F(0) ∈ Z and F(1) ∈ Z. Hence qI ∈ Z. Furthermore, the integrand in the formula for I is positive on (0, 1) and so I > 0. It follows that qI ∈ N.

Now, by Lemma 7.18 again, 0 < f(x) < 1/n! for 0 < x < 1, and e^{sx} < e^s for x < 1, and so
$$ 0 < q\,I = q \int_0^1 s^{2n+1} e^{sx} f(x)\,dx < q\, s^{2n+1}\, \frac{1}{n!} \int_0^1 e^{sx}\,dx < q\, s^{2n+1}\, \frac{e^s}{n!} = \frac{p\, s^{2n+1}}{n!} < 1 $$
by our very choice of n at the start. However, there are no integers strictly between 0 and 1, so we have finally arrived at our contradiction and we conclude that e^s is irrational for every s ∈ N.

Let m ∈ N. Since e^m = 1/e^{−m} and we have just shown that e^m ∉ Q, it follows that e^{−m} ∉ Q and so we may conclude that e^s ∉ Q for all s ∈ Z \ { 0 }.

Now let r ∈ Q \ { 0 }. Write r = m/n for m ∈ Z \ { 0 } and n ∈ N. If e^r were rational, it would follow that e^m = (e^{m/n})^n = (e^r)^n is also rational. But we know that e^m is irrational for every m ∈ Z \ { 0 } and so it follows that e^r is irrational and the proof is complete.

Compound Interest

If one pound is invested for one year at an annual interest rate of 100r% and compounded at n regular intervals, the compound interest formula states that its value on maturity is (1 + r/n)^n pounds. This value is approximately equal to e^r. We shall see why this is so.

Proposition 7.20. For fixed r > 0, let x_n = (1 + r/n)^n, n ∈ N. Then (x_n) is a bounded increasing sequence and so converges.

Proof. Using the Binomial Theorem, we write
$$ x_n = (1 + r/n)^n = \sum_{k=0}^{n} \binom{n}{k} \Big(\frac{r}{n}\Big)^k = \sum_{k=0}^{n} \frac{n(n-1)\dots(n-(k-1))}{k!}\,\Big(\frac{r}{n}\Big)^k = \sum_{k=0}^{n} b_k(n)\, r^k , $$
where
$$ b_k(n) = \frac{n(n-1)\dots(n-(k-1))}{k!\, n^k} = \frac{1}{k!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big)\dots\Big(1 - \frac{k-1}{n}\Big) . $$
Now, as n increases, j/n decreases and so (1 − j/n) increases. In other words, for each fixed k ≤ n, b_k(n) < b_k(n + 1). It follows that
$$ x_{n+1} = \sum_{k=0}^{n+1} b_k(n+1)\, r^k = \sum_{k=0}^{n} b_k(n+1)\, r^k + b_{n+1}(n+1)\, r^{n+1} > \sum_{k=0}^{n} b_k(n)\, r^k = x_n , $$
which shows that (x_n) is an increasing sequence. Moreover, it is clear that b_k(n) ≤ 1/k!, so we see that
$$ x_n = \sum_{k=0}^{n} b_k(n)\, r^k \le \sum_{k=0}^{n} \frac{r^k}{k!} < e^r $$
and the proof is complete.
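The monotonicity and boundedness in Proposition 7.20 are easy to observe numerically. The Python sketch below is an editorial illustration, not part of the notes; r = 0.05 (a 5% interest rate) and the compounding frequencies are arbitrary choices.

import math

r = 0.05
prev = 0.0
for n in (1, 2, 4, 12, 52, 365, 10_000):
    x_n = (1 + r / n) ** n
    assert prev < x_n < math.exp(r)     # increasing and bounded above by e^r
    prev = x_n
    print(n, x_n)
print("e^r =", math.exp(r))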
We can now establish the result we are interested in.

Theorem 7.21. For any fixed r > 0, (1 + r/n)^n → e^r as n → ∞.

Proof. Let ε > 0 be given. Using the notation established above, we know that x_n = (1 + r/n)^n → α for some α ∈ R. We must show that α = e^r.

Let N_1 ∈ N be such that if n > N_1 then |x_n − α| < ε/5. Next, let $s_n = \sum_{k=0}^{n} r^k/k!$ and let N_2 ∈ N be such that if n > N_2 then |s_n − e^r| < ε/5. (We know that s_n → e^r.)

Now we note that for each fixed k, b_k(n) → 1/k! as n → ∞. Fix N > N_1 + N_2 and let N_3 ∈ N be such that N_3 > N and if n > N_3 then
$$ \Big| \sum_{k=0}^{N} b_k(n)\, r^k - \sum_{k=0}^{N} \frac{r^k}{k!} \Big| < \frac{\varepsilon}{5} . $$
For any n > N_3, we have
$$ |x_n - s_n| = \Big| \sum_{k=0}^{n} b_k(n)\, r^k - \sum_{k=0}^{n} \frac{r^k}{k!} \Big| = \Big| \sum_{k=0}^{N} b_k(n)\, r^k - \sum_{k=0}^{N} \frac{r^k}{k!} + \sum_{k=N+1}^{n} b_k(n)\, r^k - \sum_{k=N+1}^{n} \frac{r^k}{k!} \Big| $$
$$ \le \Big| \sum_{k=0}^{N} b_k(n)\, r^k - \sum_{k=0}^{N} \frac{r^k}{k!} \Big| + 2 \sum_{k=N+1}^{n} \frac{r^k}{k!} < \frac{\varepsilon}{5} + 2 \sum_{k=N+1}^{\infty} \frac{r^k}{k!} = \frac{\varepsilon}{5} + 2\,(e^r - s_N) < \frac{\varepsilon}{5} + \frac{2\varepsilon}{5} = \frac{3\varepsilon}{5} . $$
But then, for n > N_3,
$$ |\alpha - e^r| \le |\alpha - x_n| + |x_n - s_n| + |s_n - e^r| < \frac{\varepsilon}{5} + \frac{3\varepsilon}{5} + \frac{\varepsilon}{5} = \varepsilon $$
and we conclude that α = e^r.

Corollary 7.22. For any r > 0, (1 − r/n)^n → e^{−r} as n → ∞.

Proof. Let y_n = (1 − r²/n²)^n and suppose that n is so large that r/n < 1. For such n, we see that
$$ 0 < 1 - y_n = 1 - (1 - r^2/n^2)^n = 1 - \sum_{k=0}^{n} b_k(n) \Big(\frac{-r^2}{n}\Big)^k = -\sum_{k=1}^{n} b_k(n) \Big(\frac{-r^2}{n}\Big)^k \le \sum_{k=1}^{n} b_k(n) \Big(\frac{r^2}{n}\Big)^k \le e^{r^2/n} - 1 , $$
using b_k(n) ≤ 1/k! in the last step. Now e^{r²/n} → 1 as n → ∞ and so, by the Sandwich Principle, we see that y_n → 1 as n → ∞. However, we then find that
$$ (1 - r/n)^n = \frac{(1 - r^2/n^2)^n}{(1 + r/n)^n} \to \frac{1}{e^r} = e^{-r} , $$
as required.

The logarithm

The logarithm is defined via the exponential function. We know that exp x maps R one-one onto (0, ∞). This means that to each x ∈ (0, ∞) there is one and only one v ∈ R such that exp v = x.

Definition 7.23. For x ∈ (0, ∞), log x is the value v ∈ R such that exp v = x. It follows that x ↦ log x maps (0, ∞) onto R. log x is defined by the formula e^{log x} = x for x > 0.

Remark 7.24. The notation ln x is also used for the function log x here. The notation ln emphasizes the fact that this is the "logarithm to base e", the so-called "natural" logarithm.

Proposition 7.25. The function log x has the following properties.
(i) log 1 = 0 and log e = 1.
(ii) For any s, t > 0, log(st) = log s + log t.
(iii) For any x > 0, log(1/x) = − log x.
(iv) log x is strictly increasing and log x → ∞ as x → ∞.
(v) (log x)/x → 0 as x → ∞.

Proof. We shall make use of the identity log(e^s) = s for s ∈ R.
(i) We have log 1 = log(e⁰) = 0. Also log e = log(e¹) = 1.
(ii) For any s, t > 0, we have
$$ \log(st) = \log(e^{\log s}\, e^{\log t}) = \log(e^{\log s + \log t}) = \log s + \log t . $$
(iii) We have
$$ \log(1/x) = \log(1/e^{\log x}) = \log(e^{-\log x}) = -\log x . $$
(iv) Suppose that 0 < a < b. Then e^{log a} = a < b = e^{log b} and so we have log a < log b because exp x is strictly increasing. Now let M > 0 be given. Set m = e^M. Then if x > m, it follows that log x > log m, that is, log x > log(e^M) = M.
(v) Let v = log x. Then x = e^v and
$$ \frac{\log x}{x} = \frac{v}{e^v} . $$
Now, if x → ∞ then also log x → ∞, that is, v → ∞. However, we already know that v/e^v → 0 as v → ∞ and so (log x)/x → 0 as x → ∞. The proof is complete.
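Parts (ii) and (v) of Proposition 7.25 are easily illustrated numerically (an editorial Python sketch, not part of the notes; the sample values are arbitrary).

import math

s, t = 3.7, 12.25
print(math.log(s * t), math.log(s) + math.log(t))   # approximately equal, as in (ii)

for x in (10.0, 1e3, 1e6, 1e9):
    print(x, math.log(x) / x)                        # tends to 0 as x increases, as in (v)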
Theorem 7.26. The function log x is continuous at each point in (0, ∞).

Proof. Let s ∈ (0, ∞) be given and let ε > 0 be given. We know that the function t ↦ e^t is strictly increasing, that is, α < β if and only if e^α < e^β. This means that
$$ \alpha < t < \beta \iff e^\alpha < e^t < e^\beta . $$
In particular, if α = log s − ε, t = log x and β = log s + ε, this becomes
$$ \log s - \varepsilon < \log x < \log s + \varepsilon \iff s\,e^{-\varepsilon} < x < s\,e^{\varepsilon} . $$
Let δ = min{ s e^ε − s, s − s e^{−ε} }. Then
$$ |x - s| < \delta \implies s - \delta < x < s + \delta \implies s\,e^{-\varepsilon} < x < s\,e^{\varepsilon} . $$
Therefore log s − ε < log x < log s + ε, that is, |log x − log s| < ε, and it follows that log x is continuous at s.

The next theorem tells us what the derivative of log x is.

Theorem 7.27. The function log x is differentiable at every s > 0 and its derivative at s is 1/s. (In other words, $\frac{d}{dx}\log x = 1/x$ on (0, ∞).)

Proof. Let s ∈ (0, ∞) be given and let h be small but h ≠ 0. We must show that the Newton quotient $\frac{1}{h}(\log(s+h) - \log(s))$ approaches 1/s as h → 0. To see this, let v = log s and let k = log(s + h) − log s, so that log(s + h) = v + k. The continuity of log x at s implies that log(s + h) → log(s) as h → 0, that is, k → 0 as h → 0. In terms of v and k, we have
$$ h = (s + h) - s = e^{\log(s+h)} - e^{\log s} = e^{v+k} - e^v = \exp(v + k) - \exp(v) . $$
Note also that since h ≠ 0, it follows that s + h ≠ s and so log(s + h) ≠ log s, which means that k ≠ 0. Using these remarks, we get
$$ \frac{\log(s + h) - \log(s)}{h} = \frac{k}{\exp(v + k) - \exp(v)} = \Big( \frac{\exp(v + k) - \exp(v)}{k} \Big)^{-1} \to \big( \exp'(v) \big)^{-1} = \frac{1}{\exp(v)} = \frac{1}{s} $$
as h → 0 (since also k → 0), as required.

For any positive real number a, we know what the power a^k means for any k ∈ N. We also know what a^{p/q} means for p, q ∈ N: it is the real number whose qth power is equal to a^p. However, it is not at all clear what a power such as $3^{\sqrt{2}}$ means. We would like to set up a reasonable definition for powers such as this. We need some preliminary results.

Proposition 7.28. For any a > 0 and m, n ∈ N,
$$ \log(a^{m/n}) = \frac{m}{n}\,\log(a) . $$

Proof. First note that log(s^k) = k log(s) for any s > 0 and k ∈ N. We shall verify this by induction. For k ∈ N, let P(k) be the statement "log(s^k) = k log(s)". Evidently, P(1) is true. Using the previous proposition, we see that
$$ \log(s^{k+1}) = \log(s^k\, s) = \log(s^k) + \log(s) = k\log(s) + \log(s) = (k+1)\log(s) $$
if P(k) is true. Hence the truth of P(k + 1) follows from that of P(k) and so, by induction, we conclude that P(k) is true for all k ∈ N.

Let t = a^{1/n} so that t^n = a and t^m = a^{m/n}. We have
$$ n\log(a^{m/n}) = n\log(t^m) = nm\log(t) = m\log(t^n) = m\log(a) $$
and it follows that log(a^{m/n}) = (m/n) log(a).

From the above, we see that a^{m/n} = exp((m/n) log(a)). Moreover, a^{−m/n} = 1/a^{m/n} = 1/exp((m/n) log(a)) = exp(−(m/n) log(a)). Hence a^r = e^{r log a} for any a > 0 and any r ∈ Q. Now, the left hand side, a^r, makes no sense unless r is rational, but the right hand side, namely e^{r log a} (which is shorthand notation for exp(r log a)), is well-defined for any real number r. This suggests the following definition of the power a^s for any s ∈ R.

Definition 7.29. For a > 0 and s ∈ R, the power a^s is defined to be
$$ a^s = e^{s \log a} . $$

A further remark is in order here. By setting a = e, the real number exp(1), we have a formula for the power e^s. But this is just e^s = exp(s log e) = exp s, since log e = 1. This is in agreement with our penchant for using the shorthand notation e^s = exp s, so everything works out alright; that is, we can think of the expression e^s as being the power, as defined above, or as an abbreviation for the exponential, exp s. These are the same thing.
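Definition 7.29 agrees with the familiar powers, as the following editorial Python sketch (not part of the notes) illustrates for a few sample exponents, rational and irrational.

import math

a = 3.0
for s in (2, 0.5, -1.25, math.sqrt(2)):
    via_def = math.exp(s * math.log(a))   # a^s as defined: exp(s log a)
    print(s, via_def, a**s)               # the two values agree up to rounding error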
The next proposition tells us that the expected power laws hold.

Proposition 7.30. For any a ∈ (0, ∞) and any s, t ∈ R, we have a^{s+t} = a^s a^t and (a^s)^t = a^{st}.

Proof. From the definition, we have
$$ a^{s+t} = \exp((s + t)\log a) = \exp(s\log a)\exp(t\log a) = a^s a^t . $$
Similarly,
$$ (a^s)^t = \exp(t\log(a^s)) = \exp(t\log(e^{s\log a})) = \exp(t\,s\log a) = a^{st} , $$
as required.

Proposition 7.31. For any a ∈ R, the function f(x) = x^a is differentiable on (0, ∞) and f'(x) = a x^{a−1}.

Proof. From the definition, f(x) = x^a = e^{a log x} and so the standard rules of differentiation imply that
$$ f'(x) = \frac{a}{x}\, e^{a\log x} = a\,\frac{x^a}{x} = a\, x^{a-1} , $$
as claimed.
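As a final editorial illustration (not part of the notes), a finite-difference quotient of x^a is close to the derivative a x^{a−1} given by Proposition 7.31 at an arbitrarily chosen point.

import math

a, x = 2.7, 1.8
h = 1e-6
quotient = ((x + h)**a - x**a) / h
print(quotient, a * x**(a - 1))     # approximately equal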