Draft for Notes on Integration - October 2000
J.D.S. Jones

1 Introduction

1.1 The mid-ordinate rule

The mid-ordinate rule is a method of numerical integration, that is, a method of estimating the integral
\[ \int_a^b f(x)\,dx. \]
First we divide the interval $[a, b]$ into $n$ equal intervals, each of length
\[ h = \frac{b-a}{n}. \]
Now we estimate the area under the graph $y = f(x)$ in the region where $a \le x \le b$ by using the $n$ rectangles in the following diagram.

[diagram]

More precisely, let
\[ a_0 = a, \qquad a_{r+1} = a_r + h, \]
\[ x_1 = a + \frac{h}{2}, \qquad x_r = x_{r-1} + h = a + \left(r - \tfrac{1}{2}\right)h, \]
\[ y_r = f(x_r). \]
Then the approximation to $\int_a^b f(x)\,dx$ given by the mid-ordinate rule with $n$ strips is
\[ A_n = \sum_{r=1}^{n} h\,y_r = \sum_{r=1}^{n} \frac{b-a}{n}\, f\!\left(a + \frac{(r - \frac{1}{2})(b-a)}{n}\right). \]

Clearly, there are two fundamental questions.

• Does this procedure always work? In precise mathematical terms, is it true that
\[ \lim_{n\to\infty} A_n = \int_a^b f(x)\,dx\,? \]

• How efficient is the mid-ordinate rule? Given that the sequence $A_n$ does indeed converge to $\int_a^b f(x)\,dx$, how fast does it converge?

The second question is, of course, very interesting: it is the kind of question studied in numerical analysis. But here we will concentrate on the first question, which is a fundamental theoretical question that we should be able to answer using the techniques of analysis.

The trouble is that we are not really dealing with a well-posed problem. Given a function $f$, the sequence $A_n$ is clearly well-defined and, from first year analysis, we have a very precise mathematical understanding of limits of sequences. However, we do not have a comparable mathematically precise definition of the integral, one which is suitable for rigorous proofs and calculations. And until we do have such a definition we cannot hope to prove that
\[ \lim_{n\to\infty} A_n = \int_a^b f(x)\,dx. \]
And until we have such a proof, how do we know the mid-ordinate rule works?

The next step is very typical of the thought processes in modern mathematics. We turn the whole thing on its head.
The left hand side of the above equation is mathematically well-defined. The right hand side is not, and this prevents us from trying to prove the equation. So we try to use the left hand side as the definition of the integral. We turn the proposed method for computing the integral into the definition of the integral. This completely changes the nature of the problem: what we must now do is to prove that if we define the integral by this method, then the resulting process of integration really does have all the properties we expect it to have.

We come across an identical situation when we analyse the concept of area. We know what the area of a rectangle is, and while we do have a clear intuitive idea of what the area of a region in the plane is, we do not have a mathematically precise definition of area, one which is suitable for rigorous proofs and calculations. However, one method of computing areas is to approximate the region by rectangles and pass to an appropriate limit. Now we turn this method of computing areas into the definition of area: to define the area of a region we approximate it by rectangles and then take a limit. Of course, since areas and integration are so inextricably linked, it is not at all surprising that we should come across the same fundamental problem and solve it by identical methods.

One conclusion to draw from this discussion of the mid-ordinate rule is that we need to examine the fundamentals of integration in much more detail. In particular we need to formulate a precise mathematical definition of the integral, one which can be used in the rigorous arguments of analysis. The discussion also gives a clue as to how we might formulate such a definition. It is easiest to express this in terms of defining the area of a region in the plane: approximate the region by rectangles and take a limit. This is probably a familiar idea, and at an intuitive level it seems clear that such a procedure should work.
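As a purely numerical illustration of the mid-ordinate rule of Section 1.1, here is a minimal Python sketch. The function name `mid_ordinate` and the test function $f(x) = x^2$ are assumptions made for this illustration and are not part of the notes. Watching the printed values settle down is only evidence at the intuitive level; it proves nothing about convergence.

```python
def mid_ordinate(f, a, b, n):
    """Mid-ordinate approximation A_n to the integral of f over [a, b].

    Hypothetical helper: uses n strips of equal width h = (b - a) / n and
    samples f at the midpoints x_r = a + (r - 1/2) h for r = 1, ..., n.
    """
    h = (b - a) / n
    return h * sum(f(a + (r - 0.5) * h) for r in range(1, n + 1))


# Illustration only: f(x) = x^2 on [0, 1], whose familiar integral is 1/3.
for n in (1, 10, 100, 1000):
    print(n, mid_ordinate(lambda x: x * x, 0.0, 1.0, n))
```

The sketch simply evaluates the formula for $A_n$; it takes for granted that `f` can be evaluated at every midpoint.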
At a mathematical level there are two basic questions.

• What, precisely, do we mean by “approximate”?

• What, precisely, do we mean by “take a limit”?

1.2 The integral as anti-derivative

One of the most important and basic facts of integration is the fundamental theorem of calculus
\[ \int_a^b g'(x)\,dx = g(b) - g(a). \]
To put this another way, if
\[ g(x) = \int_a^x f(t)\,dt \]
then $g'(x) = f(x)$. Thus the integral is the “anti-derivative” in the sense that the integral of a function $f$ gives a function whose derivative is $f$. The two forms of the fundamental theorem of calculus given above are equivalent; if you are in any doubt about this you should verify it. A third version of the same statement is that the solution to the ordinary differential equation
\[ \frac{dy}{dx} = f(x) \]
is given by the indefinite integral
\[ y = \int f(x)\,dx + \text{constant}. \]

Could the fundamental theorem of calculus be the starting point for a rigorous theory of integration? The major defect of this approach is that it does not really solve the basic problem, it just pushes it off somewhere else. For example, is it true that every continuous function is integrable? If we define the integral to be the anti-derivative, all we have done is to rephrase the problem: is it true that if $f$ is a continuous function, then there is a function $g$ such that $g' = f$?

To illustrate, let us look at a specific problem: calculate the arc-length of the ellipse
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]

[diagram]

This problem is of considerable importance in astronomy, since Kepler’s first law of planetary motion is that the planets move in elliptical orbits with the sun at one focus.

Remember how to calculate the arc-length of a parametrised curve $x = x(t)$, $y = y(t)$ in the $(x, y)$ plane. The arc-length of that part of the curve where $0 \le t \le u$ is given by the integral
\[ s(u) = \int_0^u \sqrt{x'(t)^2 + y'(t)^2}\,dt \]
where
\[ x'(t) = \frac{dx}{dt}, \qquad y'(t) = \frac{dy}{dt}. \]
The justification for this formula is that the integrand is the “speed” of the curve and so the integral gives the “distance covered” in the time interval $0 \le t \le u$.

In the case of the ellipse, let us consider just that portion of the ellipse which lies in the positive quadrant. Then we can use $t = x/a$ as the parameter and parametrise the ellipse by
\[ x = at, \qquad y = b\sqrt{1 - t^2}. \]
Thus
\[ x'(t) = a, \qquad y'(t) = \frac{-bt}{\sqrt{1 - t^2}} \]
and substituting into the integral for arc-length gives
\[ s(u) = \int_0^u \sqrt{a^2 + \frac{b^2 t^2}{1 - t^2}}\,dt. \]
Now put
\[ k = \frac{b^2 - a^2}{a^2}. \]
We get
\[ s(u) = a \int_0^u \sqrt{\frac{1 + kt^2}{1 - t^2}}\,dt = a \int_0^u \frac{1 + kt^2}{\sqrt{(1 - t^2)(1 + kt^2)}}\,dt. \]
If $k \neq 0$ this integral cannot be expressed in terms of elementary functions. Here, by elementary functions we mean polynomials, circular functions (sin, arcsin, cos, arccos, tan, arctan), logarithms, and exponentials.

Similar integrals arise in many other classical calculations. For example, calculating the arc-length of the lemniscate
\[ (x^2 + y^2)^2 = a^2 (x^2 - y^2) \]
(see diagram below) leads to the integral
\[ \int_0^u \frac{a^2}{\sqrt{a^4 - r^4}}\,dr. \]

[diagram]

The problem of determining the period of a simple pendulum, with no approximation, leads to the integral
\[ \int_0^{\pi/2} \frac{d\varphi}{\sqrt{1 - k^2 \sin^2\varphi}}. \]
Using the substitution $u = \sin\varphi$, this integral becomes
\[ \int_0^1 \frac{du}{\sqrt{(1 - u^2)(1 - k^2 u^2)}}. \]

All these integrals are examples of elliptic integrals, that is, integrals of the form
\[ \int R\!\left(x, \sqrt{p(x)}\right) dx \]
where $R(x, y)$ is a rational function of $x$ and $y$ (this means that $R(x, y) = \alpha(x, y)/\beta(x, y)$ where $\alpha(x, y)$ and $\beta(x, y)$ are polynomials in $x$ and $y$) and $p(x)$ is a polynomial in $x$ of degree 3 or 4. In general, an elliptic integral cannot be expressed in terms of elementary functions.

To study these elliptic integrals we need a genuinely new class of functions, elliptic functions.
But if we insist that the integral is the anti-derivative we quickly get into a logical conundrum: in order to define these elliptic functions we need to know that the integral exists, and in order to show that these integrals exist we need to show that the elliptic functions exist. Once more we are led to the conclusion that we should analyse the fundamental concepts of integration much more deeply.

1.3 A challenge problem

Before going into the theory of integration in detail, let us formulate the problem of proving that the mid-ordinate rule, described in Section 1.1, works if the function $f$ has an anti-derivative.

Problem/Challenge. Let $f : [a, b] \to \mathbb{R}$ be a continuous function, and suppose that there is a differentiable function $g : [a, b] \to \mathbb{R}$ such that $g' = f$. Define
\[ A_n = A_n(f; a, b) = \frac{b-a}{n} \sum_{r=1}^{n} f(x_r) \]
where
\[ x_r = a + \frac{(r - \frac{1}{2})(b - a)}{n}. \]
Prove that the sequence $A_n$ converges and that
\[ \lim_{n\to\infty} A_n = g(b) - g(a). \]

In this problem, the term $A_n$ is the approximation to the integral $\int_a^b f(x)\,dx$ given by the mid-ordinate rule with $n$ strips. The right hand side of the equation is the value of the integral $\int_a^b f(x)\,dx$, assuming, of course, that we can set up a theory of integration and prove the fundamental theorem of calculus. However, the problem itself makes no reference to integration whatsoever, so the challenge is to prove the result without using integration in an explicit way. This is a hard problem!

1.4 The programme

In these notes we will carry out the following programme.

• We will give a definition of the integral which is based on the idea that the integral is the area under the curve. The way to give a precise mathematical definition of area is to approximate by rectangles and take a limit.
• We will then prove that this definition gives us an integral with the expected properties.

• We will also prove two of the most important theorems in integration: the fundamental theorem of calculus; and the theorem that every continuous function is integrable on a closed bounded interval. This means that if $f : [a, b] \to \mathbb{R}$ is continuous, then $\int_a^b f(x)\,dx$ exists.

2 Step functions

The starting point for the construction of the integral is the idea that
\[ \int_a^b f(x)\,dx \]
is the area under the graph $y = f(x)$ in the region where $a \le x \le b$. A step function $\varphi$ is a function with the property that the region under the graph $y = \varphi(x)$ is just the union of a finite number of rectangles. In this case we know exactly what the area of this region is. So the first step in developing a rigorous theory of integration is to take a careful look at step functions.

Let $[a, b]$ be a closed interval in $\mathbb{R}$. Suppose we choose $k - 1$ points $p_1, \dots, p_{k-1}$ in this interval such that
\[ a < p_1 < p_2 < \cdots < p_{k-1} < b. \]
Then we have defined a partition of the interval $[a, b]$ into $k$ smaller intervals $[p_{i-1}, p_i]$. We will always use the convention that $p_0 = a$ and $p_k = b$ and write this partition as
\[ a = p_0 < p_1 < p_2 < \cdots < p_{k-1} < p_k = b. \]

[diagram]

Note that if we choose any set of $k - 1$ distinct points in the open interval $(a, b)$ then we can arrange them in increasing order to define a partition of the interval $[a, b]$. So we will sometimes describe the partition by simply giving the set $\{p_1, \dots, p_{k-1}\}$.

Now we give the formal definition of a step function.

Definition 2.1. A function $\varphi : [a, b] \to \mathbb{R}$ is a step function if there is a partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
of the interval $[a, b]$ such that $\varphi$ is constant on each of the open intervals $(p_{i-1}, p_i)$.

[diagram]

Notice that we put no restrictions on the values of $\varphi$ at the partitioning points $p_i$. Also notice that we do not assume that the intervals $(p_{i-1}, p_i)$ have the same length.
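To make Definition 2.1 concrete, here is a minimal Python sketch of one possible way to store a step function. The class name `StepFunction` and its three fields are hypothetical choices made for this illustration, not notation from the notes. The values at the partitioning points are stored separately because the definition places no restriction on them.

```python
from bisect import bisect_left


class StepFunction:
    """A step function on [a, b], stored via a partition (illustrative only).

    points    : p_0 < p_1 < ... < p_k, with p_0 = a and p_k = b
    values    : c_1, ..., c_k, the constant value on each open interval (p_{i-1}, p_i)
    at_points : the values at p_0, ..., p_k, which Definition 2.1 leaves unrestricted
    """

    def __init__(self, points, values, at_points):
        if any(p >= q for p, q in zip(points, points[1:])):
            raise ValueError("partition points must be strictly increasing")
        if len(values) != len(points) - 1 or len(at_points) != len(points):
            raise ValueError("need k values and k + 1 point values")
        self.points = list(points)
        self.values = list(values)
        self.at_points = list(at_points)

    def __call__(self, x):
        if not self.points[0] <= x <= self.points[-1]:
            raise ValueError("x lies outside [a, b]")
        i = bisect_left(self.points, x)   # smallest index i with points[i] >= x
        if self.points[i] == x:           # x is a partitioning point
            return self.at_points[i]
        return self.values[i - 1]         # x lies in the open interval (p_{i-1}, p_i)


# A step function on [0, 2]: equal to 1 on [0, 1) and to -1 on [1, 2].
phi = StepFunction([0, 1, 2], [1, -1], [1, -1, -1])
print(phi(0.5), phi(1), phi(1.5))
```

Nothing in this representation requires the intervals to have equal length, in line with the remark above.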
Next, we make some elementary, but very useful, remarks concerning step functions and partitions.

Remark 2.2. Let $P$ and $Q$ be partitions of $[a, b]$. In terms of the partitioning points, let $P$ be the partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
and let $Q$ be the partition
\[ a = q_0 < q_1 < \cdots < q_{l-1} < q_l = b. \]
Then $Q$ is a refinement of $P$ if
\[ \{p_1, \dots, p_{k-1}\} \subset \{q_1, \dots, q_{l-1}\}. \]
Given that $Q$ is a refinement of $P$, it follows that $p_r = q_{j_r}$ for some unique $j_r$ and
\[ p_r = q_{j_r} < q_{j_r + 1} < q_{j_r + 2} < \cdots < q_{j_{r+1}} = p_{r+1}. \]
So geometrically, $Q$ is a refinement of $P$ if $Q$ is obtained from $P$ by partitioning each of the intervals defined by $P$ into smaller intervals.

Remark 2.3. Let $P$ and $Q$ be partitions of $[a, b]$. Then there is a third partition $R$ of $[a, b]$ such that $R$ is a refinement of both $P$ and $Q$. If the partition $P$ is defined by the set of points $\{p_1, \dots, p_{k-1}\}$ and the partition $Q$ is defined by the set of points $\{q_1, \dots, q_{l-1}\}$ then we can take $R$ to be the partition defined by the set of points
\[ \{p_1, \dots, p_{k-1}, q_1, \dots, q_{l-1}\}. \]
This set will contain fewer than $k + l - 2$ points if some of the $p_i$ coincide with some of the $q_j$. Furthermore, to actually define the partition $R$ it may be necessary to reorder the points $p_1, \dots, p_{k-1}, q_1, \dots, q_{l-1}$. But nonetheless these points do indeed define a partition of $[a, b]$. It is clear that $R$ is a refinement of both $P$ and $Q$.

Here is an example. Take $P$ to be the partition of $[0, 3]$ defined by the points $\{1, 2\}$.

[diagram]

Take $Q$ to be the partition of $[0, 3]$ defined by the points $\{1, \frac{3}{2}, \frac{5}{2}\}$.

[diagram]

Then $R$ is the partition of $[0, 3]$ defined by the points $\{1, \frac{3}{2}, \frac{5}{2}, 2\}$. There are four partitioning points (rather than five) and to actually obtain the intervals defined by $R$ we need to reorder these four points.

Remark 2.4. As an example which illustrates the two previous remarks, suppose we are given two step functions $\varphi, \psi : [a, b] \to \mathbb{R}$.
Then there is a partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
of $[a, b]$ such that both $\varphi$ and $\psi$ are constant on the open intervals $(p_{i-1}, p_i)$. To see this, choose a partition $P_1$ so that $\varphi$ is constant on the open intervals defined by $P_1$ and another partition $P_2$ so that $\psi$ is constant on the open intervals defined by $P_2$. Now choose a common refinement $P$ of $P_1$ and $P_2$. Then both $\varphi$ and $\psi$ are constant on the open intervals defined by $P$.

Remark 2.5. Given a step function $\varphi : [a, b] \to \mathbb{R}$ there is a “minimal” partition with the property that $\varphi$ is constant on the open intervals defined by the partition. Here, minimal means that there are the minimum number of partitioning points. This should be geometrically obvious. Here is a formal construction of this minimal partition. Define $p_1$ by
\[ p_1 = \sup\{x \in [a, b] : \varphi \text{ is constant on } (a, x)\}. \]
We must check that this supremum does indeed exist. Since $\varphi$ is a step function the set
\[ \{x \in [a, b] : \varphi \text{ is constant on } (a, x)\} \]
is non-empty; it is bounded above by $b$ and therefore the supremum exists. In general define $p_i$ inductively by
\[ p_i = \sup\{x \in [a, b] : \varphi \text{ is constant on } (p_{i-1}, x)\}. \]

Why don’t we always use this minimal partition? The answer is that it is much easier to write proofs if we are allowed to choose any partition of the interval with the property that $\varphi$ is constant on the open intervals defined by the partition.

This collection of remarks will prove to be very useful when dealing with step functions. Now we define the integral of a step function.

Definition 2.6. Let $\varphi : [a, b] \to \mathbb{R}$ be a step function. Then the integral of $\varphi$,
\[ \int_a^b \varphi(x)\,dx = \int_a^b \varphi, \]
is defined as follows. Choose a partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
of $[a, b]$ such that $\varphi$ is constant on the open intervals $(p_{i-1}, p_i)$. Let $c_i$ be the value of $\varphi$ on the interval $(p_{i-1}, p_i)$; then
\[ \int_a^b \varphi = \sum_{i=1}^{k} c_i (p_i - p_{i-1}). \]

This definition says that if $\varphi$ is a step function, then $\int_a^b \varphi$ is the (signed) area under the graph $y = \varphi(x)$ in the region $a \le x \le b$.
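The sum in Definition 2.6 is easy to evaluate mechanically. The following Python sketch is an illustration only; the function name `step_integral` and the list-based encoding of the partition are assumptions, not part of the notes. It takes the partition points $p_0, \dots, p_k$ and the constants $c_1, \dots, c_k$. The values of $\varphi$ at the partitioning points do not appear in the sum, so they are not needed.

```python
def step_integral(points, values):
    """Integral of a step function per Definition 2.6 (hypothetical helper).

    points : p_0 < p_1 < ... < p_k
    values : c_1, ..., c_k, where c_i is the value on (p_{i-1}, p_i)
    Returns the sum of c_i * (p_i - p_{i-1}) for i = 1, ..., k.
    """
    if len(values) != len(points) - 1:
        raise ValueError("need exactly one value per open interval")
    return sum(c * (q - p) for c, p, q in zip(values, points, points[1:]))


# The step function equal to 1 on (0, 1) and -1 on (1, 2).
print(step_integral([0, 1, 2], [1, -1]))

# The same function described with an extra partitioning point at 1/2.
print(step_integral([0, 0.5, 1, 2], [1, 1, -1]))
```

Both calls describe the same step function, once with a redundant partitioning point, and both print a value equal to zero.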
Since the integral of a step function is just the area under its graph, it should be obvious that the integral does not depend on the choice of the partition used in its definition. However, we will go through the proof of this, if for no other reason than to gain experience in converting intuition into a formal proof.

Lemma 2.7. The integral of a step function does not depend on the choice of the partition used in the definition.

Proof. Let $P$ be a partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
of $[a, b]$ such that $\varphi$ is constant on the open intervals $(p_{i-1}, p_i)$. Suppose that we add a single extra point to this partition to get a new partition $Q$. We will begin by proving that the integral defined using $Q$ is the same as the integral defined using $P$. As in the definition, let $c_i$ be the value $\varphi$ assumes on $(p_{i-1}, p_i)$. Then the integral defined using $P$ is
\[ c_1 (p_1 - a) + c_2 (p_2 - p_1) + \cdots + c_k (b - p_{k-1}). \]
The extra point $q$ of the partition $Q$ will be in one of the open intervals, say $q \in (p_{r-1}, p_r)$. So $Q$ is the partition
\[ a = p_0 < p_1 < \cdots < p_{r-1} < q < p_r < \cdots < p_k = b. \]
Now using $Q$ to define the integral has the effect of replacing the term $c_r (p_r - p_{r-1})$ by
\[ c_r (q - p_{r-1}) + c_r (p_r - q). \]
But this does not change the sum.

Now if $Q$ is any refinement of $P$, then we can obtain $Q$ by successively adding single points. So it follows, from the above argument, that the integral defined using $Q$ is the same as the integral defined using $P$.

If $Q$ is any other partition such that $\varphi$ is constant on the open intervals defined by $Q$, then we can choose a partition $R$ which is a refinement of both $P$ and $Q$. It follows that the integral defined using $P$ is equal to the integral defined using $R$, and the integral defined using $Q$ is equal to the integral defined using $R$. Hence the integrals defined using $P$ and $Q$ are equal.

Next we establish the basic properties of the integral of step functions.
If $\varphi : [a, b] \to \mathbb{R}$ is a step function and $[u, v]$ is a closed interval contained in $[a, b]$, then we can restrict $\varphi$ to $[u, v]$ and $\varphi : [u, v] \to \mathbb{R}$ is still a step function. So we can form the integral
\[ \int_u^v \varphi \]
for any $u, v$ with $a \le u < v \le b$.

Lemma 2.8. Let $\varphi : [a, b] \to \mathbb{R}$ be a step function. Suppose $a \le u < v < w \le b$; then
\[ \int_u^v \varphi + \int_v^w \varphi = \int_u^w \varphi. \]

Proof. Choose a partition
\[ u = p_0 < p_1 < \cdots < p_k = v \]
of the closed interval $[u, v]$ such that the step function $\varphi$ is constant on the open intervals $(p_{i-1}, p_i)$ for $1 \le i \le k$. Now choose a partition
\[ v = q_0 < q_1 < \cdots < q_l = w \]
of the closed interval $[v, w]$ such that $\varphi$ is constant on the intervals $(q_{j-1}, q_j)$ for $1 \le j \le l$. Now using the partition
\[ u = p_0 < p_1 < \cdots < p_k = v = q_0 < q_1 < \cdots < q_l = w \]
to compute $\int_u^w \varphi$, it follows that
\[ \int_u^w \varphi = \int_u^v \varphi + \int_v^w \varphi. \]

This lemma can be used in the following way. Suppose $\varphi : [a, b] \to \mathbb{R}$ is a step function and choose a partition
\[ a = p_0 < p_1 < \cdots < p_k = b \]
such that $\varphi$ is constant on the open intervals $(p_{i-1}, p_i)$. Then by the previous lemma
\[ \int_a^b \varphi = \int_a^{p_1} \varphi + \int_{p_1}^{p_2} \varphi + \cdots + \int_{p_{k-1}}^{b} \varphi. \]
Now $\varphi$ is constant on each of the intervals $(p_{i-1}, p_i)$ and so this provides us with a method of extending an argument which works for a constant function to one which covers the case of a general step function. This trick is illustrated in the next proof.

Lemma 2.9. Let $\varphi, \psi : [a, b] \to \mathbb{R}$ be step functions, and let $\lambda, \mu$ be real numbers. Then $\lambda\varphi + \mu\psi$ is a step function and
\[ \int_a^b (\lambda\varphi + \mu\psi) = \lambda \int_a^b \varphi + \mu \int_a^b \psi. \]

Proof. Choose a partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
of the interval $[a, b]$ such that both $\varphi$ and $\psi$ are constant on each of the open intervals $(p_{i-1}, p_i)$. Such a partition exists by Remark 2.4. Let $c_i$ and $d_i$ be the constant values of $\varphi$ and $\psi$ respectively on the interval $(p_{i-1}, p_i)$. Then $\lambda\varphi + \mu\psi$ is constant on $(p_{i-1}, p_i)$ and its value on this interval is $\lambda c_i + \mu d_i$. This shows that $\lambda\varphi + \mu\psi$ is a step function.
Furthermore,
\[ \int_{p_{i-1}}^{p_i} (\lambda\varphi + \mu\psi) = (\lambda c_i + \mu d_i)(p_i - p_{i-1}) = \lambda \int_{p_{i-1}}^{p_i} \varphi + \mu \int_{p_{i-1}}^{p_i} \psi \]
and summing over the intervals $(p_{i-1}, p_i)$ gives the result.

In the language of linear algebra, Lemma 2.9 shows that the set $S[a, b]$ of step functions $[a, b] \to \mathbb{R}$ is a vector space over $\mathbb{R}$ and the function
\[ I : S[a, b] \to \mathbb{R}, \qquad I(\varphi) = \int_a^b \varphi \]
is linear.

The next lemma tells us that if we are given some bound on a step function then we get a bound on its integral.

Lemma 2.10. Let $\varphi : [a, b] \to \mathbb{R}$ be a step function and suppose
\[ m \le \varphi(x) \le M \quad \text{for all } x \in [a, b], \]
where $m$ and $M$ are real numbers. Then
\[ m(b - a) \le \int_a^b \varphi \le M(b - a). \]

Proof. As usual let
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
be a partition of the interval $[a, b]$ such that $\varphi$ is constant on the open interval $(p_{i-1}, p_i)$, and let $c_i$ be the value $\varphi$ assumes on this interval; then
\[ \int_a^b \varphi = \sum_{i=1}^{k} c_i (p_i - p_{i-1}). \]
Since $m \le \varphi(x) \le M$ for all $x \in [a, b]$, it follows that $m \le c_i \le M$ and therefore
\[ m(p_i - p_{i-1}) \le c_i (p_i - p_{i-1}) \le M(p_i - p_{i-1}). \]
It follows that
\[ m \sum_{i=1}^{k} (p_i - p_{i-1}) \le \sum_{i=1}^{k} c_i (p_i - p_{i-1}) = \int_a^b \varphi \le M \sum_{i=1}^{k} (p_i - p_{i-1}). \]
However
\[ \sum_{i=1}^{k} (p_i - p_{i-1}) = b - a \]
and this gives the required bound.

Corollary 2.11. Let $\varphi : [a, b] \to \mathbb{R}$ be a step function and suppose
\[ |\varphi(x)| \le K \quad \text{for all } x \in [a, b], \]
where $K$ is a real number with $K \ge 0$. Then
\[ \left| \int_a^b \varphi \right| \le K(b - a). \]

Proof. Since $|\varphi(x)| \le K$ for all $x \in [a, b]$ it follows that
\[ -K \le \varphi(x) \le K \quad \text{for all } x \in [a, b] \]
and so it follows from Lemma 2.10 that
\[ -K(b - a) \le \int_a^b \varphi \le K(b - a) \]
and therefore
\[ \left| \int_a^b \varphi \right| \le K(b - a). \]

Finally let us prove the fundamental theorem of calculus for step functions. Let $\varphi : [a, b] \to \mathbb{R}$ be a step function. Then for any $x$ with $a \le x \le b$ we can define
\[ \Phi(x) = \int_a^x \varphi, \]
and this defines a function $\Phi : [a, b] \to \mathbb{R}$. This function $\Phi$ is the indefinite integral of $\varphi$; note that $\Phi(a) = 0$.

Theorem 2.12. Let $\varphi : [a, b] \to \mathbb{R}$ be a step function. Let
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
be a partition of $[a, b]$ such that $\varphi$ is constant on each of the open intervals $(p_{i-1}, p_i)$.
Then the function
\[ \Phi(x) = \int_a^x \varphi \]
is differentiable on each of the intervals $(p_{i-1}, p_i)$ and
\[ \Phi'(x) = \varphi(x) \quad \text{for } x \in (p_{i-1}, p_i). \]

Proof. For $x \in (p_{i-1}, p_i)$,
\[ \Phi(x) = \int_a^x \varphi = \int_a^{p_{i-1}} \varphi + \int_{p_{i-1}}^{x} \varphi \]
and since $\varphi$ is constant on $(p_{i-1}, p_i)$ it is sufficient to prove the result for a constant function. However, in this case the result is genuinely obvious: if $\varphi$ takes the value $c$ on $(p_{i-1}, p_i)$, then the second integral is $c(x - p_{i-1})$, whose derivative is $c$.

To illustrate, let us look at a simple example. Define $\varphi : [0, 2] \to \mathbb{R}$ by
\[ \varphi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1, \\ -1 & \text{if } 1 \le x \le 2. \end{cases} \]

[diagram]

Then it is easy to compute the indefinite integral
\[ \Phi(x) = \int_0^x \varphi, \qquad \Phi(x) = \begin{cases} x & \text{if } 0 \le x < 1, \\ 2 - x & \text{if } 1 \le x \le 2. \end{cases} \]

[diagram]

There are three points to make.

• Even though $\varphi$ is not continuous (it is discontinuous at $x = 1$) its indefinite integral $\Phi$ is continuous.

• If we change the value of $\varphi$ at the discontinuity $x = 1$ this will not change the indefinite integral of $\varphi$.

• The indefinite integral is continuous at $x = 1$ but it is not differentiable at $x = 1$. Furthermore, if we change the value of $\varphi$ at $x = 1$ we will not change the fact that the indefinite integral is not differentiable at $x = 1$.

Now we see why Theorem 2.12 makes no assertion at the partitioning points $p_i$ of the partition. This is an example where the indefinite integral $\Phi$ of a step function $\varphi$ is not differentiable at a point $p_i$ of an appropriately chosen partition.

3 Regulated functions

Definition 3.1. A function $f : [a, b] \to \mathbb{R}$ is a regulated function if given $\varepsilon > 0$ there is a step function $\varphi : [a, b] \to \mathbb{R}$ such that
\[ |f(x) - \varphi(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]

Thus it follows (using the notation of Definition 3.1) that the vertical distance between the graph of $f$ and the graph of $\varphi$ is always less than $\varepsilon$.

[diagram]

So, it should be the case that in the region where $a \le x \le b$, the area under the graph $y = f(x)$ differs from the area under the graph $y = \varphi(x)$ by at most $\varepsilon(b - a)$. Thus we should be able to approximate the area under the graph $y = f(x)$ to any degree of accuracy by the area under the graph of a step function.
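A small numerical experiment can make Definition 3.1 more tangible. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the choice of equal sub-intervals with right-endpoint values is just one convenient way to build a step function. It constructs a step function close to a given $f$ and estimates the largest gap $|f(x) - \varphi(x)|$ on a finite grid. A grid check of this kind only samples finitely many points, so it illustrates the definition and does not prove that $f$ is regulated.

```python
import math


def right_endpoint_step(f, a, b, n):
    """Step function with phi(a) = f(a) and phi(x) = f(p_r) on (p_{r-1}, p_r],
    where p_r = a + r (b - a) / n. Hypothetical helper for illustration."""
    h = (b - a) / n

    def phi(x):
        if x == a:
            return f(a)
        r = math.ceil((x - a) / h)   # index of the interval (p_{r-1}, p_r] containing x
        r = min(max(r, 1), n)        # guard against floating-point rounding at the ends
        return f(a + r * h)

    return phi


def sup_distance_on_grid(f, phi, a, b, samples=10001):
    """Largest |f(x) - phi(x)| over an equally spaced grid (an estimate only)."""
    grid = (a + (b - a) * i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - phi(x)) for x in grid)


def square(x):
    return x * x


for n in (10, 100, 1000):
    phi_n = right_endpoint_step(square, 0.0, 1.0, n)
    print(n, sup_distance_on_grid(square, phi_n, 0.0, 1.0))
```

For this particular $f$ the printed gaps shrink as $n$ grows, which is the behaviour Definition 3.1 asks for.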
Taking account of the main point of Section 1.1, the possibility of approximating areas in this way suggests that we should define
\[ \int_a^b f \]
by approximating $f$ by step functions and taking a limit. This is exactly what we will do. The single most important step in implementing this procedure is to decide what kind of limit to take.

Definition 3.2. A sequence of step functions $\varphi_n : [a, b] \to \mathbb{R}$ converges uniformly to a function $f : [a, b] \to \mathbb{R}$ if given $\varepsilon > 0$ there is an integer $N$ such that
\[ n > N \implies |f(x) - \varphi_n(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]

The key point in this definition is that $N$ is independent of $x$; there is one $N$ which works for all $x$. We sometimes say that $\varphi_n$ converges uniformly on $[a, b]$ to emphasise that we can choose $N$ which is independent of $x$ provided $x \in [a, b]$.

Lemma 3.3. Let $f : [a, b] \to \mathbb{R}$ be a function. Then $f$ is a regulated function if and only if there is a sequence of step functions $\varphi_n : [a, b] \to \mathbb{R}$ such that $\varphi_n$ converges uniformly to $f$ on $[a, b]$.

Proof. First suppose $f$ is regulated. Choose a step function $\varphi_n$ such that
\[ |f(x) - \varphi_n(x)| < 1/n \quad \text{for all } x \in [a, b]. \]
That is, take $\varepsilon = 1/n$ in the definition of a regulated function, Definition 3.1. Now it should be clear that the sequence of step functions $\varphi_n : [a, b] \to \mathbb{R}$ converges to $f$ uniformly on $[a, b]$. (Given $\varepsilon > 0$, choose $N$ such that $N > 1/\varepsilon$. Then . . . .)

Conversely, suppose that $\varphi_n : [a, b] \to \mathbb{R}$ is a sequence of step functions which converges uniformly to $f : [a, b] \to \mathbb{R}$. Then it should be clear that given $\varepsilon > 0$ there is a step function $\varphi$ such that
\[ |f(x) - \varphi(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
(Given $\varepsilon > 0$, choose $N$ such that
\[ n \ge N \implies |f(x) - \varphi_n(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Now take $\varphi$ to be $\varphi_N$.)

Think of the relation between step functions and regulated functions as analogous to the relation between rational numbers and real numbers. Given a real number $x$ and $\varepsilon > 0$ there is a rational number $q$ such that $|x - q| < \varepsilon$. Equivalently, we can find a sequence $q_n$ of rational numbers such that $q_n$ converges to $x$. Replace $x$ by $f$ and $q$ by $\varphi$.
Replace “$|x - q| < \varepsilon$” by “$|f(x) - \varphi(x)| < \varepsilon$ for all $x \in [a, b]$”. We get the definition of a regulated function. Replace $q_n$ by $\varphi_n$ and “converges to” by “converges uniformly on $[a, b]$ to”. We get Lemma 3.3.

Now we implement the idea of defining the integral of a regulated function $f$ by approximating $f$ by step functions and then taking a limit. Here is the main technical result.

Theorem 3.4. Let $\varphi_n : [a, b] \to \mathbb{R}$ be a sequence of step functions which converges uniformly to a regulated function $f : [a, b] \to \mathbb{R}$. Then the sequence of real numbers
\[ \int_a^b \varphi_n \]
converges. Furthermore, if $\varphi_n, \psi_n : [a, b] \to \mathbb{R}$ are two sequences of step functions both of which converge uniformly to $f : [a, b] \to \mathbb{R}$ then
\[ \lim_{n\to\infty} \int_a^b \varphi_n = \lim_{n\to\infty} \int_a^b \psi_n. \]

Before giving the proof of this theorem, we will take a time-out to motivate the proof; essentially this means explaining how one might arrive at the proof. How can we prove that the sequence of real numbers $\int_a^b \varphi_n$ converges? Since we have no idea of what the limit might be, the only general method we could use is to try to prove that this sequence is a Cauchy sequence. If we are going to prove this, we must estimate
\[ \left| \int_a^b \varphi_n - \int_a^b \varphi_m \right| = \left| \int_a^b (\varphi_n - \varphi_m) \right|. \]
Now, the only general method we have for estimating the integral of a step function (like $\varphi_n - \varphi_m$) is to use Lemma 2.10. Therefore we must try to estimate
\[ |\varphi_n(x) - \varphi_m(x)|. \]
We can estimate the distance from $\varphi_n(x)$ to $\varphi_m(x)$ by calculating the distance from $\varphi_n(x)$ to $f(x)$, the distance from $f(x)$ to $\varphi_m(x)$, and using the triangle inequality. This means we should use the inequality
\[ |\varphi_n(x) - \varphi_m(x)| = |\varphi_n(x) - f(x) + f(x) - \varphi_m(x)| \le |\varphi_n(x) - f(x)| + |f(x) - \varphi_m(x)|. \]
Finally, we can control $|\varphi_n(x) - f(x)|$ by using the fact that $\varphi_n$ converges uniformly to $f$.

Now we will convert this into a formal proof. However, before doing so we will point out one technical point which often occurs in proofs in Analysis.

Remark 3.5.
Consider a sequence $a_n$ and suppose we know the following statement: given $\varepsilon > 0$ there exists $N$ such that
\[ n \ge N \implies |a_n| \le K\varepsilon \]
where $K > 0$ is a constant. This is very close to the statement that the sequence $a_n$ converges to zero but it is different in two respects: the inequality we end up with involves $K\varepsilon$ rather than $\varepsilon$, and $\le$ rather than $<$. It is often the case that the natural argument produces an inequality of this kind. This statement does indeed imply that $a_n$ converges to zero. It seems wise to make this absolutely clear now, so that we can save time later, and avoid awkward arguments involving arbitrary-looking choices of constants, which have to be made at the beginning of the proof in order to make the final inequality, at the end of the proof, come out right.

Given $\varepsilon > 0$ we take $\varepsilon/2K$ as the small number in the above statement and conclude that there exists $N$ such that
\[ n \ge N \implies |a_n| \le K(\varepsilon/2K) = \varepsilon/2. \]
Choose such an integer $N$, and then
\[ n \ge N \implies |a_n| < \varepsilon. \]
Thus $a_n$ converges to 0.

Proof of Theorem 3.4. First we will prove that the sequence of integrals $\int_a^b \varphi_n$ converges. The method is to prove that it is a Cauchy sequence. Given $\varepsilon > 0$ choose $N$ such that if $n \ge N$ then
\[ |f(x) - \varphi_n(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Now
\[ |\varphi_n(x) - \varphi_m(x)| = |\varphi_n(x) - f(x) + f(x) - \varphi_m(x)| \le |\varphi_n(x) - f(x)| + |f(x) - \varphi_m(x)|, \]
and so if $n, m \ge N$ then
\[ |\varphi_n(x) - \varphi_m(x)| < 2\varepsilon \quad \text{for all } x \in [a, b]. \]
By Corollary 2.11 it follows that
\[ \left| \int_a^b (\varphi_n - \varphi_m) \right| \le 2\varepsilon(b - a) \]
and therefore, by Lemma 2.9,
\[ \left| \int_a^b \varphi_n - \int_a^b \varphi_m \right| \le 2\varepsilon(b - a). \]
So we have shown that given $\varepsilon > 0$ there exists $N$ such that
\[ n, m \ge N \implies \left| \int_a^b \varphi_n - \int_a^b \varphi_m \right| \le 2\varepsilon(b - a). \]
This proves that the sequence
\[ \int_a^b \varphi_n \]
of real numbers is a Cauchy sequence (compare Remark 3.5), and therefore it converges.

Now suppose that $\varphi_n, \psi_n : [a, b] \to \mathbb{R}$ are two sequences of step functions which converge uniformly to $f : [a, b] \to \mathbb{R}$. Given $\varepsilon > 0$ choose $N_1$ such that if $n \ge N_1$ then
\[ |f(x) - \varphi_n(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Also, choose $N_2$ such that if $n \ge N_2$ then
\[ |f(x) - \psi_n(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Now let $N = \max(N_1, N_2)$. Then if $n \ge N$
\[ |\varphi_n(x) - \psi_n(x)| = |\varphi_n(x) - f(x) + f(x) - \psi_n(x)| \le |\varphi_n(x) - f(x)| + |f(x) - \psi_n(x)| < 2\varepsilon \]
and this inequality is valid for all $x \in [a, b]$. Now using Lemma 2.9 and Corollary 2.11 as above, it follows that
\[ \left| \int_a^b \varphi_n - \int_a^b \psi_n \right| \le 2\varepsilon(b - a). \]
Thus the sequences
\[ \int_a^b \varphi_n, \qquad \int_a^b \psi_n \]
have the same limit. Once more, the technique of Remark 3.5 is needed to finish off the proof.

Definition 3.6. The integral of a regulated function $f : [a, b] \to \mathbb{R}$ is defined as follows. Choose a sequence of step functions $\varphi_n : [a, b] \to \mathbb{R}$ which converges uniformly to $f$. Then the integral of $f$ is defined by
\[ \int_a^b f = \lim_{n\to\infty} \int_a^b \varphi_n. \]

In view of Lemma 3.3, there is a sequence of step functions which converges uniformly to the regulated function $f$, and Theorem 3.4 shows that this definition makes sense (that is, the sequence $\int_a^b \varphi_n$ converges) and is independent of the choice of the sequence of step functions which converges uniformly to $f$.

Example 3.7. We will illustrate the definition of the integral of a regulated function by showing that the function $f : [0, 1] \to \mathbb{R}$ defined by
\[ f(x) = x^k, \]
where $k$ is an integer with $k \ge 1$, is regulated, and by calculating its integral. The proof is based on what would naturally be called the upper-ordinate rule.

[diagram]

We will divide the proof up into a number of steps.

Step 1. First we divide the interval $[0, 1]$ into $n$ sub-intervals of length $1/n$. Now introduce the step function $\varphi_n : [0, 1] \to \mathbb{R}$ suggested by the above diagram. That is, define
\[ \varphi_n(x) = (r/n)^k \quad \text{if } x \in \left( \frac{r-1}{n}, \frac{r}{n} \right] \text{ with } 1 \le r \le n, \]
and $\varphi_n(0) = 0$, so that $\varphi_n$ is defined on all of $[0, 1]$. Now we must estimate $|x^k - \varphi_n(x)|$.

Step 2. Recall that
\[ x^k - y^k = (x - y)(x^{k-1} + x^{k-2} y + \cdots + x y^{k-2} + y^{k-1}). \]
Now if $x, y \in [0, 1]$ each of the terms in the second factor on the right hand side of this equation lies between 0 and 1, and there are $k$ terms. Therefore we get the inequality
\[ x, y \in [0, 1] \implies |x^k - y^k| \le k|x - y|. \]

Step 3.
Now we use Step 2 to estimate $|x^k - \varphi_n(x)|$ as follows. If $x \in \left( \frac{r-1}{n}, \frac{r}{n} \right]$ then
\[ |x^k - \varphi_n(x)| = \left| x^k - \frac{r^k}{n^k} \right| \le k \left| x - \frac{r}{n} \right| < \frac{k}{n}. \]
The first inequality follows directly from Step 2 and the second follows since $x \in \left( \frac{r-1}{n}, \frac{r}{n} \right]$ and so $|x - \frac{r}{n}| < \frac{1}{n}$. Thus we have shown that
\[ x \in \left( \frac{r-1}{n}, \frac{r}{n} \right] \implies |x^k - \varphi_n(x)| < \frac{k}{n}. \]
Since each point $x \in (0, 1]$ is in one (and only one) of the half-open intervals $\left( \frac{r-1}{n}, \frac{r}{n} \right]$, the inequality is valid for all $x \in (0, 1]$; at $x = 0$ it holds with $\varphi_n(0) = 0$. So
\[ |x^k - \varphi_n(x)| < \frac{k}{n} \quad \text{for all } x \in [0, 1]. \]

Step 4. Step 3 shows that the sequence of step functions $\varphi_n : [0, 1] \to \mathbb{R}$ converges uniformly to the function $f : [0, 1] \to \mathbb{R}$ defined by $f(x) = x^k$, and therefore this function is regulated. You should write out the details of the proof.

Step 5. From the definition of the integral of a step function it follows that
\[ \int_0^1 \varphi_n = \sum_{r=1}^{n} \frac{r^k}{n^{k+1}}. \]
Now we use the following fact about the sum of the first $n$ $k$-th powers (see Worksheet 1 for more details). Let
\[ s_k(n) = \sum_{r=1}^{n} r^k. \]
Then $s_k(n)$ is a polynomial in $n$ with leading term $\frac{n^{k+1}}{k+1}$; in other words there are real (indeed rational) numbers $a_0, \dots, a_k$ such that
\[ s_k(n) = \frac{n^{k+1}}{k+1} + a_k n^k + \cdots + a_1 n + a_0. \]
Therefore
\[ \int_0^1 \varphi_n = \sum_{r=1}^{n} \frac{r^k}{n^{k+1}} = \frac{s_k(n)}{n^{k+1}} = \frac{1}{k+1} + \frac{a_k}{n} + \cdots + \frac{a_1}{n^k} + \frac{a_0}{n^{k+1}}. \]

Step 6. Finally we compute the limit
\[ \lim_{n\to\infty} \int_0^1 \varphi_n. \]
By Step 5
\[ \int_0^1 \varphi_n = \frac{1}{k+1} + \frac{a_k}{n} + \cdots + \frac{a_1}{n^k} + \frac{a_0}{n^{k+1}} \]
and using standard facts about limits it follows that
\[ \lim_{n\to\infty} \left( \frac{1}{k+1} + \frac{a_k}{n} + \cdots + \frac{a_1}{n^k} + \frac{a_0}{n^{k+1}} \right) = \frac{1}{k+1}. \]

Step 7. We are now done since by the definition of the integral of a regulated function we know that
\[ \int_0^1 x^k\,dx = \lim_{n\to\infty} \int_0^1 \varphi_n = \frac{1}{k+1}. \]

Now we show that a continuous function $f : [a, b] \to \mathbb{R}$ is regulated. So the integral defined in Definition 3.6 is good enough to integrate every continuous function. To prove this we need a technical result. This is the general analogue of the first step in the proof that $x^k$ is regulated on $[0, 1]$ given in Example 3.7.

Theorem 3.8.
Suppose $f : [a, b] \to \mathbb{R}$ is continuous. Then given $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ x, y \in [a, b], \ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon. \]

This result looks very much like the definition of continuity but it is not, and it is very important to understand how the conclusion of this theorem differs from the definition of continuity. Fix $x \in [a, b]$. Since $f$ is continuous we can, by the definition of continuity, choose $\delta_x$, which depends on $x$ as the notation implies, such that
\[ y \in [a, b], \ |x - y| < \delta_x \implies |f(x) - f(y)| < \varepsilon. \]
The content of the theorem is that since we are working on a closed interval $[a, b]$ we can choose one $\delta$ which will work for all $x$.

The proof of this theorem is typical of many proofs in Analysis. It starts from asking the following question: what would happen if the result were false? In other words we use "proof by contradiction".

Proof of Theorem 3.8. Suppose the result is false. Then there exists $\varepsilon > 0$ such that for any $\delta > 0$ there are points $x, y \in [a, b]$ such that
\[ |x - y| < \delta \quad \text{and} \quad |f(x) - f(y)| \ge \varepsilon. \]
These points $x$ and $y$ depend on $\delta$. This allows us to construct sequences of points by the following procedure. Take $\delta = 1$ and choose points $x_1, y_1 \in [a, b]$ such that
\[ |x_1 - y_1| < 1 \quad \text{and} \quad |f(x_1) - f(y_1)| \ge \varepsilon. \]
Now take $\delta = 1/2$ and choose points $x_2, y_2 \in [a, b]$ such that
\[ |x_2 - y_2| < 1/2 \quad \text{and} \quad |f(x_2) - f(y_2)| \ge \varepsilon. \]
Carry on: this means take $\delta = 1/n$ and choose points $x_n, y_n \in [a, b]$ such that
\[ |x_n - y_n| < 1/n \quad \text{and} \quad |f(x_n) - f(y_n)| \ge \varepsilon. \]

Since $x_n \in [a, b]$, the sequence $x_n$ is a bounded sequence of real numbers and by the Bolzano-Weierstrass theorem it has a convergent subsequence $x_{n_r}$. Now
\[ |x_{n_r} - y_{n_r}| < 1/n_r \]
and since $x_{n_r}$ converges it follows that $y_{n_r}$ also converges,
\[ \lim_{r \to \infty} x_{n_r} = \lim_{r \to \infty} y_{n_r}, \]
and this common limit $u$ is a point in $[a, b]$.

Now $f : [a, b] \to \mathbb{R}$ is continuous. Therefore it maps convergent sequences in $[a, b]$ to convergent sequences and limits to limits.
So it follows that
\[ \lim_{r \to \infty} f(x_{n_r}) = f(u) = \lim_{r \to \infty} f(y_{n_r}) \]
and therefore
\[ \lim_{r \to \infty} |f(x_{n_r}) - f(y_{n_r})| = 0. \]
However, by construction
\[ |f(x_{n_r}) - f(y_{n_r})| \ge \varepsilon. \]
This is a contradiction.

Example 3.9. Theorem 3.8 is false for continuous functions on intervals other than closed intervals. For example, consider the function
\[ f : (0, 1] \to \mathbb{R}, \quad f(x) = 1/x. \]
This is continuous. However, if we take $x = \frac{1}{n}$ and $y = \frac{1}{n+1}$, where $n$ is an integer and $n \ge 1$, then
\[ \left| \frac{1}{n+1} - \frac{1}{n} \right| = \frac{1}{n(n+1)} \quad \text{and} \quad \left| f\!\left(\tfrac{1}{n+1}\right) - f\!\left(\tfrac{1}{n}\right) \right| = |n + 1 - n| = 1. \]
So take $\varepsilon > 0$ with $\varepsilon < 1$, and let $\delta$ be any positive real number. Choose $n$ such that
\[ \frac{1}{n(n+1)} < \delta. \]
Then
\[ \left| \frac{1}{n+1} - \frac{1}{n} \right| = \frac{1}{n(n+1)} < \delta \quad \text{but} \quad \left| f\!\left(\tfrac{1}{n+1}\right) - f\!\left(\tfrac{1}{n}\right) \right| = 1 > \varepsilon. \]
So no single $\delta$ works for this $\varepsilon$.

Remark 3.10. Notice how Theorem 3.8 is a general analogue of Step 2 in the argument given in Example 3.7. Consider the function $f : [0, 1] \to \mathbb{R}$ defined by $f(x) = x^k$. Given $\varepsilon$, choose $\delta$ such that
\[ \delta < \frac{\varepsilon}{k}. \]
Then the inequality
\[ x, y \in [0, 1] \implies |x^k - y^k| \le k|x - y| \]
shows that
\[ x, y \in [0, 1], \ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon. \]

We now prove that every continuous function $f : [a, b] \to \mathbb{R}$ is regulated by following the steps used to prove that $x^k$ is regulated in Example 3.7.

Theorem 3.11. Let $f : [a, b] \to \mathbb{R}$ be a continuous function. Then $f$ is regulated.

Proof. Since $f : [a, b] \to \mathbb{R}$ is continuous, it follows from Theorem 3.8 that given $\varepsilon > 0$ there exists $\delta > 0$ such that if $x, y \in [a, b]$ then
\[ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon. \]
Choose such a $\delta$. Now choose a partition
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
of the interval $[a, b]$ such that
\[ p_i - p_{i-1} < \delta \quad \text{for } 1 \le i \le k. \]
Define $\phi : [a, b] \to \mathbb{R}$ by
\[ \phi(a) = f(a), \qquad \phi(x) = f(p_r) \ \text{ if } p_{r-1} < x \le p_r, \quad 1 \le r \le k. \]
We will show that $|f(x) - \phi(x)| < \varepsilon$ for all $x \in [a, b]$. At $x = a$ the difference is zero, so fix $x \in (a, b]$. Then there is a unique $r$ such that $x \in (p_{r-1}, p_r]$ and then
\[ |f(x) - \phi(x)| = |f(x) - f(p_r)|. \]
However, $x \in (p_{r-1}, p_r]$ and so
\[ |x - p_r| < p_r - p_{r-1} < \delta \]
and therefore
\[ |f(x) - f(p_r)| < \varepsilon. \]
This shows that
\[ |f(x) - \phi(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Since $\varepsilon > 0$ was arbitrary, $f : [a, b] \to \mathbb{R}$ is regulated.

Next we establish the most important basic properties of the integral of a regulated function.

Lemma 3.12. Let $f : [a, b] \to \mathbb{R}$ be a regulated function and let $[u, v]$ be a closed interval contained in $[a, b]$. Then the restriction of $f$ to $[u, v]$ gives a regulated function $f : [u, v] \to \mathbb{R}$.

Proof. Given $\varepsilon > 0$, let $\phi : [a, b] \to \mathbb{R}$ be a step function such that
\[ |f(x) - \phi(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Then plainly
\[ |f(x) - \phi(x)| < \varepsilon \quad \text{for all } x \in [u, v]. \]
Since the restriction of $\phi$ to $[u, v]$ is a step function, this proves that the restriction of $f$ to $[u, v]$ is regulated.

So we can form the integral
\[ \int_u^v f \]
for any $u, v$ with $a \le u < v \le b$.

Lemma 3.13. Let $f : [a, b] \to \mathbb{R}$ be a regulated function. Suppose $a \le u \le v \le w \le b$; then
\[ \int_u^v f + \int_v^w f = \int_u^w f. \]

Proof. Choose a sequence of step functions $\phi_n : [a, b] \to \mathbb{R}$ which converges uniformly to $f : [a, b] \to \mathbb{R}$. Then the restrictions of $\phi_n$ to $[u, v]$, $[v, w]$ and $[u, w]$ give sequences of step functions which converge uniformly to the restriction of $f$ on the intervals $[u, v]$, $[v, w]$ and $[u, w]$ respectively. Thus
\[ \lim_{n \to \infty} \int_u^v \phi_n = \int_u^v f, \qquad \lim_{n \to \infty} \int_v^w \phi_n = \int_v^w f, \qquad \lim_{n \to \infty} \int_u^w \phi_n = \int_u^w f. \]
However, by Lemma 2.8 we know that
\[ \int_u^v \phi_n + \int_v^w \phi_n = \int_u^w \phi_n \]
and so it follows by taking the limit as $n \to \infty$ that
\[ \int_u^v f + \int_v^w f = \int_u^w f. \]

Lemma 3.14. Let $f, g : [a, b] \to \mathbb{R}$ be regulated functions, and let $\lambda, \mu$ be real numbers. Then $\lambda f + \mu g$ is a regulated function and
\[ \int_a^b (\lambda f + \mu g) = \lambda \int_a^b f + \mu \int_a^b g. \]

Proof. Choose sequences of step functions $\phi_n, \psi_n : [a, b] \to \mathbb{R}$ such that $\phi_n$ converges uniformly to $f$ and $\psi_n$ converges uniformly to $g$. Then $\lambda \phi_n + \mu \psi_n : [a, b] \to \mathbb{R}$ is a sequence of step functions which converges uniformly to $\lambda f + \mu g : [a, b] \to \mathbb{R}$. This proves that $\lambda f + \mu g$ is regulated.
By the algebra of limits (together with the linearity of the integral for step functions)
\[ \lim_{n \to \infty} \int_a^b (\lambda \phi_n + \mu \psi_n) = \lambda \lim_{n \to \infty} \int_a^b \phi_n + \mu \lim_{n \to \infty} \int_a^b \psi_n \]
and since
\[ \lim_{n \to \infty} \int_a^b \phi_n = \int_a^b f, \qquad \lim_{n \to \infty} \int_a^b \psi_n = \int_a^b g \]
it follows that
\[ \int_a^b (\lambda f + \mu g) = \lambda \int_a^b f + \mu \int_a^b g. \]

In the language of linear algebra, Lemma 3.14 shows that the set $R[a, b]$ of regulated functions $[a, b] \to \mathbb{R}$ is a vector space over $\mathbb{R}$ and the function
\[ I : R[a, b] \to \mathbb{R}, \quad I(f) = \int_a^b f \]
is linear.

The next lemma tells us that if we are given some bound on a regulated function then we get a bound on its integral.

Lemma 3.15. Let $f : [a, b] \to \mathbb{R}$ be a regulated function and suppose
\[ m \le f(x) \le M \quad \text{for all } x \in [a, b], \]
where $m$ and $M$ are real numbers. Then
\[ m(b - a) \le \int_a^b f \le M(b - a). \]

Proof. Given $\varepsilon > 0$ choose a step function $\phi : [a, b] \to \mathbb{R}$, which depends on $\varepsilon$, such that
\[ |\phi(x) - f(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
We can rewrite this inequality as
\[ -\varepsilon + f(x) < \phi(x) < f(x) + \varepsilon. \]
Now using the bounds on $f(x)$ we get the inequality
\[ -\varepsilon + m < \phi(x) < M + \varepsilon \quad \text{for all } x \in [a, b]. \]
Now using Lemma 2.10 we get the inequality
\[ (m - \varepsilon)(b - a) \le \int_a^b \phi \le (M + \varepsilon)(b - a). \]
Take $\varepsilon = 1/n$. Then we get a sequence of step functions $\phi_n : [a, b] \to \mathbb{R}$ such that
\[ |\phi_n(x) - f(x)| < 1/n \quad \text{for all } x \in [a, b], \]
\[ (m - 1/n)(b - a) \le \int_a^b \phi_n \le (M + 1/n)(b - a). \]
Now the first of these two statements tells us that $\phi_n : [a, b] \to \mathbb{R}$ converges uniformly to $f : [a, b] \to \mathbb{R}$. Thus
\[ \lim_{n \to \infty} \int_a^b \phi_n = \int_a^b f. \]
Using the second statement we see that
\[ m(b - a) = \lim_{n \to \infty} (m - 1/n)(b - a) \le \lim_{n \to \infty} \int_a^b \phi_n \le \lim_{n \to \infty} (M + 1/n)(b - a) = M(b - a). \]
Since
\[ \lim_{n \to \infty} \int_a^b \phi_n = \int_a^b f \]
we get the required inequality.

Corollary 3.16. Let $f : [a, b] \to \mathbb{R}$ be a regulated function and suppose
\[ |f(x)| \le K \quad \text{for all } x \in [a, b], \]
where $K$ is a real number with $K \ge 0$. Then
\[ \left| \int_a^b f \right| \le K(b - a). \]

Proof. The proof is formally identical to the proof of Corollary 2.11. The details are left to the reader.
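The construction in Example 3.7 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `upper_step_integral` and `check_example` are hypothetical, and exact rational arithmetic is used so that no rounding enters. It computes $\int_0^1 \phi_n = \sum_{r=1}^n r^k / n^{k+1}$ and tests the bound $\left| \int_0^1 \phi_n - \frac{1}{k+1} \right| \le \frac{k}{n}$, which is what Corollary 3.16 gives when applied to $x^k - \phi_n$ with the uniform estimate of Step 3.

```python
from fractions import Fraction


def upper_step_integral(k: int, n: int) -> Fraction:
    """Integral over [0, 1] of the step function phi_n of Example 3.7.

    phi_n takes the value (r/n)^k on the interval ((r-1)/n, r/n],
    which has length 1/n, so the integral is sum_{r=1}^n (r/n)^k / n.
    """
    return sum(Fraction(r, n) ** k for r in range(1, n + 1)) / n


def check_example(k: int, n: int) -> bool:
    """Check |integral(phi_n) - 1/(k+1)| <= k/n in exact arithmetic."""
    error = abs(upper_step_integral(k, n) - Fraction(1, k + 1))
    return error <= Fraction(k, n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The printed values approach 1/3 as n grows.
        print(n, float(upper_step_integral(2, n)))
```

For $k = 2$ and $n = 10$ the sum is $385/1000 = 77/200$, which differs from $1/3$ by $31/600$, comfortably inside the bound $k/n = 1/5$.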
4 The indefinite integral and the fundamental theorem of calculus

So far, we have defined the integral of a regulated function and shown that some of the natural elementary properties of an integral do indeed hold. We have also proved that any continuous function on a closed interval is regulated and therefore integrable. We now turn our attention to proving the fundamental theorem of calculus, and therefore justifying the usual methods of calculating integrals.

More generally, we will study the indefinite integral of a regulated function. Let $f : [a, b] \to \mathbb{R}$ be a regulated function. Then for any $x \in [a, b]$ we can form the integral
\[ \int_a^x f \]
and we define the indefinite integral of $f$ to be the function
\[ F : [a, b] \to \mathbb{R}, \quad F(x) = \int_a^x f. \]
Note that this indefinite integral is normalised so that $F(a) = 0$. We now aim to establish the properties of this indefinite integral.

Theorem 4.1. Let $f : [a, b] \to \mathbb{R}$ be a regulated function. Then the indefinite integral $F : [a, b] \to \mathbb{R}$ of $f$ is continuous.

The proof of this theorem requires a preliminary lemma.

Lemma 4.2. Let $f : [a, b] \to \mathbb{R}$ be a regulated function. Then $f$ is bounded.

Proof. A step function $\phi : [a, b] \to \mathbb{R}$ is constant on the open intervals of some partition of $[a, b]$. Therefore the only values $\phi$ attains are its values on these open intervals, together with the values it attains at the end-points of these intervals. In particular $\phi$ assumes only a finite number of values. Therefore, a step function is bounded.

Now if $f : [a, b] \to \mathbb{R}$ is a regulated function, choose a step function $\phi : [a, b] \to \mathbb{R}$ such that
\[ |f(x) - \phi(x)| < 1 \quad \text{for all } x \in [a, b]. \]
Now choose numbers $k$ and $K$ such that
\[ k \le \phi(x) \le K \quad \text{for all } x \in [a, b]. \]
Now from the first inequality it follows that
\[ \phi(x) - 1 < f(x) < \phi(x) + 1 \quad \text{for all } x \in [a, b], \]
and using the second inequality gives
\[ k - 1 < f(x) < K + 1 \quad \text{for all } x \in [a, b]. \]
This proves that $f$ is bounded.

Proof of Theorem 4.1. By Lemma 4.2 we know that $f$ is bounded.
So choose $M > 0$ such that
\[ |f(x)| \le M \quad \text{for all } x \in [a, b]. \]
Now if $x, y \in [a, b]$ and $x \le y$
\[ |F(y) - F(x)| = \left| \int_a^y f - \int_a^x f \right| = \left| \int_x^y f \right| \le M(y - x). \]
The middle equality is Lemma 3.13 and the last inequality follows from Corollary 3.16. This shows that $F$ is continuous. (Given $\varepsilon > 0$ choose $\delta$ such that $\delta < \varepsilon/M$. Then if $|y - x| < \delta$, the above inequality shows that $|F(y) - F(x)| < \varepsilon$.)

Thus the indefinite integral of a regulated function is continuous, even though the regulated function itself need not be continuous. This is an example of how integration "improves" functions. We now go on to show that the indefinite integral of a continuous function is differentiable. Indeed we will prove the following theorem which is one version of the fundamental theorem of calculus.

Theorem 4.3. Let $f : [a, b] \to \mathbb{R}$ be a regulated function. Suppose $f$ is continuous at $c$ with $a < c < b$. Then the indefinite integral $F : [a, b] \to \mathbb{R}$ of $f$ is differentiable at $c$ and $F'(c) = f(c)$.

Proof. Suppose $h > 0$. Then by Lemma 3.13
\[ F(c + h) - F(c) = \int_a^{c+h} f - \int_a^c f = \int_c^{c+h} f \]
and we can estimate $F(c + h) - F(c)$ by using the fact that $f$ is continuous at $c$ to control $f$ in the region $[c, c + h]$ and then using Lemma 3.15 to estimate the integral.

Given $\varepsilon > 0$ choose $\delta > 0$ such that
\[ x \in [a, b], \ |x - c| < \delta \implies |f(x) - f(c)| < \varepsilon. \]
Here of course, we are using the fact that $f$ is continuous at $c$. Shrinking $\delta$ if necessary, we may also assume that $(c - \delta, c + \delta) \subset [a, b]$. In particular, if $c - \delta < x < c + \delta$ it follows that
\[ f(c) - \varepsilon < f(x) < f(c) + \varepsilon. \]
Now suppose $\delta > h > 0$. Then it follows from Lemma 3.15 that
\[ h(f(c) - \varepsilon) \le \int_c^{c+h} f \le h(f(c) + \varepsilon) \]
and therefore
\[ h(f(c) - \varepsilon) \le F(c + h) - F(c) \le h(f(c) + \varepsilon). \]
Since $h$ is positive
\[ f(c) - \varepsilon \le \frac{F(c + h) - F(c)}{h} \le f(c) + \varepsilon \]
and re-arranging this inequality gives
\[ -\varepsilon \le \frac{F(c + h) - F(c)}{h} - f(c) \le \varepsilon. \]
This proves that if $\delta > h > 0$ then
\[ \left| \frac{F(c + h) - F(c)}{h} - f(c) \right| \le \varepsilon. \]
It is easy to see how to modify the argument to prove that if $-\delta < h < 0$ then we get the same inequality.
This proves that given $\varepsilon > 0$ we can find $\delta > 0$ such that
\[ 0 < |h| < \delta \implies \left| \frac{F(c + h) - F(c)}{h} - f(c) \right| \le \varepsilon, \]
and therefore
\[ \lim_{h \to 0} \frac{F(c + h) - F(c)}{h} = f(c). \]
This proves $F$ is differentiable at $c$ and $F'(c) = f(c)$.

This result gives the first version of the fundamental theorem of calculus.

Corollary 4.4. Let $f : [a, b] \to \mathbb{R}$ be a continuous function and let $F : [a, b] \to \mathbb{R}$ be the indefinite integral of $f$, that is
\[ F(x) = \int_a^x f. \]
Then $F$ is differentiable and $F' = f$.

Proof. By Theorem 3.11, we know that $f$ is regulated and therefore we can form the indefinite integral
\[ F(x) = \int_a^x f. \]
Since $f$ is continuous, Theorem 4.3 shows that $F' = f$.

Now we come to the second version of the fundamental theorem of calculus.

Theorem 4.5. Let $f : [a, b] \to \mathbb{R}$ be a regulated function and suppose there is a differentiable function $g : [a, b] \to \mathbb{R}$ such that $g' = f$; then
\[ \int_a^b f = g(b) - g(a). \]

Note that if $f$ is continuous, Theorem 4.5 follows immediately from Theorem 4.3. However, Theorem 4.5 can be used even if $f$ is not continuous.

Proof of Theorem 4.5. Given $\varepsilon > 0$, let $\phi : [a, b] \to \mathbb{R}$ be a step function such that
\[ |f(x) - \phi(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Let
\[ a = p_0 < p_1 < \cdots < p_{k-1} < p_k = b \]
be a partition of $[a, b]$ such that $\phi$ is constant on each of the open intervals $(p_{r-1}, p_r)$ and let $c_r$ be the value $\phi$ attains on this open interval. Now, using the mean value theorem for differentiable functions, it follows that
\[ g(p_r) - g(p_{r-1}) = g'(x_r)(p_r - p_{r-1}) \]
for some $x_r \in (p_{r-1}, p_r)$. Now
\[ g'(x_r) = f(x_r), \qquad |f(x_r) - \phi(x_r)| < \varepsilon, \qquad \phi(x_r) = c_r \]
and putting these three facts together gives
\[ c_r - \varepsilon < g'(x_r) < c_r + \varepsilon. \]
This now gives us the following inequality
\[ (c_r - \varepsilon)(p_r - p_{r-1}) < g(p_r) - g(p_{r-1}) < (c_r + \varepsilon)(p_r - p_{r-1}) \]
which can be rewritten as
\[ -\varepsilon(p_r - p_{r-1}) < g(p_r) - g(p_{r-1}) - c_r(p_r - p_{r-1}) < \varepsilon(p_r - p_{r-1}). \]
Summing these inequalities gives
\[ -\varepsilon \sum_{r=1}^k (p_r - p_{r-1}) < \sum_{r=1}^k \bigl( g(p_r) - g(p_{r-1}) \bigr) - \sum_{r=1}^k c_r (p_r - p_{r-1}) < \varepsilon \sum_{r=1}^k (p_r - p_{r-1}) \]
and therefore, since the first and last sums equal $b - a$, the second telescopes to $g(b) - g(a)$, and the third is the integral of $\phi$,
\[ -\varepsilon(b - a) < g(b) - g(a) - \int_a^b \phi < \varepsilon(b - a). \]
However, $-\varepsilon < \phi(x) - f(x) < \varepsilon$ for all $x \in [a, b]$ and so, by Lemma 3.14 and Lemma 3.15,
\[ -\varepsilon(b - a) \le \int_a^b \phi - \int_a^b f \le \varepsilon(b - a) \]
and combining this inequality with the previous one gives
\[ -2\varepsilon(b - a) < g(b) - g(a) - \int_a^b f < 2\varepsilon(b - a). \]
Since $\varepsilon$ is an arbitrary positive real number, it follows that
\[ \int_a^b f = g(b) - g(a). \]

5 The mid-ordinate rule and its generalisations

We now turn our attention to one of the problems which led us to look for a mathematically rigorous theory of integration: the mid-ordinate rule. Actually we will formulate and prove a theorem which might well be called the "any ordinate rule".

Let $f : [a, b] \to \mathbb{R}$ be a continuous function. Given a positive integer $n$, divide the interval $[a, b]$ up into $n$ subintervals each of length $(b - a)/n$. In formulas, for $0 \le r \le n$ define
\[ p_r = a + \frac{r(b - a)}{n}. \]
Thus
\[ a = p_0 < p_1 < \cdots < p_{n-1} < p_n = b \]
is a partition of $[a, b]$ into $n$ subintervals each of the same length. We will denote this partition by $P_n$. For each $r$ with $1 \le r \le n$ choose a point $t_r$ in $(p_{r-1}, p_r]$; thus
\[ p_{r-1} < t_r \le p_r. \]
Now form the sum
\[ A_n = \sum_{r=1}^n f(t_r) \frac{b - a}{n}. \]

Theorem 5.1. This sequence $A_n$ converges and
\[ \lim_{n \to \infty} A_n = \int_a^b f. \]

The idea of the proof is to express the sum
\[ A_n = \sum_{r=1}^n f(t_r) \frac{b - a}{n} \]
as an integral
\[ A_n = \int_a^b \phi_n \]
where $\phi_n$ is a step function and to prove that the sequence $\phi_n : [a, b] \to \mathbb{R}$ of step functions converges uniformly to the given function $f$. The following diagram illustrates what is going on.

[diagram]

The formal proof, which is essentially identical to the proof that a continuous function is regulated, goes as follows.

Proof of Theorem 5.1. Define the step function $\phi_n : [a, b] \to \mathbb{R}$ as follows:
\[ \phi_n(a) = f(a), \qquad \phi_n(x) = f(t_r) \ \text{ if } p_{r-1} < x \le p_r. \]
Now, given $\varepsilon > 0$, we can use Theorem 3.8 to choose an integer $N$ such that if $x, y \in [a, b]$
\[ |x - y| < 1/N \implies |f(x) - f(y)| < \varepsilon. \]
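Now suppose $n > N(b - a)$ and consider $|\phi_n(x) - f(x)|$. At $x = a$ this is zero, so take $x \in (a, b]$. Now $x$ must be in one of the intervals in the $n$-th partition $P_n$. So there is a unique $r$ with
\[ p_{r-1} < x \le p_r \]
and then $\phi_n(x) = f(t_r)$ where $t_r$ is the chosen point with $p_{r-1} < t_r \le p_r$; thus
\[ |\phi_n(x) - f(x)| = |f(t_r) - f(x)|. \]
Both $x$ and $t_r$ are in the interval $(p_{r-1}, p_r]$; this interval has length $(b - a)/n$, which is smaller than $1/N$ (since $n > N(b - a)$). Therefore
\[ |x - t_r| < 1/N \]
and so
\[ |f(t_r) - f(x)| < \varepsilon. \]
Since $|\phi_n(x) - f(x)| = |f(t_r) - f(x)|$ this proves that if $n > N(b - a)$ then
\[ |\phi_n(x) - f(x)| < \varepsilon \quad \text{for all } x \in [a, b]. \]
Thus $\phi_n : [a, b] \to \mathbb{R}$ converges uniformly to $f : [a, b] \to \mathbb{R}$. However,
\[ \int_a^b \phi_n = \sum_{r=1}^n f(t_r) \frac{b - a}{n} = A_n \]
and so, since $\phi_n : [a, b] \to \mathbb{R}$ converges uniformly to $f : [a, b] \to \mathbb{R}$, it must follow that
\[ \lim_{n \to \infty} A_n = \int_a^b f. \]

Notice that specialising to $t_r = p_r = a + r(b - a)/n$ gives the upper-ordinate rule
\[ \lim_{n \to \infty} \sum_{r=1}^n f\bigl(a + r(b - a)/n\bigr) \frac{b - a}{n} = \int_a^b f; \]
specialising to $t_r = p_{r-1} + (b - a)/2n = a + (r - 1/2)(b - a)/n$ gives the mid-ordinate rule
\[ \lim_{n \to \infty} \sum_{r=1}^n f\bigl(a + (r - 1/2)(b - a)/n\bigr) \frac{b - a}{n} = \int_a^b f; \]
and specialising to $t_r = a + (r - 1)(b - a)/n$ gives the lower-ordinate rule
\[ \lim_{n \to \infty} \sum_{r=1}^n f\bigl(a + (r - 1)(b - a)/n\bigr) \frac{b - a}{n} = \int_a^b f. \]
In the last case $t_r = p_{r-1}$ is the left end-point, which lies just outside $(p_{r-1}, p_r]$. The proof goes through unchanged if $t_r$ is allowed to be any point of $[p_{r-1}, p_r]$, since we still have $|x - t_r| \le (b - a)/n < 1/N$.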
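The three rules above can be compared numerically. The following is a minimal floating-point sketch, not part of the original notes: the function name `ordinate_sum` and its `position` parameter are hypothetical, with `position` equal to 0, 1/2 and 1 selecting the lower-, mid- and upper-ordinate choices of $t_r$ respectively. It computes $A_n = \sum_{r=1}^n f(t_r)(b - a)/n$ with $t_r = p_{r-1} + \text{position} \cdot (b - a)/n$.

```python
from typing import Callable


def ordinate_sum(
    f: Callable[[float], float],
    a: float,
    b: float,
    n: int,
    position: float = 0.5,
) -> float:
    """Return A_n = sum_{r=1}^n f(t_r) * (b - a)/n.

    Here t_r = a + (r - 1 + position) * (b - a)/n, so that
    position = 0 is the lower-ordinate rule, 0.5 the mid-ordinate rule
    and 1 the upper-ordinate rule.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    h = (b - a) / n  # common length of the subintervals of P_n
    return sum(f(a + (r - 1 + position) * h) for r in range(1, n + 1)) * h


if __name__ == "__main__":
    def square(x: float) -> float:
        return x * x

    # All three sums should approach 1/3, the integral of x^2 over [0, 1].
    for n in (10, 100, 1000):
        lower = ordinate_sum(square, 0.0, 1.0, n, 0.0)
        mid = ordinate_sum(square, 0.0, 1.0, n, 0.5)
        upper = ordinate_sum(square, 0.0, 1.0, n, 1.0)
        print(n, lower, mid, upper)
```

Theorem 5.1 says only that each of these sums converges to the integral. How quickly they do so is the second question raised in the introduction, and the sketch makes no claim about it beyond what the printed values suggest.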